Overview

Dataset statistics

Number of variables: 2
Number of observations: 361
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 5.8 KiB
Average record size in memory: 16.4 B

Variable types

DateTime: 1
Numeric: 1

Warnings

DATE has unique values (Unique)

Reproduction

Analysis started: 2021-05-11 22:15:42.285281
Analysis finished: 2021-05-11 22:15:42.865268
Duration: 0.58 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

DATE
Date

UNIQUE

Distinct: 361
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 KiB
Minimum: 1990-01-01 00:00:00
Maximum: 2020-01-01 00:00:00
Histogram with fixed size bins (bins=50)

PCOALAUUSDM
Real number (ℝ≥0)

Distinct: 275
Distinct (%): 76.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 61.06970979
Minimum: 24
Maximum: 195.1863354
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.9 KiB

Quantile statistics

Minimum: 24
5th percentile: 26.1
Q1: 33.6
Median: 52.43303571
Q3: 85.56173469
95th percentile: 125.0858766
Maximum: 195.1863354
Range: 171.1863354
Interquartile range (IQR): 51.96173469

Descriptive statistics

Standard deviation: 33.60143246
Coefficient of variation (CV): 0.5502143793
Kurtosis: 0.4179963001
Mean: 61.06970979
Median Absolute Deviation (MAD): 21.43303571
Skewness: 1.002984538
Sum: 22046.16523
Variance: 1129.056264
Monotonicity: Not monotonic
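
Several of the figures in the quantile and descriptive statistics are derived from the others: the range is maximum minus minimum, the IQR is Q3 minus Q1, the mean is the sum over the number of observations, the variance is the squared standard deviation, and the CV is the standard deviation over the mean. The sketch below is only a consistency check under that reading. It hard-codes the values printed in this report, and the variable names are illustrative rather than anything pandas-profiling exposes.

```python
import math

# Values copied from the PCOALAUUSDM statistics printed in this report.
n = 361
total = 22046.16523
minimum, maximum = 24.0, 195.1863354
q1, q3 = 33.6, 85.56173469
std = 33.60143246

# Recompute the derived statistics from the primary ones.
derived = {
    "range": maximum - minimum,
    "iqr": q3 - q1,
    "mean": total / n,
    "variance": std ** 2,
}
derived["cv"] = std / derived["mean"]

# The corresponding figures as reported above.
reported = {
    "range": 171.1863354,
    "iqr": 51.96173469,
    "mean": 61.06970979,
    "variance": 1129.056264,
    "cv": 0.5502143793,
}

for name, value in derived.items():
    ok = math.isclose(value, reported[name], rel_tol=1e-6)
    print(f"{name:>8}: {value:.6f} (reported {reported[name]}) match={ok}")
```

Each recomputed figure should agree with the reported one up to the rounding of the printed inputs.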
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
39.5                 19     5.3%
31                   13     3.6%
26.1                 11     3.0%
40.5                 10     2.8%
25.1                 6      1.7%
33.1                 5      1.4%
25.6                 5      1.4%
38                   4      1.1%
35                   4      1.1%
27.15                3      0.8%
Other values (265)   281    77.8%

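
Each frequency in the table above is the count divided by the 361 observations, and the "Other values" row is what the ten listed values leave over. A minimal sketch that recomputes both, assuming the counts as transcribed from the table (the names `top_counts` and `frequency_pct` are illustrative):

```python
N_OBSERVATIONS = 361

# Value -> count, transcribed from the most-frequent-values table above.
top_counts = {
    39.5: 19, 31.0: 13, 26.1: 11, 40.5: 10, 25.1: 6,
    33.1: 5, 25.6: 5, 38.0: 4, 35.0: 4, 27.15: 3,
}

def frequency_pct(count, total=N_OBSERVATIONS):
    """Share of observations as a percentage, rounded to one decimal."""
    return round(100 * count / total, 1)

frequencies = {value: frequency_pct(count) for value, count in top_counts.items()}

# Observations not covered by the ten most frequent values.
other_count = N_OBSERVATIONS - sum(top_counts.values())

for value, pct in frequencies.items():
    print(f"{value:>6}: {top_counts[value]:>3}  {pct}%")
print(f" other: {other_count:>3}  {frequency_pct(other_count)}%")
```
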
Value         Count  Frequency (%)
24            1      0.3%
24.45         1      0.3%
24.9          1      0.3%
24.96428571   1      0.3%
25.1          6      1.7%
25.125        1      0.3%
25.6          5      1.4%
25.82142857   1      0.3%
26.08928571   1      0.3%
26.1          11     3.0%

Value         Count  Frequency (%)
195.1863354   1      0.3%
173.3035714   1      0.3%
166.9897959   1      0.3%
164.4983766   1      0.3%
143.0758929   1      0.3%
141.8877551   1      0.3%
140.9935714   1      0.3%
137.7631579   1      0.3%
135.9712733   1      0.3%
134.6244643   1      0.3%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exactly linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
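
As an illustration of the calculation just described, here is a minimal pure-Python sketch. The function name `pearson_r` is ours, and this is not presented as how the report itself computes the coefficient. It simply divides the covariance by the product of the standard deviations.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of deviations are sufficient.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)

xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
print(pearson_r(xs, ys))
# Shifting and rescaling y by a positive factor leaves r unchanged.
print(pearson_r(xs, [3 * y + 7 for y in ys]))
```

Both calls should print the same value (up to floating-point rounding), which illustrates the location and scale invariance mentioned above.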

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
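
The same recipe can be sketched in pure Python: replace each variable by its ranks, then apply the covariance-over-standard-deviations formula to the ranks. The helper names below are illustrative, and giving tied values the average of their positions is an assumption on our part (a common convention), not something stated in this report.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)

# A nonlinear but strictly increasing relationship still gives rho = 1.
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```
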

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

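To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. A pair is concordant when both variables order its two observations the same way and discordant when they order them oppositely. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.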
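
The pair-counting definition above translates directly into a short sketch. It implements that plain definition (often called τ-a) and makes no adjustment for ties, whereas statistical libraries commonly default to tie-adjusted variants such as τ-b, so results can differ on data with repeated values. The function name is illustrative.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # Positive product: both variables order the pair the same way.
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 means a tie in x or y; it counts toward neither.
    return (concordant - discordant) / len(pairs)

# 6 pairs in total: 5 concordant, 1 discordant.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```
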

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

     DATE        PCOALAUUSDM
0    1990-01-01  38.0
1    1990-02-01  38.0
2    1990-03-01  38.0
3    1990-04-01  38.0
4    1990-05-01  40.5
5    1990-06-01  40.5
6    1990-07-01  40.5
7    1990-08-01  40.5
8    1990-09-01  40.5
9    1990-10-01  40.5

Last rows

     DATE        PCOALAUUSDM
351  2019-04-01  88.764643
352  2019-05-01  89.564286
353  2019-06-01  77.629821
354  2019-07-01  77.845807
355  2019-08-01  69.739286
356  2019-09-01  66.958673
357  2019-10-01  69.194255
358  2019-11-01  69.729082
359  2019-12-01  70.464643
360  2020-01-01  72.106169