Overview

Dataset statistics

Number of variables: 14
Number of observations: 32561
Missing cells: 4262
Missing cells (%): 0.9%
Duplicate rows: 25
Duplicate rows (%): 0.1%
Total size in memory: 18.1 MiB
Average record size in memory: 583.0 B
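
The headline figures can be cross-checked against one another. The sketch below is illustrative only: it copies the counts from this report (the per-column missing counts are the ones listed under Warnings) and assumes that MiB means 1024² bytes. Because the reported memory size is rounded to 18.1 MiB, the average record size is reproduced only approximately.

```python
# Figures copied from the report; nothing here is read from the dataset itself.
n_variables = 14
n_observations = 32561
missing_by_column = {"workclass": 1836, "occupation": 1843, "native-country": 583}
duplicate_rows = 25
total_size_mib = 18.1  # rounded value as printed in the report

# A "cell" is one value of one variable for one observation.
total_cells = n_variables * n_observations
missing_cells = sum(missing_by_column.values())

missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

# Assumption: 1 MiB = 1024**2 bytes.
avg_record_bytes = total_size_mib * 1024**2 / n_observations

print(f"Missing cells: {missing_cells} ({missing_pct:.1f}%)")
print(f"Duplicate rows: {duplicate_rows} ({duplicate_pct:.1f}%)")
print(f"Average record size: about {avg_record_bytes:.0f} B")
```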

Variable types

CAT: 8
NUM: 6

Dataset

Description: Predict whether income exceeds $50K/yr based on census data. Also known as "Census Income" dataset. Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records was extracted using the following conditions: ((AAGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0)). Prediction task is to determine whether a person makes over 50K a year.
Creator: Barry Becker
Author: Ronny Kohavi and Barry Becker
URL: https://archive.ics.uci.edu/ml/datasets/adult
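
The extraction conditions quoted in the description amount to a row filter. A minimal sketch follows, assuming each raw census record is a dictionary keyed by the field names used in the description (AAGE, AGI, AFNLWGT, HRSWK); the sample rows are invented for illustration and are not taken from the census database.

```python
def keep_record(record):
    """Return True if a record satisfies all four extraction conditions."""
    return (
        record["AAGE"] > 16
        and record["AGI"] > 100
        and record["AFNLWGT"] > 1
        and record["HRSWK"] > 0
    )


# Hypothetical raw rows, made up purely to exercise the filter.
raw_records = [
    {"AAGE": 39, "AGI": 40000, "AFNLWGT": 77516, "HRSWK": 40},  # passes
    {"AAGE": 15, "AGI": 40000, "AFNLWGT": 77516, "HRSWK": 40},  # AAGE too low
    {"AAGE": 45, "AGI": 50, "AFNLWGT": 77516, "HRSWK": 40},     # AGI too low
    {"AAGE": 45, "AGI": 40000, "AFNLWGT": 77516, "HRSWK": 0},   # no hours worked
]

clean_records = [r for r in raw_records if keep_record(r)]
print(len(clean_records))
```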

Variable descriptions

age: definition 0
workclass: definition 1
fnlwgt: definition 2
education: definition 3
education-num: definition 4
marital-status: definition 5
occupation: definition 6
relationship: definition 7
race: definition 8
sex: definition 9
capital-gain: definition 10
capital-loss: definition 11
hours-per-week: definition 12
native-country: definition 13

Warnings

Dataset has 25 (0.1%) duplicate rows [Duplicates]
workclass has 1836 (5.6%) missing values [Missing]
occupation has 1843 (5.7%) missing values [Missing]
native-country has 583 (1.8%) missing values [Missing]
capital-gain has 29849 (91.7%) zeros [Zeros]
capital-loss has 31042 (95.3%) zeros [Zeros]

Reproduction

Analysis started: 2020-10-25 20:12:41.067017
Analysis finished: 2020-10-25 20:12:53.928648
Duration: 12.86 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

age
Real number (ℝ≥0)

Distinct: 73
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38.58164676
Minimum: 17
Maximum: 90
Zeros: 0
Zeros (%): 0.0%
Memory size: 254.5 KiB

Quantile statistics

Minimum: 17
5-th percentile: 19
Q1: 28
median: 37
Q3: 48
95-th percentile: 63
Maximum: 90
Range: 73
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 13.64043255
Coefficient of variation (CV): 0.3535471837
Kurtosis: -0.1661274596
Mean: 38.58164676
Median Absolute Deviation (MAD): 10
Skewness: 0.5587433694
Sum: 1256257
Variance: 186.0614002
Monotonicity: Not monotonic
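
Several of these statistics are derived from others, which gives a quick consistency check. The sketch below uses only the numbers printed for age; the variable names are illustrative, and the comparisons are approximate because the report rounds its values.

```python
# Values copied from the age statistics in this report.
n_observations = 32561
total = 1256257            # Sum
minimum, maximum = 17, 90
q1, q3 = 28, 48
std_dev = 13.64043255

mean = total / n_observations        # arithmetic mean = sum / count
value_range = maximum - minimum      # Range
iqr = q3 - q1                        # Interquartile range
variance = std_dev ** 2              # variance is the squared standard deviation
cv = std_dev / mean                  # coefficient of variation

print(f"mean={mean:.8f} range={value_range} iqr={iqr}")
print(f"variance={variance:.4f} cv={cv:.6f}")
```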
Histogram with fixed size bins (bins=50)
Most frequent values

Value              Count   Frequency (%)
36                 898     2.8%
31                 888     2.7%
34                 886     2.7%
23                 877     2.7%
35                 876     2.7%
33                 875     2.7%
28                 867     2.7%
30                 861     2.6%
37                 858     2.6%
25                 841     2.6%
27                 835     2.6%
32                 828     2.5%
38                 827     2.5%
39                 816     2.5%
29                 813     2.5%
41                 808     2.5%
24                 798     2.5%
40                 794     2.4%
26                 785     2.4%
42                 780     2.4%
43                 770     2.4%
22                 765     2.3%
20                 753     2.3%
46                 737     2.3%
45                 734     2.3%
Other values (48)  11991   36.8%

Smallest values

Value  Count   Frequency (%)
17     395     1.2%
18     550     1.7%
19     712     2.2%
20     753     2.3%
21     720     2.2%
22     765     2.3%
23     877     2.7%
24     798     2.5%
25     841     2.6%
26     785     2.4%

Largest values

Value  Count   Frequency (%)
90     43      0.1%
88     3       < 0.1%
87     1       < 0.1%
86     1       < 0.1%
85     3       < 0.1%
84     10      < 0.1%
83     6       < 0.1%
82     12      < 0.1%
81     20      0.1%
80     22      0.1%

workclass
Categorical

MISSING

Distinct: 8
Distinct (%): < 0.1%
Missing: 1836
Missing (%): 5.6%
Memory size: 254.5 KiB

Value             Count   Frequency (%)
Private           22696   69.7%
Self-emp-not-inc  2541    7.8%
Local-gov         2093    6.4%
State-gov         1298    4.0%
Self-emp-inc      1116    3.4%
Federal-gov       960     2.9%
Without-pay       14      < 0.1%
Never-worked      7       < 0.1%
(Missing)         1836    5.6%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 17
Median length: 8
Mean length: 8.920794816
Min length: 3

Overview of Unicode Properties

Unique unicode characters: 28
Unique unicode categories: 4
Unique unicode scripts: 2
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
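
As a rough illustration of how such category counts arise, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module. This is a stand-in and not necessarily what the profiling tool uses internally, and `unicodedata` does not expose scripts or blocks. The sample strings are hypothetical; they carry a leading space, an assumption suggested by the Space Separator count equalling the number of non-missing rows.

```python
import unicodedata
from collections import Counter

# Readable names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# Hypothetical sample values (assumed to include a leading space).
sample = [" Private", " Self-emp-not-inc", " Local-gov"]
counts = category_counts(sample)
for name, count in counts.most_common():
    print(name, count)
```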

Most occurring characters

Value             Count   Frequency (%)
e                 33249   11.4%
(space)           30725   10.6%
a                 28897   9.9%
t                 27861   9.6%
v                 27054   9.3%
i                 26367   9.1%
r                 23670   8.1%
P                 22696   7.8%
-                 14227   4.9%
n                 9870    3.4%
o                 9006    3.1%
l                 6710    2.3%
c                 5750    2.0%
S                 4955    1.7%
g                 4351    1.5%
p                 3671    1.3%
f                 3657    1.3%
m                 3657    1.3%
L                 2093    0.7%
d                 967     0.3%
F                 960     0.3%
W                 14      < 0.1%
h                 14      < 0.1%
u                 14      < 0.1%
y                 14      < 0.1%
Other values (3)  21      < 0.1%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  214793  73.9%
Space Separator   30725   10.6%
Uppercase Letter  30725   10.6%
Dash Punctuation  14227   4.9%

Most frequent Space Separator characters

Value    Count   Frequency (%)
(space)  30725   100.0%

Most frequent Uppercase Letter characters

Value    Count   Frequency (%)
P        22696   73.9%
S        4955    16.1%
L        2093    6.8%
F        960     3.1%
W        14      < 0.1%
N        7       < 0.1%

Most frequent Lowercase Letter characters

Value    Count   Frequency (%)
e        33249   15.5%
a        28897   13.5%
t        27861   13.0%
v        27054   12.6%
i        26367   12.3%
r        23670   11.0%
n        9870    4.6%
o        9006    4.2%
l        6710    3.1%
c        5750    2.7%
g        4351    2.0%
p        3671    1.7%
f        3657    1.7%
m        3657    1.7%
d        967     0.5%
h        14      < 0.1%
u        14      < 0.1%
y        14      < 0.1%
w        7       < 0.1%
k        7       < 0.1%

Most frequent Dash Punctuation characters

Value    Count   Frequency (%)
-        14227   100.0%

Most occurring scripts

Value    Count   Frequency (%)
Latin    245518  84.5%
Common   44952   15.5%

Most frequent Common characters

Value    Count   Frequency (%)
(space)  30725   68.4%
-        14227   31.6%

Most frequent Latin characters

Value    Count   Frequency (%)
e        33249   13.5%
a        28897   11.8%
t        27861   11.3%
v        27054   11.0%
i        26367   10.7%
r        23670   9.6%
P        22696   9.2%
n        9870    4.0%
o        9006    3.7%
l        6710    2.7%
c        5750    2.3%
S        4955    2.0%
g        4351    1.8%
p        3671    1.5%
f        3657    1.5%
m        3657    1.5%
L        2093    0.9%
d        967     0.4%
F        960     0.4%
W        14      < 0.1%
h        14      < 0.1%
u        14      < 0.1%
y        14      < 0.1%
N        7       < 0.1%
w        7       < 0.1%

Most occurring blocks

Value    Count   Frequency (%)
ASCII    290470  100.0%

Most frequent ASCII characters

Identical to the "Most occurring characters" table above, since every character in this variable falls in the ASCII block.

fnlwgt
Real number (ℝ≥0)

Distinct: 21648
Distinct (%): 66.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 189778.3665
Minimum: 12285
Maximum: 1484705
Zeros: 0
Zeros (%): 0.0%
Memory size: 254.5 KiB

Quantile statistics

Minimum: 12285
5-th percentile: 39460
Q1: 117827
median: 178356
Q3: 237051
95-th percentile: 379682
Maximum: 1484705
Range: 1472420
Interquartile range (IQR): 119224

Descriptive statistics

Standard deviation: 105549.9777
Coefficient of variation (CV): 0.5561749721
Kurtosis: 6.218810978
Mean: 189778.3665
Median Absolute Deviation (MAD): 59894
Skewness: 1.446980095
Sum: 6179373392
Variance: 1.114079779e+10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values

Value                 Count   Frequency (%)
164190                13      < 0.1%
203488                13      < 0.1%
123011                13      < 0.1%
113364                12      < 0.1%
121124                12      < 0.1%
126675                12      < 0.1%
148995                12      < 0.1%
123983                11      < 0.1%
190290                11      < 0.1%
126569                11      < 0.1%
155659                11      < 0.1%
102308                11      < 0.1%
120277                11      < 0.1%
241998                11      < 0.1%
111483                11      < 0.1%
120131                11      < 0.1%
188246                11      < 0.1%
117963                10      < 0.1%
174789                10      < 0.1%
112497                10      < 0.1%
193882                10      < 0.1%
125933                10      < 0.1%
216129                10      < 0.1%
99185                 10      < 0.1%
125461                10      < 0.1%
Other values (21623)  32284   99.1%

Smallest values

Value    Count   Frequency (%)
12285    1       < 0.1%
13769    1       < 0.1%
14878    1       < 0.1%
18827    1       < 0.1%
19214    1       < 0.1%
19302    5       < 0.1%
19395    2       < 0.1%
19410    1       < 0.1%
19491    1       < 0.1%
19520    1       < 0.1%

Largest values

Value    Count   Frequency (%)
1484705  1       < 0.1%
1455435  1       < 0.1%
1366120  1       < 0.1%
1268339  1       < 0.1%
1226583  1       < 0.1%
1184622  1       < 0.1%
1161363  1       < 0.1%
1125613  1       < 0.1%
1097453  1       < 0.1%
1085515  1       < 0.1%

education
Categorical

Distinct: 16
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 254.5 KiB

Value         Count   Frequency (%)
HS-grad       10501   32.3%
Some-college  7291    22.4%
Bachelors     5355    16.4%
Masters       1723    5.3%
Assoc-voc     1382    4.2%
11th          1175    3.6%
Assoc-acdm    1067    3.3%
10th          933     2.9%
7th-8th       646     2.0%
Prof-school   576     1.8%
9th           514     1.6%
12th          433     1.3%
Doctorate     413     1.3%
5th-6th       333     1.0%
1st-4th       168     0.5%
Preschool     51      0.2%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 13
Median length: 8
Mean length: 9.433709038
Min length: 4

Overview of Unicode Properties

Unique unicode characters: 32
Unique unicode categories: 5
Unique unicode scripts: 2
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value             Count   Frequency (%)
(space)           32561   10.6%
e                 29415   9.6%
o                 26424   8.6%
-                 21964   7.2%
l                 20564   6.7%
a                 19059   6.2%
r                 18619   6.1%
c                 18584   6.1%
S                 17792   5.8%
g                 17792   5.8%
s                 14494   4.7%
d                 11568   3.8%
h                 11163   3.6%
H                 10501   3.4%
m                 8358    2.7%
t                 7898    2.6%
B                 5355    1.7%
1                 3884    1.3%
A                 2449    0.8%
M                 1723    0.6%
v                 1382    0.4%
0                 933     0.3%
7                 646     0.2%
8                 646     0.2%
P                 627     0.2%
Other values (7)  2770    0.9%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  205896  67.0%
Uppercase Letter  38860   12.7%
Space Separator   32561   10.6%
Dash Punctuation  21964   7.2%
Decimal Number    7890    2.6%

Most frequent Space Separator characters

Value    Count   Frequency (%)
(space)  32561   100.0%

Most frequent Uppercase Letter characters

Value    Count   Frequency (%)
S        17792   45.8%
H        10501   27.0%
B        5355    13.8%
A        2449    6.3%
M        1723    4.4%
P        627     1.6%
D        413     1.1%

Most frequent Lowercase Letter characters

Value    Count   Frequency (%)
e        29415   14.3%
o        26424   12.8%
l        20564   10.0%
a        19059   9.3%
r        18619   9.0%
c        18584   9.0%
g        17792   8.6%
s        14494   7.0%
d        11568   5.6%
h        11163   5.4%
m        8358    4.1%
t        7898    3.8%
v        1382    0.7%
f        576     0.3%

Most frequent Dash Punctuation characters

Value    Count   Frequency (%)
-        21964   100.0%

Most frequent Decimal Number characters

Value    Count   Frequency (%)
1        3884    49.2%
0        933     11.8%
7        646     8.2%
8        646     8.2%
9        514     6.5%
2        433     5.5%
5        333     4.2%
6        333     4.2%
4        168     2.1%

Most occurring scripts

Value    Count   Frequency (%)
Latin    244756  79.7%
Common   62415   20.3%

Most frequent Common characters

Value    Count   Frequency (%)
(space)  32561   52.2%
-        21964   35.2%
1        3884    6.2%
0        933     1.5%
7        646     1.0%
8        646     1.0%
9        514     0.8%
2        433     0.7%
5        333     0.5%
6        333     0.5%
4        168     0.3%

Most frequent Latin characters

Value    Count   Frequency (%)
e        29415   12.0%
o        26424   10.8%
l        20564   8.4%
a        19059   7.8%
r        18619   7.6%
c        18584   7.6%
S        17792   7.3%
g        17792   7.3%
s        14494   5.9%
d        11568   4.7%
h        11163   4.6%
H        10501   4.3%
m        8358    3.4%
t        7898    3.2%
B        5355    2.2%
A        2449    1.0%
M        1723    0.7%
v        1382    0.6%
P        627     0.3%
f        576     0.2%
D        413     0.2%

Most occurring blocks

Value    Count   Frequency (%)
ASCII    307171  100.0%

Most frequent ASCII characters

Identical to the "Most occurring characters" table above, since every character in this variable falls in the ASCII block.

education-num
Real number (ℝ≥0)

Distinct: 16
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.08067934
Minimum: 1
Maximum: 16
Zeros: 0
Zeros (%): 0.0%
Memory size: 254.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 5
Q1: 9
median: 10
Q3: 12
95-th percentile: 14
Maximum: 16
Range: 15
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.572720332
Coefficient of variation (CV): 0.2552129916
Kurtosis: 0.6234440748
Mean: 10.08067934
Median Absolute Deviation (MAD): 1
Skewness: -0.3116758679
Sum: 328237
Variance: 6.618889907
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=16)
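
A fixed-size-bin histogram divides the observed range into equally wide intervals and counts the values in each. The sketch below is a plain-Python approximation under the assumption that the bins span [minimum, maximum] with the maximum folded into the last bin, which is the usual convention; the function name and sample values are hypothetical. With 16 bins over the range 1 to 16, each bin is 15/16 wide, so every integer level lands in its own bin.

```python
def fixed_width_histogram(values, bins):
    """Count values into `bins` equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if width == 0:
            idx = 0  # all values identical: put everything in the first bin
        else:
            # The maximum sits on the right edge, so clamp it into the last bin.
            idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts, width


# Hypothetical sample of education-num style values (integers from 1 to 16).
sample = [1, 9, 9, 10, 13, 16]
counts, width = fixed_width_histogram(sample, bins=16)
print(width)
print(counts)
```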
Most frequent values

Value  Count   Frequency (%)
9      10501   32.3%
10     7291    22.4%
13     5355    16.4%
14     1723    5.3%
11     1382    4.2%
7      1175    3.6%
12     1067    3.3%
6      933     2.9%
4      646     2.0%
15     576     1.8%
5      514     1.6%
8      433     1.3%
16     413     1.3%
3      333     1.0%
2      168     0.5%
1      51      0.2%

Smallest values

Value  Count   Frequency (%)
1      51      0.2%
2      168     0.5%
3      333     1.0%
4      646     2.0%
5      514     1.6%
6      933     2.9%
7      1175    3.6%
8      433     1.3%
9      10501   32.3%
10     7291    22.4%

Largest values

Value  Count   Frequency (%)
16     413     1.3%
15     576     1.8%
14     1723    5.3%
13     5355    16.4%
12     1067    3.3%
11     1382    4.2%
10     7291    22.4%
9      10501   32.3%
8      433     1.3%
7      1175    3.6%

marital-status
Categorical

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 254.5 KiB

Value                  Count   Frequency (%)
Married-civ-spouse     14976   46.0%
Never-married          10683   32.8%
Divorced               4443    13.6%
Separated              1025    3.1%
Widowed                993     3.0%
Married-spouse-absent  418     1.3%
Married-AF-spouse      23      0.1%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 22
Median length: 14
Mean length: 15.41405362
Min length: 8

Overview of Unicode Properties

Unique unicode characters: 25
Unique unicode categories: 4
Unique unicode scripts: 2
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value    Count   Frequency (%)
e        70787   14.1%
r        68351   13.6%
i        46512   9.3%
-        41517   8.3%
d        33554   6.7%
(space)  32561   6.5%
s        31252   6.2%
v        30102   6.0%
a        28568   5.7%
o        20853   4.2%
c        19419   3.9%
p        16442   3.3%
M        15417   3.1%
u        15417   3.1%
N        10683   2.1%
m        10683   2.1%
D        4443    0.9%
t        1443    0.3%
S        1025    0.2%
W        993     0.2%
w        993     0.2%
b        418     0.1%
n        418     0.1%
A        23      < 0.1%
F        23      < 0.1%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  395212  78.7%
Dash Punctuation  41517   8.3%
Uppercase Letter  32607   6.5%
Space Separator   32561   6.5%

Most frequent Space Separator characters

Value    Count   Frequency (%)
(space)  32561   100.0%

Most frequent Uppercase Letter characters

Value    Count   Frequency (%)
M        15417   47.3%
N        10683   32.8%
D        4443    13.6%
S        1025    3.1%
W        993     3.0%
A        23      0.1%
F        23      0.1%

Most frequent Lowercase Letter characters

Value    Count   Frequency (%)
e        70787   17.9%
r        68351   17.3%
i        46512   11.8%
d        33554   8.5%
s        31252   7.9%
v        30102   7.6%
a        28568   7.2%
o        20853   5.3%
c        19419   4.9%
p        16442   4.2%
u        15417   3.9%
m        10683   2.7%
t        1443    0.4%
w        993     0.3%
b        418     0.1%
n        418     0.1%

Most frequent Dash Punctuation characters

Value    Count   Frequency (%)
-        41517   100.0%

Most occurring scripts

Value    Count   Frequency (%)
Latin    427819  85.2%
Common   74078   14.8%

Most frequent Common characters

Value    Count   Frequency (%)
-        41517   56.0%
(space)  32561   44.0%

Most frequent Latin characters

Value    Count   Frequency (%)
e        70787   16.5%
r        68351   16.0%
i        46512   10.9%
d        33554   7.8%
s        31252   7.3%
v        30102   7.0%
a        28568   6.7%
o        20853   4.9%
c        19419   4.5%
p        16442   3.8%
M        15417   3.6%
u        15417   3.6%
N        10683   2.5%
m        10683   2.5%
D        4443    1.0%
t        1443    0.3%
S        1025    0.2%
W        993     0.2%
w        993     0.2%
b        418     0.1%
n        418     0.1%
A        23      < 0.1%
F        23      < 0.1%

Most occurring blocks

Value    Count   Frequency (%)
ASCII    501897  100.0%

Most frequent ASCII characters

Identical to the "Most occurring characters" table above, since every character in this variable falls in the ASCII block.

occupation
Categorical

MISSING

Distinct: 14
Distinct (%): < 0.1%
Missing: 1843
Missing (%): 5.7%
Memory size: 254.5 KiB

Value              Count   Frequency (%)
Prof-specialty     4140    12.7%
Craft-repair       4099    12.6%
Exec-managerial    4066    12.5%
Adm-clerical       3770    11.6%
Sales              3650    11.2%
Other-service      3295    10.1%
Machine-op-inspct  2002    6.1%
Transport-moving   1597    4.9%
Handlers-cleaners  1370    4.2%
Farming-fishing    994     3.1%
Tech-support       928     2.9%
Protective-serv    649     2.0%
Priv-house-serv    149     0.5%
Armed-Forces       9       < 0.1%
(Missing)          1843    5.7%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 18
Median length: 14
Mean length: 13.25849943
Min length: 3

Overview of Unicode Properties

Unique unicode characters: 32
Unique unicode categories: 4
Unique unicode scripts: 2
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value             Count   Frequency (%)
e                 42979   10.0%
a                 41132   9.5%
r                 40333   9.3%
(space)           30718   7.1%
-                 29219   6.8%
i                 28751   6.7%
c                 26001   6.0%
l                 22136   5.1%
s                 20302   4.7%
n                 19678   4.6%
t                 17359   4.0%
p                 15696   3.6%
o                 11071   2.6%
m                 10436   2.4%
f                 9233    2.1%
g                 7651    1.8%
h                 7368    1.7%
v                 6488    1.5%
d                 5149    1.2%
P                 4938    1.1%
y                 4140    1.0%
C                 4099    0.9%
E                 4066    0.9%
x                 4066    0.9%
A                 3779    0.9%
Other values (7)  14922   3.5%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  341046  79.0%
Uppercase Letter  30727   7.1%
Space Separator   30718   7.1%
Dash Punctuation  29219   6.8%

Most frequent Space Separator characters

Value    Count   Frequency (%)
(space)  30718   100.0%

Most frequent Uppercase Letter characters

Value    Count   Frequency (%)
P        4938    16.1%
C        4099    13.3%
E        4066    13.2%
A        3779    12.3%
S        3650    11.9%
O        3295    10.7%
T        2525    8.2%
M        2002    6.5%
H        1370    4.5%
F        1003    3.3%

Most frequent Lowercase Letter characters

Value    Count   Frequency (%)
e        42979   12.6%
a        41132   12.1%
r        40333   11.8%
i        28751   8.4%
c        26001   7.6%
l        22136   6.5%
s        20302   6.0%
n        19678   5.8%
t        17359   5.1%
p        15696   4.6%
o        11071   3.2%
m        10436   3.1%
f        9233    2.7%
g        7651    2.2%
h        7368    2.2%
v        6488    1.9%
d        5149    1.5%
y        4140    1.2%
x        4066    1.2%
u        1077    0.3%

Most frequent Dash Punctuation characters

Value    Count   Frequency (%)
-        29219   100.0%

Most occurring scripts

Value    Count   Frequency (%)
Latin    371773  86.1%
Common   59937   13.9%

Most frequent Common characters

Value    Count   Frequency (%)
(space)  30718   51.3%
-        29219   48.7%

Most frequent Latin characters

Value             Count   Frequency (%)
e                 42979   11.6%
a                 41132   11.1%
r                 40333   10.8%
i                 28751   7.7%
c                 26001   7.0%
l                 22136   6.0%
s                 20302   5.5%
n                 19678   5.3%
t                 17359   4.7%
p                 15696   4.2%
o                 11071   3.0%
m                 10436   2.8%
f                 9233    2.5%
g                 7651    2.1%
h                 7368    2.0%
v                 6488    1.7%
d                 5149    1.4%
P                 4938    1.3%
y                 4140    1.1%
C                 4099    1.1%
E                 4066    1.1%
x                 4066    1.1%
A                 3779    1.0%
S                 3650    1.0%
O                 3295    0.9%
Other values (5)  7977    2.1%

Most occurring blocks

Value    Count   Frequency (%)
ASCII    431710  100.0%

Most frequent ASCII characters

Identical to the "Most occurring characters" table above, since every character in this variable falls in the ASCII block.

relationship
Categorical

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 254.5 KiB

Value           Count   Frequency (%)
Husband         13193   40.5%
Not-in-family   8305    25.5%
Own-child       5068    15.6%
Unmarried       3446    10.6%
Wife            1568    4.8%
Other-relative  981     3.0%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 15
Median length: 10
Mean length: 10.11974448
Min length: 5

Overview of Unicode Properties

Unique unicode characters: 26
Unique unicode categories: 4
Unique unicode scripts: 2
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value    Count   Frequency (%)
(space)  32561   9.9%
n        30012   9.1%
i        27673   8.4%
a        25925   7.9%
-        22659   6.9%
d        21707   6.6%
l        14354   4.4%
H        13193   4.0%
u        13193   4.0%
s        13193   4.0%
b        13193   4.0%
m        11751   3.6%
t        10267   3.1%
f        9873    3.0%
r        8854    2.7%
N        8305    2.5%
o        8305    2.5%
y        8305    2.5%
e        7957    2.4%
O        6049    1.8%
h        6049    1.8%
w        5068    1.5%
c        5068    1.5%
U        3446    1.0%
W        1568    0.5%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  241728  73.4%
Space Separator   32561   9.9%
Uppercase Letter  32561   9.9%
Dash Punctuation  22659   6.9%

Most frequent Space Separator characters

Value    Count   Frequency (%)
(space)  32561   100.0%

Most frequent Uppercase Letter characters

Value    Count   Frequency (%)
H        13193   40.5%
N        8305    25.5%
O        6049    18.6%
U        3446    10.6%
W        1568    4.8%

Most frequent Lowercase Letter characters

Value    Count   Frequency (%)
n        30012   12.4%
i        27673   11.4%
a        25925   10.7%
d        21707   9.0%
l        14354   5.9%
u        13193   5.5%
s        13193   5.5%
b        13193   5.5%
m        11751   4.9%
t        10267   4.2%
f        9873    4.1%
r        8854    3.7%
o        8305    3.4%
y        8305    3.4%
e        7957    3.3%
h        6049    2.5%
w        5068    2.1%
c        5068    2.1%
v        981     0.4%

Most frequent Dash Punctuation characters

Value    Count   Frequency (%)
-        22659   100.0%

Most occurring scripts

Value    Count   Frequency (%)
Latin    274289  83.2%
Common   55220   16.8%

Most frequent Common characters

Value    Count   Frequency (%)
(space)  32561   59.0%
-        22659   41.0%

Most frequent Latin characters

Value    Count   Frequency (%)
n        30012   10.9%
i        27673   10.1%
a        25925   9.5%
d        21707   7.9%
l        14354   5.2%
H        13193   4.8%
u        13193   4.8%
s        13193   4.8%
b        13193   4.8%
m        11751   4.3%
t        10267   3.7%
f        9873    3.6%
r        8854    3.2%
N        8305    3.0%
o        8305    3.0%
y        8305    3.0%
e        7957    2.9%
O        6049    2.2%
h        6049    2.2%
w        5068    1.8%
c        5068    1.8%
U        3446    1.3%
W        1568    0.6%
v        981     0.4%

Most occurring blocks

Value    Count   Frequency (%)
ASCII    329509  100.0%

Most frequent ASCII characters

Identical to the "Most occurring characters" table above, since every character in this variable falls in the ASCII block.

race
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 254.5 KiB

Value               Count   Frequency (%)
White               27816   85.4%
Black               3124    9.6%
Asian-Pac-Islander  1039    3.2%
Amer-Indian-Eskimo  311     1.0%
Other               271     0.8%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 19
Median length: 6
Mean length: 6.53898836
Min length: 6

Overview of Unicode Properties

Unique unicode characters: 23
Unique unicode categories: 4
Unique unicode scripts: 2
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value    Count   Frequency (%)
(space)  32561   15.3%
i        29477   13.8%
e        29437   13.8%
h        28087   13.2%
t        28087   13.2%
W        27816   13.1%
a        6552    3.1%
l        4163    2.0%
c        4163    2.0%
k        3435    1.6%
B        3124    1.5%
n        2700    1.3%
-        2700    1.3%
s        2389    1.1%
r        1621    0.8%
A        1350    0.6%
I        1350    0.6%
d        1350    0.6%
P        1039    0.5%
m        622     0.3%
E        311     0.1%
o        311     0.1%
O        271     0.1%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  142394  66.9%
Uppercase Letter  35261   16.6%
Space Separator   32561   15.3%
Dash Punctuation  2700    1.3%

Most frequent Space Separator characters

Value    Count   Frequency (%)
(space)  32561   100.0%

Most frequent Uppercase Letter characters

Value    Count   Frequency (%)
W        27816   78.9%
B        3124    8.9%
A        1350    3.8%
I        1350    3.8%
P        1039    2.9%
E        311     0.9%
O        271     0.8%

Most frequent Lowercase Letter characters

Value    Count   Frequency (%)
i        29477   20.7%
e        29437   20.7%
h        28087   19.7%
t        28087   19.7%
a        6552    4.6%
l        4163    2.9%
c        4163    2.9%
k        3435    2.4%
n        2700    1.9%
s        2389    1.7%
r        1621    1.1%
d        1350    0.9%
m        622     0.4%
o        311     0.2%

Most frequent Dash Punctuation characters

Value    Count   Frequency (%)
-        2700    100.0%

Most occurring scripts

Value    Count   Frequency (%)
Latin    177655  83.4%
Common   35261   16.6%

Most frequent Common characters

Value    Count   Frequency (%)
(space)  32561   92.3%
-        2700    7.7%

Most frequent Latin characters

Value    Count   Frequency (%)
i        29477   16.6%
e        29437   16.6%
h        28087   15.8%
t        28087   15.8%
W        27816   15.7%
a        6552    3.7%
l        4163    2.3%
c        4163    2.3%
k        3435    1.9%
B        3124    1.8%
n        2700    1.5%
s        2389    1.3%
r        1621    0.9%
A        1350    0.8%
I        1350    0.8%
d        1350    0.8%
P        1039    0.6%
m        622     0.4%
E        311     0.2%
o        311     0.2%
O        271     0.2%

Most occurring blocks

Value    Count   Frequency (%)
ASCII    212916  100.0%

Most frequent ASCII characters

Identical to the "Most occurring characters" table above, since every character in this variable falls in the ASCII block.

sex
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 254.5 KiB

Value   Count   Frequency (%)
Male    21790   66.9%
Female  10771   33.1%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 7
Median length: 5
Mean length: 5.661589018
Min length: 5

Overview of Unicode Properties

Unique unicode characters: 7
Unique unicode categories: 3
Unique unicode scripts: 2
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value    Count   Frequency (%)
e        43332   23.5%
(space)  32561   17.7%
a        32561   17.7%
l        32561   17.7%
M        21790   11.8%
F        10771   5.8%
m        10771   5.8%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  119225  64.7%
Space Separator   32561   17.7%
Uppercase Letter  32561   17.7%

Most frequent Space Separator characters

Value    Count   Frequency (%)
(space)  32561   100.0%

Most frequent Uppercase Letter characters

Value    Count   Frequency (%)
M        21790   66.9%
F        10771   33.1%

Most frequent Lowercase Letter characters

Value    Count   Frequency (%)
e        43332   36.3%
a        32561   27.3%
l        32561   27.3%
m        10771   9.0%

Most occurring scripts

Value    Count   Frequency (%)
Latin    151786  82.3%
Common   32561   17.7%

Most frequent Common characters

Value    Count   Frequency (%)
(space)  32561   100.0%

Most frequent Latin characters

Value    Count   Frequency (%)
e        43332   28.5%
a        32561   21.5%
l        32561   21.5%
M        21790   14.4%
F        10771   7.1%
m        10771   7.1%

Most occurring blocks

Value    Count   Frequency (%)
ASCII    184347  100.0%

Most frequent ASCII characters

Identical to the "Most occurring characters" table above, since every character in this variable falls in the ASCII block.

capital-gain
Real number (ℝ≥0)

ZEROS

Distinct: 119
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1077.648844
Minimum: 0
Maximum: 99999
Zeros: 29849
Zeros (%): 91.7%
Memory size: 254.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 5013
Maximum: 99999
Range: 99999
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 7385.292085
Coefficient of variation (CV): 6.853152702
Kurtosis: 154.7994379
Mean: 1077.648844
Median Absolute Deviation (MAD): 0
Skewness: 11.95384769
Sum: 35089324
Variance: 54542539.18
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values

Value              Count   Frequency (%)
0                  29849   91.7%
15024              347     1.1%
7688               284     0.9%
7298               246     0.8%
99999              159     0.5%
5178               97      0.3%
3103               97      0.3%
4386               70      0.2%
5013               69      0.2%
8614               55      0.2%
3325               53      0.2%
2174               48      0.1%
10520              43      0.1%
4064               42      0.1%
4650               41      0.1%
14084              41      0.1%
20051              37      0.1%
3137               37      0.1%
27828              34      0.1%
594                34      0.1%
3908               32      0.1%
2829               31      0.1%
13550              27      0.1%
6849               27      0.1%
14344              26      0.1%
Other values (94)  735     2.3%

Smallest values

Value  Count   Frequency (%)
0      29849   91.7%
114    6       < 0.1%
401    2       < 0.1%
594    34      0.1%
914    8       < 0.1%
991    5       < 0.1%
1055   25      0.1%
1086   4       < 0.1%
1111   1       < 0.1%
1151   8       < 0.1%

Largest values

Value  Count   Frequency (%)
99999  159     0.5%
41310  2       < 0.1%
34095  5       < 0.1%
27828  34      0.1%
25236  11      < 0.1%
25124  4       < 0.1%
22040  1       < 0.1%
20051  37      0.1%
18481  2       < 0.1%
15831  6       < 0.1%

capital-loss
Real number (ℝ≥0)

ZEROS

Distinct: 92
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 87.30382973
Minimum: 0
Maximum: 4356
Zeros: 31042
Zeros (%): 95.3%
Memory size: 254.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 4356
Range: 4356
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 402.9602186
Coefficient of variation (CV): 4.615607584
Kurtosis: 20.37680171
Mean: 87.30382973
Median Absolute Deviation (MAD): 0
Skewness: 4.594629122
Sum: 2842700
Variance: 162376.9378
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values

Value              Count   Frequency (%)
0                  31042   95.3%
1902               202     0.6%
1977               168     0.5%
1887               159     0.5%
1848               51      0.2%
1485               51      0.2%
2415               49      0.2%
1602               47      0.1%
1740               42      0.1%
1590               40      0.1%
1876               39      0.1%
1672               34      0.1%
1564               25      0.1%
2258               25      0.1%
1669               24      0.1%
1741               24      0.1%
2001               24      0.1%
1980               23      0.1%
1719               22      0.1%
2002               21      0.1%
2051               21      0.1%
1408               21      0.1%
1579               20      0.1%
2377               20      0.1%
1721               18      0.1%
Other values (67)  349     1.1%

Smallest values

Value  Count   Frequency (%)
0      31042   95.3%
155    1       < 0.1%
213    4       < 0.1%
323    3       < 0.1%
419    3       < 0.1%
625    12      < 0.1%
653    3       < 0.1%
810    2       < 0.1%
880    6       < 0.1%
974    2       < 0.1%

Largest values

Value  Count   Frequency (%)
4356   3       < 0.1%
3900   2       < 0.1%
3770   2       < 0.1%
3683   2       < 0.1%
3004   2       < 0.1%
2824   10      < 0.1%
2754   2       < 0.1%
2603   5       < 0.1%
2559   12      < 0.1%
2547   4       < 0.1%

hours-per-week
Real number (ℝ≥0)

Distinct: 94
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.43745585
Minimum: 1
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Memory size: 254.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 18
Q1: 40
median: 40
Q3: 45
95-th percentile: 60
Maximum: 99
Range: 98
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 12.34742868
Coefficient of variation (CV): 0.3053463286
Kurtosis: 2.916686796
Mean: 40.43745585
Median Absolute Deviation (MAD): 3
Skewness: 0.2276425368
Sum: 1316684
Variance: 152.4589951
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values

Value              Count   Frequency (%)
40                 15217   46.7%
50                 2819    8.7%
45                 1824    5.6%
60                 1475    4.5%
35                 1297    4.0%
20                 1224    3.8%
30                 1149    3.5%
55                 694     2.1%
25                 674     2.1%
48                 517     1.6%
38                 476     1.5%
15                 404     1.2%
70                 291     0.9%
10                 278     0.9%
32                 266     0.8%
24                 252     0.8%
65                 244     0.7%
36                 220     0.7%
42                 219     0.7%
44                 212     0.7%
16                 205     0.6%
12                 173     0.5%
43                 151     0.5%
37                 149     0.5%
8                  145     0.4%
Other values (69)  1986    6.1%

Smallest values

Value  Count   Frequency (%)
1      20      0.1%
2      32      0.1%
3      39      0.1%
4      54      0.2%
5      60      0.2%
6      64      0.2%
7      26      0.1%
8      145     0.4%
9      18      0.1%
10     278     0.9%

Largest values

Value  Count   Frequency (%)
99     85      0.3%
98     11      < 0.1%
97     2       < 0.1%
96     5       < 0.1%
95     2       < 0.1%
94     1       < 0.1%
92     1       < 0.1%
91     3       < 0.1%
90     29      0.1%
89     2       < 0.1%

native-country
Categorical

MISSING

Distinct: 41
Distinct (%): 0.1%
Missing: 583
Missing (%): 1.8%
Memory size: 254.5 KiB

Value               Count   Frequency (%)
United-States       29170   89.6%
Mexico              643     2.0%
Philippines         198     0.6%
Germany             137     0.4%
Canada              121     0.4%
Puerto-Rico         114     0.4%
El-Salvador         106     0.3%
India               100     0.3%
Cuba                95      0.3%
England             90      0.3%
Jamaica             81      0.2%
South               80      0.2%
China               75      0.2%
Italy               73      0.2%
Dominican-Republic  70      0.2%
Vietnam             67      0.2%
Guatemala           64      0.2%
Japan               62      0.2%
Poland              60      0.2%
Columbia            59      0.2%
Taiwan              51      0.2%
Haiti               44      0.1%
Iran                43      0.1%
Portugal            37      0.1%
Nicaragua           34      0.1%
Other values (16)   304     0.9%
(Missing)           583     1.8%