Overview

Dataset statistics

Number of variables: 3
Number of observations: 1000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 23.6 KiB
Average record size in memory: 24.1 B
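
As a rough illustration of how the counts above can be derived, here is a minimal sketch in plain Python. It is not the profiler's own code: the function name `dataset_statistics`, the list-of-dicts table layout, and the rule that `None` means "missing" are all assumptions made for this example.

```python
from collections import Counter


def dataset_statistics(rows, columns):
    """Summarise a table given as a list of dicts, one dict per observation."""
    n_obs = len(rows)
    n_cells = n_obs * len(columns)
    # Assumption: a cell is "missing" when its value is None.
    missing = sum(1 for row in rows for col in columns if row.get(col) is None)
    # Count every repeat of an already-seen row as one duplicate.
    seen = Counter(tuple(row.get(col) for col in columns) for row in rows)
    duplicates = sum(count - 1 for count in seen.values())
    return {
        "variables": len(columns),
        "observations": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100 * duplicates / n_obs if n_obs else 0.0,
    }


# Three rows copied from the sample listed at the end of the report.
columns = ["russian", "english", "part of speech"]
rows = [
    {"russian": "он", "english": "he", "part of speech": "pronoun"},
    {"russian": "я", "english": "I", "part of speech": "pronoun"},
    {"russian": "быть", "english": "to be", "part of speech": "verb"},
]
stats = dataset_statistics(rows, columns)
print(stats)
```

On the full dataset the same bookkeeping would yield 3 variables, 1000 observations, and zero missing cells and duplicate rows, as reported above.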

Variable types

Categorical: 3

Warnings

russian has a high cardinality: 995 distinct values
english has a high cardinality: 961 distinct values
russian is uniformly distributed
english is uniformly distributed

Reproduction

Analysis started: 2020-10-25 20:10:31.664687
Analysis finished: 2020-10-25 20:10:32.527872
Duration: 0.86 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml
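
A report like this one can be regenerated along the following lines. This is a hedged sketch rather than the exact command that produced the report: the helper name `build_profile`, the report title, and the assumption that the data lives in a CSV file are illustrative, and the custom `config.yaml` mentioned above is not applied here.

```python
def build_profile(csv_path, html_path):
    """Profile a CSV file and write the HTML report (hypothetical helper)."""
    # Imports stay inside the function so the sketch can be loaded even
    # where pandas or pandas-profiling is not installed.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    # The title is an illustrative choice, not taken from the report.
    report = ProfileReport(df, title="Vocabulary profile")
    report.to_file(html_path)
    return report
```

Calling `build_profile("words.csv", "report.html")`, with both file names being placeholders, would write the HTML report next to the script.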

Variables

russian
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 995
Distinct (%): 99.5%
Missing: 0
Missing (%): 0.0%
Memory size: 7.8 KiB
Most common values (bar chart):
знать               2
мало                2
много               2
пора                2
что                 2
Other values (990)  990
Value               Count  Frequency (%)
знать               2      0.2%
мало                2      0.2%
много               2      0.2%
пора                2      0.2%
что                 2      0.2%
тысяча              1      0.1%
звать               1      0.1%
возраст             1      0.1%
солдат              1      0.1%
оказываться         1      0.1%
Other values (985)  985    98.5%
[Figure: Frequencies of value counts]

Unique

Unique: 990
Unique (%): 99.0%
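
The report distinguishes "distinct" (the number of different values) from "unique" (values that occur exactly once). The sketch below shows that distinction on a synthetic column built to have the same shape as `russian`: 990 values seen once plus 5 values seen twice. The function name and the placeholder strings are assumptions for illustration only.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, unique): different values vs. values seen exactly once."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for count in counts.values() if count == 1)
    return distinct, unique


# Synthetic stand-in for the column: 990 singletons and 5 values repeated twice.
column = [f"single{i}" for i in range(990)] + [f"pair{i}" for i in range(5)] * 2
distinct, unique = distinct_and_unique(column)
print(len(column), distinct, unique)  # 1000 995 990
```

This matches the figures above: 990 values occurring once plus 5 values occurring twice account for all 1000 observations.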
[Figure: Histogram of lengths of the category]

Length

Max length: 19
Median length: 6
Mean length: 6.117
Min length: 1
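
The length statistics are summaries of the per-value string lengths. A minimal sketch, assuming lengths are counted in Unicode code points (which is what Python's `len` does for `str`); the function name `length_summary` is hypothetical, and the input is the first ten `russian` values from the sample at the end of the report, not the full column.

```python
import statistics


def length_summary(values):
    """Max, median, mean and min of the string lengths (in code points)."""
    lengths = [len(value) for value in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.mean(lengths),
        "min": min(lengths),
    }


first_rows = ["и", "в", "не", "он", "на", "я", "что", "тот", "быть", "с"]
summary = length_summary(first_rows)
print(summary)
```

For the ten sample words the lengths are 1, 1, 2, 2, 2, 1, 3, 3, 4, 1, so the maximum is 4, the minimum 1, and both the mean and the median are 2.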

Overview of Unicode Properties

Unique unicode characters: 43
Unique unicode categories: 7
Unique unicode scripts: 3
Unique unicode blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
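
To make the category counts concrete, the following sketch tallies Unicode general categories with Python's standard `unicodedata` module. It is an illustration, not the profiler's implementation: the function name and the toy input are assumptions. The standard library exposes only the two-letter category codes (for example `Ll` for Lowercase Letter, `Lu` for Uppercase Letter, `Nd` for Decimal Number, `Zs` for Space Separator, `Ps`/`Pe`/`Po` for Open/Close/Other Punctuation); script and block lookups are not available there and would need another data source.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for char in value:
            counts[unicodedata.category(char)] += 1
    return counts


# Toy input mixing Cyrillic, Latin, a space, a digit and punctuation.
toy_values = ["что", "Р", "a b", "(#6)"]
counts = category_counts(toy_values)
print(counts)
```

Here the five lowercase letters land in `Ll`, the Cyrillic capital in `Lu`, and the space, digit and three punctuation marks in one category each.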

Most occurring characters

Value              Count  Frequency (%)
о                  645    10.5%
т                  526    8.6%
а                  484    7.9%
е                  395    6.5%
с                  364    6.0%
и                  345    5.6%
н                  339    5.5%
ь                  316    5.2%
р                  306    5.0%
в                  263    4.3%
Other values (33)  2134   34.9%

Most occurring categories

Value              Count  Frequency (%)
Lowercase Letter   6106   99.8%
Uppercase Letter   3      < 0.1%
Decimal Number     3      < 0.1%
Space Separator    2      < 0.1%
Open Punctuation   1      < 0.1%
Other Punctuation  1      < 0.1%
Close Punctuation  1      < 0.1%

Most frequent Lowercase Letter characters

Value              Count  Frequency (%)
о                  645    10.6%
т                  526    8.6%
а                  484    7.9%
е                  395    6.5%
с                  364    6.0%
и                  345    5.7%
н                  339    5.6%
ь                  316    5.2%
р                  306    5.0%
в                  263    4.3%
Other values (24)  2123   34.8%

Most frequent Uppercase Letter characters

Value  Count  Frequency (%)
М      1      33.3%
Р      1      33.3%
S      1      33.3%

Most frequent Space Separator characters

Value    Count  Frequency (%)
(space)  2      100.0%

Most frequent Open Punctuation characters

Value  Count  Frequency (%)
(      1      100.0%

Most frequent Other Punctuation characters

Value  Count  Frequency (%)
#      1      100.0%

Most frequent Decimal Number characters

Value  Count  Frequency (%)
6      2      66.7%
3      1      33.3%

Most frequent Close Punctuation characters

Value  Count  Frequency (%)
)      1      100.0%

Most occurring scripts

Value     Count  Frequency (%)
Cyrillic  6106   99.8%
Common    8      0.1%
Latin     3      < 0.1%

Most frequent Cyrillic characters

Value              Count  Frequency (%)
о                  645    10.6%
т                  526    8.6%
а                  484    7.9%
е                  395    6.5%
с                  364    6.0%
и                  345    5.7%
н                  339    5.6%
ь                  316    5.2%
р                  306    5.0%
в                  263    4.3%
Other values (25)  2123   34.8%

Most frequent Common characters

Value    Count  Frequency (%)
(space)  2      25.0%
6        2      25.0%
(        1      12.5%
#        1      12.5%
3        1      12.5%
)        1      12.5%

Most frequent Latin characters

Value  Count  Frequency (%)
e      2      66.7%
S      1      33.3%

Most occurring blocks

Value     Count  Frequency (%)
Cyrillic  6106   99.8%
ASCII     11     0.2%

Most frequent Cyrillic characters

Value              Count  Frequency (%)
о                  645    10.6%
т                  526    8.6%
а                  484    7.9%
е                  395    6.5%
с                  364    6.0%
и                  345    5.7%
н                  339    5.6%
ь                  316    5.2%
р                  306    5.0%
в                  263    4.3%
Other values (25)  2123   34.8%

Most frequent ASCII characters

Value    Count  Frequency (%)
(space)  2      18.2%
e        2      18.2%
6        2      18.2%
(        1      9.1%
S        1      9.1%
#        1      9.1%
3        1      9.1%
)        1      9.1%

english
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 961
Distinct (%): 96.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.8 KiB
Most common values (bar chart):
to ask                  3
to fit, fall; have to   3
also, as well, too      2
German                  2
here                    2
Other values (956)      988
Value                   Count  Frequency (%)
to ask                  3      0.3%
to fit, fall; have to   3      0.3%
also, as well, too      2      0.2%
German                  2      0.2%
here                    2      0.2%
order                   2      0.2%
to see                  2      0.2%
to write                2      0.2%
again                   2      0.2%
to hear                 2      0.2%
Other values (951)      978    97.8%
[Figure: Frequencies of value counts]

Unique

Unique: 924
Unique (%): 92.4%
[Figure: Histogram of lengths of the category]

Length

Max length: 130
Median length: 11
Mean length: 13.169
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 75
Unique unicode categories: 9
Unique unicode scripts: 4
Unique unicode blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value              Count  Frequency (%)
e                  1370   10.4%
(space)            1334   10.1%
t                  1048   8.0%
o                  1021   7.8%
a                  788    6.0%
r                  730    5.5%
,                  673    5.1%
n                  656    5.0%
i                  616    4.7%
s                  611    4.6%
Other values (65)  4322   32.8%

Most occurring categories

Value              Count  Frequency (%)
Lowercase Letter   10588  80.4%
Space Separator    1334   10.1%
Other Punctuation  834    6.3%
Decimal Number     191    1.5%
Open Punctuation   75     0.6%
Close Punctuation  75     0.6%
Uppercase Letter   59     0.4%
Dash Punctuation   8      0.1%
Nonspacing Mark    5      < 0.1%

Most frequent Lowercase Letter characters

Value              Count  Frequency (%)
e                  1370   12.9%
t                  1048   9.9%
o                  1021   9.6%
a                  788    7.4%
r                  730    6.9%
n                  656    6.2%
i                  616    5.8%
s                  611    5.8%
l                  534    5.0%
h                  350    3.3%
Other values (34)  2864   27.0%

Most frequent Other Punctuation characters

Value                     Count  Frequency (%)
,                         673    80.7%
;                         67     8.0%
#                         65     7.8%
'                         11     1.3%
"                         6      0.7%
(character not rendered)  4      0.5%
.                         3      0.4%
?                         2      0.2%
!                         2      0.2%
:                         1      0.1%

Most frequent Space Separator characters

Value    Count  Frequency (%)
(space)  1334   100.0%

Most frequent Uppercase Letter characters

Value  Count  Frequency (%)
S      49     83.1%
M      3      5.1%
R      3      5.1%
G      2      3.4%
I      1      1.7%
A      1      1.7%

Most frequent Open Punctuation characters

Value  Count  Frequency (%)
(      75     100.0%

Most frequent Decimal Number characters

Value  Count  Frequency (%)
3      28     14.7%
1      24     12.6%
9      24     12.6%
4      20     10.5%
7      20     10.5%
6      20     10.5%
5      18     9.4%
8      15     7.9%
2      13     6.8%
0      9      4.7%

Most frequent Close Punctuation characters

Value  Count  Frequency (%)
)      75     100.0%

Most frequent Dash Punctuation characters

Value  Count  Frequency (%)
-      8      100.0%

Most frequent Nonspacing Mark characters

Value  Count  Frequency (%)
 ́      5      100.0%

Most occurring scripts

Value      Count  Frequency (%)
Latin      10604  80.5%
Common     2517   19.1%
Cyrillic   43     0.3%
Inherited  5      < 0.1%

Most frequent Latin characters

Value              Count  Frequency (%)
e                  1370   12.9%
t                  1048   9.9%
o                  1021   9.6%
a                  788    7.4%
r                  730    6.9%
n                  656    6.2%
i                  616    5.8%
s                  611    5.8%
l                  534    5.0%
h                  350    3.3%
Other values (22)  2880   27.2%

Most frequent Common characters

Value              Count  Frequency (%)
(space)            1334   53.0%
,                  673    26.7%
(                  75     3.0%
)                  75     3.0%
;                  67     2.7%
#                  65     2.6%
3                  28     1.1%
1                  24     1.0%
9                  24     1.0%
4                  20     0.8%
Other values (14)  132    5.2%

Most frequent Cyrillic characters

Value             Count  Frequency (%)
о                 6      14.0%
и                 5      11.6%
к                 5      11.6%
н                 3      7.0%
в                 3      7.0%
е                 3      7.0%
м                 3      7.0%
а                 3      7.0%
ч                 2      4.7%
р                 2      4.7%
Other values (8)  8      18.6%

Most frequent Inherited characters

Value  Count  Frequency (%)
 ́      5      100.0%

Most occurring blocks

Value         Count  Frequency (%)
ASCII         13117  99.6%
Cyrillic      43     0.3%
Diacriticals  5      < 0.1%
Punctuation   4      < 0.1%

Most frequent ASCII characters

Value              Count  Frequency (%)
e                  1370   10.4%
(space)            1334   10.2%
t                  1048   8.0%
o                  1021   7.8%
a                  788    6.0%
r                  730    5.6%
,                  673    5.1%
n                  656    5.0%
i                  616    4.7%
s                  611    4.7%
Other values (45)  4270   32.6%

Most frequent Punctuation characters

Value                     Count  Frequency (%)
(character not rendered)  4      100.0%

Most frequent Cyrillic characters

Value             Count  Frequency (%)
о                 6      14.0%
и                 5      11.6%
к                 5      11.6%
н                 3      7.0%
в                 3      7.0%
е                 3      7.0%
м                 3      7.0%
а                 3      7.0%
ч                 2      4.7%
р                 2      4.7%
Other values (8)  8      18.6%

Most frequent Diacriticals characters

Value  Count  Frequency (%)
 ́      5      100.0%

part of speech
Categorical

Distinct: 37
Distinct (%): 3.7%
Missing: 0
Missing (%): 0.0%
Memory size: 7.8 KiB
Most common values (bar chart):
noun               374
verb               232
adjective          127
adverb             112
preposition        37
Other values (32)  118
Value              Count  Frequency (%)
noun               374    37.4%
verb               232    23.2%
adjective          127    12.7%
adverb             112    11.2%
preposition        37     3.7%
pronoun            36     3.6%
conjunction        12     1.2%
misc               12     1.2%
cardinal number    11     1.1%
particle           7      0.7%
Other values (27)  40     4.0%
[Figure: Frequencies of value counts]

Unique

Unique: 20
Unique (%): 2.0%
[Figure: Histogram of lengths of the category]

Length

Max length: 26
Median length: 4
Mean length: 5.885
Min length: 3

Overview of Unicode Properties

Unique unicode characters: 24
Unique unicode categories: 5
Unique unicode scripts: 3
Unique unicode blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value              Count  Frequency (%)
n                  984    16.7%
e                  698    11.9%
o                  588    10.0%
r                  497    8.4%
v                  481    8.2%
u                  456    7.7%
b                  373    6.3%
a                  309    5.3%
i                  283    4.8%
d                  268    4.6%
Other values (14)  948    16.1%

Most occurring categories

Value              Count  Frequency (%)
Lowercase Letter   5806   98.7%
Space Separator    51     0.9%
Other Punctuation  26     0.4%
Open Punctuation   1      < 0.1%
Close Punctuation  1      < 0.1%

Most frequent Lowercase Letter characters

Value              Count  Frequency (%)
n                  984    16.9%
e                  698    12.0%
o                  588    10.1%
r                  497    8.6%
v                  481    8.3%
u                  456    7.9%
b                  373    6.4%
a                  309    5.3%
i                  283    4.9%
d                  268    4.6%
Other values (10)  869    15.0%

Most frequent Other Punctuation characters

Value  Count  Frequency (%)
,      26     100.0%

Most frequent Space Separator characters

Value    Count  Frequency (%)
(space)  51     100.0%

Most frequent Open Punctuation characters

Value  Count  Frequency (%)
(      1      100.0%

Most frequent Close Punctuation characters

Value  Count  Frequency (%)
)      1      100.0%

Most occurring scripts

Value     Count  Frequency (%)
Latin     5805   98.6%
Common    79     1.3%
Cyrillic  1      < 0.1%

Most frequent Latin characters

Value             Count  Frequency (%)
n                 984    17.0%
e                 698    12.0%
o                 588    10.1%
r                 497    8.6%
v                 481    8.3%
u                 456    7.9%
b                 373    6.4%
a                 309    5.3%
i                 283    4.9%
d                 268    4.6%
Other values (9)  868    15.0%

Most frequent Cyrillic characters

Value  Count  Frequency (%)
с      1      100.0%

Most frequent Common characters

Value    Count  Frequency (%)
(space)  51     64.6%
,        26     32.9%
(        1      1.3%
)        1      1.3%

Most occurring blocks

Value     Count  Frequency (%)
ASCII     5884   > 99.9%
Cyrillic  1      < 0.1%

Most frequent ASCII characters

Value              Count  Frequency (%)
n                  984    16.7%
e                  698    11.9%
o                  588    10.0%
r                  497    8.4%
v                  481    8.2%
u                  456    7.7%
b                  373    6.3%
a                  309    5.3%
i                  283    4.8%
d                  268    4.6%
Other values (13)  947    16.1%

Most frequent Cyrillic characters

Value  Count  Frequency (%)
с      1      100.0%

Missing values

[Figures: missing-values plots]

Sample

First rows

     russian  english              part of speech
0    и        and, though          conjunction
1    в        in, at               preposition
2    не       not                  particle
3    он       he                   pronoun
4    на       on, it, at, to       preposition
5    я        I                    pronoun
6    что      what, that, why      сonjunction, pronoun
7    тот      that                 adjective, pronoun
8    быть     to be                verb
9    с        with, and, from, of  preposition

Last rows

     russian       english                                     part of speech
990  художник      painter, artist                             noun
991  знак          sign                                        noun
992  завод         factory                                     noun
993  кулак         fist                                        noun
994  использовать  to use, utilize, make use of                verb
995  стакан        glass                                       noun
996  пахнуть       to smell                                    verb
997  отсюда        from here                                   adverb
998  рот           mouth                                       noun
999  пора          it's time;at times, now and then(See #279)  misc