Overview

Dataset statistics

Number of variables: 5
Number of observations: 189
Missing cells: 188
Missing cells (%): 19.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.5 KiB
Average record size in memory: 40.7 B
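
The missing-cell percentage can be checked from the counts above. This is a minimal sketch that assumes the denominator is observations times variables; the report itself does not spell out the formula.

```python
# Figures copied from the dataset statistics above.
n_variables = 5
n_observations = 189
missing_cells = 188

# Assumption: every (row, column) pair counts as one cell.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"{missing_cells} of {total_cells} cells missing ({missing_pct:.1f}%)")
```

Under that assumption the result rounds to the 19.9% shown in the report.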

Variable types

URL: 1
Categorical: 3
DateTime: 1

Warnings

notes has constant value "Reportedly blocked" (Constant)
date_added is highly correlated with source (High correlation)
source is highly correlated with date_added (High correlation)
notes has 188 (99.5%) missing values (Missing)
url has unique values (Unique)

Reproduction

Analysis started: 2021-05-11 22:15:20.872353
Analysis finished: 2021-05-11 22:15:22.590598
Duration: 1.72 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

url
URL

UNIQUE

Distinct: 189
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 KiB
Value                                                    Count
http://www.oaklandinstitute.org/                         1
http://www.goolgule.com/                                 1
http://www.unhcr.org/cgi-bin/texis/vtx/country?iso=eth   1
http://ethsat.com/                                       1
http://www.fao.org/                                      1
Other values (184)                                       184
Value                                                    Count  Frequency (%)
http://www.oaklandinstitute.org/                         1      0.5%
http://www.goolgule.com/                                 1      0.5%
http://www.unhcr.org/cgi-bin/texis/vtx/country?iso=eth   1      0.5%
http://ethsat.com/                                       1      0.5%
http://www.fao.org/                                      1      0.5%
http://www.nazret.com/                                   1      0.5%
http://redeemethiopia.blogspot.com/                      1      0.5%
http://abrahadesta.wordpress.com/                        1      0.5%
http://www.andenet.com/                                  1      0.5%
http://www.ginbot7.com/                                  1      0.5%
Other values (179)                                       179    94.7%
Value (scheme)  Count  Frequency (%)
http            173    91.5%
https           16     8.5%
Value (domain)        Count  Frequency (%)
nazret.com            8      4.2%
www.hrw.org           3      1.6%
www.cafpde.org        3      1.6%
www.aeup.org          2      1.1%
portal.unesco.org     2      1.1%
www.unesco-iicba.org  2      1.1%
www.mereja.com        2      1.1%
telecom.net.et        2      1.1%
citizenlab.org        2      1.1%
web.worldbank.org     2      1.1%
Other values (134)    161    85.2%
Value (path)                          Count  Frequency (%)
/                                     127    67.2%
/blog/index.php                       7      3.7%
/index.htm                            2      1.1%
/index.html                           2      1.1%
/amnews.html                          1      0.5%
/rubrique.php3                        1      0.5%
/progynist.html                       1      0.5%
/en/articles/welcome-zone-9-ethiopia  1      0.5%
/world/ethiopia/index.htm             1      0.5%
/warka4/                              1      0.5%
Other values (45)                     45     23.8%
Value (query)                                 Count  Frequency (%)
(empty)                                       174    92.1%
blog=9                                        1      0.5%
blog=13                                       1      0.5%
blog=14                                       1      0.5%
feed=5&how=paged&what=all                     1      0.5%
blog=15                                       1      0.5%
id_rubrique=20                                1      0.5%
blog=12                                       1      0.5%
country=231&region=2&section=9&sub_section=2  1      0.5%
dl=0                                          1      0.5%
Other values (6)                              6      3.2%
Value (fragment)  Count  Frequency (%)
(empty)           188    99.5%
ethiopia          1      0.5%
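
The frequency tables above appear to break each URL into scheme, domain, path, query and fragment. The following is a minimal sketch of that kind of decomposition using the standard library's `urlsplit`, not necessarily how the profiling tool does it. The URLs are taken from this report; the choice of four is only for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit

# A few URLs that appear elsewhere in this report.
urls = [
    "http://www.unhcr.org/cgi-bin/texis/vtx/country?iso=eth",
    "http://www.fao.org/",
    "https://www.hrw.org/",
    "https://www.dropbox.com/s/n65b3d67f82asn2/Leaked%20National%20Entrance%20Exam_English.pdf?dl=0",
]

# Split each URL into its five components.
parts = [urlsplit(u) for u in urls]

# Count how often each scheme occurs, as in the scheme table.
scheme_counts = Counter(p.scheme for p in parts)

for p in parts:
    # Empty query and fragment strings are shown as "(empty)".
    print(p.scheme, p.netloc, p.path, p.query or "(empty)", p.fragment or "(empty)")
print(scheme_counts)
```

The same `Counter` approach applies to `netloc`, `path`, `query` and `fragment` to reproduce the other tables over the full dataset.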

category_code
Categorical

Distinct: 15
Distinct (%): 7.9%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 KiB
Value              Count
NEWS               65
HUMR               45
POLR               32
ECON               13
ANON               8
Other values (10)  26

Length

Max length: 5
Median length: 4
Mean length: 4
Min length: 3

Characters and Unicode

Total characters: 756
Distinct characters: 21
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4
Unique (%): 2.1%

Sample

1st row: CULTR
2nd row: NEWS
3rd row: MISC
4th row: MISC
5th row: NEWS

Common Values

Value              Count  Frequency (%)
NEWS               65     34.4%
HUMR               45     23.8%
POLR               32     16.9%
ECON               13     6.9%
ANON               8      4.2%
CULTR              7      3.7%
XED                5      2.6%
MISC               3      1.6%
HOST               3      1.6%
MILX               2      1.1%
Other values (5)   6      3.2%

Length

[Figure: Histogram of lengths of the category]
Value              Count  Frequency (%)
news               65     34.4%
humr               45     23.8%
polr               32     16.9%
econ               13     6.9%
anon               8      4.2%
cultr              7      3.7%
xed                5      2.6%
misc               3      1.6%
host               3      1.6%
milx               2      1.1%
Other values (5)   6      3.2%

Most occurring characters

Value              Count  Frequency (%)
N                  95     12.6%
R                  86     11.4%
E                  85     11.2%
S                  72     9.5%
W                  65     8.6%
O                  56     7.4%
U                  54     7.1%
H                  51     6.7%
M                  50     6.6%
L                  42     5.6%
Other values (11)  100    13.2%

Most occurring categories

Value             Count  Frequency (%)
Uppercase Letter  756    100.0%

Most frequent character per category

Uppercase Letter
Value              Count  Frequency (%)
N                  95     12.6%
R                  86     11.4%
E                  85     11.2%
S                  72     9.5%
W                  65     8.6%
O                  56     7.4%
U                  54     7.1%
H                  51     6.7%
M                  50     6.6%
L                  42     5.6%
Other values (11)  100    13.2%

Most occurring scripts

Value  Count  Frequency (%)
Latin  756    100.0%

Most frequent character per script

Latin
Value              Count  Frequency (%)
N                  95     12.6%
R                  86     11.4%
E                  85     11.2%
S                  72     9.5%
W                  65     8.6%
O                  56     7.4%
U                  54     7.1%
H                  51     6.7%
M                  50     6.6%
L                  42     5.6%
Other values (11)  100    13.2%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  756    100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
N                  95     12.6%
R                  86     11.4%
E                  85     11.2%
S                  72     9.5%
W                  65     8.6%
O                  56     7.4%
U                  54     7.1%
H                  51     6.7%
M                  50     6.6%
L                  42     5.6%
Other values (11)  100    13.2%

date_added
Date

HIGH CORRELATION

Distinct: 6
Distinct (%): 3.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 KiB
Minimum: 2014-04-15 00:00:00
Maximum: 2018-04-10 00:00:00
[Figure: Histogram with fixed size bins (bins=6)]
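
The report does not say how the six bins are computed. As a rough sketch, assuming equal-width bins spanning the minimum and maximum dates listed above, the edges and bin assignment could look like this. The helper names `fixed_bin_edges` and `bin_index` are hypothetical.

```python
from datetime import datetime

def fixed_bin_edges(start, end, bins):
    """Return bins + 1 equally spaced edges from start to end."""
    width = (end - start) / bins
    return [start + i * width for i in range(bins + 1)]

def bin_index(value, start, end, bins):
    """Return the zero-based bin for value; the maximum falls in the last bin."""
    width = (end - start) / bins
    return min(int((value - start) / width), bins - 1)

# Minimum and maximum of date_added, as listed in the report.
start = datetime(2014, 4, 15)
end = datetime(2018, 4, 10)

edges = fixed_bin_edges(start, end, 6)
for left, right in zip(edges, edges[1:]):
    print(left.date(), "to", right.date())
```

With only 6 distinct dates in the column, most of these bins would hold a single date or none.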

source
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 KiB
Value            Count
citizenlab       178
OONI             4
CIPIT            4
BBC              2
defenddefenders  1

Length

Max length: 15
Median length: 10
Mean length: 9.71957672
Min length: 3

Characters and Unicode

Total characters: 1837
Distinct characters: 20
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 0.5%

Sample

1st row: citizenlab
2nd row: citizenlab
3rd row: citizenlab
4th row: citizenlab
5th row: citizenlab

Common Values

Value            Count  Frequency (%)
citizenlab       178    94.2%
OONI             4      2.1%
CIPIT            4      2.1%
BBC              2      1.1%
defenddefenders  1      0.5%

Length

[Figure: Histogram of lengths of the category]

Pie chart

Value            Count  Frequency (%)
citizenlab       178    94.2%
ooni             4      2.1%
cipit            4      2.1%
bbc              2      1.1%
defenddefenders  1      0.5%

Most occurring characters

Value              Count  Frequency (%)
i                  356    19.4%
e                  183    10.0%
n                  180    9.8%
c                  178    9.7%
t                  178    9.7%
z                  178    9.7%
l                  178    9.7%
a                  178    9.7%
b                  178    9.7%
I                  12     0.7%
Other values (10)  38     2.1%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  1795   97.7%
Uppercase Letter  42     2.3%

Most frequent character per category

Lowercase Letter
Value             Count  Frequency (%)
i                 356    19.8%
e                 183    10.2%
n                 180    10.0%
c                 178    9.9%
t                 178    9.9%
z                 178    9.9%
l                 178    9.9%
a                 178    9.9%
b                 178    9.9%
d                 4      0.2%
Other values (3)  4      0.2%
Uppercase Letter
Value  Count  Frequency (%)
I      12     28.6%
O      8      19.0%
C      6      14.3%
B      4      9.5%
P      4      9.5%
T      4      9.5%
N      4      9.5%

Most occurring scripts

Value  Count  Frequency (%)
Latin  1837   100.0%

Most frequent character per script

Latin
Value              Count  Frequency (%)
i                  356    19.4%
e                  183    10.0%
n                  180    9.8%
c                  178    9.7%
t                  178    9.7%
z                  178    9.7%
l                  178    9.7%
a                  178    9.7%
b                  178    9.7%
I                  12     0.7%
Other values (10)  38     2.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1837   100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
i                  356    19.4%
e                  183    10.0%
n                  180    9.8%
c                  178    9.7%
t                  178    9.7%
z                  178    9.7%
l                  178    9.7%
a                  178    9.7%
b                  178    9.7%
I                  12     0.7%
Other values (10)  38     2.1%

notes
Categorical

CONSTANT
MISSING
REJECTED

Distinct: 1
Distinct (%): 100.0%
Missing: 188
Missing (%): 99.5%
Memory size: 1.6 KiB
Value               Count
Reportedly blocked  1

Length

Max length: 18
Median length: 18
Mean length: 18
Min length: 18

Characters and Unicode

Total characters: 18
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
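
As a small illustration of those character properties, the sketch below uses Python's standard `unicodedata` module to count general categories for the single non-missing notes value. It is not the profiling tool's own code. The standard module exposes general categories (`Lu`, `Ll`, `Zs`) but not scripts or blocks, so only categories are shown.

```python
import unicodedata
from collections import Counter

# The only non-missing value in the notes column.
value = "Reportedly blocked"

# Map each character to its Unicode general category and count them.
category_counts = Counter(unicodedata.category(ch) for ch in value)

for category, count in category_counts.most_common():
    print(category, count)
```

Here `Ll` is Lowercase Letter, `Lu` is Uppercase Letter and `Zs` is Space Separator, the three categories reported for this variable.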

Unique

Unique: 1
Unique (%): 100.0%

Sample

1st row: Reportedly blocked

Common Values

Value               Count  Frequency (%)
Reportedly blocked  1      0.5%
(Missing)           188    99.5%

Length

[Figure: Histogram of lengths of the category]

Pie chart

Value       Count  Frequency (%)
blocked     1      50.0%
reportedly  1      50.0%

Most occurring characters

Value             Count  Frequency (%)
e                 3      16.7%
o                 2      11.1%
d                 2      11.1%
l                 2      11.1%
R                 1      5.6%
p                 1      5.6%
r                 1      5.6%
t                 1      5.6%
y                 1      5.6%
(space)           1      5.6%
Other values (3)  3      16.7%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  16     88.9%
Uppercase Letter  1      5.6%
Space Separator   1      5.6%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
e      3      18.8%
o      2      12.5%
d      2      12.5%
l      2      12.5%
p      1      6.2%
r      1      6.2%
t      1      6.2%
y      1      6.2%
b      1      6.2%
c      1      6.2%
Uppercase Letter
Value  Count  Frequency (%)
R      1      100.0%
Space Separator
Value    Count  Frequency (%)
(space)  1      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Latin   17     94.4%
Common  1      5.6%

Most frequent character per script

Latin
Value             Count  Frequency (%)
e                 3      17.6%
o                 2      11.8%
d                 2      11.8%
l                 2      11.8%
R                 1      5.9%
p                 1      5.9%
r                 1      5.9%
t                 1      5.9%
y                 1      5.9%
b                 1      5.9%
Other values (2)  2      11.8%
Common
Value    Count  Frequency (%)
(space)  1      100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  18     100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
e                 3      16.7%
o                 2      11.1%
d                 2      11.1%
l                 2      11.1%
R                 1      5.6%
p                 1      5.6%
r                 1      5.6%
t                 1      5.6%
y                 1      5.6%
(space)           1      5.6%
Other values (3)  3      16.7%

Correlations


Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. This report uses the bias-corrected measure proposed by Bergsma in 2013.
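
For readers who want to see the correction spelled out, here is a minimal pure-Python sketch of the commonly cited bias-corrected form of Cramér's V. It is an illustration under that assumption, not a copy of the profiling tool's implementation, and the function name `cramers_v_corrected` is hypothetical.

```python
import math
from collections import Counter

def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equal-length sequences of labels."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")

    joint = Counter(zip(x, y))   # observed cell counts
    row_totals = Counter(x)
    col_totals = Counter(y)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared over every cell, including empty ones.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected

    # Bias correction of phi squared and of the table dimensions.
    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)

    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return float("nan")  # undefined when a variable has one category
    return math.sqrt(phi2_corr / denom)

# Toy inputs: identical labels, then labels that are independent by construction.
perfect = cramers_v_corrected(["a", "b"] * 50, ["a", "b"] * 50)
independent = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["c", "d", "c", "d"] * 25)
print(perfect, independent)
```

The `max(0.0, ...)` clamp is what keeps small samples with weak association from reporting a spuriously positive value.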

Missing values

[Figure: Count] A simple visualization of nullity by column.
[Figure: Matrix] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
[Figure: Dendrogram] The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
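
The per-column nullity count behind the first of these plots can be approximated in a few lines. This is a sketch only, using three rows copied from the sample in this report and treating NaN or None as missing; the helper `is_missing` is hypothetical.

```python
import math

columns = ["url", "category_code", "date_added", "source", "notes"]

# Three rows from the report's sample; NaN marks a missing value.
rows = [
    ("http://abrahadesta.wordpress.com/", "CULTR", "2014-04-15", "citizenlab", float("nan")),
    ("http://aljazeera.net/", "NEWS", "2014-04-15", "citizenlab", float("nan")),
    ("http://am.wikipedia.org/", "MISC", "2014-04-15", "citizenlab", float("nan")),
]

def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))

# Non-missing count per column, which is what a nullity bar chart plots.
non_null = {
    name: sum(not is_missing(row[i]) for row in rows)
    for i, name in enumerate(columns)
}
print(non_null)
```

On the full dataset the same count would give 189 for every column except notes, which has a single non-missing value.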

Sample

First rows

     url                                                                          category_code  date_added  source      notes
0    http://abrahadesta.wordpress.com/                                            CULTR          2014-04-15  citizenlab  NaN
1    http://aljazeera.net/                                                        NEWS           2014-04-15  citizenlab  NaN
2    http://am.wikipedia.org/                                                     MISC           2014-04-15  citizenlab  NaN
3    http://am.wikipedia.org/wiki/%E1%8B%8B%E1%8A%93%E1%8B%8D_%E1%8C%88%E1%8C%BD  MISC           2014-04-15  citizenlab  NaN
4    http://amharic.voanews.com/                                                  NEWS           2014-04-15  citizenlab  NaN
5    http://ancientgebts.org/                                                     HUMR           2014-04-15  citizenlab  NaN
6    http://carpediemethiopia.blogspot.com/                                       POLR           2014-04-15  citizenlab  NaN
7    http://citizenlab.org/                                                       NEWS           2014-04-15  citizenlab  NaN
8    http://cpj.org/                                                              NEWS           2014-04-15  citizenlab  NaN
9    http://egoportal.blogspot.com/                                               POLR           2014-04-15  citizenlab  NaN

Last rows

     url                                                                                             category_code  date_added  source      notes
179  https://www.citizenlab.org/                                                                     NEWS           2014-04-15  citizenlab  NaN
180  https://www.dropbox.com/s/n65b3d67f82asn2/Leaked%20National%20Entrance%20Exam_English.pdf?dl=0  FILE           2016-05-30  OONI        NaN
181  https://www.facebook.com/Jawarmd                                                                NEWS           2016-05-30  OONI        NaN
182  https://www.facebook.com/pages/Addis-Neger/49967100821                                          NEWS           2014-04-15  citizenlab  NaN
183  https://www.hrw.org/                                                                            HUMR           2014-04-15  citizenlab  NaN
184  https://www.mereja.com/                                                                         NEWS           2016-09-09  CIPIT       NaN
185  https://www.oromiamedia.org/                                                                    NEWS           2016-05-30  OONI        NaN
186  https://www.privacyinternational.org/                                                           HUMR           2014-04-15  citizenlab  NaN
187  https://www.torproject.org/                                                                     NEWS           2014-04-15  citizenlab  NaN
188  https://www.twitter.com/                                                                        HOST           2014-04-15  citizenlab  NaN