Advanced Usage

A set of options is available to customize the generated report.

General settings

| Parameter | Type | Default | Description |
|---|---|---|---|
| title | string | Pandas Profiling Report | Title for the report, shown in the header and title bar. |
| pool_size | integer | 0 | Number of workers in the thread pool. When set to zero, the number of available CPUs is used. |
| progress_bar | boolean | True | If True, pandas-profiling will display a progress bar. |
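The pool_size rule (zero means "use every available CPU") can be illustrated with a short sketch. This is not pandas-profiling's own code; resolve_pool_size is a hypothetical helper name, and falling back to a single worker when the CPU count is unknown is an assumption.

```python
import os


def resolve_pool_size(pool_size: int) -> int:
    """Return the number of workers to use (hypothetical helper)."""
    if pool_size < 0:
        raise ValueError("pool_size must be zero or a positive integer")
    if pool_size == 0:
        # os.cpu_count() may return None when the count cannot be determined,
        # so fall back to a single worker (an assumption of this sketch).
        return os.cpu_count() or 1
    return pool_size
```

With pool_size=1, as in the example in this section, the helper returns 1 regardless of the machine.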

The configuration can be changed in the following ways:

Configuration example
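# df is assumed to be an existing pandas DataFrame
import pandas_profiling  # adds the profile_report() method to DataFrame

# Change the config when creating the report
profile = df.profile_report(title="Pandas Profiling Report", pool_size=1)

# Change the config after the report object has been created
profile.set_variable("html.minify_html", False)

profile.to_file("output.html")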
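Option names such as html.minify_html are dotted paths into a nested configuration. The sketch below shows one way such a path could be resolved against a plain dictionary. It only illustrates the naming scheme: set_nested and the config dictionary are hypothetical, and the library's internal storage may differ.

```python
from typing import Any, Dict


def set_nested(config: Dict[str, Any], dotted_key: str, value: Any) -> None:
    """Set config["a"]["b"]["c"] = value for the dotted key "a.b.c"."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        # Create intermediate levels on demand.
        node = node.setdefault(part, {})
    node[leaf] = value


# Hypothetical nested configuration that mirrors the option names in this page.
config = {"title": "Pandas Profiling Report", "html": {"minify_html": True}}
set_nested(config, "html.minify_html", False)
set_nested(config, "html.style.full_width", True)
```

After these calls, the html entry holds minify_html set to False and a new style level containing full_width.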

Variable summary settings

| Parameter | Type | Default | Description |
|---|---|---|---|
| sort | None, asc or desc | None | Sort the variables asc(ending) or desc(ending); None keeps the original order. |
| variables.descriptions | dict | {} | Descriptions to display alongside the descriptive statistics of each variable ({'var_name': 'Description'}). |
| vars.num.quantiles | list[float] | [0.05,0.25,0.5,0.75,0.95] | The quantiles to calculate. Note that 0.25, 0.5 and 0.75 are required for other metrics (median and IQR). |
| vars.num.skewness_threshold | integer | 20 | Warn if the skewness is above this threshold. |
| vars.num.low_categorical_threshold | integer | 5 | If the number of distinct values is smaller than this number, the series is considered to be categorical. Set to 0 to disable. |
| vars.num.chi_squared_threshold | float | 0.999 | Set to zero to disable the chi-squared calculation. |
| vars.cat.length | boolean | True | Check the string length and aggregate values (min, max, mean, median). |
| vars.cat.unicode | boolean | False | Check the distribution of characters and their Unicode properties. Often informative, but may be computationally expensive. |
| vars.cat.cardinality_threshold | integer | 50 | Warn if the number of distinct values is above this threshold. |
| vars.cat.n_obs | integer | 5 | Display this number of observations. |
| vars.cat.chi_squared_threshold | float | 0.999 | Same as vars.num.chi_squared_threshold. |
| vars.bool.n_obs | integer | 3 | Same as vars.cat.n_obs. |
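The note on vars.num.quantiles says that 0.25, 0.5 and 0.75 are needed for the median and the IQR. A plausible reading is that those metrics are derived from the computed quantiles. The sketch below illustrates that idea with a linear-interpolation quantile; quantile and describe are hypothetical helpers, not the library's functions, and the interpolation method is an assumption.

```python
import math
from typing import Dict, Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Quantile with linear interpolation between the two closest ranks."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def describe(values: Sequence[float],
             quantiles: Sequence[float] = (0.05, 0.25, 0.5, 0.75, 0.95)) -> Dict[str, object]:
    computed = {q: quantile(values, q) for q in quantiles}
    # Median and IQR are looked up from the computed quantiles, so a list
    # without 0.25, 0.5 or 0.75 fails here with a KeyError.
    return {
        "quantiles": computed,
        "median": computed[0.5],
        "iqr": computed[0.75] - computed[0.25],
    }


summary = describe([1, 2, 3, 4, 5])
```

In this sketch, passing quantiles=(0.1, 0.9) raises a KeyError, which mirrors the requirement stated in the table.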

Configuration example
profile = df.profile_report(
    sort='asc',  # 'asc' or 'desc'; None keeps the original order
    vars={
        'num': {'low_categorical_threshold': 0},
        'cat': {
            'length': True,
            'unicode': False,
            'n_obs': 5,
        },
    },
)

profile.set_variable('variables.descriptions',
    {
      'files': 'Files in the filesystem',
      'datec': 'Creation date',
      'datem': 'Modification date',
    }
)

profile.to_file("report.html")
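The two distinct-value thresholds from the table, vars.num.low_categorical_threshold and vars.cat.cardinality_threshold, can be read as simple comparisons on the number of distinct values. The sketch below follows the wording of the descriptions; both function names are hypothetical, and missing values are not treated specially here.

```python
from typing import Hashable, Iterable


def treat_as_categorical(values: Iterable[Hashable],
                         low_categorical_threshold: int = 5) -> bool:
    """True when a numeric series has fewer distinct values than the threshold."""
    if low_categorical_threshold == 0:
        # A threshold of 0 disables the check, as in the example above.
        return False
    return len(set(values)) < low_categorical_threshold


def has_high_cardinality(values: Iterable[Hashable],
                         cardinality_threshold: int = 50) -> bool:
    """True when the number of distinct values is above the threshold."""
    return len(set(values)) > cardinality_threshold
```

For example, a numeric column that only contains the values 1, 2 and 3 would be treated as categorical with the default threshold of 5, but not with low_categorical_threshold=0.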

Missing data overview plots

| Parameter | Type | Default | Description |
|---|---|---|---|
| missing_diagrams.bar | boolean | True | Display a bar chart with counts of missing values for each column. |
| missing_diagrams.matrix | boolean | True | Display a matrix of missing values. Similar to the bar chart, but may also give an overview of the co-occurrence of missing values in rows. |
| missing_diagrams.heatmap | boolean | True | Display a heatmap of missing values that measures nullity correlation (i.e. how strongly the presence or absence of one variable affects the presence of another). |
| missing_diagrams.dendrogram | boolean | True | Display a dendrogram. Provides insight into the co-occurrence of missing values (i.e. columns that are both filled or both empty). |

Configuration example: disable heatmap and dendrogram for large datasets
profile = df.profile_report(
      missing_diagrams={
          'heatmap': False,
          'dendrogram': False,
      }
)
profile.to_file("report.html")

The missing data diagrams are generated by the missingno package.
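Nullity correlation, the quantity shown in the heatmap, can be understood as a correlation between the "is missing" indicators of two columns. The sketch below computes a Pearson correlation on those indicators using only the standard library. It is a conceptual illustration, not the code used by missingno or pandas-profiling, and it uses None as the missing-value marker for simplicity.

```python
import math
from typing import Optional, Sequence


def nullity_correlation(col_a: Sequence[object],
                        col_b: Sequence[object]) -> Optional[float]:
    """Pearson correlation between the missingness indicators of two columns.

    Returns None when the correlation is undefined, i.e. when a column is
    empty, never missing, or always missing.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same length")
    a = [1.0 if value is None else 0.0 for value in col_a]
    b = [1.0 if value is None else 0.0 for value in col_b]
    n = len(a)
    if n == 0:
        return None
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return covariance / math.sqrt(var_a * var_b)


# Two columns whose gaps never overlap, and one that mirrors the first.
first = [1, None, 3, None]
second = [None, 2, None, 4]
mirror = [10, None, 30, None]
```

A value near 1 means the two columns tend to be missing together; a value near -1 means one is present where the other is missing.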

Correlations

| Parameter | Type | Default | Description |
|---|---|---|---|
| correlations.pearson.calculate | boolean | True | Whether to calculate this coefficient |
| correlations.pearson.warn_high_correlations | boolean | True | Warn for correlations higher than the threshold |
| correlations.pearson.threshold | float | 0.9 | Warning threshold |
| correlations.spearman.calculate | boolean | True | Whether to calculate this coefficient |
| correlations.spearman.warn_high_correlations | boolean | False | Warn for correlations higher than the threshold |
| correlations.spearman.threshold | float | 0.9 | Warning threshold |
| correlations.kendall.calculate | boolean | True | Whether to calculate this coefficient |
| correlations.kendall.warn_high_correlations | boolean | False | Warn for correlations higher than the threshold |
| correlations.kendall.threshold | float | 0.9 | Warning threshold |
| correlations.phi_k.calculate | boolean | True | Whether to calculate this coefficient |
| correlations.phi_k.warn_high_correlations | boolean | False | Warn for correlations higher than the threshold |
| correlations.phi_k.threshold | float | 0.9 | Warning threshold |
| correlations.cramers.calculate | boolean | True | Whether to calculate this coefficient |
| correlations.cramers.warn_high_correlations | boolean | True | Warn for correlations higher than the threshold |
| correlations.cramers.threshold | float | 0.9 | Warning threshold |
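The interplay of warn_high_correlations and threshold can be sketched as a scan over a correlation matrix that reports every pair above the threshold. The function name, the dictionary layout, and the use of the absolute value (so that strong negative correlations are also reported) are assumptions of this sketch, not documented behaviour.

```python
from typing import Dict, List, Tuple


def high_correlation_pairs(matrix: Dict[str, Dict[str, float]],
                           threshold: float = 0.9) -> List[Tuple[str, str, float]]:
    """List each variable pair whose absolute correlation exceeds the threshold."""
    names = sorted(matrix)
    flagged = []
    for index, first in enumerate(names):
        # Only look at pairs after `first` to skip the diagonal and duplicates.
        for second in names[index + 1:]:
            value = matrix[first][second]
            if abs(value) > threshold:
                flagged.append((first, second, value))
    return flagged


# Hypothetical Pearson matrix for three variables.
pearson = {
    "a": {"a": 1.0, "b": 0.95, "c": -0.2},
    "b": {"a": 0.95, "b": 1.0, "c": 0.1},
    "c": {"a": -0.2, "b": 0.1, "c": 1.0},
}
flagged = high_correlation_pairs(pearson, threshold=0.9)
```

With the default threshold of 0.9, only the pair (a, b) is reported for this matrix.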

Disable all correlations:

profile = df.profile_report(
    title="Report without correlations",
    correlations={
        "pearson": {"calculate": False},
        "spearman": {"calculate": False},
        "kendall": {"calculate": False},
        "phi_k": {"calculate": False},
        "cramers": {"calculate": False},
    },
)

# Or use the shorthand that is available for correlations
profile = df.profile_report(
    title="Report without correlations",
    correlations=None,
)
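The two forms above are presented as equivalent, so the shorthand can be thought of as expanding into the explicit dictionary. The sketch below makes that expansion concrete; expand_correlations is a hypothetical helper, and the library may implement the shorthand differently.

```python
from typing import Dict, Optional

# The five coefficients listed in the correlations table.
CORRELATION_NAMES = ("pearson", "spearman", "kendall", "phi_k", "cramers")


def expand_correlations(correlations: Optional[Dict[str, dict]]) -> Dict[str, dict]:
    """Expand the correlations=None shorthand into the explicit form."""
    if correlations is None:
        return {name: {"calculate": False} for name in CORRELATION_NAMES}
    return correlations


expanded = expand_correlations(None)
```

Any dictionary passed in is returned unchanged, so only the None case is affected.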

Interactions

| Parameter | Type | Default | Description |
|---|---|---|---|
| interactions.continuous | boolean | True | Generate a 2D scatter plot (or hexagonal binned plot) for all continuous variable pairs. |
| interactions.targets | list | [] | When a list of variable names is given, only interactions between these and all other variables are given. |
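The effect of interactions.targets on the number of plots can be sketched by enumerating variable pairs. In this illustration interaction_pairs is a hypothetical helper; treating pairs as unordered when no targets are given and skipping a variable paired with itself are assumptions.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def interaction_pairs(variables: Sequence[str],
                      targets: Sequence[str] = ()) -> List[Tuple[str, str]]:
    """Enumerate the variable pairs for which an interaction plot is drawn."""
    if not targets:
        # No targets: every pair of continuous variables.
        return list(combinations(variables, 2))
    # Targets given: each target against every other variable.
    return [(target, other)
            for target in targets
            for other in variables
            if other != target]


all_pairs = interaction_pairs(["a", "b", "c", "d"])
target_pairs = interaction_pairs(["a", "b", "c", "d"], targets=["b"])
```

For four variables this reduces six pairs to three, which is why restricting targets can help on wide datasets.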

The HTML Report

| Parameter | Type | Default | Description |
|---|---|---|---|
| html.minify_html | boolean | True | If True, the output HTML is minified using the htmlmin package. |
| html.use_local_assets | boolean | True | If True, all assets (stylesheets, scripts, images) are stored locally. If False, a CDN is used for some stylesheets and scripts. |
| html.inline | boolean | True | If True, all assets are contained in the report. If False, a web export is created, where all assets are stored in the ‘[REPORT_NAME]_assets/’ directory. |
| html.navbar_show | boolean | True | Whether to include a navigation bar in the report. |
| html.style.theme | string | None | Select a ‘bootswatch’ theme. Available options: ‘flatly’ (dark) and ‘united’ (orange). |
| html.style.logo | string | | A base64 encoded logo, to display in the navigation bar. |
| html.style.primary_color | string | #337ab7 | The primary color to use in the report. |
| html.style.full_width | boolean | False | By default, the width of the report is fixed. If set to True, the full width of the screen is used. |
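The html.style.logo option takes a base64 encoded image. The sketch below shows how raw image bytes can be turned into such a string with the standard library. encode_logo is a hypothetical helper; whether the option expects the bare base64 string or a full data URI is not stated here, so the return value may need a prefix.

```python
import base64


def encode_logo(image_bytes: bytes) -> str:
    """Return the base64 text representation of the raw image bytes."""
    return base64.b64encode(image_bytes).decode("ascii")


# Placeholder bytes; in practice these would come from reading an image file,
# for example with pathlib.Path("logo.png").read_bytes().
encoded = encode_logo(b"logo")
```

The resulting string can then be passed as the value of html.style.logo.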

Using a custom configuration file

To configure pandas-profiling through a custom file, you can start from one of the sample configuration files below and then change the configuration to your liking.

from pandas_profiling import ProfileReport

profile = ProfileReport(df, config_file="your_config.yml")
profile.to_file("report.html")

Sample configuration files

A great way to get an overview of the possible configuration is to look through sample configuration files. The repository contains the following files:

Configuration shorthands

It’s possible to disable certain groups of features through configuration shorthands.

# Disable samples, correlations, missing diagrams, duplicates and interactions at once
r = ProfileReport(df, samples=None, correlations=None, missing_diagrams=None, duplicates=None, interactions=None)

# Or use the .set_variable method
r = ProfileReport(df)
r.set_variable("samples", None)
r.set_variable("duplicates", None)
r.set_variable("correlations", None)
r.set_variable("missing_diagrams", None)
r.set_variable("interactions", None)