Announcement archive

Version v2.7.0 released

Performance

Several performance regressions were pointed out to me recently when comparing 1.4.1 to 2.6.0. In response, we benchmarked the code and found several minor features that introduced disproportionate computational cost. Version 2.7.0 optimizes these, giving significant performance improvements! Moreover, the default configuration has been tweaked towards the needs of the average user.

Phased builds and lazy loading

A report is now built in phases, which enables exciting new features such as caching, re-rendering only parts of a report, and computing the report lazily. Moreover, the progress bar now provides more information on the current building phase and step.
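
To illustrate the idea of phased, lazy building with caching, here is a minimal sketch. It is not the pandas-profiling implementation: the class LazyReport, its attributes and the two phase names are hypothetical and chosen only for this example. Each phase runs the first time its result is requested and is reused afterwards.

```python
class LazyReport:
    """Hypothetical stand-in for a report that is built in phases."""

    def __init__(self, data):
        self.data = data
        self._cache = {}  # phase name -> cached result
        self.calls = []   # records which phases actually ran, in order

    def _phase(self, name, compute):
        # Run a phase only once; later requests reuse the cached result.
        if name not in self._cache:
            self.calls.append(name)
            self._cache[name] = compute()
        return self._cache[name]

    @property
    def summary(self):
        # Phase 1: compute statistics from the raw data.
        return self._phase(
            "summary",
            lambda: {"n": len(self.data), "mean": sum(self.data) / len(self.data)},
        )

    @property
    def html(self):
        # Phase 2: render the statistics; triggers phase 1 if needed.
        stats = self.summary
        return self._phase(
            "html",
            lambda: f"<p>n={stats['n']}, mean={stats['mean']:.1f}</p>",
        )


# Nothing is computed when the report object is created.
report = LazyReport([1, 2, 3, 4])
```

Accessing report.html runs the summary phase and then the rendering phase; accessing it a second time runs neither, because both results are cached.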

Documentation

This version introduces more elaborate documentation powered by Sphinx. The previously used pdoc3 was adequate initially, but it lacks functionality and extensibility. Several recurring topics are now documented: for instance, the configuration parameters are described, and there are pages on big datasets, sensitive data, integrations and resources.

Support pandas-profiling

The development of pandas-profiling relies completely on contributions. If you find value in the package, we welcome you to support the project through GitHub Sponsors! It’s extra exciting that GitHub matches your contribution for the first year.

Find more information here:

New in v2.6.0

Dependency policy

The current dependency policy is suboptimal. Pinning the dependencies is great for reproducibility (high guarantee to work), but on the downside it requires frequent maintenance and introduces compatibility issues with other packages. Therefore, we are moving away from pinning dependencies and instead specifying minimum versions.
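
The difference between the two policies can be sketched in a few lines. In pip terms, a pin is written with == and a minimum with >=. The helper functions and the version numbers below are illustrative assumptions, not taken from the package's actual requirements, and the parser only handles plain numeric versions.

```python
def parse_version(text):
    """Turn '1.0.3' into (1, 0, 3); handles plain numeric versions only."""
    return tuple(int(part) for part in text.split("."))


def satisfies_pin(installed, pinned):
    # Pinned policy: only the exact version is accepted.
    return parse_version(installed) == parse_version(pinned)


def satisfies_minimum(installed, minimum):
    # Minimum-version policy: the given version or anything newer is accepted.
    return parse_version(installed) >= parse_version(minimum)


# Illustrative version numbers only.
pin_ok = satisfies_pin("1.0.3", "0.25.3")
minimum_ok = satisfies_minimum("1.0.3", "0.25.3")
```

With a pin, another package that needs a newer release of the same dependency causes a conflict; with a minimum version, both requirements can be met by the same installation.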

Pandas v1

Early releases of pandas v1 demonstrated many regressions that broke functionality (as acknowledged by the authors here). At this point, pandas is more stable and we notice high demand for compatibility, so we are moving on to support pandas’ latest versions. To ensure compatibility with both major versions, we have extended the test matrix to test against both pandas 0.x.y and 1.x.y.

Python 3.6+ features

Python 3.6 introduces ordered dicts and f-strings, which we now rely on. This means that from pandas-profiling 2.6 onwards, you need at least Python 3.6. If for some reason you cannot update, you can use pandas-profiling 2.5.0, but you unfortunately won’t benefit from updates or maintenance.
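
As a small illustration of the two features, the snippet below relies on both. The column names and types are made up for the example. Note that in CPython 3.6 insertion order for plain dicts is an implementation detail; it became a language guarantee in Python 3.7.

```python
# Plain dicts keep insertion order, so the output order matches the
# order in which the keys were added.
columns = {}
columns["age"] = "numeric"
columns["name"] = "categorical"
columns["signup"] = "date"

# f-strings embed expressions directly in string literals.
lines = [f"{name}: {kind}" for name, kind in columns.items()]
summary = ", ".join(lines)
```

On older interpreters the f-string is a syntax error, which is why the minimum Python version had to be raised.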

Extended continuous integration

Starting from this release, we use GitHub Actions and Travis CI combined to increase maintainability. Travis CI handles the testing, while GitHub Actions automates part of the development process by running black and building the docs.

Support pandas-profiling

With your help, we got approved for GitHub Sponsors! It’s extra exciting that GitHub matches your contribution for the first year. Therefore, we welcome you to support the project through GitHub!

Find more information here:

April 14, 2020 💘