v0.3.0
Bugfixes 🐛
- eda: fix long name in missing heatmap (f6cc399)
- connector: fix bug in url_path_params (c95a7ff)
- eda: fix NA and int viz issue in plot_diff (ef36d5a)
- eda: fix missing for SmallCard and DateTime type (201e487)
- eda: fix create_report for dask csv (93e8567)
- clean: fix mixed-up date formats in one column (e295695)
- eda: fixed uncaught dtype and long var names (24f0295)
- eda: fix correlation of num columns with small distinct values (9959b78)
- eda: fix issue with dataframe of one column (910bb71)
- eda: add geopoint in type count (94cbca2)
- eda: fixed uncaught dtype exceptions (d301eb7)
- eda: fix str transform with small distinct as categorical (65e7f90)
- eda: fix na values display issue (1ce5775)
- eda: keep na when preprocess df (17d8219)
- clean: fix returned df_clean in clean_dupl (180e6ad)
- clean: escape apostrophes in code exported by clean_dupl (e6ea7e9)
- eda: fixed endless loop and UI issues (69779cd)
- eda: fix insight error (9ad4e26)
- eda: suppress warnings for missing and report (df2a1e7)
- eda: fix insights of plot_correlation (f0ca5f4)
- eda: suppress warnings of progress bar and dask (ca8da4e)
- eda.create_report: fix constant column error (160844a)
- docs: fix docs of clean_df (38dd4b2)
- clean: remove unneeded replace in clean_dupl (51c02cd)
- eda: fixed bugs that come with randomly generated datasets (53ecf76)
- eda: fix bugs in log transformation (209d7d0)
- eda: fixed and optimized css layouts (58e1b18)
- clean: fix bug in validate_country (28068d4)
- eda: fix column name and index related issues (40a89b9)
- eda: variables can be none (325b090)
- connector: path to new config repo (59603e5)
- clean: lat_long regex no longer matches a date format (49d3d22)
- eda.distribution: highlight variable names (998b176)
- eda: fix the error of numerical cell in object column (91c4f9d)
- eda.distribution: box plot with object dtype (a37e9f2)
- clean: add comma after street suffix or name (e7655db)
- clean: cast values as str in validate funcs (8e1b459)
Features ✨
- clean: tuple of input formats for clean_country() (6bc6551)
- clean: add clean_text function (55d3ae9)
- eda: change color of geo map (1dbcddb)
- clean: add clean_currency function (deb5593)
- clean: add clean_df() function (b750284)
- type: detect column as categorical for small unique values (4696e59)
- eda: add geo_plot function (bbe64ec)
- eda: create_report UI improvement (c849b01)
- eda: added new function plot_diff (79523c3)
- connector: allow parameters to appear in url path (5adaf30)
- eda: value frequency table (bc37b79)
- eda: create_report UI improvement (72a0ca9)
- clean: add clean_duplication() function (98ff38d)
- clean: support letters in clean_phone (25d163b)
- eda: specify colors in plot(df), plot(df, x) (33fa36e)
- connector: add functionality that lists supported websites (88187e1)
- clean: add clean_address function (e839ecd)
- clean: add clean_headers function (40742a1)
- eda: parameter management and how-to guide (d2e8b10)
- clean: add clean_date function (6aa6410)
- create_report: add tabs for correlation and missing (6dc568b)
Code Quality + Testing 💯
- eda: add test for geo point (943033a)
- eda: add dataset test for report (0de5208)
- eda: add test of random df (68239f0)
- clean: add tests for clean_duplication() (a4b9d32)
- eda: add random data generator (e83f95b)
- clean: add tests for clean_headers (0aca076)
- eda: add test case of object column with numerical cell (5783984)
- clean: add tests for clean_date and validate_date (812dbb8)
Performance 🚀
- eda: optimize df preprocess and performance of create_report (e7eb182)
- clean: update documentation of clean_date (c540fcc)
- clean: improve performance of clean_duplication (8fda37e)
- eda: use approximate nunique (6030064)
- clean: improve the performance of clean_email() (176382b)
- clean: improve performance of clean_date (854329b)
Documentation 📃
- readme: update video, paper and titanic report for eda (1126dea)
- eda: replace x, y, z with col1, col2, col3 (57f65b3)
- clean: add documentation for clean_text (65436b0)
- eda: add documentation for insights (1e4659b)
- clean: add documentation for clean_df() (4ecf0d7)
- eda: update user guide's datasets (2428f98)
- eda: add documentation for geo plot (3558257)
- clean: add user guide for clean_duplication (d834e85)
- clean: fix clean documentation (e3bed2b)
- connector: revision (23085dd)
- clean: add documentation for clean_date function (d445f36)
- connector: add info docs (cb8cb5c)
- connector: add config file section (f55226e)
- connector: adding a process overview via DBLP section (5794d6c)
- connector: remove stale rst files (433fdfe)
- connector: convert pagination section from rst to ipynb (e4b9ba0)
- connector: convert authorization section from rst to ipynb (d25af47)
- connector: change the pointer in index file from connector.rst to introduction.ipynb (218e41c)
- connector: rewrite introduction and form doc structure (6a87693)
- connector: update API reference doc (9bed169)
- clean: improve DataPrep.Clean ReadMe (a0bc96b)
- eda: update legacy documentation for eda (8f948e0)
- clean: add documentation for clean_address (4061fca)
- clean: add documentation for clean_headers (7a9d519)
- clean: add links from user guide to api ref (182b525)
- clean: Docstrings for phone and email (47f1e33)
- datasets: add introduction for datasets (83d42ce)
- clean: add API reference (68182f6)
- clean: add documentation for clean_ip function (9da3ed1)
- connector: add query() section (c904d1f)
- connector: add connect() section (bff842e)
Contributors this release 🏆
The following users contributed code to DataPrep since the last release.
- andy <[email protected]> (First time contributor) ⭐️
- AndyWangSFU <[email protected]> (First time contributor) ⭐️
- atol <[email protected]>
- Brandon Lockhart <[email protected]>
- dylanzxc <[email protected]>
- eutialia <[email protected]>
- Jinglin Peng <[email protected]>
- jinglinpeng <[email protected]>
- Lakshay-sethi <[email protected]> (First time contributor) ⭐️
- nzrymiak <[email protected]>
- peiwangdb <[email protected]>
- peterirani <[email protected]> (First time contributor) ⭐️
- qidanrui <[email protected]> (First time contributor) ⭐️
- ryanwdale <[email protected]>
- waterpine <[email protected]>
- Weiyuan Wu <[email protected]>
- Yi Xie <[email protected]>
- yuzhenmao <[email protected]>
- yxie66 <[email protected]>
- zhixuan_chi <[email protected]>
🎉🎉 Thank you! 🎉🎉