2.3 Exploratory data analysis

Slides

Notes

Pandas attributes and methods:

  • df[col].unique() -> return the unique values in the series as an array (not a Python list)
  • df[col].nunique() -> return the number of unique values in the series
  • df.isnull().sum() -> return the number of null values in each column of the dataframe
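As a minimal sketch of the three calls above, assuming a small made-up dataframe (the `make` and `price` columns and their values are hypothetical, chosen only for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing value in each column.
df = pd.DataFrame({
    "make": ["bmw", "audi", "bmw", "fiat", None],
    "price": [46135, 40650, np.nan, 36350, 29450],
})

# Distinct values of the column; the missing value counts as one of them.
unique_makes = df["make"].unique()

# Number of distinct values; missing values are excluded by default.
n_unique = df["make"].nunique()

# Series indexed by column name, holding the null count of each column.
nulls_per_column = df.isnull().sum()

print(unique_makes)
print(n_unique)
print(nulls_per_column)
```

Note that `unique()` keeps the missing value while `nunique()` drops it, so the two results can differ by one.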

Matplotlib and seaborn methods:

  • %matplotlib inline -> IPython magic command that ensures plots are displayed in the Jupyter notebook's cells
  • sns.histplot() -> show the histogram of a series

Numpy methods:

  • np.log1p() -> apply a log transformation to a variable after adding one to each input value, i.e. compute log(1 + x), so that zeros do not produce an undefined log(0).

Long-tailed distributions tend to confuse ML models, so the recommendation is to transform the target variable so that its distribution is closer to normal whenever possible. A log transformation such as np.log1p() is a common way to do this.
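To illustrate the transformation, here is a small sketch with made-up values (the `prices` array is hypothetical). It also uses `np.expm1()`, the inverse of `np.log1p()`, which is one way to map values back to the original scale:

```python
import numpy as np

# Hypothetical long-tailed values, including a zero.
prices = np.array([0, 10, 1_000, 100_000, 2_000_000], dtype=float)

# log(1 + x): compresses the large values and maps 0 to 0.
log_prices = np.log1p(prices)

# expm1 computes exp(x) - 1, undoing log1p.
restored = np.expm1(log_prices)

print(log_prices)
print(restored)
```

After the transformation the values span roughly 0 to 15 instead of 0 to 2,000,000, so the extreme values no longer dominate the scale.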

The entire code of this project is available in this jupyter notebook.

⚠️ The notes are written by the community.
If you see an error here, please create a PR with a fix.
