statistics-and-machine-learning.md

Machine Learning Overview Questions

  • Machine Learning
    • Batch Learning vs Online Learning
    • Instance Based Learning vs Model Based Learning
    • Supervised, Unsupervised, Semi-supervised, and Reinforcement Learning
  • Machine Learning Algorithms
    • Algorithm
    • Model Training
    • Model Selection
  • Machine Learning Implementation
    • training K-means on 1 million users
      • locality-sensitive hashing
    • providing recommendations in real time
  • Data Analysis and Metrics
    • defining user-item scores when explicit user "like" data is hard to get
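As a rough illustration of the locality-sensitive hashing item above, here is a minimal sketch of random-hyperplane LSH, which groups user vectors pointing in similar directions into the same bucket so that clustering or neighbor search can run per bucket instead of over all users. The function names (`make_hyperplanes`, `lsh_signature`, `bucket_users`), the toy user vectors, and the choice of 8 hyperplanes are all assumptions made for this example, not part of the original notes.

```python
import random
from collections import defaultdict


def make_hyperplanes(n_planes, dim, seed=0):
    """Draw n_planes random Gaussian vectors; each one defines a hyperplane through the origin."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]


def lsh_signature(vector, hyperplanes):
    """Return one bit per hyperplane: 1 if the vector lies on its non-negative side, else 0."""
    bits = []
    for plane in hyperplanes:
        dot = sum(v * p for v, p in zip(vector, plane))
        bits.append(1 if dot >= 0 else 0)
    return tuple(bits)


def bucket_users(user_vectors, hyperplanes):
    """Group user ids by signature; users in one bucket are candidates for being similar."""
    buckets = defaultdict(list)
    for user_id, vec in user_vectors.items():
        buckets[lsh_signature(vec, hyperplanes)].append(user_id)
    return dict(buckets)


# Toy data (hypothetical): u2 points in the same direction as u1, u3 in the opposite one.
users = {
    "u1": [1.0, 0.9, 0.0],
    "u2": [2.0, 1.8, 0.0],
    "u3": [-1.0, -0.9, 0.0],
}
planes = make_hyperplanes(n_planes=8, dim=3)
buckets = bucket_users(users, planes)
```

Because the signature depends only on the sign of each dot product, `u1` and `u2` always share a bucket here, while `u3` falls elsewhere. More hyperplanes give smaller, purer buckets at the cost of splitting some true neighbors apart.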

Data Science/Machine Learning Project

Take-home exercises are usually done in a Jupyter Notebook or R Markdown. Instructions range from very open-ended to very detailed (for example, with an expected accuracy score).

  1. Problem
    1. understand the problem
      1. key challenges and necessary domain knowledge
    2. problem formalization
  2. Data
    1. Data collection
    2. Exploratory Data Analysis - understand the data
      1. occasionally: deal with big data
        1. either drop part of the data or use more advanced (parallel) platforms
      2. summarize descriptive analysis
    3. Data Processing
    4. Data Wrangling, Data Cleaning
      1. Missing Value handling and impact analysis
  3. Feature Matrix
    1. Understanding features
      1. deal with categorical features (encoding)
    2. Feature Engineering
      1. e.g., tokenizing, stemming, word2vec, TF-IDF
    3. Feature Selection
  4. Modelling
    1. Preprocessing before model training
    2. Optimize metrics
      1. hyperparameter tuning
    3. model evaluation and model selection
      1. Classification vs Regression
      2. model specific features
        1. e.g., feature importance from tree-based models
        2. L1, L2 regularization
  5. Evaluation
    1. Model Evaluation and Selection (Offline)
    2. A/B Testing (Online)
    3. Business Value/Summary
      1. Business Case Analysis
      2. The most important part
  6. Model Deployment
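To make a few of the steps above concrete, the sketch below walks through missing-value handling (step 2), categorical encoding into a feature matrix (step 3), and a simple offline metric (step 5) using only the Python standard library. The helper names (`impute_mean`, `one_hot`, `accuracy`) and the toy `ages` and `plans` columns are hypothetical and chosen for illustration. A real project would more likely use pandas and scikit-learn, and mean imputation is only one of several reasonable strategies.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """One-hot encode a categorical column; categories are sorted for a stable column order."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy columns (hypothetical): one numeric feature with a gap, one categorical feature.
ages = [20, None, 40]
plans = ["free", "pro", "free"]

ages_filled = impute_mean(ages)
plan_rows, plan_names = one_hot(plans)

# Feature matrix: one row per example, numeric column followed by the one-hot columns.
feature_matrix = [[age] + row for age, row in zip(ages_filled, plan_rows)]

# Offline evaluation on made-up labels and predictions.
score = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```

For the impact analysis mentioned under missing-value handling, one option is to rerun the evaluation with a different imputation strategy (or with the affected rows dropped) and compare the resulting scores.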