9th (public) place solution to MeLi Data Challenge 2020
ledmaster/meli2020
This is a very simple solution. The most important model is XGBoost; stacking it with the neural network only (barely) moved my place from 10th to 9th.

1. Run 0_parquet.ipynb to save the original files as parquet, which makes loading faster.
2. Run 1a_prep_sbert_neuralmind.ipynb to generate sentence embeddings (using a PT-BR fine-tuned BERT provided by neuralmind) and build a KNN index on them.
3. Run 1b_prep_ltr_knn_search.ipynb to "melt" the original data and add nearest neighbors. This creates one row for each candidate item (viewed items + 50 nearest neighbors based on both views and search embeddings from the previous step).
4. Run 2a_xgb_ranker_knn_neuralmind.ipynb to create a minimal feature set, transform the target into a ranking, save the data for reuse, and train a rank:pairwise XGBoost.
5. Run 2b_embbag_nums_yrank_mse.ipynb to train a neural network that takes both the features from the previous dataset and the sentence embeddings. To be faster, I trained it on the same target but with an MSE loss (surprisingly not as bad as I thought).
6. Run 3_stack.ipynb to load the previous models' predictions and train an XGBoost model that stacks them into the final predictions.

Subs are named 22c, 26, etc. because those were the original notebook names; I numbered them in sequence to track progress.

Thanks for organizing this competition and preparing a very practical, real-world dataset :)
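To make the "melt" in step 3 more concrete, here is a minimal pure-Python sketch of turning sessions into one row per candidate item. It is not the notebook's code: the function name build_candidate_rows, the session keys (session_id, viewed, bought), the neighbors lookup standing in for the KNN index, and the binary label are all assumptions made for illustration. The real pipeline draws neighbors from both views and search embeddings and may define the ranking target differently.

```python
def build_candidate_rows(sessions, neighbors, k=50):
    """Return (rows, group_sizes): one row per (session, candidate item).

    sessions:  list of dicts with hypothetical keys 'session_id', 'viewed', 'bought'
    neighbors: dict mapping an item id to a list of nearest-neighbor item ids
               (an in-memory stand-in for a real KNN index)
    k:         maximum number of neighbor candidates added per session
    """
    rows, group_sizes = [], []
    for session in sessions:
        viewed = set(session["viewed"])
        # dict keys keep insertion order and drop duplicate items
        candidates = dict.fromkeys(session["viewed"])
        added = 0
        for item in session["viewed"]:
            for nb in neighbors.get(item, []):
                if added >= k:
                    break
                if nb not in candidates:
                    candidates[nb] = None
                    added += 1
        for item in candidates:
            rows.append({
                "session_id": session["session_id"],
                "item_id": item,
                "was_viewed": int(item in viewed),
                # Assumed label: 1 for the purchased item, 0 otherwise
                "label": int(item == session["bought"]),
            })
        # Rows of one session are contiguous, so sizes describe the groups
        group_sizes.append(len(candidates))
    return rows, group_sizes


toy_sessions = [
    {"session_id": 1, "viewed": ["a", "b"], "bought": "c"},
    {"session_id": 2, "viewed": ["d"], "bought": "d"},
]
toy_neighbors = {"a": ["b", "c"], "b": ["c", "e"], "d": ["a"]}
rows, group_sizes = build_candidate_rows(toy_sessions, toy_neighbors, k=50)
```

The per-session group sizes are returned alongside the rows because a learning-to-rank model, such as the pairwise ranker in step 4, needs to know which rows belong to the same session.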