Hi, I'm a machine learning engineer focused on NLP, recommender systems, and quantitative finance.
I have experience in machine learning, backend engineering, and data engineering. Here is the tech stack I have used:
- Programming Languages: Python, C++, Java (data engineering only), SQL, JavaScript, Shell, Rust (a little)
- Frameworks, Libraries, and Tools: Pulsar, Milvus, gRPC, Spark, Hive, K8s, PyTorch, FAISS, Redis, Flask, TensorFlow (a long time ago)
Here is an index of my projects:
- Side Projects
  - simpler-distil-whisper: A reproduction of the paper "Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling".
  - PLM-ICD-multi-label-classifier: A reproduction of the paper "PLM-ICD: Automatic ICD Coding with Pretrained Language Models".
  - SlimPajama-DC Data Deduplicator: A reproduction of the paper "SlimPajama-DC: Understanding Data Combinations for LLM Training".
  - feather: A C++ feature-hashing library with a Python binding.
  - osimhash: A Python binding for the simhash C++ text deduplication library.
  - pypack: Generates a Python runtime tar.gz archive (for PySpark) that runs across Python versions, operating systems, and platforms.
- Code Reading
  - fastTextAnnotation: Detailed code annotations for Facebook's fastText library.
  - hnswlibAnnotation: Detailed code annotations for hnswlib.
  - finBERT: BERT for financial news sentiment classification.
- Personal Use
  - quicmd: Handy commands for quick execution.
  - config4: Personal configs, currently for tmux and vim.
  - wiki4codes: Notes from trying out libraries, frameworks, algorithms, and models, with demos and examples.