Issues: NVIDIA/NeMo-Curator

Issues list

Fix PyTest failures from RAPIDS 24.12 GPU CI (label: bug)
#437 opened Dec 17, 2024 by sarahyurick
2 of 3 tasks

Fuzzy dedup - minhash buckets and jaccard_map_buckets (label: bug)
#430 opened Dec 13, 2024 by ms-leemina

Update minhash API after 25.02 (label: enhancement)
#426 opened Dec 11, 2024 by ayushdg

LookupError not caught during Encoding handling (label: bug)
#411 opened Dec 6, 2024 by ggcr

Add Jupyter notebook tutorials for data classifiers (label: documentation)
#406 opened Dec 2, 2024 by sarahyurick
7 tasks done

Add Trafilatura text extraction (label: enhancement)
#400 opened Dec 2, 2024 by sarahyurick

Update get_all_files_paths_under documentation (label: documentation)
#378 opened Nov 18, 2024 by sarahyurick

Use CrossFit for TokenizerFertilityFilter (label: enhancement)
#377 opened Nov 15, 2024 by sarahyurick

Add GPU test with NeMo 2.0
#376 opened Nov 15, 2024 by sarahyurick

[IMP] Decrease Merge Peak Memory Usage of ConnectedComponents (label: bug)
#375 opened Nov 15, 2024 by VibhuJawa

Zyda2 tutorial - key error when running compute_counts script (label: bug)
#345 opened Nov 5, 2024 by ronjer30

Zyda2 tutorial - TypeError when initializing Dask CPU cluster (label: bug)
#344 opened Nov 5, 2024 by ronjer30

Deprecate max_text_bytes_per_part (label: enhancement)
#331 opened Oct 28, 2024 by sarahyurick