cutlass 3.x gemm on sm90 #9398

Open. Wants to merge 21 commits into base: develop

@ckl117 (Contributor) commented Nov 8, 2024

PR types

Performance optimization

PR changes

Others

Description

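Add code generation and tuning for CUTLASS 3.x FP8 GEMM.
Tuning is off by default. Setting the environment variable FLAGS_use_cutlass_device_best_config_path=tune turns on CUTLASS FP8 GEMM tuning. If the variable is unset, empty, or set to default, the default configuration is used; any other value is treated as the path to a JSON config file to load. This makes the feature easier to use.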
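The flag handling described above can be sketched in a few lines of Python. This is only an illustration of the three cases in the description, not the code in this PR: the function name `resolve_cutlass_fp8_config` and the `(mode, path)` return shape are hypothetical, and only the environment variable name and the values `tune` and `default` come from the description.

```python
import os
from typing import Mapping, Optional, Tuple

# Environment variable name taken from the PR description.
FLAG = "FLAGS_use_cutlass_device_best_config_path"


def resolve_cutlass_fp8_config(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: map the flag value to a (mode, json_path) pair.

    - unset, empty, or "default" -> ("default", None): use the built-in config
    - "tune"                     -> ("tune", None): run tuning
    - anything else              -> ("file", value): treat value as a JSON path
    """
    env = os.environ if env is None else env
    value = env.get(FLAG, "").strip()
    if value in ("", "default"):
        return ("default", None)
    if value == "tune":
        return ("tune", None)
    return ("file", value)
```

Under these assumptions, running with `FLAGS_use_cutlass_device_best_config_path=tune` exported would select the tuning mode, while leaving the variable unset keeps the default configuration.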

paddle-bot bot commented Nov 8, 2024

Thanks for your contribution!

@DrownFish19 (Collaborator)

The installation method has changed: in setup_cuda, cutlass is now added as a submodule.

codecov bot commented Nov 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 52.77%. Comparing base (8ed579a) to head (76fb665).
Report is 63 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9398      +/-   ##
===========================================
- Coverage    53.07%   52.77%   -0.31%     
===========================================
  Files          703      718      +15     
  Lines       110801   112373    +1572     
===========================================
+ Hits         58804    59300     +496     
- Misses       51997    53073    +1076     


@DrownFish19 (Collaborator)

PaddleNLP-CI skipped the tests for this change, so the impact of the version upgrade needs to be verified before merging.

csrc/setup_cuda.py (review thread outdated, resolved)
Co-authored-by: Yuanle Liu <[email protected]>
3 participants