Update rvv-intrinsic-generator to define new RVV C intrinsic API for bf16 type #229

Closed
wants to merge 14 commits

Conversation


@joshua-arch1 joshua-arch1 commented May 4, 2023

As discussed in #223, the BF16 extension has recently been proposed, and we need to define new intrinsics for the conversion instructions (bf16-to-fp32/fp32-to-bf16) and the reinterpret functions, as well as for vfwmaccbf16 in the Zvfbfwma extension.
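
For orientation, the generator wires each intrinsic group up through a function_group(...) call, as the diff hunks further down show. Below is a minimal, self-contained sketch of that registration shape, using a stand-in stub and placeholder template names, section titles, and anchors; it is not the actual patch.

```python
# Minimal sketch of the generator's registration shape, not project code.
# The stub below only mimics the function_group(...) signature seen in the
# diff hunks; templates, titles, and anchors are placeholders.
def function_group(template, title, link, ops, types, sews, lmuls, decorator):
    print(f"[{title}] ops={ops} types={types}")

BFTYPES = ["bfloat16"]                      # placeholder type list
WFSEWS = ["16"]                             # placeholder SEW list
WLMULS = ["mf4", "mf2", "m1", "m2", "m4"]   # placeholder LMUL list

# bf16 <-> fp32 conversions and reinterpret casts (Zvfbfmin)
function_group("cvt_template", "Vector BFloat16 Convert Functions (draft)",
               "#placeholder", ["wcvt", "ncvt"], BFTYPES, WFSEWS, WLMULS, None)
function_group("reint_template", "Vector BFloat16 Reinterpret Functions (draft)",
               "#placeholder", ["reinterpret"], BFTYPES, WFSEWS, WLMULS, None)
# widening multiply-add (Zvfbfwma)
function_group("mac_template", "Vector BFloat16 Widening Multiply-Add Functions (draft)",
               "#placeholder", ["wmacc"], BFTYPES, WFSEWS, WLMULS, None)
```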

@joshua-arch1 joshua-arch1 reopened this May 4, 2023
@joshua-arch1 joshua-arch1 changed the title from "Update reint_op_template.py to define reinterpret intrinsics for new bf16 type" to "Update rvv-intrinsic-generator to define new RVV C intrinsic API for bf16 type" May 4, 2023
@@ -331,7 +331,7 @@ def gen(g):
       "Vector Widening Floating-Point Fused Multiply-Add Functions",
       REF_DOC_URL +
       "#147-vector-widening-floating-point-fused-multiply-add-operations",
-      ["wmacc", "wnmacc", "wmsac", "wnmsac"], FTYPES, WFSEWS, WLMULS,
+      ["wmacc", "wnmacc", "wmsac", "wnmsac"], BFTYPES, WFSEWS, WLMULS,
       decorators.has_masking_no_maskedoff_policy)
Collaborator

I would like bfloat16 to have a separate section, and to mention that it is still in draft status.


circYuan commented May 8, 2023

Hi! It looks like the commits you provided generate intrinsics like vfncvtbf16_rtz_bf16_f_w_f16. However, this may not be correct, because the spec does not define an rtz form for the vector bf16 convert instructions. I think some condition checking may be needed around line 115 of cvt_op_template.py.
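
A minimal, runnable sketch of the kind of guard being suggested, using illustrative variant tuples rather than the real data structures in cvt_op_template.py:

```python
# Sketch only, not the actual cvt_op_template.py code: drop the static
# round-towards-zero ("rtz") variants when the destination element type is
# bf16, since the spec defines no rtz form of the bf16 narrowing convert.
variants = [
    ("vfncvtbf16_f_f_w", "bf16", None),       # valid narrowing convert -> keep
    ("vfncvtbf16_rtz_f_f_w", "bf16", "rtz"),  # no such instruction -> skip
    ("vfncvt_rtz_x_f_w", "i16", "rtz"),       # unrelated float-to-int rtz -> keep
]

for name, dst_type, frm in variants:
    if dst_type == "bf16" and frm == "rtz":
        continue  # guard: nothing to emit for a bf16 rtz convert
    print("emit", name)
```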

@joshua-arch1 joshua-arch1 (Author)

The PR for the __bf16 ABI has been approved and is going to be merged (riscv-non-isa/riscv-elf-psabi-doc#367). Maybe we can now accelerate finalizing our RVV bf16 intrinsics.

@eopXD eopXD (Collaborator) left a comment

Thank you for pushing this forward, and sorry that I haven't been able to review it in the past two weeks.

Please put vfwmaccbf16, vfwcvtbf16.f.f.v, and vfncvtbf16.f.f.w into a separate section. Please also merge your commits from #224 into this PR.

Regarding the type descriptions in the document, I think you should make it explicit that the bfloat16 types and the intrinsics will not be available when Zvfbfmin and Zvfbfwma are not specified in the architecture.

On the other hand, I agree with Rich's latest comment that it would be convenient to have a bfloat16 load/store that represents a (load + fncvt) and (fwcvt + store).

I have replied with my thoughts regarding this and the planned v1.0 release on the mailing list [0].

[0] https://lists.riscv.org/g/tech-rvv-intrinsics/message/57

@@ -334,6 +334,13 @@ def gen(g):
       ["wmacc", "wnmacc", "wmsac", "wnmsac"], FTYPES, WFSEWS, WLMULS,
       decorators.has_masking_no_maskedoff_policy)

+  g.function_group(
+      mac_template,
+      "Vector BFloat16 Widening Multiply-Add Functions (draft)",
Collaborator

Please add Zvfbfwma in the subtitle.

@joshua-arch1 joshua-arch1 requested a review from eopXD May 17, 2023 08:33
@joshua-arch1 joshua-arch1 (Author) commented May 17, 2023

> On the other hand, I agree with Rich's latest comment that it would be convenient to have a bfloat16 load/store that represents a (load + fncvt) and (fwcvt + store).

I have updated my PR according to your comments, except for the load/store intrinsics. Maybe you have a different understanding from Rich's: he meant an int16 load/store followed by a reinterpret cast, rather than fncvt/fwcvt. Which definition is better? @eopXD
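
For what it is worth, the difference between the two readings can be shown with plain Python (an illustration only, not generator or intrinsic code): a reinterpret keeps the 16 loaded bits unchanged, while a fwcvt-style load widens the bf16 value to the fp32 value it encodes.

```python
# Illustration of the two proposed bf16 load definitions (not project code).
# A bf16 value is the top 16 bits of an fp32 encoding, so the two choices
# produce different element types and different register contents.
import struct

bf16_bits = 0x3FC0  # bf16 encoding of 1.5

# Reading 1 (Rich's): int16 load + reinterpret cast -> bits are left as-is.
reinterpreted = bf16_bits  # still 0x3FC0, now viewed as a bf16 element

# Reading 2 (load + fwcvt): widen to the fp32 value those bf16 bits encode.
widened = struct.unpack("<f", struct.pack("<I", bf16_bits << 16))[0]

print(hex(reinterpreted))  # 0x3fc0
print(widened)             # 1.5
```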

@eopXD eopXD (Collaborator) commented Sep 19, 2023

Hi,

Could you rebase onto the latest main? Thank you.

@eopXD eopXD (Collaborator) commented Sep 19, 2023

On top of rebasing, if possible, I think we can also put in the effort of adding the intrinsics for Zvfbfwma and Zvfbfmin here. I imagine we will also need load/store intrinsics for the bfloat16 type. The narrowing conversion and multiply-add intrinsics will need a rounding-mode variant.

@kito-cheng kito-cheng (Collaborator)

I believe this should be covered by this and #293, which have been merged into the main branch, so I am closing this :)

@kito-cheng kito-cheng closed this Jul 15, 2024
4 participants