forked from openxla/xla
Auto Reorder #6 (Open)
zjjott wants to merge 3,335 commits into main from feature/auto_reorder
Conversation
zjjott force-pushed the feature/auto_reorder branch from d182865 to c0d7559 on March 26, 2024 07:30
This is a prospective change for openxla#10966. In particular, this will help fix an OSS build problem: "tensorflow/xla/linux/cpu/build_cpu" not being able to find the `InitializeAbslLogging` function. PiperOrigin-RevId: 620055000
…a tuple-tree of `numpy.ndarray`. This is intended for internal debugging use. It cannot be used on OSS because the relevant protobufs are not part of the public API. (Though it must not break the OSS build, naturally.) PiperOrigin-RevId: 620064326
…s to the library for internal debugging tools. PiperOrigin-RevId: 620068167
PiperOrigin-RevId: 620069321
These were used by KernelGen but are no longer needed. PiperOrigin-RevId: 620084345
…n is the entry computation root PiperOrigin-RevId: 620107928
PiperOrigin-RevId: 620111958
We need to honor it. PiperOrigin-RevId: 620121620
Updates LLVM usage to match [aa2c14de1adc](llvm/llvm-project@aa2c14de1adc) PiperOrigin-RevId: 620124069
PiperOrigin-RevId: 620149815
PiperOrigin-RevId: 620156928
This CL extracts current triton codegen requirements for each hlo instruction into a single function to clean the codes in the triton fusion passes. PiperOrigin-RevId: 620157253
This is required in cases where embedded thunk arguments share the same buffer (i.e. they are located at different offsets of the same buffer) PiperOrigin-RevId: 620179451
…aring the same buffer PiperOrigin-RevId: 620184639
PiperOrigin-RevId: 620194665
PiperOrigin-RevId: 620258542
PiperOrigin-RevId: 620259968
PiperOrigin-RevId: 620260337
Changes based on the Hurwitz Zeta algorithm from the article linked in the comments. PiperOrigin-RevId: 620272234
There is an internal issue with running tests on H100s requiring the change to be rolled back. Reverts 0ab2be0 PiperOrigin-RevId: 620273492
…n resharding costs for a given edge as part of one matrix object. PiperOrigin-RevId: 620273768
PiperOrigin-RevId: 620281417
Trying to prevent `error: "Using deprecated NumPy API, disable it with #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION"` PiperOrigin-RevId: 620284610
Updates LLVM usage to match [80aa52d8c5a8](llvm/llvm-project@80aa52d8c5a8) PiperOrigin-RevId: 620285862
…using mpi. Imported from GitHub PR openxla#7849 Mpi collectives as proposed in jax-ml/jax#11182. I only implemented the inter-process communication and this does not yet support more than one thread per process. Adding support for multiple threads/devices per process in the future seems quite a bit more involved if one wanted to do it properly. For MPI I am building and linking against https://github.com/eschnett/MPItrampoline, which dlopens the (wrapped) mpi library at runtime. To wrap and load the desired mpi library one needs to compile https://github.com/eschnett/MPIwrapper and set `MPITRAMPOLINE_LIB=/path/to/libmpiwrapper.so`. @hawkinsp Copybara import of the project: -- b74bbb9 by Clemens Giuliani <[email protected]>: add mpi collectives -- 23508eb by Clemens Giuliani <[email protected]>: add explicit Init and Finalize methods and export them to python -- bbe5840 by Clemens Giuliani <[email protected]>: add comment -- 38d1562 by Clemens Giuliani <[email protected]>: fix windows build -- 201f723 by Clemens Giuliani <[email protected]>: fmt -- 2784869 by Clemens Giuliani <[email protected]>: bump xla_extension_version Merging this change closes openxla#7849 COPYBARA_INTEGRATE_REVIEW=openxla#7849 from inailuig:mpi_collectives 2784869 PiperOrigin-RevId: 620302264
…` to `bytes` `xla::PjRtValueType` is defined in C++, where its `std::string` value can contain any string (not necessarily UTF-8). Protobuf version 3 requires a `string` field to contain UTF-8, so it is more suitable to use `bytes` to express this value. (Note that the string value of `xla::PjRtValueType` would often be consumed by Python, where nanobind would convert `std::string` into Python `str` with UTF-8 decoding. However, this is what some users of `xla::PjRtValueType` choose to do; it is not sufficient to constrain the string to be UTF-8 only in C++ APIs.) This is a preemptive change; there is no known problem with using a `string` field previously. PiperOrigin-RevId: 620315110
PiperOrigin-RevId: 620320903
PiperOrigin-RevId: 622946035
PiperOrigin-RevId: 622947937
HloDimensionsInstruction::ClassOf should return false for kTopK. PiperOrigin-RevId: 622950232
PiperOrigin-RevId: 622953342
They don't work after the stream is initialized in GpuStream (the only Stream implementation to make use of the priority). Instead, move the parameter to Stream::Initialize, which is the only place it's actually used. PiperOrigin-RevId: 622958008
PiperOrigin-RevId: 622958040
…nate bool allocated_ member that's now unnecessary. PiperOrigin-RevId: 622964979
…nc values PiperOrigin-RevId: 622971220
PiperOrigin-RevId: 622987817
PiperOrigin-RevId: 623009470
PiperOrigin-RevId: 623011436
PiperOrigin-RevId: 623013151
PiperOrigin-RevId: 623015159
PiperOrigin-RevId: 623053994
…mate;fix all-reduce cost estimate
mars1248 reviewed on Apr 18, 2024
      cost_analysis->bytes_accessed(instr) / (1e6 * actual_bandwidth));
  total_time += communication_time;
  return total_time;
}

std::vector<double> GpuPerformanceWithCollectiveModel::GetInterInnerBandwidths(
    const HloInstruction& instr, const GpuHloCostAnalysis* cost_analysis,
    const se::DeviceDescription& gpu_device_info) {
Is this function used to compute the intra-node and inter-node bandwidths? Could you add a brief comment describing it?
  auto inner_node_numel_bytes =
      numel_bytes * (std::min(kInnerNodeGpu, total_gpu) - 1);

  // all-gather-start(f32[12800,2400]{0,1} replica_groups={{0,1,2,3}})
This part could be rewritten as a computed formula.
Auto Reorder
Uses a linear program to determine the instruction order.
Run tests.