Update XLA pin to 10/16 #8267

Draft
wants to merge 5 commits into
base: master

Conversation

JackCaoG
Collaborator

No description provided.

JackCaoG added the tpuci label Oct 16, 2024
@JackCaoG
Collaborator Author

The C++ test build fails; looking into it:

2024-10-16T22:52:04.1618895Z     ERROR: /__w/xla/xla/pytorch/xla/test/cpp/BUILD:117:14: no such package '@tsl//tsl/profiler/utils': BUILD file not found in directory 'tsl/profiler/utils' of external repository @tsl. Add a BUILD file to a directory to mark it as a package. and referenced by '//test/cpp:test_xla_sharding'
2024-10-16T22:52:04.1621279Z     ERROR: Analysis of target '//test/cpp:test_xla_sharding' failed; build aborted: Analysis failed
2024-10-16T22:52:04.1622180Z     INFO: Elapsed time: 28.174s
2024-10-16T22:52:04.1622654Z     INFO: 0 processes.
2024-10-16T22:52:04.1623391Z     FAILED: Build did NOT complete successfully (50 packages loaded, 450 targets configured)
2024-10-16T22:52:04.1624848Z     INFO: Streaming build results to: https://source.cloud.google.com/results/invocations/f6c56856-e4e6-4b2d-9be2-547d0cb398be
2024-10-16T22:52:04.1626088Z     error: command '/usr/local/bin/bazel' failed with exit code 1
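
For triaging errors like the one above after a pin update, here is a minimal sketch that pulls the missing package and the target that references it out of a Bazel log. The regular expression is based only on the error line quoted in this comment, and the function name `find_missing_packages` is hypothetical; it is not part of Bazel or of this repository.

```python
import re

# Matches Bazel's "no such package" analysis error in the shape quoted above.
# Assumption: other Bazel versions may word this message differently.
NO_SUCH_PACKAGE = re.compile(
    r"ERROR: (?P<build_file>\S+?):(?P<line>\d+):(?P<col>\d+): "
    r"no such package '(?P<package>[^']+)'"
    r".*?referenced by '(?P<target>[^']+)'"
)


def find_missing_packages(log_text):
    """Return (package, referencing target, BUILD location) per matching line."""
    results = []
    for log_line in log_text.splitlines():
        match = NO_SUCH_PACKAGE.search(log_line)
        if match:
            location = f"{match['build_file']}:{match['line']}"
            results.append((match["package"], match["target"], location))
    return results
```

Run against the log above, this would point at the `@tsl//tsl/profiler/utils` dependency of `//test/cpp:test_xla_sharding` declared near line 117 of `test/cpp/BUILD`.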

@JackCaoG
Collaborator Author

The good news is that the build passes. I will look into how to fix the CPU and TPU test failures.

@JackCaoG
Collaborator Author

The CPU test failure was related to IFRT; the TPU test failure seems to be a wheel compatibility issue.

@JackCaoG
Collaborator Author

The TPU CI failure was due to:

2024-10-17T01:59:15.2842249Z Requirement already satisfied: torch_xla[pallas] in /home/runner/.local/lib/python3.10/site-packages (2.6.0+gitd558a61)
2024-10-17T01:59:15.2879096Z Requirement already satisfied: absl-py>=1.0.0 in /usr/local/lib/python3.10/site-packages (from torch_xla[pallas]) (2.1.0)
2024-10-17T01:59:15.2881977Z Requirement already satisfied: numpy in /usr/local/lib/python3.10/site-packages (from torch_xla[pallas]) (2.1.1)
2024-10-17T01:59:15.2884446Z Requirement already satisfied: pyyaml in /usr/local/lib/python3.10/site-packages (from torch_xla[pallas]) (6.0.2)
2024-10-17T01:59:15.2886866Z Requirement already satisfied: requests in /usr/local/lib/python3.10/site-packages (from torch_xla[pallas]) (2.32.3)
2024-10-17T01:59:15.7924850Z INFO: pip is looking at multiple versions of torch-xla[pallas] to determine which version is compatible with other requirements. This could take a while.
2024-10-17T01:59:15.9463032Z Collecting torch_xla[pallas]
2024-10-17T01:59:16.0025144Z   Downloading torch_xla-2.4.0-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (7.1 kB)
2024-10-17T01:59:16.0125969Z Requirement already satisfied: cloud-tpu-client>=0.10.0 in /usr/local/lib/python3.10/site-packages (from torch_xla[pallas]) (0.10)
2024-10-17T01:59:16.0166390Z Collecting jaxlib==0.4.29 (from torch_xla[pallas])
2024-10-17T01:59:16.0290273Z   Downloading jaxlib-0.4.29-cp310-cp310-manylinux2014_x86_64.whl.metadata (1.8 kB)
2024-10-17T01:59:16.1881478Z Collecting jax==0.4.29 (from torch_xla[pallas])
2024-10-17T01:59:16.2007347Z   Downloading jax-0.4.29-py3-none-any.whl.metadata (23 kB)
2024-10-17T01:59:16.3808085Z Collecting ml-dtypes>=0.4.0 (from jax==0.4.29->torch_xla[pallas])
2024-10-17T01:59:16.3915653Z   Downloading ml_dtypes-0.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (21 kB)
2024-10-17T01:59:16.5235789Z Collecting opt-einsum (from jax==0.4.29->torch_xla[pallas])
2024-10-17T01:59:16.5341084Z   Downloading opt_einsum-3.4.0-py3-none-any.whl.metadata (6.3 kB)
2024-10-17T01:59:16.7934652Z Collecting scipy>=1.9 (from jax==0.4.29->torch_xla[pallas])

It seems I messed up the jax version. Trying to fix it.
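
In the log above, pip reports `torch_xla` 2.6.0+gitd558a61 as already installed and then downloads `torch_xla-2.4.0` with `jax==0.4.29`, which suggests the resolver backtracked to an older wheel. A minimal sketch for spotting that pattern in a pip log follows; the regular expressions are based only on the lines quoted here, and `find_reresolved` is a hypothetical helper, not a pip API.

```python
import re

# "Requirement already satisfied: name[extras]>=x in /path (from ...) (version)"
SATISFIED = re.compile(
    r"Requirement already satisfied: (?P<name>[A-Za-z0-9_.\-]+)(?:\[[^\]]*\])?\S*"
    r" in \S+ (?:\(from [^)]*\) )?\((?P<version>[^)]+)\)"
)
# "Downloading name-version-<tags>.whl.metadata"
DOWNLOADING = re.compile(
    r"Downloading (?P<name>[A-Za-z0-9_.]+)-(?P<version>\d[^-\s]*)-"
)


def normalize(name):
    """Treat torch_xla and torch-xla as the same project name."""
    return name.lower().replace("_", "-").replace(".", "-")


def find_reresolved(log_text):
    """Map package -> (installed, downloaded) when pip fetches a different version."""
    installed = {}
    changed = {}
    for log_line in log_text.splitlines():
        satisfied = SATISFIED.search(log_line)
        if satisfied:
            installed[normalize(satisfied["name"])] = satisfied["version"]
            continue
        download = DOWNLOADING.search(log_line)
        if download:
            name = normalize(download["name"])
            if name in installed and installed[name] != download["version"]:
                changed[name] = (installed[name], download["version"])
    return changed
```

This only flags the symptom (a version mismatch between what was installed and what pip fetched); it does not say which pin caused the backtracking.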

JackCaoG force-pushed the JackCaoG/update_xla_pin_10_15 branch from 0512045 to 85254b8 Compare October 18, 2024 20:27
@JackCaoG
Collaborator Author

Rebased; the GPU CI should no longer run.

JackCaoG requested review from qihqi and lsy323 October 18, 2024 21:16
@JackCaoG
Collaborator Author

@lsy323 @qihqi Can you take a look at this PR? I plan to merge it on Monday.
