Fixes for Mac M3 #2830

JoeyOverby · 2024-09-16T15:31:37Z

This is really just a POC, not a polished PR. It's mostly an attempt to get this working; hopefully the owner of the original repo will then let me open a PR there so others can help clean up the code.

My only focus was getting this to work for Textual Inversion training. I don't know whether it works for the other functionality.

(IMPORTANT: I have a PR for the sd-scripts here, but I'm VERY unsure if it's correct... will update later!)

Fixes

Instead of trying to open a window to ask if overwriting an embedding model file is ok, simply back it up
Add ability to use Mac MPS as a device when it's available
Fix packages/versions to work with MPS setup
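The first fix replaces the overwrite-confirmation popup with an automatic backup. The PR's actual code isn't shown here, so the following is only a minimal sketch of the idea, assuming a hypothetical helper `backup_if_exists` and a made-up `<stem>.<timestamp>.bak<suffix>` naming scheme. It copies rather than moves, so the original stays in place until training overwrites it.

```python
from __future__ import annotations

import shutil
from datetime import datetime
from pathlib import Path


def backup_if_exists(target: str | Path) -> Path | None:
    """Hypothetical helper: copy an existing model file to a timestamped sibling.

    Returns the backup path, or None when there is nothing to back up.
    """
    target = Path(target)
    if not target.exists():
        return None
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    backup = target.with_name(f"{target.stem}.{stamp}.bak{target.suffix}")
    # Don't clobber a backup made earlier within the same second.
    counter = 1
    while backup.exists():
        backup = target.with_name(f"{target.stem}.{stamp}-{counter}.bak{target.suffix}")
        counter += 1
    shutil.copy2(target, backup)  # copy2 also preserves timestamps/metadata
    return backup
```

Calling `backup_if_exists(output_dir / f"{output_name}.safetensors")` right before saving would stand in for the popup, so the save never has to block on a GUI window.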

Notes

A lot of manual steps were needed while trying to get the packages working. I will try a clean install later, but for now I'm putting in the notes I took while doing this in case someone else wants to do this as well.

Had to remove tensorboard completely. I couldn't find numpy versions that worked with both, so I run tensorboard separately in a different venv.
Had to remove tensorflow (which wasn't needed for the Textual Inversion training I was doing).
I'd recommend removing (or backing up/renaming) your venv folder so you don't have to run the uninstall steps below.

Full List of Installed Packages

I'll come back and clean up the install scripts (and requirements files), but for now I wanted to give everyone the packages and versions that worked for me!


Package                      Version     Editable project location
---------------------------- ----------- -------------------------------------
accelerate                   0.25.0
aiohttp                      3.10.5
altair                       4.2.2
astunparse                   1.6.3
bitsandbytes                 0.41.1
blendmodes                   2022
dadaptation                  3.1
easygui                      0.98.3
fairscale                    0.4.13
gast                         0.6.0
google-pasta                 0.2.0
gradio                       4.43.0
h5py                         3.11.0
imagesize                    1.4.1
invisible-watermark          0.2.0
keras                        2.14.0
libclang                     18.1.1
library                      0.0.0       
lion-pytorch                 0.0.6
lycoris_lora                 2.2.0.post3
ml-dtypes                    0.2.0
numba                        0.59.1
omegaconf                    2.3.0
onnx                         1.16.1
onnxruntime                  1.17.1
open-clip-torch              2.20.0
opt-einsum                   3.3.0
pip                          24.2
prodigyopt                   1.0
pytorch-lightning            2.0.0
scipy                        1.11.4
tensorboard                  2.14.1
tensorflow-io-gcs-filesystem 0.37.1
termcolor                    2.4.0
tk                           0.1.0
torchaudio                   2.4.1
voluptuous                   0.13.1
wandb                        0.15.11
wrapt                        1.14.1

Manual commands I ran to get to this point

pip uninstall open-clip-torch
pip uninstall tensorflow-macos tensorflow-metal tensorflow-estimator -y
pip install --force-reinstall torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/metal.html
pip install numpy==1.26.0 --force-reinstall
pip install Pillow==9.5.0 --force-reinstall
pip install blendmodes==2022 numba==0.59.1 scipy==1.11.4 --force-reinstall
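To confirm that the force-reinstalls above actually took effect, the pins can be compared against what is installed. This sketch uses only `importlib.metadata` from the standard library; the `EXPECTED` table is copied from the commands above (torch/torchvision/torchaudio are left out because they were not pinned), and the helper names are illustrative.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Pins taken from the manual pip commands above.
EXPECTED = {
    "numpy": "1.26.0",
    "Pillow": "9.5.0",
    "blendmodes": "2022",
    "numba": "0.59.1",
    "scipy": "1.11.4",
}


def installed_version(dist: str) -> str | None:
    """Installed version of a distribution, or None if it isn't installed."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def find_mismatches(
    expected: dict[str, str], installed: dict[str, str | None]
) -> dict[str, tuple[str, str | None]]:
    """Return {name: (expected, actual)} for every missing or differing package."""
    return {
        name: (want, installed.get(name))
        for name, want in expected.items()
        if installed.get(name) != want
    }


if __name__ == "__main__":
    actual = {name: installed_version(name) for name in EXPECTED}
    problems = find_mismatches(EXPECTED, actual)
    for name, (want, got) in problems.items():
        print(f"{name}: expected {want}, found {got or 'not installed'}")
    if not problems:
        print("All pinned packages match.")
```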

Verify MPS Installed Correctly

python -c "import torch; print(torch.backends.mps.is_available())"
python -c "import numpy; print(numpy.__version__)"
python -c "import PIL; print(PIL.__version__)"
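Once `torch.backends.mps.is_available()` prints `True`, training code still has to pick the device. This is a minimal sketch of the "use MPS when it's available" fix, assuming a CUDA, then MPS, then CPU preference order. How sd-scripts/accelerate actually select the device may differ, and `choose_device_name`/`detect_device_name` are hypothetical names.

```python
from __future__ import annotations


def choose_device_name(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple MPS, then fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device_name() -> str:
    """Query torch if it's installed; otherwise assume CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps only exists in torch >= 1.12, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps is not None and mps.is_available())
    return choose_device_name(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    print(detect_device_name())  # pass the result to torch.device(...)
```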

Successful Config

This is a copy of my successful config for running Textual Inversion training (obviously, fill in your own paths and names).
By successful, I mean it ran, not that it did a great job. I'm still adjusting the parameters, but hopefully this gives you all the settings you need to at least run it (figuring out things like float and AdamW took me a while).

bucket_reso_steps = 64
cache_latents = true
cache_latents_to_disk = true
caption_extension = ".txt"
clip_skip = 1
dynamo_backend = "no"
enable_bucket = true
gradient_accumulation_steps = 2
gradient_checkpointing = true
huber_c = 0.1
huber_schedule = "snr"
init_word = "woman"
learning_rate = 5e-6
logging_dir = "<REPO_PATH>/kohya_ss/outputs/<PATH TO YOUR TRAINING DIR>/log"
loss_type = "l2"
lr_scheduler = "cosine"
lr_scheduler_args = []
lr_scheduler_num_cycles = 1
lr_scheduler_power = 1
lr_warmup_steps = 10
max_bucket_reso = 1024
max_data_loader_n_workers = 8
max_timestep = 1000
max_token_length = 150
max_train_steps = 100
min_bucket_reso = 512
min_snr_gamma = 5
mixed_precision = "no"
multires_noise_discount = 0.3
no_half_vae = true
noise_offset_type = "Original"
num_vectors_per_token = 12
optimizer_args = []
optimizer_type = "AdamW"
output_dir = "<REPO_PATH>/kohya_ss/outputs/<PATH TO YOUR TRAINING DIR>/model"
output_name = "MyTrainedModel"
pretrained_model_name_or_path = "<PATH TO YOUR TRAINING CHECKPOINT>.safetensors"
prior_loss_weight = 1
resolution = "1024,1024"
resume = "<REPO_PATH>/kohya_ss/outputs/<PATH TO YOUR TRAINING DIR>/model/<PREVIOUS MODEL>"
sample_every_n_epochs = 1
sample_prompts = "<REPO_PATH>/kohya_ss/outputs/<PATH TO YOUR TRAINING DIR>/model/prompt.txt"
sample_sampler = "euler_a"
save_every_n_epochs = 1
save_every_n_steps = 20
save_last_n_steps = 15
save_last_n_steps_state = 15
save_model_as = "safetensors"
save_precision = "float"
save_state = true
save_state_on_train_end = true
sdpa = true
token_string = "YOUR_TOKEN_STRING_HERE"
train_batch_size = 6
train_data_dir = "<REPO_PATH>/kohya_ss/outputs/<PATH TO YOUR TRAINING DIR>/img"
use_object_template = true


bmaltais · 2024-09-17T22:26:25Z

Please ensure that none of the changes you submit will introduce issues to the current solution. I see that your changes are marked as a draft, and I understand there's still work to be done.

I appreciate you taking the time to address the MacOS situation, especially since it's an area that hasn’t been looked after in quite some time. However, as I don't own an M3 Mac, I haven't had the opportunity to focus on it myself.

One critical point to keep in mind is to avoid introducing any problems for other users, which could block your code from being merged. For example, there are a significant number of changes in the common requirements.txt file, and I believe this could cause major issues for Linux and Windows users. Perhaps try using a dedicated requirements-macos-m3.txt file where everything is as it should be? This might require changes to how the setup.py script is run, as it was never intended to support so many variations...

Therefore, I recommend minimizing modifications to the requirements.txt file and instead focusing on creating a requirements-macos-m3.txt file that ensures compatibility with M3 Macs (and possibly M1 and M2 as well).

Regarding the submodule changes, those won’t be accepted. Proper support for M3 should be addressed in the kohya_ss sd-scripts upstream. I won’t approve pulling submodule changes from other sources, as this could introduce concerns for both current and future users.

I’m hopeful we can find a solution that provides proper MacOS support without disrupting the experience for Linux and Windows users.

JoeyOverby · 2024-09-20T20:09:26Z

I'm sorry for the delay - apparently my notifications went to an old work email that I no longer have.

I actually think that we don't need to make any changes to the sd-scripts submodule. I'll try to test that here in a bit. My only concern is: what happens when versions conflict between the Mac requirements file and the generic requirements file?

Would it make more sense to have just one file for mac and then not reference the generic requirements one in the mac setups?

And happy to help! Thank you for taking the time to respond. I appreciate it.

bmaltais · 2024-09-20T20:14:08Z

I believe the separate requirements file for Mac is the best approach. If my memory serves me correctly, I think it’s possible to achieve this via a parameter. You can build your solution around this concept.
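One way the "separate file, selected via a parameter" idea could look is sketched below. The `--requirements` flag, the function names and the auto-detection rule (macOS on arm64 gets the Mac file) are all hypothetical and are not taken from the repo's actual setup.py; only the file name `requirements-macos-m3.txt` comes from this thread.

```python
from __future__ import annotations

import argparse
import platform

DEFAULT_REQUIREMENTS = "requirements.txt"
# File name proposed in this thread; everything else here is hypothetical.
MACOS_ARM_REQUIREMENTS = "requirements-macos-m3.txt"


def default_requirements(system: str, machine: str) -> str:
    """Apple Silicon (native arm64 Python on macOS) gets the Mac file."""
    if system == "Darwin" and machine == "arm64":
        return MACOS_ARM_REQUIREMENTS
    return DEFAULT_REQUIREMENTS


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Install requirements (sketch).")
    parser.add_argument(
        "--requirements",
        default=None,
        help="Requirements file to install; auto-detected if omitted.",
    )
    args = parser.parse_args(argv)
    if args.requirements is None:
        args.requirements = default_requirements(platform.system(), platform.machine())
    return args
```

An explicit `--requirements` always wins, so Linux and Windows users keep getting the common file unless they opt in. Note that an x86_64 Python running under Rosetta reports `x86_64` and would fall back to the common file.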

JoeyOverby · 2024-09-20T20:19:08Z

Ok, I'll take a look at this soon and get back to you with some pieces. For example, I wasn't able to get the GUI windows to pop up on Mac: if you start training and the model already exists, it tries to show a confirmation window, which never appears. Is this a known issue, or something odd with my setup?

bmaltais · 2024-09-21T16:07:56Z

I will be away for a week so no rush. I will not be able to work on the GUI for quite a bit.

bghira · 2024-09-22T01:34:30Z

mps has correctness issues and can't be relied on for training a model. however MLX or Tinygrad do not rely on MPS and have proper results. i've never seen good results from training on mps, and i've supported it in simpletuner since january.

JoeyOverby · 2024-09-22T02:29:30Z

Interesting.... I've been trying to train a model for a bit now and it isn't going well. Do you have some documentation on what the problem is? (Also so I know when/if it's fixed). And I'll check out MLX/Tinygrad here as well. Thank you.


bghira · 2024-09-30T15:11:47Z

the problem is probably an overflow inside pytorch's MPS code that has yet to be discovered. if you go to the pytorch issue tracker and search for label:mps is:open you will see the problem.

the only reason to support MPS for pytorch in this repo (or the original) is so the maintainer of the repo can run the code directly on their apple development workstation. this is the only reason that i have it supported in simpletuner.

apple machines have upsides for ML development:

they are very power-efficient. i live in a country without a very reliable power grid, and we use solar.
they have extremely fast CPUs that can, e.g., quantise weights or perform image transforms faster than the highest-end Intel or AMD parts
the CPU mode in pytorch is correct when MPS is not, and pytorch on Apple M3 CPU is surprisingly fast
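Because CPU results can serve as the reference when MPS is suspect, one rough check is to run the same op on both devices and compare. This sketch reimplements the `torch.allclose` test (`|candidate - reference| <= atol + rtol * |reference|`) in plain Python so it can be checked without torch. The matmul size and the loose tolerances in `cross_check_matmul` are assumptions, and a single op passing says little about whether a full training run on MPS is correct.

```python
from __future__ import annotations

from typing import Sequence


def all_close(reference: Sequence[float], candidate: Sequence[float],
              rtol: float = 1e-5, atol: float = 1e-8) -> bool:
    """Element-wise allclose test; NaNs never match, equal infinities do."""
    if len(reference) != len(candidate):
        return False
    return all(
        c == r or abs(c - r) <= atol + rtol * abs(r)
        for r, c in zip(reference, candidate)
    )


def cross_check_matmul(size: int = 256, seed: int = 0) -> bool | None:
    """Compare one fp32 matmul on CPU vs MPS; None if torch or MPS is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    mps = getattr(torch.backends, "mps", None)
    if mps is None or not mps.is_available():
        return None
    torch.manual_seed(seed)
    a = torch.randn(size, size)
    b = torch.randn(size, size)
    expected = (a @ b).flatten().tolist()  # CPU reference
    actual = (a.to("mps") @ b.to("mps")).cpu().flatten().tolist()
    # Loose tolerances: fp32 summation order legitimately differs between backends.
    return all_close(expected, actual, rtol=1e-3, atol=1e-3)
```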

they have downsides:

the code required to support Apple systems often comes at a detriment to the entire codebase
for example you cannot rely on the existence of things like autocast or CUDA streams
torch compile runs into branching problems when you have to check for MPS systems (for example, the Flux RoPE code uses float64 on NVIDIA, but fp64 isn't available on MPS, so it falls back to fp32, and torch compile gets confused and unhappy about that branch)
dtype handling differs between the two platforms, so you will oddly find the same code running correctly on one and improperly on the other
- torchao / quanto (quantisation) introduce more problems here
- CUDA seems fine with mixing bf16 and fp32 compute sometimes while MPS is never happy with this situation
CUDA extensions and custom kernels don't work on MPS and you'll be frustrated to discover just how much of the ecosystem relies on these things
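The RoPE example above boils down to a device-dependent dtype branch. Here is a simplified sketch of that pattern, assuming the standard RoPE frequency formula (omega_i = theta^(-2i/dim)). It is not Flux's actual code, and `rope_dtype_name`/`rope_angles` are illustrative names. It is exactly this kind of data-dependent branch that the comment says torch compile struggles with.

```python
from __future__ import annotations


def rope_dtype_name(device_type: str) -> str:
    """float64 where supported; MPS has no float64, so fall back to float32."""
    return "float32" if device_type == "mps" else "float64"


def rope_angles(positions: "torch.Tensor", dim: int, theta: float = 10000.0) -> "torch.Tensor":
    """Rotation angles of shape (*positions.shape, dim // 2)."""
    import torch

    dtype = getattr(torch, rope_dtype_name(positions.device.type))
    scale = torch.arange(0, dim, 2, dtype=dtype, device=positions.device) / dim
    omega = 1.0 / (theta ** scale)  # per-pair frequencies
    return positions.to(dtype).unsqueeze(-1) * omega
```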

it can be almost a part-time job to keep MPS and CUDA working together, and in the case of pytorch, it's actually several full-time jobs on their end.

if bmaltais or kohya_tech personally never have to run on MPS then i would just run as far and as fast as possible in the opposite direction and never touch that stack. Apple users are better-served by an architecture-specific training framework, if one even exists.

cchance27 · 2024-10-24T18:21:22Z

> for example you cannot rely on the existence of things like autocast or CUDA streams

I believe the PyTorch nightly builds are adding AMP for MPS now.

> Apple users are better-served by an architecture-specific training framework, if one even exists.

Not gonna lie, I wish someone would work on an SD training script for MLX :S

JoeyOverby added 10 commits September 16, 2024 09:22

Added detection for thread issue

0b637e8

Attempt to use different mechanism for GUI communication

4998133

Tried to fix communication - will revert but saving as a commit

a9e34b7

Added auto backup for model files that already exist to fix need for popup

845fef1

Fix macos for M3

afc0918

Changed to use Macbook MPS device

c5748ed

Temp commit - adding changed submodules code

8d80301

Remove ignore for submodules

a1d8ecf

Add updates to submodules

dd41c4c

Updated submodule to point to my fork

98944ad

JoeyOverby mentioned this pull request Sep 16, 2024

Update submodules as well since MPS changes were needed kohya-ss/sd-scripts#1606

Draft
