-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes for Mac M3 #2830
base: master
Are you sure you want to change the base?
Fixes for Mac M3 #2830
Conversation
Please ensure that none of the changes you submit will introduce issues to the current solution. I see that your changes are marked as a draft, and I understand there's still work to be done. I appreciate you taking the time to address the MacOS situation, especially since it's an area that hasn’t been looked after in quite some time. However, as I don't own an M3 Mac, I haven't had the opportunity to focus on it myself. One critical point to keep in mind is to avoid introducing any problems for other users, which could block your code from being merged. For example, there are a significant number of changes in the common requirements.txt file, and I believe this could cause major issues for Linux and Windows users. Perhaps try to use dedicates requirements-macos-m3.txt file where everything is as it should be? This might require changes as to how the setup.py script is run... as it was never intended to support so many variations... Therefore, I recommend minimizing modifications to the requirements.txt file and instead focusing on creating a requirements-macos-m3.txt file that ensures compatibility with M3 Macs (and possibly M1 and M2 as well). Regarding the submodule changes, those won’t be accepted. Proper support for M3 should be addressed in the kohya_ss sd-scripts upstream. I won’t approve pulling submodule changes from other sources, as this could introduce concerns for both current and future users. I’m hopeful we can find a solution that provides proper MacOS support without disrupting the experience for Linux and Windows users. |
I'm sorry for the delay - apparently my notifications went to an old work email that I no longer have. I actually think that we don't need to make any changes to the sd-scripts submodule. I'll try to test that here in a bit. My only concern is what happens with conflicting versions between the mac requirements files and the generic requirements file? Would it make more sense to have just one file for mac and then not reference the generic requirements one in the mac setups? And happy to help! Thank you for taking the time to respond. I appreciate it. |
I believe the separate requirements file for Mac is the best approach. If my memory serves me correctly, I think it’s possible to achieve this via a parameter. You can build your solution around this concept. |
Ok, I'll take a look at this soon and get back to you with some pieces. I
wasn't able to get the gui windows to pop up on Mac for example if you
start training and the model already exists and it tries to do a
confirmation window. Is this a known issue or something odd with my setup?
…On Fri, Sep 20, 2024 at 2:14 PM bmaltais ***@***.***> wrote:
I believe the separate requirements file for Mac is the best approach. If
my memory serves me correctly, I think it’s possible to achieve this via a
parameter. You can build your solution around this concept.
—
Reply to this email directly, view it on GitHub
<#2830 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AJCJJ3GEW3L6BWRKQFFEHWDZXR62NAVCNFSM6AAAAABOJSBQXOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGNRUGU2TCOJWGU>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
--
Sincerely,
Joey Overby
|
I will be away for a week so no rush. I will not be able to work on the GUI for quite a bit. |
mps has correctness issues and can't be relied on for training a model. however MLX or Tinygrad do not rely on MPS and have proper results. i've never seen good results from training on mps, and i've supported it in simpletuner since january. |
Interesting.... I've been trying to train a model for a bit now and it
isn't going well. Do you have some documentation on what the problem is?
(Also so I know when/if it's fixed). And I'll check out MLX/Tinygrad here
as well. Thank you.
…On Sat, Sep 21, 2024 at 7:34 PM Bagheera ***@***.***> wrote:
mps has correctness issues and can't be relied on for training a model.
however MLX or Tinygrad do not rely on MPS and have proper results. i've
never seen good results from training on mps, and i've supported it in
simpletuner since january.
—
Reply to this email directly, view it on GitHub
<#2830 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AJCJJ3GGDBKHLUSXRSU7NZLZXYNDZAVCNFSM6AAAAABOJSBQXOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGNRVGQYDKNRRGY>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
--
Sincerely,
Joey Overby
|
the problem is probably an overflow inside pytorch's MPS code that has yet to be discovered. if you go to the pytorch issue tracker and search for the only only reason to support MPS for pytorch in this repo (or the original) is for the maintainer of the repo to be able to directly run the code on their apple development workstation. this is the only reason that i have it supported in simpletuner. apple machines have upsides for ML development:
they have downsides:
it can be almost a part-time job to keep MPS and CUDA working together, and in the case of pytorch, it's actually several full-time jobs on their end. if bmaltais or kohya_tech personally never have to run on MPS then i would just run as far and as fast as possible in the opposite direction and never touch that stack. Apple users are better-served by an architecture-specific training framework, if one even exists. |
I believe nightly is adding AMP for MPS now.
Not gonna lie i wish someone would work an a sd training script on mlx :S |
This is really just a POC - and not a polished PR. This is mostly an attempt to just get this working and then hopefully the owner of the original repo will grant me the ability to put a PR into that one so others can help clean up the code.
My only focus of this was to get it to work for Textual Inversion training - I don't know if it works for the other functionality.
(IMPORTANT: I have a PR for the sd-scripts here, but I'm VERY unsure if it's correct... will update later!)
Fixes
Notes
A lot of manual steps were needed while trying to get the packages working. I will try a clean install later, but for now I'm putting in the notes I took while doing this in case someone else wants to do this as well.
Full List of Installed Packages
I'll touch back up the install scripts (and requirements files), but for now I wanted to give everyone the packages and versions that worked for me!
Manual commands I ran to get to this point
Verify MPS Installed Correctly
Successful Config
This is a copy of my successful config for running Textual Inversion Training (obviously fill in your paths and names for what you want/need) .
And by successful, I mean it ran. Not that it did a great job. Still working on adjusting the parameters - but hopefully this will give you all of the settings you'd need to at least run (as figuring out things like
float
andAdamW
took me awhile).