
Adaptive lora #66

Open · hage1005 wants to merge 12 commits into main
Conversation

hage1005 (Collaborator) commented Dec 27, 2023

Add an adaptive_threshold option to dynamically determine the rank.
Experiment results with MNIST are shown below.
To use this feature, set compression_ratio_by_covariance or compression_ratio_by_memory in config.yaml:

lora:
  init: pca
  compression_ratio_by_covariance: 0.8

This will determine the rank needed for PCA to explain 80% of the covariance.
or

lora:
  init: pca
  compression_ratio_by_memory: 0.8

This will determine the rank that compresses gradient memory to 80%.
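For intuition, here is a rough sketch of how a coverage-based rank could be computed from a layer's gradient covariance. This is illustrative only; the function name select_rank_by_coverage is hypothetical and not part of this PR.

import torch

def select_rank_by_coverage(covariance: torch.Tensor, ratio: float = 0.8) -> int:
    # Eigenvalues of the (symmetric) gradient covariance, sorted in descending order.
    eigvals = torch.linalg.eigvalsh(covariance).flip(0).clamp(min=0.0)
    # Fraction of total variance explained by the top-k eigenvalues, for every k.
    explained = torch.cumsum(eigvals, dim=0) / eigvals.sum()
    # Smallest rank whose leading eigenvalues reach the requested coverage.
    rank = int(torch.searchsorted(explained, torch.tensor(ratio)).item()) + 1
    return min(rank, eigvals.numel())

With ratio=0.8 this picks, per layer, the smallest rank explaining 80% of the covariance, which matches the intent of the compression_ratio_by_covariance setting above.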

[Figures: MNIST experiment results]

sangkeun00 (Collaborator) commented:
Thanks for this feature and your analysis! This is a great initial effort. However, there are quite a few things to figure out before I can merge this PR. For example:

  • Budget-based vs coverage-based: In your implementation, you fix the covariance coverage and determine the rank for each layer accordingly. However, this could lead to a low compression ratio. Instead, we can think about a budget-based approach, where we fix the number of parameters to be tracked and set each layer's rank accordingly based on the covariance distribution.
  • Compatibility with analog.initialize_from_log(): If the rank is determined adaptively for each layer, we have to save this rank structure somewhere so that we can recover the exact LoRA structure when initializing from log.

hage1005 (Collaborator, Author) commented Dec 30, 2023

  • Budget-based vs coverage-based: In your implementation, you fix the covariance coverage and determine the rank for each layer accordingly.

Yeah, this indeed might lead to a low compression ratio, although for the MNIST case the compression ratio seems reasonable. But regarding "fix the number of parameters": how do we determine this number from the percentage covariance threshold? Should we put all singular values across layers together and sort them?
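One way to make the "sort all singular values across layers" idea concrete, as a rough sketch only (allocate_ranks_by_budget and the covariances mapping are hypothetical names, not from this PR):

import torch

def allocate_ranks_by_budget(covariances: dict, total_rank_budget: int) -> dict:
    # Pool eigenvalues from every layer's gradient covariance into one global list.
    pooled = []
    for name, cov in covariances.items():
        eigvals = torch.linalg.eigvalsh(cov).flip(0).clamp(min=0.0)
        pooled.extend((float(v), name) for v in eigvals)
    # Spend the global rank budget on the largest eigenvalues, wherever they occur.
    pooled.sort(reverse=True)
    ranks = {name: 0 for name in covariances}
    for _, name in pooled[:total_rank_budget]:
        ranks[name] += 1
    return ranks

A parameter budget (rather than a rank budget) would additionally have to weight each selection by the per-rank parameter cost of its layer.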

  • Compatibility with analog.initialize_from_log(): If the rank is determined adaptively for each layer, we have to save this rank structure somewhere so that we can recover the exact LoRA structure when initializing from log.

Thanks for catching this!

hage1005 self-assigned this on Dec 30, 2023
sangkeun00 (Collaborator) commented:

I don't have a concrete answer to the first question at the moment, and I believe this is largely a research question (which is exciting). Much of the literature on communication-efficient distributed training, which also does gradient compression, applies different compression ratios across layers. You can probably review that work, try/develop new ideas, and find the one that works best. Once we have this, we can merge this PR!

sangkeun00 (Collaborator) commented:

Also, we can think about using different ranks for the forward and backward passes. From the implementation perspective, you could allow users to pass a tuple of (rank_fwd, rank_bwd) for this. If a user passes an integer value, we use it to set both rank_fwd and rank_bwd. This is somewhat similar to setting kernel_size or stride in nn.Conv in PyTorch.
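A minimal sketch of the int-or-tuple handling being suggested here, mirroring how PyTorch normalizes kernel_size; the helper name _parse_rank is hypothetical.

from typing import Tuple, Union

def _parse_rank(rank: Union[int, Tuple[int, int]]) -> Tuple[int, int]:
    # A single integer applies to both passes; a pair is (rank_fwd, rank_bwd).
    if isinstance(rank, int):
        return rank, rank
    rank_fwd, rank_bwd = rank
    return rank_fwd, rank_bwd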

hage1005 (Collaborator, Author) commented Jan 8, 2024

Instead of explicitly saving the rank information, I recover it from the shape of the LoRA weight matrix. Let me know if this seems okay!
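For illustration, recovering per-layer ranks from saved weight shapes might look roughly like the sketch below; the key-naming convention ("lora_A" holding a (rank, in_features) matrix) is an assumption here, not necessarily what this PR uses.

import torch

def recover_lora_ranks(state_dict: dict) -> dict:
    # Infer each layer's rank from the saved LoRA down-projection weight shapes.
    # Assumes keys containing "lora_A" hold (rank, in_features) matrices.
    return {
        name: weight.shape[0]
        for name, weight in state_dict.items()
        if "lora_A" in name
    }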

sangkeun00 (Collaborator) left a comment:


Also, would you be able to come up with some basic unit tests for lora? This would be massively helpful for future development!
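As a starting point, such a test might look roughly like this; it reuses the hypothetical select_rank_by_coverage helper sketched earlier in this thread and is not tied to the actual package layout.

import unittest
import torch

# Assumes select_rank_by_coverage (sketched earlier) is importable from wherever it lives.

class TestAdaptiveLoraRank(unittest.TestCase):
    def test_rank_grows_with_coverage_ratio(self):
        # Diagonal covariance with a known eigenvalue spectrum.
        cov = torch.diag(torch.tensor([4.0, 2.0, 1.0, 0.5]))
        low = select_rank_by_coverage(cov, ratio=0.5)
        high = select_rank_by_coverage(cov, ratio=0.9)
        self.assertLessEqual(low, high)
        self.assertLessEqual(high, cov.shape[0])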

@@ -97,11 +97,16 @@ def _sanity_check(self):
)
self._log["grad"] = True

def eval(self):
def eval(self, log="grad"):

Instead of having "grad" as a default value, what do you think about having None as a default value, and when it's None we set it to "grad" with a warning message like:

def eval(self, log=None):
    if log is None:
        get_logger().warning("we automatically set 'log' to 'grad'. if this is not a desired behavior, please explicitly set your 'log' value.")
        log = "grad"

    if isinstance(log, str):
        ...

Comment on lines +90 to +91
if __name__ == "__main__":
unittest.main()

Just a suggestion: I am not sure if these two lines are necessary; let's try to follow the format of the other tests!

Comment on lines +82 to +83
print(if_scores)
# torch.save(if_scores, f"{os.path.dirname(os.path.abspath(__file__))}/data/if_analog_lora.pt")
eatpk (Collaborator) commented Feb 4, 2024

Let's remove the comments! Also, if we are going to print the scores (or anything else) in the test, I would love to see it in a formatted way!

Comment on lines +7 to +15
def construct_mlp(num_inputs=784, num_classes=10):
return torch.nn.Sequential(
nn.Flatten(),
nn.Linear(num_inputs, 4, bias=False),
nn.ReLU(),
nn.Linear(4, 2, bias=False),
nn.ReLU(),
nn.Linear(2, num_classes, bias=False),
)

I think this can only be used for MNIST data; I am not sure it can be a general util function. Let me know your thoughts!

if_scores = if_scores.numpy().tolist()
torch.save(if_scores, "if_analog_pca.pt")
if_scores = if_scores.numpy().tolist()[0]
torch.save(if_scores, f"if_analog_pca.pt")

nit: f-string unnecessary.

@@ -91,6 +89,6 @@

# Save
if_scores = if_scores.numpy().tolist()[0]
torch.save(if_scores, "if_analog_scheduler.pt")
torch.save(if_scores, f"if_analog_scheduler.pt")

ditto.
