Support for variable referencing #49
Thanks @alexanderswerdlow! Just to concretize things a bit: could you suggest some syntax for what you're describing here? My weak prior on this is that these sorts of "driven" config parameters result in unnecessary complexity, and there are often simple workarounds, like defining a `__post_init__`.
Of course! I think I should clarify that my initial comment mentions two related but distinct features. The first is just plain variable referencing. For example, I've recently been working with an architecture that can turn some input image into a set of latent vectors or "slots." The number of slots is a hyperparameter (`num_slots`) that is needed in several places. Conditioning the model init on the dataset config does work, although there are a couple of issues with that:
Hydra's interpolation works for objects as well as primitives, so in the example below an entire config node can be referenced, not just a single value.
Furthermore, if you do go with the alternative and pass several individual objects, it becomes difficult when you have a messy dependency graph. The model/viz is often dependent on the dataset, the viz is dependent on the model, and some parts of the model are dependent on others. Keeping this modular for experimentation requires letting these configs reference each other. As for syntax, I'd likely need to spend more time thinking about it, but my current Hydra config looks something like this, using the interpolation syntax:
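(Sketching it with OmegaConf's Python API rather than YAML; the keys here are illustrative.)

```python
from omegaconf import OmegaConf

cfg = OmegaConf.create(
    {
        "dataset": {"num_slots": 7},
        "model": {
            # Primitive interpolation: resolves to dataset.num_slots.
            "num_slots": "${dataset.num_slots}",
        },
        # Object interpolation: an entire node can be referenced too.
        "viz": {"dataset": "${dataset}"},
    }
)
assert cfg.model.num_slots == 7
assert cfg.viz.dataset.num_slots == 7
```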
Here is an example in Tyro (not one-to-one with the example above, to keep things concise):
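Something along these lines, where the `Ref` in the comment is hypothetical syntax rather than anything tyro supports today:

```python
import dataclasses

import tyro


@dataclasses.dataclass
class DatasetConfig:
    num_slots: int = 7


@dataclasses.dataclass
class ModelConfig:
    # Hypothetically, something like
    #   num_slots: int = Ref("dataset.num_slots")
    # so the model inherits whatever the experiment's dataset defines.
    num_slots: int = 7


@dataclasses.dataclass
class ExperimentConfig:
    dataset: DatasetConfig = dataclasses.field(default_factory=DatasetConfig)
    model: ModelConfig = dataclasses.field(default_factory=ModelConfig)


print(tyro.cli(ExperimentConfig))
```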
Now obviously this wouldn't work exactly as-is, because there is a circular dependency in the definitions here. In my Tyro example above, I go up (so to speak) to the experiment config and then back down to the dataset, but a single overarching namespace could be simpler to implement (e.g., referencing only works for a set of pre-defined keys, not from any arbitrary container).

The second feature is much smaller in both impact and difficulty: the ability to perform expressions on variable references. You are absolutely right that declaring an extra parameter covers many cases. However, say you have an input image that is downscaled by some factor before reaching the model; the downstream modules then need the derived resolution, and it would be convenient to compute it in the config itself. Hope that makes sense and I'm happy to explain further! Also totally understand if this is out of scope.
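For the downscaling case, in OmegaConf terms the kind of expression I mean would look something like this (the `div` resolver and the keys are made up for illustration):

```python
from omegaconf import OmegaConf

# Custom resolvers are how OmegaConf supports expression-like interpolation.
OmegaConf.register_new_resolver("div", lambda a, b: a // b)

cfg = OmegaConf.create(
    {
        "image_size": 256,
        "downscale_factor": 4,
        # The latent resolution is derived from the two values above.
        "latent_size": "${div:${image_size},${downscale_factor}}",
    }
)
assert cfg.latent_size == 64
```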
Yes, that makes sense! For variable references, I'm curious about your thoughts on a few options. One is adapting `__post_init__`:

```python
import dataclasses

import tyro


@dataclasses.dataclass
class ModelConfig:
    num_slots: int = -1


@dataclasses.dataclass
class TrainConfig:
    num_slots: int
    model: ModelConfig

    def __post_init__(self) -> None:
        if self.model.num_slots == -1:
            self.model.num_slots = self.num_slots


print(tyro.cli(TrainConfig))
```

In this case `-1` acts as a placeholder meaning "copy from the parent". Of course you can add more complex logic in your `__post_init__`.

A potential downside of this is that you won't be able to use an explicit `-1` from the command line, since it's reserved as the placeholder value.

An alternative option that might be used to circumvent this downside — it's a bit hacky and I wouldn't recommend it, but should work and is in the tests for the `tyro.conf.arg(prefix_name=False)` behavior:

```python
import dataclasses
from typing import Annotated

import tyro


@dataclasses.dataclass
class ModelConfig:
    num_slots: Annotated[int, tyro.conf.arg(prefix_name=False)]


@dataclasses.dataclass
class TrainConfig:
    num_slots: int
    model: ModelConfig

    # edit: the next few lines were unintentionally included
    # def __post_init__(self) -> None:
    #     if self.model.num_slots == -1:
    #         self.model.num_slots = self.num_slots


print(tyro.cli(TrainConfig))
```

Again, a single `--num-slots` argument should now populate both `num_slots` fields.
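Roughly what I'd expect at the command line for these two approaches (assuming tyro's default kebab-case flag naming):

```python
# Expected behavior sketches (not actual program output):
#
# __post_init__ approach:
#   python train.py --num-slots 4
#     -> TrainConfig(num_slots=4, model=ModelConfig(num_slots=4))
#   python train.py --num-slots 4 --model.num-slots 2
#     -> TrainConfig(num_slots=4, model=ModelConfig(num_slots=2))
#
# prefix_name=False approach: the nested field loses its prefix, so both
# fields collapse into a single flag:
#   python train.py --num-slots 4
#     -> TrainConfig(num_slots=4, model=ModelConfig(num_slots=4))
```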
Sorry for the delay, and clever idea! Before I go on: I assume for the 2nd example, you didn't intend to include the `__post_init__`? Some thoughts:
The first option allows arbitrary configuration with expressions, but the syntax is a little unwieldy. The second option, on the other hand (from what I can tell), essentially gives you a single global namespace (simply without a prefix) to perform referencing. Most use cases are probably fine with a global namespace, but I think a core issue remains (for my use case at least).
In other words, say I have an MLP class (dataclass config or actual class); I might want different experiments to use that same MLP in different ways (likely multiple times within the same experiment). That rules out the 2nd approach, but even the 1st approach is difficult. From what I can tell, the user would need to make two distinct higher-level configs (to allow for a different `__post_init__` in each; see the sketch below).

Now I certainly see that this might not be an issue for many, and this approach might make a lot of sense for them! I happen to need things particularly modular for experimentation, which is also why I gravitate towards instantiating things directly. Doing so removes an intermediate step that needs to be constantly updated.
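To illustrate the duplication (a sketch; the names and the `* 2` wiring are hypothetical):

```python
import dataclasses


@dataclasses.dataclass
class MLPConfig:
    dim: int = -1  # -1 means "fill in from the parent config".


@dataclasses.dataclass
class ExpA:
    dim: int = 64
    mlp: MLPConfig = dataclasses.field(default_factory=MLPConfig)

    def __post_init__(self) -> None:
        if self.mlp.dim == -1:
            self.mlp.dim = self.dim


@dataclasses.dataclass
class ExpB:
    dim: int = 64
    mlp: MLPConfig = dataclasses.field(default_factory=MLPConfig)

    def __post_init__(self) -> None:
        # Same MLP, different wiring: a whole new class just for this line.
        if self.mlp.dim == -1:
            self.mlp.dim = self.dim * 2
```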
Thanks for clarifying! Two followup questions:
(1) So to re-state: it would be nice to be able to define a config schema via dataclasses, and then define relationships between values in it when you instantiate configs?
(2) I'm not totally following what "instantiating things directly" is referring to. Is this referencing the …?

To try and resolve (1), what about creating some subcommands? When you instantiate each subcommand, the default values for each field can be computed from whatever logic you want:

```python
import dataclasses
from typing import Dict

import tyro


@dataclasses.dataclass
class ModelConfig:
    num_slots: int


@dataclasses.dataclass
class TrainConfig:
    num_slots: int
    model: ModelConfig


subcommands: Dict[str, TrainConfig] = {}

# First experiment.
subcommands["exp1"] = TrainConfig(
    num_slots=2,
    model=ModelConfig(num_slots=2),
)

# Second experiment.
num_slots = 4
subcommands["exp2"] = TrainConfig(
    num_slots=num_slots,
    model=ModelConfig(num_slots=num_slots * 2),
)

config = tyro.cli(
    tyro.extras.subcommand_type_from_defaults(subcommands)
)
print(config)
```

Of course, since everything is Python, you can also generate this dictionary programmatically (see the sketch at the end of this comment). Perhaps the downside here is that the relationships only hold for the defaults: overriding one value from the command line won't recompute the values that were derived from it.

In general I think there's still a disconnect where I don't fully follow what limitation makes modularity/hierarchy harder than in Hydra. When I read the specializing configs docs in Hydra nothing stands out to me — both the …
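For instance, continuing the snippet above (sweep values hypothetical):

```python
# Generate the subcommand dictionary programmatically, e.g. from a sweep.
for n in (2, 4, 8):
    subcommands[f"slots-{n}"] = TrainConfig(
        num_slots=n,
        model=ModelConfig(num_slots=n * 2),
    )
```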
As an FYI, I'm also going to raise an error in this case (the `prefix_name=False` name collision above); it just feels too hacky.
Hi! Just wanted to ask if there is a canonical/recommended way of doing this. My use case is quite simple, e.g.:

```python
from dataclasses import dataclass


@dataclass
class MambaCfg:
    embed_dim: int
    # Other params
    y: float
    z: str


@dataclass
class AttentionCfg:
    embed_dim: int
    # Other params
    g: float
    f: str


@dataclass
class BlockCfg:
    embed_dim: int
    # e.g. Union of types with 'embed_dim' attribute.
    # Can we set it automatically from the BlockCfg embed_dim?
    layer: AttentionCfg | MambaCfg
```

In this case, if I set `embed_dim` on `BlockCfg`, I'd like the chosen `layer` to pick up the same value automatically. One workaround I've been experimenting with is generating "partial" config types that defer the shared parameter:

```python
from dataclasses import asdict, fields, make_dataclass
from typing import Annotated, Any, Type, TypeAlias, TypeVar, Union

import tyro

T = TypeVar("T")

# Just an example; would need to be updated to preserve helptext, etc.
def create_partial_type(cls: Type[T], committed_param: str) -> Type:
    class_fields = [
        (f.name, f.type, f) for f in fields(cls) if f.name != committed_param
    ]

    def __call__(self, committed_value: Any) -> T:
        all_args = {**asdict(self), committed_param: committed_value}
        return cls(**all_args)

    partial_cls = make_dataclass(
        f"Partial{cls.__name__}",
        fields=class_fields,
        namespace={"__call__": __call__},
    )
    return partial_cls


# For the previous use case:
PartialLayerCfg: TypeAlias = Union[
    *(
        Annotated[obj, tyro.conf.arg(constructor=create_partial_type(obj, "embed_dim"))]
        for obj in (AttentionCfg, MambaCfg)
    )
]


class BlockCfg:
    def __init__(self, embed_dim: int, layer: PartialLayerCfg) -> None:
        self.embed_dim = embed_dim
        self.layer = layer(embed_dim)
```

edit: Another approach I was considering was allowing the outer-most callable/type to have its namespace be accessible, so that one could specify something like …
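For reference, roughly how the partial type behaves outside of tyro (field values hypothetical):

```python
PartialAttentionCfg = create_partial_type(AttentionCfg, "embed_dim")

# tyro would populate the partial from the CLI; committing embed_dim
# afterwards produces the full config.
partial = PartialAttentionCfg(g=0.5, f="gelu")
layer_cfg = partial(512)  # AttentionCfg(embed_dim=512, g=0.5, f="gelu")
```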
Hi @mirceamironenco! Unfortunately I don't have a tyro-specific recommendation. I've thought about APIs in the direction of variable interpolation a few times but haven't come up with anything I'm happy with. Usually in these situations I just think about how I would structure things if I were building a pure Python API, for example asking a downstream user to instantiate these config objects in a Jupyter notebook, and then a tyro solution falls out of that. This basically reduces to one of: restructuring the configs so that shared values live in one place, or new syntax on tyro's side.
For the first option, would it be possible to remove `embed_dim` from the nested layer configs entirely? For example:

```python
from dataclasses import dataclass

from torch import nn

import tyro


@dataclass
class MambaCfg:
    # Other params
    y: float
    z: str


@dataclass
class AttentionCfg:
    # Other params
    g: float
    f: str


@dataclass
class BlockCfg[LayerCfg: (MambaCfg, AttentionCfg)]:
    embed_dim: int
    layer: LayerCfg


class Mamba(nn.Module):
    def __init__(self, cfg: BlockCfg[MambaCfg]):
        # Takes a BlockCfg with isinstance(cfg.layer, MambaCfg).
        print(cfg.embed_dim)
        print(cfg.layer.y)
        ...


tyro.cli(BlockCfg)
```

For syntax that looks like …
Hi @brentyi, thank you for the suggestions! I agree with your thought process, and "promoting" the shared parameters is also the solution I would opt for. I think the main frustration/use case for variable referencing came from (my understanding of) the type system and its limitations, i.e. for a setting such as:

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from typing import Optional

import torch.nn as nn


@dataclass
class AttentionCfg:
    num_heads: int = 8
    attn_drop: float = 0.0
    window_size: Optional[int] = None

    def build(self, embed_dim: int) -> Attention:
        return Attention(embed_dim, **asdict(self))


class Attention(nn.Module):
    def __init__(
        self,
        dim: int,
        *,
        num_heads: int = 8,
        attn_drop: float = 0.0,
        window_size: Optional[int] = None,
    ) -> None:
        super().__init__()
        ...
```

Suppose the idea here is that the keyword arguments of `Attention.__init__` are meant to stay in sync with the fields of `AttentionCfg`. If `Attention` later gains a new required argument:

```python
class Attention(nn.Module):
    def __init__(
        self,
        dim: int,
        *,
        foo: float,
        num_heads: int = 8,
        attn_drop: float = 0.0,
        window_size: Optional[int] = None,
    ) -> None:
        super().__init__()
        ...
```

Now `AttentionCfg.build` breaks at runtime (it no longer passes `foo`), and because the arguments are unpacked from a dict, a type checker won't flag the mismatch.
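Concretely, nothing flags the drift until `build` is actually called:

```python
cfg = AttentionCfg()
layer = cfg.build(512)  # TypeError: missing a required keyword-only argument: 'foo'
```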
Thanks for clarifying! Yeah, I can see why it's tough if you don't want to take the config instance as input to your constructor. I'm not fully following why variable referencing would make type safety easier, though. If you had a pattern without the extra `dim` argument:

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from typing import Optional

import torch.nn as nn


@dataclass
class AttentionCfg:
    num_heads: int = 8
    attn_drop: float = 0.0
    window_size: Optional[int] = None

    def build(self) -> Attention:
        return Attention(**asdict(self))


class Attention(nn.Module):
    def __init__(
        self,
        *,
        num_heads: int = 8,
        attn_drop: float = 0.0,
        window_size: Optional[int] = None,
    ) -> None:
        super().__init__()
        ...
```

wouldn't the same signature/dataclass attribute consistency problem still exist?
You are right, that's unclear from the previous message. With variable referencing I would drop the `{Layer}Cfg` classes altogether, and expose the layer constructors to tyro directly:

```python
@dataclass
class BlockCfg:
    embed_dim: int
    # No layer cfgs, just the nn.Module subclasses themselves.
    # Presumably we would have some syntax that's missing here which would
    # realize the referencing, like Annotated[*_mixer, ...=BlockCfg.embed_dim]
    sequence_mixer: Attention | Mamba
    state_mixer: Mlp | MoE
```

Assume all layers have an initial positional argument `embed_dim`. Paying the DRY price and using the layer + cfg-with-same-kwargs option is fine, as the user has a nicer experience, hence my question about type hinting borrowed signatures. In any case, thanks for taking a look at this! Looking a bit at the other referencing implementations, it seems to add quite a layer of complexity, since you have to topologically sort the dependency graph and instantiate things in order, which might lead to a lot of corner cases given tyro's other (nice!) features (but you would know better).
Makes sense, thanks for clarifying! It does seem nice, but yeah, the naive implementations I can think of also all seem pretty complex in terms of both implementation and user experience. I think the use case of wanting to avoid an explicit config object is also lower priority for me personally; the overhead of the extra class is annoying, but that feels outweighed by the usefulness of being able to instantiate/save/restore the config object independently of the module itself. That said, if any new ideas for implementation/syntax occur to you please feel free to share! I'd be interested.
I've been looking for an alternative to Hydra for config management, specifically one that allows for defining configs in Python, and I stumbled across Tyro, which after some experimentation seems like a great library for my use case.
However, one thing that doesn't appear to be possible is referencing a single variable from multiple places in a nested config. As for why this might be needed, it is very common in an ML codebase to require the same parameter in many different places. For example, the number of classification classes might be used in the model construction, visualization, etc.
We might want this value to be dependent on a config group such as the dataset (i.e. each dataset might have a different number of classes). Instead of manually defining each combination of model + dataset, it would be a lot easier to have the model parameters simply reference the dataset parameter, or have them both reference some top-level variable. With Hydra, there is value interpolation that does this.
Since we can define Tyro configs directly in Python, it seems like this could be made much more powerful with support for arbitrary expressions, allowing small pieces of logic to be defined in a configuration (e.g., for a specific top-level config we can have a model parameter be `4 * num_classes`). Clearly, we could simply make the `4` into a new parameter, but there are good reasons we might want it in the config instead. From what I can tell, this type of variable referencing, even without any expressions, is not currently possible with Tyro.