feat: flatten the query dataframe for contracts #2196

Closed
9 changes: 8 additions & 1 deletion src/ape/contracts/base.py
@@ -648,6 +648,7 @@ def query(
f"'stop={stop_block}' cannot be greater than "
f"the chain length ({self.chain_manager.blocks.height})."
)

query: dict = {
"columns": list(ContractLog.model_fields) if columns[0] == "*" else columns,
"event": self.abi,
@@ -665,7 +666,13 @@
)
columns_ls = validate_and_expand_columns(columns, ContractLog)
data = map(partial(extract_fields, columns=columns_ls), contract_events)
-        return pd.DataFrame(columns=columns_ls, data=data)
+        df = pd.DataFrame(columns=columns_ls, data=data)
+        # NOTE: The check below looks for `event_arguments` in the request;
+        # if it is present, we flatten that field.
**Member:**
So originally my idea was that `"*"` or specific event-argument names mentioned in `columns` would grab those fields from the logs specifically. Only if `event_arguments` itself was present in `columns` would you keep it; otherwise it should be flattened in the resulting dataframe.

**Contributor Author:**

Yeah, I went through all of Ape trying to find a good place to make this manipulation, and it looked pretty painful. I tried to make this happen earlier, but there is no information available about what those dict keys are going to be in the `event_arguments` field; I can't know until we make the query.

**Member:**

Is the comment not correct, then? I would think that unless `event_arguments` is in the request, we should flatten that field.

**Contributor Author:**

I was flattening it either way. If you want:

`columns="name,event_arguments"`

I planned to flatten `event_arguments`. We don't have to, though; I wasn't sure whether we always want to flatten it or not.

**Member:**

I think it's good to think through the scenario where one of the event arguments is called `name`, or something else that `ContractLog` has.

**Member:**

> I think it's good to think through the scenario where one of the event arguments might be called name or

`contract_address` is the one that usually bites me... I always mix up whether it's the one on `ContractLog` or a custom event input.

**Contributor Author:**

> I think it's good to think through the scenario where one of the event arguments might be called name or something else that ContractLog has

This is something I was contemplating here. If that were to happen, pandas will automatically rename both columns to `name_x` and `name_y`.

Do we want to keep `event_arguments` in the column names? In the case of the Uniswap V2 contract, we'd get `event_arguments.token0`, `event_arguments.token1`, `event_arguments.pair`, `event_arguments.`
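For what it's worth, the collision scenario can be sketched in isolation with hypothetical data. One caveat worth noting: `pd.concat`, as used in this PR, keeps both columns under the duplicate label, while the `_x`/`_y` suffixes come from `pd.merge`.

```python
import pandas as pd

# Hypothetical frame shaped like query()'s output: a ContractLog-level
# "name" column plus packed event_arguments dicts that also have a "name" key.
df = pd.DataFrame({
    "name": ["LogA", "LogB"],
    "event_arguments": [{"name": "x", "pair": "0x1"}, {"name": "y", "pair": "0x2"}],
})

# Same flattening technique as the PR: expand each dict into columns.
flattened = df["event_arguments"].apply(pd.Series)
out = pd.concat([df.drop("event_arguments", axis=1), flattened], axis=1)

# pd.concat keeps both columns under the duplicate label "name";
# pd.merge (with default suffixes) is what would produce name_x / name_y.
print(list(out.columns))  # ['name', 'name', 'pair']
```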

**Member:**

> Do we want to keep event_arguments in the column names? In the case of the uniswap v2 contract, we'd get event_arguments.token0, event_arguments.token1, event_arguments.pair, event_arguments.

I don't think so; it seems much more natural to me to work with it like this:

```python
df = factory.PoolCreated.query("pair", ...)
pools = [pool(p) for p in df["pair"]]
df = pd.concat(
    [df["pair"], pd.DataFrame(dict(token0=p.token0_balance, token1=p.token1_balance) for p in pools)],
    axis=1,
)
df
```

| pair  | token0 | token1 |
| ----- | ------ | ------ |
| 0x... | 123... | 456... |
| ...   | ...    | ...    |

**Contributor Author (@johnson2427, Aug 1, 2024):**

> Do we want to keep event_arguments in the column names? In the case of the uniswap v2 contract, we'd get event_arguments.token0, event_arguments.token1, event_arguments.pair, event_arguments.
>
> I don't think so; it seems much more natural to me to work with it like this:
>
> ```python
> df = factory.PoolCreated.query("pair", ...)
> pools = [pool(p) for p in df["pair"]]
> df = pd.concat(
>     [df["pair"], pd.DataFrame(dict(token0=p.token0_balance, token1=p.token1_balance) for p in pools)],
>     axis=1,
> )
> df
> ```
>
> | pair  | token0 | token1 |
> | ----- | ------ | ------ |
> | 0x... | 123... | 456... |

I don't disagree. For this contract (Uniswap V2), we do have an empty field; I think it's an ID? Maybe we just drop the empty field? I don't know if we want that or not.
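If we did want to drop it, here is a minimal sketch, assuming the unnamed event input flattens into a column whose label is the empty string (hypothetical data, not an actual Ape query):

```python
import pandas as pd

# Hypothetical flattened frame: the unnamed Uniswap V2 event input
# shows up as a column labeled with the empty string.
df = pd.DataFrame({"pair": ["0xP1"], "token0": ["0xA"], "": [12345]})

# Drop any empty-labeled columns left over from unnamed event inputs.
df = df.drop(columns=[c for c in df.columns if c == ""])
print(list(df.columns))  # ['pair', 'token0']
```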

If we want the ability to query for sub-fields, it's going to take quite a bit of work. In `BaseContractLog` we have the `event_arguments` field as a dict. Since that field can be anything depending on the contract, I believe we'd need a `model_validator` in that model where we pull apart `event_arguments` and set the keys of that value as keys of the model itself.

```python
@model_validator(mode="before")
@classmethod
def validate_event_args(cls, values):
    if event_args := values.get("event_arguments"):
        for key, value in event_args.items():
            # NOTE: we can check values.keys() to make sure we aren't duplicating values
            values[key] = value
    return values
```

I'm not 100% sure this will work; I've never created fields in a model on the fly before. But if it does, we'd have to ensure the downstream functionality remains intact. As long as we don't remove `event_arguments`, we should be okay. Maybe we add something to the field names we create so we know those are generated fields? Not sure.
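For what it's worth, the idea can be sketched standalone in Pydantic v2. The `Log` model and field names below are hypothetical stand-ins, not Ape's actual `BaseContractLog`; note that `extra="allow"` is needed for the promoted keys to be kept as attributes.

```python
from pydantic import BaseModel, ConfigDict, model_validator


class Log(BaseModel):
    # Hypothetical stand-in for BaseContractLog, not Ape's actual model.
    # extra="allow" lets keys promoted from event_arguments become attributes;
    # without it, unknown keys raise a ValidationError.
    model_config = ConfigDict(extra="allow")

    name: str
    event_arguments: dict = {}

    @model_validator(mode="before")
    @classmethod
    def promote_event_args(cls, values):
        if isinstance(values, dict) and (args := values.get("event_arguments")):
            for key, val in args.items():
                values.setdefault(key, val)  # don't clobber real log-level fields
        return values


log = Log(name="PoolCreated", event_arguments={"pair": "0x1"})
print(log.pair)             # 0x1
print(log.event_arguments)  # {'pair': '0x1'}
```

Using `setdefault` sidesteps the name-collision concern above: a promoted key never overwrites a field the log already carries.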

+        if "event_arguments" in columns_ls:
+            event_arguments = df["event_arguments"].apply(pd.Series)
+            df = pd.concat([df.drop("event_arguments", axis=1), event_arguments], axis=1)
+        return df
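The added flattening can be exercised on its own with hypothetical data (a sketch, not an actual Ape query):

```python
import pandas as pd

columns_ls = ["block_number", "event_arguments"]

# Hypothetical query result: event_arguments arrives as packed dicts.
df = pd.DataFrame({
    "block_number": [100, 101],
    "event_arguments": [
        {"token0": "0xA", "pair": "0xP1"},
        {"token0": "0xB", "pair": "0xP2"},
    ],
})

# Same flattening as in the diff: expand each event_arguments dict into
# one column per key, then drop the packed column.
if "event_arguments" in columns_ls:
    event_arguments = df["event_arguments"].apply(pd.Series)
    df = pd.concat([df.drop("event_arguments", axis=1), event_arguments], axis=1)

print(list(df.columns))  # ['block_number', 'token0', 'pair']
```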

def range(
self,