
Unable to remove properties from the deals schema #119

Open
dkarzon opened this issue May 15, 2020 · 6 comments

@dkarzon

dkarzon commented May 15, 2020

I am trying to set up a HubSpot tap with a Postgres target, and I keep getting an error about Postgres trying to create a table with more than 1600 columns in it, even though at the time my schema only had 4 properties.

However, going through the code, it looks like when deals is selected as a stream, the schema is automatically built from the JSON output of this API call: https://api.hubapi.com/properties/v1/deals/properties
See code here: https://github.com/singer-io/tap-hubspot/blob/master/tap_hubspot/__init__.py#L191
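For context, the dynamic schema construction can be sketched roughly like this (a simplified, hypothetical rendition; the function name is illustrative, and the real logic is in the linked `__init__.py`):

```python
# Simplified sketch of how tap-hubspot derives the deals schema from the
# properties API response (illustrative only; see the linked __init__.py
# for the actual implementation).

def schema_from_properties(properties: list) -> dict:
    """Fold every property returned by
    https://api.hubapi.com/properties/v1/deals/properties into the stream
    schema -- note that catalog selection is never consulted here."""
    schema = {"type": "object", "properties": {}}
    for prop in properties:
        schema["properties"][prop["name"]] = {"type": ["null", "string"]}
    return schema

# Every property the portal defines ends up in the emitted schema:
api_response = [{"name": "dealname"}, {"name": "dealstage"}]
print(sorted(schema_from_properties(api_response)["properties"]))
# ['dealname', 'dealstage']
```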

Is there a way to modify the computed schema for the deals stream at all, to remove the properties that I don't need? I haven't been able to find a way to do that so far.

My deals schema:

{
"streams": [
    {
        "stream": "deals",
        "tap_stream_id": "deals",
        "key_properties": ["dealId"],
        "schema": {
            "type": "object",
            "properties": {
                "portalId": {
                    "type": [
                        "null",
                        "integer"
                    ]
                },
                "dealId": {
                    "type": [
                        "null",
                        "integer"
                    ]
                },
                "dealname": {
                    "type": [
                        "null",
                        "string"
                    ]
                },
                "dealstage": {
                    "type": [
                        "null",
                        "string"
                    ]
                }
            }
        },
        "metadata": [
            {
                "breadcrumb": [ ],
                "metadata": {
                    "selected": true,
                    "table-key-properties": [
                        "dealId"
                    ],
                    "forced-replication-method": "INCREMENTAL",
                    "valid-replication-keys": [
                        "hs_lastmodifieddate"
                    ]
                }
            }
        ]
    }
]}
@zyanichaimaa

Have you fixed your problem?

@gmontanola

The same is happening to me. Has anyone had any luck with this?

@briansloane
Contributor

Are you using the catalog to choose the fields that you want via selected metadata? That should allow you to limit the fields that get emitted even though the schema has all the properties in it.
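For anyone trying this, per-field selection lives in the catalog's metadata entries, keyed by breadcrumb. A hypothetical entry deselecting one property would look like this (the property name is illustrative):

```json
{
    "breadcrumb": ["properties", "some_unwanted_property"],
    "metadata": {
        "selected": false
    }
}
```

Note that, as the comments below observe, this limits which fields get emitted but does not change the schema the tap sends to the target.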

@gmontanola

Yes, I am! I've selected only 4-5 properties using "selected": true, and the others are explicitly set to false.

@gmontanola

gmontanola commented Sep 9, 2020

Well, I've done some testing and:

  1. The schema is generated using all the available properties for an object (not just the selected ones, as @dkarzon described).
  2. The 1600-column limit is reached because each property is an object with 4 keys (value, timestamp, source, sourceId), and this results in 4 new columns per property.
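The arithmetic checks out: with the 4-key explosion, a portal needs only around 400 deal properties to hit Postgres's hard 1600-column ceiling. A quick sanity check (the property count of 400 is a hypothetical example):

```python
# Why ~400 HubSpot deal properties blow past Postgres's column limit when
# each property is flattened into 4 sub-columns by the target.
SUB_KEYS = ("value", "timestamp", "source", "sourceId")
POSTGRES_COLUMN_LIMIT = 1600  # hard per-table limit in Postgres

def flattened_columns(num_properties: int) -> int:
    """Columns a flattening target must create for the deals table."""
    return num_properties * len(SUB_KEYS)

print(flattened_columns(400))  # 1600 -- already at the Postgres ceiling
```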

@staufman

staufman commented Feb 4, 2021

In case anyone else is hitting this, it's a real bummer. I'm not intimately familiar with the code in this repo but for now, I went into tap_hubspot/__init__.py (locally) to line 149 and changed it from if extras: to if False and extras:.

Yes, it's a hack and yes, I don't quite understand the ramifications of not syncing the extra data associated with properties. At the same time, it prevents the explosion of columns needed to pipe the data in Postgres which might be all people need.
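To make the effect of that change concrete, here is a hypothetical sketch (not the tap's actual code) of what the `extras` branch contributes to each property's schema:

```python
# Sketch of what skipping the `extras` branch changes (illustrative only;
# the real logic lives in tap_hubspot/__init__.py around line 149).

def property_schema(include_extras: bool) -> dict:
    """With extras, each property is an object of 4 sub-fields, which a SQL
    target like target-postgres flattens into 4 columns; without extras,
    it stays a single scalar column."""
    value_type = {"type": ["null", "string"]}
    if include_extras:  # the branch the `if False and extras:` patch disables
        return {
            "type": "object",
            "properties": {
                "value": value_type,
                "timestamp": {"type": ["null", "integer"]},
                "source": value_type,
                "sourceId": value_type,
            },
        }
    return value_type

print(len(property_schema(True)["properties"]))  # 4 columns per property
print("properties" in property_schema(False))    # False -> 1 column per property
```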
