Add form submissions endpoint #96
base: master
Conversation
Hi @dorcieg, thanks for your contribution! In order for us to evaluate and accept your PR, we ask that you sign a contribution license agreement. It's all electronic and will take just minutes.
You did it @dorcieg! Thank you for signing the Singer Contribution License Agreement.
Thanks for the submission! Some quick comments on this.
For the most part this sounds like a useful stream to include, and looks like it's a solid foundation. There are some edge cases that can come up, especially when we're paging through a parent stream and syncing a child stream per parent object. I would imagine that this stream can take some time to get all the way through for large datasets, so it's likely vulnerable to potential interruption (network, timeouts, etc.).
For when we get the time to work on integrating this, a few pieces of information are usually helpful in that process.
- Some information about expected/actual row volume would be good to know. In order to be efficient with users' destinations, we like to be sure that the bookmark is being used correctly, and that there's minimal duplicate data being sent to ensure everything gets replicated.
- Some logs (with any data/PII/creds redacted, of course) of it running would be helpful to illustrate things like bookmark usage through the log messages.
- Snippets of logs of running it with target-stitch and an Import API connection in Stitch are also a good step to validate that the JSON Schema matches the data that comes through. The Stitch API validates each record against the schema and will return an error if a record does not match.
That's all for now! Please let me know if I can elaborate on any of the comments!
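The per-record schema check mentioned in the last bullet can be pictured with a toy validator. This is only a sketch of the idea; the real Import API performs full JSON Schema validation, and the schema fragment below is hypothetical:

```python
def record_matches_schema(record, schema):
    """Toy sketch of the per-record check: each property present on the
    record must have the JSON type its schema declares. (The real
    Import API does full JSON Schema validation.)"""
    type_map = {
        'string': str,
        'integer': int,
        'number': (int, float),
        'boolean': bool,
    }
    for name, spec in schema.get('properties', {}).items():
        expected = type_map.get(spec.get('type'))
        if expected and name in record and not isinstance(record[name], expected):
            return False
    return True

# Hypothetical fragment of a form_submissions schema
schema = {'properties': {'guid': {'type': 'string'},
                         'submittedAt': {'type': 'integer'}}}
record_matches_schema({'guid': 'abc', 'submittedAt': 1548430000000}, schema)  # True
record_matches_schema({'guid': 'abc', 'submittedAt': '2019-01-25'}, schema)   # False
```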
```python
    return STATE

def _sync_form_submissions_by_form_id(STATE, form_guid):
    schema = load_schema("form_submissions")
```
Looks like the `schemas/form_submissions.json` file needs to be added to the PR?
```python
        singer.write_state(STATE)
        LOGGER.info("No more submissions for this form")
        break
    STATE = singer.write_bookmark(STATE, form_guid, bookmark_key, max_bk_value.strftime("%Y-%m-%d %H:%M:%S"))
```
There's a subtle edge case with this replication strategy that I'm working on right now in #98. If you expect this stream to be high in volume, it could take a while to sync, and there's a kind of race condition in how the updates come in that can cause some to be skipped. #91 has a short illustration of how this can play out.
The best solution I have right now that's resilient to tap failures is storing the `current_sync_start` in the state and making sure the bookmark doesn't get set above that value (see #98). It might be worth trying that pattern here, too.
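The clamping behavior described here can be sketched with plain `datetime` values standing in for singer's state helpers (names and timestamps below are illustrative, not the #98 implementation):

```python
from datetime import datetime, timezone

def clamped_bookmark(max_bk_value, current_sync_start):
    """Never advance the bookmark past the moment this sync began.
    Updates that land while a long-running sync is in flight could
    otherwise slip past the bookmark without ever being replicated."""
    return min(max_bk_value, current_sync_start)

# Hypothetical timestamps: a submission arrives five minutes into the sync.
sync_start = datetime(2019, 3, 1, 12, 0, tzinfo=timezone.utc)
newest_seen = datetime(2019, 3, 1, 12, 5, tzinfo=timezone.utc)

bookmark = clamped_bookmark(newest_seen, sync_start)
# bookmark stays at sync_start, so anything that arrived mid-sync
# is re-fetched on the next run instead of being skipped
```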
Also, the bookmark should probably not be written to state until after all forms have been checked, to limit the strange edge cases that can come up if the tap is interrupted. This function can return the `max_bk_value` and accept it as a parameter to maintain it over the whole stream sync. My comment above was assuming that would be the case.
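Threading a single `max_bk_value` through the whole stream and persisting it only at the end might look like the sketch below. The state is a plain dict and `sync_one_form` is a hypothetical stand-in for `_sync_form_submissions_by_form_id`, reshaped to return the max it saw instead of writing the bookmark itself:

```python
def sync_form_submissions_stream(state, form_guids, sync_one_form):
    """Track one max bookmark across every form and persist it only
    after all forms have been checked, so an interruption mid-stream
    leaves the old bookmark intact and no records get skipped."""
    max_bk_value = state.get('form_submissions')
    for guid in form_guids:
        max_bk_value = sync_one_form(guid, max_bk_value)
    state['form_submissions'] = max_bk_value  # written once, at the end
    return state

# Hypothetical per-form syncer returning the newest timestamp it saw.
latest = {'form-a': 100, 'form-b': 300, 'form-c': 200}

def sync_one_form(guid, max_bk_value):
    seen = latest[guid]
    return seen if max_bk_value is None or seen > max_bk_value else max_bk_value

state = sync_form_submissions_stream({}, ['form-a', 'form-b', 'form-c'], sync_one_form)
# state['form_submissions'] == 300
```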
`tap_hubspot/__init__.py` (outdated)
```python
    while up_to_date == False:
        form_offset = singer.get_offset(STATE, form_guid)

        if bool(form_offset) and form_offset.get('after') != None:
```
`bool(form_offset)` can just be changed to `form_offset`.
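A minimal illustration of the simplification (the `!= None` comparison in the same line also reads better as the idiomatic `is not None`):

```python
def has_pending_offset(form_offset):
    # A non-empty dict is already truthy, so wrapping it in bool() is
    # redundant, and `is not None` is the idiomatic None check.
    return bool(form_offset and form_offset.get('after') is not None)

has_pending_offset({'after': '42'})   # True
has_pending_offset({})                # False
has_pending_offset({'after': None})   # False
```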
```python
    data = request(get_url("forms")).json()

    for row in data:
        STATE = _sync_form_submissions_by_form_id(STATE, row['guid'])
```
The docs say this can only be used for some types of forms. Can/Should that be checked here to prevent possible errors?
I use the endpoint `/forms/v2/forms` to get the list of forms to iterate through. By default, non-marketing forms are filtered out of this endpoint according to the docs, so I think we're okay.
`tap_hubspot/__init__.py` (outdated)
```python
    bookmark_key = 'last_max_submitted_at'

    singer.write_schema("form_submissions", schema, ['guid', 'submittedAt', 'pageUrl'], [bookmark_key])
    end = utils.strptime_with_tz(get_start(STATE, form_guid, bookmark_key))
```
`utils.strptime_to_utc` is the recommended function to use for consistency (unless you need the original timezone).
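What UTC normalization buys can be sketched with plain `datetime`. This mimics the behavior for illustration only; it is not singer's implementation:

```python
from datetime import datetime, timezone

def to_utc(dtime):
    """Sketch of strptime_to_utc-style normalization: parse an ISO-8601
    timestamp and convert it to UTC, so every bookmark comparison
    happens in a single timezone."""
    parsed = datetime.fromisoformat(dtime)
    if parsed.tzinfo is None:
        return parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)

to_utc("2019-01-25T10:00:00-05:00")
# → datetime(2019, 1, 25, 15, 0, tzinfo=timezone.utc)
```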
```python
            if submitted_at > max_bk_value:
                max_bk_value = submitted_at

            if submitted_at <= end:
```
This could use a comment documenting that the stream is returned in reverse order. (It just tripped me up reading it heh)
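The early-exit logic being asked about might read like this sketch, with plain dicts standing in for submission records and hypothetical names:

```python
def take_new_submissions(submissions, end):
    """The endpoint returns submissions newest-first, so the first
    record at or before the bookmark (`end`) means every remaining
    record is older too, and the loop can stop early."""
    new = []
    for sub in submissions:
        if sub['submittedAt'] <= end:
            break  # reached already-synced territory; the rest is older
        new.append(sub)
    return new

subs = [{'submittedAt': t} for t in (50, 40, 30, 20)]  # newest first
take_new_submissions(subs, end=30)
# → only the two records newer than the bookmark (50 and 40)
```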
`tap_hubspot/__init__.py` (outdated)
```python
    url = get_url("form_submissions", form_guid=form_guid)
    path = 'results'
    params = {
        'count': 50
```
Should this be `limit`? I don't see a `count` parameter available.
On January 25, 2019, HubSpot released a new endpoint to retrieve form submissions by form. This eliminates the need to rely on the form submission information on the contact object. To use this new endpoint you need to know the form id, so this new function retrieves all of the form ids and then loops through them to get the form submissions. This endpoint is also different in that it returns the newest submissions first, so the key used in the state file per form guid is `last_max_submitted_at`. The code compares this to the current form submission being processed and stops once the last submission is reached.
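The flow described above can be sketched end to end. The `list_forms` and `list_submissions` callables are hypothetical stand-ins for the two HubSpot endpoints, and state is a plain per-guid dict rather than singer's state structure:

```python
def sync_all_form_submissions(state, list_forms, list_submissions):
    """For each form: page its submissions newest-first, stop at the
    previously synced last_max_submitted_at, and record the new max."""
    for form in list_forms():
        guid = form['guid']
        last_max = state.get(guid)          # last_max_submitted_at for this form
        new_max = last_max
        for sub in list_submissions(guid):  # newest submissions first
            if last_max is not None and sub['submittedAt'] <= last_max:
                break                       # reached the last synced submission
            if new_max is None or sub['submittedAt'] > new_max:
                new_max = sub['submittedAt']
        state[guid] = new_max
    return state

# Toy data: one form, three submissions, the oldest already synced.
forms = [{'guid': 'f1'}]
subs = {'f1': [{'submittedAt': 30}, {'submittedAt': 20}, {'submittedAt': 10}]}
state = sync_all_form_submissions({'f1': 10}, lambda: forms, lambda g: subs[g])
# state == {'f1': 30}
```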