Telemetry: investigate why demo.kedro.org is not flowing to Snowflake #2158

Closed
1 task
Huongg opened this issue Oct 28, 2024 · 15 comments

@Huongg
Contributor

Huongg commented Oct 28, 2024

Description

At the moment, we’re not seeing any data flowing from demo.kedro.org to Snowflake, even though it appears to pass through Heap. This ticket is to investigate the cause.

Checklist

  • Include labels so that we can categorise your feature request
@rashidakanchwala
Contributor

Weird, I can do this when I use demo.kedro.org and turn on the live data feed.

@tynandebold
Member

Adding another data point: Heap does send events from the demo.kedro.org page.

image

@Huongg
Contributor Author

Huongg commented Oct 29, 2024

Hmm, so it seems that it does send to Heap, but somehow nothing shows up in Snowflake. I can only see data from the last 2 months, and it is mainly just localhost or GitHub pages. Any idea why this would happen, @rashidakanchwala @tynandebold? 🤔

Screenshot 2024-10-29 at 09 47 53

@rashidakanchwala
Contributor

Hey, I think this is just the data preview, and it doesn't show all the rows. Maybe you need to write a filter query to see the rows where the landing page is 'demo.kedro.org'.

@Huongg
Contributor Author

Huongg commented Oct 29, 2024

Hey, I think @astrojuanlu also confirmed in #2140 that he couldn't see the demo page among the landing pages when he ran the query:

I'm not seeing https://demo.kedro.org/ in the top list of landing pages, which is interesting to say the least:
LANDING_PAGE,COUNT
127.0.0.1/,54765
localhost/,1822
127.0.0.1/experiment-tracking,357
anasaito-alpha-lineage-x6qqx4593p59q-4141.githubpreview.dev/,299
deepyaman.github.io/jaffle-shop/,162
172.19.113.195/,139
0.0.0.0/,108
133.127.13.9/,108
4141-cs-697995841852-default.cs-us-central1-pits.cloudshell.dev/,107
4141-cs-200f5089-bab4-4372-bef8-0b4fd0d92189.cs-us-east1-pkhd.cloudshell.dev/,65
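
For anyone who wants to repeat the check on an export, here is a minimal Python sketch of the filter suggested earlier, run against the ten rows quoted above. The function name `landing_page_counts` is made up for illustration, and the sketch assumes landing pages are stored without a scheme (as in `127.0.0.1/`), so the match is on `demo.kedro.org` rather than `https://demo.kedro.org/`. It only covers the top-10 extract, not the full Snowflake table.

```python
import csv
import io

# Top landing pages, copied from the query result quoted above.
TOP_LANDING_PAGES = """\
LANDING_PAGE,COUNT
127.0.0.1/,54765
localhost/,1822
127.0.0.1/experiment-tracking,357
anasaito-alpha-lineage-x6qqx4593p59q-4141.githubpreview.dev/,299
deepyaman.github.io/jaffle-shop/,162
172.19.113.195/,139
0.0.0.0/,108
133.127.13.9/,108
4141-cs-697995841852-default.cs-us-central1-pits.cloudshell.dev/,107
4141-cs-200f5089-bab4-4372-bef8-0b4fd0d92189.cs-us-east1-pkhd.cloudshell.dev/,65
"""


def landing_page_counts(csv_text, host):
    """Return {landing_page: count} for rows whose landing page starts with `host`.

    Hypothetical helper: landing pages are assumed to be stored without a scheme.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    return {
        row["LANDING_PAGE"]: int(row["COUNT"])
        for row in rows
        if row["LANDING_PAGE"].startswith(host)
    }


demo_rows = landing_page_counts(TOP_LANDING_PAGES, "demo.kedro.org")
local_rows = landing_page_counts(TOP_LANDING_PAGES, "127.0.0.1")

print("demo.kedro.org rows:", demo_rows)
print("127.0.0.1 rows:", local_rows)
```

On this extract the demo filter comes back empty while the `127.0.0.1` filter returns two rows, which matches the observation above.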

@rashidakanchwala
Contributor

@Huongg, I think in that case we need to open an issue to investigate why the Heap data isn't reflected in Snowflake and close this one, or update this one accordingly.

cc - @astrojuanlu

@Huongg
Contributor Author

Huongg commented Oct 29, 2024

Yup, I can update the description on this ticket if that's easier?

@Huongg changed the title from "Telemetry: enabling tracking data via demo.kedro.org" to "Telemetry: investigate why demo.kedro.org is not flowing to Snowflake" on Oct 29, 2024
@astrojuanlu
Member

One surprising thing we're observing is that there are lots of pageviews associated with demo.kedro.org ✔️ but zero "interaction events" 🤔

@rashidakanchwala
Contributor

rashidakanchwala commented Nov 4, 2024

@Huongg , @astrojuanlu

Could you confirm if the table missing demo.kedro.org details was Viz_interactions_table? I’ve identified the reason for the missing information: the event was set up to track clicks on elements with data-heap-event, but demo.kedro.org uses data-test attributes instead, as it’s the latest version of Kedro.

By default, Heap generates a generic click event for every click. In my opinion that 'Generic Click' Heap table is the best to follow, and we can use the hierarchy variable within it to identify which specific event was clicked.

Let me know what you think. Here's a screenshot of what viz_interactions_table tracks:
Screenshot 2024-11-04 at 14 49 31

@rashidakanchwala self-assigned this on Nov 4, 2024
@Huongg
Contributor Author

Huongg commented Nov 4, 2024

Hey, great work @rashidakanchwala! I can confirm the table is called VIZ_INTERACTION_EVENT_2 in Snowflake, and I can now see your new sync table called KEDRO_VIZ_USER_INTERACTIONS_GENERIC_CLICK_EVENT, which is what we need ❤️

@rashidakanchwala
Contributor

rashidakanchwala commented Nov 4, 2024

For the GENERIC_CLICK_EVENTS table, we need to create two custom columns, either in Tableau or Snowflake, as follows:

Extract Data Heap Events (in Tableau):

REGEXP_EXTRACT([Heap Hierarchy], "\[data-heap-event=([^;]+)\]")

Extract Data Test Events (in Tableau):

REGEXP_EXTRACT([Heap Hierarchy], "\[data-test-event=([^;]+)\]")

The union of these columns will capture all relevant events, though the naming conventions may vary and require some cleanup. We can also consider @astrojuanlu 's approach by focusing specifically on newer Kedro-Viz versions, using only data-test events for consistency.
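
If the columns end up being built outside Tableau, the same extraction and union can be sketched in Python. This is only a sketch: the two patterns are copied from the Tableau formulas above, but the sample hierarchy strings and event names are made up for illustration, and `extract_event` is a hypothetical helper, not part of any existing pipeline. The real Heap hierarchy format should be checked against actual rows before relying on it.

```python
import re

# Patterns copied from the Tableau REGEXP_EXTRACT formulas above.
HEAP_EVENT = re.compile(r"\[data-heap-event=([^;]+)\]")
TEST_EVENT = re.compile(r"\[data-test-event=([^;]+)\]")


def extract_event(hierarchy):
    """Union of the two custom columns: prefer the data-heap-event value,
    fall back to the data-test-event value, else return None."""
    for pattern in (HEAP_EVENT, TEST_EVENT):
        match = pattern.search(hierarchy)
        if match:
            return match.group(1)
    return None


# Hypothetical hierarchy strings; the real format may differ.
samples = [
    "@div;.pipeline-wrapper;|@button;[data-heap-event=clicked.run_command];|",
    "@div;.sidebar;|@button;[data-test-event=sidebar.toggle];|",
    "@div;.pipeline-wrapper;|@span;.label;|",
]

events = [extract_event(s) for s in samples]
print(events)
```

Rows where neither attribute is present come back as `None`, so they can be filtered out or counted separately during the cleanup step.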

Also cc: @lid-rs

@rashidakanchwala
Contributor

@lid-rs, @astrojuanlu: let me know if this is fine, and I can close the issue. I looked into this in more detail: with the above, the data is the same as in Heap.

@astrojuanlu
Member

Amazing, thanks @rashidakanchwala!

Can we then unsync VIZ_INTERACTION_EVENT_2 and focus on KEDRO_VIZ_USER_INTERACTIONS_GENERIC_CLICK_EVENT?

About creating custom columns,

Creating a dynamic column in Tableau is a good quick and dirty solution but eventually the Single Source of Truth should be a "gold" table in Snowflake, see https://lakshmanok.medium.com/what-goes-into-bronze-silver-and-gold-layers-of-a-medallion-data-architecture-4b6fdfb405fc

That's out of the scope for this ticket anyway.

Originally posted by @astrojuanlu in #2140 (comment)

@rashidakanchwala
Contributor

I have done so. I reckon the tables will still exist in Snowflake, just not updated. This will create some noise; should we manually delete/archive/hide them, or just ignore them?

Screenshot 2024-11-05 at 09 30 06

@astrojuanlu
Member

For now let's ignore them, I'd say.

This now seems to be working:

image

Closing!

Projects
Status: Done
Development

No branches or pull requests

4 participants