Review and explain frontend telemetry #2140

Closed
3 tasks done
astrojuanlu opened this issue Oct 14, 2024 · 10 comments

@astrojuanlu
Member

astrojuanlu commented Oct 14, 2024

In the context of kedro-org/kedro-devrel#151 we're trying to understand Kedro Viz frontend telemetry.

This is what we need:

  • (P0) we need to know what is the earliest Kedro Viz version that has good, consistent, reliable, understandable data (and for now we will ignore any older or historical versions)
  • (P0) we need to know whether the user ever opened Kedro Viz. is that possible? or do we only have click events?
    if this is not possible, then we have to decide between "opening the Kedro Viz frontend" == "launching $ kedro viz run" OR "opening the Viz frontend" == ">=1 click event anywhere"
  • (P1) we need to know whether the user navigated to the Experiment Tracking view (click on the menu? direct URL? you tell us) + clicks on any element of the Experiment Tracking view
    you can either catalogue all elements and give us the list, or just tell us how to know whether an element belongs to Experiment Tracking

Here's what we're currently seeing on Snowflake:

-- SELECT * FROM HEAP_FRAMEWORK_VIZ_PRODUCTION.HEAP.VIZ_INTERACTION_EVENT_2 ORDER BY TIME DESC LIMIT 5;

USER_ID,EVENT_ID,SESSION_ID,TIME,SESSION_TIME,TYPE,LIBRARY,PLATFORM,DEVICE_TYPE,COUNTRY,REGION,CITY,IP,REFERRER,LANDING_PAGE,LANDING_PAGE_QUERY,LANDING_PAGE_HASH,BROWSER,SEARCH_KEYWORD,UTM_SOURCE,UTM_CAMPAIGN,UTM_MEDIUM,UTM_TERM,UTM_CONTENT,DOMAIN,QUERY,PATH,HASH,TITLE,HREF,TARGET_TEXT,HEAP_DEVICE_ID,HEAP_PREVIOUS_PAGE,HEAP_HIERARCHY,DATA_ATTRIBUTE_DATA_HEAP_EVENT,DATA_HEAP_EVENT,HEAP_EVENTS
7239620197193325,1375388039555433,409235989008924,2024-10-11 16:25:05.339 Z,2024-10-11 16:23:46.291 Z,click,web,Mac OS X 14.7,Desktop,,,,,,deepyaman.github.io/jaffle-shop/,,,Chrome 129.0.6668,,,,,,,deepyaman.github.io,?pipeline_id=__default__&selected_id=70f3cccb,/jaffle-shop/,,Kedro-Viz,,,,/jaffle-shop/,@div;#root;|@div;.kedro;.kedro-pipeline;.kui-theme--dark;|@div;.kedro-pipeline;|@div;.kedro;.pipeline-metadata;.pipeline-metadata--visible;|@div;.pipeline-metadata__header-toolbox;|@div;.pipeline-toggle;.pipeline-toggle--enabled;|@input;#pipeline-toggle-input-code;.pipeline-toggle-input;[data-heap-event=visible.code.false];[data-test=pipeline-toggle-input-code];[type=checkbox];|,[data-heap-event=visible.code.false],,visible
7239620197193325,7837890494747269,409235989008924,2024-10-11 16:25:03.963 Z,2024-10-11 16:23:46.301 Z,click,web,Mac OS X 14.7,Desktop,,,,,,deepyaman.github.io/jaffle-shop/,,,Chrome 129.0.6668,,,,,,,deepyaman.github.io,?pipeline_id=__default__&selected_id=c70a121d,/jaffle-shop/,,Kedro-Viz,,,,/jaffle-shop/,@div;#root;|@div;.kedro;.kedro-pipeline;.kui-theme--dark;|@div;.kedro-pipeline;|@div;.pipeline-sidebar;.pipeline-sidebar--visible;|@div;.pipeline-ui;|@div;.pipeline-nodelist;|@div;.pipeline-nodelist__split;|@div;.pipeline-nodelist__elements-panel;|@div;.pipeline-nodelist-scrollbars;|@div;|@div;.pipeline-nodelist-section;|@ul;#:r0:;.MuiTreeView-root;.css-v02vic;[aria-multiselectable=false];[role=tree];[tabindex=0];|@li;#:r0:-70f3cccb;.MuiTreeItem-root;.css-105mfs8;.pipeline-treeItem__root--overwrite;[role=treeitem];[tabindex=-1];|@div;.MuiTreeItem-content;.css-tc4u2w;|@div;.MuiTreeItem-label;|@div;.kedro;.pipeline-nodelist__row;.pipeline-nodelist__row--active;.pipeline-nodelist__row--kind-element;.pipeline-nodelist__row--visible;[title=Process Orders];|@button;.pipeline-nodelist__row__text;.pipeline-nodelist__row__text--kind-element;.pipeline-nodelist__row__text--tree;[data-heap-event=clicked.sidebar.task];[data-test=node-Process Orders];[title=Process Orders];|,[data-heap-event=clicked.sidebar.task],,clicked
7239620197193325,1594084408039653,409235989008924,2024-10-11 16:25:02.979 Z,2024-10-11 16:23:46.303 Z,click,web,Mac OS X 14.7,Desktop,,,,,,deepyaman.github.io/jaffle-shop/,,,Chrome 129.0.6668,,,,,,,deepyaman.github.io,?pipeline_id=__default__&selected_id=fe348386,/jaffle-shop/,,Kedro-Viz,,,,/jaffle-shop/,@div;.kedro-pipeline;|@div;.pipeline-sidebar;.pipeline-sidebar--visible;|@div;.pipeline-ui;|@div;.pipeline-nodelist;|@div;.pipeline-nodelist__split;|@div;.pipeline-nodelist__elements-panel;|@div;.pipeline-nodelist-scrollbars;|@div;|@div;.pipeline-nodelist-section;|@ul;#:r0:;.MuiTreeView-root;.css-v02vic;[aria-multiselectable=false];[role=tree];[tabindex=0];|@li;#:r0:-c70a121d;.MuiTreeItem-root;.css-105mfs8;.pipeline-treeItem__root--overwrite;[role=treeitem];[tabindex=-1];|@div;.MuiTreeItem-content;.css-tc4u2w;|@div;.MuiTreeItem-label;|@div;.kedro;.pipeline-nodelist__row;.pipeline-nodelist__row--active;.pipeline-nodelist__row--kind-element;.pipeline-nodelist__row--visible;[title=Process Customers];|@button;.pipeline-nodelist__row__text;.pipeline-nodelist__row__text--kind-element;.pipeline-nodelist__row__text--tree;[data-heap-event=clicked.sidebar.task];[data-test=node-Process Customers];[title=Process Customers];|@span;.pipeline-nodelist__row__label;.pipeline-nodelist__row__label--kind-element;|,[data-heap-event=clicked.sidebar.task],,clicked
7239620197193325,1131452433356885,409235989008924,2024-10-11 16:25:00.139 Z,2024-10-11 16:23:46.305 Z,click,web,Mac OS X 14.7,Desktop,,,,,,deepyaman.github.io/jaffle-shop/,,,Chrome 129.0.6668,,,,,,,deepyaman.github.io,?pipeline_id=__default__&selected_id=78ccdfb4,/jaffle-shop/,,Kedro-Viz,,,,/jaffle-shop/,@div;#root;|@div;.kedro;.kedro-pipeline;.kui-theme--dark;|@div;.kedro-pipeline;|@div;.pipeline-sidebar;.pipeline-sidebar--visible;|@div;.pipeline-ui;|@div;.pipeline-nodelist;|@div;.pipeline-nodelist__split;|@div;.pipeline-nodelist__elements-panel;|@div;.pipeline-nodelist-scrollbars;|@div;|@div;.pipeline-nodelist-section;|@ul;#:r0:;.MuiTreeView-root;.css-v02vic;[aria-multiselectable=false];[role=tree];[tabindex=0];|@li;#:r0:-fe348386;.MuiTreeItem-root;.css-105mfs8;.pipeline-treeItem__root--overwrite;[role=treeitem];[tabindex=-1];|@div;.MuiTreeItem-content;.css-tc4u2w;|@div;.MuiTreeItem-label;|@div;.kedro;.pipeline-nodelist__row;.pipeline-nodelist__row--active;.pipeline-nodelist__row--kind-element;.pipeline-nodelist__row--visible;[title=Rename Orders];|@button;.pipeline-nodelist__row__text;.pipeline-nodelist__row__text--kind-element;.pipeline-nodelist__row__text--tree;[data-heap-event=clicked.sidebar.task];[data-test=node-Rename Orders];[title=Rename Orders];|,[data-heap-event=clicked.sidebar.task],,clicked
7239620197193325,7127285673513780,409235989008924,2024-10-11 16:24:58.578 Z,2024-10-11 16:23:46.303 Z,click,web,Mac OS X 14.7,Desktop,,,,,,deepyaman.github.io/jaffle-shop/,,,Chrome 129.0.6668,,,,,,,deepyaman.github.io,?pipeline_id=__default__&selected_id=96584038,/jaffle-shop/,,Kedro-Viz,,,,/jaffle-shop/,@div;.kedro-pipeline;|@div;.pipeline-sidebar;.pipeline-sidebar--visible;|@div;.pipeline-ui;|@div;.pipeline-nodelist;|@div;.pipeline-nodelist__split;|@div;.pipeline-nodelist__elements-panel;|@div;.pipeline-nodelist-scrollbars;|@div;|@div;.pipeline-nodelist-section;|@ul;#:r0:;.MuiTreeView-root;.css-v02vic;[aria-multiselectable=false];[role=tree];[tabindex=0];|@li;#:r0:-78ccdfb4;.MuiTreeItem-root;.css-105mfs8;.pipeline-treeItem__root--overwrite;[role=treeitem];[tabindex=-1];|@div;.MuiTreeItem-content;.css-tc4u2w;|@div;.MuiTreeItem-label;|@div;.kedro;.pipeline-nodelist__row;.pipeline-nodelist__row--active;.pipeline-nodelist__row--kind-element;.pipeline-nodelist__row--visible;[title=Rename Payments];|@button;.pipeline-nodelist__row__text;.pipeline-nodelist__row__text--kind-element;.pipeline-nodelist__row__text--tree;[data-heap-event=clicked.sidebar.task];[data-test=node-Rename Payments];[title=Rename Payments];|@span;.pipeline-nodelist__row__label;.pipeline-nodelist__row__label--kind-element;|,[data-heap-event=clicked.sidebar.task],,clicked

I have several questions:

  • It's weird that there have been zero telemetry events in the past 2 days. Synchronisation between Heap and Snowflake is happening:
-- SELECT MAX(TIME) FROM HEAP_FRAMEWORK_VIZ_PRODUCTION.HEAP.ANY_COMMAND_RUN;

MAX(TIME)
2024-10-13 21:27:18.307 +0000

Could you double check that frontend telemetry is flowing?

  • I don't think we're tracking the Kedro Viz version here. This will make the analysis very difficult, because the HEAP_HIERARCHY changes across versions (the CSS and HTML change). Could we add that to the table somehow?
  • What does DATA_ATTRIBUTE_DATA_HEAP_EVENT mean?
  • DATA_HEAP_EVENT is empty in all cases. Is that intended?
  • I'm not seeing https://demo.kedro.org/ in the top list of landing pages, which is interesting to say the least:
-- SELECT
--     LANDING_PAGE,
--     COUNT(*) AS COUNT
-- FROM HEAP_FRAMEWORK_VIZ_PRODUCTION.HEAP.VIZ_INTERACTION_EVENT_2
-- GROUP BY LANDING_PAGE
-- ORDER BY COUNT DESC
-- LIMIT 10;

LANDING_PAGE,COUNT
127.0.0.1/,54765
localhost/,1822
127.0.0.1/experiment-tracking,357
anasaito-alpha-lineage-x6qqx4593p59q-4141.githubpreview.dev/,299
deepyaman.github.io/jaffle-shop/,162
172.19.113.195/,139
0.0.0.0/,108
133.127.13.9/,108
4141-cs-697995841852-default.cs-us-central1-pits.cloudshell.dev/,107
4141-cs-200f5089-bab4-4372-bef8-0b4fd0d92189.cs-us-east1-pkhd.cloudshell.dev/,65

Is telemetry enabled for our demo site? If not, can we open a ticket to enable it? And if it's indeed enabled, should we investigate why it's not flowing?
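To tell local runs apart from hosted deployments (such as the demo site) in a landing page list like the one above, one option is to classify each LANDING_PAGE value by its host. This is only a minimal sketch: it assumes the values look like the samples above (host first, no scheme, no port), and the function name is hypothetical rather than anything that exists in Kedro-Viz.

```python
import ipaddress


def is_local_landing_page(landing_page: str) -> bool:
    """Guess whether a LANDING_PAGE value points at a local or private host.

    Assumes the value starts with the host and has no scheme or port,
    e.g. "127.0.0.1/experiment-tracking" or "deepyaman.github.io/jaffle-shop/".
    """
    host = landing_page.split("/", 1)[0]
    if host == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so treat it as a named (hosted) site.
        return False
    # Loopback (127.0.0.1), unspecified (0.0.0.0) and private ranges (e.g. 172.16.0.0/12).
    return ip.is_loopback or ip.is_unspecified or ip.is_private


pages = ["127.0.0.1/", "localhost/", "deepyaman.github.io/jaffle-shop/", "172.19.113.195/"]
hosted = [page for page in pages if not is_local_landing_page(page)]
print(hosted)
```

With these four sample values, only deepyaman.github.io/jaffle-shop/ is reported as hosted, so a demo.kedro.org entry would stand out in the same way if its events were arriving.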

@Huongg
Contributor

Huongg commented Oct 28, 2024

Hey @astrojuanlu, thanks for drafting all these requirements and questions, they are very helpful to follow. Please see my replies below:

  • (P0) we need to know what is the earliest Kedro Viz version that has good, consistent, reliable, understandable data (and for now we will ignore any older or historical versions)
    • Although we haven’t tracked the version before, we can start now by adopting Kedro’s version tracking approach—for example, storing the kedro_viz version in pyproject.toml.
    "username": user_uuid,
    "project_id": hashed_project_id,
    "project_version": KEDRO_VERSION,
    "project_kedro_viz_version": KEDRO_VIZ_VERSION,  # something like this for the Kedro-Viz version?
    "telemetry_version": TELEMETRY_VERSION,
  • (P0) we need to know whether the user ever opened Kedro Viz. is that possible? or do we only have click events?
    • Currently, we track when users run kedro viz run, and we also track when user lands in the page. Is this enough for tracking purposes, or is there something additional you'd like to capture? For example, would you want to see the top features users engage with when they run Kedro Viz?
    [Image: Screenshot 2024-10-28 at 15 16 56]
  • (P1) we need to know whether the user navigated to the Experiment Tracking view (click on the menu? direct URL? you tell us) + clicks on any element of the Experiment Tracking view
    you can either catalogue all elements and give us the list, or just tell us how to know whether an element belongs to Experiment Tracking
    • I believe clicking the Experiment Tracking button and loading the URL already gets tracked under View/experiment-tracking.
    [Image: Screenshot 2024-10-28 at 14 53 07]
  • What does DATA_ATTRIBUTE_DATA_HEAP_EVENT mean?
    • This was an older event tagged as data-heap-event. Recently, @rashidakanchwala transitioned it to data-test, which we use for testing with Cypress and can also apply with Heap.
    • I actually need your help here @astrojuanlu —is there a way to query a new column named DATA_TEST_ATTRIBUTE? I tried something in snowflake but I didn’t have the right permission to do so
UPDATE
  VIZ_INTERACTION_EVENT_2
SET
  DATA_ATTRIBUTE_DATA_HEAP_EVENT = REGEXP_REPLACE (
    DATA_ATTRIBUTE_DATA_HEAP_EVENT,
    'data-heap-event="(.*?)"',
    'data-test="\\1"'
  );
  • DATA_HEAP_EVENT is empty in all cases. Is that intended?

    • I’m not sure what data_heap_event represents, and we likely don’t need this column. @rashidakanchwala , could you please confirm?
  • I'm not seeing https://demo.kedro.org/ in the top list of landing pages, which is interesting to say the least.

    • I suspect it's not enabled on the demo page, as demo.kedro.org is marked as runningLocally, which may have prevented the data from flowing in. Could you confirm this as well, @rashidakanchwala? If that's the case, I can remove the demo page from isRunningLocally.
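The version-tracking idea from the first reply could be sketched as follows. This is only an illustration under stated assumptions: the key names are copied from the fragment above, the helper names are hypothetical, and reading versions through importlib.metadata is one possible approach, not necessarily how kedro-telemetry or Kedro-Viz does it.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def build_event_properties(
    user_uuid: str, hashed_project_id: str, telemetry_version: str
) -> dict:
    """Assemble the event properties, including the proposed Kedro-Viz version field."""
    return {
        "username": user_uuid,
        "project_id": hashed_project_id,
        "project_version": installed_version("kedro"),
        # Proposed addition: lets analysts group events by Kedro-Viz version.
        "project_kedro_viz_version": installed_version("kedro-viz"),
        "telemetry_version": telemetry_version,
    }


# Placeholder values for illustration only.
props = build_event_properties("user-uuid", "hashed-project-id", "0.0.0")
print(sorted(props))
```

Returning None for a missing package keeps the payload shape stable, so a downstream query can filter on a null version instead of failing on a missing key.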

@astrojuanlu
Member Author

Although we haven’t tracked the version before, we can start now by adopting Kedro’s version tracking approach

💯!

we also track when user lands in the page

Exactly what we needed 👍🏼

clicking the Experiment Tracking button and loading the URL already gets tracked under View/experiment-tracking

👍🏼

Recently, @rashidakanchwala transitioned it to data-test, which we use for testing with Cypress and can also apply with Heap [...]

Understood 👍🏼

I tried something in snowflake but I didn’t have the right permission to do so

In the raw table (the one that gets ingested directly from Heap to Snowflake), the column will unfortunately still have to be called DATA_ATTRIBUTE_DATA_HEAP_EVENT. On the KEDRO_BI_DB database we have permissions to create intermediate tables. Some form of ETL should do it (see @DimedS's work on kedro-org/kedro-devrel#145).

@Huongg
Contributor

Huongg commented Oct 28, 2024

this is perfect, thank you @astrojuanlu. I've created two tickets to address the Kedro-Viz version issue and to enable tracking on the demo page.

Regarding creating intermediate tables to address the DATA_HEAP_EVENT, I'm uncertain if it's worth further effort since I managed to add a column in Tableau that filters through the data hierarchy to return only data-test values. It's a bit hacky, but it works for now. Let me know your thoughts!

@rashidakanchwala
Contributor

rashidakanchwala commented Oct 29, 2024

Yeah, I think there's no easy way to do this. Heap suggests data-* attributes or aria-labels, but I think it doesn't give us that information directly, hence we need to look for it under the HREF/Hierarchy column. Some details here: https://help.heap.io/data-management/react-dom-minification/general-strategies-for-using-heap-with-a-minified-dom/

@astrojuanlu
Member Author

I'm confused, @rashidakanchwala could you elaborate? Is this related to kedro-org/kedro-plugins#923, #2158, something else?

@rashidakanchwala
Contributor

neither, it's related to @Huongg's comment here

Regarding creating intermediate tables to address the DATA_HEAP_EVENT, I'm uncertain if it's worth further effort since I managed to add a column in Tableau that filters through the data hierarchy to return only data-test values. It's a bit hacky, but it works for now. Let me know your thoughts!

@astrojuanlu
Member Author

Ah got it.

Creating a dynamic column in Tableau is a good quick-and-dirty solution, but eventually the Single Source of Truth should be a "gold" table in Snowflake; see https://lakshmanok.medium.com/what-goes-into-bronze-silver-and-gold-layers-of-a-medallion-data-architecture-4b6fdfb405fc

That's out of the scope for this ticket anyway.

@astrojuanlu
Member Author

From the original list on #2140 (comment),

Syncing some of these events to Snowflake might require an extra step, if needed we can have a look together.

Finally there's an ongoing investigation on the lack of demo.kedro.org events, @rashidakanchwala do we have a separate issue for that?

One more clarification from my side. I get that DATA_HEAP_EVENT is empty and not needed. It's not clear to me whether we should be looking at the content of HEAP_HIERARCHY

@div;#root;|@div;.kedro;.kedro-pipeline;.kui-theme--dark;|@div;.kedro-pipeline;|@div;.kedro;.pipeline-metadata;.pipeline-metadata--visible;|@div;.pipeline-metadata__header-toolbox;|@div;.pipeline-toggle;.pipeline-toggle--enabled;|@input;#pipeline-toggle-input-code;.pipeline-toggle-input;[data-heap-event=visible.code.false];[data-test=pipeline-toggle-input-code];[type=checkbox];|

or DATA_ATTRIBUTE_DATA_HEAP_EVENT

[data-heap-event=visible.code.false]

Could you clarify?

@rashidakanchwala
Contributor

Finally there's an ongoing investigation on the lack of demo.kedro.org events, @rashidakanchwala do we have a separate issue for that?

#2158 - that's the issue.

We should definitely look at HEAP_HIERARCHY and regex this bit: [data-test=pipeline-toggle-input-code]. Let's ignore data-heap for now; it will show up on older versions of Kedro-Viz (before 10.0).
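The regex step suggested above could look roughly like this. It is a minimal sketch, assuming each HEAP_HIERARCHY value has the shape shown in the samples in this thread (attributes written as [name=value], outermost element first) and that data-test values never contain a closing bracket. The function name is hypothetical.

```python
import re
from typing import Optional

# Matches "[data-test=<value>]" and captures <value>, which may contain spaces.
DATA_TEST_RE = re.compile(r"\[data-test=([^\]]+)\]")


def extract_data_test(heap_hierarchy: str) -> Optional[str]:
    """Return the data-test value closest to the clicked element, or None.

    The hierarchy lists elements from outermost to innermost, so the last
    match is the one nearest to the click target.
    """
    matches = DATA_TEST_RE.findall(heap_hierarchy)
    return matches[-1] if matches else None


sample = (
    "@div;#root;|@div;.pipeline-toggle;.pipeline-toggle--enabled;|"
    "@input;#pipeline-toggle-input-code;.pipeline-toggle-input;"
    "[data-heap-event=visible.code.false];"
    "[data-test=pipeline-toggle-input-code];[type=checkbox];|"
)
print(extract_data_test(sample))  # pipeline-toggle-input-code
```

Rows from older versions that only carry data-heap-event return None here, which makes them easy to exclude or to handle with a separate pattern.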

@astrojuanlu
Member Author

I think there are already follow-up issues for everything, thank you!
