User Engagement Forecasting

Animesh

Anirudh

Brandon

Eli

Our General Approach

Data Clean Room

The full source code and summary are here: Data Clean Room Task

We followed the instructions given in the prompt release and the TPM coding handout.

We chose Intel SGX (Software Guard Extensions), which provides hardware-level protection against privacy risks. We also considered software solutions such as Docker, but concluded that hardware protection would suffice.

Our general structure was an Azure Virtual Machine with Intel SGX capability, plus a separate Key Vault resource that stores the public key of the AK (Attestation Key).
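To show why the Key Vault only needs the public half of the AK, here is a toy, standard-library-only sketch of the nonce-based challenge-response check a verifier could run. It is not SGX's real attestation protocol, the TPM quote format, or any Azure API. Textbook RSA with a tiny modulus built from two Mersenne primes stands in for the attestation key, and a plain dict stands in for the Key Vault. All names (`key_vault`, `enclave_quote`, `verify_quote`) are hypothetical.

```python
import hashlib
import secrets
from typing import Tuple

# Toy RSA key built from two Mersenne primes (2**61 - 1 and 2**89 - 1).
# Far too small for real use; it only lets the sketch run on the standard library.
_P, _Q = 2**61 - 1, 2**89 - 1
_N = _P * _Q
_E = 65537
_D = pow(_E, -1, (_P - 1) * (_Q - 1))  # private half of the AK; never leaves the "VM"

# Hypothetical stand-in for the Key Vault: it stores only the PUBLIC half of the AK.
key_vault = {"ak-public": (_N, _E)}


def _digest(message: bytes, n: int) -> int:
    """SHA-256 the message and reduce it into the RSA modulus range."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def enclave_quote(measurement: bytes, nonce: bytes) -> Tuple[bytes, int]:
    """VM side: sign (measurement || nonce) with the private attestation key."""
    report = measurement + nonce
    return report, pow(_digest(report, _N), _D, _N)


def verify_quote(report: bytes, signature: int, nonce: bytes,
                 expected_measurement: bytes) -> bool:
    """Verifier side: check the signature with the public AK, then the nonce and measurement."""
    n, e = key_vault["ak-public"]
    if pow(signature, e, n) != _digest(report, n):
        return False  # not signed by the attestation key, or the report was altered
    if not report.endswith(nonce):
        return False  # quote answers a different challenge (replay)
    return report[:-len(nonce)] == expected_measurement  # code identity is what we expect


if __name__ == "__main__":
    expected = hashlib.sha256(b"enclave-code-v1").digest()  # hypothetical measurement
    challenge = secrets.token_bytes(16)  # fresh nonce chosen by the verifier
    report, sig = enclave_quote(expected, challenge)
    print(verify_quote(report, sig, challenge, expected))  # True
```

The fresh nonce is what prevents an old, genuinely signed quote from being replayed. Keeping only the public key in the vault means a leaked vault entry cannot be used to forge quotes.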

Aggregate Statistics

The full visualizations and data analysis report are here: Aggregate Statistics Task

After merging and preprocessing the data, we used Pandas and Plotly to visualize the distributions and make statistical inferences. In addition to the given prompts, we also analyzed:

User Likes and Dislikes
Progress In Article
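The merge-then-summarize step can be sketched in plain Python. It joins ad impressions to user rows on a shared key, then buckets article progress into bins and computes the click rate per bin, which is the kind of summary a progress histogram visualizes. The team used Pandas and Plotly. This is not their code, and the tables, column names (`user_id`, `progress`, `clicked`, `likes`) and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy tables; the column names are illustrative, not the real schema.
impressions = [
    {"log_id": 1, "user_id": "a", "progress": 0.10, "clicked": 0},
    {"log_id": 2, "user_id": "b", "progress": 0.30, "clicked": 0},
    {"log_id": 3, "user_id": "c", "progress": 0.55, "clicked": 0},
    {"log_id": 4, "user_id": "a", "progress": 0.60, "clicked": 1},
    {"log_id": 5, "user_id": "b", "progress": 0.85, "clicked": 1},
    {"log_id": 6, "user_id": "c", "progress": 0.90, "clicked": 1},
    {"log_id": 7, "user_id": "d", "progress": 0.20, "clicked": 0},  # no matching user row
]
users = [
    {"user_id": "a", "likes": 3},
    {"user_id": "b", "likes": 0},
    {"user_id": "c", "likes": 5},
]


def inner_join(left, right, key):
    """Like pandas.merge(left, right, on=key, how="inner"); assumes `key` is unique in `right`."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def click_rate_by_progress(rows, n_bins=4):
    """Bucket progress (0..1) into equal-width bins and return the click rate in each bin."""
    clicks, totals = defaultdict(int), defaultdict(int)
    for row in rows:
        b = min(int(row["progress"] * n_bins), n_bins - 1)  # progress == 1.0 lands in the top bin
        totals[b] += 1
        clicks[b] += row["clicked"]
    return {b: clicks[b] / totals[b] for b in sorted(totals)}


merged = inner_join(impressions, users, "user_id")
print(len(merged))                     # 6 (user "d" has no row in `users`, so log 7 is dropped)
print(click_rate_by_progress(merged))  # {0: 0.0, 1: 0.0, 2: 0.5, 3: 1.0}
```

The printed rates reflect only the toy input above. With Pandas the same summary would be a merge followed by a `groupby` on the binned progress column.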

From the statistical analysis, we concluded that the attribute that most strongly differentiates Potential Customers from Non-Potential Customers is how far they get through the article the ad is shown on. That is, users who get further through the article are more likely to click on the ad, and users who stop earlier are less likely to click on it.
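One way to put a number behind "the attribute that most strongly differentiates" the two groups is to rank attributes by the absolute standardized mean difference (Cohen's d) between clickers and non-clickers. The sketch below is not necessarily the analysis the team ran. The rows and column names are hypothetical toy values.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(a, b):
    """Standardized mean difference between two samples, using the pooled sample SD."""
    na, nb = len(a), len(b)
    pooled = sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled  # raises ZeroDivisionError if both groups are constant


def rank_attributes(rows, attributes, label="clicked"):
    """Rank numeric attributes by |d| between clickers (label == 1) and non-clickers (label == 0)."""
    pos = [r for r in rows if r[label] == 1]
    neg = [r for r in rows if r[label] == 0]
    scores = {
        attr: abs(cohens_d([r[attr] for r in pos], [r[attr] for r in neg]))
        for attr in attributes
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy rows; not the team's data.
rows = [
    {"progress": 0.8, "likes": 1, "clicked": 1},
    {"progress": 0.9, "likes": 3, "clicked": 1},
    {"progress": 1.0, "likes": 2, "clicked": 1},
    {"progress": 0.1, "likes": 2, "clicked": 0},
    {"progress": 0.2, "likes": 1, "clicked": 0},
    {"progress": 0.3, "likes": 3, "clicked": 0},
]
print(rank_attributes(rows, ["progress", "likes"]))  # 'progress' ranks first; 'likes' scores 0.0
```

Cohen's d is scale-free, so attributes measured in different units (a 0 to 1 progress fraction, a like count) can be compared on one ranking.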

Predictive Model

The full source code and summary are here: Predictive Modelling Task

The predictive model we used was a random forest with one hundred decision trees. The model achieved accuracy, precision, recall, and AUC scores of 1.0, i.e. a perfect score on every metric. The most important feature was progress through the article (as seen in the Aggregate Statistics task!).
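A score of 1.0 means something slightly different for each of the four metrics. The plain-Python sketch below shows how each is computed from binary click labels and predicted click scores. It is not the team's code. In scikit-learn the same quantities come from `accuracy_score`, `precision_score`, `recall_score` and `roc_auc_score`. If scikit-learn was used, "one hundred decision trees" corresponds to `RandomForestClassifier(n_estimators=100)`. The function name and the 0.5 threshold are assumptions.

```python
def classification_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, precision, recall (at `threshold`) and ROC AUC for binary labels."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    # AUC = probability that a random positive scores above a random negative (ties count half).
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "auc": wins / (len(pos) * len(neg)),
    }


print(classification_metrics([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))
# {'accuracy': 1.0, 'precision': 1.0, 'recall': 1.0, 'auc': 1.0}
print(classification_metrics([1, 0, 1, 0], [0.9, 0.6, 0.4, 0.2]))
# {'accuracy': 0.5, 'precision': 0.5, 'recall': 0.5, 'auc': 0.75}
```

An AUC of exactly 1.0 therefore means every clicked impression was scored strictly above every non-clicked one.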

Generative Model

The full source code and summary are here: Generative Modelling Task

The generative model we used was CTGAN (Conditional Tabular GAN). It is a GAN variant for tabular data whose generator is conditioned on the values of discrete columns, so it can learn tables that mix categorical and continuous columns. The data CTGAN generated appeared to mimic the distributions of the real data. Columns with privacy risks, such as user ID and log ID, were removed and not generated by CTGAN, in support of trustworthy AI!
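Two small steps surround the CTGAN training itself. Identifier columns are dropped before fitting, and afterwards the synthetic columns are checked against the real ones. The standard-library sketch below covers only those two steps, not CTGAN. The column names in `PRIVACY_RISK_COLUMNS` are hypothetical. The two-sample Kolmogorov-Smirnov (KS) statistic is one common fidelity check, not necessarily the one used in this project. It is 0 when two samples have identical empirical distributions and 1 when they do not overlap at all.

```python
from bisect import bisect_right

PRIVACY_RISK_COLUMNS = {"user_id", "log_id"}  # hypothetical identifier column names


def drop_identifiers(rows, columns=PRIVACY_RISK_COLUMNS):
    """Remove identifier columns so the generative model never sees (or learns to emit) them."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]


def ks_statistic(real, synthetic):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(real), sorted(synthetic)
    gap = 0.0
    for x in a + b:  # the supremum of a step-function gap is reached at a sample point
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


print(drop_identifiers([{"user_id": 7, "log_id": 42, "progress": 0.5}]))  # [{'progress': 0.5}]
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```

In practice one would compute `ks_statistic` per numeric column of the real data versus the CTGAN sample and flag columns with large values.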

Sources Used Throughout Development

TPM Coding Tutorial

Remote Attestation Tutorial

Prompt Release Slides

CTGAN Documentation

