Storing data #3

Open
akshika47 opened this issue Feb 19, 2023 · 4 comments

Comments

@akshika47
Member

We need a strategy for storing data securely while keeping it usable for data analysis. At the same time, we should define a policy framework for how we analyse our data. Looking for suggestions!

@mahfoos

mahfoos commented Feb 22, 2023

We can use anonymization techniques to protect the privacy of the individuals whose data is being collected and analyzed. Techniques such as de-identification and differential privacy help ensure that sensitive information is not disclosed.
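To make the differential privacy idea concrete, here is a minimal sketch of the Laplace mechanism applied to a counting query. The function names, the sample GPA values, and the choice of epsilon are all illustrative assumptions, not part of this project; a production setup would more likely rely on an audited library.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the items that satisfy `predicate`.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical GPA values, not real applicant data.
gpas = [3.9, 3.2, 2.8, 3.6, 3.7, 3.1]
rng = random.Random(42)
noisy = private_count(gpas, lambda g: g >= 3.5, epsilon=0.5, rng=rng)
print(f"noisy count of GPA >= 3.5: {noisy:.2f}")
```

A smaller epsilon adds more noise and gives stronger privacy; the released count is then less accurate, so the value has to be chosen per analysis.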

@mahfoos

mahfoos commented Feb 22, 2023

There are a few Python libraries that can be used to de-identify data:

  1. Faker
  2. Anonymizer
  3. Deidentify

@akshika47
Member Author

This sounds wonderful. Do you want to start coding up a solution to de-identify a data set in a CSV file?

We can start with the ScholarX 2021 student application data set. The CSV file has the following columns:

Timestamp | Full Name | Email address | Please share your LinkedIn profile | University/ Institution | Department/ Major | Grade Point Average (GPA) | CV Submission | Describe your achievements to us | Describe any professional experience you have gained in your field to us | Please let us know your top areas of interest (maximum 5 areas) | Please elaborate your career aspirations/ goals to us | Please select the skills/ related fields that you are most interested in developing/ pursuing | First Choice | What do you expect to gain from the ScholarX program through this mentor choice? | Second Choice | What do you expect to gain from the ScholarX program through this mentor choice? | Third Choice | What do you expect to gain from the ScholarX program through this mentor choice? | Would you like to join the ScholarX Management Team as a volunteer to help make the program successful for next year?

Let me know if you need further information. We can set up a call once you have some code developed, and we can run the code to anonymize the data set.
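As a starting point, a de-identification pass over such a CSV could look like the sketch below. It uses only the Python standard library rather than any of the libraries listed above. Which columns count as direct identifiers (`DROP_COLUMNS`), the use of the email address as the pseudonym key, and the `Applicant ID` column name are assumptions made for illustration. Columns are addressed by position because the header repeats the "What do you expect to gain..." column three times, which a name-keyed reader would collapse.

```python
import csv
import hashlib
import hmac
import io

# Columns treated as direct identifiers (an assumption about this data set).
DROP_COLUMNS = {
    "Full Name",
    "Please share your LinkedIn profile",
    "CV Submission",
}
KEY_COLUMN = "Email address"  # used only to derive a stable pseudonym


def pseudonym(value: str, secret: bytes) -> str:
    """Keyed hash: the same email always maps to the same 16-character ID."""
    normalised = value.strip().lower().encode("utf-8")
    return hmac.new(secret, normalised, hashlib.sha256).hexdigest()[:16]


def deidentify(src, dst, secret: bytes) -> None:
    """Copy CSV rows from src to dst, dropping identifiers and adding an ID."""
    reader = csv.reader(src)
    writer = csv.writer(dst)
    header = [name.strip() for name in next(reader)]
    key_idx = header.index(KEY_COLUMN)
    keep = [i for i, name in enumerate(header)
            if name not in DROP_COLUMNS and i != key_idx]
    writer.writerow(["Applicant ID"] + [header[i] for i in keep])
    for row in reader:
        row = row + [""] * (len(header) - len(row))  # pad short rows
        writer.writerow([pseudonym(row[key_idx], secret)] + [row[i] for i in keep])


# Tiny made-up sample; real input would come from open(path, newline="").
sample = io.StringIO(
    "Timestamp,Full Name,Email address,University/ Institution\n"
    "2021-05-01,Jane Doe,jane@example.com,Example University\n"
)
out = io.StringIO()
deidentify(sample, out, secret=b"replace-with-a-real-secret")
print(out.getvalue())
```

This only removes direct identifiers. Free-text answers (achievements, experience, career goals) can still name a person, and combinations such as institution plus GPA may single someone out, so those columns would need separate treatment.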

@mahfoos

mahfoos commented Feb 23, 2023

Let's start by cleaning the data set and doing some exploratory analysis, such as plotting a few graphs. After that, there are some preprocessing steps needed to train the de-identification model. Can you add the data set to this repo?

reference for the model - https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-019-0935-4
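A first cleaning and exploration pass might look like the sketch below. It is a rough illustration only: the 0.0 to 4.0 GPA scale, the helper names, and the sample rows are assumptions, and the text bar chart stands in for a proper plot.

```python
import csv
import io
from collections import Counter


def clean_gpa(raw: str):
    """Parse a GPA cell into a float, or None if missing or implausible."""
    text = raw.strip().replace(",", ".")
    try:
        value = float(text)
    except ValueError:
        return None
    # Assumed 0.0-4.0 scale; adjust if the form allowed other scales.
    return value if 0.0 <= value <= 4.0 else None


def summarise(src):
    """Return (applicants per institution, list of valid GPAs) for a CSV stream."""
    per_institution = Counter()
    gpas = []
    for row in csv.DictReader(src):
        # Collapse repeated whitespace so near-identical names are merged.
        institution = " ".join((row.get("University/ Institution") or "").split())
        per_institution[institution or "(missing)"] += 1
        gpa = clean_gpa(row.get("Grade Point Average (GPA)") or "")
        if gpa is not None:
            gpas.append(gpa)
    return per_institution, gpas


# Made-up rows for illustration, not real applicant data.
sample = io.StringIO(
    "University/ Institution,Grade Point Average (GPA)\n"
    "Example University,\"3,5\"\n"
    "Example  University,n/a\n"
    "Other Institute,4.7\n"
)
counts, gpas = summarise(sample)
for name, n in counts.most_common():
    print(f"{name:<25} {'#' * n}")
print("valid GPAs:", gpas)
```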
