Basic to-do List

- For the Prediction Model

  1. Add word doc to repo
  2. Find out the basic data points for the dataset
  3. Make a questionnaire for the data input (training)
  4. Start surveying people
  5. Get at least 50 entries for the dataset
  6. Make a model that predicts the illness from the data points. Random forest is probably the best bet going forward: it is essentially a decision tree ensemble, and we need a model that combines multiple, loosely connected data points well, which RF does best for us (see the sketch below)
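
A minimal sketch of that random-forest idea (a hedged illustration, not the final pipeline): the file name and feature columns below are hypothetical stand-ins for whatever the questionnaire produces:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("survey_responses.csv")  # hypothetical file from the survey step
X = df[["mood_rating", "sleep_hours", "stress_level"]]  # placeholder data points
y = df["illness"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# An ensemble of decision trees: each tree sees a random slice of rows and
# features, which is why RF copes well with loosely connected data points.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```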

Project To-do's:

  1. https://huggingface.co/blog/sentiment-analysis-python
  2. Start ideating on tokenisation and how to work with it
  3. Make a model that predicts depression and other conditions based on the emotion-scale ratings
  4. Start on the NLP front for making believable conversations, implement ideas from the todolist app // Use ChatGPT's API
  5. Need to download tf.model.h5, it's 1 GB
  6. Need to make a dictionary of emotions ranked by their importance per inputText, sort it, and show the topmost emotion as the prevalent emotion (see the sketch after this list)
  7. Figure out the iteration scenario with the input model and sort out the situation with the scoresList (we can instead just call the function on each iteration)
  8. Figure out how to associate words with emotions and then make a wordcloud of the most common words associated with the emotion
  9. Make a wordcloud of the main words associated with positive and negative emotions in that specific text
  10. Try to use Pygmalion AI to have the conversation; it's fine if it has to run via Google Cloud
  11. Start to learn how to teach the AI to contextualize in a conversation; use a detailed tree structure to make it understand the context of the conversation
  12. Understand how the wordcloud works and, based on that, start with tokenization
  13. Find a way to change the AI's name (it seems the JSON file needs to be edited)
  14. Find a way to make the AI more professional by default and, when the moment calls for it, more relaxed and chill
  15. Find a way to make the AI better at keeping context
  16. Find a way to inject emotions into the conversation and make it more natural
  17. Find a way to extract information such as names
  18. Find a way to inject the questions into the conversation naturally without disrupting its flow
  19. Need to connect the sentiment analysis with the NLP model and figure out injection
  20. Connect with the front-end API; learn JSON and the rest of that stack
  21. Need to make the sentiment analysis return -1, 0, 1 for negative, neutral, and positive emotions portrayed in the text, so that the model can take this into account (see the sketch after this list)
  22. Sentiment analysis doesn't return the correct emotion when a positive word is negated (e.g. "not happy"); it often disregards the existence of the negating word
  23. bB_t.py line 61
  24. Fix the "ValueError: All arrays must be of the same length" error in scripts.basicRun.py at line 3
  25. Need to run install.bat, then follow the tutorial and download PYG 4bit, and see how it works without WSL
  26. Need to add stop words to the NLP model so that it does not use racist words or responses
  27. Make a function that goes through the bot's response to see what it's asking and, if it's relevant, adds it to the info dictionary
  28. Need to add a .gitignore entry for __pycache__ directories
  29. Need to add the attention mask and pad token ID to generation calls
  30. Can add a mood selector that deploys a different story setting based on the user preference
  31. Need to clean up the Git repo, including the README and the to-do lists
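
For to-dos 6, 21, and 22, here is a hedged sketch assuming the public `j-hartmann/emotion-english-distilroberta-base` checkpoint (swap in the project's own model); the helper names are our own:

```python
from transformers import pipeline

classifier = pipeline("text-classification",
                      model="j-hartmann/emotion-english-distilroberta-base",
                      top_k=None)  # return a score for every emotion label

def emotion_scores(input_text):
    """Return {emotion: score} sorted by importance, highest first."""
    out = classifier(input_text)
    scores = out[0] if isinstance(out[0], list) else out  # unwrap batch dim if present
    ranked = sorted(scores, key=lambda d: d["score"], reverse=True)
    return {d["label"]: d["score"] for d in ranked}

NEGATIVE = {"anger", "disgust", "fear", "sadness"}
POSITIVE = {"joy"}

def sentiment_sign(input_text):
    """Collapse the prevalent emotion into -1 (negative), 0 (neutral), or 1 (positive)."""
    top_emotion = next(iter(emotion_scores(input_text)))
    return -1 if top_emotion in NEGATIVE else (1 if top_emotion in POSITIVE else 0)

print(emotion_scores("I can't sleep and everything feels pointless"))
print(sentiment_sign("I am not happy"))
```

Because the model scores the whole sentence, negated phrases like "not happy" should land on the negative side, which addresses to-do 22 more robustly than word-level scoring.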

For the Submission:

  1. Make a video
  2. Discuss the algorithm and how it works
  3. Results: delivered scope

- For the NLP

  1. Find out whether to use new or old model for the talking part
  2. Find dataset to train off of
  3. Use ChatGPT's API to make the chatbot; the docs say we can modify the API call to fit our needs. Paraphrasing:
  4. In general, the attribution should include the OpenAI logo and a statement that your project is "Powered by OpenAI." You may also be required to include additional attribution depending on the type and frequency of your usage.
  5. https://huggingface.co/PygmalionAI/pygmalion-6b/tree/main
  6. https://huggingface.co/facebook/blenderbot-400M-distill?text=Hey+my+name+is+Julien%21+How+are+you%3F
  7. https://getstream.io/blog/conversational-ai-flutter/

For BlenderBOT 400M

- For the input understanding

  1. Train the model to understand certain keywords
  2. Teach it to relate keywords to moods (Sentiment Analysis, Mood Analysis)
  3. Split each function into its own standalone definition, which will help when expanding the code (see the sketch below)
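
A minimal sketch of loading the distilled BlenderBot 400M checkpoint linked above with Hugging Face transformers (assumes `transformers` and `torch` are installed); the `generate_reply` helper name is our own:

```python
from transformers import BlenderbotForConditionalGeneration, BlenderbotTokenizer

CHECKPOINT = "facebook/blenderbot-400M-distill"
tokenizer = BlenderbotTokenizer.from_pretrained(CHECKPOINT)
model = BlenderbotForConditionalGeneration.from_pretrained(CHECKPOINT)

def generate_reply(user_text):
    """A standalone definition per job, as point 3 suggests."""
    inputs = tokenizer(user_text, return_tensors="pt")
    # `inputs` already carries the attention mask; the pad token ID is passed
    # explicitly, covering the generation settings flagged in the to-do's.
    reply_ids = model.generate(**inputs,
                               pad_token_id=tokenizer.pad_token_id,
                               max_new_tokens=60)
    return tokenizer.decode(reply_ids[0], skip_special_tokens=True)

print(generate_reply("Hey, my name is Julien! How are you?"))
```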

How to change the name of the bot:

- How to work on privacy of data collected

When collecting data from a chat conversation to feed into an AI-based counselor, it's important to collect only the minimum amount of data necessary to provide the counselor's functionality. Here are some ways you can do this:

  1. Identify key data points: Determine which data points are necessary for the counselor to provide effective advice or support. For example, if the counselor is providing mental health support, you may only need to collect information about the user's mood, stress level, and sleep patterns.

  2. Use pre-defined response options: Use pre-defined response options or buttons to collect user data, rather than open-ended questions that may result in unnecessary data. For example, you can ask users to rate their mood on a scale of 1 to 10, rather than asking them to describe their mood in detail.

  3. Filter out irrelevant data: Use natural language processing (NLP) techniques to filter out irrelevant data from the conversation. For example, you can use sentiment analysis to filter out messages that are not related to the user's mood or stress level.

  4. Collect data in real-time: Collect data in real-time during the conversation, rather than collecting all data at the end of the conversation. This can help you to collect only the necessary data points and avoid collecting unnecessary or irrelevant data.

  5. Anonymize user data: Anonymize user data as much as possible to protect user privacy. For example, you can use unique identifiers instead of usernames or personal information to track user conversations.

By following these best practices, you can help to ensure that you are collecting only the necessary data from chat conversations to feed into your AI-based counselor, while also protecting user privacy and building user trust.
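
As a concrete illustration of points 2 and 5, here is a hedged sketch; the field names, the 1-10 scale, and the helper names are assumptions, not a fixed design:

```python
import uuid

_user_ids = {}  # username -> opaque identifier; keep this mapping access-controlled

def pseudonymize(username):
    """Replace a username with a stable random identifier before storage."""
    if username not in _user_ids:
        _user_ids[username] = uuid.uuid4().hex
    return _user_ids[username]

def record_mood(username, mood_rating):
    """Store only the minimum data point: a 1-10 mood rating, no free text."""
    if not 1 <= mood_rating <= 10:
        raise ValueError("mood must be rated on the 1-10 scale")
    return {"user": pseudonymize(username), "mood": mood_rating}

print(record_mood("alice", 7))  # e.g. {'user': '3f2a...', 'mood': 7}
```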

- How to work on the encryption of data collected

If you are feeding data into the chatbot, there are a few ways to implement encryption to protect it:

  1. Transport Layer Security (TLS): Use TLS to encrypt data in transit between your server and the chatbot service. This will help to prevent any interception or eavesdropping on the communication channel.

  2. Encrypt data at rest: If you need to store user data, you can encrypt it at rest using a symmetric or asymmetric encryption algorithm. This will ensure that the data is protected even if the storage media is compromised.

  3. Hash sensitive data: If you need to store sensitive data like passwords, hash it with a dedicated password-hashing algorithm such as bcrypt; a plain fast hash like SHA-256 is not sufficient on its own for passwords. This will help to protect the passwords in case of a data breach.

  4. Use secure APIs: When communicating with the chatbot service, use secure APIs that require authentication and authorization. This will ensure that only authorized parties can access the data.

It's important to note that encryption can add processing overhead and may impact performance, so consider the trade-offs between security and performance when implementing it.
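
A short sketch of points 2 and 3, assuming the third-party `cryptography` and `bcrypt` packages are installed; where the Fernet key actually lives (a secrets manager, not source code) is out of scope here:

```python
import bcrypt
from cryptography.fernet import Fernet

# Point 2: symmetric encryption at rest.
key = Fernet.generate_key()  # in production, load this from a secrets manager
fernet = Fernet(key)
token = fernet.encrypt(b"user mood log: 3/10, poor sleep")
assert fernet.decrypt(token) == b"user mood log: 3/10, poor sleep"

# Point 3: hash passwords with a dedicated password-hashing algorithm.
hashed = bcrypt.hashpw(b"user-password", bcrypt.gensalt())
assert bcrypt.checkpw(b"user-password", hashed)  # verify without storing plaintext
```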

- Services to use for the DB:

There are several services that you can use for secure storage of login details and user data. Here are some options:

Login details:

  1. Amazon Web Services (AWS) provides several secure database services, including Amazon RDS and Amazon DynamoDB, which can be used to securely store login details.
  2. Google Cloud Platform (GCP) also provides secure database services like Google Cloud SQL and Google Cloud Firestore.
  3. Microsoft Azure offers several secure database services, including Azure SQL Database and Azure Cosmos DB.

User data:

  1. AWS provides secure storage services like Amazon S3 and Amazon Glacier that can be used to store user data.
  2. GCP offers several secure storage options like Google Cloud Storage and Google Cloud Bigtable.
  3. Microsoft Azure provides secure storage services like Azure Blob Storage and Azure Data Lake Storage.

In addition to these cloud-based storage options, there are also other third-party storage providers like MongoDB Atlas, Firebase, and DigitalOcean that provide secure storage services.

It's important to evaluate each service based on your specific needs and requirements, including factors like data volume, performance, scalability, and cost. Additionally, you should also consider the security features and certifications offered by each service to ensure that they meet your security standards.

In Project:

In /SRC:

analysis ~

  1. Need to clean up the code, make it more readable and understandable, and remove the unnecessary comments
  2. Figure out how to get the necessary items from the list that is returned
  3. Optimise the code; it's too cluttered and slow as of now
  4. Understand how to go forward with the DataFrame workflow for the returned list, and how to visualise it with a wordcloud
  5. Configure the wordcloud (see the sketch below)
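
A minimal wordcloud configuration sketch for points 4-5, assuming the returned list has already been reduced to `{word: weight}` frequencies (the sample frequencies below are illustrative):

```python
import matplotlib.pyplot as plt
from wordcloud import WordCloud

frequencies = {"tired": 12, "hopeless": 9, "sleep": 7, "alone": 5}  # illustrative

wc = WordCloud(width=800, height=400,
               background_color="white").generate_from_frequencies(frequencies)

plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
```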

test ~

  1. Use it for dump commands and testing

input ~

  1. Try to add more text for the analysis to see how the output works out

In /plugins:

  1. Try making a new plugin for each redundant piece of code that is there in the form of comments, to make sure it works
  2. Don't forget to import this plugin folder into the main src folder

For the chatBot:

  1. Try to use ChatGPT's API to handle the insertion for the chatbot

Databases // Reading Sources

  1. https://www.kaggle.com/datasets/arashnic/the-depression-dataset
  2. https://data.world/datasets/depression (each link has a source file inside)
  3. https://datasets.simula.no/depresjon/
  4. https://paperswithcode.com/task/depression-detection (Important to look at; uses speech data for prediction)
  5. https://www.nature.com/articles/s41597-022-01211-x (Uses brainwaves and speech analysis)
  6. https://www.hindawi.com/journals/cin/2022/5731532/ (Isn't this what we are doing?)
  7. https://github.com/kharrigian/mental-health-datasets (Dataset Megadoc)
  8. https://link.springer.com/article/10.1007/s00521-021-06426-4 (Paper on deep learning and RNNs for text-based depression detection)
  9. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8675644/ (Child depression detection using ML, uses the YMM dataset)

Project Pre-Requisites:

  1. Text Preprocessing: This involves cleaning, tokenizing, and normalizing the text to remove any noise, stop words, punctuation, and convert the text to a standardized format.
  2. Feature Extraction: In this step, you will need to convert the preprocessed text into numerical representations (features) that can be used by the machine learning algorithms. Common methods for feature extraction in NLP include Bag of Words, TF-IDF, Word2Vec, and GloVe (see the sketch after this list).
  3. Machine Learning Algorithms: You will need to be familiar with different ML algorithms such as Decision Trees, Naive Bayes, Logistic Regression, and Neural Networks. You will also need to know how to train, validate, and test these models.
  4. Evaluation Metrics: You will need to be able to evaluate the performance of your machine learning models using metrics such as accuracy, precision, recall, F1-score, and confusion matrix.
  5. Dataset Creation: You will need to have a dataset of conversations between people with and without depression to train your model. You will also need to ensure that your dataset is balanced, representative, and annotated correctly.
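
A compact sketch tying prerequisites 1-4 together: `TfidfVectorizer` folds in the preprocessing, logistic regression stands in for the classifier, and the report covers precision, recall, and F1. The four texts and their labels are toy placeholders, not real data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.pipeline import make_pipeline

texts = ["I feel empty and tired all the time",
         "Had a great day with friends",
         "I can't get out of bed anymore",
         "Excited about the new project"]
labels = [1, 0, 1, 0]  # 1 = depressive language, 0 = not (toy annotation)

model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),  # tokenize + normalize
    LogisticRegression(),
)
model.fit(texts, labels)

preds = model.predict(texts)  # evaluate on held-out data in practice
print(classification_report(labels, preds))
print(confusion_matrix(labels, preds))
```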

Courses:

To get started with these topics, you can take online courses such as "Natural Language Processing with Python" by NLTK, "Applied Machine Learning" by Coursera, "Machine Learning A-Z" by Udemy, or "Deep Learning Specialization" by Coursera.

  1. https://in.coursera.org/learn/machine-learning
  2. https://in.coursera.org/specializations/deep-learning#courses
  3. https://in.coursera.org/specializations/data-science-python#courses

Link Area

  1. KoboldAI Collab
  2. Installation guide for above
  3. Dive into chatBots and Dialogue systems
  4. Running PYG/Win no 8Bit
  5. PYG/WIN 8Bit Discord Tutorial
  6. Something else for the above, see if it helps
  7. System Requirements
  8. Another 8Bit Tut but using WSL
  9. Running PYG with 4.5G reddit tut
  10. How to use LLaMa 4bit
  11. API help for koboldAI

- Database Metrics:

  1. Demographic Info:

Age, gender, ethnicity, education, employment status, financial status, (relationship status <- doubtful)

  2. Family History:

Whether any conditions run in the family

  3. Medical History:

Past and current medical conditions, medications, treatments

  4. Symptoms:

Detailed information on a person's current and past symptoms, including severity, duration, and frequency

  5. Life events:

Traumatic events, major life changes, and other significant experiences

  6. Substance usage:

Information about alcohol, tobacco, and drug use can help to diagnose substance abuse or addiction

  7. Support system:

Relationships with family, friends, and colleagues

  8. Stress levels:

Self-explanatory; can be broken down into thresholds and sources

  9. Personal beliefs: (doubtful)

Personal beliefs and attitudes towards mental health, including stigma, can affect a person's willingness to seek help and comply with treatment
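
To turn these metric groups into dataset columns, here is an illustrative pandas schema for one questionnaire row; every column name and value is an assumption to be refined during questionnaire design:

```python
import pandas as pd

row = {
    "age": 21, "gender": "F", "ethnicity": "prefer-not-to-say",
    "education": "undergrad", "employment_status": "student",
    "financial_status": "dependent",
    "family_history": True,
    "medical_history": "none", "medications": "none",
    "symptom_severity": 3, "symptom_duration_weeks": 6, "symptom_frequency": "daily",
    "major_life_events": "relocation",
    "substance_use": "none",
    "support_system": "strong",
    "stress_level": 7, "stress_source": "exams",
    "beliefs_stigma": "moderate",
}

df = pd.DataFrame([row])
print(df.dtypes)  # sanity-check column types before surveying at scale
```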