Create Chat Bot Interface Trained On Documentation Site #102

Open
inodb opened this issue Apr 1, 2023 · 16 comments

Comments

@inodb
Member

inodb commented Apr 1, 2023

Background:

  • cBioPortal: cBioPortal is an open-source platform for cancer genomics data analysis and visualization. It provides a centralized resource for exploring and analyzing large-scale cancer genomic data sets, including genomic alterations, gene expression, and clinical information. The platform integrates data from multiple sources, including The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC), and makes it available through a web interface for researchers, clinicians, and the general public. Please refer to the cBioPortal home page for an overview.
  • cBioPortal has extensive documentation (https://docs.cbioportal.org/) covering how to (1) install and configure cBioPortal locally, (2) use cBioPortal as a user, and (3) use the API programmatically. Searching through the documentation is not always straightforward, and we often get questions on the user group (https://groups.google.com/g/cbioportal) that we mainly answer by pointing to a link in the docs. A chat interface might be a good way to give users quicker feedback on what they are searching for.

Goal:

  • Build a chat bot interface for cBioPortal's Documentation

Approach:

Skills needed:
Familiarity with the command line and the use of APIs

Possible mentors:
@inodb
@walleXD

@Praashh

Praashh commented Apr 2, 2023

Hey @inodb, I think my skills are a good match for this project. Can you assign it to me?

@priyanshiaroraaa

I know I am new to this, but I am currently building a personal virtual assistant in Python for my second minor project in college, and I also have a good command of Java. I am an AIML student, and that background will help me train the model for your chatbot. I have a good command of Python, Java, machine learning, NLP, and AI algorithms. Since my minor project is still in progress, I am attaching the documentation and code I have so far for reference:
MINOR.docx
synopsis presentation short.pptx
Software Requirements Specification.docx

@kamranayesh

Hi!
I’m Kamran Ayesh, a final-year CSE student at Indian Institute of Information Technology Guwahati, India. I have written a detailed proposal for a chatbot interface trained on the documentation site. I hope to receive your feedback or questions soon.
I am well suited to contribute to this project, as I built a virtual assistant with a robust UI during my internship. As a developer, I expect this project to enhance my skills and give me better exposure to open source.

Looking forward to contributing!

Thanks,
Kamran Ayesh

@Nisarg908

I'm interested in helping to build a chatbot. I am Nisarg Patel, a second-year CSE university student, and I would like to contribute to building this chatbot. I am new to this, but I am ready to learn and help with the cause, and this will help me improve.

Looking forward to your response!

Thanks,
Nisarg Patel

@JamesAlaric

JamesAlaric commented Apr 4, 2023

Hello, I'm interested in helping to develop this chatbot. How can I apply as a GSoC contributor, please?

@ViditJain123

ViditJain123 commented Nov 18, 2023 via email

@NehaAr

NehaAr commented Jan 20, 2024

Hi, I am working on a similar use case for my pipeline, where I am building a chatbot that scrapes through the documents in the pipeline. I would really like to work on the above issue.

inodb added the GSoC-2024 (GSoC 2024 Candidate Projects) label and removed the GSoC-2023 (GSoC 2023 Candidate Projects) label on Feb 5, 2024
@NeuralFlux

Hi @inodb, I'm a CS grad at NYU with a solid grasp of ML, PyTorch, and the CLI. I've worked on LLMs for zero-shot classification on food ingredient data. I believe using LLMs for retrieval-augmented generation is highly applicable to your use case. How would you advise me to get started on this?

@kartheekyakkala

Hello @inodb, I feel we can use the Retrieval-Augmented Generation (RAG) technique instead of fine-tuning or training. Since the documentation or knowledge base gets updated now and then, fine-tuning the LLM could be costly. Moreover, RAG is more reliable because it draws on up-to-date knowledge. I'm a CS grad at UCM with a strong interest in LLMs and generative AI. I would like to work on this issue. Could you give me some leads?
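
As a rough illustration of the RAG flow suggested here, the sketch below retrieves the most relevant documentation chunk for a question and assembles a prompt around it. Everything in it is an assumption made for illustration: the chunks are made-up placeholder text rather than real cBioPortal documentation, a bag-of-words cosine score stands in for the embedding model a real system would likely use, and the final call to an LLM is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True
    )
    return ranked[:k]


def build_prompt(question, context):
    """Assemble the prompt that would be sent to an LLM (call not shown)."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{joined}\n"
        f"Question: {question}\n"
    )


# Hypothetical placeholder chunks, NOT real documentation text.
CHUNKS = [
    "Deploy with Docker by running docker compose up in the repository root.",
    "The REST API exposes study and sample endpoints for programmatic access.",
    "OncoPrint shows genomic alterations across a set of samples.",
]

question = "How do I deploy with Docker?"
prompt = build_prompt(question, retrieve(question, CHUNKS, k=1))
print(prompt)
```

Because the chunks are looked up at query time, updating the documentation only requires re-indexing the chunks, which is the cost advantage over fine-tuning mentioned in this comment.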

@Steveolas

Hey all! I am Ilan, a Data Science grad from the Technion. I would love to contribute to this project.

@inodb As a first step, I wanted to ask whether you have already thought about how to structure the documentation as data for training. If so, I would love to see an example. If not, I think that could be a good step to begin with. I would also like to know whether it's possible to share the documentation in some easy-to-work-with format that you might have on the backend. If not, I can just scrape it straight from the webpage.
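
One possible way to structure the documentation as data, sketched under the assumption that the pages are available as Markdown text: split each file into one record per heading, keeping the source name so answers can point back to it. The `Section` type, the `split_markdown` helper, and the sample file name are hypothetical names introduced for this example, not part of any existing cBioPortal tooling.

```python
import re
from dataclasses import dataclass


@dataclass
class Section:
    source: str   # file or page the section came from
    heading: str  # nearest Markdown heading above the text
    text: str     # body text under that heading


# ATX-style headings such as "# Title" or "### Subsection".
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_markdown(source: str, markdown: str) -> list[Section]:
    """Split Markdown text into one Section per heading."""
    sections: list[Section] = []
    heading = "(untitled)"
    body: list[str] = []
    in_fence = False

    def flush() -> None:
        # Store the text collected so far, skipping empty sections.
        text = "\n".join(body).strip()
        if text:
            sections.append(Section(source, heading, text))

    for line in markdown.splitlines():
        # Lines inside fenced code blocks are never treated as headings.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            heading = match.group(2)
            body = []
        else:
            body.append(line)
    flush()
    return sections


# Hypothetical sample input.
sample = "# Install\nRun docker.\n\n## Config\nSet props.\n"
for section in split_markdown("deploy.md", sample):
    print(section.source, "|", section.heading, "|", section.text)
```

Records of this shape could feed either a retrieval index or a fine-tuning dataset, so the choice between those approaches does not have to be made before the data is prepared.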

Anyway, would love to get some suggestions on what should be the first steps to start getting familiar with the project.

Thanks
Ilan Meissonnier

@Steveolas

BTW, the Medium link given as the example blog is member-only :(. The following blog seems pretty similar (hard to tell, as I couldn't read the original, LOL). Hope this is helpful.

Ilan

@Steveolas

Steveolas commented Mar 7, 2024

Hey all!
I have been thinking about this project a bit and I have some interesting thoughts I'd like to share...

If I were using a chatbot to help me navigate documentation, I would prefer that it provide a link to the documentation page where it learned the information. That way I can fact-check it and/or read further into the problem I'm having. As we know, LLMs are not always accurate and can sometimes be quite confident even when wrong. While it may be possible to train the chatbot to retrieve a link as well as answer a question (by structuring the training data accordingly), this task might be solved more simply with traditional information retrieval techniques, i.e., retrieving the page that best matches a user query from a search bar (I have noticed that the search bar on the documentation webpage is not functional at the moment). This of course gets more complicated if you want to include answers from the Google Group conversations, but the approach should definitely be considered. Another option might be to combine both approaches in some way, although we would need to decide exactly how to do that.
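
To make the traditional information retrieval option concrete, here is a minimal sketch that ranks pages against a query with a simple TF-IDF score and returns their links, so a user can check the source directly. The page texts and the `docs.example.org` URLs are hypothetical placeholders, and `build_index` and `search` are names invented for this example; a real setup would more likely rely on an existing search library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Compute per-page term counts and a smoothed IDF weight per term."""
    term_counts = {url: Counter(tokenize(text)) for url, text in pages.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    n = len(pages)
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return term_counts, idf


def search(query, index, k=3):
    """Return up to k (url, score) pairs, best match first."""
    term_counts, idf = index
    scores = []
    for url, counts in term_counts.items():
        length = sum(counts.values()) or 1
        # Term frequency normalised by page length, weighted by IDF.
        score = sum(
            counts[t] / length * idf.get(t, 0.0) for t in tokenize(query)
        )
        if score > 0:
            scores.append((url, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical pages and URLs, NOT the real documentation site.
PAGES = {
    "https://docs.example.org/deploy": "Install and deploy the portal with Docker",
    "https://docs.example.org/api": "Use the web API from Python or R clients",
}

index = build_index(PAGES)
for url, score in search("python api client", index):
    print(f"{score:.3f}  {url}")
```

The same ranked links could also be attached to a chatbot answer, which is one way to combine the two approaches discussed in this comment.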

Would love to hear what everyone thinks about this, or if there might be something I'm missing. Would specifically love to hear your insights on this @inodb.

Sorry for the long post,
Ilan Meissonnier

@skhavindev

Hey!

I am Khavin. I am an Artificial Intelligence (AI) student currently pursuing a dual Bachelor of Science in data science at the Indian Institute of Technology Madras (IIT Madras) and Sathyabama Institute of Science and Technology. I have a strong interest in machine learning, artificial intelligence, and neuromorphic computing.

I have extensive experience working with PyTorch and have successfully built chatbots using Google AI Studio. This has given me some experience in training and building chatbots. I think this experience is useful for this application, and the project would give me further experience with real-world applications of AI.

Looking forward to contributing to open source!

Regards,
Khavin S

@Steveolas

Hey all, I have made a prototype for a chatbot using RAG. I think RAG could be a pretty good approach for this project. I'm sharing the prototype as a link to a Kaggle notebook; if you are interested, please leave any feedback you may have.

https://www.kaggle.com/code/ilanmeissonnier/rag-for-cbioportal-documentation-chatbot

Ilan Meissonnier

@Steveolas

I have also come across a research paper that came out a few days ago suggesting a method called Retrieval Augmented Fine Tuning (RAFT). I have not finished reading it yet, but it already seems like it could be a really good approach for this.

@Steveolas

Link to the paper 😅
