Valmont/F - A re-implementation of the classic Arrowsmith Literature Based Discovery (LBD) system, using Groovy and Grails.
This is a learning platform for anyone interested in LBD who wants to play and experiment. Our first goal is to re-implement the classic algorithm(s) developed by Don Swanson, and then to start looking for new and interesting variations and additions.
To that end, one of our primary goals is to make the system as modular and pluggable as possible, so that it is easy to experiment with novel approaches to Literature Based Discovery.
This is very much a Work In Progress at the moment, featuring fairly naive implementations of the original "procedure one" and "procedure two" approaches from Swanson's A-B-C discovery algorithm.
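Assuming "procedure two" corresponds to closed discovery (the user supplies both an A literature and a C literature, and the system looks for title words the two share as candidate B terms), its core step can be sketched in Java as below. This is a standalone illustration, not code from the Valmont/F codebase (which is written in Groovy); the class name, the tiny stopword list, the ranking by combined document frequency, and the example titles are all assumptions made up for the sketch.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch of the closed-discovery ("procedure two") step:
 * given the titles of an A literature and a C literature, return the
 * title words both literatures share (candidate B terms).
 * Not taken from the Valmont/F codebase.
 */
final class ProcedureTwoSketch {

    // Placeholder stopword list (assumption); a real run would load a domain-specific list.
    private static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "for", "in", "of", "on", "the", "to", "with"));

    /** Lower-cases a title, splits it on runs of non-alphanumeric characters, and drops stopwords. */
    static List<String> tokenize(String title) {
        return Arrays.stream(title.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(t -> !t.isEmpty() && !STOPWORDS.contains(t))
                .collect(Collectors.toList());
    }

    /** Maps each term to the number of titles that contain it at least once. */
    static Map<String, Integer> documentFrequency(List<String> titles) {
        Map<String, Integer> df = new HashMap<>();
        for (String title : titles) {
            for (String term : new HashSet<>(tokenize(title))) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    /** Terms found in both literatures, highest combined document frequency first, ties alphabetical. */
    static List<String> linkingTerms(List<String> aTitles, List<String> cTitles) {
        Map<String, Integer> dfA = documentFrequency(aTitles);
        Map<String, Integer> dfC = documentFrequency(cTitles);
        Comparator<String> byCombinedFrequency =
                Comparator.comparingInt((String t) -> dfA.get(t) + dfC.get(t)).reversed();
        return dfA.keySet().stream()
                .filter(dfC::containsKey)
                .sorted(byCombinedFrequency.thenComparing(Comparator.naturalOrder()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Invented example titles, loosely modelled on Swanson's Raynaud's / fish oil case.
        List<String> raynaud = Arrays.asList(
                "Blood viscosity in Raynaud's disease",
                "Platelet aggregation and Raynaud's phenomenon");
        List<String> fishOil = Arrays.asList(
                "Fish oil lowers blood viscosity",
                "Dietary fish oil and platelet aggregation");
        System.out.println(linkingTerms(raynaud, fishOil));
    }
}
```

With the invented titles above, main prints [aggregation, blood, platelet, viscosity]. Ranking by how many titles mention a shared term is only one naive choice; weighting terms by how unusual they are in the wider corpus would be an obvious variation to plug in.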
When I say I am called Valmont/F, the name will convey no impression to the reader, one way or another. My occupation is that of an open source Literature Based Discovery system on GitHub. If you ask anyone who Valmont was, she will likely point you to http://www.gutenberg.org/files/19369/19369-h/19369-h.htm. If you ask her why I am named Valmont/F, she will surely say that I am named after the world-famous detective, Eugène Valmont.
There are two main ways to deploy Valmont/F at the moment. The first, and easiest, is to use our public Docker image(s), located at https://hub.docker.com/r/fogbeam/valmont-f.
Running "docker pull fogbeam/valmont-f:latest" followed by "docker run -d -p 8080:8080 fogbeam/valmont-f:latest" should give you a running Valmont/F instance. The container exposes port 8080; change the host side of the -p argument (the first number) to map it to whatever port makes sense on your Docker host. The webapp is served from the root context, e.g. http://localhost:8080/ if you run the command above on your local machine.
The second way is to clone this Git repo, install Java and Grails (if you don't already have them), and then run ./run_valmont.sh from the root of the cloned repo directory. Take this approach if you want to hack on the code yourself. Running things this way requires Grails 3.3.6.
Original code provided by Fogbeam Labs is licensed under the Apache License v2. Data files and supporting libraries may be under separate licenses. See the LICENSE file for more details.
- Create a Docker image and push to Docker Hub - DONE
- add more terms to the clinical-stopwords list
- add a "domain selector" to toggle which archive is queried and which stopword list(s) are used
- better tokenization of abstracts and titles, so we don't, for example, treat 'Start' and 'Start.' as different tokens and generate each as a 'B term'
- use NLP, deep learning, etc. to do deeper semantic analysis of article text and find more meaningful connections than simple co-occurrence of words
- improve code structure to create reusable components that simplify implementing new algorithms and approaches
- add input validation to existing controllers
- figure out a UI experience for "drilling down" further into the results we currently return, especially for "Procedure One"
- support more complex relationships, especially "multi-hop" ones that involve more than two concepts
- Add visualizations to help navigate / explore results, possibly using dot / Graphviz
- Add caching to reduce the need for downloading documents all the time
- Add ability to filter the initial query(ies) by date range
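For the tokenization item, one possible direction is to lower-case each token and strip punctuation only from its edges, so 'Start' and 'Start.' collapse into a single token while hyphenated terms such as 'omega-3' stay intact. The Java sketch below is a hypothetical illustration, not Valmont/F code; the class name is invented, and the sample stopwords are placeholders for whatever the real stopword lists contain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

/**
 * Hypothetical token normalizer: lower-cases each whitespace-separated token
 * and strips non-alphanumeric characters from its edges only, so "Start" and
 * "Start." both become "start" while "omega-3" is left intact.
 * Not part of the Valmont/F codebase.
 */
final class TokenNormalizer {

    // Leading or trailing runs of characters that are neither letters nor digits.
    private static final Pattern EDGE_PUNCTUATION =
            Pattern.compile("^[^\\p{L}\\p{N}]+|[^\\p{L}\\p{N}]+$");

    private final Set<String> stopwords = new HashSet<>();

    TokenNormalizer(Collection<String> words) {
        // Store stopwords lower-cased so matching is case-insensitive.
        for (String word : words) {
            stopwords.add(word.toLowerCase(Locale.ROOT));
        }
    }

    /** Returns the normalized, stopword-free tokens of a title or abstract, in their original order. */
    List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String token = EDGE_PUNCTUATION.matcher(raw.toLowerCase(Locale.ROOT)).replaceAll("");
            if (!token.isEmpty() && !stopwords.contains(token)) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample stopwords are placeholders, not the project's clinical-stopwords list.
        TokenNormalizer normalizer =
                new TokenNormalizer(Arrays.asList("the", "of", "in", "patients"));
        System.out.println(normalizer.tokens("Start. The start of omega-3 therapy (in patients)."));
    }
}
```

Running main prints [start, start, omega-3, therapy]. Stripping only the edges is deliberately conservative; stemming (so 'patient' and 'patients' match) or a proper NLP tokenizer would be natural next steps.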