Juan Miguel Cejuela edited this page Nov 25, 2015 · 24 revisions

Welcome to the Wiki of the amazing Nala pipeline.

You can take a quick peek at why Nala was created in the first place: it is the project of two theses, undertaken to build up tagtog.net.

Introductory talk:
Online presentation Thesis

Pipeline diagram: view the pipeline visualization externally at a higher resolution here. An editable version of the online diagram can be found here (requires logging in to or creating a Lucidchart account).

Goals of the two theses and this method:

  1. Study the significance of natural-language (NL) mentions in mutation mention recognition
  • ratio of standard vs. NL mentions in abstracts and full text
  • % of novel mutations not present in SwissProt (would require manual annotation of protein)
  • % of mutation mentions in natural language that do not appear as standard mentions
  2. Define/extend a corpus of NLs
  • size depends on the significance of NLs
  3. Method for mutation mention extraction grounded to their genes/proteins
  • Mutation mention recognizer better than tmVar for standard mentions
  • If NLs are relevant, demonstrate good F1 performance (> 70-80)
  • Simple or optionally advanced normalization method
  • Easy-to-use program:
    • Good documentation:
      • code
      • end-user (biology researcher level, how to call from the command line, ...)
    • Accepted inputs: programmatic call (string), text file, corpus formats
    • Supported outputs: ann.json (suitable for tagtog)
  4. Paper
  • Full draft (1 or 2 papers?) by the end of August, ready to submit to Burkhard Rost
  • Submit by September-October

Theses Documentation
