Parse OBO file #533

Open
julie-sullivan opened this issue Mar 17, 2020 · 4 comments
Labels
in progress work in progress, under active development

Comments

@julie-sullivan
Contributor

BioJava has an OBO parser:

https://github.com/biojava/biojava/blob/biojava-4.1.0/biojava-ontology/src/main/java/org/biojava/nbio/ontology/obo/OboFileParser.java

Is it fit for purpose?

@julie-sullivan
Contributor Author

I can't get the OBO parser from BioJava to work at all.

Here's my unit test. It passes, but only with these unexpected assertions:

        BufferedReader bufferedReader = FileUtils.newBufferedReader(Paths.get(getClass()
                .getResource("/hp.obo").getPath()));

        OboParser parser = new OboParser();
        Ontology ontology = parser.parseOBO(bufferedReader, "Human phenotype ontology",
                "The Human Phenotype Ontology (HPO) provides a standardized vocabulary of phenotypic abnormalities " +
                        "and clinical features encountered in human disease.");
        assertEquals(5, ontology.getTerms().size());

        Term term = ontology.getTerm("HP:0000001");
        assertEquals("HP:0000001", term.getName());
        assertEquals("All", term.getDescription());

        term = ontology.getTerm("HP:0000002");
        assertEquals("HP:0000002", term.getName());
        assertEquals("Abnormality of body height", term.getDescription());

Why does getName() return the ID (HP:0000002)? And why does getDescription() return the name?

Here's the relevant snippet from hp.obo:

[Term]
id: HP:0000001
name: All
comment: Root of all terms in the Human Phenotype Ontology.
xref: UMLS:C0444868

[Term]
id: HP:0000002
name: Abnormality of body height
def: "Deviation from the norm of height with respect to that which is expected according to age and gender norms." [HPO:probinson]
synonym: "Abnormality of body height" EXACT layperson []
xref: UMLS:C4025901
is_a: HP:0001507 ! Growth abnormality
created_by: peter
creation_date: 2008-02-27T02:20:00Z

I followed the cookbook exactly, so I must be doing something wrong, because that's not what I would expect at all.

I am going to try another library.
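
For comparison, below is a minimal sketch of reading the `[Term]` stanzas by hand, so that `id:` and `name:` end up in separately named fields. It is not BioJava's API: the class `OboTermSketch`, the nested `OboTerm` type, and both helper methods are hypothetical names made up for illustration. It assumes one `tag: value` pair per line, and it ignores OBO escape sequences, trailing modifiers, `[Typedef]` stanzas, and obsolete terms.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hand-rolled reader for OBO [Term] stanzas; not BioJava's API. */
class OboTermSketch {

    /** One [Term] stanza, with the ID and the name kept in separate fields. */
    static class OboTerm {
        String id;
        String name;
        String definition;
        String comment;
        final List<String> synonyms = new ArrayList<>();
        final List<String> xrefs = new ArrayList<>();
        final List<String> parents = new ArrayList<>();
    }

    static List<OboTerm> parse(BufferedReader reader) throws IOException {
        List<OboTerm> terms = new ArrayList<>();
        OboTerm current = null;
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.startsWith("[")) {
                // A new stanza begins; only [Term] stanzas are collected.
                current = null;
                if (line.equals("[Term]")) {
                    current = new OboTerm();
                    terms.add(current);
                }
                continue;
            }
            int colon = line.indexOf(':');
            if (current == null || colon < 0) {
                continue; // header lines, blank lines, other stanza types
            }
            String tag = line.substring(0, colon);
            String value = line.substring(colon + 1).trim();
            switch (tag) {
                case "id": current.id = value; break;
                case "name": current.name = value; break;
                case "comment": current.comment = value; break;
                case "def": current.definition = firstQuoted(value); break;
                case "synonym": current.synonyms.add(firstQuoted(value)); break;
                case "xref": current.xrefs.add(value); break;
                case "is_a": current.parents.add(stripLabel(value)); break;
                default: break; // created_by, creation_date, etc. are skipped
            }
        }
        return terms;
    }

    /** Text between the first pair of double quotes (escaped quotes are not handled). */
    private static String firstQuoted(String value) {
        int start = value.indexOf('"');
        int end = value.indexOf('"', start + 1);
        return (start >= 0 && end > start) ? value.substring(start + 1, end) : value;
    }

    /** Drops the "! label" suffix, e.g. "HP:0001507 ! Growth abnormality". */
    private static String stripLabel(String value) {
        int bang = value.indexOf('!');
        return (bang >= 0 ? value.substring(0, bang) : value).trim();
    }

    public static void main(String[] args) throws IOException {
        String obo = "[Term]\n"
                + "id: HP:0000002\n"
                + "name: Abnormality of body height\n"
                + "is_a: HP:0001507 ! Growth abnormality\n";
        for (OboTerm term : parse(new BufferedReader(new StringReader(obo)))) {
            System.out.println(term.id + " -> " + term.name + " " + term.parents);
        }
    }
}
```

With a reader like this, an assertion on the term's `id` compares against `HP:0000002` and an assertion on `name` compares against `Abnormality of body height`, which is the mapping the test above was expected to need.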

@julie-sullivan
Contributor Author

[
   {
      "id":"HP:0001187",
      "name":"Hyperextensibility of the finger joints",
      "definition":"The ability of the finger joints to move beyond their normal range of motion.",
      "namespace":"human_phenotype",
      "synonyms":[
         "Finger joint hyperextensibility"
      ],
      "xrefs":[
         "UMLS:C1844577"
      ],
      "parents":[
         "HP:0006094"
      ]
   },
   {
      "id":"HP:0025154",
      "name":"Portosystemic collateral veins",
      "definition":"Presence of biliary veins that serve as a collateral channel to the systemic circulation",
      "namespace":"human_phenotype",
      "comment":"Venous blood returning from the small intestine, stomach, pancreas and spleen converges into the portal vein. The terminal branches of the hepatic portal vein and hepatic artery empty together and mix as they enter sinusoids in the liver. Conditions such as liver cirrhosis, in which scar tissue partially blocks the normal flow of blood, may increases the pressure in the portal vein (portal hypertension).When blood flow through a vessel or a vascular bed is obstructed due to occlusion, collateral pathways open up as blood bypasses the occlusion or obstruction, and this can lead to portosystemic collateral veins in the case of cirrhosis and some other hepatobiliary diseases.",
      "synonyms":[
         "Collateral biliary circulation"
      ],
      "parents":[
         "HP:0012440"
      ]
   },
   {
      "id":"HP:0025153",
      "name":"Transient",
      "definition":"Short-lived and not permanent. This term applies to a phenotypic abnormality that is temporary and of short duration.",
      "namespace":"human_phenotype",
      "parents":[
         "HP:0011008"
      ]
   },
   {
      "id":"HP:0025152",
      "name":"Poor visual behavior for age",
      "definition":"Lack of visual responsiveness or decrease in visual capabilities suggesting a lack of visual responsiveness or decrease in visual capabilities in an infant or young child in which visual behavior fails to meet normal developmental milestones.",
      "namespace":"human_phenotype",
      "comment":"A failure to meet age-related milestones in areas such as (i) focusing ability, (ii) eye coordinationg and tracking of objects in the visual field, (iii) depth perception, (iv) color perception, and (v) object and face recognition. These milestones are generally met in the first three months of life, and failure to meet them may indicate abnormal visual development or function.",
      "synonyms":[
         "Abnormal visual behavior for age"
      ],
      "parents":[
         "HP:0000504"
      ]
   },
   {
      "id":"HP:0001188",
      "name":"Hand clenching",
      "definition":"An abnormal hand posture in which the hands are clenched to fists. All digits held completely flexed at the metacarpophalangeal and interphalangeal joints.",
      "namespace":"human_phenotype",
      "comment":"Hand clenching is commonly characterized by malpositioning of the fingers characterized by radial deviation of the 4th and 5th digits and ulnar deviation of the 2nd digit over the 3rd finger. Hand clenching is distinguished from Camptodactyly, as that term may describe fewer than five digits of a eudactylous hand and does not involve the MCPJ. The digits may overlap when they lie flexed in the palm. It is not necessary to specify the overlapping fingers finding separately.",
      "synonyms":[
         "Clenched hands"
      ],
      "xrefs":[
         "UMLS:C0239815"
      ],
      "parents":[
         "HP:0005922"
      ]
   }
]

@julie-sullivan
Contributor Author

julie-sullivan commented Mar 17, 2020

switched to db cellbase_homo_sapiens_grch38_v4
> db.obo.count()
62395
> db.obo.findOne()
{
	"_id" : ObjectId("5e70dd7e299c136b01f0a74b"),
	"id" : "HP:0001187",
	"name" : "Hyperextensibility of the finger joints",
	"definition" : "The ability of the finger joints to move beyond their normal range of motion.",
	"namespace" : "human_phenotype",
	"synonyms" : [
		"Finger joint hyperextensibility"
	],
	"xrefs" : [
		"UMLS:C1844577"
	],
	"parents" : [
		"HP:0006094"
	]
}

HPO currently contains over 13,000 terms

julie-sullivan added a commit that referenced this issue Mar 18, 2020
julie-sullivan added a commit that referenced this issue Mar 18, 2020
@julie-sullivan julie-sullivan self-assigned this Mar 19, 2020
@julie-sullivan julie-sullivan added the in progress work in progress, under active development label Mar 19, 2020