Unable to extract triplets #1455

Open
DeepankarVyas opened this issue Jul 12, 2024 · 3 comments
@DeepankarVyas

DeepankarVyas commented Jul 12, 2024

I am trying to extract triplets from an annotated string, but the code returns NULL. Here is the code I used:

library(tidyverse)
library(tm)
library(coreNLP)

# Increase Java heap space
options(java.parameters = "-Xmx4g")

# Initialize CoreNLP with the path to the unzipped folder
initCoreNLP("/Users/..../stanford-corenlp-4.5.7/")  

# Function to extract relations using CoreNLP
extract_relations <- function(text) {
  cat("Text to be annotated:\n", text, "\n\n")
  
  annotation <- tryCatch({
    annotateString(text)
  }, error = function(e) {
    message("Error in annotation: ", conditionMessage(e))
    return(NULL)
  })
  
  if (is.null(annotation)) {
    message("Annotation is NULL")
    return(list())
  }
  
  print(annotation)
  
  triples <- tryCatch({
    getOpenIE(annotation)
  }, error = function(e) {
    message("Error in extracting OpenIE triples: ", conditionMessage(e))
    return(NULL)
  })
  
  if (is.null(triples) || length(triples) == 0) {
    message("No triples extracted.")
    return(list())
  }
  
  print(triples)
  
}

# Mock dataset to train the model
mock_data <- data.frame(
  match_id = 1,
  home_team = "Manchester United",
  away_team = "Chelsea",
  match_preview = "Manchester United won their last game convincingly and have a strong home record. Chelsea, on the other hand, are struggling with injuries and have lost three of their last five away games.",
  outcome = "homewin",
  stringsAsFactors = FALSE
)

# Extracting features and assigning scores
match <- mock_data[1, ]
relations <- extract_relations(match$match_preview)

This is the output:

[image: screenshot of the output]

Stanford CoreNLP version: stanford-corenlp-4.5.7
R version: 4.3.1

Is this an issue with the way CoreNLP is initialised, or something else? Any help is appreciated.

Regards.

@AngledLuffa
Contributor

Heads up: you need three backticks ``` to highlight a large code block, not just one.

I don't know anything about the R interface to CoreNLP. As a first pass, I would check the output of the interface to make sure it is actually starting CoreNLP with the OpenIE annotator.

@DeepankarVyas
Author

Hi @AngledLuffa,

Thanks for the heads up.

I think it is starting CoreNLP, since annotateString(text) successfully annotates the text. Only the triplet extraction is failing. Could it be due to some missing annotators?

P.S. I manually downloaded stanford-corenlp-4.5.7 and can't seem to find a .properties file in the package. Not sure if that's the issue.

Regards
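
On the .properties question above: as far as I know, CoreNLP ships its default properties inside its jar files rather than as a loose file in the unzipped folder, which would explain why none is visible. The sketch below is one way to check that from R. `find_properties_in_jars` is a hypothetical helper name, not part of any package, and the folder path is the elided one from the post, left as-is.

```r
# Hypothetical helper (not part of coreNLP): list every .properties entry
# found inside the jar files of a CoreNLP folder. Uses only base R.
find_properties_in_jars <- function(corenlp_dir) {
  jars <- list.files(corenlp_dir, pattern = "\\.jar$", full.names = TRUE)
  hits <- lapply(jars, function(jar) {
    # A jar is a zip archive, so unzip(list = TRUE) can read its index
    entries <- utils::unzip(jar, list = TRUE)$Name
    props <- grep("\\.properties$", entries, value = TRUE)
    if (length(props) == 0) return(NULL)
    data.frame(jar = basename(jar), entry = props, stringsAsFactors = FALSE)
  })
  # NULL when the folder has no jars or no jar contains a .properties entry
  do.call(rbind, hits)
}

# Elided path copied from the post; replace with the real location
corenlp_dir <- "/Users/..../stanford-corenlp-4.5.7/"
if (dir.exists(corenlp_dir)) {
  print(find_properties_in_jars(corenlp_dir))
}
```

If this prints nothing useful, that does not by itself mean the install is broken; it only tells you where the bundled defaults live.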

@AngledLuffa
Contributor

Sounds good. I would first check that the OpenIE model is actually among the annotators loaded when R creates the pipeline it talks to. It should show up in the pipeline's output, if the R interface lets you see that output.

Personally, I have zero experience with the R interface and suggest testing that out yourself rather than relying on help from us. You could also find the authors of the R interface and ask them how to check the OpenIE package.
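
One way to act on the annotator check suggested above is to give the pipeline an explicit properties file that lists OpenIE and its prerequisites. This is a minimal sketch, not a confirmed fix. The annotator list follows the CoreNLP OpenIE documentation. The `parameterFile` and `mem` arguments of `initCoreNLP` are taken from my reading of the coreNLP R package documentation and should be treated as assumptions, as should compatibility between that package and CoreNLP 4.5.7. `write_openie_properties` is a hypothetical helper name.

```r
# Annotators OpenIE needs, per the CoreNLP OpenIE documentation
openie_annotators <- c("tokenize", "ssplit", "pos", "lemma",
                       "depparse", "natlog", "openie")

# Hypothetical helper: write a one-line properties file and return its path
write_openie_properties <- function(path = tempfile(fileext = ".properties")) {
  line <- paste0("annotators = ", paste(openie_annotators, collapse = ", "))
  writeLines(line, path)
  path
}

props_file <- write_openie_properties()
cat(readLines(props_file), sep = "\n")

# Elided path copied from the post; replace with the real location
corenlp_dir <- "/Users/..../stanford-corenlp-4.5.7/"

# Only runs when the coreNLP package and the CoreNLP folder are both present
if (requireNamespace("coreNLP", quietly = TRUE) && dir.exists(corenlp_dir)) {
  # Assumption: initCoreNLP accepts parameterFile and mem as documented
  coreNLP::initCoreNLP(corenlp_dir, parameterFile = props_file, mem = "4g")
  annotation <- coreNLP::annotateString(
    "Chelsea have lost three of their last five away games."
  )
  print(coreNLP::getOpenIE(annotation))
}
```

If `getOpenIE` still returns NULL with `openie` explicitly listed, the problem is more likely in how the R package reads the pipeline output than in which annotators were loaded, which is a question for the R package's authors.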
