Hi, so we are working on generating biobert embeddings for our project. When we run it on a single word it takes about a second or so. When we run on a list of 10,000 words, it either times out or takes upwards of hours to run. Is this normal? Below is how we are using it:
Hi @krico1, large embeddings like BioBERT can be quite slow because of the big deep-learning models behind them.
But you can also get a roughly 10x speedup by running NLU in GPU mode.
All you need to do is set `gpu=True` and make sure the GPU is visible to TensorFlow beforehand.
Then you can get the GPU pipe with `nlu.load('en.embed_sentence.biobert.pmc_base_cased', gpu=True)`.
See this notebook as a reference.
Also note: if you have a large dataset at hand, it will be much faster to feed NLU all the data at once instead of one string at a time.
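As a sketch of that batching advice: instead of looping `predict()` over strings one at a time, pass the whole list (or large chunks of it) to a single `predict()` call. The chunk helper below is plain Python; the commented nlu calls are illustrative and reuse the model name and parameters already shown in this thread.

```python
def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items,
    so each chunk can be passed to pipe.predict() in one call."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# Illustrative nlu usage (model name and parameters from this thread):
#   pipe = nlu.load('en.embed_sentence.biobert.pmc_base_cased', gpu=True)
#   for batch in chunked(words, 2000):
#       df = pipe.predict(batch, output_level='sentence', get_embeddings=True)
```

Passing a few large chunks keeps most of the batching benefit while avoiding holding one huge result DataFrame in memory.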
@C-K-Loan Hi! Unfortunately, I am not able to obtain the embeddings, even when adding `get_embeddings=True`. I tried multiple models and included other parameters, but with no success.
In particular, `nlu.load(biobert).predict("random sentence", output_level='token', get_embeddings=True)` does not give the expected output. I thought the column was being dropped, so I added `drop_irrelevant_cols=False`, but still no success.
@C-K-Loan I have the same problem as @MargheCap. I assume it has something to do with how we install the nlu package. Could you share how you install it?
With my installation (below), I get this rather slow calculation:
```python
def load_biobert(self):
    # Load BioBERT model (for sentence-type embeddings)
    self.logger.info("Loading BioBERT model...")
    start = time.time()
    biobert = nlu.load('en.embed_sentence.biobert.pmc_base_cased')
    end = time.time()
    self.logger.info('done (BioBERT loading time: %.2f seconds)', end - start)
    return biobert

def get_biobert_embeddings(self, strings):
    embedding_list = []
    for string in strings:
        self.logger.debug("...Generating embedding for: %s", string)
        embedding_list.append(self.get_biobert_embedding(string))
    return embedding_list

def get_biobert_embedding(self, string):
    embedding = self.biobert.predict(string, output_level='sentence', get_embeddings=True)
    return embedding.sentence_embedding_biobert.values[0]
```
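Following @C-K-Loan's batching advice, a rewrite of `get_biobert_embeddings` would make one `predict()` call for the whole list and then pull the vectors out of the returned DataFrame. The extraction helper below is a sketch: the column name comes from the snippet above, and `predict()` is assumed to return a pandas DataFrame as in the single-string case.

```python
import pandas as pd

def extract_embeddings(df: pd.DataFrame, column: str = 'sentence_embedding_biobert'):
    """Pull the embedding vector for each row out of the DataFrame
    returned by pipe.predict() (column name from the snippet above)."""
    return list(df[column].values)

# Batched usage (illustrative, replaces the per-string loop above):
#   df = self.biobert.predict(strings, output_level='sentence', get_embeddings=True)
#   embedding_list = extract_embeddings(df)
```

This way the Spark pipeline startup cost is paid once instead of once per string, which is where most of the hours-long runtime on 10,000 words goes.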