Collect more data to retrain model #5

pocession · 2024-04-16T20:46:54Z

Code for labelling the new dataset:

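import string

def label_text(raw_text, entity):
    # Split the text and entity into whitespace-separated words
    words = raw_text.split()
    # Strip surrounding punctuation so "hypertension." still matches "hypertension"
    normalized = [w.strip(string.punctuation) for w in words]
    entity_words = [w.strip(string.punctuation) for w in entity.split()]

    # Initialize labels with "O" for each word in the text
    labels = ['O'] * len(words)

    # An empty entity would match at every position, so return early
    if not entity_words:
        return labels

    n = len(entity_words)

    # Find every occurrence of the entity in the text
    for i in range(len(words) - n + 1):
        # Check if the current slice of words matches the entity words
        if normalized[i:i + n] == entity_words:
            # Label the first word of the entity with "1" (the "B" tag)
            labels[i] = '1'
            # Label the remaining words of the entity with "2" (the "I" tag)
            for j in range(1, n):
                labels[i + j] = '2'

    return labels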

# Example usage
raw_text = "Sildenafil is also used in both men and women to treat the symptoms of pulmonary arterial hypertension. This is a type of high blood pressure that occurs between the heart and the lungs."
entity = "pulmonary arterial hypertension"

# Get labels for the example
labels = label_text(raw_text, entity)
print("Words:", raw_text.split())
print("Labels:", labels)
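
To feed the labelled text into retraining, the string labels probably need to become integer tags. Below is a minimal sketch of that step, not the project's actual pipeline. The `LABEL_TO_ID` mapping (in particular "O" as 0), the `to_record` helper, and the `tokens` / `ner_tags` field names are all assumptions made for illustration.

```python
# Hypothetical mapping: the issue does not say how "O" is encoded for training,
# so 0 is assumed here; "1" and "2" follow the labels used above.
LABEL_TO_ID = {"O": 0, "1": 1, "2": 2}


def to_record(raw_text, labels):
    """Pair whitespace tokens with integer tags (illustrative helper)."""
    tokens = raw_text.split()
    # The labeller emits one label per whitespace token, so lengths must agree
    if len(tokens) != len(labels):
        raise ValueError("expected one label per whitespace-separated token")
    return {
        "tokens": tokens,
        "ner_tags": [LABEL_TO_ID[label] for label in labels],
    }


# Labels written out by hand here so the sketch runs on its own
record = to_record("treat pulmonary arterial hypertension", ["O", "1", "2", "2"])
print(record)
```

The length check is there because a mismatch between tokens and labels would otherwise silently shift every tag in the row.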
