Commit

mf
mam10eks committed Mar 24, 2024
1 parent 19f05c2 commit 56c5955
Showing 1 changed file with 48 additions and 17 deletions.
65 changes: 48 additions & 17 deletions tutorials/tutorial-ir-datasets.ipynb
@@ -40,17 +40,9 @@
},
{
"cell_type": "code",
- "execution_count": 2,
+ "execution_count": 1,
"metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Load ir_dataset \"ir-lab-jena-leipzig-sose-2023/iranthology-20230618-training\" from tira.\n"
- ]
- }
- ],
+ "outputs": [],
"source": [
"from tira.third_party_integrations import ir_datasets\n",
"\n",
@@ -71,13 +63,29 @@
},
{
"cell_type": "code",
- "execution_count": 4,
+ "execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
+ "Download from Zenodo: https://zenodo.org/records/10628640/files/iranthology-20230618-training-truths.zip?download=1\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "Download: 100%|██████████| 174k/174k [00:00<00:00, 3.11MiB/s]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Download finished. Extract...\n",
+ "Extraction finished: /root/.tira/extracted_datasets/ir-lab-jena-leipzig-sose-2023/iranthology-20230618-training/\n",
"\n",
"Query: 1\n",
"\tText:\t\tretrieval system improving effectiveness\n",
@@ -94,6 +102,13 @@
"\tDescrition:\tWhich papers focus on how to recognize signs of self-harm in people's social media posts?\n",
"\tNarrative:\tRelevant papers include research on early detection of self-harm on social media platforms such as Facebook, Instagram, Reddit, Twitter and co. Papers that addresses mental health issues like depression or anorexia are not relevant. Furthermore, papers that deal with self-harm but are not related to social media are also not relevant.\n"
]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "\n"
+ ]
}
],
"source": [
@@ -119,7 +134,7 @@
},
{
"cell_type": "code",
- "execution_count": 6,
+ "execution_count": 3,
"metadata": {},
"outputs": [
{
@@ -155,13 +170,29 @@
},
{
"cell_type": "code",
- "execution_count": 8,
+ "execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
+ "Download from Zenodo: https://zenodo.org/records/10628640/files/iranthology-20230618-training-inputs.zip?download=1\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "Download: 100%|██████████| 76.4M/76.4M [00:00<00:00, 84.5MiB/s]\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Download finished. Extract...\n",
+ "Extraction finished: /root/.tira/extracted_datasets/ir-lab-jena-leipzig-sose-2023/iranthology-20230618-training/\n",
"The dataset has 53673 documents.\n"
]
}
@@ -172,7 +203,7 @@
},
{
"cell_type": "code",
- "execution_count": 9,
+ "execution_count": 5,
"metadata": {},
"outputs": [
{
@@ -181,7 +212,7 @@
"GenericDoc(doc_id='2005.ipm_journal-ir0anthology0volumeA41A1.7', text='A probabilistic model for stemmer generation AbstractIn this paper we will present a language-independent probabilistic model which can automatically generate stemmers. Stemmers can improve the retrieval effectiveness of information retrieval systems, however the designing and the implementation of stemmers requires a laborious amount of effort due to the fact that documents and queries are often written or spoken in several different languages. The probabilistic model proposed in this paper aims at the development of stemmers used for several languages. The proposed model describes the mutual reinforcement relationship between stems and derivations and then provides a probabilistic interpretation. A series of experiments shows that the stemmers generated by the probabilistic model are as effective as the ones based on linguistic knowledge.')"
]
},
- "execution_count": 9,
+ "execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
@@ -206,7 +237,7 @@
},
{
"cell_type": "code",
- "execution_count": 10,
+ "execution_count": 6,
"metadata": {},
"outputs": [
{
@@ -265,7 +296,7 @@
"2 test-3 some test query 3 0.2"
]
},
- "execution_count": 10,
+ "execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
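
Note on the execution counts: across the hunks above, `execution_count` moves from 2, 4, 6, 8, 9, 10 to 1 through 6, which is consistent with the notebook having been re-run top to bottom in a fresh kernel. As a rough sketch only, the same sequential numbering could be applied to a notebook's JSON without re-running it. The helper name `renumber_execution_counts` and the tiny in-memory notebook are assumptions made for illustration, not part of this commit, and the sketch does not reproduce the changed cell outputs (download messages) seen in the diff.

```python
import json


def renumber_execution_counts(notebook: dict) -> dict:
    """Number code cells 1, 2, 3, ... in document order (hypothetical helper)."""
    counter = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells carry no execution_count
        counter += 1
        cell["execution_count"] = counter
        # An execute_result output stores its own execution_count,
        # which should match the count of the cell that produced it.
        for output in cell.get("outputs", []):
            if output.get("output_type") == "execute_result":
                output["execution_count"] = counter
    return notebook


# Small in-memory stand-in for a notebook file (assumed structure, nbformat 4 style).
nb = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Intro"]},
        {"cell_type": "code", "execution_count": 2, "metadata": {},
         "outputs": [], "source": []},
        {"cell_type": "code", "execution_count": 9, "metadata": {},
         "outputs": [{"output_type": "execute_result", "execution_count": 9,
                      "data": {}, "metadata": {}}],
         "source": []},
    ]
}

renumbered = renumber_execution_counts(nb)
counts = [c["execution_count"] for c in renumbered["cells"] if c["cell_type"] == "code"]
print(json.dumps(counts))
```

To apply this to a real file, one would load it with `json.load`, pass the resulting dict through the helper, and write it back with `json.dump`. Re-running the notebook remains the more faithful option because it also refreshes the outputs.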
