mf

tira-io · Mar 24, 2024 · 56c5955
1 parent 19f05c2
Showing 1 changed file with 48 additions and 17 deletions.
diff --git a/tutorials/tutorial-ir-datasets.ipynb b/tutorials/tutorial-ir-datasets.ipynb
@@ -40,17 +40,9 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 1,
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Load ir_dataset \"ir-lab-jena-leipzig-sose-2023/iranthology-20230618-training\" from tira.\n"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
     "from tira.third_party_integrations import ir_datasets\n",
     "\n",
@@ -71,13 +63,29 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 2,
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
+      "Download from Zenodo: https://zenodo.org/records/10628640/files/iranthology-20230618-training-truths.zip?download=1\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Download: 100%|██████████| 174k/174k [00:00<00:00, 3.11MiB/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Download finished. Extract...\n",
+      "Extraction finished:  /root/.tira/extracted_datasets/ir-lab-jena-leipzig-sose-2023/iranthology-20230618-training/\n",
       "\n",
       "Query:  1\n",
       "\tText:\t\tretrieval system improving effectiveness\n",
@@ -94,6 +102,13 @@
       "\tDescrition:\tWhich papers focus on how to recognize signs of self-harm in people's social media posts?\n",
       "\tNarrative:\tRelevant papers include research on early detection of self-harm on social media platforms such as Facebook, Instagram, Reddit, Twitter and co. Papers that addresses mental health issues like depression or anorexia are not relevant. Furthermore, papers that deal with self-harm but are not related to social media are also not relevant.\n"
      ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\n"
+     ]
     }
    ],
    "source": [
@@ -119,7 +134,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 3,
    "metadata": {},
    "outputs": [
     {
@@ -155,13 +170,29 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": 4,
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
+      "Download from Zenodo: https://zenodo.org/records/10628640/files/iranthology-20230618-training-inputs.zip?download=1\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Download: 100%|██████████| 76.4M/76.4M [00:00<00:00, 84.5MiB/s]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Download finished. Extract...\n",
+      "Extraction finished:  /root/.tira/extracted_datasets/ir-lab-jena-leipzig-sose-2023/iranthology-20230618-training/\n",
       "The dataset has 53673 documents.\n"
      ]
     }
@@ -172,7 +203,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 9,
+   "execution_count": 5,
    "metadata": {},
    "outputs": [
     {
@@ -181,7 +212,7 @@
        "GenericDoc(doc_id='2005.ipm_journal-ir0anthology0volumeA41A1.7', text='A probabilistic model for stemmer generation AbstractIn this paper we will present a language-independent probabilistic model which can automatically generate stemmers. Stemmers can improve the retrieval effectiveness of information retrieval systems, however the designing and the implementation of stemmers requires a laborious amount of effort due to the fact that documents and queries are often written or spoken in several different languages. The probabilistic model proposed in this paper aims at the development of stemmers used for several languages. The proposed model describes the mutual reinforcement relationship between stems and derivations and then provides a probabilistic interpretation. A series of experiments shows that the stemmers generated by the probabilistic model are as effective as the ones based on linguistic knowledge.')"
       ]
      },
-     "execution_count": 9,
+     "execution_count": 5,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -206,7 +237,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 10,
+   "execution_count": 6,
    "metadata": {},
    "outputs": [
     {
@@ -265,7 +296,7 @@
        "2   test-3  some test query 3                  0.2"
       ]
      },
-     "execution_count": 10,
+     "execution_count": 6,
      "metadata": {},
      "output_type": "execute_result"
     }