Commit
Update paper
ValHayot committed Mar 13, 2024
1 parent 8ab47dc commit 552b63b
Showing 2 changed files with 84 additions and 12 deletions.
20 changes: 8 additions & 12 deletions paper/sea-neuro/paper-neuro.tex
@@ -205,9 +205,9 @@
\end{figure*}
Sea (Figure~\ref{fig:seaneuro:diagram}) is a data-management library that leverages
the \texttt{LD\_PRELOAD}
-trick to intercept POSIX file system calls (more specifically, file access calls to the GNU C library,
-glibc) on Linux systems. This enables Sea to redirect write calls
-aimed at slower storage devices to a faster devices whenever possible.
+trick to intercept POSIX file system calls, or more specifically, file access calls to the GNU C library,
+glibc, on Linux systems. This enables Sea to redirect write calls
+aimed at slower storage devices to a faster device whenever possible.
Similarly, when intercepting read calls, Sea can choose to read from a
faster device if a copy is available on that device. Sea decides which
storage location it can write to based on the details provided in an
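To make the interception mechanism concrete, here is a minimal Python sketch of how an LD_PRELOAD-based interposer such as Sea is typically activated when launching a pipeline step; the library path and the example FSL command are illustrative assumptions, not Sea's documented interface.

```python
import os
import subprocess

# Minimal sketch, assuming a hypothetical interposer path; this is not
# Sea's documented invocation. With LD_PRELOAD set, the dynamic linker
# resolves glibc file-access calls (open, read, write, ...) made by the
# child process against the preloaded library first, which can then
# redirect them to a faster storage device when one is available.
env = dict(os.environ)
env["LD_PRELOAD"] = "/opt/sea/libsea.so"  # hypothetical library location

# Example pipeline step (FSL smoothing); any I/O-heavy command works here.
subprocess.run(
    ["fslmaths", "input.nii.gz", "-s", "2", "output.nii.gz"],
    env=env,
    check=True,
)
```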
@@ -300,11 +300,9 @@ \subsection{Speedups observed in controlled environment}
performance had been degraded by busy writers (Figure~\ref{fig:seaneuro:slashbin}),
although performance was variable. On average, a single SPM pipeline
process preprocessing a single HCP image produced the greatest speedup
-(13$\times$) \TG{I think this number should be 32x, according to the current notebook}. The AFNI pipeline preprocessing a single fMRI image of the PREVENT-AD dataset
+(12.6$\times$), \TG{I think this number should be 32x, according to the current notebook}. The AFNI pipeline preprocessing a single fMRI image of the PREVENT-AD dataset
using a single process was the next fastest pipeline, with an average speedup of
-(5$\times$). The FSL Feat pipeline had speedups as well, with a maximum average speedup of
-1.3$\times$ when preprocessing a single PREVENT-AD image. The SPM pipeline consistently had
-(5$\times$). The FSL Feat pipeline had speedups as well, with a maximum average speedup of
+(4.3$\times$). The FSL Feat pipeline had speedups as well, with a maximum average speedup of
1.3$\times$ when preprocessing a single PREVENT-AD image. The SPM pipeline consistently had
excellent speedups, which is likely due to a mix of prefetching the initial
input files and the I/O patterns of the application. While it was expected that the
@@ -322,13 +320,12 @@ \subsection{Speedups observed in controlled environment}
The FSL Feat pipeline, in contrast, appeared to be the most compute-bound of the three
applications. Not only did it spend an extensive amount of time computing, the
amount of output data generated was least of the three. Due to the
-compute-intensive nature of the FSL Feat pipeline, it is expected that it resulted in the
-compute-intensive nature of the FSL Feat pipeline, it is expected that it resulted in the
+compute-intensive nature of the FSL Feat pipeline, it is expected to have resulted in the
least significant speedups.

The HCP dataset, on average, obtained the greatest speedups with
Sea \TG{Mention value}. Out of the three datasets, HCP has the largest images (see
-Table~\ref{table:sea-neuro:data}). Larger individual images mean that they
+Table~\ref{table:sea-neuro:data}). Larger individual images
occupy more page cache space and take longer to flush to Lustre. Unsurprisingly,
the next dataset with the largest images (\SI{282}{\mebi\byte} compressed for a
single image) results in the next largest speedups, followed by PREVENT-AD, the
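For reference, a hedged pandas sketch of the per-dataset aggregation that the \TG{Mention value} note asks for, using the column names that appear in the accompanying figure_slashbin.ipynb (dataset, busy_writers, speed_up); the CSV path is a placeholder, and the speedup definition (baseline makespan over Sea makespan) is assumed rather than taken from the paper.

```python
import pandas as pd

# Sketch under assumptions: 'makespans.csv' is a placeholder path, and
# 'speed_up' is presumed to be baseline makespan divided by Sea makespan,
# as suggested by the values printed in figure_slashbin.ipynb.
df_merged = pd.read_csv("makespans.csv")

# Mean speedup per dataset under contention (6 busy writers).
per_dataset = (
    df_merged[df_merged["busy_writers"] == 6]
    .groupby("dataset")["speed_up"]
    .mean()
    .sort_values(ascending=False)
)
print(per_dataset)
```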
@@ -349,8 +346,7 @@ \subsection{Speedup correlated with Lustre degradation}
\subsection{Speedup correlated with Lustre degradation}

Baseline performance without busy writers was comparable to that of Sea's as can be seen in Figure~\ref{fig:seaneuro:slashbin} \TG{(p=x, two-sample T-test)}. However, with
-busy writers, the makespan obtained with Sea was smaller than baseline \TG{(p=x, two-sample T-test)}. We observe that the more Baseline deviates from this identity line, the
-greater the speedup \TG{I would remove this sentence, it's obvious}. We did occasionally observe slowdowns from using Sea. These slowdowns
+busy writers, the makespan obtained with Sea was smaller than baseline \TG{(p=x, two-sample T-test)}. We did occasionally observe slowdowns from using Sea. These slowdowns
may arise from the initial read of the data, as they appeared to occur less
frequently with the SPM pipeline, or due to increased CPU contention caused by Sea's rapid I/O.
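A hedged sketch of the two-sample t-test that the \TG{(p=x, two-sample T-test)} placeholders call for; the column names and CSV path are assumptions about the data layout, and no p-value from the paper is reproduced here.

```python
import pandas as pd
from scipy import stats

# Assumed layout: one row per run, with a 'condition' column distinguishing
# baseline from Sea runs, a 'makespan' column, and a 'busy_writers' column;
# 'makespans.csv' is a placeholder path.
df = pd.read_csv("makespans.csv")
busy = df[df["busy_writers"] == 6]

baseline = busy.loc[busy["condition"] == "baseline", "makespan"]
sea = busy.loc[busy["condition"] == "sea", "makespan"]

# Welch's variant avoids assuming equal variances between the two groups.
t_stat, p_value = stats.ttest_ind(baseline, sea, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3g}")
```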

76 changes: 76 additions & 0 deletions paper/sea-neuro/results/figure_slashbin.ipynb
@@ -6898,6 +6898,82 @@
"df_merged.sort_values(by=\"speed_up\", ascending=False)"
]
},
{
"cell_type": "code",
"execution_count": 47,
"id": "0f35c653",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dataset pipeline n_images busy_writers\n",
"HCP spm 1 6 12.617431\n",
"ds001545 spm 8 6 7.615075\n",
" 1 6 6.571933\n",
"HCP spm 8 6 5.160400\n",
"preventAD afni 1 6 4.271394\n",
"HCP afni 8 6 4.074774\n",
"ds001545 spm 16 6 3.585750\n",
"HCP spm 16 6 2.937354\n",
"preventAD spm 1 6 2.828866\n",
"ds001545 afni 1 6 2.369637\n",
"HCP afni 16 6 2.291281\n",
"preventAD afni 8 6 1.622492\n",
"HCP afni 1 6 1.475868\n",
"preventAD fsl 1 6 1.345780\n",
"ds001545 fsl 1 6 1.261522\n",
" afni 16 6 1.150132\n",
"preventAD fsl 8 6 1.121890\n",
" afni 16 0 1.079569\n",
"ds001545 spm 1 0 1.064818\n",
"preventAD spm 16 6 1.059949\n",
"HCP spm 1 0 1.057301\n",
" afni 8 0 1.053507\n",
"ds001545 fsl 8 6 1.039927\n",
"preventAD afni 1 0 1.034140\n",
"HCP afni 1 0 1.033463\n",
"preventAD spm 1 0 1.033344\n",
" fsl 16 6 1.031404\n",
" spm 8 0 1.024793\n",
"ds001545 afni 1 0 1.022089\n",
" spm 16 0 1.017527\n",
" 8 0 1.012077\n",
"HCP spm 16 0 1.007503\n",
" fsl 1 6 1.005294\n",
"ds001545 afni 16 0 1.002235\n",
"HCP spm 8 0 0.995598\n",
"preventAD spm 16 0 0.993274\n",
"HCP afni 16 0 0.992692\n",
"ds001545 fsl 16 6 0.986928\n",
"preventAD fsl 1 0 0.986418\n",
"HCP fsl 8 6 0.983173\n",
"ds001545 fsl 1 0 0.981228\n",
" afni 8 0 0.976202\n",
"HCP fsl 16 6 0.971510\n",
"preventAD afni 8 0 0.956270\n",
"HCP fsl 1 0 0.945483\n",
"preventAD fsl 8 0 0.934627\n",
"HCP fsl 8 0 0.934242\n",
"ds001545 fsl 8 0 0.919897\n",
"preventAD afni 16 6 0.913218\n",
"HCP fsl 16 0 0.896197\n",
"preventAD fsl 16 0 0.891131\n",
"ds001545 fsl 16 0 0.866530\n",
"preventAD spm 8 6 0.712436\n",
"ds001545 afni 8 6 0.638075\n",
"Name: speed_up, dtype: float64"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_merged.groupby(by=['dataset', 'pipeline', 'n_images', 'busy_writers'])['speed_up'].mean().sort_values(ascending=False)"
]
},
{
"cell_type": "markdown",
"id": "f6568ab1",
