WIP - introducing tensorboard to fibad #119

Draft · wants to merge 3 commits into base: main
2 changes: 1 addition & 1 deletion docs/notebooks.rst
@@ -3,4 +3,4 @@ Notebooks

 .. toctree::

-   Introducing Jupyter Notebooks <notebooks/intro_notebook>
+   Training a simple model <notebooks/train_model>
66 changes: 0 additions & 66 deletions docs/notebooks/TrainingAModel.ipynb

This file was deleted.

84 changes: 0 additions & 84 deletions docs/notebooks/intro_notebook.ipynb

This file was deleted.

97 changes: 97 additions & 0 deletions docs/notebooks/train_model.ipynb
@@ -0,0 +1,97 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Intro to Training and Configurations\n",
"\n",
"First we import fibad and create a new fibad object, instantiated (implicitly), with the default configuration file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import fibad\n",
"\n",
"fibad_instance = fibad.Fibad()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For this demo, we'll make a few adjustments to the default configuration settings that the `fibad` object was instantiated with. By accessing the `.config` attribute of the fibad instance, we can modify any configuration value. Here we change which built in model to use, the dataset, batch size, number of epochs for training."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fibad_instance.config[\"model\"][\"name\"] = \"ExampleCNN\"\n",
"fibad_instance.config[\"data_set\"][\"name\"] = \"CifarDataSet\"\n",
"fibad_instance.config[\"data_loader\"][\"batch_size\"] = 64\n",
"fibad_instance.config[\"train\"][\"epochs\"] = 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We call the `.train()` method to train the model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fibad_instance.train()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The output of the training will be stored in a time-stamped directory under the `./results/`. By default, a copy of the final configuration used in training is persisted as `runtime_config.toml`. To run fibad again with the same configuration, you can reference the runtime_config.toml file.\n",
"\n",
"If running in another notebook, instantiate a fibad object like so:\n",
"```\n",
"new_fibad_instance = fibad.Fibad(config_file='./results/<timestamped_directory>/runtime_config.toml')\n",
"```\n",
"\n",
"Or from the command line:\n",
"```\n",
">> fibad train --runtime-config ./results/<timestamped_directory>/runtime_config.toml\n",
"```"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "fibad",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.10"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
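For reference, the same overrides could also live in a standalone TOML config of the kind `runtime_config.toml` persists. This is a sketch only: the section and key names are taken from the `.config` accesses in the notebook above, and the exact file schema is otherwise an assumption.

```toml
# Hypothetical config file; sections mirror the notebook's .config accesses
[model]
name = "ExampleCNN"

[data_set]
name = "CifarDataSet"

[data_loader]
batch_size = 64

[train]
epochs = 2
```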
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -21,6 +21,8 @@ dependencies = [
"toml", # Used to load configuration files as dictionaries
"torch", # Used for CNN model and in train.py
"torchvision", # Used in hsc data loader, example autoencoder, and CNN model data set
"tensorboardX", # Used to log training metrics
"tensorboard", # Used to log training metrics
]

[project.scripts]
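As context for the two new dependencies: ignite's `TensorboardLogger` (wired up below in `pytorch_ignite.py`) needs a `SummaryWriter` backend, which `tensorboardX` provides, while the `tensorboard` package supplies the viewer. A minimal standalone sketch of the underlying writer API, independent of fibad:

```python
from tensorboardX import SummaryWriter

# Write one scalar data point that TensorBoard can plot; the log
# directory here is arbitrary and only for illustration.
writer = SummaryWriter(log_dir="./results/demo")
writer.add_scalar("training/loss", 0.42, global_step=1)
writer.close()
```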
21 changes: 21 additions & 0 deletions src/fibad/pytorch_ignite.py
@@ -7,6 +7,7 @@
import torch
from ignite.engine import Engine, Events
from ignite.handlers import Checkpoint, DiskSaver, global_step_from_engine
from ignite.handlers.tensorboard_logger import GradsScalarHandler, TensorboardLogger, WeightsHistHandler
from torch.nn.parallel import DataParallel, DistributedDataParallel
from torch.utils.data import Dataset

@@ -214,10 +215,30 @@
greater_or_equal=True,
)

tensorboard_logger = TensorboardLogger(log_dir=results_directory)

if config["train"]["resume"]:
prev_checkpoint = torch.load(config["train"]["resume"], map_location=device)
Checkpoint.load_objects(to_load=to_save, checkpoint=prev_checkpoint)

tensorboard_logger.attach(
trainer, log_handler=GradsScalarHandler(model), event_name=Events.ITERATION_COMPLETED(every=100)
)

tensorboard_logger.attach(
trainer,
log_handler=WeightsHistHandler(model),
event_name=Events.ITERATION_COMPLETED(every=100),
)

tensorboard_logger.attach_output_handler(
trainer,
event_name=Events.ITERATION_COMPLETED(every=10),
tag="training",
output_transform=lambda loss: loss,
metric_names="all",
)

@trainer.on(Events.STARTED)
def log_training_start(trainer):
logger.info(f"Training model on device: {device}")
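Not part of this diff, but a usage note on what the attached handlers produce: gradient norms and weight histograms are logged every 100 iterations, and the training loss every 10, all written into the run's results directory. Assuming the time-stamped layout described in the notebook above, the logs can then be viewed with:

```
tensorboard --logdir ./results/<timestamped_directory>
```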