# Spark DICOM connector in Scala
Once the connector is on the classpath of your Spark cluster, you can load DICOM data in Spark using the `dicomFile` format as follows:

```scala
val df = spark.read.format("dicomFile").load("/some/hdfs/path").select("PatientName", "StudyDate", "StudyTime")
```
You can select DICOM attributes by the keyword assigned to them in the DICOM standard registry.
Each attribute is written to a column with a Spark data type equivalent to its VR (value representation). The mapping is as follows:
| VR | Spark data type |
|---|---|
| AE, AS, AT, CS, DS, DT, IS, LO, LT, SH, ST, UC, UI, UR, UT | String |
| PN | {"Alphabetic": String, "Ideographic": String, "Phonetic": String} |
| FL, FD | [Double] |
| SL, SS, US, UL | [Integer] |
| SV, UV | [Long] |
| DA | String (formatted as DateTimeFormatter.ISO_LOCAL_DATE) |
| TM | String (formatted as DateTimeFormatter.ISO_LOCAL_TIME) |
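For instance, since PN attributes are read as a struct, nested components can be selected with dot notation. A minimal sketch, reusing `df` from above (the alias name is illustrative):

```scala
import org.apache.spark.sql.functions.col

// PatientName has VR PN, so it is read as a struct with
// Alphabetic / Ideographic / Phonetic string fields.
val names = df.select(
  col("PatientName.Alphabetic").alias("patient_name_alphabetic"),
  col("StudyDate") // VR DA: an ISO_LOCAL_DATE-formatted string
)
```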
The `PixelData` attribute in a DICOM file can be very large and can crash Spark, so reading it is disabled by default. In order to be able to select the `PixelData` column, turn the `includePixelData` option on:

```scala
spark.read.format("dicomFile").option("includePixelData", true).load("/some/hdfs/path").select("PixelData")
```
The `isDicom` column is `true` if the file was read as a DICOM file, `false` otherwise.
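This makes it easy to restrict a dataframe to files that were actually parsed as DICOM. A minimal sketch, assuming `df` was read with the `dicomFile` format as above:

```scala
import org.apache.spark.sql.functions.col

// Keep only the rows corresponding to files parsed as DICOM.
val dicomOnly = df.filter(col("isDicom"))
```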
The DICOM dataframe can be de-identified according to the Basic Confidentiality Profile in the DICOM standard. To use the de-identifier, do the following in Scala:
```scala
import ai.kaiko.spark.dicom.deidentifier.DicomDeidentifier._

var df = spark.read.format("dicomFile").load("/some/hdfs/path")
df = deidentify(df)
```
The resulting dataframe will have its columns dropped, emptied, or dummified according to the actions described in the Basic Confidentiality Profile.
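One way to see which columns were dropped is to diff the schemas before and after de-identification. A minimal sketch, assuming the import from the snippet above (`dropped` is an illustrative name):

```scala
// Columns present before de-identification but absent after were dropped;
// the remaining columns may have been emptied or dummified in place.
val before = spark.read.format("dicomFile").load("/some/hdfs/path")
val after = deidentify(before)
val dropped = before.columns.diff(after.columns)
```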
To perform the de-identification with any of the supported options, use:
```scala
import ai.kaiko.spark.dicom.deidentifier.DicomDeidentifier._
import ai.kaiko.spark.dicom.deidentifier.options._

val config: Map[DeidOption, Boolean] = Map(
  CleanDesc -> true,
  RetainUids -> true
)

var df = spark.read.format("dicomFile").load("/some/hdfs/path")
df = deidentify(df, config)
```
Current limitations of the de-identification are:
| Expected behavior | Current behavior |
|---|---|
| Tags with SQ VR are de-identified | Tags with SQ VR are ignored |
| Private tags are de-identified | Private tags are ignored |
| The U action pseudonymizes the value | The U action replaces the value with `ToPseudonimize` |
| The C action cleans the value of PHI/PII | The C action replaces the value with `ToClean` |
A reproducible development environment is provided using Nix:

```
$ nix-shell
```

It provides the JDK, sbt, and all other required tools.
Build the JAR artifact:

```
$ nix-build
```
When changing sbt build dependencies, update `depsSha256` in `default.nix` as instructed.
CI is handled by GitHub Actions, using Nix for dependency management, testing, building, and caching (with Cachix).
Note: for CI to run tests, the Nix build needs to run them in its `checkPhase`.
You can run the CI locally using `act` (provided in the Nix shell).
Creating a release is done with the help of the sbt-sonatype, sbt-pgp, and sbt-release plugins.
Before starting, make sure to set the Sonatype credentials as environment variables (`SONATYPE_USERNAME` and `SONATYPE_PASSWORD`). In addition, make sure to have the `gpg` utility installed and the release GPG key available in your keyring.
Then, run:

```
$ nix-shell
$ sbt
> release
```

(`release` is run at the sbt prompt.)
You will be prompted for the release version, the next version, and the GPG key passphrase. Make sure to follow the SemVer versioning scheme. If all went well, the new release should be available on Maven Central within 10 minutes.