Blog

Uploading NIFTIs

There are several mechanisms for uploading DICOM sessions into XNAT that are quite well documented (e.g. in the upload menu in your instance of XNAT or here), but there’s much less documentation available for uploading NIFTI images, even though, in research at least, NIFTIs are the norm.  So here are a few tips for uploading NIFTIs to your DPUK imaging platform node.  I’ll start with manually uploading NIFTI files, which gives an idea of how XNAT handles non-DICOM files, and then talk about scripting file uploads, which can be done at a fairly low level with the curl command-line program or via the Python scripting language.  My preferred method is to use Python, but understanding the manual process and the command-line API is useful for understanding how to script uploads with Python.

Manually

If you are just getting started and only have one or two scans to upload, there can be some pedagogic and/or pragmatic value in manually uploading the images with the “Manage Files” dialogue.

If you first create a subject and then add an “MR Session” experiment, you will see this form, where you can define the sequences (scans in XNAT language) that you wish to store for this visit (a session in XNAT language).
Having created a session, you’ll need to use the Manage Files action to open this dialogue and first create a “NIFTI” folder for each sequence.
Having created the folders, you can use “Upload Files” to add the NIFTI files appropriately.  You will be asked “Would you like the contents of this archive file to be extracted on the server? Press ‘OK’ to extract or ‘Cancel’ to proceed with upload without extracting.” You should click the misleadingly named Cancel button to upload the .nii.gz file without extracting it.

I’ve used XNAT 1.7 for the screenshots here, as I find no differences between 1.6.5 and 1.7.x in this area, and the DPUK imaging platform will be migrating to XNAT 1.7 before too long.

Using Bash / Curl

For more than one or two scans, the manual method is time-consuming and it’s more convenient to use the XNAT REST API, which allows you to automate the upload process.  One of the most direct ways of doing this is to use curl on the Linux/OSX command line.

Following a similar pattern to the one we used in the manual upload, we can upload NIFTI files in bash, with the variables $url, $user, $pass, $project, $subject, $experiment, $scan and $filepath previously defined.  First we need to create the scan (XNAT terminology for a sequence) if it doesn’t already exist.

# check for the scan with a HEAD request
scan_exists=$(curl --silent -I -u "$user:$pass" "$url/data/archive/projects/$project/subjects/$subject/experiments/$experiment/scans/$scan")
if [[ $scan_exists == HTTP/1.1\ 200* ]]; then
  echo "scan found: $scan"
else
  scan_created=$(curl --silent -X PUT -u "$user:$pass" "$url/data/archive/projects/$project/subjects/$subject/experiments/$experiment/scans/$scan?xsiType=xnat:mrScanData")
  echo "MR Scan created: $scan $scan_created"
fi

And then create the NIFTI folder for that scan:

# list the scan's existing resource folders as CSV
resources=$(curl --silent -u "$user:$pass" "$url/data/archive/projects/$project/subjects/$subject/experiments/$experiment/scans/$scan/resources?format=csv")
# create NIFTI resource type if not already present
if ! echo "$resources" | grep -q "NIFTI"; then
  curl --silent -X PUT -u "$user:$pass" "$url/data/archive/projects/$project/subjects/$subject/experiments/$experiment/scans/$scan/resources/NIFTI"
  echo "NIFTI resource folder created for subject $subject, session $experiment, scan $scan"
fi

And finally upload a NIFTI file ($filepath has the full filepath of the NIFTI file):

filename=$(basename "$filepath")
# check for an existing file with a HEAD request
nifti_exists=$(curl --silent -I -u "$user:$pass" "$url/data/archive/projects/$project/subjects/$subject/experiments/$experiment/scans/$scan/resources/NIFTI/files/$filename")
if [[ $nifti_exists == HTTP/1.1\ 200* ]]; then
  echo "Nifti file already exists: IGNORED"
else
  curl --silent -u "$user:$pass" -X PUT --data-binary "@$filepath" "$url/data/archive/projects/$project/subjects/$subject/experiments/$experiment/scans/$scan/resources/NIFTI/files/$filename?format=NIFTI&inbody=true"
  echo "Nifti file uploaded: $filename"
fi

For DPUK users, there is a snippet (snippets are GitLab’s version of GitHub’s gists) for uploading a NIFTI resource via bash and curl available on the developer portal at https://issues.dpuk.org/dpuk/node/snippets/1.

Using Python

You can use curl directly, as shown above, but for greater ease and flexibility pyxnat, a Python wrapper for the XNAT REST API, comes in handy.  There is also a confusingly named Python project, xnatpy, which in many ways is easier to use but doesn’t allow you to create subjects or sessions, which is why we’ve stuck with pyxnat here.

Shortcutting the same process as above, with variables of the same name, here’s how we can use pyxnat in python:

# This Python 2.7 file uses the following encoding: utf-8
import os
from pyxnat import Interface

url = "http://10.1.1.17"
user = "admin"
password = "admin"  # set as appropriate
project = "test1"
subject = "sub001"
experiment = "sub001_MRI"
scan = "1"
filepath = "/data/T1.nii.gz"

# establish pyxnat session and connect to project
project = Interface(server=url, user=user, password=password, cachedir='/tmp').select.project(project)

# pyxnat will create the subject, experiment, scan and resource if they don't
# already exist (default types are fine in this case)
nifti = project.subject(subject).experiment(experiment).scan(scan).resource("NIFTI").file(os.path.basename(filepath))
if not nifti.exists() and os.path.isfile(filepath):
    # Note that adding spaces for format, content and tags prevents "U U U" in the Manage Files modal.
    # FIXME: either use format, content and tags attributes or find a way not to have to add spaces
    nifti.put(filepath, format=" ", content=" ", tags=" ")

# set the scan type
project.subject(subject).experiment(experiment).scan(scan).attrs.set("type", "T1")

In this example, I’ve shown a shortcut for creating the various objects in one call, but I then have to label the scan in a separate instruction.  Alternatively, I could have used a scan.create(details) instruction, though that would have been less neat because I’d then have to create the resource in a separate instruction (a sketch of this alternative follows).  For DPUK users, there is an example script for uploading real data, the Cam-CAN NIFTIs, using pyxnat, available on the developer portal at https://issues.dpuk.org/dpuk/node/snippets/10.
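For the record, the alternative might look something like the minimal sketch below.  It reuses the variables from the script above and assumes pyxnat’s create() accepts the datatype via the scans keyword and extra fields as xpath-style parameters; treat it as an untested sketch rather than the script’s actual code:

# Alternative: create the scan with its type set in one call, then create
# the resource in a separate instruction.
# (Assumed pyxnat create() conventions; untested sketch.)
mrscan = project.subject(subject).experiment(experiment).scan(scan)
if not mrscan.exists():
    mrscan.create(scans="xnat:mrScanData", **{"xnat:mrScanData/type": "T1"})
resource = mrscan.resource("NIFTI")
if not resource.exists():
    resource.create()
resource.file(os.path.basename(filepath)).put(filepath, format=" ", content=" ", tags=" ")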

Final words

Hopefully, there is enough here for you to get started.  Something to note is that, unlike for DICOM sequences, you won’t get nice snapshot images generated for each sequence you upload with any of these methods, as there isn’t a default pipeline for doing this.  Finally, if you do upload NIFTI images to your project with one of these methods, you should find that the data-freeze / sharing / requesting process works exactly as it does for DICOM data on the DPUK imaging platform.

UKBiobank calculator

The UK Biobank holds a lot of neuroimaging data (17K imaging subjects at the time of writing this post) and continues to scan subjects (see http://imaging.ukbiobank.ac.uk/ for more details). The DPUK Biobank node is intended to make it easier for dementia researchers to work with those data. You can create a spreadsheet of the subjects that you’re interested in from your Biobank project (perhaps subjects over 60 with a family history of dementia/Alzheimer’s/cognitive impairment) and upload that spreadsheet to the node to access the imaging data. If you wish to do this, there are several questions worth asking:

  1. how long will it take for my upload spreadsheet to be processed?
  2. how much space will I need to store the images of interest?
  3. how long will it take to download the images?

We can’t answer these questions categorically for you, as there are several possible confounding factors, but we can give you a ballpark to help your planning:

Spreadsheet upload

The spreadsheet upload is described in the projects page.  It takes approximately 1.5 seconds to process each subject in the spreadsheet, but it’s not a process that needs to be watched; you’ll receive an email when the upload has finished.  Our test project included 8.5K subjects, and it took the node around 3.5 hours to load the data from this spreadsheet (8,500 subjects × 1.5 seconds ≈ 12,750 seconds ≈ 3.5 hours).

Space requirement

There are six types of imaging file, and for each type you can access DICOM or NIFTI files.  In our sampling of the data we’ve noticed the following size breakdowns:

Sequence                               Dicom: zipped (MB)   Dicom: unzipped (MB)   Nifti: size (MB)
Functional brain images - task                239                  363                   447
Multiband diffusion brain images              123                  276                   501
Susceptibility weighted brain images          588                 1604                    28
T1 structural brain images                     37                   93                    47
Functional brain images                       352                  535                   672
T2 FLAIR structural brain images               29                   86                     5
Total                                        1368                 2957                  1700

So, as a rough rule of thumb, you’ll need 2GB per subject to download the NIFTI images (e.g. 2TB for 1000 subjects) and 3GB per subject to download and unzip the DICOM images.

Download time

For downloading all of the DICOMs, allow approximately 1 minute per subject (so around 17 hours for 1000 subjects).
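Putting the ballpark figures from this post together, a short Python sketch like the following can answer all three questions for a given cohort size.  The per-subject constants are just the estimates quoted above, so treat the outputs as rough planning numbers:

# Rough DPUK Biobank node planning calculator, using the ballpark
# per-subject figures quoted in this post (estimates, not guarantees).
UPLOAD_SECS_PER_SUBJECT = 1.5      # spreadsheet processing time
NIFTI_GB_PER_SUBJECT = 2.0         # space needed for nifti downloads
DICOM_GB_PER_SUBJECT = 3.0         # space to download and unzip dicoms
DOWNLOAD_MINS_PER_SUBJECT = 1.0    # download time for all dicoms

def estimate(subjects):
    print("spreadsheet processing: %.1f hours" % (subjects * UPLOAD_SECS_PER_SUBJECT / 3600))
    print("nifti space: %.1f TB" % (subjects * NIFTI_GB_PER_SUBJECT / 1000))
    print("dicom space: %.1f TB" % (subjects * DICOM_GB_PER_SUBJECT / 1000))
    print("dicom download: %.1f hours" % (subjects * DOWNLOAD_MINS_PER_SUBJECT / 60))

estimate(1000)  # e.g. 0.4 hours, 2.0 TB, 3.0 TB, 16.7 hours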

Image upload fix

The XNAT web application requires Java applets for a few of its features, particularly the image upload, image download and session timeout indicator features.  However, support for Java applets is being withdrawn by the major browsers: they are no longer supported by Chrome or IE, and the only hold-outs, at least for now, are Firefox and Safari (with some fairly complex configuration).  Firefox changed its treatment of applets recently, which caused a problem with the image upload applet that is discussed in this thread: XNAT 1.7 problem uploading images JAVA plugin is not support.

The vagrant-dpuk-node project, which is the reference install for a DPUK node, now includes a module that fixes the upload applet problem, at least temporarily, until a more long-term solution is implemented for XNAT.

DPUK imaging data model

In this project we’ve extended the XNAT data model slightly to include some high-level metadata for dementia patients and the imaging modalities that are used to investigate their condition.  Today we’ve added more details of this data model, which can be found in the training section.

Production Nodes

I’ve been fielding a number of questions about production installs of the DPUK XNAT node recently, so here are some notes for reference. There are two take-home messages:

  • Please do adapt the vagrant project recipe to your own circumstances
  • We’re not yet ready to use XNAT 1.7

Using the Vagrant project

The vagrant-dpuk-node project provides an easy-to-use introduction to a DPUK node, but it’s not appropriate for production use. If you are already familiar with XNAT, you can run it on your desktop machine to get an idea of what XNAT is and to see the customisations of a DPUK XNAT node. The provision.sh script installs everything on a single Ubuntu 14.04 server. For a production install, I would select a platform that you feel most comfortable with and use the vagrant project as a recipe to be adapted to your circumstances.  I would also recommend moving PostgreSQL to its own server. For the Oxford node, PostgreSQL and Tomcat are installed on separate VMs, each with 4 cores and 8GB RAM allocated, and a third, smaller VM runs Munin to monitor them both.  Note that there are sqlonly and xnatonly branches of the vagrant project that show how the install can be done with this setup.

XNAT 1.7

We’re not ready to use the latest version of XNAT yet. The DPUK XNAT node is a lightly modified version of the main 1.6.5 branch with several modules. To upgrade to 1.7 we’ll need to merge our modifications into version 1.7, adapt the modules for 1.7, and test the result. We won’t begin looking at this until the new year at the earliest.

Sharing improvements

We had some problems sharing larger data-freezes from the Oxford node. These issues were fixed with an update to the dpuk-node-extras module.

The latest version of the node modules can be found in the vagrant project: https://github.com/mattsouth/vagrant-dpuk-node/modules

To deploy, you’ll need to replace the existing dpuk-node-extras.jar file in your {xnat-data}/modules/webapp directory with dpuk-node-extras.1-3b.jar, then stop Tomcat, run src/xnat/bin/update.sh, and restart Tomcat.

Module update: dpuk-node-extras

An update to the dpuk-node-extras.jar module has been made that fixes two issues:
  • No Datatype has been selected to shared – Erroneous message (DPUK-286)
  • Wizard needs “select all” option for step 2 (DPUK-283)

The latest version of the node modules can be found in the vagrant project: https://github.com/mattsouth/vagrant-dpuk-node/modules

To deploy, you’ll need to replace the existing dpuk-node-extras.jar file in your {xnat-data}/modules/webapp directory with dpuk-node-extras.1-3a.jar, then stop Tomcat, run src/xnat/bin/update.sh, and restart Tomcat.

Copying projects between XNATs

In the DPUK project we’ve created a way of pushing data from DPUK XNAT nodes to the central DPUK XNAT instance using data-freezes. This allows us to push and publish different tranches of data over the lifetime of a research project; however, this approach is specific to the DPUK project and needs a customised central XNAT to receive the data-freezes.

If you wish to copy a project from one vanilla XNAT instance to another, you must script it yourself. If your task consists of a project that contains only MRSessions, then you might try this pyxnat script that I wrote to push data to the central DPUK XNAT node (not the hub: confusingly, there is a node on the central infrastructure alongside the central hub).

The script uses pyxnat, which you’ll have to install yourself, and this brief blog entry sketches some of the design choices that were made in its development.

How to use the script

This transcript should give you an idea of how it works:

you@yourmachine:~$ python copyproject.py
** COPY PROJECT **
Enter the Source xnat url: https://xnat.somewhere.org
Enter the Source xnat project id: example
Enter credentials for source xnat, https://xnat.somewhere.org
User: you
Password:
Enter the Target xnat url: https://xnat.elsewhere.org
Enter the Target xnat project id: test
Enter credentials for target xnat, https://xnat.elsewhere.org
User: anotheryou
Password:
creating subject: EXP0001
creating experiment: EXP0001_Day_1_MRI
downloading experiment files
downloaded and unzipped 452.3 MB in 10.2132329941 seconds
creating scan: 1
creating and uploading zip: SNAPSHOTS.zip
uploaded 15.5 KB in 0.26091003418 seconds
creating and uploading zip: DICOM.zip
uploaded 471.9 KB in 0.284129858017 seconds
creating scan: 2
creating and uploading zip: SNAPSHOTS.zip
uploaded 658.0 KB in 0.209185838699 seconds
creating and uploading zip: DICOM.zip
uploaded 41.6 MB in 1.21827507019 seconds
...
Copy experiment completed in 136.572870016 seconds
creating experiment: EXP0001_Day_2_MRI
downloading experiment files
downloaded and unzipped 452.3 MB in 70.3424210548 seconds
creating scan: 1
creating and uploading zip: SNAPSHOTS.zip
uploaded 20.2 KB in 0.227144956589 seconds
creating and uploading zip: DICOM.zip
uploaded 471.8 KB in 0.257121801376 seconds
creating scan: 2
creating and uploading zip: SNAPSHOTS.zip
uploaded 655.2 KB in 0.254781961441 seconds
creating and uploading zip: DICOM.zip
uploaded 41.6 MB in 0.911252975464 seconds
...
Copy experiment completed in 201.222012997 seconds
Copy subject completed in 784.395915031 seconds
...
Copy project completed in 3137.58364 seconds

The script needs an existing project to copy into. It first asks you to provide details of the source and the target projects. It will copy over any experiment files that it finds, e.g. DICOM, NIFTI, SNAPSHOTS, PDFS, and it checks first for existing objects (subjects/experiments) on the target XNAT project; if it finds existing objects, it skips the copy.

How does it work?

The script iterates through the subjects on the source and checks to see if a subject with the same label is on the target. If not, it copies the subject object over. Then, for each subject, it does the same for experiments. Next, the script checks to see if the target experiment has resources; if it has, it ignores it, otherwise it downloads all the resources for the experiment as a zip with a single API hit. Then it iterates through all the source scans and pushes individual zips of scan/catalog resources to the target project/subject/experiment location. A minimal sketch of this loop follows.
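The sketch below assumes src_project and dst_project are pyxnat project objects already selected on the source and target XNATs; download_experiment_zip and upload_catalog_zip are hypothetical stand-ins for the real script’s download and upload logic:

# Sketch of the copy loop: skip objects that already exist on the target,
# then move resources per experiment and per scan.
for subject in src_project.subjects():
    dst_subject = dst_project.subject(subject.label())
    if not dst_subject.exists():
        dst_subject.create()
    for experiment in subject.experiments():
        dst_experiment = dst_subject.experiment(experiment.label())
        if dst_experiment.exists() and list(dst_experiment.resources()):
            continue  # target experiment already has resources: ignore it
        if not dst_experiment.exists():
            dst_experiment.create()
        # single API hit: fetch all of the experiment's resources as one zip
        download_experiment_zip(experiment)  # hypothetical helper
        for scan in experiment.scans():
            # push a zip per scan/catalog resource to the target location
            upload_catalog_zip(dst_experiment, scan)  # hypothetical helper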

Why not a bash script or XnatDataClient?

A bash script that uses curl and the XNAT REST API is fine for creating an object or pushing some NIFTI files into an existing project/subject/experiment/scan combination, but for manipulating object attributes it got too complicated. The REST API provides data in XML or JSON format: the XML for a single subject in the project I was working with (which isn’t a complicated one) stretches to 2,000 lines of text, and the JSON to 6,000. Pyxnat simplifies a lot of the complexity of working with the XNAT data, though there is a cost: the script as it stands will not capture all the scan data, and more work is required here.

Why must you explicitly list the attributes?

XNAT is extensible, which is one of the things that makes it powerful, but it also means that it’s hard to know a priori what data you are going to encounter in a project. Perhaps I could have written some complicated discovery commands, but this seemed too daunting, so I just hard-coded the attributes I expected to see, along the lines of the sketch below.
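For illustration, the hard-coding might look something like this minimal sketch; the attribute paths are examples from the stock xnat:subjectData schema rather than the script’s actual list, and attrs.mget/attrs.mset are pyxnat’s multi-get/multi-set helpers:

# Illustrative whitelist of subject attributes to copy; the real script's
# list is longer and project-specific.
SUBJECT_ATTRS = ["xnat:subjectData/group",
                 "xnat:subjectData/demographics[@xsi:type=xnat:demographicData]/gender",
                 "xnat:subjectData/demographics[@xsi:type=xnat:demographicData]/handedness"]

def copy_subject_attrs(src_subject, dst_subject):
    # fetch all whitelisted values in one call, then write them in one call
    values = src_subject.attrs.mget(SUBJECT_ATTRS)
    dst_subject.attrs.mset(dict(zip(SUBJECT_ATTRS, values)))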

Why use Python requests to push the data?

I couldn’t work out how to use pyxnat to upload zipped files and extract them in situ, which is something that can be done with the underlying REST API, so I dropped down to that, using Python’s requests library; an XNAT discussion thread informed this choice. The upload call looks roughly like the sketch below.
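This is a minimal sketch, assuming variables like target_url, project, subject, experiment, scan_id, user and password are already defined; the extract=true query parameter asks XNAT to unpack the zip server-side:

# Sketch: PUT a zip of scan files and ask XNAT to extract it in place.
# extract=true is the behaviour pyxnat didn't seem to expose.
import requests

upload_url = ("%s/data/archive/projects/%s/subjects/%s/experiments/%s"
              "/scans/%s/resources/DICOM/files/DICOM.zip?extract=true"
              % (target_url, project, subject, experiment, scan_id))
with open("DICOM.zip", "rb") as zipped:
    response = requests.put(upload_url, data=zipped, auth=(user, password))
response.raise_for_status()  # fail loudly if the upload was rejected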

Why can you download all of the resources for an experiment but only upload the resources for a particular scan’s catalog?

Uploading all of the resources for an experiment in one hit is not supported, even though downloading them is. I guess it’s because the structure is fairly flexible, so you don’t know for sure what you’re going to find there, and copying the whole thing could break in multiple ways. Still, it would seem like an easy win to be able to do this.

Where are the hacks?

The main hack is hard-coding the attributes to be copied for each object. There is another in setting up the snapshots on the target, which works, though it doesn’t provide a click-through from the thumbnail.