Copying projects between XNATs

In the DPUK project we’ve created a way of pushing data from DPUK XNAT nodes to the central DPUK XNAT instance using data freezes. This allows us to push and publish different tranches of data over the lifetime of a research project. However, this approach is specific to the DPUK project and needs a customised central XNAT to receive the data freezes.

If you wish to copy a project from one vanilla XNAT instance to another, you must script it yourself. If your project contains only MRSessions, you might try this pyxnat script, which I wrote to push data to the central DPUK XNAT node (not the hub; confusingly, there is a node on the central infrastructure alongside the central hub).

The script uses pyxnat, which you’ll have to install yourself; this brief blog entry sketches some of the design choices made in its development.

How to use the script

This transcript should give you an idea of how it works:

you@yourmachine:~$ python
Enter the Source xnat url:
Enter the Source xnat project id: example
Enter credentials for source xnat,
User: you
Enter the Target xnat url:
Enter the Target xnat project id: test
Enter credentials for target xnat,
User: anotheryou
creating subject: EXP0001
creating experiment: EXP0001_Day_1_MRI
downloading experiment files
downloaded and unzipped 452.3 MB in 10.2132329941 seconds
creating scan: 1
creating and uploading zip:
uploaded 15.5 KB in 0.26091003418 seconds
creating and uploading zip:
uploaded 471.9 KB in 0.284129858017 seconds
creating scan: 2
creating and uploading zip:
uploaded 658.0 KB in 0.209185838699 seconds
creating and uploading zip:
uploaded 41.6 MB in 1.21827507019 seconds
Copy experiment completed in 136.572870016 seconds
creating experiment: EXP0001_Day_2_MRI
downloading experiment files
downloaded and unzipped 452.3 MB in 70.3424210548 seconds
creating scan: 1
creating and uploading zip:
uploaded 20.2 KB in 0.227144956589 seconds
creating and uploading zip:
uploaded 471.8 KB in 0.257121801376 seconds
creating scan: 2
creating and uploading zip:
uploaded 655.2 KB in 0.254781961441 seconds
creating and uploading zip:
uploaded 41.6 MB in 0.911252975464 seconds
Copy experiment completed in 201.222012997 seconds
Copy subject completed in 784.395915031 seconds
Copy project completed in 3137.58364 seconds

The script needs an existing project to copy into. It first asks you for details of the source and target projects, then copies over any experiment files it finds (e.g. DICOM, NIFTI, SNAPSHOTS, PDFS), checking first for existing objects (subjects/experiments) on the target XNAT project. If it finds an existing object, it skips that copy.

How does it work?

The script iterates through the subjects on the source and checks whether a subject with the same label exists on the target. If not, it copies the subject object over. For each subject it then does the same for experiments. Next, it checks whether the target experiment already has resources: if it does, the experiment is skipped; otherwise all of the experiment’s resources are downloaded as a single zip with one API hit. Finally, it iterates through the source scans and pushes individual zips of each scan’s catalog resources to the target project/subject/experiment location.
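The loop above can be sketched as follows. This is a reconstruction, not the script itself: it is duck-typed against pyxnat-style objects, and the method names (subjects(), subject(), exists(), create(), experiments(), experiment(), resources()) follow the pyxnat API, but any real script would also need the download and upload steps indicated by the comments.

```python
def copy_missing(src_project, dst_project, log=print):
    """Copy subjects and experiments missing on the target.

    A sketch assuming pyxnat-style objects. Experiments that already
    have resources on the target are skipped, mirroring the script's
    "skip existing" behaviour described above.
    """
    copied = []
    for src_subj in src_project.subjects():
        dst_subj = dst_project.subject(src_subj.label())
        if not dst_subj.exists():
            log("creating subject: %s" % src_subj.label())
            dst_subj.create()
        for src_exp in src_subj.experiments():
            dst_exp = dst_subj.experiment(src_exp.label())
            # target experiment already populated: ignore it
            if dst_exp.exists() and list(dst_exp.resources()):
                continue
            if not dst_exp.exists():
                log("creating experiment: %s" % src_exp.label())
                dst_exp.create()
            copied.append(src_exp.label())
            # here: download the experiment's resources as one zip,
            # then push per-scan zips to the target (see below)
    return copied
```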

Why not a bash script or XnatDataClient?

A bash script that uses curl and the XNAT REST API is fine for creating an object or pushing some NIfTI files into an existing project / subject / experiment / scan combination, but for manipulating object attributes it got too complicated. The REST API will provide you data in XML or JSON format. The XML for a single subject in the project I was working with (which isn’t a complicated one) stretches to 2,000 lines of text and the JSON to 6,000. pyxnat smooths away a lot of the complexity of working with XNAT data. There is a cost, though: the script as it stands will not capture all the scan data; more work is required here.

Why must you explicitly list the attributes?

XNAT is extensible, which is one of the things that makes it powerful, but it also means that it’s hard to know a priori what data you are going to encounter in a project. Perhaps I could have written some complicated discovery commands, but this seemed too daunting, so I just hard-coded the attributes I expected to see.
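The hard-coding looks roughly like this. The attribute paths below are illustrative examples in the xpath-like style pyxnat uses, not the script’s actual list, and the get/set calls follow pyxnat’s EObject.attrs interface:

```python
# Hypothetical, non-exhaustive attribute list -- the real script
# hard-codes whichever fields it expects to find.
SUBJECT_ATTRS = [
    "xnat:subjectData/group",
    "xnat:subjectData/demographics[@xsi:type=xnat:demographicData]/gender",
]

def copy_attrs(src_obj, dst_obj, attrs):
    """Copy a fixed list of attributes between XNAT objects.

    Uses pyxnat-style attrs.get / attrs.set; anything not in the
    hard-coded list is silently left behind.
    """
    for path in attrs:
        value = src_obj.attrs.get(path)
        if value:  # skip empty attributes rather than overwrite
            dst_obj.attrs.set(path, value)
```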

Why use Python requests to push the data?

I couldn’t work out how to use pyxnat to upload zipped files and extract them in situ, which is something the underlying REST API can do, so I dropped down to that using Python’s requests library. An XNAT discussion thread informed this choice.
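As I understand the XNAT REST API, the trick is the extract=true query parameter on the file-upload endpoint, which asks the server to unpack the zip in place. A minimal sketch, with all names (hosts, labels, paths) hypothetical:

```python
def scan_upload_url(base, project, subject, experiment,
                    scan, resource, zipname):
    """Build the upload URL for a zip aimed at one scan resource.

    ?extract=true asks XNAT to unpack the archive server-side,
    so the individual files land in the catalog.
    """
    return ("%s/data/projects/%s/subjects/%s/experiments/%s"
            "/scans/%s/resources/%s/files/%s?extract=true"
            % (base, project, subject, experiment,
               scan, resource, zipname))

def push_zip(session, url, zip_path):
    """PUT a local zip to the URL built above.

    `session` is assumed to be a requests.Session with auth set;
    requests is imported here so the URL helper stays dependency-free.
    """
    import requests  # third-party; pip install requests
    with open(zip_path, "rb") as f:
        response = session.put(url, data=f)
    response.raise_for_status()
    return response.status_code
```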

Why can you download all of the resources for an experiment but only upload the resources for a particular scan’s catalog?

Uploading all resources for an experiment in one hit is not supported, even though downloading is. I guess it’s because the structure is fairly flexible, so you don’t know for sure what you’re going to find there, and copying the whole thing could break in multiple ways. But being able to do this would seem like an easy win in this instance.
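For the download side of that asymmetry, the single API hit is (as I understand the REST API) a GET against the experiment’s scans with ALL as a wildcard; the host and experiment ID below are hypothetical:

```python
def experiment_download_url(base, experiment_id):
    """All files for an experiment in one zip, via a single GET.

    There is no upload equivalent: zips have to go back up one
    scan resource at a time.
    """
    return ("%s/data/experiments/%s/scans/ALL/files?format=zip"
            % (base, experiment_id))
```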

Where are the hacks?

The main hack is hard-coding the attributes to be copied for each object. There is another in setting up the snapshots on the target, which works, though it doesn’t provide a clickthrough from the thumbnail.


Matt is a software developer and system administrator based in the Oxford Centre for Human Brain Activity at Oxford University’s Department of Psychiatry.