Version 2 (modified by flip, 9 years ago)

These are some jumbled thoughts and comments on Simulation export and import. Much of it will hopefully be true of Analysis and RFPulse, too, once those apps have export and import.

Export and Import Technical Notes

Export and import allow users to exchange experiments, metabolites and pulse sequences via email or any other file exchange medium. Thoughtful users could also use export/import to perform backups, although this is more of an accidental feature than an intentional one.

At present, there's no schema describing the XML file format. In lieu of that, you can get a long way by studying the output of an export. It's pretty straightforward. Some of the subtle points are explained below.

The Export Comment

Each export contains a comment. This is informational only; our applications ignore it when importing. At the moment, there's no way for users to see the export comment other than looking directly at the XML.

Timestamps

Each file contains a timestamp at the top that marks when the file was created.

Some objects contain timestamps, too. Timestamps in our import/export files are always in combined ISO format, e.g. 2010-04-30T15:14:56. The seconds field is always present, and there's never time zone information.
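As a rough illustration of the format described above, here is a minimal Python sketch that writes and reads such timestamps. The names TIMESTAMP_FORMAT, format_timestamp and parse_timestamp are made up for this example and are not taken from the Vespa code.

```python
from datetime import datetime

# Combined ISO format with seconds and no time zone, as in the example above.
# The constant and function names are hypothetical.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"


def format_timestamp(moment):
    """Render a datetime with the seconds field and no time zone suffix."""
    return moment.strftime(TIMESTAMP_FORMAT)


def parse_timestamp(text):
    """Parse a timestamp; raises ValueError if it doesn't match the format."""
    return datetime.strptime(text, TIMESTAMP_FORMAT)


stamp = parse_timestamp("2010-04-30T15:14:56")
print(format_timestamp(stamp))
```

The parsed value is a naive datetime, i.e. it carries no time zone, which matches the format but means the reader has to guess where the file was written.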

Timestamps are always in the local time of the machine that wrote them. Using local time isn't ideal for files that are meant to be shared globally, but time zone information in Python isn't easy to deal with, so we opted not to record it.

Missing Fields

In general, our import code doesn't care if optional fields are present and empty or simply not present. If a field is not present, our code assigns a default value.

For instance, a blank comment can be represented as <comment /> or simply not present at all.
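One way to get this behaviour with the standard library's ElementTree is sketched below. The read_comment helper and the <object> parent element are stand-ins invented for the example; only the comment element comes from the text above.

```python
import xml.etree.ElementTree as ET


def read_comment(element, default=""):
    """Return the comment text, treating an empty and an absent comment alike."""
    text = element.findtext("comment")
    # findtext gives None when the child is absent and "" when it is empty;
    # both cases fall through to the default.
    return text if text else default


# <object> is a hypothetical parent element used only for this demonstration.
with_empty = ET.fromstring("<object><comment /></object>")
without = ET.fromstring("<object></object>")
with_text = ET.fromstring("<object><comment>from a colleague</comment></object>")

print(read_comment(with_empty) == read_comment(without))
print(read_comment(with_text))
```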

It's not valid for mandatory fields to be missing; e.g. each metabolite element must contain at least one spin element.

UUIDs

An object's UUID is stored in its id attribute.

It's valid for objects in an export file to lack a UUID. In this case, when they're imported, a new id is assigned. This makes it easier to import objects from 3rd party software into Vespa -- if you can convert the 3rd party format into our format, we'll import it.
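A sketch of that fallback, assuming the new id is a random (version 4) UUID; the notes above don't say which kind of UUID is generated, and resolve_id is a name invented for this example.

```python
import uuid
import xml.etree.ElementTree as ET


def resolve_id(element):
    """Return the element's id attribute, or a fresh UUID if it has none."""
    existing = element.get("id")
    if existing:
        return existing
    # Assumption: a random UUID is good enough for a newly imported object.
    return str(uuid.uuid4())


# <object> is a hypothetical element name used only for this demonstration.
ours = ET.fromstring('<object id="a1b9f07b-3665-4ce8-ba4a-5f454baf9681" />')
third_party = ET.fromstring("<object />")

print(resolve_id(ours))
print(resolve_id(third_party))
```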

Metabolites and Pulse Sequences

The export of a metabolite or pulse sequence is straightforward. The object's attributes are expressed in XML.

Experiments

An experiment is more complicated because it references a pulse sequence and one or more metabs. The pulse sequence isn't difficult to deal with because there's at most one, so its XML nodes are simply included as a subtree of the experiment.

The metabolite definitions are children of the experiment, just as the pulse sequence is. However, each metabolite is also referenced from one or more (probably many more) simulations that are also part of the experiment. Repeating the metabolite definitions inside the simulations would be (a) an inefficient use of space and (b) redundant. Instead, simulations contain only references (by UUID) to the metabolites. It is guaranteed that any metabolite which is referred to by a simulation has its full definition in the experiment.
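The guarantee in the last sentence can be checked with a few lines of Python. This is only a sketch: the experiment, simulation and metabolite_ref element names and the unresolved_references function are assumptions made for the example, and the sample is schematic rather than a valid export (the spin elements are left out). Study a real export for the actual element names.

```python
import xml.etree.ElementTree as ET

# Schematic sample only. Element names other than "metabolite" and the
# "id" attribute are hypothetical.
SAMPLE = """
<experiment>
  <metabolite id="a1b9f07b-3665-4ce8-ba4a-5f454baf9681" />
  <simulation>
    <metabolite_ref id="a1b9f07b-3665-4ce8-ba4a-5f454baf9681" />
  </simulation>
  <simulation>
    <metabolite_ref id="a1b9f07b-3665-4ce8-ba4a-5f454baf9681" />
  </simulation>
</experiment>
"""


def unresolved_references(experiment):
    """Return referenced metabolite ids that have no definition in the experiment."""
    # Full definitions are direct children of the experiment.
    defined = {m.get("id") for m in experiment.findall("metabolite")}
    # References can sit anywhere below the experiment, inside simulations.
    referenced = {r.get("id") for r in experiment.iter("metabolite_ref")}
    return sorted(referenced - defined)


experiment = ET.fromstring(SAMPLE.strip())
print(unresolved_references(experiment))
```

An empty list means every simulation points at a metabolite that is defined once at the experiment level.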

Compression

Simulation gives the option of compressing the export files it creates. Compression is done with the gzip module in Python's standard library and is compatible with the free gzip utility. Simulation encourages the user to name compressed files with an extension of ".xml.gz" but that's not strictly necessary.

When importing a file, Simulation automatically detects whether or not the file is compressed. Simulation examines the file contents; the filename makes no difference.
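Detection by content could look something like the sketch below, which checks for the two-byte gzip magic number at the start of the file. That is an assumption about how the check might be done, not a description of Simulation's actual code, and read_export is a name invented for the example.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def read_export(path):
    """Return the file's bytes, decompressing if the content looks like gzip."""
    with open(path, "rb") as raw:
        is_gzipped = raw.read(2) == GZIP_MAGIC
    # The filename and extension are never consulted.
    opener = gzip.open if is_gzipped else open
    with opener(path, "rb") as source:
        return source.read()
```

Because only the content is examined, a compressed file named "export.xml" and an uncompressed one named "export.xml.gz" are both read correctly.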

In short, compression is transparent to the user and nearly transparent to the app.

Integrity

Assigning UUIDs to objects is meant to guarantee that e.g. metabolite a1b9f07b-3665-4ce8-ba4a-5f454baf9681 will always be a specific definition of aspartate. However, nothing prevents a user from exporting that aspartate and hand-editing the XML to change aspartate's definition. Guarding against this sort of tampering is simply out of the scope of this application.

Concepts

  • Import is never destructive. It never overwrites anything in your existing database. If it changes your database at all, it does so by adding to what's already there.

  • Conflicts can arise between the names of existing objects and imported objects. When a conflict arises, import creates a unique name for the imported object. The current algorithm adds a timestamp to the name, and if that's still not unique it adds digits until a unique name is found.

  • You can import individual metabolites and pulse sequences from an experiment export file.
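The name conflict handling described above can be sketched as follows. The space separator, the timestamp format and the way digits are appended are all guesses made for illustration, and unique_name is a hypothetical function rather than the one in the import code.

```python
from datetime import datetime


def unique_name(name, existing, now=None):
    """Return name if it is free, else a timestamped (then numbered) variant."""
    if name not in existing:
        return name
    now = now or datetime.now()
    # Step 1: add a timestamp to the conflicting name.
    base = "{} {}".format(name, now.strftime("%Y-%m-%dT%H:%M:%S"))
    candidate = base
    # Step 2: if that still clashes, add digits until the name is unique.
    counter = 1
    while candidate in existing:
        candidate = "{} {}".format(base, counter)
        counter += 1
    return candidate


names_in_database = {"aspartate"}
print(unique_name("aspartate", names_in_database))
```

The existing object keeps its name; only the imported object is renamed, which is consistent with import never being destructive.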

Paths Not Taken

  • Selective imports. When the import code reads a file, it assumes that every object it finds in the XML file that's not already in the database should be added. The logic is simple.

It might be nicer for the user if the import code examined the import file and showed the user a list of importable objects (along with, perhaps, a list of objects in the import file that already exist in the database). Naturally, the import dialog would give the user the opportunity to select which items to import.

There's nothing wrong or even complicated about this approach; it's just another dialog and more to design, write, debug, document and maintain.

  • Cancelling imports. Another thing that a more sophisticated import GUI could do is offer the opportunity to cancel an in-progress import.
  • Combining exports. At present, export either creates a new target file or overwrites an existing one. It doesn't offer the option to append to an existing file. Furthermore, export only exports one kind of object at a time (ignoring for the moment the fact that exporting experiments implies an export of metabs and pulse sequences too).

There's no way, for instance, to export metabs and pulse sequences to the same file. Nor is there a way to export one's entire database to a single file (unless one would be satisfied with only exporting in-use metabs and pulse sequences, in which case exporting all experiments would be sufficient).

There's no strong reason for this limitation; I just never wrote the code or designed the GUI to support combining export files.

  • Import as a menu item. Import is implemented as a button on each of the management dialogs. It could be a main menu item instead (simply "Import..." rather than "Import Metabolites..."). In fact, this would probably be necessary if selective imports are implemented.

Another advantage of moving import to the main menu is that it would resolve a pet peeve of mine. Each management dialog has a column of buttons to the left of the list as well as Import & Export below the list, and Close off on its own in the lower right. The buttons in the column all act on items selected in the list, as does Export. From that point of view, Export belongs in the column. However, it looks strange (IMHO) to separate it from Import, hence the conundrum.

If Import were moved to the main menu, the Import button would go away and Export could move into the column of buttons with its friends.