Changes between Version 1 and Version 2 of ExportFormat

Oct 6, 2010, 12:54:51 PM (9 years ago)



  • ExportFormat

    v1 v2  
     1These are some jumbled thoughts and comments on Simulation export and import.
     2Much of it will hopefully be true of Analysis and RFPulse, too, once those
     3apps have export and import.
    2 Our export format doesn't have rigorous documentation or an XSD.
    3 In lieu of that, you can get a long way by
    4 studying the output of an export. It's pretty straightforward. Some of
    5 the subtle points are explained below.
     5= Export and Import Technical Notes =
     7Export and import allow users
     8to exchange '''experiments''', '''metabolites''' and '''pulse sequences'''
     9via email or any
     10other file exchange medium. Thoughtful users could also use export/import
     11to perform backups, although this is more of an accidental feature than
     12an intentional one.
     14At present, there's no schema describing the XML file format. In lieu of that,
     15you can get a long way by studying the output of an export. It's pretty
     16straightforward. Some of the subtle points are explained below.
    718== The Export Comment ==
    920Each export contains a comment. This is informational only; our applications
    10 ignore it when importing.
     21ignore it when importing. At the moment, there's no way for users to see the
     22export comment other than looking directly at the XML.
    1224== Timestamps ==
    14 Timestamps are always in
     26Each file contains a timestamp at the top that marks when the file was
     29Some objects contain timestamps, too. Timestamps in our import/export files
     30are always in
    1531[ combined ISO format],
    1632e.g. `2010-04-30T15:14:56`. The seconds field is always present, and there's
    4359into our format, we'll import it.
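
The timestamp format described in the Timestamps section can be read and written with Python's standard `datetime` module. This is a minimal sketch that assumes only the form shown in the example (`2010-04-30T15:14:56`); the helper names are hypothetical and are not taken from the applications' code.

```python
from datetime import datetime

# Format string matching the example timestamp in the text. The seconds
# field is required because the notes say it is always present.
ISO_COMBINED = "%Y-%m-%dT%H:%M:%S"


def parse_timestamp(text):
    """Parse a timestamp such as 2010-04-30T15:14:56 (hypothetical helper)."""
    return datetime.strptime(text, ISO_COMBINED)


def format_timestamp(moment):
    """Write a datetime back out in the same form (hypothetical helper)."""
    return moment.strftime(ISO_COMBINED)


stamp = parse_timestamp("2010-04-30T15:14:56")
```

Round-tripping a value through both helpers should give back the original string, which is a cheap check when writing a third-party importer.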
     61== Metabolites and Pulse Sequences ==
     63The export of a metabolite or pulse sequence is straightforward. The
     64object's attributes are expressed in XML.
    4566== Experiments ==
    47 Experiment export files always include the metabolites and pulse sequence
    48 to which the experiment refers. In other words, each experiment contains all
    49 the information you need to recreate it.
     68An experiment is more complicated
     69because it references a pulse sequence and one or more metabs. The pulse
     70sequence isn't difficult to deal with because there's at most one so its
     71XML nodes are simply included as a subtree of the experiment.
    51 When metabolites are referred to in simulations, the export file doesn't
    52 repeat the entire metabolite definition, only the id. It's guaranteed that
    53 the id refers to a metabolite defined in the same experiment element.
     73The metabolite definitions are children of the experiment, just as the pulse
     74sequence is. However, each metabolite is also referenced from one or more
     75(probably many more) simulations that are also part of the experiment.
     76Repeating the metabolite definitions inside the simulations would be
     77(a) an inefficient use of space and (b) redundant. Instead, simulations
     78contain only references (by UUID) to the metabolites. It is guaranteed that
     79any metabolite which is referred to by a simulation has its full definition
     80in the experiment.
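
The reference scheme described above can be sketched as a lookup table keyed by UUID. Since there is no schema, every tag and attribute name in this example (`experiment`, `metabolite`, `simulation`, `metabolite_ref`, `id`) is an assumption made for illustration; check a real export for the actual names.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: tag and attribute names are invented for illustration.
SAMPLE = """\
<experiment>
  <metabolite id="a1b9f07b-3665-4ce8-ba4a-5f454baf9681"><name>aspartate</name></metabolite>
  <simulation><metabolite_ref id="a1b9f07b-3665-4ce8-ba4a-5f454baf9681"/></simulation>
  <simulation><metabolite_ref id="a1b9f07b-3665-4ce8-ba4a-5f454baf9681"/></simulation>
</experiment>
"""


def resolve_references(experiment):
    """Return the metabolite definition behind each simulation reference."""
    # Full definitions are direct children of the experiment element.
    definitions = {m.get("id"): m for m in experiment.findall("metabolite")}
    resolved = []
    for simulation in experiment.findall("simulation"):
        for ref in simulation.findall("metabolite_ref"):
            # A KeyError here would mean the file breaks the guarantee that
            # every referenced metabolite is defined in the experiment.
            resolved.append(definitions[ref.get("id")])
    return resolved


metabolites = resolve_references(ET.fromstring(SAMPLE))
```

Both simulations resolve to the same element object, which reflects the point of the design: one definition, many references.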
    55 Experiment exports don't always include the experiment results. It's up
    56 to the user who does the export whether or not the results will be
    57 included.
    5983== Compression ==
    61 Export files may be compressed; the compression is compatible
    62 with [ gzip]. Our applications examine the file
    63 contents (not the file name) to determine whether or not a file is
    64 compressed.
     85Simulation gives the option of compressing the export files it creates.
     86Compression is done with the
     87[ gzip module in Python's standard library]
     88and is compatible with the free gzip utility. Simulation encourages the user
     89to name compressed files with an extension of ".xml.gz" but that's not
     90strictly necessary.
     92When importing a file, Simulation automatically detects whether or not the
     93file is compressed. Simulation examines the file contents; the filename makes
     94no difference.
     96In short, compression is transparent to the user and nearly transparent to the app.
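
One way to detect compression from file contents is to look for the two-byte gzip magic number (`0x1f 0x8b`) at the start of the file. The notes don't say how Simulation performs its check, so this is a sketch of one common approach rather than a description of the app's code; the function name and the UTF-8 encoding are assumptions.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def read_export(path):
    """Return the text of an export file, compressed or not (hypothetical helper)."""
    # Decide from the contents, not the filename.
    with open(path, "rb") as f:
        is_compressed = f.read(2) == GZIP_MAGIC
    opener = gzip.open if is_compressed else open
    with opener(path, "rb") as f:
        # Assumption: the XML is encoded as UTF-8.
        return f.read().decode("utf-8")
```

Because the decision depends only on the first two bytes, a compressed file named `export.xml` and an uncompressed file named `export.xml.gz` are both read correctly.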
     98== Integrity ==
     100Assigning UUIDs to objects is meant to guarantee that e.g. metabolite
     101`a1b9f07b-3665-4ce8-ba4a-5f454baf9681` will always be a specific definition
     102of aspartate. However, nothing prevents a user from exporting that
     103aspartate and hand-editing the XML to change aspartate's definition.
     104Guarding against this sort of tampering is simply out of the scope of this
     108== Concepts ==
     110 * Import is never destructive. It never overwrites anything in your existing
     111   database. If it changes your database at all, it does so by adding to
     112   what's already there.
     114 * Conflicts can arise between the names of existing objects and imported
     115   objects. When a conflict arises, import creates a unique name for the
     116   imported object. The current algorithm adds a timestamp to the name, and
     117   if that's still not unique it adds digits until a unique name is found.
     119 * You can import individual metabolites and pulse sequences from an
     120   experiment export file.
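
The name conflict rule in the list above (add a timestamp, then add digits until the name is unique) might look roughly like this. The separator, the timestamp format and the starting digit are assumptions; the notes only describe the general algorithm, and `unique_name` is a hypothetical helper.

```python
from datetime import datetime


def unique_name(name, existing, now=None):
    """Return a name that is not in `existing` (hypothetical helper)."""
    if name not in existing:
        return name
    now = now or datetime.now()
    # Step 1: append a timestamp (the format is an assumption).
    base = "{} {}".format(name, now.strftime("%Y-%m-%dT%H:%M:%S"))
    candidate = base
    # Step 2: if that still collides, append digits until it is unique.
    counter = 1
    while candidate in existing:
        candidate = "{} {}".format(base, counter)
        counter += 1
    return candidate
```

Note that the function only ever produces a new name; it never touches the entries already in `existing`, in keeping with import being non-destructive.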
     123== Paths Not Taken ==
     125 * '''Selective imports.''' When the import code reads a file, it assumes
     126   that every object it finds in the XML file that's not already in the
     127   database should be added. The logic is simple.
     129 It might be nicer for the user if the import code examined the import
     130 file and showed the user a list of importable objects (along with, perhaps,
     131 a list of objects in the import file that already exist in the database).
     132 Naturally, the import dialog would give the user the opportunity to
     133 select which items he wants to import.
     135 There's nothing wrong or even complicated about this approach; it's just
     136 another dialog and more to design, write, debug, document and maintain.
     138 * '''Cancelling imports.''' Another thing that a more sophisticated import
     139 GUI could do is offer the opportunity to cancel an in-progress import.
     141 * '''Combining exports.''' At present, export either creates its target
     142 file or overwrites it. It doesn't offer the option to append to an existing file.
     143 Furthermore, export only exports one kind of object at a time (ignoring
     144 for the moment the fact that exporting experiments implies an export of
     145 metabs and pulse sequences too).
     147 There's no way, for instance, to export
     148 metabs and pulse sequences to the same file. Nor is there a way to
     149 export one's entire database to a single file (unless one would be
     150 satisfied with only exporting in-use metabs and pulse sequences in which
     151 case exporting all experiments would be sufficient).
     153 There's no strong reason for this limitation; I just never wrote the code
     154 or designed the GUI to support combining export files.
     156 * '''Import as a menu item.''' Import is implemented as a button on each
     157 of the management dialogs. It could be a main menu item (simply "Import..."
     158 instead of "Import Metabolites..."). In fact this would probably be
     159 necessary if selective imports are implemented.
     161 Another advantage of moving import to the main menu is that it would resolve
     162 a pet peeve of mine. Each management dialog has a column of buttons to the
     163 left of the list as well as Import & Export below the list, and Close off
     164 on its own in the lower right. The buttons in the column all act on items
     165 selected in the list, as does Export. From that point of view, Export belongs
     166 in the column. However, it looks strange (IMHO) to separate it from Import,
     167 hence the conundrum.
     169 If Import were moved to the main menu, the Import button would go away and
     170 Export could move into the column of buttons with its friends.