Parsing a Siemens CSA Header

by Philip Semanchuk

Here's a quick review of what I've been able to infer about the contents of a Siemens CSA Header. You find this kind of data in a DICOM tag like (0x0029, 0x1010), (0x0029, 0x1210), (0x0029, 0x1110), (0x0029, 0x1020), (0x0029, 0x1220), or (0x0029, 0x1120).

My comments and observations are inferences. They might be wrong. YMMV.

A Siemens CSA header is a mix of binary glop, ASCII, binary masquerading as ASCII, and noise masquerading as signal. It's also undocumented, so there's no specification to consult. A lot of what I know about it I learned from reading the code in the GDCM project's CSAHeader::LoadFromDataElement() inside gdcmCSAHeader.cxx. I don't know how that code's author figured out what's in a CSA header, but the code works.

The data in my example is taken from a real DICOM file. I saved the tag data to a file and then ran hexdump -Cv the_file. The output below is taken from that hexdump. Between some of the hexdump rows I've added markup that denotes the sections represented in the hex data. I labeled the different fields f1, f2, f3, etc. The field numbers are my own and only make sense in the context of this documentation.

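Some general things to note: every number in the header is stored as a 4-byte little-endian integer (so the bytes 53 00 00 00 mean 0x53 == 83), and the strings are NULL-terminated, with anything after the first NULL being padding or noise.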

Let's start by looking at the first 16 bytes of the tag.

          |   f1    | |   f2    |  |    f3   | |    f4   |
00000000  53 56 31 30 04 03 02 01  53 00 00 00 4d 00 00 00  |SV10....S...M...|

The header starts with f1, which is always "SV10". That's followed by f2, which is the four bytes 0x04, 0x03, 0x02 and 0x01.

The first meaningful data are in f3 which contains the number of elements in the subsequent data. In the example above, we have 0x53 == 83 elements. f4 is a delimiter marking the end of this chunk. (Or the start of the next chunk. Or the day of the week. Or maybe they just wanted to add a delimiter to keep the others from getting lonely. Given that all fields' byte lengths are fixed, what's the point of adding delimiters? Your guess is as good as mine.)
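To make the layout concrete, here's a minimal Python sketch that unpacks f1 through f4 with the standard struct module. It assumes four 4-byte fields and little-endian integers, checks only the "SV10" signature, and treats f2 and f4 as opaque. The name parse_preamble is hypothetical, not something from GDCM.

```python
import struct

# f1: 4-byte signature, f2: 4 opaque bytes, f3: uint32 element count,
# f4: uint32 delimiter. "<" means little-endian with no alignment padding.
PREAMBLE = struct.Struct("<4s4sII")


def parse_preamble(buf: bytes) -> int:
    """Return the element count (f3) from the first 16 bytes of a CSA header."""
    if len(buf) < PREAMBLE.size:
        raise ValueError("buffer too short for the 16-byte preamble")
    signature, _f2, n_elements, _f4 = PREAMBLE.unpack_from(buf, 0)
    if signature != b"SV10":
        raise ValueError(f"unexpected signature {signature!r}")
    # f2 (04 03 02 01) and f4 (the delimiter) are read but not interpreted.
    return n_elements


# The first row of the example hexdump.
first_row = bytes.fromhex("53563130 04030201 53000000 4d000000")
print(parse_preamble(first_row))  # 83
```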

          |                      f5                      |          
00000010  49 6d 61 67 65 4e 75 6d  62 65 72 00 47 47 47 00  |ImageNumber.GGG.|
00000020  48 48 48 00 49 49 49 00  4a 4a 4a 00 4b 4b 4b 00  |HHH.III.JJJ.KKK.|
00000030  4c 4c 4c 00 4d 4d 4d 00  4e 4e 4e 00 4f 4f 4f 00  |LLL.MMM.NNN.OOO.|
00000040  50 50 50 00 51 51 51 00  52 52 52 00 53 53 53 00  |PPP.QQQ.RRR.SSS.|

Next we plunge straight into the first of those 83 elements. The first piece of an element is the name. It's 64 bytes long, but only the bytes up to and including the first NULL (ASCII 0x00) are significant, so this name is just "ImageNumber". All of the "GGG.HHH.III.JJJ..." stuff you see in the hexdump is noise.
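A sketch of pulling the name out of f5: cut at the first NULL and ignore the rest. Decoding as ASCII is an assumption, and decode_name is a hypothetical helper. The sample below is the 64 name bytes from offsets 0x10 through 0x4f of the dump, noise included.

```python
def decode_name(raw: bytes) -> str:
    """Return the significant part of a 64-byte element name (f5).

    Everything from the first NULL onward is treated as noise. Decoding as
    ASCII is an assumption; errors="replace" keeps odd bytes from raising.
    """
    return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace")


# The 64 name bytes of the first element, noise included (offsets 0x10-0x4f).
name_field = bytes.fromhex(
    "496d6167654e756d62657200474747004848480049494900"
    "4a4a4a004b4b4b004c4c4c004d4d4d004e4e4e004f4f4f00"
    "50505000515151005252520053535300"
)
print(len(name_field), decode_name(name_field))  # 64 ImageNumber
```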

          |    f6   | |    f7   |  |    f8   | |    f9   |
00000050  01 00 00 00 49 53 00 00  06 00 00 00 00 00 00 00  |....IS..........|

Next we have four pieces of data about the element. Respectively, f6 through f9 represent the VM, the VR, the syngo dt and the number of subelements. In the example above, these values are 1, IS (I = ASCII 0x49, S = ASCII 0x53), 6 and 0. The number of subelements is often zero.
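Because f5 through f10 have fixed sizes, one struct format can unpack an element's whole header in one go. This is a sketch under a few assumptions: VM, syngo dt and the subelement count are read as signed 32-bit integers (the sign is a guess; the values seen here are small), the VR is a 4-byte NULL-padded string, and the names ElementHeader and read_element_header are hypothetical. The sample reuses the first element's values, with the name's noise bytes replaced by NULLs for brevity.

```python
import struct
from typing import NamedTuple, Tuple

# f5-f10: 64-byte name, int32 VM, 4-byte VR, int32 syngo dt,
# int32 subelement count, uint32 delimiter. 84 bytes in total.
ELEMENT_HEADER = struct.Struct("<64si4siiI")


class ElementHeader(NamedTuple):
    name: str
    vm: int
    vr: str
    syngo_dt: int
    n_subelements: int


def _cstring(raw: bytes) -> str:
    """Keep only the bytes before the first NULL."""
    return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace")


def read_element_header(buf: bytes, offset: int) -> Tuple[ElementHeader, int]:
    """Unpack f5-f10 at offset; return the header and the offset just past f10."""
    raw_name, vm, raw_vr, syngo_dt, n_sub, _f10 = ELEMENT_HEADER.unpack_from(buf, offset)
    header = ElementHeader(_cstring(raw_name), vm, _cstring(raw_vr), syngo_dt, n_sub)
    return header, offset + ELEMENT_HEADER.size


# First element from the dump, with the name's noise bytes replaced by NULLs.
element_1 = b"ImageNumber".ljust(64, b"\x00") + bytes.fromhex(
    "01000000 49530000 06000000 00000000 cd000000"
)
header, next_offset = read_element_header(element_1, 0)
print(header)       # ElementHeader(name='ImageNumber', vm=1, vr='IS', syngo_dt=6, n_subelements=0)
print(next_offset)  # 84
```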

          |    f10  | |         f5 (64 bytes)  ---->
00000060  cd 00 00 00 49 6d 61 67  65 43 6f 6d 6d 65 6e 74  |?...ImageComment|
00000070  73 00 9b 00 9c 9c 9c 00  9d 9d 9d 00 9e 9e 9e 00  |s...............|
00000080  9f 9f 9f 00 a0 a0 a0 00  a1 a1 a1 00 a2 a2 a2 00  |....???.???.???.|
00000090  a3 a3 a3 00 a4 a4 a4 00  a5 a5 a5 00 a6 a6 a6 00  |???.???.???.???.|
            end f5  | |   f6    |  |    f7   | |    f8   | 
000000a0  a7 a7 a7 00 01 00 00 00  4c 54 00 00 14 00 00 00  |???.....LT......|
          |    f9   | |   f10   |  | f5 (64 bytes)  ---->
000000b0  00 00 00 00 cd 00 00 00  52 65 66 65 72 65 6e 63  |....?...Referenc|

Next we find f10, which is another delimiter, marking the end of this element. After that we're on to the next element, which means we're back to f5, the 64-byte element name, which here is "ImageComments". The values for VM, VR, syngo dt and subelement count are 1, LT, 20 (0x14) and 0.
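Since every field of an element without subelements has a fixed size, element boundaries can be checked with a little arithmetic: 64 bytes of name plus five 4-byte fields is 84 (0x54) bytes. This snippet is a sanity check rather than a parser, and it only holds while every preceding element has zero subelements, as the first two do here. It predicts names at 0x10, 0x64 and 0xb8, which matches the dump, and it puts the first f11 of ReferencedImageSequence at 0xb8 + 0x54 = 0x10c, which also matches.

```python
NAME_SIZE = 64   # f5
FIELD_SIZE = 4   # each of f6-f10
ELEMENT_SIZE = NAME_SIZE + 5 * FIELD_SIZE  # 84 == 0x54 bytes

PREAMBLE_SIZE = 16  # f1-f4

# Only valid while every preceding element has zero subelements.
name_offsets = [PREAMBLE_SIZE + i * ELEMENT_SIZE for i in range(3)]
names = ("ImageNumber", "ImageComments", "ReferencedImageSequence")
for name, offset in zip(names, name_offsets):
    print(f"{name} starts at 0x{offset:x}")
```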

          |    f9   | |   f10   |  | f5 (64 bytes)  ---->
000000b0  00 00 00 00 cd 00 00 00  52 65 66 65 72 65 6e 63  |....?...Referenc|
000000c0  65 64 49 6d 61 67 65 53  65 71 75 65 6e 63 65 00  |edImageSequence.|
000000d0  f2 f2 f2 00 f3 f3 f3 00  f4 f4 f4 00 f5 f5 f5 00  |???.???.???.???.|
000000e0  f6 f6 f6 00 f7 f7 f7 00  f8 f8 f8 00 f9 f9 f9 00  |???.???.???.???.|
                end f5          |  |    f6   | |   f7    |
000000f0  fa fa fa 00 fb fb fb 00  00 00 00 00 55 49 00 00  |???.???.....UI..|
          |    f8   | |   f9    |  |    f10  |
00000100  19 00 00 00 06 00 00 00  4d 00 00 00 1a 00 00 00  |........M.......|

Note: line 000000b0 repeated for clarity.

Next we have the element "ReferencedImageSequence". It's more interesting than the previous ones because it has 6 subelements. Let's have a look.

          |    f8   | |   f9    |  |    f10  | |   f11 -->
00000100  19 00 00 00 06 00 00 00  4d 00 00 00 1a 00 00 00  |........M.......|
                         end f11             | |  data -->
00000110  1a 00 00 00 4d 00 00 00  1a 00 00 00 31 2e 32 2e  |....M.......1.2.|
00000120  38 34 30 2e 31 30 30 30  38 2e 35 2e 31 2e 34 2e  |840.10008.5.1.4.|
                end data  | |   |
00000130  31 2e 31 2e 34 00 00 00  35 00 00 00 35 00 00 00  |1.1.4...5...5...|

Note: line 00000100 repeated for clarity.

A subelement begins with f11, which is always 4x4 = 16 bytes long. Call these four 4-byte chunks A, B, C and D. For some strange reason, C is always a delimiter, while A, B and D are always equal to one another. Go figger. That shared value is the length of the associated data string, which in this case is 0x1a == 26 bytes.
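Reading f11 is another fixed-size unpack. This sketch takes A as the length and, following the observation above, raises if A, B and D disagree; that strictness is an assumption you may want to relax on other files. C is ignored. The name read_subelement_length is hypothetical.

```python
import struct

# f11: four int32 values, called A, B, C and D above.
SUBELEMENT_HEADER = struct.Struct("<4i")


def read_subelement_length(buf: bytes, offset: int) -> int:
    """Return the data length stored in the f11 chunk at offset.

    Raises if A, B and D don't match. C (the delimiter) is ignored.
    """
    a, b, _c, d = SUBELEMENT_HEADER.unpack_from(buf, offset)
    if not (a == b == d):
        raise ValueError(f"A, B and D disagree: {a}, {b}, {d}")
    return a


# f11 of the first subelement of ReferencedImageSequence (offset 0x10c).
f11 = bytes.fromhex("1a000000 1a000000 4d000000 1a000000")
print(read_subelement_length(f11, 0))  # 26
```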

The 26 bytes of data follow immediately, and in this case they're the string 1.2.840.10008.5.1.4.1.1.4 (NULL-terminated, natch). Following that is something we haven't seen before: 0 to 3 padding bytes (with the value 0x00) that make the next subelement start on a four-byte boundary. In this case, there are two bytes of padding.
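The padding rule boils down to one line of modular arithmetic. The sketch below computes the padding from the data length alone, which works because, in this example, every data block starts on a four-byte boundary (the preamble, the 84-byte element headers and the 16-byte f11 chunks are all multiples of 4). For the dump, it predicts that 26 bytes of data at 0x11c plus 2 bytes of padding put the next f11 at 0x138, and 53 bytes at 0x148 plus 3 put the third at 0x180, both of which match. The names padding_for and read_subelement_data are hypothetical.

```python
from typing import Tuple


def padding_for(length: int) -> int:
    """Number of 0x00 bytes needed to bring length up to a multiple of 4."""
    return (4 - length % 4) % 4


def read_subelement_data(buf: bytes, offset: int, length: int) -> Tuple[str, int]:
    """Decode length bytes at offset; return (value, offset of the next f11).

    The value is cut at the first NULL.
    """
    raw = buf[offset:offset + length]
    value = raw.split(b"\x00", 1)[0].decode("ascii", errors="replace")
    return value, offset + length + padding_for(length)


print(padding_for(26), padding_for(53), padding_for(24))  # 2 3 0

# The first subelement's 26 data bytes plus its two padding bytes.
data = b"1.2.840.10008.5.1.4.1.1.4\x00" + b"\x00\x00"
print(read_subelement_data(data, 0, 26))  # ('1.2.840.10008.5.1.4.1.1.4', 28)
```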

                                   |          f11  -->
00000130  31 2e 31 2e 34 00 00 00  35 00 00 00 35 00 00 00  |1.1.4...5...5...|
                                |  |   data (53 bytes) -->
00000140  4d 00 00 00 35 00 00 00  31 2e 33 2e 31 32 2e 32  |M...5...1.3.12.2|
00000150  2e 31 31 30 37 2e 35 2e  32 2e 33 32 2e 33 35 30  |.1107.5.2.32.350|
00000160  34 36 2e 32 30 30 39 30  37 32 38 31 33 35 34 30  |46.2009072813540|
                           end of data + padding         |
00000170  39 35 32 30 32 38 38 30  30 30 36 34 00 00 00 00  |952028800064....|
          |         f11 for third subelement             |
00000180  1a 00 00 00 1a 00 00 00  4d 00 00 00 1a 00 00 00  |........M.......|

Note: line 00000130 repeated for clarity.

After the padding bytes that ended the first subelement, we begin the second, whose A, B and D values give a length of 0x35 == 53 bytes. In this case the data string is 1.3.12.2.1107.5.2.32.35046.2009072813540952028800064. Following that is the beginning of the 3rd subelement, which is 0x1a == 26 bytes long.

I won't bother going through all six subelements, but I'll skip ahead to note one last thing: the unceremonious transition from the last subelement of element 3 to the first bytes of element 4.

           end sub el.6  | pad  |  | 64-byte name el.4 -->
00000260  30 30 34 39 00 00 00 00  50 61 74 69 65 6e 74 4f  |0049....PatientO|
00000270  72 69 65 6e 74 61 74 69  6f 6e 00 00 50 a0 88 06  |rientation..P?..|

The name of element 4 ("PatientOrientation") begins immediately after the last subelement of element 3 ends. Thankfully, this is consistent with the case where an element has no subelements at all.
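Putting the pieces together, here's a minimal Python sketch of a parser that walks the layout described above: f1 through f4, then for each element f5 through f10, then that element's subelements, each an f11 chunk followed by its data and padding. It's written from these inferences rather than translated from GDCM's CSAHeader::LoadFromDataElement(), uses A as the length, and skips the delimiters without checking them. Since the whole binary tag isn't reproduced here, the usage example feeds it a small synthetic header built by a hypothetical _build_element helper; parse_csa and Element are hypothetical names too.

```python
import struct
from typing import List, NamedTuple

PREAMBLE = struct.Struct("<4s4sII")           # f1-f4
ELEMENT_HEADER = struct.Struct("<64si4siiI")  # f5-f10
SUBELEMENT_HEADER = struct.Struct("<4i")      # f11: A, B, C, D


class Element(NamedTuple):
    name: str
    vm: int
    vr: str
    syngo_dt: int
    values: List[str]


def _cstring(raw: bytes) -> str:
    """Keep only the bytes before the first NULL."""
    return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace")


def parse_csa(buf: bytes) -> List[Element]:
    """Walk a CSA header laid out as described above.

    Raises ValueError on a bad signature and struct.error if the buffer
    is shorter than the layout claims.
    """
    signature, _f2, n_elements, _f4 = PREAMBLE.unpack_from(buf, 0)
    if signature != b"SV10":
        raise ValueError(f"unexpected signature {signature!r}")
    offset = PREAMBLE.size
    elements = []
    for _ in range(n_elements):
        raw_name, vm, raw_vr, syngo_dt, n_sub, _f10 = ELEMENT_HEADER.unpack_from(buf, offset)
        offset += ELEMENT_HEADER.size
        values = []
        for _ in range(n_sub):
            length = SUBELEMENT_HEADER.unpack_from(buf, offset)[0]  # A
            offset += SUBELEMENT_HEADER.size
            values.append(_cstring(buf[offset:offset + length]))
            offset += length + (4 - length % 4) % 4  # data plus padding
        # The next element's name starts right here, with no separator.
        elements.append(Element(_cstring(raw_name), vm, _cstring(raw_vr), syngo_dt, values))
    return elements


def _build_element(name: str, vm: int, vr: str, syngo_dt: int, values: List[str]) -> bytes:
    """Hypothetical helper that builds a synthetic element for testing."""
    out = ELEMENT_HEADER.pack(name.encode(), vm, vr.encode(), syngo_dt, len(values), 0xCD)
    for v in values:
        data = v.encode() + b"\x00"
        out += SUBELEMENT_HEADER.pack(len(data), len(data), 0x4D, len(data))
        out += data + b"\x00" * ((4 - len(data) % 4) % 4)
    return out


buf = (PREAMBLE.pack(b"SV10", b"\x04\x03\x02\x01", 2, 0x4D)
       + _build_element("ImageNumber", 1, "IS", 6, [])
       + _build_element("ReferencedImageSequence", 0, "UI", 25,
                        ["1.2.840.10008.5.1.4.1.1.4", "1.3.12.2"]))
for element in parse_csa(buf):
    print(element.name, element.vr, element.values)
```

Walking the buffer strictly by the recorded lengths, rather than searching for names or delimiters, is what makes the element-to-element transition above a non-issue: the offset after the last subelement's padding is simply where the next name begins.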

That's all I have to offer. Have fun!