<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

<html>

<head>
    <meta name="author" content="Philip Semanchuk">
    <meta name="copyright" content="All contents &copy; 2010 Philip Semanchuk">

    <title>The Siemens DICOM CSA Image Header Tag</title>

    <style type="text/css">
        pre {
            border: 1px solid #aaa;
            padding: 1em;
            width: 50em;
            background-color: #eee;
        }
    </style>
</head>

<body>

<h2>Parsing a Siemens CSA Header</h2>

<p style="font-size: 80%">by Philip Semanchuk</p>

<p>Here's a quick review of what I've been able to infer about the contents
of a Siemens CSA Header. You find this kind of data in a DICOM tag like
(0x0029, 0x1010), (0x0029, 0x1210), (0x0029, 0x1110), (0x0029, 0x1020),
(0x0029, 0x1220), or (0x0029, 0x1120).
</p>

<p>My comments and observations are inferences. They might be wrong. YMMV.</p>

<p>A Siemens CSA header is a mix of
binary glop, ASCII, binary masquerading as ASCII, and noise masquerading
as signal. It's also undocumented, so there's no specification to which
to refer. A lot of what I know about it I learned from reading the code
in the GDCM project's CSAHeader::LoadFromDataElement() inside
gdcmCSAHeader.cxx. I don't know how that code's author figured out what's
in a CSA header, but the code works.
</p>

<p>The data in my example is taken from a real DICOM file. I saved the tag
data to a file and then ran <tt>hexdump -Cv the_file</tt>. The
output below is taken from that hexdump. In between some hexdump rows I've
added markup that denotes the sections represented in the hex data. I marked
the sections with <tt>f1</tt>, <tt>f2</tt>, <tt>f3</tt>, etc. to denote
the different fields. The field numbers are my own and only make sense
in the context of this documentation.
</p>

<p>Some general things to note &ndash;</p>

<ul>
    <li>The data in the tag is a list of elements, each of which contains
        zero or more subelements. The subelements can't be further divided
        and are either empty or contain a string.
    </li>
    <li>Everything begins on four-byte boundaries.</li>
    <li>The example below is little endian. I don't know if this data
        can be big endian, and if that's possible I don't know what flag
        would indicate that.
    </li>
    <li>Delimiters are thrown in here and there; they are 0x4d == 77 which is
        ASCII 'M' and 0xcd == 205 which has no ASCII representation.
    </li>
    <li>Strings in the data are C-style NULL-terminated.</li>
    <li>Each hexdump line shows 16 bytes.</li>
</ul>
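
<p>Here's a minimal Python sketch of two of those conventions, assuming the
little-endian layout of my example. The names <tt>DELIMITERS</tt>,
<tt>read_uint32</tt> and <tt>is_delimiter</tt> are my own inventions for
illustration; they don't come from any library. The sample bytes are a
four-byte count followed by a four-byte delimiter.</p>

```python
import struct

# The two delimiter values seen in the data: 0x4d == 77 and 0xcd == 205.
DELIMITERS = (0x4D, 0xCD)


def read_uint32(data, offset):
    """Read one little-endian unsigned 32-bit integer at `offset`."""
    return struct.unpack_from("<I", data, offset)[0]


def is_delimiter(value):
    """Return True if `value` is one of the two delimiter values."""
    return value in DELIMITERS


# Eight sample bytes: a count (0x53) followed by a delimiter (0x4d).
sample = bytes.fromhex("53000000 4d000000")
print(read_uint32(sample, 0))                # 83
print(is_delimiter(read_uint32(sample, 4)))  # True
```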

<p>Let's start by looking at the first 16 bytes of the tag.</p>

<pre>
          |   f1    | |   f2    |  |    f3   | |    f4   |
00000000  53 56 31 30 04 03 02 01  53 00 00 00 4d 00 00 00  |SV10....S...M...|
</pre>

<p>The header starts with f1 which is always "SV10". That's followed by
f2 which are the bytes 0x04, 0x03, 0x02 and 0x01.</p>

<p>The first meaningful data are in f3 which contains the number of elements
in the subsequent data. In the example above, we have 0x53 == 83 elements.
f4 is a delimiter marking the end of this chunk. (Or the start of the next
chunk. Or the day of the week. Or maybe they just wanted to add a delimiter to
keep the others from getting lonely. Given that all fields' byte lengths are
fixed, what's the point of adding delimiters? Your guess is as good as mine.)
</p>
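
<p>As a sketch, those 16 bytes can be unpacked in Python with the standard
<tt>struct</tt> module. The function name <tt>parse_tag_header</tt> is
hypothetical, and I'm assuming little-endian data as in the example.</p>

```python
import struct

# The first 16 bytes of the example tag, copied from the hexdump above.
first_16 = bytes.fromhex("53563130 04030201 53000000 4d000000")


def parse_tag_header(data):
    """Split the first 16 bytes of the tag into f1, f2, f3 and f4."""
    # "<" = little endian; two 4-byte strings, then two unsigned 32-bit ints.
    magic, marker, n_elements, delimiter = struct.unpack_from("<4s4sII", data, 0)
    if magic != b"SV10":
        raise ValueError("tag does not start with SV10")
    return magic, marker, n_elements, delimiter


magic, marker, n_elements, delimiter = parse_tag_header(first_16)
print(n_elements, delimiter)  # 83 77
```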

<pre>
          |                      f5                      |
00000010  49 6d 61 67 65 4e 75 6d  62 65 72 00 47 47 47 00  |ImageNumber.GGG.|
00000020  48 48 48 00 49 49 49 00  4a 4a 4a 00 4b 4b 4b 00  |HHH.III.JJJ.KKK.|
00000030  4c 4c 4c 00 4d 4d 4d 00  4e 4e 4e 00 4f 4f 4f 00  |LLL.MMM.NNN.OOO.|
00000040  50 50 50 00 51 51 51 00  52 52 52 00 53 53 53 00  |PPP.QQQ.RRR.SSS.|
</pre>

<p>Next we plunge straight into the first of those 83 elements. The first
piece of an element is the name. It's 64 bytes long, but only the bytes up
to and including the first NULL (ASCII 0x00) are significant so this name
is just "ImageNumber". All of the
"GGG.HHH.III.JJJ..." stuff you see in the hexdump is noise.
</p>
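
<p>Pulling the name out of the noise takes one line of Python. In this sketch
I rebuild the 64 bytes shown above; <tt>element_name</tt> is a name I made up,
and decoding as ASCII is my assumption.</p>

```python
# Rebuild the 64-byte name field from the hexdump: the name, its NULL,
# then the noise groups "GGG\0" through "SSS\0".
noise = b"".join(bytes([c]) * 3 + b"\x00" for c in range(ord("G"), ord("S") + 1))
name_field = b"ImageNumber\x00" + noise


def element_name(field):
    """Keep only the bytes before the first NULL of the 64-byte name field."""
    return field.split(b"\x00", 1)[0].decode("ascii")


print(len(name_field))           # 64
print(element_name(name_field))  # ImageNumber
```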

<pre>
          |    f6   | |    f7   |  |    f8   | |    f9   |
00000050  01 00 00 00 49 53 00 00  06 00 00 00 00 00 00 00  |....IS..........|
</pre>

<p>Next we have four pieces of data about the element. Respectively,
f6 &ndash; f9 represent VM, VR, syngo dt and the number of subelements.
In the example above, these values are 1, IS (I = ASCII 0x49, S = ASCII 0x53),
6 and 0. The number of subelements is often zero.
</p>
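
<p>Here's a sketch of reading those four fields, again assuming little-endian
data. <tt>parse_element_info</tt> is a hypothetical name, and treating the VR
as a four-byte NULL-padded ASCII string is my reading of the hexdump.</p>

```python
import struct

# f6 through f9 for "ImageNumber", copied from the hexdump above.
info = bytes.fromhex("01000000 49530000 06000000 00000000")


def parse_element_info(data, offset=0):
    """Return (VM, VR, syngo dt, number of subelements)."""
    vm, vr_raw, syngo_dt, n_subelements = struct.unpack_from("<I4sII", data, offset)
    # The VR occupies four bytes; only the part before the first NULL counts.
    vr = vr_raw.split(b"\x00", 1)[0].decode("ascii")
    return vm, vr, syngo_dt, n_subelements


print(parse_element_info(info))  # (1, 'IS', 6, 0)
```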

<pre>
          |    f10  | |         f5 (64 bytes)  ---->
00000060  cd 00 00 00 49 6d 61 67  65 43 6f 6d 6d 65 6e 74  |?...ImageComment|
00000070  73 00 9b 00 9c 9c 9c 00  9d 9d 9d 00 9e 9e 9e 00  |s...............|
00000080  9f 9f 9f 00 a0 a0 a0 00  a1 a1 a1 00 a2 a2 a2 00  |....???.???.???.|
00000090  a3 a3 a3 00 a4 a4 a4 00  a5 a5 a5 00 a6 a6 a6 00  |???.???.???.???.|
            end f5  | |   f6    |  |    f7   | |    f8   |
000000a0  a7 a7 a7 00 01 00 00 00  4c 54 00 00 14 00 00 00  |???.....LT......|
          |    f9   | |   f10   |  | f5 (64 bytes)  ---->
000000b0  00 00 00 00 cd 00 00 00  52 65 66 65 72 65 6e 63  |....?...Referenc|
</pre>

<p>Next we find f10 which is another delimiter marking the end of this
element. After that we're on to the next element which means it is back to f5
again, the 64-byte element name which is "ImageComments". The values for
VM, VR, syngo dt and subelement count are 1, LT, 20 and 0.
</p>

<pre>
          |    f9   | |   f10   |  | f5 (64 bytes)  ---->
000000b0  00 00 00 00 cd 00 00 00  52 65 66 65 72 65 6e 63  |....?...Referenc|
000000c0  65 64 49 6d 61 67 65 53  65 71 75 65 6e 63 65 00  |edImageSequence.|
000000d0  f2 f2 f2 00 f3 f3 f3 00  f4 f4 f4 00 f5 f5 f5 00  |???.???.???.???.|
000000e0  f6 f6 f6 00 f7 f7 f7 00  f8 f8 f8 00 f9 f9 f9 00  |???.???.???.???.|
                end f5          |  |    f6   | |   f7    |
000000f0  fa fa fa 00 fb fb fb 00  00 00 00 00 55 49 00 00  |???.???.....UI..|
          |    f8   | |   f9    |  |    f10  |
00000100  19 00 00 00 06 00 00 00  4d 00 00 00 1a 00 00 00  |........M.......|
</pre>

<p>Note: line <tt>000000b0</tt> repeated for clarity.</p>

<p>Next we have the element "ReferencedImageSequence". It's more interesting
than the previous ones because it has 6 subelements. Let's have a look.
</p>

<pre>
          |    f8   | |   f9    |  |    f10  | |   f11 -->
00000100  19 00 00 00 06 00 00 00  4d 00 00 00 1a 00 00 00  |........M.......|
                         end f11             | |  data -->
00000110  1a 00 00 00 4d 00 00 00  1a 00 00 00 31 2e 32 2e  |....M.......1.2.|
00000120  38 34 30 2e 31 30 30 30  38 2e 35 2e 31 2e 34 2e  |840.10008.5.1.4.|
                end data  | |   |
00000130  31 2e 31 2e 34 00 00 00  35 00 00 00 35 00 00 00  |1.1.4...5...5...|
</pre>

<p>Note: line <tt>00000100</tt> repeated for clarity.</p>

<p>A subelement begins with f11 which is always 4x4 = 16 bytes long. Call
these four chunks A, B, C and D. For some strange reason, C is always a
delimiter, while A, B and D are always equal to one another. Go figger. They
represent the length of the associated data string, which in this case is
0x1a == 26 bytes.
</p>

<p>The 26 bytes of data follow immediately, and in this case they're
the string <tt>1.2.840.10008.5.1.4.1.1.4</tt> (NULL-terminated, natch).
Following that is something we haven't seen before: 0 &ndash; 3
padding bytes (with the value <tt>0x00</tt>) to make the next subelement start on
a four-byte boundary. In this case, there are two bytes of padding.
</p>
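
<p>A sketch of reading one subelement, using the bytes from offset 0x10c
through 0x137 of the example. <tt>parse_subelement</tt> is a hypothetical
name. The check that A, B and D agree reflects what I've seen in my files, so
treat it as an assumption rather than a rule.</p>

```python
import struct

# f11 (16 bytes), then 26 bytes of data, then 2 bytes of padding.
subelement = (
    bytes.fromhex("1a000000 1a000000 4d000000 1a000000")
    + b"1.2.840.10008.5.1.4.1.1.4\x00"
    + b"\x00\x00"
)


def parse_subelement(data, offset):
    """Return (string value, offset of whatever follows the padding)."""
    a, b, c, d = struct.unpack_from("<4I", data, offset)
    if not (a == b == d):
        # Assumption: A, B and D all carry the data length.
        raise ValueError("lengths A, B and D disagree")
    start = offset + 16
    raw = data[start:start + a]
    value = raw.split(b"\x00", 1)[0].decode("ascii")
    # 0 to 3 pad bytes bring the next item to a four-byte boundary.
    padding = (4 - a % 4) % 4
    return value, start + a + padding


value, next_offset = parse_subelement(subelement, 0)
print(value)        # 1.2.840.10008.5.1.4.1.1.4
print(next_offset)  # 44
```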

<pre>
                                   |          f11  -->
00000130  31 2e 31 2e 34 00 00 00  35 00 00 00 35 00 00 00  |1.1.4...5...5...|
                                |  |   data (53 bytes) -->
00000140  4d 00 00 00 35 00 00 00  31 2e 33 2e 31 32 2e 32  |M...5...1.3.12.2|
00000150  2e 31 31 30 37 2e 35 2e  32 2e 33 32 2e 33 35 30  |.1107.5.2.32.350|
00000160  34 36 2e 32 30 30 39 30  37 32 38 31 33 35 34 30  |46.2009072813540|
                           end of data + padding         |
00000170  39 35 32 30 32 38 38 30  30 30 36 34 00 00 00 00  |952028800064....|
          |         f11 for third subelement             |
00000180  1a 00 00 00 1a 00 00 00  4d 00 00 00 1a 00 00 00  |........M.......|
</pre>

<p>Note: line <tt>00000130</tt> repeated for clarity.</p>


<p>After the padding bytes that ended the first subelement, we begin the
second with the A==B==D bytes representing a length of 0x35 == 53 bytes. In this
case the data string is <tt>1.3.12.2.1107.5.2.32.35046.2009072813540952028800064</tt>.
Following that is the beginning of the third subelement which is 0x1a == 26 bytes
long.
</p>

<p>I won't bother going through all six subelements, but I'll skip ahead to
note one last thing which is the unceremonious transition from the last
subelement of element 3 to the first bytes of element 4.
</p>

<pre>
           end sub el.6  | pad  |  | 64-byte name el.4 -->
00000260  30 30 34 39 00 00 00 00  50 61 74 69 65 6e 74 4f  |0049....PatientO|
00000270  72 69 65 6e 74 61 74 69  6f 6e 00 00 50 a0 88 06  |rientation..P?..|
</pre>

<p>The name of element 4 ("PatientOrientation") begins immediately after
the last subelement of element 3 ends. Thankfully, this is consistent with
the case where an element has no subelements at all.
</p>
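
<p>Putting the pieces together, here's a rough Python sketch of a loop over
the whole tag. It's built only on the inferences in this document, so it
assumes little-endian data, fixed field sizes, and that the first length in
f11 can be trusted. <tt>parse_csa</tt> is a name I made up, and the sketch
ignores VM, VR and syngo dt.</p>

```python
import struct


def parse_csa(data):
    """Return {element name: [subelement strings]} for an SV10 CSA header."""
    magic, _marker, n_elements, _delim = struct.unpack_from("<4s4sII", data, 0)
    if magic != b"SV10":
        raise ValueError("tag does not start with SV10")
    offset = 16
    elements = {}
    for _element in range(n_elements):
        # f5: 64-byte name, significant up to the first NULL.
        name = data[offset:offset + 64].split(b"\x00", 1)[0].decode("ascii")
        # f6 through f10: VM, VR, syngo dt, subelement count, delimiter.
        _vm, _vr, _dt, n_sub, _delim = struct.unpack_from("<I4sIII", data, offset + 64)
        offset += 84
        values = []
        for _sub in range(n_sub):
            # f11: lengths A, B, D around delimiter C. Only A is read here.
            length = struct.unpack_from("<I", data, offset)[0]
            offset += 16
            raw = data[offset:offset + length]
            values.append(raw.split(b"\x00", 1)[0].decode("ascii", errors="replace"))
            # Skip the data plus 0 to 3 pad bytes up to a four-byte boundary.
            offset += length + (4 - length % 4) % 4
        elements[name] = values
    return elements
```

<p>Here <tt>data</tt> would be the raw bytes of one of the tags listed at the
top. An element with no subelements comes back with an empty list, which
matches the transition shown in the last hexdump.</p>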

<p>That's all I have to offer. Have fun!</p>


</body>
</html>