Provided by: dcmtk_3.6.9-6_amd64 

NAME
dcm2xml - Convert DICOM file and data set to XML
SYNOPSIS
dcm2xml [options] dcmfile-in [xmlfile-out]
DESCRIPTION
The dcm2xml utility converts the contents of a DICOM file (file format or raw data set) to XML
(Extensible Markup Language). There are two output formats. The first one is specific to DCMTK with its
DTD (Document Type Definition) described in the file dcm2xml.dtd. The second one refers to the "Native
DICOM Model" which is specified for the DICOM Application Hosting service found in DICOM part 19.
If dcm2xml reads a raw data set (DICOM data without a file format meta-header) it will attempt to guess
the transfer syntax by examining the first few bytes of the file. It is not always possible to correctly
guess the transfer syntax and it is better to convert a data set to a file format whenever possible
(using the dcmconv utility). It is also possible to use the -f and -t[ieb] options to force dcm2xml to
read a data set with a particular transfer syntax.
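As a rough illustration of forcing the transfer syntax, the following Python sketch assembles a dcm2xml argument list from the options listed under "input options". The helper names (build_dcm2xml_args, run_dcm2xml) and the friendly keys in XFER_OPTIONS are hypothetical; only the option strings come from this page.

```python
import subprocess

# Hypothetical mapping from a friendly name to the options documented
# under "input transfer syntax" in OPTIONS.
XFER_OPTIONS = {
    "implicit": "-ti",         # implicit VR little endian
    "explicit-little": "-te",  # explicit VR little endian
    "explicit-big": "-tb",     # explicit VR big endian
}


def build_dcm2xml_args(dcmfile_in, xmlfile_out=None, dataset_xfer=None):
    """Return an argument list for dcm2xml.

    If dataset_xfer is given, the input is read as a raw data set (-f)
    with the named transfer syntax instead of relying on guessing.
    """
    args = ["dcm2xml"]
    if dataset_xfer is not None:
        args += ["-f", XFER_OPTIONS[dataset_xfer]]
    args.append(dcmfile_in)
    if xmlfile_out is not None:
        args.append(xmlfile_out)
    return args


def run_dcm2xml(*args, **kwargs):
    """Run dcm2xml; assumes the tool is installed and found on PATH."""
    return subprocess.run(build_dcm2xml_args(*args, **kwargs), check=True)
```

For example, build_dcm2xml_args("raw.dcm", "out.xml", "implicit") yields the equivalent of "dcm2xml -f -ti raw.dcm out.xml".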
PARAMETERS
dcmfile-in DICOM input filename to be converted ("-" for stdin)
xmlfile-out XML output filename (default: stdout)
OPTIONS
general options
  -h    --help
          print this help text and exit
        --version
          print version information and exit
        --arguments
          print expanded command line arguments
  -q    --quiet
          quiet mode, print no warnings and errors
  -v    --verbose
          verbose mode, print processing details
  -d    --debug
          debug mode, print debug information
  -ll   --log-level  [l]evel: string constant
          (fatal, error, warn, info, debug, trace)
          use level l for the logger
  -lc   --log-config  [f]ilename: string
          use config file f for the logger
input options
  input file format:
    +f    --read-file
            read file format or data set (default)
    +fo   --read-file-only
            read file format only
    -f    --read-dataset
            read data set without file meta information
  input transfer syntax:
    -t=   --read-xfer-auto
            use TS recognition (default)
    -td   --read-xfer-detect
            ignore TS specified in the file meta header
    -te   --read-xfer-little
            read with explicit VR little endian TS
    -tb   --read-xfer-big
            read with explicit VR big endian TS
    -ti   --read-xfer-implicit
            read with implicit VR little endian TS
  long tag values:
    +M    --load-all
            load very long tag values (e.g. pixel data)
    -M    --load-short
            do not load very long values (default)
    +R    --max-read-length  [k]bytes: integer (4..4194302, default: 4)
            set threshold for long values to k kbytes
processing options
  specific character set:
    +Cr   --charset-require
            require declaration of extended charset (default)
    +Ca   --charset-assume  [c]harset: string
            assume charset c if no extended charset declared
    +Cc   --charset-check-all
            check all data elements with string values
            (default: only PN, LO, LT, SH, ST, UC and UT)
            # this option is only used for the extended check whether
            # the Specific Character Set (0008,0005) attribute should be
            # present, but not for the conversion of unaffected element
            # values to UTF-8 (e.g. element values with a VR of CS)
    +U8   --convert-to-utf8
            convert all element values that are affected
            by Specific Character Set (0008,0005) to UTF-8
            # requires support from an underlying character encoding
            # library (see output of --version on which one is available)
output options
  general XML format:
    -dtk  --dcmtk-format
            output in DCMTK-specific format (default)
    -nat  --native-format
            output in Native DICOM Model format (part 19)
    +Xn   --use-xml-namespace
            add XML namespace declaration to root element
  DCMTK-specific format (not with --native-format):
    +Xd   --add-dtd-reference
            add reference to document type definition (DTD)
    +Xe   --embed-dtd-content
            embed document type definition into XML document
    +Xf   --use-dtd-file  [f]ilename: string
            use specified DTD file (only with +Xe)
            (default: /usr/local/share/dcmtk-<VERSION>/dcm2xml.dtd)
    +Wn   --write-element-name
            write name of the DICOM data elements (default)
    -Wn   --no-element-name
            do not write name of the DICOM data elements
    +Wb   --write-binary-data
            write binary data of OB and OW elements
            (default: off, be careful with --load-all)
  encoding of binary data:
    +Eh   --encode-hex
            encode binary data as hex numbers
            (default for DCMTK-specific format)
    +Eu   --encode-uuid
            encode binary data as a UUID reference
            (default for Native DICOM Model)
    +Eb   --encode-base64
            encode binary data as Base64 (RFC 2045, MIME)
DCMTK Format
The basic structure of the DCMTK-specific XML output created from a DICOM file looks like the following:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE file-format SYSTEM "dcm2xml.dtd">
<file-format xmlns="http://dicom.offis.de/dcmtk">
  <meta-header xfer="1.2.840.10008.1.2.1" name="Little Endian Explicit">
    <element tag="0002,0000" vr="UL" vm="1" len="4"
             name="MetaElementGroupLength">
      166
    </element>
    ...
    <element tag="0002,0013" vr="SH" vm="1" len="16"
             name="ImplementationVersionName">
      OFFIS_DCMTK_353
    </element>
  </meta-header>
  <data-set xfer="1.2.840.10008.1.2" name="Little Endian Implicit">
    <element tag="0008,0005" vr="CS" vm="1" len="10"
             name="SpecificCharacterSet">
      ISO_IR 100
    </element>
    ...
    <sequence tag="0028,3010" vr="SQ" card="2" name="VOILUTSequence">
      <item card="3">
        <element tag="0028,3002" vr="xs" vm="3" len="6"
                 name="LUTDescriptor">
          256\0\8
        </element>
        ...
      </item>
      ...
    </sequence>
    ...
    <element tag="7fe0,0010" vr="OW" vm="1" len="262144"
             name="PixelData" loaded="no" binary="hidden">
    </element>
  </data-set>
</file-format>
The "file-format" and "meta-header" tags are absent for DICOM data sets.
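A minimal sketch of reading this structure with Python's standard xml.etree.ElementTree module follows. The SAMPLE document is a shortened, illustrative variant of the output above, and the helper names (local_name, list_elements, has_meta_header) are hypothetical. Because the namespace declaration on the root element is optional, the sketch compares local tag names only.

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative sample in the DCMTK-specific format.
SAMPLE = rb"""<?xml version="1.0" encoding="ISO-8859-1"?>
<file-format xmlns="http://dicom.offis.de/dcmtk">
<meta-header xfer="1.2.840.10008.1.2.1" name="Little Endian Explicit">
<element tag="0002,0000" vr="UL" vm="1" len="4" name="MetaElementGroupLength">166</element>
</meta-header>
<data-set xfer="1.2.840.10008.1.2" name="Little Endian Implicit">
<element tag="0008,0005" vr="CS" vm="1" len="10" name="SpecificCharacterSet">ISO_IR 100</element>
<sequence tag="0028,3010" vr="SQ" card="1" name="VOILUTSequence">
<item card="1">
<element tag="0028,3002" vr="xs" vm="3" len="6" name="LUTDescriptor">256\0\8</element>
</item>
</sequence>
</data-set>
</file-format>
"""


def local_name(node):
    """Strip an optional '{namespace}' prefix from an ElementTree tag."""
    return node.tag.rsplit("}", 1)[-1]


def list_elements(xml_bytes):
    """Return (tag, vr, name, text) for every <element>, at any nesting depth."""
    root = ET.fromstring(xml_bytes)
    rows = []
    for node in root.iter():
        if local_name(node) == "element":
            rows.append((node.get("tag"), node.get("vr"), node.get("name"),
                         (node.text or "").strip()))
    return rows


def has_meta_header(xml_bytes):
    """True if the root is <file-format>, i.e. the input was not a bare data set."""
    return local_name(ET.fromstring(xml_bytes)) == "file-format"
```

Checking the root element name is one way to tell whether the input was a DICOM file or a raw data set, since the latter produces a bare "data-set" root.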
XML Encoding
Attributes with very large value fields (e.g. pixel data) are not loaded by default. They can be
identified by the additional attribute "loaded" with a value of "no" (see example above). The command
line option --load-all forces all value fields to be loaded, including the very long ones.
Furthermore, binary data of OB and OW attributes are not written to the XML output file by default. These
elements can be identified by the additional attribute "binary" with a value of "hidden" (default is
"no"). The command line option --write-binary-data causes binary value fields to be printed as well
(attribute value is "yes" or "base64"). However, be careful when using this option together with --load-all
because of the large amounts of pixel data that might be printed to the output. Please note that in this
context element values with a VR of OD, OF, OL and OV are not regarded as "binary data".
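The "loaded" and "binary" attributes can be used by a consumer to decide which elements carry usable values. The following Python sketch assumes a small, made-up data set in the DCMTK-specific format; the function name classify_elements and the sample content are illustrative and not taken from real dcm2xml output.

```python
import base64
import xml.etree.ElementTree as ET

# Made-up sample: one text element, one Base64 encoded OB element and
# one pixel data element that was neither loaded nor written.
SAMPLE = """<data-set xfer="1.2.840.10008.1.2" name="Little Endian Implicit">
<element tag="0008,0005" vr="CS" vm="1" len="10" name="SpecificCharacterSet">ISO_IR 100</element>
<element tag="0042,0011" vr="OB" vm="1" len="4" name="EncapsulatedDocument" binary="base64">AQIDBA==</element>
<element tag="7fe0,0010" vr="OW" vm="1" len="262144" name="PixelData" loaded="no" binary="hidden"></element>
</data-set>"""


def classify_elements(xml_text):
    """Return (skipped_tags, decoded) for a DCMTK-format XML document.

    skipped_tags lists elements whose value is absent from the XML
    (loaded="no" or binary="hidden"); decoded maps the tag of each
    binary="base64" element to its raw bytes.
    """
    root = ET.fromstring(xml_text)
    skipped, decoded = [], {}
    for node in root.iter():
        if node.tag.rsplit("}", 1)[-1] != "element":
            continue
        if node.get("loaded") == "no" or node.get("binary") == "hidden":
            skipped.append(node.get("tag"))
        elif node.get("binary") == "base64":
            decoded[node.get("tag")] = base64.b64decode(node.text or "")
    return skipped, decoded
```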
Multiple values (i.e. where the DICOM value multiplicity is greater than 1) are separated by a backslash
"\" (except for Base64 encoded data). The "len" attribute indicates the number of bytes for the
particular value field as stored in the DICOM data set, i.e. it might deviate from the XML encoded
value length e.g. because of non-significant padding that has been removed. If this attribute is missing
in "sequence" or "item" start tags, the corresponding DICOM element has been stored with undefined
length.
Native DICOM Model Format
The description of the Native DICOM Model format can be found in the DICOM standard, part 19
("Application Hosting").
Bulk Data
Binary data, i.e. DICOM element values with Value Representations (VR) of OB or OW, as well as
OD, OF, OL, OV and UN values are by default not written to the XML output because of their size. Instead,
for each element, a new Universally Unique Identifier (UUID) is being generated and written as an
attribute of a <BulkData> XML element. So far, there is no possibility to write an additional file to
hold the binary data for each of the binary data chunks. This is not required by the standard, however, it
might be useful for implementing an Application Hosting interface; thus this feature may be available in
future versions of dcm2xml.
In addition, Supplement 163 (Store Over the Web by Representational State Transfer Services) introduces
a new <InlineBinary> XML element that allows for encoding binary data as Base64. Currently, the command
line option --encode-base64 enables this encoding for the following VRs: OB, OD, OF, OL, OV, OW and UN.
Known Issues
In addition to what is written in the above section on "Bulk Data", there are further known issues with
the current implementation of the Native DICOM Model format. For example, large element values with a VR
other than OB, OD, OF, OL, OV, OW or UN are currently never written as bulk data, although it might be
useful, e.g. for very long text elements (especially UT) or very long numeric fields (of various VRs).
NOTES
Character Encoding
The XML character encoding is determined automatically from the DICOM attribute (0008,0005) "Specific
Character Set" using the following mapping:
ASCII         (ISO_IR 6)    =>  "UTF-8"
UTF-8         "ISO_IR 192"  =>  "UTF-8"
ISO Latin 1   "ISO_IR 100"  =>  "ISO-8859-1"
ISO Latin 2   "ISO_IR 101"  =>  "ISO-8859-2"
ISO Latin 3   "ISO_IR 109"  =>  "ISO-8859-3"
ISO Latin 4   "ISO_IR 110"  =>  "ISO-8859-4"
ISO Latin 5   "ISO_IR 148"  =>  "ISO-8859-9"
ISO Latin 9   "ISO_IR 203"  =>  "ISO-8859-15"
Cyrillic      "ISO_IR 144"  =>  "ISO-8859-5"
Arabic        "ISO_IR 127"  =>  "ISO-8859-6"
Greek         "ISO_IR 126"  =>  "ISO-8859-7"
Hebrew        "ISO_IR 138"  =>  "ISO-8859-8"
If this DICOM attribute is missing in the input file, although needed, option --charset-assume can be
used to specify an appropriate character set manually (using one of the DICOM defined terms). For reasons
of backward compatibility with previous versions of this tool, the following terms are also supported and
mapped automatically to the associated DICOM defined terms: latin-1, latin-2, latin-3, latin-4, latin-5,
latin-9, cyrillic, arabic, greek, hebrew.
Multiple character sets using code extension techniques are not supported. If needed, option
--convert-to-utf8 can be used to convert the DICOM file or data set to UTF-8 encoding prior to the
conversion to XML format. This is also useful for DICOMDIR files where each directory record can have a
different character set.
If no mapping is defined and option --convert-to-utf8 is not used, non-ASCII characters and those below
#32 are stored as "&#nnn;" where "nnn" refers to the numeric character code. This might lead to invalid
character entity references (such as "&#27;" for ESC) and will cause most XML parsers to reject the
document.
LOGGING
The level of logging output of the various command line tools and underlying libraries can be specified
by the user. By default, only errors and warnings are written to the standard error stream. Using option
--verbose also informational messages like processing details are reported. Option --debug can be used to
get more details on the internal activity, e.g. for debugging purposes. Other logging levels can be
selected using option --log-level. In --quiet mode only fatal errors are reported. In such very severe
error events, the application will usually terminate. For more details on the different logging levels,
see documentation of module "oflog".
In case the logging output should be written to file (optionally with logfile rotation), to syslog (Unix)
or the event log (Windows) option --log-config can be used. This configuration file also allows for
directing only certain messages to a particular output stream and for filtering certain messages based on
the module or application where they are generated. An example configuration file is provided in
<etcdir>/logger.cfg.
COMMAND LINE
All command line tools use the following notation for parameters: square brackets enclose optional values
(0-1), three trailing dots indicate that multiple values are allowed (1-n), a combination of both means 0
to n values.
Command line options are distinguished from parameters by a leading '+' or '-' sign, respectively.
Usually, order and position of command line options are arbitrary (i.e. they can appear anywhere).
However, if options are mutually exclusive the rightmost appearance is used. This behavior conforms to
the standard evaluation rules of common Unix shells.
In addition, one or more command files can be specified using an '@' sign as a prefix to the filename
(e.g. @command.txt). Such a command argument is replaced by the content of the corresponding text file
(multiple whitespaces are treated as a single separator unless they appear between two quotation marks)
prior to any further evaluation. Please note that a command file cannot contain another command file.
This simple but effective approach allows one to summarize common combinations of options/parameters and
avoids longish and confusing command lines (an example is provided in file <datadir>/dumppat.txt).
ENVIRONMENT
The dcm2xml utility will attempt to load DICOM data dictionaries specified in the DCMDICTPATH environment
variable. By default, i.e. if the DCMDICTPATH environment variable is not set, the file
<datadir>/dicom.dic will be loaded unless the dictionary is built into the application (default for
Windows).
The default behavior should be preferred and the DCMDICTPATH environment variable only used when
alternative data dictionaries are required. The DCMDICTPATH environment variable has the same format as
the Unix shell PATH variable in that a colon (":") separates entries. On Windows systems, a semicolon
(";") is used as a separator. The data dictionary code will attempt to load each file
specified in the DCMDICTPATH environment variable. It is an error if no data dictionary can be loaded.
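The separator rule for DCMDICTPATH matches what Python exposes as os.pathsep (":" on Unix, ";" on Windows), so a wrapper script could inspect the variable as sketched below. The function name dictionary_candidates and the file names in the usage note are hypothetical, and the sketch only splits the value; it does not reproduce how DCMTK itself loads dictionaries.

```python
import os


def dictionary_candidates(environ=os.environ, sep=os.pathsep):
    """Split DCMDICTPATH into its entries.

    An empty list means the variable is unset or empty, in which case
    dcm2xml falls back to its default dictionary.
    """
    value = environ.get("DCMDICTPATH", "")
    return [entry for entry in value.split(sep) if entry]
```

For instance, with DCMDICTPATH set to "/opt/dict/dicom.dic:/opt/dict/private.dic" on a Unix system, the function returns both file names in order.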
Depending on the command line options specified, the dcm2xml utility will attempt to load character set
mapping tables. This happens when DCMTK was compiled with the oficonv library (which is the default) and
the mapping tables are not built into the library (default when DCMTK uses shared libraries).
The mapping table files are expected in DCMTK's <datadir>. The DCMICONVPATH environment variable can be
used to specify a different location. If a different location is specified, those mapping tables also
replace any built-in tables.
FILES
<datadir>/dcm2xml.dtd - Document Type Definition (DTD) file
SEE ALSO
xml2dcm(1), dcmconv(1)
COPYRIGHT
Copyright (C) 2002-2024 by OFFIS e.V., Escherweg 2, 26121 Oldenburg, Germany.
Version 3.6.9 Wed Dec 10 2025 21:34:17 dcm2xml(1)