Recipe 17.7 Handling Invalid Characters in an XML String
Problem
You are creating an XML string. Before adding an element that
contains text, you want to check whether that text contains any of
the following characters, which have special meaning in XML:
- <
- >
- "
- '
- &
If any of these characters are encountered, you want them to be
replaced with their escaped forms:
- &lt;
- &gt;
- &quot;
- &apos;
- &amp;
Solution
There are different ways to accomplish this, depending on which
XML creation approach you are using. If you are using
XmlTextWriter, the WriteCData and WriteElementString methods take
care of this for you. If you are using XmlDocument and XmlElement,
the XmlElement.InnerXml and XmlElement.InnerText properties will
handle these characters.
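When the XML is being assembled by plain string concatenation rather than through one of these classes, neither option applies. One possibility, shown here only as a minimal sketch, is the SecurityElement.Escape method in the System.Security namespace, which replaces all five characters listed in the Problem section. The names ManualEscapeSketch and EscapeForXml are hypothetical and used only for illustration.

```csharp
using System;
using System.Security;

public static class ManualEscapeSketch
{
    // Hypothetical helper: escapes <, >, ", ', and & so the text can be
    // placed between hand-built tags.
    public static string EscapeForXml(string text)
    {
        return SecurityElement.Escape(text);
    }

    public static void Main()
    {
        string raw = "<>\"'&";
        // Build an element by concatenation, escaping only the text part.
        Console.WriteLine("<Note>" + EscapeForXml(raw) + "</Note>");
    }
}
```

The XmlTextWriter and XmlDocument approaches described next are usually preferable, because they also take care of element nesting and well-formedness.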
The two ways to handle this with an XmlTextWriter work like this.
The WriteCData method wraps the text containing the invalid
characters in a CDATA section, as shown in the creation of the
InvalidChars1 element in the example that follows. The
WriteElementString method automatically escapes the text for you,
as shown in the creation of the InvalidChars2 element:
// set up a string with our invalid chars
string invalidChars = @"<>\&'";
XmlTextWriter writer = new XmlTextWriter(Console.Out);
writer.WriteStartElement("Root");
writer.WriteStartElement("InvalidChars1");
writer.WriteCData(invalidChars);
writer.WriteEndElement( );
writer.WriteElementString("InvalidChars2",invalidChars);
writer.WriteEndElement( );
writer.Close( );
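The output from this is shown here with line breaks added for
readability (the writer emits it on a single line unless its
Formatting property is set to Formatting.Indented):
<Root>
<InvalidChars1><![CDATA[<>\&']]></InvalidChars1>
<InvalidChars2>&lt;&gt;\&amp;'</InvalidChars2>
</Root>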
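To inspect the escaped text in code rather than on the console, the same calls can be pointed at a StringWriter. This is a minimal sketch; the class WriterEscapingSketch and the method BuildXml are hypothetical names introduced for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class WriterEscapingSketch
{
    // Hypothetical helper: writes the same two elements as the example
    // above, but into an in-memory buffer instead of Console.Out.
    public static string BuildXml(string text)
    {
        StringWriter buffer = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(buffer);
        writer.WriteStartElement("Root");
        writer.WriteStartElement("InvalidChars1");
        writer.WriteCData(text);                          // wrapped, not escaped
        writer.WriteEndElement();
        writer.WriteElementString("InvalidChars2", text); // escaped
        writer.WriteEndElement();
        writer.Close();
        return buffer.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(BuildXml(@"<>\&'"));
    }
}
```

Because no Formatting value is set, the returned string contains no line breaks between the elements.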
The two ways you can handle this problem with XmlDocument and
XmlElement are as follows. The first way is to surround the text
you are adding to the XML element with a CDATA section and assign
it to the InnerXml property of the XmlElement like this:
// set up a string with our invalid chars
string invalidChars = @"<>\&'";
XmlElement invalidElement1 = xmlDoc.CreateElement("InvalidChars1");
invalidElement1.InnerXml = "<![CDATA[" + invalidChars + "]]>";
The second way is to let the XmlElement class
escape the data for you by assigning the text directly to the
InnerText property like this:
// set up a string with our invalid chars
string invalidChars = @"<>\&'";
XmlElement invalidElement2 = xmlDoc.CreateElement("InvalidChars2");
invalidElement2.InnerText = invalidChars;
The whole XmlDocument is created with these
XmlElements in this
code:
public static void HandlingInvalidChars( )
{
    // set up a string with our invalid chars
    string invalidChars = @"<>\&'";
    XmlDocument xmlDoc = new XmlDocument( );
    // create a root node for the document
    XmlElement root = xmlDoc.CreateElement("Root");
    xmlDoc.AppendChild(root);

    // create the first invalid character node
    XmlElement invalidElement1 = xmlDoc.CreateElement("InvalidChars1");
    // wrap the invalid chars in a CDATA section and use the
    // InnerXml property to assign the value as it doesn't
    // escape the values, just passes in the text provided
    invalidElement1.InnerXml = "<![CDATA[" + invalidChars + "]]>";
    // append the element to the root node
    root.AppendChild(invalidElement1);

    // create the second invalid character node
    XmlElement invalidElement2 = xmlDoc.CreateElement("InvalidChars2");
    // Add the invalid chars directly using the InnerText
    // property to assign the value as it will automatically
    // escape the values
    invalidElement2.InnerText = invalidChars;
    // append the element to the root node
    root.AppendChild(invalidElement2);

    Console.WriteLine("Generated XML with Invalid Chars:\r\n{0}",xmlDoc.OuterXml);
    Console.WriteLine( );
}
The XML created by this procedure (and output to the console) looks
like this (line breaks added here for readability):
<Root>
<InvalidChars1><![CDATA[<>\&']]></InvalidChars1>
<InvalidChars2>&lt;&gt;\&amp;'</InvalidChars2>
</Root>
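Although the two elements look different in the serialized XML, a parser sees the same text in both. The following minimal sketch loads the generated XML back into an XmlDocument and reads each element's InnerText; the class RoundTripSketch and the method ReadBack are hypothetical names used only for this illustration.

```csharp
using System;
using System.Xml;

public static class RoundTripSketch
{
    // Hypothetical helper: parses the XML and returns the text of the
    // InvalidChars1 and InvalidChars2 elements, in that order.
    public static string[] ReadBack(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlElement root = doc.DocumentElement;
        return new string[]
        {
            root["InvalidChars1"].InnerText, // text from the CDATA section
            root["InvalidChars2"].InnerText  // text with entities resolved
        };
    }

    public static void Main()
    {
        string xml = @"<Root><InvalidChars1><![CDATA[<>\&']]></InvalidChars1>" +
                     @"<InvalidChars2>&lt;&gt;\&amp;'</InvalidChars2></Root>";
        foreach (string text in ReadBack(xml))
        {
            Console.WriteLine(text);
        }
    }
}
```

Both lines printed by Main should be the original string, which suggests that the choice between CDATA and escaping is about how the XML reads on the page, not about what a consumer receives.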
Discussion
One of the more interesting types of nodes is the CDATA node. A
CDATA section lets you enter text as literal character data rather
than as escaped XML, for ease of entry. Normally these characters
would need to be in their escaped format (&lt; for < and so on),
but the CDATA section allows you to enter them as regular text.
When a CDATA section is used in conjunction with the InnerXml
property of the XmlElement class, you can submit characters that
would normally need to be escaped first. The XmlElement class also
has an InnerText property that automatically escapes any markup
found in the string assigned to it. This allows you to add these
characters without having to worry about them. Note that in element
content only <, >, and & are escaped; quotation marks and
apostrophes are legal there and are left as they are, which is why
the apostrophe appears unchanged in the output shown earlier.
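One caveat applies when a CDATA section is built by string concatenation: the section ends at the first ]]>, so text that itself contains that sequence cannot be wrapped as is. A common workaround is to split the text across two adjacent CDATA sections. The following is a minimal sketch of that idea; CDataSketch and WrapInCData are hypothetical names, not part of the .NET API.

```csharp
using System;
using System.Xml;

public static class CDataSketch
{
    // Hypothetical helper: wraps text in CDATA, splitting any embedded
    // "]]>" so that "]]" closes one section and ">" opens the next.
    public static string WrapInCData(string text)
    {
        return "<![CDATA[" + text.Replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        XmlElement element = doc.CreateElement("Data");
        element.InnerXml = WrapInCData("a]]>b");
        // InnerText joins the adjacent CDATA sections back together.
        Console.WriteLine(element.InnerText);
    }
}
```

Assigning the text to InnerText instead avoids the issue altogether, since no CDATA section is involved.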
See Also
See the "XmlDocument Class,"
"XmlElement Class," and
"CDATA Sections" topics in the MSDN
documentation.