Tom Goff's .Net Musings

Tidbits of information regarding .Net, C#, and SQL Server.

XML Serialization Sorrows

with 11 comments

One of the features of our flagship product allows the user to save the results from a T-SQL script (run against one or more SQL Server databases) to a file. These results are stored in XML and can be re-opened and viewed by the user. One problem we have run into is that the XML standard does not allow certain characters[^]. For example, the null (0x00) character is not technically allowed even when encoded as “�” in the XML file.

Before I coded a work-around or fix, I wanted to understand where the problem data was coming from. In one instance, the null character was clearly coming from a string (or varchar) column. The SELECT statement was:

SELECT TOP 1 filename FROM sysaltfiles WITH (NOLOCK)

It turned out that the filename column included a trailing null character, but only on SQL Server 2000. I was able to verify this with Microsoft’s Query Analzyer.

There was another instance of an invalid character showing up, but this time the SELECT statement was:

SELECT password FROM syslogins WITH (NOLOCK)

It appears that syslogins is a view on top of sysxlogins, which defines the password column as a varbinary. When syslogins selects the password from sysxlogins it converts it as-is to a varchar. This produces the invalid characters, because the binary data cannot be converted in this manner.

Beyond these two tables a user can still write a valid SELECT statement which would produce an invalid character like so:

SELECT CHAR(0) AS InvalidChar

Now, I’ve been calling the characters “invalid” when really they are perfectly valid. SQL Server, .Net, ADO.Net, and all the controls used to display this data perfectly supports these characters. Even more, when saving these characters the XML serializer would properly encode them using their hexadecimal value (e.g. “�” for the null character). The problem was with the XML deserialization process.

When loading an XML file with one of these invalid characters, an InvalidOperationException[^] would be thrown with the following message:

There is an error in XML document (<Line>, <Column>). 

This exception had an inner exception of type XmlException[^] and the following message:

‘.’, hexadecimal value 0x00, is an invalid character. Line <Line>, position <Column>.

The code used to deserialize the file was similar to the following:

TestObject testObject = null; 
if (true == File.Exists(Form1.fileName)) { 
    XmlSerializer xs = null; 
    TextReader tr = null; 

    try { 
        xs = new XmlSerializer(typeof(TestObject)); 
        tr = new StreamReader(Form1.fileName); 
        testObject = xs.Deserialize(tr) as TestObject; 
    } 
    catch (Exception ex) { 
        //... Exception caught here ... 
    } 
    finally { 
        if (null != tr) 
            tr.Close(); 
    } 
}

The solution was very simple, but not obvious. Simply wrapping a XmlTextReader[^] around the StreamReader[^] allows the code to properly decode the invalid characters. This change is shown here:

TestObject testObject = null; 
if (true == File.Exists(Form1.fileName)) { 
    XmlSerializer xs = null; 
    TextReader tr = null; 

    try { 
        xs = new XmlSerializer(typeof(TestObject)); 
        tr = new StreamReader(Form1.fileName); 

        // Wrap stream with an XmlTextReader so bad chars are accepted 
        XmlTextReader xtr = new XmlTextReader(tr); 
        testObject = xs.Deserialize(xtr) as TestObject; 
    } 
    catch (Exception ex) { 
        // ... No exception thrown ... 
    } 
    finally { 
        if (null != tr) 
            tr.Close(); 
    } 
}

I’m unsure why the XML standard doesn’t allow all characters (even the null character, when properly encoded), but I’m sure they had/have their reasons. The only down side is that XML files that contain these invalid characters will still be considered invalid by other applications.

I’ve put together a sample project[^] that demonstrates the XML portion of this blog.

Advertisements

Written by Tom

August 23, 2007 at 1:59 pm

Posted in .Net, Ado.Net, C#, Gotcha, SQL Server, Xml

11 Responses

Subscribe to comments with RSS.

  1. I just want you to know that your solution actually helped someone:)

    Stilgar

    January 29, 2008 at 5:36 pm

  2. Good to hear.

    Tom Goff

    January 30, 2008 at 8:16 am

  3. Tom, thanks — it helped me too.
    Did you find the reason why MS XML doesn’t allow some characters to be encoded into XML?

    Dennis Gorelik

    March 3, 2009 at 1:23 pm

    • Hi Dennis,

      The issue has to do with the XmlTextReader.Normalization property. By default this property is set to false, which allows the encoded characters to be decoded (keep in mind that they encode just fine). When you don’t explicitly create the XmlTextReader, one is created for you that has Normalization set to true.

      From the MSDN documentation on Normalization, there is a line that states that character range checking is not performed when Normalization is false.

      If Normalization is set to false, this also disables character range checking for numeric entities. As a result, character entities, such as , are allowed.

      Tom Goff

      March 4, 2009 at 8:26 am

  4. Tom,

    Please consider InvalidChar.xml with invalid char:

    <?xml version=”1.0″ encoding=”utf-8″?>
    <xml>
    &#xC;
    </xml>

    When I try to open it in FireFox — I’m getting “XML Parsing Error: reference to invalid character number”.
    FireFox is not developed by Microsoft, so I assume that FireFox 3.0.6 is enforcing commonly recognized XML standard in that case.
    The ironic part is that I can open InvalidChar.xml in Internet Explorer 7.0 without errors.

    What’s your take on this?
    Why this chars are prohibited in XML?
    How should I treat such characters if I need to serialize them into XML?

    Dennis Gorelik

    March 4, 2009 at 10:30 am

    • Firefox is probably following the standard as defined by the World Wide Web Consortium (W3C). Just appears that Microsoft is being more flexible in allowing additional chars.

      Tom Goff

      March 5, 2009 at 11:32 am

  5. Brilliant!

    Valso

    August 26, 2009 at 4:01 pm

  6. Great Contribution.
    Thanks!

    Obberer

    December 28, 2010 at 2:18 pm

  7. Thanks! It was something strange to see, that XMLSerializer fails to decode its XML.
    Microsoft mode.

    rdstmk

    July 29, 2011 at 5:49 am

  8. […] the help of Google I have found this post (2007) with a way to have an happy […]

  9. […] the help of Google I have found this post (2007) with a way to have an happy […]


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: