It is pretty common to come across a scenario where we have to deal with special characters in XML text, such as &, (, ), $, etc. With the fix described here, < is the only illegal character left.
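To make the problem concrete, here is a minimal sketch that feeds a bare ampersand to the parser and then the escaped form. The class name AmpersandDemo and the helper TryParse are hypothetical names used only for this illustration; XElement.Parse and XmlException are the standard System.Xml.Linq and System.Xml types.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Hypothetical demo class, not part of the original snippet.
public static class AmpersandDemo
{
    // Returns true when the fragment parses as XML, false when the parser rejects it.
    public static bool TryParse(string xml, out XElement element)
    {
        try
        {
            element = XElement.Parse(xml);
            return true;
        }
        catch (XmlException)
        {
            element = null;
            return false;
        }
    }

    public static void Main()
    {
        // A bare ampersand starts an entity reference, so the parser rejects it.
        Console.WriteLine(TryParse("<customer_name>A&B Company</customer_name>", out _)); // False

        // Escaped as &amp;, the same text parses and reads back with the plain ampersand.
        Console.WriteLine(TryParse("<customer_name>A&amp;B Company</customer_name>", out var ok)); // True
        Console.WriteLine(ok.Value); // A&B Company
    }
}
```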
Let’s take a look at one way of fixing it when one does not have any control over the XML being received.
// Namespaces used: System.Text.RegularExpressions, System.Web, System.Xml.Linq
string xmlString = "<?xml version=\'1.0\' encoding=\'UTF-8\' standalone=\'yes\'?>\n<rows xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:x=\"urn:row\">\n<xsd:schema targetNamespace=\"urn:row\">\n<xsd:element name=\"row\">\n<xsd:complexType>\n<xsd:sequence>\n<xsd:element name=\"customer_name\" type=\"xsd:string\" nillable=\"true\"/>\n</xsd:sequence>\n</xsd:complexType>\n</xsd:element>\n</xsd:schema>\n<x:row>\n<customer_name>A&B Company</customer_name>\n</x:row>\n</rows>";
// xmlString.Dump(); // LINQ Pad
// var doc = XElement.Parse(xmlString); // Error!
// Match element text (between '>' and '<') or attribute values (between double quotes).
string pattern = "(?<start>>)(?<content>.+?(?<!>))(?<end><)|(?<start>\")(?<content>.+?)(?<end>\")";
// Decode first so already-escaped entities are not escaped twice, then encode.
string result = Regex.Replace(xmlString, pattern, m =>
    m.Groups["start"].Value +
    HttpUtility.HtmlEncode(HttpUtility.HtmlDecode(m.Groups["content"].Value)) +
    m.Groups["end"].Value);
// result.Dump(); // LINQ Pad
var doc = XElement.Parse(result);
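The same replacement can be wrapped in a small helper and checked against a reduced input. This is only a sketch: the class name XmlSanitizer, the method EscapeContent, and the shortened sample document (including its name attribute) are assumptions made for illustration, while the pattern and the HttpUtility calls are taken from the snippet above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Web;
using System.Xml.Linq;

// Hypothetical helper wrapping the regex-based fix shown above.
public static class XmlSanitizer
{
    // Same pattern as the snippet: element text between '>' and '<',
    // or attribute values between double quotes.
    private static readonly Regex ContentPattern = new Regex(
        "(?<start>>)(?<content>.+?(?<!>))(?<end><)|(?<start>\")(?<content>.+?)(?<end>\")");

    public static string EscapeContent(string xml) =>
        ContentPattern.Replace(xml, m =>
            m.Groups["start"].Value +
            // Decode first so existing entities such as &amp; are not double-escaped.
            HttpUtility.HtmlEncode(HttpUtility.HtmlDecode(m.Groups["content"].Value)) +
            m.Groups["end"].Value);

    public static void Main()
    {
        // Made-up sample; tags are separated by newlines, as in the original string.
        string raw = "<rows>\n<row name=\"R&D\">\n<customer_name>A&B Company</customer_name>\n</row>\n</rows>";

        var doc = XElement.Parse(EscapeContent(raw));
        Console.WriteLine(doc.Descendants("customer_name").First().Value); // A&B Company
        Console.WriteLine(doc.Element("row").Attribute("name").Value);     // R&D
    }
}
```

Note that the pattern relies on . not matching newlines, so this sketch assumes tags are separated by line breaks, as in the sample string. Two tags that sit next to each other on one line (><) would be captured as content and escaped along with the text.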
If one were to execute the commented-out XElement.Parse(xmlString) call on the unmodified string, the following XmlException would be thrown:
