C# - Special characters with XDocument
I'm trying to read a file (it's not XML, but the structure is similar), and I'm getting this exception:
'┴', hexadecimal value 0x15, is an invalid character. Line 8, position 7.
The file has a lot of these symbols, and I can't replace them because I can't modify the content of the file...
This is the code:
    try
    {
        XDocument doc = new XDocument(new XDeclaration("1.0", "utf-16", "yes"));
        doc = XDocument.Load(arquivo);
    }
    catch (Exception e)
    {
        MessageBox.Show(e.Message.ToString());
    }
And this is part of the file:
<codepage>utf16</codepage> <segment>0000016125 <control>0003┴300000┴english(u.s.)portuguese┴┴bla.000┴webgui\messages\xsl\en\blabla\blabla.xlf </control> <source>to blablablah firewall blablablah local ip address. </source> <target>para blablablah uma blablablah local específico. </target> </segment>
Note: the file doesn't have an XML declaration specifying the encoding.
This XML is pretty bad:
- You have <segment>0000016125 in there which, while not technically illegal (it is a text node), is kind of odd.
- Your <control> element contains invalid characters without an XML CDATA section.
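To see exactly which characters the parser objects to, the text can be scanned before it is parsed. The following is a minimal sketch rather than part of the original answer: `XmlCharScanner` and `FindInvalidChars` are hypothetical names, while `XmlConvert.IsXmlChar` and `char.IsSurrogatePair` are existing .NET methods.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XmlCharScanner
{
    // Hypothetical helper: returns the index and code point of every
    // character in the text that XML 1.0 does not allow.
    public static List<(int Index, int CodePoint)> FindInvalidChars(string text)
    {
        var hits = new List<(int Index, int CodePoint)>();
        for (int i = 0; i < text.Length; i++)
        {
            // A well-formed surrogate pair encodes a character above U+FFFF,
            // which XML allows, so skip both halves.
            if (char.IsSurrogatePair(text, i))
            {
                i++;
                continue;
            }
            if (!XmlConvert.IsXmlChar(text[i]))
            {
                hits.Add((i, text[i]));
            }
        }
        return hits;
    }
}
```

For example, `XmlCharScanner.FindInvalidChars("0003\u0015300000")` reports a single hit at index 4 with code point 0x15, whereas the same string with the box-drawing character U+2534 in that position reports none.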
You can normalize the XML manually, or in C# via string manipulation, regex, or similar.
In your simple example, the <control> element has invalid characters; therefore a relatively simple fix is to add a CDATA section using the string.Replace() method, to make this:
<control><![CDATA[0003┴300000┴english(u.s.)portuguese┴┴bla.000┴webgui\messages\xsl\en\blabla\blabla.xlf]]></control>
Then you can load the XML into an XDocument using XDocument.Parse(string xml):
    // The fragment has two top-level elements, so it is wrapped in a temporary root.
    string badXml = @"
    <temproot>
      <codepage>utf16</codepage>
      <segment>0000016125
        <control>0003┴300000┴english(u.s.)portuguese┴┴bla.000┴webgui\messages\xsl\en\blabla\blabla.xlf</control>
        <source>to blablablah firewall blablablah local ip address. </source>
        <target>para blablablah uma blablablah local específico. </target>
      </segment>
    </temproot>";

    // Assuming only the <control> element has invalid characters
    string goodXml = badXml
        .Replace("<control>", "<control><![CDATA[")
        .Replace("</control>", "]]></control>");

    XDocument xdoc = XDocument.Parse(goodXml);
    xdoc.Declaration = new XDeclaration("1.0", "utf-16", "yes");
    // Do stuff with xdoc
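One caveat is worth checking against the real file. The exception reports 0x15, and XML 1.0 does not allow that control character anywhere in a document, including inside a CDATA section. If the file really contains U+0015 rather than the printable box-drawing character U+2534, the characters would have to be swapped out in memory before parsing; the file on disk stays untouched. The following is a minimal sketch under that assumption: `LenientXmlLoader`, `ReplaceInvalidXmlChars`, `ParseFragment`, and the choice of placeholder are all hypothetical, and the placeholder must be a character that never occurs in the real data.

```csharp
using System;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class LenientXmlLoader
{
    // Hypothetical placeholder: any valid XML character that the data never uses.
    public const char Placeholder = '\u2534'; // '┴'

    // Returns a copy of the text with every character that XML 1.0 rejects
    // replaced by the placeholder. The input string is not modified.
    public static string ReplaceInvalidXmlChars(string text, char placeholder = Placeholder)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            if (char.IsSurrogatePair(text, i))
            {
                // Well-formed surrogate pairs are valid XML; copy both halves.
                sb.Append(text[i]).Append(text[i + 1]);
                i++;
            }
            else
            {
                sb.Append(XmlConvert.IsXmlChar(text[i]) ? text[i] : placeholder);
            }
        }
        return sb.ToString();
    }

    // Wraps the cleaned fragment in a temporary root element and parses it.
    // Assumes the fragment has no XML declaration of its own.
    public static XDocument ParseFragment(string rawText)
    {
        string cleaned = ReplaceInvalidXmlChars(rawText);
        return XDocument.Parse("<temproot>" + cleaned + "</temproot>");
    }
}
```

A possible call would be `LenientXmlLoader.ParseFragment(File.ReadAllText(arquivo, Encoding.Unicode))`, where `Encoding.Unicode` (UTF-16 little-endian) is a guess based on the file's `<codepage>utf16</codepage>` entry. If the original control characters are needed later, the placeholder can be mapped back to U+0015 after reading the element values.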