C# - Special characters with XDocument
I'm trying to read a file (it is not XML, but the structure is similar), and I'm getting this exception:
'┴', hexadecimal value 0x15, is an invalid character. Line 8, position 7. The file has a lot of these symbols, and I can't replace them because I can't modify the content of the file for my purposes...
This is the code:
try
{
    XDocument doc = new XDocument(new XDeclaration("1.0", "utf-16", "yes"));
    doc = XDocument.Load(arquivo);
}
catch (Exception e)
{
    MessageBox.Show(e.Message.ToString());
}

And this is part of the file:
<codepage>utf16</codepage>
<segment>0000016125
<control>0003┴300000┴english(u.s.)portuguese┴┴bla.000┴webgui\messages\xsl\en\blabla\blabla.xlf
</control>
<source>to blablablah firewall blablablah local ip address.
</source>
<target>para blablablah uma blablablah local específico.
</target>
</segment>

Note: the file doesn't have an XML declaration specifying the encoding.
This XML is pretty bad:
- You have <segment>0000016125 in there which, while not technically illegal (it is a text node), is kind of odd.
- Your <control> element contains invalid characters without an XmlCDataSection.
You can normalize the XML manually, or do it in C# via string manipulation, regex, or similar.
In your simple example, only the <control> element has invalid characters; therefore a relatively simple fix is to add a CDATA section using the string.Replace() method, to make it look like this:
<control><![CDATA[0003┴300000┴english(u.s.)portuguese┴┴bla.000┴webgui\messages\xsl\en\blabla\blabla.xlf]]></control>
Then you can load the XML into an XDocument using XDocument.Parse(string xml):
string badXml = @"
<temproot>
<codepage>utf16</codepage>
<segment>0000016125
<control>0003┴300000┴english(u.s.)portuguese┴┴bla.000┴webgui\messages\xsl\en\blabla\blabla.xlf</control>
<source>to blablablah firewall blablablah local ip address. </source>
<target>para blablablah uma blablablah local específico. </target>
</segment>
</temproot>";

// Assuming only the <control> element has invalid characters,
// wrap its content in a CDATA section.
string goodXml = badXml
    .Replace("<control>", "<control><![CDATA[")
    .Replace("</control>", "]]></control>");

XDocument xdoc = XDocument.Parse(goodXml);
xdoc.Declaration = new XDeclaration("1.0", "utf-16", "yes");
// do stuff with xdoc
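For the "regex" route mentioned above, here is a minimal sketch of the same CDATA wrapping done with Regex.Replace. The class name ControlCDataSketch and the method ParseWithCData are hypothetical names made up for illustration, and the sketch assumes that the <control> content never contains the sequence "]]>" and that the text has no single root element, so it is wrapped in <temproot> as in the example above.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class ControlCDataSketch
{
    // Hypothetical helper: wraps the content of every <control> element in a
    // CDATA section, adds a temporary root element, and parses the result.
    public static XDocument ParseWithCData(string rawText)
    {
        // Singleline lets '.' match newlines, since the closing </control>
        // tag may sit on a later line than the opening tag.
        string wrapped = Regex.Replace(
            rawText,
            @"<control>(.*?)</control>",
            "<control><![CDATA[$1]]></control>",
            RegexOptions.Singleline);

        // Assumption: the input has several top-level elements, so it needs
        // a single wrapper root before it can be parsed as a document.
        return XDocument.Parse("<temproot>" + wrapped + "</temproot>");
    }

    public static void Main()
    {
        string rawText = @"<codepage>utf16</codepage>
<segment>0000016125
<control>0003┴300000┴english(u.s.)portuguese┴┴bla.000┴webgui\messages\xsl\en\blabla\blabla.xlf
</control>
</segment>";

        XDocument xdoc = ParseWithCData(rawText);

        // Read the <control> text back; the CDATA content comes through as-is.
        string control = xdoc.Root.Element("segment").Element("control").Value;
        Console.WriteLine(control.Trim());
    }
}
```

To use it with a file, you could pass in the result of File.ReadAllText with the right encoding. One caveat: XML 1.0 forbids most control characters (a raw 0x15 included) even inside a CDATA section, so if the file really contains that character rather than the ┴ glyph shown here, the wrapping alone may still throw.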