C# - Special characters with XDocument


I'm trying to read a file (it's not XML, but its structure is similar), and I'm getting this exception:

'┴', hexadecimal value 0x15, is an invalid character. Line 8, position 7.

The file has a lot of these symbols, and I can't replace them because the content of the file must not be modified...

This is the code:

try
{
    XDocument doc = new XDocument(new XDeclaration("1.0", "utf-16", "yes"));
    // Load() returns a new document, so the declaration set above is discarded.
    doc = XDocument.Load(arquivo);
}
catch (Exception e)
{
    MessageBox.Show(e.Message.ToString());
}

And this is part of the file:

<codepage>utf16</codepage>
<segment>0000016125
    <control>0003┴300000┴english(u.s.)portuguese┴┴bla.000┴webgui\messages\xsl\en\blabla\blabla.xlf
    </control>
    <source>to blablablah   firewall blablablah local ip address.    </source>
    <target>para blablablah uma blablablah local específico.  </target>
</segment>

Note: the file doesn't have an XML declaration specifying the encoding.

This XML is pretty bad:

  1. You have <segment>0000016125 in there which, while not technically illegal (it is a text node), is kind of odd.
  2. Your <control> element contains invalid characters without an XML CDATA section.

You can normalize the XML manually, or in C# via string manipulation, regex, or similar.

In your simple example, only the <control> element has invalid characters; therefore it is relatively simple to fix: add a CDATA section using the string.Replace() method, to make it look like this:

<control><![CDATA[0003┴300000┴english(u.s.)portuguese┴┴bla.000┴webgui\messages\xsl\en\blabla\blabla.xlf]]></control>

Then you can load the XML into an XDocument using XDocument.Parse(string xml):

string badXml = @"
    <temproot>
        <codepage>utf16</codepage>
        <segment>0000016125
            <control>0003┴300000┴english(u.s.)portuguese┴┴bla.000┴webgui\messages\xsl\en\blabla\blabla.xlf</control>
            <source>to blablablah   firewall blablablah local ip address.    </source>
            <target>para blablablah uma blablablah local específico.  </target>
        </segment>
    </temproot>";

// Assuming only the <control> element has invalid characters:
// wrap its content in a CDATA section.
string goodXml = badXml
    .Replace("<control>", "<control><![CDATA[")
    .Replace("</control>", "]]></control>");

XDocument xdoc = XDocument.Parse(goodXml);
xdoc.Declaration = new XDeclaration("1.0", "utf-16", "yes");

// Do stuff with xdoc
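
One caveat, offered as a sketch rather than a definitive fix: XML 1.0 does not allow U+0015 anywhere in a document, CDATA sections included. If the file really contains that control character, as the 0x15 in the exception message suggests, the parser will probably still reject it after the CDATA wrap. A possible workaround is to swap each disallowed character for a visible placeholder in memory before parsing, which leaves the file on disk untouched. The helper name ReplaceInvalidXmlChars and the choice of '┴' (U+2534) as the placeholder are assumptions made for illustration; XmlConvert.IsXmlChar is the framework call that does the check.

```csharp
using System;
using System.Text;
using System.Xml;
using System.Xml.Linq;

static class XmlSanitizer
{
    // Hypothetical helper: replaces every character that XML 1.0 forbids
    // (for example U+0015) with the given placeholder character.
    public static string ReplaceInvalidXmlChars(string text, char placeholder)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            // Surrogate halves are passed through so that valid pairs survive.
            bool ok = XmlConvert.IsXmlChar(c) || char.IsSurrogate(c);
            sb.Append(ok ? c : placeholder);
        }
        return sb.ToString();
    }
}

static class Demo
{
    static void Main()
    {
        // "\u0015" stands in for the raw control character in the real file.
        string raw = "<temproot><control>0003\u0015300000</control></temproot>";

        // Only the in-memory copy is changed; the source file is not modified.
        string safe = XmlSanitizer.ReplaceInvalidXmlChars(raw, '\u2534');

        XDocument xdoc = XDocument.Parse(safe);
        Console.WriteLine(xdoc.Root.Element("control").Value);
    }
}
```

If the original character has to be recovered later, the placeholder must be one that never occurs in the file otherwise, so that it can be mapped back unambiguously when reading element values.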
