html5 - Anything wrong with using windows-1252 instead of UTF-8?
I have a test site that has been using windows-1252 all along. We need to use symbols such as the square root symbol, and we have no need to display text in any language other than English. I was asked to switch to UTF-8 because of security concerns. After I changed to UTF-8, the square roots and other symbols (which are being pulled out of an Oracle DB and passed through ColdFusion) appear fine on the resulting web page. However, if I save the document again (post to the DB, page refreshes), the symbols are transformed into strange characters. If I save it again, more strange characters appear. So...
- If I don't need anything other than English, is there anything wrong with sticking with windows-1252? Are there any security/hacking issues?
- Are there any implications of not using UTF-8 if I'm using HTML5 (since it is the default encoding for HTML5)?
- If it is recommended that I switch to UTF-8, how do I get the stored square root symbols (and other symbols) to work?
I've read these pages, but I'm still having a little trouble grasping it all. I'm hoping someone here can clarify it for me. Thanks!
- https://www.owasp.org/index.php/canonicalization,_locale_and_unicode
- An excellent description of how UTF-8 came about, why it's awesome, and the problems it solves… https://www.youtube.com/watch?v=mijmeoh9lt4
- http://www.w3.org/international/questions/qa-choosing-encodings “Use UTF-8, if you can.” “In fact, the HTML5 specification draft says "Authors are encouraged to use UTF-8. Conformance checkers may advise authors against using legacy encodings. Authoring tools should default to using UTF-8 for newly-created documents."”
- http://www.w3schools.com/tags/ref_charactersets.asp “For HTML5, the default character encoding is UTF-8.”
- http://www.joelonsoftware.com/articles/unicode.html
* * * update * * *
I appreciate the answers so far; they make this easier to understand. I'll simplify my original 3 questions so that a clear answer can be reached. Here it is: the customer doesn't need to support other languages, but we are using HTML5 tags and a ton of JSON/XML traffic is sent back and forth via jquery.ajax(). Given that info, from a security standpoint, is there anything wrong with keeping the database set to nls_characterset: we8mswin1252
, and the webpages set to <cfheader name="content-type" value="text/html; charset=windows-1252">
? Thank you.
Here is a question that is a slight spin-off of this one: Why am I able to use a character that's not part of the charset (windows-1252)?
Windows-1252 is one of many, many fixed-size character sets. Mac has its own set. There are a few ISO ones for various parts of Europe and other parts of the world. Most of them have slight variations.
The good point is that you have a fixed-size character, meaning 1 character = 1 byte no matter what.
The bad points are:
- Some people may not have the encoding installed
- Some people may use a different encoding, resulting in a few issues that are not always obvious to see, but ugly in the long run
- You can only support a few languages
That includes the citation you make. In windows-1252 you can't display Russian, Greek, Polish ...
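To make the "few languages" point concrete, here is a minimal sketch in Python (not the ColdFusion/Oracle stack from the question) that checks which sample strings fit in windows-1252. The helper name `fits_in_cp1252` and the sample strings are made up for illustration; `cp1252` is Python's name for the windows-1252 codec.

```python
def fits_in_cp1252(text: str) -> bool:
    """Return True if every character in text has a windows-1252 byte."""
    try:
        text.encode("cp1252")  # strict mode: raises on unmappable characters
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical sample strings, one per language or use case.
samples = {
    "English": "square root",
    "French": "déjà vu",
    "Russian": "привет",
    "Greek": "αβγ",
    "Polish": "żółć",
    "Math": "√2",
}

results = {name: fits_in_cp1252(text) for name, text in samples.items()}
for name, ok in results.items():
    print(f"{name:8} {'ok' if ok else 'not representable'}")
```

Note that the square root sign itself falls outside windows-1252 in this check, which is the situation the spin-off question above is about.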
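UTF-8 is a standard encoding of Unicode that uses 1 or more bytes per character (1 to 4). It can represent every Unicode character, which covers the large majority of characters you may encounter, although it was designed with Latin-based languages in mind, so other languages take more storage space.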
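As a rough illustration of the "1+ bytes" point, this Python sketch counts how many bytes a few arbitrarily chosen characters take in UTF-8, and compares one of them with windows-1252. It prints code points rather than the characters themselves so the output stays plain ASCII.

```python
# Arbitrary sample characters: ASCII, accented Latin, math symbol,
# Cyrillic, CJK, and an emoji outside the Basic Multilingual Plane.
chars = ["a", "é", "√", "я", "中", "😀"]

utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in chars}
for ch, n in utf8_lengths.items():
    print(f"U+{ord(ch):04X}: {n} byte(s) in UTF-8")

# windows-1252 is fixed-size: one byte per character it supports.
cp1252_length = len("é".encode("cp1252"))
print(f"U+00E9: {cp1252_length} byte(s) in windows-1252")
```

Plain ASCII text stays at one byte per character in UTF-8, so an English-only site pays almost nothing in storage for switching.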
It is used in XML, JSON, and most types of web services you may find. It is a good default when you don't know which encoding to use. It allows you to limit the number of encoding issues, such as "I thought it was in Latin-1 / no, I am using Latin-9, but the guy on the Mac used Mac Roman". If you have more than one person working on the content of a website, they may have different encodings on their platforms, and therefore the content may get messed up at some point.
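The "more strange characters on every save" symptom from the question looks like this kind of mismatch: one layer writes UTF-8 bytes and another reads them back as windows-1252. That is only an assumption about the asker's setup, and the sketch below is plain Python rather than ColdFusion or Oracle; `misread_once` and `repair_once` are hypothetical helper names.

```python
def misread_once(text: str) -> str:
    """Encode as UTF-8, then wrongly decode the bytes as windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair_once(text: str) -> str:
    """Undo one misread; only works if no bytes were lost along the way."""
    return text.encode("cp1252").decode("utf-8")


original = "√"                 # U+221A, three bytes in UTF-8
once = misread_once(original)  # one wrong character per UTF-8 byte
twice = misread_once(once)     # a second save makes it grow again

for label, value in (("original", original), ("once", once), ("twice", twice)):
    code_points = " ".join(f"U+{ord(c):04X}" for c in value)
    print(f"{label:8} {len(value)} char(s): {code_points}")

restored = repair_once(repair_once(twice))
```

If this is what is happening, the durable fix is to make every layer (database connection, application server, page header, form posts) agree on one encoding, rather than repairing strings after the fact.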
UTF-8 is, as far as I know, the best way to standardize the encoding used between people without discussion.
A typical example: if your website is encoded in windows-1252 and the new dev has a Mac, you'll be in trouble.
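A small Python sketch of that scenario, assuming the Mac-side tool falls back to the legacy Mac OS Roman encoding (many current editors default to UTF-8 instead, so treat this as an illustration, not a prediction). The sample word is arbitrary.

```python
text = "café"

# The site stores the text as windows-1252 bytes.
stored = text.encode("cp1252")

# A tool that assumes Mac OS Roman reads the same bytes differently.
seen_on_mac = stored.decode("mac_roman")

# The ASCII part survives; the accented letter does not.
print("ASCII prefix intact:", seen_on_mac[:3] == text[:3])
print("whole string intact:", seen_on_mac == text)

# With UTF-8 agreed on by both sides, the round trip is lossless.
round_trip = text.encode("utf-8").decode("utf-8")
print("UTF-8 round trip intact:", round_trip == text)
```

The bytes never change here; only the assumption about what they mean does, which is why these bugs tend to show up late and only for non-ASCII characters.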