html5 - Anything wrong with using windows-1252 instead of UTF-8?


I have a test site that has been using windows-1252 all along. The users need symbols such as the square root symbol, and they have no need to display any language other than English. I was asked to switch to UTF-8 because of security concerns. After I changed to UTF-8, the square roots and other symbols (which are pulled out of an Oracle DB and passed through ColdFusion) appear fine on the resulting web page. However, if the document is saved again (posted to the DB, page refreshes), the symbols are transformed into strange characters. If it is saved again, more strange characters appear. So...

  1. If I don't need anything other than English, is there anything wrong with sticking with windows-1252? Are there any security/hacking issues?
  2. Are there any implications of not using UTF-8 if I'm using HTML5 (since it is the default encoding for HTML5)?
  3. If it is recommended that I switch to UTF-8, how do I get the stored square root symbols (and other symbols) to work?

I've read these pages, but I'm still having a little trouble grasping it all. I'm hoping someone here can help clarify it for me. Thanks!

  1. https://www.owasp.org/index.php/canonicalization,_locale_and_unicode
  2. An excellent description of how UTF-8 came about, why it's awesome, and the problems it solves: https://www.youtube.com/watch?v=mijmeoh9lt4
  3. http://www.w3.org/international/questions/qa-choosing-encodings "Use UTF-8, if you can". In fact, the HTML5 specification draft says "Authors are encouraged to use UTF-8. Conformance checkers may advise authors against using legacy encodings. Authoring tools should default to using UTF-8 for newly-created documents."
  4. http://www.w3schools.com/tags/ref_charactersets.asp "For HTML5, the default character encoding is UTF-8."
  5. http://www.joelonsoftware.com/articles/unicode.html

* * * update * * *

I appreciate all the help so far in making this easier to understand. I'll simplify my original 3 questions so that a clear answer can be reached. Here it is: the customer doesn't need to support other languages, and is using HTML5 tags and a ton of JSON/XML traffic sent back and forth via jQuery.ajax(). Given this info, from a security standpoint, is there anything wrong with keeping the database set to NLS_CHARACTERSET: WE8MSWIN1252 and the web pages set to <cfheader name="content-type" value="text/html; charset=windows-1252">? Thank you.

Here is a question that is a slight spin-off of this one: Why am I able to use a character that's not part of the charset (windows-1252)?

Windows-1252 is one of many, many fixed-size character sets. Mac has its own set. There are a few ISO ones for various parts of Europe and other parts of the world. They differ from one another in slight ways.

The good point is that you have fixed-size characters, meaning 1 character = 1 byte no matter what.

The bad points are:

  • Some people may not have the encoding installed
  • Some people may use a different encoding, resulting in a few issues that are not obvious to see at first but get ugly in the long run
  • You can only support a few languages

That includes any quotation you might make: in windows-1252 you can't display Russian, Greek, Polish ...
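To make that limitation concrete, here is a small Python sketch that checks whether a string can be stored in windows-1252 at all. The helper name `fits_in_cp1252` and the sample words are made up for the illustration.

```python
# Sketch: which strings can be encoded in windows-1252 (Python codec "cp1252")?

def fits_in_cp1252(text: str) -> bool:
    """Return True if every character of text exists in windows-1252."""
    try:
        text.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


samples = {
    "English": "hello",
    "French": "caf\u00e9",                        # café
    "Russian": "\u043f\u0440\u0438\u0432\u0435\u0442",  # привет
    "Greek": "\u03b3\u03b5\u03b9\u03b1",          # γεια
    "Polish": "\u0142\u00f3d\u017a",              # łódź
    "Square root": "\u221a",                      # √
}

for label, text in samples.items():
    print(label, fits_in_cp1252(text))
```

Note that the square root sign itself is not part of windows-1252, which is relevant to the spin-off question above.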

UTF-8 is a standard encoding of Unicode that represents each character on 1 or more bytes (up to 4). It can represent the large majority of characters you may encounter, although it was designed around Latin-based languages, so other languages take more storage space.
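A quick way to see the "1 or more bytes" point is to count the bytes UTF-8 needs for a few characters. This is just an illustrative Python sketch; the character choices and the `utf8_size` helper are arbitrary.

```python
# Sketch: how many bytes does UTF-8 use per character?

def utf8_size(ch: str) -> int:
    """Number of bytes needed to store ch in UTF-8."""
    return len(ch.encode("utf-8"))


examples = {
    "A": "A",                 # plain ASCII letter
    "e-acute": "\u00e9",      # é, accented Latin letter
    "cyrillic pe": "\u043f",  # п
    "square root": "\u221a",  # √
    "emoji": "\U0001F600",    # outside the Basic Multilingual Plane
}

utf8_sizes = {name: utf8_size(ch) for name, ch in examples.items()}
for name, size in utf8_sizes.items():
    print(name, size)

# In a single-byte set the same Cyrillic letter takes one byte.
print(len("\u043f".encode("cp1251")))
```

ASCII text costs exactly the same as in windows-1252, so an English-only site pays extra bytes only for the occasional symbol.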

It is used in XML, JSON, and most types of web services you may find. It is a sensible default when you don't know which encoding to use. It lets you limit the number of encoding issues, such as "I thought it was in Latin-1 / no, we're using Latin-9, but the guy on the Mac used Roman". If you have more than one person working on the content of the website, they may have different encodings on their platforms, and therefore the content may get messed up at some point.

UTF-8 is, as far as I know, the best way to standardize the encoding used between people without discussion.

A typical example: if your website is encoded in windows-1252 and a new dev has a Mac, you'll be in trouble.
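The Mac scenario can be sketched in Python, assuming the new dev's editor saves in Mac Roman while the site serves the file as windows-1252. The sample word and variable names are made up for the illustration.

```python
# Sketch: one person saves text as Mac Roman, the site reads it as windows-1252.
# Assumption: the editor really uses Mac Roman; modern tools often default to UTF-8.

original = "caf\u00e9"                  # café

raw = original.encode("mac_roman")      # bytes written by the Mac editor
misread = raw.decode("cp1252")          # same bytes, interpreted by the website

print(original, misread, original == misread)

# Plain ASCII survives because both sets agree on it, which is why the
# problem can go unnoticed for a long time.
print("cafe".encode("mac_roman").decode("cp1252") == "cafe")

# If everyone agrees on UTF-8, the round trip is lossless.
print(original.encode("utf-8").decode("utf-8") == original)
```

Only the accented letter is damaged, which matches the "not obvious to see" point in the list above.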

