HTML Entities and Encodings
There seems to be a lot of confusion on the web about HTML entities and encodings. HTML entities are not encodings; they are just a way of writing individual characters in markup, and they have nothing to do with the overall encoding of an HTML document or of text in general.
Oddly enough, Wikipedia explains it best:
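To make the distinction concrete, here is a minimal Python sketch (Python and its standard html module are my choice for illustration, not something the post depends on). It shows that an entity is just another spelling of a character, while the encoding decides how that character ends up as bytes.

```python
import html

# A named entity and a numeric reference are two spellings of the same character.
named = html.unescape("&eacute;")
numeric = html.unescape("&#233;")
print(named == numeric == "é")

# The encoding is a separate question: how that character is stored as bytes.
print("é".encode("utf-8"))    # two bytes in UTF-8
print("é".encode("latin-1"))  # one byte in ISO-8859-1

# The entity itself is plain ASCII, so its bytes are identical in both encodings.
entity_utf8 = "&eacute;".encode("utf-8")
entity_latin1 = "&eacute;".encode("latin-1")
print(entity_utf8 == entity_latin1)
```

The same entity works unchanged in either encoding, which is exactly why it tells you nothing about how the document itself is encoded.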
Numeric references always refer to Universal Character Set code points, regardless of the page's encoding. Using numeric references that refer to UCS control code ranges is forbidden, with the exception of the linefeed, tab, and carriage return characters. That is, characters in the hexadecimal ranges 00–08, 0B–0C, 0E–1F, 7F, and 80–9F cannot be used in an HTML document, not even by reference, so "&#153;", for example, is not allowed. However, for backward compatibility with early HTML authors and browsers that ignored this restriction, raw characters and numeric character references in the 80–9F range are interpreted by some browsers as representing the characters mapped to bytes 80–9F in the Windows-1252 encoding.
Unnecessary use of HTML character references may significantly reduce HTML readability. If the character encoding for a web page is chosen appropriately then HTML character references are usually only required for a few special characters (or not at all if a native Unicode encoding like UTF-8 is used).
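A rough illustration of both points in the quote, again using Python's standard html module as an assumed stand-in for a browser: html.unescape follows the HTML5 parsing rules, so it applies the same Windows-1252 fallback for the 80–9F range, and html.escape shows how few references a UTF-8 document actually needs.

```python
import html

# A numeric reference names a code point, whatever the page's encoding is.
trademark = html.unescape("&#x2122;")  # U+2122 TRADE MARK SIGN

# 153 (hex 99) falls in the forbidden 80-9F range. html.unescape follows the
# HTML5 parsing rules, which map such references through Windows-1252.
legacy = html.unescape("&#153;")
print(legacy == trademark)

# In a UTF-8 document only markup-significant characters need references;
# the accented letters pass through untouched.
escaped = html.escape('café < thé & "crème"')
print(escaped)
```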
Why do you want all your HTML documents in UTF-8?
IETF policy is that protocols on this here internet must be able to use UTF-8 for text. Here is a good outline of how to decide on a charset, from RFC 2277, "IETF Policy on Character Sets and Languages", a Best Current Practice document published in 1998:
3.2. How to decide a charset
When the protocol allows a choice of multiple charsets, someone must
make a decision on which charset to use.

In some cases, like HTTP, there is direct or semi-direct
communication between the producer and the consumer of data
containing text. In such cases, it may make sense to negotiate a
charset before sending data.

In other cases, like E-mail or stored data, there is no such
communication, and the best one can do is to make sure the charset is
clearly identified with the stored data, and choosing a charset that
is as widely known as possible.

Note that a charset is an absolute; text that is encoded in a charset
cannot be rendered comprehensibly without supporting that charset.

(This also applies to English texts; charsets like EBCDIC do NOT have
ASCII as a proper subset)

Negotiating a charset may be regarded as an interim mechanism that is
to be supported until support for interchange of UTF-8 is prevalent;
however, the timeframe of "interim" may be at least 50 years, so
there is every reason to think of it as permanent in practice.
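As a small sketch of what "clearly identified" can look like in practice, the Python below reads the charset label from a Content-Type value and decodes the bytes accordingly. The helper names charset_from_content_type and decode_body are hypothetical, and falling back to UTF-8 when no label is present is my assumption, not something the RFC prescribes.

```python
from email.message import Message


def charset_from_content_type(header_value, default="utf-8"):
    """Return the charset named in a Content-Type value (hypothetical helper)."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the charset parameter in lower case,
    # or None when the header carries no charset parameter.
    return msg.get_content_charset() or default


def decode_body(body, header_value):
    """Decode bytes using the charset the producer declared."""
    return body.decode(charset_from_content_type(header_value))


print(charset_from_content_type("text/html; charset=ISO-8859-1"))
print(charset_from_content_type("text/html"))
print(decode_body(b"caf\xe9", "text/html; charset=ISO-8859-1"))
```

Without the label, the byte E9 in the last call is not valid UTF-8 on its own, which is the RFC's point that a charset is an absolute.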
So basically, if your text is intermingled with different character sets you'll usually have problems, but if your whole document is encoded in UTF-8, every character you need to represent will be available. In the web browser space, at least, all of the major browsers will be able to display your text. In other programs, or wherever you are storing your data, you may still have problems if you aren't doing the due diligence. In the end, making ALL of your text and data UTF-8 makes the problem go away. Hopefully, unlike what the RFC says above, it doesn't take 50 years for people to get this.
This is pretty simple stuff: if you aren't encoding the text or data that you display in a universal character set, you are doing it wrong.
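The kind of problem mixed charsets cause is easy to reproduce. This is a minimal Python sketch with a sample string of my own choosing: text written as UTF-8 but read back as Windows-1252 turns into garbage, while text that stays UTF-8 end to end survives.

```python
text = "00–08 ™"

# The producer writes the text as UTF-8 bytes.
utf8_bytes = text.encode("utf-8")

# A consumer that wrongly assumes Windows-1252 sees one junk character per byte.
garbled = utf8_bytes.decode("cp1252")
print(garbled)

# A consumer that agrees on UTF-8 gets the original text back.
round_tripped = utf8_bytes.decode("utf-8")
print(round_tripped == text)
```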
Chi.mp
It looks like there is a new aggregator of social services that I didn't know about, called Chimp - Content Hub & Identity Management Platform. I haven't tried it yet because I haven't received the beta code (I just signed up), but if it's better than Mugshot then I'll be closing down my Mugshot account. It would be a shame to do so, but Mugshot hasn't moved forward in this space. I'll spend the time doing the due diligence into the company and its background before a switch. Nonetheless, it seems interesting from what I have read, especially this post on OpenID as part of a solution stack rather than a specific technology. We'll see, though; I'm interested in anything that will help me better manage my online identity.
Most of the items posted here are syndicated to several web sites and social networks so that I can keep it all in one place, and lots of people have started linking to the content management stuff (which is awesome, and thanks!). It's also because I don't like having to duplicate a piece of content in several different places, or having to explicitly refer people to this forum or the next. My initial thought was that this was going to be a bad idea because the audiences are so different, but I'm finding that it's been informative and has received positive feedback, so I will continue.