Over the last few weeks I have been reading extensively about Unicode, OpenType and UTF-8, as well as a whole set of technologies related to typesetting complex scripts (such as Arabic) with LuaTeX. It is, for sure, a pretty complex picture, so I thought I would try to summarise some of what I have learnt through a series of “mini tutorials” to (hopefully) help others save some time. The most time-consuming part of the learning process is piecing together the “landscape”: becoming aware of the components you need and understanding how they fit together, all before you can start to explore any one of them in detail. Building an appreciation of these interdependent concepts is also the most frustrating part, because it takes a lot of time and you feel that you are not actually “getting anywhere”.
There is already a vast wealth of information published on Unicode, so I will try not to simply repeat it but instead “paint a picture” of the basic ideas and concepts in an order that makes sense (to me, at least).
Great free tools for Windows users
To get started, here are a couple of free tools that will help you learn about and explore Unicode. I’ll start the proper tutorials in Part 2 of this series.
BabelPad
This is far more than just an excellent free Unicode text editor for Windows. It provides a wide range of tools and utilities that will help you get to grips with Unicode.
The Unibook™ Character Browser
An invaluable resource for exploring information about the characters defined in the Unicode Standard and the International Standard ISO/IEC 10646.
… and free tools for Mac users
Many thanks to Patrick Gundlach for the following update:
“For Mac users, I would like to recommend the unicode checker at http://earthlingsoft.net/UnicodeChecker/ – I use it very often to look up the code point of a character and to look behind the scenes (composition, alternatives, variants).”
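If you would rather script this kind of look-up, Python’s standard `unicodedata` module reports similar information on any platform. The sketch below is my own illustration, not how UnicodeChecker works, and the helper name `describe` is hypothetical. It prints a character’s code point, its official Unicode name and its decomposition mapping: either a canonical one (é breaks down into e plus a combining acute accent) or a tagged compatibility one, such as `<isolated>` for the Arabic presentation forms.

```python
import unicodedata


def describe(ch: str) -> str:
    """Hypothetical helper: code point, name and decomposition of one character."""
    cp = ord(ch)                                   # integer code point
    name = unicodedata.name(ch, "<unnamed>")       # official Unicode character name
    decomp = unicodedata.decomposition(ch) or "none"  # "" means no decomposition
    return f"U+{cp:04X} {name} (decomposition: {decomp})"


# é, the Arabic letter beh, and beh's isolated presentation form
for ch in "\u00e9\u0628\ufe8f":
    print(describe(ch))
```

The presentation form U+FE8F decomposes to the ordinary letter U+0628, which hints at why text is normally stored as base letters and shaped into contextual forms only at typesetting time.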