From typeset Arabic directly to SVG with LuaTeX

Just a brief post

With the explosive growth of interest in “eBooks” and the use of SVG in EPUB3, I thought it would be worth experimenting to see how “easy” it is to produce SVG directly from typeset Arabic using LuaTeX. It turns out to be quite possible, and an inline SVG example is shown below (OK, it should be displayed on the right-hand side, I know ;-)). This SVG was created using a point size of 100 for all calculations of the SVG “width” and “height” values. No hand editing was done at all: it is exactly as output. I still need to finish kerning, vowel placement and cursive positioning in the SVG export functions, but I think that should be OK.
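To make the “point size of 100” remark a little more concrete, here is a minimal sketch in plain Lua of how glyph advances might be scaled into SVG units and laid out right to left. It is not the export code used for the sample: the 2048 units-per-em value, the glyph names, the advance widths and the helper names (to_svg_units, layout_rtl, to_svg) are all assumptions made up for illustration, and the glyph outlines are assumed to be defined elsewhere in the SVG.

```lua
-- Hypothetical values: neither the numbers nor the glyph names come from the post.
local UNITS_PER_EM = 2048   -- assumed; a common value for TrueType-flavoured fonts
local POINT_SIZE   = 100    -- the nominal size mentioned in the post

-- Convert a length in font design units to SVG user units.
local function to_svg_units(design_units)
  return design_units * POINT_SIZE / UNITS_PER_EM
end

-- Place glyphs right to left: the first glyph in logical order
-- ends up at the right-hand edge of the line.
local function layout_rtl(glyphs)
  local total = 0
  for _, g in ipairs(glyphs) do
    total = total + to_svg_units(g.advance)
  end
  local x, placed = total, {}
  for i, g in ipairs(glyphs) do
    x = x - to_svg_units(g.advance)
    placed[i] = { name = g.name, x = x }
  end
  return placed, total
end

-- Wrap the positioned glyphs in an <svg> element. Each glyph is a <use>
-- that refers to an outline assumed to exist elsewhere in the file.
local function to_svg(glyphs, height, baseline)
  local placed, width = layout_rtl(glyphs)
  local out = { string.format(
    '<svg xmlns="http://www.w3.org/2000/svg" ' ..
    'xmlns:xlink="http://www.w3.org/1999/xlink" width="%.2f" height="%.2f">',
    width, height) }
  for _, p in ipairs(placed) do
    out[#out + 1] = string.format(
      '  <use xlink:href="#%s" x="%.2f" y="%.2f"/>', p.name, p.x, baseline)
  end
  out[#out + 1] = '</svg>'
  return table.concat(out, "\n")
end

-- A made-up two-glyph run (an initial beh followed by a final alef).
local run = {
  { name = "beh.init",  advance = 1024 },
  { name = "alef.fina", advance = 512 },
}
print(to_svg(run, POINT_SIZE, 80))
```

Kerning, vowel (mark) placement and cursive positioning would all show up as extra x and y offsets on each glyph, which is why they are the parts still to be finished.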

It is very likely that full mathematical formulae could also be exported directly to SVG using LuaTeX’s node structures, but they are deeply nested and complex, so it could be tricky. LuaTeX quite possibly offers excellent potential for fully automated eBook production and, of course, print PDF production, from a single, suitably marked-up TeX source file.

Typesetting Arabic with LuaTeX [via a C plug-in] (Part 1)

Introduction

In this new series of posts I’m going to attempt an overview of the topics, concepts, ideas and technologies involved in typesetting Arabic with LuaTeX, via a DLL I’m writing in C. Actually, the C code is very substantially platform-independent so it should compile on non-Windows machines… one day, when it’s “finished”…

Up until 2 years ago I was teaching myself Arabic (see my Amazon book reviews) and had reached the point where I wanted to write up my notes and worked exercises: I needed to typeset Arabic and wanted to use a TeX-based solution. Having looked around, I stumbled upon some truly amazing video presentations of Arabic typesetting work being undertaken by Idris Hamid and Hans Hagen, using a tool called LuaTeX: something I’d never heard of. I was truly stunned by what I saw: the quality of their Arabic typesetting was (is) incredible, so I had to find out more. A few hours later I’d worked out that the typesetting was being achieved through Hans Hagen’s ConTeXt package, with LuaTeX as the underlying TeX engine. I’m not a ConTeXt user myself, but the LuaTeX engine was just so interesting that I had to explore it. Well, two years later, I’ve not done any further learning of Arabic, having replaced that activity with plenty of explorations into LuaTeX and a host of other technologies, particularly OpenType and Unicode.

Coming up to the present day, I’ve finally reached the point where I have puzzled out enough detail of the “big picture” to attempt a home-grown Arabic typesetting solution for LuaTeX, but one where most of the “heavy lifting” is done in C, with Lua code to interface with and talk to LuaTeX. For sure, there are ready-made options such as XeTeX or the range of Arabic typesetting solutions created by the TeX community. However, my interest is in creating a solution that will just as easily output SVG or other non-PDF formats, plus allow the automated production of new and novel “typeset structures” and diagrams that will really help with learning Arabic: things I wish had been present in the many books I have bought and studied but which may just be too time-consuming, or too difficult or expensive, to produce with “conventional” applications. These are big goals, but definitely achievable, albeit over a year or two of further work.

Sample

Just by way of an early example, see the following PDF, as usual, through the Google Docs viewer or download the PDF here. The trained eye will certainly spot a few issues that need fixing but so far it’s not looking too bad :-). But there is a long, long way to go yet. The font used is Microsoft’s “Arabic Typesetting” because it contains a substantial number of OpenType features, including cursive positioning, mark-to-base positioning and an enormous range of ligatures, plus many other features which make it an ideal choice of font to work with (in my opinion). In the example (the made-up words) you can see the non-horizontal baseline achieved with cursive positioning, plus the ability to control vowel placement with great flexibility.

But it’s still far from perfect, I’ll readily admit. I hope I can finish this work, and find the time to complete these articles. I’ll certainly try!

Video: Integrating LuaTeX and Microsoft Word (passing math formulae)

Introduction

Just a short post, this time with my first venture into screencasts. No sound in this one, sorry, but there will be in any future videos.

From LuaTeX to Word

The following simple demo shows equation material in LuaTeX being passed to Microsoft Word. It does not use any plug-ins for Word and it is not using clipboard techniques: standard Windows automation technologies only. I’ll explain the details in a future post and provide code, but I’ve only just got it working so the code is very “alpha stage” and needs polishing. The techniques open the door to full data transfer between LuaTeX and Microsoft Word, including MathML, and may offer some very powerful document conversion opportunities.

The demo is a very simple one, just passing some basic formulae to Word from LuaTeX. At the top left of the screen is the LuaTeX PDF output displayed by Evince; at the bottom left is Microsoft Word receiving and displaying math formulae sent from LuaTeX.

The video resolution supports good full-screen display: just click the button provided by the video player. Stay tuned for future videos :-).

This looks useful: Gow – The lightweight alternative to Cygwin

Just received notification of this:

“Gow (Gnu On Windows) is the lightweight alternative to Cygwin. It uses a convenient Windows installer that installs about 130 extremely useful open source UNIX applications compiled as native win32 binaries. It is designed to be as small as possible, about 10 MB, as opposed to Cygwin which can run well over 100 MB depending upon options.”

https://github.com/bmatzelle/gow/wiki

Just off to grab this now!

Quick and dirty method for creating spot colours in PDFs

Introduction

Just a 10-minute hack to explore putting spot colours into a PDF via pdf_colorstack nodes. I don’t have access to Acrobat Professional at the moment to check the separations properly, so treat this as an “alpha” method (i.e., not fully tested…). The colour defined below is lifted straight from an early PDF specification and implemented via LuaTeX nodes. As it says “on the tin”: a quick and dirty method :-).

\pdfoutput=1
\hoffset-1in
\voffset-1in
\nopagenumbers
\pdfcompresslevel=0

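\directlua {
% No blank lines in here: TeX turns each one into \par, which is not valid Lua.
% Tint transform: a PostScript calculator function mapping the tint to CMYK.
local n = pdf.immediateobj("stream", "{ dup 0.84 mul
exch 0.00 exch dup 0.44 mul
exch 0.21 mul
}", "/FunctionType 4
/Domain [0.0 1.0]
/Range [0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0] ")
% The Separation colour space, with DeviceCMYK as the alternate space.
local o = pdf.immediateobj("[ /Separation
/LogoGreen
/DeviceCMYK".." "..n.." 0 R]")
% Make the colour space available as a page resource.
pdf.pageresources = " /ColorSpace << /LogoGreen "..o.." 0 R >> "
local pdf_colstart = node.new("whatsit","pdf_colorstack")
local pdf_end = node.new("whatsit","pdf_colorstack")
% cmd=1 pushes the colour, at full tint, for both stroking and filling.
pdf_colstart.data = "/LogoGreen CS /LogoGreen cs 1 SC 1 sc "
pdf_colstart.cmd = 1
% cmd=2 pops it again, restoring the previous colour.
pdf_end.data = " "
pdf_end.cmd = 2
% Park the two whatsits in box registers so that TeX macros can copy them.
tex.box[1999] = node.hpack(pdf_colstart)
tex.box[2000] = node.hpack(pdf_end)
}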

\def\makeitgreen#1{\copy1999\relax#1\copy2000\relax}

There are many variations of passages of Lorem Ipsum available, but the majority have suffered alteration in some form, by injected humour, or randomised words which don't look even slightly believable. If you are going to use a passage of Lorem Ipsum, you need to be sure there isn't anything \makeitgreen{embarrassing hidden in the middle of text. All the Lorem Ipsum generators on the Internet tend to}  repeat predefined chunks as necessary, making this the first true generator on the Internet. It uses a dictionary of over 200 Latin words, combined with a handful of model sentence structures, to generate Lorem Ipsum which looks reasonable. The generated Lorem Ipsum is therefore always free from repetition, injected humour, or non-characteristic words etc.

\bye
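The tint transform in the listing above is a small PostScript calculator function, and it is not obvious at a glance what it leaves on the stack. The following is a rough sketch, in plain Lua (run it with a stand-alone interpreter such as texlua, not inside \directlua), of a tiny evaluator that handles only the operators that function uses. The name eval_tint is made up for this illustration and it is in no way a general PostScript interpreter.

```lua
-- Minimal evaluator for the few PostScript operators used in the tint
-- transform: numeric literals, dup, exch and mul. Nothing else is supported.
local function eval_tint(program, tint)
  local stack = { tint }
  for token in program:gmatch("%S+") do
    local n = tonumber(token)
    if n then
      stack[#stack + 1] = n                      -- push a literal
    elseif token == "dup" then
      stack[#stack + 1] = stack[#stack]          -- duplicate the top item
    elseif token == "exch" then
      local top = #stack                         -- swap the top two items
      stack[top], stack[top - 1] = stack[top - 1], stack[top]
    elseif token == "mul" then
      local b = table.remove(stack)              -- multiply the top two items
      local a = table.remove(stack)
      stack[#stack + 1] = a * b
    else
      error("unsupported operator: " .. token)
    end
  end
  return stack
end

-- The body of the function from the listing, without the surrounding braces.
local program = "dup 0.84 mul exch 0.00 exch dup 0.44 mul exch 0.21 mul"

local cmyk = eval_tint(program, 1.0)
print(string.format("C=%.2f M=%.2f Y=%.2f K=%.2f",
  cmyk[1], cmyk[2], cmyk[3], cmyk[4]))
-- prints "C=0.84 M=0.00 Y=0.44 K=0.21"
```

So a tint t maps to the CMYK values (0.84t, 0, 0.44t, 0.21t), which is what a viewer or RIP falls back on when the LogoGreen separation itself is not available. That is also why /Range in the listing has four pairs, one per CMYK component.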

Resulting PDF

As usual, through the Google Docs viewer or download here.