Adding a UTF-8-capable regular expression library to LuaTeX

Introduction

In this post I’m going to sketch out adding the free PCRE C library to LuaTeX through a DLL and outline how you can get PCRE to call LuaTeX! The following is just an outline of an experiment, not a tutorial on PCRE, and I’ve not tried this in a production environment. So, do please undertake all necessary testing and due diligence in your own code!

PCRE: Perl Compatible Regular Expressions

PCRE is a mature C library which provides a very powerful regular expression engine. It is also capable of working with UTF-8 encoded strings, which is, of course, very useful because LuaTeX uses UTF-8 input. I’m not going to cover the entire PCRE build process in this post because, frankly, it’ll take too long. But in outline…

Building PCRE as a static library (.lib)

  1. I used CMake to create a Visual Studio 2008 project via the PCRE-supplied CMakeLists.txt file. Using the CMake tool you can set the appropriate compile-time flags for UTF-8 support: PCRE_SUPPORT_UTF and PCRE_SUPPORT_UNICODE_PROPERTIES. The latter is very useful for searching UTF-8 strings based on their Unicode character properties. Full details are in the PCRE documentation.
  2. After you finish configuring the PCRE build, and have selected your build environment, press Generate and CMake will output a complete Visual Studio project that you can open and start working on. Wonderful!
  3. Building PCRE as a static library was fine, but I did have a few hassles getting the library to link correctly against the DLL I was building. It took me a bit of time to figure out which additional PCRE preprocessor directives I needed to set in the DLL C code to ensure everything was #define’d properly (for example, PCRE_STATIC, which must be defined before including pcre.h when you link against the static library).

Building a DLL for LuaTeX

I wrote a very brief overview of building DLLs for LuaTeX in this post so I won’t repeat the details here. Instead, I’ll give a summary indicating how you can get PCRE to call LuaTeX. One word of advice: PCRE comes with a lot of documentation and you’ll need to read through it very carefully! Asking PCRE to call LuaTeX sounds strange, but you can indeed do it because PCRE lets you register a callback function that it calls whenever the matcher reaches a callout point, written (?C), in your pattern. Perl has a similar ability, via its (?{ ... }) construct, to execute Perl code in the middle of a match. From the PCRE documentation:

“PCRE provides a feature called ‘callout’, which is a means of temporarily passing control to the caller of PCRE in the middle of pattern matching. The caller of PCRE provides an external function by putting its entry point in the global variable pcre_callout.”

Calling LuaTeX

OK, so how do we do that? There are two parts to this story: create a Lua function you want to call from C and create the C function which calls the Lua function.

  1. From within LuaTeX, use \directlua{...} to create a simple Lua function, printy, that we are going to call from PCRE. This Lua function takes a string and sends it to LuaTeX via tex.print(). In this example the callout sends LuaTeX a simple text string, "Yo! I was called!", which LuaTeX then typesets. Of course, you could also send LuaTeX the string that was matched by PCRE!
           \directlua{
                  function printy (str)
                         tex.print(str)
                  end
           }
    
  2. The next part is to create the C code that calls the Lua function. This C function is the callout that PCRE will call each time matching reaches a callout point in the pattern.
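           #include <pcre.h>
           #include <lua.h>

           int mycallout(pcre_callout_block *cb)
           {
                  /* the lua_State stored in callout_data before calling pcre_exec() */
                  lua_State *L = (lua_State *) cb->callout_data;
                  if (L) {
                         lua_getglobal(L, "printy");       /* push the Lua function */
                         if (!lua_isfunction(L, -1)) {
                                lua_pop(L, 1);             /* not a function: discard it */
                                return 0;
                         }
                         lua_pushstring(L, "Yo! I was called!");   /* push 1st argument */
                         /* Now make the call to printy with 1 argument and 0 results */
                         if (lua_pcall(L, 1, 0, 0) != 0) {
                                /* report your error (message is lua_tostring(L, -1)), then: */
                                lua_pop(L, 1);             /* remove the error message */
                                return 0;
                         }
                  }
                  return 0;   /* zero: matching proceeds as normal */
           }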
    

    A few points here are worth noting.

    • From the PCRE documentation:

      “The external callout function returns an integer to PCRE. If the value is zero, matching proceeds as normal. If the value is greater than zero, matching fails at the current point, but the testing of other matching possibilities goes ahead, just as if a lookahead assertion had failed. If the value is less than zero, the match is abandoned, the matching function returns the negative value”

    • The lua_State variable, *L, is passed in via a mechanism I’ll outline below.
    • The call to lua_getglobal does the main work of pushing the value of the global variable printy onto Lua’s stack. In effect, this value is a reference to the function we defined in LuaTeX, which we then call through lua_pcall(...). Further details are in the Lua reference manual.
    • The above code does near-zero error checking, it is purely to demonstrate the ideas!

Other PCRE bits and pieces

There are a few other points to consider, namely how do you set up the callout and how do you pass lua_State *L to it? I’m not going to explain in great detail how all these parts hang together in a full application; I’ll simply point out some key pieces.

  1. You have to set the PCRE global variable pcre_callout, a function pointer, to your callout function. It is as simple as pcre_callout = mycallout; and yes, it does work.
  2. Before you can start searching, you need to “compile” your regular expression pattern. Here, re represents our compiled regular expression pattern. Note that you must use the PCRE_UTF8 option if you are searching UTF-8 encoded text; PCRE_UCP additionally makes \d, \w, \s and related escapes use Unicode properties rather than matching ASCII characters only.
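                  re = pcre_compile(pattern,
                                    PCRE_UTF8 | PCRE_UCP,  /* UTF-8 mode, Unicode properties */
                                    &err_msg,              /* receives the error message */
                                    &err,                  /* receives the error offset in the pattern */
                                    NULL);                 /* use PCRE's default character tables */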
    
  3. Note, to use PCRE callouts you need to use the appropriate syntax in your regular expression; from the PCRE documentation, “Within a regular expression, (?C) indicates the points at which the external function is to be called.” Once you have compiled your search pattern, and done your error checking, you need to run the search engine using the compiled pattern and your target string (s) in the code below.
  4. The next step is to create a pcre_extra struct. This struct has a field called callout_data, a pointer into which you can store whatever you want to pass into the mycallout function: here, I’m setting it to the lua_State variable, L, and setting the PCRE_EXTRA_CALLOUT_DATA flag so that PCRE knows the field is valid. PCRE hands this pointer to the callout in the callout_data field of the pcre_callout_block it passes in, so each time PCRE calls the callout function the lua_State variable, L, will be available for our use! Clearly, you’ll need to do this from within the appropriate function you call from LuaTeX, which is where L comes from. Once this is done you are ready to begin your searching using pcre_exec(...).

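                  pcre_extra extra;                         /* local struct: nothing to free */
                  memset(&extra, 0, sizeof extra);
                  extra.flags = PCRE_EXTRA_CALLOUT_DATA;    /* callout_data is valid */
                  extra.callout_data = L;                   /* arrives as cb->callout_data */
                  /* res > 0: matched; res < 0: no match or an error, including
                     any negative value returned by the callout */
                         res = pcre_exec(re,
                                 &extra,
                                 s,          /* UTF-8 subject string */
                                 len,        /* subject length in bytes */
                                 0,          /* start offset */
                                 0,          /* options */
                                 offsets,    /* ovector: OVECMAX ints, a multiple of 3 */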
                         OVECMAX);
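One detail worth knowing when you hand results back to LuaTeX: the offsets that pcre_exec() stores in the ovector are byte offsets into the subject, even in UTF-8 mode, so they are not character positions once the text contains Arabic or other non-ASCII characters. (PCRE’s own pcre_copy_substring() will extract a captured substring for you.) If you need a character index, a minimal helper can count the UTF-8 lead bytes before an offset. This is a sketch assuming the subject is valid UTF-8, which pcre_exec() checks by default in UTF-8 mode; the name utf8_char_index is my own, not part of PCRE.

```c
#include <assert.h>
#include <stddef.h>

/* Number of characters in the first nbytes bytes of a UTF-8 string.
   Every UTF-8 character has exactly one byte that is not a continuation
   byte (continuation bytes look like 10xxxxxx), so counting the other
   bytes counts characters. Assumes valid UTF-8 and that nbytes falls on
   a character boundary, as PCRE's match offsets normally do in UTF-8 mode. */
size_t utf8_char_index(const char *s, size_t nbytes)
{
    size_t count = 0;
    assert(s != NULL || nbytes == 0);
    for (size_t i = 0; i < nbytes; i++)
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            count++;
    return count;
}
```

For example, the character index at which the whole match starts would be utf8_char_index(s, offsets[0]).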
    

Summary

PCRE is a marvellous and powerful C library – with copious documentation that you’ll need to read very carefully! The ability to provide LuaTeX with a UTF-8-enabled regex engine could open the way to some useful applications, particularly when combined with LuaTeX’s own callback mechanism: for example, the process_input_buffer callback, which allows you to change the contents of the line input buffer just before LuaTeX actually starts looking at it. The mind boggles at the possibilities!

Browsing LuaTeX source with NetBeans

Introduction

It’s been a long time since I posted anything on this blog, mainly because my job has been keeping me very busy. As time permits I’ve been reading parts of the LuaTeX source code in an attempt to better understand how it all works: cross-referencing the source code to explanations in the LuaTeX Reference. A couple of days ago I stumbled on the NetBeans IDE – a free Integrated Development Environment. I was interested to see that NetBeans has a Subversion Checkout Wizard (i.e., built-in SVN capabilities), so you can check out a copy of the LuaTeX code repository and import it directly into NetBeans as a new project. So, I downloaded NetBeans (with C/C++ support) and checked out a copy of the LuaTeX code base, directly from within NetBeans. After completing the download, NetBeans automatically imported the LuaTeX code to create a new project. Very nice!

However, I have not tried to build LuaTeX using NetBeans (I need to understand more about the build process first), but I have found that it provides excellent tools to search and browse the source code, allowing you to explore and probe some of the deeper mysteries of TeX very quickly.

Tip: tell NetBeans about .w files

Much of the LuaTeX code base is written in CWEB (integrated C source code and documentation); consequently, many of the source files have a .w extension. You’ll need to configure NetBeans to tell it about .w files: see Tools –> Options –> Miscellaneous.

Here’s a screenshot of a search for the build_page() function, part of TeX’s page-building machinery, showing you where and when TeX exercises the page builder.

Typesetting Arabic with LuaTeX: Part 2 (documentation, tools and libraries)

Introduction

I’ve been thinking about the next article in this series and what it should address, so I’ve decided to skip ahead and give a summary of the documentation, tools and libraries which made it possible for me to experiment with typesetting Arabic. I’m listing these because it actually took a long time to assemble the reading materials and tools required, so it may just save somebody, somewhere, the many hours I spent hunting it all down. For sure, there’s a ton of stuff I want to write about, in an attempt to piece together the various concepts and ideas involved in gaining a better understanding of Unicode, OpenType and Arabic text typesetting/display. However, I’m soon to start a new job, which means I’ll have less time to devote to this blog, so I’ll try to post as much as I can over the next couple of weeks.

Just for completeness, I should say that, for sure, you can implement Arabic layout/typesetting for LuaTeX in pure Lua code, as the ConTeXt distribution has done, through the quite incredible work of Idris Hamid and Hans Hagen.

Documentation

There is a lot to read. Here are some resources that are either essential or helpful.

Unicode

Clearly, you’ll need to read relevant parts of the Unicode Standard. Here’s my suggested minimal reading list.

  • Chapter 8: Middle Eastern Scripts. This gives an extremely useful description of cursive joining and a model for implementing contextual analysis.
  • Unicode ranges for Arabic (see also these posts). You’ll need the Unicode code charts for Arabic (PDFs downloadable and listed under Middle Eastern Scripts, here)
  • Unicode Bidirectional Algorithm. Can’t say that I’ve really read this properly, and certainly not yet implemented anything to handle mixed runs of text, but you certainly need it.
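The contextual-analysis model in the Unicode chapter comes down to a few rules. Each character has a joining type (listed in the Unicode data file ArabicShaping.txt), such as dual-joining (D), right-joining (R), non-joining (U) or transparent (T, e.g. vowel marks); transparent characters are skipped when looking for neighbours. A character joins its predecessor if the predecessor is dual-joining, and joins its successor if it is itself dual-joining and the successor is dual- or right-joining; the two answers select the isolated, initial, medial or final form. The C sketch below assumes a hard-coded table of just six characters and ignores join-causing characters such as ZERO WIDTH JOINER and TATWEEL, so treat it as an illustration of the rules rather than a complete implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Joining types (a subset of those in ArabicShaping.txt). */
typedef enum { JT_U, JT_R, JT_D, JT_T } JoinType;  /* non-, right-, dual-joining, transparent */
typedef enum { ISOL, INIT, MEDI, FINA } Form;

/* Illustrative table only: a real implementation would load the data
   from ArabicShaping.txt rather than hard-code a handful of letters. */
static JoinType joining_type(unsigned cp)
{
    switch (cp) {
    case 0x0628:            /* BEH */
    case 0x0644:            /* LAM */
    case 0x0645:            /* MEEM */
        return JT_D;
    case 0x0627:            /* ALEF */
    case 0x062F:            /* DAL */
        return JT_R;
    case 0x064E:            /* FATHA (vowel mark) */
        return JT_T;
    default:
        return JT_U;
    }
}

/* Joining type of the nearest non-transparent neighbour of text[i];
   step is -1 for the previous character, +1 for the next one. */
static JoinType neighbour(const unsigned *text, size_t n, size_t i, int step)
{
    size_t j = i;
    for (;;) {
        if (step < 0) {
            if (j == 0) return JT_U;
            j--;
        } else {
            if (j + 1 >= n) return JT_U;
            j++;
        }
        JoinType t = joining_type(text[j]);
        if (t != JT_T) return t;
    }
}

/* Choose a form for each of the n codepoints in text (logical order). */
void contextual_forms(const unsigned *text, size_t n, Form *forms)
{
    assert(n == 0 || (text != NULL && forms != NULL));
    for (size_t i = 0; i < n; i++) {
        JoinType t = joining_type(text[i]);
        forms[i] = ISOL;                /* non-joining and transparent stay ISOL */
        if (t != JT_D && t != JT_R)
            continue;
        JoinType prev = neighbour(text, n, i, -1);
        JoinType next = neighbour(text, n, i, +1);
        int joins_prev = (prev == JT_D);
        int joins_next = (t == JT_D) && (next == JT_D || next == JT_R);
        if (joins_prev && joins_next) forms[i] = MEDI;
        else if (joins_prev)          forms[i] = FINA;
        else if (joins_next)          forms[i] = INIT;
    }
}
```

For example, for BEH ALEF BEH (باب) it assigns initial, final and isolated forms, because ALEF is right-joining and cannot connect to the letter that follows it.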

OpenType

Whether you are interested in eBooks, conventional typesetting or the WOFF standard, these days a working knowledge of OpenType font technology is very useful. If you want to explore typesetting Arabic then it’s simply essential.

C libraries

It’s always a good idea to leverage the work of true experts, especially if it is provided for free as an open source library! I spent a lot of time hunting for libraries, so here is my summary of what I found and what I eventually settled on using.

  • IBM’s ICU: Initially, I looked at using IBM’s International Components for Unicode but, for my requirements, it was serious overkill. It is a truly vast and powerful open source library (for C/C++ and Java) if you need the wealth of features it provides.
  • HarfBuzz: This is an interesting and ongoing development. The HarfBuzz OpenType text shaping engine looks like it will become extremely useful; although I had a mixed experience trying to build it on Windows, which is almost certainly due to my limitations, not those of the library. If you’re Linux-based then no doubt it’ll be fine for you. As it matures to a stable release I’ll definitely take another look.
  • GNU FriBidi: As mentioned above, the Unicode Bidirectional Algorithm is essential for a full implementation of displaying (eBooks, browsers etc.) or typesetting mixed left-to-right and right-to-left scripts. Fortunately, there’s a free and standard implementation of it available as a C library: GNU FriBidi. I’ve not yet reached the point of being able to use it, but it’s the one I’ll choose.

My libraries of choice

Eventually, I settled on FreeType and libotf. You need to use them together because libotf depends on FreeType. Both libraries are mature and easy to use and I simply cannot praise these libraries too highly. Clearly, this is my own personal bias and preference but ease of use rates extremely highly on my list of requirements. FreeType has superb documentation whereas libotf does not, although it has some detailed comments within the main #include file. I’ll definitely post a short “getting started with libotf” guide because it is not difficult to use (once you’ve worked it out!).

libotf: words are not enough!

I’m mindful that I’ve not yet explained how all these libraries work together, or what they do, but I just have to say that libotf is utterly superb. libotf provides a set of functions which “drive” the features and lookups contained in an OpenType font, allowing you to pass in a Unicode string and apply OpenType tables to generate the corresponding sequence of glyphs which you can subsequently render. Of course, for Arabic you also need to perform contextual analysis to select the appropriate joining forms, but once that is done libotf lets you take full advantage of any advanced typesetting features present in the font.

UTF-8 encoding/decoding

To pass Unicode strings between your C code and LuaTeX you’ll be using UTF-8, so you will need to encode and decode UTF-8 from within your C code. Encoding is easy and has been covered elsewhere on this site. For decoding UTF-8 into codepoints I use The Flexible and Economical UTF-8 Decoder.
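Both directions fit in a few lines of C. Below is a minimal, straightforward sketch (it is not the Flexible and Economical UTF-8 Decoder, which uses a table-driven state machine) that rejects overlong encodings, UTF-16 surrogates and values above U+10FFFF. The function names utf8_encode and utf8_decode are my own.

```c
#include <assert.h>
#include <stddef.h>

/* Encode codepoint cp as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 for surrogates and values
   above U+10FFFF, which have no valid UTF-8 encoding. */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                               /* UTF-16 surrogates */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Decode one codepoint from s (len bytes available) into *cp.
   Returns the number of bytes consumed, or 0 for malformed input. */
size_t utf8_decode(const unsigned char *s, size_t len, unsigned long *cp)
{
    /* smallest codepoint allowed for each sequence length (no overlongs) */
    static const unsigned long min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t n;
    unsigned long c;

    assert(cp != NULL);
    if (len == 0)
        return 0;
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0)      { n = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; c = s[0] & 0x07; }
    else
        return 0;                   /* stray continuation byte or invalid lead */

    if (len < n)
        return 0;                   /* truncated sequence */
    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;               /* expected a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min_cp[n] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;                   /* overlong, out of range or surrogate */
    *cp = c;
    return n;
}
```

For example, utf8_encode(0x0628, buf) writes the two bytes D8 A8 for ARABIC LETTER BEH, and utf8_decode() turns them back into 0x0628.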

Desktop software

In piecing together my current understanding of Unicode and OpenType I found the following software to be indispensable. Some of these are Windows-only applications.

  • VOLT: Microsoft’s excellent and free VOLT (Visual OpenType Layout Tool). I’ll certainly try to write an introduction to VOLT, but you can also download the free Volt Training Video.
  • Font editors: Fontlab Studio 5 (commercial) or FontForge (free).
  • Adobe FDK: The Adobe Font Development Kit contains some excellent utilities and I highly recommend it.
  • Character browser: To assist with learning/exploring Unicode I used the Unibook character browser.
  • BabelPad: Absolutely superb Windows-based Unicode text editor. Packed with features that can assist with understanding Unicode and the rendering of complex scripts. For example, the ability to toggle complex rendering so that you can edit Arabic text without any Uniscribe shaping being applied.
  • BabelMap: Unicode Character Map for Windows is another great tool from the author of BabelPad.
  • High-quality Arabic fonts. By “high quality” I don’t just mean the design and hinting but also the number of OpenType features implemented or contained in the font itself, such as cursive positioning, ligatures, vowel placement (mark to base, mark to ligature, mark to mark etc). My personal favourite is Arabic Typesetting (shipped with Windows), but SIL International also provides a free Arabic font called Scheherazade.

TIP: Microsoft VOLT and the Arabic Typesetting or Scheherazade fonts. I’ll talk about VOLT in more detail later, but Microsoft and SIL provide “VOLT versions” of their respective Arabic fonts. These are absolutely invaluable resources for understanding advanced OpenType concepts, and if you want to learn more I strongly recommend taking a look at them.

  • The VOLT version of the Arabic Typesetting font is shipped with the VOLT installer and is contained within a file called “VoltSupplementalFiles.exe”, so just run that to extract the VOLT version.
  • The VOLT version of Scheherazade is made available as a download from SIL.

I can only offer my humble thanks to the people who created these resources and made them available for free: a truly substantial amount of work is involved in creating them.

LuaCOM: connecting LuaTeX to Windows automation

Introduction

The Windows operating system provides a technology called COM, which stands for Component Object Model. In essence, it provides a way for software components and applications to “talk to each other”. That’s a gross oversimplification but it gives the general idea. It’s now an old technology but nevertheless it is still very powerful; over the years I’ve used it quite extensively for automating various publishing/production tasks. In those days I did it with Perl, using a module called Win32::OLE.

Of course, applications have to be written to support COM so you can think of COM-aware applications as offering a “set of services” that you can call — many applications provide the ability to call those services from scripting languages which have support for COM (via modules/plugins etc), such as Perl, Ruby and, of course, Lua via LuaCOM. A combination of COM-aware applications and scripting languages with COM support provides a very flexible way to “glue together” all sorts of different applications to create novel automated workflows/processes.

Using COM from within scripting languages is fairly straightforward, but under the surface COM is, to me anyway, a complex beast indeed. The best low-level COM programming tutorials I have ever read are published on codeproject.com, written by Michael Dunn. Here’s one such tutorial: Introduction to COM – What It Is and How to Use It.

LuaCOM

LuaCOM lets you use COM in your Lua scripts, i.e., it is a Lua binding to COM. I don’t know if there are freely available builds of the latest version (possibly bundled with Windows distributions of Lua), but you can download and compile the latest version from GitHub.

LuaCOM is a DLL (Dynamic Link Library) that you load using the standard “require” feature of Lua. For example, to start Microsoft Word from within your Lua code, using LuaCOM, you would do something like this:

local com = require("luacom")
-- should be CreateObject not GetObject!
local Word = com.CreateObject("Word.Application")
Word.Visible = 1
local doc = Word.Documents:Open("g:/x.docx")

Naturally, the Microsoft Office applications have very extensive support for COM and offer a huge number of functions that you can call should you wish to automate a workflow process via COM from within Lua. For example, you can access all the native equation objects within a Word document (read, write, create and convert equations…). If you have watched this video and wondered how I got LuaTeX and Word to talk to each other, now you know: LuaCOM provided the glue.

Using LuaTeX to create SVG of typeset formulae

Introduction

This is a current work-in-progress so I’ll keep it brief and outline the ideas.

There are, of course, a number of tools available to generate SVG from TeX or, more correctly, SVG from DVI. Indeed, I wrote one such tool myself some 9 years ago: as an event-driven COM object which fired events to a Perl backend. For sure, DVI to SVG works but with LuaTeX you can do it differently and, in my opinion, in a much more natural and integrated way. The key is the node structures which result from typeset maths. By parsing the node tree you can create the data to construct the layout and generate SVG (or whatever you like).

Math node structures

Let’s take some plain TeX math and store it in a box:

\setbox101=\hbox{$\displaystyle\eqalign{\left| {1 \over \zeta - z - h} - 
{1 \over \zeta - z} \right| & = \left|{(\zeta - z) - (\zeta - z - h) \over (\zeta - z - h)(\zeta - z)}
\right| \cr & =\left| {h \over (\zeta - z - h)(\zeta - z)} \right| \cr
& \leq {2 |h| \over |\zeta - z|^2}.\cr}$}

What does the internal node structure, resulting from this math, actually look like? Well, it looks pretty complicated but in reality it’s quite easy to parse with recursive functions, visiting each node in turn and exporting the data contained in each node. Note that you must take care to preserve context by “opening” and “closing” the node data for each hlist or vlist as you return from each level of recursion.
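The open/close bookkeeping can be seen in isolation with a small C model. The Node struct here is a hypothetical, drastically simplified stand-in for LuaTeX’s nodes (four node types, one data field), and a single depth counter replaces separate hlist and vlist depths. It only demonstrates how recursion plus explicit “open” and “close” records preserves the nesting when the tree is flattened into a linear stream of records.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, much-simplified stand-in for a LuaTeX node. */
typedef enum { NODE_HLIST, NODE_VLIST, NODE_GLYPH, NODE_KERN } NodeType;

typedef struct Node {
    NodeType type;
    int value;              /* character code (GLYPH) or amount in sp (KERN) */
    struct Node *list;      /* contents of an HLIST or VLIST box */
    struct Node *next;      /* next node at the same level */
} Node;

/* Append formatted text to the NUL-terminated string in buf (size cap). */
static void emit(char *buf, size_t cap, const char *fmt, ...)
{
    size_t used = strlen(buf);
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(buf + used, cap - used, fmt, ap);
    va_end(ap);
}

/* Walk a node list, appending one record per node to buf, which must
   already hold a NUL-terminated string (start with ""). A box is
   "opened" before its contents are visited and "closed" afterwards. */
void emit_nodes(const Node *head, int depth, char *buf, size_t cap)
{
    assert(buf != NULL && cap > 0);
    for (; head != NULL; head = head->next) {
        switch (head->type) {
        case NODE_HLIST:
        case NODE_VLIST: {
            const char *name = (head->type == NODE_HLIST) ? "HLIST" : "VLIST";
            emit(buf, cap, "%sopen(%d)\n", name, depth + 1);
            emit_nodes(head->list, depth + 1, buf, cap);   /* recurse into the box */
            emit(buf, cap, "%sclose(%d)\n", name, depth + 1);
            break;
        }
        case NODE_GLYPH:
            emit(buf, cap, "GLYPH(%d)\n", head->value);
            break;
        case NODE_KERN:
            emit(buf, cap, "KERN(%d)\n", head->value);
            break;
        }
    }
}
```

An hbox containing a glyph and a kern, followed by another glyph, comes out as HLISTopen(1), GLYPH(49), KERN(100), HLISTclose(1), GLYPH(50): the close record is what tells a consumer that the second glyph is back outside the box.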

Download PDF

The idea is that you pass box101 to a Lua function which starts at the root node of the box and works its way through and down the node tree. One such function might look like this:


<<snip lots of code>>

local HLIST = node.id("hlist")
local VLIST = node.id("vlist")

-- hdepth and vdepth (current hlist/vlist nesting depths) and the mnodes
-- table of node handlers are assumed to be set up in the snipped code above
function listnodes(head)
  while head do
    local id = head.id
    -- export this node's data (boxes are also told their nesting depth)
    if id == HLIST then
      mnodes.nodedispatch[id](head, hdepth + 1)
    elseif id == VLIST then
      mnodes.nodedispatch[id](head, vdepth + 1)
    else
      mnodes.nodedispatch[id](head)
    end
    -- recurse into a box's contents, then "close" the box on the way back
    if id == HLIST then
      hdepth = hdepth + 1
      listnodes(head.list)
      mnodes.close(id, hdepth)
      hdepth = hdepth - 1
    elseif id == VLIST then
      vdepth = vdepth + 1
      listnodes(head.list)
      mnodes.close(id, vdepth)
      vdepth = vdepth - 1
    end
    head = head.next
  end
end

What you do with the data in each node depends on your objectives. My preference (current thinking) is to generate a “Lua program” which is a set of Lua functions that you can run to do the conversion. The function definitions are dictated by the conversion you want to perform. For example, the result of parsing the node tree could be something like this (lots of lines omitted):

HLISTopen(1,13295174,0,2445314,2772995,0,0,0)
MATH(0,0)
HLISTopen(2,0,0,0,0,0,0,0)
HLISTclose(2)
GLUE(skip,109224,0,0,0,0)
VLISTopen(1,13076726,0,2445314,2772995,0,0,0)
HLISTopen(2,13076726,0,622600,950280,0,0,0)
GLUE(tabskip,0,0,0,0,0)
HLISTopen(3,5670377,0,622600,950280,0,0,0)
RULE(0,557056,229376)
GLUE(skip,0,65536,0,2,0)
MATH(0,0)
HLISTopen(4,5670377,0,622600,950280,0,0,0)
HLISTopen(5,5670377,0,622600,950280,0,0,0)
VLISTopen(2,218453,-950280,1572880,0,0,0,0)
HLISTopen(6,218453,0,393220,0,0,0,0)
GLYPH(12,19,218453,0,393220)
HLISTclose(6)
KERN(0,0)
HLISTopen(6,218453,0,393220,0,0,0,0)
GLYPH(12,19,218453,0,393220)
HLISTclose(6)
KERN(0,0)
HLISTopen(6,218453,0,393220,0,0,0,0)
GLYPH(12,19,218453,0,393220)
HLISTclose(6)
KERN(0,0)
HLISTopen(6,218453,0,393220,0,0,0,0)
GLYPH(12,19,218453,0,393220)
HLISTclose(6)
VLISTclose(2)
HLISTopen(6,2805531,0,576976,865699,0,0,0)
HLISTopen(7,2805531,0,576976,865699,0,0,0)
HLISTopen(8,78643,-163840,0,0,0,0,0)
HLISTclose(8)
VLISTopen(2,2648245,0,576976,865699,0,0,0)
HLISTopen(8,2648245,0,0,422343,2,1,17)
GLUE(skip,0,65536,65536,2,2)
GLYPH(49,1,327681,422343,0)
GLUE(skip,0,65536,65536,2,2)
HLISTclose(8)
KERN(266409,0)
RULE(-1073741824,26214,0)
KERN(145167,0)
HLISTopen(8,2648245,0,127431,455111,0,0,0)
GLYPH(16,7,286721,455111,127431)
KERN(48355,0)
GLUE(medmuskip,145632,72816,145632,0,0)
GLYPH(0,13,509726,382293,54613)
GLUE(medmuskip,145632,72816,145632,0,0)
GLYPH(122,7,304775,282168,0)
KERN(28823,0)
GLUE(medmuskip,145632,72816,145632,0,0)
GLYPH(0,13,509726,382293,54613)
GLUE(medmuskip,145632,72816,145632,0,0)
GLYPH(104,7,377591,455111,0)
HLISTclose(8)
VLISTclose(2)
HLISTopen(8,78643,-163840,0,0,0,0,0)
HLISTclose(8)
HLISTclose(7)
HLISTclose(6)
GLUE(medmuskip,145632,72816,145632,0,0)
GLYPH(0,13,509726,382293,54613)
GLUE(medmuskip,145632,72816,145632,0,0)
HLISTopen(6,1626950,0,576976,865699,0,0,0)
HLISTopen(7,1626950,0,576976,865699,0,0,0)


<<snip loads of lines>>


GLYPH(58,7,182045,69176,0)
HLISTclose(4)
MATH(1,0)
GLUE(skip,0,65536,0,2,0)
HLISTclose(3)
GLUE(tabskip,0,0,0,0,0)
HLISTclose(2)
VLISTclose(1)
GLUE(skip,109224,0,0,0,0)
MATH(1,0)
HLISTclose(1)

At each node you can emit a “function call” such as GLUE(skip,0,65536,65536,2,2) or GLYPH(49,1,327681,422343,0), whose arguments are the data held in that node. Each of these “functions” can then be “run” by providing a suitable function body: perhaps one for SVG, one for HTML5 canvas and JavaScript, one for an EPS file, or whatever. The point is you can create whatever you like simply by emitting the appropriate data from each node.
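Whether the bodies are Lua functions, as in the emitted “Lua program”, or C functions behind a small parser, each one only needs its node’s data. Here is a minimal C sketch of an SVG body for GLYPH, with several assumptions built in: the argument order (character, font, width, height, depth) is read off the sample output rather than taken from any specification, the glyph outlines are assumed to be defined elsewhere in the SVG under hypothetical ids of the form f<font>-<char>, and the pen position is a simple global. Coordinates are written in big points, using 1 bp = 72.27/72 pt = 65781.76 sp, so the SVG’s viewBox would need to be set up in the same units.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* 1 bp (the PostScript/PDF point) = 72.27/72 TeX pt = 65781.76 sp */
#define SP_PER_BP 65781.76

static double sp_to_bp(long sp) { return sp / SP_PER_BP; }

/* Hypothetical backend state: the pen position in sp and the SVG
   fragment built so far. */
static long pen_x = 0, pen_y = 0;
static char svg[4096];

/* One possible body for the emitted GLYPH(...) record. Each glyph
   becomes a <use> element referring to an outline assumed to be defined
   elsewhere; the pen then advances by the glyph width. height and depth
   are unused here but would feed a bounding-box calculation. */
void GLYPH(int chr, int font, long width, long height, long depth)
{
    char el[128];
    assert(chr >= 0 && font >= 0);
    (void)height;
    (void)depth;
    snprintf(el, sizeof el,
             "<use xlink:href=\"#f%d-%d\" x=\"%.2f\" y=\"%.2f\"/>\n",
             font, chr, sp_to_bp(pen_x), sp_to_bp(pen_y));
    strncat(svg, el, sizeof svg - strlen(svg) - 1);
    pen_x += width;                 /* advance along the baseline */
}
```

Running GLYPH(49,1,327681,422343,0) twice places the first glyph at x = 0 and the second at roughly 4.98 bp, the width of the first glyph.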

What about glyph outlines?

Fortunately, there is a truly wonderful C library called FreeType which has everything you need to generate the spline data, even with TeX’s wonderfully arcane font encodings for the 8-bit world of Type 1 fonts. Of course, to plug FreeType into the overall solution you will need to write a linkable library: I use DLLs on Windows. FreeType is a really nice library, easy to use and made even more enjoyable by Lua’s C API.

Summary

Even though I have omitted a vast amount of detail, and the work is not yet finished, I hope you can see that LuaTeX offers great potential for new and powerful solutions.

Typesetting Arabic with LuaTeX [via a C plug-in] (Part 1)

Introduction

In this new series of posts I’m going to attempt an overview of the topics, concepts, ideas and technologies involved in typesetting Arabic with LuaTeX, via a DLL I’m writing in C. Actually, the C code is very substantially platform-independent so it should compile on non-Windows machines… one day, when it’s “finished”…

Up until 2 years ago I was teaching myself Arabic (see my Amazon book reviews) and had reached the point where I wanted to write up my notes and worked exercises: I needed to typeset Arabic and wanted to use a TeX-based solution. Having looked around I stumbled upon some truly amazing video presentations of Arabic typesetting work being undertaken by Idris Hamid and Hans Hagen, using a tool called LuaTeX: something I’d never heard of. I was truly stunned by what I saw: the quality of their Arabic typesetting was (is) incredible, so I had to find out more. A few hours later I’d worked out that the typesetting was being achieved through Hans Hagen’s ConTeXt package, with LuaTeX as the underlying TeX engine. I’m not personally a user of ConTeXt, but the LuaTeX engine was just so interesting that I had to explore it. Well, two years later I’ve not done any further learning of Arabic, having replaced that activity with plenty of explorations into LuaTeX and a host of other technologies, particularly OpenType and Unicode.

Coming up to the present day, I’ve finally reached the point where I have puzzled out enough detail of the “big picture” to attempt a home-grown Arabic typesetting solution for LuaTeX, but one where most of the “heavy lifting” is done in C, with Lua code to interface with and talk to LuaTeX. For sure, there are ready-made options such as XeTeX or the range of Arabic typesetting solutions created by the TeX community. However, my interest is creating a solution that will just as easily output SVG or other non-PDF formats, plus allow the automated production of new and novel “typeset structures” and diagrams that will really help with learning Arabic: things I wish had been present in the many books I have bought and studied but which may just be too time-consuming, or difficult/expensive, to produce by “conventional” applications. These are big goals, but definitely achievable, albeit over a year or two of further work.

Sample

Just by way of an early example, see the following PDF, as usual, through the Google Docs viewer or download PDF here. The trained eye will certainly spot a few issues that need fixing but so far it’s not looking too bad :-). But there is a long, long way to go yet. The font used is Microsoft’s “Arabic Typesetting” because it contains a substantial number of OpenType features including cursive positioning, mark-to-base positioning, an enormous range of ligatures plus many other features which make it an ideal choice of font to work with (in my opinion). In the example (made-up words) you can see the non-horizontal baseline achieved with cursive positioning plus the ability to control vowel placement with great flexibility.
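The sloping baseline comes from cursive attachment (GPOS LookupType 3): each glyph has an entry and an exit anchor, and consecutive glyphs are positioned so that one glyph’s exit anchor coincides with the next glyph’s entry anchor. A layout library such as libotf applies this lookup for you; the sketch below only shows the vertical arithmetic. It assumes the lookup’s RightToLeft flag is set (which, per the OpenType specification, keeps the last glyph of the run on the baseline), and it ignores horizontal advances and glyphs without anchors. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Entry and exit anchor heights (font units, y up) of one glyph, as a
   cursive attachment lookup would supply them. */
typedef struct {
    int entry_y;
    int exit_y;
} CursiveAnchors;

/* Vertical offset of each glyph in a cursively joined run of n glyphs
   (logical order). The exit anchor of glyph i is made to coincide with
   the entry anchor of glyph i+1:
       y[i] + exit_y[i] == y[i+1] + entry_y[i+1]
   The last glyph sits on the baseline (y = 0) and the offsets are
   accumulated backwards from it. */
void cursive_offsets(const CursiveAnchors *g, size_t n, int *y)
{
    assert(n == 0 || (g != NULL && y != NULL));
    if (n == 0)
        return;
    y[n - 1] = 0;
    for (size_t i = n - 1; i-- > 0; )
        y[i] = y[i + 1] + g[i + 1].entry_y - g[i].exit_y;
}
```

For three glyphs whose (entry, exit) anchor heights are (0, 100), (0, 50) and (20, 0), the offsets come out as -130, -30 and 0 font units, so the earlier glyphs in the run drop progressively below the baseline.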

But it’s still far from perfect, I’ll readily admit. I hope I can finish this work, and find the time to complete these articles. I’ll certainly try!