Creating graphics with LuaTeX nodes

Introduction

In this post I’ll give a very simple example of how you can use LuaTeX’s node machinery to create graphics. The example really is very basic but, hopefully, indicates what could be achieved with more sophisticated code. I hope it is interesting or useful and would certainly appreciate being advised if any of my explanations contain technical inaccuracies, which I will fix.

I have used the Google Docs viewer to display PDFs throughout.

Creating your own LuaTeX setup

If you are new to LuaTeX, just how do you start exploring this amazing piece of software? My personal preference and advice is to start your journey by creating your own super-minimal LuaTeX installation. For sure, it takes a bit of time to set up and to understand how all the bits fit together but, in my opinion, it’s really worth the effort and you learn a huge amount during the process. For my own experiments I use a “version” of the Plain TeX format simply because, compared to LaTeX, it has a far less complex page layout mechanism (i.e., \output routine). You can of course tame LaTeX’s page layout, as I have documented here, but Plain TeX is, for me, ideal for learning about the internals of LuaTeX.

A simple Lua module to help

To get started I’ve provided a very simple-minded Lua module called “nodelist” which takes a TeX box number and scans through the list of nodes from which the content of the box is built. Note that nodelist only reports on a few node types but it is easy to extend the code to report on many more node types with as much information as you want. To do that you’ll need to read The LuaTeX Reference Manual which, as of this writing, means Chapter 4 (Section 4.1) and Chapter 8.

Installing Lua modules

If you want to use the nodelist module you’ll need to put the Lua code in a location where LuaTeX will find it. That location is, or should be, defined by the LUAINPUTS entry in your texmf.cnf file. For example, in my custom setup LUAINPUTS is set to:

LUAINPUTS = .;$TEXMF/scripts

Note that the $TEXMF variable means the root directory of your TeX installation. Your LUAINPUTS variable may point to a different location, especially if you have a standard installation.

To load and initialise the nodelist module you simply issue the \directlua call:

\directlua{require("nodelist") nodelist.inilists()}

This loads the code and then runs a function nodelist.inilists() which initialises some Lua tables containing lookup data.

Avoiding catcode problems: One of the biggest advantages of putting your Lua code into a module or executing it via dofile() or loadfile() is that you avoid complications with TeX’s category codes. For a great explanation of catcodes and in-line Lua code see the LuaTeX Wiki.
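
As a minimal sketch of the point about catcodes, here is a small Lua file that uses characters which are awkward inside \directlua: the percent sign (a TeX comment character), a backslash escape and a Lua "--" comment. The file name catcode-demo.lua and the function percent_line are hypothetical, invented purely for this illustration. Because TeX never tokenises the file, none of these characters need special treatment.

```lua
-- catcode-demo.lua: hypothetical file name, used only for illustration.
-- Assumed to be loaded from TeX with: \directlua{dofile("catcode-demo.lua")}

-- Format a ratio as a whole-number percentage followed by a newline.
-- The '%' and '\n' below would need care inside \directlua, but they
-- are ordinary Lua here because TeX never reads this file.
local function percent_line(value, total)
  return string.format("%d%%\n", math.floor(100 * value / total))
end

io.write(percent_line(1, 4))
```

The same code typed directly into a \directlua call would lose everything after the first percent sign on each line, which is exactly the kind of complication a separate file avoids.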

A very basic Plain TeX page

The following example provides a basic starting point. Here, we define the PDF page to be 100mm wide and 100mm tall. In addition, we set TeX’s \vsize and \hsize parameters to ensure that the final box shipped out (box 255) is the same size as our PDF page. By default TeX places its origin 1 inch in from the top and left edges of the page, so \hoffset and \voffset are both set to -1 inch to cancel that shift. With these settings, TeX’s origin is the top-left corner of our PDF page.

\pdfoutput=1
\hoffset-1in
\voffset-1in
\pdfpageheight=100mm
\pdfpagewidth=100mm
\vsize=\pdfpageheight
\hsize=\pdfpagewidth
\topskip=0pt
\directlua{require("nodelist") nodelist.inilists()}
\output={\shipout\box255}
\noindent Hello.
\bye

The typeset output of the above code is:

Onto the graphics

The following example is deliberately kept simple so that the basic ideas are easier to understand. If you have used pdfTeX you may have experimented with the \pdfliteral facility which lets you inject raw PDF “code” into the PDF being built by pdfTeX, allowing you to create all sorts of fancy effects. LuaTeX takes this one step further and lets you create nodes containing PDF data which will be output when the PDF is generated.

The following code defines a macro \makegraphic#1#2#3{…} which draws a very simple graphic using LuaTeX nodes.

\pdfoutput=1
\hoffset-1in
\voffset-1in
\pdfpageheight=100mm
\pdfpagewidth=100mm
\vsize=\pdfpageheight
\hsize=\pdfpagewidth
\topskip=0pt
\directlua{require("nodelist") nodelist.inilists()}
\output={\shipout\box255}

% A macro to draw a graphic using \directlua
\def\makegraphic#1#2#3{
\directlua{

% Start by creating a new pdf_literal node

n = node.new("whatsit","pdf_literal")

% The mode value defines how the origin is established 
% see the pdfTeX reference manual for a discussion of \pdfliteral

n.mode = 0

% Here we are generating the PDF data for our graphic
% note how we can use TeX's parameters in the for loop!

local data=""
for x=#1,#2, #3 do
data=data.." 0 "..x.." m " ..x.." 5 l "
end
data= "q".." "..data.." .5 w S Q "

% Now we have the PDF data we can attach it to our pdf_literal node

n.data=data

% Here we are creating some very stretchy glue. I'll explain why
% in the text...

g=node.new("glue")
g.subtype=0
gs=node.new("glue_spec")
gs.width=0
gs.stretch=65536
gs.stretch_order=2
g.spec=gs

% Create a copy of the glue node
 
f=node.copy(g) 

% We now have two glue nodes and a pdf_literal node so we
% need to join them together. One way is to set the "next" field
% value to chain our node list together to give [glue] [pdf_literal] [glue]

n.next=g
f.next=n

%The node.hpack function "packs" our node list together into 
% a horizontal list --- there is a node.vpack(...) too. 
% Here it sets the glue because we have given the size of 10 TeX points
% = 10*65536 and the keyword "exactly". This is like saying
% \hbox to 10pt {.....}

% We store the result in box 1000 which we can refer
% to in regular TeX code; e.g., \box1000, \copy1000

tex.box[1000]=node.hpack(f, 10*65536,"exactly")

}}

\makegraphic{0}{10}{1}
% Here we take a look at the node lists in box 1000
\directlua{nodelist.listnodes(tex.box[1000])}
\noindent This is box 1000\box1000 which is cool

\bye

Here’s the typeset result:

Some more explanations

One of the most interesting questions is: where is the origin for our drawing? In the above code I mentioned that we set

n.mode = 0

By setting mode=0, what will happen is that when LuaTeX generates the PDF it will make a PDF transformation to establish the origin to be wherever this node ends up on the page. Other values of mode will set the origin to be the lower-left corner of the PDF page so that all your drawing operations are relative to the page corner. However, in the above example there’s a deliberate complication. Remember the stretchy glue on either side of our pdf_literal node? This has an effect on the origin (0,0) of the pdf_literal.

Drawing outside the box

Another important point is that there is nothing preventing our graphic from drawing anywhere on the page and spilling outside the box in which it is contained. To prevent this you may need to set a clipping path or make sure you don’t draw outside the bounds (width, depth, height) of the box containing your graphic.
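
One way to keep a drawing inside its box is to wrap the PDF data in a clipping rectangle. The sketch below is plain Lua that only builds the string. The function name clip_to_rect is hypothetical, and the rectangle in the example (10 units wide and 5 high, starting 5 units to the left of the origin) is an assumption, chosen for an origin that sits in the middle of a box 10 units wide. It relies on the standard PDF operators re (append a rectangle), W (use the path for clipping) and n (end the path without painting it).

```lua
-- Wrap raw PDF drawing operators in a clipping rectangle.
-- x, y: lower-left corner of the rectangle; w, h: its width and height,
-- all in PDF units relative to the current origin.
local function clip_to_rect(data, x, y, w, h)
  -- q and Q save and restore the graphics state, so the clipping
  -- path does not leak into whatever is drawn afterwards.
  return string.format("q %g %g %g %g re W n %s Q", x, y, w, h, data)
end

-- Example: a line from (0,0) to (20,5) that would otherwise
-- stick out of the box on the right-hand side.
local clipped = clip_to_rect("0 0 m 20 5 l .5 w S", -5, 0, 10, 5)
print(clipped)
```

A string built this way could be assigned to the data field of a pdf_literal node in place of the unclipped data.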

The origin

The key point is that, to TeX, our pdf_literal does not have any width and it is sandwiched between two glue values. You can think of the following function call

node.hpack(f, 10*65536,"exactly")

as packaging our node list ready for assigning to a box (it creates an hlist and sets the glue). Now the point is that the size we are packing to is 10 TeX points. So, we have two very flexible glues sandwiching something of zero size and having to “fill a box” of 10 TeX points. So the glue on either side sets to 5 points and the result is that the left-hand glue pushes the origin 5 points to the right: the pdf_literal sees the origin as the middle of the box simply because the glue is set to equal values.

If you have successfully installed the nodelist module you should see the following output on your terminal:

HLIST:  width:  655360  depth:  0       height:         0       shift:  0 glue order:     2       glue sign:      1       glue set:       5
GLUE skip       width:  0       stretch:        65536   shrink:         0 stretch order:  2       shrink order:   0
pdf_literal q  0 0 m 0 5 l  0 1 m 1 5 l  0 2 m 2 5 l  0 3 m 3 5 l  0 4 m 4 5 l
0 5 m 5 5 l  0 6 m 6 5 l  0 7 m 7 5 l  0 8 m 8 5 l  0 9 m 9 5 l  0 10 m 10 5 l .5 w S Q
GLUE skip       width:  0       stretch:        65536   shrink:         0 stretch order:  2       shrink order:   0

Here we can see details of box 1000 and that the glue set ratio is 5: there is a total of 10 points to be filled (pdf_literal contributes nothing) with a total glue stretch of 2 fil, hence each glue stretches by 5 points.
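
The arithmetic behind that glue set value can be sketched in a few lines of plain Lua. This is only an illustration of the calculation, not LuaTeX's own code, and the function name glue_set_ratio is hypothetical. All sizes are in scaled points, with 65536sp to the point, and a stretch of 1fil is stored as 65536.

```lua
local SP_PER_PT = 65536   -- scaled points in one TeX point

-- Ratio by which every glue of the relevant order is stretched:
-- the space still to be filled divided by the total stretch.
local function glue_set_ratio(target, natural, stretches)
  local total = 0
  for _, stretch in ipairs(stretches) do
    total = total + stretch
  end
  return (target - natural) / total
end

-- An hbox "to 10pt" whose contents have no natural width,
-- holding two glues that each stretch by 1fil.
local ratio = glue_set_ratio(10 * SP_PER_PT, 0, { SP_PER_PT, SP_PER_PT })

-- Each glue grows by its own stretch multiplied by the ratio.
local each_sp = ratio * SP_PER_PT

print(ratio, each_sp / SP_PER_PT)
```

Both printed values come out as 5: the ratio reported as "glue set" and the number of points by which each glue stretches.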

Another view of the node list is provided by the node tree structure:

Conclusions

LuaTeX’s node machinery opens up many interesting opportunities and applications. Here we have seen a simple example but through building boxes and glue at the node level you can create very powerful and sophisticated document engineering applications.

PoDoFoBrowser: free tool to view PDF internals

During the course of working with LuaTeX it can be very helpful to explore the internals of the PDF files it generates, especially when using \pdfliteral calls or the LuaTeX PDF API functions (pdf.print() etc). To view the structures inside a PDF file you need tools that will parse and decompress the data and streams inside the PDF. One such utility, which is free, is the PoDoFoBrowser. Incidentally, PoDoFo is “… a free, portable C++ library which includes classes to parse PDF files and modify their contents into memory”.

Here’s an example PoDoFoBrowser screenshot:

Viewing the internals of a PDF file using PoDoFoBrowser

Non-free option

Another PDF browsing/editing tool, but non-free and Windows only (trial version is free), is PDFTron’s PDF CosEdit, which I also use (registration is $99 plus taxes).

Viewing the internals of a PDF file using PDFTron's PDF CosEdit

I’m sure there are many others but these are ones I have used.

Trivial example of LuaTeX’s post_linebreak_filter

LuaTeX provides a very interesting facility called “callbacks”. A callback is a Lua function that you provide and which LuaTeX will call at certain times during the course of processing your document. LuaTeX defines a number of callback opportunities which are detailed in the LuaTeX Reference Manual.

To use callbacks you have to provide the Lua function and then register it with LuaTeX so that it knows to call your function at the appropriate time. You can provide callbacks for many purposes such as reading input files, processing input buffers, hooking into line breaking, page building and so forth.

Callbacks are an exceptionally powerful concept and they open the door to very sophisticated document processing solutions. For example, you can hook into the internal node lists and process them to achieve special effects. Here is a very simple and short “plain TeX” example of hooking into LuaTeX’s post_linebreak_filter (and, in addition, playing with setting the page size).

To quote the LuaTeX Reference Manual

post_linebreak_filter: This callback is called just after LuaTEX has converted a list of nodes into a stack of \hboxes.

In the following example, just after LuaTeX has processed the vbox, and broken the paragraph into lines, it calls a Lua function called linelist which simply ships out each line in the paragraph. Also, and not part of the callback, a copy of the vbox is output on the last page, setting the page size to be that of the vbox.

\pdfoutput=1
\hoffset-1in
\voffset-1in
\nopagenumbers
\directlua{
linelist=function(head)
local boxer=200
for line in node.traverse_id(node.id('hlist'),head) do
           tex.setbox(boxer, node.copy(line))
           tex.box[boxer].height= tex.box[boxer].height+65536 %add 65536sp = 1pt to avoid clipping
           tex.box[boxer].depth= tex.box[boxer].depth+65536 %add 65536sp = 1pt to avoid clipping
           tex.shipout(boxer)
end
return head
end
callback.register("post_linebreak_filter",linelist)
}

\setbox1000=\vbox{\hsize=50mm Let us examine the structure of a list of hboxes in a vbox because
it is an instructive thing to do.}
\pdfpagewidth=\wd1000
\vsize=\ht1000
\advance\vsize by 65536 sp
\pdfpageheight=\vsize
\box1000
\bye

The output

The output is a 5-page PDF: the first four pages are the individual lines in the typeset paragraph and the final page is the vbox.

Introduction to LuaTeX (presentation extract)

Update: Google viewer stopped working so I deleted it. Just download the PowerPoint if interested.

Here is an extract from a recent PowerPoint presentation. Posting this is an experiment to see if the Google Docs viewer will render PowerPoint .pptx files. For sure, I could do this through SlideShare but I just wanted to see if it could be done via hosting it on my server.

Please be patient :-), it may take a few seconds to load or render. It works fine with my version of Firefox so apologies if you can’t see it but you can download the file (~ 360kb) if you prefer.

Cheers

Graham

Lua code to process a LuaTeX node list

Introduction

LuaTeX provides access to the deepest internal structures of the TeX engine: nodes, the fundamental building blocks created and assembled by the typesetting engine. I won’t try to explain nodes in detail here but instead refer you to an excellent article on the LuaTeX wiki.

If you are interested to explore node structures, for example the internal structure of a vbox or hbox, you can use the following code to get you started. It does not present anything radically new but simply gives some simple boilerplate code that you can expand to suit your own interests. For example, I used it to convert a node list to a PostScript representation of a paragraph.

Here is an example representation of a node structure.

How to build these node diagrams? I built this diagram using a DLL I wrote for LuaTeX: a customised build of the graphviz library with a Lua binding using the excellent LuaGRAPH library. I also used Patrick Gundlach’s Lua code LuaTeX nodelist visualization to create the data for graphviz to process (Thanks Patrick!). The node graphs were converted to EPS (via graphviz) and PDFs were generated on the fly using Ghostscript in a DLL with a Lua binding. You can of course use Patrick’s code to generate the graphviz data and run graphviz via the command line or via system/shell calls using Lua. I just prefer to have everything callable from DLLs.

Basic background information

Internally, LuaTeX defines quite a number of different node types; for a full list refer to the LuaTeX Reference Manual. You can generate a list of the node types using the LuaTeX API call

node.types()

which returns a table.

For example:

\directlua{
for i,v in pairs(node.types()) do
   print(i,v)
end
}

If you look at the sample node structure diagram above you can see that node lists are a nested linked list structure. To process this data structure you need to “walk over” the node list with a recursive function. The reason for needing recursion is that internally TeX builds nested data structures and it lets you have boxes within boxes within boxes… These nested structures have to be parsed using recursion. So, the idea is that you start with the first node in the list and then visit and examine each node in turn. As we’ve noted there are quite a few different types of node, so the “action” you may want to perform for each node will depend on the type (id) of that node.

The way I’ve chosen to do this is to have a set of functions and to execute the appropriate function when you see a node of a particular type. One way to do this is with a table indexed by node id, where the value stored for each id is a function. For example, suppose we have a function called “processnode”:

\directlua{
function processnode(node)
   print("processnode called")
end
}

The argument to the function “node” is the particular node you are looking at. Using the LuaTeX API function node.types() you can quickly populate a table with code such as this:

\directlua {
   nodedispatch={}
   for i,v in pairs(node.types()) do
      nodedispatch[i]=processnode
   end
}

Here, nodedispatch is our table indexed by node type, with each value set to a function called processnode. Calling the processnode function is very easy. Suppose you have a node id value idvalue then all you need to do is something like this:

nodedispatch[idvalue](node)

nodedispatch[idvalue] returns the function and (node) calls the function with your node object.

And whatsits too!

One very important node type is the “whatsit” (see the LuaTeX Reference Manual). TeX’s whatsits all have the same node id but the different kinds of whatsit are distinguished by the subtype field of the main whatsit node. Similar to node.types(), LuaTeX provides a handy API function node.whatsits() which we can use to build another function table, this time for processing whatsits.

\directlua {
   whatsitdispatch={}
   for i,v in pairs(node.whatsits()) do
      whatsitdispatch[i]=processwhatsit
   end
}

Where processwhatsit is another function to process whatsits.
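
To show how the two tables might fit together, here is a sketch that runs in plain Lua. The tables types and whatsits are stand-ins for what node.types() and node.whatsits() return, and the id and subtype numbers in them are illustrative only and may not match your LuaTeX version. The log table and the wrapper function stored for the whatsit id are my own assumptions about one way to chain the lookups.

```lua
-- Stand-in tables so that the sketch runs outside LuaTeX.
-- The numbers are illustrative, not guaranteed to match any real version.
local types    = { [0] = "hlist", [1] = "vlist", [8] = "whatsit", [10] = "glue" }
local whatsits = { [3] = "special", [8] = "pdf_literal" }
local WHATSIT  = 8

local log = {}   -- collects what each handler saw

local function processwhatsit(n)
  log[#log + 1] = "whatsit:" .. whatsits[n.subtype]
end

local function processnode(n)
  log[#log + 1] = "node:" .. types[n.id]
end

-- One dispatch table per level: by id, then by subtype for whatsits.
local whatsitdispatch = {}
for subtype in pairs(whatsits) do
  whatsitdispatch[subtype] = processwhatsit
end

local nodedispatch = {}
for id in pairs(types) do
  nodedispatch[id] = processnode
end
-- Whatsits get a wrapper that performs the second lookup on the subtype.
nodedispatch[WHATSIT] = function(n) whatsitdispatch[n.subtype](n) end

-- Two mock nodes: a glue and a pdf_literal whatsit.
nodedispatch[10]({ id = 10 })
nodedispatch[8]({ id = 8, subtype = 8 })

local summary = table.concat(log, ", ")
print(summary)
```

The attraction of this arrangement is that supporting a new node type or whatsit subtype only means replacing one table entry with a more specific function.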

Wrapping it all together

The above gives a brief summary of the approach but we now need to hook this all together into something you can use (you can download the full code below). Firstly, we need our recursive function to process the node list:

\directlua{
function listnodes(head)
   while head do
      local id = head.id
      nodedispatch[id](head)
      if id == node.id('hlist') or id == node.id('vlist') then
         listnodes(head.list)
      end
      head = head.next
   end
end
}

Note that the recursion happens when we see a node type of hlist or vlist because these contain links to further lists which we need to “recurse into”. We now need to glue this into our TeX code which we can do with a simple TeX macro as follows:

\def\dobox#1{\directlua{listnodes(tex.box[#1])}}

An example of using this would be:

\setbox100=\vbox{I love Lua\TeX!}
\dobox{100}
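
If you want to see the shape of the recursion without running LuaTeX, the following sketch walks a mock node list built from plain Lua tables. Everything in it is an assumption made for illustration: the helpers mock, chain and walk are hypothetical, and the id fields hold type names rather than the numbers LuaTeX uses. The walk function follows the same pattern as listnodes but also records the nesting depth.

```lua
-- Build a mock node as a plain table; "list" points at a box's
-- contents and "next" at the following sibling.
local function mock(id, fields)
  fields = fields or {}
  fields.id = id
  return fields
end

-- Link a sequence of nodes through their next fields; return the head.
local function chain(...)
  local nodes = { ... }
  for i = 1, #nodes - 1 do
    nodes[i].next = nodes[i + 1]
  end
  return nodes[1]
end

-- Walk a list, recursing into boxes, and record each id with its depth.
local function walk(head, depth, out)
  while head do
    out[#out + 1] = string.rep("  ", depth) .. head.id
    if head.id == "hlist" or head.id == "vlist" then
      walk(head.list, depth + 1, out)
    end
    head = head.next
  end
  return out
end

-- A vbox holding one line (an hbox) made of a glyph, a glue and a glyph,
-- followed by a penalty.
local line = mock("hlist", { list = chain(mock("glyph"), mock("glue"), mock("glyph")) })
local box  = mock("vlist", { list = chain(line, mock("penalty")) })

local listing = table.concat(walk(box, 0, {}), "\n")
print(listing)
```

The indentation in the printed listing mirrors the nesting: the hlist sits one level inside the vlist, and the glyphs and glue one level deeper still.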

Download sample code

I’ve put some sample code (in a TeX file) for download here.