A minimal LuaTeX setup on Windows (Part 6: final)

Well, it feels like it has taken a lot of writing to get to this, the final Part of A minimal LuaTeX setup on Windows. At the end of Part 5 we had discussed creating a minimal TDS-compliant directory structure to contain the file types we need to install for a minimal plain TeX setup:

  • TeX Font Metric files (extension .tfm)
  • Font encoding files (extension .enc)
  • Glyph data files (extension .pfb, on Windows)
  • The plain TeX format file (extension .fmt)
  • The plain TeX source files (plain.tex and hyphen.tex)
  • .map files (specifically for pdfTeX)

We decided on a minimal TDS-compliant directory, starting with c:\luatexblog\texmf. Considering just the fonts, they will be located in subdirectories of c:\luatexblog\texmf. We’ll create a set of directories which follow the structure:

c:\luatexblog\texmf\fonts\[type]\[supplier]\[typeface]

Where [type] will be

  • tfm: for .tfm files (TeX font metrics)
  • type1: for .pfb files (Printer Font Binary)

Where [supplier] will be public (i.e., for free fonts) and [typeface] will simply be cm (for Computer Modern). In addition, under c:\luatexblog\texmf\fonts\ we’ll need to create directories for

  • map: for .map files (pdfTeX and LuaTeX font mapping files)
  • enc: for .enc files (font encoding)

Finally, we need directories to contain

  • plain TeX source files (plain.tex and hyphen.tex)
  • the plain TeX .fmt file
  • the texmf.cnf file that we’ll write for Kpathsea

Going back to the Kpathsea documentation which gives a nice example of a skeleton TDS, you should create a directory structure that looks something like this:

Note that if you add more .pfb files under the directory c:\luatexblog\texmf\fonts\type1, it is best practice to create a new subdirectory whose name reflects the supplier. As an example, I have added “adobe”. Under each supplier you add a directory named for the typeface, e.g., utopia, and that’s where you would put the .pfb files:

c:\luatexblog\texmf\fonts\type1\adobe\utopia\*.pfb

Here is where we will put the various files we need.

File type    TDS file path
.tfm         c:\luatexblog\texmf\fonts\tfm\public\cm\
.pfb         c:\luatexblog\texmf\fonts\type1\public\cm\
.enc         c:\luatexblog\texmf\fonts\enc\
.map         c:\luatexblog\texmf\fonts\map\
plain.tex    c:\luatexblog\texmf\tex\plain\base\
hyphen.tex   c:\luatexblog\texmf\tex\generic\hyphen\
texmf.cnf    c:\luatexblog\texmf\web2c\
plain.fmt    c:\luatexblog\texmf\web2c\
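You may not want to create all these directories by hand in Windows Explorer. Here is a small Python sketch that creates the tree listed above. It assumes Python 3 is installed, which is not part of the setup described in this series. The helper name make_tds_tree is my own and is purely hypothetical.

```python
from pathlib import Path

# Directories from the table above, relative to the texmf root.
TDS_DIRS = [
    "fonts/tfm/public/cm",    # .tfm files
    "fonts/type1/public/cm",  # .pfb files
    "fonts/enc",              # .enc files
    "fonts/map",              # .map files (pdftex.map)
    "tex/plain/base",         # plain.tex
    "tex/generic/hyphen",     # hyphen.tex
    "web2c",                  # texmf.cnf and plain.fmt
]


def make_tds_tree(texmf_root):
    """Create the minimal TDS tree under texmf_root and return the directories."""
    root = Path(texmf_root)
    created = []
    for rel in TDS_DIRS:
        directory = root / rel
        # parents=True creates any missing intermediate directories;
        # exist_ok=True makes the function safe to run more than once.
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created
```

On the setup used here you would call make_tds_tree(r"c:\luatexblog\texmf") once, then copy the files into place.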

Filling these directories with files

Clearly, we will be generating plain.fmt and you saw in Part 5 where to get plain.tex and hyphen.tex. We will shortly be writing texmf.cnf by hand, so that leaves the following files to be obtained from somewhere:

  • TeX Font Metric files (.tfm)
  • Font encoding files (.enc)
  • Glyph data files (.pfb, on Windows)
  • .map files (specifically for pdfTeX)

But firstly, a note of caution

TeX Font Metric files (for text fonts) need to be obtained and used with a little caution because they are tied to a specific font encoding. In general, and particularly with plain TeX (which assumes a certain encoding), you cannot just use them without knowing which encoding was used when they were created. For example, the afm2tfm.exe utility available from TeX Live (which converts Adobe’s AFM files to TeX’s .tfm format) can be given an encoding vector on its command line. Certainly, LaTeX has far more flexibility with encodings but plain TeX is rather less versatile.

Obtaining the .tfm files for the Computer Modern fonts

Using the method of browsing TeX Live, you can access the Computer Modern .tfm files here:

svn://tug.org/texlive/trunk/Master/texmf-dist/fonts/tfm/public/cm

One oddity: manfnt.tfm
This .tfm is required to build the plain TeX format and you can get it here.

Obtaining the .pfb files for the Computer Modern fonts

The American Mathematical Society provides the Computer Modern fonts in Adobe Type 1 format, which can be downloaded as part of their AMSFonts collection.

Obtaining the .map file for pdfTeX (pdftex.map)

I have created an absolutely minimal pdftex.map file which you can download from this site.

Obtaining the .enc files

In short, for this ultra-minimal setup you won’t need any so we’ll ignore them.

What about luatex.exe?

Download a copy of the latest binary and copy it to c:\luatexblog\luatex.exe.

Note: edit your PATH
Don’t forget that you will need to add c:\luatexblog to your computer’s PATH environment variable otherwise your PC won’t be able to find luatex.exe when you try to run it!

Kpathsea and texmf.cnf

We are nearly finished! All we now need to do is tell Kpathsea where to locate the various files in our minimal TDS tree and we do this through a texmf.cnf file that we must save to c:\luatexblog\texmf\web2c\texmf.cnf.

If you look at the texmf.cnf file supplied with TeX Live it looks quite daunting and complex because Kpathsea’s powerful searching algorithms allow you to construct quite complex expressions to describe paths and directory structures. Kpathsea allows you to create TeX installations of quite some complexity with multiple TDS trees being used for different purposes. We will not even touch a tiny fraction of Kpathsea’s power and flexibility.


Describing the features of Kpathsea in detail is far beyond the scope of this post, perhaps one for another day. Interested readers should refer to the Kpathsea documentation and the texmf.cnf file available on the TeX Live repository: it contains very many helpful comments. For those who are comfortable reading C, there is a lot of additional information in the comments scattered throughout the Kpathsea source code. Happy reading!

Final steps

  1. Set an environment variable called TEXMFCNF which tells Kpathsea where to start looking for your configuration files (texmf.cnf). For our installation it should be set to

    • TEXMFCNF=c:\luatexblog\texmf\web2c\
  2. Enable Kpathsea debugging environment variables:
    • KPATHSEA_DEBUG_OUTPUT=c:/kspsluatex.log
    • KPATHSEA_DEBUG=-1
  3. Put the following into a text file called texmf.cnf and save it to
    c:\luatexblog\texmf\web2c

    
              WEB2C=c:/luatexblog/texmf/web2c
              TEXINPUTS = ./:c:/luatexblog/texmf/tex//
              TEXFONTMAPS = c:/luatexblog/texmf/fonts/map
              TFMFONTS = c:/luatexblog/texmf/fonts/tfm//
              TEXFORMATS=c:/luatexblog/texmf/web2c
              T1FONTS = c:/luatexblog/texmf/fonts/type1//
              ENCFONTS = c:/luatexblog/texmf/fonts/enc
    
    Erratum: Apologies, the above texmf.cnf is not correct (although it works). A far better way is as follows.
    
              TEXMF=$SELFAUTOLOC/texmf
              WEB2C=$TEXMF/web2c
              TEXINPUTS = .;$TEXMF/tex//
              TEXFONTMAPS = $TEXMF/fonts/map
              TEXFORMATS=$TEXMF/web2c
              TFMFONTS = $TEXMF/fonts/tfm//
              T1FONTS = $TEXMF/fonts/type1//
              ENCFONTS = $TEXMF/fonts/enc
    
    
    
Summary of these texmf.cnf variables

Source: The pdfTeX Manual

  • $SELFAUTOLOC: An environment variable set by Kpathsea (when it starts running) which gives the location of the executable; i.e., c:\luatexblog for luatex.exe as that is where we put it.
  • TEXINPUTS: This variable specifies where pdfTeX (and LuaTeX) finds its input files. Image files are considered input files and searched for along this path.
  • TEXFONTMAPS: Search path for font map (.map) files.
  • TFMFONTS: Search path for font metric (.tfm) files.
  • TEXFORMATS: Search path for format (.fmt) files.
  • T1FONTS: Search path for Type 1 font files (.pfa and .pfb).
  • ENCFONTS: Search path for encoding (.enc) files.
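To make the role of $SELFAUTOLOC a little more concrete, here is a deliberately simplified Python model of how a variable such as TFMFONTS ends up as a real directory. It only handles NAME = value lines, % comments and $NAME references, and it lets environment values win over texmf.cnf values. Real Kpathsea does much more, including brace expansion, the ${NAME} form and program-specific variables such as TEXINPUTS.luatex. Treat this as an illustration, not a description of Kpathsea's code. The function names are my own, hypothetical ones.

```python
import re
from pathlib import PureWindowsPath

# Matches $NAME references (the ${NAME} form is not handled in this sketch).
VAR_RE = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def parse_cnf(text):
    """Parse 'NAME = value' lines; everything after '%' is a comment."""
    values = {}
    for raw in text.splitlines():
        line = raw.split("%", 1)[0].strip()
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        values[name.strip()] = value.strip()
    return values


def expand(value, values, env, depth=0):
    """Recursively replace $NAME, looking in env first, then texmf.cnf values."""
    if depth > 20:
        raise ValueError("expansion too deep; is there a cycle?")

    def lookup(match):
        name = match.group(1)
        raw = env.get(name, values.get(name, ""))  # unknown names expand to ""
        return expand(raw, values, env, depth + 1)

    return VAR_RE.sub(lookup, value)


def selfautoloc(exe_path):
    """Directory containing the executable, written with forward slashes."""
    return PureWindowsPath(exe_path).parent.as_posix()
```

Take the corrected texmf.cnf above and set env = {"SELFAUTOLOC": selfautoloc(r"c:\luatexblog\luatex.exe")}. Then expand(values["TFMFONTS"], values, env) gives c:/luatexblog/texmf/fonts/tfm//. Move luatex.exe and its texmf directory elsewhere and the same texmf.cnf still works. That is the point of using $SELFAUTOLOC instead of hard-coded paths.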
General notes
  • Kpathsea paths are written with forward slashes “/”, even on Windows (as in our texmf.cnf).
  • In our example, TEXINPUTS starts with “.” and has a second path “$TEXMF/tex//” (i.e., “c:/luatexblog/texmf/tex//”), separated by “;”, which ends in “//”.
    • “.”: this means “the current directory”.
    • The “//” means search recursively into the directory.
    • “;” is a separator for Kpathsea “path elements”.
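The three rules above (“.”, “//” and “;”) are enough to write a toy version of a path search. This Python sketch is my own simplified model, not Kpathsea's algorithm. It ignores ls-R, the “!!” prefix, caching and the order in which Kpathsea actually visits subdirectories. The name find_file is hypothetical.

```python
import os
from pathlib import Path


def find_file(filename, search_path, cwd="."):
    """Return the first match for filename along a ';'-separated path, or None.

    An element ending in '//' is searched recursively; any other element
    (including '.') is searched in that directory only.
    """
    for element in search_path.split(";"):
        if not element:
            continue
        recursive = element.endswith("//")
        base = element.rstrip("/") if recursive else element
        # Relative elements such as '.' are resolved against cwd;
        # an absolute element replaces cwd when joined.
        root = Path(cwd) / base
        if not root.is_dir():
            continue
        if recursive:
            for dirpath, dirnames, filenames in os.walk(root):
                dirnames.sort()  # visit subdirectories in a predictable order
                if filename in filenames:
                    return Path(dirpath) / filename
        else:
            candidate = root / filename
            if candidate.is_file():
                return candidate
    return None
```

With TEXINPUTS = .;c:/luatexblog/texmf/tex//, a file in your current directory wins over one in the tree, because “.” comes first.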

Something I have ignored: ls-R databases
Kpathsea can use an externally-built filename database file named ls-R that maps files to directories, thus avoiding the need to exhaustively search the disk. See the Kpathsea documentation for more detail.
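To give a feel for what such a database contains, here is a small Python sketch that writes and reads a simplified ls-R-style file. The file has a header line, then each directory (relative to the tree root, ending in “:”) followed by the names of the files in it. As I understand it, Kpathsea checks for this header line. Everything else here is my own simplification, including listing only files, the sorting and the function names. Use the real mktexlsr tool for anything serious.

```python
import os
from collections import defaultdict
from pathlib import Path

HEADER = "% ls-R -- filename database for kpathsea; do not change this line."


def build_ls_r(root):
    """Return simplified ls-R-style text for the tree under root (files only)."""
    lines = [HEADER]
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()
        rel = Path(dirpath).relative_to(root).as_posix()
        lines.append("")
        lines.append("./:" if rel == "." else f"./{rel}:")
        lines.extend(sorted(filenames))
    return "\n".join(lines) + "\n"


def parse_ls_r(text, root):
    """Map each filename to the list of directories that contain it."""
    index = defaultdict(list)
    current = None
    for line in text.splitlines():
        if not line or line.startswith("%"):
            continue
        if line.endswith(":"):
            current = Path(root) / line[:-1]  # e.g. './fonts/tfm/public/cm'
        elif current is not None:
            index[line].append(current)
    return index
```

Once the index is built, finding cmr10.tfm is a dictionary lookup rather than a walk over the disk. This is also why an out-of-date ls-R can hide newly added files.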

Running luatex

If you type luatex --help at the DOS prompt you’ll see a number of command-line options. But of these only 4 are of immediate interest:

--fmt=FORMAT load the format file FORMAT
--ini be iniluatex, for dumping formats
--output-directory=DIR use DIR as the directory to write files to
--output-format=FORMAT use FORMAT for job output; FORMAT is ‘dvi’ or ‘pdf’

Note that when you run LuaTeX it will write the PDF file, and maybe others, to the current working directory unless you specify another directory using --output-directory=DIR

Generating plain.fmt

We’ll use the fact that LuaTeX will write to the current directory unless told otherwise. Start a DOS prompt and change the directory to where we want the plain.fmt file to be located:

c:\luatexblog\texmf\web2c

Run the command line:

c:\luatexblog\texmf\web2c> luatex --ini plain.tex \dump

You should see something like the following, and a file called plain.fmt created in
c:\luatexblog\texmf\web2c

This is LuaTeX, Version beta-0.65.0-2010121314 (rev 4033) (INITEX)
(c:/luatexblog/texmf/tex/plain/base/plain.tex
Preloading the plain format: codes, registers, parameters, fonts, more fonts,
macros, math definitions, output routines, hyphenation
(c:/luatexblog/texmf/tex/generic/hyphen/hyphen.tex))
Beginning to dump on file plain.fmt
(format=plain 2011.1.25)

...
...
...

TeXing our first file

Create a directory called myfiles under c:\luatexblog and change to that directory.

c:\luatexblog\myfiles>

Grab any plain TeX example from the web (for example http://www.combinatorics.org/Information/plain.html)

Save it to a text file c:\luatexblog\myfiles\test.tex

Run the command

c:\luatexblog\myfiles> luatex --fmt=plain test.tex

You should see

c:\luatexblog\myfiles> luatex --fmt=plain test.tex
This is LuaTeX, Version beta-0.65.0-2010122301
(./test.tex 1. Introduction. [1] )
Output written on test.dvi (1 page, 2432 bytes).
Transcript written on test.log.

With a file test.dvi output in c:\luatexblog\myfiles

Now run the command

c:\luatexblog\myfiles> luatex --fmt=plain --output-format=pdf test.tex

You should see

c:\luatexblog\myfiles>luatex --fmt=plain --output-format=pdf test.tex
This is LuaTeX, Version beta-0.65.0-2010122301
(./test.tex 1. Introduction. [1{c:/luatexblog/texmf/fonts/map/pdftex.map}] )
<c:/luatexblog/texmf/fonts/type1/public/cm/cmbx10.PFB><c:/luatexblog/texmf/fonts/
type1/public/cm/cmcsc10.PFB><c:/luatexblog/texmf/fonts/type1/public/cm/cmr10.PF
B><c:/luatexblog/texmf/fonts/type1/public/cm/cmr12.PFB><c:/luatexblog/texmf/fon
ts/type1/public/cm/cmtt10.PFB>
Output written on test.pdf (1 page, 70507 bytes).
Transcript written on test.log.

With a file test.pdf output in c:\luatexblog\myfiles
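If you find yourself rebuilding the format and re-running the test file often, you can script the two steps. This is a hedged Python sketch. It assumes Python 3 and that luatex.exe is on your PATH, as described earlier. The helpers build_commands and run_steps are my own, hypothetical names. The luatex options are exactly the ones used above.

```python
import subprocess
from pathlib import Path


def build_commands(web2c_dir, work_dir, document="test.tex"):
    """Return (argv, working directory) pairs for the two steps above."""
    return [
        # Step 1: dump plain.fmt into the web2c directory (the current directory).
        (["luatex", "--ini", "plain.tex", "\\dump"], Path(web2c_dir)),
        # Step 2: typeset the document to PDF using the new format.
        (["luatex", "--fmt=plain", "--output-format=pdf", document], Path(work_dir)),
    ]


def run_steps(steps):
    """Run each command in its working directory, stopping at the first failure."""
    for argv, cwd in steps:
        # stdin=DEVNULL means LuaTeX reads end-of-file instead of waiting at an
        # error prompt; check=True raises CalledProcessError on a non-zero exit.
        subprocess.run(argv, cwd=cwd, stdin=subprocess.DEVNULL, check=True)
```

Usage: run_steps(build_commands(r"c:\luatexblog\texmf\web2c", r"c:\luatexblog\myfiles")). Passing the arguments as a list avoids having to worry about how the DOS prompt treats the backslash in \dump.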

Conclusion

Over these 6 tutorials I have tried to cover, in general terms, some of the “TeX landscape” and to equip you with enough information to begin building your own LuaTeX test environment, should you wish to. Of course, I have omitted vast amounts of technical detail in the interest of simplicity and building “a conceptual framework” for your own investigations. I hope that I have not made any serious errors but if anyone spots some, do please let me know so that I can release corrected and updated posts.

I hope that somewhere “out on the web” someone has enjoyed these and found them to be useful.

Cheers

Graham!

A minimal LuaTeX setup on Windows (Part 5)

A summary of what we are going to do next

This is going to be a pretty long post, and I’ve been working on it for days! I’ve been trying to get the right flow of ideas and the level of technical detail and concepts “just right”. We’ll cover a lot of ground, skimming over some deep details, but hopefully end up with something that is useful. As always, you are the main audience and judges of whether this content is a useful addition to “the blogosphere”: if you want to comment, please do.

In this part of the tutorial we’ll work towards a minimal LuaTeX installation using one of the simplest TeX formats: Donald Knuth’s plain TeX format, as described in The TeXbook. To do this we will be taking the following steps in the process:

  1. Identify all the various files and resources that LuaTeX will need to process a document written in the plain TeX format.
  2. Work out the best way to organise these various file types on your hard drive.
  3. Work out how we will tell Kpathsea where to find these files:
    • using environment variables;
    • using a minimal hand-written texmf.cnf file.
  4. Build the plain TeX format file (plain.fmt).

What we are going to leave for later: staying simple
We are going to ignore the (wonderful world of) OpenType fonts (for now) and stay with the far simpler Adobe Type 1 PostScript fonts. In addition, we are not going to explore any LuaTeX-specific features such as \directlua{...}, purely to keep the discussion as simple as possible at this point.

The plain TeX format

Knuth’s plain TeX format is described in great detail in The TeXbook, and has the advantage that it requires just two files to build the format file:

  • plain.tex
  • hyphen.tex
Where do you get these files?

I’d recommend grabbing them from the Comprehensive TeX Archive Network (CTAN). They can be downloaded from CTAN, here: http://www.ctan.org/tex-archive/macros/plain/base/. Alternatively, you can browse TeX Live.

Step 1: What other files and resources does LuaTeX need for plain TeX?

When Donald Knuth wrote the original TeX engine, the output of “TeXing a document” was something called a DeVice Independent file (referred to from now on as a DVI file). Describing the DVI file format in detail is beyond the scope of this post and there are plenty of resources on the web which you can access for more detail; for example, the UK TUG FAQ is a good starting point, as is the Wikipedia entry.

Newer TeX engines have, of course, been developed to output PDF files in addition to DVI files, most notably, starting with pdfTeX. LuaTeX can be seen as an extension of pdfTeX and also outputs PDF directly.

Input, typesetting and output (DVI vs PDF)

To help with understanding the following sections, it will be useful to consider the “typesetting process” as built up of three fundamental activities:

  1. reading in the text to be typeset
  2. the TeX engine executing its internal functions and algorithms: “typesetting”; i.e., breaking paragraphs into lines, constructing mathematical formulae and so forth
  3. the process of writing the typeset result to a file (a DVI or PDF)

For current purposes, it is the process of “writing the typeset result to a file” which we need to discuss. Compared to writing DVI files, when writing PDF files TeX engines need access to additional resources, and that is going to affect the resources we need to make available through our setup and installation. Of course, both DVI and PDF files contain a representation of the typesetting work done by the TeX engine. However, they differ in one very important way: PDF files output directly by TeX engines have the actual data required to display fonts embedded in them (i.e., written into the file), whereas DVI files do not.

You can think of the DVI file format as an “intermediate file format” which provides a description of the typeset results, but to visualise the results described by DVI files they have to be processed by external applications. It is the job of these external applications, often called “drivers”, to make sure that they have access to the data required to display fonts: whether on a screen, on a desktop printer or any other device. The philosophy behind the design of the DVI file format was to create a representation of the typeset result which could then be output on any device through the use of the appropriate “device driver software”: leaving the messy device-dependent details to external applications. Hence the name DeVice Independent file. It also explains (in part!) why DVI files are tiny compared to their PDF counterparts: DVI files do not contain font (glyph) data (or images etc.), whereas those resources are embedded into PDF files produced by TeX engines.

In summary, the most important point for us is that for TeX to output a DVI file it does not need access to the actual data required to display the fonts used in your document. Now, if you are new to TeX this may seem very strange and almost a contradiction: a typesetting program that does not need access to fonts? To explain this, we need to be very clear on precisely what conventional or original TeX engines actually understand by “a font”.

Of metrics, characters, glyphs and encodings

The subject of fonts is a huge topic, one I intend to write about in future posts, but for now I need to introduce four key concepts at this point in the story: metrics, characters, glyphs and encodings.

Characters and glyphs

I thought long and hard about how to explain the difference between characters and glyphs but I think the Unicode standard does it as well as anything I’ve read, so I’d like to quote from the Unicode standard (version 6.0) which says:

Characters are the abstract representations of the smallest components of written language that have semantic value. They represent primarily, but not exclusively, the letters, punctuation, and other signs that constitute natural language text and technical notation.

Glyphs represent the shapes that characters can have when they are rendered or displayed. In contrast to characters, glyphs appear on the screen or paper as particular representations of one or more characters.

So, you can think of a character as being the name of a fundamental building block of a language (e.g., the letter ‘capital A’) and a glyph as a character expressed in a specific visual form. So, for example, the following SVG graphic shows four glyphs representing the character ‘capital A’:

Metrics

Firstly, I must stress again that I am not discussing OpenType font technology but restricting the discussion to the older world of Adobe Type 1 fonts for use with plain TeX. As far as TeX engines are concerned, to do their job of typesetting they treat glyphs as simple boxes and all they want to know is three simple values for each glyph you want to typeset: width, height and depth. These numbers are called metrics.

Now, I have deliberately used the term glyph, not character, because a glyph is the visual representation of a character and, clearly, it is the ‘size’ of the glyph boxes that TeX wants to know. You can easily see this for yourself. Type a row of ‘capital A’ characters in Microsoft Word and apply a different typeface to each one, and it should be clear that the width of each glyph depends on the typeface you have applied, i.e., on the specific visual representation: the glyph. So, when you use a particular “font” with TeX, all that TeX is worried about are the metrics which provide numeric information about the glyphs. The typesetting algorithms inside the TeX engine do not care about the specifics of what the glyph looks like; they just want the metrics so that they can calculate line breaks, compute the layout of a formula or decide where to end the page.
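Here is a tiny Python sketch of that “glyphs as boxes” view. Given the width, height and depth of each glyph, a row of glyphs set side by side has a width equal to the sum of the widths. Its height and depth are the largest height and depth. The metric values are invented, in thousandths of an em, and are not taken from any real .tfm file. Real TeX also inserts kerns, ligatures and glue, which the sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    width: int
    height: int
    depth: int


# Invented metrics (thousandths of an em), NOT read from a real .tfm file.
METRICS = {
    "T": Box(722, 683, 0),
    "e": Box(444, 431, 0),
    "X": Box(750, 683, 0),
    "g": Box(500, 431, 194),  # a descender gives the glyph some depth
}


def hbox(text, metrics):
    """Dimensions of glyph boxes placed side by side (no kerns, ligatures or glue)."""
    boxes = [metrics[ch] for ch in text]
    return Box(
        width=sum(b.width for b in boxes),
        height=max((b.height for b in boxes), default=0),
        depth=max((b.depth for b in boxes), default=0),
    )


print(hbox("TeX", METRICS))  # Box(width=1916, height=683, depth=0)
print(hbox("eg", METRICS))   # Box(width=944, height=431, depth=194)
```

Nothing in this calculation needs to know what a “T” looks like. That is exactly why TeX can typeset with nothing more than metric files.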

TeX font metric files

When the TeX engine is typesetting your document, breaking paragraphs into lines, constructing mathematical formulae, all it wants to know are some numeric values about the glyphs in the fonts you are using. It does not care about what the individual glyphs actually look like.

Metrics in reality: more than just width, height, depth
I have simplified the discussion somewhat. The actual metrics used by TeX engines include a range of additional data values which I won’t discuss here but I just want to note that real metrics contain more data than just width, height and depth of glyphs. Actually, in reality, there are two classes of metrics that TeX engines require: metrics for text fonts and metrics for math fonts. To typeset mathematics, TeX engines need some additional numbers (metrics) which the TeX engine uses to control the processes which construct the typeset formula.

Encodings

Font encoding is a messy topic, one which is impossible to cover thoroughly in a few lines. So, my apologies in advance to any experts reading this, but I’m aiming for “minimal simplicity” at this point. Although it is a pretty obvious thing to say, what we need to realise is that when software is storing or processing text data, it is actually working with numbers: numbers which represent characters. When it comes to displaying the text (which internally is being stored as numbers) there has to be a process to decide which characters are actually being represented by that set of numbers. We need some form of “mapping” from those numbers to the characters they are expected to represent. That mapping is called… the encoding. An encoding is simply a set of numbers which are allocated to a specific range of characters.
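Here is a quick Python sketch of why the encoding matters. The same character codes are looked up in two different encodings. Each encoding is a table from numbers to glyph names, the same kind of information an .enc file holds. The two tiny tables are excerpts I have written out by hand. As far as I know, the entries at 60 and 62 are where the OT1 layout used by Computer Modern text fonts such as cmr10 puts the inverted ! and ?. That is why typing < in plain TeX with cmr10 gives ¡.

```python
# Hand-written excerpts: character code -> PostScript glyph name.
ASCII_LIKE = {60: "less", 62: "greater", 65: "A"}
OT1_CMR = {60: "exclamdown", 62: "questiondown", 65: "A"}  # partial


def glyph_names(codes, encoding):
    """Look up each code in the encoding; unknown codes map to '.notdef'."""
    return [encoding.get(code, ".notdef") for code in codes]


text = b"<A>"  # iterating over bytes gives the numbers 60, 65, 62
print(glyph_names(text, ASCII_LIKE))  # ['less', 'A', 'greater']
print(glyph_names(text, OT1_CMR))     # ['exclamdown', 'A', 'questiondown']
```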

Introduction to Adobe Type 1 PostScript fonts (ignoring OpenType)

To assist with the discussions, we should think of a “font” as consisting of two files:

  • the font metrics: a file containing numeric data describing the width, height, depth of the glyphs in the font. Again, I stress this is a simplification because real metric files contain a range of additional data.
  • the font glyphs: this is the data which describes how to draw the glyphs themselves, i.e., the lines and curves from which glyph shapes are built.

Within the world of Adobe Type 1 PostScript fonts, the font metric files are called Adobe Font Metrics or AFM files (.afm) and the font glyphs (on Windows) are stored in a separate file called Printer Font Binary (PFB or .pfb files). AFM files are a simple text file format whereas PFB files are a compact binary format.
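Because AFM files are plain text, it is easy to see where TeX-style metrics could come from. Each glyph has a line in the CharMetrics section, such as C 103 ; WX 500 ; N g ; B 28 -218 470 469 ;. This gives the character code, the width, the glyph name and the bounding box (lower-left x and y, upper-right x and y). The numbers in that example line are made up. The Python sketch below turns one such line into width, height and depth. The height is the top of the bounding box, and the depth is how far its bottom goes below the baseline. It is only my illustration, not how afm2tfm.exe works.

```python
def parse_char_metrics(line):
    """Parse one AFM CharMetrics line into code, name, width, height and depth."""
    fields = {}
    for part in line.split(";"):
        tokens = part.split()
        if tokens:
            fields[tokens[0]] = tokens[1:]  # key (C, WX, N, B, ...) -> values
    _llx, lly, _urx, ury = (float(v) for v in fields["B"])
    return {
        "code": int(fields["C"][0]),
        "name": fields["N"][0],
        "width": float(fields["WX"][0]),
        "height": max(ury, 0.0),  # top of the bounding box above the baseline
        "depth": max(-lly, 0.0),  # bottom of the bounding box below the baseline
    }


print(parse_char_metrics("C 103 ; WX 500 ; N g ; B 28 -218 470 469 ;"))
```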

Do TeX engines use Adobe Font Metrics files?

No. TeX engines expect their metric files in a very specific format called the TeX Font Metric (TFM, .tfm) format. This is a highly compact binary format, unlike Adobe’s AFM format. Of course, there are utilities to convert from AFM files to TFM files for use with TeX; for example, afm2tfm.exe shipped with TeX Live.
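For the curious, the start of a .tfm file is simple to read. As documented in Knuth's tex.web, a TFM file begins with twelve 16-bit big-endian numbers:

- lf: the file length in 4-byte words
- lh: the header length
- bc and ec: the smallest and largest character codes
- nw, nh, nd, ni, nl, nk, ne and np: the sizes of the various tables

These must satisfy lf = 6 + lh + (ec - bc + 1) + nw + nh + nd + ni + nl + nk + ne + np. The Python sketch reads those twelve numbers and checks that equation. It is a sanity check, not a TFM parser, and read_tfm_header is a hypothetical name.

```python
import struct

FIELDS = ("lf", "lh", "bc", "ec", "nw", "nh", "nd", "ni", "nl", "nk", "ne", "np")


def read_tfm_header(data):
    """Read the twelve 16-bit header words of a TFM file and sanity-check them."""
    if len(data) < 24:
        raise ValueError("too short to be a TFM file")
    words = dict(zip(FIELDS, struct.unpack(">12H", data[:24])))  # big-endian
    expected = (6 + words["lh"] + (words["ec"] - words["bc"] + 1)
                + words["nw"] + words["nh"] + words["nd"] + words["ni"]
                + words["nl"] + words["nk"] + words["ne"] + words["np"])
    words["consistent"] = words["lf"] == expected
    words["characters"] = words["ec"] - words["bc"] + 1
    return words
```

For example, read_tfm_header(open(r"c:\luatexblog\texmf\fonts\tfm\public\cm\cmr10.tfm", "rb").read()) should report a consistent header for a genuine .tfm file.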

Answer to: What other files and resources does LuaTeX need for plain TeX?

Finally, we are in a position to answer this question. For LuaTeX to successfully output a PDF file containing the typeset results it needs access to:

  • TeX Font Metric files (extension .tfm)
  • Font encoding files (extension .enc)
  • Glyph data files (extension .pfb, on Windows)
  • The plain TeX format file (extension .fmt)
  • The file(s) containing your document (extension .tex etc)
  • And one we have not explained: .map files (specifically for pdfTeX)

In addition, of course, to any graphics you want to include but we’ll leave graphics to the LaTeX format.

pdfTeX and .map files: a primer
We have noted that TeX engines such as pdfTeX and LuaTeX can output direct to PDF. To do so, they need access to the actual font data files which describe what the glyphs look like (.pfb files on Windows), so that they can embed this data into the PDF. We have seen that, for pure typesetting purposes, TeX engines only need access to TeX font metrics. The magic ingredient which connects the two is called a font map file and is a specific requirement for pdfTeX-related TeX engines. Quoting a slightly edited extract from the pdfTeX manual:

“Font map files provide the connection between TeX tfm font files and the outline font file names (.pfb files). They contain also information about re-encoding arrays, partial font embedding (“subsetting”), and character transformation parameters (like SlantFont and ExtendFont). Those map files were first created for dvi postprocessors. But, as pdfTeX in pdf output mode includes all pdf processing steps, it also needs to know about font mapping, and therefore reads in one or more map files. Map files are not read in when pdfTeX is in dvi mode. By default, pdfTeX reads the map file pdftex.map. In Web2c, map files are searched for using the TEXFONTMAPS config file value and environment variable.”
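A simple entry in pdftex.map looks like cmr10 CMR10 <cmr10.pfb: the TFM name, the PostScript font name and, after <, the font file to embed. The Python sketch below picks those pieces out of simple lines. It is my own illustration and deliberately ignores font flags and quoted instructions such as SlantFont. It treats only % as a comment marker, and it assumes that a < file ending in .enc is an encoding file. For the full syntax, including the << and <[ prefix variants, see the pdfTeX manual.

```python
def parse_map_line(line):
    """Parse a simple pdfTeX map line; return None for blank or comment lines."""
    line = line.strip()
    if not line or line.startswith("%"):
        return None
    tokens = line.split()
    entry = {"tfm": tokens[0], "psname": None, "fontfile": None, "encfile": None}
    for token in tokens[1:]:
        if token.startswith("<"):
            name = token.lstrip("<[")  # strip '<', '<<' or '<[' prefixes
            key = "encfile" if name.lower().endswith(".enc") else "fontfile"
            entry[key] = name
        elif entry["psname"] is None:
            entry["psname"] = token  # first bare token after the TFM name
    return entry


print(parse_map_line("cmr10 CMR10 <cmr10.pfb"))
```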

Step 2: Work out the best way to organise these various file types on your hard drive

So, now that we have identified the resources that LuaTeX needs, we need to think about how we should organise these files in the most appropriate way: i.e., a suitable directory structure. Readers who have been following this set of tutorials may already have seen the posting on the TeX Directory Structure (TDS), and that’s what I’ll use to guide the remainder of this tutorial.

Reminder: Kpathsea!
Don’t forget that the Kpathsea path-searching library (built into LuaTeX) is the vital component that will be searching through the TDS directory structure to locate the files and resources that LuaTeX will be looking for. Consequently, it is a good idea to make sure that your directory structure is optimised to make the best use of Kpathsea, so that LuaTeX can find files as fast as possible.

You can have multiple TDS trees: thanks to Kpathsea!
The Kpathsea library supports the use of multiple TDS trees so that you can split a big TeX installation into multiple directory structures, providing flexibility to manage your TeX installation. A paper by Michael J. Downes, Managing Multiple TDS Trees, covers this in some detail.

Quoting from the TDS specification itself:

“In this document, we shall designate the root TDS directory by “texmf” (for “TEX and METAFONT”). We recommend using that name where possible, but the actual name of the directory is up to the installer. On PC networks, for example, this could map to a logical drive specification such as T:. Similarly, the location of this directory on the system is site-dependent. It may be at the root of the file system; on Unix systems, /usr/local/share, /usr/local, /usr/local/lib, and /opt are common choices.”

So, our first task is to create a directory structure which is rooted in a directory called “texmf”. As the TDS specification says “the location of this directory on the system is site-dependent” so we can put it wherever we choose. On my PC I’m going to use

c:\luatexblog\texmf

Further, section 3 Top-level directories of the TDS specification says that “the directories under the texmf root identify the major components of a TeX system”. However, the TDS specification also notes (section A Unspecified pieces) that the location of certain file types is not covered by the recommendations:

  • The location of executable programs is too site-dependent to recommend a location. A site may place executables outside the texmf tree altogether, in a platform-dependent directory within texmf, or elsewhere.
  • The location of implementation-specific files (e.g., TeX .fmt files): by their nature, these must be left to the implementor or TeX maintainer.

So, where we put the luatex.exe file and the plain TeX .fmt file is up to us. The Kpathsea documentation (currently for version 6, July 2010) gives a nice example of a skeleton TDS which I’ll use for our minimal install for plain TeX. Based on the reasoning above, we need to define directories which contain:

  • TeX Font Metric files (extension .tfm)
  • Font encoding files (extension .enc)
  • Glyph data files (extension .pfb, on Windows)
  • The plain TeX format file (extension .fmt)
  • The plain TeX source files (plain.tex and hyphen.tex)
  • .map files (specifically for pdfTeX)

These will be subdirectories of c:\luatexblog\texmf. Starting with the font-related directories, we’ll create a set of directories which follow the structure:

c:\luatexblog\texmf\fonts\[type]\[supplier]\[typeface]

Where [type] will be

  • tfm: for .tfm files (TeX font metrics)
  • type1: for .pfb files (Printer Font Binary)

Where [supplier] will be public (i.e., for free fonts) and [typeface] will simply be cm (for Computer Modern). You can see this is the directory structure used on TeX Live:

In addition, under c:\luatexblog\texmf\fonts\ we’ll need to create directories for

  • map: for .map files (pdfTeX and LuaTeX font mapping files)
  • enc: for .enc files (font encoding)

Now we just need directories to contain

  • plain TeX source files (plain.tex and hyphen.tex)
  • the plain TeX .fmt file
  • the texmf.cnf file that we’ll write for Kpathsea

As discussed in previous posts, Kpathsea uses a mixture of environment variables and configuration files (called texmf.cnf) to perform its path-searching magic. Actually, it uses a fairly complex interplay between environment variables and variables named in configuration files. I’m not going to explore this because it is described, in detail, in the Kpathsea documentation.

Giving Kpathsea a starting point

Clearly, when you start LuaTeX (and hence Kpathsea) there has got to be some way for the Kpathsea library to “hook into” your computer setup, a kind of “entry point” if you like so that it knows where to find your texmf.cnf file(s). You do this by setting an environment variable called TEXMFCNF which tells Kpathsea where to start looking for your configuration files (texmf.cnf).

Debugging Kpathsea searches

Most of us have, at one time or another, experienced situations where the TeX engine cannot locate a particular file or class of files. This can be rather frustrating so it is well worth setting a couple of environment variables to switch on Kpathsea’s debugging (creating a log file): telling you where Kpathsea is looking for a particular file, or type of file. This can be extremely helpful to diagnose “can’t find file” errors. The environment variables you need to set are KPATHSEA_DEBUG and KPATHSEA_DEBUG_OUTPUT.

  • KPATHSEA_DEBUG_OUTPUT: this is the path and name of the log file to record the debug output (for example, KPATHSEA_DEBUG_OUTPUT=c:/kspsluatex.log).
  • KPATHSEA_DEBUG: this takes a numeric value which controls the type of debugging output to generate. If you set it to -1 then Kpathsea will log everything into the file pointed to by KPATHSEA_DEBUG_OUTPUT. However, note that setting KPATHSEA_DEBUG=-1 will create a lot of output. Other values for KPATHSEA_DEBUG are documented here.
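KPATHSEA_DEBUG is a set of bits added together, which is why -1 (all bits set) switches everything on. The Python sketch below decodes a value into categories. The mapping of bits 1 to 64 to categories reflects my reading of the debugging section of the Kpathsea manual. Please check the list against the documentation for your version before relying on it. The names in the table are my own short labels.

```python
# Bit values as I read them in the Kpathsea manual; check your version.
DEBUG_BITS = {
    1: "stat",     # stat calls
    2: "hash",     # hash table lookups
    4: "fopen",    # file openings and closings
    8: "paths",    # general path information
    16: "expand",  # path expansions
    32: "search",  # each file search
    64: "vars",    # values of variables looked up
}


def describe_kpathsea_debug(value):
    """List the debug categories selected by a KPATHSEA_DEBUG value."""
    # In Python, -1 & bit is non-zero for every bit, so -1 selects them all.
    return [name for bit, name in DEBUG_BITS.items() if value & bit]


print(describe_kpathsea_debug(36))  # ['fopen', 'search']
```

If my reading is right, something like KPATHSEA_DEBUG=36 (4 + 32) gives a much shorter log than -1 while still showing where each file was looked for.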

OK, I think that’s enough for one post. I’ll (hopefully) complete this series in the next tutorial. Until then, stay tuned and let me know if you spot errors in the above. I’ll fix a few tiredness-induced typos in this post, too! (update: ahem, few typos fixed… was rather late to be writing this…)

Thank you Microsoft, here’s why!

I am a massive fan of open source software and enjoy the freedom of being able to download source code via SVN (e.g., LuaTeX and Ghostscript). I subscribe to numerous mailing lists to keep up-to-date with various open source distributions close to my heart and interests (FreeType, Ghostscript, LuaTeX, Lua, …). In short, I love, enjoy and applaud open source software. So, why am I saying “Thank you Microsoft, here’s why!”? About 6 months ago I purchased, at some expense (about £500, that hurt!), a copy of FontLab Studio to support my research into OpenType fonts with LuaTeX. Of course, I have also installed FontForge and, in addition, Microsoft VOLT (Visual OpenType Layout Tool). Of course, FontForge is open source whereas VOLT is free to use: you only get the executable, not the source code. Another useful addition to the armoury is the Adobe Font Development Kit for OpenType. With all these tools and resources at your disposal you have everything to occupy many lifetimes of research.

A few years ago, during the course of my research, I came across the blog by Murray Sargent, professor and laser physicist who works for Microsoft. In short, Professor Sargent is responsible for the excellent mathematical typesetting in Microsoft Word 2007, and later. The Cambria Math OpenType mathematical font was produced by Tiro Typeworks and currently defines the standard for OpenType mathematical typography; in addition, Microsoft proposed and pioneered the MATH table for OpenType fonts. Cambria Math works wonderfully well with LuaTeX. Personally, I am grateful to Microsoft for contributing to mathematical typesetting, releasing VOLT and, on a personal basis, sending me a copy of their excellent book “Mathematical Typesetting”. I just want to leave you with the thought that no matter how big a corporation might be, within it are departments and groups run and managed by real people who are dedicated to their art and responsive to requests from respectful, like-minded individuals. The corporate machine is one thing, small dedicated teams of friendly, helpful experts are another. That is why I say “Thank you Microsoft, here’s why!”.

Three weeks later, who is reading my blog?

Well, I’ve been spending many hours of my evenings and weekends writing this blog. Ya know, like all blogs, this stuff doesn’t just happen! I’m sitting here at my computer writing and editing for up to 6 hours a day (on weekends) in addition to a pretty full-on day job in STM publishing. So I took a look at the statistics of who is reading this stuff and where the visits are coming from. I have to say I was pleasantly surprised! In the short time this blog has been active I’ve had nearly 1000 visits from more than 20 countries, including the Seychelles and Mauritius! So, I’d like to say hello to all visitors, but especially to everyone from the top three non-UK (and non-Google, but bless you!) countries: Germany, India and France. One thing that all bloggers have to manage is SPAM, and I can tell you that I’ve already had a fair amount from those scumbags. That is why I have to moderate all comments, so that you, dear readers, do not see that garbage. Be assured I report all SPAM posts to the relevant sites.

So, dear reader, if you have a few moments to say hello, do please write a comment to let me know if the content is useful, or not useful, or what should be covered here. After all, I work in publishing where peer review is critical to what we do.

Look forward to hearing from you!

Cheers

Graham

Some TeX projects on code.google.com

Just a quick post to flag up a couple of projects on code.google.com.

  • One for the Chinese TeX community: http://code.google.com/p/ctex-kit/. Quoting from the web site: “This project aims to bring together many existing efforts including xeCJK, zhspacing, LuaTeX related Chinese support, etc.”
  • The Y&Y TeX C source code: http://code.google.com/p/yytex/. A great commercial Windows TeX system in its time and one I still have installed to this day. It cost me quite a lot of money, as I recall, but it worked really well, especially DVIPSONE (the PostScript driver) and the DVI previewer DVIWindo, which had the full support of Adobe Type Manager (you’ll see that in the C code!). Y&Y, Inc. ceased trading in 2004 and subsequently donated the source code to the TeX Users Group, which has released the sources under the GNU GPL.

Extending LuaTeX on Windows with plugins (DLLs)

About 6 months ago I came across an article and presentation by Luigi Scarso called “LuaTeX lunatic”, with the subtitle “And Now for Something Completely Different”. And different it was because, for me, it opened my eyes to some of the real power of LuaTeX: extending it via C/C++ libraries. Luigi’s truly excellent paper is Linux-centric, but the general ideas hold true for any platform, including Windows.

The power of Lua’s require(...) function

The Lua language provides a function called require(...) which loads and runs libraries (modules, in Lua terminology); these can be written in pure Lua or in C using the Lua C API. Refer to the Libraries And Bindings page on lua-users.org for more details.
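To make the C side of require(...) a little more concrete, here is a small C sketch (not Lua's own source) of two rules the standard Lua 5.1 loader follows for compiled modules: each ;-separated template in package.cpath is tried in turn with every ? replaced by the module name, and the function then called inside the library is luaopen_ followed by the module name. The helper names cpath_candidate and entry_point are hypothetical, and the sketch ignores details such as dotted module names mapping to subdirectories. As I understand it, LuaTeX locates the file itself through Kpathsea rather than package.cpath, so treat the path part purely as an illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the idx-th candidate file name from a
   ';'-separated cpath, replacing every '?' with modname.
   Returns 1 on success, 0 if there is no idx-th template or the
   result does not fit in out. */
static int cpath_candidate(const char *cpath, const char *modname, int idx,
                           char *out, size_t outsz)
{
    if (outsz == 0)
        return 0;
    const char *start = cpath;
    while (idx-- > 0) {                      /* skip earlier templates */
        start = strchr(start, ';');
        if (start == NULL)
            return 0;
        start++;
    }
    const char *end = strchr(start, ';');    /* end of this template */
    if (end == NULL)
        end = start + strlen(start);

    size_t n = 0;
    for (const char *p = start; p < end; p++) {
        size_t len = (*p == '?') ? strlen(modname) : 1;
        if (n + len >= outsz)
            return 0;                        /* keep room for '\0' */
        if (*p == '?')
            memcpy(out + n, modname, len);   /* substitute the module name */
        else
            out[n] = *p;
        n += len;
    }
    out[n] = '\0';
    return 1;
}

/* Hypothetical helper: the entry point looked up in the library is
   "luaopen_" + modname, with any '.' replaced by '_'. */
static void entry_point(const char *modname, char *out, size_t outsz)
{
    snprintf(out, outsz, "luaopen_%s", modname);
    for (char *p = out; *p != '\0'; p++)
        if (*p == '.')
            *p = '_';
}
```

For example, with a hypothetical cpath of ./?.dll;c:/lua/clibs/?.dll, a call to require("helloluatex") would try ./helloluatex.dll first and, once a file is found, call luaopen_helloluatex inside it.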

Using require(...) with LuaTeX: a primer

Once again, the secret ingredient is the LuaTeX command \directlua{...} which, as discussed in previous posts, lets you run Lua code from within documents you process with LuaTeX. Suppose you have a DLL which you, or someone else, have written with a Lua binding and you want to use it with LuaTeX. How do you do it?

Firstly, within texmf.cnf you need to define a variable called CLUAINPUTS, which tells Kpathsea where to search for files with extension .dll and .so (shared object file, on Linux). For example, in my hand-rolled texmf.cnf the setting is

CLUAINPUTS=$TEXMF/dlls
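Kpathsea expands variable references such as $TEXMF using values from texmf.cnf or the environment. The C sketch below shows that substitution step in isolation. The cnf_var table and the value c:/luatexblog/texmf (the root directory from the earlier parts of this series, written with forward slashes) are assumptions for illustration; the sketch handles only the $NAME form (not ${NAME}), and unknown names simply expand to an empty string here, which is a choice of the sketch rather than Kpathsea's documented behaviour.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for variables defined elsewhere in texmf.cnf */
struct cnf_var { const char *name; const char *value; };

/* Find the value of the variable whose name is name[0..namelen) */
static const char *lookup(const struct cnf_var *vars, size_t nvars,
                          const char *name, size_t namelen)
{
    for (size_t i = 0; i < nvars; i++)
        if (strlen(vars[i].name) == namelen &&
            strncmp(vars[i].name, name, namelen) == 0)
            return vars[i].value;
    return NULL;
}

/* Copy 'in' to 'out', replacing each $NAME with its value (unknown
   names become ""). Returns 1 on success, 0 if out is too small. */
static int expand_vars(const char *in, const struct cnf_var *vars, size_t nvars,
                       char *out, size_t outsz)
{
    size_t n = 0;
    if (outsz == 0)
        return 0;
    while (*in != '\0') {
        const char *piece = in;              /* text to append */
        size_t len = 1;
        if (*in == '$') {
            const char *name = in + 1;
            size_t namelen = 0;
            while (isalnum((unsigned char)name[namelen]) || name[namelen] == '_')
                namelen++;                   /* scan the variable name */
            const char *val = lookup(vars, nvars, name, namelen);
            piece = val ? val : "";
            len = strlen(piece);
            in = name + namelen;
        } else {
            in++;                            /* ordinary character */
        }
        if (n + len >= outsz)
            return 0;
        memcpy(out + n, piece, len);
        n += len;
    }
    out[n] = '\0';
    return 1;
}
```

With TEXMF set to c:/luatexblog/texmf, the setting above expands to c:/luatexblog/texmf/dlls, which is where helloluatex.dll would go in this setup.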

The LuaTeX Reference Manual notes the default setting of

CLUAINPUTS=.:$SELFAUTOLOC/lib/{$progname,$engine,}/lua//
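In that default, . is the current directory, $SELFAUTOLOC is the directory containing the executable, the colon separates path elements, and a trailing // asks Kpathsea to search subdirectories as well. The braces are Kpathsea brace expansion: one path element becomes several alternatives, including an empty one. Here is a simplified C sketch of just that step, assuming a single unnested group, leaving variables unexpanded, and using an arbitrary ALT_MAX buffer limit; expand_braces is a hypothetical name, not Kpathsea's code.

```c
#include <stdio.h>
#include <string.h>

#define ALT_MAX 256   /* assumed maximum length of one expanded element */

/* Expand the first {a,b,...} group in 'pattern' (no nesting) into up
   to maxout strings. Returns the number of strings written to out. */
static int expand_braces(const char *pattern, char out[][ALT_MAX], int maxout)
{
    const char *open = strchr(pattern, '{');
    const char *close = open ? strchr(open, '}') : NULL;
    if (open == NULL || close == NULL) {     /* no group: copy as-is */
        if (maxout < 1)
            return 0;
        snprintf(out[0], ALT_MAX, "%s", pattern);
        return 1;
    }
    int count = 0;
    const char *alt = open + 1;              /* start of first alternative */
    while (count < maxout) {
        const char *sep = alt;
        while (sep < close && *sep != ',')   /* find end of this alternative */
            sep++;
        snprintf(out[count++], ALT_MAX, "%.*s%.*s%s",
                 (int)(open - pattern), pattern,  /* text before '{' */
                 (int)(sep - alt), alt,           /* alternative, may be empty */
                 close + 1);                      /* text after '}' */
        if (sep == close)                    /* last alternative done */
            break;
        alt = sep + 1;
    }
    return count;
}
```

Applied to the braced element of the default, it yields $SELFAUTOLOC/lib/$progname/lua//, $SELFAUTOLOC/lib/$engine/lua// and $SELFAUTOLOC/lib//lua//; Kpathsea then expands the variables and searches each resulting location.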

World’s most pointless DLL code?

Just for completeness, and by way of an ultra-minimal example, here is probably the world’s most pointless C code for a DLL that you can call from LuaTeX. To compile it you will, of course, need to link against the Lua libraries (note that I use Microsoft’s Visual Studio for this):


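/* helloluatex.c: compile as C (not C++) so the exported name is not mangled */
#include <stdio.h>     /* printf */
#include <windows.h>
#include "lua.h"
#include "lauxlib.h"   /* luaL_Reg and luaL_register (Lua 5.1 API, as used by LuaTeX 0.65) */

/* Mark the module entry point for export from the DLL */
#define LUA_LIB   __declspec(dllexport) int

/* Lua-callable C function: prints a greeting and returns no values to Lua */
static int helloluatex_greetings(lua_State *L){
	(void)L;   /* the Lua state is not needed here */
	printf("Hello to LuaTeX from the world's smallest DLL!");
	return 0;  /* number of results pushed onto the Lua stack */
}

/* Functions exported to Lua, terminated by a {NULL, NULL} sentinel */
static const luaL_Reg helloluatex[] = {
	{"greetings", helloluatex_greetings},
	{NULL, NULL}
};

/* require("helloluatex") looks up and calls luaopen_helloluatex */
LUA_LIB luaopen_helloluatex (lua_State *L) {
	luaL_register(L, "helloluatex", helloluatex);  /* creates the global table helloluatex */
	return 1;  /* the module table is left on the stack as the result */
}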

You need to compile the above C code into a DLL called helloluatex.dll and copy it to the directory or path pointed to by CLUAINPUTS.

LuaTeX code to use our new DLL

Here is a minimal (LaTeX) file to load helloluatex.dll and call the greetings function we defined via the Lua C API. We'll call the file dlltest.tex.

\documentclass[11pt,twoside]{article}
\begin{document}
\pagestyle{empty}
\directlua{
	require("helloluatex")
	helloluatex.greetings()
}
\end{document}

Running this as luatex --fmt=lualatex dlltest.tex gives the output

This is LuaTeX, Version beta-0.65.0-2010122301
(c:/.../dlltest.tex
LaTeX2e <2009/09/24>
(c:/.../formats/pdflatex/base/article.cls
Document Class: article 2007/10/19 v1.4h Standard LaTeX document class
(c:/.../formats/pdflatex/base/size11.clo))
No file dlltest.aux.
Hello to LuaTeX from the world's smallest DLL!(./dlltest.aux) )
 262 words of node memory still in use:
   2 hlist, 1 vlist, 1 rule, 2 glue, 39 glue_spec, 2 write nodes
   avail lists: 2:12,3:1,6:3,7:1,9:1
No pages of output.
Transcript written on dlltest.log.

Note that the message Hello to LuaTeX from the world's smallest DLL! is printed to the DOS (console) window, in the middle of LuaTeX's own terminal output.

This is, of course, a rather simple example, so I'll try to provide more useful examples over the coming weeks and months. I have integrated a number of libraries into LuaTeX, including FreeType, Ghostscript and many others, so I'll try to cover some of these wonderful C libraries as time permits. Stay tuned!