TeX4ht: Overview of the Process

The TeX4ht system can translate TeX and LaTeX documents into other markup formats such as SGML, HTML, XML, MathML, OpenOffice format, and Braille. It comprises a large collection of TeX packages, hypertext fonts, a post-processor for dvi files, and a second post-processor that generates CSS and image files for math formulae and equations. The translation works in three stages. Below is a summary, extracted from Eitan's TeX4ht documentation, of how the translation process works.

The system can be activated with a sequence of commands of the following form, typically embedded within a script.

       latex      x            (or ‘tex x’)
       latex      x
       latex      x
       tex4ht     x
       t4ht       x
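
Since these commands are typically embedded within a script, one possible wrapper is sketched below. It is only an illustration: the function name run_tex4ht_pipeline and the DRY_RUN variable are hypothetical and not part of TeX4ht. With DRY_RUN=1 the commands are printed rather than executed, which makes it easy to check the sequence first.

```shell
#!/bin/sh
# Hypothetical wrapper around the five-step sequence shown above.
# run_tex4ht_pipeline and DRY_RUN are illustrative names, not TeX4ht features.

run_tex4ht_pipeline() {
    base=$1                 # job name without extension, e.g. "x"
    engine=${2:-latex}      # "latex" by default, or "tex"

    # Three compilations, followed by the two post-processors.
    for cmd in "$engine" "$engine" "$engine" tex4ht t4ht; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$cmd $base"            # only print what would be run
        else
            "$cmd" "$base" || return 1   # stop at the first failing step
        fi
    done
}

# Example: preview the commands for x.tex without running anything.
(DRY_RUN=1; run_tex4ht_pipeline x)
```

Dropping the DRY_RUN setting runs the real tools, so latex, tex4ht, and t4ht must then be available on the PATH.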

The three compilations with LaTeX (or TeX) are needed to ensure proper links. The approach is illustrated in the following picture.

The schematic diagram of the translation process.

  • x.tex: This is the source file, written in TeX, LaTeX, or another TeX format, that loads the style files tex4ht.sty and *.4ht. (The name x is chosen arbitrarily for this discussion.) The style files define the features of the output.
  • tex4ht: The output of TeX is a standard dvi file interleaved with special instructions for the post-processor, tex4ht, to use. These special instructions come from implicit and explicit requests made in the source file through TeX4ht commands.

    The utility tex4ht, which is a binary program, translates the dvi code into standard text, while obeying the requests it gets from the special instructions. The special instructions may request the creation of files, insertion of HTML code, filtering of pictures, and so forth.

    In the extreme case that the source code contains no commands of TeX4ht, tex4ht gets pure dvi code and it outputs (almost) plain text with no hypertext elements in it.

    The special (\special) instructions seeded in the dvi code are not understood by dvi processors other than those of TeX4ht.

  • x.idv: This is a dvi file extracted from x.dvi, and it contains the images needed in the HTML files.
  • x.lg: This is a log file listing the pictures of x.idv, the png files that should be created, CSS information, and user directives introduced through the \Needs{...} command.
  • t4ht: This is an interpreter for executing the requests made in the x.lg script.
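
When a run does not produce the expected output, it can help to see which of the intermediate files described above were actually written. The helper below is a minimal sketch under the assumption that the files sit next to the source and share its job name; report_artifacts is a hypothetical name, not a TeX4ht utility.

```shell
#!/bin/sh
# Hypothetical helper: report which intermediate files exist for a job.
# The extensions checked (dvi, idv, lg) are the ones named in the list above.

report_artifacts() {
    base=$1    # job name without extension, e.g. "x" or "path/to/x"
    for ext in dvi idv lg; do
        if [ -f "$base.$ext" ]; then
            echo "$base.$ext: present"
        else
            echo "$base.$ext: missing"
        fi
    done
}

# Example: check the files for the job named x in the current directory.
report_artifacts x
```

A missing x.dvi points at the compilation step, while a missing x.idv or x.lg suggests that tex4ht has not yet been run on the dvi file.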

2 Responses to “TeX4ht: Overview of the Process”

  • Dear CVR,

    Greetings from Lisbon. We previously exchanged some emails about installing the Kurier font on my machine. I was looking for ways to convert my LaTeX code to plain ASCII and, to my surprise, came across your website. So far, I have been able to convert my LaTeX file to plain HTML with:

    htlatex file.tex
    tex4ht file.tex

    The resulting file contains the required HTML in plain text format. Is it possible for tex4ht to produce a txt file directly?



    • Many thanks for the comments, Ashish. You could run LaTeX on your TeX file and then post-process the dvi with tex4ht. This involves the following steps:

      latex file.tex
      tex4ht file

      This still produces file.html, but it will be a plain text file without any HTML elements. I hope this is what you wanted.
