Pandoc with LaTeX: Best of both worlds

LaTeX is one of the best typesetting systems out there, but it has a glaring issue: writing the source .tex file. This is where Pandoc comes to the rescue. It can act as a bridge between a simple markup language (Markdown, for now) and LaTeX-compilable code.

While LaTeX is amazing at what it does, typesetting everyday, non-professional documents with it becomes tedious and repetitive. Markdown sits at the other end of the spectrum: it is easy to type, has only a handful of commands to remember, and doesn’t need a separate compiling engine. It is great for taking notes and writing articles.

So how do we get Markdown's ease of writing and LaTeX's final output? The answer is Pandoc, “the universal markup converter”. Pandoc can not only convert Markdown files to .tex or .pdf, it can also do so with custom templates and filters.

Installing Pandoc and LaTeX

  1. Installing Pandoc is straightforward.
  2. Install LaTeX from a popular distribution such as TeX Live or MiKTeX.

Now, let's convert a sample Markdown file to PDF.

pandoc input.md -o output.pdf
Preview of a default PDF output by Pandoc.

Customizing the output

The vanilla output is acceptable, but features such as custom fonts, a two-column layout, and headers and footers can be important. Pandoc provides multiple ways to customize the output: arguments, extensions, templates, header files, and filters.

Conversion process of markdown to pdf by Pandoc. (Simplified)

Here, we will use Pandoc arguments, a LaTeX header file, and filters to improve the output. A defaults file will then tie all the arguments, headers, and filters together.

head.tex
LaTeX code added to the preamble.
fenced_block.lua
Lua filter to add fenced blocks/divs feature for PDF output.
table_caption.py
A fix to captions for tables in two-column mode.
mydefault.yaml
A default file that stores all the necessary arguments and metadata.

Creating LaTeX header file

You can add custom LaTeX code to the preamble of the generated .tex file using a header file. Pass it with the --include-in-header argument together with --standalone; --standalone is required because the header is only added when Pandoc produces a complete document rather than a snippet of LaTeX code. This is how you will customize the appearance of your output. Below is the code for the header file head.tex.

%% head.tex

%% Automatic line breaking for code blocks
\usepackage{fvextra}
\fvset{breaklines=true}

%% Custom typefaces
\usepackage{fontspec,unicode-math}

\setmainfont{LibertinusSerif}[
    Extension = .otf,
    SmallCapsFeatures = {LetterSpace=2, Renderer=Basic},
    UprightFont    = *-Regular,
    BoldFont       = *-Bold,
    ItalicFont     = *-Italic,
    BoldItalicFont = *-BoldItalic
]
\setsansfont{LibertinusSans}[
    Extension = .otf,
    SmallCapsFeatures = {LetterSpace=2, Renderer=Basic, WordSpace={1.5}},
    UprightFont    = *-Regular,
    BoldFont       = *-Bold,
    ItalicFont     = *-Italic,
]
\setmathfont{LibertinusMath-Regular.otf}
\setmonofont[Scale=MatchLowercase]{FiraMono-Regular.otf}

\setlength{\columnsep}{0.25in}

%% Styling main title section
\makeatletter
\def\@maketitle{
    \raggedright
    {\LARGE \bfseries \@title}\\[2ex]
    {\Large \@author}\\[1ex]
    {\Large \@date}\\[8ex]
}
\makeatother

%% For styling headings
\usepackage{titlesec}

\titleformat{\section}{\normalsize\bfseries\raggedright\scshape\sffamily\MakeUppercase}{\thesection}{1em}{}
\titleformat{\subsection}{\sffamily\raggedright\bfseries}{\thesubsection}{.6em}{}
\titleformat{\subsubsection}{\normalsize\sffamily\itshape}{\thesubsubsection}{.6em}{}

\titlespacing*{\section}{0pt}{1\baselineskip}{0.5\baselineskip}
\titlespacing*{\subsection}{0pt}{0.5\baselineskip}{0.3\baselineskip}
\titlespacing*{\subsubsection}{0pt}{0.75\baselineskip}{2pt}

%% Header/Footer styling
\usepackage{fancyhdr}
\fancyhf{}
\renewcommand*{\headrulewidth}{0.4pt}

\fancyfoot[LE,RO]{\bfseries\thepage}
\fancyhead[RO]{\sffamily\textbf{\rightmark}}
\fancyhead[LE]{\sffamily\scshape\textbf{\leftmark}}

\pagestyle{fancy}

\fancypagestyle{plain}{
    \fancyhf{}
    \fancyfoot[LE,RO]{\bfseries\thepage}
    \renewcommand*{\headrulewidth}{0pt}
}

A brief summary of what the code in head.tex does:

  1. Automatic line breaking of long code blocks.
  2. Setting custom serif, sans-serif, math, and monospace fonts. Since the fontspec package is used for this, --pdf-engine must be set to something other than pdflatex, such as xelatex or lualatex.
  3. Title section (article document class) styling.
  4. Heading styling with titlesec package.
  5. Header and footer styling with fancyhdr package.

You can run pandoc with header file using the command below:

pandoc \
    --standalone \
    --include-in-header=head.tex \
    --pdf-engine=xelatex \
    input.md \
    -o output.pdf
PDF output after compiling with custom header file.
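
If you prefer to drive the conversion from a script, the command above can be assembled programmatically. The following is a minimal sketch, not part of the original setup: the function name build_pandoc_command and its parameters are hypothetical, and only flags already shown in this article are used.

```python
def build_pandoc_command(source, output, header=None, engine="xelatex"):
    """Return the pandoc argument list for a standalone PDF build.

    Hypothetical helper: it only assembles flags used in this article.
    """
    cmd = ["pandoc", "--standalone"]
    if header is not None:
        # Extra preamble code, e.g. the head.tex file described above.
        cmd.append(f"--include-in-header={header}")
    cmd.append(f"--pdf-engine={engine}")
    cmd += [source, "-o", output]
    return cmd


command = build_pandoc_command("input.md", "output.pdf", header="head.tex")
print(" ".join(command))
```

The resulting list can be handed to subprocess.run(command, check=True) on a machine where Pandoc and a LaTeX distribution are installed.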

Creating filters

Pandoc supports filters written in various languages, which can be used to manipulate the content and even the attributes and styles of existing elements.

One of the prominent features of Pandoc's Markdown is its fenced blocks (divs).

::: container-name
Content here...
:::

But the container is ignored when converting to PDF. A filter can solve that.

-- fenced_block.lua
-- A "note" container environment

function Div(el)
  if el.classes[1] == "note" then
    -- insert element in front; Note takes one mandatory argument,
    -- so pass an empty title when none is given
    local st
    if el.attributes.title then
      st = "\\begin{Note}{"..el.attributes.title.."}"
    else
      st = "\\begin{Note}{}"
    end
    table.insert(
      el.content, 1,
      pandoc.RawBlock("latex", st))
    -- insert element at the back
    table.insert(
      el.content,
      pandoc.RawBlock("latex", "\\end{Note}"))
  end
  return el
end

The Lua filter above wraps the content of every ::: note block in a Note environment, passing the title attribute as the environment's argument whenever one is given. Along with it, append the following code to head.tex so that the new Note environment is styled with the tcolorbox package and numbered with an automatic counter.

%% head.tex

\usepackage[most]{tcolorbox}
\tcbuselibrary{breakable}

\tcbset {
    base/.style={
        arc=0mm,
        bottomtitle=0.5mm,
        boxrule=0.4pt,
        colback=white,
        colbacktitle=white,
        coltitle=black,
        fonttitle=\sffamily\bfseries,
        titlerule=0pt,
        left=0.8em,
        right=0.8em,
        title={#1~\thetcbcounter},
        toptitle=0.75mm,
        overlay={
            \draw[black, line width=0.4pt] ([xshift=0.8em]title.south west)--([xshift=-0.8em]title.south east);
        }
    }
}

\newtcolorbox[auto counter,number within=section]{BOX}[1]{
    enhanced jigsaw,
    breakable,  % To make breakable box
    sharp corners,
    colframe=black,
    base={#1},
}

\newenvironment{Note}[1]%
    {\begin{BOX}{#1\hfill Note}%
        \setlength{\parskip}{1ex}}%
    {\end{BOX}}

Let's set up a sample Markdown file to test this functionality.

::: note
Lorem ipsum dolor […]
:::

::: {.note title="This note has a title too!"}
Nam vel commodo […]
:::

Now, to get the desired output, use the following command in the terminal:

pandoc \
    --standalone \
    --include-in-header=head.tex \
    --lua-filter=fenced_block.lua \
    --pdf-engine=xelatex \
    input.md \
    -o output.pdf
Preview of PDF displaying note block. First as default (without title) and second with custom title.
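
To make the transformation in fenced_block.lua easier to follow, here is a rough Python sketch of the same idea, applied directly to pandoc's JSON representation of a Div. It is an illustration only, not a replacement for the Lua filter: the helper names raw_latex and wrap_note are hypothetical, and this version always passes a (possibly empty) title argument to the Note environment.

```python
def raw_latex(code):
    """Build a RawBlock node in pandoc's JSON AST form."""
    return {"t": "RawBlock", "c": ["latex", code]}


def wrap_note(block):
    """Wrap a Div whose first class is 'note' in a LaTeX Note environment."""
    if block.get("t") != "Div":
        return block
    (identifier, classes, keyvals), content = block["c"]
    if not classes or classes[0] != "note":
        return block
    # Attributes arrive as [key, value] pairs; a missing title becomes "".
    title = dict(keyvals).get("title", "")
    begin = raw_latex("\\begin{Note}{" + title + "}")
    end = raw_latex("\\end{Note}")
    new_content = [begin, *content, end]
    return {"t": "Div", "c": [[identifier, classes, keyvals], new_content]}


sample = {
    "t": "Div",
    "c": [
        ["", ["note"], [["title", "This note has a title too!"]]],
        [{"t": "Para", "c": [{"t": "Str", "c": "Nam"}]}],
    ],
}
wrapped = wrap_note(sample)
print(wrapped["c"][1][0]["c"][1])
```

The original paragraph stays in place; only the two raw LaTeX blocks are added around it, which is exactly what the two table.insert calls do in the Lua version.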

table_caption.py (a slight modification of @lhoupert’s filter) is a fix that makes Pandoc’s table captions work in two-column mode. It requires a Python installation and the pandocfilters package.

pip install pandocfilters

# table_caption.py
# Based on https://github.com/jgm/pandoc/issues/1023#issuecomment-797435932

import pandocfilters as pf
from pandocfilters import walk

def return_latex_ref(x):
    result = []

    def go(key, val, format, meta):
        if key == 'RawInline':
            result.append(val[1])

    walk(x, go, "", {})
    return ''.join(result)

def tabular(key, value, format, meta):

  if key == 'Table' and 'processed' not in value[0][1]:
    caption = value[1]
    cap = pf.stringify(caption) + ' ' + return_latex_ref(caption)

    cmd = f'\\renewcommand\\tcap{{{cap}}}'

    value[0][1].append('processed')
    value[1] = [None, []]
    mytable = pf.elt('Table', len(value))
    
    return [pf.RawBlock('latex', cmd), mytable(*value)]

if __name__ == '__main__':
  pf.toJSONFilter(tabular)
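
The core of the filter is the line that turns the caption into a \renewcommand\tcap{...} call; the triple braces in the f-string produce one literal brace pair around the caption text. The sketch below shows that step in isolation. It uses a simplified stand-in for pf.stringify that only understands Str, Space, and SoftBreak nodes, and the names simple_stringify and caption_command are hypothetical.

```python
def simple_stringify(inlines):
    """Flatten a list of inline nodes to plain text (simplified stand-in)."""
    parts = []
    for node in inlines:
        if node["t"] == "Str":
            parts.append(node["c"])
        elif node["t"] in ("Space", "SoftBreak"):
            parts.append(" ")
    return "".join(parts)


def caption_command(inlines):
    """Build the LaTeX command that stores the caption text in \\tcap."""
    cap = simple_stringify(inlines)
    # '{{' and '}}' are literal braces; '{cap}' is the substituted value.
    return f"\\renewcommand\\tcap{{{cap}}}"


caption = [{"t": "Str", "c": "Sample"}, {"t": "Space"}, {"t": "Str", "c": "table"}]
print(caption_command(caption))
```

The filter emits this command as a raw LaTeX block just before the table, so that a redefined table environment can read the caption back from \tcap.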

The code below should be appended to head.tex. It complements the table_caption.py filter and also contains a fix for tables that applies only in two-column mode (a Pandoc bug; the fix requires twocolumn to be set).

% head.tex

\newcommand\tcap{} % For table captions
%% Table fix for twocolumn mode
\renewenvironment{longtable}{\begin{table}\begin{tabular}}{\end{tabular}\caption{\tcap}\end{table}}
\renewcommand{\endhead}{}
\renewcommand{\toprule}[2]{\specialrule{\heavyrulewidth}{\abovetopsep}{\belowrulesep}}
\renewcommand{\midrule}[2]{\specialrule{\lightrulewidth}{\aboverulesep}{\belowrulesep}}
\renewcommand{\bottomrule}[2]{\specialrule{\heavyrulewidth}{\aboverulesep}{\belowbottomsep}}

Both filters and the twocolumn class option can now be passed on the command line:

pandoc \
    --standalone \
    --include-in-header=head.tex \
    --metadata=classoption:twocolumn \
    --filter=table_caption.py \
    --lua-filter=fenced_block.lua \
    --pdf-engine=xelatex \
    input.md \
    -o output.pdf

Creating Pandoc defaults

As you can see above, the command becomes long and tedious once multiple arguments are involved. Pandoc lets you create a defaults file that stores all the arguments and metadata. A defaults file is passed with -d ⟨FILE⟩ or --defaults=⟨FILE⟩.

Defaults files (if any) live in the defaults subfolder of Pandoc's user data directory. In my case, ⟨FILE⟩ is in $HOME/.local/share/pandoc/defaults/ in .yaml format. To find your user data directory, run:

pandoc -v
pandoc 2.19.2
Compiled with pandoc-types 1.22.2.1, texmath 0.12.5.2, skylighting 0.13,
citeproc 0.8.0.1, ipynb 0.2, hslua 2.2.1
Scripting engine: Lua 5.4
User data directory: /home/mycomputer/.local/share/pandoc
Copyright (C) 2006-2022 John MacFarlane. Web:  https://pandoc.org
This is free software; see the source for copying conditions. There is no
warranty, not even for merchantability or fitness for a particular purpose.

# If the 'defaults' folder doesn't exist
mkdir -p ~/.local/share/pandoc/defaults/
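
If you want to locate the folder from a script, the relevant line can be pulled out of the version output. This is a small sketch that assumes the "User data directory:" line format shown above (other Pandoc versions may word it differently); the function name user_data_dir is hypothetical.

```python
def user_data_dir(version_output):
    """Return the path printed after 'User data directory:', or None."""
    prefix = "User data directory:"
    for line in version_output.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None


sample_output = """pandoc 2.19.2
Scripting engine: Lua 5.4
User data directory: /home/mycomputer/.local/share/pandoc
Copyright (C) 2006-2022 John MacFarlane. Web:  https://pandoc.org
"""
print(user_data_dir(sample_output))
```

On a machine with Pandoc installed, the text can be obtained with subprocess.run(["pandoc", "-v"], capture_output=True, text=True).stdout and passed to the same function.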

If you want the ‘defaults’ folder in your working folder instead (which is entirely possible), use relative paths in the commands, e.g. path/to/default.yaml instead of default.yaml.

For this tutorial, the defaults file below expects the filters and the header file to sit in filters/ and headers/ subfolders next to it:

# mydefault.yaml

standalone: true # Needed to add header
pdf-engine: xelatex
number-sections: true # Allow numbering of sections
highlight-style: monochrome
from: markdown
verbosity: ERROR
filters:
    # ${.} for defaults folder path
    # Add feature for containers for PDF output
    - ${.}/filters/fenced_block.lua
    # Fix for table captions
    - ${.}/filters/table_caption.py
metadata:
    documentclass: article
    papersize: a4
    classoption:
        - 11pt
        - twoside
        - twocolumn
        - fleqn # Align display style equations to left
    geometry:
        - margin=1in
        - heightrounded
include-in-header:
    - ${.}/headers/head.tex
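
The ${.} placeholder in the file above stands for the folder that contains the defaults file itself. Pandoc performs this substitution on its own; the sketch below only illustrates the effect, and the function name resolve_dot and the example path are assumptions for the sake of the demonstration.

```python
from pathlib import PurePosixPath


def resolve_dot(value, defaults_file):
    """Replace ${.} with the directory that holds the defaults file."""
    base = PurePosixPath(defaults_file).parent
    return value.replace("${.}", str(base))


defaults_file = "/home/mycomputer/.local/share/pandoc/defaults/mydefault.yaml"
for entry in ("${.}/filters/fenced_block.lua", "${.}/headers/head.tex"):
    print(resolve_dot(entry, defaults_file))
```

Because the paths are relative to the defaults file rather than to the working directory, the same mydefault.yaml works no matter where input.md lives.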

The output

Finally, you can export the PDF using the command below:

# .yaml extension is not required
pandoc -d mydefault input.md -o output.pdf

If the defaults file is in a local folder:

pandoc -d defaults/mydefault input.md -o output.pdf
Preview of PDF output from Pandoc with custom template. (Page 1 of 2)
Preview of PDF output from Pandoc with custom template. (Page 2 of 2)

In contrast to the default output, the final output is now two-column and typeset in the Libertinus font family, with a custom ‘Note’ box, section numbering, styled headers and footers, left-aligned math equations, narrow margins, and much more.

Miscellaneous

  1. Pandoc converts math equations into a format appropriate for the output. For LaTeX, anything inside $⟨math⟩$, $$⟨math⟩$$, or \begin{⟨environment⟩} ⟨math⟩ \end{⟨environment⟩} is passed through as literal LaTeX math.
  2. The raw_tex extension is enabled by default, so any valid LaTeX command or environment is left untouched by Pandoc and inserted verbatim into the final .tex file.
  3. \begin{align} and other equation environments don’t work inside $$ $$; use aligned instead, or remove the $$ around the \begin{align} environment. See the LaTeX Stack Exchange discussion.
  4. Pandoc uses the longtable package for its tables, which causes issues when the twocolumn option is set. Possible fixes are mentioned in issue #1023 and on the Stack Exchange forum.
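
As an illustration of point 3, a multi-line equation inside $$ $$ can be written with aligned, as in the Markdown fragment below. The equation itself is made up for the example.

```latex
$$
\begin{aligned}
(a + b)^2 &= (a + b)(a + b) \\
          &= a^2 + 2ab + b^2
\end{aligned}
$$
```

The alternative is to drop the $$ delimiters and write a bare \begin{align} ... \end{align} block, which Pandoc passes through thanks to raw_tex.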

Conclusion

As you can see, the output now looks significantly better than before. This approach can be adapted to many other use cases, such as preparing notices, writing letters and applications, or any document that follows a template. Now you can give your much-needed focus to the content rather than the appearance. All the aforementioned code, along with a sample output, can be found in this GitHub repo.
