
Pandoc Tricks

Benct Philip Jonsson edited this page Dec 7, 2019 · 34 revisions

Here are some tricks that pandoc allows but that are not obvious at first sight.


From Markdown, To Markdown

Using pandoc -f markdown... -t markdown... can have surprisingly useful applications. As a demo, this file is generated by

pandoc -f gfm -t gfm --atx-headers \
     --reference-location=block --toc -s -o temp-github.md temp.md

Be careful with @, though: you need to escape it (\@), since pandoc’s markdown treats it as the start of a citation.

Cleanup

As shown in issue #2814, converting a markdown document to markdown can be used to clean up / normalize your markdown file.

TOC generation

E.g. if you have a long markdown file on GitHub and want it to have a TOC, you can use pandoc -s -t gfm --toc -o example-with-toc.md example.md

This is a useful workaround to update the TOC of very long documents, but beware: if you use this trick to overwrite the input file, you’ll end up stacking TOCs, each new Table of Contents being generated above the previously built ones and indexing them too. The technique is only safe when the source and output files are different.

Also, you can add a title to the TOC using the toc-title variable, but only if you use a custom markdown template, as explained below.

Using Markdown Templates

Did you know that you can use pandoc templates with markdown output too?

Ask pandoc to write out the default template for markdown:

pandoc --print-default-template=markdown > template.markdown

And now let’s peek at the template we got:

$if(titleblock)$
$titleblock$

$endif$
$for(header-includes)$
$header-includes$

$endfor$
$for(include-before)$
$include-before$

$endfor$
$if(toc)$
$toc$

$endif$
$body$
$for(include-after)$

$include-after$
$endfor$

As you can see, there are plenty of conditionals to play with, allowing for additional control over the output markdown file.

You can also use the toc-title template variable to tell pandoc to add a title on top of the generated TOC. Change the template’s toc block like this:

$if(toc)$
$if(toc-title)$
# $toc-title$
$endif$

$toc$

$endif$

And now invoke pandoc like this:

pandoc --toc -V toc-title:"Table of Contents" --template=template.markdown -o example-with-toc.md example.md

And you’ll see in the example-with-toc.md file an auto-generated Table of Contents with a # Table of Contents title over it.

NOTE: if you also include some extra markdown content with the --include-before-body option (e.g. --include-before-body=somefile.md), the contents of the included file will go before the TOC (at least with the template used in this example), and any headings it contains will not be included in the TOC, because the included file is inserted verbatim and is not parsed as part of the document. This is useful if you’d like to include an Abstract before the TOC.

Math in Pure Markdown

The manual says:

Note: the --webtex option will affect Markdown output as well as HTML.

This can be used to put math in pure markdown, e.g. if you want to put math directly in a README.md on GitHub.

For example, in the temp.md:

# Important Discovery!

$1+2\neq3!$

Try it!

Run this:

pandoc --atx-headers --webtex='https://latex.codecogs.com/png.latex?' -s -o temp-codecogs.md temp.md

Then the output becomes (the math is replaced by an image whose URL points to the --webtex service):

Important Discovery!

1+2\neq3!

Try it!

Convert Between the 4 Table Syntaxes in Pandoc

Say you have this in your source markdown file pipe.md:

| testing     | pandoc            | tables  |
|-------------|-------------------|---------|
| simple cell | no multiline cell | and     |
| so          | on                | no list |

On the command line:

pandoc -t markdown-simple_tables-multiline_tables-pipe_tables -s -o grid.md pipe.md

In the output grid.md:

+--------------------------+--------------------------+--------------------------+
| testing                  | pandoc                   | tables                   |
+==========================+==========================+==========================+
| simple cell              | no multiline cell        | and                      |
+--------------------------+--------------------------+--------------------------+
| so                       | on                       | no list                  |
+--------------------------+--------------------------+--------------------------+

Repeated Footnote Anchors and Headers Across Multiple Files

If you use auto-identifiers for headers, and headers with the same name appear in different files, you want the files concatenated first so that pandoc can disambiguate the identifiers. This is what pandoc does by default:

pandoc file1.md file2.md ...

But if the same footnote anchors are used in both files, you need the --file-scope option, which parses each file individually (so the footnote anchors are “local” to each file):

pandoc file1.md file2.md --file-scope ...

What if the two files have both problems, i.e. headers with the same names (hence the same identifiers from the auto-identifiers) and footnotes with the same anchors? Either approach alone gives you problems.

In this case, you can convert markdown to markdown: write an intermediate markdown file using --file-scope, which handles the colliding footnote anchors for you, and then generate the final document from that intermediate file, letting the auto-identifiers handle the headers:

pandoc --file-scope -o intermediate.md file1.md file2.md
pandoc intermediate.md ...

Template Snippet

Suppose you wrote a template snippet that does not form a complete template. The -H, -B, and -A options will not help, because pandoc inserts the file as is and does not process it as a template, i.e. the snippet is included after the template is processed.

A trick mentioned by @cagix in jgm/pandoc-templates#220 is this:

# -t latex: render the snippet's variables as LaTeX (without -t or a known
# output extension, pandoc would default to HTML)
pandoc -t latex --template=template_snippet.tex document.md -o processed_snippet
pandoc ... -H processed_snippet document.md -o document.<toFormat>
# Or shorter but bash only (process substitution)
SNIPPET=template_snippet.tex; INPUT=document.md; OUTPUT=document.<toFormat>
pandoc ... -H <(pandoc -t latex --template="$SNIPPET" "$INPUT") "$INPUT" -o "$OUTPUT"

The first pandoc command processes your template snippet according to the properties of the document, but since your snippet (probably) does not contain $body$, the body will not be in the output. The snippet is now processed and can be included as is through -H in the second command.

YAML Metadata for Any Format

YAML metadata is only defined for pandoc’s markdown syntax. See jgm/pandoc#1960.

Currently, there is a workaround like this (note that the values in the YAML metadata are still parsed as markdown):

pandoc -f markdown -t native -s metadata.yml | sed '$ d' > metadata.native
pandoc -t native -o document.native document.<fromFormat>
pandoc -f native -s -o document.<toFormat> metadata.native document.native
# Or shorter but bash only (process substitution)
YAML=metadata.yml; INPUT=document.<fromFormat>; OUTPUT=document.<toFormat>
pandoc ... -f native -s -o $OUTPUT <(pandoc -f markdown -t native -s $YAML | sed '$ d') <(pandoc -t native $INPUT)

Explanation:

The sed in the first line is needed because metadata.yml is read as a markdown document with no body, so the last line of its native output is the empty block list [], which you need to remove. Another way of removing it is head -n -1 (which does not work with the default head on macOS). From my tests, the metadata in native format always seems to be on one line; if that holds, head -n1 also works (including on macOS), but sed '$ d' does not depend on it.

Left-aligning Tables in LaTeX

Based on this pandoc-discuss exchange and this TeX StackExchange topic, it is possible to left-align all tables in a document (in the PDF output from LaTeX) with this single setting in the YAML metadata block of the markdown document:

---
header-includes:
  - |
    ```{=latex}
    \usepackage[margins=raggedright]{floatrow}
    ```
...

This applies to all floats, and fine-grained control may be achieved with the options outlined in the documentation for the floatrow LaTeX package.

GFM Task Lists with Pandoc

Task lists are supported by pandoc’s markdown as of v2.6 (the task_lists extension). The syntax is the same as in GFM.

Today in date metadata

Add this to the pandoc command you use:

-M date="$(date "+%B %e, %Y")"

POSIX only.
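One wrinkle: %e pads single-digit days with a space (e.g. "December  7, 2019"). A sketch that squeezes the extra space with tr before passing the value on; TODAY and report.md are hypothetical names:

```shell
#!/bin/sh
# %B = full month name, %e = day of month padded with a space, %Y = year.
# tr -s ' ' squeezes the double space that %e leaves before single-digit days.
TODAY=$(date "+%B %e, %Y" | tr -s ' ')
echo "$TODAY"

# Pass it to pandoc as before, e.g.:
#   pandoc -s -M date="$TODAY" -o report.html report.md
```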

Definition list terms on their own line in LaTeX

Most tools, including most web browsers, render definition lists with the (first) definition on a separate, indented line below the term:

Term

  Definition A.
  Second line of definition.
  
  Definition B.

LaTeX instead sets the term in bold and runs the first definition in on the same line, which doesn't look good if you have space between paragraphs, as Pandoc does by default:

**Term**  Definition A.
  Second line of definition.
  
  Definition B.

It is easy to fix this without loading any extra package. Just make sure the following is in your LaTeX preamble:

% "Clone" the original \item command
\let\originalitem\item

% Redefine the \item command using the "clone"
\makeatletter
\renewcommand{\item}[1][\@nil]{%
    \def\tmp{#1}%
    \ifx\tmp\@nnil\originalitem\else\originalitem[#1]\hfill\par\fi}
\makeatother

This still leaves the term in boldface. To get the term in the normal typeface, change the invocation of \originalitem[#1] to

\originalitem[\textnormal{#1}]

Put this in your custom template or add a header-includes field to your document metadata:

---
header-includes:
  - |
    ````{=latex}
    % insert the fix here
    ````
---

Level 4 and 5 headings on their own line in LaTeX

In LaTeX, level 4 headings are rendered with the \paragraph command and level 5 headings with the \subparagraph command. These commands run the (first) paragraph after the heading in on the same line as the heading. There is an easy way to fix this: make sure to include the following in your LaTeX preamble:

% Make "clones" of the commands
\let\originalparagraph\paragraph
\let\originalsubparagraph\subparagraph

% Redefine the commands using the "clones"
\renewcommand{\paragraph}[1]%
{\originalparagraph{#1}\hfill}
\renewcommand{\subparagraph}[1]%
{\originalsubparagraph{#1}\hfill}

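Note that, unlike the similar fix for definition list terms, there should not be any \par after the \hfill here! Also note that these redefinitions take exactly one argument, so they do not handle the starred forms \paragraph* and \subparagraph*.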
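If you don't use a custom template, one way to get the fix into the preamble is to keep it in its own file and pass it with -H (--include-in-header). A sketch, where runin-fix.tex is a hypothetical file name:

```shell
#!/bin/sh
set -e

# Quoted 'EOF' keeps the shell from touching the backslashes.
cat > runin-fix.tex <<'EOF'
% Make "clones" of the commands
\let\originalparagraph\paragraph
\let\originalsubparagraph\subparagraph

% Redefine the commands using the "clones"
\renewcommand{\paragraph}[1]%
{\originalparagraph{#1}\hfill}
\renewcommand{\subparagraph}[1]%
{\originalsubparagraph{#1}\hfill}
EOF

# Then add it to the LaTeX preamble, e.g.:
#   pandoc -H runin-fix.tex -o document.pdf document.md
```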

Globbing input files in the right order

As you probably know, you can pass multiple input files to pandoc on the command line, and they will be treated as a single long file, with blank lines inserted between them:

$ pandoc this.md that.md other.md -o all.html

You can even glob a whole gang of files. This will concatenate all files with an .md extension in the current directory (aka folder):

$ pandoc *.md -o all.html

There is a snag with globbing like this, however: pandoc receives the file names in the order the shell sorts them, which is the collation order of your current locale, or plain ASCII order in the C locale (where all of A-Z come before all of a-z). Either way, this is probably not the order the parts are supposed to come in the text: given the three files from the first example, the *.md glob pattern is equivalent to

other.md
that.md
this.md
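You can see the order the shell produces with a quick experiment in a scratch directory (a sketch; it uses mktemp -d and sets LC_ALL=C so the result does not depend on your locale):

```shell
#!/bin/sh
set -e

# Use the C locale so the sort order does not depend on your locale settings.
LC_ALL=C
export LC_ALL

# Work in a scratch directory so only these three files match *.md.
cd "$(mktemp -d)"
touch this.md that.md other.md

# Expand the glob into the positional parameters, as the shell would for pandoc.
set -- *.md
echo "$@"    # other.md that.md this.md
```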

There is, however, an age-old workaround, which actually exploits the glob sorting:

If you want the files in a specific order, give them names starting with a zero-padded number:

00-intro.md
01-this.md
02-that.md
03-other.md
...
09-something.md
10-anything.md
11-more.md

The leading 0 in 01 to 09 makes sure that they sort before 10 and up. This is necessary because the shell has no concept of numeric sorting: it compares globbed file names character by character, so without the padding 10-anything.md would sort before 2-that.md. In ASCII order 0 comes before 1, which comes before 2 and so on, and all digits come before all letters.
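A small experiment showing the difference (a sketch in a scratch directory, using the C locale; the file names are hypothetical):

```shell
#!/bin/sh
set -e

# C locale: plain byte-by-byte (ASCII) comparison of file names.
LC_ALL=C
export LC_ALL

cd "$(mktemp -d)"
mkdir plain padded
touch plain/1-intro.md plain/2-this.md plain/10-anything.md
touch padded/01-intro.md padded/02-this.md padded/10-anything.md

# Without padding, 10-anything.md lands between 1-intro.md and 2-this.md
# ('-' sorts before '0', and '1' sorts before '2').
plain_order=$(cd plain && echo *.md)
# With padding, the glob order matches the numeric order.
padded_order=$(cd padded && echo *.md)

echo "plain:  $plain_order"     # 1-intro.md 10-anything.md 2-this.md
echo "padded: $padded_order"    # 01-intro.md 02-this.md 10-anything.md
```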

It is usually a good idea to add an extra trailing 0 as well (or more if you have a lot of files or are going to move files around a lot):

000-intro.md
010-this.md
020-that.md
030-other.md
...
090-something.md
100-anything.md
110-more.md

This way, if you want to move a part of the text around or add a part between existing parts, you don't need to renumber all the files; you can just give the files containing the moved parts suitable intermediate numbers:

000-intro.md
010-this.md
015-other.md # used to be 030
020-that.md
...
090-something.md
100-anything.md
105-additional.md # new file!
110-more.md

Using this technique, the numbering of the file names may get out of sync with the numbering of sections/chapters in the text, but that is OK: you should generally rely on the names/labels of chapters/sections as identifiers and let Pandoc itself, LaTeX and/or pandoc-crossref handle the actual section numbering. The numbers in the file names are file numbers, in a format that suits the shell and is as human-friendly as possible.
