WILT: Symbols in Pandoc
Struggled to include Unicode symbols in a Markdown > Pandoc > PDF workflow. Found a way.
> pandoc -t pdf -f markdown -o Roadmap.pdf --pdf-engine=xelatex --template=../pandoc/PDF.template Roadmap.md
[WARNING] Missing character: There is no ✪ (U+272A) (U+272A) in font [lmroman10-regular]:mapping=t
...damn. These are the errors that make me cry. I'm not a pandoc expert and I know this is going to hurt.
Problem evaluation
Converting Markdown to PDF via Pandoc means the fonts are specified up front. Unlike the freeform nature of HTML, where you can drop in a Unicode character with wild abandon, the font specified for the document needs to actually contain the character. Obviously, my default font doesn't have it. Which one has characters like ✪?
Font research
Looking through the fonts on the Mac, and reading up, it turns out most fonts don't include extended character sets beyond Latin or their chosen characters, because they can get HUGE for no good reason. OK, that's sensible. I then played with Wingdings and Webdings and others, and didn't find my Unicode characters. Why? Because they map their symbols onto the basic character set instead. Of course.
A DuckDuckGo search turns up Monospace, which does have the Unicode characters. Nice. How to get them in the PDF?
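A quick way to check whether a font really carries a character, before wiring anything into Pandoc, is to ask xelatex directly. This is a minimal sketch, assuming the font is installed under the name Monospace as in this post; `\glyphtest` is a made-up helper name, and `\iffontchar` is the engine primitive that tests the current font for a code point.

```latex
% Compile with xelatex. Prints "yes" or "no" for each font.
\documentclass{article}
\usepackage{fontspec}
\newfontfamily\myfont{Monospace} % font name taken from the post; adjust to yours

% \glyphtest{HEX}: hypothetical helper, tests the *current* font for U+HEX.
% The space after #1 ends the hexadecimal number.
\newcommand{\glyphtest}[1]{\iffontchar\font"#1 yes\else no\fi}

\begin{document}
Default font has U+272A: \glyphtest{272A}

{\myfont Monospace has U+272A: \glyphtest{272A}}
\end{document}
```

If the font name isn't installed, fontspec stops with an error at the `\newfontfamily` line, which is also a useful answer.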
newunicodechar
This command lets you specify a substitute: if the LaTeX engine sees the specified character in the source, it replaces it with something else. And the something else can include LaTeX commands. Nice! I add this at the top, in my front matter:
header-includes:
- |
  ```{=latex}
  \usepackage{newunicodechar}
  \newfontfamily\myfont[]{Monospace}
  \DeclareTextFontCommand{\textmyfont}{\myfont}
  \newunicodechar{✪}{\textmyfont{①}}
  \newunicodechar{Ⓗ}{\textmyfont{Ⓗ}}
  \newunicodechar{Ⓜ}{\textmyfont{Ⓜ}}
  \newunicodechar{Ⓛ}{\textmyfont{Ⓛ}}
  ```
The header-includes above is a token in my PDF template file, so the text gets included right at the top of the converted LaTeX output before it goes through the PDF generation part.
\DeclareTextFontCommand lets you use the font command \myfont inline as \textmyfont rather than needing the longform \begin ... \end syntax.
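To make the difference concrete, here is a small standalone sketch of the two forms side by side. It assumes the same Monospace font and the same command names as the front matter above; the surrounding sentence text is just filler.

```latex
% Compile with xelatex.
\documentclass{article}
\usepackage{fontspec}
\newfontfamily\myfont{Monospace}
\DeclareTextFontCommand{\textmyfont}{\myfont}

\begin{document}
% Declaration form: \myfont switches fonts until the group closes.
Priority: {\myfont Ⓗ} high

% Command form: only the argument is set in the other font.
Priority: \textmyfont{Ⓗ} high
\end{document}
```

Both lines should render the same; the command form is simply easier to drop into a \newunicodechar replacement.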
You'll note that the ✪ is replaced with a ① in line 7. That's because Monospace doesn't have that symbol, but I like it for the web Pandoc output, so I fall back to the ① for PDF. In the Markdown itself, those characters remain as their &...; Unicode representations. Pandoc decodes them into the actual characters when it converts to LaTeX, and when xelatex runs, newunicodechar sees each character and substitutes the LaTeX command.
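If a mapping misbehaves, it can help to take Pandoc out of the loop and try it in a bare LaTeX file. The sketch below assumes the same font and mappings as the front matter and is only a test harness, not what Pandoc actually generates from the template.

```latex
% Compile with xelatex to test the mappings without Pandoc.
\documentclass{article}
\usepackage{fontspec}
\usepackage{newunicodechar}
\newfontfamily\myfont{Monospace}
\DeclareTextFontCommand{\textmyfont}{\myfont}

% ✪ is missing from the font, so it falls back to ①.
\newunicodechar{✪}{\textmyfont{①}}
% Ⓗ exists in the font, so it maps to itself in that font.
\newunicodechar{Ⓗ}{\textmyfont{Ⓗ}}

\begin{document}
Star: ✪, high: Ⓗ
\end{document}
```

If this compiles without a "Missing character" warning in the log, the same lines should behave in header-includes.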
Success!


