IEEE Bibliographies with Pandoc

I’ve recently rolled my own Python pandoc filter that rewrites the bibliography to be compatible with IEEEtran.cls, the LaTeX class for IEEE Transactions-style submissions.
I thought I’d write it up here rather than forget the result — it took me all day to write only 30+ lines of filter code and most of that was figuring out how to debug a filter.

First off, if you’re just looking for the filter, it can be found here. OK! Now let’s get into it…

The Problem




The problem begins with IEEEtran’s style only correctly formatting `proper` BibTeX-style entries in the bibliography, each introduced by a \bibitem.
For instance, this is what it expects:
\bibitem{b1}\hypertarget{ref-declerck2016cori}{} T. Declerck \emph{et al.}, ``Cori - a system to support data-intensive computing,'' \emph{Proceedings of the Cray User Group}, p. 8, 2016.

Unfortunately, the way pandoc does referencing is fixed to the AST representation, which heavily relies on explicit use of hypertargets to deal with citations directly.
Out of the box, pandoc’s bibliography entries instead have the form:
{[}1{]} T. Declerck \emph{et al.}, ``Cori - a system to support data-intensive computing,'' \emph{Proceedings of the Cray User Group}, p. 8, 2016.

Thankfully, we can write a filter using the pandocfilters Python package to update the AST automagically!
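As a rough illustration of the idea (a sketch, not the filter from this post), a filter can walk the JSON AST, find each bibliography entry, and swap the leading `[1]` label for a raw LaTeX `\bibitem{b1}`. The version below sticks to the standard library so it can be tried without pandoc installed. It assumes that pandoc-citeproc emits each entry as a `Div` whose id starts with `ref-` and whose first block is a `Para` beginning with the label; the names `add_bibitems` and `main` are hypothetical.

```python
import json
import re
import sys

# Numeric citation label such as "[1]" (assumed to be the first inline of an entry)
LABEL = re.compile(r"^\[(\d+)\]$")


def add_bibitems(node):
    """Recursively rewrite bibliography entries in a pandoc JSON AST, in place."""
    if isinstance(node, list):
        for child in node:
            add_bibitems(child)
    elif isinstance(node, dict):
        if node.get("t") == "Div":
            # A Div is stored as [[id, classes, key-values], blocks]
            (ident, _classes, _attrs), blocks = node["c"]
            if ident.startswith("ref-") and blocks and blocks[0]["t"] == "Para":
                inlines = blocks[0]["c"]
                if inlines and inlines[0]["t"] == "Str":
                    match = LABEL.match(inlines[0]["c"])
                    if match:
                        # Replace "[1]" with raw LaTeX "\bibitem{b1}"
                        inlines[0] = {
                            "t": "RawInline",
                            "c": ["latex", "\\bibitem{b%s}" % match.group(1)],
                        }
        for child in node.values():
            add_bibitems(child)
    return node


def main():
    """Filter entry point: pandoc pipes the AST in on stdin and reads it back from stdout."""
    json.dump(add_bibitems(json.load(sys.stdin)), sys.stdout)
```

To use something like this as a real filter, the script would end with a call to `main()` and be passed to pandoc with `--filter`, after pandoc-citeproc has generated the bibliography.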

The Workflow

First off, it’s pretty common to muck up your python code when putting together these filters.
Unfortunately, the error messages are pretty cryptic or non-existent: I mostly got fd:4: hClose: resource vanished (Broken pipe) or Pandoc died with exitcode "83" during conversion.

Secondly, you can’t use print statements, since anything printed to stdout ends up mixed into the AST output that pandoc reads back, or a debugger such as pdb, since pandoc spawns the filter as a separate process. No stepping through code for you!
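One option that sidesteps the stdout problem is writing diagnostics to standard error instead, since pandoc only parses what the filter writes to standard output. The sketch below shows the idea; `debug` and `describe` are hypothetical helper names, and the 60-character truncation is an arbitrary choice to keep log lines short.

```python
import json
import sys


def debug(*args):
    """Write diagnostics to stderr, leaving stdout clean for the JSON AST."""
    print(*args, file=sys.stderr)


def describe(node):
    """Return a short one-line summary of an AST node, suitable for logging."""
    if isinstance(node, dict) and "t" in node:
        # "t" is the element type, "c" its contents (absent for elements like Space)
        return "%s: %s" % (node["t"], json.dumps(node.get("c"))[:60])
    return repr(node)[:60]
```

Inside a filter action, a call such as `debug(describe(node))` would then log each visited element without touching the JSON stream.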

A method that I found worked was to generate the intermediate JSON representation of pandoc’s AST.
This can be generated easily using pandoc -t json and can be stored as a file.
Next, I fired up ipython and installed the pypandoc library, which was really useful for fast iterations to test my filter.
Load up the json file in ipython such as: json_dat = open('test_out.json').read()
Now, you can quickly prototype your filter with pypandoc.convert_text(json_dat,'tex',format='json',filters=[os.path.join('pandoc-tools','')]) — for my filter file called

Checking the result just involves reading the LaTeX output of that function to see whether the desired changes were made.
Most of my time was spent printing json_dat, skipping to the bad chunk of code, and counting the number of []’s in the AST to figure out why the variable of interest wasn’t collected. Thankfully, these errors around not collecting the right number of arguments are described extensively in the pandoc-filter output!
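Rather than counting brackets by hand, a small helper can print the nesting of element types found in the JSON. This is a sketch, assuming the pandoc 1.19-era JSON layout in which each element is an object with a `t` (type) key and an optional `c` (contents) key; `outline` is a hypothetical helper name and the sample string stands in for the contents of test_out.json.

```python
import json


def outline(node, depth=0, lines=None):
    """Collect one indented line per AST element, showing only its type."""
    if lines is None:
        lines = []
    if isinstance(node, dict):
        if isinstance(node.get("t"), str):
            # An AST element: record its type, then descend into its contents
            lines.append("  " * depth + node["t"])
            outline(node.get("c"), depth + 1, lines)
        else:
            # A plain mapping such as the top-level document or the metadata
            for value in node.values():
                outline(value, depth, lines)
    elif isinstance(node, list):
        for child in node:
            outline(child, depth, lines)
    return lines


# Stand-in for: json_dat = open('test_out.json').read()
json_dat = (
    '{"blocks": [{"t": "Para", "c": [{"t": "Str", "c": "[1]"}, {"t": "Space"}]}],'
    ' "meta": {}, "pandoc-api-version": [1, 17, 0, 4]}'
)
print("\n".join(outline(json.loads(json_dat))))
```

The indentation in the printed outline mirrors the bracket depth, which makes it easier to see which level a given `c` list belongs to.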

The Solution

Voila! The result of my bib-filter now generates:
\bibitem{b1}\hypertarget{ref-declerck2016cori}{} T. Declerck \emph{et al.}, ``Cori - a system to support data-intensive computing,'' \emph{Proceedings of the Cray User Group}, p. 8, 2016.

Good luck and happy hacking!
p.s. the pandoc community needs all the filters we can get, so thank you for viewing this post.

Writing sciency things in Markdown – Pandoc is Awesome!

Pandoc is an awesome tool!
This is especially true once properly configured for scientific writing.
Personally, I write all my papers in Markdown (or RMarkdown for the fancy stuff that requires generating figures) and leave pandoc to automatically produce PDFs and LaTeX output.
In fact, all my builds are simultaneously generated for 3 separate versions — corresponding to the major style guides in computer science — each in ACM, IEEE and LNCS formatting.
I get really distracted writing LaTeX directly; it’s really easy to lose track of what you want to say when you can spend half the day typesetting and resizing figures.
This is where writing in Markdown really shines: it allows you the flexibility of LaTeX, since TeX can be embedded in any part of the document, without going down the long road of typesetting and losing your train of thought.
Best of all, if you’re about to submit the paper and need to finally focus on typesetting it’s easy to generate a LaTeX output of your work and edit as you normally would using the classic TeX workflow.
The full code is available on github and was built with the following packages:

  • pandoc — 1.19.2
  • pandoc-citeproc — 0.10.4
  • pandoc-crossref —

The corresponding pdfs can be viewed here as ACM, IEEE and LNCS.