IEEE Bibliographies with Pandoc

I’ve recently rolled my own python pandoc-filter to parse the bibliography to be compatible with the IEEEtran.cls for IEEE style transactions for LaTeX submissions.
I thought I’d write it up here rather than forget the result — it took me all day to write only 30+ lines of filter code and most of that was figuring out how to debug a filter.

First off, if you’re just looking for the filter it can be found here. Ok! now let’s get into it…

The Problem

Bad:

 

Good:

The problem begins with IEEEtran’s style guide only correctly formatting `proper` bibtex sections in the bibliography.
For instance:
\begin{thebibliography}{00}
\bibitem{b1}\hypertarget{ref-declerck2016cori}{} T. Declerck \emph{et al.}, ``Cori - a system to support data-intensive computing,'' \emph{Proceedings of the Cray User Group}, p. 8, 2016.
\bibitem{b2}...
\end{thebibliography}

Unfortunately, the way pandoc does referencing is fixed to the AST representation, which heavily relies on explicit use of hypertargets to deal with citations directly.
It has the form:
\section*{References}\label{references}
\addcontentsline{toc}{section}{References}
\hypertarget{refs}{}
\hypertarget{ref-declerck2016cori}{}
{[}1{]} T. Declerck \emph{et al.}, ``Cori - a system to support data-intensive computing,'' \emph{Proceedings of the Cray User Group}, p. 8, 2016.

Thankfully, we can write a filter using the pandoc-filter python package to update the AST automagically!

The Workflow

First off, it’s pretty common to muck up your python code when putting together these filters.
Unfortunately, the error messages are pretty cryptic / non-existant — I got mostly fd:4: hClose: resource vanished (Broken pipe) or Pandoc died with exitcode "83" during conversion.

Secondly, you can’t use print statements — since it’s included in AST output — or a debugger such as pdb since the process is spawned on a separate process, no stepping through code for you!

A method that I found worked was to generate the intermediate JSON representation used by pandoc’s AST.
This can be generated easily using pandoc -t json and can be stored as a file.
Next, I fired up ipython and installed the pypandoc library — this was really useful for fast iterations to text my filter.
Load up the json file in ipython such as: json_dat = open('test_out.json').read()
Now, you can quickly prototype your filter with pypandoc.convert_text(json_dat,'tex',format='json',filters=[os.path.join('pandoc-tools','bib-filter.py')]) — for my filter file called bib-filter.py.

Checking between the output of the function — just involves checking latex output — to see if the desired changes were made.
Most of my time was spent printing json_dat, skipping to the bad chunk of code, and counting the number of []’s in the AST to figure out why the variable of interest wasn’t collected. Thankfully, these errors around not collecting the right number of arguments are described extensively in the pandoc-filter output!

The Solution

Voila! The result of my bib-filter now generates:
\begin{thebibliography}{00}
\hypertarget{refs}{}
\bibitem{b1}\hypertarget{ref-declerck2016cori}{} T. Declerck \emph{et al.}, ``Cori - a system to support data-intensive computing,'' \emph{Proceedings of the Cray User Group}, p. 8, 2016.
\end{thebibliography}

Good luck and happy hacking!
p.s. the pandoc community need’s all the filters we can get, so thank you for viewing this post.

Leave a Reply

Your email address will not be published. Required fields are marked *

Sign in with Twitter