IEEE Bibliographies with Pandoc

I recently rolled my own Python pandoc filter that rewrites the bibliography so it is compatible with IEEEtran.cls, the LaTeX class for IEEE Transactions-style submissions.
I thought I’d write it up here rather than forget the result — it took me all day to write only 30+ lines of filter code and most of that was figuring out how to debug a filter.

First off, if you’re just looking for the filter, it can be found here. OK! Now let’s get into it…

The Problem

Bad: [image not preserved]

Good: [image not preserved]

The problem begins with IEEEtran only formatting the bibliography correctly when it is written as a `proper` thebibliography section with \bibitem entries, as BibTeX would produce.
For instance:
\begin{thebibliography}{00}
\bibitem{b1}\hypertarget{ref-declerck2016cori}{} T. Declerck \emph{et al.}, ``Cori - a system to support data-intensive computing,'' \emph{Proceedings of the Cray User Group}, p. 8, 2016.
\bibitem{b2}...
\end{thebibliography}

Unfortunately, the way pandoc does referencing is tied to its AST representation, which relies heavily on explicit hypertargets to deal with citations.
Its LaTeX output has the form:
\section*{References}\label{references}
\addcontentsline{toc}{section}{References}
\hypertarget{refs}{}
\hypertarget{ref-declerck2016cori}{}
{[}1{]} T. Declerck \emph{et al.}, ``Cori - a system to support data-intensive computing,'' \emph{Proceedings of the Cray User Group}, p. 8, 2016.

Thankfully, we can write a filter using the pandocfilters Python package to update the AST automagically!

The Workflow

First off, it’s pretty common to muck up your Python code when putting together these filters.
Unfortunately, the error messages are pretty cryptic or non-existent. Mostly I got fd:4: hClose: resource vanished (Broken pipe) or Pandoc died with exitcode "83" during conversion.

Secondly, you can’t use print statements, since anything printed to stdout ends up in the AST output that pandoc reads back, or a debugger such as pdb, since the filter runs as a separate process. No stepping through code for you!
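
One workaround for the print restriction, as far as I know, is to send debug messages to stderr, since only stdout carries the JSON AST back to pandoc. The sketch below is a pass-through filter that logs each top-level block type. The names log, run_filter and main are hypothetical, and it uses only the standard library rather than any filter package. It assumes the JSON layout is either the newer object with a "blocks" key or the older two-element list.

```python
import json
import sys


def log(message, stream=None):
    """Write a debug line to stderr (by default) so stdout stays valid JSON."""
    print(message, file=stream if stream is not None else sys.stderr)


def run_filter(raw_json, stream=None):
    """Log the type of every top-level block and return the AST unchanged."""
    doc = json.loads(raw_json)
    # Assumption: newer pandoc emits {"meta": ..., "blocks": [...]},
    # older versions emit [meta, blocks].
    blocks = doc["blocks"] if isinstance(doc, dict) else doc[1]
    for block in blocks:
        log("block: " + block["t"], stream)
    return json.dumps(doc)


def main():
    """Entry point when used as a filter: AST in on stdin, AST out on stdout."""
    sys.stdout.write(run_filter(sys.stdin.read()))
```

To use it as an actual filter, call main() at the bottom of the script. When pandoc is run from a terminal, the logged lines should show up there without touching the document.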

A method that I found worked was to generate the intermediate JSON representation used by pandoc’s AST.
This can be generated easily using pandoc -t json and can be stored as a file.
Next, I fired up IPython and installed the pypandoc library, which was really useful for fast iterations when testing my filter.
Load the JSON file in IPython with: json_dat = open('test_out.json').read()
Now (after import os and import pypandoc) you can quickly prototype your filter with pypandoc.convert_text(json_dat,'tex',format='json',filters=[os.path.join('pandoc-tools','bib-filter.py')]), where bib-filter.py is my filter file.

Checking the result just means reading the LaTeX that the function returns to see whether the desired changes were made.
Most of my time was spent printing json_dat, skipping to the bad chunk of code, and counting the number of []’s in the AST to figure out why the variable of interest wasn’t collected. Thankfully, these errors around not collecting the right number of arguments are described extensively in the pandoc-filter output!
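
Counting brackets by hand can be avoided with a small helper that walks the parsed JSON and reports where each node of a given type lives. This is only a sketch: find_nodes is a hypothetical name, and it relies solely on pandoc tagging every AST element with a "t" field.

```python
import json


def find_nodes(node, wanted, path="doc"):
    """Yield (path, node) for every AST element whose "t" field equals `wanted`."""
    if isinstance(node, dict):
        if node.get("t") == wanted:
            yield path, node
        for key, child in node.items():
            yield from find_nodes(child, wanted, f"{path}[{key!r}]")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from find_nodes(child, wanted, f"{path}[{index}]")
```

For example, after doc = json.loads(json_dat), looping over find_nodes(doc, "Div") and printing each path together with node["c"][0][0] (the Div identifier) lists every Div and the exact index chain needed to reach it.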

The Solution

Voila! My bib-filter now generates:
\begin{thebibliography}{00}
\hypertarget{refs}{}
\bibitem{b1}\hypertarget{ref-declerck2016cori}{} T. Declerck \emph{et al.}, ``Cori - a system to support data-intensive computing,'' \emph{Proceedings of the Cray User Group}, p. 8, 2016.
\end{thebibliography}
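
For readers who want a feel for the kind of AST rewrite involved, here is a rough sketch. It is not the actual bib-filter, and its LaTeX output may differ in line breaks from the listing above. It assumes the references live in a Div with id "refs", that each entry is a child Div, and that the "[N]" label is a single Str at the start of the entry's first paragraph. The helper names raw_block, raw_inline, number_entry and wrap_refs are hypothetical.

```python
import re

LABEL = re.compile(r"^\[\d+\]$")  # assumed shape of the numeric label, e.g. "[1]"


def raw_block(tex):
    """Build a raw LaTeX block node."""
    return {"t": "RawBlock", "c": ["latex", tex]}


def raw_inline(tex):
    """Build a raw LaTeX inline node."""
    return {"t": "RawInline", "c": ["latex", tex]}


def number_entry(entry, n):
    """Start the entry's first paragraph with a bibitem command, dropping a leading "[N]"."""
    for block in entry["c"][1]:
        if block["t"] == "Para":
            inlines = block["c"]
            if inlines and inlines[0]["t"] == "Str" and LABEL.match(inlines[0]["c"]):
                inlines.pop(0)
                if inlines and inlines[0]["t"] == "Space":
                    inlines.pop(0)
            inlines.insert(0, raw_inline("\\bibitem{b%d}" % n))
            break


def wrap_refs(blocks):
    """Return top-level blocks with the refs Div wrapped in thebibliography."""
    out = []
    for block in blocks:
        if block["t"] == "Header" and block["c"][1][0] == "references":
            continue  # thebibliography prints its own "References" heading
        if block["t"] == "Div" and block["c"][0][0] == "refs":
            entries = [b for b in block["c"][1] if b["t"] == "Div"]
            for n, entry in enumerate(entries, start=1):
                number_entry(entry, n)
            out.append(raw_block("\\begin{thebibliography}{00}"))
            out.append(block)
            out.append(raw_block("\\end{thebibliography}"))
        else:
            out.append(block)
    return out
```

Applying wrap_refs to the document's top-level block list and writing the result back as JSON would give pandoc's LaTeX writer an environment much closer to what IEEEtran expects.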

Good luck and happy hacking!
P.S. The pandoc community needs all the filters we can get, so thank you for viewing this post.

Writing sciency things in Markdown – Pandoc is Awesome!

Pandoc is an awesome tool!
This is especially true once properly configured for scientific writing.
Personally, I write all my papers in Markdown (or RMarkdown for the fancy stuff that requires generating figures) and leave pandoc to automatically produce PDFs and LaTeX output.
In fact, all my builds are generated simultaneously in 3 separate versions, one for each of the major style guides in computer science: ACM, IEEE and LNCS.
I get really distracted writing LaTeX directly; it’s easy to lose track of what you want to say when you could spend half the day typesetting and resizing figures.
This is where writing in Markdown really shines: it allows you the flexibility of LaTeX, since TeX can be embedded in any part of the document, without going down the long road of typesetting and losing your train of thought.
Best of all, if you’re about to submit the paper and finally need to focus on typesetting, it’s easy to generate LaTeX output of your work and edit it as you normally would using the classic TeX workflow.
The full code is available on GitHub and was built with the following packages:

  • pandoc — 1.19.2
  • pandoc-citeproc — 0.10.4
  • pandoc-crossref — 0.3.0.0

The corresponding pdfs can be viewed here as ACM, IEEE and LNCS.

Just telling it how it is…

Never has a Huxley quote been more in order.

“That so many of the well fed young television-watchers in the world’s most powerful democracy should be so completely indifferent to the idea of self-government, so blankly uninterested in freedom of thought and the right to dissent, is distressing, but not too surprising. “Free as a bird”, we say, and envy the winged creatures for their power of unrestricted movement in all the three dimensions. But alas, we forget the dodo. Any bird that has learned how to grub up a good living without being compelled to use its wings will soon renounce the privilege of flight and remain forever grounded.”

― Aldous Huxley, Brave New World Revisited

via Wikileaks

WATCH: The world’s smallest movie, starring… atoms

IBM makes an extraordinary little film by manipulating individual molecules

Hollywood loves to go big with movies. IBM scientists went small — very small. The company’s researchers produced a short, stop-motion film to showcase their efforts to design the next generation of data storage, and they did it by manipulating individual atoms to create images of a boy playing with a ball and bouncing on a trampoline. The clip, called A Boy and His Atom, has been certified by Guinness World Records as the “Smallest Stop-Motion Film” ever.

via theweek