RMarkdown, knitr, etc. Considered Harmful

Typically, I write my scientific reports in LaTeX. A makefile orchestrates all of my analysis in stages, and some steps produce LaTeX fragments that are included in the final document. A typical step reads the relevant outputs of earlier steps into R and does one thing: a single calculation, a model training or evaluation, or a figure. At the same time, it writes out the LaTeX fragment that describes what it did, including quantitative details when necessary.
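A minimal sketch of one such step follows. It is written in Python rather than R purely for illustration. The function name `summarize_step`, the `score` column, and the `\NumObs`/`\MeanScore` macro names are all hypothetical. The step reads an earlier stage's output, computes one statistic, and writes a LaTeX fragment that reports it.

```python
import csv
import statistics
from pathlib import Path


def summarize_step(input_csv: Path, fragment_tex: Path) -> float:
    """One pipeline step: read an earlier stage's output, compute a single
    summary statistic, and write a LaTeX fragment reporting it.

    The 'score' column and the macro names are hypothetical.
    """
    with input_csv.open(newline="") as f:
        scores = [float(row["score"]) for row in csv.DictReader(f)]

    mean = statistics.mean(scores)

    # The fragment defines macros that the main document can use in its
    # prose after \input-ing this file.
    fragment_tex.parent.mkdir(parents=True, exist_ok=True)
    fragment_tex.write_text(
        "% Generated by summarize_step; do not edit by hand.\n"
        f"\\newcommand{{\\NumObs}}{{{len(scores)}}}\n"
        f"\\newcommand{{\\MeanScore}}{{{mean:.2f}}}\n"
    )
    return mean
```

The report would then `\input` the generated fragment and write something like "the mean score over \NumObs{} observations was \MeanScore{}". In the makefile, the input CSV would be listed as a prerequisite of the fragment, so the numbers in the text cannot drift from the data.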

I like this because each step is simple to understand, its dependencies are clearly documented by the makefile, and the reporting on each step lives right next to the code that produces it. And when I ask it to, Make automatically rebuilds only the parts of the document whose inputs have changed.
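The rule Make uses to decide what to rebuild is simple. A target is remade if it does not exist, or if any of its prerequisites has a newer modification time than the target. The following is a simplified Python sketch of that check for a single rule. It is not how GNU Make is implemented: Make also applies the check recursively through the dependency graph, and it reports an error for a missing prerequisite that has no rule to build it, whereas this sketch simply raises `FileNotFoundError`. The name `needs_rebuild` is hypothetical.

```python
from pathlib import Path
from typing import Iterable


def needs_rebuild(target: Path, prerequisites: Iterable[Path]) -> bool:
    """Approximate Make's out-of-date test for one rule.

    The target must be rebuilt if it is missing, or if any prerequisite
    was modified more recently than the target.
    """
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    # stat() raises FileNotFoundError if a prerequisite is missing.
    return any(p.stat().st_mtime > target_mtime for p in prerequisites)
```

Applied to every rule in the makefile, this test is what lets Make rerun only the steps downstream of a changed script or data file and leave everything else alone.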

Contrast this with RMarkdown, which encourages the scientist to pile state into the document willy-nilly. Steps that depend on one another can be separated by large stretches of text and code. As you develop your Markdown file, the strong temptation is to evaluate fragments of code interactively in your interpreter. The session then accumulates state that a clean run of the document would not have, which leads to hard-to-understand bugs and unreproducible results.

Most notebook-style authoring tools have this problem.

I suppose it's a classic story of usability vs. correctness, and, as usual, I don't know why I expect correctness to win.
