Last week or so, I achieved a wondrous thing. A trivial thing. I achieved a wondrous, trivial thing:
I wrote my most popular tweet ever:
My new thing is ending every Rmd with a list of links to the forums / SO questions / blogs / github repos that I used to solve the problem #rstats pic.twitter.com/U51KT9kiym
- Andrew MacDonald (@polesasunder), January 26, 2018
That’s right! Dr. MacDonald going viral on the internet by urging people to write... bibliographies. I am a fancy scholar, you see. This sort of thing is our solemn duty!
Anyway, this made me think a lot more about how I use R Markdown to write and talk about science.
So here are three quick tips! This is not an introduction to R Markdown; if you want that, you should read the wonderful website devoted to it, or the book about knitr.
This is a basic sort of “workflow” hack. You label a chunk – usually the first – “setup”, and you can run it in one click within RStudio.
This is a great place to load your libraries and your data, of course. But lately I’ve been developing mine more and more. I write short functions, define the options for all other chunks, and modify or reshape data. Basically, whatever subsequent chunks will need to run, I do it here. That way, every chunk that comes after depends (mostly) on only one other – the setup chunk. This makes it easy to jump in and start working.
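Here is a minimal sketch of what the body of such a setup chunk might look like. The helper standardize() and the built-in mtcars data are hypothetical stand-ins for your own functions and data, and the option values are only one possible choice. It assumes knitr is installed, which it is whenever you knit.

```r
# Body of a chunk whose header would be: {r setup, include = FALSE}

# Chunk defaults: these apply to every later chunk unless overridden there.
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)

# Hypothetical helper: rescale a numeric vector to mean 0 and sd 1.
standardize <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Load and reshape data. mtcars (built into R) stands in for a real file.
cars_scaled <- transform(mtcars, mpg_z = standardize(mpg))
```

Every later chunk can then use cars_scaled and standardize() without caring what happened in between.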
Of course, some chunks end up depending on each other. For example, I might fit a model in one chunk and spend the next few chunks visualizing it in different ways. But lately I’ve started trying to move even these dependencies into setup. When I fit a model that takes a while to run, I often save the output into an RDS file and load it in the setup chunk. That way I don’t have to run the chunk that fits the model; I usually set eval = FALSE in that chunk’s options.
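The save-then-load pattern can be sketched roughly like this. The file path and object name are made up for illustration, and lm() on mtcars is only a stand-in for a model that really does take a while to fit.

```r
# Hypothetical path; tempdir() just keeps this sketch self-contained.
model_path <- file.path(tempdir(), "slow_model.rds")

# In the model-fitting chunk (set eval = FALSE once the file exists):
# lm() stands in for a slow model fit.
slow_model <- lm(mpg ~ wt, data = mtcars)
saveRDS(slow_model, model_path)

# In the setup chunk: reload the fitted model instead of refitting it.
slow_model <- readRDS(model_path)
coef(slow_model)
```

In a real project you would point model_path at a file inside the project folder so it survives between sessions.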
You can of course get much more elaborate! Maëlle Salmon, in her blog post on whether chunks are pets or livestock, references this Twitter thread by David Robinson, which is all about different strategies for defining dependencies. You can define dependencies among chunks with the autodep or dependson chunk options, which is pretty cool!
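As a rough sketch of how those two options are used: both only matter for chunks that are cached with cache = TRUE. The chunk labels fit-model and plot-model below are hypothetical.

```r
# Option 1: cache every chunk and let knitr work out the dependencies
# itself, by looking at which objects each chunk creates and uses.
knitr::opts_chunk$set(cache = TRUE, autodep = TRUE)

# Option 2: declare dependencies by hand in the chunk headers, e.g.
#   {r fit-model, cache = TRUE}
#   {r plot-model, cache = TRUE, dependson = "fit-model"}
# so the cache for plot-model is thrown away whenever fit-model changes.
```

The first is less typing; the second makes the dependency explicit to anyone reading the document.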
Embrace the Table of Contents
Another great reason to name your chunks cool things is the RStudio automatic table of contents. You get it by clicking on the icon at the top right of your Rmd editing window, OR on the list on the bottom left! They really really want you to feel organized and good about yourself over at RStudio.
These tables of contents have a great little nested structure. They show you your document headings, your chunk labels, any sections inside your chunks, and then your functions. This makes it easy to jump around and gives you a sense of control over your life. Data analysis is confusing enough without making yourself suffer; embrace easy structure.
Perhaps it is a little weird that I use code sections inside of chunks, but it does seem to work! You do it like this:
# load libraries ---------------
library(tidyverse)

# read data --------------------
latest_results <- readr::read_csv("data/mmm_hot_fresh_data.csv")
That’s it! Any comment that ends in four or more dashes (such as -----) is interpreted as a section header. This One Weird Trick works in R scripts, too!
Rmd is where you live now
For me, for many scientists, writing R code is a hedonistically artistic, left-brained, paint-in-your-hair sort of experience. Many ecologists learn how to code the same way we learned how to catch salamanders as children – trial and error, flipping over rocks till we get a reward. This is right and good. The world is too full of living things for biologists to also be amazing programmers.
However, once the ecstasy of creation has swept over us, we awake late the next morning to find our canvas covered with 2100 lines of R code and object names like new_fixed_data. Heads throbbing with a statistical absinthe hangover, we trudge through it slowly over days, trying to figure out what we did.
Like all art, after creation comes editing. Make your editing easy! Write in .Rmd. Scribble, scratch out, try some models that turn out to be garbage and set eval=FALSE on those chunks. Re-knit regularly to make sure that everything still works. If you have an .Rmd file that will actually knit and produce output, then you have hope. You know that it is possible to retrace what you did.
ok now listen, the harsh truth is you're better off writing one thick, messy .Rmd where you keep all your garbage models and weird musings then going off on some precious folder structure and artfully-named .R files where you can't find a damned thing ever. #rstats #oldman
- Andrew MacDonald (@polesasunder), January 17, 2018
You can always create an elaborate reproducible workflow later, with Make, Drake, Remake, Snakemake or whichever .*ake you most love. This is the two-stage workflow some Data Scientists prefer.
Introduce a few helpful habits into your writing! Write in Rmarkdown, keep it organized, and allow yourself to relax and focus on your science! Happy writing :)