Behind the Scenes of The Carpentries Workbench
Introduction
The Carpentries Workbench is the set of R packages and other tools that make it easy for anyone to create and contribute to a Carpentries Lesson. It was designed with the following guiding principles:
- Lesson contributors do not need to know anything about the toolchain to contribute in a meaningful way,
- Elements of the toolchain that evaluate, validate, and stylize should live in separate repositories to allow for seamless updating, and
- The procedures should be well-documented and generalizable enough that the toolchain is not entirely dependent on R.
This document provides details of how the packages of the Workbench work behind the scenes to create a full Carpentries lesson website from Markdown source files.
Tools
The Workbench is built on top of the following major pieces of software, all of which are available via RStudio:
- Git
- R
- Pandoc
The Workbench itself consists of three R packages, all of which can be updated on the fly with no changes to the lesson:
- {sandpaper}: User interface and engine for the Workbench
- {pegboard}: Validation and parsing of lesson components
- {varnish}: HTML, CSS, and JavaScript templates
In addition, the Workbench uses the following packages for support:
Local Workflow
The two-step
The local workflow is known as a ‘two-step’ workflow: it first renders Markdown from the source files (either Markdown or R Markdown) and then applies the styling to the HTML rendered from that Markdown.
Only the source files are tracked by Git. Everything else is ignored locally.
We use the two-step process because it gives us an air gap between the tools needed to build the Markdown and the tools needed to render the website. It also gives us a ready cache of outputs, so that R Markdown source content does not need to be re-rendered. Moreover, we designed these tools to be independent of each other so that, in the future, we can mix and match them with different tools as they become available.
The two-step process is not new; the {rmarkdown} package uses this process behind the scenes, but it will discard the markdown output by default.
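To illustrate the caching benefit described above, here is a minimal shell sketch of how cached intermediates can let a build skip unchanged sources. It is an illustration only and makes no claim about how {sandpaper} implements its cache: the `render` and `build_if_changed` functions, the `episodes/` folder, the `.cksum` files, and `build.log` are all hypothetical names, and `render` is a stand-in that simply copies the file.

```shell
#!/bin/sh
set -e

# Work in a throwaway directory so nothing real is touched.
work=$(mktemp -d)
cd "$work"
mkdir -p episodes site/built
printf '# Intro\n' > episodes/intro.md

# Hypothetical stand-in for step one of the two-step (source -> Markdown).
# It only copies the file and records that a render happened.
render() {
  cp "$1" "site/built/$(basename "$1")"
  echo "rendered $1" >> build.log
}

# Re-render a source file only when its checksum differs from the cached one.
build_if_changed() {
  src=$1
  sum_file="site/built/$(basename "$src").cksum"
  new_sum=$(cksum < "$src")
  if [ -f "$sum_file" ] && [ "$(cat "$sum_file")" = "$new_sum" ]; then
    echo "skipped $src" >> build.log
  else
    render "$src"
    printf '%s\n' "$new_sum" > "$sum_file"
  fi
}

build_if_changed episodes/intro.md   # first build: nothing cached yet
build_if_changed episodes/intro.md   # source unchanged: cache is reused
printf '# Intro\n\nNew text.\n' > episodes/intro.md
build_if_changed episodes/intro.md   # source changed: rendered again

cat build.log
```

Because the intermediate Markdown is kept rather than discarded, the second call has something to compare against and can skip the expensive step.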
Validation
Lesson validation is performed by {pegboard}, which parses the Markdown and checks its elements for low-hanging accessibility issues.
The validation of lesson elements is performed before the lesson is built, so that the contributor can address any issues even if they have a broken component in the rest of the toolchain. Invalid lesson elements are displayed on the contributor's R console with information about the location of the error, an explanation of what was wrong, and a link to resources to help explain the error and offer correction.
In Practice
Because of the need for bootstrapping, validation, and caching, the number of steps from source files to lesson website is considerably more than two. The diagram below shows the process by which a lesson is built using the Workbench.
- The lesson contributor has an idea and writes it in Markdown or R Markdown.
- The lesson contributor runs sandpaper::serve() to start the engine.
- {sandpaper} passes this file to {pegboard}, which checks it for accessibility and reports to the user if there are any errors.
- {sandpaper} passes the file to {knitr}, which renders the file to Markdown and stores it in the site/built folder.
- {sandpaper} passes the file to Pandoc, which renders the Markdown to HTML (this is temporarily stored as a character vector in R).
- {sandpaper} passes the HTML to {pkgdown}, which applies the templates from {varnish}, creating the lesson website.
Remote Workflow
The motivation for the remote workflow is the same as for the local workflow: to allow for rendering of an HTML website without having to rebuild files that have previously been built. The only twist is that these files are necessarily ephemeral, because we will never be building the site on the same server from day to day. So how do we avoid rebuilding Markdown intermediates and HTML outputs when we do not track them with Git?
The answer is orphan branches that map onto the folders in site/ using Git worktrees, which is achieved via the internal function sandpaper:::ci_deploy().
| Folder | Branch | Contents | 
|---|---|---|
| site/built | md-outputs | Markdown outputs and rendered files (e.g. images) | 
| site/docs | gh-pages | HTML outputs for the live website. | 
sandpaper:::ci_deploy() process
- Orphan branch: a separate branch known to Git that shares no common history with the main branch.
- Work tree: a Git feature that allows you to work on multiple branches of the same repository in separate folders.
Each time a commit happens on the main branch, the main branch is checked out and Git worktrees are then provisioned inside the site/ directory for each branch via the internal function sandpaper:::git_worktree_setup(), which is modified from Hadley Wickham’s pkgdown::deploy_to_branch() function. Once they are provisioned and their contents populated from the existing branches, they appear on the remote system just as they appear on your local system, and the lesson can be updated without rebuilding everything.
Once it is all done, the contents are pushed to their respective branches, the worktrees are disassembled, and the remote runner is released to another task.
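The orphan-branch and worktree mechanics described above can be tried out with plain Git commands. The following is a minimal sketch in a throwaway repository, not the code that sandpaper:::ci_deploy() runs: the `make_orphan` helper, the file names, and the commit messages are assumptions made for illustration, and no remote is configured, so the push step is only marked by a comment.

```shell
#!/bin/sh
set -e

# A throwaway repository standing in for a lesson.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # make sure the first branch is called main
git config user.email "demo@example.com"
git config user.name "Demo"

echo "# Episode" > episode.md
echo "site/" > .gitignore                   # outputs are never tracked on main
git add episode.md .gitignore
git commit -q -m "Add lesson source"

# Hypothetical helper: create an orphan branch without leaving main by
# committing an empty tree with no parent and pointing a new branch at it.
make_orphan() {
  empty_tree=$(git mktree </dev/null)
  root=$(git commit-tree -m "Initialise $1" "$empty_tree")
  git branch "$1" "$root"
}
make_orphan md-outputs
make_orphan gh-pages

# Orphan branches share no history with main, so there is no merge base.
if git merge-base main md-outputs >/dev/null; then
  echo "main and md-outputs share history"
else
  echo "main and md-outputs share no history"
fi

# Provision one worktree per output folder, as in the table above.
mkdir -p site
git worktree add site/built md-outputs
git worktree add site/docs gh-pages

# Stand-ins for the build: write one output into each folder and commit it
# on the branch that the folder is attached to.
echo "# Episode (rendered)" > site/built/episode.md
git -C site/built add episode.md
git -C site/built commit -q -m "Update Markdown outputs"

echo "<h1>Episode</h1>" > site/docs/episode.html
git -C site/docs add episode.html
git -C site/docs commit -q -m "Update HTML outputs"

# A real deployment would push md-outputs and gh-pages to the remote here.

# Disassemble the worktrees; the commits stay on their branches.
git worktree remove site/built
git worktree remove site/docs

git ls-tree --name-only md-outputs
git ls-tree --name-only gh-pages
```

Because each folder under site/ is its own worktree, the outputs are committed to their own branches while main only ever contains the lesson source.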