4 Flow Diagrams
4.1 Introduction
This section builds on The broad workflow and details the internal processes that are invoked by the sandpaper::build_lesson() function. If you look at the source for this function, it contains only seven significant lines of code (the file has many more lines because of documentation and comments).
The pre-flight steps all happen before a single source file is built: they check for pandoc, validate the lesson, and configure global elements. The last two lines are responsible for building the site and combining the result with the global variables and templates.
Users invoke this build process in the following ways:
venue | function | purpose |
---|---|---|
local | sandpaper::build_lesson() | render content for offline use |
local | sandpaper::serve() | dynamically render and preview content |
remote | sandpaper:::ci_deploy() | render content and deploy to branches |
All of these methods call sandpaper::validate_lesson() (which also sets up global metadata and menu variables) and then the two internal functions that make up the two-step build: sandpaper:::build_markdown() and sandpaper:::build_site(). Below, I break down and detail the process for each.
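The shared call order described above can be sketched with stub functions. Everything in this sketch is hypothetical: the `*_sketch` functions only record that they ran, and they do not reflect sandpaper's internals or arguments.

```r
# Hypothetical stand-ins for the shared steps; each one only logs its name
# so that the call order is visible. These are NOT sandpaper's functions.
steps_run <- character(0)

validate_lesson_sketch <- function(path) {
  # would validate the lesson and set up global metadata and menus
  steps_run <<- c(steps_run, "validate_lesson")
  invisible(path)
}

build_markdown_sketch <- function(path) {
  # would render R Markdown and copy Markdown into site/built
  steps_run <<- c(steps_run, "build_markdown")
  invisible(path)
}

build_site_sketch <- function(path) {
  # would convert the built Markdown to HTML and apply the template
  steps_run <<- c(steps_run, "build_site")
  invisible(path)
}

# The core that every entry point shares: validate, then the two-step build
build_lesson_sketch <- function(path = ".") {
  validate_lesson_sketch(path)
  build_markdown_sketch(path)
  build_site_sketch(path)
  invisible(steps_run)
}

build_lesson_sketch()
print(steps_run)
```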
4.2 Preflight Checks
Before a lesson can be built, we need to confirm the following:
- We have access to the tools needed to build a lesson (e.g. pandoc). This is achieved via the sandpaper::check_pandoc() function.
- We are inside a lesson that can be built with The Carpentries Workbench.
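A minimal sketch of these two checks in base R follows. It assumes that a buildable lesson can be recognised by a `config.yaml` file and an `episodes/` folder at its root, and it only searches the PATH for pandoc. `preflight_sketch()` is a hypothetical helper; the real checks in sandpaper are more thorough.

```r
# Hypothetical pre-flight check; not sandpaper's implementation.
# Assumes a Workbench lesson has config.yaml and an episodes/ folder.
preflight_sketch <- function(path = ".") {
  # Is a pandoc executable on the PATH? (Sys.which returns "" if not)
  has_pandoc <- nzchar(Sys.which("pandoc"))
  # Does the folder look like a lesson we can build?
  is_lesson <- file.exists(file.path(path, "config.yaml")) &&
    dir.exists(file.path(path, "episodes"))
  list(pandoc = has_pandoc, lesson = is_lesson)
}

# Usage: a temporary folder with the expected layout passes the lesson check
lesson <- file.path(tempdir(), "my-lesson")
dir.create(file.path(lesson, "episodes"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(lesson, "config.yaml")))
str(preflight_sketch(lesson))
```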
4.3 validate_lesson()
The name of the lesson validator is a bit of a misnomer. It does perform lesson validation, through the methods in the pegboard::Lesson R6 class, but it also prepares the global state needed to build the lesson.
In order to use these methods, it first loads the lesson via the sandpaper::this_lesson() function, which loads and caches the pegboard::Lesson object. It also caches elements that are mostly duplicated across episodes with small tweaks for each episode:
- metadata in JSON-LD format
- sidebar
- extras menu for learner and instructor views
- translations of menu elements defined in {varnish}
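The load-once-then-cache behaviour described above can be sketched with an environment used as a cache. The names `.store`, `load_lesson_sketch()`, `this_lesson_sketch()`, and `clear_cache_sketch()` are hypothetical, and the loader returns a plain list instead of a pegboard::Lesson object. The point is only that repeated calls reuse the cached value until the cache is cleared.

```r
# Hypothetical cache, for illustration only; sandpaper's own storage differs.
.store <- new.env(parent = emptyenv())
load_count <- 0

load_lesson_sketch <- function(path) {
  # Stands in for the expensive step of building a pegboard::Lesson object
  load_count <<- load_count + 1
  list(path = path, episodes = list.files(file.path(path, "episodes")))
}

this_lesson_sketch <- function(path = ".") {
  cached <- .store$lesson
  # Reuse the cached object if it was loaded from the same path
  if (!is.null(cached) && identical(cached$path, path)) {
    return(cached)
  }
  lesson <- load_lesson_sketch(path)
  .store$lesson <- lesson
  # Elements shared by every episode are computed once and cached alongside
  .store$sidebar <- paste("sidebar for", length(lesson$episodes), "episodes")
  lesson
}

clear_cache_sketch <- function() {
  rm(list = ls(.store, all.names = TRUE), envir = .store)
}

# Usage: the second call is served from the cache
tmp <- file.path(tempdir(), "cache-demo")
dir.create(file.path(tmp, "episodes"), recursive = TRUE, showWarnings = FALSE)
invisible(this_lesson_sketch(tmp))
invisible(this_lesson_sketch(tmp))
print(load_count)
```

Keeping the cache in an environment works because environments are modified by reference, so every function that sees `.store` sees the same cached objects.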
4.4 build_markdown()
4.4.1 Generating Markdown
Markdown generation for the lesson is controlled by the internal function sandpaper:::build_markdown().
When a lesson contains R Markdown files, their content needs to be rendered to markdown so that we can process it further. This rendering is done with the {knitr} R package in a separate R process. Markdown source content, on the other hand, is copied to the site/built folder.
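The split between R Markdown and Markdown sources described above can be sketched as a dispatch on file extension. This is an illustration under assumptions, not sandpaper's code: `build_episode_sketch()` is hypothetical, and it uses callr::r() to run knitr::knit() in a fresh R process so that the episode cannot see or alter the objects in the user's session.

```r
# Hypothetical sketch; not sandpaper's implementation. The R Markdown branch
# needs the {callr} and {knitr} packages; the Markdown branch uses base R only.
build_episode_sketch <- function(source, built_dir) {
  dir.create(built_dir, recursive = TRUE, showWarnings = FALSE)
  stem <- tools::file_path_sans_ext(basename(source))
  if (tolower(tools::file_ext(source)) == "rmd") {
    # R Markdown: knit in a fresh R process so the user's session is untouched
    out <- file.path(built_dir, paste0(stem, ".md"))
    callr::r(
      function(input, output) knitr::knit(input, output, quiet = TRUE),
      args = list(input = normalizePath(source), output = out)
    )
  } else {
    # Markdown: there is no code to run, so copy the file as-is
    out <- file.path(built_dir, basename(source))
    file.copy(source, out, overwrite = TRUE)
  }
  out
}

# Usage with a plain Markdown episode
src <- file.path(tempdir(), "episode.md")
writeLines(c("# Introduction", "", "Some text."), src)
built <- build_episode_sketch(src, file.path(tempdir(), "site", "built"))
readLines(built)
```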
Because R Markdown files can take some time to render, we use MD5 sums of the episode contents (stored in the site/built/md5sum.txt file) to skip any files that have not changed.
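A minimal sketch of MD5-based skipping follows. It assumes a simple tab-separated `file`/`checksum` table; the real site/built/md5sum.txt may record more or different columns. `find_changed_sketch()` is hypothetical. It hashes the sources with tools::md5sum() from base R, compares them with the stored hashes, and then records the new state.

```r
# Hypothetical sketch of checksum-based change detection. It assumes a simple
# tab-separated file/checksum table, not the real md5sum.txt layout.
find_changed_sketch <- function(sources, hash_file) {
  current <- data.frame(
    file = basename(sources),
    checksum = unname(tools::md5sum(sources)),
    stringsAsFactors = FALSE
  )
  if (file.exists(hash_file)) {
    old <- read.table(hash_file, header = TRUE, sep = "\t",
                      colClasses = "character")
  } else {
    old <- data.frame(file = character(0), checksum = character(0))
  }
  # A source needs rebuilding if it is new or its checksum has changed
  old_sums <- old$checksum[match(current$file, old$file)]
  changed <- is.na(old_sums) | old_sums != current$checksum
  # Record the new state so that an immediate re-run reports no changes
  write.table(current, hash_file, sep = "\t", quote = FALSE, row.names = FALSE)
  sources[changed]
}

# Usage with two throwaway episode files
demo <- file.path(tempdir(), "md5-demo")
dir.create(demo, showWarnings = FALSE)
eps <- file.path(demo, c("intro.Rmd", "analysis.Rmd"))
writeLines("first draft", eps[1])
writeLines("first draft", eps[2])
hashes <- file.path(demo, "md5sum.txt")
unlink(hashes)

first <- find_changed_sketch(eps, hashes)   # both files are new
second <- find_changed_sketch(eps, hashes)  # nothing has changed
writeLines("second draft", eps[2])
third <- find_changed_sketch(eps, hashes)   # only analysis.Rmd changed
```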
One package missing from the diagram above is {renv}, partly because its effect on the lesson is indirect: it provisions the packages needed to build the lesson.
When episodes are rendered from R Markdown to Markdown, we attempt to reproduce the build environment as closely as possible by using the {renv} package. If the global package cache from {renv} is available, then the lesson profile is activated before the episode is sent to {knitr} and R will use the packages provided in that profile. This has two distinct advantages:
- The user does not have to worry about overwriting packages in their own library (e.g. a graduate researcher working on their dissertation does not want to have to rewrite their analyses because of a new version of {sf}).
- The package versions will be the same as those used to build the site on GitHub, so new errors will not show up as false positives caused by mismatched package versions.
For details on the package cache, see the Building Lessons With A Package Cache article.
At this point, the markdown has been written and the cache state has been updated, so re-running this function will report that no changes have occurred. After this step, the internal function sandpaper:::build_site() converts the markdown file we just created to HTML with pandoc and stores the result in an R object. This R object is then manipulated and written to an HTML file with the {varnish} website templates applied.
We use sandpaper:::build_markdown() in the pull request workflows to show the changes in the rendered markdown files, which is useful when package versions change and the output may change with them.
4.5 build_site()
The following sections discuss HTML generation, HTML processing, and applying the website template separately because, while all three run inside the internal sandpaper:::build_site() function, they are functionally distinct.
4.5.1 Generating HTML
Each markdown file is processed into HTML via pandoc and returned to R as text. This is done via the internal function sandpaper:::render_html().
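Rendering with pandoc and capturing the result as text can be sketched with base R's system2(). `render_html_sketch()` is hypothetical and much simpler than sandpaper:::render_html(): it applies none of the Workbench's Lua filters or extra pandoc options, and it assumes a pandoc executable on the PATH.

```r
# Hypothetical sketch; not sandpaper:::render_html().
render_html_sketch <- function(markdown) {
  pandoc <- Sys.which("pandoc")
  if (!nzchar(pandoc)) stop("pandoc was not found on the PATH")
  # Feed the markdown to pandoc's standard input and capture standard output
  # as a character vector, one element per line
  html <- system2(pandoc, c("--from", "markdown", "--to", "html"),
                  stdout = TRUE, input = markdown)
  paste(html, collapse = "\n")
}

# Usage (skipped when pandoc is not installed)
if (nzchar(Sys.which("pandoc"))) {
  cat(render_html_sketch(c("## Setup", "", "Install *R* first.")), "\n")
}
```

The result is a bare HTML fragment with no header, footer, or styling, which matches the state described in the next paragraph.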
From here, the HTML exists as the internal body content of a website without a header, footer, or any styling. It is nearly ready for insertion into a website template. The next section details the flow we use to tweak the HTML content.
4.5.2 Processing HTML
Even with our Lua filters, the HTML output from pandoc still needs some modification. We tweak the content by first converting the HTML into an Abstract Syntax Tree (AST). This allows us to programmatically manipulate tags in the HTML without resorting to regular expressions.
In this part, we update links, images, headings, and any structure that we could not fix using Lua filters. We also apply translations to some of the menu elements that are not templated in {varnish}. We then use the information from the episode to complete the global menu variable with links to the second-level headings in the episode.
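The AST approach can be sketched with {xml2}: parse a fragment of pandoc-like output, rewrite image and link attributes by selecting nodes with XPath instead of matching text with regular expressions, and collect the second-level headings that could feed a menu. The specific rules here (stripping `../` from image paths, pointing `.md` links at `.html` pages) are illustrative assumptions, not the changes sandpaper actually makes.

```r
library(xml2)

# A fragment of pandoc-like output, before any fixes
doc <- read_html('<div>
  <h2 id="setup">Setup</h2>
  <p><img src="../fig/plot.png"></p>
  <h2 id="analysis">Analysis</h2>
  <p><a href="setup.md">See setup</a></p>
</div>')

# Images: strip a leading "../" from paths (hypothetical rule)
imgs <- xml_find_all(doc, ".//img")
xml_set_attr(imgs, "src", sub("^\\.\\./", "", xml_attr(imgs, "src")))

# Links: point *.md targets at the rendered *.html pages (hypothetical rule)
links <- xml_find_all(doc, ".//a[contains(@href, '.md')]")
xml_set_attr(links, "href", sub("\\.md$", ".html", xml_attr(links, "href")))

# Menu: collect second-level headings as in-page anchors
h2 <- xml_find_all(doc, ".//h2")
menu <- sprintf('<a href="#%s">%s</a>', xml_attr(h2, "id"), xml_text(h2))
print(menu)
```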
Working with XML data is perhaps one of the strangest experiences for an R user: R functions normally return a modified copy of their input, but when working with an XML document parsed by {xml2}, the data is modified in place.
This allows us to do neat things, but it comes with a learning curve.
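The difference from R's usual copy-on-modify behaviour is easiest to see side by side. In this sketch, `bump()` and `add_class()` are hypothetical helpers that both return nothing, yet only the second one changes the object it was given, because an {xml2} document holds external pointers to a document managed by the underlying C library (libxml2).

```r
library(xml2)

# Ordinary R: modifying an argument inside a function changes a copy only
bump <- function(x) {
  x[1] <- 99
  invisible(NULL)
}
v <- c(1, 2, 3)
bump(v)
print(v)  # still 1 2 3

# xml2: modifying a node inside a function changes the caller's document
add_class <- function(doc) {
  xml_set_attr(xml_find_first(doc, ".//p"), "class", "callout")
  invisible(NULL)
}
doc <- read_html("<p>Hello</p>")
add_class(doc)
print(as.character(xml_find_first(doc, ".//p")))
```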
I have written hopefully helpful handbooks (guides) on
4.5.3 Applying Website Template
Now that we have a corrected HTML AST and its associated metadata, we are ready to write it to HTML. This process is achieved by passing the AST and metadata to {pkgdown}, which performs a little more manipulation, applies the {varnish} template, and writes the result to disk.
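As a rough illustration of this final step only, the sketch below drops a processed body into a page template and writes the result to disk. It deliberately does not use {pkgdown}: the template string and `apply_template_sketch()` are hypothetical stand-ins for the {varnish} templates and the {pkgdown} machinery, and plain sprintf() substitution replaces whatever templating engine they actually use.

```r
# Hypothetical stand-in for a {varnish} page template; not the real one.
page_template <- paste(
  "<!DOCTYPE html>",
  "<html><head><title>%s</title></head>",
  "<body><header>Lesson header</header>",
  "<main>%s</main>",
  "<footer>Lesson footer</footer></body></html>",
  sep = "\n"
)

apply_template_sketch <- function(body, title, path) {
  # Fill the title and body slots, then write the finished page to disk
  page <- sprintf(page_template, title, body)
  writeLines(page, path)
  invisible(path)
}

# Usage with an already-processed episode body
out <- file.path(tempdir(), "setup.html")
apply_template_sketch('<h2 id="setup">Setup</h2>', "Setup", out)
cat(readLines(out), sep = "\n")
```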