Flow Diagrams

Introduction

This section builds on The broad workflow and details the internal process that are invoked with the sandpaper::build_lesson() function. If you look at the source for this function, it contains a total of sevens significant lines of code (many more due to documentation and comments).

The pre-flight steps all happen before a single source file is built. These check for pandoc, validate the lesson, and configure global elements. The last two lines are responsible for building the site and combining them with the global variables and templates.

Users will invoke this function in the following ways:

venue function purpose
local sandpaper::build_lesson() render content for offline use
local sandpaper::serve() dynamically render and preview content
remote sandpaper:::ci_deploy() render content and deploy to branches

All of these methods will call sandpaper::validate_lesson() (which also sets up global metadata and menu variables) and the two-step internal functions sandpaper:::build_markdown() and sandpaper:::build_site(). Below, I break down and detail the process for each.

Preflight Checks

Before a lesson can be built, we need to confirm the following:

  1. We have access to the tools needed to build a lesson (e.g. pandoc). This is achieved via the sandpaper::check_pandoc()
  2. We are inside a lesson that can be built with The Carpentries Workbench

validate_lesson()

The lesson validator is a bit of a misnomer. Yes, it does peform lesson validation, which it does so through the methods in the pegboard::Lesson R6 class.

In order to use thse methods, it first loads the lesson, via the sandpaper::this_lesson() function, which loads and caches the pegboard::Lesson object. It also caches elements that are mostly duplicated across episodes with small tweaks for each episode:

  • metadata in JSON-LD format
  • sidebar
  • extras menu for learner and instructor views

build_markdown()

Generating Markdown

Markdown generation for the lesson is controlled by the internal function sandpaper:::build_markdown().

When a lesson contains R Markdown files, these need to have content rendered to markdownsot hat we can further process them. This content is processed with the {knitr} R package in a separate R process. Markdown source content on the other hand is copied to the site/built folder.

Because R Markdown files can take some time to render, we use MD5 sums of the episode contents (stored in the site/built/md5sum.txt file) to skip any files that have not changed.

sequenceDiagram
    autonumber
    participant episodes/episode.Rmd
    box rgb(255, 214, 216) The Workbench
    participant {sandpaper}
    end
    box rgb(230, 234, 240) Document Engine
    participant {renv}
    participant {knitr}
    end
    box rgb(255, 231, 168) Generated Files
    participant site/built/md5sum.txt
    participant site/built/episode.md
    end

    site/built/md5sum.txt -->> {sandpaper}: READ file cache
    {sandpaper} -->> {knitr}: RUN conversion
    episodes/episode.Rmd -->> {knitr}: PROCESS changed file(s)
    {knitr} -->> site/built/episode.md: WRITE Markdown
    {sandpaper} -->> site/built/md5sum.txt: WRITE file cache

Package Cache and Reproducibility

One package that is missing from the above diagram is {renv} and that’s partially because it has an indirect effect on the lesson: it provisions the packages needed to build the lesson.

When episodes are rendered from R Markdown to Markdown, we attempt to reproduce the build environment as closely as possible by using the {renv} package. If the global package cache from {renv} is available, then the lesson profile is activated before the episode is sent to {knitr} and R will use the packages provided in that profile. This has two distinct advantages:

  1. The user does not have to worry about overwriting packages in their own library (i.e. a graduate researcher working on their dissertation does not want to have to rewrite their analyses because of a new version of {sf})
  2. The package versions will be the same as the versions on the GitHub version of the site, which means that there will be no false positives of new errors popping up

For details on the package cache, see the Building Lessons With A Package Cache article.

At this step, the markdown has been written and the state of the cache is updated so if we re-run this function, then it will show that no changes have occured. After this step, the internal function sandpaper:::build_site() is run where the markdown file that we just created is converted to HTML with pandoc and stored in an R object. This R object is then manipulated and then written to an HTML file with the {varnish} website templates applied.

We use this function in the pull request workflows to demonstrate the changes in markdown source files, which is useful when package versions change, causing the output to potentially change.

build_site()

The following sections will discuss the HTML generation (the following section), manipulation (the section after that), and applying the template (the final section) separately because, while these processes are each run via the internal sandpaper:::build_site() function, they are functionally separate.

Generating HTML

Each markdown file is processed into HTML via pandoc and returned to R as text. This is done via the internal function sandpaper:::render_html().

sequenceDiagram
    autonumber
    box rgb(255, 214, 216) The Workbench
    participant {sandpaper}
    end
    box rgb(230, 234, 240) Document Engine
    participant pandoc
    end
    box rgb(255, 231, 168) Generated Files
    participant site/built/episode.md
    end

    {sandpaper} -->> pandoc: LOAD pandoc with lua filters
    site/built/episode.md -->> pandoc: READ markdown
    pandoc -->> {sandpaper}: RENDER HTML as text

From here, the HTML exists as the internal body content of a website without a header, footer, or any styling. It is nearly ready for insertion into a website template. The next section details the flow we use to tweak the HTML content.

Processing HTML

The HTML needs to be tweaked because the output from pandoc, even with our lua filters, still needs some modification. We tweak the content by first converting the HTML into an Abstract Syntax Tree (AST). This allows us to programmatically manipulate tags in the HTML without resorting to using regular expressions.

In this part, we update links, images, headings, structure that we could not fix using lua filters. We then use the information from the episode to complete the global menu variable with links to the second level headings in the episode.

sequenceDiagram
    autonumber
    box rgb(255, 214, 216) The Workbench
    participant {sandpaper}
    end
    box rgb(230, 241, 255) R Object
    participant HTML(AST)
    end
    box rgb(230, 234, 240) Helper Package
    participant {xml2}
    end

    {sandpaper} -->> {xml2}: READ HTML
    {xml2} -->> HTML(AST): PARSE HTML
    activate {sandpaper}
    note right of HTML(AST): sandpaper:::fix_nodes()
    {xml2} -->> HTML(AST): update structure
    HTML(AST) -->> {sandpaper}: extract menu items
    note right of {sandpaper}: generate learner and instructor versions
    deactivate {sandpaper}

Working With XML

Working with XML data is perhaps one of the strangest experiences for an R user because in R, functions will normally return a copy of the data, but when working with an XML document parsed by {xml2}, the data is modified in place.

It allows us to do neat things, but there is a learning curve associated.

Applying Website Template

Now that we have an HTML AST that has been corrected and associated metadata, we are ready to write this to HTML. This process is achieved by passing the AST and metadata to {pkgdown} where it performs a little more manipulation, applies the {varnish} template, and writes it to disk.

sequenceDiagram
    autonumber
    box rgb(255, 214, 216) The Workbench
    participant {sandpaper}
    participant {varnish}
    end
    box rgb(230, 241, 255) R Object
    participant HTML(AST)
    end
    box rgb(230, 234, 240) Helper Package
    participant {pkgdown}
    end
    box rgb(255, 231, 168) Generated Files
    participant site/docs/episode.html
    end

    activate {sandpaper}
    {sandpaper} -->> {pkgdown}: Set global menu variables
    HTML(AST) -->> {pkgdown}: Hand off HTML to pkgdown
    deactivate {sandpaper}
    activate {pkgdown}
    {varnish} -->> {pkgdown}: Load template
    {pkgdown} -->> site/docs/episode.html: WRITE website
    deactivate {pkgdown}