15  Introduction

15.1 Philosophy

The Workbench was built to be platform-independent, so that you can continue to deploy lessons without relying on specific features of GitHub. We use the exact same two-step process of building the markdown and then passing that to the HTML renderer, with a twist. The core deployment function in The Workbench is sandpaper:::ci_deploy(), which deploys the rendered markdown and the HTML to separate orphan branches in the Git repository. During the build process, these branches are mapped as Git worktrees to the site/built and site/docs folders. In each step, when the build is successful, the results are pushed to the respective worktree before moving to the next step. When the process is done (regardless of outcome), the worktrees are torn down gracefully.

flowchart TB
    classDef default color:#383838,fill:#FFF7F1,stroke-width:1px
    classDef external color:#383838,fill:#E6EEF8,stroke-width:1px
    classDef normal color:#081457,fill:#E3E6FC,stroke-width:1px
    classDef local fill:#FFC700,stroke:#333,stroke-width:1px
    classDef remote fill:#D2BDF2,stroke:#201434,stroke-width:1px
    classDef notouch fill:#F99697,stroke:#A4050E,stroke-width:1px

    GH[("@main")]:::remote
    MDOUT[("@md-outputs")]:::notouch
    PAGES[("@gh-pages")]:::notouch
    DEPLOY(["ci_deploy()"]):::external
    CIBUILDMD(["ci_build_markdown()"]):::external
    CIBUILDSITE(["ci_build_site()"]):::external

    subgraph virtual machine
    REPO["[repo]"]:::local
    BUILT["[repo]/site/built"]:::local
    SITE["[repo]/site/docs"]:::local
    VLESS("validate_lesson()"):::normal
    BUILDMD(["build_markdown()"]):::normal
    BUILDSITE(["build_site()"]):::normal
    end

    GH ---> REPO
    GH ~~~ DEPLOY
    REPO -.- VLESS

    DEPLOY ---> VLESS
    DEPLOY ---> CIBUILDMD
    DEPLOY ---> CIBUILDSITE
    VLESS -.- BUILDMD
    CIBUILDMD ---> MDOUT
    MDOUT <-.-> BUILT
    CIBUILDMD ---> BUILDMD
    CIBUILDSITE ---> PAGES
    PAGES <-.-> SITE
    CIBUILDSITE ---> BUILDSITE
    BUILT -.- BUILDSITE
    VLESS -.- BUILDSITE
    BUILDMD --> BUILT
    BUILDSITE --> SITE

This allows us to retain the commit history from building not just the HTML, but also the markdown outputs without interfering with the commit history for the lesson source. It also gives us the ability to use these branches as a cache so that the lesson doesn’t have to rebuild from scratch every time, but the biggest advantage is in the things that go beyond just deploying lessons.
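The branch-and-worktree layout described above can be sketched with plain Git commands. This is only an illustration under stated assumptions, not how ci_deploy() is implemented: the scratch repository, file names, commit messages, and CI identity are all hypothetical, and only the md-outputs branch name and the site/built path come from the text.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch repository standing in for a lesson (not a real lesson).
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.email "ci@example.com"
git config user.name "CI Sketch"

# Lesson source on the default branch; site/ is ignored so build output
# never enters the source history.
echo "site/" > .gitignore
echo "# Lesson source" > index.md
git add .gitignore index.md
git commit --quiet -m "lesson source"
src_branch="$(git rev-parse --abbrev-ref HEAD)"

# Create an orphan branch (no parent, no shared history) from an empty tree.
empty_tree="$(git mktree < /dev/null)"
root="$(git commit-tree "$empty_tree" -m "initialize md-outputs")"
git branch md-outputs "$root"

# Map the orphan branch onto site/built as a worktree.
git worktree add site/built md-outputs > /dev/null 2>&1

# "Build" into the worktree and commit there; the source branch is untouched.
echo "rendered markdown" > site/built/index.md
git -C site/built add index.md
git -C site/built commit --quiet -m "markdown build"

# Tear the worktree down; the branch and its history remain.
git worktree remove site/built

# Report whether the two branches share any history, and count their commits.
if git merge-base "$src_branch" md-outputs > /dev/null 2>&1; then
  echo "histories are related"
else
  echo "histories are independent"
fi
echo "commits on md-outputs: $(git rev-list --count md-outputs)"
echo "commits on $src_branch: $(git rev-list --count "$src_branch")"
```

Because the orphan branch starts from an empty tree, the build commits accumulate on md-outputs without ever appearing in the history of the lesson source, and the committed output is still available the next time the branch is attached as a worktree.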

15.1.1 Proof

One way to prove this works on any system that uses Git is to create a lesson, push it to GitHub, and then render the contents before GitHub is done setting up its runners.

The file test-deploy.R will do just that[1]:

# Load the packages --------------------------------------
library(sandpaper)
library(usethis)
library(withr)
library(ids)
library(fs)
# Generate the Lesson ------------------------------------
tmp <- tempfile()
id <- paste0("TEST-", adjective_animal(style = "kebab"))
dir_create(tmp)
lsn <- path(tmp, id)
create_lesson(lsn, open = FALSE)
# Push the Lesson to GitHub ------------------------------
with_dir(lsn, {
  use_github()
})
# Render and Deploy the Lesson ---------------------------
with_dir(lsn, {
  sandpaper:::ci_deploy()
})
# Set GitHub Pages ---------------------------------------
with_dir(lsn, {
  use_github_pages()
})

You must run it in a non-interactive fashion:

Rscript remote/test-deploy.R

Now you can visit the GitHub repository. If you wait ~30 seconds, GitHub will have created a website for you while it is still setting up the lesson engine. This shows that it is possible to deploy as long as you have the following:

  1. A system with {sandpaper} and Git set up properly
  2. Push access to a remote Git repository

In fact, if you look at the example for ci_deploy(), you will see that it creates a lesson and remote repository and walks you through the process that happens.

The challenge when deploying a Workbench lesson then lies in provisioning the virtual machine or Docker container that builds the lesson when it updates.

15.2 Beyond Deployment

Having a single workflow for deployment is fine, but because a lesson generates its content, other tools are needed to keep surprises from taking over when a change is made to the lesson. Conversely, tools are needed to bring in updates that can affect the security and accuracy of the lesson.

15.2.1 Pull Request Management

The norm for working on GitHub is a trunk-based workflow: small branches containing different features or bug fixes are created and then merged into the default branch after review. If new content is added or packages update, it is important to have mechanisms to verify the contents of a lesson and to intervene if something is incorrect before the changes take effect.

15.2.2 Updating Components

The update workflows are there because we understand that a data science lesson does not live in isolation and cannot be built in isolation: contents and tools need to be updated as the software ecosystem changes. Thus, just as we provide the {sandpaper} functions sandpaper::update_cache() and sandpaper::update_github_workflows(), the same updates are also available as GitHub workflows that will create a pull request (if they have permission to do so).

15.3 In Practice

We use GitHub Workflows to build and deploy our lessons[2], and the rest of the chapters in this section will discuss how we set these up within the context of GitHub. Remember that our philosophy is that The Workbench should be deployable anywhere. These workflows are responsible for provisioning GitHub’s Ubuntu 22.04 Runner Image with the packages and software needed to build a lesson with The Workbench.

15.3.1 Workflows

There are broadly four categories of workflows, where an asterisk (*) denotes workflows that can be manually triggered by maintainers and a caret (^) denotes workflows that require a personal access token to create a pull request:

  1. Deployment* (sandpaper-main.yaml)
  2. Pull Request Responders (pr-preflight.yaml, pr-receive.yaml)
  3. Updates*^ (update-cache.yaml, update-workflows.yaml)
  4. Pull Request Preview Managers (pr-comment.yaml, pr-close-signal.yaml, pr-post-remove-branch.yaml)

These workflows are individually documented in the sandpaper repository.

These workflows are interrelated and have different triggers. Below is a set of diagrams that disambiguate these relationships. First up are the workflows that run on a schedule and on demand. Note that the update workflows will only push to a branch if any updates exist; otherwise, they will exit silently.

flowchart LR
    classDef default color:#383838,fill:#FFF7F1,stroke-width:1px
    classDef external color:#383838,fill:#E6EEF8,stroke-width:1px
    classDef normal color:#081457,fill:#E3E6FC,stroke-width:1px
    classDef local fill:#FFC700,stroke:#333,stroke-width:1px
    classDef remote fill:#D2BDF2,stroke:#201434,stroke-width:1px
    classDef notouch fill:#F99697,stroke:#A4050E,stroke-width:1px

    WEEK[\"CRON weekly"\]:::remote
    MONTH[\"CRON monthly"\]:::remote

    subgraph MAIN WORKFLOW
    push[\"push to main"\]:::remote
    md-outputs[("md-outputs")]:::local
    gh-pages[("gh-pages")]:::local

    sandpaper-main.yaml:::normal
    end

    subgraph "UPDATES (requires SANDPAPER_WORKFLOW token)"
    update-cache.yaml:::normal
    update-workflows.yaml:::normal

    update-cache[("update/packages")]:::notouch
    update-workflows[("update/workflows")]:::notouch

    PR[/"pull request"/]:::remote
    end

    push --> sandpaper-main.yaml
    WEEK --> sandpaper-main.yaml
    sandpaper-main.yaml -.->|"pushes to"| md-outputs
    sandpaper-main.yaml -.->|"pushes to"| gh-pages
    WEEK --> update-cache.yaml
    MONTH --> update-workflows.yaml
    update-cache.yaml -.->|"pushes to"| update-cache
    update-workflows.yaml -.->|"pushes to"| update-workflows
    update-cache.yaml -.->|"creates"| PR
    update-workflows.yaml -.->|"creates"| PR

Notice how none of the workflows push to main. The update workflows will push to the update/* branches and then create a pull request. It’s common to find workflows that will perform updates and then immediately push to the default branch (which is the case for the lesson-transition workflow), but it’s important to remember that a workflow that does automatic updates prevents the maintainers from critically inspecting the changes to the components. This is especially true of the update-cache.yaml workflow, which will update the {renv} lockfile. By passing it through the pull request process first, we can give the maintainers a way to audit the changes coming through.
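The "push to an update branch only if something changed" behaviour can be sketched in shell. This is a hedged illustration rather than the actual workflow logic: stage_update is a hypothetical helper, the renv.lock contents are placeholders, and the sketch stops before pushing or opening a pull request because the scratch repository has no remote. Only the update/packages branch name is taken from the diagram.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: commit pending changes to an update branch, or do
# nothing when the working tree is clean. Prints which path was taken.
stage_update() {
  branch="$1"
  message="$2"
  if [ -z "$(git status --porcelain)" ]; then
    echo "no changes"
    return 0
  fi
  # -B creates the branch (or resets it) at the current commit and
  # carries the uncommitted changes along.
  git checkout --quiet -B "$branch"
  git add --all
  git commit --quiet -m "$message"
  echo "updated"
}

# Hypothetical scratch repository with a stand-in lockfile.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.email "ci@example.com"
git config user.name "CI Sketch"
echo '{"packages": "v1"}' > renv.lock
git add renv.lock
git commit --quiet -m "lesson source"
default_branch="$(git rev-parse --abbrev-ref HEAD)"

# Nothing has changed yet, so no branch is created.
first="$(stage_update update/packages "Update packages")"
echo "$first"

# Simulate an update to the lockfile, then stage it on the update branch.
echo '{"packages": "v2"}' > renv.lock
second="$(stage_update update/packages "Update packages")"
echo "$second"

# The default branch keeps its single commit; the change waits for review.
echo "commits on default branch: $(git rev-list --count "$default_branch")"
```

In a real workflow the update branch would then be pushed and turned into a pull request, which is the point where maintainers get to audit the change.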

flowchart LR
    subgraph PULL REQUEST
    classDef default color:#383838,fill:#FFF7F1,stroke-width:1px
    classDef external color:#383838,fill:#E6EEF8,stroke-width:1px
    classDef normal color:#081457,fill:#E3E6FC,stroke-width:1px
    classDef local fill:#FFC700,stroke:#333,stroke-width:1px
    classDef remote fill:#D2BDF2,stroke:#201434,stroke-width:1px
    classDef notouch fill:#F99697,stroke:#A4050E,stroke-width:1px

    md-outputs[("md-outputs")]:::local
    PR[\"pull request"\]:::remote
    pr-preflight.yaml:::normal
    pr-receive.yaml(["pr-receive.yaml"]):::normal
    pr-comment.yaml:::normal
    pr-close-signal.yaml:::normal
    pr-post-remove-branch.yaml:::normal
    md-outputs-PR[("md-outputs-PR#")]:::notouch
    end

    PR --> pr-preflight.yaml
    pr-preflight.yaml -.->|"comments on"| PR
    pr-preflight.yaml ~~~ pr-receive.yaml
    PR -->|"on maintainer approval"| pr-receive.yaml
    pr-receive.yaml -.-|"uses"| md-outputs
    pr-receive.yaml -.->|"triggers"| pr-comment.yaml
    pr-comment.yaml -.->|"creates"| md-outputs-PR
    pr-comment.yaml -.->|"comments on"| PR
    PR -.->|"on close"| pr-close-signal.yaml
    pr-close-signal.yaml -.->|"triggers"| pr-post-remove-branch.yaml
    pr-post-remove-branch.yaml -.->|"deletes"| md-outputs-PR

15.3.2 Actions

These workflows use a series of Custom GitHub Actions (aside from the official GitHub actions of checkout and cache) which can be found in the following repositories:

  • https://github.com/carpentries/actions: a combination of both composite and JavaScript Actions that provision The Workbench, provision packages for R-based lessons, validate pull requests, download data from previous runs, comment on pull requests, and update components.
  • https://github.com/r-lib/actions: similar to carpentries/actions, but these are used in our workflows to provision R (that is, set up the correct environment variables) and to provision pandoc. Many of these actions are designed for packages, and we use them heavily in Workbench development.
  • https://github.com/carpentries/create-pull-request: a fork of a popular action that will create a pull request from a GitHub Workflow. We maintain a fork so that we can make sure it stays secure.

Each repository has the actions documented to a degree, but we will discuss the implications and design of the actions in a following chapter.


  1. Please note: this will only work if you have a GitHub PAT set up so that {usethis} can interact with the GitHub API.↩︎

  2. GitHub can be a bit confusing with its terminology and fluid concepts. Their resource for Understanding GitHub Actions may help, but here’s how I think about it: in this publication, whenever I refer to a GitHub Workflow, this is a YAML file that lives inside of a repository that tells GitHub how to set up its machine to build the lesson. It’s like a recipe plan and shopping list for a dinner. On the other hand, when I refer to a GitHub Action, this is a self-contained piece of software that will perform a specific task within a workflow. This is akin to a specific kitchen utensil, ingredient, or spice within a recipe.↩︎