15 Introduction
15.1 Philosophy
The Workbench was built to be platform-independent, so that you can keep deploying lessons without relying on specific features of GitHub. We use the exact same two-step process of building the markdown and then passing that to the HTML renderer, with a twist. The core deployment function in The Workbench is sandpaper:::ci_deploy(), which deploys the rendered markdown and the HTML to separate orphan branches in the Git repository. These branches are mapped as Git worktrees to the site/built and site/docs folders during the build process. In each step, when the build is successful, the results are pushed to the respective worktree before moving to the next step. When the process is done (regardless of outcome), the worktrees are torn down gracefully.
This allows us to retain the commit history from building not just the HTML, but also the markdown outputs without interfering with the commit history for the lesson source. It also gives us the ability to use these branches as a cache so that the lesson doesn’t have to rebuild from scratch every time, but the biggest advantage is in the things that go beyond just deploying lessons.
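The orphan-branch and worktree arrangement described above can be sketched with plain Git commands. This is a minimal illustration, not the code that sandpaper:::ci_deploy() runs: the branch names md-outputs and gh-pages, the file contents, and the helper functions make_orphan and teardown are all assumptions made for the example.

```shell
#!/bin/sh
# Sketch only: plain git commands, not sandpaper's actual implementation.
set -eu

# A throwaway repository stands in for the lesson source.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Sketch"
git config user.email "sketch@example.com"
printf 'site/\n' > .gitignore
printf '# Lesson source\n' > index.md
git add .gitignore index.md
git commit --quiet -m "lesson source"

# Create an orphan branch (no history shared with the source) from an empty
# tree, without switching the main checkout away from the lesson source.
make_orphan() {
  empty_tree="$(git mktree < /dev/null)"
  root="$(git commit-tree "$empty_tree" -m "initialise $1")"
  git branch "$1" "$root"
}
make_orphan md-outputs   # assumed branch name for the rendered markdown
make_orphan gh-pages     # assumed branch name for the HTML site

# Tear the worktrees down when the script exits, whatever the outcome.
teardown() {
  git -C "$repo" worktree remove --force site/built 2>/dev/null || true
  git -C "$repo" worktree remove --force site/docs 2>/dev/null || true
}
trap teardown EXIT

# Map each branch onto the folder that the build writes into.
mkdir -p site
git worktree add site/built md-outputs > /dev/null 2>&1
git worktree add site/docs gh-pages > /dev/null 2>&1

# Step 1: "build" the markdown, then commit it on its own branch.
printf '# Rendered markdown\n' > site/built/index.md
git -C site/built add -A
git -C site/built commit --quiet -m "markdown source builds"

# Step 2: "build" the HTML, then commit it on its own branch.
printf '<h1>Rendered HTML</h1>\n' > site/docs/index.html
git -C site/docs add -A
git -C site/docs commit --quiet -m "site deploy"

# Each branch keeps its own history; the source branch is untouched.
source_commits="$(git rev-list --count HEAD)"
md_commits="$(git rev-list --count md-outputs)"
site_commits="$(git rev-list --count gh-pages)"
echo "source: $source_commits, md-outputs: $md_commits, gh-pages: $site_commits"
```

Because the build outputs are committed through the worktrees, the source branch should still report a single commit at the end, while each output branch has gained one commit on top of its empty root. A real deployment would also push both branches to the remote after each step.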
15.1.1 Proof
One way to prove that this works on any system that uses Git is to create a lesson, push it to GitHub, and then render the contents before GitHub is done setting up its runners.
The file test-deploy.R will do just that[1]:
# Load the packages --------------------------------------
library(sandpaper)
library(usethis)
library(withr)
library(ids)
library(fs)
# Generate the Lesson ------------------------------------
tmp <- tempfile()
id <- paste0("TEST-", adjective_animal(style = "kebab"))
dir_create(tmp)
lsn <- path(tmp, id)
create_lesson(lsn, open = FALSE)
# Push the Lesson to GitHub ------------------------------
with_dir(lsn, {
  use_github()
})
# Render and Deploy the Lesson ---------------------------
with_dir(lsn, {
  sandpaper:::ci_deploy()
})
# Set GitHub Pages ---------------------------------------
with_dir(lsn, {
  use_github_pages()
})
You must run it in a non-interactive fashion:
Rscript remote/test-deploy.R
Now you can visit the GitHub repository and if you wait ~30 seconds, GitHub will have created a website for you and will still be setting up the lesson engine. This shows that it is possible to deploy as long as you have the following:
- A system with {sandpaper} and Git set up properly
- push access to a remote Git repository
In fact, if you look at the example for ci_deploy(), you will see that it creates a lesson and remote repository and walks you through the process that happens.
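The two requirements listed above can be checked before attempting a deployment. The following is a rough sketch, assuming a POSIX shell; check_deploy_prereqs is a hypothetical helper that is not part of {sandpaper}, and the push check is only an approximation, since a dry-run push contacts the remote but different hosts may report missing permissions at different stages.

```shell
#!/bin/sh
# Sketch only: check_deploy_prereqs is a hypothetical helper, not part of sandpaper.

check_deploy_prereqs() {
  remote="${1:-origin}"
  status=0

  # Requirement 1a: Git is installed.
  if ! command -v git > /dev/null 2>&1; then
    echo "missing: git"
    status=1
  fi

  # Requirement 1b: R is installed and can find the sandpaper package.
  if ! Rscript -e 'if (!requireNamespace("sandpaper", quietly = TRUE)) quit(status = 1)' > /dev/null 2>&1; then
    echo "missing: R with the sandpaper package"
    status=1
  fi

  # Requirement 2: push access. --dry-run contacts the remote without
  # updating it; GIT_TERMINAL_PROMPT=0 stops git from asking for a password.
  if ! GIT_TERMINAL_PROMPT=0 git push --dry-run "$remote" HEAD > /dev/null 2>&1; then
    echo "missing: push access to remote '$remote'"
    status=1
  fi

  return "$status"
}
```

Run from inside an existing lesson repository, `check_deploy_prereqs origin` prints one line per unmet requirement and returns a non-zero status if anything is missing, so it can guard a deployment step in a script.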
The challenge when deploying a Workbench lesson then lies in provisioning the virtual machine or Docker container that builds the lesson when it updates.
15.2 Beyond Deployment
Having a single workflow for deployment is fine, but because a lesson generates its own content, other tools are needed so that a change to the lesson does not produce surprises. Conversely, tools are needed to bring in updates that can affect the security and accuracy of the lesson.
15.2.1 Pull Request Management
The norm for working on GitHub is a trunk-based workflow: small branches containing different features or bug fixes are created and then merged into the default branch after review. If new content is added or packages update, it is important to have mechanisms to verify the contents of a lesson and to intervene if something is incorrect before the changes happen.
15.2.2 Updating Components
The update workflows exist because we understand that a data science lesson does not live in isolation and cannot be built in isolation: contents and tools need to be updated as the software ecosystem changes. Thus, the {sandpaper} functions sandpaper::update_cache() and sandpaper::update_github_workflows() are also available as GitHub workflows that will create a pull request (if they have permission to do so).
15.3 In Practice
We use GitHub Workflows to build and deploy our lessons[2], and the rest of the chapters in this section discuss how we set these up within the context of GitHub. Remember that our philosophy is that The Workbench should be deployable anywhere. These workflows are responsible for provisioning GitHub’s Ubuntu 22.04 Runner Image with the packages and software needed to build a lesson with The Workbench.
15.3.1 Workflows
There are broadly four categories of workflows, where an asterisk (*) denotes workflows that can be manually triggered by maintainers and a caret (^) denotes workflows that require a personal access token to create a pull request:
- Deployment* (sandpaper-main.yaml)
- Pull Request Responders (pr-preflight.yaml, pr-receive.yaml)
- Updates*^ (update-cache.yaml, update-workflows.yaml)
- Pull Request Preview Managers (pr-comment.yaml, pr-close-signal.yaml, pr-post-remove-branch.yaml)
These workflows are individually documented in the sandpaper repository.
These workflows are interrelated and have different triggers. Below is a set of diagrams that disambiguate these relationships. First up are the workflows that are run on a schedule and on demand. Note that the update workflows will only push to a branch if any updates exist; otherwise, they will exit silently.
Notice how none of the workflows push to main. The update workflows will push to the update/* branches and then create a pull request. It’s common to find workflows that will perform updates and then immediately push to the default branch (which is the case for the lesson-transition workflow), but it’s important to remember that a workflow that does automatic updates prevents the maintainers from critically inspecting the changes to the components. This is especially true of the update-cache.yaml workflow, which will update the {renv} lockfile. By passing it through the pull request process first, we can give the maintainers a way to audit the changes coming through.
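The "push to an update branch only when something changed" pattern can be sketched with plain Git. This is an illustration under stated assumptions, not the content of update-cache.yaml or update-workflows.yaml: the function propose_update, the branch name update/example, and the file lockfile.txt are hypothetical, and a local bare repository stands in for the remote.

```shell
#!/bin/sh
# Sketch only: plain git standing in for what the update workflows delegate
# to GitHub Actions. The branch name update/example is hypothetical.
set -eu

# Push pending changes to update/<name>; do nothing if the tree is clean.
propose_update() {
  if [ -z "$(git status --porcelain)" ]; then
    return 0   # no updates: exit silently
  fi
  start="$(git symbolic-ref --short HEAD)"
  git checkout --quiet -b "update/$1"
  git add -A
  git commit --quiet -m "update $1"
  git push --quiet origin "update/$1"
  git checkout --quiet "$start"
  # A real workflow would now open a pull request from update/<name>.
}

# Throwaway lesson repository with a bare repository acting as the remote.
work="$(mktemp -d)"
git init --quiet --bare "$work/remote.git"
git init --quiet "$work/lesson"
cd "$work/lesson"
git config user.name "Sketch"
git config user.email "sketch@example.com"
git remote add origin "$work/remote.git"
printf 'version: 1\n' > lockfile.txt
git add lockfile.txt
git commit --quiet -m "lesson source"

# First run: nothing changed, so nothing is pushed.
propose_update example
heads_before="$(git ls-remote --heads origin | wc -l | tr -d ' ')"

# Second run: a component changed, so the change lands on update/example only.
printf 'version: 2\n' > lockfile.txt
propose_update example
heads_after="$(git ls-remote --heads origin | awk '{print $2}')"

echo "remote branches before: $heads_before"
echo "remote branches after: $heads_after"
```

The default branch never receives the change: after the second run it still holds only the original commit, and the updated file lives on the update branch until a maintainer reviews and merges it.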
15.3.2 Actions
These workflows use a series of custom GitHub Actions (aside from the official GitHub actions of checkout and cache), which can be found in the following repositories:
- https://github.com/carpentries/actions: a combination of both composite and JavaScript Actions that perform the duties of provisioning The Workbench, provisioning packages for R-based lessons, validating pull requests, downloading data from previous runs, commenting on pull requests, and updating components.
- https://github.com/r-lib/actions: similar to carpentries/actions, but these are used in our workflows to provision R (that is, set up the correct environment variables) and to provision pandoc. Many of these actions are designed for packages and we use them heavily in The Workbench development.
- https://github.com/carpentries/create-pull-request: a fork of a popular action that will create a pull request from a GitHub Workflow. We maintain a fork so that we can make sure it stays secure.
Each repository has the actions documented to a degree, but we will discuss the implications and design of the actions in a following chapter.
[1] Please note: this will only work if you have a GitHub PAT set up so that {usethis} can interact with the GitHub API.
[2] GitHub can be a bit confusing with its terminology and fluid concepts. Their resource for Understanding GitHub Actions may help, but here’s how I think about it. In this publication, whenever I refer to a GitHub Workflow, I mean a YAML file that lives inside a repository and tells GitHub how to set up its machine to build the lesson. It’s like a recipe plan and shopping list for a dinner. On the other hand, when I refer to a GitHub Action, I mean a self-contained piece of software that performs a specific task within a workflow. This is akin to a specific kitchen utensil, ingredient, or spice within a recipe.