Auditing Pull Requests
Last updated on 2024-12-03 | Edit this page
Overview
Questions
- What happens during a pull request?
- How do I review generated content of a pull request?
- How do I handle a pull request from a bot?
Objectives
- Identify key features of a pull request to review
- Identify the benefits of pull request comments from a bot
- Understand why bots will initiate pull requests
- Understand the purpose of automated pull requests
Introduction
One of the biggest benefits of working on a Carpentries Lesson is that it gives maintainers and contributors practice collaboratively working on GitHub and practicing common software engineering practices, including pull requests and reviews. In the Carpentries Workbench, we have implemented new features that will make reviewing contributed content easier for maintainers:
- Source content is checked for valid headings, links, and images
- Generated content is rendered to markdown and placed in a temporary branch for visual inspection.
- Pull requests are checked for malicious attacks.
Reviewing A Pull Request
When you recieve a pull request, a check will first validate that the lesson can be built and then, if the lesson can be built, it will generate output and leave a comment that provides information about the rendered output:
With this information, you can click on the link that says ‘Inspect the changes’ to navigate to a diff of the rendered files. In this example, we have manipulated the output of a plot, and GitHub allows us to visually inspect these differences by scrolling down to the file mentioned in the diff and clicking on the “file” icon to the top right, which indicates to “display the rich diff”.
Living with Entropy
In R Markdown documents, If you use any sort of code that generates
random numbers, you may end up with small changes that show up on the
list of changed files. See this example where using
the ggplot2
function geom_jitter()
leads to
slightly different image files. You can fix this by setting a seed
for the random number generator (e.g. set.seed(1)
) at the
beginning of the episode, so that the same random numbers are generated
each time the lesson is built.
Of course, if you have a rendered lesson, another important thing is to check to make sure the outputs continue to work. If you notice any new errors or warnings new in the diff, you can work with the contributor to resolve them.
Risk Management
Accepting generated content into lessons from anyone runs the risk of a security breach by exposing secrets. To mitigate this risk, GitHub limits the scope of what is possible inside a pull request so that we can check and render the content without risk of exploitation. Through this, we render and check the lesson inside the pull request with no privileges, check that the pull request is valid (not malicious), and then create a temporary branch for an exploratory preview, allowing the maintainer to audit the generated content before it gets adopted into the curriculum.
If the PR is invalid (e.g. the contributor spoofed a separate, valid PR, or modified one of the github actions files), then the maintainer is alerted that the PR is potentially risky (see the Being Vigilant section for details)
Automated Pull Requests
There are two situations where you would receive a pull request from The Carpentries Apprentice bot
- The workflows need to be updated to the latest versions
- You have a lesson that uses generated content, the software requirements file (e.g. renv.lock or [future] requirements.txt) is updated to the latest versions and the lesson is re-built.
More details about the purpose of these builds can be found in The Chapter on updating lesson components.
For Lessons Outside of The Carpentries
If you are using {sandpaper} to work on a lesson on your own personal account, these pull requests may never trigger. If you want them to work, follow the instructions in the technical article in {sandpaper} called [Working with Automated Pull Requests].
Workflow Updates
When you receive a workflow update pull request, it will be on a
branch called update/workflows
, state that it is a bot and
then indicate which version of sandpaper the workflows will be updated
to.
Because this PR contains changed workflow files, it will be marked as invalid no preview will be created, rendering a comment that indicates as such.
Updating Package Cache
Updates to the package cache are on the updates/packages
branch and accompanied by a bot comment that indicates the package
versions that have been updated.
You will notice at the bottom of the comment there are instructions for how to check out a new branch and inspect the changes locally:
You are free to push code changes to this branch to update any lesson material that has changed due to package updates or you can also pin the versions of the packages you do not want updated.
Status Updates
When the pull request comes in from [The Carpentries Apprentice], you will see this comment immediately:
In the next couple of minutes, the R Markdown files will be re-built with the updated versions of the packages and the comment will update to reveal changes that have been made (if any).
Being Vigilant
Preventing Malicious Commits
The pull request previews are designed to allow you to inspect the potential changes before they go live. Because our lessons run arbitrary code, it is important to inspect the changes to make sure that someone is not trying to insert anything malicious into your lesson. A good rule of thumb for maintaining your lesson is that if there are changes are changes you do not understand coming from someone other than @carpentries-bot, then, it’s a good idea to wait to merge until you can fully understand the changes that are being proposed.
One risk that might happen is if someone updates lesson content and the github workflows at the same time. If this happens, you will see a comment from the workflows that looks like this:
It is not always the case that changes in lesson files and workflow files will be bad, but it is not good practice to mix them.
Transition from carpentries/styles
During the migration to The Carpentries Workbench, we are using the lesson transition tool to convert lessons from the former “lesson template” to The Workbench. This involved removing commits unrelated to lesson content from the git history, which reduces the size of the lesson’s git repository and has the benefit of making the contribution log more clear. The downside is that forks that were created before the lesson was transferred to The Workbench suddenly became invalid.
If someone attempts to merge a pull request from an old repository, the first thing you will notice is hundreds of new commits and the second thing you will notice is the results of the automated check
Key Points
- Pull requests for generated formats requires validate of prose and generated content
- Inspecting the rendered markdown output can help maintainers identify changes that occur due to software before they are deployed to the website
- Automated pull requests help keep the infrastructure up-to-date