Migrating lessons from the previous infrastructure
Last updated on 2024-11-05
This page describes a workflow for semi-automated transition from The Carpentries' old infrastructure to the Workbench. It is based on the transition workflows and documentation originally written by Zhian Kamvar. See the full set of lesson transition scripts and documentation at https://github.com/carpentries/lesson-transition. Please contact The Carpentries Curriculum Team with questions about this workflow.
The workflow is intended for use by Carpentries community members who want to transition their own lesson repository from the old “styles” infrastructure to use the Workbench.
After following this process you will have replaced the contents of your current lesson repository with a Workbench version containing the same lesson content.
Callout
We strongly recommend that you create a backup of your lesson repository before you follow this workflow.
Prerequisites
Before following the steps described below, please make sure that you have done the following:
- Enabled push access to GitHub from the command line on your local system: you will need to have SSH or HTTPS access configured to allow you to push changes to your GitHub repository from the command line. See Authenticating with the command line in GitHub’s documentation for more details and step-by-step guides.
- Merged or closed all open pull requests on your lesson repository. Any open pull requests will be rendered invalid by the migration between infrastructures.
- Installed Git and R on your local system. The workflow described below assumes that you can call R as a command from a Bash shell session.
- (Optional, but highly recommended) archived a final version of your lesson in the old infrastructure. Although this workflow has been documented with recommendations to minimise the chances of accidentally overwriting your lesson with anything other than a fully transitioned equivalent, it is still wise to ensure that you have a complete and up-to-date copy of your lesson as a backup. For example, you could publish a release to Zenodo.
- (Optional, but highly recommended) created or synced up a fork of the lesson. This can be used to test the transitioned lesson on GitHub before making the final, irreversible changes to your primary lesson repository. The fork only needs to contain a copy of the default branch of your lesson (usually gh-pages).
Transition Workflow
- Set up the transition tools
- Create an Rscript for your lesson
- Run the transition tool
- Set up your GitHub repository to receive the transitioned lesson
- Post-transition steps
Callout
Note: these steps assume your lesson exists in The Carpentries Incubator, i.e. under the carpentries-incubator organisation on GitHub. If it does not, you will need to replace all of the folder names etc. below accordingly.
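One way to keep those replacements consistent is to define the organisation and lesson name once and derive the other names from them. The following is only a sketch: the variable names are made up for this example and are not part of the transition tools, while the derived values mirror the paths used in the commands on this page.

```bash
# Hypothetical helper variables: edit these three values for your lesson.
ORG="carpentries-incubator"    # GitHub organisation (or user) that owns the lesson
LESSON="YOUR-LESSON-NAME"      # name of the lesson repository
DEFAULT_BRANCH="gh-pages"      # use 'main' if that is your default branch

# Names derived from the values above, following the patterns used on this page.
REPO_URL="https://github.com/${ORG}/${LESSON}"
TRANSITION_BRANCH="${LESSON}-transition"
SUBMODULE_PATH="${ORG}/${LESSON}"
LESSON_SCRIPT="${ORG}/${LESSON}.R"
RELEASE_TARGET="release/${ORG}/${LESSON}.json"
RELEASE_DIR="release/${ORG}/${LESSON}"

# Show the values so they can be checked before running any real commands.
printf '%s\n' "$DEFAULT_BRANCH" "$REPO_URL" "$TRANSITION_BRANCH" \
  "$SUBMODULE_PATH" "$LESSON_SCRIPT" "$RELEASE_TARGET" "$RELEASE_DIR"
```

With these set in your shell session, the paths in the later commands can be written in terms of the variables, for example "$RELEASE_DIR" in place of release/carpentries-incubator/YOUR-LESSON-NAME.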
1. Set up the transition tools
- Open a Bash shell and run the following:
BASH
git clone https://github.com/carpentries/lesson-transition.git && cd lesson-transition
git switch -c YOUR-LESSON-NAME-transition # provide a branch name appropriate for your lesson, e.g. for the carpentries-incubator/docker-introduction lesson, you would call the branch docker-introduction-transition
git submodule update --init git-filter-repo # this will ensure the git-filter-repo tool is available
- Install dependencies for lesson transition tool:
  - If you already have renv installed on your system: open R in this directory (run the R command in Bash), answer y to the prompt "Would you like to restore the project library?", wait for the project setup to complete, then exit R.
  - If you do not yet have renv installed: open R in this directory (run the R command in Bash), then run the following:
R
install.packages('renv') # enter 'y' to complete the installation
library('renv')
renv::restore()
or run the following commands in Bash:
- Returning to Bash (you can quit R by calling the q() function), run the following:
BASH
Rscript establish-template.R template/
git submodule add --force -b gh-pages https://github.com/carpentries-incubator/YOUR-LESSON-NAME carpentries-incubator/YOUR-LESSON-NAME # replace YOUR-LESSON-NAME with the name of your lesson, and replace 'gh-pages' with 'main' if main is the default branch for your lesson
2. Create an Rscript for your lesson
The transition tool requires an R script to exist for any lesson it migrates to the new infrastructure. In the large-scale transition of Carpentries lessons, these R scripts were used to handle various edge cases, customisations to the standard Markdown syntax, etc. that existed in every lesson but were unique to each.
The script is run on the version of the Workbench lesson created by the transition tool, as a kind of post-processing step before the changes made by the transition are committed. Any changes made by the script will appear as if carried out by the transition tool, thereby avoiding additional commits in your project history associated with “cleaning up” after the migration.
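It is sufficient to create an empty file, named appropriately (for example carpentries-incubator/YOUR-LESSON-NAME.R, the script path passed to the transition tool in step 3). This is available in the add-lesson.sh file, which will also provide hints about how the data can be transformed.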
But you may wish to populate this script with some function calls to clean up various common artifacts produced by the transition, or to preserve any custom workflows you have added to your lesson repository. Look at the .R scripts in the carpentries-incubator/, datacarpentry, librarycarpentry, and swcarpentry directories of the lesson-transition repository for inspiration.
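For the minimal case of an empty script, creating the file could look like the sketch below. It assumes you are in the root of your lesson-transition clone and that the script lives at the same carpentries-incubator/YOUR-LESSON-NAME.R path that is passed to the transition tool in step 3; the ORG and LESSON variable names are only illustrative.

```bash
# Run from the root of your lesson-transition clone.
ORG="carpentries-incubator"   # illustrative variable: organisation folder
LESSON="YOUR-LESSON-NAME"     # illustrative variable: lesson name

# Make sure the organisation folder exists, then create the script.
# touch leaves the contents of an existing file unchanged.
mkdir -p "$ORG"
touch "${ORG}/${LESSON}.R"

# Confirm that the script is now in place.
ls -l "${ORG}/${LESSON}.R"
```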
3. Run the transition tool
Try this first
In Bash, run the release process via make to save yourself some work:
Callout
Notes:
- the YOUR-LESSON-NAME.json file does not need to exist for these commands to run
- if you would like to run a test first, you can use make sandpaper/carpentries-incubator/YOUR-LESSON-NAME.json, which will create a ‘beta test’ version of the transitioned lesson in the carpentries-incubator folder.
If that doesn’t work…
Try running the process step-by-step:
- In Bash, run:
BASH
# run the transition for your lesson
bash filter-and-transform.sh release/carpentries-incubator/YOUR-LESSON-NAME.json carpentries-incubator/YOUR-LESSON-NAME.R $(readlink -f ./filter-list.txt) 'return message'
cd release/carpentries-incubator/YOUR-LESSON-NAME
# the lesson name is the name of the current directory
LESSONNAME=$(basename "$PWD")
# keep copies of the git-filter-repo records next to the lesson directory
for FILENAME in commit-map ref-map suboptimal-issues
do
  cp ".git/filter-repo/$FILENAME" "../$LESSONNAME-$FILENAME.hash"
done
# record the first commit that maps to an all-zero hash in the commit map
grep -E ' 0{40}$' .git/filter-repo/commit-map | cut -d' ' -f1 | head -1 > "../$LESSONNAME-invalid.hash"
# remove the invalid.hash file again if it is empty
if [ ! -s "../$LESSONNAME-invalid.hash" ]; then rm "../$LESSONNAME-invalid.hash"; fi
- Adjust lesson config file
  - Option 1 (manual): Using your favourite text editor, open the config.yaml file and remove the lines setting the workbench-beta, analytics, lang, and url parameters.
  - Option 2 (using regular expressions): In Bash, run:
- Set the correct address for your origin remote repository. In Bash, run:
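For Option 2 of the config adjustment, one possible regular-expression approach is sketched here. It is an illustration and not necessarily the command used by the transition scripts: the function name is hypothetical, the sample file contents are made up, and it assumes each of the four parameters is set on a single top-level line of config.yaml.

```bash
# Remove the top-level workbench-beta, analytics, lang and url settings
# from a config file. Assumes each setting sits on a single line.
strip_transition_settings() {
  config_file=$1
  sed -E '/^(workbench-beta|analytics|lang|url):/d' "$config_file" \
    > "${config_file}.tmp" && mv "${config_file}.tmp" "$config_file"
}

# Demonstration on a throwaway file with made-up values.
demo_dir=$(mktemp -d)
cat > "${demo_dir}/config.yaml" <<'EOF'
title: 'Lesson Title'
created: ~
source: 'https://github.com/carpentries-incubator/YOUR-LESSON-NAME'
workbench-beta: 'true'
analytics: 'carpentries'
lang: 'en'
url: 'https://example.org/YOUR-LESSON-NAME'
EOF

strip_transition_settings "${demo_dir}/config.yaml"

# Only the title, created and source lines should remain.
cat "${demo_dir}/config.yaml"
```

In a real transition you would call the function on the config.yaml in the root of the transitioned lesson and review the result with git diff before committing. For the remote address, Git's own git remote set-url origin command, followed by the URL of your lesson repository, is one way to make the change.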
Custom workflow files
The contents of the .github folder are not preserved by the transition tool. If you had any custom workflow files in your lesson repository before migrating to the new infrastructure, you will need to add those back into .github/workflows/ at this point then commit the changes. (You could also modify the Rscript for your lesson transition to do this for you, and repeat the transition process to include it.)
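If you kept a backup copy of the old lesson, adding the custom workflow files back could be sketched as follows. The function name, the directory arguments, and the custom-check.yaml file are all hypothetical; the function copies only the files you name, so that workflows belonging to the old infrastructure are not carried over by accident.

```bash
# Copy named workflow files from a backup of the old lesson into the
# transitioned lesson. Both directory arguments are assumptions of this sketch.
restore_workflows() {
  backup_dir=$1
  lesson_dir=$2
  shift 2
  mkdir -p "${lesson_dir}/.github/workflows"
  for workflow in "$@"; do
    cp "${backup_dir}/.github/workflows/${workflow}" \
       "${lesson_dir}/.github/workflows/${workflow}" || return 1
  done
}

# Demonstration with throwaway directories and a made-up workflow file.
backup=$(mktemp -d)
lesson=$(mktemp -d)
mkdir -p "${backup}/.github/workflows"
printf 'name: Custom check\n' > "${backup}/.github/workflows/custom-check.yaml"

restore_workflows "$backup" "$lesson" custom-check.yaml

# List the restored workflow files.
ls "${lesson}/.github/workflows"
```

In real use the second argument would be release/carpentries-incubator/YOUR-LESSON-NAME, and the restored files would then be added and committed with git add and git commit from inside that directory.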
Build and check your transitioned lesson
At this stage, the release/carpentries-incubator/YOUR-LESSON-NAME/ directory should contain a transitioned version of your lesson. To check how things are looking, install the Workbench tools for your system, then open R in this directory and run sandpaper::serve().
Optional: update your lesson’s R script to produce a smoother transition
While previewing this transitioned lesson site, you might see some problems in the content of your lesson site that appeared during the migration. Liquid comments (delineated by {% comment %} and {% endcomment %} tags) are one commonly-encountered artifact. Another is broken links to the lesson setup instructions, which are found at index.html#setup in a Workbench site.
These can be fixed by editing the lesson after transition, but for a cleaner commit history on your lesson you might wish to delete the transitioned lesson directory (inside the release folder), modify the R script for your lesson to handle those issues, and re-run the transition tool. (See Create an Rscript for your lesson above.)
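To get a list of leftover Liquid comment tags before deciding how to handle them, a recursive search along the following lines may help. This is a sketch only: the function name is hypothetical, the sample files are made up, and the pattern covers just the two comment tags mentioned above.

```bash
# List every line containing a Liquid comment tag under a directory,
# printing the file name and line number for each match.
find_liquid_comments() {
  grep -rnE '\{%[[:space:]]*(end)?comment[[:space:]]*%\}' "$1"
}

# Demonstration on throwaway files with made-up content.
demo=$(mktemp -d)
mkdir -p "${demo}/episodes"
printf '%s\n' '# Introduction' '{% comment %}' 'old note' '{% endcomment %}' \
  > "${demo}/episodes/01-intro.md"
printf '%s\n' '# Clean episode' > "${demo}/episodes/02-clean.md"

# Reports lines 2 and 4 of 01-intro.md and nothing for 02-clean.md.
find_liquid_comments "${demo}/episodes"
```

Running the function on the root of the transitioned lesson gives a quick indication of how much clean-up the R script would need to do.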
If something goes wrong
To go back to the start and try again, delete the directory for your lesson within the release/ directory, i.e. rm -rf release/carpentries-incubator/YOUR-LESSON-NAME.
If the transition tool ran successfully but your lesson build fails, this is usually due to customisations made to the lesson that fall outside what the transition tool expected to find. If you run into problems, we recommend that you try to identify differences between your repository and a typical lesson repository (the Workbench Markdown template and R Markdown template are good examples) and experiment to see if any of those are causing the site build to fail.
If something goes wrong and you cannot debug the problem on your own, post a message to the #workbench channel on The Carpentries Slack workspace, or reach out to the Curriculum Team by email (curriculum@carpentries.org). Try to provide as much information as you can, including any error messages and logging information that were produced when you ran the steps described above. We will do our best to help you but please note that the Core Team’s capacity to provide support for transition of community lessons is severely limited.
4. Set up your GitHub repository to receive the transitioned lesson
Callout
We recommend that you try these steps out on a fork of your lesson first, so that you can be certain everything works before making permanent changes to your main lesson repository.
- Adjust the config.yaml:
  - Record your lesson creation date in config.yaml: run git log and hit Shift+G to jump to end of file, note down the date when the first commit was made. Then, using your favourite text editor, open config.yaml and modify the created field by replacing ~ with this creation date in YYYY-MM-DD format.
  - Check the source URL specified in config.yaml: this may be set incorrectly during the transition process. Adjust it to the correct URL for your lesson source repository on GitHub.
- Commit the changes you made to the lesson config file:
- Rename the branches of your project:
  - On your GitHub repository, rename the gh-pages branch to legacy/gh-pages (if main is your default branch, also rename that to legacy/main).
    - Branches can be renamed by going to the list of all branches on your repository (add /branches/all to the end of the URL for your GitHub repository e.g. https://github.com/datacarpentry/image-processing/branches/all) and selecting the pencil icon button next to the relevant branch in that listing.
- In Bash on your local system (make sure you are working in the root of the release/carpentries-incubator/YOUR-LESSON-NAME directory), run the following commands (please read the comments that annotate these commands and note that we strongly recommend that you execute these one-at-a-time!):
BASH
git remote -v # check the names and addresses of your remote repositories: if you are testing on a fork and it is not listed here, add it with 'git remote add' https://git-scm.com/docs/git-remote#Documentation/git-remote.txt-emaddem
git fetch --prune origin # this assumes your remote is called origin - if you are testing on a fork, use the name of that remote here instead
git checkout --orphan gh-pages # set up gh-pages branch
# double-check that you are in the root of your lesson within the lesson-transition/release folder (e.g. if transitioning carpentries-incubator/docker-introduction, you should be in lesson-transition/release/carpentries-incubator/docker-introduction/)
git rm -rf .
mkdir -p .github/workflows/
curl -o .github/workflows/close-pr.yaml https://raw.githubusercontent.com/carpentries/lesson-transition/main/close-pr.yaml # download the workflow that will auto-close invalid PRs to gh-pages
git add .github/workflows/close-pr.yaml
git commit --allow-empty -m 'Initialising gh-pages branch'
git push --force origin HEAD:gh-pages
- If everything has gone well up to this point, it is time to go back to the main branch and force push its contents to GitHub:
- On your GitHub repository:
  - set the default branch to main (in Settings->General, click the button with two arrows next to the name of the default branch)
  - in Settings->Branches, add rules to protect the main branch (require pull requests) and lock legacy/*
  - make sure that your lesson site is being served with GitHub Pages from the root folder of the gh-pages branch (in Settings->Pages, under Build and deployment, ensure that gh-pages is selected with the dropdown under Branch and that / (root) is the folder selected, then hit the Save button)
If all of the above worked, you should now have a Workbench version of your lesson.
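As an aside on the first step in this section, looking up the creation date and writing it into config.yaml can also be scripted. The sketch below is one possible approach, not part of the transition tools: both function names are hypothetical, and it assumes the created field is still written as created: ~ on a line of its own.

```bash
# Print the date of the oldest commit listed by 'git log', as YYYY-MM-DD.
# Run inside the transitioned lesson repository.
first_commit_date() {
  git log --format=%ad --date=short | tail -n 1
}

# Replace 'created: ~' in a config file with the given date.
set_created_date() {
  config_file=$1
  created=$2
  # Refuse anything that is not shaped like YYYY-MM-DD.
  case $created in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) echo "expected a date in YYYY-MM-DD format, got: $created" >&2; return 1 ;;
  esac
  sed -E "s/^created:[[:space:]]*~[[:space:]]*\$/created: ${created}/" "$config_file" \
    > "${config_file}.tmp" && mv "${config_file}.tmp" "$config_file"
}

# Example use, from the root of the transitioned lesson:
#   set_created_date config.yaml "$(first_commit_date)"
```

Taking the last line of the log mirrors the manual approach of jumping to the end of the git log output, so it is worth checking the date it reports against what you expect for your lesson.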
5. Post-transition steps
After you have transitioned your lesson, you should:
- Delete and re-create any forks and local clones of your lesson project, to minimise the likelihood that you will accidentally push the old project history back to the GitHub repository. If you have any collaborators and fellow lesson developers/maintainers, ask them to do the same.
- If your lesson is in The Carpentries Incubator, tell the Curriculum Team that you have completed the transition so that we can activate the automated creation of pull requests to update the Workbench infrastructure when new versions of the packages are released.
- (Optional, but highly recommended) Open a pull request to https://github.com/carpentries/reactables/ to add the invalid commit hash (in the invalid.hash file created for your lesson during the transition (step 3 above)) to the workbench/invalid-hashes.json file. This will include the hash in the data feed used by our infrastructure to support the GitHub Actions workflow that will automatically close any pull requests opened to your repository from a branch containing the old project history. To do this:
  - Make a fork of the carpentries/reactables GitHub repository
  - On a new branch of that fork, edit the workbench/invalid-hashes.json file, adding a new line before the final } line matching the format of the other lines containing hashes, i.e. "carpentries-incubator/YOUR-LESSON-NAME": "HASHASHASHASHASHASHASH" where HASHASHASHASHASHASHASH is the hash contained in the release/carpentries-incubator/YOUR-LESSON-NAME-invalid.hash file created during the infrastructure transition. Make sure to add a comma , to the end of the preceding line to ensure the validity of the JSON file.
  - Commit the change, then open a pull request back to carpentries/reactables to suggest that we merge your invalid hash into our data feed. A member of the Curriculum Team will review and merge your changes.
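To avoid copy-and-paste mistakes when preparing the new line for invalid-hashes.json, the entry can be generated from the invalid.hash file. This is a minimal sketch: the function name is hypothetical, the hash in the demonstration is made up, and it assumes the file holds a single 40-character hexadecimal commit hash.

```bash
# Print a JSON entry for a lesson from its invalid.hash file.
invalid_hash_entry() {
  lesson=$1
  hash_file=$2
  # Strip the trailing newline and any other whitespace from the file contents.
  hash=$(tr -d '[:space:]' < "$hash_file")
  if ! printf '%s' "$hash" | grep -Eq '^[0-9a-f]{40}$'; then
    echo "not a 40-character commit hash: $hash" >&2
    return 1
  fi
  printf '"%s": "%s"\n' "$lesson" "$hash"
}

# Demonstration with a made-up hash in a throwaway file.
demo_file=$(mktemp)
echo '0123456789abcdef0123456789abcdef01234567' > "$demo_file"
invalid_hash_entry "carpentries-incubator/YOUR-LESSON-NAME" "$demo_file"
# prints: "carpentries-incubator/YOUR-LESSON-NAME": "0123456789abcdef0123456789abcdef01234567"
```

In real use the second argument would be release/carpentries-incubator/YOUR-LESSON-NAME-invalid.hash, and the printed line would be pasted before the final } of the JSON file, with a comma added to the end of the preceding line.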