Software Carpentry (SWC) and Data Carpentry (DC) are two programs of The Carpentries (a fiscally sponsored project of Community Initiatives). We teach essential computing and data skills. We exist because the skills needed to do computational, data-intensive research are not part of basic research training in most disciplines.
Software Carpentry enables researchers to create purpose-built tools, whether it be a Unix shell script to automate repetitive tasks, or software code in programming languages such as Python, R, or MATLAB. These enable researchers to build programs that can be read, re-used, and validated, greatly enhancing the sharing and reproducibility of their research.
Data Carpentry learners are taught to work with data more effectively. Workshops focus on the data lifecycle, covering data organization, cleaning and management through to data analysis and visualization. Lessons are domain-specific, with coverage in biology, genomics, and social sciences.
The Carpentries began systematically recording data for our workshops in 2012. We use this data to investigate how The Carpentries has grown over the years, including the number and geographic reach of our workshops and the number of learners at these workshops. We also look at our Instructor Training program, including the number and geographic reach of instructor training events, the number of trainees and their completion rates, and the onboarding of new Instructor Trainers.
Data are collected by a team of Workshop Administrators. In Africa, Australia, Canada, New Zealand, and the United Kingdom, Workshop Administrators are affiliated with our member institutions and provide in-kind staff time. A full-time Carpentries staff member is the Workshop Administrator for the rest of the world.
Software Carpentry and Data Carpentry workshops generally comprise two full days of face-to-face instruction, based on Software Carpentry or Data Carpentry lesson materials, respectively.
Workshops are taught by trained and certified volunteer Instructors. Certified Instructors are people who have completed our instructor training course. Software Carpentry and Data Carpentry lessons are all open source and are hosted on GitHub.
For each workshop, we collected the following data:
variable | definition |
---|---|
slug | Unique identifier for each workshop. Takes the form YYYY-MM-DD-sitename. |
start | Start date of the workshop. Takes the form YYYY-MM-DD. |
attendance | Number of learners at the workshop. |
host_name | Institution that hosted the workshop. |
country | The two-letter country code for the country in which the workshop was held. |
workshop_type | Whether this is a Software Carpentry (SWC) or Data Carpentry (DC) workshop. |
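Because each slug begins with the workshop's start date, the date and site name can be recovered from the slug alone. The sketch below is a minimal Python illustration, assuming slugs follow the documented YYYY-MM-DD-sitename form exactly; the function `parse_slug` and the example slug are hypothetical, not taken from the assessment repository.

```python
from datetime import date


def parse_slug(slug: str) -> tuple[date, str]:
    """Split a workshop slug of the form YYYY-MM-DD-sitename (hypothetical helper).

    Only the first three hyphens are split on, so a site name that itself
    contains hyphens is kept intact. Raises ValueError if the slug is malformed.
    """
    year, month, day, sitename = slug.split("-", 3)
    return date(int(year), int(month), int(day)), sitename


# Hypothetical slug, for illustration only.
start, site = parse_slug("2016-05-12-some-university")
print(start.isoformat(), site)  # 2016-05-12 some-university
```

Splitting on only the first three hyphens is the design choice that lets multi-word site names survive parsing.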
The full data set, representing 1332 workshops, can be found in the Programmatic Assessment folder of The Carpentries Assessment repository on GitHub (https://github.com/carpentries/assessment/tree/master/programmatic-assessment).
The number of Software Carpentry and Data Carpentry workshops appears to have remained roughly steady for the past several years, after a sharp jump from 2014 to 2015. The year 2015 was the first year in which The Carpentries had staff and a dedicated database to track workshop data, so some of this growth may also reflect improvements in internal record-keeping.
The data shown here may not account for unreported self-organized workshops. Although The Carpentries attempts to collect data on all workshops run under The Carpentries brand, institutions sometimes run a workshop without reporting it back to The Carpentries staff. Thus, if there has been a shift from centrally organized to self-organized workshops, our data may underestimate workshop growth. We are working to improve data collection so that it more accurately reflects our scope of work.
This may also reflect a shift to sites running a variation of Carpentries lessons, rather than official full Carpentries workshops. While we've known anecdotally that this happens often, we have not systematically collected any data on when or how Carpentries lessons are used in other contexts.
Figure 1: Workshops by Carpentry by Year
This bar chart shows the number of Data Carpentry (DC) and Software Carpentry (SWC) workshops each year. Data for 2018 is a projection. The proportion of workshops in the first quarter of 2017 relative to the full year was applied to actual first quarter data from 2018 to calculate this projection. Source data can be found in Table 1 in the Appendix.
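As arithmetic, the projection divides the 2018 first-quarter count by the share of 2017's workshops that fell in the first quarter: projected 2018 = Q1 2018 × (total 2017 / Q1 2017). A minimal Python sketch follows; the function name is illustrative and the counts are hypothetical, chosen only to show the calculation (the real figures are in Table 1).

```python
def project_full_year(q1_current: int, q1_previous: int, total_previous: int) -> float:
    """Project a full-year count from first-quarter data.

    Assumes the share of last year's total that fell in Q1
    (q1_previous / total_previous) holds again this year, so the
    projection is q1_current divided by that share.
    """
    if q1_previous == 0:
        raise ValueError("Cannot project: previous year's Q1 count is zero")
    q1_share = q1_previous / total_previous
    return q1_current / q1_share


# Hypothetical counts, not the report's data:
# 50 of 200 workshops in the previous year fell in Q1 (a 25% share),
# and the current year's Q1 had 60 workshops.
print(project_full_year(q1_current=60, q1_previous=50, total_previous=200))  # 240.0
```

The method assumes the seasonal pattern of workshops is stable from one year to the next, so any shift in when workshops are scheduled will bias the projection.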
The Carpentries began in predominantly white, English-speaking countries. In later years, it expanded to other European countries, and most recently it has expanded its reach to include African and Latin American countries. A list of all countries that have ever hosted a Carpentries workshop can be found in Table 2 of the Appendix.
In many countries, we have seen a steady increase in the number of workshops run. In several countries, however, including Australia, Canada, and New Zealand, we have seen a decline. This may be due to the reasons cited above, including unreported self-organized workshops or an increase in variations on Carpentries workshops. In either case, this motivates The Carpentries to improve data collection and methods so that we understand our scope of work beyond our centrally coordinated workshops.
Decreases in the number of workshops run in some countries may also be explained by shifts in our instructor community. Carpentries activity is sustained by our instructors, and some instructors move to new geographies or to new career phases. Without a larger community in place, the places they leave may not have had the capacity to sustain their activity. This motivates The Carpentries to build strong and sustainable communities, with systems that account for individual turnover.
Table 2 in the Appendix shows each country that has hosted a workshop, with the number of workshops each year from 2012 to 2018. This data is used below to plot the countries that have hosted 10 or more workshops since 2012, as well as the year each country held its first workshop.
This bar chart looks only at countries that have hosted 10 or more workshops since 2012. For each country, the number of workshops run each year is plotted. Data for 2018 is a projection. Source data can be found in Table 3 in the Appendix.
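Selecting the countries for this chart amounts to counting workshops per country and keeping those at or above the threshold of 10. A minimal sketch, assuming one two-letter country code per workshop as in the `country` column described earlier; the function name and sample data are hypothetical.

```python
from collections import Counter


def countries_with_min_workshops(countries: list[str], minimum: int = 10) -> dict[str, int]:
    """Return {country: workshop count} for countries with at least `minimum` workshops.

    `countries` holds one two-letter country code per workshop.
    """
    counts = Counter(countries)
    return {code: n for code, n in counts.items() if n >= minimum}


# Hypothetical input: 12 workshops in GB, 3 in NZ.
sample = ["GB"] * 12 + ["NZ"] * 3
print(countries_with_min_workshops(sample))  # {'GB': 12}
```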
The map below shows each country that has hosted a Carpentries workshop, noting the year it hosted its first workshop. Darker colors represent countries with first workshops in more recent years. This shows The Carpentries' origins in Australia, Canada, the United States, and western Europe, with increased workshops in Africa in recent years.
From 2015 to 2017, The Carpentries saw remarkable growth in the number of countries running Carpentries workshops for the first time. However, many of these countries did not have sustainable communities that allowed them to continue running workshops. While The Carpentries has held workshops in a total of 43 countries, 18 of these have held only one workshop. This motivates The Carpentries to focus on building sustainable communities when working in new geographies.
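Both the first-workshop year shown on the map and the count of countries that have held only one workshop can be derived from per-workshop (country, year) records. A hypothetical sketch, assuming the year is taken from each workshop's `start` date; function names and records are illustrative.

```python
from collections import Counter


def first_year_by_country(workshops: list[tuple[str, int]]) -> dict[str, int]:
    """Map each country code to the earliest year it hosted a workshop.

    `workshops` is a list of (country_code, year) pairs, one per workshop.
    """
    first: dict[str, int] = {}
    for country, year in workshops:
        if country not in first or year < first[country]:
            first[country] = year
    return first


def single_workshop_countries(workshops: list[tuple[str, int]]) -> list[str]:
    """Return the sorted country codes that have hosted exactly one workshop."""
    counts = Counter(country for country, _ in workshops)
    return sorted(c for c, n in counts.items() if n == 1)


# Hypothetical records, not the report's data.
records = [("US", 2012), ("US", 2014), ("ZA", 2015), ("ZA", 2016), ("BR", 2017)]
print(first_year_by_country(records))      # {'US': 2012, 'ZA': 2015, 'BR': 2017}
print(single_workshop_countries(records))  # ['BR']
```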
This bar chart represents the number of unique countries running a workshop each year. Table 5 in the Appendix includes a list of each country having held a workshop each year. Data from 2018 is actual data, not a projection.
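Counting unique countries per year means a country that hosted several workshops in one year is counted only once for that year. A minimal sketch over hypothetical (country, year) pairs; the function name is illustrative.

```python
from collections import defaultdict


def countries_per_year(workshops: list[tuple[str, int]]) -> dict[int, int]:
    """Count distinct host countries per year from (country_code, year) pairs."""
    seen: dict[int, set[str]] = defaultdict(set)
    for country, year in workshops:
        seen[year].add(country)  # a set ignores repeat workshops in the same country
    return {year: len(codes) for year, codes in sorted(seen.items())}


# Hypothetical records: two US workshops in 2016 count as one country.
records = [("US", 2016), ("US", 2016), ("ZA", 2016), ("US", 2017)]
print(countries_per_year(records))  # {2016: 2, 2017: 1}
```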
In addition to looking at how many workshops we have run, we look at how many people we have impacted through our workshops. Workshop-specific attendance data can be found in the Programmatic Assessment folder of The Carpentries Assessment repository on GitHub (https://github.com/carpentries/assessment/tree/master/programmatic-assessment).
After running a workshop, The Carpentries staff ask hosts or instructors to submit attendance data to us. In some cases, hosts and instructors provide a detailed account of the number registered and the number attended each day. In other cases, they offer their best estimate of overall attendance.
This data allows us to demonstrate our impact by showing the number of learners who have been exposed to the computational, coding, and data science skills taught by The Carpentries. Further analysis of learner outcomes can be found in The Carpentries Learner Assessment report (https://github.com/carpentries/assessment/tree/master/learner-assessment).
This bar chart represents the total number of Software Carpentry (SWC) and Data Carpentry (DC) learners each year. Recent years show a drop in the total number of learners served, which may be due to the smaller class sizes discussed in the section below. Numbers for 2018 are a projection.
Source data can be found in Table 6 in the Appendix.
The Carpentries has historically recommended (but not enforced) a class size of no more than 40 learners with two instructors. In early years, we saw more workshops closer to the upper limit, with many extreme outliers, and The Carpentries experimented with class sizes more than twice our recommendation. In recent years, the mean and median class sizes have dropped, with fewer extreme outliers. This trend towards smaller class sizes with fewer extremes is driven by workshop hosts and instructors, rather than being mandated by The Carpentries. Our instructors experimented with large class sizes and found that this was not an ideal classroom environment.
Our curriculum and lessons are designed to be hands-on, engaging, and interactive. This kind of environment is difficult to manage with larger classes. The downward trend in class sizes shows that our hosts and instructors appreciate the importance of maintaining this kind of environment.
The average class size over the past few years has been 23-24 learners. This may indicate that we should update our official recommendation, as our community is able to experiment and learn what works best in the field.
Actual class size for each workshop can be found in the Programmatic Assessment folder of The Carpentries Assessment repository on GitHub (https://github.com/carpentries/assessment/tree/master/programmatic-assessment).
This box plot shows the distribution of class sizes by year. The lower and upper edges of each box mark the first and third quartiles (the 25th and 75th percentiles), so each box contains the middle half of workshops. The lower and upper tails extend to the smallest and largest values that are not outliers. The center line of each box represents the median, and the center dot of each box represents the mean. Outliers are represented by dots beyond the tails of each plot.
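As a rough guide to reading the plot, the sketch below computes quartiles and flags outliers with the 1.5 × IQR rule. That rule is the default whisker setting in common plotting libraries such as matplotlib and seaborn, but it is an assumption here: the report does not state which rule its plots use. The function name and class sizes are hypothetical.

```python
from statistics import quantiles


def tukey_outliers(values: list[float], k: float = 1.5) -> tuple[list[float], list[float]]:
    """Return ([Q1, median, Q3], outliers) for a list of class sizes.

    Quartiles use linear interpolation (method="inclusive"), which matches
    NumPy's default percentile method. A value is flagged as an outlier if it
    lies more than k * IQR below Q1 or above Q3 (assumed 1.5 x IQR rule).
    """
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < low or v > high]
    return [q1, q2, q3], outliers


# Hypothetical class sizes for one year, including one very large workshop.
sizes = [12, 18, 20, 22, 25, 28, 30, 95]
print(tukey_outliers(sizes))  # ([19.5, 23.5, 28.5], [95])
```

Here the tails would reach 12 and 30, and the 95-learner workshop would be drawn as a separate dot.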
Data from 2018 is based on actual first quarter data, not a projection for the year.
Summary data can be found in Table 7 in the Appendix.
Over the last hundred years, researchers have discovered an enormous amount about how people learn and how best to teach them. Unfortunately, much of that knowledge has not yet been translated into common classroom practice, especially at the university level. To help close this gap, we offer an Instructor Training program.
This two-day class has the following overall goals:
Because we have only two days, some things are beyond the scope of this class. We do not teach:
This training is based on our constantly revised and updated curriculum (https://carpentries.github.io/instructor-training/).
When considering certification completion rates, analyses below exclude data from 2018 Q1. Trainees have 90 days to complete their certification requirements, so no one who attended instructor training in 2018 Q1 would be expected to have completed certification.
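This reasoning can be checked with date arithmetic. The sketch below assumes the 90-day window is measured from the training event's start date and that the data run through March 31, 2018 (the end of 2018 Q1); the constant and function names are hypothetical, and the start-date assumption is not stated in the report.

```python
from datetime import date, timedelta

# Trainees have 90 days to complete certification (assumed to count from the event start).
COMPLETION_WINDOW = timedelta(days=90)
DATA_CUTOFF = date(2018, 3, 31)  # end of 2018 Q1


def completion_deadline(training_start: date) -> date:
    """Latest date by which a trainee from this event could finish certifying."""
    return training_start + COMPLETION_WINDOW


def window_closed(training_start: date, as_of: date = DATA_CUTOFF) -> bool:
    """True if the full 90-day completion window has elapsed by `as_of`."""
    return completion_deadline(training_start) <= as_of


print(completion_deadline(date(2018, 1, 1)))  # 2018-04-01
print(window_closed(date(2018, 1, 1)))        # False
print(window_closed(date(2017, 12, 31)))      # True
```

Under these assumptions, even a training held on January 1, 2018 has a deadline after March 31, while a training on December 31, 2017 reaches its deadline exactly on March 31, 2018. Every 2017 event has therefore had its full window and no 2018 Q1 event has, which is consistent with excluding exactly 2018 Q1.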
For each of our instructor training events, we collected the following data:
variable | definition |
---|---|
slug | Unique identifier for each workshop. Takes the form YYYY-MM-DD-sitename. |
start | Start date of the workshop. Takes the form YYYY-MM-DD. |
country | The country in which the workshop was held. Online events are noted as "online" even if all participants were in one country. |
attendance | Number of trainees at the workshop. |
count_badged | Number of trainees awarded a Software Carpentry (SWC) or Data Carpentry (DC) badge. * |
pct_completion | Percent of trainees awarded a Software Carpentry (SWC) or Data Carpentry (DC) badge. * |
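Assuming `pct_completion` is simply `count_badged` divided by `attendance`, expressed as a percentage (the report does not state any rounding), a minimal sketch is below. The function mirrors the column name but is hypothetical.

```python
from typing import Optional


def pct_completion(count_badged: int, attendance: int) -> Optional[float]:
    """Percent of an event's trainees who went on to earn a badge.

    Returns None when attendance is zero or missing, since the rate is undefined.
    """
    if not attendance:
        return None
    return 100 * count_badged / attendance


# Hypothetical event: 12 of 20 trainees were badged.
print(pct_completion(count_badged=12, attendance=20))  # 60.0
```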
Until 2015, all Instructor Training events were run by one person and were exclusively online. Since starting the Trainers training program, we now have Trainers across the globe who can run online events across time zones, as well as in-person events at their home institutions or at other sites as needed.
Since 2012, The Carpentries has run 135 instructor training events. This includes in-person events in Australia, Canada, Netherlands, New Zealand, Norway, Poland, Puerto Rico**, South Africa, Switzerland, United Kingdom, and United States. It also includes 69 online events, allowing us to reach new instructors in many other countries.
* While we grant both Software Carpentry and Data Carpentry badges we do not distinguish between them for teaching eligibility or any other status within The Carpentries.
** While Puerto Rico is a United States Territory, it is separated out here for the purposes of demonstrating our global reach.
Table 8 in the Appendix lists all Instructor Training events, including total attendance and total badged from each event.
This bar chart shows the number of online and in-person training events run each year. Data for 2018 represents actual, not projected, data. Source data can be found in Table 9 in the Appendix. Table 10 in the Appendix shows the country in which each in-person event took place.
This box plot shows the distribution of certification completion rates by year. The lower and upper edges of each box mark the first and third quartiles (the 25th and 75th percentiles), so each box contains the middle half of training events. The lower and upper tails extend to the smallest and largest values that are not outliers. The center line of each box represents the median, and the center dot of each box represents the mean. Outliers are represented by dots beyond the tails of each plot.
Data from 2018 is excluded. As noted above, trainees from 2018 would not be expected to complete certification requirements within the first quarter. Summary data can be found in Table 11 in the Appendix.
We also looked at instructors' progress from completing instructor training and being badged to teaching their first workshop. All dates are expressed as the first of the month; exact dates are masked to preserve anonymity.
For each certified instructor, we collected the following data:
variable | definition |
---|---|
date_awarded | The first day of the month the badge was awarded. |
first_wkshp | The first day of the month this instructor taught their first workshop. |
days | A calculated field representing the difference between these two dates. |
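The report does not state the sign convention for `days`. The sketch below assumes `days` is `first_wkshp` minus `date_awarded`, so a negative value means the instructor taught before being badged, which matches the discussion of people teaching before certification. Function names and sample dates are hypothetical.

```python
from datetime import date


def days_badge_to_teaching(date_awarded: date, first_wkshp: date) -> int:
    """Days from badge award to first workshop taught (assumed sign convention).

    Negative values mean the instructor taught before being badged.
    """
    return (first_wkshp - date_awarded).days


def share_taught_before_badge(days: list[int]) -> float:
    """Fraction of instructors whose first workshop preceded their badge."""
    if not days:
        raise ValueError("need at least one instructor")
    return sum(1 for d in days if d < 0) / len(days)


# Hypothetical instructor: badged March 2017, first taught January 2017.
print(days_badge_to_teaching(date(2017, 3, 1), date(2017, 1, 1)))  # -59
print(share_taught_before_badge([-59, 30, -10, 120]))              # 0.5
```

Because both dates are masked to the first of the month, these differences are only accurate to roughly a month.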
The Carpentries requires that all centrally organized workshops be taught by certified Carpentries instructors. However, self-organized workshops may be taught by one certified instructor working with a peer or colleague as a co-instructor. Many of these co-instructors go on to complete our instructor training program and become certified instructors themselves, which is why we see some people teaching even years before their certification date.
The full data set can be found in the Programmatic Assessment folder of The Carpentries Assessment repository on GitHub (https://github.com/carpentries/assessment/tree/master/programmatic-assessment).
In early years, many people were teaching Carpentries workshops before being badged because we did not have a formal badging process in place, or the oversight to ensure that instructors completed training and were badged before teaching.
Consistently, nearly half of our instructors have been teaching before they were badged. This shows that many of our instructors are coming to us already connected to The Carpentries community, acting as co-instructors for self-organized workshops. This connection and experience motivates them to complete the instructor training program and continue teaching Carpentries workshops.
This box plot shows the distribution of days between badging and teaching by year. The lower and upper edges of each box mark the first and third quartiles (the 25th and 75th percentiles), so each box contains the middle half of instructors. The lower and upper tails extend to the smallest and largest values that are not outliers. The center line of each box represents the median, and the center dot of each box represents the mean. Outliers are represented by dots beyond the tails of each plot.
Summary data can be found in Table 12 in the Appendix.
The bar chart below shows how many badged instructors have never taught, taught one workshop only, 2-5 workshops, 6-10 workshops, 11-15 workshops, 16-20 workshops, and 21 or more workshops.
As of March 31, 2018, The Carpentries has 1480 badged instructors. Of those who have taught at least once, the majority have taught between 2 and 5 workshops. We also see 526 instructors (36%) who have never taught a workshop. This does not account for how long they have been instructors, so it may include people who were badged as recently as late 2017 and have not yet had an opportunity to teach. Nonetheless, it motivates The Carpentries to explore why these instructors have never taught, and what we can do to ensure they are supported in finding and creating opportunities to teach.
Source data can be found in Table 13 in the Appendix.
Until 2016, all Instructor Training events were run as online events by the Software Carpentry founder and former Executive Director. Knowing the limitations of having only one Instructor Trainer, in 2016, The Carpentries launched a training program for Instructor Trainers.
This allowed us to expand our reach by running several events a month, across time zones for online events. It also allowed us to build capacity at member organizations that have onsite Instructor Trainers. These Trainers run events for their sites, building a community of trained and certified instructors there. These trained and certified instructors also have onsite support to run workshops.
By bringing on new Trainers in Europe and more recently in Africa, we have a large community of Trainers who overlap time zones and connect with a wider audience. We've also expanded our geographic reach, allowing us to reach communities we might not otherwise connect with.
It is due to our growing Trainer community that we are able to run more events, reach people across wider geographies, and bring on new instructors. Another Trainers training event is planned for late 2018, and we anticipate the same growth rate in this community as we saw in 2017.
We currently have 58 Instructor Trainers total. Numbers for 2018 represent actual, not projected data. Another round of Trainers training is expected in late 2018, adding 10-15 new Trainers to our community. Source data can be found in Table 14 in the Appendix.
The map below shows how many Instructor Trainers we have in each country. Through directed efforts of Carpentries staff and community members, we've seen significant growth in our Trainers community in Africa. The Trainer count in South Africa is equal to that of countries such as Canada, Australia, or New Zealand, even though we have a much shorter history in South Africa. Source data can be found in Table 15 in the Appendix.
In looking at data representing workshops, learners, instructors, and trainers from 2012 to the present, we've seen meaningful growth in many areas. In 2015, Data Carpentry grew from Software Carpentry's roots. This was in recognition of the importance of data analysis skills, and specifically of the fact that people in different domains interact with data in different ways. Building on this experience, we look forward to exploring the integration of other Carpentries, such as Library Carpentry and HPC (High Performance Computing) Carpentry.
Along the way, we also grew our Instructor Training program from one to 58 Trainers spread across time zones and geographies. This allows us to train and certify many more instructors than was previously possible, engaging with new communities across the globe. At the same time, as we expand our global reach, we need to be sure we are building sustainable communities, so that a Carpentries presence can grow and thrive beyond single, isolated workshops.
In collecting and analyzing the data included in this report, we also have recognized gaps in our data. This is motivation for The Carpentries to work with staff and other community members to ensure we have clear systems for collecting, sharing, and maintaining data.
Future iterations of this report will also look at activity of other Carpentries communities. This includes Lesson Maintainers who ensure lessons are up to date with pedagogical best practices as well as current technologies. This also includes Mentors who ensure that new Trainees are supported in their journey to becoming Instructors and that new Instructors are equally supported as they begin teaching workshops.
Feedback on this report is welcome. This can include reactions or new questions raised by information shared in this report; suggestions for other analyses or visualizations; code review; or any other comments. Feedback can be shared via issues in this GitHub repo (https://github.com/carpentries/assessment) or via email to team@carpentries.org.
This table shows the number of Data Carpentry (DC) and Software Carpentry (SWC) workshops each year. Data for 2018 is a projection. The proportion of workshops in the first quarter of 2017 relative to the full year was applied to actual first quarter data from 2018.
This table shows the number of Data Carpentry (DC) and Software Carpentry (SWC) workshops in each country each year. For 2018, only actual data through March is represented, as most countries' data are too small to make meaningful predictions.
This table shows the number of Carpentries workshops each year for countries that have hosted more than 10 workshops since January 2012. It does not distinguish between Data Carpentry and Software Carpentry. For these countries, we have enough history to make projections for 2018. The proportion of workshops in the first quarter of 2017 relative to the full year was applied to actual first quarter data from 2018.
This table lists, for each year, the countries that held their first workshop that year.
This table lists every country that held a workshop that year.
This table shows the total number of learners at Data Carpentry (DC) and Software Carpentry (SWC) workshops each year. Numbers for 2018 are a projection.
In some cases, the hosts or instructors do not report back attendance data. From 2012 through 2018 Q1, 122 of 1323 workshops (about 9%) were missing attendance.
Workshops missing attendance are excluded from the analyses in this report. Because low attendance is a possible reason for not reporting attendance, replacing missing attendance data with the mean would likely overstate our numbers.
For each year from 2012 to 2018 this shows the following:
variable | definition |
---|---|
count | Number of workshops that year |
mean | Mean (average) attendance at each workshop |
std | Standard deviation |
min | Smallest class size |
25%, 50%, 75% | 1st, 2nd, and 3rd quartile class size |
max | Largest class size |
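These column names match the output of pandas' `describe()`. The standard-library sketch below reproduces the same statistics, assuming the sample standard deviation and linearly interpolated quartiles (the pandas defaults), and drops workshops with missing attendance first, as described above. The function name and data are hypothetical.

```python
import statistics
from typing import Optional


def describe(values: list[Optional[int]]) -> dict[str, float]:
    """Summarize class sizes with the columns count, mean, std, min, 25%, 50%, 75%, max.

    Workshops with missing attendance (None) are excluded, not imputed.
    Needs at least two reported values for std and quartiles.
    """
    reported = [v for v in values if v is not None]
    q1, q2, q3 = statistics.quantiles(reported, n=4, method="inclusive")
    return {
        "count": len(reported),
        "mean": statistics.mean(reported),
        "std": statistics.stdev(reported),  # sample standard deviation
        "min": min(reported),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(reported),
    }


# Hypothetical year with one workshop missing attendance.
summary = describe([20, None, 30, 10, 40])
print(summary["count"], summary["25%"], summary["75%"])  # 4 17.5 32.5
```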
This table lists all instructor training events The Carpentries has held since 2012. The "count_badged" column is a total of all individuals from that event with at least one badge. We are not distinguishing between Software Carpentry and Data Carpentry badges.
For reference, all training events through 2018 Q1 are listed here. However, the analyses in this report exclude data from 2018 Q1. Trainees have 90 days to complete their certification requirements, so no one who attended instructor training in 2018 Q1 would be expected to have completed certification.
This table shows the total number of online and in-person training events each year. Numbers for 2018 represent actual data, not a projection.
This table lists all Instructor Training events held each year. Events listed by country are in-person events. All online events are listed as online, even if all trainees came from the same country.
For each year from 2012 to 2017 this shows the following:
variable | definition |
---|---|
count | Number of training events that year |
mean | Mean (average) completion rates |
std | Standard deviation |
min | Smallest completion rate |
25%, 50%, 75% | 1st, 2nd, and 3rd quartile completion rate |
max | Largest completion rate |
For each year from 2012 to 2018 this shows the following:
variable | definition |
---|---|
count | Number of instructors receiving a badge that year who have also taught at least one workshop |
mean | Mean (average) number of days between receiving badge and first teaching experience |
std | Standard deviation |
min | Smallest number of days between badging and teaching |
25%, 50%, 75% | 1st, 2nd, and 3rd quartile number of days between badging and teaching |
max | Largest number of days between badging and teaching |
The table below shows how many badged instructors have never taught, taught 1 workshop, 2-5 workshops, 6-10 workshops, 11-15 workshops, 16-20 workshops, and 21 or more workshops. The left side of each bin is exclusive; the right side is inclusive.
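Bins that are open on the left and closed on the right follow the same convention as the default `right=True` in `pandas.cut`. A standard-library sketch using `bisect`; the edges and labels restate the bins above, and the function names are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

# Upper (inclusive) edge of each bin; anything above the last edge is "21+".
BIN_EDGES = [0, 1, 5, 10, 15, 20]
BIN_LABELS = ["0", "1", "2-5", "6-10", "11-15", "16-20", "21+"]


def teaching_bin(n_workshops: int) -> str:
    """Place a workshop count in a bin that excludes its left edge and includes its right."""
    return BIN_LABELS[bisect_left(BIN_EDGES, n_workshops)]


def bin_counts(workshops_taught: list[int]) -> Counter:
    """Count instructors per bin from each instructor's number of workshops taught."""
    return Counter(teaching_bin(n) for n in workshops_taught)


print([teaching_bin(n) for n in (0, 1, 2, 5, 6, 20, 21)])
# ['0', '1', '2-5', '2-5', '6-10', '16-20', '21+']
```

`bisect_left` returns the first edge greater than or equal to the count, which is exactly what makes each right edge inclusive.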
The table below shows how many new Instructor Trainers joined each year. Numbers for 2018 are actual data, not a projection.
The table below lists how many Instructor Trainers we have in total in each country.