
Pointers: Open Datasets for (Data Science) Education

This website is the portal for accessing and submitting to Pointers, an open library of datasets and supporting material for teaching data science concepts in a workshop setting.

The project is a collaboration between the Academic Data Science Alliance (ADSA) and The Carpentries.

Why Does the Collection Exist?

Many repositories already exist for open data, and we do not intend to replace any of these.

The purpose of this collection is specifically to gather datasets that are intended for use in an educational setting. Where most data repositories prefer authentic, raw data that has not undergone any additional processing, we recognise that raw data (and real data!) is often not the easiest to use as an example when teaching.

Raw data tends to include noise and other artifacts that can distract from the skills and concepts a teacher wishes to focus on. Raw data is often too large to be useful in most teaching contexts, where lesson time might be wasted on downloading and processing very large volumes of data. Raw data can be too complex for educators and learners to quickly and easily get to grips with, increasing the time it takes to prepare a lesson, or distracting from the main focus of the lesson content.

Datasets intended for use in teaching may be derived from a larger raw dataset, e.g. by subsampling it or by including only a selection of the columns in the full version of the data. They may even be entirely or partly artificial, e.g. with noise intentionally added or with identifying information replaced by fake personal information for fictional people.
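
As a rough illustration of the kinds of preparation described above, the following Python sketch derives a small teaching dataset from a larger one by subsampling rows, keeping a selection of columns, replacing an identifying column with fictional labels, and optionally adding noise. It is not part of the Pointers project: the function name `make_teaching_dataset`, its parameters, and the example column names are all hypothetical, and it uses only the Python standard library.

```python
import random


def make_teaching_dataset(rows, keep_columns, sample_size,
                          id_column=None, noise_column=None,
                          noise_sd=1.0, seed=0):
    """Derive a small teaching dataset from a list of dict records.

    Hypothetical helper for illustration only.
    """
    # A fixed seed means every learner gets the same derived data.
    rng = random.Random(seed)
    # Subsample without replacement; never ask for more rows than exist.
    sampled = rng.sample(rows, min(sample_size, len(rows)))
    derived = []
    for i, row in enumerate(sampled, start=1):
        # Keep only the selected columns.
        new_row = {col: row[col] for col in keep_columns}
        # Replace identifying information with an obviously fictional label.
        if id_column is not None and id_column in new_row:
            new_row[id_column] = f"Participant {i:03d}"
        # Intentionally add Gaussian noise to one numeric column.
        if noise_column is not None and noise_column in new_row:
            new_row[noise_column] = row[noise_column] + rng.gauss(0, noise_sd)
        derived.append(new_row)
    return derived


# Made-up "raw" records standing in for a much larger dataset.
raw = [
    {"name": f"Person {n}", "age": 20 + n % 50,
     "height_cm": 150.0 + n % 40, "postcode": "X"}
    for n in range(100)
]

teaching = make_teaching_dataset(
    raw, keep_columns=["name", "age", "height_cm"], sample_size=10,
    id_column="name", noise_column="height_cm", seed=42,
)
```

Whatever steps are applied, recording them alongside the dataset helps educators understand how the teaching version differs from the original.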

A dataset for education is also likely to include considerably more supporting information (metadata and other materials) to describe the data and facilitate its reuse.

This project was established to collect datasets that have been specifically created or prepared for use in an educational context.

Code of Conduct

All members of the project and those submitting datasets are required to abide by The Carpentries Code of Conduct.