OxRSE Training for Schmidt AI in Science Fellows

November 30, 2023 in training · 3 minutes

The Eric and Wendy Schmidt AI in Science Postdoctoral Fellowship is a new fellowship programme at Oxford University, funded by Schmidt Futures, and supported by the OxRSE group as an in-kind contribution from the University of Oxford. The programme aims to support the development of new AI methods for scientific research, and to train a new generation of researchers who are skilled in both AI and scientific research. The first cohort of 10 fellows was appointed in 2023, and OxRSE provided software engineering training for the new fellows.

Training

The training programme was designed to give the fellows the skills they need to develop and maintain software for their research projects, and so maximise the impact of their research. The training was delivered as a week-long workshop combining lectures and practical exercises. The workshop covered the following topics, all using the Python programming language:

  • Software Architecture and Design
    • Procedural Programming
    • Object-Oriented Programming
    • Functional Programming
  • Software Testing with Pytest
  • Collaborative Software Development with Git and GitHub
  • Continuous Integration with GitHub Actions
  • Containerisation with Docker
  • Packaging Python libraries and publishing them on PyPI
  • Introduction to HPC and the ARC cluster
  • Workflows with Snakemake

The workshop was delivered by members of the OxRSE group, in particular Martin Robinson, Fergus Cooper, and John Brittain.

Training Materials

All the training materials are hosted on our training website, which has been developed by the OxRSE group in collaboration with the UNIVERSE-HPC project. As part of the UNIVERSE-HPC project, the OxRSE group have been helping to collate and develop training materials for research software engineering, putting them into a form that is easy to maintain and reuse across different training partners and events, and rendering them in a form that is easy for learners to use. The materials are stored in Markdown format in a GitHub repository and made freely available under a Creative Commons license. The website we used to deliver the training is a Next.js application that hosts individual training events. Students can log in, enrol on events and mark lessons as complete, which allows instructors to visualise student progress. The website source code is available and freely reusable under an MIT license.

Group Software Projects

As part of the training, the fellows were split into 3 groups of 2-5 people and asked to develop a small software project together over the course of a month. The goal of the project was to give the fellows experience of working together on a software project, and to give them a chance to put into practice the skills they had learned during the training. The fellows were given some ideas for potential projects, but were free to choose their own project based on their interests. The three projects were:

  • turbo-parakeet. A Python library and pipeline for cleaning, processing and clustering time-series data.
  • arxread. A Python library for searching, downloading and displaying papers from the arXiv.
  • paper-plumber. A Python package and CLI application that employs large language models to scan, understand, and extract key data points from scientific papers, streamlining research and data analysis. Built on top of the findpapers package.

Future Plans

With the generous support of Schmidt Futures, we are planning to expand the software engineering training in 2024 to include 40 Schmidt fellows from other universities. This will take the form of remote training events paired with an in-person workshop in Oxford, where 25 fellows will work on group software projects suggested by fellows and faculty across the 9 universities involved in the programme. OxRSE will also continue to provide the local Oxford fellows with software engineering training and support for their research projects.