The RAP community of practice includes all analysts and data scientists who are interested in adopting the working practices included in reproducible analytical pipelines (RAP) at NHS Digital.

Overview

Warning - this repository is a snapshot of a repository internal to NHS Digital. This means that links to videos and some URLs may not work.

Repository owner: NHS Digital Analytical Services

Email: [email protected]

To contact us, raise an issue on GitHub or send us an email and we will respond promptly.

RAP community of practice

Welcome to the landing page for the RAP community of practice repo.

You can learn all about reproducible analytical pipelines (RAP) on our what is RAP page. In a nutshell though, RAP is becoming the standard for publishing analytical outputs in government. RAP combines a number of ways of working that help to improve the reliability, transparency, and speed of statistics publications. Reproducible analytical pipelines follow the principles of the Aqua Book guidelines, which revolve around analysis being reproducible, auditable, transparent, and quality assured.

The RAP community of practice includes all analysts and data scientists who are interested in adopting the working practices included in reproducible analytical pipelines (RAP). This repo is a central repository for resources and guidance to help teams adopt RAP practices. There is an associated [MS Teams page] where you can introduce yourself, ask for help, or discuss different approaches. Over time we hope to build up a community of people who can self-support and further develop these ways of working.

The community of practice aims to support teams in adopting RAP practices through:

  1. Offering in-person support as teams establish new working practices
  2. Producing learning materials that offer reusable templates adapted for the NHSD analytical environment

This work is prompted by the observation that teams can struggle to adopt RAP practices without direct support. While no one element of RAP is particularly difficult, learning several new skills at the same time as delivering business as usual (BAU) work is challenging. Teams can struggle to find the protected time to embed these practices. See the Statistics Authority report on the barriers to RAP adoption for more information. Luckily, in NHSD we have strong senior support for RAP and many teams have already begun to adopt many of the practices included in RAP. Consequently, we already have a large pool of skilled, enthusiastic analysts who are willing to help others. These resources also aim to support the goals laid out in the Goldacre report Bringing NHS data analysis into the 21st century and to align with Tim Berners-Lee's Five star data principles.

Support and training

If your team is embarking upon a RAP journey, you should look at our what is RAP page and try to complete the self-assessment. From there, we recommend reaching out for some in-person support. The RAP Champion Function (within the Data Science Skilled Team) can offer support in many forms:

  • Reviewing your RAP work and assessing your progress against the levels of RAP
  • Peer review of code
  • Workshops for a specific RAP capability
  • Consultancy style engagement where we plan a migration strategy
  • Pair coding
  • Shadowing another team

If you want to talk about any of this then please reach out on the [RAP community of practice MS Teams] page (internal to NHSD).

We maintain a list of people who are willing to dedicate some time to support others. Please add your name to the mix if you are willing to support someone else. You don't need to be an expert - just willing to share what you know.

Tutorials and resources

As we work alongside teams, we try to produce reusable learning materials aimed specifically at NHSD teams. We try (with partial success) to avoid reproducing guidance that is easily available online; instead, we link to external resources where you can self-serve. Our own focus is on bespoke guidance that lays out how you would accomplish these practices in the NHSD setting.

Here are some of the initial resources:

These resources are demand-driven so if you want something then please ask on the [MS Teams page]. We would also ask you to contribute if you can improve on any of the resources or can fill in any other gaps.

The resources are not intended to be prescriptive. There are many ways to accomplish a task and teams have valid reasons for choosing other approaches. Instead the intention of the resources provided here is to offer a way in for teams who want to adopt good practices that they have heard about but don't know where to start.

Misc

We have taken inspiration from the NHSD software engineering COP. It has tons of great material, so we encourage you to read and reflect on these working practices.

Licence

RAP Community of Practice codebase is released under the MIT License.

The documentation is © Crown copyright and available under the terms of the Open Government Licence v3.0.

Comments
  • Dead link

    opened by abbieprescott 4
  • dependency management "not possible in DAE"

    In Levels of RAP it says: Does your repo include dependency management? (i.e. requirements.txt or conda environment for RDS users. Not possible in DAE)

    It's not strictly true that this cannot be described for DAE - though it is more limited. One can describe the cluster used (runtime, libraries etc).

    opened by SamHollings 2
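
As a hedged illustration of the point raised in the issue above, an environment can be described even where a full requirements.txt workflow is not available. The sketch below records the Python version and the installed versions of a few named packages using only the standard library. The function names `describe_environment` and `as_requirements` are hypothetical, and nothing here is specific to DAE or confirmed as the approach used at NHSD.

```python
import platform
from importlib import metadata


def describe_environment(packages):
    """Return the Python version plus the installed version of each named package."""
    description = {"python": platform.python_version()}
    for name in packages:
        try:
            description[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the gap rather than failing, so the description stays complete.
            description[name] = "not installed"
    return description


def as_requirements(description):
    """Turn a description into requirements.txt-style lines (name==version)."""
    lines = []
    for name, version in description.items():
        if name == "python" or version == "not installed":
            continue
        lines.append(f"{name}=={version}")
    return lines
```

Calling `describe_environment(["pandas", "pyspark"])` in a notebook and saving the result alongside the code is one way to keep a record of the runtime that produced a set of outputs.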
  • RAP Publishing Checks - Clarify what are credentials and secrets

    We've had some feedback that the part of the publishing checks that says "no credentials or secrets" is not clear, as analysts have not seen these terms before.

    The following text might make things easier to understand:

    Credentials or secrets are essentially passwords that computers use for encrypted communication or access to services. For example, with many APIs (like the Google Maps API) you must supply a credential code to access the service. Often times these codes look like long strange combinations of letters and numbers (l79sDgH9s...). We must not share our passwords publicly, so you should not commit credentials and secrets.

    opened by goodyguts 2
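
To make the "no credentials or secrets" check in the issue above more concrete, here is a minimal sketch of one common way to keep a secret out of a repository: read it from an environment variable at run time instead of writing it into the code. The function name `get_api_key` and the variable name `EXAMPLE_API_KEY` are hypothetical and chosen only for illustration.

```python
import os


def get_api_key(name="EXAMPLE_API_KEY"):
    """Return a secret read from an environment variable, never from source code."""
    value = os.environ.get(name)
    if not value:
        # Fail loudly so a missing secret is noticed before any request is made.
        raise RuntimeError(
            f"Environment variable {name} is not set. "
            "Set it on your machine rather than writing the value into the repository."
        )
    return value
```

Because the value lives outside the codebase, the code can be committed and published without exposing the credential itself.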
  • Environment and dependency management - needs to be clearer

    In the "levels of RAP", people become confused by environment and dependency management. We need to link to a page which very clearly describes these, what the point of them is, and how people can know if they're meeting this requirement.

    opened by SamHollings 2
  • Pyspark guidance

    I'm not a fan of referring to it as a "flavour of python" (about PySpark page)

    I think PySpark should be contained underneath Python.

    I also think it should make clear that distribution of processing only occurs if it's set up right - Spark on a normal laptop will not be any more powerful than, say, pandas. On a big cluster in Databricks it is a different story.

    I think this page might also need a reference to other Python data structures - and how there is a right tool for the right job.

    duplicate 
    opened by SamHollings 1
  • Split out Terminal guidance from "git" guidance.

    The terminal guidance is contained within the git guidance - but the terminal is a separate tool which can be used for many purposes - probably better to have it as its own level alongside Python, git etc, and then for these pages to be referenced by the other technologies.

    opened by SamHollings 1
  • code in the open - topics and add to data-analytics-services

    On the "how to publish your code in the open page" - we should tell people they should add their publication to the page: https://github.com/NHSDigital/data-analytics-services and also that they should set appropriate topics for their publication, i.e. nhs-digital-publication

    opened by SamHollings 1
  • Signpost resources to ensure accessibility requirements are met

    This is most relevant for any outputs produced. See guidance.

    As a starting point, the python visualisation guide should include tips on how to make visualisations more accessible:

    • The Home Office has some posters on accessible design
    • There are also countless online resources on accessibility relating to colour-blindness, visual impairments etc.

    We should also consider including a note on accessibility in the design of RAP. A pipeline would be difficult to reproduce if a user could not access any part of the pipeline. This includes README files, as well as output types.

    opened by harrietrs 1
  • Environment management external links

    We should do more to explain how environment management plays into reproducibility.

    This page is quite useful and would save us duplicating: https://realpython.com/python-virtual-environments-a-primer/

    opened by connor1q 1
  • Broken link

    https://github.com/NHSDigital/rap-community-of-practice/blob/main/python/project-structure-and-packaging.md#generic-package-template

    There is a broken link to the generic package template in the section above

    opened by connor1q 1
  • Contributions section

    We're keen to encourage external improvements to these resources but we don't yet have a contributions section that explains how we will review and moderate.

    opened by connor1q 1
  • Code review page ideas

    We have recently been doing some code reviewing. Here are a few things that we think might make the page more helpful.

    Code review before merge request

    "Code should be reviewed with someone before submitting a merge request. The reviewer should consider whether the code needs to be refactored or redesigned."

    I'm not sure that I always agree with this. Merge requests make it really easy to leave comments on different parts of the code, and in some ways make the life of the reviewer and the merge request submitter easier. Maybe rephrase as:

    "You don't have to save reviewing your code until the end. You can do small reviews and also pair programming while developing the ticket. Seeking feedback sooner could mean you save time because you do not have to change as much when the final review happens later."

    Different types of code review

    There are different types of code review that you can get. It may be worth highlighting them.

    1. Merge request code review

      A standard review process that checks whether changes to the codebase are acceptable. You focus only on the code that has changed. It should be relatively quick, and very regular (one every time you implement a new feature). Normally done by a member of the team.

    2. Full code review

      A code review where someone looks at all your code together, and gives you overall feedback. This review allows someone to look at the bigger picture, rather than one individual feature. These reviews take longer, and are less regular. Normally done by members outside your team, so that it is a fresh pair of eyes.

    3. Fitness to publish checks

      A code review to check the code is okay to publish. Note that, in the code review, you will normally limit yourself to making suggestions that you want completed before the code is published. This may mean you avoid suggesting big changes to the code, and instead focus in on checks like ensuring documentation is well written, or removing passwords from the code.

    Maybe split code review checklist into beginner and advanced items?

    One of the items on the code review checklist is:

    "Documentation is hosted for easy access. GitHub Pages and Read the Docs provide a free service for hosting documentation publicly."

    Even with advanced teams in data services I do not see them doing this. It might be worth prioritizing, so that the checklist is less overwhelming.

    Maybe organise the checklist items by the RAP level the team is aiming for.

    on jira workplan 
    opened by goodyguts 2
  • 03_quality-assuring-analytical-ouputs page not clearly linked with levels of RAP

    The AQUA page (https://github.com/NHSDigital/rap-community-of-practice/blob/main/implementing_RAP/general_guidance/quality-assuring-analytical-ouputs.md) is not clearly associated with the levels of RAP and so people can find it a bit confusing when and how they should be following it.

    We need to link it more clearly into people's workflow when planning out RAP (some of it is beyond RAP and more general guidance on managing analytical work), and perhaps reduce duplication by removing those bits already covered by the "levels of RAP" - and making these clear.

    on jira workplan 
    opened by SamHollings 1
  • Clean code guidance

    Some teams want to use clean code - we need guidance on the best way to approach this for analytical code, why you would want to do it, and what to watch out for.

    on jira workplan 
    opened by SamHollings 2
Releases (v1.1.0)
  • v1.1.0 (Dec 21, 2022)

    What's Changed

    Automatic Release Notes

    • Release v1.1.0 by @xiyaozhuang in https://github.com/NHSDigital/rap-community-of-practice/pull/35

    New Contributors

    • @xiyaozhuang made their first contribution in https://github.com/NHSDigital/rap-community-of-practice/pull/35

    Full Changelog: https://github.com/NHSDigital/rap-community-of-practice/compare/v1.0.0...v1.1.0

  • v1.0.0 (Dec 6, 2022)

    What Changed

    Automatic release notes

    • Hr 1188 r git by @helrich in https://github.com/NHSDigital/rap-community-of-practice/pull/2
    • Add Intro to R link by @helrich in https://github.com/NHSDigital/rap-community-of-practice/pull/3
    • Improving layout and expanding rollout section by @connor1q in https://github.com/NHSDigital/rap-community-of-practice/pull/4
    • Cq updates by @connor1q in https://github.com/NHSDigital/rap-community-of-practice/pull/5
    • Hr changes by @helrich in https://github.com/NHSDigital/rap-community-of-practice/pull/9
    • Hr updates to git by @helrich in https://github.com/NHSDigital/rap-community-of-practice/pull/10
    • Update publishing code in the open by @harrietrs in https://github.com/NHSDigital/rap-community-of-practice/pull/20
    • Sh new front page by @SamHollings in https://github.com/NHSDigital/rap-community-of-practice/pull/22
    • Restructure and edit files by @abbieprescott in https://github.com/NHSDigital/rap-community-of-practice/pull/23
    • Create gh-pages version by @harrietrs in https://github.com/NHSDigital/rap-community-of-practice/pull/31
    • add two new guides and pr prep by @helrich in https://github.com/NHSDigital/rap-community-of-practice/pull/32
    • Publishes when to stop coding guide by @josephwilson8-nhs in https://github.com/NHSDigital/rap-community-of-practice/pull/33
    • Added new improved guides on virtual environments by @xiyaozhuang in https://github.com/NHSDigital/rap-community-of-practice/pull/34

    New Contributors

    • @helrich made their first contribution in https://github.com/NHSDigital/rap-community-of-practice/pull/2
    • @connor1q made their first contribution in https://github.com/NHSDigital/rap-community-of-practice/pull/4
    • @harrietrs made their first contribution in https://github.com/NHSDigital/rap-community-of-practice/pull/20
    • @SamHollings made their first contribution in https://github.com/NHSDigital/rap-community-of-practice/pull/22
    • @abbieprescott made their first contribution in https://github.com/NHSDigital/rap-community-of-practice/pull/23
    • @josephwilson8-nhs made their first contribution in https://github.com/NHSDigital/rap-community-of-practice/pull/33
    • @xiyaozhuang made their first contribution in https://github.com/NHSDigital/rap-community-of-practice/pull/34

    Full Changelog: https://github.com/NHSDigital/rap-community-of-practice/commits/v1.0.0
