Skip to main content
Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Removing PII, PHI, PCII, and sensitive data

Git is a version control system that will save the entire history of changes to files. When PII, PHI, PCII, or sensitive data is committed you will need to follow the steps below to scrub it from the git database.

Note: Official documentation is located here.

Removing PII, PHI, and PCII

Git is a version control system that stores the history of all changes in a repository. If PII/PHI/PCII is checked in to a repository, it cannot simply be deleted since a record of the changes will still reside in the .git folder for every time that the file was changed.

To fix this we will use a 3rd party open source tool called git filter-repo. Git has an inbuilt facility to fix these, filter-branch, but it is not performant and has some undesired side effects so to the point that the git maintainers suggest that it not be used.

Prerequisites

  • Install git filter-repo
  • Determine if the file(s) are tracked by Git LFS
    • cat .gitattributes
    • If the file or file extension is in .gitattributes then it is tracked by Git LFS
    • Gather a list of the Git object id’s for each file you are removing
      • git ls-files -s path/to/file
      • This will return “[6 Digit permission] [41 Digit OID] [size of the file] [file name]” you only need the 41 digit OID

How to remove PII, PHI, and PCII

  • git clone --mirror <repo> temp_name
  • cd temp_name
  • git filter-repo --path path/to/pii
  • git push --mirror https://url/to/repo
  • Contact GitHub Support, asking them to remove cached views and references to the sensitive data in pull requests on GitHub.
    • If the file(s) are tracked in LFS provide the OID’s to customer support and ask them to remove them as well
  • Finally make sure all users of the repository rebase branches instead of merging them. If they do not do this the old tainted history will get re-added
    See the official docs for more details