Removing assets from a repo

For most Statamic projects I work on, I leave the site’s images in the Git repository. This is simple, centralised and allows GitHub to be a full and effective backup. However, this won’t suit a site with hundreds or thousands of images.

Below is an approach I’ve used on a few projects to reduce the size of a repo – either at the beginning, or retrospectively.

The approach involves:

  • keeping image assets on disk locally, but gitignored
  • using DigitalOcean Spaces for backup and inter-site syncing
  • using s3cmd to move files between server and Spaces
  • using a lifecycle.xml policy to auto-expire date-stamped backups
  • using git-filter-repo to rewrite history (if required)

Last updated: July 2024

Caveats and disclaimers

  • YMMV
  • Assets in a Statamic site consist of:
    • The asset files themselves (eg: example.jpg)
    • Asset metadata (eg: width, height, alt text, credits, etc.)
    • Cached versions of the asset files created with Glide (we aren’t backing these up)
  • Losing or changing the location of images in a Statamic project will break the content.
  • I use Spaces as I use Droplets so it makes sense to keep them on the same provider.
    • This approach might well work with AWS S3, but I don’t know the exact steps.
    • You could even connect your site directly to a remote filesystem as Laravel accommodates that (see Statamic Assets Drivers)
  • I use Laravel Forge to provision servers, and manage them with things like cronjobs.
  • My projects are in the UK so I make some choices based on that (ie: data centre region)
  • I’ll use example as shorthand for a client project’s name (ie: example-prod-01)
  • I’ll use main as the name for the Statamic asset container
  • Architecturally, Spaces are “flat”: like S3, they have no real directory structure. Although they appear to have folders (DO even has a “create a folder” button), they are not hierarchical. A “folder” is just a key prefix that contains some / characters.

Create a DigitalOcean Space

  1. Log in to, or create, a DigitalOcean account or team
    - $200 credit for first 60 days with this referral link: m.do.co/c/c677cf2cc36b
  2. Create a Space
    1. datacenter region: AMS3 (Amsterdam)
    2. “Enable CDN”: your choice, but I don’t use this feature.
    3. Spaces bucket name: example-assets-main-01

Install & configure s3cmd

  • This is a command line tool for syncing with DigitalOcean Spaces or Amazon S3.
  • Write this setup up in your project’s docs so that anyone else working on the site can reproduce it.
  • See this Gist
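
As a rough illustration of what syncing with s3cmd can look like, here is a minimal sketch. It assumes credentials are already stored via `s3cmd --configure`, that the Space lives in AMS3, and that assets are kept under a `main/` prefix in the bucket. The function names (`spaces`, `push_assets`, `pull_assets`) and the `main/` prefix are my own placeholders, not taken from the Gist.

```shell
set -e

# Assumed values: adjust to your own Space, region and container path.
BUCKET="s3://example-assets-main-01"
LOCAL_DIR="public/main/"
ENDPOINT="ams3.digitaloceanspaces.com"

# Wrapper that points s3cmd at the Spaces endpoint instead of AWS.
# Access keys are expected to come from ~/.s3cfg (written by `s3cmd --configure`).
spaces() {
  s3cmd --host="$ENDPOINT" --host-bucket="%(bucket)s.$ENDPOINT" "$@"
}

# Server -> Space. Extra flags (eg: --dry-run) are passed straight through.
push_assets() {
  spaces sync "$@" "$LOCAL_DIR" "$BUCKET/main/"
}

# Space -> server (eg: pulling production images down to a staging site).
pull_assets() {
  spaces sync "$@" "$BUCKET/main/" "$LOCAL_DIR"
}
```

Running `push_assets --dry-run` first shows what would be transferred without changing anything. The trailing slashes matter: they tell s3cmd to sync the contents of the directory rather than the directory itself.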

Expiry Lifecycle

  • A backup script creates an isolated, date-stamped copy of the assets as some protection against a ‘bad sync’.
  • Each copy goes into its own top-level folder, named with a date-stamp.
  • A lifecycle policy auto-deletes objects whose key matches a pattern (eg: starts with `backup-`)
  • See this Gist
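
To make the idea concrete, here is a minimal sketch of a date-stamped backup plus an expiry policy. It assumes s3cmd is already configured for the Space; the 30-day retention, the rule ID and the function names are illustrative choices of mine, not values from the Gist.

```shell
set -e

# Assumed values: adjust to your own Space and container path.
BUCKET="s3://example-assets-main-01"
LOCAL_DIR="public/main/"

# Lifecycle policy: expire every object whose key starts with "backup-".
# The 30-day retention is an arbitrary example.
cat > lifecycle.xml <<'EOF'
<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Rule>
    <ID>expire-dated-backups</ID>
    <Prefix>backup-</Prefix>
    <Status>Enabled</Status>
    <Expiration>
      <Days>30</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>
EOF

# Today's top-level "folder" name, eg: backup-2024-07-01
backup_prefix() {
  printf 'backup-%s\n' "$(date +%Y-%m-%d)"
}

# Copy the local assets into today's date-stamped prefix.
run_backup() {
  s3cmd sync "$LOCAL_DIR" "$BUCKET/$(backup_prefix)/"
}

# Upload the policy to the Space (only needed once, or after editing the file).
apply_lifecycle() {
  s3cmd setlifecycle lifecycle.xml "$BUCKET"
}
```

`apply_lifecycle` is a one-off, and `s3cmd getlifecycle` on the bucket should confirm the policy took. `run_backup` is the part that could go in a scheduled job (eg: a Forge cronjob).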

Remove from Git

  • We now need to tell Git to stop tracking the assets directory
  • Add the asset container’s path to .gitignore
    • For a container called main, this would be /public/main
  • Remove the directory from Git’s index without deleting the files on disk:
    • git rm -r --cached public/main
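
The steps above can be sketched as one small helper. The function name `untrack_assets` and the commit message are placeholders of mine; the Git commands themselves are the ones listed above.

```shell
set -e

# Stop tracking an asset directory while leaving its files on disk.
# $1 is the directory relative to the repo root, eg: public/main
untrack_assets() {
  dir="$1"
  # Ignore the directory from now on; the leading slash anchors it to the repo root.
  grep -qxF "/$dir" .gitignore 2>/dev/null || printf '/%s\n' "$dir" >> .gitignore
  # Remove the files from the index only; --cached keeps the working copies.
  git rm -r --cached --quiet "$dir"
  git add .gitignore
  git commit --quiet -m "Stop tracking $dir"
}
```

From the repo root this would be run as `untrack_assets public/main`. Afterwards `git ls-files public/main` should print nothing, while the images are still present on disk.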

Once this is done, Git no longer tracks the images.

However, the images are still in the .git/ directory and repo as objects within its history. The next step with Git Filter Repo resolves this.

Git-Filter-Repo

  • This is a command line utility recommended by GitHub to remove large or sensitive files
  • By its nature it rewrites Git history: every affected commit gets a new hash, so existing clones no longer match the remote (teams beware!)
  • See this Gist
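
For orientation, here is a minimal sketch of purging the asset directory from history. It assumes git-filter-repo is installed and that you are working in a fresh clone, which the tool expects by default. The function names and the remote URL in the usage note are hypothetical.

```shell
set -e

# Rewrite history so that the given path never existed in any commit.
# --invert-paths means "keep everything EXCEPT the paths given with --path".
purge_from_history() {
  target="$1"
  git filter-repo --path "$target" --invert-paths
}

# git-filter-repo removes the "origin" remote as a safety measure,
# so it has to be re-added before the rewritten history can be pushed.
republish() {
  remote_url="$1"
  git remote add origin "$remote_url"
  git push --force --all origin
  git push --force --tags origin
}
```

Usage would be along the lines of `purge_from_history public/main` followed by `republish git@github.com:example/example.git`. Comparing `git count-objects -vH` before and after gives a rough idea of how much the repo has shrunk. Everyone else on the project will need to re-clone, since their existing history no longer matches.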