Removing assets from a repo
https://wesort.co.uk/blog/writing/removing-assets-from-a-repo
For most Statamic projects I work on, I leave the site’s images in the Git repository. This is simple, centralised and allows GitHub to be a full and effective backup. However, this won’t suit a site with hundreds or thousands of images.
Below is an approach I’ve used on a few projects to reduce the size of a repo – either at the beginning, or retrospectively.
The approach involves:
- have image assets locally, but `gitignore`'d
- use DigitalOcean Spaces for backup and inter-site syncing
- use s3cmd to move files between server and Spaces
- use a `lifecycle.xml` policy to auto-expire date-stamped backups
- use git-filter-repo to rewrite history (if required)
Last updated: July 2024
Caveats and disclaimers
- YMMV
- Assets in a Statamic site consist of:
  - The asset files themselves (eg: `example.jpg`)
  - Asset meta data (eg: width, height, alt text, credits, etc.)
  - Cached versions of the asset files created with Glide (we aren’t backing these up)
- Losing or changing the location of images in a Statamic project will break the content.
- I use Spaces as I use Droplets so it makes sense to keep them on the same provider.
- This approach might well work with AWS S3, but I don’t know the exact steps.
- You could even connect your site directly to a remote filesystem as Laravel accommodates that (see Statamic Assets Drivers)
- I use Laravel Forge to provision servers, and manage them with things like cronjobs.
- My projects are in the UK so I make some choices based on that (ie: data centre region)
- I’ll use `example` as shorthand for a client project’s name (ie: `example-prod-01`)
- I’ll use `main` as the name for the Statamic asset container
- Architecturally, Spaces are “flat” in that they do not have directory structures (as per S3). Although they appear to have folders (DO even has a “create a folder” button), they are not hierarchical. A folder is actually a prefix that has some `/`s in it.
Create a DigitalOcean Space
- Log in to, or create, a DigitalOcean account or team
  - $200 credit for first 60 days with this referral link: m.do.co/c/c677cf2cc36b
- Create a Space
  - datacenter region: AMS3 (Amsterdam)
  - “Enable CDN”: your choice, but I don’t use this feature.
  - Spaces bucket name: `example-assets-main-01`
Install & configure s3cmd
- This is a command line tool for syncing with DigitalOcean Spaces or Amazon S3.
- You’ll want to include this approach in your docs in some way.
- See this Gist
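The Gist itself isn't reproduced here, so the following is only a minimal sketch of what a sync wrapper could look like. It assumes s3cmd has already been set up with `s3cmd --configure` (for an AMS3 Space the endpoint would be `ams3.digitaloceanspaces.com`), and the function names `push_assets` and `pull_assets` are hypothetical, not taken from the original script.

```shell
#!/bin/sh
# Assumed values: adjust to your own project.
BUCKET="example-assets-main-01"   # the Space created above
PREFIX="main"                     # "folder" (really a key prefix) inside the Space
LOCAL_DIR="public/main"           # Statamic asset container on disk

# Push local assets up to the Space.
# Extra arguments (eg: --dry-run) are passed straight through to s3cmd.
push_assets() {
  s3cmd sync "$LOCAL_DIR/" "s3://$BUCKET/$PREFIX/" "$@"
}

# Pull assets down from the Space (eg: onto a fresh server or a dev machine).
pull_assets() {
  mkdir -p "$LOCAL_DIR"
  s3cmd sync "s3://$BUCKET/$PREFIX/" "$LOCAL_DIR/" "$@"
}
```

Running `push_assets --dry-run` first lists what would be transferred without changing anything, which is a cheap way to catch a wrong path before a real sync.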
Expiry Lifecycle
- This creates an isolated, timestamped backup as some protection against a ‘bad sync’.
- The script creates a top-level folder with a date-stamp
- These backups are auto-deleted if they match the pattern (eg: start with `backup-`)
- See this Gist
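As a rough illustration of the two pieces described above (a date-stamped backup and an expiry rule keyed on the `backup-` prefix), here is a sketch under stated assumptions. The 30-day retention, the rule ID and all function names are made up for the example; the author's actual script lives in the Gist.

```shell
#!/bin/sh
BUCKET="example-assets-main-01"   # assumed bucket name
LOCAL_DIR="public/main"           # assumed asset container path
EXPIRY_DAYS=30                    # assumed retention period

# Date-stamped top-level "folder", eg: backup-2024-07-15
backup_prefix() {
  printf 'backup-%s' "$(date +%Y-%m-%d)"
}

# Upload an isolated copy of the assets under today's backup prefix.
backup_assets() {
  s3cmd sync "$LOCAL_DIR/" "s3://$BUCKET/$(backup_prefix)/"
}

# Write a lifecycle policy that expires any object whose key starts with "backup-".
write_lifecycle() {
  cat > "${1:-lifecycle.xml}" <<EOF
<?xml version="1.0" ?>
<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Rule>
    <ID>expire-dated-backups</ID>
    <Prefix>backup-</Prefix>
    <Status>Enabled</Status>
    <Expiration>
      <Days>$EXPIRY_DAYS</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>
EOF
}

# Attach the policy to the Space (only needs doing once per bucket).
apply_lifecycle() {
  write_lifecycle lifecycle.xml
  s3cmd setlifecycle lifecycle.xml "s3://$BUCKET"
}
```

Because the live assets sit under `main/` and the rule only matches keys beginning with `backup-`, the expiry should never touch the working copy. `s3cmd getlifecycle s3://example-assets-main-01` can be used to confirm what is currently applied.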
Remove from Git
- We now need to tell Git to stop tracking the assets directory
- Add the asset container’s path to `.gitignore`
  - For a container called `main`, this would be `/public/main`
- Remove the directory from Git, but not delete the files: `git rm -r --cached public/main`
Once this is done Git no longer actively knows about the images.
However, the images are still in the `.git/` directory and repo as objects within its history. The next step with Git Filter Repo resolves this.
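To see the difference between "untracked" and "gone from history", the steps can be tried in a throwaway repo first. This is a self-contained sketch: the temp directory, the fake image and the commit identity are all placeholders for the demo.

```shell
#!/bin/sh
# Scratch demo in a temporary repo; nothing here touches a real project.
repo=$(mktemp -d)
git -C "$repo" init -q
mkdir -p "$repo/public/main"
printf 'fake image bytes\n' > "$repo/public/main/example.jpg"
git -C "$repo" add .
git -C "$repo" -c user.name=demo -c user.email=demo@example.com commit -q -m "Add assets"

# 1. Ignore the asset container from now on
printf '/public/main\n' >> "$repo/.gitignore"

# 2. Remove it from the index only; the working copy is left alone
git -C "$repo" rm -r -q --cached public/main
git -C "$repo" add .gitignore
git -C "$repo" -c user.name=demo -c user.email=demo@example.com commit -q -m "Stop tracking assets"

# The file is still on disk
ls "$repo/public/main"

# Git no longer tracks anything under public/main
tracked=$(git -C "$repo" ls-files public/main)

# But the blob is still reachable through the earlier commit
in_history=$(git -C "$repo" rev-list --objects --all | grep -c 'public/main/example.jpg')

echo "tracked: '${tracked}', matching objects in history: ${in_history}"
```

The last line is the point of the exercise: the tracked list comes back empty while the object count does not, which is why the repo does not get any smaller until history is rewritten.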
Git-Filter-Repo
- This is a command line utility recommended by GitHub to remove large or sensitive files
- By its nature it rewrites Git history, changing commit hashes much as a `rebase` would (so teams beware!)
- See this Gist
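For orientation only, a history rewrite for the `public/main` container might look something like the sketch below. It assumes git-filter-repo is installed and that you are working in a fresh clone (the tool refuses to run otherwise unless forced). The function names and the remote URL argument are hypothetical; the Gist has the author's actual steps.

```shell
#!/bin/sh
ASSET_PATH="public/main"   # assumed asset container path

# Rewrite every commit so that ASSET_PATH never existed.
# --invert-paths means "keep everything EXCEPT the given path".
purge_assets_from_history() {
  git filter-repo --path "$ASSET_PATH" --invert-paths
}

# filter-repo removes the "origin" remote as a safety measure,
# so it has to be re-added before the rewritten history is pushed.
# $1 is the remote URL (placeholder, supply your own).
republish() {
  git remote add origin "$1"
  git push --force --all origin
  git push --force --tags origin
}
```

After a force-push like this, anyone with an existing clone should re-clone rather than pull, since their local history no longer matches the remote.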