Legacy URL redirects

https://wesort.co.uk/blog/writing/legacy-url-redirect-on-nginx-with-query-strings

Often when a new website is created there is an existing website on the domain. That site has URLs, and those URLs are addresses that are in use, be that by search engine indexes, visitors’ bookmarks or links pasted into emails that someone may someday click on.

These need to be handled carefully when a new site is launched so as not to lose the SEO ranking that has accumulated over the life of the site, or inconvenience anyone using an old link. Here’s a technical outline of how I do it.

Update: 10 August 2020

Step 1: Compile a list of the legacy site’s URLs
~ The goal is to have a structured list of what already exists, and a way for someone who knows the content well to set where that address should now point.
~ I have never really found an easy and comprehensive way to do this. I tend to use xml-sitemaps.com and then wrangle that list in a text editor before pasting into a gSheet. But someone out there probably has a better way with wget.
~ To list the URLs and give the content author an interface for the new paths, this example gSheet works well for me. It uses this colour coding:
green cell background = enter data
grey cell background = formulas
blue cell background = labels
~ Paste the legacy URLs into column A
~ Replace cells B2 and D2 (the fully qualified domain name) with your own
~ Be sure to ‘fill down’ the formulas in columns B, D and E
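
If the legacy site already publishes a sitemap, the URL list for column A could also be pulled out with a few lines of Python. This is only a sketch, assuming a standard sitemaps.org XML file; the function name and the sample data are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def legacy_urls(sitemap_xml):
    """Return every <loc> URL in the sitemap, de-duplicated and sorted."""
    root = ET.fromstring(sitemap_xml)
    urls = set()
    for loc in root.findall(".//sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if url:
            urls.add(url)
    return sorted(urls)


# Hypothetical sample sitemap, including a duplicate and a query string URL.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/old/path/page-name</loc></url>
  <url><loc>https://example.com/products.php?id=7&amp;cat=2</loc></url>
  <url><loc>https://example.com/old/path/page-name</loc></url>
</urlset>"""

# One URL per line, ready to paste into a spreadsheet column.
print("\n".join(legacy_urls(SAMPLE)))
```

A sitemap only lists what the old site chose to expose, so it is worth comparing the result against analytics or server logs if you have them.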

Step 2: Fill in the new destination paths
~ Each row in column C needs a value. Possibly even / to send visitors to the homepage.
~ This can be a laborious task but never takes as long as one fears. Once again, it should be done by someone who genuinely knows the content of the new site.
~ Do it when you are nearly ready to launch to ensure the new URLs are correct.
~ Be careful not to create any redirect loops. A loop is where the destination path is itself a legacy path that redirects back again, ad infinitum.
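
Loops are easy to miss by eye in a long list. A minimal sketch of a check, assuming the sheet has been exported as legacy path and destination path pairs (the function name and the sample paths are hypothetical):

```python
def find_redirect_loops(redirects):
    """Return the legacy paths whose redirect chain revisits a path it has already passed."""
    looping = []
    for start in redirects:
        seen = {start}
        current = redirects[start]
        # Keep following the chain while the destination is itself a legacy path.
        while current in redirects:
            if current in seen:
                looping.append(start)
                break
            seen.add(current)
            current = redirects[current]
    return looping


# Hypothetical data: /a and /b point at each other, and /c leads into that loop.
EXAMPLE = {
    "/a": "/b",
    "/b": "/a",
    "/c": "/a",
    "/old/path/page-name": "/new-page-name",
}

print(find_redirect_loops(EXAMPLE))
```

Any path this reports needs its destination changed to a page that is not in the legacy list.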

Step 3a: Using the routing within CMS
~ If you’re using a CMS, it may already include an interface for doing this.
~ I mainly use Statamic and on V2 sites within Setting > Routes you can set these with a human-readable syntax:
/old/path/page-name: /new-page-name
~ Here’s a gist of the non-site-specific portion of my Statamic V2 nginx.conf file which includes some cache control config.

Step 3b: Configure the server
~ On a recent project I had to opt for this method because: the paths included query strings, and there were 500+ legacy paths that I didn’t want cluttering the routes file. I chose to only put the query string redirects into the server config so that the client could easily adjust the normal [non-query] paths from the control panel should they need to.
~ For context, I use Laravel Forge to provision servers with Nginx on DigitalOcean. Computers in this context are completely pedantic and at times frustrating. Everyone’s environment is slightly different so debugging this can be difficult.
~ There are a few steps and what’s below is for Nginx only. (My gSheet has a sheet for Apache, but it’s only for redirects on non-query string URLs. If you know how to do it with query strings, I will happily update this post.)

3b.1. Create redirect file
~ Create a file called legacy-redirects.map at the root of the site directory (the path that the include in 3b.2 points at)
~ Copy column E from the gSheet
~ Paste it in and delete the first two rows (which are labels in the gSheet)
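
For anyone without the gSheet, the same file could be generated with a script. Inside an nginx map block each entry is a source string, a value and a closing semicolon. The sketch below writes both strings in double quotes and refuses anything that would need escaping. The function names and sample pairs are hypothetical, and the output is not guaranteed to match what column E of the sheet produces.

```python
def map_line(legacy, destination):
    """Format one entry for an nginx map block: "source" "value";"""
    if not legacy.startswith("/"):
        # A leading "~" would be read by nginx as a regular expression.
        raise ValueError(f"legacy path should start with '/': {legacy!r}")
    for value in (legacy, destination):
        # Quotes, backslashes, "$" and newlines need hand-checking, so stop here.
        if any(ch in value for ch in '"\\$\n'):
            raise ValueError(f"needs manual escaping: {value!r}")
    return f'"{legacy}" "{destination}";'


def build_map(pairs):
    """Join the entries, one per line, ready to save as legacy-redirects.map."""
    return "\n".join(map_line(old, new) for old, new in pairs) + "\n"


# Hypothetical legacy paths with query strings and their new destinations.
PAIRS = [
    ("/products.php?id=7", "/shop/widgets"),
    ("/products.php?id=8", "/shop/gadgets"),
]

print(build_map(PAIRS), end="")
```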

3b.2. Point nginx.conf at the redirect file
~ To edit that file: Forge > go to the site > scroll to bottom > “Files” (dropdown) > “Edit Nginx Configuration”.
~ BEFORE the first server block, insert the following, being sure to replace example.com with your own domain on line 3:

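map $uri$is_args$args $new_uri {    # key: the path, plus "?" and the query string when present
    default 0;                      # no match, so the if test in the server block is false
    include /home/forge/example.com/legacy-redirects.map;    # one "source value;" entry per line
}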

3b.3. Instruct nginx.conf to use the redirects
~ WITHIN the server block and AFTER # FORGE CONFIG (DO NOT REMOVE!) insert

if ($new_uri) {             # false when $new_uri is empty or 0
    return 301 $new_uri;    # permanent redirect to the mapped path
}

~ Save the nginx.conf and hope it doesn’t throw an error
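
To make the moving parts easier to picture, here is a rough Python model of what the map and the if block do together. It is a simplification: it ignores the normalisation nginx applies to $uri and other matching details, and the function names and sample entry are hypothetical.

```python
from urllib.parse import urlsplit


def map_key(url):
    """Approximate $uri$is_args$args: the path, then "?" and the query string if there is one."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return path + ("?" + parts.query if parts.query else "")


def resolve(url, redirects):
    """Return the 301 target for url, or None when the lookup falls through to 'default 0'."""
    new_uri = redirects.get(map_key(url), "0")
    # nginx treats an empty string or "0" as false in an if test.
    return None if new_uri in ("", "0") else new_uri


# Hypothetical entry, as it might appear in legacy-redirects.map.
REDIRECTS = {"/products.php?id=7": "/shop/widgets"}

print(resolve("https://example.com/products.php?id=7", REDIRECTS))
print(resolve("https://example.com/products.php?id=8", REDIRECTS))
```

The point to take away is that the whole string is the key, so the same page requested with its query parameters in a different order will not match.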

Step 4: Test, test, test
~ You can do this from your browser but caching plays a big role in redirects and the caches stack up (browser, router, ISP, etc).
~ I tend to use something like httpstatus.io by pasting in a handful of noteworthy legacy URLs (plain addresses, with query strings, with www prefix, http, https, etc).
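
The same spot check can be scripted, which sidesteps the browser cache. This is a sketch using only the Python standard library; the function names are hypothetical, and it makes a single HEAD request per URL without following the redirect.

```python
import http.client
from urllib.parse import urlsplit


def fetch_redirect(url, timeout=10.0):
    """Request url once, without following redirects; return (status, Location header)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def is_expected(status, location, expected_path):
    """True when the response is a 301 whose Location points at expected_path."""
    if status != 301 or location is None:
        return False
    loc = urlsplit(location)
    got = loc.path + ("?" + loc.query if loc.query else "")
    return got == expected_path


# Usage against a live site (example.com stands in for your own domain):
#   status, location = fetch_redirect("https://example.com/products.php?id=7")
#   print(is_expected(status, location, "/shop/widgets"))
```

The comparison only looks at the path and query of the Location header, so it works whether the server sends an absolute URL or a bare path.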

Finally, thank you to James Blair who kindly helped me with the query string element of this.
