Backups on Google Cloud Storage, with Go
Some more fun today. I had to set up backups on a VPS. There are many backup tools around, but they all seem so complicated, and many don’t store to cloud storage providers, which is what I wanted to do.
Was that a good enough excuse to write Go code? I hope so (;. Ideally, I would have a tool with a command-line interface like Mercurial’s, so you could call “$cmd backup full” or “$cmd backup incremental” from cron or the command line. Similar commands would restore entire backups, or only specific files. It would also have a mechanism to clean out old backups, preferably automatically. I’m sure such a tool exists, but I couldn’t find one that looks simple, does this, and is not ancient.
Well, I don’t have the time or energy to write my ideal backup tool, so instead I wrote a few small programs that together let me make backups.
First, cloudstream, which streams data to and from the cloud. It works with Google Cloud Storage in its S3-compatibility mode. That is nice and simple: it lets clients stream new files to storage using “Transfer-Encoding: chunked”. S3 itself doesn’t support that. It wants the length up front, so streaming data of unknown size means uploading it in parts of at least 5MB, which makes the code more awkward if you want reasonable speed. The tool works as you would expect from these example commands:
echo hi | cloudstream put /mybucket/hi.txt
cloudstream get /mybucket/hi.txt >hi.txt
Because it reads from standard input and writes to standard output, it fits into a pipeline. For example, you can stream tar files containing backups straight to storage.
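The post doesn’t show cloudstream’s code. Below is a minimal sketch of its core idea, a PUT whose body length is not known in advance. In Go’s net/http, setting ContentLength to -1 on a request with a body makes the client send that body with “Transfer-Encoding: chunked”. The names streamPut and putToTestServer are hypothetical. Request signing with the access key and secret is left out entirely, and a local httptest server stands in for Google Cloud Storage.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// streamPut sends everything read from body as the content of a PUT to url.
// ContentLength -1 tells net/http the size is unknown, so the body goes out
// with "Transfer-Encoding: chunked" instead of a Content-Length header.
// Assumption: request signing (access key and secret) is omitted here.
func streamPut(client *http.Client, url string, body io.Reader) error {
	req, err := http.NewRequest(http.MethodPut, url, body)
	if err != nil {
		return err
	}
	req.ContentLength = -1
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return fmt.Errorf("PUT %s: %s", url, resp.Status)
	}
	return nil
}

// seen records what the stand-in server received.
type seen struct {
	transferEncoding []string
	body             string
}

// putToTestServer runs streamPut against a local httptest server standing in
// for the storage service, and reports what that server saw.
func putToTestServer(payload string) (seen, error) {
	got := make(chan seen, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		b, _ := io.ReadAll(r.Body)
		got <- seen{r.TransferEncoding, string(b)}
	}))
	defer srv.Close()
	// io.MultiReader hides the concrete reader type, much like os.Stdin at
	// the end of a pipe, whose length is not known in advance.
	body := io.MultiReader(strings.NewReader(payload))
	if err := streamPut(srv.Client(), srv.URL+"/mybucket/hi.txt", body); err != nil {
		return seen{}, err
	}
	return <-got, nil
}

func main() {
	s, err := putToTestServer("hi\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("Transfer-Encoding: %v, body: %q\n", s.transferEncoding, s.body)
}
```

Against the real service, the call would look something like streamPut(client, "https://storage.googleapis.com/mybucket/hi.txt", os.Stdin). The request would also need to be signed with the credentials from cloudstream.conf, which this sketch does not show.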
Of course, you don’t want to store plaintext files in the cloud just like that, so you encrypt them first. You could use a standard tool for this (e.g. openssl enc), which seems to support streaming, but it’s more fun to write Go code for it (and end up with reusable library functions in the process). That’s cryptstream. It works like this:
echo hi | cryptstream encrypt >hi.enc
cryptstream decrypt < hi.enc
So now you can run:
tar -cf - somedir | gzip | cryptstream encrypt | cloudstream put /mybucket/backup-1.tgz.enc
Before running that pipeline, you need two configuration files in the directory you run it from (or in one of its parent directories). First, “cloudstream.conf”:
accesskey GOOG123456789
secret YmxhaGJsYWhibGFoYmxhaGJsYWhibGFoCg
And second, “cryptstream.conf”:
key 87a3b045a65a18177b5ca0bae9f109049e545468703559669ac0853296fa3a03
Don’t worry, these are fake credentials. They live in configuration files because you don’t want to pass them on the command line, where they would end up in your shell history and be visible to other users in process listings.