Backing up your Plone deployment¶
Strategies for backing up operating Plone installations.
A guide to determining what to back up and how to back it up and restore it safely.
The key rules of backing up a working system are probably:
- Back up everything
- Maintain multiple generations of backup
- Test restoring your backups
This guide assumes that you are already doing this for your system as a whole, and will only cover the considerations specific to Plone. When we say we are assuming you're already doing this for the system as a whole, what we mean is that your system backup mechanisms - rsync, bakula, whatever - are already backing up the directories into which you've installed Plone.
So, your buildout and buildout caches are already backed up, and you've tested the restore process. So, your remaining consideration is making sure that Plone's database files are adequately backed up and recoverable.
Objects in motion tend to remain in motion. Objects that are in motion are difficult or impossible to back up accurately.
Translation: Plone is a long-lived process that is constantly changing its content database. The largest of these files, the Data.fs filestorage which contains everything except Binary Large OBjects (BLOBs), is always open for writing. The BLOB storage, a potentially very complex file hierarchy, is constantly changing and must be referentially synchronized to the filestorage.
This means that most system backup schemes are incapable of making useful backups of the content database while it's in use. We assume you don't want to stop your Plone site just to backup, so you need to add procedures to make sure you have useful backups of Plone's data. (We assume that you know that the same thing is true of your relational database storage. If not, get to studying!)
Your Plone instance installation will contain a ./var directory (in the same directory as buildout.cfg) that contains the frequently changing data files for the instance. Much of what's in ./var, though, is not your actual content database. Rather, it's log, process id, and socket files.
The directories that actually contain content data are:
This is where Zope Object Database filestorage is maintained. Unless you've multiple storages or have changed the name, the key file is Data.fs. It's typically a large file and contains everything except BLOBS.
The other files in filestorage, with extensions like .index, .lock, .old, .tmp are ephemeral, and will be recreated by Zope if they're absent.
This directory contains a very deeply nested directory hierarchy that, in turn, contains the BLOBs of your database: PDFs, image files, office automation files and such.
The key thing to know about filestorage and blobstorage is that they are maintained synchronously. The filestorage has references to BLOBs in the blobstorage. If the two storages are not perfectly synchronized, you'll get errors.
is a well-maintained and well-supported recipe for solving the "objects in
motion" problem for a live Plone database. It makes it easy to both back up
and restore the object database. The recipe is basically a sophisticated
repozo, a Zope database backup tool, and
common file synchronization tool.
Big thanks to Reinout van Rees, Maurits van Rees and community helpers for creating and maintaining collective.recipe.backup. We all owe them drinks of their choice.
If you're using any of Plone's installation kits, collective.recipe.backup is
included in your install. If not, you may add it to your buildout by adding
[buildout] parts = ... backup ... [backup] recipe = collective.recipe.backup
There are several useful option settings for the recipe, all set by adding
configuration information. All are documented on the PyPI page. Perhaps the most
useful is the
location option, which sets the destination for backup
[backup] recipe = collective.recipe.backup location = /path/to/reliably/attached/storage/filestorage blobbackuplocation = /path/to/reliably/attached/storage/blobstorage
If this is unspecified, the backup destination is the buildout var directory. The backup destination, though, may be any reliably attached location - including another partition, drive or network storage.
Once you've run buildout, you'll have
scripts in your buildout. Since all options are set via buildout, there are
few command-line options, and operation is generally as simple as using the
bin/restore will accept a date-time argument if you're
keeping multiple backups. See the docs for details.
Backup operations may be run without stopping Plone. Restore operations require that you stop Plone, then restart after the restore is complete.
bin/backup is commonly included in a cron table for regular operation.
Make sure you test backup/restore before relying on it.
collective.recipe.backup offers both incremental and full backup and will maintain multiple generations of backups. Tune these to meet your needs.
When incremental backup is enabled, doing a database packing operation will automatically cause the next backup to be a full backup.
If your backup continuity needs are extreme, your incremental backup may be equally extreme. There are Plone installations where incremental backups are run every few minutes.