Backing up your data: some interim suggestions
Library and Information Services (LIS) will be reviewing its long-term plans to provide a managed backup system for the data from users' desktops. The present article contains advice on what to do in the meantime. It deals with measures that you should be taking as a matter of course to protect the data which you hold on your desktop PC or Mac.
What are backups for?
Backup copies are for security, to protect your data from accidental loss. There are different sorts of contingency, and the measures needed vary accordingly. We consider three sorts of backup:
- Disk backups to enable you to re-instate an entire disk, in case your disk becomes unreadable through either data corruption or physical damage, or in case your computer is lost or stolen
- File backups to enable you to restore earlier versions of particular files or groups of files
- Archives to preserve your data for the indefinite future
General considerations
We shall consider each of these sorts of backup in turn, but first there are some general points to make about the centrally managed backups taken by LIS, backup media and generations of backups.
Centrally managed backups
LIS takes backup copies of data held on the centrally managed servers. This includes users' home directories and the mail spool areas (where your incoming mail is stored until you read it), as well as the main university web server and the administrative servers. These backups are scheduled so that if the disk on any of the servers is corrupted, damaged or stolen the data can be restored to a state not more than one day old. The tapes holding the backups are held in secure locations well apart from the computers whose data they contain.
It is possible, but time-consuming, to recover individual files from the centrally managed backup tapes. The purpose of these backups is not to provide a safety-net for users who have unintentionally deleted or modified their files, but to enable LIS to restore the servers' disks in case of damage, corruption or loss.
Backups are generally carried out on a weekly cycle, with a full copy made one day, and incremental copies made on other days of the week, to capture changes day by day. Tapes are held for approximately six weeks, so that it is possible to revert to an earlier state where corruption or damage has gone undetected for some time.
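The difference between a full copy and an incremental copy can be sketched in a few lines of Python. This is only an illustration of the principle, not a description of the software LIS uses; the function name and the use of file modification times are our own assumptions.

```python
import os

def files_for_backup(root, last_backup_time=None):
    """List the files under root that a backup run would copy.

    With no last_backup_time, every file is listed (a full copy).
    Otherwise only files modified after that time are listed
    (an incremental copy). Times are seconds since the epoch.
    """
    cutoff = float("-inf") if last_backup_time is None else last_backup_time
    selected = []
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(folder, name)
            if os.path.getmtime(path) > cutoff:
                selected.append(path)
    return sorted(selected)
```

An incremental run is much quicker than a full one because it copies only what has changed, but restoring a disk then requires the last full copy together with the incremental copies made since.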
Backup media
Backup copies must be kept physically separate from the disk containing the original data. You should make your backup copy either across the network onto a disk on another computer, or onto removable media.
Backup across the network:
You need to remember the limited space on the network disks. They can't accommodate copies of all the documents on everyone's hard disk. With this proviso, you might consider backing up to:
- your home directory (your allocated portion of the disk on your home server). This is highly secure, in so far as the home directories are themselves backed up by LIS, but space is strictly limited. LIS cannot provide a general service enabling everyone to copy all their documents from their desktops into their home directories.
- a shared disk on a centrally managed server. This too is highly secure because the centrally managed server will itself be backed up by LIS, but shared disks are provided for shared access to data and restricted, password-protected access to confidential material, and are not large enough to accommodate backup copies of everyone's documents.
- a locally managed server in your own department. This is a good option provided you have the technical expertise in your department to set up the backup server and maintain it over time. Departments should consult LIS before setting up their own servers.
Backup to removable media:
The following types of removable media should be considered:
- Floppy disks: obviously these are only appropriate for small-scale file-by-file backups, of the sort you might take after each editing session on your document.
- Zip disks: the maximum capacity of Zip disks is now 750 Megabytes, but we would recommend using a 250 Megabyte drive, on grounds both of cost and compatibility with existing 100 or 250 Megabyte drives.
- Optical storage: of the different current standard options (CD-R, CD-RW and the various DVD formats) we recommend using CD-R (CD-Recordable) on grounds of reliability. That is: compatibility (maximum likelihood of being read on another machine), longevity, and non-erasability - you can't accidentally erase data written on a CD-R (and you can in fact regain old versions of data from a multi-session disk). The capacity of CDs varies. An '80-minute' CD will take something over 700 Megabytes of data. The process of burning a CD is simpler now than in the past.
- External hard disk drive: These can equal or exceed your internal hard drive in capacity.
- USB memory stick: these are easily portable and convenient for short-term storage, but their long-term reliability has been questioned. Current products can take 512 Megabytes of data.
- Magnetic tape: tapes are used for large-scale data storage such as the centrally managed backups. They are less easy to manage than the other media referred to, and recovery of individual files can take a very long time. Tapes vary in capacity but will typically hold tens of Gigabytes.
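When choosing a medium it helps to estimate how many disks a backup will occupy. The following Python sketch does the arithmetic using the capacities quoted above; the function name and the table of capacities are our own, and the CD-R figure is rounded down to 700 Megabytes.

```python
import math

# Capacities in Megabytes, taken from the figures quoted in the list above.
CAPACITY_MB = {
    "Zip 250": 250,
    "CD-R (80-minute)": 700,
    "USB memory stick": 512,
}

def disks_needed(data_mb, capacity_mb):
    """Return how many disks of the given capacity are needed for data_mb."""
    if data_mb <= 0:
        return 0
    # Round up: a partly filled disk is still a whole disk.
    return math.ceil(data_mb / capacity_mb)
```

For example, 2000 Megabytes of uncompressed documents would need three CD-Rs or eight 250 Megabyte Zip disks.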
Backup generations
You have to restore a file from backup when you find that the current version on your hard disk is damaged or deleted. Occasionally the restored copy itself turns out to be damaged. This happens when the backup copy was made after the damage occurred but before it was detected. To guard against this sort of frustration you should keep more than one generation of backup.
The centrally managed backup tapes are kept for six weeks before being over-written. This is ample for the purpose for which these backups are taken.
When you make a backup copy containing the latest version of a document, you should not over-write the previous backup of the document, just in case your latest version has introduced some sort of corruption into the file, or you have made some inadvertent changes to the contents. To avoid overwriting, you should either have a set of removable disks and use them in order, or you should rename the files or folders to contain the date on which the backup was made. The exact procedure to be adopted depends upon what sort of backup you are carrying out.
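Renaming a backup to contain its date is easily automated. The Python sketch below is one possible approach, not a prescribed procedure; the function names and the choice of the year-month-day form of the date are our own assumptions.

```python
import datetime
import os
import shutil

def dated_backup_name(filename, when):
    """Insert the date before the extension: report.doc -> report_2004-03-15.doc"""
    stem, extension = os.path.splitext(filename)
    return f"{stem}_{when.isoformat()}{extension}"

def backup_with_date(source_path, backup_folder, when=None):
    """Copy source_path into backup_folder under a date-stamped name."""
    when = when or datetime.date.today()
    os.makedirs(backup_folder, exist_ok=True)
    target = os.path.join(
        backup_folder, dated_backup_name(os.path.basename(source_path), when)
    )
    # copy2 also preserves the file's modification time.
    shutil.copy2(source_path, target)
    return target
```

Writing the date as year-month-day means that an alphabetical listing of the backup folder is also a chronological one. Note that a second backup made on the same day would overwrite the first.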
Although you must keep more than one generation of backup, there are limits to how far back you should go. These limits are partly practical limits on how much data you can accommodate, and partly a matter of how long you should, from the point of view of both law and policy, retain copies of your documents. Note the comment above that old versions of data can be regained from a multi-session CD-R.
Different sorts of backup
Most users need to carry out all three sorts of backup identified at the beginning of this note.
Guard against loss of your entire disk
You can lose your entire hard disk through physical damage, software corruption or theft. If this happens you will usually want to restore your data to as recent a state as possible. If you backed up your disk every hour, you could get it back in the state it was in no more than an hour before it was lost. You have to balance the cost and inconvenience of an hourly backup against the value, to you, of the reassurance of having such a recent copy to restore in case of loss.
Backups made for this purpose do not usually need to be kept for very long. You need to keep more than one generation, to guard against the case where the backup itself is faulty, but you would not usually keep more than three generations.
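Keeping a fixed number of generations means deciding which older backups to discard. As a minimal sketch in Python, assuming that your backups have names which differ only in a date written as year-month-day (so that alphabetical order is date order), the following hypothetical function picks out the generations beyond the newest three.

```python
def generations_to_delete(backup_names, keep=3):
    """Return the backup names that fall outside the newest `keep` generations.

    Assumes the names differ only in a date written as YYYY-MM-DD,
    so that sorting the names also sorts them by date.
    """
    newest_first = sorted(backup_names, reverse=True)
    return newest_first[keep:]
```

The function only reports which backups are surplus. It is prudent to check that the newest backup can actually be read before deleting anything.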
If you are backing up to an external hard disk with a capacity equal to or greater than your main disk, you can make an exact image of your main disk, including the system files and applications software as well as your data and documents. If you have to restore your main disk, you can then restore it in exactly the state it was in when backed up.
If you are using other media, a complete disk image is less practicable. Furthermore, the reason for having to restore your data from backup is often that your main disk has become corrupted in some way. In such a case the corruption may well have taken hold before the backup image was made, so you would not want to restore the system files and applications software for fear of re-introducing the corruption.
The most common procedure is to back up your data and documents, but not the software. If you have to restore your disk you would start by installing the operating system and applications software from the original installation disks, and then restore the documents and data from the backup copy. This, however, would not cover updates to software that you may have downloaded.
The documents and data that you should back up regularly include the following:
- documents from applications such as Word, Excel and Access
- mailboxes, attachments and address book
- web bookmarks or favourites
In order to be ready to restore your disk in case of loss or corruption, you need to know where to find the installation files for all your software. The installation files may be on the CDs originally purchased, or you may have downloaded them from the web and stored them on your hard disk. You could have a folder for downloaded files within your My Documents folder, so that your latest downloads, including essential updates for your software, are included in your backups.
For reasons of speed and efficiency you want to avoid backing up unnecessary data. This is one good reason for having a strict house-keeping policy, and getting rid of unnecessary files.
Guard against loss of individual files
If you discover that a particular file has been damaged or accidentally deleted, you will need to restore it from backup. It may be that you do not discover the damage to your file until some weeks or months after it occurred. The regular backup of your disk referred to in the previous section will go back a limited length of time, so you cannot rely on it when it comes to retrieving long-lost files. For this situation you need a more long-term strategy. Exactly what you do will depend on your own needs and style of working. Here are a couple of examples.
- When a project completes one of its stages you should take a snapshot copy of all the documents associated with the project, and keep it either permanently, or until it is superseded by the next stage in the project
- If you are working on a major file you should take a backup copy of that one file after every editing session. Keep several generations of the file, so that you can recover from accidental damage or loss which is not immediately apparent
The contingencies we are guarding against here do not involve the loss of your whole disk, but only of particular files. Backups made for this purpose do not necessarily have to be stored away from your PC or Mac. However, it is sensible to have copies of crucial files kept safely somewhere away from your main disk, for example in your home directory or on a CD or Zip disk.
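Keeping several generations of a single file after each editing session can be done by numbering the copies. The Python sketch below shows one possible scheme, in which copy 1 is always the newest and the oldest copy is dropped; the function name and the numbering convention are our own assumptions.

```python
import os
import shutil

def rotate_and_backup(source_path, backup_folder, generations=3):
    """Copy source_path into backup_folder, keeping numbered generations.

    After each call, <name>.1 is the newest copy, <name>.2 the one
    before it, and so on. The oldest generation is overwritten.
    """
    os.makedirs(backup_folder, exist_ok=True)
    base = os.path.join(backup_folder, os.path.basename(source_path))
    # Shift existing copies up by one, starting with the oldest.
    for number in range(generations - 1, 0, -1):
        older = f"{base}.{number}"
        if os.path.exists(older):
            os.replace(older, f"{base}.{number + 1}")
    newest = f"{base}.1"
    shutil.copy2(source_path, newest)
    return newest
```

Calling the function at the end of each editing session leaves the last three saved states of the document available for recovery.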
Archiving
When a project is complete you should take a full copy of all the files associated with the project. In this case you have to consider the long-term future of the data. The subject of archiving requires more detailed consideration than can be given here, but, briefly, if the project is of a sort which requires the long-term retention of the associated data files, you must take steps:
- to keep the backup copy secure, which means that it must be refreshed (copied onto fresh disks, for example) every few years
- to label the data, both by physically labelling the disks and by including metadata in with the files
- to store the data in a format which does not depend on a particular piece of proprietary software to read it
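One way of including metadata with the files is to store a plain-text manifest alongside them. The Python sketch below is an illustration only: the layout of the manifest is our own invention, and the inclusion of a SHA-256 checksum for each file is an assumption on our part, added so that a refreshed copy can be compared with the original.

```python
import hashlib
import os

def sha256_of(path):
    """Return the SHA-256 checksum of a file as a hexadecimal string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(folder, description):
    """Return manifest text: a description line, then one line per file
    giving its checksum, size in bytes and path relative to folder."""
    paths = []
    for current, _subfolders, filenames in os.walk(folder):
        for name in filenames:
            paths.append(os.path.join(current, name))
    lines = [f"Description: {description}"]
    for path in sorted(paths):
        relative = os.path.relpath(path, folder)
        lines.append(f"{sha256_of(path)}  {os.path.getsize(path)}  {relative}")
    return "\n".join(lines) + "\n"
```

Because the manifest is plain text it does not itself depend on any proprietary software. Build it before saving it into the archived folder, so that it does not list itself.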
How to carry out your backups
In view of the large amount of data that we accumulate over the years there are obvious advantages in storing it in compressed form on our backup media. There are various data compression programs on the market of which PKZIP and WinZip are the best known. These will compress your data files into a single file (often known confusingly as an archive), and also provide a restoration facility so that you can extract particular files from the compressed file. Windows has its own Backup and Restore utility. These programs are particularly appropriate for the regular backups of all your data files.
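The principle of compressing a folder into a single file, and later extracting one particular file from it, can be illustrated with the zipfile module in Python's standard library. This is a sketch only and is not how PKZIP, WinZip or the Windows utility are operated; the function names are our own.

```python
import os
import zipfile

def compress_folder(folder, archive_path):
    """Compress every file under folder into a single zip file.

    archive_path should lie outside folder, otherwise the zip file
    would try to include itself.
    """
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for current, _subfolders, filenames in os.walk(folder):
            for name in filenames:
                full_path = os.path.join(current, name)
                # Store paths relative to folder, not the full disk path.
                archive.write(full_path, arcname=os.path.relpath(full_path, folder))

def restore_one_file(archive_path, member_name, destination):
    """Extract a single named file from the zip file into destination.

    member_name uses forward slashes, as stored inside the zip file.
    """
    with zipfile.ZipFile(archive_path) as archive:
        return archive.extract(member_name, path=destination)
```

Files created in the zip format can be opened by a wide range of programs, which is one reason for preferring it to a format tied to a single utility.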
For copying individual documents or small groups of documents it might be simpler not to compress them into an "archive", but to copy them across to the backup disk as separate files.
For true archiving purposes you should certainly not compress the data, since this is likely to impose upon it a format which will be unreadable in years to come. In the past the Microsoft backup utility was notorious for producing backup files which could not be restored on later versions of Windows.
Retention of backup copies and archives
With legislation on Data Protection and Freedom of Information it has become essential for the University to know what information is being stored and for how long. Guidelines on document management are being developed which are expected to recommend that information should be deleted as soon as it is no longer required. This includes email messages as well as more formal documents. When removing information from your hard disk you should also delete or destroy all backup copies. You also need to consider how long you need to retain archived copies, for instance superseded policy or procedure documents that may be needed in the future for legal or reference purposes.
The Future
LIS is hoping to provide more extensive facilities for backing up users' desktop data. Requirements vary across the University, and LIS will consult widely. In the immediate future, however, the onus will lie very much on individual users, and on unit managers, to ensure that adequate backup measures are taken.