splitbrain.org

electronic brain surgery since 2001

Backing up my Data

Yesterday I started thinking about my backup strategy. The problem is that there is no real strategy. I have some stuff on RAID, and my development files are checked into my darcs repository. But that's it.

This post is to do a little public brainstorming on what I'd like to implement.


What to back up

Let's have a look at the data I have.

First, there is the stuff I never, ever want to lose because it can't be reproduced and is of great value to me. That's mostly my photos: currently about 14GB, but growing by an estimated 6 to 10GB per year.
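To get a feeling for how much offsite space the photos will need, the estimate above can be turned into a quick projection. This is only a rough sketch: it assumes the growth stays linear at 6 to 10GB per year, and the function name is made up for illustration.

```python
def projected_size_gb(current_gb, yearly_growth_gb, years):
    """Linear projection: current size plus a constant yearly growth."""
    return current_gb + yearly_growth_gb * years


# Figures from the text: 14GB today, growing by 6 to 10GB per year.
for years in (1, 3, 5):
    low = projected_size_gb(14, 6, years)
    high = projected_size_gb(14, 10, years)
    print(f"after {years} year(s): {low} to {high} GB")
```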

Then there's data that I would hate to lose but could reproduce if necessary. That's my media files, mostly MP3s. They take up a lot of space: about 100GB, I'd say.

The last part is smaller but, in comparison, rapidly changing data: all the stuff in my home directory, config files and databases.

Where to back up

These different kinds of data need to be backed up differently. Data of the first type needs distributed backups: each photo should exist in at least two different physical locations. If my house burns down, my photos should be safe.

The media files, on the other hand, do not need to be backed up to an offsite location. Having them on a second disk or secured by RAID should be enough to be safe from a hard disk crash. If my house burns down, I'll have other worries than my MP3s.

The last group of data is a bit different. Here I not only need a backup in case of fire or a hard disk crash, I also need the data protected from myself. I need to be able to restore accidentally deleted or overwritten files, so incremental backups need to be created. Only the most recent version needs to be stored offsite, of course.

My "Dream" Solution

[Figure: proposed backup setup]

So here is what I have in mind. The central part is a network-attached storage (NAS) device. All client machines would back up to the NAS, and the NAS itself would then mirror certain parts to a server on the Internet.

This mirroring could be done while I'm asleep or not at home, so it wouldn't clog up my small upload bandwidth when I need it myself.

NAS Hardware

I'm not sure yet which device would be the best fit. Here are my requirements:

  • hold several 3.5" SATA hard drives
  • low power consumption (will run 24/7)
  • silent (fanless preferred)
  • open operating system (Linux or BSD)
  • clever disk management

The last point is something I'm not sure exists in any ready-made solution. Ideally, I do not want to think about partitioning and disk management at all. I want to be able to put in another disk and have it added to the available space. And I want to be able to mark a disk for removal, let the system shuffle all its data to the remaining disk(s), and then remove it.
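To make the "clever disk management" wish a bit more concrete, here is a toy model of the behaviour I have in mind. It is a sketch only: the DiskPool class and its methods are invented for illustration, it only does bookkeeping in memory (no real disks are touched), and a real system would have to deal with filesystems, failures half-way through a move and much more.

```python
class DiskPool:
    """Toy model of a storage pool where disks can be added and drained."""

    def __init__(self):
        # disk name -> {"capacity": GB, "files": {file name: size in GB}}
        self.disks = {}

    def add_disk(self, name, capacity_gb):
        """Put in another disk; its space simply joins the pool."""
        self.disks[name] = {"capacity": capacity_gb, "files": {}}

    def free(self, name):
        disk = self.disks[name]
        return disk["capacity"] - sum(disk["files"].values())

    def total_free(self):
        return sum(self.free(name) for name in self.disks)

    def store(self, filename, size_gb):
        """Place a file on the disk that currently has the most free space."""
        target = max(self.disks, key=self.free, default=None)
        if target is None or self.free(target) < size_gb:
            raise RuntimeError("no disk has enough free space")
        self.disks[target]["files"][filename] = size_gb
        return target

    def remove_disk(self, name):
        """Mark a disk for removal: move its files elsewhere, then drop it.

        The moves are planned first and only applied if everything fits,
        so a failed removal leaves the pool unchanged.
        """
        leaving = self.disks[name]["files"]
        free = {n: self.free(n) for n in self.disks if n != name}
        plan = {}
        # Place the largest files first; they are the hardest to fit.
        for filename, size in sorted(leaving.items(), key=lambda kv: kv[1], reverse=True):
            target = max(free, key=free.get, default=None)
            if target is None or free[target] < size:
                raise RuntimeError("not enough space on the remaining disks")
            free[target] -= size
            plan[filename] = target
        for filename, target in plan.items():
            self.disks[target]["files"][filename] = leaving[filename]
        del self.disks[name]
```

With two 100GB disks in the pool, calling remove_disk on one of them would move its files to the other, as long as they fit; otherwise the call fails and nothing changes.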

What I do not need is RAID (see footnote 1). Since the NAS will already be the second location for my data (first being the desktop or laptop disk) and some data is even copied to a third (outside) location, RAID would be overkill.

Storage Server

The cheapest and simplest solution would probably be rented webspace with SSH access. Dreamhost for example offers “unlimited” space but not for “sites whose essential purpose is to use disk or bandwidth” – not sure if using them as a backup site would be allowed.

Amazon S3 might be another option, though a more expensive one.

Backup Software

All the backups should happen automatically in the background and not interrupt my usual work. On the laptop, the software would need to detect when the local network with the NAS isn't available and postpone backups until it is connected again. Support for Windows would be nice but isn't a necessity.
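The "postpone until the NAS is reachable" part could look roughly like the following. It is a minimal sketch under a few assumptions: the NAS is considered available if a TCP connection to its SSH port succeeds, the actual backup is passed in as a function, and all names (nas_reachable, backup_when_available, the host name in the usage comment) are hypothetical.

```python
import socket
import time


def nas_reachable(host, port=22, timeout=3.0):
    """Return True if a TCP connection to the NAS can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def backup_when_available(run_backup, is_available, retry_seconds=300,
                          max_attempts=None, sleep=time.sleep):
    """Run the backup as soon as the NAS is available, retrying until then.

    Returns True once the backup has run, or False if max_attempts checks
    all failed. With max_attempts=None it keeps retrying forever.
    """
    attempt = 0
    while True:
        attempt += 1
        if is_available():
            run_backup()
            return True
        if max_attempts is not None and attempt >= max_attempts:
            return False
        sleep(retry_seconds)


# Possible usage (host name and backup function are placeholders):
# backup_when_available(do_backup, lambda: nas_reachable("nas.local"))
```

Passing the availability check and the sleep function in as parameters keeps the retry logic independent of the network, so it can be tried out without a NAS.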

Restoring files, browsing the backups and accessing older snapshots need to be easy. It shouldn't be necessary to use any special client to do so, but having a nice optional interface would be a plus.

Speaking of snapshots: these should be handled with some cleverness. Files that change more often should be backed up more often. Also, older snapshots should be deleted automatically to avoid running out of space. A client-server solution might be sensible to move this logic from the desktop to the NAS.
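The automatic deletion of older snapshots could follow a thinning rule like the one sketched here. The concrete policy is purely an assumption for illustration (keep everything from the last day, one snapshot per day for 30 days, one per week after that), and snapshots are represented only by their timestamps.

```python
from datetime import datetime, timedelta


def snapshots_to_keep(snapshots, now,
                      keep_all_for=timedelta(days=1),
                      keep_daily_for=timedelta(days=30)):
    """Return the set of snapshot timestamps that should be kept.

    Assumed policy: keep every recent snapshot, then only the newest
    snapshot of each day, then only the newest of each ISO week.
    """
    keep = set()
    seen_days = set()
    seen_weeks = set()
    for snap in sorted(snapshots, reverse=True):  # newest first
        age = now - snap
        if age <= keep_all_for:
            keep.add(snap)
        elif age <= keep_daily_for:
            day = snap.date()
            if day not in seen_days:
                seen_days.add(day)
                keep.add(snap)
        else:
            week = tuple(snap.isocalendar()[:2])  # (ISO year, ISO week)
            if week not in seen_weeks:
                seen_weeks.add(week)
                keep.add(snap)
    return keep


def snapshots_to_delete(snapshots, now):
    """Everything that the policy above does not keep."""
    return set(snapshots) - snapshots_to_keep(snapshots, now)
```

This only covers the cleanup half; deciding which files get backed up more often would need separate logic.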

The offsite mirroring could probably be done by a simple cron job using rsync. For the client machines, a single client should be used to back up all three kinds of data.
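A small wrapper that cron could call for the offsite mirroring might look like this. It is only a sketch: the paths, the remote host and the script location in the cron comment are placeholders, and the wrapper assumes rsync and SSH key authentication are already set up on the NAS. The rsync options used (--archive, --compress, --delete, --bwlimit, -e) are standard ones.

```python
import subprocess


def build_mirror_command(source_dir, remote, bandwidth_limit_kbps=None, delete=False):
    """Build the rsync command line for mirroring a directory over SSH."""
    cmd = ["rsync", "--archive", "--compress"]
    if delete:
        # Also remove files on the remote side that no longer exist locally.
        cmd.append("--delete")
    if bandwidth_limit_kbps is not None:
        # Cap the transfer rate so the upload link stays usable.
        cmd.append(f"--bwlimit={bandwidth_limit_kbps}")
    cmd += ["-e", "ssh"]
    # A trailing slash makes rsync copy the directory's contents.
    cmd += [source_dir.rstrip("/") + "/", remote]
    return cmd


def mirror(source_dir, remote, **options):
    """Run rsync and raise CalledProcessError if it fails."""
    subprocess.run(build_mirror_command(source_dir, remote, **options), check=True)


# Example cron entry (hypothetical script path), running nightly at 02:30:
# 30 2 * * * /usr/bin/python3 /usr/local/bin/mirror_photos.py
#
# where the script would call something like:
# mirror("/srv/backup/photos", "user@backup.example.com:photos/")
```

Leaving --delete off by default is a deliberate choice for the photos: an accidental local deletion would otherwise be mirrored to the offsite copy on the next run.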

Suggestions?

As you can see, I'm not very far along with my plans yet. A bit of googling brought up some interesting backup clients, but I haven't looked too deeply into any of them so far. I'm not sure what NAS hardware I should use either.

If you are using a similar backup setup, please let me know in the comments. If you use a completely different one, let me know as well, and please explain what your rationale was. And of course I'm grateful for any hardware and software suggestions you can make.

Tags:
backup, linux, timemachine, snapshot
1) in the sense of a redundant disk array