
splitbrain.org - electronic brain surgery since 2001

Backing up my Data

I started thinking about my backup strategy yesterday. The problem is that there is no real strategy. I have some stuff on RAID, and my development files are checked into my darcs repository. But that's it.

This post is to do a little public brainstorming on what I'd like to implement.


What to back up

Let's have a look at the data I have.

First there is the stuff I never, ever want to lose because it can't be reproduced and is of great value to me. That's mostly my photos: currently about 14GB, but growing by an estimated 6 to 10GB per year.

Then there's data that I would hate to lose, but that could be reproduced if necessary. That's my media files, mostly MP3s. This takes a lot of space, about 100GB I'd say.

The last part is smaller but, in comparison, rapidly changing data: all the stuff in my home directory, config files and databases.

Where to back up

These different kinds of data need to be backed up differently. Data of the first type has to be backed up in a distributed way. Each photo should exist in at least two different physical locations. When my house burns down, my photos should be safe.

The media files on the other hand do not need to be backed up to an offsite location. Having them on a second disk or secured by RAID should be enough to be safe from a hard disk crash. When my house burns down I have different worries than my MP3s.

The last group of data is a bit different. Here I not only need a backup in case of fire or a hard disk crash, I also need the data protected from myself. I need to be able to restore accidentally deleted or overwritten files, so incremental backups need to be created. Only the most recent version needs to be stored offsite, of course.
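One common way to get such incremental backups is a directory per snapshot in which unchanged files are hard links into the previous snapshot, so only changed files take up new space. The following is only a minimal sketch under that assumption: the function name `make_snapshot` and the directory layout are made up for illustration, snapshot names are assumed to sort chronologically, all snapshots must live on the same filesystem, and symlinked directories are ignored.

```python
import filecmp
import os
import shutil
from pathlib import Path


def make_snapshot(source: Path, snapshot_root: Path, name: str) -> Path:
    """Create snapshot_root/name as a full-looking copy of source.

    Files identical to the ones in the most recent earlier snapshot are
    hard-linked instead of copied (hypothetical helper, not a real tool).
    """
    snapshot_root.mkdir(parents=True, exist_ok=True)
    # Assumption: snapshot directory names sort in chronological order.
    previous = sorted(p for p in snapshot_root.iterdir() if p.is_dir())
    last = previous[-1] if previous else None
    target = snapshot_root / name

    for dirpath, _dirnames, filenames in os.walk(source):
        rel = Path(dirpath).relative_to(source)
        (target / rel).mkdir(parents=True, exist_ok=True)
        for filename in filenames:
            src = Path(dirpath) / filename
            dst = target / rel / filename
            old = last / rel / filename if last else None
            if old is not None and old.is_file() and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)        # unchanged: share the data with the old snapshot
            else:
                shutil.copy2(src, dst)   # new or changed: store a fresh copy
    return target
```

Restoring an accidentally overwritten file is then just a copy out of an older snapshot directory, without any special client.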

My "Dream" Solution

So here is what I have in mind. The central part is a network-attached storage (NAS) device. All client machines would back up to the NAS, and the NAS itself then mirrors certain parts to a server on the Internet.

This mirroring could be done at times I'm asleep or not at home, so it wouldn't clog up my small upload when I need it myself.

NAS Hardware

I'm not sure yet which device would fit best. Here are my requirements:

  • hold several 3.5” SATA hard drives
  • low power consumption (will run 24/7)
  • silent (fanless preferred)
  • open operating system (Linux or BSD)
  • clever disk management

I'm not sure the last point exists in any ready-made solution. Ideally I do not want to think about partitioning and disk management at all. I want to be able to put in another disk and have it added to the available space. And I want to mark a disk for removal, let the system shuffle all data to the remaining disk(s) and then remove it.
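To make the "add a disk, mark a disk for removal" idea concrete, here is a toy model of such a storage pool. It is only an illustration under simplifying assumptions: the class `Pool` and its methods are hypothetical, files are tracked by name and size in GB rather than actually moved, file names are assumed unique across the pool, and placement is a simple "most free space first" rule rather than anything a real product is known to use.

```python
class Pool:
    """Toy storage pool: disks can be added and evacuated (hypothetical model)."""

    def __init__(self):
        self.capacity = {}  # disk name -> capacity in GB
        self.files = {}     # disk name -> {file name: size in GB}

    def add_disk(self, name, capacity_gb):
        self.capacity[name] = capacity_gb
        self.files[name] = {}

    def free(self, disk):
        return self.capacity[disk] - sum(self.files[disk].values())

    def total_free(self):
        return sum(self.free(d) for d in self.capacity)

    def store(self, filename, size_gb):
        # Put the file on whichever disk currently has the most free space.
        target = max(self.capacity, key=self.free, default=None)
        if target is None or self.free(target) < size_gb:
            raise RuntimeError(f"no disk has {size_gb} GB free")
        self.files[target][filename] = size_gb
        return target

    def remove_disk(self, name):
        # Plan all moves first so a failure leaves the pool untouched.
        free = {d: self.free(d) for d in self.capacity if d != name}
        plan = {}
        by_size = sorted(self.files[name].items(), key=lambda kv: -kv[1])
        for filename, size_gb in by_size:  # largest files first
            target = max(free, key=free.get, default=None)
            if target is None or free[target] < size_gb:
                raise RuntimeError(f"cannot evacuate {name}: {filename} does not fit")
            free[target] -= size_gb
            plan[filename] = target
        for filename, target in plan.items():
            self.files[target][filename] = self.files[name][filename]
        del self.files[name]
        del self.capacity[name]
```

A greedy plan like this can refuse a removal that a smarter packing would allow; it is meant to show the bookkeeping, not an optimal strategy.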

What I do not need is a RAID1). Since the NAS will already be the second location for my data (first being the desktop or laptop disk) and some data is even copied to a third (outside) location, RAID would be overkill.

Storage Server

The cheapest and simplest solution would probably be rented webspace with SSH access. Dreamhost for example offers “unlimited” space but not for “sites whose essential purpose is to use disk or bandwidth” – not sure if using them as a backup site would be allowed.

Amazon S3 might be another, but more expensive, solution.
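A rough back-of-the-envelope check on "more expensive", using the photo numbers from above (14GB now, growing 6 to 10GB per year) and the storage price quoted in the comments below ($7.50 US per month for 50GB, which works out to $0.15 per GB per month). The function names are made up for this sketch, the price is an assumption taken from that comment rather than from a price list, and transfer costs are ignored.

```python
def projected_size_gb(years, start_gb=14.0, growth_gb_per_year=(6.0, 10.0)):
    """Return the (low, high) estimate of the photo archive size after `years`."""
    low, high = growth_gb_per_year
    return start_gb + low * years, start_gb + high * years


def monthly_storage_cost(size_gb, price_per_gb_month=0.15):
    """Storage-only cost per month; 0.15 $/GB is the rate implied by the comments."""
    return size_gb * price_per_gb_month


if __name__ == "__main__":
    for years in (0, 1, 3, 5):
        low, high = projected_size_gb(years)
        print(
            f"after {years} years: {low:.0f}-{high:.0f} GB, "
            f"${monthly_storage_cost(low):.2f}-${monthly_storage_cost(high):.2f} per month"
        )
```

Under these assumptions the archive would be somewhere between 32 and 44GB after three years, so the offsite copy of the photos alone stays in the range of a few dollars a month.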

Backup Software

All the backups should happen automatically in the background and not interrupt my usual work. For the laptop the software would need to detect when the local network with the NAS isn't available and postpone backups until it is connected again. Support for Windows would be nice but isn't a necessity.
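Detecting whether the NAS is reachable could be as simple as trying to open a TCP connection to it before each run. This is a minimal sketch, assuming the NAS accepts connections on some known port (SSH on port 22 is used here purely as an example); the function names are hypothetical and the actual backup is passed in as a callable.

```python
import socket


def nas_reachable(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


def backup_or_postpone(host: str, port: int, run_backup) -> bool:
    """Run the backup only when the NAS answers; report whether it ran."""
    if not nas_reachable(host, port):
        return False  # not on the home network: try again on the next run
    run_backup()
    return True
```

Called from a periodic job on the laptop, this skips the backup silently while travelling and picks it up again on the first run back on the home network.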

Restoring files, browsing the backups and accessing older snapshots needs to be easy. It shouldn't be necessary to use any special client to do so, but having a nice optional interface would be a plus.

Speaking of snapshots: these should be handled with some cleverness. Files that change more often should be backed up more often. Older snapshots should also be deleted automatically to avoid running out of space. A client-server solution might be sensible to move this logic from the desktop to the NAS.
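The automatic deletion of older snapshots could follow a thinning-out rule: keep everything recent, then one snapshot per day, then one per month, and drop the rest. The sketch below covers only that pruning part; the function names and the three time windows (one day, 30 days, one year) are assumptions chosen for illustration, not settings of any particular tool.

```python
from datetime import datetime, timedelta


def snapshots_to_keep(
    snapshots,
    now,
    keep_all_for=timedelta(days=1),
    keep_daily_for=timedelta(days=30),
    keep_monthly_for=timedelta(days=365),
):
    """Return the set of snapshot timestamps that should survive pruning."""
    keep = set()
    seen_days = set()
    seen_months = set()
    for taken in sorted(snapshots, reverse=True):  # newest first
        age = now - taken
        if age <= keep_all_for:
            keep.add(taken)  # recent: keep every snapshot
        elif age <= keep_daily_for:
            if taken.date() not in seen_days:  # newest snapshot of that day
                seen_days.add(taken.date())
                keep.add(taken)
        elif age <= keep_monthly_for:
            month = (taken.year, taken.month)
            if month not in seen_months:  # newest snapshot of that month
                seen_months.add(month)
                keep.add(taken)
        # anything older than keep_monthly_for is dropped
    return keep


def snapshots_to_delete(snapshots, now, **windows):
    """Return the snapshots that the rule above would remove, oldest first."""
    return sorted(set(snapshots) - snapshots_to_keep(snapshots, now, **windows))
```

Running this on the NAS rather than on each client would match the client-server idea: the clients only push data, and the NAS decides what to expire.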

The offsite mirroring could probably be done by a simple cron job using rsync. For the client machines, a single client should be used to back up all three kinds of data.
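A small wrapper around rsync, run nightly from cron on the NAS, might look roughly like this. It is a sketch only: the paths, the remote host `backup.example.org`, the script location in the cron line and the bandwidth limit are all placeholders, and the optional `--bwlimit` is there because of the small upload mentioned above.

```python
import subprocess


def build_rsync_command(source, destination, bwlimit_kib_per_s=None):
    """Build the rsync argument list for a one-way mirror over SSH."""
    command = [
        "rsync",
        "--archive",   # keep permissions, times and directory structure
        "--compress",  # compress data during the transfer
        "--delete",    # remove files offsite that were removed locally
        "--rsh=ssh",   # use SSH as the transport
    ]
    if bwlimit_kib_per_s is not None:
        command.append(f"--bwlimit={bwlimit_kib_per_s}")
    command += [source, destination]
    return command


def mirror(source, destination, bwlimit_kib_per_s=None):
    """Run the mirror; raises CalledProcessError if rsync reports a failure."""
    subprocess.run(
        build_rsync_command(source, destination, bwlimit_kib_per_s), check=True
    )


# Hypothetical usage, e.g. from a script that cron starts at 02:30 every night:
#   30 2 * * * /usr/bin/python3 /usr/local/bin/mirror_photos.py
# with the script calling:
#   mirror("/srv/backup/photos/", "user@backup.example.org:photos/", 40)
```

Note that `--delete` makes the offsite copy an exact mirror, so a file deleted by accident disappears there as well on the next run; protection against that has to come from the snapshots on the NAS.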

Suggestions?

As you can see, I'm not very far with my plans yet. A bit of googling brought up some interesting backup clients, but I haven't looked too deeply into any of them yet. I'm not sure what NAS hardware I should use either.

If you are using a similar backup setup, please let me know in the comments. If you use a completely different one let me know as well and please explain what your rationale was. And of course I'm grateful for all hardware and software suggestions you can make.

Tags: backup, linux, timemachine, snapshot
1) in the sense of redundant disk array
Posted on Saturday, December the 20th 2008 (3 years ago).

Comments?

1
This is my setup. I have a server that has about 750 gigs of storage which shares a drive. All the other computers in the house store all their important data to that drive. Music, photos, tax returns, etc. The server has a 300 gig external drive attached to it for immediate backups. Then I use Jungle Disk (http://www.jungledisk.com/) on the server to backup my data to Amazon S3. It's $20 USD for a lifetime subscription, which I think is well worth it. Using this setup I think I have a pretty comprehensive backup plan. Jungle Disk emails me every day a report of how my backup went. Right now, I only store the most important things to Amazon S3 (Photos and Documents about 10 gigs) and it costs me about $1.50 a month.
2008-12-20 21:00:38
2
I used to use FolderShare, but it is now Windows Live Sync. Instead of Live Sync I use Live Mesh - mesh.com. It allows me to sync directories between different machines as well as with their online storage. Terminal services is also built into the desktop software, so it also replaced tightvnc for me. It works for me, but at this time they do not support Linux, plus you are limited to their 5 GB of storage space. I wish M$ would simply merge SkyDrive and the Live desktop so that instead of 5 GB it would be 25 GB.

The nice things are that it syncs directories between multiple machines and with online storage. The sync can also be somewhat customized. It also comes with terminal services and with an online interface, so you don't have to have terminal services installed on the machine you are connecting from. You can also share the storage and connection with other users.

The bad things are still some bugs, limited storage, no Linux support.
2008-12-20 22:20:29
3
Shame on you, now you've got me thinking again ;)

Amazon S3 indeed looks interesting (and affordable), especially when combined with something like duplicity (encryption, yay!) or s3sync and a cronjob run at night from the NAS (see http://foosel.org/blog/2008/04 … rfect_htpc for our hardware setup here). Syncing the sbackup (http://sbackup.wiki.sourceforge.net/) results over to an already existing rootserver overnight could be an option too, of course, but that depends on the size of the backed-up data, I'd say.

Will have to look into this...
2008-12-20 22:27:25
4
I have only a few files that need to be backed up and synced between computers. I use http://getdropbox.com to do my syncing. It's basically a repository. It keeps all versions of a file and updates each computer the moment something changes. The free plan includes 2GB, but you can get 50GB for either $100 a year or $10 a month.
2008-12-21 00:01:46
5
Not easy to decide on these things I think.
Myself, I use rsync over a password-less ssh connection to back up my CVS repo and NFS shares. But for a while now I've been thinking of replacing Rsync by Amanda (http://www.amanda.org) to have some more options.
2008-12-21 03:38:35
6
> But for a while now I've been thinking of replacing Rsync by
> Amanda

Don't.

Amanda is great (as in, it still makes you pull your hair out, but it's worth it) if you have -- say -- 20 to 30 hosts or more to back up, but for a small network with maybe a notebook, a desktop workstation and a netbook it's simply more trouble than it's worth. Maybe take a look at rsnapshot instead, or duplicity or the like.
2008-12-21 12:52:16
7
I agree with you. I have to confess, though, that I just want to give it a try to see for myself :)

I didn't know about rsnapshot though, it builds upon rsync and provides more features.
2008-12-21 17:50:26
8
Thanks for your comments so far, please keep them coming.

Just a few remarks: Linux-Support is a must, so anything Mac or Windows only is straight out. Dropbox looks nice, but my upload speed sucks, so I need to have a local solution that only transfers the most important stuff elsewhere.

Any hardware suggestions? Foosel, your HTPC setup is a bit overpowered for my taste. I'm thinking more in the range of a 200MHz ARM processor or similar low-power CPUs.
2008-12-21 17:53:30
9
Thought so, wanted to mention it anyway ;)

Have you taken a look at the NSLU, something OpenWRT based plus an external harddrive (and in exchange for an existing router), or maybe an EeeBox (Atom based)?
2008-12-21 20:23:03
10
Dreamhost does not allow you to use their "normal" webspace for personal backups, the TOS states only data associated with your website(s) can be stored on their servers.

They do, however, provide a "Personal Backup" account which allows you to store 50GB of data using your normal plan on a separate server. However, this data is not backed up or guaranteed in any way. http://wiki.dreamhost.com/Pers … nal_Backup

Considering that S3 costs $7.50US to store 50GB of data per-month and a Dreamhost account is ~10$US a month, depending on pre-pay length, S3 would seem to be a better deal. You would, of course, have to factor in bandwidth costs with S3. You would probably use the EU servers, which have a slightly higher cost.

If you do use S3 I would also recommend looking into JungleDisk, it should fit all your requirements.

Good luck.
2008-12-22 04:42:55
geniusfreak
11
did you take a look at drobo? (http://www.drobo.com/)
it seems to fit most of your requirements for the NAS hardware, seems to be quite the hype and works fine with linux (http://drobo-utils.sourceforge.net/)
2008-12-23 00:32:31
12
Yes, the Drobo seems to fit my idea of automatic disk management very well. But as far as I understand, you need the Drobo *and* the separate Drobo Share module to access it from the network. Amazon.de tells me that would be about 750 Euro plus the hard drives. Far too much.
2008-12-23 00:40:57
13
I made this http://github.com/niky81/s3rba … ree/master to back up my important things on S3, try it if you want
2008-12-23 17:44:43
14
I also use Dreamhost as backup space. As a host, Dreamhost really isn't that good (I get around 5 downtimes of at least half an hour a month), but it works as a backup solution since they introduced their "personal backup", where every Dreamhost user gets 50GB of space to back up whatever one might need to back up. 50GB of backup space for around 8€/month is really cheap, plus each GB is only 0.10$/month. And if you need some quick webspace you can use their webhosting; they even allow you to compile your own PHP version, for example. Another downside might be the fact that DH considers backup space to be just that, a backup of existing data, so they don't back up their backup space the way providers do that offer nothing but network storage. On the upside, you can use rsync to back up your data.
2008-12-27 18:51:32
jan
15
RAID is good, but it's not backup.  Look at the sticky situation that Journalspace got into very recently by mistaking RAID for backup.  If you delete a file, it'll probably be deleted on the rest of the RAID array too, so it's really gone :(

Rsync-ing to a separate machine is a pretty damn good solution IMO. I think, though, that Amazon S3, while possibly an ideal solution, could be prohibitively expensive!
2009-01-03 21:11:12
16
Could we have an update on your backup solution if something has changed?
2010-01-13 23:01:03