Re: Looking for advice on rsync
From: Nick Landsberg (SPAMhukolauTRAP_at_SPAMworldnetTRAP.att.net)
Date: 10/17/04
- Next message: Michael Heiming: "Re: Fedora Core 2 does not detect my sound card."
- Previous message: Lawrence D'Oliveiro: "Re: kernel recompile minor problems: help"
- In reply to: Bill Unruh: "Re: Looking for advice on rsync"
- Next in thread: Tony Lawrence: "Re: Looking for advice on rsync"
Date: Sat, 16 Oct 2004 22:13:44 GMT
Bill Unruh wrote:
> "Tony Lawrence" <pcunix@gmail.com> writes:
>
> ]Yesterday I was at a client site where they explained that they wanted
> ]to keep a stand-by server up to date and ready to take over in case of
> ]main system failure. Fine, lots of people do that, and currently they
> ]are doing it by restoring backups every morning. What they were asking
> ]about was using rsync or some other mechanism to keep the machines
> ]more current.
>
> ]My first reaction was to question them about their app: it's
> ]apparently a mess of Basic programs that work on hundreds of different
> ]data files 24 x 7. I asked if they shut down the users for backup.
> ]They don't. I explained that the backup can't really be guaranteed
> ]consistent if users are writing data to files because obviously files
> ]are going to be backed up while they are being written to. As the
> ]files are related to each other (A/R header and detail files, indexes,
> ]etc.) you can have inconsistent versions on the backup media.
> ]Somebody told them (obviously incorrectly) that rsync could prevent
> ]this. There are databases that will let you replicate data while
> ]you run, but that's application level, and this app has no such
> ]ability. Rsync can't do any better than a backup for that.
>
> ]I talked about snapshots, and asked if they could shut down users for
> ]the very brief time it takes to do that. Nope. Can't ever stop the
> ]flow of data. I explained that rsync can't answer that problem any
> ]better than a tape backup can: files and indexes may be inconsistent
> ]with each other.
>
> ]I also wonder if rsync's rolling checksum might even make things
> ]better or worse. On the one hand we get less data transferred as
> ]opposed to just a rcp or whatever, but open files that may be getting
> ]writes during the checksum make things even more confusing.
>
> ]They can't change the app. They can't shut down users. Their bank says
> ]they have to have better disaster recovery. These things seem
> ]impossible to reconcile. My feeling is that if the bank's demands have
> ]to be met, then the user community HAS to put up with periods where
> ]they can't use the app. If they use snapshots, that period can be
> ]brief; otherwise it's going to be fairly long (13 GB of data to
> ]transfer).
>
>
> They need some sort of file locking mechanism within their program so that
> if the lock file is present, the app suspends its use of the file until it
> is freed. That way rsync, or whatever, could lock the file, do the update
> and then unlock it. The users would simply notice a brief hesitation in
> service-- something they already probably experience 10 times a day anyway.
>
>
Good idea, Bill, but Tony seems to have an overconstrained problem
on his hands.
1. You can't shut down the app for even a minute.
2. You can't change the app so that it can be suspended.
but ...
3. You must have a recoverable backup.
and ...
4. The app was not built with backups in mind (I specifically
do not use the term "designed").
Given the current state as I understand it, they do "backups"
once a day. (It is not clear whether they have ever attempted to
restart from these backups. Have they?) If a disaster should
happen they risk the loss of up to 24 hours of data. Let's
say 12 hours on average. That's assuming that the last backup
is usable and will give them a consistent view even as of 3 AM
that morning. If not, it's 3 AM the previous morning, if lucky,
or it may be 3 AM a week ago or something worse.
Given the above, what's the cost of losing 12+ hours'
worth of data when a disaster happens (in terms of lost
business, revenue, or whatever this company considers
important) versus the cost of delaying (losing?) 5 minutes'
worth of transactions on a nightly basis? (Or however
long the snapshot takes.) In addition, how much time may be
spent trying to restore sanity to a messed-up backup?
Quantifying all the alternatives
and putting the numbers in front of the customer
will go a long way to convince them of what's
the right decision for them.
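As a rough sketch of what "putting the numbers in front of the
customer" could look like, here is a small Python calculation.
Every figure in it (disaster frequency, cost per lost hour, cleanup
time, cost per paused minute) is a made-up placeholder, not something
from this thread; the customer would have to supply real values.

```python
# Illustrative only: all input figures below are hypothetical placeholders.

def annual_disaster_cost(disasters_per_year, hours_of_data_lost,
                         cost_per_lost_hour, cleanup_hours=0.0,
                         cleanup_cost_per_hour=0.0):
    """Expected yearly cost of lost data plus cleanup after a disaster."""
    per_event = (hours_of_data_lost * cost_per_lost_hour
                 + cleanup_hours * cleanup_cost_per_hour)
    return disasters_per_year * per_event


def annual_pause_cost(pause_minutes_per_night, cost_per_paused_minute,
                      nights_per_year=365):
    """Yearly cost of pausing users each night while a snapshot is taken."""
    return pause_minutes_per_night * cost_per_paused_minute * nights_per_year


if __name__ == "__main__":
    # Assumed: one disaster every four years, 12 hours of data lost on
    # average, plus 6 hours of cleanup on an inconsistent backup.
    disaster = annual_disaster_cost(0.25, 12, 5000, 6, 500)
    # Assumed: a 5-minute nightly pause for the snapshot.
    pause = annual_pause_cost(5, 10)
    print(f"expected yearly cost without snapshots: {disaster:.0f}")
    print(f"yearly cost of the nightly pause:       {pause:.0f}")
```

Which side comes out cheaper depends entirely on the inputs, which
is the point: the customer sees the trade-off in their own numbers.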
NPL
P.S. - It might be a good idea to ask them if they have
either records or war stories to tell after even a minor
disaster, like a disk crash. If it took them 6 hours
to recover from a disk crash, the scars may still be
painful enough that they would go for re-structuring
their code to handle a "the file is locked" condition
rather than risk another 6 hours' worth of downtime.
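For what it's worth, the lock-file convention Bill describes can be
sketched in a few lines. This is Python, not their Basic, and only an
illustration of the pattern: the function name, polling interval,
timeout, and the path in the usage note are all assumptions of mine,
and both the app and the backup job would have to honor the same
lock file for it to mean anything.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def lock_file(path, poll_seconds=0.1, timeout_seconds=30.0):
    """Hold an exclusive lock file at `path` while the block runs."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only
            # one process at a time can create the lock file.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            # Someone else holds the lock: wait briefly and retry.
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {path}")
            time.sleep(poll_seconds)
    try:
        os.close(fd)
        yield
    finally:
        # Release the lock so the other side can proceed.
        os.remove(path)
```

The backup job would wrap its copy in something like
`with lock_file("/data/ar.lock"):` (a hypothetical path), and the app
would do the same around each group of related writes. One weakness of
this simple form: a process that dies while holding the lock leaves
the file behind, so some cleanup rule for stale locks would be needed.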
-- "It is impossible to make anything foolproof because fools are so ingenious" - A. Bloch