Re: Using Linux for data archival



Cyber Punk wrote:
I currently have lots of data that I'd like to archive with varying
degrees of reliability. The data I have consists of:

1) Important documents - needs to be encrypted and redundantly stored.
I can use Truecrypt for encryption.

2) Large files of non-essential data I'd just like to have easily
accessible. DVD rips, isos, music.

3) Many small files of non-essential data, such as website wgets or
saved webpages.

Can anyone recommend:
1) The best Linux filesystem to use for such data; doesn't corrupt
easily, remains quick for a few large files/many small files, less
prone to data fragmentation.

2) What open source data archival software I should use that has on
average a high compression ratio and a recovery record to help recover
most/all of the archive in the event of data corruption.

3) Whether it is better to store data as tarfiles & compressed with a
recovery record, or uncompressed without being tarred and no recovery
record.

4) One insidious way hard drives fail is that files start
disappearing. Is there a way of getting Linux to report missing
files?

5) My file bookkeeping was less than perfect and sometimes I have
multiple copies of files of the same name. Is there a way of copying
everything into a large hard drive but getting Linux to only overwrite
clashing file names if they are newer?

6) Whether statistically speaking with some thought of cost, one is
better off with RAID arrays or just backing up key data to DVDs.

Thanks.
What I have here may work for you.

I have a Debian Linux server which contains ALL my data that I don't want to lose, and serves three desktop machines using SMB. It's a 6 year old chassis with very little RAM and no screen at all.

Even my mail clients store all the mail on it, and I did toy with web stuff, but decided a list of bookmarks and browsing history wasn't that important.

It has a second hard drive, and every night a cron job runs rdiff to update the second drive so that it is a copy of the first.
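To give an idea of what a nightly one-way copy involves, here is a minimal Python sketch. It is not what rdiff does (no deltas, no history); it simply copies any file that is missing on the backup side or newer on the main side, which is also the "only overwrite if newer" behaviour asked about in question 5. The function name and the paths in the usage note are made up for illustration.

```python
import os
import shutil

def mirror_newer(src_root, dst_root):
    """Copy files from src_root into dst_root when the destination copy
    is missing or older. Returns the relative paths that were copied.
    Illustrative helper only, not part of any existing tool."""
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(dst_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_dir, name)
            # Overwrite only if the destination is absent or strictly older.
            if (not os.path.exists(dst)
                    or os.path.getmtime(src) > os.path.getmtime(dst)):
                shutil.copy2(src, dst)  # copy2 keeps the modification time
                copied.append(os.path.normpath(os.path.join(rel, name)))
    return sorted(copied)
```

Something like mirror_newer("/data", "/backup") (hypothetical mount points) could be run from the nightly cron job. Note that it never deletes anything on the backup side, so files removed from the main disk stay in the copy.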

I looked at burning DVDs, but after a short while the cost of doing that was more than the cost of the second hard drive. And I had filename issues.

The second drive is twice as big as the first one. When the first one fills up, I will make the second one the main data disk, get one twice as big again, and use that as backup.

The advantages of doing this are:-

- Having a file server integrates well with the desktop machines: if you get in the habit of using the server for everything, there is no extra action required to ensure the data is on there.

- Using the second hard drive to auto-backup the first is, if automated, a huge boon. Unlike RAID, you actually have a *copy* of everything, so if you screw up a file as I did yesterday, the original is in the backup from last night. No need to find a backup and do anything untarrish.

- If either of the disks on the server goes pear-shaped, you have the other. RAID can itself go pear-shaped. I always prefer mirroring to RAID if the data is slow-moving enough. Use RAID to keep a fast-moving data handling machine up 24x7; don't use it to preserve archival material.

- The data is instantly accessible. No need to store DVDs, find them, insert them and fiddle.
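On question 4 (files quietly disappearing): with a second copy on hand, one rough way to spot missing files is to list what exists in the backup but no longer exists on the main disk. The sketch below assumes both trees share the same layout; the function names are hypothetical.

```python
import os

def list_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found

def missing_from_main(main_root, backup_root):
    """Files present in the backup tree but absent from the main tree."""
    return sorted(list_files(backup_root) - list_files(main_root))
```

This will also list files you deleted on purpose since the last backup, so the output needs a human eye rather than blind trust.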


If you are truly paranoid, get a friend who is likewise, and back up each other's data over the Internet. That works even if your machines get stolen.

