Re: Setting up a Linux network storage cluster
From: Peter T. Breuer (ptb_at_lab.it.uc3m.es)
Date: 11/26/04
- In reply to: Johannes Petersson: "Re: Setting up a Linux network storage cluster"
Date: Fri, 26 Nov 2004 13:04:23 +0100
Johannes Petersson <jp@data-tronic.se> wrote:
> ptb@lab.it.uc3m.es (Peter T. Breuer) wrote in message news:<745c72-1np.ln1@news.it.uc3m.es>...
> > Fine. But why a cluster? Can't you just use them individually as
> > backup? They don't seem to need to be available in real time, so why
> > bother clustering? No failover needed.
> >
> Yes, failover is needed since I want to be able to replace one failing
> computer without taking the whole backup solution down and manually
> rescuing the data in the failing computer.
But why FAILOVER of the data? Why not just two copies of the data? You
can move the IP address automatically from one node to the other if you
like, but I see no need for realtime update of the data between them.
Just two copies, done daily. You can copy one from the other at daily
intervals too, and hence have two days of backups online, or keep
the even days on one machine and the odd days on the other machine, and
thus have twice the span of backups available, missing out at most
one day if one machine goes down.
I really don't see any need for realtime failover of the data.
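To make the alternate-day scheme concrete, here is a minimal Python sketch.
The host names and the helper functions are made up for illustration, and
it assumes "even/odd" is decided by a running day count (so the nodes keep
alternating across month ends) rather than by the day of the month.

```python
from datetime import date, timedelta

# Hypothetical host names for the two backup machines.
BACKUP_NODES = ("backup-a", "backup-b")


def backup_node_for(day: date) -> str:
    """Return the machine that receives the backup taken on `day`.

    Uses the parity of the running day number, not the day of the month,
    so consecutive days always land on different machines.
    """
    return BACKUP_NODES[day.toordinal() % 2]


def days_still_online(today: date, keep_days: int, dead_node: str) -> list:
    """List the backup days (newest first) that survive losing `dead_node`."""
    days = [today - timedelta(days=i) for i in range(keep_days)]
    return [d for d in days if backup_node_for(d) != dead_node]


if __name__ == "__main__":
    today = date(2004, 11, 26)
    # Show which of the last six daily backups remain if one machine dies.
    for d in days_still_online(today, keep_days=6, dead_node=BACKUP_NODES[0]):
        print(d.isoformat(), "->", backup_node_for(d))
```

Whichever machine dies, every second day of the retained history is still
there on the other one, with no realtime mirroring involved.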
> > Huge. There's no good way to make use of such very heterogeneous sizes
> > efficiently and safely. The best thing to do would be to take the drives
> > of similar size, say 5 of 120GB, and put them into one machine as a
> > RAID5 device. Do the same with those of size 60GB, and the same with
> > those of size 40GB, and the same with those of size 20GB.
> >
> Okay, this suggestion is probably a good idea, and I will try to make
> this happen.
>
> > At that point you can start thinking about how to arrange them into
> > pairs of failover devices in a cluster, if you really wanted to, but I
> > don't see the point.
> >
> This is also very interesting. This is where openMosix once again
What on earth has openMosix got to do with anything? I don't know where
you get that word from. You are not doing computation, which is what
Mosix is for.
> comes in or? The point of this when the computers already have hard
> drives internally raided might not be that big. But the original point
> was not to have raided hard drives in each computer but to set up all
> the computers to raid between each other in the network / cluster.
There is no sense to that, and if you were to do it, it would not
follow that design!
> > Well, the canonical way. But what's the difficulty and WHY? It's crazy.
> >
> Why is it crazy? It would give us several terabytes of storage at
> basically zero cost (the only cost would be my monthly salary, which
Not the way you are doing it, it wouldn't. Please take notice instead of
skipping the explanation.
> is basically zero! :). What do you mean by the canonical way, have
The normal way! Form raid arrays on the nodes, and do the backup to
alternate backup nodes on alternate days. Failover the floating IP for
recoveries from one to the other if you must, but I really don't see
why. It's no big deal to lose a day of backups if you also have the
previous day still available, and the next day. If you were that
paranoid you would be spooling the backups to tape in the background!
If you really wanted to form networked raid arrays, you would do it in
pairs: from two computers with two partitions each, form two raided
devices - say mirrors - one on each node, each with one local component
and one remote component. The nodes would fail over the IP and do
slightly complicated things to present the appropriate local device as a
replacement export in case the other node fails.
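As a rough illustration of that pairing, the Python sketch below lays out
which partition goes where and builds (but does not run) an mdadm command
line for one node's mirror. The node names, partition labels and device
paths are hypothetical, and it assumes the peer's partition has already
been imported as a local block device by some other means.

```python
def mirror_pair_layout(node_a: str, node_b: str) -> dict:
    """Describe the two mirrors formed by a pair of nodes.

    Each node contributes two partitions: part0 stays in its own mirror,
    part1 is exported to the peer as the remote half of the peer's mirror.
    """
    layout = {}
    for local, peer in ((node_a, node_b), (node_b, node_a)):
        layout[local] = {
            "local": f"{local}:part0",
            "remote": f"{peer}:part1",
        }
    return layout


def mdadm_create_argv(md_device: str, local_dev: str, imported_dev: str) -> list:
    """Build the argument list for creating a two-component mirror.

    Nothing is executed here; the list is only printed for inspection.
    """
    return [
        "mdadm", "--create", md_device,
        "--level=1", "--raid-devices=2",
        local_dev, imported_dev,
    ]


if __name__ == "__main__":
    for node, parts in mirror_pair_layout("node1", "node2").items():
        print(node, "mirrors", parts["local"], "with", parts["remote"])
    # Hypothetical device paths: a local partition and an imported remote one.
    print(" ".join(mdadm_create_argv("/dev/md0", "/dev/hda3", "/dev/nbd0")))
```

The failover part (moving the IP and re-exporting the surviving half) is
deliberately left out of the sketch.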
> each of the computers to map the subsequent one with for example NFS?
?? No, although you can export the raided devices via nfs if you wanted
to.
> > You can't - it's a crazy idea. While a RAID5 of 20 disks is safer
> > against disaster than a linear aggregate of the 20, its chances of
> > going down per day are about 20*19p^2/2 = 190p^2, as opposed to 20p, where p is the
> > probability of one disk dying per day. So it's about 1/(10p) times less
> > likely to die on any particular day. HOWEVER, that forgets that
> > failures are not independent - usually heat or spike related. The real
> > probabilities are higher. And you also have a much higher absolute
> > probability of failure per se, simply because you are using 20 disks
> > instead of one. And then there are the network brownouts ...
> >
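The comparison in the quoted paragraph can be checked numerically. The
following Python sketch assumes exactly what the quoted text warns is
unrealistic, namely independent disk failures with the same daily
probability p, and it ignores rebuild time. The function names and the
sample value of p are made up for illustration.

```python
def p_linear_fails(p: float, n: int = 20) -> float:
    """A linear aggregate of n disks is lost if any one disk dies."""
    return 1.0 - (1.0 - p) ** n


def p_raid5_fails(p: float, n: int = 20) -> float:
    """A RAID5 of n disks is lost if two or more disks die on the same day."""
    none_die = (1.0 - p) ** n
    exactly_one_dies = n * p * (1.0 - p) ** (n - 1)
    return 1.0 - none_die - exactly_one_dies


if __name__ == "__main__":
    p = 1e-4  # assumed daily failure probability of a single disk
    linear = p_linear_fails(p)
    raid5 = p_raid5_fails(p)
    print("linear aggregate:", linear)
    print("raid5           :", raid5)
    print("ratio           :", linear / raid5)
```

The linear figure grows like p and the RAID5 figure like p^2, so the ratio
between them grows like 1/p up to a constant factor. Correlated failures
(heat, spikes, network trouble) push the real RAID5 figure up.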
> I'm not saying that every computer should hold a copy of every
> other; I would want them to be raided maybe three ways within the
Three ways? You mean three mirror copies, or raid5 with three components?
> network nodes. So if one computer fails it would be possible to detach
> it and replace it on the fly.
Oh - well, that's a little more sensible and is a classic configuration.
In that configuration the three nodes each have three partitions. They
raid together two remote partitions (one each from the other two nodes)
and one local partition to form a single raid5 partition which they
export (via nfs, for example).
Failover is complicated, however. I don't think I want to describe it!
I don't think I even want to think about it. It's insane because of its
dependence on the net to even begin to function normally.
You'd get more mileage out of raiding (raid5) two local partitions and
one remote partition on each node, but then you need four local
partitions per node, if I count correctly. Each node has to supply one
extra partition to each of the two other remote nodes, as well as bind
two of its own partitions into its own raid5. Four in total, locally.
And when I say partition, you can take that as "disk", if you prefer.
That configuration would be sensible. The other is nonsensical because
losing the net ruins all the raids. This configuration survives a
network dropout without any problem.
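The difference between the two layouts comes down to how many raid5
components disappear when the net goes away. A minimal Python sketch,
assuming a raid5 array keeps running (degraded) with at most one missing
component; the tags and function name are made up for illustration:

```python
# RAID5 keeps working, in degraded mode, with one component missing.
RAID5_TOLERATED_LOSSES = 1


def survives_network_loss(components: list) -> bool:
    """Check whether one node's raid5 stays up when every remote part vanishes.

    `components` is a list of "local" / "remote" tags, one per component.
    """
    lost = sum(1 for c in components if c == "remote")
    return lost <= RAID5_TOLERATED_LOSSES


# First layout: one local partition plus one from each of the other two nodes.
one_local_two_remote = ["local", "remote", "remote"]
# Second layout: two local partitions plus one remote partition.
two_local_one_remote = ["local", "local", "remote"]

if __name__ == "__main__":
    print("1 local + 2 remote survives:", survives_network_loss(one_local_two_remote))
    print("2 local + 1 remote survives:", survives_network_loss(two_local_one_remote))
```

With two local components the array only ever loses one member to a
network dropout, which is the whole point of preferring that layout.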
> > local raid topology is usually a quad or triple (1 local device and 2
> > or 3 remote devices forming a raid5), and the whole topology of the nodes
> > is a torus or something similar. Thus each node would serve out one
> > raid5 device and import two or three raid5 devices from neighbours,
> > each of the latter comprising at least one of its own local disks.
> >
> This is basically what I'm talking about, and the way to solve this
> would be?
See above.
> Using NFS or samba or what to make the nodes import / export
> to each other and then manually run rsync? There has to be a better way
Eh? Oh - don't you know about network block devices? There is nothing
to "solve".
> than this that is always up to date and always keeps the computers in
> sync with each other??
You don't care about them being in sync - it's not a realtime
application.
> > > - I've looked into setting up all the computers as an openMosix cluster
> > > with oMFS (openMosix File System) but I don't know if this actually
> >
> > This is not what you want - that sort of thing is for computation, not
> > storage.
> >
> Okay, so what is it that I want? I would call it a storage cluster but
> how do I set that up and what software / kernel patches do I need?
Just the normal stuff. Look at the high availability pages.
> > > - Then there is the possibility of using some sort of Logical Volume
> > > Management System, like EVMS or LVM, maybe EVMS combined with a
> > > cluster is needed?
> >
> > That would be over the TOP of the raid device you make. Forget it for
> > now.
> >
> Well, I'm not really sure I should forget it since I want to make my
> network or cluster into the raid device and then I should probably
> put EVMS or LVM on top of that.
Sure. Forget it for now. It'll go on top of the raid devices.
Peter