Re: IBM pSeries Self-Healing Technology.

From: Rich Gibbs (rgibbs_at_REMOVEhis.com)
Date: 04/05/04


Date: Mon, 05 Apr 2004 17:57:23 -0400

Mike Cox said the following, on 04/05/04 13:05:
> Rich Gibbs <rgibbs@REMOVEhis.com> wrote in message news:<4070d155@news101.his.com>...
>
>>TuxTrax said the following, on 04/04/04 20:59:
>>
>>>In article <3d6111f1.0403281110.7c20bb54@posting.google.com>,
>>> mikecoxlinux@yahoo.com (Mike Cox) wrote:
>>>
>>>
>>
>> [snip]
>>
>>>>That is what x86 needs. An architecture of reliability and
>>>>self-healing. x86 has the speed but now it needs the reliability and
>>>>fail-over capabilities. Until then, AIX and pSeries will do the real
>>>>work, and x86/Linux will still be on non-mission critical stuff like
>>>>software development where you have a back-up of everything in CVS
>>>>(and customers don't get mad if you have to reboot or you lose their
>>>>data for that day).
>>>
>>>
>>>Mike, that is called redundancy, and it is both expensive and
>>>unnecessary in consumer electronics. This ain't NASA, it's email and the
>>>web.
>>>
>>
>>The other thing to keep in mind is that, just as there is a range of
>>application requirements, there is a range of ways to address
>>reliability and fail-over. There are applications for which providing
>>duplicate server facilities can get to entirely acceptable reliability
>>levels without the need for very expensive hardware.
>>
>>One example, which I have implemented in a couple of different places,
>>is delivery of within-day, trade-by-trade security prices to a trading
>>operation. You use two servers, two incoming feeds (each tee'd to both
>>servers), duplicate internal network backbones, and so on. You can
>>easily get a failover time of < 1 second, and continuous uptimes of
>>several years, with the capacity to handle many hundreds of updates per
>>second.
>
>
> Ouch!
>
> You better look at that setup again, because you could very well have
> data integrity issues. What happens when one of those servers has a
> problem? How do you know which server has the correct data? That is
> why a database management system *must* have transaction support! As
> far as I know, there aren't many RDBMS except for Oracle that can
> provide transaction support for a cluster of servers. And that
> version of Oracle is *much* more expensive than a pSeries!
>

That setup has been looked at extensively, not only by me, and is
actually USED extensively in trading environments today. (Installations
from large vendors like Reuters use the same approach.) This is _not_ a
database system in the sense that an airline reservation system is; I
would say it is more accurately described as a data delivery or
distribution system. I meant it as an example of a specialized
application that could achieve high reliability without specialized
hardware -- which it does in fact do.

Both servers process all the updates from both incoming feeds. Those
feeds come from (for example) the Stock Exchange, and they are the
_only_ source of updates. All the incoming updates are sequenced and
time-stamped, and there is guaranteed to be a "heartbeat" update at
least every couple of seconds. So the server is either processing
updates, or it's effectively dead.
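
To make that mechanism concrete, here is a minimal Python sketch of how one server might merge the two tee'd copies of a sequenced feed and treat silence (no updates, no heartbeat) as death. It assumes each update carries an integer sequence number and that heartbeats are sequenced like any other update; the names `Update` and `FeedArbiter`, the field names, and the 5-second timeout are hypothetical, not taken from any exchange protocol or vendor product.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Update:
    seq: int          # source-assigned sequence number (assumed field name)
    timestamp: float  # source time stamp in seconds (assumed field name)
    payload: bytes    # price data, or empty for a heartbeat (assumption)


class FeedArbiter:
    """Merge the A and B copies of one sequenced feed (hypothetical sketch).

    Both copies carry identical sequence numbers, so the first copy of each
    number to arrive is processed and the later duplicate is dropped. A number
    skipped on both copies so far is remembered as missing, so a late arrival
    on the slower copy can still fill the hole.
    """

    def __init__(self, heartbeat_timeout: float = 5.0) -> None:
        self.heartbeat_timeout = heartbeat_timeout  # assumed; > heartbeat interval
        self.next_seq = 1                # next sequence number expected
        self.missing: Set[int] = set()   # skipped numbers not yet seen on either copy
        self.last_seen: Optional[float] = None  # local time of the last message

    def on_update(self, update: Update, now: float) -> Optional[Update]:
        """Return the update if it is new, or None if it is a duplicate."""
        self.last_seen = now  # any traffic, heartbeats included, proves liveness
        if update.seq < self.next_seq:
            if update.seq in self.missing:
                self.missing.discard(update.seq)  # late fill from the other copy
                return update
            return None  # already processed from the other copy
        # Anything between the expected number and this one has not arrived yet.
        self.missing.update(range(self.next_seq, update.seq))
        self.next_seq = update.seq + 1
        return update

    def is_alive(self, now: float) -> bool:
        """A server hearing nothing, not even a heartbeat, is effectively dead."""
        return (self.last_seen is not None
                and now - self.last_seen <= self.heartbeat_timeout)


if __name__ == "__main__":
    arb = FeedArbiter()
    first = Update(seq=1, timestamp=0.00, payload=b"IBM 90.10")
    print(arb.on_update(first, now=0.00) is first)  # copy from feed A: True
    print(arb.on_update(first, now=0.01))           # same update from feed B: None
```

Under these assumptions both servers apply the same sequenced stream from the same single source, so there is nothing to reconcile between them; a client only has to decide which server is still alive.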

The RDBMS was introduced by your assumption, not by me; in fact, there
is no RDBMS used. The entire "data base" is just (effectively) one
table, with ~ 1 million rows, each row being ~400 bytes. So the whole
thing fits into memory. This is not a typical data base application,
for several reasons:

    -- The most important thing is to deliver the most recent value to
the user's workstation.
    -- There is only one source of updates, which is by definition right.
    -- The history of updates is valuable for later analysis / research
purposes, but is of very little immediate operational value. You keep
the history essentially by "tape recording" the incoming update stream.
    -- Unlike most databases, the volume of updates in a typical day is
much larger than the total size of the database.
    -- The maximum rate at which updates can arrive is known in advance
(from the maximum capacity of the data feed).
    -- The statistical properties of the updates are pretty well known.
For example, prices and trading volumes change frequently; the trading
currency does not.
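
Those properties point to a very simple structure: a table keyed by instrument that holds only the latest fields, plus an append-only "tape" of the raw stream for later research. At the sizes given above, the raw table is on the order of 1,000,000 x 400 bytes = 400 MB, which is why it fits in memory. The Python sketch below illustrates the logic only (a dict of dicts carries far more per-row overhead than 400 bytes); the class name `LastValueCache`, the `(seq, symbol, fields)` update format, and the JSON-lines tape are hypothetical choices, not the design of any particular product.

```python
import io
import json
from typing import Any, Dict, IO


class LastValueCache:
    """In-memory table holding only the latest fields per instrument.

    Hypothetical sketch: every raw update is also appended to a "tape" (any
    writable text stream) so the full history can be replayed later for
    research, while the table itself never grows with the update volume.
    """

    def __init__(self, tape: IO[str]) -> None:
        self.rows: Dict[str, Dict[str, Any]] = {}
        self.tape = tape

    def apply(self, seq: int, symbol: str, fields: Dict[str, Any]) -> None:
        # "Tape record" the update exactly as received, one JSON line each.
        record = {"seq": seq, "symbol": symbol, "fields": fields}
        self.tape.write(json.dumps(record) + "\n")
        # Overwrite only the fields this update carries; fields that rarely
        # change (such as the trading currency) keep their earlier value.
        self.rows.setdefault(symbol, {}).update(fields)

    def snapshot(self, symbol: str) -> Dict[str, Any]:
        """The latest known fields, as a workstation would be sent them."""
        return dict(self.rows.get(symbol, {}))


if __name__ == "__main__":
    tape = io.StringIO()  # stands in for the real recording file
    cache = LastValueCache(tape)
    # Made-up values for illustration only.
    cache.apply(1, "IBM", {"currency": "USD", "last": 90.10, "volume": 100})
    cache.apply(2, "IBM", {"last": 90.15, "volume": 300})
    print(cache.snapshot("IBM"))  # {'currency': 'USD', 'last': 90.15, 'volume': 300}
    print(len(tape.getvalue().splitlines()))  # 2
```

The history lives only on the tape, so a day's update volume far larger than the table costs disk space for the recording, not memory in the server.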

Believe me, I do understand database integrity issues. When you work in
an investment banking environment, your systems get a lot of scrutiny
from a lot of different sets of auditors. I would never advocate using
a system like I described for trade processing and position keeping --
that _does_ belong in an RDBMS with tight integrity controls and
comprehensive auditing facilities.

> If I were your employer, I'd be quite worried about the design of the
> system because if there is a problem with one of the servers, you may
> not know *which* one has the correct data.

Fortunately, my employers and clients understand the requirements of
their business better than you do. (I don't mean that as an insult: it
is a specialized business that requires specialized knowledge.) FWIW,
we never received anything but positive comments in our reviews by the
SEC, the Federal Reserve, the FSA (in the UK), and so on.

-- 
Rich Gibbs
rgibbs@his.com

