Re: How to PostMortem?

From: w_tom (w_tom1_at_hotmail.com)
Date: 04/18/04

  • Next message: Helpm: "Forcing Redhat 9 installer to use 'lower' architecture (i686)?"
    Date: Sun, 18 Apr 2004 13:51:24 -0400
    
    

      It is convenient to verify fans. But if one stopped fan
    caused overheating in a 70 degree F room, then the computer was
    defective: it could not operate at another normal
    temperature, 100 degrees F. More often, some cite heat
    and surges as reasons for failure only because they lack
    technical knowledge.

      First thing to check if doors are binding is the
    foundation. Same applies to computer. It is built on a
    foundation called power supply. Procedures to verify power
    supply subsystem - all three components of that system - were
    posted as "Computer doesnt start at all" in
    alt.comp.hardware on 10 Jan 2004 at
    http://tinyurl.com/2t69q
    or
    "I think my power supply is dead" in alt.comp.hardware on 5
    Feb 2004 at http://tinyurl.com/yvbw9 . Notice a simple and
    essential tool is required - 3.5 digit multimeter. You need
    that tool to perform this and other inspections.
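
      Once the multimeter readings are written down, comparing them
    to nominal values is simple arithmetic. A minimal sketch follows.
    It assumes the commonly cited ATX tolerances (5% on the positive
    rails, 10% on -12V), which are not quoted from the posts above.
    The rail names and example readings are hypothetical.

```python
# Nominal DC rail voltages and allowed fractional deviation.
# ASSUMPTION: 5% / 10% are the commonly cited ATX tolerances; check the
# specification for your own supply before relying on these limits.
RAILS = {
    "+3.3V": (3.3, 0.05),
    "+5V": (5.0, 0.05),
    "+12V": (12.0, 0.05),
    "-12V": (-12.0, 0.10),
    "+5VSB": (5.0, 0.05),
}


def check_rail(name, measured):
    """Return True if a measured voltage is inside the allowed window."""
    nominal, tol = RAILS[name]
    margin = abs(nominal) * tol
    return nominal - margin <= measured <= nominal + margin


def check_readings(readings):
    """Map each rail name to True (in range) or False (out of range)."""
    return {name: check_rail(name, volts) for name, volts in readings.items()}


if __name__ == "__main__":
    # Hypothetical multimeter readings taken with the system under load.
    readings = {"+3.3V": 3.31, "+5V": 4.72, "+12V": 12.1}
    for name, ok in check_readings(readings).items():
        print(name, "in range" if ok else "OUT OF RANGE")
```

      Take the readings with the computer under load. A rail that
    sits inside its window at idle can still sag outside it when
    drives and peripherals are working.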

      Second, run comprehensive diagnostics on the system. All
    responsible computer systems have comprehensive diagnostics
    provided free. Best to run these tests in a 100 degree F room
    or by selectively heating computer components with a hairdryer
    on high. Heat tends to make a defective component more
    obvious. (BTW, this is why so many will fix a computer by
    installing more fans rather than replace the defective
    component).

      For example, defective memory often works just fine in a 70
    degree room. Heat it to 100 degrees, then the comprehensive
    memory diagnostic finds failures.
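
      The idea behind a memory diagnostic is write, read back,
    compare. A toy sketch of that idea follows. It is only an
    illustration: a user-space script touches a small part of RAM
    through the operating system and is no substitute for the
    manufacturer's diagnostic or a boot-time memory tester. The
    function name and the pattern list are hypothetical.

```python
def pattern_test(size_bytes, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Fill a buffer with each byte pattern and verify it reads back.

    Returns a list of (pattern, offset) pairs for every byte that did
    not read back as written. An empty list means no mismatch was seen.
    """
    buf = bytearray(size_bytes)
    failures = []
    for pattern in patterns:
        expected = bytes([pattern]) * size_bytes
        buf[:] = expected              # write the pattern
        if buf != expected:            # read back and compare
            for offset, value in enumerate(buf):
                if value != pattern:
                    failures.append((pattern, offset))
    return failures


if __name__ == "__main__":
    bad = pattern_test(16 * 1024 * 1024)  # 16 MB buffer, arbitrary size
    print("mismatches:", len(bad))
```

      Run the real diagnostic once at room temperature and again
    with the memory warmed. Failures that appear only when warm
    point to the defective part.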

      In the meantime, some electricians wire buildings with the AC electric
    wire pushed into the back of the receptacle. Sufficient for lights
    and toasters. Woefully unacceptable for computers. Follow AC
    electric from computer's receptacle to every receptacle, back
    to breaker box. Simply remove cover plate and inspect. Wires
    must be firmly attached to (wrapped around) screws on side of
    each receptacle. Those stab-lock type connections have caused
    computer crashes.

      If not sure what to look for, then buy a $0.40 receptacle at
    Home Depot to understand this visual inspection.

      Surge protectors and UPSes are mostly promoted by urban
    myth. For example, a surge occurs typically once every eight
    years. If computer crashed due to surge, then you have
    hardware damage. If a brownout occurred, well, many brownouts
    will cause blinking clocks and still not affect a computer.
    Intel specs are quite blunt about this. Incandescent lights
    can dim to less than 40% and a computer, with full load of
    peripherals, must still startup and work unaffected. Again,
    basic specs that any technically informed computer assembler
    should know.
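
      To put a number on that dimming, a rough sketch follows. It
    assumes the usual rule of thumb that incandescent light output
    varies as voltage to about the 3.4 power. That exponent and the
    120 V nominal line voltage are assumptions, not figures from the
    Intel specs.

```python
# ASSUMPTION: rule-of-thumb exponent relating incandescent light output
# to applied voltage (light is proportional to V ** 3.4). Real lamps vary.
LIGHT_EXPONENT = 3.4


def voltage_for_brightness(fraction, nominal_volts=120.0):
    """Estimate the line voltage at which a lamp gives `fraction` of
    its normal light output, by inverting light ~ V ** LIGHT_EXPONENT."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return nominal_volts * fraction ** (1.0 / LIGHT_EXPONENT)


if __name__ == "__main__":
    volts = voltage_for_brightness(0.40)
    print("40%% brightness is roughly %.0f V on a 120 V line" % volts)
```

      Under those assumptions, lights at 40% correspond to a line
    voltage in the low 90s on a 120 V circuit. A properly specified
    supply is expected to keep working there.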

      This of course assumes you bought a power supply on
    electrical specs - not the specification called price. With
    so many technically naive N Americans, Asian manufacturers have
    discovered a very profitable business. They sell supplies
    missing essential functions at below $60 full retail. They
    avoid providing specifications since their profits are from
    the technically naive (those who quickly jump to heat and
    surges as reasons for failure). Review specifications from
    that power supply vendor. Did he provide them? Therein lies
    a very good reason for computer failure. Power supply
    purchased using bean counter mentality rather than on a long
    list of numerical and technical specifications.

      Power supply must not even damage computer. Just another
    function missing in 'discount' supplies. And yet many will
    say power supply can damage a computer only because they saw
    same. IOW blame placed elsewhere rather than admit computer
    damage was directly traceable to their technical naivety.
    Time to learn if the power supply even meets minimally
    acceptable standards (and not foolishly try to fix the problem
    with more watts).

      Provided are the beginnings of how one does a post mortem -
    using numerical specs and science rather than speculation
    about heat. Good luck in your many inspections and data
    collection.

    "Dr. Sven Geier" wrote:
    > Heya all -- I just had the very first linux server crash of my life
    > (and I've been noodling around with Slack since the 0.9something
    > kernels). After 186 days uptime, my machine just simply hung. No
    > response. I returned home from work and
    >
    > - the screen was blank
    > - C-A-F1 or C-A-F2 didn't do a thing (I don't run X)
    > - No response to a ping from the wife's WinBox.
    >
    > I hit the reset-button and it checked its HD and all and came up
    > just fine.
    >
    > Now I strongly suspect the hardware, of course, as software doesn't
    > exactly tend to "suddenly quit after half a year" for no particular
    > reason. But since I've never actually had to do such a thing, I
    > have no idea where to start trying to figure out what exactly
    > happened. All the logs (in /var/log) just simply happen to stop at
    > 19:22pst last night with no *obvious* sign of trouble, but I can't
    > say that I'd necessarily recognize it if it bit me in the nose.
    >
    > The setup is a plain vanilla Slack9(point zero, but swaret-maintained)
    > on a generic Soyo motherboard with generic ram and HD and all --
    > nothing terribly exotic in there. No USB or FireWire devices, not
    > even a soundcard or such.
    >
    > I figure if some part of the hardware is marginal, there might be
    > some way to test that(??) Anybody got an idea where to start?
    >
    > Thank you very much in advance
    >
    > -- Sven

