Re: how can a bit be off in memory?



On Sat, 30 Jun 2007 11:15:47 +0100, The Natural Philosopher wrote:

ray wrote:
On Fri, 29 Jun 2007 23:58:48 +0200, Charles T. Smith wrote:

Vim started crashing on me, particularly when I tried to open new
lines. I finally checked it out with rpm and a newly downloaded copy of
vim's rpm and discovered that exactly one byte, deep into vim, was
wrong.

I rebooted my machine (which has been super-solid for years) - and the
difference was gone.

So, what are the opinions - did I run into a hardware glitch, or was
there a freaky issue with memory mapping?

I think I'd run 'badblocks' on the disk.

Oh..I had this years ago - and repeatably too..on a machine with some
third party hardware in it.

Two bytes were FF on a file loaded from the floppy disk.

It turned out to be a timing issue: at a given point the DMA was doing its
stuff, and the IO address momentarily passed through the one assigned to
the 3rd party card. The card was so slow at decoding it that it thought the
address was still there when the IO line went up..and so the card grabbed
the bus and plonked an 'all ones' 16 bit word on the thing..

My guess is you have some marginal hardware in the system, and maybe the
weather was warm, and the transfer from disk just happened to be in an
area of memory that excited the bad hardware..

In other words a confluence of events conspired..

Never believe that computers are 100% reliable. A friend spent some time
years ago developing hardware: they ran into an interrupt timing issue.
Every 4-5 hours the machine would crash when a timer interrupt interrupted
a particular piece of code. For reasons deep and complex, they couldn't
turn interrupts off, but they managed to re-code the bit of code so the
offending part was very small, and they calculated the crash would occur
only once in every 5 years or so.
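
As a rough sanity check on those numbers: if the crash rate is simply proportional to how long the vulnerable code is exposed to the timer interrupt (an assumption, the story doesn't say), the exposed window has to shrink by the ratio of the two intervals. A small sketch; the function name and the 4.5 hour midpoint are only illustrative.

```python
HOURS_PER_YEAR = 365.25 * 24  # average year, including leap days


def required_shrink_factor(interval_before_hours, interval_after_years):
    """Factor by which the vulnerable window must shrink.

    Assumes the crash rate is proportional to the length of the
    vulnerable window, so the mean time between crashes is inversely
    proportional to it. That proportionality is an assumption.
    """
    return interval_after_years * HOURS_PER_YEAR / interval_before_hours


if __name__ == "__main__":
    # "Every 4-5 hours" before, "once in every 5 years or so" after.
    factor = required_shrink_factor(4.5, 5)
    print(f"window must shrink by a factor of about {factor:.0f}")
```

Under that assumption the offending section would have to be on the order of ten thousand times shorter than before, which fits with "very small".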

They left it that way, knowing that occasionally a user would scratch his
head, reboot, and shrug his shoulders 'I wonder what THAT was?' :-)

ONE bad byte is indicative of an 8 bit peripheral misbehaving. It's NOT
indicative of a memory or disk error...those happen at the bit level and
generally result in an error being flagged.

The odd thing is that your on-disk copy got corrupted..or was it? Or was
it the LOADED copy that was corrupted?


I did a 'cmp -l /bin/vim /tmp/vim*/*/bin/vim'
and there was one byte difference. Unfortunately, I didn't
pay attention to what the values were - could have been a single bit.
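
For next time, the single-bit question can be settled by XORing each pair of differing bytes and counting the set bits. A minimal sketch in Python; diff_bytes is just an illustrative name, and it only compares up to the length of the shorter input. Note that 'cmp -l' prints 1-based offsets and octal byte values, while this reports 0-based offsets.

```python
import sys


def diff_bytes(a, b):
    """Return (offset, byte_a, byte_b, flipped_bits) for each differing byte.

    Offsets are 0-based. Only the common prefix length is compared,
    so a length mismatch has to be checked separately.
    """
    diffs = []
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            flipped = bin(x ^ y).count("1")  # number of bits that differ
            diffs.append((offset, x, y, flipped))
    return diffs


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], "rb") as fa, open(sys.argv[2], "rb") as fb:
        data_a, data_b = fa.read(), fb.read()
    if len(data_a) != len(data_b):
        print("warning: files differ in length")
    for offset, x, y, flipped in diff_bytes(data_a, data_b):
        print(f"offset {offset}: {x:#04x} vs {y:#04x}, {flipped} bit(s) differ")
```

One differing byte with a single flipped bit would point more towards a memory or bus glitch; a whole byte of 0xFF would look more like the floppy story above.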

I could believe the simple timing-margin thing you mentioned earlier.
The point being, of course, that I was necessarily comparing out of the
block cache ...

An idea occurs to me ... I should have tried to run for a few hours
with a different copy of vim in the hopes that the buffer cache would get
flushed and then try the cmp again.
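
There is a more direct route than waiting for the cache to turn over: ask the kernel to drop its cached pages for the file, then read it again. A sketch under some assumptions: Linux, Python 3.3 or later (for os.posix_fadvise), and a kernel that honours the hint. The call is only advisory, and pages still mapped by a running vim may well stay in memory. The function name is illustrative.

```python
import hashlib
import os


def hash_after_dropping_cache(path):
    """Hint the kernel to drop cached pages for path, then hash the file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        try:
            # Advisory only: offset 0 and length 0 mean "the whole file".
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        except OSError:
            pass  # filesystem refused the hint; fall through and hash anyway
        digest = hashlib.sha256()
        while True:
            chunk = os.read(fd, 1 << 16)
            if not chunk:
                break
            digest.update(chunk)
    finally:
        os.close(fd)
    return digest.hexdigest()
```

If the hash changes between a run before and a run after the hint, the bad byte was in the cached copy rather than on disk. As root, 'echo 3 > /proc/sys/vm/drop_caches' does the same thing system-wide on kernels from 2.6.16 on.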

What I'm really curious about is if it was only a cosmic ray or other
hardware explanation (like timing issues), or whether it could have
been a software problem, like with memory mapping hardware improperly
configured somewhere for some instant, or something like that.

