[olug] Machine Locking up, need hardware guru advice

Trent Melcher tmelcher at trilogytel.com
Mon Aug 9 18:07:36 UTC 2004


When you tested the secondary drive, was it on the same ribbon cable as
the primary drive?  I have seen a bad ribbon cable cause this type of
behavior.  Incompatible controller drivers for the mobo can also cause
it.

Trent

Trent Melcher
Network/System Administrator
Startouch International LTD.
 

-----Original Message-----
From: olug-bounces at olug.org [mailto:olug-bounces at olug.org] On Behalf Of
Miller, Scott L (Omaha Networks)
Sent: Monday, August 09, 2004 11:17 AM
To: Omaha Linux User Group
Subject: [olug] Machine Locking up, need hardware guru advice


Hi all,

I've had random hard lockup problems with a self built PC for a long
while now.  It's getting to be a real PITA, as it managed to wipe my
root partition out this past weekend.

Requisite info:
	ASUS A7N8X Deluxe mobo
	AMD Athlon XP 2800+ (barton 2.083 GHz)
	1 Gig Ram
	MSI nVidia GeForce FX5200 Video Card
	Primary IDE Channel : 60 Gig HD (WD I think)  &  52x CD-Rom
	Secondary IDE Channel: 20 Gig HD (also WD I think)  &  24?x16?x52x CD-RW
	OS - I don't think it matters.  It was Mandrake 9.1 until the crash
this weekend claimed the root partition, then the Knoppix CD I received
at the latest install fest.  It has also locked up when running the new
Novell-supplied SUSE distribution that I installed at the install fest
on a 160 Gig hard drive.  I could also test with Win2K some more, but
haven't yet...

Symptoms:
	Locks up hard - no keyboard/mouse response at all, reset or
power button to reboot.

Troubleshooting steps taken:
	First off, heat is not a problem.  The system is water cooled, and
processor temp is monitored by 2 sensors: one built into the mobo, and a
probe mounted next to the processor that is connected to a Digital
Doc 5.  From all the readings I've taken, no part of the system has ever
gotten above 100 degrees Fahrenheit.  The DigDoc5 monitors 8 locations;
I'm monitoring incoming air, video card processor, memory, northbridge
heatsink, processor, drive area, power supply and something else I can't
remember offhand. (typing at work, machine's at home)

	Ok, so I first thought RAM was the problem, but I've swapped
that a few times, and run memtest a bunch.  No lockups during that
process, and the memory tests are clean.  I also thought maybe it was
a driver issue with Mandrake, until the CD version of Knoppix exhibited
the same behavior.  I also used to think it might have been the USB
mouse/keyboard, but swapping those for a PS/2 mouse and keyboard didn't
make any difference.

	So, once the crash ate the root partition, I booted up with the
Knoppix CD to attempt a fix, but it was toast.  Then I tried to reinstall
Mandrake and got mostly finished, but then it locked up.  I rebooted, and
it seemed to be fine.  I started configuring and had another lockup; this
time it ate Perl, and thus wouldn't let me into X-Windows.  So I
abandoned that and switched to pure troubleshooting.

	I again grabbed the Knoppix CD, booted it, and ran the memtest
program for about an hour. No lockups, no errors.  No subsequent memtest
runs ever resulted in a lock up.

	Now, for those who are not familiar with the Knoppix CD, it is
an entire Linux installation on CD.  When it boots, it creates a RAM
drive to store things like /etc and /home, so there is no hard drive
involved when it first comes up.  This is important because as long as I
left the hard drives alone, the machine was stable and ran well for
hours at a time.  I used that time to search the net for descriptions of
problems similar to mine, and I ended up upgrading my BIOS during that
search.  The BIOS update didn't help at all, but it didn't hurt anything
either (that I can tell).

	Once I got the hard drives involved, that's when the machine
locked up.  I started testing the first hard drive, thinking there might
be some bad blocks.  Now to be totally fair, I was able to get a random
read/write non-destructive test of the root partition to complete 2 or 3
times.  However, the 7 to 10 random lockups that happened during this
process have led me to believe that the mobo chipset, or the Linux
drivers for said chipset, are the real culprit.  I ruled out the actual
hard drive by also testing on a blank partition I had on the secondary
20 Gig HD, and it locked up during that test as well.  BTW, no bad
blocks were ever found on the hard drives.
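
For anyone who wants to reproduce this kind of test, here is a minimal
Python sketch of how a non-destructive read/write pass works in general:
save a block, overwrite it with known patterns, verify each pattern, then
put the original data back.  It is not the tool used above.  The function
name nondestructive_rw_test, the 4096-byte block size and the pattern
bytes are assumptions made for illustration, and the sketch targets a
regular scratch file rather than a raw partition.

```python
import os

BLOCK_SIZE = 4096  # assumed block size for this sketch
PATTERNS = (b"\xaa", b"\x55", b"\xff", b"\x00")  # assumed test patterns


def nondestructive_rw_test(path, block_size=BLOCK_SIZE):
    """Return the indexes of blocks that failed pattern verification.

    Works on a regular file. Only whole blocks are tested; a trailing
    partial block is left alone.
    """
    bad_blocks = []
    with open(path, "r+b") as f:
        size = os.fstat(f.fileno()).st_size
        for index in range(size // block_size):
            offset = index * block_size
            f.seek(offset)
            original = f.read(block_size)  # save the real data first
            ok = True
            try:
                for byte in PATTERNS:
                    pattern = byte * block_size
                    f.seek(offset)
                    f.write(pattern)
                    f.flush()
                    os.fsync(f.fileno())  # push the write to the device
                    f.seek(offset)
                    # Note: this read may be served from the OS cache.
                    if f.read(block_size) != pattern:
                        ok = False
            finally:
                # Always restore the original contents of the block.
                f.seek(offset)
                f.write(original)
                f.flush()
                os.fsync(f.fileno())
            if not ok:
                bad_blocks.append(index)
    return bad_blocks
```

Calling nondestructive_rw_test("scratch.bin") on a healthy file should
return an empty list and leave the file contents unchanged.  Because the
read-back can come from cache, a clean result here says less about the
drive than a lockup during the run says about the controller path.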

	I'd also thought about conflicts with the CD drives, so I removed
the CD-ROM drive that was sharing the primary channel with the 60 gig
HD.  It didn't matter; it still locked up.

	So, does anyone have any suggestions for what more to test?  Or
maybe what program to use under Windows to really stress test the hard
drive/controller?
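
As a stopgap until someone names a dedicated tool, a portable script can
at least load the drive and record how far it got before a lockup.  The
Python sketch below is only an illustration: the names log_line,
stress_pass and run_stress, the 1 MiB chunk size and the pass counts are
all assumptions.  Each log line is fsync'ed so that it should still be
on disk after a hard reset.

```python
import hashlib
import os
import time

CHUNK = 1024 * 1024  # 1 MiB per write; arbitrary choice for this sketch


def log_line(log, text):
    """Append a line and force it to disk so it survives a hard lockup."""
    log.write(text + "\n")
    log.flush()
    os.fsync(log.fileno())


def stress_pass(scratch_path, chunks):
    """Write random chunks, read them back, and compare SHA-256 digests."""
    written = hashlib.sha256()
    with open(scratch_path, "wb") as f:
        for _ in range(chunks):
            block = os.urandom(CHUNK)
            written.update(block)
            f.write(block)
        f.flush()
        os.fsync(f.fileno())
        # Where available, hint the kernel to drop cached pages so the
        # read-back is more likely to hit the disk. Advisory only.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
    read_back = hashlib.sha256()
    with open(scratch_path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            read_back.update(block)
    return written.hexdigest() == read_back.hexdigest()


def run_stress(scratch_path, log_path, passes=3, chunks=4):
    """Run several passes, logging before and after each one."""
    results = []
    with open(log_path, "a") as log:
        for n in range(1, passes + 1):
            log_line(log, f"{time.ctime()} pass {n} start")
            ok = stress_pass(scratch_path, chunks)
            status = "ok" if ok else "MISMATCH"
            log_line(log, f"{time.ctime()} pass {n} {status}")
            results.append(ok)
    return results
```

Keeping the log on a different drive (or channel) than the scratch file
helps tell the two apart: a "start" line with no matching result line
points at the pass that was running when the machine froze.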

Thanks,

-Scott
_______________________________________________
OLUG mailing list
OLUG at olug.org
http://lists.olug.org/mailman/listinfo/olug