20658 – "hda: lost interrupt" and machine lock-up

Summary: "hda: lost interrupt" and machine lock-up

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	kernel
Sub Component:
Version:	7.0
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Arjan van de Ven
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2000-11-10 20:19 UTC by Yaron Minsky
Modified:	2008-08-01 16:22 UTC
CC List:	2 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2004-09-30 15:38:51 UTC
Embargoed:

Description Yaron Minsky 2000-11-10 20:19:55 UTC

My laptop (a Solo 2500 with BIOS revision 10.12) often freezes up when
it is woken up from sleep.  It's actually not entirely frozen, but anything
that tries to access the disk waits forever, which is more or less the
same thing.  It has been suggested (see bug 6596,
http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=6596) that this is
some kind of BIOS-related bug.

Note that I don't have any problems with NT on this machine.

Comment 1 Michael Schwendt 2000-11-12 18:32:02 UTC

The same occurs on my non-laptop machine (AMD Duron 650 MHz with GA-7ZX) and
/dev/hdb (hdb: IBM-DTTA-350840, 8063MB w/467kB Cache, CHS=1027/255/63, UDMA).
Once the drive has been put into "sleep mode", I can't reactivate it with hdparm
or I/O access. The kernel's "hdb: lost interrupt" message/retry series locks up
the machine, trying forever to wake up the HD. Even while I still have a console
left, I can't kill the process which is trying to access the HD.

I've tried the hdparm work-around which is mentioned in /etc/sysconfig/apmd, but
it doesn't help. 

I remember that on a much slower machine (Pentium I) with Red Hat Linux 6.2, the
kernel managed to wake up the machine after approximately four retries,
performing a HD reset.

Comment 2 Yaron Minsky 2000-11-12 19:07:09 UTC

Perhaps information about the model of my disk is useful as well:

hda: TOSHIBA MK1011GAV, 9590MB w/0kB Cache, CHS=1222/255/63

I've now tried the hdparm fix suggested in /etc/sysconfig/apmd.  I don't know
yet if it works, and it will be hard to tell definitively, since the problem is
intermittent.  I'll report back as to how it works out.

Comment 3 Yaron Minsky 2000-11-13 16:34:51 UTC

hdparm fix did not work.

Comment 4 Pavel Shkilionok 2000-11-17 13:09:54 UTC

I have a similar problem on my non-laptop machine. Here is my configuration:
Pentium III 600MHz with GA-6BX7+ motherboard, 256MB RAM. My /dev/hde: QUANTUM
FIREBALL 19595MB Ultra66. The problem is that every time I try to extract
data from a tar.gz archive which is about 100MB in size and contains about
4,000 files, by executing e.g.
     gzip -dc archname.tar.gz | tar xvf -

I always get this error:
hde: lost interrupt
hde: status error: status=0x58 { DriveReady SeekComplete DataRequest }
hde: drive not ready for command

After that, RH hangs forever and does not react to Ctrl+C or other actions.

I should add that I moved my hard drive from the U66 connector to a standard
IDE connector and disabled ATA66 on the motherboard, but it didn't help and the
problem is still reproducible.
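
For readers puzzling over the "status=0x58" value in the message above, here is a minimal sketch of how the status byte maps onto the flag names printed in braces. The bit table is a reconstruction from the standard ATA status register layout and the labels shown in the kernel message; the helper name decode_ide_status is hypothetical and not part of any kernel or hdparm interface.

```python
# Sketch only: bit positions follow the ATA status register layout;
# the names mirror the labels printed in the kernel message above.
IDE_STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_ide_status(status):
    """Return the names of all flags set in an IDE status byte (hypothetical helper)."""
    return [name for mask, name in IDE_STATUS_BITS if status & mask]


if __name__ == "__main__":
    # 0x58 = 0x40 + 0x10 + 0x08
    print(decode_ide_status(0x58))
```

Read this way, 0x58 says the drive reports itself ready and still has a data request pending, with neither the Busy nor the Error bit set.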

Comment 5 Yaron Minsky 2000-11-28 18:33:27 UTC

It's worth noting that this problem could well be a hardware error.  It's hard
to tell.  I had a similar problem; in particular, I saw the following kind of message:

hde: status error: status=0x58 { DriveReady SeekComplete DataRequest }

It turned out to be a bum EMI filter chip which was toasting my hard drive.
Figuring out whether these are hardware or kernel problems (or some weird
combination) can be tricky.

Comment 6 Pavel Shkilionok 2000-11-29 08:30:14 UTC

I would just like to let you know that I was able to fix this problem on my PC.
After a long investigation I discovered that the problem never appears if the
PC is not connected to the network. I had a 3Com EtherLink 10/100 PCI NIC
(3C905B-TX) network adapter installed, so I replaced it with a simpler one,
a RealTek RTL8139 FastEthernet, which fixed the problem, and now I'm happy!
I hope that helps you as well.

Comment 7 Michael Schwendt 2000-11-29 12:25:48 UTC

Well, I'm using very cheap PCI 10 Mbps Ethernet cards already, and I'm not
sharing interrupts. But maybe my network card is too cheap and therefore causes
hardware conflicts that affect a single harddrive.

$ cat /proc/interrupts 
           CPU0       
  0:     115321          XT-PIC  timer
  1:        990          XT-PIC  keyboard
  2:          0          XT-PIC  cascade
  3:      22560          XT-PIC  serial
  5:          1          XT-PIC  soundblaster
 10:       6715          XT-PIC  NE2000
 13:          1          XT-PIC  fpu
 14:     242060          XT-PIC  ide0
 15:         12          XT-PIC  ide1
NMI:          0

Also, I seem to remember that I've taken the harddrive from a Win95 machine
which was used to compile Win32 executables. There haven't been any
suspend/sleep problems with that system (IIRC). A slower CPU managed to reset
the drive (see above).
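
The check for shared interrupts described above can be sketched in a few lines. This is an illustration only: it assumes the single-CPU /proc/interrupts layout shown in this comment (count, controller type, then device names), and the function names parse_interrupts and shared_irqs are hypothetical.

```python
# Sample text copied from the /proc/interrupts output in this comment.
SAMPLE = """\
           CPU0
  0:     115321          XT-PIC  timer
  1:        990          XT-PIC  keyboard
  2:          0          XT-PIC  cascade
  3:      22560          XT-PIC  serial
  5:          1          XT-PIC  soundblaster
 10:       6715          XT-PIC  NE2000
 13:          1          XT-PIC  fpu
 14:     242060          XT-PIC  ide0
 15:         12          XT-PIC  ide1
NMI:          0
"""


def parse_interrupts(text):
    """Map each numeric IRQ to (count, [device names]); assumes one CPU column."""
    table = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # skip the CPU header and non-numeric rows such as NMI
        fields = rest.split()
        count = int(fields[0])
        # fields[1] is the controller type (e.g. XT-PIC); the rest are devices
        devices = [name.strip(",") for name in fields[2:]]
        table[int(label)] = (count, devices)
    return table


def shared_irqs(table):
    """Return only the IRQs that list more than one device."""
    return {irq: devs for irq, (_count, devs) in table.items() if len(devs) > 1}


if __name__ == "__main__":
    print(shared_irqs(parse_interrupts(SAMPLE)))
```

On the sample above no IRQ lists more than one device, which matches the statement that interrupts are not being shared.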

Comment 8 Need Real Name 2000-12-10 15:51:03 UTC

I have noted the same problem ever since I upgraded my compaq presario running rh
6.1 to the 2.2.16 kernel. I have an alternate installation on the machine running
rh5.2 w/kernel 2.2-pre6 and it does not ever give me the 'lost interrupt'
message.

Therefore I don't think it's hardware, as the original rh6.1 kernel didn't
exhibit the problem on the same hardware, and I have an older 2.2 kernel on the
same machine that never has the problem.

I can repeatably cause the problem to occur by doing disk I/O on two IDE drives
and heavy network I/O (the machine in question is a router) at the same time with
kernel 2.2.16.

Comment 9 Yaron Minsky 2000-12-14 15:21:06 UTC

I think I now know what triggers at least some of the lost interrupts.  It
apparently happens when I put the machine to sleep with the ethernet card in (a
Linksys PCMLM56), and wake it up with the card out.

Comment 10 Michael Schwendt 2000-12-22 01:13:13 UTC

I've had to change my HD configuration and have taken out the IBM drive
(mentioned earlier). I still get "lost interrupt" (but also drive busy error)
messages upon waking up sleeping Seagate drives (e.g. a ST310240A). But the
kernel almost immediately performs an IDE reset which is successful. 

So, for me this is no longer an issue for the time being.

Comment 11 Kaiserslautern, Germany 2001-01-07 08:00:11 UTC

I have encountered this same problem with a new RH 7.0 install.  It occurs as 
packages are being loaded by anaconda.  System freezes, HD light remains lit, 
and "hda: lost interrupt" begins.  I've tried minimum installation (kinda brute 
force it through), disabling all PM in BIOS, and different partitions.
I am using an AMD K6-2 450, 256MB PC100, WD307AA (30G) divided into multiple 
partitions for Caldera OpenLinux 2.3, Win2000, and Win98.  COL installed and 
has worked fine for some time.

Comment 12 Bugzilla owner 2004-09-30 15:38:51 UTC

Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/
