Bug 58984
Summary: | Deadlock on DMA with certain Seagate IDE drives | ||
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | Kevin Range <range006> |
Component: | kernel | Assignee: | Arjan van de Ven <arjanv> |
Status: | CLOSED WONTFIX | QA Contact: | Brian Brock <bbrock> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 7.2 | ||
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2003-04-12 04:31:40 UTC | Type: | --- |
Description
Kevin Range
2002-01-28 22:00:51 UTC
Can you try hdparm -X34 /dev/hdXX ? That's one step lower in DMA but still with dma...

-X34 does improve things. No more deadlocks. However, operation is still not problem free. /var/log/messages says:

Feb 1 12:57:44 muscat kernel: hda: timeout waiting for DMA
Feb 1 12:57:44 muscat kernel: ide_dmaproc: chipset supported ide_dma_timeout func only: 14
Feb 1 12:57:44 muscat kernel: blk: queue c03bab80, I/O limit 4095Mb (mask 0xffffffff)
Feb 1 12:57:49 muscat kernel: hda: status timeout: status=0xd0 { Busy }
Feb 1 12:57:49 muscat kernel: hda: drive not ready for command
Feb 1 12:57:49 muscat kernel: ide0: reset: success
Feb 1 12:57:49 muscat kernel: blk: queue c03bab80, I/O limit 4095Mb (mask 0xffffffff)

when we run our "torture test" above. Andre Hedrick says that this is an "Intel PIIX4 erratium, NO FIX", but then why do the machines with the 2.13 firmware work fine in UltraDMA mode?

Similar problem with a Seagate ST380021A2 IDE drive, firmware version 3.01, on an Asus CUV4x-DLS. 1GB memory, 2x 1GHz PIII, no sound card, G400 video card. IDE is primary master (used as a scratch directory), no other IDE disks in system. 1 36GB SCSI disk (used for everything else) and 1 SCSI DVD-RAM drive.

hdparm -d0 /dev/hda doesn't fix the problem. When copying files to the IDE drive, or essentially using the IDE drive at all, the system will momentarily freeze, followed by a clicking noise from the IDE drive. The system then becomes usable for the next few minutes.
Looking in /var/log/messages shows:

Feb 19 08:03:36 gewurztraminer kernel: hda: DMA disabled
Feb 19 08:15:00 gewurztraminer kernel: hda: status timeout: status=0xd0 { Busy }
Feb 19 08:15:00 gewurztraminer kernel: hda: no DRQ after issuing WRITE
Feb 19 08:15:30 gewurztraminer kernel: ide0: reset timed-out, status=0x80
Feb 19 08:15:35 gewurztraminer kernel: hda: status timeout: status=0x80 { Busy }
Feb 19 08:15:35 gewurztraminer kernel: hda: drive not ready for command
Feb 19 08:15:35 gewurztraminer kernel: ide0: reset: success
Feb 19 08:32:55 gewurztraminer kernel: hda: status timeout: status=0xd0 { Busy }
Feb 19 08:32:55 gewurztraminer kernel: hda: no DRQ after issuing WRITE
Feb 19 08:33:25 gewurztraminer kernel: ide0: reset timed-out, status=0x80
Feb 19 08:33:30 gewurztraminer kernel: hda: status timeout: status=0x80 { Busy }
Feb 19 08:33:30 gewurztraminer kernel: hda: drive not ready for command
Feb 19 08:33:30 gewurztraminer kernel: ide0: reset: success
Feb 19 08:39:20 gewurztraminer kernel: hda: status timeout: status=0xd0 { Busy }
Feb 19 08:39:20 gewurztraminer kernel: hda: no DRQ after issuing WRITE
Feb 19 08:39:50 gewurztraminer kernel: ide0: reset timed-out, status=0x80
Feb 19 08:39:55 gewurztraminer kernel:
Feb 19 08:39:55 gewurztraminer kernel: wait_on_irq, CPU 0:
Feb 19 08:39:55 gewurztraminer kernel: irq: 0 [ 0 0 ]
Feb 19 08:39:55 gewurztraminer kernel: bh: 1 [ 0 1 ]
Feb 19 08:39:55 gewurztraminer kernel: Stack dumps:
Feb 19 08:39:55 gewurztraminer kernel: CPU 1:4004dd52 4004dd62 4004dd72 4004dd82 4004dd92 400b05f0 4004ddb2 4004ddc2
Feb 19 08:39:55 gewurztraminer kernel: 4004ddd2 4004dde2 4009a140 4004de02 4004de12 4004de22 400b3040 4004de42
Feb 19 08:39:55 gewurztraminer kernel: 40113700 400b0940 4004de72 4004de82 4004de92 4004dea2 4004deb2 4004dec2
Feb 19 08:39:55 gewurztraminer kernel: Call Trace:
Feb 19 08:39:55 gewurztraminer kernel:
Feb 19 08:39:55 gewurztraminer kernel: CPU 0:f7ff3f28 c024aed0 00000000 00000000 ffffffff 00000000 c0108842 c024aee5
Feb 19 08:39:55 gewurztraminer kernel: 00000000 db088000 00000001 c0174b78 db088168 c02deee4 f7ff3f74 f7ff2658
Feb 19 08:39:55 gewurztraminer kernel: f7ff2000 c011fb4d db088000 db088130 c02deee4 f7ff2000 00000000 c0128095
Feb 19 08:39:56 gewurztraminer kernel: Call Trace: [call_spurious_interrupt+119259/153515] .rodata.str1.1 [kernel] 0x74b
Feb 19 08:39:56 gewurztraminer kernel: Call Trace: [<c024aed0>] .rodata.str1.1 [kernel] 0x74b
Feb 19 08:39:56 gewurztraminer kernel: [__global_cli+226/368] __global_cli [kernel] 0xe2
Feb 19 08:39:56 gewurztraminer kernel: [<c0108842>] __global_cli [kernel] 0xe2
Feb 19 08:39:56 gewurztraminer kernel: [call_spurious_interrupt+119280/153515] .rodata.str1.1 [kernel] 0x760
Feb 19 08:39:56 gewurztraminer kernel: [<c024aee5>] .rodata.str1.1 [kernel] 0x760
Feb 19 08:39:56 gewurztraminer kernel: [flush_to_ldisc+216/288] flush_to_ldisc [kernel] 0xd8
Feb 19 08:39:56 gewurztraminer kernel: [<c0174b78>] flush_to_ldisc [kernel] 0xd8
Feb 19 08:39:56 gewurztraminer kernel: [__run_task_queue+93/112] __run_task_queue [kernel] 0x5d
Feb 19 08:39:56 gewurztraminer kernel: [<c011fb4d>] __run_task_queue [kernel] 0x5d
Feb 19 08:39:56 gewurztraminer kernel: [context_thread+325/512] context_thread [kernel] 0x145
Feb 19 08:39:56 gewurztraminer kernel: [<c0128095>] context_thread [kernel] 0x145
Feb 19 08:39:56 gewurztraminer kernel: [context_thread+0/512] context_thread [kernel] 0x0
Feb 19 08:39:56 gewurztraminer kernel: [<c0127f50>] context_thread [kernel] 0x0
Feb 19 08:39:56 gewurztraminer kernel: [_stext+0/96] stext [kernel] 0x0
Feb 19 08:39:56 gewurztraminer kernel: [<c0105000>] stext [kernel] 0x0
Feb 19 08:39:56 gewurztraminer kernel: [kernel_thread+38/48] kernel_thread [kernel] 0x26
Feb 19 08:39:56 gewurztraminer kernel: [<c0105866>] kernel_thread [kernel] 0x26
Feb 19 08:39:56 gewurztraminer kernel: [context_thread+0/512] context_thread [kernel] 0x0
Feb 19 08:39:56 gewurztraminer kernel: [<c0127f50>] context_thread [kernel] 0x0
Feb 19 08:39:56 gewurztraminer kernel:
Feb 19 08:39:56 gewurztraminer kernel:
Feb 19 08:39:56 gewurztraminer kernel: hda: status timeout: status=0x80 { Busy }
Feb 19 08:39:56 gewurztraminer kernel: hda: drive not ready for command
Feb 19 08:39:56 gewurztraminer kernel: ide0: reset: success

New information. Running with no DMA is not just "not problem free". We are experiencing data loss. Apparently data being written during the "ide reset dance" is lost or corrupted. Even more strange is the regularity of the resets, approx. every four minutes (while scping files to it):

Feb 20 10:56:36 tokay kernel: hda: status timeout: status=0x80 { Busy }
Feb 20 10:56:36 tokay kernel: hda: drive not ready for command
Feb 20 10:56:36 tokay kernel: ide0: reset: success
Feb 20 11:00:06 tokay kernel: hda: status timeout: status=0xd0 { Busy }
Feb 20 11:00:06 tokay kernel: hda: no DRQ after issuing WRITE
Feb 20 11:00:36 tokay kernel: ide0: reset timed-out, status=0x80
Feb 20 11:00:41 tokay kernel: hda: status timeout: status=0x80 { Busy }
Feb 20 11:00:41 tokay kernel: hda: drive not ready for command
Feb 20 11:00:41 tokay kernel: ide0: reset: success
Feb 20 11:04:21 tokay kernel: hda: status timeout: status=0xd0 { Busy }
Feb 20 11:04:21 tokay kernel: hda: no DRQ after issuing WRITE
Feb 20 11:04:51 tokay kernel: ide0: reset timed-out, status=0x80
Feb 20 11:04:56 tokay kernel: hda: status timeout: status=0x80 { Busy }
Feb 20 11:04:56 tokay kernel: hda: drive not ready for command
Feb 20 11:04:56 tokay kernel: ide0: reset: success
Feb 20 11:06:43 tokay sshd(pam_unix)[7335]: session opened for user root by (uid=0)
Feb 20 11:08:36 tokay kernel: hda: status timeout: status=0xd0 { Busy }

I have four machines here, all with the specs I gave in my first report. Two are running Model=ST380021A, FwRev=2.13 with no problems. Two are running Model=ST380021A, FwRev=3.01 and having data loss. All proposed solutions (hdparm whatever) have failed.
All proposed explanations have been shown to be false ("Intel PIIX4 erratium, NO FIX" makes no sense if two machines work, two don't, and someone else with a non-Intel chipset is having similar problems). Seagate says the drives are all good, so this is looking like a Linux kernel problem to me. Any ideas on how to get these drives working?

I got rid of all of these drives, so I can't test this bug anymore.
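
For anyone revisiting the "approx. every four minutes" observation, the spacing can be checked directly from the tokay log lines quoted in this report. The following is only a rough sketch: the helper names `event_times` and `intervals_seconds` are made up for illustration, the year 2002 is an assumption (syslog lines carry no year; it is taken from the report date), and month-name parsing assumes an English locale.

```python
from datetime import datetime

# Lines copied from the tokay excerpt in this report (subset: one line per event).
LOG = """\
Feb 20 10:56:36 tokay kernel: ide0: reset: success
Feb 20 11:00:06 tokay kernel: hda: status timeout: status=0xd0 { Busy }
Feb 20 11:00:41 tokay kernel: ide0: reset: success
Feb 20 11:04:21 tokay kernel: hda: status timeout: status=0xd0 { Busy }
Feb 20 11:04:56 tokay kernel: ide0: reset: success
Feb 20 11:08:36 tokay kernel: hda: status timeout: status=0xd0 { Busy }
"""


def event_times(log, needle, year=2002):
    """Return timestamps of log lines containing `needle`.

    `year` is an assumption: syslog timestamps do not include one.
    """
    times = []
    for line in log.splitlines():
        if needle in line:
            stamp = " ".join(line.split()[:3])  # e.g. "Feb 20 11:00:06"
            times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return times


def intervals_seconds(times):
    """Seconds between consecutive timestamps."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


timeouts = event_times(LOG, "status=0xd0")
resets = event_times(LOG, "reset: success")
print(intervals_seconds(timeouts))  # [255, 255]
print(intervals_seconds(resets))    # [245, 255]
```

On this excerpt the write timeouts are 255 seconds apart and the successful resets 245 to 255 seconds apart, which matches the roughly four-minute period described above. Three events per series is too small a sample to say more than that.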