Bug 232811 - system freezes with kernel-2.6.20-1.2925
Status: CLOSED INSUFFICIENT_DATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 6
Hardware: x86_64 Linux
Priority: medium
Severity: high
Assigned To: Kernel Maintainer List
QA Contact: Brian Brock
Depends On:
Blocks: 427887
Reported: 2007-03-18 05:35 EDT by han pingtian
Modified: 2008-02-07 23:28 EST (History)
Doc Type: Bug Fix
Last Closed: 2008-02-07 23:28:20 EST

Attachments
/var/log/dmesg (18.62 KB, text/plain)
2007-03-20 07:00 EDT, han pingtian
smolt info from hanpingtian using 2.6.18 (4.02 KB, text/plain)
2007-03-22 10:32 EDT, Chuck Ebbert
smolt profile with kernel-2.6.20-1.2925 (4.04 KB, text/plain)
2007-03-23 09:33 EDT, han pingtian
smolt profile with kernel-2.6.20-1.2925 (4.04 KB, text/plain)
2007-03-23 09:34 EDT, han pingtian

Description han pingtian 2007-03-18 05:35:58 EDT
Description of problem:
After running kernel-2.6.20-1.2925 for a while (the time varies randomly), the
system freezes. The keyboard and mouse stop responding, but the screen contents
remain displayed. It is a true freeze.

Version-Release number of selected component (if applicable):
kernel-2.6.20-1.2925

How reproducible:
always

Steps to Reproduce:
1. Run kernel-2.6.20-1.2925 for a while.
  
Actual results:
The system freezes.

Expected results:


Additional info:
http://smolt.fedoraproject.org/show?UUID=e5b52a3c-03b9-4f38-a4df-14f1f872e389
Comment 1 Chuck Ebbert 2007-03-19 14:07:57 EDT
Can you post the log from when you boot?

Just post the contents of /var/log/dmesg
Comment 2 han pingtian 2007-03-20 07:00:51 EDT
Created attachment 150474 [details]
/var/log/dmesg
Comment 3 Scott White 2007-03-20 18:29:41 EDT
I have also been experiencing system freezes on this kernel with exactly the
same symptoms, but with i686.  Screen freezes, Pings stop, no message to screen,
nothing in the logs.

I have been running 2.6.18-1.2798.fc6-i686 since late Feb with no issues,
upgraded 24 hours ago to 2.6.20-1.2925 and have had 5 or 6 hangs since.  System
checks out with Memtest.  Booting back to 2.6.18 fixes it.

The freezes seem to coincide with heavy IO on my 8 disk RAID 5 stripe on a
Supermicro sata_mv card.  If I don't attempt to rebuild the array the system
will stay up for several hours.  Rebuilding the array under 2.6.20-1.2925 will
never complete and often the system will not even complete booting.  Again
2.6.18-1.2798 is fine.
Comment 4 Chuck Ebbert 2007-03-21 11:14:36 EDT
Can you post the exact models of your disk drives?

Lines in kernel log should look something like this:

scsi 0:0:0:0: Direct-Access     ATA      SAMSUNG HD160JJ/ ZM10 PQ: 0 ANSI: 5

Also, can you post whether NCQ was enabled for each drive, for example:

ata1.00: ATA-7, max UDMA7, 312500000 sectors: LBA48 NCQ (depth 31/32)
Comment 5 Vaclav "sHINOBI" Misek 2007-03-21 19:09:05 EDT
Similar problems here (FC 5 with kernel-2.6.20-1.2300.fc5 and sata_nv). Log shows:

kernel: ata2: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400
kernel: ata2: CPB 0: ctl_flags 0x1f, resp_flags 0x0
kernel: ata2: CPB 1: ctl_flags 0x1f, resp_flags 0x1
kernel: ata2: CPB 2: ctl_flags 0x1f, resp_flags 0x1
kernel: ata2: CPB 3: ctl_flags 0x1f, resp_flags 0x1
kernel: ata2: CPB 4: ctl_flags 0x1f, resp_flags 0x1
kernel: ata2: CPB 5: ctl_flags 0x1f, resp_flags 0x1
kernel: ata2: CPB 6: ctl_flags 0x1f, resp_flags 0x1
kernel: ata2: Resetting port
kernel: ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
kernel: ata2.00: cmd 61/08:00:cd:e3:50/00:00:09:00:00/40 tag 0 cdb 0x0 data 4096 out
kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
kernel: ata2: soft resetting port
kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 3
kernel: ata2.00: configured for UDMA/133
kernel: ata2: EH complete

kernel: scsi 0:0:0:0: Direct-Access     ATA      ST380817AS       3.42 PQ: 0 ANSI: 5
kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
kernel: ata2.00: ATA-6, max UDMA/133, 156301488 sectors: LBA48 NCQ (depth 31/32)

The problems appeared just with this latest kernel update.
Comment 6 han pingtian 2007-03-22 10:07:06 EDT
scsi 2:0:0:0: Direct-Access     ATA      WDC WD1600JS-22M 02.0 PQ: 0 ANSI: 5

And it seems there is no "NCQ" in the logs.
Comment 7 Chuck Ebbert 2007-03-22 10:11:20 EDT
(In reply to comment #0)
> 
> Additional info:
> http://smolt.fedoraproject.org/show?UUID=e5b52a3c-03b9-4f38-a4df-14f1f872e389

Can you resend the smolt info while running the new kernel?

The driver info has almost certainly changed.
Comment 8 Chuck Ebbert 2007-03-22 10:32:30 EDT
Created attachment 150666 [details]
smolt info from hanpingtian using 2.6.18
Comment 9 Chuck Ebbert 2007-03-22 12:11:00 EDT
(In reply to comment #3)
> I have also been experiencing system freezes on this kernel with exactly the
> same symptoms, but with i686.  Screen freezes, Pings stop, no message to screen,
> nothing in the logs.

> The freezes seem to coincide with heavy IO on my 8 disk RAID 5 stripe on a
> Supermicro sata_mv card.  If I don't attempt to rebuild the array the system
> will stay up for several hours.  Rebuilding the array under 2.6.20-1.2925 will
> never complete and often the system will not even complete booting.  Again
> 2.6.18-1.2798 is fine.

This is a separate bug. Please file a new bugzilla report so we can track it
properly.
Comment 10 han pingtian 2007-03-23 09:33:42 EDT
Created attachment 150755 [details]
smolt profile with kernel-2.6.20-1.2925
Comment 11 han pingtian 2007-03-23 09:34:16 EDT
Created attachment 150756 [details]
smolt profile with kernel-2.6.20-1.2925
Comment 12 Chuck Ebbert 2007-03-23 17:49:45 EDT
Okay, it is still using sata_sil for the hard drives.
Comment 13 Chuck Ebbert 2007-03-26 10:41:26 EDT
Test kernels (1.2937) for this issue are at:

http://people.redhat.com/cebbert

Please test and report back.
Comment 14 Pasi Karkkainen 2007-03-26 11:27:15 EDT
Hmm. The 2.6.18 and 2.6.19 FC6 Xen kernels work OK for me, but 2.6.20 froze after
a while (anywhere from a couple of seconds to a few minutes).

Now 1.2937 crashes immediately during boot. :(

Anything I can try?

Comment 15 Pasi Karkkainen 2007-03-26 11:34:04 EDT
I tried 1.2937 again, and it booted OK on the second and third tries. I wonder
what happened on the first try, when the system rebooted itself while booting
the kernel.

Now let's see if 1.2937 actually stays up and doesn't crash by itself like
1.2933 did. 

My hardware is Intel P4 with i955x chipset, ahci sata disks.
Comment 16 Pasi Karkkainen 2007-03-26 11:58:25 EDT
No win.. the server is rebooting itself every 1-10 mins with 1.2937.. the server
is idle when that happens (or maybe md-raid1 reconstruction running, but nothing
else). 

Comment 17 Chuck Ebbert 2007-03-26 12:01:57 EDT
(In reply to comment #16)
> No win.. the server is rebooting itself every 1-10 mins with 1.2937.. the server
> is idle when that happens (or maybe md-raid1 reconstruction running, but nothing
> else). 
> 
> 

Please report a separate bug for this, as it involves Xen.


Comment 18 Pasi Karkkainen 2007-03-26 12:40:07 EDT
Done: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=234008
Comment 19 Pasi Karkkainen 2007-03-27 02:51:23 EDT
Also related?: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=233918
Comment 20 Chuck Ebbert 2007-03-27 10:26:22 EDT
Can someone who originally reported this bug please test kernel 2937 or greater?
The Xen problem is a completely different bug.
Comment 21 han pingtian 2007-03-28 09:00:02 EDT
(In reply to comment #20)
> Can someone who originally reported this bug please test kernel 2937 or greater?
> The Xen problem is a completely different bug.
> 
I am testing it now ....
One question: there is no such package as "kmod-fglrx.2.6.20-1.2937", but my
X-window is still running, could you tell me why?
Comment 22 Chuck Ebbert 2007-03-28 09:30:30 EDT
(In reply to comment #21)
> I am testing it now ....
> One question: there is no such package as "kmod-fglrx.2.6.20-1.2937", but my
> X-window is still running, could you tell me why?

I was wondering about that myself...
Comment 23 han pingtian 2007-03-28 10:05:37 EDT
It froze just now.
Before that, I was running yum. It blocked on a futex and I killed it. A short
while later, the system froze.
Comment 24 David Juran 2007-04-03 12:53:20 EDT
I've just tested with 2.6.20-1.2940.fc6 and it works considerably better than
2933, but still not perfectly. With 2933 the computer locked up completely and a
power cycle (reset was not enough) was required to regain access to the SATA
disk. Performing the same operation with 2940, the system became
unresponsive for a while and then the messages below showed up in the syslog, but
the machine recovered.


Apr  3 19:10:22 localhost kernel: ata3: EH in ADMA mode, notifier 0x0
notifier_error 0x0 gen_ctl 0x1501000 status 0x400
Apr  3 19:10:22 localhost kernel: ata3: CPB 0: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:22 localhost kernel: ata3: CPB 1: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:22 localhost kernel: ata3: CPB 2: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:22 localhost kernel: ata3: CPB 3: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:22 localhost kernel: ata3: CPB 4: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:22 localhost kernel: ata3: CPB 5: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:22 localhost kernel: ata3: CPB 6: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:22 localhost kernel: ata3: CPB 7: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:22 localhost kernel: ata3: CPB 8: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:22 localhost kernel: ata3: CPB 9: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:22 localhost kernel: ata3: CPB 10: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:30 localhost kernel: ata3: CPB 11: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:30 localhost kernel: ata3: CPB 12: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 13: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 14: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 15: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 16: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 17: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 18: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 19: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 20: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 21: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 22: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 23: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 24: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 25: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 26: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 27: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 28: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 29: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 30: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: Resetting port
Apr  3 19:10:32 localhost kernel: ata3.00: exception Emask 0x0 SAct 0x7fffffff
SErr 0x0 action 0x2 frozen
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:00:8d:fb:39/01:00:0f:00:00/40 tag 0 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
60/00:08:75:99:59/02:00:0e:00:00/40 tag 1 cdb 0x0 data 262144 in
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/80:10:55:14:3a/01:00:0f:00:00/40 tag 2 cdb 0x0 data 196608 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:18:dd:15:3a/01:00:0f:00:00/40 tag 3 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:20:85:10:3a/01:00:0f:00:00/40 tag 4 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:28:c5:17:3a/01:00:0f:00:00/40 tag 5 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:30:ad:19:3a/01:00:0f:00:00/40 tag 6 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:38:35:23:3a/01:00:0f:00:00/40 tag 7 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:40:1d:25:3a/01:00:0f:00:00/40 tag 8 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:48:b5:0c:3a/01:00:0f:00:00/40 tag 9 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:50:05:27:3a/01:00:0f:00:00/40 tag 10 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:58:65:f2:39/01:00:0f:00:00/40 tag 11 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/80:60:4d:f4:39/01:00:0f:00:00/40 tag 12 cdb 0x0 data 196608 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
60/10:68:b5:d9:97/00:00:03:00:00/40 tag 13 cdb 0x0 data 8192 in
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:70:75:fd:39/01:00:0f:00:00/40 tag 14 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:78:5d:ff:39/01:00:0f:00:00/40 tag 15 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:80:9d:0e:3a/01:00:0f:00:00/40 tag 16 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:88:6d:12:3a/01:00:0f:00:00/40 tag 17 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:90:2d:03:3a/01:00:0f:00:00/40 tag 18 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:98:fd:06:3a/01:00:0f:00:00/40 tag 19 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:a0:e5:08:3a/01:00:0f:00:00/40 tag 20 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:a8:a5:f9:39/01:00:0f:00:00/40 tag 21 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:b0:cd:0a:3a/01:00:0f:00:00/40 tag 22 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:b8:95:1b:3a/01:00:0f:00:00/40 tag 23 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:c0:7d:1d:3a/01:00:0f:00:00/40 tag 24 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:c8:65:1f:3a/01:00:0f:00:00/40 tag 25 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:d0:bd:f7:39/01:00:0f:00:00/40 tag 26 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:d8:4d:21:3a/01:00:0f:00:00/40 tag 27 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
60/08:e0:3d:da:97/00:00:03:00:00/40 tag 28 cdb 0x0 data 4096 in
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
60/10:e8:bd:ff:97/00:00:03:00:00/40 tag 29 cdb 0x0 data 8192 in
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:f0:d5:f5:39/01:00:0f:00:00/40 tag 30 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3: soft resetting port
Apr  3 19:10:32 localhost kernel: ata3: SATA link up 1.5 Gbps (SStatus 113
SControl 300)
Apr  3 19:10:32 localhost kernel: ata3.00: configured for UDMA/133
Apr  3 19:10:32 localhost kernel: ata3: EH complete
Apr  3 19:10:32 localhost kernel: SCSI device sda: 398297088 512-byte hdwr
sectors (203928 MB)
Apr  3 19:10:32 localhost kernel: sda: Write Protect is off
Apr  3 19:10:32 localhost kernel: SCSI device sda: write cache: enabled, read
cache: enabled, doesn't support DPO or FUA
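
The "cmd" lines in the log above can be decoded by hand. As a minimal sketch (a standalone userspace helper, not kernel code; the struct and function names are made up for illustration), the following assumes the field order libata uses when it prints a taskfile: command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device. For the NCQ commands 0x60 (READ FPDMA QUEUED) and 0x61 (WRITE FPDMA QUEUED), the sector count travels in the feature fields and the tag sits in bits 7:3 of nsect. The second function counts the bits set in SAct, the bitmask of outstanding tags.

```c
#include <stdint.h>
#include <stdio.h>

/* Decoded view of one "cmd" line (illustrative struct, not a kernel type). */
struct ncq_cmd {
    unsigned command;   /* 0x60 = READ FPDMA QUEUED, 0x61 = WRITE FPDMA QUEUED */
    unsigned tag;       /* NCQ tag, taken from bits 7:3 of the nsect field */
    unsigned sectors;   /* transfer length; NCQ carries it in the feature fields */
    uint64_t lba;       /* 48-bit starting LBA */
};

/* Parse the taskfile text, e.g. "61/e8:00:8d:fb:39/01:00:0f:00:00/40".
 * Returns 0 on success, -1 if the text does not have all 12 hex fields. */
int decode_ncq_cmd(const char *tf, struct ncq_cmd *out)
{
    unsigned b[12];
    int n = sscanf(tf, "%x/%x:%x:%x:%x:%x/%x:%x:%x:%x:%x/%x",
                   &b[0], &b[1], &b[2], &b[3], &b[4], &b[5],
                   &b[6], &b[7], &b[8], &b[9], &b[10], &b[11]);
    if (n != 12)
        return -1;

    out->command = b[0];
    out->tag = b[2] >> 3;                 /* nsect bits 7:3 */
    out->sectors = (b[6] << 8) | b[1];    /* hob_feature:feature */
    out->lba = ((uint64_t)b[10] << 40) | ((uint64_t)b[9] << 32) |
               ((uint64_t)b[8] << 24) | ((uint64_t)b[5] << 16) |
               ((uint64_t)b[4] << 8) | (uint64_t)b[3];
    return 0;
}

/* Count the outstanding NCQ tags in an SAct bitmask. */
unsigned count_active_tags(uint32_t sact)
{
    unsigned n = 0;
    for (; sact != 0; sact &= sact - 1)   /* clear the lowest set bit each pass */
        n++;
    return n;
}
```

Applied to the first entry, 61/e8:00:8d:fb:39/01:00:0f:00:00/40, this gives a write of 0x1e8 = 488 sectors (488 * 512 = 249856 bytes, matching "data 249856 out") with tag 0. SAct 0x7fffffff has 31 bits set, one per timed-out command (tags 0 to 30).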
Comment 25 David Juran 2007-04-13 05:00:41 EDT
Is this the correct place to suggest additions to the ata_device_blacklist in
drivers/ata/libata-core.c? If so I'd suggest adding the following entry there:

{ "Maxtor 6B200M0",     "BANC",         ATA_HORKAGE_NONCQ }

With this entry, my computer works fine again and the drive no longer locks the
machine up under load.

Comment 26 David Juran 2007-04-13 08:35:17 EDT
"Works fine" turned out to be a bit of an exaggeration: under heavy I/O the
machine hard-locked and needed a power cycle to recover )-:
I'm now running 2.6.20-1.2944.fc6 with the parameter "adma=0" passed to the
sata_nv module and this seems (so far) to work fine...
Comment 27 Chuck Ebbert 2007-04-19 11:32:46 EDT
kernel 2944 has the latest NCQ blacklist from 2.6.21
Comment 28 Vaclav "sHINOBI" Misek 2007-04-19 17:01:52 EDT
For me, 2944 shows the same problems as the previous kernels. Maybe the
sata_nv fix from the latest kernels would help.
Comment 29 Robert Hancock 2007-04-19 19:17:59 EDT
It doesn't look like David Juran's issue is a problem with the driver. The CPB
response flags indicate 0x2, which means the controller has sent the command to
the device and is waiting for it to indicate completion, which it evidently
never did. Quite likely NCQ does not work properly on that drive and it needs
to be added to the NCQ blacklist.

Disabling ADMA also disables NCQ so it is not surprising that it also stops the
problem from showing up.
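
To check a log for this pattern, a small filter can flag the CPB lines whose response flags are 0x2. This is only a sketch: the macro and function names are hypothetical, and the meaning of 0x2 is taken solely from the explanation in this comment, not from the sata_nv source.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical name. Per this comment, resp_flags 0x2 means the controller
 * has sent the command to the drive and is waiting for completion. */
#define CPB_RESP_SENT_TO_DEVICE 0x2u

/* Inspect one log line of the form
 *   "... ataN: CPB i: ctl_flags 0x1f, resp_flags 0x2"
 * Returns 1 if resp_flags equals 0x2, 0 if it has another value,
 * and -1 if the line is not a CPB line. */
int cpb_waiting_on_drive(const char *line)
{
    unsigned idx, ctl, resp;
    const char *p = strstr(line, "CPB ");

    if (p == NULL)
        return -1;
    if (sscanf(p, "CPB %u: ctl_flags %x, resp_flags %x", &idx, &ctl, &resp) != 3)
        return -1;
    return resp == CPB_RESP_SENT_TO_DEVICE;
}
```

All 31 CPB lines in comment 24 report resp_flags 0x2, so each of them would be classified as still waiting on the drive.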
Comment 30 Chuck Ebbert 2007-04-19 19:38:48 EDT
(In reply to comment #29)
> It doesn't look like David Juran's issue is a problem with the driver. The CPB
> response flags indicate 0x2 which means the controller has sent the command to
> the device and is waiting for it to indicate completion, obviously it never did.
> Quite likely NCQ does not work properly on that drive and it needs to be added
> to the NCQ blacklist.
> 

But David says he added the drive to the blacklist himself and that didn't
fix the problem. Maybe he didn't add it properly?
Comment 31 Robert Hancock 2007-04-20 02:05:35 EDT
I don't think the firmware part of the line he mentioned he added is correct.
The SCSI layer lists only the first 4 characters of the firmware string, but the
actual ATA string is longer. You need the full string (from hdparm -I, for example).
Comment 32 David Juran 2007-04-23 13:21:56 EDT
D'Oh!
So the firmware revision should be "BANC1BM0". I'll re-enable adma and try this
for a few days and let you know how it fares...
Comment 33 Robert Hancock 2007-04-23 18:41:09 EDT
If the blacklist entry has been recognized properly you should see "NCQ (not
used)" instead of "NCQ (depth 31/32)".
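
The same check can be scripted. The sketch below (the enum and function names are illustrative, not from any library) classifies a libata device line into the three cases seen in this bug: no NCQ mentioned at all, "NCQ (not used)", and "NCQ (depth a/b)". It simply returns the two numbers of the depth pair without interpreting them.

```c
#include <stdio.h>
#include <string.h>

enum ncq_state { NCQ_ABSENT, NCQ_NOT_USED, NCQ_ENABLED };

/* Classify one kernel log line describing an ATA device.
 * On NCQ_ENABLED, *a and *b receive the two numbers of "depth a/b". */
enum ncq_state ncq_state_of(const char *line, unsigned *a, unsigned *b)
{
    const char *p = strstr(line, "NCQ (");
    unsigned x, y;

    if (p == NULL)
        return NCQ_ABSENT;              /* line does not mention NCQ */
    if (strncmp(p, "NCQ (not used)", 14) == 0)
        return NCQ_NOT_USED;            /* drive supports NCQ, kernel skips it */
    if (sscanf(p, "NCQ (depth %u/%u)", &x, &y) == 2) {
        *a = x;
        *b = y;
        return NCQ_ENABLED;
    }
    return NCQ_ABSENT;                  /* unrecognized form */
}
```

For the example line from comment 4 this reports NCQ_ENABLED with the pair 31 and 32. A line without any NCQ text, as the original reporter describes in comment 6, is classified as NCQ_ABSENT.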
Comment 34 David Juran 2007-04-26 14:53:41 EDT
It seems my drive is more messed up than it has any right to be. To find
out the model and revision, I inserted the following printk calls into
ata_device_blacklisted:

        /* Swedish: "modellen ar" = "the model is", "revisionen ar" = "the revision is" */
        printk(KERN_NOTICE "modellen ar: XXX%sXXX\n", model_num);
        printk(KERN_NOTICE "revisionen ar: XXX%sXXX\n", model_rev);


and this is what I got into dmesg:

modellen ar: XXXMaxtor 6B200M0XXX
revisionen ar: XXXBANC1BM0Maxtor 6<C0>^E^?^?XXX

There seem to be some non-printable characters in model_rev! Maybe it would make
sense to just blacklist the entire model regardless of revision, i.e.

{ "Maxtor 6B200M0",     NULL,           ATA_HORKAGE_NONCQ }
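
To illustrate why the revision field matters, here is a simplified userspace model of a blacklist lookup. It is a sketch, not the code in drivers/ata/libata-core.c: the struct, the function, and the HORKAGE_NONCQ value are stand-ins, and it assumes exact string comparison with a NULL revision acting as a wildcard, which is what the behaviour reported in this thread suggests.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for ATA_HORKAGE_NONCQ; the numeric value is illustrative. */
#define HORKAGE_NONCQ 0x4UL

struct blacklist_entry {
    const char *model_num;
    const char *model_rev;      /* NULL matches every firmware revision */
    unsigned long horkage;
};

/* Entry as first tried in comment 25: revision truncated to 4 characters. */
const struct blacklist_entry truncated_rev[] = {
    { "Maxtor 6B200M0", "BANC", HORKAGE_NONCQ },
    { NULL, NULL, 0 }           /* terminator */
};

/* Entry as suggested in comment 34: match the model, ignore the revision. */
const struct blacklist_entry any_rev[] = {
    { "Maxtor 6B200M0", NULL, HORKAGE_NONCQ },
    { NULL, NULL, 0 }           /* terminator */
};

/* Return the horkage flags of the first matching entry, or 0 if none match. */
unsigned long lookup_horkage(const struct blacklist_entry *list,
                             const char *model, const char *rev)
{
    for (; list->model_num != NULL; list++) {
        if (strcmp(list->model_num, model) != 0)
            continue;
        if (list->model_rev == NULL || strcmp(list->model_rev, rev) == 0)
            return list->horkage;
    }
    return 0;
}
```

With truncated_rev, a drive reporting revision "BANC1BM0" is not matched, so NCQ stays enabled. With any_rev the model alone decides, so the entry matches even when the revision string carries trailing garbage like the one printed above.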
Comment 35 han pingtian 2007-05-13 08:06:32 EDT
Any updates? The kernel-2.6.20-1.2948.fc6.x86_64 doesn't fix this problem ....
Comment 36 han pingtian 2007-06-08 09:25:11 EDT
kernel-2.6.20-1.2952.fc6.x86_64 failed.
Any updates?
Comment 37 han pingtian 2007-06-17 08:20:29 EDT
kernel-2.6.21-1.3194.fc7 and kernel-2.6.21-1.3228.fc7 both failed in fedora 7.
Comment 38 han pingtian 2007-07-19 09:02:00 EDT
Any updates? Why does kernel-2.6.18-1.2798 not have this problem when all the
updated kernels do?
Comment 39 Jarod Wilson 2007-07-23 11:30:50 EDT
(In reply to comment #38)
> Any updates? Why does kernel-2.6.18-1.2798 not have this problem when all the
> updated kernels do?

Hard to say without having your exact system in front of us here. All these
kernels along the way work for the vast majority of users. Have you tried the
recently pushed 2.6.22.1-based kernels yet?
Comment 40 han pingtian 2007-07-23 23:42:01 EDT
> Hard to say without having your exact system in front of us here. All these
Do you need any more information? Is there anything I can do?
> kernels along the way work for the vast majority of users. Have you tried the
> recently pushed 2.6.22.1-based kernels yet?
I will try it later.
Comment 41 han pingtian 2007-07-24 08:42:01 EDT
kernel-2.6.22.1-27.fc7.x86_64 also fails.
Comment 42 Vaclav "sHINOBI" Misek 2007-08-23 07:43:40 EDT
On my system kernel-2.6.22.1-41.fc7 solved this bug.
Comment 43 Jon Stanley 2008-01-07 20:50:01 EST
(This is a mass-update to all current FC6 kernel bugs in NEW state)

Hello,

I'm reviewing this bug list as part of the kernel bug triage project, an attempt
to isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug, however this version of Fedora is no longer
maintained.

Please attempt to reproduce this bug with a current version of Fedora (presently
Fedora 8). If the bug no longer exists, please close the bug or I'll do so in a
few days if there is no further information lodged.

Thanks for using Fedora!
Comment 44 Jon Stanley 2008-02-07 23:28:20 EST
Per the previous comment in this bug, I am closing it as INSUFFICIENT_DATA,
since no information has been lodged for over 30 days.

Please re-open this bug or file a new one if you can provide the requested data,
and thanks for filing the original report!
