Bug 250267 - system freezes with kernel-2.6.22.1-33.fc7.x86_64
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: kernel
Version: 7
Hardware: x86_64 Linux
Priority: low
Severity: high
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2007-07-31 11:20 EDT by han pingtian
Modified: 2007-11-30 17:12 EST
Fixed In Version: kernel-2.6.22.1-41.fc7
Doc Type: Bug Fix
Last Closed: 2007-09-21 09:40:40 EDT

Description han pingtian 2007-07-31 11:20:38 EDT
+++ This bug was initially created as a clone of Bug #232811 +++

Description of problem:
After running kernel-2.6.20-1.2925 for a while (at random times), the system
freezes. The keyboard and mouse stop responding, but the screen contents
remain displayed. It is really "frozen".

Version-Release number of selected component (if applicable):
kernel-2.6.20-1.2925

How reproducible:
always

Steps to Reproduce:
1. Run kernel-2.6.20-1.2925 for a while.

Actual results:
The system freezes.

Expected results:


Additional info:
http://smolt.fedoraproject.org/show?UUID=e5b52a3c-03b9-4f38-a4df-14f1f872e389

-- Additional comment from cebbert@redhat.com on 2007-03-19 14:07 EST --
Can you post the log from when you boot?

Just post the contents of /var/log/dmesg


-- Additional comment from hanpingtian@gmail.com on 2007-03-20 07:00 EST --
Created an attachment (id=150474)
/var/log/dmesg


-- Additional comment from scott-bugzilla@riskboys.com on 2007-03-20 18:29 EST --
I have also been experiencing system freezes on this kernel with exactly the
same symptoms, but with i686.  Screen freezes, Pings stop, no message to screen,
nothing in the logs.

I have been running 2.6.18-1.2798.fc6-i686 since late Feb with no issues,
upgraded 24 hours ago to 2.6.20-1.2925 and have had 5 or 6 hangs since.  System
checks out with Memtest.  Booting back to 2.6.18 fixes it.

The freezes seem to coincide with heavy IO on my 8 disk RAID 5 stripe on a
Supermicro sata_mv card.  If I don't attempt to rebuild the array the system
will stay up for several hours.  Rebuilding the array under 2.6.20-1.2925 will
never complete and often the system will not even complete booting.  Again
2.6.18-1.2798 is fine.

-- Additional comment from cebbert@redhat.com on 2007-03-21 11:14 EST --
Can you post the exact models of your disk drives?

Lines in kernel log should look something like this:

scsi 0:0:0:0: Direct-Access     ATA      SAMSUNG HD160JJ/ ZM10 PQ: 0 ANSI: 5

Also, can you post whether NCQ was enabled for each drive, for example:

ata1.00: ATA-7, max UDMA7, 312500000 sectors: LBA48 NCQ (depth 31/32)
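
For anyone collecting this information from a saved kernel log, here is a minimal sketch that pulls the device name and NCQ queue depth out of lines like the one above. The regular expressions are assumptions based only on the sample line in this comment; the message wording can differ between kernel versions.

```python
import re

# Assumed patterns, derived from the sample line quoted in this comment.
DEVICE_RE = re.compile(r"(ata\d+\.\d+): (ATA-\d+),")
NCQ_RE = re.compile(r"NCQ \(depth (\d+)/(\d+)\)")


def ncq_status(line):
    """Return (device, queue_depth) for a libata identify line, or None.

    queue_depth is None when the line does not advertise an NCQ depth.
    Lines that are not identify lines (no "ATA-<n>," field) yield None.
    """
    dev = DEVICE_RE.search(line)
    if dev is None:
        return None
    ncq = NCQ_RE.search(line)
    depth = int(ncq.group(1)) if ncq else None
    return dev.group(1), depth
```

Running ncq_status over every line of /var/log/dmesg and keeping the results that are not None gives one entry per drive.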

-- Additional comment from misek@tnet.cz on 2007-03-21 19:09 EST --
Similar problems here (FC 5 with kernel-2.6.20-1.2300.fc5 and sata_nv). Log shows:

kernel: ata2: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400
kernel: ata2: CPB 0: ctl_flags 0x1f, resp_flags 0x0
kernel: ata2: CPB 1: ctl_flags 0x1f, resp_flags 0x1
kernel: ata2: CPB 2: ctl_flags 0x1f, resp_flags 0x1
kernel: ata2: CPB 3: ctl_flags 0x1f, resp_flags 0x1
kernel: ata2: CPB 4: ctl_flags 0x1f, resp_flags 0x1
kernel: ata2: CPB 5: ctl_flags 0x1f, resp_flags 0x1
kernel: ata2: CPB 6: ctl_flags 0x1f, resp_flags 0x1
kernel: ata2: Resetting port
kernel: ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
kernel: ata2.00: cmd 61/08:00:cd:e3:50/00:00:09:00:00/40 tag 0 cdb 0x0 data 4096 out
kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
kernel: ata2: soft resetting port
kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 3
kernel: ata2.00: configured for UDMA/133
kernel: ata2: EH complete

kernel: scsi 0:0:0:0: Direct-Access     ATA      ST380817AS       3.42 PQ: 0 ANSI: 5
kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
kernel: ata2.00: ATA-6, max UDMA/133, 156301488 sectors: LBA48 NCQ (depth 31/32)

The problems appeared only with this latest kernel update.


-- Additional comment from hanpingtian@gmail.com on 2007-03-22 10:07 EST --
scsi 2:0:0:0: Direct-Access     ATA      WDC WD1600JS-22M 02.0 PQ: 0 ANSI: 5

And it seems there is no "NCQ" in the logs.

-- Additional comment from cebbert@redhat.com on 2007-03-22 10:11 EST --
(In reply to comment #0)
> 
> Additional info:
> http://smolt.fedoraproject.org/show?UUID=e5b52a3c-03b9-4f38-a4df-14f1f872e389

Can you resend the smolt info while running the new kernel?

The driver info has almost certainly changed.

-- Additional comment from cebbert@redhat.com on 2007-03-22 10:32 EST --
Created an attachment (id=150666)
smolt info from hanpingtian using 2.6.18


-- Additional comment from cebbert@redhat.com on 2007-03-22 12:11 EST --
(In reply to comment #3)
> I have also been experiencing system freezes on this kernel with exactly the
> same symptoms, but with i686.  Screen freezes, Pings stop, no message to screen,
> nothing in the logs.

> The freezes seem to coincide with heavy IO on my 8 disk RAID 5 stripe on a
> Supermicro sata_mv card.  If I don't attempt to rebuild the array the system
> will stay up for several hours.  Rebuilding the array under 2.6.20-1.2925 will
> never complete and often the system will not even complete booting.  Again
> 2.6.18-1.2798 is fine.

This is a separate bug. Please file a new bugzilla report so we can track it
properly.


-- Additional comment from hanpingtian@gmail.com on 2007-03-23 09:33 EST --
Created an attachment (id=150755)
smolt profile with kernel-2.6.20-1.2925


-- Additional comment from hanpingtian@gmail.com on 2007-03-23 09:34 EST --
Created an attachment (id=150756)
smolt profile with kernel-2.6.20-1.2925


-- Additional comment from cebbert@redhat.com on 2007-03-23 17:49 EST --
Okay, it is still using sata_sil for the hard drives.


-- Additional comment from cebbert@redhat.com on 2007-03-26 10:41 EST --
Test kernels (1.2937) for this issue are at:

http://people.redhat.com/cebbert

Please test and report back.


-- Additional comment from pasik@iki.fi on 2007-03-26 11:27 EST --

Hmm.. 2.6.18 and 2.6.19 fc6 xen kernels work OK for me, but 2.6.20 froze after
a while (anywhere from a couple of seconds to a few minutes)..

Now this 1.2937 crashes immediately during the bootup.. :(

Anything I can try?



-- Additional comment from pasik@iki.fi on 2007-03-26 11:34 EST --

I tried 1.2937 again, and on the second and third tries it booted OK. I wonder
what happened on the first try, when the system rebooted itself while booting
the kernel.

Now let's see if 1.2937 actually stays up and doesn't crash by itself like
1.2933 did. 

My hardware is Intel P4 with i955x chipset, ahci sata disks.


-- Additional comment from pasik@iki.fi on 2007-03-26 11:58 EST --
No win.. the server is rebooting itself every 1-10 mins with 1.2937.. the server
is idle when that happens (or maybe md-raid1 reconstruction running, but nothing
else). 



-- Additional comment from cebbert@redhat.com on 2007-03-26 12:01 EST --
(In reply to comment #16)
> No win.. the server is rebooting itself every 1-10 mins with 1.2937.. the server
> is idle when that happens (or maybe md-raid1 reconstruction running, but nothing
> else). 
> 
> 

Please report a separate bug for this, as it involves Xen.




-- Additional comment from pasik@iki.fi on 2007-03-26 12:40 EST --
Done: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=234008

-- Additional comment from pasik@iki.fi on 2007-03-27 02:51 EST --

Also related?: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=233918

-- Additional comment from cebbert@redhat.com on 2007-03-27 10:26 EST --
Can someone who originally reported this bug please test kernel 2937 or greater?
The Xen problem is a completely different bug.


-- Additional comment from hanpingtian@gmail.com on 2007-03-28 09:00 EST --
(In reply to comment #20)
> Can someone who originally reported this bug please test kernel 2937 or greater?
> The Xen problem is a completely different bug.
> 
I am testing it now ....
One question: there is no such package as "kmod-fglrx.2.6.20-1.2937", but my
X-window is still running, could you tell me why?

-- Additional comment from cebbert@redhat.com on 2007-03-28 09:30 EST --
(In reply to comment #21)
> I am testing it now ....
> One question: there is no such package as "kmod-fglrx.2.6.20-1.2937", but my
> X-window is still running, could you tell me why?

I was wondering about that myself...


-- Additional comment from hanpingtian@gmail.com on 2007-03-28 10:05 EST --
It froze just now.
Before that, I was running yum. It blocked on a futex and I killed it. Shortly
afterwards, the system froze.


-- Additional comment from djuran@redhat.com on 2007-04-03 12:53 EST --
I've just tested with 2.6.20-1.2940.fc6 and it works considerably better than
2933, but still not perfectly. With 2933 the computer locked up completely, and
a power cycle (a reset was not enough) was required to regain access to the
SATA disk. Performing the same operation with 2940, the system became
unresponsive for a while and then the messages below showed up in the syslog,
but the machine recovered.


Apr  3 19:10:22 localhost kernel: ata3: EH in ADMA mode, notifier 0x0
notifier_error 0x0 gen_ctl 0x1501000 status 0x400
Apr  3 19:10:22 localhost kernel: ata3: CPB 0: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:22 localhost kernel: ata3: CPB 1: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:22 localhost kernel: ata3: CPB 2: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:22 localhost kernel: ata3: CPB 3: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:22 localhost kernel: ata3: CPB 4: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:22 localhost kernel: ata3: CPB 5: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:22 localhost kernel: ata3: CPB 6: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:22 localhost kernel: ata3: CPB 7: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:22 localhost kernel: ata3: CPB 8: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:22 localhost kernel: ata3: CPB 9: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:22 localhost kernel: ata3: CPB 10: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:30 localhost kernel: ata3: CPB 11: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:30 localhost kernel: ata3: CPB 12: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 13: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 14: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 15: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 16: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 17: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 18: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 19: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 20: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 21: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 22: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 23: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 24: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 25: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 26: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 27: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 28: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 29: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: CPB 30: ctl_flags 0x1f, resp_flags 0x2
Apr  3 19:10:31 localhost kernel: ata3: Resetting port
Apr  3 19:10:32 localhost kernel: ata3.00: exception Emask 0x0 SAct 0x7fffffff
SErr 0x0 action 0x2 frozen
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:00:8d:fb:39/01:00:0f:00:00/40 tag 0 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
60/00:08:75:99:59/02:00:0e:00:00/40 tag 1 cdb 0x0 data 262144 in
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/80:10:55:14:3a/01:00:0f:00:00/40 tag 2 cdb 0x0 data 196608 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:18:dd:15:3a/01:00:0f:00:00/40 tag 3 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:20:85:10:3a/01:00:0f:00:00/40 tag 4 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:28:c5:17:3a/01:00:0f:00:00/40 tag 5 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:30:ad:19:3a/01:00:0f:00:00/40 tag 6 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:38:35:23:3a/01:00:0f:00:00/40 tag 7 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:40:1d:25:3a/01:00:0f:00:00/40 tag 8 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:48:b5:0c:3a/01:00:0f:00:00/40 tag 9 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:50:05:27:3a/01:00:0f:00:00/40 tag 10 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:58:65:f2:39/01:00:0f:00:00/40 tag 11 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/80:60:4d:f4:39/01:00:0f:00:00/40 tag 12 cdb 0x0 data 196608 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
60/10:68:b5:d9:97/00:00:03:00:00/40 tag 13 cdb 0x0 data 8192 in
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:70:75:fd:39/01:00:0f:00:00/40 tag 14 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:78:5d:ff:39/01:00:0f:00:00/40 tag 15 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:80:9d:0e:3a/01:00:0f:00:00/40 tag 16 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:88:6d:12:3a/01:00:0f:00:00/40 tag 17 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:90:2d:03:3a/01:00:0f:00:00/40 tag 18 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:98:fd:06:3a/01:00:0f:00:00/40 tag 19 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:a0:e5:08:3a/01:00:0f:00:00/40 tag 20 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:a8:a5:f9:39/01:00:0f:00:00/40 tag 21 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:b0:cd:0a:3a/01:00:0f:00:00/40 tag 22 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:b8:95:1b:3a/01:00:0f:00:00/40 tag 23 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:c0:7d:1d:3a/01:00:0f:00:00/40 tag 24 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:c8:65:1f:3a/01:00:0f:00:00/40 tag 25 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:d0:bd:f7:39/01:00:0f:00:00/40 tag 26 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:d8:4d:21:3a/01:00:0f:00:00/40 tag 27 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
60/08:e0:3d:da:97/00:00:03:00:00/40 tag 28 cdb 0x0 data 4096 in
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
60/10:e8:bd:ff:97/00:00:03:00:00/40 tag 29 cdb 0x0 data 8192 in
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3.00: cmd
61/e8:f0:d5:f5:39/01:00:0f:00:00/40 tag 30 cdb 0x0 data 249856 out
Apr  3 19:10:32 localhost kernel:          res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr  3 19:10:32 localhost kernel: ata3: soft resetting port
Apr  3 19:10:32 localhost kernel: ata3: SATA link up 1.5 Gbps (SStatus 113
SControl 300)
Apr  3 19:10:32 localhost kernel: ata3.00: configured for UDMA/133
Apr  3 19:10:32 localhost kernel: ata3: EH complete
Apr  3 19:10:32 localhost kernel: SCSI device sda: 398297088 512-byte hdwr
sectors (203928 MB)
Apr  3 19:10:32 localhost kernel: sda: Write Protect is off
Apr  3 19:10:32 localhost kernel: SCSI device sda: write cache: enabled, read
cache: enabled, doesn't support DPO or FUA
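
A note on reading the exception line in the log above: the SAct value is a bitmask with one bit per outstanding NCQ tag. The small sketch below (the helper name is made up for illustration) lists which tags are set. For 0x7fffffff it gives tags 0 through 30, which lines up with the 31 timed-out commands in the log.

```python
def active_tags(sact):
    """Return the NCQ tag numbers whose bit is set in an SAct bitmask."""
    return [tag for tag in range(32) if sact & (1 << tag)]


# SAct value copied from the "exception" line in the log above.
tags = active_tags(0x7FFFFFFF)
print(len(tags), "commands outstanding, tags", tags[0], "to", tags[-1])
```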


-- Additional comment from djuran@redhat.com on 2007-04-13 05:00 EST --
Is this the correct place to suggest additions to the ata_device_blacklist in
drivers/ata/libata-core.c? If so I'd suggest adding the following entry there:

{ "Maxtor 6B200M0",     "BANC",         ATA_HORKAGE_NONCQ }

With this entry, my computer works fine again and the drive no longer locks the
machine up under load.



-- Additional comment from djuran@redhat.com on 2007-04-13 08:35 EST --
"works fine" turned out to be a bit of an exaggeration: under heavy I/O the
machine hard-locked and needed a power cycle to recover )-:
I'm now running 2.6.20-1.2944.fc6 with the parameter "adma=0" passed to the
sata_nv module and this seems (so far) to work fine...

-- Additional comment from cebbert@redhat.com on 2007-04-19 11:32 EST --
kernel 2944 has the latest NCQ blacklist from 2.6.21

-- Additional comment from misek@tnet.cz on 2007-04-19 17:01 EST --
For me, 2944 causes the same problems as the previous kernels. Maybe the
sata_nv fix from the latest kernels will help.

-- Additional comment from hancockr@shaw.ca on 2007-04-19 19:17 EST --
It doesn't look like David Juran's issue is a problem with the driver. The CPB
response flags indicate 0x2 which means the controller has sent the command to
the device and is waiting for it to indicate completion, obviously it never did.
Quite likely NCQ does not work properly on that drive and it needs to be added
to the NCQ blacklist.

Disabling ADMA also disables NCQ so it is not surprising that it also stops the
problem from showing up.
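
To check quickly whether every CPB slot in a pasted log is in the same state, the sketch below tallies the resp_flags values. It only counts values. The meaning of 0x2 is taken from the explanation in this comment, and the regular expression is an assumption based on the log lines posted in this bug.

```python
import re
from collections import Counter

# Assumed format, taken from the "CPB n: ctl_flags ..., resp_flags ..." lines
# posted earlier in this bug.
CPB_RE = re.compile(r"CPB (\d+): ctl_flags (0x[0-9a-f]+), resp_flags (0x[0-9a-f]+)")


def resp_flag_counts(log_lines):
    """Count how many CPB slots report each resp_flags value."""
    counts = Counter()
    for line in log_lines:
        match = CPB_RE.search(line)
        if match:
            counts[int(match.group(3), 16)] += 1
    return counts
```

Feeding it the 2940 log from David Juran should report a single key, 0x2, for all 31 slots.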

-- Additional comment from cebbert@redhat.com on 2007-04-19 19:38 EST --

(In reply to comment #29)
> It doesn't look like David Juran's issue is a problem with the driver. The CPB
> response flags indicate 0x2 which means the controller has sent the command to
> the device and is waiting for it to indicate completion, obviously it never did.
> Quite likely NCQ does not work properly on that drive and it needs to be added
> to the NCQ blacklist.
> 

But David says he added the drive to the blacklist himself and that didn't
fix the problem. Maybe he didn't add it properly?


-- Additional comment from hancockr@shaw.ca on 2007-04-20 02:05 EST --
I don't think the firmware part of the blacklist line he added is correct.
The SCSI layer lists only the first 4 characters of the firmware string, but the
actual ATA string is longer. You need the full string (from hdparm -I, for example).

-- Additional comment from djuran@redhat.com on 2007-04-23 13:21 EST --
D'Oh!
So the firmware revision should be "BANC1BM0". I'll re-enable adma and try this
for a few days and let you know how it fares...

-- Additional comment from hancockr@shaw.ca on 2007-04-23 18:41 EST --
If the blacklist entry has been recognized properly you should see "NCQ (not
used)" instead of "NCQ (depth 31/32)".

-- Additional comment from djuran@redhat.com on 2007-04-26 14:53 EST --
It seems my drive is more messed up than it has any kind of right to be. To find
out the model and revision, I inserted the following printk calls into
ata_device_blacklisted:

        printk(KERN_NOTICE "modellen ar: XXX%sXXX\n",model_num);
        printk(KERN_NOTICE "revisionen ar: XXX%sXXX\n",model_rev);


and this is what I got into dmesg:

modellen ar: XXXMaxtor 6B200M0XXX
revisionen ar: XXXBANC1BM0Maxtor 6<C0>^E^?^?XXX

There seem to be some non-printable characters in model_rev! Maybe it would make
sense to just blacklist the entire model regardless of revision, i.e.

{ "Maxtor 6B200M0",     NULL,           ATA_HORKAGE_NONCQ }
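
To illustrate why the NULL revision helps, here is a rough model of the lookup in Python. It is a sketch under assumptions, not the kernel code: it assumes the model and revision are compared as exact strings and that an entry with no revision matches every firmware version. The function name and the example bytes after "BANC1BM0" are illustrative.

```python
def is_blacklisted(entries, model, revision):
    """Return True if (model, revision) matches a blacklist entry.

    Assumption: strings are compared exactly, and an entry whose revision
    is None matches every firmware revision of that model.
    """
    for entry_model, entry_rev in entries:
        if entry_model != model:
            continue
        if entry_rev is None or entry_rev == revision:
            return True
    return False


MODEL = "Maxtor 6B200M0"
# Stand-in for the revision string printed above, including the trailing
# bytes after the 8-character firmware revision.
REPORTED_REV = "BANC1BM0Maxtor 6\xc0\x05\x7f\x7f"

print(is_blacklisted([(MODEL, "BANC")], MODEL, REPORTED_REV))      # truncated revision
print(is_blacklisted([(MODEL, "BANC1BM0")], MODEL, REPORTED_REV))  # full revision
print(is_blacklisted([(MODEL, None)], MODEL, REPORTED_REV))        # any revision
```

Under these assumptions only the last entry matches, because the trailing bytes make the reported revision differ from both "BANC" and "BANC1BM0".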

-- Additional comment from hanpingtian@gmail.com on 2007-05-13 08:06 EST --
Any updates? The kernel-2.6.20-1.2948.fc6.x86_64 doesn't fix this problem ....

-- Additional comment from hanpingtian@gmail.com on 2007-06-08 09:25 EST --
kernel-2.6.20-1.2952.fc6.x86_64 failed.
Any updates?

-- Additional comment from hanpingtian@gmail.com on 2007-06-17 08:20 EST --
kernel-2.6.21-1.3194.fc7 and kernel-2.6.21-1.3228.fc7 both failed in Fedora 7.

-- Additional comment from hanpingtian@gmail.com on 2007-07-19 09:02 EST --
Any updates? Why does kernel-2.6.18-1.2798 not have this problem when all the
updated kernels do?

-- Additional comment from jwilson@redhat.com on 2007-07-23 11:30 EST --
(In reply to comment #38)
> Any updates? Why does kernel-2.6.18-1.2798 not have this problem when all the
> updated kernels do?

Hard to say without having your exact system in front of us here. All these
kernels along the way work for the vast majority of users. Have you tried the
recently pushed 2.6.22.1-based kernels yet?

-- Additional comment from hanpingtian@gmail.com on 2007-07-23 23:42 EST --

> Hard to say without having your exact system in front of us here. All these
Do you need any more information? Is there anything I can do?
> kernels along the way work for the vast majority of users. Have you tried the
> recently pushed 2.6.22.1-based kernels yet?
I will try it later.


-- Additional comment from hanpingtian@gmail.com on 2007-07-24 08:42 EST --
kernel-2.6.22.1-27.fc7.x86_64 fails also ...
Comment 1 han pingtian 2007-07-31 11:22:31 EDT
The newest kernel kernel-2.6.22.1-33.fc7.x86_64 fails also. I have to clone it
to F7 from fc6.
Comment 2 Chuck Ebbert 2007-08-22 16:24:14 EDT
(In reply to comment #1)
> The newest kernel kernel-2.6.22.1-33.fc7.x86_64 fails also. I have to clone it
> to F7 from fc6.

Does adding "pci=nomsi,nommconf" to the kernel command line help, or did we try
that already?
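
To confirm that the options actually took effect after a reboot, one way is to look at the command line the running kernel was booted with. A minimal sketch, with a made-up helper name, that extracts the pci= values from a command-line string:

```python
def pci_options(cmdline):
    """Return the comma-separated values of every pci= parameter."""
    options = []
    for token in cmdline.split():
        if token.startswith("pci="):
            options.extend(token[len("pci="):].split(","))
    return options
```

On a running system the string can be read from /proc/cmdline, for example pci_options(open("/proc/cmdline").read()).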
Comment 3 han pingtian 2007-08-23 04:11:13 EDT
(In reply to comment #2)
> (In reply to comment #1)
> > The newest kernel kernel-2.6.22.1-33.fc7.x86_64 fails also. I have to clone it
> > to F7 from fc6.
> 
> Does adding "pci=nomsi,nommconf" to the kernel command line help, or did we try
> that already?

No, I haven't tried those command-line options, and I have now switched to the
i386 release.
Comment 4 Vaclav "sHINOBI" Misek 2007-08-23 07:42:25 EDT
On my system it seems this bug is solved by kernel-2.6.22.1-41.fc7, although I'm
not sure what was changed. BTW, I hadn't tried the pci=nomsi,nommconf option before.
Comment 5 Christopher Brown 2007-09-21 09:40:40 EDT
Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am closing this bug as it appears resolved. If I have erred, please accept my
profuse apologies and re-open and I will attempt to assist in its resolution.

Cheers
Chris
