Bug 199142

Summary: sata_promise(?) BUG unable to handle NULL pointer dereference
Product: [Fedora] Fedora
Component: kernel
Version: 5
Hardware: athlon
OS: Linux
Status: CLOSED UPSTREAM
Severity: medium
Priority: medium
Reporter: Josep <josep.puigdemont>
Assignee: Jeff Garzik <jgarzik>
QA Contact: Brian Brock <bbrock>
CC: davej, ddalton, dzrudy, jason_mack, jason__m, lsof, mail, mh, peterm, p.r.schaffner, whitefrost01, wtogami
Target Milestone: ---
Target Release: ---
Doc Type: Bug Fix
Last Closed: 2006-11-14 16:19:53 UTC
Bug Blocks: 172490
Attachments:
Description                                          Flags
Boot messages                                        none
lspci                                                none
verbose lspci                                        none
lspci -v                                             none
Strange messages                                     none
Boot messages, with udev messages set to "debug"     none
Proposed fix                                         none
sata_promise patch                                   none

Description Josep 2006-07-17 14:58:33 UTC
Description of problem:
Fedora Core won't boot with kernel 2.6.17-1.2157_FC5 (it also happens with
earlier builds of the 2.6.17 kernel).
The boot process stops when trying to start udev.
There seems to be a BUG in some kernel module (see attachments).

Version-Release number of selected component (if applicable):
kernel-2.6.17-1.2157_FC5

How reproducible:
Always

Steps to Reproduce:
Boot FC5 with kernel-2.6.17-1.2157_FC5
(You'd probably need the right hardware to reproduce it; I'll attach the output
of lspci.)
  
Actual results:
Boot process hangs on udev (after first dumping a stack trace)

Expected results:
boot normally

Additional info:
This doesn't happen with 2.6.16 kernels (or earlier).
I could not find this bug in bugzilla.kernel.org.

Comment 1 Josep 2006-07-17 14:58:33 UTC
Created attachment 132549 [details]
Boot messages

Comment 2 Josep 2006-07-17 15:00:49 UTC
Created attachment 132550 [details]
lspci

Comment 3 Josep 2006-07-17 15:04:54 UTC
Created attachment 132551 [details]
verbose lspci

I meant to do a "lspci -v" for more info. Here we go, sorry for the noise.

Comment 4 Josep 2006-07-17 22:52:09 UTC
I've just realized this might be a duplicate of #197441
(commenting there too)


Comment 5 Joachim Selke 2006-07-24 14:30:16 UTC
Created attachment 132918 [details]
lspci -v

I can confirm this bug. On my system the problem is even worse. When booting I
don't get any further than the message "Red Hat nash version 5.0.32 starting".
After that I get "BUG: unable to handle kernel NULL pointer dereference ..."
(see the above attachment of boot messages for the rest of it, it looks very
similar here).

I attached the output of "lspci -v".

Comment 6 Joachim Selke 2006-07-24 15:01:40 UTC
I forgot to mention the kernels I tried during the last weeks:

2.6.16-1.2133_FC5smp -> works
2.6.17-1.2139_FC5smp -> problem
2.6.17-1.2145_FC5smp -> problem
2.6.17-1.2157_FC5smp -> problem
2.6.17-1.2159_FC5smp -> problem

Comment 7 Josep 2006-08-11 07:13:32 UTC
The problem still persists with kernel-2.6.17-1.2174_FC5

If I am not mistaken, FC6 will use 2.6.17 kernels (and above), which means it
won't work for people using that SATA controller. Should we raise the severity
to HIGH?

I'm still not sure if it is a duplicate of bug 197441.


Comment 8 chopperboy 2006-08-15 16:10:44 UTC
This is just a "me-too" comment.  I'm seeing the same thing on my Athlons (both
64 and 32 bit machines).  Is someone looking into this?  All bugs filed on this
issue seem to be unassigned.

Comment 9 Josep 2006-09-17 09:54:30 UTC
Created attachment 136477 [details]
Strange messages

These strange messages appear _always_ after loading sata_promise.
Notice though that the media check dialog looks fine (all messages after that
are ok)

Comment 10 Josep 2006-09-17 10:57:39 UTC
I can confirm that this issue is still present in the current kernel
2.6.17-1.2187_FC5.
I can also confirm that it disappears if I disable the SATA P20579 controller in
the BIOS.

About comment #9, it's the fc6t3 installation cd. The strange text messages
always appear when the sata controller is enabled, and never when it is disabled.


Comment 11 Dawid Zamirski 2006-09-19 13:24:15 UTC
Just FYI, commenting out the linux-2.6-sata-promise-pata-ports.patch from the
spec file in the kernel's src.rpm solved the problem for me. See bug #201966.

Comment 12 Dave Jones 2006-10-16 20:17:02 UTC
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed.  See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.

Comment 13 Dawid Zamirski 2006-10-17 12:56:04 UTC
Hi Dave,

I have just installed the new kernel and it still doesn't boot until I rebuild
the rpm without the sata-promise-pata-ports patch. I have also plugged my drives
into my second onboard controller (sata_via driver) and disabled the Promise one
in the BIOS. After that the kernel's boot process goes a little further, but
the machine freezes at the udev startup stage. While running on sata_via, I was
able to see that sata_promise is still being loaded and causes a traceback. I
couldn't catch the full traceback since I don't have a serial console and
boot_delay=500 doesn't help because it's the initrd stage; however, it seems to be
identical to the one I reported in my other bug (#201966). Also in this case
(running on sata_via), removing the sata-promise-pata-ports patch allows me to
boot successfully.

Comment 14 Josep 2006-10-17 20:56:38 UTC
Like Dawid, I can also confirm that the kernel still crashes with a similar
error message at startup (the NULL pointer dereference).
Although the kernel seems to continue booting, it freezes when starting udev (I
once left it for a couple of hours without anything happening).
When I disable the SATA P20579 device in the BIOS (as I mention in comment 10),
the kernel boots without problems (udev gives an error that I never remember to
write down; I don't think it's relevant though).

Although probably not relevant, I noticed that Windows crashes when trying to
turn off the computer (it resets instead, due to the crash). When I disable the
PATA device, Windows can turn off the computer normally.


Comment 15 Andy 2006-10-21 03:44:29 UTC
I apparently fixed the problem by removing the logical volumes and manually
configuring the partitions without any logical volumes.

Comment 16 Josep 2006-11-11 21:23:00 UTC
Just to confirm that this bug is still present with the kernel shipped with FC 6
(2.6.18), although the crash (the "unable to handle NULL pointer dereference")
happens at a later stage, around when it starts udev.

Please, let me know if you'd like me to send the boot messages log.


Comment 17 Dave Jones 2006-11-12 07:26:16 UTC
yes, if the crash message makes it into the logs, please do attach.


Comment 18 Josep 2006-11-13 00:11:09 UTC
Hi, this is the "backtrace" generated during boot (I attached a serial terminal
so I could get the messages captured in a file).

This is an extract of the log that I'll attach (some udev messages were mixed up
in here, sorry).
You'll see the nvidia kernel module inserted. If you think that's causing the
problem, I'll remove it. I need to add that sometimes the crash actually
happened _before_ the nvidia module was inserted.


BUG: unable to handle kernel NULL pointer dereferenced
udevd-event[ at virtual address 00000008
704]: wait_for_s printing eip:
ysfs: file '/sys*pde = 3f657067
/devices/pci0000Oops: 0000 [#1]
SMP 
last sysfs file: /devices/pci0000:00/0000:00:10.2/usb3/usbdev3.1_ep00/dev
Modules linked in: soundcore r8169 emu10k1_gp gameport ohci1394 ieee1394
sata_promise nvidia(U) i2c_viapro k8_edac edac_mc i2c_core serio_raw ide_cd
cdrom dm_snapshot dm_zero dm_mirror dm_mod usb_storage sata_via libata sd_mod
scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
CPU:    0
EIP:    0060:[<f89a154c>]    Tainted: P      VLI
EFLAGS: 00010202   (2.6.18-1.2798.fc6 #1) 
EIP is at pdc_sata_scr_read+0x14/0x17 [sata_promise]
eax: 00000008   ebx: f71c035c   ecx: f71c035c   edx: 00000002
esi: f7e62df8   edi: f71c035c   ebp: f71bcd78   esp: f7e62d98
ds: 007b   es: 007b   ss: 0068
Process modprobe (pid: 723, ti=f7e62000 task=f7f04920 task.ti=f7e62000)
Stack: f88f1a22 f71bc930 f71bc4e8 f88f657d f88fcceb f7c18c80 00000053 f88fcd97 
       f88aa300 f88aa338 00000000 000000c1 f7087400 f7c18c80 00000000 00000003 
       f7fa3048 00000002 00000002 00000003 00000003 00000005 00000005 f71bc930 
Call Trace:
 [<f88f1a22>] sata_scr_read+0x1a/0x28 [libata]
 [<f88f657d>] ata_device_add+0x406/0x765 [libata]
 [<f89a1a25>] pdc_ata_init_one+0x2dc/0x317 [sata_promise]
 [<c04f0293>] pci_device_probe+0x36/0x57
 [<c05525b1>] driver_probe_device+0x45/0x9a
 [<c05526dc>] __driver_attach+0x65/0x8f
 [<c0552036>] bus_for_each_dev+0x37/0x59
 [<c0552512>] driver_attach+0x16/0x18
 [<c0551d2e>] bus_add_driver+0x6f/0x10d
 [<c04f03c5>] __pci_register_driver+0x49/0x63
 [<c043f1fb>] sys_init_module+0x17de/0x1977
 [<c0404013>] syscall_call+0x7/0xb
DWARF2 unwinder stuck at syscall_call+0x7/0xb
Leftover inexact backtrace:
 =======================
Code: 69 42 a8 c7 e8 ab 44 a6 c7 83 c4 14 89 da 89 f0 5b 5e e9 1d 81 f5 ff 89 c1
83 c8 ff 83 fa 02 77 0c 8d 04 95 00 00 00 00 03 41 68 <8b> 00 c3 8b 80 b8 1f 00
00 8b 40 18 8b 40 40 c3 83 fa 02 53 89 
EIP: [<f89a154c>] pdc_sata_scr_read+0x14/0x17 [sata_promise] SS:ESP 0068:f7e62d98
 :00/0000:00:06.0/bus' appeared after 0 loops
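
A note on reading the trace above: the fault address 00000008 together with
edx: 00000002 is consistent with a load from "base + 4 * index" where the base
is zero. The short Python sketch below just does that arithmetic. It assumes
(this is not stated anywhere in the thread) that pdc_sata_scr_read forms the
SCR address as a per-port base plus four bytes per register index, which is how
the Code: bytes appear to read. The name implied_base is made up for
illustration.

```python
# Values copied from the oops in this comment.
FAULT_ADDRESS = 0x00000008  # "at virtual address 00000008"
EDX = 0x00000002            # assumed to hold the SCR register index
REGISTER_STRIDE = 4         # assumption: SCR registers are 4 bytes apart


def implied_base(fault_address: int, index: int, stride: int = REGISTER_STRIDE) -> int:
    """Return the base implied by fault_address = base + index * stride."""
    return fault_address - index * stride


base = implied_base(FAULT_ADDRESS, EDX)
print(f"implied SCR base: {base:#010x}")
```

If that reading is right, a base of zero would mean the port being probed had
no SCR register block set up at all, which would fit a PATA port being treated
as if it were SATA.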


Comment 19 Josep 2006-11-13 00:20:49 UTC
Created attachment 141013 [details]
Boot messages, with udev messages set to "debug"

Boot messages with udev debug messages.

Notice that close to the bottom of the file, you'll see a string "FET"; this is
the translated message "DONE" that appears when each of the init scripts
successfully finishes.

Right now it is impossible for me to boot into Fedora, but if you need more
information or want me to change something, I have a dual boot with Debian, and
from there I could modify anything needed.

Comment 20 Josep 2006-11-13 01:15:33 UTC
Just to make sure, these are the messages without the nvidia module inserted.

                Press 'I' to enter interactive startup.                         
S'est.  configurant el rellotge  (utc): dl nov 13 02:11:57 CET 2006 [ FET  ]    
S'est.  iniciant el udev: BUG: unable to handle kernel NULL pointer dereference 
at virtual address 00000008                                                     
 printing eip:                                                                  
*pde = 3ef44067                                                                 
Oops: 0000 [#1]                                                                 
SMP                                                                             
last sysfs file: /class/input/input2/event2/dev                                 
Modules linked in: ieee1394 r8169 ide_cd sata_promise emu10k1_gp i2c_viapro
k8_edac cdrom gameport i2c_core pcspkr edac_mc snd_seq_device snd_timer
snd_page_alloc snd_util_mem serio_raw snd_hwdep snd soundcore dm_snapshot
dm_zero dm_mirror dm_mod usb_storage sata_via libata sd_mod scsi_mod ext3 jbd
ehci_hcd ohci_hcd uhci_hcd
CPU:    0                                                                       
EIP:    0060:[<f898754c>]    Not tainted VLI                                    
EFLAGS: 00010202   (2.6.18-1.2798.fc6 #1)                                       
EIP is at pdc_sata_scr_read+0x14/0x17 [sata_promise]                            
eax: 00000008   ebx: f723c35c   ecx: f723c35c   edx: 00000002                   
esi: f7d38df8   edi: f723c35c   ebp: f7218d78   esp: f7d38d98                   
ds: 007b   es: 007b   ss: 0068                                                  
Process modprobe (pid: 617, ti=f7d38000 task=f7f08430 task.ti=f7d38000)         
Stack: f88f1a22 f7218930 f72184e8 f88f657d f88fcceb f71458c0 00000053 f88fcd97  
       f8866300 f8866338 00000000 000000b1 f7037000 f71458c0 00000000 00000003  
       f7fa3048 00000002 00000002 00000003 00000003 00000005 00000005 f7218930  
Call Trace:                                                                     
 [<f88f1a22>] sata_scr_read+0x1a/0x28 [libata]                                  
 [<f88f657d>] ata_device_add+0x406/0x765 [libata]                               
 [<f8987a25>] pdc_ata_init_one+0x2dc/0x317 [sata_promise]                       
 [<c04f0293>] pci_device_probe+0x36/0x57                                        
 [<c05525b1>] driver_probe_device+0x45/0x9a                                     
 [<c05526dc>] __driver_attach+0x65/0x8f                                         
 [<c0552036>] bus_for_each_dev+0x37/0x59                                        
 [<c0552512>] driver_attach+0x16/0x18                                           
 [<c0551d2e>] bus_add_driver+0x6f/0x10d                                         
 [<c04f03c5>] __pci_register_driver+0x49/0x63                                   
 [<c043f1fb>] sys_init_module+0x17de/0x1977                                     
 [<c0404013>] syscall_call+0x7/0xb                                              
DWARF2 unwinder stuck at syscall_call+0x7/0xb                                   
Leftover inexact backtrace:                                                     
 =======================                                                        
Code: 69 e2 a9 c7 e8 ab e4 a7 c7 83 c4 14 89 da 89 f0 5b 5e e9 1d 21 f7 ff 89 c1
 83 c8 ff 83 fa 02 77 0c 8d 04 95 00 00 00 00 03 41 68 <8b> 00 c3 8b 80 b8 1f 00
 00 8b 40 18 8b 40 40 c3 83 fa 02 53 89                                         
EIP: [<f898754c>] pdc_sata_scr_read+0x14/0x17 [sata_promise] SS:ESP 0068:f7d38d98
 udevd-event[608]: run_program: '/sbin/modprobe' abnormal exit                  
<6>Floppy drive(s): fd0 is 1.44M                                                
FDC 0 is a post-1991 82077


Comment 21 Jeff Garzik 2006-11-13 15:03:28 UTC
Created attachment 141053 [details]
Proposed fix

See if this patch fixes things.

Comment 22 Need Real Name 2006-11-13 18:03:13 UTC
*** Bug 198937 has been marked as a duplicate of this bug. ***

Comment 23 Dawid Zamirski 2006-11-13 19:24:31 UTC
*** Bug 201966 has been marked as a duplicate of this bug. ***

Comment 24 Josep 2006-11-13 19:48:16 UTC
Trying to apply your patch, I discovered that the file
drivers/ata/sata_promise.c
does not exist, although there is a
drivers/scsi/sata_promise.c
with similar (probably identical) code. Should I patch the latter file instead?

I am using the following source rpm:
kernel-2.6.18-1.2849.fc6.src.rpm


Comment 25 Markus Hakansson 2006-11-13 20:39:15 UTC
*** Bug 212320 has been marked as a duplicate of this bug. ***

Comment 26 Dawid Zamirski 2006-11-13 23:56:12 UTC
Jeff,

I applied it to drivers/scsi/sata_promise.c and it now oopses in
pdc_sata_scr_write instead of pdc_sata_scr_read. It seems to have exactly the
same "if" condition, so I'll patch it with the same change and let you know if
it helps.

Comment 27 Dawid Zamirski 2006-11-14 04:40:24 UTC
Created attachment 141129 [details]
sata_promise patch

Applying the same "if" condition in pdc_sata_scr_read and pdc_sata_scr_write
solved the problem for me :-)
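
The attachment itself is not reproduced in this thread, so the exact condition
is not shown here. As a rough illustration only, the Python toy model below
shows the general shape of such a guard: both accessors refuse to touch a port
that has no SCR register block instead of dereferencing a null base. The class
Port, its fields, and the "no SCR block" test are all hypothetical stand-ins
for whatever the real patch checks; this is not kernel code.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

SCR_CONTROL = 2        # highest valid SCR index in this toy model
INVALID = 0xFFFFFFFF   # value returned when a read is refused


@dataclass
class Port:
    """Hypothetical stand-in for a controller port (not a kernel structure)."""
    scr_base: Optional[int]  # None models a port with no SCR register block
    registers: Dict[int, int] = field(default_factory=dict)


def scr_read(port: Port, sc_reg: int) -> int:
    # Guard: refuse out-of-range indexes and ports without an SCR block.
    if sc_reg > SCR_CONTROL or port.scr_base is None:
        return INVALID
    return port.registers.get(port.scr_base + sc_reg * 4, 0)


def scr_write(port: Port, sc_reg: int, value: int) -> None:
    # Same guard as scr_read, so reads and writes agree on which ports are valid.
    if sc_reg > SCR_CONTROL or port.scr_base is None:
        return
    port.registers[port.scr_base + sc_reg * 4] = value


sata = Port(scr_base=0x400)   # made-up base address
pata = Port(scr_base=None)    # port with nothing to read or write
scr_write(sata, SCR_CONTROL, 0x300)
scr_write(pata, SCR_CONTROL, 0x300)  # ignored by the guard
print(hex(scr_read(sata, SCR_CONTROL)), hex(scr_read(pata, SCR_CONTROL)))
```

The point of the model is only that the read and the write path need the same
check; guarding one of them just moves the crash to the other, as comment 26
found.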

Comment 28 White FrosT 2006-11-14 16:09:53 UTC
I have removed the sata promise patch from the kernel config to get a TX2 150
going. It is running now, but gives me a lot of warning messages:
kernel: ata1: status=0x50 { DriveReady SeekComplete }
kernel: ata1: no sense translation for status: 0x50
kernel: ata1: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
Same problem as described here: http://lkml.org/lkml/2006/10/25/198 ?
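
For anyone puzzling over the status=0x50 line: the text in braces is a decode
of the ATA status byte. A minimal Python sketch of that decode follows. The bit
values are the standard ATA status register bits; the helper name decode_status
is made up, and the label spellings are illustrative (the two that occur in the
log above are copied from it).

```python
from typing import List, Tuple

# Standard ATA status register bits, highest bit first.
STATUS_BITS: List[Tuple[int, str]] = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_status(status: int) -> List[str]:
    """Return the names of the bits that are set in an ATA status byte."""
    return [name for mask, name in STATUS_BITS if status & mask]


print(decode_status(0x50))
```

0x50 has neither the Busy nor the Error bit set, so on its face it is an
ordinary "ready" status rather than a reported drive failure.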

Comment 29 Jeff Garzik 2006-11-14 16:18:54 UTC
I have committed the scr_read and scr_write modifications to
jgarzik/libata-dev.git#promise-sata-pata (which will eventually trickle up to -mm).

For other errors (e.g. White FrosT @ Comment #28) please file a separate bug report.

Comment 30 Dawid Zamirski 2006-11-14 16:30:41 UTC
Jeff,

Does it mean that this patch will make it into the next kernel update for FC6
and FC5 (it would enable me to upgrade to FC6), or will we have to wait until it
is accepted by upstream?

Comment 31 Jason Mack 2006-11-15 01:32:07 UTC
Jeff, thanks for the patch.  I hope to try it this weekend.  My kernel BUG
serial output for sata_promise is posted over in Bug 199216.

Speaking of that, there seem to be a lot of outstanding bug reports on this issue.

These are the ones I found so far:

Bugzilla Bug 199142: sata_promise BUG unable to handle NULL pointer dereference
Bugzilla Bug 198937: sata_promise crash at boot
Bugzilla Bug 201966: Anaconda can't find my HDD on sata_promise
Bugzilla Bug 199216: Sata_promise works for kernel 2.6.16-1.2122_FC5 but not for
2.6.17-1.2157_FC5

Is there someone (QA maybe) who can scrub out the related bugs and mark them as
dups of this one so we all know where to go to find the progress reports?

Comment 32 Jason Mack 2006-11-15 01:35:11 UTC
*** Bug 199216 has been marked as a duplicate of this bug. ***

Comment 33 Jason M 2006-11-27 16:45:46 UTC
(In reply to comment #30)
> Does it mean that this patch will make into next kernel update for FC6 and 5 (it
> would enable me to upgrade to FC6) or will we have to wait until it is accepted
> by upstream?

I have the same question.

It does seem a bit odd to me to close the bug before there is a record posted of
a successful test.

I can provide that, though.

I used the patch 141129 from below ("sata_promise patch") on FC5 kernel 2.6.18,
#2239 and I successfully booted, SATA drives seem ok.  I also recompiled with
SMP turned on, and this worked too.

Comment 34 Josep 2006-11-27 17:02:07 UTC
I can also confirm that with the given patch the described problem went away.
What I'm waiting for now is a new official kernel for FC6 with the patch, it's
been a while ;-)


Comment 35 Phil Schaffner 2006-12-08 17:46:29 UTC
kernel-2.6.18-1.2860.fc6 from updates-testing repo does work correctly with my
Promise SATA controller/drive.


Comment 36 Jason M 2006-12-11 03:53:43 UTC
(In reply to comment #35)
> kernel-2.6.18-1.2860.fc6 from updates-testing repo does work correctly with my
> Promise SATA controller/drive.

Agreed.

2.6.18-1.2860.fc6 from updates-testing works here too.

Hardware:  Mass storage controller: Promise Technology, Inc. PDC20575 (SATAII150
TX2plus) (rev 02).

Now, will a "clean" install of FC6 over http/ftp work, or do I pull the card
during install?  In other words, does the 2860 kernel ever become the one that
Anaconda will use for initial boot/install?


Comment 37 Need Real Name 2007-01-22 14:14:01 UTC
> Now, will a "clean" install of FC6 over http/ftp work, or do I pull the card

I tried the Zod live cd, and it was fine.
Does this mean the update has made the non-testing FC6 kernel?