Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
For bugs related to the Red Hat Enterprise Linux 5 product line. The current stable release is 5.10. For Red Hat Enterprise Linux 6 and above, please visit Red Hat JIRA https://issues.redhat.com/secure/CreateIssue!default.jspa?pid=12332745 to report new issues.

Bug 637194

Summary: [Qlogic 5.6 bug] qlcnic: fix kernel NULL pointer dereference __qlcnic_shutdown+0xe/0x8a
Product: Red Hat Enterprise Linux 5
Reporter: PaulB <pbunyan>
Component: kernel
Assignee: Chad Dupuis (Cavium) <cdupuis>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: urgent
Priority: high
Version: 5.6
CC: amit.salecha, andriusb, atodorov, bdonahue, bpicco, cward, dmach, GR-Linux-NIC-Dev, jburke, jwilson, peterm, rajesh.borundia
Target Milestone: rc
Keywords: OtherQA
Target Release: 5.6
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-01-13 21:22:59 UTC
Attachments:
Description                        Flags
null pointer in shutdown           none
qlcnic: Fix missing error codes    none

Description PaulB 2010-09-24 14:57:41 UTC
Description of problem:
System panics during installation of RHEL5.6-Server-20100921.n_nfs-x86_64

Version-Release number of selected component (if applicable):
RHEL5.6-Server-20100921.n_nfs-x86_64

Actual results:
Unable to handle kernel NULL pointer dereference at 0000000000000040 RIP:  
 [<ffffffff88293569>] :qlcnic:__qlcnic_shutdown+0xe/0x8a 
PGD 237e58067 PUD 237e57067 PMD 0  
Oops: 0000 [1] SMP  
last sysfs file: /devices/pci0000:00/0000:00:1d.0/usb1/1-0:1.0/bAlternateSetting 
CPU 0  
Modules linked in: sha256 aes_generic dm_crypt dm_emc dm_round_robin dm_multipath scsi_dh dm_snapshot dm_mirror dm_zero xfs lock_nolock gfs2 ext3 jbd ext4 crc16 jbd2 msdos dm_raid45 dm_message dm_mem_cache dm_region_hash dm_log dm_mod raid456 xor raid10 raid1 raid0 qla2xxx ata_piix libata cciss qla4xxx scsi_transport_fc netxen_nic qlcnic ehci_hcd uhci_hcd iscsi_ibft iscsi_tcp libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi sr_mod sd_mod scsi_mod ide_cd cdrom ipv6 xfrm_nalgo crypto_api squashfs pcspkr edd loop nfs nfs_acl fscache lockd sunrpc vfat fat cramfs 
Pid: 1, comm: init Not tainted 2.6.18-222.el5 #1 
RIP: 0010:[<ffffffff88293569>]  [<ffffffff88293569>] :qlcnic:__qlcnic_shutdown+0xe/0x8a 
RSP: 0018:ffff810237f97e08  EFLAGS: 00010282 
RAX: ffffffff882935e5 RBX: 0000000000000000 RCX: ffffffff8020d600 
RDX: ffff8101379c4800 RSI: 0000000000000246 RDI: ffff8101379c4800 
RBP: 0000000028121969 R08: ffff810237d5e810 R09: 0000000000000004 
R10: ffff810237f97c78 R11: ffffffff882935e5 R12: ffff8101379c4800 
R13: 0000000000000008 R14: 0000000000000004 R15: 0000000000000000 
FS:  00000000158c8850(0063) GS:ffffffff80423000(0000) knlGS:0000000000000000 
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b 
CR2: 0000000000000040 CR3: 0000000237e54000 CR4: 00000000000006e0 
Process init (pid: 1, threadinfo ffff810237f96000, task ffff8101044ef7a0) 
Stack:  ffff8101379c4800 0000000028121969 00000000fee1dead ffffffff882935ee 
 ffff810237bb6870 ffffffff801ceb56 0000000000000000 ffffffff8009cc14 
 0000000001234567 ffffffff8009cd9e ffffffffffffffff ffff810237f97ee8 
Call Trace: 
 [<ffffffff882935ee>] :qlcnic:qlcnic_shutdown+0x9/0x18 
 [<ffffffff801ceb56>] device_shutdown+0x56/0x88 
 [<ffffffff8009cc14>] kernel_restart+0x9/0x46 
 [<ffffffff8009cd9e>] sys_reboot+0x146/0x1c7 
 [<ffffffff8003af15>] hrtimer_try_to_cancel+0x4a/0x53 
 [<ffffffff8005a44a>] hrtimer_cancel+0xc/0x16 
 [<ffffffff80063ce5>] do_nanosleep+0x47/0x70 
 [<ffffffff8005a337>] hrtimer_nanosleep+0x58/0x118 
 [<ffffffff800a4530>] hrtimer_wakeup+0x0/0x22 
 [<ffffffff8001dde9>] sigprocmask+0xb7/0xdb 
 [<ffffffff80054cae>] sys_nanosleep+0x4c/0x62 
 [<ffffffff8005d116>] system_call+0x7e/0x83 
 
 
Code: 48 8b 6b 40 48 89 ef e8 0f 10 fa f7 48 89 df e8 98 ff ff ff  
RIP  [<ffffffff88293569>] :qlcnic:__qlcnic_shutdown+0xe/0x8a 
 RSP <ffff810237f97e08> 
CR2: 0000000000000040 
 <0>Kernel panic - not syncing: Fatal exception 


Expected results:
System should successfully install.

Additional info:
see next comment

-pbunyan

Comment 2 Chad Dupuis (Cavium) 2010-09-24 16:57:27 UTC
(In reply to comment #0)

A couple of questions:

1. Did this occur during a network install where the QLogic 82XX card was being used as the install device?
2. Where in the install does this occur? The stack trace seems to indicate that we're in the shutdown/cleanup path, so I assume this happens when the installer reboots after the installation completes?

Thanks.

Comment 3 Jeff Burke 2010-09-24 17:18:35 UTC
A couple of questions:

1. Did this occur during a network install where the QLogic 82XX card was being used as the install device?
 Answer - From the inventory, all I can tell is that the default install interface uses the netxen_nic driver. This looks like an issue with the qlcnic driver.

2. Where in the install does this occur?
 Answer - Judging from the log, it happens at the end of the installation, while the system is rebooting.

----------------<snip>------------------
sending kill signals...done 
disabling swap... 
	/dev/mapper/VolGroup00-LogVol01 
unmounting filesystems... 
	/mnt/runtime done 
	disabling /dev/loop0 
	/proc/bus/usb done 
	/proc done 
	/dev/pts done 
	/sys done 
	/tmp/ramfs done 
	/mnt/source done 
	/selinux done 
	/mnt/sysimage/boot done 
	/mnt/sysimage/sys done 
	/mnt/sysimage/proc/bus/usb done 
	/mnt/sysimage/proc done 
	/mnt/sysimage/selinux done 
	/mnt/sysimage/dev done 
	/mnt/sysimage done 
rebooting system 
Unable to handle kernel NULL pointer dereference at 0000000000000040 RIP:  
.
.
.
----------------</snip>------------------

Comment 4 asalecha 2010-09-27 06:27:02 UTC
Created attachment 449830 [details]
null pointer in shutdown

Attaching a patch based on the dump. The driver's private data is unavailable (NULL), and dereferencing it causes the NULL pointer exception.

Also, please give us the exact steps to reproduce the problem, i.e. the OS and the installation procedure.

Comment 5 Chad Dupuis (Cavium) 2010-09-27 13:48:07 UTC
Is there a way to add this patch to a test build of the RHEL 5 ISO?  It would seem the only way to verify this patch would be to retest the installation.

Comment 6 asalecha 2010-10-25 09:18:35 UTC
Any update on this?

Comment 7 Andrius Benokraitis 2010-11-12 18:40:52 UTC
Chad, are you using this bugzilla to post the fix?

Comment 8 Chad Dupuis (Cavium) 2010-11-12 19:00:12 UTC
(In reply to comment #7)
> Chad, are you using this bugzilla to post the fix?

Yes. bz562723 was originally for posting the patches needed to add qlcnic to RHEL 5.6, so it makes more sense to track this point fix in this bz.

Comment 14 Chad Dupuis (Cavium) 2010-11-16 13:32:15 UTC
Created attachment 460834 [details]
qlcnic: Fix missing error codes

We were able to reproduce the issue and believe that this upstream fix:

http://kerneltrap.org/mailarchive/linux-netdev/2010/8/27/6283992

fixes it. Essentially, on some error paths we were not returning the correct error value from probe, which makes the PCI layer falsely assume that probe succeeded, including falsely populating the PCI driver data. We have tested this on a local setup using error injection. When probe fails but returns a positive error value, rebooting the system produces the shutdown panic. If the error value is negative, shutdown works fine.

Comment 15 bob picco 2010-11-16 14:00:13 UTC
(In reply to comment #14)

Oh, thanks; you saved me some wasted effort. This patch is in V2 of Sucheta's
patch for bz#562921, but it is not in the RHEL 5.6 qlcnic driver. That explains
why RHEL 6.1 didn't show the issue last night. I was about to pursue this, but
you saved me the effort.

thanx,

bob

Comment 18 RHEL Program Management 2010-11-16 15:10:41 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 20 Jarod Wilson 2010-11-23 17:05:19 UTC
in kernel-2.6.18-233.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 22 Chris Ward 2010-12-02 15:25:50 UTC
Reminder! There should be a fix present for this BZ in snapshot 3, unless otherwise noted in a previous comment.

Please test and update this BZ with test results as soon as possible.

Comment 23 Alexander Todorov 2010-12-03 09:30:26 UTC
Hello,
this is still present with snap #4 on hp-ml370g6-01.rhts.eng.bos.redhat.com. Moving back to assigned.



sending kill signals...done 
disabling swap... 
	/dev/mapper/VolGroup00-LogVol01 
unmounting filesystems... 
	/mnt/runtime done 
	disabling /dev/loop0 
	/proc/bus/usb done 
	/proc done 
	/dev/pts done 
	/sys done 
	/tmp/ramfs done 
	/selinux done 
	/mnt/sysimage/boot done 
	/mnt/sysimage/sys done 
	/mnt/sysimage/proc/bus/usb done 
	/mnt/sysimage/proc done 
	/mnt/sysimage/selinux done 
	/mnt/sysimage/dev done 
	/mnt/sysimage done 
rebooting system 
Unable to handle kernel NULL pointer dereference at 0000000000000040 RIP:  
 [<ffffffff88293569>] :qlcnic:__qlcnic_shutdown+0xe/0x8a 
PGD 237e58067 PUD 237e5c067 PMD 0  
Oops: 0000 [1] SMP  
last sysfs file: /devices/pci0000:00/0000:00:1d.0/usb1/1-0:1.0/bAlternateSetting 
CPU 0  
Modules linked in: sha256 aes_generic dm_crypt dm_emc dm_round_robin dm_multipath scsi_dh dm_snapshot dm_mirror dm_zero xfs lock_nolock gfs2 ext3 jbd ext4 crc16 jbd2 msdos dm_raid45 dm_message dm_mem_cache dm_region_hash dm_log dm_mod raid456 xor raid10 raid1 raid0 qla2xxx ata_piix libata cciss qla4xxx scsi_transport_fc netxen_nic qlcnic ehci_hcd uhci_hcd iscsi_ibft iscsi_tcp libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi sr_mod sd_mod scsi_mod ide_cd cdrom ipv6 xfrm_nalgo crypto_api squashfs pcspkr edd loop nfs nfs_acl fscache lockd sunrpc vfat fat cramfs 
Pid: 1, comm: init Not tainted 2.6.18-225.el5 #1 
RIP: 0010:[<ffffffff88293569>]  [<ffffffff88293569>] :qlcnic:__qlcnic_shutdown+0xe/0x8a 
RSP: 0018:ffff810237f97e08  EFLAGS: 00010282 
RAX: ffffffff882935e5 RBX: 0000000000000000 RCX: ffffffff8020dcfc 
RDX: ffff81013793e800 RSI: 0000000000000246 RDI: ffff81013793e800 
RBP: 0000000028121969 R08: ffff8101375b7810 R09: 0000000000000004 
R10: ffff810237f97c78 R11: ffffffff882935e5 R12: ffff81013793e800 
R13: 0000000000000008 R14: 0000000000000004 R15: 0000000000000000 
FS:  0000000012271850(0063) GS:ffffffff80424000(0000) knlGS:0000000000000000 
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b 
CR2: 0000000000000040 CR3: 0000000237e54000 CR4: 00000000000006e0 
Process init (pid: 1, threadinfo ffff810237f96000, task ffff8101044ef7a0) 
Stack:  ffff81013793e800 0000000028121969 00000000fee1dead ffffffff882935ee 
 ffff810237baf870 ffffffff801cf253 0000000000000000 ffffffff8009cf42 
 0000000001234567 ffffffff8009d0cc ffffffffffffffff ffff810237f97ee8 
Call Trace: 
 [<ffffffff882935ee>] :qlcnic:qlcnic_shutdown+0x9/0x18 
 [<ffffffff801cf253>] device_shutdown+0x56/0x88 
 [<ffffffff8009cf42>] kernel_restart+0x9/0x46 
 [<ffffffff8009d0cc>] sys_reboot+0x146/0x1c7 
 [<ffffffff8003af19>] hrtimer_try_to_cancel+0x4a/0x53 
 [<ffffffff8005a453>] hrtimer_cancel+0xc/0x16 
 [<ffffffff80063ce5>] do_nanosleep+0x47/0x70 
 [<ffffffff8005a340>] hrtimer_nanosleep+0x58/0x118 
 [<ffffffff800a484f>] hrtimer_wakeup+0x0/0x22 
 [<ffffffff8001dde0>] sigprocmask+0xb7/0xdb 
 [<ffffffff80054cf5>] sys_nanosleep+0x4c/0x62 
 [<ffffffff8005d116>] system_call+0x7e/0x83 
 
 
Code: 48 8b 6b 40 48 89 ef e8 0b 17 fa f7 48 89 df e8 98 ff ff ff  
RIP  [<ffffffff88293569>] :qlcnic:__qlcnic_shutdown+0xe/0x8a 
 RSP <ffff810237f97e08> 
CR2: 0000000000000040 
 <0>Kernel panic - not syncing: Fatal exception

Comment 24 Marvell Linux NIC Driver 2010-12-03 10:08:48 UTC
What is the kernel version of snap #4?

Comment 25 Alexander Todorov 2010-12-03 10:20:20 UTC
It's in the traceback: 2.6.18-225.el5

Comment 26 Marvell Linux NIC Driver 2010-12-03 10:31:10 UTC
According to Comment #20, this fix is available in kernel-2.6.18-233.el5. Shouldn't you be testing with that kernel version or newer?

Comment 27 Alexander Todorov 2010-12-03 10:37:58 UTC
(In reply to comment #20)

Jarod,
how is it that this kernel was not pulled into snap #4, even though the devel whiteboard says "Snapshot 3"? Can you please make sure the package appears in the next snapshot and update the whiteboard accordingly?

In the meantime, I'll set the status back to ON_QA.

Comment 28 Jarod Wilson 2010-12-03 21:39:04 UTC
Sorry, not my department. I build the kernel, tag it in brew, put a note in bugzilla when the patches have been committed, when they're available for download, and add the build to the errata. Getting it into a compose is rel-eng's domain.

Comment 30 Andrius Benokraitis 2010-12-06 13:49:56 UTC
This fix will be available in Snapshot 5.

Comment 32 errata-xmlrpc 2011-01-13 21:22:59 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html