Bug 677703
Summary: | [RHEL5.5] Panic in iscsi_sw_tcp_data_ready() | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Gary Smith <gasmith> | ||||
Component: | kernel | Assignee: | Mike Christie <mchristi> | ||||
Status: | CLOSED ERRATA | QA Contact: | Storage QE <storage-qe> | ||||
Severity: | urgent | Docs Contact: | |||||
Priority: | urgent | ||||||
Version: | 5.5 | CC: | bdonahue, coughlan, cward, dhoward, jwest, martinez, martin.wilck, mchristi, pbenas, qcai, skito | ||||
Target Milestone: | rc | Keywords: | OtherQA, ZStream | ||||
Target Release: | --- | ||||||
Hardware: | All | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: |
Running a reboot test on an iSCSI root host resulted in a kernel panic. When the iscsi_tcp module destroys a connection, it grabs the sk_callback_lock and clears the sk_user_data/conn pointer to signal that the callback functions should not execute the operation. However, some functions were not grabbing the lock, causing a NULL pointer kernel panic when iscsi_sw_tcp_conn_restore_callbacks was called and, consequently, one of the callbacks was called. With this update, the callbacks access sk_user_data under the callback lock, and the kernel panic no longer occurs.
|
Story Points: | --- | ||||
Clone Of: | Environment: | ||||||
Last Closed: | 2011-07-21 09:27:08 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 703056 | ||||||
Attachments: |
|
Description
Gary Smith
2011-02-15 16:05:08 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Created attachment 483588 [details]
access sk user data under callback lock

Hey Gary,

The attached patch should fix the problem. I cannot replicate the problem here so I need FJ to test. I made i686 and x86_64 kernels here
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=3171660
with the patch merged. Could you give them to your contacts for verification?

Thanks

Just encountered this problem again on RHEL5.6 after ~1400 reboots.

Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP:
 [<ffffffff880f7df6>] :iscsi_tcp:iscsi_sw_tcp_data_ready+0x19/0x5c
PGD 0
Oops: 0000 [1] SMP
last sysfs file: /class/iscsi_transport/tcp/caps
CPU 0
Modules linked in: be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i cxgb3 dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport sg shpchp tpm_tis tpm i7core_edac tpm_bios i2c_i801 pcspkr edac_mc i2c_core dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage lpfc scsi_transport_fc ahci libata ixgbe 8021q dca iscsi_ibft iscsi_tcp libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 0, comm: swapper Not tainted 2.6.18-243.el5.bz647824.2.1fts #1
RIP: 0010:[<ffffffff880f7df6>]  [<ffffffff880f7df6>] :iscsi_tcp:iscsi_sw_tcp_data_ready+0x19/0x5c
RSP: 0018:ffffffff804a5c20  EFLAGS: 00010282
RAX: 0000000000000001 RBX: ffff81047ee89980 RCX: 00000000fffc7e11
RDX: 00000000fffc7e11 RSI: 0000000000000000 RDI: ffff81047ee89ac0
RBP: ffff81047ee89980 R08: 00000000c149640f R09: 00000000fffc7e11
R10: 000000000002fa57 R11: 00000000000004b0 R12: 0000000000000000
R13: 0000000000000020 R14: ffff81010fe6e0a2 R15: ffff81047f3fee18
FS:  0000000000000000(0000) GS:ffffffff80426000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 0000000000201000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff80456000, task ffffffff80311b60)
Stack:  ffff81087e82e300 ffff81047fc20ac0 000000004a4a2e30 ffff81047ee89980
 ffff81047ee89980 ffff81047ee89980 ffff81047f3fedc0 ffffffff8001bf16
 0000000000000000 ffff81047ee89980 ffff81087e82e300 ffff81047f3fedc0
Call Trace:
 <IRQ>  [<ffffffff8001bf16>] tcp_rcv_established+0x5dc/0x8b9
 [<ffffffff8003b87f>] tcp_v4_do_rcv+0x2a/0x2fa
 [<ffffffff80027466>] tcp_v4_rcv+0x9f9/0xa4d
 [<ffffffff80027466>] tcp_v4_rcv+0x9f9/0xa4d
 [<ffffffff80034acb>] ip_local_deliver+0x19d/0x263
 [<ffffffff80034acb>] ip_local_deliver+0x19d/0x263
 [<ffffffff80035c27>] ip_rcv+0x539/0x57c
 [<ffffffff80020ba2>] netif_receive_skb+0x470/0x49f
 [<ffffffff8812efed>] :ixgbe:ixgbe_clean_rx_irq+0x537/0x7a8
 [<ffffffff8813528f>] :ixgbe:ixgbe_clean_rxtx_many+0x10c/0x223
 [<ffffffff8000ca35>] net_rx_action+0xac/0x1b3
 [<ffffffff8001253c>] __do_softirq+0x89/0x133
 [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006d5f5>] do_softirq+0x2c/0x7d
 [<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff801a13ab>] acpi_processor_idle_simple+0x1af/0x31c
 [<ffffffff801a1368>] acpi_processor_idle_simple+0x16c/0x31c
 [<ffffffff801a11fc>] acpi_processor_idle_simple+0x0/0x31c
 [<ffffffff800494f8>] cpu_idle+0x95/0xb8
 [<ffffffff80461807>] start_kernel+0x220/0x225
 [<ffffffff8046122f>] _sinittext+0x22f/0x236

Code: 49 8b 6c 24 08 e8 2a cd f6 f7 48 89 e6 48 c7 c2 72 7b 0f 88
RIP  [<ffffffff880f7df6>] :iscsi_tcp:iscsi_sw_tcp_data_ready+0x19/0x5c
 RSP <ffffffff804a5c20>
CR2: 0000000000000008
 <0>Kernel panic - not syncing: Fatal exception

Weird ... I can't read comment #7 here although I received an email with its contents. However I can confirm that attachment #483588 [details] looks promising.

> Weird ... I can't read comment #7 here although I received an email with its
> contents. However I can confirm that attachment #483588 [details] looks promising.

Sorry Martin, my bad, I made comment #7 private out of habit because it was directed to me internally and contained an internal link.

Packages now uploaded to support case.

Verified - Gary's test kernel fixes the problem.

Patch(es) available in kernel-2.6.18-261.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5
Detailed testing feedback is always welcomed.

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Running a reboot test on an iSCSI root host resulted in a kernel panic. When the iscsi_tcp module is destroying a connection, it grabs the sk_callback_lock and clears the sk_user_data/conn pointer to signal that the callback functions should not execute the operation. However, some functions were not grabbing the lock, causing a NULL pointer kernel panic when iscsi_sw_tcp_conn_restore_callbacks was called and, consequently, one of the callbacks was called. With this update, the underlying source code has been modified to address this issue, and the kernel panic no longer occurs.

We saw a very similar panic situation on RHEL6.1 lately (see bug #718786). The RHEL6.1 kernel (and upstream) don't seem to contain anything similar to the patch applied here ... Is it possible that the same fix is needed in 6.1 (and upstream), too?

I just sent the fix upstream a couple of weeks ago, so I do not think it is in Linus's tree. It is in the scsi maintainer's tree. I also just sent it for 6.2.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html |