595417 – [RHEL6] bug in DMA handling on ibmvscsi driver


Summary: [RHEL6] bug in DMA handling on ibmvscsi driver

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	6.0
Hardware:	ppc64
OS:	All
Priority:	low
Severity:	urgent
Target Milestone:	rc
Target Release:	---
Assignee:	Steve Best
QA Contact:	Storage QE
Docs Contact:
URL:
Whiteboard:
Depends On:	579454
Blocks:

Reported:	2010-05-24 14:59 UTC by Aristeu Rozanski
Modified:	2010-11-11 16:16 UTC (History)
CC List:	11 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:	579454
Environment:
Last Closed:	2010-11-11 16:16:05 UTC
Target Upstream Version:
Embargoed:

Description Aristeu Rozanski 2010-05-24 14:59:45 UTC

Also seen on kernel-debug version 2.6.32-29.el6:

+++ This bug was initially created as a clone of Bug #579454 +++

SACHIN P. SANT <sachinp.com>-
After the first stage of the text-mode installation completes, the system
(p55, POWER5) segfaults on the first reboot. Just before the segfault, the
following badness message is displayed on the console:

returning from prom_init
Phyp-dump not supported on this hardware
ibmvscsi 3000000b: fast_fail not supported in server
------------[ cut here ]------------
Badness at lib/dma-debug.c:820
sd 0:0:1:0: [sda] Assuming drive cache: write through
sd 0:0:1:0: [sda] Assuming drive cache: write through
sd 0:0:1:0: [sda] Assuming drive cache: write through

The same badness message can be reproduced with the latest upstream kernels.
Here is the complete backtrace (against a vanilla kernel):

ibmvscsi 30000003: Client reserve enabled
ibmvscsi 30000003: sent SRP login
ibmvscsi 30000003: SRP_LOGIN succeeded
ibmvscsi 30000003: DMA-API: device driver frees DMA memory with wrong function [device address=0x0000000000011520] [size=36 bytes] [mapped as scather-gather] [unmapped as single]
------------[ cut here ]------------
Badness at lib/dma-debug.c:820
NIP: c00000000039bd24 LR: c00000000039bd20 CTR: c0000000000704a4
REGS: c00000000f69f6f0 TRAP: 0700   Tainted: G        W   (2.6.34-rc3)
MSR: 8000000000029032 <EE,ME,CE,IR,DR>  CR: 48000082  XER: 00000004
TASK = c00000000125cc70[0] 'swapper' THREAD: c000000001324000 CPU: 0
GPR00: c00000000039bd20 c00000000f69f970 c000000001322e38 00000000000000b6
GPR04: 0000000000000001 c0000000000c1ea8 0000000000000000 0000000000000002
GPR08: 0000000000000000 c00000000125cc70 0000000000000f66 0000000000000001
GPR12: 0000000000000002 c00000000f669000 0000000000d47940 0000000001c00000
GPR16: ffffffffffffffff 0000000002673148 00000000018ff984 0000000000000006
GPR20: 0000000000000000 c000000000c7bb80 0000000000000000 0000000000000000
GPR24: 0000000000000001 c000000001de6b00 0000000000000001 c000000001de6f80
GPR28: c000000109c696b0 c00000000f69fa90 c0000000012acab0 c00000000f69f970
NIP [c00000000039bd24] .check_unmap+0x3e0/0x784
LR [c00000000039bd20] .check_unmap+0x3dc/0x784
Call Trace:
[c00000000f69f970] [c00000000039bd20] .check_unmap+0x3dc/0x784 (unreliable)
[c00000000f69fa20] [c00000000039c3dc] .debug_dma_unmap_page+0x98/0xc8
[c00000000f69fb60] [d000000001dd63f4] .unmap_cmd_data+0xd0/0x11c [ibmvscsic]
[c00000000f69fc00] [d000000001dd8878] .handle_cmd_rsp+0xe0/0x154 [ibmvscsic]
[c00000000f69fca0] [d000000001dd7694] .ibmvscsi_handle_crq+0x44c/0x500 [ibmvscsic]
[c00000000f69fd40] [d000000001ddaca4] .rpavscsi_task+0x50/0xd8 [ibmvscsic]
[c00000000f69fdf0] [c0000000000c9e84] .tasklet_action+0x108/0x1d4
[c00000000f69fea0] [c0000000000cb778] .__do_softirq+0x168/0x2b8
[c00000000f69ff90] [c0000000000337b0] .call_do_softirq+0x14/0x24
[c000000001327840] [c000000000010664] .do_softirq+0xa0/0x104
[c0000000013278e0] [c0000000000cb0e4] .irq_exit+0x70/0xd0
[c000000001327960] [c00000000000fee4] .do_IRQ+0x214/0x2d8
[c000000001327a20] [c000000000004d28] hardware_interrupt_entry+0x28/0x2c
--- Exception: 501 at .raw_local_irq_restore+0xc0/0xdc
   LR = .cpu_idle+0x12c/0x1d0
[c000000001327d10] [c000000001290a28] mv88e6131_switch_driver+0x8da0/0x35588 (unreliable)
[c000000001327db0] [c000000000017e14] .cpu_idle+0x12c/0x1d0
[c000000001327e50] [c00000000000a71c] .rest_init+0xe8/0x10c
[c000000001327ee0] [c000000000a12e38] .start_kernel+0x4ec/0x510
[c000000001327f90] [c000000000008c64] .start_here_common+0x2c/0x48
Instruction dump:
e81c001a e93d001a e97e8030 78001f24 79291f24 e87e80c0 e8dd0028 e8fd0030
7d0b002a 7d2b482a 48393a49 60000000 <0fe00000> 480000b8 2f800003 409e00f4
Mapped at:
[<c00000000039c76c>] .debug_dma_map_sg+0xa0/0x220
[<c0000000005085c4>] .scsi_dma_map+0x120/0x164
[<d000000001dd8a6c>] .ibmvscsi_queuecommand+0x180/0x5d0 [ibmvscsic]
[<c0000000004fd9b4>] .scsi_dispatch_cmd+0x21c/0x2cc
[<c000000000506058>] .scsi_request_fn+0x3cc/0x57c
scsi 0:0:1:0: Direct-Access     AIX      VDASD            0001 PQ: 0 ANSI: 3
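
The trace above shows the buffer being mapped through scsi_dma_map() (recorded by debug_dma_map_sg) and later released from unmap_cmd_data() through debug_dma_unmap_page, which is what the "mapped as scather-gather / unmapped as single" complaint refers to. As a rough illustration only, the following Python sketch models the kind of bookkeeping that can detect such a mismatch. The class and method names are hypothetical and this is not the kernel's dma-debug implementation; it only assumes that each mapping is remembered together with the way it was created.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Mapping:
    size: int
    kind: str  # "single" or "scatter-gather"


class DmaDebugModel:
    """Toy stand-in for the bookkeeping idea; not the kernel's code."""

    def __init__(self) -> None:
        # Active mappings, keyed by device address.
        self._active: Dict[int, Mapping] = {}

    def map(self, addr: int, size: int, kind: str) -> None:
        self._active[addr] = Mapping(size, kind)

    def unmap(self, addr: int, size: int, kind: str) -> Optional[str]:
        """Return a complaint string, or None when the unmap matches the map."""
        entry = self._active.pop(addr, None)
        if entry is None:
            return "unmap of an address that is not mapped"
        if entry.kind != kind:
            return f"mapped as {entry.kind}, unmapped as {kind}"
        if entry.size != size:
            return f"mapped with size {entry.size}, unmapped with size {size}"
        return None


model = DmaDebugModel()
# Values taken from the log above: address 0x11520, 36 bytes.
model.map(0x11520, 36, "scatter-gather")
print(model.unmap(0x11520, 36, "single"))
```

In this model the complaint disappears once the unmap call uses the same kind as the map call, which is the pairing the DMA debug message is asking for.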

This problem has been reported to the community. Here is the link:
http://lists.ozlabs.org/pipermail/linuxppc-dev/2010-April/081541.html

The following patch should fix this issue:

http://lists.ozlabs.org/pipermail/linuxppc-dev/2010-April/081545.html

Please include the above patch in F13.

--- Additional comment from bugproxy.com on 2010-05-03 14:52:13 EDT ---

------- Comment From Subrata Modak subrata.ibm.com 2010-05-03 14:43 EDT-------
The issue is still reproducible with . The following error still occurs. Probably the proposed patch has not yet made it into the Fedora packages:

Segmentation fault
No root device found
Segmentation fault
No root device found
Boot has failed, sleeping forever.

Red Hat,

Any information on this?

Regards--
Subrata

--- Additional comment from bugproxy.com on 2010-05-04 08:14:02 EDT ---

------- Comment From Subrata Modak subrata.ibm.com 2010-05-04 07:53 EDT-------
Red Hat,

Any news about this patch going into the Fedora kernel?

Regards--
Subrata

Comment 2 Steve Best 2010-06-03 17:49:03 UTC

Posted to the rh-kernel mailing list:
http://post-office.corp.redhat.com/archives/rhkernel-list/2010-June/msg00205.html

Comment 3 RHEL Program Management 2010-06-07 15:53:29 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 4 Mike Gahagan 2010-06-08 14:58:34 UTC

I'm still seeing this with the -33.debug kernel.

vio_register_driver: driver ibmvscsi registering
ibmvscsi 30000002: SRP_VERSION: 16.a
scsi0 : IBM POWER Virtual SCSI Adapter 1.5.8
ibmvscsi 30000002: partner initialization complete
ibmvscsi 30000002: host srp version: 16.a, host partition 06-54F4A (1), OS 3, max io 262144
ibmvscsi 30000002: Client reserve enabled
ibmvscsi 30000002: sent SRP login
ibmvscsi 30000002: SRP_LOGIN succeeded
ibmvscsi 30000002: DMA-API: device driver frees DMA memory with wrong function [device address=0x0000000000004a10] [size=36 bytes] [mapped as scather-gather] [unmapped as single]
------------[ cut here ]------------
Badness at lib/dma-debug.c:820
NIP: c00000000030d094 LR: c00000000030d090 CTR: 0000000000000001
REGS: c000000002faf720 TRAP: 0700   Not tainted  (2.6.32-33.el6.ppc64.debug)
MSR: 8000000000029032 <EE,ME,CE,IR,DR>  CR: 28000042  XER: 20000001
TASK = c000000001056b20[0] 'swapper' THREAD: c000000001104000 CPU: 0
GPR00: c00000000030d090 c000000002faf9a0 c000000001103c48 00000000000000b6 
GPR04: 0000000000000001 c00000000008d634 0000000000000000 0000000000000002 
GPR08: 0000000000000000 c000000001056b20 c0000000000bf318 0000000000000001 
GPR12: 0000000028000028 c0000000011e2500 0000000001b1fa78 0000000000d48800 
GPR16: 0000000004300000 000000000021dcff 0000000000c85ba8 c000000001046dc0 
GPR20: 0000000000000006 0000000000000000 c000000001181180 000000000000000a 
GPR24: 0000000000000001 c000000001f91e90 c000000001026e98 c0000000011c8778 
GPR28: c000000001fb1d00 c0000001fdbd9bd8 c0000000010a39d8 c000000002fafac0 
NIP [c00000000030d094] .check_unmap+0x794/0x830
LR [c00000000030d090] .check_unmap+0x790/0x830
Call Trace:
[c000000002faf9a0] [c00000000030d090] .check_unmap+0x790/0x830 (unreliable)
[c000000002fafa50] [c00000000030d4a4] .debug_dma_unmap_page+0x94/0xe0
[c000000002fafb90] [d000000001860364] .unmap_cmd_data+0xf4/0x180 [ibmvscsic]
[c000000002fafc20] [d000000001862944] .handle_cmd_rsp+0x74/0x150 [ibmvscsic]
[c000000002fafcb0] [d0000000018618c4] .ibmvscsi_handle_crq+0x434/0x5b0 [ibmvscsic]
[c000000002fafd40] [d000000001864b44] .rpavscsi_task+0x44/0xe0 [ibmvscsic]
[c000000002fafdf0] [c000000000095284] .tasklet_action+0x1b4/0x1e0
[c000000002fafea0] [c000000000096cec] .__do_softirq+0x13c/0x2d0
[c000000002faff90] [c000000000032948] .call_do_softirq+0x14/0x24
[c000000001107900] [c00000000000e9b0] .do_softirq+0x140/0x160
[c0000000011079a0] [c0000000000966a4] .irq_exit+0xc4/0xd0
[c000000001107a20] [c00000000000ec14] .do_IRQ+0x144/0x230
[c000000001107ad0] [c000000000004804] hardware_interrupt_entry+0x1c/0x98
--- Exception: 501 at .cpu_idle+0x15c/0x1e0
    LR = .cpu_idle+0x15c/0x1e0
[c000000001107dc0] [c000000000016340] .cpu_idle+0x150/0x1e0 (unreliable)
[c000000001107e70] [c000000000009d74] .rest_init+0x84/0xa0
[c000000001107ef0] [c000000000840dc8] .start_kernel+0x598/0x5b8
[c000000001107f90] [c0000000000083d4] .start_here_common+0x1c/0x48
Instruction dump:
e97d001a e81f001a e93e8050 e87e80e0 e8df0028 e8ff0030 796b1f24 78001f24 
7d09582a 7d29002a 482a4fc9 60000000 <0fe00000> 4bfffca0 e89e8058 7c852378 
Mapped at:
 [<c0000000003d157c>] .scsi_dma_map+0x10c/0x150
 [<d000000001862bc0>] .ibmvscsi_queuecommand+0x1a0/0x660 [ibmvscsic]
 [<c0000000003c5910>] .scsi_dispatch_cmd+0x220/0x3e0
 [<c0000000003cf0a4>] .scsi_request_fn+0x484/0x580
 [<c0000000002cdcb8>] .__generic_unplug_device+0x58/0x70
scsi 0:0:3:0: Direct-Access     AIX      VDASD            0001 PQ: 0 ANSI: 3
scsi 0:0:4:0: Direct-Access     AIX      VDASD            0001 PQ: 0 ANSI: 3
scsi: waiting for bus probes to complete ...
scsi_scan_0 used greatest stack depth: 9008 bytes left
sd 0:0:3:0: [sda] 251658240 512-byte logical blocks: (128 GB/120 GiB)
sd 0:0:3:0: [sda] Write Protect is off
sd 0:0:3:0: [sda] Mode Sense: 2f 00 00 08
sd 0:0:4:0: [sdb] 251658240 512-byte logical blocks: (128 GB/120 GiB)
sd 0:0:3:0: [sda] Cache data unavailable
sd 0:0:3:0: [sda] Assuming drive cache: write through
sd 0:0:4:0: [sdb] Write Protect is off
sd 0:0:4:0: [sdb] Mode Sense: 2f 00 00 08
sd 0:0:4:0: [sdb] Cache data unavailable
sd 0:0:4:0: [sdb] Assuming drive cache: write through
sd 0:0:3:0: [sda] Cache data unavailable
sd 0:0:3:0: [sda] Assuming drive cache: write through
 sda:
sd 0:0:4:0: [sdb] Cache data unavailable
sd 0:0:4:0: [sdb] Assuming drive cache: write through
 sdb: sda1 sda2 sda3
sd 0:0:3:0: [sda] Cache data unavailable
sd 0:0:3:0: [sda] Assuming drive cache: write through
sd 0:0:3:0: [sda] Attached SCSI disk
 sdb1
sd 0:0:4:0: [sdb] Cache data unavailable
sd 0:0:4:0: [sdb] Assuming drive cache: write through
sd 0:0:4:0: [sdb] Attached SCSI disk
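
For anyone triaging similar console captures, a small helper like the one below can pull the fields out of the DMA-API complaint. It is a sketch only: the function name and the regular expression are assumptions written against the exact wording seen in the logs in this report, and the whitespace between the bracketed fields is allowed to be a space or a line break because console captures are often wrapped.

```python
import re
from typing import Dict, List

# Pattern written against the message wording seen in this report,
# including the "scather-gather" spelling printed by the kernel.
_DMA_WRONG_FUNCTION = re.compile(
    r"(?P<driver>\S+) (?P<device>\S+): DMA-API: device driver frees DMA "
    r"memory with wrong function\s+"
    r"\[device address=(?P<addr>0x[0-9a-fA-F]+)\]\s+"
    r"\[size=(?P<size>\d+) bytes\]\s+"
    r"\[mapped as (?P<mapped>[^\]]+)\]\s+"
    r"\[unmapped as (?P<unmapped>[^\]]+)\]"
)


def find_dma_mismatches(log_text: str) -> List[Dict[str, object]]:
    """Return one dict per 'wrong function' complaint found in log_text."""
    results = []
    for m in _DMA_WRONG_FUNCTION.finditer(log_text):
        results.append(
            {
                "driver": m.group("driver"),
                "device": m.group("device"),
                "addr": int(m.group("addr"), 16),
                "size": int(m.group("size")),
                "mapped": m.group("mapped"),
                "unmapped": m.group("unmapped"),
            }
        )
    return results
```

Calling find_dma_mismatches() on the console output from this comment should yield a single entry for device 30000002 with a 36-byte mapping.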

Comment 5 Aristeu Rozanski 2010-07-01 16:21:09 UTC

Patch(es) available on kernel-2.6.32-42.el6
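
A quick way to check whether a given RHEL 6 kernel release string is at or past that build is sketched below. This is only an illustration: the function name is hypothetical, the check applies solely to 2.6.32-based el6 release strings, and it assumes that every build numbered 42 or higher carries the patch.

```python
import re
from typing import Optional

# First build reported in this bug as containing the patch.
FIXED_BUILD = 42

# Matches strings such as "2.6.32-33.el6.ppc64.debug" or "2.6.32-42.el6".
_RELEASE = re.compile(r"^2\.6\.32-(\d+)(?:\.\d+)*\.el6")


def has_ibmvscsi_dma_fix(kernel_release: str) -> Optional[bool]:
    """True/False for RHEL 6 2.6.32 kernels, None for anything else."""
    match = _RELEASE.match(kernel_release)
    if match is None:
        return None  # Not a release string this sketch knows how to judge.
    return int(match.group(1)) >= FIXED_BUILD


for release in ("2.6.32-33.el6.ppc64.debug", "2.6.32-42.el6", "2.6.34-rc3"):
    print(release, has_ibmvscsi_dma_fix(release))
```

The release string would typically come from `uname -r` on the affected partition.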

Comment 8 Mike Gahagan 2010-08-19 21:38:48 UTC

No longer seeing this on an IBM Power6 with vscsi with the -66 kernel (snapshot 12).

Comment 9 releng-rhel@redhat.com 2010-11-11 16:16:05 UTC

Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.
