Bug 505555

Summary:	Rare: When zFCP.ko loads, kernel panics.
Product:	Red Hat Enterprise Linux 5	Reporter:	Arlinton Bourne <abourne>
Component:	kernel	Assignee:	Hans-Joachim Picht <hpicht>
Status:	CLOSED NOTABUG	QA Contact:	Red Hat Kernel QE team <kernel-qe>
Severity:	high	Docs Contact:
Priority:	high
Version:	5.3	CC:	cward, dkovalsk, epasch, hpicht, jglauber, jjarvis, jkachuck, peterm
Target Milestone:	rc	Keywords:	Reopened, TestBlocker
Target Release:	---
Hardware:	s390x
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2010-11-30 13:51:43 UTC	Type:	---
Bug Depends On:
Bug Blocks:	533192

Description Arlinton Bourne 2009-06-12 12:34:16 UTC

Description of problem:
When zFCP loads (anaconda or live system), the kernel panics. I can reproduce this on two of my zFCP-enabled guests, and on my guests only.

Here's the panic:

scsi0 : zfcp 
operand exception: 0015 [#1]
CPU:    0    Tainted: G
Process ksoftirqd/0 (pid: 3, task: 00000000007fe618, ksp: 0000000001f1fd90)
Krnl PSW : 0704000180000000 0000000040897130 (tiqdio_tl+0x34c/0x267c [qdio])
Krnl GPRS: 0000000000000002 000000000001000b 00000000ffffffff 00000000ffffffff
           00000000ffffffff 000000000001000b 00000000408a8c00 0000000000000000
           000000003e005000 0000000000000000 0000000000000040 00000000408a4818
           0000000040888000 000000004089b740 0000000001f0bec8 0000000001f0bde0
Krnl Code: b2 22 00 50 88 50 00 1c a7 f4 00 aa bf bf 81 d0 a7 74 00 a6
Call Trace:
([<00000000001a5ff8>] ccw_device_timeout+0x0/0x84)
 [<0000000000043eac>] tasklet_hi_action+0x108/0x1cc
 [<00000000000433da>] __do_softirq+0xba/0x190
 [<000000000001ec8a>] do_softirq+0x8a/0xb0
([<00000003003b0007>] 0x3003b0007)
 [<000000000004355c>] ksoftirqd+0xac/0x13c
 [<0000000000055d94>] kthread+0x118/0x14c
 [<000000000001859e>] kernel_thread_starter+0x6/0xc
 [<0000000000018598>] kernel_thread_starter+0x0/0xc
 
 <0>Kernel panic - not syncing: Fatal exception in interrupt 
01: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from
 CPU 00.
00: HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 00000000 00015EF8

Relevant FCP options:
FCP_1="0.0.4206 0X01 0X5005076300C4156D 0X0 0X5308000000000000"
FCP_2="0.0.4207 0X02 0X5005076300C4156D 0X1 0X5309000000000000"
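
For anyone cross-checking these values against the storage configuration, here is a minimal Python sketch that splits an FCP_n value into its fields. The field order (device bus ID, SCSI ID, WWPN, SCSI LUN, FCP LUN) is an assumption based on the usual RHEL 5 parameter file layout and is not confirmed in this report. The names FcpEntry and parse_fcp are hypothetical helpers, not part of any Red Hat tool.

```python
from typing import NamedTuple


class FcpEntry(NamedTuple):
    """One FCP_n value, split into fields (field order is an assumption)."""
    device: str    # CCW bus ID of the FCP adapter, e.g. "0.0.4206"
    scsi_id: int   # SCSI ID
    wwpn: int      # world wide port name of the storage port
    scsi_lun: int  # SCSI LUN as seen by Linux
    fcp_lun: int   # FCP LUN on the storage side


def parse_fcp(value: str) -> FcpEntry:
    """Parse the quoted part of an FCP_n="..." line into an FcpEntry."""
    fields = value.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {value!r}")
    device, *numbers = fields
    # int(..., 16) accepts both "0x" and "0X" prefixes.
    scsi_id, wwpn, scsi_lun, fcp_lun = (int(n, 16) for n in numbers)
    return FcpEntry(device.lower(), scsi_id, wwpn, scsi_lun, fcp_lun)


if __name__ == "__main__":
    entry = parse_fcp("0.0.4206 0X01 0X5005076300C4156D 0X0 0X5308000000000000")
    print(entry.device, hex(entry.wwpn), hex(entry.fcp_lun))
```

Running it on both FCP_1 and FCP_2 makes it easy to see that the two entries share the same WWPN and differ only in adapter, SCSI ID, SCSI LUN and FCP LUN.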

If you need access to the guests ping me via e-mail :)

Comment 2 Jan Glauber 2009-07-06 12:27:46 UTC

We've fixed a similar problem in RHEL5.3. What is the exact kernel version that panics?

Comment 3 Arlinton Bourne 2009-07-06 22:20:17 UTC

(In reply to comment #2)
> We've fixed a similar problem in RHEL5.3. What is the exact kernel version that
> panics?  

This is from kicking off an install from anaconda.

Kernel Version 2.6.18-128.el5 (5.3 Anaconda):
Starting graphical installation... 
scsi0 : zfcp 
operand exception: 0015 [#1]
CPU:    0    Tainted: G
Process ksoftirqd/0 (pid: 3, task: 00000000007fe618, ksp: 0000000001f1fd90)
Krnl PSW : 0704000180000000 0000000040897130 (tiqdio_tl+0x34c/0x267c [qdio])
Krnl GPRS: 0000000000000002 0000000000010007 00000000ffffffff 00000000ffffffff
           00000000ffffffff 0000000000010007 00000000408a8c00 0000000000000000
           000000003e78d000 0000000000000000 0000000000000040 00000000408a4818
           0000000040888000 000000004089b740 0000000001f0bec8 0000000001f0bde0
Krnl Code: b2 22 00 50 88 50 00 1c a7 f4 00 aa bf bf 81 d0 a7 74 00 a6
Call Trace:
([<00000000001a5ff8>] ccw_device_timeout+0x0/0x84)
 [<0000000000043eac>] tasklet_hi_action+0x108/0x1cc
 [<00000000000433da>] __do_softirq+0xba/0x190
 [<000000000001ec8a>] do_softirq+0x8a/0xb0
([<00000003003b0007>] 0x3003b0007)
 [<000000000004355c>] ksoftirqd+0xac/0x13c
 [<0000000000055d94>] kthread+0x118/0x14c
 [<000000000001859e>] kernel_thread_starter+0x6/0xc
 [<0000000000018598>] kernel_thread_starter+0x0/0xc
 
 <0>Kernel panic - not syncing: Fatal exception in interrupt 
01: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from
 CPU 00.
00: HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 00000000 00015EF8


Kernel Version 2.6.18-156.el5 (5.4 Nightly Anaconda): 
The VNC server is now running. 
Starting graphical installation... 
operand exception: 0015 [#1]
CPU: 1 Tainted: G      2.6.18-156.el5 #1
Process ksoftirqd/1 (pid: 5, task: 0000000001f69888, ksp: 0000000001f6fd90)
Krnl PSW : 0704000180000000 000000004088eb3c (tiqdio_tl+0x34c/0x287c [qdio])
Krnl GPRS: 0000000000000002 0000000000010007 00000000ffffffff 00000000ffffffff
           00000000ffffffff 0000000000010007 00000000408a0a00 0000000000000000
           000000003e0a8000 0000000000000000 0000000000000040 000000004089c620
           000000004087f000 0000000040893378 0000000001f4fec8 0000000001f4fdd8
Krnl Code: b2 22 00 50 88 50 00 1c a7 f4 00 aa bf bf 81 d0 a7 74 00 a6
Call Trace:
([<000000000027d798>] 0x27d798)
 [<0000000000045078>] tasklet_hi_action+0x108/0x1cc
 [<00000000000445a6>] __do_softirq+0xba/0x190
 [<000000000001ecda>] do_softirq+0x8a/0xb0
([<00000003003cc007>] 0x3003cc007)
 [<0000000000044728>] ksoftirqd+0xac/0x13c
 [<0000000000057018>] kthread+0x118/0x14c
 [<00000000000185ae>] kernel_thread_starter+0x6/0xc
 [<00000000000185a8>] kernel_thread_starter+0x0/0xc
 
 <0>Kernel panic - not syncing: Fatal exception in interrupt 
00: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from
 CPU 01.
01: HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 00000000 0001F02A

Comment 6 IBM Bug Proxy 2009-07-16 12:30:39 UTC

------- Comment From ursula.braun.com 2009-07-16 08:28 EDT-------
Is it possible to take a dump after the panic occurred?

Comment 9 David Kovalsky 2009-09-03 08:34:01 UTC

Arlinton, can you provide the dump requested in comment #6?

Comment 10 IBM Bug Proxy 2009-09-09 09:40:33 UTC

------- Comment From mgrf.com 2009-09-09 05:38 EDT-------
(In reply to comment #9)
> Arlinton, can you provide the dump requested in comment #6?
>

Please provide a dump so that we can debug this. Thanks.

Comment 11 Arlinton Bourne 2009-09-09 15:10:04 UTC

(In reply to comment #10)
> ------- Comment From mgrf.com 2009-09-09 05:38 EDT-------
> (In reply to comment #9)
> > Arlinton, can you provide the dump requested in comment #6?
> >
> 
> Please provide dump to enable for debugging, Thx  

Unfortunately, during the last outage we did a power-on reset of the z9 and the problem has 'disappeared' (this has happened before, hence the word "Rare" in the summary). Stay tuned while I try to replicate the issue.

Comment 12 Arlinton Bourne 2009-09-12 08:37:13 UTC

(In reply to comment #10)
> ------- Comment From mgrf.com 2009-09-09 05:38 EDT-------
> (In reply to comment #9)
> > Arlinton, can you provide the dump requested in comment #6?
> >
> 
> Please provide dump to enable for debugging, Thx  

For when this becomes reproducible again: what is the recommended way to dump the memory?

Comment 17 Chris Ward 2009-11-18 10:08:17 UTC

@IBM (ursula.braun.com)

Please review and respond to comment #12. Thanks.

Comment 18 IBM Bug Proxy 2009-11-18 10:40:47 UTC

------- Comment From ursula.braun.com 2009-11-18 05:39 EDT-------
Dump handling for RHEL5 is described here:
http://www.ibm.com/developerworks/linux/linux390/october2005_documentation.html

Comment 23 IBM Bug Proxy 2010-04-01 11:00:48 UTC

------- Comment From mgrf.com 2010-04-01 06:57 EDT-------
(In reply to comment #4)
> Dump handling for RHEL5 is described here:
> http://www.ibm.com/developerworks/linux/linux390/october2005_documentation.html

Hello Red Hat,
are there any more questions?

Comment 27 IBM Bug Proxy 2010-09-21 09:31:50 UTC

------- Comment From  2010-09-21 05:26 EDT-------
Hello Red Hat,

Can we close this bug if it is no longer reproducible?

Thanks
Muni