Bug 431907

Summary: 5.2: service iscsi stop on a busy target causes a soft CPU lockup on ppc64.
Product: Red Hat Enterprise Linux 5
Reporter: Barry Donahue <bdonahue>
Component: kernel
Assignee: David Howells <dhowells>
Status: CLOSED WONTFIX
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Priority: medium
Version: 5.2
CC: bdonahue, bpeters, dhoward
Target Milestone: rc
Hardware: ppc64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2014-08-22 23:13:12 UTC
Attachments:
  294258: A piece of the /var/log/messages file with several examples of the lockup. (Flags: none)

Description Barry Donahue 2008-02-07 19:30:27 UTC
Description of problem: If you install the 20080206.nightly on a ppc64 box with
the latest iscsi-initiator-utils, executing service iscsi stop on a busy target
will cause a soft lockup.


Version-Release number of selected component (if applicable):
   kernel: 2.6.18-77.el5
   iscsi: iscsi-initiator-utils-6.2.0.868-0.3.el5

How reproducible: Every time


Steps to Reproduce:
1. Install RHEL 5.2.
2. Install latest iscsi-initiator-utils
3. Add this to /etc/iscsi/iscsid.conf: node.conn[0].iscsi.HeaderDigest = None
4. Log in to the target (iscsiadm --login), build a file system on the LUN, and mount it.
5. Do I/O to the iSCSI LUN, then run service iscsi stop.

Actual results: You will encounter a soft lockup.
   Target system was ibm-js21-01.lab.boston.redhat.com.


Expected results: We should only get some I/O errors on the iSCSI LUN.


Additional info:
   Actual test script was:
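#!/bin/bash
# Repeat 99 times: mount the iSCSI LUN, start a large write in the
# background, then stop the iSCSI service while the write is in flight.
x=1
while [ "$x" -lt 100 ]; do
        mount /dev/sdb1 /mnt/sdb
        status=$?
        if [ "$status" -ne 0 ]; then
                echo "Mount FAILED"
                exit 1
        fi
        # Background write (about 1 GB) to keep the target busy.
        dd if=/dev/zero of=/mnt/sdb/file bs=1024 count=1000000 &
        echo "x=$x"
        sleep 5
        service iscsi stop      # the soft lockup occurs here
        umount /mnt/sdb
        service iscsi start
        sleep 5
        x=$((x + 1))
done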

Comment 1 Barry Donahue 2008-02-07 19:30:27 UTC
Created attachment 294258 [details]
A piece of the /var/log/messages file with several examples of the lockup.

Comment 2 David Howells 2008-03-28 15:24:00 UTC
Can you run with a kernel that has LOCKDEP enabled?  That might determine what 
is causing CPUs to get stuck.

Comment 3 David Howells 2008-03-28 15:30:08 UTC
The log is a bit weird: there appear to be soft lockups occurring in 
vprintk().  This suggests either that someone is holding the spinlock and not
letting go, or that another CPU is hammering printk() so hard that the
faulting CPU isn't getting a look in.  If this is running in a virtual
partition on a ppc64 machine, then the yield-to-hypervisor nature of spinlocks
there may be exacerbating the situation.

Comment 4 Peter Martuccelli 2008-04-07 16:28:43 UTC
This is not a blocker for RHEL 5.2.  Moving out to RHEL 5.3 for further
investigation.

Comment 6 RHEL Program Management 2008-07-25 17:05:39 UTC
This request was evaluated by Red Hat Product Management for
inclusion, but this component is not scheduled to be updated in
the current Red Hat Enterprise Linux release. If you would like
this request to be reviewed for the next minor release, ask your
support representative to set the next rhel-x.y flag to "?".

Comment 7 Ludek Smid 2008-07-25 21:54:00 UTC
Unfortunately the previous automated notification about the
non-inclusion of this request in Red Hat Enterprise Linux 5.3 used
the wrong text template. It should have read: this request has been
reviewed by Product Management and is not planned for inclusion
in the current minor release of Red Hat Enterprise Linux.

If you would like this request to be reviewed for the next minor
release, ask your support representative to set the next rhel-x.y
flag to "?" or raise an exception.

Comment 9 RHEL Program Management 2014-03-07 12:46:50 UTC
This bug/component is not included in scope for RHEL-5.11.0 which is the last RHEL5 minor release. This Bugzilla will soon be CLOSED as WONTFIX (at the end of RHEL5.11 development phase (Apr 22, 2014)). Please contact your account manager or support representative in case you need to escalate this bug.

Comment 10 Barry Donahue 2014-03-07 13:42:43 UTC
That sounds like the best plan.