Bug 140472

Summary: x86, x86_64 and IA64 scsi inquiry command hangs in wait_for_completion
Product: Red Hat Enterprise Linux 4
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: medium
Reporter: sheryl sage <sheryl.sage>
Assignee: Doug Ledford <dledford>
QA Contact: Brian Brock <bbrock>
CC: bjohnson, davej, jturner, linux26port, rkenna, tburke
Doc Type: Bug Fix
Bug Blocks: 146015, 147461
Last Closed: 2005-06-08 15:13:00 UTC
Attachments: qla2x00 driver bug fix patch

Description sheryl sage 2004-11-22 23:34:18 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET 
CLR 1.1.4322)

Description of problem:
Sending multiple scsi inquiry commands to a disk attached via a 
qlogic fiber card eventually causes the scsi inquiry command to hang 
in wait_for_completion.

This has only been seen on IA64 and x86_64 systems; I have been
unable to reproduce it on IA32 systems.



Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
Running the following will result in a number of hung scsiinfo 
processes after a few seconds (where sdc is a fiber disk connected 
via a qlogic card).

while :; do scsiinfo -i /dev/sdc & done

Additional info:

Comment 1 sheryl sage 2004-11-22 23:41:41 UTC
This has only been seen on IA64 and x86_64 systems; I have been
unable to reproduce it on IA32 systems.

Stacktrace of a hung scsiinfo process (IA64):

Stack traceback for pid 11048
0xe000004074b18000    11048     8704  0    1   D  0xe000004074b18490  
scsiinfo
0xa00000010008cfb0 schedule+0xf70
        args (0xe000004070a167f8, 0xe0000040443e3800, 
0xe000000039df0000,
0xe000004070a1677c, 0xe0000040443e3830)
        kernel 0xa00000010008c040 0xa00000010008d8a0
0xa00000010008eff0 wait_for_completion+0x1b0
        args (0xe000004074b1fc10, 0x2, 0xe000004074b1fbc8, 
0xe000004074b1fbd0,
0xa00000010038a140)
        kernel 0xa00000010008ee40 0xa00000010008f100
0xa00000010038a140 blk_execute_rq+0x1a0
        args (0xe0000000023a2480, 0xe000000039c76d80, 
0xe0000000023baac0,
0xe0000000023bab78, 0xa000000100392030)
        kernel 0xa000000100389fa0 0xa00000010038a1a0
0xa000000100392030 scsi_cmd_ioctl+0x10d0
        args (0x400, 0xe0000000023bab40, 0xfffffffbfff, 
0xe0000000023babb0,
0xe0000000023a2480)
        kernel 0xa000000100390f60 0xa000000100392260
0xa00000020003f640 [sd_mod]sd_ioctl+0x100
        args (0x5382, 0xe000004044012780, 0x1, 0x600000000000bb20,
0xe000000039c76d80)
        sd_mod 0xa00000020003f540 0xa00000020003fa60
0xa00000010038ccb0 blkdev_ioctl+0x110
        args (0xe000004043288e80, 0x600000000000bb20, 
0xa00000020003f540,
0xe000000039c76d80, 0xe000004043288f38)
        kernel 0xa00000010038cba0 0xa00000010038d640
0xa00000010015a280 block_ioctl+0x40
        args (0xe000000004a5aaf8, 0xe000004044012780, 0x1, 
0x600000000000bb20,
0xa000000100177ba0)


Comment 2 sheryl sage 2004-11-22 23:42:36 UTC
Running the following will result in a number of hung scsiinfo 
processes after a
few seconds (where sdc is a fiber disk connected via a qlogic card).

while :; do scsiinfo -i /dev/sdc & done

Comment 3 sheryl sage 2004-11-22 23:43:58 UTC
It is happening on i386 also:
Reproduction steps:
1) write the following two scripts

swan:~ # cat test.sh
while [ 0 ]
do
        scsiinfo -i /dev/sdc >/dev/null 2>&1
done

swan:~ # cat runparllal.sh
limit=10
count=1
while [ $count -le $limit ]
do
        ./test.sh &
        let count=count+1
done

Now run "runparllal.sh". After some time the "scsiinfo" commands
hang.


Comment 5 sheryl sage 2004-12-09 23:49:57 UTC
No update from Red Hat?

Comment 6 sheryl sage 2004-12-15 19:31:37 UTC
qla_iocb.c -- don't use block layer hw segment counts???  Please look 
to the maintainer of this code for the fix.


Comment 10 Doug Ledford 2005-02-09 17:44:47 UTC
On my two test systems, both pass the "do commands get lost" test and
no commands ever fail on the 2.6.9-5.0.1.12 smp kernel.  (With the
QLogic driver in particular there does appear to be a fairness issue:
with 10 bash scripts all trying to send commands to the drive, 1 of
the 10 will be sending 100+ commands per second while the other 9 are
momentarily stalled.  The other 9 always get their chance eventually,
so they aren't stalled completely; with the aic79xx driver this isn't
an issue and all the scripts run at about the same rate.)

However, when attempting to let the test scripts run overnight on the
2.6.9-5.0.1.12 smp kernel on both ia32 and x86_64, both machines
crashed.  The x86_64 machine triggered the oom killer and basically
killed everything on the machine without making any headway towards
freeing up the memory it needed.  The ia32 machine died completely and
wouldn't respond to anything: keyboard input, network pings, etc.

So, the basic summary right now is I'm no longer seeing the issues
that Veritas was seeing, but there are new issues of a different
nature that have to be addressed.

Comment 11 Doug Ledford 2005-02-10 02:24:14 UTC
My testing confirms that this bk changeset:

[dledford@compaq-rhel4 linus]$ bk export -tpatch -r1.1938.423.2
# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/12/03 12:40:55-08:00 greg
#   [PATCH] sysfs: fix sysfs_dir_close memory leak
#
#   sysfs_dir_close did not free the "cursor" sysfs_dirent used for
keeping
#   track of position in the list of sysfs_dirent nodes.  Consequently,
#   doing a "find /sys" would leak a sysfs_dirent for each of the 1140
#   directories in my /sys tree, or about 36kB each time.
#
#
#   From: "Adam J. Richter" <adam>
#   Signed-off-by: Greg Kroah-Hartman <greg>
#   Signed-off-by: Linus Torvalds <torvalds>
#
# fs/sysfs/dir.c
#   2004/12/03 10:42:51-08:00 greg +2 -0
#   sysfs: fix sysfs_dir_close memory leak
#
diff -Nru a/fs/sysfs/dir.c b/fs/sysfs/dir.c
--- a/fs/sysfs/dir.c    2005-02-09 21:21:24 -05:00
+++ b/fs/sysfs/dir.c    2005-02-09 21:21:24 -05:00
@@ -351,6 +351,8 @@
        list_del_init(&cursor->s_sibling);
        up(&dentry->d_inode->i_sem);

+       release_sysfs_dirent(cursor);
+
        return 0;
 }

solves the leaked size-32 kmalloc's and should therefore solve the OOM
problem.  It's possible that the lockup on the ia32 box was actually a
memory deadlock and could possibly be this same thing.  However,
during further testing, an oops was encountered and bugzilla 147638
was created to track the oops.


Comment 12 Buddhi Madhav 2005-02-15 11:54:31 UTC
Some more updates on this bug:

This bug is in "qla2x00_start_scsi()"
(drivers/scsi/qla2xxx/qla_iocb.c).  It is relying on the
"cmd->request->nr_hw_segments" value to compute the required number of
request entries.  This field has a junk value for inquiry commands,
because "request->nr_hw_segments" is not initialised in the
"get_request()" function.

Since "cmd->request->nr_hw_segments" has a junk value, the
corresponding required number of request entries is large.  With this
large number of entries, qla2x00_start_scsi() fails to issue the
request, so the request is repeatedly put onto the pending queue and
repeatedly times out.

Solution:
----------
"qla2x00_start_scsi()" should not depend on
"cmd->request->nr_hw_segments" to compute the required number of
request entries.  Instead it can use the output of "pci_map_sg()".



Comment 14 Doug Ledford 2005-02-24 21:59:31 UTC
Created attachment 111409 [details]
qla2x00 driver bug fix patch

Buddhi, thank you for the pointer.  After code inspection, you are correct
regarding the qlogic driver.  It can be considered a bug that the qlogic driver
was ever looking at request->nr_hw_segments in the first place, as that's
specifically a block layer request struct field, whereas low level SCSI device
drivers really should never look at anything outside the scsi_command struct
for their information.  I've written a patch to correct this problem and some
related PCI DMA mapping issues in the qlogic driver that were found while
investigating this problem.  That test patch is attached here and is currently
being tested by me.  Upon successful test completion, I'll submit it for review
and possible inclusion in our next kernel update.

Comment 16 Doug Ledford 2005-02-25 18:43:08 UTC
The qlogic patch has been submitted to our internal list for review
(and has already received several ACKs and should make U1), as well as
submitted upstream for review and accepted by QLogic for inclusion in
their future driver updates.

Comment 18 Tim Powers 2005-06-08 15:13:00 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-420.html