Bug 708914 - multipath command hangs if executed right after HBA kernel modules are loaded
Summary: multipath command hangs if executed right after HBA kernel modules are loaded
Keywords:
Status: CLOSED DUPLICATE of bug 645343
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.7
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: rc
Assignee: Mike Snitzer
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-05-30 06:14 UTC by Gris Ge
Modified: 2011-09-23 21:17 UTC
CC List: 12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-09-23 21:17:07 UTC
Target Upstream Version:
Embargoed:


Attachments
patches needed for scsi_dh issue in 2.6.18-194 kernel (15.74 KB, patch)
2011-06-08 21:26 UTC, jeremy.filizetti

Description Gris Ge 2011-05-30 06:14:38 UTC
Description of problem:

If we execute multipath -r right after modprobe lpfc, multipath will hang. strace shows the following output at the point of the hang:
====================
open("/dev/sdr", O_RDONLY)              = 4
ioctl(4, SG_IO, {'S', SG_DXFER_FROM_DEV, cmd[6]=[12, 01, 80, 00, ff, 00], mx_sb_len=32, iovec_count=0, dxfer_len=255, timeout=300000, flags=0
====================
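
For reference, that cmd[6]=[12, 01, 80, ...] is a SCSI INQUIRY (opcode 0x12) with EVPD=1 requesting VPD page 0x80 (unit serial number). A rough sketch of issuing the same request from the shell, assuming sg3_utils is installed and /dev/sdr is the path from the strace above:

# Issue an INQUIRY for VPD page 0x80 (unit serial number) through SG_IO;
# on a wedged path this hangs in the same place multipath does.
sg_vpd --page=0x80 /dev/sdr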

Version-Release number of selected component (if applicable):
kernel-2.6.18-262.el5
device-mapper-multipath-0.4.7-46.el5

How reproducible:
100%

Steps to Reproduce:
1. multipath -F and make sure multipathd is running
2. modprobe -r lpfc
3. modprobe lpfc
4. multipath -r
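
The steps above condense to roughly the following shell sequence (run as root; an lpfc-driven Fibre Channel HBA is assumed):

multipath -F      # flush existing multipath maps (leave multipathd running)
modprobe -r lpfc  # unload the HBA driver, removing its SCSI devices
modprobe lpfc     # reload the driver; targets are rediscovered
multipath -r      # reload the multipath tables -- this is the command that hangs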

Actual results:
The multipath command hangs and cannot be terminated.

Expected results:
The multipath table is reloaded correctly.

Additional info:

Comment 1 jeremy.filizetti 2011-06-06 11:00:05 UTC
I think this is the same problem I am seeing.  My issue is with an OFED 1.5.2-built SRP initiator, not the RHEL-provided one.

Version Info:
1 - A RHEL-modified Lustre kernel 2.6.18-194.17.1-el5 with a DDN-modified device-mapper RPM
2 - An unmodified RHEL 2.6.18-194.26.1.el5 kernel with device-mapper-multipath-0.4.7-32.el5_5.6

Steps to Reproduce:
Boot the system with the openibd and multipathd services started, or:
1. service multipathd start
2. service openibd start

Actual results:
multipathd does not finish setting up all device maps; it hangs in kpartx on different devices each time I have seen it. If "multipath -ll" is run to verify paths before setup completes, it also hangs.

Expected results:
All device maps should be updated by multipathd and "multipath -ll" should report the proper status of all paths.

Additional info:
I have a few hung kpartx tasks as a result of multipathd running:

Jun  1 15:01:01 test kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun  1 15:01:01 test kernel: kpartx        D ffffffff80150d56     0 12087  12068                     (NOTLB)
Jun  1 15:01:01 test kernel:  ffff811223f41c08 0000000000000086 0000000000000001 ffffffff800e27ef
Jun  1 15:01:01 test kernel:  ffff811224974c08 0000000000000005 ffff81123fcea100 ffff81127f337820
Jun  1 15:01:01 test kernel:  00000035154def40 0000000000002100 ffff81123fcea2e8 0000000900000003
Jun  1 15:01:01 test kernel: Call Trace:
Jun  1 15:01:01 test kernel:  [<ffffffff800e27ef>] block_read_full_page+0x259/0x276
Jun  1 15:01:01 test kernel:  [<ffffffff8006e1d7>] do_gettimeofday+0x40/0x90
Jun  1 15:01:01 test kernel:  [<ffffffff80028b03>] sync_page+0x0/0x42
Jun  1 15:01:01 test kernel:  [<ffffffff800637ea>] io_schedule+0x3f/0x67
Jun  1 15:01:01 test kernel:  [<ffffffff80028b41>] sync_page+0x3e/0x42
Jun  1 15:01:01 test kernel:  [<ffffffff8006392e>] __wait_on_bit_lock+0x36/0x66
Jun  1 15:01:01 test kernel:  [<ffffffff8003fc4c>] __lock_page+0x5e/0x64
Jun  1 15:01:01 test kernel:  [<ffffffff800a09f8>] wake_bit_function+0x0/0x23
Jun  1 15:01:01 test kernel:  [<ffffffff8000c373>] do_generic_mapping_read+0x1df/0x359
Jun  1 15:01:01 test kernel:  [<ffffffff8000d18c>] file_read_actor+0x0/0x159
Jun  1 15:01:01 test kernel:  [<ffffffff8000c639>] __generic_file_aio_read+0x14c/0x198
Jun  1 15:01:01 test kernel:  [<ffffffff800c6852>] generic_file_read+0xac/0xc5
Jun  1 15:01:01 test kernel:  [<ffffffff800a09ca>] autoremove_wake_function+0x0/0x2e
Jun  1 15:01:01 test kernel:  [<ffffffff800e4c3b>] block_ioctl+0x1b/0x1f
Jun  1 15:01:01 test kernel:  [<ffffffff8004211a>] do_ioctl+0x21/0x6b
Jun  1 15:01:01 test kernel:  [<ffffffff800301f2>] vfs_ioctl+0x457/0x4b9
Jun  1 15:01:01 test kernel:  [<ffffffff80063b05>] mutex_lock+0xd/0x1d
Jun  1 15:01:01 test kernel:  [<ffffffff8000b729>] vfs_read+0xcb/0x171
Jun  1 15:01:01 test kernel:  [<ffffffff80011c14>] sys_read+0x45/0x6e
Jun  1 15:01:01 test kernel:  [<ffffffff8005d116>] system_call+0x7e/0x83
[root@test log]# ps -efl | grep kpartx
0 S root     10809     1  0  76  -2 -  2699 wait   Jun01 ?        00:00:00 /bin/bash -c /sbin/mpath_wait /dev/mapper/test-OST0025; /sbin/kpartx -a -p p /dev/mapper/test-OST0025 
4 D root     10814 10809  0  79  -2 -  3132 sync_p Jun01 ?        00:00:00 /sbin/kpartx -a -p p /dev/mapper/test-OST0025
0 S root     12068     1  0  75  -2 -  2699 wait   Jun01 ?        00:00:00 /bin/bash -c /sbin/mpath_wait /dev/mapper/test-OST0005; /sbin/kpartx -a -p p /dev/mapper/test-OST0005 
4 D root     12087 12068  0  75  -2 -  3132 sync_p Jun01 ?        00:00:00 /sbin/kpartx -a -p p /dev/mapper/test-OST0005
0 S root     21151 17219  0  77   0 - 15290 pipe_w 10:32 pts/1    00:00:00 grep kpartx
[root@test log]# ps -eo pid,args,wchan | grep kpartx
10814 /sbin/kpartx -a -p p /dev/m sync_page
12087 /sbin/kpartx -a -p p /dev/m sync_page
21156 grep kpartx                 pipe_wait
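
For anyone chasing a similar hang, blocked-task stacks like the one above can also be dumped on demand via sysrq; a minimal sketch, assuming the magic sysrq interface is available on the system:

echo 1 > /proc/sys/kernel/sysrq   # make sure sysrq is enabled
echo t > /proc/sysrq-trigger      # dump all task states and kernel stacks to the kernel log
dmesg | grep -A 25 kpartx         # pull out the kpartx stacks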

Comment 2 Mike Christie 2011-06-07 00:15:58 UTC
This might be https://bugzilla.redhat.com/show_bug.cgi?id=674932.

We had thought it was related to using scsi_dh_* modules, but if in comment #1 you are using SRP then we might have been on the wrong track.

Jeremy, the SRP target you are using does not use scsi_dh_emc, scsi_dh_alua, or scsi_dh_rdac, does it? You can tell by looking in /var/log/messages for "attached to" messages mentioning emc, alua, or rdac, or by running lsmod and checking whether one of those modules is loaded and has a reference count greater than 1.
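
For example, something along these lines (log path and handler module names as shipped in RHEL 5; the exact attach messages vary by handler):

grep -iE 'attached|alua|rdac|emc' /var/log/messages | tail -n 20   # look for device handler attach messages
lsmod | grep '^scsi_dh'                                            # third column is the module use count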

Comment 3 Mike Christie 2011-06-07 01:54:57 UTC
Jeremy,

I also wanted to confirm that at the time you got the hung kpartx there were no transport errors or issues, right?

When kpartx hangs, can you run dd directly against the /dev/sdX devices that make up the paths for the multipath device, or do the dd reads fail on some paths?
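
For example, something like this, where mpath0 and the sdX names are placeholders for your map and its paths:

multipath -ll mpath0                # lists the sdX paths behind the map
for dev in /dev/sdq /dev/sdr; do    # substitute the paths shown above
    # O_DIRECT read of a few sectors; a wedged path will hang or return an error here
    dd if="$dev" of=/dev/null bs=512 count=8 iflag=direct && echo "$dev OK"
done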

Comment 4 jeremy.filizetti 2011-06-07 03:22:48 UTC
Unfortunately I'm not authorized to see bug 674932.  This system does use ALUA.  I tested two different versions of multipath; one is from the storage vendor DDN.  I think the primary difference is that the DDN-modified multipath interprets the target port groups from VPD page 0x83 slightly differently, so it sees 4 different path priorities while the regular dm-multipath sees just 2.  The problem was verified on both, though.

I don't have remote access to the system and it's on a stand-alone network, but IIRC my multipath config looks something like:

I'm pretty sure there were no transport errors.  I've had no issues with leaving multipathd stopped, starting the IB service (which discovers and adds storage), and manually running "multipath -v1" a little while after the storage is all discovered.  I've mounted the file systems and tested I/O to them like that.  Obviously, once kpartx is hanging, all I/O also hangs.

From what I looked at today with strace and systemtap, it looks like one thread of multipathd is in the dm_suspend code path in drivers/md/dm.c, in this loop:

1442        while (1) {
1443                set_current_state(TASK_INTERRUPTIBLE);
1444
1445                if (!atomic_read(&md->pending) || signal_pending(current))
1446                        break;
1447
1448                io_schedule();
1449        }

I'm hoping to take more of a look at the other modules later this week to figure out why md->pending is never decremented.

Comment 5 jeremy.filizetti 2011-06-07 03:51:11 UTC
One other comment is probably worth noting, since you mentioned you suspect the scsi_dh modules.  When I took a look with systemtap at how many times some of the functions were called, activate_path() was called 59 times for my 59 LUNs, but pg_init_done() was only called 57 times the last time I looked, with 2 devices hung in kpartx.  pg_init_done() calls scsi_dh_activate().
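
A rough sketch of counting those calls with systemtap (it assumes both functions live in the dm_multipath module and that debuginfo for the running kernel is installed):

stap -e '
global hits
probe module("dm_multipath").function("activate_path"),
      module("dm_multipath").function("pg_init_done")
{ hits[probefunc()]++ }
probe end { foreach (f in hits) printf("%s called %d times\n", f, hits[f]) }
'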

Comment 6 Mike Christie 2011-06-07 21:14:29 UTC
Jeremy,

In your /var/log/messages, do you see messages like the following?

alua: Detached

Could you try this kernel?
http://people.redhat.com/jwilson/el5/265.el5/

Comment 7 jeremy.filizetti 2011-06-08 21:26:38 UTC
Created attachment 503786 [details]
patches needed for scsi_dh issue in 2.6.18-194 kernel

I confirmed today that we are seeing the alua detachments in /var/log/messages after the messages printed from functions in alua_initialize().

Also, I tested the kernel you requested today and it seemed to clear the problem.  Unfortunately, I need a fix for 5.5 today and soon for 5.6.  After looking through the diffs from the RHEL patches in the source RPMs, I think I extracted what is needed in the attached patch.  I built and tested the new kernel and it worked for the SCSI device handler issue, but I didn't test anything else.  Can you verify that the patch looks fine and that there isn't anything I'm overlooking?
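
If anyone else needs to carry the same backport, the extracted patch can at least be sanity-checked against the target kernel tree before building (the directory and patch file names below are illustrative):

cd /path/to/linux-2.6.18-194.26.1.el5         # top of the unpacked kernel source tree
patch -p1 --dry-run < scsi_dh-backport.patch  # verify it applies cleanly without modifying anything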

Comment 9 Mike Christie 2011-06-09 22:50:49 UTC
(In reply to comment #7)
> problem.  Unfortunately I need a fix for 5.5 today and soon 5.6.

So are you going to request that we make z stream kernels?

I think the patch looks OK.  I CC'd the engineer who made the patches for our kernel to double-check.

Comment 10 jeremy.filizetti 2011-06-12 20:51:34 UTC
> So are you going to request that we make z stream kernels?

No, I'll handle the kernel myself; I'm just looking to get a head nod from someone a little more familiar with this code than I am as to whether the patch looks good or not.

Comment 11 RHEL Program Management 2011-06-21 05:31:41 UTC
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.7, and Red Hat does not plan to fix this issue in the currently developed update.

Contact your manager or support representative in case you need to escalate this bug.

Comment 12 jeremy.filizetti 2011-06-21 13:25:48 UTC
(In reply to comment #11)
> This request was evaluated by Red Hat Product Management for inclusion in Red
> Hat Enterprise Linux 5.7, and Red Hat does not plan to fix this issue in the
> currently developed update.
> 
> Contact your manager or support representative in case you need to escalate
> this bug.

So this will be included in 5.7 but not added to older releases (5.6 and 5.5)?

Comment 14 Mike Christie 2011-06-24 02:06:43 UTC
(In reply to comment #12)
> (In reply to comment #11)
> > This request was evaluated by Red Hat Product Management for inclusion in Red
> > Hat Enterprise Linux 5.7, and Red Hat does not plan to fix this issue in the
> > currently developed update.
> > 
> > Contact your manager or support representative in case you need to escalate
> > this bug.
> 
> So this will be included in 5.7 but not added to older releases (5.6 and 5.5)?

Yes.

I think if you want it in a 5.6 or 5.5 kernel, you can request some sort of special port, like a z-stream kernel release. I think to do that you do what comment 11 requests, where you have to have a support representative do it. I do not have the proper Bugzilla permissions to request it.

Comment 17 Mike Snitzer 2011-09-23 21:17:07 UTC
If you need info from me in the future _please_ set the needinfo flag accordingly.  I somehow missed this bug until now.

The patch that was provided in comment #7 rolls up changes that were introduced to address various scsi_dh* bugs in RHEL 5:

from Mike Christie (for 5.7):
bug#666304
[scsi] scsi_dh: allow scsi_dh_detach to detach when attached


from Mike Snitzer (for 5.7):
bug#645343
[scsi] device_handler: propagate SCSI device deletion
[scsi] device_handler: fix ref counting in scsi_dh_activate error path

bug#619361
[scsi] scsi_dh_alua: handle transitioning state correctly

bug#667660
[scsi] scsi_dh_alua: add scalable ONTAP lun to dev list


from Michal Schmidt (for 5.6):
bug#556476
[misc] add round_jiffies_up and related routines


And some rdac changes for IBM and Dell device support from Rob Evers.


Anyway, I am closing this bug as a duplicate of bug#645343, as that seems like the most relevant change for the issue discussed in this BZ -- but it could be that we need a combination of 645343 and 666304.

All of these changes are in 5.7; any z-stream fix needs to be formally requested.

*** This bug has been marked as a duplicate of bug 645343 ***

