Bug 580869

Summary: vgscan with disconnected Fibre Channel LUNs in filter can cause call trace

Product: Red Hat Enterprise Linux 5
Component: lvm2
Version: 5.5
Hardware: All
OS: Linux
Status: CLOSED WONTFIX
Severity: high
Priority: high
Target Milestone: rc
Target Release: ---
Assignee: LVM and device-mapper development team <lvm-team>
QA Contact: Corey Marthaler <cmarthal>
Reporter: Qixiang Wan <qwan>
CC: agk, bmarzins, bmr, bsettle, christophe.varoqui, cpelland, dwysocha, edamato, egoggin, heinzm, iannis, jbrassow, joe.thornber, junichi.nomura, kueda, leiwang, lmb, lmiksik, marco.uhl, ovirt-maint, prajnoha, prockai, Rhev-m-bugs, slevine, srevivo, tools-bugs, tranlan, vbian, zkabelac
Doc Type: Bug Fix
Last Closed: 2017-04-04 20:41:42 UTC
Bug Blocks: 928849, 1049888

Attachments (flags: none):
  dmesg
  vdsm log
  /var/log/messages
  /var/log/messages
  /var/log/messages

Description Qixiang Wan 2010-04-09 10:08:46 UTC
Description of problem:
While vdsm is running on the RHEVH host, it may fork a vgscan process to scan for VGs on the local disk and also on Fibre Channel or iSCSI LUNs if they are present. If we boot the RHEVH host with some LUNs attached and then detach all of them, the vgscan process hangs and causes a call trace. It is not clear what other adverse effects this has, as RHEVH itself can still work.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Virtualization Hypervisor release 5.5-2.2 (0.10)
kernel-2.6.18-194.el5
vdsm-4.5-45.el5rhev

How reproducible:
100%

Steps to Reproduce:
1. Boot the RHEVH host with Fibre Channel LUNs attached. The WWIDs of the LUNs in this case are: 3600a0b80005ad1d7000004094a22dddb,
3600a0b80005adb0b000004474a22de44

$ uname -a
Linux intel-5405-32-2.englab.nay.redhat.com 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

2. $ multipath -l
3600a0b80005ad1d7000004094a22dddb dm-1 IBM,1726-4xx  FAStT
[size=30G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:3:0 sdd 8:48  [active][undef]
 \_ 2:0:1:0 sdf 8:80  [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:2:0 sdb 8:16  [active][undef]
 \_ 2:0:3:0 sdh 8:112 [active][undef]
3600a0b80005adb0b000004474a22de44 dm-2 IBM,1726-4xx  FAStT
[size=30G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=0][active]
 \_ 1:0:2:1 sdc 8:32  [active][undef]
 \_ 2:0:3:1 sdi 8:128 [active][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:3:1 sde 8:64  [active][undef]
 \_ 2:0:1:1 sdg 8:96  [active][undef]
350000f000b0f9000 dm-0 ATA,SAMSUNG HD251HJ
[size=233G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 0:0:0:0 sda 8:0   [active][undef]
SATA_SAMSUNG_HD251HJ_S1FYJ90S104227 dm-12 ATA,SAMSUNG HD251HJ
[size=233G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 0:0:0:0 sda 8:0   [active][undef]

3. Run the following command:
$ while sleep 1;do /usr/sbin/vgscan --config '        
 devices {
 filter = [ "a|/dev/mapper/3600a0b80005ad1d7000004094a22dddb|", "a|/dev/mapper/3600a0b80005adb0b000004474a22de44|", "r|.*|" ]
 }
 backup {
 retain_min = 50
 retain_days = 0
 }
 '; done

4. Detach the LUNs by unplugging all the fibre optic cables connected to the host. The vgscan loop from step 3 will then hang; after several minutes you will see a call trace message in the terminal or in dmesg.

$ ps aux | grep -v grep | grep vgscan
root     11472  0.0  0.0  22680  2140 pts/0    D+   09:14   0:00 /usr/sbin/vgscan --config          ? devices {? filter = [ "a|/dev/mapper/3600a0b80005ad1d7000004094a22dddb|", "a|/dev/mapper/3600a0b80005adb0b000004474a22de44|", "r|.*|" ]? }? backup {? retain_min = 50? retain_days = 0? }?

$ dmesg
...
qla2xxx 0000:08:00.1: LOOP DOWN detected (4 4 0).
qla2xxx 0000:08:00.0: LOOP DOWN detected (4 4 0).
 rport-2:0-0: blocked FC remote port time out: saving binding
 rport-2:0-1: blocked FC remote port time out: saving binding
 rport-2:0-2: blocked FC remote port time out: saving binding
 rport-2:0-3: blocked FC remote port time out: saving binding
device-mapper: multipath: Failing path 8:80.
device-mapper: multipath: Failing path 8:96.
device-mapper: multipath: Failing path 8:112.
device-mapper: multipath: Failing path 8:128.
 rport-1:0-0: blocked FC remote port time out: saving binding
 rport-1:0-1: blocked FC remote port time out: saving binding
 rport-1:0-2: blocked FC remote port time out: saving binding
 rport-1:0-3: blocked FC remote port time out: saving binding
sd 1:0:3:0: SCSI error: return code = 0x00010000
end_request: I/O error, dev sdd, sector 62914432
device-mapper: multipath: Failing path 8:48.
device-mapper: multipath: Failing path 8:16.
device-mapper: multipath: Failing path 8:32.
device-mapper: multipath: Failing path 8:64.
INFO: task vgscan:11472 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
vgscan        D ffff810009025e20     0 11472  11168                     (NOTLB)
 ffff8108570e7cb8 0000000000000082 0000000000000400 ffffffff8001c211
 00000000000004a0 0000000000000008 ffff81087a79a860 ffff81011dda5080
 00000072eaa36120 00000000027bf569 ffff81087a79aa48 0000000400000000
Call Trace:
 [<ffffffff8001c211>] generic_make_request+0x211/0x228
 [<ffffffff8006f1f5>] do_gettimeofday+0x40/0x90
 [<ffffffff800647ea>] io_schedule+0x3f/0x67
 [<ffffffff800f5824>] __blockdev_direct_IO+0x8da/0xa80
 [<ffffffff800e6859>] blkdev_direct_IO+0x32/0x37
 [<ffffffff800e6791>] blkdev_get_blocks+0x0/0x96
 [<ffffffff8000c514>] __generic_file_aio_read+0xb8/0x198
 [<ffffffff8012b541>] inode_has_perm+0x56/0x63
 [<ffffffff800c78fb>] generic_file_read+0xac/0xc5
 [<ffffffff8012b541>] inode_has_perm+0x56/0x63
 [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80042489>] do_ioctl+0x21/0x6b
 [<ffffffff8012e042>] selinux_file_permission+0x9f/0xb6
 [<ffffffff8000b6b0>] vfs_read+0xcb/0x171
 [<ffffffff80011c01>] sys_read+0x45/0x6e
 [<ffffffff8005e28d>] tracesys+0xd5/0xe0

INFO: task vgscan:11472 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
vgscan        D ffff810009025e20     0 11472  11168                     (NOTLB)
 ffff8108570e7cb8 0000000000000082 0000000000000400 ffffffff8001c211
 00000000000004a0 0000000000000008 ffff81087a79a860 ffff81011dda5080
 00000072eaa36120 00000000027bf569 ffff81087a79aa48 0000000400000000
Call Trace:
 [<ffffffff8001c211>] generic_make_request+0x211/0x228
 [<ffffffff8006f1f5>] do_gettimeofday+0x40/0x90
 [<ffffffff800647ea>] io_schedule+0x3f/0x67
 [<ffffffff800f5824>] __blockdev_direct_IO+0x8da/0xa80
 [<ffffffff800e6859>] blkdev_direct_IO+0x32/0x37
 [<ffffffff800e6791>] blkdev_get_blocks+0x0/0x96
 [<ffffffff8000c514>] __generic_file_aio_read+0xb8/0x198
 [<ffffffff8012b541>] inode_has_perm+0x56/0x63
 [<ffffffff800c78fb>] generic_file_read+0xac/0xc5
 [<ffffffff8012b541>] inode_has_perm+0x56/0x63
 [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80042489>] do_ioctl+0x21/0x6b
 [<ffffffff8012e042>] selinux_file_permission+0x9f/0xb6
 [<ffffffff8000b6b0>] vfs_read+0xcb/0x171
 [<ffffffff80011c01>] sys_read+0x45/0x6e
 [<ffffffff8005e28d>] tracesys+0xd5/0xe0
...


Actual results:
1. The vgscan process hangs and triggers the hung_task_timeout_secs call trace, but RHEVH keeps working as usual.
2. If we just run vgs instead, there is no problem.

Expected results:
vgscan should be able to handle this situation and should not cause a call trace.

Additional info:

Comment 1 Qixiang Wan 2010-04-09 10:09:25 UTC
Created attachment 405503 [details]
dmesg

Comment 3 Alan Pevec 2010-04-09 10:29:58 UTC
vdsm.log, please.

But FYI: vgs just displays LVM metadata, while vgscan accesses the disk devices, so the only way to solve this would be for the LVM tools to have a timeout option.
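
(For reference, the best a caller such as vdsm could do today would be to bound how long it waits for the scan. A rough sketch follows, with a hypothetical $CFG variable standing in for the same --config string shown in the description. Note that a process blocked in uninterruptible I/O wait, state D in the ps output later in this report, ignores signals until the I/O completes, so the wrapper can give up waiting but cannot actually kill the stuck vgscan.)

$ /usr/sbin/vgscan --config "$CFG" &    # run the scan in the background
$ pid=$!
$ sleep 60                              # allow the scan up to a minute
$ if kill -0 "$pid" 2>/dev/null; then
    echo "vgscan (pid $pid) still running, giving up on it"
    kill -TERM "$pid"                   # has no effect while the process sits in D state
  fi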

Comment 4 Qixiang Wan 2010-04-09 10:31:08 UTC
multipath -l output after disconnecting all the Fibre Channel LUNs:
$ multipath -l
3600a0b80005ad1d7000004094a22dddb dm-1 IBM,1726-4xx  FAStT
[size=30G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=0][enabled]
 \_ 2:0:1:0 sdb 8:16  [failed][undef]
 \_ 1:0:3:0 sdh 8:112 [failed][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 2:0:3:0 sdd 8:48  [failed][undef]
 \_ 1:0:2:0 sdf 8:80  [failed][undef]
3600a0b80005adb0b000004474a22de44 dm-13 IBM,1726-4xx  FAStT
[size=30G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=0][active]
 \_ 2:0:3:1 sde 8:64  [failed][undef]
 \_ 1:0:2:1 sdg 8:96  [failed][undef]
\_ round-robin 0 [prio=0][enabled]
 \_ 2:0:1:1 sdc 8:32  [failed][undef]
 \_ 1:0:3:1 sdi 8:128 [failed][undef]
350000f000b0f9000 dm-0 ATA,SAMSUNG HD251HJ
[size=233G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 0:0:0:0 sda 8:0   [active][undef]
SATA_SAMSUNG_HD251HJ_S1FYJ90S104227 dm-10 ATA,SAMSUNG HD251HJ
[size=233G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 0:0:0:0 sda 8:0   [active][undef]

Comment 5 Qixiang Wan 2010-04-09 10:32:53 UTC
Created attachment 405514 [details]
vdsm log

Comment 6 Ayal Baron 2010-04-11 12:52:11 UTC
this is an lvm issue, moving to lvm.

Comment 7 Ayal Baron 2010-04-13 15:28:51 UTC
Raising severity, seeing as this is a RHEV blocker.
This is easily reproducible, is very likely to happen to customers, and will cause different flows in vdsm to fail.

Comment 8 Alasdair Kergon 2010-04-16 17:40:00 UTC
Disable 'queue_if_no_path' ?

Comment 9 Ben Marzinski 2010-04-16 18:11:36 UTC
multipath has queue_if_no_path set for some of these devices:

3600a0b80005adb0b000004474a22de44 dm-13 IBM,1726-4xx  FAStT
[size=30G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]

That means if all the paths are down, multipath will queue the IO waiting for a path to return. These devices (IBM,1726-4xx  FAStT) are by default configured
to queue IO for 5 minutes before failing it.  Does this resolve after you see

device-mapper: multipath: Failing path 8:80.
device-mapper: multipath: Failing path 8:96.
device-mapper: multipath: Failing path 8:112.
device-mapper: multipath: Failing path 8:128.

in the logs? This means that the path has finally timed out. After this point,
accessing a failed multipath device should be just like accessing any other failed device.
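
As a side note, whether a given map is still set to queue can be read back from device-mapper directly. A minimal sketch, using one of the WWIDs from the description:

$ dmsetup table 3600a0b80005ad1d7000004094a22dddb   # 'queue_if_no_path' in the features field means IO will be queued
$ dmsetup status 3600a0b80005ad1d7000004094a22dddb  # shows the current state of each path in the map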

Comment 10 Ben Marzinski 2010-05-03 20:47:45 UTC
Actually, it would be really helpful if I could get a copy of /var/log/messages from when this recreates. You should see messages like:

multipathd: checker failed path 8:16 in map mpath1

multipathd: mpath1: remaining active paths: 0 

multipathd: mpath1: Entering recovery mode: max_retries=120

multipathd: mpath1: Disable queueing


After that last multipathd message, everything should free up. Obviously, the path and map information will be different. Without this, it's hard to tell whether multipathd is doing what it is supposed to do. If everything does eventually return to normal (at least normal for failed paths), then the only problem that I can see is the annoying kernel messages. Multipathd is told to queue all IO for 5 minutes when all paths are down. Scanning the devices requires IO, so if the devices have been down for less than 5 minutes, the scanning should hang. I don't see why the kernel cares that a process has been waiting on IO for more than 2 minutes, but the stack trace definitely makes it look like the kernel is complaining about a process that is simply waiting on IO, which is exactly what should happen until the device times out.

If multipathd is running and you either aren't seeing these messages or after seeing them, the stuck processes aren't getting unstuck, then there is a bigger bug.

Comment 11 Qixiang Wan 2010-05-04 08:38:15 UTC
1. Attach one FC LUN (3600a0b80005b0acc00005c774bdf76b5) to the host:

$ multipath -ll -v2
SATA_SAMSUNG_HD251HJ_S1FYJ90S104285 dm-11 ATA,SAMSUNG HD251HJ
[size=233G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 2:0:0:0 sde 8:64  [active][ready]
350000f000b0f9000 dm-1 ATA,SAMSUNG HD251HJ
[size=233G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 2:0:0:0 sde 8:64  [active][ready]
3600a0b80005b0acc00005c774bdf76b5 dm-0 IBM,1726-4xx  FAStT
[size=30G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=200][active]
 \_ 0:0:2:0 sdb 8:16  [active][ready]
 \_ 1:0:1:0 sdd 8:48  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:0 sda 8:0   [active][ghost]
 \_ 1:0:0:0 sdc 8:32  [active][ghost]

2. Unplug all the fibre optic cables, then issue the vgscan command:
$ /usr/sbin/vgscan --config '        
 devices {
 filter = [ "a|/dev/mapper/3600a0b80005b0acc00005c774bdf76b5|", "r|.*|" ]
}
 backup {
 retain_min = 50
 retain_days = 0
 }
 '
3. The following messages appear in /var/log/messages soon after:
...
2010-05-04T08:17:29.691329+00:00 intel-5405-32-1 multipathd: checker failed path 8:0 in map 3600a0b80005b0acc00005c774bdf76b5
2010-05-04T08:17:29.691427+00:00 intel-5405-32-1 multipathd: 3600a0b80005b0acc00005c774bdf76b5: remaining active paths: 1
2010-05-04T08:17:29.691438+00:00 intel-5405-32-1 multipathd: sdb: rdac checker reports path is down
2010-05-04T08:17:29.691548+00:00 intel-5405-32-1 multipathd: 3600a0b80005b0acc00005c774bdf76b5: switch to path group #1
2010-05-04T08:17:29.691584+00:00 intel-5405-32-1 multipathd: sdc: rdac checker reports path is down
2010-05-04T08:17:29.691593+00:00 intel-5405-32-1 multipathd: checker failed path 8:32 in map 3600a0b80005b0acc00005c774bdf76b5
2010-05-04T08:17:29.691601+00:00 intel-5405-32-1 multipathd: 3600a0b80005b0acc00005c774bdf76b5: Entering recovery mode: max_retries=300
2010-05-04T08:17:29.691610+00:00 intel-5405-32-1 multipathd: 3600a0b80005b0acc00005c774bdf76b5: remaining active paths: 0
...

4. But the vgscan process keeps hanging even after 15 minutes; the vgscan command issued in step 2 still has not returned:
$ ps aux | grep vgscan | grep -v grep 
root      8704  0.0  0.0  22536  2024 pts/0    D+   08:17   0:00 /usr/sbin/vgscan --config         ? devices {? filter = [ "a|/dev/mapper/3600a0b80005b0acc00005c774bdf76b5|", "r|.*|" ]?}? backup {? retain_min = 50? retain_days = 0? }?
$ date
Tue May  4 08:34:48 UTC 2010

5. $ multipath -ll -v2
sda: checker msg is "rdac checker reports path is down"
sdb: checker msg is "rdac checker reports path is down"
sdc: checker msg is "rdac checker reports path is down"
sdd: checker msg is "rdac checker reports path is down"
SATA_SAMSUNG_HD251HJ_S1FYJ90S104285 dm-11 ATA,SAMSUNG HD251HJ
[size=233G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 2:0:0:0 sde 8:64  [active][ready] 
350000f000b0f9000 dm-1 ATA,SAMSUNG HD251HJ
[size=233G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
 \_ 2:0:0:0 sde 8:64  [active][ready] 
3600a0b80005b0acc00005c774bdf76b5 dm-0 IBM,1726-4xx  FAStT
[size=30G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:2:0 sdb 8:16  [failed][faulty]
 \_ 1:0:1:0 sdd 8:48  [failed][faulty]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:0 sda 8:0   [failed][faulty]
 \_ 1:0:0:0 sdc 8:32  [failed][faulty]


6. /var/log/messages will be attached soon.

Comment 12 Qixiang Wan 2010-05-04 08:40:34 UTC
Created attachment 411220 [details]
/var/log/messages

Comment 13 Qixiang Wan 2010-05-04 08:45:23 UTC
Created attachment 411221 [details]
/var/log/messages

Comment 14 Ben Marzinski 2010-05-12 15:02:49 UTC
Looking at the /var/log/messages output you have

2010-05-04T08:17:29.943555+00:00 intel-5405-32-1 multipathd: 3600a0b80005b0acc00005c774bdf76b5: Entering recovery mode: max_retries=300

max_retries=300 means that the node will queue for (300 * polling_interval) seconds. Usually polling_interval is 5 seconds, which means it will queue for 300 * 5 = 1500 seconds, i.e. 25 minutes.

The last message in /var/log/messages is
2010-05-04T08:38:51.090281+00:00 intel-5405-32-1 multipathd: sdd: rdac checker reports path is down

This is only 21 minutes later, so multipath should still be queueing according to the configuration. When I said earlier that it should only queue for 5 minutes, I was looking at the wrong device; 25 minutes is the default configuration for this device. If you want to test this quickly, you could add the following configuration to /etc/multipath.conf:

devices {
       device {
               vendor                  "IBM"
               product                 "1726-4xx"
               getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
               prio_callout            "/sbin/mpath_prio_rdac /dev/%n" 
               features                "0"
               hardware_handler        "1 rdac"
               path_grouping_policy    group_by_prio
               failback                immediate
               rr_weight               uniform
               no_path_retry           12
               rr_min_io               1000
               path_checker            rdac
       }
}

This simply queues for a minute, and then fails the IO back.  After a minute, you should see something like

May 12 03:52:24 ask-08 multipathd: mpath1: Disable queueing 

And then it should act like a regular failed device.
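
For the override to take effect, multipathd has to re-read the configuration; a rough sketch of the steps (the exact service handling may differ on RHEV-H):

# vi /etc/multipath.conf        # add the devices { device { ... } } section above
# multipathd -k"reconfigure"    # ask the running daemon to re-read multipath.conf
# multipath -ll                 # check that the maps are still present with the new settings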

Comment 15 Qixiang Wan 2010-05-13 05:33:22 UTC
I tested with the same steps as in Comment #11 (in the same environment): unplugged the FC cables at 2010-05-13T03:19:xx and then issued the vgscan command. I saw the "Disable queueing" message in /var/log/messages after 50 minutes, and the vgscan process also got unstuck after 50 minutes:
...
2010-05-13T04:09:38.027087+00:00 intel-5405-32-1 multipathd: 3600a0b80005b0acc00005c774bdf76b5: Disable queueing
...

------
$ date
Thu May 13 04:09:35 UTC 2010
$ ps aux | grep vgscan | grep -v grep
root      8957  0.0  0.0  22536  2024 pts/1    D+   03:19   0:00 /usr/sbin/vgscan --config ? devices {? filter = [ "a|/dev/mapper/3600a0b80005b0acc00005c774bdf76b5|", "r|.*|" ]?}? backup {? retain_min = 50? retain_days = 0? }?
------

here is the multipath.conf on this host:
------
$ cat /etc/multipath.conf

# RHEV REVISION 0.5


defaults {
    udev_dir                /dev
    polling_interval        10
    selector                "round-robin 0"
    path_grouping_policy    multibus
    getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
    prio_callout            /bin/true
    path_checker            readsector0
    rr_min_io               100
    max_fds                 8192
    rr_weight               priorities
    failback                immediate
    no_path_retry           fail
    user_friendly_names     no
}
------

/var/log/messages will be attached soon.

Comment 16 Qixiang Wan 2010-05-13 05:36:08 UTC
Created attachment 413634 [details]
/var/log/messages

got the "Disable queueing" message after 50 minutes

Comment 17 Ben Marzinski 2010-05-13 18:25:33 UTC
That makes sense. Comment #15 shows that you have polling_interval set to 10, so 300 * 10 / 60 = 50 minutes. I realize that this is excessive, but the hardware vendors set the default configurations for their devices, and IBM went with 300. I will try to ping someone from IBM and ask whether they'd rather go with something shorter (like 60, which is possibly what they meant, if they forgot that the retry happens every polling_interval seconds, and not every second). Is there a reason why you have the polling interval set to 10 seconds, instead of the default of 5 seconds?

Comment 18 Qixiang Wan 2010-05-14 02:53:56 UTC
(In reply to comment #17)
> Is there a reason why you have the polling interval set to 10 seconds,
> instead of the default of 5 seconds?
I think it originally came from vdsm in RHEVH 5.4-2.1.
Ayal, could you help answer this question? Thanks.

Comment 19 Ayal Baron 2010-05-14 06:08:12 UTC
IIRC that was the rhel 5.4 default. Cyril?

Comment 20 Ben Marzinski 2010-05-14 19:02:23 UTC
Multipath's compiled-in polling_interval default has always been 5 seconds.

Comment 21 Ayal Baron 2010-05-16 07:05:40 UTC
(In reply to comment #20)
> Multipath's compiled-in polling_interval default has always been 5 seconds.    
That very well could be, but the multipath.conf file supplied with 5.4 contains a value of 10 for polling_interval (I just verified that).
Should we change this to 5?
Even if we do change this to 5, 25m would still be excessive...

Comment 22 Ben Marzinski 2010-05-18 17:23:59 UTC
If you are talking about the multipath.conf file that the device-mapper-multipath installs by default, the

#       polling_interval        10

listed there is simply part of an example of how to configure some standard options, as the comment above it says. Those example values were not intended to be uncommented and used as is.

I think setting the polling_interval to 5 would be a good idea. But until I hear from IBM that they want the no_path_retry value changed, I'm not going to change it from what they requested. There are many devices which are simply set to queue indefinitely when all paths are down. For these devices, the solution is to either restore access to the device, delete the device if it is not coming back, or manually disable queueing with

# multipathd -k"disablequeueing map <mapname>"

So a device configured to wait for 25 minutes isn't actually that unusual in the range of timeouts chosen. It's designed, I believe, to give the sysadmin enough time to notice the issue, troubleshoot it, and if possible solve it, before multipathd simply fails the IO.
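
As a concrete example with the map from Comment #11 (and, if I recall the CLI correctly, the matching command to turn queueing back on afterwards):

# multipathd -k"disablequeueing map 3600a0b80005b0acc00005c774bdf76b5"   # fail the queued IO back immediately
# multipathd -k"restorequeueing map 3600a0b80005b0acc00005c774bdf76b5"   # re-enable queueing once the paths are back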

Comment 23 Ayal Baron 2010-05-20 07:22:44 UTC
(In reply to comment #22)
> If you are talking about the multipath.conf file that the
> device-mapper-multipath installs by default, the
> 
> #       polling_interval        10
> 
> listed is simply listed as part of an example of how to configure some standard
> options, like the comment above is says. They were not intended to be
> uncommented and used as is.
> 
> I think setting the polling_interval to 5 would be a good idea. But until I
> hear from IBM that they want the no_path_retry value changed, I'm not going to
> change it from what they requested.  There are many devices which simply are
> set to queue indefinitely when all paths are down.  For these devices, the
> solution is to either restore access to the device, delete the device if it is
> not coming, or manually disable queueing with
> 
> # multipathd -k"disablequeueing map <mapname>"
> 
> So saying that you want the device to wait for 25 minutes isn't actually that
> unusual in the range of timeouts chosen.  It's designed, I believe, to give the
> sysadmin enough time to notice the issue, trouble-shoot it, and if possible,
> solve it, before multipathd simply fails the IO.    

Right, the problem is that vgscan takes a VG lock, which means that during these 25 or 50 minutes LVM is pretty much disabled entirely, even if the device has nothing to do with any VG...

Comment 24 Ben Marzinski 2010-05-20 15:37:46 UTC
What do you propose that multipath should do?  Multipath can be, and usually is, intentionally configured to queue IO when all paths to a device are down, either for a while, or forever. The idea is that when you lose your paths, you either go and fix the issue, or you remove the multipath device.

vgscan sends IO to the device. If all paths are down and the device is set to queue IO when all paths are down, that's what it will do. If you actually remove the devices from the system, multipath will remove the multipath device (assuming that it's not currently in use). Multipath even has an option so that if you remove all of the devices, and the device is in use, it turns off queueing to avoid problems like this. But for that to work, you must actually remove the devices.  If they just get disconnected, there is no way for multipath to tell if that was intentional or not.

So multipath is working as designed, and I don't see any way to make a special case for this.  If you want the devices gone, then remove the scsi devices, or alternatively, remove the multipath devices. If multipath can't talk to the device at all, it won't create a new multipath device for it, even if the scsi device exists.  If you don't do this, and get stuck waiting for queued IO, then you need to disable queueing on the device, using

# multipathd -k"disablequeueing map <mapname>"

This will turn off queueing, so that whatever's using the device can get unstuck, and you can remove the device. If you don't want to be bothered with this, and your device queues IO by default, then you need to edit the configuration and change that. There are enough options for multipath that it's pretty much impossible to have a default config that will always work. Sometimes people will have to edit their /etc/multipath.conf file. Unfortunately, there is no way to make a default configuration parameter that overrides the device-specific ones, so you'll have to make one like the example I gave in Comment #14.
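
For completeness, the removal route mentioned above looks roughly like this (a sketch; the map name and sdX path devices are the ones from Comment #11, and queueing has to be disabled first if IO is still stuck):

# multipathd -k"disablequeueing map 3600a0b80005b0acc00005c774bdf76b5"   # let any queued IO fail back first
# multipath -f 3600a0b80005b0acc00005c774bdf76b5                         # flush the now-unused multipath map
# echo 1 > /sys/block/sda/device/delete                                  # remove each underlying SCSI path device
# echo 1 > /sys/block/sdb/device/delete                                  # (repeat for sdc and sdd as well)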

Comment 25 Ayal Baron 2010-05-23 07:28:02 UTC
(In reply to comment #24)
> What do you propose that multipath should do?  Multipath can be, and usually
> is, intentionally configured to queue IO when all paths to a device are down,
> either for a while, or forever. The idea is that when you lose your paths, you
> either go and fix the issue, or you remove the multipath device.
> 
> vgscan sends IO to the device. If all paths are down and the device is set to
> queue IO when all paths are down, that's what it will do. If you actually
> remove the devices from the system, multipath will remove the multipath device
> (assuming that it's not currently in use). Multipath even has an option so that
> if you remove all of the devices, and the device is in use, it turns off
> queueing to avoid problems like this. But for that to work, you must actually
> remove the devices.  If they just get disconnected, there is no way for
> multipath to tell if that was intentional or not.
> 
> So multipath is working as designed, and I don't see any way to make a special
> case for this.  If you want the devices gone, then remove the scsi devices, or
> alternatively, remove the multipath devices. If multipath can't talk to the
> device at all, it won't create a new multipath device for it, even if the scsi
> device exists.  If you don't do this, and get stuck waiting for queued IO, then
> you need to disable queueing on the device, using
> 
> # multipathd -k"disablequeueing map <mapname>"
> 
> This will turn off queueing, so that whatever's using the device can get
> unstuck, and you can remove the device. If you don't want to bothered with
> this, and your device queues IO by default, then you need to edit the
> configuration and change that.  There are enough options for multipath, that
> it's pretty impossible to have a default config that will always work.
> Sometimes people will have to edit their /etc/multipath.conf file.
> Unfortunately, there is no way to make a default configuration parameter that
> overrides the device specific ones, so you'll have to make one like the example
> I gave in Comment #14    
I agree, I don't think this is a multipath-dm issue.
The question is whether this should not be dealt with in LVM (seeing as in such cases vgscan takes locks on the VGs and on P_global, and might cause any LVM command to hang as a result).
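
To make the locking concrete: with the default file-based locking, these are lock files under /var/lock/lvm, so while the stuck vgscan holds them, other LVM commands block on the same files. A rough illustration (lock directory as in a default lvm.conf):

$ ls /var/lock/lvm/          # the stuck vgscan holds P_global / V_<vgname> file locks here
$ vgs                        # any other LVM command now blocks trying to take the same locks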

Comment 27 Ben Marzinski 2010-06-21 15:59:46 UTC
I'm reassigning this bug to LVM for a response to Comment #25; however, I don't know that the locking here is avoidable.

Comment 28 Alasdair Kergon 2010-07-09 20:57:22 UTC
No changes in the short term, no. Longer term, this part of the LVM design will change, but we have no ETA for that.

Comment 30 Ludek Smid 2010-11-26 09:07:16 UTC
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.6, and Red Hat does not plan to fix this issue in the currently developed update.

Contact your manager or support representative in case you need to escalate this bug.

Comment 37 Chris Williams 2017-04-04 20:41:42 UTC
Red Hat Enterprise Linux 5 shipped its last minor release, 5.11, on September 14th, 2014. On March 31st, 2017, RHEL 5 exited Production Phase 3 and entered the Extended Life Phase. For RHEL releases in the Extended Life Phase, Red Hat will provide limited ongoing technical support. No bug fixes, security fixes, hardware enablement or root-cause analysis will be available during this phase, and support will be provided on existing installations only. If the customer purchases Extended Life-cycle Support (ELS), certain critical-impact security fixes and selected urgent-priority bug fixes for the last minor release will be provided. The specific support and services provided during each phase are described in detail at http://redhat.com/rhel/lifecycle

This BZ does not appear to meet ELS criteria, so it is being closed WONTFIX. If this BZ is critical for your environment and you have an Extended Life-cycle Support Add-on entitlement, please open a case in the Red Hat Customer Portal (https://access.redhat.com), provide a thorough business justification, and ask that the BZ be re-opened for consideration of an errata. Please note that only certain critical-impact security fixes and selected urgent-priority bug fixes for the last minor release can be considered.