Bug 484956 - qdiskd does not prune partitions mapped to dm-mpio devices
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Assigned To: Lon Hohberger
QA Contact: Cluster QE
Reported: 2009-02-10 15:48 EST by Lon Hohberger
Modified: 2010-10-23 03:36 EDT
Fixed In Version: cman-2.0.100-1.el5
Doc Type: Bug Fix
Last Closed: 2009-09-02 07:09:27 EDT
Description Lon Hohberger 2009-02-10 15:48:01 EST
Description of problem:

If a quorum disk is located on a partition of a LUN instead of the LUN itself, qdiskd does not notice that the actual devices are slaves to another dm-mpio device.

For example, suppose we have a single disk with two paths:

  /dev/sdc  |  =>  /dev/dm-8
  /dev/sdf  |

If we then partition that disk, a new multipath device, dm-15, gets created:

  /dev/sdc1  |  =>  /dev/dm-15
  /dev/sdf1  |

This is all well and good.  However, the scandisk code in qdiskd does not correctly determine that /dev/sdc1 and /dev/sdf1 are slaves to /dev/dm-15.  As it scans /sys/block for devices, it looks for /sys/block/sdc/sdc1/holders, but holders are set only for the parent LUN, not for its partitions.
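To illustrate the lookup described above, here is a minimal Python sketch (not the qdiskd scandisk code, which is C). It builds a throwaway directory tree that mimics the sysfs layout from the example and lists the holders of a whole disk and of one of its partitions. The helper names `holders` and `build_fake_sysfs` are hypothetical, and the fake tree uses plain directories where real sysfs uses symlinks.

```python
import os
import tempfile


def holders(sysfs_block, disk, partition=None):
    """Return the names found in a device's sysfs holders directory."""
    parts = [sysfs_block, disk]
    if partition:
        parts.append(partition)
    path = os.path.join(*parts, "holders")
    return sorted(os.listdir(path)) if os.path.isdir(path) else []


def build_fake_sysfs(root):
    """Create a stand-in for /sys/block matching the example above."""
    for disk in ("sdc", "sdf"):
        # The whole LUN is held by the multipath map dm-8
        # (a directory here; a symlink in real sysfs).
        os.makedirs(os.path.join(root, disk, "holders", "dm-8"))
        # The partition has a holders directory, but it is empty.
        os.makedirs(os.path.join(root, disk, disk + "1", "holders"))


root = tempfile.mkdtemp()
build_fake_sysfs(root)
print(holders(root, "sdc"))          # ['dm-8']
print(holders(root, "sdc", "sdc1"))  # []
```

A scanner that only consults the partition's own holders directory therefore treats sdc1 and sdf1 as unheld devices.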

Consequently, as it stands, mkqdisk -L returns 3 hits when searching for a label instead of 1.


Version-Release number of selected component (if applicable): cman-2.0.98-1.el5
Comment 1 Fabio Massimo Di Nitto 2009-02-11 05:17:42 EST
Hi guys,

I believe this is now fixed in our git trees.

I tested on ora1 and it works fine for me.

Lon, the commits in the master/stable2/stable3 branches are:

commit 02bdf16609905e52a2ba6f52b202f764cd42d650
Author: Fabio M. Di Nitto <fdinitto@redhat.com>
Date:   Wed Feb 11 10:54:05 2009 +0100

    qdisk: propagate parent_holder information to childs
    Bug 484956 part 3.
    If a device (e.g. sda) has holders (lvm or mpath), it's children
    do not have holders information in sysfs.
    Make sure to propagate the information from parent to children
    when scanning in sysfs.
    Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>

commit 9e427e336fd6b0cc4d1467e93392232ec96b4e6c
Author: Fabio M. Di Nitto <fdinitto@redhat.com>
Date:   Wed Feb 11 10:41:48 2009 +0100

    qdisk: remove debugging printf.
    remove leftover from previous commit "Bug 484956 part 2".
    Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>

commit cb1e3dc19959f6193acc700000f0110c3dd58c13
Author: Fabio M. Di Nitto <fdinitto@redhat.com>
Date:   Wed Feb 11 10:40:34 2009 +0100

    qdisk: fix device scanning order.
    Bug 484956 part 2.
    This patch re-arrange the check so that full devices are always scanned
    before the underneath partitions.
    Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>

commit 054e9285257ea034c9ea8dc72095869bf82c0794
Author: Fabio M. Di Nitto <fdinitto@redhat.com>
Date:   Wed Feb 11 09:41:08 2009 +0100

    qdisk: fix device scanning.
    Bug 484956 part 1.
    The basic sysfs scanning filter was completely wrong and it
    was not scanning for full devices at all.
    So entire devices like multipaths or sda where missing from the
    original allocation.
    This patch re-arrange the check so that we perform a full scan.
    Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
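The combined effect of parts 2 and 3 (scan whole devices before their partitions, then let partitions inherit the parent's holder status) can be sketched in Python as follows. This is an illustration only, not the committed C code: the `Dev` record and the `prune_held` function are hypothetical names, and the sample data assumes that dm-8 is itself held by dm-15.

```python
from collections import namedtuple

# name: device name; parent: whole-device name, or None for a whole device;
# has_holders: whether sysfs lists any holders for this entry.
Dev = namedtuple("Dev", "name parent has_holders")


def prune_held(devs):
    """Return the names of devices that nothing else holds."""
    # Whole devices sort first, so a parent's status is known
    # before any of its partitions is examined.
    ordered = sorted(devs, key=lambda d: (d.parent is not None, d.name))
    held = {}
    for d in ordered:
        inherited = held.get(d.parent, False) if d.parent else False
        # A partition counts as held if its parent is held.
        held[d.name] = d.has_holders or inherited
    return [d.name for d in ordered if not held[d.name]]


devices = [
    Dev("sdc1", "sdc", False),
    Dev("sdf1", "sdf", False),
    Dev("sdc", None, True),
    Dev("sdf", None, True),
    Dev("dm-8", None, True),    # assumption: held by dm-15
    Dev("dm-15", None, False),
]
print(prune_held(devices))  # ['dm-15']
```

Without the inheritance step, sdc1 and sdf1 would survive pruning alongside dm-15, which matches the 3 hits reported in the description.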
Comment 2 Fabio Massimo Di Nitto 2009-02-11 05:20:35 EST
Reassigning to Lon for the final cherry-pick and testing on the RHEL branches.
Comment 3 Lon Hohberger 2009-02-11 18:32:19 EST
Those patches look fine.  I will build a package for Tes to run on his cluster, but I think that, for the purposes of Tes's work, 'device=' should be used again.
Comment 4 Lon Hohberger 2009-02-12 14:25:47 EST
Pushed these 3 patches to the RHEL5 branch.
Comment 5 Corey Marthaler 2009-03-17 17:51:45 EDT
Fix verified in cman-2.0.99-1.el5.

Label is on partition 4.

[root@grant-01 ~]# mkqdisk -L
mkqdisk v0.6.0
        Magic:                eb7a62c2
        Label:                GRANT
        Created:              Tue Mar 17 16:32:39 2009
        Host:                 grant-01
        Kernel Sector Size:   512
        Recorded Sector Size: 512

Mar 17 16:47:16 grant-01 qdiskd[10524]: <info> Quorum Partition: /dev/dm-7 Label: GRANT 
Mar 17 16:47:16 grant-01 qdiskd[10525]: <info> Quorum Daemon Initializing 
Mar 17 16:47:19 grant-01 qdiskd[10525]: <info> Heuristic: 'ping -c3 -t5 sts.lab.msp.redhat 
Mar 17 16:47:26 grant-01 qdiskd[10525]: <info> Initial score 1/1 
Mar 17 16:47:26 grant-01 qdiskd[10525]: <info> Initialization complete 
Mar 17 16:47:26 grant-01 openais[10427]: [CMAN ] quorum device registered 
Mar 17 16:47:26 grant-01 qdiskd[10525]: <notice> Score sufficient for master operation (1/ 
Mar 17 16:47:32 grant-01 qdiskd[10525]: <info> Assuming master role 

[root@grant-01 ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   0   M      0   2009-03-17 16:47:26  /dev/dm-7
   1   M  26176   2009-03-17 16:46:22  grant-01
   2   M  26188   2009-03-17 16:46:22  grant-02
   3   M  26184   2009-03-17 16:46:22  grant-03
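A check like this one can be scripted. The following Python sketch counts how many times a given label appears in captured `mkqdisk -L` output, assuming each hit prints one `Label:` line in the format shown above. The function name `count_label_hits` is hypothetical, and the sample text is copied from the output in this comment.

```python
def count_label_hits(output, label):
    """Count 'Label:' lines in mkqdisk -L output that match the label."""
    hits = 0
    for line in output.splitlines():
        # Split on the first colon only, so timestamps stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Label" and value.strip() == label:
            hits += 1
    return hits


SAMPLE = """mkqdisk v0.6.0
        Magic:                eb7a62c2
        Label:                GRANT
        Created:              Tue Mar 17 16:32:39 2009
        Host:                 grant-01
        Kernel Sector Size:   512
        Recorded Sector Size: 512
"""
print(count_label_hits(SAMPLE, "GRANT"))  # 1
```

With the fix in place the count should be 1 for a label on a multipathed partition; before the fix, the description reports 3.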
Comment 7 Lon Hohberger 2009-07-22 15:09:53 EDT
This bug was present whenever multipath was used with qdiskd.  Because qdiskd could not discern slaves from master devices in dm-mpio configurations, it would select the first device from /proc/partitions, which was usually wrong.

This reduced the availability of qdiskd, because it could fail immediately even when multipath was in use.

Administrators could work around this problem by specifying the device name directly (e.g. /dev/dm-1) in cluster.conf instead of using a label.  With the fix, qdiskd labels can be used in cluster.conf instead of device names.
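The two configuration styles can be compared with a short Python sketch that parses a cluster.conf fragment and reports which selector the quorumd element uses. The fragments are illustrative assumptions rather than configurations from this bug: the cluster name and the interval, tko and votes values are made up, the label GRANT and the device /dev/dm-1 are borrowed from the comments above, and `quorum_disk_selector` is a hypothetical helper. The label is checked first on the understanding that qdisk(5) describes label as overriding device; treat that ordering as an assumption too.

```python
import xml.etree.ElementTree as ET

# Label-based selection: usable with multipath once the fix is applied.
BY_LABEL = """<cluster name="example" config_version="1">
  <quorumd interval="1" tko="10" votes="1" label="GRANT"/>
</cluster>"""

# Device-based selection: the workaround described above.
BY_DEVICE = """<cluster name="example" config_version="1">
  <quorumd interval="1" tko="10" votes="1" device="/dev/dm-1"/>
</cluster>"""


def quorum_disk_selector(conf_text):
    """Return ('label', value), ('device', value), or None."""
    quorumd = ET.fromstring(conf_text).find("quorumd")
    if quorumd is None:
        return None
    # Assumption: label takes precedence over device when both are set.
    for attr in ("label", "device"):
        value = quorumd.get(attr)
        if value:
            return (attr, value)
    return None


print(quorum_disk_selector(BY_LABEL))   # ('label', 'GRANT')
print(quorum_disk_selector(BY_DEVICE))  # ('device', '/dev/dm-1')
```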
Comment 9 errata-xmlrpc 2009-09-02 07:09:27 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

