Bug 1359510

Summary: RFE: Request for warning if multipathd is not running/active
Product: Red Hat Enterprise Linux 7 Reporter: Milan P. Gandhi <mgandhi>
Component: device-mapper-multipath Assignee: Ben Marzinski <bmarzins>
Status: CLOSED ERRATA QA Contact: Lin Li <lilin>
Severity: low Docs Contact: Steven J. Levine <slevine>
Priority: medium    
Version: 7.2 CC: agk, bmarzins, byount, dwysocha, heinzm, jpittman, lilin, mgandhi, msnitzer, mthacker, prajnoha, prockai, rbalakri, salmy, yizhan, zkabelac
Target Milestone: rc Keywords: EasyFix, FutureFeature, Patch
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: device-mapper-multipath-0.4.9-101.el7 Doc Type: Enhancement
Doc Text:
Warning messages when `multipathd` is not running
Users now get warning messages if they run a `multipath` command that creates or lists multipath devices while `multipathd` is not running. If `multipathd` is not running, the devices cannot restore paths that have failed or react to changes in the device setup. The `multipath` command now prints a warning message if there are multipath devices and `multipathd` is not running.
Story Points: ---
Clone Of: 1305589 Environment:
Last Closed: 2017-08-01 16:34:26 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1305589    
Bug Blocks: 1298243, 1385242    
Attachments:
Description Flags
Proposed Patch v1 none

Description Milan P. Gandhi 2016-07-24 13:21:53 UTC
+++ This bug was initially created as a clone of Bug #1305589 +++

Description of problem:

Opening at customer request.  Is it possible to get a warning if the dm_multipath module is loaded, multipath devices are present, but multipathd is not set to on?  Perhaps when using the 'multipath' command.  The idea is that most users never use the multipathd command; they configure multipathing and use the multipath command. 

The multipathd service is an integral part of multipathing, so we should at least give an informative message if it is not running and the system is using multipath.

Why do we not auto-start multipathd if the module is in use?  Are there use cases where we would want to have multipath devices, but not use multipathd?  I have seen many cases where proper failover/mapping was not achieved because multipathd was not running and no one knew.  Could we request an auto load for multipathd instead of the warning?  Maybe both?

Thanks for your time.  Please let me know if I can be of any help.

Version-Release number of selected component (if applicable):

device-mapper-multipath-0.4.9-87.el6.x86_64

Expected Results:

- multipathd service autostart with module load
- warning if service is not running when using multipath command

--- Additional comment from Ben Marzinski on 2016-02-08 15:04:02 EST ---

Loading the module can't autostart the service. There's no hook for anything like that in the module code. Besides, if there are no multipath devices, we don't want to start the service.  If the service is enabled in systemd, it will autostart on boot, as long as /etc/multipath.conf exists.  Creating the configuration file the recommended way, with

# mpathconf --enable

does enable the service in systemd. However, it does not autostart the service at that point.  This is because users often want to actually edit the configuration file before starting the service the first time. The service will autostart on the next reboot.
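
The state described above (enabled for the next boot, but not yet started) can be told apart from a running service in a script. The following is only a minimal sketch, assuming systemd's "systemctl is-active" and "systemctl is-enabled" subcommands; the multipathd_state function and the SYSTEMCTL override are hypothetical names used for illustration and are not part of device-mapper-multipath.

```shell
# Hypothetical helper: report whether multipathd is running, merely
# enabled, or neither. SYSTEMCTL may be pointed at a stub for dry runs
# (an assumption of this sketch, not an existing convention).
SYSTEMCTL=${SYSTEMCTL:-systemctl}

multipathd_state() {
    if "$SYSTEMCTL" is-active --quiet multipathd; then
        echo "running"
    elif "$SYSTEMCTL" is-enabled --quiet multipathd; then
        # The situation right after "mpathconf --enable": the service
        # will start on the next boot but is not running now.
        echo "enabled-not-running"
    else
        echo "disabled"
    fi
}
```

If this prints "enabled-not-running" after the configuration file has been edited, running "systemctl start multipathd" starts the service without waiting for a reboot.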

There are two ways to actually create multipath devices. Either multipathd will automatically create them, or they can be created manually by running multipath.  I can make multipath print a warning message if multipathd isn't running. This should solve the case where there are multipath devices but multipathd isn't running.

Does this sound reasonable?

--- Additional comment from John Pittman on 2016-02-08 15:12:22 EST ---

Yes Ben that sounds perfect.
Thanks a lot.

John

--- Additional comment from Milan P. Gandhi on 2016-07-24 09:19 EDT ---

With the attached patch, dm-multipath commands such as multipath -v2 and multipath -ll now check whether any multipath device maps have been created and whether the multipathd service is running.

If multipath device maps exist but the multipathd service is not running, a warning message informs the user that IO failover/failback may not work as expected without the multipathd process running.
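
The decision described above can be sketched in shell as follows. This is not the attached patch (which changes the multipath C code); multipathd_warning and its two arguments are hypothetical, and the message text is copied from the test output quoted later in this bug.

```shell
# Hypothetical sketch of the decision only; not the patch itself.
# $1: number of multipath device maps, $2: "yes" if multipathd is running.
multipathd_warning() {
    map_count=$1
    daemon_running=$2
    # Warn only when maps exist and nothing is monitoring them.
    if [ "$map_count" -gt 0 ] && [ "$daemon_running" != "yes" ]; then
        echo "multipath device maps are present, but 'multipathd' service is not running"
        echo "IO failover/failback will not work without 'multipathd' service running"
    fi
}
```

For example, "multipathd_warning 4 no" prints both warning lines, while "multipathd_warning 4 yes" and "multipathd_warning 0 no" print nothing.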

Comment 1 Milan P. Gandhi 2016-07-24 13:28:12 UTC
Created attachment 1183370 [details]
Proposed Patch v1

Comment 2 Milan P. Gandhi 2016-07-24 13:35:42 UTC
dm-multipath command outputs before applying the attached patch:


[root@testhost ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.2 (Maipo)
[root@testhost ~]# uname -a
Linux testhost 3.10.0-327.22.2.el7.x86_64 #1 SMP Thu Jun 9 10:09:10 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
[root@testhost ~]# rpm -qa|grep -i multipath
device-mapper-multipath-0.4.9-85.el7_2.5.x86_64
device-mapper-multipath-libs-0.4.9-85.el7_2.5.x86_64
[root@testhost ~]# 


Currently "multipathd" service is running: 


[root@testhost ~]# systemctl status multipathd
● multipathd.service - Device-Mapper Multipath Device Controller
   Loaded: loaded (/usr/lib/systemd/system/multipathd.service; enabled; vendor preset: enabled)
   Active: active (running) since Sun 2016-07-24 17:06:31 IST; 4min 4s ago
 Main PID: 517 (multipathd)
   CGroup: /system.slice/multipathd.service
           └─517 /sbin/multipathd
[...]
[root@testhost ~]# 
[root@testhost ~]# 


Stop the "multipathd" service:


[root@testhost ~]# systemctl stop multipathd
[root@testhost ~]# systemctl status multipathd
● multipathd.service - Device-Mapper Multipath Device Controller
   Loaded: loaded (/usr/lib/systemd/system/multipathd.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Sun 2016-07-24 17:10:42 IST; 2s ago
 Main PID: 517 (code=exited, status=0/SUCCESS)

[...]
Jul 24 17:10:42 testhost systemd[1]: Stopped Device-Mapper Multipath Device Controller.
[root@testhost ~]# 


With the "multipathd" service stopped, the "multipath -v2", "multipath -ll" and "multipath -v4" commands print no warnings:


[root@testhost ~]# multipath -ll
mpathd (1IET_00050004) dm-5 IET     ,VIRTUAL-DISK    
size=512M features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 3:0:0:4 sdd 8:48 active ready running
mpathc (1IET_00050003) dm-4 IET     ,VIRTUAL-DISK    
size=512M features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 3:0:0:3 sdc 8:32 active ready running
mpathb (1IET_00050002) dm-3 IET     ,VIRTUAL-DISK    
size=512M features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 3:0:0:2 sdb 8:16 active ready running
mpatha (1IET_00050001) dm-2 IET     ,VIRTUAL-DISK    
size=512M features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 3:0:0:1 sda 8:0  active ready running
[root@testhost ~]# 
[root@testhost ~]# 
[root@testhost ~]# multipath -v2
[root@testhost ~]# 
[root@testhost ~]# 
[root@testhost ~]# multipath -F
[root@testhost ~]# multipath -v2
create: mpatha (1IET_00050001) undef IET     ,VIRTUAL-DISK    
size=512M features='0' hwhandler='0' wp=undef
`-+- policy='service-time 0' prio=1 status=undef
  `- 3:0:0:1 sda 8:0  undef ready running
create: mpathb (1IET_00050002) undef IET     ,VIRTUAL-DISK    
size=512M features='0' hwhandler='0' wp=undef
`-+- policy='service-time 0' prio=1 status=undef
  `- 3:0:0:2 sdb 8:16 undef ready running
create: mpathc (1IET_00050003) undef IET     ,VIRTUAL-DISK    
size=512M features='0' hwhandler='0' wp=undef
`-+- policy='service-time 0' prio=1 status=undef
  `- 3:0:0:3 sdc 8:32 undef ready running
create: mpathd (1IET_00050004) undef IET     ,VIRTUAL-DISK    
size=512M features='0' hwhandler='0' wp=undef
`-+- policy='service-time 0' prio=1 status=undef
  `- 3:0:0:4 sdd 8:48 undef ready running
[root@testhost ~]# 
[root@testhost ~]#

Comment 4 Milan P. Gandhi 2016-07-24 13:40:17 UTC
Using the test dm-multipath packages containing the attached patch:


Currently "multipathd" service is running on the server:
=======================================================

[root@testhost ~]# 
[root@testhost ~]# ps aux|grep -i multip
root       520  0.0  0.6 252732  6920 ?        SLl  18:13   0:00 /sbin/multipathd
root      2464  0.0  0.0 112652   980 pts/0    S+   18:16   0:00 grep --color=auto -i multip
[root@testhost ~]# 


Now stop the "multipathd" service:
=================================

[root@testhost ~]# systemctl stop multipathd
[root@testhost ~]# systemctl status multipathd
● multipathd.service - Device-Mapper Multipath Device Controller
   Loaded: loaded (/usr/lib/systemd/system/multipathd.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Sun 2016-07-24 18:17:52 IST; 4s ago
  Process: 516 ExecStart=/sbin/multipathd (code=exited, status=0/SUCCESS)
  Process: 507 ExecStartPre=/sbin/multipath -A (code=exited, status=0/SUCCESS)
  Process: 488 ExecStartPre=/sbin/modprobe dm-multipath (code=exited, status=0/SUCCESS)
 Main PID: 520 (code=exited, status=0/SUCCESS)

[...]
Jul 24 18:17:52 testhost systemd[1]: Stopped Device-Mapper Multipath Device Controller.
[root@testhost ~]# 
[root@testhost ~]# 
[root@testhost ~]# ps aux|grep -i multip
root      2613  0.0  0.0 112648   980 pts/0    R+   18:18   0:00 grep --color=auto -i multip
[root@testhost ~]# 
[root@testhost ~]# 


The "multipath -v2" and "multipath -ll" commands now check whether at least
one multipath device is present on the server and, if so, whether the
"multipathd" process is running.
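
A comparable check can be approximated from the command line. The sketch below assumes the maps are listed with "dmsetup ls --target multipath", which prints one line per map or "No devices found" when there are none; the patched multipath binary may detect maps differently, and count_multipath_maps is a hypothetical helper name.

```shell
# Hypothetical helper: count maps in "dmsetup ls --target multipath"
# style output read from stdin. The "No devices found" line and blank
# lines are not counted.
count_multipath_maps() {
    # grep -c exits non-zero when the count is 0; "|| true" hides that.
    grep -c -v -e '^No devices found$' -e '^[[:space:]]*$' || true
}
```

On a live system this could be used as "dmsetup ls --target multipath | count_multipath_maps", with "pgrep -x multipathd" as one way to test for the running process.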

Since multipath device maps are present but the "multipathd" service is not
running, we get the warning shown in the following command output:
=======================================================


[root@testhost ~]# multipath -ll
mpathd (1IET_00050004) dm-5 IET     ,VIRTUAL-DISK    
size=512M features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 3:0:0:4 sdd 8:48 active ready running
mpathc (1IET_00050003) dm-4 IET     ,VIRTUAL-DISK    
size=512M features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 3:0:0:3 sdc 8:32 active ready running
mpathb (1IET_00050002) dm-3 IET     ,VIRTUAL-DISK    
size=512M features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 3:0:0:2 sdb 8:16 active ready running
mpatha (1IET_00050001) dm-2 IET     ,VIRTUAL-DISK    
size=512M features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 3:0:0:1 sda 8:0  active ready running
Jul 24 18:18:08 | multipath device maps are present, but 'multipathd' service is not running
Jul 24 18:18:08 | IO failover/failback will not work without 'multipathd' service running
[root@testhost ~]# 
[root@testhost ~]# 
[root@testhost ~]# 
[root@testhost ~]# multipath -v2
Jul 24 18:18:12 | multipath device maps are present, but 'multipathd' service is not running
Jul 24 18:18:12 | IO failover/failback will not work without 'multipathd' service running
[root@testhost ~]# 
[root@testhost ~]# 
[root@testhost ~]# 
[root@testhost ~]# multipath -ll
mpathd (1IET_00050004) dm-5 IET     ,VIRTUAL-DISK    
size=512M features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 3:0:0:4 sdd 8:48 active ready running
mpathc (1IET_00050003) dm-4 IET     ,VIRTUAL-DISK    
size=512M features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 3:0:0:3 sdc 8:32 active ready running
mpathb (1IET_00050002) dm-3 IET     ,VIRTUAL-DISK    
size=512M features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 3:0:0:2 sdb 8:16 active ready running
mpatha (1IET_00050001) dm-2 IET     ,VIRTUAL-DISK    
size=512M features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 3:0:0:1 sda 8:0  active ready running
Jul 24 18:18:16 | multipath device maps are present, but 'multipathd' service is not running
Jul 24 18:18:16 | IO failover/failback will not work without 'multipathd' service running
[root@testhost ~]# 
[root@testhost ~]# 
[root@testhost ~]# 
[root@testhost ~]# multipath -F
[root@testhost ~]# 
[root@testhost ~]# multipath -v2
create: mpatha (1IET_00050001) undef IET     ,VIRTUAL-DISK    
size=512M features='0' hwhandler='0' wp=undef
`-+- policy='service-time 0' prio=1 status=undef
  `- 3:0:0:1 sda 8:0  undef ready running
create: mpathb (1IET_00050002) undef IET     ,VIRTUAL-DISK    
size=512M features='0' hwhandler='0' wp=undef
`-+- policy='service-time 0' prio=1 status=undef
  `- 3:0:0:2 sdb 8:16 undef ready running
create: mpathc (1IET_00050003) undef IET     ,VIRTUAL-DISK    
size=512M features='0' hwhandler='0' wp=undef
`-+- policy='service-time 0' prio=1 status=undef
  `- 3:0:0:3 sdc 8:32 undef ready running
create: mpathd (1IET_00050004) undef IET     ,VIRTUAL-DISK    
size=512M features='0' hwhandler='0' wp=undef
`-+- policy='service-time 0' prio=1 status=undef
  `- 3:0:0:4 sdd 8:48 undef ready running
Jul 24 18:18:23 | 'multipathd' service is currently not running, IO failover/failback will not work
[root@testhost ~]#

Comment 6 Mark Thacker 2016-11-23 16:14:06 UTC
Adding pm_ack for this.

Comment 7 Ben Marzinski 2017-02-16 18:32:32 UTC
Patch applied. Thanks.

Comment 10 Steven J. Levine 2017-05-08 18:05:02 UTC
Ben:  Could you check my edit to the doc text description for this feature (just a slight rearrangement for the release note format)?

Steven

Comment 11 Ben Marzinski 2017-05-08 18:40:06 UTC
Looks fine.

Comment 12 Steven J. Levine 2017-05-16 16:23:10 UTC
Adding a "topic" summary sentence to the doc text.  I had thought the first paragraph would serve as a fine summary, but it turns out to be too many characters.  I didn't change anything in the approved description.

Comment 13 Lin Li 2017-05-22 11:05:13 UTC
Verified on device-mapper-multipath-0.4.9-111
1. [root@storageqe-84 ~]# rpm -qa | grep multipath
device-mapper-multipath-debuginfo-0.4.9-111.el7.x86_64
device-mapper-multipath-libs-0.4.9-111.el7.x86_64
device-mapper-multipath-0.4.9-111.el7.x86_64
device-mapper-multipath-sysvinit-0.4.9-111.el7.x86_64
device-mapper-multipath-devel-0.4.9-111.el7.x86_64

2. [root@storageqe-84 ~]# mpathconf --enable

3. [root@storageqe-84 ~]# service multipathd restart
Restarting multipathd (via systemctl):  [  OK  ]

4. [root@storageqe-84 ~]# multipath -ll
mpathb (353333330000007d0) dm-5 Linux   ,scsi_debug      
size=1.0G features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=1 status=active
| `- 5:0:0:0 sdc 8:32 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  `- 6:0:0:0 sdd 8:48 active ready running
mpatha (360fff19abdd9552f8a36e5355226ba27) dm-0 EQLOGIC ,100E-00         
size=50G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 4:0:1:0 sdb 8:16 active ready running

5. [root@storageqe-84 ~]# service multipathd stop
Stopping multipathd (via systemctl):  [  OK  ]

6. [root@storageqe-84 ~]# multipath -ll
mpathb (353333330000007d0) dm-5 Linux   ,scsi_debug      
size=1.0G features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=1 status=active
| `- 5:0:0:0 sdc 8:32 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  `- 6:0:0:0 sdd 8:48 active ready running
mpatha (360fff19abdd9552f8a36e5355226ba27) dm-0 EQLOGIC ,100E-00         
size=50G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 4:0:1:0 sdb 8:16 active ready running
May 22 13:00:34 | multipath device maps are present, but 'multipathd' service is not running
May 22 13:00:34 | IO failover/failback will not work without 'multipathd' service running

Listing multipath devices while multipathd is stopped now prints a warning message.
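
This verification step could be scripted along the following lines. It is a sketch only: has_multipathd_warning is a hypothetical helper that looks for the first warning line quoted in this bug in output read from stdin.

```shell
# Hypothetical check: succeed if the text on stdin contains the first
# warning line shown in this bug, fail otherwise.
has_multipathd_warning() {
    grep -q "multipath device maps are present, but 'multipathd' service is not running"
}
```

A possible use is "multipath -ll 2>&1 | has_multipathd_warning"; stderr is merged in because this bug does not say which stream the warning is written to.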

Comment 14 errata-xmlrpc 2017-08-01 16:34:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1961