Bug 829308 - service clvmd stop ignores failures
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Ondrej Kozina
QA Contact: Cluster QE
Docs Contact:
Depends On: 1038818
Blocks:
Reported: 2012-06-06 08:44 EDT by Jaroslav Kortus
Modified: 2014-06-17 21:17 EDT (History)

See Also:
Fixed In Version: lvm2-2.02.105-5.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-06-13 05:43:27 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Jaroslav Kortus 2012-06-06 08:44:55 EDT
Description of problem:
service clvmd stop does not report an error when something goes wrong.
In my case a gfs2 filesystem from a cluster VG is mounted:
(07:36:03) [root@r7-node01:~]$ vgs
  VG      #PV #LV #SN Attr   VSize   VFree  
  shared    1   1   0 wz--nc 171.00g 121.00g
  vg_none   1   2   0 wz--n-   4.50g      0 

(07:36:04) [root@r7-node01:~]$ lvs
  LV      VG      Attr     LSize  Pool Origin Data%  Move Log Copy%  Convert
  gfs2    shared  -wi-ao-- 50.00g                                           
  lv_root vg_none -wi-ao--  2.56g                                           
  lv_swap vg_none -wi-ao--  1.94g     

(07:36:27) [root@r7-node01:~]$ mount | grep gfs2
/dev/mapper/shared-gfs2 on /mnt/shared type gfs2 (rw,relatime,seclabel)

(07:36:16) [root@r7-node01:~]$ service clvmd stop
Stopping clvmd (via systemctl):                            [  OK  ]

(07:39:12) [root@r7-node01:~]$ echo $?
0
(07:36:24) [root@r7-node01:~]$ tail /var/log/messages 
Jun  6 07:36:24 r7-node01 clvmd[6147]: Deactivating clustered VG(s):   Can't deactivate volume group "shared" with 1 open logical volume(s)
Jun  6 07:36:24 r7-node01 clvmd[6147]: [FAILED]

(07:39:14) [root@r7-node01:~]$ service clvmd status
clvmd.service - LSB: This service is Clusterd LVM Daemon.
	  Loaded: loaded (/etc/rc.d/init.d/clvmd)
	  Active: inactive (dead) since Wed, 06 Jun 2012 07:39:12 -0500; 53s ago
	 Process: 6206 ExecStop=/etc/rc.d/init.d/clvmd stop (code=exited, status=5/NOTINSTALLED)
	 Process: 6174 ExecStart=/etc/rc.d/init.d/clvmd start (code=exited, status=0/SUCCESS)
	  CGroup: name=systemd:/system/clvmd.service
		  └ 6100 clvmd -T30

$ pgrep clvmd
6100
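
The deactivation failure in the log follows from the open LV: in the lvs output above, the sixth character of the Attr field is "o" for shared/gfs2. A minimal sketch of a pre-stop check, assuming attribute strings in the format shown above; the function names lv_is_open and list_open_lvs are hypothetical and not part of lvm2:

```shell
#!/bin/sh
# Return 0 if the lv_attr string marks the device as open
# (sixth character is "o", as in "-wi-ao--").
lv_is_open() {
    case "$1" in
        ?????o*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Read "LV VG Attr ..." lines on stdin and print VG/LV for every open LV.
list_open_lvs() {
    while read -r lv vg attr _rest; do
        if lv_is_open "$attr"; then
            echo "$vg/$lv"
        fi
    done
}
```

Feeding it the default columns, for example with "lvs --noheadings | list_open_lvs", would list the volumes that block deactivation before a stop is attempted.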


Version-Release number of selected component (if applicable):
lvm2-cluster-2.02.95-6.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. create cluster VG (with cluster locking)
2. create LV inside with gfs2 and mount it
3. issue service clvmd stop

  
Actual results:
- clvmd still running
- error messages not propagated and ignored (exit code 0)
- service clvmd status reporting bogus info (inactive (dead))

Expected results:
- failures propagated
- clvmd stopping aborted and status reported as running
- subsequent stops should retry the procedure (now it's not retried and the failure messages are visible only on first stop)

Additional info:
Comment 1 Ondrej Kozina 2012-10-12 07:58:45 EDT
Short explanation:
The init script is not compatible with systemd. During deactivation, vgchange -anl returns exit code 5, which is correct because the underlying LV is busy, but systemd interprets that return code as NOTINSTALLED. I am going to write a proper systemd unit file.
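
For LSB init scripts, exit status 5 is defined as "program is not installed", which is why the failed stop is labelled NOTINSTALLED. A minimal sketch of how an init script could avoid leaking a tool-specific code, assuming a generic failure (1) is the desired result; run_lsb is a hypothetical helper and is not taken from the actual patch:

```shell
#!/bin/sh
# Run a command and collapse any failure to the LSB generic error (1),
# so a tool-specific code such as 5 is not misread as "not installed".
run_lsb() {
    "$@"
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "command failed with status $rc: $*" >&2
        return 1
    fi
    return 0
}
```

Inside a stop function this might be used as "run_lsb vgchange -anl || return 1", so the caller still sees a failure but with an unambiguous code.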
Comment 2 Peter Rajnoha 2012-10-12 08:08:09 EDT
(...while you're at it, please do the cmirrord systemd unit as well - I think all the deps are already converted to systemd units nowadays...)
Comment 3 Alasdair Kergon 2012-11-01 13:41:19 EDT
Two cases to separate:

1.  Clean shutdown

- the scripts should run in the right sequence to stop any processes using any clustered filesystems, unmount the clustered filesystems, deactivate all clustered LVs, stop clvmd.


2.  Unclean shutdown

- if something doesn't stop cleanly, the node might need to be fenced
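
The clean shutdown ordering described above can be sketched as a sequence that aborts at the first failing step. This is only an illustration: run_steps and the four step functions are hypothetical names, and the step bodies are placeholders because the real commands depend on the cluster setup.

```shell
#!/bin/sh
# Run the given step functions in order; stop at the first failure and
# report which step failed so the caller can decide what to do next.
run_steps() {
    for step in "$@"; do
        if ! "$step"; then
            echo "shutdown step failed: $step" >&2
            return 1
        fi
    done
    return 0
}

# Placeholder steps (no-ops here); fill in for the actual environment.
stop_fs_users()  { :; }   # stop processes using the clustered filesystems
unmount_fs()     { :; }   # unmount the clustered filesystems
deactivate_lvs() { :; }   # deactivate all clustered LVs
stop_clvmd()     { :; }   # stop clvmd
```

A caller would run "run_steps stop_fs_users unmount_fs deactivate_lvs stop_clvmd" and treat a non-zero result as the unclean case.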
Comment 6 Nate Straz 2013-05-17 11:53:38 EDT
Was this ever merged?  The current nightly RPMs for RHEL7 still include the old init scripts.  I can see a patch was submitted back in November 2012.

https://www.redhat.com/archives/lvm-devel/2012-November/msg00001.html

The added files are not in upstream lvm.git.
Comment 7 Peter Rajnoha 2013-05-21 04:09:32 EDT
(In reply to Nate Straz from comment #6)
> Was this ever merged?  The current nightly RPMs for RHEL7 still include
> the old init scripts.  I can see a patch was submitted back in November 2012.
> 
> https://www.redhat.com/archives/lvm-devel/2012-November/msg00001.html
> 
> The added files are not in upstream lvm.git.

Yes, this is not merged yet. We'll get back to this soon...
Comment 8 Peter Rajnoha 2014-02-11 04:22:33 EST
See also bug #1044677, but let's keep this bz for testing the exact thing reported here...
Comment 9 Peter Rajnoha 2014-02-13 03:46:23 EST
Moving to 7.1 for consideration, see bz #1038818 comment #7 for more info.
Comment 11 Peter Rajnoha 2014-03-31 09:00:16 EDT
To QA:
...as already noted in comment #9, we now have an OCF resource file to manage clvmd/cmirrord instead of initscripts/systemd units. When clvmd/cmirrord is defined as a cluster resource, any filesystem on top of it is also a cluster resource, so the resulting dependency should prevent clvmd/cmirrord from being stopped while it is still in use: the cluster resource manager will try to stop the resource layered above before trying to stop the layers below. A proper shutdown sequence should be assured this way.

On the other hand, if clvmd and cmirrord are used directly as systemd units without a cluster resource manager (bug #1038818 comment #17), we should now get a proper error message (the "stop" operation for lvm2-cluster-activation fails and lvm2-clvmd.service keeps running):

[root@rhel7-b ~]# mount /dev/vg/lvol0 /mnt/

[root@rhel7-b ~]# systemctl stop lvm2-cluster-activation

[root@rhel7-b ~]# systemctl status lvm2-cluster-activation
lvm2-cluster-activation.service - Clustered LVM volumes activation service
   Loaded: loaded (/usr/lib/systemd/system/lvm2-cluster-activation.service; enabled)
   Active: failed (Result: exit-code) since Mon 2014-03-31 08:56:06 EDT; 4s ago
  Process: 1843 ExecStop=/usr/lib/systemd/lvm2-cluster-activation deactivate (code=exited, status=1/FAILURE)
  Process: 796 ExecStart=/usr/lib/systemd/lvm2-cluster-activation activate (code=exited, status=0/SUCCESS)
 Main PID: 796 (code=exited, status=0/SUCCESS)

Mar 31 08:55:29 rhel7-b.virt systemd[1]: Started Clustered LVM volumes activation service.
Mar 31 08:55:29 rhel7-b.virt lvm2-cluster-activation[796]: Activating all VG(s):   2 logical volume(s) in volume ...tive
Mar 31 08:55:29 rhel7-b.virt lvm2-cluster-activation[796]: 1 logical volume(s) in volume group "vg" now active
Mar 31 08:56:06 rhel7-b.virt systemd[1]: Stopping Clustered LVM volumes activation service...
Mar 31 08:56:06 rhel7-b.virt lvm2-cluster-activation[1843]: Deactivating clustered VG(s):   Logical volume vg/lvol...se.
Mar 31 08:56:06 rhel7-b.virt lvm2-cluster-activation[1843]: Can't deactivate volume group "vg" with 1 open logical...(s)
Mar 31 08:56:06 rhel7-b.virt systemd[1]: lvm2-cluster-activation.service: control process exited, code=exited status=1
Mar 31 08:56:06 rhel7-b.virt systemd[1]: Stopped Clustered LVM volumes activation service.
Mar 31 08:56:06 rhel7-b.virt systemd[1]: Unit lvm2-cluster-activation.service entered failed state.
Mar 31 08:56:06 rhel7-b.virt systemd[1]: Triggering OnFailure= dependencies of lvm2-cluster-activation.service.
Hint: Some lines were ellipsized, use -l to show in full.

[root@rhel7-b ~]# systemctl status lvm2-clvmd
lvm2-clvmd.service - Clustered LVM daemon
   Loaded: loaded (/usr/lib/systemd/system/lvm2-clvmd.service; static)
   Active: active (running) since Mon 2014-03-31 08:55:29 EDT; 3min 26s ago
     Docs: man:clvmd(8)
  Process: 752 ExecStart=/usr/sbin/clvmd $CLVMD_OPTS (code=exited, status=0/SUCCESS)
 Main PID: 753 (clvmd)
   CGroup: /system.slice/lvm2-clvmd.service
           `-753 /usr/sbin/clvmd -T30

Mar 31 08:55:27 rhel7-b.virt systemd[1]: Starting Clustered LVM daemon...
Mar 31 08:55:28 rhel7-b.virt clvmd[753]: Cluster LVM daemon started - connected to Corosync
Mar 31 08:55:29 rhel7-b.virt systemd[1]: Started Clustered LVM daemon.
Mar 31 08:56:06 rhel7-b.virt systemd[1]: Service lvm2-clvmd.service is not needed anymore. Stopping.
Mar 31 08:56:06 rhel7-b.virt systemd[1]: Started Clustered LVM daemon.
Comment 12 Nenad Peric 2014-03-31 09:56:43 EDT
This is a bit convoluted to verify, especially because the only supported way of running a RHEL7 cluster is with resources, not standalone daemons.
However, the new daemons behave a bit better and show the real status of clvmd even when it failed to stop due to an open LV.
Due to some systemd quirks we still get an exit status of 0, even when the command itself failed.

Here are the tests (resources in cluster disabled, running only clvmd/dlm/cmirrord as stand-alone):

[root@virt-010 ~]# mount /dev/cluster/lvol1 /mnt

[root@virt-010 ~]# systemctl status lvm2-cluster-activation.service
lvm2-cluster-activation.service - Clustered LVM volumes activation service
   Loaded: loaded (/usr/lib/systemd/system/lvm2-cluster-activation.service; enabled)
   Active: active (exited) since Mon 2014-03-31 15:34:18 CEST; 2min 23s ago
  Process: 2295 ExecStop=/usr/lib/systemd/lvm2-cluster-activation deactivate (code=exited, status=0/SUCCESS)
  Process: 2328 ExecStart=/usr/lib/systemd/lvm2-cluster-activation activate (code=exited, status=0/SUCCESS)
 Main PID: 2328 (code=exited, status=0/SUCCESS)

We cannot issue a stop command directly to clvmd:

[root@virt-010 ~]# systemctl stop lvm2-clvmd
Failed to issue method call: Operation refused, unit lvm2-clvmd.service may be requested by dependency only.

The whole "suite" has to be stopped due to dependencies:

[root@virt-010 ~]# systemctl stop lvm2-cluster-activation.service
[root@virt-010 ~]# echo $?
0

However, the cluster daemons are not stopped, and the separate status calls confirm that (systemd reports 0 even though the command essentially failed):

[root@virt-010 ~]# service lvm2-cluster-activation status
Redirecting to /bin/systemctl status  lvm2-cluster-activation.service
lvm2-cluster-activation.service - Clustered LVM volumes activation service
   Loaded: loaded (/usr/lib/systemd/system/lvm2-cluster-activation.service; enabled)
   Active: failed (Result: exit-code) since Mon 2014-03-31 15:37:02 CEST; 5min ago
  Process: 2383 ExecStop=/usr/lib/systemd/lvm2-cluster-activation deactivate (code=exited, status=1/FAILURE)
  Process: 2328 ExecStart=/usr/lib/systemd/lvm2-cluster-activation activate (code=exited, status=0/SUCCESS)
 Main PID: 2328 (code=exited, status=0/SUCCESS)

^^^^^^ This is where a failure to STOP due to a mounted volume occurred


[root@virt-010 ~]# service lvm2-clvmd status
Redirecting to /bin/systemctl status  lvm2-clvmd.service
lvm2-clvmd.service - Clustered LVM daemon
   Loaded: loaded (/usr/lib/systemd/system/lvm2-clvmd.service; static)
   Active: active (running) since Mon 2014-03-31 15:25:21 CEST; 14min ago
     Docs: man:clvmd(8)
 Main PID: 959 (clvmd)
   CGroup: /system.slice/lvm2-clvmd.service
           └─959 /usr/sbin/clvmd -T30
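
Because "systemctl stop" returned 0 here even though the deactivation failed, a script that needs the real outcome could query the unit state afterwards. A minimal sketch, assuming "systemctl is-failed" reflects the failed state shown above; stop_and_verify and the SYSTEMCTL override variable are hypothetical names introduced for illustration:

```shell
#!/bin/sh
# Allow the systemctl command to be overridden (for example in tests).
SYSTEMCTL=${SYSTEMCTL:-systemctl}

# Stop a unit, then check its state instead of trusting the exit status
# of "systemctl stop" alone.
stop_and_verify() {
    unit=$1
    "$SYSTEMCTL" stop "$unit"
    if "$SYSTEMCTL" is-failed --quiet "$unit"; then
        echo "$unit entered failed state while stopping" >&2
        return 1
    fi
    return 0
}
```

For the case above, "stop_and_verify lvm2-cluster-activation.service" would be expected to return non-zero while the LV is still mounted.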


The problem reported in the opening comment cannot be tested directly due to the change in behavior and supported scenarios for RHEL7 clusters.
The systemd units, however, behave more sanely and show the real status of the daemons.

Marking VERIFIED with:

cmirror-standalone-2.02.105-14.el7.x86_64.rpm  
lvm2-cluster-standalone-2.02.105-14.el7.x86_64.rpm
Comment 13 Ludek Smid 2014-06-13 05:43:27 EDT
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.
