Bug 1627842 - Anonymous clone of clvmd causes restart of all dependent services on node re-join.
Summary: Anonymous clone of clvmd causes restart of all dependent services on node re-join.
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pacemaker
Version: 7.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Ken Gaillot
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-09-11 16:42 UTC by michal novacek
Modified: 2018-10-24 17:39 UTC (History)
5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-20 12:15:22 UTC
Target Upstream Version:
Embargoed:


Attachments
pcs cluster report (2.99 MB, application/x-bzip)
2018-09-11 16:42 UTC, michal novacek


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 1498473 0 None None None 2018-10-24 17:39:10 UTC

Description michal novacek 2018-09-11 16:42:08 UTC
Created attachment 1482401 [details]
pcs cluster report

Anonymous clone of clvmd causes restart of all dependent services on node
re-join.

We have a two-node cluster (1) (2) with essentially the following dependency
tree:

dlm-clone -> clvmd-clone --------> mysql-group (vip, clustered vg, gfs2, db in docker container)
             docker-clone ---/

(--> means both ordering and colocation.)
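For reference, ordering and colocation of this shape are created with pcs commands along these lines (a sketch using the resource names from the configuration in (2); the actual constraint list is shown there in full):

```shell
# Ordering: each dependent starts only after its prerequisite (kind:Mandatory)
pcs constraint order start dlm-clone then start clvmd-clone
pcs constraint order start clvmd-clone then start mysql-g-stage
pcs constraint order start dockerd-clone then start mysql-g-stage

# Colocation: each dependent must run on a node where its prerequisite runs
pcs constraint colocation add clvmd-clone with dlm-clone INFINITY
pcs constraint colocation add mysql-g-stage with clvmd-clone INFINITY
pcs constraint colocation add mysql-g-stage with dockerd-clone INFINITY
```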


When node1 is shut down, resources are correctly moved to node2.
When node1 rejoins the cluster, all resources dependent on clvmd are restarted again.


This seems like a very bad thing to happen: a node joining the cluster must not
cause a restart of resources that are running peacefully elsewhere.

Steps to reproduce:
1/ Configure cluster with said dependencies (1) (2)
2/ pcs cluster stop node1
3/ pcs cluster start node1


Actual result: restart of all peacefully running resources on node2

Expected result: resources kept running on node2

Reproducibility: always

Additional information:

# rpm -q pacemaker resource-agents
pacemaker-1.1.19-7.el7.x86_64
resource-agents-4.1.1-10.el7.x86_64

We use 'pcs resource defaults resource-stickiness=200' so that resources do not relocate when a new node joins (step 3).

We see the same behaviour in RHEL 7.5.

The problem is shown by this log excerpt:

Sep 11 18:13:54 virt-246 corosync[31438]: [TOTEM ] A new membership (10.37.167.116:772) was formed. Members joined: 1
Sep 11 18:13:54 virt-246 corosync[31438]: [QUORUM] Members[2]: 1 2
Sep 11 18:13:54 virt-246 corosync[31438]: [MAIN  ] Completed service synchronization, ready to provide service.
Sep 11 18:13:54 virt-246 crmd[31454]:  notice: Node virt-245 state is now member
Sep 11 18:13:54 virt-246 pacemakerd[31448]:  notice: Node virt-245 state is now member
Sep 11 18:13:56 virt-246 crmd[31454]:  notice: High CPU load detected: 1.160000
Sep 11 18:13:56 virt-246 stonith-ng[31450]:  notice: Node virt-245 state is now member
Sep 11 18:13:56 virt-246 attrd[31452]:  notice: Node virt-245 state is now member
Sep 11 18:13:56 virt-246 cib[31449]:  notice: Node virt-245 state is now member
Sep 11 18:13:57 virt-246 crmd[31454]:  notice: State transition S_IDLE -> S_INTEGRATION
Sep 11 18:14:01 virt-246 pengine[31453]: warning: Processing failed monitor of clvmd:0 on virt-246: not running
Sep 11 18:14:01 virt-246 pengine[31453]:  notice:  * Start      dlm:1                     (             virt-245 )
Sep 11 18:14:01 virt-246 pengine[31453]:  notice:  * Restart    clvmd:0                   (             virt-246 )   due to required dlm-clone running
Sep 11 18:14:01 virt-246 pengine[31453]:  notice:  * Start      clvmd:1                   (             virt-245 )
Sep 11 18:14:01 virt-246 pengine[31453]:  notice:  * Start      dockerd:1                 (             virt-245 )
Sep 11 18:14:01 virt-246 pengine[31453]:  notice:  * Restart    db-stage-vip              (             virt-246 )   due to required clvmd-clone running
Sep 11 18:14:01 virt-246 pengine[31453]:  notice:  * Restart    db-stage-lvm              (             virt-246 )   due to required db-stage-vip start
Sep 11 18:14:01 virt-246 pengine[31453]:  notice:  * Restart    db-stage-fs               (             virt-246 )   due to required db-stage-lvm start
Sep 11 18:14:01 virt-246 pengine[31453]:  notice:  * Restart    mysql-stage               (             virt-246 )   due to required db-stage-fs start
Sep 11 18:14:01 virt-246 pengine[31453]:  notice: Calculated transition 27, saving inputs in /var/lib/pacemaker/pengine/pe-input-386.bz2

It seems that starting dlm:1 causes clvmd to be restarted, which leads to a chain restart of everything that depends on it.

(1) [root@virt-246 ~]# pcs status
Cluster name: el-cluster
Stack: corosync
Current DC: virt-246 (version 1.1.19-7.el7-c3c624ea3d) - partition with quorum
Last updated: Tue Sep 11 18:20:39 2018
Last change: Tue Sep 11 18:13:32 2018 by root via cibadmin on virt-246

2 nodes configured
24 resources configured (10 DISABLED)

Online: [ virt-245 virt-246 ]

Full list of resources:

 fence-virt-245	(stonith:fence_xvm):	Started virt-246
 fence-virt-246	(stonith:fence_xvm):	Started virt-246
 Clone Set: dlm-clone [dlm]
     Started: [ virt-245 virt-246 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ virt-245 virt-246 ]
 Clone Set: dockerd-clone [dockerd]
     Started: [ virt-245 virt-246 ]
 Resource Group: mysql-g-stage
     db-stage-vip	(ocf::heartbeat:IPaddr):	Started virt-246
     db-stage-lvm	(ocf::heartbeat:LVM):	Started virt-246
     db-stage-fs	(ocf::heartbeat:Filesystem):	Started virt-246
     mysql-stage	(ocf::heartbeat:docker):	Started virt-246
 Resource Group: mysql-g-live
     db-live-vip	(ocf::heartbeat:IPaddr):	Stopped (disabled)
     db-live-lvm	(ocf::heartbeat:LVM):	Stopped (disabled)
     db-live-fs	(ocf::heartbeat:Filesystem):	Stopped (disabled)
     mysql-live	(ocf::heartbeat:docker):	Stopped (disabled)
 Clone Set: container-shared-fs-clone [container-shared-fs]
     Stopped (disabled): [ virt-245 virt-246 ]
 Resource Group: cqe-frontend-stage
     frontend-stage-vip	(ocf::heartbeat:IPaddr):	Stopped
     frontend-stage	(ocf::heartbeat:docker):	Stopped
 Resource Group: cqe-frontend-live
     frontend-live-vip	(ocf::heartbeat:IPaddr):	Stopped
     frontend-live	(ocf::heartbeat:docker):	Stopped
 cqe-dispatcher-live	(ocf::heartbeat:docker):	Stopped
 cqe-dispatcher-stage	(ocf::heartbeat:docker):	Stopped

Failed Actions:
* clvmd:1_monitor_30000 on virt-246 'not running' (7): call=111, status=complete, exitreason='',
    last-rc-change='Tue Sep 11 17:46:15 2018', queued=0ms, exec=386ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

(2): [root@virt-246 ~]# pcs config
Cluster Name: el-cluster
Corosync Nodes:
 virt-245 virt-246
Pacemaker Nodes:
 virt-245 virt-246

Resources:
 Clone: dlm-clone
  Resource: dlm (class=ocf provider=pacemaker type=controld)
   Operations: monitor interval=10 start-delay=0 timeout=20 (dlm-monitor-interval-10)
               start interval=0s timeout=90 (dlm-start-interval-0s)
               stop interval=0s timeout=100 (dlm-stop-interval-0s)
 Clone: clvmd-clone
  Meta Attrs: with_cmirrord=1
  Resource: clvmd (class=ocf provider=heartbeat type=clvm)
   Operations: monitor interval=30 timeout=90 (clvmd-monitor-interval-30)
               start interval=0s timeout=90 (clvmd-start-interval-0s)
               stop interval=0s timeout=90 (clvmd-stop-interval-0s)
 Clone: dockerd-clone
  Resource: dockerd (class=systemd type=docker)
   Operations: monitor interval=60 timeout=100 (dockerd-monitor-interval-60)
               start interval=0s timeout=100 (dockerd-start-interval-0s)
               stop interval=0s timeout=100 (dockerd-stop-interval-0s)
 Group: mysql-g-stage
  Resource: db-stage-vip (class=ocf provider=heartbeat type=IPaddr)
   Attributes: cidr_netmask=22 ip=10.37.165.142
   Operations: monitor interval=10s timeout=20s (db-stage-vip-monitor-interval-10s)
               start interval=0s timeout=20s (db-stage-vip-start-interval-0s)
               stop interval=0s timeout=20s (db-stage-vip-stop-interval-0s)
  Resource: db-stage-lvm (class=ocf provider=heartbeat type=LVM)
   Attributes: volgrpname=storage-db-stage
   Operations: methods interval=0s timeout=5s (db-stage-lvm-methods-interval-0s)
               monitor interval=10s timeout=30s (db-stage-lvm-monitor-interval-10s)
               start interval=0s timeout=30s (db-stage-lvm-start-interval-0s)
               stop interval=0s timeout=30s (db-stage-lvm-stop-interval-0s)
  Resource: db-stage-fs (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/storage-db-stage/db-stage directory=/var/lib/mysql-stage fstype=gfs2
   Operations: monitor interval=20s timeout=40s (db-stage-fs-monitor-interval-20s)
               notify interval=0s timeout=60s (db-stage-fs-notify-interval-0s)
               start interval=0s timeout=60s (db-stage-fs-start-interval-0s)
               stop interval=0s timeout=60s (db-stage-fs-stop-interval-0s)
  Resource: mysql-stage (class=ocf provider=heartbeat type=docker)
   Attributes: image=docker.io/mariadb:10.3 run_opts="--user 5010:5010 --volume /var/lib/mysql-stage:/var/lib/mysql --volume /shared/containers/configs/mysql-stage:/etc/mysql/conf.d --volume /shared/containers/logs/mysql-stage:/var/log/mysql --publish 10.37.165.142:3306:3306"
   Operations: monitor interval=30s timeout=30s (mysql-stage-monitor-interval-30s)
               start interval=0s timeout=90s (mysql-stage-start-interval-0s)
               stop interval=0s timeout=90s (mysql-stage-stop-interval-0s)
 Group: mysql-g-live
  Meta Attrs: target-role=Stopped
  Resource: db-live-vip (class=ocf provider=heartbeat type=IPaddr)
   Attributes: cidr_netmask=22 ip=10.37.165.133
   Operations: monitor interval=10s timeout=20s (db-live-vip-monitor-interval-10s)
               start interval=0s timeout=20s (db-live-vip-start-interval-0s)
               stop interval=0s timeout=20s (db-live-vip-stop-interval-0s)
  Resource: db-live-lvm (class=ocf provider=heartbeat type=LVM)
   Attributes: volgrpname=storage-db-live
   Operations: methods interval=0s timeout=5s (db-live-lvm-methods-interval-0s)
               monitor interval=10s timeout=30s (db-live-lvm-monitor-interval-10s)
               start interval=0s timeout=30s (db-live-lvm-start-interval-0s)
               stop interval=0s timeout=30s (db-live-lvm-stop-interval-0s)
  Resource: db-live-fs (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/storage-db-live/db-live directory=/var/lib/mysql-live fstype=gfs2
   Operations: monitor interval=20s timeout=40s (db-live-fs-monitor-interval-20s)
               notify interval=0s timeout=60s (db-live-fs-notify-interval-0s)
               start interval=0s timeout=60s (db-live-fs-start-interval-0s)
               stop interval=0s timeout=60s (db-live-fs-stop-interval-0s)
  Resource: mysql-live (class=ocf provider=heartbeat type=docker)
   Attributes: image=docker.io/mariadb:10.3 run_opts="--user 5020:5020 --volume /var/lib/mysql-live:/var/lib/mysql --volume /shared/containers/configs/mysql-live:/etc/mysql/conf.d --volume /shared/containers/logs/mysql-live:/var/log/mysql --publish 10.37.165.133:3306:3306"
   Operations: monitor interval=30s timeout=30s (mysql-live-monitor-interval-30s)
               start interval=0s timeout=90s (mysql-live-start-interval-0s)
               stop interval=0s timeout=90s (mysql-live-stop-interval-0s)
 Clone: container-shared-fs-clone
  Meta Attrs: interleave=true target-role=Stopped
  Resource: container-shared-fs (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/mapper/storage--data-container--logs directory=/shared/containers fstype=gfs2
   Operations: monitor interval=20s timeout=40s (container-shared-fs-monitor-interval-20s)
               notify interval=0s timeout=60s (container-shared-fs-notify-interval-0s)
               start interval=0s timeout=60s (container-shared-fs-start-interval-0s)
               stop interval=0s timeout=60s (container-shared-fs-stop-interval-0s)
 Group: cqe-frontend-stage
  Resource: frontend-stage-vip (class=ocf provider=heartbeat type=IPaddr)
   Attributes: cidr_netmask=22 ip=10.37.165.165
   Operations: monitor interval=10s timeout=20s (frontend-stage-vip-monitor-interval-10s)
               start interval=0s timeout=20s (frontend-stage-vip-start-interval-0s)
               stop interval=0s timeout=20s (frontend-stage-vip-stop-interval-0s)
  Resource: frontend-stage (class=ocf provider=heartbeat type=docker)
   Attributes: image=docker-registry.engineering.redhat.com/cqe/frontend:latest-stage monitor_cmd="curl http://localhost:8080/clusterqe/" run_opts="--env-file=/shared/containers/configs/container-variables-stage --volume /shared/containers/logs/frontend-stage:/var/log --volume /shared/containers/configs/cqe:/etc/cluster-django --publish 10.37.165.165:80:8080"
   Operations: monitor interval=30s timeout=30s (frontend-stage-monitor-interval-30s)
               start interval=0s timeout=240s (frontend-stage-start-interval-0s)
               stop interval=0s timeout=90s (frontend-stage-stop-interval-0s)
 Group: cqe-frontend-live
  Resource: frontend-live-vip (class=ocf provider=heartbeat type=IPaddr)
   Attributes: cidr_netmask=22 ip=10.37.165.220
   Operations: monitor interval=10s timeout=20s (frontend-live-vip-monitor-interval-10s)
               start interval=0s timeout=20s (frontend-live-vip-start-interval-0s)
               stop interval=0s timeout=20s (frontend-live-vip-stop-interval-0s)
  Resource: frontend-live (class=ocf provider=heartbeat type=docker)
   Attributes: image=docker-registry.engineering.redhat.com/cqe/frontend:latest-live monitor_cmd="curl http://localhost:8080/clusterqe/" run_opts="--env-file=/shared/containers/configs/container-variables-live --volume /shared/containers/logs/frontend-live:/var/log --volume /shared/containers/configs/cqe:/etc/cluster-django --publish 10.37.165.220:80:8080"
   Operations: monitor interval=30s timeout=30s (frontend-live-monitor-interval-30s)
               start interval=0s timeout=240s (frontend-live-start-interval-0s)
               stop interval=0s timeout=90s (frontend-live-stop-interval-0s)
 Resource: cqe-dispatcher-live (class=ocf provider=heartbeat type=docker)
  Attributes: image=docker-registry.engineering.redhat.com/cqe/dispatcher:latest-live run_opts="--env-file=/shared/containers/configs/container-variables-live --volume /shared/containers/logs/frontend-live:/var/log --volume /shared/containers/configs/cqe:/etc/cluster-django"
  Operations: monitor interval=30s timeout=30s (cqe-dispatcher-live-monitor-interval-30s)
              start interval=0s timeout=90s (cqe-dispatcher-live-start-interval-0s)
              stop interval=0s timeout=90s (cqe-dispatcher-live-stop-interval-0s)
 Resource: cqe-dispatcher-stage (class=ocf provider=heartbeat type=docker)
  Attributes: image=docker-registry.engineering.redhat.com/cqe/dispatcher:latest-stage run_opts="--env-file=/shared/containers/configs/container-variables-stage --volume /shared/containers/logs/frontend-stage:/var/log --volume /shared/containers/configs/cqe:/etc/cluster-django"
  Operations: monitor interval=30s timeout=30s (cqe-dispatcher-stage-monitor-interval-30s)
              start interval=0s timeout=90s (cqe-dispatcher-stage-start-interval-0s)
              stop interval=0s timeout=90s (cqe-dispatcher-stage-stop-interval-0s)

Stonith Devices:
 Resource: fence-virt-245 (class=stonith type=fence_xvm)
  Attributes: delay=5 pcmk_host_check=static-list pcmk_host_list=virt-245 pcmk_host_map=virt-245:virt-245.cluster-qe.lab.eng.brq.redhat.com
  Operations: monitor interval=60s (fence-virt-245-monitor-interval-60s)
 Resource: fence-virt-246 (class=stonith type=fence_xvm)
  Attributes: delay=5 pcmk_host_check=static-list pcmk_host_list=virt-246 pcmk_host_map=virt-246:virt-246.cluster-qe.lab.eng.brq.redhat.com
  Operations: monitor interval=60s (fence-virt-246-monitor-interval-60s)
Fencing Levels:

Location Constraints:
Ordering Constraints:
  start dlm-clone then start clvmd-clone (kind:Mandatory)
  start clvmd-clone then start mysql-g-stage (kind:Mandatory)
  start dockerd-clone then start mysql-g-stage (kind:Mandatory)
  start clvmd-clone then start mysql-g-live (kind:Mandatory)
  start dockerd-clone then start mysql-g-live (kind:Mandatory)
  start clvmd-clone then start container-shared-fs-clone (kind:Mandatory)
  start container-shared-fs-clone then start cqe-frontend-stage (kind:Mandatory)
  start dockerd-clone then start cqe-frontend-stage (kind:Mandatory)
  start mysql-g-stage then start cqe-frontend-stage (kind:Mandatory)
  start container-shared-fs-clone then start cqe-frontend-live (kind:Mandatory)
  start dockerd-clone then start cqe-frontend-live (kind:Mandatory)
  start mysql-g-live then start cqe-frontend-live (kind:Mandatory)
  start cqe-frontend-live then start cqe-dispatcher-live (kind:Mandatory)
  start cqe-frontend-stage then start cqe-dispatcher-stage (kind:Mandatory)
Colocation Constraints:
  clvmd-clone with dlm-clone (score:INFINITY)
  mysql-g-stage with clvmd-clone (score:INFINITY)
  mysql-g-stage with dockerd-clone (score:INFINITY)
  mysql-g-live with clvmd-clone (score:INFINITY)
  mysql-g-live with dockerd-clone (score:INFINITY)
  container-shared-fs-clone with clvmd-clone (score:INFINITY)
  cqe-frontend-stage with container-shared-fs-clone (score:INFINITY)
  cqe-frontend-stage with dockerd-clone (score:INFINITY)
  cqe-frontend-stage with mysql-g-stage (score:100)
  cqe-frontend-live with container-shared-fs-clone (score:INFINITY)
  cqe-frontend-live with dockerd-clone (score:INFINITY)
  cqe-frontend-live with mysql-g-live (score:100)
  cqe-dispatcher-live with container-shared-fs-clone (score:INFINITY)
  cqe-dispatcher-live with dockerd-clone (score:INFINITY)
  cqe-dispatcher-live with mysql-g-live (score:100)
  cqe-dispatcher-stage with container-shared-fs-clone (score:INFINITY)
  cqe-dispatcher-stage with dockerd-clone (score:INFINITY)
  cqe-dispatcher-stage with mysql-g-stage (score:100)
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 resource-stickiness: 200
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: el-cluster
 dc-version: 1.1.19-7.el7-c3c624ea3d
 have-watchdog: false

Quorum:
  Options:

Comment 2 Ken Gaillot 2018-09-12 23:23:28 UTC
This is the expected behavior when the "interleave" clone option is not set:

"When this clone is ordered relative to another clone, if this option is false (the default), the ordering is relative to all instances of the other clone, whereas if this option is true, the ordering is relative only to instances on the same node. Allowed values: false, true"

Let me know if that doesn't solve the issue.

Comment 3 michal novacek 2018-09-14 16:22:04 UTC
Yes, this seems to solve the issue.

In light of the events that led me to file this bug, it seems only logical to want the opposite of the current default behaviour. Is there a reason why the default for clones is interleave=false?

Comment 4 Ken Gaillot 2018-09-14 17:48:31 UTC
(In reply to michal novacek from comment #3)
> Yes, this seems to solve the issue.
> 
> In light of the events that led me to file this bug, it seems only logical
> to want the opposite of the current default behaviour. Is there a reason why
> the default for clones is interleave=false?

interleave=false is considered safer, since Pacemaker can't know whether the applications involved can function properly with interleave=true.

I.e. if false is the default, the worst that happens is unnecessary restarts for applications that would benefit from true; whereas if true is the default, resource failure is guaranteed for applications that don't support it.

At this point, it is also important for backward compatibility.

