Bug 1274274 - Pacemaker confirms MariaDB stop operation too early, XtraDB tables in datadir on GFS2 volume get corrupted and are recreated at each MariaDB start
Status: CLOSED DUPLICATE of bug 1286316
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pacemaker
Version: 7.1
Hardware: x86_64 Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 7.3
Assigned To: Andrew Beekhof
QA Contact: cluster-qe@redhat.com
Depends On:
Blocks:
Reported: 2015-10-22 08:08 EDT by Steffen Hau
Modified: 2016-01-20 09:52 EST
4 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-01-20 09:52:17 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Steffen Hau 2015-10-22 08:08:44 EDT
Description of problem:

I have an issue where Pacemaker confirms a stop operation even though the resource has not actually finished stopping at that point.

I have created a really simple cluster (only one node for now, to keep it simple; RHEL 7.1, fully updated, with the Resilient Storage Add-On):
pcs property set no-quorum-policy=ignore
pcs stonith create ipmi-admin1 fence_ipmilan pcmk_monitor_timeout=120s pcmk_host_list="admin1" ipaddr="172.29.1.1" action="reboot" login="root" passwd="service" lanplus="1" method="cycle"
pcs stonith create ipmi-admin2 fence_ipmilan pcmk_monitor_timeout=120s pcmk_host_list="admin2" ipaddr="172.29.2.1" action="reboot" login="root" passwd="service" lanplus="1" method="cycle"
pcs resource create adminip ocf:heartbeat:IPaddr2 ip="172.30.0.1" cidr_netmask="16" nic="ens1f1:" op monitor interval=30s
pcs resource create mariadb systemd:mariadb op start timeout="60" stop timeout="60" monitor interval="60s"
pcs constraint order start adminip then mariadb
pcs constraint colocation add mariadb with adminip



Here is what is logged to the systemd journal during "pcs cluster stop --all":
Okt 21 12:56:22 admin2 systemd[1]: Stopping Pacemaker High Availability Cluster Manager...
Okt 21 12:56:22 admin2 pacemakerd[21170]: notice: Invoking handler for signal 15: Terminated
Okt 21 12:56:22 admin2 pacemakerd[21170]: notice: Shuting down Pacemaker
Okt 21 12:56:22 admin2 pacemakerd[21170]: notice: Stopping crmd: Sent -15 to process 21177
Okt 21 12:56:22 admin2 crmd[21177]: notice: Invoking handler for signal 15: Terminated
Okt 21 12:56:22 admin2 crmd[21177]: notice: Requesting shutdown, upper limit is 1200000ms
Okt 21 12:56:22 admin2 crmd[21177]: notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
Okt 21 12:56:22 admin2 pengine[21176]: notice: On loss of CCM Quorum: Ignore
Okt 21 12:56:22 admin2 pengine[21176]: notice: Scheduling Node admin2 for shutdown
Okt 21 12:56:22 admin2 pengine[21176]: notice: Stop    ipmi-admin1   (admin2)
Okt 21 12:56:22 admin2 pengine[21176]: notice: Stop    ipmi-admin2   (admin2)
Okt 21 12:56:22 admin2 pengine[21176]: notice: Stop    adminip (admin2)
Okt 21 12:56:22 admin2 pengine[21176]: notice: Stop    mariadb (admin2)
Okt 21 12:56:22 admin2 pengine[21176]: notice: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-433.bz2
Okt 21 12:56:22 admin2 crmd[21177]: notice: Initiating action 8: stop ipmi-admin1_stop_0 on admin2 (local)
Okt 21 12:56:22 admin2 crmd[21177]: notice: Initiating action 9: stop ipmi-admin2_stop_0 on admin2 (local)
Okt 21 12:56:22 admin2 crmd[21177]: notice: Initiating action 11: stop mariadb_stop_0 on admin2 (local)
Okt 21 12:56:22 admin2 crmd[21177]: notice: Operation ipmi-admin1_stop_0: ok (node=admin2, call=27, rc=0, cib-update=47, confirmed=true)
Okt 21 12:56:22 admin2 crmd[21177]: notice: Operation ipmi-admin2_stop_0: ok (node=admin2, call=29, rc=0, cib-update=48, confirmed=true)
Okt 21 12:56:24 admin2 crmd[21177]: notice: Operation mariadb_stop_0: ok (node=admin2, call=31, rc=0, cib-update=49, confirmed=true)
Okt 21 12:56:24 admin2 crmd[21177]: notice: Initiating action 10: stop adminip_stop_0 on admin2 (local)
Okt 21 12:56:24 admin2 IPaddr2(adminip)[22317]: INFO: IP status = ok, IP_CIP=
Okt 21 12:56:24 admin2 crmd[21177]: notice: Operation adminip_stop_0: ok (node=admin2, call=33, rc=0, cib-update=50, confirmed=true)
Okt 21 12:56:24 admin2 crmd[21177]: notice: Transition 2 (Complete=6, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-433.bz2): Complete
Okt 21 12:56:24 admin2 crmd[21177]: notice: Stopped 0 recurring operations at shutdown... waiting (0 ops remaining)
Okt 21 12:56:24 admin2 crmd[21177]: notice: Disconnected from the LRM
Okt 21 12:56:24 admin2 crmd[21177]: notice: Disconnecting from Corosync
Okt 21 12:56:24 admin2 cib[21172]: warning: new_event_notification (21172-21177-11): Broken pipe (32)
Okt 21 12:56:24 admin2 cib[21172]: warning: A-Sync reply to crmd failed: No message of desired type
Okt 21 12:56:24 admin2 pacemakerd[21170]: notice: Stopping pengine: Sent -15 to process 21176
Okt 21 12:56:24 admin2 pengine[21176]: notice: Invoking handler for signal 15: Terminated
Okt 21 12:56:24 admin2 pacemakerd[21170]: notice: Stopping attrd: Sent -15 to process 21175
Okt 21 12:56:24 admin2 attrd[21175]: notice: Invoking handler for signal 15: Terminated
Okt 21 12:56:24 admin2 attrd[21175]: notice: Cleaning up before exit
Okt 21 12:56:24 admin2 pacemakerd[21170]: notice: Stopping lrmd: Sent -15 to process 21174
Okt 21 12:56:24 admin2 lrmd[21174]: notice: Invoking handler for signal 15: Terminated
Okt 21 12:56:24 admin2 pacemakerd[21170]: notice: Stopping stonith-ng: Sent -15 to process 21173
Okt 21 12:56:24 admin2 stonith-ng[21173]: notice: Invoking handler for signal 15: Terminated
Okt 21 12:56:24 admin2 pacemakerd[21170]: notice: Stopping cib: Sent -15 to process 21172
Okt 21 12:56:24 admin2 cib[21172]: notice: Invoking handler for signal 15: Terminated
Okt 21 12:56:24 admin2 cib[21172]: notice: Disconnecting from Corosync
Okt 21 12:56:24 admin2 pacemakerd[21170]: notice: Shutdown complete
Okt 21 12:56:24 admin2 systemd[1]: Stopped Pacemaker High Availability Cluster Manager.
Okt 21 12:56:24 admin2 systemd[1]: Stopping Cluster Controlled mariadb...
Okt 21 12:56:24 admin2 systemd[1]: Stopping Corosync Cluster Engine...
Okt 21 12:56:24 admin2 corosync[21155]: [MAIN  ] Node was shut down by a signal
Okt 21 12:56:24 admin2 corosync[22345]: Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]
Okt 21 12:56:24 admin2 corosync[21155]: [SERV  ] Unloading all Corosync service engines.
Okt 21 12:56:24 admin2 corosync[21155]: [QB    ] withdrawing server sockets
Okt 21 12:56:24 admin2 corosync[21155]: [SERV  ] Service engine unloaded: corosync vote quorum service v1.0
Okt 21 12:56:24 admin2 corosync[21155]: [QB    ] withdrawing server sockets
Okt 21 12:56:24 admin2 corosync[21155]: [SERV  ] Service engine unloaded: corosync configuration map access
Okt 21 12:56:24 admin2 corosync[21155]: [QB    ] withdrawing server sockets
Okt 21 12:56:24 admin2 corosync[21155]: [SERV  ] Service engine unloaded: corosync configuration service
Okt 21 12:56:24 admin2 corosync[21155]: [QB    ] withdrawing server sockets
Okt 21 12:56:24 admin2 corosync[21155]: [SERV  ] Service engine unloaded: corosync cluster closed process group service v1.01
Okt 21 12:56:24 admin2 corosync[21155]: [QB    ] withdrawing server sockets
Okt 21 12:56:24 admin2 corosync[21155]: [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
Okt 21 12:56:24 admin2 corosync[21155]: [SERV  ] Service engine unloaded: corosync profile loading service
Okt 21 12:56:24 admin2 corosync[21155]: [MAIN  ] Corosync Cluster Engine exiting normally
Okt 21 12:56:25 admin2 corosync[22345]: Waiting for corosync services to unload:.[  OK  ]
Okt 21 12:56:25 admin2 systemd[1]: Stopped Corosync Cluster Engine.
Okt 21 12:56:25 admin2 mysqld_safe[21526]: 151021 12:56:25 mysqld_safe mysqld from pid file /var/databases/mariadb.pid ended
Okt 21 12:56:25 admin2 systemd[1]: Stopped Cluster Controlled mariadb.


As you can see, something is wrong: Pacemaker confirms the mariadb stop operation before systemd has even finished stopping the unit:
Okt 21 12:56:22 admin2 crmd[21177]: notice: Initiating action 11: stop mariadb_stop_0 on admin2 (local)
Okt 21 12:56:24 admin2 crmd[21177]: notice: Operation mariadb_stop_0: ok (node=admin2, call=31, rc=0, cib-update=49, confirmed=true)
Okt 21 12:56:24 admin2 systemd[1]: Stopping Cluster Controlled mariadb...
Okt 21 12:56:25 admin2 systemd[1]: Stopped Cluster Controlled mariadb.
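
A quick way to see this live with standard systemd tooling (shown only as a sketch; the exact output will vary): on admin2, right after crmd logs "Operation mariadb_stop_0: ok", run

systemctl is-active mariadb.service    # may still report "active" or "deactivating" at this point
journalctl -u mariadb.service -n 5     # shows the systemd stop job completing only afterwards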


A manual stop with "pcs resource disable mariadb" doesn't show this behaviour:
Okt 21 13:09:29 admin2 crmd[25033]: notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Okt 21 13:09:29 admin2 pengine[25032]: notice: On loss of CCM Quorum: Ignore
Okt 21 13:09:29 admin2 pengine[25032]: notice: Stop    mariadb (admin2)
Okt 21 13:09:29 admin2 pengine[25032]: notice: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-450.bz2
Okt 21 13:09:29 admin2 crmd[25033]: notice: Initiating action 8: stop mariadb_stop_0 on admin2 (local)
Okt 21 13:09:29 admin2 systemd[1]: Stopping Cluster Controlled mariadb...
Okt 21 13:09:30 admin2 mysqld_safe[25258]: 151021 13:09:30 mysqld_safe mysqld from pid file /var/databases/mariadb.pid ended
Okt 21 13:09:30 admin2 systemd[1]: Stopped Cluster Controlled mariadb.
Okt 21 13:09:31 admin2 crmd[25033]: notice: Operation mariadb_stop_0: ok (node=admin2, call=15, rc=0, cib-update=37, confirmed=true)
Okt 21 13:09:31 admin2 crmd[25033]: notice: Transition 2 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-450.bz2): Complete
Okt 21 13:09:31 admin2 crmd[25033]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]

This time, Pacemaker confirms the stop operation only after mariadb has fully stopped.


Because of this, I'm not able to keep the MariaDB datadir on a GFS2 volume, as the GFS2 volume gets unmounted while MariaDB is still shutting down:

The pcs commands:
pcs cluster setup --start --name bwForMAHD admin2
pcs property set no-quorum-policy=ignore
pcs stonith create ipmi-admin1 fence_ipmilan pcmk_monitor_timeout=120s pcmk_host_list="admin1" ipaddr="172.29.1.1" action="reboot" login="root" passwd="service" lanplus="1" method="cycle"
pcs stonith create ipmi-admin2 fence_ipmilan pcmk_monitor_timeout=120s pcmk_host_list="admin2" ipaddr="172.29.2.1" action="reboot" login="root" passwd="service" lanplus="1" method="cycle"
pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s on-fail=fence clone interleave=true ordered=true
pcs resource create clvmd ocf:heartbeat:clvm op monitor interval=30s on-fail=fence clone interleave=true ordered=true
pcs constraint order start dlm-clone then clvmd-clone
pcs constraint colocation add clvmd-clone with dlm-clone
pcs resource create bwforfs Filesystem device="/dev/bwfor/admin" directory="/bwfor" fstype="gfs2" "options=noatime" op monitor interval=10s on-fail=fence clone interleave=true
pcs constraint order start clvmd-clone then bwforfs-clone
pcs constraint colocation add bwforfs-clone with clvmd-clone
pcs resource create adminip ocf:heartbeat:IPaddr2 ip="172.30.0.1" cidr_netmask="16" nic="ens1f1:" op monitor interval=30s
pcs constraint order start bwforfs-clone then adminip
pcs constraint colocation add adminip with bwforfs-clone
pcs resource create mariadb systemd:mariadb op monitor interval="60s"
pcs constraint order start adminip then mariadb
pcs constraint colocation add mariadb with adminip
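
To double-check the resulting start/stop chain (dlm, then clvmd, then bwforfs, then adminip, then mariadb, stopped in reverse order), the configured constraints can be listed; these are standard pcs commands, included here just for completeness:

pcs constraint order show
pcs constraint colocation show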

Systemd journal:
Okt 21 14:11:31 admin2 systemd[1]: Stopping Pacemaker High Availability Cluster Manager...
Okt 21 14:11:31 admin2 pacemakerd[5059]: notice: Invoking handler for signal 15: Terminated
Okt 21 14:11:31 admin2 pacemakerd[5059]: notice: Shuting down Pacemaker
Okt 21 14:11:31 admin2 pacemakerd[5059]: notice: Stopping crmd: Sent -15 to process 5065
Okt 21 14:11:31 admin2 crmd[5065]: notice: Invoking handler for signal 15: Terminated
Okt 21 14:11:31 admin2 crmd[5065]: notice: Requesting shutdown, upper limit is 1200000ms
Okt 21 14:11:31 admin2 crmd[5065]: notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
Okt 21 14:11:31 admin2 pengine[5064]: notice: On loss of CCM Quorum: Ignore
Okt 21 14:11:31 admin2 pengine[5064]: notice: Cannot pair clvmd:0 with instance of dlm-clone
Okt 21 14:11:31 admin2 pengine[5064]: notice: Cannot pair bwforfs:0 with instance of clvmd-clone
Okt 21 14:11:31 admin2 pengine[5064]: notice: Scheduling Node admin2 for shutdown
Okt 21 14:11:31 admin2 pengine[5064]: notice: Stop    ipmi-admin1  (admin2)
Okt 21 14:11:31 admin2 pengine[5064]: notice: Stop    ipmi-admin2  (admin2)
Okt 21 14:11:31 admin2 pengine[5064]: notice: Stop    dlm:0        (admin2)
Okt 21 14:11:31 admin2 pengine[5064]: notice: Stop    clvmd:0 (admin2)
Okt 21 14:11:31 admin2 pengine[5064]: notice: Stop    bwforfs:0 (admin2)
Okt 21 14:11:31 admin2 pengine[5064]: notice: Stop    adminip (admin2)
Okt 21 14:11:31 admin2 pengine[5064]: notice: Stop    mariadb (admin2)
Okt 21 14:11:31 admin2 pengine[5064]: notice: Calculated Transition 7: /var/lib/pacemaker/pengine/pe-input-476.bz2
Okt 21 14:11:31 admin2 crmd[5065]: notice: Initiating action 11: stop ipmi-admin1_stop_0 on admin2 (local)
Okt 21 14:11:31 admin2 crmd[5065]: notice: Initiating action 12: stop ipmi-admin2_stop_0 on admin2 (local)
Okt 21 14:11:31 admin2 crmd[5065]: notice: Initiating action 29: stop mariadb_stop_0 on admin2 (local)
Okt 21 14:11:31 admin2 crmd[5065]: notice: Operation ipmi-admin1_stop_0: ok (node=admin2, call=56, rc=0, cib-update=63, confirmed=true)
Okt 21 14:11:31 admin2 crmd[5065]: notice: Operation ipmi-admin2_stop_0: ok (node=admin2, call=58, rc=0, cib-update=64, confirmed=true)
Okt 21 14:11:33 admin2 crmd[5065]: notice: Operation mariadb_stop_0: ok (node=admin2, call=60, rc=0, cib-update=65, confirmed=true)
Okt 21 14:11:33 admin2 crmd[5065]: notice: Initiating action 28: stop adminip_stop_0 on admin2 (local)
Okt 21 14:11:33 admin2 IPaddr2(adminip)[9912]: INFO: IP status = ok, IP_CIP=
Okt 21 14:11:33 admin2 crmd[5065]: notice: Operation adminip_stop_0: ok (node=admin2, call=62, rc=0, cib-update=66, confirmed=true)
Okt 21 14:11:33 admin2 crmd[5065]: notice: Initiating action 23: stop bwforfs_stop_0 on admin2 (local)
Okt 21 14:11:33 admin2 Filesystem(bwforfs)[9957]: INFO: Running stop for /dev/bwfor/admin on /bwfor
Okt 21 14:11:33 admin2 Filesystem(bwforfs)[9976]: INFO: Trying to unmount /bwfor
Okt 21 14:11:33 admin2 Filesystem(bwforfs)[9989]: ERROR: Couldn't unmount /bwfor; trying cleanup with TERM
Okt 21 14:11:33 admin2 Filesystem(bwforfs)[10000]: INFO: sending signal TERM to: mysql     9546  8934  1 14:11 ?        Sl     0:00 /usr/libexec/mysqld --basedir=/usr --datadir=/bwfor/mariadb/databases --plugin-dir=/usr/lib64/mysql/plugin --log-error=/bwfor/mariadb/log...
Okt 21 14:11:34 admin2 Filesystem(bwforfs)[10015]: ERROR: Couldn't unmount /bwfor; trying cleanup with TERM
Okt 21 14:11:34 admin2 Filesystem(bwforfs)[10026]: INFO: sending signal TERM to: mysql     9546  8934  1 14:11 ?        D      0:00 [mysqld]
Okt 21 14:11:34 admin2 mysqld_safe[8934]: 151021 14:11:34 mysqld_safe mysqld from pid file /bwfor/mariadb/run/admin2-mariadb.pid ended
Okt 21 14:11:34 admin2 systemd[1]: Stopped Cluster Controlled mariadb.
Okt 21 14:11:35 admin2 ntpd[1348]: Deleting interface #27 ens1f1, 172.30.0.1#123, interface stats: received=123, sent=123, dropped=0, active_time=260 secs
Okt 21 14:11:35 admin2 dlm_controld[5264]: 74204 cpg_dispatch error 9
Okt 21 14:11:35 admin2 Filesystem(bwforfs)[10047]: INFO: unmounted /bwfor successfully
Okt 21 14:11:35 admin2 lrmd[5062]: notice: bwforfs_stop_0:9920:stderr [ umount: /bwfor: target is busy. ]
Okt 21 14:11:35 admin2 lrmd[5062]: notice: bwforfs_stop_0:9920:stderr [         (In some cases useful info about processes that use ]
Okt 21 14:11:35 admin2 lrmd[5062]: notice: bwforfs_stop_0:9920:stderr [          the device is found by lsof(8) or fuser(1)) ]
Okt 21 14:11:35 admin2 lrmd[5062]: notice: bwforfs_stop_0:9920:stderr [ ocf-exit-reason:Couldn't unmount /bwfor; trying cleanup with TERM ]
Okt 21 14:11:35 admin2 lrmd[5062]: notice: bwforfs_stop_0:9920:stderr [ umount: /bwfor: target is busy. ]
Okt 21 14:11:35 admin2 lrmd[5062]: notice: bwforfs_stop_0:9920:stderr [         (In some cases useful info about processes that use ]
Okt 21 14:11:35 admin2 lrmd[5062]: notice: bwforfs_stop_0:9920:stderr [          the device is found by lsof(8) or fuser(1)) ]
Okt 21 14:11:35 admin2 lrmd[5062]: notice: bwforfs_stop_0:9920:stderr [ ocf-exit-reason:Couldn't unmount /bwfor; trying cleanup with TERM ]
Okt 21 14:11:35 admin2 crmd[5065]: notice: Operation bwforfs_stop_0: ok (node=admin2, call=64, rc=0, cib-update=67, confirmed=true)
Okt 21 14:11:35 admin2 crmd[5065]: notice: Initiating action 18: stop clvmd_stop_0 on admin2 (local)
Okt 21 14:11:36 admin2 clvm(clvmd)[10104]: INFO: Deactivating clustered VG(s):
Okt 21 14:11:36 admin2 multipathd[808]: dm-3: remove map (uevent)
Okt 21 14:11:36 admin2 multipathd[808]: dm-3: devmap not registered, can't remove
Okt 21 14:11:36 admin2 multipathd[808]: dm-3: remove map (uevent)
Okt 21 14:11:36 admin2 clvm(clvmd)[10117]: INFO: 0 logical volume(s) in volume group "bwfor" now active
Okt 21 14:11:36 admin2 clvm(clvmd)[10123]: INFO: Signaling clvmd to exit
Okt 21 14:11:36 admin2 clvm(clvmd)[10130]: INFO: Waiting for clvmd to exit
Okt 21 14:11:36 admin2 dlm_controld[5264]: 74204 cpg_dispatch error 9
Okt 21 14:11:36 admin2 clvm(clvmd)[10142]: INFO: clvmd is not running
Okt 21 14:11:36 admin2 clvm(clvmd)[10148]: INFO: clvmd terminated
Okt 21 14:11:36 admin2 clvm(clvmd)[10157]: INFO: cmirrord is not running
Okt 21 14:11:36 admin2 crmd[5065]: notice: Operation clvmd_stop_0: ok (node=admin2, call=66, rc=0, cib-update=68, confirmed=true)
Okt 21 14:11:36 admin2 crmd[5065]: notice: Initiating action 13: stop dlm_stop_0 on admin2 (local)
Okt 21 14:11:36 admin2 kernel: dlm: closing connection to node 1
Okt 21 14:11:37 admin2 crmd[5065]: notice: Operation dlm_stop_0: ok (node=admin2, call=68, rc=0, cib-update=69, confirmed=true)
Okt 21 14:11:37 admin2 crmd[5065]: notice: Transition 7 (Complete=15, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-476.bz2): Complete
Okt 21 14:11:37 admin2 crmd[5065]: notice: Stopped 0 recurring operations at shutdown... waiting (0 ops remaining)
Okt 21 14:11:37 admin2 crmd[5065]: notice: Disconnected from the LRM
Okt 21 14:11:37 admin2 crmd[5065]: notice: Disconnecting from Corosync
Okt 21 14:11:37 admin2 cib[5060]: warning: new_event_notification (5060-5065-11): Broken pipe (32)
Okt 21 14:11:37 admin2 cib[5060]: warning: A-Sync reply to crmd failed: No message of desired type
Okt 21 14:11:37 admin2 pacemakerd[5059]: notice: Stopping pengine: Sent -15 to process 5064
Okt 21 14:11:37 admin2 pengine[5064]: notice: Invoking handler for signal 15: Terminated
Okt 21 14:11:37 admin2 pacemakerd[5059]: notice: Stopping attrd: Sent -15 to process 5063
Okt 21 14:11:37 admin2 attrd[5063]: notice: Invoking handler for signal 15: Terminated
Okt 21 14:11:37 admin2 attrd[5063]: notice: Cleaning up before exit
Okt 21 14:11:37 admin2 pacemakerd[5059]: notice: Stopping lrmd: Sent -15 to process 5062
Okt 21 14:11:37 admin2 lrmd[5062]: notice: Invoking handler for signal 15: Terminated
Okt 21 14:11:37 admin2 pacemakerd[5059]: notice: Stopping stonith-ng: Sent -15 to process 5061
Okt 21 14:11:37 admin2 stonith-ng[5061]: notice: Invoking handler for signal 15: Terminated
Okt 21 14:11:37 admin2 pacemakerd[5059]: notice: Stopping cib: Sent -15 to process 5060
Okt 21 14:11:37 admin2 cib[5060]: notice: Invoking handler for signal 15: Terminated
Okt 21 14:11:37 admin2 cib[5060]: notice: Disconnecting from Corosync
Okt 21 14:11:37 admin2 pacemakerd[5059]: notice: Shutdown complete
Okt 21 14:11:37 admin2 systemd[1]: Stopped Pacemaker High Availability Cluster Manager.
Okt 21 14:11:38 admin2 systemd[1]: Stopping Corosync Cluster Engine...
Okt 21 14:11:38 admin2 corosync[5043]: [MAIN  ] Node was shut down by a signal
Okt 21 14:11:38 admin2 corosync[10190]: Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]
Okt 21 14:11:38 admin2 corosync[5043]: [SERV  ] Unloading all Corosync service engines.
Okt 21 14:11:38 admin2 corosync[5043]: [QB    ] withdrawing server sockets
Okt 21 14:11:38 admin2 corosync[5043]: [SERV  ] Service engine unloaded: corosync vote quorum service v1.0
Okt 21 14:11:38 admin2 corosync[5043]: [QB    ] withdrawing server sockets
Okt 21 14:11:38 admin2 corosync[5043]: [SERV  ] Service engine unloaded: corosync configuration map access
Okt 21 14:11:38 admin2 corosync[5043]: [QB    ] withdrawing server sockets
Okt 21 14:11:38 admin2 corosync[5043]: [SERV  ] Service engine unloaded: corosync configuration service
Okt 21 14:11:38 admin2 corosync[5043]: [QB    ] withdrawing server sockets
Okt 21 14:11:38 admin2 corosync[5043]: [SERV  ] Service engine unloaded: corosync cluster closed process group service v1.01
Okt 21 14:11:38 admin2 corosync[5043]: [QB    ] withdrawing server sockets
Okt 21 14:11:38 admin2 corosync[5043]: [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
Okt 21 14:11:38 admin2 corosync[5043]: [SERV  ] Service engine unloaded: corosync profile loading service
Okt 21 14:11:38 admin2 corosync[5043]: [MAIN  ] Corosync Cluster Engine exiting normally
Okt 21 14:11:39 admin2 corosync[10190]: Waiting for corosync services to unload:.[  OK  ]
Okt 21 14:11:39 admin2 systemd[1]: Stopped Corosync Cluster Engine.


When I start the cluster stack again, MariaDB recreates the files ibdata1 and ib_logfile[0,1] (I guess they get corrupted by the TERM signal; see the mysqld process shown in square brackets above). Afterwards, all my InnoDB-based tables are no longer accessible:
151021 15:02:20 mysqld_safe Starting mysqld daemon with databases from /bwfor/mariadb/databases
151021 15:02:20 [Warning] 'THREAD_CONCURRENCY' is deprecated and will be removed in a future release.
151021 15:02:20 [Note] /usr/libexec/mysqld (mysqld 5.5.44-MariaDB) starting as process 11385 ...
151021 15:02:20 InnoDB: The InnoDB memory heap is disabled
151021 15:02:20 InnoDB: Mutexes and rw_locks use GCC atomic builtins
151021 15:02:20 InnoDB: Compressed tables use zlib 1.2.7
151021 15:02:20 InnoDB: Using Linux native AIO
151021 15:02:20 InnoDB: Initializing buffer pool, size = 2.0G
151021 15:02:20 InnoDB: Completed initialization of buffer pool
InnoDB: The first specified data file ./ibdata1 did not exist:
InnoDB: a new database to be created!
151021 15:02:20  InnoDB: Setting file ./ibdata1 size to 10 MB
InnoDB: Database physically writes the file full: wait...
151021 15:02:20  InnoDB: Log file ./ib_logfile0 did not exist: new to be created
InnoDB: Setting log file ./ib_logfile0 size to 256 MB
InnoDB: Database physically writes the file full: wait...
InnoDB: Progress in MB: 100 200
151021 15:02:21  InnoDB: Log file ./ib_logfile1 did not exist: new to be created
InnoDB: Setting log file ./ib_logfile1 size to 256 MB
InnoDB: Database physically writes the file full: wait...
InnoDB: Progress in MB: 100 200
InnoDB: Doublewrite buffer not found: creating new
InnoDB: Doublewrite buffer created
InnoDB: 127 rollback segment(s) active.
InnoDB: Creating foreign key constraint system tables
InnoDB: Foreign key constraint system tables created
151021 15:02:23  InnoDB: Waiting for the background threads to start
151021 15:02:24 Percona XtraDB (http://www.percona.com) 5.5.43-MariaDB-37.2 started; log sequence number 0
151021 15:02:24 [Note] Plugin 'FEEDBACK' is disabled.
151021 15:02:24 [Note] Server socket created on IP: '0.0.0.0'.
151021 15:02:25 [Note] Event Scheduler: Loaded 0 events
151021 15:02:25 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.5.44-MariaDB'  socket: '/bwfor/mariadb/databases/mysql.sock'  port: 3306  MariaDB Server


The only workaround for this issue is to run "pcs resource disable mariadb" before the cluster stack is restarted (by reboot or with pcs cluster stop/start), but even this fails sometimes. It looks like I have two issues: MariaDB is not stopped properly, and MariaDB has issues with GFS2 as disk storage.
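
For reference, the workaround sequence looks roughly like this (a sketch; the point is to wait until the resource is reported as Stopped before taking the stack down):

pcs resource disable mariadb
pcs status resources           # wait here until mariadb is shown as Stopped
pcs cluster stop --all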



Version-Release number of selected component (if applicable):
lvm2-cluster-2.02.115-3.el7_1.1.x86_64
mariadb-5.5.44-1.el7_1.x86_64
pacemaker-1.1.12-22.el7_1.4.x86_64



How reproducible:
See the pcs commands I've executed above.

Steps to Reproduce:
1. Configure the cluster with the pcs commands listed above (MariaDB datadir on the GFS2 volume).
2. Stop the cluster stack with "pcs cluster stop --all" while MariaDB is running.
3. Start the cluster stack again and check the MariaDB error log.

Actual results:
ibdata1 and ib_logfile[0,1] are recreated at each start

Expected results:
MariaDB should start without recreating these files

Additional info:


Comment 2 Andrew Beekhof 2015-11-25 18:04:58 EST
Use the OCF mariadb agent instead of the systemd unit file.
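For example, using the ocf:heartbeat:mysql resource agent (which also manages MariaDB). This is only a sketch: the binary, datadir, pid and socket paths are the ones visible in the logs above, and the operation timeouts are placeholders to be tuned:

pcs resource delete mariadb
pcs resource create mariadb ocf:heartbeat:mysql \
    binary="/usr/libexec/mysqld" \
    datadir="/bwfor/mariadb/databases" \
    pid="/bwfor/mariadb/run/admin2-mariadb.pid" \
    socket="/bwfor/mariadb/databases/mysql.sock" \
    op start timeout="120s" op stop timeout="120s" op monitor interval="60s"
pcs constraint order start adminip then mariadb
pcs constraint colocation add mariadb with adminip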
Comment 3 Ken Gaillot 2016-01-20 09:52:17 EST
This issue has been resolved and will be fixed in both 7.3 and an upcoming 7.2 z-stream.

*** This bug has been marked as a duplicate of bug 1286316 ***
