Bug 1241511

Summary: dlm_controld waits for fencing which will never occur causing hang
Product: Red Hat Enterprise Linux 7
Reporter: michal novacek <mnovacek>
Component: dlm
Assignee: David Teigland <teigland>
Status: CLOSED NOTABUG
QA Contact: cluster-qe <cluster-qe>
Severity: unspecified
Priority: unspecified
Version: 7.1
CC: cluster-maint, dvossel, jbrassow, jkortus, mnovacek, zren
Target Milestone: rc
Keywords: TestBlocker
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-08-14 15:30:22 UTC

Attachments: pcs cluster report output

Description michal novacek 2015-07-09 11:26:12 UTC
Created attachment 1050232 [details]
pcs cluster report output

Description of problem:
Have a quorate pacemaker cluster running clvmd and dlm clones. Disable network
communication between all cluster nodes at the same time, so that every node
turns inquorate, and then re-enable it. Most of the time (but not always) this
brings the cluster back to quorate without any fencing. In that case
dlm_controld still expects fencing to happen and hangs until it does, but the
fencing will never occur because the cluster is quorate with all nodes present.

Version-Release number of selected component (if applicable):
dlm-4.0.2-5.el7.x86_64
lvm2-cluster-2.02.115-3.el7.x86_64
pacemaker-1.1.12-22.el7.x86_64
corosync-2.3.4-4.el7.x86_64

How reproducible: very frequent

Steps to Reproduce:
1. have a quorate pacemaker cluster
2. check the nodes' uptime
3. disable network communication between all nodes with iptables and wait for
   all nodes to turn inquorate (see the sketch below)
4. re-enable network communication between the nodes at the same time
5. check whether fencing occurred; if it has not, check dlm status and logs
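
A minimal sketch of steps 3 and 4, to be run on each node at roughly the same
time; "nodeB,nodeC" are placeholder hostnames for the other cluster nodes, and
the fixed sleep is a simplification (the reproducer actually used is in
comments 10 and 11 below):

---
#!/bin/sh
# Sketch only: block the other cluster nodes, wait for quorum to be lost
# everywhere, then restore communication on all nodes at once.
OTHERS="nodeB,nodeC"   # placeholder: the other cluster nodes

# step 3: drop all traffic coming from the other cluster nodes
iptables -A INPUT -s "$OTHERS" -j DROP

# wait until every node has turned inquorate
# (check with: corosync-quorumtool | grep Quorate)
sleep 60

# step 4: remove the rule again so communication comes back everywhere at once
iptables -D INPUT -s "$OTHERS" -j DROP
---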

Actual results: dlm hanging

Expected results: dlm happily working

Additional info:
# tail /var/log/messages
...
Jul  9 12:18:33 virt-020 pengine[2287]: warning: custom_action: Action dlm:2_stop_0 on virt-019 is unrunnable (offline)
Jul  9 12:18:33 virt-020 pengine[2287]: warning: custom_action: Action dlm:2_stop_0 on virt-019 is unrunnable (offline)
Jul  9 12:18:33 virt-020 pengine[2287]: notice: LogActions: Stop    dlm:1       (virt-018 - blocked)
Jul  9 12:18:33 virt-020 pengine[2287]: notice: LogActions: Stop    dlm:2       (virt-019 - blocked)
Jul  9 12:18:41 virt-020 dlm_controld[2438]: 151 daemon joined 2 needs fencing
Jul  9 12:18:41 virt-020 dlm_controld[2438]: 151 daemon joined 1 needs fencing
Jul  9 12:18:41 virt-020 dlm_controld[2438]: 151 daemon node 1 stateful merge
Jul  9 12:18:41 virt-020 dlm_controld[2438]: 151 daemon node 1 stateful merge
Jul  9 12:18:41 virt-020 dlm_controld[2438]: 151 daemon node 2 stateful merge
Jul  9 12:18:41 virt-020 dlm_controld[2438]: 151 daemon node 2 stateful merge
Jul  9 12:19:12 virt-020 dlm_controld[2438]: 183 fence work wait to clear merge 2 clean 1 part 0 gone 0
Jul  9 12:19:39 virt-020 dlm_controld[2438]: 210 clvmd wait for fencing
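
To confirm the hang from step 5 on an affected node, the dlm state can also be
inspected directly; a quick sketch (the exact output wording can differ between
dlm versions):

# dlm_tool ls
# dlm_tool status
# dlm_tool dump | tail

'dlm_tool ls' lists the lockspaces (clvmd here) and shows when one is stuck
waiting on fencing, 'dlm_tool status' shows the daemon's view of the member
nodes and their fencing state, and 'dlm_tool dump' prints the same daemon
messages that end up in /var/log/messages above.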

Comment 4 David Teigland 2015-07-21 15:37:27 UTC
"Fencing will never occur because the cluster is quorate with all nodes"

If stateful cluster nodes fail, they need to be fenced.

If dlm is in charge of fencing, then stateful cluster merges are a situation where you might need to manually intervene (e.g. if no partition maintained quorum).  When pacemaker does fencing, I don't know what's supposed to happen.

Comment 5 David Teigland 2015-07-28 14:26:21 UTC
If you reproduce this with dlm by itself (get rid of pacemaker) then I could explain the behavior.  Please either reproduce that way, or reassign to pacemaker.

Comment 7 Andrew Beekhof 2015-08-13 22:01:27 UTC
The cluster doesn't require fencing in this situation.
If the dlm requires it, then it is up to the dlm to initiate it.

Comment 8 David Teigland 2015-08-14 15:30:22 UTC
The expected dlm behavior here remains the same as it's been in the past (since partition/merge handling was added), and there does not appear to be anything to fix.

In the case of a cluster partition that merges, if one partition maintained quorum, then it will kill merged nodes.  Otherwise, as in this case, user intervention is required to select and kill merged nodes.
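
As an illustration of that manual intervention (an assumed workflow, not
something stated in this report): pick the merged node(s) to sacrifice, reset
them out of band, and then tell dlm_controld on a surviving node that the
fencing has been done, for example:

# dlm_tool fence_ack <nodeid>

where <nodeid> is the corosync node id of the node that was reset.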

Comment 9 Eric Ren 2016-04-26 02:48:43 UTC
Hello Michal,

> How reproducible: very frequent
> 
> Steps to Reproduce:
> 1. have a quorate pacemaker cluster
> 2. check the nodes' uptime
> 3. disable network communication between all nodes with iptables and wait for
>    all nodes to turn inquorate
> 4. re-enable network communication between the nodes at the same time
> 5. check whether fencing occurred; if it has not, check dlm status and logs

With a 3-node cluster, unfortunately I cannot reproduce it (fencing quickly happens) when applying iptables manually :-/ Looking at the cluster report you attached, I think you may be using some automatic method to create a really transient disconnection all of a sudden. If so, could you please share your method/scripts to help reproduce?

The reason I'm here is that this patch (https://github.com/ClusterLabs/pacemaker/pull/839) has a problem which causes both nodes of a 2-node cluster to be fenced unnecessarily in the following case:

1. Bring both nodes up in the cluster with all resources started.
2. Fence one node by issuing "pkill -9 corosync".
3. Watch the logs: the surviving node fences the other node and then ends up self-fencing.

This decreases availability in the 2-node scenario. IMHO, the patch shouldn't let the "controld" RA rely on "dlm_tool ls" to detect "wait fencing", because that message only means that some node in the cluster needs fencing. The command gives the RA the same answer on every node, so every node will die. IOW, we need dlm to tell the RA whether this particular node needs fencing; then that patch should work better.
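
For context, the check being discussed amounts to something like this (a sketch
of the idea, not the actual code from that pull request):

# succeeds on every node as soon as any node needs fencing, so it cannot
# distinguish "this node must be fenced" from "some other node must be fenced"
dlm_tool ls | grep -q "wait fencing" && echo "fencing pending somewhere in the cluster"

which is why both nodes of a 2-node cluster end up killing themselves.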

Thanks for your time;-)

Comment 10 michal novacek 2016-05-11 08:48:51 UTC
What I do is create /root/iptables.sh on each of the cluster nodes and then run, from a node outside of the cluster:

for i in 1 2 3; do ssh node$i /root/iptables.sh & done; wait

This way I was able to manifest the described problem in fewer than ten attempts on a three-node cluster.

The important thing is the '&' instead of ';' in the for loop, which runs the commands in parallel.

Hope this helps.

Comment 11 Eric Ren 2016-05-11 08:59:44 UTC
(In reply to michal novacek from comment #10)
> What I do is create /root/iptables.sh on each of the cluster nodes and then
> run, from a node outside of the cluster:
> 
> for i in 1 2 3; do ssh node$i /root/iptables.sh & done; wait
> 
> This way I was able to manifest the described problem in fewer than ten
> attempts on a three-node cluster.
> 
> The important thing is the '&' instead of ';' in the for loop, which runs
> the commands in parallel.
> 
> Hope this helps.

Hi Michal,

Thanks a lot for your info! I've reproduced this problem now. In case you're interested:
1. set up ntp (optional);
2. put this script on every node:
---
#!/bin/sh

PATH=$PATH:/usr/sbin/
has_quorum=
hosts="ocfs2test2,ocfs2test3"   # the other 2 nodes

# cut off the other cluster nodes
iptables -A INPUT -s "$hosts" -j DROP
echo "iptables: add rules" > /tmp/cron.log

# poll as fast as possible so the rules are removed as soon as quorum is
# lost, keeping the disconnection as short (transient) as possible
while true; do
	has_quorum=`corosync-quorumtool | awk '{if($1=="Quorate:") print $2;}'`
	if [ "$has_quorum" = "No" ]; then
		echo "Quorum lost now" >> /tmp/cron.log
		break
	fi
done

# restore communication
iptables -D INPUT -s "$hosts" -j DROP
echo "iptables: remove rules" >> /tmp/cron.log
---
3. trigger the script to run concurrently on all nodes via crontab (example below).
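
For example (the script path and time are placeholders), with the clocks kept
in sync by ntp from step 1, the same crontab entry on every node fires the
script at the same moment:

# run the partition script on all nodes at the same minute
30 14 * * * /root/iptables.sh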

Thanks again.