Description of problem:
There are two nodes in my cluster with a qdisk. I executed "/etc/init.d/network stop" on node1. On node2, clustat shows everything is OK, and the log shows node1 was fenced successfully. On node1, however, I found:

[root@node1 ~]# clustat
Cluster Status for Ask8TBxVaX2qyMl0 @ Thu Dec 19 15:42:40 2013
Member Status: Quorate

 Member Name                ID   Status
 ------ ----                ---- ------
 192.168.35.11              1    Online, rgmanager
 192.168.35.12              2    Online, Local, rgmanager
 HBDEV                      0    Offline, Quorum Disk

 Service Name               Owner (Last)   State
 ------- ----               ----- ------   -----
 vm:aeVYUOcG-XYGKVH-VujB    (none)         disabled

/var/log/messages shows:

Version-Release number of selected component (if applicable):
cman-2.0.115-51

How reproducible:
always

Steps to Reproduce:
1. two-node cluster with qdisk
2. execute "/etc/init.d/network stop" on node1
3. execute "clustat" on node1

Actual results:
node1 is not fenced

Expected results:
node1 should be fenced

Additional info:
Sorry, the logs from node1 are here:

2013-12-19T15:34:29.932373+08:00 err daemon node1 qdiskd[11158]: <err> Qdisk heartbeat send message to address 239.192.103.60 failed, errno=22
2013-12-19T15:34:29.935842+08:00 err daemon node1 qdiskd[11158]: <err> Error writing node ID block 1
2013-12-19T15:34:29.935915+08:00 err daemon node1 qdiskd[11158]: <err> Error writing to quorum disk
2013-12-19T15:34:29.935949+08:00 err daemon node1 qdiskd[11158]: <err> Qdisk heartbeat send message to address 239.192.103.60 failed, errno=22
2013-12-19T15:34:29.935968+08:00 err daemon node1 qdiskd[11158]: <err> Error writing node ID block 1
2013-12-19T15:34:29.935986+08:00 err daemon node1 qdiskd[11158]: <err> Error writing to quorum disk
It's impossible to be sure without seeing the whole configuration. But if node2 said that node1 was fenced and it wasn't, I would check that your fencing configuration is correct. One of the things you must always do before deploying a cluster is to verify that the fence_node command actually fences the nodes.
Created attachment 839503 [details] my cluster configuration
Uploaded my configuration.

Logs on node2 (192.168.35.12):

2013-12-20T17:54:33.848478+08:00 info daemon h35-12 fenced[2424]: 192.168.35.11 not a cluster member after 3 sec post_fail_delay
2013-12-20T17:54:33.849866+08:00 info daemon h35-12 fenced[2424]: fencing node "192.168.35.11"
2013-12-20T17:54:34.207879+08:00 alert user h35-12 python: Fence_agent:Fence begin.
2013-12-20T17:54:35.227584+08:00 alert user h35-12 python: Fence_agent: Params from cman read target:192.168.35.11
2013-12-20T17:54:35.841740+08:00 info local6 h35-12 clurgmgrd[2484]: <info> Waiting for node #1 to be fenced
2013-12-20T17:54:37.035085+08:00 alert user h35-12 python: Fence_agent:Fence timeout:Fence finished.:['/sbin/fence_agent'] 192.168.35.11
2013-12-20T17:54:37.115505+08:00 info daemon h35-12 fenced[2424]: fence "192.168.35.11" success
2013-12-20T17:54:37.842771+08:00 info local6 h35-12 clurgmgrd[2484]: <info> Node #1 fenced; continuing

On node2:

[root@h35-12 ~]# clustat
Cluster Status for Ask8TBxVaX2qyMl0 @ Fri Dec 20 17:56:47 2013
Member Status: Quorate

 Member Name                ID   Status
 ------ ----                ---- ------
 192.168.35.11              1    Offline
 192.168.35.12              2    Online, Local, rgmanager
 HBDEV                      0    Online, Quorum Disk

 Service Name               Owner (Last)   State
 ------- ----               ----- ------   -----
 vm:bfdfLHnJ-hrcCaL-TknD    (none)         stopped

[root@h35-12 ~]# cman_tool nodes -af
Node  Sts   Inc   Joined               Name
   0   M      0   2013-12-19 19:25:22  HBDEV
   1   X    336                        192.168.35.11
       Last fenced:   2013-12-20 17:54:37 by two_nodes_device
   2   M    308   2013-12-19 19:21:02  192.168.35.12
       Addresses: 192.168.35.12

The fence_node command works, and so does unplugging the cable directly: node1 reboots when fencing happens. But when I stop the network service on node1, its cluster status still shows quorate. Node1 should reboot, or at the very least its cluster status should become inquorate, I think.
What is 'fence_agent' and where did it come from? It doesn't look like a supported Red Hat fence agent to me.
We don't have a fence device; 'fence_agent' is a script we wrote to replace one.
In that case you're on your own, I'm afraid. We can only support clusters with valid fencing. Looking at the log I can see that the 'fence agent' you have is reporting some sort of error. If it doesn't actually do anything to reboot the other system, then it's not a fence agent, and you can't expect the other node to know what's been going on. I'm going to close this bug, because your fence agent isn't fencing and that's nothing we can fix.
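For reference when writing a replacement: fenced drives an agent by piping key=value arguments, one per line, to the agent's stdin, and the agent must exit 0 only after the target node has verifiably lost power. A minimal sketch of that stdin contract follows; the key names handled (nodename/port, action/option) are illustrative, and the power-off step itself is site-specific and only a placeholder here:

```shell
#!/bin/sh
# Sketch of the stdin key=value protocol fenced uses to drive fence agents.
# The handled keys and the power step are illustrative assumptions; a real
# agent must actually power-cycle the node via its fence device.
fence_sketch() {
    node=""
    action="reboot"
    while IFS= read -r line; do
        case "$line" in
            nodename=*|port=*)  node="${line#*=}" ;;
            action=*|option=*)  action="${line#*=}" ;;
        esac
    done
    if [ -z "$node" ]; then
        echo "no target node given" >&2
        return 1
    fi
    # Placeholder: a real agent must power-cycle $node here and confirm
    # it is actually down before reporting success.
    echo "would $action node $node via site-specific power device"
    return 0
}

# fenced effectively does: printf 'key=value\n...' | /sbin/fence_agent
printf 'nodename=192.168.35.11\naction=reboot\n' | fence_sketch
```

The exit status is the whole contract: fenced treats agent exit 0 as "the node is down", so an agent that exits 0 without actually powering off the target convinces the rest of the cluster that fencing succeeded while the victim keeps running, which matches the symptoms in this report.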