| Summary: | Stop the network service and the node is not fenced | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Michael Yang <michael199089> | ||||
| Component: | cluster | Assignee: | Christine Caulfield <ccaulfie> | ||||
| Status: | CLOSED NOTABUG | QA Contact: | Cluster QE <mspqa-list> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 6.5 | CC: | ccaulfie, cluster-maint, michael199089, rpeterso, teigland | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2013-12-20 10:43:24 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Attachments: |
Description
Michael Yang 2013-12-19 07:59:31 UTC

Sorry, the logs on node1 are here:

2013-12-19T15:34:29.932373+08:00 err daemon node1 qdiskd[11158]: <err> Qdisk heartbeat send message to address 239.192.103.60 failed, errno=22
2013-12-19T15:34:29.935842+08:00 err daemon node1 qdiskd[11158]: <err> Error writing node ID block 1
2013-12-19T15:34:29.935915+08:00 err daemon node1 qdiskd[11158]: <err> Error writing to quorum disk
2013-12-19T15:34:29.935949+08:00 err daemon node1 qdiskd[11158]: <err> Qdisk heartbeat send message to address 239.192.103.60 failed, errno=22
2013-12-19T15:34:29.935968+08:00 err daemon node1 qdiskd[11158]: <err> Error writing node ID block 1
2013-12-19T15:34:29.935986+08:00 err daemon node1 qdiskd[11158]: <err> Error writing to quorum disk

It's impossible to be sure without seeing the whole configuration. But if node2 said that node1 was fenced and it wasn't, I would check that your fencing configuration is correct. One of the things you must always do before deploying a cluster is to check that the fence_node command does actually fence the nodes.

Created attachment 839503 [details]
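For reference, valid fencing on RHEL 6 is configured in /etc/cluster/cluster.conf with a supported agent. A hypothetical sketch only (the agent choice, device name, IP addresses and credentials below are placeholders, not taken from this bug's configuration):

```xml
<!-- Hypothetical fencing configuration sketch for one node, using a
     supported agent (fence_ipmilan). All values are placeholders. -->
<clusternodes>
  <clusternode name="192.168.35.11" nodeid="1">
    <fence>
      <method name="1">
        <device name="ipmi-node1"/>
      </method>
    </fence>
  </clusternode>
</clusternodes>
<fencedevices>
  <fencedevice agent="fence_ipmilan" name="ipmi-node1"
               ipaddr="10.0.0.11" login="admin" passwd="secret" lanplus="1"/>
</fencedevices>
```

The device name under `<device>` must match the `name` attribute of a `<fencedevice>` entry, and the agent must be one that actually powers the node off or reboots it.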
my cluster configuration
Uploaded my configuration.
Logs on node2(192.168.35.12):
2013-12-20T17:54:33.848478+08:00 info daemon h35-12 fenced[2424]: 192.168.35.11 not a cluster member after 3 sec post_fail_delay
2013-12-20T17:54:33.849866+08:00 info daemon h35-12 fenced[2424]: fencing node "192.168.35.11"
2013-12-20T17:54:34.207879+08:00 alert user h35-12 python: Fence_agent:Fence begin.
2013-12-20T17:54:35.227584+08:00 alert user h35-12 python: Fence_agent: Params from cman read target:192.168.35.11
2013-12-20T17:54:35.841740+08:00 info local6 h35-12 clurgmgrd[2484]: <info> Waiting for node #1 to be fenced
2013-12-20T17:54:37.035085+08:00 alert user h35-12 python: Fence_agent:Fence timeout:Fence finished.:['/sbin/fence_agent'] 192.168.35.11
2013-12-20T17:54:37.115505+08:00 info daemon h35-12 fenced[2424]: fence "192.168.35.11" success
2013-12-20T17:54:37.842771+08:00 info local6 h35-12 clurgmgrd[2484]: <info> Node #1 fenced; continuing
On node2:
[root@h35-12 ~]# clustat
Cluster Status for Ask8TBxVaX2qyMl0 @ Fri Dec 20 17:56:47 2013
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
192.168.35.11 1 Offline
192.168.35.12 2 Online, Local, rgmanager
HBDEV 0 Online, Quorum Disk
Service Name Owner (Last) State
------- ---- ----- ------ -----
vm:bfdfLHnJ-hrcCaL-TknD (none) stopped
[root@h35-12 ~]# cman_tool nodes -af
Node Sts Inc Joined Name
0 M 0 2013-12-19 19:25:22 HBDEV
1 X 336 192.168.35.11
Last fenced: 2013-12-20 17:54:37 by two_nodes_device
2 M 308 2013-12-19 19:21:02 192.168.35.12
Addresses: 192.168.35.12
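As an aside, the Sts column in the cman_tool output above is what distinguishes a dead member ("X") from a live one ("M"): node1 is dead but still listed. A small sketch (not part of any Red Hat tool; the column layout is assumed from the output above) of picking out dead nodes from `cman_tool nodes` output:

```python
def dead_nodes(output):
    """Return names of nodes whose status column is 'X' (dead).

    Assumes the `cman_tool nodes` layout shown above: numeric node ID,
    then Sts, with the node name as the last field on the line.
    """
    dead = []
    for line in output.splitlines():
        fields = line.split()
        # Skip the header, 'Last fenced:' and 'Addresses:' lines.
        if len(fields) >= 2 and fields[0].isdigit() and fields[1] == "X":
            dead.append(fields[-1])
    return dead
```

Against the output above this would report only 192.168.35.11, which is the node fenced claims to have fenced.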
The fence_node command is OK, and so is unplugging the cable directly: node1 reboots when fencing happens.
But now, when the network service is shut down on node1, its cluster status always shows quorate. Node1 should be rebooted, or at least the cluster status should become inquorate, I think.
What is 'fence_agent' and where did it come from? It doesn't look like a supported Red Hat fence agent to me.

We don't have a fence device; 'fence_agent' is a script to replace that.

In that case you're on your own, I'm afraid. We can only support clusters with valid fencing. Looking at the log I can see that the 'fence agent' you have is giving some sort of error. But if it doesn't actually do anything to reboot the other system then it's not a fence agent, and you can't expect the other node to know what's been going on. I'm going to close this bug, because your fence agent isn't fencing and that's nothing we can fix.
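For anyone landing here with a home-grown agent: fenced drives fence agents through a simple stdin protocol of key=value lines, and the agent's exit status is the only thing fenced trusts; exit 0 must mean the target node is verifiably down. A minimal sketch of that contract (the power-control callback is a placeholder, not a real API):

```python
#!/usr/bin/env python
# Sketch of the stdin key=value protocol fenced uses to drive fence agents.
# The power_off callback is a hypothetical placeholder: a real agent must
# only exit 0 after the target has verifiably been powered off or rebooted.
import sys


def parse_fence_args(lines):
    """Parse 'key=value' lines as passed by fenced on stdin."""
    args = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        args[key.strip()] = value.strip()
    return args


def run_agent(stdin_lines, power_off):
    """Return the process exit status: 0 only on verified success."""
    args = parse_fence_args(stdin_lines)
    action = args.get("action", "reboot")
    node = args.get("nodename") or args.get("port")
    if node is None:
        return 1  # nothing to fence
    if action in ("reboot", "off"):
        return 0 if power_off(node) else 1
    return 1


if __name__ == "__main__":
    # Placeholder callback that always fails, so this sketch never
    # falsely reports a successful fence.
    sys.exit(run_agent(sys.stdin, lambda node: False))
```

The log above shows the custom script printing "Fence finished" and exiting successfully without rebooting anything, which is exactly the failure mode this contract forbids.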