Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets there.

Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against the components "kernel", "kernel-rt", and "kpatch" are migrated only if still in "NEW" or "ASSIGNED".

If you cannot log in to RH Jira, please consult article #7032570. Failing that, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user-management inquiry. The e-mail creates a ServiceNow ticket with Red Hat.

Migrated Bugzilla bugs will be moved to status "CLOSED", resolution "MIGRATED", and tagged with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue appears under "Links", has a small "two-footprint" icon next to it, and leads to the "RHEL project" in Red Hat Jira (issue links are of the form "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). The same link is also shown in a blue banner at the top of the page informing you that the bug has been migrated.

Bug 1057258

Summary: fence_xvm terminated
Product: Red Hat Enterprise Linux 7
Component: fence-virt
Version: 7.0
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: high
Reporter: Miroslav Lisik <mlisik>
Assignee: Ryan McCabe <rmccabe>
QA Contact: Cluster QE <mspqa-list>
CC: cluster-maint, fdinitto, jkortus, mlisik
Docs Contact:
Target Milestone: beta
Target Release: 7.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: fence-virt-0.3.0-16.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-06-13 09:45:07 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  Extract from /var/log/messages. (flags: none)
  fence xvm coredump (flags: none)

Description Miroslav Lisik 2014-01-23 17:33:57 UTC
Created attachment 854513 [details]
Extract from /var/log/messages.

Description of problem:
fence_xvm was terminated during the fence process. /var/log/messages contains the message:
*** buffer overflow detected ***: fence_xvm terminated


Version-Release number of selected component (if applicable):
fence-virt-0.3.0-14.el7.src.rpm


How reproducible: always


Steps to Reproduce:
1. Create a 3-node cluster:
pcs cluster setup --start --name newcluster <node1> <node2> <node3>
2. Add stonith devices:
pcs stonith create xvm1 fence_xvm port=<node1>
pcs stonith create xvm2 fence_xvm port=<node2>
pcs stonith create xvm3 fence_xvm port=<node3>
3. On one of the nodes (e.g. <node2>), disable the network:
systemctl stop network.service

Actual results:
"fence_xvm terminated" is logged in /var/log/messages.
Sometimes the node was fenced and sometimes it wasn't.


Expected results:
fence_xvm does not terminate and the node is fenced.


Additional info:

Cluster Name: newcluster
Corosync Nodes:
 virt-074.cluster-qe.lab.eng.brq.redhat.com virt-075.cluster-qe.lab.eng.brq.redhat.com virt-076.cluster-qe.lab.eng.brq.redhat.com 
Pacemaker Nodes:
 virt-074.cluster-qe.lab.eng.brq.redhat.com virt-075.cluster-qe.lab.eng.brq.redhat.com virt-076.cluster-qe.lab.eng.brq.redhat.com 

Resources: 

Stonith Devices: 
 Resource: xvm-074 (class=stonith type=fence_xvm)
  Attributes: debug=10 port=virt-074.cluster-qe.lab.eng.brq.redhat.com 
  Operations: monitor interval=60s (xvm-074-monitor-interval-60s)
 Resource: xvm-075 (class=stonith type=fence_xvm)
  Attributes: debug=10 port=virt-075.cluster-qe.lab.eng.brq.redhat.com 
  Operations: monitor interval=60s (xvm-075-monitor-interval-60s)
 Resource: xvm-076 (class=stonith type=fence_xvm)
  Attributes: debug=10 port=virt-076.cluster-qe.lab.eng.brq.redhat.com 
  Operations: monitor interval=60s (xvm-076-monitor-interval-60s)
Fencing Levels: 

Location Constraints:
Ordering Constraints:
Colocation Constraints:

Cluster Properties:
 cluster-infrastructure: corosync
 dc-version: 1.1.10-22.el7-368c726

Comment 2 Miroslav Lisik 2014-01-27 13:23:54 UTC
Created attachment 856062 [details]
fence xvm coredump

Coredump might be helpful.

Comment 6 Fabio Massimo Di Nitto 2014-02-18 14:55:51 UTC
Miroslav: do you have iptables or selinux in enforcing mode?

I don't use iptables, SELinux is in permissive mode, and there are no AVC denials for fence in audit.log.

If you are using qcow2 images, can you please try to regenerate the VMs? I have seen FS corruption a few times that leads to odd VM behavior.

Comment 7 Ryan McCabe 2014-02-18 16:59:11 UTC
I think this should be solved by applying upstream commit:

commit 04710b40794fb31e9cd70c4a205decf6b40206fd
Author: Ryan McCabe <rmccabe>
Date:   Wed Jul 10 17:31:21 2013 -0400

    fence-virt: Fail properly if unable to bind the listener socket
    
    Bail out properly in multicast mode if we're unable to bind the TCP
    listener socket.
    
    Signed-off-by: Ryan McCabe <rmccabe>

Comment 9 Miroslav Lisik 2014-03-07 18:09:31 UTC
I have this cluster configuration:
==================================

Cluster name: cluster
Last updated: Fri Mar  7 18:12:24 2014
Last change: Fri Mar  7 18:12:08 2014 via cibadmin on virt-066.cluster-qe.lab.eng.brq.redhat.com
Stack: corosync
Current DC: virt-068.cluster-qe.lab.eng.brq.redhat.com (3) - partition with quorum
Version: 1.1.10-25.el7-368c726
3 Nodes configured
3 Resources configured


Online: [ virt-066 virt-067 virt-068 ]

Full list of resources:

 xvm1	(stonith:fence_xvm):	Started virt-066 
 xvm2	(stonith:fence_xvm):	Started virt-067 
 xvm3	(stonith:fence_xvm):	Started virt-068 

PCSD Status:
  virt-066.cluster-qe.lab.eng.brq.redhat.com: Online
  virt-067.cluster-qe.lab.eng.brq.redhat.com: Online
  virt-068.cluster-qe.lab.eng.brq.redhat.com: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Stonith resources:
==================
 Resource: xvm1 (class=stonith type=fence_xvm)
  Attributes: port=virt-066 
  Operations: monitor interval=60s (xvm1-monitor-interval-60s)
 Resource: xvm2 (class=stonith type=fence_xvm)
  Attributes: port=virt-067
  Operations: monitor interval=60s (xvm2-monitor-interval-60s)
 Resource: xvm3 (class=stonith type=fence_xvm)
  Attributes: port=virt-068 
  Operations: monitor interval=60s (xvm3-monitor-interval-60s)


I chose one node and stopped the network service on it.
fence_xvm from the package fence-virt-0.3.0-14.el7.rpm crashed after the network service was stopped. Log snippet after the crash:

virt-076 stonith-ng[13707]: notice: dynamic_list_search_cb: Disabling port list queries for xvm-075 (-103): *** buffer overflow detected ***: fence_xvm terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x7f7095d5faf7]
/lib64/libc.so.6(+0x10bcc0)[0x7f7095d5dcc0]
/lib64/libc.so.6(+0x10da67)[0x7f7095d5fa67]
fence_xvm[0x402023]
fence_xvm[0x4019de]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f7095c73af5]
fence_xvm[0x401b31]
======= Memory map: ========
00400000-0040a000 r-xp 00000000 fd:01 26141278

With fence_xvm from the new fixed package fence-virt-0.3.0-16.el7 I don't get this result; there is no backtrace in the log.

I think the issue with fence_xvm is fixed.

However, the node with the stopped network service is still not fenced. There is probably a bug in stonith; I filed a new bug: https://bugzilla.redhat.com/show_bug.cgi?id=1074024

Comment 10 Ludek Smid 2014-06-13 09:45:07 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.