Bug 1057258
| Summary: | fence_xvm terminated | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Miroslav Lisik <mlisik> |
| Component: | fence-virt | Assignee: | Ryan McCabe <rmccabe> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | medium | Priority: | high |
| Version: | 7.0 | CC: | cluster-maint, fdinitto, jkortus, mlisik |
| Target Milestone: | beta | Target Release: | 7.0 |
| Hardware: | x86_64 | OS: | Linux |
| Fixed In Version: | fence-virt-0.3.0-16.el7 | Doc Type: | Bug Fix |
| Last Closed: | 2014-06-13 09:45:07 UTC | Type: | Bug |
Created attachment 856062
fence xvm coredump
Coredump might be helpful.
Miroslav: do you have iptables or SELinux in enforcing mode?

I don't use iptables, SELinux is in permissive mode, and there are no AVC messages for fence in audit.log.

If you are using qcow2 images, can you please try to regenerate the VM? I have seen FS corruption a few times that led to some odd VM behavior.

I think this should be solved by applying upstream commit:
commit 04710b40794fb31e9cd70c4a205decf6b40206fd
Author: Ryan McCabe <rmccabe>
Date: Wed Jul 10 17:31:21 2013 -0400
fence-virt: Fail properly if unable to bind the listener socket
Bail out properly in multicast mode if we're unable to bind the TCP
listener socket.
Signed-off-by: Ryan McCabe <rmccabe>
I have this cluster configuration:
==================================
Cluster name: cluster
Last updated: Fri Mar 7 18:12:24 2014
Last change: Fri Mar 7 18:12:08 2014 via cibadmin on virt-066.cluster-qe.lab.eng.brq.redhat.com
Stack: corosync
Current DC: virt-068.cluster-qe.lab.eng.brq.redhat.com (3) - partition with quorum
Version: 1.1.10-25.el7-368c726
3 Nodes configured
3 Resources configured

Online: [ virt-066 virt-067 virt-068 ]

Full list of resources:
 xvm1 (stonith:fence_xvm): Started virt-066
 xvm2 (stonith:fence_xvm): Started virt-067
 xvm3 (stonith:fence_xvm): Started virt-068

PCSD Status:
 virt-066.cluster-qe.lab.eng.brq.redhat.com: Online
 virt-067.cluster-qe.lab.eng.brq.redhat.com: Online
 virt-068.cluster-qe.lab.eng.brq.redhat.com: Online

Daemon Status:
 corosync: active/disabled
 pacemaker: active/disabled
 pcsd: active/enabled

Stonith resources:
==================
 Resource: xvm1 (class=stonith type=fence_xvm)
  Attributes: port=virt-066
  Operations: monitor interval=60s (xvm1-monitor-interval-60s)
 Resource: xvm2 (class=stonith type=fence_xvm)
  Attributes: port=virt-067
  Operations: monitor interval=60s (xvm2-monitor-interval-60s)
 Resource: xvm3 (class=stonith type=fence_xvm)
  Attributes: port=virt-068
  Operations: monitor interval=60s (xvm3-monitor-interval-60s)

I chose one node and stopped the network service on it. fence_xvm from package fence-virt-0.3.0-14.el7.rpm crashed after the network service was stopped.
Log snippet after crash:

virt-076 stonith-ng[13707]: notice: dynamic_list_search_cb: Disabling port list queries for xvm-075 (-103): *** buffer overflow detected ***: fence_xvm terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x7f7095d5faf7]
/lib64/libc.so.6(+0x10bcc0)[0x7f7095d5dcc0]
/lib64/libc.so.6(+0x10da67)[0x7f7095d5fa67]
fence_xvm[0x402023]
fence_xvm[0x4019de]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f7095c73af5]
fence_xvm[0x401b31]
======= Memory map: ========
00400000-0040a000 r-xp 00000000 fd:01 26141278

With fence_xvm from the new fixed package fence-virt-0.3.0-16.el7 I no longer get the result mentioned above: there is no backtrace in the log. I think the issue with fence_xvm is fixed. However, the node with the stopped network service is still not fenced. There is probably a bug in stonith, so I filed a new bug: https://bugzilla.redhat.com/show_bug.cgi?id=1074024

This request was resolved in Red Hat Enterprise Linux 7.0. Contact your manager or support representative in case you have further questions about the request.
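The verification above relies on the crash line no longer appearing in the log. As a minimal sketch, that check could be scripted as below. The function names `has_fence_xvm_crash` and `count_fence_xvm_crashes` are hypothetical helpers, not part of fence-virt; the signature string is copied from the log snippet in this report.

```shell
#!/bin/sh
# Hypothetical helpers (not part of fence-virt): look for the abort line
# quoted in this report inside a log file such as /var/log/messages.

# Signature copied verbatim from the log snippet in this bug.
CRASH_SIGNATURE='*** buffer overflow detected ***: fence_xvm terminated'

has_fence_xvm_crash() {
    # -F: treat the signature as a fixed string, so '*' is not a regex operator
    # -q: print nothing; only the exit status carries the result
    grep -F -q -- "$CRASH_SIGNATURE" "$1"
}

count_fence_xvm_crashes() {
    # -c prints the number of matching lines; '|| true' hides the non-zero
    # exit status that grep returns when the count is 0
    grep -F -c -- "$CRASH_SIGNATURE" "$1" || true
}

# When called with a log path, print a one-line verdict.
if [ "$#" -ge 1 ]; then
    if has_fence_xvm_crash "$1"; then
        echo "crash signature found in $1"
    else
        echo "no crash signature in $1"
    fi
fi
```

Saved under a name of your choice, the script would be run as `sh check_crash.sh /var/log/messages` (the file name is an assumption). A clean result only shows that this particular abort was not logged; it says nothing about whether fencing itself succeeded.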
Created attachment 854513
Extract from /var/log/messages.

Description of problem:
fence_xvm was terminated during the fence process. The following appears in /var/log/messages:
*** buffer overflow detected ***: fence_xvm terminated

Version-Release number of selected component (if applicable):
fence-virt-0.3.0-14.el7.src.rpm

How reproducible:
always

Steps to Reproduce:
1. Create a 3-node cluster:
   pcs cluster setup --start --name newcluster <node1> <node2> <node3>
2. Add stonith devices:
   pcs stonith create xvm1 fence_xvm port=<node1>
   pcs stonith create xvm2 fence_xvm port=<node2>
   pcs stonith create xvm3 fence_xvm port=<node3>
3. On some node (e.g. <node2>), disable the network:
   systemctl stop network.service

Actual results:
"fence_xvm terminated" is logged in /var/log/messages. Sometimes the node was fenced and sometimes it wasn't.

Expected results:
fence_xvm doesn't terminate and the node is fenced.

Additional info:
Cluster Name: newcluster
Corosync Nodes:
 virt-074.cluster-qe.lab.eng.brq.redhat.com virt-075.cluster-qe.lab.eng.brq.redhat.com virt-076.cluster-qe.lab.eng.brq.redhat.com
Pacemaker Nodes:
 virt-074.cluster-qe.lab.eng.brq.redhat.com virt-075.cluster-qe.lab.eng.brq.redhat.com virt-076.cluster-qe.lab.eng.brq.redhat.com

Resources:

Stonith Devices:
 Resource: xvm-074 (class=stonith type=fence_xvm)
  Attributes: debug=10 port=virt-074.cluster-qe.lab.eng.brq.redhat.com
  Operations: monitor interval=60s (xvm-074-monitor-interval-60s)
 Resource: xvm-075 (class=stonith type=fence_xvm)
  Attributes: debug=10 port=virt-075.cluster-qe.lab.eng.brq.redhat.com
  Operations: monitor interval=60s (xvm-075-monitor-interval-60s)
 Resource: xvm-076 (class=stonith type=fence_xvm)
  Attributes: debug=10 port=virt-076.cluster-qe.lab.eng.brq.redhat.com
  Operations: monitor interval=60s (xvm-076-monitor-interval-60s)
Fencing Levels:

Location Constraints:
Ordering Constraints:
Colocation Constraints:

Cluster Properties:
 cluster-infrastructure: corosync
 dc-version: 1.1.10-22.el7-368c726
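The reproduction steps in the description could be collected into a small script, sketched below. The `pcs` and `systemctl` command lines are the ones given in this report; the `run` and `reproduce` functions and the `RUN=1` switch are assumptions added for illustration. By default the script only prints the commands, and the network stop is always printed rather than executed because it has to happen on the chosen node itself.

```shell
#!/bin/sh
# Sketch of the "Steps to Reproduce" from this report.
# Assumptions (not from the report): the run/reproduce helpers and the
# RUN=1 switch; without RUN=1 the commands are printed, not executed.

run() {
    # Show the command, and execute it only when RUN=1 is set.
    printf '%s\n' "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

reproduce() {
    # $1..$3: cluster node names; $4: node whose network will be stopped
    node1=$1 node2=$2 node3=$3 victim=$4

    # Step 1: create and start the 3-node cluster.
    run pcs cluster setup --start --name newcluster "$node1" "$node2" "$node3"

    # Step 2: add one fence_xvm stonith device per node.
    i=1
    for node in "$node1" "$node2" "$node3"; do
        run pcs stonith create "xvm$i" fence_xvm "port=$node"
        i=$((i + 1))
    done

    # Step 3: must be run on the victim node itself, so it is only printed.
    printf '%s\n' "# on $victim: systemctl stop network.service"
}

# Run only when exactly four arguments are supplied.
if [ "$#" -eq 4 ]; then
    reproduce "$@"
fi
```

Printing by default keeps the sketch safe to try on a machine that is not part of a test cluster; the output can be reviewed before anything is executed with `RUN=1`.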