Bug 1953597
| Summary: | pacemaker_remoted shows "Error in the push function" if more than one resource is assigned to remote guest KVM VM | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Juergen Schleich <jenginfo> |
| Component: | pacemaker | Assignee: | Ken Gaillot <kgaillot> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 7.9 | CC: | admin, cluster-maint, sbradley |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-04-19 16:20:23 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Hi,

Would it be possible for you to open a support case first? I would like to rule out other possible causes before focusing on pacemaker. There are multiple components involved in this issue, and support has better capabilities for narrowing that down. You can initiate a case with Red Hat's Global Support Services group through one of the methods listed at the following link:

https://access.redhat.com/start/how-to-engage-red-hat-support

Hi, thanks for the update.
This issue happened in an early state of a project. I can open the SR in a couple of weeks when the contracts are in place...

(In reply to Juergen Schleich from comment #3)
> Hi, thanks for the update.
> This issue happened in an early state of a project. I can open the SR in a
> couple of weeks when the contracts are in place...

Sounds good

If this is determined to be an issue in Pacemaker, we can reopen
Description of problem:
In a pacemaker cluster with a remote guest node, pacemaker_remoted logs the following error messages:

pacemaker_remoted[992]: error: Connection terminated: Error in the push function.
pacemaker_remoted[992]: error: Connection terminated: The specified session has been invalidated for some reason.
pacemaker_remoted[992]: error: Could not send remote message: Software caused connection abort

in the /var/log/messages file of the pacemaker remote guest node when the VirtualDomain resource is moved to another physical node.

The issue occurs when two resources are assigned to the pacemaker remote guest node and the VirtualDomain resource is live migrated to the other node. The issue is not visible if only one resource is assigned to the pacemaker remote guest node.

Version-Release number of selected component (if applicable):
physical nodes:
# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.9 (Maipo)
# uname -a
Linux pnode03 3.10.0-1160.11.1.el7.x86_64 #1 SMP Tue Dec 15 11:58:45 PST 2020 x86_64 x86_64 x86_64 GNU/Linux
# pacemakerd -$
Pacemaker 1.1.23-1.0.1.el7

remote guest node:
# uname -a
Linux vmguestremote5 3.10.0-1160.el7.x86_64 #1 SMP Thu Oct 1 17:21:35 PDT 2020 x86_64 x86_64 x86_64 GNU/Linux
# pacemakerd -$
Pacemaker 1.1.23-1.0.1.el7

How reproducible:
Always. Simply create a pacemaker remote guest node which is controlled by the VirtualDomain resource agent. Then assign 2 resources to the remote guest node resource. Afterwards, do a move (live migration) of the VirtualDomain resource which owns the 2 resources.

Steps to Reproduce:
1. Create a KVM VM which can be used for live migration in pacemaker
2. Create the VirtualDomain resource for this KVM VM, e.g.:
# pcs resource create vmguestremote5-rs VirtualDomain hypervisor="qemu:///system" config="/etc/pacemaker/vmguestremote5.xml" migration_transport=ssh meta allow-migrate="true" priority="100"
3. Add the VM as remote guest node to pacemaker
[vmguestremote5]# yum install pacemaker-remote resource-agents pcs
[vmguestremote5]# systemctl enable pcsd
[vmguestremote5]# systemctl start pcsd
[vmguestremote5]# systemctl start pacemaker_remote
[vmguestremote5]# systemctl enable pacemaker_remote
[vmguestremote5]# passwd hacluster
[pnode03]# pcs cluster auth vmguestremote5 -u hacluster
[pnode03]# pcs cluster node add-guest vmguestremote5 vmguestremote5-rs
Optionally, test live migration.

Actual results:
4. Now starting with error reproduction:
a) Start nfsserver and create directories in the remote guest node
# systemctl start nfsserver
# mkdir /export/data1
# mkdir /export/data2
b) Create two exportfs resources
# pcs resource create nfsdata1 ocf:heartbeat:exportfs clientspec="*/24" options=rw,sync,no_root_squash directory=/export/data1 fsid=1
# pcs constraint location nfsdata1 prefers vmguestremote5
# pcs resource create nfsdata2 ocf:heartbeat:exportfs clientspec="*/24" options=rw,sync,no_root_squash directory=/export/data2 fsid=2
# pcs constraint location nfsdata2 prefers vmguestremote5
c) Do a live migration of the VirtualDomain resource
# pcs resource move vmguestremote5-rs pnode03

Monitor the live migration with crm_mon and, in the remote guest node, the /var/log/messages file.

In crm_mon you will see:
vmguestremote5-rs (ocf::heartbeat:VirtualDomain): Started pnode04
vmguestremote5-rs (ocf::heartbeat:VirtualDomain): FAILED pnode04 <<<<<<<<<<<
vmguestremote5-rs (ocf::heartbeat:VirtualDomain): Started pnode03

instead of:
vmguestremote5-rs (ocf::heartbeat:VirtualDomain): Started pnode04
vmguestremote5-rs (ocf::heartbeat:VirtualDomain): Migrating pnode04
vmguestremote5-rs (ocf::heartbeat:VirtualDomain): Started pnode03

The /var/log/messages file will show:
Apr 23 12:41:33 vmguestremote5 pacemaker_remoted[1553]: error: Connection terminated: Error in the push function.
Apr 23 12:41:33 vmguestremote5 pacemaker_remoted[1553]: error: Connection terminated: The specified session has been invalidated for some reason.
Apr 23 12:41:33 vmguestremote5 pacemaker_remoted[1553]: error: Could not send remote message: Software caused connection abort
Apr 23 12:41:33 vmguestremote5 pacemaker_remoted[1553]: warning: Could not notify client remote-lrmd-vmguestremote5:3121/52219315-8c3a-4488-bb3e-e7e0db17472a: Software caused connection abort

Expected results:
No error messages when doing a live migration, and no "FAILED" message from the VirtualDomain resource in the crm_mon output when doing a live migration.

Additional info:
A) The same issue occurs when using the ocf:heartbeat:Dummy resource agent.
B) The same issue occurs if you execute the command
[vmguestremote5]# pcs resource move vsmgc5k-rs pnode03
in the remote guest node. In this case only 1 resource needs to be configured to reproduce the error message in /var/log/messages and the FAILED message in crm_mon.
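
For anyone re-running the reproduction, the two symptoms described in this report (the pacemaker_remoted messages in /var/log/messages and the FAILED state in crm_mon) could be checked with a small script instead of by eye. The following is only a rough sketch: the function names count_remoted_errors and resource_failed are hypothetical helpers, not part of pacemaker or pcs, and the match patterns are taken solely from the log excerpts quoted in this report.

```shell
#!/bin/sh
# Sketch only: both helper functions are hypothetical and not shipped with
# pacemaker or pcs. The patterns come from the messages quoted in this report.

# Print the number of pacemaker_remoted error/warning lines in a syslog-style
# file that contain one of the three messages seen during the migration.
# $1: path to the log file (e.g. /var/log/messages on the remote guest node)
count_remoted_errors() {
    grep -E 'pacemaker_remoted\[[0-9]+\]: +(error|warning):' "$1" \
        | grep -c -E 'Error in the push function|session has been invalidated|Software caused connection abort'
}

# Read crm_mon-style text on stdin and succeed (exit status 0) if any line
# whose first field is the given resource name also contains FAILED.
# $1: resource name (e.g. vmguestremote5-rs)
resource_failed() {
    awk -v r="$1" '$1 == r && /FAILED/ { found = 1 } END { exit !found }'
}
```

Possible usage, assuming the resource name from the steps above: run `count_remoted_errors /var/log/messages` on the remote guest node after the move, and `crm_mon -1 | resource_failed vmguestremote5-rs && echo "migration reported FAILED"` on a cluster node. Because `crm_mon -1` prints a single snapshot, the short-lived FAILED state may need to be polled for while the migration is running.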