Bug 1242558
Summary: | journal: cannot remove config /etc/libvirt/qemu/tux1.xml: Operation not permitted |
---|---|
Product: | Red Hat Enterprise Linux 7 |
Component: | resource-agents |
Version: | 7.1 |
Hardware: | All |
OS: | Linux |
Status: | CLOSED ERRATA |
Severity: | urgent |
Priority: | urgent |
Reporter: | Robert Scheck <redhat-bugzilla> |
Assignee: | Oyvind Albrigtsen <oalbrigt> |
QA Contact: | cluster-qe <cluster-qe> |
CC: | agk, cfeist, cluster-maint, djansa, dyuan, jtomko, lmiksik, mnovacek, mzhan, obockows, ppostler, rbalakri, robert.scheck, royoung |
Keywords: | ZStream |
Target Milestone: | rc |
Target Release: | --- |
Fixed In Version: | resource-agents-3.9.5-61.el7 |
Doc Type: | Bug Fix |
Story Points: | --- |
: | 1283877 |
Last Closed: | 2016-11-03 23:57:32 UTC |
Type: | Bug |
Regression: | --- |
Mount Type: | --- |
Documentation: | --- |
Category: | --- |
oVirt Team: | --- |
Cloudforms Team: | --- |
Bug Blocks: | 1283877 |
Description
Robert Scheck
2015-07-13 15:06:00 UTC
Cross-filed case #01476059 on the Red Hat customer portal.

Hello,

The customer logs are attached to case #01476059, but they are around 116 MB, so I can't attach them here. Do you need any specific files?

Release  : Red Hat Enterprise Linux Server release 7.1 (Maipo)
Kernel   : 3.10.0-229.7.2.el7.x86_64
vdsm     : Not Installed
libvirt  : 1.2.8-16.el7_1.3
qemu-img : 1.5.3-86.el7_1.2
qemu-kvm : 1.5.3-86.el7_1.2

The only code paths where libvirt can log that error are the virDomainUndefineFlags API and the migration APIs when used with the VIR_MIGRATE_UNDEFINE_SOURCE flag. So from libvirt's point of view everything is fine; somebody must have asked libvirt to undefine the machine. I do not see any logs related to libvirt in the customer case, but the undefine was probably requested by a different application. Is there anything running that could ask libvirtd to undefine a machine?

Ján, could that be caused by /usr/lib/ocf/resource.d/heartbeat/VirtualDomain somehow? We added the virtual machine to pacemaker like this:

    mkdir -p /var/lib/libvirt/qemu/pacemaker
    chown qemu:qemu /var/lib/libvirt/qemu/pacemaker
    pcs cluster cib vm_cfg
    pcs -f vm_cfg \
      resource create vm ocf:heartbeat:VirtualDomain \
      config=/etc/libvirt/qemu/vm.xml \
      snapshot=/var/lib/libvirt/qemu/pacemaker
    pcs -f vm_cfg \
      resource op remove vm monitor interval=10 timeout=30 # See: RHBZ#1031141
    pcs -f vm_cfg \
      resource op remove vm start interval=0s timeout=90000 # See: RHBZ#1031141
    pcs -f vm_cfg \
      resource op remove vm stop interval=0s timeout=90000 # See: RHBZ#1031141
    pcs -f vm_cfg \
      resource op add vm monitor interval=60s timeout=30s
    pcs -f vm_cfg \
      resource op add vm start interval=0 timeout=120s
    pcs -f vm_cfg \
      resource op add vm stop interval=0 timeout=120s
    pcs -f vm_cfg \
      constraint colocation add vm libvirtd INFINITY
    pcs -f vm_cfg \
      constraint order libvirtd then vm
    pcs cluster cib-push vm_cfg

And VirtualDomain_Start() calls verify_undefined(), which itself runs the "virsh undefine <domain>" command.
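To see from the shell whether libvirt currently holds a persistent definition for a domain (the state that "virsh undefine" removes), a small helper along these lines may be useful. This is only a sketch: the function name domain_is_defined is hypothetical and not part of the VirtualDomain agent, and it assumes virsh is in PATH and uses the default connection.

```shell
# Hypothetical helper, not taken from the resource agent.
# domain_is_defined DOMAIN: succeed if DOMAIN appears in libvirt's list of
# persistent (defined) domains, whether running or shut off.
domain_is_defined() {
    # --name prints one domain name per line; grep -Fx matches the whole
    # line literally, so "tux" does not match "tux1".
    virsh list --all --persistent --name 2>/dev/null | grep -Fxq -- "$1"
}

# Example (run on a cluster node):
#   domain_is_defined vm && echo "vm is defined" || echo "vm is not defined"
```

Running the check before and after "pcs resource enable vm" should show whether starting the resource removed the definition.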
Created attachment 1056605
Diff between VirtualDomain from RHEL 6 and 7
The "virsh undefine" call was newly added in RHEL 7. Since the same setup
works on RHEL 6, this could be the cause... maybe?
The undefine call was added for bug 1016140:
https://github.com/ClusterLabs/resource-agents/commit/f00dcaf19

The following upstream change copies the file back into the original location if it disappeared during the undefine:
https://github.com/ClusterLabs/resource-agents/commit/897c03a3
which effectively makes the undefine last only until the next libvirtd restart.

If the domain needs to be undefined for the VirtualDomain agent, I think the config should be stored somewhere else, not in libvirt's /etc/libvirt/qemu: all the machines there are marked as defined when the libvirt daemon starts up.

I am not sure if 897c03a3 is really a good fix for this issue, given that it is messing around dynamically in /etc/libvirt and /tmp. Note that e.g. in our environment, /etc/libvirt, /var/lib/libvirt and /var/log/libvirt reside on a DRBD device and we currently also seem to lose VMs during pacemaker takeover.

That patch just seems to copy the config back if it disappears during undefine. If the config file disappears during undefine, it should stay deleted - otherwise what's the point of undefining it?

The patch is there to keep the configuration when doing the undefine: virsh removes the config file if it is in the libvirt directories, but not if you put it outside of those directories. It therefore keeps users upgrading from earlier versions from losing their configuration. This is how it is done upstream, and you can read the discussion and reasoning here: https://github.com/ClusterLabs/resource-agents/issues/487.

Step-by-step reproduction:

Keep a backup of vm.xml, so it doesn't get lost in the "Before" test.
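For the reproduction, the backup and the later "is the file still there?" check could be scripted roughly as follows. The function names backup_config and config_survived are made up for illustration, and the backup directory is whatever location outside /etc/libvirt/qemu the tester prefers.

```shell
# Illustrative helpers for the reproduction; names are hypothetical.

# backup_config FILE BACKUP_DIR: copy FILE into BACKUP_DIR, preserving
# mode and timestamps, and print the path of the copy.
backup_config() {
    bc_copy="$2/$(basename "$1").bak"
    cp -p "$1" "$bc_copy" || return 1
    printf '%s\n' "$bc_copy"
}

# config_survived FILE BACKUP: succeed only if FILE still exists and is
# byte-identical to BACKUP.
config_survived() {
    [ -f "$1" ] && cmp -s "$1" "$2"
}

# Example (as root on a cluster node):
#   bak=$(backup_config /etc/libvirt/qemu/vm.xml /root)
#   pcs resource disable vm
#   virsh define /etc/libvirt/qemu/vm.xml
#   pcs resource enable vm
#   config_survived /etc/libvirt/qemu/vm.xml "$bak" || echo "config lost or changed"
```

Comparing against the backup rather than only testing for existence also catches the case where the file is rewritten with different content.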
Before:

    # rpm -q resource-agents
    resource-agents-3.9.5-54.el7_2.6.x86_64
    # pcs resource disable vm
    # virsh define /etc/libvirt/qemu/vm.xml
    # pcs resource enable vm
    # ls /etc/libvirt/qemu/vm.xml
    ls: cannot access /etc/libvirt/qemu/vm.xml: No such file or directory

After:

    # rpm -q resource-agents
    resource-agents-3.9.5-61.el7.x86_64
    # pcs resource disable vm
    # virsh define /etc/libvirt/qemu/vm.xml
    # pcs resource enable vm
    # ls /etc/libvirt/qemu/vm.xml
    /etc/libvirt/qemu/vm.xml

I have verified that the XML file defining the virtual machine does not disappear from /etc/libvirt/qemu/ after the machine is started with resource-agents 3.9.5-79.el7.x86_64.

----

common environment:

    # pcs resource create vm ocf:heartbeat:VirtualDomain config=/etc/libvirt/qemu/rhel-7.xm
    # pcs resource show vm
     Resource: vm (class=ocf provider=heartbeat type=VirtualDomain)
      Attributes: config=/etc/libvirt/qemu/rhel-7.xml
      Utilization: cpu=2 hv_memory=1024
      Operations: start interval=0s timeout=90 (vm-start-interval-0s)
                  stop interval=0s timeout=90 (vm-stop-interval-0s)
                  monitor interval=10 timeout=30 (vm-monitor-interval-10)
    # pcs resource disable vm
    # ls -l /etc/libvirt/qemu/rhel-7.xml
    -rw-------. 1 root root 4004 26. čec 10.59 /etc/libvirt/qemu/rhel-7.xml
    # virsh define /etc/libvirt/qemu/rhel-7.xml

before the fix (resource-agents-3.9.5-54.el7.x86_64)
----------------------------------------------------

    # pcs resource enable vm
    # ls -l /etc/libvirt/qemu/*.xml
    ls: cannot access /etc/libvirt/qemu/rhel-7.xml: No such file or directory

after the fix (resource-agents-3.9.5-79.el7.x86_64)
---------------------------------------------------

    # pcs resource enable vm
    # ls -l /etc/libvirt/qemu/*.xml
    -rw-------. 1 root root 4004 26. čec 10.59 /etc/libvirt/qemu/rhel-7.xml
    # pcs resource | grep vm
     vm (ocf::heartbeat:VirtualDomain): Started kiff-01.cluster-qe.lab.eng.brq.redhat.com

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2174.html
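The copy-back behaviour discussed in the comments (upstream commit 897c03a3: save the config, undefine, restore the file if it disappeared) can be sketched as below. This is an illustration under stated assumptions, not the code shipped in resource-agents: the function name undefine_keep_config, its variable names and the overridable undefine command are all hypothetical, and the real agent may differ in details such as logging and connection options.

```shell
# Hypothetical sketch of "undefine, but keep the config file".
# undefine_keep_config DOMAIN CONFIG [UNDEFINE_CMD...]
# Runs the undefine command for DOMAIN, then restores CONFIG from a
# temporary backup if the undefine removed it.
undefine_keep_config() {
    ukc_domain=$1
    ukc_config=$2
    shift 2
    # Default to the real virsh call; a substitute command may be passed in.
    [ $# -gt 0 ] || set -- virsh undefine

    ukc_backup=$(mktemp "${TMPDIR:-/tmp}/vmcfgsave.XXXXXX") || return 1
    # -p preserves mode, ownership and timestamps where permitted.
    if ! cp -p "$ukc_config" "$ukc_backup"; then
        rm -f "$ukc_backup"
        return 1
    fi

    "$@" "$ukc_domain" >/dev/null 2>&1
    ukc_rc=$?

    # Copy the file back only if it disappeared during the undefine.
    if [ ! -f "$ukc_config" ]; then
        if ! cp -p "$ukc_backup" "$ukc_config"; then
            echo "restore failed, backup kept at $ukc_backup" >&2
            return 1
        fi
    fi
    rm -f "$ukc_backup"
    return "$ukc_rc"
}

# Example: undefine_keep_config vm /etc/libvirt/qemu/vm.xml
```

As noted in the thread, a file restored into /etc/libvirt/qemu is picked up again when libvirtd next starts, so this keeps the configuration at the cost of the undefine not being permanent.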