Bug 1242558 - journal: cannot remove config /etc/libvirt/qemu/tux1.xml: Operation not permitted
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: resource-agents
Version: 7.1
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Oyvind Albrigtsen
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1283877
 
Reported: 2015-07-13 15:06 UTC by Robert Scheck
Modified: 2019-09-12 08:37 UTC (History)

Fixed In Version: resource-agents-3.9.5-61.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1283877
Environment:
Last Closed: 2016-11-03 23:57:32 UTC
Target Upstream Version:
Embargoed:


Attachments
Diff between VirtualDomain from RHEL 6 and 7 (13.96 KB, patch)
2015-07-27 14:30 UTC, Robert Scheck


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1016140 | 0 | unspecified | CLOSED | VirtualDomain will not start lxc container that is shut off | 2021-02-22 00:41:40 UTC
Red Hat Product Errata | RHBA-2016:2174 | 0 | normal | SHIPPED_LIVE | resource-agents bug fix and enhancement update | 2016-11-03 13:16:36 UTC

Internal Links: 1016140

Description Robert Scheck 2015-07-13 15:06:00 UTC
Description of problem:
Either journal or libvirt (?) tries to delete the XML configuration files
for virtual machines from time to time, but I do not get why this happens.
If we 'chattr +i' the files, we get these errors logged:

Jul  8 09:14:35 intranet1 journal: libvirt version: 1.2.8, package: 16.el7_1.3 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2015-04-02-12:11:36, x86-024.build.eng.bos.redhat.com)
Jul  8 09:14:35 intranet1 journal: cannot remove config /etc/libvirt/qemu/tux1.xml: Operation not permitted
Jul  8 09:14:35 intranet1 journal: cannot remove config /etc/libvirt/qemu/tux2.xml: Operation not permitted
Jul  8 09:14:35 intranet1 journal: cannot remove config /etc/libvirt/qemu/tux3.xml: Operation not permitted

Without 'chattr +i' the file is gone and you (as a sysadmin) are doomed.
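
To see at a glance which configuration files were targeted, the paths can be pulled out of log lines like the ones above. This is only a sketch: the function name extract_failed_configs is made up for illustration, and the log file in the usage comment is an assumption about where syslog output ends up on the host.

```shell
#!/bin/sh
# Hypothetical helper: read syslog-style lines on stdin and print each
# config path that libvirt failed to remove, once per path.
extract_failed_configs() {
    sed -n 's/.*cannot remove config \(.*\): Operation not permitted$/\1/p' | sort -u
}

# Usage (assumes syslog messages are written to /var/log/messages):
# extract_failed_configs < /var/log/messages
```

Lines that do not match the error message are ignored, so the whole log can be fed in unfiltered.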

Version-Release number of selected component (if applicable):
libvirt-1.2.8-16.el7_1.3.x86_64
systemd-208-20.el7_1.5.x86_64

How reproducible:
I don't know how to reproduce this exactly; sometimes shutting down the VM is
enough. Note that /etc/libvirt is symlinked to a DRBD partition, but that
shouldn't be the cause given that the same scenario works fine with RHEL 6.

Actual results:
The XML configuration file for the virtual machine is deleted.

Expected results:
No daemon should simply start deleting configuration files under /etc.

Comment 2 Robert Scheck 2015-07-13 15:42:42 UTC
Cross-filed case #01476059 on the Red Hat customer portal.

Comment 3 Alexandros Gkesos 2015-07-27 11:45:21 UTC
Hello,

The customer logs are attached to case #01476059, but they are around 116 MB (I can't attach them here).

Do you need any specific files?

Release    :  Red Hat Enterprise Linux Server release 7.1 (Maipo)
Kernel     :  3.10.0-229.7.2.el7.x86_64

vdsm	    : Not Installed          	 libvirt     : 1.2.8-16.el7_1.3      
qemu-img   : 1.5.3-86.el7_1.2       	 qemu-kvm    : 1.5.3-86.el7_1.2

Comment 4 Ján Tomko 2015-07-27 14:21:42 UTC
The only code paths where libvirt can log that error are the virDomainUndefineFlags API and the migration APIs with the use of the VIR_MIGRATE_UNDEFINE_SOURCE flag.

So from libvirt's point of view everything is fine, somebody must have asked libvirt to undefine the machine.

I do not see any logs related to libvirt in the customer case, but this is probably done by a different application. Is there anything running that could ask libvirtd to undefine a machine?
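
One way to look for such an application is to search installed scripts for calls to "virsh ... undefine". The following is a rough sketch only: find_undefine_callers is a hypothetical helper name, and the directory in the usage comment is an assumption about where cluster resource agents are installed.

```shell
#!/bin/sh
# Hypothetical helper: list files under a directory that appear to invoke
# "virsh ... undefine". The pattern is deliberately loose so that options
# placed between "virsh" and "undefine" still match.
find_undefine_callers() {
    grep -rlE 'virsh.*undefine' "$1" 2>/dev/null
}

# Usage (assumed location of OCF resource agents):
# find_undefine_callers /usr/lib/ocf/resource.d
```

A match only shows that a script can issue the call, not that it did; the script's own logs would still need to be checked.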

Comment 5 Robert Scheck 2015-07-27 14:27:55 UTC
Ján, could that be caused by /usr/lib/ocf/resource.d/heartbeat/VirtualDomain
somehow? We added the virtual machine to pacemaker like this:

mkdir -p /var/lib/libvirt/qemu/pacemaker
chown qemu:qemu /var/lib/libvirt/qemu/pacemaker
pcs cluster cib vm_cfg
pcs -f vm_cfg \
  resource create vm ocf:heartbeat:VirtualDomain \
    config=/etc/libvirt/qemu/vm.xml \
    snapshot=/var/lib/libvirt/qemu/pacemaker
pcs -f vm_cfg \
  resource op remove vm monitor interval=10 timeout=30   # See: RHBZ#1031141
pcs -f vm_cfg \
  resource op remove vm start interval=0s timeout=90000  # See: RHBZ#1031141
pcs -f vm_cfg \
  resource op remove vm stop interval=0s timeout=90000   # See: RHBZ#1031141
pcs -f vm_cfg \
  resource op add vm monitor interval=60s timeout=30s
pcs -f vm_cfg \
  resource op add vm start interval=0 timeout=120s
pcs -f vm_cfg \
  resource op add vm stop interval=0 timeout=120s
pcs -f vm_cfg \
  constraint colocation add vm libvirtd INFINITY
pcs -f vm_cfg \
  constraint order libvirtd then vm
pcs cluster cib-push vm_cfg

And VirtualDomain_Start() calls verify_undefined(), which itself runs the
"virsh undefine <domain>" command.

Comment 6 Robert Scheck 2015-07-27 14:30:24 UTC
Created attachment 1056605
Diff between VirtualDomain from RHEL 6 and 7

The "virsh undefine" stuff was newly added with RHEL 7 - and as it works
on RHEL 6, this could be the cause...maybe?

Comment 7 Ján Tomko 2015-07-27 15:00:50 UTC
The undefine call was added for bug 1016140:
https://github.com/ClusterLabs/resource-agents/commit/f00dcaf19

The following upstream change copies the file back into the original location if it disappeared during the undefine:
https://github.com/ClusterLabs/resource-agents/commit/897c03a3
which effectively makes the undefine last only until the next libvirtd restart.

If the domain needs to be undefined for the VirtualDomain agent, I think the config should be stored somewhere else, not in libvirt's /etc/libvirt/qemu - all the machines there are marked as defined when the libvirt daemon starts up.

Comment 8 Robert Scheck 2015-07-27 15:17:29 UTC
I am not sure if 897c03a3 is really a good fix for this issue, given that it
is messing around dynamically in /etc/libvirt and /tmp. Note that e.g. in our
environment, /etc/libvirt, /var/lib/libvirt and /var/log/libvirt reside on a
DRBD device and we currently also seem to lose VMs during pacemaker takeover.

Comment 13 Ján Tomko 2015-10-07 14:43:15 UTC
That patch just seems to copy the config back if it disappears during undefine.
If the config file disappears during undefine, it should stay deleted - otherwise what's the point of undefining it?

Comment 14 Oyvind Albrigtsen 2015-10-08 10:48:36 UTC
This is to keep the configuration when doing the undefine: virsh removes the config file if it is located in the libvirt directories, but not if it is stored outside of them. The patch therefore prevents users who upgrade from earlier versions from losing their configuration.

This is how it is done in upstream, and you can read the discussion/reasoning here: https://github.com/ClusterLabs/resource-agents/issues/487.
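
The idea of keeping the file across an undefine can be sketched roughly as below. This is not the upstream agent code: run_preserving_config is a hypothetical helper, the use of mktemp for the temporary copy is an assumption, and the domain name in the usage comment is only an example.

```shell
#!/bin/sh
# Hypothetical helper: run a command that may delete a config file and
# put the file back afterwards if it disappeared.
# Usage: run_preserving_config <config-file> <command> [args...]
run_preserving_config() {
    cfg=$1
    shift
    backup=$(mktemp) || return 1
    # Keep a copy (with mode and timestamps) before running the command.
    cp -p "$cfg" "$backup" || { rm -f "$backup"; return 1; }
    "$@"
    rc=$?
    # Restore only if the command actually removed the file.
    if [ ! -e "$cfg" ]; then
        cp -p "$backup" "$cfg"
    fi
    rm -f "$backup"
    return "$rc"
}

# Usage (assumes a domain named "vm" defined from this file):
# run_preserving_config /etc/libvirt/qemu/vm.xml virsh undefine vm
```

The helper returns the exit status of the wrapped command, so a failing undefine is still visible to the caller.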

Comment 19 Oyvind Albrigtsen 2016-03-01 11:10:42 UTC
Step-by-step reproduction:
Keep a backup of vm.xml, so it doesn't get lost in the Before test.

Before:
# rpm -q resource-agents
resource-agents-3.9.5-54.el7_2.6.x86_64
# pcs resource disable vm
# virsh define /etc/libvirt/qemu/vm.xml
# pcs resource enable vm
# ls /etc/libvirt/qemu/vm.xml
ls: cannot access /etc/libvirt/qemu/vm.xml: No such file or directory

After:
# rpm -q resource-agents
resource-agents-3.9.5-61.el7.x86_64
# pcs resource disable vm
# virsh define /etc/libvirt/qemu/vm.xml
# pcs resource enable vm
# ls /etc/libvirt/qemu/vm.xml 
/etc/libvirt/qemu/vm.xml
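
To check whether a node already carries the fixed build, the installed version-release can be compared against 3.9.5-61.el7 (the "Fixed In Version" of this bug). A rough sketch, assuming GNU sort with the -V option is available; version_ge is a hypothetical helper, and sort -V only approximates RPM's own version comparison rules.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version-release $1 sorts at or after $2
# according to GNU "sort -V" (an approximation of RPM ordering).
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Usage:
# installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}' resource-agents)
# version_ge "$installed" 3.9.5-61.el7 && echo "fix should be present"
```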

Comment 20 michal novacek 2016-07-26 09:37:02 UTC
I have verified that the XML file defining the virtual machine does not
disappear from /etc/libvirt/qemu/ after the machine is started with
resource-agents-3.9.5-79.el7.x86_64.

----

common environment:
# pcs resource create vm ocf:heartbeat:VirtualDomain config=/etc/libvirt/qemu/rhel-7.xm
# pcs resource show vm
 Resource: vm (class=ocf provider=heartbeat type=VirtualDomain)
  Attributes: config=/etc/libvirt/qemu/rhel-7.xml
  Utilization: cpu=2 hv_memory=1024
  Operations: start interval=0s timeout=90 (vm-start-interval-0s)
              stop interval=0s timeout=90 (vm-stop-interval-0s)
              monitor interval=10 timeout=30 (vm-monitor-interval-10)
# pcs resource disable vm
# ls -l /etc/libvirt/qemu/rhel-7.xml
-rw-------. 1 root root 4004 26. čec 10.59 /etc/libvirt/qemu/rhel-7.xml
# virsh define /etc/libvirt/qemu/rhel-7.xml

before the fix (resource-agents-3.9.5-54.el7.x86_64)
----------------------------------------------------
# pcs resource enable vm
# ls -l /etc/libvirt/qemu/*.xml
ls: cannot access /etc/libvirt/qemu/rhel-7.xml: No such file or directory


after the fix (resource-agents-3.9.5-79.el7.x86_64)
---------------------------------------------------
# pcs resource enable vm
# ls -l /etc/libvirt/qemu/*.xml
-rw-------. 1 root root 4004 26. čec 10.59 /etc/libvirt/qemu/rhel-7.xml

# pcs resource  | grep vm
 vm     (ocf::heartbeat:VirtualDomain): Started kiff-01.cluster-qe.lab.eng.brq.redhat.com

Comment 22 errata-xmlrpc 2016-11-03 23:57:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2174.html

