Description of problem:
In our setup we create an LVM volume on a loopback device and mount it at an arbitrary mount point. The loopback file lives on the root partition. When VMware vSphere triggers a Nimble snapshot, we suspect that vmtoolsd freezes both the partition containing the loopback file and our mounted partition, which drives the guest VM's CPU usage up. The only way to recover is to reboot the guest OS. I'm filing this bug against Red Hat first, but I'm fairly sure I'll also have to file it against VMware.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
0. mount some data partition in /data
1. losetup -f
2. losetup /dev/loop1 /data/loopbackfile
3. pvcreate /dev/loop1
4. vgcreate vgroup1 /dev/loop1
5. lvcreate -n lvolume1 vgroup1
6. mkfs.ext4 /dev/vgroup1/lvolume1
7. mount /dev/vgroup1/lvolume1 /mnt/mountpoint1
8. do some writes on /mnt/mountpoint1
9. schedule a Nimble snapshot on the datastore containing your VM, or take a snapshot with VMware, or manually freeze the partition containing the loopback file

Actual results:
1. The audit daemon floods with messages and CPU usage goes high.
2. The guest becomes unusable after a couple of minutes.

Expected results:
The guest OS is able to recover.

Additional info:
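For anyone trying to reproduce this without a snapshot scheduler, the steps above can be sketched as a small script. This is only a sketch under assumptions that are not in the report: the backing file is created as a 1 GiB sparse file, `lvcreate` is given `-l 100%FREE` (the step as written has no size), and the manual freeze is done with `fsfreeze` on /data. The `step` and `reproduce` helper names are made up for illustration. By default the commands are only printed; nothing runs unless RUN=1 is set.

```shell
#!/bin/sh
# Dry-run by default: each command is printed with a "+ " prefix.
# Set RUN=1 (as root, on a disposable VM) to actually execute the steps.

step() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

reproduce() {
    backing=/data/loopbackfile
    loopdev=/dev/loop1

    step truncate -s 1G "$backing"                 # assumption: 1 GiB sparse file
    step losetup "$loopdev" "$backing"
    step pvcreate "$loopdev"
    step vgcreate vgroup1 "$loopdev"
    step lvcreate -n lvolume1 -l 100%FREE vgroup1  # assumption: use all free extents
    step mkfs.ext4 /dev/vgroup1/lvolume1
    step mkdir -p /mnt/mountpoint1
    step mount /dev/vgroup1/lvolume1 /mnt/mountpoint1
    step dd if=/dev/zero of=/mnt/mountpoint1/testfile bs=1M count=16
    # Manual stand-in for the snapshot: freeze the filesystem that holds
    # the backing file. Thaw it again with: fsfreeze --unfreeze /data
    step fsfreeze --freeze /data
}
```

Calling `reproduce` prints the command sequence; `RUN=1 reproduce` executes it and is expected to leave /data frozen until it is thawed from another shell.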
By any chance is selinux enabled? If so, you may need to relabel the system.
(In reply to Steve Grubb from comment #1)
> By any chance is selinux enabled? If so, you may need to relabel the system.

Thanks Steve. No, we've disabled SELinux as part of our bootstrapping.
Just FYI, this also happens on a Docker host VM (RHEL 7) that uses the devicemapper driver with loop-mounted sparse files, which is a more generic scenario. Thanks,
Do you have any audit rules loaded? If so, maybe do a key report:

    aureport --start today --key --summary

If not, then try:

    aureport --start today --event --summary

Auditd eating CPU means you are getting a lot of events. You need to find the source of the events so that you can adjust rules or something.
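The choice between the two reports can be sketched as a tiny helper. It is only an illustration: `pick_report` is a made-up name, and it assumes that `auditctl -l` prints "No rules" (or nothing) when no rules are loaded.

```shell
# Reads the output of `auditctl -l` on stdin and prints the aureport
# command suggested above for that situation.
pick_report() {
    rules=$(cat)
    if [ -z "$rules" ] || [ "$rules" = "No rules" ]; then
        # No rules loaded: summarize by event.
        echo "aureport --start today --event --summary"
    else
        # Rules loaded: summarize by rule key.
        echo "aureport --start today --key --summary"
    fi
}
```

Typical use would be `auditctl -l | pick_report`, run as root.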
I think the audit backlog and CPU issues are an effect of the frozen filesystem, as described in: https://access.redhat.com/solutions/473223 Could you re-assign this BZ to the correct product component if that's the case? Disabling vmtoolsd solves the issue, but at a higher cost in the long term. Can this be only a VMware bug then?
> Can this be only a VMware bug then?

Perhaps. A frozen file system would probably cause one thread of the audit system to stop, which blocks the other thread that reads the backlog. When the backlog overflows, syscalls get put on a wait queue.
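One way to check whether the backlog has actually filled up is to look at the counters from `auditctl -s`. The helper below is a rough sketch: `backlog_state` is a made-up name, and it assumes the status output carries `backlog`, `backlog_limit` and `lost` either as `key value` lines or as `key=value` pairs, which may differ between audit versions.

```shell
# Reads `auditctl -s` output on stdin and prints one summary line.
# state=full means the backlog has reached its configured limit.
backlog_state() {
    awk '
        {
            gsub(/=/, " ")              # normalize key=value to "key value"
            for (i = 1; i < NF; i++) {
                if ($i == "backlog")       cur  = $(i + 1)
                if ($i == "backlog_limit") lim  = $(i + 1)
                if ($i == "lost")          lost = $(i + 1)
            }
        }
        END {
            state = (lim + 0 > 0 && cur + 0 >= lim + 0) ? "full" : "ok"
            printf "backlog=%d limit=%d lost=%d state=%s\n", cur, lim, lost, state
        }'
}
```

Typical use would be `auditctl -s | backlog_state`, run as root. A `lost` count that keeps growing would also suggest that events are being dropped.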
You mention that there is a flood of audit messages, can you provide an example of the audit messages you are seeing in this flood?
(In reply to Paul Moore from comment #8)
> You mention that there is a flood of audit messages, can you provide an
> example of the audit messages you are seeing in this flood?

Not for now. I'll update the BZ once I set up a test environment again and reproduce the issue.
See the RH KB article below, it appears to be a case of VMware freezing the filesystem and failing to unfreeze it later. * https://access.redhat.com/solutions/1551943
(In reply to Paul Moore from comment #10)
> See the RH KB article below, it appears to be a case of VMware freezing the
> filesystem and failing to unfreeze it later.
>
> * https://access.redhat.com/solutions/1551943

Thanks for your help Paul, that explains it in detail. I'll contact VMware for a solution.

-Cesar
Cesar, I'm going to close this as CANTFIX since this is a VMware issue. If this turns out not to be the case, feel free to reopen this case.