Bug 1319295 - vmtoolsd freezes a filesystem and is not able to thaw it.
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: audit
Version: 6.9
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Steve Grubb
QA Contact: BaseOS QE Security Team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-03-18 18:09 UTC by Cesar Sanchez
Modified: 2016-03-21 12:51 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-03-21 12:51:57 UTC
Target Upstream Version:



Description Cesar Sanchez 2016-03-18 18:09:05 UTC
Description of problem:

In our setup we create an LVM volume on a loopback device which is mounted anywhere.

The loopbackfile lives on the root partition.

When VMware vSphere triggers a Nimble snapshot, we suspect that vmtoolsd freezes the partition containing the loopback file and also freezes our mounted partition, causing guest VM CPU usage to go high. The only way to recover is to reboot the guest OS.

I'm filing this bug against Red Hat first, but I'm pretty sure I'll have to file it against VMware as well.

Version-Release number of selected component (if applicable):

How reproducible:


Steps to Reproduce:
0. mount some data partition in /data
1. losetup -f 
2. losetup /dev/loop1 /data/loopbackfile
3. pvcreate /dev/loop1
4. vgcreate vgroup1 /dev/loop1
5. lvcreate -n lvolume1 vgroup1
6. mkfs.ext4 /dev/vgroup1/lvolume1
7. mount /dev/vgroup1/lvolume1 /mnt/mountpoint1
8. do some writes on /mnt/mountpoint1
9. schedule a Nimble snapshot on the datastore containing your VM, or take a snapshot with VMware, or manually freeze the partition containing the loopbackfile
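
The steps above can be sketched as a script. This is a minimal sketch under stated assumptions, not the reporter's exact procedure: it assumes /data/loopbackfile already exists, it adds an extent count to lvcreate (step 5 gives no size, and lvcreate normally needs one, so `-l 100%FREE` is an assumption), and it uses `fsfreeze -f /data` as the "manual freeze" variant of step 9. The `run` and `reproduce` helpers are hypothetical names. By default the script only prints the commands.

```shell
#!/bin/bash
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 and run as root on a disposable VM to execute them for real.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper that mirrors steps 2-9 of the report.
reproduce() {
    run losetup /dev/loop1 /data/loopbackfile
    run pvcreate /dev/loop1
    run vgcreate vgroup1 /dev/loop1
    # Assumption: the report gives no size, so use all free extents.
    run lvcreate -n lvolume1 -l 100%FREE vgroup1
    run mkfs.ext4 /dev/vgroup1/lvolume1
    run mount /dev/vgroup1/lvolume1 /mnt/mountpoint1
    # "Do some writes": an arbitrary 10 MiB test file.
    run dd if=/dev/zero of=/mnt/mountpoint1/testfile bs=1M count=10
    # Manual variant of step 9: freeze the filesystem holding the backing file.
    run fsfreeze -f /data
}

reproduce
```

With DRY_RUN=0 the last command really freezes /data; `fsfreeze -u /data` is the matching thaw, and it may itself not complete if the system is already wedged.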

Actual results:
1. The audit daemon floods with messages and CPU usage goes high.
2. The guest becomes unusable after a couple of minutes.

Expected results:
The guest OS is able to recover.

Additional info:

Comment 1 Steve Grubb 2016-03-18 18:13:32 UTC
By any chance is selinux enabled? If so, you may need to relabel the system.

Comment 3 Cesar Sanchez 2016-03-18 18:39:50 UTC
(In reply to Steve Grubb from comment #1)
> By any chance is selinux enabled? If so, you may need to relabel the system.

Thanks Steve.

No, we've disabled SELinux as part of our bootstrapping.

Comment 4 Cesar Sanchez 2016-03-18 18:41:15 UTC
Just FYI,

This also happens on a Docker host VM (running RHEL 7) that uses the devicemapper driver with loop-mounted sparse files, which is a more generic scenario.

Thanks,

Comment 5 Steve Grubb 2016-03-18 18:43:37 UTC
Do you have any audit rules loaded? If so, maybe do a key report

aureport --start today --key --summary

If not, then try

aureport --start today --event --summary

Auditd eating CPU means you are getting a lot of events. You need to find the source of the events so that you can adjust rules or something.

Comment 6 Cesar Sanchez 2016-03-18 18:55:17 UTC
I think the audit backlog and CPU issues are an effect of the frozen filesystem, as described in:

https://access.redhat.com/solutions/473223

If that's the case, could you reassign this BZ to the correct product component?

Disabling vmtoolsd solves the issue, but at a higher cost in the long term.

Could this be only a VMware bug then?

Comment 7 Steve Grubb 2016-03-18 18:59:30 UTC
> Could this be only a VMware bug then?

Perhaps. A frozen file system would probably cause one thread of the audit system to stop, which blocks the other thread that reads the backlog. When the backlog overflows, syscalls get put on a wait queue.
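
One way to check whether the backlog is actually filling up is to compare the `backlog` and `backlog_limit` values reported by `auditctl -s`. The sketch below is only illustrative: `audit_field` and `backlog_report` are hypothetical helper names, and the two output layouts it handles (a single `key=value` line, or one `key value` pair per line) are assumptions about what different audit versions print.

```shell
#!/bin/bash
# Hypothetical helper: extract one field from `auditctl -s` output on stdin.
# Handles both "key=value" tokens and "key value" pairs (assumed formats).
audit_field() {
    awk -v key="$1" '{
        for (i = 1; i <= NF; i++) {
            if ($i == key && i < NF) { print $(i + 1); exit }
            if (index($i, key "=") == 1) { print substr($i, length(key) + 2); exit }
        }
    }'
}

# Hypothetical helper: summarize backlog usage from a captured status string.
backlog_report() {
    local status="$1" backlog limit
    backlog=$(printf '%s\n' "$status" | audit_field backlog)
    limit=$(printf '%s\n' "$status" | audit_field backlog_limit)
    echo "backlog ${backlog}/${limit}"
}

# Usage (as root):
#   backlog_report "$(auditctl -s)"
```

A backlog value sitting at or near the limit while the filesystem is frozen would be consistent with the explanation above.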

Comment 8 Paul Moore 2016-03-18 19:25:55 UTC
You mention that there is a flood of audit messages, can you provide an example of the audit messages you are seeing in this flood?

Comment 9 Cesar Sanchez 2016-03-18 19:56:35 UTC
(In reply to Paul Moore from comment #8)
> You mention that there is a flood of audit messages, can you provide an
> example of the audit messages you are seeing in this flood?

Not for now.

I'll update the BZ once I set up a test environment again and reproduce the issue.

Comment 10 Paul Moore 2016-03-18 20:30:47 UTC
See the RH KB article below, it appears to be a case of VMware freezing the filesystem and failing to unfreeze it later.

 * https://access.redhat.com/solutions/1551943
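
As a rough way to confirm a stuck freeze from inside the guest, one can look for tasks in uninterruptible sleep and then try a manual thaw. This is a sketch only and is not taken from the KB article: `dstate_tasks` is a hypothetical helper, and /data is assumed to be the frozen mount point, as in the reproduction steps.

```shell
#!/bin/bash
# Hypothetical helper: read `ps -eo pid,stat,wchan:32,comm` output on stdin
# and print "PID COMMAND WCHAN" for tasks whose state starts with D
# (uninterruptible sleep). Only the first word of the command name is kept.
dstate_tasks() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $4, $3 }'
}

# Usage:
#   ps -eo pid,stat,wchan:32,comm | dstate_tasks
# Manual thaw attempt (as root; assumes /data is the frozen filesystem):
#   fsfreeze -u /data
```

Many writers piling up in D state on the affected mount would fit the frozen-filesystem explanation; the thaw command may hang as well if the system is already too far gone.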

Comment 11 Cesar Sanchez 2016-03-19 00:27:09 UTC
(In reply to Paul Moore from comment #10)
> See the RH KB article below, it appears to be a case of VMware freezing the
> filesystem and failing to unfreeze it later.
> 
>  * https://access.redhat.com/solutions/1551943

Thanks for your help Paul, that explains it in detail.

I'll contact VMware for a solution.

-Cesar

Comment 12 Paul Moore 2016-03-21 12:51:57 UTC
Cesar, I'm going to close this as CANTFIX since this is a VMware issue. If this turns out not to be the case, feel free to reopen this bug.

