Red Hat Bugzilla – Bug 1464083
libvirtd doesn't give a decent error if inotify limits are too low
Last modified: 2017-08-15 03:50:35 EDT
Description of problem:
Basically I hit the bug exactly as diagnosed and worked around here:
$ virsh list --all
error: failed to connect to the hypervisor
error: Cannot recv data: Connection reset by peer
$ cat /proc/sys/fs/inotify/max_user_watches
$ cat /proc/sys/fs/inotify/max_user_instances
$ sudo sysctl -n -w fs.inotify.max_user_watches=16384
$ sudo sysctl -n -w fs.inotify.max_user_instances=256
$ virsh list --all
 Id    Name            State
----------------------------------
 -     tmp-bz1431579   shut off
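For what it's worth, sysctl values set with -w as above are lost on reboot; a sketch of making the workaround persistent via a sysctl.d drop-in (the file name is hypothetical, and the values are just the ones used above):

```
# /etc/sysctl.d/90-inotify.conf  (hypothetical drop-in name)
fs.inotify.max_user_watches = 16384
fs.inotify.max_user_instances = 256
```

Running 'sudo sysctl --system' applies the drop-in without a reboot.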
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Unclear how to exactly reproduce it.
Created attachment 1290663 [details]
libvirtd log when it fails
Actually the problem is a bit stranger than I thought. It appears
that something leaks the inotify watches, so that even increasing
the limits does not help - eventually it will run out again.
This happens after using 'make check-release' which is a very
long test of libguestfs which runs many hundreds, possibly thousands
of VM instances using libvirt.
It turns out (thanks lsof) this is actually caused by leaking
gpg-agent instances. I'll file another bug about that.
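To find which processes are holding the inotify instances, a sketch that walks /proc directly instead of lsof (Linux-only; without root it will only see your own processes):

```shell
# Count open inotify instances per process.  An inotify file
# descriptor's /proc symlink target reads "anon_inode:inotify".
for fd in /proc/[0-9]*/fd/*; do
    if [ "$(readlink "$fd" 2>/dev/null)" = "anon_inode:inotify" ]; then
        pid=${fd#/proc/}; pid=${pid%%/*}
        printf '%s %s\n' "$pid" "$(cat "/proc/$pid/comm" 2>/dev/null)"
    fi
done | sort | uniq -c | sort -rn
```

A pile of identical entries (e.g. many gpg-agent lines) points at the leaker; compare the total against fs.inotify.max_user_instances.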
However libvirt could still give a decent error message.
The only bits of libvirt using inotify are the UML and Xen drivers, and each registers only a single watch, at initial startup. So if there is a failure, it should only hit at libvirtd startup time. I guess if you are using libvirt session mode, though, and have enough time for libvirtd to shut down, you'd be starting it multiple times, and so might not see the failure immediately in your test suite.
The problem we're seeing with error reporting here is related to the auto-spawn of libvirtd. We're successfully spawning libvirtd, and at least starting to connect to it, because the listener socket is ready; then UML fails to set up inotify, causing it to shut down again, at which point virsh gets the error. We've no way to get the errors reported by libvirtd back to virsh, hence the somewhat unhelpful error message we see.
(In reply to Daniel Berrange from comment #4)
> The only bits of libvirt using inotify are the UML and Xen drivers, and
> each registers only a single watch, at initial startup. So if there is a
> failure, it should only hit at libvirtd startup time. I guess if you are
> using libvirt session mode, though, and have enough time for libvirtd to
> shut down, you'd be starting it multiple times, and so might not see the
> failure immediately in your test suite.
Just to clarify: libvirtd (session instance) cannot be started
at all. There is no session daemon, running trivial virsh commands
fails, and there is no session daemon running afterwards either.
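A quick way to confirm that state is to look for the daemon process and its session socket (the socket path below is the usual session-mode default; treat it as an assumption for your setup):

```shell
# Check for a running session libvirtd and for its UNIX socket.
pgrep -a libvirtd || echo "no libvirtd process"
sock="${XDG_RUNTIME_DIR:-/run/user/$(id -u)}/libvirt/libvirt-sock"
ls -l "$sock" 2>/dev/null || echo "no session socket at $sock"
```

If both checks come up empty after running virsh, the auto-spawned daemon died during startup, which matches the behaviour described above.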
This bug appears to have been reported against 'rawhide' during the Fedora 27 development cycle.
Changing version to '27'.