Bug 1464083 - libvirtd doesn't give a decent error if inotify limits are too low
Status: NEW
Product: Fedora
Classification: Fedora
Component: libvirt
Version: 27
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assigned To: Libvirt Maintainers
QA Contact: Fedora Extras Quality Assurance
Blocks: TRACKER-bugs-affecting-libguestfs
Reported: 2017-06-22 08:00 EDT by Richard W.M. Jones
Modified: 2017-08-15 03:50 EDT
CC List: 8 users

Type: Bug


Attachments
libvirtd log when it fails (1.04 MB, text/plain)
2017-06-22 08:02 EDT, Richard W.M. Jones

Description Richard W.M. Jones 2017-06-22 08:00:40 EDT
Description of problem:

Basically I hit the bug exactly as diagnosed and worked around here:
https://github.com/Connexions/devops/wiki/libvirtd-won't-start

$ virsh list --all
error: failed to connect to the hypervisor
error: Cannot recv data: Connection reset by peer
$ cat /proc/sys/fs/inotify/max_user_watches 
8192
$ cat /proc/sys/fs/inotify/max_user_instances 
128
$ sudo sysctl -n -w fs.inotify.max_user_watches=16384
16384
$ sudo sysctl -n -w fs.inotify.max_user_instances=256
256
$ virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     tmp-bz1431579                  shut off


Version-Release number of selected component (if applicable):

libvirt-daemon-3.2.1-3.fc26.x86_64

How reproducible:

As above.

Steps to Reproduce:

It is unclear exactly how to reproduce it.
Comment 1 Richard W.M. Jones 2017-06-22 08:02 EDT
Created attachment 1290663
libvirtd log when it fails
Comment 2 Richard W.M. Jones 2017-06-22 11:52:40 EDT
Actually, the problem is a bit stranger than I thought. It appears
that something leaks inotify watches, so even increasing the limits
does not help; eventually they run out again.

This happens after running 'make check-release', a very long
libguestfs test that runs many hundreds, possibly thousands,
of VM instances using libvirt.
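Whether watches really leak during a long run can be checked by sampling their total over time. A minimal sketch, assuming a Linux /proc in which each inotify instance appears as an fd symlink to `anon_inode:inotify` and its /proc/PID/fdinfo entry has one `inotify wd:` line per watch (as described in proc(5)); the function names are hypothetical. Without root it only sees the current user's processes, which roughly matches the per-user scope of fs.inotify.max_user_watches.

```shell
#!/bin/sh
# Sketch: sample inotify watch usage over time (hypothetical helpers,
# Linux /proc layout assumed).

# Print the total number of inotify watches across every inotify fd
# we are allowed to inspect.
count_inotify_watches() {
    total=0
    for fd in /proc/[0-9]*/fd/*; do
        # Each inotify instance is an fd symlink to "anon_inode:inotify";
        # fd directories we may not read contribute no glob matches.
        [ "$(readlink "$fd" 2>/dev/null)" = "anon_inode:inotify" ] || continue
        # /proc/PID/fd/N is described by /proc/PID/fdinfo/N, which holds
        # one "inotify wd:..." line per registered watch.
        info="${fd%/fd/*}/fdinfo/${fd##*/}"
        n=$(grep -c '^inotify wd:' "$info" 2>/dev/null)
        total=$((total + ${n:-0}))
    done
    echo "$total"
}

# Print SAMPLES lines of "HH:MM:SS watches=N limit=M", INTERVAL seconds apart.
sample_inotify_usage() {
    samples=$1
    interval=$2
    limit=$(cat /proc/sys/fs/inotify/max_user_watches 2>/dev/null)
    i=0
    while [ "$i" -lt "$samples" ]; do
        echo "$(date +%T) watches=$(count_inotify_watches) limit=$limit"
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, running `sample_inotify_usage 60 30` alongside 'make check-release' prints one line every 30 seconds for half an hour; a count that keeps climbing and never drops back is consistent with a leak rather than a limit that is merely too low.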
Comment 3 Richard W.M. Jones 2017-06-22 11:54:43 EDT
It turns out (thanks to lsof) that this is actually caused by leaking
gpg-agent instances. I'll file another bug about that.

However, libvirt could still give a decent error message.
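The lsof investigation above can be approximated by reading /proc directly and totalling inotify instances per command name, so that many leaked processes of the same program (such as gpg-agent) add up to one large count. A minimal sketch, assuming a Linux /proc where each inotify instance is an fd symlink to `anon_inode:inotify`; the function name is hypothetical, and without root only the current user's processes are inspected.

```shell
#!/bin/sh
# Sketch: total inotify instances per command name, largest first
# (hypothetical helper, Linux /proc layout assumed).
inotify_instances_by_command() {
    for fd in /proc/[0-9]*/fd/*; do
        [ "$(readlink "$fd" 2>/dev/null)" = "anon_inode:inotify" ] || continue
        # Emit one line per inotify instance: the owning command name,
        # read from /proc/PID/comm (e.g. "gpg-agent").
        cat "${fd%/fd/*}/comm" 2>/dev/null
    done | sort | uniq -c | sort -rn
}

inotify_instances_by_command | head -n 10
```

Grouping by command name rather than by PID is the design choice here: a leak of many short-lived daemons, each holding one instance, would otherwise be spread across dozens of rows.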
Comment 4 Daniel Berrange 2017-06-22 12:21:23 EDT
The only bits of libvirt using inotify are the UML and Xen drivers, and each only registers a single watch, at initial startup. So if there is a failure, it should only hit at libvirtd startup time. I guess if you are using libvirt session mode, though, and libvirtd has enough time to shut down, you'd be starting it multiple times, and so might not see the failure immediately in your test suite.

The problem we're seeing with error reporting here is related to the auto-spawning of libvirtd. We successfully spawn libvirtd and at least start to connect to it, because the listener socket is ready; then the UML driver fails to set up inotify, causing libvirtd to shut down again, at which point virsh gets the error. We have no way to pass the errors reported by libvirtd back to virsh, hence the somewhat unhelpful error message we see.
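Because libvirtd's own error cannot reach virsh, one user-side workaround is to check for inotify headroom before running a command that may auto-spawn the session daemon. The sketch below is hypothetical and not part of libvirt; it assumes a Linux /proc and relies on inotify_init(2) failing with EMFILE once a user reaches fs.inotify.max_user_instances. It only checks instances, not watches.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not part of libvirt: report a clear error
# instead of letting the session libvirtd fail silently when this user's
# inotify instances are already at fs.inotify.max_user_instances.
# Intended for the unprivileged user; as root it also counts other
# users' processes and may over-report.
check_inotify_headroom() {
    limit=$(cat /proc/sys/fs/inotify/max_user_instances 2>/dev/null) || return 2
    used=0
    for fd in /proc/[0-9]*/fd/*; do
        # Count fd symlinks that refer to an inotify instance.
        [ "$(readlink "$fd" 2>/dev/null)" = "anon_inode:inotify" ] || continue
        used=$((used + 1))
    done
    if [ "$used" -ge "$limit" ]; then
        echo "error: $used of $limit inotify instances already in use;" \
             "raise fs.inotify.max_user_instances or stop the process leaking them" >&2
        return 1
    fi
    return 0
}

# Example:  check_inotify_headroom && virsh list --all
```

The function returns 0 when there is room, 1 (with a message on stderr) when the limit is reached, and 2 if the limit cannot be read.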
Comment 5 Richard W.M. Jones 2017-06-22 13:01:47 EDT
(In reply to Daniel Berrange from comment #4)
> The only bits of libvirt using inotify are the UML and Xen drivers, and
> each only registers a single watch, at initial startup. So if there is a
> failure, it should only hit at libvirtd startup time. I guess if you are
> using libvirt session mode, though, and libvirtd has enough time to shut
> down, you'd be starting it multiple times, and so might not see the
> failure immediately in your test suite.

Just to clarify: libvirtd (session instance) cannot be started
at all. There is no session daemon running beforehand, trivial virsh
commands fail, and there is no session daemon running afterwards either.
Comment 6 Jan Kurik 2017-08-15 03:50:35 EDT
This bug appears to have been reported against 'rawhide' during the Fedora 27 development cycle.
Changing version to '27'.
