Description of problem:
When max_files in qemu.conf is set to a small value, libvirt should handle domain startup promptly: virsh start should either succeed or fail quickly instead of hanging.
Version-Release number of selected component (if applicable):
libvirt-0.10.2-40.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.428.el6.x86_64
How reproducible:
100%
Steps to Reproduce:
[root@rhel6 ~]# cat /etc/libvirt/qemu.conf | grep ^max
max_processes = 1000
max_files = 25
[root@rhel6 ~]# service libvirtd restart
Stopping libvirtd daemon: [ OK ]
Starting libvirtd daemon: [ OK ]
[root@rhel6 ~]# virsh list --all
Id Name State
----------------------------------------------------
- r7 shut off
[root@rhel6 ~]# time virsh start r7
^C <== it shouldn't hang; it should succeed or fail quickly.
real 14m53.045s
user 0m0.038s
sys 0m0.039s
[root@rhel6 ~]# virsh list --all
Id Name State
----------------------------------------------------
1 r7 shut off
[root@rhel6 ~]# time virsh start r7
error: Domain is already active
real 0m0.040s
user 0m0.017s
sys 0m0.014s
[root@rhel6 ~]# cat /proc/`pidof qemu-kvm`/limits
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 10485760 unlimited bytes
Max core file size 0 unlimited bytes
Max resident set unlimited unlimited bytes
Max processes 1000 1000 processes
Max open files 26 26 files
Max locked memory 65536 65536 bytes
Max address space unlimited unlimited bytes
Max file locks unlimited unlimited locks
Max pending signals 62834 62834 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 0 0
Max realtime priority 0 0
Max realtime timeout unlimited unlimited us
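In both transcripts the qemu-kvm process ends up with an RLIMIT_NOFILE of max_files + 1 (25 becomes 26 here, and 10000 becomes 10001 in the second scenario below); libvirt applies the limit in the child between fork() and exec(). A minimal sketch of that mechanism, assuming Linux and Python's resource module (the +1 is an observation from these transcripts, not a claim about libvirt's actual code):

```python
import resource
import subprocess

MAX_FILES = 25  # max_files value from qemu.conf

def apply_limits():
    # Runs in the child between fork() and exec(), analogous to
    # libvirt's pre-exec process hook; lowers soft and hard limits.
    limit = MAX_FILES + 1  # matches the +1 observed in /proc/<pid>/limits
    resource.setrlimit(resource.RLIMIT_NOFILE, (limit, limit))

# The exec'd process inherits the lowered limit:
out = subprocess.run(
    ["grep", "Max open files", "/proc/self/limits"],
    preexec_fn=apply_limits, capture_output=True, text=True,
)
print(out.stdout.strip())
```

Because the limit is applied only in the child, the parent (here, the Python process; in libvirt's case, libvirtd) keeps its own limits untouched.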
Actual results:
As the steps above show, virsh start stops responding (it had to be interrupted after almost 15 minutes).
Expected results:
It shouldn't hang for any valid value of max_files.
When max_files is below a certain threshold (25 on this machine), we get a clear error message promptly.
max_files = 21 on my machine:
[root@rhel6 ~]# time virsh start r7
error: Failed to start domain r7
error: internal error process exited while connecting to monitor: Could not read keymap file: 'common'
failed to create eventfd
real 0m2.868s
user 0m0.022s
sys 0m0.021s
max_files = 22 on my machine:
[root@rhel6 ~]# time virsh start r7
error: Failed to start domain r7
error: internal error process exited while connecting to monitor: Could not read keymap file: 'modifiers'
failed to create eventfd
real 0m2.805s
user 0m0.022s
sys 0m0.022s
max_files = 24 on my machine:
[root@rhel6 ~]# time virsh start r7
error: Failed to start domain r7
error: internal error process exited while connecting to monitor: failed to create signalfd
real 0m2.839s
user 0m0.024s
sys 0m0.026s
When max_files = 26 on my machine, I can start the domain successfully:
[root@rhel6 ~]# time virsh start r7
Domain r7 started
real 0m2.630s
user 0m0.020s
sys 0m0.015s
Interestingly, when just enough fds are allowed for qemu to start, it doesn't fail outright but never responds to qmp_capabilities:
This is the last line logged by the thread which starts the domain:
2014-07-21 08:02:11.782+0000: 25964: debug : qemuMonitorSend:911 : QEMU_MONITOR_SEND_MSG: mon=0x7f46ac006340 msg={"execute":"qmp_capabilities","id":"libvirt-1"} fd=-1
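The log line above shows libvirt sending qmp_capabilities to the monitor and then blocking forever on the reply. A deadline on the monitor exchange would turn the hang into a quick failure. A minimal sketch of a QMP handshake with a timeout, assuming a unix-socket monitor (the function name and timeout value are illustrative, not libvirt code):

```python
import json
import socket

def qmp_capabilities(sock_path, timeout=30.0):
    """Negotiate QMP capabilities with a deadline instead of blocking forever."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(timeout)  # any read past the deadline raises socket.timeout
    s.connect(sock_path)
    f = s.makefile("rw")
    greeting = json.loads(f.readline())            # the {"QMP": {...}} banner
    f.write(json.dumps({"execute": "qmp_capabilities"}) + "\n")
    f.flush()
    reply = json.loads(f.readline())               # {"return": {}} on success
    return greeting, reply
```

With such a timeout, a qemu that starts but never answers (as in this bug) would produce a clear error within the deadline rather than a 15-minute-plus hang.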
I can also reproduce it in the scenario below.
[root@ibm-x3850x5-01 ~]# rpm -q qemu-kvm libvirt
qemu-kvm-0.12.1.2-2.445.el6.x86_64
libvirt-0.10.2-46.el6.x86_64
[root@ibm-x3850x5-01 ~]# cat /etc/libvirt/qemu.conf | grep max_processes -b3
12159:max_processes = 99
12178-max_files = 10000
[root@ibm-x3850x5-01 ~]# virsh dumpxml win7 | grep vcpu
<vcpu placement='static'>100</vcpu>
[root@ibm-x3850x5-01 ~]# time virsh start win7
^C
real 67m18.505s
user 0m0.218s
sys 0m0.087s
[root@ibm-x3850x5-01 ~]# cat /proc/`pidof qemu-kvm`/limits
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 10485760 unlimited bytes
Max core file size 0 unlimited bytes
Max resident set unlimited unlimited bytes
Max processes 99 99 processes
Max open files 10001 10001 files
Max locked memory 65536 65536 bytes
Max address space unlimited unlimited bytes
Max file locks unlimited unlimited locks
Max pending signals 7754069 7754069 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 0 0
Max realtime priority 0 0
Max realtime timeout unlimited unlimited us
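Here the trigger is max_processes = 99 against a guest with 100 vcpus; each vcpu is a qemu thread, and threads count toward RLIMIT_NPROC, so startup stalls the same way. For QE automation, the per-process limits shown in these transcripts can be checked programmatically from /proc/<pid>/limits; a small parser sketch (Linux-only, assuming the standard limits-file layout):

```python
import re

def read_limits(pid="self"):
    """Parse /proc/<pid>/limits into {name: (soft, hard)}; None means unlimited."""
    conv = lambda v: None if v == "unlimited" else int(v)
    limits = {}
    with open(f"/proc/{pid}/limits") as f:
        next(f)  # skip the "Limit  Soft Limit  Hard Limit  Units" header
        for line in f:
            # Name and values are separated by runs of 2+ spaces
            m = re.match(r"(Max .+?)\s{2,}(\S+)\s+(\S+)", line)
            if m:
                name, soft, hard = m.groups()
                limits[name] = (conv(soft), conv(hard))
    return limits

print(read_limits()["Max open files"])
```

A test could then assert, for example, that a started qemu-kvm's "Max open files" equals max_files + 1 and "Max processes" equals max_processes, matching the tables above.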
Given that this is pretty obscure and there's an easy workaround (raise max_files), I don't think it makes sense to fix this for RHEL 6.
If QE wants to check whether it reproduces on RHEL 7, it might be worth opening a new bug there.