Bug 1121544

Summary: QMP hangs when using a small (but specific) value of max_files in qemu.conf
Product: Red Hat Enterprise Linux 6
Component: qemu-kvm
Version: 6.6
Hardware: x86_64
OS: Linux
Status: CLOSED WONTFIX
Severity: low
Priority: low
Target Milestone: rc
Reporter: Hu Jianwei <jiahu>
Assignee: Cole Robinson <crobinso>
QA Contact: Virtualization Bugs <virt-bugs>
CC: chayang, dyuan, honzhang, jdenemar, juzhang, mkenneth, mzhan, qzhang, rbalakri, rpacheco, virt-maint
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-12-04 19:56:32 UTC
Attachments:
/var/log/libvirt/qemu/r7.log (flags: none)
/var/log/libvirt/libvirtd.log (flags: none)

Description Hu Jianwei 2014-07-21 07:57:42 UTC
Description of problem:
When max_files in qemu.conf is set to a small value, libvirt should either start the domain or fail quickly instead of hanging.

Version-Release number of selected component (if applicable):
libvirt-0.10.2-40.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.428.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
[root@rhel6 ~]# cat /etc/libvirt/qemu.conf | grep ^max
max_processes = 1000
max_files = 25
[root@rhel6 ~]# service libvirtd restart
Stopping libvirtd daemon:                                  [  OK  ]
Starting libvirtd daemon:                                  [  OK  ]

[root@rhel6 ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     r7                             shut off

[root@rhel6 ~]# time virsh start r7
^C                                          <== it shouldn't hang; it should succeed or fail quickly.

real        14m53.045s
user        0m0.038s
sys        0m0.039s
[root@rhel6 ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 1     r7                             shut off

[root@rhel6 ~]# time virsh start r7
error: Domain is already active


real        0m0.040s
user        0m0.017s
sys        0m0.014s

[root@rhel6 ~]# cat /proc/`pidof qemu-kvm`/limits
Limit                     Soft Limit           Hard Limit           Units    
Max cpu time              unlimited            unlimited            seconds  
Max file size             unlimited            unlimited            bytes    
Max data size             unlimited            unlimited            bytes    
Max stack size            10485760             unlimited            bytes    
Max core file size        0                    unlimited            bytes    
Max resident set          unlimited            unlimited            bytes    
Max processes             1000                 1000                 processes
Max open files            26                   26                   files    
Max locked memory         65536                65536                bytes    
Max address space         unlimited            unlimited            bytes    
Max file locks            unlimited            unlimited            locks    
Max pending signals       62834                62834                signals  
Max msgqueue size         819200               819200               bytes    
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us  
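
The limits check above can also be scripted. The following is a minimal Python sketch, not part of libvirt or QEMU, that parses the text of /proc/<pid>/limits so the effective "Max open files" value can be compared with max_files from qemu.conf. The function name parse_proc_limits is hypothetical, and splitting the columns on runs of two or more spaces is an assumption based on the layout shown above.

```python
import re


def parse_proc_limits(text):
    """Parse /proc/<pid>/limits text into {limit name: (soft, hard)}.

    Numeric values become ints; "unlimited" becomes None.
    """
    def to_value(field):
        return None if field == "unlimited" else int(field)

    limits = {}
    # The first line is the column header, so it is skipped.
    for line in text.splitlines()[1:]:
        # Limit names contain single spaces; columns are separated by
        # two or more spaces (an assumption based on the output above).
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) < 3:
            continue
        limits[parts[0]] = (to_value(parts[1]), to_value(parts[2]))
    return limits
```

For a running guest the input would come from something like open("/proc/%d/limits" % pid).read(). Fed the output above, the "Max open files" entry is (26, 26), one more than the max_files = 25 set in qemu.conf.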


Actual results:
As shown in the steps above, virsh start hangs and never returns.


Expected results:
It shouldn't hang on any valid value for max_files.

When max_files is below a certain value (25 here), a clear error message is returned promptly:

max_files = 21 on my machine:
[root@rhel6 ~]# time virsh start r7
error: Failed to start domain r7
error: internal error process exited while connecting to monitor: Could not read keymap file: 'common'
failed to create eventfd



real        0m2.868s
user        0m0.022s
sys        0m0.021s

max_files = 22 on my machine:
[root@rhel6 ~]# time virsh start r7
error: Failed to start domain r7
error: internal error process exited while connecting to monitor: Could not read keymap file: 'modifiers'
failed to create eventfd



real        0m2.805s
user        0m0.022s
sys        0m0.022s

max_files = 24 on my machine:
[root@rhel6 ~]# time virsh start r7
error: Failed to start domain r7
error: internal error process exited while connecting to monitor: failed to create signalfd



real        0m2.839s
user        0m0.024s
sys        0m0.026s

When max_files = 26 on my machine, I can start the domain successfully:
[root@rhel6 ~]# time virsh start r7
Domain r7 started


real        0m2.630s
user        0m0.020s
sys        0m0.015s
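
The "failed to create eventfd" and "failed to create signalfd" errors above are consistent with the process running out of file descriptors, although that is an interpretation and not something the logs state directly. As a generic illustration of how RLIMIT_NOFILE behaves (not a reproduction of what qemu-kvm does), the Python sketch below lowers the soft limit of the current process and opens pipes until the kernel refuses. The function name count_openable_fds is hypothetical.

```python
import errno
import os
import resource


def count_openable_fds(soft_limit):
    """Open pipes under a lowered RLIMIT_NOFILE until the kernel refuses.

    Returns (number of descriptors opened, errno of the failure).
    The original limit is restored before returning.
    """
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    opened = []
    failure = None
    # Only the soft limit is lowered, so it can be raised again afterwards.
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, old_hard))
    try:
        while True:
            try:
                # Each pipe consumes two descriptors.
                opened.extend(os.pipe())
            except OSError as exc:
                failure = exc.errno
                break
    finally:
        for fd in opened:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, old_hard))
    return len(opened), failure
```

A limit of N only allows descriptor numbers 0 through N-1, and descriptors that are already open (stdin, stdout, stderr and so on) count against it, so the number that can still be opened is always smaller than N. That may be why a difference of one or two in max_files changes which step fails.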

Comment 1 Hu Jianwei 2014-07-21 08:08:33 UTC
Created attachment 919550 [details]
/var/log/libvirt/qemu/r7.log

Comment 2 Hu Jianwei 2014-07-21 08:09:19 UTC
Created attachment 919551 [details]
/var/log/libvirt/libvirtd.log

Comment 4 Jiri Denemark 2014-07-22 09:20:26 UTC
Interesting, so when only the right number of fds is allowed to be open for qemu, it doesn't fail to start but never responds to qmp_capabilities:

This is the last line logged by the thread which starts the domain:

2014-07-21 08:02:11.782+0000: 25964: debug : qemuMonitorSend:911 : QEMU_MONITOR_SEND_MSG: mon=0x7f46ac006340 msg={"execute":"qmp_capabilities","id":"libvirt-1"} fd=-1
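
For reference, the exchange that stalls here is the QMP capabilities negotiation: QEMU sends a greeting object containing a "QMP" key, the client sends qmp_capabilities, and QEMU answers with a "return" object. The Python sketch below is a hypothetical stand-alone probe, not libvirt's monitor code, that performs this exchange with a timeout so that a silent monitor shows up as an exception instead of an indefinite wait. The function name qmp_negotiate and the id "probe-1" are made up for illustration.

```python
import json
import socket


def qmp_negotiate(sock, timeout=5.0):
    """Run the QMP capabilities negotiation on a connected socket.

    The timeout applies to each socket operation. socket.timeout is
    raised if the monitor stays silent for longer than that.
    """
    sock.settimeout(timeout)
    reader = sock.makefile("r", encoding="utf-8")

    # QEMU speaks first: one JSON object with a "QMP" key.
    line = reader.readline()
    if not line:
        raise ConnectionError("monitor closed before sending a greeting")
    greeting = json.loads(line)
    if "QMP" not in greeting:
        raise RuntimeError("unexpected greeting: %r" % (greeting,))

    # qmp_capabilities must be the first command the client issues.
    request = {"execute": "qmp_capabilities", "id": "probe-1"}
    sock.sendall((json.dumps(request) + "\r\n").encode("utf-8"))

    while True:
        line = reader.readline()
        if not line:
            raise ConnectionError("monitor closed before replying")
        reply = json.loads(line)
        if "return" in reply:
            return reply["return"]
        if "error" in reply:
            raise RuntimeError("qmp_capabilities failed: %r" % (reply["error"],))
        # Anything else (for example an asynchronous event) is skipped.
```

In the situation described in this comment, such a probe would be expected to receive the greeting and then time out while waiting for the reply. The socket would have to be a QMP socket you control; the monitor socket of a libvirt-managed guest is already in use by libvirtd.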

Comment 5 Hu Jianwei 2014-09-25 04:48:06 UTC
I can also reproduce the hang in the scenario below, where max_processes (99) is lower than the guest's vCPU count (100).

[root@ibm-x3850x5-01 ~]# rpm -q qemu-kvm libvirt
qemu-kvm-0.12.1.2-2.445.el6.x86_64
libvirt-0.10.2-46.el6.x86_64

[root@ibm-x3850x5-01 ~]# cat /etc/libvirt/qemu.conf | grep max_processes -b3
12159:max_processes = 99
12178-max_files = 10000

[root@ibm-x3850x5-01 ~]# virsh dumpxml win7 | grep vcpu
  <vcpu placement='static'>100</vcpu>
[root@ibm-x3850x5-01 ~]# time virsh start win7
^C

real	67m18.505s
user	0m0.218s
sys	0m0.087s

[root@ibm-x3850x5-01 ~]# cat /proc/`pidof qemu-kvm`/limits
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            10485760             unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             99                   99                   processes 
Max open files            10001                10001                files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       7754069              7754069              signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us

Comment 9 Cole Robinson 2015-12-04 19:56:32 UTC
Given that this is pretty obscure and there's an easy workaround (raise max_files), I don't think it makes sense to fix this in RHEL6.

If QE wants to see if it reproduces for RHEL7, might be worth opening a new bug there.
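
Before applying the workaround it can help to confirm which limits are actually set. The Python sketch below is a scripted equivalent of the grep used in the description: it extracts uncommented integer settings such as max_files and max_processes from qemu.conf text. The function name read_qemu_conf_limits is hypothetical, and the parser only handles plain "key = integer" lines, which is an assumption about how these two settings are written.

```python
import re

# Matches an uncommented "key = integer" line, with an optional trailing comment.
_SETTING = re.compile(r"^\s*(\w+)\s*=\s*(\d+)\s*(?:#.*)?$")


def read_qemu_conf_limits(text, keys=("max_files", "max_processes")):
    """Return {key: int} for the requested keys found in qemu.conf text."""
    found = {}
    for line in text.splitlines():
        match = _SETTING.match(line)
        if match and match.group(1) in keys:
            # A later line overrides an earlier one for the same key.
            found[match.group(1)] = int(match.group(2))
    return found
```

A typical call would be read_qemu_conf_limits(open("/etc/libvirt/qemu.conf").read()). Which value of max_files is large enough depends on the guest configuration; the thresholds reported in the description were observed on one machine only.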