Bug 1121544 - QMP hangs when using a small (but specific) value of max_files in qemu.conf
Summary: QMP hangs when using a small (but specific) value of max_files in qemu.conf
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: qemu-kvm
Version: 6.6
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Cole Robinson
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-07-21 07:57 UTC by Hu Jianwei
Modified: 2015-12-04 19:56 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-12-04 19:56:32 UTC


Attachments
/var/log/libvirt/qemu/r7.log (8.61 KB, text/plain)
2014-07-21 08:08 UTC, Hu Jianwei
/var/log/libvirt/libvirtd.log (455.02 KB, text/plain)
2014-07-21 08:09 UTC, Hu Jianwei

Description Hu Jianwei 2014-07-21 07:57:42 UTC
Description of problem:
When max_files in qemu.conf is set to a small value, libvirt should either start the domain or report an error promptly. With max_files = 25, virsh start hangs instead.

Version-Release number of selected component (if applicable):
libvirt-0.10.2-40.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.428.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
[root@rhel6 ~]# cat /etc/libvirt/qemu.conf | grep ^max
max_processes = 1000
max_files = 25
[root@rhel6 ~]# service libvirtd restart
Stopping libvirtd daemon:                                  [  OK  ]
Starting libvirtd daemon:                                  [  OK  ]

[root@rhel6 ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     r7                             shut off

[root@rhel6 ~]# time virsh start r7
^C                                          <== it shouldn't hang; it should succeed or fail quickly.

real        14m53.045s
user        0m0.038s
sys        0m0.039s
[root@rhel6 ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 1     r7                             shut off

[root@rhel6 ~]# time virsh start r7
error: Domain is already active


real        0m0.040s
user        0m0.017s
sys        0m0.014s

[root@rhel6 ~]# cat /proc/`pidof qemu-kvm`/limits
Limit                     Soft Limit           Hard Limit           Units    
Max cpu time              unlimited            unlimited            seconds  
Max file size             unlimited            unlimited            bytes    
Max data size             unlimited            unlimited            bytes    
Max stack size            10485760             unlimited            bytes    
Max core file size        0                    unlimited            bytes    
Max resident set          unlimited            unlimited            bytes    
Max processes             1000                 1000                 processes
Max open files            26                   26                   files    
Max locked memory         65536                65536                bytes    
Max address space         unlimited            unlimited            bytes    
Max file locks            unlimited            unlimited            locks    
Max pending signals       62834                62834                signals  
Max msgqueue size         819200               819200               bytes    
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us  
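The limits file shows the soft limit but not how many descriptors qemu-kvm actually holds. A minimal sketch for checking the headroom, assuming a Linux /proc filesystem, a single qemu-kvm process (as the `pidof` call above already assumes), and root privileges to read another user's /proc entries. The function name `fd_headroom` is hypothetical:

```shell
# fd_headroom PID: compare a process's open descriptors with its
# "Max open files" soft limit, both read from /proc.
fd_headroom() {
    pid=$1
    # Each entry in /proc/PID/fd is one open file descriptor.
    open=$(ls /proc/"$pid"/fd | wc -l)
    # Row "Max open files  <soft>  <hard>  files": the soft limit is field 4.
    soft=$(awk '/^Max open files/ {print $4}' /proc/"$pid"/limits)
    if [ "$soft" = "unlimited" ]; then
        echo "pid=$pid open=$open soft_limit=unlimited"
    else
        echo "pid=$pid open=$open soft_limit=$soft headroom=$((soft - open))"
    fi
}
```

Usage against the hung guest: `fd_headroom "$(pidof qemu-kvm)"`. A headroom of zero would be consistent with qemu-kvm having used every descriptor it is allowed.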


Actual results:
As shown in the steps above, virsh start stopped responding and had to be interrupted after almost 15 minutes.
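A reproduction run can be kept from blocking indefinitely by bounding the start command with GNU coreutils `timeout`, which exits with status 124 when the time limit is reached. A sketch, assuming GNU `timeout` is installed; `run_bounded` is a hypothetical helper and the bound you pass is arbitrary:

```shell
# run_bounded SECS CMD [ARGS...]: run CMD with a wall-clock limit and
# report whether it finished or had to be killed by timeout(1).
run_bounded() {
    secs=$1
    shift
    timeout "$secs" "$@"
    rc=$?
    # GNU timeout exits with 124 when the command ran out of time.
    if [ "$rc" -eq 124 ]; then
        echo "HUNG after ${secs}s: $*"
    else
        echo "exit=$rc: $*"
    fi
    return "$rc"
}
```

Usage: `run_bounded 120 virsh start r7`. Killing virsh does not roll back the start; as the transcript shows, the domain can remain active afterwards, so a following run may report "Domain is already active".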


Expected results:
virsh start should not hang for any valid value of max_files.

When max_files is below a certain value (25 here), a clear error message is reported promptly:

max_files = 21 on my machine:
[root@rhel6 ~]# time virsh start r7
error: Failed to start domain r7
error: internal error process exited while connecting to monitor: Could not read keymap file: 'common'
failed to create eventfd



real        0m2.868s
user        0m0.022s
sys        0m0.021s

max_files = 22 on my machine:
[root@rhel6 ~]# time virsh start r7
error: Failed to start domain r7
error: internal error process exited while connecting to monitor: Could not read keymap file: 'modifiers'
failed to create eventfd



real        0m2.805s
user        0m0.022s
sys        0m0.022s

max_files = 24 on my machine:
[root@rhel6 ~]# time virsh start r7
error: Failed to start domain r7
error: internal error process exited while connecting to monitor: failed to create signalfd



real        0m2.839s
user        0m0.024s
sys        0m0.026s

When max_files = 26 on my machine, I can start the domain successfully:
[root@rhel6 ~]# time virsh start r7
Domain r7 started


real        0m2.630s
user        0m0.020s
sys        0m0.015s

Comment 1 Hu Jianwei 2014-07-21 08:08:33 UTC
Created attachment 919550 [details]
/var/log/libvirt/qemu/r7.log

Comment 2 Hu Jianwei 2014-07-21 08:09:19 UTC
Created attachment 919551 [details]
/var/log/libvirt/libvirtd.log

Comment 4 Jiri Denemark 2014-07-22 09:20:26 UTC
Interesting, so when only the right number of fds is allowed to be open for qemu, it doesn't fail to start but never responds to qmp_capabilities:

This is the last line logged by the thread which starts the domain:

2014-07-21 08:02:11.782+0000: 25964: debug : qemuMonitorSend:911 : QEMU_MONITOR_SEND_MSG: mon=0x7f46ac006340 msg={"execute":"qmp_capabilities","id":"libvirt-1"} fd=-1
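The same check can be applied to a libvirtd debug log: find the last QMP command libvirt sent and see whether any later line carries a reply with the same id. This is a heuristic sketch, assuming (not confirmed in this report) that the debug log also records raw monitor replies, which contain `"return"` or `"error"` together with the command's id. `last_unanswered_qmp` is a hypothetical name:

```shell
# last_unanswered_qmp LOGFILE: print the last QEMU_MONITOR_SEND_MSG line
# if no later line looks like a QMP reply ("return"/"error") carrying its id.
last_unanswered_qmp() {
    awk '
        /QEMU_MONITOR_SEND_MSG/ {
            # Pull the id value out of msg={..."id":"libvirt-1"} and keep it quoted,
            # so "libvirt-1" does not also match "libvirt-10".
            if (match($0, /"id":"[^"]*"/)) {
                id = "\"" substr($0, RSTART + 6, RLENGTH - 7) "\""
                sent = $0
                answered = 0
            }
            next
        }
        # A reply echoes the id (QEMU may put a space after the colon).
        sent != "" && index($0, id) && (index($0, "\"return\"") || index($0, "\"error\"")) {
            answered = 1
        }
        END { if (sent != "" && !answered) print sent }
    ' "$1"
}
```

Run on the attached log as `last_unanswered_qmp /var/log/libvirt/libvirtd.log`; if the assumption about logged replies holds, it would print the qmp_capabilities line quoted above.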

Comment 5 Hu Jianwei 2014-09-25 04:48:06 UTC
I can also reproduce it in the following scenario:

[root@ibm-x3850x5-01 ~]# rpm -q qemu-kvm libvirt
qemu-kvm-0.12.1.2-2.445.el6.x86_64
libvirt-0.10.2-46.el6.x86_64

[root@ibm-x3850x5-01 ~]# cat /etc/libvirt/qemu.conf | grep max_processes -b3
12159:max_processes = 99
12178-max_files = 10000

[root@ibm-x3850x5-01 ~]# virsh dumpxml win7 | grep vcpu
  <vcpu placement='static'>100</vcpu>
[root@ibm-x3850x5-01 ~]# time virsh start win7
^C

real	67m18.505s
user	0m0.218s
sys	0m0.087s

[root@ibm-x3850x5-01 ~]# cat /proc/`pidof qemu-kvm`/limits
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            10485760             unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             99                   99                   processes 
Max open files            10001                10001                files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       7754069              7754069              signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us
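Here the domain has 100 vCPUs while max_processes is 99; whether the vCPU count is what triggers the hang is not established in this report. A sketch for comparing the qemu-kvm thread count with the "Max processes" soft limit, assuming a Linux /proc filesystem, a single qemu-kvm process, and root privileges. Per setrlimit(2), RLIMIT_NPROC is charged per real user ID, so threads of other processes owned by the same user also count; the per-process figure is only a lower bound. `thread_headroom` is a hypothetical name:

```shell
# thread_headroom PID: compare a process's thread count with its
# "Max processes" (RLIMIT_NPROC) soft limit, both read from /proc.
# Other processes of the same real user also count against this limit,
# so the thread count printed here is a lower bound on actual usage.
thread_headroom() {
    pid=$1
    # Each directory under /proc/PID/task is one thread of the process.
    threads=$(ls /proc/"$pid"/task | wc -l)
    # Row "Max processes  <soft>  <hard>  processes": the soft limit is field 3.
    soft=$(awk '/^Max processes/ {print $3}' /proc/"$pid"/limits)
    echo "pid=$pid threads=$threads soft_limit=$soft"
}
```

Usage: `thread_headroom "$(pidof qemu-kvm)"`.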

Comment 9 Cole Robinson 2015-12-04 19:56:32 UTC
Given that this is a pretty obscure issue and there's an easy workaround (raise max_files), I don't think it makes sense to fix this in RHEL6.

If QE wants to see if it reproduces for RHEL7, might be worth opening a new bug there.

