Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1172522

Summary: Failed to start domains discontinuously during starting 514 guests continuously
Product: Red Hat Enterprise Linux 7 Reporter: Hu Jianwei <jiahu>
Component: libvirtAssignee: Libvirt Maintainers <libvirt-maint>
Status: CLOSED NOTABUG QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.1CC: dyuan, honzhang, jdenemar, mzhan, rbalakri
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-12-10 10:26:49 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Hu Jianwei 2014-12-10 09:57:09 UTC
Description
Failed to start domains discontinuously during starting 514 guests continuously

Version:
libvirt-1.2.8-10.el7.x86_64
qemu-kvm-rhev-2.1.2-16.el7.x86_64
spice-server-0.12.4-8.el7.x86_64
selinux-policy-3.13.1-9.el7.noarch

How reproducible:
100%

Steps to Reproduce:
[root@ibm-x3850x5-02 scalability]# getenforce 
Enforcing

[root@ibm-x3850x5-02 scalability]# cat /etc/systemd/system/multi-user.target.wants/libvirtd.service| grep Limit
LimitNOFILE=2048000

[root@dhcp-66-83-39 433894]# cat /etc/libvirt/libvirtd.conf | grep ^[^#]
listen_tls = 1
auth_tls = "none"
max_anonymous_clients = 20000
max_workers = 20000
max_requests = 20000
max_client_requests = 5000
log_level = 1
log_filters="1:qemu_monitor_json"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"
[root@dhcp-66-83-39 433894]# cat /etc/libvirt/qemu.conf | grep ^[^#]
max_processes = 1000
max_files = 10000
[root@dhcp-66-83-39 433894]# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1934080
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1000000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1934080
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
[root@dhcp-66-83-39 433894]#
[root@dhcp-66-83-39 433894]# for i in {1..514};do virsh start guest$i;sleep 3;done
...(clipped)
Domain guest263 started

Domain guest264 started

error: Failed to start domain guest265
error: internal error: early end of file from monitor: possible problem:
((null):72465): Spice-ERROR **: red_dispatcher.c:1145:red_dispatcher_init: create thread failed 11


error: Failed to start domain guest266
error: internal error: early end of file from monitor: possible problem:
qemu: qemu_thread_create: Resource temporarily unavailable


error: Failed to start domain guest267
error: internal error: early end of file from monitor: possible problem:
qemu: qemu_thread_create: Resource temporarily unavailable


Domain guest268 started

...(clipped)

error: Failed to start domain guest271
error: internal error: early end of file from monitor: possible problem:
qemu: qemu_thread_create: Resource temporarily unavailable


error: Failed to start domain guest272
error: internal error: early end of file from monitor: possible problem:
((null):72589): Spice-ERROR **: red_dispatcher.c:1145:red_dispatcher_init: create thread failed 11


Domain guest273 started

Domain guest274 started

error: Failed to start domain guest275
error: internal error: process exited while connecting to monitor: libvirt:  error : cannot execute binary /usr/libexec/qemu-kvm: Resource temporarily unavailable


error: Failed to start domain guest276
error: internal error: early end of file from monitor: possible problem:
qemu: qemu_thread_create: Resource temporarily unavailable


error: Failed to start domain guest277
error: internal error: early end of file from monitor: possible problem:
qemu: qemu_thread_create: Resource temporarily unavailable
...(clipped)

Actual results:
Met two errors:
a. error: internal error: early end of file from monitor: possible problem:
((null):72746): Spice-ERROR **: red_dispatcher.c:1145:red_dispatcher_init: create thread failed 11

b. error: internal error: early end of file from monitor: possible problem:
qemu: qemu_thread_create: Resource temporarily unavailable

When enlarge the sleep time to 60 second, the issue still can reproduce.

Disabled selinux(Permissive), the issue still can reproduce too.

++++++++++++++++++++++++++++++++++++++++++++++
Some logs from libvirtd.log:
2014-12-10 09:51:25.294+0000: 8568: error : qemuMonitorOpenUnix:309 : failed to connect to monitor socket: No such process
2014-12-10 09:51:25.294+0000: 8567: debug : virEventPollMakePollFDs:402 : Prepare n=268 w=269, f=281 e=25 d=0
2014-12-10 09:51:25.294+0000: 8568: debug : virFileClose:99 : Closed fd 359
...
2014-12-10 09:51:25.297+0000: 8567: debug : virEventPollMakePollFDs:402 : Prepare n=150 w=151, f=162 e=25 d=0
2014-12-10 09:51:25.297+0000: 8568: error : qemuProcessWaitForMonitor:2205 : internal error: process exited while connecting to monitor: qemu: qemu_thread_create: Resource temporarily unavailable

2014-12-10 09:51:25.297+0000: 8567: debug : virEventPollMakePollFDs:402 : Prepare n=151 w=152, f=163 e=25 d=0
2014-12-10 09:51:25.297+0000: 8567: debug : virEventPollMakePollFDs:402 : Prepare n=152 w=153, f=164 e=25 d=0
2014-12-10 09:51:25.297+0000: 8567: debug : virEventPollMakePollFDs:402 : Prepare n=153 w=154, f=166 e=25 d=0
2014-12-10 09:51:25.297+0000: 8568: debug : virFileClose:99 : Closed fd 358
++++++++++++++++++++++++++++++++++++++++++++++++


Expected results:
Fix it

Additional info:
domain xml example:
[root@dhcp-66-83-39 433894]# virsh dumpxml guest400
<domain type='kvm'>
  <name>guest400</name>
  <uuid>7a7df8f0-dedd-468d-be1d-19951a03f6c8</uuid>
  <memory unit='KiB'>524288</memory>
  <currentMemory unit='KiB'>524288</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.1.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/home/images/guest400.img'/>
      <target dev='hda' bus='ide'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <controller type='usb' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'/>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='spice' autoport='yes'/>
    <video>
      <model type='qxl' ram='65536' vram='65536' vgamem='8192' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='none'/>
  </devices>
</domain>

Comment 2 Jiri Denemark 2014-12-10 10:26:49 UTC
Well, you set max_processes to 1000 and then try to start 514 domains with 4 VCPUs each, which means that 2056 (514 * 4) CPU threads will need to be created. It's logical QEMU can't create threads and dies. And I don't even count non-CPU threads QEMU also needs to create. Each QEMU process will have at least one but possibly more threads in addition to CPU threads.

Comment 3 Hu Jianwei 2014-12-10 13:32:51 UTC
Yes, when enlarging the max_processes in qemu.conf, the "Resource temporarily unavailable" is disappeared, but hit 3 times another error:

[root@dhcp-66-83-39 ~]# cat /etc/libvirt/qemu.conf | grep ^[^#]
max_processes = 1000000
max_files = 1000000

...
Domain guest497 started

Domain guest498 started

Domain guest499 started

error: Failed to start domain guest500
error: internal error: early end of file from monitor: possible problem:
((null):25194): Spice-ERROR **: reds.c:3975:do_spice_init: statistics shm_open failed, Permission denied


Domain guest501 started

Domain guest502 started
...

Generally, User don't know what resource temporarily unavailable from the error message, why not let libvirt or qemu-kvm report a clear error message?

Thanks.

Comment 4 Hu Jianwei 2014-12-12 02:34:43 UTC
Also, during doing 514 guests migration, I hit 14 times same issue in comment 3.

[root@ibm-x3850x5-02 scalability]# rpm -q libvirt qemu-kvm-rhev
libvirt-1.2.8-10.el7.x86_64
qemu-kvm-rhev-2.1.2-16.el7.x86_64

[root@ibm-x3850x5-02 scalability]# cat mig.sh 
#! /bin/sh -
for i in {1..514}
do  
echo "---------guest$i-------"
 virsh migrate --live  --undefinesource guest$i qemu+tls://10.66.83.39/system --verbose
done

[root@ibm-x3850x5-02 scalability]# nohup sh mig.sh &

After finishing the scripts, check the output.
[root@ibm-x3850x5-02 scalability]# cat nohup.out | grep error -b2
Migration: [100 %]
48159----------guest207-------
48184:error: internal error: early end of file from monitor: possible problem:
48257-((null):24633): Spice-ERROR **: reds.c:3975:do_spice_init: statistics shm_open failed, Permission denied
48362-
--
Migration: [100 %]
48677----------guest212-------
48702:error: internal error: early end of file from monitor: possible problem:
48775-((null):24711): Spice-ERROR **: reds.c:3975:do_spice_init: statistics shm_open failed, Permission denied
48880-
--
Migration: [100 %]
49233----------guest217-------
49258:error: internal error: early end of file from monitor: possible problem:
49331-((null):24776): Spice-ERROR **: reds.c:3975:do_spice_init: statistics shm_open failed, Permission denied
49436-
--
Migration: [100 %]
58016----------guest279-------
58041:error: internal error: early end of file from monitor: possible problem:
58114-((null):25805): Spice-ERROR **: reds.c:3975:do_spice_init: statistics shm_open failed, Permission denied
58219-
--
Migration: [100 %]
64553----------guest316-------
64578:error: internal error: early end of file from monitor: possible problem:
64651-((null):26682): Spice-ERROR **: reds.c:3975:do_spice_init: statistics shm_open failed, Permission denied
64756-
--
Migration: [100 %]
65541----------guest323-------
65566:error: internal error: early end of file from monitor: possible problem:
65639-((null):26849): Spice-ERROR **: reds.c:3975:do_spice_init: statistics shm_open failed, Permission denied
65744-
--
Migration: [100 %]
66655----------guest329-------
66680:error: internal error: early end of file from monitor: possible problem:
66753-((null):27036): Spice-ERROR **: reds.c:3975:do_spice_init: statistics shm_open failed, Permission denied
66858-
--
Migration: [100 %]
67019----------guest331-------
67044:error: internal error: early end of file from monitor: possible problem:
67117-((null):27094): Spice-ERROR **: reds.c:3975:do_spice_init: statistics shm_open failed, Permission denied
67222-
--
Migration: [100 %]
67383----------guest333-------
67408:error: internal error: early end of file from monitor: possible problem:
67481-((null):27152): Spice-ERROR **: reds.c:3975:do_spice_init: statistics shm_open failed, Permission denied
67586-
--
Migration: [100 %]
68269----------guest339-------
68294:error: internal error: early end of file from monitor: possible problem:
68367-((null):27285): Spice-ERROR **: reds.c:3975:do_spice_init: statistics shm_open failed, Permission denied
68472-
--
Migration: [100 %]
73285----------guest365-------
73310:error: internal error: early end of file from monitor: possible problem:
73383-((null):28066): Spice-ERROR **: reds.c:3975:do_spice_init: statistics shm_open failed, Permission denied
73488-
--
Migration: [100 %]
77798----------guest387-------
77823:error: internal error: early end of file from monitor: possible problem:
77896-((null):28939): Spice-ERROR **: reds.c:3975:do_spice_init: statistics shm_open failed, Permission denied
78001-
--
Migration: [100 %]
84457----------guest417-------
84482:error: internal error: early end of file from monitor: possible problem:
84555-((null):30161): Spice-ERROR **: reds.c:3975:do_spice_init: statistics shm_open failed, Permission denied
84660-
--
Migration: [100 %]
89573----------guest441-------
89598:error: internal error: early end of file from monitor: possible problem:
89671-((null):31366): Spice-ERROR **: reds.c:3975:do_spice_init: statistics shm_open failed, Permission denied
89776-
[root@ibm-x3850x5-02 scalability]#

Comment 5 Jiri Denemark 2014-12-12 15:27:36 UTC
The "resource temporarily unavailable" is a system error message which libvirt or qemu gets when they try to create a new thread or process, we can't get anything better than this. And both libvirt and qemu reports when they got the error: "cannot execute binary /usr/libexec/qemu-kvm" or "qemu_thread_create" so I think it's pretty clear.

I'm not sure what the spice error is about but is it a different issue. It comes from a spice server on the destination host. Perhaps log files there could give us more info about the error...

Comment 6 Hu Jianwei 2014-12-15 00:42:58 UTC
OK, I'll try to capture the related logs, if possible, a separate bug will be reported. Thanks.