Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets there.

Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against the components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED".

If you cannot log in to RH Jira, please consult article #7032570. Failing that, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The e-mail creates a ServiceNow ticket with Red Hat.

Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of the form "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). The same link will be available in a blue banner at the top of the page informing you that the bug has been migrated.

Bug 881538

Summary: virsh start domain can't return after libvirtd restart when 1024 guests are running
Product: Red Hat Enterprise Linux 6
Reporter: hongming <honzhang>
Component: libvirt
Assignee: Jiri Denemark <jdenemar>
Status: CLOSED WORKSFORME
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Docs Contact:
Priority: medium
Version: 6.4
CC: acathrow, dallan, dyasny, dyuan, jdenemar, jgalipea, mzhan, rwu, weizhan
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-12-10 10:42:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
guest xml none

Description hongming 2012-11-29 03:47:06 UTC
Description of problem:
virsh start for a domain does not return or exit when 1000 guests are running. The qemu process for the guest does start, and libvirtd itself does not hang.
The host machine has 1 TB of memory and 160 CPUs.


Version-Release number of selected component (if applicable):
libvirt-0.10.2-10.el6.x86_64
qemu-kvm-0.12.1.2-2.335.el6.x86_64 

How reproducible:
100% 

Steps to Reproduce:
# virsh list --all
......
 2114  hm1012                         running
 2115  hm1013                         running
 2116  hm1014                         running
 2117  hm1015                         running
 2118  hm1016                         running
 2119  hm1017                         running
 2120  hm1018                         running
 2121  hm1019                         running
 2122  hm1020                         running
 2123  hm1021                         running
 2124  hm1022                         running
 2125  hm1023                         running
 2126  hm1024                         running

# virsh  destroy  hm1
Domain hm1 destroyed


# virsh start hm1  <------ Always waiting; never returns or exits.



In another terminal, the qemu process for the guest is visible.
# ps -ef|grep d8f4b92e-05ac-72b9-22c3-db1e78f9bce1
qemu     126323      1 99 21:59 ?        00:37:29 /usr/libexec/qemu-kvm -name hm1 -S -M rhel6.3.0 -enable-kvm -m 536 -smp 1,sockets=1,cores=1,threads=1 -uuid d8f4b92e-05ac-72b9-22c3-db1e78f9bce1 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/hm1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -drive file=/home/scalability-guests/disks/hm1,if=none,id=drive-ide0-0-0,format=raw,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=1046,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:d3:99:dd,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -vga cirrus -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0
root     126710 145537  1 22:37 pts/1    00:00:00 grep d8f4b92e-05ac-72b9-22c3-db1e78f9bce1


# virsh dominfo hm1
Id:             2129
Name:           hm1
UUID:           d8f4b92e-05ac-72b9-22c3-db1e78f9bce1
OS Type:        hvm
State:          shut off
CPU(s):         1
CPU time:       2303.9s
Max memory:     548864 KiB
Used memory:    548864 KiB
Persistent:     yes
Autostart:      disable
Managed save:   no
Security model: none
Security DOI:   0

# free -g
             total       used       free     shared    buffers     cached
Mem:          1009         53        955          0          0         24
-/+ buffers/cache:         28        980
Swap:            3          0          3 
  
Actual results:
virsh start for the domain does not return or exit.

Expected results:
The domain starts successfully.

Additional info:

Comment 2 Dave Allan 2012-11-29 13:09:21 UTC
How long did you wait for the start to return?

Comment 3 hongming 2012-11-30 02:59:11 UTC
It never returns and appears to hang, but libvirtd itself doesn't hang. I waited for it for about 20 minutes at most. It occurs after restarting libvirtd. Please refer to the following steps.

# for i in {1..1024};do virsh start hm$i;done
......
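As an aside, the bulk-start loop above silently continues past any guest that fails to start. A minimal failure-counting variant is sketched below; start_all and start_guest are illustrative names, not from this report, and the helper takes the per-guest start command as a parameter so it can be exercised without libvirt:

```shell
# start_all CMD PREFIX COUNT
# Runs "CMD PREFIX<i>" for i in 1..COUNT and prints how many invocations failed.
start_all() {
    cmd=$1; prefix=$2; count=$3
    fails=0
    i=1
    while [ "$i" -le "$count" ]; do
        "$cmd" "${prefix}${i}" || fails=$((fails + 1))
        i=$((i + 1))
    done
    echo "$fails"
}

# With libvirt available one would run, for example:
#   start_guest() { virsh start "$1"; }
#   start_all start_guest hm 1024
```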

# virsh list --all    
......
 2052  hm1020                         running
 2053  hm1021                         running
 2054  hm1022                         running
 2055  hm1023                         running
 2056  hm1024                         running
 -     yuping-rhel6                   shut off

# virsh destroy hm1
Domain hm1 destroyed   

# virsh start hm1       <======= Works fine before restarting libvirtd
Domain hm1 started

# service libvirtd restart
Stopping libvirtd daemon:                                  [  OK  ]
Starting libvirtd daemon:                                  [  OK  ]

# virsh destroy hm1
Domain hm1 destroyed

# date
Thu Nov 29 21:34:26 EST 2012

# virsh start hm1       <======= Hangs after restarting libvirtd

^C

# date
Thu Nov 29 21:47:28 EST 2012

Comment 4 hongming 2012-11-30 03:31:56 UTC
Before starting the 1024 guests, the following prerequisite steps were done.

1. Stop the cgconfig service
# service cgconfig stop

2. Disable the memballoon device for each guest
# virsh dumpxml guest | grep memballoon

  <memballoon model='none'/>


3. Change max_processes and max_files to 65535 in /etc/libvirt/qemu.conf
# grep ^max_ /etc/libvirt/qemu.conf

max_processes=65535
max_files=65535

4. Change LIBVIRTD_NOFILES_LIMIT to 65535 in /etc/sysconfig/libvirtd
# grep LIBVIRTD_NOFILES_LIMIT /etc/sysconfig/libvirtd

LIBVIRTD_NOFILES_LIMIT=65535
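Whether these limits actually took effect can be checked against the running daemon via /proc. The helper below, check_nofile_limit, is an illustrative sketch (not part of the original report) that pulls the soft "Max open files" value out of /proc/<pid>/limits-formatted text:

```shell
# check_nofile_limit: print the soft "Max open files" value from
# /proc/<pid>/limits-formatted text supplied on stdin.
check_nofile_limit() {
    awk '/^Max open files/ { print $4 }'
}

# Against the running daemon (assumes libvirtd is up):
#   check_nofile_limit < /proc/$(pidof libvirtd)/limits
```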

5. Add more swap
# free -g
             total       used       free     shared    buffers     cached
Mem:          1009          7       1002          0          0          0
-/+ buffers/cache:          6       1002
Swap:         1028          0       1028

Comment 5 Jiri Denemark 2012-11-30 14:14:38 UTC
Could you please provide the exact steps needed to reproduce this issue, starting from a freshly started libvirtd with no domains running? I'm afraid we will also need debug logs from libvirtd generated during that process.
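(For reference, a minimal sketch of how libvirtd debug logging is typically enabled on RHEL 6: the settings below go in /etc/libvirt/libvirtd.conf, and the log file path is only an example.)

```
# /etc/libvirt/libvirtd.conf
log_level = 1                                          # 1 = debug
log_outputs = "1:file:/var/log/libvirt/libvirtd.log"   # send debug output to a file
```

The daemon then needs a "service libvirtd restart" to pick up the new settings.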

Comment 7 hongming 2012-12-03 10:20:02 UTC
Created attachment 656528 [details]
guest xml

Comment 9 Jiri Denemark 2012-12-05 09:47:12 UTC
Are these logs from the run which succeeded and did not exhibit the bug? If so, I'm afraid they are quite useless to me. I need logs from a run where the bug reproduces.

Comment 10 hongming 2012-12-05 10:08:30 UTC
Yes, the logs are from the test in Comment 6, where the start command returned after 1 minute. Can that result be accepted?

I will try to reproduce the bug and collect logs when the two NUMA hosts are combined into one.

Comment 11 Jiri Denemark 2012-12-05 10:39:03 UTC
I think the 1 minute delay is expected given that you restarted the daemon just before running virsh start. And since it doesn't hang, the logs will not provide any hint about the hang. That said, please, try to get the logs when this bug reproduces.

Comment 12 hongming 2012-12-10 10:26:33 UTC
Hi Jiri

I can't reproduce it again on the same machine. I have tried many times; it no longer hangs and returns after 1-2 minutes. Please close the bug. I am sorry for this.

Comment 13 Jiri Denemark 2012-12-10 10:42:22 UTC
OK, please reopen or report again in case this bug reproduces.