Bug 881538
| Summary: | virsh start domain can't return after libvirtd restart when 1024 guests are running | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | hongming <honzhang> |
| Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> |
| Status: | CLOSED WORKSFORME | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 6.4 | CC: | acathrow, dallan, dyasny, dyuan, jdenemar, jgalipea, mzhan, rwu, weizhan |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-12-10 10:42:22 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
How long did you wait for the start to return?

It never returned; the command hung, but libvirtd itself did not hang. I waited for about 20 minutes at most. It occurs after restarting libvirtd. Please refer to the following steps.
# for i in {1..1024};do virsh start hm$i;done
......
# virsh list --all
......
2052 hm1020 running
2053 hm1021 running
2054 hm1022 running
2055 hm1023 running
2056 hm1024 running
- yuping-rhel6 shut off
# virsh destroy hm1
Domain hm1 destroyed
# virsh start hm1 <======= It works fine before restart libvirtd
Domain hm1 started
# service libvirtd restart
Stopping libvirtd daemon: [ OK ]
Starting libvirtd daemon: [ OK ]
# virsh destroy hm1
Domain hm1 destroyed
# date
Thu Nov 29 21:34:26 EST 2012
# virsh start hm1 <======= It hangs after restart libvirtd
^C
# date
Thu Nov 29 21:47:28 EST 2012
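Rather than waiting and pressing Ctrl-C by hand as in the transcript above, a hanging `virsh start` can be given a bounded wait with the coreutils `timeout` command. This is a sketch, not part of the report; the `run_bounded` helper name is hypothetical, and `sleep` stands in for the real command so the example is safe to run. On the host one would run something like `run_bounded 900 virsh start hm1`.

```shell
#!/bin/sh
# Bound a possibly-hanging command instead of waiting indefinitely.
# `timeout` is from GNU coreutils; it exits with status 124 when it had
# to kill the command because the time limit elapsed.
run_bounded() {
    secs=$1
    shift
    timeout "$secs" "$@"
}

run_bounded 2 sleep 1; echo "fast command rc=$?"   # rc=0: returned in time
rc=0; run_bounded 1 sleep 5 || rc=$?
echo "slow command rc=$rc"                         # rc=124: killed on timeout
```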
Before starting the 1024 guests, the following prerequisite steps were done.
1.# service cgconfig stop
2.Disable memballoon for each guest
# virsh dumpxml guest |grep memballoon
<memballoon model='none'/>
3.Change max_processes and max_files to 65535 in /etc/libvirt/qemu.conf
# grep '^max_' /etc/libvirt/qemu.conf
max_processes=65535
max_files=65535
4.Change the LIBVIRTD_NOFILES_LIMIT to 65535
# vi /etc/sysconfig/libvirtd
LIBVIRTD_NOFILES_LIMIT=65535
5.Add more swap.
# free -g
total used free shared buffers cached
Mem: 1009 7 1002 0 0 0
-/+ buffers/cache: 6 1002
Swap: 1028 0 1028
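The qemu.conf edit in step 3 above can be scripted. The sketch below operates on a scratch file created with `mktemp` so it is safe to run as-is; on a real host one would point `QEMU_CONF` at `/etc/libvirt/qemu.conf` (and skip the stand-in `printf`), then restart libvirtd. The `set_qemu_conf` helper is hypothetical, not part of libvirt.

```shell
#!/bin/sh
# Sketch of step 3: set max_processes/max_files in a qemu.conf-style file.
QEMU_CONF=$(mktemp)                                   # scratch copy for the sketch
printf '#max_processes = 0\n#max_files = 0\n' > "$QEMU_CONF"   # stand-in content

# set_qemu_conf KEY VALUE: replace an existing (possibly commented-out)
# "KEY = ..." line, or append the setting if no such line exists.
set_qemu_conf() {
    key=$1 val=$2
    if grep -q "^#\{0,1\}[[:space:]]*${key}[[:space:]]*=" "$QEMU_CONF"; then
        sed -i "s|^#\{0,1\}[[:space:]]*${key}[[:space:]]*=.*|${key} = ${val}|" "$QEMU_CONF"
    else
        printf '%s = %s\n' "$key" "$val" >> "$QEMU_CONF"
    fi
}

set_qemu_conf max_processes 65535
set_qemu_conf max_files 65535
grep '^max_' "$QEMU_CONF"
```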
Could you please provide the exact steps needed to reproduce this issue, starting from a freshly started libvirtd with no domains running? And I'm afraid we will need debug logs generated by libvirtd during that process.

Created attachment 656528 [details]
guest xml

Are these logs from the run which succeeded and did not exhibit the bug? If so, I'm afraid they are quite useless to me. I need the logs from a run where this bug reproduces.

Yes, the logs are from the test in Comment 6, where the start command returned after 1 minute. Can that result be accepted? I will try to reproduce the bug and collect logs when the two NUMA hosts are combined into one.

I think the 1 minute delay is expected, given that you restarted the daemon just before running virsh start. And since it doesn't hang, the logs will not provide any hint about the hang. That said, please try to get the logs when this bug reproduces.

Hi Jiri, I can't reproduce it again on the same machine. I have tried many times; it doesn't hang anymore and returns after 1 to 2 minutes. Please close the bug. I am sorry for this.

OK, please reopen, or report again, in case this bug reproduces.
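For the debug logs requested in this thread, libvirtd's logging is controlled by `/etc/libvirt/libvirtd.conf`. A typical debug configuration looks like the fragment below (the log file path is an example); libvirtd must be restarted for it to take effect:

```
# /etc/libvirt/libvirtd.conf -- example debug-logging settings
log_level = 1
log_outputs = "1:file:/var/log/libvirt/libvirtd.log"
```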
Description of problem:
virsh start <domain> does not return when 1000+ guests are running. The qemu process for the guest does start, and libvirtd does not hang. The host machine has 1 TB of memory and 160 CPUs.

Version-Release number of selected component (if applicable):
libvirt-0.10.2-10.el6.x86_64
qemu-kvm-0.12.1.2-2.335.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
# virsh list --all
......
2114 hm1012 running
2115 hm1013 running
2116 hm1014 running
2117 hm1015 running
2118 hm1016 running
2119 hm1017 running
2120 hm1018 running
2121 hm1019 running
2122 hm1020 running
2123 hm1021 running
2124 hm1022 running
2125 hm1023 running
2126 hm1024 running
# virsh destroy hm1
Domain hm1 destroyed
# virsh start hm1 <------ Always waiting; never returns.

In another terminal, the qemu process of the guest can be seen:
# ps -ef|grep d8f4b92e-05ac-72b9-22c3-db1e78f9bce1
qemu 126323 1 99 21:59 ? 00:37:29 /usr/libexec/qemu-kvm -name hm1 -S -M rhel6.3.0 -enable-kvm -m 536 -smp 1,sockets=1,cores=1,threads=1 -uuid d8f4b92e-05ac-72b9-22c3-db1e78f9bce1 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/hm1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -drive file=/home/scalability-guests/disks/hm1,if=none,id=drive-ide0-0-0,format=raw,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=1046,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:d3:99:dd,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -vga cirrus -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0
root 126710 145537 1 22:37 pts/1 00:00:00 grep d8f4b92e-05ac-72b9-22c3-db1e78f9bce1

# virsh dominfo hm1
Id: 2129
Name: hm1
UUID: d8f4b92e-05ac-72b9-22c3-db1e78f9bce1
OS Type: hvm
State: shut off
CPU(s): 1
CPU time: 2303.9s
Max memory: 548864 KiB
Used memory: 548864 KiB
Persistent: yes
Autostart: disable
Managed save: no
Security model: none
Security DOI: 0

# free -g
total used free shared buffers cached
Mem: 1009 53 955 0 0 24
-/+ buffers/cache: 28 980
Swap: 3 0 3

Actual results:
virsh start <domain> does not return.

Expected results:
The domain starts successfully and the command returns.

Additional info: