| Summary: | Sometimes a guest start hangs and the guest status is ambiguous when starting 512 guests | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | weizhang <weizhan> |
| Component: | libvirt | Assignee: | Osier Yang <jyang> |
| Status: | CLOSED WORKSFORME | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 6.2 | CC: | acathrow, dallan, dyuan, gren, juzhang, jyang, mzhan, nzhang, rwu, veillard |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-10-21 09:44:19 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
(In reply to comment #0)
> Description of problem:
> `# virsh list | grep "396"`
> `396 rhel6u1-x86_646 shut off`
>
> When running
> `# virsh start rhel5u7-x86_6464`

Here I mean the same guest, rhel6u1-x86_646, so the command should be:
`# virsh start rhel6u1-x86_646`
`error: Domain is already active`

> `error: Domain is already active`
>
> `# virsh destroy rhel6u1-x86_646`
> `error: Failed to destroy domain rhel6u1-x86_646`
> `error: Requested operation is not valid: domain is not running`
>
> After the destroy attempt, the guest returns to the normal shut off status and can be started again.

The fact that the domain is listed means it has been added to the hash table of started domains, although the start process has not yet progressed far enough to mark the domain as running. We have to drop the mutex to call into the domain monitor and verify that the domain started, which explains why there is a window where a domain can show up in the active list while still being shut off. But until I know the root cause of the apparent hang during creation, I'm not sure it is worth tweaking the code to prevent this data race.
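To make the window described in the reply concrete, here is a minimal single-threaded C model of it. Everything in it is hypothetical: the `domain` struct, the two start phases, and the `can_start`/`can_destroy` checks are illustrative names, not libvirt's code. In particular, it assumes, purely for illustration, that a repeated start is refused based on membership in the active table while destroy is refused based on the running flag. That is one way to account for both error messages in the report, not a description of libvirt's verified checks.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the race window described in the
 * comment above. None of these names come from libvirt. */
typedef struct {
    const char *name;
    bool in_active_table; /* added to the table of started domains */
    bool running;         /* set only once the monitor confirms startup */
} domain;

/* Start, phase 1: the domain is registered in the active table but is
 * not yet marked running. In the real daemon the lock is dropped after
 * this point so the monitor can be queried; other clients can run here. */
static void start_phase1(domain *d) {
    d->in_active_table = true;
}

/* Start, phase 2: the (simulated) monitor check succeeded. */
static void start_phase2(domain *d) {
    d->running = true;
}

/* State a concurrent "virsh list" would print for a listed domain. */
static const char *list_state(const domain *d) {
    return d->running ? "running" : "shut off";
}

/* Assumption for illustration: a second start is refused when the
 * domain is already in the active table ("Domain is already active"). */
static bool can_start(const domain *d) {
    return !d->in_active_table;
}

/* Assumption for illustration: destroy is refused unless the domain is
 * marked running ("domain is not running"). */
static bool can_destroy(const domain *d) {
    return d->running;
}
```

Calling `start_phase1(&d)` and then querying before `start_phase2(&d)` gives the combination seen in the report: `list_state()` returns "shut off", and both `can_start()` and `can_destroy()` return false. After `start_phase2()` the state is consistent again. In this model, a start that stalls between the two phases keeps the domain in the inconsistent state for as long as it stalls.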
Description of problem:
I started 512 guests in a loop. When starting the 396th guest, `virsh start` hung without returning, although libvirtd was still running and `virsh list` still worked from another console. Checking the guest status with `virsh list` (without `--all` or `--inactive`) showed the 307th guest in shut off status even though it appeared in the active domain list.

`# virsh list | grep "396"`
`396 rhel6u1-x86_646 shut off`

When running:
`# virsh start rhel5u7-x86_6464`
`error: Domain is already active`

`# virsh destroy rhel6u1-x86_646`
`error: Failed to destroy domain rhel6u1-x86_646`
`error: Requested operation is not valid: domain is not running`

After the destroy attempt, the guest returns to the normal shut off status and can be started again.

Version-Release number of selected component (if applicable):
libvirt-0.9.4-17.el6.x86_64
kernel-2.6.32-206.el6.x86_64
qemu-kvm-0.12.1.2-2.196.el6.x86_64

How reproducible:
Sometimes

Steps to Reproduce:
1. Start 512 guests with the command:
`# for i in {1..512}; do virsh start guest$i; done`

Actual results:
One guest start may hang, and `virsh list` then reports an inconsistent status for that guest.

Expected results:
`virsh start` does not hang, and `virsh list` shows the correct status.

Additional info:
`# free -g` (values in GiB)

| | total | used | free | shared | buffers | cached |
|---|---|---|---|---|---|---|
| Mem: | 992 | 865 | 127 | 0 | 2 | 688 |
| -/+ buffers/cache: | | 174 | 818 | | | |
| Swap: | 0 | 0 | 0 | | | |

`` # top -p `pidof libvirtd` ``

| PID | USER | PR | NI | VIRT | RES | SHR | S | %CPU | %MEM | TIME+ | COMMAND |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 7202 | root | 20 | 0 | 994m | 32m | 5236 | S | 26.8 | 0.0 | 6:46.61 | libvirtd |

I don't know whether it is helpful, but libvirtd.log reports errors such as:
`23:30:12.323: 7202: error : qemuMonitorIO:583 : internal error End of file from monitor`
`23:31:19.956: 7202: error : qemuMonitorIO:583 : internal error End of file from monitor`
`10:08:59.096: 7202: error : virNetSocketReadWire:911 : End of file while reading data: Input/output error`
`10:09:00.844: 7202: error : virNetSocketReadWire:911 : End of file while reading data: Input/output error`