Bug 1395916
| Summary: | [Scale] "VM not responding" during VM start up when 1000 Disks are loaded | | |
|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | mlehrer |
| Component: | General | Assignee: | Francesco Romani <fromani> |
| Status: | CLOSED DUPLICATE | QA Contact: | guy chen <guchen> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.0.5.1 | CC: | bugs, eberman, mgoldboi, michal.skrivanek, mlehrer, oourfali, rgolan, tjelinek |
| Target Milestone: | --- | Keywords: | Performance |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-12-14 15:12:48 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
mlehrer
2016-11-17 00:02:39 UTC

Libvirt monitor is unresponsive:
- VM created by engine here:
2016-11-15 12:51:04,609 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateVDSCommand] (default task-50) [aa00c14] START, CreateVDSCommand(HostName = ucs1-b420-1.eng.lab.tlv.redhat.com, CreateVmVDSCommandParameters:{runAsync='true', hostId='283a78eb-7382-4a58-b8f4-1de6141ee0a1', vmId='75de2ec4-aa46-4c04-aa3b-1494c0b49e8f', vm='VM [run_vm_RFE_-1]'}), log id: 6587d4b1
- Engine got the response and saved the initial state as WaitForLaunch:
2016-11-15 12:51:04,789 INFO [org.ovirt.engine.core.vdsbroker.CreateVmVDSCommand] (default task-50) [aa00c14] FINISH, CreateVmVDSCommand, return: WaitForLaunch, log id: 53effb24
- ~1 second goes by.
- Subsequent engine monitoring detected the NotResponding status reported by VDSM:
2016-11-15 12:51:05,776 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler6) [] VM '75de2ec4-aa46-4c04-aa3b-1494c0b49e8f'(run_vm_RFE_-1) moved from 'WaitForLaunch' --> 'NotResponding'
- VDSM reports that the libvirt monitor is unresponsive:
jsonrpc.Executor/1::WARNING::2016-11-15 12:51:05,772::vm::4842::virt.vm::(_setUnresponsiveIfTimeout) vmId=`75de2ec4-aa46-4c04-aa3b-1494c0b49e8f`::monitor become unresponsive (command timeout, age=4379472.67)
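
For context on the log lines above: the warning comes from VDSM's periodic responsiveness check. When the newest stats sample for a VM is older than a configured command timeout, VDSM flags the VM's monitor as unresponsive, and the engine's VmAnalyzer surfaces that as NotResponding, exactly the transition seen in the engine log. Note the reported age of 4379472.67 seconds (roughly 50 days) for a VM started about one second earlier; an age that large points to the check measuring against a stale or never-initialized sample timestamp rather than a real libvirt stall. The sketch below illustrates that failure mode. It is a simplified, hypothetical reconstruction in Python (VDSM's implementation language), not VDSM's actual code; the `VM_COMMAND_TIMEOUT` constant, the `VmStatsSample` class, and the function name are assumptions for illustration.

```python
import time

# Hypothetical stand-in for VDSM's configurable command timeout (seconds);
# the real value comes from VDSM configuration, not this constant.
VM_COMMAND_TIMEOUT = 60.0

class VmStatsSample:
    """Newest known stats sample for a VM, and when it was collected."""
    def __init__(self, timestamp=0.0):
        # If this timestamp is never refreshed after the VM starts, the
        # computed age is measured from some long-gone moment, producing
        # huge values like the age=4379472.67 (~50.7 days) seen above.
        self.timestamp = timestamp
        self.monitor_response = 0  # 0 = responsive, -1 = unresponsive

def set_unresponsive_if_timeout(sample, now):
    """Simplified sketch of the check behind the
    '_setUnresponsiveIfTimeout ... monitor become unresponsive' warning."""
    age = now - sample.timestamp
    if age > VM_COMMAND_TIMEOUT:
        # VDSM logs the warning quoted above and marks the VM; the engine's
        # VmAnalyzer then reports the VM as NotResponding.
        print("monitor become unresponsive (command timeout, age=%.2f)" % age)
        sample.monitor_response = -1

# A freshly started VM whose sample timestamp was never initialized gets
# flagged within a second of starting, even though libvirt is healthy:
stale_sample = VmStatsSample(timestamp=0.0)  # stale, never-refreshed sample
set_unresponsive_if_timeout(stale_sample, now=time.time())
```

If that reading is correct, the fix tracked in bug 1382583 (referenced in the comments below and shipped with oVirt 4.0.6) would stop a just-launched VM from being compared against a stale sample, which matches the retest outcome later in this report.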
Comment 1 (Yaniv Kaul)

Is it really not responding? What is the state of the process when this happens?
Do you have libvirt logs? Host logs?
What is the CPU usage of the host when this happens?

Comment

https://bugzilla.redhat.com/show_bug.cgi?id=1382583 should solve this. Please test again with the fix (it will be in oVirt 4.0.6).

Comment (mlehrer)

(In reply to Yaniv Kaul from comment #1)
> Is it really not responding? What is the state of the process when this happens?
> Do you have libvirt logs? host logs?
> What is the CPU usage of the host when this happens?

I've updated the logs [1] after performing 2 more manual VM starts, although based on the comments above we'll need to retest to see whether this still occurs in that build.

[1] The updated logs include the libvirt and vdsm logs taken after performing 2 more manual VM starts. They are in the shared folder "BZ_1395916". When performing this test from the UI, the time it takes the respective libvirt log to update varies between 1s and 7s, depending on whether the VM has been powered on previously. The "VM not responding" error occurs each time, as shown in the video/logs here: https://drive.google.com/drive/folders/0B8V1DXeGhPPWQk5jaElsaDFONnM?usp=sharing

Comment

Could you please try it again when https://bugzilla.redhat.com/show_bug.cgi?id=1382583 is in a build?

Comment 6 (Tomas Jelinek)

So, the https://bugzilla.redhat.com/show_bug.cgi?id=1398415 is ON_QA on 4.0.6. Could you please test this again to see if the issue is gone now?

Comment

(In reply to Tomas Jelinek from comment #6)
> So, the https://bugzilla.redhat.com/show_bug.cgi?id=1398415 is ON_QA on 4.0.6. Could you please test this again to see if the issue is gone now?

Confirmed that the issue is not reproducible on rhevm 4.0.6.3-0.1.el7ev.

Comment

Great, so closing as duplicate since it is now confirmed.

*** This bug has been marked as a duplicate of bug 1382583 ***