Bug 1348129
| Summary: | "VMs running in DB, but not in VDS" error while powering off all 29 VMs together | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | SATHEESARAN <sasundar> | ||||||
| Component: | Frontend.WebAdmin | Assignee: | Arik <ahadas> | ||||||
| Status: | CLOSED WORKSFORME | QA Contact: | SATHEESARAN <sasundar> | ||||||
| Severity: | urgent | Docs Contact: | |||||||
| Priority: | unspecified | ||||||||
| Version: | 3.6.7 | CC: | ahadas, bugs, sasundar, tjelinek | ||||||
| Target Milestone: | --- | Flags: | sasundar:
planning_ack?
sasundar: devel_ack? sasundar: testing_ack? |
||||||
| Target Release: | --- | ||||||||
| Hardware: | x86_64 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: |
HCI
RHEV 3.6.7.4 and RHGS 3.1.3 ( RC )
|
|||||||
| Last Closed: | 2016-06-29 12:18:50 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Bug Depends On: | |||||||||
| Bug Blocks: | 1258386 | ||||||||
| Attachments: |
|
||||||||
|
Description
SATHEESARAN
2016-06-20 08:54:09 UTC
Not sure this is the related information.
But found the exceptions in the engine.log
<snip>
2016-06-20 04:38:56,610 ERROR [org.ovirt.engine.core.vdsbroker.VmsMonitoring] (DefaultQuartzScheduler_Worker-60) [2ab9a462] Failed during vms monitoring on host host1.lab.eng.blr.redhat
.com error is: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map
2016-06-20 04:38:56,610 ERROR [org.ovirt.engine.core.vdsbroker.VmsMonitoring] (DefaultQuartzScheduler_Worker-60) [2ab9a462] Exception:: java.lang.ClassCastException: java.lang.String ca
nnot be cast to java.util.Map
at org.ovirt.engine.core.vdsbroker.VmsMonitoring.processVmDevices(VmsMonitoring.java:612) [vdsbroker.jar:]
at org.ovirt.engine.core.vdsbroker.VmsMonitoring.updateVmDevices(VmsMonitoring.java:562) [vdsbroker.jar:]
at org.ovirt.engine.core.vdsbroker.VmsMonitoring.processVmsWithDevicesChange(VmsMonitoring.java:368) [vdsbroker.jar:]
at org.ovirt.engine.core.vdsbroker.VmsMonitoring.refreshVmStats(VmsMonitoring.java:242) [vdsbroker.jar:]
at org.ovirt.engine.core.vdsbroker.VmsMonitoring.perform(VmsMonitoring.java:165) [vdsbroker.jar:]
at org.ovirt.engine.core.vdsbroker.PollVmStatsRefresher.poll(PollVmStatsRefresher.java:36) [vdsbroker.jar:]
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source) [:1.8.0_91]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_91]
at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_91]
at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:81) [scheduler.jar:]
at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:52) [scheduler.jar:]
at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) [quartz.jar:]
</snip>
Other error messages looks like "vms running in db but not in VDS"
<snip>
2016-06-20 04:59:38,195 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-1) [110b2584] VM '59d48f24-d6b8-4e9e-b020-b728ed239da3(appvm03) is running in db and not running in VDS 'host1.lab.eng.blr.redhat.com'
2016-06-20 04:59:38,195 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FullListVDSCommand] (DefaultQuartzScheduler_Worker-1) [110b2584] START, FullListVDSCommand(HostName = , FullListVDSCommandParameters:{runAsync='true', hostId='f4a25596-079b-44ed-970b-2142c67b9873', vds='Host[,f4a25596-079b-44ed-970b-2142c67b9873]', vmIds='[10c5e550-4b07-4a25-9a96-ae016864f2a8]'}), log id: 50edd038
2016-06-20 04:59:38,279 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FullListVDSCommand] (DefaultQuartzScheduler_Worker-1) [110b2584] FINISH, FullListVDSCommand, return: [{status=Up, nicModel=rtl8139,pv, emulatedMachine=rhel6.5.0, guestDiskMapping={e5051a6c-dc96-44eb-b={name=/dev/vda}, QEMU_DVD-ROM_QM00003={name=/dev/sr0}}, vmId=10c5e550-4b07-4a25-9a96-ae016864f2a8, pid=6368, devices=[Ljava.lang.Object;@5e702073, smp=2, vmType=kvm, displayIp=0, display=vnc, displaySecurePort=-1, memSize=4096, displayPort=5900, spiceSecureChannels=smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir, statusTime=4725612060, vmName=HostedEngine, clientIp=, pauseCode=NOERR}], log id: 50edd038
2016-06-20 04:59:38,280 INFO [org.ovirt.engine.core.vdsbroker.VmsMonitoring] (DefaultQuartzScheduler_Worker-1) [110b2584] Received a memballoon Device without an address when processing VM 10c5e550-4b07-4a25-9a96-ae016864f2a8 devices, skipping device: {device=memballoon, specParams={model=none}, deviceType=balloon, type=balloon, alias=balloon0}
2016-06-20 04:59:38,280 ERROR [org.ovirt.engine.core.vdsbroker.VmsMonitoring] (DefaultQuartzScheduler_Worker-1) [110b2584] Failed during vms monitoring on host host1.lab.eng.blr.redhat.com error is: java.lang.ClassCastException
2016-06-20 04:59:38,280 ERROR [org.ovirt.engine.core.vdsbroker.VmsMonitoring] (DefaultQuartzScheduler_Worker-1) [110b2584] Exception:: java.lang.ClassCastException
</snip>
Created attachment 1169794 [details]
engine.log from hosted engine
Also seeing these error messages in engine.log <snip> 2016-06-20 05:10:57,645 ERROR [org.ovirt.engine.core.vdsbroker.VmsMonitoring] (DefaultQuartzScheduler_Worker-53) [7f967349] Failed during vms monitoring on host host1.lab.eng.blr.redhat .com error is: java.lang.ClassCastException 2016-06-20 05:10:57,645 ERROR [org.ovirt.engine.core.vdsbroker.VmsMonitoring] (DefaultQuartzScheduler_Worker-53) [7f967349] Exception:: java.lang.ClassCastException </snip> appvm14 and appvm03 are the VMs stuck in "powering down" state. I am not sure the way to revive these VMs. Created attachment 1169797 [details]
screenshot showing that the vms are stuck in powering down state
I could able to hit this issue in my testing again. I observed that few of the VMs are stuck in powering-down state are on the hosts which runs the hosted engine. Once I migrated hosted-engine from that hosts, those VMs got powered down. Raising the severity as the outcome is unmonitored VMs without any indication to the user. Seems like the issue is unrelated to the powering-off VMs - that just exposed the problem in the monitoring after a change in the devices of one of the VMs. As part of parsing the result of FullList we get ClassCast exception. Until 2016-06-20 04:13:36,984 - there is not stacktrace for the ClassCast exception, so I don't know the root cause for this. In 2016-06-20 04:13:36,984, we start getting ClassCast exception with the stacktrace that shows the problem is that VDSM reports device addresses as String while we expect it to be a Map. This won't happen in 4.0 and above since we don't count on the type anymore. I'll post a 3.6 only fix for this. More investigation is needed in order to identify the cause for the exceptions we got before 2016-06-20 04:13:36,984. Does it still happen? if so, could you please add the output of "vdsClient 0 list" on host1.lab.eng.blr.redhat.com ? (In reply to Arik from comment #9) > Does it still happen? if so, could you please add the output of "vdsClient 0 > list" on host1.lab.eng.blr.redhat.com ? Arik, I couldn't hit it again. This is relevant only on 3.6 and it can not be reproduced anymore. Feel free to reopen if it happens again and provide the all the logs and the result of the "vdsClient 0 list" on the relevant host |