Bug 1690301
| Field | Value |
|---|---|
| Summary | VDSM fails to query the guest agent with KeyError: 'appsList' |
| Product | [oVirt] vdsm |
| Reporter | Mugdha Soni <musoni> |
| Component | General |
| Assignee | Dan Kenigsberg <danken> |
| Status | CLOSED DUPLICATE |
| QA Contact | Lukas Svaty <lsvaty> |
| Severity | high |
| Docs Contact | |
| Priority | unspecified |
| Version | 4.30.10 |
| CC | bugs, pmatyas, rbarry, rhs-bugs, sabose, sankarshan, sasundar, stirabos, tgolembi |
| Target Milestone | --- |
| Target Release | --- |
| Hardware | x86_64 |
| OS | Linux |
| Whiteboard | |
| Fixed In Version | |
| Doc Type | If docs needed, set a value |
| Doc Text | |
| Story Points | --- |
| Clone Of | 1690294 |
| Environment | |
| Last Closed | 2019-03-20 00:51:29 UTC |
| Type | --- |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | Virt |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | |
| Bug Blocks | 1690294 |
| Attachments | vdsm.log |
Description
Mugdha Soni, 2019-03-19 09:06:06 UTC
ovirt-engine logs: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/mugdha/ovirt-engine-log/

Let's focus on engine.log.
The engine sees the bootstrap local VM as an external VM:
2019-03-18 15:59:06,530+05 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-77) [] VM 'fdafa94f-07bc-4bfd-ae02-f44ddb2f7a43' was discovered as 'Up' on VDS '7a3ed5ac-786e-476c-8120-e815d9934f98'(rhsqa-grafton1.lab.eng.blr.redhat.com)
2019-03-18 15:59:06,550+05 INFO [org.ovirt.engine.core.bll.AddUnmanagedVmsCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-77) [136031b] Running command: AddUnmanagedVmsCommand internal: true.
2019-03-18 15:59:06,553+05 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DumpXmlsVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-77) [136031b] START, DumpXmlsVDSCommand(HostName = rhsqa-grafton1.lab.eng.blr.redhat.com, Params:{hostId='7a3ed5ac-786e-476c-8120-e815d9934f98', vmIds='[fdafa94f-07bc-4bfd-ae02-f44ddb2f7a43]'}), log id: 95d0bca
2019-03-18 15:59:06,558+05 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DumpXmlsVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-77) [136031b] FINISH, DumpXmlsVDSCommand, return: {fdafa94f-07bc-4bfd-ae02-f44ddb2f7a43=<domain type='kvm' id='1'>
<name>HostedEngineLocal</name>
<uuid>fdafa94f-07bc-4bfd-ae02-f44ddb2f7a43</uuid>
<memory unit='KiB'>16740352</memory>
<currentMemory unit='KiB'>16740352</currentMemory>
<vcpu placement='static'>4</vcpu>
<resource>
<partition>/machine</partition>
</resource>
<os>
<type arch='x86_64' machine='pc-i440fx-rhel7.6.0'>hvm</type>
<boot dev='hd'/>
<bootmenu enable='no'/>
</os>
<features>
<acpi/>
<apic/>
</features>
<cpu mode='custom' match='exact' check='full'>
<model fallback='forbid'>Haswell-noTSX-IBRS</model>
<feature policy='require' name='vme'/>
<feature policy='require' name='f16c'/>
<feature policy='require' name='rdrand'/>
<feature policy='require' name='hypervisor'/>
<feature policy='require' name='arat'/>
<feature policy='require' name='xsaveopt'/>
<feature policy='require' name='abm'/>
</cpu>
<clock offset='utc'>
<timer name='kvmclock' present='yes'/>
</clock>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>destroy</on_crash>
<pm>
<suspend-to-mem enabled='no'/>
<suspend-to-disk enabled='no'/>
</pm>
<devices>
<emulator>/usr/libexec/qemu-kvm</emulator>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/var/tmp/localvmDxh30G/images/4276937a-3a05-4a4a-a4c9-771eb4f9a0a9/010d2981-0230-4672-92f7-9c6ed0ef6bb9'/>
<backingStore/>
<target dev='vda' bus='virtio'/>
<alias name='virtio-disk0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</disk>
<disk type='file' device='cdrom'>
<driver name='qemu' type='raw'/>
<source file='/var/tmp/localvmDxh30G/seed.iso'/>
<backingStore/>
<target dev='hda' bus='ide'/>
<readonly/>
<alias name='ide0-0-0'/>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
<controller type='usb' index='0' model='none'/>
<controller type='pci' index='0' model='pci-root'>
<alias name='pci.0'/>
</controller>
<controller type='ide' index='0'>
<alias name='ide'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
</controller>
<controller type='virtio-serial' index='0'>
<alias name='virtio-serial0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</controller>
<interface type='network'>
<mac address='00:45:54:19:48:01'/>
<source network='default' bridge='virbr0'/>
<target dev='vnet0'/>
<model type='virtio'/>
<alias name='net0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
<serial type='pty'>
<source path='/dev/pts/4'/>
<target type='isa-serial' port='0'>
<model name='isa-serial'/>
</target>
<alias name='serial0'/>
</serial>
<console type='pty' tty='/dev/pts/4'>
<source path='/dev/pts/4'/>
<target type='serial' port='0'/>
<alias name='serial0'/>
</console>
<channel type='unix'>
<source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-1-HostedEngineLocal/org.qemu.guest_agent.0'/>
<target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
<alias name='channel0'/>
<address type='virtio-serial' controller='0' bus='0' port='1'/>
</channel>
<input type='mouse' bus='ps2'>
<alias name='input0'/>
</input>
<input type='keyboard' bus='ps2'>
<alias name='input1'/>
</input>
<graphics type='vnc' port='5900' autoport='yes' listen='127.0.0.1'>
<listen type='address' address='127.0.0.1'/>
</graphics>
<video>
<model type='vga' vram='16384' heads='1' primary='yes'/>
<alias name='video0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</video>
<memballoon model='none'/>
<rng model='virtio'>
<backend model='random'>/dev/random</backend>
<alias name='rng0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</rng>
</devices>
<seclabel type='dynamic' model='selinux' relabel='yes'>
<label>system_u:system_r:svirt_t:s0:c119,c535</label>
<imagelabel>system_u:object_r:svirt_image_t:s0:c119,c535</imagelabel>
</seclabel>
<seclabel type='dynamic' model='dac' relabel='yes'>
<label>+107:+107</label>
<imagelabel>+107:+107</imagelabel>
</seclabel>
</domain>
}, log id: 95d0bca
2019-03-18 15:59:06,727+05 INFO [org.ovirt.engine.core.bll.AddVmFromScratchCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-77) [616c5678] Lock Acquired to object 'EngineLock:{exclusiveLocks='[external-HostedEngineLocal=VM_NAME]', sharedLocks=''}'
2019-03-18 15:59:06,815+05 INFO [org.ovirt.engine.core.bll.AddVmFromScratchCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-77) [] Running command: AddVmFromScratchCommand internal: true. Entities affected : ID: c7c98b60-4967-11e9-9569-004554194801 Type: ClusterAction group CREATE_VM with role type USER
2019-03-18 15:59:07,000+05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-77) [] EVENT_ID: USER_ADD_VM(34), VM external-HostedEngineLocal was created by SYSTEM.
2019-03-18 15:59:07,001+05 INFO [org.ovirt.engine.core.bll.AddVmFromScratchCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-77) [] Lock freed to object 'EngineLock:{exclusiveLocks='[external-HostedEngineLocal=VM_NAME]', sharedLocks=''}'
2019-03-18 15:59:07,012+05 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmDevicesMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-77) [] Received a console Device without an address when processing VM fdafa94f-07bc-4bfd-ae02-f44ddb2f7a43 devices, skipping device: {alias=serial0, type=console, device=console, deviceId=ec5ad979-5cd6-443c-97ca-6c005d0709f4}
2019-03-18 15:59:07,013+05 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmDevicesMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-77) [] Received a vga Device without an address when processing VM fdafa94f-07bc-4bfd-ae02-f44ddb2f7a43 devices, skipping device: {alias=video0, type=video, device=vga, deviceId=1359c3ee-0ea3-45ce-a212-7e34192c95e3}
2019-03-18 15:59:07,013+05 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmDevicesMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-77) [] Received a vnc Device without an address when processing VM fdafa94f-07bc-4bfd-ae02-f44ddb2f7a43 devices, skipping device: {alias=, type=graphics, device=vnc, deviceId=2bedfc6a-e134-4238-a5d1-e29aa8019d03}
2019-03-18 15:59:07,026+05 INFO [org.ovirt.engine.core.bll.AddUnmanagedVmsCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-77) [] Importing VM 'HostedEngineLocal' as 'external-HostedEngineLocal', as it is running on the on Host, but does not exist in the engine.
2019-03-18 15:59:07,041+05 INFO [org.ovirt.engine.core.vdsbroker.VdsManager] (EE-ManagedThreadFactory-engineScheduled-Thread-77) [] VMs initialization finished for Host: 'rhsqa-grafton1.lab.eng.blr.redhat.com:7a3ed5ac-786e-476c-8120-e815d9934f98'
That bootstrap VM in our case has UUID fdafa94f-07bc-4bfd-ae02-f44ddb2f7a43.
Then we restart the engine on the target VM on the shared storage.
hosted-engine-setup then tries to remove the bootstrap engine VM to avoid any leftovers, but it fails with:
2019-03-18 16:07:59,115+05 INFO [org.ovirt.engine.core.bll.ShutdownVmCommand] (default task-1) [eeb95d8c-8bed-428a-9439-e4241f489a57] Running command: ShutdownVmCommand internal: false. Entities affected : ID: fdafa94f-07bc-4bfd-ae02-f44ddb2f7a43 Type: VMAction group SHUT_DOWN_VM with role type USER
2019-03-18 16:07:59,138+05 INFO [org.ovirt.engine.core.bll.ShutdownVmCommand] (default task-1) [eeb95d8c-8bed-428a-9439-e4241f489a57] Entered (VM 'external-HostedEngineLocal').
2019-03-18 16:07:59,138+05 INFO [org.ovirt.engine.core.bll.ShutdownVmCommand] (default task-1) [eeb95d8c-8bed-428a-9439-e4241f489a57] Sending shutdown command for VM 'external-HostedEngineLocal'.
2019-03-18 16:07:59,142+05 INFO [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (default task-1) [eeb95d8c-8bed-428a-9439-e4241f489a57] START, DestroyVmVDSCommand( DestroyVmVDSCommandParameters:{hostId='7a3ed5ac-786e-476c-8120-e815d9934f98', vmId='fdafa94f-07bc-4bfd-ae02-f44ddb2f7a43', secondsToWait='30', gracefully='true', reason='', ignoreNoVm='false'}), log id: 16d89021
2019-03-18 16:07:59,156+05 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (default task-1) [eeb95d8c-8bed-428a-9439-e4241f489a57] START, DestroyVDSCommand(HostName = rhsqa-grafton1.lab.eng.blr.redhat.com, DestroyVmVDSCommandParameters:{hostId='7a3ed5ac-786e-476c-8120-e815d9934f98', vmId='fdafa94f-07bc-4bfd-ae02-f44ddb2f7a43', secondsToWait='30', gracefully='true', reason='', ignoreNoVm='false'}), log id: 66deda5d
2019-03-18 16:07:59,161+05 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (default task-1) [eeb95d8c-8bed-428a-9439-e4241f489a57] Destroy VM couldn't find VM 'fdafa94f-07bc-4bfd-ae02-f44ddb2f7a43'. If after Cancel Migration and VM is UP on source, ignore next error, it's just a clean-up call
2019-03-18 16:07:59,161+05 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (default task-1) [eeb95d8c-8bed-428a-9439-e4241f489a57] Failed in 'DestroyVDS' method
2019-03-18 16:07:59,169+05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-1) [eeb95d8c-8bed-428a-9439-e4241f489a57] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM rhsqa-grafton1.lab.eng.blr.redhat.com command DestroyVDS failed: Virtual machine does not exist
2019-03-18 16:07:59,169+05 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (default task-1) [eeb95d8c-8bed-428a-9439-e4241f489a57] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand' return value 'StatusOnlyReturn [status=Status [code=1, message=Virtual machine does not exist]]'
2019-03-18 16:07:59,170+05 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (default task-1) [eeb95d8c-8bed-428a-9439-e4241f489a57] HostName = rhsqa-grafton1.lab.eng.blr.redhat.com
2019-03-18 16:07:59,170+05 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (default task-1) [eeb95d8c-8bed-428a-9439-e4241f489a57] Command 'DestroyVDSCommand(HostName = rhsqa-grafton1.lab.eng.blr.redhat.com, DestroyVmVDSCommandParameters:{hostId='7a3ed5ac-786e-476c-8120-e815d9934f98', vmId='fdafa94f-07bc-4bfd-ae02-f44ddb2f7a43', secondsToWait='30', gracefully='true', reason='', ignoreNoVm='false'})' execution failed: VDSGenericException: VDSErrorException: Failed to DestroyVDS, error = Virtual machine does not exist, code = 1
2019-03-18 16:07:59,170+05 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (default task-1) [eeb95d8c-8bed-428a-9439-e4241f489a57] FINISH, DestroyVDSCommand, return: , log id: 66deda5d
2019-03-18 16:07:59,170+05 ERROR [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (default task-1) [eeb95d8c-8bed-428a-9439-e4241f489a57] Failed to destroy VM 'fdafa94f-07bc-4bfd-ae02-f44ddb2f7a43' in VDS = '7a3ed5ac-786e-476c-8120-e815d9934f98' , error = 'org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to DestroyVDS, error = Virtual machine does not exist, code = 1'
2019-03-18 16:07:59,170+05 INFO [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (default task-1) [eeb95d8c-8bed-428a-9439-e4241f489a57] FINISH, DestroyVmVDSCommand, return: , log id: 16d89021
2019-03-18 16:07:59,170+05 ERROR [org.ovirt.engine.core.bll.ShutdownVmCommand] (default task-1) [eeb95d8c-8bed-428a-9439-e4241f489a57] Command 'org.ovirt.engine.core.bll.ShutdownVmCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to DestroyVDS, error = Virtual machine does not exist, code = 1 (Failed with error noVM and code 1)
2019-03-18 16:07:59,199+05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-1) [eeb95d8c-8bed-428a-9439-e4241f489a57] EVENT_ID: USER_FAILED_SHUTDOWN_VM(74), Failed to initiate shutdown on VM external-HostedEngineLocal (Host: rhsqa-grafton1.lab.eng.blr.redhat.com, User: admin@internal-authz).
2019-03-18 16:07:59,217+05 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-1) [] Operation Failed: [Desktop does not exist]
but the root cause is here:
2019-03-18 16:08:07,344+05 WARN [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-10) [] Unexpected return value: Status [code=-32603, message=Internal JSON-RPC error: {'reason': "'appsList'"}]
2019-03-18 16:08:07,344+05 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-10) [] Failed in 'GetAllVmStatsVDS' method
2019-03-18 16:08:07,344+05 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-10) [] Command 'GetAllVmStatsVDSCommand(HostName = rhsqa-grafton1.lab.eng.blr.redhat.com, VdsIdVDSCommandParametersBase:{hostId='7a3ed5ac-786e-476c-8120-e815d9934f98'})' execution failed: VDSGenericException: VDSErrorException: Failed to GetAllVmStatsVDS, error = Internal JSON-RPC error: {'reason': "'appsList'"}, code = -32603
2019-03-18 16:08:07,344+05 INFO [org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher] (EE-ManagedThreadFactory-engineScheduled-Thread-10) [] Failed to fetch vms info for host 'rhsqa-grafton1.lab.eng.blr.redhat.com' - skipping VMs monitoring.
GetAllVmStatsVDS on rhsqa-grafton1.lab.eng.blr.redhat.com failed with error = Internal JSON-RPC error: {'reason': "'appsList'"}, code = -32603, so the engine's view of the VMs on rhsqa-grafton1.lab.eng.blr.redhat.com was not up to date and ovirt-hosted-engine-setup could not remove the bootstrap local engine VM.
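To spell out why one bad VM hides every VM on the host: GetAllVmStats is a bulk call, so an unhandled exception while collecting a single VM's stats turns into an Internal JSON-RPC error for the whole request. A minimal sketch of that failure mode (illustrative only, not the actual VDSM code; the dict shapes and helper names are made up):

```python
# Illustrative sketch only -- not the actual VDSM implementation.
# If a bulk stats call does not guard each per-VM query, one KeyError
# (here, a missing 'appsList' key) aborts the whole reply, and the
# caller is left with no VM list for the host at all.
def get_vm_stats(vm):
    stats = dict(vm)
    del stats['appsList']  # raises KeyError when the guest agent sent no app list
    return stats

def get_all_vm_stats(vms):
    return [get_vm_stats(vm) for vm in vms]  # one failure poisons the bulk call

vms = [
    {'vmId': 'fdafa94f', 'appsList': []},
    {'vmId': '44c126ae'},  # no 'appsList' -> KeyError -> whole call fails
]

try:
    get_all_vm_stats(vms)
except KeyError as exc:
    print('GetAllVmStats failed: %s' % exc)  # mirrors the engine-side error
```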
Mugdha, can you please attach /var/log/vdsm/vdsm.log from rhsqa-grafton1.lab.eng.blr.redhat.com (for the relevant time) to better understand why GetAllVmStatsVDS there failed?
It seems to be an issue with vdsm querying ovirt-guest-agent:
2019-03-18 16:07:48,165+0530 ERROR (jsonrpc/3) [api] FINISH getStats error='appsList' (api:134)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 124, in method
ret = func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 361, in getStats
stats = vm.getStats().copy()
File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 1766, in getStats
oga_stats = self._getGuestStats()
File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 1967, in _getGuestStats
stats = self.guestAgent.getGuestInfo()
File "/usr/lib/python2.7/site-packages/vdsm/virt/guestagent.py", line 505, in getGuestInfo
del qga['appsList']
KeyError: 'appsList'
2019-03-18 16:07:48,165+0530 INFO (jsonrpc/3) [api.virt] FINISH getStats return={'status': {'message': 'General Exception: ("\'appsList\'",)', 'code': 100}} from=::1,48218, vmId=44c126ae-e9d8-409a-85ae-5288060659dd (api:54)
Note that 44c126ae-e9d8-409a-85ae-5288060659dd is the target VM where the engine is actually running, not the bootstrap VM, which is already down.
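The statement that blows up is the unconditional `del qga['appsList']` in vdsm/virt/guestagent.py: when the guest agent reply carries no application list, the key is simply absent. A minimal defensive sketch of that step (illustration only, assuming the dict shape; the actual fix is the Gerrit patch cited at the end of this report and may differ):

```python
# Illustration only; the real fix is in the patch referenced below.
def merge_qga_stats(qga):
    """Mimic the failing step of guestagent.getGuestInfo() without the KeyError."""
    stats = dict(qga)
    # The traceback shows an unconditional `del qga['appsList']`; popping
    # with a default tolerates replies that never contained the key.
    stats.pop('appsList', None)
    return stats

# A guest-agent reply without 'appsList' no longer raises:
print(merge_qga_stats({'guestName': 'HostedEngine'}))
```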
2019-03-18 16:11:07,595+05 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DumpXmlsVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-77) [] START, DumpXmlsVDSCommand(HostName = rhsqa-grafton1.lab.eng.blr.redhat.com, Params:{hostId='7a3ed5ac-786e-476c-8120-e815d9934f98', vmIds='[44c126ae-e9d8-409a-85ae-5288060659dd]'}), log id: 54a08ec9
2019-03-18 16:11:07,600+05 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DumpXmlsVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-77) [] FINISH, DumpXmlsVDSCommand, return: {44c126ae-e9d8-409a-85ae-5288060659dd=<domain type='kvm' id='2'>
<name>HostedEngine</name>
<uuid>44c126ae-e9d8-409a-85ae-5288060659dd</uuid>
<metadata xmlns:ns0="http://ovirt.org/vm/tune/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0">
Moving to VDSM for further investigation.
Created attachment 1545557: vdsm.log

Maybe https://gerrit.ovirt.org/#/c/98499/ fixes this as well. Tomáš, can you please take a look?

Yes, that patch fixes exactly this issue.

*** This bug has been marked as a duplicate of bug 1676893 ***

*** Bug 1690379 has been marked as a duplicate of this bug. ***