Bug 1680398
Summary: QEMU-GA capabilities are reported as none for a few minutes after VM start. Therefore, VM details are not reported by engine

Product: [oVirt] vdsm
Component: Core
Version: 4.30.8
Hardware: x86_64
OS: Unspecified
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Reporter: Elad <ebenahar>
Assignee: Tomáš Golembiovský <tgolembi>
QA Contact: Petr Matyáš <pmatyas>
CC: aefrat, bugs, dholler, fgarciad, lleistne, mburman, michal.skrivanek, pagranat, rbarry, sfishbai, tgolembi
Keywords: Automation, AutomationBlocker
Flags: rbarry: ovirt-4.3?
Target Milestone: ovirt-4.4.0
Target Release: 4.40.14
Fixed In Version: vdsm-4.40.14
Doc Type: Enhancement
Doc Text: The mechanism for polling QEMU-GA in VDSM has been enhanced to query newly started VMs more often, in order to get the stats as soon as the agent becomes available in the guest.
Type: Bug
oVirt Team: Virt
Last Closed: 2020-05-20 20:03:03 UTC
Bug Blocks: 1661283, 1740058
Description

Elad, 2019-02-24 16:36:49 UTC

Tomas, any ideas?

Capabilities check interval is 5 minutes. Logs above seem to match that. We should consider some simple back-off algorithm for VM start.

In the latest builds with a RHEL 8.0 guest, I see a problem that could be related to the current BZ, though I'm not sure. I describe it here for your consideration, whether it is the same problem or a separate bug.

The problem: start the RHEL 8 guest VM on an iSCSI disk, block the storage on the host, and wait for the Paused state. The RHEL 8 VM does not turn to the Paused state. With a RHEL 7.6 VM it works perfectly: less than a minute after the storage is blocked, it turns to the Paused state.

I attach logs from two tests: first_run.tar.gz and second_run.tar.gz. In test 1 I run two iSCSI VMs, test_rhel7.6 and test_rhel8, for the first time after creating them and wait for an IP, then block the iSCSI storage on the host. As a result, test_rhel7.6 turns to Paused and test_rhel8 does not. Test 2 is the same, except that it is not the first run, so I get the IP sooner; the result is the same: test_rhel8 never turns to the Paused state. I wait until the host turns non-operational at about 2019-04-04 10:14:51, which causes test_rhel8 to migrate.

This behavior causes automation test failures. It happens often, though not in 100% of cases; sometimes it behaves as expected.
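The simple back-off idea mentioned above (poll a freshly started VM frequently, then relax toward the regular 5-minute capabilities check) can be sketched as follows. This is a minimal illustration only, not VDSM's actual implementation; all function names and the chosen intervals are hypothetical.

```python
import time

# Hypothetical intervals: poll often right after VM start, back off
# toward the regular 5-minute capabilities-check interval.
INITIAL_INTERVAL = 10   # seconds between polls right after VM start
MAX_INTERVAL = 300      # the regular 5-minute capabilities check


def next_interval(current):
    """Double the polling interval, capped at MAX_INTERVAL."""
    return min(current * 2, MAX_INTERVAL)


def poll_until_available(query_capabilities, clock=time.monotonic, timeout=900):
    """Poll query_capabilities() with exponential back-off.

    query_capabilities is a hypothetical callable that returns a dict of
    agent capabilities, or None while the agent is not yet up in the guest.
    """
    interval = INITIAL_INTERVAL
    deadline = clock() + timeout
    while clock() < deadline:
        caps = query_capabilities()
        if caps is not None:
            return caps
        time.sleep(interval)
        interval = next_interval(interval)
    return None
```

With this scheme a guest whose agent comes up 30 seconds after boot is noticed within about a minute, instead of waiting for the next fixed 5-minute poll.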
### test1 (the first run, getting IP after the VM is created) starts at:

2019-04-04 09:41:38,079+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-1) [] EVENT_ID: USER_RUN_VM(32), VM test_rhel8 started on Host host_mixed_1
2019-04-04 09:41:53,117+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-91) [] EVENT_ID: USER_RUN_VM(32), VM test_rhel7.6 started on Host host_mixed_1
2019-04-04 09:48:08,806+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-70) [] EVENT_ID: VM_PAUSED_EIO(145), VM test_rhel7.6 has been paused due to storage I/O problem.

### test 2 (the second run) starts at:

2019-04-04 10:05:55,792+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-90) [] EVENT_ID: USER_RUN_VM(32), VM test_rhel7.6 started on Host host_mixed_1
2019-04-04 10:06:10,905+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-6) [] EVENT_ID: USER_RUN_VM(32), VM test_rhel8 started on Host host_mixed_1

At about Thu Apr 4 10:14:51 IDT 2019, host_mixed_1 turns to the non-operational state, which causes test_rhel8 to migrate.

Created attachment 1551703 [details]
first_run.tar
Created attachment 1551704 [details]
second_run.tar
This problem also appears in ovirt-engine-4.3.3.6, with a RHEL 8.0 guest:

2019-05-04 03:29:36,156 - MainThread - RemoteExecutor - DEBUG - [root.16.38/qum5net] OUT: [{u'displayInfo': [{u'tlsPort': u'5901', u'ipAddress': u'10.46.16.38', u'port': u'5900', u'type': u'spice'}], u'memUsage': u'63', u'acpiEnable': u'true', u'guestFQDN': u'', u'vmId': u'43647a6f-bf30-478d-b2f6-9e40db8bda0c', u'session': u'Unknown', u'timeOffset': u'0', u'memoryStats': {u'swap_out': 0, u'majflt': 0, u'minflt': 0, u'mem_free': u'378480', u'swap_in': 0, u'pageflt': 0, u'mem_total': u'772136', u'mem_unused': u'378480'}, u'balloonInfo': {u'balloon_max': u'1048576', u'balloon_cur': u'1048576', u'balloon_target': u'1048576', u'balloon_min': u'1048576'}, u'pauseCode': u'NOERR', u'network': {u'vnet0': {u'macAddr': u'00:1a:4a:16:88:ac', u'rxDropped': u'0', u'tx': u'4149', u'txDropped': u'0', u'rxErrors': u'0', u'rx': u'254188', u'txErrors': u'0', u'state': u'unknown', u'sampleTime': 4319342.34, u'speed': u'1000', u'name': u'vnet0'}}, u'vmType': u'kvm', u'cpuUser': u'0.05', u'elapsedTime': u'311', u'vmJobs': {}, u'cpuSys': u'0.40', u'appsList': [u'qemu-guest-agent-2.12.0'], u'vmName': u'copy_disk_vm_glusterfs', u'vcpuCount': u'1', u'hash': u'-8771708058453342833', u'cpuUsage': u'8600000000', u'vcpuPeriod': 100000

Logs attached.

Created attachment 1564473 [details]
New_Logs
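In the stats dump above, the sign that qemu-ga has started reporting is the presence of guest-side data such as `memoryStats` and `appsList` (while `guestFQDN` is still empty, so the guest info is not yet complete). A small helper to classify such a stats dict could look like the following; this is purely illustrative and not a VDSM API.

```python
def agent_reporting(vm_stats):
    """Heuristic: True if the stats dict carries data that only the
    guest agent can supply (guest-side memory totals, installed apps)."""
    mem = vm_stats.get('memoryStats') or {}
    return bool(mem.get('mem_total')) and bool(vm_stats.get('appsList'))


# Trimmed-down version of the getAllVmStats entry shown above:
stats = {
    'vmName': 'copy_disk_vm_glusterfs',
    'guestFQDN': '',
    'memoryStats': {'mem_total': '772136', 'mem_unused': '378480'},
    'appsList': ['qemu-guest-agent-2.12.0'],
}

print(agent_reporting(stats))  # prints True: the agent is supplying data
```

A VM whose agent has not come up yet would report an empty `memoryStats` and `appsList`, and the helper would return False — which is exactly the window this bug is about.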
Not related to RHEL 8 at all, just to qemu-ga without ovirt-ga, which can happen on any guest.

Created attachment 1613592 [details]
New_Logs_4.3.6.5V

Work is being done on fixing the issue.

This bug is on POST for 4.4.1, but the referenced patches are included in vdsm v4.40.13. Can we move this to MODIFIED for 4.4.0?

Verified on vdsm-4.40.14-1.el8ev.x86_64.

This bugzilla is included in the oVirt 4.4.0 release, published on May 20th 2020. Since the problem described in this bug report should be resolved in the oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.