Created attachment 1538179 [details]
engine, hypervisor and logs from the guest

Description of problem:
After starting a VM with a RHEL 8 guest that has the qemu guest agent installed, VM details such as IP and FQDN are not reported by the engine. In vdsm.log I observed that the QEMU-GA capabilities for the VM are commands=[].

Version-Release number of selected component (if applicable):
Hypervisor:
vdsm-4.30.9-1.el7ev.x86_64
qemu-kvm-rhev-2.12.0-21.el7.x86_64
libvirt-4.5.0-10.el7_6.4.x86_64
sanlock-3.6.0-1.el7.x86_64
selinux-policy-3.13.1-229.el7_6.9.noarch
kernel - 3.10.0-957.10.1.el7.x86_64

Guest: RHEL8
kernel - 4.18.0-67.el8.x86_64
qemu-guest-agent-2.12.0-61.module+el8+2786+5afd5ae3.x86_64
selinux-policy-3.14.1-59.el8.noarch

Engine:
ovirt-engine-4.3.1.1-0.1.el7.noarch

How reproducible:
Sometimes. This causes many failures in RHV QE automation, as we rely on the VM IP for the SSH connection.

Steps to Reproduce:
1. Create a VM with RHEL 8 installed and the qemu guest agent running in the guest
2. Start the VM

Actual results:
No QEMU-GA capabilities after VM start:

2019-02-24 17:38:50,834+0200 INFO (qgapoller/1) [vds] New QEMU-GA capabilities for vm_id=568dbc4a-e8ca-43a9-bd48-0cad263ea767, qemu-ga=None, commands=[] (qemuguestagent:171)

In the guest, I observed the following audit messages (which resemble the ones reported in bug 1663092):

[ 632.789546] audit: type=1400 audit(1551023330.829:3): avc: denied { read } for pid=653 comm="qemu-ga" name="b252:3" dev="tmpfs" ino=19466 scontext=system_u:system_r:virt_qemu_ga_t:s0 tcontext=system_u:object_r:udev_var_run_t:s0 tclass=file permissive=1
[ 632.886759] audit: type=1400 audit(1551023330.829:4): avc: denied { open } for pid=653 comm="qemu-ga" path="/run/udev/data/b252:3" dev="tmpfs" ino=19466 scontext=system_u:system_r:virt_qemu_ga_t:s0 tcontext=system_u:object_r:udev_var_run_t:s0 tclass=file permissive=1
[ 632.994361] audit: type=1400 audit(1551023330.830:5): avc: denied { getattr } for pid=653 comm="qemu-ga" path="/run/udev/data/b252:3" dev="tmpfs" ino=19466 scontext=system_u:system_r:virt_qemu_ga_t:s0 tcontext=system_u:object_r:udev_var_run_t:s0 tclass=file permissive=1

Also, in libvirtd.log, I saw these messages:

2019-02-24 15:38:50.833+0000: 3047: error : qemuDomainAgentAvailable:9133 : Guest agent is not responding: QEMU guest agent is not connected

Only after ~5 minutes are the capabilities available:

2019-02-24 17:43:50,866+0200 INFO (qgapoller/0) [vds] New QEMU-GA capabilities for vm_id=568dbc4a-e8ca-43a9-bd48-0cad263ea767, qemu-ga=2.12.0, commands=set([u'guest-get-timezone', u'guest-suspend-hybrid', u'guest-get-fsinfo', u'guest-set-time', u'guest-sync', u'guest-get-memory-block-info', u'guest-fsfreeze-freeze', u'guest-network-get-interfaces', u'guest-fstrim', u'guest-get-host-name', u'guest-shutdown', u'guest-get-vcpus', u'guest-sync-delimited', u'guest-set-user-password', u'guest-set-memory-blocks', u'guest-fsfreeze-status', u'guest-ping', u'guest-fsfreeze-freeze-list', u'guest-info', u'guest-get-memory-blocks', u'guest-set-vcpus', u'guest-fsfreeze-thaw', u'guest-suspend-disk', u'guest-get-osinfo', u'guest-get-time', u'guest-suspend-ram', u'guest-get-users']) (qemuguestagent:171)

Expected results:
QEMU-GA capabilities should be reported immediately after the qemu guest agent is up and running.
Additional info:
engine, hypervisor and guest logs are attached

Guest:
[root@localhost audit]# getenforce
Permissive

[root@localhost audit]# systemctl status qemu-guest-agent.service
● qemu-guest-agent.service - QEMU Guest Agent
   Loaded: loaded (/usr/lib/systemd/system/qemu-guest-agent.service; enabled; vendor preset: enabled)
   Active: active (running) since Sun 2019-02-24 15:39:15 EST; 4h 3min left
 Main PID: 653 (qemu-ga)
    Tasks: 1 (limit: 4648)
   Memory: 2.6M
   CGroup: /system.slice/qemu-guest-agent.service
           └─653 /usr/bin/qemu-ga --method=virtio-serial --path=/dev/virtio-ports/org.qemu.guest_agent.0 --blacklist=guest-file-open,guest-file-close,guest-file-read,guest-file-write,guest-file-seek,guest-file->

Feb 24 15:39:15 localhost.localdomain systemd[1]: Started QEMU Guest Agent.
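As a side note for anyone triaging similar reports: the denied SELinux permissions can be summarized from the AVC lines quoted above with a small parser. This is only an illustrative sketch (not part of vdsm or qemu-ga) for grouping denials by target context:

```python
import re

# Matches the AVC denial lines quoted above (audit type=1400 records).
AVC_RE = re.compile(
    r"avc:\s+denied\s+\{\s*(?P<perm>[\w ]+?)\s*\}\s+for\s+.*?"
    r'comm="(?P<comm>[^"]+)".*?'
    r"scontext=(?P<scontext>\S+)\s+tcontext=(?P<tcontext>\S+)\s+tclass=(?P<tclass>\S+)"
)

def summarize_denials(lines):
    """Group denied permissions by (comm, tcontext, tclass)."""
    summary = {}
    for line in lines:
        m = AVC_RE.search(line)
        if not m:
            continue
        key = (m.group("comm"), m.group("tcontext"), m.group("tclass"))
        summary.setdefault(key, set()).add(m.group("perm"))
    return summary

# Two of the denials reported in this bug:
sample = [
    'audit: type=1400 audit(1551023330.829:3): avc: denied { read } for pid=653 '
    'comm="qemu-ga" name="b252:3" dev="tmpfs" ino=19466 '
    'scontext=system_u:system_r:virt_qemu_ga_t:s0 '
    'tcontext=system_u:object_r:udev_var_run_t:s0 tclass=file permissive=1',
    'audit: type=1400 audit(1551023330.829:4): avc: denied { open } for pid=653 '
    'comm="qemu-ga" path="/run/udev/data/b252:3" dev="tmpfs" ino=19466 '
    'scontext=system_u:system_r:virt_qemu_ga_t:s0 '
    'tcontext=system_u:object_r:udev_var_run_t:s0 tclass=file permissive=1',
]
print(summarize_denials(sample))
```

For this bug the summary shows qemu-ga being denied read/open/getattr on udev_var_run_t files, which matches the denials discussed in bug 1663092.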
Tomas, any ideas?
The capabilities check interval is 5 minutes; the logs above seem to match that. We should consider a simple back-off algorithm for VM start.
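A minimal sketch of what such a back-off could look like: poll frequently right after VM start and grow the delay toward the regular 5-minute (300 s) interval. All names and values here are illustrative, not vdsm's actual poller implementation:

```python
def backoff_schedule(initial=2.0, factor=2.0, cap=300.0, budget=300.0):
    """Yield poll delays after VM start.

    Starts fast so freshly booted agents are picked up quickly, then
    backs off geometrically toward the regular 300 s check interval.
    Parameters are hypothetical defaults, not vdsm configuration.
    """
    delay, elapsed = initial, 0.0
    while elapsed < budget:
        yield delay
        elapsed += delay
        delay = min(delay * factor, cap)

# First-boot polling attempts before falling back to the normal interval:
print(list(backoff_schedule()))
```

With these defaults the agent would be probed at 2, 4, 8, ... seconds after start instead of waiting a full 5-minute cycle, which addresses the ~5-minute delay seen in the logs above.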
In the latest builds with a RHEL 8.0 guest, I see a problem that could be related to the current BZ, though I'm not sure. I describe it here for your consideration, whether it is the same problem or a separate bug.

The problem:
1. Start the rhel8 guest VM on an iSCSI disk.
2. Block the storage on the host.
3. Wait for the Paused state.

The rhel8 VM does not turn to the Paused state. With a rhel7.6 VM it works perfectly - less than a minute after the storage is blocked, it turns to the Paused state.

I attach logs from two tests - first_run.tar.gz and second_run.tar.gz.

In test 1 I run two iSCSI VMs, rhel7.6 and rhel8, for the first time after creating them, and wait for IP. I block the iSCSI storage on the host => as a result, test_rhel7.6 turns to Paused and test_rhel8 does not.

Test 2 is just the same. The difference is that it is not the first run, so I get the IP sooner, though the result is the same: test_rhel8 never turns to the Paused state. I wait until the host turns to non-operational at about 2019-04-04 10:14:51, which causes test_rhel8 to migrate.

This behavior causes automation test failures. It happens a lot, though not in 100% of cases; sometimes it behaves as expected.

### test 1 (the first run, getting IP after the VM is created) starts at:
2019-04-04 09:41:38,079+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-1) [] EVENT_ID: USER_RUN_VM(32), VM test_rhel8 started on Host host_mixed_1
2019-04-04 09:41:53,117+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-91) [] EVENT_ID: USER_RUN_VM(32), VM test_rhel7.6 started on Host host_mixed_1
2019-04-04 09:48:08,806+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-70) [] EVENT_ID: VM_PAUSED_EIO(145), VM test_rhel7.6 has been paused due to storage I/O problem.
### test 2 (the second run) starts at:
2019-04-04 10:05:55,792+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-90) [] EVENT_ID: USER_RUN_VM(32), VM test_rhel7.6 started on Host host_mixed_1
2019-04-04 10:06:10,905+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-6) [] EVENT_ID: USER_RUN_VM(32), VM test_rhel8 started on Host host_mixed_1

At about Thu Apr 4 10:14:51 IDT 2019, host_mixed_1 turns to the non-operational state, which causes test_rhel8 to migrate.
Created attachment 1551703 [details] first_run.tar
Created attachment 1551704 [details] second_run.tar
This problem also appears in ovirt-engine-4.3.3.6 with a RHEL 8.0 guest:

2019-05-04 03:29:36,156 - MainThread - RemoteExecutor - DEBUG - [root.16.38/qum5net] OUT: [{u'displayInfo': [{u'tlsPort': u'5901', u'ipAddress': u'10.46.16.38', u'port': u'5900', u'type': u'spice'}], u'memUsage': u'63', u'acpiEnable': u'true', u'guestFQDN': u'', u'vmId': u'43647a6f-bf30-478d-b2f6-9e40db8bda0c', u'session': u'Unknown', u'timeOffset': u'0', u'memoryStats': {u'swap_out': 0, u'majflt': 0, u'minflt': 0, u'mem_free': u'378480', u'swap_in': 0, u'pageflt': 0, u'mem_total': u'772136', u'mem_unused': u'378480'}, u'balloonInfo': {u'balloon_max': u'1048576', u'balloon_cur': u'1048576', u'balloon_target': u'1048576', u'balloon_min': u'1048576'}, u'pauseCode': u'NOERR', u'network': {u'vnet0': {u'macAddr': u'00:1a:4a:16:88:ac', u'rxDropped': u'0', u'tx': u'4149', u'txDropped': u'0', u'rxErrors': u'0', u'rx': u'254188', u'txErrors': u'0', u'state': u'unknown', u'sampleTime': 4319342.34, u'speed': u'1000', u'name': u'vnet0'}}, u'vmType': u'kvm', u'cpuUser': u'0.05', u'elapsedTime': u'311', u'vmJobs': {}, u'cpuSys': u'0.40', u'appsList': [u'qemu-guest-agent-2.12.0'], u'vmName': u'copy_disk_vm_glusterfs', u'vcpuCount': u'1', u'hash': u'-8771708058453342833', u'cpuUsage': u'8600000000', u'vcpuPeriod': 100000

Logs attached
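Note that in the stats dump above, appsList already shows qemu-guest-agent while guestFQDN is still empty. A hypothetical helper like this (not part of vdsm or our automation) is the kind of check automation could use before attempting an SSH connection, instead of treating an installed agent as "ready":

```python
def agent_info_ready(vm_stats):
    """Return True when the guest agent has reported usable identity data.

    Heuristic sketch: the agent package can appear in appsList while
    guestFQDN is still empty, as in the stats dump above, so the FQDN
    field itself is the signal worth waiting on.
    """
    return bool(vm_stats.get("guestFQDN"))

# Shape matches the dump above: agent installed, but no FQDN reported yet.
stats = {"guestFQDN": u"", "appsList": [u"qemu-guest-agent-2.12.0"]}
print(agent_info_ready(stats))
```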
Created attachment 1564473 [details] New_Logs
This is not related to RHEL 8 at all, just to running qemu-ga without ovirt-ga, which can happen on any guest.
Created attachment 1613592 [details] New_Logs_4.3.6.5V
Work is being done on fixing the issue.
This bug is on POST for 4.4.1, but the referenced patches are included in vdsm v4.40.13. Can we move this to MODIFIED for 4.4.0?
Verified on vdsm-4.40.14-1.el8ev.x86_64
This bugzilla is included in the oVirt 4.4.0 release, published on May 20th, 2020. Since the problem described in this bug report should be resolved in oVirt 4.4.0, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.