Bug 1933669
Summary: | Improve live snapshot logs in case of aborting | ||
---|---|---|---|
Product: | [oVirt] vdsm | Reporter: | Liran Rotenberg <lrotenbe> |
Component: | Core | Assignee: | Liran Rotenberg <lrotenbe> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Tamir <tamir> |
Severity: | low | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.40.13 | CC: | ahadas, bugs, dfodor, jean-louis |
Target Milestone: | ovirt-4.4.6 | Keywords: | CodeChange |
Target Release: | 4.40.60.5 | Flags: | pm-rhel:
ovirt-4.4+
|
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | vdsm-4.40.60.5 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-05-05 05:35:53 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Liran Rotenberg
2021-03-01 11:34:16 UTC
Verified on RHV 4.4.6-6. All looks good to me. Env: - Engine instance with RHV 4.4.6-6 (ovirt-engine-4.4.6.5-0.17.el8ev) and RHEL 8.4 installed. - 3 hosts with RHV 4.4.6-6 and RHEL 8.4, vdsm-4.40.60.5-1.el8ev I verified both triggers: (I) The job hit the timeout, then calling to abort the job in libvirt. Steps: 1. Run "engine-config -s LiveSnapshotTimeoutInMinutes=1" on engine machine. 2. Run "systemctl restart ovirt-engine" on engine machine. 3. Create a 4.6 data center and a 4.6 cluster. 4. Install a host and create a new NFS storage domain. 5. Create a VM from a RHEL 8 template with disk, 63000 MB Memory Size, 63000 MB Maximum memory and 63000 MB Physical Memory Guaranteed. 6. Run the VM. 7. Install "stress" tool in the VM. 8. Run "stress --vm 1 --vm-bytes 60000M --vm-keep -t 60m" in the VM. 9. Create snapshot with memory and wait until the snapshot fail. Results (As Expected): 1. The "LiveSnapshotTimeoutInMinutes" property is set as 1 min. 2. The engine was restarted. 3. The 4.6 data center and the 4.6 cluster were created. 4. The host was installed and the NFS storage domain was created. 5. The VM was created. 6. The VM is running. 7. The package is installed. 8. The VM's memory is stressed. 9. The snapshot was failed. Logs: "engine.log": 2021-04-29 17:31:06,223+03 WARN [org.ovirt.engine.core.bll.storage.disk.AddDiskCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-82) [] VM is null - no unlocking 2021-04-29 17:31:06,264+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-82) [] EVENT_ID: USER_ADD_DISK_FINISH ED_FAILURE(2,022), Add-Disk operation failed to complete. 2021-04-29 17:31:06,277+03 WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-82) [] Trying to release exclusive lock which does not exist, lock key: '517548c6-75a8-4833-9a5a-112f3eccd09fVM' 2021-04-29 17:31:06,278+03 INFO [org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-82) [] Lock freed to object 'EngineLock:{ex clusiveLocks='[517548c6-75a8-4833-9a5a-112f3eccd09f=VM]', sharedLocks=''}' 2021-04-29 17:31:06,326+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-82) [] EVENT_ID: USER_CREATE_SNAPSHOT _FINISHED_FAILURE(69), Failed to complete snapshot '1' creation for VM 'VDSM'. 2021-04-29 17:31:06,327+03 WARN [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-82) [] Command 'CreateSnapshotForVm' id: '0 3380316-97c6-4663-88cf-91ed91f4d009' end method execution failed, as the command isn't marked for endAction() retries silently ignoring "vdsm.log": 2021-04-29 17:31:00,842+0300 INFO (virt/50a60924) [virt.vm] (vmId='517548c6-75a8-4833-9a5a-112f3eccd09f') Snapshot timeout reached, operation aborted (snapshot:472) 2021-04-29 17:31:00,842+0300 ERROR (virt/50a60924) [virt.vm] (vmId='517548c6-75a8-4833-9a5a-112f3eccd09f') Unable to take snapshot (snapshot:478) Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/vdsm/virt/jobs/snapshot.py", line 462, in snapshot self._vm.run_dom_snapshot(snapxml, snap_flags) File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 5856, in run_dom_snapshot self._dom.snapshotCreateXML(snapxml, snap_flags) File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 101, in f ret = attr(*args, **kwargs) File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper ret = f(*args, **kwargs) File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper return func(inst, *args, **kwargs) File "/usr/lib64/python3.6/site-packages/libvirt.py", line 3049, in snapshotCreateXML raise libvirtError('virDomainSnapshotCreateXML() failed') libvirt.libvirtError: operation aborted: snapshot job: canceled by client (II) libvirt error that returns to VDSM. Steps: 1. Run "engine-config -s LiveSnapshotTimeoutInMinutes=30" on engine machine. 2. Run "systemctl restart ovirt-engine" on engine machine. 3. Create a 4.6 data center and a 4.6 cluster. 4. Install a host and create a new NFS storage domain. 5. Create a VM from a RHEL 8 template with disk, 63000 MB Memory Size, 63000 MB Maximum memory and 63000 MB Physical Memory Guaranteed. 6. Run the VM. 7. Install "stress" tool in the VM. 8. Run "stress --vm 1 --vm-bytes 60000M --vm-keep -t 60m" in the VM. 9. Create snapshot with memory. 10. Run "virsh -r domjobinfo VM_NAME" in the host the VM is running on until there is a snapshot job on the VM. (Instead of the VM_NAME, put the vm name you chose) 11. Run "virsh domjobabort VM_NAME" in the host the VM is running on. (Instead of the VM_NAME, put the vm name you chose) 12. Wait until the snapshot fail. Results (As Expected): 1. The "LiveSnapshotTimeoutInMinutes" property is set as 30 min. 2. The engine was restarted. 3. The 4.6 data center and the 4.6 cluster were created. 4. The host was installed and the NFS storage domain was created. 5. The VM was created. 6. The VM is running. 7. The package is installed. 8. The VM's memory is stressed. 9. The snapshot in progress. 10. The domain got the snapshot job on. 11. The job was aborted. 12. The snapshot was failed. Logs: "engine.log": 021-04-29 18:09:44,132+03 ERROR [org.ovirt.engine.core.bll.snapshots.CreateLiveSnapshotForVmCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-10) [c7194afe-5a9d-4dbd-a65e-5c907f009ca7] Ending command 'org.ovirt.engine.core.bll.snapshots.CreateLiveSnapshotForVmCommand' with failure. 2021-04-29 18:09:44,188+03 INFO [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-10) [c7194afe-5a9d-4dbd-a65e-5c907f009ca7] Command 'CreateSnapshotForVm' id: '1def18d8-8780-4a7d-a782-771549da8320' child commands '[bcdb68f8-1fd1-447b-9d2b-5128cf6ddabd, 980bc287-933a-4aea-ad58-deca1653acd6, 521a088f-b884-4ccf-bbca-41b8a04e1785, e0ad8e1c-4ff3-40b6-8858-0a16ace1a6ed]' executions were completed, status 'FAILED' 2021-04-29 18:09:45,245+03 ERROR [org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-99) [c7194afe-5a9d-4dbd-a65e-5c907f009ca7] Ending command 'org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand' with failure. "vdsm.log": 2021-04-29 18:09:13,527+0300 WARN (virt/cb961973) [virt.vm] (vmId='517548c6-75a8-4833-9a5a-112f3eccd09f') Snapshot operation aborted by libvirt: operation aborted: snapshot job: canceled by client (snapshot:47 7) 2021-04-29 18:09:13,527+0300 ERROR (virt/cb961973) [virt.vm] (vmId='517548c6-75a8-4833-9a5a-112f3eccd09f') Unable to take snapshot (snapshot:478) Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/vdsm/virt/jobs/snapshot.py", line 462, in snapshot self._vm.run_dom_snapshot(snapxml, snap_flags) File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 5856, in run_dom_snapshot self._dom.snapshotCreateXML(snapxml, snap_flags) File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 101, in f ret = attr(*args, **kwargs) File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper ret = f(*args, **kwargs) File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper return func(inst, *args, **kwargs) File "/usr/lib64/python3.6/site-packages/libvirt.py", line 3049, in snapshotCreateXML raise libvirtError('virDomainSnapshotCreateXML() failed') libvirt.libvirtError: operation aborted: snapshot job: canceled by client This bugzilla is included in oVirt 4.4.6 release, published on May 4th 2021. Since the problem described in this bug report should be resolved in oVirt 4.4.6 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report. |