Created attachment 1594112 [details]
engine & vdsm logs

Description of problem:
A suspended VM that is part of an affinity group fails to run with the error:
"Exit message: Wake up from hibernation failed:internal error: Child process (gzip -dc) unexpected exit status 2: gzip: stdin: decompression OK, trailing garbage ignored"

Version-Release number of selected component (if applicable):
ovirt-engine-4.3.5.4-0.1.el7.noarch

How reproducible:
About 80% of the time in the following scenario.

Steps to Reproduce:
1. Create an affinity group with a hard positive VMs rule for VM1, VM2 and VM3 (in Cluster/Affinity Groups).
2. Run the VMs on host1.
3. Suspend one VM from the group - success.
4. Select all three VMs and choose to migrate them to host2 with the closure option ("Migrate VMs in Affinity" in the Migrate window) - success: two VMs migrated.
5. Try to resume the suspended VM.

Actual results:
The VM fails to run.

2019-07-28 18:54:51,105+03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-1) [] VM '155c0288-1fde-47f4-8014-b6449d9ee51e'(golden_env_mixed_virtio_2_0) moved from 'RestoringState' --> 'Down'
2019-07-28 18:54:51,170+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-1) [] EVENT_ID: VM_DOWN_ERROR(119), VM golden_env_mixed_virtio_2_0 is down with error. Exit message: Wake up from hibernation failed:internal error: Child process (gzip -dc) unexpected exit status 2: gzip: stdin: decompression OK, trailing garbage ignored

Expected results:
The VM must run on host2.

Additional info:
The resume sometimes succeeds, but it fails quite often. Please see the error at 2019-07-28 18:54:51,170+03 in the attached engine.log.
Created attachment 1594113 [details] qemu log of failed VM
Full /var/log would be helpful in this case
Created attachment 1594688 [details]
engine /var/log/

Attached the /var/log directories for the engine, the source host and the destination host.

The scenario (the same as in the description): three VMs in a hard positive affinity VMs rule are running on host_mixed_2. The VM golden_env_mixed_virtio_2_0 is suspended. Then all three VMs are selected and migrated with the closure option to host_mixed_1. Two VMs are migrated successfully. The attempt to resume the suspended VM fails.

The ERROR happens at:

2019-07-30 17:28:00,652+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-10) [] EVENT_ID: VM_DOWN_ERROR(119), VM golden_env_mixed_virtio_2_0 is down with error. Exit message: Wake up from hibernation failed:internal error: Child process (gzip -dc) unexpected exit status 2: gzip: stdin: decompression OK, trailing garbage ignored
2019-07-30 17:28:02,359+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-67833) [5833f98] EVENT_ID: USER_FAILED_RUN_VM(54), Failed to run VM golden_env_mixed_virtio_2_0 (User: admin@internal-authz).

The bug is quite reproducible. Sometimes the scenario just has to be repeated two or three times.
Created attachment 1594702 [details] host /var/log/ for source and destination
Thanks Polina. So, this is still on block storage, and possibly related to https://bugzilla.redhat.com/show_bug.cgi?id=1503468 (and the related change from lzo to gzip). Reproducible on NFS?
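A note on the error text, in case it helps triage: gzip prints "decompression OK, trailing garbage ignored" and exits with status 2 (its warning status) when extra bytes follow the end of the compressed stream, and the restore here treats that non-zero status as a failure. The sketch below only illustrates that condition with Python's zlib module instead of the gzip CLI. The padding bytes and the helper name split_gzip_stream are hypothetical, and it is an assumption, not something the logs confirm, that the saved state on block storage ends up with such trailing bytes.

```python
import gzip
import zlib


def split_gzip_stream(blob: bytes):
    """Decompress the first gzip member; return (payload, trailing_bytes)."""
    # wbits=31 tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(wbits=31)
    payload = decompressor.decompress(blob)
    payload += decompressor.flush()
    # Anything after the end of the gzip member lands in unused_data.
    return payload, decompressor.unused_data


# Stand-in for a saved VM state image (hypothetical content).
state = b"saved VM state" * 100
compressed = gzip.compress(state)

# Hypothetical leftover bytes after the stream, e.g. from a volume
# that is larger than the data written to it.
padded = compressed + b"\xff" * 512

payload, trailing = split_gzip_stream(padded)
clean_payload, clean_trailing = split_gzip_stream(compressed)

print("payload intact:", payload == state)
print("trailing bytes:", len(trailing))
```

The payload decompresses correctly in both cases; only the padded input leaves bytes behind, which is the situation gzip warns about.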
This bug hasn't received any attention for a while; we didn't have the capacity to make any progress. If you care deeply about it or want to work on it, please assign/target it accordingly.
ok, closing. Please reopen if still relevant/you want to work on it.