Bug 1733804

Summary: Resume of a suspended VM fails with error "Child process (gzip -dc) unexpected exit status 2"
Product: [oVirt] ovirt-engine
Reporter: Polina <pagranat>
Component: BLL.Virt
Assignee: Michal Skrivanek <michal.skrivanek>
Status: CLOSED DEFERRED
QA Contact: meital avital <mavital>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 4.3.5.4
CC: bugs, rbarry
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                                  Flags
engine & vdsm logs                           none
qemu log of failed VM                        none
engine /var/log/                             none
host /var/log/ for source and destination    none

Description Polina 2019-07-28 19:30:33 UTC
Created attachment 1594112 [details]
engine & vdsm logs

Description of problem: A suspended VM that is part of an affinity group fails to resume with the error: "Exit message: Wake up from hibernation failed:internal error: Child process (gzip -dc) unexpected exit status 2:
gzip: stdin: decompression OK, trailing garbage ignored"

Version-Release number of selected component (if applicable): ovirt-engine-4.3.5.4-0.1.el7.noarch

How reproducible: 80% in the following scenario

Steps to Reproduce:

1. Create an affinity group with a hard positive VM rule for VM1, VM2, and VM3 (in Cluster/Affinity Groups).
2. Run the VMs on host1.
3. Suspend one VM from the group - success.
4. Select all three VMs and migrate them to host2 with the closure option ("Migrate VMs in Affinity" in the Migrate window) - success: two VMs are migrated.
5. Try to resume the suspended VM.

Actual results: VM fails to run.

2019-07-28 18:54:51,105+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-1) [] VM '155c0288-1fde-47f4-8014-b6449d9ee51e'(golden_env_mixed_virtio_2_0) moved from 'RestoringState' --> 'Down'
2019-07-28 18:54:51,170+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-1) [] EVENT_ID: VM_DOWN_ERROR(119), VM golden_env_mixed_virtio_2_0 is down with error. Exit message: Wake up from hibernation failed:internal error: Child process (gzip -dc) unexpected exit status 2:
gzip: stdin: decompression OK, trailing garbage ignored

Expected results:
The VM must run on host2.

Additional info: The resume sometimes succeeds, but it fails often. See the error at 2019-07-28 18:54:51,170+03 in the attached engine.log.
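
For context on the message itself: per the gzip man page, exit status 2 indicates a warning rather than an error, and "trailing garbage ignored" means the stream decompressed cleanly but extra bytes followed it. The following is only a minimal Python sketch of that condition, using zlib instead of the gzip binary. The payload and the trailing bytes are invented for illustration, and nothing here is taken from libvirt or vdsm code.

```python
import gzip
import zlib


def split_gzip_stream(blob):
    """Decompress the first gzip member; return (payload, trailing_bytes)."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    payload = decompressor.decompress(blob)
    payload += decompressor.flush()
    # unused_data holds whatever bytes came after the end of the stream.
    return payload, decompressor.unused_data


# Hypothetical stand-in for a saved VM state; not real data from this bug.
saved_state = b"hypothetical VM memory image" * 4
# Hypothetical leftover bytes following the compressed stream.
blob = gzip.compress(saved_state) + b"\x00\xffleftover-bytes"

payload, trailing = split_gzip_stream(blob)
if trailing:
    print("decompression OK, %d trailing bytes found" % len(trailing))
```

If the saved-state volume really holds more bytes than the compressed stream (not confirmed in this report), a check like this would report a non-empty trailing part while the payload itself is intact.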

Comment 1 Polina 2019-07-28 19:39:24 UTC
Created attachment 1594113 [details]
qemu log of failed VM

Comment 2 Ryan Barry 2019-07-29 10:07:37 UTC
Full /var/log would be helpful in this case

Comment 3 Polina 2019-07-30 16:09:12 UTC
Created attachment 1594688 [details]
engine /var/log/

Attached the /var/log directories for the engine, the source host, and the destination host.
The scenario (the same as in the description): three VMs in a hard positive VM affinity rule are running on host_mixed_2. The VM golden_env_mixed_virtio_2_0 is suspended. All three VMs are then selected and migrated with the closure option to host_mixed_1. Two VMs are migrated successfully. The attempt to resume the suspended VM fails.

The ERROR happens at:

2019-07-30 17:28:00,652+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-10) [] EVENT_ID: VM_DOWN_ERROR(119), VM golden_env_mixed_virtio_2_0 is down with error. Exit message: Wake up from hibernation failed:internal error: Child process (gzip -dc) unexpected exit status 2:
gzip: stdin: decompression OK, trailing garbage ignored

2019-07-30 17:28:02,359+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-67833) [5833f98] EVENT_ID: USER_FAILED_RUN_VM(54), Failed to run VM golden_env_mixed_virtio_2_0  (User: admin@internal-authz).

The bug is quite reproducible; sometimes the scenario just needs to be repeated two or three times.
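
To locate these events in a large engine.log, a small filter can help. This is a rough Python sketch whose regular expression is inferred only from the log lines quoted in this report, so it is an assumption that other engine.log lines follow the same layout; `find_events` and `SAMPLE` are hypothetical names.

```python
import re

# Layout inferred from the lines quoted in this report:
# "<timestamp> <LEVEL> [...] (...) [...] EVENT_ID: NAME(code), message"
EVENT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+.*?"
    r"EVENT_ID: (?P<event>\w+)\((?P<code>\d+)\), (?P<msg>.*)$"
)


def find_events(lines, event_name):
    """Yield (timestamp, event code, message) for each matching audit event."""
    for line in lines:
        match = EVENT_RE.match(line)
        if match and match.group("event") == event_name:
            yield match.group("ts"), int(match.group("code")), match.group("msg")


# The two audit log lines quoted above (first line only of each entry).
SAMPLE = [
    "2019-07-30 17:28:00,652+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-10) [] EVENT_ID: VM_DOWN_ERROR(119), VM golden_env_mixed_virtio_2_0 is down with error. Exit message: Wake up from hibernation failed:internal error: Child process (gzip -dc) unexpected exit status 2:",
    "2019-07-30 17:28:02,359+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-67833) [5833f98] EVENT_ID: USER_FAILED_RUN_VM(54), Failed to run VM golden_env_mixed_virtio_2_0  (User: admin@internal-authz).",
]

for timestamp, code, message in find_events(SAMPLE, "VM_DOWN_ERROR"):
    print(timestamp, code, message)
```

Against a real file, the same function can be fed an open file object, for example `find_events(open("engine.log"), "VM_DOWN_ERROR")`. Continuation lines such as the gzip message on the following line are not captured by this sketch.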

Comment 4 Polina 2019-07-30 16:20:37 UTC
Created attachment 1594702 [details]
host /var/log/ for source and destination

Comment 5 Ryan Barry 2019-07-30 16:41:10 UTC
Thanks Polina.

So, this is still on block storage, and possibly related to https://bugzilla.redhat.com/show_bug.cgi?id=1503468 (and the related change from lzo to gzip). Reproducible on NFS?

Comment 6 Michal Skrivanek 2020-03-18 15:50:25 UTC
This bug didn't get any attention for a while; we didn't have the capacity to make any progress. If you deeply care about it or want to work on it, please assign/target accordingly.

Comment 7 Michal Skrivanek 2020-03-18 15:54:55 UTC
This bug didn't get any attention for a while; we didn't have the capacity to make any progress. If you deeply care about it or want to work on it, please assign/target accordingly.

Comment 8 Michal Skrivanek 2020-04-01 14:49:10 UTC
ok, closing. Please reopen if still relevant/you want to work on it.

Comment 9 Michal Skrivanek 2020-04-01 14:52:05 UTC
ok, closing. Please reopen if still relevant/you want to work on it.