Bug 1733804 - Resume of a suspended VM fails with error "Child process (gzip -dc) unexpected exit status 2"
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: 4.3.5.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Michal Skrivanek
QA Contact: meital avital
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-07-28 19:30 UTC by Polina
Modified: 2020-04-01 14:52 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
oVirt Team: Virt
Embargoed:


Attachments
engine & vdsm logs (5.35 MB, application/gzip)
2019-07-28 19:30 UTC, Polina
qemu log of failed VM (55.11 KB, text/plain)
2019-07-28 19:39 UTC, Polina
engine /var/log/ (11.62 MB, application/gzip)
2019-07-30 16:09 UTC, Polina
host /var/log/ for source and destination (84 bytes, text/plain)
2019-07-30 16:20 UTC, Polina

Description Polina 2019-07-28 19:30:33 UTC
Created attachment 1594112 [details]
engine & vdsm logs

Description of problem: A suspended VM that is part of an affinity group fails to resume with the error: "Exit message: Wake up from hibernation failed:internal error: Child process (gzip -dc) unexpected exit status 2:
gzip: stdin: decompression OK, trailing garbage ignored"

Version-Release number of selected component (if applicable): ovirt-engine-4.3.5.4-0.1.el7.noarch

How reproducible: About 80% of attempts in the following scenario

Steps to Reproduce:

1. Create an affinity group with a hard positive VM rule for VM1, VM2, and VM3 (in Cluster/Affinity Groups).
2. Run the VMs on host1.
3. Suspend one VM from the group. This succeeds.
4. Select all three VMs and migrate them to host2 with the closure option (Migrate VMs in Affinity in the Migrate window). This succeeds: two VMs are migrated.
5. Try to resume the suspended VM.

Actual results: VM fails to run.

2019-07-28 18:54:51,105+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-1) [] VM '155c0288-1fde-47f4-8014-b6449d9ee51e'(golden_env_mixed_virtio_2_0) moved from 'RestoringState' --> 'Down'
2019-07-28 18:54:51,170+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-1) [] EVENT_ID: VM_DOWN_ERROR(119), VM golden_env_mixed_virtio_2_0 is down with error. Exit message: Wake up from hibernation failed:internal error: Child process (gzip -dc) unexpected exit status 2:
gzip: stdin: decompression OK, trailing garbage ignored

Expected results:
The VM must run on host2.

Additional info: The resume sometimes succeeds, but it fails often. See the error at 2019-07-28 18:54:51,170+03 in the attached engine.log.

Comment 1 Polina 2019-07-28 19:39:24 UTC
Created attachment 1594113 [details]
qemu log of failed VM

Comment 2 Ryan Barry 2019-07-29 10:07:37 UTC
Full /var/log would be helpful in this case

Comment 3 Polina 2019-07-30 16:09:12 UTC
Created attachment 1594688 [details]
engine /var/log/

Attached the /var/log directories for the engine, the source host, and the destination host.
The scenario (the same as in the description): three VMs in a positive hard VM affinity rule are running on host_mixed_2. VM golden_env_mixed_virtio_2_0 is suspended. All three VMs are then selected and migrated with the closure option to host_mixed_1. Two VMs are migrated successfully. The attempt to resume the suspended VM fails.

The ERROR happens at:

2019-07-30 17:28:00,652+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-10) [] EVENT_ID: VM_DOWN_ERROR(119), VM golden_env_mixed_virtio_2_0 is down with error. Exit message: Wake up from hibernation failed:internal error: Child process (gzip -dc) unexpected exit status 2:
gzip: stdin: decompression OK, trailing garbage ignored

2019-07-30 17:28:02,359+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-67833) [5833f98] EVENT_ID: USER_FAILED_RUN_VM(54), Failed to run VM golden_env_mixed_virtio_2_0  (User: admin@internal-authz).

The bug is quite reproducible. Sometimes the scenario just has to be repeated two or three times.
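
For triage, audit events like the ones quoted in this comment can be pulled out of engine.log mechanically. A minimal sketch, assuming the line layout seen in the excerpts in this report (timestamp, level, logger, thread, correlation id, then `EVENT_ID: NAME(code), message`). The regular expression and the `parse_event` helper are illustrative only and are not part of oVirt:

```python
import re

# Layout assumed from the engine.log excerpts quoted in this report.
EVENT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+.*?"
    r"EVENT_ID: (?P<event>\w+)\((?P<code>\d+)\), (?P<msg>.*)$"
)


def parse_event(line):
    """Return a dict of fields for an audit-event line, or None otherwise."""
    match = EVENT_RE.match(line)
    return match.groupdict() if match else None


sample = (
    "2019-07-30 17:28:02,359+03 ERROR "
    "[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] "
    "(EE-ManagedThreadFactory-engine-Thread-67833) [5833f98] "
    "EVENT_ID: USER_FAILED_RUN_VM(54), Failed to run VM "
    "golden_env_mixed_virtio_2_0  (User: admin@internal-authz)."
)

event = parse_event(sample)
print(event["ts"], event["level"], event["event"], event["code"])
```

Lines without an `EVENT_ID:` field, such as the VmAnalyzer state-change line in the description, return None and can be skipped.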

Comment 4 Polina 2019-07-30 16:20:37 UTC
Created attachment 1594702 [details]
host /var/log/ for source and destination

Comment 5 Ryan Barry 2019-07-30 16:41:10 UTC
Thanks Polina.

So, this is still on block storage, and possibly related to https://bugzilla.redhat.com/show_bug.cgi?id=1503468 (and the related change from lzo to gzip). Is it reproducible on NFS?
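
Background on the message itself: per the gzip man page, exit status 2 signals a warning rather than a hard error, and "trailing garbage ignored" means that extra bytes followed a complete, valid gzip stream. The Python sketch below reproduces that condition in isolation. It is an illustration under assumptions, not a confirmed root cause for this bug: the idea that the compressed memory image is followed by leftover bytes on a block volume is only a hypothesis, and `split_gzip_stream`, the payload, and the 4096 stale bytes are all made up for the example.

```python
import gzip
import zlib


def split_gzip_stream(blob):
    """Decompress the first gzip member in blob.

    Returns (payload, trailing), where trailing holds any bytes that
    follow the end of the gzip stream.
    """
    decomp = zlib.decompressobj(wbits=31)  # wbits=31 selects gzip framing
    payload = decomp.decompress(blob) + decomp.flush()
    return payload, decomp.unused_data


# Stand-in for a saved VM memory image (hypothetical content).
memory_image = b"saved VM state" * 100
stream = gzip.compress(memory_image)

# Hypothetical leftover bytes after the stream, e.g. old data on a volume
# that is larger than the compressed image.
padded = stream + b"stale block data" * 256

payload, trailing = split_gzip_stream(padded)
print(payload == memory_image, len(trailing))
```

Running it prints `True 4096`: the payload decompresses intact and the extra bytes are reported separately. This matches the "decompression OK" half of the message and suggests that a caller which treats gzip's warning status as fatal would fail even though the data itself is readable.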

Comment 6 Michal Skrivanek 2020-03-18 15:50:25 UTC
This bug didn't get any attention for a while; we didn't have the capacity to make any progress. If you care deeply about it or want to work on it, please assign/target accordingly.

Comment 7 Michal Skrivanek 2020-03-18 15:54:55 UTC
This bug didn't get any attention for a while; we didn't have the capacity to make any progress. If you care deeply about it or want to work on it, please assign/target accordingly.

Comment 8 Michal Skrivanek 2020-04-01 14:49:10 UTC
ok, closing. Please reopen if still relevant/you want to work on it.

Comment 9 Michal Skrivanek 2020-04-01 14:52:05 UTC
ok, closing. Please reopen if still relevant/you want to work on it.

