
Bug 1788185

Summary: VM fails to start after previewing or committing ram snapshot
Product: [oVirt] ovirt-engine
Reporter: Evelina Shames <eshames>
Component: BLL.Storage
Assignee: Fedor Gavrilov <fgavrilo>
Status: CLOSED WORKSFORME
QA Contact: Avihai <aefrat>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.4.0
CC: bugs, eshenitz, lsvaty, pagranat, tnisan, vjuranek
Target Milestone: ovirt-4.4.0
Flags: pm-rhel: ovirt-4.4+
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-04-01 14:08:58 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1803551
Bug Blocks:
Attachments:
Description Flags
logs none

Description Evelina Shames 2020-01-06 16:13:48 UTC
Created attachment 1650169 [details]
logs

Description of problem:
A VM with an iSCSI disk fails to start after previewing or committing a RAM snapshot, with the following errors:

Engine:
2020-01-06 05:59:55,562+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-17) [3a059f40] EVENT_ID: VM_DOWN_ERROR(119), VM vm_TestCase5138_0605550812 is down with error. Exit message: Wake up from hibernation failed:internal error: child reported (status=125): unable to get SELinux context of /rhev/data-center/mnt/blockSD/76d1afe7-fca9-41b3-83c4-98c62380b4f1/images/2cd3b48c-96e9-4ea8-a564-70a9429eb21f/91a162fe-848b-418e-8ef2-423f0d194522: No such file or directory.

VDSM:
2020-01-05 22:59:54,682-0500 ERROR (vm/e865707b) [virt.vm] (vmId='e865707b-a7a2-401e-b8e6-1f33357f7c2d') The vm start process failed (vm:835)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 769, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 2547, in _run
    fname, srcDomXML, libvirt.VIR_DOMAIN_SAVE_PAUSED)
  File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 4732, in restoreFlags
    if ret == -1: raise libvirtError ('virDomainRestoreFlags() failed', conn=self)
libvirt.libvirtError: internal error: child reported (status=125): unable to get SELinux context of /rhev/data-center/mnt/blockSD/76d1afe7-fca9-41b3-83c4-98c62380b4f1/images/2cd3b48c-96e9-4ea8-a564-70a9429eb21f/91a162fe-848b-418e-8ef2-423f0d194522: No such file or directory

If there are several hosts, the VM will try to start on another host and may succeed.
If there is only one host, the start fails.

Version-Release number of selected component (if applicable):
ovirt-engine-4.4.0-0.13.master.el7.noarch
vdsm-4.40.0-164.git38a19bb.el8ev.x86_64
libvirt-5.6.0-6.module+el8.1.0+4244+9aa4e6bb.x86_64

How reproducible:
Most of the time

Steps to Reproduce:
1. Create a VM with an iSCSI disk
2. Run the VM
3. Create a RAM snapshot s1
4. Power off the VM
5. Preview s1 -> try to run the VM
   OR preview and commit s1 -> try to run the VM


Actual results:
Operation fails

Expected results:
Operation should succeed

Additional info:
Logs are attached

Comment 1 Tal Nisan 2020-01-13 15:39:59 UTC
Vojtech, it seems something messed up the SELinux labels of the created memory disks. Can you please look for the cause?

Comment 3 Vojtech Juranek 2020-02-05 11:16:35 UTC
I tried several times and cannot reproduce.
Please provide the vdsm tag/version used for the reproducer; I cannot find vdsm-4.40.0-164 anywhere.
Also, do you create the memory snapshot with all disks in the reproducer or only with one of them?
Could you also check that the missing disk (in the reported case /rhev/data-center/mnt/blockSD/76d1afe7-fca9-41b3-83c4-98c62380b4f1/images/2cd3b48c-96e9-4ea8-a564-70a9429eb21f/91a162fe-848b-418e-8ef2-423f0d194522) exists on the host and run ls -laZ on it?
Thanks
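
The check requested above could be scripted roughly as follows. This is only a sketch, assuming Python 3 on a Linux host; describe_volume_path is a hypothetical helper name and is not part of vdsm. It reports whether the path is a symlink, where the link points, whether the target exists, and the SELinux label (read from the security.selinux extended attribute) when one is available.

```python
import os


def describe_volume_path(path):
    """Collect basic facts about a volume path (hypothetical helper, not vdsm code)."""
    info = {"path": path, "is_symlink": os.path.islink(path)}
    # readlink shows where the link points even when the target is missing.
    info["link_target"] = os.readlink(path) if info["is_symlink"] else None
    # os.path.exists follows symlinks, so a dangling link reports False here.
    info["target_exists"] = os.path.exists(path)
    info["selinux_context"] = None
    # os.getxattr is only available on Linux; skip the label elsewhere.
    if info["target_exists"] and hasattr(os, "getxattr"):
        try:
            raw = os.getxattr(path, "security.selinux")
            info["selinux_context"] = raw.rstrip(b"\0").decode()
        except OSError:
            # No label, or the filesystem does not support extended attributes.
            pass
    return info
```

A result with is_symlink set to True and target_exists set to False would indicate a dangling link, which would be consistent with the "No such file or directory" error in the report.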

Comment 4 Evelina Shames 2020-02-11 07:01:22 UTC
Created attachment 1662379 [details]
new logs - TestCase5134

Tried to reproduce again on:
engine-4.4.0-0.19.master.el7
vdsm-4.40.2-1.el8ev.x86_64

I couldn't reproduce with these steps manually on this version. I'll try to find a manual flow, but the same error appears in our automation; attaching the relevant logs.

*There is a time difference of 7 hours between engine log and vdsm log.
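
To line up entries from the two logs, both timestamps can be converted to UTC. The sketch below assumes the timestamp layout seen in the excerpts in the description (a comma before the milliseconds, and a UTC offset written as either +02 or -0500); parse_log_timestamp is a hypothetical helper name.

```python
from datetime import datetime, timezone


def parse_log_timestamp(stamp):
    """Parse 'YYYY-MM-DD HH:MM:SS,mmm' followed by a UTC offset into an aware datetime."""
    # The offset sign is the last '+' or '-' in the string.
    sign_pos = max(stamp.rfind("+"), stamp.rfind("-"))
    base, offset = stamp[:sign_pos], stamp[sign_pos:]
    # The engine log writes a two-digit offset ("+02"); pad it to "+0200" for %z.
    if len(offset) == 3:
        offset += "00"
    return datetime.strptime(base + offset, "%Y-%m-%d %H:%M:%S,%f%z")


# Timestamps copied from the engine and vdsm excerpts in the description.
engine = parse_log_timestamp("2020-01-06 05:59:55,562+02")
vdsm = parse_log_timestamp("2020-01-05 22:59:54,682-0500")
print(engine.astimezone(timezone.utc), vdsm.astimezone(timezone.utc))
print((engine - vdsm).total_seconds())
```

With the two timestamps from the description, the engine and vdsm errors turn out to be less than a second apart once both are expressed in UTC, and the offsets differ by the 7 hours mentioned above.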

Comment 5 Tal Nisan 2020-02-17 15:36:09 UTC
*** Bug 1803519 has been marked as a duplicate of this bug. ***

Comment 6 Tal Nisan 2020-02-24 15:16:46 UTC
Vojtech, please check whether you can find anything suspicious in the logs.

Comment 7 Vojtech Juranek 2020-02-25 12:38:12 UTC
I was able to reproduce locally with CentOS 8 and libvirt 5.6. It seems to be the same issue as BZ #1803551 (libvirt accesses links in the /rhev directory instead of using /dev), but more investigation is needed to be sure about it.

Comment 8 Eyal Shenitzky 2020-04-01 07:30:55 UTC
Evelina, BZ #1803551 is already fixed with libvirt-6.0.0-10.el8.
Can you please test if the issue still occurs?

If not, please close this bug.
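
A rough way to check whether a host's libvirt package is at or above the fixed build is sketched below. This is a simplified comparison, not RPM's real version-comparison algorithm: it assumes a plain name-version-release string with a purely numeric dotted version, ignores epochs, and compares only the leading number of the release. nvr_key and has_fix are hypothetical helper names.

```python
import re


def nvr_key(nvr):
    """Return (version tuple, leading release number) for a 'name-version-release' string."""
    # Split from the right so hyphens inside the package name are preserved.
    _name, version, release = nvr.rsplit("-", 2)
    version_key = tuple(int(part) for part in version.split("."))
    release_match = re.match(r"\d+", release)
    release_key = int(release_match.group()) if release_match else 0
    return version_key, release_key


def has_fix(installed, fixed_in="libvirt-6.0.0-10.el8"):
    """True if the installed package sorts at or above the build named in this comment."""
    return nvr_key(installed) >= nvr_key(fixed_in)


# Package string copied from the original report.
print(has_fix("libvirt-5.6.0-6.module+el8.1.0+4244+9aa4e6bb.x86_64"))
```

For anything beyond a quick check, the package manager's own version comparison should be preferred.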

Comment 9 Evelina Shames 2020-04-01 14:08:58 UTC
Works on engine-4.4.0-0.29.master.el8ev