Bug 1475261 - When restarting VDSM while VM is running and after that powering off the VM, virsh report that VM state is running
When restarting VDSM while VM is running and after that powering off the VM, ...
Status: CLOSED NOTABUG
Product: vdsm
Classification: oVirt
Component: Core (Show other bugs)
4.19.23
x86_64 Linux
urgent Severity urgent (vote)
: ovirt-4.1.4
: ---
Assigned To: Dan Kenigsberg
Raz Tamir
: Automation, Regression
Depends On: 1474320
Blocks:
  Show dependency treegraph
 
Reported: 2017-07-26 06:15 EDT by Denis Chaplygin
Modified: 2017-07-26 06:26 EDT (History)
13 users (show)

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1474320
Environment:
Last Closed: 2017-07-26 06:26:36 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt‑4.1+
rule-engine: blocker+


Attachments (Terms of Use)


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 79800 ovirt-4.1 MERGED Revert "Revert "storage: Made diskType a real property."" 2017-07-27 07:46 EDT
oVirt gerrit 79801 ovirt-4.1 MERGED Revert "Revert "storage: Added disk type change logging"" 2017-07-27 07:47 EDT
oVirt gerrit 79802 ovirt-4.1 MERGED storage: Handle inaccessible storage when creating a drive 2017-07-27 07:47 EDT
oVirt gerrit 79814 ovirt-4.1 MERGED tests: Add failing test when storage is not available 2017-07-27 07:47 EDT

  None (edit)
Description Denis Chaplygin 2017-07-26 06:15:26 EDT
+++ This bug was initially created as a clone of Bug #1474320 +++

Description of problem:
When restarting VDSM while VM is running and after that powering off the VM, virsh report that VM state is running, by looking on the UI the VM is down.

 Id    Name                           State
----------------------------------------------------
 71    golden_env_mixed_virtio_0      running


Version-Release number of selected component (if applicable):
ovirt-engine-4.1.4.2-0.1.el7.noarch 
vdsm-4.19.23-1.el7ev.x86_64

How reproducible:
100

Steps to Reproduce:
1. Run VM 
2. Restart vdsm
3. Power off VM 

Actual results:
Virsh report that VM state still running

Expected results:
VM should be down

Additional info:

--- Additional comment from Ori Ben Sasson on 2017-07-24 08:27 EDT ---



--- Additional comment from Michal Skrivanek on 2017-07-24 09:03:12 EDT ---

would appreciate vdsm logs with debug level

--- Additional comment from Ori Ben Sasson on 2017-07-24 09:43 EDT ---



--- Additional comment from Ori Ben Sasson on 2017-07-24 09:43:47 EDT ---

(In reply to Michal Skrivanek from comment #2)
> would appreciate vdsm logs with debug level

done

--- Additional comment from Meni Yakove on 2017-07-24 09:48:38 EDT ---

Reproduce on RHEL 7.3 as well

--- Additional comment from Michal Skrivanek on 2017-07-24 10:11:00 EDT ---

Thanks.
so far it points to a vdsm recovery problem, not really a recent regression

--- Additional comment from Michal Skrivanek on 2017-07-25 01:32:51 EDT ---



--- Additional comment from Michal Skrivanek on 2017-07-25 01:59:02 EDT ---

yay, it is a recent regression
Francesco, please revert 
Ia4277334c36920bdc275343e3520d2f39851695f
I2af090425961361a111360eb66a56899684f2eae

--- Additional comment from Michal Skrivanek on 2017-07-25 02:25:53 EDT ---

the regression was introduced in 4.19.21

--- Additional comment from Francesco Romani on 2017-07-25 03:03:10 EDT ---

(In reply to Michal Skrivanek from comment #8)
> yay, it is a recent regression
> Francesco, please revert 
> Ia4277334c36920bdc275343e3520d2f39851695f
> I2af090425961361a111360eb66a56899684f2eae

Rationale for the revert:
1. Ia4277334c36920bdc275343e3520d2f39851695f
This is actually where we break:
https://gerrit.ovirt.org/#/c/78938/3/vdsm/virt/vmdevices/storage.py

        if not hasattr(self, '_diskType'):
            if (self.device in ("cdrom", "floppy") or not
                    utils.isBlockDevice(self.path)):
                self.diskType = DISK_TYPE.FILE
            else:
                self.diskType = DISK_TYPE.BLOCK

"self.path" may not yet be available at this time in recovery. And the logs provided here indeed demonstrate that it is not available.

2. I2af090425961361a111360eb66a56899684f2eae just depends on the other one, no other reason.

This means we need to check master for the same bug.

--- Additional comment from Francesco Romani on 2017-07-25 03:04:24 EDT ---

Denis, please check https://bugzilla.redhat.com/show_bug.cgi?id=1474320#c10

--- Additional comment from Francesco Romani on 2017-07-25 06:56:59 EDT ---

for *this* bug, we will go on with the revert, to quickly unblock 4.1.4. The broader issue deserves more work, currently ongoing on master branch.

--- Additional comment from Michael Burman on 2017-07-25 10:08:14 EDT ---

Verified on - vdsm-4.19.24-1.el7ev.x86_64

--- Additional comment from Michael Burman on 2017-07-25 14:11:05 EDT ---

HI Francesco,

Can i ask why patches were added to this bug after it was moved to ON_QA and verified?
Should this be retested on next build?

--- Additional comment from Nir Soffer on 2017-07-25 14:31:29 EDT ---

I added the real fix to this issue to master, we can remove the extra patches
from this bug and clone or open a new bug for the master fix.

--- Additional comment from Michael Burman on 2017-07-26 01:15:25 EDT ---

(In reply to Nir Soffer from comment #15)
> I added the real fix to this issue to master, we can remove the extra patches
> from this bug and clone or open a new bug for the master fix.

I believe this is exactly what we should do. This report was verified on 4.1.4 and no patches should be added here any more. 
Please create a proper bug for master in order to test this real fix on master 4.2 as well. Thanks

--- Additional comment from Gil Klein on 2017-07-26 02:00:00 EDT ---

(In reply to Michael Burman from comment #16)
> (In reply to Nir Soffer from comment #15)
> > I added the real fix to this issue to master, we can remove the extra patches
> > from this bug and clone or open a new bug for the master fix.
> 
> I believe this is exactly what we should do. This report was verified on
> 4.1.4 and no patches should be added here any more. 
> Please create a proper bug for master in order to test this real fix on
> master 4.2 as well. Thanks
+1 This is verified for 4.1.4 after the revert, are should be retested for 4.1.5

--- Additional comment from Francesco Romani on 2017-07-26 03:00:33 EDT ---

(In reply to Michael Burman from comment #14)
> HI Francesco,
> 
> Can i ask why patches were added to this bug after it was moved to ON_QA and
> verified?
> Should this be retested on next build?

Answered in https://bugzilla.redhat.com/show_bug.cgi?id=1474320#c15 - clearing NEEDINFO.
Comment 1 Red Hat Bugzilla Rules Engine 2017-07-26 06:15:43 EDT
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
Comment 2 Michal Skrivanek 2017-07-26 06:26:36 EDT
this issue is already fixed. Further proposed changes are part of the libgfapi development and needs to be trakced there

Note You need to log in before you can comment on or make changes to this bug.