Bug 1474320 - When restarting VDSM while VM is running and after that powering off the VM, virsh report that VM state is running
Summary: When restarting VDSM while VM is running and after that powering off the VM, ...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.19.23
Hardware: x86_64
OS: Linux
urgent
urgent
Target Milestone: ovirt-4.1.4
: ---
Assignee: Francesco Romani
QA Contact: Michael Burman
URL:
Whiteboard:
: 1474409 (view as bug list)
Depends On:
Blocks: 1475261
TreeView+ depends on / blocked
 
Reported: 2017-07-24 12:10 UTC by Ori Ben Sasson
Modified: 2019-04-28 13:24 UTC (History)
12 users (show)

Fixed In Version:
Clone Of:
: 1475261 (view as bug list)
Environment:
Last Closed: 2017-07-28 14:11:31 UTC
oVirt Team: Virt
Embargoed:
rule-engine: ovirt-4.1+
rule-engine: blocker+


Attachments (Terms of Use)
engine and vdsm logs (2.67 MB, application/zip)
2017-07-24 12:10 UTC, Ori Ben Sasson
no flags Details
qemu log (176.31 KB, text/plain)
2017-07-24 12:27 UTC, Ori Ben Sasson
no flags Details
vdsm log with DEBUG mode (3.44 MB, text/plain)
2017-07-24 13:43 UTC, Ori Ben Sasson
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 79782 0 ovirt-4.1 MERGED Revert "storage: Added disk type change logging" 2017-07-25 08:16:24 UTC
oVirt gerrit 79783 0 ovirt-4.1 MERGED Revert "storage: Made diskType a real property." 2017-07-25 08:16:28 UTC
oVirt gerrit 79794 0 master MERGED tests: Add failing test when storage is not available 2017-07-26 07:18:43 UTC
oVirt gerrit 79795 0 master MERGED storage: Handle inaccessible storage when creating a drive 2017-07-26 07:18:46 UTC
oVirt gerrit 79800 0 ovirt-4.1 MERGED Revert "Revert "storage: Made diskType a real property."" 2017-07-27 11:46:59 UTC
oVirt gerrit 79801 0 ovirt-4.1 MERGED Revert "Revert "storage: Added disk type change logging"" 2017-07-27 11:47:05 UTC
oVirt gerrit 79802 0 ovirt-4.1 MERGED storage: Handle inaccessible storage when creating a drive 2017-07-27 11:47:16 UTC
oVirt gerrit 79814 0 ovirt-4.1 MERGED tests: Add failing test when storage is not available 2017-07-27 11:47:11 UTC

Description Ori Ben Sasson 2017-07-24 12:10:40 UTC
Created attachment 1303599 [details]
engine and vdsm logs

Description of problem:
When restarting VDSM while VM is running and after that powering off the VM, virsh report that VM state is running, by looking on the UI the VM is down.

 Id    Name                           State
----------------------------------------------------
 71    golden_env_mixed_virtio_0      running


Version-Release number of selected component (if applicable):
ovirt-engine-4.1.4.2-0.1.el7.noarch 
vdsm-4.19.23-1.el7ev.x86_64

How reproducible:
100

Steps to Reproduce:
1. Run VM 
2. Restart vdsm
3. Power off VM 

Actual results:
Virsh report that VM state still running

Expected results:
VM should be down

Additional info:

Comment 1 Ori Ben Sasson 2017-07-24 12:27:49 UTC
Created attachment 1303625 [details]
qemu log

Comment 2 Michal Skrivanek 2017-07-24 13:03:12 UTC
would appreciate vdsm logs with debug level

Comment 3 Ori Ben Sasson 2017-07-24 13:43:12 UTC
Created attachment 1303690 [details]
vdsm log with DEBUG mode

Comment 4 Ori Ben Sasson 2017-07-24 13:43:47 UTC
(In reply to Michal Skrivanek from comment #2)
> would appreciate vdsm logs with debug level

done

Comment 5 Meni Yakove 2017-07-24 13:48:38 UTC
Reproduce on RHEL 7.3 as well

Comment 6 Michal Skrivanek 2017-07-24 14:11:00 UTC
Thanks.
so far it points to a vdsm recovery problem, not really a recent regression

Comment 7 Michal Skrivanek 2017-07-25 05:32:51 UTC
*** Bug 1474409 has been marked as a duplicate of this bug. ***

Comment 8 Michal Skrivanek 2017-07-25 05:59:02 UTC
yay, it is a recent regression
Francesco, please revert 
Ia4277334c36920bdc275343e3520d2f39851695f
I2af090425961361a111360eb66a56899684f2eae

Comment 9 Michal Skrivanek 2017-07-25 06:25:53 UTC
the regression was introduced in 4.19.21

Comment 10 Francesco Romani 2017-07-25 07:03:10 UTC
(In reply to Michal Skrivanek from comment #8)
> yay, it is a recent regression
> Francesco, please revert 
> Ia4277334c36920bdc275343e3520d2f39851695f
> I2af090425961361a111360eb66a56899684f2eae

Rationale for the revert:
1. Ia4277334c36920bdc275343e3520d2f39851695f
This is actually where we break:
https://gerrit.ovirt.org/#/c/78938/3/vdsm/virt/vmdevices/storage.py

        if not hasattr(self, '_diskType'):
            if (self.device in ("cdrom", "floppy") or not
                    utils.isBlockDevice(self.path)):
                self.diskType = DISK_TYPE.FILE
            else:
                self.diskType = DISK_TYPE.BLOCK

"self.path" may not yet be available at this time in recovery. And the logs provided here indeed demonstrate that it is not available.

2. I2af090425961361a111360eb66a56899684f2eae just depends on the other one, no other reason.

This means we need to check master for the same bug.

Comment 11 Francesco Romani 2017-07-25 07:04:24 UTC
Denis, please check https://bugzilla.redhat.com/show_bug.cgi?id=1474320#c10

Comment 12 Francesco Romani 2017-07-25 10:56:59 UTC
for *this* bug, we will go on with the revert, to quickly unblock 4.1.4. The broader issue deserves more work, currently ongoing on master branch.

Comment 13 Michael Burman 2017-07-25 14:08:14 UTC
Verified on - vdsm-4.19.24-1.el7ev.x86_64

Comment 14 Michael Burman 2017-07-25 18:11:05 UTC
HI Francesco,

Can i ask why patches were added to this bug after it was moved to ON_QA and verified?
Should this be retested on next build?

Comment 15 Nir Soffer 2017-07-25 18:31:29 UTC
I added the real fix to this issue to master, we can remove the extra patches
from this bug and clone or open a new bug for the master fix.

Comment 16 Michael Burman 2017-07-26 05:15:25 UTC
(In reply to Nir Soffer from comment #15)
> I added the real fix to this issue to master, we can remove the extra patches
> from this bug and clone or open a new bug for the master fix.

I believe this is exactly what we should do. This report was verified on 4.1.4 and no patches should be added here any more. 
Please create a proper bug for master in order to test this real fix on master 4.2 as well. Thanks

Comment 17 Gil Klein 2017-07-26 06:00:00 UTC
(In reply to Michael Burman from comment #16)
> (In reply to Nir Soffer from comment #15)
> > I added the real fix to this issue to master, we can remove the extra patches
> > from this bug and clone or open a new bug for the master fix.
> 
> I believe this is exactly what we should do. This report was verified on
> 4.1.4 and no patches should be added here any more. 
> Please create a proper bug for master in order to test this real fix on
> master 4.2 as well. Thanks
+1 This is verified for 4.1.4 after the revert, are should be retested for 4.1.5

Comment 18 Francesco Romani 2017-07-26 07:00:33 UTC
(In reply to Michael Burman from comment #14)
> HI Francesco,
> 
> Can i ask why patches were added to this bug after it was moved to ON_QA and
> verified?
> Should this be retested on next build?

Answered in https://bugzilla.redhat.com/show_bug.cgi?id=1474320#c15 - clearing NEEDINFO.

Comment 19 Michal Skrivanek 2017-07-26 10:20:01 UTC
the patches are part of libgfapi. There's no point in adding patches to _this_ bug as this bug is only about a regression in other, unrelated flow


Note You need to log in before you can comment on or make changes to this bug.