Bug 1391933 - Do not block the VM monitoring thread when something unexpected shows up
Summary: Do not block the VM monitoring thread when something unexpected shows up
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: 4.0.5.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ovirt-4.0.6
Target Release: 4.0.6.2
Assignee: Arik
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On:
Blocks: 1362618
Reported: 2016-11-04 12:33 UTC by Martin Sivák
Modified: 2017-01-18 07:25 UTC (History)

Fixed In Version:
Clone Of: 1362618
Environment:
Last Closed: 2017-01-18 07:25:09 UTC
oVirt Team: Virt
Embargoed:
rule-engine: ovirt-4.0.z+
mgoldboi: planning_ack+
rule-engine: devel_ack+
mavital: testing_ack+


Attachments
sosreport from alma03 (15.23 MB, application/x-xz), 2016-11-29 15:15 UTC, Nikolai Sednev
sosreport from alma04 (15.18 MB, application/x-xz), 2016-11-29 15:17 UTC, Nikolai Sednev
sosreport from engine (8.47 MB, application/x-xz), 2016-11-29 15:18 UTC, Nikolai Sednev


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 67000 0 ovirt-engine-4.0 MERGED core: fix possible deadlock 2016-11-18 12:39:09 UTC
oVirt gerrit 67036 0 ovirt-engine-4.0.6 MERGED core: fix possible deadlock 2016-11-18 12:40:52 UTC

Comment 1 Martin Sivák 2016-11-04 12:36:27 UTC
This clone is supposed to track the backport to handle unexpected issues and exceptions better.
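For illustration only, here is a minimal Python sketch of the general idea in the bug title: a monitoring cycle that logs an unexpected error for one VM and carries on instead of stalling. It is not the merged gerrit change (ovirt-engine is Java, and the patch is described as a deadlock fix); `monitor_once` and `poll_vm` are hypothetical names invented for this sketch.

```python
import logging

log = logging.getLogger("vm_monitor_sketch")  # hypothetical logger name


def monitor_once(vm_ids, poll_vm):
    """Poll every VM once and return the ids whose poll raised.

    `poll_vm` is a hypothetical callable standing in for whatever
    per-VM refresh the real monitoring code performs.
    """
    failed = []
    for vm_id in vm_ids:
        try:
            poll_vm(vm_id)
        except Exception:
            # Log and continue so one unexpected VM state cannot
            # block monitoring of the remaining VMs.
            log.exception("unexpected error while monitoring VM %s", vm_id)
            failed.append(vm_id)
    return failed
```

The caller can then decide what to do with the failed ids (retry, mark unknown, and so on) without the monitoring thread itself being held up.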

Comment 2 Red Hat Bugzilla Rules Engine 2016-11-04 12:36:45 UTC
Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.

Comment 3 Michal Skrivanek 2016-11-10 14:13:40 UTC
Feel free to push a backport sooner, but from my perspective this is fixed in 4.1, and the only thing it affects is bug 1362618, which is hopefully covered by bug 1392903. So this backport would just be a reassurance for HE stability.

Comment 4 Nikolai Sednev 2016-11-29 14:40:43 UTC
Works for me with these components on the hosts:
libvirt-client-2.0.0-10.el7.x86_64
qemu-kvm-rhev-2.6.0-27.el7.x86_64
ovirt-hosted-engine-ha-2.0.4-1.el7ev.noarch
ovirt-setup-lib-1.0.2-1.el7ev.noarch
ovirt-imageio-daemon-0.4.0-0.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
mom-0.5.8-1.el7ev.noarch
vdsm-4.18.15.3-1.el7ev.x86_64
ovirt-hosted-engine-setup-2.0.3-2.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
ovirt-host-deploy-1.5.3-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-imageio-common-0.3.0-0.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
Linux version 3.10.0-514.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Wed Oct 19 11:24:13 EDT 2016
Linux 3.10.0-514.el7.x86_64 #1 SMP Wed Oct 19 11:24:13 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

On the engine:
rhevm-setup-plugins-4.0.0.3-1.el7ev.noarch
rhevm-4.0.6.1-0.1.el7ev.noarch
rhevm-spice-client-x86-msi-4.0-3.el7ev.noarch
rhev-release-4.0.6-3-001.noarch
rhevm-dependencies-4.0.0-1.el7ev.noarch
rhev-guest-tools-iso-4.0-6.el7ev.noarch
rhevm-spice-client-x64-msi-4.0-3.el7ev.noarch
rhevm-branding-rhev-4.0.0-5.el7ev.noarch
rhevm-guest-agent-common-1.0.12-3.el7ev.noarch
rhevm-doc-4.0.6-1.el7ev.noarch
Linux version 3.10.0-514.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Wed Oct 19 11:24:13 EDT 2016
Linux 3.10.0-514.el7.x86_64 #1 SMP Wed Oct 19 11:24:13 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

I did a clean deployment of rhevm-appliance-20161116.0-1.el7ev.noarch (el7.3 based) over NFS, then upgraded the engine to the latest components listed above, then added an NFS data storage domain so that hosted_storage was auto-imported into the engine's WEBUI. Then I added an additional hosted engine host via WEBUI.

Made at least 8 iterations of steps 1-10:
1. HE-VM running on alma03 with HA score 3400.
2. Set alma03 into maintenance via WEBUI.
3. HE-VM migrated to alma04 successfully; alma03 went to maintenance with an HA score of 0 and was shown in local maintenance in both CLI and WEBUI.
4. Activated alma03 again without any issues and the host went back to active.
5. Waited a few minutes for alma03 to reach an HA score of 3400.
6. HE-VM running on alma04 with HA score 3400.
7. Set alma04 into maintenance via WEBUI.
8. HE-VM migrated to alma03 successfully; alma04 went to maintenance with an HA score of 0 and was shown in local maintenance in both CLI and WEBUI.
9. Activated alma04 again without any issues and the host went back to active.
10. Waited a few minutes for alma04 to reach an HA score of 3400.
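
The alternation in steps 1-10 can be sketched as a small in-memory model. This is only a toy illustration of the test loop, not a script that talks to the engine or to the HA agent; `Host`, `enter_maintenance`, `activate` and `run_iterations` are hypothetical names, and the check that the target host must have a positive score is an assumption of the model.

```python
from dataclasses import dataclass

FULL_SCORE = 3400  # HA score of a healthy host, as seen in the steps above


@dataclass
class Host:
    name: str
    score: int = FULL_SCORE
    maintenance: bool = False


def enter_maintenance(source, target):
    """Model steps 2-3 / 7-8: HE-VM moves to target, source score drops to 0."""
    # Assumption of this sketch: the target must be active with a positive score.
    if target.maintenance or target.score <= 0:
        raise RuntimeError(f"{target.name} cannot take the HE-VM")
    source.maintenance = True
    source.score = 0
    return target  # host now running the HE-VM


def activate(host):
    """Model steps 4-5 / 9-10: host becomes active and regains full score."""
    host.maintenance = False
    host.score = FULL_SCORE


def run_iterations(first, second, count):
    """Run `count` passes of steps 1-10 and return the host running the HE-VM."""
    vm_host, other = first, second
    for _ in range(count):
        for _ in range(2):  # each pass is two half-cycles, one per host
            new_vm_host = enter_maintenance(vm_host, other)
            activate(vm_host)
            vm_host, other = new_vm_host, vm_host
    return vm_host
```

In this model, calling `run_iterations(Host("alma03"), Host("alma04"), 8)` ends with the HE-VM back on the first host and both hosts active at full score.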

I did not hit the initially reported issue, hence moving this bug to verified.

Comment 5 Nikolai Sednev 2016-11-29 15:05:18 UTC
Changing back to assigned, as the bug was eventually reproduced after the ~10th iteration: without waiting for the target host to become active with a positive score, I tried to set the host running the HE-VM (alma04) into maintenance while alma03 was active but did not yet have a positive HA score.
Attaching sosreports from my environment.

Comment 6 Red Hat Bugzilla Rules Engine 2016-11-29 15:05:26 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 7 Nikolai Sednev 2016-11-29 15:15:48 UTC
Created attachment 1225889
sosreport from alma03

Comment 8 Nikolai Sednev 2016-11-29 15:17:13 UTC
Created attachment 1225892
sosreport from alma04

Comment 9 Nikolai Sednev 2016-11-29 15:18:42 UTC
Created attachment 1225893
sosreport from engine

Comment 10 Nikolai Sednev 2016-11-29 16:35:09 UTC
Moving back to verified: what I found will be documented and a separate bug will be opened. Since this exact issue was not reproduced, moving it to verified.

Comment 11 Nikolai Sednev 2016-11-30 07:08:51 UTC
See also https://bugzilla.redhat.com/show_bug.cgi?id=1069269, which was opened by msivak following https://bugzilla.redhat.com/show_bug.cgi?id=1391933#c5.

Comment 12 Nikolai Sednev 2016-11-30 07:23:43 UTC
Sorry, added a wrong link, this is the correct one: https://bugzilla.redhat.com/show_bug.cgi?id=1399766.

