Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1659096

Summary: [downstream clone - 4.2.8] Hosted-Engine VM failed to start mixing ovirt-hosted-engine-setup from 4.1 with ovirt-hosted-engine-ha from 4.2
Product: Red Hat Enterprise Virtualization Manager
Reporter: RHV bug bot <rhv-bugzilla-bot>
Component: ovirt-hosted-engine-ha
Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED ERRATA
QA Contact: Nikolai Sednev <nsednev>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 4.1.5
CC: emarcus, lsurette, nsednev
Target Milestone: ovirt-4.2.8
Keywords: Triaged, ZStream
Target Release: ---
Flags: lsvaty: testing_plan_complete-
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: v2.2.19
Doc Type: Bug Fix
Doc Text: Enforce version consistency at rpm level between the ovirt-hosted-engine-ha and the ovirt-hosted-engine-setup.
Story Points: ---
Clone Of: 1643663
Environment:
Last Closed: 2019-01-22 12:44:31 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Integration
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1643663
Bug Blocks: 1653845

Description RHV bug bot 2018-12-13 14:56:08 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1643663 +++
======================================================================

Description of problem:
 
Hosted Engine VM failed to start with a Python AttributeError.

engine-upgrade-check indicated that no RHEV upgrade was available. Updates were applied via yum, including a kernel update.
Shutdown of the engine completed; however, hosted-engine --vm-start fails with the following Python error:
[root@vhacdwdwhvsh223 ~]# hosted-engine --vm-start
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vdsm_helper.py", line 149, in <module>
    args.command(args)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vdsm_helper.py", line 63, in checkVmStatus
    status = cli.getVmStats(args.vmid)
AttributeError: '_Client' object has no attribute 'getVmStats'
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vdsm_helper.py", line 149, in <module>
    args.command(args)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vdsm_helper.py", line 41, in create
    status = cli.create(vm_params)
AttributeError: '_Client' object has no attribute 'create'
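
For context, here is a minimal sketch (an illustration assuming the 4.2-era vdsm-client API, not the exact upstream vdsm_helper.py code) of why the flat calls made by the 4.1 helper fail against the 4.2 client, which exposes namespaced verbs instead:
~~~
# Illustrative sketch only -- assumes the vdsm-client JSON-RPC API shipped with
# oVirt 4.2; not the exact upstream vdsm_helper.py code.
from vdsm import client  # provided by the vdsm-client package (>= 4.20)

# Connect to the local vdsmd JSON-RPC endpoint (default port assumed here).
cli = client.connect('localhost', 54321)

vmid = '00000000-0000-0000-0000-000000000000'  # placeholder VM UUID

# The 4.1-era helper calls flat verbs on the client object; the 4.2 client has
# no such attributes, which yields the tracebacks above:
#   cli.getVmStats(vmid)   -> AttributeError: '_Client' object has no attribute 'getVmStats'
#   cli.create(vm_params)  -> AttributeError: '_Client' object has no attribute 'create'

# The 4.2 client exposes namespaced verbs instead, e.g.:
status = cli.VM.getStats(vmID=vmid)
print(status)
~~~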


How reproducible:


Steps to Reproduce:

This error was encountered in the customer's HE setup; the sequence of events is as follows:

Observations-1:
===============

- Upon checking the VM status and the ps output from each host, we observed the HE VM as UP on one of the hosts, host3:
~~~
$ less 50-clu200_hosted-engine_status|egrep '==|status'
--== Host 1 status ==--
Engine status                      : unknown stale-data
--== Host 2 status ==--
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
--== Host 3 status ==--
Engine status                      : {"health": "good", "vm": "up", "detail": "Up"}
--== Host 4 status ==--
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
--== Host 5 status ==--
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
~~~
However, upon further checking, it was observed that there was no qemu process running, the virsh output showed the hosted engine as shut off, and the ps output showed the HE VM as unreachable.

Actions Taken:
==============
1. Restarted the agent and broker services in sequence
2. Checked the HE VM status on all 5 hosts to confirm what HostedEngine currently believes.

Outcome:
========
1. The VM status output now showed the HE VM as down on every host.
2. Started the HE VM by running the command on host-3 to check whether the Python error still occurred.
3. The VM did not start and returned the same missing Python attribute error:
~~~
AttributeError: '_Client' object has no attribute 'getVmStats'
AttributeError: '_Client' object has no attribute 'create'
~~~

Observations-2:
===============

- I searched for similar kbase articles and BZs and found nothing relevant to the specific attribute mentioned above.
- I observed one more error in the host's syslog, a libvirtd error that made me suspect the storage was unavailable to that host:
~~~
Oct 25 18:20:37 vhacdwdwhvsh221 libvirtd: 2018-10-25 23:20:37.422+0000: 5996: error : qemuOpenFileAs:3234 : Failed to open file '/var/run/vdsm/storage/eac03447-50c9-4542-a9f3-65aebb58d68f/3188be35-fc63-4f37-a25c-cb82ba2ceeee/44dde38e-d0f5-4d7d-92a3-7994c425ec87': No such file or directory
~~~
- However, upon checking the sosreport, it is confirmed that the LV is not missing and the LUN is present:
~~~

/dev/mapper/360002ac000000000000000240001ea6e eac03447-50c9-4542-a9f3-65aebb58d68f lvm2 a--  99.62g  33.25g 100.00g 4i1USJ-P9gG-tC9P-BavB-Yw9J-9RlG-iDi5xi lvm2 4i1USJ-P9gG-tC9P-BavB-Yw9J-9RlG-iDi5xi 100.00g /dev/mapper/360002ac000000000000000240001ea6e 253 4      63.99m   128.00m       2 144.00m 99.62g  33.25g  <66.38g a--  allocatable                         797   531             2        2       0       0    used           lvm2 0nedej-kANN-DVnZ-BZs0-LbeX-rmey-q4GraN eac03447-50c9-4542-a9f3-65aebb58d68f wz--n- writeable  extendable                       normal                99.62g  33.25g                                     128.00m   797   266     0     0   1           0  13   0  26 MDT_CLASS=Data,MDT_DESCRIPTION=hosted_storage,MDT_IOOPTIMEOUTSEC=10,MDT_LEASERETRIES=3,MDT_LEASETIMESEC=60,MDT_LOCKPOLICY=,MDT_LOCKRENEWALINTERVALSEC=5,MDT_LOGBLKSIZE=512,MDT_PHYBLKSIZE=512,MDT_POOL_UUID=5a870ad7-01be-02c2-00dc-00000000030d,MDT_PV0=pv:360002ac000000000000000240001ea6e&44&uuid:4i1USJ-P9gG-tC9P-BavB-Yw9J-9RlG-iDi5xi&44&pestart:0&44&pecount:797&44&mapoffset:0,MDT_ROLE=Regular,MDT_SDUUID=eac03447-50c9-4542-a9f3-65aebb58d68f,MDT_TYPE=FCP,MDT_VERSION=4,MDT_VGUUID=0nedej-kANN-DVnZ-BZs0-LbeX-rmey-q4GraN,MDT__SHA_CKSUM=946e4c59e0f29b022e0b9f09d6b4d37d587e9939,RHAT_storage_domain                                                                                                                                                               2        2    63.99m   128.00m unmanaged


$ less su_vdsm_-s_.bin.sh_-c_.usr.bin.tree_-l_.rhev.data-center | grep 44dde38e-d0f5-4d7d-92a3-7994c425ec87
|   |       |   `-- 44dde38e-d0f5-4d7d-92a3-7994c425ec87 -> /dev/eac03447-50c9-4542-a9f3-65aebb58d68f/44dde38e-d0f5-4d7d-92a3-7994c425ec87
        |       |   `-- 44dde38e-d0f5-4d7d-92a3-7994c425ec87 -> /dev/eac03447-50c9-4542-a9f3-65aebb58d68f/44dde38e-d0f5-4d7d-92a3-7994c425ec87
~~~

- The DNS configuration shows that the host points to "localhost" in resolv.conf, which should not be the case: (Also see #54)
~~~
$ cat etc/resolv.conf
# Managed by ansible, hand edits will be overwritten.
search vha.med.va.gov
domain vha.med.va.gov
nameserver 127.0.0.1
nameserver 10.224.149.150
nameserver 10.224.45.3
nameserver 10.3.27.33
~~~

 
Actual results:


Expected results:


Additional info:

(Originally by Bhushan Ranpise)

Comment 1 RHV bug bot 2018-12-13 14:56:10 UTC
Just to add additional information: the customer states that they were simply applying recommended minor updates to the system when this issue started.

(Originally by Randy Bollinger)

Comment 5 RHV bug bot 2018-12-13 14:56:20 UTC
sosreport-wcarlson.02236185-20181026073634 shows
   ovirt-hosted-engine-setup-2.1.3.6-1.el7ev.noarch (from ovirt-4.1)
but
   ovirt-hosted-engine-ha-2.2.11-1.el7ev.noarch and vdsm-client-4.20.27.2-1.el7ev.noarch (from ovirt-4.2)

vdsm-client changed drastically between 4.1 and 4.2; we should have had a "Conflicts: ovirt-hosted-engine-ha < 2.2" there.

(Originally by danken)

Comment 6 RHV bug bot 2018-12-13 14:56:21 UTC
(In reply to Dan Kenigsberg from comment #3)
> sosreport-wcarlson.02236185-20181026073634 shows
>    ovirt-hosted-engine-setup-2.1.3.6-1.el7ev.noarch (from ovirt-4.1)
> but
>    ovirt-hosted-engine-ha-2.2.11-1.el7ev.noarch and
> vdsm-client-4.20.27.2-1.el7ev.noarch (from ovirt-4.2)
> 
> vdsm-client had a drastic change between 4.1 to 4.2, we should have had a
> "Conflicts: ovirt-hosted-engine-ha < 2.2" there.

Yes, I think the issue is simply due to mixing ovirt-hosted-engine-setup from 4.1 with ovirt-hosted-engine-ha from 4.2.

In the ovirt-hosted-engine-setup spec file from 4.2 we have
  Requires:       ovirt-hosted-engine-ha >= 2.2.13

but we are missing a
  Conflicts: ovirt-hosted-engine-ha >= 2.2
in the ovirt-hosted-engine-setup spec file from 4.1. Either that, or a
  Conflicts: ovirt-hosted-engine-setup < 2.2
in the ovirt-hosted-engine-ha spec file from 4.2 (next build), would prevent this issue.

Also upgrading ovirt-hosted-engine-setup on all the hosts to the latest 2.2.z should solve this.

(Originally by Simone Tiraboschi)

Comment 10 RHV bug bot 2018-12-13 14:56:26 UTC
moving to Integration for consideration of further defensive spec changes, but it doesn't seem like a product bug

(Originally by michal.skrivanek)

Comment 20 RHV bug bot 2018-12-13 14:56:40 UTC
The customer was able to downgrade ovirt-hosted-engine-ha and restart the host, and ovirt-engine/Hosted Engine was able to start again. Is there a need to collect a new log collector archive at this time?

(Originally by Robert McSwain)

Comment 22 Nikolai Sednev 2019-01-07 16:49:03 UTC
Further to https://bugzilla.redhat.com/show_bug.cgi?id=1643663#c22, moving to verified.

Comment 24 errata-xmlrpc 2019-01-22 12:44:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0118

Comment 25 Daniel Gur 2019-08-28 13:11:21 UTC
sync2jira

Comment 26 Daniel Gur 2019-08-28 13:15:33 UTC
sync2jira