Description of problem:

Hosted Engine VM failed to start, giving a python attribute error. engine-upgrade-check indicated no RHEV upgrade. Updates were applied via yum, which included a kernel update. Shutdown of the engine completed; however, hosted-engine --vm-start fails with a python error:

~~~
[root@vhacdwdwhvsh223 ~]# hosted-engine --vm-start
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vdsm_helper.py", line 149, in <module>
    args.command(args)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vdsm_helper.py", line 63, in checkVmStatus
    status = cli.getVmStats(args.vmid)
AttributeError: '_Client' object has no attribute 'getVmStats'
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vdsm_helper.py", line 149, in <module>
    args.command(args)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vdsm_helper.py", line 41, in create
    status = cli.create(vm_params)
AttributeError: '_Client' object has no attribute 'create'
~~~

How reproducible:

Steps to Reproduce:
This error was encountered in the customer's HE setup; the sequence of events is as follows:

Observations-1:
===============
- Upon checking the vm status and the ps output from each host, we observed the HE VM as UP on one of the hosts, host3:
~~~
$ less 50-clu200_hosted-engine_status|egrep '==|status'
--== Host 1 status ==--
Engine status              : unknown stale-data
--== Host 2 status ==--
Engine status              : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
--== Host 3 status ==--
Engine status              : {"health": "good", "vm": "up", "detail": "Up"}
--== Host 4 status ==--
Engine status              : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
--== Host 5 status ==--
Engine status              : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
~~~
However, upon further checking there was no qemu process running, the virsh output showed the hosted engine as shut off, and the ps output showed the HE VM as unreachable.

Actions Taken:
==============
1. Restarted the agent and broker services in sequence.
2. Checked the HE VM status on all 5 hosts to confirm what HostedEngine believes now.

Outcome:
========
1. The vm status output now showed the HE VM as down on every host.
2. Started the HE VM with the command on host-3 to check whether we still get the python error.
3. The VM did not start and returned the same missing-attribute python errors:
~~~
AttributeError: '_Client' object has no attribute 'getVmStats'
AttributeError: '_Client' object has no attribute 'create'
~~~

Observations-2:
===============
- I searched for similar kbases and BZs and found nothing relevant to the specific attributes mentioned above.
- I observed one more error in the host's syslog, a libvirtd error which made me suspect that the storage was unavailable to that host:
~~~
Oct 25 18:20:37 vhacdwdwhvsh221 libvirtd: 2018-10-25 23:20:37.422+0000: 5996: error : qemuOpenFileAs:3234 : Failed to open file '/var/run/vdsm/storage/eac03447-50c9-4542-a9f3-65aebb58d68f/3188be35-fc63-4f37-a25c-cb82ba2ceeee/44dde38e-d0f5-4d7d-92a3-7994c425ec87': No such file or directory
~~~
- But upon checking the sosreport, it is confirmed that the LV is not missing and the LUN is present:
~~~
/dev/mapper/360002ac000000000000000240001ea6e  eac03447-50c9-4542-a9f3-65aebb58d68f  lvm2  a--  99.62g  33.25g
100.00g  4i1USJ-P9gG-tC9P-BavB-Yw9J-9RlG-iDi5xi  lvm2  4i1USJ-P9gG-tC9P-BavB-Yw9J-9RlG-iDi5xi  100.00g  /dev/mapper/360002ac000000000000000240001ea6e  253  4  63.99m  128.00m  2  144.00m  99.62g  33.25g  <66.38g  a--  allocatable  797  531  2  2  0  0  used  lvm2
0nedej-kANN-DVnZ-BZs0-LbeX-rmey-q4GraN  eac03447-50c9-4542-a9f3-65aebb58d68f  wz--n-  writeable  extendable  normal  99.62g  33.25g  128.00m  797  266  0  0  1  0  13  0  26  MDT_CLASS=Data,MDT_DESCRIPTION=hosted_storage,MDT_IOOPTIMEOUTSEC=10,MDT_LEASERETRIES=3,MDT_LEASETIMESEC=60,MDT_LOCKPOLICY=,MDT_LOCKRENEWALINTERVALSEC=5,MDT_LOGBLKSIZE=512,MDT_PHYBLKSIZE=512,MDT_POOL_UUID=5a870ad7-01be-02c2-00dc-00000000030d,MDT_PV0=pv:360002ac000000000000000240001ea6e&44&uuid:4i1USJ-P9gG-tC9P-BavB-Yw9J-9RlG-iDi5xi&44&pestart:0&44&pecount:797&44&mapoffset:0,MDT_ROLE=Regular,MDT_SDUUID=eac03447-50c9-4542-a9f3-65aebb58d68f,MDT_TYPE=FCP,MDT_VERSION=4,MDT_VGUUID=0nedej-kANN-DVnZ-BZs0-LbeX-rmey-q4GraN,MDT__SHA_CKSUM=946e4c59e0f29b022e0b9f09d6b4d37d587e9939,RHAT_storage_domain  2  2  63.99m  128.00m  unmanaged

$ less su_vdsm_-s_.bin.sh_-c_.usr.bin.tree_-l_.rhev.data-center | grep 44dde38e-d0f5-4d7d-92a3-7994c425ec87
|   |   |   `-- 44dde38e-d0f5-4d7d-92a3-7994c425ec87 -> /dev/eac03447-50c9-4542-a9f3-65aebb58d68f/44dde38e-d0f5-4d7d-92a3-7994c425ec87
|   |   `-- 44dde38e-d0f5-4d7d-92a3-7994c425ec87 -> /dev/eac03447-50c9-4542-a9f3-65aebb58d68f/44dde38e-d0f5-4d7d-92a3-7994c425ec87
~~~
- The DNS configuration shows the host pointing to "localhost" in resolv.conf, which should not be the case (also see #54):
~~~
$ cat etc/resolv.conf
# Managed by ansible, hand edits will be overwritten.
search vha.med.va.gov
domain vha.med.va.gov
nameserver 127.0.0.1
nameserver 10.224.149.150
nameserver 10.224.45.3
nameserver 10.3.27.33
~~~

Actual results:

Expected results:

Additional info:
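As a consolidated cross-check of the observations above, the following shell sketch (commands are illustrative and not taken from the original case notes; UUIDs are the ones from the logs in this report) compares what libvirt, qemu, the HA agent, and LVM each report on the host:

~~~
# Compare what libvirt, qemu and the HA agent each believe about the HE VM:
virsh -r list --all                               # read-only; HostedEngine shown as running or shut off
ps aux | grep qemu                                # no qemu-kvm process => the VM is not actually running
hosted-engine --vm-status                         # the agent's view, as collected on every host above
systemctl restart ovirt-ha-broker ovirt-ha-agent  # restart broker and agent, as in "Actions Taken"

# Verify the hosted-engine image LV behind the path libvirtd failed to open:
lvs eac03447-50c9-4542-a9f3-65aebb58d68f
ls -l /var/run/vdsm/storage/eac03447-50c9-4542-a9f3-65aebb58d68f/3188be35-fc63-4f37-a25c-cb82ba2ceeee/
ls -l /dev/eac03447-50c9-4542-a9f3-65aebb58d68f/44dde38e-d0f5-4d7d-92a3-7994c425ec87
# If the LV exists but the device node is absent, it may simply not be activated on this host:
lvchange -ay eac03447-50c9-4542-a9f3-65aebb58d68f/44dde38e-d0f5-4d7d-92a3-7994c425ec87
~~~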
Just to add some additional information: the customer states that they were simply applying the recommended minor updates to the system when this issue started.
sosreport-wcarlson.02236185-20181026073634 shows
ovirt-hosted-engine-setup-2.1.3.6-1.el7ev.noarch (from ovirt-4.1)
but
ovirt-hosted-engine-ha-2.2.11-1.el7ev.noarch and vdsm-client-4.20.27.2-1.el7ev.noarch (from ovirt-4.2)

vdsm-client changed drastically between 4.1 and 4.2; we should have had a "Conflicts: ovirt-hosted-engine-ha < 2.2" there.
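A quick way to spot this kind of mixed 4.1/4.2 install on a host is a plain rpm query; the sketch below is illustrative, with the output echoing the versions reported in the sosreport:

~~~
$ rpm -qa 'ovirt-hosted-engine-*' 'vdsm-client*'
ovirt-hosted-engine-setup-2.1.3.6-1.el7ev.noarch    <- ovirt-4.1 stream
ovirt-hosted-engine-ha-2.2.11-1.el7ev.noarch        <- ovirt-4.2 stream
vdsm-client-4.20.27.2-1.el7ev.noarch                <- ovirt-4.2 stream
~~~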
(In reply to Dan Kenigsberg from comment #3)
> sosreport-wcarlson.02236185-20181026073634 shows
> ovirt-hosted-engine-setup-2.1.3.6-1.el7ev.noarch (from ovirt-4.1)
> but
> ovirt-hosted-engine-ha-2.2.11-1.el7ev.noarch and
> vdsm-client-4.20.27.2-1.el7ev.noarch (from ovirt-4.2)
>
> vdsm-client had a drastic change between 4.1 to 4.2, we should have had a
> "Conflicts: ovirt-hosted-engine-ha < 2.2" there.

Yes, I think the issue is simply due to having mixed ovirt-hosted-engine-setup from 4.1 with ovirt-hosted-engine-ha from 4.2.

In the ovirt-hosted-engine-setup spec file from 4.2 we have
  Requires: ovirt-hosted-engine-ha >= 2.2.13
but we are missing a
  Conflicts: ovirt-hosted-engine-ha >= 2.2
in the ovirt-hosted-engine-setup spec file from 4.1; alternatively, a
  Conflicts: ovirt-hosted-engine-setup < 2.2
in the ovirt-hosted-engine-ha spec file from 4.2 (next build) should prevent this issue.

Upgrading ovirt-hosted-engine-setup on all the hosts to the latest 2.2.z should also solve this.
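The packaging metadata Simone describes can also be inspected directly on an affected host; a hedged sketch (queries are illustrative, output omitted):

~~~
# What the 4.2 setup package declares (per the comment above, it carries
# "Requires: ovirt-hosted-engine-ha >= 2.2.13"):
$ rpm -q --requires ovirt-hosted-engine-setup | grep ovirt-hosted-engine-ha

# What the installed ovirt-hosted-engine-ha declares; in the affected build there is no
# "Conflicts: ovirt-hosted-engine-setup < 2.2", which is why the mixed install was allowed:
$ rpm -q --conflicts ovirt-hosted-engine-ha
~~~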
moving to Integration for consideration of further defensive spec changes, but it doesn't seem like a product bug
The customer was able to downgrade ovirt-hosted-engine-ha and restart the host, and ovirt-engine/Hosted Engine started again. Is there a need to collect new log collector output at this time?
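For reference, a minimal sketch of that workaround, assuming the 4.1 repositories are still enabled so yum can find the older ovirt-hosted-engine-ha build:

~~~
# Roll ovirt-hosted-engine-ha back to the release matching the installed 4.1 setup package:
yum downgrade ovirt-hosted-engine-ha
# Restart the HA services (or reboot the host, as the customer did) and retry the engine VM:
systemctl restart ovirt-ha-broker ovirt-ha-agent
hosted-engine --vm-start
~~~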
Further to our latest conversation with Simone, here are the steps that were executed for verification:

1. Installed 4.1 ovirt-hosted-engine-setup-2.1.4.2-1.el7ev.noarch and ovirt-hosted-engine-ha-2.1.11-1.el7ev.noarch on RHEL7.5.
2. echo "ovirt-hosted-engine-setup" > /etc/yum/pluginconf.d/versionlock.list
3. Upgraded the OS to RHEL7.6.
4. Added 4.2 repos.
5. Tried "yum update ovirt-hosted-engine-ha" while keeping the older ovirt-hosted-engine-setup via versionlock. The update executed successfully and both ovirt-hosted-engine-setup and ovirt-hosted-engine-ha were updated:
   ovirt-hosted-engine-ha-2.2.19-1.el7ev.noarch
   ovirt-hosted-engine-setup-2.2.32-1.el7ev.noarch

The mixed-versions issue is fixed. Moving to verified.
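A simple post-update check (illustrative) confirming both packages now come from the 4.2 stream:

~~~
$ rpm -q ovirt-hosted-engine-ha ovirt-hosted-engine-setup
ovirt-hosted-engine-ha-2.2.19-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.32-1.el7ev.noarch
~~~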
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:1049
sync2jira