Description of problem:
A fresh install of oVirt 3.5, set up following this guide: http://community.redhat.com/blog/2014/10/up-and-running-with-ovirt-3-5/ but using iSCSI for the hosted-engine deployment instead of GlusterFS, on a single node (for now). The host runs CentOS 7; the VM holding the engine runs CentOS 6. Everything works: I have some VMs running fine and the engine seems OK, but:

* in the web GUI, Hosted Engine HA is reported as "Not Active"
* from the CLI, "hosted-engine --check-liveliness" returns "Hosted Engine is up!" but "hosted-engine --vm-status" fails with a Python exception:

Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 116, in <module>
    if not status_checker.print_status():
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 59, in print_status
    all_host_stats = ha_cli.get_all_host_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 155, in get_all_host_stats
    return self.get_all_stats(self.StatModes.HOST)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 102, in get_all_stats
    stats = self._parse_stats(stats, mode)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 141, in _parse_stats
    md = metadata.parse_metadata_to_dict(host_id, data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/metadata.py", line 147, in parse_metadata_to_dict
    constants.METADATA_FEATURE_VERSION))
ovirt_hosted_engine_ha.lib.exceptions.FatalMetadataError: Metadata version 9 from host 5 too new for this agent (highest compatible version: 1)

The "metadata" file, which is really a device under "/rhev/data-center/mnt/blockSD/d4af11cf-c656-40ca-bd42-d81cd7738a6b/ha_agent/hosted-engine.metadata", contains, when inspected, some metadata mixed with a lot of garbage (i.e. binary data; I don't know if that is expected, but according to other discussions there should be only some readable metadata). If I restart the agent and the broker, I get notifications reporting the following transitions:

StartState-ReinitializeFSM
ReinitializeFSM-EngineStarting
EngineStarting-EngineUP

The engine is working fine, but the anomalies reported above persist. Any hint on what to check?
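As a side note, the readable-vs-garbage question can be checked without modifying anything. A minimal read-only sketch (the device path is the one from this report; substitute your own):

```shell
# Read the first 4 KiB of the HA metadata device (read-only) and list
# any printable strings found; a sane metadata area should show
# readable text lines rather than only binary noise.
# NOTE: the path below is the one from this particular report.
MD="/rhev/data-center/mnt/blockSD/d4af11cf-c656-40ca-bd42-d81cd7738a6b/ha_agent/hosted-engine.metadata"
dd if="$MD" bs=4096 count=1 2>/dev/null | strings | head -n 20
```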
CC from the ovirt-users mailing list:

I blindly tried the following:
* checked the name of the block device used for metadata
* shut down the engine VM
* stopped the agent and the broker on the first host
* finally zeroed the block device with "dd if=/dev/zero of=/dev/dm-12" (dm-12 is the block device pointed to by the metadata file)

After starting the broker and agent again, the engine was started by HA after a while and the metadata was readable. Now "hosted-engine --vm-status" works fine, and I was able to add a second node to the cluster. The web GUI also now reports Hosted Engine HA as Active. Maybe the metadata block device needs to be cleared when doing an iSCSI setup? I don't know if this is the correct fix, but it seems to work fine now.
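The recovery steps above can be sketched as a short sequence. This is destructive, the device name /dev/dm-12 is specific to this host, and the systemd unit names (ovirt-ha-agent, ovirt-ha-broker) are the standard oVirt ones, assumed to match this deployment:

```shell
# DESTRUCTIVE: zeroes the hosted-engine HA metadata area.
# Shut down the engine VM first, then stop the HA services.
systemctl stop ovirt-ha-agent ovirt-ha-broker

# /dev/dm-12 is the block device the metadata path resolved to on
# THIS host; verify yours with "readlink -f" on the metadata path.
dd if=/dev/zero of=/dev/dm-12 bs=1M

# Restart the services; the agent should reinitialize the metadata
# and eventually start the engine VM again.
systemctl start ovirt-ha-broker ovirt-ha-agent
```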
In 3.6.0 there will be an option to fix the metadata in case of such issues. The root cause here was a mixture of new metadata with an older agent, which should not happen.
Actually, the iSCSI volume needs to be wiped before we start using it, as VDSM does not do that automatically.
The patch has been merged; please move to MODIFIED if no other change is required.
Can you please add steps to reproduce? Is this iSCSI-storage-specific? CentOS-specific?
iSCSI-specific. The reproducer is quite simple: a standard hosted-engine installation is needed, and a non-clean iSCSI disk has to be used for the storage. Alternatively, take an existing install and fill the ha_agent metadata with random data.
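The second reproducer can be sketched roughly as follows (the path follows the layout quoted earlier in this report, with <sd-uuid> as a placeholder for the storage-domain UUID; the service names are the standard oVirt HA units and are an assumption about the deployment):

```shell
# Corrupt the hosted-engine metadata area with random bytes so that
# the next status read fails to parse it (FatalMetadataError).
MD="/rhev/data-center/mnt/blockSD/<sd-uuid>/ha_agent/hosted-engine.metadata"

systemctl stop ovirt-ha-agent ovirt-ha-broker
dd if=/dev/urandom of="$MD" bs=4096 count=64    # write 256 KiB of noise
systemctl start ovirt-ha-broker ovirt-ha-agent

hosted-engine --vm-status    # expected to fail parsing the metadata
```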
Bug tickets that are moved to testing must have target release set to make sure tester knows what to test. Please set the correct target release before moving to ON_QA.
Deployment over a non-clean iSCSI LUN finished successfully with vt17.3. Used the following:
ovirt-hosted-engine-ha-1.2.7.2-1.el7ev.noarch
ovirt-hosted-engine-setup-1.2.6.1-1.el7ev.noarch
oVirt 3.5.5 has been released including fixes for this issue.
(In reply to Elad from comment #8) > Deployment over iSCSI non-clean LUN finished successfully with vt17.3. Used > the following: > ovirt-hosted-engine-ha-1.2.7.2-1.el7ev.noarch > ovirt-hosted-engine-setup-1.2.6.1-1.el7ev.noarch. Any chance to find out how this bug was verified? I am pretty certain that the fix is wrong for iSCSI at least and wonder if it was really verified. See also bug 1346341 and likely also bug 1314522 and other similar reports.
The bug was verified according to the steps in comment 6
(In reply to Elad from comment #11) > The bug was verified according to the steps in comment 6 How did you force a non-clean disk?
(In reply to Yedidyah Bar David from comment #12) > (In reply to Elad from comment #11) > > The bug was verified according to the steps in comment 6 > > How did you force a non-clean disk? Deployed HE and re-deployed over the same LUN.
OK. For bug 1346341 we'll provide more detailed instructions. Thanks!