Bug 1346341 - hosted-engine-setup doesn't correctly initialize the lockspace volume with zeroes
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-hosted-engine-ha
Classification: oVirt
Component: General
Version: ---
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-4.0.1
: 2.0.1
Assignee: Yedidyah Bar David
QA Contact: Jiri Belka
URL:
Whiteboard:
Depends On:
Blocks: 1306952
Reported: 2016-06-14 15:01 UTC by Jiri Belka
Modified: 2017-05-11 09:29 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-08-04 13:31:34 UTC
oVirt Team: Integration
Embargoed:
rule-engine: ovirt-4.0.z+
rule-engine: planning_ack+
dfediuck: devel_ack+
pstehlik: testing_ack+


Links
System | ID | Private | Priority | Status | Summary | Last Updated
oVirt gerrit | 59241 | 0 | master | MERGED | Fix size calculation when zeroing VDSM volumes | 2021-01-21 15:12:27 UTC
oVirt gerrit | 60269 | 0 | v2.0.z | MERGED | Fix size calculation when zeroing VDSM volumes | 2021-01-21 15:12:27 UTC

Description Jiri Belka 2016-06-14 15:01:31 UTC
Description of problem:

With the 3.6-snapshot repos, ovirt-hosted-engine-setup-1.3.7.3-0.0.master.20160607094202.git6c7a783.el7.centos.noarch (the hosted-engine command) does not work correctly with ovirt-engine-appliance-3.6-20160613.1.el7.centos.noarch:

# hosted-engine --vm-status
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 117, in <module>
    if not status_checker.print_status():
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 60, in print_status
    all_host_stats = ha_cli.get_all_host_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 160, in get_all_host_stats
    return self.get_all_stats(self.StatModes.HOST)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 107, in get_all_stats
    stats = self._parse_stats(stats, mode)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 146, in _parse_stats
    md = metadata.parse_metadata_to_dict(host_id, data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/metadata.py", line 156, in parse_metadata_to_dict
    constants.METADATA_FEATURE_VERSION))
ovirt_hosted_engine_ha.lib.exceptions.FatalMetadataError: Metadata version 6 from host 41 too new for this agent (highest compatible version: 1)

I was using iSCSI storage for HE on EL7.


Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-1.3.7.2-1.el7ev.noarch
ovirt-engine-appliance-3.6-20160613.1.el7.centos.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install an EL7 host with the 3.6-snapshot repos enabled
2. yum install ovirt-engine-appliance
3. Run hosted-engine --deploy with iSCSI storage (not sure whether the storage
   type is relevant, but I used iSCSI)
4. hosted-engine --vm-status

Actual results:
# hosted-engine --vm-status
Traceback (most recent call last):
  [same stack frames as in the traceback under "Description of problem"]
ovirt_hosted_engine_ha.lib.exceptions.FatalMetadataError: Metadata version 6 from host 41 too new for this agent (highest compatible version: 1)


Expected results:
The usual output showing the HE VM and environment status.

Additional info:

Comment 2 Yedidyah Bar David 2016-06-15 06:54:04 UTC
A few notes:

1. (Probably unrelated to the current bug) The setup log has:

2016-06-14 13:48:10 INFO otopi.plugins.ovirt_hosted_engine_setup.storage.blockd blockd._misc:639 Creating Volume Group
2016-06-14 13:48:11 DEBUG otopi.plugins.ovirt_hosted_engine_setup.storage.blockd blockd._misc:641 {'status': {'message': 'Failed to initialize physical device: ("[\'/dev/mapper/1IET_000c0001\']",)', 'code': 601}}
2016-06-14 13:48:11 ERROR otopi.plugins.ovirt_hosted_engine_setup.storage.blockd blockd._misc:647 Error creating Volume Group: Failed to initialize physical device: ("['/dev/mapper/1IET_000c0001']",)
2016-06-14 13:48:11 DEBUG otopi.plugins.otopi.dialog.human human.queryString:156 query OVEHOSTED_FORCE_CREATEVG
2016-06-14 13:48:11 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:219 DIALOG:SEND                 The selected device is already used.

But neither code 601 nor the message says _why_ it failed. It could have been network issues, permissions, a failure on the iSCSI target, or something else. I'm not sure we can or should do anything about this in the code, but it might still help to check and attach other relevant logs (vdsm, storage, syslog, etc.).

2. This looks remarkably similar to bug 1238823. The fix there was to zero the volume before first use, but that patch was in the HA code, and I can't find where (if at all) setup calls that code. I might be missing something.
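The "zero the volume before first use" approach can be sketched as follows, assuming the volume is reachable as a regular file or block device path. `zero_volume` is a hypothetical helper, not the ovirt-hosted-engine-ha implementation; it only illustrates why the chunk count matters when zeroing in fixed-size chunks: rounding the count down would leave a trailing partial chunk with stale data in it.

```python
import os
import tempfile


def zero_volume(path: str, size: int, chunk: int = 1024 * 1024) -> int:
    """Hypothetical helper: overwrite the first `size` bytes of `path`
    with zeroes and return the number of bytes written."""
    # Ceiling division: plain size // chunk would skip a trailing
    # partial chunk and leave old data behind.
    nchunks = (size + chunk - 1) // chunk
    zeros = bytes(chunk)
    written = 0
    with open(path, "r+b") as f:
        for i in range(nchunks):
            # The last chunk may be shorter than `chunk`.
            n = min(chunk, size - i * chunk)
            f.write(zeros[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())
    return written


# Demo on a scratch file standing in for the volume: 3000 random bytes,
# of which the first 2500 are zeroed in 1 KiB chunks (2 full + 1 partial).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(3000))
    scratch = tmp.name

written = zero_volume(scratch, 2500, chunk=1024)
with open(scratch, "rb") as f:
    data = f.read()
os.remove(scratch)
print(written, data[:2500] == bytes(2500))  # 2500 True
```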

Comment 3 Simone Tiraboschi 2016-06-15 08:56:25 UTC
The fix was in -ha, in VdsmBackend.createVolume in storage_backends.py.

Meanwhile, -setup already does:
from ovirt_hosted_engine_ha.lib import storage_backends
...
backend = storage_backends.VdsmBackend(
...
            created = backend.create({
                lockspace + '.lockspace': 1024*1024*backend.blocksize/512,
                lockspace + '.metadata': md_size,
            })

and VdsmBackend.create internally calls VdsmBackend.createVolume
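For reference, the lockspace size passed to `backend.create()` above is `1024*1024*backend.blocksize/512`. A minimal Python 3 sketch of that arithmetic, assuming `blocksize` is the storage's sector size in bytes (typically 512 or 4096) and that the result is a size in bytes; `lockspace_size` is a hypothetical helper name, not part of ovirt-hosted-engine-ha:

```python
def lockspace_size(blocksize: int) -> int:
    """Hypothetical helper mirroring 1024*1024*backend.blocksize/512.

    Assumes `blocksize` is the sector size in bytes and the result is a
    size in bytes. `//` keeps the result an int on Python 3; the original
    ran on Python 2.7, where `/` between ints is already integer division.
    """
    if blocksize <= 0:
        raise ValueError("blocksize must be positive")
    return 1024 * 1024 * blocksize // 512


for bs in (512, 4096):
    size = lockspace_size(bs)
    print(f"blocksize={bs}: {size} bytes ({size // (1024 * 1024)} MiB)")
```

Under those assumptions, 512-byte sectors give a 1 MiB (1048576-byte) lockspace and 4096-byte sectors give 8 MiB (8388608 bytes), so the whole region has to be zeroed, not just its first megabyte.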

Comment 4 Yedidyah Bar David 2016-06-15 13:02:30 UTC
Reproduction/verification steps:

1. Allocate an iSCSI target LUN for hosted-engine
2. Fill it with random data
3. Deploy hosted-engine on it
4. Run 'hosted-engine --vm-status'
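Step 2 can be scripted. A minimal Python sketch, assuming the LUN is accessible from the test machine as a block device; `fill_random` is a hypothetical helper, and the demo runs against a scratch file rather than a real LUN. Overwriting a real device destroys its contents, so point it only at a disposable test LUN.

```python
import os
import tempfile


def fill_random(path: str, size: int, chunk: int = 1024 * 1024) -> None:
    """Hypothetical helper: overwrite the first `size` bytes of `path`
    with random data, `chunk` bytes at a time."""
    remaining = size
    with open(path, "r+b") as f:
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(os.urandom(n))
            remaining -= n
        f.flush()
        os.fsync(f.fileno())


# Dry run on a 4 MiB scratch file instead of the real test LUN.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.truncate(4 * 1024 * 1024)
    target = tmp.name

fill_random(target, 4 * 1024 * 1024)
with open(target, "rb") as f:
    head = f.read(1024 * 1024)
os.remove(target)
print("non-zero data present:", any(head))
```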

Comment 5 Simone Tiraboschi 2016-06-15 13:24:57 UTC
(In reply to Yedidyah Bar David from comment #4)
> Reproduction/verification steps:
> 
> 1. Allocate an iSCSI target lun for hosted-engine
> 2. Fill it with random data

^^^
I think this is the key point: during development we mainly tested iSCSI deployment against a targetcli iSCSI target with a file-based backend.
The same goes for the automated tests, and probably for QE as well.
In this case the LUN is initialized to zero by construction.

Nothing guarantees that every SAN device actually zero-initializes its LUNs.


> 3. Deploy hosted-engine on it
> 4. Run 'hosted-engine --vm-status'
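The "initialized to zero by construction" behaviour of a file-backed LUN follows from ordinary file semantics: when a file is extended without writing data, the new region reads back as zero bytes. A LUN carved out of previously used disks gives no such guarantee. A minimal Python sketch of the file case; it only exercises a local scratch file and assumes nothing about how targetcli itself allocates its backing files.

```python
import os
import tempfile

SIZE = 8 * 1024 * 1024  # 8 MiB, arbitrary

# Create a new file and extend it to SIZE bytes without writing any data,
# similar to creating the backing file of a file-based LUN.
fd, path = tempfile.mkstemp()
try:
    os.ftruncate(fd, SIZE)
    with open(path, "rb") as f:
        data = f.read()
finally:
    os.close(fd)
    os.remove(path)

# The extended region reads back as zero bytes.
print(len(data) == SIZE, data == bytes(SIZE))  # True True
```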

Comment 6 Yedidyah Bar David 2016-06-15 13:50:27 UTC
To clarify: A workaround is to make sure that the storage space used for hosted-engine is zeroed prior to deploy. As Simone noted above, this is the default for many storage systems. If it's not, the admin should manually zero it.
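Before deploying, an admin could check whether the storage is actually zeroed rather than relying on the storage system's defaults. A minimal sketch that returns the offset of the first non-zero byte, or `None` if the first `size` bytes are all zero. `first_nonzero_offset` is a hypothetical helper; on a real host `path` would be the device backing the hosted-engine storage (reading a whole LUN can take a while), while the demo uses a scratch file.

```python
import os
import tempfile
from typing import Optional


def first_nonzero_offset(path: str, size: int,
                         chunk: int = 1024 * 1024) -> Optional[int]:
    """Hypothetical helper: offset of the first non-zero byte within the
    first `size` bytes of `path`, or None if that region is all zero."""
    offset = 0
    with open(path, "rb") as f:
        while offset < size:
            buf = f.read(min(chunk, size - offset))
            if not buf:
                break  # file/device is shorter than `size`
            stripped = buf.lstrip(b"\x00")
            if stripped:
                # Bytes stripped from the front are the leading zeroes.
                return offset + len(buf) - len(stripped)
            offset += len(buf)
    return None


# Scratch file: 4096 zero bytes, one 0x01 byte, then 1023 zero bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(4096) + b"\x01" + bytes(1023))
    scratch = tmp.name

dirty_at = first_nonzero_offset(scratch, 5120, chunk=1000)
clean = first_nonzero_offset(scratch, 4096)
os.remove(scratch)
print(dirty_at, clean)  # 4096 None
```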

Comment 7 Jiri Belka 2016-07-07 09:05:31 UTC
(In reply to Yedidyah Bar David from comment #6)
> To clarify: A workaround is to make sure that the storage space used for
> hosted-engine is zeroed prior to deploy. As Simone noted above, this is the
> default for many storage systems. If it's not, the admin should manually
> zero it.

This workaround never worked for me when the iSCSI target was EL6 (not tested with an EL7 target): the "discovered" LUN was always detected as dirty by hosted-engine --deploy.

Comment 8 Yedidyah Bar David 2016-07-07 13:44:14 UTC
(In reply to Jiri Belka from comment #7)
> This workaround never worked for me if iSCSI target is EL6 (though not
> tested with EL7 target), "discovered" LUN was always detected by
> hosted-engine --deploy as dirty.

Well, that isn't the current bug :-) The current bug is when it does not report the LUN as "dirty" but it actually still is.

Comment 9 Jiri Belka 2016-07-28 16:00:18 UTC
OK, tested using the steps from comment #4 with:

ovirt-hosted-engine-setup-2.0.1-1.el7ev.noarch

