Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1854867

Summary: HE deployment fails with iscsi storage
Product: Red Hat Enterprise Virtualization Manager
Reporter: Guilherme Santos <gdeolive>
Component: ovirt-hosted-engine-setup
Assignee: Asaf Rachmani <arachman>
Status: CLOSED WORKSFORME
QA Contact: Nikolai Sednev <nsednev>
Severity: high
Docs Contact:
Priority: high
Version: 4.4.0
CC: bzlotnik, lsurette, lsvaty, michal.skrivanek, mnecas, mperina, pelauter, sbonazzo, tnisan
Target Milestone: ovirt-4.4.3
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-09-01 09:11:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Integration
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Comment 6 Michal Skrivanek 2020-07-08 15:47:09 UTC
you have two of them, otopi_storage_domain_details and otopi_storage_domain_details_iscsi. Only the latter has "available", but that's not the one the installer is using. Can you describe what these SDs are?

Comment 10 Guilherme Santos 2020-07-09 07:13:37 UTC
Both otopi_storage_domain_details and otopi_storage_domain_details_iscsi describe the same SD (the one that the HE creates for the HE vm).
Some keys may change name/position, but the info is the same: SD id, datacenter, LUN id, target... However, the latter seems to have more info (not only "available" but also "used", for example).

Comment 11 Michal Skrivanek 2020-07-09 07:59:18 UTC
It looks like the SD activation didn't return available space. Can you please check with the REST API: when that SD is inactive/unattached, try to activate it using http://ovirt.github.io/ovirt-engine-api-model/master/#services/attached_storage_domain . What does it return?

Comment 12 Benny Zlotnik 2020-07-09 10:38:38 UTC
are there additional logs where one could see vdsm requests/responses?

Comment 13 Vojtech Juranek 2020-07-09 11:19:57 UTC
(In reply to Michal Skrivanek from comment #11)
> It looks like the SD activation didn't return available space. Can you
> please try to check with REST API, when you have that SD
> inactive/unattached, can you try to activate it using
> http://ovirt.github.io/ovirt-engine-api-model/master/#services/attached_storage_domain ?
> What does it return?

I don't think it has anything to do with what SD activation does under the hood. The REST API doesn't seem to return any values; this is what I'm getting:

vjuranek@localhost tmp$ curl -X POST -H "Content-type: application/xml" --user admin@internal:ovirt --cacert cert.pem --data "<action/>" https://ovirt-imageio.local/ovirt-engine/api/datacenters/13345997-b94f-42dd-b8ef-a1392f65cebf/storagedomains/4a115d1c-4fef-4266-a5f1-6d6425b6dd79/activate
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<action>
    <job href="/ovirt-engine/api/jobs/8be55ffb-4062-4210-aec8-f12577751469" id="8be55ffb-4062-4210-aec8-f12577751469"/>
    <status>complete</status>
    <storage_domain href="/ovirt-engine/api/datacenters/13345997-b94f-42dd-b8ef-a1392f65cebf/storagedomains/4a115d1c-4fef-4266-a5f1-6d6425b6dd79" id="4a115d1c-4fef-4266-a5f1-6d6425b6dd79">
        <actions>
            <link href="/ovirt-engine/api/datacenters/13345997-b94f-42dd-b8ef-a1392f65cebf/storagedomains/4a115d1c-4fef-4266-a5f1-6d6425b6dd79/activate" rel="activate"/>
            <link href="/ovirt-engine/api/datacenters/13345997-b94f-42dd-b8ef-a1392f65cebf/storagedomains/4a115d1c-4fef-4266-a5f1-6d6425b6dd79/deactivate" rel="deactivate"/>
        </actions>
        <link href="/ovirt-engine/api/datacenters/13345997-b94f-42dd-b8ef-a1392f65cebf/storagedomains/4a115d1c-4fef-4266-a5f1-6d6425b6dd79/disks" rel="disks"/>
        <data_center href="/ovirt-engine/api/datacenters/13345997-b94f-42dd-b8ef-a1392f65cebf" id="13345997-b94f-42dd-b8ef-a1392f65cebf"/>
        <data_centers>
            <data_center href="/ovirt-engine/api/datacenters/13345997-b94f-42dd-b8ef-a1392f65cebf" id="13345997-b94f-42dd-b8ef-a1392f65cebf"/>
        </data_centers>
    </storage_domain>
</action>
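
For reference, a minimal sketch of how one could check such a response for space fields programmatically. The sample below is an abbreviated copy of the output above, and the helper name `space_fields` is hypothetical; the element names `available` and `used` are assumed to match the keys seen in the Ansible facts elsewhere in this bug.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the <action> response pasted above (links omitted).
ACTIVATE_RESPONSE = """<action>
    <job href="/ovirt-engine/api/jobs/8be55ffb-4062-4210-aec8-f12577751469" id="8be55ffb-4062-4210-aec8-f12577751469"/>
    <status>complete</status>
    <storage_domain id="4a115d1c-4fef-4266-a5f1-6d6425b6dd79">
        <data_center id="13345997-b94f-42dd-b8ef-a1392f65cebf"/>
    </storage_domain>
</action>
"""


def space_fields(action_xml: str) -> dict:
    """Return the text of <available> and <used> under <storage_domain>.

    A value of None means the element is not present in the response.
    (Hypothetical helper, for illustration only.)
    """
    root = ET.fromstring(action_xml)
    sd = root.find("storage_domain")
    if sd is None:
        return {"available": None, "used": None}
    return {name: sd.findtext(name) for name in ("available", "used")}


print(space_fields(ACTIVATE_RESPONSE))
```

With the response above, both fields come back as None, which matches the observation that the activate action itself does not report space values.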

Comment 14 Nikolai Sednev 2020-07-13 10:08:53 UTC
Successfully deployed HE on latest components over iSCSI:
Software Version:4.4.1.2-0.10.el8ev
rhvm-appliance-4.4-20200604.0.el8ev.x86_64
ovirt-hosted-engine-setup-2.4.5-1.el8ev.noarch
ovirt-hosted-engine-ha-2.4.4-1.el8ev.noarch
Linux 4.18.0-193.13.1.el8_2.x86_64 #1 SMP Tue Jul 7 14:03:09 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux release 8.2 (Ootpa)

Please consider closing as works for me.

Comment 15 Tal Nisan 2020-07-13 14:18:47 UTC
(In reply to Nikolai Sednev from comment #14)
> 
> Please consider closing as works for me.

Any objections Guilherme or can I close it?
I'd like to understand as well what went wrong here. Was there a regression or not?

Comment 16 Nikolai Sednev 2020-07-13 14:52:46 UTC
(In reply to Tal Nisan from comment #15)
> (In reply to Nikolai Sednev from comment #14)
> > 
> > Please consider closing as works for me.
> 
> Any objections Guilherme or can I close it?
> I'd like to understand as well what went wrong here, was there a regression
> or not?

BTW, the deployment took much longer now than with previous versions. The delay started during the VM local disk copy to the iSCSI shared storage; the speed was about 24 MB/s instead of ~100 MB/s or more. Storage and network were not loaded and should not be an issue.

Comment 17 Michal Skrivanek 2020-07-14 12:30:41 UTC
(In reply to Vojtech Juranek from comment #13)
> I don't think it has anything to do what SD activation does under the hood.
> REST API doesn't seems to return any values, this is what I'm getting:

hm, then it could really be something in the ansible module. Martin, can you please check that?

Comment 18 Nikolai Sednev 2020-07-14 12:51:17 UTC
I also see several messages like this one in /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20200714142037-e99lf2.log after deployment was finished:

2020-07-14 15:36:27,173+0300 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Destroy local storage-pool {{ local_vm_disk_path.split('/')[5] }}]
2020-07-14 15:36:27,575+0300 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 {'msg': 'The task includes an option with an undefined variable. The error was: \'local_vm_disk_path\' is undefined\n\nThe error appears to be in \'/usr/share/ansible/roles/ovirt.hosted_engine_setup/tasks/clean_local_storage_pools.yml\': line 16, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n    changed_when: true\n  - name: Destroy local storage-pool {{ local_vm_disk_path.split(\'/\')[5] }}\n    ^ here\nWe could be wrong, but this one looks like it might be an issue with\nmissing quotes. Always quote template expression brackets when they\nstart a value. For instance:\n\n    with_items:\n      - {{ foo }}\n\nShould be written as:\n\n    with_items:\n      - "{{ foo }}"\n', '_ansible_no_log': False}

Comment 19 Martin Perina 2020-07-15 09:11:07 UTC
(In reply to Michal Skrivanek from comment #17)
> (In reply to Vojtech Juranek from comment #13)
> > I don't think it has anything to do what SD activation does under the hood.
> > REST API doesn't seems to return any values, this is what I'm getting:
> 
> hm, then it could really be something in the ansible module. Martin, can you
> please check that?

Looks like the issue is in the hosted-engine-setup role.

Comment 20 Martin Necas 2020-07-21 07:00:25 UTC
I tested whether ovirt_storage_domain_facts always returns the `available` attribute, and it does, so I also think the issue is in hosted-engine-setup.

Comment 21 Michal Skrivanek 2020-07-22 12:49:02 UTC
ok. I have my doubts but alright, let's move back to Integration for further investigation

Comment 22 Benny Zlotnik 2020-07-22 12:58:26 UTC
From what I've previously seen, I think the fields were null, which would explain why they were absent. Looking at the code, they are null only when the storage domain is deactivated or vdsm does not report them, but unfortunately there's not enough information to look at.
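
If the fields can indeed be transiently null, a consumer could poll until `available` shows up instead of failing on the first read. The following is only a sketch under that assumption, not what the hosted-engine-setup role does; `wait_for_available` and the `fetch_sd` callable are hypothetical names, and `fetch_sd` stands in for whatever returns the storage domain as a dict.

```python
import time
from typing import Callable, Optional


def wait_for_available(
    fetch_sd: Callable[[], dict],
    attempts: int = 5,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Poll a storage domain dict until it carries a non-null 'available'.

    fetch_sd is a hypothetical callable returning the storage domain
    details as a dict. Returns the value, or None if it never appears.
    """
    for attempt in range(attempts):
        available = fetch_sd().get("available")
        if available is not None:
            return available
        # Do not sleep after the final attempt.
        if attempt < attempts - 1:
            sleep(delay)
    return None


# Simulated sequence: the first read lacks the key, the second one has it.
responses = iter([
    {"name": "hosted_storage", "status": "unattached"},
    {"name": "hosted_storage", "status": "active", "available": 102005473280},
])
print(wait_for_available(lambda: next(responses), sleep=lambda _: None))
```

The `sleep` parameter is injectable only so the simulation above runs without waiting.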

Comment 23 Michal Skrivanek 2020-07-23 13:15:03 UTC
thanks benny.
so then again my question is the same - and it contradicts comment #20 - if the field is mandatory in the Ansible doc, why are we *not* returning that key? Blame the setup code all you want, but if there are circumstances when the documentation doesn't match the real behavior of the Ansible module (for whatever reason), we need to fix either the module or the doc.

Comment 24 Michal Skrivanek 2020-07-23 13:15:46 UTC
separately, since it looks like a race based on comment #22, is this reproducible?

Comment 25 Sandro Bonazzola 2020-07-27 14:46:20 UTC
Looks like no information is needed from me anymore.
I think enough information has already been requested on this bug, and it should be provided before digging further.

Comment 26 Guilherme Santos 2020-07-27 15:45:05 UTC
I wouldn't say so, Michal. The activation of the storage domain and all the reporting done by vdsm are handled by the installation itself under the hood. The activation task comes before the failing one, so I assume the SD is activated by then.
I can try to disrupt vdsm in the middle of the installation, i.e. after the activation task, to try to reproduce it, but I'm not sure it will work. I'll post here if I manage to do it.

Comment 27 Sandro Bonazzola 2020-07-29 09:04:57 UTC
Restoring needinfo on Martin for comment #23

Comment 28 Lukas Svaty 2020-07-29 16:39:49 UTC
Removing regression and blocker+ due to low reproducibility.

Comment 29 Martin Perina 2020-07-30 10:00:47 UTC
(In reply to Michal Skrivanek from comment #23)
> thanks benny.
> so then again my question is the same - and it contradicts comment #20 - if
> the field is mandatory in ansible doc why are we *not* returning that key?
> Blame the setup code all you want, but if there are circumstances when the
> documentation doesn't match the real behavior of ansible module (for
> whatever reason) we need to fix either the module or doc.

So looking at the hosted-engine-setup role, we have the following usages of ovirt_storage_domain_facts:

https://github.com/oVirt/ovirt-ansible-hosted-engine-setup/blob/master/tasks/create_storage_domain.yml#L115
https://github.com/oVirt/ovirt-ansible-hosted-engine-setup/blob/master/tasks/create_target_vm/01_create_target_hosted_engine_vm.yml#L116
https://github.com/oVirt/ovirt-ansible-hosted-engine-setup/blob/master/tasks/create_target_vm/03_hosted_engine_final_tasks.yml#L13

The 1st and 2nd usages are followed by a debug task, so we can easily check the content:

2020-07-08 11:37:26,528+0300 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 storage_domain_details: {'changed': False, 'ansible_facts': {'ovirt_storage_domains': [{'href': '/ovirt-engine/api/storagedomains/9eb05cb7-b7da-4e09-9edd-1f55c4c5ef61', 'comment': '', 'description': '', 'id': '9eb05cb7-b7da-4e09-9edd-1f55c4c5ef61', 'name': 'hosted_storage', 'available': 102005473280, 'backup': False, 'block_size': 512, 'committed': 0, 'critical_space_action_blocker': 5, 'discard_after_delete': True, 'disk_profiles': [], 'disk_snapshots': [], 'disks': [], 'external_status': 'ok', 'master': False, 'permissions': [], 'status': 'unattached', 'storage': {'type': 'iscsi', 'volume_group': {'id': 'iY14jP-d6Oy-2Rwp-eN70-2841-IA7G-Binj0f', 'logical_units': [{'address': '3par-iscsi-1.scl.lab.tlv.redhat.com', 'discard_max_size': 33554432, 'discard_zeroes_data': False, 'id': '360002ac000000000000014d700021f6b', 'lun_mapping': 2, 'paths': 0, 'port': 3260, 'portal': '3par-iscsi-1.scl.lab.tlv.redhat.com:3260,21', 'product_id': 'VV', 'serial': 'S3PARdataVV_CZ3836C3RB', 'size': 107374182400, 'storage_domain_id': '9eb05cb7-b7da-4e09-9edd-1f55c4c5ef61', 'target': 'iqn.2000-05.com.3pardata:20210002ac021f6b', 'vendor_id': '3PARdata', 'volume_group_id': 'iY14jP-d6Oy-2Rwp-eN70-2841-IA7G-Binj0f'}]}}, 'storage_connections': [], 'storage_format': 'v5', 'supports_discard': True, 'supports_discard_zeroes_data': False, 'templates': [], 'type': 'data', 'used': 4294967296, 'vms': [], 'warning_low_space_indicator': 10, 'wipe_after_delete': False}]}, 'deprecations': [{'msg': "The 'ovirt_storage_domain_facts' module has been renamed to 'ovirt_storage_domain_info', and the renamed one no longer returns ansible_facts", 'version': '2.13'}], 'failed': False}

The above is the only occurrence of the module output in the log, and the 'available' key is present:

'available': 102005473280


So if you want to reproduce, we need to add a debug task to the 3rd usage as well, and then we can easily confirm during reproduction whether 'available' is present or not.
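
As an aid for that check, here is a small sketch that pulls the 'available' value out of a debug line like the one above. It assumes the payload after "storage_domain_details: " is a Python literal, as it appears in this log; the sample line is shortened from the one quoted above, and `available_values` is a hypothetical helper name.

```python
import ast

MARKER = "storage_domain_details: "

# Shortened version of the debug line quoted above (most keys omitted).
SAMPLE = (
    "2020-07-08 11:37:26,528+0300 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils "
    "ansible_utils._process_output:103 storage_domain_details: "
    "{'changed': False, 'ansible_facts': {'ovirt_storage_domains': "
    "[{'name': 'hosted_storage', 'available': 102005473280, "
    "'status': 'unattached', 'used': 4294967296}]}, 'failed': False}"
)


def available_values(log_line: str) -> list:
    """Return 'available' (or None if missing) for each SD in one debug line.

    Assumes the text after MARKER is a Python dict literal, as in the
    setup log quoted in this bug. Raises if the marker is not present.
    """
    _, found, payload = log_line.partition(MARKER)
    if not found:
        raise ValueError("no storage_domain_details payload in this line")
    details = ast.literal_eval(payload)
    domains = details["ansible_facts"]["ovirt_storage_domains"]
    return [sd.get("available") for sd in domains]


print(available_values(SAMPLE))  # [102005473280]
```

A None entry in the result would indicate a storage domain reported without the 'available' key.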

Comment 30 Asaf Rachmani 2020-08-06 08:20:36 UTC

(In reply to Guilherme Santos from comment #26)
> I wouldn't say so Michal... As the activation of the storage domain and all
> the report dealt by vdsm is handled by installation itself under the hood.
> The activation task comes before the failing, so I assume the SD is
> activated by then.
> I can try to disrupt the vdsm in the middle of the installation, after the
> activation task i.e., to try to reproduce it, but not sure. I'll post here
> if I get able to do it.

Were you able to reproduce the issue?

Comment 31 Guilherme Santos 2020-08-31 13:49:29 UTC
(In reply to Asaf Rachmani from comment #30)
> 
> (In reply to Guilherme Santos from comment #26)
> > I wouldn't say so Michal... As the activation of the storage domain and all
> > the report dealt by vdsm is handled by installation itself under the hood.
> > The activation task comes before the failing, so I assume the SD is
> > activated by then.
> > I can try to disrupt the vdsm in the middle of the installation, after the
> > activation task i.e., to try to reproduce it, but not sure. I'll post here
> > if I get able to do it.
> 
> Were you able to reproduce the issue?

I wasn't

Comment 32 Asaf Rachmani 2020-09-01 09:11:22 UTC
Following comment 14 and comment 31, closing it as WORKSFORME.