Bug 1854867
| Summary: | HE deployment fails with iscsi storage | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Guilherme Santos <gdeolive> |
| Component: | ovirt-hosted-engine-setup | Assignee: | Asaf Rachmani <arachman> |
| Status: | CLOSED WORKSFORME | QA Contact: | Nikolai Sednev <nsednev> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 4.4.0 | CC: | bzlotnik, lsurette, lsvaty, michal.skrivanek, mnecas, mperina, pelauter, sbonazzo, tnisan |
| Target Milestone: | ovirt-4.4.3 | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-09-01 09:11:22 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | Integration | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Comment 6
Michal Skrivanek
2020-07-08 15:47:09 UTC
Both otopi_storage_domain_details and otopi_storage_domain_details_iscsi describe the same SD (the one that the HE creates for the HE VM). Some keys may change name/position, but the info is the same: SD id, datacenter, LUN id, target... However, the latter seems to have more info (for example, not only the "available" value but also the "used" one).

It looks like the SD activation didn't return available space. Can you please check with the REST API: when you have that SD inactive/unattached, can you try to activate it using http://ovirt.github.io/ovirt-engine-api-model/master/#services/attached_storage_domain ? What does it return? Are there additional logs where one could see vdsm requests/responses?

(In reply to Michal Skrivanek from comment #11)
> It looks like the SD activation didn't return available space. Can you
> please try to check with REST API, when you have that SD
> inactive/unattached, can you try to activate it using
> http://ovirt.github.io/ovirt-engine-api-model/master/#services/
> attached_storage_domain ? What does it return?

I don't think it has anything to do with what SD activation does under the hood.
REST API doesn't seem to return any values, this is what I'm getting:

vjuranek@localhost tmp$ curl -X POST -H "Content-type: application/xml" --user admin@internal:ovirt --cacert cert.pem --data "<action/>" https://ovirt-imageio.local/ovirt-engine/api/datacenters/13345997-b94f-42dd-b8ef-a1392f65cebf/storagedomains/4a115d1c-4fef-4266-a5f1-6d6425b6dd79/activate
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<action>
    <job href="/ovirt-engine/api/jobs/8be55ffb-4062-4210-aec8-f12577751469" id="8be55ffb-4062-4210-aec8-f12577751469"/>
    <status>complete</status>
    <storage_domain href="/ovirt-engine/api/datacenters/13345997-b94f-42dd-b8ef-a1392f65cebf/storagedomains/4a115d1c-4fef-4266-a5f1-6d6425b6dd79" id="4a115d1c-4fef-4266-a5f1-6d6425b6dd79">
        <actions>
            <link href="/ovirt-engine/api/datacenters/13345997-b94f-42dd-b8ef-a1392f65cebf/storagedomains/4a115d1c-4fef-4266-a5f1-6d6425b6dd79/activate" rel="activate"/>
            <link href="/ovirt-engine/api/datacenters/13345997-b94f-42dd-b8ef-a1392f65cebf/storagedomains/4a115d1c-4fef-4266-a5f1-6d6425b6dd79/deactivate" rel="deactivate"/>
        </actions>
        <link href="/ovirt-engine/api/datacenters/13345997-b94f-42dd-b8ef-a1392f65cebf/storagedomains/4a115d1c-4fef-4266-a5f1-6d6425b6dd79/disks" rel="disks"/>
        <data_center href="/ovirt-engine/api/datacenters/13345997-b94f-42dd-b8ef-a1392f65cebf" id="13345997-b94f-42dd-b8ef-a1392f65cebf"/>
        <data_centers>
            <data_center href="/ovirt-engine/api/datacenters/13345997-b94f-42dd-b8ef-a1392f65cebf" id="13345997-b94f-42dd-b8ef-a1392f65cebf"/>
        </data_centers>
    </storage_domain>
</action>

Successfully deployed HE on latest components over iSCSI:
Software Version:4.4.1.2-0.10.el8ev
rhvm-appliance-4.4-20200604.0.el8ev.x86_64
ovirt-hosted-engine-setup-2.4.5-1.el8ev.noarch
ovirt-hosted-engine-ha-2.4.4-1.el8ev.noarch
Linux 4.18.0-193.13.1.el8_2.x86_64 #1 SMP Tue Jul 7 14:03:09 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux release 8.2 (Ootpa)

Please consider closing as works for me.

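For anyone repeating the REST check, the activation response quoted above can be inspected programmatically rather than by eye. The following is only a sketch: the XML is a trimmed copy of the quoted response, the function name `summarize_activation` is made up for illustration, and the assumption that sizes would appear as `<available>` and `<used>` child elements of `<storage_domain>` is not confirmed anywhere in this bug.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <action> body quoted in this bug (links omitted).
ACTIVATE_RESPONSE = """\
<action>
  <job href="/ovirt-engine/api/jobs/8be55ffb-4062-4210-aec8-f12577751469" id="8be55ffb-4062-4210-aec8-f12577751469"/>
  <status>complete</status>
  <storage_domain id="4a115d1c-4fef-4266-a5f1-6d6425b6dd79">
    <data_center id="13345997-b94f-42dd-b8ef-a1392f65cebf"/>
  </storage_domain>
</action>
"""


def summarize_activation(xml_text):
    """Return the action status and whether size fields came back with it."""
    root = ET.fromstring(xml_text)
    domain = root.find("storage_domain")
    size_fields = {}
    for name in ("available", "used"):
        # Assumption: sizes, when reported, are child elements with these names.
        size_fields[name] = domain is not None and domain.find(name) is not None
    return {
        "status": root.findtext("status"),
        "storage_domain_id": domain.get("id") if domain is not None else None,
        "size_fields_present": size_fields,
    }


summary = summarize_activation(ACTIVATE_RESPONSE)
print(summary)
```

On the quoted response this reports a completed action with neither size field present, which matches the observation that the activate call itself does not return space values.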
(In reply to Nikolai Sednev from comment #14)
>
> Please consider closing as works for me.

Any objections Guilherme or can I close it?
I'd like to understand as well what went wrong here, was there a regression or not?

(In reply to Tal Nisan from comment #15)
> (In reply to Nikolai Sednev from comment #14)
> >
> > Please consider closing as works for me.
>
> Any objections Guilherme or can I close it?
> I'd like to understand as well what went wrong here, was there a regression
> or not?

BTW, the deployment took much longer now than with previous versions. The delay started during the VM local disk copy to iSCSI shared storage: speed was about 24 MBps instead of ~100 MBps or more. Storage and network were not loaded, so they should not be the issue.

(In reply to Vojtech Juranek from comment #13)
> I don't think it has anything to do with what SD activation does under the hood.
> REST API doesn't seem to return any values, this is what I'm getting:

hm, then it could really be something in the ansible module. Martin, can you please check that?

I also see several messages like this one in /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20200714142037-e99lf2.log after deployment was finished:
2020-07-14 15:36:27,173+0300 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Destroy local storage-pool {{ local_vm_disk_path.split('/')[5] }}]
2020-07-14 15:36:27,575+0300 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 {'msg': 'The task includes an option with an undefined variable. The error was: \'local_vm_disk_path\' is undefined\n\nThe error appears to be in \'/usr/share/ansible/roles/ovirt.hosted_engine_setup/tasks/clean_local_storage_pools.yml\': line 16, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n changed_when: true\n - name: Destroy local storage-pool {{ local_vm_disk_path.split(\'/\')[5] }}\n ^ here\nWe could be wrong, but this one looks like it might be an issue with\nmissing quotes. Always quote template expression brackets when they\nstart a value. For instance:\n\n with_items:\n - {{ foo }}\n\nShould be written as:\n\n with_items:\n - "{{ foo }}"\n', '_ansible_no_log': False}
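
To make the failing expression in the task name concrete: `local_vm_disk_path.split('/')[5]` picks the sixth `/`-separated field of the path (the first field is the empty string before the leading slash), and it cannot be rendered at all when `local_vm_disk_path` is undefined. Below is a rough Python equivalent with a guard added. The helper name `path_component` and the example path are hypothetical; the real value of `local_vm_disk_path` is not shown anywhere in this bug.

```python
from typing import Optional


def path_component(local_vm_disk_path: Optional[str], index: int = 5) -> Optional[str]:
    """Mimic `local_vm_disk_path.split('/')[index]`, returning None instead of failing."""
    if not local_vm_disk_path:
        # Corresponds to the "'local_vm_disk_path' is undefined" case in the log.
        return None
    parts = local_vm_disk_path.split("/")
    if len(parts) <= index:
        return None
    return parts[index]


# Hypothetical path, for illustration only.
example_path = "/var/tmp/localvm_example/images/pool-name/disk-image"
print(path_component(example_path))
print(path_component(None))
```

The point of the guard is only that a cleanup step which depends on a variable set earlier in the run needs a fallback when that variable was never set.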
(In reply to Michal Skrivanek from comment #17)
> (In reply to Vojtech Juranek from comment #13)
> > I don't think it has anything to do with what SD activation does under the hood.
> > REST API doesn't seem to return any values, this is what I'm getting:
>
> hm, then it could really be something in the ansible module. Martin, can you
> please check that?

Looks like the issue is in hosted-engine-setup role

I tested whether ovirt_storage_domain_facts always returns the `available` attribute, and it does, so I also think the issue is in the hosted-engine-setup role.

ok. I have my doubts but alright, let's move back to Integration for further investigation

From what I've previously seen, I think the fields were null, which would explain why they were absent. Looking at the code, they are null only when the storage domain is deactivated or vdsm does not report them, but unfortunately there's not enough information to look at.

thanks benny.
so then again my question is the same - and it contradicts comment #20 - if the field is mandatory in ansible doc why are we *not* returning that key? Blame the setup code all you want, but if there are circumstances when the documentation doesn't match the real behavior of ansible module (for whatever reason) we need to fix either the module or doc.

separately, since it looks like a race based on comment #22, is this reproducible?

Looks like no information is needed from me anymore.

I think this bug has enough needed info to be provided before digging further.

I wouldn't say so Michal... As the activation of the storage domain and all the report dealt by vdsm is handled by installation itself under the hood. The activation task comes before the failing one, so I assume the SD is activated by then.
I can try to disrupt vdsm in the middle of the installation, i.e. after the activation task, to try to reproduce it, but I'm not sure. I'll post here if I'm able to do it.

Restoring needinfo on Martin for comment #23

Removing regression and blocker+ on low reproducibility.

(In reply to Michal Skrivanek from comment #23)
> thanks benny.
> so then again my question is the same - and it contradicts comment #20 - if
> the field is mandatory in ansible doc why are we *not* returning that key?
> Blame the setup code all you want, but if there are circumstances when the
> documentation doesn't match the real behavior of ansible module (for
> whatever reason) we need to fix either the module or doc.

So looking at the hosted-engine-setup role, we have the following usages of ovirt_storage_domain_facts:
https://github.com/oVirt/ovirt-ansible-hosted-engine-setup/blob/master/tasks/create_storage_domain.yml#L115
https://github.com/oVirt/ovirt-ansible-hosted-engine-setup/blob/master/tasks/create_target_vm/01_create_target_hosted_engine_vm.yml#L116
https://github.com/oVirt/ovirt-ansible-hosted-engine-setup/blob/master/tasks/create_target_vm/03_hosted_engine_final_tasks.yml#L13

The 1st and 2nd usages are followed by a debug task, so we can easily check the content:

2020-07-08 11:37:26,528+0300 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 storage_domain_details: {'changed': False, 'ansible_facts': {'ovirt_storage_domains': [{'href': '/ovirt-engine/api/storagedomains/9eb05cb7-b7da-4e09-9edd-1f55c4c5ef61', 'comment': '', 'description': '', 'id': '9eb05cb7-b7da-4e09-9edd-1f55c4c5ef61', 'name': 'hosted_storage', 'available': 102005473280, 'backup': False, 'block_size': 512, 'committed': 0, 'critical_space_action_blocker': 5, 'discard_after_delete': True, 'disk_profiles': [], 'disk_snapshots': [], 'disks': [], 'external_status': 'ok', 'master': False, 'permissions': [], 'status': 'unattached', 'storage': {'type': 'iscsi', 'volume_group': {'id': 'iY14jP-d6Oy-2Rwp-eN70-2841-IA7G-Binj0f', 'logical_units': [{'address': '3par-iscsi-1.scl.lab.tlv.redhat.com', 'discard_max_size': 33554432, 'discard_zeroes_data': False, 'id': '360002ac000000000000014d700021f6b', 'lun_mapping': 2, 'paths': 0, 'port': 3260, 'portal': '3par-iscsi-1.scl.lab.tlv.redhat.com:3260,21', 'product_id': 'VV', 'serial': 'S3PARdataVV_CZ3836C3RB', 'size': 107374182400, 'storage_domain_id': '9eb05cb7-b7da-4e09-9edd-1f55c4c5ef61', 'target': 'iqn.2000-05.com.3pardata:20210002ac021f6b', 'vendor_id': '3PARdata', 'volume_group_id': 'iY14jP-d6Oy-2Rwp-eN70-2841-IA7G-Binj0f'}]}}, 'storage_connections': [], 'storage_format': 'v5', 'supports_discard': True, 'supports_discard_zeroes_data': False, 'templates': [], 'type': 'data', 'used': 4294967296, 'vms': [], 'warning_low_space_indicator': 10, 'wipe_after_delete': False}]}, 'deprecations': [{'msg': "The 'ovirt_storage_domain_facts' module has been renamed to 'ovirt_storage_domain_info', and the renamed one no longer returns ansible_facts", 'version': '2.13'}], 'failed': False}

The above is the only logged use of the module, and the 'available' key is fetched:
'available': 102005473280

So if you want to reproduce, we need to add a debug task to the 3rd usage as well, and then we can easily confirm during reproduction whether 'available' is present or not.

(In reply to Guilherme Santos from comment #26)
> I wouldn't say so Michal... As the activation of the storage domain and all
> the report dealt by vdsm is handled by installation itself under the hood.
> The activation task comes before the failing, so I assume the SD is
> activated by then.
> I can try to disrupt the vdsm in the middle of the installation, after the
> activation task i.e., to try to reproduce it, but not sure. I'll post here
> if I get able to do it.

Were you able to reproduce the issue?

(In reply to Asaf Rachmani from comment #30)
>
> (In reply to Guilherme Santos from comment #26)
> > I wouldn't say so Michal... As the activation of the storage domain and all
> > the report dealt by vdsm is handled by installation itself under the hood.
> > The activation task comes before the failing, so I assume the SD is
> > activated by then.
> > I can try to disrupt the vdsm in the middle of the installation, after the
> > activation task i.e., to try to reproduce it, but not sure. I'll post here
> > if I get able to do it.
>
> Were you able to reproduce the issue?

I wasn't

Following comment 14 and comment 31, closing it as WORKSFORME. |
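
If someone does manage to reproduce this, the check suggested earlier in the thread (is 'available' present in what the module returned?) can be run against any logged `storage_domain_details` dict. This is a sketch only: `missing_size_keys` is a made-up helper name, the first sample is a trimmed copy of the dict from the log in this bug, and the second sample is a hypothetical result with the key removed to imitate the reported failure.

```python
def missing_size_keys(module_result, required=("available", "used")):
    """Map each storage domain name to the required keys that are absent or None."""
    domains = module_result.get("ansible_facts", {}).get("ovirt_storage_domains", [])
    report = {}
    for domain in domains:
        absent = [key for key in required if domain.get(key) is None]
        if absent:
            report[domain.get("name", domain.get("id"))] = absent
    return report


# Trimmed copy of the result logged on 2020-07-08 in this bug.
logged_result = {
    "changed": False,
    "ansible_facts": {"ovirt_storage_domains": [{
        "id": "9eb05cb7-b7da-4e09-9edd-1f55c4c5ef61",
        "name": "hosted_storage",
        "available": 102005473280,
        "status": "unattached",
        "used": 4294967296,
    }]},
    "failed": False,
}

# Hypothetical result imitating the failure: same domain, no 'available' key.
broken_result = {
    "ansible_facts": {"ovirt_storage_domains": [{
        "id": "9eb05cb7-b7da-4e09-9edd-1f55c4c5ef61",
        "name": "hosted_storage",
        "used": 4294967296,
    }]},
}

print(missing_size_keys(logged_result))
print(missing_size_keys(broken_result))
```

Treating a `None` value the same as a missing key follows the earlier remark that the fields may simply have been null when the domain was deactivated or vdsm did not report them.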