Bug 1566805
Summary: ansible installer gets confused about node identification when gluster nodes are added.

Product: OpenShift Container Platform
Component: Installer
Installer sub component: openshift-installer
Version: 3.9.0
Target Release: 3.9.z
Status: CLOSED NOTABUG
Severity: high
Priority: high
Reporter: raffaele spazzoli <rspazzol>
Assignee: Scott Dodson <sdodson>
QA Contact: Johnny Liu <jialiu>
CC: aos-bugs, boris.ruppert, jarrpa, jokerman, mmccomas, pkanthal, rspazzol, rteague, sbain
Keywords: Triaged
Flags: rspazzol: needinfo-
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2018-12-14 20:47:15 UTC
Description (raffaele spazzoli, 2018-04-13 02:51:09 UTC)
To clarify, the app nodes in the example are env1-node-hzn2, env1-node-z2b1, and env1-node-zc97? This is pretty hard to follow without actually seeing the group mappings and variables. The anomalies are: masters aren't labeled region=master, app nodes aren't labeled region=primary, and infra nodes aren't labeled region=infra. The only nodes that actually get the region label applied are the CNS nodes? Can you please provide your inventory and group vars?

Created attachment 1421324 [details]: working_inventory

Created attachment 1421325 [details]: non-working inventory
Scott, you have correctly identified the anomaly. I am attaching an example of a working inventory and a non-working inventory.

Thank you for the example inventories. I am unable to track down the issue with the information provided so far. Please attach a log file with -vvv output. Please also attach a YAML dump of the affected inventory using:

$ ansible-inventory -i hosts --list --yaml

We see exactly the same behavior with the ansible installer 3.9.27 and 3.9.30. All region labels get lost during the installation (when glusterfs is deployed); only region=storage survives. All other node labels survive the installation. Our current workaround is to change the node labels and selectors to something different, e.g. type=master, type=infra, etc.

We ran the following command against a customer's setup where we believe this bug may be manifesting itself:

ansible-inventory -i [inventory file] --list --yaml

The resulting YAML file shows the correct openshift_node_labels across the masters, infra, and workers. Regardless of what Ansible receives from the inventory, installing with two Gluster clusters (one for apps and one for logging, metrics, and the registry) results in all nodes being labeled region=infra. We plan to test whether the behavior is repeatable without the Gluster installation playbooks being run. We are also looking at the 3.9 roles and playbooks where Ansible's oc_label module is used, to try to determine where Ansible may be applying the incorrect label.

The problem here is that you're using "region" as your node-selector label for CNS. Each node can have only one label with a given key, so any node that is designated for GlusterFS will have its region label value changed to "cns". An easier solution would be to change the node selector to something like "storage=glusterfs".

The problem described above is a result of the combination of several inventory variables.
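The single-value semantics of node labels described above can be seen directly with the `oc` CLI. This is only an illustration against a live cluster; the node name env1-node-hzn2 is borrowed from the thread and is a placeholder, not a command the installer runs:

```shell
# A node holds at most one value per label key.
oc label node env1-node-hzn2 region=primary

# Re-labeling with the same key replaces the old value; --overwrite is
# required because the key already exists on the node.
oc label node env1-node-hzn2 region=cns --overwrite

# The earlier region=primary value is gone; only region=cns remains.
oc get node env1-node-hzn2 --show-labels
```

This is why designating an app or infra node for GlusterFS with a "region=cns" selector silently discards that node's original region label.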
openshift_storage_glusterfs_nodeselector: "region=cns"
openshift_storage_glusterfs_wipe: true

By specifying a node selector with the key 'region' as well as setting openshift_storage_glusterfs_wipe=true, a task [1] during install removes all labels with the key defined in the node selector from all hosts. Thus, all 'region' labels are removed from all hosts, and then only the label region=cns is added back.

To resolve this issue, specify a custom node selector that does not use 'region' as its key, or do not specify openshift_storage_glusterfs_nodeselector in the inventory at all, which allows the default node selector of 'glusterfs=storage-host' [2] to be used.

This issue does not exist in release-3.10, as the 'Unlabel' task was removed during code refactoring and other improvements to the openshift_storage_glusterfs role. Please report whether the above recommendations resolve this issue.

[1] https://github.com/openshift/openshift-ansible/blob/release-3.9/roles/openshift_storage_glusterfs/tasks/glusterfs_deploy.yml#L19-L26
[2] https://github.com/openshift/openshift-ansible/blob/release-3.9/roles/openshift_storage_glusterfs/README.md#role-variables

This is believed to be a misconfiguration. Please see the suggestion in Comment 11.
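Following the recommendation above, a minimal sketch of the corrected inventory variables might look like this (the 'storage=glusterfs' key/value follows the earlier suggestion in the thread; it is one possible choice, not the only valid one):

```yaml
# Sketch of corrected glusterfs variables (group_vars / OSEv3 vars).
# Either use a selector whose key is NOT 'region'...
openshift_storage_glusterfs_nodeselector: "storage=glusterfs"
# ...or omit the variable entirely to fall back to the role default,
# 'glusterfs=storage-host'. With a non-'region' key, leaving wipe
# enabled no longer clobbers the region=* labels on other nodes.
openshift_storage_glusterfs_wipe: true
```

The corresponding node entries in the inventory would then carry the matching label (e.g. storage=glusterfs) in openshift_node_labels alongside their unchanged region label.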