Description of problem:

The 3.9 installer has new logic to identify the role of each node and to separate master, infra and compute nodes. Everything seems to work fine under those circumstances. When I add CNS nodes, the correct labels no longer seem to be applied. Here are the relevant parts of my inventory.

To the masters I apply the following labels:

  openshift_node_labels:
    region: master

To the infranodes I apply the following labels:

  openshift_node_labels:
    region: infra
    node-role.kubernetes.io/infranode: true

To the app nodes I apply the following labels:

  openshift_node_labels:
    region: primary

To the cns nodes I apply the following labels:

  openshift_node_labels:
    region: cns
    node-role.kubernetes.io/cnsnode: true

The default node selector is the following:

  osm_default_node_selector: 'region=primary'

The resulting labels on the nodes are the following:

[root@env1-master-sq3z ~]# oc get nodes --show-labels
NAME                  STATUS    ROLES             AGE       VERSION             LABELS
env1-cnsnode-4prb     Ready     cnsnode,compute   38m       v1.9.1+a0ce1bc657   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-2,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=env1-cnsnode-4prb,node-role.kubernetes.io/cnsnode=True,node-role.kubernetes.io/compute=true,region=cns
env1-cnsnode-c531     Ready     cnsnode,compute   38m       v1.9.1+a0ce1bc657   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-2,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-f,kubernetes.io/hostname=env1-cnsnode-c531,node-role.kubernetes.io/cnsnode=True,node-role.kubernetes.io/compute=true,region=cns
env1-cnsnode-s864     Ready     cnsnode,compute   38m       v1.9.1+a0ce1bc657   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-2,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-b,kubernetes.io/hostname=env1-cnsnode-s864,node-role.kubernetes.io/cnsnode=True,node-role.kubernetes.io/compute=true,region=cns
env1-infranode-7t4s   Ready     infranode         38m       v1.9.1+a0ce1bc657   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-2,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=env1-infranode-7t4s,node-role.kubernetes.io/infranode=True
env1-infranode-g9m6   Ready     infranode         38m       v1.9.1+a0ce1bc657   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-2,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-f,kubernetes.io/hostname=env1-infranode-g9m6,node-role.kubernetes.io/infranode=True
env1-infranode-xpwf   Ready     infranode         38m       v1.9.1+a0ce1bc657   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-2,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-b,kubernetes.io/hostname=env1-infranode-xpwf,node-role.kubernetes.io/infranode=True
env1-master-j2f4      Ready     master            38m       v1.9.1+a0ce1bc657   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-2,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-f,kubernetes.io/hostname=env1-master-j2f4,node-role.kubernetes.io/master=true
env1-master-sq3z      Ready     master            38m       v1.9.1+a0ce1bc657   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-2,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-b,kubernetes.io/hostname=env1-master-sq3z,node-role.kubernetes.io/master=true
env1-master-tv6g      Ready     master            38m       v1.9.1+a0ce1bc657   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-2,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=env1-master-tv6g,node-role.kubernetes.io/master=true
env1-node-hzn2        Ready     compute           38m       v1.9.1+a0ce1bc657   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-2,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=env1-node-hzn2,node-role.kubernetes.io/compute=true
env1-node-z2b1        Ready     compute           38m       v1.9.1+a0ce1bc657   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-2,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-b,kubernetes.io/hostname=env1-node-z2b1,node-role.kubernetes.io/compute=true
env1-node-zc97        Ready     compute           38m       v1.9.1+a0ce1bc657   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-2,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-f,kubernetes.io/hostname=env1-node-zc97,node-role.kubernetes.io/compute=true

As the output shows, the expected region labels (region=master, region=infra, region=primary) are missing everywhere; only the cns nodes keep a region label (region=cns).

Version: 3.9.14
How reproducible: 100%
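In inventory syntax, the label assignments above correspond roughly to the following sketch (the hostgroup layout, host-name ranges and exact quoting are illustrative, not copied from the real inventory file):

  # illustrative sketch only -- other groups and variables omitted
  [OSEv3:vars]
  osm_default_node_selector='region=primary'

  [nodes]
  env1-master-[1:3]    openshift_node_labels="{'region': 'master'}"
  env1-infranode-[1:3] openshift_node_labels="{'region': 'infra', 'node-role.kubernetes.io/infranode': 'true'}"
  env1-node-[1:3]      openshift_node_labels="{'region': 'primary'}"
  env1-cnsnode-[1:3]   openshift_node_labels="{'region': 'cns', 'node-role.kubernetes.io/cnsnode': 'true'}"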
To clarify, the app nodes in the example are env1-node-hzn2, env1-node-z2b1 and env1-node-zc97? This is pretty hard to follow without actually seeing the group mappings and variables. The anomalies are:

- masters aren't labeled region=master
- app nodes aren't labeled region=primary
- infra nodes aren't labeled region=infra

The only nodes that actually get the region label applied are the cns nodes? Can you please provide your inventory and group_vars?
Created attachment 1421324 [details] working_inventory
Created attachment 1421325 [details] non-working inventory
Scott, you have correctly identified the anomaly. I attach an example of a working inventory and a non-working inventory.
Thank you for the example inventories. I am unable to track down the issue with the information provided so far. Please attach a log file with -vvv output. Please also attach a YAML dump of the affected inventory using:

  $ ansible-inventory -i hosts --list --yaml
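For reference, the per-host labels should show up in that dump in a structure roughly like the following (host name and values are only an illustration of the expected shape, not real data):

  all:
    children:
      nodes:
        hosts:
          env1-infranode-7t4s:
            openshift_node_labels:
              region: infra
              node-role.kubernetes.io/infranode: true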
We have exactly the same behavior with the Ansible installer 3.9.27 and 3.9.30. All region labels get lost during the installation (when glusterfs is deployed); only region=storage survives. All other node labels survive the installation. Our current workaround is to change the node labels and selectors to something different, e.g. type=master, type=infra, etc.
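A sketch of that workaround in inventory terms (the exact keys and values below are examples, not our literal settings):

  # workaround sketch: avoid 'region' as a label/selector key entirely
  osm_default_node_selector='type=app'
  # and per host, e.g.:
  #   openshift_node_labels="{'type': 'master'}"   on the masters
  #   openshift_node_labels="{'type': 'infra'}"    on the infra nodes
  #   openshift_node_labels="{'type': 'app'}"      on the app nodes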
We ran the following command against a customer's setup where we believe this bug may be manifesting itself:

  ansible-inventory -i [inventory file] --list --yaml

The resulting YAML file shows the correct openshift_node_labels across the masters, infra nodes, and workers. So regardless of what Ansible is receiving from the inventory, installing with two Gluster clusters (one for apps and one for logging, metrics, and the registry) results in all nodes being labeled region=infra. We plan to test whether the behavior is repeatable without the Gluster installation playbooks being run. We are also looking at the 3.9 roles and playbooks where Ansible's oc_label module is used, to try to determine where Ansible may be applying the incorrect label.
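As a note, one quick way to locate those oc_label call sites in a checkout of the release-3.9 branch (commands are a suggestion; paths are assumed):

  git clone -b release-3.9 https://github.com/openshift/openshift-ansible.git
  grep -rn "oc_label" openshift-ansible/roles openshift-ansible/playbooks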
The problem here is that you're using "region" as your node-selector label for CNS. Each node can only have one label of a given name, so any node that is designated for GlusterFS will have its region label value changed to "cns". An easier solution would be to change the node selector to something like "storage=glusterfs".
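For illustration, this is the same overwrite behavior you get from oc itself: re-applying a label with an existing key replaces the old value (the node name here is just an example, not a claim about this cluster):

  # the node loses its previous region value and keeps only region=cns
  oc label node env1-node-hzn2 region=cns --overwrite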
The problem described above is the result of a combination of several inventory variables:

  openshift_storage_glusterfs_nodeselector: "region=cns"
  openshift_storage_glusterfs_wipe: true

By specifying a node selector with the key 'region' and also setting openshift_storage_glusterfs_wipe=True, a task [1] during install removes the label whose key is defined in the node selector from all hosts. Thus, all 'region' labels are removed from all hosts, and then only the label region=cns is added back to the GlusterFS hosts.

To resolve this issue, specify a custom node selector that does not use 'region' as the key, or do not specify openshift_storage_glusterfs_nodeselector in the inventory at all, which allows the default node selector of 'glusterfs=storage-host' [2] to be used.

This issue does not exist in release-3.10, as the 'Unlabel' task no longer exists following code refactoring and additional improvements in the openshift_storage_glusterfs role.

Please report whether the above recommendations resolve this issue.

[1] https://github.com/openshift/openshift-ansible/blob/release-3.9/roles/openshift_storage_glusterfs/tasks/glusterfs_deploy.yml#L19-L26
[2] https://github.com/openshift/openshift-ansible/blob/release-3.9/roles/openshift_storage_glusterfs/README.md#role-variables
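In inventory terms, the two recommendations look roughly like this (values are examples):

  # Option 1: use a dedicated key for the GlusterFS node selector
  openshift_storage_glusterfs_nodeselector: "storage=glusterfs"
  # ...and put a matching 'storage=glusterfs' label on the CNS hosts instead of
  # reusing the 'region' key.

  # Option 2: omit openshift_storage_glusterfs_nodeselector entirely so the
  # role default of glusterfs=storage-host [2] is used.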
This is believed to be a misconfiguration. Please see the suggestion in Comment 11.