Description of problem:
Failing to upgrade ELK after cluster upgrade

Version-Release number of selected component (if applicable):
3.11.153 --> 3.11.219

How reproducible:
100% so far

Steps to Reproduce:
1. Upgrade cluster from 3.11.153 to 3.11.219
2. Upgrade the ELK stack per https://docs.openshift.com/container-platform/3.11/upgrading/automated_upgrades.html#upgrading-efk-logging-stack

Actual results:
TASK [openshift_logging : fail] ******************************************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging/tasks/main.yaml:2
Monday 01 June 2020 10:45:54 -0600 (0:00:00.555) 0:00:18.187 ***********
fatal: [masterc01.testnet.net]: FAILED! => {
    "changed": false,
    "msg": "Only one Fluentd nodeselector key pair should be provided"

Expected results:
Upgrade completes without failure

Additional info:
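To illustrate the failure above: the openshift_logging role refuses to proceed when the Fluentd nodeselector variable resolves to more than one key/value pair. A minimal Python sketch of that guard (function name and structure are illustrative, not the role's actual code):

```python
# Hypothetical sketch of the check behind the Ansible error
# "Only one Fluentd nodeselector key pair should be provided".
def validate_fluentd_nodeselector(nodeselector):
    """Fail unless exactly one key/value pair is set (illustrative only)."""
    if len(nodeselector) != 1:
        raise ValueError("Only one Fluentd nodeselector key pair should be provided")
    return nodeselector

# A daemonset carrying both the original selector and the "pause"
# selector from the upgrade docs trips the check:
selector = {"logging-infra-fluentd": "true", "non-existing": "true"}
try:
    validate_fluentd_nodeselector(selector)
except ValueError as e:
    print(e)  # Only one Fluentd nodeselector key pair should be provided
```

This is why the workaround discussed later in this bug removes one of the two selector keys before rerunning the playbook.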
Please attach your inventory file and the entire log. The error looks like it is telling us what the problem is: "Only one Fluentd nodeselector key pair should be provided"
Moving to UpcomingSprint as unlikely to be resolved by EOS
*** This bug has been marked as a duplicate of bug 1848454 ***
Hi Jon,

A couple of questions:

1. Is there any particular reason you're not relying on the playbooks/openshift-logging/config.yml playbook, and are instead shutting things down, patching them, etc. manually?
2. At which point in this workflow did you run the logging dump tool? Could you run it just before you run the Ansible playbook?

Regards,
Sergey.
Moving to UpcomingSprint as unlikely to be addressed by EOD
Filed a docs issue for 3.11: https://github.com/openshift/openshift-docs/issues/25677, closing this one.
From the update documentation:

"3. Run the openshift-logging/config.yml playbook according to the deploying the EFK stack instructions to complete the logging upgrade. You run the installation playbook for the new OpenShift Container Platform version to upgrade the logging deployment."

So that is /usr/share/ansible/openshift-ansible/playbooks/openshift-logging/config.yml

So yes, setting the nodeSelector to "non-existing: true" is what the instructions say to do, and this causes the DS to terminate the logging-fluentd pods as expected. Then we have to remove the other nodeSelector so that we don't get the error about nodeSelectors and break the updating of the EFK stack, so I run the oc command you had me try:

oc patch ds logging-fluentd --type json -p '[{ "op": "remove", "path": "/spec/template/spec/nodeSelector/logging-infra-fluentd" }]'

This works. Then we update the EFK stack:

ansible-playbook -i inventory /usr/share/ansible/openshift-ansible/playbooks/openshift-logging/config.yml

The EFK stack gets updated; however, during that process, as I showed in the last email, it labels all nodes with "non-existing: true", and the DS starts spinning up logging-fluentd pods at the correct version. After the update I remove the patch that was applied and add back the original:

oc patch ds logging-fluentd -p '{"spec": {"template": {"spec": {"nodeSelector": {"logging-infra-fluentd": "true"}}}}}'
oc patch ds logging-fluentd --type json -p '[{ "op": "remove", "path": "/spec/template/spec/nodeSelector/non-existing" }]'

It all works, except that now all nodes have an extra label of "non-existing: true". I found this out by updating from 3.11.153 to 3.11.248. Then I went to update to 3.11.286, and that is when I saw this: when I went to patch to remove "logging-infra-fluentd": "true", it did not terminate the logging-fluentd pods, which started me looking into why.
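For clarity on what the JSON-patch commands in the workaround above do to the daemonset, here is a small Python sketch (not the oc client) of an RFC 6902 "remove" operation applied to the nodeSelector; the helper name and the trimmed-down DS document are illustrative:

```python
# Illustrative model of `oc patch ds logging-fluentd --type json -p
# '[{"op": "remove", "path": ".../nodeSelector/logging-infra-fluentd"}]'`.
import copy

def json_patch_remove(doc, path):
    """Apply a minimal RFC 6902 'remove' op at a /-separated path."""
    doc = copy.deepcopy(doc)          # leave the input document untouched
    parts = path.strip("/").split("/")
    target = doc
    for part in parts[:-1]:          # walk down to the parent object
        target = target[part]
    del target[parts[-1]]            # drop the final key
    return doc

# Trimmed-down daemonset spec with both selector keys present,
# the state that trips the "Only one Fluentd nodeselector" check:
ds = {"spec": {"template": {"spec": {"nodeSelector": {
    "logging-infra-fluentd": "true", "non-existing": "true"}}}}}

patched = json_patch_remove(ds, "/spec/template/spec/nodeSelector/logging-infra-fluentd")
print(patched["spec"]["template"]["spec"]["nodeSelector"])  # {'non-existing': 'true'}
```

After this patch only one nodeSelector key remains on the DS, which is why the config.yml playbook then runs cleanly. Note that it removes the key from the daemonset's pod template only; the "non-existing: true" labels applied to the nodes themselves are untouched, which matches the leftover labels described above.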
Docs PR https://github.com/openshift/openshift-docs/pull/27310