Created attachment 1249787 [details]
ansible_upgrade_log

Description of problem:
Upgrading the logging stack from 3.3.1 to 3.5.0, the ansible playbook failed at TASK [openshift_logging : command] and eventually ended with:

RUNNING HANDLER [openshift_logging : restart master] ***************************

PLAY RECAP *********************************************************************
$master : ok=428 changed=52 unreachable=0 failed=1

# oc get po
NAME                          READY     STATUS      RESTARTS   AGE
logging-curator-2-deploy      0/1       Error       0          20m
logging-deployer-vf4l1        0/1       Completed   0          36m
logging-es-5glarbby-2-hrs2f   0/1       Pending     0          19m

The ES pod is unable to start because the node is no longer labeled correctly:

  35m  1s  126  {default-scheduler }  Warning  FailedScheduling
  pod (logging-es-5glarbby-2-hrs2f) failed to fit in any node
  fit failure summary on nodes : MatchNodeSelector (1)

And the node label "logging-infra-fluentd=true" that was present before the upgrade is lost:

# oc get node --show-labels
NAME      STATUS                     AGE  LABELS
$master   Ready,SchedulingDisabled   6h   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=$master,role=node
$node     Ready                      6h   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=$node,registry=enabled,role=node,router=enabled

Version-Release number of selected component (if applicable):
https://github.com/openshift/openshift-ansible/

How reproducible:
Always

Steps to Reproduce:
1. Deploy the logging 3.3.1 stack (on OCP 3.5.0) with the journald log driver enabled and node selectors defined in the configmap:
   "use-journal": "true"
   "curator-nodeselector": "logging-infra-fluentd=true"
   "es-nodeselector": "logging-infra-fluentd=true"
   "kibana-nodeselector": "logging-infra-fluentd=true"
   Also bind ES to hostPV storage on the ES node, and wait until log entries show up in the Kibana UI.
2. Upgrade to the logging 3.5.0 stack using ansible.
3. Check the upgrade result.

Actual results:
Upgrade failed at TASK [openshift_logging : command]

Expected results:
CEFK pods should be running post upgrade

Additional info:
Ansible log attached
Repro env attached
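Until the upgrade is re-run with the selectors set, the lost label can be restored by hand with `oc label`. A minimal sketch follows; the node name is a placeholder, and the script only prints the command for review rather than executing it, since running it requires a logged-in cluster session:

```shell
#!/bin/sh
# Sketch of a manual workaround: re-apply the node label that the upgrade
# dropped, so the ES/Kibana/Curator pods can be scheduled again.
# NODE is a hypothetical placeholder for a node name from `oc get node`.
NODE="${NODE:-node.example.com}"
LABEL="logging-infra-fluentd=true"

# --overwrite makes the command safe to run whether or not the label
# is already present on the node.
CMD="oc label node $NODE $LABEL --overwrite"
echo "$CMD"
```

Afterwards, `oc get node --show-labels` should show logging-infra-fluentd=true on the node again.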
Per @ewolinetz, running the playbook in upgrade mode will scale down the components and remove the node labels. Upon creation of the 3.5 logging objects, node selectors will only be applied if you set the following in the inventory:

openshift_logging_es_nodeselector
openshift_logging_es_ops_nodeselector
openshift_logging_kibana_nodeselector
openshift_logging_kibana_ops_nodeselector
openshift_logging_curator_nodeselector
openshift_logging_curator_ops_nodeselector
openshift_logging_fluentd_nodeselector
openshift_logging_fluentd_ops_nodeselector

Each of these must be a hash, e.g. {'logging-infra-fluentd': 'true'}
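For example, the relevant inventory section could look like the sketch below. Only the non-ops selectors are shown, matching the label used in this deployment; `[OSEv3:vars]` is the standard openshift-ansible variable group:

```ini
[OSEv3:vars]
# Node selectors must be set as hashes, or the 3.5 upgrade will
# create the logging objects without any node selector.
openshift_logging_es_nodeselector={'logging-infra-fluentd': 'true'}
openshift_logging_kibana_nodeselector={'logging-infra-fluentd': 'true'}
openshift_logging_curator_nodeselector={'logging-infra-fluentd': 'true'}
openshift_logging_fluentd_nodeselector={'logging-infra-fluentd': 'true'}
```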
The original issue can be fixed (though https://bugzilla.redhat.com/show_bug.cgi?id=1424981 is then encountered) after setting these in the inventory file used for the upgrade:

openshift_logging_es_nodeselector={'logging-infra-fluentd':'true'}
openshift_logging_kibana_nodeselector={'logging-infra-fluentd':'true'}
openshift_logging_curator_nodeselector={'logging-infra-fluentd':'true'}
openshift_logging_fluentd_nodeselector={'logging-infra-fluentd':'true'}

It would be better to mention this in the docs; otherwise end users may assume the node selectors are inherited from the pre-upgrade deployment.

@jcantril Do you think it is necessary to have a separate doc issue to track this?
In the 3.5 documentation PR we have made a change to reflect this. It may be worth filing a separate issue to explicitly note the difference between 3.4 and 3.5.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3438