Description of problem:
Hi, after upgrading our OCP cluster to 3.5, we are trying to upgrade our EFK logging stack to 3.5 using the Ansible playbook, per the instructions here:
https://docs.openshift.com/container-platform/3.5/install_config/upgrading/automated_upgrades.html#automated-upgrading-efk-logging-stack

We have this in our Ansible inventory file:

<<<
[openshift@drlosl01 ~]$ cat /etc/ansible/hosts | grep logging
openshift_master_logging_public_url=https://kibana.ose-preprod.soemthing.com
openshift_logging_install_logging=true
openshift_logging_image_version=v3.5
openshift_logging_es_cluster_size=4
>>>

When we run the upgrade playbook:

ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml

it successfully updates the images to v3.5; however, a few configuration settings are overwritten, such as:

[root@drlosm01 ~]# oc get dc logging-es-566tybv5 -o yaml | grep security -A 1
      securityContext:
        privileged: true

and

[root@drlosm01 ~]# oc get dc logging-es-566tybv5 -o yaml | grep nodeSelector -A 2
      nodeSelector:
        logging-es-node: "1"
        region: infra
[root@drlosm01 ~]# oc get dc logging-es-6teajwil -o yaml | grep nodeSelector -A 2
      nodeSelector:
        logging-es-node: "2"
        region: infra
[root@drlosm01 ~]# oc get dc logging-es-fib6vdfp -o yaml | grep nodeSelector -A 2
      nodeSelector:
        logging-es-node: "3"
        region: infra
[root@drlosm01 ~]# oc get dc logging-es-q1ngcy5b -o yaml | grep nodeSelector -A 2
      nodeSelector:
        logging-es-node: "4"
        region: infra

I understand that we can use openshift_logging_es_nodeselector={"region":"infra"} to add a nodeSelector to all logging-es DCs. Note, however, that we also have a specific nodeSelector, 'logging-es-node', which is used to tie each ES instance to one particular node, since we are using local storage for ES. The deployment is based on the documentation here:
https://docs.openshift.com/container-platform/3.5/install_config/aggregate_logging.html#aggregated-elasticsearch
as it was the recommended approach at the time we designed and deployed this. Please let us know if that is no longer the case.

Please advise what the supported way is to upgrade logging to 3.5 in this case.

Version-Release number of the following components:
OCP 3.5

How reproducible:

Steps to Reproduce:
1. Run ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml after upgrading to OCP 3.5

Actual results:
The privileged securityContext and the per-node 'logging-es-node' nodeSelector on the logging-es DCs are overwritten by the playbook.

Expected results:
The existing securityContext and nodeSelector settings on the logging-es DCs are preserved across the upgrade.

Additional info:
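For reference, a minimal sketch of re-applying the per-node selectors by hand after a playbook run (the DC names and selector values are taken from the output above, and the 'oc patch' pattern mirrors the one used in the aggregate_logging documentation; this is only illustrative, not a supported procedure):

<<<
# sketch only: restore the per-node selectors the playbook overwrote
oc patch dc/logging-es-566tybv5 -p '{"spec":{"template":{"spec":{"nodeSelector":{"logging-es-node":"1","region":"infra"}}}}}'
oc patch dc/logging-es-6teajwil -p '{"spec":{"template":{"spec":{"nodeSelector":{"logging-es-node":"2","region":"infra"}}}}}'
oc patch dc/logging-es-fib6vdfp -p '{"spec":{"template":{"spec":{"nodeSelector":{"logging-es-node":"3","region":"infra"}}}}}'
oc patch dc/logging-es-q1ngcy5b -p '{"spec":{"template":{"spec":{"nodeSelector":{"logging-es-node":"4","region":"infra"}}}}}'
>>>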
Miheer, can you further explain what you are trying to accomplish? We believe the missing security context is a bug, but it is not clear to us why you are trying to bind a specific ES pod to a specific cluster node. Please explain, and also attach the DCs for the ES nodes and any PV/PVC definitions.
We deployed our logging stack based on the docs here:
https://docs.openshift.com/container-platform/3.5/install_config/aggregate_logging.html#aggregated-elasticsearch

As you can see, the instructions show using local storage for Elasticsearch, and each ES instance has to be bound to a specific ES node with a nodeSelector:

<<<
apiVersion: v1
kind: DeploymentConfig
spec:
  template:
    spec:
      nodeSelector:
        logging-es-node: "1"
>>>

Point number 3 there says: "The Elasticsearch replicas must be located on the correct nodes to use the local storage, and should not move around even if those nodes are taken down for a period of time."

What we are trying to achieve is this: how can we prevent the automated logging upgrade Ansible playbook from overwriting the above configuration? No PV or PVC is used in this case.
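Putting the two requirements together, the fragment of each ES DC that the upgrade needs to leave intact looks roughly like this (values taken from the outputs above; the container-level privileged setting follows the storage section of the same document, and the indentation is only illustrative):

<<<
apiVersion: v1
kind: DeploymentConfig
spec:
  template:
    spec:
      nodeSelector:
        logging-es-node: "1"
        region: infra
      containers:
      - name: elasticsearch
        securityContext:
          privileged: true
>>>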
Resolved in PR openshift-ansible/pull/5143
Has this been fixed in atomic-openshift-ansible yet? Can we move this to ON_QA?
The PR wasn't merged to the latest v3.5 branch. It should be merged to v3.3 and attached to an installer errata.
Verified and passed using openshift-ansible-3.5.125-

$ cat v34_securitygroup
      securityContext:
        supplementalGroups:
--
      securityContext:
        supplementalGroups:
--
      securityContext:
        supplementalGroups:

$ cat v34_nodeSelector
      nodeSelector:
        logging-node: "1"
--
      nodeSelector:
        logging-node: "2"
--
      nodeSelector:
        logging-node: "3"

$ cat v35_securitygroup
      securityContext:
        supplementalGroups:
--
      securityContext:
        supplementalGroups:
--
      securityContext:
        supplementalGroups:

$ cat v35_nodeSelector
      nodeSelector:
        logging-node: "1"
      restartPolicy: Always
      securityContext:
--
      nodeSelector:
        logging-node: "2"
      restartPolicy: Always
      securityContext:
--
      nodeSelector:
        logging-node: "3"
      restartPolicy: Always
      securityContext:
I missed one securityContext in comment 10. There are two securityContext entries in the DC. The first is created by ansible. The second is created by the 'oc patch' command following the document [1]. The second one is still overwritten when using openshift-ansible-3.5.125 with the fix PR. @Jeff, could you confirm whether we need to persist the second securityContext?

[1] https://docs.openshift.com/container-platform/3.5/install_config/aggregate_logging.html -> Persistent Elasticsearch Storage -> 2. Each Elasticsearch replica definition must be patched to claim that privilege, for example:

$ for dc in $(oc get deploymentconfig --selector logging-infra=elasticsearch -o name); do
    oc scale $dc --replicas=0
    oc patch $dc \
       -p '{"spec":{"template":{"spec":{"containers":[{"name":"elasticsearch","securityContext":{"privileged": true}}]}}}}'
  done
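A quick way to check whether that second (container-level) securityContext survives a playbook run might be something like the following (a sketch only; the selector comes from the loop above and the grep mirrors the one used in the original report):

<<<
$ for dc in $(oc get dc --selector logging-infra=elasticsearch -o name); do
    echo $dc
    oc get $dc -o yaml | grep -A 1 'securityContext:'
  done
>>>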
@Anping, I think if this is the only way for the logging stack to continue to work after they applied this patch, then it is reasonable to expect us to need to preserve the 'privileged': true setting.
Opened https://bugzilla.redhat.com/show_bug.cgi?id=1478771 to address c#11
@Jeff, I think the bug opened for that is https://bugzilla.redhat.com/show_bug.cgi?id=1496271. We only preserved the nodeSelector in this bug fix.
The nodeSelector is preserved, so moving this to VERIFIED. The securityContext issue will be fixed in bug 1496271.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2900