Bug 1478771 - We do not preserve the security context and node selector for the elasticsearch dc after running ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml after upgade to OCP 3.5 [NEEDINFO]
We do not preserve the security context and node selector for the elasticsear...
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging (Show other bugs)
3.5.0
Unspecified Unspecified
unspecified Severity medium
: ---
: 3.5.z
Assigned To: Jeff Cantrill
Anping Li
:
Depends On:
Blocks: 1482661
  Show dependency treegraph
 
Reported: 2017-08-06 22:22 EDT by Miheer Salunke
Modified: 2017-10-17 07:45 EDT (History)
10 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The nodeSelector and supplementalGroups from existing deployments were being ignored because it was believed these parameters should be defined in the inventory file. There was no way to define values for each DC which hinders the usecase where a deployment is using hostmount volumes Consequence: The nodeSelector and supplementalGroups were being replaces with those defined in the inventory file. Fix: Use nodeSelector and supplementalGroups from logging facts if they exist for a given DC Result: nodeSelector and supplementalGroups are retained when applied changes.
Story Points: ---
Clone Of:
: 1482661 (view as bug list)
Environment:
Last Closed: 2017-10-17 07:45:24 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
anli: needinfo? (jcantril)


Attachments (Terms of Use)


External Trackers
Tracker ID Priority Status Summary Last Updated
Github openshift-ansible/pull/5143 None None None 2017-08-18 14:04 EDT

  None (edit)
Description Miheer Salunke 2017-08-06 22:22:33 EDT
Description of problem:

Hi, after upgrading our OCP cluster to 3.5. We are trying to upgrade our EFK logging to 3.5 using ansible playbook as per instruction: 

https://docs.openshift.com/container-platform/3.5/install_config/upgrading/automated_upgrades.html#automated-upgrading-efk-logging-stack

We got this in our ansible inventory file:
<<<
[openshift@drlosl01 ~]$ cat /etc/ansible/hosts | grep logging
openshift_master_logging_public_url=https://kibana.ose-preprod.soemthing.com
openshift_logging_install_logging=true
openshift_logging_image_version=v3.5
openshift_logging_es_cluster_size=4
>>>

when we run the upgrade playbook:

ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml

it has successfully updated the images with v3.5, however, there are a few config has been overwritten, such as:

[root@drlosm01 ~]# oc get dc logging-es-566tybv5 -o yaml | grep security -A 1
        securityContext:
          privileged: true

and

[root@drlosm01 ~]# oc get dc logging-es-566tybv5 -o yaml | grep nodeSelector -A 2
      nodeSelector:
        logging-es-node: "1"
        region: infra

[root@drlosm01 ~]# oc get dc logging-es-6teajwil -o yaml | grep nodeSelector -A 2
      nodeSelector:
        logging-es-node: "2"
        region: infra

[root@drlosm01 ~]# oc get dc logging-es-fib6vdfp -o yaml | grep nodeSelector -A 2
      nodeSelector:
        logging-es-node: "3"
        region: infra

[root@drlosm01 ~]# oc get dc logging-es-q1ngcy5b -o yaml | grep nodeSelector -A 2
      nodeSelector:
        logging-es-node: "4"
        region: infra

I understand that we can use this openshift_logging_es_nodeselector={"region":"infra"} to add nodeSelector for all logging-es DC. Note that we got a specify nodeSelector 'logging-es-node' which is used to tie one es instance to one particular node since we are using local storage for ES storage. The deployment is based on documentation here:

https://docs.openshift.com/container-platform/3.5/install_config/aggregate_logging.html#aggregated-elasticsearch

* As it was the recommended approach by the time we designed and deployed this. Please let us know if it is no longer the case. 

Please advise what is the supported way to upgrade logging to 3.5 in this case?



Version-Release number of the following components:
OCP 3.5

How reproducible:

Steps to Reproduce:
1.Run ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml after upgrading to OCP 3.5
2.
3.

Actual results:


Expected results:

Additional info:
Comment 2 Jeff Cantrill 2017-08-07 13:50:06 EDT
Miheer, Can you further explain what you are trying to accomplish?  We believe the missing security context is a bug but it is not clear to us what the intention is of trying to bind a specific ES pod to a specific cluster node.  Please explain as well as attach the DCs for the ES nodes and any PV/PVC definitions.
Comment 3 Miheer Salunke 2017-08-08 21:26:31 EDT
We have deployed our logging stack based in docs here: https://docs.openshift.com/container-platform/3.5/install_config/aggregate_logging.html#aggregated-elasticsearch

As you can see, the instructions showed using local storage for Elasticsearch and each ES instance has to be bound to specify ES node with nodeSelector.
<<<
apiVersion: v1
kind: DeploymentConfig
spec:
  template:
    spec:
      nodeSelector:
        logging-es-node: "1" 
>>>

and point number 3 says: 
The Elasticsearch replicas must be located on the correct nodes to use the local storage, and should not move around even if those nodes are taken down for a period of time. 

What we are trying to achieve is: how can we prevent the automated logging upgrade ansible playbook to not overwrite the above configuration?

No PV and PVC been used in this case.
Comment 4 Jeff Cantrill 2017-08-18 14:04:18 EDT
Resolved in PR openshift-ansible/pull/5143
Comment 5 Rich Megginson 2017-09-08 12:16:23 EDT
Has this been fixed in atomic-openshift-ansible yet?  Can we move this to ON_QA?
Comment 7 Anping Li 2017-09-14 05:08:51 EDT
The PR wasn't merged to latest v3.5 branch.  It should merged to v3.3 and attached to an installer errata.
Comment 10 Anping Li 2017-09-25 22:18:37 EDT
Verified and pass using openshift-ansible-3.5.125-

$ cat v34_securitygroup
        securityContext:
          supplementalGroups:
--
        securityContext:
          supplementalGroups:
--
        securityContext:
          supplementalGroups:

$ cat v34_nodeSelector
        nodeSelector:
          logging-node: "1"
--
        nodeSelector:
          logging-node: "2"
--
        nodeSelector:
          logging-node: "3"



$ cat v35_securitygroup
        securityContext:
          supplementalGroups:
--
        securityContext:
          supplementalGroups:
--
        securityContext:
          supplementalGroups:

 cat v35_nodeSelector
        nodeSelector:
          logging-node: "1"
        restartPolicy: Always
        securityContext:
--
        nodeSelector:
          logging-node: "2"
        restartPolicy: Always
        securityContext:
--
        nodeSelector:
          logging-node: "3"
        restartPolicy: Always
        securityContext:
Comment 11 Anping Li 2017-09-25 23:36:36 EDT
I miss one securityContext in comment 10. There are two securityContext in DC. 
The first is created by ansible. 
The second  is created by 'oc patch' command following the document [1] 
The second one is still overwrote when using the openshift-ansible-3.5.125 with the fix PR.

@Jeff, could you confirm if we need to persist the second securityContext.?


[1]
https://docs.openshift.com/container-platform/3.5/install_config/aggregate_logging.html-> Persistent Elasticsearch Storage -> 2. Each Elasticsearch replica definition must be patched to claim that privilege, for example:
$ for dc in $(oc get deploymentconfig --selector logging-infra=elasticsearch -o name); do
    oc scale $dc --replicas=0
    oc patch $dc \
       -p '{"spec":{"template":{"spec":{"containers":[{"name":"elasticsearch","securityContext":{"privileged": true}}]}}}}'
  done
Comment 12 Jeff Cantrill 2017-09-26 16:19:10 EDT
@Anping,

I think if this is the only way for the logging stack to continue to work after they applied this patch that it is reasonable to expect us to need to preserver the 'privileged': true
Comment 13 Jeff Cantrill 2017-09-26 16:58:39 EDT
Opened https://bugzilla.redhat.com/show_bug.cgi?id=1478771 to address c#11
Comment 14 Anping Li 2017-09-26 22:20:28 EDT
@Jeff, I think the opened bug is https://bugzilla.redhat.com/show_bug.cgi?id=1496271.

we only preserved the nodeSelector in this bug fix.
Comment 15 Anping Li 2017-09-28 01:38:28 EDT
The nodeSelector is preserved. so move to verified. the securityContext issue will be fixed in  Bug1496271
Comment 17 errata-xmlrpc 2017-10-17 07:45:24 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2900

Note You need to log in before you can comment on or make changes to this bug.