1478771 – We do not preserve the security context and node selector for the elasticsearch dc after running ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml after upgade to OCP 3.5

Bug 1478771 - We do not preserve the security context and node selector for the elasticsearch dc after running ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml after upgade to OCP 3.5

Summary: We do not preserve the security context and node selector for the elasticsear...

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Logging
Sub Component:
Version:	3.5.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	3.5.z
Assignee:	Jeff Cantrill
QA Contact:	Anping Li
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1482661
TreeView+	depends on / blocked

Reported:	2017-08-07 02:22 UTC by Miheer Salunke
Modified:	2020-12-14 09:22 UTC (History)
CC List:	10 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: The nodeSelector and supplementalGroups from existing deployments were being ignored because it was believed these parameters should be defined in the inventory file. There was no way to define values for each DC which hinders the usecase where a deployment is using hostmount volumes Consequence: The nodeSelector and supplementalGroups were being replaces with those defined in the inventory file. Fix: Use nodeSelector and supplementalGroups from logging facts if they exist for a given DC Result: nodeSelector and supplementalGroups are retained when applied changes.
Clone Of:
Clones:	1482661 (view as bug list)
Environment:
Last Closed:	2017-10-17 11:45:24 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift-ansible/pull/5143	0	None	None	None	2019-12-02 15:06:31 UTC
Red Hat Product Errata	RHBA-2017:2900	0	normal	SHIPPED_LIVE	OpenShift Container Platform atomic-openshift-utils bug fix and enhancement	2017-10-17 15:44:50 UTC

Description Miheer Salunke 2017-08-07 02:22:33 UTC

Description of problem:

Hi, after upgrading our OCP cluster to 3.5. We are trying to upgrade our EFK logging to 3.5 using ansible playbook as per instruction: 

https://docs.openshift.com/container-platform/3.5/install_config/upgrading/automated_upgrades.html#automated-upgrading-efk-logging-stack

We got this in our ansible inventory file:
<<<
[openshift@drlosl01 ~]$ cat /etc/ansible/hosts | grep logging
openshift_master_logging_public_url=https://kibana.ose-preprod.soemthing.com
openshift_logging_install_logging=true
openshift_logging_image_version=v3.5
openshift_logging_es_cluster_size=4
>>>

when we run the upgrade playbook:

ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml

it has successfully updated the images with v3.5, however, there are a few config has been overwritten, such as:

[root@drlosm01 ~]# oc get dc logging-es-566tybv5 -o yaml | grep security -A 1
        securityContext:
          privileged: true

and

[root@drlosm01 ~]# oc get dc logging-es-566tybv5 -o yaml | grep nodeSelector -A 2
      nodeSelector:
        logging-es-node: "1"
        region: infra

[root@drlosm01 ~]# oc get dc logging-es-6teajwil -o yaml | grep nodeSelector -A 2
      nodeSelector:
        logging-es-node: "2"
        region: infra

[root@drlosm01 ~]# oc get dc logging-es-fib6vdfp -o yaml | grep nodeSelector -A 2
      nodeSelector:
        logging-es-node: "3"
        region: infra

[root@drlosm01 ~]# oc get dc logging-es-q1ngcy5b -o yaml | grep nodeSelector -A 2
      nodeSelector:
        logging-es-node: "4"
        region: infra

I understand that we can use this openshift_logging_es_nodeselector={"region":"infra"} to add nodeSelector for all logging-es DC. Note that we got a specify nodeSelector 'logging-es-node' which is used to tie one es instance to one particular node since we are using local storage for ES storage. The deployment is based on documentation here:

https://docs.openshift.com/container-platform/3.5/install_config/aggregate_logging.html#aggregated-elasticsearch

* As it was the recommended approach by the time we designed and deployed this. Please let us know if it is no longer the case. 

Please advise what is the supported way to upgrade logging to 3.5 in this case?



Version-Release number of the following components:
OCP 3.5

How reproducible:

Steps to Reproduce:
1.Run ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml after upgrading to OCP 3.5
2.
3.

Actual results:


Expected results:

Additional info:

Comment 2 Jeff Cantrill 2017-08-07 17:50:06 UTC

Miheer, Can you further explain what you are trying to accomplish?  We believe the missing security context is a bug but it is not clear to us what the intention is of trying to bind a specific ES pod to a specific cluster node.  Please explain as well as attach the DCs for the ES nodes and any PV/PVC definitions.

Comment 3 Miheer Salunke 2017-08-09 01:26:31 UTC

We have deployed our logging stack based in docs here: https://docs.openshift.com/container-platform/3.5/install_config/aggregate_logging.html#aggregated-elasticsearch

As you can see, the instructions showed using local storage for Elasticsearch and each ES instance has to be bound to specify ES node with nodeSelector.
<<<
apiVersion: v1
kind: DeploymentConfig
spec:
  template:
    spec:
      nodeSelector:
        logging-es-node: "1" 
>>>

and point number 3 says: 
The Elasticsearch replicas must be located on the correct nodes to use the local storage, and should not move around even if those nodes are taken down for a period of time. 

What we are trying to achieve is: how can we prevent the automated logging upgrade ansible playbook to not overwrite the above configuration?

No PV and PVC been used in this case.

Comment 4 Jeff Cantrill 2017-08-18 18:04:18 UTC

Resolved in PR openshift-ansible/pull/5143

Comment 5 Rich Megginson 2017-09-08 16:16:23 UTC

Has this been fixed in atomic-openshift-ansible yet?  Can we move this to ON_QA?

Comment 7 Anping Li 2017-09-14 09:08:51 UTC

The PR wasn't merged to latest v3.5 branch.  It should merged to v3.3 and attached to an installer errata.

Comment 10 Anping Li 2017-09-26 02:18:37 UTC

Verified and pass using openshift-ansible-3.5.125-

$ cat v34_securitygroup
        securityContext:
          supplementalGroups:
--
        securityContext:
          supplementalGroups:
--
        securityContext:
          supplementalGroups:

$ cat v34_nodeSelector
        nodeSelector:
          logging-node: "1"
--
        nodeSelector:
          logging-node: "2"
--
        nodeSelector:
          logging-node: "3"



$ cat v35_securitygroup
        securityContext:
          supplementalGroups:
--
        securityContext:
          supplementalGroups:
--
        securityContext:
          supplementalGroups:

 cat v35_nodeSelector
        nodeSelector:
          logging-node: "1"
        restartPolicy: Always
        securityContext:
--
        nodeSelector:
          logging-node: "2"
        restartPolicy: Always
        securityContext:
--
        nodeSelector:
          logging-node: "3"
        restartPolicy: Always
        securityContext:

Comment 11 Anping Li 2017-09-26 03:36:36 UTC

I miss one securityContext in comment 10. There are two securityContext in DC. 
The first is created by ansible. 
The second  is created by 'oc patch' command following the document [1] 
The second one is still overwrote when using the openshift-ansible-3.5.125 with the fix PR.

@Jeff, could you confirm if we need to persist the second securityContext.?


[1]
https://docs.openshift.com/container-platform/3.5/install_config/aggregate_logging.html-> Persistent Elasticsearch Storage -> 2. Each Elasticsearch replica definition must be patched to claim that privilege, for example:
$ for dc in $(oc get deploymentconfig --selector logging-infra=elasticsearch -o name); do
    oc scale $dc --replicas=0
    oc patch $dc \
       -p '{"spec":{"template":{"spec":{"containers":[{"name":"elasticsearch","securityContext":{"privileged": true}}]}}}}'
  done

Comment 12 Jeff Cantrill 2017-09-26 20:19:10 UTC

@Anping,

I think if this is the only way for the logging stack to continue to work after they applied this patch that it is reasonable to expect us to need to preserver the 'privileged': true

Comment 13 Jeff Cantrill 2017-09-26 20:58:39 UTC

Opened https://bugzilla.redhat.com/show_bug.cgi?id=1478771 to address c#11

Comment 14 Anping Li 2017-09-27 02:20:28 UTC

@Jeff, I think the opened bug is https://bugzilla.redhat.com/show_bug.cgi?id=1496271.

we only preserved the nodeSelector in this bug fix.

Comment 15 Anping Li 2017-09-28 05:38:28 UTC

The nodeSelector is preserved. so move to verified. the securityContext issue will be fixed in  Bug1496271

Comment 17 errata-xmlrpc 2017-10-17 11:45:24 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2900

Note You need to log in before you can comment on or make changes to this bug.