Bug 1470394 - Logging playbook run in isolation ignores/conflicts with default node selector
Summary: Logging playbook run in isolation ignores/conflicts with default node selector
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.5.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.5.z
Assignee: Jeff Cantrill
QA Contact: Xia Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-07-12 22:08 UTC by Erik M Jacobs
Modified: 2017-08-31 17:00 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
The installer handles installation and configuration of logging both during the initial deployment and afterwards, via one of the ad-hoc playbooks. When a default project node selector was configured for the initial installation, the ad-hoc logging deployment would fail due to selector conflicts. The logging project is now force-created with a null node selector, so conflicts are avoided regardless of when the logging deployment is performed.
Clone Of:
Environment:
Last Closed: 2017-08-31 17:00:23 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
Red Hat Product Errata RHBA-2017:1828 (priority normal, SHIPPED_LIVE): OpenShift Container Platform 3.5, 3.4, and 3.3 bug fix update. Last updated: 2017-08-31 20:59:56 UTC

Description Erik M Jacobs 2017-07-12 22:08:26 UTC
Description of problem:
I installed OpenShift 3.5 and logging's installation worked fine from scratch. I then deleted the logging project and re-ran the logging installation (to test something else). 

The logging project is not created with a null nodeselector in this case, which subsequently causes component deployment to fail.

Version-Release number of selected component (if applicable):
openshift v3.5.5.26
kubernetes v1.5.2+43a9be4

Here are some relevant snippets from the installer config:
osm_default_node_selector='env=user'
openshift_hosted_logging_curator_nodeselector='env=infra'
openshift_hosted_logging_elasticsearch_nodeselector='env=infra'
openshift_hosted_logging_kibana_nodeselector='env=infra'

Here is the project yaml that is created:
apiVersion: v1
kind: Project
metadata:
  annotations:
    openshift.io/description: ""
    openshift.io/display-name: ""
    openshift.io/sa.scc.mcs: s0:c10,c0
    openshift.io/sa.scc.supplemental-groups: 1000090000/10000
    openshift.io/sa.scc.uid-range: 1000090000/10000
  creationTimestamp: 2017-07-12T20:05:17Z
  name: logging
  resourceVersion: "8970"
  selfLink: /oapi/v1/projects/logging
  uid: 6d14c3f6-673d-11e7-8b26-123e6dadc042
spec:
  finalizers:
  - openshift.io/origin
  - kubernetes
status:
  phase: Active


Here is an example of the resulting failure in the deployment (a fluentd pod):
Status:                 Failed
Reason:                 MatchNodeSelector
Message:                Pod Predicate MatchNodeSelector failed

nodeSelector:
    env: user
    fluentd: "true"

Notice that the resulting nodeselector includes the system default, because the project doesn't override the selector with null.
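
For comparison, a project created with an explicit empty node selector carries an empty openshift.io/node-selector annotation on its namespace, which overrides the cluster default. A minimal illustration of the expected result (output trimmed to the relevant fields, not taken from this cluster):

# oc adm new-project logging --node-selector=""
# oc get namespace logging -o yaml
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    openshift.io/node-selector: ""
  name: logging
status:
  phase: Active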

I am not sure why this works on install from scratch (perhaps the default nodeselector comes in way later), but running the playbook directly is broken:

ansible-playbook -i /etc/ansible/hosts /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml -e openshift_logging_install_logging=true

You can see here that "create project" is run without specifying any node selector:
https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_hosted_logging/tasks/deploy_logging.yaml#L26-L29

Unfortunately, this will inherit the default nodeselector.
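
For context, the value that gets inherited comes from the master configuration; with the inventory values above, the relevant excerpt would look roughly like this (assuming the standard /etc/origin/master/master-config.yaml location):

projectConfig:
  defaultNodeSelector: env=user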

It appears that we need to modify the project creation to explicitly specify a null nodeselector:

{{ openshift.common.client_binary }} adm --config={{ mktemp.stdout }}/admin.kubeconfig new-project logging --node-selector=""

Again - I am not sure why this works with a from-scratch installation, but it definitely does not work with a post-install logging installation.
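
For reference, a minimal sketch of how such a task could look, reusing the mktemp and openshift.common.client_binary values from the command above (the task name, register variable, and failed_when handling are illustrative, not the actual change in the linked file or PR):

- name: Create logging project with an empty node selector
  command: >
    {{ openshift.common.client_binary }} adm
    --config={{ mktemp.stdout }}/admin.kubeconfig
    new-project logging --node-selector=""
  register: create_logging_project
  failed_when: create_logging_project.rc != 0 and 'already exists' not in create_logging_project.stderr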

Comment 1 Erik M Jacobs 2017-07-12 22:28:48 UTC
I made a quick fix:

https://github.com/openshift/openshift-ansible/pull/4746

I'm not sure if this is the "best" fix, but it does fix the problem.

Comment 2 Erik M Jacobs 2017-07-12 23:04:49 UTC
Resubmitted as this fix only matters for 3.5:

https://github.com/openshift/openshift-ansible/pull/4747

Comment 3 Erik M Jacobs 2017-07-17 13:33:13 UTC
I got a note that this is "missing doc text" -- is there anything I need to do here?

Comment 5 Xia Zhao 2017-07-25 08:33:10 UTC
Tested with openshift-ansible-playbooks-3.5.101-1.git.0.0107544.el7.noarch

# openshift version
openshift v3.5.5.31.3
kubernetes v1.5.2+43a9be4
etcd 3.1.0


Test step:
0. Set the following parameters in the inventory file:
openshift_logging_curator_nodeselector={"registry":"enabled"}
openshift_logging_elasticsearch_nodeselector={"registry":"enabled"}
openshift_logging_kibana_nodeselector={"registry":"enabled"}

1. Deploy logging on OCP

2. Delete the logging project on OCP by running:
# oc delete project logging

3. Rerun the logging playbook with the parameters from step 0:
# ansible-playbook -i ~/inventory -vvv /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml -e openshift_logging_install_logging=true

4. All logging stack pods are running:
# oc get po
NAME                          READY     STATUS    RESTARTS   AGE
logging-curator-1-30pc6       1/1       Running   0          2m
logging-es-nnsfiays-1-0cdhm   1/1       Running   0          2m
logging-fluentd-6s64m         1/1       Running   0          2m
logging-fluentd-q7fsf         1/1       Running   0          2m
logging-kibana-1-vzzbb        2/2       Running   0          2m
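
In addition to the pod status, the namespace annotation can be spot-checked to confirm the project was created with an empty node selector; something along these lines (output illustrative):

# oc get namespace logging -o yaml | grep node-selector
    openshift.io/node-selector: ""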

Comment 7 errata-xmlrpc 2017-08-31 17:00:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1828

