Bug 1470394 - Logging playbook run in isolation ignores/conflicts with default node selector
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.5.z
Assigned To: Jeff Cantrill
QA Contact: Xia Zhao
Depends On:
Reported: 2017-07-12 18:08 EDT by Erik M Jacobs
Modified: 2017-08-31 13:00 EDT (History)
CC: 2 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
The installer handles installation and configuration of logging both during the initial deployment and post-deployment, by calling one of the ad-hoc playbooks. When a default project nodeselector was used for the initial installation, the ad-hoc logging deployment would fail due to selector conflicts. Now, the logging project is force-created with a null nodeselector to avoid conflicts regardless of when the logging deployment is performed.
Story Points: ---
Clone Of:
Last Closed: 2017-08-31 13:00:23 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments: None
Description Erik M Jacobs 2017-07-12 18:08:26 EDT
Description of problem:
I installed OpenShift 3.5 and logging's installation worked fine from scratch. I then deleted the logging project and re-ran the logging installation (to test something else). 

The logging project is not created with a null nodeselector in this case, which subsequently causes component deployment to fail.

Version-Release number of selected component (if applicable):
openshift v3.5.5.26
kubernetes v1.5.2+43a9be4

Here are some relevant snippets from the installer config:
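The relevant piece is a default project node selector. For illustration only (the exact value is an assumption; env=user is inferred from the resulting pod selector shown further down), the inventory sets something like:

[OSEv3:vars]
osm_default_node_selector="env=user"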

Here is the project yaml that is created:
apiVersion: v1
kind: Project
metadata:
  annotations:
    openshift.io/description: ""
    openshift.io/display-name: ""
    openshift.io/sa.scc.mcs: s0:c10,c0
    openshift.io/sa.scc.supplemental-groups: 1000090000/10000
    openshift.io/sa.scc.uid-range: 1000090000/10000
  creationTimestamp: 2017-07-12T20:05:17Z
  name: logging
  resourceVersion: "8970"
  selfLink: /oapi/v1/projects/logging
  uid: 6d14c3f6-673d-11e7-8b26-123e6dadc042
spec:
  finalizers:
  - openshift.io/origin
  - kubernetes
status:
  phase: Active

Here is an example of the problem in the deployment (fluentd):
Status:                 Failed
Reason:                 MatchNodeSelector
Message:                Pod Predicate MatchNodeSelector failed

  nodeSelector:
    env: user
    fluentd: "true"

Notice that the resulting nodeselector includes the system default, because the project doesn't override the selector with null.
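For comparison, a project created with an explicit null selector would carry an empty node-selector annotation in its metadata, which is missing from the yaml above. Roughly:

    openshift.io/node-selector: ""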

I am not sure why this works on install from scratch (perhaps the default nodeselector comes in way later), but running the playbook directly is broken:

ansible-playbook -i /etc/ansible/hosts /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml -e openshift_logging_install_logging=true

You can see here that "create project" is run simply:
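Judging from the suggested fix further down, the task presumably boils down to something like:

{{ openshift.common.client_binary }} adm --config={{ mktemp.stdout }}/admin.kubeconfig new-project logging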

Unfortunately, this will inherit the default nodeselector.

It appears that we need to modify the create project behavior to specify that there should be a null nodeselector:

{{ openshift.common.client_binary }} adm --config={{ mktemp.stdout }}/admin.kubeconfig new-project logging --node-selector=""
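
If the project has already been created with the wrong selector, the same effect can presumably be achieved manually by overriding the annotation (a workaround sketch, not part of the playbook):

oc annotate namespace logging openshift.io/node-selector="" --overwrite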

Again - I am not sure why this works with a from-scratch installation, but it definitely does not work with a post-install logging installation.
Comment 1 Erik M Jacobs 2017-07-12 18:28:48 EDT
I made a quick fix:


I'm not sure if this is the "best" fix, but it does fix the problem.
Comment 2 Erik M Jacobs 2017-07-12 19:04:49 EDT
Resubmitted as this fix only matters for 3.5:

Comment 3 Erik M Jacobs 2017-07-17 09:33:13 EDT
I got a note that this is "missing doc text" -- is there anything I need to do here?
Comment 5 Xia Zhao 2017-07-25 04:33:10 EDT
Tested with openshift-ansible-playbooks-3.5.101-1.git.0.0107544.el7.noarch

# openshift version
openshift v3.
kubernetes v1.5.2+43a9be4
etcd 3.1.0

Test step:
0. Set the following parameters in the inventory file:

1. Deploy logging on OCP

2. Delete the logging project on OCP by running:
# oc delete project logging

3. Rerun the logging playbook with the parameters from step 0:
# ansible-playbook -i ~/inventory -vvv /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml -e openshift_logging_install_logging=true

4. The logging stack pods are all running:
# oc get po
NAME                          READY     STATUS    RESTARTS   AGE
logging-curator-1-30pc6       1/1       Running   0          2m
logging-es-nnsfiays-1-0cdhm   1/1       Running   0          2m
logging-fluentd-6s64m         1/1       Running   0          2m
logging-fluentd-q7fsf         1/1       Running   0          2m
logging-kibana-1-vzzbb        2/2       Running   0          2m
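
One can additionally confirm the project was created with a null selector by checking the annotation (it should read openshift.io/node-selector: ""):
# oc get namespace logging -o yaml | grep node-selector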
Comment 7 errata-xmlrpc 2017-08-31 13:00:23 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

