Bug 1470394
| Summary: | Logging playbook run in isolation ignores/conflicts with default node selector | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Erik M Jacobs <ejacobs> |
| Component: | Logging | Assignee: | Jeff Cantrill <jcantril> |
| Status: | CLOSED ERRATA | QA Contact: | Xia Zhao <xiazhao> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.5.1 | CC: | aos-bugs, rmeggins |
| Target Milestone: | --- | | |
| Target Release: | 3.5.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | The installer handles both the installation and configuration of logging during the initial deployment and the post-deployment case, where one of the ad-hoc playbooks is called. When a default project node selector was used for the initial installation, the ad-hoc logging deployment would fail due to selector conflicts. Now the logging project is force-created with a null node selector, avoiding conflicts regardless of when the logging deployment is performed. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-08-31 17:00:23 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
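To make the Doc Text above concrete, here is a minimal sketch of what force-creating the project with a null node selector amounts to, expressed as the equivalent manual oc command rather than the actual playbook change:

# oc adm new-project logging --node-selector=""

This writes the openshift.io/node-selector annotation on the namespace as an empty string, which overrides the cluster-wide default project node selector, so only the component selectors (Elasticsearch, Kibana, Curator, Fluentd) are applied to the logging pods.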
I made a quick fix: https://github.com/openshift/openshift-ansible/pull/4746 I'm not sure if this is the "best" fix, but it does fix the problem.

Resubmitted, as this fix only matters for 3.5: https://github.com/openshift/openshift-ansible/pull/4747

I got a note that this is "missing doc text" -- is there anything I need to do here?

Tested with openshift-ansible-playbooks-3.5.101-1.git.0.0107544.el7.noarch
# openshift version
openshift v3.5.5.31.3
kubernetes v1.5.2+43a9be4
etcd 3.1.0
Test steps:
0. Set the following parameters in the inventory file:
openshift_logging_curator_nodeselector={"registry":"enabled"}
openshift_logging_elasticsearch_nodeselector={"registry":"enabled"}
openshift_logging_kibana_nodeselector={"registry":"enabled"}
1. Deploy logging on OCP
2. Delete the logging project on OCP by running:
# oc delete project logging
3. Rerun the logging playbook with the parameters from step 0:
# ansible-playbook -i ~/inventory -vvv /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml -e openshift_logging_install_logging=true
4. The logging stack pods are all running (a node-selector check on the re-created project follows the pod list):
# oc get po
NAME READY STATUS RESTARTS AGE
logging-curator-1-30pc6 1/1 Running 0 2m
logging-es-nnsfiays-1-0cdhm 1/1 Running 0 2m
logging-fluentd-6s64m 1/1 Running 0 2m
logging-fluentd-q7fsf 1/1 Running 0 2m
logging-kibana-1-vzzbb 2/2 Running 0 2m
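As an additional check beyond the original verification steps (an assumption on my part, not something the test plan requires), the re-created project can be inspected to confirm it now carries an explicit empty node selector:

# oc get namespace logging -o yaml | grep node-selector

The output should include openshift.io/node-selector: "". With that in place, the pods above are scheduled using only their component selectors (e.g. the {"registry":"enabled"} values from step 0) instead of also inheriting the cluster default.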
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1828
Description of problem:

I installed OpenShift 3.5 and logging's installation worked fine from scratch. I then deleted the logging project and re-ran the logging installation (to test something else). The logging project is not created with a null nodeselector in this case, which subsequently causes component deployment to fail.

Version-Release number of selected component (if applicable):
openshift v3.5.5.26
kubernetes v1.5.2+43a9be4

Here are some relevant snippets from the installer config:

osm_default_node_selector='env=user'
openshift_hosted_logging_curator_nodeselector='env=infra'
openshift_hosted_logging_elasticsearch_nodeselector='env=infra'
openshift_hosted_logging_kibana_nodeselector='env=infra'

Here is the project yaml that is created:

apiVersion: v1
kind: Project
metadata:
  annotations:
    openshift.io/description: ""
    openshift.io/display-name: ""
    openshift.io/sa.scc.mcs: s0:c10,c0
    openshift.io/sa.scc.supplemental-groups: 1000090000/10000
    openshift.io/sa.scc.uid-range: 1000090000/10000
  creationTimestamp: 2017-07-12T20:05:17Z
  name: logging
  resourceVersion: "8970"
  selfLink: /oapi/v1/projects/logging
  uid: 6d14c3f6-673d-11e7-8b26-123e6dadc042
spec:
  finalizers:
  - openshift.io/origin
  - kubernetes
status:
  phase: Active

Here is an example problem in the deployment (fluentd):

Status:  Failed
Reason:  MatchNodeSelector
Message: Pod Predicate MatchNodeSelector failed

nodeSelector:
  env: user
  fluentd: "true"

Notice that the resulting nodeSelector includes the system default, because the project doesn't override the selector with null. I am not sure why this works on an install from scratch (perhaps the default nodeselector comes in way later), but running the playbook directly is broken:

ansible-playbook -i /etc/ansible/hosts /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml -e openshift_logging_install_logging=true

You can see here that "create project" is run simply:
https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_hosted_logging/tasks/deploy_logging.yaml#L26-L29

Unfortunately, this will inherit the default nodeselector. It appears that we need to modify the create-project behavior to specify that there should be a null nodeselector:

{{ openshift.common.client_binary }} adm --config={{ mktemp.stdout }}/admin.kubeconfig new-project logging --node-selector=""

Again, I am not sure why this works with a from-scratch installation, but it definitely does not work with a post-install logging installation. (A sketch of what the adjusted task might look like follows below.)
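A hedged sketch of what the adjusted create-project task in roles/openshift_hosted_logging/tasks/deploy_logging.yaml could look like, based on the command proposed above. This is illustrative only and not the literal content of PR 4746/4747; the task name, register variable, and failed_when handling are assumptions:

# Hypothetical sketch; not the literal task from the referenced PRs.
- name: Create logging project with a null node selector
  command: >
    {{ openshift.common.client_binary }} adm
    --config={{ mktemp.stdout }}/admin.kubeconfig
    new-project logging --node-selector=""
  register: logging_project_result
  # Tolerate re-runs where the project already exists.
  failed_when: logging_project_result.rc != 0 and 'already exists' not in logging_project_result.stderr

For a project that already exists with the inherited selector, one manual workaround (again an assumption, not something stated in this report) would be to set the annotation directly with oc annotate namespace logging openshift.io/node-selector="" --overwrite, or simply delete and re-create the project once the fixed playbook is in place.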