Bug 1463881 - Some fluentd pods are in MatchNodeSelector status and can not be started up
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Assigned To: Jeff Cantrill
QA Contact: Xia Zhao
Reported: 2017-06-21 22:08 EDT by Junqi Zhao
Modified: 2017-06-28 04:46 EDT (History)

Last Closed: 2017-06-28 04:46:10 EDT
Type: Bug

Attachments:
fluentd log (14.93 KB, text/plain), 2017-06-21 22:08 EDT, Junqi Zhao
ansible inventory file (586 bytes, text/plain), 2017-06-21 22:10 EDT, Junqi Zhao

Description Junqi Zhao 2017-06-21 22:08:08 EDT
Created attachment 1290419 [details]
fluentd log

Description of problem:
After deploying logging on a GCE cluster with 10 nodes, 5 fluentd pods are stuck in MatchNodeSelector status and cannot start; the fluentd pod log contains "Pod Predicate MatchNodeSelector failed".

# oc get node
NAME                             STATUS                     AGE
upg0620-master-etcd-1            Ready,SchedulingDisabled   1d
upg0620-master-etcd-2            Ready,SchedulingDisabled   1d
upg0620-master-etcd-3            Ready,SchedulingDisabled   1d
upg0620-node-primary-1           Ready                      1d
upg0620-node-primary-2           Ready                      1d
upg0620-node-primary-3           Ready                      1d
upg0620-node-primary-4           Ready                      1d
upg0620-node-primary-5           Ready                      1d
upg0620-node-registry-router-1   Ready                      1d
upg0620-node-registry-router-2   Ready                      1d

# oc get po -n logging -o wide
NAME                              READY     STATUS              RESTARTS   AGE       IP           NODE
logging-curator-1-t0lgf           1/1       Running             0          26m   upg0620-node-primary-4
logging-curator-ops-1-c70lk       1/1       Running             0          26m   upg0620-node-primary-5
logging-es-euacbkmo-1-7jfbl       1/1       Running             0          26m   upg0620-node-primary-1
logging-es-ops-ouquca0p-1-vpqq5   1/1       Running             0          26m    upg0620-node-primary-3
logging-fluentd-0j05t             0/1       MatchNodeSelector   0          27m       <none>       upg0620-master-etcd-1
logging-fluentd-1l12k             1/1       Running             0          27m   upg0620-node-primary-1
logging-fluentd-1vjrp             0/1       MatchNodeSelector   0          26m       <none>       upg0620-node-registry-router-2
logging-fluentd-28vk4             0/1       MatchNodeSelector   0          26m       <none>       upg0620-node-registry-router-1
logging-fluentd-3vn58             0/1       MatchNodeSelector   0          27m       <none>       upg0620-master-etcd-3
logging-fluentd-dmz9b             0/1       MatchNodeSelector   0          27m       <none>       upg0620-master-etcd-2
logging-fluentd-nfz79             1/1       Running             0          26m   upg0620-node-primary-4
logging-fluentd-qx3k5             1/1       Running             0          27m   upg0620-node-primary-2
logging-fluentd-scslf             1/1       Running             0          26m   upg0620-node-primary-5
logging-fluentd-tlx7v             1/1       Running             0          27m    upg0620-node-primary-3
logging-kibana-1-45t4b            2/2       Running             0          26m   upg0620-node-primary-4
logging-kibana-ops-1-c0djx        2/2       Running             0          26m   upg0620-node-primary-1

Version-Release number of selected component (if applicable):
# oc version
oc v3.5.5.27
kubernetes v1.5.2+43a9be4
features: Basic-Auth GSSAPI Kerberos SPNEGO

Images from ops mirror
# docker images | grep logging
logging-kibana          3.5.0               e0974f3393e2        10 hours ago        343.1 MB
logging-fluentd         3.5.0               63a1d8086c64        10 hours ago        232.8 MB
logging-curator         3.5.0               c14f234e4210        10 hours ago        211.3 MB
logging-auth-proxy      3.5.0               90d8b97402af        10 hours ago        215.3 MB
logging-elasticsearch   3.5.0               14766cbe8b39        10 hours ago        399.5 MB

How reproducible:

Steps to Reproduce:
1. Deploy logging on a GCE cluster with 10 nodes

Actual results:
5 fluentd pods are stuck in MatchNodeSelector status and cannot start.

Expected results:
All fluentd pods should be in Running status.

Additional info:
The ansible inventory file and fluentd log are attached.
Comment 1 Junqi Zhao 2017-06-21 22:10 EDT
Created attachment 1290420 [details]
ansible inventory file
Comment 2 Weihua Meng 2017-06-21 22:57:46 EDT
The system works as expected from the scheduler's point of view.
Why are 10 fluentd pods needed?
Comment 3 Weihua Meng 2017-06-21 23:09:14 EDT
Those 5 pods are not running due to MatchNodeSelector, and the reason is correct:
those 5 nodes do not match all of the required labels:

    logging-infra-fluentd: "true"
    region: primary
    role: node
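The MatchNodeSelector predicate named above is essentially a subset check: every key/value pair in the pod's nodeSelector must be present in the node's labels, so fluentd can only land on nodes carrying all three labels. A minimal sketch of that check (illustrative only, not the actual Kubernetes scheduler code; the master node's labels below are assumptions for the example):

```python
def matches_node_selector(node_labels, node_selector):
    """MatchNodeSelector passes only if every selector pair appears in the node's labels."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())

# The fluentd daemonset's nodeSelector from this bug:
fluentd_selector = {"logging-infra-fluentd": "true", "region": "primary", "role": "node"}

# A primary node carrying all three labels schedules fluentd:
primary_node = {"logging-infra-fluentd": "true", "region": "primary", "role": "node"}

# A hypothetical master/etcd node missing region=primary does not:
master_node = {"logging-infra-fluentd": "true", "role": "node"}

print(matches_node_selector(primary_node, fluentd_selector))  # True
print(matches_node_selector(master_node, fluentd_selector))   # False
```

This is why only the 5 primary nodes run fluentd: the master and registry/router nodes lack at least one of the required labels, so the predicate fails there.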
Comment 4 Junqi Zhao 2017-06-28 04:46:10 EDT
The configuration was wrong; closing as WORKSFORME.
