1463881 – Some fluentd pods are in MatchNodeSelector status and can not be started up

Keywords:
Status:	CLOSED WORKSFORME
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Logging
Sub Component:
Version:	3.5.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Jeff Cantrill
QA Contact:	Xia Zhao
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-06-22 02:08 UTC by Junqi Zhao
Modified:	2017-06-28 08:46 UTC
CC List:	2 users
Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-06-28 08:46:10 UTC
Target Upstream Version:
Embargoed:

Attachments
fluentd log (14.93 KB, text/plain) 2017-06-22 02:08 UTC, Junqi Zhao
ansible inventory file (586 bytes, text/plain) 2017-06-22 02:10 UTC, Junqi Zhao

Description Junqi Zhao 2017-06-22 02:08:08 UTC

Created attachment 1290419
fluentd log

Description of problem:
Deployed logging on a GCE cluster that has 10 nodes. 5 fluentd pods are in MatchNodeSelector status and cannot start, and the fluentd pod log contains "Pod Predicate MatchNodeSelector failed".

# oc get node
NAME                             STATUS                     AGE
upg0620-master-etcd-1            Ready,SchedulingDisabled   1d
upg0620-master-etcd-2            Ready,SchedulingDisabled   1d
upg0620-master-etcd-3            Ready,SchedulingDisabled   1d
upg0620-node-primary-1           Ready                      1d
upg0620-node-primary-2           Ready                      1d
upg0620-node-primary-3           Ready                      1d
upg0620-node-primary-4           Ready                      1d
upg0620-node-primary-5           Ready                      1d
upg0620-node-registry-router-1   Ready                      1d
upg0620-node-registry-router-2   Ready                      1d

# oc get po -n logging -o wide
NAME                              READY     STATUS              RESTARTS   AGE       IP           NODE
logging-curator-1-t0lgf           1/1       Running             0          26m       10.2.18.40   upg0620-node-primary-4
logging-curator-ops-1-c70lk       1/1       Running             0          26m       10.2.16.29   upg0620-node-primary-5
logging-es-euacbkmo-1-7jfbl       1/1       Running             0          26m       10.2.10.54   upg0620-node-primary-1
logging-es-ops-ouquca0p-1-vpqq5   1/1       Running             0          26m       10.2.8.31    upg0620-node-primary-3
logging-fluentd-0j05t             0/1       MatchNodeSelector   0          27m       <none>       upg0620-master-etcd-1
logging-fluentd-1l12k             1/1       Running             0          27m       10.2.10.53   upg0620-node-primary-1
logging-fluentd-1vjrp             0/1       MatchNodeSelector   0          26m       <none>       upg0620-node-registry-router-2
logging-fluentd-28vk4             0/1       MatchNodeSelector   0          26m       <none>       upg0620-node-registry-router-1
logging-fluentd-3vn58             0/1       MatchNodeSelector   0          27m       <none>       upg0620-master-etcd-3
logging-fluentd-dmz9b             0/1       MatchNodeSelector   0          27m       <none>       upg0620-master-etcd-2
logging-fluentd-nfz79             1/1       Running             0          26m       10.2.18.38   upg0620-node-primary-4
logging-fluentd-qx3k5             1/1       Running             0          27m       10.2.12.38   upg0620-node-primary-2
logging-fluentd-scslf             1/1       Running             0          26m       10.2.16.28   upg0620-node-primary-5
logging-fluentd-tlx7v             1/1       Running             0          27m       10.2.8.30    upg0620-node-primary-3
logging-kibana-1-45t4b            2/2       Running             0          26m       10.2.18.39   upg0620-node-primary-4
logging-kibana-ops-1-c0djx        2/2       Running             0          26m       10.2.10.55   upg0620-node-primary-1

Version-Release number of selected component (if applicable):
# oc version
oc v3.5.5.27
kubernetes v1.5.2+43a9be4
features: Basic-Auth GSSAPI Kerberos SPNEGO

Images from ops mirror
# docker images | grep logging
logging-kibana          3.5.0               e0974f3393e2        10 hours ago        343.1 MB
logging-fluentd         3.5.0               63a1d8086c64        10 hours ago        232.8 MB
logging-curator         3.5.0               c14f234e4210        10 hours ago        211.3 MB
logging-auth-proxy      3.5.0               90d8b97402af        10 hours ago        215.3 MB
logging-elasticsearch   3.5.0               14766cbe8b39        10 hours ago        399.5 MB



How reproducible:
Always

Steps to Reproduce:
1. Deploy logging on a GCE cluster that has 10 nodes.

Actual results:
5 fluentd pods are in MatchNodeSelector status and cannot start.

Expected results:
All pods should be in Running status.

Additional info:
The inventory file and fluentd log are attached.

Comment 1 Junqi Zhao 2017-06-22 02:10:50 UTC

Created attachment 1290420
ansible inventory file

Comment 2 Weihua Meng 2017-06-22 02:57:46 UTC

The system works as expected from the scheduler's point of view.
Why are 10 fluentd pods needed?

Comment 3 Weihua Meng 2017-06-22 03:09:14 UTC

Those 5 pods are not running because of MatchNodeSelector, and that reason is correct.
The 5 nodes they were assigned to do not match all of the requirements in the pod's nodeSelector:

  nodeSelector:
    logging-infra-fluentd: "true"
    region: primary
    role: node
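
The matching rule behind this can be sketched as follows. This is a minimal illustration, not the scheduler's actual code: a node satisfies a nodeSelector only when every key/value pair in the selector is present among the node's labels. The function name `matches_node_selector` and the node names and labels in `hypothetical_nodes` are assumptions made up for the example; the real labels of the nodes in this report are not shown here.

```python
def matches_node_selector(node_labels, node_selector):
    """Return True only if every selector key/value pair is a label on the node."""
    return all(node_labels.get(key) == value
               for key, value in node_selector.items())


# The nodeSelector quoted in this comment.
FLUENTD_SELECTOR = {
    "logging-infra-fluentd": "true",
    "region": "primary",
    "role": "node",
}

# Hypothetical node labels, for illustration only (not taken from this cluster).
hypothetical_nodes = {
    "example-primary-node": {
        "logging-infra-fluentd": "true",
        "region": "primary",
        "role": "node",
    },
    "example-infra-node": {
        "logging-infra-fluentd": "true",
        "region": "infra",  # differs from the selector, so the node does not match
        "role": "node",
    },
}

for name, labels in hypothetical_nodes.items():
    print(name, matches_node_selector(labels, FLUENTD_SELECTOR))
```

To compare against a real cluster, the actual node labels can be listed with `oc get nodes --show-labels`.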

Comment 4 Junqi Zhao 2017-06-28 08:46:10 UTC

The configuration was wrong; closing as WORKSFORME.
