Bug 1469037 - Sometimes daemonset DESIRED=0 even though there is a matching node
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.6.0
Priority: high
Severity: medium
Assigned To: ewolinet
Reporter: DeShuai Ma
Reported: 2017-07-10 06:19 EDT by DeShuai Ma
Modified: 2017-08-16 15 EDT
CC: 7 users

Last Closed: 2017-08-10 01:31:01 EDT
Type: Bug

Attachments
atomic-openshift-master.log (11.02 MB, text/x-vhdl), 2017-07-10 06:19 EDT, DeShuai Ma
nod1.log (5.42 MB, text/x-vhdl), 2017-07-10 06:20 EDT, DeShuai Ma
node2.log (10.90 MB, text/x-vhdl), 2017-07-10 06:22 EDT, DeShuai Ma
ds&node info (21.28 KB, text/plain), 2017-07-11 03:04 EDT, DeShuai Ma

Description DeShuai Ma 2017-07-10 06:19:41 EDT
Created attachment 1295764 [details]
atomic-openshift-master.log

Description of problem:
When installing the service catalog with openshift-ansible, the service-catalog pods failed to run. After ssh'ing in to debug, I found that the daemonsets' DESIRED count was 0 even though a matching node existed. Restarting the master service fixed it.

Version-Release number of selected component (if applicable):
openshift v3.6.136
kubernetes v1.6.1+5115d708d7
etcd 3.2.1


How reproducible:
Sometimes

Steps to Reproduce:
1.[root@ip-172-18-0-4 ~]# oc get ds -n kube-service-catalog
NAME                 DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE-SELECTOR               AGE
apiserver            0         0         0         0            0           openshift-infra=apiserver   27m
controller-manager   0         0         0         0            0           openshift-infra=apiserver   27m
[root@ip-172-18-0-4 ~]# oc get no --show-labels
NAME                            STATUS                     AGE       VERSION             LABELS
ip-172-18-0-4.ec2.internal      Ready,SchedulingDisabled   43m       v1.6.1+5115d708d7   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m3.medium,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-1,failure-domain.beta.kubernetes.io/zone=us-east-1d,kubernetes.io/hostname=ip-172-18-0-4.ec2.internal,openshift-infra=apiserver,role=node
ip-172-18-11-233.ec2.internal   Ready                      43m       v1.6.1+5115d708d7   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m3.medium,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-1,failure-domain.beta.kubernetes.io/zone=us-east-1d,kubernetes.io/hostname=ip-172-18-11-233.ec2.internal,registry=enabled,role=node,router=enabled

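For reference, a DaemonSet's node selector matches a node when every selector key/value pair appears among the node's labels. A minimal sketch of that check (illustrative Python, not OpenShift source code), using the selector and abbreviated labels from the output above:

```python
# Hypothetical sketch of DaemonSet node matching: the daemonset's
# nodeSelector must be a subset of the node's labels.
def selector_matches(node_selector, node_labels):
    """Return True when every selector key/value pair appears in the node labels."""
    return all(node_labels.get(k) == v for k, v in node_selector.items())

# Abbreviated labels of the master node from step 1, plus the selector
# used by the service-catalog daemonsets.
node_labels = {
    "kubernetes.io/hostname": "ip-172-18-0-4.ec2.internal",
    "openshift-infra": "apiserver",
    "role": "node",
}
ds_selector = {"openshift-infra": "apiserver"}

print(selector_matches(ds_selector, node_labels))  # True: DESIRED should be 1, not 0
```

Since `ip-172-18-0-4.ec2.internal` carries `openshift-infra=apiserver`, the match succeeds, so a DESIRED count of 0 is unexpected.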

Actual results:


Expected results:
The daemonsets' DESIRED count should equal the number of nodes whose labels satisfy the daemonset node selector.


Additional info:
https://openshift-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/Launch%20Environment%20Flexy/17715/console
Comment 1 DeShuai Ma 2017-07-10 06:20 EDT
Created attachment 1295776 [details]
nod1.log
Comment 2 DeShuai Ma 2017-07-10 06:22 EDT
Created attachment 1295777 [details]
node2.log
Comment 3 Paul Morie 2017-07-10 11:49:26 EDT
Eric Wolinetz is attempting to reproduce this now.  He says the node labels in the original comment look correct.

Could we get the logs from the controller manager?  I reviewed the node logs and they looked uneventful.
Comment 4 Paul Morie 2017-07-10 11:50:12 EDT
Could we also get a yaml dump of the daemon sets that were created?
Comment 5 DeShuai Ma 2017-07-10 21:59:48 EDT
The controller-manager log is attached in the file atomic-openshift-master.log.

daemonset.yaml: http://pastebin.test.redhat.com/501739 (note: the daemonset at this link is working well because I restarted the master)
Comment 6 DeShuai Ma 2017-07-11 03:04 EDT
Created attachment 1296100 [details]
ds&node info

Reproduced again. Attaching some info about the ds and nodes.
Comment 7 Paul Morie 2017-07-13 14:38:32 EDT
We debugged a customer issue similar to this one yesterday.  Can we establish:

1.  Are pods being created at all for the daemon set?  If so, can we get yamls and describe output for them?
2.  Is there a node selector associated with the namespace? Can we get a yaml for the namespace?

In the issue we debugged, the default node selectors for the project (and later the cluster) were resulting in pods being created but not being scheduled on certain nodes, due to conflicts between the pod's node selector and the node labels, conflicts introduced by the project node selector.
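The failure mode described above can be sketched as follows: the project's default node selector is merged into the pod's own nodeSelector, and the combined selector may match no node at all. The selector and label values below are illustrative, not taken from this cluster:

```python
# Hedged sketch of the conflict: a project default node selector is
# merged with the pod's nodeSelector; the merged result can fail to
# match any node even though the pod's own selector would match.
def merge_selectors(project_sel, pod_sel):
    """Combine project default and pod selectors (pod keys win on overlap)."""
    merged = dict(project_sel)
    merged.update(pod_sel)
    return merged

def matches(selector, labels):
    return all(labels.get(k) == v for k, v in selector.items())

project_sel = {"region": "primary"}          # assumed project default selector
pod_sel = {"openshift-infra": "apiserver"}   # from the daemonset pod template
node_labels = {"openshift-infra": "apiserver", "role": "node"}  # no "region" label

merged = merge_selectors(project_sel, pod_sel)
print(matches(merged, node_labels))  # False: the merged selector blocks scheduling
```

The pod's own selector matches the node, but the merged selector requires a `region` label the node does not have, so the pod never schedules there.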
Comment 8 DeShuai Ma 2017-07-14 12:39:43 EDT
When it happens again, I'll check what you suggested. To be honest, it's really hard to reproduce.
Comment 9 DeShuai Ma 2017-07-14 12:44:07 EDT
This daemonset wasn't created by me manually; it is created by openshift-ansible when the service catalog is enabled. The daemonsets are the service-catalog apiserver and controller-manager in the kube-service-catalog project.
Comment 10 Paul Morie 2017-07-18 13:47:02 EDT
I spoke to Eric and he is not currently using a node selector on the namespace the installer creates for the catalog components.  He is going to add one in this PR: https://github.com/openshift/openshift-ansible/pull/4781

That should address this issue; I don't think we have reason to believe that something else is happening. I am going to reassign this bug to Eric, and he can move it to ON_QA once that PR is merged.
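For context, giving a namespace a node selector in OpenShift is done with the `openshift.io/node-selector` annotation. A sketch of what such a namespace definition looks like (the exact selector value used by the PR is not reproduced here; the value below is illustrative and matches the daemonsets' selector):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: kube-service-catalog
  annotations:
    # Illustrative value; matches the openshift-infra=apiserver selector
    # used by the service-catalog daemonsets in this bug.
    openshift.io/node-selector: "openshift-infra=apiserver"
```

With this annotation in place, the project-level selector and the daemonsets' pod selector agree, so the conflict described in comment 7 cannot arise.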
Comment 12 DeShuai Ma 2017-07-24 02:44:05 EDT
Verified on openshift-ansible-3.6.162-1.git.0.50e29bd.el7.noarch.rpm.

The error can no longer be reproduced.
Comment 14 errata-xmlrpc 2017-08-10 01:31:01 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716
