Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1595573

Summary: New 3.9 install asb-1-deploy and asb-etcd-1-deploy stay in error status with update acceptor rejected
Product: OpenShift Container Platform
Component: openshift-controller-manager
Version: 3.9.0
Reporter: David Caldwell <dcaldwel>
Assignee: Michal Fojtik <mfojtik>
QA Contact: Wang Haoran <haowang>
CC: aos-bugs, cstark, dapark, dcaldwel, maszulik, mfojtik, syangsao
Status: CLOSED NOTABUG
Severity: medium
Priority: unspecified
Target Release: 3.9.z
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2019-06-24 14:32:13 UTC
Attachments:

* oc describe asb-1-deploy
* oc describe asb-etcd
* oc get dc,rc,pod -o yaml
* complete journald since install filtered for unit openshift-controllers
* journal log for the time period during which I changed to loglevel=4 only - filtered for unit openshift-controllers

Description David Caldwell 2018-06-27 07:46:17 UTC
Description of problem:

Environment is quicklab.

After a fresh installation of OCP 3.9, asb-1-deploy and asb-etcd-1-deploy stay in Error state.

Version-Release number of selected component (if applicable):
OCP 3.9.30 


How reproducible:

I needed a fresh installation of the latest OCP 3.9 for testing, so I cleaned and then redeployed the cluster.

Cleaned the cluster using:

$ ansible-playbook -i /home/quicklab/hosts /usr/share/ansible/openshift-ansible/playbooks/adhoc/uninstall.yml

Installed the cluster using:

$ ansible-playbook -i /home/quicklab/hosts /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml


Actual results:

$ oc get pods
NAME                READY     STATUS    RESTARTS   AGE
asb-1-deploy        0/1       Error     0          16h
asb-etcd-1-deploy   0/1       Error     0          16h


Expected results:

Zero errors; the asb and asb-etcd deployments complete successfully.


Additional info:

I still have this environment running if any troubleshooting is needed.

oc logs output:

$ oc logs asb-1-deploy 
--> Scaling asb-1 to 1
error: update acceptor rejected asb-1: pods for rc 'openshift-ansible-service-broker/asb-1' took longer than 600 seconds to become available

$ oc logs asb-etcd-1-deploy 
--> Scaling asb-etcd-1 to 1
error: update acceptor rejected asb-etcd-1: pods for rc 'openshift-ansible-service-broker/asb-etcd-1' took longer than 600 seconds to become available
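
(For reference: the deployer reports this error when the new pods never become ready within the deployment timeout. A minimal sketch of how the underlying failure might be inspected, assuming the deployer applied its usual deployment=asb-1 label to the pods it created:)

~~~
# recent events in the broker namespace often show the root cause
# (image pull failures, unbound PVCs, scheduling problems, ...)
$ oc get events -n openshift-ansible-service-broker --sort-by=.lastTimestamp

# status and events of the replication controller the deployer scaled up
$ oc describe rc/asb-1 -n openshift-ansible-service-broker

# the pods the deployer was waiting on (label is an assumption)
$ oc describe pod -l deployment=asb-1 -n openshift-ansible-service-broker
~~~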

Comment 1 Michal Fojtik 2018-06-27 08:44:00 UTC
Can we get the following:

* oc describe rs/asb-1
* oc describe pod/asb-1-deploy
* oc describe pod/asb-etcd-1-deploy
* controller logs (ideally with loglevel>3)
* oc get dc,rc,pod -o yaml
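
(A minimal sketch of how these might be collected, assuming the controllers run under the systemd unit named in the attachments, openshift-controllers; the unit name varies by installation type:)

~~~
# controller logs from journald (unit name is an assumption taken
# from the attachment titles)
$ journalctl -u openshift-controllers --since "2 hours ago" > controllers.log

# the requested resource dumps from the broker namespace
$ oc describe pod/asb-1-deploy -n openshift-ansible-service-broker
$ oc describe pod/asb-etcd-1-deploy -n openshift-ansible-service-broker
$ oc get dc,rc,pod -o yaml -n openshift-ansible-service-broker > asb-resources.yaml
~~~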

Comment 2 David Caldwell 2018-06-27 10:31:23 UTC
Created attachment 1454994 [details]
oc describe asb-1-deploy

Comment 3 David Caldwell 2018-06-27 10:31:52 UTC
Created attachment 1454995 [details]
oc describe asb-etcd

Comment 4 David Caldwell 2018-06-27 10:32:59 UTC
Created attachment 1454996 [details]
oc get dc,rc,pod -o yaml

Comment 5 David Caldwell 2018-06-27 10:36:11 UTC
$ oc describe rs/asb-1
Error from server (NotFound): replicasets.extensions "asb-1" not found

$ oc get rs --all-namespaces
NAMESPACE               NAME                    DESIRED   CURRENT   READY     AGE
openshift-web-console   webconsole-746dbc7568   3         3         3         19h
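
(DeploymentConfigs are backed by replication controllers rather than replica sets, which explains the NotFound above; the equivalent check would be:)

~~~
$ oc get rc -n openshift-ansible-service-broker
$ oc describe rc/asb-1 -n openshift-ansible-service-broker
~~~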

Comment 6 David Caldwell 2018-06-27 10:41:48 UTC
Controller logs to follow. Once the loglevel is set, is there an action/command you would like me to collect logs for?

Comment 7 Michal Fojtik 2018-06-27 12:51:10 UTC
Also:

oc logs asb-1-deploy 

That should tell us what the deployer pod was doing during the 10 minutes before it timed out.

Comment 8 David Caldwell 2018-06-27 15:15:33 UTC
$ oc logs asb-1-deploy 
--> Scaling asb-1 to 1
error: update acceptor rejected asb-1: pods for rc 'openshift-ansible-service-broker/asb-1' took longer than 600 seconds to become available
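
(The 600 seconds in this message is the deployment strategy's timeoutSeconds, which defaults to 600. A hedged sketch of raising it, assuming the DC uses the Rolling strategy; this only gives the pods more time and does not address the underlying cause:)

~~~
$ oc patch dc/asb -n openshift-ansible-service-broker \
    -p '{"spec":{"strategy":{"rollingParams":{"timeoutSeconds":1200}}}}'
~~~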

Comment 9 David Caldwell 2018-06-27 15:30:51 UTC
Created attachment 1455069 [details]
complete journald since install filtered for unit openshift-controllers

Comment 10 David Caldwell 2018-06-27 15:32:02 UTC
Created attachment 1455070 [details]
journal log for the time period during which I changed to loglevel=4 only - filtered for unit openshift-controllers

Comment 11 David Caldwell 2018-07-19 12:18:55 UTC
Same issue in 3.9.33. After a new install (on upshift quicklab), the asb deployer pods are stuck in Error:

[quicklab@master-0 ~]$ oc logs asb-1-deploy -n openshift-ansible-service-broker 
--> Scaling asb-1 to 1
error: update acceptor rejected asb-1: pods for rc 'openshift-ansible-service-broker/asb-1' took longer than 600 seconds to become available

[quicklab@master-0 ~]$ oc version
oc v3.9.33
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

openshift v3.9.33
kubernetes v1.9.1+a0ce1bc657

[quicklab@master-0 ~]$ oc get pods --all-namespaces
NAMESPACE                           NAME                          READY     STATUS    RESTARTS   AGE
default                             docker-registry-1-dn2zt       1/1       Running   1          2h
default                             registry-console-1-szqjp      1/1       Running   1          2h
default                             router-2-t9hm8                1/1       Running   0          1h
kube-service-catalog                apiserver-g4r9k               1/1       Running   0          1h
kube-service-catalog                apiserver-g5hxw               1/1       Running   0          1h
kube-service-catalog                apiserver-q62jm               1/1       Running   0          1h
kube-service-catalog                controller-manager-6gvfb      1/1       Running   1          2h
kube-service-catalog                controller-manager-rhn2x      1/1       Running   2          2h
kube-service-catalog                controller-manager-t2pf2      1/1       Running   6          2h
openshift-ansible-service-broker    asb-1-deploy                  0/1       Error     0          2h
openshift-ansible-service-broker    asb-etcd-1-deploy             0/1       Error     0          2h
openshift-template-service-broker   apiserver-bkqg5               1/1       Running   1          2h
openshift-web-console               webconsole-68b848cb77-72vmj   1/1       Running   1          2h
openshift-web-console               webconsole-68b848cb77-89ctr   1/1       Running   1          2h
openshift-web-console               webconsole-68b848cb77-9lggd   1/1       Running   1          2h

Comment 12 Daein Park 2018-10-30 00:19:15 UTC
I also hit this issue, and I was able to resolve it by using "oc rollout latest" as a workaround.

My verification steps are as follows.

~~~
# oc version
oc v3.9.43
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

openshift v3.9.43
kubernetes v1.9.1+a0ce1bc657

# oc get pod
NAME                READY     STATUS    RESTARTS   AGE
asb-1-deploy        0/1       Error     0          12d
asb-etcd-1-deploy   0/1       Error     0          12d

# oc get pvc
NAME      STATUS    VOLUME        CAPACITY   ACCESS MODES   STORAGECLASS   AGE
etcd      Bound     etcd-pv       1Gi        RWO                           12d

# oc rollout latest dc/asb-etcd
deploymentconfig "asb-etcd" rolled out

# oc get pod
NAME                READY     STATUS              RESTARTS   AGE
asb-1-deploy        0/1       Error               0          12d
asb-etcd-2-deploy   0/1       ContainerCreating   0          2s

# oc get pod
NAME               READY     STATUS    RESTARTS   AGE
asb-1-deploy       0/1       Error     0          12d
asb-etcd-2-t6wwj   1/1       Running   0          2m
~~~
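
(Presumably the same workaround applies to the other stuck deployer once asb-etcd is healthy, e.g. "oc rollout latest dc/asb".)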

Comment 13 Christian Stark 2019-02-15 10:04:42 UTC
The solution should be to check the PVC in the project (named etcd). I had the same behaviour and then found that the PVC was stuck in Pending.
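
(A minimal sketch of that check, using the PVC name and namespace seen earlier in this report:)

~~~
$ oc get pvc etcd -n openshift-ansible-service-broker
# the Events section explains why a claim is stuck in Pending
$ oc describe pvc/etcd -n openshift-ansible-service-broker
~~~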

Comment 14 Maciej Szulik 2019-06-24 14:32:13 UTC
Looks like the previous comment solved the issue, and the original reporter found a working solution as well. I'm going to close this; feel free to re-open if it's still a problem.