Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1595573

Summary: New 3.9 install asb-1-deploy and asb-etcd-1-deploy stay in error status with update acceptor rejected
Product: OpenShift Container Platform
Component: openshift-controller-manager
Version: 3.9.0
Reporter: David Caldwell <dcaldwel>
Assignee: Michal Fojtik <mfojtik>
QA Contact: Wang Haoran <haowang>
CC: aos-bugs, cstark, dapark, dcaldwel, maszulik, mfojtik, syangsao
Status: CLOSED NOTABUG
Severity: medium
Priority: unspecified
Target Release: 3.9.z
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2019-06-24 14:32:13 UTC
Attachments:

* oc describe asb-1-deploy
* oc describe asb-etcd
* oc get dc,rc,pod -o yaml
* complete journald since install filtered for unit openshift-controllers
* journal log for the time period during which I changed to loglevel=4 only - filtered for unit openshift-controllers

Description David Caldwell 2018-06-27 07:46:17 UTC
Description of problem:

Environment is quicklab.

After a fresh installation of OCP 3.9, asb-1-deploy and asb-etcd-1-deploy stay in Error state.

Version-Release number of selected component (if applicable):
OCP 3.9.30 


How reproducible:

I needed a fresh installation of the latest OCP 3.9 for testing, so I cleaned and then redeployed the cluster.

Cleaned the cluster using:

$ ansible-playbook -i /home/quicklab/hosts /usr/share/ansible/openshift-ansible/playbooks/adhoc/uninstall.yml

Installed the cluster using:

$ ansible-playbook -i /home/quicklab/hosts /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml


Actual results:

$ oc get pods
NAME                READY     STATUS    RESTARTS   AGE
asb-1-deploy        0/1       Error     0          16h
asb-etcd-1-deploy   0/1       Error     0          16h


Expected results:

Zero errors; the asb and asb-etcd deployments complete successfully.


Additional info:

I still have this environment running if any troubleshooting is needed.

oc logs output:

$ oc logs asb-1-deploy 
--> Scaling asb-1 to 1
error: update acceptor rejected asb-1: pods for rc 'openshift-ansible-service-broker/asb-1' took longer than 600 seconds to become available

$ oc logs asb-etcd-1-deploy 
--> Scaling asb-etcd-1 to 1
error: update acceptor rejected asb-etcd-1: pods for rc 'openshift-ansible-service-broker/asb-etcd-1' took longer than 600 seconds to become available
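
(For reference: the deployer reports this error when the new pods never become ready within the deployment timeout. A minimal sketch of how the underlying failure might be inspected, assuming the deployer applied its usual deployment=asb-1 label to the pods it created:)

~~~
# recent events in the broker namespace often show the root cause
# (image pull failures, unbound PVCs, scheduling problems, ...)
$ oc get events -n openshift-ansible-service-broker --sort-by=.lastTimestamp

# status and events of the replication controller the deployer scaled up
$ oc describe rc/asb-1 -n openshift-ansible-service-broker

# the pods the deployer was waiting on (label is an assumption)
$ oc describe pod -l deployment=asb-1 -n openshift-ansible-service-broker
~~~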

Comment 1 Michal Fojtik 2018-06-27 08:44:00 UTC
Can we get the following:

* oc describe rs/asb-1
* oc describe pod/asb-1-deploy
* oc describe pod/asb-etcd-1-deploy
* controller logs (ideally with loglevel>3)
* oc get dc,rc,pod -o yaml
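
(A minimal sketch of how these might be collected, assuming the controllers run under the systemd unit named in the attachments, openshift-controllers; the unit name varies by installation type:)

~~~
# controller logs from journald (unit name is an assumption taken
# from the attachment titles)
$ journalctl -u openshift-controllers --since "2 hours ago" > controllers.log

# the requested resource dumps from the broker namespace
$ oc describe pod/asb-1-deploy -n openshift-ansible-service-broker
$ oc describe pod/asb-etcd-1-deploy -n openshift-ansible-service-broker
$ oc get dc,rc,pod -o yaml -n openshift-ansible-service-broker > asb-resources.yaml
~~~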

Comment 2 David Caldwell 2018-06-27 10:31:23 UTC
Created attachment 1454994 [details]
oc describe asb-1-deploy

Comment 3 David Caldwell 2018-06-27 10:31:52 UTC
Created attachment 1454995 [details]
oc describe asb-etcd

Comment 4 David Caldwell 2018-06-27 10:32:59 UTC
Created attachment 1454996 [details]
oc get dc,rc,pod -o yaml

Comment 5 David Caldwell 2018-06-27 10:36:11 UTC
$ oc describe rs/asb-1
Error from server (NotFound): replicasets.extensions "asb-1" not found

$ oc get rs --all-namespaces
NAMESPACE               NAME                    DESIRED   CURRENT   READY     AGE
openshift-web-console   webconsole-746dbc7568   3         3         3         19h
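
(DeploymentConfigs are backed by replication controllers rather than replica sets, which explains the NotFound above; the equivalent check would be:)

~~~
$ oc get rc -n openshift-ansible-service-broker
$ oc describe rc/asb-1 -n openshift-ansible-service-broker
~~~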

Comment 6 David Caldwell 2018-06-27 10:41:48 UTC
Controller logs to follow. Once the loglevel is set, is there an action/command you would like me to collect logs for?

Comment 7 Michal Fojtik 2018-06-27 12:51:10 UTC
Also:

oc logs asb-1-deploy 

That should tell us what the deployer pod was doing during the 10 minutes before it timed out.

Comment 8 David Caldwell 2018-06-27 15:15:33 UTC
$ oc logs asb-1-deploy 
--> Scaling asb-1 to 1
error: update acceptor rejected asb-1: pods for rc 'openshift-ansible-service-broker/asb-1' took longer than 600 seconds to become available
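
(The 600 seconds in this message is the deployment strategy's timeoutSeconds, which defaults to 600. A hedged sketch of raising it, assuming the DC uses the Rolling strategy; this only gives the pods more time and does not address the underlying cause:)

~~~
$ oc patch dc/asb -n openshift-ansible-service-broker \
    -p '{"spec":{"strategy":{"rollingParams":{"timeoutSeconds":1200}}}}'
~~~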

Comment 9 David Caldwell 2018-06-27 15:30:51 UTC
Created attachment 1455069 [details]
complete journald since install filtered for unit openshift-controllers

Comment 10 David Caldwell 2018-06-27 15:32:02 UTC
Created attachment 1455070 [details]
journal log for the time period during which I changed to loglevel=4 only - filtered for unit openshift-controllers

Comment 11 David Caldwell 2018-07-19 12:18:55 UTC
Same issue in 3.9.33. After a new install (on upshift quicklab), the asb deployer pods are stuck in Error:

[quicklab@master-0 ~]$ oc logs asb-1-deploy -n openshift-ansible-service-broker 
--> Scaling asb-1 to 1
error: update acceptor rejected asb-1: pods for rc 'openshift-ansible-service-broker/asb-1' took longer than 600 seconds to become available

[quicklab@master-0 ~]$ oc version
oc v3.9.33
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

openshift v3.9.33
kubernetes v1.9.1+a0ce1bc657

[quicklab@master-0 ~]$ oc get pods --all-namespaces
NAMESPACE                           NAME                          READY     STATUS    RESTARTS   AGE
default                             docker-registry-1-dn2zt       1/1       Running   1          2h
default                             registry-console-1-szqjp      1/1       Running   1          2h
default                             router-2-t9hm8                1/1       Running   0          1h
kube-service-catalog                apiserver-g4r9k               1/1       Running   0          1h
kube-service-catalog                apiserver-g5hxw               1/1       Running   0          1h
kube-service-catalog                apiserver-q62jm               1/1       Running   0          1h
kube-service-catalog                controller-manager-6gvfb      1/1       Running   1          2h
kube-service-catalog                controller-manager-rhn2x      1/1       Running   2          2h
kube-service-catalog                controller-manager-t2pf2      1/1       Running   6          2h
openshift-ansible-service-broker    asb-1-deploy                  0/1       Error     0          2h
openshift-ansible-service-broker    asb-etcd-1-deploy             0/1       Error     0          2h
openshift-template-service-broker   apiserver-bkqg5               1/1       Running   1          2h
openshift-web-console               webconsole-68b848cb77-72vmj   1/1       Running   1          2h
openshift-web-console               webconsole-68b848cb77-89ctr   1/1       Running   1          2h
openshift-web-console               webconsole-68b848cb77-9lggd   1/1       Running   1          2h

Comment 12 Daein Park 2018-10-30 00:19:15 UTC
I also hit this issue, and I was able to resolve it by using "oc rollout latest" as a workaround.

My verification steps are as follows.

~~~
# oc version
oc v3.9.43
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

openshift v3.9.43
kubernetes v1.9.1+a0ce1bc657

# oc get pod
NAME                READY     STATUS    RESTARTS   AGE
asb-1-deploy        0/1       Error     0          12d
asb-etcd-1-deploy   0/1       Error     0          12d

# oc get pvc
NAME      STATUS    VOLUME        CAPACITY   ACCESS MODES   STORAGECLASS   AGE
etcd      Bound     etcd-pv       1Gi        RWO                           12d

# oc rollout latest dc/asb-etcd
deploymentconfig "asb-etcd" rolled out

# oc get pod
NAME                READY     STATUS              RESTARTS   AGE
asb-1-deploy        0/1       Error               0          12d
asb-etcd-2-deploy   0/1       ContainerCreating   0          2s

# oc get pod
NAME               READY     STATUS    RESTARTS   AGE
asb-1-deploy       0/1       Error     0          12d
asb-etcd-2-t6wwj   1/1       Running   0          2m
~~~
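
(Presumably the same workaround applies to the other stuck deployer once asb-etcd is healthy, e.g. "oc rollout latest dc/asb".)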

Comment 13 Christian Stark 2019-02-15 10:04:42 UTC
The solution should be to check the PVC in the project (named etcd). I had the same behaviour and then found that the PVC was stuck in Pending.
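
(A minimal sketch of that check, using the PVC name and namespace seen earlier in this report:)

~~~
$ oc get pvc etcd -n openshift-ansible-service-broker
# the Events section explains why a claim is stuck in Pending
$ oc describe pvc/etcd -n openshift-ansible-service-broker
~~~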

Comment 14 Maciej Szulik 2019-06-24 14:32:13 UTC
Looks like the previous comment solved the issue, and the original reporter found a working solution as well. I'm going to close this; feel free to re-open if it's still a problem.