
Bug 1613544

Summary: oc can't get information after master-restart api and controllers
Product: OpenShift Container Platform
Reporter: Weibin Liang <weliang>
Component: Master
Assignee: Michal Fojtik <mfojtik>
Status: CLOSED WONTFIX
QA Contact: Xingxing Xia <xxia>
Severity: high
Priority: low
Version: 3.11.0
CC: aos-bugs, jokerman, maszulik, mmccomas, weliang, wsun
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Last Closed: 2019-02-20 09:47:00 UTC
Type: Bug
Attachments: master-logs (flags: none)

Description Weibin Liang 2018-08-07 19:57:38 UTC

Created attachment 1474087: master-logs

Description of problem:
oc commands can't retrieve information after running 'master-restart api' and 'master-restart controllers'.

Version-Release number of selected component (if applicable):
v3.11

How reproducible:
Every time

Steps to Reproduce:
[root@ip-172-18-6-251 ~]# oc version
oc v3.11.0-0.11.0
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-6-251.ec2.internal:8443
openshift v3.11.0-0.11.0
kubernetes v1.11.0+d4cacc0
[root@ip-172-18-6-251 ~]# oc get pods
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-1-gwnbn    1/1       Running   0          5h
dockergc-nfkdg             1/1       Running   0          5h
dockergc-st7wz             1/1       Running   0          5h
registry-console-1-kbz2r   1/1       Running   0          5h
router-1-c8fzf             1/1       Running   0          5h
[root@ip-172-18-6-251 ~]# oc get clusternetwork default
NAME      CLUSTER NETWORKS   SERVICE NETWORK   PLUGIN NAME
default   10.128.0.0/14:9    172.30.0.0/16     redhat/openshift-ovs-multitenant
[root@ip-172-18-6-251 ~]# master-restart api
[root@ip-172-18-6-251 ~]# master-restart controllers
[root@ip-172-18-6-251 ~]# oc version
oc v3.11.0-0.11.0
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO
error: server took too long to respond with version information.
[root@ip-172-18-6-251 ~]# oc get pods
No resources found.
Unable to connect to the server: net/http: TLS handshake timeout
[root@ip-172-18-6-251 ~]# oc get clusternetwork default
NAME      CLUSTER NETWORKS   SERVICE NETWORK   PLUGIN NAME
default   10.128.0.0/14:9    172.30.0.0/16     redhat/openshift-ovs-multitenant
[root@ip-172-18-6-251 ~]# oc get pods
No resources found.
Unable to connect to the server: net/http: TLS handshake timeout
[root@ip-172-18-6-251 ~]# oc get all
error: the server doesn't have a resource type "deploymentconfigs"
[root@ip-172-18-6-251 ~]# 
[root@ip-172-18-6-251 ~]# oc get all
No resources found.
The connection to the server ip-172-18-6-251.ec2.internal:8443 was refused - did you specify the right host or port?
The connection to the server ip-172-18-6-251.ec2.internal:8443 was refused - did you specify the right host or port?
The connection to the server ip-172-18-6-251.ec2.internal:8443 was refused - did you specify the right host or port?
[root@ip-172-18-6-251 ~]# 

Actual results:
oc get cannot retrieve information; requests time out or the connection is refused.

Expected results:
oc get commands should work after the restart.

Additional info:
The logs from 'master-logs api api' and 'master-logs controllers controllers' are attached.
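
The transcript above shows oc failing with timeouts and refused connections right after the restart. One way to tell a slow restart apart from a permanent failure is to poll the API server until it answers before running further oc commands. The following is only a minimal sketch, not a fix for this bug: the helper name wait_for and the attempt count and delay are hypothetical, and it assumes that 'oc get --raw /healthz' is available in this client version.

```shell
# wait_for ATTEMPTS DELAY COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND up to ATTEMPTS times, sleeping DELAY
# seconds after each failed try. Returns 0 as soon as COMMAND succeeds,
# or 1 if every attempt fails. COMMAND's output is discarded.
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Possible usage on the master (assumed values, left commented out):
#   master-restart api
#   master-restart controllers
#   wait_for 60 5 oc get --raw /healthz && oc get pods
```

If wait_for returns non-zero after the chosen number of attempts, the API server never became reachable, which matches the "connection refused" state seen at the end of the transcript.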

Comment 1 Xingxing Xia 2018-08-08 02:54:48 UTC

What's your instance flavor? Per https://bugzilla.redhat.com/show_bug.cgi?id=1593635#c39 , an ec2 env should use a large instance type (e.g. m3.large).

Comment 2 Weibin Liang 2018-08-08 14:12:52 UTC

The workaround for this bug is to set vm_type: m3.large when installing the ec2 cluster.

The concern is: what about customers who do not want to pay more to upgrade their VM type to m3.large?

Comment 3 Weibin Liang 2018-08-08 15:20:55 UTC

Even with vm_type: m3.large set, the problem happens again after running master-restart api and controllers and then rebooting the master.

[root@ip-172-18-5-143 ec2-user]# oc get pods
The connection to the server ip-172-18-5-143.ec2.internal:8443 was refused - did you specify the right host or port?
[root@ip-172-18-5-143 ec2-user]# oc get all
The connection to the server ip-172-18-5-143.ec2.internal:8443 was refused - did you specify the right host or port?

Comment 4 Maciej Szulik 2018-08-09 11:14:42 UTC

What are 'master-restart api' and 'master-restart controllers', and what actions exactly do they perform? Who provided these to you?

Comment 5 Weibin Liang 2018-08-09 13:02:41 UTC

Both 'master-restart api' and 'master-restart controllers' are from:
https://docs.openshift.com/container-platform/3.10/release_notes/ocp_3_10_release_notes.html#ocp-310-important-installation-changes
Each command causes the kubelet to restart the entire static pod for the named component.

In v3.10 and v3.11, what is the command to restart the master after modifying the master-config.yaml file?

The two commands below are rejected in v3.11:
systemctl status atomic-openshift-master-api.service 
systemctl status atomic-openshift-master-controllers.service