Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1613544

Summary: oc can't get information after master-restart api and controllers
Product: OpenShift Container Platform
Reporter: Weibin Liang <weliang>
Component: Master
Assignee: Michal Fojtik <mfojtik>
Status: CLOSED WONTFIX
QA Contact: Xingxing Xia <xxia>
Severity: high
Docs Contact:
Priority: low
Version: 3.11.0
CC: aos-bugs, jokerman, maszulik, mmccomas, weliang, wsun
Target Milestone: ---
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-02-20 09:47:00 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
master-logs (Flags: none)

Description Weibin Liang 2018-08-07 19:57:38 UTC
Created attachment 1474087: master-logs

Description of problem:
The oc command cannot retrieve information after running master-restart api and master-restart controllers.

Version-Release number of selected component (if applicable):
v3.11

How reproducible:
Every time

Steps to Reproduce:
[root@ip-172-18-6-251 ~]# oc version
oc v3.11.0-0.11.0
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-6-251.ec2.internal:8443
openshift v3.11.0-0.11.0
kubernetes v1.11.0+d4cacc0
[root@ip-172-18-6-251 ~]# oc get pods
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-1-gwnbn    1/1       Running   0          5h
dockergc-nfkdg             1/1       Running   0          5h
dockergc-st7wz             1/1       Running   0          5h
registry-console-1-kbz2r   1/1       Running   0          5h
router-1-c8fzf             1/1       Running   0          5h
[root@ip-172-18-6-251 ~]# oc get clusternetwork default
NAME      CLUSTER NETWORKS   SERVICE NETWORK   PLUGIN NAME
default   10.128.0.0/14:9    172.30.0.0/16     redhat/openshift-ovs-multitenant
[root@ip-172-18-6-251 ~]# master-restart api
[root@ip-172-18-6-251 ~]# master-restart controllers
[root@ip-172-18-6-251 ~]# oc version
oc v3.11.0-0.11.0
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO
error: server took too long to respond with version information.
[root@ip-172-18-6-251 ~]# oc get pods
No resources found.
Unable to connect to the server: net/http: TLS handshake timeout
[root@ip-172-18-6-251 ~]# oc get clusternetwork default
NAME      CLUSTER NETWORKS   SERVICE NETWORK   PLUGIN NAME
default   10.128.0.0/14:9    172.30.0.0/16     redhat/openshift-ovs-multitenant
[root@ip-172-18-6-251 ~]# oc get pods
No resources found.
Unable to connect to the server: net/http: TLS handshake timeout
[root@ip-172-18-6-251 ~]# oc get all
error: the server doesn't have a resource type "deploymentconfigs"
[root@ip-172-18-6-251 ~]# 
[root@ip-172-18-6-251 ~]# oc get all
No resources found.
The connection to the server ip-172-18-6-251.ec2.internal:8443 was refused - did you specify the right host or port?
The connection to the server ip-172-18-6-251.ec2.internal:8443 was refused - did you specify the right host or port?
The connection to the server ip-172-18-6-251.ec2.internal:8443 was refused - did you specify the right host or port?
[root@ip-172-18-6-251 ~]# 

Actual results:
oc get cannot retrieve information; requests fail with TLS handshake timeouts or refused connections.

Expected results:
The oc get command should work after the restart.

Additional info:
The logs from 'master-logs api api' and 'master-logs controllers controllers' are attached.
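
The transcript above shows oc requests failing right after the restarts with TLS handshake timeouts and refused connections. As a minimal sketch (not part of the original report), a script could poll the API server before issuing further oc commands. The function name wait_for_api and the attempt and delay values are hypothetical, and 'oc get --raw /healthz' is assumed to be a suitable health probe in this environment. This only distinguishes a slow restart from a server that never comes back; it does not address the underlying failure.

```shell
# wait_for_api: run a probe command repeatedly until it succeeds
# or the attempt budget is exhausted.
# Usage: wait_for_api <max_attempts> <delay_seconds> <command> [args...]
wait_for_api() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # The probe's own output is discarded; only its exit status matters.
        if "$@" >/dev/null 2>&1; then
            echo "API ready after ${attempt} attempt(s)"
            return 0
        fi
        sleep "$delay"
        attempt=$((attempt + 1))
    done
    echo "API not ready after ${max_attempts} attempt(s)" >&2
    return 1
}

# Example (assumed probe and timings, adjust for the cluster):
#   master-restart api
#   wait_for_api 30 10 oc get --raw /healthz && oc get pods
```

Passing the probe as arguments keeps the retry loop independent of any particular health check, so a different command can be substituted if /healthz is not appropriate.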

Comment 1 Xingxing Xia 2018-08-08 02:54:48 UTC
What's your instance flavor? Per https://bugzilla.redhat.com/show_bug.cgi?id=1593635#c39 , EC2 environments should use a large instance type (e.g. m3.large).

Comment 2 Weibin Liang 2018-08-08 14:12:52 UTC
The workaround for this bug is to set vm_type: m3.large when installing the EC2 cluster.

The concern is customers who do not want to pay more to upgrade their VM type to m3.large.

Comment 3 Weibin Liang 2018-08-08 15:20:55 UTC
Even with vm_type: m3.large set, the problem happens again after running master-restart api and master-restart controllers and then rebooting the master.

[root@ip-172-18-5-143 ec2-user]# oc get pods
The connection to the server ip-172-18-5-143.ec2.internal:8443 was refused - did you specify the right host or port?
[root@ip-172-18-5-143 ec2-user]# oc get all
The connection to the server ip-172-18-5-143.ec2.internal:8443 was refused - did you specify the right host or port?

Comment 4 Maciej Szulik 2018-08-09 11:14:42 UTC
What are 'master-restart api' and 'master-restart controllers', and what actions exactly do they perform? Who provided these to you?

Comment 5 Weibin Liang 2018-08-09 13:02:41 UTC
Both 'master-restart api' and 'master-restart controllers' are from:
https://docs.openshift.com/container-platform/3.10/release_notes/ocp_3_10_release_notes.html#ocp-310-important-installation-changes
Each command causes the kubelet to restart the entire static pod for the named component.

In v3.10 and v3.11, what is the command to restart the master after modifying the master-config.yaml file?

The following two commands are rejected in v3.11:
systemctl status atomic-openshift-master-api.service 
systemctl status atomic-openshift-master-controllers.service