Bug 1466732

Summary: cluster server temporarily unavailable and then recovered
Product: OpenShift Container Platform Reporter: Weihua Meng <wmeng>
Component: Node Assignee: Derek Carr <decarr>
Status: CLOSED CURRENTRELEASE QA Contact: Weihua Meng <wmeng>
Severity: low Docs Contact:
Priority: medium    
Version: 3.6.0 CC: aos-bugs, ccoleman, decarr, eparis, jokerman, lxia, mmccomas, wmeng
Target Milestone: ---   
Target Release: 3.6.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-11-21 17:35:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Weihua Meng 2017-06-30 10:54:45 UTC
Description of problem:
The cluster master (port 8443) temporarily refuses connections right after a ReplicationController is created, and then recovers on its own.

Version-Release number of selected component (if applicable):
openshift v3.6.126.4

How reproducible:
Always

Steps to Reproduce:
1. Create a ReplicationController (RC) with replicas=20:
apiVersion: v1
kind: ReplicationController
metadata:
  name: frontend-1
spec:
  replicas: 20 
  selector:    
    name: frontend
  template:    
    metadata:
      labels:  
        name: frontend 
    spec:
      containers:
      - image: openshift/hello-openshift
        name: helloworld
        ports:
        - containerPort: 8080
          protocol: TCP
      restartPolicy: Always
2. Run "oc get all" immediately after creating the RC.
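
The two steps above can be scripted so that the length of the outage is counted instead of observed by hand. This is only a sketch under stated assumptions: the file name rc.yaml and the helper poll_until_ok are hypothetical (neither appears in this report), and it assumes that "oc get all" exits non-zero whenever the connection is refused.

```shell
# poll_until_ok MAX DELAY CMD...
# Runs CMD until it succeeds or MAX attempts have failed.
# Prints the number of failed attempts; returns 0 on success, 1 otherwise.
# (Hypothetical helper, not part of oc or OpenShift.)
poll_until_ok() {
  max=$1
  delay=$2
  shift 2
  failed=0
  while [ "$failed" -lt "$max" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "$failed"
      return 0
    fi
    failed=$((failed + 1))
    sleep "$delay"
  done
  echo "$failed"
  return 1
}

# Example, assuming rc.yaml holds the ReplicationController from step 1
# and the oc client is already logged in to the cluster:
#   oc create -f rc.yaml && poll_until_ok 30 1 oc get all
```

A printed value of 0 means the first "oc get all" succeeded; any larger value is the number of consecutive failed attempts before the master answered again.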


Actual results:
[meng@dhcp-140-97 test]$ oc get all
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
[meng@dhcp-140-97 test]$ oc get all
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
[meng@dhcp-140-97 test]$ oc get all
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
[meng@dhcp-140-97 test]$ oc get all
NAME                  READY     STATUS              RESTARTS   AGE
po/frontend-1-0jjmw   0/1       Pending             0          16s
po/frontend-1-1x4cg   0/1       Pending             0          16s
po/frontend-1-2690l   0/1       Pending             0          16s
po/frontend-1-2c1g0   0/1       Pending             0          16s
po/frontend-1-2sc9h   0/1       Pending             0          16s
po/frontend-1-4fkbd   0/1       Pending             0          16s
po/frontend-1-5k4z1   0/1       Pending             0          16s
po/frontend-1-5qf1q   0/1       Pending             0          16s
po/frontend-1-6mpsb   0/1       Pending             0          16s
po/frontend-1-6rtzh   0/1       Pending             0          16s
po/frontend-1-6twt5   0/1       Pending             0          16s
po/frontend-1-85n51   0/1       Pending             0          16s
po/frontend-1-bt5c1   0/1       Pending             0          16s
po/frontend-1-chtgm   0/1       Pending             0          16s
po/frontend-1-cwrvk   0/1       Pending             0          16s
po/frontend-1-dpkpx   0/1       Pending             0          16s
po/frontend-1-fb7dl   0/1       Pending             0          16s
po/frontend-1-fbmjv   0/1       Pending             0          16s
po/frontend-1-gt58x   0/1       Pending             0          16s
po/frontend-1-gwwlt   0/1       Pending             0          16s
po/frontend-1-h73x4   0/1       Pending             0          16s
po/frontend-1-hnqdl   0/1       Pending             0          16s
po/frontend-1-jfdt8   0/1       Pending             0          16s
po/frontend-1-kct59   0/1       Pending             0          16s
po/frontend-1-kwj5h   0/1       Pending             0          16s
po/frontend-1-ltc0b   0/1       Pending             0          16s
po/frontend-1-m1vsc   0/1       ContainerCreating   0          16s
po/frontend-1-m5ng0   0/1       Pending             0          16s
po/frontend-1-mxk27   0/1       Pending             0          16s
po/frontend-1-nh35f   0/1       ContainerCreating   0          16s
po/frontend-1-p5pw2   0/1       Pending             0          16s
po/frontend-1-pq13r   0/1       Pending             0          16s
po/frontend-1-rdlr4   0/1       Pending             0          16s
po/frontend-1-tdt88   0/1       Pending             0          16s
po/frontend-1-v8p7b   0/1       Pending             0          16s
po/frontend-1-vwx52   0/1       Pending             0          16s
po/frontend-1-vz7sp   0/1       Pending             0          16s
po/frontend-1-w34mq   0/1       Pending             0          16s
po/frontend-1-wngtt   0/1       Pending             0          16s
po/frontend-1-wzrws   0/1       Pending             0          16s
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?


Expected results:
No "The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?" errors.


Additional info:
Cluster on GCE: 1 master and 2 nodes, each with 2 CPUs and 7 GiB of memory.

E0630 06:37:38.732039   22299 reflector.go:201] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *rbac.ClusterRoleBinding: Get https://qe-wmeng0630-master-1:8443/apis/rbac.authorization.k8s.io/v1beta1/clusterrolebindings?resourceVersion=0: dial tcp 10.240.0.21:8443: getsockopt: connection refused
E0630 06:37:38.732094   22299 reflector.go:201] github.com/openshift/origin/pkg/deploy/generated/informers/internalversion/factory.go:45: Failed to list *api.DeploymentConfig: Get https://qe-wmeng0630-master-1:8443/apis/apps.openshift.io/v1/deploymentconfigs?resourceVersion=0: dial tcp 10.240.0.21:8443: getsockopt: connection refused
E0630 06:37:38.732121   22299 reflector.go:201] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70: Failed to list *v2alpha1.CronJob: Get https://qe-wmeng0630-master-1:8443/apis/batch/v2alpha1/cronjobs?resourceVersion=0: dial tcp 10.240.0.21:8443: getsockopt: connection refused
E0630 06:37:38.733256   22299 reflector.go:201] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *rbac.ClusterRole: Get https://qe-wmeng0630-master-1:8443/apis/rbac.authorization.k8s.io/v1beta1/clusterroles?resourceVersion=0: dial tcp 10.240.0.21:8443: getsockopt: connection refused
E0630 06:37:38.734349   22299 reflector.go:201] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70: Failed to list *v1beta1.DaemonSet: Get https://qe-wmeng0630-master-1:8443/apis/extensions/v1beta1/daemonsets?resourceVersion=0: dial tcp 10.240.0.21:8443: getsockopt: connection refused
E0630 06:37:38.737420   22299 reflector.go:201] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70: Failed to list *v1.ConfigMap: Get https://qe-wmeng0630-master-1:8443/api/v1/configmaps?resourceVersion=0: dial tcp 10.240.0.21:8443: getsockopt: connection refused
E0630 06:37:38.742834   22299 reflector.go:201] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70: Failed to list *v1.PersistentVolumeClaim: Get https://qe-wmeng0630-master-1:8443/api/v1/persistentvolumeclaims?resourceVersion=0: dial tcp 10.240.0.21:8443: getsockopt: connection refused
E0630 06:37:38.743812   22299 reflector.go:201] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70: Failed to list *v1beta1.ReplicaSet: Get https://qe-wmeng0630-master-1:8443/apis/extensions/v1beta1/replicasets?resourceVersion=0: dial tcp 10.240.0.21:8443: getsockopt: connection refused
E0630 06:37:38.745672   22299 reflector.go:201] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70: Failed to list *v1beta1.StorageClass: Get https://qe-wmeng0630-master-1:8443/apis/storage.k8s.io/v1beta1/storageclasses?resourceVersion=0: dial tcp 10.240.0.21:8443: getsockopt: connection refused
E0630 06:37:38.746777   22299 reflector.go:201] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70: Failed to list *v1.PersistentVolume: Get https://qe-wmeng0630-master-1:8443/api/v1/persistentvolumes?resourceVersion=0: dial tcp 10.240.0.21:8443: getsockopt: connection refused
E0630 06:37:38.750781   22299 reflector.go:201] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70: Failed to list *v1beta1.PodSecurityPo

Comment 1 Eric Paris 2017-06-30 15:58:25 UTC
Can we get the master logs for "10.240.0.21" while you see this problem? Connection refused is a bit odd...

Comment 2 Derek Carr 2017-06-30 16:12:53 UTC
I am not sure how to proceed further without logs.

Comment 7 Derek Carr 2017-07-03 14:28:21 UTC
I suspect this is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1465361

Does the problem persist with the changes made in the referenced BZ?

Comment 8 Weihua Meng 2017-07-04 03:42:30 UTC
The logs are not the same, but it may be the same root cause.
I will retry once that bug is verified.

Comment 9 Derek Carr 2017-07-05 17:58:58 UTC
Is this an HA deployment? Were the masters behind a load balancer?

Comment 10 Clayton Coleman 2017-07-05 20:44:48 UTC
You can gzip attachments before uploading to send larger chunks.

Comment 11 Weihua Meng 2017-07-10 08:17:38 UTC
It is not an HA cluster.
After investigation, the problem happens when
"openshift start master controllers --config=/etc/origin/master/master-config.yaml"
is run manually on the master. This means two openshift master processes may be running at the same time, which results in the errors.
When one of the processes exited, the cluster recovered.
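
One way to check for the overlapping processes described above is to count the matching command lines in a process listing on the master. This is a minimal sketch, not a verified diagnosis procedure: the helper name count_master_procs is hypothetical, and it assumes every master process shows "openshift start master" in its command line, as in the command quoted above.

```shell
# count_master_procs
# Reads a process listing on stdin and prints how many lines contain
# "openshift start master". The [o] bracket keeps this awk process's own
# command line from matching when the listing comes from ps.
# (Hypothetical helper, not part of OpenShift.)
count_master_procs() {
  awk '/[o]penshift start master/ { n++ } END { print n + 0 }'
}

# Example, to be run on the master host:
#   ps -eo pid,args | count_master_procs
```

Whether a given count is expected depends on how the master services are deployed on that host, so the number should be compared with the services that are supposed to be running rather than read as an error by itself.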

Comment 12 Weihua Meng 2018-04-25 09:36:50 UTC
I have not hit this issue recently.