Bug 1466732 - cluster server temporarily unavailable and then recovered
Status: VERIFIED
Product: OpenShift Container Platform
Classification: Red Hat
Component: Pod
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Assigned To: Derek Carr
QA Contact: Weihua Meng
Reported: 2017-06-30 06:54 EDT by Weihua Meng
Modified: 2018-04-25 05:36 EDT
CC List: 7 users

Type: Bug

Attachments: None

Description Weihua Meng 2017-06-30 06:54:45 EDT
Description of problem:
The cluster API server becomes temporarily unavailable (connections to the master on port 8443 are refused) and then recovers on its own.

Version-Release number of selected component (if applicable):
openshift v3.6.126.4

How reproducible:
Always

Steps to Reproduce:
1. Create a ReplicationController with replicas=20:
apiVersion: v1
kind: ReplicationController
metadata:
  name: frontend-1
spec:
  replicas: 20 
  selector:    
    name: frontend
  template:    
    metadata:
      labels:  
        name: frontend 
    spec:
      containers:
      - image: openshift/hello-openshift
        name: helloworld
        ports:
        - containerPort: 8080
          protocol: TCP
      restartPolicy: Always
2. Immediately afterwards, run: oc get all
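For anyone re-running steps 1 and 2, the following is a small shell sketch of the reproduction loop. It assumes oc is already logged in to the cluster and that the RC manifest above has been saved as frontend-rc.yaml (a hypothetical filename). The names count_refused and reproduce are illustrative helpers, not part of oc.

```shell
# Hypothetical helper: count the "connection ... was refused" lines that
# appear in a command's output read from stdin. grep -c prints 0 when there
# are no matches; "|| true" keeps that case from being treated as a failure.
count_refused() {
  grep -c 'was refused - did you specify the right host or port' || true
}

# Hypothetical reproduction loop (assumes oc is logged in and the RC manifest
# is saved as frontend-rc.yaml). It creates the RC and then polls
# "oc get all" a few times, reporting how many refused-connection lines each
# attempt printed. stderr is merged because oc prints the error there.
reproduce() {
  oc create -f frontend-rc.yaml || return 1
  for attempt in 1 2 3 4 5; do
    echo "attempt $attempt: $(oc get all 2>&1 | count_refused) refused"
    sleep 2
  done
}
```

Each poll reports a count. A healthy master should report 0 for every attempt.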


Actual results:
[meng@dhcp-140-97 test]$ oc get all
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
[meng@dhcp-140-97 test]$ oc get all
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
[meng@dhcp-140-97 test]$ oc get all
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
[meng@dhcp-140-97 test]$ oc get all
NAME                  READY     STATUS              RESTARTS   AGE
po/frontend-1-0jjmw   0/1       Pending             0          16s
po/frontend-1-1x4cg   0/1       Pending             0          16s
po/frontend-1-2690l   0/1       Pending             0          16s
po/frontend-1-2c1g0   0/1       Pending             0          16s
po/frontend-1-2sc9h   0/1       Pending             0          16s
po/frontend-1-4fkbd   0/1       Pending             0          16s
po/frontend-1-5k4z1   0/1       Pending             0          16s
po/frontend-1-5qf1q   0/1       Pending             0          16s
po/frontend-1-6mpsb   0/1       Pending             0          16s
po/frontend-1-6rtzh   0/1       Pending             0          16s
po/frontend-1-6twt5   0/1       Pending             0          16s
po/frontend-1-85n51   0/1       Pending             0          16s
po/frontend-1-bt5c1   0/1       Pending             0          16s
po/frontend-1-chtgm   0/1       Pending             0          16s
po/frontend-1-cwrvk   0/1       Pending             0          16s
po/frontend-1-dpkpx   0/1       Pending             0          16s
po/frontend-1-fb7dl   0/1       Pending             0          16s
po/frontend-1-fbmjv   0/1       Pending             0          16s
po/frontend-1-gt58x   0/1       Pending             0          16s
po/frontend-1-gwwlt   0/1       Pending             0          16s
po/frontend-1-h73x4   0/1       Pending             0          16s
po/frontend-1-hnqdl   0/1       Pending             0          16s
po/frontend-1-jfdt8   0/1       Pending             0          16s
po/frontend-1-kct59   0/1       Pending             0          16s
po/frontend-1-kwj5h   0/1       Pending             0          16s
po/frontend-1-ltc0b   0/1       Pending             0          16s
po/frontend-1-m1vsc   0/1       ContainerCreating   0          16s
po/frontend-1-m5ng0   0/1       Pending             0          16s
po/frontend-1-mxk27   0/1       Pending             0          16s
po/frontend-1-nh35f   0/1       ContainerCreating   0          16s
po/frontend-1-p5pw2   0/1       Pending             0          16s
po/frontend-1-pq13r   0/1       Pending             0          16s
po/frontend-1-rdlr4   0/1       Pending             0          16s
po/frontend-1-tdt88   0/1       Pending             0          16s
po/frontend-1-v8p7b   0/1       Pending             0          16s
po/frontend-1-vwx52   0/1       Pending             0          16s
po/frontend-1-vz7sp   0/1       Pending             0          16s
po/frontend-1-w34mq   0/1       Pending             0          16s
po/frontend-1-wngtt   0/1       Pending             0          16s
po/frontend-1-wzrws   0/1       Pending             0          16s
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?
The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?


Expected results:
No "The connection to the server qe-wmeng0630-master-1.0630-jyx.qe.rhcloud.com:8443 was refused - did you specify the right host or port?" errors.


Additional info:
Cluster on GCE:
1 master and 2 nodes,
each with 2 CPUs and 7 GiB of memory.

E0630 06:37:38.732039   22299 reflector.go:201] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *rbac.ClusterRoleBinding: Get https://qe-wmeng0630-master-1:8443/apis/rbac.authorization.k8s.io/v1beta1/clusterrolebindings?resourceVersion=0: dial tcp 10.240.0.21:8443: getsockopt: connection refused
E0630 06:37:38.732094   22299 reflector.go:201] github.com/openshift/origin/pkg/deploy/generated/informers/internalversion/factory.go:45: Failed to list *api.DeploymentConfig: Get https://qe-wmeng0630-master-1:8443/apis/apps.openshift.io/v1/deploymentconfigs?resourceVersion=0: dial tcp 10.240.0.21:8443: getsockopt: connection refused
E0630 06:37:38.732121   22299 reflector.go:201] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70: Failed to list *v2alpha1.CronJob: Get https://qe-wmeng0630-master-1:8443/apis/batch/v2alpha1/cronjobs?resourceVersion=0: dial tcp 10.240.0.21:8443: getsockopt: connection refused
E0630 06:37:38.733256   22299 reflector.go:201] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *rbac.ClusterRole: Get https://qe-wmeng0630-master-1:8443/apis/rbac.authorization.k8s.io/v1beta1/clusterroles?resourceVersion=0: dial tcp 10.240.0.21:8443: getsockopt: connection refused
E0630 06:37:38.734349   22299 reflector.go:201] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70: Failed to list *v1beta1.DaemonSet: Get https://qe-wmeng0630-master-1:8443/apis/extensions/v1beta1/daemonsets?resourceVersion=0: dial tcp 10.240.0.21:8443: getsockopt: connection refused
E0630 06:37:38.737420   22299 reflector.go:201] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70: Failed to list *v1.ConfigMap: Get https://qe-wmeng0630-master-1:8443/api/v1/configmaps?resourceVersion=0: dial tcp 10.240.0.21:8443: getsockopt: connection refused
E0630 06:37:38.742834   22299 reflector.go:201] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70: Failed to list *v1.PersistentVolumeClaim: Get https://qe-wmeng0630-master-1:8443/api/v1/persistentvolumeclaims?resourceVersion=0: dial tcp 10.240.0.21:8443: getsockopt: connection refused
E0630 06:37:38.743812   22299 reflector.go:201] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70: Failed to list *v1beta1.ReplicaSet: Get https://qe-wmeng0630-master-1:8443/apis/extensions/v1beta1/replicasets?resourceVersion=0: dial tcp 10.240.0.21:8443: getsockopt: connection refused
E0630 06:37:38.745672   22299 reflector.go:201] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70: Failed to list *v1beta1.StorageClass: Get https://qe-wmeng0630-master-1:8443/apis/storage.k8s.io/v1beta1/storageclasses?resourceVersion=0: dial tcp 10.240.0.21:8443: getsockopt: connection refused
E0630 06:37:38.746777   22299 reflector.go:201] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70: Failed to list *v1.PersistentVolume: Get https://qe-wmeng0630-master-1:8443/api/v1/persistentvolumes?resourceVersion=0: dial tcp 10.240.0.21:8443: getsockopt: connection refused
E0630 06:37:38.750781   22299 reflector.go:201] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions/factory.go:70: Failed to list *v1beta1.PodSecurityPo
Comment 1 Eric Paris 2017-06-30 11:58:25 EDT
Can we get the master logs for "10.240.0.21" while you see this problem? Connection refused is a bit odd...
Comment 2 Derek Carr 2017-06-30 12:12:53 EDT
I am not sure how to proceed further without logs.
Comment 7 Derek Carr 2017-07-03 10:28:21 EDT
I suspect this is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1465361

Does the problem persist with the changes made in the referenced bug?
Comment 8 Weihua Meng 2017-07-03 23:42:30 EDT
The logs are not the same, but it may be the same root cause.
I will retry once that bug is verified.
Comment 9 Derek Carr 2017-07-05 13:58:58 EDT
Is this an HA deployment? Were the masters behind a load balancer?
Comment 10 Clayton Coleman 2017-07-05 16:44:48 EDT
You can gzip attachments before uploading to send larger chunks.
Comment 11 Weihua Meng 2017-07-10 04:17:38 EDT
It is not an HA cluster.
After investigation, it happens when running
"openshift start master controllers --config=/etc/origin/master/master-config.yaml" on the master,
which means two OpenShift master processes may be running at the same time, resulting in errors.
When one of the processes exited, the cluster recovered.
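One rough way to check for this situation is to count the master processes on the master host. This is a sketch, not a supported tool. The names count_master_procs and warn_if_duplicate_masters are hypothetical helpers. The threshold of one process assumes a single-master, non-HA install where the master runs as a single "openshift start master" process. Installs that split the API server and controllers into separate services legitimately run two such processes.

```shell
# Hypothetical helper: read "ps -eo pid,args" output on stdin and print how
# many lines look like an OpenShift master process. The "[o]penshift"
# pattern stops grep from matching its own command line if it appears in
# the ps snapshot. "|| true" keeps the zero-match case from failing.
count_master_procs() {
  grep -c '[o]penshift start master' || true
}

# Hypothetical check, assuming a non-HA all-in-one master where exactly one
# master process is expected. Returns 1 and warns on stderr when more than
# one is found.
warn_if_duplicate_masters() {
  local n
  n=$(count_master_procs)
  if [ "$n" -gt 1 ]; then
    echo "WARNING: $n OpenShift master processes running" >&2
    return 1
  fi
  echo "OK: $n OpenShift master process(es) running"
  return 0
}
```

Usage on the master host: ps -eo pid,args | warn_if_duplicate_masters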
Comment 12 Weihua Meng 2018-04-25 05:36:50 EDT
I have not encountered this issue recently.
