Bug 1862271 - Can not get deployment due to no route to host
Summary: Can not get deployment due to no route to host
Keywords:
Status: CLOSED DUPLICATE of bug 1872470
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Aniket Bhat
QA Contact: Anurag saxena
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-07-30 20:45 UTC by Paige Rubendall
Modified: 2020-09-08 16:39 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-08 16:38:03 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Bugzilla 1852103 (Priority: low, Status: CLOSED): Failed to create pod due forbidden user for replicationcontrollers (last updated 2021-02-22 00:41:40 UTC)

Description Paige Rubendall 2020-07-30 20:45:40 UTC
Description of problem:
During pod density testing, some pods end up in Error state (12 of 2000 pods).

Version-Release number of selected component (if applicable):
% oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-07-25-065959   True        False         29h     Cluster version is 4.6.0-0.nightly-2020-07-25-065959


How reproducible:
100%

Steps to Reproduce:
1. Scale up cluster to 20 worker nodes.
2. Create 2000 projects (200 per node):
  - git clone https://github.com/openshift/svt.git
  - cd svt/openshift_scalability
  - touch test.yaml
  - vim test.yaml

```
projects:
  - num: 2000
    basename: svt-
    templates:
      -
        num: 1
        file: ./content/deployment-config-1rep-pause-template.json
```

  - cp $KUBECONFIG ~/.kube/config
  - python cluster-loader.py -f test.yaml -p 5
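The `touch`/`vim` steps above only serve to create `test.yaml`. As a hedged alternative, assuming cluster-loader reads exactly the keys shown in the YAML above, the file can be generated non-interactively. `render_config` and `write_config` are hypothetical helper names, not part of the svt repository.

```python
from pathlib import Path

# Values copied from the test.yaml in the reproduction steps.
NUM_PROJECTS = 2000
BASENAME = "svt-"
TEMPLATE = "./content/deployment-config-1rep-pause-template.json"


def render_config(num_projects: int, basename: str, template: str) -> str:
    """Render the cluster-loader config as YAML text (no PyYAML needed)."""
    return (
        "projects:\n"
        f"  - num: {num_projects}\n"
        f"    basename: {basename}\n"
        "    templates:\n"
        "      -\n"
        "        num: 1\n"
        f"        file: {template}\n"
    )


def write_config(path: str = "test.yaml") -> Path:
    """Write the config next to cluster-loader.py and return its path."""
    target = Path(path)
    target.write_text(render_config(NUM_PROJECTS, BASENAME, TEMPLATE))
    return target
```

Calling `write_config()` from inside `svt/openshift_scalability` replaces the manual edit; changing `NUM_PROJECTS` makes it easy to rerun the test at a smaller scale.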

Actual results:

% oc get pods --all-namespaces | egrep -v "Running|Complete" 
NAMESPACE                                          NAME                                                                 READY   STATUS      RESTARTS   AGE
svt-1645                                           deploymentconfig0-1-deploy                                           0/1     Error       0          124m
svt-1708                                           deploymentconfig0-1-deploy                                           0/1     Error       0          123m
svt-1714                                           deploymentconfig0-1-deploy                                           0/1     Error       0          123m
svt-1750                                           deploymentconfig0-1-deploy                                           0/1     Error       0          123m
svt-1767                                           deploymentconfig0-1-deploy                                           0/1     Error       0          122m
svt-1770                                           deploymentconfig0-1-deploy                                           0/1     Error       0          122m
svt-1797                                           deploymentconfig0-1-deploy                                           0/1     Error       0          122m
svt-1806                                           deploymentconfig0-1-deploy                                           0/1     Error       0          122m
svt-1840                                           deploymentconfig0-1-deploy                                           0/1     Error       0          121m
svt-1916                                           deploymentconfig0-1-deploy                                           0/1     Error       0          120m
svt-1920                                           deploymentconfig0-1-deploy                                           0/1     Error       0          120m
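The `egrep -v "Running|Complete"` filter above keeps every pod line that mentions neither word (and "Complete" also matches the `Completed` status). A minimal sketch of the same filter in Python, assuming the default six-column `oc get pods --all-namespaces` layout; `unhealthy_pods` is a hypothetical name.

```python
import re
from typing import List, Tuple

# Same pattern as `egrep -v "Running|Complete"`; "Complete" also matches "Completed".
EXCLUDE = re.compile(r"Running|Complete")


def unhealthy_pods(output: str) -> List[Tuple[str, str, str]]:
    """Return (namespace, pod, status) for pods that are not Running/Completed."""
    rows = []
    lines = output.strip().splitlines()
    for line in lines[1:]:  # skip the NAMESPACE/NAME/READY/... header
        if EXCLUDE.search(line):
            continue
        fields = line.split()
        # Assumed columns: NAMESPACE NAME READY STATUS RESTARTS AGE
        if len(fields) >= 6:
            rows.append((fields[0], fields[1], fields[3]))
    return rows
```

Taking `len()` of the result gives the number of failed deployer pods directly, which is a quick way to cross-check the count quoted in the description.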

% oc logs deploymentconfig0-1-deploy -n svt-1645   
error: couldn't get deployment deploymentconfig0-1: Get "https://172.30.0.1:443/api/v1/namespaces/svt-1645/replicationcontrollers/deploymentconfig0-1": dial tcp 172.30.0.1:443: connect: no route to host
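The Go error text `connect: no route to host` is the Linux `EHOSTUNREACH` errno returned by `connect(2)`, which is different from a refused connection or a timeout. A minimal sketch for telling these failure classes apart from a debug pod, assuming Python 3 is available there; `probe` and `classify_connect_error` are hypothetical names, and the result only narrows down the failure class, not its cause.

```python
import errno
import socket

# Address and port the deployer pod failed to reach in the log above.
API_HOST = "172.30.0.1"
API_PORT = 443


def classify_connect_error(err: OSError) -> str:
    """Map a socket error from connect() to a coarse failure class."""
    if err.errno == errno.EHOSTUNREACH:
        return "EHOSTUNREACH: no route to host"
    if err.errno == errno.ECONNREFUSED:
        return "ECONNREFUSED: connection actively refused"
    if isinstance(err, socket.timeout) or err.errno == errno.ETIMEDOUT:
        return "timeout: no reply before the deadline"
    return f"other: {err}"


def probe(host: str = API_HOST, port: int = API_PORT, timeout: float = 3.0) -> str:
    """Attempt a plain TCP connect and describe the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except OSError as err:
        return classify_connect_error(err)
```

Running `probe()` repeatedly from pods on different nodes would show whether the unreachable condition is tied to particular nodes or is intermittent cluster-wide.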

% oc get replicationcontrollers -n svt-1645
NAME                  DESIRED   CURRENT   READY   AGE
deploymentconfig0-1   0         0         0       127m


% oc describe replicationcontrollers deploymentconfig0-1 -n svt-1645
Name:         deploymentconfig0-1
Namespace:    svt-1645
Selector:     deployment=deploymentconfig0-1,deploymentconfig=deploymentconfig0,name=replicationcontroller0
Labels:       openshift.io/deployment-config.name=deploymentconfig0
              template=deploymentConfigTemplate
Annotations:  kubectl.kubernetes.io/desired-replicas: 1
              openshift.io/deployer-pod.completed-at: 2020-07-30 18:31:44 +0000 UTC
              openshift.io/deployer-pod.created-at: 2020-07-30 18:31:35 +0000 UTC
              openshift.io/deployer-pod.name: deploymentconfig0-1-deploy
              openshift.io/deployment-config.latest-version: 1
              openshift.io/deployment-config.name: deploymentconfig0
              openshift.io/deployment.phase: Failed
              openshift.io/deployment.replicas: 0
              openshift.io/deployment.status-reason: config change
              openshift.io/encoded-deployment-config:
                {"kind":"DeploymentConfig","apiVersion":"apps.openshift.io/v1","metadata":{"name":"deploymentconfig0","namespace":"svt-1645","selfLink":"/...
Replicas:     0 current / 0 desired
Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:       deployment=deploymentconfig0-1
                deploymentconfig=deploymentconfig0
                name=replicationcontroller0
  Annotations:  openshift.io/deployment-config.latest-version: 1
                openshift.io/deployment-config.name: deploymentconfig0
                openshift.io/deployment.name: deploymentconfig0-1
  Containers:
   pause0:
    Image:      gcr.io/google-containers/pause-amd64:3.0
    Port:       8080/TCP
    Host Port:  0/TCP
    Environment:
      ENVVAR1_0:  lF6ipPHq34NJ3TyLTQvuk2QH1qViWmMwjfjwvEeBBMw4sR5y1tTdGtPLTCUow2oB4P1yNcLtTJwXNXhffHlj3Ecni7MRInh3AF50APMRMrUmVohibI5C6OYY0dsHa8PdxUAd6vM7Iq0EA5PyTQHkguTvmMVNsvXtL42htL5soN8xe2aFPYd0tHwV6aG2oMTQI7CkgllhCD0nPhESKxvS7uqj2TNSEYp8aqLBDlvHjjWOT14a7uKb5c2LH1EAii2
      ENVVAR2_0:  lF6ipPHq34NJ3TyLTQvuk2QH1qViWmMwjfjwvEeBBMw4sR5y1tTdGtPLTCUow2oB4P1yNcLtTJwXNXhffHlj3Ecni7MRInh3AF50APMRMrUmVohibI5C6OYY0dsHa8PdxUAd6vM7Iq0EA5PyTQHkguTvmMVNsvXtL42htL5soN8xe2aFPYd0tHwV6aG2oMTQI7CkgllhCD0nPhESKxvS7uqj2TNSEYp8aqLBDlvHjjWOT14a7uKb5c2LH1EAii2
      ENVVAR3_0:  lF6ipPHq34NJ3TyLTQvuk2QH1qViWmMwjfjwvEeBBMw4sR5y1tTdGtPLTCUow2oB4P1yNcLtTJwXNXhffHlj3Ecni7MRInh3AF50APMRMrUmVohibI5C6OYY0dsHa8PdxUAd6vM7Iq0EA5PyTQHkguTvmMVNsvXtL42htL5soN8xe2aFPYd0tHwV6aG2oMTQI7CkgllhCD0nPhESKxvS7uqj2TNSEYp8aqLBDlvHjjWOT14a7uKb5c2LH1EAii2
      ENVVAR4_0:  lF6ipPHq34NJ3TyLTQvuk2QH1qViWmMwjfjwvEeBBMw4sR5y1tTdGtPLTCUow2oB4P1yNcLtTJwXNXhffHlj3Ecni7MRInh3AF50APMRMrUmVohibI5C6OYY0dsHa8PdxUAd6vM7Iq0EA5PyTQHkguTvmMVNsvXtL42htL5soN8xe2aFPYd0tHwV6aG2oMTQI7CkgllhCD0nPhESKxvS7uqj2TNSEYp8aqLBDlvHjjWOT14a7uKb5c2LH1EAii2
    Mounts:       <none>
  Volumes:        <none>
Events:           <none>
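The `openshift.io/deployer-pod.created-at` and `completed-at` annotations above record how long the deployer pod ran before the deployment phase was set to `Failed`. A small sketch that computes that interval, assuming the timestamp format shown in this output; `deployer_runtime_seconds` is a hypothetical helper.

```python
from datetime import datetime

# Annotation values copied from the `oc describe` output for svt-1645.
ANNOTATIONS = {
    "openshift.io/deployer-pod.created-at": "2020-07-30 18:31:35 +0000 UTC",
    "openshift.io/deployer-pod.completed-at": "2020-07-30 18:31:44 +0000 UTC",
    "openshift.io/deployment.phase": "Failed",
}

# Format inferred from the two timestamps above; the trailing "UTC" is matched literally.
TS_FORMAT = "%Y-%m-%d %H:%M:%S %z UTC"


def deployer_runtime_seconds(annotations: dict) -> float:
    """Seconds between deployer pod creation and completion."""
    created = datetime.strptime(annotations["openshift.io/deployer-pod.created-at"], TS_FORMAT)
    completed = datetime.strptime(annotations["openshift.io/deployer-pod.completed-at"], TS_FORMAT)
    return (completed - created).total_seconds()
```

For svt-1645 this gives 9 seconds, which is consistent with the deployer failing on its first API call rather than stalling during the rollout.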


Expected results:
All pods are created without errors.

Additional info:

Comment 2 Andrew McDermott 2020-07-31 11:42:32 UTC
Can you attach the output of `oc get events`? 

It's not clear why this should be attributed to routing; if you can't create a deployment, that speaks to other resource issues.

Comment 3 Andrew McDermott 2020-07-31 11:48:56 UTC
I'm adding UpcomingSprint, because I was occupied by fixing bugs with
higher priority/severity, developing new features with higher
priority, or developing new features to improve stability at a macro
level. I will revisit this bug next sprint.

Comment 4 Andrew McDermott 2020-07-31 11:56:00 UTC
(In reply to Andrew McDermott from comment #2)
> Can you attach the output of `oc get events`? 
> 
> It's not clear why this should be attributed to routing; if you can't
> create a deployment, that speaks to other resource issues.

% oc logs deploymentconfig0-1-deploy -n svt-1645   
error: couldn't get deployment deploymentconfig0-1: Get "https://172.30.0.1:443/api/v1/namespaces/svt-1645/replicationcontrollers/deploymentconfig0-1": dial tcp 172.30.0.1:443: connect: no route to host

^ I missed this on first reading.

If your cluster is still up can you attach:

$ oc get events

and what does:

$ oc get pods -n openshift-ingress
$ oc logs -n openshift-ingress <router-pod-XXX>

show?

Comment 5 Paige Rubendall 2020-08-04 02:34:30 UTC
I have not been able to get back to this exact state; I hit other issues before I could gather more detailed information.

Comment 8 Paige Rubendall 2020-08-06 15:07:16 UTC
I was able to reproduce the same error state. Attaching the ingress logs and events as private comments.

Here is the information from my new cluster:

% oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-08-05-082458   True        False         108m    Cluster version is 4.6.0-0.nightly-2020-08-05-082458

% oc get pods -A | grep svt | grep Error                                          
svt-1366                                           deploymentconfig0-1-deploy                                            0/1     Error               0          8m36s
svt-1383                                           deploymentconfig0-1-deploy                                            0/1     Error               0          8m23s
svt-1397                                           deploymentconfig0-1-deploy                                            0/1     Error               0          8m9s
svt-1568                                           deploymentconfig0-1-deploy                                            0/1     Error               0          5m34s
svt-1585                                           deploymentconfig0-1-deploy                                            0/1     Error               0          5m15s
svt-1586                                           deploymentconfig0-1-deploy                                            0/1     Error               0          5m15s
svt-1591                                           deploymentconfig0-1-deploy                                            0/1     Error               0          5m11s
svt-1592                                           deploymentconfig0-1-deploy                                            0/1     Error               0          5m11s
svt-1595                                           deploymentconfig0-1-deploy                                            0/1     Error               0          5m4s
svt-1596                                           deploymentconfig0-1-deploy                                            0/1     Error               0          5m6s
svt-1597                                           deploymentconfig0-1-deploy                                            0/1     Error               0          5m5s
svt-1598                                           deploymentconfig0-1-deploy                                            0/1     Error               0          5m6s
svt-1600                                           deploymentconfig0-1-deploy                                            0/1     Error               0          4m58s
svt-1602                                           deploymentconfig0-1-deploy                                            0/1     Error               0          4m58s
svt-1603                                           deploymentconfig0-1-deploy                                            0/1     Error               0          4m58s
svt-1605                                           deploymentconfig0-1-deploy                                            0/1     Error               0          4m54s
svt-1607                                           deploymentconfig0-1-deploy                                            0/1     Error               0          4m54s
svt-1608                                           deploymentconfig0-1-deploy                                            0/1     Error               0          4m53s
svt-1611                                           deploymentconfig0-1-deploy                                            0/1     Error               0          4m48s
svt-1615                                           deploymentconfig0-1-deploy                                            0/1     Error               0          4m41s
svt-887                                            deploymentconfig0-1-deploy                                            0/1     Error               0          16m
svt-897                                            deploymentconfig0-1-deploy                                            0/1     Error               0          16m
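The AGE column above hints at when each deployer pod failed. A minimal sketch that converts the compact AGE strings to seconds and orders the failures oldest first, assuming ages only use the `d`/`h`/`m`/`s` units seen here; `age_to_seconds` and `failures_by_age` are hypothetical names.

```python
import re
from typing import List, Tuple

# Compact AGE values such as "8m36s", "16m" or "5m4s".
AGE_PART = re.compile(r"(\d+)([dhms])")
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def age_to_seconds(age: str) -> int:
    """Convert a compact AGE string into seconds."""
    parts = AGE_PART.findall(age)
    # Reject strings with anything other than number+unit pairs.
    if not parts or "".join(n + u for n, u in parts) != age:
        raise ValueError(f"unrecognised age: {age!r}")
    return sum(int(n) * UNIT_SECONDS[u] for n, u in parts)


def failures_by_age(rows: List[Tuple[str, str]]) -> List[Tuple[int, str]]:
    """Turn (namespace, age) rows into (seconds, namespace), oldest first."""
    return sorted(((age_to_seconds(age), ns) for ns, age in rows), reverse=True)
```

Applied to the listing above, the failures fall into bursts: svt-887/svt-897 at 16m, svt-1366 to svt-1397 within about 27 seconds of each other, and svt-1568 to svt-1615 within 53 seconds.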

Comment 10 Andrew McDermott 2020-08-06 16:33:29 UTC
Looking through the router logs from comment #8 and the events from comment #9 I don't see anything that hints at an ingress problem.

It's not clear that ingress is failing, so I'm moving this to SDN based on what we see in comment #4:

% oc logs deploymentconfig0-1-deploy -n svt-1645   
error: couldn't get deployment deploymentconfig0-1: Get "https://172.30.0.1:443/api/v1/namespaces/svt-1645/replicationcontrollers/deploymentconfig0-1": dial tcp 172.30.0.1:443: connect: no route to host

which is an internal endpoint.
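Assuming the cluster uses the OpenShift default service network 172.30.0.0/16, 172.30.0.1 is the first address in that range, which Kubernetes assigns to the `kubernetes` Service in the `default` namespace that fronts the API server. The sketch below checks that membership; the constant is an assumption and should be replaced with the cluster's configured service network, and `is_service_ip` and `first_service_ip` are hypothetical helpers.

```python
import ipaddress

# Assumption: the OpenShift default service network; adjust to the cluster's actual range.
SERVICE_NETWORK = ipaddress.ip_network("172.30.0.0/16")


def is_service_ip(addr: str, network=SERVICE_NETWORK) -> bool:
    """True if addr falls inside the cluster service network."""
    return ipaddress.ip_address(addr) in network


def first_service_ip(network=SERVICE_NETWORK) -> ipaddress.IPv4Address:
    """First address in the range, which Kubernetes gives to the `kubernetes` Service."""
    return network.network_address + 1
```

Because the destination is a service VIP rather than a routable host, "no route to host" here points at the node's service translation and overlay path rather than at the router pods checked earlier.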

Comment 11 Paige Rubendall 2020-08-06 16:47:07 UTC
Changing the sub-component to OVN because these clusters have been using the OVN-specific configuration.

Comment 14 Luis Sanchez 2020-08-21 13:30:23 UTC
Please reproduce in newer build and capture must-gather output.

Comment 16 Stefan Schimanski 2020-08-25 10:26:26 UTC
No route to host is an SDN problem, isn't it?

