Bug 1807193 - Windows pod unreachable with "No route to host" error
Summary: Windows pod unreachable with "No route to host" error
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Windows Containers
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.5.0
Assignee: Sebastian Soto
QA Contact: gaoshang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-02-25 18:59 UTC by Sebastian Soto
Modified: 2020-03-25 05:45 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-25 05:45:56 UTC
Target Upstream Version:
Embargoed:


Attachments
ovnkube-node (272.30 KB, text/plain)
2020-03-03 06:50 UTC, gaoshang

Description Sebastian Soto 2020-02-25 18:59:24 UTC
Description of problem:
After a Windows pod running a web server is provisioned, curling the web server returns:

```
Failed to connect to 10.132.0.6 port 80: No route to host
```

Version-Release number of selected component (if applicable):


How reproducible:
Very

Steps to Reproduce:
1. Deploy Windows pod:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: win-webserver
  name: win-webserver
spec:
  selector:
    matchLabels:
      app: win-webserver
  replicas: 1
  template:
    metadata:
      labels:
        app: win-webserver
      name: win-webserver
    spec:
      # Tolerate the os=Windows:NoSchedule taint
      tolerations:
      - key: "os"
        value: "Windows"
        effect: "NoSchedule"
      containers:
      - name: windowswebserver
        image: mcr.microsoft.com/windows/servercore:ltsc2019
        imagePullPolicy: IfNotPresent
        # Minimal PowerShell HTTP server that answers every request on port 80
        command:
        - powershell.exe
        - -command
        - $listener = New-Object System.Net.HttpListener; $listener.Prefixes.Add('http://*:80/'); $listener.Start();Write-Host('Listening at http://*:80/'); while ($listener.IsListening) { $context = $listener.GetContext(); $response = $context.Response; $content='<html><body><H1>Windows Container Web Server</H1></body></html>'; $buffer = [System.Text.Encoding]::UTF8.GetBytes($content); $response.ContentLength64 = $buffer.Length; $response.OutputStream.Write($buffer, 0, $buffer.Length); $response.Close(); };
      # Schedule only onto Windows nodes
      nodeSelector:
        beta.kubernetes.io/os: windows
```


2. Curl the pod from a Linux pod.

Actual results:
Failed to connect to 10.132.0.6 port 80: No route to host

Expected results:
<html><body><H1>Windows Container Web Server</H1></body></html>
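When repeating this test, it can help to tell the expected response apart from the two connection failures that appear in this bug ("No route to host" in the report, "Connection timed out" in comment 11). The following is a minimal Python sketch, assuming the curl output has already been captured as text (for example from `oc exec <linux-pod> curl <windows-pod-ip>`). `classify_curl_output` is a hypothetical helper, not part of any OpenShift tooling.

```python
import re

# Marker taken from the expected response body of the Windows web server.
EXPECTED_MARKER = "Windows Container Web Server"

# curl reports connection failures as, e.g.:
#   curl: (7) Failed to connect to 10.132.0.2 port 80: Connection timed out
# Group 3 captures the socket error text up to the end of that line.
FAILURE_RE = re.compile(r"Failed to connect to (\S+) port (\d+): (.+)$", re.MULTILINE)


def classify_curl_output(output: str) -> str:
    """Return 'ok' if the expected page was served, the socket error
    reason reported by curl if the connection failed, else 'unknown'."""
    if EXPECTED_MARKER in output:
        return "ok"
    match = FAILURE_RE.search(output)
    if match:
        return match.group(3).strip()
    return "unknown"
```

For example, the "Actual results" line above classifies as `No route to host`, while the output in comment 11 step 2 classifies as `Connection timed out`.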

Additional info:

Comment 3 Dan Williams 2020-03-02 22:40:58 UTC
The "k8s.ovn.org/l3-gateway-config annotation not found for node" message should be suppressed, it just means ovnkube couldn't find that annotation and shouldn't affect operation. I've suppressed that message in the hybrid overlay code upstream now.

Comment 6 gaoshang 2020-03-03 06:50:59 UTC
Created attachment 1667123
ovnkube-node

Comment 7 gaoshang 2020-03-03 06:52:30 UTC
Comment on attachment 1667123
ovnkube-node

# oc logs -n openshift-ovn-kubernetes pod/ovnkube-node-grghd -c ovnkube-node
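Comment 3 notes that the "k8s.ovn.org/l3-gateway-config annotation not found for node" message is harmless. On builds where it is not yet suppressed, a reader scanning these ovnkube-node logs might want to hide it. Below is a minimal Python sketch under that assumption; `filter_benign` and `BENIGN_MESSAGES` are hypothetical names, and the list holds only the one message quoted in comment 3.

```python
from typing import Iterable, Iterator

# Messages known to be benign; per comment 3, this one does not affect operation.
BENIGN_MESSAGES = (
    "k8s.ovn.org/l3-gateway-config annotation not found for node",
)


def filter_benign(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the log lines that contain none of the benign messages."""
    for line in lines:
        if not any(msg in line for msg in BENIGN_MESSAGES):
            yield line
```

The output of the `oc logs` command above could be split into lines and passed through `filter_benign`, leaving the remaining lines for closer inspection.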

Comment 9 Sebastian Soto 2020-03-03 21:40:03 UTC
I have not been able to reproduce this bug using openshift-install-linux-4.4.0-0.nightly-2020-03-02-124231

What I've tried:

Running the east-west test on the same 2 VMs 10+ times
Running the WSU and then the east-west test on the same 2 VMs 6 times
Running the WSU and then the east-west test on new VMs 4 times

Comment 10 Anurag saxena 2020-03-04 21:46:24 UTC
Indeed, it is not reproducible for me either on 4.4.0-0.nightly-2020-03-04-143604. Shang Gao, can you also check in your environment on the latest nightly?

Comment 11 gaoshang 2020-03-05 15:08:28 UTC
(In reply to Anurag saxena from comment #10)
> Indeed, it is not reproducible for me either on
> 4.4.0-0.nightly-2020-03-04-143604. Shang Gao, can you also check in your
> environment on the latest nightly?

I think this bug still exists; please see the following steps:

1. Create the win-webserver and linux-webserver pods. Initially, the east-west network test passes.
```
[root@sgaoos aws]# oc get pod -o wide
NAME                             READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
my-nginx-75897978cd-rd4m2        1/1     Running   0          81m   10.131.0.13   ip-10-0-134-251.us-east-2.compute.internal   <none>           <none>
win-webserver-79b64df8b9-chw7f   1/1     Running   0          82m   10.132.0.2    ip-10-0-29-113.us-east-2.compute.internal    <none>           <none>
[root@sgaoos aws]# oc exec my-nginx-75897978cd-rd4m2 curl 10.132.0.2
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   125  100   125    0     0    543      0 --:--:-- --:--:-- --:--:--   543
<html><body><H1>Windows Container Web Server</H1><p>IP 10.132.0.2 callerCount 3 <p>IP 10.132.0.2 callerCount 5 </body></html>
```

2. After more than 3 hours (see the pods' AGE column), the same east-west network test fails.

```
[root@sgaoos aws]# oc get pod -o wide
NAME                             READY   STATUS      RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
my-nginx-75897978cd-rd4m2        1/1     Running     0          3h25m   10.131.0.13   ip-10-0-134-251.us-east-2.compute.internal   <none>           <none>
win-webserver-79b64df8b9-chw7f   1/1     Running     0          3h26m   10.132.0.2    ip-10-0-29-113.us-east-2.compute.internal    <none>           <none>
[root@sgaoos aws]# oc exec my-nginx-75897978cd-rd4m2 curl 10.132.0.2
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:02:08 --:--:--     0curl: (7) Failed to connect to 10.132.0.2 port 80: Connection timed out
command terminated with exit code 7
```

3. Create another linux-webserver pod by editing the deployment. The new pod can still reach win-webserver.

```
[root@sgaoos aws]# oc get pod -o wide
NAME                             READY   STATUS      RESTARTS   AGE    IP            NODE                                         NOMINATED NODE   READINESS GATES
my-nginx-75897978cd-rd4m2        1/1     Running     0          4h     10.131.0.13   ip-10-0-134-251.us-east-2.compute.internal   <none>           <none>
my-nginx-75897978cd-s2ks8        1/1     Running     0          12m    10.128.2.11   ip-10-0-159-2.us-east-2.compute.internal     <none>           <none>
win-webserver-79b64df8b9-chw7f   1/1     Running     0          4h1m   10.132.0.2    ip-10-0-29-113.us-east-2.compute.internal    <none>           <none>

[root@sgaoos aws]# oc exec my-nginx-75897978cd-s2ks8 curl 10.132.0.2
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   158  100   158    0     0    731      0 --:--:-- --:--:-- --:--:--   731
<html><body><H1>Windows Container Web Server</H1><p>IP 10.132.0.2 callerCount 2 <p>IP 10.132.0.2 callerCount 30 <p>IP 10.132.0.2 callerCount 22 </body></html>
```


Perhaps something happened in the pod network during these 3 hours that broke connectivity from the Linux pod to the Windows pod. The same happens when the win-webserver pod accesses the linux-webserver pod.
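Because the failure only appeared after roughly 3 hours, one way to narrow down when connectivity breaks is to probe the Windows pod periodically and record the first failure. The following is a minimal Python sketch under that assumption. `first_failure_time` and `curl_probe` are hypothetical helpers, not part of `oc` or any OpenShift tooling, and the probe treats any non-zero curl exit code (such as exit code 7 above) as a failure.

```python
import subprocess
import time
from typing import Callable, Optional


def first_failure_time(
    probe: Callable[[], bool],
    interval_s: float,
    max_checks: int,
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Call probe() up to max_checks times, sleeping interval_s between calls.
    Return clock() at the first failed probe, or None if every probe succeeds."""
    for i in range(max_checks):
        if not probe():
            return clock()
        if i < max_checks - 1:
            sleep(interval_s)
    return None


def curl_probe(pod: str, ip: str) -> Callable[[], bool]:
    """Build a probe that curls `ip` from inside `pod` via `oc exec`;
    it succeeds only if curl exits with status 0 within 10 seconds."""
    def probe() -> bool:
        result = subprocess.run(
            ["oc", "exec", pod, "--", "curl", "-s", "--max-time", "10", ip],
            capture_output=True,
        )
        return result.returncode == 0
    return probe
```

For example, `first_failure_time(curl_probe("my-nginx-75897978cd-rd4m2", "10.132.0.2"), interval_s=300, max_checks=48)` would check every 5 minutes for roughly 4 hours. Running the same probe from a newly created Linux pod in parallel could help show whether the breakage is specific to the older pod, as step 3 suggests.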

