Bug 1971669

Summary: OCP IPI Publish Internal - GCP: Load Balancer service with External Traffic Policy as Local is not working
Product: OpenShift Container Platform Reporter: Pamela Escorza <pescorza>
Component: Networking Assignee: Andrew Stoycos <astoycos>
Networking sub component: openshift-sdn QA Contact: huirwang
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: medium CC: aconstan, astoycos, bbennett
Version: 4.6 Flags: pescorza: needinfo? (astoycos)
Target Milestone: ---   
Target Release: 4.7.z   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-09-01 18:23:54 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On: 1959711    
Bug Blocks:    

Description Pamela Escorza 2021-06-14 14:22:04 UTC
Description of problem:
A LoadBalancer service with External Traffic Policy set to Local does not work on an OCP IPI cluster published as Internal on Google Cloud Platform.
Once an EgressNetworkPolicy is configured, the service can no longer be reached.
This bug was opened to confirm whether this behavior is expected (and the configuration is therefore unsupported) or whether there is a flaw in the configuration.
 
Service Configuration:

Load Balancer service with External Traffic Policy as Local:

Name:                     phphw-lb-internal-local
Namespace:                php-hw
Labels:                   app=php-hw
Annotations:              cloud.google.com/load-balancer-type: Internal
Selector:                 deploymentconfig=php-hw
Type:                     LoadBalancer
IP:                       172.30.45.164
LoadBalancer Ingress:     10.17.0.33
Port:                     8080-tcp  8080/TCP
TargetPort:               8080/TCP
NodePort:                 8080-tcp  32249/TCP
Endpoints:                10.153.2.6:8080
Port:                     8443-tcp  8443/TCP
TargetPort:               8443/TCP
NodePort:                 8443-tcp  32581/TCP
Endpoints:                10.153.2.6:8443
Session Affinity:         None
External Traffic Policy:  Local
HealthCheck NodePort:     32418
Events:                   <none>
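
For anyone trying to recreate this service, the `oc describe` output above can be turned back into a manifest. The following is a minimal sketch, not the reporter's actual manifest: the function name `build_service_manifest` is hypothetical, and the cluster-assigned fields (cluster IP, node ports, health check node port) are deliberately left out so the cluster allocates them.

```python
import json


def build_service_manifest():
    """Return a Service manifest mirroring the `oc describe` output in this report.

    Cluster-assigned values (clusterIP, nodePort, healthCheckNodePort) are
    omitted on purpose; only the fields a user would set are included.
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": "phphw-lb-internal-local",
            "namespace": "php-hw",
            "labels": {"app": "php-hw"},
            # GCP annotation requesting an internal load balancer.
            "annotations": {"cloud.google.com/load-balancer-type": "Internal"},
        },
        "spec": {
            "type": "LoadBalancer",
            # "Local" preserves the client source IP and only routes to
            # nodes that host an endpoint of the service.
            "externalTrafficPolicy": "Local",
            "selector": {"deploymentconfig": "php-hw"},
            "ports": [
                {"name": "8080-tcp", "port": 8080, "targetPort": 8080, "protocol": "TCP"},
                {"name": "8443-tcp", "port": 8443, "targetPort": 8443, "protocol": "TCP"},
            ],
        },
    }


if __name__ == "__main__":
    # JSON is accepted by `oc apply -f -` just like YAML.
    print(json.dumps(build_service_manifest(), indent=2))
```

Saved under a hypothetical name such as `service.py`, the output could be piped into `oc apply -f -` on a test cluster.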

Egress Network Policy applied:

$ oc describe egressnetworkpolicy -n php-hw 
Name:   default-rules
Namespace:  php-hw
Created:  2 days ago
Labels:   <none>
Annotations:  <none>
Rule:   Allow to access.redhat.com
Rule:   Allow to 10.17.0.3/32
Rule:   Deny to 0.0.0.0/0
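
EgressNetworkPolicy rules are evaluated in order and the first match applies. The sketch below only illustrates how the three rules above would classify a given destination; it is not how openshift-sdn implements the policy, and whether the policy is applied to reply packets of an inbound load balancer connection is exactly the open question in this bug. The names `RULES` and `evaluate` are hypothetical, and the fallback to "Allow" when no rule matches is an assumption of this sketch.

```python
import ipaddress

# Rules copied from the `oc describe egressnetworkpolicy` output, in order.
# The DNS-name rule is kept as a plain string; this sketch does no DNS
# resolution and only matches a destination given by that same name.
RULES = [
    ("Allow", "access.redhat.com"),
    ("Allow", "10.17.0.3/32"),
    ("Deny", "0.0.0.0/0"),
]


def evaluate(destination, rules=RULES):
    """Return the action of the first rule matching `destination`.

    Assumption: traffic that matches no rule is allowed.
    """
    for action, target in rules:
        try:
            network = ipaddress.ip_network(target)
        except ValueError:
            # Target is not a CIDR, so treat it as a DNS-name rule.
            if destination == target:
                return action
            continue
        try:
            if ipaddress.ip_address(destination) in network:
                return action
        except ValueError:
            # Destination is a hostname; it cannot match a CIDR rule here.
            continue
    return "Allow"


if __name__ == "__main__":
    for dest in ("10.17.0.3", "10.17.0.2", "access.redhat.com"):
        print(dest, evaluate(dest))
```

Under these rules the bastion address 10.17.0.2 used for testing falls through to the final deny rule, while 10.17.0.3 is explicitly allowed.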

Information about VM used for testing:

[dopleg@instance-bastion-17 netpol]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc mq state UP group default qlen 1000
    link/ether 42:01:0a:11:00:02 brd ff:ff:ff:ff:ff:ff
    inet 10.17.0.2/32 brd 10.17.0.2 scope global noprefixroute dynamic eth0
       valid_lft 51687sec preferred_lft 51687sec
    inet6 fe80::fb0:6025:85f3:51da/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

Testing:
[dopleg@instance-bastion-17 netpol]$ oc get service -n php-hw
NAME                        TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                         AGE
php-hw                      ClusterIP      172.30.214.192   <none>        8080/TCP,8443/TCP               2d8h
phphw-lb-internal-cluster   LoadBalancer   172.30.196.87    10.17.0.34    8080:32183/TCP,8443:32597/TCP   2d5h
phphw-lb-internal-local     LoadBalancer   172.30.45.164    10.17.0.33    8080:32249/TCP,8443:32581/TCP   2d5h

[dopleg@instance-bastion-17 netpol]$ curl -I 10.17.0.33:8080
curl: (7) Failed connect to 10.17.0.33:8080; Connection timed out

According to the packet capture, the traffic is blocked at the endpoint:

$ tshark -r ocp-int-79462-worker-a-b6zzj.c.pescorza-tam-ocp-cee.internal_11_06_2021-13_53_53-UTC.pcap
    1   0.000000    10.17.0.2 → 10.153.2.6   74 0.000000000 0 0 0 0 28400 47350 → 8080 [SYN] Seq=0 Win=28400 Len=0 MSS=1420 SACK_PERM=1 TSval=256391574 TSecr=0 WS=128
    2   0.000082   10.153.2.6 → 10.17.0.2    74 0.000082000 0 0 0 1 27160 8080 → 47350 [SYN, ACK] Seq=0 Ack=1 Win=27160 Len=0 MSS=1370 SACK_PERM=1 TSval=305556644 TSecr=256391574 WS=128
    3   1.000195    10.17.0.2 → 10.153.2.6   74 1.000113000 0 0 0 0 28400 [TCP Retransmission] 47350 → 8080 [SYN] Seq=0 Win=28400 Len=0 MSS=1420 SACK_PERM=1 TSval=256392576 TSecr=0 WS=128
    4   1.000225   10.153.2.6 → 10.17.0.2    74 0.000030000 0 0 0 1 27160 [TCP Retransmission] 8080 → 47350 [SYN, ACK] Seq=0 Ack=1 Win=27160 Len=0 MSS=1370 SACK_PERM=1 TSval=305557644 TSecr=256391574 WS=128
    5   2.021554   10.153.2.6 → 10.17.0.2    74 1.021329000 0 0 0 1 27160 [TCP Retransmission] 8080 → 47350 [SYN, ACK] Seq=0 Ack=1 Win=27160 Len=0 MSS=1370 SACK_PERM=1 TSval=305558666 TSecr=256391574 WS=128
    6   3.004163    10.17.0.2 → 10.153.2.6   74 0.982609000 0 0 0 0 28400 [TCP Retransmission] 47350 → 8080 [SYN] Seq=0 Win=28400 Len=0 MSS=1420 SACK_PERM=1 TSval=256394580 TSecr=0 WS=128
    7   3.004191   10.153.2.6 → 10.17.0.2    74 0.000028000 0 0 0 1 27160 [TCP Retransmission] 8080 → 47350 [SYN, ACK] Seq=0 Ack=1 Win=27160 Len=0 MSS=1370 SACK_PERM=1 TSval=305559648 TSecr=256391574 WS=128
    8   5.030521   10.153.2.6 → 10.17.0.2    74 2.026330000 0 0 0 1 27160 [TCP Retransmission] 8080 → 47350 [SYN, ACK] Seq=0 Ack=1 Win=27160 Len=0 MSS=1370 SACK_PERM=1 TSval=305561674 TSecr=256391574 WS=128
    9   5.157547 0a:58:0a:99:02:06 → 8e:f7:82:f0:47:2e 42       Who has 10.153.2.1? Tell 10.153.2.6
   10   5.157794 8e:f7:82:f0:47:2e → 0a:58:0a:99:02:06 42       Who has 10.153.2.6? Tell 10.153.2.1
   11   5.157806 0a:58:0a:99:02:06 → 8e:f7:82:f0:47:2e 42       10.153.2.6 is at 0a:58:0a:99:02:06
   12   5.157808 8e:f7:82:f0:47:2e → 0a:58:0a:99:02:06 42       10.153.2.1 is at 8e:f7:82:f0:47:2e
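
The capture above can be summarized by counting TCP flags. The following is a rough sketch under stated assumptions: the `CAPTURE` lines are abbreviated from the tshark output (only addresses, ports, and flags are kept), and `summarize_handshake` is a hypothetical helper, not a tshark feature.

```python
import re

# Matches bracketed TCP flag lists such as [SYN] or [SYN, ACK]. Annotations
# like [TCP Retransmission] contain lowercase letters and are not matched.
FLAGS = re.compile(r"\[([A-Z, ]+)\]")

# Abbreviated from packets 1-8 of the tshark output in this report.
CAPTURE = [
    "10.17.0.2 → 10.153.2.6 47350 → 8080 [SYN]",
    "10.153.2.6 → 10.17.0.2 8080 → 47350 [SYN, ACK]",
    "10.17.0.2 → 10.153.2.6 [TCP Retransmission] 47350 → 8080 [SYN]",
    "10.153.2.6 → 10.17.0.2 [TCP Retransmission] 8080 → 47350 [SYN, ACK]",
    "10.153.2.6 → 10.17.0.2 [TCP Retransmission] 8080 → 47350 [SYN, ACK]",
    "10.17.0.2 → 10.153.2.6 [TCP Retransmission] 47350 → 8080 [SYN]",
    "10.153.2.6 → 10.17.0.2 [TCP Retransmission] 8080 → 47350 [SYN, ACK]",
    "10.153.2.6 → 10.17.0.2 [TCP Retransmission] 8080 → 47350 [SYN, ACK]",
]


def summarize_handshake(lines):
    """Count SYN, SYN-ACK, and bare ACK segments in tshark summary lines."""
    counts = {"SYN": 0, "SYN, ACK": 0, "ACK": 0}
    for line in lines:
        for flags in FLAGS.findall(line):
            if flags in counts:
                counts[flags] += 1
    # A three-way handshake needs a final bare ACK from the client.
    counts["completed"] = counts["ACK"] > 0
    return counts


if __name__ == "__main__":
    print(summarize_handshake(CAPTURE))
```

The client keeps retransmitting its SYN and never sends the final ACK, which is consistent with the SYN, ACK replies from the pod not reaching 10.17.0.2. The capture alone does not show where those replies are dropped.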

Version-Release number of selected component (if applicable):
OpenShift 4.6 IPI Publish Internal - Google Cloud Platform

How reproducible:
It's reproducible by  

Steps to Reproduce:
1. Deploy an IPI OCP 4.6 cluster published as Internal on GCP
2. Deploy an application
3. Configure a LoadBalancer service with External Traffic Policy set to Local to reach the application
4. Verify whether the service is reachable
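
The reachability check in the last step can also be scripted as a plain TCP connect, roughly what the `curl` test earlier in this report exercises before any HTTP is exchanged. This is a minimal sketch; `is_reachable` is a hypothetical helper and the 5-second default timeout is an arbitrary choice.

```python
import socket


def is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False
```

From the bastion VM, `is_reachable("10.17.0.33", 8080)` would be expected to return `False` while the problem is present, matching the `curl` timeout reported above.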

Actual results:
Connection timed out when requesting the load balancer service

Expected results:
The request to the load balancer service completes successfully

Additional info:

Comment 1 Andrew Stoycos 2021-06-14 18:25:09 UTC
Hi there, 

Can you please provide a must-gather from the cluster while the issue is reproducible? 

Thanks,
Andrew

Comment 4 Pamela Escorza 2021-07-01 15:18:33 UTC
Hi @astoycos, any update on this issue?

Comment 16 errata-xmlrpc 2021-09-01 18:23:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.7.28 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3262