Bug 1579649

Summary: [3.5][OpenStack+cloudprovider disabled] cannot wake up the resources after idling service
Product: OpenShift Container Platform    Reporter: Hongan Li <hongli>
Component: Networking    Assignee: Ben Bennett <bbennett>
Networking sub component: router    QA Contact: zhaozhanqi <zzhao>
Status: CLOSED CURRENTRELEASE    Docs Contact:
Severity: medium
Priority: medium    CC: aos-bugs, bbennett, bmeng, zzhao
Version: 3.5.1    Keywords: Reopened
Target Milestone: ---   
Target Release: 3.5.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Clone Of: 1567043 Environment:
Last Closed: 2018-05-22 19:37:24 UTC Type: Bug
Bug Depends On: 1567043, 1579652    
Bug Blocks:    

Description Hongan Li 2018-05-18 05:26:00 UTC
+++ This bug was initially created as a clone of Bug #1567043 +++

Description of problem:
Cannot wake up the resources after idling the service.

Version-Release number of selected component (if applicable):
openshift v3.4.1.44.52
kubernetes v1.4.0+776c994

How reproducible:
always

Steps to Reproduce:
1. Create the RC (pod, svc)
# oc create -f  https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/routing/list_for_caddy.json

2. Idle the service
# oc idle service-unsecure -n lha
Marked service lha/service-unsecure to unidle resource ReplicationController lha/caddy-rc (unidle to 2 replicas)
Idled ReplicationController lha/caddy-rc (dry run)

Note: tried both "--dry-run=false" and "--dry-run=true", but the output above always shows "(dry run)".

3. Generate some traffic to un-idle the service 
# curl 172.30.202.54:27017
curl: (7) Failed connect to 172.30.202.54:27017; No route to host
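A single curl cannot tell a service that is still waking up apart from one that will never wake up. The following is a minimal sketch, assuming a POSIX shell; the `retry` function name, the attempt count and the delay are illustrative choices and are not part of `oc` or the original report.

```shell
#!/bin/sh
# Hypothetical helper (not part of oc): run a command up to ATTEMPTS times,
# sleeping DELAY seconds between failed attempts. Returns 0 on the first
# success, or 1 if every attempt failed.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final attempt.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example (assumed values): up to 10 tries, 3 seconds apart.
# retry 10 3 curl -s --max-time 5 172.30.202.54:27017
```

In the working case one of the attempts would be expected to return `Hello-OpenShift-1 http-8080`; in the failing case every attempt keeps ending with `No route to host`.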


Actual results:
Cannot wake up the resource, and the iptables rules are not correct after idling (no random port is opened for the idled service).

[root@host-8-242-109 ~]# iptables-save | grep lha
-A KUBE-SERVICES -d 172.30.214.106/32 -p tcp -m comment --comment "lha/service-secure:https cluster IP" -m tcp --dport 27443 -j KUBE-SVC-P6NT6I2XSZW2EWVD
-A KUBE-SERVICES -d 172.30.214.106/32 -p tcp -m comment --comment "lha/service-secure:https has no endpoints" -m tcp --dport 27443 -j REJECT --reject-with icmp-port-unreachable
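To compare the broken and working cases mechanically, the grep can be extended into a small classifier. This is a sketch under assumptions: the function name `classify_svc_rules` and its output labels are made up for illustration, and it relies only on the rule shapes visible in this report (the `KUBE-PORTALS-CONTAINER` DNAT rule that appears for the idled service in the cloud-provider-enabled run, the `has no endpoints` REJECT rule, and the `KUBE-SVC-*` jump for a service that has endpoints).

```shell
#!/bin/sh
# Hypothetical helper: classify how one service is wired in `iptables-save`
# output read from stdin. $1 is "namespace/service", e.g. lha/service-unsecure.
# Prints one of:
#   unidle-proxy  - KUBE-PORTALS-CONTAINER DNAT rule present (idled, can wake up)
#   reject        - "has no endpoints" REJECT rule present
#   endpoints     - jump to a KUBE-SVC-* chain (service has endpoints)
#   none          - no rule mentions the service at all
classify_svc_rules() {
    # Rule comments look like "lha/service-unsecure:http", so match on the
    # leading quote plus the trailing colon to avoid prefix collisions.
    rules=$(grep -F -- "\"$1:" || true)
    if printf '%s\n' "$rules" | grep -q -- '-A KUBE-PORTALS-CONTAINER .*-j DNAT'; then
        echo unidle-proxy
    elif printf '%s\n' "$rules" | grep -q -- 'has no endpoints'; then
        echo reject
    elif printf '%s\n' "$rules" | grep -q -- '-j KUBE-SVC-'; then
        echo endpoints
    else
        echo none
    fi
}

# Example:
# iptables-save | classify_svc_rules lha/service-unsecure
```

Applied to the listings in this report, `lha/service-unsecure` would classify as `endpoints` before idling, `none` after idling in the failing case, and `unidle-proxy` after idling in the cloud-provider-enabled case.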


Expected results:
The resource should wake up when the service receives traffic.

Additional info:
The iptables rules look good before idling; see below:

# oc get svc -n lha
NAME               CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
service-secure     172.30.214.106   <none>        27443/TCP   20s
service-unsecure   172.30.202.54    <none>        27017/TCP   20s
[root@host-8-242-109 ~]# curl 172.30.202.54:27017
Hello-OpenShift-1 http-8080
[root@host-8-242-109 ~]# iptables-save | grep lha
-A KUBE-SEP-GFQRI2E3EIJELQBB -s 10.130.0.17/32 -m comment --comment "lha/service-secure:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-GFQRI2E3EIJELQBB -p tcp -m comment --comment "lha/service-secure:https" -m tcp -j DNAT --to-destination 10.130.0.17:8443
-A KUBE-SEP-VFVCTHYVGKJKI5D5 -s 10.130.0.17/32 -m comment --comment "lha/service-unsecure:http" -j KUBE-MARK-MASQ
-A KUBE-SEP-VFVCTHYVGKJKI5D5 -p tcp -m comment --comment "lha/service-unsecure:http" -m tcp -j DNAT --to-destination 10.130.0.17:8080
-A KUBE-SEP-VGWZGBHRIB24XXKZ -s 10.128.0.20/32 -m comment --comment "lha/service-unsecure:http" -j KUBE-MARK-MASQ
-A KUBE-SEP-VGWZGBHRIB24XXKZ -p tcp -m comment --comment "lha/service-unsecure:http" -m tcp -j DNAT --to-destination 10.128.0.20:8080
-A KUBE-SEP-XKULLKEY4RDFTDQL -s 10.128.0.20/32 -m comment --comment "lha/service-secure:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-XKULLKEY4RDFTDQL -p tcp -m comment --comment "lha/service-secure:https" -m tcp -j DNAT --to-destination 10.128.0.20:8443
-A KUBE-SERVICES -d 172.30.202.54/32 -p tcp -m comment --comment "lha/service-unsecure:http cluster IP" -m tcp --dport 27017 -j KUBE-SVC-CQEG2R4O4IX66RKH
-A KUBE-SERVICES -d 172.30.214.106/32 -p tcp -m comment --comment "lha/service-secure:https cluster IP" -m tcp --dport 27443 -j KUBE-SVC-P6NT6I2XSZW2EWVD
-A KUBE-SVC-CQEG2R4O4IX66RKH -m comment --comment "lha/service-unsecure:http" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-VGWZGBHRIB24XXKZ
-A KUBE-SVC-CQEG2R4O4IX66RKH -m comment --comment "lha/service-unsecure:http" -j KUBE-SEP-VFVCTHYVGKJKI5D5
-A KUBE-SVC-P6NT6I2XSZW2EWVD -m comment --comment "lha/service-secure:https" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-XKULLKEY4RDFTDQL
-A KUBE-SVC-P6NT6I2XSZW2EWVD -m comment --comment "lha/service-secure:https" -j KUBE-SEP-GFQRI2E3EIJELQBB

--- Additional comment from Ben Bennett on 2018-04-28 02:58:10 CST ---

Idling was tech preview in 3.4.

We are tracking a later idling bug with https://bugzilla.redhat.com/show_bug.cgi?id=1562184 and it is probably the same root cause, but we aren't going to backport to 3.4 anyway.

--- Additional comment from hongli on 2018-05-14 11:22:19 CST ---

It looks like the issue is related to the cloud provider being disabled: the problem cannot be reproduced in 3.4.1.44.53 on OpenStack with the cloud provider enabled.

[root@host-8-243-47 ~]# oc get svc
NAME               CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
service-secure     172.30.250.212   <none>        27443/TCP   20s
service-unsecure   172.30.114.124   <none>        27017/TCP   20s
[root@host-8-243-47 ~]# oc idle service-unsecure
Marked service lha/service-unsecure to unidle resource ReplicationController lha/caddy-rc (unidle to 2 replicas)
Idled ReplicationController lha/caddy-rc (dry run)
[root@host-8-243-47 ~]# iptables-save | grep lha
-A KUBE-PORTALS-CONTAINER -d 172.30.114.124/32 -p tcp -m comment --comment "lha/service-unsecure:http" -m tcp --dport 27017 -j DNAT --to-destination 172.16.120.79:40540
-A KUBE-PORTALS-HOST -d 172.30.114.124/32 -p tcp -m comment --comment "lha/service-unsecure:http" -m tcp --dport 27017 -j DNAT --to-destination 172.16.120.79:40540
-A KUBE-SERVICES -d 172.30.250.212/32 -p tcp -m comment --comment "lha/service-secure:https cluster IP" -m tcp --dport 27443 -j KUBE-SVC-P6NT6I2XSZW2EWVD
-A KUBE-SERVICES -d 172.30.250.212/32 -p tcp -m comment --comment "lha/service-secure:https has no endpoints" -m tcp --dport 27443 -j REJECT --reject-with icmp-port-unreachable
[root@host-8-243-47 ~]# curl 172.30.114.124:27017
Hello-OpenShift-1 http-8080
[root@host-8-243-47 ~]# oc version
oc v3.4.1.44.53
kubernetes v1.4.0+776c994
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://host-8-243-47.host.centralci.eng.rdu2.redhat.com:8443
openshift v3.4.1.44.53
kubernetes v1.4.0+776c994

--- Additional comment from hongli on 2018-05-14 16:07:07 CST ---

Did more testing and narrowed the reproduction condition down to OCP on "OpenStack + cloud provider disabled".

Comment 1 Hongan Li 2018-05-18 05:29:28 UTC
Can reproduce the same issue in OCP v3.5.5.31.67.

Comment 2 Ben Bennett 2018-05-22 19:37:24 UTC
Idling was tech preview in 3.4, 3.5, and 3.6.

Closing this since it is fixed in the current releases.