Bug 1579649

Summary:	[3.5][OpenStack+cloudprovider disabled] cannot wake up the resources after idling service
Product:	OpenShift Container Platform	Reporter:	Hongan Li <hongli>
Component:	Networking	Assignee:	Ben Bennett <bbennett>
Networking sub component:	router	QA Contact:	zhaozhanqi <zzhao>
Status:	CLOSED CURRENTRELEASE	Docs Contact:
Severity:	medium
Priority:	medium	CC:	aos-bugs, bbennett, bmeng, zzhao
Version:	3.5.1	Keywords:	Reopened
Target Milestone:	---
Target Release:	3.5.z
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:	1567043	Environment:
Last Closed:	2018-05-22 19:37:24 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1567043, 1579652
Bug Blocks:

Description Hongan Li 2018-05-18 05:26:00 UTC

+++ This bug was initially created as a clone of Bug #1567043 +++

Description of problem:
Resources cannot be woken up after the service is idled.

Version-Release number of selected component (if applicable):
openshift v3.4.1.44.52
kubernetes v1.4.0+776c994

How reproducible:
always

Steps to Reproduce:
1. Create the rc (pod, svc)
# oc create -f  https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/routing/list_for_caddy.json

2. Idle the service
# oc idle service-unsecure -n lha
Marked service lha/service-unsecure to unidle resource ReplicationController lha/caddy-rc (unidle to 2 replicas)
Idled ReplicationController lha/caddy-rc (dry run)

Note: tried both "--dry-run=false" and "--dry-run=true", but the output above always shows "(dry run)".

3. Generate some traffic to un-idle the service 
# curl 172.30.202.54:27017
curl: (7) Failed connect to 172.30.202.54:27017; No route to host


Actual results:
The resource cannot be woken up, and the iptables rules are incorrect after idling: no random port is opened for the idled service, and the output below contains no rule for lha/service-unsecure at all.

[root@host-8-242-109 ~]# iptables-save | grep lha
-A KUBE-SERVICES -d 172.30.214.106/32 -p tcp -m comment --comment "lha/service-secure:https cluster IP" -m tcp --dport 27443 -j KUBE-SVC-P6NT6I2XSZW2EWVD
-A KUBE-SERVICES -d 172.30.214.106/32 -p tcp -m comment --comment "lha/service-secure:https has no endpoints" -m tcp --dport 27443 -j REJECT --reject-with icmp-port-unreachable
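
A minimal sketch of how the check above could be scripted, not a supported tool. It assumes that a correctly idled service should have KUBE-PORTALS-CONTAINER or KUBE-PORTALS-HOST DNAT rules, as seen in the working case reported later in this bug. The function name idle_rule_state and its three labels (portal, no-portal, missing) are hypothetical.

```shell
#!/usr/bin/env bash
# Classify a service from `iptables-save` output read on stdin.
# Argument: namespace/name of the service, e.g. lha/service-unsecure.
# Prints one of (hypothetical labels):
#   portal     a KUBE-PORTALS-* DNAT rule exists for the service
#   no-portal  rules mention the service, but none is a portal DNAT rule
#   missing    no rule mentions the service at all
idle_rule_state() {
    local svc="$1" rules
    # Keep only rules whose comment starts with "<namespace>/<name>:".
    rules="$(grep -F -- "--comment \"${svc}:" || true)"
    if printf '%s\n' "$rules" \
        | grep -Eq -- '^-A KUBE-PORTALS-(CONTAINER|HOST) .*-j DNAT'; then
        echo "portal"
    elif [ -z "$rules" ]; then
        echo "missing"
    else
        echo "no-portal"
    fi
}
```

Usage: iptables-save | idle_rule_state lha/service-unsecure. For the failing output above this prints "missing", because no rule mentions lha/service-unsecure.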


Expected results:
The resource should be woken up when the service receives traffic.

Additional info:
The iptables rules look correct before idling; see below.

# oc get svc -n lha
NAME               CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
service-secure     172.30.214.106   <none>        27443/TCP   20s
service-unsecure   172.30.202.54    <none>        27017/TCP   20s
[root@host-8-242-109 ~]# curl 172.30.202.54:27017
Hello-OpenShift-1 http-8080
[root@host-8-242-109 ~]# 
[root@host-8-242-109 ~]# iptables-save | grep lha
-A KUBE-SEP-GFQRI2E3EIJELQBB -s 10.130.0.17/32 -m comment --comment "lha/service-secure:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-GFQRI2E3EIJELQBB -p tcp -m comment --comment "lha/service-secure:https" -m tcp -j DNAT --to-destination 10.130.0.17:8443
-A KUBE-SEP-VFVCTHYVGKJKI5D5 -s 10.130.0.17/32 -m comment --comment "lha/service-unsecure:http" -j KUBE-MARK-MASQ
-A KUBE-SEP-VFVCTHYVGKJKI5D5 -p tcp -m comment --comment "lha/service-unsecure:http" -m tcp -j DNAT --to-destination 10.130.0.17:8080
-A KUBE-SEP-VGWZGBHRIB24XXKZ -s 10.128.0.20/32 -m comment --comment "lha/service-unsecure:http" -j KUBE-MARK-MASQ
-A KUBE-SEP-VGWZGBHRIB24XXKZ -p tcp -m comment --comment "lha/service-unsecure:http" -m tcp -j DNAT --to-destination 10.128.0.20:8080
-A KUBE-SEP-XKULLKEY4RDFTDQL -s 10.128.0.20/32 -m comment --comment "lha/service-secure:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-XKULLKEY4RDFTDQL -p tcp -m comment --comment "lha/service-secure:https" -m tcp -j DNAT --to-destination 10.128.0.20:8443
-A KUBE-SERVICES -d 172.30.202.54/32 -p tcp -m comment --comment "lha/service-unsecure:http cluster IP" -m tcp --dport 27017 -j KUBE-SVC-CQEG2R4O4IX66RKH
-A KUBE-SERVICES -d 172.30.214.106/32 -p tcp -m comment --comment "lha/service-secure:https cluster IP" -m tcp --dport 27443 -j KUBE-SVC-P6NT6I2XSZW2EWVD
-A KUBE-SVC-CQEG2R4O4IX66RKH -m comment --comment "lha/service-unsecure:http" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-VGWZGBHRIB24XXKZ
-A KUBE-SVC-CQEG2R4O4IX66RKH -m comment --comment "lha/service-unsecure:http" -j KUBE-SEP-VFVCTHYVGKJKI5D5
-A KUBE-SVC-P6NT6I2XSZW2EWVD -m comment --comment "lha/service-secure:https" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-XKULLKEY4RDFTDQL
-A KUBE-SVC-P6NT6I2XSZW2EWVD -m comment --comment "lha/service-secure:https" -j KUBE-SEP-GFQRI2E3EIJELQBB

--- Additional comment from Ben Bennett on 2018-04-28 02:58:10 CST ---

Idling was tech preview in 3.4.

We are tracking a later idling bug with https://bugzilla.redhat.com/show_bug.cgi?id=1562184 and it is probably the same root cause, but we aren't going to backport to 3.4 anyway.

--- Additional comment from hongli on 2018-05-14 11:22:19 CST ---

The issue looks related to the cloudprovider being disabled; the problem cannot be reproduced in 3.4.1.44.53 on OpenStack with the cloudprovider enabled.

[root@host-8-243-47 ~]# oc get svc
NAME               CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
service-secure     172.30.250.212   <none>        27443/TCP   20s
service-unsecure   172.30.114.124   <none>        27017/TCP   20s
[root@host-8-243-47 ~]# oc idle service-unsecure
Marked service lha/service-unsecure to unidle resource ReplicationController lha/caddy-rc (unidle to 2 replicas)
Idled ReplicationController lha/caddy-rc (dry run)
[root@host-8-243-47 ~]# 
[root@host-8-243-47 ~]# 
[root@host-8-243-47 ~]# iptables-save | grep lha
-A KUBE-PORTALS-CONTAINER -d 172.30.114.124/32 -p tcp -m comment --comment "lha/service-unsecure:http" -m tcp --dport 27017 -j DNAT --to-destination 172.16.120.79:40540
-A KUBE-PORTALS-HOST -d 172.30.114.124/32 -p tcp -m comment --comment "lha/service-unsecure:http" -m tcp --dport 27017 -j DNAT --to-destination 172.16.120.79:40540
-A KUBE-SERVICES -d 172.30.250.212/32 -p tcp -m comment --comment "lha/service-secure:https cluster IP" -m tcp --dport 27443 -j KUBE-SVC-P6NT6I2XSZW2EWVD
-A KUBE-SERVICES -d 172.30.250.212/32 -p tcp -m comment --comment "lha/service-secure:https has no endpoints" -m tcp --dport 27443 -j REJECT --reject-with icmp-port-unreachable
[root@host-8-243-47 ~]# 
[root@host-8-243-47 ~]# curl 172.30.114.124:27017
Hello-OpenShift-1 http-8080
[root@host-8-243-47 ~]# 
[root@host-8-243-47 ~]# 
[root@host-8-243-47 ~]# oc version
oc v3.4.1.44.53
kubernetes v1.4.0+776c994
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://host-8-243-47.host.centralci.eng.rdu2.redhat.com:8443
openshift v3.4.1.44.53
kubernetes v1.4.0+776c994
[root@host-8-243-47 ~]# 
[root@host-8-243-47 ~]#
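
In the working session above, the idled service is redirected by a KUBE-PORTALS-HOST DNAT rule to 172.16.120.79:40540, presumably the randomly chosen port that the description refers to. The following is a rough sketch for extracting that target from iptables-save output; the function name portal_target is hypothetical, and the rule layout is assumed to match the lines shown above.

```shell
#!/usr/bin/env bash
# Print the DNAT target (host:port) of the KUBE-PORTALS-HOST rule for a
# service, reading `iptables-save` output on stdin.
# Argument: namespace/name of the service, e.g. lha/service-unsecure.
# Prints nothing when no such rule exists.
portal_target() {
    local svc="$1"
    # Select the service's rules, then pull out the --to-destination value.
    grep -F -- "--comment \"${svc}:" \
        | sed -n 's/^-A KUBE-PORTALS-HOST .*-j DNAT --to-destination \([^ ]*\).*$/\1/p' \
        | head -n 1
}
```

Usage: iptables-save | portal_target lha/service-unsecure. An empty result would correspond to the failing case, where no port is opened for the idled service.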

--- Additional comment from hongli on 2018-05-14 16:07:07 CST ---

Did more testing and narrowed the reproducing condition down to OCP on "OpenStack + Cloudprovider disabled".

Comment 1 Hongan Li 2018-05-18 05:29:28 UTC

The same issue can be reproduced in OCP v3.5.5.31.67.

Comment 2 Ben Bennett 2018-05-22 19:37:24 UTC

Idling was tech preview in 3.4, 3.5, and 3.6.

Closing this since it is fixed in the current releases.