1579649 – [3.5][OpenStack+cloudprovider disabled] cannot wake up the resources after idling service

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	3.5.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	3.5.z
Assignee:	Ben Bennett
QA Contact:	zhaozhanqi
Docs Contact:
URL:
Whiteboard:
Depends On:	1567043 1579652
Blocks:

Reported:	2018-05-18 05:26 UTC by Hongan Li
Modified:	2022-08-04 22:20 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	1567043
Environment:
Last Closed:	2018-05-22 19:37:24 UTC
Target Upstream Version:
Embargoed:

Description Hongan Li 2018-05-18 05:26:00 UTC

+++ This bug was initially created as a clone of Bug #1567043 +++

Description of problem:
Resources cannot be woken up after the service is idled.

Version-Release number of selected component (if applicable):
openshift v3.4.1.44.52
kubernetes v1.4.0+776c994

How reproducible:
always

Steps to Reproduce:
1. Create the rc (pod, svc)
# oc create -f  https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/routing/list_for_caddy.json

2. Idle the service
# oc idle service-unsecure -n lha
Marked service lha/service-unsecure to unidle resource ReplicationController lha/caddy-rc (unidle to 2 replicas)
Idled ReplicationController lha/caddy-rc (dry run)

Note: tried both "--dry-run=false" and "--dry-run=true", but the output above always shows "(dry run)".

3. Generate some traffic to un-idle the service 
# curl 172.30.202.54:27017
curl: (7) Failed connect to 172.30.202.54:27017; No route to host
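
Because the `oc idle` output in step 2 always reports "(dry run)", it can help to confirm separately whether the idle request was actually recorded. The following is a minimal sketch, not part of the original report. It assumes that idling is recorded as the annotations `idling.alpha.openshift.io/idled-at` and `idling.alpha.openshift.io/unidle-targets` on the endpoints object, and that the targets annotation holds a JSON list; verify both against your release. The function name `idle_status` is hypothetical.

```python
import json

# Assumed annotation keys written by idling; check them against your release.
IDLED_AT = "idling.alpha.openshift.io/idled-at"
UNIDLE_TARGETS = "idling.alpha.openshift.io/unidle-targets"


def idle_status(endpoints_json):
    """Summarize idling annotations from one endpoints object given as JSON text."""
    obj = json.loads(endpoints_json)
    annotations = obj.get("metadata", {}).get("annotations") or {}
    # The targets annotation is assumed to be a JSON-encoded list of scale targets.
    targets = json.loads(annotations.get(UNIDLE_TARGETS, "[]"))
    return {"idled": IDLED_AT in annotations, "targets": targets}
```

The input would be the text printed by `oc get endpoints service-unsecure -n lha -o json`. If `idled` is true and the replication controller shows 0 replicas, the idle took effect despite the "(dry run)" message.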


Actual results:
The resource cannot be woken up, and the iptables rules are incorrect after idling (no random port is opened for the idled service).

[root@host-8-242-109 ~]# iptables-save | grep lha
-A KUBE-SERVICES -d 172.30.214.106/32 -p tcp -m comment --comment "lha/service-secure:https cluster IP" -m tcp --dport 27443 -j KUBE-SVC-P6NT6I2XSZW2EWVD
-A KUBE-SERVICES -d 172.30.214.106/32 -p tcp -m comment --comment "lha/service-secure:https has no endpoints" -m tcp --dport 27443 -j REJECT --reject-with icmp-port-unreachable
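
The check described above (whether a proxy port was opened for the idled service) can be sketched as a small script over `iptables-save` output. This is a minimal sketch, not part of the original report. The function name `idled_service_state` and its four state labels are hypothetical, and it assumes that a correctly idled service appears as a `KUBE-PORTALS-*` DNAT rule, as in the working environment shown later in this bug.

```python
def idled_service_state(iptables_save, service):
    """Classify how `service` ("namespace/name") appears in iptables-save text.

    Returns one of these hypothetical labels:
      "portal"  - a KUBE-PORTALS-* DNAT rule exists (a proxy port was opened)
      "reject"  - a "has no endpoints" REJECT rule exists
      "normal"  - the service jumps to a KUBE-SVC-* chain
      "missing" - no rule mentions the service at all
    """
    # Rule comments have the form "namespace/name:portname ...".
    marker = '--comment "' + service + ':'
    rules = [line for line in iptables_save.splitlines() if marker in line]

    if any(r.startswith("-A KUBE-PORTALS-") and "-j DNAT" in r for r in rules):
        return "portal"
    if any("has no endpoints" in r and "-j REJECT" in r for r in rules):
        return "reject"
    if any("-j KUBE-SVC-" in r for r in rules):
        return "normal"
    return "missing"
```

Fed the two rules above, `idled_service_state(output, "lha/service-unsecure")` returns "missing", because no rule mentions the idled service at all.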


Expected results:
The resource should wake up when the service receives traffic.

Additional info:
The iptables rules look good before idling; see below:

# oc get svc -n lha
NAME               CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
service-secure     172.30.214.106   <none>        27443/TCP   20s
service-unsecure   172.30.202.54    <none>        27017/TCP   20s
[root@host-8-242-109 ~]# curl 172.30.202.54:27017
Hello-OpenShift-1 http-8080
[root@host-8-242-109 ~]# 
[root@host-8-242-109 ~]# iptables-save | grep lha
-A KUBE-SEP-GFQRI2E3EIJELQBB -s 10.130.0.17/32 -m comment --comment "lha/service-secure:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-GFQRI2E3EIJELQBB -p tcp -m comment --comment "lha/service-secure:https" -m tcp -j DNAT --to-destination 10.130.0.17:8443
-A KUBE-SEP-VFVCTHYVGKJKI5D5 -s 10.130.0.17/32 -m comment --comment "lha/service-unsecure:http" -j KUBE-MARK-MASQ
-A KUBE-SEP-VFVCTHYVGKJKI5D5 -p tcp -m comment --comment "lha/service-unsecure:http" -m tcp -j DNAT --to-destination 10.130.0.17:8080
-A KUBE-SEP-VGWZGBHRIB24XXKZ -s 10.128.0.20/32 -m comment --comment "lha/service-unsecure:http" -j KUBE-MARK-MASQ
-A KUBE-SEP-VGWZGBHRIB24XXKZ -p tcp -m comment --comment "lha/service-unsecure:http" -m tcp -j DNAT --to-destination 10.128.0.20:8080
-A KUBE-SEP-XKULLKEY4RDFTDQL -s 10.128.0.20/32 -m comment --comment "lha/service-secure:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-XKULLKEY4RDFTDQL -p tcp -m comment --comment "lha/service-secure:https" -m tcp -j DNAT --to-destination 10.128.0.20:8443
-A KUBE-SERVICES -d 172.30.202.54/32 -p tcp -m comment --comment "lha/service-unsecure:http cluster IP" -m tcp --dport 27017 -j KUBE-SVC-CQEG2R4O4IX66RKH
-A KUBE-SERVICES -d 172.30.214.106/32 -p tcp -m comment --comment "lha/service-secure:https cluster IP" -m tcp --dport 27443 -j KUBE-SVC-P6NT6I2XSZW2EWVD
-A KUBE-SVC-CQEG2R4O4IX66RKH -m comment --comment "lha/service-unsecure:http" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-VGWZGBHRIB24XXKZ
-A KUBE-SVC-CQEG2R4O4IX66RKH -m comment --comment "lha/service-unsecure:http" -j KUBE-SEP-VFVCTHYVGKJKI5D5
-A KUBE-SVC-P6NT6I2XSZW2EWVD -m comment --comment "lha/service-secure:https" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-XKULLKEY4RDFTDQL
-A KUBE-SVC-P6NT6I2XSZW2EWVD -m comment --comment "lha/service-secure:https" -j KUBE-SEP-GFQRI2E3EIJELQBB

--- Additional comment from Ben Bennett on 2018-04-28 02:58:10 CST ---

Idling was tech preview in 3.4.

We are tracking a later idling bug with https://bugzilla.redhat.com/show_bug.cgi?id=1562184 and it is probably the same root cause, but we aren't going to backport to 3.4 anyway.

--- Additional comment from hongli on 2018-05-14 11:22:19 CST ---

It looks like the issue is related to the cloud provider being disabled: I cannot reproduce the problem in 3.4.1.44.53 on OpenStack with the cloud provider enabled.

[root@host-8-243-47 ~]# oc get svc
NAME               CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
service-secure     172.30.250.212   <none>        27443/TCP   20s
service-unsecure   172.30.114.124   <none>        27017/TCP   20s
[root@host-8-243-47 ~]# oc idle service-unsecure
Marked service lha/service-unsecure to unidle resource ReplicationController lha/caddy-rc (unidle to 2 replicas)
Idled ReplicationController lha/caddy-rc (dry run)
[root@host-8-243-47 ~]# 
[root@host-8-243-47 ~]# 
[root@host-8-243-47 ~]# iptables-save | grep lha
-A KUBE-PORTALS-CONTAINER -d 172.30.114.124/32 -p tcp -m comment --comment "lha/service-unsecure:http" -m tcp --dport 27017 -j DNAT --to-destination 172.16.120.79:40540
-A KUBE-PORTALS-HOST -d 172.30.114.124/32 -p tcp -m comment --comment "lha/service-unsecure:http" -m tcp --dport 27017 -j DNAT --to-destination 172.16.120.79:40540
-A KUBE-SERVICES -d 172.30.250.212/32 -p tcp -m comment --comment "lha/service-secure:https cluster IP" -m tcp --dport 27443 -j KUBE-SVC-P6NT6I2XSZW2EWVD
-A KUBE-SERVICES -d 172.30.250.212/32 -p tcp -m comment --comment "lha/service-secure:https has no endpoints" -m tcp --dport 27443 -j REJECT --reject-with icmp-port-unreachable
[root@host-8-243-47 ~]# 
[root@host-8-243-47 ~]# curl 172.30.114.124:27017
Hello-OpenShift-1 http-8080
[root@host-8-243-47 ~]# 
[root@host-8-243-47 ~]# 
[root@host-8-243-47 ~]# oc version
oc v3.4.1.44.53
kubernetes v1.4.0+776c994
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://host-8-243-47.host.centralci.eng.rdu2.redhat.com:8443
openshift v3.4.1.44.53
kubernetes v1.4.0+776c994
[root@host-8-243-47 ~]# 
[root@host-8-243-47 ~]#
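
In the working case above, the idled service's cluster IP is DNATed to a node address and a high port. A minimal sketch for extracting that target from the rules follows, for example to confirm that something is listening there. The function name `portal_targets` is hypothetical; the rule format is taken from the output above and may differ in other releases.

```python
import re

# Matches rules like:
# -A KUBE-PORTALS-HOST -d <ip>/32 ... --dport <port> -j DNAT --to-destination <ip>:<port>
PORTAL_RULE = re.compile(
    r'^-A (KUBE-PORTALS-\S+) -d (\S+)/32 .*--dport (\d+) '
    r'-j DNAT --to-destination (\S+):(\d+)$'
)


def portal_targets(iptables_save):
    """Map (chain, cluster_ip, service_port) to (proxy_ip, proxy_port)."""
    targets = {}
    for line in iptables_save.splitlines():
        match = PORTAL_RULE.match(line.strip())
        if match:
            chain, cluster_ip, dport, ip, port = match.groups()
            targets[(chain, cluster_ip, int(dport))] = (ip, int(port))
    return targets
```

An empty result for a service that has been idled corresponds to the failure described in this bug.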

--- Additional comment from hongli on 2018-05-14 16:07:07 CST ---

After more testing, I narrowed the reproducing condition down to OCP on "OpenStack + Cloudprovider disabled".

Comment 1 Hongan Li 2018-05-18 05:29:28 UTC

I can reproduce the same issue in OCP v3.5.5.31.67.

Comment 2 Ben Bennett 2018-05-22 19:37:24 UTC

Idling was tech preview in 3.4, 3.5, and 3.6.

Closing this since it is fixed in the current releases.