Bug 1567043 - [3.4][cloudprovider disable] cannot wake up the resources after idling service
Summary: [3.4][cloudprovider disable] cannot wake up the resources after idling service
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 3.4.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 3.4.z
Assignee: Ben Bennett
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks: 1579649 1579652
Reported: 2018-04-13 10:36 UTC by Hongan Li
Modified: 2018-05-22 19:37 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1579649 1579652 (view as bug list)
Environment:
Last Closed: 2018-05-22 19:37:14 UTC
Target Upstream Version:



Description Hongan Li 2018-04-13 10:36:43 UTC
Description of problem:
Resources cannot be woken up (unidled) after idling a service.

Version-Release number of selected component (if applicable):
openshift v3.4.1.44.52
kubernetes v1.4.0+776c994

How reproducible:
always

Steps to Reproduce:
1. Create the rc (with pods and svc)
# oc create -f  https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/routing/list_for_caddy.json

2. Idle the service
# oc idle service-unsecure -n lha
Marked service lha/service-unsecure to unidle resource ReplicationController lha/caddy-rc (unidle to 2 replicas)
Idled ReplicationController lha/caddy-rc (dry run)

Note: tried both "--dry-run=false" and "--dry-run=true", but the output above always shows "(dry run)".

3. Generate some traffic to unidle the service
# curl 172.30.202.54:27017
curl: (7) Failed connect to 172.30.202.54:27017; No route to host


Actual results:
The resource cannot be woken up, and the iptables rules are incorrect after idling (no random port is opened for the idled service).

[root@host-8-242-109 ~]# iptables-save | grep lha
-A KUBE-SERVICES -d 172.30.214.106/32 -p tcp -m comment --comment "lha/service-secure:https cluster IP" -m tcp --dport 27443 -j KUBE-SVC-P6NT6I2XSZW2EWVD
-A KUBE-SERVICES -d 172.30.214.106/32 -p tcp -m comment --comment "lha/service-secure:https has no endpoints" -m tcp --dport 27443 -j REJECT --reject-with icmp-port-unreachable
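
The output above contains rules for service-secure only; nothing at all remains for the idled service-unsecure. A minimal sketch of a helper that makes this easier to see by listing which services in a namespace still have any rule. The function name services_with_rules is hypothetical, and it assumes the rule comments keep the "namespace/service:port" form shown in this report:

```shell
#!/bin/sh
# services_with_rules NAMESPACE
# Reads `iptables-save` output on stdin and prints each namespace/service
# name that appears in a --comment field for the given namespace.
# (Hypothetical helper; assumes comments look like "ns/svc:port ...".)
services_with_rules() {
    awk -v ns="$1" '
        {
            for (i = 1; i < NF; i++) {
                if ($i != "--comment") continue
                name = $(i + 1)
                sub(/^"/, "", name)    # drop the opening quote
                sub(/:.*$/, "", name)  # drop ":port" and anything after it
                sub(/"$/, "", name)    # drop a closing quote if still present
                if (index(name, ns "/") == 1) print name
            }
        }
    ' | sort -u
}
```

Possible usage on the node: `iptables-save | services_with_rules lha`. On the failing node this would be expected to print only lha/service-secure.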


Expected results:
The resource should be woken up when the service receives traffic.

Additional info:
The iptables rules look correct before idling; see below.

# oc get svc -n lha
NAME               CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
service-secure     172.30.214.106   <none>        27443/TCP   20s
service-unsecure   172.30.202.54    <none>        27017/TCP   20s
[root@host-8-242-109 ~]# curl 172.30.202.54:27017
Hello-OpenShift-1 http-8080
[root@host-8-242-109 ~]# 
[root@host-8-242-109 ~]# iptables-save | grep lha
-A KUBE-SEP-GFQRI2E3EIJELQBB -s 10.130.0.17/32 -m comment --comment "lha/service-secure:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-GFQRI2E3EIJELQBB -p tcp -m comment --comment "lha/service-secure:https" -m tcp -j DNAT --to-destination 10.130.0.17:8443
-A KUBE-SEP-VFVCTHYVGKJKI5D5 -s 10.130.0.17/32 -m comment --comment "lha/service-unsecure:http" -j KUBE-MARK-MASQ
-A KUBE-SEP-VFVCTHYVGKJKI5D5 -p tcp -m comment --comment "lha/service-unsecure:http" -m tcp -j DNAT --to-destination 10.130.0.17:8080
-A KUBE-SEP-VGWZGBHRIB24XXKZ -s 10.128.0.20/32 -m comment --comment "lha/service-unsecure:http" -j KUBE-MARK-MASQ
-A KUBE-SEP-VGWZGBHRIB24XXKZ -p tcp -m comment --comment "lha/service-unsecure:http" -m tcp -j DNAT --to-destination 10.128.0.20:8080
-A KUBE-SEP-XKULLKEY4RDFTDQL -s 10.128.0.20/32 -m comment --comment "lha/service-secure:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-XKULLKEY4RDFTDQL -p tcp -m comment --comment "lha/service-secure:https" -m tcp -j DNAT --to-destination 10.128.0.20:8443
-A KUBE-SERVICES -d 172.30.202.54/32 -p tcp -m comment --comment "lha/service-unsecure:http cluster IP" -m tcp --dport 27017 -j KUBE-SVC-CQEG2R4O4IX66RKH
-A KUBE-SERVICES -d 172.30.214.106/32 -p tcp -m comment --comment "lha/service-secure:https cluster IP" -m tcp --dport 27443 -j KUBE-SVC-P6NT6I2XSZW2EWVD
-A KUBE-SVC-CQEG2R4O4IX66RKH -m comment --comment "lha/service-unsecure:http" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-VGWZGBHRIB24XXKZ
-A KUBE-SVC-CQEG2R4O4IX66RKH -m comment --comment "lha/service-unsecure:http" -j KUBE-SEP-VFVCTHYVGKJKI5D5
-A KUBE-SVC-P6NT6I2XSZW2EWVD -m comment --comment "lha/service-secure:https" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-XKULLKEY4RDFTDQL
-A KUBE-SVC-P6NT6I2XSZW2EWVD -m comment --comment "lha/service-secure:https" -j KUBE-SEP-GFQRI2E3EIJELQBB
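
For comparing the state before and after idling, the endpoints behind a service can be pulled out of the KUBE-SEP-* DNAT rules above. This is only a sketch: the function name service_endpoints is hypothetical, and it assumes the rule layout shown in this dump.

```shell
#!/bin/sh
# service_endpoints NAMESPACE/SERVICE
# Reads `iptables-save` output on stdin and prints the --to-destination
# value of every KUBE-SEP-* DNAT rule whose comment names the service.
# (Hypothetical helper; assumes the rule format seen in this report.)
service_endpoints() {
    awk -v svc="$1" '
        /^-A KUBE-SEP-/ && / -j DNAT / && index($0, "\"" svc ":") {
            for (i = 1; i < NF; i++)
                if ($i == "--to-destination") print $(i + 1)
        }
    '
}
```

Possible usage: `iptables-save | service_endpoints lha/service-unsecure`. With the rules above, this would list the two pod endpoints on port 8080.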

Comment 1 Ben Bennett 2018-04-27 18:58:10 UTC
Idling was tech preview in 3.4.

We are tracking a later idling bug with https://bugzilla.redhat.com/show_bug.cgi?id=1562184 and it is probably the same root cause, but we aren't going to backport to 3.4 anyway.

*** This bug has been marked as a duplicate of bug 1562184 ***

Comment 2 Hongan Li 2018-05-14 03:22:19 UTC
It looks like the issue is related to the cloudprovider being disabled; I cannot reproduce the problem in 3.4.1.44.53 on OpenStack with the cloudprovider enabled.

[root@host-8-243-47 ~]# oc get svc
NAME               CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
service-secure     172.30.250.212   <none>        27443/TCP   20s
service-unsecure   172.30.114.124   <none>        27017/TCP   20s
[root@host-8-243-47 ~]# oc idle service-unsecure
Marked service lha/service-unsecure to unidle resource ReplicationController lha/caddy-rc (unidle to 2 replicas)
Idled ReplicationController lha/caddy-rc (dry run)
[root@host-8-243-47 ~]# 
[root@host-8-243-47 ~]# 
[root@host-8-243-47 ~]# iptables-save | grep lha
-A KUBE-PORTALS-CONTAINER -d 172.30.114.124/32 -p tcp -m comment --comment "lha/service-unsecure:http" -m tcp --dport 27017 -j DNAT --to-destination 172.16.120.79:40540
-A KUBE-PORTALS-HOST -d 172.30.114.124/32 -p tcp -m comment --comment "lha/service-unsecure:http" -m tcp --dport 27017 -j DNAT --to-destination 172.16.120.79:40540
-A KUBE-SERVICES -d 172.30.250.212/32 -p tcp -m comment --comment "lha/service-secure:https cluster IP" -m tcp --dport 27443 -j KUBE-SVC-P6NT6I2XSZW2EWVD
-A KUBE-SERVICES -d 172.30.250.212/32 -p tcp -m comment --comment "lha/service-secure:https has no endpoints" -m tcp --dport 27443 -j REJECT --reject-with icmp-port-unreachable
[root@host-8-243-47 ~]# 
[root@host-8-243-47 ~]# curl 172.30.114.124:27017
Hello-OpenShift-1 http-8080
[root@host-8-243-47 ~]# 
[root@host-8-243-47 ~]# 
[root@host-8-243-47 ~]# oc version
oc v3.4.1.44.53
kubernetes v1.4.0+776c994
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://host-8-243-47.host.centralci.eng.rdu2.redhat.com:8443
openshift v3.4.1.44.53
kubernetes v1.4.0+776c994
[root@host-8-243-47 ~]# 
[root@host-8-243-47 ~]#
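
In the working case above, the idled service gets KUBE-PORTALS-CONTAINER and KUBE-PORTALS-HOST rules that DNAT its cluster IP to a port on the node. A rough sketch for checking this on another node follows; the function name portal_destinations is hypothetical, and it assumes the chain names and comment format shown above.

```shell
#!/bin/sh
# portal_destinations NAMESPACE/SERVICE
# Reads `iptables-save` output on stdin and prints the unique
# --to-destination values of KUBE-PORTALS-* rules for the service.
# Empty output means no portal rule exists for the service.
# (Hypothetical helper; assumes the rule format seen in this report.)
portal_destinations() {
    awk -v svc="$1" '
        /^-A KUBE-PORTALS-(CONTAINER|HOST) / && index($0, "\"" svc ":") {
            for (i = 1; i < NF; i++)
                if ($i == "--to-destination") print $(i + 1)
        }
    ' | sort -u
}
```

Possible usage: `iptables-save | portal_destinations lha/service-unsecure`. Empty output after `oc idle` would match the failure described in this bug.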

Comment 3 Hongan Li 2018-05-14 08:07:07 UTC
Did more testing and narrowed down the reproducing condition to OCP on "OpenStack + Cloudprovider disabled".

Comment 4 Hongan Li 2018-05-18 05:40:40 UTC
I can reproduce the same issue in v3.5 and v3.6, but not in v3.7 or v3.9, so I cloned this bug to 3.5 and 3.6.

Comment 5 Ben Bennett 2018-05-22 19:37:14 UTC
Idling was tech preview in 3.4, 3.5, and 3.6.

Closing this since it is fixed in the current releases.

