Description of problem:
An idled service cannot be woken up.

Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-08-07-175228

How reproducible:
Always

Steps to Reproduce:
1. Create the test rc and svc:

$ oc create -f https://raw.githubusercontent.com/openshift/verification-tests/master/testdata/networking/list_for_pods.json
replicationcontroller/test-rc created
service/test-service created

$ oc get pod
NAME            READY   STATUS    RESTARTS   AGE
test-rc-bswdm   1/1     Running   0          4s
test-rc-p7hpg   1/1     Running   0          4s

2. Create another test pod:

$ oc create -f https://raw.githubusercontent.com/openshift/verification-tests/master/testdata/networking/pod-for-ping.json

3. Idle the service:

$ oc idle test-service

4. Access the idled service; it does NOT work (the request hangs):

$ oc exec -n ffcbg hello-pod -- curl 172.30.54.122:27017
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:06 --:--:--     0^C

5. Check the endpoints:

$ oc get ep -n ffcbg -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: Endpoints
  metadata:
    annotations:
      idling.alpha.openshift.io/idled-at: "2021-08-09T11:52:34Z"
      idling.alpha.openshift.io/unidle-targets: '[{"kind":"ReplicationController","name":"test-rc","replicas":2}]'
    creationTimestamp: "2021-08-09T11:52:24Z"
    labels:
      name: test-service
    name: test-service
    namespace: ffcbg
    resourceVersion: "206036"
    uid: 7e2d6e4d-5a93-4922-9825-ace2fe442100
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

6. Check the service:

$ oc get svc -n ffcbg -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: Service
  metadata:
    annotations:
      idling.alpha.openshift.io/idled-at: "2021-08-09T11:52:34Z"
      idling.alpha.openshift.io/unidle-targets: '[{"kind":"ReplicationController","name":"test-rc","replicas":2}]'
    creationTimestamp: "2021-08-09T11:52:24Z"
    labels:
      name: test-service
    name: test-service
    namespace: ffcbg
    resourceVersion: "206021"
    uid: 051fe7ee-23ef-451a-b32c-aaad8b687cb0
  spec:
    clusterIP: 172.30.54.122
    clusterIPs:
    - 172.30.54.122
    ipFamilies:
    - IPv4
    ipFamilyPolicy: SingleStack
    ports:
    - name: http
      port: 27017
      protocol: TCP
      targetPort: 8080
    selector:
      name: test-pods
    sessionAffinity: None
    type: ClusterIP
  status:
    loadBalancer: {}
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

7. Check the events:
$ oc get event -n ffcbg
LAST SEEN   TYPE     REASON             OBJECT                          MESSAGE
5m26s       Normal   Scheduled          pod/hello-pod                   Successfully assigned ffcbg/hello-pod to ip-10-0-188-18.us-east-2.compute.internal
5m24s       Normal   AddedInterface     pod/hello-pod                   Add eth0 [10.131.0.57/23] from openshift-sdn
5m24s       Normal   Pulled             pod/hello-pod                   Container image "quay.io/openshifttest/hello-sdn@sha256:d5785550cf77b7932b090fcd1a2625472912fb3189d5973f177a5a2c347a1f95" already present on machine
5m24s       Normal   Created            pod/hello-pod                   Created container hello-pod
5m24s       Normal   Started            pod/hello-pod                   Started container hello-pod
5m41s       Normal   Scheduled          pod/test-rc-7vj5w               Successfully assigned ffcbg/test-rc-7vj5w to ip-10-0-215-184.us-east-2.compute.internal
5m39s       Normal   AddedInterface     pod/test-rc-7vj5w               Add eth0 [10.129.2.146/23] from openshift-sdn
5m39s       Normal   Pulled             pod/test-rc-7vj5w               Container image "quay.io/openshifttest/hello-sdn@sha256:d5785550cf77b7932b090fcd1a2625472912fb3189d5973f177a5a2c347a1f95" already present on machine
5m39s       Normal   Created            pod/test-rc-7vj5w               Created container test-pod
5m39s       Normal   Started            pod/test-rc-7vj5w               Started container test-pod
5m29s       Normal   Killing            pod/test-rc-7vj5w               Stopping container test-pod
5m41s       Normal   Scheduled          pod/test-rc-jqbl5               Successfully assigned ffcbg/test-rc-jqbl5 to ip-10-0-188-18.us-east-2.compute.internal
5m39s       Normal   AddedInterface     pod/test-rc-jqbl5               Add eth0 [10.131.0.56/23] from openshift-sdn
5m39s       Normal   Pulled             pod/test-rc-jqbl5               Container image "quay.io/openshifttest/hello-sdn@sha256:d5785550cf77b7932b090fcd1a2625472912fb3189d5973f177a5a2c347a1f95" already present on machine
5m39s       Normal   Created            pod/test-rc-jqbl5               Created container test-pod
5m39s       Normal   Started            pod/test-rc-jqbl5               Started container test-pod
5m29s       Normal   Killing            pod/test-rc-jqbl5               Stopping container test-pod
5m41s       Normal   SuccessfulCreate   replicationcontroller/test-rc   Created pod: test-rc-7vj5w
5m41s       Normal   SuccessfulCreate   replicationcontroller/test-rc   Created pod: test-rc-jqbl5
5m29s       Normal   SuccessfulDelete   replicationcontroller/test-rc   Deleted pod: test-rc-7vj5w
5m29s       Normal   SuccessfulDelete   replicationcontroller/test-rc   Deleted pod: test-rc-jqbl5

8. Check the iptables rules:

sh-4.4# iptables-save | grep ffcbg
-A KUBE-PORTALS-CONTAINER -d 172.30.54.122/32 -p tcp -m comment --comment "ffcbg/test-service:http" -m tcp --dport 27017 -j REDIRECT --to-ports 35041
-A KUBE-PORTALS-HOST -d 172.30.54.122/32 -p tcp -m comment --comment "ffcbg/test-service:http" -m tcp --dport 27017 -j DNAT --to-destination 10.0.128.147:35041

Actual results:
The curl in step 4 hangs; accessing the idled service never wakes the pods back up.

Expected results:
Accessing the svc wakes up the pods.

Additional info:
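For context on step 8: the KUBE-PORTALS-CONTAINER/KUBE-PORTALS-HOST rules show the idled service being redirected to the userspace unidling proxy on port 35041. A connection there is supposed to raise a NeedPods event (that reason name appears in the kube-proxy log in the comment below), which the unidling controller answers by scaling the targets in the unidle-targets annotation back up. As a sketch, the commands below (illustrative, not from the original report) show how to look for that event and how to wake the pods by hand while the bug is present:

$ # when unidling works, hitting the ClusterIP should produce a NeedPods event
$ oc get events -n ffcbg --field-selector reason=NeedPods

$ # manual workaround: scale the unidle target from the annotation back up yourself
$ oc scale rc/test-rc -n ffcbg --replicas=2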
Assigning to Dan as he's been working on some idling bugs for openshift-sdn
LOL, so how did this pass CI?

E0809 19:07:45.030051    1951 event_broadcaster.go:253] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"test-service.1699b8ec3bca9e5a", GenerateName:"", Namespace:"test", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, EventTime:v1.MicroTime{Time:time.Time{wall:0xc03c7d20418db83b, ext:1334903747131, loc:(*time.Location)(0x30c9ba0)}}, Series:(*v1.EventSeries)(nil), ReportingController:"kube-proxy", ReportingInstance:"kube-proxy-ip-10-0-191-238.us-west-1.compute.internal", Action:"The service-port %s:%s needs pods.", Reason:"NeedPods", Regarding:v1.ObjectReference{Kind:"Service", Namespace:"test", Name:"test-service", UID:"", APIVersion:"", ResourceVersion:"", FieldPath:""}, Related:(*v1.ObjectReference)(nil), Note:"test-service%!(EXTRA string=http)", Type:"Normal", DeprecatedSource:v1.EventSource{Component:"", Host:""}, DeprecatedFirstTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeprecatedLastTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeprecatedCount:0}': 'events.events.k8s.io is forbidden: User "system:serviceaccount:openshift-sdn:sdn" cannot create resource "events" in API group "events.k8s.io" in the namespace "test"' (will not retry!)

In particular:

  'cannot create resource "events" in API group "events.k8s.io"'

In 1.22, kube-proxy moved from the old core/v1 Event API to the new events.k8s.io/v1 Event API, so I guess we need to update our RBAC rules...
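A minimal sketch of the kind of RBAC addition that would be needed, assuming the rule is grafted onto whatever ClusterRole backs the system:serviceaccount:openshift-sdn:sdn account. The standalone role/binding names below are placeholders for illustration; the real fix would extend the existing sdn role wherever it is defined:

$ oc apply -f - <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: sdn-events-example        # placeholder name, for illustration only
rules:
# kube-proxy < 1.22 wrote events via the core ("") API group
- apiGroups: [""]
  resources: ["events"]
  verbs: ["create", "patch", "update"]
# kube-proxy >= 1.22 writes events via events.k8s.io/v1
- apiGroups: ["events.k8s.io"]
  resources: ["events"]
  verbs: ["create", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: sdn-events-example        # placeholder name, for illustration only
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: sdn-events-example
subjects:
- kind: ServiceAccount
  name: sdn
  namespace: openshift-sdn
EOF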
Verified this bug on 4.9.0-0.nightly-2021-08-19-184748.

We hit an issue when idling the service in 4.9:

$ oc idle test-service --kubeconfig=/home/zzhao/workdir/dhcp-140-240-zzhao/ocp4_testuser-21.kubeconfig
ReplicationController "v0te6/test-rc" has been idled
STDERR:
error: unable to mark service "v0te6/test-service" as idled: endpoints "test-service" is forbidden: User "testuser-21" cannot patch resource "endpoints" in API group "" in the namespace "v0te6"

Another bug is tracking that issue: https://bugzilla.redhat.com/show_bug.cgi?id=1995505

The replication controller was still idled, so the unidle path can be verified:

$ oc get pod -n v0te6
No resources found in v0te6 namespace.

$ oc get svc -n v0te6
NAME           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)     AGE
test-service   ClusterIP   172.30.241.66   <none>        27017/TCP   4m40s

$ oc rsh -n openshift-multus multus-admission-controller-7kh4c
Defaulted container "multus-admission-controller" out of: multus-admission-controller, kube-rbac-proxy
sh-4.4# curl 172.30.241.66:27017
Hello OpenShift!

$ oc get pod -n v0te6
NAME            READY   STATUS    RESTARTS   AGE
test-rc-rgvn2   1/1     Running   0          16s
test-rc-zbwr5   1/1     Running   0          16s

Moving this bug to VERIFIED.
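As an extra sanity check on the RBAC side of the fix, a hypothetical impersonation query (not part of the verification above; requires impersonation rights, e.g. cluster-admin) should now return "yes":

$ # confirm the sdn service account may create events via the events.k8s.io group
$ oc auth can-i create events.events.k8s.io -n v0te6 --as=system:serviceaccount:openshift-sdn:sdn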
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759