Bug 1474274 - Pod stays in ContainerCreating status when an invalid pod bandwidth value is set [NEEDINFO]
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Assigned To: Ivan Chavero
QA Contact: Meng Bo
Depends On:
Reported: 2017-07-24 05:30 EDT by Yan Du
Modified: 2017-11-13 03:36 EST
4 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2017-11-07 01:37:51 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
yadu: needinfo? (ichavero)

Attachments: None
Description Yan Du 2017-07-24 05:30:21 EDT
Description of problem:
Pod stays in ContainerCreating status when an invalid value is set for the pod bandwidth annotations.
# oc get pod -n d1
NAME      READY     STATUS              RESTARTS   AGE
iperf     0/1       ContainerCreating   0          1h

Version-Release number of selected component (if applicable):
openshift v3.6.153
kubernetes v1.6.1+5115d708d7

How reproducible:

Steps to Reproduce:
1. Create a pod with an invalid pod bandwidth value:
# oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/egress-ingress/invalid-iperf.json
2. Check the pod:
# oc get pod

Actual results:
# oc get pod -n d1
NAME      READY     STATUS              RESTARTS   AGE
iperf     0/1       ContainerCreating   0          1h

# oc describe pod iperf -n d1
Name:            iperf
Namespace:        d1
Security Policy:    anyuid
Node:            host-8-174-69.host.centralci.eng.rdu2.redhat.com/
Start Time:        Mon, 24 Jul 2017 02:52:44 -0400
Labels:            <none>
Annotations:        kubernetes.io/egress-bandwidth=-10M
Status:            Pending
Controllers:        <none>
    Container ID:    
    Image:        yadu/hello-openshift-iperf
    Image ID:        
    State:        Waiting
      Reason:        ContainerCreating
    Ready:        False
    Restart Count:    0
    Environment:    <none>
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-hf0cm (ro)
  Type        Status
  Initialized     True 
  Ready     False 
  PodScheduled     True 
    Type:    Secret (a volume populated by a Secret)
    SecretName:    default-token-hf0cm
    Optional:    false
QoS Class:    BestEffort
Node-Selectors:    <none>
Tolerations:    <none>
  FirstSeen    LastSeen    Count    From                                SubObjectPath    Type        Reason            Message
  ---------    --------    -----    ----                                -------------    --------    ------            -------
  1h        1h        1    default-scheduler                                Normal        Scheduled        Successfully assigned iperf to host-8-174-69.host.centralci.eng.rdu2.redhat.com
  1h        59m        9    kubelet, host-8-174-69.host.centralci.eng.rdu2.redhat.com            Warning        DNSSearchForming    Found and omitted duplicated dns domain in host search line: 'cluster.local' during merging with cluster dns domains
  1h        8m        114    kubelet, host-8-174-69.host.centralci.eng.rdu2.redhat.com            Normal        SandboxChanged        Pod sandbox changed, it will be killed and re-created.
  1h        3m        125    kubelet, host-8-174-69.host.centralci.eng.rdu2.redhat.com            Warning        FailedSync        Error syncing pod

Expected results:
The pod status should be Error (or similar) instead of remaining in ContainerCreating.
Before 3.5, setting an invalid pod bandwidth value produced a warning such as "resource is unreasonably small (< 1kbit)" in the events log.

In 3.6, the FailedSync event was intentionally changed to reduce etcd event spam in this PR: https://github.com/openshift/origin/pull/14693 . Now, when an invalid bandwidth is set on a pod, there is no meaningful warning in the events log and the pod stays in ContainerCreating status, which may confuse users who don't have permission to check the node log.

Additional info:
Comment 1 Derek Carr 2017-07-25 10:05:18 EDT
Invalid values should be caught in validation, not at runtime.
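For illustration only, the validation-time check Derek describes could be sketched roughly as below. This is not the actual Kubernetes code: `parseBandwidth` and `validateBandwidthAnnotation` are hypothetical helpers (the real kubelet parses these annotations via `resource.ParseQuantity`), but the idea is the same: reject a negative or too-small value when the pod is created, instead of failing silently during sandbox setup.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseBandwidth converts values like "10M" or "-3M" to bits per second.
// Simplified stand-in for Kubernetes' resource.Quantity parsing.
func parseBandwidth(s string) (int64, error) {
	multipliers := map[string]int64{"k": 1000, "K": 1000, "M": 1000000, "G": 1000000000}
	mult := int64(1)
	for suffix, m := range multipliers {
		if strings.HasSuffix(s, suffix) {
			s = strings.TrimSuffix(s, suffix)
			mult = m
			break
		}
	}
	n, err := strconv.ParseInt(strings.TrimSpace(s), 10, 64)
	if err != nil {
		return 0, err
	}
	return n * mult, nil
}

// validateBandwidthAnnotation rejects unparseable or unreasonably small
// values up front, so the pod is refused at creation time instead of
// hanging in ContainerCreating.
func validateBandwidthAnnotation(key, value string) error {
	bits, err := parseBandwidth(value)
	if err != nil {
		return fmt.Errorf("%s: cannot parse %q: %v", key, value, err)
	}
	if bits < 1000 { // mirror the old "unreasonably small (< 1kbit)" warning
		return fmt.Errorf("%s: %q is unreasonably small (< 1kbit)", key, value)
	}
	return nil
}

func main() {
	for _, v := range []string{"10M", "-10M", "-3M"} {
		if err := validateBandwidthAnnotation("kubernetes.io/egress-bandwidth", v); err != nil {
			fmt.Println("rejected:", err)
		} else {
			fmt.Println("accepted:", v)
		}
	}
}
```

With a check like this, the `-10M` / `-3M` annotations from the reproducer would be rejected by `oc create` with a clear message, rather than leaving the pod stuck.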
Comment 2 Aleksandar Kostadinov 2017-07-25 12:33:18 EDT
Derek, that makes sense to me. For now, though, we need to make sure the user receives some feedback. Even admins can have trouble diagnosing such issues when they don't know what the cause could be.

I don't know if `ingress-bandwidth` is the only annotation that can have this problem. IMO we need to be sure to send feedback for any post-validation issues, now and in the future.

While I agree that we shouldn't have post-validation issues, we obviously do, and new features can introduce them at any time. Propagating such failures back to the user is essential for a reasonable UX.
Comment 3 Derek Carr 2017-07-26 10:54:24 EDT
You can always send back a new event specific to invalid bandwidth settings. Piggybacking on the FailedSync event is not ideal. Honestly, I think the FailedSync event should go away, as it means nothing to a user. An InvalidBandwidth event would be much more meaningful.
Comment 4 Ivan Chavero 2017-11-07 01:37:51 EST
The current version of OpenShift does not have this problem:

[root@localhost origin]# oc get all
NAME       READY     STATUS    RESTARTS   AGE
po/iperf   1/1       Running   0          9m

NAME             CLUSTER-IP   EXTERNAL-IP   PORT(S)                 AGE
svc/kubernetes   <none>        443/TCP,53/UDP,53/TCP   11m
[root@localhost origin]# oc version
oc v3.7.0-alpha.1+994a5a6-244
kubernetes v1.7.0+695f48a16f
features: Basic-Auth GSSAPI Kerberos SPNEGO

openshift v3.7.0-alpha.0+66c7f6c-430-dirty
kubernetes v1.7.0+695f48a16f

Feel free to reopen this bug if the problem persists.
Comment 5 Yan Du 2017-11-13 03:36:36 EST
openshift v3.7.7
kubernetes v1.7.6+a08f5eeb62

I can still reproduce this issue on the latest OCP 3.7:
#  oc get all
NAME       READY     STATUS              RESTARTS   AGE
po/iperf   0/1       ContainerCreating   0          21m

@Ivan, are you using an invalid value for the pod bandwidth? The issue can only be reproduced with an invalid pod bandwidth value:
  "kind": "Pod",
  "metadata": {
        "name": "iperf",
        "annotations": {
            "kubernetes.io/egress-bandwidth": "-10M",
            "kubernetes.io/ingress-bandwidth": "-3M"
  "spec": {
      "containers": [{
        "name": "iperf",
        "image": "yadu/hello-openshift-iperf"
