Bug 1474274

Summary: Pod stays in ContainerCreating status when an invalid pod bandwidth value is set
Product: OpenShift Container Platform
Component: Networking
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Severity: low
Priority: medium
Status: CLOSED UPSTREAM
Reporter: Yan Du <yadu>
Assignee: Ivan Chavero <ichavero>
QA Contact: Meng Bo <bmeng>
CC: akostadi, aos-bugs, decarr, ichavero
Type: Bug
Last Closed: 2017-11-07 06:37:51 UTC

Description Yan Du 2017-07-24 09:30:21 UTC
Description of problem:
The pod stays in ContainerCreating status when an invalid pod bandwidth value is set.
# oc get pod -n d1
NAME      READY     STATUS              RESTARTS   AGE
iperf     0/1       ContainerCreating   0          1h


Version-Release number of selected component (if applicable):
openshift v3.6.153
kubernetes v1.6.1+5115d708d7

How reproducible:
Always

Steps to Reproduce:
1. Create a pod with invalid pod bandwidth annotations
# oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/egress-ingress/invalid-iperf.json
2. Check the pod
# oc get pod



Actual results:
# oc get pod -n d1
NAME      READY     STATUS              RESTARTS   AGE
iperf     0/1       ContainerCreating   0          1h

# oc describe pod iperf -n d1
Name:            iperf
Namespace:        d1
Security Policy:    anyuid
Node:            host-8-174-69.host.centralci.eng.rdu2.redhat.com/10.8.174.69
Start Time:        Mon, 24 Jul 2017 02:52:44 -0400
Labels:            <none>
Annotations:        kubernetes.io/egress-bandwidth=-10M
            kubernetes.io/ingress-bandwidth=-3M
            openshift.io/scc=anyuid
Status:            Pending
IP:            
Controllers:        <none>
Containers:
  iperf:
    Container ID:    
    Image:        yadu/hello-openshift-iperf
    Image ID:        
    Port:        
    State:        Waiting
      Reason:        ContainerCreating
    Ready:        False
    Restart Count:    0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-hf0cm (ro)
Conditions:
  Type        Status
  Initialized     True 
  Ready     False 
  PodScheduled     True 
Volumes:
  default-token-hf0cm:
    Type:    Secret (a volume populated by a Secret)
    SecretName:    default-token-hf0cm
    Optional:    false
QoS Class:    BestEffort
Node-Selectors:    <none>
Tolerations:    <none>
Events:
  FirstSeen    LastSeen    Count    From                                SubObjectPath    Type        Reason            Message
  ---------    --------    -----    ----                                -------------    --------    ------            -------
  1h        1h        1    default-scheduler                                Normal        Scheduled        Successfully assigned iperf to host-8-174-69.host.centralci.eng.rdu2.redhat.com
  1h        59m        9    kubelet, host-8-174-69.host.centralci.eng.rdu2.redhat.com            Warning        DNSSearchForming    Found and omitted duplicated dns domain in host search line: 'cluster.local' during merging with cluster dns domains
  1h        8m        114    kubelet, host-8-174-69.host.centralci.eng.rdu2.redhat.com            Normal        SandboxChanged        Pod sandbox changed, it will be killed and re-created.
  1h        3m        125    kubelet, host-8-174-69.host.centralci.eng.rdu2.redhat.com            Warning        FailedSync        Error syncing pod


Expected results:
The pod status should be Error (or something similar) instead of staying in ContainerCreating.
Before 3.5, when we set an invalid value for pod bandwidth, the events log showed a warning such as "resource is unreasonably small (< 1kbit)".

In 3.6, the FailedSync event was intentionally changed to reduce etcd event spam (https://github.com/openshift/origin/pull/14693). Now, when we set an invalid bandwidth on a pod, there is no meaningful warning in the events log and the pod stays in ContainerCreating, which may confuse users who don't have permission to check the node logs.
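
For reference, a minimal Go sketch of the kind of sanity check behind that old warning (this is not the actual kubelet code; the function name and the 1 kbit / 1 Pbit bounds are assumptions based on the message quoted above):

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

const (
	minBandwidthBits = int64(1_000)                 // assumed lower bound: 1 kbit
	maxBandwidthBits = int64(1_000_000_000_000_000) // assumed upper bound: 1 Pbit
)

// validateBandwidth mimics the check that used to surface as a warning:
// the annotation must parse as a quantity and fall within a sane range.
func validateBandwidth(value string) error {
	q, err := resource.ParseQuantity(value)
	if err != nil {
		return err
	}
	if q.Value() < minBandwidthBits {
		return fmt.Errorf("resource is unreasonably small (< 1kbit)")
	}
	if q.Value() > maxBandwidthBits {
		return fmt.Errorf("resource is unreasonably large (> 1Pbit)")
	}
	return nil
}

func main() {
	// The annotations from the reproducer pod; both negative values fail.
	for key, val := range map[string]string{
		"kubernetes.io/ingress-bandwidth": "-3M",
		"kubernetes.io/egress-bandwidth":  "-10M",
	} {
		if err := validateBandwidth(val); err != nil {
			fmt.Printf("%s=%s: %v\n", key, val, err)
		}
	}
}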

Additional info:

Comment 1 Derek Carr 2017-07-25 14:05:18 UTC
Invalid values should be caught in validation, not at runtime.
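
Hypothetically, a create-time check along the following lines would reject such a pod at the API server instead of letting the kubelet retry forever; the function name and wiring are made up for illustration and do not exist in OpenShift/Kubernetes:

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

// validatePodBandwidthAnnotations is a hypothetical admission/validation-time
// check: reject the pod when it is created rather than failing later while
// the kubelet sets up the sandbox.
func validatePodBandwidthAnnotations(annotations map[string]string) error {
	for _, key := range []string{
		"kubernetes.io/ingress-bandwidth",
		"kubernetes.io/egress-bandwidth",
	} {
		val, ok := annotations[key]
		if !ok {
			continue
		}
		q, err := resource.ParseQuantity(val)
		if err != nil || q.Sign() <= 0 {
			return fmt.Errorf("invalid %s %q: must be a positive quantity such as 10M", key, val)
		}
	}
	return nil
}

func main() {
	// The annotations from the reproducer pod would be rejected up front.
	err := validatePodBandwidthAnnotations(map[string]string{
		"kubernetes.io/egress-bandwidth":  "-10M",
		"kubernetes.io/ingress-bandwidth": "-3M",
	})
	fmt.Println(err) // would surface as an error from `oc create`
}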

Comment 2 Aleksandar Kostadinov 2017-07-25 16:33:18 UTC
Derek, that makes sense to me. But for now we need to make sure the user receives some feedback. Even admins can have trouble diagnosing issues like this when they don't know where to look for the cause.

I don't know if `ingress-bandwidth` is the only annotation that can have this problem. IMO we need to be sure to send feedback for any post-validation issues now and in the future.

While I agree that we shouldn't have post-validation issues, we obviously do, and new features can introduce them at any time. Implementing a way to propagate such failures back to the user is essential for a reasonable UX.

Comment 3 Derek Carr 2017-07-26 14:54:24 UTC
You can always send back a new event specific to invalid bandwidth settings. Piggybacking on the FailedSync event is not ideal. Honestly, I think the FailedSync event should go away, as it means nothing to a user. An InvalidBandwidth event is much more meaningful.
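
A minimal sketch of what that could look like, assuming the kubelet's existing EventRecorder is available (the helper below is illustrative, not actual kubelet code; "InvalidBandwidth" is the reason string suggested above):

package bandwidthfeedback

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/record"
)

// reportInvalidBandwidth emits a pod-scoped warning event so the failure
// shows up in `oc get events` / `oc describe pod`, instead of being hidden
// behind a generic FailedSync that only node logs explain.
func reportInvalidBandwidth(recorder record.EventRecorder, pod *v1.Pod, annotation, value string, err error) {
	recorder.Eventf(pod, v1.EventTypeWarning, "InvalidBandwidth",
		"invalid %s annotation %q: %v", annotation, value, err)
}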

Comment 4 Ivan Chavero 2017-11-07 06:37:51 UTC
The current version of OpenShift does not have this problem:

[root@localhost origin]# oc get all  
NAME       READY     STATUS    RESTARTS   AGE
po/iperf   1/1       Running   0          9m

NAME             CLUSTER-IP   EXTERNAL-IP   PORT(S)                 AGE
svc/kubernetes   172.30.0.1   <none>        443/TCP,53/UDP,53/TCP   11m
[root@localhost origin]# oc version
oc v3.7.0-alpha.1+994a5a6-244
kubernetes v1.7.0+695f48a16f
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://192.168.1.69:8443
openshift v3.7.0-alpha.0+66c7f6c-430-dirty
kubernetes v1.7.0+695f48a16f

Feel free to reopen this bug if the problem persists.

Comment 5 Yan Du 2017-11-13 08:36:36 UTC
openshift v3.7.7
kubernetes v1.7.6+a08f5eeb62

I can still reproduce this issue on the latest OCP 3.7:
#  oc get all
NAME       READY     STATUS              RESTARTS   AGE
po/iperf   0/1       ContainerCreating   0          21m

@Ivan, are you using an invalid value for the pod bandwidth? The issue can only be reproduced with an invalid pod bandwidth, e.g.:
{
  "kind": "Pod",
  "apiVersion":"v1",
  "metadata": {
        "name": "iperf",
        "annotations": {
            "kubernetes.io/egress-bandwidth": "-10M",
            "kubernetes.io/ingress-bandwidth": "-3M"
        }
  },
  "spec": {
      "containers": [{
        "name": "iperf",
        "image": "yadu/hello-openshift-iperf"
      }]
  }
}

Comment 6 Red Hat Bugzilla 2023-09-14 04:01:32 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days