Description of problem:
free-int cluster, there are 5 node-exporter pods in ContainerCreating status

$ oc get pod -n openshift-monitoring | grep node-exporter
node-exporter-4dsj5   2/2   Running             2   19d
node-exporter-7qjr2   2/2   Running             0   4d
node-exporter-8jtk6   2/2   Running             2   19d
node-exporter-8s78w   2/2   Running             2   19d
node-exporter-czsrb   2/2   Running             0   19d
node-exporter-dwj2n   0/2   ContainerCreating   0   9h
node-exporter-fkskr   2/2   Running             2   19d
node-exporter-h2q9q   0/2   ContainerCreating   0   10h
node-exporter-h7ggd   0/2   ContainerCreating   0   13h
node-exporter-j56ln   2/2   Running             0   10h
node-exporter-jsxdj   2/2   Running             2   18d
node-exporter-lvvdl   2/2   Running             0   19d
node-exporter-lzwt2   2/2   Running             0   1d
node-exporter-pmscl   2/2   Running             0   19d
node-exporter-r5mbv   0/2   ContainerCreating   0   10h
node-exporter-wh426   0/2   ContainerCreating   0   9h
node-exporter-z8k4p   2/2   Running             2   19d

describe one pod
$ oc describe pod node-exporter-dwj2n -n openshift-monitoring
Events:
  Type     Reason                  Age                 From                                    Message
  ----     ------                  ----                ----                                    -------
  Warning  FailedCreatePodSandBox  4m (x2604 over 9h)  kubelet, ip-172-31-63-238.ec2.internal  Failed create pod sandbox: rpc error: code = Unknown desc = error creating pod sandbox with name "k8s_node-exporter-dwj2n_openshift-monitoring_a73dae20-b609-11e8-a8f8-0ac586c2eb16_0": Error determining manifest MIME type for docker://registry.reg-aws.openshift.com:443/openshift3/ose-pod:v3.11: unable to retrieve auth token: invalid username/password

Version-Release number of selected component (if applicable):
openshift v3.11.0-0.21.0

How reproducible:
Always

Steps to Reproduce:
1. Check pods under openshift-monitoring
2.
3.

Actual results:
There are 5 node-exporter pods in ContainerCreating status

Expected results:
All node-exporter pods should be ready

Additional info:
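For reference, a quick way to see which nodes the stuck pods are scheduled on (plain oc options, nothing cluster-specific assumed beyond the namespace above):

$ oc get pod -n openshift-monitoring -o wide | grep node-exporter | grep ContainerCreating

The NODE column from -o wide identifies the hosts whose sandbox image pull is failing.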
I'm seeing this for essentially every daemonset. See:

kubectl get ev -n openshift-sdn
kubectl get ev -n openshift-node

Those yield the same errors/events.
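For a cluster-wide view, a rough check (standard oc output piped through grep; the message text is taken from the event in the report above) is to filter all events for the sandbox failure:

$ oc get events --all-namespaces | grep FailedCreatePodSandBox

Any namespace showing the same "unable to retrieve auth token" message is hitting the same sandbox image pull failure.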
This is likely the result of the docker auth token for registry.reg-aws.openshift.com expiring. There is a docker config.json local to each node that allows the kubelet to pull the sandbox image (ose-pod). It seems that on a subset of these nodes, the token has expired. This is an issue for any long-lived cluster that 1) pulls ose-pod from a private registry and 2) uses registry auth tokens that expire.
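A minimal sketch of how one might confirm and work around this on an affected node, assuming the container runtime reads its pull credentials from /var/lib/origin/.docker/config.json (that path is an assumption; check the auth file actually configured for the runtime on the node):

# On the affected node (ip-172-31-63-238.ec2.internal in the event above),
# repeat the manifest lookup the kubelet performs; a stale token reproduces
# the "unable to retrieve auth token" error.
$ skopeo inspect --authfile /var/lib/origin/.docker/config.json \
      docker://registry.reg-aws.openshift.com:443/openshift3/ose-pod:v3.11

# Re-authenticate to get a fresh token, copy it to the file the runtime
# reads, and let the kubelet retry creating the sandbox.
$ docker login registry.reg-aws.openshift.com:443
$ cp ~/.docker/config.json /var/lib/origin/.docker/config.json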
Same problem here. Is there a temporary workaround? In addition, the problem started while I was working on my project: everything was going normally when this problem appeared. I tried to recreate the project, deleted absolutely every single thing from the old one and recreated everything, but the bug is still there.
Issue got solved; looks like it was a temporary problem.