Bug 1918126

Summary: Prometheus in CreateContainerError state after upgrade to 4.5.11
Product: OpenShift Container Platform Reporter: Jaspreet Kaur <jkaur>
Component: Node Assignee: Peter Hunt <pehunt>
Node sub component: Kubelet QA Contact: MinLi <minmli>
Status: CLOSED DUPLICATE Docs Contact:
Severity: urgent    
Priority: urgent CC: alegrand, anpicker, aos-bugs, bjarvis, dosmith, erooth, kakkoyun, lcosic, nagrawal, pehunt, pkrupa, spasquie, surbania, tsweeney, vpagar
Version: 4.5   
Target Milestone: ---   
Target Release: 4.8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-03-03 20:31:06 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jaspreet Kaur 2021-01-20 06:42:58 UTC
Description of problem: Prometheus fails to start after the recent upgrade. The pod status and node logs below show the failure:


oc get pods -n openshift-monitoring -l app=prometheus -o wide
NAME               READY   STATUS                 RESTARTS   AGE     IP              NODE                               NOMINATED NODE   READINESS GATES
prometheus-k8s-0   6/7     CreateContainerError   0          2d19h   192.171.2.22    ocp03infra03.example.com   <none>           <none>
prometheus-k8s-1   7/7     Running                0          22h     192.168.2.160   ocp03infra02.example.com   <none>           <none>

Jan 19 14:21:09 ocp03infra03.example.com crio[276176]: time="2021-01-19 14:21:09.351368767Z" level=info msg="CreateCtr: releasing container name k8s_prometheus_prometheus-k8s-0_openshift-monitoring_0e545ec3-3569-49b1-8ca1-ca765896d77c_0" file="server/container_create.go:565" id=4f0ac1e1-ccbd-425c-869b-0253757358dd name=/runtime.v1alpha2.RuntimeService/CreateContainer
Jan 19 14:21:09 ocp03infra03.ocp03.smbcgroup.com crio[276176]: time="2021-01-19 14:21:09.351494781Z" level=debug msg="Response error: container create failed: time=\"2021-01-19T14:21:09Z\" level=error msg=\"container_linux.go:348: starting container process caused \\\"exec: \\\\\\\"/bin/prometheus\\\\\\\": stat /bin/prometheus: no such file or directory\\\"\"\ncontainer_linux.go:348: starting container process caused \"exec: \\\"/bin/prometheus\\\": stat /bin/prometheus: no such file or directory\"\n" file="go-grpc-middleware/chain.go:25" id=4f0ac1e1-ccbd-425c-869b-0253757358dd name=/runtime.v1alpha2.RuntimeService/CreateContainer
Jan 19 14:21:09 ocp03infra03.example.com hyperkube[3730250]: E0119 14:21:09.351894 3730250 remote_runtime.go:200] CreateContainer in sandbox "83dd41b197ab281618e6a34b052a7ea689c74c60193e2959f988b7751b110220" from runtime service failed: rpc error: code = Unknown desc = container create failed: time="2021-01-19T14:21:09Z" level=error msg="container_linux.go:348: starting container process caused \"exec: \\\"/bin/prometheus\\\": stat /bin/prometheus: no such file or directory\""
Jan 19 14:21:09 ocp03infra03.example.com hyperkube[3730250]: container_linux.go:348: starting container process caused "exec: \"/bin/prometheus\": stat /bin/prometheus: no such file or directory"
Jan 19 14:21:09 ocp03infra03.ocp03.smbcgroup.com hyperkube[3730250]: E0119 14:21:09.352016 3730250 kuberuntime_manager.go:801] container start failed: CreateContainerError: container create failed: time="2021-01-19T14:21:09Z" level=error msg="container_linux.go:348: starting container process caused \"exec: \\\"/bin/prometheus\\\": stat /bin/prometheus: no such file or directory\""
Jan 19 14:21:09 ocp03infra03.example.com hyperkube[3730250]: container_linux.go:348: starting container process caused "exec: \"/bin/prometheus\": stat /bin/prometheus: no such file or directory"
Jan 19 14:21:09 ocp03infra03.example.com hyperkube[3730250]: E0119 14:21:09.352075 3730250 pod_workers.go:191] Error syncing pod 0e545ec3-3569-49b1-8ca1-ca765896d77c ("prometheus-k8s-0_openshift-monitoring(0e545ec3-3569-49b1-8ca1-ca765896d77c)"), skipping: failed to "StartContainer" for "prometheus" with CreateContainerError: "container create failed: time=\"2021-01-19T14:21:09Z\" level=error msg=\"container_linux.go:348: starting container process caused \\\"exec: \\\\\\\"/bin/prometheus\\\\\\\": stat /bin/prometheus: no such file or directory\\\"\"\ncontainer_linux.go:348: starting container process caused \"exec: \\\"/bin/prometheus\\\": stat /bin/prometheus: no such file or directory\"\n"
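Every log line above wraps the same innermost runtime error: the container runtime cannot stat /bin/prometheus. When triaging similar journal output, a small filter can pull that path out of the nested quoting. The following is only a sketch: the function name and regular expression are hypothetical helpers written for this note, not part of CRI-O, the kubelet, or any OpenShift tooling, and the pattern assumes the "stat <path>: no such file or directory" wording seen in these logs.

```python
import re
from typing import Optional

# Assumed pattern: matches the innermost error text seen in the logs above,
# e.g. "stat /bin/prometheus: no such file or directory".
MISSING_FILE = re.compile(r"stat (?P<path>/\S+?): no such file or directory")


def missing_executable(log_line: str) -> Optional[str]:
    """Return the path the runtime failed to stat, or None if the line has no such error."""
    match = MISSING_FILE.search(log_line)
    return match.group("path") if match else None


# Sample taken from the hyperkube output in this report.
sample = (
    'container_linux.go:348: starting container process caused '
    '"exec: \\"/bin/prometheus\\": stat /bin/prometheus: no such file or directory"'
)
print(missing_executable(sample))
```

Run against the journal lines in this report, it should report /bin/prometheus for each error line and None for the level=info line, which suggests the entrypoint binary is missing from the container's root filesystem as the runtime sees it rather than the process crashing after start.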


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results: Prometheus fails to start


Expected results: Prometheus should be running after upgrade


Additional info:

Comment 14 Peter Hunt 2021-03-03 20:31:06 UTC
As per https://bugzilla.redhat.com/show_bug.cgi?id=1918126#c12, I am closing this.

Comment 15 Peter Hunt 2021-04-16 20:13:59 UTC
For posterity, I am updating this because I actually suspect it is due to the attached bug.

*** This bug has been marked as a duplicate of bug 1942536 ***

Comment 16 Red Hat Bugzilla 2023-09-15 00:58:39 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days