Bug 2018413
| Field | Value |
|---|---|
| Summary | Error: context deadline exceeded, OCP 4.8.9 |
| Product | OpenShift Container Platform |
| Component | Node |
| Node sub component | CRI-O |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | medium |
| Version | 4.8 |
| Target Release | 4.10.0 |
| Hardware | x86_64 |
| OS | Linux |
| Type | Bug |
| Reporter | akmithal |
| Assignee | Peter Hunt (pehunt) |
| QA Contact | Sunil Choudhary (schoudha) |
| CC | akmithal, aos-bugs, lmauda, nchhabra, pehunt |
| Flags | lmauda: needinfo?; schoudha: needinfo? |
| Last Closed | 2022-03-10 16:23:25 UTC |

Description
akmithal
2021-10-29 08:19:55 UTC
Created attachment 1838236
Must gather logs collected for this error
You seem to have attached the Ceph must-gather rather than the OpenShift one. Can you get me the resulting tar from

```
oc adm must-gather --node-name $node
```

where $node is the node this deployment is stuck on?

Another instance of this error occurred today on deleting the NooBaa endpoint pod. This pod had been running fine for a few days.

-----------------------------------------------
[root@ocp-akshat-1-inf ~]# oc get pod -o wide
NAME                                               READY   STATUS                 RESTARTS   AGE    IP             NODE                                   NOMINATED NODE   READINESS GATES
noobaa-core-0                                      1/1     Running                0          5d3h   10.254.5.77    master0.ocp-akshat-1.cp.fyre.ibm.com   <none>           <none>
noobaa-db-pg-0                                     1/1     Running                0          5d3h   10.254.5.74    master0.ocp-akshat-1.cp.fyre.ibm.com   <none>           <none>
noobaa-default-backing-store-noobaa-pod-f0ff5410   1/1     Running                0          5d3h   10.254.5.76    master0.ocp-akshat-1.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-64bc4dffb6-wrw9x                   0/1     CreateContainerError   0          15m    10.254.5.163   master0.ocp-akshat-1.cp.fyre.ibm.com   <none>           <none>
noobaa-operator-9bcc845cb-4r22x                    1/1     Running                32         5d3h   10.254.8.87    master2.ocp-akshat-1.cp.fyre.ibm.com   <none>           <none>
ocs-metrics-exporter-f97b6c966-2ctp9               1/1     Running                0          5d3h   10.254.5.71    master0.ocp-akshat-1.cp.fyre.ibm.com   <none>           <none>
ocs-operator-88f9d4c99-md28g                       1/1     Running                35         5d3h   10.254.5.69    master0.ocp-akshat-1.cp.fyre.ibm.com   <none>           <none>
odf-console-77dc4875d4-sv5f6                       1/1     Running                0          5d3h   10.254.5.72    master0.ocp-akshat-1.cp.fyre.ibm.com   <none>           <none>
odf-operator-controller-manager-6dbb67c6f9-w5mq6   2/2     Running                40         5d3h   10.254.8.86    master2.ocp-akshat-1.cp.fyre.ibm.com   <none>           <none>
rook-ceph-operator-76ff6c5b9b-54j5l                1/1     Running                0          5d3h   10.254.5.70    master0.ocp-akshat-1.cp.fyre.ibm.com   <none>           <none>
-----------------------------------------------

I have collected logs from the master0 node and uploaded them to Box, as the files are quite big: https://ibm.ent.box.com/folder/145794528783

Can you attach them in Google Drive or something else? I am not able to access Box without an IBM account.

Hi, I have attached this file in Google Drive: https://drive.google.com/file/d/1zZDNBmcgW0eRmr1sEMPO90deladG2V_Q/view?usp=sharing

If I were to guess, I would guess this container has a very large directory attached as a volume. Is that the case? If so, following the steps in https://hackmd.io/7heLp_noQmqU_Ef7VaiCKg (eventually will be published to https://access.redhat.com/node/6221251) for the SELinux relabeling may help.

Can we try upgrading and trying that out?

Hi @pehunt and @liranmauda,
This problem didn't go away with the fix from @liran.mauda.

---------------------------------------------
Events:
Type     Reason          Age                    From               Message
----     ------          ----                   ----               -------
Normal   Scheduled       17m                    default-scheduler  Successfully assigned openshift-storage/noobaa-endpoint-7cb76c78c6-vt8k5 to worker1.ocp-akshat-2.cp.fyre.ibm.com
Warning  FailedMount     6m37s (x396 over 16m)  kubelet            MountVolume.SetUp failed for volume "pvc-f266e7f9-da62-41bf-aed8-527f34ccd341" : kubernetes.io/csi: mounter.SetUpAt failed to check for STAGE_UNSTAGE_VOLUME capability: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/lib/kubelet/plugins/spectrumscale.csi.ibm.com/csi.sock: connect: connection refused"
Normal   AddedInterface  4m26s                  multus             Add eth0 [10.254.16.50/22] from openshift-sdn
Warning  Failed          25s (x2 over 2m26s)    kubelet            Error: context deadline exceeded

[root@api ~]# podo
NAME                                               READY   STATUS                 RESTARTS        AGE
noobaa-core-0                                      1/1     Running                0               22h
noobaa-db-pg-0                                     1/1     Running                0               23h
noobaa-default-backing-store-noobaa-pod-1bfc596f   1/1     Running                0               23h
noobaa-endpoint-7cb76c78c6-947n5                   0/1     ContainerCreating      0               17m
noobaa-endpoint-7cb76c78c6-hctm7                   0/1     CreateContainerError   0               17m
noobaa-endpoint-7cb76c78c6-hglz4                   1/1     Running                0               17m
noobaa-endpoint-7cb76c78c6-kgnm8                   1/1     Running                0               17m
noobaa-endpoint-7cb76c78c6-mc4mq                   0/1     ContainerCreating      0               17m
noobaa-endpoint-7cb76c78c6-qmr9c                   0/1     ContainerCreating      0               17m
noobaa-endpoint-7cb76c78c6-tpdh4                   0/1     ContainerCreating      0               17m
noobaa-endpoint-7cb76c78c6-vt8k5                   0/1     CreateContainerError   0               17m
noobaa-operator-6c567cfcdd-wvlcn                   1/1     Running                8 (5h10m ago)   23h
ocs-metrics-exporter-5c87b7c77-fpk8s               1/1     Running                0               23h
ocs-operator-c494fbdf5-gq9zw                       1/1     Running                4 (15h ago)     23h
odf-console-67c5878d75-4zl7n                       1/1     Running                0               23h
odf-operator-controller-manager-65c98b8b55-mc7cg   2/2     Running                3 (15h ago)     23h
rook-ceph-operator-8585fd44df-f7vzd                1/1     Running                0               23h
---------------------------------------------

The fix done was:
1. kubectl edit scc
   Change type to RunAsAny for seLinuxContext.
2. Edit the noobaa-endpoint deployment and add the following under securityContext:
-----
fsGroupChangePolicy: "OnRootMismatch"
seLinuxOptions:
  type: "spc_t"
-----

@akmithal.com Looking at your YAMLs (on DM), it looks like the deployment was not edited... The NooBaa operator overwrites manual changes to those YAMLs, so changing the replica count of the NooBaa operator to 0 and then manually editing the YAML should work. Please update us.

Hello akmithal, could you help check whether the issue is fixed after the PR is merged?

I see the PR is merged in the NooBaa operator and, based on comment #12, I am marking this verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056
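
For reference, the workaround discussed in this thread could be laid out roughly as in the fragment below. This is only a sketch, not a complete manifest and not the change that shipped in the operator PR. The Deployment name `noobaa-endpoint` is inferred from the pod names, the namespace `openshift-storage` is taken from the Events output, and the placement under the pod-level `spec.template.spec.securityContext` is an assumption, since the thread only says "under securityContext". The thread does not say which SCC was edited, so that is left open.

```yaml
# Sketch only: a fragment, not a full Deployment manifest.
#
# Step 1 (SCC, name not given in the thread): relax the SELinux strategy.
#
#   seLinuxContext:
#     type: RunAsAny
#
# Step 2: pod-level securityContext on the endpoint Deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: noobaa-endpoint          # assumed from the pod names in the thread
  namespace: openshift-storage   # namespace shown in the Events output
spec:
  template:
    spec:
      securityContext:
        # Skip the recursive ownership change when the volume root
        # already has the expected owner and permissions.
        fsGroupChangePolicy: "OnRootMismatch"
        # Run the pod with the spc_t SELinux type, as proposed in the thread.
        seLinuxOptions:
          type: "spc_t"
```

As noted in the thread, the NooBaa operator overwrites manual changes to this Deployment, so the edit would likely only persist while the operator is scaled down, for example with `oc scale deployment/noobaa-operator --replicas=0 -n openshift-storage` (the Deployment name here is again inferred from the pod name).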