Bug 2009090
| Summary: | Increased CI reports of "illegally transitioned to Pending" | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Devan Goodwin <dgoodwin> |
| Component: | Node | Assignee: | Elana Hashman <ehashman> |
| Node sub component: | Kubelet | QA Contact: | Sunil Choudhary <schoudha> |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | medium | ||
| Priority: | medium | CC: | aos-bugs, ehashman, wking |
| Version: | 4.9 | ||
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-09-29 22:47:55 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Devan Goodwin
2021-09-29 22:31:30 UTC
Present in 4.9, see e.g. https://bugzilla.redhat.com/show_bug.cgi?id=1997478#c25 > Sep 29 21:45:00 ip-10-0-255-253 hyperkube[1388]: E0929 21:45:00.504009 1388 kubelet_pods.go:1484] "Pod attempted illegal phase transition" pod="openshift-network-diagnostics/network-check-arget-xqqn8" originalStatusPhase=Failed apiStatusPhase=Pending apiStatus="&PodStatus{Phase:Pending,Conditions:[]PodCondition{},Message:,Reason:,HostIP:,PodIP:,StartTime:<nil>,ContainerStatuses:[]ContainerStatus{ContainerStatus{Name:network-check-target-container,State:ContainerState{Waiting:&ContainerStateWaiting{Reason:ContainerCreating,Message:,},Running:nil,Terminated:nil,},LastTerminationState:ContainerState{Waiting:nil,Running:nil,Terminated:nil,},Ready:false,RestartCount:0,Image:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:da24fe6e3f9bf5bf398ea11d384c9ef4d5e964e827408f5e7eedffa3e67e2b26,ImageID:,ContainerID:,Started:nil,},},QOSClass:Burstable,InitContainerStatuses:[]ContainerStatus{},NominatedNodeName:,PodIPs:[]PodIP{},EphemeralContainerStatuses:[]ContainerStatus{},}" Note that https://bugzilla.redhat.com/show_bug.cgi?id=1933760 is an unrelated bug, in which some pods accidentally get reset to Pending from Running. Those error messages stem from event processing as seen by the API server in the OpenShift tests. In this case, we are seeing the kubelet directly log "Pod attempted illegal phase transition", which is invoked here: https://github.com/openshift/kubernetes/blob/f181eb2582e1649676395f25710c14b427b3369c/pkg/kubelet/kubelet_pods.go#L1480-L1484 This log message is only possible when a terminal pod (failed or succeeded) attempts a phase transition. Ah, no, disregard comment above as crossed wires. This does not appear to be any higher impact on 4.10 than it was on 4.9 or 4.8. That bug still has not been fixed. The test is marked as flaky but as this shouldn't actually have any impact on the pods themselves (only the test accounting). Marking as dupe of 1933760. We had one patch go in which significantly improved the symptoms we were seeing in 1933760 but it's not fully fixed. Static pods may not be fixable and it's unclear if it's a genuine bug. I will file a separate bug for the phenomenon in https://bugzilla.redhat.com/show_bug.cgi?id=2009090#c3 *** This bug has been marked as a duplicate of bug 1933760 *** |