Bug 1775252
| Summary: | [Enhancement] Missing degraded condition when the static pod installer is unable to create pods due to networking errors | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Michal Fojtik <mfojtik> |
| Component: | kube-apiserver | Assignee: | Michal Fojtik <mfojtik> |
| Status: | CLOSED ERRATA | QA Contact: | Xingxing Xia <xxia> |
| Severity: | low | Docs Contact: | |
| Priority: | low | | |
| Version: | 4.3.0 | CC: | aos-bugs, eparis, mfojtik, scuppett, xxia |
| Target Milestone: | --- | | |
| Target Release: | 4.3.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1764629 | | |
| : | 1782791 | Environment: | |
| Last Closed: | 2020-07-01 15:02:34 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1782791, 1782793, 1782795 | | |
| Bug Blocks: | 1764629 | | |
Comment 2
Xingxing Xia
2019-12-11 08:15:35 UTC
You should see the InstallerPodNetworkingDegraded condition set *after* 5 minutes (which is the maximum time we allow CNI to fix itself before we set that condition).
Tried again. After 18m, InstallerPodNetworkingDegraded is still not set. Only InstallerPodContainerWaitingDegraded is set, and its "message" is truncated after "because", so the cause is not shown:
[xxia 2019-12-11 22:00:23 my]$ oc get po -n openshift-kube-apiserver --show-labels | grep installer
installer-10-xxia-1-7f894-m-0.c.openshift-qe.internal 0/1 Completed 0 40m app=installer
installer-10-xxia-1-7f894-m-1.c.openshift-qe.internal 0/1 Completed 0 42m app=installer
installer-10-xxia-1-7f894-m-2.c.openshift-qe.internal 0/1 Completed 0 38m app=installer
installer-11-xxia-1-7f894-m-1.c.openshift-qe.internal 0/1 ContainerCreating 0 18m app=installer
...
[xxia 2019-12-11 22:01:30 my]$ oc get kubeapiserver cluster -o yaml
...
status:
  conditions:
  ...
  - lastTransitionTime: "2019-12-11T02:07:45Z"
    message: 3 nodes are active; 3 nodes are at revision 10; 0 nodes have achieved
      new revision 11
    status: "True"
    type: Available
  - lastTransitionTime: "2019-12-11T13:43:15Z"
    message: 3 nodes are at revision 10; 0 nodes have achieved new revision 11
    status: "True"
    type: Progressing
  ...
  - lastTransitionTime: "2019-12-11T13:36:47Z"
    message: The master node(s) "xxia-1-7f894-m-1.c.openshift-qe.internal" not ready
    reason: MasterNodesReady
    status: "True"
    type: NodeControllerDegraded
  ...
  - lastTransitionTime: "2019-12-11T02:04:19Z"
    status: "False"
    type: InstallerPodPendingDegraded
  - lastTransitionTime: "2019-12-11T13:48:52Z"
    message: 'Pod "installer-11-xxia-1-7f894-m-1.c.openshift-qe.internal" on node
      "xxia-1-7f894-m-1.c.openshift-qe.internal" container "installer" is waiting
      for 18m33.058366683s because '
    reason: ContainerCreating
    status: "True"
    type: InstallerPodContainerWaitingDegraded
  - lastTransitionTime: "2019-12-11T02:04:19Z"
    status: "False"
    type: InstallerPodNetworkingDegraded
...
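To pick out which installer-related conditions are actually degraded in output like the above, a small filter can help. This is only a sketch: `degraded_installer_conditions` is a hypothetical helper (not part of `oc` or the operator), and the sample list is a hand-copied subset of the conditions shown above.

```python
def degraded_installer_conditions(conditions):
    """Return (type, reason) pairs for InstallerPod* conditions with status "True"."""
    return [
        (c["type"], c.get("reason", ""))
        for c in conditions
        if c["type"].startswith("InstallerPod") and c.get("status") == "True"
    ]


# Hand-copied subset of the `oc get kubeapiserver cluster -o yaml` output above.
sample = [
    {"type": "NodeControllerDegraded", "status": "True", "reason": "MasterNodesReady"},
    {"type": "InstallerPodPendingDegraded", "status": "False"},
    {
        "type": "InstallerPodContainerWaitingDegraded",
        "status": "True",
        "reason": "ContainerCreating",
    },
    {"type": "InstallerPodNetworkingDegraded", "status": "False"},
]

for cond_type, reason in degraded_installer_conditions(sample):
    print(cond_type, reason)
```

For this sample only InstallerPodContainerWaitingDegraded is reported, which matches the observation that InstallerPodNetworkingDegraded was not set.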
Tested in a 4.3.0-0.nightly-2019-12-13-180405 env and still got http://file.rdu.redhat.com/~xxia/bug-1775252-result-for-c6.txt : for InstallerPodContainerWaitingDegraded, the `because ""` does not say what the cause is; InstallerPodNetworkingDegraded is not True, even though "network is not ready" is reported.
*** Bug 1782795 has been marked as a duplicate of this bug. ***
Xingxing Xia, can you please retest? If this does not work now, can you please capture the installer pod YAML (to check whether the status in the pod carries the reason why it is stuck)?
Verified in 4.3.0-0.nightly-2020-06-23-075250 using the steps in the comment above. After 5 mins, got the message and reason in the InstallerPod* conditions:
$ oc get kubeapiserver cluster -o yaml
...
  - lastTransitionTime: "2020-06-23T10:45:30Z"
    status: "False"
    type: InstallerPodPendingDegraded
  - lastTransitionTime: "2020-06-23T11:42:47Z"
    message: Pod "installer-7-ip-10-0-131-75.ap-northeast-2.compute.internal" on node
      "ip-10-0-131-75.ap-northeast-2.compute.internal" container "installer" is waiting
      for 6m55.3685872s because ""
    reason: ContainerCreating
    status: "True"
    type: InstallerPodContainerWaitingDegraded
  - lastTransitionTime: "2020-06-23T11:42:47Z"
    message: 'Pod "installer-7-ip-10-0-131-75.ap-northeast-2.compute.internal" on
      node "ip-10-0-131-75.ap-northeast-2.compute.internal" observed degraded networking:
      Failed create pod sandbox: rpc error: code = Unknown desc = failed to create
      pod network sandbox k8s_installer-7-ip-10-0-131-75.ap-northeast-2.compute.internal_openshift-kube-apiserver_0a7eecda-3697-44d3-bfe8-ecccfa327d3f_0(8bf0fdb3acd0d390df7361dc70bd97a544f3e468f58c7129146a1d480e818309):
      Multus: [openshift-kube-apiserver/installer-7-ip-10-0-131-75.ap-northeast-2.compute.internal]:
      PollImmediate error waiting for ReadinessIndicatorFile: timed out waiting for
      the condition'
    reason: FailedCreatePodSandBox
    status: "True"
    type: InstallerPodNetworkingDegraded
...
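The behaviour verified above (the networking condition turns True only once the installer pod has been stuck for more than 5 minutes with a sandbox creation failure) can be sketched roughly as follows. This is an illustration under assumptions, not the operator's actual code: the function name, its parameters, and the exact matching rule are hypothetical; only the 5-minute threshold, the condition type, and the `FailedCreatePodSandBox` reason are taken from this report.

```python
from datetime import timedelta

# From the report: the maximum time CNI is given to recover before the condition is set.
NETWORKING_GRACE = timedelta(minutes=5)


def networking_degraded(waiting_for, event_reason, event_message):
    """Build an InstallerPodNetworkingDegraded condition (illustrative only)."""
    cond = {"type": "InstallerPodNetworkingDegraded", "status": "False"}
    # Assumed rule: flip to True only for sandbox failures that outlast the grace period.
    if event_reason == "FailedCreatePodSandBox" and waiting_for > NETWORKING_GRACE:
        cond.update(status="True", reason=event_reason, message=event_message)
    return cond


# Roughly the verified case: the pod had been waiting for about 6m55s.
stuck = networking_degraded(
    timedelta(minutes=6, seconds=55),
    "FailedCreatePodSandBox",
    "Failed create pod sandbox: ...",
)
print(stuck["status"], stuck.get("reason"))
```

A pod that has been waiting for less than the grace period, or whose event reason is something else, keeps the condition at "False" in this sketch.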
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:2628