Bug 1752982
| Summary: | [GCP] e2e failure: namespace is empty but is not yet removed | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | David Eads <deads> | |
| Component: | Node | Assignee: | Ryan Phillips <rphillips> | |
| Status: | CLOSED ERRATA | QA Contact: | Sunil Choudhary <schoudha> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 4.2.0 | CC: | aos-bugs, aos-storage-staff, fbranczy, hekumar, jokerman, mpatel | |
| Target Milestone: | --- | |||
| Target Release: | 4.2.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1753293 | Environment: | ||
| Last Closed: | 2019-10-16 06:41:28 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1753293, 1758327 | |||
The log specifically implicates pods, which are deleted by the kubelet. Assigning to the node team to take the next look at why kubelets are taking a long time to remove pods.

The attached PR will improve the responsiveness of the kubelet in sending SIGKILL signals to pods.

*** Bug 1727090 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922
The error looks like

```
fail [k8s.io/kubernetes/test/e2e/framework/framework.go:338]: Sep 17 05:10:50.330: Couldn't delete ns: "e2e-test-image-change-build-triggger-rpdgr": namespace e2e-test-image-change-build-triggger-rpdgr was not deleted with limit: timed out waiting for the condition, namespace is empty but is not yet removed (&errors.errorString{s:"namespace e2e-test-image-change-build-triggger-rpdgr was not deleted with limit: timed out waiting for the condition, namespace is empty but is not yet removed"})
```

from https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-gcp-4.2/308

Chasing this through the KCM logs, we can see that namespace deletion starts sometime before 05:01:

E0917 05:01:11.157191 1 namespace_controller.go:148] unexpected items still remain in namespace: e2e-test-image-change-build-triggger-rpdgr for gvr: /v1, Resource=pods
E0917 05:06:25.855298 1 namespace_controller.go:148] unexpected items still remain in namespace: e2e-test-image-change-build-triggger-rpdgr for gvr: /v1, Resource=pods
I0917 05:11:30.194300 1 namespace_controller.go:171] Namespace has been deleted e2e-test-image-change-build-triggger-rpdgr

There are lots of other repeated messages as we back off for an extended period of time, but the most noteworthy bit is that it takes over 5 minutes to delete pods.

Volume detach seems like a reasonable thing to check first. I seem to recall that we thought we had a problem on GCP in 3.11.
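For anyone repeating this triage on another run, the timing can be pulled out of the KCM log lines quoted above with a short script. This is only a rough sketch: the helper names (`parse_klog`, `namespace_deletion_span`) are made up for illustration, the message substrings are taken from the three lines in this report, and the year is assumed to be 2019 because the klog header does not carry one.

```python
import re
from datetime import datetime

# klog header: severity letter, MMDD, HH:MM:SS.micros, pid, file:line], message
KLOG_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<mmdd>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ "
    r"(?P<src>[^\]]+)\] (?P<msg>.*)$"
)


def parse_klog(line, year=2019):
    """Return (timestamp, message) for a klog line, or None if it does not match.

    The year is an assumption; klog timestamps omit it.
    """
    m = KLOG_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(f"{year}{m['mmdd']} {m['time']}", "%Y%m%d %H:%M:%S.%f")
    return ts, m["msg"]


def namespace_deletion_span(lines, namespace):
    """Summarise how long the namespace controller kept seeing leftover items."""
    remaining = []
    deleted = None
    for line in lines:
        parsed = parse_klog(line)
        if parsed is None or namespace not in parsed[1]:
            continue
        ts, msg = parsed
        if "unexpected items still remain" in msg:
            remaining.append(ts)
        elif "Namespace has been deleted" in msg:
            deleted = ts
    if not remaining or deleted is None:
        return None
    return {
        # time between the first and last "items still remain" message
        "pods_remaining_for": max(remaining) - min(remaining),
        # time from the first such message until the namespace was gone
        "total_observed": deleted - min(remaining),
    }


# The three KCM lines quoted in this report.
LOG_LINES = [
    "E0917 05:01:11.157191 1 namespace_controller.go:148] unexpected items still remain in namespace: e2e-test-image-change-build-triggger-rpdgr for gvr: /v1, Resource=pods",
    "E0917 05:06:25.855298 1 namespace_controller.go:148] unexpected items still remain in namespace: e2e-test-image-change-build-triggger-rpdgr for gvr: /v1, Resource=pods",
    "I0917 05:11:30.194300 1 namespace_controller.go:171] Namespace has been deleted e2e-test-image-change-build-triggger-rpdgr",
]

span = namespace_deletion_span(LOG_LINES, "e2e-test-image-change-build-triggger-rpdgr")
print(span["pods_remaining_for"], span["total_observed"])
```

On the quoted lines this gives a little over 5 minutes between the first and last "items still remain" messages, and a little over 10 minutes until the namespace is reported deleted. Both are lower bounds, since deletion started before the first quoted line.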