https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.2/299#openshift-tests-sig-cli-kubectl-client-k8sio-simple-pod-should-contain-last-line-of-the-log-suiteopenshiftconformanceparallel-suitek8s

May 22 20:15:07.017: INFO: Waiting up to 30s for server preferred namespaced resources to be successfully discovered
May 22 20:15:09.963: INFO: Couldn't delete ns: "kubectl-754": namespace kubectl-754 was not deleted with limit: timed out waiting for the condition, namespace is empty but is not yet removed (&errors.errorString{s:"namespace kubectl-754 was not deleted with limit: timed out waiting for the condition, namespace is empty but is not yet removed"})
May 22 20:15:09.965: INFO: Running AfterSuite actions on all nodes
May 22 20:15:09.965: INFO: Running AfterSuite actions on node 1

This is likely an issue with the namespace controller and needs triage. Medium flake rate.
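For context, the failing step is the test's cleanup, which polls until the namespace is gone and gives up with the "was not deleted with limit" error above. Below is a minimal sketch of that kind of wait, not the framework's actual code; it assumes a recent client-go, and the 5-minute budget and the "kubectl-754" name are only illustrative.

```go
package main

import (
	"context"
	"fmt"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// waitForNamespaceDeleted polls until the namespace returns NotFound or the
// timeout expires - roughly the shape of the e2e cleanup that reports
// "was not deleted with limit: timed out waiting for the condition".
func waitForNamespaceDeleted(client kubernetes.Interface, name string, timeout time.Duration) error {
	return wait.PollImmediate(2*time.Second, timeout, func() (bool, error) {
		_, err := client.CoreV1().Namespaces().Get(context.TODO(), name, metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			return true, nil // namespace is fully gone
		}
		if err != nil {
			return false, err
		}
		return false, nil // namespace still exists (likely Terminating), keep polling
	})
}

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)
	// Illustrative timeout; the test gives up while the namespace controller
	// may still be retrying in the background.
	if err := waitForNamespaceDeleted(client, "kubectl-754", 5*time.Minute); err != nil {
		fmt.Println("namespace not deleted in time:", err)
	}
}
```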
Similar errors show up very often in CI search: https://search.svc.ci.openshift.org/?search=was+not+deleted+with+limit&maxAge=168h&context=2&type=all
I have been looking at this issue today. First and foremost, the namespaces do get deleted eventually - the `ns` controller that runs as part of `kcm` prints the status into its log file - so I think we could lower the severity of the issue. I have analyzed logs from a few test runs, and it seems the namespaces couldn't be deleted right away because some pods were still pending. This conflicts with the error msg from the tests: "timed out waiting for the condition, namespace is empty but is not yet removed". I think the `ns` controller fell behind because it had already retried many times and kept increasing its next sync period (backoff); see the sketch after the timeline.

For example, let's put the events from https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_installer/2161/pull-ci-openshift-installer-master-e2e-aws/7079/ on a timeline:

1. At 01:29:09.065 the test decided to destroy the namespace.
2. From 01:29:40 to 01:34:25 the ns controller tried to remove the namespace but couldn't, due to "unexpected items still remain in namespace: e2e-test-build-webhooks-xvsrq for gvr: /v1, Resource=pods".
3. At 01:39:11.605 the test gave up - "Couldn't delete ns: "e2e-test-build-webhooks-xvsrq": namespace e2e-test-build-webhooks-xvsrq was not deleted with limit: timed out waiting for the condition, namespace is empty but is not yet removed".
4. At 01:39:28 (the next sync run) the ns controller removed the namespace - "namespace_controller.go:171] Namespace has been deleted e2e-test-build-webhooks-xvsrq".

I have opened https://github.com/openshift/origin/pull/23557 to see why the pods cannot be deleted.
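To illustrate the backoff effect mentioned above, here is a minimal sketch using client-go's exponential failure rate limiter, the same general mechanism controller workqueues use to space out retries. The base and max delays below are illustrative assumptions, not the namespace controller's actual configuration; the point is only that repeated failed syncs push the next attempt further out, so the final successful sync can land after the test has already given up.

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/util/workqueue"
)

func main() {
	// Every failed sync reports the item back to the rate limiter, which
	// doubles the delay until the cap is reached. Delays here are examples.
	limiter := workqueue.NewItemExponentialFailureRateLimiter(5*time.Second, 20*time.Minute)

	var total time.Duration
	for attempt := 1; attempt <= 8; attempt++ {
		delay := limiter.When("e2e-test-build-webhooks-xvsrq")
		total += delay
		fmt.Printf("attempt %d: requeued after %v (cumulative wait %v)\n", attempt, delay, total)
	}
	// After a handful of failures the next sync can land several minutes out,
	// i.e. after the test's own deletion timeout has already expired.
}
```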
I'm going to assign the issue to the Node team, as I haven't found anything suspicious on the server side and I think it has something to do with how the kubelet or the underlying container runtime handles container creation/deletion/reporting. It is worth knowing why some containers sometimes need more time to be deleted. The issue can be easily reproduced by running the "TestWebhook" test, for example:

openshift-tests run openshift/conformance --dry-run | grep -E "\sTestWebhook\s" | openshift-tests run -f -

I have also attached the logs from a faulty run where "pushbuild-2-build" and "pushbuild-1-build" weren't removed right away.
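When reproducing, something like the following can help spot pods that are stuck mid-teardown (deletion requested but the kubelet/runtime has not finished removing them). This is just a hedged diagnostic sketch, not part of the test or the fix; it assumes a recent client-go, and the namespace name is only an example taken from the attached logs.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// Example namespace name; substitute the terminating namespace under test.
	namespace := "e2e-test-build-webhooks-xvsrq"
	pods, err := client.CoreV1().Pods(namespace).List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, pod := range pods.Items {
		if pod.DeletionTimestamp != nil {
			// The pod is marked for deletion but still reported by the API,
			// which is what keeps the namespace in Terminating.
			fmt.Printf("%s: terminating since %s (phase %s)\n",
				pod.Name, pod.DeletionTimestamp.Time, pod.Status.Phase)
		}
	}
}
```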
*** This bug has been marked as a duplicate of bug 1727090 ***