Description of problem:
In 3.6, extensions/v1beta1 was removed. In 3.5, there was a play [1] to update the jobs backend to the batch API. It appears as if that did not update completed jobs as they still reference the old API. This leads to a massive spam of unexpected ListAndWatch error logs that slow down the API server.

Oct 10 12:35:10 atomic-openshift-master-api[40847]: E1010 12:35:10.805130 40847 cacher.go:274] unexpected ListAndWatch error: github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: Failed to list *batch.Job: no kind "Job" is registered for version "extensions/v1beta1"

[root@home]# oc get all
Error from server: no kind "Job" is registered for version "extensions/v1beta1"

[1] https://github.com/openshift/openshift-ansible/blob/release-1.5/playbooks/common/openshift-cluster/upgrades/v3_5/storage_upgrade.yml

Version-Release number of selected component (if applicable):

How reproducible:
3.4 to 3.5 to 3.6.1 upgrade

Steps to Reproduce:
1. These jobs were created in 3.4 and running with
2. Upgraded to 3.5 GA when it was released
3. Upgraded to 3.6 and the API is gone

Actual results:
Cluster is degraded due to API availability. Can not delete the jobs or the namespace anymore.

Expected results:
Smooth transition.

Additional info:
This looks like old jobs that somehow slipped past the migration. Remove them with the following command:

ETCDCTL_API=3 etcdctl --key=<path_to_master.etcd-client.key> --cert=<path_to_master.etcd-client.crt> --cacert=<path_to_ca.crt> --endpoints=<etcd_address> del /kubernetes.io/jobs/<namespace>/<job_name>

It's also worth checking the pods created by those jobs, although the garbage collector should clean them up once the job is gone.
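If there are many jobs, it may help to list the affected keys first rather than deleting them one by one by name. The following is only a rough sketch, not a tested procedure: the function name list_stale_jobs and the ETCDCTL_FLAGS variable are hypothetical, and it assumes the stored job objects are readable enough that the string "extensions/v1beta1" appears in the raw value.

```shell
#!/bin/sh
# Sketch: print the etcd keys of jobs still stored as extensions/v1beta1.
# Assumptions (not from this report): connection flags (--key, --cert,
# --cacert, --endpoints) are supplied in ETCDCTL_FLAGS, and the API version
# string is visible in the raw stored value.

export ETCDCTL_API=3
JOBS_PREFIX="/kubernetes.io/jobs/"

list_stale_jobs() {
    # --keys-only prints each key followed by a blank line; drop the blanks.
    etcdctl ${ETCDCTL_FLAGS:-} get "$JOBS_PREFIX" --prefix --keys-only |
    grep -v '^$' |
    while IFS= read -r key; do
        # Fetch only the value and look for the removed API version.
        if etcdctl ${ETCDCTL_FLAGS:-} get "$key" --print-value-only </dev/null |
            grep -q 'extensions/v1beta1'; then
            printf '%s\n' "$key"
        fi
    done
}
```

Calling list_stale_jobs after setting ETCDCTL_FLAGS prints one key per line. Review that list by hand before passing any key to the del command above, and take an etcd backup first.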
Matthew Robson: Did the delete command work for you?
Sorry, yes. The delete command worked and resolved the issue. We cleaned up all of the jobs and pods, then listed all remaining objects to confirm they were gone:

ETCDCTL_API=3 etcdctl --key=<path_to_master.etcd-client.key> --cert=<path_to_master.etcd-client.crt> --cacert=<path_to_ca.crt> --endpoints=<etcd_address> get / --prefix
Matthew Robson: Thanks, then I'll verify this issue.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3389