Bug 1500513 - The extensions/v1beta1 API is not updated on old successful Jobs
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Master
Version: 3.6.1
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.6.z
Assigned To: Maciej Szulik
QA Contact: zhou ying
Depends On:
Blocks:
Reported: 2017-10-10 15:31 EDT by Matthew Robson
Modified: 2017-12-07 02:12 EST (History)

See Also:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-07 02:12:13 EST
Type: Bug
External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2017:3389
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat OpenShift Enterprise security, bug fix, and enhancement update
Last Updated: 2017-12-07 07:09:10 EST

Description Matthew Robson 2017-10-10 15:31:06 EDT
Description of problem:

In 3.6, Job support in extensions/v1beta1 was removed. In 3.5, there was a play[1] to migrate the jobs stored in the backend to the batch API.

It appears that the play did not update completed jobs, as they still reference the old API.

This leads to a flood of unexpected ListAndWatch error logs that slows down the API server.

Oct 10 12:35:10 atomic-openshift-master-api[40847]: E1010 12:35:10.805130   40847 cacher.go:274] unexpected ListAndWatch error: github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/storage/cacher.go:215: Failed to list *batch.Job: no kind "Job" is registered for version "extensions/v1beta1"

[root@home]# oc get all
Error from server: no kind "Job" is registered for version "extensions/v1beta1"

[1] https://github.com/openshift/openshift-ansible/blob/release-1.5/playbooks/common/openshift-cluster/upgrades/v3_5/storage_upgrade.yml
Version-Release number of selected component (if applicable):


How reproducible:

3.4 to 3.5 to 3.6.1 upgrade


Steps to Reproduce:
1. These jobs were created in 3.4 and running with
2. Upgraded to 3.5 GA when it was released
3. Upgraded to 3.6 and the API is gone

Actual results:
The cluster is degraded due to reduced API availability. The jobs and their namespace can no longer be deleted.


Expected results:
Smooth transition.

Additional info:
Comment 3 Maciej Szulik 2017-10-16 11:04:24 EDT
These look like old jobs that somehow slipped past the migration. Remove each of them with the following command:
 
ETCDCTL_API=3 etcdctl --key=<path_to_master.etcd-client.key> --cert=<path_to_master.etcd-client.crt> --cacert=<path_to_ca.crt> --endpoints=<etcd_address> del /kubernetes.io/jobs/<namespace>/<job_name>

It's also worth checking the pods created by those jobs, although they should be cleaned up by the garbage collector
once the job is gone.
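
When several jobs are affected, the delete above can be repeated per job. The following is only a sketch under stated assumptions: it reuses the key layout /kubernetes.io/jobs/<namespace>/<job_name> from the command above, while the function names, the ETCD_FLAGS variable (holding the --key/--cert/--cacert/--endpoints options), and the DRY_RUN switch are hypothetical helpers, not part of etcdctl or OpenShift. By default it only prints the commands so they can be reviewed first.

```shell
# Build the etcd v3 key for one job, using the key layout from the command above.
job_key() {
    # $1 = namespace, $2 = job name
    printf '/kubernetes.io/jobs/%s/%s\n' "$1" "$2"
}

# Read "namespace/job_name" lines on stdin and emit one delete per job.
# ETCD_FLAGS and DRY_RUN are hypothetical variables introduced for this sketch.
delete_jobs() {
    while IFS=/ read -r ns name; do
        # Skip blank or malformed lines.
        [ -n "$ns" ] && [ -n "$name" ] || continue
        key=$(job_key "$ns" "$name")
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run (default): print the command instead of running it.
            echo "ETCDCTL_API=3 etcdctl $ETCD_FLAGS del $key"
        else
            # ETCD_FLAGS is left unquoted on purpose so it splits into options.
            ETCDCTL_API=3 etcdctl $ETCD_FLAGS del "$key"
        fi
    done
}

# Example (prints the command only):
#   printf 'myproject/old-job\n' | delete_jobs
```

Setting DRY_RUN=0 runs the deletes for real. Since this removes objects directly from etcd and bypasses the API server, taking an etcd backup beforehand seems prudent.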
Comment 4 zhou ying 2017-10-17 04:30:59 EDT
Matthew Robson:

Does the delete command work for you?
Comment 5 Matthew Robson 2017-10-31 09:16:15 EDT
Sorry, yes. The delete command worked and it resolved the issue.

We cleaned up all of the jobs and pods and confirmed they were all gone via:

ETCDCTL_API=3 etcdctl --key=<path_to_master.etcd-client.key> --cert=<path_to_master.etcd-client.crt> --cacert=<path_to_ca.crt> --endpoints=<etcd_address> get / --prefix

That command lists all remaining objects.
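
Listing everything under / can produce a lot of output. A narrower check, sketched here under assumptions, is to list only the keys under the jobs prefix and reduce them to namespace/job pairs. The /kubernetes.io/jobs/ prefix is taken from the delete command in comment 3; the use of --keys-only and the remaining_jobs helper are additions for illustration and were not part of the commands actually run in this bug.

```shell
# Read etcd keys on stdin (one per line, blank lines allowed) and print
# "namespace/job_name" for every key of the form /kubernetes.io/jobs/<ns>/<job>.
# remaining_jobs is a hypothetical helper name used only in this sketch.
remaining_jobs() {
    sed -n 's|^/kubernetes\.io/jobs/\([^/][^/]*\)/\([^/][^/]*\)$|\1/\2|p'
}

# Example, with the same TLS and endpoint options as above:
#   ETCDCTL_API=3 etcdctl <options> get /kubernetes.io/jobs/ --prefix --keys-only | remaining_jobs
```

Empty output means no job keys remain under that prefix.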
Comment 6 zhou ying 2017-10-31 21:15:54 EDT
Matthew Robson:

   Thanks, then I'll verify this issue.
Comment 9 errata-xmlrpc 2017-12-07 02:12:13 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3389
