Bug 1396404 - Sometimes "activeDeadlineSeconds" in ScheduledJob doesn't take effect
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Master
Version: 3.3.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.8.0
Assigned To: Maciej Szulik
QA Contact: Chuan Yu
Depends On:
Blocks:
Reported: 2016-11-18 04:20 EST by Bing Li
Modified: 2018-03-28 10:05 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When there is no activity, the job controller touches each job only when it performs a full resync of all jobs. Consequence: Some jobs significantly exceed a short activeDeadlineSeconds. Fix: Enqueue jobs that have a short activeDeadlineSeconds set so that they are resynced more frequently. Result: A short activeDeadlineSeconds is applied correctly.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-03-28 10:05:01 EDT
Type: Bug


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:0489 None None None 2018-03-28 10:05 EDT

Description Bing Li 2016-11-18 04:20:05 EST
Description of problem:
When scheduling jobs with "activeDeadlineSeconds", sometimes the job keeps running after the deadline has passed.

Version-Release number of selected component (if applicable):
OpenShift Master: v3.3.1.3
Kubernetes Master: v1.3.0+52492b4

How reproducible:
Always

Steps to Reproduce:
1. Create a ScheduledJob with "activeDeadlineSeconds":
$ cat sj.yaml 
apiVersion: batch/v2alpha1
kind: ScheduledJob
metadata:
  labels:
    run: sj
  name: sj
spec:
  jobTemplate:
    metadata:
    spec:
      completions: 1
      parallelism: 1
      activeDeadlineSeconds: 10
      template:
        metadata:
          labels:
            run: sj
        spec:
          containers:
          - args:
            - sleep
            - "90"
            image: busybox
            imagePullPolicy: Always
            name: sj
            resources: {}
          restartPolicy: Never
  schedule: '* * * * *'
  suspend: false
$ oc create -f sj.yaml

2. Check whether the job's pod is terminated when "activeDeadlineSeconds" is reached.

Actual results:
2. Sometimes the pod is not terminated at the deadline and runs to completion:
$ oc get job
NAME            DESIRED   SUCCESSFUL   AGE
sj-1552910142   1         0            2m
sj-1628735297   1         1            3m
sj-1628866369   1         0            30s
sj-1704691524   1         0            1m
$ oc get pod
NAME                  READY     STATUS      RESTARTS   AGE
sj-1628735297-97di6   0/1       Completed   0          3m
sj-1628866369-ima1n   1/1       Running     0          35s
$ oc describe job sj-1552910142
...
Events:
  FirstSeen        LastSeen        Count        From                        SubobjectPath        Type                Reason                        Message
  ---------        --------        -----        ----                        -------------        --------        ------                        -------
  18m                18m                1        {job-controller }                        Normal                SuccessfulCreate        Created pod: sj-1552910142-uughm
  17m                17m                1        {job-controller }                        Normal                SuccessfulDelete        Deleted pod: sj-1552910142-uughm
  17m                17m                2        {job-controller }                        Normal                DeadlineExceeded        Job was active longer than specified deadline
$ oc get pod sj-1552910142-uughm
No resources found.
Error from server: pods "sj-1552910142-uughm" not found
$ oc describe job sj-1628735297
...
Events:
  FirstSeen        LastSeen        Count        From                        SubobjectPath        Type                Reason                        Message
  ---------        --------        -----        ----                        -------------        --------        ------                        -------
  20m                20m                1        {job-controller }                        Normal                SuccessfulCreate        Created pod: sj-1628735297-97di6
  18m                18m                1        {job-controller }                        Normal                DeadlineExceeded        Job was active longer than specified deadline
$ oc get pod sj-1628735297-97di6 -o yaml
...
        finishedAt: 2016-11-18T08:10:40Z
        reason: Completed
        startedAt: 2016-11-18T08:09:10Z
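
To make the overshoot in the output above concrete, here is a small Python sketch that computes how long the container ran compared with the configured deadline. The timestamps and the deadline are copied from this report. The variable names are illustrative only, and the sketch assumes the container started at roughly the same time as the job (the deadline is measured from the job's start, which is not shown here).

```python
from datetime import datetime

# Values copied from the report: activeDeadlineSeconds and the container timestamps.
ACTIVE_DEADLINE_SECONDS = 10
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

started_at = datetime.strptime("2016-11-18T08:09:10Z", TIME_FORMAT)
finished_at = datetime.strptime("2016-11-18T08:10:40Z", TIME_FORMAT)

# How long the container actually ran, and by how much it exceeded the deadline.
run_seconds = int((finished_at - started_at).total_seconds())
overshoot_seconds = run_seconds - ACTIVE_DEADLINE_SECONDS

print(f"container ran for {run_seconds}s")            # container ran for 90s
print(f"deadline exceeded by {overshoot_seconds}s")   # deadline exceeded by 80s
```

The 90 second run time matches the "sleep 90" argument in sj.yaml, which suggests the pod ran to completion rather than being stopped at the 10 second deadline.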

Expected results:
The job should be terminated when "activeDeadlineSeconds" is reached and its pod deleted, instead of waiting until the pod completes.
Comment 2 Maciej Szulik 2016-11-23 04:59:52 EST
This is a known problem with activeDeadlineSeconds in jobs, which should be supported. See https://github.com/kubernetes/kubernetes/issues/32149 for more details. I don't have an ETA for a fix yet. It only affects short ADS (activeDeadlineSeconds) values, where short means less than 10 minutes, which is the full resync period of the job controller.
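
The effect of the resync period can be illustrated with a toy model in Python. This is only a sketch, not the job controller's code: the function name and parameters are hypothetical, and it assumes that the job is synced once at creation, that no other events trigger a sync, and that the full resync happens to line up with job creation. In practice pod events also trigger syncs, so the real delay is often shorter, as in the events shown in the description.

```python
import heapq

FULL_RESYNC_SECONDS = 10 * 60  # full resync period mentioned in this comment


def deadline_enforced_at(deadline, resync=FULL_RESYNC_SECONDS, requeue_at_deadline=False):
    """Return the simulated time (seconds since job start) of the first sync
    that sees the deadline as exceeded.

    Hypothetical model: the job is synced at t=0 and then on every full
    resync. If requeue_at_deadline is True, one extra sync is scheduled
    exactly at the deadline.
    """
    sync_times = [0]  # min-heap of pending sync times
    if requeue_at_deadline:
        heapq.heappush(sync_times, deadline)
    while True:
        now = heapq.heappop(sync_times)
        if now >= deadline:
            return now
        # With no other activity, the next sync is the next full resync.
        if not sync_times or sync_times[0] > now + resync:
            heapq.heappush(sync_times, now + resync)


print(deadline_enforced_at(10))                            # 600
print(deadline_enforced_at(10, requeue_at_deadline=True))  # 10
```

Under these assumptions a 10 second deadline is noticed only at the next full resync, while scheduling one extra sync at the deadline lets it be noticed on time.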
Comment 3 Michal Fojtik 2017-03-27 08:26:49 EDT
The upstream issue is not resolved yet and I don't think the fix will make it into 1.6, so I'm setting the target release to 3.7, which better reflects reality.
Comment 4 Maciej Szulik 2017-08-25 05:01:56 EDT
Judging by the upstream issue, this will be part of the 3.8 release at the earliest. Moving the target accordingly.
Comment 5 Maciej Szulik 2017-11-03 08:31:05 EDT
This is waiting for https://github.com/openshift/origin/pull/17115. I doubt it will land this sprint, so I'm adding the UpcomingSprint keyword.
Comment 6 Wang Haoran 2017-11-28 22:35:11 EST
It works fine with:
oc v3.8.0-alpha.0+fe6445a-249
kubernetes v1.8.1+0d5291c
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://127.0.0.1:8443
openshift v3.8.0-alpha.0+e6b20e1
kubernetes v1.7.6+a08f5eeb6
Comment 9 errata-xmlrpc 2018-03-28 10:05:01 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489
