Description of problem:
After updating the concurrencyPolicy of a scheduledjob to Replace, jobs are not scheduled according to the policy. One job is scheduled, and after that no new jobs are scheduled:

[root@dhcp-140-15 ~]# oc get scheduledjobs
NAME      SCHEDULE      SUSPEND   ACTIVE    LAST-SCHEDULE
sj3       */1 * * * *   False     2         Wed, 19 Oct 2016 10:28:00 +0800
[root@dhcp-140-15 ~]# oc get jobs
NAME             DESIRED   SUCCESSFUL   AGE
sj3-3360002802   1         0            24s
sj3-467178220    1         0            1m
[root@dhcp-140-15 ~]# oc patch scheduledjobs sj3 -p '{"spec":{"concurrencyPolicy": "Replace"}}'
"sj3" patched
[root@dhcp-140-15 ~]# oc get scheduledjobs
NAME      SCHEDULE      SUSPEND   ACTIVE    LAST-SCHEDULE
sj3       */1 * * * *   False     2         Wed, 19 Oct 2016 10:28:00 +0800
[root@dhcp-140-15 ~]# oc get scheduledjobs
NAME      SCHEDULE      SUSPEND   ACTIVE    LAST-SCHEDULE
sj3       */1 * * * *   False     3         Wed, 19 Oct 2016 10:29:00 +0800
[root@dhcp-140-15 ~]# oc get scheduledjobs
NAME      SCHEDULE      SUSPEND   ACTIVE    LAST-SCHEDULE
sj3       */1 * * * *   False     3         Wed, 19 Oct 2016 10:29:00 +0800
[root@dhcp-140-15 ~]# oc get pod
NAME                   READY     STATUS    RESTARTS   AGE
sj3-4060648175-uqfm9   1/1       Running   0          40s
[root@dhcp-140-15 ~]# oc get jobs
NAME             DESIRED   SUCCESSFUL   AGE
sj3-4060648175   1         0            46s
[root@dhcp-140-15 ~]# oc get jobs
NAME             DESIRED   SUCCESSFUL   AGE
sj3-4060648175   1         0            1m
[root@dhcp-140-15 ~]# oc get jobs
NAME             DESIRED   SUCCESSFUL   AGE
sj3-4060648175   1         1            10m
[root@dhcp-140-15 ~]# oc get jobs
NAME             DESIRED   SUCCESSFUL   AGE
sj3-4060648175   1         1            28m
[root@dhcp-140-15 ~]# oc get scheduledjobs
NAME      SCHEDULE      SUSPEND   ACTIVE    LAST-SCHEDULE
sj3       */1 * * * *   False     2         Wed, 19 Oct 2016 10:29:00 +0800

The master log also contains entries like:

Oct 18 22:34:44 qe-pm-chuyumaster-1 docker[23945]: I1018 22:34:44.823091 1 event.go:216] Event(api.ObjectReference{Kind:"ScheduledJob", Namespace:"chuyu", Name:"sj3", UID:"67363c5c-95a3-11e6-a188-42010af00014", APIVersion:"batch", ResourceVersion:"3109", FieldPath:""}): type: 'Warning' reason: 'FailedGet' Get job: jobs.batch "sj3-467178220" not found
Oct 18 22:34:54 qe-pm-chuyumaster-1 docker[23945]: I1018 22:34:54.834128 1 event.go:216] Event(api.ObjectReference{Kind:"ScheduledJob", Namespace:"chuyu", Name:"sj3", UID:"67363c5c-95a3-11e6-a188-42010af00014", APIVersion:"batch", ResourceVersion:"3109", FieldPath:""}): type: 'Warning' reason: 'FailedGet' Get job: jobs.batch "sj3-467178220" not found
Oct 18 22:35:04 qe-pm-chuyumaster-1 docker[23945]: I1018 22:35:04.846256 1 event.go:216] Event(api.ObjectReference{Kind:"ScheduledJob", Namespace:"chuyu", Name:"sj3", UID:"67363c5c-95a3-11e6-a188-42010af00014", APIVersion:"batch", ResourceVersion:"3109", FieldPath:""}): type: 'Warning' reason: 'FailedGet' Get job: jobs.batch "sj3-467178220" not found
Oct 18 22:35:14 qe-pm-chuyumaster-1 docker[23945]: I1018 22:35:14.858327 1 event.go:216] Event(api.ObjectReference{Kind:"ScheduledJob", Namespace:"chuyu", Name:"sj3", UID:"67363c5c-95a3-11e6-a188-42010af00014", APIVersion:"batch", ResourceVersion:"3109", FieldPath:""}): type: 'Warning' reason: 'FailedGet' Get job: jobs.batch "sj3-467178220" not found
Oct 18 22:35:24 qe-pm-chuyumaster-1 docker[23945]: I1018 22:35:24.874125 1 event.go:216] Event(api.ObjectReference{Kind:"ScheduledJob", Namespace:"chuyu", Name:"sj3", UID:"67363c5c-95a3-11e6-a188-42010af00014", APIVersion:"batch", ResourceVersion:"3109", FieldPath:""}): type: 'Warning' reason: 'FailedGet' Get job: jobs.batch "sj3-467178220" not found

Version-Release number of selected component (if applicable):
openshift v3.3.1.3

How reproducible:
Always

Steps to Reproduce:
1. Create a scheduledjob:
   oc run sj3 --image=busybox --restart=Never --schedule="*/1 * * * *" -- sleep 300
2. Check the master log.
3. Set the concurrencyPolicy of the scheduledjob to Replace:
   oc patch scheduledjobs sj3 -p '{"spec":{"concurrencyPolicy": "Replace"}}'
4. Check the scheduledjob with 'oc get jobs' and 'oc get scheduledjobs', and check the master log.

Actual results:
1. No new jobs are scheduled according to the cron setting.
2. The master log contains:
Oct 18 22:35:24 qe-pm-chuyumaster-1 docker[23945]: I1018 22:35:24.874125 1 event.go:216] Event(api.ObjectReference{Kind:"ScheduledJob", Namespace:"chuyu", Name:"sj3", UID:"67363c5c-95a3-11e6-a188-42010af00014", APIVersion:"batch", ResourceVersion:"3109", FieldPath:""}): type: 'Warning' reason: 'FailedGet' Get job: jobs.batch "sj3-467178220" not found

Expected results:
1. New jobs are scheduled according to the cron setting.
2. The master log should not contain entries like this.

Additional info:
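The repeated FailedGet warnings, together with an ACTIVE count that never drops to match the jobs actually present, suggest that the scheduledjob's status still lists jobs that no longer exist. The Python sketch below models that situation only; it is not the controller's code. The function name `sync_active`, its arguments, and the warning strings are hypothetical, chosen for illustration, and the sample state is loosely modeled on the output above.

```python
# Illustrative model only; not the ScheduledJob controller's implementation.

def sync_active(active, existing_jobs):
    """Split the recorded active list into live jobs and stale-entry warnings.

    active        -- job names recorded as active in the scheduledjob status
    existing_jobs -- job names actually present in the namespace
    Both arguments are hypothetical stand-ins for API objects.
    """
    still_active = []
    warnings = []
    for name in active:
        if name in existing_jobs:
            still_active.append(name)
        else:
            # Same shape as the warning seen in the master log.
            warnings.append('FailedGet: jobs.batch "%s" not found' % name)
    return still_active, warnings


# Two jobs were active, the policy was switched to Replace, the old jobs
# disappeared, and a new job was created; the recorded list was not pruned.
active = ["sj3-3360002802", "sj3-467178220", "sj3-4060648175"]
existing = {"sj3-4060648175"}

pruned, warnings = sync_active(active, existing)
print(pruned)
for w in warnings:
    print(w)
```

In this model, a status list that is never pruned yields the same warning on every sync, which matches the 10-second cadence of the log entries.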
ScheduledJobs are a tech preview feature in 3.3.1 (alpha in kubernetes in origin master), so this does not block the release in any way. I'll fix the problem in master only.
Upstream fix is in https://github.com/kubernetes/kubernetes/pull/35420
Downstream cherry-pick is in https://github.com/openshift/origin/pull/11523
Commit pushed to master at https://github.com/openshift/origin
https://github.com/openshift/origin/commit/f5e3c6dbbb59b8a6e374485c8d11d829b4571a91
Merge pull request #11523 from soltysh/bug1386463
Merged by openshift-bot
Checked with devenv-fedora_5292; the issue is still not fixed.
openshift v1.4.0-alpha.0+8ecb3f5-997

[chuyu@dhcp-140-15 redhat]$ oc get jobs
NAME             DESIRED   SUCCESSFUL   AGE
sj3-1705609028   1         0            1m
[chuyu@dhcp-140-15 redhat]$ oc get pods
NAME                   READY     STATUS    RESTARTS   AGE
sj3-1705609028-o9rjg   1/1       Running   0          1m
[chuyu@dhcp-140-15 redhat]$ oc get scheduledjobs
NAME      SCHEDULE      SUSPEND   ACTIVE    LAST-SCHEDULE
sj3       */1 * * * *   False     2         Wed, 02 Nov 2016 13:59:00 +0800

I1102 06:04:52.772985 1133 event.go:217] Event(api.ObjectReference{Kind:"ScheduledJob", Namespace:"chuyu", Name:"sj3", UID:"3f739531-a0c1-11e6-904b-0ef7883dfef8", APIVersion:"batch", ResourceVersion:"649", FieldPath:""}): type: 'Warning' reason: 'FailedGet' Get job: jobs.batch "sj3-1781434183" not found
I've followed the steps described in #c1 and it's working as expected. I am running v1.4.0-alpha.0+537c0a5-1006 which is only a few commits ahead of what you were testing with. Can you give me the exact steps you're testing with?
Here are my steps on openshift v1.4.0-alpha.0+90d8c62-1000:
1. Log in as a normal user and create a new project "chuyu".
2. Schedule a job with 'oc run sj3 --image=busybox --restart=Never --schedule="*/1 * * * *" -- sleep 300'.
3. Wait about 5m, then edit the scheduledjob sj3 and set concurrencyPolicy to 'Replace'.
4. Check scheduledjobs, jobs and pods. From then on, no new job is scheduled.

According to the docs, with concurrencyPolicy Replace a new job should be scheduled every minute to replace the old one. See the command output:

[chuyu@dhcp-140-15 ~]$ oc get scheduledjobs
NAME      SCHEDULE      SUSPEND   ACTIVE    LAST-SCHEDULE
sj3       */1 * * * *   False     4         Thu, 03 Nov 2016 15:27:00 +0800
[chuyu@dhcp-140-15 ~]$ oc get jobs
NAME             DESIRED   SUCCESSFUL   AGE
sj3-1782089543   1         0            2m
sj3-1857914698   1         0            2m
sj3-1858045770   1         0            17s
sj3-1933870925   1         0            1m
[chuyu@dhcp-140-15 ~]$ oc edit scheduledjobs sj3
scheduledjob "sj3" edited
[chuyu@dhcp-140-15 ~]$ oc get scheduledjobs
NAME      SCHEDULE      SUSPEND   ACTIVE    LAST-SCHEDULE
sj3       */1 * * * *   False     5         Thu, 03 Nov 2016 15:28:00 +0800
[chuyu@dhcp-140-15 ~]$ oc get scheduledjobs
NAME      SCHEDULE      SUSPEND   ACTIVE    LAST-SCHEDULE
sj3       */1 * * * *   False     4         Thu, 03 Nov 2016 15:28:00 +0800
[chuyu@dhcp-140-15 ~]$ oc get jobs
NAME             DESIRED   SUCCESSFUL   AGE
sj3-1782089543   1         1            5m
sj3-1858045770   1         0            3m
[chuyu@dhcp-140-15 ~]$ oc get pod
NAME                   READY     STATUS      RESTARTS   AGE
sj3-1782089543-8cf0q   0/1       Completed   0          5m
sj3-1858045770-tel3i   1/1       Running     0          3m
[chuyu@dhcp-140-15 ~]$ oc get pod
NAME                   READY     STATUS      RESTARTS   AGE
sj3-1782089543-8cf0q   0/1       Completed   0          6m
sj3-1858045770-tel3i   1/1       Running     0          4m
[chuyu@dhcp-140-15 ~]$ oc get jobs
NAME             DESIRED   SUCCESSFUL   AGE
sj3-1782089543   1         1            7m
sj3-1858045770   1         1            5m
[chuyu@dhcp-140-15 ~]$ oc get scheduledjobs
NAME      SCHEDULE      SUSPEND   ACTIVE    LAST-SCHEDULE
sj3       */1 * * * *   False     3         Thu, 03 Nov 2016 15:28:00 +0800
[chuyu@dhcp-140-15 ~]$ oc get jobs
NAME             DESIRED   SUCCESSFUL   AGE
sj3-1782089543   1         1            7m
sj3-1858045770   1         1            5m
[chuyu@dhcp-140-15 ~]$ oc get pods
NAME                   READY     STATUS      RESTARTS   AGE
sj3-1782089543-8cf0q   0/1       Completed   0          7m
sj3-1858045770-tel3i   0/1       Completed   0          5m
[chuyu@dhcp-140-15 ~]$ oc get pods
NAME                   READY     STATUS      RESTARTS   AGE
sj3-1782089543-8cf0q   0/1       Completed   0          8m
sj3-1858045770-tel3i   0/1       Completed   0          6m
[chuyu@dhcp-140-15 ~]$ oc get scheduledjobs
NAME      SCHEDULE      SUSPEND   ACTIVE    LAST-SCHEDULE
sj3       */1 * * * *   False     3         Thu, 03 Nov 2016 15:28:00 +0800
[chuyu@dhcp-140-15 ~]$ oc get scheduledjobs
NAME      SCHEDULE      SUSPEND   ACTIVE    LAST-SCHEDULE
sj3       */1 * * * *   False     3         Thu, 03 Nov 2016 15:28:00 +0800
[chuyu@dhcp-140-15 ~]$ date
Thu Nov 3 15:38:05 CST 2016
Checked with openshift v1.4.0-alpha.0+019d471-1064 and still get the same results. I also checked the source code on the instance, and the PR has been merged. According to https://github.com/openshift/openshift-docs/blob/master/dev_guide/scheduled_jobs.adoc, a scheduledjob with the "Replace" concurrency policy "cancels the currently running job and replaces it with a new one." That is the behavior that is not working here.
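To make the expected behavior concrete, the Python sketch below models what one schedule tick should do under each concurrency policy. It is an illustration, not the controller's code. The Replace case follows the docs quoted above; the Allow and Forbid cases follow the usual Kubernetes semantics and are not discussed in this bug. The function `tick` and the job names are hypothetical.

```python
# Illustrative model of one schedule tick; not the controller's implementation.

def tick(policy, running, new_job):
    """Return (running_after, cancelled) for one schedule tick.

    policy  -- "Allow", "Forbid", or "Replace"
    running -- names of jobs still running when the tick fires
    new_job -- name the newly scheduled job would get (hypothetical)
    """
    if policy == "Allow":
        # Start the new job alongside whatever is still running.
        return running + [new_job], []
    if policy == "Forbid":
        # Skip this run if a previous job has not finished yet.
        if running:
            return list(running), []
        return [new_job], []
    if policy == "Replace":
        # Cancel everything still running and start the new job.
        return [new_job], list(running)
    raise ValueError("unknown concurrencyPolicy: %r" % policy)


# A 300-second job on a one-minute schedule never finishes before the next
# tick, so under Replace each tick should cancel the previous job.
running = []
for minute in range(3):
    running, cancelled = tick("Replace", running, "sj3-run%d" % minute)
    print(minute, running, cancelled)
```

Under this model, 'oc get jobs' should show a new job roughly every minute with exactly one job running at a time, rather than a single job that runs to completion with nothing scheduled afterwards.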
Fix is in https://github.com/openshift/origin/pull/11751 and is waiting for upstream approval.
Checked with the latest OCP version; the issue is fixed.
openshift v3.4.0.23+24b1a58
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0066