Bug 1300949 - [devexp_public_683] Not all build pods can be deleted during deletion pending of buildConfig
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OKD
Classification: Red Hat
Component: Build
Version: 3.x
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Assignee: Martin Milata
QA Contact: Wenjing Zheng
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-01-22 07:56 UTC by Dongbo Yan
Modified: 2017-05-30 12:49 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-05-30 12:49:12 UTC
Target Upstream Version:
Embargoed:


Attachments
Script for triggering the bug (728 bytes, text/plain)
2016-11-22 17:58 UTC, Martin Milata
openshift master log via journalctl (1.86 MB, text/x-vhdl)
2016-12-02 08:24 UTC, Dongbo Yan

Description Dongbo Yan 2016-01-22 07:56:11 UTC
Description of problem:
When a BuildConfig is deleted it enters a deletion-pending state, and all of its builds are deleted during that period. However, not all build pods are deleted while the deletion is pending; some build pods remain. When you then create a new BuildConfig and start builds, some of the new builds fail because the old pods still exist.

Version-Release number of selected component (if applicable):
oc v1.1.1-95-gacaa8ef
kubernetes v1.1.0-origin-1107-g4c8e6f4

How reproducible:
Always 
About 50% of the build pods can't be deleted immediately.

Steps to Reproduce:
1. Log in to OpenShift, create a project and a BuildConfig. Start 200 builds.
2. Delete the BuildConfig and check the build pod status.
3. Create a new BuildConfig, start 200 builds, then check the build status.

Actual results:
2. After the BuildConfig is deleted, some build pods still exist.
3. Some builds fail because the previous build pods still exist.

Expected results:
All build pods should be deleted while the BuildConfig is in deletion-pending status.

Additional info:

Comment 1 Wenjing Zheng 2016-01-25 07:40:47 UTC
This is a bug in new features for the next version (3.2), not a bug in 3.1.1.

Comment 2 Martin Nagy 2016-01-28 15:53:15 UTC
After discussion we decided this bug has low impact since it is hard to trigger, while at the same time a solution to it might be too complicated.

Lowering severity and priority; please note that this bug should not block the acceptance of the card.

Comment 3 Ben Parees 2016-11-11 18:23:57 UTC
Martin, here's another low priority bug you can look into if you're looking for stuff to do.  I think it may require some refactoring of the buildconfig deletion logic to close the timing window that exists here.

Comment 4 Martin Milata 2016-11-15 18:02:25 UTC
I can't reproduce the problem. Dongbo, can you still reproduce it? Are you using the web UI or CLI? If CLI, can you paste the commands and the output? Were the builds Finished/Failed or New/Pending?

Looking at the CLI, when deleting a buildconfig it deletes all of its builds first (unless run with --cascade=false) before deleting the buildconfig object. If I understand this correctly, the only way for this bug to happen is if deleting a build did not cause its pod to be deleted.

Comment 5 Dongbo Yan 2016-11-16 10:54:02 UTC
Steps:
1.$ oc new-app https://raw.githubusercontent.com/openshift/origin/master/examples/sample-app/application-template-stibuild.json
2.$ for i in {1..200}; do oc start-build ruby-sample-build > $i;done
3.$ oc delete bc ruby-sample-build

Actual result: after the bc is deleted, a lot of build pods still exist
# oc delete bc ruby-sample-build 
buildconfig "ruby-sample-build" deleted
# oc get pod -w
NAME                         READY     STATUS        RESTARTS   AGE
database-1-lkgmy             1/1       Running       0          3m
frontend-1-8nfkd             1/1       Running       0          3m
frontend-1-qmvcw             1/1       Running       0          3m
frontend-3-deploy            0/1       Terminating   0          1m
frontend-3-hook-pre          0/1       Terminating   0          30s
ruby-sample-build-10-build   1/1       Terminating   0          2m
ruby-sample-build-11-build   1/1       Terminating   0          2m
ruby-sample-build-12-build   1/1       Terminating   0          2m
ruby-sample-build-13-build   1/1       Terminating   0          2m
ruby-sample-build-14-build   1/1       Terminating   0          2m
ruby-sample-build-15-build   1/1       Terminating   0          2m
ruby-sample-build-16-build   1/1       Terminating   0          1m
ruby-sample-build-17-build   1/1       Terminating   0          1m

Comment 6 Martin Milata 2016-11-16 13:27:16 UTC
Thanks! I wonder why the pods are stuck in Terminating. What's your docker and openshift version?

Comment 7 Dongbo Yan 2016-11-17 01:50:55 UTC
openshift v3.4.0.26+f7e109e
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

Version:         1.10.3
 API version:     1.22
 Package version: docker-common-1.10.3-57.el7.x86_64
 Go version:      go1.6.2
 Git commit:      79ebcd8-unsupported
 Built:           Thu Oct 20 14:37:17 2016
 OS/Arch:         linux/amd64

Comment 8 Martin Milata 2016-11-22 17:58:49 UTC
Created attachment 1222813 [details]
Script for triggering the bug

Reproducing the bug reliably:

We want to have build pods in the Terminating state for some time after their build is deleted. This can be accomplished by patching OpenShift to terminate build pods with a 60s grace period (the period between the pod's processes being sent SIGTERM and SIGKILL). Then you can run a container with something that ignores SIGTERM.

Patch:

  diff --git a/pkg/build/controller/factory/factory.go b/pkg/build/controller/factory/factory.go
  index f3f53c5..4e27367 100644
  --- a/pkg/build/controller/factory/factory.go
  +++ b/pkg/build/controller/factory/factory.go
  @@ -637,7 +637,8 @@ func (c ControllerClient) CreatePod(namespace string, pod *kapi.Pod) (*kapi.Pod,
   
   // DeletePod destroys a pod using the Kubernetes client.
   func (c ControllerClient) DeletePod(namespace string, pod *kapi.Pod) error {
  -       return c.KubeClient.Pods(namespace).Delete(pod.Name, nil)
  +       grace := int64(60)
  +       return c.KubeClient.Pods(namespace).Delete(pod.Name, &kapi.DeleteOptions{GracePeriodSeconds: &grace})
   }
   
   // GetPod gets a pod using the Kubernetes client.

Dockerfile for such container:

  FROM centos:7
  CMD python -c 'import signal; import time; signal.signal(signal.SIGTERM, signal.SIG_IGN); time.sleep(600)'

Rebuild OpenShift with the patch, build the image, tag it as "ignoresigterm". Start OpenShift, run the attached script. What it does is essentially:

1. Create a BuildConfig that uses the container above as a custom build container, and start a build.
2. Delete the BuildConfig; this deletes the build and puts the pod into the Terminating state.
3. Recreate the BuildConfig and start the build again.

Results: the build stays in state New until the Terminating pod finishes. Then it usually goes straight into Failed, but sometimes it is Running for a second and then goes Failed. It doesn't seem to create any pod in either case.

Notes:

- You can do this with a Build directly instead of a BuildConfig; the behavior is the same.
- When you create a Pod directly instead of in step 1, and then delete it with a nonzero grace period in step 2, the build started in step 3 stays in New until the manually created Pod finishes. Afterwards the build moves into Running and the build pod is started. This is different behavior than when the Pod is created by a Build.

Comment 9 Ben Parees 2016-11-22 18:30:15 UTC
@Martin it seems like, based on your recreate, you think the issue is with builds that are created after the "delete buildconfig" request is issued. Is that accurate?

Comment 10 Martin Milata 2016-11-25 11:43:33 UTC
Ben, I'm not sure I understand your question. The issue is that when you create a build "foo" it starts a build pod "foo-build". When a pod with this name already exists and is in the Terminating state, the build waits in New until the pod terminates and then goes into the Failed state.

Comment 11 Ben Parees 2016-11-25 16:55:19 UTC
@Martin how are we getting into a situation where the build pod already exists?

I'm also not clear on what this has to do w/ deleting buildconfigs.  If the buildconfig is deleted, and the build is deleted, and the pod is "terminating", does the pod never reach a complete/failed state?

Comment 12 Martin Milata 2016-11-28 11:54:31 UTC
I know of two ways to get into this situation. First:

* create build foo
- pod foo-build is created
* delete build foo
- pod foo-build goes into Terminating
* create build foo (before foo-build terminates)
- foo is in state New
- pod foo-build terminates
- foo goes into state Failed

The same thing can be done with a Build created by a BuildConfig, which is what the original report describes:

* create buildconfig foo (assume it has ConfigChange trigger)
- build foo-1 is created
- pod foo-1-build is created
* delete buildconfig foo
- build foo-1 is deleted
- pod foo-1-build goes into Terminating (buildconfig deletion doesn't wait for the pod to actually stop)
* create buildconfig foo
- build foo-1 is created
- foo-1 is in state New
- pod foo-1-build terminates
- foo-1 goes into state Failed

> If the buildconfig is deleted, and the build is deleted, and the pod is "terminating", does the pod never reach a complete/failed state?

I'm not sure whether it can hang in Terminating forever; I think Docker bugs have caused this in the past. This bug exists even if the pod eventually terminates - the user just has to recreate the build before the pod stops. No idea how often this happens in practice.

I need to stare at the code for a while now to understand what's really going on.

Comment 13 Ben Parees 2016-11-28 15:15:17 UTC
Sounds like the real issue here is our handling of the scenario where a build is created, but a pod for that build (well, what appears to be a pod for that build, but isn't really) already exists.

Ideally we should just fail the build when this happens, with a reason of "build pod already exists, cannot process the build"

However there are valid failover/race conditions where I think we might encounter this (the build pod was already created, correctly, and we should just do nothing with the build and wait for the build pod to finish).

One option would be to add more uniqueness (via an annotation or something) that ties a particular build pod to a particular build independent of the build name, but that's going to lead to a mess of changes and possible backwards-compatibility issues that I'd rather not take on to address what is basically a pretty small timing window.

We might be able to do it based on creation time... if the existing build pod is older than the build, then we know it's from something else and doesn't belong to this build (the pod is always created after the build).

So perhaps we could use that to throw a proper error when we can't create the build pod, and mark the build as failed with a useful reason.
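
As an illustration of that creation-time check, here is a minimal, self-contained Go sketch; the Build and Pod structs and the podBelongsToBuild helper are simplified stand-ins invented for this example, not the actual origin API types or controller code:

  package main

  import (
      "fmt"
      "time"
  )

  // Simplified stand-ins for the real build and pod objects; only the fields
  // needed for the comparison are included.
  type Build struct {
      Name              string
      CreationTimestamp time.Time
  }

  type Pod struct {
      Name              string
      CreationTimestamp time.Time
  }

  // podBelongsToBuild: the build pod is always created after its build, so a
  // pod that is older than the build must be left over from something else.
  func podBelongsToBuild(build Build, pod Pod) bool {
      return !pod.CreationTimestamp.Before(build.CreationTimestamp)
  }

  func main() {
      build := Build{Name: "foo", CreationTimestamp: time.Now()}
      leftover := Pod{Name: "foo-build", CreationTimestamp: build.CreationTimestamp.Add(-time.Minute)}

      if podBelongsToBuild(build, leftover) {
          fmt.Println("existing pod belongs to this build: leave the build alone")
      } else {
          fmt.Println("existing pod predates the build: fail the build with a clear reason")
      }
  }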

Comment 14 Martin Milata 2016-11-28 15:52:16 UTC
(In reply to Ben Parees from comment #13)
> Sounds like the real issue here is our handling of the scenario where a
> build is created, but a pod for that build(well, what appears to be a pod
> for that build, but isn't really) already exists.

Yes.

> Ideally we should just fail the build when this happens, with a reason of
> "build pod already exists, cannot process the build"

Another possibility would be to wait for the existing pod to terminate. At least in this particular case the resulting user experience would be better (the build finishing later vs. the build failing).

> However there are valid failover/race conditions where I think we might
> encounter this (the build pod was already created, correctly, and we should
> just do nothing with the build and wait for the build pod to finish).

If the build pod was successfully created, the build transitions from phase New to Pending. So this would only be an error when the build is in New, or am I missing something?

> ...
> We might be able to do it based on creation time... if the existing build
> pod is older than the build, then we know it's from something else and
> doesn't belong to this build (the pod is always created after the build).

The timestamps are a good idea. Could we perhaps combine them with the build phase?

For reference, here's the code involved:

https://github.com/openshift/origin/blob/v1.4.0-rc1/pkg/build/controller/controller.go#L202
Here the build pod is created - if that fails, a warning event is emitted; otherwise the phase is changed to Pending.

https://github.com/openshift/origin/blob/v1.4.0-rc1/pkg/build/controller/controller.go#L277
I assume this function is run whenever a pod changes state. Here we could add additional logic to ignore changes from pods that were not started by an existing build.
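
To sketch the kind of filter that handler could apply - checking the "openshift.io/build.name" annotation before reacting to a pod change - here is a small standalone Go example (the Pod struct and the buildExists lookup are hypothetical stand-ins, not the actual controller code):

  package main

  import "fmt"

  const buildNameAnnotation = "openshift.io/build.name"

  // Simplified stand-in for the pod object.
  type Pod struct {
      Name        string
      Annotations map[string]string
  }

  // buildExists is a hypothetical lookup into the controller's build store.
  func buildExists(name string) bool {
      return name == "foo"
  }

  // shouldHandlePodChange reports whether a pod state change should be
  // propagated to a build: the pod must carry the build-name annotation (i.e.
  // it was created by the build controller) and the referenced build must
  // still exist.
  func shouldHandlePodChange(pod Pod) bool {
      buildName, ok := pod.Annotations[buildNameAnnotation]
      if !ok {
          return false // not created by the build controller; ignore it
      }
      return buildExists(buildName)
  }

  func main() {
      fmt.Println(shouldHandlePodChange(Pod{Name: "foo-build"})) // false
      fmt.Println(shouldHandlePodChange(Pod{
          Name:        "foo-build",
          Annotations: map[string]string{buildNameAnnotation: "foo"},
      })) // true
  }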

Comment 15 Ben Parees 2016-11-28 16:03:45 UTC
> Another possibility would be to wait for the existing pod to terminate.

i don't think we can do that from within an "oc delete buildconfig", if you've asked to delete the buildconfig (and associated builds) we should be terminating the builds.  you've already indicated you don't care about the outcome.

If anything, it's possible we should cascade-delete the build pods when we delete the build.

> If the build pod was successfully created, the build transitions from phase New to Pending. So this would only be an error when the build is in New, or am I missing something?

I think that's correct, the problematic scenario would be:

1) build in new state
2) controller creates pod for the build
3) controller dies (never updates build to pending state)
4) controller restarts, sees build in new state
5) controller tries to create the pod for the build
6) controller sees the pod already exists and incorrectly fails the build (when what it should have done is ignore the existing pod and let the build proceed as normal, via the pod execution).

Comment 16 Ben Parees 2016-11-28 16:07:49 UTC
> I assume this function is run whenever a pod changes state. Here we can add additional logic to ignore changes from pods that were not started by existing build.

I'm reluctant to lean too heavily on timestamps for this; there could be other reasons a pod's timestamp pre-dates the build's timestamp (the clock being changed at just the right time), so I'd prefer to only use this mechanism to close the issue we're discussing here (failure to create a pod for a build because one already exists: we need to determine whether it's a no-op (the existing pod is valid) or a reason to fail the build (the existing pod is from something else, so we're stuck)).

Comment 17 Martin Milata 2016-11-28 16:30:21 UTC
(In reply to Ben Parees from comment #15)
> > Another possibility would be to wait for the existing pod to terminate.
> 
> i don't think we can do that from within an "oc delete buildconfig", if
> you've asked to delete the buildconfig (and associated builds) we should be
> terminating the builds.  you've already indicated you don't care about the
> outcome.

Sorry, what I meant is to let the new build stay in phase New until the old build pod terminates. That already happens when the build controller tries to create the build pod but its name collides with a pod not created by the build controller, reproducible like this:

* manually create pod named foo-build, i.e. without the "openshift.io/build.name" annotation
* create build named foo
- build cannot create pod, emits warning events, stays in New
* delete pod foo-build
- foo-build stops; because of the missing annotation the build controller ignores the event
- build creates pod successfully, transitions from New->Pending
- (build and its pod continue normally)

> if anything, it's possible we should cascade delete the build pods when we
> delete the build.

From the CLI? When a build is deleted, the controller deletes its pod, albeit with the default grace period. We could change the grace period to 0 to delete the pod immediately using SIGKILL. Not sure if that's a good idea.
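
For concreteness, this is roughly what that one-line change could look like, reusing the DeletePod helper shown in the patch in comment 8 (illustration only, not a proposal):

  // DeletePod destroys a pod using the Kubernetes client, forcing immediate
  // termination (SIGKILL) by requesting a zero-second grace period.
  func (c ControllerClient) DeletePod(namespace string, pod *kapi.Pod) error {
      grace := int64(0)
      return c.KubeClient.Pods(namespace).Delete(pod.Name, &kapi.DeleteOptions{GracePeriodSeconds: &grace})
  }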

> > If the build pod was successfully created, the build transitions from phase New to Pending.
> > So this would only be an error when the build is in New, or am I missing something?
> 
> I think that's correct, the problematic scenario would be:
> 
> 1) build in new state
> 2) controller creates pod for the build
> 3) controller dies (never updates build to pending state)
> 4) controller restarts, sees build in new state
> 5) controller tries to create the pod for the build
> 6) controller sees pod already exists, incorrectly fails the build (when
> what it should have done is ignored the existing pod and let the build
> proceed as normal, via the pod execution).

Oh, the possibility of controllers crashing at any time adds a whole new level of paranoia :)

Comment 18 Ben Parees 2016-11-28 16:57:39 UTC
> Sorry, what I meant is to let the new build stay in phase New until the old build pod terminates. That already happens when the build controller tries to create the build pod but its name collides with a pod not created by the build controller, reproducible like this:
> * manually create pod named foo-build, i.e. without the "openshift.io/build.name" annotation
> * create build named foo
> - build cannot create pod, emits warning events, stays in New
> * delete pod foo-build
> - foo-build stops; because of the missing annotation the build controller ignores the event
> - build creates pod successfully, transitions from New->Pending
> - (build and its pod continue normally)

Unfortunately that's a contrived example... normally the build pod would have the annotation, so the build will get updated by the "old" build pod as it transitions through states (unless we did as you suggested, which I'm wary of, and ignored build pod state updates when the build pod is older than the build).

> From the CLI? When a build is deleted, the controller deletes its pod, albeit with the default grace period. We could change the grace period to 0 to delete the pod immediately using SIGKILL. Not sure if that's a good idea.

Hm, there's not a ton of control users have over what is done during the "graceful" shutdown anyway, so terminating immediately might not be that big a deal, but I agree it's not ideal.

Comment 19 Martin Milata 2016-11-29 10:48:39 UTC
This commit message might also be relevant:

> Also change builds to go into "Error" status when a problem occurs
> and record a message.  Slightly refactor the build controller code.
> If the build pod already exists, assume that there is a race in build
> execution and stay in "pending" state.
> ...
https://github.com/openshift/origin/commit/7d705af09388d8a46a9901ebe3ba28ffb992d695

Do we need to discuss the behavior more widely (mailing list?), or should I just open a PR that makes the build Failed when it's New and sees its build pod running?

Comment 20 Ben Parees 2016-11-29 15:48:33 UTC
Actually, make it BuildPhaseError instead of Failed. You can just make the change; I don't think we need a larger discussion.

Comment 21 Martin Milata 2016-11-29 16:41:56 UTC
PR: https://github.com/openshift/origin/pull/12057

Comment 22 openshift-github-bot 2016-12-01 18:56:51 UTC
Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/0cfb747a088461f03fab091c61759b576a926e3e
Fail new builds that can't start build pod because it already exists

Consider following scenario:

- build is started by user
- build controller creates build pod
- build is deleted by user
- build pod is killed but remains Terminating during grace period
- build with same name as in first step is started by user

Previously the new build stayed in New phase until the old build pod
finished and then went into Failed after the pod terminated. This commit
causes the new build to go into Error phase when it first tries creating
build pod and fails because pod with the same name (and earlier
creationTimestamp) exists.

Slightly improves bug 1300949 (the described behavior is still very
similar).

Comment 23 Ben Parees 2016-12-01 20:59:13 UTC
There's still a big difference between the issue Martin was able to recreate, and how Dongbo originally reported it.  In Dongbo's scenario, all the builds were instantiated first, then the buildconfig was deleted, yet some build pods were left around despite the fact that they should have been deleted by HandleBuildDeletion.

I think we need to get a recreate from @Dongbo that includes level 5 logging from the master, so we can see what HandleBuildDeletion is (or is not) doing and why the pods are not being deleted.

Comment 24 Dongbo Yan 2016-12-02 08:24:36 UTC
Created attachment 1227206 [details]
openshift master log via journalctl

Comment 25 Martin Milata 2016-12-02 11:55:59 UTC
I don't see the difference. HandleBuildDeletion calls the API endpoint for deleting pods. It doesn't set the gracePeriodSeconds parameter, so the default value of 30 is used. The processes in the pod are sent the SIGTERM signal and the pod's deletionTimestamp is set to now+30s, which makes the pod show as "Terminating" in oc output. Afterwards, when the current time is greater than the deletionTimestamp and there are still processes running in the pod, they are sent SIGKILL, which forces the processes to stop (and the pod to terminate). [1]

So when HandleBuildDeletion returns, the pods may still exist for up to 30 seconds. If you create a build with the same name as before within this <30s window, a name collision happens and the build fails. This is consistent with what Dongbo is describing - what am I missing?

[1] http://kubernetes.io/docs/user-guide/pods/#termination-of-pods
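
As a back-of-the-envelope illustration of that window, here is a tiny standalone Go sketch (the 30s value is just the default grace period described above, nothing OpenShift-specific):

  package main

  import (
      "fmt"
      "time"
  )

  func main() {
      const gracePeriod = 30 * time.Second // default when gracePeriodSeconds is not set

      deleteRequested := time.Now()                         // pod processes receive SIGTERM
      deletionTimestamp := deleteRequested.Add(gracePeriod) // SIGKILL if still running after this

      fmt.Printf("a build recreated with the same name before %s can collide with the old pod\n",
          deletionTimestamp.Format(time.RFC3339))
  }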

Comment 26 Ben Parees 2016-12-02 15:15:38 UTC
@Martin it wasn't clear to me from https://bugzilla.redhat.com/show_bug.cgi?id=1300949#c5 that new builds were being created after the buildconfig was deleted; it sounded like the complaint was just that the pods stayed in Terminating, with no reference to creating new builds after deleting the buildconfig.

Reading back through the original report, I see it is more like the scenario you described, so I think this is as "resolved" as it's going to get. That is, with your change the behavior will now be that builds created after the buildconfig was deleted end up in an "error" state, with the reason being that a pod already existed so the build couldn't proceed.

@Dongbo can you confirm that if you wait ~30s after deleting the buildconfig, all the build pods are removed and you can successfully create new builds at that point?

Comment 27 Dongbo Yan 2016-12-05 10:18:18 UTC
I don't wait 30s after deleting the bc. If all build pods are removed, I can trigger new builds successfully.

When I reported this, I wanted the build pods to be deleted during the build-deletion period; an error being output when running start-build is also OK.

Comment 28 Ben Parees 2016-12-05 13:32:11 UTC
@Dongbo I just wanted to confirm that the old build pods do eventually get cleaned up on their own if you wait a sufficient period of time.

I'm going to mark this ON_QA since I think we've resolved all we're going to resolve here:

1) if you create a new build whose name matches an existing build pod, that build will still fail (which was the original reported issue)

2) but now the error will make it clearer that the build failed because an older pod already existed, rather than the build failing as a result of the older pod failing.

Comment 29 Dongbo Yan 2016-12-06 10:38:25 UTC
Verified
openshift v1.5.0-alpha.0+6eefa2a-247
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

devenv-rhel7_5491 (ami-0eb7b319)

