Description of problem:
After upgrading to OpenShift 3.9, the API server crashes when the old webhook format is used in a buildconfig. We are getting 5-10 core dumps a day from OpenShift due to this issue. The crashes are strongly correlated with Bitbucket webhooks; we do not see them with any other repository source.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1504819

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
Crashes each time the webhook is called

Expected results:
Webhooks should not impact OpenShift services

Additional info:
High impact on the cluster
OK, this fails here: https://github.com/openshift/origin/blob/f98a624c53c8e90c6e0d863fd7f355d76151dd3b/pkg/build/webhook/bitbucket/bitbucket.go#L158
(In reply to Michal Fojtik from comment #7)
> OK, this fails here:
>
> https://github.com/openshift/origin/blob/f98a624c53c8e90c6e0d863fd7f355d76151dd3b/pkg/build/webhook/bitbucket/bitbucket.go#L158

It looks like this is broken in 3.10 as well.

Fix: https://github.com/openshift/origin/pull/19912

(Ben, can you backport this down to 3.9?)
Thanks Michal!
Hi Michal,

QE tried to reproduce this on a 3.9 cluster, but could not get the above panic error in audit.log or the master API log:

1. We tried both the old and the new webhook links shown below. All builds succeed and the master does not restart:

   new: /apis/build.openshift.io/v1/namespaces/wzheng1/buildconfigs/ruby-ex/webhooks/wRD8k9kb-QMXMpUYy-gQ/bitbucket
   old: /oapi/v1/namespaces/wzheng1/buildconfigs/ruby-ex/webhooks/wRD8k9kb-QMXMpUYy-gQ/bitbucket

2. We then edited pushevent.json to remove all commit info, from https://github.com/openshift/origin/blob/master/pkg/build/webhook/bitbucket/testdata/pushevent.json#L208 to line 267, leaving just the following:

   "created": false,
   "forced": false,
   "closed": false,
   "commits": [
     <all deleted>
   ],
   "truncated": false

   We then used curl to invoke the bitbucket webhook. The error below appears and the master API server restarts (also without the above panic error):

   curl: (56) NSS: client certificate not found (nickname not specified)
Not sure why you didn't see the panic, but it sounds like you did recreate the behavior with your second approach. That should be sufficient to verify the fix.

Regarding approach (1), I'm not sure what you mean by new + old webhook link. I believe the issue here is the payload being supplied, whether it comes from an older bitbucket server or a newer one (the payload format changed in bitbucket and we have codepaths to support both payloads):

this one is the older/problematic one:
https://confluence.atlassian.com/bitbucket/event-payloads-740262817.html#EventPayloads-Push

this is the new payload (no issue here):
https://confluence.atlassian.com/bitbucketserver/event-payload-938025882.html
Verified
openshift v3.10.0-0.63.0
kubernetes v1.10.0+b81c8f8

Reproduce steps:
1. Create an application, set bitbucket webhook trigger in bc
2. Get bitbucket webhook link
3. Create a payload file like below:

   ...snip...
   "push": {
     "changes": [
       {"commits": [ ]}
     ]
   }
   ...snip...

4. Use curl to invoke bitbucket webhook

   # curl -H "X-Event-Key: repo:push" -H "Content-Type: application/json" -k -X POST --data-binary @payload.json https://xxx/apis/build.openshift.io/v1/namespaces/dyan/buildconfigs/my-ruby-hello-world/webhooks/-woRAnLJXfdPquOh-Y1T/bitbucket

Actual result:
Cannot trigger new build, with error:

   {
     "kind": "Status",
     "apiVersion": "v1",
     "metadata": {},
     "status": "Failure",
     "message": "Unable to extract valid event from payload: ... snip ...
     "reason": "BadRequest",
     "code": 400
   }

And master api does not crash
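
For readers following the verification above, here is a minimal Go sketch of the kind of guard that turns an empty "commits" array into an error instead of a crash. It is an illustration only and not the code from the fix PR: the pushEvent type, the firstCommitHash function, and the error wording are hypothetical, and the assumption that the original panic came from indexing an empty slice is inferred from the payload used in the reproduce steps.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// pushEvent is a hypothetical, trimmed-down model of the older Bitbucket
// push payload. Only the fields needed for this sketch are included.
type pushEvent struct {
	Push struct {
		Changes []struct {
			Commits []struct {
				Hash string `json:"hash"`
			} `json:"commits"`
		} `json:"changes"`
	} `json:"push"`
}

// firstCommitHash returns the hash of the first commit of the first change.
// It returns an error, rather than indexing blindly, when the payload has
// no changes or no commits.
func firstCommitHash(payload []byte) (string, error) {
	var ev pushEvent
	if err := json.Unmarshal(payload, &ev); err != nil {
		return "", fmt.Errorf("unable to decode payload: %w", err)
	}
	if len(ev.Push.Changes) == 0 || len(ev.Push.Changes[0].Commits) == 0 {
		return "", errors.New("unable to extract valid event from payload: no commits")
	}
	return ev.Push.Changes[0].Commits[0].Hash, nil
}

func main() {
	// Same shape as the payload in the reproduce steps: an empty commits array.
	empty := []byte(`{"push": {"changes": [{"commits": []}]}}`)
	if _, err := firstCommitHash(empty); err != nil {
		fmt.Println("rejected:", err)
	}

	// A payload with one commit is accepted.
	ok := []byte(`{"push": {"changes": [{"commits": [{"hash": "abc123"}]}]}}`)
	hash, err := firstCommitHash(ok)
	fmt.Println(hash, err)
}
```

A handler built this way could map the returned error to a 400 BadRequest response, which is consistent with the result observed in the verification.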
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1816