Bug 1585663 - API server crashes when using old format of webhook triggers in build Configs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Build
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.10.0
Assignee: Michal Fojtik
QA Contact: XiuJuan Wang
URL:
Whiteboard:
Depends On:
Blocks: 1586076
Reported: 2018-06-04 11:23 UTC by Jaspreet Kaur
Modified: 2018-07-30 19:17 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The webhook payload can contain an empty commit array, which results in an array indexing error when processed by the API server.
Consequence: The API server crashes.
Fix: Check for an empty array before attempting to index into it.
Result: Empty commit payloads are handled without crashing the API server.
Clone Of:
Clones: 1586076
Environment:
Last Closed: 2018-07-30 19:16:54 UTC


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:1816 None None None 2018-07-30 19:17:31 UTC

Description Jaspreet Kaur 2018-06-04 11:23:35 UTC
Description of problem: After upgrading to OpenShift 3.9, the API server crashes when the old format of webhook is used in a BuildConfig.

We are getting 5-10 core dumps a day from OpenShift due to this issue. It is strongly correlated with Bitbucket webhooks, as we do not see the crashes with any other repository source.


[1] https://bugzilla.redhat.com/show_bug.cgi?id=1504819


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results: The API server crashes each time the webhook is called


Expected results: Webhooks should not impact OpenShift services


Additional info: High impact on the cluster

Comment 8 Michal Fojtik 2018-06-05 11:03:39 UTC
(In reply to Michal Fojtik from comment #7)
> OK, this fails here:
> 
> https://github.com/openshift/origin/blob/f98a624c53c8e90c6e0d863fd7f355d76151dd3b/pkg/build/webhook/bitbucket/bitbucket.go#L158

Looks like this is broken in 3.10 as well.

Fix: https://github.com/openshift/origin/pull/19912

(Ben, can you backport this down to 3.9?)
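
For illustration only, and not the code from the linked PR: a minimal Go sketch of the kind of guard the Doc Text describes (check for an empty array before indexing into it). The type names, the `latestCommit` helper and the trimmed-down payload shape are assumptions made up for this example; the real structures in bitbucket.go may differ.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Hypothetical, trimmed-down view of a Bitbucket push event. Only the
// fields needed for the illustration are modelled.
type commit struct {
	Hash    string `json:"hash"`
	Message string `json:"message"`
}

type change struct {
	Commits []commit `json:"commits"`
}

type pushEvent struct {
	Push struct {
		Changes []change `json:"changes"`
	} `json:"push"`
}

var errNoCommits = errors.New("push event contains no commits")

// latestCommit returns the first commit of the first change. It returns
// an error when either list is empty instead of indexing blindly, which
// is what would otherwise cause an index-out-of-range panic.
func latestCommit(payload []byte) (commit, error) {
	var ev pushEvent
	if err := json.Unmarshal(payload, &ev); err != nil {
		return commit{}, err
	}
	if len(ev.Push.Changes) == 0 || len(ev.Push.Changes[0].Commits) == 0 {
		return commit{}, errNoCommits
	}
	return ev.Push.Changes[0].Commits[0], nil
}

func main() {
	// Payload with an empty "commits" array, as discussed in this bug.
	empty := []byte(`{"push": {"changes": [{"commits": []}]}}`)
	if _, err := latestCommit(empty); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

The point of the sketch is only that the length checks come before any `[0]` access, so a malformed or empty payload becomes an ordinary error rather than a process crash.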

Comment 9 Ben Parees 2018-06-05 13:03:56 UTC
Thanks Michal!

Comment 10 Wenjing Zheng 2018-06-06 09:10:31 UTC
Hi Michal, QE tried to reproduce this on a 3.9 cluster, but could not find the panic above in audit.log or the master API log:
1. We tried both the old and new webhook links shown below; all builds succeeded and the master did not restart:
new: /apis/build.openshift.io/v1/namespaces/wzheng1/buildconfigs/ruby-ex/webhooks/wRD8k9kb-QMXMpUYy-gQ/bitbucket
old: /oapi/v1/namespaces/wzheng1/buildconfigs/ruby-ex/webhooks/wRD8k9kb-QMXMpUYy-gQ/bitbucket

2. Then we edited pushevent.json to remove all commit info, from https://github.com/openshift/origin/blob/master/pkg/build/webhook/bitbucket/testdata/pushevent.json#L208 to line 267, leaving just the following:
                                "created": false,
                                "forced": false,
                                "closed": false,
                                "commits": [
                                 <all deleted>
                                ],
                                "truncated": false

Then we used curl to invoke the Bitbucket webhook; the error below appeared and the master API server restarted (again without the panic above):
curl: (56) NSS: client certificate not found (nickname not specified)

Comment 11 Ben Parees 2018-06-06 14:51:33 UTC
Not sure why you didn't see the panic but it sounds like you did recreate the behavior w/ your second approach.  That should be sufficient to verify the fix.

Regarding approach (1), I'm not sure what you mean by new + old webhook link. I believe the issue here is the payload being supplied, whether it comes from an older Bitbucket server or a newer one (the payload format changed in Bitbucket and we have codepaths to support both payloads):

this one is the older/problematic one:
https://confluence.atlassian.com/bitbucket/event-payloads-740262817.html#EventPayloads-Push

this is the new payload (no issue here):
https://confluence.atlassian.com/bitbucketserver/event-payload-938025882.html

Comment 13 Dongbo Yan 2018-06-07 02:44:58 UTC
Verified
openshift v3.10.0-0.63.0
kubernetes v1.10.0+b81c8f8

Reproduce steps:
1. Create an application and set a Bitbucket webhook trigger in the bc
2. Get the Bitbucket webhook link
3. Create a payload file like the one below:
  ...snip...
  "push": {
            "changes": [
                        {"commits": [ ]}
             ]
           }
  ...snip...
4. Use curl to invoke the Bitbucket webhook
# curl -H "X-Event-Key: repo:push" -H "Content-Type: application/json" -k -X POST --data-binary @payload.json https://xxx/apis/build.openshift.io/v1/namespaces/dyan/buildconfigs/my-ruby-hello-world/webhooks/-woRAnLJXfdPquOh-Y1T/bitbucket

Actual result:
A new build is not triggered; the request fails with this error:
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "Unable to extract valid event from payload: 
  ... snip ...
  "reason": "BadRequest",
  "code": 400
}

And the master API server does not crash
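
As a rough, self-contained model of the verified behaviour (an empty-commits payload gets a 400 response and the server keeps running), here is a Go sketch that can be run without a cluster. It is not the OpenShift webhook handler: `webhookHandler`, `postPayload`, the `/webhooks/bitbucket` path and the error strings are all hypothetical stand-ins, and only the `push.changes[].commits` part of the payload from step 3 is modelled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// pushPayload is a hypothetical minimal model of the payload in step 3.
// Everything that is snipped in the comment above is left out here too.
type pushPayload struct {
	Push struct {
		Changes []struct {
			Commits []json.RawMessage `json:"commits"`
		} `json:"changes"`
	} `json:"push"`
}

// webhookHandler is an illustrative stand-in for a webhook endpoint.
// It answers 400 Bad Request when the payload carries no commits.
func webhookHandler(w http.ResponseWriter, r *http.Request) {
	var p pushPayload
	if err := json.NewDecoder(r.Body).Decode(&p); err != nil {
		http.Error(w, "invalid JSON payload", http.StatusBadRequest)
		return
	}
	if len(p.Push.Changes) == 0 || len(p.Push.Changes[0].Commits) == 0 {
		http.Error(w, "no commits in push event", http.StatusBadRequest)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// postPayload sends body to the handler with the same headers as the
// curl command in step 4 and returns the HTTP status code.
func postPayload(body string) int {
	req := httptest.NewRequest(http.MethodPost, "/webhooks/bitbucket", strings.NewReader(body))
	req.Header.Set("X-Event-Key", "repo:push")
	req.Header.Set("Content-Type", "application/json")
	rec := httptest.NewRecorder()
	webhookHandler(rec, req)
	return rec.Code
}

func main() {
	// Same shape as the payload fragment in step 3.
	fmt.Println(postPayload(`{"push": {"changes": [{"commits": []}]}}`))
}
```

Using `httptest` keeps the check in-process, so the empty-commits case can be exercised repeatedly without a live master.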

Comment 15 errata-xmlrpc 2018-07-30 19:16:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816

