Bug 1347904

Summary: Ceph RGW deadlocks in curl_multi_wait
Product: Red Hat Enterprise Linux 7 Reporter: Ken Dreyer (Red Hat) <kdreyer>
Component: curl Assignee: Kamil Dudka <kdudka>
Status: CLOSED ERRATA QA Contact: Stefan Dordevic <sdordevi>
Severity: medium Docs Contact:
Priority: high    
Version: 7.3 CC: cbodley, kdudka, netwiz
Target Milestone: rc Keywords: Patch
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: curl-7.29.0-32.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-11-03 17:44:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1327142    
Attachments:
Description                                   Flags
beaker regression test for curl_multi_wait    none
curl-7.29.0-32.el7.src.rpm                    none

Description Ken Dreyer (Red Hat) 2016-06-18 04:35:03 UTC
Description of problem:
Ceph's RGW uses curl_multi_wait() and hits a deadlock in it (bug 1327142). The following PR against RHEL 7.2's curl fixes this: https://github.com/ktdreyer/curl/pull/1

Version-Release number of selected component (if applicable):
curl-7.29.0-25.el7

How reproducible:
always

Steps to Reproduce:
1. See details at https://github.com/ktdreyer/curl/pull/1

Actual results:
Ceph RGW deadlocks

Expected results:
Ceph RGW does not deadlock

Comment 3 Kamil Dudka 2016-06-20 16:05:06 UTC
Thanks a lot for identifying the fix and preparing the patches!  To summarize it here, this is a request to backport the following upstream commits:

https://github.com/curl/curl/commit/curl-7_29_0-273-g136a3a0
https://github.com/curl/curl/commit/curl-7_31_0-68-g6d30f8e
https://github.com/curl/curl/commit/curl-7_31_0-78-g513e587

Comment 5 Casey Bodley 2016-06-20 19:13:20 UTC
Created attachment 1169974
beaker regression test for curl_multi_wait

I've attached a beaker regression test to validate the fix.

Comment 7 Kamil Dudka 2016-06-21 07:36:13 UTC
(In reply to Casey Bodley from comment #5)
> I've attached a beaker regression test to validate the fix.

Works reliably for me.  Thank you for preparing the test!

@QE: Please make sure that libcurl-devel is installed for the test to run (unless it gets installed automatically).

Comment 12 Kamil Dudka 2016-08-17 07:40:01 UTC
*** Bug 1367614 has been marked as a duplicate of this bug. ***

Comment 13 Steven Haigh 2016-08-18 02:35:15 UTC
Just wondering if there is any chance of getting a copy of the curl-7.29.0-32.el7 packages for testing?

This problem is currently hitting our systems hard, with 100% CPU usage on everything. I would like to test this and feed the results back to the DotNet Core team.

Comment 14 Kamil Dudka 2016-08-19 08:25:31 UTC
Created attachment 1192069
curl-7.29.0-32.el7.src.rpm

(In reply to Steven Haigh from comment #13)
> Just wondering if there is any chance of getting a copy of the
> curl-7.29.0-32.el7 packages for testing?

I am attaching an *unsupported* source RPM for *TESTING PURPOSES ONLY*.  Please do not use it on production systems.  Feedback is appreciated!

Comment 15 Steven Haigh 2016-08-20 10:03:31 UTC
Just as an update: I've built these packages and pushed them to my testing repo so I can try them on the machine with this problem.

I don't want to restart the C# dotnet core app at the moment, but I should have more info during the work week.

Comment 16 Steven Haigh 2016-08-24 05:50:31 UTC
I can confirm that this package fixes the issues we were seeing with the dotnet core applications as per BZ 1367614.

Comment 17 Kamil Dudka 2016-08-24 06:10:47 UTC
Perfect.  Thanks for confirmation!

Comment 19 errata-xmlrpc 2016-11-03 17:44:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2575.html