Bug 1932396

Summary: [TRACKER for BZ #1943619] - RGW does not handle "Expect: 100-continue" answers from http requests not needing it
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Guillaume Moutier <gmoutier>
Component: ceph Assignee: Yuval Lifshitz <ylifshit>
Status: CLOSED ERRATA QA Contact: Tiffany Nguyen <tunguyen>
Severity: low Docs Contact:
Priority: unspecified    
Version: 4.6 CC: bniver, ebenahar, jthottan, madam, mschindl, muagarwa, nberry, ocs-bugs, odf-bz-bot, sostapov, tunguyen, ylifshit
Target Milestone: --- Flags: tunguyen: needinfo-
Target Release: ODF 4.9.0   
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version: v4.9.0-164.ci Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1943619 (view as bug list) Environment:
Last Closed: 2021-12-13 17:44:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1943619    

Description Guillaume Moutier 2021-02-24 14:57:54 UTC
Description of problem:
When trying to send bucket notifications to an ElasticSearch HTTP endpoint (http://elasticsearch:9200/index/_doc/), the following error appears in the RGW logs:
---
debug 2021-02-24 14:34:34.180 7f6185119700  1 ====== starting new request req=0x7f629e30a680 =====
debug 2021-02-24 14:34:34.190 7f6183916700  1 push to endpoint HTTP/S Endpoint
URI: http://elasticsearch-sample-es-http.rgw-es.svc:9200/s3_index/_doc/
Ack Level: 
don't verify SSL failed, with error: -5
debug 2021-02-24 14:34:34.190 7f6183916700  1 ====== req done req=0x7f629e30a680 op status=0 http_status=200 latency=0.0100002s ======
debug 2021-02-24 14:34:34.190 7f6183916700  1 beast: 0x7f629e30a680: 10.131.2.41 - - [2021-02-24 14:34:34.0.190742s] "PUT /std-user-bucket1/Pipfile HTTP/1.1" 200 180 - "Boto3/1.17.14 Python/3.6.8 Linux/4.18.0-193.41.1.el8_2.x86_64 Botocore/1.20.14" -
---
Notes:
- The error is not related to SSL; the message is generic. The endpoint is served over plain HTTP, and SSL verification is disabled in the bucket notification configuration.
- A direct curl command against the endpoint works flawlessly.
- After discussion with Yuval Lifshitz: this happens because the client sends "Expect: 100-continue". If a server respects that field and actually answers with a 100 Continue result code, RGW treats that as -EIO (-EIO = -5).
- This bug seems to have been fixed upstream and in the 4.2z1 branch by this pull request: https://github.com/ceph/ceph/pull/34414

So apparently the fix only needs to make its way into OCS.
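
The failure mode described in the notes can be illustrated with a small Python sketch. This is not RGW code: the function names and the status sequence [100, 201] are assumptions chosen for illustration. It only shows why a client that judges a request by the first status it receives ends up reporting -EIO when the server honors "Expect: 100-continue".

```python
import errno


def naive_result(status_codes):
    """Hypothetical buggy check: judge the request by the first status seen."""
    first = status_codes[0]
    return 0 if 200 <= first < 300 else -errno.EIO


def tolerant_result(status_codes):
    """Skip 1xx interim responses and judge the request by the final status."""
    final = [code for code in status_codes if not 100 <= code < 200]
    if not final:
        return -errno.EIO
    return 0 if 200 <= final[-1] < 300 else -errno.EIO


# Assumed sequence: the endpoint answers 100 Continue, then a 2xx status.
observed = [100, 201]
print(naive_result(observed))     # -5
print(tolerant_result(observed))  # 0
```

The other way around the problem is to never ask for the interim response in the first place, which is what the upstream change referenced in the notes is described as doing.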


Version-Release number of selected component (if applicable): OCS 4.6.2


How reproducible:
Send an RGW bucket notification to an ElasticSearch endpoint.


Actual results:
- Notification sending error.


Expected results:
- Notification sent and logged into ElasticSearch.


Additional info:
This problem should occur with any HTTP endpoint that honors the "Expect: 100-continue" header in the request.
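
For checking this without an ElasticSearch deployment, a stand-in endpoint may be enough. The sketch below uses Python's standard http.server, which answers "Expect: 100-continue" with an interim 100 Continue when the handler speaks HTTP/1.1. The handler class, the probe helper, the 201 reply, and the sample body are all assumptions for illustration; only the /index/_doc/ path is taken from the description.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class ExpectAwareHandler(BaseHTTPRequestHandler):
    # HTTP/1.1 is needed for http.server to answer "Expect: 100-continue"
    # with an interim "100 Continue" before the handler method runs.
    protocol_version = "HTTP/1.1"

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)      # discard the request body
        self.send_response(201)      # assumed stand-in for a successful reply
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):    # keep the demo quiet
        pass


def probe(body=b'{"key": "Pipfile"}'):
    """Start the stand-in endpoint, POST once with Expect, return both status lines."""
    server = HTTPServer(("127.0.0.1", 0), ExpectAwareHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    request_head = (
        "POST /index/_doc/ HTTP/1.1\r\n"
        "Host: 127.0.0.1\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Expect: 100-continue\r\n"
        "Connection: close\r\n\r\n"
    ).encode()
    try:
        with socket.create_connection(server.server_address, timeout=5) as sock:
            sock.sendall(request_head)
            with sock.makefile("rb") as reader:
                interim = reader.readline().decode().strip()
                reader.readline()    # blank line that ends the interim response
                sock.sendall(body)
                final = reader.readline().decode().strip()
    finally:
        server.shutdown()
        server.server_close()
    return interim, final


print(probe())
```

The first element of the returned pair is the interim status line that a client has to read past before it sees the real result in the second element.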

Comment 2 Yuval Lifshitz 2021-02-24 15:01:55 UTC
The issue was fixed by the following commit:

commit 75b17dd1193d63f60eb677d7523321717626299c
Author: Yuval Lifshitz <yuvalif>
Date:   Mon Apr 6 12:50:37 2020 +0300

    rgw/http: add timeout to http client
    
    also, prevent "Expect: 100-continue" from being sent
    when not needed
    
    Signed-off-by: Yuval Lifshitz <yuvalif>
    (cherry picked from commit dd49cc83078c7e268ce3de7ab0bfbf3035ed5d50)
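
To illustrate the second part of the commit message, here is a minimal Python sketch of one common way to keep the header from being sent, assuming an HTTP client built on libcurl (an assumption; the commit's diff is not quoted in this report). libcurl omits a header it would otherwise generate itself when the caller supplies the same header name with no value, so passing "Expect:" suppresses it. The helper name notification_headers and the Content-Type value are hypothetical.

```python
def notification_headers(extra=None, suppress_expect=True):
    """Build a libcurl-style header list (hypothetical helper)."""
    headers = ["Content-Type: application/json"]  # assumed payload type
    headers.extend(extra or [])
    if suppress_expect:
        # A header name followed by a colon and no value asks libcurl
        # to omit the header it would otherwise add on its own.
        headers.append("Expect:")
    return headers


print(notification_headers())
```

With pycurl, such a list would be handed over via setopt(pycurl.HTTPHEADER, notification_headers()).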

Comment 3 Mudit Agarwal 2021-03-02 09:27:41 UTC
Confirmed with Yuval: the above commit is already in RHCS 4.2z1. Moving this BZ to MODIFIED.

Comment 11 Tiffany Nguyen 2021-03-23 19:50:20 UTC
Not able to verify this BZ due to a put object error. See https://bugzilla.redhat.com/show_bug.cgi?id=1932396#c10 for more details.

Comment 12 Mudit Agarwal 2021-03-24 06:16:57 UTC
Hi Yuval,

PTAL, do you know in which exact ceph version this fix went in?

Thanks

Comment 14 Mudit Agarwal 2021-03-26 12:39:45 UTC
Hi Yuval,

I have tracked the commit, it is here: https://gitlab.cee.redhat.com/ceph/ceph/-/commit/75b17dd1193d63f60eb677d7523321717626299c
This means the build we are testing with has the patch, so if we are still hitting the issue, we may have to investigate further.

Let me know if I have to open a Ceph BZ for this.

Thanks

Comment 16 Scott Ostapovicz 2021-03-26 15:57:27 UTC
This seems to be a tracker for an RHCS issue that does not have a BZ.  Please create an RHCS BZ for this issue and link it here.

Comment 17 Mudit Agarwal 2021-03-26 16:04:34 UTC
Created a tracker and moving out of 4.7 as we need a fix in Ceph.
Will set the acks accordingly.

Comment 23 Mudit Agarwal 2021-09-22 09:19:42 UTC
Fix should be available in the latest ODF builds

Comment 32 errata-xmlrpc 2021-12-13 17:44:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Data Foundation 4.9.0 enhancement, security, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:5086