Bug 1670321

Summary: [GSS] Downloads are corrupted when using RGW with civetweb as frontend
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Karun Josy <kjosy>
Component: RGW Assignee: Casey Bodley <cbodley>
Status: CLOSED ERRATA QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Severity: high Docs Contact: Bara Ancincova <bancinco>
Priority: high    
Version: 3.2CC: assingh, bancinco, cbodley, ceph-eng-bugs, ceph-qe-bugs, edonnell, kbader, kjosy, mbenjamin, sweil, tchandra, tserlin, vimishra
Target Milestone: z2   
Target Release: 3.2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: RHEL: ceph-12.2.8-113.el7cp Ubuntu: ceph_12.2.8-96redhat1xenial Doc Type: Bug Fix
Doc Text:
.CivetWeb was rebased to upstream version 1.10 and the `enable_keep_alive` CivetWeb option works as expected
When using the Ceph Object Gateway with the CivetWeb front end, CivetWeb connections timed out even though the `enable_keep_alive` option was enabled. Consequently, S3 clients that did not reconnect or retry were not reliable. With this update, CivetWeb has been rebased to upstream version 1.10, and the `enable_keep_alive` option works as expected. As a result, CivetWeb connections no longer time out in this case. In addition, the new CivetWeb version introduces stricter header checks. This new behavior can cause certain return codes to change because invalid requests are detected sooner. For example, the previous version returned the `403 Forbidden` error for an invalid HTTP request, whereas the new version returns `400 Bad Request` instead.
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-04-30 15:56:46 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
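
For reference, the `enable_keep_alive` option mentioned in the doc text is not set in CivetWeb directly; it is passed through the gateway's `rgw_frontends` line in ceph.conf. A minimal sketch, assuming a gateway section named [client.rgw.gateway-node] and port 8080 (both illustrative):
    [client.rgw.gateway-node]
    # CivetWeb options are appended to the front-end string; 'yes' enables keep-alive
    rgw frontends = civetweb port=8080 enable_keep_alive=yes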

Description Karun Josy 2019-01-29 09:21:36 UTC
* Description of problem:

While using the Ceph RADOS Gateway with CivetWeb as the front end, segmented downloads of large files get corrupted. The issue is seen only when the downloaded files are very large and multiple downloads are run in parallel.


* Version-Release number of selected component (if applicable):
3.2

* How reproducible:
Always

* Steps to Reproduce:
We can reproduce this by putting a large file in a test bucket and using the 'aria2' download manager to download it 10 times, with 8 segments per file and a 250 KB/s rate limit, then calculating the MD5 sums and comparing them.
For example:
- Copy an ISO file to a test bucket
    # aws s3 cp  CentOS-7-x86_64-NetInstall-1810.iso s3://test --acl public-read-write --endpoint-url http://10.74.255.176:8080
- Install aria2 in another server
    # wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
    # rpm -ivvh epel-release-latest-7.noarch.rpm
    # yum install aria2
- Run the test command :
    # for i in {1..10}; do aria2c --max-overall-download-limit=250K -s 8 -x 8 -o test$i "http://10.74.255.176:8080/test/CentOS-7-x86_64-NetInstall-1810.iso" && md5sum test$i; done 

* Actual results:
Downloaded files have different md5sums, which implies they are corrupted.

* Expected results:
All downloaded files should have the same md5sum.

* Additional info:
We tested the same scenario using the 'Beast' front end and were not able to recreate the issue.
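
The same `rgw_frontends` line shown above is what selects the front end; the Beast test presumably used something like the following (a sketch, not the exact configuration used here):
    rgw frontends = beast port=8080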

Comment 2 Karun Josy 2019-01-29 10:53:51 UTC
Hello,

There is a correction to the command mentioned in the description for downloading the files in parallel. It should be:
====
for i in {1..10}; do aria2c --max-overall-download-limit=250K -s 8 -x 8 -o test$i "http://10.74.255.176:8080/test/CentOS-7-x86_64-NetInstall-1810.iso" && md5sum test$i >> testmd5 & done
====

This will download the ISO file 10 times and save the copies as 'test1' through 'test10'. We can then compare the md5sums of the downloaded files, which are recorded in the file 'testmd5', to see whether any copy is corrupted.
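
One quick way to check (a sketch, assuming the original ISO is still available on the client) is to compare the collected hashes against each other and against the source file:
====
# md5sum CentOS-7-x86_64-NetInstall-1810.iso
# awk '{print $1}' testmd5 | sort -u
====
If all ten downloads are intact, the second command prints a single hash that matches the first.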


Thanks and regards!
Karun

Comment 23 errata-xmlrpc 2019-04-30 15:56:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:0911