
Bug 1358129

Summary: Multisite: some of the object sync operations were skipped
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: shilpa <smanjara>
Component: RGW
Assignee: Casey Bodley <cbodley>
Status: CLOSED ERRATA
QA Contact: shilpa <smanjara>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 2.0
CC: cbodley, ceph-eng-bugs, hnallurv, kbader, kdreyer, mbenjamin, owasserm, sweil, tserlin, yehuda
Target Milestone: rc
Target Release: 2.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: RHEL: ceph-10.2.2-26.el7cp; Ubuntu: ceph_10.2.2-20redhat1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-08-23 19:44:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description shilpa 2016-07-20 06:41:24 UTC
Description of problem:
Upload multipart objects on both zones. A few objects were never synced to one of the zones, and no retry was attempted.
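
For reference, a minimal sketch of the upload step, assuming a boto 2.x client like the one the access log shows (Boto/2.41.0, Python 2.7.5). The endpoint, credentials, and part sizes are placeholders, not the actual test script:

import io
import boto
import boto.s3.connection

# Placeholder credentials/endpoint; the same upload is repeated against the
# other zone's endpoint to cover "upload on both zones".
conn = boto.connect_s3(
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
    host='magna115', port=80, is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat())

bucket = conn.create_bucket('bucket3')
mp = bucket.initiate_multipart_upload('big.txt')

part_size = 8 * 1024 * 1024      # parts must be at least 5 MB, except the last
for part_num in range(1, 4):
    mp.upload_part_from_file(io.BytesIO(b'x' * part_size), part_num=part_num)

mp.complete_upload()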

Version-Release number of selected component (if applicable):
ceph-radosgw-10.2.2-23.el7cp.x86_64

How reproducible:
Saw this issue only on the new build ceph-radosgw-10.2.2-23.el7cp.x86_64


Actual results:

The initial multipart upload of bucket3/big.txt started on magna115 here:

2016-07-19 10:07:35.218618 7f0acbfff700  1 ====== starting new request
req=0x7f0acbff9710 =====
2016-07-19 10:07:35.218629 7f0acbfff700  2 req 3236:0.000011::PUT
/bucket3/big.txt::initializing for trans_id =
tx000000000000000000ca4-00578dfbe7-5e46-us-1

and finished here:

2016-07-19 10:08:17.049985 7f0ac47f0700  1 ====== req done
req=0x7f0ac47ea710 op status=0 http_status=200 ======
2016-07-19 10:08:17.050015 7f0ac47f0700  1 civetweb: 0x7f0b780009b0:
10.8.128.74 - - [19/Jul/2016:10:08:16 +0000] "PUT /bucket3/big.txt
HTTP/1.1" 200 0 - Boto/2.41.0 Python/2.7.5 Linux/3.10.0-327.el7.x86_64


magna059 sees it in the sync log:

2016-07-19 10:08:41.390609 7f33daffd700 20 bucket sync single entry
(source_zone=f5717851-2682-475a-b24b-7bcdec728cbe)
b=bucket3:f5717851-2682-475a-b24b-7bcdec728cbe.14122.41/big.txt[0]
log_entry=00000000204.11489.3 op=0 op_state=1
2016-07-19 10:08:41.390699 7f33daffd700 20
cr:s=0x7f3354197230:op=0x7f335463bf40:26RGWBucketSyncSingleEntryCRISs11rgw_obj_keyE:
operate()
2016-07-19 10:08:41.390706 7f33daffd700  5 bucket sync: sync obj:
f5717851-2682-475a-b24b-7bcdec728cbe/bucket3(@{i=us-2.rgw.buckets.index,e=us-1.rgw.buckets.non-ec}us-2.rgw.buckets.data[f5717851-2682-475a-b24b-7bcdec728cbe.14122.41])/big.txt[0]
2016-07-19 10:08:41.390711 7f33daffd700  5
Sync:f5717851:data:Object:bucket3:f5717851-2682-475a-b24b-7bcdec728cbe.14122.41/big.txt[0]:fetch
...
2016-07-19 10:08:41.397525 7f33f47e8700 20 sending request to
http://magna115:80/bucket3/big.txt?rgwx-zonegroup=0bf0fc77-43ce-4a44-8b16-8f5fcfa84c95&rgwx-prepend-metadata=0bf0fc77-43ce-4a44-8b16-8f5fcfa84c95


magna115 sees GET request from magna059:

2016-07-19 10:08:37.179355 7f0b31ffb700  2 req 3332:0.001339:s3:GET
/bucket3/big.txt:get_obj:executing

... (almost 50 minutes later) ...

2016-07-19 10:57:59.221172 7f0b31ffb700  0 ERROR: flush_read_list():
d->client_c->handle_data() returned -5
2016-07-19 10:57:59.221193 7f0b31ffb700 20 get_obj_data::cancel_all_io()
2016-07-19 10:57:59.221629 7f0b31ffb700  0 WARNING: set_req_state_err
err_no=5 resorting to 500
2016-07-19 10:57:59.221737 7f0b31ffb700  2 req 3332:2962.043722:s3:GET
/bucket3/big.txt:get_obj:completing
2016-07-19 10:57:59.221748 7f0b31ffb700  2 req 3332:2962.043733:s3:GET
/bucket3/big.txt:get_obj:op status=-5
2016-07-19 10:57:59.221752 7f0b31ffb700  2 req 3332:2962.043737:s3:GET
/bucket3/big.txt:get_obj:http status=500
2016-07-19 10:57:59.221759 7f0b31ffb700  1 ====== req done
req=0x7f0b31ff5710 op status=-5 http_status=500 ======
2016-07-19 10:57:59.221786 7f0b31ffb700 20 process_request() returned -5


magna059 gets error reply:

2016-07-19 10:59:52.175208 7f33f47e8700  0 store->fetch_remote_obj()
returned r=-5
2016-07-19 10:59:52.175604 7f33f47e8700  1 heartbeat_map reset_timeout
'RGWAsyncRadosProcessor::m_tp thread 0x7f33f47e8700' had timed out after 600
...
2016-07-19 10:59:52.177590 7f33daffd700 20
cr:s=0x7f3354197230:op=0x7f3354473800:19RGWFetchRemoteObjCR: operate()
2016-07-19 10:59:52.177596 7f33daffd700 20
cr:s=0x7f3354197230:op=0x7f3354473800:19RGWFetchRemoteObjCR: operate()
returned r=-5
...
2016-07-19 10:59:52.178023 7f33daffd700 20
cr:s=0x7f3354197230:op=0x7f335463bf40:26RGWBucketSyncSingleEntryCRISs11rgw_obj_keyE:
operate()
2016-07-19 10:59:52.178028 7f33daffd700  5
Sync:f5717851:data:Object:bucket3:f5717851-2682-475a-b24b-7bcdec728cbe.14122.41/big.txt[0]:done,
retcode=-5
2016-07-19 10:59:52.178031 7f33daffd700  0 ERROR: failed to sync object:
bucket3:f5717851-2682-475a-b24b-7bcdec728cbe.14122.41/big.txt
...
2016-07-19 11:00:05.774443 7f33daffd700 20
cr:s=0x7f3354197230:op=0x7f335463bf40:26RGWBucketSyncSingleEntryCRISs11rgw_obj_keyE:
operate()
2016-07-19 11:00:05.774445 7f33daffd700 20
cr:s=0x7f3354197230:op=0x7f335463bf40:26RGWBucketSyncSingleEntryCRISs11rgw_obj_keyE:
operate() returned r=-5
2016-07-19 11:00:05.774448 7f33daffd700  5
Sync:f5717851:data:Object:bucket3:f5717851-2682-475a-b24b-7bcdec728cbe.14122.41/big.txt[0]:finish
2016-07-19 11:00:05.774451 7f33daffd700 20 stack->operate() returned ret=-5

Despite the error, we still update the incremental bucket sync position, so the failed object is never retried.
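
For illustration only, a hedged sketch (in Python, not the actual C++ RGWBucketSyncSingleEntryCR code) of the control flow the log suggests: the per-entry sync returns -5, but the bucket sync marker is advanced anyway, so the skipped object never becomes a retry candidate:

EIO = -5

def sync_single_entry(marker_tracker, marker, fetch_remote_obj):
    # fetch_remote_obj() stands in for the remote fetch that returned -5 above.
    ret = fetch_remote_obj()
    # Suspected bug: the marker advances even when ret < 0, so incremental
    # bucket sync moves past the failed entry and never retries it.
    marker_tracker['position'] = marker
    return ret
    # Expected behaviour (sketch): advance the marker only on success, or
    # record the failure so the entry can be retried later:
    #   if ret < 0:
    #       return ret
    #   marker_tracker['position'] = marker
    #   return 0

# Example with the log entry seen above: returns -5, but the marker has
# still moved past 00000000204.11489.3.
print(sync_single_entry({'position': None}, '00000000204.11489.3', lambda: EIO))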

Comment 6 John Poelstra 2016-07-20 15:47:57 UTC
Will get pulled into the next build.

Comment 11 shilpa 2016-08-01 12:28:32 UTC
The issue has not been seen since ceph-10.2.2-26. Moving to verified.
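
For completeness, a hedged sketch of one way to check this (an assumed workflow, not necessarily the QA script used): list bucket3 on both zone endpoints with boto 2.x and report any objects missing from either side. Endpoints and credentials are placeholders:

import boto
import boto.s3.connection

def list_keys(host):
    conn = boto.connect_s3(
        aws_access_key_id='ACCESS_KEY', aws_secret_access_key='SECRET_KEY',
        host=host, port=80, is_secure=False,
        calling_format=boto.s3.connection.OrdinaryCallingFormat())
    # Map object name -> etag for the bucket on this endpoint.
    return dict((k.name, k.etag) for k in conn.get_bucket('bucket3').list())

zone1 = list_keys('magna115')   # first zone endpoint
zone2 = list_keys('magna059')   # second zone endpoint

missing = set(zone1) ^ set(zone2)
print('objects present in only one zone: %s' % sorted(missing))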

Comment 13 errata-xmlrpc 2016-08-23 19:44:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1755.html