Description of problem:
Upload multipart objects on both zones. A few objects were skipped from being synced on one of the zones, and no retry was attempted.

Version-Release number of selected component (if applicable):
ceph-radosgw-10.2.2-23.el7cp.x86_64

How reproducible:
Seen only on the new build ceph-radosgw-10.2.2-23.el7cp.x86_64. (A Boto sketch of the multipart upload is included after the log excerpts below.)

Actual results:
The initial multipart upload of bucket3/big.txt started on magna115 here:

2016-07-19 10:07:35.218618 7f0acbfff700 1 ====== starting new request req=0x7f0acbff9710 =====
2016-07-19 10:07:35.218629 7f0acbfff700 2 req 3236:0.000011::PUT /bucket3/big.txt::initializing for trans_id = tx000000000000000000ca4-00578dfbe7-5e46-us-1

and finished here:

2016-07-19 10:08:17.049985 7f0ac47f0700 1 ====== req done req=0x7f0ac47ea710 op status=0 http_status=200 ======
2016-07-19 10:08:17.050015 7f0ac47f0700 1 civetweb: 0x7f0b780009b0: 10.8.128.74 - - [19/Jul/2016:10:08:16 +0000] "PUT /bucket3/big.txt HTTP/1.1" 200 0 - Boto/2.41.0 Python/2.7.5 Linux/3.10.0-327.el7.x86_64

magna059 sees it in the sync log:

2016-07-19 10:08:41.390609 7f33daffd700 20 bucket sync single entry (source_zone=f5717851-2682-475a-b24b-7bcdec728cbe) b=bucket3:f5717851-2682-475a-b24b-7bcdec728cbe.14122.41/big.txt[0] log_entry=00000000204.11489.3 op=0 op_state=1
2016-07-19 10:08:41.390699 7f33daffd700 20 cr:s=0x7f3354197230:op=0x7f335463bf40:26RGWBucketSyncSingleEntryCRISs11rgw_obj_keyE: operate()
2016-07-19 10:08:41.390706 7f33daffd700 5 bucket sync: sync obj: f5717851-2682-475a-b24b-7bcdec728cbe/bucket3(@{i=us-2.rgw.buckets.index,e=us-1.rgw.buckets.non-ec}us-2.rgw.buckets.data[f5717851-2682-475a-b24b-7bcdec728cbe.14122.41])/big.txt[0]
2016-07-19 10:08:41.390711 7f33daffd700 5 Sync:f5717851:data:Object:bucket3:f5717851-2682-475a-b24b-7bcdec728cbe.14122.41/big.txt[0]:fetch
...
2016-07-19 10:08:41.397525 7f33f47e8700 20 sending request to http://magna115:80/bucket3/big.txt?rgwx-zonegroup=0bf0fc77-43ce-4a44-8b16-8f5fcfa84c95&rgwx-prepend-metadata=0bf0fc77-43ce-4a44-8b16-8f5fcfa84c95

magna115 sees the GET request from magna059:

2016-07-19 10:08:37.179355 7f0b31ffb700 2 req 3332:0.001339:s3:GET /bucket3/big.txt:get_obj:executing
...
(almost 50 minutes later)
...
2016-07-19 10:57:59.221172 7f0b31ffb700 0 ERROR: flush_read_list(): d->client_c->handle_data() returned -5
2016-07-19 10:57:59.221193 7f0b31ffb700 20 get_obj_data::cancel_all_io()
2016-07-19 10:57:59.221629 7f0b31ffb700 0 WARNING: set_req_state_err err_no=5 resorting to 500
2016-07-19 10:57:59.221737 7f0b31ffb700 2 req 3332:2962.043722:s3:GET /bucket3/big.txt:get_obj:completing
2016-07-19 10:57:59.221748 7f0b31ffb700 2 req 3332:2962.043733:s3:GET /bucket3/big.txt:get_obj:op status=-5
2016-07-19 10:57:59.221752 7f0b31ffb700 2 req 3332:2962.043737:s3:GET /bucket3/big.txt:get_obj:http status=500
2016-07-19 10:57:59.221759 7f0b31ffb700 1 ====== req done req=0x7f0b31ff5710 op status=-5 http_status=500 ======
2016-07-19 10:57:59.221786 7f0b31ffb700 20 process_request() returned -5

magna059 gets the error reply:

2016-07-19 10:59:52.175208 7f33f47e8700 0 store->fetch_remote_obj() returned r=-5
2016-07-19 10:59:52.175604 7f33f47e8700 1 heartbeat_map reset_timeout 'RGWAsyncRadosProcessor::m_tp thread 0x7f33f47e8700' had timed out after 600
...
2016-07-19 10:59:52.177590 7f33daffd700 20 cr:s=0x7f3354197230:op=0x7f3354473800:19RGWFetchRemoteObjCR: operate()
2016-07-19 10:59:52.177596 7f33daffd700 20 cr:s=0x7f3354197230:op=0x7f3354473800:19RGWFetchRemoteObjCR: operate() returned r=-5
...
2016-07-19 10:59:52.178023 7f33daffd700 20 cr:s=0x7f3354197230:op=0x7f335463bf40:26RGWBucketSyncSingleEntryCRISs11rgw_obj_keyE: operate()
2016-07-19 10:59:52.178028 7f33daffd700 5 Sync:f5717851:data:Object:bucket3:f5717851-2682-475a-b24b-7bcdec728cbe.14122.41/big.txt[0]:done, retcode=-5
2016-07-19 10:59:52.178031 7f33daffd700 0 ERROR: failed to sync object: bucket3:f5717851-2682-475a-b24b-7bcdec728cbe.14122.41/big.txt
...
2016-07-19 11:00:05.774443 7f33daffd700 20 cr:s=0x7f3354197230:op=0x7f335463bf40:26RGWBucketSyncSingleEntryCRISs11rgw_obj_keyE: operate()
2016-07-19 11:00:05.774445 7f33daffd700 20 cr:s=0x7f3354197230:op=0x7f335463bf40:26RGWBucketSyncSingleEntryCRISs11rgw_obj_keyE: operate() returned r=-5
2016-07-19 11:00:05.774448 7f33daffd700 5 Sync:f5717851:data:Object:bucket3:f5717851-2682-475a-b24b-7bcdec728cbe.14122.41/big.txt[0]:finish
2016-07-19 11:00:05.774451 7f33daffd700 20 stack->operate() returned ret=-5

Despite the error, we still update the incremental bucket sync position, so the failed object is never retried.
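The expected behaviour would be for the per-entry fetch result to gate the position update. A minimal toy sketch of that idea follows (plain Python, purely illustrative; the class and method names are invented here and are not the radosgw RGWBucketSyncSingleEntryCR / marker-tracker implementation):

from collections import deque

class ToyBucketShardMarker(object):
    """Toy model of a bucket shard's incremental sync position (not the real RGW class)."""
    def __init__(self):
        self.pos = None             # highest bucket index log position known to be fully synced
        self.retry_queue = deque()  # entries that failed and still need a retry

    def apply_entry(self, log_pos, fetch_result):
        # fetch_result mirrors the retcode seen above (-5 == EIO from fetch_remote_obj()).
        if fetch_result < 0:
            # Keep the failed entry around for a later retry and do NOT
            # advance the marker past it.
            self.retry_queue.append(log_pos)
            return fetch_result
        self.pos = log_pos
        return 0

marker = ToyBucketShardMarker()
print(marker.apply_entry('00000000204.11489.3', -5))  # -5: marker stays put, entry queued for retry
print(marker.pos, list(marker.retry_queue))           # None ['00000000204.11489.3']

In the behaviour observed here, the position is advanced even though retcode=-5, which is why big.txt is skipped on the secondary with no retry attempted.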
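For completeness, the multipart upload referenced under "Description of problem" can be driven with a Boto 2 client along these lines (matching the Boto/2.41.0 user agent in the civetweb log); the endpoint, credentials, source file and part size below are placeholders, not the exact values used in this run:

import math
import os

import boto
import boto.s3.connection

# Placeholder endpoint/credentials -- point this at the RGW endpoint of either zone.
conn = boto.connect_s3(
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
    host='magna115', port=80, is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat())

bucket = conn.create_bucket('bucket3')
mp = bucket.initiate_multipart_upload('big.txt')

src = 'big.txt'               # local file to upload
part_size = 50 * 1024 * 1024  # 50 MB parts (each part except the last must be >= 5 MB)
total = os.path.getsize(src)
parts = int(math.ceil(total / float(part_size)))

with open(src, 'rb') as fp:
    for i in range(parts):
        # upload_part_from_file() reads 'size' bytes from the current file offset
        mp.upload_part_from_file(fp, part_num=i + 1,
                                 size=min(part_size, total - i * part_size))
mp.complete_upload()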
Will get pulled into the next build.
The issue has not been seen since ceph-10.2.2-26. Moving to verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-1755.html