Bug 1327142
| Summary: | Object delete did not sync to the peer zone in multisite configuration | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | shilpa <smanjara> |
| Component: | RGW | Assignee: | Casey Bodley <cbodley> |
| Status: | CLOSED ERRATA | QA Contact: | shilpa <smanjara> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 2.0 | CC: | amaredia, cbodley, ceph-eng-bugs, gmeno, hnallurv, j.klare, kbader, kdreyer, mbenjamin, owasserm, smanjara, sweil |
| Target Milestone: | rc | | |
| Target Release: | 2.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | RHEL: ceph-10.2.1-13.el7cp Ubuntu: ceph_10.2.1-2redhat1xenial | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-08-23 19:36:18 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1347904 | | |
| Bug Blocks: | 1343229 | | |
| Attachments: | | | |
Description
shilpa
2016-04-14 10:32:54 UTC
There are multiple issues coming into play here. The following upstream PRs take care of the different issues:

- https://github.com/ceph/ceph/pull/8709
- https://github.com/ceph/ceph/pull/8685
- https://github.com/ceph/ceph/pull/8772

*** Bug 1327519 has been marked as a duplicate of this bug. ***

The patches Yehuda mentioned in Comment 2 have been merged to master. We still need the changes to be backported to jewel.

Cherry-pick to jewel in progress: https://github.com/ceph/ceph/pull/9017

(In reply to Ken Dreyer (Red Hat) from comment #5)
> Cherry-pick to jewel in progress:
> https://github.com/ceph/ceph/pull/9017

Ken, will this be available in the next build?

The next build will be based on 10.2.1, so yes. The above PR was merged to jewel and is present in v10.2.1.

I still see this issue on ceph-radosgw-10.2.1-1.el7cp.x86_64.

Object deletes are still not synced to the peer zone. Only this time, the file is not resynced back to the zone where the delete was initiated. Any further operations (creates or deletes) also fail to sync.
```
2016-05-18 07:09:44.685026 7feee1ffb700  0 rgw meta sync: ERROR: can't remove key: bucket.instance:container17:0a6973f5-af95-4665-aa6b-be8dd41491d8.14164.19 ret=-2
2016-05-18 07:09:44.849754 7feda3fff700  1 civetweb: 0x7feda4004df0: 10.8.128.59 - - [18/May/2016:07:09:43 +0000] "POST /admin/log HTTP/1.1" 200 0 - -
2016-05-18 07:09:44.851600 7fee53fff700  0 rgw meta sync: cr:s=0x7fee4c003890:op=0x7fee4c036e90:18RGWMetaSyncShardCR: adjusting marker pos=1_1463555295.641429_553.1
2016-05-18 07:09:44.851609 7fee53fff700  0 rgw meta sync: cr:s=0x7fee4c003890:op=0x7fee4c036e90:18RGWMetaSyncShardCR: adjusting marker pos=1_1463555295.717155_557.1
2016-05-18 07:09:44.851613 7fee53fff700  0 rgw meta sync: cr:s=0x7fee4c003890:op=0x7fee4c036e90:18RGWMetaSyncShardCR: adjusting marker pos=1_1463555295.764512_559.1
2016-05-18 07:09:44.851616 7fee53fff700  0 rgw meta sync: cr:s=0x7fee4c003890:op=0x7fee4c036e90:18RGWMetaSyncShardCR: adjusting marker pos=1_1463555295.787972_560.1
2016-05-18 07:09:45.144490 7feda37fe700  1 civetweb: 0x7fedac004df0: 10.8.128.59 - - [18/May/2016:07:09:44 +0000] "POST /admin/log HTTP/1.1" 200 0 - -
2016-05-18 07:09:45.194290 7fee52ffd700  0 ERROR: lease cr failed, done early
2016-05-18 07:09:45.194302 7fee52ffd700  0 ERROR: full sync on container17 bucket_id=0a6973f5-af95-4665-aa6b-be8dd41491d8.14164.16 shard_id=-1 failed, retcode=-16
```

Any further syncs between the zones are failing.

```
# radosgw-admin sync status --rgw-zone=us-1 --debug-rgw=0 --debug-ms=0
          realm 4a77379a-174c-42c6-abff-45c148cdab4e (earth)
      zonegroup c0e3ca48-6260-4ed8-b83a-165f530e4869 (us)
           zone 0a6973f5-af95-4665-aa6b-be8dd41491d8 (us-1)
  metadata sync no sync (zone is master)
      data sync source: 411e1b05-2194-447c-a30d-63cf9b2d8884 (us-2)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 5 shards
```

A radosgw restart helps.
After restarting, the requests sync successfully:

```
2016-05-18 08:18:55.598521 7f91d37f6700  0 store->fetch_remote_obj() returned r=-2
2016-05-18 08:18:55.599695 7f91e57fa700  0 store->fetch_remote_obj() returned r=-2
2016-05-18 08:18:55.599953 7f91dd7fa700  0 store->fetch_remote_obj() returned r=-2
2016-05-18 08:18:55.612941 7f91d2ff5700  0 _send_request(): deleting obj=container3:ks-pre.log
2016-05-18 08:18:55.612954 7f91e5ffb700  0 _send_request(): deleting obj=container3:s3file.py
2016-05-18 08:18:55.617699 7f91d57fa700  0 _send_request(): deleting obj=container17:s3file.py
2016-05-18 08:18:55.646109 7f91cefed700  0 store->fetch_remote_obj() returned r=-2
2016-05-18 08:18:55.648808 7f91d67fc700  0 _send_request(): deleting obj=container18:s3file.py
```

```
# radosgw-admin sync status --rgw-zone=us-1 --debug-rgw=0 --debug-ms=0
          realm 4a77379a-174c-42c6-abff-45c148cdab4e (earth)
      zonegroup c0e3ca48-6260-4ed8-b83a-165f530e4869 (us)
           zone 0a6973f5-af95-4665-aa6b-be8dd41491d8 (us-1)
  metadata sync no sync (zone is master)
      data sync source: 411e1b05-2194-447c-a30d-63cf9b2d8884 (us-2)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
```

Another thing to add: trying a delete operation on an object of about 1.5G recreates the deleted object, same as before. http://tracker.ceph.com/issues/15542

(In reply to shilpa from comment #11)
> I still see this issue on ceph-radosgw-10.2.1-1.el7cp.x86_64
>
> Object deletes are still not synced to the peer zone. Only this time, the
> file is not resynced back to the zone where the delete was initiated. Any
> further operations (creates or deletes) also fail to sync.
>
> 2016-05-18 07:09:44.685026 7feee1ffb700  0 rgw meta sync: ERROR: can't remove key: bucket.instance:container17:0a6973f5-af95-4665-aa6b-be8dd41491d8.14164.19 ret=-2
> 2016-05-18 07:09:44.849754 7feda3fff700  1 civetweb: 0x7feda4004df0: 10.8.128.59 - - [18/May/2016:07:09:43 +0000] "POST /admin/log HTTP/1.1" 200 0 - -
> 2016-05-18 07:09:44.851600 7fee53fff700  0 rgw meta sync: cr:s=0x7fee4c003890:op=0x7fee4c036e90:18RGWMetaSyncShardCR: adjusting marker pos=1_1463555295.641429_553.1
> 2016-05-18 07:09:44.851609 7fee53fff700  0 rgw meta sync: cr:s=0x7fee4c003890:op=0x7fee4c036e90:18RGWMetaSyncShardCR: adjusting marker pos=1_1463555295.717155_557.1
> 2016-05-18 07:09:44.851613 7fee53fff700  0 rgw meta sync: cr:s=0x7fee4c003890:op=0x7fee4c036e90:18RGWMetaSyncShardCR: adjusting marker pos=1_1463555295.764512_559.1
> 2016-05-18 07:09:44.851616 7fee53fff700  0 rgw meta sync: cr:s=0x7fee4c003890:op=0x7fee4c036e90:18RGWMetaSyncShardCR: adjusting marker pos=1_1463555295.787972_560.1
> 2016-05-18 07:09:45.144490 7feda37fe700  1 civetweb: 0x7fedac004df0: 10.8.128.59 - - [18/May/2016:07:09:44 +0000] "POST /admin/log HTTP/1.1" 200 0 - -
> 2016-05-18 07:09:45.194290 7fee52ffd700  0 ERROR: lease cr failed, done early
> 2016-05-18 07:09:45.194302 7fee52ffd700  0 ERROR: full sync on container17 bucket_id=0a6973f5-af95-4665-aa6b-be8dd41491d8.14164.16 shard_id=-1 failed, retcode=-16

This means that it couldn't lock the container to (probably) do the sync, because it was busy doing something else. Did you do anything else on that bucket before the object removal? Can you document precisely what you did, and the order and timing of things?

> Any further syncs between the zones are failing.

For how long?

Created attachment 1160898 [details]
Master zone logs
Created attachment 1160899 [details]
Logs
*** Bug 1326290 has been marked as a duplicate of this bug. ***

*** Bug 1327569 has been marked as a duplicate of this bug. ***

Hi,

I am seeing the same errors in my logs while I use COSBench to test the multisite deployment. The COSBench workload first creates 100 containers, then writes one object to each container, then deletes the object and lastly the containers too. For now I am working with 4k objects to avoid multipart uploads, and it seems that I get exactly one message per container and later on per object.

I tried to pin it down to some specific actions by manually creating a bucket with s3cmd and then adding one 4k object into it (the usual logs from civetweb are filtered out, just showing logs containing "ERROR"):

create bucket1:
-> logs in master zone:
```
2016-05-31 16:58:52.343411 7f5850ff1700  0 ERROR: failed to wait for op, ret=-22: POST http://rgw1.de/admin/log?type=metadata&notify&rgwx-zonegroup=ce98cb46-10d3-4b0f-827a-cc19774057af
```
-> no logs in secondary zone

create bucket2:
-> logs in master zone:
```
2016-05-31 16:59:13.205150 7f5850ff1700  0 ERROR: failed to wait for op, ret=-22: POST http://rgw1.de/admin/log?type=metadata&notify&rgwx-zonegroup=ce98cb46-10d3-4b0f-827a-cc19774057af
```
-> no logs in secondary zone

put one 4k file into bucket1:
-> no logs in master zone
-> logs in secondary zone:
```
2016-05-31 17:00:25.341013 7f8488fe9700  0 ERROR: lease cr failed, done early
2016-05-31 17:00:25.341037 7f8488fe9700  0 ERROR: full sync on bucket1 bucket_id=79d864f5-3c4b-48da-8f9e-b24d82c1aacd.770892.103 shard_id=1 failed, retcode=-16
```

put one 4k file into bucket2:
-> no logs in master zone
-> logs in secondary zone:
```
2016-05-31 17:00:03.020905 7f8488fe9700  0 ERROR: lease cr failed, done early
2016-05-31 17:00:03.020932 7f8488fe9700  0 ERROR: full sync on bucket2 bucket_id=79d864f5-3c4b-48da-8f9e-b24d82c1aacd.770892.104 shard_id=1 failed, retcode=-16
```

delete one 4k file from bucket1:
-> no logs in master zone
-> logs in secondary zone:
```
2016-05-31 17:07:09.891998 7f8488fe9700  0 ERROR: lease cr failed, done early
2016-05-31 17:07:09.892024 7f8488fe9700  0 ERROR: incremental sync on bucket1 bucket_id=79d864f5-3c4b-48da-8f9e-b24d82c1aacd.770892.103 shard_id=1 failed, retcode=-16
```

Same for the file in bucket2. I am seeing no errors when deleting the bucket.

The sync itself seems to work fine and I can download the file without any issue from both sites after I create it. So I have no idea if this is an actual issue or just an ERROR in the logs. I am running the radosgw on Ubuntu xenial and use a self-built package from today's jewel branch.

Cheers,
Jan

(In reply to Jan Klare from comment #22)
> Hi,
>
> I am seeing the same errors in my logs while I use COSBench to test the
> multisite deployment. The COSBench workload first creates 100 containers,
> then writes one object to each container, then deletes the object and lastly
> the containers too. For now I am working with 4k objects to avoid multipart
> uploads, and it seems that I get exactly one message per container and
> later on per object.
> I tried to pin it down to some specific actions by manually creating a
> bucket with s3cmd and then adding one 4k object into it (the usual logs
> from civetweb are filtered out, just showing logs containing "ERROR"):
>
> create bucket1:
> -> logs in master zone:
> 2016-05-31 16:58:52.343411 7f5850ff1700  0 ERROR: failed to wait for op, ret=-22: POST http://rgw1.de/admin/log?type=metadata&notify&rgwx-zonegroup=ce98cb46-10d3-4b0f-827a-cc19774057af
> -> no logs in secondary zone
>
> create bucket2:
> -> logs in master zone:
> 2016-05-31 16:59:13.205150 7f5850ff1700  0 ERROR: failed to wait for op, ret=-22: POST http://rgw1.de/admin/log?type=metadata&notify&rgwx-zonegroup=ce98cb46-10d3-4b0f-827a-cc19774057af
> -> no logs in secondary zone
>
> put one 4k file into bucket1:
> -> no logs in master zone
> -> logs in secondary zone:
> 2016-05-31 17:00:25.341013 7f8488fe9700  0 ERROR: lease cr failed, done early
> 2016-05-31 17:00:25.341037 7f8488fe9700  0 ERROR: full sync on bucket1 bucket_id=79d864f5-3c4b-48da-8f9e-b24d82c1aacd.770892.103 shard_id=1 failed, retcode=-16
>
> put one 4k file into bucket2:
> -> no logs in master zone
> -> logs in secondary zone:
> 2016-05-31 17:00:03.020905 7f8488fe9700  0 ERROR: lease cr failed, done early
> 2016-05-31 17:00:03.020932 7f8488fe9700  0 ERROR: full sync on bucket2 bucket_id=79d864f5-3c4b-48da-8f9e-b24d82c1aacd.770892.104 shard_id=1 failed, retcode=-16
>
> delete one 4k file from bucket1:
> -> no logs in master zone
> -> logs in secondary zone:
> 2016-05-31 17:07:09.891998 7f8488fe9700  0 ERROR: lease cr failed, done early
> 2016-05-31 17:07:09.892024 7f8488fe9700  0 ERROR: incremental sync on bucket1 bucket_id=79d864f5-3c4b-48da-8f9e-b24d82c1aacd.770892.103 shard_id=1 failed, retcode=-16
>
> Same for the file in bucket2. I am seeing no errors when deleting the bucket.
>
> The sync itself seems to work fine and I can download the file without any
> issue from both sites after I create it. So I have no idea if this is an
> actual issue or just an ERROR in the logs. I am running the radosgw on
> Ubuntu xenial and use a self-built package from today's jewel branch.
>
> Cheers,
> Jan

Thanks Jan. The ERRORs you're seeing do occur during normal operation. You can safely ignore them, unless you also observe puts or deletes that don't get synced to other zones for a significant amount of time.

Hi Casey,

thanks for the fast response. If these ERRORs are part of the "normal operation", I can ignore them for my testing setup, but in my opinion ERROR messages should never be part of "normal operation". Any chance this will be fixed in jewel? Any idea why I get these ERRORs even though everything works fine? If you want, we can move this to a separate bug, since this seems not to be directly related to the actual "delete sync not working" issue.

Cheers,
Jan

(In reply to Jan Klare from comment #27)
> Hi Casey,
>
> thanks for the fast response. If these ERRORs are part of the "normal
> operation", I can ignore them for my testing setup, but in my opinion ERROR
> messages should never be part of "normal operation". Any chance this will be
> fixed in jewel? Any idea why I get these ERRORs even though everything works
> fine? If you want, we can move this to a separate bug, since this seems not
> to be directly related to the actual "delete sync not working" issue.
>
> Cheers,
> Jan

I agree that the ERROR text and --debug-rgw levels for these messages should change. Could you please open a new bug?

Hi Shilpa,

We're preparing a new build that includes the latest fix from Yehuda. This allows the data sync process to keep retrying on bucket sync failures, including those related to "ERROR: lease cr failed, done early". This may not resolve all of the sync issues that you're seeing, but should at least limit the ways that it can reproduce. We'll follow up with the fixed-in-version when it's ready.
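The retcode=-16 in these messages is -EBUSY: the bucket sync coroutine could not take the sync lease because another operation held it. The retry behavior described above amounts to treating EBUSY as transient contention and rescheduling the bucket shard rather than marking it failed. A minimal sketch of that pattern (plain Python, not the actual RGW coroutine code; `sync_bucket_shard` is a hypothetical stand-in that fails twice before succeeding):

```python
import errno
import time

def sync_bucket_shard(attempt):
    # Stand-in for the real bucket sync operation; returns -EBUSY while
    # the lease is held elsewhere (here: the first two attempts).
    return -errno.EBUSY if attempt < 2 else 0

def run_with_retries(max_attempts=5, base_delay=0.01):
    for attempt in range(max_attempts):
        ret = sync_bucket_shard(attempt)
        if ret == 0:
            return attempt + 1  # number of attempts it took
        if ret != -errno.EBUSY:
            raise OSError(-ret, "bucket sync failed")
        # EBUSY means lease contention: back off and retry instead of
        # giving up on the shard permanently.
        time.sleep(base_delay * (2 ** attempt))
    raise TimeoutError("lease still busy after retries")

print(run_with_retries())  # 3: two EBUSY failures, then success
```

The key design point is that only EBUSY is retried; any other error still surfaces, so genuine failures are not silently masked.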
(In reply to Casey Bodley from comment #28)
> (In reply to Jan Klare from comment #27)
> > Hi Casey,
> >
> > thanks for the fast response. If these ERRORs are part of the "normal
> > operation", I can ignore them for my testing setup, but in my opinion ERROR
> > messages should never be part of "normal operation". Any chance this will be
> > fixed in jewel? Any idea why I get these ERRORs even though everything works
> > fine? If you want, we can move this to a separate bug, since this seems not
> > to be directly related to the actual "delete sync not working" issue.
> >
> > Cheers,
> > Jan
>
> I agree that the ERROR text and --debug-rgw levels for these messages should
> change. Could you please open a new bug?

New bug created here: http://tracker.ceph.com/issues/16121

(In reply to Casey Bodley from comment #29)
> Hi Shilpa,
>
> We're preparing a new build that includes the latest fix from Yehuda. This
> allows the data sync process to keep retrying on bucket sync failures,
> including those related to "ERROR: lease cr failed, done early". This may
> not resolve all of the sync issues that you're seeing, but should at least
> limit the ways that it can reproduce. We'll follow up with the
> fixed-in-version when it's ready.

Hi Casey,

I was able to reproduce the issue again on the new build 10.2.1-18. I have uploaded the compressed logs here. I first saw the issue on container2, so that would be the place to look at, with timestamp 2016-06-14 10:05:30.0.783646s. There were about 235 files I uploaded at once, and that's when none of the files got synced.

Created attachment 1167825 [details]
magna075 logs
Logs for magna075 show that the data sync thread stops producing output about 15 minutes before the ~235 objects were uploaded on magna115. The last output from that thread indicates a deadlock when submitting an http request to another thread. I've asked Shilpa to set up another reproducer so that I can attach a debugger and figure out exactly where these threads are blocked.

Here are the stack traces from gdb illustrating the deadlock. The same deadlock occurred between the metadata sync and http threads.

data sync thread:

```
Thread 138 (Thread 0x7f261affd700 (LWP 9325)):
#0  0x00007f271b5e41cd in write () from /lib64/libpthread.so.0
#1  0x00007f271bf7f940 in RGWHTTPManager::signal_thread() () from /lib64/librgw.so.2
#2  0x00007f271bf807ee in RGWHTTPManager::remove_request(RGWHTTPClient*) () from /lib64/librgw.so.2
#3  0x00007f271bf80855 in RGWHTTPClient::~RGWHTTPClient() () from /lib64/librgw.so.2
#4  0x00007f271c0cd8a9 in RGWRESTStreamRWRequest::~RGWRESTStreamRWRequest() () from /lib64/librgw.so.2
#5  0x00007f271c0d3549 in RGWRESTReadResource::~RGWRESTReadResource() () from /lib64/librgw.so.2
#6  0x00007f271bf316f8 in RefCountedObject::put() () from /lib64/librgw.so.2
#7  0x00007f271bfe2bc3 in RGWReadRemoteDataLogShardInfoCR::~RGWReadRemoteDataLogShardInfoCR() () from /lib64/librgw.so.2
#8  0x00007f271bf316f8 in RefCountedObject::put() () from /lib64/librgw.so.2
#9  0x00007f271bf2d94f in RGWCoroutinesStack::operate(RGWCoroutinesEnv*) () from /lib64/librgw.so.2
#10 0x00007f271bf2f6a1 in RGWCoroutinesManager::run(std::list<RGWCoroutinesStack*, std::allocator<RGWCoroutinesStack*> >&) () from /lib64/librgw.so.2
#11 0x00007f271bf30260 in RGWCoroutinesManager::run(RGWCoroutine*) () from /lib64/librgw.so.2
#12 0x00007f271bfd75d2 in RGWRemoteDataLog::run_sync(int, rgw_data_sync_status&) () from /lib64/librgw.so.2
#13 0x00007f271c0988b9 in RGWDataSyncProcessorThread::process() () from /lib64/librgw.so.2
#14 0x00007f271c03e313 in RGWRadosThread::Worker::entry() () from /lib64/librgw.so.2
#15 0x00007f271b5dddc5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007f271abe7ced in clone () from /lib64/libc.so.6
```

data sync http thread:

```
Thread 136 (Thread 0x7f2619ffb700 (LWP 9327)):
#0  0x00007f271abdd69d in poll () from /lib64/libc.so.6
#1  0x00007f271ba353a9 in Curl_poll () from /lib64/libcurl.so.4
#2  0x00007f271ba2e37b in curl_multi_wait () from /lib64/libcurl.so.4
#3  0x00007f271bf7e311 in do_curl_wait(CephContext*, void*, int) () from /lib64/librgw.so.2
#4  0x00007f271bf80e02 in RGWHTTPManager::reqs_thread_entry() () from /lib64/librgw.so.2
#5  0x00007f271bf8148d in RGWHTTPManager::ReqsThread::entry() () from /lib64/librgw.so.2
#6  0x00007f271b5dddc5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007f271abe7ced in clone () from /lib64/libc.so.6
```

RGWHTTPManager::signal_thread() is trying to write 4 bytes to a pipe while curl_multi_wait() is polling that pipe for wakeups. Blocking on a write() indicates that the pipe's buffer is full, so we must be leaking bytes on the read side. I'm now testing a potential fix.

I was able to reproduce this deadlock locally by building with HAVE_CURL_MULTI_WAIT=0. If libcurl doesn't support curl_multi_wait(), we fall back to different logic that uses select() instead. This fallback has a bug that prevents us from ever reading bytes from the pipe, which is why it deadlocks consistently.

However, looking at the build logs for these downstream packages, I do see "checking for curl_multi_wait in -lcurl... yes", which would indicate that HAVE_CURL_MULTI_WAIT=1. curl_multi_wait() was added in libcurl 7.28.0, and it appears that we're building against libcurl-devel-7.29.0. So I'm having trouble reconciling these: configure is telling us that HAVE_CURL_MULTI_WAIT=1, while we're observing behavior that should only be possible with HAVE_CURL_MULTI_WAIT=0.
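The wakeup mechanism here is the classic self-pipe trick: the submitting thread writes a few bytes to a pipe to wake the poller, and the poller is expected to drain the pipe after each wakeup. If the reader never drains it, the pipe buffer (typically 64 KiB on Linux) eventually fills and the writer blocks inside write(), which is exactly what the traces above show. A minimal sketch of the failure mode (illustrated in Python rather than the C++ in librgw; a non-blocking fd is used so the "buffer full" condition is observable instead of hanging):

```python
import os

# Create the signal pipe. The writer side plays the role of signal_thread();
# the reader side would belong to the curl_multi_wait() polling loop.
r, w = os.pipe()
os.set_blocking(w, False)  # so a full buffer raises instead of blocking forever

written = 0
try:
    while True:
        written += os.write(w, b"sig!")  # 4-byte wakeup per signal
except BlockingIOError:
    # With a blocking fd this is the point where the sync thread would
    # hang forever: the pipe buffer is full and nobody is reading.
    pass

print("pipe filled after", written, "bytes")  # typically 65536 on Linux

# The fix: the polling thread must drain the pipe after each wakeup,
# which frees buffer space and lets future signals proceed.
drained = os.read(r, 4096)
os.close(r)
os.close(w)
```

Since 4-byte writes are well under PIPE_BUF, each write either fully succeeds or fails with EAGAIN, so no partial signals are possible; the deadlock only appears once the accumulated undrained bytes exhaust the buffer.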
Submitted a PR against master with the fix for select(): https://github.com/ceph/ceph/pull/9785

I've established that these downstream packages do have curl_multi_wait(), so the fix to select() will not help here. Further investigation with gdb on magna075 shows that curl_multi_wait() is never flagging this pipe fd as signaled. This behavior has not been observed in development environments running recent versions of libcurl (my Fedora has libcurl-devel-7.43.0). I managed to find a "curl_multi_wait: fix revents" commit at https://github.com/curl/curl/commit/513e587c5eb966038731530c8f47fe0cf27513ce, which landed in 7.32.0. I haven't yet tested it to determine whether it resolves the issue.

Ken created a rhel-7.2-patches branch of libcurl at version 7.29.0. After building that branch, my RGWHTTPManager unit test reproduced the deadlock every time. I then cherry-picked "curl_multi_wait: fix revents" and two related commits onto that branch, and the unit test passed consistently without deadlock. This confirms that those fixes will resolve this bug. I opened a pull request for Ken at https://github.com/ktdreyer/curl/pull/1 for packaging.

Testing with the new curl package. Plain object sync seems to work. But I noticed two issues with syncing deletes:

1. Upload an object of about 1.5G from the master zone and wait for it to finish uploading. While the object has not finished syncing, delete it from the master zone. By the time the delete is processed, the object sync operation has completed, and the object then reverse syncs, thereby re-creating it on the master zone.

2. Introduced a network delay of 200ms and did the same operation of creating an object on the master zone, this time with a smaller file of 500MB, deleting it before the sync finishes. This time, the initial object create sync operation never completed.
The two zones are out of sync:

```
# radosgw-admin sync status --rgw-zone=us-2 --debug-rgw=0 --debug-ms=0
          realm fedc07d8-a4cc-40c0-b8ad-4e1be8251726 (earth)
      zonegroup 4401713c-7fdf-4619-adea-829c5e7fdd0d (us)
           zone 591f5f4f-2b22-4346-ae9c-45c7e37ad5ac (us-2)
  metadata sync syncing
                full sync: 0/64 shards
                metadata is caught up with master
                incremental sync: 64/64 shards
      data sync source: 38b0ab46-20fd-4c94-9f19-193e86c7e343 (us-1)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 2 shards
                        oldest incremental change not applied: 2016-06-22 07:37:26.0.924151s
```

I have debug logs for the second scenario attached. Will reproduce the first one and upload. The tests were run in buckets 'test20' and 'test21' with file name f22.

Created attachment 1170650 [details]
magna115 logs
Created attachment 1170656 [details]
magna075
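The resurrection in scenario 1 is a race between an in-flight object fetch and the delete: if the peer's fetch of the object is still completing when the delete lands, the peer ends up with a copy the master no longer has, and a sync pass that compares state without a tombstone for the delete will copy it back. A simplified model of that interleaving (plain Python, not RGW code; modeling zones as dicts and the naive state-comparison sync are assumptions for illustration only):

```python
# Model two zones as dicts; "sync" copies differences between them.
master = {"f22": "data"}
peer = {}

# Step 1: full sync starts; the peer's fetch of the object is in flight.
in_flight = ("f22", master["f22"])

# Step 2: before the fetch completes, the user deletes the object on master.
del master["f22"]

# Step 3: the stale fetch completes and lands on the peer anyway.
name, data = in_flight
peer[name] = data

# Step 4: a naive reverse sync sees the object on the peer but not on the
# master and, with no tombstone recording the delete, copies it back --
# resurrecting the deleted object.
for name, data in peer.items():
    if name not in master:
        master[name] = data

print("f22" in master)  # prints True: the delete was lost
```

Ordered replay of a change log (where the delete is an entry applied after the create) avoids this failure mode, which is why the remaining delete-sync races were split into the follow-up bug rather than patched by state comparison.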
Thanks Shilpa. We have a preliminary fix for these issues upstream. Can we track them as a separate bug so we can close this one for 2.0?

(In reply to Casey Bodley from comment #49)
> Thanks Shilpa. We have a preliminary fix for these issues upstream. Can we
> track them as a separate bug so we can close this one for 2.0?

Sure, that makes sense. The new curl package fixes the object create sync. Tracking sync delete issues in BZ:1349285.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1755.html