Description of problem:

According to tracker 10295 (http://tracker.ceph.com/issues/10295), Yehuda Sadeh is working on a tool to identify orphan objects (RE: #C31) left behind despite successful deletion. I'd like to file this BZ against tracker 10295 to officially request that we backport this utility to Ceph 1.3.

Version-Release number of selected component (if applicable):
Ceph 1.3

How reproducible:

Steps to Reproduce (trigger BZ as follows):
1. Create 100 objects of 10MB each in rgw.
2. Observe that .rgw.buckets occupies 1GB of data (as expected).
3. Delete all 100 objects.
4. rgw is now empty:
   radosgw-admin bucket list --bucket=test
   []
5. Space occupied is unexpectedly high. From rados df:
   .rgw.buckets    11    959M    0    7447G    303

There are 303 objects with names like default.4209.1__shadow_.cC7M_LMTe8oNgWoGZOiNCiJco9cEEOl_1

From ceph df, the object store is basically empty but 3GB RAW is used:

GLOBAL:
    SIZE       AVAIL      RAW USED    %RAW USED
    22345G     22342G     3126M       0.01
POOLS:
    NAME                  ID    USED    %USED    MAX AVAIL    OBJECTS
    data                  0     0       0        7447G        0
    metadata              1     0       0        7447G        0
    rbd                   2     0       0        7447G        0
    .rgw.root             3     840     0        7447G        3
    .rgw.control          4     0       0        7447G        8
    .rgw                  5     331     0        7447G        2
    .rgw.gc               6     0       0        7447G        521
    .users.uid            7     1341    0        7447G        5
    .users.email          8     43      0        7447G        4
    .users                9     43      0        7447G        4
    .rgw.buckets.index    10    0       0        7447G        1
    .rgw.buckets          11    959M    0        7447G        303
                          12    0       0        7447G        0

Actual results:

Expected results:

Additional info:
rgw deleting S3 objects leaves __shadow_ objects behind
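Conceptually, the orphan search the tracker describes amounts to comparing two listings: every RADOS object present in the data pool versus every object still reachable from the bucket indexes; anything in the first set but not the second is a leaked __shadow_ tail. The sketch below is illustrative Python only, not the tool's actual implementation; the function and object names are hypothetical.

```python
# Hypothetical sketch of orphan detection as a set difference between the
# raw data-pool listing and the objects the bucket indexes still reference.
# This mirrors the idea behind "radosgw-admin orphans find"; it is NOT the
# tool's real code.

def find_orphans(pool_objects, indexed_objects):
    """Return data-pool objects that no bucket index references."""
    return sorted(set(pool_objects) - set(indexed_objects))

# Made-up object names in the rgw naming style:
pool = [
    "default.4209.1__shadow_.cC7M_1",  # tail stripe, no longer referenced
    "default.4209.1_myfile",           # head object still linked in the index
]
indexed = ["default.4209.1_myfile"]

print(find_orphans(pool, indexed))
```

In practice the real tool also has to handle multipart uploads and in-flight writes, which is why (as seen later in this BZ) it applies an mtime threshold rather than reporting every unreferenced object immediately.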
The tool has been backported to upstream 0.94.4.
Yehuda, running the "orphans find" tool, I get the following error on my gateway node:

sudo radosgw-admin orphans find --pool=.rgw.buckets --job-id=something
ERROR: failed to open log pool (.log ret=-2
could not init search, ret=-2

I assume the job id is something that we assign so we can track it? What does the error message mean? I haven't been able to run the tool successfully; hoping you could help. I didn't find anything meaningful in the logs with debug enabled.
the tool tries to open the radosgw log pool and fails. The log pool needs to exist prior. The following command should do it:

$ rados mkpool .log
(In reply to Yehuda Sadeh from comment #5)
> the tool tries to open the radosgw log pool and fails. The log pool needs to
> exist prior. The following command should do it:
>
> $ rados mkpool .log

Thanks, that works. Also, apart from "radosgw-admin orphans find", are there any other options that the tool provides? I couldn't find the code fix for this in the upstream tracker bug.
(In reply to shilpa from comment #6)
> (In reply to Yehuda Sadeh from comment #5)
> > the tool tries to open the radosgw log pool and fails. The log pool needs to
> > exist prior. The following command should do it:
> >
> > $ rados mkpool .log
> Thanks that works.

I was exploring the orphans tool options. Ran a "radosgw-admin orphans find" followed by the "orphans finish" command:

# radosgw-admin orphans finish --job-id=test 2> findfinish.txt
Segmentation fault (core dumped)

From the output of the command:

2016-01-29 07:17:31.912779 7fa407013900 20 init_complete bucket index max shards: 0
*** Caught signal (Segmentation fault) **
 in thread 7fa407013900
 ceph version 0.94.5-3.el7cp (cd5b9adaf4725d17c27dc32258778be21520e1bb)
 1: radosgw-admin() [0x6316c2]
 2: (()+0xf100) [0x7fa4037b1100]
 3: (RGWOrphanSearch::init(std::string const&, RGWOrphanSearchInfo*)+0x9e) [0x51b61e]
 4: (main()+0xbe92) [0x5010b2]
 5: (__libc_start_main()+0xf5) [0x7fa4027d4b15]
 6: radosgw-admin() [0x5064d9]
2016-01-29 07:17:31.914752 7fa407013900 -1 *** Caught signal (Segmentation fault) **
 in thread 7fa407013900
 ceph version 0.94.5-3.el7cp (cd5b9adaf4725d17c27dc32258778be21520e1bb)
 1: radosgw-admin() [0x6316c2]
 2: (()+0xf100) [0x7fa4037b1100]
 3: (RGWOrphanSearch::init(std::string const&, RGWOrphanSearchInfo*)+0x9e) [0x51b61e]
 4: (main()+0xbe92) [0x5010b2]
 5: (__libc_start_main()+0xf5) [0x7fa4027d4b15]
 6: radosgw-admin() [0x5064d9]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Looking at the bt on the coredump:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `radosgw-admin orphans finish --job-id=something'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fdab4319fcb in raise (sig=11) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
37        return INLINE_SYSCALL (tgkill, 3, pid, THREAD_GETMEM (THREAD_SELF, tid),
Missing separate debuginfos, use: debuginfo-install ceph-radosgw-0.94.5-3.el7cp.x86_64
(gdb) bt
#0  0x00007fdab4319fcb in raise (sig=11) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
#1  0x000000000063178d in handle_fatal_signal(int) ()
#2  <signal handler called>
#3  0x000000000051b61e in RGWOrphanSearch::init(std::string const&, RGWOrphanSearchInfo*) ()
#4  0x00000000005010b2 in main ()

Outputs of the commands are attached to the BZ along with the logs.
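The backtrace points at RGWOrphanSearch::init, which suggests the crash occurs while loading the saved state for the given job id during "finish". The defensive pattern one would expect there is to return an error when no stored job state can be loaded, instead of dereferencing it. A sketch of that behavior in Python, purely illustrative and not the actual C++ fix; the job store and field names are assumptions:

```python
# Illustrative sketch only: "orphans finish" should fail cleanly when no
# saved state exists for the given job id, rather than segfaulting.
# A plain dict stands in for the real state stored in the .log pool.

def orphans_finish(job_store, job_id):
    state = job_store.get(job_id)  # None if the job was never started
    if state is None:
        return (-2, f"ERROR: job not found: {job_id}")  # -ENOENT-style error
    # ... real cleanup of the recorded orphan objects would happen here ...
    return (0, f"finished job {job_id}, {len(state['orphans'])} orphans handled")

store = {"test": {"orphans": ["default.82072.3__shadow_.x_1"]}}
print(orphans_finish(store, "missing"))  # clean error instead of a crash
print(orphans_finish(store, "test"))
```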
Created attachment 1119371 [details] Output from orphans find cmd
Created attachment 1119372 [details] O/p from orphans finish cmd
Created attachment 1119373 [details] Rgw logs
This looks like upstream Ceph issue 13888.
13824, to be more precise (13888 is a related backport issue). Please do open a downstream bug for it for the appropriate version.
(In reply to Yehuda Sadeh from comment #12) > 13824 to be more correct (13888 is a related backport issue). Please do open > a downstream bug for it for the appropriate version. Thanks. Downstream BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1304032
The dependent bug is fixed in the latest builds; this is now ready for verification.
The orphans find and finish commands seem to work fine. Files that have not yet gone through garbage collection are skipped, and those that have not been cleaned up even after a gc run are linked, as shown below. Hope this is what is required from the tool?

2016-02-07 08:17:00.982260 7f85b01ba880 20 linked: default.82072.3__shadow_.gJCaVkhdaOFZyJvtgvTI_xu6ljmI6af_7
2016-02-07 08:17:00.982262 7f85b01ba880 20 linked: default.82072.3__shadow_.gJCaVkhdaOFZyJvtgvTI_xu6ljmI6af_8
2016-02-07 08:17:00.982264 7f85b01ba880 20 linked: default.82072.3__shadow_.gJCaVkhdaOFZyJvtgvTI_xu6ljmI6af_9
2016-02-07 08:17:00.985980 7f85b01ba880 20 skipping: default.144113.1__shadow_.gUp7myWbQCly2ZJJWVFQCph4oWFya9__1 (mtime=1454830470 threshold=1454746337)
2016-02-07 08:17:00.986531 7f85b01ba880 20 skipping: default.144113.1__shadow_.gUp7myWbQCly2ZJJWVFQCph4oWFya9__10 (mtime=1454830470 threshold=1454746337)
2016-02-07 08:17:00.987046 7f85b01ba880 20 skipping: default.144113.1__shadow_.gUp7myWbQCly2ZJJWVFQCph4oWFya9__11 (mtime=1454830470 threshold=1454746337)
2016-02-07 08:17:00.987590 7f85b01ba880 20 skipping: default.144113.1__shadow_.gUp7myWbQCly2ZJJWVFQCph4oWFya9__12 (mtime=1454830471 threshold=1454746337)

Yehuda, please confirm.
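The skip-versus-link decision in the log output appears to hinge on an mtime threshold: objects modified after the threshold are skipped because they may still be awaiting garbage collection, while older unreferenced objects are reported as linked orphan candidates. A minimal Python sketch of that decision rule, illustrative only; the function name and exact comparison are assumptions inferred from the log lines, not the tool's code:

```python
# Sketch of the skip-vs-link decision visible in the "orphans find" output:
# objects newer than the threshold may still be pending gc, so only older
# unreferenced objects are treated as orphan candidates.

def classify(name, mtime, threshold):
    if mtime > threshold:
        return f"skipping: {name} (mtime={mtime} threshold={threshold})"
    return f"linked: {name}"

threshold = 1454746337  # taken from the log lines above
objs = [
    ("default.82072.3__shadow_.gJCaVkhdaOFZyJvtgvTI_xu6ljmI6af_7", 1454000000),
    ("default.144113.1__shadow_.gUp7myWbQCly2ZJJWVFQCph4oWFya9__1", 1454830470),
]
for name, mtime in objs:
    print(classify(name, mtime, threshold))
```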
yeah, looks good.
Verified the tool's functionality in ceph-0.94.5-8.el7cp.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:0313