Bug 1254398
| Summary: | backport tool to identify orphan objects left behind by rgw deleting S3 objects | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Moses Muir <mmuir> |
| Component: | RGW | Assignee: | Yehuda Sadeh <yehuda> |
| Status: | CLOSED ERRATA | QA Contact: | ceph-qe-bugs <ceph-qe-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 1.3.0 | CC: | cbodley, ceph-eng-bugs, flucifre, hnallurv, kbader, kdreyer, mbenjamin, owasserm, smanjara, sweil, yehuda |
| Target Milestone: | rc | | |
| Target Release: | 1.3.2 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | RHEL: ceph-0.94.5-8.el7cp; Ubuntu: ceph_0.94.5-5redhat1trusty | Doc Type: | Bug Fix |
| Doc Text: | Cause: An earlier bug in previous versions of RGW caused S3 deletion operations to leave `__shadow_` objects behind (RHBZ#1211431).<br>Consequence: Users' clusters could contain orphaned `__shadow_` objects that should be cleaned up to save space.<br>Fix: The radosgw-admin command has a new "orphans" subcommand.<br>Result: Users can use this subcommand to find and fix orphaned objects. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-02-29 14:42:59 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1304032 | | |
| Bug Blocks: | | | |
| Attachments: | | | |
Description
Moses Muir
2015-08-18 01:54:40 UTC
The tool has been backported to upstream 0.94.4.

Yehuda, running the tool ("orphans find") on my gateway node, I get the following error:

    sudo radosgw-admin orphans find --pool=.rgw.buckets --job-id=something
    ERROR: failed to open log pool (.log ret=-2
    could not init search, ret=-2

I assume the job ID is something we assign so we can track the run? What does the error message mean? I haven't been able to run the tool successfully and am hoping you can help; I didn't find anything meaningful in the logs even with debug enabled.

The tool tries to open the radosgw log pool and fails. The log pool needs to exist beforehand. The following command should create it:

    $ rados mkpool .log

(In reply to Yehuda Sadeh from comment #5)
> the tool tries to open the radosgw log pool and fails. The log pool needs to
> exist prior. The following command should do it:
>
> $ rados mkpool .log

Thanks, that works. Also, apart from "radosgw-admin orphans find", are there any other options the tool provides? I couldn't find the code fix for this in the upstream tracker bug.

(In reply to shilpa from comment #6)
> (In reply to Yehuda Sadeh from comment #5)
> > the tool tries to open the radosgw log pool and fails. The log pool needs to
> > exist prior. The following command should do it:
> >
> > $ rados mkpool .log

I was exploring the orphans tool's options.
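Putting the pieces from the comments above together, an end-to-end run looks roughly like the following. This is a sketch: the job ID is an arbitrary label of your choosing, and the pool names (`.rgw.buckets` for data, `.log` for the RGW log pool) are the Hammer-era defaults used in this bug. The `run` wrapper only echoes each command so the sequence can be read without a live cluster; drop the `echo` to execute it for real.

```shell
#!/bin/sh
# Sketch of the orphan-scan workflow discussed above. The 'run' helper
# echoes each command instead of executing it, since the real commands
# need a live Ceph cluster; remove the echo to execute them.
JOB_ID=${1:-orphan-scan-1}    # arbitrary label used to track this scan

run() { echo "+ $*"; }

# 1. The RGW log pool must exist first, or 'orphans find' fails with
#    "failed to open log pool" (comment #5).
run rados mkpool .log

# 2. Scan the data pool for leaked __shadow_ objects under this job ID.
run radosgw-admin orphans find --pool=.rgw.buckets --job-id="$JOB_ID"

# 3. Clean up the job's bookkeeping state when the scan is done.
run radosgw-admin orphans finish --job-id="$JOB_ID"
```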
Ran "radosgw-admin orphans find" followed by the "orphans finish" command:

    # radosgw-admin orphans finish --job-id=test 2> findfinish.txt
    Segmentation fault (core dumped)

From the output of the command:

    2016-01-29 07:17:31.912779 7fa407013900 20 init_complete bucket index max shards: 0
    *** Caught signal (Segmentation fault) **
     in thread 7fa407013900
     ceph version 0.94.5-3.el7cp (cd5b9adaf4725d17c27dc32258778be21520e1bb)
     1: radosgw-admin() [0x6316c2]
     2: (()+0xf100) [0x7fa4037b1100]
     3: (RGWOrphanSearch::init(std::string const&, RGWOrphanSearchInfo*)+0x9e) [0x51b61e]
     4: (main()+0xbe92) [0x5010b2]
     5: (__libc_start_main()+0xf5) [0x7fa4027d4b15]
     6: radosgw-admin() [0x5064d9]
    NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Looking at the backtrace on the core dump:

    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib64/libthread_db.so.1".
    Core was generated by `radosgw-admin orphans finish --job-id=something'.
    Program terminated with signal 11, Segmentation fault.
    #0  0x00007fdab4319fcb in raise (sig=11) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
    37        return INLINE_SYSCALL (tgkill, 3, pid, THREAD_GETMEM (THREAD_SELF, tid),
    Missing separate debuginfos, use: debuginfo-install ceph-radosgw-0.94.5-3.el7cp.x86_64
    (gdb) bt
    #0  0x00007fdab4319fcb in raise (sig=11) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
    #1  0x000000000063178d in handle_fatal_signal(int) ()
    #2  <signal handler called>
    #3  0x000000000051b61e in RGWOrphanSearch::init(std::string const&, RGWOrphanSearchInfo*) ()
    #4  0x00000000005010b2 in main ()

Outputs of the commands are attached to the BZ along with the logs.

Created attachment 1119371 [details]
Output from orphans find cmd
Created attachment 1119372 [details]
Output from orphans finish cmd
Created attachment 1119373 [details]
RGW logs
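As a side note on reading these logs: the "linked"/"skipping" lines the find pass prints (an excerpt appears in the verification comment later in this bug) can be tallied with a quick filter. A sketch, using sample lines abridged from that excerpt; the temporary file name is made up for the example.

```shell
# Tally an 'orphans find' debug log: 'linked' objects are still referenced
# by a bucket, 'skipping' objects are newer than the mtime threshold.
# The sample log below is abridged from the excerpt quoted in this bug.
cat > /tmp/orphans-sample.log <<'EOF'
2016-02-07 08:17:00.982260 7f85b01ba880 20 linked: default.82072.3__shadow_.xu6ljmI6af_7
2016-02-07 08:17:00.982262 7f85b01ba880 20 linked: default.82072.3__shadow_.xu6ljmI6af_8
2016-02-07 08:17:00.985980 7f85b01ba880 20 skipping: default.144113.1__shadow_.oWFya9__1 (mtime=1454830470 threshold=1454746337)
EOF

awk '/ linked: /   { linked++ }
     / skipping: / { skipped++ }
     END { printf "linked=%d skipped=%d\n", linked, skipped }' /tmp/orphans-sample.log
```

For the sample above this prints `linked=2 skipped=1`.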
This looks like upstream Ceph issue 13888.

13824, to be more precise (13888 is a related backport issue). Please open a downstream bug for it for the appropriate version.

(In reply to Yehuda Sadeh from comment #12)
> 13824 to be more correct (13888 is a related backport issue). Please do open
> a downstream bug for it for the appropriate version.

Thanks. Downstream BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1304032

The dependent bug is fixed in the latest builds; this is now ready for verification.

The orphans find and finish commands seem to work fine. Files that have not yet gone through garbage collection are skipped, while those that have not been cleaned up even after a gc run are reported as linked, as shown below. Hope this is what is expected from the tool?

    2016-02-07 08:17:00.982260 7f85b01ba880 20 linked: default.82072.3__shadow_.gJCaVkhdaOFZyJvtgvTI_xu6ljmI6af_7
    2016-02-07 08:17:00.982262 7f85b01ba880 20 linked: default.82072.3__shadow_.gJCaVkhdaOFZyJvtgvTI_xu6ljmI6af_8
    2016-02-07 08:17:00.982264 7f85b01ba880 20 linked: default.82072.3__shadow_.gJCaVkhdaOFZyJvtgvTI_xu6ljmI6af_9
    2016-02-07 08:17:00.985980 7f85b01ba880 20 skipping: default.144113.1__shadow_.gUp7myWbQCly2ZJJWVFQCph4oWFya9__1 (mtime=1454830470 threshold=1454746337)
    2016-02-07 08:17:00.986531 7f85b01ba880 20 skipping: default.144113.1__shadow_.gUp7myWbQCly2ZJJWVFQCph4oWFya9__10 (mtime=1454830470 threshold=1454746337)
    2016-02-07 08:17:00.987046 7f85b01ba880 20 skipping: default.144113.1__shadow_.gUp7myWbQCly2ZJJWVFQCph4oWFya9__11 (mtime=1454830470 threshold=1454746337)
    2016-02-07 08:17:00.987590 7f85b01ba880 20 skipping: default.144113.1__shadow_.gUp7myWbQCly2ZJJWVFQCph4oWFya9__12 (mtime=1454830471 threshold=1454746337)

Yehuda, please confirm.

Yeah, looks good.

Verified the tool's functionality in ceph-0.94.5-8.el7cp.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:0313