1254398 – backport tool to identify orphan objects left behind by rgw deleting S3 objects

Summary: backport tool to identify orphan objects left behind by rgw deleting S3 objects

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Ceph Storage
Classification:	Red Hat Storage
Component:	RGW
Sub Component:
Version:	1.3.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	rc
Target Release:	1.3.2
Assignee:	Yehuda Sadeh
QA Contact:	ceph-qe-bugs
Docs Contact:
URL:
Whiteboard:
Depends On:	1304032
Blocks:

Reported:	2015-08-18 01:54 UTC by Moses Muir
Modified:	2022-02-21 18:34 UTC
CC List:	11 users
Fixed In Version:	RHEL: ceph-0.94.5-8.el7cp Ubuntu: ceph_0.94.5-5redhat1trusty
Doc Type:	Bug Fix
Doc Text:	Cause: An earlier bug in previous versions of RGW caused S3 deletion operations to leave __shadow_ objects behind (RHBZ#1211431). Consequence: Users' clusters could contain orphaned __shadow_ objects that should be cleaned up to save space. Fix: The radosgw-admin command has a new "orphans" subcommand. Result: Users can use this subcommand to find and fix orphaned objects.
Clone Of:
Environment:
Last Closed:	2016-02-29 14:42:59 UTC
Embargoed:
Dependent Products:

Attachments
Output from orphans find cmd (64.34 KB, text/plain) 2016-01-29 07:42 UTC, shilpa	no flags
O/p from orphans finish cmd (27.65 KB, text/plain) 2016-01-29 07:43 UTC, shilpa	no flags
Rgw logs (3.54 MB, text/plain) 2016-01-29 07:47 UTC, shilpa	no flags

Links
System	ID	Priority	Status	Summary	Last Updated
Ceph Project Bug Tracker	9604	None	None	None	2015-12-17 20:35:46 UTC
Red Hat Issue Tracker	RHCEPH-3480	None	None	None	2022-02-21 18:34:57 UTC
Red Hat Product Errata	RHBA-2016:0313	normal	SHIPPED_LIVE	Red Hat Ceph Storage 1.3.2 bug fix and enhancement update	2016-02-29 19:37:43 UTC

Description Moses Muir 2015-08-18 01:54:40 UTC

Description of problem:

According to tracker 10295 (http://tracker.ceph.com/issues/10295), Yehuda Sadeh is working on a tool to identify orphan objects (see comment #31 there) that are left behind even after a successful deletion. I'm filing this BZ against tracker 10295 to formally request that this utility be backported to Ceph 1.3.

Version-Release number of selected component (if applicable):

Ceph 1.3

How reproducible:


Steps to Reproduce (trigger BZ as follows):

1. Create 100 objects of 10 MB each in RGW.
2. Observe that .rgw.buckets holds about 1 GB of data, as expected.
3. Delete all 100 objects.
4. RGW now reports the bucket as empty:

    radosgw-admin bucket list --bucket=test
    []

5. Space occupied is unexpectedly high.

rados df:    .rgw.buckets           11     959M         0         7447G         303

There are 303 objects with names like default.4209.1__shadow_.cC7M_LMTe8oNgWoGZOiNCiJco9cEEOl_1
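The sample name appears to consist of a bucket marker (`default.4209.1`), the literal `__shadow_` tag, a per-object prefix, and a trailing stripe number. The helper below is a hypothetical sketch that splits names using that layout. The layout is inferred only from the names quoted in this report, not from RGW's source. The helper also counts shadow objects per bucket marker, and those counts can be compared with the `marker` field that `radosgw-admin bucket stats` prints for each bucket.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Assumed layout, inferred from the object names in this report:
#   <bucket marker>__shadow_<prefix>_<stripe number>
SHADOW_TAG = "__shadow_"


def parse_shadow_name(name: str) -> Optional[Tuple[str, str, int]]:
    """Split a shadow object name into (marker, prefix, stripe), or return None."""
    marker, tag, rest = name.partition(SHADOW_TAG)
    if not tag:
        return None  # not a __shadow_ object (for example, a head object)
    prefix, sep, stripe = rest.rpartition("_")
    if not sep or not stripe.isdigit():
        return None  # does not match the assumed layout; do not guess
    return marker, prefix, int(stripe)


def count_by_marker(names: Iterable[str]) -> Counter:
    """Count shadow objects per bucket marker, e.g. names from `rados -p .rgw.buckets ls`."""
    counts = Counter()
    for name in names:
        parsed = parse_shadow_name(name)
        if parsed is not None:
            counts[parsed[0]] += 1
    return counts


print(parse_shadow_name("default.4209.1__shadow_.cC7M_LMTe8oNgWoGZOiNCiJco9cEEOl_1"))
# ('default.4209.1', '.cC7M_LMTe8oNgWoGZOiNCiJco9cEEOl', 1)
```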

ceph df shows that the object store is essentially empty, yet about 3 GB of raw space is used:
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    22345G     22342G     3126M        0.01
POOLS:
    NAME                   ID     USED     %USED     MAX AVAIL     OBJECTS
    data                   0      0        0         7447G         0
    metadata               1      0        0         7447G         0
    rbd                    2      0        0         7447G         0
    .rgw.root              3      840      0         7447G         3
    .rgw.control           4      0        0         7447G         8
    .rgw                   5      331      0         7447G         2
    .rgw.gc                6      0        0         7447G         521
    .users.uid             7      1341     0         7447G         5
    .users.email           8      43       0         7447G         4
    .users                 9      43       0         7447G         4
    .rgw.buckets.index     10     0        0         7447G         1
    .rgw.buckets           11     959M     0         7447G         303
                           12     0        0         7447G         0
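The numbers above roughly line up with three copies of the leaked data. The check below is a sketch under one stated assumption: the pools are 3x replicated. The report does not say this. It is inferred from GLOBAL AVAIL being about three times each pool's MAX AVAIL. The remainder of raw usage is not attributable from this output alone.

```python
# Rough consistency check of the ceph df output above.
# Assumption: the pools are 3x replicated, inferred from GLOBAL AVAIL being
# about three times each pool's MAX AVAIL (not stated in the report).
GLOBAL_AVAIL_G = 22342
POOL_MAX_AVAIL_G = 7447
LEAKED_USED_M = 959   # USED by .rgw.buckets after all S3 objects were deleted
RAW_USED_M = 3126     # GLOBAL RAW USED

replicas = round(GLOBAL_AVAIL_G / POOL_MAX_AVAIL_G)
leaked_raw_m = LEAKED_USED_M * replicas
unexplained_m = RAW_USED_M - leaked_raw_m

print(f"replicas ~{replicas}, leaked raw ~{leaked_raw_m}M, remainder ~{unexplained_m}M")
# replicas ~3, leaked raw ~2877M, remainder ~249M
```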


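Actual results:

After all objects are deleted, .rgw.buckets still holds 303 __shadow_ objects using 959M.

Expected results:

The space used by the deleted objects is reclaimed.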

Additional info:

RGW leaves __shadow_ objects behind when S3 objects are deleted.
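Steps 1 to 4 of the reproduction can be scripted. The sketch below makes several assumptions, none of which come from the report:

- a boto3 S3 client pointed at the gateway, with placeholder endpoint and credentials;
- hypothetical key names;
- "10MB" read as 10 MiB.

The upload/delete logic accepts any client that exposes boto3's `create_bucket`, `put_object`, and `delete_object` methods, so it can be exercised without a cluster.

```python
# Hypothetical reproducer for steps 1-4: upload N objects of SIZE bytes, then delete them.
OBJECT_COUNT = 100
OBJECT_SIZE = 10 * 1024 * 1024  # "10MB" read as 10 MiB; the report does not say which


def reproduce(client, bucket: str = "test", count: int = OBJECT_COUNT, size: int = OBJECT_SIZE):
    """Create `count` objects of `size` bytes in `bucket`, then delete them all."""
    client.create_bucket(Bucket=bucket)
    payload = b"\0" * size
    keys = [f"obj-{i:03d}" for i in range(count)]
    for key in keys:
        client.put_object(Bucket=bucket, Key=key, Body=payload)
    for key in keys:
        client.delete_object(Bucket=bucket, Key=key)
    return keys


def make_client(endpoint: str, access_key: str, secret_key: str):
    """Build a boto3 S3 client for the gateway (endpoint and keys are placeholders)."""
    import boto3  # imported lazily so reproduce() can be tested without boto3 installed

    return boto3.client(
        "s3",
        endpoint_url=endpoint,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
```

After a run, `radosgw-admin bucket list --bucket=test` should print `[]`, and `rados df` shows whether `.rgw.buckets` still holds data.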

Comment 2 Yehuda Sadeh 2015-10-20 18:15:25 UTC

The tool has been backported to upstream 0.94.4.

Comment 4 shilpa 2016-01-28 07:40:19 UTC

Yehuda,
I'm trying to run the "orphans find" tool and get the following error when I run it on my gateway node:

sudo radosgw-admin orphans find --pool=.rgw.buckets --job-id=something
ERROR: failed to open log pool (.log ret=-2
could not init search, ret=-2

I assume the job ID is something we assign so that we can track the job? What does the error message mean? I haven't been able to run the tool successfully and would appreciate your help. I didn't find anything meaningful in the logs with debug enabled.
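Ceph tools report failures as negative errno values, so `ret=-2` is `-ENOENT` ("No such file or directory"). Here, the `.log` pool that the tool tried to open does not exist. Below is a small hypothetical helper for decoding such codes with Python's standard `errno` and `os` modules.

```python
import errno
import os


def describe_ret(ret: int) -> str:
    """Decode a negative errno-style return code such as the `ret=-2` printed above."""
    code = -ret if ret < 0 else ret
    name = errno.errorcode.get(code, f"errno {code}")
    return f"{name} ({os.strerror(code)})"


print(describe_ret(-2))  # ENOENT (No such file or directory) on Linux
```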

Comment 5 Yehuda Sadeh 2016-01-28 16:15:38 UTC

The tool tries to open the radosgw log pool and fails. The log pool must already exist; the following command creates it:

$ rados mkpool .log
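If this check needs to run repeatedly, it can be made idempotent with the python-rados bindings. This is a minimal sketch under these assumptions:

- python-rados is installed;
- `/etc/ceph/ceph.conf` and an admin keyring are available;
- `ensure_pool` and `main` are hypothetical helper names.

Creating the pool with default settings is intended to roughly match `rados mkpool .log`.

```python
def ensure_pool(cluster, pool_name: str = ".log") -> bool:
    """Create pool_name unless it already exists; return True if it was created.

    `cluster` is expected to be a connected rados.Rados handle.
    """
    if cluster.pool_exists(pool_name):
        return False
    cluster.create_pool(pool_name)
    return True


def main(conffile: str = "/etc/ceph/ceph.conf") -> None:
    import rados  # python-rados; imported lazily so ensure_pool can be tested without Ceph

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        created = ensure_pool(cluster, ".log")
        print("created .log" if created else ".log already exists")
    finally:
        cluster.shutdown()
```

Passing the cluster handle in, rather than opening it inside `ensure_pool`, keeps the existence check separate from connection handling.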

Comment 6 shilpa 2016-01-28 17:47:31 UTC

(In reply to Yehuda Sadeh from comment #5)
> the tool tries to open the radosgw log pool and fails. The log pool needs to
> exist prior. The following command should do it:
> 
> $ rados mkpool .log

Thanks, that works. Apart from "radosgw-admin orphans find", does the tool provide any other options? I couldn't find the code fix for this in the upstream tracker bug.

Comment 7 shilpa 2016-01-29 07:39:20 UTC

(In reply to shilpa from comment #6)
> (In reply to Yehuda Sadeh from comment #5)
> > the tool tries to open the radosgw log pool and fails. The log pool needs to
> > exist prior. The following command should do it:
> > 
> > $ rados mkpool .log
> 

While exploring the orphans tool options, I ran "radosgw-admin orphans find" followed by "orphans finish":

# radosgw-admin orphans finish --job-id=test 2> findfinish.txt
Segmentation fault (core dumped)

From the command output:

2016-01-29 07:17:31.912779 7fa407013900 20 init_complete bucket index max shards: 0
*** Caught signal (Segmentation fault) **
 in thread 7fa407013900
 ceph version 0.94.5-3.el7cp (cd5b9adaf4725d17c27dc32258778be21520e1bb)
 1: radosgw-admin() [0x6316c2]
 2: (()+0xf100) [0x7fa4037b1100]
 3: (RGWOrphanSearch::init(std::string const&, RGWOrphanSearchInfo*)+0x9e) [0x51b61e]
 4: (main()+0xbe92) [0x5010b2]
 5: (__libc_start_main()+0xf5) [0x7fa4027d4b15]
 6: radosgw-admin() [0x5064d9]
2016-01-29 07:17:31.914752 7fa407013900 -1 *** Caught signal (Segmentation fault) **
 in thread 7fa407013900

 ceph version 0.94.5-3.el7cp (cd5b9adaf4725d17c27dc32258778be21520e1bb)
 1: radosgw-admin() [0x6316c2]
 2: (()+0xf100) [0x7fa4037b1100]
 3: (RGWOrphanSearch::init(std::string const&, RGWOrphanSearchInfo*)+0x9e) [0x51b61e]
 4: (main()+0xbe92) [0x5010b2]
 5: (__libc_start_main()+0xf5) [0x7fa4027d4b15]
 6: radosgw-admin() [0x5064d9]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


The backtrace from the core dump:


[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `radosgw-admin orphans finish --job-id=something'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fdab4319fcb in raise (sig=11) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
37	  return INLINE_SYSCALL (tgkill, 3, pid, THREAD_GETMEM (THREAD_SELF, tid),
Missing separate debuginfos, use: debuginfo-install ceph-radosgw-0.94.5-3.el7cp.x86_64
(gdb) bt
#0  0x00007fdab4319fcb in raise (sig=11) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
#1  0x000000000063178d in handle_fatal_signal(int) ()
#2  <signal handler called>
#3  0x000000000051b61e in RGWOrphanSearch::init(std::string const&, RGWOrphanSearchInfo*) ()
#4  0x00000000005010b2 in main ()


Outputs of the commands are attached to the BZ along with the logs.

Comment 8 shilpa 2016-01-29 07:42:27 UTC

Created attachment 1119371
Output from orphans find cmd

Comment 9 shilpa 2016-01-29 07:43:03 UTC

Created attachment 1119372
O/p from orphans finish cmd

Comment 10 shilpa 2016-01-29 07:47:06 UTC

Created attachment 1119373
Rgw logs

Comment 11 Yehuda Sadeh 2016-02-02 16:27:45 UTC

This looks like upstream Ceph issue 13888.

Comment 12 Yehuda Sadeh 2016-02-02 16:32:11 UTC

Issue 13824, to be more precise (13888 is a related backport issue). Please open a downstream bug for the appropriate version.

Comment 13 shilpa 2016-02-02 17:36:44 UTC

(In reply to Yehuda Sadeh from comment #12)
> 13824 to be more correct (13888 is a related backport issue). Please do open
> a downstream bug for it for the appropriate version.

Thanks. Downstream BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1304032

Comment 14 Ken Dreyer (Red Hat) 2016-02-06 02:32:26 UTC

The dependent bug is fixed in the latest builds; this is now ready for verification.

Comment 16 shilpa 2016-02-07 08:24:42 UTC

The orphans find and finish commands seem to work fine.

Objects that have not yet gone through garbage collection are skipped, and those that have not been cleaned up even after a GC run are linked, as shown below. Is this what the tool is expected to do?

2016-02-07 08:17:00.982260 7f85b01ba880 20 linked: default.82072.3__shadow_.gJCaVkhdaOFZyJvtgvTI_xu6ljmI6af_7
2016-02-07 08:17:00.982262 7f85b01ba880 20 linked: default.82072.3__shadow_.gJCaVkhdaOFZyJvtgvTI_xu6ljmI6af_8
2016-02-07 08:17:00.982264 7f85b01ba880 20 linked: default.82072.3__shadow_.gJCaVkhdaOFZyJvtgvTI_xu6ljmI6af_9
2016-02-07 08:17:00.985980 7f85b01ba880 20 skipping: default.144113.1__shadow_.gUp7myWbQCly2ZJJWVFQCph4oWFya9__1 (mtime=1454830470 threshold=1454746337)
2016-02-07 08:17:00.986531 7f85b01ba880 20 skipping: default.144113.1__shadow_.gUp7myWbQCly2ZJJWVFQCph4oWFya9__10 (mtime=1454830470 threshold=1454746337)
2016-02-07 08:17:00.987046 7f85b01ba880 20 skipping: default.144113.1__shadow_.gUp7myWbQCly2ZJJWVFQCph4oWFya9__11 (mtime=1454830470 threshold=1454746337)
2016-02-07 08:17:00.987590 7f85b01ba880 20 skipping: default.144113.1__shadow_.gUp7myWbQCly2ZJJWVFQCph4oWFya9__12 (mtime=1454830471 threshold=1454746337)

Yehuda, please confirm.

Comment 17 Yehuda Sadeh 2016-02-09 17:21:38 UTC

yeah, looks good.
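The `linked:` and `skipping:` lines can be tallied from saved `orphans find` output. The helper below is hypothetical, and its regular expression is written only against the sample lines in comment 16. In that sample, each skipped object's mtime is 84,133 s (about 23.4 hours) later than the threshold. The threshold itself decodes to 2016-02-06 08:12:17 UTC, roughly a day before the run. The skipped objects are therefore newer than the cutoff, which is presumably why they were skipped.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Written against the debug-level-20 lines quoted in comment 16; other releases may differ.
LINE_RE = re.compile(
    r"\b(?P<action>linked|skipping): (?P<obj>\S+)"
    r"(?: \(mtime=(?P<mtime>\d+) threshold=(?P<threshold>\d+)\))?"
)


def summarize(lines):
    """Return (Counter of actions, list of mtime - threshold in seconds for skipped lines)."""
    counts = Counter()
    margins = []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a linked/skipping line
        counts[m.group("action")] += 1
        if m.group("mtime") is not None:
            margins.append(int(m.group("mtime")) - int(m.group("threshold")))
    return counts, margins


def to_utc(epoch: int) -> str:
    """Render a Unix timestamp, such as the logged threshold, as a UTC string."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
```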

Comment 18 shilpa 2016-02-10 06:49:09 UTC

Verified the tool's functionality in ceph-0.94.5-8.el7cp.
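The verified sequence can be wrapped in a small script: create the `.log` pool from comment 5, then run `orphans find` and `orphans finish` with the same job ID. This is a sketch under these assumptions:

- the function names are hypothetical;
- only the flags that appear in this report (`--pool`, `--job-id`) are used;
- `radosgw-admin` is on PATH with admin credentials.

```python
import subprocess
from typing import List


def orphan_commands(pool: str, job_id: str) -> List[List[str]]:
    """argv lists for the two radosgw-admin steps exercised in this report."""
    return [
        # Search the data pool for orphan (leaked) objects under this job ID.
        ["radosgw-admin", "orphans", "find", f"--pool={pool}", f"--job-id={job_id}"],
        # Clean up the search data kept for this job ID.
        ["radosgw-admin", "orphans", "finish", f"--job-id={job_id}"],
    ]


def run_orphan_job(pool: str, job_id: str, find_output: str, finish: bool = False) -> None:
    """Run `orphans find`, save its stdout, and optionally run `orphans finish`."""
    find_cmd, finish_cmd = orphan_commands(pool, job_id)
    with open(find_output, "w") as out:
        subprocess.run(find_cmd, check=True, stdout=out)
    if finish:
        subprocess.run(finish_cmd, check=True)
```

`finish` is off by default so the saved `find` output can be reviewed before the job's search data is discarded.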

Comment 21 errata-xmlrpc 2016-02-29 14:42:59 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:0313
