Bug 1752130 - [RFE] rgw: provide a subcommand under radosgw-admin to list rados objects for a given bucket
Summary: [RFE] rgw: provide a subcommand under radosgw-admin to list rados objects for a given bucket
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RGW
Version: 2.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: z4
Target Release: 3.3
Assignee: J. Eric Ivancich
QA Contact: Uday kurundwade
Docs Contact: Ranjini M N
URL:
Whiteboard:
Duplicates: 1750965 1821865
Depends On: 1805376
Blocks: 1726135 1770955 1812375 1815211 1821884
 
Reported: 2019-09-13 19:29 UTC by Vikhyat Umrao
Modified: 2023-10-06 18:36 UTC
CC: 15 users

Fixed In Version: RHEL: ceph-12.2.12-96.el7cp Ubuntu: ceph_12.2.12-91redhat1
Doc Type: Enhancement
Doc Text:
.New commands to view the RADOS objects and orphans
This release adds two new commands to view how the Object Gateway maps to RADOS objects and to produce a list of potential orphans for further processing. The `radosgw-admin bucket radoslist --bucket=<bucket_name>` command lists all RADOS objects in the given bucket. The `rgw-orphan-list` command lists all orphans in a specified pool. These commands keep intermediate results on the local file system.
Clone Of:
Clones: 1770955 1812375 1815211
Environment:
Last Closed: 2020-04-06 08:27:04 UTC
Embargoed:
rmandyam: needinfo-


Attachments
A utility to efficiently compute a line-by-line diff on two sorted text files. (230.00 KB, application/octet-stream)
2019-12-19 04:51 UTC, J. Eric Ivancich
Details


Links
Ceph Project Bug Tracker 41828 (last updated 2019-09-13 19:29:48 UTC)
Red Hat Issue Tracker RHCEPH-7634 (last updated 2023-10-06 18:36:19 UTC)
Red Hat Product Errata RHBA-2020:1320 (last updated 2020-04-06 08:27:45 UTC)

Internal Links: 1750965

Description Vikhyat Umrao 2019-09-13 19:29:22 UTC
Description of problem:
[RFE] rgw: provide a subcommand under radosgw-admin to list rados objects for a given bucket
https://tracker.ceph.com/issues/41828

Comment 1 RHEL Program Management 2019-09-13 19:29:29 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 2 Vikhyat Umrao 2019-09-13 19:56:52 UTC
Thanks, Eric, for this feature; it will help us a lot.

To find orphans with the help of this command, we will create two lists: one produced by this command and the other produced by the `rados -p <rgw data pool> ls` command.

If my understanding is correct, `rados ls` has to be run *first*, completed, and recorded to a text file. Only then should this *new* sub-command be run, on a per-bucket basis, to find the legitimate object names at the rados level for each bucket. All the legitimate bucket object lists can then be combined and diffed against the recorded `rados ls` output; the final result should be the orphan objects.

The reason I say `rados ls` has to be run first, recorded, and saved before this new command is started is to avoid having ongoing new writes to the data pool appear in the list; we always want the bucket list output to be a superset of the legitimate objects and the `rados ls` output to be a subset of the total objects.

Is my understanding correct?

Comment 3 Vikhyat Umrao 2019-09-13 20:11:02 UTC
In summary, the procedure looks something like the following, if my understanding in comment#2 is correct.


- The approach is pretty simple: execute the `rados ls` command and store the listed objects in a text file.
- The new radosgw-admin subcommand will give the legitimate object names for each bucket as they are stored in rados. This command operates per bucket, so it can be run for different buckets in parallel while keeping an eye on the load on the cluster.
  + Once you have the object names from all the buckets, combine them into a single list.

- Take the difference of the `rados ls` list and the legitimate objects list; the entries left over from the `rados ls` list will be the orphan objects.
- Once you have the list of orphan objects, you can process them with the `rados rm` command (see the sketch below).
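
A minimal sketch of that procedure in shell (pool, bucket, and file names are illustrative; `comm` requires both of its inputs to be sorted):

rados -p default.rgw.buckets.data ls | sort > pool-objects.txt

# per bucket; these can run in parallel across buckets
radosgw-admin bucket radoslist --bucket=bucket-1 >> bucket-objects.txt
radosgw-admin bucket radoslist --bucket=bucket-2 >> bucket-objects.txt

sort -u bucket-objects.txt > bucket-objects-sorted.txt

# lines present only in the pool listing are orphan candidates
comm -23 pool-objects.txt bucket-objects-sorted.txt > orphan-candidates.txt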


Eric - please let me know your inputs.

Comment 5 J. Eric Ivancich 2019-09-13 20:30:35 UTC
(In reply to Vikhyat Umrao from comment #2)
> The reason I say `rados ls` has to be run first, recorded, and saved before
> this new command is started is to avoid having ongoing new writes to the
> data pool appear in the list; we always want the bucket list output to be a
> superset of the legitimate objects and the `rados ls` output to be a subset
> of the total objects.
> 
> Is my understanding correct?

I think this is correct. Obviously, as long as the bucket is not being manipulated, it would not matter. But if there is a chance that the bucket is being manipulated, then we would not want to create a new rados object that would be unaccounted for by an earlier listing of expected rados objects.

Comment 6 J. Eric Ivancich 2019-09-13 20:32:32 UTC
(In reply to Vikhyat Umrao from comment #3)
> In summary, the procedure looks something like the following, if my
> understanding in comment#2 is correct.
> 
> 
> - The approach is pretty simple: execute the `rados ls` command and store
> the listed objects in a text file.
> - The new radosgw-admin subcommand will give the legitimate object names for
> each bucket as they are stored in rados. This command operates per bucket,
> so it can be run for different buckets in parallel while keeping an eye on
> the load on the cluster.
>   + Once you have the object names from all the buckets, combine them into a
> single list.
> 
> - Take the difference of the `rados ls` list and the legitimate objects
> list; the entries left over from the `rados ls` list will be the orphan
> objects.
> - Once you have the list of orphan objects, you can process them with the
> `rados rm` command (see the sketch below).
> 
> 
> Eric - please let me know your inputs.

That sounds approximately correct.

Comment 7 J. Eric Ivancich 2019-09-13 20:34:00 UTC
(In reply to J. Eric Ivancich from comment #6)
> (In reply to Vikhyat Umrao from comment #3)
> > [...]
> 
> That sounds approximately correct.

My use of the word "approximately" is to leave a little wiggle room to see what's easiest to do correctly given time pressures.

Comment 8 Vikhyat Umrao 2019-09-13 22:36:53 UTC
[RFE] tools/rados: allow list objects in a specific pg in a pool
https://bugzilla.redhat.com/show_bug.cgi?id=1752163
https://tracker.ceph.com/issues/41831

This is already present in Nautilus; the above bug will backport it to RHCS 3.x.
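
With that in place, the raw pool scan can be split up by placement group, e.g. (a sketch; the PG id is illustrative and encodes the pool number):

rados ls --pgid 6.0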

Comment 10 J. Eric Ivancich 2019-09-17 20:57:50 UTC
I provided this update on another, related BZ (https://bugzilla.redhat.com/show_bug.cgi?id=1750965#c14), but it really belongs here....

I wanted to provide an update as to where I am on the tool....

In order to populate a test cluster, I create four buckets and apply different rgw features to each.

    * a bucket with both a successful and aborted multipart upload
    * a bucket that is resharded twice
    * a bucket with versioning turned on, where each object is updated twice after initially being put, and half are removed
    * a bucket with some original objects and also some objects copied from another bucket (with the bucket they came from removed)

I then use `rados ls` to get the raw pool objects. And then I run the tool. I sort the outputs of each and compare.

There are some mismatches, which I'm currently digging into.

Comment 14 J. Eric Ivancich 2019-09-17 21:11:16 UTC
Another update....

I believe I've now resolved the differences in buckets with multipart objects.

There are still differences in versioned buckets, but I think I know the issue and have a good idea as to how to resolve it. I may or may not be able to get that done tonight.

Eric

Comment 15 J. Eric Ivancich 2019-09-17 21:13:45 UTC
So that others can see how I'm doing preliminary testing of the tool and provide feedback, here's the script I'm currently using to generate the buckets and their contents.

#!/bin/bash

# trap "exit" INT TERM ERR
# trap "kill 0" EXIT

src_obj=/tmp/src.$$
count=5
size=2222 # in megabytes

dowait() {
    # set to true to slow down process and require enter to be pressed
    if true ;then
	read -p "$* Continue..."
    fi
}

upload() {
    # no checking for now...
    if false ;then
	if [ $# -ne 4 ] ;then
	    echo Usage: $0 obj bucket tag kill_time
	    exit 1
	fi
    fi

    obj=$1
    bucket=$2
    tag=$3
    kill_time=$4

    dest_obj="${tag}"

    if [ "$kill_time" -ne 0 ] ;then
	echo starting upload of $tag
	s3cmd put -q $obj s3://${bucket}/${dest_obj} &
	sleep $kill_time
	s3cmd multipart s3://${bucket}
	echo stopping upload of $tag
	kill -TERM $(jobs -pr)
    else
	echo starting upload of $tag
	s3cmd put -q $obj s3://${bucket}/$dest_obj
	echo finished upload of $tag
    fi
}

../src/stop.sh
RGW=1 OSD=1 MON=1 MDS=0 MGR=0 FS=0 ../src/vstart.sh -n -l -b --short
sleep 5

# dowait attach debugger

############################################################

s3cmd mb s3://multipart-bkt

dd if=/dev/urandom of=$src_obj bs=1M count=${size}

# upload $src_obj $bucket 101 0

upload $src_obj multipart-bkt multipart-obj-1 0
upload $src_obj multipart-bkt multipart-obj-2 14

wait

############################################################

echo "hello world" >$src_obj

s3cmd mb s3://resharded-bkt

for f in $(seq 8) ; do
    dest_obj="reshard-obj-${f}"
    s3cmd put -q $src_obj s3://resharded-bkt/$dest_obj
done

bin/radosgw-admin bucket reshard --num-shards 3 --bucket=resharded-bkt
bin/radosgw-admin bucket reshard --num-shards 5 --bucket=resharded-bkt

############################################################

s3cmd mb s3://versioned-bkt

bucket-enable-versioning.sh versioned-bkt

for f in $(seq 3) ;do
    echo "this is data $f" >$src_obj
    for g in $(seq 10) ;do
	dest_obj="versioned-obj-${g}"
	s3cmd put -q $src_obj s3://versioned-bkt/$dest_obj
    done
done

for g in $(seq 1 2 10) ;do
    dest_obj="versioned-obj-${g}"
    s3cmd rm s3://versioned-bkt/$dest_obj
done

############################################################

s3cmd mb s3://orig-bkt

echo "this is small" >$src_obj
for f in $(seq 4) ;do
    dest_obj="orig-obj-$f"
    s3cmd put -q $src_obj s3://orig-bkt/$dest_obj
done

s3cmd mb s3://copy-bkt

s3cmd cp s3://orig-bkt/orig-obj-1 s3://copy-bkt/copied-obj-1
s3cmd cp s3://orig-bkt/orig-obj-3 s3://copy-bkt/copied-obj-3

for f in $(seq 5 6) ;do
    dest_obj="orig-obj-$f"
    s3cmd put -q $src_obj s3://copy-bkt/$dest_obj
done

s3cmd rb --recursive s3://orig-bkt

############################################################

buckets="multipart-bkt resharded-bkt versioned-bkt copy-bkt"

for b in $buckets ; do
    echo " "
    s3cmd ls s3://$b
    # echo " "
    # bin/radosgw-admin bucket radoslist --bucket=$b --debug-rgw=20 --debug-ms=1
done

########################################

echo done

Comment 18 J. Eric Ivancich 2019-09-18 04:11:24 UTC
(In reply to Vikhyat Umrao from comment #16)
> Thanks, Eric; to me it looks good. If I understand correctly, this tool will
> be complete when the `rados ls` output and this tool's output match exactly,
> with no difference; the reason is that in a perfect test scenario (no leaked
> or orphan objects) the lists from both outputs should be exactly the same.

That's mostly true, Vikhyat. My testing script, however, forces a failed multipart upload (along with a successful one). So we should expect to see objects from the failed multipart upload in the `rados ls` output that we do not see with the tool, thus allowing those objects to be removed. And that part seems to be working currently.

I just need to resolve the discrepancy in versioned buckets.

Comment 21 J. Eric Ivancich 2019-09-18 23:25:56 UTC
Here's an update.

So I got a version of the tool "radosgw-admin bucket radoslist --bucket=foobar" that works against all my test cases. I updated my old testing script (https://bugzilla.redhat.com/show_bug.cgi?id=1752130#c15) that creates buckets of various forms, so that all objects written were over 4MB, to force tail objects. I will add a comment below with the latest version.

This was all done in my 3.2 dev environment, where it passes my tests. The customer is on 2.5 currently, but I haven't maintained a VM where I can build that, given it's EOL'ed.

I backported it to 2.5, which was trivial, and added the commit to the branch Thomas made for the build. He has initiated the build, and he and I are at the office waiting to make sure it hits the stage where we can be assured there are no compiler errors.

Once we have a build, I think the plan is to have Vikhyat do some testing. And depending on what Vikhyat finds, we'll go from there.

Eric

Comment 22 J. Eric Ivancich 2019-09-18 23:27:45 UTC
Below is the latest version of my script to create a number of buckets that exercise various features of rgw, so I can then compare the output of `rados ls` and `radosgw-admin bucket radoslist --bucket=<bucket-name>`.


#!/bin/bash

# trap "exit" INT TERM ERR
# trap "kill 0" EXIT

count=5
huge_size=2222 # in megabytes
big_size=6 # in megabytes

huge_obj=/tmp/huge_obj.$$
big_obj=/tmp/big_obj.$$


dowait() {
    # set to true to slow down process and require enters to be pressed
    if true ;then
	read -p "$* Continue..."
    fi
}

upload() {
    # no checking for now...
    if false ;then
	if [ $# -ne 4 ] ;then
	    echo Usage: $0 obj bucket tag kill_time
	    exit 1
	fi
    fi

    obj=$1
    bucket=$2
    tag=$3
    kill_time=$4

    dest_obj="${tag}"

    if [ "$kill_time" -ne 0 ] ;then
	echo starting upload of $tag
	s3cmd put -q $obj s3://${bucket}/${dest_obj} &
	sleep $kill_time
	s3cmd multipart s3://${bucket}
	echo stopping upload of $tag
	kill -TERM $(jobs -pr)
    else
	echo starting upload of $tag
	s3cmd put -q $obj s3://${bucket}/$dest_obj
	echo finished upload of $tag
    fi
}

../src/stop.sh
RGW=1 OSD=1 MON=1 MDS=0 MGR=0 FS=0 ../src/vstart.sh -n -l -b --short
sleep 5

dd if=/dev/urandom of=$huge_obj bs=1M count=${huge_size}
dd if=/dev/urandom of=$big_obj bs=1M count=${big_size}



# dowait attach debugger

############################################################

s3cmd mb s3://multipart-bkt

upload $huge_obj multipart-bkt multipart-obj-1 0
upload $huge_obj multipart-bkt multipart-obj-2 14

wait

############################################################

s3cmd mb s3://resharded-bkt

for f in $(seq 8) ; do
    dest_obj="reshard-obj-${f}"
    s3cmd put -q $big_obj s3://resharded-bkt/$dest_obj
done

bin/radosgw-admin bucket reshard --num-shards 3 --bucket=resharded-bkt
bin/radosgw-admin bucket reshard --num-shards 5 --bucket=resharded-bkt

############################################################

s3cmd mb s3://versioned-bkt

bucket-enable-versioning.sh versioned-bkt

for f in $(seq 3) ;do
    for g in $(seq 10) ;do
	dest_obj="versioned-obj-${g}"
	s3cmd put -q $big_obj s3://versioned-bkt/$dest_obj
    done
done

for g in $(seq 1 2 10) ;do
    dest_obj="versioned-obj-${g}"
    s3cmd rm s3://versioned-bkt/$dest_obj
done

############################################################

s3cmd mb s3://orig-bkt

for f in $(seq 4) ;do
    dest_obj="orig-obj-$f"
    s3cmd put -q $big_obj s3://orig-bkt/$dest_obj
done

s3cmd mb s3://copy-bkt

s3cmd cp s3://orig-bkt/orig-obj-1 s3://copy-bkt/copied-obj-1
s3cmd cp s3://orig-bkt/orig-obj-3 s3://copy-bkt/copied-obj-3

for f in $(seq 5 6) ;do
    dest_obj="orig-obj-$f"
    s3cmd put -q $big_obj s3://copy-bkt/$dest_obj
done

s3cmd rb --recursive s3://orig-bkt

############################################################

buckets="multipart-bkt resharded-bkt versioned-bkt copy-bkt"

for b in $buckets ; do
    echo " "
    s3cmd ls s3://$b
    # echo " "
    # bin/radosgw-admin bucket radoslist --bucket=$b --debug-rgw=20 --debug-ms=1
done

########################################

echo done

Comment 32 J. Eric Ivancich 2019-09-20 02:05:41 UTC
Here's an update....

I spent most of the day working on discrepancies in the 2.5 version of this. [As background, I've not maintained build environments for 2.5, so it was faster to do my initial work on 3.2. But in the backport to 2.5, discrepancies emerged. Key data structures changed in name, form, and use between those two versions, causing a number of issues.]

I worked through all but one of them. The remaining discrepancy should not have an impact on the customer's specific use-case. Let me discuss it in detail below.

When a multipart upload fails, certain objects remain, and we'd like to clean them up, such as with this tool. Because it's a multipart upload, segments are uploaded individually in order to make the whole. The manifest is not created (at least in 2.5) until the upload is complete. And the algorithm this tool is based on uses the manifest to generate the full set of rados objects.

So in the case of incomplete multipart uploads, the tool will a) fail to produce the list of objects that did get uploaded (this is good, because we want those to be deleted), and b) produce some incorrect object names (this is neutral, as it doesn't help or hurt the process when we subtract the objects produced here from those produced by `rados ls`). I could have removed b); HOWEVER, there is a comment in the code that sufficiently old objects (which I interpret to mean objects created on sufficiently old versions of rgw) do not have a manifest either, so I left the code there.

So the question remains how long ago it was that versions of rgw did not produce manifests, and whether the customer's cluster ever used such a version of rgw (and upgraded from there). I will look at commits, but I will also ask Yehuda, in case he knows off-hand.

But given that this version, after sufficient testing on Friday, *might* be usable by the customer, I've added a commit with fixes to the branch and spoken with Thomas. He will initiate the builds sometime tonight, so a version will be available for Vikhyat to test tomorrow morning.

Comment 33 J. Eric Ivancich 2019-09-20 02:34:22 UTC
Yehuda,

There is a comment in the orphan-finding code that a "very very old object" may not have a manifest. Do you know off-hand which versions of ceph/RGW this would apply to? I would like to know that this condition would not apply to the customer's cluster.

Thank you,

Eric

Comment 34 J. Eric Ivancich 2019-09-20 02:50:03 UTC
I looked through older commits, and it seems that RGWObjManifest came into being in the spring of 2012 and gained some important functionality in 2014. I wouldn't mind hearing Yehuda's take, nonetheless.

Comment 36 Casey Bodley 2019-09-20 15:15:24 UTC
(In reply to J. Eric Ivancich from comment #32)
> When a multipart upload fails, certain objects remain; and we'd like clean
> them up, such as with this tool. Because it's a multipart upload, segments
> are uploaded individually in order to make the whole. The manifest is not
> created (at least in 2.5) until the upload is complete. And the algorithm
> this tool is based on uses the manifest in order to generate the full set of
> rados objects.

These incomplete parts do get tracked in the bucket index under the "multipart" namespace. These entries are returned by cls_bucket_list_ordered() but they're filtered out (based on params.enforce_ns and params.ns) in RGWRados::Bucket::List::list_objects_ordered(). Your tool could either list this "multipart" namespace separately (similar to list_multipart_parts() in rgw_multi.cc), or it could set params.enforce_ns=false and manually filter out entries where namespace is not "" or "multipart".

Comment 38 J. Eric Ivancich 2019-09-20 16:04:44 UTC
(In reply to Casey Bodley from comment #36)
> (In reply to J. Eric Ivancich from comment #32)
> > When a multipart upload fails, certain objects remain; and we'd like clean
> > them up, such as with this tool. Because it's a multipart upload, segments
> > are uploaded individually in order to make the whole. The manifest is not
> > created (at least in 2.5) until the upload is complete. And the algorithm
> > this tool is based on uses the manifest in order to generate the full set of
> > rados objects.
> 
> These incomplete parts do get tracked in the bucket index under the
> "multipart" namespace. These entries are returned by
> cls_bucket_list_ordered() but they're filtered out (based on
> params.enforce_ns and params.ns) in
> RGWRados::Bucket::List::list_objects_ordered(). Your tool could either list
> this "multipart" namespace separately (similar to list_multipart_parts() in
> rgw_multi.cc), or it could set params.enforce_ns=false and manually filter
> out entries where namespace is not "" or "multipart".

Thanks, Casey! That's interesting. But I'm thinking the tool should *not* list these objects left over from an incomplete multipart upload, thereby making them eligible for deletion by the manual process discussed above (subtracting output of tool from `rados ls` after sorting both). Or perhaps the ultimate version of the tool would have a command-line option as to whether these get listed or not, with the default being not.

Comment 44 J. Eric Ivancich 2019-09-20 21:36:09 UTC
Update

The tool fails in a normal case used by the customer -- swift dlo uploads. Vikhyat just ran the test on a new bucket, and the tool produces no output for that bucket. Previous tests were only done after issuing an expiration directive and didn't apply to the normal case....

****
**** So the tool is UNSAFE to use in combination with `rados ls`, sorts, diffs, and `rados rm`....
****

I'll see what I can do tonight and over the weekend....

Vikhyat is letting the customer know that the tool is currently UNSAFE for their purposes.

Comment 59 J. Eric Ivancich 2019-09-23 21:15:56 UTC
Update

I've added some commits to the branch containing the tool; they pass my prior tests and my newer tests focused on swift -- regular objects along with dlo- and slo-style large objects. Thomas is initiating builds.

The `swift` command-line tool can be used to upload large objects via dlo or slo. When it does so, it creates a separate container to hold the segments; by default that separate container has the name of the container the large object is being `swift upload`ed to, with "_segments" appended, although a different segments container can be specified on the command line using the `--segment-container` option.
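
For example (a sketch; container and object names are illustrative), a large-object upload that directs its segments to an explicitly named container:

swift upload user-ctr huge.bin --object-name big-obj-1 \
    -S 629145600 --segment-container user-ctr-segments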

I suspect the ideal way to use swift is to allow all the segments of a large object to go to a separate container. That way the user's container appears to contain just the full objects. I don't know whether the customer has strictly held to this behavior or not.

But if you run this tool on a user container, it will follow the swift-specific manifests of dlo and slo large objects to the segments in whatever container they appear in, and then follow the rgw manifests that those may contain, to produce a list of the rados objects that back that container. In other words, applying the tool to one container can produce object names from other containers, such as associated "_segments" containers. This is PROPER behavior.

Now what if the user (e.g., the customer) applies the tool to one of the containers created just for the segments? In that case it will re-visit the same objects and their rgw manifests, listing them *again*! The customer may be forced to do this if their internal users did not maintain a strict division between containers with user objects and containers with their associated segments.

If there is a chance that the same rados object names may be produced more than once in the entire process, the user may want to consider using the `sort` command-line tool with the `--unique` option to discard duplicates. Otherwise, they'll have to think carefully through what effect having duplicate entries would have on their process.

Additionally, it should be noted that the linux `sort` command-line tool can behave differently depending on how the "LANG" environment variable is set. When LANG is set to "en_US.UTF-8" the sorted results will be in one order, and when it is set to something else, such as "C", they will be in another. I've found that setting `export LANG=C` produces results that make the most sense from a programming viewpoint.

HOWEVER, IN THE PROCESS OF DETERMINING ORPHAN RADOS OBJECTS THAT ARE SAFE TO REMOVE, IT IS ***VITAL*** THAT THE COMBINED, SORTED OUTPUT FROM THIS TOOL AND THE SORTED OUTPUT FROM `rados ls` BE SORTED USING THE SAME SETTING OF "LANG". If the LANG setting is different, then `diff` cannot operate correctly and the results will be meaningless and DANGEROUS to feed into a `rados rm` process.
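
For example (a sketch; file names are placeholders), forcing the same collation for both sorts:

# sort both lists under the same collation setting
LANG=C sort -u radoslist-combined.txt > tool-sorted.txt
LANG=C sort rados-ls-output.txt > pool-sorted.txt

# entries present only in the pool listing are orphan candidates
diff pool-sorted.txt tool-sorted.txt > differences.txt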

Testing this tool and all manual and scripted procedures to remove "orphan" objects is VITAL for the safety and ultimate health of the production cluster. There are many subtleties that, if not understood and accounted for, could result in DISASTROUS RESULTS.

Comment 60 J. Eric Ivancich 2019-09-23 21:18:21 UTC
Please note that the caveats described above in comment #32 remain.

See: https://bugzilla.redhat.com/show_bug.cgi?id=1752130#c32

Comment 61 J. Eric Ivancich 2019-09-23 21:20:09 UTC
# Here is the latest version of the script used in swift testing

#!/bin/bash

# trap "exit" INT TERM ERR
# trap "kill 0" EXIT

export ST_AUTH=http://localhost:8000/auth/v1.0
export ST_USER=test:tester
export ST_KEY=testing

huge_size=2222 # in megabytes
big_size=6 # in megabytes

# 600MB
segment_size=629145600

huge_obj=~/huge_obj.temp
big_obj=~/big_obj.$$

dd if=/dev/urandom of=$big_obj bs=1M count=${big_size}

make_huge() {
    # if huge object does not exist or is not at least 2'ish gig
    if [ ! -f "$huge_obj" ] ;then
	dd if=/dev/urandom of=$huge_obj bs=1M count=${huge_size}
    elif [ $(stat --printf="%s" $huge_obj) -lt 2000000000 ] ;then
	dd if=/dev/urandom of=$huge_obj bs=1M count=${huge_size}
    fi
}

############################################################
# plain test

if true ;then
    for f in $(seq 4) ;do
	swift upload swift-plain-ctr $big_obj --object-name swift-obj-$f
    done
fi

############################################################
# dlo test

if true ;then
    make_huge

    # upload in 600MB segments
    swift upload swift-dlo-ctr $huge_obj --object-name dlo-obj-1 \
        -S $segment_size
fi

############################################################
# slo test

if true ;then
    make_huge

    # upload in 600MB segments
    swift upload swift-slo-ctr $huge_obj --object-name slo-obj-1 \
        -S $segment_size --use-slo
fi

############################################################

buckets="swift-plain-ctr swift-dlo-ctr swift-slo-ctr swift-dlo-ctr_segments swift-slo-ctr_segments"

for b in $buckets ; do
    echo " "
    swift list $b
    # echo " "
    # bin/radosgw-admin bucket radoslist --bucket=$b --debug-rgw=20 --debug-ms=1
done

echo "buckets: $buckets"

############################################################

rm -f $big_obj

echo done

Comment 62 J. Eric Ivancich 2019-09-23 21:21:11 UTC
# Here is the latest version of the script used in s3 testing

#!/bin/bash

# trap "exit" INT TERM ERR
# trap "kill 0" EXIT

huge_size=2222 # in megabytes
big_size=6 # in megabytes

huge_obj=~/huge_obj.temp
big_obj=~/big_obj.$$


dowait() {
    # set to true to slow down process and require enters to be pressed
    if true ;then
	read -p "$* Continue..."
    fi
}

upload() {
    # no checking for now...
    if false ;then
	if [ $# -ne 4 ] ;then
	    echo Usage: $0 obj bucket tag kill_time
	    exit 1
	fi
    fi

    obj=$1
    bucket=$2
    tag=$3
    kill_time=$4

    dest_obj="${tag}"

    if [ "$kill_time" -ne 0 ] ;then
	echo starting upload of $tag
	s3cmd put -q $obj s3://${bucket}/${dest_obj} &
	sleep $kill_time
	s3cmd multipart s3://${bucket}
	echo stopping upload of $tag
	kill -TERM $(jobs -pr)
    else
	echo starting upload of $tag
	s3cmd put -q $obj s3://${bucket}/$dest_obj
	echo finished upload of $tag
    fi
}

# src/stop.sh
# RGW=1 OSD=1 MON=1 MDS=0 MGR=0 FS=0 ../src/vstart.sh -n -l -b --short
# $HOME/ceph-work1/start.sh
# sleep 5

dd if=/dev/urandom of=$big_obj bs=1M count=${big_size}

make_huge() {
    # if huge object does not exist or is not at least 2'ish gig
    if [ ! -f "$huge_obj" ] ;then
	dd if=/dev/urandom of=$huge_obj bs=1M count=${huge_size}
    elif [ $(stat --printf="%s" $huge_obj) -lt 2000000000 ] ;then
	dd if=/dev/urandom of=$huge_obj bs=1M count=${huge_size}
    fi
}

# dowait attach debugger

############################################################
# multipart test

if true ;then
    make_huge

    s3cmd mb s3://multipart-bkt

    upload $huge_obj multipart-bkt multipart-obj-1 0
    upload $huge_obj multipart-bkt multipart-obj-2 14

    wait
fi

############################################################

if true ;then

    s3cmd mb s3://resharded-bkt

    for f in $(seq 8) ; do
	dest_obj="reshard-obj-${f}"
	s3cmd put -q $big_obj s3://resharded-bkt/$dest_obj
    done

    src/radosgw-admin bucket reshard --num-shards 3 --bucket=resharded-bkt -c out/ceph.conf
    src/radosgw-admin bucket reshard --num-shards 5 --bucket=resharded-bkt -c out/ceph.conf

fi

############################################################

if true ;then

    s3cmd mb s3://versioned-bkt

    bucket-enable-versioning.sh versioned-bkt

    for f in $(seq 3) ;do
	for g in $(seq 10) ;do
	    dest_obj="versioned-obj-${g}"
	    s3cmd put -q $big_obj s3://versioned-bkt/$dest_obj
	done
    done

    for g in $(seq 1 2 10) ;do
	dest_obj="versioned-obj-${g}"
	s3cmd rm s3://versioned-bkt/$dest_obj
    done

fi

############################################################

if true ;then

    s3cmd mb s3://orig-bkt

    for f in $(seq 4) ;do
	dest_obj="orig-obj-$f"
	s3cmd put -q $big_obj s3://orig-bkt/$dest_obj
    done

    s3cmd mb s3://copy-bkt

    s3cmd cp s3://orig-bkt/orig-obj-1 s3://copy-bkt/copied-obj-1
    s3cmd cp s3://orig-bkt/orig-obj-3 s3://copy-bkt/copied-obj-3

    for f in $(seq 5 6) ;do
	dest_obj="orig-obj-$f"
	s3cmd put -q $big_obj s3://copy-bkt/$dest_obj
    done

    s3cmd rb --recursive s3://orig-bkt

fi

############################################################

buckets="multipart-bkt resharded-bkt versioned-bkt copy-bkt"

for b in $buckets ; do
    echo " "
    s3cmd ls s3://$b
    # echo " "
    # bin/radosgw-admin bucket radoslist --bucket=$b --debug-rgw=20 --debug-ms=1
done

############################################################

src/radosgw-admin -c out/ceph.conf gc process --include-all

rm -f $big_obj

echo done

Comment 73 J. Eric Ivancich 2019-09-24 20:49:46 UTC
Update w.r.t. tenanted containers

If a bucket belongs to a tenant, for this tool to respond correctly, `radosgw-admin bucket radoslist` must have both `--bucket=` and `--tenant=` command-line arguments.
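
For example (hypothetical tenant and bucket names):

radosgw-admin bucket radoslist --tenant=tnt1 --bucket=user-ctr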

If a user instead tries to skip the tenant argument and prefix the container name with the tenant identifier, the tool will not be able to follow the manifests appropriately and may generate runtime errors.

Note: since `radosgw-admin` uses the "--bucket" terminology, we use "bucket" and "container" interchangeably given that s3 and swift have their own preferred terms.

Comment 80 J. Eric Ivancich 2019-09-25 04:53:25 UTC
I think it's vitally important that the customer fully understands the issues described in comment #32, comment #59, and comment #73 before they test this tool, and certainly before they use this tool to remove any rados objects. It sounds like the initial cluster the customer intends to use the tool on does not have multipart uploads and does not use tenants. HOWEVER, they may choose to use this tool on other clusters. And understanding these subtleties is really *not* optional, given the tool may be used in conjunction with `rados rm`, which could blow away a cluster's necessary data.

What is the plan to ensure this?

Comment 137 J. Eric Ivancich 2019-12-19 04:51:03 UTC
Created attachment 1646286 [details]
A utility to efficiently compute a line-by-line diff on two sorted text files.

Since the customer noted that the standard linux `diff` utility has issues running on very large input files, this is a simplified diff that uses minimal memory by requiring that the two input files be in sorted order.

build.sh builds the executable
test.sh tests the executable
clean.sh cleans the directory to prepare it for being tar'd up
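
For comparison (an alternative approach, not the attached utility itself), `comm` from GNU coreutils also streams two sorted files line by line:

# lines unique to the first (pool) listing, i.e. orphan candidates
comm -23 sorted-pool-listing.txt sorted-radoslist.txt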

Comment 169 errata-xmlrpc 2020-04-06 08:27:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1320

Comment 170 mamccoma 2020-05-07 14:49:41 UTC
*** Bug 1821865 has been marked as a duplicate of this bug. ***

Comment 171 Vikhyat Umrao 2020-09-30 17:12:12 UTC
*** Bug 1750965 has been marked as a duplicate of this bug. ***

