Description of problem:

[RFE] rgw: provide a subcommand under radosgw-admin to list rados objects for a given bucket

https://tracker.ceph.com/issues/41828
Thanks, Eric, for this feature; it will help us a lot.

To find orphans with the help of this command, we will create two lists: one produced by this command, and another produced by the `rados -p <rgw data pool> ls` command. If my understanding is correct, `rados ls` has to be run *first*, completed, and recorded to a text file; only then should this *new* per-bucket subcommand be run to find the legitimate object names at the rados level for each bucket. Then all the legitimate bucket object lists can be combined and diffed against the recorded `rados ls` output, and the final result should be the orphan objects.

The reason I say `rados ls` has to be run first, recorded, and saved before this new command is started is to avoid having any ongoing new writes to the data pool appear in the list: we always want the bucket list output to be a superset of the legitimate objects and the `rados ls` output to be a subset of the total objects.

Is my understanding correct?
In summary, the procedure looks something like the following, if my understanding in comment#3 is correct.

- The approach is pretty simple: execute the `rados ls` command and store the listed objects in a text file.
- The new radosgw-admin subcommand will give the legitimate object names for each bucket as they are stored in rados. This command is per-bucket, so it can be run for different buckets in parallel, keeping an eye on the load on the cluster.
- Once you have the object names from all the buckets, combine them into a single list.
- Take the difference of the `rados ls` list and the legitimate-objects list; the entries left over from the `rados ls` text file will be the orphan objects.
- Once you have the list of orphan objects, you can process them with the `rados rm` command.

Eric - please let me know your inputs.
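The procedure above can be sketched as a shell script. The cluster commands (`rados ls`, `radosgw-admin bucket radoslist`) are shown only as comments since they need a live cluster; the sort/diff mechanics are demonstrated on two small stand-in files, and all file and pool names here are hypothetical examples:

```shell
#!/bin/bash
set -e

# On a real cluster (not run here), the two lists would come from:
#   rados -p default.rgw.buckets.data ls > rados-ls.txt
#   for b in bucket1 bucket2; do
#       radosgw-admin bucket radoslist --bucket=$b
#   done > radoslist.txt

# Stand-in data: three rados objects in the pool, two accounted for by buckets.
printf 'obj-a\nobj-orphan\nobj-b\n' > rados-ls.txt
printf 'obj-b\nobj-a\n'             > radoslist.txt

# Sort both lists with the SAME locale so the comparison is byte-wise consistent.
export LC_ALL=C
sort -u rados-ls.txt  > rados-ls.sorted
sort -u radoslist.txt > radoslist.sorted

# Lines present only in the rados listing are orphan candidates.
comm -23 rados-ls.sorted radoslist.sorted > orphans.txt
cat orphans.txt
# prints: obj-orphan

# Each orphan could then be removed on a real cluster, e.g.:
#   while read o; do rados -p default.rgw.buckets.data rm "$o"; done < orphans.txt
```

Note the removal step is intentionally left as a comment; nothing should be fed to `rados rm` until the lists have been validated.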
(In reply to Vikhyat Umrao from comment #2)
> The reason I am saying `rados ls` has to be run first and recorded and saved
> and then only this new command should be started to avoid any ongoing new
> writes to be recorded in the data pool should not be in the list as we want
> always bucket list output to be superset of legitimate objects and `rados
> ls` output to be subset of total objects.
>
> Is my understanding correct?

I think this is correct. Obviously, as long as the bucket is not being manipulated, it would not matter. But if there is a chance that the bucket is being manipulated, then we would not want to create a new rados object that would be unaccounted for by an earlier listing of expected rados objects.
(In reply to Vikhyat Umrao from comment #3)
> In summary, the procedure looks something like given below if my
> understanding in comment#3 is correct.
>
> - The approach is pretty simple, execute the `rados ls` command and store
> the list objects in a list text file.
> - The new radosgw-admin subcommand will give legitimate objects name for
> each bucket as they stored in the rados. This command would be per bucket so
> can be run for different buckets in parallel keeping an eye on load on the
> cluster.
> + Once you have object's name from all the bucket combine them to a single
> list.
>
> - Take a difference of `rados ls` list and legitimate objects list the
> difference in `rados ls` list text file will come out will be the orphan
> objects.
> - Once you have the list of orphan objects you can process them with `rados
> rm` command.
>
> Eric - please let me know your inputs.

That sounds approximately correct.
(In reply to J. Eric Ivancich from comment #6)
> (In reply to Vikhyat Umrao from comment #3)
> > In summary, the procedure looks something like given below if my
> > understanding in comment#3 is correct.
> >
> > - The approach is pretty simple, execute the `rados ls` command and store
> > the list objects in a list text file.
> > - The new radosgw-admin subcommand will give legitimate objects name for
> > each bucket as they stored in the rados. This command would be per bucket so
> > can be run for different buckets in parallel keeping an eye on load on the
> > cluster.
> > + Once you have object's name from all the bucket combine them to a single
> > list.
> >
> > - Take a difference of `rados ls` list and legitimate objects list the
> > difference in `rados ls` list text file will come out will be the orphan
> > objects.
> > - Once you have the list of orphan objects you can process them with `rados
> > rm` command.
> >
> > Eric - please let me know your inputs.
>
> That sounds approximately correct.

My use of the word "approximately" is to leave a little wiggle room to see what's easiest to do correctly given time pressures.
[RFE] tools/rados: allow listing objects in a specific pg in a pool
https://bugzilla.redhat.com/show_bug.cgi?id=1752163
https://tracker.ceph.com/issues/41831

This is already present in Nautilus; the above bug will track backporting it to RHCS 3.x.
I provided this update on another, related BZ (https://bugzilla.redhat.com/show_bug.cgi?id=1750965#c14), but it really belongs here....

I wanted to provide an update as to where I am on the tool....

In order to populate a test cluster, I create four buckets and apply different rgw features to each:

* a bucket with both a successful and an aborted multipart upload
* a bucket that is resharded twice
* a bucket with versioning turned on, where each object is updated twice after the initial put, and half are removed
* a bucket with some original objects and also some objects copied from another bucket (with the bucket they came from removed)

I then use `rados ls` to get the raw pool objects. And then I run the tool. I sort the outputs of each and compare. There are some mismatches, which I'm currently digging into.
Another update.... I believe I've now resolved the differences in buckets with multipart objects. There are still differences in versioned buckets, but I think I know the issue and have a good idea as to how to resolve it. I may or may not be able to get that done tonight. Eric
So that others can see how I'm doing preliminary testing of the tool and provide feedback, here's the script I'm currently using to generate the buckets and their contents.

#!/bin/bash

# trap "exit" INT TERM ERR
# trap "kill 0" EXIT

src_obj=/tmp/src.$$
count=5
size=2222 # in megabytes

dowait() {
    # set to true to slow down process and require enter to be pressed
    if true ;then
        read -p "$* Continue..."
    fi
}

upload() {
    # no checking for now...
    if false ;then
        if [ $# -ne 4 ] ;then
            echo Usage: $0 bucket size minutes prefix
            exit 1
        fi
    fi

    obj=$1
    bucket=$2
    tag=$3
    kill_time=$4

    dest_obj="${tag}"
    if [ "$kill_time" -ne 0 ] ;then
        echo starting upload of $tag
        s3cmd put -q $obj s3://${bucket}/${dest_obj} &
        sleep $kill_time
        s3cmd multipart s3://${bucket}
        echo stopping upload of $tag
        kill -TERM $(jobs -pr)
    else
        echo starting upload of $tag
        s3cmd put -q $obj s3://${bucket}/$dest_obj
        echo finished upload of $tag
    fi
}

../src/stop.sh
RGW=1 OSD=1 MON=1 MDS=0 MGR=0 FS=0 ../src/vstart.sh -n -l -b --short
sleep 5

# dowait attach debugger

############################################################
s3cmd mb s3://multipart-bkt
dd if=/dev/urandom of=$src_obj bs=1M count=${size}
# upload $src_obj $bucket 101 0
upload $src_obj multipart-bkt multipart-obj-1 0
upload $src_obj multipart-bkt multipart-obj-2 14
wait

############################################################
echo "hello world" >$src_obj
s3cmd mb s3://resharded-bkt
for f in $(seq 8) ; do
    dest_obj="reshard-obj-${f}"
    s3cmd put -q $src_obj s3://resharded-bkt/$dest_obj
done
bin/radosgw-admin bucket reshard --num-shards 3 --bucket=resharded-bkt
bin/radosgw-admin bucket reshard --num-shards 5 --bucket=resharded-bkt

############################################################
s3cmd mb s3://versioned-bkt
bucket-enable-versioning.sh versioned-bkt
for f in $(seq 3) ;do
    echo "this is data $f" >$src_obj
    for g in $(seq 10) ;do
        dest_obj="versioned-obj-${g}"
        s3cmd put -q $src_obj s3://versioned-bkt/$dest_obj
    done
done
for g in $(seq 1 2 10) ;do
    dest_obj="versioned-obj-${g}"
    s3cmd rm s3://versioned-bkt/$dest_obj
done

############################################################
s3cmd mb s3://orig-bkt
echo "this is small" >$src_obj
for f in $(seq 4) ;do
    dest_obj="orig-obj-$f"
    s3cmd put -q $src_obj s3://orig-bkt/$dest_obj
done

s3cmd mb s3://copy-bkt
s3cmd cp s3://orig-bkt/orig-obj-1 s3://copy-bkt/copied-obj-1
s3cmd cp s3://orig-bkt/orig-obj-3 s3://copy-bkt/copied-obj-3
for f in $(seq 5 6) ;do
    dest_obj="orig-obj-$f"
    s3cmd put -q $src_obj s3://copy-bkt/$dest_obj
done
s3cmd rb --recursive s3://orig-bkt

############################################################
buckets="multipart-bkt resharded-bkt versioned-bkt copy-bkt"
for b in $buckets ; do
    echo " "
    s3cmd ls s3://$b
    # echo " "
    # bin/radosgw-admin bucket radoslist --bucket=$b --debug-rgw=20 --debug-ms=1
done

########################################
echo done
(In reply to Vikhyat Umrao from comment #16)
> Thanks, Eric to me looks good. If I understand correct this tool will be
> completed when `rados ls` and this tool output will match exactly and there
> is no difference the reason for this is in perfect test scenario(when no
> leak and orphan objects) the list should be exact same from both outputs.

That's mostly true, Vikhyat. My testing script, however, forces a failed multipart upload (along with a successful one). So we should expect to see objects in `rados ls` from the failed multipart upload that we do not see with the tool, thus allowing those objects to be removed. And that part seems to be working currently.

I just need to resolve the discrepancy in versioned buckets.
Here's an update.

So I got a version of the tool, "radosgw-admin bucket radoslist --bucket=foobar", that works against all my test cases. I updated my old testing script (https://bugzilla.redhat.com/show_bug.cgi?id=1752130#c15) that creates buckets of various forms, so that all objects written were over 4MB, to force tail objects. I will add a comment below with the latest version.

This was all done in my 3.2 dev environment and is passing my tests. The customer is on 2.5 currently, but I haven't maintained a vm where I can build that, given it's e-o-l'ed. I backported it to 2.5, which was trivial, and added the commit to the branch Thomas made for the build. He has initiated the build, and he and I are at the office waiting to make sure it hits the stage where we can be assured there are not any compiler errors.

Once we have a build, I think the plan is to have Vikhyat do some testing. And depending on what Vikhyat finds, we'll go from there.

Eric
Below is the latest version of my script to create a number of buckets that exercise various features of rgw, so I can then compare the output of `rados ls` and `radosgw-admin bucket radoslist --bucket=<bucket-name>`.

#!/bin/bash

# trap "exit" INT TERM ERR
# trap "kill 0" EXIT

count=5
huge_size=2222 # in megabytes
big_size=6 # in megabytes

huge_obj=/tmp/huge_obj.$$
big_obj=/tmp/big_obj.$$

dowait() {
    # set to true to slow down process and require enters to be pressed
    if true ;then
        read -p "$* Continue..."
    fi
}

upload() {
    # no checking for now...
    if false ;then
        if [ $# -ne 4 ] ;then
            echo Usage: $0 bucket size minutes prefix
            exit 1
        fi
    fi

    obj=$1
    bucket=$2
    tag=$3
    kill_time=$4

    dest_obj="${tag}"
    if [ "$kill_time" -ne 0 ] ;then
        echo starting upload of $tag
        s3cmd put -q $obj s3://${bucket}/${dest_obj} &
        sleep $kill_time
        s3cmd multipart s3://${bucket}
        echo stopping upload of $tag
        kill -TERM $(jobs -pr)
    else
        echo starting upload of $tag
        s3cmd put -q $obj s3://${bucket}/$dest_obj
        echo finished upload of $tag
    fi
}

../src/stop.sh
RGW=1 OSD=1 MON=1 MDS=0 MGR=0 FS=0 ../src/vstart.sh -n -l -b --short
sleep 5

dd if=/dev/urandom of=$huge_obj bs=1M count=${huge_size}
dd if=/dev/urandom of=$big_obj bs=1M count=${big_size}

# dowait attach debugger

############################################################
s3cmd mb s3://multipart-bkt
upload $huge_obj multipart-bkt multipart-obj-1 0
upload $huge_obj multipart-bkt multipart-obj-2 14
wait

############################################################
s3cmd mb s3://resharded-bkt
for f in $(seq 8) ; do
    dest_obj="reshard-obj-${f}"
    s3cmd put -q $big_obj s3://resharded-bkt/$dest_obj
done
bin/radosgw-admin bucket reshard --num-shards 3 --bucket=resharded-bkt
bin/radosgw-admin bucket reshard --num-shards 5 --bucket=resharded-bkt

############################################################
s3cmd mb s3://versioned-bkt
bucket-enable-versioning.sh versioned-bkt
for f in $(seq 3) ;do
    for g in $(seq 10) ;do
        dest_obj="versioned-obj-${g}"
        s3cmd put -q $big_obj s3://versioned-bkt/$dest_obj
    done
done
for g in $(seq 1 2 10) ;do
    dest_obj="versioned-obj-${g}"
    s3cmd rm s3://versioned-bkt/$dest_obj
done

############################################################
s3cmd mb s3://orig-bkt
for f in $(seq 4) ;do
    dest_obj="orig-obj-$f"
    s3cmd put -q $big_obj s3://orig-bkt/$dest_obj
done

s3cmd mb s3://copy-bkt
s3cmd cp s3://orig-bkt/orig-obj-1 s3://copy-bkt/copied-obj-1
s3cmd cp s3://orig-bkt/orig-obj-3 s3://copy-bkt/copied-obj-3
for f in $(seq 5 6) ;do
    dest_obj="orig-obj-$f"
    s3cmd put -q $big_obj s3://copy-bkt/$dest_obj
done
s3cmd rb --recursive s3://orig-bkt

############################################################
buckets="multipart-bkt resharded-bkt versioned-bkt copy-bkt"
for b in $buckets ; do
    echo " "
    s3cmd ls s3://$b
    # echo " "
    # bin/radosgw-admin bucket radoslist --bucket=$b --debug-rgw=20 --debug-ms=1
done

########################################
echo done
Here's an update....

I spent most of the day working on discrepancies in the 2.5 version of this. [As background, I've not maintained build environments for 2.5, so it was faster to do my initial work on 3.2. But in the backport to 2.5, discrepancies emerged. Key data structures changed in name, form, and use between those two versions, causing a number of issues.]

I worked through all but one of them. The remaining discrepancy should not have an impact on the customer's specific use-case. Let me discuss it in detail below.

When a multipart upload fails, certain objects remain, and we'd like to clean them up, such as with this tool. Because it's a multipart upload, segments are uploaded individually in order to make the whole. The manifest is not created (at least in 2.5) until the upload is complete. And the algorithm this tool is based on uses the manifest in order to generate the full set of rados objects. So in the case of incomplete multipart uploads, the tool will a) fail to produce the list of objects that did get uploaded (this is good, because we want those to be deleted), and b) produce some incorrect object names (this is neutral, as it doesn't help or hurt the process when we subtract the objects produced here from those produced by `rados ls`).

I could have removed b); HOWEVER, there is a comment in the code that sufficiently old objects (which I interpret to mean objects created on sufficiently old versions of rgw) do not have a manifest either. So I left the code there. So the question remains how long ago it was that versions of rgw did not produce the manifests, and whether the customer's cluster ever used such a version of rgw (and upgraded from there). I will look at commits, but I will also ask Yehuda, in case he knows off-hand.

But given that this version, after sufficient testing on Friday, *might* be usable by the customer, I've added a commit with fixes to the branch and spoke with Thomas. He will initiate the builds sometime tonight, so a version will be available for Vikhyat for testing tomorrow morning.
Yehuda,

There is a comment in the orphan-finding code that a "very very old object" may not have a manifest. Do you know off-hand which versions of ceph/RGW this would apply to? I would like to know that this condition would not apply to the customer's cluster.

Thank you,
Eric
I looked through older commits, and it seems that RGWObjManifest came into being in the spring of 2012 and gained some important functionality in 2014. I wouldn't mind hearing Yehuda's take, nonetheless.
(In reply to J. Eric Ivancich from comment #32)
> When a multipart upload fails, certain objects remain; and we'd like clean
> them up, such as with this tool. Because it's a multipart upload, segments
> are uploaded individually in order to make the whole. The manifest is not
> created (at least in 2.5) until the upload is complete. And the algorithm
> this tool is based on uses the manifest in order to generate the full set of
> rados objects.

These incomplete parts do get tracked in the bucket index under the "multipart" namespace. These entries are returned by cls_bucket_list_ordered() but they're filtered out (based on params.enforce_ns and params.ns) in RGWRados::Bucket::List::list_objects_ordered(). Your tool could either list this "multipart" namespace separately (similar to list_multipart_parts() in rgw_multi.cc), or it could set params.enforce_ns=false and manually filter out entries where the namespace is not "" or "multipart".
(In reply to Casey Bodley from comment #36)
> These incomplete parts do get tracked in the bucket index under the
> "multipart" namespace. These entries are returned by
> cls_bucket_list_ordered() but they're filtered out (based on
> params.enforce_ns and params.ns) in
> RGWRados::Bucket::List::list_objects_ordered(). Your tool could either list
> this "multipart" namespace separately (similar to list_multipart_parts() in
> rgw_multi.cc), or it could set params.enforce_ns=false and manually filter
> out entries where namespace is not "" or "multipart".

Thanks, Casey! That's interesting.

But I'm thinking the tool should *not* list these objects left over from an incomplete multipart upload, thereby making them eligible for deletion by the manual process discussed above (subtracting the output of the tool from `rados ls` after sorting both). Or perhaps the ultimate version of the tool would have a command-line option controlling whether these get listed, with the default being not to list them.
Update

The tool fails in a normal case used by the customer -- swift dlo uploads. Vikhyat just did the test on a new bucket, and the tool produces no output for that bucket. Previous tests were only done after issuing an expiration directive and did not apply to the normal case....

****
So the tool is UNSAFE to use in combination with the `rados ls`, sorts, diffs, and `rados rm` procedure....
****

I'll see what I can do tonight and over the weekend.... Vikhyat is letting the customer know that the tool is currently UNSAFE for their purposes.
Update

I've added some commits to the branch that contains the tool; they pass my prior tests and my newer tests focused on swift -- regular objects along with dlo- and slo-style large objects. Thomas is initiating builds.

The `swift` command-line tool can be used to upload large objects via dlo or slo. When it does so, it creates a separate container to hold the segments; by default that separate container has the name of the container the large object is being `swift upload`ed to, with "_segments" appended, although a different segment container can be specified on the command line using the `--segment-container` option. I suspect the ideal way to use swift is to allow all the segments of a large object to go to a separate container. That way the user's container appears to contain just the full objects. I don't know whether the customer has strictly held to this behavior or not.

But if you run this tool on a user container, it will follow the swift-specific manifests of dlo and slo large objects to the segments in whatever container they appear in, and to the rgw manifests those may contain, to produce the list of rados objects that back that container. In other words, by applying the tool to one container, it can produce objects in other containers, such as associated "_segments" containers. This is PROPER behavior.

Now what if the user (e.g., the customer) applies the tool to one of the containers created just for the segments? Well, in that case it will re-visit the same objects and their rgw manifests, listing them *again*! The customer may be forced to do this if their internal users did not maintain a strict "division" between containers with user objects and containers with their associated segments. If there is a chance that the same rados objects may be produced more than once in the entire process, the user may want to consider using the `sort` command-line tool with the `--unique` option to discard duplicates. Otherwise, they'll have to think carefully through what effect duplicate entries would have on their process.

Additionally, it should be noted that the linux `sort` command-line tool can behave differently depending on how the "LANG" environment variable is set. When LANG is set to "en_US.UTF-8" the sorted results will be in one order, and when set to something else, such as "C", they will be in another. I've found that setting `export LANG=C` produces results that make the most sense from a programming viewpoint. HOWEVER, IN THE PROCESS OF DETERMINING ORPHAN RADOS OBJECTS THAT ARE SAFE TO REMOVE, IT IS ***VITAL*** THAT WHEN THE OUTPUT FROM THIS TOOL IS COMBINED AND SORTED, AND WHEN THE OUTPUT FROM `rados ls` IS SORTED, THEY BE SORTED USING THE SAME SETTING OF "LANG". If the LANG setting is different, then `diff` cannot operate correctly, and the results will be meaningless and DANGEROUS to feed into a `rados rm` process.

Testing this tool and all manual and scripted procedures to remove "orphan" objects is VITAL for the safety and ultimate health of the production cluster. There are many subtleties that, if not understood and accounted for, could result in DISASTROUS RESULTS.
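Since the locale point is subtle, here is a minimal, self-contained illustration (the sample words are arbitrary) of why every sort in the pipeline must use the same locale, and of using `sort -u` to discard duplicates:

```shell
#!/bin/bash
# In the C locale, sort compares raw bytes, so uppercase letters
# (0x41-0x5A) order before lowercase letters (0x61-0x7A).
printf 'apple\nBanana\n' | LC_ALL=C sort
# prints:
# Banana
# apple
#
# Under en_US.UTF-8, locale-aware collation would typically give the
# opposite order (apple, then Banana). Two lists sorted under different
# locales are therefore NOT sorted relative to each other, and
# `diff`/`comm` on them produce meaningless results.

# `sort -u` (a synonym for --unique) collapses duplicate lines, useful
# when the same rados object is listed from more than one container.
printf 'obj-1\nobj-1\nobj-2\n' | LC_ALL=C sort -u
# prints:
# obj-1
# obj-2
```

Setting `LC_ALL=C` on each command (or exporting it once, up front) is the simplest way to guarantee every sort in the procedure uses the same collation.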
Please note that the caveats described above in comment #32 remain. See: https://bugzilla.redhat.com/show_bug.cgi?id=1752130#c32
# Here is the latest version of the script used in swift testing

#!/bin/bash

# trap "exit" INT TERM ERR
# trap "kill 0" EXIT

export ST_AUTH=http://localhost:8000/auth/v1.0
export ST_USER=test:tester
export ST_KEY=testing

huge_size=2222 # in megabytes
big_size=6 # in megabytes

# 600MB
segment_size=629145600

huge_obj=~/huge_obj.temp
big_obj=~/big_obj.$$

dd if=/dev/urandom of=$big_obj bs=1M count=${big_size}

make_huge() {
    # if huge object does not exist or is not at least 2'ish gig
    if [ ! -f "$huge_obj" ] ;then
        dd if=/dev/urandom of=$huge_obj bs=1M count=${huge_size}
    elif [ $(stat --printf="%s" $huge_obj) -lt 2000000000 ] ;then
        dd if=/dev/urandom of=$huge_obj bs=1M count=${huge_size}
    fi
}

############################################################
# plain test
if true ;then
    for f in $(seq 4) ;do
        swift upload swift-plain-ctr $big_obj --object-name swift-obj-$f
    done
fi

############################################################
# dlo test
if true ;then
    make_huge
    # upload in segments
    swift upload swift-dlo-ctr $huge_obj --object-name dlo-obj-1 \
        -S $segment_size
fi

############################################################
# slo test
if true ;then
    make_huge
    # upload in segments
    swift upload swift-slo-ctr $huge_obj --object-name slo-obj-1 \
        -S $segment_size --use-slo
fi

############################################################
buckets="swift-plain-ctr swift-dlo-ctr swift-slo-ctr swift-dlo-ctr_segments swift-slo-ctr_segments"
for b in $buckets ; do
    echo " "
    swift list $b
    # echo " "
    # bin/radosgw-admin bucket radoslist --bucket=$b --debug-rgw=20 --debug-ms=1
done
echo "buckets: $buckets"

############################################################
rm -f $big_obj
echo done
# Here is the latest version of the script used in s3 testing

#!/bin/bash

# trap "exit" INT TERM ERR
# trap "kill 0" EXIT

huge_size=2222 # in megabytes
big_size=6 # in megabytes

huge_obj=~/huge_obj.temp
big_obj=~/big_obj.$$

dowait() {
    # set to true to slow down process and require enters to be pressed
    if true ;then
        read -p "$* Continue..."
    fi
}

upload() {
    # no checking for now...
    if false ;then
        if [ $# -ne 4 ] ;then
            echo Usage: $0 bucket size minutes prefix
            exit 1
        fi
    fi

    obj=$1
    bucket=$2
    tag=$3
    kill_time=$4

    dest_obj="${tag}"
    if [ "$kill_time" -ne 0 ] ;then
        echo starting upload of $tag
        s3cmd put -q $obj s3://${bucket}/${dest_obj} &
        sleep $kill_time
        s3cmd multipart s3://${bucket}
        echo stopping upload of $tag
        kill -TERM $(jobs -pr)
    else
        echo starting upload of $tag
        s3cmd put -q $obj s3://${bucket}/$dest_obj
        echo finished upload of $tag
    fi
}

# src/stop.sh
# RGW=1 OSD=1 MON=1 MDS=0 MGR=0 FS=0 ../src/vstart.sh -n -l -b --short
# $HOME/ceph-work1/start.sh
# sleep 5

dd if=/dev/urandom of=$big_obj bs=1M count=${big_size}

make_huge() {
    # if huge object does not exist or is not at least 2'ish gig
    if [ ! -f "$huge_obj" ] ;then
        dd if=/dev/urandom of=$huge_obj bs=1M count=${huge_size}
    elif [ $(stat --printf="%s" $huge_obj) -lt 2000000000 ] ;then
        dd if=/dev/urandom of=$huge_obj bs=1M count=${huge_size}
    fi
}

# dowait attach debugger

############################################################
# multipart test
if true ;then
    make_huge
    s3cmd mb s3://multipart-bkt
    upload $huge_obj multipart-bkt multipart-obj-1 0
    upload $huge_obj multipart-bkt multipart-obj-2 14
    wait
fi

############################################################
if true ;then
    s3cmd mb s3://resharded-bkt
    for f in $(seq 8) ; do
        dest_obj="reshard-obj-${f}"
        s3cmd put -q $big_obj s3://resharded-bkt/$dest_obj
    done
    src/radosgw-admin bucket reshard --num-shards 3 --bucket=resharded-bkt -c out/ceph.conf
    src/radosgw-admin bucket reshard --num-shards 5 --bucket=resharded-bkt -c out/ceph.conf
fi

############################################################
if true ;then
    s3cmd mb s3://versioned-bkt
    bucket-enable-versioning.sh versioned-bkt
    for f in $(seq 3) ;do
        for g in $(seq 10) ;do
            dest_obj="versioned-obj-${g}"
            s3cmd put -q $big_obj s3://versioned-bkt/$dest_obj
        done
    done
    for g in $(seq 1 2 10) ;do
        dest_obj="versioned-obj-${g}"
        s3cmd rm s3://versioned-bkt/$dest_obj
    done
fi

############################################################
if true ;then
    s3cmd mb s3://orig-bkt
    for f in $(seq 4) ;do
        dest_obj="orig-obj-$f"
        s3cmd put -q $big_obj s3://orig-bkt/$dest_obj
    done

    s3cmd mb s3://copy-bkt
    s3cmd cp s3://orig-bkt/orig-obj-1 s3://copy-bkt/copied-obj-1
    s3cmd cp s3://orig-bkt/orig-obj-3 s3://copy-bkt/copied-obj-3
    for f in $(seq 5 6) ;do
        dest_obj="orig-obj-$f"
        s3cmd put -q $big_obj s3://copy-bkt/$dest_obj
    done
    s3cmd rb --recursive s3://orig-bkt
fi

############################################################
buckets="multipart-bkt resharded-bkt versioned-bkt copy-bkt"
for b in $buckets ; do
    echo " "
    s3cmd ls s3://$b
    # echo " "
    # bin/radosgw-admin bucket radoslist --bucket=$b --debug-rgw=20 --debug-ms=1
done

############################################################
src/radosgw-admin -c out/ceph.conf gc process --include-all

rm -f $big_obj
echo done
Update w.r.t. tenanted containers

If a bucket belongs to a tenant, then for this tool to respond correctly, `radosgw-admin bucket radoslist` must be given both the `--bucket=` and `--tenant=` command-line arguments. If a user instead omits the tenant and prefixes the container name with the tenant identifier, the tool will not be able to follow the manifests appropriately and may generate runtime errors.

Note: since `radosgw-admin` uses the "--bucket" terminology, we use "bucket" and "container" interchangeably here, given that s3 and swift each have their own preferred term.
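To make the tenant handling concrete, here is a small hypothetical wrapper (the `radoslist_cmd` helper name and the example bucket/tenant names are mine, not part of radosgw-admin) that only builds and prints the correct invocation; note the tenant goes into `--tenant=` and is never prefixed onto the bucket name:

```shell
#!/bin/bash

# Build (not run) the radosgw-admin invocation for a possibly-tenanted bucket.
# Usage: radoslist_cmd <bucket> [tenant]   -- helper name is hypothetical.
radoslist_cmd() {
    local bucket=$1 tenant=$2
    if [ -n "$tenant" ] ;then
        echo "radosgw-admin bucket radoslist --bucket=$bucket --tenant=$tenant"
    else
        echo "radosgw-admin bucket radoslist --bucket=$bucket"
    fi
}

radoslist_cmd mycontainer
# prints: radosgw-admin bucket radoslist --bucket=mycontainer

radoslist_cmd mycontainer mytenant
# prints: radosgw-admin bucket radoslist --bucket=mycontainer --tenant=mytenant

# WRONG per the note above -- do not fold the tenant into the bucket name:
#   radoslist_cmd "mytenant/mycontainer"
```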
I think it's vitally important that the customer fully understands the issues described in comment #32, comment #59, and comment #73 before they test this tool, and certainly before they use this tool to remove any rados objects.

It sounds like the initial cluster the customer intends to use the tool on does not have multipart uploads and does not use tenants. HOWEVER, they may choose to use this tool on other clusters. And understanding these subtleties is really *not* optional, given the tool may be used in conjunction with `rados rm`, which could blow away a cluster's necessary data.

What is the plan to ensure this?
Created attachment 1646286 [details]
A utility to efficiently compute a line-by-line diff of two sorted text files.

Since the customer noted that the standard linux `diff` utility has issues running on very large input files, this is a simplified diff that can use minimal space, due to the requirement that the two input files are in sorted order.

build.sh builds the executable
test.sh tests the executable
clean.sh cleans the directory to prepare it for being tar'd up
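For the same "lines in the first sorted file that are absent from the second" comparison, the standard `comm` tool offers a streaming alternative: because both inputs are already sorted, it reads them line by line and its memory use stays constant regardless of file size. A minimal sketch, with hypothetical file names standing in for the real sorted listings:

```shell
#!/bin/bash
set -e

# Two pre-sorted inputs (sorted order is required, as with the attached utility).
printf 'a\nb\nc\n' > all.sorted       # e.g. sorted `rados ls` output
printf 'a\nc\n'    > expected.sorted  # e.g. sorted radoslist output

# -2 suppresses lines unique to the second file, -3 suppresses common lines,
# leaving only lines unique to the first file -- the orphan candidates.
# The locale must match whatever locale the inputs were sorted under.
LC_ALL=C comm -23 all.sorted expected.sorted
# prints: b
```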
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:1320
*** Bug 1821865 has been marked as a duplicate of this bug. ***
*** Bug 1750965 has been marked as a duplicate of this bug. ***