Bug 1752130

| Summary: | [RFE] rgw: provide a subcommand under radosgw-admin to list rados objects for a given bucket | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Vikhyat Umrao <vumrao> |
| Component: | RGW | Assignee: | J. Eric Ivancich <ivancich> |
| Status: | CLOSED ERRATA | QA Contact: | Uday kurundwade <ukurundw> |
| Severity: | high | Docs Contact: | Ranjini M N <rmandyam> |
| Priority: | medium | Keywords: | FutureFeature |
| Version: | 2.5 | CC: | cbodley, ceph-eng-bugs, ceph-qe-bugs, dwood, ivancich, jbautist, kbader, mamccoma, mbenjamin, mhackett, rmandyam, sweil, tserlin, ukurundw, yehuda |
| Target Milestone: | z4 | Flags: | rmandyam: needinfo- |
| Target Release: | 3.3 | Hardware / OS: | Unspecified / Unspecified |
| Fixed In Version: | RHEL: ceph-12.2.12-96.el7cp; Ubuntu: ceph_12.2.12-91redhat1 | Doc Type: | Enhancement |
| Last Closed: | 2020-04-06 08:27:04 UTC | Type: | Bug |
| Clones: | 1770955, 1812375, 1815211 (view as bug list) | Bug Depends On: | 1805376 |
| Bug Blocks: | 1726135, 1770955, 1812375, 1815211, 1821884 | | |

Doc Text:

.New commands to view the RADOS objects and orphans
This release adds two new commands to view how the Object Gateway maps buckets to RADOS objects and to produce a potential list of orphans for further processing. The `radosgw-admin bucket radoslist --bucket=<bucket_name>` command lists all RADOS objects that back the given bucket. The `rgw-orphan-list` command lists all orphans in a specified pool. These commands keep intermediate results on the local file system.
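For illustration only, a minimal usage sketch of the two commands described in the Doc Text; the bucket name and data-pool name below are hypothetical and depend on the zone configuration:

# list the RADOS objects that back a single bucket
radosgw-admin bucket radoslist --bucket=mybucket

# produce a list of potential orphans in the data pool
rgw-orphan-list default.rgw.buckets.data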
Description by Vikhyat Umrao, 2019-09-13 19:29:22 UTC
Please specify the severity of this bug. Severity is defined here: https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Thanks, Eric, for this feature; it will help us a lot. To find the orphans with the help of this command, we will create two lists: one produced with this command, and another produced with the `rados -p <rgw data pool> ls` command. If my understanding is correct, `rados ls` has to be run *first*, allowed to complete, and recorded to a text file; only then should this *new* per-bucket subcommand be run to find the legitimate object names, at the rados level, for each bucket. All of the per-bucket lists of legitimate objects can then be combined and diffed against the recorded `rados ls` output, and the final result should be the orphan objects.

The reason I am saying `rados ls` has to be run first, recorded, and saved before this new command is started is to keep any ongoing new writes to the data pool out of the list: we always want the bucket-list output to be a superset of the legitimate objects and the `rados ls` output to be a subset of the total objects.

Is my understanding correct?

In summary, the procedure looks something like the following, if my understanding in comment #3 is correct.

- The approach is pretty simple: execute the `rados ls` command and store the listed objects in a text file.
- The new radosgw-admin subcommand will give the legitimate object names for each bucket, as they are stored in rados. This command is per bucket, so it can be run for different buckets in parallel, keeping an eye on the load on the cluster.
  + Once you have the object names from all the buckets, combine them into a single list.
- Take the difference between the `rados ls` list and the legitimate-objects list; whatever is left over from the `rados ls` text file will be the orphan objects.
- Once you have the list of orphan objects, you can process them with the `rados rm` command.

Eric - please let me know your inputs.

(In reply to Vikhyat Umrao from comment #2)
> The reason I am saying `rados ls` has to be run first and recorded and saved
> and then only this new command should be started to avoid any ongoing new
> writes to be recorded in the data pool should not be in the list as we want
> always bucket list output to be superset of legitimate objects and `rados
> ls` output to be subset of total objects.
>
> Is my understanding correct?

I think this is correct. Obviously, as long as the bucket is not being manipulated, it would not matter. But if there is a chance that the bucket is being manipulated, then we would not want to create a new rados object that would be unaccounted for by an earlier listing of expected rados objects.

(In reply to Vikhyat Umrao from comment #3)
> In summary, the procedure looks something like given below if my
> understanding in comment#3 is correct.
>
> - The approach is pretty simple, execute the `rados ls` command and store
> the list objects in a list text file.
> - The new radosgw-admin subcommand will give legitimate objects name for
> each bucket as they stored in the rados. This command would be per bucket so
> can be run for different buckets in parallel keeping an eye on load on the
> cluster.
> + Once you have object's name from all the bucket combine them to a single
> list.
>
> - Take a difference of `rados ls` list and legitimate objects list the
> difference in `rados ls` list text file will come out will be the orphan
> objects.
> - Once you have the list of orphan objects you can process them with `rados
> rm` command.
>
> Eric - please let me know your inputs.

That sounds approximately correct.

(In reply to J. Eric Ivancich from comment #6)
> (In reply to Vikhyat Umrao from comment #3)
> > In summary, the procedure looks something like given below if my
> > understanding in comment#3 is correct.
> >
> > - The approach is pretty simple, execute the `rados ls` command and store
> > the list objects in a list text file.
> > - The new radosgw-admin subcommand will give legitimate objects name for
> > each bucket as they stored in the rados. This command would be per bucket so
> > can be run for different buckets in parallel keeping an eye on load on the
> > cluster.
> > + Once you have object's name from all the bucket combine them to a single
> > list.
> >
> > - Take a difference of `rados ls` list and legitimate objects list the
> > difference in `rados ls` list text file will come out will be the orphan
> > objects.
> > - Once you have the list of orphan objects you can process them with `rados
> > rm` command.
> >
> > Eric - please let me know your inputs.
>
> That sounds approximately correct.

My use of the word "approximately" is to leave a little wiggle room to see what's easiest to do correctly given time pressures.
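As an editorial sketch of the manual procedure discussed above (not the exact commands used by the team): the data-pool name, the file names, and the use of `jq` are assumptions here, and the final removal step should only ever be run after careful verification.

#!/bin/bash
# Keep sort order consistent everywhere (see the LANG discussion later in this bug).
export LANG=C

# 1. Record the raw data-pool contents FIRST, so ongoing new writes cannot show up as "extras".
rados -p default.rgw.buckets.data ls | sort > rados-ls.txt

# 2. List the legitimate RADOS objects for every bucket and combine them into one sorted, de-duplicated list.
for b in $(radosgw-admin bucket list | jq -r '.[]') ; do
    radosgw-admin bucket radoslist --bucket=$b
done | sort -u > radoslist.txt

# 3. Objects present in the pool but claimed by no bucket are candidate orphans.
comm -23 rados-ls.txt radoslist.txt > candidate-orphans.txt

# 4. Only after careful review of candidate-orphans.txt, e.g.:
# while read -r o ; do rados -p default.rgw.buckets.data rm "$o" ; done < candidate-orphans.txt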
[RFE] tools/rados: allow list objects in a specific pg in a pool
https://bugzilla.redhat.com/show_bug.cgi?id=1752163
https://tracker.ceph.com/issues/41831

This is already present in Nautilus; the above bug will backport it to RHCS 3.x.

I provided this update on another, related BZ (https://bugzilla.redhat.com/show_bug.cgi?id=1750965#c14), but it really belongs here. I wanted to provide an update as to where I am on the tool. In order to populate a test cluster, I create four buckets and apply different rgw features to each:

* a bucket with both a successful and an aborted multipart upload
* a bucket that is resharded twice
* a bucket with versioning turned on, in which each object is updated twice after it is initially put, and half are removed
* a bucket with some original objects and also some objects copied from another bucket (with the bucket they came from removed)

I then use `rados ls` to get the raw pool objects, and then I run the tool. I sort the outputs of each and compare. There are some mismatches, which I'm currently digging into.

Another update.... I believe I've now resolved the differences in buckets with multipart objects. There are still differences in versioned buckets, but I think I know the issue and have a good idea as to how to resolve it. I may or may not be able to get that done tonight.

Eric

So that others can see how I'm doing preliminary testing of the tool and can provide feedback, here's the script I'm currently using to generate the buckets and their contents.
#!/bin/bash
# trap "exit" INT TERM ERR
# trap "kill 0" EXIT

src_obj=/tmp/src.$$
count=5
size=2222 # in megabytes

dowait() {
    # set to true to slow down process and require enter to be pressed
    if true ;then
        read -p "$* Continue..."
    fi
}

upload() {
    # no checking for now...
    if false ;then
        if [ $# -ne 4 ] ;then
            echo Usage: $0 bucket size minutes prefix
            exit 1
        fi
    fi

    obj=$1
    bucket=$2
    tag=$3
    kill_time=$4

    dest_obj="${tag}"
    if [ "$kill_time" -ne 0 ] ;then
        echo starting upload of $tag
        s3cmd put -q $obj s3://${bucket}/${dest_obj} &
        sleep $kill_time
        s3cmd multipart s3://${bucket}
        echo stopping upload of $tag
        kill -TERM $(jobs -pr)
    else
        echo starting upload of $tag
        s3cmd put -q $obj s3://${bucket}/$dest_obj
        echo finished upload of $tag
    fi
}

../src/stop.sh
RGW=1 OSD=1 MON=1 MDS=0 MGR=0 FS=0 ../src/vstart.sh -n -l -b --short
sleep 5

# dowait attach debugger

############################################################
s3cmd mb s3://multipart-bkt
dd if=/dev/urandom of=$src_obj bs=1M count=${size}
# upload $src_obj $bucket 101 0
upload $src_obj multipart-bkt multipart-obj-1 0
upload $src_obj multipart-bkt multipart-obj-2 14
wait

############################################################
echo "hello world" >$src_obj
s3cmd mb s3://resharded-bkt
for f in $(seq 8) ; do
    dest_obj="reshard-obj-${f}"
    s3cmd put -q $src_obj s3://resharded-bkt/$dest_obj
done
bin/radosgw-admin bucket reshard --num-shards 3 --bucket=resharded-bkt
bin/radosgw-admin bucket reshard --num-shards 5 --bucket=resharded-bkt

############################################################
s3cmd mb s3://versioned-bkt
bucket-enable-versioning.sh versioned-bkt
for f in $(seq 3) ;do
    echo "this is data $f" >$src_obj
    for g in $(seq 10) ;do
        dest_obj="versioned-obj-${g}"
        s3cmd put -q $src_obj s3://versioned-bkt/$dest_obj
    done
done
for g in $(seq 1 2 10) ;do
    dest_obj="versioned-obj-${g}"
    s3cmd rm s3://versioned-bkt/$dest_obj
done

############################################################
s3cmd mb s3://orig-bkt
echo "this is small" >$src_obj
for f in $(seq 4) ;do
    dest_obj="orig-obj-$f"
    s3cmd put -q $src_obj s3://orig-bkt/$dest_obj
done

s3cmd mb s3://copy-bkt
s3cmd cp s3://orig-bkt/orig-obj-1 s3://copy-bkt/copied-obj-1
s3cmd cp s3://orig-bkt/orig-obj-3 s3://copy-bkt/copied-obj-3
for f in $(seq 5 6) ;do
    dest_obj="orig-obj-$f"
    s3cmd put -q $src_obj s3://copy-bkt/$dest_obj
done

s3cmd rb --recursive s3://orig-bkt

############################################################
buckets="multipart-bkt resharded-bkt versioned-bkt copy-bkt"
for b in $buckets ; do
    echo " "
    s3cmd ls s3://$b
    # echo " "
    # bin/radosgw-admin bucket radoslist --bucket=$b --debug-rgw=20 --debug-ms=1
done

########################################
echo done
(In reply to Vikhyat Umrao from comment #16)
> Thanks, Eric to me looks good. If I understand correct this tool will be
> completed when `rados ls` and this tool output will match exactly and there
> is no difference the reason for this is in perfect test scenario (when no
> leak and orphan objects) the list should be exact same from both outputs.

That's mostly true, Vikhyat. My testing script, however, forces a failed multipart upload (along with a successful one). So we should expect to see objects in `rados ls` from the failed multipart upload that we do not see with the tool, thus allowing those objects to be removed. And that part seems to be working currently. I just need to resolve the discrepancy in versioned buckets.

Here's an update. I now have a version of the tool ("radosgw-admin bucket radoslist --bucket=foobar") that works against all my test cases. I updated my old testing script (https://bugzilla.redhat.com/show_bug.cgi?id=1752130#c15), which creates buckets of various forms, so that all objects written are over 4MB, in order to force tail objects. I will add a comment below with the latest version.

This was all done in my 3.2 dev environment and passes my tests. The customer is on 2.5 currently, but I haven't maintained a VM where I can build that, given it's been EOL'ed. I backported it to 2.5, which was trivial, and added the commit to the branch Thomas made for the build. He has initiated the build, and he and I are at the office waiting to make sure it hits the stage where we can be assured that there are not any compiler errors.

Once we have a build, I think the plan is to have Vikhyat do some testing. And depending on what Vikhyat finds, we'll go from there.

Eric

Below is the latest version of my script to create a number of buckets that exercise various features of rgw, so I can then compare the output of `rados ls` and `radosgw-admin bucket radoslist --bucket=<bucket-name>`.
#!/bin/bash
# trap "exit" INT TERM ERR
# trap "kill 0" EXIT

count=5
huge_size=2222 # in megabytes
big_size=6 # in megabytes

huge_obj=/tmp/huge_obj.$$
big_obj=/tmp/big_obj.$$

dowait() {
    # set to true to slow down process and require enters to be pressed
    if true ;then
        read -p "$* Continue..."
    fi
}

upload() {
    # no checking for now...
    if false ;then
        if [ $# -ne 4 ] ;then
            echo Usage: $0 bucket size minutes prefix
            exit 1
        fi
    fi

    obj=$1
    bucket=$2
    tag=$3
    kill_time=$4

    dest_obj="${tag}"
    if [ "$kill_time" -ne 0 ] ;then
        echo starting upload of $tag
        s3cmd put -q $obj s3://${bucket}/${dest_obj} &
        sleep $kill_time
        s3cmd multipart s3://${bucket}
        echo stopping upload of $tag
        kill -TERM $(jobs -pr)
    else
        echo starting upload of $tag
        s3cmd put -q $obj s3://${bucket}/$dest_obj
        echo finished upload of $tag
    fi
}

../src/stop.sh
RGW=1 OSD=1 MON=1 MDS=0 MGR=0 FS=0 ../src/vstart.sh -n -l -b --short
sleep 5

dd if=/dev/urandom of=$huge_obj bs=1M count=${huge_size}
dd if=/dev/urandom of=$big_obj bs=1M count=${big_size}

# dowait attach debugger

############################################################
s3cmd mb s3://multipart-bkt
upload $huge_obj multipart-bkt multipart-obj-1 0
upload $huge_obj multipart-bkt multipart-obj-2 14
wait

############################################################
s3cmd mb s3://resharded-bkt
for f in $(seq 8) ; do
    dest_obj="reshard-obj-${f}"
    s3cmd put -q $big_obj s3://resharded-bkt/$dest_obj
done
bin/radosgw-admin bucket reshard --num-shards 3 --bucket=resharded-bkt
bin/radosgw-admin bucket reshard --num-shards 5 --bucket=resharded-bkt

############################################################
s3cmd mb s3://versioned-bkt
bucket-enable-versioning.sh versioned-bkt
for f in $(seq 3) ;do
    for g in $(seq 10) ;do
        dest_obj="versioned-obj-${g}"
        s3cmd put -q $big_obj s3://versioned-bkt/$dest_obj
    done
done
for g in $(seq 1 2 10) ;do
    dest_obj="versioned-obj-${g}"
    s3cmd rm s3://versioned-bkt/$dest_obj
done

############################################################
s3cmd mb s3://orig-bkt
for f in $(seq 4) ;do
    dest_obj="orig-obj-$f"
    s3cmd put -q $big_obj s3://orig-bkt/$dest_obj
done

s3cmd mb s3://copy-bkt
s3cmd cp s3://orig-bkt/orig-obj-1 s3://copy-bkt/copied-obj-1
s3cmd cp s3://orig-bkt/orig-obj-3 s3://copy-bkt/copied-obj-3
for f in $(seq 5 6) ;do
    dest_obj="orig-obj-$f"
    s3cmd put -q $big_obj s3://copy-bkt/$dest_obj
done

s3cmd rb --recursive s3://orig-bkt

############################################################
buckets="multipart-bkt resharded-bkt versioned-bkt copy-bkt"
for b in $buckets ; do
    echo " "
    s3cmd ls s3://$b
    # echo " "
    # bin/radosgw-admin bucket radoslist --bucket=$b --debug-rgw=20 --debug-ms=1
done

########################################
echo done
Here's an update.... I spent most of the day working on discrepancies in the 2.5 version of this. [As background, I've not maintained build environments for 2.5, so it was faster to do my initial work on 3.2. But in the backport to 2.5, discrepancies emerged. Key data structures changed in name, form, and use between those two versions, causing a number of issues.] I worked through all but one of them. The remaining discrepancy should not have an impact on the customer's specific use-case. Let me discuss it in detail below.

When a multipart upload fails, certain objects remain, and we'd like to clean them up, such as with this tool. Because it's a multipart upload, segments are uploaded individually in order to make up the whole. The manifest is not created (at least in 2.5) until the upload is complete. And the algorithm this tool is based on uses the manifest in order to generate the full set of rados objects. So in the case of incomplete multipart uploads, the tool will a) fail to produce the list of objects that did get uploaded (this is good, because we want those to be deleted), and b) produce some incorrect object names (this is neutral, as it neither helps nor hurts the process when we subtract the objects produced here from those produced by `rados ls`).

I could have removed b); HOWEVER, there is a comment in the code that sufficiently old objects (which I interpret to mean objects created on sufficiently old versions of rgw) do not have a manifest either. So I left the code there. The question remains how long ago it was that versions of rgw did not produce the manifests, and whether the customer's cluster ever used such a version of rgw (and upgraded from there). I will look at commits, but I will also ask Yehuda, in case he knows off-hand.

But given that this version, after sufficient testing on Friday, *might* be usable by the customer, I've added a commit with fixes to the branch and spoke with Thomas. He will initiate the builds sometime tonight, so a version will be available for Vikhyat to test tomorrow morning.

Yehuda,

There is a comment in the orphan-finding code that a "very very old object" may not have a manifest. Do you know off-hand which versions of ceph/RGW this would apply to? I would like to know that this condition would not apply to the customer's cluster.

Thank you,
Eric

I looked through older commits and it seems that RGWObjManifest came into being in the spring of 2012 and gained some important functionality in 2014. I wouldn't mind hearing Yehuda's take, nonetheless.

(In reply to J. Eric Ivancich from comment #32)
> When a multipart upload fails, certain objects remain; and we'd like clean
> them up, such as with this tool. Because it's a multipart upload, segments
> are uploaded individually in order to make the whole. The manifest is not
> created (at least in 2.5) until the upload is complete. And the algorithm
> this tool is based on uses the manifest in order to generate the full set of
> rados objects.

These incomplete parts do get tracked in the bucket index under the "multipart" namespace. These entries are returned by cls_bucket_list_ordered(), but they're filtered out (based on params.enforce_ns and params.ns) in RGWRados::Bucket::List::list_objects_ordered(). Your tool could either list this "multipart" namespace separately (similar to list_multipart_parts() in rgw_multi.cc), or it could set params.enforce_ns=false and manually filter out entries whose namespace is not "" or "multipart".
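A hedged editorial aside, not something suggested in the comments above: one existing way to eyeball bucket-index entries from the command line is `radosgw-admin bi list`, which dumps raw bucket-index entries, including those in the multipart namespace. The bucket name below is hypothetical and the grep is only a rough filter:

radosgw-admin bi list --bucket=multipart-bkt | grep -i multipart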
(In reply to Casey Bodley from comment #36)
> These incomplete parts do get tracked in the bucket index under the
> "multipart" namespace. These entries are returned by
> cls_bucket_list_ordered() but they're filtered out (based on
> params.enforce_ns and params.ns) in
> RGWRados::Bucket::List::list_objects_ordered(). Your tool could either list
> this "multipart" namespace separately (similar to list_multipart_parts() in
> rgw_multi.cc), or it could set params.enforce_ns=false and manually filter
> out entries where namespace is not "" or "multipart".

Thanks, Casey! That's interesting. But I'm thinking the tool should *not* list these objects left over from an incomplete multipart upload, thereby making them eligible for deletion by the manual process discussed above (subtracting the output of the tool from `rados ls` after sorting both). Or perhaps the ultimate version of the tool should have a command-line option controlling whether these get listed, with the default being not to list them.

Update

The tool fails in a normal case used by the customer -- swift dlo uploads. Vikhyat just did the test on a new bucket, and the tool produces no output for that bucket. Previous tests were only done after issuing an expiration directive and didn't apply to the normal case....

****
****
So the tool is UNSAFE to use in combination with the `rados ls`, sorts, diffs, and `rados rm`....
****

I'll see what I can do tonight and over the weekend.... Vikhyat is letting the customer know that the tool is currently UNSAFE for their purposes.

Update

I've added some commits to the branch that contains the tool; they pass my prior tests and my newer tests focused on swift -- regular objects along with dlo- and slo-style large objects. Thomas is initiating builds.

The `swift` command-line tool can be used to upload large objects via dlo or slo. When it does so, it creates a separate container to hold the segments; by default that separate container has the name of the container the large object is being `swift upload`ed to with "_segments" appended, although the container to use can be specified on the command line with the `--segment-container` option. I suspect the ideal way to use swift is to allow all the segments of a large object to go to a separate container. That way the user's container appears to contain just the full objects. I don't know whether the customer has strictly held to this behavior or not.

But if you run this tool on a user container, it will follow the swift-specific manifests of dlo and slo large objects to the segments in whatever container they appear in, and to the rgw manifests that those may contain, to produce a list of rados objects that back up that container. In other words, by applying the tool to one container, it can produce objects in other containers, such as the associated "_segments" containers. This is PROPER behavior.

Now what if the user (e.g., the customer) applies the tool to one of the containers created just for the segments? Well, in that case it will re-visit the same objects and their rgw manifests, listing them *again*!

The customer may be forced to do this if their internal users did not maintain a strict "division" between containers with user objects and containers with their associated segments. If there is a chance that the same rados objects may be produced more than once in the entire process, the user may want to consider using the `sort` command-line tool with the `--unique` option to discard duplicates. Otherwise, they'll have to think carefully through what effect having duplicate entries would have on their process.

Additionally, it should be noted that the linux `sort` command-line tool can behave differently depending on how the "LANG" environment variable is set. When LANG is set to "en_US.UTF-8" the sorted results will be in one order, and in a different order when it is set to something else, such as "C". I've found that setting `export LANG=C` produces results that make the most sense from a programming viewpoint. HOWEVER, IN THE PROCESS OF DETERMINING ORPHAN RADOS OBJECTS THAT ARE SAFE TO REMOVE, IT IS ***VITAL*** THAT WHEN THE OUTPUT FROM THIS TOOL IS COMBINED AND SORTED, AND WHEN THE OUTPUT FROM `rados ls` IS SORTED, THEY BE SORTED USING THE SAME SETTING OF "LANG". If the LANG setting is different, then `diff` cannot operate correctly and the results will be meaningless and DANGEROUS to feed into a `rados rm` process.

Testing this tool and all manual and scripted procedures to remove "orphan" objects is VITAL for the safety and ultimate health of the production cluster. There are many subtleties that, if not understood and accounted for, could result in DISASTROUS RESULTS.

Please note that the caveats described above in comment #32 remain. See: https://bugzilla.redhat.com/show_bug.cgi?id=1752130#c32
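A minimal sketch of the locale-consistent sort-and-compare step described above; the file names are hypothetical and the removal step is deliberately left commented out:

# use the same collation for every sort and for the comparison
export LANG=C

sort -u radoslist-all-buckets.txt > radoslist.sorted.txt
sort rados-ls.txt > rados-ls.sorted.txt

# entries present only in the `rados ls` output are candidate orphans
comm -23 rados-ls.sorted.txt radoslist.sorted.txt > candidate-orphans.txt

# only after careful review:
# while read -r o ; do rados -p <rgw data pool> rm "$o" ; done < candidate-orphans.txt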
# Here is the latest version of the script used in swift testing

#!/bin/bash
# trap "exit" INT TERM ERR
# trap "kill 0" EXIT

export ST_AUTH=http://localhost:8000/auth/v1.0
export ST_USER=test:tester
export ST_KEY=testing

huge_size=2222 # in megabytes
big_size=6 # in megabytes

# 600MB
segment_size=629145600

huge_obj=~/huge_obj.temp
big_obj=~/big_obj.$$

dd if=/dev/urandom of=$big_obj bs=1M count=${big_size}

make_huge() {
    # if huge object does not exist or is not at least 2'ish gig
    if [ ! -f "$huge_obj" ] ;then
        dd if=/dev/urandom of=$huge_obj bs=1M count=${huge_size}
    elif [ $(stat --printf="%s" $huge_obj) -lt 2000000000 ] ;then
        dd if=/dev/urandom of=$huge_obj bs=1M count=${huge_size}
    fi
}

############################################################
# plain test
if true ;then
    for f in $(seq 4) ;do
        swift upload swift-plain-ctr $big_obj --object-name swift-obj-$f
    done
fi

############################################################
# dlo test
if true ;then
    make_huge
    # upload in 600MB segments
    swift upload swift-dlo-ctr $huge_obj --object-name dlo-obj-1 \
        -S $segment_size
fi

############################################################
# slo test
if true ;then
    make_huge
    # upload in 600MB segments
    swift upload swift-slo-ctr $huge_obj --object-name slo-obj-1 \
        -S $segment_size --use-slo
fi

############################################################
buckets="swift-plain-ctr swift-dlo-ctr swift-slo-ctr swift-dlo-ctr_segments swift-slo-ctr_segments"
for b in $buckets ; do
    echo " "
    swift list $b
    # echo " "
    # bin/radosgw-admin bucket radoslist --bucket=$b --debug-rgw=20 --debug-ms=1
done

echo "buckets: $buckets"

############################################################
rm -f $big_obj

echo done

# Here is the latest version of the script used in s3 testing
#!/bin/bash
# trap "exit" INT TERM ERR
# trap "kill 0" EXIT

huge_size=2222 # in megabytes
big_size=6 # in megabytes

huge_obj=~/huge_obj.temp
big_obj=~/big_obj.$$

dowait() {
    # set to true to slow down process and require enters to be pressed
    if true ;then
        read -p "$* Continue..."
    fi
}

upload() {
    # no checking for now...
    if false ;then
        if [ $# -ne 4 ] ;then
            echo Usage: $0 bucket size minutes prefix
            exit 1
        fi
    fi

    obj=$1
    bucket=$2
    tag=$3
    kill_time=$4

    dest_obj="${tag}"
    if [ "$kill_time" -ne 0 ] ;then
        echo starting upload of $tag
        s3cmd put -q $obj s3://${bucket}/${dest_obj} &
        sleep $kill_time
        s3cmd multipart s3://${bucket}
        echo stopping upload of $tag
        kill -TERM $(jobs -pr)
    else
        echo starting upload of $tag
        s3cmd put -q $obj s3://${bucket}/$dest_obj
        echo finished upload of $tag
    fi
}

# src/stop.sh
# RGW=1 OSD=1 MON=1 MDS=0 MGR=0 FS=0 ../src/vstart.sh -n -l -b --short
# $HOME/ceph-work1/start.sh
# sleep 5

dd if=/dev/urandom of=$big_obj bs=1M count=${big_size}

make_huge() {
    # if huge object does not exist or is not at least 2'ish gig
    if [ ! -f "$huge_obj" ] ;then
        dd if=/dev/urandom of=$huge_obj bs=1M count=${huge_size}
    elif [ $(stat --printf="%s" $huge_obj) -lt 2000000000 ] ;then
        dd if=/dev/urandom of=$huge_obj bs=1M count=${huge_size}
    fi
}

# dowait attach debugger

############################################################
# multipart test
if true ;then
    make_huge
    s3cmd mb s3://multipart-bkt
    upload $huge_obj multipart-bkt multipart-obj-1 0
    upload $huge_obj multipart-bkt multipart-obj-2 14
    wait
fi

############################################################
if true ;then
    s3cmd mb s3://resharded-bkt
    for f in $(seq 8) ; do
        dest_obj="reshard-obj-${f}"
        s3cmd put -q $big_obj s3://resharded-bkt/$dest_obj
    done
    src/radosgw-admin bucket reshard --num-shards 3 --bucket=resharded-bkt -c out/ceph.conf
    src/radosgw-admin bucket reshard --num-shards 5 --bucket=resharded-bkt -c out/ceph.conf
fi

############################################################
if true ;then
    s3cmd mb s3://versioned-bkt
    bucket-enable-versioning.sh versioned-bkt
    for f in $(seq 3) ;do
        for g in $(seq 10) ;do
            dest_obj="versioned-obj-${g}"
            s3cmd put -q $big_obj s3://versioned-bkt/$dest_obj
        done
    done
    for g in $(seq 1 2 10) ;do
        dest_obj="versioned-obj-${g}"
        s3cmd rm s3://versioned-bkt/$dest_obj
    done
fi

############################################################
if true ;then
    s3cmd mb s3://orig-bkt
    for f in $(seq 4) ;do
        dest_obj="orig-obj-$f"
        s3cmd put -q $big_obj s3://orig-bkt/$dest_obj
    done

    s3cmd mb s3://copy-bkt
    s3cmd cp s3://orig-bkt/orig-obj-1 s3://copy-bkt/copied-obj-1
    s3cmd cp s3://orig-bkt/orig-obj-3 s3://copy-bkt/copied-obj-3
    for f in $(seq 5 6) ;do
        dest_obj="orig-obj-$f"
        s3cmd put -q $big_obj s3://copy-bkt/$dest_obj
    done

    s3cmd rb --recursive s3://orig-bkt
fi

############################################################
buckets="multipart-bkt resharded-bkt versioned-bkt copy-bkt"
for b in $buckets ; do
    echo " "
    s3cmd ls s3://$b
    # echo " "
    # bin/radosgw-admin bucket radoslist --bucket=$b --debug-rgw=20 --debug-ms=1
done

############################################################
src/radosgw-admin -c out/ceph.conf gc process --include-all

rm -f $big_obj

echo done
Update w.r.t. tenanted containers

If a bucket belongs to a tenant, then for this tool to respond correctly, `radosgw-admin bucket radoslist` must be given both the `--bucket=` and `--tenant=` command-line arguments. If a user instead tries to omit the tenant and prefix the container name with the tenant identifier, the tool will not be able to follow the manifests appropriately and may generate runtime errors.

Note: since `radosgw-admin` uses the "--bucket" terminology, we use "bucket" and "container" interchangeably, given that s3 and swift each have their own preferred term.

I think it's vitally important that the customer fully understands the issues described in comment #32, comment #59, and comment #73 before they test this tool, and certainly before they use this tool to remove any rados objects. It sounds like the initial cluster the customer intends to use the tool on does not have multipart uploads and does not use tenants. HOWEVER, they may choose to use this tool on other clusters. And understanding these subtleties is really *not* optional, given that the tool may be used in conjunction with `rados rm`, which could blow away a cluster's necessary data. What is the plan to ensure this?
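For illustration, a short sketch of the tenanted invocation described above; the tenant and container names are hypothetical:

# correct: specify the tenant separately
radosgw-admin bucket radoslist --tenant=mytenant --bucket=mycontainer

# likely to misbehave, per the note above: tenant folded into the bucket name
# radosgw-admin bucket radoslist --bucket=mytenant/mycontainer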
Created attachment 1646286 [details]

A utility to efficiently compute a line-by-line diff on two sorted text files.
Since the customer noted that the standard Linux `diff` utility has issues running on very large input files, this is a simplified diff that uses minimal space by requiring that the two input files be in sorted order.
build.sh builds the executable
test.sh tests the executable
clean.sh cleans the directory to prepare it for being tar'd up
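As a hedged aside, when both inputs are already sorted (under the same LANG), the standard `comm` utility covers a similar case with minimal memory; the file names here are hypothetical:

comm -23 rados-ls.sorted.txt radoslist.sorted.txt > candidate-orphans.txt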
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1320

*** Bug 1821865 has been marked as a duplicate of this bug. ***

*** Bug 1750965 has been marked as a duplicate of this bug. ***