Bug 1321507

Summary: Recover quarantined objects
Product: Red Hat OpenStack Reporter: Sachin <sacpatil>
Component: openstack-swift Assignee: Pete Zaitcev <zaitcev>
Status: CLOSED NOTABUG QA Contact: nlevinki <nlevinki>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 7.0 (Kilo) CC: derekh, pmukhedk, sacpatil, scohen, sgotliv, srevivo, zaitcev
Target Milestone: --- Keywords: ZStream
Target Release: 8.0 (Liberty)   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-05-05 13:17:19 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Sachin 2016-03-28 06:05:22 UTC
Description of problem:

When swift fails to verify the checksum of an object, it moves the object to quarantine (/srv/node/d1/quarantined). Is there a way to recover those objects
and (re-)store them back to their canonical location?


Version-Release number of selected component (if applicable):

2015.1.2-7.el7ost


How reproducible:


Steps to Reproduce:
1. Upload file/image to swift container
  
   $ swift post mycontainer
   $ echo "content" > file.txt
   $ swift upload mycontainer file.txt

   $ swift list mycontainer
    
2. Alter content of object

   $ echo "second line" >> 712/f2a/b207d8b283f6b954f484b6c966974f2a/1459172973.66183.data

3. Object is moved to quarantine -- /srv/node/d1/quarantined/objects/


Actual results:


Expected results:


Additional info:


The number of objects moved to quarantine can be confirmed with the command:

    $ swift-recon -q
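
As a rough local cross-check on a single storage node, the quarantined object files can also be counted directly on disk. This is only a sketch: the function name count_quarantined is hypothetical, and it assumes the /srv/node/<device>/quarantined/objects layout mentioned above and that object files end in .data.

```shell
# Count quarantined object files under a devices root (default: /srv/node).
# Assumes the layout <root>/<device>/quarantined/objects/.../<timestamp>.data
count_quarantined() {
    find "${1:-/srv/node}"/*/quarantined/objects -type f -name '*.data' 2>/dev/null | wc -l
}

count_quarantined /srv/node
```

Unlike swift-recon, this only looks at the node it runs on and only at objects, not at accounts or containers.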

Comment 2 Pete Zaitcev 2016-03-28 19:16:06 UTC
What's the point of returning the broken object into its location if you
have just damaged it? If you succeed, you'll have a damaged object
that does not match its Etag, is all. What are you trying to accomplish?

Comment 3 Sachin 2016-03-29 04:38:27 UTC
It so happened that one of our customers re-ran the overcloud deployment from a different directory.

Although the deployment was successful, tripleo-overcloud-passwords changed, and so did the hash path suffix ('swift_hash_path_suffix') in swift.conf. This left swift unable to verify the checksum of the existing objects (images), which were then moved to quarantine. The objects are not actually corrupted; swift simply failed to verify them.

Any suggestions?
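
To illustrate why a changed suffix matters: Swift derives the on-disk hash directory of an object from an MD5 over the optional prefix, the /account/container/object path, and the suffix. The sketch below reproduces that calculation with md5sum. The function name swift_hash_path, the account AUTH_test, and the two suffix values are hypothetical and only serve the illustration.

```shell
# Sketch of Swift's path hashing:
#   md5(prefix + "/account/container/object" + suffix)
# args: account container object suffix [prefix]
swift_hash_path() {
    printf '%s/%s/%s/%s%s' "${5:-}" "$1" "$2" "$3" "$4" | md5sum | cut -d' ' -f1
}

# Same object name, two different (made-up) suffix values.
old=$(swift_hash_path AUTH_test mycontainer file.txt old-suffix)
new=$(swift_hash_path AUTH_test mycontainer file.txt new-suffix)
echo "hash with old suffix: $old"
echo "hash with new suffix: $new"
```

The two hashes differ, so after the suffix change the name stored with an existing object no longer hashes to the directory it sits in.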

Comment 4 Pete Zaitcev 2016-03-31 21:31:29 UTC
It is possible to restore quarantined objects using existing tools.

Before doing so, the administrator must correct whatever issue
prompted the mass-quarantining. In particular, the hash suffix
must be restored. Consult an OSPd expert to verify that re-running
the deployment from the correct directory is sufficient.

The next step is, for each object, to find out the full path where
the object should reside on a storage node. This is done by examining
the metadata of the object file (with getfattr IIRC). There is
a tool, swift-object-info, that can help with that. Once the proper
hash and account are known, run swift-get-nodes to reconstruct
the full paths to which the objects are to be restored.

Once the paths are known, move each object from the quarantine
directory to its proper location. Once all objects are moved, start
the servers with "swift-init object-server start" and verify
that each object is accessible under its old name. If that passes,
you may restart the replication and auditor daemons and verify
operation.
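
The path reconstruction described above can be sketched as a dry run. The helper name object_dir is hypothetical, and the layout /srv/node/<device>/objects/<partition>/<last three hex characters of the hash>/<hash>/ is assumed from the path shown in the reproduction steps. The quarantine source path is also an assumption; on a real node, take the device, partition, and hash from the output of swift-object-info and swift-get-nodes rather than typing them in.

```shell
# Look up the real values first (commands named in this comment thread;
# ACCOUNT, CONTAINER and OBJECT are placeholders):
#   swift-object-info /path/to/quarantined/file.data
#   swift-get-nodes /etc/swift/object.ring.gz ACCOUNT CONTAINER OBJECT

# Build the target directory for an object.
# args: device partition hash
object_dir() {
    suffix=$(printf '%s' "$3" | tail -c 3)   # last 3 hex chars of the hash
    printf '/srv/node/%s/objects/%s/%s/%s\n' "$1" "$2" "$suffix" "$3"
}

# Dry run: print the move instead of performing it.
# The source path below is an assumed quarantine location.
src=/srv/node/d1/quarantined/objects/b207d8b283f6b954f484b6c966974f2a/1459172973.66183.data
dst=$(object_dir d1 712 b207d8b283f6b954f484b6c966974f2a)
echo "mkdir -p $dst && mv $src $dst/"
```

Printing the commands first makes it possible to review every planned move before touching the disk.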

However, in normal use it is considered extraordinary for users
to destroy seed values like this. The OSPd clearly does not have
enough failsafes.

Comment 5 Sachin 2016-04-01 07:25:00 UTC
If this is safe enough to be carried out in a production environment, can you please elaborate on the steps (with example commands) which I can carry out and then pass on to the customer?

Comment 6 Sergey Gotliv 2016-04-02 15:00:10 UTC
(In reply to Sachin from comment #5)
> If this is safe enough to be carried out in a production environment, can
> you please elaborate on the steps (with example commands) which I can carry
> out and then pass on to the customer?

Sachin, please recommend that your customer re-upload the images to Swift. It will be faster and safer. We'll take this up with the OSPd folks to prevent the situation described in comment #3 from happening in the future.

Comment 7 Sachin 2016-04-03 04:41:53 UTC
Hi Sergey,

Your suggestion has been conveyed to the customer.

Thanks for the reply.

Comment 9 Sachin 2016-05-05 13:17:19 UTC
Sean,

Closing the case.