Bug 1321507 - Recover quarantined objects
Summary: Recover quarantined objects
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-swift
Version: 7.0 (Kilo)
Hardware: All
OS: All
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 8.0 (Liberty)
Assignee: Pete Zaitcev
QA Contact: nlevinki
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-03-28 06:05 UTC by Sachin
Modified: 2016-05-05 13:17 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-05-05 13:17:19 UTC



Description Sachin 2016-03-28 06:05:22 UTC
Description of problem:

When swift fails to verify the checksum of an object, it moves the object to quarantine (/srv/node/d1/quarantined). Is there a way to recover those objects
and (re-)store them in their canonical location?


Version-Release number of selected component (if applicable):

2015.1.2-7.el7ost


How reproducible:


Steps to Reproduce:
1. Upload file/image to swift container
  
   $ swift post mycontainer
   $ echo "content" > file.txt
   $ swift upload mycontainer file.txt

   $ swift list mycontainer
    
2. Alter content of object

   $ echo "second line" >> 712/f2a/b207d8b283f6b954f484b6c966974f2a/1459172973.66183.data

3. Object is moved to quarantine -- /srv/node/d1/quarantined/objects/


Actual results:


Expected results:


Additional info:


The number of objects moved to quarantine can be confirmed with the command:

    $ swift-recon -q

Comment 2 Pete Zaitcev 2016-03-28 19:16:06 UTC
What's the point of returning the broken object into its location if you
have just damaged it? If you succeed, you'll have a damaged object
that does not match its Etag, is all. What are you trying to accomplish?

Comment 3 Sachin 2016-03-29 04:38:27 UTC
It so happened that one of our customers re-ran the overcloud deployment from a different directory.

Although the deployment was successful, tripleo-overcloud-passwords changed, and so did the 'path_suffix' in swift.conf. As a result, swift was unable to verify the checksums of the existing objects (images), which were then moved to quarantine. The objects are not actually corrupted; swift simply failed to verify their checksums.

Any suggestions?
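
For context, a minimal Python sketch of why a changed suffix makes existing objects look misplaced. It assumes the commonly documented scheme in which swift hashes the object name together with the cluster-wide prefix and suffix from swift.conf (MD5 over prefix + "/account/container/object" + suffix). The function name, the account name and the suffix values below are made up for illustration and are not taken from this deployment.

```python
import hashlib


def object_hash(account, container, obj, prefix="", suffix=""):
    """Approximate swift's name hash: md5(prefix + /acct/cont/obj + suffix)."""
    path = "/" + "/".join([account, container, obj])
    data = prefix.encode() + path.encode() + suffix.encode()
    return hashlib.md5(data).hexdigest()


# Same object name, hashed with two different (hypothetical) suffix values.
old = object_hash("AUTH_demo", "mycontainer", "file.txt", suffix="old-secret")
new = object_hash("AUTH_demo", "mycontainer", "file.txt", suffix="new-secret")

print("hash with old suffix:", old)
print("hash with new suffix:", new)
print("same on-disk location:", old == new)
```

Because the directory an object lives in is named after this hash, an object written under the old suffix no longer matches the hash computed with the new one, even though its data is intact.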

Comment 4 Pete Zaitcev 2016-03-31 21:31:29 UTC
It is possible to restore quarantined objects using existing tools.

Before doing it, the administrator must correct whatever issue
prompted the mass-quarantining. In particular, the hash suffix
must be restored. You must consult an OSPd expert to verify that
it is sufficient to re-run the deployment from the correct directory.

The next step is, for each object, to find out the full path where
the object resides on a storage node. This is done by examining
the metadata of the object file (with getfattr, IIRC). There is
a tool, swift-object-info, that can help with that. Once the proper
hash and account are known, one has to run swift-get-nodes
to reconstruct the full paths to which the objects are to be restored.
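
As a rough illustration of what the reconstructed path looks like, here is a small Python sketch that rebuilds the directory layout from a name hash. It assumes the usual on-disk layout (objects/<partition>/<last three hex characters of the hash>/<hash>), the default storage policy directory "objects", and a partition power of 10; the partition power is only inferred from the 712/f2a/... path in the description and is an assumption. The helper name is hypothetical. The sketch does not consult the ring, so it cannot tell which devices hold the object; swift-get-nodes remains the authoritative source for that.

```python
import posixpath


def object_dir(device_root, device, name_hash, part_power):
    """Build the expected object directory for a 32-character hex name hash."""
    # The partition is the top `part_power` bits of the first 4 bytes of the hash.
    partition = int(name_hash[:8], 16) >> (32 - part_power)
    # The suffix directory is the last three hex characters of the hash.
    suffix = name_hash[-3:]
    return posixpath.join(
        device_root, device, "objects", str(partition), suffix, name_hash
    )


# Hash taken from the path in the bug description; part_power=10 is assumed.
print(object_dir("/srv/node", "d1", "b207d8b283f6b954f484b6c966974f2a", 10))
```

With those assumptions the result ends in 712/f2a/b207d8b283f6b954f484b6c966974f2a, matching the path shown in the steps to reproduce.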

Once the paths are known, move each object from the quarantine
directory to its proper location. Once all objects are moved, restart
the servers with "swift-init object-server start" and verify
that each object is accessible under its old name. If that passes,
you may restart the replication and auditor daemons and verify
operation.
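
The move itself could be scripted along the following lines. This is only a sketch under stated assumptions: restore_object is a hypothetical helper, not a swift tool; it assumes the quarantined directory and the target directory are on the same filesystem (so the move is a rename and extended attributes are kept), that the object servers are stopped, and that it runs as a user whose file ownership swift will accept. It defaults to a dry run and refuses to overwrite an existing file.

```python
import os
import shutil


def restore_object(quarantined_dir, target_dir, dry_run=True):
    """Move the files of one quarantined object directory into target_dir.

    Returns the list of (source, destination) pairs that were, or would be, moved.
    """
    moves = []
    for name in sorted(os.listdir(quarantined_dir)):
        src = os.path.join(quarantined_dir, name)
        dst = os.path.join(target_dir, name)
        if os.path.exists(dst):
            # Never overwrite a copy that is already in place.
            raise FileExistsError(dst)
        moves.append((src, dst))
        if not dry_run:
            os.makedirs(target_dir, exist_ok=True)
            shutil.move(src, dst)
    return moves


# Example (paths are illustrative only):
# plan = restore_object(
#     "/srv/node/d1/quarantined/objects/b207d8b283f6b954f484b6c966974f2a",
#     "/srv/node/d1/objects/712/f2a/b207d8b283f6b954f484b6c966974f2a",
# )
# for src, dst in plan:
#     print(src, "->", dst)
```

Reviewing the dry-run plan before passing dry_run=False keeps the operation reversible up to the last moment.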

However, in normal use it is considered extraordinary for users
to destroy seed values like this. The OSPd clearly does not have
enough failsafes.

Comment 5 Sachin 2016-04-01 07:25:00 UTC
If this is safe enough to be carried out in a production environment, can you please elaborate on the steps (with example commands) that I can carry out and then pass on to the customer?

Comment 6 Sergey Gotliv 2016-04-02 15:00:10 UTC
(In reply to Sachin from comment #5)
> If this safe enough to be carried out in the production environment, can you
> please elaborate the steps(with example commands) which I can carry out and
> then pass on to the customer?

Sachin, please recommend that your customer re-upload the images to Swift. It will be faster and safer. We'll take this case up with the OSPd folks to prevent the situation described in comment #3 from happening in the future.

Comment 7 Sachin 2016-04-03 04:41:53 UTC
Hi Sergey,

Your suggestion has been conveyed to the customer.

Thanks for the reply.

Comment 9 Sachin 2016-05-05 13:17:19 UTC
Sean,

Closing the case.

