Bug 963335 - glusterd enters D state after replace-brick abort operation
Summary: glusterd enters D state after replace-brick abort operation
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 3.3.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-05-15 16:57 UTC by Paschalis Korosoglou
Modified: 2014-12-14 19:40 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-12-14 19:40:31 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
Log file from source brick side (22.80 KB, application/octet-stream)
2013-05-15 16:57 UTC, Paschalis Korosoglou
Log file segment from destination brick side (42.61 KB, application/octet-stream)
2013-05-15 16:57 UTC, Paschalis Korosoglou

Description Paschalis Korosoglou 2013-05-15 16:57:01 UTC
Created attachment 748413
Log file from source brick side

Description of problem:

glusterd entered D state (uninterruptible sleep) after a replace-brick abort operation.

Version-Release number of selected component (if applicable):

glusterfs-server-3.3.1-1.el6.x86_64

How reproducible:

Not easily. The problem occurred after issuing an abort command on a replace-brick operation that had failed (the destination brick held fewer files than the source brick). To reproduce it, a replace-brick operation would first have to fail with these symptoms. As far as the gluster daemon was concerned, the replace-brick had completed successfully (status: Migration complete), but the discrepancy showed up when we counted the files on each brick. The difference was also evident from df output on the source and destination bricks.
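
A minimal sketch of the file-count comparison described above, assuming Python 3 and that each brick directory is readable locally (on a real setup, count_brick_files would be run on each brick server separately). The function names are hypothetical. Skipping the brick-internal .glusterfs directory is an assumption about the GlusterFS 3.3 brick layout, made so that its internal hard links are not counted alongside the user files.

```python
import os


def count_brick_files(brick_root, skip_dirs=(".glusterfs",)):
    """Count file entries under a brick, skipping internal metadata dirs.

    Hypothetical helper; .glusterfs is assumed to hold GlusterFS-internal
    hard links that would otherwise inflate the count.
    """
    total = 0
    for dirpath, dirnames, filenames in os.walk(brick_root):
        if dirpath == brick_root:
            # Prune internal directories at the brick root only.
            dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        total += len(filenames)
    return total


def missing_on_destination(source_root, destination_root):
    """Return source count minus destination count; > 0 means files are missing."""
    return count_brick_files(source_root) - count_brick_files(destination_root)
```

A positive result from missing_on_destination matches the symptom reported here, even while the replace-brick status claims the migration is complete.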

The symptom was that after the abort command was issued, the glusterd daemon went into D state, so it was impossible to stop and restart it. The abort command itself timed out. We restarted glusterd on all servers in the ring, but the server on which the command was issued had to be hard-rebooted to recover.
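
For reference, D in the process state column means uninterruptible sleep: the process is blocked inside the kernel (typically on I/O) and does not react to signals, not even SIGKILL, until that wait ends, which is consistent with glusterd being impossible to stop here. A minimal sketch for spotting this, assuming Linux and Python 3; the helper names are hypothetical and only the standard /proc/<pid>/stat and /proc/<pid>/comm files are read.

```python
import os
from typing import List


def parse_stat_state(stat_line: str) -> str:
    # Field 2 (comm) is wrapped in parentheses and may contain spaces or ')',
    # so the one-letter state is the first token after the last ')'.
    return stat_line.rsplit(")", 1)[1].split()[0]


def process_state(pid: int) -> str:
    with open(f"/proc/{pid}/stat") as f:
        return parse_stat_state(f.read())


def find_pids(name: str) -> List[int]:
    """Return PIDs whose /proc/<pid>/comm matches name exactly."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited or is not readable
    return pids


def stuck_pids(name: str = "glusterd") -> List[int]:
    """Return PIDs of processes called name that are in D (uninterruptible) state."""
    stuck = []
    for pid in find_pids(name):
        try:
            if process_state(pid) == "D":
                stuck.append(pid)
        except OSError:
            continue  # process exited between listing and reading
    return stuck
```

Calling stuck_pids() on the affected server would list any glusterd process in D state; `ps -C glusterd -o pid,stat,comm` shows the same information from the shell.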

Steps to Reproduce:
1. Assuming the replace-brick operation did not complete properly, issue the following command:

# gluster volume replace-brick <volume> <source> <destination> abort
  
Actual results:

glusterd enters D state

Expected results:

The abort should have succeeded.

Additional info:

In effect, we are now stuck with this operation because gluster still thinks there is an uncommitted replace-brick operation. As a result, we have no way to move the brick somewhere else (using replace-brick). One option would be to carefully edit files under /var/lib/glusterd, but as we understand it, this is very risky.

I am attaching log file segments covering the replace-brick operation.

Comment 1 Paschalis Korosoglou 2013-05-15 16:57:38 UTC
Created attachment 748414
Log file segment from destination brick side

Comment 3 Paschalis Korosoglou 2013-10-25 09:10:06 UTC
Hi, 

Should we list this bug under bug 996047?

Best, 
Paschalis

Comment 4 Niels de Vos 2014-11-27 14:54:27 UTC
The version this bug was reported against no longer receives updates from the Gluster Community. Please verify whether this report is still valid against a current release (3.4, 3.5 or 3.6) and update the version, or close this bug.

If there is no update before 9 December 2014, this bug will be closed automatically.

