Bug 963335

Summary: glusterd enters D state after replace-brick abort operation
Product: [Community] GlusterFS
Reporter: Paschalis Korosoglou <pkoro>
Component: glusterd
Assignee: bugs <bugs>
Status: CLOSED DEFERRED
Severity: medium
Priority: medium
Version: 3.3.1
CC: bugs, ctrianta, gluster-bugs
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2014-12-14 19:40:31 UTC
Type: Bug
Attachments:
Log file from source brick side (flags: none)
Log file segment from destination brick side (flags: none)

Description Paschalis Korosoglou 2013-05-15 16:57:01 UTC
Created attachment 748413
Log file from source brick side

Description of problem:

glusterd entered D state after a replace-brick abort operation.

Version-Release number of selected component (if applicable):

glusterfs-server-3.3.1-1.el6.x86_64

How reproducible:

Not easily. The problem occurred after issuing an abort command on a replace-brick operation that had failed: the destination brick ended up with fewer files than the source brick. To reproduce it, a replace-brick operation first has to fail with these symptoms. As far as the gluster daemon was concerned, the replace-brick had completed successfully (Migration complete), but counting the files showed the difference. The difference was also evident from df output on the source and destination bricks.
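
The file-count comparison described above could be sketched roughly as follows. This is only an illustration, not the exact commands used for this report: the function names count_brick_files and compare_bricks are hypothetical, and the sketch assumes that the brick's internal .glusterfs directory should be left out of the count.

```shell
# Count regular files under a brick directory.
# Assumption: the brick's internal .glusterfs directory holds metadata
# and should not be counted when comparing user data between bricks.
# Note: file names containing newlines would be miscounted by wc -l.
count_brick_files() {
    brick=${1%/}    # drop a trailing slash so the -path match works
    find "$brick" -path "$brick/.glusterfs" -prune -o -type f -print | wc -l | tr -d ' '
}

# Print both counts and return non-zero when they differ.
compare_bricks() {
    src_count=$(count_brick_files "$1")
    dst_count=$(count_brick_files "$2")
    echo "source=$src_count destination=$dst_count"
    [ "$src_count" -eq "$dst_count" ]
}
```

Calling compare_bricks with the source and destination brick paths prints both counts and fails when they differ, which is the kind of mismatch that exposed the incomplete migration here.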

The symptom was that, after the abort command was issued, the glusterd process went into D state, so it was impossible to stop and start it. The abort command itself timed out. We restarted glusterd on all servers in the ring, but the server on which the command was issued had to be hard rebooted to recover.
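
For reference, one way to confirm that a process is in D state (uninterruptible sleep) on Linux is to read the State line of its /proc/<pid>/status file. The sketch below is a minimal illustration under that assumption; the function names state_from_status_file and is_in_d_state are hypothetical.

```shell
# Extract the one-letter process state from a /proc/<pid>/status file.
# "D" means uninterruptible sleep, typically a task blocked inside the kernel.
state_from_status_file() {
    awk '/^State:/ { print $2; exit }' "$1"
}

# Succeed if the process with the given PID is currently in D state.
# Linux only; returns non-zero if the PID does not exist.
is_in_d_state() {
    [ "$(state_from_status_file "/proc/$1/status" 2>/dev/null)" = "D" ]
}
```

A process in D state does not act on signals, including SIGKILL, until the blocking kernel call returns, which is consistent with glusterd being impossible to stop on the affected server.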

Steps to Reproduce:
1. Assuming the replace-brick operation did not complete properly, issue the following command:

# gluster volume replace-brick <volume> <source> <destination> abort
  
Actual results:

glusterd enters D state

Expected results:

Abort should have been successful

Additional info:

In a way we are now stuck with this operation, because gluster still thinks there is an uncommitted replace-brick operation. As a result, we currently have no way of trying to move the brick somewhere else using replace-brick. One option would be to carefully edit files under /var/lib/glusterd, but to our understanding this is very risky.
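
If manual edits to the state files were attempted at all, taking a copy of the directory first would at least allow a rollback. The following is a minimal sketch of such a backup, not a recommended recovery procedure: the function name backup_state_dir and the default destination /root are hypothetical, and /var/lib/glusterd is assumed to be the glusterd state directory on this installation.

```shell
# Copy the glusterd state directory to a timestamped backup before any manual edit.
# Assumptions: the state directory is /var/lib/glusterd and /root has enough space;
# pass other paths as arguments if that does not hold.
backup_state_dir() {
    src=${1:-/var/lib/glusterd}
    dest_root=${2:-/root}
    dest="$dest_root/$(basename "$src").bak.$(date +%Y%m%d%H%M%S)"
    # -a preserves ownership, permissions and timestamps
    cp -a "$src" "$dest" && echo "$dest"
}
```

The function prints the path of the backup on success. Ideally the copy would be taken with glusterd stopped on that node, so that the files are not changing underneath it.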

I am attaching segments of the log files covering the replace-brick operation.

Comment 1 Paschalis Korosoglou 2013-05-15 16:57:38 UTC
Created attachment 748414
Log file segment from destination brick side

Comment 3 Paschalis Korosoglou 2013-10-25 09:10:06 UTC
Hi, 

Should we list this bug under 996047?

Best, 
Paschalis

Comment 4 Niels de Vos 2014-11-27 14:54:27 UTC
The version that this bug was reported against no longer receives updates from the Gluster Community. Please verify whether this report is still valid against a current (3.4, 3.5 or 3.6) release and update the version, or close this bug.

If there has been no update before 9 December 2014, this bug will be closed automatically.