Bug 963169

Summary: glusterd: 'gluster volume start <volname> force' fails and is unable to start the brick process, but 'gluster volume stop' followed by 'gluster volume start' does start that brick process
Product: Red Hat Gluster Storage [Red Hat Storage]
Component: glusterd
Version: 2.1
Hardware: x86_64
OS: Linux
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Reporter: Rachana Patel <racpatel>
Assignee: krishnan parthasarathi <kparthas>
QA Contact: Rajesh Madaka <rmadaka>
CC: amukherj, nsathyan, racpatel, rhs-bugs, rmadaka, sanandpa, vbellur
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-08-26 06:35:48 UTC

Description Rachana Patel 2013-05-15 10:12:45 UTC
Description of problem:
glusterd: 'gluster volume start <volname> force' fails and is unable to start the brick process, but 'gluster volume stop' followed by 'gluster volume start' does start that brick process.

Version-Release number of selected component (if applicable):
3.4.0.8rhs-1.el6rhs.x86_64

How reproducible:
always

Steps to Reproduce:
1. Had a cluster of 4 servers and a distribute (DHT) volume having 3 bricks:

[root@fred ~]# gluster v info sanity
 
Volume Name: sanity
Type: Distribute
Volume ID: f72df54d-410c-4f34-b181-65d8bd0cdcc4
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: fan.lab.eng.blr.redhat.com:/rhs/brick1/sanity
Brick2: mia.lab.eng.blr.redhat.com:/rhs/brick1/sanity
Brick3: fred.lab.eng.blr.redhat.com:/rhs/brick1/sanity
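(For reference, a distribute volume like this would typically have been created and started roughly as follows; this is a sketch reconstructed from the volume info above, not the exact commands used:)

gluster volume create sanity fan.lab.eng.blr.redhat.com:/rhs/brick1/sanity mia.lab.eng.blr.redhat.com:/rhs/brick1/sanity fred.lab.eng.blr.redhat.com:/rhs/brick1/sanity
gluster volume start sanity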

2. Detach servers and probe them again:
[root@mia ~]# gluster peer detach fred.lab.eng.blr.redhat.com
peer detach: failed: Brick(s) with the peer fred.lab.eng.blr.redhat.com exist in cluster
[root@mia ~]# gluster peer detach fred.lab.eng.blr.redhat.com force
peer detach: success
[root@mia ~]# gluster peer probe fred.lab.eng.blr.redhat.com 
peer probe: success

- Also detach server fan and add it back.
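(The exact commands for fan were not captured here; a sketch of the analogous sequence:)

gluster peer detach fan.lab.eng.blr.redhat.com force
gluster peer probe fan.lab.eng.blr.redhat.com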

[root@mia ~]# gluster peer status
Number of Peers: 4

Hostname: mia.lab.eng.blr.redhat.com
Uuid: 1698dc55-2245-4b20-9b8c-60fbe77a06ff
State: Peer in Cluster (Connected)

Hostname: fan.lab.eng.blr.redhat.com
Uuid: c6dfd028-d46f-4d20-a9c6-17c04e7fb919
State: Peer in Cluster (Connected)

Hostname: cutlass.lab.eng.blr.redhat.com
Uuid: 8969af20-77e0-41a5-bb8e-500d1a238f1b
State: Peer in Cluster (Connected)

Hostname: fred.lab.eng.blr.redhat.com
Port: 24007
Uuid: ababf76c-a741-4e27-a6bb-93da035d8fd7
State: Peer in Cluster (Connected)

3. Check gluster volume status:

[root@fred ~]# gluster v status

Status of volume: sanity
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick fan.lab.eng.blr.redhat.com:/rhs/brick1/sanity	N/A	N	4380
Brick mia.lab.eng.blr.redhat.com:/rhs/brick1/sanity	49154	Y	1623
Brick fred.lab.eng.blr.redhat.com:/rhs/brick1/sanity	N/A	N	N/A
NFS Server on localhost					2049	Y	4411
NFS Server on 8969af20-77e0-41a5-bb8e-500d1a238f1b	2049	Y	3549
NFS Server on 1698dc55-2245-4b20-9b8c-60fbe77a06ff	2049	Y	1632
NFS Server on c6dfd028-d46f-4d20-a9c6-17c04e7fb919	2049	Y	4386
 
There are no active volume tasks

Verify whether glusterfsd is running on that server - it is running:

[root@fan ~]# ps -aef | grep glusterfsd
root      1605     1  0 May14 ?        00:00:00 /usr/sbin/glusterfsd -s fan.lab.eng.blr.redhat.com --volfile-id sanity.fan.lab.eng.blr.redhat.com.rhs-brick1-sanity -p /var/lib/glusterd/vols/sanity/run/fan.lab.eng.blr.redhat.com-rhs-brick1-sanity.pid -S /var/run/5013ac74e2050c547e6087ce611cbe45.socket --brick-name /rhs/brick1/sanity -l /var/log/glusterfs/bricks/rhs-brick1-sanity.log --xlator-option *-posix.glusterd-uuid=c6dfd028-d46f-4d20-a9c6-17c04e7fb919 --brick-port 49154 --xlator-option sanity-server.listen-port=49154
root      1616     1  0 May14 ?        00:00:00 /usr/sbin/glusterfsd -s fan.lab.eng.blr.redhat.com --volfile-id t1.fan.lab.eng.blr.redhat.com.rhs-brick1-t1 -p /var/lib/glusterd/vols/t1/run/fan.lab.eng.blr.redhat.com-rhs-brick1-t1.pid -S /var/run/d221c6eaad62743f6a0336c357372761.socket --brick-name /rhs/brick1/t1 -l /var/log/glusterfs/bricks/rhs-brick1-t1.log --xlator-option *-posix.glusterd-uuid=c6dfd028-d46f-4d20-a9c6-17c04e7fb919 --brick-port 49155 --xlator-option t1-server.listen-port=49155
root      4464  3106  0 01:51 pts/0    00:00:00 grep glusterfsd


4. Try to start the volume forcefully in order to bring the brick processes online. It always fails with 'Commit failed':


[root@fan ~]# gluster volume start sanity force
volume start: sanity: failed: Commit failed on localhost. Please check the log file for more details.
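Since the error points at the log file, the glusterd log on the failing node can be inspected; a sketch, assuming the usual RHS glusterd log location (the file name may differ by release):

tail -n 50 /var/log/glusterfs/etc-glusterfs-glusterd.vol.log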


5. Stop the volume and start it again; the brick processes are online now.
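(A sketch of the workaround sequence; 'gluster volume stop' prompts for confirmation:)

gluster volume stop sanity
gluster volume start sanity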

[root@fan ~]# gluster volume status sanity
Status of volume: sanity
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick fan.lab.eng.blr.redhat.com:/rhs/brick1/sanity	49157	Y	4978
Brick mia.lab.eng.blr.redhat.com:/rhs/brick1/sanity	49154	Y	4846
Brick fred.lab.eng.blr.redhat.com:/rhs/brick1/sanity	49159	Y	4831
NFS Server on localhost					2049	Y	4988
NFS Server on 1698dc55-2245-4b20-9b8c-60fbe77a06ff	2049	Y	4856
NFS Server on ababf76c-a741-4e27-a6bb-93da035d8fd7	2049	Y	4842
NFS Server on 8969af20-77e0-41a5-bb8e-500d1a238f1b	2049	Y	3986
 
There are no active volume tasks

Actual results:
'gluster volume start <volname> force' fails with "Commit failed on localhost" and does not bring the brick processes online.

Expected results:
'start force' should bring the brick processes back and 'gluster volume status' should show them online.

Additional info:

Comment 3 Rachana Patel 2013-05-21 06:53:18 UTC
The log always says:

W [syncop.c:32:__run] 0-management: re-running already running task
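The occurrences can be located with something like the following, assuming the usual glusterd log path (it may differ by release):

grep "re-running already running task" /var/log/glusterfs/etc-glusterfs-glusterd.vol.log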

Comment 4 Nagaprasad Sathyanarayana 2014-05-06 11:43:43 UTC
Dev ack to 3.0 RHS BZs

Comment 5 Atin Mukherjee 2014-05-08 12:33:55 UTC
The second step mentioned here i.e. peer detach with force would fail as there has been a recent change in the peer detach functionality (http://review.gluster.org/5325) which would not allow a peer to be detached (even with force) if it holds a brick.
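With that change in place, the detach in step 2 would now be expected to be rejected even with force, along these lines (illustrative, not an actual run):

gluster peer detach fred.lab.eng.blr.redhat.com force
peer detach: failed: Brick(s) with the peer fred.lab.eng.blr.redhat.com exist in cluster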

Comment 7 Atin Mukherjee 2014-08-12 06:16:51 UTC
Hi Rachana,

Could you please try to reproduce this bug? I believe it is no longer valid based on comment 5.

~Atin

Comment 8 Atin Mukherjee 2014-08-26 06:35:48 UTC
Closing this bug; please re-open if it gets reproduced.

Comment 9 Sweta Anandpara 2018-01-15 08:12:46 UTC
Re-setting the needinfo to the current glusterd QE for the question asked in comment 7.

Comment 10 Red Hat Bugzilla 2023-09-14 01:44:11 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days