Description of problem:
glusterd: 'gluster volume start <volname> force' is unable to start the brick process and fails, but 'gluster volume stop' followed by 'gluster volume start' does start that brick process.

Version-Release number of selected component (if applicable):
3.4.0.8rhs-1.el6rhs.x86_64

How reproducible:
always

Steps to Reproduce:
1. Have a cluster of 4 servers and a volume (DHT) with 3 bricks.

[root@fred ~]# gluster v info sanity

Volume Name: sanity
Type: Distribute
Volume ID: f72df54d-410c-4f34-b181-65d8bd0cdcc4
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: fan.lab.eng.blr.redhat.com:/rhs/brick1/sanity
Brick2: mia.lab.eng.blr.redhat.com:/rhs/brick1/sanity
Brick3: fred.lab.eng.blr.redhat.com:/rhs/brick1/sanity

2. Detach servers and probe them again.

[root@mia ~]# gluster peer detach fred.lab.eng.blr.redhat.com
peer detach: failed: Brick(s) with the peer fred.lab.eng.blr.redhat.com exist in cluster
[root@mia ~]# gluster peer detach fred.lab.eng.blr.redhat.com force
peer detach: success
[root@mia ~]# gluster peer probe fred.lab.eng.blr.redhat.com
peer probe: success

- Also detach server fan and add it back.

[root@mia ~]# gluster peer status
Number of Peers: 4

Hostname: mia.lab.eng.blr.redhat.com
Uuid: 1698dc55-2245-4b20-9b8c-60fbe77a06ff
State: Peer in Cluster (Connected)

Hostname: fan.lab.eng.blr.redhat.com
Uuid: c6dfd028-d46f-4d20-a9c6-17c04e7fb919
State: Peer in Cluster (Connected)

Hostname: cutlass.lab.eng.blr.redhat.com
Uuid: 8969af20-77e0-41a5-bb8e-500d1a238f1b
State: Peer in Cluster (Connected)

Hostname: fred.lab.eng.blr.redhat.com
Port: 24007
Uuid: ababf76c-a741-4e27-a6bb-93da035d8fd7
State: Peer in Cluster (Connected)

3. Check gluster volume status.

[root@fred ~]# gluster v status
Status of volume: sanity
Gluster process                                          Port    Online  Pid
------------------------------------------------------------------------------
Brick fan.lab.eng.blr.redhat.com:/rhs/brick1/sanity      N/A     N       4380
Brick mia.lab.eng.blr.redhat.com:/rhs/brick1/sanity      49154   Y       1623
Brick fred.lab.eng.blr.redhat.com:/rhs/brick1/sanity     N/A     N       N/A
NFS Server on localhost                                  2049    Y       4411
NFS Server on 8969af20-77e0-41a5-bb8e-500d1a238f1b       2049    Y       3549
NFS Server on 1698dc55-2245-4b20-9b8c-60fbe77a06ff       2049    Y       1632
NFS Server on c6dfd028-d46f-4d20-a9c6-17c04e7fb919       2049    Y       4386

There are no active volume tasks

Verify whether glusterfsd is running on that server - it is running:

[root@fan ~]# ps -aef | grep glusterfsd
root      1605     1  0 May14 ?        00:00:00 /usr/sbin/glusterfsd -s fan.lab.eng.blr.redhat.com --volfile-id sanity.fan.lab.eng.blr.redhat.com.rhs-brick1-sanity -p /var/lib/glusterd/vols/sanity/run/fan.lab.eng.blr.redhat.com-rhs-brick1-sanity.pid -S /var/run/5013ac74e2050c547e6087ce611cbe45.socket --brick-name /rhs/brick1/sanity -l /var/log/glusterfs/bricks/rhs-brick1-sanity.log --xlator-option *-posix.glusterd-uuid=c6dfd028-d46f-4d20-a9c6-17c04e7fb919 --brick-port 49154 --xlator-option sanity-server.listen-port=49154
root      1616     1  0 May14 ?        00:00:00 /usr/sbin/glusterfsd -s fan.lab.eng.blr.redhat.com --volfile-id t1.fan.lab.eng.blr.redhat.com.rhs-brick1-t1 -p /var/lib/glusterd/vols/t1/run/fan.lab.eng.blr.redhat.com-rhs-brick1-t1.pid -S /var/run/d221c6eaad62743f6a0336c357372761.socket --brick-name /rhs/brick1/t1 -l /var/log/glusterfs/bricks/rhs-brick1-t1.log --xlator-option *-posix.glusterd-uuid=c6dfd028-d46f-4d20-a9c6-17c04e7fb919 --brick-port 49155 --xlator-option t1-server.listen-port=49155
root      4464  3106  0 01:51 pts/0    00:00:00 grep glusterfsd
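A minimal cross-check at this point, reusing the pidfile path visible in the ps output above, is to compare the PID glusterd recorded for the brick with the process actually running; a mismatch would suggest glusterd's view of the brick is stale:

# Pidfile path reused from the ps output above (sanity brick on fan);
# illustrative check, not part of the original run.
cat /var/lib/glusterd/vols/sanity/run/fan.lab.eng.blr.redhat.com-rhs-brick1-sanity.pid
# Confirm a live glusterfsd sits behind that PID.
ps -p "$(cat /var/lib/glusterd/vols/sanity/run/fan.lab.eng.blr.redhat.com-rhs-brick1-sanity.pid)"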
4. Try to start the volume forcefully in order to bring the brick process online. It always fails with "Commit failed":

[root@fan ~]# gluster volume start sanity force
volume start: sanity: failed: Commit failed on localhost. Please check the log file for more details.

5. Stop the volume and start it again. The brick processes are online now:

[root@fan ~]# gluster volume status sanity
Status of volume: sanity
Gluster process                                          Port    Online  Pid
------------------------------------------------------------------------------
Brick fan.lab.eng.blr.redhat.com:/rhs/brick1/sanity      49157   Y       4978
Brick mia.lab.eng.blr.redhat.com:/rhs/brick1/sanity      49154   Y       4846
Brick fred.lab.eng.blr.redhat.com:/rhs/brick1/sanity     49159   Y       4831
NFS Server on localhost                                  2049    Y       4988
NFS Server on 1698dc55-2245-4b20-9b8c-60fbe77a06ff       2049    Y       4856
NFS Server on ababf76c-a741-4e27-a6bb-93da035d8fd7       2049    Y       4842
NFS Server on 8969af20-77e0-41a5-bb8e-500d1a238f1b       2049    Y       3986

There are no active volume tasks

Actual results:
'gluster volume start <volname> force' fails with "Commit failed" and the brick processes stay offline; only a full 'gluster volume stop' followed by 'gluster volume start' brings them back.

Expected results:
'gluster volume start <volname> force' should bring the brick processes back and 'gluster volume status' should show them online.
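For reference, the step-5 workaround spelled out as plain commands, using the volume name from this report; 'gluster volume stop' normally prompts for confirmation, which --mode=script suppresses:

# Workaround sketch based on step 5 above; volume name taken from this report.
gluster --mode=script volume stop sanity
gluster volume start sanity
gluster volume status sanity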
Additional info:
The glusterd log always shows:
W [syncop.c:32:__run] 0-management: re-running already running task
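One way to pull both the commit failure and this warning out of the glusterd log, assuming the default log location on this release:

# Assumes the default glusterd log path; adjust if logging was relocated.
grep -E 'Commit failed|re-running already running task' \
    /var/log/glusterfs/etc-glusterfs-glusterd.vol.log | tail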
Dev ack to 3.0 RHS BZs
The second step mentioned here, i.e. peer detach with force, would now fail, as there has been a recent change to the peer detach functionality (http://review.gluster.org/5325) that does not allow a peer to be detached (even with force) if it holds a brick.
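To illustrate the changed flow, a sketch of how a brick-holding peer would now have to be handled before detaching (hostnames and volume reused from this report; remove-brick on a distribute volume migrates data via start/status/commit):

# Sketch of the flow enforced by http://review.gluster.org/5325: bricks on
# the peer must be removed before 'gluster peer detach' can succeed.
gluster volume remove-brick sanity fred.lab.eng.blr.redhat.com:/rhs/brick1/sanity start
gluster volume remove-brick sanity fred.lab.eng.blr.redhat.com:/rhs/brick1/sanity status
gluster volume remove-brick sanity fred.lab.eng.blr.redhat.com:/rhs/brick1/sanity commit
gluster peer detach fred.lab.eng.blr.redhat.com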
Hi Rachana, can you please try to reproduce this bug? I believe it is no longer valid based on comment 5. ~Atin
Closing this bug; please re-open if it gets reproduced.
Resetting the needinfo to the current glusterd QE for the question asked in comment 7.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days