Description of problem:
glusterd: 'gluster volume start <volname> force' is unable to start the brick process and fails, but 'gluster volume stop' followed by 'gluster volume start' does start that brick process.

Version-Release number of selected component (if applicable):
3.4.0.8rhs-1.el6rhs.x86_64

How reproducible:
always

Steps to Reproduce:
1. Have a cluster of 4 servers and a volume (DHT) with 3 bricks.

[root@fred ~]# gluster v info sanity

Volume Name: sanity
Type: Distribute
Volume ID: f72df54d-410c-4f34-b181-65d8bd0cdcc4
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: fan.lab.eng.blr.redhat.com:/rhs/brick1/sanity
Brick2: mia.lab.eng.blr.redhat.com:/rhs/brick1/sanity
Brick3: fred.lab.eng.blr.redhat.com:/rhs/brick1/sanity

2. Detach servers and probe them again.

[root@mia ~]# gluster peer detach fred.lab.eng.blr.redhat.com
peer detach: failed: Brick(s) with the peer fred.lab.eng.blr.redhat.com exist in cluster
[root@mia ~]# gluster peer detach fred.lab.eng.blr.redhat.com force
peer detach: success
[root@mia ~]# gluster peer probe fred.lab.eng.blr.redhat.com
peer probe: success

- Also detach server fan and add it back.

[root@mia ~]# gluster peer status
Number of Peers: 4

Hostname: mia.lab.eng.blr.redhat.com
Uuid: 1698dc55-2245-4b20-9b8c-60fbe77a06ff
State: Peer in Cluster (Connected)

Hostname: fan.lab.eng.blr.redhat.com
Uuid: c6dfd028-d46f-4d20-a9c6-17c04e7fb919
State: Peer in Cluster (Connected)

Hostname: cutlass.lab.eng.blr.redhat.com
Uuid: 8969af20-77e0-41a5-bb8e-500d1a238f1b
State: Peer in Cluster (Connected)

Hostname: fred.lab.eng.blr.redhat.com
Port: 24007
Uuid: ababf76c-a741-4e27-a6bb-93da035d8fd7
State: Peer in Cluster (Connected)

3. Check gluster volume status.

[root@fred ~]# gluster v status
Status of volume: sanity
Gluster process                                          Port    Online  Pid
------------------------------------------------------------------------------
Brick fan.lab.eng.blr.redhat.com:/rhs/brick1/sanity      N/A     N       4380
Brick mia.lab.eng.blr.redhat.com:/rhs/brick1/sanity      49154   Y       1623
Brick fred.lab.eng.blr.redhat.com:/rhs/brick1/sanity     N/A     N       N/A
NFS Server on localhost                                  2049    Y       4411
NFS Server on 8969af20-77e0-41a5-bb8e-500d1a238f1b       2049    Y       3549
NFS Server on 1698dc55-2245-4b20-9b8c-60fbe77a06ff       2049    Y       1632
NFS Server on c6dfd028-d46f-4d20-a9c6-17c04e7fb919       2049    Y       4386

There are no active volume tasks

Verify whether glusterfsd is running on that server - it is running:

[root@fan ~]# ps -aef | grep glusterfsd
root      1605     1  0 May14 ?        00:00:00 /usr/sbin/glusterfsd -s fan.lab.eng.blr.redhat.com --volfile-id sanity.fan.lab.eng.blr.redhat.com.rhs-brick1-sanity -p /var/lib/glusterd/vols/sanity/run/fan.lab.eng.blr.redhat.com-rhs-brick1-sanity.pid -S /var/run/5013ac74e2050c547e6087ce611cbe45.socket --brick-name /rhs/brick1/sanity -l /var/log/glusterfs/bricks/rhs-brick1-sanity.log --xlator-option *-posix.glusterd-uuid=c6dfd028-d46f-4d20-a9c6-17c04e7fb919 --brick-port 49154 --xlator-option sanity-server.listen-port=49154
root      1616     1  0 May14 ?        00:00:00 /usr/sbin/glusterfsd -s fan.lab.eng.blr.redhat.com --volfile-id t1.fan.lab.eng.blr.redhat.com.rhs-brick1-t1 -p /var/lib/glusterd/vols/t1/run/fan.lab.eng.blr.redhat.com-rhs-brick1-t1.pid -S /var/run/d221c6eaad62743f6a0336c357372761.socket --brick-name /rhs/brick1/t1 -l /var/log/glusterfs/bricks/rhs-brick1-t1.log --xlator-option *-posix.glusterd-uuid=c6dfd028-d46f-4d20-a9c6-17c04e7fb919 --brick-port 49155 --xlator-option t1-server.listen-port=49155
root      4464  3106  0 01:51 pts/0    00:00:00 grep glusterfsd
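A minimal cross-check at this point, reusing the pidfile path visible in the ps output above, is to compare the PID glusterd recorded for the brick with the process actually running; a mismatch would suggest glusterd's view of the brick is stale:

# Pidfile path reused from the ps output above (sanity brick on fan);
# illustrative check, not part of the original run.
cat /var/lib/glusterd/vols/sanity/run/fan.lab.eng.blr.redhat.com-rhs-brick1-sanity.pid
# Confirm a live glusterfsd sits behind that PID.
ps -p "$(cat /var/lib/glusterd/vols/sanity/run/fan.lab.eng.blr.redhat.com-rhs-brick1-sanity.pid)"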
4. Try to start the volume forcefully in order to bring the brick process online. It always fails with "Commit failed":

[root@fan ~]# gluster volume start sanity force
volume start: sanity: failed: Commit failed on localhost. Please check the log file for more details.

5. Stop the volume and start it again. The brick processes are online now:

[root@fan ~]# gluster volume status sanity
Status of volume: sanity
Gluster process                                          Port    Online  Pid
------------------------------------------------------------------------------
Brick fan.lab.eng.blr.redhat.com:/rhs/brick1/sanity      49157   Y       4978
Brick mia.lab.eng.blr.redhat.com:/rhs/brick1/sanity      49154   Y       4846
Brick fred.lab.eng.blr.redhat.com:/rhs/brick1/sanity     49159   Y       4831
NFS Server on localhost                                  2049    Y       4988
NFS Server on 1698dc55-2245-4b20-9b8c-60fbe77a06ff       2049    Y       4856
NFS Server on ababf76c-a741-4e27-a6bb-93da035d8fd7       2049    Y       4842
NFS Server on 8969af20-77e0-41a5-bb8e-500d1a238f1b       2049    Y       3986

There are no active volume tasks

Actual results:
'gluster volume start <volname> force' fails with "Commit failed" and the brick processes stay offline; only a full 'gluster volume stop' followed by 'gluster volume start' brings them back.

Expected results:
'gluster volume start <volname> force' should bring the brick processes back and 'gluster volume status' should show them online.
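For reference, the step-5 workaround spelled out as plain commands, using the volume name from this report; 'gluster volume stop' normally prompts for confirmation, which --mode=script suppresses:

# Workaround sketch based on step 5 above; volume name taken from this report.
gluster --mode=script volume stop sanity
gluster volume start sanity
gluster volume status sanity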
Additional info:
The glusterd log always shows:
W [syncop.c:32:__run] 0-management: re-running already running task
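One way to pull both the commit failure and this warning out of the glusterd log, assuming the default log location on this release:

# Assumes the default glusterd log path; adjust if logging was relocated.
grep -E 'Commit failed|re-running already running task' \
    /var/log/glusterfs/etc-glusterfs-glusterd.vol.log | tail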
Dev ack to 3.0 RHS BZs
The second step mentioned here, i.e. peer detach with force, would now fail, as there has been a recent change to the peer detach functionality (http://review.gluster.org/5325) that does not allow a peer to be detached (even with force) if it holds a brick.
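To illustrate the changed flow, a sketch of how a brick-holding peer would now have to be handled before detaching (hostnames and volume reused from this report; remove-brick on a distribute volume migrates data via start/status/commit):

# Sketch of the flow enforced by http://review.gluster.org/5325: bricks on
# the peer must be removed before 'gluster peer detach' can succeed.
gluster volume remove-brick sanity fred.lab.eng.blr.redhat.com:/rhs/brick1/sanity start
gluster volume remove-brick sanity fred.lab.eng.blr.redhat.com:/rhs/brick1/sanity status
gluster volume remove-brick sanity fred.lab.eng.blr.redhat.com:/rhs/brick1/sanity commit
gluster peer detach fred.lab.eng.blr.redhat.com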
Hi Rachana, can you please try to reproduce this bug? I believe it is no longer valid based on comment 5. ~Atin
Closing this bug; please re-open if it gets reproduced.
Resetting the needinfo to the current glusterd QE for the question asked in comment 7.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days