+++ This bug was initially created as a clone of Bug #1330385 +++

Description of problem:
=======================
glusterd restart fails if a volume brick is down because the underlying file system (XFS) crashed.

Version-Release number of selected component (if applicable):
=============================================================
mainline

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Have a one- or two-node cluster.
2. Create a 1x2 volume and start it.
3. Crash the underlying file system of one of the volume bricks, using the "godown" tool OR any other way.
4. Check that the brick is down using "gluster volume status".
5. Try to restart glusterd. // the restart will fail

Actual results:
===============
glusterd restart fails if a volume brick is down due to an FS crash.

Expected results:
=================
glusterd restart should work.

Additional info:

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-04-26 01:57:25 EDT ---

This bug is automatically being proposed for the current z-stream release of Red Hat Gluster Storage 3 by setting the release flag 'rhgs-3.1.z' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.
--- Additional comment from Byreddy on 2016-04-26 02:13:45 EDT ---

Additional info:
================
[root@dhcp42-82 ~]# gluster volume status
Status of volume: Dis
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.42.82:/bricks/brick2/br2        49155     0          Y       23291
Brick 10.70.42.82:/bricks/brick1/br1        N/A       N/A        N       N/A
Brick 10.70.42.82:/bricks/brick2/br3        49154     0          Y       23329
NFS Server on localhost                     2049      0          Y       23354
NFS Server on dhcp43-136.lab.eng.blr.redhat
.com                                        2049      0          Y       8049

Task Status of Volume Dis
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp42-82 ~]#
[root@dhcp42-82 ~]# systemctl restart glusterd
Job for glusterd.service failed because the control process exited with error code. See "systemctl status glusterd.service" and "journalctl -xe" for details.
[root@dhcp42-82 ~]#

glusterd logs:
==============
pid --log-level INFO)
[2016-04-26 06:08:47.439960] I [MSGID: 106478] [glusterd.c:1337:init] 0-management: Maximum allowed open file descriptors set to 65536
[2016-04-26 06:08:47.440044] I [MSGID: 106479] [glusterd.c:1386:init] 0-management: Using /var/lib/glusterd as working directory
[2016-04-26 06:08:47.453605] W [MSGID: 103071] [rdma.c:4594:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
[2016-04-26 06:08:47.453658] W [MSGID: 103055] [rdma.c:4901:init] 0-rdma.management: Failed to initialize IB Device
[2016-04-26 06:08:47.453677] W [rpc-transport.c:359:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2016-04-26 06:08:47.453885] W [rpcsvc.c:1597:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
[2016-04-26 06:08:47.453924] E [MSGID: 106243] [glusterd.c:1610:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport
[2016-04-26 06:08:52.512606] I [MSGID: 106513]
[glusterd-store.c:2065:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 30712
[2016-04-26 06:08:53.671078] I [MSGID: 106544] [glusterd.c:159:glusterd_uuid_init] 0-management: retrieved UUID: eac322e5-ef82-47db-b88b-2449c0164482
[2016-04-26 06:08:53.671466] C [MSGID: 106425] [glusterd-store.c:2434:glusterd_store_retrieve_bricks] 0-management: realpath() failed for brick /bricks/brick1/br1. The underlying file system may be in bad state [Input/output error]
[2016-04-26 06:08:53.671847] E [MSGID: 106201] [glusterd-store.c:3092:glusterd_store_retrieve_volumes] 0-management: Unable to restore volume: Dis
[2016-04-26 06:08:53.671888] E [MSGID: 101019] [xlator.c:433:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2016-04-26 06:08:53.671900] E [graph.c:322:glusterfs_graph_init] 0-management: initializing translator failed
[2016-04-26 06:08:53.671907] E [graph.c:661:glusterfs_graph_activate] 0-graph: init failed
[2016-04-26 06:08:53.672475] W [glusterfsd.c:1251:cleanup_and_exit] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xfd) [0x7fe6e9e2b2ad] -->/usr/sbin/glusterd(glusterfs_process_volfp+0x120) [0x7fe6e9e2b150] -->/usr/sbin/glusterd(cleanup_and_exit+0x69) [0x7fe6e9e2a739] ) 0-: received signum (0), shutting down
(END)
REVIEW: http://review.gluster.org/14075 (glusterd: glusterd should restart on a underlying file system crash) posted (#1) for review on master by Atin Mukherjee (amukherj)
REVIEW: http://review.gluster.org/14075 (glusterd: glusterd should restart on a underlying file system crash) posted (#2) for review on master by Atin Mukherjee (amukherj)
REVIEW: http://review.gluster.org/14075 (glusterd: persist brickinfo->real_path) posted (#3) for review on master by Atin Mukherjee (amukherj)
COMMIT: http://review.gluster.org/14075 committed in master by Jeff Darcy (jdarcy)
------
commit f0fb05d2cefae08c143f2bfdef151084f5ddb498
Author: Atin Mukherjee <amukherj>
Date:   Tue Apr 26 15:27:43 2016 +0530

    glusterd: persist brickinfo->real_path

    Since real_path was not persisted and gets constructed at every
    glusterd restart, glusterd will fail to come up if one of the
    brick's underlying file system is crashed.

    Solution is to construct real_path only once and get it persisted.

    Change-Id: I97abc30372c1ffbbb2d43b716d7af09172147b47
    BUG: 1330481
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/14075
    CentOS-regression: Gluster Build System <jenkins.com>
    Smoke: Gluster Build System <jenkins.com>
    Reviewed-by: Kaushal M <kaushal>
    NetBSD-regression: NetBSD Build System <jenkins.org>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user