Description of problem:

gluster wrongly reports bricks as online even when the brick path is not available.

Version-Release number of selected component (if applicable): mainline

After a node restart in a CNS cluster, not all of the brick paths listed in /var/lib/heketi/fstab get mounted:

$ cat fstab | wc -l
127
$ cat df_output | wc -l
86

But gluster volume status still shows the bricks as online:

$ cat vol_status | grep -i vol_117382c88c4337df0b0ee35a3cb7ca51 -A15
Status of volume: vol_117382c88c4337df0b0ee35a3cb7ca51
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.16.77.21:/var/lib/heketi/mounts/vg_809af91663966a9fd655d5955bc1ad31/brick_e9ab265ff2a9f7608e66e17a1e90cf3d/brick    49155  0  Y  8944
Brick 10.16.77.20:/var/lib/heketi/mounts/vg_0a7e1052758ea35c3a27b5842e14e8b4/brick_a28118e271db63e880e2ac5f06609617/brick    49153  0  Y  27905
Brick 10.16.77.23:/var/lib/heketi/mounts/vg_a7f22615f3be390d5f8648cbe32ed001/brick_c151ec7fb3a34b1a3daa361e127f5c76/brick    49152  0  Y  1075
Self-heal Daemon on localhost               N/A       N/A        Y       31715
Self-heal Daemon on 10.16.77.25             N/A       N/A        Y       30533
Self-heal Daemon on crp-prod-glusterfs02.srv.allianz                     N/A       N/A        Y       17016
--
Task Status of Volume vol_117382c88c4337df0b0ee35a3cb7ca51
------------------------------------------------------------------------------
There are no active volume tasks

However, when we run gluster volume heal <volname> info on these volumes, it reports "transport endpoint is not connected", which is correct, since the brick path is not available at all.

Note: for the volume taken as an example above, I have manually mounted the brick corresponding to node 10.16.77.23.

Brick log snippet:

~~~
[2018-06-01 08:54:28.356370] E [index.c:2342:init] 8-vol_117382c88c4337df0b0ee35a3cb7ca51-index: Failed to find index basepath /var/lib/heketi/mounts/vg_a7f22615f3be390d5f8648cbe32ed001/brick_c151ec7fb3a34b1a3daa361e127f5c76/brick/.glusterfs/indices.
[2018-06-01 08:54:28.356403] W [graph.c:1192:glusterfs_graph_attach] 0-glusterfs: failed to initialize graph for xlator /var/lib/heketi/mounts/vg_a7f22615f3be390d5f8648cbe32ed001/brick_c151ec7fb3a34b1a3daa361e127f5c76/brick
[2018-06-01 09:07:51.362685] I [glusterfsd-mgmt.c:864:glusterfs_handle_attach] 0-glusterfs: got attach for /var/lib/glusterd/vols/vol_117382c88c4337df0b0ee35a3cb7ca51/vol_117382c88c4337df0b0ee35a3cb7ca51.10.16.77.23.var-lib-heketi-mounts-vg_a7f22615f3be390d5f8648cbe32ed001-brick_c151ec7fb3a34b1a3daa361e127f5c76-brick.vol
[2018-06-01 09:07:51.370311] E [index.c:2342:init] 7-vol_117382c88c4337df0b0ee35a3cb7ca51-index: Failed to find index basepath /var/lib/heketi/mounts/vg_a7f22615f3be390d5f8648cbe32ed001/brick_c151ec7fb3a34b1a3daa361e127f5c76/brick/.glusterfs/indices.
[2018-06-01 09:07:51.370363] W [graph.c:1192:glusterfs_graph_attach] 0-glusterfs: failed to initialize graph for xlator /var/lib/heketi/mounts/vg_a7f22615f3be390d5f8648cbe32ed001/brick_c151ec7fb3a34b1a3daa361e127f5c76/brick
~~~
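For reference, the fstab-vs-mounted mismatch above can be enumerated with a short script. This is a minimal sketch, assuming the standard fstab layout (second field is the mount point) and a readable /proc/mounts; it is not part of the reported reproduction steps:

~~~
#!/bin/bash
# Sketch: list heketi-managed mount points that appear in /var/lib/heketi/fstab
# but are not actually mounted. Assumes the second fstab field is the mount point.

FSTAB=/var/lib/heketi/fstab

awk '!/^#/ && NF >= 2 {print $2}' "$FSTAB" | while read -r mnt; do
    # Check whether the mount point appears as a mounted filesystem in /proc/mounts
    if ! awk -v m="$mnt" '$2 == m {found=1} END {exit !found}' /proc/mounts; then
        echo "NOT MOUNTED: $mnt"
    fi
done
~~~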
COMMIT: https://review.gluster.org/20202 committed in master by "Atin Mukherjee" <amukherj> with a commit message-

glusterd: Add multiple checks before attach/start a brick

Problem: In a brick-mux scenario glusterd is sometimes not able to start/attach a brick, yet gluster v status shows the brick as already running.

Solution:
1) To make sure the brick is running, check for the brick_path in /proc/<pid>/fd; if the brick is consumed by the brick process, the brick stack has come up, otherwise it has not.
2) Before starting/attaching a brick, check whether the brick path is mounted.
3) At the time of printing volume status, check that the brick is consumed by a brick process.

Test: The following procedure was used:
1) Set up a brick-mux environment on a VM.
2) Put a breakpoint in gdb in the function posix_health_check_thread_proc, at the point where the GF_EVENT_CHILD_DOWN event is notified.
3) Forcefully unmount any one brick path.
4) Check gluster v status; it shows N/A for the brick.
5) Try to start the volume with the force option; glusterd throws the message "No device available for mount brick".
6) Mount the brick_root path.
7) Try to start the volume with the force option again.
8) The down brick is started successfully.

Change-Id: I91898dad21d082ebddd12aa0d1f7f0ed012bdf69
fixes: bz#1595320
Signed-off-by: Mohit Agrawal <moagrawa>
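The /proc/<pid>/fd check described in item 1 of the solution can also be verified by hand. Below is a rough shell sketch of that idea, not the actual glusterd code; the PID and brick path are just the example values from the volume status output above:

~~~
#!/bin/bash
# Sketch of the check in item 1: a brick counts as running only if the brick
# process actually holds the brick path open in /proc/<pid>/fd.

BRICK_PID=1075   # example PID from the "gluster v status" output above
BRICK_PATH=/var/lib/heketi/mounts/vg_a7f22615f3be390d5f8648cbe32ed001/brick_c151ec7fb3a34b1a3daa361e127f5c76/brick

if [ ! -d "/proc/$BRICK_PID" ]; then
    echo "brick process $BRICK_PID is not running"
elif ls -l "/proc/$BRICK_PID/fd" 2>/dev/null | grep -q -- "$BRICK_PATH"; then
    echo "brick path is consumed by PID $BRICK_PID -> brick stack is up"
else
    echo "PID $BRICK_PID is alive but does not hold $BRICK_PATH -> brick is not really up"
fi
~~~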
REVIEW: https://review.gluster.org/20651 (glusterd: more stricter checks of if brick is running in multiplex mode) posted (#1) for review on master by Atin Mukherjee
COMMIT: https://review.gluster.org/20651 committed in master by "Atin Mukherjee" <amukherj> with a commit message-

glusterd: more stricter checks of if brick is running in multiplex mode

While the gf_attach () utility, which the kill_brick () function in tests/volume.rc uses, can help in detaching a brick instance from the brick process, it has the following caveats:

1. It doesn't ensure that the respective brick is marked as stopped, which glusterd does from glusterd_brick_stop.
2. If kill_brick () is executed just after a brick stack comes up, mgmt_rpc_notify () can take some time before marking priv->connected to 1; if kill_brick () runs before that, the brick will fail to initiate the pmap_signout, which in turn cleans up the pidfile.

To avoid such possibilities, a stricter check of whether a brick is running in brick multiplexing has been brought in: it not only checks for the existence of the brick's pid but also checks that the respective process has the brick instance associated with it before reporting the brick's status.

Change-Id: I98b92df949076663b9686add7aab4ec2f24ad5ab
Fixes: bz#1595320
Signed-off-by: Atin Mukherjee <amukherj>
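The stricter check described above can be approximated from the shell as well: trust neither the pidfile nor the pid alone, and require that the process named by the pidfile actually has the brick path open. This is only an illustrative sketch; the pidfile location and brick path below are assumptions, not taken from the patch:

~~~
#!/bin/bash
# Sketch of a stricter "is this brick really running?" check, combining
# pidfile existence, process aliveness, and the /proc/<pid>/fd association.
# PIDFILE and BRICK_PATH are hypothetical example values.

PIDFILE=/var/lib/glusterd/vols/vol_117382c88c4337df0b0ee35a3cb7ca51/run/example-brick.pid  # hypothetical path
BRICK_PATH=/var/lib/heketi/mounts/vg_a7f22615f3be390d5f8648cbe32ed001/brick_c151ec7fb3a34b1a3daa361e127f5c76/brick

brick_is_running() {
    local pid
    [ -r "$PIDFILE" ] || return 1              # no pidfile -> not running
    pid=$(cat "$PIDFILE")
    kill -0 "$pid" 2>/dev/null || return 1     # pid from pidfile not alive -> not running
    # pid is alive, but is the brick instance actually attached to this process?
    ls -l "/proc/$pid/fd" 2>/dev/null | grep -q -- "$BRICK_PATH"
}

if brick_is_running; then
    echo "brick is running"
else
    echo "brick is not running"
fi
~~~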
This bug is getting closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-5.0, please open a new bug report.

glusterfs-5.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-October/000115.html
[2] https://www.gluster.org/pipermail/gluster-users/