+++ This bug was initially created as a clone of Bug #1322772 +++
+++ This bug was initially created as a clone of Bug #1322765 +++

Description of problem:
After a node reboot, glusterd does not come up. The log shows the error:
"realpath () failed for brick /run/gluster/snaps/130949baac8843cda443cf8a6441157f/brick3/b3. The underlying file system may be in bad state [No such file or directory]"

Version-Release number of selected component (if applicable):
glusterfs-3.7.9-1.el7rhgs.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a 2 x 2 distributed-replicate volume
2. Enable uss
3. Create a snapshot and activate it
4. Reboot one of the nodes

Actual results:
glusterd is down after the node reboot

Expected results:
glusterd should come up after the node reboot

Additional info:

[root@dhcp46-4 ~]# gluster v info

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 60769503-f742-458d-97c0-8e090147f82a
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.46.4:/rhs/brick1/b1
Brick2: 10.70.47.46:/rhs/brick2/b2
Brick3: 10.70.46.213:/rhs/brick3/b3
Brick4: 10.70.46.148:/rhs/brick4/b4
Options Reconfigured:
performance.readdir-ahead: on
features.uss: enable
features.barrier: disable
snap-activate-on-create: enable

================================
glusterd logs from the rebooted node:

[2016-03-31 12:03:00.551394] C [MSGID: 106425] [glusterd-store.c:2425:glusterd_store_retrieve_bricks] 0-management: realpath () failed for brick /run/gluster/snaps/130949baac8843cda443cf8a6441157f/brick3/b3. The underlying file system may be in bad state [No such file or directory]
[2016-03-31 12:19:44.102994] I [rpc-clnt.c:984:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2016-03-31 12:19:44.106631] W [socket.c:870:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 15, Invalid argument
[2016-03-31 12:19:44.106676] E [socket.c:2966:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
The message "I [MSGID: 106498] [glusterd-handler.c:3640:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0" repeated 2 times between [2016-03-31 12:19:44.085669] and [2016-03-31 12:19:44.086167]
[2016-03-31 12:19:44.114223] C [MSGID: 106425] [glusterd-store.c:2425:glusterd_store_retrieve_bricks] 0-management: realpath () failed for brick /run/gluster/snaps/130949baac8843cda443cf8a6441157f/brick3/b3. The underlying file system may be in bad state [No such file or directory]
[2016-03-31 12:19:44.114364] E [MSGID: 106201] [glusterd-store.c:3082:glusterd_store_retrieve_volumes] 0-management: Unable to restore volume: 130949baac8843cda443cf8a6441157f
[2016-03-31 12:19:44.114387] E [MSGID: 106195] [glusterd-store.c:3475:glusterd_store_retrieve_snap] 0-management: Failed to retrieve snap volumes for snap snap1
[2016-03-31 12:19:44.114399] E [MSGID: 106043] [glusterd-store.c:3629:glusterd_store_retrieve_snaps] 0-management: Unable to restore snapshot: snap1
[2016-03-31 12:19:44.114509] E [MSGID: 101019] [xlator.c:433:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2016-03-31 12:19:44.114542] E [graph.c:322:glusterfs_graph_init] 0-management: initializing translator failed
[2016-03-31 12:19:44.114554] E [graph.c:661:glusterfs_graph_activate] 0-graph: init failed
[2016-03-31 12:19:44.115626] W [glusterfsd.c:1251:cleanup_and_exit] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xfd) [0x7fc632a1b2ad] -->/usr/sbin/glusterd(glusterfs_process_volfp+0x120) [0x7fc632a1b150] -->/usr/sbin/glusterd(cleanup_and_exit+0x69) [0x7fc632a1a739] ) 0-: received signum (0), shutting down

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-03-31 05:46:54 EDT ---

This bug is automatically being proposed for the current z-stream release of Red Hat Gluster Storage 3 by setting the release flag 'rhgs-3.1.z' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from RHEL Product and Program Management on 2016-03-31 06:02:19 EDT ---

This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

--- Additional comment from Vijay Bellur on 2016-03-31 06:18:46 EDT ---

REVIEW: http://review.gluster.org/13869 (glusterd: build realpath post recreate of brick mount for snapshot) posted (#1) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Vijay Bellur on 2016-04-01 06:16:10 EDT ---

REVIEW: http://review.gluster.org/13869 (glusterd: build realpath post recreate of brick mount for snapshot) posted (#2) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Vijay Bellur on 2016-04-04 00:53:45 EDT ---

REVIEW: http://review.gluster.org/13869 (glusterd: build realpath post recreate of brick mount for snapshot) posted (#3) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Vijay Bellur on 2016-04-04 03:07:51 EDT ---

REVIEW: http://review.gluster.org/13869 (glusterd: build realpath post recreate of brick mount for snapshot) posted (#4) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Vijay Bellur on 2016-04-04 11:19:52 EDT ---

REVIEW: http://review.gluster.org/13869 (glusterd: build realpath post recreate of brick mount for snapshot) posted (#5) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Vijay Bellur on 2016-04-05 06:46:51 EDT ---

COMMIT: http://review.gluster.org/13869 committed in master by Atin Mukherjee (amukherj)

------

commit d3c77459593255ed2c88094c8477b8a0c9ff9073
Author: Atin Mukherjee <amukherj>
Date:   Thu Mar 31 14:58:02 2016 +0530

    glusterd: build realpath post recreate of brick mount for snapshot

    Commit a60c39d introduced a new field called real_path in brickinfo to
    hold the realpath() conversion. However, on the restore path, for all
    snapshots and snapshot-restored volumes the brick path gets recreated
    after the bricks are restored, which means the realpath() call fails
    there for all snapshots and cloned volumes.

    The fix is to store the realpath for snapshots and clones after
    recreating the brick mounts. For a normal volume this is done while
    retrieving the brick details from the store.

    Change-Id: Ia34853acddb28bcb7f0f70ca85fabcf73276ef13
    BUG: 1322772
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/13869
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Avra Sengupta <asengupt>
    Reviewed-by: Rajesh Joseph <rjoseph>
    Smoke: Gluster Build System <jenkins.com>
REVIEW: http://review.gluster.org/13905 (glusterd: build realpath post recreate of brick mount for snapshot) posted (#1) for review on release-3.7 by Atin Mukherjee (amukherj)
COMMIT: http://review.gluster.org/13905 committed in release-3.7 by Vijay Bellur (vbellur)

------

commit 6bcae5cc8081697eca0ac72631e31327e1a786a9
Author: Atin Mukherjee <amukherj>
Date:   Thu Mar 31 14:58:02 2016 +0530

    glusterd: build realpath post recreate of brick mount for snapshot

    Backport of http://review.gluster.org/#/c/13869

    Commit a60c39d introduced a new field called real_path in brickinfo to
    hold the realpath() conversion. However, on the restore path, for all
    snapshots and snapshot-restored volumes the brick path gets recreated
    after the bricks are restored, which means the realpath() call fails
    there for all snapshots and cloned volumes.

    The fix is to store the realpath for snapshots and clones after
    recreating the brick mounts. For a normal volume this is done while
    retrieving the brick details from the store.

    Change-Id: Ia34853acddb28bcb7f0f70ca85fabcf73276ef13
    BUG: 1324014
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/13869
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Avra Sengupta <asengupt>
    Reviewed-by: Rajesh Joseph <rjoseph>
    Smoke: Gluster Build System <jenkins.com>
    Reviewed-on: http://review.gluster.org/13905
    Reviewed-by: Vijay Bellur <vbellur>
This bug is getting closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.7.11, please open a new bug report.

glusterfs-3.7.11 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-users/2016-April/026321.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user