Description of problem:
=======================
Had a four-node cluster with a distributed-replicate volume. Started a volume set loop on one node and a glusterd restart loop on another node in the cluster. After some time, glusterd stopped running on the node where the restarts were happening.

Version-Release number of selected component (if applicable):
==============================================================
glusterfs-3.7.5-19

How reproducible:
=================
Seen once

Steps to Reproduce:
===================
1. Have a four-node cluster with a distributed-replicate volume (2x2).
2. On node-1, execute:
   for i in `seq 1 100`; do gluster volume set Dis-Rep stat-prefetch on; sleep 2; gluster volume set Dis-Rep stat-prefetch off; done
3. On node-2, execute:
   for i in `seq 1 100`; do systemctl stop glusterd; systemctl start glusterd; gluster volume info; done

Actual results:
===============
glusterd was not running on node-2, where the restarts were happening.

Expected results:
=================
glusterd should keep running and there should be no errors in the glusterd logs.

Additional info:
This issue was hit as part of negative testing: while gluster volume set was being executed, glusterd on another node was brought down at the same time. On the faulty node we could see that the /var/lib/glusterd/vols/<volname>/info file was empty whereas the info.tmp file had the correct contents. This indicates that the rename from the .tmp file to the actual one, done while committing to the glusterd store, failed. Given that rename is a syscall and is supposed to be atomic, this shouldn't have happened. Further analysis to follow.
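To make the failure mode concrete, here is a minimal C sketch of the write-to-.tmp-then-rename() commit pattern described above. It is not the actual glusterd store code; the store_commit() helper, the demo directory and the file contents are illustrative assumptions.

/*
 * Minimal sketch of the "write to info.tmp, then rename() over info" commit
 * pattern described above. NOT the actual glusterd store code; the
 * store_commit() helper, directory and contents are illustrative only.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

static int store_commit(const char *dir, const char *name,
                        const char *buf, size_t len)
{
    char tmp[4096], final_path[4096];
    int fd;

    snprintf(tmp, sizeof(tmp), "%s/%s.tmp", dir, name);
    snprintf(final_path, sizeof(final_path), "%s/%s", dir, name);

    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* New contents go into the temporary file first. At this point the data
     * may still sit only in the page cache, not on disk. */
    if (write(fd, buf, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    close(fd);

    /* rename() is atomic with respect to the namespace: a reader sees either
     * the old "info" or the new one, never a half-renamed path. It does not
     * by itself guarantee that the tmp file's data blocks have reached disk;
     * that would need an fsync() of the file (and its directory) first. */
    return rename(tmp, final_path);
}

int main(void)
{
    const char *contents = "type=2\ncount=4\n";   /* illustrative only */

    mkdir("/tmp/glusterd-store-demo", 0700);      /* ignore EEXIST */
    return store_commit("/tmp/glusterd-store-demo", "info",
                        contents, strlen(contents)) ? 1 : 0;
}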
man 2 rename has the following notes:

"If newpath exists but the operation fails for some reason, rename() guarantees to leave an instance of newpath in place."

"When overwriting there will probably be a window in which both oldpath and newpath refer to the file being renamed."

Considering the above two points, there is a possible window where, while the glusterd instance on the node is being brought down, the rename() of the info.tmp file to info is still in progress; cleanup_and_exit() forcibly takes the process down, terminating the rename operation in the middle and leaving both files in place, with info.tmp containing all the content and info zeroed out. If we can ensure that in cleanup_and_exit() (GlusterFS as a whole) all threads first finish processing their tasks and only then a graceful shutdown happens, we should be able to take care of such problems. Since this is a negative test, and glusterd going down while a transaction is in flight is rare, lowering the priority and severity.
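As an illustration of the graceful-shutdown idea above, here is a minimal C sketch in which the exit path waits for the in-flight commit worker to finish before the process terminates, instead of exiting while a rename() may still be in progress. This is not glusterd's cleanup_and_exit(); the commit_worker() and shutdown_requested names are hypothetical.

/*
 * Minimal sketch of the graceful-shutdown idea: let the worker finish its
 * current store commit before the process exits. Not glusterd's
 * cleanup_and_exit(); commit_worker() and shutdown_requested are hypothetical.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static atomic_int shutdown_requested;

static void *commit_worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&shutdown_requested)) {
        /* ... write info.tmp and rename() it over info here ... */
        usleep(100 * 1000);
    }
    /* Complete (or cleanly abandon) the current commit before returning, so
     * the store is never left half-written when the process goes away. */
    return NULL;
}

int main(void)
{
    pthread_t worker;

    pthread_create(&worker, NULL, commit_worker, NULL);

    sleep(1);                              /* simulate the daemon running */
    atomic_store(&shutdown_requested, 1);  /* shutdown requested (e.g. SIGTERM) */

    /* Graceful variant: wait for the worker to drain its in-flight work
     * rather than calling exit() immediately. */
    pthread_join(worker, NULL);
    printf("clean shutdown\n");
    return 0;
}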
I don't think we need to spend much time fixing this kind of issue, as this is very much a negative test and unlikely in a production setup. I am closing this bug as Won't Fix. Feel free to reopen with proper justification.
Created attachment 1219638 [details]
the vols directory

The attachment is the vols directory from the node where glusterd can't be started.
Hi Atin Mukherjee,

I am facing a very similar issue in glusterfs 3.7.6, as you know. When I start glusterd an error occurs; the log is as follows:

[2016-11-08 07:58:34.989365] I [MSGID: 100030] [glusterfsd.c:2318:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.7.6 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
[2016-11-08 07:58:34.998356] I [MSGID: 106478] [glusterd.c:1350:init] 0-management: Maximum allowed open file descriptors set to 65536
[2016-11-08 07:58:35.000667] I [MSGID: 106479] [glusterd.c:1399:init] 0-management: Using /system/glusterd as working directory
[2016-11-08 07:58:35.024508] I [MSGID: 106514] [glusterd-store.c:2075:glusterd_restore_op_version] 0-management: Upgrade detected. Setting op-version to minimum : 1
[2016-11-08 07:58:35.025356] E [MSGID: 106206] [glusterd-store.c:2562:glusterd_store_update_volinfo] 0-management: Failed to get next store iter
[2016-11-08 07:58:35.025401] E [MSGID: 106207] [glusterd-store.c:2844:glusterd_store_retrieve_volume] 0-management: Failed to update volinfo for c_glusterfs volume
[2016-11-08 07:58:35.025463] E [MSGID: 106201] [glusterd-store.c:3042:glusterd_store_retrieve_volumes] 0-management: Unable to restore volume: c_glusterfs
[2016-11-08 07:58:35.025544] E [MSGID: 101019] [xlator.c:428:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2016-11-08 07:58:35.025582] E [graph.c:322:glusterfs_graph_init] 0-management: initializing translator failed
[2016-11-08 07:58:35.025629] E [graph.c:661:glusterfs_graph_activate] 0-graph: init failed
[2016-11-08 07:58:35.026109] W [glusterfsd.c:1236:cleanup_and_exit] (-->/usr/sbin/glusterd(glusterfs_volumes_init-0x1b260) [0x1000a718] -->/usr/sbin/glusterd(glusterfs_process_volfp-0x1b3b8) [0x1000a5a8] -->/usr/sbin/glusterd(cleanup_and_exit-0x1c02c) [0x100098bc] ) 0-: received signum (0), shutting down

I then found that the size of vols/volume_name/info is 0, which causes glusterd to shut down, while vols/volume_name/info.tmp is not 0. I also found a brick file, vols/volume_name/bricks/xxxx.brick, whose size is 0, while vols/volume_name/bricks/xxxx.brick.tmp is not 0.

In comment 2 you said: "This issue was hit as part of negative testing: while gluster volume set was being executed, glusterd on another node was brought down at the same time. On the faulty node we could see that the /var/lib/glusterd/vols/<volname>/info file was empty whereas the info.tmp file had the correct contents."

I have two questions for you:
1. Could you reproduce this issue by running gluster volume set while glusterd was being brought down?
2. Are you certain that this issue is caused by rename() being interrupted in the kernel? In my case two files, info and 10.32.1.144.-opt-lvmdir-c2-brick, are both empty, but in my view only one rename can be running at a time. Why are two files empty? Or are rename("info.tmp", "info") and rename("xxx-brick.tmp", "xxx-brick") running in two threads?

I have added the vols directory as an attachment.

Thanks,
Xin
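A hypothetical diagnostic sketch (not a GlusterFS tool, added for illustration) that reports the state described in this comment: a zero-length store file whose .tmp counterpart still holds data. The check_store_file() helper is an assumption; the example paths come from the log and file names above.

/*
 * Hypothetical diagnostic sketch: report whether a store file is zero-length
 * while its ".tmp" counterpart still holds data, the state seen here for
 * both the info file and the 10.32.1.144.-opt-lvmdir-c2-brick file.
 * check_store_file() is an assumed name, not a GlusterFS function.
 */
#include <stdio.h>
#include <sys/stat.h>

static void check_store_file(const char *path)
{
    char tmp_path[4096];
    struct stat st, st_tmp;

    snprintf(tmp_path, sizeof(tmp_path), "%s.tmp", path);

    if (stat(path, &st) != 0) {
        printf("%s: missing\n", path);
        return;
    }
    if (st.st_size > 0) {
        printf("%s: OK (%lld bytes)\n", path, (long long)st.st_size);
        return;
    }
    if (stat(tmp_path, &st_tmp) == 0 && st_tmp.st_size > 0)
        printf("%s: EMPTY, but %s holds %lld bytes (interrupted commit?)\n",
               path, tmp_path, (long long)st_tmp.st_size);
    else
        printf("%s: EMPTY, and no usable .tmp copy found\n", path);
}

int main(int argc, char **argv)
{
    /* Example:
     *   ./check_store /system/glusterd/vols/c_glusterfs/info \
     *                 /system/glusterd/vols/c_glusterfs/bricks/10.32.1.144.-opt-lvmdir-c2-brick
     */
    for (int i = 1; i < argc; i++)
        check_store_file(argv[i]);
    return 0;
}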