Description of problem:
=======================
The following error messages appear in the glusterd log when a node is detached (deprobed) from the cluster.

[2016-10-12 07:31:07.381464] E [MSGID: 106029] [glusterd-utils.c:7767:glusterd_check_files_identical] 0-management: stat on file: /var/lib/glusterd//-server.vol failed (No such file or directory) [No such file or directory]
[2016-10-12 07:31:07.381736] E [MSGID: 106570] [glusterd-utils.c:7196:glusterd_friend_remove_cleanup_vols] 0-management: Failed to reconfigure all daemon services.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-2

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Create a two-node (n1 and n2) cluster using the 3.8.4-2 build.
2. Detach the n2 node from n1:  n1#] gluster peer detach n2
3. Check for error messages in glusterd.log on n1.

Actual results:
===============
Error messages appear in glusterd.log when a peer detach is done.

Expected results:
=================
No error messages should be logged.

Additional info:
I assume there is a Glusto or other regression test case for this. Please point to its location or attach it to this BZ. Thanks!
Also, could you mention how urgent this is? Severity is set to "high", but priority is set to "undefined". If this problem causes the log to grow rapidly, we should fix it soon, otherwise we'll move it out to a later update.
(In reply to Niels de Vos from comment #3)
> I assume there is a Glusto or other regression test case for this. Please
> point to its location or attach it to this BZ. Thanks!

The test case bug ID which caught this issue is https://bugzilla.redhat.com/show_bug.cgi?id=1246946
Sorry, I cleared the need-info flags of others; I will set them back.
(In reply to Niels de Vos from comment #4)
> Also, could you mention how urgent this is? Severity is set to "high", but
> priority is set to "undefined". If this problem causes the log to grow
> rapidly, we should fix it soon, otherwise we'll move it out to a later
> update.

This is not urgent, it is not a blocker, and there is no functionality loss. However, we have a regression test case (mentioned above) which will be marked as failed during the regression cycle, and the Regression keyword will be added to this bug.

Also, these error messages are not continuous; every peer detach operation throws these two error messages.
I'm not really sure how changing the default for starting gNFS or not would have anything to do with peer probing or related log messages. Can you elaborate?
(In reply to Byreddy from comment #8)
> (In reply to Niels de Vos from comment #4)
> > Also, could you mention how urgent this is? Severity is set to "high", but
> > priority is set to "undefined". If this problem causes the log to grow
> > rapidly, we should fix it soon, otherwise we'll move it out to a later
> > update.
>
> This is not urgent and it's not blocker and no functionality loss But we
> have regression test case ( mentioned above) which will be marked as failed
> during regression cycle and Regression keyword will be added to this bug.

I disagree! Why would you want to mark a test as failed when the test has actually passed? A test case cannot be failed, and the Regression keyword cannot be applied, merely on the basis of a couple of error entries in the log, IMO.

Rahul - please chime in with your thoughts.

> and these error messages are not continuous, for every peer detach
> operation, it will throw those two error messages.
(In reply to Kaleb KEITHLEY from comment #9)
> I'm not really sure how changing the default for starting gNFS or not would
> have anything to do with peer probing or related log messages.
>
> Can you elaborate?

The peer detach code path is:

glusterd_friend_remove ()
  ==> glusterd_friend_remove_cleanup_vols ()
      ==> glusterd_svcs_reconfigure ()
          ==> glusterd_nfssvc_reconfigure ()

glusterd_nfssvc_reconfigure () is called unconditionally here, but it should be called only if gNFS is active.
(In reply to Atin Mukherjee from comment #10)
> (In reply to Byreddy from comment #8)
> > (In reply to Niels de Vos from comment #4)
> > > Also, could you mention how urgent this is? Severity is set to "high", but
> > > priority is set to "undefined". If this problem causes the log to grow
> > > rapidly, we should fix it soon, otherwise we'll move it out to a later
> > > update.
> >
> > This is not urgent and it's not blocker and no functionality loss But we
> > have regression test case ( mentioned above) which will be marked as failed
> > during regression cycle and Regression keyword will be added to this bug.
>
> I disagree! Why would you want to mark a test failed given the test has
> actually passed? On the basis of having couple of error entries in the log a
> test case can not be failed and regression keyword can not be used IMO.

As per the test case, a peer detach should not produce any error messages, but currently it throws errors. This issue was not present in the last GA release, so from my side it is a regression.

> Rahul - please chime in with your thoughts.
>
> > and these error messages are not continuous, for every peer detach
> > operation, it will throw those two error messages.
Created attachment 1209905 [details] peer detach related test case
Gaurav - Can you start looking into this issue? I'd like to get this fixed in the next release.
upstream patch : https://review.gluster.org/#/c/16607
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/101314/
Build: 3.8.4.24

Based on the patch, tested the below scenarios:
1. Detach a peer with no volume created in the cluster.
2. Detach a peer after deleting all the volumes which were created but never started.

Detaching a peer no longer produces any error messages in the glusterd log. Hence marking the BZ as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2774