Description of problem:
=======================
The following error messages appear in the glusterd log when a node is detached (deprobed) from the cluster.

[2016-10-12 07:31:07.381464] E [MSGID: 106029] [glusterd-utils.c:7767:glusterd_check_files_identical] 0-management: stat on file: /var/lib/glusterd//-server.vol failed (No such file or directory) [No such file or directory]
[2016-10-12 07:31:07.381736] E [MSGID: 106570] [glusterd-utils.c:7196:glusterd_friend_remove_cleanup_vols] 0-management: Failed to reconfigure all daemon services.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-2

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Create a two-node (n1 and n2) cluster using the 3.8.4-2 build.
2. Detach the n2 node from n1:  n1#] gluster peer detach n2
3. Check for error messages in glusterd.log on n1.

Actual results:
===============
Error messages appear in glusterd.log when a peer detach is done.

Expected results:
=================
No error messages should be logged.

Additional info:
I assume there is a Glusto or other regression test case for this. Please point to its location or attach it to this BZ. Thanks!
Also, could you mention how urgent this is? Severity is set to "high", but priority is set to "undefined". If this problem causes the log to grow rapidly, we should fix it soon, otherwise we'll move it out to a later update.
(In reply to Niels de Vos from comment #3)
> I assume there is a Glusto or other regression test case for this. Please
> point to its location or attach it to this BZ. Thanks!

The test case bug ID which caught this issue is https://bugzilla.redhat.com/show_bug.cgi?id=1246946
Sorry, I cleared the need-info flags of others; I will set them back.
(In reply to Niels de Vos from comment #4)
> Also, could you mention how urgent this is? Severity is set to "high", but
> priority is set to "undefined". If this problem causes the log to grow
> rapidly, we should fix it soon, otherwise we'll move it out to a later
> update.

This is not urgent, it is not a blocker, and there is no functionality loss. However, we have a regression test case (mentioned above) which will be marked as failed during the regression cycle, and the Regression keyword will be added to this bug.

Also, these error messages are not continuous; every peer detach operation throws these two error messages.
I'm not really sure how changing the default for starting gNFS or not would have anything to do with peer probing or related log messages. Can you elaborate?
(In reply to Byreddy from comment #8)
> (In reply to Niels de Vos from comment #4)
> > Also, could you mention how urgent this is? Severity is set to "high", but
> > priority is set to "undefined". If this problem causes the log to grow
> > rapidly, we should fix it soon, otherwise we'll move it out to a later
> > update.
>
> This is not urgent and it's not blocker and no functionality loss But we
> have regression test case ( mentioned above) which will be marked as failed
> during regression cycle and Regression keyword will be added to this bug.

I disagree! Why would you want to mark a test as failed when the test has actually passed? A test case cannot be failed, and the Regression keyword cannot be applied, merely on the basis of a couple of error entries in the log, IMO.

Rahul - please chime in with your thoughts.

> and these error messages are not continuous, for every peer detach
> operation, it will throw those two error messages.
(In reply to Kaleb KEITHLEY from comment #9)
> I'm not really sure how changing the default for starting gNFS or not would
> have anything to do with peer probing or related log messages.
>
> Can you elaborate?

The peer detach code path is:

glusterd_friend_remove ()
  ==> glusterd_friend_remove_cleanup_vols ()
      ==> glusterd_svcs_reconfigure ()
          ==> glusterd_nfssvc_reconfigure ()

glusterd_nfssvc_reconfigure () is called unconditionally here, but it should be called only if gNFS is active.
(In reply to Atin Mukherjee from comment #10)
> (In reply to Byreddy from comment #8)
> > (In reply to Niels de Vos from comment #4)
> > > Also, could you mention how urgent this is? Severity is set to "high", but
> > > priority is set to "undefined". If this problem causes the log to grow
> > > rapidly, we should fix it soon, otherwise we'll move it out to a later
> > > update.
> >
> > This is not urgent and it's not blocker and no functionality loss But we
> > have regression test case ( mentioned above) which will be marked as failed
> > during regression cycle and Regression keyword will be added to this bug.
>
> I disagree! Why would you want to mark a test failed given the test has
> actually passed? On the basis of having couple of error entries in the log a
> test case can not be failed and regression keyword can not be used IMO.

As per the test case, a peer detach should not produce any error messages, but currently it throws errors. This issue was not present in the last GA release, so from my side it is a regression.

> Rahul - please chime in with your thoughts.
>
> > and these error messages are not continuous, for every peer detach
> > operation, it will throw those two error messages.
Created attachment 1209905 [details] peer detach related test case
Gaurav - Can you start looking into this issue? I'd like to get this fixed in the next release.
upstream patch : https://review.gluster.org/#/c/16607
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/101314/
Build: 3.8.4.24

Based on the patch, tested the below scenarios:
1. Detach a peer with no volume created in the cluster.
2. Detach a peer after deleting all the volumes which were created but never started.

Detaching a peer no longer produces any error messages in the glusterd log. Hence marking the BZ as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2774