Bug 1679892

Summary: assertion failure log in glusterd.log file when a volume start is triggered
Product: [Community] GlusterFS
Reporter: Atin Mukherjee <amukherj>
Component: glusterd
Assignee: Sanju <srakonde>
Status: CLOSED DUPLICATE
QA Contact:
Severity: high
Docs Contact:
Priority: unspecified
Version: 6
CC: bugs, pasik, srakonde
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-06-10 11:26:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1672818, 1732875

Description Atin Mukherjee 2019-02-22 07:45:41 UTC
Description of problem:

[2019-02-22 07:38:28.772914] E [MSGID: 101191] [event-epoll.c:765:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-02-22 07:38:32.322872] I [glusterd-utils.c:6305:glusterd_brick_start] 0-management: starting a fresh brick process for brick /tmp/b1
[2019-02-22 07:38:32.420144] I [MSGID: 106142] [glusterd-pmap.c:290:pmap_registry_bind] 0-pmap: adding brick /tmp/b1 on port 49152
[2019-02-22 07:38:32.420635] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2019-02-22 07:38:32.491504] E [mem-pool.c:351:__gf_free] (-->/usr/local/lib/glusterfs/6.0alpha/xlator/mgmt/glusterd.so(+0x4842e) [0x7fc95a8f742e] -->/usr/local/lib/glusterfs/6.0alpha/xlator/mgmt/glusterd.so(+0x4821a) [0x7fc95a8f721a] -->/usr/local/lib/libglusterfs.so.0(__gf_free+0x22d) [0x7fc96042ccfd] ) 0-: Assertion failed: mem_acct->rec[header->type].size >= header->size
[2019-02-22 07:38:32.492228] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600
[2019-02-22 07:38:32.493431] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-gfproxyd: setting frame-timeout to 600
[2019-02-22 07:38:32.494848] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-nfs: setting frame-timeout to 600
[2019-02-22 07:38:32.495530] I [MSGID: 106131] [glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: nfs already stopped
[2019-02-22 07:38:32.495655] I [MSGID: 106568] [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: nfs service is stopped
[2019-02-22 07:38:32.495728] I [MSGID: 106599] [glusterd-nfs-svc.c:81:glusterd_nfssvc_manager] 0-management: nfs/server.so xlator is not installed

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. On a 3-node cluster, create a replica 3 volume and start it (see the example below).
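
For example (volume name, host names, and brick paths are illustrative; the log in the description used /tmp/b1 as the brick path):

  # from any node of the 3-node trusted storage pool
  gluster volume create testvol replica 3 node1:/tmp/b1 node2:/tmp/b1 node3:/tmp/b1 force
  gluster volume start testvol
  # then inspect /var/log/glusterfs/glusterd.log on the nodes for the assertion message

('force' is only needed here because the example bricks sit on the root filesystem, as /tmp/b1 does.)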

Actual results:
An assertion failure and 'Failed to dispatch handler' errors are seen in glusterd.log.

Expected results:

No errors should be seen in the glusterd log. The assertion failure in particular suggests there may also be memory corruption, which makes this more severe.

Additional info:
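For context, a minimal sketch of how per-type memory accounting of this kind typically works and why the failing check can trip. This is an illustration only, not the actual mem-pool.c implementation; all names here (mem_acct, mem_header, acct_malloc, acct_free) are made up for the example, and only the assert expression itself is taken from the log above.

/* Simplified per-type memory-accounting sketch (NOT GlusterFS code). */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct mem_acct_rec {
    size_t   size;        /* bytes currently allocated for this type */
    uint64_t num_allocs;
};

struct mem_acct {
    struct mem_acct_rec rec[128];   /* one record per allocation type */
};

struct mem_header {
    uint32_t type;        /* allocation type recorded at malloc time */
    size_t   size;        /* user-visible size of this allocation */
};

static void *acct_malloc(struct mem_acct *acct, size_t size, uint32_t type)
{
    struct mem_header *hdr = malloc(sizeof(*hdr) + size);
    if (!hdr)
        return NULL;
    hdr->type = type;
    hdr->size = size;
    acct->rec[type].size += size;   /* credit the per-type counter */
    acct->rec[type].num_allocs++;
    return hdr + 1;                 /* hand out the memory after the header */
}

static void acct_free(struct mem_acct *acct, void *ptr)
{
    struct mem_header *hdr = (struct mem_header *)ptr - 1;

    /* The check from the log: if the per-type counter holds fewer bytes
     * than this allocation claims, the accounting has gone wrong
     * (double free, mismatched type, or an overwritten header). */
    assert(acct->rec[hdr->type].size >= hdr->size);

    acct->rec[hdr->type].size -= hdr->size;   /* debit the counter */
    acct->rec[hdr->type].num_allocs--;
    free(hdr);
}

int main(void)
{
    struct mem_acct acct;
    memset(&acct, 0, sizeof(acct));

    void *p = acct_malloc(&acct, 64, 7);
    acct_free(&acct, p);    /* fine: counter goes 64 -> 0 */

    /* Freeing the same pointer again, or freeing through a corrupted
     * header, would trip the assert above, which is the failure this
     * bug reports. */
    return 0;
}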

Comment 1 Atin Mukherjee 2019-03-12 05:02:52 UTC
I no longer see this happening in the latest testing of the release-6 branch. I will keep this bug open for some time, but I am removing it from the 6.0 blocker list.

Comment 2 Sanju 2019-03-12 05:24:17 UTC
I still see the assertion failure message in glusterd.log:

[2019-03-12 05:19:06.206695] E [mem-pool.c:351:__gf_free] (-->/usr/local/lib/glusterfs/6.0rc0/xlator/mgmt/glusterd.so(+0x48133) [0x7f264602c133] -->/usr/local/lib/glusterfs/6.0rc0/xlator/mgmt/glusterd.so(+0x47f0a) [0x7f264602bf0a] -->/usr/local/lib/libglusterfs.so.0(__gf_free+0x22d) [0x7f265263ac9d] ) 0-: Assertion failed: mem_acct->rec[header->type].size >= header->size

I will update the bug with the root cause as soon as possible.

Thanks,
Sanju

Comment 3 Atin Mukherjee 2019-06-09 05:39:11 UTC
Ping! Any progress on this? Is this still seen with latest master?

Comment 4 Sanju 2019-06-10 11:26:04 UTC
https://review.gluster.org/#/c/glusterfs/+/22600/ has removed this assert condition, so we no longer see this assertion failure in the log.

Susant is working on this issue and https://bugzilla.redhat.com/show_bug.cgi?id=1700865 is tracking it. So, I'm closing this bug.

Thanks,
Sanju

*** This bug has been marked as a duplicate of bug 1700865 ***