Bug 1679892

Summary: assertion failure log in glusterd.log file when a volume start is triggered
Product: [Community] GlusterFS
Reporter: Atin Mukherjee <amukherj>
Component: glusterd
Assignee: Sanju <srakonde>
Status: CLOSED DUPLICATE
QA Contact:
Severity: high
Docs Contact:
Priority: unspecified
Version: 6
CC: bugs, pasik, srakonde
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-06-10 11:26:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1672818, 1732875

Description Atin Mukherjee 2019-02-22 07:45:41 UTC
Description of problem:

[2019-02-22 07:38:28.772914] E [MSGID: 101191] [event-epoll.c:765:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-02-22 07:38:32.322872] I [glusterd-utils.c:6305:glusterd_brick_start] 0-management: starting a fresh brick process for brick /tmp/b1
[2019-02-22 07:38:32.420144] I [MSGID: 106142] [glusterd-pmap.c:290:pmap_registry_bind] 0-pmap: adding brick /tmp/b1 on port 49152
[2019-02-22 07:38:32.420635] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2019-02-22 07:38:32.491504] E [mem-pool.c:351:__gf_free] (-->/usr/local/lib/glusterfs/6.0alpha/xlator/mgmt/glusterd.so(+0x4842e) [0x7fc95a8f742e] -->/usr/local/lib/glusterfs/6.0alpha/xlator/mgmt/glusterd.so(+0x4821a) [0x7fc95a8f721a] -->/usr/local/lib/libglusterfs.so.0(__gf_free+0x22d) [0x7fc96042ccfd] ) 0-: Assertion failed: mem_acct->rec[header->type].size >= header->size
[2019-02-22 07:38:32.492228] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600
[2019-02-22 07:38:32.493431] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-gfproxyd: setting frame-timeout to 600
[2019-02-22 07:38:32.494848] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-nfs: setting frame-timeout to 600
[2019-02-22 07:38:32.495530] I [MSGID: 106131] [glusterd-proc-mgmt.c:86:glusterd_proc_stop] 0-management: nfs already stopped
[2019-02-22 07:38:32.495655] I [MSGID: 106568] [glusterd-svc-mgmt.c:253:glusterd_svc_stop] 0-management: nfs service is stopped
[2019-02-22 07:38:32.495728] I [MSGID: 106599] [glusterd-nfs-svc.c:81:glusterd_nfssvc_manager] 0-management: nfs/server.so xlator is not installed

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. On a 3-node cluster, create a replica 3 volume and start it (see the example below).
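
For example (volume name, host names, and brick paths are illustrative; the log in the description used /tmp/b1 as the brick path):

  # from any node of the 3-node trusted storage pool
  gluster volume create testvol replica 3 node1:/tmp/b1 node2:/tmp/b1 node3:/tmp/b1 force
  gluster volume start testvol
  # then inspect /var/log/glusterfs/glusterd.log on the nodes for the assertion message

('force' is only needed here because the example bricks sit on the root filesystem, as /tmp/b1 does.)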

Actual results:
An assertion failure and 'Failed to dispatch handler' errors are seen in glusterd.log.

Expected results:

No errors should be seen in the glusterd log. The assertion failure in particular suggests there may also be memory corruption, which makes this more severe.

Additional info:
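For context, a minimal sketch of how per-type memory accounting of this kind typically works and why the failing check can trip. This is an illustration only, not the actual mem-pool.c implementation; all names here (mem_acct, mem_header, acct_malloc, acct_free) are made up for the example, and only the assert expression itself is taken from the log above.

/* Simplified per-type memory-accounting sketch (NOT GlusterFS code). */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct mem_acct_rec {
    size_t   size;        /* bytes currently allocated for this type */
    uint64_t num_allocs;
};

struct mem_acct {
    struct mem_acct_rec rec[128];   /* one record per allocation type */
};

struct mem_header {
    uint32_t type;        /* allocation type recorded at malloc time */
    size_t   size;        /* user-visible size of this allocation */
};

static void *acct_malloc(struct mem_acct *acct, size_t size, uint32_t type)
{
    struct mem_header *hdr = malloc(sizeof(*hdr) + size);
    if (!hdr)
        return NULL;
    hdr->type = type;
    hdr->size = size;
    acct->rec[type].size += size;   /* credit the per-type counter */
    acct->rec[type].num_allocs++;
    return hdr + 1;                 /* hand out the memory after the header */
}

static void acct_free(struct mem_acct *acct, void *ptr)
{
    struct mem_header *hdr = (struct mem_header *)ptr - 1;

    /* The check from the log: if the per-type counter holds fewer bytes
     * than this allocation claims, the accounting has gone wrong
     * (double free, mismatched type, or an overwritten header). */
    assert(acct->rec[hdr->type].size >= hdr->size);

    acct->rec[hdr->type].size -= hdr->size;   /* debit the counter */
    acct->rec[hdr->type].num_allocs--;
    free(hdr);
}

int main(void)
{
    struct mem_acct acct;
    memset(&acct, 0, sizeof(acct));

    void *p = acct_malloc(&acct, 64, 7);
    acct_free(&acct, p);    /* fine: counter goes 64 -> 0 */

    /* Freeing the same pointer again, or freeing through a corrupted
     * header, would trip the assert above, which is the failure this
     * bug reports. */
    return 0;
}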

Comment 1 Atin Mukherjee 2019-03-12 05:02:52 UTC
I no longer see this happening in the latest testing of the release-6 branch. I will keep this bug open for some time, but I am removing it from the 6.0 blocker list.

Comment 2 Sanju 2019-03-12 05:24:17 UTC
I still see the assertion failure message in glusterd.log:

[2019-03-12 05:19:06.206695] E [mem-pool.c:351:__gf_free] (-->/usr/local/lib/glusterfs/6.0rc0/xlator/mgmt/glusterd.so(+0x48133) [0x7f264602c133] -->/usr/local/lib/glusterfs/6.0rc0/xlator/mgmt/glusterd.so(+0x47f0a) [0x7f264602bf0a] -->/usr/local/lib/libglusterfs.so.0(__gf_free+0x22d) [0x7f265263ac9d] ) 0-: Assertion failed: mem_acct->rec[header->type].size >= header->size

I will update the bug with the root cause as soon as possible.

Thanks,
Sanju

Comment 3 Atin Mukherjee 2019-06-09 05:39:11 UTC
Ping! Any progress on this? Is this still seen with latest master?

Comment 4 Sanju 2019-06-10 11:26:04 UTC
https://review.gluster.org/#/c/glusterfs/+/22600/ has removed this assert condition, so we no longer see this assertion failure in the log.

Susant is working on this issue and https://bugzilla.redhat.com/show_bug.cgi?id=1700865 is tracking it. So, I'm closing this bug.

Thanks,
Sanju

*** This bug has been marked as a duplicate of bug 1700865 ***