Bug 1662828 - Longevity: glusterfsd(brick process) crashed when we do volume creates and deletes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: core
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.4.z Batch Update 3
Assignee: Mohit Agrawal
QA Contact: Rochelle
URL:
Whiteboard:
Depends On:
Blocks: 1662906
 
Reported: 2019-01-02 06:09 UTC by Nag Pavan Chilakam
Modified: 2019-02-04 07:41 UTC
CC List: 9 users

Fixed In Version: glusterfs-3.12.2-37
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1662906 (view as bug list)
Environment:
Last Closed: 2019-02-04 07:41:44 UTC
Embargoed:


Attachments
script (626 bytes, text/plain)
2019-01-04 13:17 UTC, Nag Pavan Chilakam


Links
Red Hat Product Errata RHBA-2019:0263 (Last Updated: 2019-02-04 07:41:53 UTC)

Description Nag Pavan Chilakam 2019-01-02 06:09:05 UTC
Description of problem:
======================
I was running a test to check whether the memory footprint keeps increasing when we create/delete volumes, and whether an OOM kill can happen. I was also retesting "bz#1661144 - Longevity: Over time brickmux feature not being honored(ie new bricks spawning) and bricks not getting attached to brick process", this time without heketi.

After about 3 days I found that the brick had crashed; this was the brick whose memory footprint was increasing.
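
One way to watch that footprint over a multi-day run is to periodically sample VmRSS from /proc/<pid>/status of the glusterfsd process. The sketch below is a hypothetical, standalone sampler written only for illustration; it is not the tooling used in this test, and the one-minute interval is an arbitrary choice.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: return VmRSS in kB for a pid, or -1 on error. */
static long read_vmrss_kb(pid_t pid)
{
    char path[64];
    char line[256];
    long rss_kb = -1;
    FILE *fp;

    snprintf(path, sizeof(path), "/proc/%d/status", (int)pid);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof(line), fp) != NULL) {
        if (strncmp(line, "VmRSS:", 6) == 0) {
            sscanf(line + 6, "%ld", &rss_kb);
            break;
        }
    }
    fclose(fp);
    return rss_kb;
}

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <glusterfsd-pid>\n", argv[0]);
        return 1;
    }

    /* Sample once a minute; redirect stdout to a file to build a time series. */
    for (;;) {
        printf("VmRSS: %ld kB\n", read_vmrss_kb((pid_t)atoi(argv[1])));
        fflush(stdout);
        sleep(60);
    }
    return 0;
}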


(gdb) bt
#0  0x00007f3e88e5ce30 in pthread_detach () from /lib64/libpthread.so.0
#1  0x00007f3e77ded1df in posix_spawn_health_check_thread (xl=0x7f328698ae60) at posix-helpers.c:1858
#2  0x00007f3e77de80f1 in init (this=<optimized out>) at posix.c:7935
#3  0x00007f3e89ffa1db in __xlator_init (xl=0x7f328698ae60) at xlator.c:472
#4  xlator_init (xl=xl@entry=0x7f328698ae60) at xlator.c:500
#5  0x00007f3e8a033ae9 in glusterfs_graph_init (graph=graph@entry=0x7f328693f410) at graph.c:363
#6  0x00007f3e8a035314 in glusterfs_graph_attach (orig_graph=0x7f3e78004230, path=<optimized out>, 
    newgraph=newgraph@entry=0x7f3964340748) at graph.c:1248
#7  0x000055ad7dc2813d in glusterfs_handle_attach (req=0x7f3964006cb8) at glusterfsd-mgmt.c:978
#8  0x00007f3e8a0361f0 in synctask_wrap () at syncop.c:375
#9  0x00007f3e8866e010 in ?? () from /lib64/libc.so.6
#10 0x0000000000000000 in ?? ()
(gdb) t a a bt

Thread 2565 (Thread 0x7f323a833700 (LWP 39875)):
#0  0x00007f3e88e5f965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f3e76ca4883 in changelog_ev_connector (data=0x7f321dcc7838) at changelog-ev-handle.c:205
#2  0x00007f3e88e5bdd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f3e88723ead in clone () from /lib64/libc.so.6

Thread 2564 (Thread 0x7f2fef88e700 (LWP 43331)):
#0  0x00007f3e88e5f965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f3e76862303 in br_stub_signth (arg=<optimized out>) at bit-rot-stub.c:867
#2  0x00007f3e88e5bdd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f3e88723ead in clone () from /lib64/libc.so.6

Thread 2563 (Thread 0x7f30030a8700 (LWP 43332)):
#0  0x00007f3e88e5f965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f3e76860d2b in br_stub_worker (data=<optimized out>) at bit-rot-stub-helpers.c:375
#2  0x00007f3e88e5bdd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f3e88723ead in clone () from /lib64/libc.so.6

Thread 2562 (Thread 0x7f34f4ad5700 (LWP 47099)):
#0  0x00007f3e88e5f965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f3e76ca4883 in changelog_ev_connector (data=0x7f311a363d48) at changelog-ev-handle.c:205
#2  0x00007f3e88e5bdd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f3e88723ead in clone () from /lib64/libc.so.6

Thread 2561 (Thread 0x7f34e382d700 (LWP 47106)):
#0  0x00007f3e88e5f965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f3e77ded70b in posix_fsyncer_pick (this=this@entry=0x7f322c2a7e00, head=head@entry=0x7f34e382ce80)
    at posix-helpers.c:1988
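
The crash is inside pthread_detach(), called from posix_spawn_health_check_thread() while glusterfs_graph_attach() initializes the posix xlator of a newly attached brick, and the dump shows the process already running well over 2500 threads. The root cause is not established in this report, but one generic way to end up with this top frame is to pass pthread_detach() a pthread_t that was never successfully initialized, for example because thread creation failed under resource pressure; that is undefined behavior and can segfault. The standalone sketch below (not GlusterFS code) contrasts that buggy pattern with a checked one.

#include <pthread.h>
#include <stdio.h>
#include <string.h>

static void *worker(void *arg)
{
    (void)arg;
    return NULL;
}

/* Buggy pattern: if pthread_create() fails, 'tid' stays uninitialized,
 * yet it is still handed to pthread_detach() -- undefined behavior that
 * may crash inside pthread_detach(). */
static int spawn_detached_buggy(void)
{
    pthread_t tid;
    int ret = pthread_create(&tid, NULL, worker, NULL);

    pthread_detach(tid);          /* BUG: no check of 'ret' first */
    return ret;
}

/* Checked pattern: only detach a handle pthread_create() actually filled in. */
static int spawn_detached_checked(void)
{
    pthread_t tid;
    int ret = pthread_create(&tid, NULL, worker, NULL);

    if (ret != 0) {
        fprintf(stderr, "pthread_create failed: %s\n", strerror(ret));
        return ret;
    }
    pthread_detach(tid);
    return 0;
}

int main(void)
{
    /* With plenty of free resources both calls usually succeed; the buggy
     * variant only misbehaves when thread creation fails. */
    return spawn_detached_checked() == 0 && spawn_detached_buggy() == 0 ? 0 : 1;
}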


Version-Release number of selected component (if applicable):
============================
3.12.2-34


How reproducible:
================
hit once

Steps to Reproduce:
1) 3-node setup, brick multiplexing enabled, with the default max-bricks-per-process of 250
2) Created 1 volume which is NOT deleted throughout the test
3) Created about 100 volumes and started them, then created the next set of 100 volumes and deleted the old 100 volumes (see the workload sketch after this list)
4) So at any time a maximum of 201 volumes exist
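
The following is a hypothetical workload sketch in the spirit of the steps above (the actual attached script is not reproduced here): it keeps one long-lived volume plus rotating batches of 100 volumes, so at most 201 volumes exist at any time. The node names and brick paths (node1..node3, /bricks/...) are placeholders.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BATCH_SIZE 100

/* Run one gluster CLI action (create/start/stop/delete) for a batch volume. */
static void volume_cmd(const char *action, int batch, int i)
{
    char cmd[512];

    if (strcmp(action, "create") == 0) {
        snprintf(cmd, sizeof(cmd),
                 "gluster volume create vol-%d-%d replica 3 "
                 "node1:/bricks/vol-%d-%d node2:/bricks/vol-%d-%d "
                 "node3:/bricks/vol-%d-%d force",
                 batch, i, batch, i, batch, i, batch, i);
    } else {
        /* --mode=script suppresses the interactive confirmation prompts. */
        snprintf(cmd, sizeof(cmd),
                 "gluster --mode=script volume %s vol-%d-%d", action, batch, i);
    }
    system(cmd);
}

int main(void)
{
    int batch, i;

    /* Step 2: one volume that is never deleted during the test. */
    system("gluster volume create longlived replica 3 "
           "node1:/bricks/longlived node2:/bricks/longlived "
           "node3:/bricks/longlived force");
    system("gluster volume start longlived");

    /* Steps 3 and 4: rotate batches of 100 volumes indefinitely, deleting
     * the previous batch once the next one is up. */
    for (batch = 0; ; batch++) {
        for (i = 0; i < BATCH_SIZE; i++) {
            volume_cmd("create", batch, i);
            volume_cmd("start", batch, i);
        }
        if (batch > 0) {
            for (i = 0; i < BATCH_SIZE; i++) {
                volume_cmd("stop", batch - 1, i);
                volume_cmd("delete", batch - 1, i);
            }
        }
    }
    return 0;
}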

Actual results:
===========
After about 3 days of running the test, the crash was seen.

Comment 7 Nag Pavan Chilakam 2019-01-04 13:17:33 UTC
Created attachment 1518390 [details]
script

Comment 18 errata-xmlrpc 2019-02-04 07:41:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0263

