Bug 1461649

Summary: glusterd crashes when statedump is taken
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: glusterd
Version: rhgs-3.3
Target Release: RHGS 3.3.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Reporter: Raghavendra G <rgowdapp>
Assignee: Atin Mukherjee <amukherj>
QA Contact: Bala Konda Reddy M <bmekala>
CC: rhinduja, rhs-bugs, storage-qa-internal, vbellur
Fixed In Version: glusterfs-3.8.4-29
Type: Bug
Last Closed: 2017-09-21 04:59:42 UTC
Clones: 1461655
Bug Depends On: 1461655
Bug Blocks: 1417151

Description Raghavendra G 2017-06-15 05:27:05 UTC
Description of problem:

* start glusterd
* take statedump
* glusterd crashes
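
For reference, the statedump here is triggered by sending SIGUSR1 to the glusterd process (signum=10 in frame #2 of the trace below, i.e. SIGUSR1 on x86_64 Linux). A minimal sketch, with glusterd_pid as a placeholder for the daemon's actual PID, equivalent to `kill -USR1 <pid>`:

    /* Sketch only: trigger a glusterd statedump by sending SIGUSR1,
     * the signal glusterfs_sigwaiter handles in the trace below.
     * glusterd_pid is a placeholder, not a real symbol. */
    #include <signal.h>
    #include <sys/types.h>

    static int
    trigger_glusterd_statedump (pid_t glusterd_pid)
    {
            return kill (glusterd_pid, SIGUSR1);
    }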

Thread 1 (Thread 0x7f535fd71700 (LWP 1557)):
#0  glusterd_dump_priv (this=<optimized out>) at glusterd-statedump.c:243
#1  0x00007f5367aaf655 in gf_proc_dump_xlator_info (top=<optimized out>) at statedump.c:502
#2  0x00007f5367aafb92 in gf_proc_dump_info (signum=signum@entry=10, ctx=0xd1e010) at statedump.c:837
#3  0x00000000004089b9 in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2083
#4  0x0000003866e07d14 in start_thread (arg=0x7f535fd71700) at pthread_create.c:309
#5  0x0000003866af168d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) l
248	                                                "pmap[%d].type", port);
249	                        gf_proc_dump_write (key, "%d", pmap->ports[port].type);
250	                        gf_proc_dump_build_key (key, "glusterd",
251	                                                "pmap[%d].brickname", port);
252	                        gf_proc_dump_write (key, "%s",
253	                                            pmap->ports[port].brickname);
254	
255	                }
256	                /* Dump client details */
257	                glusterd_dump_client_details (priv);
(gdb) p pmap
$2 = (struct pmap_registry *) 0x0
(gdb) l glusterd-statedump.c:243
238	                /* Dump peer details */
239	                GLUSTERD_DUMP_PEERS (&priv->peers, uuid_list, _gf_false);
240	
241	                /* Dump pmap data structure from base port to last alloc */
242	                pmap = priv->pmap;
243	                for (port = pmap->base_port; port <= pmap->last_alloc;
244	                     port++) {
245	                        gf_proc_dump_build_key (key, "glusterd", "pmap_port");
246	                        gf_proc_dump_write (key, "%d", port);
247	                        gf_proc_dump_build_key (key, "glusterd",
(gdb) p pmap
$3 = (struct pmap_registry *) 0x0

I think pmap is NULL because there are no bricks, and dereferencing it causes the crash. This may not happen when there are volumes and bricks associated with glusterd.
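
A NULL check before the loop would avoid the dereference. A minimal sketch of such a guard, built from the listing above (an illustration of the idea, not necessarily the shipped fix; see the patches in comments 3 and 5):

    pmap = priv->pmap;
    /* pmap is created lazily and stays NULL until the first brick
     * port is allocated, so guard the dump loop. */
    if (pmap) {
            for (port = pmap->base_port; port <= pmap->last_alloc;
                 port++) {
                    gf_proc_dump_build_key (key, "glusterd", "pmap_port");
                    gf_proc_dump_write (key, "%d", port);
                    gf_proc_dump_build_key (key, "glusterd",
                                            "pmap[%d].type", port);
                    gf_proc_dump_write (key, "%d", pmap->ports[port].type);
                    gf_proc_dump_build_key (key, "glusterd",
                                            "pmap[%d].brickname", port);
                    gf_proc_dump_write (key, "%s",
                                        pmap->ports[port].brickname);
            }
    }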

Version-Release number of selected component (if applicable):


How reproducible:
Consistently, when glusterd has no started volumes.

Steps to Reproduce:
1. Start glusterd with no volumes created.
2. Take a statedump of glusterd (send SIGUSR1).
3. glusterd crashes.

Actual results:
glusterd crashes with a NULL-pointer dereference in glusterd_dump_priv (glusterd-statedump.c:243).

Expected results:
glusterd writes the statedump without crashing.

Additional info:

Comment 2 Atin Mukherjee 2017-06-15 05:46:53 UTC
You are right. This is because the pmap object is created only when glusterd allocates a port for the first time, and that happens only when the first volume is started.
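
In other words, the registry is created lazily. A rough sketch of that pattern (the function names follow glusterd's pmap code, but the body is a paraphrase under that assumption, not the verbatim source):

    /* Sketch of the lazy creation described above: the pmap registry
     * does not exist until the first port allocation asks for it, so
     * priv->pmap is NULL on a glusterd that has never started a brick. */
    struct pmap_registry *
    pmap_registry_get (xlator_t *this)
    {
            glusterd_conf_t *priv = this->private;

            if (!priv->pmap)
                    priv->pmap = pmap_registry_new (this);

            return priv->pmap;
    }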

Comment 3 Atin Mukherjee 2017-06-15 06:07:04 UTC
upstream patch : https://review.gluster.org/#/c/17549

Comment 5 Atin Mukherjee 2017-06-15 11:01:08 UTC
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/109180

Comment 8 Bala Konda Reddy M 2017-07-07 06:17:01 UTC
BUILD: 3.8.4-32

On a fresh setup, started glusterd, performed no gluster volume operations, and took a statedump of glusterd. No glusterd crashes were seen.

Hence marking the bz as verified.

Comment 10 errata-xmlrpc 2017-09-21 04:59:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774