Bug 1661144 - Longevity: Over time the brickmux feature is not being honored (i.e., new brick processes are spawned) and bricks are not getting attached to the brick process
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: core
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Sanju
QA Contact: Rahul Hinduja
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-12-20 07:40 UTC by Nag Pavan Chilakam
Modified: 2020-06-10 12:12 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-12 16:46:25 UTC
Embargoed:



Description Nag Pavan Chilakam 2018-12-20 07:40:25 UTC
Description of problem:
========================
We have been seeing that the memory consumed by the brick process during volume creates is not released after volume deletes. The same has been reported in BZ#1653225.
However, to gauge the severity, we wanted to see at what stage the brick process could get OOM killed.
As part of that, I was running a test which continuously creates and deletes volumes until glusterfsd, whose memory consumption kept increasing, gets OOM killed.
The volume operations were triggered from heketi, but against a standalone 3-node RHGS setup.
After about 4 days, I noticed that on one node a new brick process was being spawned instead of reusing the existing process.
Later the same day, another node hit the same problem.
I also noticed from volume status that most of the volumes' bricks were not even getting attached to any of the brick processes.

The 3rd node does not seem to have been impacted by this problem yet, but could get into the same situation over time.


Version-Release number of selected component (if applicable):
=======================
rhgs:3.12.2-32
heketi:8


How reproducible:
================
1/1

Steps to Reproduce:
1) Created a 3-node RHGS setup (each node is an 8 GB RAM VM; the setup was kept small on purpose, just to see whether the brick process gets OOM killed).
Brick mux was enabled.
2) Had a separate heketi management node, which is not part of the cluster.
3) First created a volume called "heketi" of size 5 GB; this volume is never deleted throughout the test.
4) Started creating volumes of sizes ranging from 1 GB to 5 GB from heketi and then deleting them, using a shell script, for a total of about 26000 volume creates. Note that at any point in time the number of volumes was at most 11-60 (mostly only 11-21 volumes; the decision not to keep many volumes at once was due to the setup's capacity). A rough sketch of this loop appears after the steps below.
5) After about 4 days and about 26000 volume creates, noticed that one node started to spawn new brick processes instead of using the existing one.
6) After another 10 hours or so, another node too started to face the same problem.
7) Volume status shows that most of the bricks hosted by the above two nodes are not even attached to a brick process.

All seems to be good so far on the 3rd node
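For reference, here is a rough C stand-in for the create/delete loop from step 4. The actual test drove the operations through heketi with a shell script; this simplified sketch calls the gluster CLI directly via system(), and node1-node3 and the /bricks/... paths are placeholders, not the real setup:

/* Rough stand-in for the heketi-driven shell loop described in step 4.
 * It repeatedly creates, starts, stops and deletes a replica-3 volume via
 * the gluster CLI. node1..node3 and the /bricks/... paths are placeholders. */
#include <stdio.h>
#include <stdlib.h>

int main (void)
{
        char cmd[1024];
        int  i;

        for (i = 0; i < 26000; i++) {
                /* use a fresh brick directory each round so the new volume
                 * does not trip over xattrs left behind by the previous one */
                snprintf (cmd, sizeof (cmd),
                          "gluster --mode=script volume create stress-%d replica 3 "
                          "node1:/bricks/stress-%d node2:/bricks/stress-%d "
                          "node3:/bricks/stress-%d force", i, i, i, i);
                if (system (cmd) != 0)
                        break;

                snprintf (cmd, sizeof (cmd),
                          "gluster --mode=script volume start stress-%d", i);
                system (cmd);

                snprintf (cmd, sizeof (cmd),
                          "gluster --mode=script volume stop stress-%d force", i);
                system (cmd);

                snprintf (cmd, sizeof (cmd),
                          "gluster --mode=script volume delete stress-%d", i);
                system (cmd);
        }
        return 0;
}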


Actual results:
==============
1) New brick processes are being spawned even though brickmux is enabled (same volume config, so this is not expected).
2) Bricks are not getting attached to the brick process.

Comment 1 Nag Pavan Chilakam 2018-12-20 13:29:17 UTC
sosreports @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1661144

Comment 5 Sanju 2018-12-26 11:25:35 UTC
Initial analysis:

From the glusterd logs from 10.70.35.140:

[2018-12-19 12:01:30.651068] I [glusterd-utils.c:5658:attach_brick] 0-management: add brick /var/lib/heketi/mounts/vg_32bcf9d1cb7d524d777001328b36cac0/brick_7ac2ffb973558759acb1fd3fe78417fb/brick to existing process for /var/lib/heketi/mounts/vg_7b849ba83b2d76829b278eab5556b3da/brick_95b8f5bbaf22fef921f32ca16134f1f7/brick
[2018-12-19 12:01:30.651241] E [glusterd-utils.c:5690:attach_brick] 0-management: adding brick to process failed
[2018-12-19 12:01:30.651273] W [glusterd-utils.c:5722:attach_brick] 0-management: attach failed for /var/lib/heketi/mounts/vg_32bcf9d1cb7d524d777001328b36cac0/brick_7ac2ffb973558759acb1fd3fe78417fb/brick
[2018-12-19 12:01:30.651303] I [glusterd-utils.c:6316:glusterd_brick_start] 0-management: starting a fresh brick process for brick /var/lib/heketi/mounts/vg_32bcf9d1cb7d524d777001328b36cac0/brick_7ac2ffb973558759acb1fd3fe78417fb/brick
[2018-12-19 12:01:30.666957] I [glusterd-utils.c:5514:attach_brick_callback] 0-management: attach_brick failed pidfile is /var/run/gluster/vols/52-F-4-5/10.70.35.140-var-lib-heketi-mounts-vg_32bcf9d1cb7d524d777001328b36cac0-brick_7ac2ffb973558759acb1fd3fe78417fb-brick.pid for brick_path /var/lib/heketi/mounts/vg_32bcf9d1cb7d524d777001328b36cac0/brick_7ac2ffb973558759acb1fd3fe78417fb/brick

The above messages say that the brick tried to attach to the existing brick process, but the attach failed, and then a fresh brick process was started for the brick.

a snippet from attach_brick():

                        ret = send_attach_req (this, rpc, path, brickinfo,
                                               other_brick,
                                               GLUSTERD_BRICK_ATTACH);
                        rpc_clnt_unref (rpc);
                        if (!ret) {
                                ret = pmap_registry_extend (this, other_brick->port,
                                            brickinfo->path);
                                if (ret != 0) {
                                        gf_log (this->name, GF_LOG_ERROR,
                                                "adding brick to process failed");
                                        goto out;
                                }
                                brickinfo->port = other_brick->port;
                                ret = glusterd_brick_process_add_brick(brickinfo
                                                                 , other_brick);
                                if (ret) {
                                        gf_msg (this->name, GF_LOG_ERROR, 0,
                                                GD_MSG_BRICKPROC_ADD_BRICK_FAILED,
                                                "Adding brick %s:%s to brick "
                                                "process failed", brickinfo->hostname,
                                                brickinfo->path);
                                        return ret;
                                }
                                return 0;
                        }
We have "adding brick to process failed" in the glusterd log, which means pmap_registry_extend() has returned a non-zero value.

Source code of pmap_registry_extend() :

        pmap = pmap_registry_get (this);

        if (port > pmap->max_port) {
                return -1;
        }
        
        switch (pmap->ports[port].type) {
        case GF_PMAP_PORT_LEASED:
        case GF_PMAP_PORT_BRICKSERVER:
                break;
        default:
                return -1;
        }
        
        old_bn = pmap->ports[port].brickname;
        if (old_bn) {
                bn_len = strlen(brickname);
                entry = strstr (old_bn, brickname);
                while (entry) { 
                        found = 1; 
                        if ((entry != old_bn) && (entry[-1] != ' ')) {
                                found = 0;
                        }
                        if ((entry[bn_len] != ' ') && (entry[bn_len] != '\0')) {
                                found = 0;
                        }
                        if (found) {
                                return 0;
                        }
                        entry = strstr (entry + bn_len, brickname);
                }
                asprintf (&new_bn, "%s %s", old_bn, brickname);
        } else {
                new_bn = strdup (brickname);
        }
        
        if (!new_bn) { 
                return -1;
        }
        
        pmap->ports[port].brickname = new_bn;
        free (old_bn);

        return 0;

So, pmap_registry_extend() can return a non-zero value when:
1. port > pmap->max_port
2. pmap->ports[port].type is neither GF_PMAP_PORT_LEASED nor GF_PMAP_PORT_BRICKSERVER
3. new_bn is NULL

We are seeing
[2018-12-19 12:01:30.837404] I [MSGID: 106143] [glusterd-pmap.c:282:pmap_registry_bind] 0-pmap: adding brick /var/lib/heketi/mounts/vg_32bcf9d1cb7d524d777001328b36cac0/brick_7ac2ffb973558759acb1fd3fe78417fb/brick on port 49153
in the glusterd logs, which says the port is not greater than max_port.

I suspect new_bn is NULL and we've returned -1. We allocate memory for new_bn using asprintf/strdup; somehow, we might have failed to allocate that memory. I will continue to investigate this.
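To illustrate the suspicion, here is a minimal standalone sketch (not the gluster source; extend_brickname is a hypothetical helper) of how an asprintf()/strdup() allocation failure surfaces as a NULL result, which maps to the -1 return and the "adding brick to process failed" message above:

#define _GNU_SOURCE          /* for asprintf() on glibc */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char *
extend_brickname (const char *old_bn, const char *brickname)
{
        char *new_bn = NULL;

        if (old_bn) {
                /* on failure asprintf() returns -1 and the contents of new_bn
                 * are undefined, so the return value has to be checked */
                if (asprintf (&new_bn, "%s %s", old_bn, brickname) < 0)
                        return NULL;
        } else {
                new_bn = strdup (brickname);   /* NULL on allocation failure */
        }

        return new_bn;        /* NULL here maps to pmap_registry_extend()'s -1 */
}

int main (void)
{
        char *bn = extend_brickname ("brick_a", "brick_b");

        if (!bn) {
                fprintf (stderr, "allocation failed\n");
                return 1;
        }
        printf ("%s\n", bn);
        free (bn);
        return 0;
}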

Thanks,
Sanju

Comment 9 Nag Pavan Chilakam 2019-03-25 09:05:16 UTC
Atin, any particular reason we don't want to propose it for 3.5.0, given that it can impact OCS environments?

