Bug 1661144
| Summary: | Longevity: Over time brickmux feature not being honored(ie new bricks spawning) and bricks not getting attached to brick process | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Nag Pavan Chilakam <nchilaka> |
| Component: | core | Assignee: | Sanju <srakonde> |
| Status: | CLOSED WONTFIX | QA Contact: | Rahul Hinduja <rhinduja> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | rhgs-3.4 | CC: | amukherj, nchilaka, pasik, puebele, rhs-bugs, storage-qa-internal |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-03-12 16:46:25 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Nag Pavan Chilakam
2018-12-20 07:40:25 UTC
Initial analysis:
From the glusterd logs from 10.70.35.140:
[2018-12-19 12:01:30.651068] I [glusterd-utils.c:5658:attach_brick] 0-management: add brick /var/lib/heketi/mounts/vg_32bcf9d1cb7d524d777001328b36cac0/brick_7ac2ffb973558759acb1fd3fe78417fb/brick to existing process for /var/lib/heketi/mounts/vg_7b849ba83b2d76829b278eab5556b3da/brick_95b8f5bbaf22fef921f32ca16134f1f7/brick
[2018-12-19 12:01:30.651241] E [glusterd-utils.c:5690:attach_brick] 0-management: adding brick to process failed
[2018-12-19 12:01:30.651273] W [glusterd-utils.c:5722:attach_brick] 0-management: attach failed for /var/lib/heketi/mounts/vg_32bcf9d1cb7d524d777001328b36cac0/brick_7ac2ffb973558759acb1fd3fe78417fb/brick
[2018-12-19 12:01:30.651303] I [glusterd-utils.c:6316:glusterd_brick_start] 0-management: starting a fresh brick process for brick /var/lib/heketi/mounts/vg_32bcf9d1cb7d524d777001328b36cac0/brick_7ac2ffb973558759acb1fd3fe78417fb/brick
[2018-12-19 12:01:30.666957] I [glusterd-utils.c:5514:attach_brick_callback] 0-management: attach_brick failed pidfile is /var/run/gluster/vols/52-F-4-5/10.70.35.140-var-lib-heketi-mounts-vg_32bcf9d1cb7d524d777001328b36cac0-brick_7ac2ffb973558759acb1fd3fe78417fb-brick.pid for brick_path /var/lib/heketi/mounts/vg_32bcf9d1cb7d524d777001328b36cac0/brick_7ac2ffb973558759acb1fd3fe78417fb/brick
The messages above show that the brick tried to attach to the existing brick process, the attach failed, and glusterd then started a fresh brick process for that brick.
A snippet from attach_brick():
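To see how often this fallback happens over a longevity run, the glusterd log can be scanned for the "attach failed for" warning quoted above. The following is only a minimal sketch: attach_failed_brick() and count_attach_failures() are hypothetical helpers written for this comment, not glusterd functions, and the only thing taken from the logs is the marker text itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Marker text copied from the attach_brick warning in the glusterd log. */
static const char ATTACH_FAILED_MARKER[] = "attach failed for ";

/*
 * Hypothetical helper (not part of glusterd): return a pointer to the brick
 * path inside a log line that reports a failed attach, or NULL if the line
 * is not such a report.
 */
static const char *attach_failed_brick(const char *line)
{
    assert(line != NULL);

    const char *hit = strstr(line, ATTACH_FAILED_MARKER);
    if (hit == NULL)
        return NULL;
    return hit + strlen(ATTACH_FAILED_MARKER);
}

/* Count the attach-failure reports in an array of log lines. */
static size_t count_attach_failures(const char *const *lines, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (attach_failed_brick(lines[i]) != NULL)
            count++;
    }
    return count;
}
```

Each counted line corresponds to one brick that fell back to a fresh brick process, so a count that grows over time would match the behaviour described in the summary.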
    ret = send_attach_req (this, rpc, path, brickinfo,
                           other_brick,
                           GLUSTERD_BRICK_ATTACH);
    rpc_clnt_unref (rpc);
    if (!ret) {
            ret = pmap_registry_extend (this, other_brick->port,
                                        brickinfo->path);
            if (ret != 0) {
                    gf_log (this->name, GF_LOG_ERROR,
                            "adding brick to process failed");
                    goto out;
            }
            brickinfo->port = other_brick->port;
            ret = glusterd_brick_process_add_brick (brickinfo,
                                                    other_brick);
            if (ret) {
                    gf_msg (this->name, GF_LOG_ERROR, 0,
                            GD_MSG_BRICKPROC_ADD_BRICK_FAILED,
                            "Adding brick %s:%s to brick "
                            "process failed", brickinfo->hostname,
                            brickinfo->path);
                    return ret;
            }
            return 0;
    }
The glusterd log contains "adding brick to process failed", which means pmap_registry_extend() returned a non-zero value.
Source code of pmap_registry_extend():
    pmap = pmap_registry_get (this);
    if (port > pmap->max_port) {
            return -1;
    }
    switch (pmap->ports[port].type) {
    case GF_PMAP_PORT_LEASED:
    case GF_PMAP_PORT_BRICKSERVER:
            break;
    default:
            return -1;
    }
    old_bn = pmap->ports[port].brickname;
    if (old_bn) {
            bn_len = strlen(brickname);
            entry = strstr (old_bn, brickname);
            while (entry) {
                    found = 1;
                    if ((entry != old_bn) && (entry[-1] != ' ')) {
                            found = 0;
                    }
                    if ((entry[bn_len] != ' ') && (entry[bn_len] != '\0')) {
                            found = 0;
                    }
                    if (found) {
                            return 0;
                    }
                    entry = strstr (entry + bn_len, brickname);
            }
            asprintf (&new_bn, "%s %s", old_bn, brickname);
    } else {
            new_bn = strdup (brickname);
    }
    if (!new_bn) {
            return -1;
    }
    pmap->ports[port].brickname = new_bn;
    free (old_bn);
    return 0;
So pmap_registry_extend() can return a non-zero value when:
1. port > pmap->max_port
2. pmap->ports[port].type is neither GF_PMAP_PORT_LEASED nor GF_PMAP_PORT_BRICKSERVER
3. new_bn is NULL
We also see the following message in the glusterd logs:
[2018-12-19 12:01:30.837404] I [MSGID: 106143] [glusterd-pmap.c:282:pmap_registry_bind] 0-pmap: adding brick /var/lib/heketi/mounts/vg_32bcf9d1cb7d524d777001328b36cac0/brick_7ac2ffb973558759acb1fd3fe78417fb/brick on port 49153
This indicates that the port is not greater than max_port.
I suspect new_bn was NULL and we returned -1. Memory for new_bn is allocated by asprintf/strdup, so the allocation may somehow have failed. I will continue to investigate this.
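One way to narrow this down would be to make each of the three failure paths distinguishable instead of returning a bare -1. Below is a minimal standalone sketch of that idea. It is not the glusterd implementation: the sketch_* types, the extend_status codes, list_has_entry(), build_list() and registry_extend() are all hypothetical names made up for illustration, and the array size is an assumption. The sketch builds the new name with malloc/snprintf rather than asprintf, because the Linux asprintf(3) manual page describes the pointer contents as undefined when asprintf fails, so a check like `if (!new_bn)` may not be a reliable way to detect that failure.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the port-map types, for illustration only. */
enum sketch_port_type { SK_PORT_FREE = 0, SK_PORT_LEASED, SK_PORT_BRICKSERVER };

struct sketch_port {
    enum sketch_port_type type;
    char *brickname;              /* space-separated brick paths, or NULL */
};

#define SK_MAX_PORTS 65536        /* assumed table size for this sketch */

struct sketch_registry {
    int max_port;
    struct sketch_port ports[SK_MAX_PORTS];
};

/* Hypothetical reason codes: one per failure path listed above. */
enum extend_status {
    EXTEND_OK = 0,
    EXTEND_PORT_OUT_OF_RANGE = -1,
    EXTEND_BAD_PORT_TYPE = -2,
    EXTEND_NO_MEMORY = -3
};

/* Return 1 if name is a complete space-delimited entry of list, else 0. */
static int list_has_entry(const char *list, const char *name)
{
    size_t len = strlen(name);
    const char *entry;

    if (len == 0)
        return 0;
    for (entry = strstr(list, name); entry != NULL;
         entry = strstr(entry + 1, name)) {
        int starts_entry = (entry == list) || (entry[-1] == ' ');
        int ends_entry = (entry[len] == ' ') || (entry[len] == '\0');
        if (starts_entry && ends_entry)
            return 1;
    }
    return 0;
}

/* Build "old name" (or just "name" when old is NULL); NULL if malloc fails. */
static char *build_list(const char *old, const char *name)
{
    size_t need = (old ? strlen(old) + 1 : 0) + strlen(name) + 1;
    char *out = malloc(need);

    if (out == NULL)
        return NULL;
    if (old != NULL)
        snprintf(out, need, "%s %s", old, name);
    else
        snprintf(out, need, "%s", name);
    return out;
}

static enum extend_status registry_extend(struct sketch_registry *reg,
                                          int port, const char *brickname)
{
    assert(reg != NULL && brickname != NULL);

    if (port < 0 || port >= SK_MAX_PORTS || port > reg->max_port)
        return EXTEND_PORT_OUT_OF_RANGE;

    struct sketch_port *slot = &reg->ports[port];
    if (slot->type != SK_PORT_LEASED && slot->type != SK_PORT_BRICKSERVER)
        return EXTEND_BAD_PORT_TYPE;

    if (slot->brickname != NULL && list_has_entry(slot->brickname, brickname))
        return EXTEND_OK;         /* already registered on this port */

    char *new_bn = build_list(slot->brickname, brickname);
    if (new_bn == NULL)
        return EXTEND_NO_MEMORY;

    free(slot->brickname);
    slot->brickname = new_bn;
    return EXTEND_OK;
}
```

If the caller logged the returned code, a failure like the one above would show directly whether the port range, the port type, or the allocation was responsible.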
Thanks,
Sanju
Atin, any particular reason we don't want to propose it for 3.5.0, given that it can impact OCS environments?