Description of problem:
Brick multiplexing is not sharing the same PID for the bricks if we revert the volume options to default.

Version-Release number of selected component (if applicable):
glusterfs-3.12.2-7.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a 2 x 3 (distributed-replicated) volume and start it (volname: 23)
2. Disable the self-heal-daemon (volname: 23)
3. Enable the self-heal-daemon (volname: 23)
4. Create a 1 x 2 replicated volume (volname: 12)
5. Start the volume
6. Check the brick PIDs

Actual results:
Two different PIDs are seen for the bricks on the same node.

Expected results:
A single PID is expected for the bricks on the same node.

Additional info:

# gluster vol info

Volume Name: 12
Type: Replicate
Volume ID: ee108657-6f15-422d-9516-83674728d38f
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.70.35.61:/bricks/brick1/b0
Brick2: 10.70.35.174:/bricks/brick1/b1
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
cluster.localtime-logging: disable

Volume Name: 23
Type: Distributed-Replicate
Volume ID: d2821465-9dd4-4e72-929c-24ff8940ab69
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.35.61:/bricks/brick0/testvol_distributed-replicated_brick0
Brick2: 10.70.35.174:/bricks/brick0/testvol_distributed-replicated_brick1
Brick3: 10.70.35.17:/bricks/brick0/testvol_distributed-replicated_brick2
Brick4: 10.70.35.163:/bricks/brick0/testvol_distributed-replicated_brick3
Brick5: 10.70.35.136:/bricks/brick0/testvol_distributed-replicated_brick4
Brick6: 10.70.35.214:/bricks/brick0/testvol_distributed-replicated_brick5
Options Reconfigured:
cluster.self-heal-daemon: on
transport.address-family: inet
nfs.disable: on
cluster.localtime-logging: disable

# > volume status

# gluster vol status
Status of volume: 12
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.61:/bricks/brick1/b0         49153     0          Y       1030
Brick 10.70.35.174:/bricks/brick1/b1        49153     0          Y       19855
Self-heal Daemon on localhost               N/A       N/A        Y       1108
Self-heal Daemon on dhcp35-163.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       27239
Self-heal Daemon on dhcp35-17.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       26199
Self-heal Daemon on dhcp35-214.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       26494
Self-heal Daemon on dhcp35-136.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       16859
Self-heal Daemon on dhcp35-174.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       19877

Task Status of Volume 12
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: 23
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.61:/bricks/brick0/testvol_di
stributed-replicated_brick0                 49154     0          Y       759
Brick 10.70.35.174:/bricks/brick0/testvol_d
istributed-replicated_brick1                49154     0          Y       19750
Brick 10.70.35.17:/bricks/brick0/testvol_di
stributed-replicated_brick2                 49153     0          Y       26044
Brick 10.70.35.163:/bricks/brick0/testvol_d
istributed-replicated_brick3                49153     0          Y       27124
Brick 10.70.35.136:/bricks/brick0/testvol_d
istributed-replicated_brick4                49153     0          Y       16753
Brick 10.70.35.214:/bricks/brick0/testvol_d
istributed-replicated_brick5                49153     0          Y       26379
Self-heal Daemon on localhost               N/A       N/A        Y       1108
Self-heal Daemon on dhcp35-174.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       19877
Self-heal Daemon on dhcp35-163.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       27239
Self-heal Daemon on dhcp35-17.lab.eng.blr.r
edhat.com                                   N/A       N/A        Y       26199
Self-heal Daemon on dhcp35-136.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       16859
Self-heal Daemon on dhcp35-214.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       26494

Task Status of Volume 23
------------------------------------------------------------------------------
There are no active volume tasks

# > glusterfsd process

# ps -eaf | grep -i glusterfsd
root       759     1  0 07:29 ?        00:00:00 /usr/sbin/glusterfsd -s 10.70.35.61 --volfile-id 23.10.70.35.61.bricks-brick0-testvol_distributed-replicated_brick0 -p /var/run/gluster/vols/23/10.70.35.61-bricks-brick0-testvol_distributed-replicated_brick0.pid -S /var/run/gluster/fab7030ba424ceab78598686d2e8915f.socket --brick-name /bricks/brick0/testvol_distributed-replicated_brick0 -l /var/log/glusterfs/bricks/bricks-brick0-testvol_distributed-replicated_brick0.log --xlator-option *-posix.glusterd-uuid=be801d54-d39e-40cb-967c-0987cfd4f5f7 --brick-port 49154 --xlator-option 23-server.listen-port=49154
root      1030     1  0 07:30 ?        00:00:00 /usr/sbin/glusterfsd -s 10.70.35.61 --volfile-id 12.10.70.35.61.bricks-brick1-b0 -p /var/run/gluster/vols/12/10.70.35.61-bricks-brick1-b0.pid -S /var/run/gluster/4b97f0f7ed13959d7c03731d880c4462.socket --brick-name /bricks/brick1/b0 -l /var/log/glusterfs/bricks/bricks-brick1-b0.log --xlator-option *-posix.glusterd-uuid=be801d54-d39e-40cb-967c-0987cfd4f5f7 --brick-port 49153 --xlator-option 12-server.listen-port=49153
root      1480 24362  0 07:47 pts/0    00:00:00 grep --color=auto -i glusterfsd
#
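For reference, the reproduction steps above map roughly to the following CLI sequence. This is a sketch rather than a verbatim transcript: it assumes brick multiplexing had already been enabled cluster-wide via cluster.brick-multiplex, and it reuses the brick paths shown in the vol info output.

# gluster volume set all cluster.brick-multiplex on
# gluster volume create 23 replica 3 \
      10.70.35.61:/bricks/brick0/testvol_distributed-replicated_brick0 \
      10.70.35.174:/bricks/brick0/testvol_distributed-replicated_brick1 \
      10.70.35.17:/bricks/brick0/testvol_distributed-replicated_brick2 \
      10.70.35.163:/bricks/brick0/testvol_distributed-replicated_brick3 \
      10.70.35.136:/bricks/brick0/testvol_distributed-replicated_brick4 \
      10.70.35.214:/bricks/brick0/testvol_distributed-replicated_brick5
# gluster volume start 23
# gluster volume set 23 cluster.self-heal-daemon off
# gluster volume set 23 cluster.self-heal-daemon on
# gluster volume create 12 replica 2 \
      10.70.35.61:/bricks/brick1/b0 10.70.35.174:/bricks/brick1/b1
# gluster volume start 12
# gluster vol status                  # bricks of 12 and 23 on the same node show different PIDs
# ps -eaf | grep -i glusterfsd        # two glusterfsd processes on the node instead of one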
As discussed with Vijay, this is expected by design and hence by implementation. The current way of checking brick compatibility is to compare the volume options of the volume hosting the source brick with those of the volume hosting the brick that needs to be attached to the source brick's process. The piece of code which does this:

static int
opts_mismatch (dict_t *dict1, char *key, data_t *value1, void *dict2)
{
        data_t  *value2 = dict_get (dict2, key);
        int32_t  min_len;

        /*
         * If the option is only present on one, we can either look at the
         * default or assume a mismatch. Looking at the default is pretty
         * hard, because that's part of a structure within each translator and
         * there's no dlopen interface to get at it, so we assume a mismatch.
         * If the user really wants them to match (and for their bricks to be
         * multiplexed, they can always reset the option).
         */
        if (!value2) {
                gf_log (THIS->name, GF_LOG_DEBUG, "missing option %s", key);
                return -1;
        }

        min_len = MIN (value1->len, value2->len);
        if (strncmp (value1->data, value2->data, min_len) != 0) {
                gf_log (THIS->name, GF_LOG_DEBUG,
                        "option mismatch, %s, %s != %s",
                        key, value1->data, value2->data);
                return -1;
        }

        return 0;
}

We could fall back to the default value of an option that is missing on one side instead of treating it as a mismatch, but if you read through the comment in the function, it explains why it was chosen not to implement it that way. Looking at the nature of the problem, this doesn't impact anything from a functionality perspective; we just end up spawning a new brick process. I would not be looking to fix this in RHGS 3.4.0, so I'm marking this as 3.4.0-beyond. If you have a valid justification for why this needs to be fixed in 3.4.0, please do comment. I also disagree on the severity here; it should be low.
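For completeness, the workaround hinted at in the code comment is to clear the explicitly reconfigured option so that both volumes carry identical option dictionaries. A minimal sketch for this case follows; the volume names are taken from the report, and whether the already-running brick of volume 12 would get attached to the existing process without a volume restart has not been verified here, which is why the restart is included.

# gluster volume reset 23 cluster.self-heal-daemon
# gluster volume stop 12
# gluster volume start 12
# gluster vol status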