Description of problem:
To fix Bug 1726000, we have added a new config option in geo-replication which picks up <distCount> from the gluster v info --xml output and stores it in master_distribution_count. However, it was observed that the <distCount> value was always 3 for all Nx3 replica volumes and always 6 for disperse volumes. This results in master_distribution_count always being 3 for Nx3 replica volume sessions and 6 for all Distributed-Disperse volume sessions, so the patch sent for Bug 1726000 won't work.

################################################################################
[root@dhcp43-49 ~]# gluster v geo-rep master3 10.70.43.140::slave3 config master_distribution_count
3
[root@dhcp43-49 ~]# gluster v geo-rep master2 10.70.43.140::slave2 config master_distribution_count
3
################################################################################
[root@dhcp43-49 ~]# gluster v info master2

Volume Name: master2
Type: Distributed-Replicate
Volume ID: 33e3f668-a1f3-449c-b9e6-a21d4ca4561d
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: dhcp43-49.lab.eng.blr.redhat.com:/bricks/brick4/master2
Brick2: dhcp43-96.lab.eng.blr.redhat.com:/bricks/brick4/master2
Brick3: dhcp43-93.lab.eng.blr.redhat.com:/bricks/brick4/master2
Brick4: dhcp43-49.lab.eng.blr.redhat.com:/bricks/brick5/master2
Brick5: dhcp43-96.lab.eng.blr.redhat.com:/bricks/brick5/master2
Brick6: dhcp43-93.lab.eng.blr.redhat.com:/bricks/brick5/master2
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off
cluster.enable-shared-storage: enable
################################################################################
XML output:
################################################################################
[root@dhcp43-49 ~]# gluster v info master2 --xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cliOutput>
  <opRet>0</opRet>
  <opErrno>0</opErrno>
  <opErrstr/>
  <volInfo>
    <volumes>
      <volume>
        <name>master2</name>
        <id>33e3f668-a1f3-449c-b9e6-a21d4ca4561d</id>
        <status>1</status>
        <statusStr>Started</statusStr>
        <snapshotCount>0</snapshotCount>
        <brickCount>6</brickCount>
        <distCount>3</distCount>
        <stripeCount>1</stripeCount>
        <replicaCount>3</replicaCount>
        <arbiterCount>0</arbiterCount>
        <disperseCount>0</disperseCount>
        <redundancyCount>0</redundancyCount>
        <type>7</type>
        <typeStr>Distributed-Replicate</typeStr>
        <transport>0</transport>
        <bricks>
          <brick uuid="c0f99599-7244-4c85-a726-cc7550671498">dhcp43-49.lab.eng.blr.redhat.com:/bricks/brick4/master2<name>dhcp43-49.lab.eng.blr.redhat.com:/bricks/brick4/master2</name><hostUuid>c0f99599-7244-4c85-a726-cc7550671498</hostUuid><isArbiter>0</isArbiter></brick>
          <brick uuid="7cd717be-500d-435c-aae5-2c207a583c44">dhcp43-96.lab.eng.blr.redhat.com:/bricks/brick4/master2<name>dhcp43-96.lab.eng.blr.redhat.com:/bricks/brick4/master2</name><hostUuid>7cd717be-500d-435c-aae5-2c207a583c44</hostUuid><isArbiter>0</isArbiter></brick>
          <brick uuid="e1f9d580-ec02-4f92-bd12-10d91c1796df">dhcp43-93.lab.eng.blr.redhat.com:/bricks/brick4/master2<name>dhcp43-93.lab.eng.blr.redhat.com:/bricks/brick4/master2</name><hostUuid>e1f9d580-ec02-4f92-bd12-10d91c1796df</hostUuid><isArbiter>0</isArbiter></brick>
          <brick uuid="c0f99599-7244-4c85-a726-cc7550671498">dhcp43-49.lab.eng.blr.redhat.com:/bricks/brick5/master2<name>dhcp43-49.lab.eng.blr.redhat.com:/bricks/brick5/master2</name><hostUuid>c0f99599-7244-4c85-a726-cc7550671498</hostUuid><isArbiter>0</isArbiter></brick>
          <brick uuid="7cd717be-500d-435c-aae5-2c207a583c44">dhcp43-96.lab.eng.blr.redhat.com:/bricks/brick5/master2<name>dhcp43-96.lab.eng.blr.redhat.com:/bricks/brick5/master2</name><hostUuid>7cd717be-500d-435c-aae5-2c207a583c44</hostUuid><isArbiter>0</isArbiter></brick>
          <brick uuid="e1f9d580-ec02-4f92-bd12-10d91c1796df">dhcp43-93.lab.eng.blr.redhat.com:/bricks/brick5/master2<name>dhcp43-93.lab.eng.blr.redhat.com:/bricks/brick5/master2</name><hostUuid>e1f9d580-ec02-4f92-bd12-10d91c1796df</hostUuid><isArbiter>0</isArbiter></brick>
        </bricks>
        <optCount>8</optCount>
        <options>
          <option>
            <name>changelog.changelog</name>
            <value>on</value>
          </option>
          <option>
            <name>geo-replication.ignore-pid-check</name>
            <value>on</value>
          </option>
          <option>
            <name>geo-replication.indexing</name>
            <value>on</value>
          </option>
          <option>
            <name>transport.address-family</name>
            <value>inet</value>
          </option>
          <option>
            <name>storage.fips-mode-rchecksum</name>
            <value>on</value>
          </option>
          <option>
            <name>nfs.disable</name>
            <value>on</value>
          </option>
          <option>
            <name>performance.client-io-threads</name>
            <value>off</value>
          </option>
          <option>
            <name>cluster.enable-shared-storage</name>
            <value>enable</value>
          </option>
        </options>
      </volume>
      <count>1</count>
    </volumes>
  </volInfo>
</cliOutput>
################################################################################

Version-Release number of selected component (if applicable):
glusterfs-6.0-15

How reproducible:
Always

Steps to Reproduce:
1. Create an Nx3 volume.
2. Run # gluster v info --xml and check the <distCount> value in the XML output.

Actual results:
The value of <distCount> is always 3.

Expected results:
Based on the type of volume, the value of <distCount> should be set to the appropriate number, i.e. the number of distribute subvolumes.

Additional info:
- This isn't a regression, as I was able to reproduce it on 3.4.0 and 3.4.4 as well.
- For EC volumes it shows <distCount>6</distCount> even when the Distributed-Disperse volume is of type 2 x (4 + 2) or 3 x (4 + 2); see the outputs below.
################################################################################
[root@dhcp35-107 ~]# gluster v info

Volume Name: slave
Type: Distributed-Disperse
Volume ID: 98f79758-cb32-4c05-8f2b-367ca92ff2b3
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: dhcp35-107.lab.eng.blr.redhat.com:/bricks/brick1/slave
Brick2: dhcp35-119.lab.eng.blr.redhat.com:/bricks/brick1/slave
Brick3: dhcp35-164.lab.eng.blr.redhat.com:/bricks/brick1/slave
Brick4: dhcp35-107.lab.eng.blr.redhat.com:/bricks/brick2/slave
Brick5: dhcp35-119.lab.eng.blr.redhat.com:/bricks/brick2/slave
Brick6: dhcp35-164.lab.eng.blr.redhat.com:/bricks/brick2/slave
Brick7: dhcp35-107.lab.eng.blr.redhat.com:/bricks/brick3/slave
Brick8: dhcp35-119.lab.eng.blr.redhat.com:/bricks/brick3/slave
Brick9: dhcp35-164.lab.eng.blr.redhat.com:/bricks/brick3/slave
Brick10: dhcp35-107.lab.eng.blr.redhat.com:/bricks/brick4/slave
Brick11: dhcp35-119.lab.eng.blr.redhat.com:/bricks/brick4/slave
Brick12: dhcp35-164.lab.eng.blr.redhat.com:/bricks/brick4/slave
Options Reconfigured:
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
################################################################################
[root@dhcp35-107 ~]# gluster v info --xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cliOutput>
  <opRet>0</opRet>
  <opErrno>0</opErrno>
  <opErrstr/>
  <volInfo>
    <volumes>
      <volume>
        <name>slave</name>
        <id>98f79758-cb32-4c05-8f2b-367ca92ff2b3</id>
        <status>1</status>
        <statusStr>Started</statusStr>
        <snapshotCount>0</snapshotCount>
        <brickCount>12</brickCount>
        <distCount>6</distCount>
        <stripeCount>1</stripeCount>
        <replicaCount>1</replicaCount>
        <arbiterCount>0</arbiterCount>
        <disperseCount>6</disperseCount>
        <redundancyCount>2</redundancyCount>
        <type>9</type>
        <typeStr>Distributed-Disperse</typeStr>
        <transport>0</transport>
        <bricks>
          <brick uuid="96913581-ac68-4235-b825-b63d0cbbb8f3">dhcp35-107.lab.eng.blr.redhat.com:/bricks/brick1/slave<name>dhcp35-107.lab.eng.blr.redhat.com:/bricks/brick1/slave</name><hostUuid>96913581-ac68-4235-b825-b63d0cbbb8f3</hostUuid><isArbiter>0</isArbiter></brick>
          <brick uuid="3c1da764-56fb-4ddb-a87e-598730956618">dhcp35-119.lab.eng.blr.redhat.com:/bricks/brick1/slave<name>dhcp35-119.lab.eng.blr.redhat.com:/bricks/brick1/slave</name><hostUuid>3c1da764-56fb-4ddb-a87e-598730956618</hostUuid><isArbiter>0</isArbiter></brick>
          <brick uuid="7f13d3bc-99f8-4fc9-b145-f01b8e9da9cb">dhcp35-164.lab.eng.blr.redhat.com:/bricks/brick1/slave<name>dhcp35-164.lab.eng.blr.redhat.com:/bricks/brick1/slave</name><hostUuid>7f13d3bc-99f8-4fc9-b145-f01b8e9da9cb</hostUuid><isArbiter>0</isArbiter></brick>
          <brick uuid="96913581-ac68-4235-b825-b63d0cbbb8f3">dhcp35-107.lab.eng.blr.redhat.com:/bricks/brick2/slave<name>dhcp35-107.lab.eng.blr.redhat.com:/bricks/brick2/slave</name><hostUuid>96913581-ac68-4235-b825-b63d0cbbb8f3</hostUuid><isArbiter>0</isArbiter></brick>
          <brick uuid="3c1da764-56fb-4ddb-a87e-598730956618">dhcp35-119.lab.eng.blr.redhat.com:/bricks/brick2/slave<name>dhcp35-119.lab.eng.blr.redhat.com:/bricks/brick2/slave</name><hostUuid>3c1da764-56fb-4ddb-a87e-598730956618</hostUuid><isArbiter>0</isArbiter></brick>
          <brick uuid="7f13d3bc-99f8-4fc9-b145-f01b8e9da9cb">dhcp35-164.lab.eng.blr.redhat.com:/bricks/brick2/slave<name>dhcp35-164.lab.eng.blr.redhat.com:/bricks/brick2/slave</name><hostUuid>7f13d3bc-99f8-4fc9-b145-f01b8e9da9cb</hostUuid><isArbiter>0</isArbiter></brick>
          <brick uuid="96913581-ac68-4235-b825-b63d0cbbb8f3">dhcp35-107.lab.eng.blr.redhat.com:/bricks/brick3/slave<name>dhcp35-107.lab.eng.blr.redhat.com:/bricks/brick3/slave</name><hostUuid>96913581-ac68-4235-b825-b63d0cbbb8f3</hostUuid><isArbiter>0</isArbiter></brick>
          <brick uuid="3c1da764-56fb-4ddb-a87e-598730956618">dhcp35-119.lab.eng.blr.redhat.com:/bricks/brick3/slave<name>dhcp35-119.lab.eng.blr.redhat.com:/bricks/brick3/slave</name><hostUuid>3c1da764-56fb-4ddb-a87e-598730956618</hostUuid><isArbiter>0</isArbiter></brick>
          <brick uuid="7f13d3bc-99f8-4fc9-b145-f01b8e9da9cb">dhcp35-164.lab.eng.blr.redhat.com:/bricks/brick3/slave<name>dhcp35-164.lab.eng.blr.redhat.com:/bricks/brick3/slave</name><hostUuid>7f13d3bc-99f8-4fc9-b145-f01b8e9da9cb</hostUuid><isArbiter>0</isArbiter></brick>
          <brick uuid="96913581-ac68-4235-b825-b63d0cbbb8f3">dhcp35-107.lab.eng.blr.redhat.com:/bricks/brick4/slave<name>dhcp35-107.lab.eng.blr.redhat.com:/bricks/brick4/slave</name><hostUuid>96913581-ac68-4235-b825-b63d0cbbb8f3</hostUuid><isArbiter>0</isArbiter></brick>
          <brick uuid="3c1da764-56fb-4ddb-a87e-598730956618">dhcp35-119.lab.eng.blr.redhat.com:/bricks/brick4/slave<name>dhcp35-119.lab.eng.blr.redhat.com:/bricks/brick4/slave</name><hostUuid>3c1da764-56fb-4ddb-a87e-598730956618</hostUuid><isArbiter>0</isArbiter></brick>
          <brick uuid="7f13d3bc-99f8-4fc9-b145-f01b8e9da9cb">dhcp35-164.lab.eng.blr.redhat.com:/bricks/brick4/slave<name>dhcp35-164.lab.eng.blr.redhat.com:/bricks/brick4/slave</name><hostUuid>7f13d3bc-99f8-4fc9-b145-f01b8e9da9cb</hostUuid><isArbiter>0</isArbiter></brick>
        </bricks>
        <optCount>3</optCount>
        <options>
          <option>
            <name>transport.address-family</name>
            <value>inet</value>
          </option>
          <option>
            <name>storage.fips-mode-rchecksum</name>
            <value>on</value>
          </option>
          <option>
            <name>nfs.disable</name>
            <value>on</value>
          </option>
        </options>
      </volume>
      <count>1</count>
    </volumes>
  </volInfo>
</cliOutput>
################################################################################
RCA:

From the declaration of the glusterd_volinfo_ struct:

    int subvol_count;    /* Number of subvolumes in a distribute volume */
    int dist_leaf_count; /* Number of bricks in one distribute subvolume */

glusterd_add_volume_detail_to_dict() is adding dist_leaf_count into the dict instead of subvol_count:

    keylen = snprintf(key, sizeof(key), "volume%d.dist_count", count);
    ret = dict_set_int32n(volumes, key, keylen,
                          volinfo->dist_leaf_count); <-- this leads to the wrong
                                                         distCount value in the
                                                         vol info --xml output
    if (ret)
        goto out;

Thanks,
Sanju
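Based on the struct comments above, a minimal sketch of the change the RCA points at (hypothetical; the authoritative change is in the upstream patch linked below):

    keylen = snprintf(key, sizeof(key), "volume%d.dist_count", count);
    /* subvol_count ("number of subvolumes in a distribute volume") is what
     * <distCount> should carry; dist_leaf_count is the per-subvolume brick
     * count (3 for Nx3 replica, 6 for N x (4 + 2) disperse). */
    ret = dict_set_int32n(volumes, key, keylen, volinfo->subvol_count);
    if (ret)
        goto out;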
Patch https://review.gluster.org/#/c/glusterfs/+/23521 posted upstream for review.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3249