Description of problem:
While trying to bump up the op-version after upgrading a CNS setup from 3.9 to 3.10, the operation timed out.

sh-4.2# gluster volume set all cluster.op-version 31302
Error : Request timed out

However, after a few minutes the op-version had been bumped up, so the timeout has no functional impact on the command.

sh-4.2# gluster vol get all all
Option                            Value
------                            -----
cluster.server-quorum-ratio       51
cluster.enable-shared-storage     disable
cluster.op-version                31302
cluster.max-op-version            31302
cluster.brick-multiplex           on
cluster.max-bricks-per-process    0
cluster.daemon-log-level          INFO

Version-Release number of selected component (if applicable):
glusterfs-3.12.2-18.el7rhgs.x86_64

# glusterd --version
glusterfs 3.12.2

How reproducible:
1/1

Steps to Reproduce:
1. Create a CNS system with 1200 volumes.
2. Upgrade CNS 3.9 to OCS 3.10.
3. After the upgrade, bump up the op-version by running 'gluster volume set all cluster.op-version 31302'.

Actual results:
The command timed out.

Expected results:
Setting the op-version should succeed and return success.

Additional info:
Root cause: On a setup with 1200 volumes, the code flow tries to restart all the shd services 1200 times, once per volume, which is overkill. Restarting all the per-node daemons only once should be sufficient.

Upstream patch: https://review.gluster.org/#/c/glusterfs/+/21608/
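
To illustrate the difference in restart counts, here is a minimal sketch in Python. It is not the glusterd code and does not reflect the contents of the upstream patch; the `Daemon` class and both `bump_*` functions are hypothetical names used only to contrast restarting inside the per-volume loop with restarting once after it.

```python
# Illustrative sketch only: Daemon, bump_per_volume and bump_once are
# hypothetical names, not glusterd functions.

class Daemon:
    """Stand-in for a per-node daemon such as shd; counts its restarts."""

    def __init__(self, name):
        self.name = name
        self.restarts = 0

    def restart(self):
        self.restarts += 1


def bump_per_volume(volumes, daemons):
    """Problematic flow: each volume update restarts every per-node daemon."""
    for _ in volumes:
        for daemon in daemons:
            daemon.restart()


def bump_once(volumes, daemons):
    """Suggested flow: handle all volumes first, then restart each daemon once."""
    for _ in volumes:
        pass  # per-volume bookkeeping would happen here
    for daemon in daemons:
        daemon.restart()


volumes = [f"vol{i}" for i in range(1200)]

slow = [Daemon("shd")]
bump_per_volume(volumes, slow)

fast = [Daemon("shd")]
bump_once(volumes, fast)

print(slow[0].restarts, fast[0].restarts)  # 1200 1
```

With 1200 volumes the first flow performs 1200 restarts per daemon while the second performs one, which is the reduction the root cause analysis calls for.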
I'm raising the severity to high: even though the transaction eventually completes, it blocks the other transactions in the queue, which means all gluster management commands will be stuck while the cluster.op-version bump is in progress.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:3827