Created attachment 931341 [details]
Scripts required to execute the case

Description of problem:
=======================
The storage cluster contains 4 distribute-replicate (2 x 2) volumes. When "gluster volume status" is executed, the following error messages are observed in the glusterd logs:

[2014-08-27 07:55:03.408345] E [glusterd-op-sm.c:207:glusterd_get_txn_opinfo] 0-: Unable to get transaction opinfo for transaction ID : 18c8abf1-21cd-4247-ac76-b47c7cf36764
[2014-08-27 08:05:42.957555] E [glusterd-op-sm.c:207:glusterd_get_txn_opinfo] 0-: Unable to get transaction opinfo for transaction ID : f2f1e439-49bd-45e5-81e7-dd71f2a9630a
[2014-08-27 08:05:50.565215] E [glusterd-op-sm.c:207:glusterd_get_txn_opinfo] 0-: Unable to get transaction opinfo for transaction ID : 0a8e78fc-b929-4ee9-a756-bb70ef749b60

Version-Release number of selected component (if applicable):
=============================================================
glusterfs 3.6.0.27 built on Aug 4 2014 11:49:25

How reproducible:
=====================
Observed multiple times.

Steps to Reproduce:
=========================
1. Create 4 distribute-replicate (2 x 2) volumes and start them.
2. On the management node, run a cron job that creates a snapshot of all the volumes every 15 minutes (a hypothetical sketch of such a snapshot script follows these steps):
   */15 9,10,11,12,13,14,15,16,17,18 * * * cd /root && ./snaps.sh create >>/tmp/crontab_snaps_output 2>&1
   NOTE: The cron job had created around 392 snapshots over the previous 2 days, so those snapshots were deleted before starting the I/O on the mount points.
3. Create 4 mounts, one per volume, from 2 different clients.
4. Copy the attached scripts create_dirs_files_multi_thread.py and create_dirs_files.pl to all the mount points from one of the clients.
5. From all 4 mounts on client1, execute:
   ./create_dirs_files_multi_thread.py --number-of-threads 100 --num-files-per-dir 25 --min-file-size 1024 --max-file-size 10240 --starting-dir-num 1 --dir-depth 5 --dir-width 4
6. From all 4 mounts on client2, execute:
   ./create_dirs_files_multi_thread.py --number-of-threads 100 --num-files-per-dir 25 --min-file-size 1024 --max-file-size 5120 --starting-dir-num 101 --dir-depth 5 --dir-width 4
7. Execute "gluster volume status" from one of the nodes.
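For reference, a minimal Python sketch of what the cron-driven "snaps.sh create" step could look like. The actual snaps.sh is part of the attached scripts and may differ; the snapshot naming scheme and the use of "gluster volume list" below are assumptions.

# Hypothetical stand-in for the attached snaps.sh "create" mode: take one
# snapshot of every volume known to this node. The real script may differ.
import subprocess
import time

def list_volumes():
    # 'gluster volume list' prints one volume name per line.
    out = subprocess.check_output(["gluster", "volume", "list"])
    return out.decode().split()

def create_snapshots():
    stamp = time.strftime("%Y%m%d%H%M%S")
    for vol in list_volumes():
        snap_name = "%s_snap_%s" % (vol, stamp)  # assumed naming scheme
        # CLI form: gluster snapshot create <snapname> <volname>
        subprocess.check_call(["gluster", "snapshot", "create", snap_name, vol])

if __name__ == "__main__":
    create_snapshots()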
Actual results:
==================
Observed the error messages in the glusterd log

Expected results:
=====================
TBD

Additional info:
====================
root@mia [Aug-27-2014-13:24:45] >gluster snapshot info
No snapshots present

root@mia [Aug-27-2014-13:24:54] >gluster v info

Volume Name: vol1
Type: Distributed-Replicate
Volume ID: 86fb01ba-be09-4734-87ab-bd77b926c1e5
Status: Started
Snap Volume: no
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: rhs-client11:/rhs/device0/b1
Brick2: rhs-client12:/rhs/device0/b2
Brick3: rhs-client13:/rhs/device0/b3
Brick4: rhs-client14:/rhs/device0/b4
Options Reconfigured:
features.barrier: disable
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable

Volume Name: vol2
Type: Distributed-Replicate
Volume ID: f9740540-88d7-4ee6-b720-024bd827bcac
Status: Started
Snap Volume: no
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: rhs-client11:/rhs/device1/b1
Brick2: rhs-client12:/rhs/device1/b2
Brick3: rhs-client13:/rhs/device1/b3
Brick4: rhs-client14:/rhs/device1/b4
Options Reconfigured:
features.barrier: disable
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable

Volume Name: vol3
Type: Distributed-Replicate
Volume ID: 2dc6c55c-e945-4f44-bd4a-e0da47041b78
Status: Started
Snap Volume: no
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: rhs-client11:/rhs/device2/b1
Brick2: rhs-client12:/rhs/device2/b2
Brick3: rhs-client13:/rhs/device2/b3
Brick4: rhs-client14:/rhs/device2/b4
Options Reconfigured:
features.barrier: disable
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable

Volume Name: vol4
Type: Distributed-Replicate
Volume ID: 224dc91f-530c-4dea-a289-c7b5f1239133
Status: Started
Snap Volume: no
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: rhs-client11:/rhs/device3/b1
Brick2: rhs-client12:/rhs/device3/b2
Brick3: rhs-client13:/rhs/device3/b3
Brick4: rhs-client14:/rhs/device3/b4
Options Reconfigured:
features.barrier: disable
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable

root@mia [Aug-27-2014-13:24:59] >gluster v status
Status of volume: vol1
Gluster process                                       Port    Online  Pid
------------------------------------------------------------------------------
Brick rhs-client11:/rhs/device0/b1                    49152   Y       7991
Brick rhs-client12:/rhs/device0/b2                    49152   Y       18408
Brick rhs-client13:/rhs/device0/b3                    49152   Y       25248
Brick rhs-client14:/rhs/device0/b4                    49152   Y       2898
NFS Server on localhost                               2049    Y       1570
Self-heal Daemon on localhost                         N/A     Y       1578
NFS Server on rhs-client11                            2049    Y       7998
Self-heal Daemon on rhs-client11                      N/A     Y       8005
NFS Server on rhs-client13                            2049    Y       21453
Self-heal Daemon on rhs-client13                      N/A     Y       21463
NFS Server on rhs-client14                            2049    Y       2906
Self-heal Daemon on rhs-client14                      N/A     Y       2916
NFS Server on rhs-client12                            2049    Y       18664
Self-heal Daemon on rhs-client12                      N/A     Y       18673

Task Status of Volume vol1
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol2
Gluster process                                       Port    Online  Pid
------------------------------------------------------------------------------
Brick rhs-client11:/rhs/device1/b1                    49153   Y       4079
Brick rhs-client12:/rhs/device1/b2                    49153   Y       18508
Brick rhs-client13:/rhs/device1/b3                    49153   Y       25348
Brick rhs-client14:/rhs/device1/b4                    49153   Y       31190
NFS Server on localhost                               2049    Y       1570
Self-heal Daemon on localhost                         N/A     Y       1578
NFS Server on rhs-client13                            2049    Y       21453
Self-heal Daemon on rhs-client13                      N/A     Y       21463
NFS Server on rhs-client11                            2049    Y       7998
Self-heal Daemon on rhs-client11                      N/A     Y       8005
NFS Server on rhs-client14                            2049    Y       2906
Self-heal Daemon on rhs-client14                      N/A     Y       2916
NFS Server on rhs-client12                            2049    Y       18664
Self-heal Daemon on rhs-client12                      N/A     Y       18673

Task Status of Volume vol2
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol3
Gluster process                                       Port    Online  Pid
------------------------------------------------------------------------------
Brick rhs-client11:/rhs/device2/b1                    49154   Y       4150
Brick rhs-client12:/rhs/device2/b2                    49154   Y       18583
Brick rhs-client13:/rhs/device2/b3                    49154   Y       25417
Brick rhs-client14:/rhs/device2/b4                    49154   Y       31257
NFS Server on localhost                               2049    Y       1570
Self-heal Daemon on localhost                         N/A     Y       1578
NFS Server on rhs-client13                            2049    Y       21453
Self-heal Daemon on rhs-client13                      N/A     Y       21463
NFS Server on rhs-client11                            2049    Y       7998
Self-heal Daemon on rhs-client11                      N/A     Y       8005
NFS Server on rhs-client14                            2049    Y       2906
Self-heal Daemon on rhs-client14                      N/A     Y       2916
NFS Server on rhs-client12                            2049    Y       18664
Self-heal Daemon on rhs-client12                      N/A     Y       18673

Task Status of Volume vol3
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol4
Gluster process                                       Port    Online  Pid
------------------------------------------------------------------------------
Brick rhs-client11:/rhs/device3/b1                    49155   Y       4220
Brick rhs-client12:/rhs/device3/b2                    49155   Y       18651
Brick rhs-client13:/rhs/device3/b3                    49155   Y       25493
Brick rhs-client14:/rhs/device3/b4                    49155   Y       31327
NFS Server on localhost                               2049    Y       1570
Self-heal Daemon on localhost                         N/A     Y       1578
NFS Server on rhs-client14                            2049    Y       2906
Self-heal Daemon on rhs-client14                      N/A     Y       2916
NFS Server on rhs-client13                            2049    Y       21453
Self-heal Daemon on rhs-client13                      N/A     Y       21463
NFS Server on rhs-client11                            2049    Y       7998
Self-heal Daemon on rhs-client11                      N/A     Y       8005
NFS Server on rhs-client12                            2049    Y       18664
Self-heal Daemon on rhs-client12                      N/A     Y       18673

Task Status of Volume vol4
------------------------------------------------------------------------------
There are no active volume tasks

root@mia [Aug-27-2014-13:25:03] >
root@mia [Aug-27-2014-13:35:41] >
root@mia [Aug-27-2014-13:35:41] >gluster v status
Status of volume: vol1
Gluster process                                       Port    Online  Pid
------------------------------------------------------------------------------
Brick rhs-client11:/rhs/device0/b1                    49152   Y       7991
Brick rhs-client12:/rhs/device0/b2                    49152   Y       18408
Brick rhs-client13:/rhs/device0/b3                    49152   Y       25248
Brick rhs-client14:/rhs/device0/b4                    49152   Y       2898
NFS Server on localhost                               2049    Y       1570
Self-heal Daemon on localhost                         N/A     Y       1578
NFS Server on rhs-client13                            2049    Y       21453
Self-heal Daemon on rhs-client13                      N/A     Y       21463
NFS Server on rhs-client11                            2049    Y       7998
Self-heal Daemon on rhs-client11                      N/A     Y       8005
NFS Server on rhs-client14                            2049    Y       2906
Self-heal Daemon on rhs-client14                      N/A     Y       2916
NFS Server on rhs-client12                            2049    Y       18664
Self-heal Daemon on rhs-client12                      N/A     Y       18673

Task Status of Volume vol1
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol2
Gluster process                                       Port    Online  Pid
------------------------------------------------------------------------------
Brick rhs-client11:/rhs/device1/b1                    49153   Y       4079
Brick rhs-client12:/rhs/device1/b2                    49153   Y       18508
Brick rhs-client13:/rhs/device1/b3                    49153   Y       25348
Brick rhs-client14:/rhs/device1/b4                    49153   Y       31190
NFS Server on localhost                               2049    Y       1570
Self-heal Daemon on localhost                         N/A     Y       1578
NFS Server on rhs-client11                            2049    Y       7998
Self-heal Daemon on rhs-client11                      N/A     Y       8005
NFS Server on rhs-client13                            2049    Y       21453
Self-heal Daemon on rhs-client13                      N/A     Y       21463
NFS Server on rhs-client14                            2049    Y       2906
Self-heal Daemon on rhs-client14                      N/A     Y       2916
NFS Server on rhs-client12                            2049    Y       18664
Self-heal Daemon on rhs-client12                      N/A     Y       18673

Task Status of Volume vol2
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol3
Gluster process                                       Port    Online  Pid
------------------------------------------------------------------------------
Brick rhs-client11:/rhs/device2/b1                    49154   Y       4150
Brick rhs-client12:/rhs/device2/b2                    49154   Y       18583
Brick rhs-client13:/rhs/device2/b3                    49154   Y       25417
Brick rhs-client14:/rhs/device2/b4                    49154   Y       31257
NFS Server on localhost                               2049    Y       1570
Self-heal Daemon on localhost                         N/A     Y       1578
NFS Server on rhs-client13                            2049    Y       21453
Self-heal Daemon on rhs-client13                      N/A     Y       21463
NFS Server on rhs-client11                            2049    Y       7998
Self-heal Daemon on rhs-client11                      N/A     Y       8005
NFS Server on rhs-client14                            2049    Y       2906
Self-heal Daemon on rhs-client14                      N/A     Y       2916
NFS Server on rhs-client12                            2049    Y       18664
Self-heal Daemon on rhs-client12                      N/A     Y       18673

Task Status of Volume vol3
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol4
Gluster process                                       Port    Online  Pid
------------------------------------------------------------------------------
Brick rhs-client11:/rhs/device3/b1                    49155   Y       4220
Brick rhs-client12:/rhs/device3/b2                    49155   Y       18651
Brick rhs-client13:/rhs/device3/b3                    49155   Y       25493
Brick rhs-client14:/rhs/device3/b4                    49155   Y       31327
NFS Server on localhost                               2049    Y       1570
Self-heal Daemon on localhost                         N/A     Y       1578
NFS Server on rhs-client13                            2049    Y       21453
Self-heal Daemon on rhs-client13                      N/A     Y       21463
NFS Server on rhs-client11                            2049    Y       7998
Self-heal Daemon on rhs-client11                      N/A     Y       8005
NFS Server on rhs-client14                            2049    Y       2906
Self-heal Daemon on rhs-client14                      N/A     Y       2916
NFS Server on rhs-client12                            2049    Y       18664
Self-heal Daemon on rhs-client12                      N/A     Y       18673

Task Status of Volume vol4
------------------------------------------------------------------------------
There are no active volume tasks

root@mia [Aug-27-2014-13:35:42] >
root@mia [Aug-27-2014-13:35:49] >
root@mia [Aug-27-2014-13:35:49] >
root@mia [Aug-27-2014-13:35:49] >gluster v status
Status of volume: vol1
Gluster process                                       Port    Online  Pid
------------------------------------------------------------------------------
Brick rhs-client11:/rhs/device0/b1                    49152   Y       7991
Brick rhs-client12:/rhs/device0/b2                    49152   Y       18408
Brick rhs-client13:/rhs/device0/b3                    49152   Y       25248
Brick rhs-client14:/rhs/device0/b4                    49152   Y       2898
NFS Server on localhost                               2049    Y       1570
Self-heal Daemon on localhost                         N/A     Y       1578
NFS Server on rhs-client13                            2049    Y       21453
Self-heal Daemon on rhs-client13                      N/A     Y       21463
NFS Server on rhs-client14                            2049    Y       2906
Self-heal Daemon on rhs-client14                      N/A     Y       2916
NFS Server on rhs-client11                            2049    Y       7998
Self-heal Daemon on rhs-client11                      N/A     Y       8005
NFS Server on rhs-client12                            2049    Y       18664
Self-heal Daemon on rhs-client12                      N/A     Y       18673

Task Status of Volume vol1
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol2
Gluster process                                       Port    Online  Pid
------------------------------------------------------------------------------
Brick rhs-client11:/rhs/device1/b1                    49153   Y       4079
Brick rhs-client12:/rhs/device1/b2                    49153   Y       18508
Brick rhs-client13:/rhs/device1/b3                    49153   Y       25348
Brick rhs-client14:/rhs/device1/b4                    49153   Y       31190
NFS Server on localhost                               2049    Y       1570
Self-heal Daemon on localhost                         N/A     Y       1578
NFS Server on rhs-client13                            2049    Y       21453
Self-heal Daemon on rhs-client13                      N/A     Y       21463
NFS Server on rhs-client11                            2049    Y       7998
Self-heal Daemon on rhs-client11                      N/A     Y       8005
NFS Server on rhs-client14                            2049    Y       2906
Self-heal Daemon on rhs-client14                      N/A     Y       2916
NFS Server on rhs-client12                            2049    Y       18664
Self-heal Daemon on rhs-client12                      N/A     Y       18673

Task Status of Volume vol2
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol3
Gluster process                                       Port    Online  Pid
------------------------------------------------------------------------------
Brick rhs-client11:/rhs/device2/b1                    49154   Y       4150
Brick rhs-client12:/rhs/device2/b2                    49154   Y       18583
Brick rhs-client13:/rhs/device2/b3                    49154   Y       25417
Brick rhs-client14:/rhs/device2/b4                    49154   Y       31257
NFS Server on localhost                               2049    Y       1570
Self-heal Daemon on localhost                         N/A     Y       1578
NFS Server on rhs-client13                            2049    Y       21453
Self-heal Daemon on rhs-client13                      N/A     Y       21463
NFS Server on rhs-client11                            2049    Y       7998
Self-heal Daemon on rhs-client11                      N/A     Y       8005
NFS Server on rhs-client14                            2049    Y       2906
Self-heal Daemon on rhs-client14                      N/A     Y       2916
NFS Server on rhs-client12                            2049    Y       18664
Self-heal Daemon on rhs-client12                      N/A     Y       18673

Task Status of Volume vol3
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vol4
Gluster process                                       Port    Online  Pid
------------------------------------------------------------------------------
Brick rhs-client11:/rhs/device3/b1                    49155   Y       4220
Brick rhs-client12:/rhs/device3/b2                    49155   Y       18651
Brick rhs-client13:/rhs/device3/b3                    49155   Y       25493
Brick rhs-client14:/rhs/device3/b4                    49155   Y       31327
NFS Server on localhost                               2049    Y       1570
Self-heal Daemon on localhost                         N/A     Y       1578
NFS Server on rhs-client11                            2049    Y       7998
Self-heal Daemon on rhs-client11                      N/A     Y       8005
NFS Server on rhs-client13                            2049    Y       21453
Self-heal Daemon on rhs-client13                      N/A     Y       21463
NFS Server on rhs-client14                            2049    Y       2906
Self-heal Daemon on rhs-client14                      N/A     Y       2916
NFS Server on rhs-client12                            2049    Y       18664
Self-heal Daemon on rhs-client12                      N/A     Y       18673

Task Status of Volume vol4
------------------------------------------------------------------------------
There are no active volume tasks

root@mia [Aug-27-2014-13:35:50] >
SOS Reports: http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/1134288/

BRICK1    : rhs-client11
BRICK2    : rhs-client12
BRICK3    : rhs-client13
BRICK4    : rhs-client14
MGMT_NODE : mia
Although the log level here is ERROR, this message is expected when "gluster volume status" is executed. In the volume status transaction flow, the first RPC request reaches glusterd without a volume name; when no volume name is specified, the transaction op info is never set, so this message is logged. Since this is expected functionality, closing this bug.
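To make the flow above concrete, here is a simplified Python sketch of the behaviour described; the real implementation is C code (glusterd_get_txn_opinfo in glusterd-op-sm.c), and the table and function names below are illustrative only. The op info is stored only when a volume name accompanies the request, so the later lookup by transaction ID misses and logs at ERROR level even though nothing is actually wrong.

# Simplified illustration of the transaction-opinfo flow described above.
import logging
import uuid

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")

txn_opinfo = {}  # transaction ID -> op info

def set_txn_opinfo(txn_id, opinfo):
    txn_opinfo[txn_id] = opinfo

def get_txn_opinfo(txn_id):
    if txn_id not in txn_opinfo:
        # This is the message reported in this bug.
        logging.error("Unable to get transaction opinfo for "
                      "transaction ID : %s", txn_id)
        return None
    return txn_opinfo[txn_id]

def handle_status_request(volname=None):
    txn_id = str(uuid.uuid4())
    if volname is not None:
        # Op info is only recorded when the request carries a volume name.
        set_txn_opinfo(txn_id, {"op": "status", "volname": volname})
    # Later stages of the transaction look the op info up again.
    return get_txn_opinfo(txn_id)

handle_status_request()        # no volume name in the request -> ERROR logged, harmless
handle_status_request("vol1")  # volume name present -> lookup succeeds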
http://review.gluster.org/9207 addressed this confusion upstream. Here is a snippet of the commit message:

"There is a code path (__glusterd_handle_stage_op) where glusterd_get_txn_opinfo may fail to get a valid transaction id if there is no volume name provided in the command; however, if this function fails to get a txn id in the op state machine then it is a serious issue and op-sm is impacted. From a debuggability aspect, gf_log () can never give the consumer of this function, so logging these failures with gf_log_calling_fn is a must here."

Once the branching takes place I believe we can close this bug.
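The point of the upstream change is debuggability: a failed lookup is harmless when it comes from __glusterd_handle_stage_op with no volume name, but serious when it comes from the op state machine, so the log needs to identify the caller. Below is a rough Python analogue of what caller-aware logging (gf_log_calling_fn in the real C code) provides, using the standard inspect module; the function names are illustrative only.

# Record which function hit the lookup failure so a benign miss can be
# told apart from a serious one.
import inspect
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")

txn_opinfo = {}  # transaction ID -> op info

def get_txn_opinfo(txn_id):
    if txn_id not in txn_opinfo:
        caller = inspect.stack()[1].function  # name of the calling function
        logging.error("Unable to get transaction opinfo for "
                      "transaction ID : %s (called from %s)", txn_id, caller)
        return None
    return txn_opinfo[txn_id]

def handle_stage_op():
    # A miss is expected here when the command carried no volume name.
    get_txn_opinfo("18c8abf1-21cd-4247-ac76-b47c7cf36764")

def op_state_machine_step():
    # A miss here would point at a real problem in the op state machine.
    get_txn_opinfo("f2f1e439-49bd-45e5-81e7-dd71f2a9630a")

handle_stage_op()
op_state_machine_step()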
Downstream patch https://code.engineering.redhat.com/gerrit/55037 has been posted for review.
Tested with glusterfs-3.7.1-12.el7rhgs. Running 'gluster volume status' on the node no longer adds any error log messages to the glusterd logs. Marking this bug as verified.
Doc text signed off.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1845.html