Description of problem:
I have a data tiering volume and started executing fs-sanity on it. Now I see that dbench, a tool that is part of fs-sanity, is hung with threads in "D" state.

Version-Release number of selected component (if applicable):
glusterfs-3.7.5-7.el7rhgs.x86_64
nfs-ganesha-2.2.0-11.el7rhgs.x86_64

How reproducible:
Seen for the first time.

Actual results:
ganesha-gfapi.log reports:

[2015-11-26 17:46:35.213205] E [glfs-handleops.c:1809:glfs_h_poll_cache_invalidation] (-->/usr/lib64/ganesha/libfsalgluster.so(GLUSTERFSAL_UP_Thread+0xc3) [0x7fe45a801ec3] -->/lib64/libgfapi.so.0(glfs_h_poll_upcall+0x190) [0x7fe45a3e9ce0] -->/lib64/libgfapi.so.0(+0x18a4c) [0x7fe45a3e9a4c] ) 0-glfs_h_poll_cache_invalidation: invalid argument: object [Invalid argument]
[2015-11-26 17:46:36.213664] E [dht-helper.c:598:dht_subvol_get_hashed] (-->/lib64/libglusterfs.so.0(default_lookup+0x5b) [0x7fe45a13074b] -->/usr/lib64/glusterfs/3.7.5/xlator/cluster/distribute.so(dht_lookup+0xa81) [0x7fe44e280871] -->/usr/lib64/glusterfs/3.7.5/xlator/cluster/distribute.so(dht_subvol_get_hashed+0x1c8) [0x7fe44e24a7b8] ) 0-vol2-tier-dht: invalid argument: loc->parent [Invalid argument]
[2015-11-26 17:46:36.213869] E [dht-helper.c:598:dht_subvol_get_hashed] (-->/usr/lib64/glusterfs/3.7.5/xlator/cluster/distribute.so(dht_lookup+0xf8c) [0x7fe44e280d7c] -->/usr/lib64/glusterfs/3.7.5/xlator/cluster/distribute.so(dht_lookup+0xa81) [0x7fe44e280871] -->/usr/lib64/glusterfs/3.7.5/xlator/cluster/distribute.so(dht_subvol_get_hashed+0x1c8) [0x7fe44e24a7b8] ) 0-vol2-cold-dht: invalid argument: loc->parent [Invalid argument]
[2015-11-26 17:46:36.214608] E [dht-helper.c:598:dht_subvol_get_hashed] (-->/usr/lib64/glusterfs/3.7.5/xlator/cluster/distribute.so(dht_lookup+0xf8c) [0x7fe44e280d7c] -->/usr/lib64/glusterfs/3.7.5/xlator/cluster/distribute.so(dht_lookup+0xa81) [0x7fe44e280871] -->/usr/lib64/glusterfs/3.7.5/xlator/cluster/distribute.so(dht_subvol_get_hashed+0x1c8) [0x7fe44e24a7b8] ) 0-vol2-hot-dht: invalid argument: loc->parent [Invalid argument]
The message "W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vol2-client-0: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]" repeated 2 times between [2015-11-26 17:46:34.209148] and [2015-11-26 17:46:36.214616]
[2015-11-26 17:46:36.215323] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vol2-client-2: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2015-11-26 17:46:36.215487] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vol2-client-1: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2015-11-26 17:46:36.215629] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vol2-client-3: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2015-11-26 17:46:36.215742] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vol2-client-4: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
The message "W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vol2-client-6: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]" repeated 2 times between [2015-11-26 17:46:34.209935] and [2015-11-26 17:46:36.215813]
[2015-11-26 17:46:36.215892] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vol2-client-5: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2015-11-26 17:46:36.216153] E [glfs-handleops.c:1809:glfs_h_poll_cache_invalidation] (-->/usr/lib64/ganesha/libfsalgluster.so(GLUSTERFSAL_UP_Thread+0xc3) [0x7fe45a801ec3] -->/lib64/libgfapi.so.0(glfs_h_poll_upcall+0x190) [0x7fe45a3e9ce0] -->/lib64/libgfapi.so.0(+0x18a4c) [0x7fe45a3e9a4c] ) 0-glfs_h_poll_cache_invalidation: invalid argument: object [Invalid argument]

On the client side:

# ps -auxww | grep dbench
root 24378 0.0 0.0 113120 1400 pts/1 S+ 03:12 0:00 /bin/bash /opt/qa/tools/system_light/scripts/dbench/dbench.sh
root 24379 0.0 0.0 113124 1400 pts/1 S+ 03:12 0:00 /bin/bash /opt/qa/tools/system_light/scripts/dbench/dbench_run.sh
root 24381 0.0 0.0 6420 620 pts/1 S+ 03:12 0:05 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root 24382 0.0 0.0 7444 1376 pts/1 D+ 03:12 0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root 24383 0.0 0.0 7444 1376 pts/1 D+ 03:12 0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root 24384 0.0 0.0 7444 1376 pts/1 D+ 03:12 0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root 24385 0.0 0.0 7444 1376 pts/1 D+ 03:12 0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root 24386 0.0 0.0 7444 1376 pts/1 D+ 03:12 0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root 24387 0.0 0.0 7444 1376 pts/1 D+ 03:12 0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root 24388 0.0 0.0 7444 1376 pts/1 D+ 03:12 0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root 24389 0.0 0.0 7444 1376 pts/1 D+ 03:12 0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root 24390 0.0 0.0 7444 1376 pts/1 D+ 03:12 0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root 24391 0.0 0.0 7444 1376 pts/1 D+ 03:12 0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10

# gluster volume info vol2
Volume Name: vol2
Type: Tier
Volume ID: 769c043e-7764-4f04-9b4f-5d83a0b4d5a2
Status: Started
Number of Bricks: 8
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: 10.70.46.60:/rhs/brick4/d2r2-vol2
Brick2: 10.70.46.63:/rhs/brick4/d2r1-vol2
Brick3: 10.70.46.64:/rhs/brick4/d1r2-vol2
Brick4: 10.70.46.59:/rhs/brick4/d1r1-vol2
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick5: 10.70.46.59:/rhs/brick3/d1r1-vol2
Brick6: 10.70.46.64:/rhs/brick3/d1r2-vol2
Brick7: 10.70.46.63:/rhs/brick3/d2r1-vol2
Brick8: 10.70.46.60:/rhs/brick3/d2r2-vol2
Options Reconfigured:
cluster.write-freq-threshold: 0
cluster.read-freq-threshold: 0
ganesha.enable: on
features.cache-invalidation: on
cluster.tier-mode: cache
features.ctr-enabled: on
nfs.disable: on
performance.readdir-ahead: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable

Expected results:
dbench runs should finish properly.

Additional info:
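For anyone re-checking this on a client, here is a minimal sketch (not part of the original report) of picking the "D" state processes out of `ps aux`-style output like the listing above. The function name `find_uninterruptible` and the `SAMPLE` text are illustrative assumptions; the sample rows are copied from the ps output in this report, and the parsing assumes the standard 11-column `ps aux` layout.

```python
def find_uninterruptible(ps_output, command_substring):
    """Return (pid, stat, command) for matching processes whose STAT starts with 'D'.

    Assumes the `ps aux` column order:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    hung = []
    for line in ps_output.splitlines():
        # Split into at most 11 fields so the full command line stays intact.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip the header row and malformed lines
        pid, stat, command = int(fields[1]), fields[7], fields[10]
        # 'D' is uninterruptible sleep (usually waiting on I/O).
        if stat.startswith("D") and command_substring in command:
            hung.append((pid, stat, command))
    return hung


# Three rows taken from the ps listing in this report.
SAMPLE = """\
root 24381 0.0 0.0 6420 620 pts/1 S+ 03:12 0:05 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root 24382 0.0 0.0 7444 1376 pts/1 D+ 03:12 0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root 24383 0.0 0.0 7444 1376 pts/1 D+ 03:12 0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
"""

if __name__ == "__main__":
    for pid, stat, command in find_uninterruptible(SAMPLE, "dbench"):
        print(pid, stat, command)
```

On a live client the same function could be fed the text of `ps aux` captured with `subprocess`; a process that stays in "D" across repeated samples is the pattern described in this bug.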
Verified this bug with glusterfs-3.7.9-2 and nfs-ganesha-2.3.1-4; the issue is not reproducible.

Executed the dbench tool separately on both v3 and v4 ganesha mounts with the tier volume configuration below, and it passes without any issues.

[root@dhcp37-180 ~]# gluster volume info tiervolume
Volume Name: tiervolume
Type: Tier
Volume ID: 45fd73f7-e8ed-43da-b9c6-79ae042cef12
Status: Started
Number of Bricks: 16
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: 10.70.37.174:/bricks/brick3/b3
Brick2: 10.70.37.127:/bricks/brick3/b3
Brick3: 10.70.37.158:/bricks/brick3/b3
Brick4: 10.70.37.180:/bricks/brick3/b3
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick5: 10.70.37.180:/bricks/brick0/b0
Brick6: 10.70.37.158:/bricks/brick0/b0
Brick7: 10.70.37.127:/bricks/brick0/b0
Brick8: 10.70.37.174:/bricks/brick0/b0
Brick9: 10.70.37.180:/bricks/brick1/b1
Brick10: 10.70.37.158:/bricks/brick1/b1
Brick11: 10.70.37.127:/bricks/brick1/b1
Brick12: 10.70.37.174:/bricks/brick1/b1
Brick13: 10.70.37.180:/bricks/brick2/b2
Brick14: 10.70.37.158:/bricks/brick2/b2
Brick15: 10.70.37.127:/bricks/brick2/b2
Brick16: 10.70.37.174:/bricks/brick2/b2
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: on
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
cluster.tier-mode: cache
features.ctr-enabled: on
nfs.disable: on
performance.readdir-ahead: on
cluster.enable-shared-storage: enable
nfs-ganesha: enable
As per comment 4, this has already been verified on a tiered volume, and no issues were seen across multiple executions of the dbench tool. This can be closed.