Bug 1285695 - nfs-ganesha+data tiering: dbench test process in "D" state with vers=4
Status: NEW
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: nfs-ganesha
Version: 3.1
Priority: unspecified
Severity: unspecified
Assigned To: Bug Updates Notification Mailing List
QA Contact: storage-qa-internal@redhat.com
Reported: 2015-11-26 04:51 EST by Saurabh
Modified: 2017-08-28 14:03 EDT
Doc Type: Bug Fix
Type: Bug

Description Saurabh 2015-11-26 04:51:36 EST
Description of problem:
I have a tiered volume, exported through nfs-ganesha and mounted with vers=4, and started running fs-sanity on it. dbench, one of the tools in the fs-sanity suite, is now hung with its threads in "D" (uninterruptible sleep) state.

Version-Release number of selected component (if applicable):
glusterfs-3.7.5-7.el7rhgs.x86_64
nfs-ganesha-2.2.0-11.el7rhgs.x86_64

How reproducible:
Seen for the first time.

Actual results:
ganesha-gfapi.log reports:

[2015-11-26 17:46:35.213205] E [glfs-handleops.c:1809:glfs_h_poll_cache_invalidation] (-->/usr/lib64/ganesha/libfsalgluster.so(GLUSTERFSAL_UP_Thread+0xc3) [0x7fe45a801ec3] -->/lib64/libgfapi.so.0(glfs_h_poll_upcall+0x190) [0x7fe45a3e9ce0] -->/lib64/libgfapi.so.0(+0x18a4c) [0x7fe45a3e9a4c] ) 0-glfs_h_poll_cache_invalidation: invalid argument: object [Invalid argument]
[2015-11-26 17:46:36.213664] E [dht-helper.c:598:dht_subvol_get_hashed] (-->/lib64/libglusterfs.so.0(default_lookup+0x5b) [0x7fe45a13074b] -->/usr/lib64/glusterfs/3.7.5/xlator/cluster/distribute.so(dht_lookup+0xa81) [0x7fe44e280871] -->/usr/lib64/glusterfs/3.7.5/xlator/cluster/distribute.so(dht_subvol_get_hashed+0x1c8) [0x7fe44e24a7b8] ) 0-vol2-tier-dht: invalid argument: loc->parent [Invalid argument]
[2015-11-26 17:46:36.213869] E [dht-helper.c:598:dht_subvol_get_hashed] (-->/usr/lib64/glusterfs/3.7.5/xlator/cluster/distribute.so(dht_lookup+0xf8c) [0x7fe44e280d7c] -->/usr/lib64/glusterfs/3.7.5/xlator/cluster/distribute.so(dht_lookup+0xa81) [0x7fe44e280871] -->/usr/lib64/glusterfs/3.7.5/xlator/cluster/distribute.so(dht_subvol_get_hashed+0x1c8) [0x7fe44e24a7b8] ) 0-vol2-cold-dht: invalid argument: loc->parent [Invalid argument]
[2015-11-26 17:46:36.214608] E [dht-helper.c:598:dht_subvol_get_hashed] (-->/usr/lib64/glusterfs/3.7.5/xlator/cluster/distribute.so(dht_lookup+0xf8c) [0x7fe44e280d7c] -->/usr/lib64/glusterfs/3.7.5/xlator/cluster/distribute.so(dht_lookup+0xa81) [0x7fe44e280871] -->/usr/lib64/glusterfs/3.7.5/xlator/cluster/distribute.so(dht_subvol_get_hashed+0x1c8) [0x7fe44e24a7b8] ) 0-vol2-hot-dht: invalid argument: loc->parent [Invalid argument]
The message "W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vol2-client-0: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]" repeated 2 times between [2015-11-26 17:46:34.209148] and [2015-11-26 17:46:36.214616]
[2015-11-26 17:46:36.215323] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vol2-client-2: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2015-11-26 17:46:36.215487] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vol2-client-1: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2015-11-26 17:46:36.215629] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vol2-client-3: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2015-11-26 17:46:36.215742] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vol2-client-4: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
The message "W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vol2-client-6: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]" repeated 2 times between [2015-11-26 17:46:34.209935] and [2015-11-26 17:46:36.215813]
[2015-11-26 17:46:36.215892] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vol2-client-5: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2015-11-26 17:46:36.216153] E [glfs-handleops.c:1809:glfs_h_poll_cache_invalidation] (-->/usr/lib64/ganesha/libfsalgluster.so(GLUSTERFSAL_UP_Thread+0xc3) [0x7fe45a801ec3] -->/lib64/libgfapi.so.0(glfs_h_poll_upcall+0x190) [0x7fe45a3e9ce0] -->/lib64/libgfapi.so.0(+0x18a4c) [0x7fe45a3e9a4c] ) 0-glfs_h_poll_cache_invalidation: invalid argument: object [Invalid argument]
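
The excerpt above mixes errors from the upcall poller, the three DHT layers (tier, hot, cold) and every client translator. A minimal sketch, assuming the log follows the line layout shown above, that tallies entries by level, reporting function and translator name so the failing layers stand out. The regular expression and the `summarize` helper are illustrative only, not part of any gluster tooling, and the "The message ... repeated N times" summary lines are deliberately ignored.

```python
import re
from collections import Counter

# Matches lines such as:
#   [<timestamp>] E [file.c:123:function] (...backtrace...) 0-xlator-name: message
#   [<timestamp>] W [MSGID: 114031] [file.c:123:function] 0-xlator-name: message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"
    r"\[(?P<src>[^:\]]+):(?P<lineno>\d+):(?P<func>[^\]]+)\]"
    r".*?\s(?P<domain>\d+-[\w-]+): "
)


def summarize(log_text):
    """Count log entries per (level, function, translator) triple."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match:  # lines that do not fit the layout are skipped
            counts[(match["level"], match["func"], match["domain"])] += 1
    return counts


# Three lines copied verbatim from the excerpt above.
SAMPLE = """
[2015-11-26 17:46:36.213664] E [dht-helper.c:598:dht_subvol_get_hashed] (-->/lib64/libglusterfs.so.0(default_lookup+0x5b) [0x7fe45a13074b] -->/usr/lib64/glusterfs/3.7.5/xlator/cluster/distribute.so(dht_lookup+0xa81) [0x7fe44e280871] -->/usr/lib64/glusterfs/3.7.5/xlator/cluster/distribute.so(dht_subvol_get_hashed+0x1c8) [0x7fe44e24a7b8] ) 0-vol2-tier-dht: invalid argument: loc->parent [Invalid argument]
[2015-11-26 17:46:36.215323] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vol2-client-2: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2015-11-26 17:46:36.215487] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vol2-client-1: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
"""

if __name__ == "__main__":
    for (level, func, domain), n in sorted(summarize(SAMPLE).items()):
        print(level, func, domain, n)
```

To run it against the full log, read the file (for example with `open(path).read()`, where `path` is wherever ganesha-gfapi.log lives on the server) and pass the text to `summarize`.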


On the client side:

# ps -auxww | grep dbench
root     24378  0.0  0.0 113120  1400 pts/1    S+   03:12   0:00 /bin/bash /opt/qa/tools/system_light/scripts/dbench/dbench.sh
root     24379  0.0  0.0 113124  1400 pts/1    S+   03:12   0:00 /bin/bash /opt/qa/tools/system_light/scripts/dbench/dbench_run.sh
root     24381  0.0  0.0   6420   620 pts/1    S+   03:12   0:05 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root     24382  0.0  0.0   7444  1376 pts/1    D+   03:12   0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root     24383  0.0  0.0   7444  1376 pts/1    D+   03:12   0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root     24384  0.0  0.0   7444  1376 pts/1    D+   03:12   0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root     24385  0.0  0.0   7444  1376 pts/1    D+   03:12   0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root     24386  0.0  0.0   7444  1376 pts/1    D+   03:12   0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root     24387  0.0  0.0   7444  1376 pts/1    D+   03:12   0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root     24388  0.0  0.0   7444  1376 pts/1    D+   03:12   0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root     24389  0.0  0.0   7444  1376 pts/1    D+   03:12   0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root     24390  0.0  0.0   7444  1376 pts/1    D+   03:12   0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root     24391  0.0  0.0   7444  1376 pts/1    D+   03:12   0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
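
In the listing above, the parent dbench process (24381) is in "S+" while every worker is in "D+", that is, uninterruptible sleep, which on an NFS client usually means the task is blocked in the kernel waiting on I/O. A minimal sketch, assuming `ps aux`-style columns as shown, that picks out the D-state processes from captured output. The `find_uninterruptible` helper is hypothetical and only parses text; it does not inspect live processes.

```python
def find_uninterruptible(ps_output):
    """Return processes whose STAT column starts with 'D'.

    Expects `ps aux`-style columns:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    procs = []
    for line in ps_output.splitlines():
        fields = line.split(None, 10)  # keep COMMAND (with its spaces) as one field
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip headers, shell prompts and blank lines
        if fields[7].startswith("D"):
            procs.append(
                {"pid": int(fields[1]), "stat": fields[7], "command": fields[10]}
            )
    return procs


# Lines copied from the listing above.
SAMPLE = """\
# ps -auxww | grep dbench
root     24381  0.0  0.0   6420   620 pts/1    S+   03:12   0:05 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root     24382  0.0  0.0   7444  1376 pts/1    D+   03:12   0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root     24383  0.0  0.0   7444  1376 pts/1    D+   03:12   0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
"""

if __name__ == "__main__":
    for proc in find_uninterruptible(SAMPLE):
        print(proc["pid"], proc["stat"], proc["command"])
```

For each PID reported, reading `/proc/<pid>/stack` as root (where the kernel exposes it) or `/proc/<pid>/wchan` on the client shows where in the kernel the task is blocked, which helps separate an NFS client wait from a local problem.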

# gluster volume info vol2
 
Volume Name: vol2
Type: Tier
Volume ID: 769c043e-7764-4f04-9b4f-5d83a0b4d5a2
Status: Started
Number of Bricks: 8
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: 10.70.46.60:/rhs/brick4/d2r2-vol2
Brick2: 10.70.46.63:/rhs/brick4/d2r1-vol2
Brick3: 10.70.46.64:/rhs/brick4/d1r2-vol2
Brick4: 10.70.46.59:/rhs/brick4/d1r1-vol2
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick5: 10.70.46.59:/rhs/brick3/d1r1-vol2
Brick6: 10.70.46.64:/rhs/brick3/d1r2-vol2
Brick7: 10.70.46.63:/rhs/brick3/d2r1-vol2
Brick8: 10.70.46.60:/rhs/brick3/d2r2-vol2
Options Reconfigured:
cluster.write-freq-threshold: 0
cluster.read-freq-threshold: 0
ganesha.enable: on
features.cache-invalidation: on
cluster.tier-mode: cache
features.ctr-enabled: on
nfs.disable: on
performance.readdir-ahead: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable


Expected results:
dbench runs should complete successfully.

Additional info:

Comment 4 Shashank Raj 2016-04-30 14:14:32 EDT
Verified this bug with glusterfs-3.7.9-2 and nfs-ganesha-2.3.1-4; the issue is not reproducible.

Ran dbench separately on both v3 and v4 ganesha mounts with the tier volume configuration below, and it passed without any issues.

[root@dhcp37-180 ~]# gluster volume info tiervolume
 
Volume Name: tiervolume
Type: Tier
Volume ID: 45fd73f7-e8ed-43da-b9c6-79ae042cef12
Status: Started
Number of Bricks: 16
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: 10.70.37.174:/bricks/brick3/b3
Brick2: 10.70.37.127:/bricks/brick3/b3
Brick3: 10.70.37.158:/bricks/brick3/b3
Brick4: 10.70.37.180:/bricks/brick3/b3
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick5: 10.70.37.180:/bricks/brick0/b0
Brick6: 10.70.37.158:/bricks/brick0/b0
Brick7: 10.70.37.127:/bricks/brick0/b0
Brick8: 10.70.37.174:/bricks/brick0/b0
Brick9: 10.70.37.180:/bricks/brick1/b1
Brick10: 10.70.37.158:/bricks/brick1/b1
Brick11: 10.70.37.127:/bricks/brick1/b1
Brick12: 10.70.37.174:/bricks/brick1/b1
Brick13: 10.70.37.180:/bricks/brick2/b2
Brick14: 10.70.37.158:/bricks/brick2/b2
Brick15: 10.70.37.127:/bricks/brick2/b2
Brick16: 10.70.37.174:/bricks/brick2/b2
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: on
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
cluster.tier-mode: cache
features.ctr-enabled: on
nfs.disable: on
performance.readdir-ahead: on
cluster.enable-shared-storage: enable
nfs-ganesha: enable
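
The verification volume (tiervolume) is not configured identically to the volume in the original report (vol2): the cold tier is Distributed-Disperse rather than Distributed-Replicate, and the reconfigured options differ. A minimal sketch, assuming `gluster volume info` output in the layout shown in this report, for diffing the "Options Reconfigured" sections of two volumes. The helper names are hypothetical, and the sketch only reports differences; it says nothing about which of them, if any, is relevant to the hang.

```python
def reconfigured_options(volume_info):
    """Extract the 'Options Reconfigured' key/value pairs from volume info text."""
    options = {}
    in_options = False
    for line in volume_info.splitlines():
        line = line.strip()
        if line == "Options Reconfigured:":
            in_options = True
        elif in_options and ": " in line:
            key, value = line.split(": ", 1)
            options[key] = value
    return options


def diff_options(first, second):
    """Map each differing option to its (first, second) values; None means unset."""
    keys = sorted(first.keys() | second.keys())
    return {
        k: (first.get(k), second.get(k))
        for k in keys
        if first.get(k) != second.get(k)
    }


# Option sections copied from the two `gluster volume info` listings in this report.
VOL2 = """\
Options Reconfigured:
cluster.write-freq-threshold: 0
cluster.read-freq-threshold: 0
ganesha.enable: on
features.cache-invalidation: on
cluster.tier-mode: cache
features.ctr-enabled: on
nfs.disable: on
performance.readdir-ahead: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
"""

TIERVOLUME = """\
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: on
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
cluster.tier-mode: cache
features.ctr-enabled: on
nfs.disable: on
performance.readdir-ahead: on
cluster.enable-shared-storage: enable
nfs-ganesha: enable
"""

if __name__ == "__main__":
    changes = diff_options(reconfigured_options(VOL2), reconfigured_options(TIERVOLUME))
    for name, (before, after) in changes.items():
        print(name, before, after)
```
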
Comment 6 Shashank Raj 2016-07-28 06:12:57 EDT
As per comment 4, this has already been verified on a tiered volume, and no issues were seen across multiple runs of the dbench tool. This bug can be closed.
