Bug 1285695 - nfs-ganesha+data tiering: dbench test process in "D" state with vers=4
Summary: nfs-ganesha+data tiering: dbench test process in "D" state with vers=4
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: nfs-ganesha
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-11-26 09:51 UTC by Saurabh
Modified: 2018-11-15 05:02 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-15 05:02:26 UTC
Embargoed:



Description Saurabh 2015-11-26 09:51:36 UTC
Description of problem:
I have a data-tiering volume and started running fs-sanity on it. dbench, a tool that is part of fs-sanity, is now hung with its threads in "D" state.

Version-Release number of selected component (if applicable):
glusterfs-3.7.5-7.el7rhgs.x86_64
nfs-ganesha-2.2.0-11.el7rhgs.x86_64

How reproducible:
Seen for the first time.

Actual results:
ganesha-gfapi.log reports:

[2015-11-26 17:46:35.213205] E [glfs-handleops.c:1809:glfs_h_poll_cache_invalidation] (-->/usr/lib64/ganesha/libfsalgluster.so(GLUSTERFSAL_UP_Thread+0xc3) [0x7fe45a801ec3] -->/lib64/libgfapi.so.0(glfs_h_poll_upcall+0x190) [0x7fe45a3e9ce0] -->/lib64/libgfapi.so.0(+0x18a4c) [0x7fe45a3e9a4c] ) 0-glfs_h_poll_cache_invalidation: invalid argument: object [Invalid argument]
[2015-11-26 17:46:36.213664] E [dht-helper.c:598:dht_subvol_get_hashed] (-->/lib64/libglusterfs.so.0(default_lookup+0x5b) [0x7fe45a13074b] -->/usr/lib64/glusterfs/3.7.5/xlator/cluster/distribute.so(dht_lookup+0xa81) [0x7fe44e280871] -->/usr/lib64/glusterfs/3.7.5/xlator/cluster/distribute.so(dht_subvol_get_hashed+0x1c8) [0x7fe44e24a7b8] ) 0-vol2-tier-dht: invalid argument: loc->parent [Invalid argument]
[2015-11-26 17:46:36.213869] E [dht-helper.c:598:dht_subvol_get_hashed] (-->/usr/lib64/glusterfs/3.7.5/xlator/cluster/distribute.so(dht_lookup+0xf8c) [0x7fe44e280d7c] -->/usr/lib64/glusterfs/3.7.5/xlator/cluster/distribute.so(dht_lookup+0xa81) [0x7fe44e280871] -->/usr/lib64/glusterfs/3.7.5/xlator/cluster/distribute.so(dht_subvol_get_hashed+0x1c8) [0x7fe44e24a7b8] ) 0-vol2-cold-dht: invalid argument: loc->parent [Invalid argument]
[2015-11-26 17:46:36.214608] E [dht-helper.c:598:dht_subvol_get_hashed] (-->/usr/lib64/glusterfs/3.7.5/xlator/cluster/distribute.so(dht_lookup+0xf8c) [0x7fe44e280d7c] -->/usr/lib64/glusterfs/3.7.5/xlator/cluster/distribute.so(dht_lookup+0xa81) [0x7fe44e280871] -->/usr/lib64/glusterfs/3.7.5/xlator/cluster/distribute.so(dht_subvol_get_hashed+0x1c8) [0x7fe44e24a7b8] ) 0-vol2-hot-dht: invalid argument: loc->parent [Invalid argument]
The message "W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vol2-client-0: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]" repeated 2 times between [2015-11-26 17:46:34.209148] and [2015-11-26 17:46:36.214616]
[2015-11-26 17:46:36.215323] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vol2-client-2: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2015-11-26 17:46:36.215487] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vol2-client-1: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2015-11-26 17:46:36.215629] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vol2-client-3: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2015-11-26 17:46:36.215742] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vol2-client-4: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
The message "W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vol2-client-6: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]" repeated 2 times between [2015-11-26 17:46:34.209935] and [2015-11-26 17:46:36.215813]
[2015-11-26 17:46:36.215892] W [MSGID: 114031] [client-rpc-fops.c:2974:client3_3_lookup_cbk] 0-vol2-client-5: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2015-11-26 17:46:36.216153] E [glfs-handleops.c:1809:glfs_h_poll_cache_invalidation] (-->/usr/lib64/ganesha/libfsalgluster.so(GLUSTERFSAL_UP_Thread+0xc3) [0x7fe45a801ec3] -->/lib64/libgfapi.so.0(glfs_h_poll_upcall+0x190) [0x7fe45a3e9ce0] -->/lib64/libgfapi.so.0(+0x18a4c) [0x7fe45a3e9a4c] ) 0-glfs_h_poll_cache_invalidation: invalid argument: object [Invalid argument]
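
To see at a glance which functions produce the errors in an excerpt like the one above, the lines can be grouped by severity and function name. The following is a minimal sketch, not part of the original report; the regular expression is an assumption based only on the line layout visible in this excerpt (`[timestamp] LEVEL [MSGID: n] [file:line:function]`), and `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter

# Header layout assumed from the excerpt above:
# [timestamp] LEVEL [optional MSGID] [file:line:function] ...
LOG_HEADER = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"(?:\[MSGID:\s*(?P<msgid>\d+)\]\s+)?"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]"
)

def summarize(lines):
    """Count log entries per (level, function).

    Lines that do not start with a bracketed timestamp, such as the
    'The message "..." repeated N times' summaries, are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LOG_HEADER.match(line)
        if match:
            counts[(match.group("level"), match.group("func"))] += 1
    return counts
```

Passing the lines of ganesha-gfapi.log to `summarize` returns a `Counter` keyed by pairs such as `("E", "dht_subvol_get_hashed")`.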


On the client side:

# ps -auxww | grep dbench
root     24378  0.0  0.0 113120  1400 pts/1    S+   03:12   0:00 /bin/bash /opt/qa/tools/system_light/scripts/dbench/dbench.sh
root     24379  0.0  0.0 113124  1400 pts/1    S+   03:12   0:00 /bin/bash /opt/qa/tools/system_light/scripts/dbench/dbench_run.sh
root     24381  0.0  0.0   6420   620 pts/1    S+   03:12   0:05 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root     24382  0.0  0.0   7444  1376 pts/1    D+   03:12   0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root     24383  0.0  0.0   7444  1376 pts/1    D+   03:12   0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root     24384  0.0  0.0   7444  1376 pts/1    D+   03:12   0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root     24385  0.0  0.0   7444  1376 pts/1    D+   03:12   0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root     24386  0.0  0.0   7444  1376 pts/1    D+   03:12   0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root     24387  0.0  0.0   7444  1376 pts/1    D+   03:12   0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root     24388  0.0  0.0   7444  1376 pts/1    D+   03:12   0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root     24389  0.0  0.0   7444  1376 pts/1    D+   03:12   0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root     24390  0.0  0.0   7444  1376 pts/1    D+   03:12   0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
root     24391  0.0  0.0   7444  1376 pts/1    D+   03:12   0:00 dbench -t 300 -c /opt/qa/tools/client.txt -s -S 10
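
The processes in uninterruptible sleep can be picked out of a listing like the one above by checking whether the STAT column starts with `D`. The following is a minimal sketch, not part of the original report; it assumes the standard `ps aux` column order (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND), and `uninterruptible` is a hypothetical helper name.

```python
def uninterruptible(ps_output):
    """Return (pid, command) for every process whose STAT starts with 'D'."""
    hung = []
    for line in ps_output.splitlines():
        # Split into at most 11 fields so the command keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip the header row and malformed lines
        if fields[7].startswith("D"):
            hung.append((int(fields[1]), fields[10]))
    return hung
```

On a live client the input could come from `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout`.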

# gluster volume info vol2
 
Volume Name: vol2
Type: Tier
Volume ID: 769c043e-7764-4f04-9b4f-5d83a0b4d5a2
Status: Started
Number of Bricks: 8
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: 10.70.46.60:/rhs/brick4/d2r2-vol2
Brick2: 10.70.46.63:/rhs/brick4/d2r1-vol2
Brick3: 10.70.46.64:/rhs/brick4/d1r2-vol2
Brick4: 10.70.46.59:/rhs/brick4/d1r1-vol2
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick5: 10.70.46.59:/rhs/brick3/d1r1-vol2
Brick6: 10.70.46.64:/rhs/brick3/d1r2-vol2
Brick7: 10.70.46.63:/rhs/brick3/d2r1-vol2
Brick8: 10.70.46.60:/rhs/brick3/d2r2-vol2
Options Reconfigured:
cluster.write-freq-threshold: 0
cluster.read-freq-threshold: 0
ganesha.enable: on
features.cache-invalidation: on
cluster.tier-mode: cache
features.ctr-enabled: on
nfs.disable: on
performance.readdir-ahead: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
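
When comparing configurations between setups, it can help to turn `gluster volume info` output like the above into structured data, for example to check whether `features.cache-invalidation` is on. The following is a minimal sketch, not part of the original report; it assumes the `key: value` layout shown here, and `parse_volume_info` is a hypothetical helper name.

```python
def parse_volume_info(text):
    """Split `gluster volume info` output into bricks and reconfigured options."""
    bricks, options = {}, {}
    in_options = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only; brick values contain colons too.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Options Reconfigured":
            in_options = True
        elif in_options:
            options[key] = value
        elif key.startswith("Brick") and key[5:].isdigit():
            bricks[key] = value
    return bricks, options
```

The two returned dictionaries can then be compared across volumes with ordinary set or dict operations.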


Expected results:
dbench runs should finish properly.

Additional info:

Comment 4 Shashank Raj 2016-04-30 18:14:32 UTC
Verified this bug with glusterfs-3.7.9-2 and nfs-ganesha-2.3.1-4; the issue is not reproducible.

Executed the dbench tool separately on both v3 and v4 ganesha mounts with the tier volume configuration below, and it passes without any issues.

[root@dhcp37-180 ~]# gluster volume info tiervolume
 
Volume Name: tiervolume
Type: Tier
Volume ID: 45fd73f7-e8ed-43da-b9c6-79ae042cef12
Status: Started
Number of Bricks: 16
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: 10.70.37.174:/bricks/brick3/b3
Brick2: 10.70.37.127:/bricks/brick3/b3
Brick3: 10.70.37.158:/bricks/brick3/b3
Brick4: 10.70.37.180:/bricks/brick3/b3
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick5: 10.70.37.180:/bricks/brick0/b0
Brick6: 10.70.37.158:/bricks/brick0/b0
Brick7: 10.70.37.127:/bricks/brick0/b0
Brick8: 10.70.37.174:/bricks/brick0/b0
Brick9: 10.70.37.180:/bricks/brick1/b1
Brick10: 10.70.37.158:/bricks/brick1/b1
Brick11: 10.70.37.127:/bricks/brick1/b1
Brick12: 10.70.37.174:/bricks/brick1/b1
Brick13: 10.70.37.180:/bricks/brick2/b2
Brick14: 10.70.37.158:/bricks/brick2/b2
Brick15: 10.70.37.127:/bricks/brick2/b2
Brick16: 10.70.37.174:/bricks/brick2/b2
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: on
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
cluster.tier-mode: cache
features.ctr-enabled: on
nfs.disable: on
performance.readdir-ahead: on
cluster.enable-shared-storage: enable
nfs-ganesha: enable

Comment 6 Shashank Raj 2016-07-28 10:12:57 UTC
As per comment 4, this has already been verified on a tiered volume, and no issues were seen during multiple executions of the dbench tool. Can be closed.

