Bug 1285237
Summary: | nfs-ganesha+data tiering: posix compliance tests get stuck at rename test | | 
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Saurabh <saujain> |
Component: | tier | Assignee: | Mohammed Rafi KC <rkavunga> |
Status: | CLOSED WONTFIX | QA Contact: | Sweta Anandpara <sanandpa> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | ||
Version: | rhgs-3.1 | CC: | akhakhar, hgowtham, jthottan, kkeithle, nbalacha, nchilaka, ndevos, nlevinki, rcyriac, rhinduja, rhs-bugs, rkavunga, sankarshan, skoduri |
Target Milestone: | --- | Keywords: | ZStream |
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | tier-fuse-nfs-samba | ||
Fixed In Version: | glusterfs-3.7.9-1 | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: | 
Last Closed: | 2018-11-08 18:31:29 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | 
Bug Depends On: | 1276227 | ||
Bug Blocks: | | | 
Description
Saurabh
2015-11-25 09:36:49 UTC
Please provide the captured packet trace and the logs (ganesha/ganesha-gfapi.log) along with a sosreport.

I could consistently reproduce this issue. For some reason, as part of tests/rename/10.t, mknod() -> glfs_h_mknod() fails with an EEXIST error. On receiving this error, NFS-Ganesha does a LOOKUP of the object, which returns ENOENT. Since the object is in an inconsistent state, NFS-Ganesha keeps retrying the fop, resulting in a hang on the client side.

This seems to be an issue with dht-tier. For every file creation, dht-tier appears to create linkto files on the cold tier first and then the actual files on the hot tier. If the actual file creation fails for any reason, the linkto files are not deleted from the cold tier, as can be seen below:

```
# ls -ltr /bricks/xfs_brick/b*/dir_tmp/fstest_3e00dbbaaeb95ee267e1a3b59a821b34/fstest_dc09c43ca69e030460a777fac05f25d7
/bricks/xfs_brick/b2-tier/dir_tmp/fstest_3e00dbbaaeb95ee267e1a3b59a821b34/fstest_dc09c43ca69e030460a777fac05f25d7:
total 0

/bricks/xfs_brick/b1-tier/dir_tmp/fstest_3e00dbbaaeb95ee267e1a3b59a821b34/fstest_dc09c43ca69e030460a777fac05f25d7:
total 0

/bricks/xfs_brick/b2/dir_tmp/fstest_3e00dbbaaeb95ee267e1a3b59a821b34/fstest_dc09c43ca69e030460a777fac05f25d7:
total 0
---------T 2 root root 0 Dec  7 13:18 fstest_cbf533920c1896845883fb493fce0d06

/bricks/xfs_brick/b1/dir_tmp/fstest_3e00dbbaaeb95ee267e1a3b59a821b34/fstest_dc09c43ca69e030460a777fac05f25d7:
total 0
---------T 2 root root 0 Dec  7 13:18 fstest_cbf533920c1896845883fb493fce0d06
```

This leads to the inconsistent behaviour seen by NFS-Ganesha (mentioned in the comment above): exclusive file creation with the same file path fails with ERR_EXISTS because the linkto files are present, while LOOKUP fails with EINVAL because the actual files are missing.
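The hang described above is a livelock: create and lookup each fail in a way that sends the client back to the other. A minimal, self-contained simulation of that loop is sketched below; the class and function names are hypothetical illustrations, not NFS-Ganesha's actual internals, and the retry cap exists only to make the livelock observable instead of hanging.

```python
import errno

class InconsistentFS:
    """Simulates a tiered volume where a stale linkto file exists on the
    cold tier but the actual file is missing from the hot tier."""

    def mknod(self, path):
        # The leftover linkto file makes exclusive creation fail ...
        raise OSError(errno.EEXIST, "File exists", path)

    def lookup(self, path):
        # ... while the actual file is absent, so lookup fails too.
        raise OSError(errno.ENOENT, "No such file or directory", path)

def create_with_recovery(fs, path, max_retries=5):
    """Create a node; on EEXIST, fall back to looking the object up.
    In the inconsistent state above neither step can ever succeed, so a
    client that retries unboundedly hangs; here we cap the retries."""
    for _ in range(max_retries):
        try:
            fs.mknod(path)
            return "created"
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise
        try:
            fs.lookup(path)
            return "already-exists"
        except OSError:
            continue  # object "vanished" between mknod and lookup: retry
    return "livelock"

print(create_with_recovery(InconsistentFS(), "fstest_demo"))  # livelock
```

With an unbounded retry loop, this is exactly the client-side hang seen in the rename test.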
```
# ls fstest_3e00dbbaaeb95ee267e1a3b59a821b34/fstest_dc09c43ca69e030460a777fac05f25d7/fstest_cbf533920c1896845883fb493fce0d06
ls: cannot access fstest_3e00dbbaaeb95ee267e1a3b59a821b34/fstest_dc09c43ca69e030460a777fac05f25d7/fstest_cbf533920c1896845883fb493fce0d06: No such file or directory

# mkfifo fstest_3e00dbbaaeb95ee267e1a3b59a821b34/fstest_dc09c43ca69e030460a777fac05f25d7/fstest_cbf533920c1896845883fb493fce0d06
mkfifo: cannot create fifo `fstest_3e00dbbaaeb95ee267e1a3b59a821b34/fstest_dc09c43ca69e030460a777fac05f25d7/fstest_cbf533920c1896845883fb493fce0d06': File exists
```

This seems to be a known issue with tiering. Requesting Rafi/Nithya to mark this bug as a duplicate (if one is already filed).

Sounds like the root cause is addressed with bug 1291212? Jiffin, what do you think?

I tested the bug on upstream 3.7.9 and did not hit the issue on an NFSv4 mount.

On the server:

```
# gluster v info vol

Volume Name: vol
Type: Tier
Volume ID: 6bd03c40-4a6e-4beb-847f-29fba87132fb
Status: Started
Number of Bricks: 4
Transport-type: tcp
Hot Tier :
Hot Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick1: 10.70.43.7:/brick1/volhh2
Brick2: 10.70.43.58:/brick1/volhh1
Cold Tier:
Cold Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick3: 10.70.43.58:/brick1/vol
Brick4: 10.70.43.7:/brick1/vol
Options Reconfigured:
performance.readdir-ahead: on
nfs.disable: on
features.cache-invalidation: on
ganesha.enable: on
features.ctr-enabled: on
cluster.tier-mode: cache
nfs-ganesha: enable
cluster.enable-shared-storage: enable
```

On the client:

```
10.70.43.58:/vol on /mnt/nfs/1 type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.70.43.92,local_lock=none,addr=10.70.43.58)
```

I only executed the rename test suite of the posix compliance tests. Output of the test:

```
Changing to the specified mountpoint
/mnt/nfs/1/run14744
executing posix_compliance
start: 22:26:22

real    0m17.735s
user    0m1.191s
sys     0m3.230s
/export//opt/qa/tools/posix-testsuite/tests/rename/00.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/01.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/02.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/03.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/04.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/05.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/06.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/07.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/08.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/09.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/10.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/11.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/12.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/13.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/14.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/15.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/16.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/17.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/18.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/19.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/20.t .. ok
All tests successful.
Files=21, Tests=479, 17 wallclock secs ( 0.39 usr  0.05 sys +  1.58 cusr  6.58 csys =  8.60 CPU)
Result: PASS
/export//opt/qa/tools/posix-testsuite/tests/rename/00.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/01.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/02.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/03.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/04.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/05.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/06.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/07.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/08.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/09.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/10.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/11.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/12.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/13.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/14.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/15.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/16.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/17.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/18.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/19.t .. ok
/export//opt/qa/tools/posix-testsuite/tests/rename/20.t .. ok
All tests successful.
Files=21, Tests=479, 17 wallclock secs ( 0.23 usr  0.04 sys +  0.88 cusr  3.17 csys =  4.32 CPU)
Result: PASS
end: 22:26:41
removed posix compliance directories
1 Total 1 tests were successful
Switching over to the previous working directory
Removing /mnt/nfs/1//run14744/
```

This issue was reported long back. Since then, quite a number of patches have gone into the code base. Based on comment 4 and comment 11, the issue could have been fixed by patch http://review.gluster.org/#/c/12829/. Shashank, can you please try this out on the latest build?

As tier is not being actively developed, I am closing this bug. Feel free to reopen it if necessary.
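For reference, the zero-byte `---------T` entries in the brick listings in the description are DHT linkto files: placeholders whose mode is 01000 (sticky bit set, no rwx bits). The sketch below reproduces that mode on an ordinary local file for illustration; the file name is made up, and the `getfattr` line is only a comment because it requires a real brick.

```shell
# DHT marks a linkto placeholder as a zero-byte file with mode 01000:
# sticky bit set, no read/write/execute bits at all.  Reproduce that
# permission pattern on a plain local file:
touch linkto-demo
chmod 1000 linkto-demo
stat -c '%A' linkto-demo    # prints: ---------T

# On an actual brick, the placeholder also carries an xattr naming the
# subvolume that holds the real file, e.g. (assumes getfattr is installed):
#   getfattr -n trusted.glusterfs.dht.linkto -e text /bricks/xfs_brick/b1/<path>
```

This is why the stale entries show up with no permissions in `ls -l` yet still make exclusive creation fail with EEXIST.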