Bug 1386811 - BAT: pNFS tests failures reported with NetApp testsuite
Summary: BAT: pNFS tests failures reported with NetApp testsuite
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: nfs-ganesha
Classification: Retired
Component: FSAL_GLUSTER
Version: 2.4
Hardware: All
OS: All
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Soumya Koduri
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-10-19 16:28 UTC by Soumya Koduri
Modified: 2018-11-19 08:48 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-19 08:48:28 UTC


Attachments
nfstest_pnfs_20161019101346_2.cap (430.22 KB, application/octet-stream)
2016-10-19 16:28 UTC, Soumya Koduri
nfstest_pnfs_20161019101346_3.cap (210.40 KB, application/octet-stream)
2016-10-19 16:29 UTC, Soumya Koduri

Description Soumya Koduri 2016-10-19 16:28:31 UTC
Created attachment 1212206 [details]
nfstest_pnfs_20161019101346_2.cap

Description of problem:

Multiple pNFS issues were reported while running the NetApp testsuite.

Configuration:
2-node setup: one node acts as both MDS and DS, the other as DS only.

Test scenario:
Two files are opened and written to.

The first run of the above test passes. During the second run, however, when file1 is re-opened, the DS returns an ESERVERFAULT error for the PUTFH operation. The client then falls back to the MDS and does the WRITEs there, but that also results in a DESTROY_SESSION, which affects the open of the second file.

Will attach the packet traces.




Comment 1 Soumya Koduri 2016-10-19 16:29:09 UTC
Created attachment 1212207 [details]
nfstest_pnfs_20161019101346_3.cap

Comment 2 Soumya Koduri 2016-10-19 16:36:52 UTC
172.16.8.118 was acting as both MDS and DS.

172.16.8.119 is DS only.

Comment 3 Daniel Gryniewicz 2016-10-19 18:39:47 UTC
Proposed fix:  https://review.gerrithub.io/298883

Comment 4 Soumya Koduri 2016-10-19 20:57:36 UTC
With the above patch applied, the issue mentioned earlier is fixed, but I ran into a different issue.

When MDS == DS and LAYOUTCOMMIT is sent, the file size is not updated.

Logs from the testsuite:
    FAIL: LAYOUTCOMMIT should not be sent to MDS when DS == MDS  <<< this is a client-side issue
    PASS: LAYOUTCOMMIT should be sent to MDS with correct file range
    PASS: LAYOUTCOMMIT should use the layout stateid
    PASS: LAYOUTCOMMIT new offset should be set
    PASS: LAYOUTCOMMIT last write offset (65535) should be one less than the file size (65536)
    PASS: LAYOUTCOMMIT layout type should be LAYOUT4_NFSV4_1_FILES
    PASS: LAYOUTCOMMIT layout update field should be empty for LAYOUT4_NFSV4_1_FILES
    PASS: GETATTR asking for file size is sent within LAYOUTCOMMIT compound
    PASS: LAYOUTCOMMIT reply file size changed is not set (ERRATA)
    FAIL: GETATTR should return correct file size within LAYOUTCOMMIT compound, expecting 65536, got 0
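
The last failure follows from the size rule that the passing checks already rely on: the last write offset is the offset of the last byte written, so the file must be at least one byte longer than that offset. A minimal Python sketch of the rule, where the function name `size_after_layoutcommit` is hypothetical and is not taken from nfs-ganesha or the testsuite:

```python
def size_after_layoutcommit(current_size, last_write_offset):
    """Return the smallest file size consistent with a LAYOUTCOMMIT.

    last_write_offset is the offset of the last byte written, so the file
    has to be at least last_write_offset + 1 bytes long. A file that is
    already larger keeps its size.
    """
    return max(current_size, last_write_offset + 1)


# Values from the testsuite log: last write offset 65535, size 0 reported.
expected_size = size_after_layoutcommit(0, 65535)
```

With the values from the log this gives 65536, the size the testsuite expected in place of 0.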

Comment 5 Soumya Koduri 2016-10-19 21:09:14 UTC
The above tests were performed using a RHEL-7.3 client. I also tried with a Fedora client; there, LAYOUTCOMMIT is not sent if the MDS is the DS for the given file.

Comment 6 Soumya Koduri 2016-10-19 21:40:44 UTC
The issue can be reproduced with the steps below as well.

file2 is present on a different machine (DS); its size is updated correctly:

[skoduri@skoduri ~]$ ls -ltr /mnt/skoduri/file2
-rw-rw-r--. 1 skoduri skoduri 6 Oct 20 03:05 /mnt/skoduri/file2
[skoduri@skoduri ~]$ echo "hello world" > /mnt/skoduri/file2
[skoduri@skoduri ~]$ ls -ltr /mnt/skoduri/file2
-rw-rw-r--. 1 skoduri skoduri 12 Oct 20 03:05 /mnt/skoduri/file2
[skoduri@skoduri ~]$ 


file3 is present on the MDS server:

[skoduri@skoduri ~]$ ls -ltr /mnt/skoduri/file3
-rw-rw-r--. 1 skoduri skoduri 12 Oct 20 03:04 /mnt/skoduri/file3
[skoduri@skoduri ~]$ 
[skoduri@skoduri ~]$ echo "hello" > /mnt/skoduri/file3
[skoduri@skoduri ~]$ ls -ltr /mnt/skoduri/file3
-rw-rw-r--. 1 skoduri skoduri 0 Oct 20 03:06 /mnt/skoduri/file3
[skoduri@skoduri ~]$ 

>>>> File size is shown as '0'. 

[skoduri@skoduri ~]$ sleep 65
[skoduri@skoduri ~]$ ls -ltr /mnt/skoduri/file3
-rw-rw-r--. 1 skoduri skoduri 6 Oct 20 03:06 /mnt/skoduri/file3
[skoduri@skoduri ~]$ 

>>> After certain time filesize is rightly updated.

Since the MDS is the DS in this case, there is no cache invalidation involved. I even disabled gluster md-cache. Maybe somewhere in the ganesha layers themselves we are caching an invalid size.
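
The manual steps above (write, `ls`, `sleep`, `ls` again) can be automated roughly as follows. This is only a sketch: the function name `size_after_write`, the timeout, and the polling interval are assumptions, and `os.stat` on an NFS mount is also subject to the client's own attribute caching.

```python
import os
import time


def size_after_write(path, payload, timeout=90.0, interval=1.0):
    """Write payload to path, then report the size seen by stat().

    Returns (immediate_size, seconds_until_correct). The second value is
    None if the expected size never showed up within `timeout` seconds.
    """
    with open(path, "wb") as f:
        f.write(payload)  # truncates and rewrites, like `echo ... > file`

    expected = len(payload)
    immediate = os.stat(path).st_size  # what the first `ls -l` would show

    start = time.monotonic()
    while True:
        if os.stat(path).st_size == expected:
            return immediate, time.monotonic() - start
        if time.monotonic() - start >= timeout:
            return immediate, None
        time.sleep(interval)
```

Called as `size_after_write("/mnt/skoduri/file3", b"hello\n")`, a healthy setup should return an immediate size of 6. In the failing case shown above the immediate size would be 0.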

Comment 7 Daniel Gryniewicz 2016-10-20 12:24:18 UTC
Proposed fix: https://review.gerrithub.io/298963

Comment 8 Soumya Koduri 2016-10-20 20:24:26 UTC
Another issue was reported: an invalid size is returned after a WRITE when DS == MDS. No LAYOUTCOMMIT is sent in this case.

Proposed fix: https://review.gerrithub.io/#/c/299035

Comment 9 Soumya Koduri 2016-10-21 13:54:35 UTC
The above-mentioned patches fixed most of the issues, except for one case: if DS != MDS and a WRITE is followed by a GETATTR, the size returned is invalid. The likely reason is a delay in the upcall (cache-invalidation) request sent by Gluster, so the attributes (size) may not yet have been updated on the MDS by then.
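
The suspected race can be pictured with a toy model. This is purely illustrative Python under stated assumptions, not nfs-ganesha or Gluster code: the class `ToyMds` and all of its methods are hypothetical, and the upcall is delivered by an explicit call rather than asynchronously.

```python
class ToyMds:
    """Hypothetical MDS that caches the file size and refreshes it only
    when a cache-invalidation upcall is delivered."""

    def __init__(self):
        self.backend_size = 0     # size actually stored by the filesystem
        self.cached_size = 0      # size the MDS returns for GETATTR
        self.pending_upcalls = 0  # invalidations sent but not yet delivered

    def ds_write(self, offset, length):
        """A WRITE served by a separate DS: the backend grows right away,
        but the MDS only learns about it through a later upcall."""
        self.backend_size = max(self.backend_size, offset + length)
        self.pending_upcalls += 1

    def deliver_upcalls(self):
        """The cache-invalidation arrives: refresh the cached size."""
        if self.pending_upcalls:
            self.cached_size = self.backend_size
            self.pending_upcalls = 0

    def getattr_size(self):
        return self.cached_size


mds = ToyMds()
mds.ds_write(0, 65536)
stale = mds.getattr_size()  # GETATTR races ahead of the upcall
mds.deliver_upcalls()
fresh = mds.getattr_size()  # GETATTR after the upcall has landed
```

In this model `stale` is 0 and `fresh` is 65536, matching the pattern of a wrong size right after the WRITE and a correct size later.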

Comment 10 Soumya Koduri 2018-11-19 08:48:28 UTC
The issues mentioned in this bug are addressed, except for the last one, which is caused by the delay in upcall notification. Since there isn't much to be done about that, closing this bug.

