| Summary: | SAMBA+TIER : File when created from one windows client over the same volume mount is not accessible from other windows client over same volume mount | ||
|---|---|---|---|
| Product: | Red Hat Gluster Storage | Reporter: | Vivek Das <vdas> |
| Component: | samba | Assignee: | Michael Adam <madam> |
| Status: | CLOSED DEFERRED | QA Contact: | Vivek Das <vdas> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | rhgs-3.1 | CC: | aspandey, gdeschner, ira, kramdoss, madam, mzywusko, nlevinki, pkarampu, rhinduja, rhs-smb, sankarshan |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-04-05 10:36:42 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
|
Description
Vivek Das
2016-03-30 15:47:58 UTC
sosreports uploaded at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1322518/

Vivek: I just sat down with Dan in Westford, and we found the following:

1. I don't know what version of Windows you used, or which Office program. That makes reproduction very difficult. Please send that information.
2. The behavior you see may be due to share modes, and 100% normal. Please verify.
3. When we used smbclient to put a file to a tiered volume and then ran "ls" from another client, the listing showed a size of 0 bytes. When the tier was removed, the file showed the correct size.

Clearly #3 is a bug, and the type of bug that might make Office or any other app behave strangely. I'd like us to get that cleaned up before looking into this in more detail.

Thanks,
-Ira

To be clear: .docx means Word here? And how much time passes between operations? (Tiering means timing matters, alas.)

Also, can you reproduce the issue without Office, just using smbclient?

What is the type of error thrown? But this is a big improvement, and a big step in the right direction. Thanks,

I debugged the problem a little more and found that the exact problem is rename.
The following set of operations can easily recreate the problem:
Client1:
+ test.txt is accessible
Client2:
+ test.txt is accessible
Client1:
+ create new file ("new.txt")
+ rename new.txt to test.txt
+ test.txt is accessible
Client2:
+ test.txt is NOT accessible
If time permits I will investigate this a little more.
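
The sequence above can be replayed against a toy model. This is only a sketch under a loud assumption: that each client remembers the gfid it last saw for a name and treats a mismatch as a stale handle. `ToyVolume`, `ToyClient` and `reproduce` are hypothetical names for illustration; none of this is Samba or Gluster code.

```python
import itertools

_next_gfid = itertools.count(1)  # hypothetical gfid generator


class ToyVolume:
    """Hypothetical stand-in for the volume: maps file names to gfids."""

    def __init__(self):
        self.names = {}

    def create(self, name):
        self.names[name] = next(_next_gfid)

    def rename(self, old, new):
        # The target name now refers to the source file's gfid.
        self.names[new] = self.names.pop(old)


class ToyClient:
    """Hypothetical client that remembers the gfid it last saw per name."""

    def __init__(self, volume):
        self.volume = volume
        self.cached = {}

    def access(self, name):
        current = self.volume.names[name]
        remembered = self.cached.setdefault(name, current)
        return "ok" if remembered == current else "stale"

    def create(self, name):
        self.volume.create(name)

    def rename(self, old, new):
        self.volume.rename(old, new)
        # The renaming client learns the new mapping as part of the operation.
        self.cached.pop(old, None)
        self.cached[new] = self.volume.names[new]


def reproduce():
    vol = ToyVolume()
    vol.create("test.txt")
    c1, c2 = ToyClient(vol), ToyClient(vol)
    results = [c1.access("test.txt"), c2.access("test.txt")]
    c1.create("new.txt")
    c1.rename("new.txt", "test.txt")
    results += [c1.access("test.txt"), c2.access("test.txt")]
    return results
```

In this model only the last step, Client2 accessing test.txt after the rename, reports a stale gfid, which matches the "NOT accessible" step in the list.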
(In reply to rjoseph from comment #6)
> I debugged the problem a little more and found that the exact problem is
> rename.
>
> The following set of operations can easily recreate the problem
>
> Client1:
> + test.txt is accessible
>
> Client2:
> + test.txt is accessible
>
> Client1:
> + create new file ("new.txt")
> + rename new.txt to test.txt
> + test.txt is accessible
>
> Client2:
> + test.txt is NOT accessible
>
> If time permits I will do a little more investigation on the same.

Thanks for the analysis! It looks very plausible to me that the above pattern should be a reproducer. It should be possible to reproduce it with that pattern in smbclient. Can you confirm that, Rajesh?

The problem is reproducible via smbclient as well.

client1:
smb: \> open bad.txt
open file \bad.txt: for read/write fnum 50799
smb: \> close 50799
smb: \> rm bad.txt
smb: \> rename sample.txt bad.txt

client2:
smb: \> open bad.txt
open file \bad.txt: for read/write fnum 64454
smb: \> close 64454
smb: \> open bad.txt
Failed to open file \gala.txt. NT_STATUS_INVALID_PARAMETER

I opened two smbclient sessions and performed the above operations interactively. Initially bad.txt is accessible by both clients, but once I renamed the file I could not open it again.

Update after further analysis of the bug:

Currently the problem is seen if the cold/hot tier is a disperse volume. This prompted me to check a disperse volume directly, without tiering. The problem is seen with a distributed-disperse volume as well, though there is a slight difference in behavior between the two volume types. On a distributed-disperse volume, if you do an "ls" on the directory (i.e. the parent directory of the file bad.txt), the subsequent open passes. On a tiered volume it always fails, even after an ls on the parent directory.

client2:
smb: \> open bad.txt
open file \bad.txt: for read/write fnum 64454
smb: \> close 64454
smb: \> open bad.txt
Failed to open file \bad.txt. NT_STATUS_INVALID_PARAMETER
smb: \> ls
  .          D  0  Thu Apr 14 16:26:21 2016
  ..         D  0  Thu Apr 14 16:26:21 2016
  bad.txt    R  0  Thu Apr 14 16:26:21 2016
smb: \> open bad.txt
open file \bad.txt: for read/write fnum 1749

From the initial analysis it seems that in glfs_resolve_component the first lookup (syncop_lookup) always fails with ESTALE, which causes glfs_resolve_component to generate a new inode and gfid and trigger a new lookup. On a normal distribute-replicate volume the second lookup always passes. But for disperse the second lookup also fails, causing glfs_resolve_component to fail.

Following are the findings in EC:

After getting the response from the server, EC updates loc->gfid too. To do so, it compares loc->gfid and iatt->ia_gfid in "ec_loc_gfid_check". If loc->gfid is null, it just copies iatt->ia_gfid into it and returns success. If the two gfids differ, it returns failure with the op_errno received from the server, which is what happens in the stale-gfid case.

At this point, "glfs_resolve_component" sends a fresh lookup. It creates a new inode and gfid and sends the lookup. However, it does not reset loc->gfid for the fresh lookup.

Now, for the second (fresh) lookup, EC gets a proper response from the backend. But "ec_loc_gfid_check" fails again, as it compares loc->gfid (which is still the old one) with iatt->ia_gfid (received from the server).

There could be two solutions for this:
1 - For a fresh lookup, reset loc->gfid to null.
2 - If [1] is not possible, we have to handle the ESTALE case in EC in a different way.

As we only have the parent gfid and name in loc for a fresh lookup, having gfid set to an old gfid is incorrect.

Assigning it to gfapi team.

More update on this. With the new build provided, i.e. 3.7.9-3, I created a Microsoft Word file with some data on Windows client 1, and it reflected the file size too. Then on Windows client 2 I mounted the same tier volume, accessed the same file (created on client 1) and appended some more data.
That too updated the file size, which was clearly reflected. Now when I log back in to Windows client 1, even after repeated refreshes of the volume share, the size of the file shows as 0 KB. Is this happening because of the lookup issue?

As best I know, we don't support smb+tier. |
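
For reference, the EC findings earlier in this thread (the gfid comparison in ec_loc_gfid_check and the retry in glfs_resolve_component) can be sketched as a toy model. This is a simplified illustration, not the Gluster code: `loc` is a plain dict, gfids are strings, the mismatch is reported directly as ESTALE rather than as the op_errno received from the server, and `toy_gfid_check` and `toy_resolve` are hypothetical names.

```python
import errno


def toy_gfid_check(loc, ia_gfid):
    """Toy model of the described check: fill a null gfid, reject a mismatch."""
    if loc["gfid"] is None:
        loc["gfid"] = ia_gfid
        return 0
    if loc["gfid"] != ia_gfid:
        return errno.ESTALE  # simplification: the real code passes on op_errno
    return 0


def toy_resolve(loc, server_gfid, reset_gfid_on_retry):
    """Toy model of the lookup followed by one fresh lookup on ESTALE."""
    err = toy_gfid_check(loc, server_gfid)
    if err != errno.ESTALE:
        return err
    if reset_gfid_on_retry:
        # Proposed solution 1: clear the old gfid before the fresh lookup.
        loc["gfid"] = None
    return toy_gfid_check(loc, server_gfid)


# Without the reset, the stale gfid makes the fresh lookup fail as well.
stale = {"gfid": "old-gfid"}
without_reset = toy_resolve(stale, "new-gfid", reset_gfid_on_retry=False)

# With the reset, the fresh lookup adopts the gfid returned by the server.
fixed = {"gfid": "old-gfid"}
with_reset = toy_resolve(fixed, "new-gfid", reset_gfid_on_retry=True)
```

In this model `without_reset` is ESTALE while `with_reset` is 0, which mirrors why the second lookup keeps failing on disperse volumes until loc->gfid is cleared.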