Bug 1441992
| Summary: | ls hang seen on an existing mount point (3.2 client) when the server is upgraded and parallel readdir is enabled | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Sweta Anandpara <sanandpa> |
| Component: | glusterfs | Assignee: | Poornima G <pgurusid> |
| Status: | CLOSED ERRATA | QA Contact: | Vinayak Papnoi <vpapnoi> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | rhgs-3.3 | CC: | amukherj, kaushal, nchilaka, pgurusid, rhs-bugs, storage-qa-internal, vbellur |
| Target Milestone: | --- | ||
| Target Release: | RHGS 3.3.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | glusterfs-3.8.4-24 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-09-21 04:37:54 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1417151 | ||
Description
Sweta Anandpara
2017-04-13 09:09:47 UTC
```
[qe@rhsqe-repo 1441992]$ hostname
rhsqe-repo.lab.eng.blr.redhat.com
[qe@rhsqe-repo 1441992]$
[qe@rhsqe-repo 1441992]$ pwd
/home/repo/sosreports/1441992
[qe@rhsqe-repo 1441992]$
[qe@rhsqe-repo 1441992]$ ll
total 49308
-rwxr-xr-x. 1 qe qe 12584772 Apr 13 14:33 sosreport-dhcp47-157-sysreg-prod-20170413034037.tar.xz
-rwxr-xr-x. 1 qe qe 12578400 Apr 13 14:33 sosreport-dhcp47-162-sysreg-prod-20170413034025.tar.xz
-rwxr-xr-x. 1 qe qe 12627140 Apr 13 14:34 sosreport-dhcp47-164-sysreg-prod-20170413034018.tar.xz
-rwxr-xr-x. 1 qe qe 12697080 Apr 13 14:33 sosreport-dhcp47-165-sysreg-prod-20170413034012.tar.xz
[qe@rhsqe-repo 1441992]$
```

Comment 3
Poornima G

RCA:

The 3.2 op-version is 31001, and the parallel-readdir op-version is 31000. The 3.2 code does not recognize the parallel-readdir feature, yet we are still able to enable it, because the 3.2 op-version is greater than parallel-readdir's op-version. Ideally, we should pick features of a higher op-version in one release and then, in the next release, pick a feature of a lower op-version.

We shouldn't allow enabling parallel-readdir until:
- the cluster op-version is that of 3.3, and
- all the clients and servers are upgraded to 3.3. This is the condition that breaks, because the parallel-readdir op-version is lower than that of 3.2.

If we allow setting parallel-readdir while older clients are connected, those clients might crash or fail to mount, because they do not understand the new "parallel-readdir" feature.

So the bug is that setting "parallel-readdir on" succeeds even when older clients are connected, whereas it is expected to fail.

Possible solution: increase the op-version of parallel-readdir to a value > 31001, only in downstream. Will wait for more discussion with the glusterd team before arriving at the solution.

(In reply to Poornima G from comment #3)
> RCA:
> [...]
> Possible solution:
> Increase the op-version of parallel-readdir to be > 31001 only in downstream.
> Will wait for more discussion with glusterd team before arriving at the
> solution

Kaushal, I don't have any solution in mind other than the one Poornima is referring to. Do you think we can handle this in any other way? Unfortunately, this will again lead us to diverge the op-versions between upstream and downstream.

Comment 7
Kaushal

I cannot think of any other way. We will need to diverge op-versions again.

The original intent of syncing op-versions across upstream and downstream was to allow upstream clients to use downstream volumes. Diverging will break this, and I guess we're okay with that.

But now we'll need someone to track these changes between upstream and downstream, and make sure they are made whenever we fork a downstream branch from upstream.

(In reply to Kaushal from comment #7)
> [...]
> But now, we'll need someone to track these changes between upstream and
> downstream, and make sure these changes are done whenever we fork a
> downstream branch from upstream.

Yes, that can be taken care of by the DOWNSTREAM ONLY tag in the downstream patches.

Patch posted at https://code.engineering.redhat.com/gerrit/104403

Build: 3.8.4-28

Followed the steps mentioned in the description. Unmounting and remounting after the server upgrade to 3.3 works fine. Pumped I/O from the mount point and accessed the files from the client successfully, without any hangs. Hence, moving this bug to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774
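The RCA in comment #3 comes down to a purely numeric gate: an option is allowed when every connected peer's op-version is at least the option's op-version. The gate never asks whether the peer's code actually implements the option. The sketch below is a simplified Python model of that reasoning, not glusterd's C implementation. Only 31001 (3.2) and 31000 (upstream parallel-readdir) come from this bug. The 3.3 op-version (31101) is an assumption, and the bumped downstream value (31100) is hypothetical; the bug only requires it to be > 31001. `Peer`, `can_enable`, and `broken_peers` are illustrative names.

```python
from dataclasses import dataclass

# Op-versions quoted in the RCA (comment #3).
RHGS_3_2_OP_VERSION = 31001
PARALLEL_READDIR_UPSTREAM = 31000

# ASSUMPTIONS, not stated in this bug: the RHGS 3.3 op-version, and the
# exact downstream value. The fix only requires the new value to be > 31001.
RHGS_3_3_OP_VERSION = 31101          # assumed
PARALLEL_READDIR_DOWNSTREAM = 31100  # hypothetical


@dataclass(frozen=True)
class Peer:
    """A connected client or server in this simplified, hypothetical model."""
    name: str
    max_op_version: int                   # highest op-version its code supports
    understands: frozenset = frozenset()  # options its code actually implements


def can_enable(option_op_version: int, cluster_op_version: int, peers) -> bool:
    """Numeric gate: the cluster and every connected peer must be at or above
    the option's op-version. It compares numbers only and never checks
    whether a peer implements the option."""
    return cluster_op_version >= option_op_version and all(
        p.max_op_version >= option_op_version for p in peers
    )


def broken_peers(option: str, peers) -> list:
    """Names of peers that would receive an option they do not understand."""
    return [p.name for p in peers if option not in p.understands]


old_client = Peer("rhgs-3.2-client", RHGS_3_2_OP_VERSION)
new_client = Peer("rhgs-3.3-client", RHGS_3_3_OP_VERSION,
                  frozenset({"parallel-readdir"}))
peers = [old_client, new_client]

# Upstream value: the 3.2 client passes the check (31001 >= 31000) even
# though it has no parallel-readdir code. This is the bug.
print(can_enable(PARALLEL_READDIR_UPSTREAM, RHGS_3_3_OP_VERSION, peers))    # True
print(broken_peers("parallel-readdir", peers))  # ['rhgs-3.2-client']

# Downstream value above 31001: the same request is now refused...
print(can_enable(PARALLEL_READDIR_DOWNSTREAM, RHGS_3_3_OP_VERSION, peers))  # False
# ...and allowed once only upgraded clients remain connected.
print(can_enable(PARALLEL_READDIR_DOWNSTREAM, RHGS_3_3_OP_VERSION, [new_client]))  # True
```

The design choice the thread settles on is visible here: rather than teaching the gate about feature support, moving the feature's op-version above every release that lacks it lets the existing numeric comparison reject old clients. The cost, as comment #7 notes, is that upstream and downstream op-versions diverge.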