Bug 1465289
Summary: | Regression: non-disruptive (in-service) upgrade on EC volume fails | |||
---|---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Nag Pavan Chilakam <nchilaka> | |
Component: | disperse | Assignee: | Sunil Kumar Acharya <sheggodu> | |
Status: | CLOSED WONTFIX | QA Contact: | Nag Pavan Chilakam <nchilaka> | |
Severity: | urgent | Docs Contact: | ||
Priority: | unspecified | |||
Version: | rhgs-3.3 | CC: | amukherj, aspandey, asrivast, nchilaka, pkarampu, rhinduja, rhs-bugs, sheggodu, storage-qa-internal | |
Target Milestone: | --- | Keywords: | Regression | |
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | If docs needed, set a value | ||
Doc Text: |
In-service upgrade requires disperse.optimistic-change-log to be off:
gluster v set <volname> disperse.optimistic-change-log off
|
Story Points: | --- | |
Clone Of: | ||||
: | 1468261 (view as bug list) | Environment: | ||
Last Closed: | 2017-12-06 14:20:40 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1468261, 1470938 | |||
Bug Blocks: |
Description
Nag Pavan Chilakam
2017-06-27 06:36:22 UTC
upstream patch: https://review.gluster.org/#/c/17703/
downstream patch: https://code.engineering.redhat.com/gerrit/#/c/112278/

on_qa validation: ran the same test as above to verify an in-service upgrade from 3.8.4-34 to 3.8.4-35 (both versions contain the supposed fix) and still saw the input/output error described above. Hence moving to failed_qa.

a) What is the workaround for performing an in-service upgrade of EC volumes?
Workaround: set "disperse.optimistic-change-log" to off.

b) What is the impact of the workaround in terms of data integrity?
No impact.

c) What are the steps for performing an offline upgrade? By offline, I am referring to inaccessibility of the storage by applications/clients during the course of the upgrade.
The steps are outlined here: https://gluster.readthedocs.io/en/latest/Upgrade-Guide/upgrade_to_3.10/#offline-upgrade-procedure

Thanks. Please get the steps documented for RHGS 3.3.

The problem reported in BZ#1473668 has the following repercussions. It is seen on both FUSE and gNFS mounts (and would also exist on SMB/Ganesha):
1) The heal almost never completes (it does complete, but only after a very long time or once I/O and entry creation in that directory have stopped), leading to end-user frustration.
2) The in-service upgrade cannot proceed to the next set of nodes: after the first node is upgraded, the entries pending heal stall at some point and tend to stay in that state forever. Per the in-service upgrade procedure, we should proceed to the next set of nodes only after the heal has completed, which may never happen.
Even the workaround of disabling both optimistic-change-log and eager-lock does not overcome this problem, which means the in-service upgrade cannot be supported with or without the workaround.

I have tested this on different builds, for example upgrading from 3.8.4-18-6 (3.2 async GA) to 3.8.4-18-38/41/27, or even just doing a pkill of glusterfsd/glusterfs and a restart of glusterd (a plain brick-down scenario), with I/O going on:
1) kernel untar
2) file creation under one directory, say 1 million small files

Relevant documentation was done as part of RHGS-3.3.0. Bug 1481946 is "CLOSED CURRENTRELEASE".
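The documented workaround can be sketched as the following admin commands. This is a configuration sketch, not a tested procedure: the volume name `ecvol` is a placeholder, and re-enabling the option after the upgrade is an assumption on my part (the bug only states that it must be off during the upgrade).

```shell
# Workaround sketch; "ecvol" is a placeholder volume name.
# Run on any server node before starting the in-service upgrade.

# Disable optimistic change-log on the EC (disperse) volume:
gluster volume set ecvol disperse.optimistic-change-log off

# Confirm the option took effect:
gluster volume get ecvol disperse.optimistic-change-log

# Assumption: once all nodes are upgraded and pending heals have
# completed, the option can be restored to its default:
gluster volume set ecvol disperse.optimistic-change-log on
```

These are cluster-wide volume options, so setting them once from a single node is sufficient.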