Bug 596061
Summary: | [RFE] - File domains metadata should be saved in 2 copies to avoid corruption in case of IO failure to one copy | |
---|---|---|---
Product: | [Retired] oVirt | Reporter: | Moran Goldboim <mgoldboi>
Component: | vdsm | Assignee: | Dan Kenigsberg <danken>
Status: | CLOSED WONTFIX | QA Contact: |
Severity: | medium | Docs Contact: |
Priority: | medium | |
Version: | unspecified | CC: | abaron, acathrow, amureini, bazulay, dyasny, fsimonce, hateya, iheim, mgoldboi, Rhev-m-bugs, ykaul
Target Milestone: | --- | Keywords: | FutureFeature
Target Release: | 3.3.4 | |
Hardware: | All | |
OS: | Linux | |
Whiteboard: | storage | |
Fixed In Version: | | Doc Type: | Enhancement
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2013-01-30 22:51:01 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
The problem happened because the NFS server became unavailable exactly when we were about to update the metadata. In general, we have nothing to prevent unrecoverable metadata loss if the underlying storage (NFS or SAN) goes down during a critical metadata update. We rely on the storage being available and, well, reliable.

If we decide not to trust the storage anymore, we could develop a two-phase metadata commit process that keeps two (synchronized) copies of the metadata, thus eliminating corruption of the only metadata copy we have. Such an addition would probably fall beyond the scope of 2.2, so I am re-targeting this issue to 6.0-2.3. Feel free to change it if I am wrong.

See also bug 574733

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux major release. This request is not yet committed for inclusion.

Closing old bugs. If this issue is still relevant/important in the current version, please re-open the bug.
Created attachment 416719
vdsm log

Description of problem:
IO error on activate storage (pool metadata corruption) will lead to a nonfunctional data center

Thread-1084::ERROR::2010-05-26 06:38:17,351::dispatcher::103::irs::[Errno 5] Input/output error
Thread-1084::ERROR::2010-05-26 06:38:17,352::dispatcher::104::irs::Traceback (most recent call last):
Thread-1247::ERROR::2010-05-26 06:46:34,620::misc::58::irs::Meta Data parameter invalid: ('Version or spm id invalid',)
Thread-1247::ERROR::2010-05-26 06:46:34,620::misc::59::irs::Traceback (most recent call last):
Thread-1247::ERROR::2010-05-26 06:46:34,624::dispatcher::98::irs::{'status': {'message': "Meta Data parameter invalid: ('Version or spm id invalid',)", 'code': 755}, 'args': [('Version or spm id invalid',)]}

Version-Release number of selected component (if applicable):
vdsm22-4.5-57.el5rhev

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
data center becomes down, no host can take spm.

Expected results:
the metadata should be restored to last version

Additional info: