Bug 1687032

Summary: Bad error handling when writing storage domain metadata may corrupt metadata
Product: [oVirt] vdsm Reporter: Nir Soffer <nsoffer>
Component: Core Assignee: Nir Soffer <nsoffer>
Status: CLOSED CURRENTRELEASE QA Contact: Shir Fishbain <sfishbai>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 4.30.8 CC: aefrat, bugs, tnisan
Target Milestone: ovirt-4.3.3 Flags: sbonazzo: ovirt-4.3?
sbonazzo: planning_ack?
sbonazzo: devel_ack?
sbonazzo: testing_ack?
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: vdsm-4.30.12 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-04-16 13:58:22 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Nir Soffer 2019-03-08 23:12:31 UTC
Description of problem:

A storage read or write error when modifying storage domain metadata may
leave the storage domain metadata in an inconsistent state.

There are 2 issues:

- A read error may leave the metadata object in transaction state. After that,
  no data will be written to storage until the storage domain is refreshed
  (usually every 5 minutes).

- A write error does not roll back the changes, so the metadata object keeps
  the values modified during the transaction, while the state on storage was
  not modified.
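A minimal sketch of transaction handling that avoids both issues, assuming a metadata object that caches key/value pairs read from storage. `StorageError`, `MemoryStorage`, `Metadata` and the `ROLE`/`VERSION` keys are hypothetical names for illustration, not vdsm's actual classes, and writes are assumed atomic (a failed write leaves the old data on storage). The idea: read from storage *before* entering transaction state, and restore a backup of the in-memory values if the write fails.

```python
import contextlib


class StorageError(Exception):
    """Hypothetical error raised by the storage backend on I/O failure."""


class MemoryStorage:
    """Hypothetical in-memory stand-in for the metadata area on storage.

    Writes are assumed atomic: a failed write leaves the old data intact.
    """

    def __init__(self, data):
        self.data = dict(data)
        self.fail_read = False   # set to True to inject a read error
        self.fail_write = False  # set to True to inject a write error

    def read(self):
        if self.fail_read:
            raise StorageError("injected read error")
        return dict(self.data)

    def write(self, data):
        if self.fail_write:
            raise StorageError("injected write error")
        self.data = dict(data)


class Metadata:
    """Hypothetical metadata cache whose transactions are safe on I/O errors."""

    def __init__(self, storage):
        self._storage = storage
        self._values = storage.read()
        self._in_transaction = False

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        if self._in_transaction:
            self._values[key] = value
        else:
            # A single change outside a transaction gets the same protection.
            with self.transaction():
                self._values[key] = value

    @contextlib.contextmanager
    def transaction(self):
        if self._in_transaction:
            raise RuntimeError("nested transactions are not supported")
        # Read before entering transaction state: if the read fails, the
        # object is left exactly as it was (avoids the first issue).
        fresh = self._storage.read()
        backup = dict(fresh)
        self._values = fresh
        self._in_transaction = True
        try:
            yield self
            self._storage.write(self._values)
        except BaseException:
            # Storage was not modified, so drop the in-memory changes too
            # (avoids the second issue).
            self._values = backup
            raise
        finally:
            # Always leave transaction state, whatever happened.
            self._in_transaction = False
```

With `storage.fail_write = True`, a failed `with md.transaction(): md["ROLE"] = "Master"` leaves `md["ROLE"]` equal to the value on storage, and after a failed read the next change is written normally. The `finally` clause is what guarantees the object never stays stuck in transaction state.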

Both issues can cause different hosts to see different metadata at the same
time.

I don't know how to reproduce this with a real system, but it is easy to
reproduce in the vdsm automated tests, where we can inject both read and write
errors.

Version-Release number of selected component (if applicable):
Any

How reproducible:
Always in vdsm tests; should be very hard in a real system.

Steps to Reproduce:
Inject read or write errors when accessing storage domain metadata.
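The reproduction step can be sketched with `unittest.mock`, injecting errors by setting `side_effect` on a mocked storage object. `NaiveMetadata`, `make_storage` and the two `reproduce_*` functions are hypothetical, written only to exhibit the two issues from the description; this is not vdsm's actual code or test suite.

```python
import contextlib
from unittest import mock


class NaiveMetadata:
    """Hypothetical metadata object with both error-handling bugs."""

    def __init__(self, storage):
        self._storage = storage
        self._values = storage.read()
        self._in_transaction = False

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        self._values[key] = value
        if not self._in_transaction:
            self._storage.write(dict(self._values))

    @contextlib.contextmanager
    def transaction(self):
        self._in_transaction = True          # Bug 1: flag set before the read,
        self._values = self._storage.read()  # so a read error leaves it set.
        yield self
        self._storage.write(dict(self._values))  # Bug 2: no rollback on error.
        self._in_transaction = False


def make_storage(data):
    """Mock storage: read() returns a copy of `data`, write() updates it."""
    storage = mock.Mock()
    storage.read.side_effect = lambda: dict(data)
    storage.write.side_effect = lambda new: data.update(new)
    return storage


def reproduce_read_error():
    """Issue 1: after a read error, later changes never reach storage."""
    on_disk = {"ROLE": "Regular"}
    storage = make_storage(on_disk)
    md = NaiveMetadata(storage)
    storage.read.side_effect = OSError("injected read error")
    try:
        with md.transaction():
            pass
    except OSError:
        pass
    md["ROLE"] = "Master"  # flag is still set, so nothing is written
    return md["ROLE"], on_disk["ROLE"]


def reproduce_write_error():
    """Issue 2: after a write error, memory keeps the unwritten change."""
    on_disk = {"ROLE": "Regular"}
    storage = make_storage(on_disk)
    md = NaiveMetadata(storage)
    storage.write.side_effect = OSError("injected write error")
    try:
        with md.transaction():
            md["ROLE"] = "Master"
    except OSError:
        pass
    return md["ROLE"], on_disk["ROLE"]
```

Both functions return `("Master", "Regular")`: the in-memory metadata and the state on storage disagree, which is how two hosts can end up seeing different metadata.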

Comment 1 Avihai 2019-04-02 08:30:23 UTC
Hi Nir,

As this is already verified by the vdsm tests and there is no real-system scenario, can you please verify this bug?

Comment 2 Tal Nisan 2019-04-02 14:11:07 UTC
Verified in the VDSM tests, as it is (almost) impossible to reproduce in a running system.

Comment 3 Sandro Bonazzola 2019-04-16 13:58:22 UTC
This bugzilla is included in oVirt 4.3.3 release, published on April 16th 2019.

Since the problem described in this bug report should be
resolved in oVirt 4.3.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.