Bug 1687032 - Bad error handling when writing storage domain metadata may corrupt metadata
Summary: Bad error handling when writing storage domain metadata may corrupt metadata
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.30.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ovirt-4.3.3
Target Release: ---
Assignee: Nir Soffer
QA Contact: Shir Fishbain
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-03-08 23:12 UTC by Nir Soffer
Modified: 2019-08-02 15:25 UTC (History)

Fixed In Version: vdsm-4.30.12
Clone Of:
Environment:
Last Closed: 2019-04-16 13:58:22 UTC
oVirt Team: Storage
Embargoed:
sbonazzo: ovirt-4.3?
sbonazzo: planning_ack?
sbonazzo: devel_ack?
sbonazzo: testing_ack?


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 98384 0 master MERGED storage: make flush and refresh methods of PersistentDict private 2019-03-13 23:11:23 UTC
oVirt gerrit 98385 0 master MERGED storage: fix PersistentDict transaction rollback 2019-03-19 09:27:20 UTC
oVirt gerrit 98387 0 master MERGED tests: Add more failing tests for persistent dict 2019-03-12 14:36:37 UTC
oVirt gerrit 98388 0 master MERGED persistent: Fix transaction cleanup after errors 2019-03-12 15:31:50 UTC
oVirt gerrit 98603 0 ovirt-4.3 MERGED tests: Add more failing tests for persistent dict 2019-03-19 17:04:41 UTC
oVirt gerrit 98604 0 ovirt-4.3 MERGED persistent: Fix transaction cleanup after errors 2019-03-19 17:04:44 UTC
oVirt gerrit 98626 0 ovirt-4.3 MERGED storage: make flush and refresh methods of PersistentDict private 2019-03-19 17:04:46 UTC
oVirt gerrit 98663 0 ovirt-4.3 MERGED storage: fix PersistentDict transaction rollback 2019-03-19 17:04:49 UTC

Description Nir Soffer 2019-03-08 23:12:31 UTC
Description of problem:

A storage read or write error while modifying storage domain metadata may
leave the metadata in an inconsistent state.

There are 2 issues:

- A read error may leave the metadata object in transaction state. After that,
  no data is written to storage until the storage domain is refreshed (usually
  every 5 minutes).

- A write error does not roll back the changes, so the in-memory metadata keeps
  the values modified during the transaction, while the state on storage was
  not modified.

Both issues can cause different hosts to see different metadata at the same
time.
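
To illustrate the two failure modes, here is a minimal sketch of a
transactional dict that handles both. It is not the vdsm PersistentDict code:
the class name TransactionalDict and the read/write callables are hypothetical
and assumed only for this example. The point is that the transaction flag is
always cleared, and the in-memory values are restored when the body or the
write fails.

```python
import contextlib


class TransactionalDict:
    """Hypothetical in-memory view of metadata kept in sync with storage."""

    def __init__(self, read, write):
        self._read = read      # assumed callable: returns a dict read from storage
        self._write = write    # assumed callable: persists a dict to storage
        self._data = {}
        self._in_transaction = False

    @contextlib.contextmanager
    def transaction(self):
        if self._in_transaction:
            # Nested use joins the outer transaction.
            yield
            return
        self._in_transaction = True
        try:
            # If this read fails, the finally clause still clears the flag
            # (first issue).
            self._data = dict(self._read())
            backup = dict(self._data)
            try:
                yield
                self._write(self._data)
            except Exception:
                # Roll back in-memory changes so they match storage
                # (second issue).
                self._data = backup
                raise
        finally:
            self._in_transaction = False

    def __setitem__(self, key, value):
        with self.transaction():
            self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]
```

With this shape, a failed write leaves the object holding the values that were
last read from storage, and later updates are not silently buffered in a
transaction that never ends.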

I don't know how to reproduce this on a real system, but it is easy to
reproduce in vdsm automated tests, where we can inject both read and write
errors.

Version-Release number of selected component (if applicable):
Any

How reproducible:
Always in vdsm tests; should be very hard on a real system.

Steps to Reproduce:
Inject read or write errors when accessing storage domain metadata.
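
A rough sketch of what such error injection could look like in a test, using
an in-memory fake instead of real storage. FlakyStorage and set_key are
hypothetical names made up for this example, not the helpers used in the vdsm
test suite.

```python
class FlakyStorage:
    """Hypothetical test double: in-memory lines with injectable failures."""

    def __init__(self, lines=()):
        self.lines = list(lines)
        self.fail_read = False
        self.fail_write = False

    def read(self):
        if self.fail_read:
            raise OSError("injected read error")
        return list(self.lines)

    def write(self, lines):
        if self.fail_write:
            raise OSError("injected write error")
        self.lines = list(lines)


def set_key(storage, cache, key, value):
    """Update cache and storage together; restore the cache if the write fails."""
    old = dict(cache)
    cache[key] = value
    try:
        storage.write(["%s=%s" % item for item in sorted(cache.items())])
    except Exception:
        cache.clear()
        cache.update(old)
        raise


# Inject a write error and check that cache and storage still agree.
storage = FlakyStorage(["a=1"])
cache = {"a": "1"}
storage.fail_write = True
try:
    set_key(storage, cache, "b", "2")
except OSError:
    pass
assert cache == {"a": "1"}
assert storage.lines == ["a=1"]
```

The same pattern applies to read errors: set fail_read, trigger a modification,
and assert that a later modification is still written to storage.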

Comment 1 Avihai 2019-04-02 08:30:23 UTC
Hi Nir,

As this is already verified on vdsm tests and there is no real system scenario, can you please verify this bug?

Comment 2 Tal Nisan 2019-04-02 14:11:07 UTC
Verified in VDSM test as it is (almost) impossible to reproduce in a running system

Comment 3 Sandro Bonazzola 2019-04-16 13:58:22 UTC
This bugzilla is included in oVirt 4.3.3 release, published on April 16th 2019.

Since the problem described in this bug report should be
resolved in oVirt 4.3.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

