Bug 1665689 - sos plugin is running lvm commands without locking, risking VG metadata corruption
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.30.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.3.3
Target Release: ---
Assignee: Nir Soffer
QA Contact: Evelina Shames
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-01-12 21:06 UTC by Nir Soffer
Modified: 2019-04-16 13:58 UTC
CC List: 3 users

Fixed In Version: vdsm-4.30.12
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-04-16 13:58:29 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.3+


Attachments
lvm_commands (19.43 KB, application/gzip)
2019-04-02 15:08 UTC, Evelina Shames


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 96774 0 master MERGED sos: Use metadata_read_only=1 for lvm commands 2019-03-20 15:40:30 UTC
oVirt gerrit 98719 0 ovirt-4.3 MERGED sos: Use metadata_read_only=1 for lvm commands 2019-03-22 14:27:26 UTC

Description Nir Soffer 2019-01-12 21:06:37 UTC
Description of problem:

vdsm sos plugin contains this note:

    # locking_type is set to 0 in order to match lvm sos commands. With this
    # configuration we don't take any locks, so we will never block because
    # there is a stuck lvm command.

And then it uses:

    global {
        locking_type=0
        use_lvmetad=0
    }

To run pvs, vgs, lvs.
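
For context, lvm commands accept such a config override on the command line
via --config; the exact invocation used by the plugin may differ, but it is
roughly equivalent to:

    pvs --config 'global { locking_type=0 use_lvmetad=0 }'
    vgs --config 'global { locking_type=0 use_lvmetad=0 }'
    lvs --config 'global { locking_type=0 use_lvmetad=0 }'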

According to David Teigland, any lvm command may attempt to recover VG
metadata, so running these commands without locking is risky in an oVirt
setup, where lvm is used over shared storage.

The locking_type was changed to 0 in this commit:

commit dc362f0f8e43c30c709700222cb3121908170eff
Author: Ala Hino <ahino>
Date:   Sun Sep 17 13:49:24 2017 +0300

    sos-plugin: Add lvm configuration when executing lvm commands
    
    Add lvm configuration when executing lvm command to gather information.
    This is important because, for example, lvmetada could be enabled and we
    want to disable it when creating the sos report, in addition and as
    recommended, the user can define devices filter and we want to override
    that filter to get all relevant devices.
    In this patch we provide the following lvm config:
    
        global {
            locking_type = 0
            use_lvmetad = 0
        }
    
        devices {
            preferred_names = [ '^/dev/mapper/' ]
            ignore_suspended_devices = 1
            write_cache_state = 0
            disable_after_error_count = 3
            filter = [ 'a|^/dev/mapper/.*|', 'r|.*|' ]
        }
    
    Change-Id: I63f1bcb08a0380537e84d9e3ece36fd876b00635
    Bug-Url: https://bugzilla.redhat.com/1474566
    Signed-off-by: Ala Hino <ahino>

Version-Release number of selected component (if applicable):
v4.20.4

How reproducible:
Not reproducible on demand - the chance that the 3 sos commands will conflict
with an SPM command is very low, but it is possible.

Steps to Reproduce:
1. Run sosreport on oVirt host while many vms are running and extending thin
   provisioned block disks.

Actual results:
Running sosreport may corrupt VG metadata. See bug 1553133 for similar issue.

Expected results:
sosreport should never modify VG metadata.

Additional info:
The same issue exists in the sos lvm2 plugin - the locking_type=0 setting was
copied from the lvm2 plugin.
We have upstream issue for sos here:
https://github.com/sosreport/sos/issues/1533
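
For reference, the merged fix (gerrit 96774) switches the sos plugin to
metadata_read_only=1. A sketch of the resulting global section, assuming the
option is simply added alongside the existing ones (the exact merged config
may differ):

    global {
        use_lvmetad = 0
        metadata_read_only = 1
    }

With metadata_read_only=1, lvm refuses to write VG metadata, so the reporting
commands can never attempt metadata recovery on shared storage.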

Comment 1 Sandro Bonazzola 2019-01-28 09:36:45 UTC
This bug has not been marked as blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 2 Evelina Shames 2019-04-02 12:03:12 UTC
Nir, can you please provide a clearer scenario?
(How many VMs/disks/... ?)

Comment 3 Nir Soffer 2019-04-02 12:05:39 UTC
There is no way to reproduce this manually. The only test needed is running
the vdsm sos plugin and verifying that we don't have regressions.

We expect to get the relevant lvm command output in the report.

Comment 4 Evelina Shames 2019-04-02 15:08:35 UTC
Seems fine to me, can you please confirm?

Comment 5 Evelina Shames 2019-04-02 15:08:43 UTC
Created attachment 1551053 [details]
lvm_commands

Comment 6 Nir Soffer 2019-04-02 15:11:51 UTC
(In reply to Evelina Shames from comment #5)
You collected lvm commands from the lvm sos plugin. What we need to collect
is data from the vdsm sos plugin. This should contain data from lvm showing
only the shared storage used by vdsm.

Comment 7 Evelina Shames 2019-04-03 08:33:41 UTC
Will it be enough to run lvm commands manually on this host and compare the output to the vdsm sos plugin data?

Comment 8 Evelina Shames 2019-04-03 10:46:38 UTC
I manually created another VG on the same host and it appears in the vdsm sos plugin data. As I understand it, it shouldn't appear there, am I right?

Comment 9 Nir Soffer 2019-04-03 10:52:51 UTC
(In reply to Evelina Shames from comment #8)
> I manually created another VG on the same host and it appears in the vdsm
> sos plugin data. As I understand it, it shouldn't appear there, am I right?

Did you create it on multipath device (should be in vdsm report), or using local
disk (should not be in vdsm report)?
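
For reference, this distinction follows from the device filter the plugin
sets (quoted in the description above): devices under /dev/mapper/ are
accepted and everything else is rejected:

    filter = [ 'a|^/dev/mapper/.*|', 'r|.*|' ]

So a VG created on a multipath device should appear in the vdsm report, while
a VG on a local disk such as /dev/sda should be filtered out.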

Comment 10 Evelina Shames 2019-04-03 12:05:00 UTC
(In reply to Nir Soffer from comment #6)
> (In reply to Evelina Shames from comment #5)
> You collected lvm commands from the lvm sos plugin. What we need to collect
> is data from the vdsm sos plugin. This should contain data from lvm showing
> only the shared storage used by vdsm.

Verified according to this comment on:
ovirt-engine-4.3.3.1-0.1.el7.noarch
vdsm-4.30.12-1.el7ev.x86_64

Comment 11 Sandro Bonazzola 2019-04-16 13:58:29 UTC
This bug is included in the oVirt 4.3.3 release, published on April 16th 2019.

Since the problem described in this bug report should be resolved in the
oVirt 4.3.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

