Bug 1665689

Summary: sos plugin is running lvm commands without locking, risking VG metadata corruption
Product: [oVirt] vdsm
Reporter: Nir Soffer <nsoffer>
Component: Core
Assignee: Nir Soffer <nsoffer>
Status: CLOSED CURRENTRELEASE
QA Contact: Evelina Shames <eshames>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.30.0
CC: aefrat, bugs, tnisan
Target Milestone: ovirt-4.3.3
Flags: rule-engine: ovirt-4.3+
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: vdsm-4.30.12
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-04-16 13:58:29 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Attachments: lvm_commands (no flags)

Description Nir Soffer 2019-01-12 21:06:37 UTC
Description of problem:

vdsm sos plugin contains this note:

    # locking_type is set to 0 in order to match lvm sos commands. With this
    # configuration we don't take any locks, so we will never block because
    # there is a stuck lvm command.

And then it uses:

    global {
        locking_type=0
        use_lvmetad=0
    }

To run pvs, vgs, lvs.
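
For illustration, the resulting invocation looks roughly like this (an assumed
command line; the plugin builds the --config string programmatically, so the
exact formatting may differ):

    pvs --config 'global { locking_type=0 use_lvmetad=0 }'
    vgs --config 'global { locking_type=0 use_lvmetad=0 }'
    lvs --config 'global { locking_type=0 use_lvmetad=0 }'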

According to David Teigland, any lvm command may attempt to recover VG metadata, so
running these commands without locking is risky in an oVirt setup, where lvm is used
on shared storage.
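
For reference, lvm documents locking_type=4 as read-only locking that forbids any
operation which might change metadata. A minimal sketch of a safer invocation,
assuming read-only locking is acceptable for data collection (this is not
necessarily the fix that went into vdsm-4.30.12):

    # locking_type=4 is read-only locking: lvm refuses operations that could
    # modify VG metadata, so a report run cannot trigger metadata recovery.
    vgs --config 'global { locking_type=4 use_lvmetad=0 }'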

The locking_type was changed to 0 in this commit:

commit dc362f0f8e43c30c709700222cb3121908170eff
Author: Ala Hino <ahino>
Date:   Sun Sep 17 13:49:24 2017 +0300

    sos-plugin: Add lvm configuration when executing lvm commands
    
    Add lvm configuration when executing lvm command to gather information.
    This is important because, for example, lvmetad could be enabled and we
    want to disable it when creating the sos report. In addition, as
    recommended, the user can define a devices filter, and we want to override
    that filter to get all relevant devices.
    In this patch we provide the following lvm config:
    
        global {
            locking_type = 0
            use_lvmetad = 0
        }
    
        devices {
            preferred_names = [ '^/dev/mapper/' ]
            ignore_suspended_devices = 1
            write_cache_state = 0
            disable_after_error_count = 3
            filter = [ 'a|^/dev/mapper/.*|', 'r|.*|' ]
        }
    
    Change-Id: I63f1bcb08a0380537e84d9e3ece36fd876b00635
    Bug-Url: https://bugzilla.redhat.com/1474566
    Signed-off-by: Ala Hino <ahino>

Version-Release number of selected component (if applicable):
v4.20.4

How reproducible:
Not reproducible - the chance that the 3 sos commands will conflict with an SPM
command is very low, but it is possible.

Steps to Reproduce:
1. Run sosreport on oVirt host while many vms are running and extending thin
   provisioned block disks.

Actual results:
Running sosreport may corrupt VG metadata. See bug 1553133 for a similar issue.

Expected results:
sosreport should never modify VG metadata.

Additional info:
The same issue exists in the sos lvm2 plugin - the locking_type=0 change was copied
from the lvm2 plugin.
We have an upstream issue for sos here:
https://github.com/sosreport/sos/issues/1533

Comment 1 Sandro Bonazzola 2019-01-28 09:36:45 UTC
This bug has not been marked as a blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 2 Evelina Shames 2019-04-02 12:03:12 UTC
Nir, can you please provide a clearer scenario?
(How many VMs/disks/... ?)

Comment 3 Nir Soffer 2019-04-02 12:05:39 UTC
There is no way to reproduce this manually. The only test needed is running the sos
vdsm plugin and verifying that we don't have regressions.

We expect to get the relevant lvm command output in the report.
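
For this kind of check, sos can be limited to a single plugin, so only the vdsm
plugin runs (a usage sketch based on the standard sos option; see the sosreport
man page):

    # run only the vdsm plugin and collect its lvm output
    sosreport -o vdsm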

Comment 4 Evelina Shames 2019-04-02 15:08:35 UTC
Seems fine to me, can you please confirm?

Comment 5 Evelina Shames 2019-04-02 15:08:43 UTC
Created attachment 1551053 [details]
lvm_commands

Comment 6 Nir Soffer 2019-04-02 15:11:51 UTC
(In reply to Evelina Shames from comment #5)
You collected lvm commands from the lvm sos plugin. What we need to collect
is data from the vdsm sos plugin. This should contain lvm data showing only the
shared storage used by vdsm.
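
A rough way to inspect that data after the run, assuming the usual sos layout
where each plugin's command output is stored under sos_commands/<plugin>/ in the
extracted report (archive and file names below are placeholders):

    tar xf sosreport-<host>-*.tar.xz
    ls sosreport-<host>-*/sos_commands/vdsm/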

Comment 7 Evelina Shames 2019-04-03 08:33:41 UTC
Will it be enough to run the lvm commands manually on this host and compare the output to the vdsm sos plugin data?

Comment 8 Evelina Shames 2019-04-03 10:46:38 UTC
I manually created another vg on the same host and it appears in the vdsm sos plugin data. As I understand it, it shouldn't appear there, am I right?

Comment 9 Nir Soffer 2019-04-03 10:52:51 UTC
(In reply to Evelina Shames from comment #8)
> I manually created another vg on the same host and it appears in the vdsm
> sos plugin data. As I understand it, it shouldn't appear there, am I right?

Did you create it on a multipath device (should be in the vdsm report), or on a local
disk (should not be in the vdsm report)?
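
A quick way to answer that is to check which physical device backs the new VG
(standard lvm and multipath commands; nothing here is specific to the vdsm plugin):

    # show which physical device each VG sits on
    pvs -o pv_name,vg_name
    # list multipath devices; VGs on /dev/mapper multipath devices are the ones
    # expected to show up in the vdsm report
    multipath -ll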

Comment 10 Evelina Shames 2019-04-03 12:05:00 UTC
(In reply to Nir Soffer from comment #6)
> (In reply to Evelina Shames from comment #5)
> You collected lvm commands from the lvm sos plugin. What we need to collect
> is data from the vdsm sos plugin. This should contain lvm data showing only
> the shared storage used by vdsm.

Verified according to this comment on:
ovirt-engine-4.3.3.1-0.1.el7.noarch
vdsm-4.30.12-1.el7ev.x86_64

Comment 11 Sandro Bonazzola 2019-04-16 13:58:29 UTC
This bugzilla is included in the oVirt 4.3.3 release, published on April 16th 2019.

Since the problem described in this bug report should be resolved in the
oVirt 4.3.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.