Bug 1793207 - [RFE] Notify if multipath User Friendly Names are used
Summary: [RFE] Notify if multipath User Friendly Names are used
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.3.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.5.2
Target Release: 4.5.2
Assignee: Albert Esteve
QA Contact: Shir Fishbain
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-01-20 22:37 UTC by Marina Kalinin
Modified: 2023-12-15 17:12 UTC (History)

Fixed In Version: vdsm-4.50.2.2
Doc Type: Enhancement
Doc Text:
A new warning has been added to the vdsm-tool to protect users from using the unsupported user_friendly_names multipath configuration. The following is an example of the output:

    $ vdsm-tool is-configured --module multipath
    WARNING: Invalid configuration: 'user_friendly_names' is enabled in multipath configuration:
      section1 {
        key1 value1
        user_friendly_names yes
        key2 value2
      }
      section2 {
        user_friendly_names yes
      }
    This configuration is not supported and may lead to storage domain corruption.
Clone Of:
Environment:
Last Closed: 2022-09-08 11:26:41 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github oVirt vdsm pull 235 0 None Draft Multipath: warn when user_friendly_names is enabled 2022-06-19 06:57:09 UTC
Red Hat Bugzilla 1793696 0 high CLOSED [Admin] Add warning on multipath User Friendly Names not supported 2023-12-15 17:13:07 UTC
Red Hat Knowledge Base (Solution) 4758231 0 None None None 2020-08-31 20:07:50 UTC
Red Hat Product Errata RHSA-2022:6392 0 None None None 2022-09-08 11:27:09 UTC

Internal Links: 1872564

Description Marina Kalinin 2020-01-20 22:37:55 UTC
Using multipath User Friendly Names (UFN) on RHV hosts is not supported. If UFN are used, they may cause LVM corruption on block storage domains - see bz#1553133.

If this configuration is so dangerous, we should do what we can to protect the user from using this configuration in their environment.

Comment 2 Nir Soffer 2020-01-21 07:21:47 UTC
User friendly names cannot be used so
there is no need to notify.

Vdsm controls this configuration, but
the user may make the multipath
configuration private and override this
setting. We need to block this option.

If user_friendly_names are enabled vdsm
should fail to start.

Comment 3 Doron Fediuck 2020-01-21 09:25:59 UTC
I suggest we just restore the warning we used to have in the docs[1].
VDSM may fail on detecting it but we need to warn users before we enforce it.

[1] https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.0/html/Administration_Guide/Managing_Storage.html#Managing_Storage_Entities

Comment 4 Marina Kalinin 2020-01-21 20:34:20 UTC
+1 to implementing both options.
I can open the docs bug for that, and keep this one to track the software change.

Comment 5 Marina Kalinin 2020-01-21 20:39:11 UTC
Docs bug: BZ#1793696

Comment 6 Germano Veit Michel 2020-01-21 21:57:07 UTC
(In reply to Nir Soffer from comment #2)
> If user_friendly_names are enabled vdsm
> should fail to start.

Can we just have a knob to turn this off? We do not want Sev1 outages from customers caught off guard after upgrading.

Other than that, it's a good way to get all customers onto the supported config. Thanks.

Comment 7 Michal Skrivanek 2020-01-22 04:40:15 UTC
This should be fairly straightforward to implement as an Insights check

Comment 10 Nir Soffer 2020-05-26 14:07:08 UTC
(In reply to Germano Veit Michel from comment #6)
> (In reply to Nir Soffer from comment #2)
> > If user_friendly_names are enabled vdsm
> > should fail to start.
> 
> Can we just have a knob to turn this off? We do not want Sev1 outages from
> customers caught off guard after upgrading.

I don't want to handle data corruption caused by this unsupported configuration.

Upgrade should fail if a system has user_friendly_names enabled.

We need to add an upgrade pre-check validating the multipath configuration.
If the host is using a PRIVATE multipath configuration managed by the
user, or a drop-in multipath configuration file that enables this, the
upgrade should fail.

If the user modifies the multipath configuration after the upgrade and enables
user friendly names, vdsm should fail to start until the bad configuration is
removed.

I think both checks can be implemented using "vdsm-tool is-configured".

Comment 11 Germano Veit Michel 2020-05-26 22:21:19 UTC
(In reply to Nir Soffer from comment #10)
> (In reply to Germano Veit Michel from comment #6)
> > (In reply to Nir Soffer from comment #2)
> > > If user_friendly_names are enabled vdsm
> > > should fail to start.
> > 
> > Can we just have a knob to turn this off? We do not want Sev1 outages from
> > customers caught off guard after upgrading.
> 
> I don't want to handle data corruption caused by this unsupported
> configuration.
> 
> > Upgrade should fail if a system has user_friendly_names enabled.
> 
> We need to add an upgrade pre-check validating multipath configuration.
> If the host is using PRIVATE multipath configuration managed by the 
> user, or drop-in multipath configuration file enabling this the upgrade
> should fail.
> 
> If user modified multipath configuration after upgrade and enabled user
> friendly
> names, vdsm should fail to start until the bad configuration is removed.
> 
> > I think both checks can be implemented using "vdsm-tool is-configured".

Fair enough. 

I think we also need a KCS on how to move on once the upgrade makes
vdsm fail to start.

To use the same SD, I'd say we would need to at least:
- Change SD metadata (MDT_PV=xxx) to actual WWID
- Update DB with the same info

Or should the customer export the VMs using a host with old VDSM
and create new SD from scratch?

This is going to cause some pain. We have an internal lab in BNE with
friendly names set for years; it's a good one to test this.

Comment 12 Michal Skrivanek 2020-06-23 12:35:17 UTC
This request is not currently committed to 4.4.z, moving it to 4.5

Comment 17 Nir Soffer 2020-08-31 16:23:16 UTC
I think we should start with a warning in the multipath configurator
if user friendly names are used.

Testing this change:

1. Change /etc/multipath.conf

    # Required for having same device names on all hosts.
    # DO NOT CHANGE!

    user_friendly_names         no

To:

    user_friendly_names         yes

2. Run:

    # vdsm-tool is-configured --module multipath

This should show a warning like:

    WARNING: invalid configuration in /etc/multipath.conf:

        user_friendly_names         yes

    This configuration is not supported and may lead to storage
    domain corruption.

This warning should be logged to syslog each time vdsm starts.

The next step would be to fail in "vdsm-tool is-configured", which
will require changing this configuration to continue using vdsm.
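
A minimal sketch of the warning-only step described in this comment, assuming the check is a plain line scan of the configuration file text. The helper name `build_warning` and the regular expression are hypothetical and are not taken from vdsm; the message wording follows the example above.

```python
import re

# Hypothetical pattern, not vdsm code: an uncommented line that enables the
# option, with or without quotes around the value.
_ENABLED = re.compile(r'^\s*user_friendly_names\s+"?yes"?\s*$')


def build_warning(conf_path, conf_text):
    """Return a warning string if conf_text enables user_friendly_names, else None."""
    offending = [
        line.strip() for line in conf_text.splitlines() if _ENABLED.match(line)
    ]
    if not offending:
        return None
    lines = ["WARNING: invalid configuration in %s:" % conf_path, ""]
    # Show each offending line indented, as in the proposed output.
    lines.extend("    " + line for line in offending)
    lines.extend([
        "",
        "This configuration is not supported and may lead to storage",
        "domain corruption.",
    ])
    return "\n".join(lines)
```

A caller could print the returned text or pass it to a logger; a line scan like this only sees the file it is given, so drop-in files would need to be scanned separately.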

Comment 18 Nir Soffer 2020-08-31 16:40:01 UTC
Ben, what is the best way to check the current value of
user_friendly_names?

We can use:

    # multipathd show config local | grep user_friendly_names
	user_friendly_names "no"

But this may show several values, for example it may be defined
by some devices, like:

        device {
                vendor "NETAPP"
                product "LUN"
                path_grouping_policy "group_by_prio"
                features "2 pg_init_retries 50"
                prio "ontap"
                failback "immediate"
                no_path_retry "queue"
                flush_on_last_del "yes"
                dev_loss_tmo "infinity"
                user_friendly_names "no"
        }

So it may show the value several times.

    # multipathd show config | grep user_friendly_names
	user_friendly_names "no"
		user_friendly_names "no"

We can check that all configurations are "no", but if we find one
configuration that is "yes", we don't have a good way to show which
section had this, without parsing the configuration.
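
To illustrate what "parsing the configuration" could look like, here is a rough sketch that walks text shaped like the output of "multipathd show config local" and reports which section enables the option. It assumes one "name {" opener or one "}" closer per line, as in the output shown above; the function name `find_enabled_sections` is hypothetical.

```python
def find_enabled_sections(config_text, key="user_friendly_names"):
    """Return (section_path, line_number) pairs where key is set to yes.

    Assumes each section opens with a line ending in "{" and closes with a
    line containing only "}".
    """
    stack = []  # names of the sections currently open, outermost first
    hits = []
    for lineno, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith("{"):
            stack.append(line[:-1].strip())
            continue
        if line == "}":
            if stack:
                stack.pop()
            continue
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == key:
            # Values may be quoted ("yes") or bare (yes).
            if parts[1].strip().strip('"') == "yes":
                hits.append(("/".join(stack), lineno))
    return hits
```

For a device entry nested under devices, the reported path would be "devices/device", which is enough to point the user at the offending section.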

Should we use:

overrides {
    user_friendly_names "no"
}

To ensure that the value is not overridden by some device
section?

If we use this, do we have a way to get only the overrides
configuration, assuming that nothing can override it?

lvm has very useful dumpconfig command:

# lvm dumpconfig devices/filter
filter=["a|^/dev/disk/by-id/lvm-pv-uuid-80ovnb-mZIO-J65Y-rl9n-YAY7-h0Q9-Aezk8D$|","r|.*|"]

But I'm not sure how this can work for dynamic configuration/state 
reported by "multipathd show config local".

I think it would help if we can get multipath configuration in json
format, like "multipathd show maps json".

Comment 19 Ben Marzinski 2020-08-31 20:25:16 UTC
(In reply to Nir Soffer from comment #18)
> Ben, what is the best way to check the current value of
> user_friendly_names?

Right now, there's really no better way than using

# multipathd show config local

and parsing the result.

> We can use:
> 
>     # multipathd show config local | grep user_friendly_names
> 	user_friendly_names "no"
> 
> But this may show several values, for example it may be defined
> by some devices, like:
> 
>         device {
>                 vendor "NETAPP"
>                 product "LUN"
>                 path_grouping_policy "group_by_prio"
>                 features "2 pg_init_retries 50"
>                 prio "ontap"
>                 failback "immediate"
>                 no_path_retry "queue"
>                 flush_on_last_del "yes"
>                 dev_loss_tmo "infinity"
>                 user_friendly_names "no"
>         }
> 
> So it may show the value several times.
> 
>     # multipathd show config | grep user_friendly_names
> 	user_friendly_names "no"
> 		user_friendly_names "no"
> 
> We can check that all configurations are "no", but if we find one
> configuration that is "yes", we don't have a good way to show which
> section had this, without parsing the configuration.
> 
> Should we use:
> 
> overrides {
>     user_friendly_names "no"
> }
> 
> To ensure that the value is not overridden by some device
> section?

That would guarantee that nothing in defaults or devices section could set user_friendly_names to "yes". I'm not sure how big of a deal this is, but it would still be possible to
set user_friendly_names "yes" in the multipaths section, since the option is allowed there as well.

> If we use this, do we have a way to get only the overrides
> configuration, assuming that nothing can override it?

No. You would have to get the local config, and then search for

"^overrides {"

and make sure that

user_friendly_names "no"

is between that and the next line with

"^}"

To be safe you would also need to search the multipaths section.
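
The search described above can be sketched roughly as follows, assuming the text comes from "multipathd show config local" and that top-level sections start at column 0 as in the patterns quoted. The function name is hypothetical, and this sketch does not inspect the multipaths section, which would still need its own check.

```python
import re


def overrides_disable_friendly_names(config_text):
    """True if the top-level overrides section sets user_friendly_names to no."""
    # Capture everything between "^overrides {" and the next line starting with "}".
    match = re.search(
        r"^overrides \{\n(.*?)^\}", config_text, re.MULTILINE | re.DOTALL
    )
    if match is None:
        return False
    body = match.group(1)
    # Accept both quoted and unquoted values.
    return re.search(
        r'^\s*user_friendly_names\s+"?no"?\s*$', body, re.MULTILINE
    ) is not None
```

A result of False covers both a missing overrides section and one that does not set the option, so a caller would treat either case as "not guaranteed".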

> lvm has very useful dumpconfig command:
> 
> # lvm dumpconfig devices/filter
> filter=["a|^/dev/disk/by-id/lvm-pv-uuid-80ovnb-mZIO-J65Y-rl9n-YAY7-h0Q9-
> Aezk8D$|","r|.*|"]
> 
> But I'm not sure how this can work for dynamic configuration/state 
> reported by "multipathd show config local".
> 
> I think it would help if we can get multipath configuration in json
> format, like "multipathd show maps json".

That would be doable. You can open a bugzilla for it, if that's the way you want to go.

Comment 24 Sandro Bonazzola 2021-02-23 13:44:25 UTC
Dropping FieldEngineering, not going to be handled in insight.

Comment 25 Yaniv Kaul 2021-04-18 07:33:38 UTC
(In reply to Sandro Bonazzola from comment #24)
> Dropping FieldEngineering, not going to be handled in insight.

Why not?

Comment 26 Sandro Bonazzola 2021-04-20 15:11:38 UTC
(In reply to Yaniv Kaul from comment #25)
> (In reply to Sandro Bonazzola from comment #24)
> > Dropping FieldEngineering, not going to be handled in insight.
> 
> Why not?

The detection is not covering a business critical imminent disruption and it's basically a pre-upgrade check. As such, it's not accepted within insight and must be handled within the product.

Comment 27 Michal Skrivanek 2021-04-21 14:45:47 UTC
I believe the only feasible thing we can do here is to prevent activation of hosts that use friendly names. That way you notice and can take an action before a corruption happens. It's disruptive, but at least without data loss.

Comment 28 Marina Kalinin 2021-04-21 21:16:56 UTC
Here is the TTT article: https://source.redhat.com/groups/public/t3/technical_topic_torrent_blog/rhv_does_not_support_user_friendly_names_in_its_hosts_multipath_configuration.

However, I still do not understand why there is such resistance to adding a notification to the UI for the user.

Comment 31 Arik 2022-03-16 13:28:25 UTC
(In reply to Michal Skrivanek from comment #27)
> I believe the only feasible thing we can do here is to prevent activation of
> hosts that use friendly names. That way you notice and can take an action
> before a corruption happens. It's disruptive, but at least without data loss.

+1

Comment 33 Shir Fishbain 2022-05-30 19:55:17 UTC
QE doesn't have the capacity to verify during 4.5.1

Comment 36 Shir Fishbain 2022-08-09 14:23:31 UTC
Verified 

After enabling "user_friendly_names" in any section of the configuration and running "vdsm-tool is-configured --module multipath" I got the following warning:

[root@..... ~]# vdsm-tool is-configured --module multipath
Current revision of multipath.conf detected, preserving
WARNING: Invalid configuration: 'user_friendly_names' is enabled in multipath configuration:
  defaults {
    verbosity 2
    polling_interval 5
    max_polling_interval 20
    reassign_maps no
    multipath_dir /lib64/multipath
    path_selector service-time 0
    path_grouping_policy failover
    uid_attribute ID_SERIAL
    prio const
    prio_args 
    features 0
    path_checker tur
    alias_prefix mpath
    failback manual
    rr_min_io 1000
    rr_min_io_rq 1
    max_fds 4096
    rr_weight uniform
    no_path_retry 16
    queue_without_daemon no
    flush_on_last_del yes
    user_friendly_names yes
    fast_io_fail_tmo 5
    dev_loss_tmo 30
    bindings_file /etc/multipath/bindings
    wwids_file /etc/multipath/wwids
    prkeys_file /etc/multipath/prkeys
    log_checker_err always
    all_tg_pt no
    retain_attached_hw_handler yes
    detect_prio yes
    detect_checker yes
    force_sync no
    strict_timing no
    deferred_remove no
    config_dir /etc/multipath/conf.d
    delay_watch_checks no
    delay_wait_checks no
    san_path_err_threshold no
    san_path_err_forget_rate no
    san_path_err_recovery_time no
    marginal_path_err_sample_time no
    marginal_path_err_rate_threshold no
    marginal_path_err_recheck_gap_time no
    marginal_path_double_failed_time no
    find_multipaths off
    uxsock_timeout 4000
    retrigger_tries 3
    retrigger_delay 10
    missing_uev_wait_timeout 30
    skip_kpartx no
    disable_changed_wwids ignored
    remove_retries 0
    ghost_delay no
    find_multipaths_timeout -10
    enable_foreign 
    marginal_pathgroups no
    recheck_wwid no
  }
This configuration is not supported and may lead to storage domain corruption.

Versions:
vdsm-4.50.2.2-1.el8ev.x86_64
ovirt-engine-4.5.2-0.3.el8ev.noarch

Comment 40 errata-xmlrpc 2022-09-08 11:26:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: RHV RHEL Host (ovirt-host) [ovirt-4.5.2] security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6392

