Using multipath user friendly names (UFN) on RHV hosts is not supported. If UFN are used, it may cause LVM corruption on block storage domains - see bz#1553133. Since this configuration is so dangerous, we should do what we can to prevent users from running it in their environment.
User friendly names cannot be used, so there is no need to notify. Vdsm controls this configuration, but a user may mark the multipath configuration as private and override this setting. We need to block this option: if user_friendly_names is enabled, vdsm should fail to start.
I suggest we just restore the warning we used to have in the docs [1]. VDSM may fail on detecting it, but we need to warn users before we enforce it.

[1] https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.0/html/Administration_Guide/Managing_Storage.html#Managing_Storage_Entities
+1 to implementing both options. I can open the docs bug for that, and keep this one to track the software change.
Docs bug: BZ#1793696
(In reply to Nir Soffer from comment #2)
> If user_friendly_names are enabled vdsm
> should fail to start.

Can we just have a knob to turn this off? We do not want Sev1 outages from customers caught off guard after upgrading. Other than that, it's a good way to get all customers onto the supported config. Thanks.
This should be fairly straightforward to implement as an Insights check
(In reply to Germano Veit Michel from comment #6)
> (In reply to Nir Soffer from comment #2)
> > If user_friendly_names are enabled vdsm
> > should fail to start.
>
> Can we just have a knob to turn this off? We do not want Sev1 outages from
> customers caught off guard after upgrading.

I don't want to handle data corruption caused by this unsupported configuration.

Upgrade should fail if a system has user_friendly_names enabled.

We need to add an upgrade pre-check validating the multipath configuration. If the host is using a PRIVATE multipath configuration managed by the user, or a drop-in multipath configuration file enabling this option, the upgrade should fail.

If the user modified the multipath configuration after upgrade and enabled user friendly names, vdsm should fail to start until the bad configuration is removed.

I think both checks can be implemented using "vdsm-tool is-configured".
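The PRIVATE-configuration part of such a pre-check could be sketched like this (a minimal Python illustration, not the actual vdsm-tool code; the exact marker strings vdsm writes into /etc/multipath.conf, such as "VDSM PRIVATE", are assumptions here):

```python
def is_private_multipath_conf(conf_text):
    """Return True if /etc/multipath.conf looks user-managed.

    Assumption: vdsm tags files it owns with a revision header, and
    users opt out by adding a PRIVATE marker near the top of the file.
    """
    # Only the first few lines are inspected; that is where the
    # ownership markers are expected to live.
    head = conf_text.splitlines()[:3]
    return any("PRIVATE" in line for line in head)
```

An upgrade pre-check could refuse to proceed when this returns True and user friendly names are enabled, directing the user to fix the configuration before upgrading.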
(In reply to Nir Soffer from comment #10)
> I don't want to handle data corruption caused by this unsupported
> configuration.
>
> Upgrade should fail if a system has user_friendly_names enabled.
>
> We need to add an upgrade pre-check validating the multipath configuration.
> If the host is using a PRIVATE multipath configuration managed by the
> user, or a drop-in multipath configuration file enabling this, the upgrade
> should fail.
>
> If the user modified the multipath configuration after upgrade and enabled
> user friendly names, vdsm should fail to start until the bad configuration
> is removed.
>
> I think both checks can be implemented using "vdsm-tool is-configured".

Fair enough. I think we also need a KCS on how to move on once the upgrade makes vdsm fail to start.

To keep using the same SD, I'd say we would need to at least:
- Change the SD metadata (MDT_PV=xxx) to the actual WWID
- Update the DB with the same info

Or should the customer export the VMs using a host with the old VDSM and create a new SD from scratch?

This is going to cause some pain. We have an internal lab in BNE with friendly names set for years; it's a good one to test this.
This request is not currently committed to 4.4.z, moving it to 4.5
I think we should start with a warning in the multipath configurator if user friendly names are used.

Testing this change:

1. In /etc/multipath.conf, change:

# Required for having same device names on all hosts.
# DO NOT CHANGE!
user_friendly_names no

to:

user_friendly_names yes

2. Run:

# vdsm-tool is-configured --module multipath

This should show a warning like:

WARNING: invalid configuration in /etc/multipath.conf: user_friendly_names yes
This configuration is not supported and may lead to storage domain corruption.

This warning should be logged to syslog each time vdsm starts.

The next step would be to fail in "vdsm-tool is-configured", which will require changing this configuration to continue using vdsm.
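The check behind that warning could be sketched as a small helper (illustrative Python only, not the actual vdsm configurator code; the function name is made up here, and the warning wording follows the example above):

```python
def check_user_friendly_names(conf_text, path="/etc/multipath.conf"):
    """Return one warning string per line enabling user_friendly_names.

    Comments are stripped first so a commented-out setting does not
    trigger a false positive; quoted and unquoted values are accepted.
    """
    warnings = []
    for line in conf_text.splitlines():
        words = line.split("#", 1)[0].replace('"', "").split()
        if words == ["user_friendly_names", "yes"]:
            warnings.append(
                "WARNING: invalid configuration in %s: "
                "user_friendly_names yes\n"
                "This configuration is not supported and may lead to "
                "storage domain corruption." % path
            )
    return warnings
```

The configurator could print these warnings from "vdsm-tool is-configured" and log the same text to syslog on every vdsm start.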
Ben, what is the best way to check the current value of user_friendly_names?

We can use:

# multipathd show config local | grep user_friendly_names
user_friendly_names "no"

But this may show several values; for example, it may be defined by some devices, like:

device {
    vendor "NETAPP"
    product "LUN"
    path_grouping_policy "group_by_prio"
    features "2 pg_init_retries 50"
    prio "ontap"
    failback "immediate"
    no_path_retry "queue"
    flush_on_last_del "yes"
    dev_loss_tmo "infinity"
    user_friendly_names "no"
}

So it may show the value several times:

# multipathd show config | grep user_friendly_names
user_friendly_names "no"
user_friendly_names "no"

We can check that all configurations are "no", but if we find one configuration that is "yes", we don't have a good way to show which section had it without parsing the configuration.

Should we use:

overrides {
    user_friendly_names "no"
}

to ensure that the value is not overridden by some device section?

If we use this, do we have a way to get only the overrides configuration, assuming that nothing can override it?

lvm has a very useful dumpconfig command:

# lvm dumpconfig devices/filter
filter=["a|^/dev/disk/by-id/lvm-pv-uuid-80ovnb-mZIO-J65Y-rl9n-YAY7-h0Q9-Aezk8D$|","r|.*|"]

But I'm not sure how this can work for the dynamic configuration/state reported by "multipathd show config local".

I think it would help if we could get the multipath configuration in JSON format, like "multipathd show maps json".
(In reply to Nir Soffer from comment #18)
> Ben, what is the best way to check the current value of
> user_friendly_names?

Right now, there's really no better way than using

# multipathd show config local

and parsing the result.

> Should we use:
>
> overrides {
>     user_friendly_names "no"
> }
>
> To ensure that the value is not overridden by some device
> section?

That would guarantee that nothing in the defaults or devices sections could set user_friendly_names to "yes". I'm not sure how big of a deal this is, but it would still be possible to set user_friendly_names "yes" in the multipaths section, since the option is allowed there as well.

> If we use this, do we have a way to get only the overrides
> configuration, assuming that nothing can override it?

No. You would have to get the local config, then search for "^overrides {" and make sure that user_friendly_names "no" is between that and the next line with "^}". To be safe you would also need to search the multipaths section.
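The manual search described above can be sketched as follows (a Python illustration under the stated assumptions about "multipathd show config local" output: the overrides section starts at a line beginning with "overrides {" and ends at the next line beginning with "}"):

```python
def overrides_user_friendly_names(config_text):
    """Return the user_friendly_names value from the overrides section
    of 'multipathd show config local' output, or None if the section
    or the option is absent."""
    in_overrides = False
    for line in config_text.splitlines():
        if line.startswith("overrides {"):
            in_overrides = True
        elif in_overrides:
            if line.startswith("}"):
                break  # end of the overrides section
            # Accept both quoted and unquoted values.
            words = line.replace('"', "").split()
            if words[:1] == ["user_friendly_names"] and len(words) == 2:
                return words[1]
    return None
```

As noted above, this still leaves the multipaths section unchecked, so a complete validation would need a second pass over that section as well.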
> lvm has a very useful dumpconfig command:
>
> # lvm dumpconfig devices/filter
> filter=["a|^/dev/disk/by-id/lvm-pv-uuid-80ovnb-mZIO-J65Y-rl9n-YAY7-h0Q9-
> Aezk8D$|","r|.*|"]
>
> But I'm not sure how this can work for the dynamic configuration/state
> reported by "multipathd show config local".
>
> I think it would help if we could get the multipath configuration in JSON
> format, like "multipathd show maps json".

That would be doable. You can open a bugzilla for it, if that's the way you want to go.
Dropping FieldEngineering, not going to be handled in Insights.
(In reply to Sandro Bonazzola from comment #24)
> Dropping FieldEngineering, not going to be handled in Insights.

Why not?
(In reply to Yaniv Kaul from comment #25)
> (In reply to Sandro Bonazzola from comment #24)
> > Dropping FieldEngineering, not going to be handled in Insights.
>
> Why not?

The detection does not cover a business-critical imminent disruption and it's basically a pre-upgrade check. As such, it's not accepted within Insights and must be handled within the product.
I believe the only feasible thing we can do here is to prevent activation of hosts that use friendly names. That way you notice and can take an action before a corruption happens. It's disruptive, but at least without data loss.
Here is the TTT article: https://source.redhat.com/groups/public/t3/technical_topic_torrent_blog/rhv_does_not_support_user_friendly_names_in_its_hosts_multipath_configuration. However, I still do not understand why there is such resistance to adding a notification to the UI for the user.
(In reply to Michal Skrivanek from comment #27)
> I believe the only feasible thing we can do here is to prevent activation of
> hosts that use friendly names. That way you notice and can take an action
> before a corruption happens. It's disruptive, but at least without data loss.

+1
QE doesn't have the capacity to verify during 4.5.1
Verified.

After enabling "user_friendly_names" in any section of the configuration and running "vdsm-tool is-configured --module multipath" I got the following warning:

[root@..... ~]# vdsm-tool is-configured --module multipath
Current revision of multipath.conf detected, preserving
WARNING: Invalid configuration: 'user_friendly_names' is enabled in multipath configuration:

defaults {
    verbosity 2
    polling_interval 5
    max_polling_interval 20
    reassign_maps no
    multipath_dir /lib64/multipath
    path_selector service-time 0
    path_grouping_policy failover
    uid_attribute ID_SERIAL
    prio const
    prio_args
    features 0
    path_checker tur
    alias_prefix mpath
    failback manual
    rr_min_io 1000
    rr_min_io_rq 1
    max_fds 4096
    rr_weight uniform
    no_path_retry 16
    queue_without_daemon no
    flush_on_last_del yes
    user_friendly_names yes
    fast_io_fail_tmo 5
    dev_loss_tmo 30
    bindings_file /etc/multipath/bindings
    wwids_file /etc/multipath/wwids
    prkeys_file /etc/multipath/prkeys
    log_checker_err always
    all_tg_pt no
    retain_attached_hw_handler yes
    detect_prio yes
    detect_checker yes
    force_sync no
    strict_timing no
    deferred_remove no
    config_dir /etc/multipath/conf.d
    delay_watch_checks no
    delay_wait_checks no
    san_path_err_threshold no
    san_path_err_forget_rate no
    san_path_err_recovery_time no
    marginal_path_err_sample_time no
    marginal_path_err_rate_threshold no
    marginal_path_err_recheck_gap_time no
    marginal_path_double_failed_time no
    find_multipaths off
    uxsock_timeout 4000
    retrigger_tries 3
    retrigger_delay 10
    missing_uev_wait_timeout 30
    skip_kpartx no
    disable_changed_wwids ignored
    remove_retries 0
    ghost_delay no
    find_multipaths_timeout -10
    enable_foreign
    marginal_pathgroups no
    recheck_wwid no
}

This configuration is not supported and may lead to storage domain corruption.

Versions:
vdsm-4.50.2.2-1.el8ev.x86_64
ovirt-engine-4.5.2-0.3.el8ev.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: RHV RHEL Host (ovirt-host) [ovirt-4.5.2] security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:6392