Bug 1720578

Summary: [OpenStack] NIC naming validation can produce false positives and block a valid upgrade
Product: Red Hat Enterprise Linux 7 Reporter: Jiri Stransky <jstransk>
Component: leapp-repositoryAssignee: Leapp team <leapp-notifications>
Status: CLOSED CANTFIX QA Contact: Alois Mahdal <amahdal>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.6CC: mbocek, msekleta, pstodulk
Target Milestone: rcKeywords: Upgrades
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-06-21 08:26:35 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1708241, 1727807    

Description Jiri Stransky 2019-06-14 09:26:09 UTC
Description of problem:

The PersistentNetNamesDisable actor validates NIC names and blocks the upgrade if the machine has more than one "eth<number>" NICs under the assumption that the names will change after upgrade. While this is probably the usual case, it can also happen that the "eth<number>" names are set up in a way that the naming would be stable after the upgrade. In this scenario the upgrade is safe/valid, but Leapp will still block it.

It is probably unrealistic to produce 100% correct heuristic whether the NIC names will change or not as there are multiple ways to stabilize them (ifcfg files with HWADDR entry, udev rules to name the ones i know about :) ). Therefore it would probably be best to keep the actor enabled by default, but provide a way to disable it if the user is certain they have made their NIC naming stable.

Version-Release number of selected component (if applicable):

leapp-repository-0.7.0-5.el7_6.noarch

https://github.com/oamg/leapp-repository/blob/d8387b491f4d10536ec2927af2825ef4da1187cf/repos/system_upgrade/el7toel8/actors/persistentnetnamesdisable/actor.py

How reproducible: always

Steps to Reproduce:

* Have a machine with multiple eth<number> NICs.

* Make the naming stable e.g. via HWADDR entries in ifcfg files.

* `leapp upgrade` refuses to do the upgrade.

* (After manually editing the PersitentNetNamesDisable actor to mark it as experimental, the upgrade proceeds without issues, and machine boots up with the same NIC names as before the upgrade.)

Expected results:

If i'm certain my NIC names will be the same even after the upgrade, i'd like to be able to disable the PersistentNetNamesDisable actor.

Comment 2 Michal Bocek 2019-06-19 12:31:25 UTC
Michal Sekletar can comment on the feasibility of improving the actor by detecting "stable" multiple-ethX-NIC configurations.

In regards to disabling actors, we may add a generic option to leapp to disable any actor, but this option would always be considered at-your-own-risk option causing the upgrade to be unsupported should any issue related to the disabled actor arise.

Comment 3 Petr Stodulka 2019-06-19 12:44:51 UTC
> In regards to disabling actors, we may add a generic option to leapp to disable any actor, but this option would always be considered at-your-own-risk option causing the upgrade to be unsupported should any issue related to the disabled actor arise.

I really do not like the idea. Maybe we will implement that, but I hope that not. The solution of the problem in this case should be a user question. So the user would specify that issues connected to the NIC names are "False positives" and upgrade should be processed anyway (on their own risk). - Unless there is a bug in actor. Currently it is not possible to ask user for cooperation now. So probably this issues will be postponed until the user questions will be implemented.

Anyway, @msekleta, what do you think about the issue?

Comment 4 Jiri Stransky 2019-06-20 09:46:06 UTC
Our use of Leapp is not interactive, so users will not have the opportunity to answer questions on CLI, let's please also include a non-interactive solution (e.g. CLI parameter or environment variable or generic way to disable an actor).

Regarding improvement of the heuristic, i think one way is to look at `ifcfg` files for the eth interfaces and check if there is HWADDR entry, which will stabilize the name even on RHEL 8. That would reduce the false positives but it does not remove them completely, e.g. another way to stabilize names is via custom udev rules (OpenStack project os-net-config utilizes them [1]), but those could be more difficult to parse as they are much more free-form than ifcfg files. We should probably have a manual override of the actor logic anyway, to ensure users don't get blocked from upgrading.

[1] https://github.com/openstack/os-net-config/blob/22aeee1b87731618356f2792d49795a01f40ec19/os_net_config/sriov_config.py#L194-L208

Comment 5 Jiri Stransky 2019-06-20 09:51:24 UTC
E.g. on my machine right now i have:

[heat-admin@controller-0 ~]$ cat /etc/udev/rules.d/70-persistent-net.rules 
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="52:54:00:0c:3c:41", NAME="eth0"

which stabilizes the name for eth0. But this solution is quite specific to OpenStack so i wouldn't necessarily expect Leapp to look for this. I think we'd be fine with disabling/overriding the actor.

Comment 6 Petr Stodulka 2019-06-20 09:55:23 UTC
@Jiri: We are not talking so much about interactive upgrade as about something like possibility to specify answers using asnwerfile - not telling that answerfile will be the way, but imagine that we will produce something which user would be able to modify and use it for 2nd run.

I will wait for asnwer from Michal. AFAIK ifcfg files are deprecated and are not 100% safe. But Michal knows more about that.

Comment 7 Michal Sekletar 2019-06-21 08:11:33 UTC
Jiri, what makes you believe that ifcfg with NAME+HWADDR where NAME=ethX makes naming stable? It is just a pure luck that you didn't run into problems yet. Imagine system with 2 virtio-net NICs.

It is very likely that two virtio NICs might show up in parallel and things then just blow up (in second column are names configured in ifcfg files).

eth0 (virtio) -> eth1 (eth1 name is already taken, rename fails) 
eth1 (virtio) -> eth0 (eth0 name is already taken, rename fails)

Sure NICs have ethX names but in a swapped order.

What I described above is the situation from RHEL-7. In RHEL-6 udev had couple more tricks up its sleeve to prevent this (temporary rename to rename_X name to free up the name), however, I can make the situation bit more complex by throwing one more NIC into mix (different driver) and that scheme will fall flat on its face (as it did many times in RHEL-6). I mean, there is a reason why we ditched it in RHEL-7.

Btw, NAME+HWADDR works perfectly well, *IF*, you use different prefix for your NICs (e.g. net, lan).

Comment 8 Petr Stodulka 2019-06-21 08:26:35 UTC
Closing as cantfix. Jiri, if you think that this is not right, discuss it with Michal first before you reopen the bug. Without working safe solution around ethX (includes Michal's approval the solution is correct), there will not be plan to do any changes around. Only thing that still would be implemented in future, is possibility of user-question, that will allow user to shoot themselves in leg - but from this point, I don't want to any such solution like that - as we have bunch of other precedences, why we should not do that.

Comment 9 Jiri Stransky 2019-06-21 10:57:21 UTC
Thanks for the responses, i didn't know about the renaming collisions issue which could cause the naming to fail. Given that this is not solvable safely and we know the reasons why now, i agree with CLOSED/CANTFIX.