Bug 1720578
| Summary: | [OpenStack] NIC naming validation can produce false positives and block a valid upgrade | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Jiri Stransky <jstransk> |
| Component: | leapp-repository | Assignee: | Leapp team <leapp-notifications> |
| Status: | CLOSED CANTFIX | QA Contact: | Alois Mahdal <amahdal> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 7.6 | CC: | mbocek, msekleta, pstodulk |
| Target Milestone: | rc | Keywords: | Upgrades |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-06-21 08:26:35 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1708241, 1727807 | ||
|
Description
Jiri Stransky
2019-06-14 09:26:09 UTC
Michal Sekletar can comment on the feasibility of improving the actor by detecting "stable" multiple-ethX-NIC configurations. In regards to disabling actors, we may add a generic option to leapp to disable any actor, but this option would always be considered at-your-own-risk option causing the upgrade to be unsupported should any issue related to the disabled actor arise. > In regards to disabling actors, we may add a generic option to leapp to disable any actor, but this option would always be considered at-your-own-risk option causing the upgrade to be unsupported should any issue related to the disabled actor arise.
I really do not like the idea. Maybe we will implement that, but I hope that not. The solution of the problem in this case should be a user question. So the user would specify that issues connected to the NIC names are "False positives" and upgrade should be processed anyway (on their own risk). - Unless there is a bug in actor. Currently it is not possible to ask user for cooperation now. So probably this issues will be postponed until the user questions will be implemented.
Anyway, @msekleta, what do you think about the issue?
Our use of Leapp is not interactive, so users will not have the opportunity to answer questions on CLI, let's please also include a non-interactive solution (e.g. CLI parameter or environment variable or generic way to disable an actor). Regarding improvement of the heuristic, i think one way is to look at `ifcfg` files for the eth interfaces and check if there is HWADDR entry, which will stabilize the name even on RHEL 8. That would reduce the false positives but it does not remove them completely, e.g. another way to stabilize names is via custom udev rules (OpenStack project os-net-config utilizes them [1]), but those could be more difficult to parse as they are much more free-form than ifcfg files. We should probably have a manual override of the actor logic anyway, to ensure users don't get blocked from upgrading. [1] https://github.com/openstack/os-net-config/blob/22aeee1b87731618356f2792d49795a01f40ec19/os_net_config/sriov_config.py#L194-L208 E.g. on my machine right now i have:
[heat-admin@controller-0 ~]$ cat /etc/udev/rules.d/70-persistent-net.rules
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="52:54:00:0c:3c:41", NAME="eth0"
which stabilizes the name for eth0. But this solution is quite specific to OpenStack so i wouldn't necessarily expect Leapp to look for this. I think we'd be fine with disabling/overriding the actor.
@Jiri: We are not talking so much about interactive upgrade as about something like possibility to specify answers using asnwerfile - not telling that answerfile will be the way, but imagine that we will produce something which user would be able to modify and use it for 2nd run. I will wait for asnwer from Michal. AFAIK ifcfg files are deprecated and are not 100% safe. But Michal knows more about that. Jiri, what makes you believe that ifcfg with NAME+HWADDR where NAME=ethX makes naming stable? It is just a pure luck that you didn't run into problems yet. Imagine system with 2 virtio-net NICs. It is very likely that two virtio NICs might show up in parallel and things then just blow up (in second column are names configured in ifcfg files). eth0 (virtio) -> eth1 (eth1 name is already taken, rename fails) eth1 (virtio) -> eth0 (eth0 name is already taken, rename fails) Sure NICs have ethX names but in a swapped order. What I described above is the situation from RHEL-7. In RHEL-6 udev had couple more tricks up its sleeve to prevent this (temporary rename to rename_X name to free up the name), however, I can make the situation bit more complex by throwing one more NIC into mix (different driver) and that scheme will fall flat on its face (as it did many times in RHEL-6). I mean, there is a reason why we ditched it in RHEL-7. Btw, NAME+HWADDR works perfectly well, *IF*, you use different prefix for your NICs (e.g. net, lan). Closing as cantfix. Jiri, if you think that this is not right, discuss it with Michal first before you reopen the bug. Without working safe solution around ethX (includes Michal's approval the solution is correct), there will not be plan to do any changes around. Only thing that still would be implemented in future, is possibility of user-question, that will allow user to shoot themselves in leg - but from this point, I don't want to any such solution like that - as we have bunch of other precedences, why we should not do that. Thanks for the responses, i didn't know about the renaming collisions issue which could cause the naming to fail. Given that this is not solvable safely and we know the reasons why now, i agree with CLOSED/CANTFIX. |