Bug 1812559
| Summary: | Need better error/exception for MTU apply failure | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Yossi Segev <ysegev> |
| Component: | nmstate | Assignee: | Gris Ge <fge> |
| Status: | CLOSED DUPLICATE | QA Contact: | Mingyu Shi <mshi> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 8.2 | CC: | acardace, atragler, bgalvani, danken, ellorent, ferferna, fge, jiji, jishi, lrintel, myakove, network-qe, phoracek, rkhan, sukulkar, thaller, till |
| Target Milestone: | rc | Keywords: | Reopened, Triaged |
| Target Release: | 8.0 | Flags: | pm-rhel: mirror+ |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | nmstate-1.2.1-0.1.alpha1.el8 | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-02-14 08:10:53 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1876539 | | |
Description
Yossi Segev 2020-03-11 15:17:50 UTC

Created attachment 1669337 [details]: NNCE output
Created attachment 1669338 [details]: journalctl output
Hi Yossi,

In the `nnce.out`, nmstate reports the detailed error as:

```
libnmstate.error.NmstateVerificationError:
desired
=======
---
name: ens7
type: ethernet
state: up
ipv4:
  address: []
  auto-dns: true
  auto-gateway: true
  auto-routes: true
  dhcp: true
  enabled: true
ipv6:
  enabled: false
mac-address: FA:16:3E:9D:E8:A3
mtu: 2000

current
=======
---
name: ens7
type: ethernet
state: up
ipv4:
  address: []
  auto-dns: true
  auto-gateway: true
  auto-routes: true
  dhcp: true
  enabled: true
ipv6:
  enabled: false
mac-address: FA:16:3E:9D:E8:A3
mtu: 1450

difference
==========
--- desired
+++ current
@@ -12,4 +12,4 @@
 ipv6:
   enabled: false
 mac-address: FA:16:3E:9D:E8:A3
-mtu: 2000
+mtu: 1450
```

This means nmstate tried to apply MTU 2000 but found 1450 after applying, hence the rollback.

The diff in the NNS is good, but it's not enough. If the NNCP state is "FailedToConfigure", then it necessarily means that an error occurred, so an ERROR line should appear in the NNCE. NetworkManager publishes this error via journalctl, in this line, which I also added to the bug description:

Mar 11 12:52:36 host-172-16-0-33 NetworkManager[1482]: <debug> [1583931156.5406] platform-linux: sysctl: failed to set '/proc/sys/net/ipv6/conf/ens7/mtu' to '2000': (22) Invalid argument

So why not forward this line - as an ERROR message - to the NNCE/NNCP? It would enable much easier and more intuitive debugging for the user.

Hi Thomas,

When NetworkManager fails to set the MTU, it still indicates that the activation finished. Is it possible for NetworkManager to fail the activation with the error message it writes to the log:

sysctl: failed to set '/proc/sys/net/ipv6/conf/ens7/mtu' to '2000': (22) Invalid argument

Thank you.

That doesn't seem so easy. For one, when you try to configure the MTU of an interface, the kernel requires that the underlying interface's MTU is large enough. That means, for example, that the MTU of a VLAN must not be larger than the MTU of the ethernet below it, or that the MTU of an SR-IOV VF might need to be no larger than the MTU of the PF. Understanding the logic of how the kernel rejects and enforces MTU sizes is not trivial, so NetworkManager doesn't even try.

Also, the MTU of a device gets reconfigured when the MTU of the underlying device changes. That means there are cases where the MTU of the interface cannot be configured until some time later, when the parent device is ready. Coordinating that (to consistently fail) is non-trivial.

Also, various link settings don't lead to a failure of the activation. E.g. if there is a failure to set autoneg/speed/duplex, the activation just proceeds; it doesn't fail. For one, that is again because it's hard to understand why the kernel fails to comply and how to properly handle that. Second, it's not clear that every such condition constitutes a hard failure.

So, maybe it's possible, but it doesn't seem easy. And is it really useful? Why? If you merely want to detect that the MTU was not in fact correctly set, then we could instead expose that on D-Bus (or you could check yourself).

Hi Thomas,

Hiding errors is not good API practice. If you think some failures should not block/fail the activation, please report those errors in another way, e.g. through properties/methods of `NM.ActiveConnection`. Showing a warning message in journal/syslog and treating it as a pass is not OK for me in this case, and it is very hard for a normal user to know what failed.

Is there any progress? Are there plans to tackle this issue?

Hi Thomas,

Is it still possible to request that NetworkManager fail the activation on an MTU apply failure?
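A minimal sketch of the "check yourself" approach Thomas mentions above, outside of NetworkManager and nmstate. The helper name is illustrative, and the interface name and MTU value are taken from this report:

```python
#!/usr/bin/env python3
# Sketch only: detect an MTU that the kernel refused to apply by reading the
# effective value back from sysfs after activation.
from pathlib import Path


def effective_mtu(ifname: str) -> int:
    """Return the MTU the kernel actually has on the interface."""
    return int(Path(f"/sys/class/net/{ifname}/mtu").read_text())


IFNAME = "ens7"        # interface from this report (an assumption on any other host)
DESIRED_MTU = 2000     # value requested in the failing policy

actual = effective_mtu(IFNAME)
if actual != DESIRED_MTU:
    # This is the same mismatch nmstate's verification step detects (2000 vs. 1450).
    print(f"MTU mismatch on {IFNAME}: requested {DESIRED_MTU}, kernel kept {actual}")
```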
(In reply to Gris Ge from comment #9)
> Hi Thomas,
>
> Is it still possible to request NetworkManager fail the activation on MTU
> apply failure?

You mean for 8.3? No, given the schedule, that is almost impossible (it would require a very strong effort).

Besides, the biggest problem is the change in behavior here (of starting to fail). Handling that without breaking existing setups is what makes it harder.

In general, there are plans to tackle this issue (otherwise, we would have closed the bug).

(In reply to Thomas Haller from comment #10)
> (In reply to Gris Ge from comment #9)
> > Hi Thomas,
> >
> > Is it still possible to request NetworkManager fail the activation on MTU
> > apply failure?
>
> You mean for 8.3? No, given the schedule, that is almost impossible (it would
> require a very strong effort).

RHEL 8.4 is OK for me.

> Besides, the biggest problem is the change in behavior here (of starting to
> fail). Handling that without breaking existing setups is what makes it harder.
>
> In general, there are plans to tackle this issue (otherwise, we would have
> closed the bug).

Do you need me to create an RFE bug against NetworkManager for this request? Thank you!

(In reply to Gris Ge from comment #11)
> Do you need me to create an RFE bug against NetworkManager for this request?

No. I think this bz suffices.

(In reply to Thomas Haller from comment #12)
> (In reply to Gris Ge from comment #11)
> > Do you need me to create an RFE bug against NetworkManager for this request?
>
> No. I think this bz suffices.

Should I change the component to NetworkManager?

Created bug 1876539 for NetworkManager to improve error handling on MTU apply failure.

Hi Yossi Segev and Petr Horáček,

Currently, nmstate fails with `libnmstate.error.NmstateVerificationError` with the MTU difference in the output. Are you expecting nmstate to raise a specific error like `libnmstate.error.NmstateMtuApplyError`, or just an error/warning log line in the log context of nmstate? Thank you!

As a start, an actual ERROR report is better than the current state, where there is only a DEBUG report without any indication about an invalid MTU. If an MTU failure will now result in an actual ERROR-labeled message, with an indication that the origin of the failure is the MTU, then we should be fine.

@Gris - can you please add an example of both the NNCE output and the nmstate-handler log upon this failure? It would help me understand whether it satisfies the expectations I had when I submitted this bug.

Hi Yossi,

Currently, NetworkManager only generates a line in journald about the invalid-MTU error; nmstate cannot receive any indication of the source of the failure. Nmstate can only verify whether the user got what they asked for and state the difference as the root cause. Without NetworkManager buy-in, the only thing nmstate can do is raise a dedicated exception when NmstateVerificationError happens and check whether the MTU is the only root cause of the verification failure.

It might take me a week or so to learn this NNCE stuff (I assume it is from kubernetes-nmstate). Will provide the example later.

Gris, can we help you with the kubernetes-nmstate part? Although, I believe this could be reproducible with nmstatectl alone.

(In reply to Petr Horáček from comment #22)
> Gris, can we help you with the kubernetes-nmstate part? Although, I believe
> this could be reproducible with nmstatectl alone.

Yes, please. Could you check whether NNCP/NNCE/NNS contains NmstateVerificationError with an mtu difference? To reproduce the problem, simply set the MTU to a very big number, as sketched below.
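A minimal reproduction sketch using the libnmstate Python API (the library nmstatectl drives), per the suggestion above to request an MTU the device cannot carry. The interface name and MTU value are taken from this report, applying a state requires root, and the error handling is illustrative:

```python
#!/usr/bin/env python3
# Sketch only: apply a desired state whose MTU the kernel/underlay rejects and
# observe nmstate roll back and raise NmstateVerificationError.
import libnmstate
from libnmstate.error import NmstateVerificationError

desired_state = {
    "interfaces": [
        {
            "name": "ens7",   # assumed interface name (from this report)
            "type": "ethernet",
            "state": "up",
            "mtu": 2000,      # larger than the underlying network allows here
        }
    ]
}

try:
    libnmstate.apply(desired_state)
except NmstateVerificationError as err:
    # nmstate verifies the applied state, sees the kernel kept MTU 1450,
    # rolls back, and raises with the desired/current diff as the message.
    print(f"Verification failed and the state was rolled back:\n{err}")
```

On the reported host the underlying network only carries MTU 1450, which is why the requested 2000 is rejected and verification fails.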
From the NNCE attached to this BZ:

```
libnmstate.error.NmstateVerificationError:
desired
=======
---
name: ens7
type: ethernet
state: up
ipv4:
  address: []
  auto-dns: true
  auto-gateway: true
  auto-routes: true
  dhcp: true
  enabled: true
ipv6:
  enabled: false
mac-address: FA:16:3E:9D:E8:A3
mtu: 2000

current
=======
---
name: ens7
type: ethernet
state: up
ipv4:
  address: []
  auto-dns: true
  auto-gateway: true
  auto-routes: true
  dhcp: true
  enabled: true
ipv6:
  enabled: false
mac-address: FA:16:3E:9D:E8:A3
mtu: 1450

difference
==========
--- desired
+++ current
@@ -12,4 +12,4 @@
 ipv6:
   enabled: false
 mac-address: FA:16:3E:9D:E8:A3
-mtu: 2000
+mtu: 1450
```

So there is an NmstateVerificationError, but it doesn't specify that the error is due to the invalid MTU - it just compares the desired state to the current state.

Hi Yossi,

The NmstateVerificationError has identified the cause of the failure: the MTU does not match the desired state. What is your preferred way of reporting this error? I am not sure whether the kernel's dmesg can be redirected to netlink or not. But yes, nmstate/NetworkManager should do better at showing the actual error message instead of a bare `NmstateVerificationError`. I don't know how to do that yet; let me investigate a little.

If you are asking for the error message format to change so that it includes only the difference, without the context, I can do that in RHEL 8.5. Is the error message change enough for you?

> If you are asking for the error message format to change so that it includes
> only the difference, without the context, I can do that in RHEL 8.5.
> Is the error message change enough for you?

For lack of a better option, this is a compromise I can live with.
But I would really prefer a clear and explicit ERROR message, e.g.:
12:52:37,337 root ERROR Unsupported MTU 2000 requested.
I believe that if an NmstateVerificationError exists, then it should and can be "transformed" into a relevant ERROR-level message.
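A minimal sketch of such a transformation on the caller's side, assuming the caller drives libnmstate directly. The function and logger names are illustrative and not part of any existing nmstate or kubernetes-nmstate API; the code simply inspects the exception text for the mtu key:

```python
import logging

import libnmstate
from libnmstate.error import NmstateVerificationError

log = logging.getLogger("nmstate-handler")  # assumed logger name


def apply_or_log_mtu_error(desired_state):
    """Apply a desired state; emit an ERROR-level line when verification
    fails and the desired/current diff involves the mtu field."""
    try:
        libnmstate.apply(desired_state)
    except NmstateVerificationError as err:
        if "mtu:" in str(err):
            # Heuristic only: the verification diff names the mtu key, so
            # surface it at ERROR level instead of leaving only a DEBUG diff.
            log.error("Requested MTU could not be applied; the kernel kept a "
                      "different value. Verification diff:\n%s", err)
        raise
```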
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Reopening to continue the work. The action plan for this bug is to request that NetworkManager raise the log priority of the MTU failure from trace to warning.

Hi Yossi,

NetworkManager only reports this in its logs, not to nmstate through its API, hence nmstate cannot help identify why the desired-state verification failed. With NetworkManager-1.32.10-2.el8.x86_64, the MTU failure is shown as a warn message instead of a debug message, for example:

Oct 12 14:04:52 el8 NetworkManager[951]: <warn> [1634018692.1414] platform-linux: do-change-link[3]: failure changing link: failure 22 (Invalid argument - mtu greater than device maximum)

This could help you debug this issue in the future. Could you try it on your system and see whether it meets your expectations? Thank you!

Hi Gris,

Our product currently uses NetworkManager v1.30.0-10 (on our OpenShift 4.9 clusters, running RHEL 8.4 nodes), so I can't reproduce the issue and test whether the warning solution you suggested is sufficient. Can you tell me when NetworkManager v1.32.10 is going to be available, i.e. on which OpenShift/RHEL versions it is expected to be used? Thank you very much!
Yossi

Hi Petr,

I have created https://bugzilla.redhat.com/show_bug.cgi?id=2044150 for tracking that effort. Please check whether my proposed solution works or not. This bug will focus on getting the MTU error shown by NetworkManager at a proper level (not trace/debug). Thanks!

If the NM messages you specified appear in an nmstate entity - most importantly in an NNCE, but they can also appear in NNS and NNCP - then that would meet my expectation. Otherwise, if these messages only appear in journalctl, then I am afraid it doesn't change the current state, where one must drill through journalctl in order to find these NM messages, instead of viewing them in nmstate output.

Hi Yossi,

Thanks for the feedback! I will try my ideas to see whether they work or not.

(In reply to Gris Ge from comment #55)
> Hi Yossi,
>
> Thanks for the feedback! I will try my ideas to see whether they work or not.

Hi Gris,

As I see you've opened https://bugzilla.redhat.com/show_bug.cgi?id=2044150 to deliver the solution, shall we verify the current one or wait?

*** This bug has been marked as a duplicate of bug 2044150 ***