Bug 1564590
Summary: Cannot add host to clean install of Ovirt (when ovirtmgmt interface has MTU of 9000)

| Field | Value | Field | Value |
|---|---|---|---|
| Product | [oVirt] ovirt-engine | Reporter | james.mclaren.open |
| Component | BLL.Network | Assignee | Martin Perina <mperina> |
| Status | CLOSED INSUFFICIENT_DATA | QA Contact | Meni Yakove <myakove> |
| Severity | low | Priority | unspecified |
| Version | 4.2.2.3 | CC | bugs, james.mclaren.open, mburman, mperina, myakove |
| Target Milestone | --- | Target Release | --- |
| Hardware | x86_64 | OS | Linux |
| Last Closed | 2018-08-06 11:58:28 UTC | Type | Bug |
| Regression | --- | Mount Type | --- |
| Documentation | --- | Story Points | --- |
| Category | --- | oVirt Team | Infra |
| Cloudforms Team | --- | | |
Description
james.mclaren.open
2018-04-06 17:23:53 UTC
Could you please provide engine logs?

Hello Martin,

The engine.log file is in the original attachment: https://bugzilla.redhat.com/attachment.cgi?id=1418254

Are you sure ovirt-host1.localdomain is really DNS-resolvable by the Engine?

(In reply to james.mclaren.open from comment #2)

The logs are not complete. I can see the 1st installation failure on 2018-04-05 20:22:49, for which we don't have logs and which was strangely interrupted by a "No route to host" exception. Then I can see the 2nd attempt at 2018-04-05 20:37:45, which again failed quite soon due to a "No route to host" exception. The 3rd one started at 2018-04-05 20:58:15,574+01 and failed due to an SSH timeout. The 4th one started at 2018-04-06 11:26:22,926+01 and again failed due to an SSH timeout error. And there are a bunch of others...

So what's your current status? Is it possible to remove the host from the engine, reinstall the OS on it from scratch, try to add it to the engine again and, if that fails, attach complete SOS reports from both the engine and the host?

Were you able to resolve the issue with adding a completely clean host? If not, could you please provide the logs requested in comment 4?

After hours of testing I have found what triggers the problem. If the NIC eventually used for the ovirtmgmt network (p2p1 in this case) has a high MTU (e.g. 9000), the host installation fails. If the MTU is left at the default value (1500), the host installation succeeds.

If you install with a 1500 MTU and later try to increase the MTU of the ovirtmgmt network to 9000 in the Engine web GUI, it fails again.

So the workaround is to leave the NIC that will become the ovirtmgmt network at the default MTU, i.e. 1500.

(In reply to james.mclaren.open from comment #6)

I must admit that I've seen that happening as well. Let me try to reproduce it (in OST).

(In reply to Yaniv Kaul from comment #7)

Works fine for me (in OST: I tested setting ovirtmgmt and the relevant interfaces to 9000, and the host installed fine and kept the value).

James, is it still reproducible in your environment? If so, could you please provide complete logs from the engine and the installed host (using the sos logcollector tool) so we can investigate?

The installation is now in use (with MTU 1500 on the management NIC and 9000 on the other 3 NICs), so I can't generate new logs at the moment. I'm not sure why the logs are regarded as incomplete. The engine.log covers about 10 attempts to get the host installed. The last attempt is reflected in engine.log from approx. 2018-04-06 17:29:58,352+01 onwards, which corresponds to the failed installation log left on the host: ovirt-host-deploy-ansible-20180406174155-ovirt-host1.localdomain-4315862e.log

It was completely reproducible over 20+ install attempts with minor networking tweaks trying to work around the issue.

Meni, could you please try to reproduce this issue?

(In reply to Martin Perina from comment #11)

Meni, any progress with reproducing this issue?

Closing as insufficient data. Feel free to reopen and provide the requested information.

The needinfo request(s) on this closed bug have been removed as they have been unresolved for 1000 days.
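The reporter traced the failure to a 9000-byte MTU on the NIC that becomes the management network, while the OST run with 9000 everywhere installed fine. One way to check whether a jumbo MTU actually survives the path between a host and the engine is a don't-fragment ping sized to that MTU: 9000 - 20 (IPv4 header) - 8 (ICMP header) = 8972 bytes of payload, or 1472 for the default 1500. The Python sketch below is not part of this bug or of oVirt; it assumes a Linux host with sysfs and the iputils `ping`, and the helper names and the script name `mtu_check.py` are hypothetical. The thread never established a root cause, so a failing ping would only point to one possible explanation (something on the path not passing jumbo frames), not confirm it.

```python
import os
import sys

# Header sizes for a plain IPv4 packet (no IP options) carrying an ICMP echo.
IPV4_HEADER_BYTES = 20
ICMP_HEADER_BYTES = 8


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    overhead = IPV4_HEADER_BYTES + ICMP_HEADER_BYTES
    if mtu <= overhead:
        raise ValueError(f"MTU {mtu} leaves no room for an ICMP payload")
    return mtu - overhead


def read_mtu(iface: str, sysfs_root: str = "/sys/class/net") -> int:
    """Read the configured MTU of a Linux interface from sysfs."""
    with open(os.path.join(sysfs_root, iface, "mtu")) as f:
        return int(f.read().strip())


def df_ping_command(target: str, mtu: int, count: int = 3) -> list[str]:
    """Build an iputils ping command with fragmentation prohibited (-M do)."""
    return ["ping", "-M", "do", "-c", str(count),
            "-s", str(icmp_payload_for_mtu(mtu)), target]


# Only runs when invoked with exactly two arguments: <iface> <peer>.
if __name__ == "__main__" and len(sys.argv) == 3:
    iface, peer = sys.argv[1], sys.argv[2]
    mtu = read_mtu(iface)
    print(f"{iface}: configured MTU {mtu}")
    print("Check the path with:", " ".join(df_ping_command(peer, mtu)))
```

For example, `python mtu_check.py p2p1 <engine address>` on the host prints a command such as `ping -M do -c 3 -s 8972 <engine address>` when p2p1 is set to 9000. If that ping fails while the same command with `-s 1472` succeeds, the host and engine disagree on what the path can carry.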