Bug 2084186
| Field | Value |
|---|---|
| Summary | [RFE] Improve the management for multiple network adapters in oVirt |
| Product | [oVirt] ovirt-engine |
| Component | BLL.Network |
| Version | 4.4.10.7 |
| Status | CLOSED WORKSFORME |
| Severity | medium |
| Priority | unspecified |
| Reporter | Yury.Panchenko |
| Assignee | eraviv |
| QA Contact | Michael Burman <mburman> |
| CC | amusil, bugs, michal.skrivanek, mperina, Yury.Panchenko |
| Keywords | FutureFeature |
| Type | Bug |
| oVirt Team | Network |
| Target Milestone | --- |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | If docs needed, set a value |
| Story Points | --- |
| Regression | --- |
| Last Closed | 2022-10-11 10:32:51 UTC |
Description
Yury.Panchenko
2022-05-11 15:43:12 UTC
Hi Yury.Panchenko,

Regarding separating the traffic to storage and VM networks - this is supported by defining several networks and assigning different roles to them using 'network roles' in Clusters|Logical Networks|Manage Networks.

Regarding the number of ports per NIC - please explain what exactly your use case is so we can better evaluate it.

Thanks,
Eitan
RHV - networking

Hello Eitan,
> Regarding separating the traffic to storage and VM networks - this is supported by defining several networks and assigning different roles to them using 'network roles' in Clusters|Logical Networks|Manage Networks.
This doesn't work at the core level.
For example, I have two networks: NetA in VLAN a and NetB in VLAN b. They have different subnets and gateways, and they both have access to the Internet.
There is an RHV node server with a two-port 10G adapter; port 1 is connected to NetA and port 2 is connected to NetB.
In this setup one of the ports doesn't work: it has a configuration, but all traffic goes through only one port. To resolve that I must connect to the terminal and manually define routes for every network port.
Then I deploy ovirt-engine and connect a storage domain via NetB. When I define the second network via 'network roles' in Clusters|Logical Networks|Manage Networks, oVirt drops all my network routes from the server and the second port again doesn't work.
So I must again connect to the server and define these routes manually.
Based on my experience with other hypervisors, I'd like to get 'out of the box' functionality in this area, because this setup is very basic and many customers will run into the same problems.
I still do not follow what you are trying to do. Please describe the exact steps. You are supposed to configure host networking in Host:Network Interfaces:Setup Host Networks.

> You are supposed to configure host networking in Host:Network Interfaces:Setup Host Networks
Yes, I use this to configure two networks, but one of the networks still doesn't work in this case.
It is connected, but no network traffic comes through. The source of the problem is the incorrect network routes created by this utility: the host tries to pass all network traffic via one adapter.
You must connect to the host and manually define routes for each port in the terminal to make them work together.
And when you configure these NICs in oVirt, you have to repeat this trick with the routes again (see the sketch below).
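As an illustration of the manual workaround described above (not taken from the reporter's host - the interface names and addresses here are hypothetical documentation-range examples), the per-port routes typically look like this:

```sh
# Hypothetical example: two ports on different subnets.
# enp1s0f0 carries 192.0.2.0/24 (gateway 192.0.2.1),
# enp1s0f1 carries 198.51.100.0/24 (gateway 198.51.100.1).

# Route each subnet explicitly through its own port so that traffic
# for the second subnet does not fall back to the default-route NIC.
ip route add 192.0.2.0/24 dev enp1s0f0 src 192.0.2.10
ip route add 198.51.100.0/24 dev enp1s0f1 src 198.51.100.10

# The host-wide default route still points at only one of the gateways
# (add only if it is not already present).
ip route add default via 192.0.2.1 dev enp1s0f0
```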
Hi Yury,

I think we (DEV + QE) still do not clearly understand what you are trying to achieve, so let's try to work it out.

First of all, the 'ovirtmgmt' network is the default route network by default. It is possible to change this and assign another network to be the default route network of the host. From your comments it seems like you want to have a default route per port/network, but only one network can be the default route of the host.

It is also possible to use VLANs, assigning VLAN networks to the same port or to different ports, and separate the traffic this way. One network can be used for the storage connection and one for management. The VLANs must of course be properly configured on the switch side.

Roles: the default route is ovirtmgmt by default, so all traffic goes via ovirtmgmt unless specified otherwise. It is possible to change this role and assign any other network, as long as it has a boot protocol configured. You can set one VLAN network to be the default route of the host and use the other network for another usage, like storage. (A short host-side check illustrating this is sketched after the attachment below.)

Can you share with us a screenshot of the Setup Host Networks UI, so we can see what you are trying to do?

Thanks,

Created attachment 1885809 [details]
nic config + ping
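To make the "only one default route on the host" point above concrete, this is the kind of check one can run on the host shell (illustrative only, not taken from the reporter's environment):

```sh
# The main routing table normally carries a single default route,
# owned by the network that holds the 'default route' role
# (ovirtmgmt unless changed in the cluster's Manage Networks dialog).
ip route show default
```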
Hello Michael,

I uploaded a few screenshots to describe my current setup. You can also see that ping works only via the ovirtmgmt interface.

Hi Yury,

From the attachment OvirtNetwork.png I see that you have two networks with two separate subnets on two separate NICs - this is a standard and well supported use case for the engine. We are not aware of any problems or lack of functionality around it. In the attachment ping.txt the existing ovirtmgmt bridge and NICs reflect the setup viewable in the engine.

What is not clear to me:
1. I don't see any VLAN usage, which is inconsistent with what you wrote in comment 2.
2. The SAN network is marked as out of sync, which means that whatever you configured in the engine is not consistent with what is configured on the host. This in itself might be an indicator of a misconfiguration.
3. Both ovirtmgmt and SAN have quite a few dropped packets, which also might indicate a misconfiguration on the switch this host is connected to or on the switch-host connection.

In the attachment ping.txt, the ping results signal to me that:
1. ovirtmgmt is the default gateway on the host - this is the default configuration by the engine.
2. Another route/gateway for the second subnet is missing on the host. Or,
3. maybe the host-to-switch or switch configuration has a problem?

You have not included the routing table on the host or the NIC configuration in the engine, so I cannot ascertain this. (A quick way to check this from the host shell is sketched after the attachment below.)

Did you configure the boot protocol of the enp94s0f0 NIC for the SAN network? By default it is none, in which case there cannot be any L3 communication on that network. If you set it to static, you can specify the gateway for that subnet in the Setup Networks dialog in the engine by clicking the pencil icon. If it is configured to DHCP, the gateway is acquired automatically.

Can you check the above and provide more details? Also, engine.log and vdsm.log might shed some more light on the situation.

Thanks,
Eitan

Created attachment 1886152 [details]
network setup
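A quick host-side way to see which interface the kernel would actually pick for a destination on the storage subnet (a hedged, illustrative diagnostic; the address is just an example from the 172.24.175.x subnet mentioned in this report):

```sh
# If the answer shows the ovirtmgmt bridge and the host's default
# gateway instead of the SAN NIC, then the per-subnet route/rule
# discussed above is not in effect on the host.
ip route get 172.24.175.10
```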
Hello Eitan,

I resolved the out-of-sync problem and uploaded the network setup, but I still can't ping the SAN network via the SAN interface; I can do this only via the ovirtmgmt NIC.

> 1. I don't see any vlan usage which is inconsistent with what you wrote in comment 2.

I use native tagging on the switch ports; this part is done by the switch. The two NICs have different VLANs.

> 2. The SAN network is marked as out of sync which means that whatever you configured on engine is not consistent with what is configured on the host.

Fixed.

> 1. ovirtmgmt is the default gateway on the host - this is the default configuration by engine.

That's OK for me. I'd like to use the SAN NIC only for the SAN subnet 172.24.175.x.

> 2. another route/gateway for the second subnet is missing on the host. or,

I expected that the engine or the RHV node would create this route automatically.

> 3. maybe the host to switch or switch configuration has a problem?

No, there are many other servers on the same switch and networks, and they don't have any problems.

> You have not included the routing table on the host or the nic configuration in engine so I cannot ascertain this.

Here it is. I haven't changed anything in the routes:

    [root@PDCQA189 /]# ip route
    default via 172.25.16.1 dev ovirtmgmt proto static metric 1
    172.25.16.0/22 dev ovirtmgmt proto kernel scope link src 172.25.16.61 metric 425

> Did you configure the boot protocol of enp94s0f0 nic for the SAN network?

No.

> Also, engine.log and vdsm.log might shed some more light on the situation.

I didn't see any problems in the logs. I will upload them if needed.

Hello Eitan,

To make the second NIC work, I must add these routes on the RHV node:

    [root@PDCQA189 ~]# ip route add 172.24.175.1 dev enp94s0f0
    [root@PDCQA189 ~]# ip route add 172.24.175.0/24 via 172.24.175.1 dev enp94s0f0

Hi Yuri,

In comment#10 you mentioned that you did not configure the boot protocol of enp94s0f0, but attachment ovirt1.png shows that a DHCP configuration has been set up. With this configuration RHV should have supported the communication on the SAN network. So, in order to understand what's wrong, could you please:

1. Remove all the manual changes you made to the host.
2. RHV > webadmin > Hosts > your_host > Management > Refresh Capabilities.
3. RHV > webadmin > Hosts > your_host > Network Interfaces > make sure all interfaces are synced, and if not, run setup networks with sync for each.
4. Refresh capabilities again and make sure all interfaces are in sync.

This will ensure the RHV-side configuration has been applied to the host interfaces. Next, could you please provide the following output:

1. RHV > webadmin > Hosts > your_host > General > Software > VDSM version
2. /var/log/vdsm/vdsm.log from the host during the interval when steps 2, 3 and 4 above were performed
3. On the host shell: `ip route show table all` and `ip rule show all`

Thanks,
Eitan

Hello Eitan,

I've done all the steps. The VDSM version is vdsm-4.50.0.13-1.el8ev.

    [root@PDCQA189 ~]# ip route show table all
    default via 172.24.175.1 dev enp94s0f0 table 264766186 proto dhcp metric 100
    172.24.175.0/24 dev enp94s0f0 table 264766186 proto kernel scope link src 172.24.175.28 metric 100
    default via 172.25.16.1 dev ovirtmgmt table 329647082 proto static metric 425
    172.25.16.0/22 via 172.25.16.61 dev ovirtmgmt table 329647082 proto static metric 425
    172.25.16.1 dev ovirtmgmt table 329647082 proto static scope link metric 425
    default via 172.25.16.1 dev ovirtmgmt proto static metric 1
    172.25.16.0/22 dev ovirtmgmt proto kernel scope link src 172.25.16.61 metric 425
    broadcast 127.0.0.0 dev lo table local proto kernel scope link src 127.0.0.1
    local 127.0.0.0/8 dev lo table local proto kernel scope host src 127.0.0.1
    local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1
    broadcast 127.255.255.255 dev lo table local proto kernel scope link src 127.0.0.1
    broadcast 172.24.175.0 dev enp94s0f0 table local proto kernel scope link src 172.24.175.28
    local 172.24.175.28 dev enp94s0f0 table local proto kernel scope host src 172.24.175.28
    broadcast 172.24.175.255 dev enp94s0f0 table local proto kernel scope link src 172.24.175.28
    broadcast 172.25.16.0 dev ovirtmgmt table local proto kernel scope link src 172.25.16.61
    local 172.25.16.61 dev ovirtmgmt table local proto kernel scope host src 172.25.16.61
    broadcast 172.25.19.255 dev ovirtmgmt table local proto kernel scope link src 172.25.16.61
    ::1 dev lo proto kernel metric 256 pref medium
    fe80::/64 dev vnet1 proto kernel metric 256 pref medium
    fe80::/64 dev vnet3 proto kernel metric 256 pref medium
    fe80::/64 dev vnet5 proto kernel metric 256 pref medium
    fe80::/64 dev vnet11 proto kernel metric 256 pref medium
    fe80::/64 dev vnet12 proto kernel metric 256 pref medium
    fe80::/64 dev vnet13 proto kernel metric 256 pref medium
    fe80::/64 dev vnet14 proto kernel metric 256 pref medium
    fe80::/64 dev vnet15 proto kernel metric 256 pref medium
    fe80::/64 dev vnet16 proto kernel metric 256 pref medium
    fe80::/64 dev vnet18 proto kernel metric 256 pref medium
    fe80::/64 dev vnet19 proto kernel metric 256 pref medium
    local ::1 dev lo table local proto kernel metric 0 pref medium
    local fe80::fc16:3eff:fe32:6d7f dev vnet13 table local proto kernel metric 0 pref medium
    local fe80::fc6f:38ff:feed:2 dev vnet3 table local proto kernel metric 0 pref medium
    local fe80::fc6f:38ff:feed:10 dev vnet19 table local proto kernel metric 0 pref medium
    local fe80::fc6f:38ff:feed:4b dev vnet1 table local proto kernel metric 0 pref medium
    local fe80::fc6f:38ff:feed:d7 dev vnet18 table local proto kernel metric 0 pref medium
    local fe80::fc6f:38ff:feed:d9 dev vnet14 table local proto kernel metric 0 pref medium
    local fe80::fc6f:38ff:feed:e9 dev vnet5 table local proto kernel metric 0 pref medium
    local fe80::fc6f:38ff:feed:123 dev vnet15 table local proto kernel metric 0 pref medium
    local fe80::fc6f:38ff:feed:126 dev vnet16 table local proto kernel metric 0 pref medium
    local fe80::fc6f:67ff:fe42:ae dev vnet11 table local proto kernel metric 0 pref medium
    local fe80::fc6f:67ff:fe42:b0 dev vnet12 table local proto kernel metric 0 pref medium
    multicast ff00::/8 dev eno1 table local proto kernel metric 256 pref medium
    multicast ff00::/8 dev br-int table local proto kernel metric 256 pref medium
    multicast ff00::/8 dev enp94s0f1 table local proto kernel metric 256 pref medium
    multicast ff00::/8 dev vnet1 table local proto kernel metric 256 pref medium
    multicast ff00::/8 dev vnet3 table local proto kernel metric 256 pref medium
    multicast ff00::/8 dev vnet5 table local proto kernel metric 256 pref medium
    multicast ff00::/8 dev vnet11 table local proto kernel metric 256 pref medium
    multicast ff00::/8 dev vnet12 table local proto kernel metric 256 pref medium
    multicast ff00::/8 dev vnet13 table local proto kernel metric 256 pref medium
    multicast ff00::/8 dev vnet14 table local proto kernel metric 256 pref medium
    multicast ff00::/8 dev vnet15 table local proto kernel metric 256 pref medium
    multicast ff00::/8 dev vnet16 table local proto kernel metric 256 pref medium
    multicast ff00::/8 dev vnet18 table local proto kernel metric 256 pref medium
    multicast ff00::/8 dev vnet19 table local proto kernel metric 256 pref medium

    [root@PDCQA189 ~]# ip route
    default via 172.25.16.1 dev ovirtmgmt proto static metric 1
    172.25.16.0/22 dev ovirtmgmt proto kernel scope link src 172.25.16.61 metric 425

    [root@PDCQA189 ~]# ip rule show all
    0: from all lookup local
    3200: from all to 172.25.16.0/22 lookup 329647082
    3200: from 172.25.16.0/22 lookup 329647082
    32766: from all lookup main
    32767: from all lookup default

Created attachment 1888351 [details]
vdsm log
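For context on the output above: each RHV/oVirt network with an IP configuration normally gets its own routing table plus a pair of `ip rule` entries pointing at it, as can be seen for ovirtmgmt (table 329647082). A hedged sketch of the entries one would expect for the SAN subnet, reusing table id 264766186 from the output above (illustrative only, not a recommended manual fix and not necessarily the exact form RHV would create):

```sh
# Mirror of the ovirtmgmt pattern for the SAN subnet: send traffic
# to/from 172.24.175.0/24 through its dedicated routing table.
ip rule add from all to 172.24.175.0/24 table 264766186 priority 3200
ip rule add from 172.24.175.0/24 table 264766186 priority 3200

# The per-network routes themselves already exist in that table:
ip route show table 264766186
```

The analysis in the next comment points out exactly this: the SAN routing table is populated, but the rules directing traffic into it are missing.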
Hi Yuri,

Apologies for the delayed reply... It seems that the rules table you posted is not complete. A rule for subnet 172.24.175.0/24 is missing. This rule should have been created by RHV when you attached the SAN network to interface enp94s0f0 in the webadmin. I cannot say why this happened because I don't have the logs from that moment.

So let's try to fix it by detaching the SAN network and then attaching it back via the webadmin, with the intention that when attaching it again the routing rules will be correctly created. This assumes that there are no leftover manual configurations that might interfere.

1. RHV > webadmin > Hosts > your_host > Network Interfaces > setup networks: detach SAN.
2. Wait for a confirmation event on the Events tab that the network has been detached.
3. Print out `ip route show table all` and `ip rule show all`, just to make sure we get the expected result.
4. RHV > webadmin > Hosts > your_host > Network Interfaces > setup networks: attach SAN to enp94s0f0.
5. Wait for a confirmation event on the Events tab that the network has been attached.
6. Print out `ip route show table all` and `ip rule show all`.
7. Try the ping... :)

In case we have the same failure again, it would be very helpful if you could post the vdsm.log and supervdsm.log covering the above flow.

Thanks,
Eitan

Hello Eitan,

> A rule for subnet 172.24.175.0/24 is missing. This rule should have been created by RHV when you attached the SAN network to interface enp94s0f0 in the webadmin

Yes, that is my point, and I created it manually.

> So let's try to fix by detaching the SAN network and then attaching back via the webadmin

I've tried this a few times; it didn't fix the problem.

> try the ping... :)

It works with the manual route configuration, but the RHV node still uses ovirtmgmt.

I fixed the problem, but in a radical way: I disabled the gateways of the storage SAN NICs. Then the ovirtmgmt NICs can't reach the storage and the RHV nodes use the SAN NICs instead.

Now the problem is clearer to me:
1) You must have two networks (let's name them NetA for ovirtmgmt and NetB for SAN).
2) Both networks must have gateways to a common external network.
3) ovirtmgmt uses NetA as the primary network.
4) The node has a default route via GatewayA and two routes for SubnetA and SubnetB.
5) To access the SAN network the node should use NetB, but because it is able to reach it from NetA via GatewayA, it uses NetA.
6) If we block external network access for the SAN network on the storage, it works normally.

Hi Yury,

Comparing the output of `ip rule show all` and `ip route show table all`, it occurs to me that although there is a missing rule in the rules output, the corresponding routing does appear in the route tables. So I suspect that something has been corrupted which RHV cannot fix just by detaching and re-attaching the network, which, as you commented, does not help.

Since your flow is fully supported by RHV and reproducible as working on our environments, please try to recreate it on a separate vanilla host just using RHV, preferably in a way that does not use the existing networks/switches between the 'bad' host and RHV.

Thanks,
Eitan

Hi Yury, have you had time to take a look?

Hello Martin,

I'm working on this case. I can provide some results next week. Thanks.
Hello Martin,

I've done a new setup with a new host, but I have the same problem.

    # ip rule show all
    0: from all lookup local
    100: from all to 192.168.222.1/24 lookup main
    100: from all to 192.168.1.1/24 lookup main
    101: from 192.168.222.1/24 lookup main
    101: from 192.168.1.1/24 lookup main
    32766: from all lookup main
    32767: from all lookup default

    [root@psrh451 ~]# ip route show table all
    default via 172.24.144.1 dev Net2 table 59048282 proto dhcp src 172.24.153.64 metric 426
    172.24.144.0/20 dev Net2 table 59048282 proto kernel scope link src 172.24.153.64 metric 426
    default via 172.25.16.1 dev ovirtmgmt proto dhcp src 172.25.16.88 metric 425
    172.25.16.0/22 dev ovirtmgmt proto kernel scope link src 172.25.16.88 metric 425
    broadcast 127.0.0.0 dev lo table local proto kernel scope link src 127.0.0.1
    local 127.0.0.0/8 dev lo table local proto kernel scope host src 127.0.0.1
    local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1
    broadcast 127.255.255.255 dev lo table local proto kernel scope link src 127.0.0.1
    broadcast 172.24.144.0 dev Net2 table local proto kernel scope link src 172.24.153.64
    local 172.24.153.64 dev Net2 table local proto kernel scope host src 172.24.153.64
    broadcast 172.24.159.255 dev Net2 table local proto kernel scope link src 172.24.153.64
    broadcast 172.25.16.0 dev ovirtmgmt table local proto kernel scope link src 172.25.16.88
    local 172.25.16.88 dev ovirtmgmt table local proto kernel scope host src 172.25.16.88
    broadcast 172.25.19.255 dev ovirtmgmt table local proto kernel scope link src 172.25.16.88
    ::1 dev lo proto kernel metric 256 pref medium
    fe80::/64 dev vnet5 proto kernel metric 256 pref medium
    fe80::/64 dev vnet6 proto kernel metric 256 pref medium
    local ::1 dev lo table local proto kernel metric 0 pref medium
    anycast fe80:: dev vnet5 table local proto kernel metric 0 pref medium
    anycast fe80:: dev vnet6 table local proto kernel metric 0 pref medium
    local fe80::fc16:3eff:fe7e:570e dev vnet5 table local proto kernel metric 0 pref medium
    local fe80::fc6f:b7ff:fe52:0 dev vnet6 table local proto kernel metric 0 pref medium
    multicast ff00::/8 dev br-int table local proto kernel metric 256 pref medium
    multicast ff00::/8 dev vnet5 table local proto kernel metric 256 pref medium
    multicast ff00::/8 dev vnet6 table local proto kernel metric 256 pref medium
    multicast ff00::/8 dev ens224 table local proto kernel metric 256 pref medium

We are not able to reproduce the issue you raised; if we perform the setup on a new host using the steps from Comment 15, everything works for us. Could you please recheck that you really followed the steps from Comment 15?

This bug has low overall severity and is not going to be further verified by QE. If you believe special care is required, feel free to properly align the relevant severity, flags and keywords to raise PM_Score, or use one of the Bumps ('PrioBumpField', 'PrioBumpGSS', 'PrioBumpPM', 'PrioBumpQA') in Keywords to raise its PM_Score above the verification threshold (1000).

This bug always reproduces in my labs, but I don't understand what additional information I can provide here. In my environment I use a workaround which helps me. So, let's wait for real customer cases. Thanks.

We are not able to reproduce the issue despite our best effort; using the steps from Comment 15 always gets us to a working status, so there must be something else in the customer's environment which causes the issue. But as we are out of ideas of what to try, and we didn't get reports from other users about this issue, we need to close this bug as WORKSFORME.
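As a closing aside, the condition debated throughout this report - a subnet whose traffic silently leaves via the default-route NIC because no policy-routing rule points it at its own table - can be checked with a small helper. This is a hypothetical sketch, not part of RHV/VDSM or of the steps above:

```sh
#!/bin/sh
# Hedged helper sketch: report whether a policy-routing rule exists
# for a given subnet (the condition discussed in this bug).
SUBNET="${1:?usage: $0 <subnet, e.g. 172.24.175.0/24>}"

# ip rule output lines look like: "3200: from all to <subnet> lookup <table>"
if ip rule show | grep -q "to ${SUBNET}"; then
    echo "OK: found an ip rule for ${SUBNET}"
    ip rule show | grep "${SUBNET}"
else
    echo "MISSING: no ip rule for ${SUBNET} - traffic may leave via the default-route NIC"
fi
```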