Description of problem:
Ran an FFU (fast forward upgrade) on a hybrid SRIOV setup with 3 instances: one with a PF port, one with a VF port, and one with a normal port. The process finished OK. The VF and normal-port instances are still reachable and functioning as expected. The PF-port instance lost its connectivity and is still unreachable even after rebooting the instance.

Version-Release number of selected component (if applicable):
FFU OSP10-OSP13

How reproducible:
1/1

Steps to Reproduce:
1. Deploy an SRIOV hybrid setup.
2. Create 3 instances, one for each of the 3 port types.
3. Run FFU, then try to ping/ssh the instances after the process finishes.

Actual results:
The PF-port instance is unreachable.

Expected results:
All instances are reachable.

Additional info:
My setup is available for investigation for now. Approach me for credentials.
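The reachability check in step 3 can be scripted so that it is repeated the same way before and after the upgrade. The following is only a minimal sketch, not part of the original report: the helper names (`build_ping_command`, `check_instances`) and the instance labels are hypothetical, and it assumes the iputils `ping` options `-c` (count) and `-W` (reply timeout in seconds).

```python
import subprocess


def build_ping_command(ip, count=3, timeout=2):
    """Build a ping command line (iputils options -c and -W are assumed)."""
    return ["ping", "-c", str(count), "-W", str(timeout), ip]


def check_instances(instances, run=subprocess.run):
    """Ping every instance and return {name: reachable}.

    `instances` maps a free-form label to an IP address. `run` is
    injectable so the logic can be exercised without real network access.
    """
    results = {}
    for name, ip in instances.items():
        proc = run(
            build_ping_command(ip),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # ping exits with 0 only when at least one reply was received.
        results[name] = proc.returncode == 0
    return results
```

Calling `check_instances({"pf": "<PF instance IP>", "vf": "<VF instance IP>", "normal": "<normal instance IP>"})` once before and once after FFU, from a host or namespace that can reach the tenant network, gives two dictionaries that can be compared directly.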
Roee - can you give us info on logging into this system? Also, after FFU if you spawn a new SRIOV instance does it also fail to ping?
Created attachment 1497663
Ping 10.0.1.14 (PF private IP) path
Hello Roee:

I have recreated the PF VM using your scripts. I added a password for "cloud-user" [1] in order to access the VM.

From the controller0 DHCP namespace [2] I have connectivity with the PF machine (net-64-1-pf). As you can see in [3], the network configuration is correct. Please check the attached file to see the ping path.

I've tested that the ARP message is correctly flooded across all the compute node interfaces (computesriov-0/1, p1p1 and p1p2). When the PF interface is detached from the kernel and given to libvirt, the message continues flowing inside the VM.

I've also tested pinging from the PF machine to the NORMAL machine (10.0.1.14 --> 10.0.1.7) [4].

Can you check it again? Please, give me feedback if

[1] cloud-config file:
(overcloud) [stack@undercloud-0 ~]$ cat create_int.yaml
#cloud-config
write_files:
  - path: /etc/sysconfig/network-scripts/ifcfg-eth0.228
    owner: "root"
    permissions: "777"
    content: |
      DEVICE="eth0.228"
      BOOTPROTO="dhcp"
      ONBOOT="yes"
      VLAN="yes"
      PERSISTENT_DHCLIENT="yes"
runcmd:
  - [ sh, -c , "systemctl restart network" ]
users:
  - name: cloud-user
    lock-passwd: False
    plain_text_passwd: pass
    chpasswd: { expire: False }
    sudo: ALL=(ALL) NOPASSWD:ALL
ssh_pwauth: True

[2] ip netns exec qdhcp-9907fd65-8e8d-4a58-a7b4-7a6d82b08949 ping 10.0.1.14

[3]
[cloud-user@net-64-1-pf ~]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether a0:36:9f:7f:28:b8 brd ff:ff:ff:ff:ff:ff
    inet6 2620:52:0:23a4:a236:9fff:fe7f:28b8/64 scope global noprefixroute dynamic
       valid_lft 2591976sec preferred_lft 604776sec
    inet6 2001::a236:9fff:fe7f:28b8/64 scope global noprefixroute
       valid_lft forever preferred_lft forever
    inet6 fe80::a236:9fff:fe7f:28b8/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: eth0.228@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether a0:36:9f:7f:28:b8 brd ff:ff:ff:ff:ff:ff
    inet 10.0.1.14/24 brd 10.0.1.255 scope global noprefixroute dynamic eth0.228
       valid_lft 85290sec preferred_lft 85290sec
    inet6 2001::a236:9fff:fe7f:28b8/64 scope global mngtmpaddr noprefixroute dynamic
       valid_lft 86395sec preferred_lft 14395sec
    inet6 fe80::a236:9fff:fe7f:28b8/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

[4]
[cloud-user@net-64-1-pf ~]$ ping 10.0.1.7
PING 10.0.1.7 (10.0.1.7) 56(84) bytes of data.
64 bytes from 10.0.1.7: icmp_seq=1 ttl=64 time=0.339 ms
64 bytes from 10.0.1.7: icmp_seq=2 ttl=64 time=0.363 ms
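The manual check of the `ip a` output in [3] (the VLAN sub-interface eth0.228 is UP and holds the 10.0.1.14/24 lease) could also be automated. The sketch below is an illustration only and was not part of the original investigation: `parse_ip_addr` and `SAMPLE` are hypothetical names, the sample is a trimmed copy of the output pasted above, and the regular expressions assume the default multi-line `ip a` layout.

```python
import re

# "3: eth0.228@eth0: <FLAGS> ..." -> interface name and flag list.
IFACE_RE = re.compile(r"^\d+:\s+([^:@\s]+)(?:@\S+)?:\s+<([^>]*)>")
# "    inet 10.0.1.14/24 ..." -> IPv4 address and prefix (inet6 is skipped).
INET_RE = re.compile(r"^\s+inet\s+(\d+\.\d+\.\d+\.\d+)/(\d+)")

# Trimmed copy of the `ip a` output from the PF instance in this thread.
SAMPLE = """\
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether a0:36:9f:7f:28:b8 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::a236:9fff:fe7f:28b8/64 scope link noprefixroute
3: eth0.228@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether a0:36:9f:7f:28:b8 brd ff:ff:ff:ff:ff:ff
    inet 10.0.1.14/24 brd 10.0.1.255 scope global noprefixroute dynamic eth0.228
"""


def parse_ip_addr(output):
    """Return {interface: {"flags": [...], "ipv4": [...]}} from `ip a` text."""
    interfaces = {}
    current = None
    for line in output.splitlines():
        match = IFACE_RE.match(line)
        if match:
            current = match.group(1)
            interfaces[current] = {"flags": match.group(2).split(","), "ipv4": []}
            continue
        match = INET_RE.match(line)
        if match and current is not None:
            interfaces[current]["ipv4"].append(f"{match.group(1)}/{match.group(2)}")
    return interfaces


if __name__ == "__main__":
    for name, info in parse_ip_addr(SAMPLE).items():
        print(name, "UP" in info["flags"], info["ipv4"])
```

For the sample above, eth0 carries no IPv4 address and eth0.228 carries 10.0.1.14/24, which matches the configuration described in this comment: the tenant address lives on the VLAN 228 sub-interface, not on the parent PF device.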
Roee, if you haven't hit this issue in the 4 months since your last comment, can we conclude that the issue is resolved? If you hit it again, we can always reopen the BZ.

Nate
Closing since we can't get a reproducer. Thanks!