Bug 2048556

Summary:

VM with 16+ CPUs - no connectivity if networkInterfaceMultiqueue is enabled

Product:

Container Native Virtualization (CNV)

Reporter:

Ruth Netser <rnetser>

Component:

Networking

Assignee:

Petr Horáček <phoracek>

Status:

CLOSED MIGRATED

QA Contact:

awax

Severity:

high

Priority:

high

Version:

4.10.0

CC:

dholler, fdeutsch, gkapoor, jhopper, nkoenig, nrozen, omergi, phoracek

Target Release:

4.14.2

Hardware:

Unspecified

OS:

Unspecified

Last Closed:

2023-12-14 16:07:16 UTC

Type:

Bug

Attachments:

tar with domxml and domcapabilities for VM with enabled and disabled multiqueue (flags: none)

Description Ruth Netser 2022-01-31 14:07:03 UTC

Description of problem:
When a VM with 16 or more CPUs is started with networkInterfaceMultiqueue enabled, it has no network connectivity.

Version-Release number of selected component (if applicable):
cluster-network-addons-operator version v4.10.0-41

How reproducible:


Steps to Reproduce:
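1. Start a VM with 16 CPUs and networkInterfaceMultiqueue enabled (tested with 3 SR-IOV interfaces attached to the VM; the VM manifest is under Additional info).
2. From the VM, ping the node.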

Actual results:
The VM cannot ping the node.
With networkInterfaceMultiqueue disabled, the VM and the node have connectivity.

Expected results:
The VM should be able to reach the node


Additional info:
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  annotations:
    vm.kubevirt.io/validations: "[\n   {\n     \"name\": \"minimal-required-memory\"\
      ,\n     \"path\": \"jsonpath::.spec.domain.resources.requests.memory\",\n  \
      \   \"rule\": \"integer\",\n     \"message\": \"This VM requires more memory.\"\
      ,\n     \"min\": 1610612736\n   }\n]\n"
  labels:
    app: sap-hana-vm-1643637473-8532958
    vm.kubevirt.io/template: rhel8-saphana-tiny
    vm.kubevirt.io/template.namespace: openshift
    vm.kubevirt.io/template.revision: '1'
    vm.kubevirt.io/template.version: v0.19.1
  name: sap-hana-vm-1643637473-8532958
spec:
  dataVolumeTemplates:
  - apiVersion: cdi.kubevirt.io/v1beta1
    kind: DataVolume
    metadata:
      name: sap-hana-vm
    spec:
      source:
        registry:
          pullMethod: node
          url: docker://registry.redhat.io/rhel8/rhel-guest-image:8.4.0
      storage:
        resources:
          requests:
            storage: 50Gi
        storageClassName: nfs
  running: false
  template:
    metadata:
      annotations:
        vm.kubevirt.io/flavor: tiny
        vm.kubevirt.io/os: rhel8
        vm.kubevirt.io/workload: saphana
      labels:
        kubevirt.io/domain: sap-hana-vm-1643637473-8532958
        kubevirt.io/size: tiny
        kubevirt.io/vm: sap-hana-vm-1643637473-8532958
    spec:
      domain:
        cpu:
          cores: 16
          dedicatedCpuPlacement: true
          features:
          - name: invtsc
            policy: require
          isolateEmulatorThread: true
          model: host-passthrough
          numa:
            guestMappingPassthrough: {}
          sockets: 1
          threads: 1
        devices:
          blockMultiQueue: true
          disks:
          - dedicatedIOThread: true
            disk:
              bus: virtio
            name: sap-hana-vm
          - disk:
              bus: virtio
            name: cloudinitdisk
          - disk:
              bus: virtio
            name: downwardmetrics
          interfaces:
          - masquerade: {}
            model: virtio
            name: default
          - name: sriov-net1
            sriov: {}
          - name: sriov-net2
            sriov: {}
          - name: sriov-net3
            sriov: {}
          networkInterfaceMultiqueue: true
          rng: {}
        ioThreadsPolicy: auto
        machine:
          type: pc-q35-rhel8.4.0
        memory:
          guest: 24Gi
          hugepages:
            pageSize: 1Gi
        resources:
          requests:
            memory: 44Gi
      hostname: sap-hana-vm-1643637473-8532958
      networks:
      - name: default
        pod: {}
      - multus:
          networkName: ssp-high-performance-vm-test-sap-hana-vm/sriov-net-1
        name: sriov-net1
      - multus:
          networkName: ssp-high-performance-vm-test-sap-hana-vm/sriov-net-2
        name: sriov-net2
      - multus:
          networkName: ssp-high-performance-vm-test-sap-hana-vm/sriov-net-3
        name: sriov-net3
      nodeSelector:
        kubevirt.io/workload: hana
      terminationGracePeriodSeconds: 180
      tolerations:
      - effect: NoSchedule
        key: kubevirt.io/workload
        operator: Equal
        value: hana
      volumes:
      - dataVolume:
          name: sap-hana-vm
        name: sap-hana-vm
      - cloudInitNoCloud:
          userData: "#cloud-config\nuser: cloud-user\npassword: password\nchpasswd:\
            \ { expire: False }\nbootcmd:\n- sysctl -w net.ipv4.conf.all.rp_filter=0\n\
            \nssh_authorized_keys:\n [ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCj47ubVnxR16JU7ZfDli3N5QVBAwJBRh2xMryyjk5dtfugo5JIPGB2cyXTqEDdzuRmI+Vkb/A5duJyBRlA+9RndGGmhhMnj8and3wu5/cEb7DkF6ZJ25QV4LQx3K/i57LStUHXRTvruHOZ2nCuVXWqi7wSvz5YcvEv7O8pNF5uGmqHlShBdxQxcjurXACZ1YY0YDJDr3AJai1KF9zehVJODuSbrnOYpThVWGjFuFAnNxbtuZ8EOSougN2aYTf2qr/KFGDHtewIkzZmP6cjzKO5bN3pVbXxmb2Gces/BYHntY4MXBTUqwsmsCRC5SAz14bEP/vsLtrNhjq9vCS+BjMT\
            \ root]\nruncmd: [\"sudo sed -i '/^PubkeyAccepted/ s/$/,ssh-rsa/'\
            \ /etc/crypto-policies/back-ends/opensshserver.config\", \"sudo sed -i\
            \ 's/^#\\\\?PasswordAuthentication no/PasswordAuthentication yes/g' /etc/ssh/sshd_config\"\
            , 'sudo systemctl enable sshd', 'sudo systemctl restart sshd']"
        name: cloudinitdisk
      - downwardMetrics: {}
        name: downwardmetrics

Comment 1 Ruth Netser 2022-02-01 08:32:18 UTC

Created attachment 1858254
tar with domxml and domcapabilities for VM with enabled and disabled multiqueue

Comment 2 oshoval 2022-04-03 13:26:19 UTC

Hi Ruth,
thanks for the details and for the sap-hana cluster.

I tried various combinations:
if we remove sriov-net3 (keeping the other two), it works;
if we keep sriov-net3, either alone or with one of the others, it doesn't.

Do we have another cluster exactly like this one?
We need to understand what is special about sriov-net3, or whether the hardware behind it has a problem.

Thanks

Comment 3 oshoval 2022-04-05 07:28:53 UTC

Update:
Checking "ip r" / "ip a" in the VM when the bug happens, we see that the interfaces are flipped, so traffic is routed via an SR-IOV interface instead of via the default interface.

If we use consistent network device naming, by removing net.ifnames=0 from /etc/default/grub, rebuilding the GRUB config (sudo grub2-mkconfig -o /boot/grub2/grub.cfg) and rebooting, it works: the routing is now correct (the primary interface has the lower metric for the default gateway).

Please see https://bugzilla.redhat.com/show_bug.cgi?id=1874096#c14 for more info.
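For illustration, the guest-side workaround from this comment could be applied at first boot through cloud-init user-data, the same mechanism the reproducer VM already uses for its setup commands. This is a minimal, untested sketch: it assumes the guest image sets net.ifnames=0 on the GRUB_CMDLINE_LINUX line of /etc/default/grub, that the sed pattern below matches how it is written there, and that one extra reboot via cloud-init's power_state module is acceptable. These keys would have to be merged into the VM's existing userData, which already defines a runcmd list.

```yaml
#cloud-config
# Sketch only: assumes /etc/default/grub contains "net.ifnames=0" on the
# GRUB_CMDLINE_LINUX line, as described in comment 3.
runcmd:
  # Drop net.ifnames=0 so the guest uses consistent (predictable) device names.
  - sed -i 's/ *net\.ifnames=0//' /etc/default/grub
  # Regenerate the GRUB configuration, as done manually in comment 3.
  - grub2-mkconfig -o /boot/grub2/grub.cfg
# Reboot once after cloud-init finishes so the new kernel command line takes effect.
power_state:
  mode: reboot
  condition: true
```

Because runcmd runs once per instance, the reboot does not repeat the edit on later boots.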

Comment 4 oshoval 2022-04-05 09:41:58 UTC

Hi Geetika,

Can you please try creating the VM with a MAC address for each SR-IOV interface, plus a cloud-init config that uses set-name based on a macaddress match, on a sap-hana cluster with the 3 SR-IOV interfaces, multiqueue and 17+ CPUs, to see if it also solves the above problem?
Using set-name should give consistent network device naming, and therefore also fix the wrong routing.
If it helps, we can update the templates instead of updating the guest grub config.

Thanks
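The suggestion above could look roughly like the following fragment of the VM's spec.template.spec. It is a sketch under stated assumptions, not a verified fix: the MAC addresses and the guest interface names (primary, sriov1, sriov2, sriov3) are hypothetical, it assumes the SR-IOV networks honor a requested MAC address, and it assumes the guest's cloud-init applies set-name from a version 2 networkData config. Addressing for the SR-IOV interfaces is left out because this report does not describe it.

```yaml
# Partial spec.template.spec; fields not shown stay as in the original VM.
# MAC addresses below are hypothetical, locally administered examples.
domain:
  devices:
    interfaces:
    - masquerade: {}
      model: virtio
      name: default
      macAddress: "02:00:00:00:00:01"
    - name: sriov-net1
      sriov: {}
      macAddress: "02:00:00:00:00:11"
    - name: sriov-net2
      sriov: {}
      macAddress: "02:00:00:00:00:12"
    - name: sriov-net3
      sriov: {}
      macAddress: "02:00:00:00:00:13"
    networkInterfaceMultiqueue: true
volumes:
- cloudInitNoCloud:
    # userData stays as in the original VM (omitted here).
    networkData: |
      version: 2
      ethernets:
        # Pod network (masquerade) interface: pinned by MAC, addressed via DHCP.
        primary:
          match:
            macaddress: "02:00:00:00:00:01"
          set-name: primary
          dhcp4: true
        # SR-IOV VFs: only renamed by MAC; addressing intentionally not set here.
        sriov1:
          match:
            macaddress: "02:00:00:00:00:11"
          set-name: sriov1
        sriov2:
          match:
            macaddress: "02:00:00:00:00:12"
          set-name: sriov2
        sriov3:
          match:
            macaddress: "02:00:00:00:00:13"
          set-name: sriov3
  name: cloudinitdisk
```

Names outside the kernel's ethX namespace are used so that a rename cannot collide with a name the kernel assigns during enumeration.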

Comment 5 Petr Horáček 2022-05-26 10:52:14 UTC

Due to capacity, I'm moving this to 4.12.

Comment 9 omergi 2023-07-06 14:56:01 UTC

Hello,

When a VM runs with multiple interfaces and the guest OS uses kernel-style ethX interface naming (e.g. eth0, eth1, ...), the order of the interfaces may not be consistent.
Please configure the VM to use predictable naming [1].

If the issue still reproduces, it would be great to have access to its environment.

[1] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/networking_guide/ch-consistent_network_device_naming#sec-Naming_Schemes_Hierarchy

Comment 10 Ruth Netser 2023-07-16 05:30:16 UTC

(In reply to omergi from comment #9)
> Hello,
> 
> When running a VM with multiple interfaces and the guest OS use netX
> interface naming (e.g.:eth0, eth1, ...), the order of the interface may not
> be consistent.
> Please set the VM with predictable naming [1].
> 
> If the issue reproduces it will be great to have access to its environment.
> 
> [1] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/networking_guide/ch-consistent_network_device_naming#sec-Naming_Schemes_Hierarchy

Comment 11 Fabian Deutsch 2023-07-19 12:56:31 UTC

Petr, do we know in which component the problem is?

Comment 12 Petr Horáček 2023-07-19 13:04:13 UTC

From Or's investigation, it looks like an issue with the guest config: it was not using consistent naming (something SR-IOV VFs especially suffer from), and that led to mismatched interfaces and connectivity issues. How multiqueue (which is only applied on the Pod network) was related to that is a mystery to me.

So to me it's a problem with QE's guest configuration until proven otherwise. If it really ends up being a multiqueue problem, then the cause could be in CNV network (which configures multiqueue on the TAP device), in libvirt, or below.

Comment 13 Fabian Deutsch 2023-07-19 13:11:05 UTC

Thanks, Petr.

It would be great if we could remove the uncertainty and understand whether it's a guest configuration problem (in which case, do we need to document anything here?) or a multiqueue problem.