Bug 1960263
Summary: | SR-IOV obliviously reboot the node | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Peng Liu <pliu> |
Component: | Networking | Assignee: | Peng Liu <pliu> |
Networking sub component: | SR-IOV | QA Contact: | zhaozhanqi <zzhao> |
Status: | CLOSED CURRENTRELEASE | Docs Contact: | |
Severity: | high | ||
Priority: | high | CC: | dosmith, keyoung, vlaad, zzhao |
Version: | 4.6 | Keywords: | Reopened |
Target Milestone: | --- | ||
Target Release: | 4.6.z | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | 1960103 | Environment: | |
Last Closed: | 2022-08-26 14:18:24 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1960103 | ||
Bug Blocks: |
Comment 4
zhaozhanqi
2021-06-07 10:52:12 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.34 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2267

Need to backport one more patch which fixes the scenario where a custom MCP is created.

Verified this bug on 4.6.0-202106232234

Create the following YAML files at the same time:

# cat 1g-mc.yaml intel-dpdk.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 50-kargs-1g-hugepages
spec:
  kernelArguments:
    - default_hugepagesz=1G
    - hugepagesz=1G
    - hugepages=4

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: intel-dpdk
  namespace: openshift-sriov-network-operator
spec:
  deviceType: vfio-pci
  mtu: 1700
  nicSelector:
    deviceID: "158b"
    pfNames:
      - ens1f1
    rootDevices:
      - '0000:3b:00.1'
    vendor: '8086'
  nodeSelector:
    feature.node.kubernetes.io/sriov-capable: 'true'
  numVfs: 2
  priority: 99
  resourceName: inteldpdk

# oc logs sriov-network-config-daemon-klpbk | grep MCP
I0628 02:03:05.797224 265835 daemon.go:754] drainNode():MCP worker is not ready: [{RenderDegraded False 2021-06-25 09:39:42 +0000 UTC } {NodeDegraded False 2021-06-25 09:39:47 +0000 UTC } {Degraded False 2021-06-25 09:39:47 +0000 UTC } {Updated False 2021-06-28 02:02:24 +0000 UTC } {Updating True 2021-06-28 02:02:24 +0000 UTC All nodes are updating to rendered-worker-d2dd550696cbfafc253011805efcfe77}], wait...
I0628 02:03:35.801191 265835 daemon.go:754] drainNode():MCP worker is not ready: [{RenderDegraded False 2021-06-25 09:39:42 +0000 UTC } {NodeDegraded False 2021-06-25 09:39:47 +0000 UTC } {Degraded False 2021-06-25 09:39:47 +0000 UTC } {Updated False 2021-06-28 02:02:24 +0000 UTC } {Updating True 2021-06-28 02:02:24 +0000 UTC All nodes are updating to rendered-worker-d2dd550696cbfafc253011805efcfe77}], wait...
I0628 02:04:05.802850 265835 daemon.go:754] drainNode():MCP worker is not ready: [{RenderDegraded False 2021-06-25 09:39:42 +0000 UTC } {NodeDegraded False 2021-06-25 09:39:47 +0000 UTC } {Degraded False 2021-06-25 09:39:47 +0000 UTC } {Updated False 2021-06-28 02:02:24 +0000 UTC } {Updating True 2021-06-28 02:02:24 +0000 UTC All nodes are updating to rendered-worker-d2dd550696cbfafc253011805efcfe77}], wait...
I0628 02:09:41.253031 3422 daemon.go:486] completeDrain(): resume MCP worker

# oc describe node dell-per740-14.rhts.eng.pek2.redhat.com
Name:               dell-per740-14.rhts.eng.pek2.redhat.com
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    feature.node.kubernetes.io/sriov-capable=true
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=dell-per740-14.rhts.eng.pek2.redhat.com
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=
                    node.openshift.io/os_id=rhcos
Annotations:        k8s.ovn.org/l3-gateway-config: {"default":{"mode":"shared","interface-id":"br-ex_dell-per740-14.rhts.eng.pek2.redhat.com","mac-address":"e4:43:4b:5b:6c:28","ip-addresses...
                    k8s.ovn.org/node-chassis-id: 44557c31-ea74-49f4-abae-78316e0dffa3
                    k8s.ovn.org/node-join-subnets: {"default":"100.64.3.0/29"}
                    k8s.ovn.org/node-local-nat-ip: {"default":["169.254.12.7"]}
                    k8s.ovn.org/node-mgmt-port-mac-address: 36:26:ff:f3:a3:8b
                    k8s.ovn.org/node-primary-ifaddr: {"ipv4":"10.73.116.62/23","ipv6":"2620:52:0:4974:928e:6695:d41e:b1a4/64"}
                    k8s.ovn.org/node-subnets: {"default":"10.128.2.0/23"}
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-d2dd550696cbfafc253011805efcfe77
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-d2dd550696cbfafc253011805efcfe77
                    machineconfiguration.openshift.io/reason:
                    machineconfiguration.openshift.io/state: Done
                    sriovnetwork.openshift.io/state: Idle
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Fri, 25 Jun 2021 06:04:40 -0400
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  dell-per740-14.rhts.eng.pek2.redhat.com
  AcquireTime:     <unset>
  RenewTime:       Sun, 27 Jun 2021 22:28:43 -0400
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Sun, 27 Jun 2021 22:24:52 -0400   Sun, 27 Jun 2021 22:08:10 -0400   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Sun, 27 Jun 2021 22:24:52 -0400   Sun, 27 Jun 2021 22:08:10 -0400   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Sun, 27 Jun 2021 22:24:52 -0400   Sun, 27 Jun 2021 22:08:10 -0400   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Sun, 27 Jun 2021 22:24:52 -0400   Sun, 27 Jun 2021 22:08:20 -0400   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  10.73.116.62
  Hostname:    dell-per740-14.rhts.eng.pek2.redhat.com
Capacity:
  cpu:                     32
  ephemeral-storage:       584963052Ki
  hugepages-1Gi:           4Gi
  memory:                  32479680Ki
  openshift.io/inteldpdk:  2
  pods:                    250
Allocatable:
  cpu:                     31500m
  ephemeral-storage:       538028206007
  hugepages-1Gi:           4Gi
  memory:                  27134400Ki
  openshift.io/inteldpdk:  2
  pods:                    250
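
The drainNode() log lines above show the config daemon holding off its drain while the worker MCP reports Updated=False and Updating=True, then resuming once the pool settles. As a rough illustration of that check, the Python sketch below parses the condition dump from one of those log lines and applies an assumed readiness rule (Updated is True, Updating and Degraded are False). The function names, the regular expression, and the rule itself are hypothetical and written for this example; they are not taken from the sriov-network-operator source.

```python
import re
from typing import Dict

# Matches one "{Type Status " entry in the condition dump printed by drainNode().
# The pattern is an assumption based only on the log format shown in this bug.
CONDITION_RE = re.compile(r"\{(\w+) (True|False|Unknown) ")


def parse_conditions(log_line: str) -> Dict[str, bool]:
    """Map each MCP condition type in the log line to True/False.

    A status of "Unknown" is treated as False.
    """
    return {name: status == "True" for name, status in CONDITION_RE.findall(log_line)}


def pool_is_ready(conditions: Dict[str, bool]) -> bool:
    """Assumed rule: the pool is Updated, not Updating, and not Degraded."""
    return (
        conditions.get("Updated", False)
        and not conditions.get("Updating", False)
        and not conditions.get("Degraded", False)
    )


# One of the "not ready" lines from the verification log above.
SAMPLE_LINE = (
    "I0628 02:03:05.797224 265835 daemon.go:754] drainNode():MCP worker is not ready: "
    "[{RenderDegraded False 2021-06-25 09:39:42 +0000 UTC } "
    "{NodeDegraded False 2021-06-25 09:39:47 +0000 UTC } "
    "{Degraded False 2021-06-25 09:39:47 +0000 UTC } "
    "{Updated False 2021-06-28 02:02:24 +0000 UTC } "
    "{Updating True 2021-06-28 02:02:24 +0000 UTC All nodes are updating to "
    "rendered-worker-d2dd550696cbfafc253011805efcfe77}], wait..."
)

if __name__ == "__main__":
    conds = parse_conditions(SAMPLE_LINE)
    print(conds)
    print("ready to drain:", pool_is_ready(conds))
```

Run against the sample line, the sketch reports the pool as not ready, which matches the "wait..." behaviour in the log. A check like this can help when scanning daemon logs to confirm that the SR-IOV drain never overlapped an MCP rollout.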