Bug 1962638
| Summary: | missing secret for sriov-network-config-daemon after upgrade from OCP 4.5.16 to 4.6.17 | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Andreas Karis <akaris> |
| Component: | Networking | Assignee: | Peng Liu <pliu> |
| Networking sub component: | SR-IOV | QA Contact: | zhaozhanqi <zzhao> |
| Status: | CLOSED NEXTRELEASE | Docs Contact: | |
| Severity: | unspecified | | |
| Priority: | unspecified | CC: | dansmall, gdiotte, pibanezr |
| Version: | 4.6 | | |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | Telco | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-06-24 06:11:48 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Andreas Karis
2021-05-20 12:41:56 UTC
@akaris The file '/var/run/secrets/kubernetes.io/serviceaccount/token' should be injected into the pod by Kubernetes automatically. Since you were doing an upgrade, can you check the status of the MCP with 'oc get mcp'? Also, please check whether any other pods on the same node report the same error.

The requested 'oc get mcp' output can be found below. Also of note: this has only been observed on this specific sriov-network-config-daemon pod on this worker. It appears to be tied to reboots of this node, as a pull-secret change that triggered a reboot reproduced the issue.
Node: worker-06 (0020-sosreport-worker-06-2021-05-18-qcorose.tar.xz)
~~~
[akaris@supportshell 02943641]$ omg get nodes --show-labels | grep loadbalancer | awk '{print $1}'
[WARN] Skipped 2/489 lines from the end of master-0.yaml to the load the yaml file properly
[WARN] Skipped 4/927 lines from the end of worker-04.yaml to the load the yaml file properly
worker-06
worker-07
~~~
~~~
[akaris@supportshell 02943641]$ omg get machineconfigpool
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
loadbalancer rendered-loadbalancer-e67038cac0e0c6059a2a452518ae1085 True False False 2 2 2 0 241d
master rendered-master-42a372c895c06a4bda9512763a899f20 True False False 3 3 3 0 241d
worker rendered-worker-b516ea73de067359e6f3d7cf3fe4d627 True False False 6 6 6 0 241d
~~~
~~~
[akaris@supportshell oc_adm_inspect_ns_openshift-sriov-network-operator]$ grep -R nodeName: inspect.local.8920316685580493059/namespaces/openshift-sriov-network-operator/pods/sriov*
inspect.local.8920316685580493059/namespaces/openshift-sriov-network-operator/pods/sriov-cni-559cn/sriov-cni-559cn.yaml: nodeName: worker-06
inspect.local.8920316685580493059/namespaces/openshift-sriov-network-operator/pods/sriov-cni-cftwr/sriov-cni-cftwr.yaml: nodeName: worker-07
inspect.local.8920316685580493059/namespaces/openshift-sriov-network-operator/pods/sriov-device-plugin-6kt5r/sriov-device-plugin-6kt5r.yaml: nodeName: worker-06
inspect.local.8920316685580493059/namespaces/openshift-sriov-network-operator/pods/sriov-device-plugin-q58q5/sriov-device-plugin-q58q5.yaml: nodeName: worker-07
inspect.local.8920316685580493059/namespaces/openshift-sriov-network-operator/pods/sriov-network-config-daemon-46tb4/sriov-network-config-daemon-46tb4.yaml: nodeName: worker-06
inspect.local.8920316685580493059/namespaces/openshift-sriov-network-operator/pods/sriov-network-config-daemon-6lml7/sriov-network-config-daemon-6lml7.yaml: nodeName: worker-07
inspect.local.8920316685580493059/namespaces/openshift-sriov-network-operator/pods/sriov-network-config-daemon-bqrlx/sriov-network-config-daemon-bqrlx.yaml: nodeName: worker-02
inspect.local.8920316685580493059/namespaces/openshift-sriov-network-operator/pods/sriov-network-config-daemon-hvvxl/sriov-network-config-daemon-hvvxl.yaml: nodeName: worker-01
inspect.local.8920316685580493059/namespaces/openshift-sriov-network-operator/pods/sriov-network-config-daemon-mnhj9/sriov-network-config-daemon-mnhj9.yaml: nodeName: worker-00
inspect.local.8920316685580493059/namespaces/openshift-sriov-network-operator/pods/sriov-network-config-daemon-q5bqg/sriov-network-config-daemon-q5bqg.yaml: nodeName: worker-04
inspect.local.8920316685580493059/namespaces/openshift-sriov-network-operator/pods/sriov-network-config-daemon-rlwcv/sriov-network-config-daemon-rlwcv.yaml: nodeName: worker-03
inspect.local.8920316685580493059/namespaces/openshift-sriov-network-operator/pods/sriov-network-config-daemon-x9ssn/sriov-network-config-daemon-x9ssn.yaml: nodeName: worker-05
inspect.local.8920316685580493059/namespaces/openshift-sriov-network-operator/pods/sriov-network-operator-5879fb4869-v4xdn/sriov-network-operator-5879fb4869-v4xdn.yaml: nodeName: master-2
~~~
~~~
[akaris@supportshell 02943641]$ omg get mcp loadbalancer -o yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
creationTimestamp: '2020-09-16T20:37:17Z'
generation: 27
name: loadbalancer
resourceVersion: '385038868'
selfLink: /apis/machineconfiguration.openshift.io/v1/machineconfigpools/loadbalancer
uid: 94bc148b-4701-456d-8a77-8de94fe99f5d
spec:
configuration:
name: rendered-loadbalancer-e67038cac0e0c6059a2a452518ae1085
source:
- apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
name: 00-worker
- apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
name: 01-worker-container-runtime
- apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
name: 01-worker-kubelet
- apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
name: 05-hugepages-kernelarg
- apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
name: 06-blacklist-sctp-module
- apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
name: 11-worker-bonding
- apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
name: 12-worker-sssd
- apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
name: 15-load-eric-amf-modules
- apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
name: 50-sshd-crypto-worker
- apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
name: 50-worker-idmap
- apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
name: 99-coredns-override-worker
- apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
name: 99-loadbalancer-kernelarg-nosmt
- apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
name: 99-worker-generated-crio-capabilities
- apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
name: 99-worker-generated-registries
- apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
name: 99-worker-ssh
machineConfigSelector:
matchExpressions:
- key: machineconfiguration.openshift.io/role
operator: In
values:
- worker
- load-balancer
nodeSelector:
matchLabels:
node-role.kubernetes.io/load-balancer: ''
paused: false
status:
conditions:
- lastTransitionTime: '2020-09-16T20:37:55Z'
message: ''
reason: ''
status: 'False'
type: NodeDegraded
- lastTransitionTime: '2021-05-02T00:03:29Z'
message: ''
reason: ''
status: 'False'
type: RenderDegraded
- lastTransitionTime: '2021-05-02T00:03:34Z'
message: ''
reason: ''
status: 'False'
type: Degraded
- lastTransitionTime: '2021-05-11T19:30:44Z'
message: All nodes are updated with rendered-loadbalancer-e67038cac0e0c6059a2a452518ae1085
reason: ''
status: 'True'
type: Updated
- lastTransitionTime: '2021-05-11T19:30:44Z'
message: ''
reason: ''
status: 'False'
type: Updating
configuration:
name: rendered-loadbalancer-e67038cac0e0c6059a2a452518ae1085
source:
- apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
name: 00-worker
- apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
name: 01-worker-container-runtime
- apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
name: 01-worker-kubelet
- apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
name: 05-hugepages-kernelarg
- apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
name: 06-blacklist-sctp-module
- apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
name: 11-worker-bonding
- apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
name: 12-worker-sssd
- apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
name: 15-load-eric-amf-modules
- apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
name: 50-sshd-crypto-worker
- apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
name: 50-worker-idmap
- apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
name: 99-coredns-override-worker
- apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
name: 99-loadbalancer-kernelarg-nosmt
- apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
name: 99-worker-generated-crio-capabilities
- apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
name: 99-worker-generated-registries
- apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
name: 99-worker-ssh
degradedMachineCount: 0
machineCount: 2
observedGeneration: 27
readyMachineCount: 2
unavailableMachineCount: 0
updatedMachineCount: 2
~~~
~~~
[akaris@supportshell 02943641]$ omg get machineconfig -A
NAME GENERATEDBYCONTROLLER IGNITIONVERSION AGE
00-master fc2e69c4408d898b24760eea9e889f0673369e67 3.1.0 241d
00-worker fc2e69c4408d898b24760eea9e889f0673369e67 3.1.0 241d
01-master-container-runtime fc2e69c4408d898b24760eea9e889f0673369e67 3.1.0 241d
01-master-kubelet fc2e69c4408d898b24760eea9e889f0673369e67 3.1.0 241d
01-worker-container-runtime fc2e69c4408d898b24760eea9e889f0673369e67 3.1.0 241d
01-worker-kubelet fc2e69c4408d898b24760eea9e889f0673369e67 3.1.0 241d
05-hugepages-kernelarg 2.2.0 241d
06-blacklist-sctp-module 2.2.0 219d
11-master-bonding 2.2.0 241d
11-worker-bonding 2.2.0 241d
12-master-sssd 2.2.0 241d
12-worker-sssd 2.2.0 241d
15-load-eric-amf-modules 2.2.0 219d
50-sshd-crypto-master 2.2.0 151d
50-sshd-crypto-worker 2.2.0 151d
50-worker-idmap 2.2.0 151d
99-coredns-override-master 3.1.0 6d
99-coredns-override-worker 3.1.0 6d
99-loadbalancer-kernelarg-nosmt 2.2.0 241d
99-master-generated-crio-capabilities 2.2.0 151d
99-master-generated-registries fc2e69c4408d898b24760eea9e889f0673369e67 3.1.0 6d
99-master-ssh 2.2.0 241d
99-worker-generated-crio-capabilities 2.2.0 151d
99-worker-generated-registries fc2e69c4408d898b24760eea9e889f0673369e67 3.1.0 6d
99-worker-ssh 2.2.0 241d
rendered-loadbalancer-082d32812ca9768cdee577980a8103f4 287dd2cfa692ecbbce7b3bc1913b99b3e2d2f5c7 2.2.0 151d
rendered-loadbalancer-13a8335f2046645d727637f5ee2c72f2 cdce2822a6b3bff31b5aafc23b773f7dcbea2caa 2.2.0 151d
rendered-loadbalancer-13d976bfebd30a32a75f701aa54c3096 601c2285f497bf7c73d84737b9977a0e697cb86a 2.2.0 219d
rendered-loadbalancer-1a7a6f58ac5b36732e0e07f5c6d3e24a cdce2822a6b3bff31b5aafc23b773f7dcbea2caa 2.2.0 16d
rendered-loadbalancer-205302633a5c29a3eee35a5ec330ebdf 480accd5d4f631d34e560aa5c8a3dfab0c7bbe27 2.2.0 219d
rendered-loadbalancer-42cb1af1078712cf191710afa588e4b2 cdce2822a6b3bff31b5aafc23b773f7dcbea2caa 2.2.0 16d
rendered-loadbalancer-45cf8b5a4ff8fde713c3b5b02c207a0c cdce2822a6b3bff31b5aafc23b773f7dcbea2caa 2.2.0 151d
rendered-loadbalancer-6bb55ae458d3c56eda7d3d789c7c4bcb fc2e69c4408d898b24760eea9e889f0673369e67 3.1.0 6d
rendered-loadbalancer-718eec92432e91c77e644525f8552c9f 480accd5d4f631d34e560aa5c8a3dfab0c7bbe27 2.2.0 241d
rendered-loadbalancer-d8d5c6141f75c9e80b1b05473dba8cc5 cdce2822a6b3bff31b5aafc23b773f7dcbea2caa 2.2.0 151d
rendered-loadbalancer-e67038cac0e0c6059a2a452518ae1085 fc2e69c4408d898b24760eea9e889f0673369e67 3.1.0 6d
rendered-loadbalancer-ec7c478427ea92c589d4d7eccac50b3e 601c2285f497bf7c73d84737b9977a0e697cb86a 2.2.0 194d
rendered-master-0606b8cd9cb3a1328dc1baeb511bea76 601c2285f497bf7c73d84737b9977a0e697cb86a 2.2.0 219d
rendered-master-42a372c895c06a4bda9512763a899f20 fc2e69c4408d898b24760eea9e889f0673369e67 3.1.0 6d
rendered-master-52b76832aadf250ab2ac67b450d44164 cdce2822a6b3bff31b5aafc23b773f7dcbea2caa 2.2.0 16d
rendered-master-57e1dae84829f1ad67e80feb8560d24c 287dd2cfa692ecbbce7b3bc1913b99b3e2d2f5c7 2.2.0 151d
rendered-master-62977f3ca03c2ed0be287ca6649975a0 cdce2822a6b3bff31b5aafc23b773f7dcbea2caa 2.2.0 151d
rendered-master-96189514f64425ea300a36634233b8e3 cdce2822a6b3bff31b5aafc23b773f7dcbea2caa 2.2.0 16d
rendered-master-a19f36ea563c2586a02e18250cdbac08 cdce2822a6b3bff31b5aafc23b773f7dcbea2caa 2.2.0 151d
rendered-master-a4a5a77c90c26e6b43590edae3abd8e9 480accd5d4f631d34e560aa5c8a3dfab0c7bbe27 2.2.0 241d
rendered-master-acef0d1f36a94836c42895441f85c866 cdce2822a6b3bff31b5aafc23b773f7dcbea2caa 2.2.0 6d
rendered-master-c29a5491507884db73e0a3cfacf7bb28 fc2e69c4408d898b24760eea9e889f0673369e67 3.1.0 6d
rendered-master-d01f03e10cf5f4155acf642ef166b71d 601c2285f497bf7c73d84737b9977a0e697cb86a 2.2.0 194d
rendered-worker-17d2b9d62c511967a453d310cbbd36ec 601c2285f497bf7c73d84737b9977a0e697cb86a 2.2.0 194d
rendered-worker-577409a7776da128db21bebab94dfe30 480accd5d4f631d34e560aa5c8a3dfab0c7bbe27 2.2.0 241d
rendered-worker-581ce191b1a159ff439b26b3f62eae11 fc2e69c4408d898b24760eea9e889f0673369e67 3.1.0 6d
rendered-worker-73ab53151ec3d7ca201b15376fc0d612 287dd2cfa692ecbbce7b3bc1913b99b3e2d2f5c7 2.2.0 151d
rendered-worker-755eb578712a7ba1d27d24219dc5140a cdce2822a6b3bff31b5aafc23b773f7dcbea2caa 2.2.0 16d
rendered-worker-a77ffedbcca3cf125b0f67aebd523e85 cdce2822a6b3bff31b5aafc23b773f7dcbea2caa 2.2.0 151d
rendered-worker-b25d232de2c0579357fe8c1075a8e324 601c2285f497bf7c73d84737b9977a0e697cb86a 2.2.0 219d
rendered-worker-b516ea73de067359e6f3d7cf3fe4d627 fc2e69c4408d898b24760eea9e889f0673369e67 3.1.0 6d
rendered-worker-e83002e9c5b12547b429517b338e56be cdce2822a6b3bff31b5aafc23b773f7dcbea2caa 2.2.0 16d
rendered-worker-fa9440b6170bfeb7a801931bc56a30f1 480accd5d4f631d34e560aa5c8a3dfab0c7bbe27 2.2.0 219d
rendered-worker-faa847eb307c77e0edf12e109fa68391 cdce2822a6b3bff31b5aafc23b773f7dcbea2caa 2.2.0 151d
Could you help collect the kubelet log of that node? I think I may be able to find the root cause there. Can you also upgrade the SR-IOV operator to 4.6 and see whether the issue is resolved? I believe you hit a bug in the 4.5 code which has been fixed in 4.6.

Hi, I think the issue can no longer be reproduced for this case. Can you point out the code section you believe is the culprit, and ideally the patch, and we'll relay that to the customer? Thanks so much! - Andreas

Hi Peng, new attachments have been added to the Salesforce case that include the kubelet logs you requested. Cheers, Dan

@akaris In the 4.5 code, https://github.com/openshift/sriov-network-operator/blob/7637810f42a401af61095dbed107101beb774170/pkg/plugins/generic/generic_plugin.go#L121, we chroot into the host root path with `utils.Chroot`. Normally, if `utils.SyncNodeState` returns no error, the returned `exit()` function is invoked and chroots back into the pod's own root path, where /var/run/secrets/kubernetes.io/serviceaccount/token is mounted. In your case, however, an error was returned, so the `exit()` call was skipped and the process could no longer find /var/run/secrets/kubernetes.io/serviceaccount/token. In 4.6, the logic was changed to ensure that `exit()` is always invoked (see the sketch at the end of this thread).

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.
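For illustration only, here is a minimal Go sketch of the failure pattern described in the last comment. The helper names (`chrootInto`, `syncNodeState`, `applyBuggy`, `applyFixed`) and the `/host` path are hypothetical stand-ins, not the operator's actual API; the real code lives in the generic_plugin.go file linked above. The point is that an early `return` on error skips the call that chroots back into the pod's root, while deferring that call (as the 4.6 logic effectively does) restores the pod's root, and with it the mounted service account token, on every path.
~~~
package main

import (
	"fmt"
	"syscall"
)

// chrootInto switches the process root to path and returns a function that
// restores the original root. Hypothetical helper, loosely modeled on the
// operator's utils.Chroot; not the actual implementation.
func chrootInto(path string) (exit func() error, err error) {
	root, err := syscall.Open("/", syscall.O_RDONLY, 0)
	if err != nil {
		return nil, err
	}
	if err := syscall.Chroot(path); err != nil {
		syscall.Close(root)
		return nil, err
	}
	_ = syscall.Chdir("/") // start from the new root's top level

	return func() error {
		defer syscall.Close(root)
		// Escape back to the original root via the saved directory fd.
		if err := syscall.Fchdir(root); err != nil {
			return err
		}
		return syscall.Chroot(".")
	}, nil
}

// syncNodeState stands in for utils.SyncNodeState; assume it can fail.
func syncNodeState() error { return fmt.Errorf("simulated sync failure") }

// applyBuggy mirrors the 4.5-style flow: on error it returns before calling
// exit(), leaving the process chrooted into the host root, so pod-only paths
// such as /var/run/secrets/kubernetes.io/serviceaccount/token disappear.
func applyBuggy() error {
	exit, err := chrootInto("/host")
	if err != nil {
		return err
	}
	if err := syncNodeState(); err != nil {
		return err // BUG: exit() is skipped, process stays chrooted to /host
	}
	return exit()
}

// applyFixed mirrors the 4.6-style fix: defer guarantees exit() always runs,
// so the pod's own root (and the service account token mount) is restored
// even when syncNodeState fails.
func applyFixed() error {
	exit, err := chrootInto("/host")
	if err != nil {
		return err
	}
	defer exit()
	return syncNodeState()
}

func main() {
	fmt.Println("buggy:", applyBuggy())
	fmt.Println("fixed:", applyFixed())
}
~~~
Under these assumptions, running the sketch as root on a host with a /host mount would leave the process chrooted into /host after applyBuggy fails, while applyFixed returns with the original root restored.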