Bug 2066891
| Summary: | ovnkube-trace fails if the container doesn't have bash shell | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Pablo Alonso Rodriguez <palonsor> |
| Component: | Networking | Assignee: | Andreas Karis <akaris> |
| Networking sub component: | ovn-kubernetes | QA Contact: | Ross Brattain <rbrattai> |
| Status: | CLOSED WONTFIX | Docs Contact: | |
| Severity: | high | | |
| Priority: | low | CC: | rbrattai |
| Version: | 4.7 | Flags: | rbrattai: needinfo- |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2024-04-30 18:04:53 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (Pablo Alonso Rodriguez, 2022-03-22 17:05:13 UTC)
For reference, a command like this seems to work to replace what the code in [2] does:

~~~
oc rsh -c ovnkube-node ovnkube-node-7dxzf bash -c 'ip -o link show "$(chroot /host crictl pods --state ready --namespace "^openshift-dns\$" --name "^dns-default-qk94t\$" -q | head -1 -c15)"' | sed -r -e 's/^.+if([0-9]+):.+$/\1/g'
~~~
Where:
- ovnkube-node-7dxzf is the ovnkube-node pod on the target node (requiring bash here is not a problem because that container is under our control).
- chroot /host: we chroot to the host to get access to CRI-O.
- crictl pods --state ready --namespace "^openshift-dns\$" --name "^dns-default-qk94t\$" -q: this gets the pod ID from CRI-O.
- head -1 -c15: we take the first 15 characters of the CRI-O pod ID, because they match the name of the host-side interface of the veth pair whose pod-side ifindex we want to retrieve.
- ip -o link show: this prints a one-liner with the information for the host-side end of the veth pair. Note that the name is displayed as ${FIRST_15_CHARACTERS_OF_POD_ID}@if${THE_INDEX_WE_WANT_IN_POD_NETWORK_NAMESPACE}.
- sed -r -e 's/^.+if([0-9]+):.+$/\1/g': this extracts the desired ifindex from the @ifXX suffix of the name.
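Putting the pieces together, a minimal sketch of the whole pipeline (same placeholder names as in the example above):

~~~
# Sketch only: the names (ovnkube-node-7dxzf, openshift-dns, dns-default-qk94t)
# are the placeholders from the example above and must be replaced with the
# real target pod. head -1 -c15 is spelled head -n1 | cut -c1-15 here so the
# line and character limits are applied separately and explicitly.
#   1. chroot /host + crictl: resolve the pod sandbox ID from CRI-O
#   2. head/cut: keep its first 15 characters = the host-side veth name
#   3. ip -o link: the one-line output includes the "@ifNN" peer suffix
#   4. sed: strip everything but NN, the ifindex inside the pod's netns
oc rsh -c ovnkube-node ovnkube-node-7dxzf bash -c 'ip -o link show "$(chroot /host crictl pods --state ready --namespace "^openshift-dns\$" --name "^dns-default-qk94t\$" -q | head -n1 | cut -c1-15)"' | sed -r -e 's/^.+if([0-9]+):.+$/\1/g'
~~~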
I am open to any comments on this approach.
Regards.
Fails on HyperShift hosted cluster:

~~~
F0808 17:33:08.168679   34860 ovnkube-trace.go:1049] Failed to get database URIs: cannot find ovnkube pods with container: ovnkube-master
~~~

On HyperShift, ovnkube-node runs as:

~~~
root 3027 0.1 0.7 748276 57316 ? Ssl 17:07 0:02 /usr/bin/ovnkube --init-node ip-10.compute.internal --nb-address ssl:ovnkube-sbdb-clusters-hypershift-ci-13033.apps.o412a11h.qe.devcluster.openshift.com:443 --sb-address ssl:ovnkube-sbdb-clusters-hypershift-ci-13033.apps.o412a11h.qe.devcluster.openshift.com:443 --nb-client-privkey /ovn-cert/tls.key --nb-client-cert /ovn-cert/tls.crt --nb-client-cacert /ovn-ca/ca-bundle.crt --nb-cert-common-name ovn --sb-client-privkey /ovn-cert/tls.key --sb-client-cert /ovn-cert/tls.crt --sb-client-cacert /ovn-ca/ca-bundle.crt --sb-cert-common-name ovn --config-file=/run/ovnkube-config/ovnkube.conf --loglevel 4 --inactivity-probe=180000 --gateway-mode shared --gateway-interface br-ex --metrics-bind-address 127.0.0.1:29103 --ovn-metrics-bind-address 127.0.0.1:29105 --metrics-enable-pprof --export-ovs-metrics --disable-snat-multiple-gws
~~~

I don't think that this ever worked on HyperShift. Can you check whether it works on a non-HyperShift cluster? If so, we'd need an RFE or a bug for ovnkube-trace on HyperShift.

I created https://issues.redhat.com/browse/OCPBUGS-298 to track the HyperShift use case. But indeed, ovnkube-trace was never compatible with HyperShift. The code that it fails on wasn't modified for this BZ:

~~~
func getDatabaseURIs(coreclient *corev1client.CoreV1Client, restconfig *rest.Config, ovnNamespace string) (string, string, bool, error) {
	containerName := "ovnkube-master"
	var err error

	found := false
	var podName string

	listOptions := metav1.ListOptions{}
	pods, err := coreclient.Pods(ovnNamespace).List(context.TODO(), listOptions)
	if err != nil {
		return "", "", false, err
	}
	for _, pod := range pods.Items {
		for _, container := range pod.Spec.Containers {
			if container.Name == containerName {
				found = true
				podName = pod.Name
				break
			}
		}
	}
	if !found {
		klog.V(5).Infof("Cannot find ovnkube pods with container %s", containerName)
		return "", "", false, fmt.Errorf("cannot find ovnkube pods with container: %s", containerName)
	}
~~~

Verified on 4.12.0-0.nightly-2022-09-20-095559
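The failure mode is easy to confirm before running the tool: getDatabaseURIs only scans pod specs for a container named ovnkube-master, which a hosted cluster never runs node-side. A hedged one-liner to inspect what it would find, assuming the default openshift-ovn-kubernetes namespace:

~~~
# Sketch: list all container names ovnkube-trace would scan. On a
# self-managed cluster the output includes ovnkube-master; on a HyperShift
# hosted cluster it does not, so getDatabaseURIs returns the error above.
oc get pods -n openshift-ovn-kubernetes -o jsonpath='{.items[*].spec.containers[*].name}' | tr ' ' '\n' | sort -u
~~~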
Tested on containers created `FROM scratch`
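For context, a scratch image ships no shell at all, which is exactly the case the fix targets. A minimal sketch of building such an image, assuming podman is available; "sleepbin" is a hypothetical placeholder for any statically linked binary:

~~~
# Hypothetical reproduction: FROM scratch provides no /bin/bash, so any
# tooling that execs a shell inside the target container must fail.
cat > Containerfile <<'EOF'
FROM scratch
COPY sleepbin /sleepbin
ENTRYPOINT ["/sleepbin"]
EOF
podman build -t no-shell-test -f Containerfile .
~~~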
test-ovnkube-trace.sh passed, with one issue: worker node matching fails on 3-node all-in-one clusters. The matching step

~~~
oc get nodes --show-labels | awk '!/node-role.kubernetes.io\/master=|node-role.kubernetes.io\/control-plane=/ && $1!="NAME" {print $1}'
~~~

fails on nodes with these labels (a possible fallback is sketched after the listing):
~~~
NAME                                      STATUS   ROLES                         AGE     VERSION           LABELS
master-0-0.o412e1db-0.qe.lab.redhat.com   Ready    control-plane,master,worker   4d22h   v1.24.0+07c9eb7   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=master-0-0.o412e1db-0.qe.lab.redhat.com,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos
master-0-1.o412e1db-0.qe.lab.redhat.com   Ready    control-plane,master,worker   4d22h   v1.24.0+07c9eb7   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=master-0-1.o412e1db-0.qe.lab.redhat.com,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos
master-0-2.o412e1db-0.qe.lab.redhat.com   Ready    control-plane,master,worker   4d22h   v1.24.0+07c9eb7   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=master-0-2.o412e1db-0.qe.lab.redhat.com,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos
~~~
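One hedged way to make the matching robust on such clusters is to select nodes by label instead of excluding roles, falling back to any worker-labeled node when no dedicated workers exist; how this would be wired into test-ovnkube-trace.sh is an assumption:

~~~
# Sketch: prefer nodes that carry only the worker role; on 3-node
# all-in-one clusters every node also has the master/control-plane labels,
# so fall back to any worker-labeled node instead of matching nothing.
workers=$(oc get nodes -l 'node-role.kubernetes.io/worker,!node-role.kubernetes.io/master' -o jsonpath='{.items[*].metadata.name}')
if [ -z "$workers" ]; then
  workers=$(oc get nodes -l 'node-role.kubernetes.io/worker' -o jsonpath='{.items[*].metadata.name}')
fi
echo "$workers"
~~~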
OCP is no longer using Bugzilla and this bug appears to have been left in an orphaned state. If the bug is still relevant, please open a new issue in the OCPBUGS Jira project: https://issues.redhat.com/projects/OCPBUGS/summary