Bug 2062126
Summary: | IPfailover pod is crashing during creation showing keepalived_script doesn't exist | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Melvin Joseph <mjoseph> |
Component: | Networking | Assignee: | Miciah Dashiel Butler Masters <mmasters> |
Networking sub component: | router | QA Contact: | Melvin Joseph <mjoseph> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | medium | CC: | aos-bugs, bperkins, hongli, mjoseph, mmasters |
Version: | 4.11 | Flags: | mjoseph: needinfo- |
Target Milestone: | --- | ||
Target Release: | 4.11.0 | ||
Hardware: | All | ||
OS: | All | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: |
Cause: If the user does not specify a network interface name, keepalived-ipfailover tries a few default names. This list of default names is hard-coded and previously included only the following values: "enp0s3", "enp0s8", and "eth1". If the host uses predictable network interface names as assigned by systemd/udev and the host has a network interface in a PCI Express hotplug slot, then the network interface is given a name of the form "ens3", in which case none of these default names matches it.
Consequence: keepalived-ipfailover failed to start if the user didn't specify a network interface name and the network interface was in a PCI Express hotplug slot.
Fix: The list of default names in keepalived-ipfailover was augmented by adding "ens3" to the end of the list.
Result: keepalived-ipfailover now checks for an "ens3" network interface, increasing the likelihood that keepalived-ipfailover finds a network interface in a PCI Express hotplug slot when the user does not specify which network interface name to use. Because this fix only changes the defaulting logic and adds to the end of the list of default names, it does not affect the behavior when the user specifies a name or when one of the other default names matches a network interface.
|
Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2022-08-10 10:52:44 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Melvin Joseph
2022-03-09 08:39:56 UTC
Setting blocker- as this appears to be a configuration issue. The "default user 'keepalived_script' for script execution does not exist" message is only a warning. According to <https://www.keepalived.org/manpage.html>, keepalived defaults to the user "keepalived_script" if it exists and otherwise defaults to the user that keepalived is already running as. keepalived probably exits because the "enp0s3" interface doesn't exist. Does the host have an "enp0s3" interface? If not, try setting OPENSHIFT_HA_NETWORK_INTERFACE in the ipfailover deployment to the correct interface name.

melvinjoseph@mjoseph-mac Downloads % oc set env deploy/ipfailover OPENSHIFT_HA_NETWORK_INTERFACE=ens3
deployment.apps/ipfailover updated
melvinjoseph@mjoseph-mac Downloads % oc get po
NAME                          READY   STATUS    RESTARTS   AGE
ipfailover-796d85684d-6dlb4   1/1     Running   0          20s
ipfailover-796d85684d-7dtx2   1/1     Running   0          20s

When I changed the interface to ens3 in the ipfailover config file, the pods came up. I have not seen this issue in the past when using this deploy script, and I was verifying an ipfailover bug at the time, so I thought the fix had broken the feature.

The naming difference is related to the underlying hardware. The naming system that systemd/udev uses to name network interfaces is defined at <https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/>:

> The following different naming schemes for network interfaces are now supported by udev natively:
>
> 1. Names incorporating Firmware/BIOS provided index numbers for on-board devices (example: eno1)
> 2. Names incorporating Firmware/BIOS provided PCI Express hotplug slot index numbers (example: ens1)
> 3. Names incorporating physical/geographical location of the connector of the hardware (example: enp2s0)
> 4. Names incorporating the interfaces's MAC address (example: enx78e7d1ea46da)
> 5. Classic, unpredictable kernel-native ethX naming (example: eth0)

So apparently the previous clusters on which you tested keepalived-ipfailover used scheme 3 (resulting in the name "enp0s3"), but for more recent tests, you have a cluster with PCIe hotplug slots, which are named using scheme 2 (resulting in the name "ens3").

The keepalived-ipfailover configuration script guesses a few names if the user doesn't configure one:

> VBOX_INTERFACES="enp0s3 enp0s8 eth1"

Source: <https://github.com/openshift/images/blob/86494446733fc171ee757e8166191e32d5931eb9/ipfailover/keepalived/lib/utils.sh#L6>.

> function get_network_device() {
>   for dev in $1 ${VBOX_INTERFACES}; do
>     if ip addr show dev "$dev" &> /dev/null; then
>       echo "$dev"
>       return
>     fi
>   done
>
>   ip route get 8.8.8.8 | awk '/dev/ { f=NR }; f && (NR-1 == f)' RS=" "
> }

Source: <https://github.com/openshift/images/blob/86494446733fc171ee757e8166191e32d5931eb9/ipfailover/keepalived/lib/utils.sh#L246-L255>.

We could add "ens3" to the end of VBOX_INTERFACES to autodetect this name as well. Is that desirable?
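A minimal sketch of the proposed lookup order, assuming "ens3" is simply appended to VBOX_INTERFACES as described in the Doc Text. `pick_interface` is a hypothetical helper, not part of utils.sh: instead of probing the host with `ip addr show dev`, it checks the candidate names against a list of existing interface names passed as arguments, so the ordering can be exercised without a real host. The default-route fallback (`ip route get 8.8.8.8`) is left out.

```shell
#!/usr/bin/env bash
# Sketch only: models the lookup order of get_network_device() in
# ipfailover/keepalived/lib/utils.sh with "ens3" appended to the
# default list. This is not the shipped code.

# Default guesses; "ens3" goes last so that hosts where one of the
# original names ("enp0s3", "enp0s8", "eth1") exists keep that match.
VBOX_INTERFACES="enp0s3 enp0s8 eth1 ens3"

# pick_interface REQUESTED AVAILABLE...
# Hypothetical helper: REQUESTED is the user-supplied name (may be
# empty); AVAILABLE stands in for the interfaces that exist on the host.
pick_interface() {
  local requested="$1"
  shift
  local dev avail
  # $requested is intentionally unquoted, mirroring "$1" in the
  # original loop: an empty value contributes no candidate.
  for dev in $requested ${VBOX_INTERFACES}; do
    for avail in "$@"; do
      if [ "$dev" = "$avail" ]; then
        echo "$dev"
        return 0
      fi
    done
  done
  # The real script would fall back to the default-route interface;
  # this sketch only reports that no candidate matched.
  return 1
}
```

For example, `pick_interface "" lo ens3` prints `ens3`, while `pick_interface "" lo enp0s3 ens3` still prints `enp0s3`, and a user-supplied name such as `eth0` wins whenever it exists. Appending rather than prepending is what keeps the result unchanged on hosts where a user-supplied name or one of the original defaults already matches.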
melvinjoseph@mjoseph-mac Downloads % oc get clusterversion
NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.ci.test-2022-03-12-022506-ci-ln-rttk652-latest   True        False         70m     Cluster version is 4.10.0-0.ci.test-2022-03-12-022506-ci-ln-rttk652-latest
melvinjoseph@mjoseph-mac Downloads % oc create sa ipfailover
serviceaccount/ipfailover created
melvinjoseph@mjoseph-mac Downloads % oc adm policy add-scc-to-user priviledged -z ipfailover
clusterrole.rbac.authorization.k8s.io/system:openshift:scc:priviledged added: "ipfailover"
melvinjoseph@mjoseph-mac Downloads % oc adm policy add-scc-to-user hostnetwork -z ipfailover
clusterrole.rbac.authorization.k8s.io/system:openshift:scc:hostnetwork added: "ipfailover"
melvinjoseph@mjoseph-mac Downloads % oc create configmap keepalived-checkscript --from-file=mycheckscript.sh
configmap/keepalived-checkscript created
melvinjoseph@mjoseph-mac Downloads % oc create -f https://github.com/jechen0648/ipfailover/blob/main/deploy-ipfailover.yaml
deployment.apps/ipfailover created
melvinjoseph@mjoseph-mac Downloads % oc get all -owide
NAME                              READY   STATUS    RESTARTS   AGE   IP           NODE                                 NOMINATED NODE   READINESS GATES
pod/ipfailover-788b595477-cd672   1/1     Running   0          18s   10.0.3.139   rttk652-b5564-vnk62-worker-0-t4v78   <none>           <none>
pod/ipfailover-788b595477-xblc5   1/1     Running   0          18s   10.0.1.135   rttk652-b5564-vnk62-worker-0-m77tv   <none>           <none>

NAME                 TYPE           CLUSTER-IP   EXTERNAL-IP                            PORT(S)   AGE   SELECTOR
service/kubernetes   ClusterIP      172.30.0.1   <none>                                 443/TCP   93m   <none>
service/openshift    ExternalName   <none>       kubernetes.default.svc.cluster.local   <none>    89m   <none>

NAME                         READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS              IMAGES   SELECTOR
deployment.apps/ipfailover   2/2     2            2           20s   ipfailover-keepalived   quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:644bf2d63cc24035ec82a39e0b14e6d61e3ca4ba39181b409590132f59bfc2cf   ipfailover=ipfailover

NAME                                    DESIRED   CURRENT   READY   AGE   CONTAINERS              IMAGES   SELECTOR
replicaset.apps/ipfailover-788b595477   2         2         2       19s   ipfailover-keepalived   quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:644bf2d63cc24035ec82a39e0b14e6d61e3ca4ba39181b409590132f59bfc2cf   ipfailover=ipfailover,pod-template-hash=788b595477
melvinjoseph@mjoseph-mac Downloads % oc logs ipfailover-788b595477-cd672
  - Loading ip_vs module ...
  - Checking if ip_vs module is available ...
ip_vs                 172032  0
  - Module ip_vs is loaded.
  - check for iptables rule for keepalived multicast (224.0.0.18) ...
chroot: cannot change root directory to '/host': No such file or directory
  - adding iptables rule to INPUT to access 224.0.0.18.
chroot: cannot change root directory to '/host': No such file or directory
  - Generating and writing config to /etc/keepalived/keepalived.conf
  - Starting failover services ...
Sat Mar 12 04:25:56 2022: Starting Keepalived v2.1.5 (07/13,2020)
Sat Mar 12 04:25:56 2022: Running on Linux 4.18.0-305.40.2.el8_4.x86_64 #1 SMP Tue Mar 8 14:29:54 EST 2022 (built for Linux 4.18.0)
Sat Mar 12 04:25:56 2022: Command line: '/usr/sbin/keepalived' '-D' '-n' '--log-console'
Sat Mar 12 04:25:56 2022: Opening file '/etc/keepalived/keepalived.conf'.
Sat Mar 12 04:25:56 2022: NOTICE: setting config option max_auto_priority should result in better keepalived performance
Sat Mar 12 04:25:56 2022: Starting VRRP child process, pid=74
Sat Mar 12 04:25:56 2022: Registering Kernel netlink reflector
Sat Mar 12 04:25:56 2022: Registering Kernel netlink command channel
Sat Mar 12 04:25:56 2022: Opening file '/etc/keepalived/keepalived.conf'.
Sat Mar 12 04:25:56 2022: WARNING - default user 'keepalived_script' for script execution does not exist - please create.
Sat Mar 12 04:25:56 2022: (/etc/keepalived/keepalived.conf: Line 29) Truncating auth_pass to 8 characters
Sat Mar 12 04:25:56 2022: SECURITY VIOLATION - scripts are being executed but script_security not enabled.
Sat Mar 12 04:25:56 2022: (ipfailover_VIP_1) Warning - nopreempt will not work with initial state MASTER - clearing
Sat Mar 12 04:25:56 2022: Assigned address 10.0.3.139 for interface ens3
Sat Mar 12 04:25:56 2022: Assigned address fe80::f7c0:638d:a465:aba7 for interface ens3
Sat Mar 12 04:25:56 2022: Registering gratuitous ARP shared channel
Sat Mar 12 04:25:56 2022: (ipfailover_VIP_1) removing VIPs.
Sat Mar 12 04:25:56 2022: VRRP sockpool: [ifindex( 2), family(IPv4), proto(112), fd(10,11)]
Sat Mar 12 04:25:56 2022: Script `chk_ipfailover` now returning 1
Sat Mar 12 04:25:56 2022: VRRP_Script(chk_ipfailover) failed (exited with status 1)
Sat Mar 12 04:25:56 2022: (ipfailover_VIP_1) Entering FAULT STATE

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
|