Bug 2099794
| Field | Value |
|---|---|
| Summary | crio occasionally fails to start during deployment |
| Product | Red Hat Enterprise Linux 8 |
| Component | NetworkManager |
| Version | 8.6 |
| Hardware | Unspecified |
| OS | Unspecified |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | medium |
| Reporter | Alexander Chuzhoy <sasha> |
| Assignee | Beniamino Galvani <bgalvani> |
| QA Contact | David Jaša <djasa> |
| CC | bgalvani, dornelas, jligon, lrintel, mfilanov, mko, mrussell, nmagnezi, nstielau, otuchfel, pehunt, pmaidmen, rkhan, rravaiol, rravanel, sfaye, sizucchi, sukulkar, thaller, till, toneata, trwest, vbenes, wking |
| Target Milestone | rc |
| Target Release | --- |
| Keywords | TestBlocker, Triaged, ZStream |
| Flags | rravaiol: needinfo- |
| Fixed In Version | NetworkManager-1.40.2-1.el8 |
| Doc Type | No Doc Update |
| Story Points | --- |
| Clones | 2123412, 2132059, 2132281, 2132282 |
| Bug Blocks | 2123412, 2132059, 2132281, 2132282 |
| Last Closed | 2023-05-16 09:04:54 UTC |
| Type | Bug |
| Regression | --- |
Description
Alexander Chuzhoy
2022-06-21 17:21:13 UTC
*** Bug 2101490 has been marked as a duplicate of this bug. ***

Hey @pehunt, here are the full logs:

Jul 11 19:44:38 spoke-rwn-0-1 systemd[1]: Starting Container Runtime Interface for OCI (CRI-O)...
Jul 11 19:44:39 spoke-rwn-0-1 crio[2144]: time="2022-07-11 19:44:39.013451971Z" level=info msg="Starting CRI-O, version: 1.24.1-11.rhaos4.11.gitb0d2ef3.el8, git: ()"
Jul 11 19:44:39 spoke-rwn-0-1 crio[2144]: time="2022-07-11 19:44:39.014017663Z" level=info msg="Node configuration value for hugetlb cgroup is true"
Jul 11 19:44:39 spoke-rwn-0-1 crio[2144]: time="2022-07-11 19:44:39.014108765Z" level=info msg="Node configuration value for pid cgroup is true"
Jul 11 19:44:39 spoke-rwn-0-1 crio[2144]: time="2022-07-11 19:44:39.014193717Z" level=info msg="Node configuration value for memoryswap cgroup is true"
Jul 11 19:44:39 spoke-rwn-0-1 crio[2144]: time="2022-07-11 19:44:39.014237574Z" level=info msg="Node configuration value for cgroup v2 is false"
Jul 11 19:44:39 spoke-rwn-0-1 crio[2144]: time="2022-07-11 19:44:39.022080843Z" level=info msg="Node configuration value for systemd CollectMode is true"
Jul 11 19:44:39 spoke-rwn-0-1 crio[2144]: time="2022-07-11 19:44:39.029460658Z" level=info msg="Node configuration value for systemd AllowedCPUs is true"
Jul 11 19:44:39 spoke-rwn-0-1 crio[2144]: time="2022-07-11 19:44:39.031461785Z" level=info msg="Using default capabilities: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FSETID, CAP_FOWNER, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_NET_BIND_SERVICE, CAP_KILL"
Jul 11 19:44:39 spoke-rwn-0-1 crio[2144]: time="2022-07-11 19:44:39.035196028Z" level=info msg="Conmon does support the --sync option"
Jul 11 19:44:39 spoke-rwn-0-1 crio[2144]: time="2022-07-11 19:44:39.035272468Z" level=info msg="Conmon does support the --log-global-size-max option"
Jul 11 19:44:39 spoke-rwn-0-1 crio[2144]: time="2022-07-11 19:44:39.036460758Z" level=info msg="Using seccomp default profile when unspecified: true"
Jul 11 19:44:39 spoke-rwn-0-1 crio[2144]: time="2022-07-11 19:44:39.036728910Z" level=info msg="No seccomp profile specified, using the internal default"
Jul 11 19:44:39 spoke-rwn-0-1 crio[2144]: time="2022-07-11 19:44:39.036787570Z" level=info msg="AppArmor is disabled by the system or at CRI-O build-time"
Jul 11 19:44:39 spoke-rwn-0-1 crio[2144]: time="2022-07-11 19:44:39.036829129Z" level=info msg="No blockio config file specified, blockio not configured"
Jul 11 19:44:39 spoke-rwn-0-1 crio[2144]: time="2022-07-11 19:44:39.036878867Z" level=info msg="RDT not available in the host system"
Jul 11 19:44:39 spoke-rwn-0-1 crio[2144]: time="2022-07-11 19:44:39.037042906Z" level=info msg="Updated default CNI network name to "
Jul 11 19:44:39 spoke-rwn-0-1 crio[2144]: time="2022-07-11 19:44:39.067318702Z" level=warning msg="Error encountered when checking whether cri-o should wipe images: version file /var/lib/crio/version not found: open /var/lib/crio/version: no such file or directory"
Jul 11 19:44:39 spoke-rwn-0-1 crio[2144]: time="2022-07-11 19:44:39.068212424Z" level=fatal msg="Failed to start streaming server: listen tcp [fd2e:6f44:5dd8:6::38]:10010: bind: cannot assign requested address"
Jul 11 19:44:39 spoke-rwn-0-1 systemd[1]: crio.service: Main process exited, code=exited, status=1/FAILURE
Jul 11 19:44:39 spoke-rwn-0-1 systemd[1]: crio.service: Failed with result 'exit-code'.
Jul 11 19:44:39 spoke-rwn-0-1 systemd[1]: Failed to start Container Runtime Interface for OCI (CRI-O).
Jul 11 19:44:39 spoke-rwn-0-1 systemd[1]: crio.service: Consumed 78ms CPU time

We often run into this issue. It looks like the cri-o that was previously running is still somehow holding the port open. Are we sure there aren't two cri-o instances running at the same time? How does bootstrap-in-place work with respect to kubelet/cri-o?

The error for binding a port that is already in use is "bind: address already in use", which is not the error we see in the logs here. However, the error when attempting to bind to an address that is not present on any interface appears to be the same as the one in the logs: "bind: cannot assign requested address". (A minimal sketch illustrating the difference follows the reproduction steps below.)

------------------------------------------------------------------------------------------------------------------
Steps to reproduce:
------------------------------------------------------------------------------------------------------------------
Using a Beaker machine and assisted-test-infra:

TEST_TEARDOWN=false KUBE_API=true ENABLE_KUBE_API=true NUM_MASTERS=1 ISO_IMAGE_TYPE=minimal-iso STATIC_IPS=true TEST_FUNC=test_kubeapi make setup run test_kube_api_parallel

------------------------------------------------------------------------------------------------------------------
When finished, you should be able to query KVM to find the host file entries to add for the host addresses.
------------------------------------------------------------------------------------------------------------------

# virsh list
 Id   Name                                   State
------------------------------------------------------
 1    minikube                               running
 4    test-infra-cluster-c0f6efad-master-0   running
 5    test-infra-cluster-20358370-master-0   running

From here you can query to find the IP addresses to add to your hosts file:

# virsh net-list
 Name                                     State    Autostart   Persistent
--------------------------------------------------------------------------
 default                                  active   yes         yes
 mk-minikube                              active   yes         yes
 test-infra-net-20358370                  active   yes         yes
 test-infra-net-c0f6efad                  active   yes         yes
 test-infra-secondary-network-20358370    active   yes         yes
 test-infra-secondary-network-c0f6efad    active   yes         yes

[root@localhost assisted-test-infra]# virsh net-dhcp-leases test-infra-secondary-network-20358370
 Expiry Time           MAC address         Protocol   IP address           Hostname   Client ID or DUID
---------------------------------------------------------------------------------------------------------------------
 2022-08-17 17:02:03   02:00:00:68:a3:81   ipv6       3001:db9:1::e2/120   -          00:03:00:01:02:00:00:68:a3:81

[root@localhost assisted-test-infra]# virsh net-dhcp-leases test-infra-secondary-network-c0f6efad
 Expiry Time           MAC address         Protocol   IP address          Hostname                               Client ID or DUID
---------------------------------------------------------------------------------------------------------------------------------------
 2022-08-17 17:06:21   02:00:00:ed:a2:e8   ipv4       192.168.145.10/24   test-infra-cluster-c0f6efad-master-0   01:02:00:00:ed:a2:e8

------------------------------------------------------------------------------------------------------------------
Add some entries to /etc/hosts:
------------------------------------------------------------------------------------------------------------------

3001:db9:1::e2    api.test-infra-cluster-20358370.redhat.com
192.168.145.10    api.test-infra-cluster-c0f6efad.redhat.com

------------------------------------------------------------------------------------------------------------------
Log into the command line of the IPv6 host:
------------------------------------------------------------------------------------------------------------------

ssh core.redhat.com
journalctl | grep crio
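To make the distinction above concrete, here is a minimal Go sketch (not part of CRI-O; the IPv6 address is copied from the log and is assumed not to be configured on the host running the sketch). Binding a port that is already taken fails with EADDRINUSE ("address already in use"), while binding an address that is not assigned to any local interface, or is still tentative, fails with EADDRNOTAVAIL ("cannot assign requested address"), which matches the fatal error in the crio log.

```go
// bindcheck.go: minimal sketch of the two bind failures discussed above.
// Assumption: fd2e:6f44:5dd8:6::38 (taken from the crio log) is NOT
// configured on the machine running this sketch.
package main

import (
	"errors"
	"fmt"
	"net"
	"syscall"
)

func tryListen(addr string) {
	ln, err := net.Listen("tcp", addr)
	switch {
	case err == nil:
		fmt.Printf("%-35s -> bound OK\n", addr)
		ln.Close()
	case errors.Is(err, syscall.EADDRINUSE):
		// Another socket already owns this address:port pair.
		fmt.Printf("%-35s -> bind: address already in use\n", addr)
	case errors.Is(err, syscall.EADDRNOTAVAIL):
		// The IP is not (yet) usable on any local interface; this is the
		// error CRI-O reports in the logs above.
		fmt.Printf("%-35s -> bind: cannot assign requested address\n", addr)
	default:
		fmt.Printf("%-35s -> %v\n", addr, err)
	}
}

func main() {
	// Occupy a port, then try to bind it again: EADDRINUSE.
	first, err := net.Listen("tcp", "127.0.0.1:10010")
	if err != nil {
		fmt.Println("setup:", err)
		return
	}
	defer first.Close()
	tryListen("127.0.0.1:10010")

	// Bind an address that is not present on any interface: EADDRNOTAVAIL.
	tryListen("[fd2e:6f44:5dd8:6::38]:10010")
}
```

The second failure mode is consistent with the node's DHCPv6 address not yet being usable when crio.service starts, rather than with a second cri-o still holding the port.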
Running journalctl | grep crio on the host, you will see something similar to the following:

Aug 17 18:41:31 test-infra-cluster-20358370-master-0 crio[2319]: time="2022-08-17 18:41:31.447622177Z" level=fatal msg="Failed to start streaming server: listen tcp [1001:db9:1::12]:10010: bind: cannot assign requested address"

Hi guys, we are hitting the same issue, where the signature is:

```
Aug 23 10:25:59 ostest-extraworker-0 crio[2415]: time="2022-08-23 10:25:59.635765832Z" level=fatal msg="Failed to start streaming server: listen tcp [fd2e:6f44:5dd8:c956::1a]:10010: bind: cannot assign requested address"
```

Luckily we have the full journal attached as part of https://issues.redhat.com/browse/OCPBUGS-511.

Raising the priority: if the workaround is to reboot the node, that is not an acceptable solution for ZTP; nobody will be able to reboot the nodes in a ZTP flow.

The issue reproduces with 4.12:

NAME="Red Hat Enterprise Linux CoreOS"
ID="rhcos"
ID_LIKE="rhel fedora"
VERSION="412.86.202208292256-0"
VERSION_ID="4.12"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux CoreOS 412.86.202208292256-0 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::coreos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://docs.openshift.com/container-platform/4.12/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.12"
REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
REDHAT_SUPPORT_PRODUCT_VERSION="4.12"
OPENSHIFT_VERSION="4.12"
RHEL_VERSION="8.6"
OSTREE_VERSION="412.86.202208292256-0"

The only difference is that we don't see anything in the output of journalctl -u crio:

[core@api ~]$ sudo crictl ps
FATA[0000] unable to determine runtime API version: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/run/crio/crio.sock: connect: no such file or directory"
[core@api ~]$ sudo systemctl status crio
● crio.service - Container Runtime Interface for OCI (CRI-O)
   Loaded: loaded (/usr/lib/systemd/system/crio.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/crio.service.d
           └─01-kubens.conf, 10-mco-default-env.conf, 10-mco-default-madv.conf, 10-mco-profile-unix-socket.conf, 20-nodenet.conf
   Active: inactive (dead)
     Docs: https://github.com/cri-o/cri-o
[core@api ~]$ sudo journalctl -u crio
-- Logs begin at Wed 2022-08-31 02:10:13 UTC, end at Wed 2022-08-31 13:17:08 UTC. --
-- No entries --

Tests are broken because of this issue, and it is considered a test blocker. Are there any updates on the status or on proposed fixes?

This seems to be a duplicate of bug 2096386. In RHEL 8.7, NM waits for DHCPv6 addresses to become non-tentative before signaling that the interface is ready. However, the original bug is still open because, in case of a DAD collision, NM doesn't send a DHCPDECLINE, so the fix is considered incomplete. Still, it should be enough to solve the "bind: cannot assign requested address" problem. (A small diagnostic sketch for inspecting the tentative/DAD state of an address follows the backport discussion below.)

This issue was initially observed by the Assisted Installer team in OCP 4.11; my understanding is that fixing it in RHEL 8.7 makes the fix available only in OCP 4.12. If this is correct, can we please track the backport?

I think 4.12 will stay on 8.6 and 4.13 will move to 8.8. @bgalvani, is a backport to 8.6 possible?

Hi Peter, yes, a backport to 8.6 is possible. Let me set the correct flag for that.

There might be more to this. This is a duplicate of bug 2096386, which was already partly fixed in rhel-8.7-GA, but there was a follow-up fix that will make bug 2096386 fully fixed in the upcoming rhel-8.8. That means rhel-8.7 still partly has the problem. We should probably reopen the rhel-8.7.z bug 2132281. See also https://bugzilla.redhat.com/show_bug.cgi?id=2096386#c31. In any case, this will be fixed in rhel-8.8.
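As a companion to the DAD explanation above, here is a small diagnostic sketch (not something NetworkManager or CRI-O ships): it parses /proc/net/if_inet6 and reports IPv6 addresses that are still tentative or have failed duplicate address detection. The flag bits IFA_F_DADFAILED (0x08) and IFA_F_TENTATIVE (0x40) come from the kernel's linux/if_addr.h; a service that tries to bind such an address gets "cannot assign requested address" until DAD completes.

```go
// dadcheck.go: minimal diagnostic sketch (not part of NetworkManager or
// CRI-O). It lists IPv6 addresses that are still tentative or that failed
// duplicate address detection by parsing /proc/net/if_inet6.
package main

import (
	"fmt"
	"net"
	"os"
	"strconv"
	"strings"
)

const (
	ifaFDadFailed = 0x08 // IFA_F_DADFAILED from linux/if_addr.h
	ifaFTentative = 0x40 // IFA_F_TENTATIVE from linux/if_addr.h
)

// expandHex turns the 32-hex-digit address from /proc/net/if_inet6 into
// colon-separated groups so net.ParseIP can read it.
func expandHex(s string) string {
	var groups []string
	for i := 0; i+4 <= len(s); i += 4 {
		groups = append(groups, s[i:i+4])
	}
	return strings.Join(groups, ":")
}

func main() {
	data, err := os.ReadFile("/proc/net/if_inet6")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, line := range strings.Split(strings.TrimSpace(string(data)), "\n") {
		// Fields: address(32 hex) ifindex prefixlen scope flags ifname
		f := strings.Fields(line)
		if len(f) != 6 {
			continue
		}
		flags, err := strconv.ParseUint(f[4], 16, 32)
		if err != nil || flags&(ifaFTentative|ifaFDadFailed) == 0 {
			continue
		}
		state := "tentative"
		if flags&ifaFDadFailed != 0 {
			state = "dadfailed"
		}
		fmt.Printf("%s on %s is %s\n", net.ParseIP(expandHex(f[0])), f[5], state)
	}
}
```

With the NetworkManager-1.40.2-1.el8 fix described above, the connection should no longer be signaled as ready while the DHCPv6 address still shows up here as tentative, so by the time crio.service starts, the bind should succeed.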
Moving this bug forward.

VERIFIED in NetworkManager-1.40.2-1.el8. The duplicate address is removed from the host once kernel DAD finds that another interface in the network segment already has it.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (NetworkManager bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2968