Bug 2159066 - stopped detecting port conflicts with kernel 6.0.16 - bind() does not fail with "address already in use" any more
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 37
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Whiteboard: CockpitTest
Duplicates: 2159802
Reported: 2023-01-08 09:19 UTC by Martin Pitt
Modified: 2023-02-26 08:34 UTC
CC List: 37 users

Fixed In Version: kernel-6.1.4-200.fc37
Doc Type: If docs needed, set a value
Last Closed: 2023-02-26 08:34:19 UTC
Type: Bug


Attachments
patch to fix it (521 bytes, patch)
2023-01-13 06:51 UTC, nvwarr

Description Martin Pitt 2023-01-08 09:19:24 UTC
Description of problem: Cockpit CI's recent Fedora 37 image refresh [1] detected a regression with podman's port conflict handling.

Version-Release number of selected component (if applicable):

The image build log [2] has a complete list of package updates. Of these, I confirmed that the kernel-core 6.0.15-300.fc37 -> 6.0.16-300.fc37 update is the one that triggers the regression. However, I am filing this against podman for initial triage, as I don't know whether this is a kernel regression or some new behaviour that podman needs to adapt to.

podman-4.3.1-1.fc37.x86_64
crun-1.7.2-3.fc37.x86_64
kernel-core-6.0.16-300.fc37.x86_64


How reproducible: Always


Steps to Reproduce:

  podman run -d -p 5000:5000 --name c1 registry:2
  podman run -d -p 5000:5000 --name c2 registry:2

Actual results: Both commands succeed

Expected results: Until recently, the second command failed with

  Error: c2 listen tcp 0.0.0.0:5000: bind: address already in use

Now both containers start and "podman ps" claims that they are both forwarding local port 5000 to the container. But this (naturally) only works for the first container.

Additional info:
[1] https://github.com/cockpit-project/bots/pull/4248
[2] https://cockpit-logs.us-east-1.linodeobjects.com/image-refresh-logs/fedora-37-20230106-230101.log

Comment 1 Daniel Walsh 2023-01-08 11:49:56 UTC
Might be an issue with the kernel.

I reproduced something similar with:

  nc -l 5001 &
  nc -l 5001

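The same check can be written directly against the socket API, without podman or netcat. The C sketch below is a hypothetical test, not taken from this report: the helper names bind_loopback and port_conflict_detected, the use of 127.0.0.1, and setting SO_REUSEADDR (which many servers set on their listeners) are all assumptions. It binds and listens on a kernel-chosen port, then tries to bind a second socket to that same port.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket with SO_REUSEADDR set and bind it
 * to 127.0.0.1:port (port 0 lets the kernel pick a free port).
 * Returns the fd, or -1 with errno preserved from the failing call. */
static int bind_loopback(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int one = 1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Returns 1 if binding a second socket to a port that already has a
 * listener fails with EADDRINUSE (the expected behaviour), 0 if the second
 * bind() succeeds (the behaviour reported here), -1 on setup errors. */
static int port_conflict_detected(void)
{
    int first = bind_loopback(0);
    if (first < 0)
        return -1;
    if (listen(first, 16) < 0) {
        close(first);
        return -1;
    }

    /* Find out which port the kernel assigned to the first socket. */
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    if (getsockname(first, (struct sockaddr *)&addr, &len) < 0) {
        close(first);
        return -1;
    }

    int second = bind_loopback(ntohs(addr.sin_port));
    int result;
    if (second >= 0) {
        close(second);   /* conflict was NOT detected */
        result = 0;
    } else {
        result = (errno == EADDRINUSE) ? 1 : -1;
    }
    close(first);
    return result;
}
```

On an unaffected kernel, port_conflict_detected() should return 1: Linux refuses a second bind() to a port held by a socket in LISTEN state even when both sockets set SO_REUSEADDR. On a kernel showing the behaviour described in this bug, one would expect 0. This report does not say whether sockets without SO_REUSEADDR were affected as well.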
Comment 2 Andy Harvey 2023-01-08 16:51:43 UTC
This does seem to be a problem in the kernel. I am also seeing it in 6.0.17-300.fc37.x86_64 & 6.0.18-300.fc37.x86_64.

I see it when I open two ssh sessions to the same server, both using X11 forwarding. The second and subsequent sessions are allocated the same port number for forwarding.
On kernel 6.0.15, with debug logging enabled, sshd reports "Address already in use" as it cycles through ports from 6010 onwards.
On the faulty kernels, sshd simply allocates port 6010 for every session.
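The symptom follows from how a port-scanning allocator behaves: it moves to the next port only when bind() reports a conflict, so if bind() never fails, every caller stops at the first candidate. The C sketch below illustrates that pattern and is not sshd's actual code; the function name listen_first_free, the 127.0.0.1 address and the SO_REUSEADDR setting are assumptions, and 6010 is used as the starting port only because it appears in the log described above.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical allocator: try ports start, start+1, ... (at most `tries`
 * of them) on 127.0.0.1 and return a listening fd for the first port where
 * both bind() and listen() succeed, storing that port in *chosen.
 * Returns -1 if every candidate was busy or an unexpected error occurred. */
static int listen_first_free(int start, int tries, int *chosen)
{
    for (int port = start; port < start + tries && port <= 65535; port++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;

        int one = 1;
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        addr.sin_port = htons((unsigned short)port);

        if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
            listen(fd, 16) == 0) {
            *chosen = port;
            return fd;
        }

        int err = errno;
        close(fd);
        if (err != EADDRINUSE)
            return -1;   /* some other failure: give up */
        /* EADDRINUSE: the port is taken, so try the next one. */
    }
    return -1;
}
```

Called twice with the same starting port on an unaffected kernel, the second call should land on a different port than the first. If bind() never reports EADDRINUSE, as described for 6.0.16, both calls would presumably stop at the starting port, which matches every session getting port 6010.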

Comment 3 Martin Pitt 2023-01-08 19:02:07 UTC
Ack, this is starting to look serious! Reassigning to the kernel then.

Comment 4 Miro Hrončok 2023-01-11 17:57:29 UTC
This is no longer happening in kernel-6.1.4-200.fc37.

See https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/7VPNMC77YC3SI5LFYKUA4B5MTFPLTLVB/

Comment 5 Ben Cotton 2023-01-12 15:48:35 UTC
*** Bug 2159802 has been marked as a duplicate of this bug. ***

Comment 6 nvwarr 2023-01-13 06:50:16 UTC
This bug breaks X11 forwarding in ssh and also breaks some in-house software. For us it is quite a severe showstopper. Fortunately, it seems easy to fix: in net/ipv4/inet_connection_sock.c, line 370, the variable ret in the function inet_csk_get_port should be initialised to -EADDRINUSE rather than 1. Patching this seems to fix the problem. The suggestion comes from
https://lore.kernel.org/stable/CAFsF8vL4CGFzWMb38_XviiEgxoKX0GYup=JiUFXUOmagdk9CRg@mail.gmail.com/ which Miro pointed to.

Comment 7 nvwarr 2023-01-13 06:51:09 UTC
Created attachment 1937748
patch to fix it

Here's the actual patch

Comment 8 Martin Pitt 2023-02-26 08:34:19 UTC
Current F37/F38 kernels seem fine.

