Bug 2159066

Summary: stopped detecting port conflicts with kernel 6.0.16 - bind() does not fail with "address already in use" any more
Product: [Fedora] Fedora
Reporter: Martin Pitt <mpitt>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 37
CC: acaringi, acui, adscvr, airlied, alciregi, andy, bbaude, bskeggs, cks-rhbugzilla, container-sig, debarshir, dwalsh, go-sig, hdegoede, hpa, ismail, jarodwilson, jglisse, jnovy, josef, kernel-maint, lgoncalv, linville, lsm5, masami256, mchehab, mheon, mhofmann, mhroncok, nvwarr, patrick, pehunt, pholzing, ptalbert, rh.container.bot, santiago, steved
Target Milestone: ---
Keywords: Regression
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: CockpitTest
Fixed In Version: kernel-6.1.4-200.fc37
Doc Type: If docs needed, set a value
Last Closed: 2023-02-26 08:34:19 UTC
Type: Bug
Attachments:
  Description: patch to fix it
  Flags: none

Description Martin Pitt 2023-01-08 09:19:24 UTC
Description of problem: Cockpit CI's recent Fedora 37 image refresh [1] detected a regression with podman's port conflict handling.

Version-Release number of selected component (if applicable):

The image build log [2] has a complete list of package updates. Of these, I confirmed the kernel-core 6.0.15-300.fc37 -> 6.0.16-300.fc37 update to be the one that triggers the regression. However, I am filing this against podman for your initial triaging, as I don't know if that's a kernel regression or some new feature that podman needs to be adapted to.

podman-4.3.1-1.fc37.x86_64
crun-1.7.2-3.fc37.x86_64
kernel-core-6.0.16-300.fc37.x86_64


How reproducible: Always


Steps to Reproduce:

  podman run -d -p 5000:5000 --name c1 registry:2
  podman run -d -p 5000:5000 --name c2 registry:2

Actual results: Both commands succeed

Expected results: Until recently, the second command failed with

  Error: c2 listen tcp 0.0.0.0:5000: bind: address already in use

Now both containers start and "podman ps" claims that they are both forwarding local port 5000 to the container. But this (naturally) only works for the first container.

Additional info:
[1] https://github.com/cockpit-project/bots/pull/4248
[2] https://cockpit-logs.us-east-1.linodeobjects.com/image-refresh-logs/fedora-37-20230106-230101.log

Comment 1 Daniel Walsh 2023-01-08 11:49:56 UTC
Might be an issue with the kernel.

I did something similar with

  nc -l 5001 &
  nc -l 5001
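
For anyone who wants to check a kernel without podman or nc, here is a minimal Python sketch of the same test. It is only an illustration: the function name second_bind_is_rejected is made up, and setting SO_REUSEADDR on both sockets is an assumption about what typical servers do, not something confirmed in this report. On a healthy Linux kernel, a second bind() to a port that already has a listening socket should fail with EADDRINUSE even with SO_REUSEADDR set.

```python
import errno
import socket


def second_bind_is_rejected(host="127.0.0.1"):
    """Return True if a second bind() to an already-listening port
    fails with EADDRINUSE (the expected, healthy behaviour)."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Assumption: both sockets set SO_REUSEADDR, as many servers do.
        for sock in (first, second):
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

        # Let the kernel pick a free port, then start listening on it.
        first.bind((host, 0))
        first.listen(1)
        port = first.getsockname()[1]

        # The second bind to the same address and port must be refused.
        try:
            second.bind((host, port))
        except OSError as exc:
            return exc.errno == errno.EADDRINUSE
        return False
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    if second_bind_is_rejected():
        print("port conflict detected (expected)")
    else:
        print("port conflict NOT detected (possible regression)")
```

If the script reports that the conflict was not detected, the second bind() succeeded, which matches the symptom described in this bug.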

Comment 2 Andy Harvey 2023-01-08 16:51:43 UTC
This does seem to be a problem in the kernel. I am also seeing it in 6.0.17-300.fc37.x86_64 & 6.0.18-300.fc37.x86_64.

I see it when I start two ssh sessions going to a server both using X11 forwarding. The second or subsequent sessions get allocated the same port number for the forwarding ports.
On kernel 6.0.15 with debug enabled, sshd reports "Address already in use" as it cycles through ports from 6010 onwards.
On the faulty kernels sshd just allocates port 6010 on all sessions.
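
To illustrate why this breaks port allocation, here is a rough Python sketch of a scan of the kind described above. It is not sshd's actual code: the function name allocate_port, the attempt limit, and the use of 127.0.0.1 are all assumptions made for the example. The loop relies on bind() failing with EADDRINUSE to move on to the next port, so if bind() wrongly succeeds, every caller ends up on the first port.

```python
import errno
import socket


def allocate_port(start=6010, attempts=100, host="127.0.0.1"):
    """Return (listening_socket, port) for the first free port at or
    after `start`. Hypothetical helper, not taken from sshd."""
    for port in range(start, start + attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind((host, port))
        except OSError as exc:
            sock.close()
            if exc.errno == errno.EADDRINUSE:
                # Port is taken: this is the signal to try the next one.
                continue
            raise
        sock.listen(1)
        return sock, port
    raise RuntimeError("no free port found in the scanned range")


if __name__ == "__main__":
    first_sock, first_port = allocate_port()
    second_sock, second_port = allocate_port()
    # On a healthy kernel the two ports differ.
    print(first_port, second_port)
    first_sock.close()
    second_sock.close()
```

On a kernel with this regression, both calls would be expected to return the same port, which is the behaviour seen with the X11 forwarding sessions.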

Comment 3 Martin Pitt 2023-01-08 19:02:07 UTC
Ack, this starts looking serious! Reassigning to the kernel then.

Comment 4 Miro Hrončok 2023-01-11 17:57:29 UTC
This is no longer happening in kernel-6.1.4-200.fc37.

See https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/7VPNMC77YC3SI5LFYKUA4B5MTFPLTLVB/

Comment 5 Ben Cotton 2023-01-12 15:48:35 UTC
*** Bug 2159802 has been marked as a duplicate of this bug. ***

Comment 6 nvwarr 2023-01-13 06:50:16 UTC
This bug breaks X11 forwarding in ssh and also breaks some in-house software. For us it is quite a severe showstopper. Fortunately, it seems easy to fix. In the function inet_csk_get_port (net/ipv4/inet_connection_sock.c:370), ret should be initialised to -EADDRINUSE, not 1. Patching this seems to fix the problem. The suggestion is from:
https://lore.kernel.org/stable/CAFsF8vL4CGFzWMb38_XviiEgxoKX0GYup=JiUFXUOmagdk9CRg@mail.gmail.com/ which Miro pointed to.

Comment 7 nvwarr 2023-01-13 06:51:09 UTC
Created attachment 1937748
patch to fix it

Here's the actual patch

Comment 8 Martin Pitt 2023-02-26 08:34:19 UTC
Current F37/F38 kernels seem fine.