Bug 2125104 (PIPE_FD_SIGIO_POLL_HUP) - please indicate "Device Disconnected" with SIGIO & si_code==POLL_HUP on O_ASYNC & fcntl( ... { F_SETSIG + F_SETOWN } ) enabled pipe / FIFO FDs [NEEDINFO]
Summary: please indicate "Device Disconnected" with SIGIO & si_code==POLL_HUP on O_ASY...
Keywords:
Status: NEW
Alias: PIPE_FD_SIGIO_POLL_HUP
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: All
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL: https://bugzilla.kernel.org/show_bug....
Whiteboard:
Depends On:
Blocks:
Reported: 2022-09-08 02:29 UTC by Jason Vas Dias
Modified: 2022-11-01 14:28 UTC (History)

Fixed In Version:
Doc Type: ---
Doc Text:
Clone Of:
Environment:
Fedora 36+ on x86_64 (but should apply to ALL)
Last Closed:
Type: Enhancement
Embargoed:
jason.vas.dias: needinfo? (dhowells)


Attachments
Patch against fs/pipe.c of v5.19.7 (2.69 KB, patch)
2022-09-08 02:29 UTC, Jason Vas Dias
testcase: t_sigio.c (5.47 KB, text/x-csrc)
2022-09-08 02:32 UTC, Jason Vas Dias
testcase: t_sigio.c (5.87 KB, text/x-csrc)
2022-09-09 14:44 UTC, Jason Vas Dias
Patch against a RHEL8 v4.18 kernel for same issue - fs/pipe.c only (3.33 KB, patch)
2022-10-04 14:23 UTC, Jason Vas Dias
improved test case with corrected format to print pid & number of ns. (6.07 KB, text/x-csrc)
2022-10-04 18:04 UTC, Jason Vas Dias
Modified v6.0.5-200.fc36 spec file to build latest upstream v6.0.6 with patch applied (217.53 KB, text/plain)
2022-11-01 14:20 UTC, Jason Vas Dias
The v6.0.5-200.fc36 patch-6.0-redhat.patch modified to apply to v6.0.6's drivers/hdmi/vc4/vc4_hdmi.c (48.78 KB, text/plain)
2022-11-01 14:28 UTC, Jason Vas Dias

Description Jason Vas Dias 2022-09-08 02:29:54 UTC
Created attachment 1910355 [details]
Patch against fs/pipe.c of v5.19.7

1. Please describe the problem:

Compile & run the example program on any version of Linux prior to my patched
v5.19.7 version, and it prints:

$ gcc -o t_sigio t_sigio.c 
$ ./t_sigio
t_sigio[2002797]: 336339 Listening to 'cat' (stdin) on pipe fd: 3 with SIGIO....
t_sigio[2002797]: 0 bytes available.
t_sigio[2002797]: Alarmed: killing cat pid 2002798.
t_sigio[2002797]: 0 bytes available.
t_sigio[2002797]: 1131733499: 1131731216: 2283 : SIGIO #1 : (1 , 41).
t_sigio[2002797]: 0 bytes available.
t_sigio[2002797]: FAILURE.

When a process holds the read end of a pipe that has a single writer,
and that writer closes its write end (e.g. by being killed) while the
reader is not currently read()-ing the pipe, the reader gets a SIGIO with
the struct siginfo 'si_code' field set to 1 (POLL_IN)
& si_band set to 41, even though
  ioctl(fd, FIONREAD, &sz) returns with sz==0 .
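
For readers unfamiliar with this interface, here is a minimal sketch of how
a pipe FD can be set up as described in the bug title (O_ASYNC plus
F_SETSIG and F_SETOWN) and how the FIONREAD check is made. The function
names arm_sigio() and bytes_available() are illustrative only; they are not
taken from the attached t_sigio.c, whose source is not reproduced here.

```c
#define _GNU_SOURCE            /* F_SETSIG is a Linux extension */
#include <fcntl.h>
#include <signal.h>
#include <sys/ioctl.h>
#include <unistd.h>

#ifndef F_SETSIG
#define F_SETSIG 10            /* asm-generic value (x86, arm64), used only
                                  if _GNU_SOURCE took effect too late */
#endif

/* Ask the kernel to send SIGIO, with siginfo filled in, to this process
 * whenever the state of fd changes. Returns 0 on success, -1 on error. */
static int arm_sigio(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETOWN, getpid()) == -1)   /* who receives the signal */
        return -1;
    if (fcntl(fd, F_SETSIG, SIGIO) == -1)      /* deliver si_code / si_band */
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC) == -1 ? -1 : 0;
}

/* Number of unread bytes buffered in the pipe, or -1 on error. */
static int bytes_available(int fd)
{
    int n = 0;
    return ioctl(fd, FIONREAD, &n) == -1 ? -1 : n;
}
```

A SIGIO handler (or SIG_IGN) should be in place before arm_sigio() is
called, because the default action for SIGIO is to terminate the process.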

Run it under the patched v5.19.7 kernel I produced by modifying
the latest Fedora 36 v5.19.6 spec file, and it prints:

$ uname -a
Linux jvdspc.jvds.net 5.19.7-200.fc36.x86_64+debug #1 SMP PREEMPT_DYNAMIC Wed Sep 7 21:14:57 IST 2022 x86_64 x86_64 x86_64 GNU/Linux
[jvd@jvdspc:~ [3230] 02:30:42 [#:3!:6975]{0}	
$ ./t_sigio
t_sigio[3257]: 855318 Listening to 'cat' (stdin) on pipe fd: 3 with SIGIO....
t_sigio[3257]: 0 bytes available.
t_sigio[3257]: Alarmed: killing cat pid 3258.
t_sigio[3257]: 0 bytes available.
t_sigio[3257]: SUCCESS.

So with the patch it DOES get a SIGIO (not printed, unfortunately...)
whose 'si_code' is set to POLL_HUP. The very short attached patch
to fs/pipe.c simply sends SIGIO with si_code=POLL_HUP whenever
no writers are detected on a read pipe, or whenever a pipe is
closed and there are no writers .

All Fedora + full GNOME or Xfce4 Desktop processes start up
& run fine under this patched kernel - Emacs displays Manual Pages fine
(very pipe dependent...) - and my test program (demonstrating
& fixing a problem in an I/O library I am developing) works fine.

So the patch appears to be harmless, and really makes I/O programming
with pipes & SIGIO much more robust & straightforward.



2. What is the Version-Release number of the kernel:

The test fails under all kernels prior to my patched 5.19.7 .

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

no.

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

See #1.

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

yes - tested under Fedora 36's v5.19.6, same as rawhide's version .

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

no

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Will do if requested - I did 'dmesg -c > ~/dmesg.log' - it is very big.

Comment 1 Jason Vas Dias 2022-09-08 02:32:26 UTC
Created attachment 1910356 [details]
testcase: t_sigio.c

$ gcc -o t_sigio t_sigio.c
$ ./t_sigio
t_sigio[4884]: 743001 Listening to 'cat' (stdin) on pipe fd: 3 with SIGIO....
t_sigio[4884]: 0 bytes available.
t_sigio[4884]: Alarmed: killing cat pid 4885.
t_sigio[4884]: 0 bytes available.
t_sigio[4884]: SUCCESS.

Comment 2 Jason Vas Dias 2022-09-08 02:39:11 UTC
I reported this in more detail, and submitted the patch & test program, to kernel Bugzilla 216458 :
 https://bugzilla.kernel.org/show_bug.cgi?id=216458 

If @dhowells could please review this bug & patch & respond to the lkml post I sent about
this, I'd be much obliged.

Why can't Linux send POLL_HUP on pipes with no writers ? It helps a lot and does no harm -
a read() still returns length 0 .
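
To illustrate the "read() still returns length 0" point: with O_NONBLOCK
set, a read() on an empty pipe fails with EAGAIN while a writer remains and
returns 0 once every writer has closed. The sketch below shows that existing
behaviour only; the enum and function names are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

enum pipe_state { PIPE_DATA, PIPE_EMPTY, PIPE_EOF, PIPE_ERROR };

/* Put fd into non-blocking mode. Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Try one read() and classify the result. *got receives read()'s
 * return value: > 0 data, 0 end of input, -1 error or nothing yet. */
static enum pipe_state try_read(int fd, void *buf, size_t len, ssize_t *got)
{
    assert(got != NULL);
    *got = read(fd, buf, len);
    if (*got > 0)
        return PIPE_DATA;
    if (*got == 0)
        return PIPE_EOF;               /* no writers left, buffer drained */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return PIPE_EMPTY;             /* a writer still holds the pipe */
    return PIPE_ERROR;
}
```

This does require entering read(), which is what the request in this bug
aims to avoid.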

Comment 3 Jason Vas Dias 2022-09-08 14:20:49 UTC
Yesterday I did a full BIOS & peripherals (dock) firmware upgrade:
after a recent 'yum update', I used 'gnome-software' to install the
manufacturer's BIOS upgrade, during which ALL my laptop's firmware
was upgraded. My patched v5.19.7+debug kernel is running fine
afterwards, with no problems seen that were not seen before (oops-es on
Synaptic Touchpad gesture non-recognition, and on failures of lvm2 to
mount my loop devices during boot ... these happened before and are
non-issues / easily rectified). Of course, it is not signed, so there
were a few extra messages in the logs about that ...
Attaching patches to the kernel.spec and 'process_configs.sh' files -
changes I had to make to get the RPMs to build - and I still had to
do 'rpmbuild -bb --short-circuit' at the end because the kabi-dw*.tar.xz
files for ABI 5.19.7 were not found.

A link to the SRPM on my Google Drive is :

https://drive.google.com/file/d/1Mtr1nJ6wGUQfKMZajhzjdZ2PZOyFUDo7/view?usp=sharing

I can upload the binary RPMs produced, if anyone wants:

$ ls -ltr kernel*.rpm
-rw-r--r--. 1 jvd devel    267569 Sep  8 01:06 kernel-5.19.7-200.fc36.x86_64.rpm
-rw-r--r--. 1 jvd devel    267677 Sep  8 01:06 kernel-debug-5.19.7-200.fc36.x86_64.rpm
-rw-r--r--. 1 jvd devel    267733 Sep  8 01:06 kernel-debug-devel-matched-5.19.7-200.fc36.x86_64.rpm
-rw-r--r--. 1 jvd devel    889781 Sep  8 01:06 kernel-debug-modules-internal-5.19.7-200.fc36.x86_64.rpm
-rw-r--r--. 1 jvd devel   3955805 Sep  8 01:06 kernel-debug-modules-extra-5.19.7-200.fc36.x86_64.rpm
-rw-r--r--. 1 jvd devel  16637969 Sep  8 01:07 kernel-debug-devel-5.19.7-200.fc36.x86_64.rpm
-rw-r--r--. 1 jvd devel  54246361 Sep  8 01:07 kernel-debug-core-5.19.7-200.fc36.x86_64.rpm
-rw-r--r--. 1 jvd devel  62176229 Sep  8 01:07 kernel-debug-modules-5.19.7-200.fc36.x86_64.rpm
-rw-r--r--. 1 jvd devel  81671093 Sep  8 01:08 kernel-debuginfo-common-x86_64-5.19.7-200.fc36.x86_64.rpm
-rw-r--r--. 1 jvd devel 847494721 Sep  8 01:10 kernel-debug-debuginfo-5.19.7-200.fc36.x86_64.rpm

Comment 4 Jason Vas Dias 2022-09-09 14:44:57 UTC
Created attachment 1910754 [details]
testcase: t_sigio.c

Improved to determine the correct popen() pid to kill -
sometimes, on a busy system, this may NOT be (pid+1).
This change has nothing to do with the core pipe FD
issue being tested - it is just that sometimes it could
kill the wrong pid & so get a 2nd alarm and fail.
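
One way to avoid guessing the child's pid at all is to create the pipe and
fork directly instead of going through popen(), since fork() returns the
pid. The sketch below is NOT how the attached test case does it (its source
is not shown here); spawn_writer() is a hypothetical helper, and the child
simply waits in pause() as a stand-in for 'cat'.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that holds the write end of a new pipe until it is killed.
 * On success returns the child's pid and stores the read end in *read_fd;
 * returns -1 on error. */
static pid_t spawn_writer(int *read_fd)
{
    int p[2];
    pid_t pid;

    assert(read_fd != NULL);
    if (pipe(p) == -1)
        return -1;
    pid = fork();
    if (pid == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {                /* child: the only writer */
        close(p[0]);
        for (;;)
            pause();               /* an exec of the real writer could go here */
    }
    close(p[1]);                   /* parent keeps only the read end */
    *read_fd = p[0];
    return pid;
}
```

The parent can then kill() exactly the pid that was returned and reap it
with waitpid().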

Comment 5 Jason Vas Dias 2022-10-04 14:23:02 UTC
Created attachment 1915984 [details]
Patch against a RHEL8 v4.18 kernel for same issue - fs/pipe.c only

This patch builds against, and is tested on, the latest RHEL8 v4.18 kernel
(Rocky Linux EL8).

No problems observed on the target build + file (NFS) server platform,
and the same t_sigio test program now passes.

Comment 6 Jason Vas Dias 2022-10-04 14:25:19 UTC
Here is a link to the SRPM for the v4.18 based kernels I built & am running OK
on a Rocky RHEL8 clone remote fileserver :
  https://drive.google.com/file/d/1i-kAfDegaLv2NXgh_21iRUVDmZfZuNaR/view?usp=sharing
and here is a link to the SRPM for the v5.19.12+ based kernels I am running,
with the first version of the fs/pipe.c patch applied, on my Fedora 36 
x86_64 12-core laptop :
  https://drive.google.com/file/d/1adj1LbooL2NCIEYz3z0mrmaDHeWhvOjR/view?usp=sharing

The only differences in the spec files are 
 A) not enabling kernel signing, and 
 B) applying my patch to fs/pipe.c .
They all work fine - no problems observed on server or laptop.

The point is: how else can a program, without the possibility of hanging
indefinitely, distinguish between these two cases when
ioctl(fd, FIONREAD, &len) returns with len==0 ?
 A) The final write has been made and the write end closed, so that a
    read() would return 0. My test program does NOT perceive this, since
    it does not enter read() or poll().
 B) The writer is still active, so the receiver can go on to handle
    more input.

There is no easy / robust / reliable answer before my patch, IMHO -
entering poll() or read() is what I am able to avoid with this patch.
It reliably MAKES NO DIFFERENCE to ANY existing O_ASYNC pipe reader / 
writer process on a fully up-to-date Rocky Linux distro server system or on a 
fully up-to-date Fedora desktop system, and DOES allow the end-of-input with
length==0 case to be easily distinguished from the waiting-for-more-input case .
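
For comparison, the poll()-based check that this patch is meant to make
unnecessary might be sketched as follows. A zero timeout keeps poll() from
blocking, and on Linux POLLHUP is reported on the read end of a pipe once
no writer remains. writers_gone() is a hypothetical name chosen for
illustration.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Returns 1 if every writer has closed the pipe, 0 if at least one
 * writer remains, -1 on error. Never blocks (timeout is 0). */
static int writers_gone(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    assert(fd >= 0);
    if (poll(&pfd, 1, 0) == -1)
        return -1;
    return (pfd.revents & POLLHUP) ? 1 : 0;
}
```

The drawback is that this extra system call has to be made from the main
loop after each SIGIO, rather than the signal itself carrying the
information in si_code.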

Comment 7 Jason Vas Dias 2022-10-04 18:04:42 UTC
Created attachment 1916012 [details]
improved test case with corrected format to print pid & number of ns.

$ gcc -o t_sigio t_sigio.c
[jvd@jvdspc:~ [3264] 18:53:41 [#:72!:7842]{0}	
$ ./t_sigio
t_sigio[45021]: 251050312 Listening to 'cat' (stdin), process 45022 on pipe fd: 3 with SIGIO....
t_sigio[45021]: 0 bytes available.
t_sigio[45021]: 1251195901: Alarmed: killing cat pid 45022.
t_sigio[45021]: 0 bytes available.
t_sigio[45021]: 1384607628: 1384604300: 3328 : SIGIO #1 : (6 , 18).
t_sigio[45021]: SUCCESS.

Comment 8 Jason Vas Dias 2022-11-01 14:20:53 UTC
Created attachment 1921544 [details]
Modified v6.0.5-200.fc36 spec file to build latest upstream v6.0.6 with patch applied

I built linux-6.0.6 with (most) 6.0.5 patches applied ,
AND the unmodified v5.19.7 patch posted above applied with no changes needed,
 ($ sha256sum fs_pipe_c_pipe_fd_sigio_poll_hup.patch 
  97bff61cb6f1f1d97e38d834ec76ea554aa8376c730ac2f76c038dbea029b0e6    
  fs_pipe_c_pipe_fd_sigio_poll_hup.patch  
 ),
but the patch-6.0-redhat.patch needed a minor change: 

$ diff -U0 patch-6.0-redhat.patch~ patch-6.0-redhat.patch
--- patch-6.0-redhat.patch~	2022-10-26 16:30:02.000000000 +0100
+++ patch-6.0-redhat.patch	2022-10-31 14:21:37.911203966 +0000
@@ -604 +604 @@
-@@ -2712,9 +2712,16 @@ static int vc4_hdmi_init_resources(struct vc4_hdmi *vc4_hdmi)
+@@ -2712,9 +2712,16 @@
@@ -611 +611 @@
-
+ 
@@ -620,2 +620,2 @@
-
-@@ -2796,6 +2803,12 @@ static int vc5_hdmi_init_resources(struct vc4_hdmi *vc4_hdmi)
+ 
+@@ -2796,6 +2803,12 @@
@@ -624 +624 @@
-
+ 
@@ -634 +634 @@
-@@ -2859,7 +2872,7 @@ static int vc4_hdmi_runtime_suspend(struct device *dev)
+@@ -2859,7 +2872,7 @@
@@ -637 +637 @@
-
+ 
@@ -640 +640 @@
-
+ 
@@ -643,14 +643 @@
-@@ -2869,12 +2882,37 @@ static int vc4_hdmi_runtime_resume(struct device *dev)
- 	struct vc4_hdmi *vc4_hdmi = dev_get_drvdata(dev);
- 	unsigned long __maybe_unused flags;
- 	u32 __maybe_unused value;
-+	unsigned long rate;
- 	int ret;
-
--	ret = clk_prepare_enable(vc4_hdmi->hsm_clock);
-+	/*
-+	 * The HSM clock is in the HDMI power domain, so we need to set
-+	 * its frequency while the power domain is active so that it
-+	 * keeps its rate.
-+	 */
-+	ret = clk_set_min_rate(vc4_hdmi->hsm_rpm_clock, HSM_MIN_CLOCK_FREQ);
+@@ -2884,6 +2897,21 @@
@@ -659,5 +646 @@
-
-+	ret = clk_prepare_enable(vc4_hdmi->hsm_rpm_clock);
-+	if (ret)
-+		return ret;
-+
+ 
@@ -681,2 +664,2 @@
-
-@@ -2896,6 +2934,10 @@ static int vc4_hdmi_runtime_resume(struct device *dev)
+ 
+@@ -2905,6 +2933,10 @@
@@ -684 +667 @@
-
+ 
@@ -691 +674 @@
-
+ 

otherwise the patch generated against v6.0.5 would not apply to v6.0.6 
upstream linux tarball.

All RPMs updated to latest version and all kernel RPMs built fine, system
is running great, no strangeness with heavy pipe FD users observed,
and the test program succeeds, which it cannot do without the patch -
with the patch, pipe FD users DO NOT have to enter poll() or read() 
in order to determine when the remote end hangs up, they will get
a SIGIO with si_code set to POLL_HUP in this case - that is the ONLY change,
and it works great - please consider applying this patch! Thanks!

Comment 9 Jason Vas Dias 2022-11-01 14:28:52 UTC
Created attachment 1921545 [details]
The v6.0.5-200.fc36 patch-6.0-redhat.patch modified to apply to v6.0.6's drivers/hdmi/vc4/vc4_hdmi.c

To get v6.0.5's redhat patch to apply to v6.0.6, these changes were made.

