Created attachment 1910355 [details]
Patch against fs/pipe.c of v5.19.7

1. Please describe the problem:

Compile & run the example program on any version of Linux prior to my patched v5.19.7 version, and it prints:

    $ gcc -o t_sigio t_sigio.c
    $ ./t_sigio
    t_sigio[2002797]: 336339 Listening to 'cat' (stdin) on pipe fd: 3 with SIGIO....
    t_sigio[2002797]: 0 bytes available.
    t_sigio[2002797]: Alarmed: killing cat pid 2002798.
    t_sigio[2002797]: 0 bytes available.
    t_sigio[2002797]: 1131733499: 1131731216: 2283 : SIGIO #1 : (1 , 41).
    t_sigio[2002797]: 0 bytes available.
    t_sigio[2002797]: FAILURE.

A process that holds the read end of a pipe with a single writer, and that is not currently read()-ing the pipe, gets a SIGIO with the struct siginfo 'si_code' field set to 1 (POLL_IN) & 'si_band' set to 41 when that one writer closes its write end (e.g. by being killed), even though ioctl(fd, FIONREAD, &sz) returns with sz == 0.

Run it under the patched v5.19.7 kernel, which I produced by modifying the latest Fedora 36 v5.19.6 spec file, and it prints:

    $ uname -a
    Linux jvdspc.jvds.net 5.19.7-200.fc36.x86_64+debug #1 SMP PREEMPT_DYNAMIC Wed Sep 7 21:14:57 IST 2022 x86_64 x86_64 x86_64 GNU/Linux
    [jvd@jvdspc:~ [3230] 02:30:42 [#:3!:6975]{0}
    $ ./t_sigio
    t_sigio[3257]: 855318 Listening to 'cat' (stdin) on pipe fd: 3 with SIGIO....
    t_sigio[3257]: 0 bytes available.
    t_sigio[3257]: Alarmed: killing cat pid 3258.
    t_sigio[3257]: 0 bytes available.
    t_sigio[3257]: SUCCESS.

So, with the very short attached patch to fs/pipe.c, it DOES get a SIGIO (not printed, unfortunately...) with 'si_code' set to POLL_HUP. The patch simply sends SIGIO with si_code=POLL_HUP whenever no writers are detected on a read pipe, or whenever a pipe is closed and there are no writers.

All Fedora + full GNOME or Xfce4 Desktop processes start up & run fine under this patched kernel - Emacs displays Manual Pages fine (very pipe dependent...) - and my test program (demonstrating & fixing a problem in an I/O library I am developing) works fine.
So the patch appears to be harmless, and really makes I/O programming with pipes & SIGIO much more robust & straightforward.

2. What is the Version-Release number of the kernel:

The test fails under all kernels previous to my patched 5.19.7.

3. Did it work previously in Fedora? If so, what kernel version did the issue *first* appear? Old kernels are available for download at https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

No.

4. Can you reproduce this issue? If so, please provide the steps to reproduce the issue below:

See #1.

5. Does this problem occur with the latest Rawhide kernel? To install the Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by ``sudo dnf update --enablerepo=rawhide kernel``:

Yes - tested under Fedora 36's v5.19.6, which is the same version as rawhide's.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No.

7. Please attach the kernel logs. You can get the complete kernel log for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the issue occurred on a previous boot, use the journalctl ``-b`` flag.

Will do if requested - I did 'dmesg -c > ~/dmesg.log' - it is very big.
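For readers who have not used SIGIO on a pipe, the setup described above can be sketched roughly as follows. This is a minimal illustration and not the attached t_sigio.c: the names `arm_sigio`, `on_sigio`, `g_sigio_count`, `g_last_code` and `g_last_band` are hypothetical, and it assumes Linux with glibc (F_SETSIG is Linux-specific).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for F_SETSIG */
#endif
#include <assert.h>             /* only used by the checks in the usage note */
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#ifndef F_SETSIG
#define F_SETSIG 10             /* Linux value; fallback if _GNU_SOURCE took effect too late */
#endif

/* Hypothetical globals: what the most recent SIGIO reported. */
static volatile sig_atomic_t g_sigio_count;
static volatile sig_atomic_t g_last_code;   /* si_code, e.g. POLL_IN or POLL_HUP */
static volatile sig_atomic_t g_last_band;   /* si_band, a poll()-style event mask */

static void on_sigio(int sig, siginfo_t *si, void *uctx)
{
    (void)sig;
    (void)uctx;
    g_last_code = si->si_code;
    g_last_band = (sig_atomic_t)si->si_band;
    g_sigio_count++;
}

/*
 * Ask the kernel to send this process SIGIO, with siginfo filled in,
 * for events on fd.  Returns 0 on success, -1 on error.
 */
int arm_sigio(int fd)
{
    struct sigaction sa;
    int fl;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigio;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGIO, &sa, NULL) < 0)
        return -1;
    if (fcntl(fd, F_SETOWN, getpid()) < 0)   /* deliver to this process */
        return -1;
    if (fcntl(fd, F_SETSIG, SIGIO) < 0)      /* non-zero arg: si_code/si_band are filled in */
        return -1;
    fl = fcntl(fd, F_GETFL);
    if (fl < 0 || fcntl(fd, F_SETFL, fl | O_ASYNC | O_NONBLOCK) < 0)
        return -1;
    return 0;
}
```

A caller would create a pipe, call `arm_sigio()` on the read end, and then inspect `g_last_code` after each signal, e.g. `assert(arm_sigio(p[0]) == 0);`. The "(1 , 41)" and "(6 , 18)" pairs in the test output appear to be the si_code and si_band values recorded this way.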
Created attachment 1910356 [details]
testcase: t_sigio.c

    $ gcc -o t_sigio t_sigio.c
    $ ./t_sigio
    t_sigio[4884]: 743001 Listening to 'cat' (stdin) on pipe fd: 3 with SIGIO....
    t_sigio[4884]: 0 bytes available.
    t_sigio[4884]: Alarmed: killing cat pid 4885.
    t_sigio[4884]: 0 bytes available.
    t_sigio[4884]: SUCCESS.
I reported this in more detail, and submitted the patch & test program, at kernel Bugzilla 216458: https://bugzilla.kernel.org/show_bug.cgi?id=216458

If @dhowells could please review this bug & patch & respond to the lkml post I sent about this, I'd be much obliged.

Why can't Linux send POLL_HUP on pipes with no writers? It helps a lot and does no harm - a read() still returns length 0.
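The ambiguity behind that question can be shown without any signals at all. In the sketch below (hypothetical helper names `bytes_available` and `probe_pipe`, assuming the read end is O_NONBLOCK), FIONREAD reports 0 both for an idle pipe and for a pipe whose writers are gone; only a read() attempt tells the two apart.

```c
#include <assert.h>             /* only used by the checks in the usage note */
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

enum pipe_state { PIPE_DATA, PIPE_IDLE, PIPE_EOF, PIPE_ERROR };

/* Bytes currently buffered in the pipe, or -1 on error. */
int bytes_available(int fd)
{
    int n = 0;

    return ioctl(fd, FIONREAD, &n) < 0 ? -1 : n;
}

/*
 * Non-blocking read probe (fd must have O_NONBLOCK set).
 * Any data read is left in buf with its length in *nread, so nothing is lost.
 *   PIPE_DATA  - read() returned bytes
 *   PIPE_EOF   - read() returned 0: no writers remain
 *   PIPE_IDLE  - EAGAIN: a writer still exists but has sent nothing
 *   PIPE_ERROR - anything else (including EINTR; the caller may retry)
 */
enum pipe_state probe_pipe(int fd, char *buf, size_t len, ssize_t *nread)
{
    ssize_t r = read(fd, buf, len);

    *nread = r;
    if (r > 0)
        return PIPE_DATA;
    if (r == 0)
        return PIPE_EOF;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return PIPE_IDLE;
    return PIPE_ERROR;
}
```

With a writer still open, `bytes_available()` is 0 and `probe_pipe()` gives PIPE_IDLE; after the last writer closes, `bytes_available()` is still 0 but `probe_pipe()` gives PIPE_EOF. A reader that only consults FIONREAD from its SIGIO handler therefore cannot tell these cases apart.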
Yesterday I did a full BIOS & peripherals (dock) firmware upgrade: a recent 'yum update', then 'gnome-software' to install the manufacturer's BIOS upgrade, during which ALL my laptop's firmware was upgraded. Afterwards my patched v5.19.7+debug kernel is running fine, with no problems seen that were not seen before (oopses on Synaptics touchpad gesture non-recognition, and on failures of lvm2 to mount my loop devices during boot ... these happened before and are non-issues / easily rectified). Of course, it is not signed, so there were a few extra messages in the logs about that ...

I am attaching patches to the kernel.spec and 'process_configs.sh' files - changes I had to make to get the RPMs to build - and I still had to do 'rpmbuild -bb --short-circuit' at the end because the kabi-dw*.tar.xz files for ABI 5.19.7 were not found.

A link to the SRPM on my Google Drive is: https://drive.google.com/file/d/1Mtr1nJ6wGUQfKMZajhzjdZ2PZOyFUDo7/view?usp=sharing

I can upload the binary RPMs produced, if anyone wants:

    $ ls -ltr kernel*.rpm
    -rw-r--r--. 1 jvd devel    267569 Sep 8 01:06 kernel-5.19.7-200.fc36.x86_64.rpm
    -rw-r--r--. 1 jvd devel    267677 Sep 8 01:06 kernel-debug-5.19.7-200.fc36.x86_64.rpm
    -rw-r--r--. 1 jvd devel    267733 Sep 8 01:06 kernel-debug-devel-matched-5.19.7-200.fc36.x86_64.rpm
    -rw-r--r--. 1 jvd devel    889781 Sep 8 01:06 kernel-debug-modules-internal-5.19.7-200.fc36.x86_64.rpm
    -rw-r--r--. 1 jvd devel   3955805 Sep 8 01:06 kernel-debug-modules-extra-5.19.7-200.fc36.x86_64.rpm
    -rw-r--r--. 1 jvd devel  16637969 Sep 8 01:07 kernel-debug-devel-5.19.7-200.fc36.x86_64.rpm
    -rw-r--r--. 1 jvd devel  54246361 Sep 8 01:07 kernel-debug-core-5.19.7-200.fc36.x86_64.rpm
    -rw-r--r--. 1 jvd devel  62176229 Sep 8 01:07 kernel-debug-modules-5.19.7-200.fc36.x86_64.rpm
    -rw-r--r--. 1 jvd devel  81671093 Sep 8 01:08 kernel-debuginfo-common-x86_64-5.19.7-200.fc36.x86_64.rpm
    -rw-r--r--. 1 jvd devel 847494721 Sep 8 01:10 kernel-debug-debuginfo-5.19.7-200.fc36.x86_64.rpm
Created attachment 1910754 [details]
testcase: t_sigio.c

Improved to determine the correct popen() pid to kill - sometimes, on a busy system, this may NOT be (pid+1). This change has nothing to do with the core pipe FD issue being tested - it is just that the old version could sometimes kill the wrong pid, and so get a 2nd alarm and fail.
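One way to avoid guessing the child's pid at all is to create the pipe and fork directly instead of using popen(). The sketch below is an assumption about how this could be done, not necessarily what the updated attachment does; `spawn_cat` is a hypothetical name.

```c
#include <assert.h>             /* only used by the checks in the usage note */
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that runs 'cat' with its stdout connected to a new pipe.
 * On success returns the child's pid (known exactly, no pid+1 guess) and
 * stores the read end of the pipe in *rfd.  Returns -1 on error.
 */
pid_t spawn_cat(int *rfd)
{
    int p[2];
    pid_t pid;

    if (pipe(p) < 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {                      /* child */
        close(p[0]);
        if (p[1] != STDOUT_FILENO) {
            dup2(p[1], STDOUT_FILENO);
            close(p[1]);
        }
        execlp("cat", "cat", (char *)NULL);
        _exit(127);                      /* exec failed */
    }
    close(p[1]);     /* parent keeps only the read end: the child is the sole writer */
    *rfd = p[0];
    return pid;
}
```

The parent can then `kill(pid, SIGKILL)` and `waitpid(pid, ...)` on exactly the process that holds the write end, after which the read end has no writers left.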
Created attachment 1915984 [details]
Patch against a RHEL8 v4.18 kernel for same issue - fs/pipe.c only

This patch builds against, & is tested on, the latest RHEL8 v4.18 kernel (Rocky Linux EL8). No problems observed on the target build + file (NFS) server platform, and the same t_sigio test program now passes.
Here is a link to the SRPM for the v4.18 based kernels I built & am running OK on a Rocky RHEL8 clone remote fileserver: https://drive.google.com/file/d/1i-kAfDegaLv2NXgh_21iRUVDmZfZuNaR/view?usp=sharing

And here is a link to the SRPM for the v5.19.12+ based kernels I am running, with the first version of the fs/pipe.c patch applied, on my Fedora 36 x86_64 12-core laptop: https://drive.google.com/file/d/1adj1LbooL2NCIEYz3z0mrmaDHeWhvOjR/view?usp=sharing

The only differences to the spec files are A) not enabling kernel signing, and B) applying my patch to fs/pipe.c. They all work fine - no problems observed on server or laptop.

The point is: how else can programs distinguish, without the possibility of hanging infinitely, between
(a) a closed final write+read, where a read() would return 0 - which is NOT perceived by my test program, since it does not enter read() or poll() when ioctl(fd, FIONREAD, &len) returns with len == 0 - and
(b) a write, after which the still-active receiver can go on to handle more input?

There is no easy / robust / reliable answer before my patch, IMHO. Entering poll() or read() is what I am able to avoid with this patch, which reliably MAKES NO DIFFERENCE to ANY existing O_ASYNC pipe reader / writer process on a fully up-to-date Rocky Linux distro server system and on a fully up-to-date Fedora desktop system, and DOES allow the end-of-input with length == 0 case to be easily distinguished from the waiting-for-more-input case.
Created attachment 1916012 [details]
improved test case with corrected format to print pid & number of ns.

    $ gcc -o t_sigio t_sigio.c
    [jvd@jvdspc:~ [3264] 18:53:41 [#:72!:7842]{0}
    $ ./t_sigio
    t_sigio[45021]: 251050312 Listening to 'cat' (stdin), process 45022 on pipe fd: 3 with SIGIO....
    t_sigio[45021]: 0 bytes available.
    t_sigio[45021]: 1251195901: Alarmed: killing cat pid 45022.
    t_sigio[45021]: 0 bytes available.
    t_sigio[45021]: 1384607628: 1384604300: 3328 : SIGIO #1 : (6 , 18).
    t_sigio[45021]: SUCCESS.
Created attachment 1921544 [details]
Modified v6.0.5-200.fc36 spec file to build latest upstream v6.0.6 with patch applied

I built linux-6.0.6 with (most) 6.0.5 patches applied, AND with the unmodified v5.19.7 patch posted above applied with no changes needed:

    $ sha256sum fs_pipe_c_pipe_fd_sigio_poll_hup.patch
    97bff61cb6f1f1d97e38d834ec76ea554aa8376c730ac2f76c038dbea029b0e6 fs_pipe_c_pipe_fd_sigio_poll_hup.patch

but the patch-6.0-redhat.patch needed a minor change:

    $ diff -U0 patch-6.0-redhat.patch~ patch-6.0-redhat.patch
    --- patch-6.0-redhat.patch~ 2022-10-26 16:30:02.000000000 +0100
    +++ patch-6.0-redhat.patch 2022-10-31 14:21:37.911203966 +0000
    @@ -604 +604 @@
    -@@ -2712,9 +2712,16 @@ static int vc4_hdmi_init_resources(struct vc4_hdmi *vc4_hdmi)
    +@@ -2712,9 +2712,16 @@
    @@ -611 +611 @@
    - 
    + 
    @@ -620,2 +620,2 @@
    - 
    -@@ -2796,6 +2803,12 @@ static int vc5_hdmi_init_resources(struct vc4_hdmi *vc4_hdmi)
    + 
    +@@ -2796,6 +2803,12 @@
    @@ -624 +624 @@
    - 
    + 
    @@ -634 +634 @@
    -@@ -2859,7 +2872,7 @@ static int vc4_hdmi_runtime_suspend(struct device *dev)
    +@@ -2859,7 +2872,7 @@
    @@ -637 +637 @@
    - 
    + 
    @@ -640 +640 @@
    - 
    + 
    @@ -643,14 +643 @@
    -@@ -2869,12 +2882,37 @@ static int vc4_hdmi_runtime_resume(struct device *dev)
    - struct vc4_hdmi *vc4_hdmi = dev_get_drvdata(dev);
    - unsigned long __maybe_unused flags;
    - u32 __maybe_unused value;
    -+ unsigned long rate;
    - int ret;
    - 
    -- ret = clk_prepare_enable(vc4_hdmi->hsm_clock);
    -+ /*
    -+ * The HSM clock is in the HDMI power domain, so we need to set
    -+ * its frequency while the power domain is active so that it
    -+ * keeps its rate.
    -+ */
    -+ ret = clk_set_min_rate(vc4_hdmi->hsm_rpm_clock, HSM_MIN_CLOCK_FREQ);
    +@@ -2884,6 +2897,21 @@
    @@ -659,5 +646 @@
    - 
    -+ ret = clk_prepare_enable(vc4_hdmi->hsm_rpm_clock);
    -+ if (ret)
    -+ return ret;
    -+
    + 
    @@ -681,2 +664,2 @@
    - 
    -@@ -2896,6 +2934,10 @@ static int vc4_hdmi_runtime_resume(struct device *dev)
    + 
    +@@ -2905,6 +2933,10 @@
    @@ -684 +667 @@
    - 
    + 
    @@ -691 +674 @@
    - 
    + 

Otherwise the patch generated against v6.0.5 would not apply to the v6.0.6 upstream linux tarball.

All RPMs were updated to the latest version and all kernel RPMs built fine. The system is running great, no strangeness has been observed with heavy pipe FD users, and the test program succeeds, which it cannot do without the patch. With the patch, pipe FD users DO NOT have to enter poll() or read() in order to determine when the remote end hangs up: they will get a SIGIO with si_code set to POLL_HUP in this case. That is the ONLY change, and it works great - please consider applying this patch! Thanks!
Created attachment 1921545 [details]
The v6.0.5-200.fc36 patch-6.0-redhat.patch modified to apply to v6.0.6's drivers/gpu/drm/vc4/vc4_hdmi.c

These are the changes that were made to get v6.0.5's redhat patch to apply to v6.0.6.