Bug 2101641 - unprivileged GDB can lock up rawhide system running kernel-5.19.0-0.{rc1,rc2,rc3,rc4}
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-06-28 03:04 UTC by Kevin Buettner
Modified: 2022-11-30 22:17 UTC
CC List: 21 users

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-11-30 22:17:41 UTC
Type: Bug
Embargoed:


Attachments
C source file / test case (2.76 KB, text/x-csrc)
2022-06-28 03:04 UTC, Kevin Buettner
Kernel logs (66.17 KB, text/plain)
2022-06-28 03:06 UTC, Kevin Buettner
more kernel logs (71.38 KB, text/plain)
2022-06-28 10:25 UTC, Frank Ch. Eigler

Description Kevin Buettner 2022-06-28 03:04:17 UTC
Created attachment 1893070 [details]
C source file / test case

1. Please describe the problem:

Running the GDB test case gdb.threads/process-dies-while-detaching.exp locks up systems running recent rawhide kernels.

Specifically, when GDB's 'detach' command is run as described below, I see that the virt-manager CPU usage monitor becomes pegged. The system is then almost immediately unresponsive and pings to that machine no longer work.

2. What is the Version-Release number of the kernel:

I've reproduced this problem using the following kernel versions:

5.19.0-0.rc1.14.fc37.x86_64
5.19.0-0.rc1.20220610git874c8ca1e60b.18.fc37.x86_64
5.19.0-0.rc2.21.fc37.x86_64
5.19.0-0.rc2.20220616git30306f6194ca.23.fc37.x86_64
5.19.0-0.rc3.20220621git78ca55889a54.28.fc37.x86_64
5.19.0-0.rc3.20220624git92f20ff72066.30.fc37.x86_64
5.19.0-0.rc4.33.fc37.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

I've verified that the test works as expected using the following kernel versions:

5.18.0-60.fc37.x86_64
5.19.0-0.rc0.20220524git143a6252e1b8.60.fc37.x86_64
5.19.0-0.rc0.20220603git50fd82b3a9a9.11.fc37.x86_64


The problem first appeared with 5.19.0-0.rc1.14.fc37.x86_64.
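
For anyone wanting a quick check of whether a machine falls in the range above, here is a minimal sketch that classifies a kernel release string. It assumes that the rcN tag alone decides the outcome, which is only an inference from the builds listed in this report. The function name classify_kernel and its output labels are made up for illustration.

```shell
# Classify a kernel release string against the builds listed in this report.
# Assumption: the rcN tag alone decides the result; only the specific builds
# named in the report were actually tested.
classify_kernel() {
    case "$1" in
        # rc1 through rc4 builds were reported to lock up.
        5.19.0-0.rc[1-4].*) echo "affected" ;;
        # 5.18.0 and the rc0 snapshots were reported to work.
        5.19.0-0.rc0.*|5.18.0-*) echo "reported-good" ;;
        # Anything else is not covered by the lists above.
        *) echo "unknown" ;;
    esac
}
```

To check the running kernel, call it as: classify_kernel "$(uname -r)"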

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

Yes, I can reproduce the issue...

Build the attached .c file using the following command:

gcc -o process-dies-while-detaching -g process-dies-while-detaching.c 

Run it under GDB using this command:

gdb ./process-dies-while-detaching -iex 'set debuginfod enabled off' -ex start -ex 'watch globalvar' -ex 'tbreak _exit' -ex continue -ex detach

When the program / test runs correctly, you should see:

...
Thread 35 "process-dies-wh" hit Temporary breakpoint 3, __GI__exit (status=0) at ../sysdeps/unix/sysv/linux/_exit.c:27
27	{
Detaching from program: /mesquite2/kernel-gdb-detach-bug/process-dies-while-detaching, process 1475
[Inferior 1 (process 1475) detached]
(gdb) 

(The thread number and process number will likely differ from one run to another.)

When it fails to work / locks up the system, you'll just see this instead:

...
Thread 200 "process-dies-wh" hit Temporary breakpoint 3, __GI__exit (status=0) at ../sysdeps/unix/sysv/linux/_exit.c:27
27	{


Note that the 'Detaching from program' message is missing.  Also, the machine will lock up at this point.
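
If the GDB session output is being saved to a file, the difference between the two outcomes can be checked mechanically. The sketch below assumes the log is captured somewhere that survives the lockup (for example, on another host); the function name check_detach and the log file are hypothetical, and the patterns are taken from the sample output above.

```shell
# Report whether a saved GDB session log contains both detach messages.
# Assumption: $1 is a file holding the captured GDB output.
check_detach() {
    if grep -q '^Detaching from program: ' "$1" &&
       grep -q '^\[Inferior 1 (process [0-9][0-9]*) detached\]' "$1"; then
        echo "detached"
    else
        echo "no-detach-message"
    fi
}
```

A result of "no-detach-message" matches the failing output shown above, though it could also mean that the log was simply cut off early.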


5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

Yes, the problem occurs with the most recent version of the rawhide kernel at the time that I wrote this bug report.  That version is:

5.19.0-0.rc4.33.fc37.x86_64

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Comment 1 Kevin Buettner 2022-06-28 03:06:55 UTC
Created attachment 1893071 [details]
Kernel logs

Kernel logs as requested.  (Though I doubt that they'll be very useful.)

Comment 2 Frank Ch. Eigler 2022-06-28 10:25:23 UTC
Created attachment 1893150 [details]
more kernel logs

Some traces from a few episodes of this problem, this time saved via netconsole.

Comment 3 Kevin Buettner 2022-07-10 01:45:32 UTC
This seems to be fixed in 5.19.0-0.rc5.20220708gite8a4e1c1bb69.44.fc37.x86_64.

