Bug 1879450 - wchan field in 'ps' is always blank (hyphen) even for sleeping tasks
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 32
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2020-09-16 10:25 UTC by Craig Ringer
Modified: 2021-05-25 16:44 UTC (History)

Last Closed: 2021-05-25 16:44:50 UTC
Type: Bug

Description Craig Ringer 2020-09-16 10:25:43 UTC
ps's "wchan" output is blank on Fedora 32.

$ ps -C postgres -o pid,ppid,stat,command:20,wchan:40
    PID    PPID STAT COMMAND              WCHAN
  87169    1869 S+   /home/craig/pg/2QREL -
  87171   87169 Ss   postgres: checkpoint -
  87172   87169 Ss   postgres: background -
  87173   87169 Ss   postgres: walwriter  -
  87174   87169 Ss   postgres: autovacuum -
  87175   87169 Ss   postgres: stats coll -
  87176   87169 Ss   postgres: pglogical  -
  87177   87169 Ss   postgres: logical re -
  87179   87169 Ss   postgres: pglogical  -
  87181   87169 Ss   postgres: pglogical  -
  87190    1869 S+   /home/craig/pg/2QREL -
  87192   87190 Ss   postgres: checkpoint -
  87193   87190 Ss   postgres: background -
  87194   87190 Ss   postgres: walwriter  -
  87195   87190 Ss   postgres: autovacuum -
  ...

The same is true when run as root; this doesn't appear to be a simple user access restriction.

SELinux doesn't have any audit complaints about it, and temporarily disabling SELinux enforcement with "sudo setenforce 0" has no effect on the outcome.

I noticed this in Fedora 31, but it persists in Fedora 32.

$ uname -a
Linux kaylee 5.8.6-201.fc32.x86_64 #1 SMP Fri Sep 4 03:27:03 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

$ ps --version
ps from procps-ng 3.3.15

Comment 1 Craig Ringer 2020-09-16 10:33:19 UTC
I originally filed this bug against procps-ng, but changed it to a kernel bug report because /proc/$pid/wchan is 0 for sleeping processes, suggesting the problem lies in the kernel's reporting rather than in ps's display:

$ ps -p 87169 -o pid,ppid,stat,command:20,wchan:40
    PID    PPID STAT COMMAND              WCHAN
  87169    1869 S+   /home/craig/pg/2QREL -

$ cat /proc/87169/wchan; echo
0

$ cat /proc/87169/stat
87169 (postgres) S 1869 47896 3242 34816 47896 4194304 2727 18571 0 0 4 3 55 36 20 0 1 0 1017842 246456320 3912 18446744073709551615 4694016 11250049 140724457406240 0 0 0 0 19935232 84487 0 0 0 17 3 0 0 0 0 0 14470576 14531864 41201664 140724457409368 140724457409504 140724457409504 140724457414600 0

$ gdb -q -nx -batch -p 87169 -ex 'set batch on' -ex 'bt' -ex 'quit'
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f199fee4f7a in __GI___select (nfds=4, readfds=0x7ffcf74b7be0, writefds=0x0, exceptfds=0x0, timeout=0x7ffcf74b7c60) at ../sysdeps/unix/sysv/linux/select.c:41
41	  return SYSCALL_CANCEL (select, nfds, readfds, writefds, exceptfds,
No symbol "batch" in current context.
#0  0x00007f199fee4f7a in __GI___select (nfds=4, readfds=0x7ffcf74b7be0, writefds=0x0, exceptfds=0x0, timeout=0x7ffcf74b7c60) at ../sysdeps/unix/sysv/linux/select.c:41
#1  0x000000000082f908 in ServerLoop () at XXXX/src/backend/postmaster/postmaster.c:1671
#2  0x000000000082f2da in PostmasterMain (argc=3, argv=0x274ce90) at XXXX/src/backend/postmaster/postmaster.c:1380
#3  0x0000000000754079 in main (argc=3, argv=0x274ce90) at XXXX/src/backend/main/main.c:228


We can see that postgres is sleeping in select() - which is actually odd, since I would've expected it to use epoll(). But wchan is "-" in ps, and 0 in procfs.

This isn't postgres-specific; it was just a convenient process to poke as an example. I see the same for everything.
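
One way to check the "same for everything" observation systematically is to tally the wchan files across all tasks. The following is a minimal sketch, not part of procps: the function name `tally_wchan` and its optional root-directory argument are hypothetical, added so the loop can be pointed at a test directory instead of /proc. Note that a task that is not blocked also reports `0`, so a healthy system would be expected to show a mix, not only named entries.

```shell
#!/bin/sh
# tally_wchan [ROOT]: count wchan files under ROOT (default /proc) that
# contain "0" versus anything else. Hypothetical helper for illustration.
tally_wchan() {
    root=${1:-/proc}
    zero=0
    named=0
    for f in "$root"/[0-9]*/wchan; do
        # Tasks can exit mid-scan; skip anything unreadable or empty.
        w=$(cat "$f" 2>/dev/null) || continue
        [ -n "$w" ] || continue
        if [ "$w" = "0" ]; then
            zero=$((zero + 1))
        else
            named=$((named + 1))
        fi
    done
    echo "zero=$zero named=$named"
}
```

Calling `tally_wchan` with no argument scans /proc. On an affected system the `named` count would presumably stay at 0 even while many tasks are in state S.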

Comment 2 Craig Ringer 2020-09-16 10:50:04 UTC
Also note

$ awk '{ printf "stat=%s flags=%x kstesp=%d ksteip=%d wchan=%d\n", $3, $9, $29, $30, $35; }' < /proc/87169/stat
stat=S flags=400000 kstesp=0 ksteip=0 wchan=0

Also, the EUID and EGID shown by ps are the same as the running user id and gid. In any case, I can ptrace the process, so this shouldn't be anything like the old patch here https://lwn.net/Articles/331158/ .
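
The awk command above splits the stat line on whitespace, which works for `(postgres)` but would shift every field for a comm value containing spaces. A more defensive sketch strips everything up to the last `) ` before splitting. The field positions are the ones proc(5) documents (3 state, 9 flags, 29 kstkesp, 30 kstkeip, 35 wchan); the helper name `stat_summary` is hypothetical.

```shell
#!/bin/sh
# stat_summary LINE: print state, flags, kstkesp, kstkeip and wchan from one
# /proc/<pid>/stat line. Hypothetical helper name, for illustration only.
stat_summary() {
    # comm (field 2) may contain spaces or ')', so drop everything up to and
    # including the last ") "; what remains starts at field 3 (state).
    rest=${1##*") "}
    # Unquoted on purpose: split the remaining fields on whitespace.
    set -- $rest
    # Word N of $rest is field N+2 of the full line.
    printf 'stat=%s flags=%x kstkesp=%s kstkeip=%s wchan=%s\n' \
        "$1" "$7" "${27}" "${28}" "${33}"
}
```

It would be called as `stat_summary "$(cat /proc/87169/stat)"`. Fed the stat line quoted in comment 1, it reports the same values as the awk one-liner.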

Comment 3 Craig Ringer 2020-09-16 11:05:38 UTC
A further update here: I've found that while `wchan` is always `-`, the `nwchan` format-specifier field from `ps` may show as either `0` or `ffffff`. 

Oddly, the 30th field of /proc/{pid}/stat seems to be zero either way, but that may be user error on my part.

# awk '$30 != 0 { print $1, $30; }' /proc/*/stat 2>/dev/null
#
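
For reference, proc(5) lists wchan as field 35 of /proc/[pid]/stat; field 30 is kstkeip, so the scan above is testing a different value from the one `nwchan` is presumably derived from. A sketch of the same scan against field 35 follows. The helper name `nonzero_wchan` and its optional root argument are hypothetical, and the comm field is stripped first so that process names containing spaces do not shift the count.

```shell
#!/bin/sh
# nonzero_wchan [ROOT]: print "pid wchan" for every ROOT/<pid>/stat whose
# field 35 (wchan) is nonzero. ROOT defaults to /proc. Hypothetical helper.
nonzero_wchan() {
    root=${1:-/proc}
    for f in "$root"/[0-9]*/stat; do
        # Tasks can exit mid-scan; skip unreadable or empty files.
        line=$(cat "$f" 2>/dev/null) || continue
        [ -n "$line" ] || continue
        pid=${line%% *}
        # Drop "pid (comm) " so the first remaining word is field 3.
        rest=${line##*") "}
        set -- $rest
        [ "$#" -ge 33 ] || continue
        # Word 33 of the remainder is field 35 of the full line.
        if [ "${33}" != "0" ]; then
            echo "$pid ${33}"
        fi
    done
    return 0
}
```

Run as root with no argument, this prints nothing if every task reports a zero wchan in its stat file.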

Comment 4 Georg Sauthoff 2020-11-07 11:41:26 UTC
I can confirm that wchan support is completely broken on Fedora 32:


uname -a
Linux goedel.lru.li 5.8.15-201.fc32.x86_64 #1 SMP Thu Oct 15 15:56:44 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
whoami
root
cat /proc/*/wchan
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000


Perhaps this has something to do with the kernel being compiled without frame-pointer support (cf. get_wchan() in arch/x86/kernel/process.c)?
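
To see which stack unwinder a given kernel was built with, the build configuration can be inspected. This is only a sketch: it assumes the config file lives at /boot/config-$(uname -r), which is where Fedora installs it, and the helper name `unwinder_options` is hypothetical. The output shows the build options only; it does not by itself confirm that the unwinder choice is what breaks wchan.

```shell
#!/bin/sh
# unwinder_options [CONFIG]: list the frame-pointer / unwinder options that
# are enabled in a kernel config file. Defaults to /boot/config-$(uname -r);
# other distributions may keep the config elsewhere.
unwinder_options() {
    cfg=${1:-/boot/config-$(uname -r)}
    # Lines such as "# CONFIG_FOO is not set" are deliberately not matched.
    grep -E '^CONFIG_(FRAME_POINTER|UNWINDER_[A-Z_]+)=' "$cfg"
}
```

A kernel built without frame pointers would be expected to list an unwinder other than `CONFIG_UNWINDER_FRAME_POINTER` and no `CONFIG_FRAME_POINTER=y` line.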

Comment 5 Georg Sauthoff 2020-11-15 20:17:53 UTC
FWIW (and perhaps expected), also broken on Fedora 33:

uname -a
Linux dell12.lru.li 5.8.18-300.fc33.x86_64 #1 SMP Mon Nov 2 19:09:05 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
cat /proc/*/wchan
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

Comment 6 Fedora Program Management 2021-04-29 16:37:55 UTC
This message is a reminder that Fedora 32 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 32 on 2021-05-25.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '32'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 32 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 7 Ben Cotton 2021-05-25 16:44:50 UTC
Fedora 32 changed to end-of-life (EOL) status on 2021-05-25. Fedora 32 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

