Bug 1374332 - /proc/stat reports zero procs_blocked even with D-state processes
Summary: /proc/stat reports zero procs_blocked even with D-state processes
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.8
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Oleg Nesterov
QA Contact: Chunyu Hu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-09-08 13:05 UTC by Rodrigo A B Freire
Modified: 2017-04-28 16:56 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1374397
Environment:
Last Closed: 2017-04-03 15:14:58 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 2608861 | 0 | None | None | None | 2016-09-08 14:58:28 UTC

Internal Links: 1796043

Description Rodrigo A B Freire 2016-09-08 13:05:50 UTC
Description of problem:
 * vmstat 'b' field, as per its manpage:
       b: The number of processes in uninterruptible sleep.
 * Generating a number of synthetic D-state processes causes no change in the vmstat 'b' field, nor in dstat -ap.

Version-Release number of selected component (if applicable):
* procps-3.2.8-36.el6

How reproducible:
* 100% / Always

Steps to Reproduce:
1. Freeze a filesystem (not your root filesystem!), for example with fsfreeze -f /test
2. Monitor the system running vmstat 1 and/or dstat -ap
3. Run the following script:
   $ for i in {1..30} ; do touch /test/$i.txt & done

Actual results:
* vmstat or dstat will not show any change in 'b' / 'blk' fields

Expected results:
* vmstat or dstat should report the hung D-state processes.

Additional info:
---

Comment 1 Jan Rybar 2016-09-08 13:19:23 UTC
vmstat takes its data from /proc/stat, where the value really is 0 for some reason (whereas /proc/PID/stat correctly reports the D state). This looks like a kernel matter.

Comment 2 Rodrigo A B Freire 2016-09-08 15:01:13 UTC
https://www.kernel.org/doc/Documentation/filesystems/proc.txt states:

> The "procs_blocked" line gives the number of processes currently blocked,
> waiting for I/O to complete.

However, the described reproducer in Comment #0 does not change procs_blocked.

Comment 4 Joe Lawrence 2016-09-12 20:00:21 UTC
According to the kernel implementation of fs/proc/stat.c :: show_stat(),
what the kernel is reporting as "procs_blocked" is the sum of all
per-cpu "nr_iowait" variables.

nr_iowait is *only* updated when a thread calls io_schedule_timeout() --
incremented before scheduling and decremented after waking up.

Note: it is very possible for a kernel thread to be TASK_UNINTERRUPTIBLE
(i.e., D-state) *without* waiting on I/O.  For example, any call to msleep
will set the current task to TASK_UNINTERRUPTIBLE.  In this case,
instead of waiting until a disk I/O completes, the task waits for a
timer to expire.
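
The accounting described above can be illustrated with a toy model. This is
only a sketch, not kernel code: the ToyCpu class, its d_state_tasks counter and
the method names are hypothetical and exist purely to show that only the
io_schedule_timeout() path touches nr_iowait, while every uninterruptible
sleeper is still in D-state.

```python
from dataclasses import dataclass


@dataclass
class ToyCpu:
    """Toy stand-in for one CPU's scheduler state (not kernel code)."""
    nr_iowait: int = 0      # bumped only on the io_schedule_timeout() path
    d_state_tasks: int = 0  # hypothetical: every TASK_UNINTERRUPTIBLE sleeper

    def io_sleep(self):
        """Sleep via io_schedule_timeout(): D-state and counted as iowait."""
        self.d_state_tasks += 1
        self.nr_iowait += 1

    def io_wake(self):
        """Wake from an I/O sleep: both counters drop again."""
        self.d_state_tasks -= 1
        self.nr_iowait -= 1

    def plain_sleep(self):
        """Sleep via msleep() or wait_event(): D-state, but not iowait."""
        self.d_state_tasks += 1

    def plain_wake(self):
        """Wake from a non-I/O uninterruptible sleep."""
        self.d_state_tasks -= 1


def procs_blocked(cpus):
    """Value /proc/stat would print: the sum of per-CPU nr_iowait."""
    return sum(cpu.nr_iowait for cpu in cpus)


def d_state_total(cpus):
    """Number of tasks a per-PID scan would report as 'D'."""
    return sum(cpu.d_state_tasks for cpu in cpus)


cpus = [ToyCpu(), ToyCpu()]
cpus[0].io_sleep()       # one task genuinely waiting on a disk
for _ in range(30):      # thirty tasks parked without any I/O in flight
    cpus[1].plain_sleep()

print(procs_blocked(cpus), d_state_total(cpus))
```

In this model the two numbers diverge as soon as any task sleeps
uninterruptibly for a reason other than I/O.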

What about the example in Comment #0?

  % dd if=/dev/zero of=/tmp/temp bs=1M count=500
  % losetup /dev/loop1 /tmp/temp
  % mkfs.ext4 /dev/loop1
  % mkdir /mnt/temp
  % mount /dev/loop1 /mnt/temp
  % fsfreeze -f /mnt/temp/
  % touch /mnt/temp/foo &

  % grep procs_blocked /proc/stat
  procs_blocked 0

  % cat /proc/$(pgrep touch)/stat
  3014 (touch) D 2963 3014 2963 34816 3059 4202496 272 0 0 0 0 0 0 0 20 0 1 0 1088036 110514176 87 18446744073709551615 4194304 4244516 140734986742128 140734986741528 139644875737616 0 0 0 0 18446744071580943694 0 0 17 12 0 0 0 0 0 6345544 6349600 34729984 140734986748923 140734986748943 140734986748943 140734986751977 0

  % cat /proc/$(pgrep touch)/stack
  [<ffffffff8120054e>] __sb_start_write+0xde/0x110
  [<ffffffff8121e5a4>] mnt_want_write+0x24/0x50
  [<ffffffff8120cdff>] do_last+0xc1f/0x12a0
  [<ffffffff8120d542>] path_openat+0xc2/0x490
  [<ffffffff8120f6bb>] do_filp_open+0x4b/0xb0
  [<ffffffff811fcbd3>] do_sys_open+0xf3/0x1f0
  [<ffffffff811fccee>] SyS_open+0x1e/0x20
  [<ffffffff81693a09>] system_call_fastpath+0x16/0x1b
  [<ffffffffffffffff>] 0xffffffffffffffff

  crash> dis -l __sb_start_write+0xde
  /usr/src/debug/kernel-3.10.0-501.el7/linux-3.10.0-501.el7.x86_64/fs/super.c: 1147
  0xffffffff8120054e <__sb_start_write+222>:      lea    -0x58(%rbp),%rsi

  1141 int __sb_start_write(struct super_block *sb, int level, bool wait)
  1142 {
  1143 retry:
  1144         if (unlikely(sb->s_writers.frozen >= level)) {
  1145                 if (!wait)
  1146                         return 0;
  1147                 wait_event(sb->s_writers.wait_unfrozen,
  1148                            sb->s_writers.frozen < level);
  1149         }

In this case, the filesystem was frozen, so we haven't even gotten as
far as pushing any I/O out to the device.  The implementation of
include/linux/wait.h :: wait_event() sets TASK_UNINTERRUPTIBLE, checks
on a condition and schedules continuously until the condition is met.
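
The manual comparison above (procs_blocked from /proc/stat versus the state
letter in /proc/<PID>/stat) could be scripted roughly as follows. This is a
minimal sketch: the function names are made up for illustration, SAMPLE is
just the beginning of the stat line shown in the transcript, and
count_d_state() looks only at per-process entries, not at individual threads.

```python
import os


def parse_procs_blocked(proc_stat_text):
    """Return the procs_blocked value from /proc/stat text, or None if absent."""
    for line in proc_stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "procs_blocked":
            return int(fields[1])
    return None


def parse_task_state(pid_stat_line):
    """Return the one-letter state from a /proc/<PID>/stat line.

    The command name sits in parentheses and may itself contain spaces or
    ')', so the line is split after the LAST closing parenthesis.
    """
    rest = pid_stat_line[pid_stat_line.rindex(")") + 1:]
    return rest.split()[0]


def count_d_state(proc_root="/proc"):
    """Count processes currently in state 'D' by scanning /proc/<PID>/stat."""
    count = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                if parse_task_state(fh.read()) == "D":
                    count += 1
        except OSError:
            # The process exited between listdir() and open(); skip it.
            continue
    return count


# Beginning of the /proc/<PID>/stat line from the transcript above.
SAMPLE = "3014 (touch) D 2963 3014 2963 34816 3059 4202496"
print(parse_task_state(SAMPLE))
```

On a Linux host, comparing count_d_state() with
parse_procs_blocked(open("/proc/stat").read()) while the reproducer is running
should show the same gap as the transcript.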

The kernel documentation would probably be a little clearer if it read,
"The "procs_blocked" line gives the number of processes currently
blocked *ON* waiting for I/O to complete."
        ^^^^

It would also be clearer if the kernel /proc/stat interface had printed
"nr_iowait" (as it's referred to in the source code) rather than
"procs_blocked".  However, that ship has sailed and renaming this field
will break all manner of "awk '/procs_blocked/{print $NF}' /proc/stat"
type scripts.

I took a peek at the source for vmstat and it's interesting that in the
absence of a /proc/stat "procs_blocked" field (approximately Linux 2.5.46
and earlier), it will iterate through all /proc/<PID>/stat
files looking for 'R' or 'D' states.  In 2002 code was added to use
/proc/stat "procs_blocked" if it was available.  Nothing in the commit
message for this change makes reference to this field or any change in
reporting semantics.  IMHO, this was a bug introduced into vmstat long
ago... since nobody has complained in the interim, I would suggest
changing the vmstat documentation to match its current implementation:

FIELD DESCRIPTION FOR VM MODE
   Procs
       r: The number of runnable processes (running or waiting for run time).
       b: The number of processes blocked on IO (in uninterruptible sleep.)
                                  ^^^^^^^^^^^^^
or something to the effect of explaining that the value is only the
subset of TASK_UNINTERRUPTIBLE waiting on I/O completion.
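
The two counting strategies attributed to vmstat above could be sketched like
this. It is an illustration of the described behaviour only, not the procps
source: the function name vmstat_procs and its arguments are hypothetical, and
the per-process states are passed in as a plain list so the difference between
the two paths is easy to see.

```python
def vmstat_procs(proc_stat_text, task_states):
    """Return (r, b) the way the comment above describes vmstat's logic.

    proc_stat_text: contents of /proc/stat
    task_states:    one state letter per process, e.g. ["R", "S", "D"]
    """
    values = {}
    for line in proc_stat_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] in ("procs_running", "procs_blocked"):
            values[fields[0]] = int(fields[1])

    if "procs_running" in values and "procs_blocked" in values:
        # Newer kernels: trust /proc/stat, so 'b' is really nr_iowait.
        return values["procs_running"], values["procs_blocked"]

    # Older kernels without these fields: count states per process,
    # so 'b' covers every task in uninterruptible sleep.
    running = sum(1 for state in task_states if state == "R")
    blocked = sum(1 for state in task_states if state == "D")
    return running, blocked


# Thirty tasks stuck in D-state on a frozen filesystem, one running task.
states = ["R"] + ["D"] * 30
new_kernel = "cpu 1 2 3 4\nprocs_running 1\nprocs_blocked 0\n"
old_kernel = "cpu 1 2 3 4\n"

print(vmstat_procs(new_kernel, states))
print(vmstat_procs(old_kernel, states))
```

With the same set of tasks, the first call reports no blocked processes while
the fallback path counts all thirty, which is the change in semantics the
documentation fix would need to describe.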

Comment 6 Oleg Nesterov 2017-03-23 22:50:25 UTC
(In reply to Joe Lawrence from comment #4)
>
> nr_iowait is *only* updated when a thread calls io_schedule_timeout() --
> incremented before scheduling and decremented after waking up.

exactly!

and in this case /usr/bin/touch waits for the semaphore.

I was quite sure I had already closed this bug as NOTABUG... probably it
was another one with the same description.

I think this one should be closed too.

Comment 7 Rodrigo A B Freire 2017-03-24 00:43:57 UTC
(In reply to Oleg Nesterov from comment #6)

> I was quite sure I had already closed this bug as NOTABUG... probably it
> was another one with the same description.
> 
> I think this one should be closed too.

https://www.youtube.com/watch?v=L0MK7qz13bU&feature=youtu.be&t=65

