Description of problem:
The current version of ps displays its output sorted by PID, not by process start time as has been common on Linux systems. As a result, once the PID counter wraps around, it is quite difficult to find newly started processes in the ps output.
Version-Release number of selected component (if applicable):
procps-3.2.7-8.2.fc6
Steps to Reproduce:
1. Make your system wrap around its PID counter.
2. Run ps -ef and check where a newly started process appears in its output.
Additional info:
The issue was also mentioned in the log for #220752 as a possible cause of the reported problem. After 2 weeks with no reaction from either the submitter or the maintainer, I decided to open a dedicated bug report.
Are you able to produce different output with any different version of procps on the same system (kernel)? [1] I don't think so. The ps command only follows the order of entries in the /proc directory. There is no extra sort() in the ps command. It seems old kernels sort /proc in a different manner than new ones. Try ls -fl /proc.
[1] I've tried procps-3.x and procps-2.x (upstream versions) with the same results.
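For anyone who prefers a script to eyeballing `ls -f` output, here is a minimal sketch of the same check. It lists the numeric entries of /proc in the order they are returned and reports whether that order is ascending by PID. The helper names `pid_entries` and `is_sorted_by_pid` are made up for this example. It assumes a Linux system with /proc mounted, and that `os.listdir` hands back entries in the order the kernel returns them (Python only documents the order as arbitrary, so treat the result as indicative rather than authoritative).

```python
import os

def pid_entries(names):
    """Keep only the names that look like PIDs, preserving their order."""
    return [int(n) for n in names if n.isdigit()]

def is_sorted_by_pid(pids):
    """True if the PIDs appear in strictly ascending numeric order."""
    return all(a < b for a, b in zip(pids, pids[1:]))

if __name__ == "__main__":
    # Entries are taken in whatever order the directory listing yields them.
    pids = pid_entries(os.listdir("/proc"))
    print("first entries:", pids[:10])
    print("ascending by PID:", is_sorted_by_pid(pids))
```

If this reports ascending order on a newer kernel but not on an older one, with the same procps installed, that would point at the kernel rather than at ps.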
Thanks for the quick reply! I have tried with procps 3.0.5 (the oldest I found on the upstream page) and the behavior is the same. It really seems to be a /proc sorting issue ('ls -f /proc' is a very good point ;). I have also tried different kernels:
- kernel-2.6.18-1.2798.fc6 - ok, /proc sorted as expected
- kernel-2.6.19-1.2895.fc6, kernel-2.6.19-1.2911.fc6 - new /proc sorting
New questions arise:
- Is this new /proc sorting intended, or is it a bug?
- If it's intended, will procps be modified to compensate for this change?
It seems like a regression. Re-assigning to kernel guys.
Looks like an upstream issue. I managed to reproduce similar behavior on a Gentoo 2.6.19 kernel. ChangeLog-2.6.19 mentions several /proc-related changes by Eric W. Biederman, so maybe that's the point where it was introduced...
From Albert Cahalan, author of procps:
ps --sort=start_time
I've always just assumed the order to be random.
From Eric W. Biederman, who made the kernel change:
Apologies, but this was a bug fix for a more serious issue. The code to report the directory entries by start time was fundamentally broken. In particular the sequence: opendir readdir readdir readdir .... closedir can miss processes that exist for the entire duration of that sequence. Which is non-posix, non-intuitive, and has no reasonable work around.
The sorting by pid happened as a side effect of finding a stable token we can come back to so we can at least guarantee normal readdir semantics. That objects that exist for the entire readdir are guaranteed to be displayed. That objects that come into existence or are deleted during the readdir may be missed. That isn't perfect but it is a useable semantic.
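To illustrate what sorting by start time involves when the /proc order cannot be relied on, here is a rough sketch that orders PIDs by the starttime value (field 22 of /proc/<pid>/stat, in clock ticks since boot, as described in proc(5)). This is not how ps implements `--sort=start_time`; the function names `parse_starttime` and `pids_by_start_time` are invented for the example, and processes that exit while the script runs are simply skipped.

```python
import os

def parse_starttime(stat_line):
    """Return field 22 (starttime, in clock ticks since boot) of a stat line."""
    # Field 2 (comm) is parenthesised and may itself contain spaces or ')',
    # so split at the last ')' before counting the remaining fields.
    rest = stat_line.rpartition(")")[2].split()
    return int(rest[19])  # rest[0] is field 3 (state), so field 22 is rest[19]

def pids_by_start_time(proc="/proc"):
    """Return PIDs ordered by start time, oldest first (ties broken by PID)."""
    started = []
    for name in os.listdir(proc):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc, name, "stat")) as f:
                started.append((parse_starttime(f.read()), int(name)))
        except (OSError, IndexError, ValueError):
            continue  # process went away or the line could not be parsed
    return [pid for _, pid in sorted(started)]

if __name__ == "__main__":
    # The most recently started processes end up last.
    print(pids_by_start_time()[-10:])
```

Because the ordering is computed from each process's own start time, it does not depend on the order in which the kernel returns the directory entries.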
Chuck, thanks for the reply! It's more or less what I expected. Btw: were those replies from Albert and Eric taken from private communication, or from some public discussion of the issue on a mailing list or forum that I haven't managed to find? If the latter is the case, any pointers are appreciated.
From:
http://lkml.org/lkml/2007/2/28/314
http://lkml.org/lkml/2007/2/28/339