Bug 158277
Summary: | ps <pid> can sometimes not see <pid> | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 3 | Reporter: | Alan Tyson <alan.tyson> |
Component: | kernel | Assignee: | Peter Martuccelli <peterm> |
Status: | CLOSED WONTFIX | QA Contact: | Brian Brock <bbrock> |
Severity: | low | Docs Contact: | |
Priority: | medium | ||
Version: | 3.0 | CC: | albert, kzak, petrides, rick.stern |
Hardware: | All | ||
OS: | Linux | ||
Doc Type: | Bug Fix | ||
Last Closed: | 2005-05-31 19:52:39 UTC | Type: | --- |
Description
Alan Tyson
2005-05-20 09:00:24 UTC
Yes, I agree that the output from "strace -e trace=open ps <pid>" looks strange and that "ps" wastes time reading unnecessary files. But this problem must be resolved upstream. I'm not sure about a fix in EL3 or EL4.

This is a serious kernel bug. It's also an "I told you so" bug; you can see my arguments against the readdir cursor hack on the linux-kernel mailing list. Somebody, Hugo I believe, had a tree-based /proc lookup that was 100% reliable. Patch that into your kernel and you might be all set.

There was another problem, probably fixed by now: any remaining problems would be caused by glibc and the kernel disagreeing on the size of a struct being used for directory reads. I do not know if this has since been fixed. If strace reveals seeks on the directory reads, then you are likely to have this problem also. Encoding the PID into the directory offset would take care of this.

Karel - at least for now, is RH going to file this one in kernel.org?

(In reply to comment #2)
> Somebody, Hugo I believe, had a tree-based /proc lookup that was 100% reliable.
> Patch that into your kernel and you might be all set.
That is probably Hugh, not Hugo. I mean the guy who was posting small patch sets a while back, with maybe a dozen patches or so.

Albert, I still don't understand why "ps <pid>" doesn't read /proc/<pid>/ directly. Why does it read files in other directories? It reads /proc/#/stat|status|cmdline for all processes. Why?

That's simply a lack of optimization. Before you go trying to hack this, please remember:
1. There is a -N option.
2. One may use "ps -p 42 -a -U root -p 1" and similar.
3. Consider threads.
4. There are likely to be other "interesting" issues.
Asking for just one PID isn't obscure, but I'm not sure it's all that common either. So I don't know if this optimization is worth the code complexity. It might be worthwhile.

NOTE NOTE NOTE!!! Adding such an optimization does NOT fix the /proc problem.
Regular ps listings will still miss processes from time to time. I find this to be seriously unacceptable. I was rather shocked when Linus accepted the band-aid hack (saved /proc read cursor) instead of the reliable version. Perhaps with a fresh patch and this new bug report you can get a real fix into the kernel. Until then, all tools that read /proc to find processes will be unreliable.

This issue needs to be resolved upstream first.

I find it somewhat odd that this was marked CLOSED/WONTFIX, because:
1. it makes many monitoring tools unreliable (not just ps)
2. a kernel patch is available
3. with real evidence of problems, pushing the patch upstream should be doable
While time has passed, backing out the /proc cursor hack should be doable.

Albert - do you have the kernel bugzilla number that was fixed?

As far as I know, very few kernel bug reports go via the bugzilla. The current status regarding this bug:
1. glibc behavior is unknown (must NOT use lseek on /proc directory)
2. the 2.6 kernel has several band-aid hacks related to /proc readdir
So you WILL be seeing this bug, but less often than in the past.

Hugh's tree-based /proc lookup patch certainly no longer applies to the current kernel source. There have been locking changes since then, and a couple of band-aid hacks were added.
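For readers following the thread, here is a toy model of the failure mode discussed above and of the "encode the PID into the directory offset" suggestion. It is only a sketch under stated assumptions: the process table is modeled as a sorted Python list, a /proc directory read is modeled as a batch of entries, and the names `read_by_index`, `read_by_pid`, and `scan` are hypothetical helpers, not kernel or procps code.

```python
from bisect import bisect_left

def read_by_index(live, offset, count):
    """Positional cursor: the offset is the number of entries already returned."""
    batch = live[offset:offset + count]
    return batch, offset + len(batch)

def read_by_pid(live, offset, count):
    """PID-encoded cursor: the offset is the smallest PID not yet returned."""
    i = bisect_left(live, offset)          # resume at the first PID >= offset
    batch = live[i:i + count]
    new_offset = batch[-1] + 1 if batch else offset
    return batch, new_offset

def scan(reader, pids, batch_size, exits):
    """List all PIDs in batches.

    `exits` maps a batch number to a PID that exits right after that batch
    has been returned (an assumption made to keep the model deterministic).
    """
    live = sorted(pids)
    seen, offset, batch_no = [], 0, 0
    while True:
        batch, offset = reader(live, offset, batch_size)
        if not batch:
            return seen
        seen.extend(batch)
        victim = exits.get(batch_no)
        if victim in live:
            live.remove(victim)            # later entries shift down by one
        batch_no += 1

if __name__ == "__main__":
    # PID 1 exits after the first batch; PID 3 stays alive the whole time.
    print(scan(read_by_index, range(1, 7), 2, {0: 1}))  # [1, 2, 4, 5, 6]
    print(scan(read_by_pid, range(1, 7), 2, {0: 1}))    # [1, 2, 3, 4, 5, 6]
```

In this model the positional cursor skips PID 3 even though that process never exits, because removing an earlier entry shifts every later entry down by one slot. The PID-encoded cursor resumes by value rather than by position, so an exit elsewhere in the table cannot hide a live process.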