Bug 461246 - RHEL4 64 bit skips all pids with bit 15 set (32768-65535, 98304-131071 etc)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.7
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Jiri Pirko
QA Contact: Martin Jenner
URL:
Whiteboard:
Duplicates: 431617 465371
Depends On:
Blocks: 461304 479182
Reported: 2008-09-05 10:12 UTC by Issue Tracker
Modified: 2018-10-20 03:00 UTC
9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-05-18 19:08:26 UTC
Target Upstream Version:
Embargoed:


Attachments
pid sequence check program (1.13 KB, text/plain)
2008-09-05 10:16 UTC, Martin Poole


Links
System: Red Hat Product Errata | ID: RHSA-2009:1024 | Private: 0 | Priority: normal | Status: SHIPPED_LIVE | Summary: Important: Red Hat Enterprise Linux 4.8 kernel security and bug fix update | Last Updated: 2009-05-18 14:57:26 UTC

Description Issue Tracker 2008-09-05 10:12:58 UTC
Escalated to Bugzilla from IssueTracker

Comment 1 Martin Poole 2008-09-05 10:15:14 UTC
I used a test program, pidseqchk.c, to check the kernel's behaviour.

The program continuously fork()s itself until there is a sequence discontinuity or an error occurs, at which point it notes the extent of the continuous allocation and then tidies up and continues around the loop.

Under the RHEL4-U4 kernel, the gaps from 32768 to 65535 and from 98304 upward were observed, with one notable exception (pid 131072, below).

# uname -a
Linux host-175.example.com 2.6.9-42.EL #1 Wed Jul 12 23:15:20 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux
# echo 99999 >/proc/sys/kernel/pid_max
# ulimit -u unlimited
# ./pidseqchk
fork failed: 11/Resource temporarily unavailable
sequence break:  3688 - 11832 (new 11834)
fork failed: 11/Resource temporarily unavailable
sequence break:  11834 - 19978 (new 19980)
fork failed: 11/Resource temporarily unavailable
sequence break:  19980 - 28124 (new 28126)
sequence break:  28126 - 32767 (new 65536)
fork failed: 11/Resource temporarily unavailable
sequence break:  65536 - 73680 (new 73682)
fork failed: 11/Resource temporarily unavailable
sequence break:  73682 - 81826 (new 81828)
fork failed: 11/Resource temporarily unavailable
sequence break:  81828 - 89972 (new 89974)
fork failed: 11/Resource temporarily unavailable
sequence break:  89974 - 98118 (new 98120)
sequence break:  98120 - 98303 (new 131072)
sequence break:  131072 - 131072 (new 300)
sequence break:  300 - 305 (new 307)

Curiously, the kernel hands out pid 131072 even though that is greater than pid_max (99999), which suggests a fencepost error somewhere.


After much digging I found a short 2004 thread on the Linux kernel mailing list, http://marc.info/?t=107335880700001&r=1&w=2, which pinpoints the problem in the alloc_pidmap() routine in kernel/pid.c: when the pid crosses into the second (and every subsequent even-numbered) page of the bitmap used to track pid usage, that page is skipped.

That patch was not adopted; instead, alloc_pidmap() was rewritten for the 2.6.10 kernel:

<wli>
 [PATCH] pidhashing: rewrite alloc_pidmap()

 Rewrite alloc_pidmap() to clarify control flow by eliminating all usage of
 goto, honor pid_max and first available pid after last_pid semantics, make
 only a single pass over the used portion of the pid bitmap, and update
 copyrights to reflect ongoing maintenance by Ingo and myself.

Comment 2 Martin Poole 2008-09-05 10:16:45 UTC
Created attachment 315854 [details]
pid sequence check program

Comment 3 Martin Poole 2008-09-05 10:42:48 UTC
Summary of the LKML thread.

The alloc_pidmap() routine in kernel/pid.c skips alternate pidmap pages.

On entry to alloc_pidmap(), pid is set to last_pid + 1. When this steps out of the current page (32767 -> 32768), the offset into the new map page becomes 0. When execution then reaches the !offset test just before the next_map: label, it enters that block and moves on to the following map, so the page that should hold the new pid is never scanned.

Comment 5 RHEL Program Management 2008-10-30 18:27:57 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 6 Vivek Goyal 2008-12-10 22:12:01 UTC
Committed in 78.21.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/

Comment 7 Jiri Pirko 2008-12-11 07:58:53 UTC
*** Bug 465371 has been marked as a duplicate of this bug. ***

Comment 11 Johnray Fuller 2009-03-16 13:56:48 UTC
*** Bug 431617 has been marked as a duplicate of this bug. ***

Comment 15 errata-xmlrpc 2009-05-18 19:08:26 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html

