Bug 952946 - 32-bit process stack space allocation is broken in PIE mode
Summary: 32-bit process stack space allocation is broken in PIE mode
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: other
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: harden-failure 947022
 
Reported: 2013-04-17 03:31 UTC by Tom Lane
Modified: 2018-06-21 08:52 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-07-19 10:09:36 UTC
Type: Bug


Attachments
patch for postgresql.spec to enable _hardened_build and add debugging patch (1.52 KB, patch)
2013-04-17 03:31 UTC, Tom Lane
patch to print getrlimit and memory map every so often (thanks to John Reiser for suggestions) (2.12 KB, patch)
2013-04-17 03:33 UTC, Tom Lane
postmaster log extract (268.30 KB, text/plain)
2013-04-17 03:33 UTC, Tom Lane
standalone test case (874 bytes, text/plain)
2013-04-18 20:11 UTC, John Reiser
postgres log with (edited) memory map dump after each 1K of stack growth (7.22 KB, text/plain)
2013-04-18 22:49 UTC, Tom Lane
adjusted reproducer using malloc() (1.05 KB, application/x-tgz)
2013-10-02 05:55 UTC, Pavel Raiskup

Description Tom Lane 2013-04-17 03:31:53 UTC
Created attachment 736659 [details]
patch for postgresql.spec to enable _hardened_build and add debugging patch

Description of problem:
I find that enabling _hardened_build breaks PostgreSQL 32-bit builds: they fail regression tests fairly consistently, with symptoms indicating that the kernel is providing only 2MB of stack space even though getrlimit(RLIMIT_STACK) claims the stack limit is 8MB.  This does not happen without _hardened_build, and it's not 100% consistent with it, so there's something rotten in the address space randomization stuff.

It should be noted that I'm testing 32-bit builds under mock with a 64-bit kernel; I do not know whether the kernel's word width is relevant here.

Version-Release number of selected component (if applicable):
kernel-3.9.0-0.rc5.git1.301.fc19.x86_64

How reproducible:
Seems close to 100% when using F19-alpha environment on a laptop.  I see the same behavior on my due-for-retirement F16 workstation, although on that box it only fails maybe 50% of the time; don't know if this is related to the beefier hardware or the older kernel.

Steps to Reproduce:
[ sorry for the overcomplicated test case, but I've been unable to reproduce this with a simple test program ]

1.  Grab current postgresql sources from Fedora package git, and add _hardened_build to the specfile; optionally add check-stack.patch which attempts to provide some relevant debug output.
2.  Build in 32-bit environment under mock, viz 
/usr/bin/mock -r fedora-19-i386 /tmp/postgresql-9.2.4-1y.fc20.src.rpm

On an actual 32-bit machine it might not be necessary to use mock ... or then again maybe the 64-bit kernel is an important part of the equation.
  
Actual results:
regression tests fail due to crash in "infinite_recurse()" test case, which is meant to verify that the platform's stack depth limit has been correctly detected.  If this doesn't happen immediately, try it a few times.

Expected results:
Should pass reliably.

Additional info:
I've attached a specfile patch, the referenced check-stack.patch, and an extract from the postmaster log showing what the check-stack patch prints before dying.  It is quite clear that the effective stack depth limit is only about 2MB, even though getrlimit claims it's 8MB.

I've tried to generate an equivalent failure using a short test program, without much success.  I speculate that the reason postgres fails has to do with the fact that it loads a fair number of shared libraries (cf memory maps in log), or with the fact that it creates a SysV-style shared memory segment.
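
For anyone who wants to see what limit the process is being told about, a minimal sketch follows.  This is not the attached check-stack.patch; the helper names read_stack_limit() and print_stack_limit() are made up for illustration, and only the standard getrlimit() call is assumed.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper (not taken from check-stack.patch): fetch the
 * stack limits.  Returns 0 on success, -1 if getrlimit() fails. */
int read_stack_limit(struct rlimit *out)
{
    assert(out != NULL);
    return getrlimit(RLIMIT_STACK, out);
}

/* Hypothetical helper: print the soft and hard stack limits in hex. */
void print_stack_limit(FILE *fp)
{
    struct rlimit rl;

    if (read_stack_limit(&rl) != 0) {
        perror("getrlimit");
        return;
    }
    fprintf(fp, "stack rlim_cur=%#llx  rlim_max=%#llx\n",
            (unsigned long long) rl.rlim_cur,
            (unsigned long long) rl.rlim_max);
}
```

Calling print_stack_limit(stdout) early in main() should show a soft limit of 0x800000 when the limit is 8MB.  Note that this only reports what was promised, not how much stack can actually be used.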

Comment 1 Tom Lane 2013-04-17 03:33:19 UTC
Created attachment 736660 [details]
patch to print getrlimit and memory map every so often (thanks to John Reiser for suggestions)

Comment 2 Tom Lane 2013-04-17 03:33:51 UTC
Created attachment 736661 [details]
postmaster log extract

Comment 3 Vincent Danen 2013-04-17 16:01:44 UTC
Note the discussion on fedora-devel:

http://lists.fedoraproject.org/pipermail/devel/2013-April/181553.html

This is probably something we want to chase as using the hardened build is definitely desired.

Comment 5 Tom Lane 2013-04-17 16:43:09 UTC
I've been able to replicate this on a 32-bit F18 installation, not using mock, just building the SRPM as a normal unprivileged user.  So that lets mock off the hook for sure, and I can say it's not a 64-bit-kernel-vs-32-bit-userland issue either.  This run was with kernel 3.8.7-201.fc18.i686.PAE, and whatever F18 packages beaker is installing at the moment.

Comment 9 John Reiser 2013-04-18 20:11:14 UTC
Created attachment 737408 [details]
standalone test case

Here's a standalone testcase in 41 lines of C.  Compile with
   gcc -m32 -pie -fPIE -g -o ./where where.c

The problem is that sbrk() can grow the heap of a -pie program until the heap overlaps the stack, with no complaint from kernel nor glibc.

The testcase repeatedly expands the heap by 0.5MB, printing the address space each time.  Each time overlap is detected, the testcase pauses by reading one byte from stdin.  Execution continues until sbrk() fails.

Typical output when the testcase detects overlap that nobody else does is:
   f779e000-f779f000 rw-p 00000000 08:15 7024004   ./where
   f779f000-f77a1000 rw-p 00000000 00:00 0 
   f83b3000-ff133000 rw-p 00000000 00:00 0         [heap]
   ff926000-ff947000 rw-p 00000000 00:00 0         [stack]
   stack rlim_cur=0x800000  rlim_max=0xffffffff  stack=0xff945e08
   sbrk(0x80000)=0xff133000
   warning: possible overlap of heap and stack

The overlap happens because (0xff133000 + 0x80000) > (0xff945e08 - 0x800000).
The high end of the sbrk is (0xff133000 + 0x80000), which is the low end plus the size; and the low end of stackspace is (0xff945e08 - 0x800000), which is the current value minus the maximum stack size.
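
The comparison above can be written out as a small check.  This is only a sketch of the arithmetic, not the attached where.c; the function names are hypothetical, and it assumes a finite RLIMIT_STACK and addresses that fit in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Lowest address the stack may grow down to if RLIMIT_STACK is honored,
 * measured from a current stack address (e.g. the address of a local).
 * Assumes a finite limit that is not larger than the address itself. */
uint64_t stack_reserve_floor(uint64_t stack_addr, uint64_t rlim_cur)
{
    assert(rlim_cur <= stack_addr);   /* otherwise the subtraction wraps */
    return stack_addr - rlim_cur;
}

/* Nonzero if a heap ending at old_brk + increment reaches into the
 * region reserved for the stack. */
int heap_overlaps_stack_reserve(uint64_t old_brk, uint64_t increment,
                                uint64_t stack_addr, uint64_t rlim_cur)
{
    uint64_t heap_end = old_brk + increment;

    return heap_end > stack_reserve_floor(stack_addr, rlim_cur);
}
```

With the values from the sample output, the heap end is 0xff133000 + 0x80000 = 0xff1b3000 and the reserve floor is 0xff945e08 - 0x800000 = 0xff145e08, so the heap already reaches 0x6d1f8 bytes into the reserved region.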

Comment 10 Tom Lane 2013-04-18 21:01:48 UTC
(In reply to comment #9)
> The problem is that sbrk() can grow the heap of a -pie program until the
> heap overlaps the stack, with no complaint from kernel nor glibc.

Oooh, great diagnosis.  I was about to object that this must be a different symptom from what I'm seeing in Postgres, but I'd forgotten that the Postgres test case is eating heap space even faster than it's eating stack (as you can easily see from the successive memory maps in the log in comment #2).  There's still some daylight between heap and stack in the last map dump, but extrapolation says they'd have overlapped by several MB by the time of the crash.  I'm now thinking that the core dump comes when heap-data manipulations corrupt the stack.

Further experimentation says that sbrk will complain only when extending the heap would overrun the currently mapped bottom of stack (0xff926000 in John's sample map above), forgetting that we may have promised via RLIMIT_STACK that the stack can be extended to below that.  And apparently, the stack expansion code doesn't notice that it's intruding on already-allocated heap space either.
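
To make the difference concrete, here is a toy model of the two policies.  It is a sketch of the behavior as described in this comment, not the kernel's actual code; both function names are hypothetical, and details such as guard pages are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Observed behavior (as described above): heap growth is refused only
 * when the new heap end would pass the lowest currently mapped stack
 * address. */
int brk_allowed_observed(uint64_t new_heap_end, uint64_t stack_map_start)
{
    return new_heap_end <= stack_map_start;
}

/* Expected behavior: heap growth must also stay out of the region the
 * stack may still claim under RLIMIT_STACK (finite limit assumed). */
int brk_allowed_expected(uint64_t new_heap_end, uint64_t stack_addr,
                         uint64_t rlim_cur)
{
    assert(rlim_cur <= stack_addr);   /* otherwise the subtraction wraps */
    return new_heap_end <= stack_addr - rlim_cur;
}
```

For John's sample map, a new heap end of 0xff1b3000 is below the mapped stack bottom 0xff926000, so the first check lets it through, while the second check would refuse it because 0xff1b3000 is above 0xff945e08 - 0x800000.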

Comment 11 Tom Lane 2013-04-18 22:49:28 UTC
Created attachment 737459 [details]
postgres log with (edited) memory map dump after each 1K of stack growth

I modified the previously shown check-stack.patch so it'd print the memory map each time it printed "observed stack", ie after each 1K of stack growth.  This attachment shows the last few printouts before crash; for brevity I omitted all but the last three lines of each memory map.  It's rather interesting to watch the map start labeling the stack as "[heap]".  But I think what's really happening is that the code that ought to expand the stack just silently gives up once there's no room to expand the stack anymore.  Which is not terribly surprising.  So IMO John is right to affix the blame on the sbrk() side of things: the heap should not have been allowed to intrude into the region reserved by RLIMIT_STACK.  These maps show conclusively that it was so allowed --- the region reserved for stack should go down to 0xff3b5000, but here's heap allocated up to 0xff9ce000.
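
The size of the intrusion can be worked out from those two addresses.  The sketch below uses hypothetical helper names and assumes the reserve is the full 8MB (0x800000) that getrlimit reports.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of the RLIMIT_STACK reserve that the heap has taken, given the
 * lowest reserved address and the current heap end. */
uint64_t reserve_taken_by_heap(uint64_t reserve_floor, uint64_t heap_end)
{
    return heap_end > reserve_floor ? heap_end - reserve_floor : 0;
}

/* Bytes of the reserve still available to the stack.  Expects the heap
 * end to lie within the reserve. */
uint64_t reserve_left_for_stack(uint64_t reserve_floor, uint64_t heap_end,
                                uint64_t rlim_cur)
{
    uint64_t taken = reserve_taken_by_heap(reserve_floor, heap_end);

    assert(taken <= rlim_cur);   /* heap end is not above the reserve */
    return rlim_cur - taken;
}
```

With a floor of 0xff3b5000 and a heap end of 0xff9ce000, the heap has taken 0x619000 bytes (about 6.1MB) of the reserve, which leaves 0x1e7000 bytes (about 1.9MB) for the stack.  Under the 8MB assumption, that is in line with the roughly 2MB effective limit noted in the description.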

Comment 13 Josh Boyer 2013-09-18 20:46:25 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.11.1-200.fc19.  Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Comment 14 Pavel Raiskup 2013-10-01 13:57:35 UTC
Yes.  This is still problematic.  With ASLR enabled, if you
allocate memory (it does not matter whether you use brk(), sbrk() or malloc()),
the [heap] mapping grows up into the area where the stack should reside (at
least if RLIMIT_STACK is to be guaranteed).

Ideally, the glibc allocation functions should not go further than the
RLIMIT_STACK boundary if ASLR is on.  Otherwise, the stack's memory mapping
may shrink to a _very_ small space.

Is there any known workaround for this behaviour (e.g. if we were allowed to
guarantee some minimum stack size)?  Pavel

Comment 15 Pavel Raiskup 2013-10-02 05:55:42 UTC
Created attachment 806268 [details]
adjusted reproducer using malloc()

The attached reproducer is an enhanced version of John's reproducer, showing
that it is possible to cut the stack's virtual address space down to a minimal
size by enlarging the heap.  Just run 'make && ./program' (reproduced on
x86_64/i386 F19).

Comment 16 Justin M. Forbes 2014-03-10 14:43:56 UTC
*********** MASS BUG UPDATE **************

This bug has been in a needinfo state for more than 1 month and is being closed with insufficient data due to inactivity. If this is still an issue with Fedora 19, please feel free to reopen the bug and provide the additional information requested.

Comment 17 Tom Lane 2014-03-10 15:01:27 UTC
I think leaving this bug in NEEDINFO state is a mistake; it implies (at least to some people) that more information is needed from the bug reporter.  ISTM that at this point the ball is definitely in the kernel maintainers' court: to either fix it, or provide a workaround for sbrk's failure to honor RLIMIT_STACK.

Comment 18 Justin M. Forbes 2014-05-21 19:30:12 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.14.4-100.fc19.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 20, and are still experiencing this issue, please change the version to Fedora 20.

If you experience different issues, please open a new bug report for those.

Comment 19 Josh Boyer 2014-05-22 12:37:13 UTC
This bug is languishing (obviously).  I'm marking it so it won't get hit by the auto-rebase-needinfo dealings.

Has anyone reported this to the upstream MM developers?  That would likely be the best bet for resolution.

Comment 20 Tom Lane 2014-05-22 12:56:01 UTC
This came up again just a few days ago on the Postgres mailing lists:
http://www.postgresql.org/message-id/20140519091808.GA7296@msgid.df7cb.de
Debian's PG packager is now seeing it on all their 32-bit architectures
(*not* only i386).

Comment 21 Tom Lane 2015-02-24 17:06:45 UTC
I think marking this as "i686 only" was incorrect, see comment #20.

Comment 22 Josh Boyer 2015-02-24 17:12:35 UTC
We don't have a generic '32-bit hardware' category and 'All' is also incorrect.  I guess I'll mark it 'other'.

Comment 23 Fedora End Of Life 2015-05-29 08:59:47 UTC
This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 20 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 24 Tom Lane 2015-08-28 13:46:15 UTC
I'm told that this has been fixed upstream by
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a87938b2e246b81b4fb713edb371a9fa3c5c3c86

Dunno if that's migrated into any RH kernels yet, but if so you could try checking whether the problem's gone away, and if so consider enabling PIE for PG.

Comment 25 Josh Boyer 2015-08-28 14:50:07 UTC
(In reply to Tom Lane from comment #24)
> I'm told that this has been fixed upstream by
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/
> ?id=a87938b2e246b81b4fb713edb371a9fa3c5c3c86
> 
> Dunno if that's migrated into any RH kernels yet, but if so you could try
> checking whether the problem's gone away, and if so consider enabling PIE
> for PG.

It's in 4.1, so all current Fedora releases have that fix.

Comment 26 Pavel Raiskup 2015-09-17 08:59:19 UTC
Tested on F22 i386, and PostgreSQL's testsuite passed.  Same for a Rawhide
scratch build.  I'm afraid that the problem of RLIMIT_STACK not being
guaranteed still exists (the reproducer in comment #15 still segfaults on
i386) -- but that is apparently not a problem for PostgreSQL (so I'll enable
hardening).

Not sure whether the [heap] and [stack] collision shouldn't be fixed...
(I think it's better to leave this bug open).

FYI, after a bit of testing, I also filed bug 1263974 (I'm not 100% sure about
its correctness) -- that issue made my testing a bit more uncomfortable..

Comment 27 Fedora End Of Life 2015-11-04 10:29:30 UTC
This message is a reminder that Fedora 21 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 21. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '21'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 21 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 28 Fedora End Of Life 2016-07-19 10:09:36 UTC
Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 29 Florian Weimer 2017-05-15 10:45:48 UTC
As far as I can tell, this still hasn't been fixed.

