104583 – LTC4383-Question concerning the algorithm used for choosing where the shared libraries load

Keywords:
Status:	CLOSED WORKSFORME
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	3.0
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Arjan van de Ven
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2003-09-17 15:46 UTC by IBM Bug Proxy
Modified:	2007-11-30 22:06 UTC (History)
CC List:	0 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2003-10-15 14:25:26 UTC
Target Upstream Version:
Embargoed:

Description IBM Bug Proxy 2003-09-17 15:46:18 UTC

The following has been reported by IBM LTC:
Question concerning the algorithm used for choosing where the shared libraries load
This is more a question than a bug.

On RHEL 3 IA32, we noticed that the shared libraries no longer load at 
0x40000000 and they load in the space between 0x0 and the executable in the 
address space.

My question is what happens when this space is filled.  Do you have a specific 
place where you put these after the exec, or is it random?  My concern is that 
this could cause a potential problem for DB2.  We are still investigating the 
effects of this change.

Glen/Greg - a good question for Red Hat on RHEL3.  Thanks.
Yvonne - does this impact DB2 at all?

Comment 1 Jakub Jelinek 2003-09-17 19:57:03 UTC

Unless you're using prelink(8), it is the kernel that decides these addresses.

Comment 2 Arjan van de Ven 2003-09-17 20:09:58 UTC

if this space is filled, we go to other free regions (in practice,
TASK_UNMAPPED_BASE onwards)
while this is suboptimal (eg it breaks the big free area between the binary and
the top of stack/mmaps) it's a safe thing to do for a rare situation

Comment 3 IBM Bug Proxy 2003-09-23 14:11:26 UTC

------ Additional Comments From yvchan.com  2003-23-09 10:05 -------
Is the TASK_UNMAPPED_BASE the same as previous kernels and is this moveable if 
need be?

Comment 4 Arjan van de Ven 2003-09-23 14:17:52 UTC

TASK_UNMAPPED_BASE is at 1/3 of virtual space still; it is not movable.

but with prelink YOU decide where libs get loaded; kernel policy only matters if
there's no preference...
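
The "1/3 of virtual space" figure can be checked with a little arithmetic. This is a minimal sketch that assumes TASK_UNMAPPED_BASE is simply the user address-space size divided by three and rounded down to a page boundary. The 3 GiB value matches the 0x40000000 address from the original report; the 4 GiB value is an idealised upper bound included for comparison, not a figure taken from this bug.

```python
PAGE_SIZE = 0x1000  # 4 KiB pages on i386


def task_unmapped_base(user_space_bytes):
    """One third of the user address space, rounded down to a page boundary."""
    return (user_space_bytes // 3) & ~(PAGE_SIZE - 1)


three_gb = task_unmapped_base(0xC0000000)   # 3 GiB user space
four_gb = task_unmapped_base(0x100000000)   # idealised 4 GiB user space (assumption)

print(hex(three_gb))  # 0x40000000
print(hex(four_gb))   # 0x55555000
```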

Comment 5 IBM Bug Proxy 2003-09-23 15:06:12 UTC

------ Additional Comments From yvchan.com  2003-23-09 10:59 -------
There are a couple of things I'd like to clarify first.

From what I can tell of prelink, it would be run on the system after we install 
DB2?  If this is the case, then we can *NOT* do this.  It could cause DB2 to 
stop running should the customer update their system libraries for any reason.  
This would become a maintenance/support nightmare for us.

Do the binaries need to be built with gcc 3.2+ to use this?  If so, we can not 
do this on x86, or IA64 since we use the Intel compiler.

Comment 6 Arjan van de Ven 2003-09-23 16:47:12 UTC

you can tell prelink to run once before you ship stuff if you manually pick
where libs go.

If it works with Intel's compiler I don't know, I have no information on how
compatible that is with gcc.

Comment 7 IBM Bug Proxy 2003-09-23 17:11:14 UTC

------ Additional Comments From yvchan.com  2003-23-09 13:09 -------
ok.  This is not an option then.  We build on RH 7.2 and ship only 1 set of 
binaries per architecture.  We do *NOT* re-build or plan to change any of this 
until v9, due to customer commitments.  We can not afford the effort to ship 
multiple binaries, due to both resource and support issues.  

Since we have 1 binary (per arch) we can not use this option since we support 
quite a few distros with that one binary.  We used to make use of the 
mapped_base file that existed in the /proc/<pid>/ directory.  This has 
obviously been removed.  Is there a reason for this?

Comment 8 IBM Bug Proxy 2003-09-23 17:15:58 UTC

------ Additional Comments From yvchan.com  2003-23-09 13:10 -------
One more thing to note: the mapped_base file was available in Red Hat Advanced 
Server 2.1.

Comment 9 Arjan van de Ven 2003-09-23 17:18:41 UTC

mapped_base has not been added to RHEL3 because it is basically not needed; we
expect(ed) that all libs would just not be in that area.

Question: is this actually observed in practice or is this just a theoretical
"what if" thing ?

Comment 10 Arjan van de Ven 2003-09-23 18:55:53 UTC

For a next generation of your product it may even make sense to move the
executable up a bit so that all libs fit below it for sure; that way a maximum
space between binary and stack is available (eg big mmap segment possible)

Comment 11 IBM Bug Proxy 2003-09-24 06:49:13 UTC

------ Additional Comments From yvchan.com  2003-23-09 16:13 -------
We are looking at creating a scenario where this would happen.  It's not 
unlikely in our opinion that this could cause us problems.

If the shared library comes along and decides that it attaches at 
TASK_UNMAPPED_BASE which is 1/3 of vm is 0x40000000(?) because the section 
above the exe is full.  And we come along and detect that the shared libs are 
higher than 0x30000000, we attach our shared memory starting here.  (We make an 
assumption that if it's higher than 0x3, then the shared libs are at 
0x20000000  -- I know, not the greatest, but we can fix this.)  However, the 
problem is when we attach at 0x30000000 and the shared lib starts attaching at 
0x40000000 and we bump into it.  (I assume in this scenario, we sigsegv!).  

The other possibility is that we have our shared memory attach for say 1.5 G 
starting from 0x30000000, and we need more space for the libraries.  What 
happens? Is the kernel smart enough to move the libs to below our shared memory 
segment? or do we get even weirder behaviour?

Comment 12 Arjan van de Ven 2003-09-24 07:01:19 UTC

> TASK_UNMAPPED_BASE which is 1/3 of vm is 0x40000000(?) 
it is that value for 3Gb userspace; for 4Gb userspace it's more (1/3rd more)

The kernel will never map shared libraries in a place where something else
already exists. (and if you use dlopen() glibc also has a big say in this btw)

My recommendation would be that if you map a large area, to either not provide a
hint address (eg let the kernel find a hole this big), or to try to work from
the stack downwards. In no case should MAP_FIXED be used for things like this,
since that eradicates any existing mappings in the requested range.
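
The "no hint address" half of this recommendation can be illustrated with a small sketch. It uses Python's mmap module, which never passes an address hint or MAP_FIXED, so the kernel always chooses the placement. The 16 MiB size is arbitrary, and the ctypes call used to read back the chosen address is a CPython-specific idiom, not something taken from this bug.

```python
import ctypes
import mmap

SIZE = 16 * 1024 * 1024  # 16 MiB, an arbitrary size for illustration

# fileno -1 requests an anonymous mapping; no address is supplied, so the
# kernel is free to choose any hole that is large enough.
region = mmap.mmap(-1, SIZE)

# Recover the address the kernel picked (CPython-specific ctypes idiom).
# The temporary ctypes object is released immediately, so close() is safe.
address = ctypes.addressof(ctypes.c_char.from_buffer(region))
print(f"kernel placed {SIZE:#x} bytes at {address:#x}")

region.close()
```

Because nothing in the request pins the mapping to a fixed address, it cannot overwrite a shared library or any other existing mapping.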

Comment 13 IBM Bug Proxy 2003-09-24 16:15:16 UTC

------ Additional Comments From yvchan.com  2003-24-09 11:36 -------
We don't use MAP_FIXED as far as I know since we use shmget/shmat with our 
shared memory mapping.  The reason we use the hint address is so that all 
our executables can have the shared memory attached at the same place.

However, perhaps this won't be a problem.  We have just gotten a test program 
that will either dlopen, and/or just dynamically link the shared libs in.  The 
shared libs load in the space before the executable, and then right after it.

This is good news, however it conflicts with your comment that the shared libs 
would continue to use TASK_UNMAPPED_BASE as the position to restart from.  
Comments?

Comment 14 IBM Bug Proxy 2003-09-26 15:31:18 UTC

------ Additional Comments From yvchan.com  2003-26-09 11:27 -------
*SIGH*  we have just hit what I was afraid of.  

This is coming to us from the LDAP team and this is the kernel they are using:

Linux ldapdut009 2.4.21-3.ELsmp #1 SMP Fri Sep 19 14:06:12 EDT 2003 i686 i686 i386 GNU/Linux

This is a process mapping of the db2sysc process: 
-----------------------------------------------------------------BEGIN
08048000-08051000 r-xp 00000000 08:05 359776     /home/ldapdb2/sqllib/adm/db2sysc
08051000-08056000 rw-p 00008000 08:05 359776     /home/ldapdb2/sqllib/adm/db2sysc
08056000-08077000 rw-p 00000000 00:00 0
10000000-10ef4000 rw-s 00000000 00:04 4161540    /SYSV9ff53761 (deleted)
11000000-125fc000 rw-s 00000000 00:04 4194309    /SYSV00000000 (deleted)
50000000-54ecc000 rw-s 00000000 00:04 4227078    /SYSV00000000 (deleted)
b3ccb000-b3ecb000 r--p 00000000 08:08 2966002    /usr/lib/locale/locale-archive
b3ecb000-b3f4c000 rw-p 00001000 00:00 0
b3f4c000-b416e000 rw-s 00000000 00:04 4128771    /SYSV9ff53774 (deleted)
b416e000-b41af000 rw-p 00000000 00:00 0
b41af000-b41ba000 r-xp 00000000 08:08 507940     /lib/libnss_files-2.3.2.so
b41ba000-b41bb000 rw-p 0000a000 08:08 507940     /lib/libnss_files-2.3.2.so
b41bb000-b41bc000 rw-p 00000000 00:00 0
b41bc000-b41c3000 r-xp 00000000 08:02 392488     /opt/IBM/db2/V8.1/lib/libdb2trcapi.so.1
b41c3000-b41c7000 rw-p 00006000 08:02 392488     /opt/IBM/db2/V8.1/lib/libdb2trcapi.so.1
b41c7000-b41c8000 rw-p 00000000 00:00 0
b41c8000-b41ed000 r-xp 00000000 08:02 392471     /opt/IBM/db2/V8.1/lib/libdb2genreg.so.1
b41ed000-b420b000 rw-p 00024000 08:02 392471     /opt/IBM/db2/V8.1/lib/libdb2genreg.so.1
b420b000-b4210000 rw-p 00000000 00:00 0
b4210000-b4221000 r-xp 00000000 08:02 392480     /opt/IBM/db2/V8.1/lib/libdb2locale.so.1
b4221000-b422e000 rw-p 00010000 08:02 392480     /opt/IBM/db2/V8.1/lib/libdb2locale.so.1
b422e000-b4230000 rw-p 00000000 00:00 0
b4230000-b4232000 r-xp 00000000 08:02 392477     /opt/IBM/db2/V8.1/lib/libdb2install.so.1
b4232000-b4233000 rw-p 00001000 08:02 392477     /opt/IBM/db2/V8.1/lib/libdb2install.so.1
b4233000-b423b000 r-xp 00000000 08:08 4096014    /lib/tls/librtkaio-2.3.2.so
b423b000-b423c000 rw-p 00007000 08:08 4096014    /lib/tls/librtkaio-2.3.2.so
b423c000-b4247000 rw-p 00000000 00:00 0
b4247000-b4249000 r-xp 00000000 08:08 507920     /lib/libdl-2.3.2.so
b4249000-b424a000 rw-p 00001000 08:08 507920     /lib/libdl-2.3.2.so
b424a000-b424f000 r-xp 00000000 08:08 507918     /lib/libcrypt-2.3.2.so
b424f000-b4250000 rw-p 00004000 08:08 507918     /lib/libcrypt-2.3.2.so
b4250000-b4277000 rw-p 00000000 00:00 0
b4277000-b43a8000 r-xp 00000000 08:08 4096006    /lib/tls/libc-2.3.2.so
b43a8000-b43ab000 rw-p 00130000 08:08 4096006    /lib/tls/libc-2.3.2.so
b43ab000-b43af000 rw-p 00000000 00:00 0
b43af000-b43fe000 r-xp 00000000 08:02 392454     /opt/IBM/db2/V8.1/lib/libcxa.so.1
b43fe000-b441d000 rw-p 0004e000 08:02 392454     /opt/IBM/db2/V8.1/lib/libcxa.so.1
b441d000-b443e000 r-xp 00000000 08:08 4096009    /lib/tls/libm-2.3.2.so
b443e000-b443f000 rw-p 00020000 08:08 4096009    /lib/tls/libm-2.3.2.so
b443f000-b4475000 r-xp 00000000 08:02 392482     /opt/IBM/db2/V8.1/lib/libdb2osse.so.1
b4475000-b44b8000 rw-p 00035000 08:02 392482     /opt/IBM/db2/V8.1/lib/libdb2osse.so.1
b44b8000-b44bc000 rw-p 00000000 00:00 0
b44bc000-b60d2000 r-xp 00000000 08:02 392531     /opt/IBM/db2/V8.1/lib/libdb2e.so.1
b60d2000-b7564000 rw-p 01c15000 08:02 392531     /opt/IBM/db2/V8.1/lib/libdb2e.so.1
b7564000-b75cd000 rw-p 00000000 00:00 0
b75cd000-b75da000 r-xp 00000000 08:08 4096011    /lib/tls/libpthread-0.59.so
b75da000-b75db000 rw-p 0000c000 08:08 4096011    /lib/tls/libpthread-0.59.so
b75db000-b75dd000 rw-p 00000000 00:00 0
b75ea000-b75eb000 rw-p 00001000 00:00 0
b75eb000-b7600000 r-xp 00000000 08:08 507907     /lib/ld-2.3.2.so
b7600000-b7601000 rw-p 00015000 08:08 507907     /lib/ld-2.3.2.so
bfff7000-c0000000 rwxp ffffa000 00:00 0
-----------------------------------------------------------------END

1/3 of 4G address space is 0x50000000 which is right in the middle of our 
shared memory attachment points!  As we can see, the shared libs then jump down 
to 0xb3...  which is ok, but this breaks things up horribly for us.  Is there a 
reason that the shared libs in this case don't start their attachment point at 
an address lower than the exe's?
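
To see how the map above fragments the address space, the free holes between mappings can be computed directly from the maps text. The sketch below is only an illustration: the function names are hypothetical, and the sample is limited to the first seven mappings of the db2sysc map (everything from 0xb3ccb000 upwards is omitted).

```python
import re

# First seven mappings of the db2sysc map above; the shared libraries and
# the stack above 0xb3ccb000 are left out of this excerpt.
SAMPLE = """\
08048000-08051000 r-xp 00000000 08:05 359776     /home/ldapdb2/sqllib/adm/db2sysc
08051000-08056000 rw-p 00008000 08:05 359776     /home/ldapdb2/sqllib/adm/db2sysc
08056000-08077000 rw-p 00000000 00:00 0
10000000-10ef4000 rw-s 00000000 00:04 4161540    /SYSV9ff53761 (deleted)
11000000-125fc000 rw-s 00000000 00:04 4194309    /SYSV00000000 (deleted)
50000000-54ecc000 rw-s 00000000 00:04 4227078    /SYSV00000000 (deleted)
b3ccb000-b3ecb000 r--p 00000000 08:08 2966002    /usr/lib/locale/locale-archive
"""

RANGE = re.compile(r"^([0-9a-f]+)-([0-9a-f]+)\s", re.MULTILINE)


def mapped_ranges(maps_text):
    """Return sorted (start, end) pairs parsed from /proc/<pid>/maps text."""
    return sorted((int(lo, 16), int(hi, 16)) for lo, hi in RANGE.findall(maps_text))


def free_gaps(ranges):
    """Return the holes between consecutive mappings, largest first."""
    gaps = []
    for (_, prev_end), (next_start, _) in zip(ranges, ranges[1:]):
        if next_start > prev_end:
            gaps.append((prev_end, next_start))
    return sorted(gaps, key=lambda gap: gap[1] - gap[0], reverse=True)


gaps = free_gaps(mapped_ranges(SAMPLE))
for start, end in gaps:
    print(f"{start:08x}-{end:08x}  {(end - start) / 2**20:8.1f} MiB free")
```

On this excerpt the two largest holes sit on either side of the segment attached at 0x50000000, which splits what would otherwise be one contiguous region into pieces of roughly 1 GiB and 1.5 GiB.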

Comment 15 Arjan van de Ven 2003-09-26 15:35:09 UTC

are there any special LD_ASSUME_KERNEL settings used here ?

Comment 16 Arjan van de Ven 2003-09-26 15:36:27 UTC

btw all non-PROT_EXEC mmaps will grow down from the stack (eg on a 3Gb kernel,
down from 0xbffff... ), PROT_EXEC mmaps (eg libraries, assuming ld.so mmaps them
with PROT_EXEC, which is why LD_ASSUME_KERNEL influences this) should use the
different 'below the binary' allocator.

Comment 17 Arjan van de Ven 2003-09-26 15:37:48 UTC

(oh and setarch usage will impact this too)

Comment 18 IBM Bug Proxy 2003-09-26 15:56:11 UTC

------ Additional Comments From yvchan.com  2003-26-09 11:51 -------
hmm.  no LD_ASSUME_KERNEL isn't set in the environment.  I don't think we have 
setarch usage.. but the PROT_EXEC sounds familiar.  I will look into that.  I 
suppose it's too late to ask for the mapped_base patch to be put back in?

Comment 19 Arjan van de Ven 2003-09-26 15:58:45 UTC

basically yes ;(

but for this specific case I don't think it would have helped either btw; this
appears to use the non-PROT_EXEC allocator which doesn't use TASK_UNMAPPED_BASE
at all.

Comment 20 IBM Bug Proxy 2003-09-26 17:22:47 UTC

------ Additional Comments From yvchan.com  2003-26-09 13:17 -------
I'm trying to confirm this, but I believe we do use PROT_EXEC.  I know the 
changes we make with respect to mapped_base do work on RH AS 2.1.

Comment 21 IBM Bug Proxy 2003-10-03 17:32:25 UTC

------ Additional Comments From yvchan.com  2003-03-10 13:28 -------
It looks like that's correct.   We don't use PROT_EXEC.

Can I ask how the mapped_base patch in RHAS 2.1 worked if, from what you've said 
earlier, it shouldn't have?

Comment 22 Arjan van de Ven 2003-10-03 17:41:30 UTC

I don't fully understand what you mean exactly.

In AS2.1 all mmap allocations were allocated from TASK_UNMAPPED_BASE onwards,
which by default is set at 1Gb (1/3rd of VA). We made this a per process tunable
because losing all the VA below 1Gb (the brk space) was undesirable for databases.

In RHEL3 we don't break the VA space in the middle by default but make
non-PROT_EXEC mmaps grow downwards from the stack and put PROT_EXEC's below the
executable. This leaves the brkspace vs mmap space undivided; TASK_UNMAPPED_BASE
therefore isn't normally relevant.
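
The placement policy described in this thread can be pictured with a toy model. This is a sketch for intuition only, not the kernel's code: the class, the lowest usable address, and the stack reserve are all hypothetical, and the model does no collision checking. It only encodes three statements made in this bug: PROT_EXEC mappings go below the executable, they fall back to TASK_UNMAPPED_BASE onwards when that space is full, and other mappings grow down from the stack.

```python
PAGE = 0x1000
BINARY_START = 0x08048000        # where db2sysc is mapped in Comment 14
STACK_TOP = 0xC0000000           # 3 GiB user space (kernel-smp case)
TASK_UNMAPPED_BASE = 0x40000000  # fallback region named in Comment 2
LOW_BASE = 0x00100000            # hypothetical lowest usable address


def round_up(size):
    """Round a size up to a whole number of pages."""
    return (size + PAGE - 1) & ~(PAGE - 1)


class ToyLayout:
    """Toy address chooser; ignores collisions and unmapping."""

    def __init__(self, stack_reserve=0x00800000):  # hypothetical stack reserve
        self.low_cursor = LOW_BASE
        self.fallback_cursor = TASK_UNMAPPED_BASE
        self.high_cursor = STACK_TOP - stack_reserve

    def place(self, size, prot_exec):
        size = round_up(size)
        if prot_exec:
            # Executable mappings: fill the space below the binary first.
            if self.low_cursor + size <= BINARY_START:
                addr = self.low_cursor
                self.low_cursor += size
                return addr
            # Space below the binary is full: fall back to the old base.
            addr = self.fallback_cursor
            self.fallback_cursor += size
            return addr
        # Non-executable mappings: grow downwards from below the stack.
        self.high_cursor -= size
        return self.high_cursor


layout = ToyLayout()
small_lib = layout.place(0x1000, prot_exec=True)
huge_lib = layout.place(0x10000000, prot_exec=True)
data_map = layout.place(0x2000, prot_exec=False)
print(hex(small_lib), hex(huge_lib), hex(data_map))
```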

Comment 23 IBM Bug Proxy 2003-10-03 20:26:51 UTC

------ Additional Comments From khoa.com  2003-03-10 16:23 -------
*** Bug 4741 has been marked as a duplicate of this bug. ***

Comment 24 IBM Bug Proxy 2003-10-06 16:56:31 UTC

------ Additional Comments From yvchan.com  2003-06-10 12:53 -------
I think I should have looked at the process map that I sent you more closely, 
and it looks like some stuff has changed since I originally opened this 
bugzilla with respect to the shared library attachments.

So effectively we have 0x10000000 to 0xa0000000 available to use for shared 
memory attaches...  (everything just above the exec to just below the stack + 
shared libs since we are not PROT_EXEC)

I apologize for some of these questions, but I need to make sure I understand 
this properly.

Oh, and one last thing, does the stack still start at 0xbfffffff or is it now 
0xffffffff?
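
As a quick sanity check of the 0x10000000 to 0xa0000000 window quoted in this comment, the sketch below computes its size and tests whether the 1.5 G attach at 0x30000000 from Comment 11 would fit inside it. The helper name is hypothetical, and the bounds are taken as stated in the comment rather than measured.

```python
GIB = 2**30

WINDOW_START = 0x10000000  # lower bound quoted in Comment 24
WINDOW_END = 0xA0000000    # upper bound quoted in Comment 24


def fits(attach_addr, size):
    """True if [attach_addr, attach_addr + size) lies inside the window."""
    return WINDOW_START <= attach_addr and attach_addr + size <= WINDOW_END


window = WINDOW_END - WINDOW_START
segment = 3 * GIB // 2  # the 1.5 G segment discussed in Comment 11

print(window / GIB)               # 2.25
print(fits(0x30000000, segment))  # True: the segment ends at 0x90000000
print(fits(0x50000000, segment))  # False: it would end at 0xb0000000
```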

Comment 25 Arjan van de Ven 2003-10-06 17:00:10 UTC

>(everything just above the exec to just below the stack + 
>shared libs since we are not PROT_EXEC)

that is the general idea yes; this should be more than you had before

> Oh, and one last thing, does the stack still start at 0xbfffffff or is it now 
> 0xffffffff? 

this depends on which kernel you use. The kernel-smp kernel will have it start
at 0xbfff ... while the kernel-hugemem kernel will start at 4Gb minus a tiny bit.

Comment 26 IBM Bug Proxy 2003-10-15 14:23:04 UTC

------ Additional Comments From yvchan.com  2003-15-10 10:17 -------
Thanks everyone.  This issue can be closed.  We are comfortable with the 
information here, and it looks like this change on x86 is to our advantage.

Comment 27 Arjan van de Ven 2003-10-15 14:25:26 UTC

> Thanks everyone.  This issue can be closed.  We are comfortable with the 
> information here, and it looks like this change on x86 is to our advantage. 

that was the goal of the change ;)

anyway closing on the Red Hat side

Comment 28 IBM Bug Proxy 2003-10-16 01:05:01 UTC

------ Additional Comments From khoa.com  2003-15-10 21:02 -------
Closing this bug!
