Bug 435337
Field | Value
---|---
Summary | pthread_attr_setstacksize is incorrect on PPC and S390
Product | Red Hat Enterprise Linux 5
Reporter | Tom Lane <tgl>
Component | glibc
Assignee | Jakub Jelinek <jakub>
Status | CLOSED NOTABUG
QA Contact | Brian Brock <bbrock>
Severity | high
Priority | high
Version | 5.1
CC | David.Holmes, dwmw2, fweimer, gbenson, hhorak, jwboyer, langel
Target Milestone | rc
Keywords | Reopened
Hardware | ppc64
OS | Linux
Doc Type | Bug Fix
Last Closed | 2008-03-11 09:28:45 UTC

Description
Tom Lane
2008-02-28 19:24:35 UTC
Please read man 3p pthread_attr_getguardsize. The default guard size is one page. In RHEL5-ish and some other kernels ppc has 64K pages; F8/F9 ppc64 kernels use, AFAIK, 4K pages.

Well, *something* in this area changed recently, because mysql worked fine on RHEL5 up through December. Can you tell me exactly what did change?

Nothing changed really. The only possible change would be if the buildboxes were using RHEL4 or some other kernels until December.

BTW, I do not agree with your reading of the specification. While neither the pthread_attr_setstacksize nor pthread_attr_getguardsize spec pages say in so many words whether the guard area is to be subtracted from the requested stack size, the guardsize page says that "... the implementation allocates *extra* memory at the overflow end of the stack ..." which to me implies the guard area is *in addition to* the stack size. This is also in accord with common sense; if the guard area is supposed to be included in the stack size, wouldn't there be a large warning on the setstacksize page to remind people to allow for it when selecting their stack size? It'd certainly mean that no one could correctly use setstacksize without being aware of the guardsize parameter, but there's not even a cross-reference to it on the setstacksize page. So I remain of the opinion that RHEL5's behavior is broken.

We were building on hosts with 4KiB pages till recently.

BTW, it appears that the brew machines are still using 4KB pages? I just tested a scratch build and the failure doesn't seem to occur in brew. Any info on when/if brew is likely to transition to 64KB pages?

I believe they plan to move to RHEL5 for the build system some time "soon". This would mean 64KiB pages.

IcedTea uses pthread_attr_setstacksize and assumes that the amount it asked for is the amount it got. Are you saying that this has *never* been the case?

I have just finished experimenting with RHTS machines.
Using the RHEL5-U1 releases, I find that the exact requested stacksize is allocated on i386, x86_64, and ia64. Only ppc and s390x subtract the guard space. (I didn't try s390 separately.) Seeing that all the mainstream architectures allocate the full requested stack space, I think your position that this is not a bug is completely untenable. It is hardly likely that any program out there will be expecting that it has to add on the guard area.

Then your testing wasn't very good. Try say:

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>   /* system() */
    #include <unistd.h>

    /* Thread body: dump this process's memory map so that the thread
       stack and its guard area can be inspected.  */
    void *
    tf (void *arg)
    {
      char buf[64];
      snprintf (buf, sizeof buf, "cat /proc/%d/maps", (int) getpid ());
      system (buf);
      return arg;
    }

    int
    main (void)
    {
      pthread_attr_t a;
      pthread_attr_init (&a);
      pthread_attr_setstacksize (&a, 16 * 1024 * 1024);  /* 16 MiB stack */
      pthread_attr_setguardsize (&a, 10 * 1024 * 1024);  /* 10 MiB guard */
      pthread_t th;
      pthread_create (&th, &a, tf, NULL);
      pthread_join (th, NULL);
      return 0;
    }

and you'll see that the guard area is part of the stack-sized allocation on all architectures.

That's a useful test case but I don't think it proves your point. What I'm seeing with it on my x86_64 box is that the allocated stack space is one page (4K) larger than it should be according to your argument. Since one page is the default and typical guard area, the net effect is that a program that is ignorant of the guard area parameter will get a stack that is exactly the size it asked for. Thus, I stand by my opinion that few programs out there will be expecting this behavior.

That's just on i686 and x86_64, iff stacksize is multiple of 64K, one page is added to avoid page aliasing performance degradation.

My IcedTea builds all succeed on ppc, ppc64, x86, x86_64. But on the koji machines, the ppc build fails no matter what. I am certain we are having the same problem.

Hm, I think this also shows that ia64 is just plain broken.
Consider this variant of your test program:

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    void *
    tf (void *arg)
    {
      char buf[64];
      snprintf (buf, sizeof buf, "cat /proc/%d/maps", (int) getpid ());
      system (buf);
      sleep(4);
      return arg;
    }

    int
    main (void)
    {
      pthread_attr_t a;
      pthread_t th1;
      pthread_t th2;
      pthread_t th3;
      pthread_attr_init (&a);
      pthread_attr_setstacksize (&a, 16 * 1024 * 1024);
      // pthread_attr_setguardsize (&a, 10 * 1024 * 1024);
      pthread_create (&th1, &a, tf, NULL);
      sleep(1);
      pthread_create (&th2, &a, tf, NULL);
      sleep(1);
      pthread_create (&th3, &a, tf, NULL);
      pthread_join (th1, NULL);
      pthread_join (th2, NULL);
      pthread_join (th3, NULL);
      return 0;
    }

Running this on a RHEL5-U1 ia64 RHTS machine, the printout shows the thread stack space as

    2000000000b18000-2000000000b1c000 ---p 2000000000b18000 00:00 0
    2000000000b1c000-200000000131c000 rw-p 2000000000b1c000 00:00 0

then

    2000000000b18000-2000000000b1c000 ---p 2000000000b18000 00:00 0
    2000000000b1c000-2000000001b18000 rw-p 2000000000b1c000 00:00 0
    2000000001b18000-2000000001b1c000 ---p 2000000001b18000 00:00 0
    2000000001b1c000-200000000231c000 rw-p 2000000001b1c000 00:00 0

then

    2000000000b18000-2000000000b1c000 ---p 2000000000b18000 00:00 0
    2000000000b1c000-2000000001b18000 rw-p 2000000000b1c000 00:00 0
    2000000001b18000-2000000001b1c000 ---p 2000000001b18000 00:00 0
    2000000001b1c000-2000000002b18000 rw-p 2000000001b1c000 00:00 0
    2000000002b18000-2000000002b1c000 ---p 2000000002b18000 00:00 0
    2000000002b1c000-200000000331c000 rw-p 2000000002b1c000 00:00 0

At each step the newest thread only seems to be getting 8MB not 16 as requested.
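The sizes in the listings above can be checked by subtracting each mapping's start address from its end address. The following is only a sketch of that arithmetic: `region_size` is a hypothetical helper (not part of glibc), and it assumes the `start-end perms ...` line layout shown in the output above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: size in bytes of one /proc/<pid>/maps mapping,
   or 0 if the line does not begin with a hex "start-end" range.  */
unsigned long long
region_size (const char *maps_line)
{
  unsigned long long start = 0, end = 0;

  assert (maps_line != NULL);
  if (sscanf (maps_line, "%llx-%llx", &start, &end) != 2 || end < start)
    return 0;
  return end - start;
}
```

Applied to the first listing, the `rw-p` mapping comes out at 0x800000 bytes (8 MiB) and the `---p` mapping at 0x4000 bytes (16 KiB). In the later listings, the older `rw-p` mappings come out at 0xffc000 bytes, which is 16 MiB minus 16 KiB.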
BTW, this last might help explain some bizarre coding I found in mysql:

    #if defined(__ia64__) || defined(__ia64)
      /* Peculiar things with ia64 platforms - it seems we only have half the
         stack size in reality, so we have to double it here */
      pthread_attr_setstacksize(&thr_attr,thread_stack*2);
    #else
      pthread_attr_setstacksize(&thr_attr,thread_stack);
    #endif

I had thought that this was either nuts or due to insufficient understanding of the guard area issue, but when I replace this code with something that just adds the guard area size, it crashes --- on ia64 only.

ia64 has two stacks for each thread, normal stack and register stack. Normal stack grows down, register stack grows up, guard page(s) if any are in the middle.

So how does that explain the change in the size of the previous thread's already-allocated stack? Oh, never mind, I see what you're saying: there's no guard space between one thread's normal stack and the next one's register stack. Bizarre.

One more question, if I may. It looks like on ia64, if you setstacksize to some reasonably-round number, you get exactly half of that for normal stack and half less the guard area for the register stack. Correct? How can one know if this is enough register stack? The stack depth limiting techniques in both mysql and postgresql will (I believe) measure normal stack accurately, but they've got no handle on register stack depth AFAICS. Can the register stack grow faster than normal stack? Or even as fast?

(In reply to comment #12)
> That's just on i686 and x86_64, iff stacksize is multiple of 64K, one page is
> added to avoid page aliasing performance degradation.

So we have varying behavior depending on the stacksize set?

Reading the two man pages involved we have:

pthread_attr_getguardsize: "If a thread’s stack is created with guard protection, the implementation allocates extra memory at the overflow end of the stack as a buffer against stack overflow of the stack pointer."

pthread_attr_setstacksize: "The stacksize attribute shall define the minimum stack size (in bytes) allocated for the created threads stack."

Note that getguardsize explicitly states that the implementation should allocate _extra_ memory, and setstacksize should provide the _minimum_ stack size in bytes. To my reading, this means that glibc should pad out the guard page unconditionally. I'm a bit confused as to how the current behavior can be considered conforming to POSIX. Jakub could you explain that please?

With a less than strict reading, certain parts might be a bit ambiguous so perhaps this needs to go to the standards committee for clarification. In the meantime however, programmers need to be aware of the current behavior. Could we perhaps add a brief section to the man page of pthread_attr_setstacksize that describes its interaction with the guard page?

Given the IA64 situation I've come to the conclusion that the current implementation is the best you could do. The alternative would be to end up with threads allocating way more stack than expected, just to hide some complexity from application developers.

Created attachment 297069
Oddity
Out of interest, I noticed it's impossible to allocate a stack that's a power
of two pages in size on i386 and x86_64 machines: you get exactly one page more
than you asked for:
to-gcj-1:[~]$ cat /etc/fedora-release
Fedora release 8 (Werewolf)
to-gcj-1:[~]$ uname -a
Linux to-gcj-1.yyz.redhat.com 2.6.23.1-42.fc8 #1 SMP Tue Oct 30 13:18:33 EDT
2007 x86_64 x86_64 x86_64 GNU/Linux
to-gcj-1:[~]$ gcc -o sticky-stacker -lpthread sticky-stacker.c &&
./sticky-stacker
Requested 512000, got 512000
Requested 516096, got 516096
Requested 520192, got 520192
Requested 524288, got 528384
Requested 528384, got 528384
Requested 532480, got 532480
Requested 536576, got 536576
Is this expected?
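
The jump at 524288 is at least consistent with the rule mentioned earlier in the thread: on i686 and x86_64, a stack size that is a multiple of 64K gets one extra page. A minimal sketch of that arithmetic follows, assuming a 4096-byte page (which matches the 4096-byte steps in the output above). The name `adjusted_stack_size` and the `ALIASING_INTERVAL` macro are hypothetical and used only for illustration; they are not glibc identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed for illustration: the 64K interval stated earlier in the
   thread for i686 and x86_64.  */
#define ALIASING_INTERVAL 65536

/* Hypothetical helper: add one page to any request that is an exact
   multiple of the aliasing interval; leave other requests unchanged.  */
size_t
adjusted_stack_size (size_t requested, size_t page_size)
{
  assert (page_size > 0);
  if (requested % ALIASING_INTERVAL == 0)
    requested += page_size;
  return requested;
}
```

With a 4096-byte page, only the 524288 request in the list above is a multiple of 65536, so it is the only one that grows (to 528384).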
Yes, that's expected:

    /* To avoid aliasing effects on a larger scale than pages we
       adjust the allocated stack size if necessary.  This way
       allocations directly following each other will not have
       aliasing problems.  */
    #if MULTI_PAGE_ALIASING != 0
      if ((size % MULTI_PAGE_ALIASING) == 0)
        size += pagesize_m1 + 1;
    #endif

and

    libc/nptl/sysdeps/i386/i686/Makefile:CFLAGS-pthread_create.c += -DMULTI_PAGE_ALIASING=65536
    libc/nptl/sysdeps/x86_64/Makefile:CFLAGS-pthread_create.c += -DMULTI_PAGE_ALIASING=65536

Cool, I thought it would be but I wanted to check.

> To my reading, this means that glibc should pad out the guard page
> unconditionally. I'm a bit confused as to how the current behavior can be
> considered conforming to POSIX. Jakub could you explain that please?
>
> With a less than strict reading, certain parts might be a bit ambiguous so
> perhaps this needs to go to the standards committee for clarification. In the
> meantime however, programmers need to be aware of the current behavior. Could
> we perhaps add a brief section to the man page of pthread_attr_setstacksize that
> describes its interaction with the guard page?
Jakub, any comments on this at all?
I've talked to Ulrich about this and he says this is intentional and not violating POSIX.

It still desperately needs a documentation change, as suggested at comment #20.

I agree with Tom Lane and Josh Boyer, the guard pages should be in addition to the stack usable by the thread. Gary Benson just raised this issue with OpenJDK because our code expects the glibc guard-page to be outside the stack requested by setstacksize, or as reported by getstacksize.
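
For code that wants the full requested size to be usable, one option mentioned earlier in the thread is to add the guard area size to the request. The sketch below illustrates that idea only. `set_usable_stacksize` is a hypothetical helper, and it assumes the guard area is taken out of the size passed to pthread_attr_setstacksize, as reported above for ppc and s390x. It does not account for the ia64 register stack, where the thread reports that adding only the guard size was not enough.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: request USABLE bytes of stack plus whatever
   guard size is currently set in ATTR.  Returns 0 on success or an
   errno-style value on failure.  */
int
set_usable_stacksize (pthread_attr_t *attr, size_t usable)
{
  size_t guard = 0;
  int err;

  assert (attr != NULL);
  err = pthread_attr_getguardsize (attr, &guard);
  if (err != 0)
    return err;
  if (usable > SIZE_MAX - guard)
    return EINVAL;              /* the sum would overflow size_t */
  return pthread_attr_setstacksize (attr, usable + guard);
}
```

An implementation may round the guard size up to a multiple of the page size when the thread is created, so a caller that sets an unusual guard size may want to round it up before calling a helper like this.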