Bug 193740
Summary: | libapreq2 code miscompiled with -O2: infinite loop | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Bojan Smojver <bojan> | ||||||
Component: | gcc | Assignee: | Jakub Jelinek <jakub> | ||||||
Status: | CLOSED NOTABUG | QA Contact: | |||||||
Severity: | high | Docs Contact: | |||||||
Priority: | medium | ||||||||
Version: | 5 | CC: | aoliva | ||||||
Target Milestone: | --- | ||||||||
Target Release: | --- | ||||||||
Hardware: | i686 | ||||||||
OS: | Linux | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2007-09-22 16:44:57 UTC | Type: | --- | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Attachments: |
|
Description
Bojan Smojver
2006-06-01 02:05:21 UTC
Sorry, stuffed up the intermediate code. The correct one is here: ------------------------------- do { apr_bucket *f = (&(in)->list)->next; do { (((((f))))->link.prev)->link.next = ((((f))))->link.next; (((((f))))->link.next)->link.prev = ((((f))))->link.prev; } while (0); do { apr_bucket *ap__b = (f); do { ((((ap__b))))->link.next = ((struct apr_bucket *)((char *)(((&(out)->list))) - ((long) (((char *) (&(((struct apr_bucket*)((void *)0))->link))) - ((char *) ((void *)0)))))); ((((ap__b))))->link.prev = (((struct apr_bucket *)((char *)(((&(out)->list))) - ((long) (((char *) (&(((struct apr_bucket*)((void *)0))->link))) - ((char *) ((void *)0)))))))->link.prev; ((((struct apr_bucket *)((char *)(((&(out)->list))) - ((long) (((char *) (&(((struct apr_bucket*)((void *)0))->link))) - ((char *) ((void *)0)))))))->link.prev)->link.next = (((ap__b))); (((struct apr_bucket *)((char *)(((&(out)->list))) - ((long) (((char *) (&(((struct apr_bucket*)((void *)0))->link))) - ((char *) ((void *)0)))))))->link.prev = (((ap__b))); } while (0); ; } while (0); } while (e != (&(in)->list)->next); ------------------------------- Thanks to Joe Schaefer for spotting this! Please preprocess parse_multipart.c (either by replacing -c with -E, or adding -save-temps to the options) and attach parse_multipart.i here. Also, please mention all the command line options used. Created attachment 130327 [details]
Intermediate file
Compiled with: gcc -DHAVE_CONFIG_H -I. -I. -I../include -I/usr/local/apache2/include/apr-1 -I/usr/include/mysql -DLINUX=2 -D_REENTRANT -D_GNU_SOURCE -D_LARGEFILE64_SOURCE -O2 -g -Wall -fstack-protector -MT parser_multipart.lo -MD -MP -MF .deps/parser_multipart.Tpo -c parser_multipart.c -fPIC -DPIC -save-temps -o .libs/parser_multipart.o I think it is quite probable this violates strict aliasing, though I'm not 100% sure, Alex, what do you think about that? Here is a simplified code, with the #if 1 part aliasing clearly decides in->list.next couldn't have been modified, while in #else part it assumes it could. struct apr_bucket_brigade { void *p; struct apr_bucket_list { struct apr_bucket *next; struct apr_bucket *prev; } list; void *bucket_alloc; }; struct apr_bucket { struct { struct apr_bucket *next; struct apr_bucket *prev; } link; const void *type; __SIZE_TYPE__ length; }; void foo (struct apr_bucket_brigade *out, struct apr_bucket_brigade *in) { struct apr_bucket * e = (struct apr_bucket *) in->list.next; do { struct apr_bucket * f = (struct apr_bucket *) in->list.next; f->link.prev->link.next = f->link.next; f->link.next->link.prev = f->link.prev; f->link.next = (struct apr_bucket *) (char *) &out->list; f->link.prev = ((struct apr_bucket *) (char *) &out->list)->link.prev; #if 1 ((struct apr_bucket *) (char *) &out->list)->link.prev->link.next = f; ((struct apr_bucket *) (char *) &out->list)->link.prev = f; #else out->list.prev->link.next = f; out->list.prev = f; #endif } while (e != in->list.next); } Ditto x86_64 and libapreq2 2.07. Reports with: FreeBSD 6.1-RELEASE gcc version 4.1.2 20060526 (prerelease) suggest that this may be an FC specific issue (i.e. code compiles fine, apparently). Or maybe even that this was fixed in 4.1 series already. As many other strict aliasing violations (if this is one), this is highly scheduling etc. dependent, so whether you can or can't reproduce it on other system is irrelevant. I can reproduce the problem with current x86_64-linux GCC trunk e.g. Mainline still behaves the same way, so I doubt it's some fix in the 4.1 series. Clearly list and link don't have the same types, so it might be just fair if accesses to them were regarded as non-aliasing, however even if I use struct apr_bucket_list as the type for link, it still fails in the very same way, so that's not the reason for the problem. Removing the intermediate (char*) casts, that are there only to silence type-punning warnings, I do get warnings for the casts, but if I make the change of using struct apr_bucket_list as the type for link, then the warnings are gone, but the code still fails. This might be a GCC bug. In either case, I do see that, because the pointer to link is cast not to the type of the list member, but to the type of the enclosing struct, then when the pointer is dereferenced, we do access the stored value of the object with a type different from its effective type, and that invokes undefined behavior. AFAICT, it doesn't matter that we only access a sub-object of a type that is structurally equivalent, or even the same type; as soon as we use an incompatible enclosing type, we're breaking the rules, and GCC is entitled to follow the rules to make the code break faster. One thing seems strange in the intermediate code: ------------------------------ ((long) (((char *) (&(((struct apr_bucket*)((void *)0))->link))) - ((char *) ((void *)0)))) ------------------------------ The above code is most likely this from apr_general.h: ------------------------------ #define APR_OFFSET(p_type,field) \ ((long) (((char *) (&(((p_type)NULL)->field))) - ((char *) NULL))) ------------------------------ I was under the impression that on Linux, there is offsetof that can be used. So, I created a little test program: ------------------------------ #include <stdio.h> #include <stddef.h> int main(int argc,char **argv){ struct test{ int a; char *b; }; printf("%lu\n",offsetof(struct test,b)); return 0; } ------------------------------ Compiled it: ------------------------------ gcc -Wall -save-temps -ansi -o test test.c ------------------------------ This give intermediate code: ------------------------------ int main(int argc,char **argv){ struct test{ int a; char *b; }; printf("%lu\n",__builtin_offsetof (struct test, b)); return 0; } ------------------------------ Sooo, why isn't APR code using that instead? No idea... BTW, the output of the program on an x86_64 box is 8. Here is the intermediate code when using native offsetof(): ---------------------------------------- do { apr_bucket *f = (&(in)->list)->next; do { (((((f))))->link.prev)->link.next = ((((f))))->link.next; (((((f))))->link.next)->link.prev = ((((f))))->link.prev; } while (0); do { apr_bucket *ap__b = (f); do { ((((ap__b))))->link.next = ((struct apr_bucket *)((char *)(((&(out)->list))) - __builtin_offsetof (struct apr_bucket, link))); ((((ap__b))))->link.prev = (((struct apr_bucket *)((char *)(((&(out)->list))) - __builtin_offsetof (struct apr_bucket, link))))->link.prev; ((((struct apr_bucket *)((char *)(((&(out)->list))) - __builtin_offsetof (struct apr_bucket, link))))->link.prev)->link.next = (((ap__b))); (((struct apr_bucket *)((char *)(((&(out)->list))) - __builtin_offsetof (struct apr_bucket, link))))->link.prev = (((ap__b))); } while (0); ; } while (0); } while (e != (&(in)->list)->next); ---------------------------------------- However, relevant assembly is not much better (code still hangs in an infinite loop): ---------------------------------------- .L19: movq 56(%rsp), %rbp .LVL27: movq 24(%rsp), %rdi movq %r12, %rdx movq %r14, %rsi subq %r15, %rbp cmpq %r12, %rbp cmovbe %rbp, %rdx addq %r15, %rdi call strncmp@PLT .LVL28: testl %eax, %eax je .L47 .loc 1 159 0 testq %r15, %r15 je .L27 .LBB3: .loc 1 165 0 movq 40(%rsp), %rax xorl %r15d, %r15d movq 8(%rax), %rcx .LVL29: cmpq %rcx, %r13 .p2align 4,,7 .L29: .loc 1 166 0 movq 8(%rcx), %rdx movq (%rcx), %rax movq %rax, (%rdx) movq (%rcx), %rax .LBB4: .loc 1 167 0 movq %rbx, (%rcx) .LBE4: .loc 1 166 0 movq %rdx, 8(%rax) .LBB5: .loc 1 167 0 movq 8(%rbx), %rax movq %rax, 8(%rcx) movq 8(%rbx), %rax movq %rcx, 8(%rbx) movq %rcx, (%rax) .LBE5: .LBE3: .loc 1 168 0 jne .L29 jmp .L19 ---------------------------------------- So, it's not the use of non-native offsetoff() that is doing this... I went a step further and redefined like this (i.e. no (char*) cast): ---------------------------------------- #define APR_RING_SENTINEL(hp, elem, link) \ (struct elem *)((hp) - APR_OFFSETOF(struct elem, link)) ---------------------------------------- Now during compilation I dutifully get: ---------------------------------------- parser_multipart.c:167: warning: dereferencing type-punned pointer will break strict-aliasing rules ---------------------------------------- Looks like the (char*) cast gets rid of the compiler warnings but not really the underlying problem. Maybe these macros should be rewritten? If the macros in libapreq2 have not been rewritten yet, they should. I think there has been some discussion about this in APR, but I don't remember seeing a commit. Anyway, here is the thread: http://mail-archives.apache.org/mod_mbox/apr-dev/200708.mbox/%3cad8a87120708151140j2a839772i96ee303a82fe3a19@mail.gmail.com%3e For now, we're just building libapreq2 with -fno-strict-aliasing. Created attachment 308411 [details]
Fix for libapreq2
This patch makes things work on Fedora 9 with GCC 4.3.0.
|