Created attachment 1667329
Code showing 'endless' loop with gcc10 & -O2

Description of problem:
The current version of lvm2 in rawhide/fc32/fc33, when compiled with the latest gcc (gcc-c++-10.0.1-0.8.fc33.x86_64 in my case), works improperly.

I've tried to extract some example code showing the miscompilation of existing lvm2 code (the example approximates _timeout_thread() in daemons/dmeventd/dmeventd.c).

The code prints its text once when compiled with -O0 or with -O2 -fno-tree-pta, but prints the text in an endless loop when compiled with just -O2.

I'm leaving it to the 'experts' to diagnose the exact reason for this issue. Is there some 'workaround' for the lvm2 code?

Version-Release number of selected component (if applicable):
gcc-10.0.1-0.8.fc33.x86_64

How reproducible:
gcc-10 -O2

Workaround:
So far, adding '-fno-tree-pta' (i.e. disabling tree-pta) seems to generate working code.
Well it looks as though the optimiser is deciding that this:

  &v->field != (head);

which expands to

  &thread->timeout_list != (&_timeout_registry)

is always true, so it's impossible to exit the loop.

Simplifying a bit we get:

struct list { struct list *n; };
struct obj { int n; struct list l; } _o;
struct list _l = { .n = &_o.l };

int main(int argc, char *argv[])
{
  struct obj *o = &_o;
  _o.l.n = &_l;
  while (&o->l != &_l)
    o = ((struct obj *)((const char *)(o->l.n) - (const char *)&((struct obj *)0)->l));
  return 0;
}

Compile with just -ftree-pta:

gcc -g -Og -fno-combine-stack-adjustments -fno-compare-elim \
    -fno-cprop-registers -fno-defer-pop -fno-forward-propagate \
    -fno-guess-branch-probability -fno-inline -fno-ipa-profile \
    -fno-ipa-pure-const -fno-ipa-reference -fno-ipa-reference-addressable \
    -fno-omit-frame-pointer -fno-reorder-blocks -fno-shrink-wrap \
    -fno-split-wide-types -fno-toplevel-reorder -fno-tree-builtin-call-dce \
    -fno-tree-ccp -fno-tree-ch -fno-tree-coalesce-vars -fno-tree-copy-prop \
    -fno-tree-dce -fno-tree-dominator-opts -fno-tree-fre -fno-tree-sink \
    -fno-tree-slsr -fno-tree-ter -ftree-pta

on rawhide gcc version 10.0.1 20200216 (Red Hat 10.0.1-0.8) (GCC) gives us this endless loop:

(gdb) disas main
Dump of assembler code for function main:
   0x0000000000401106 <+0>:     movq   $0x404020,0x2f37(%rip)        # 0x404048 <_o+8>
   0x0000000000401111 <+11>:    mov    $0x404040,%eax
   0x0000000000401116 <+16>:    mov    0x8(%rax),%rax
   0x000000000040111a <+20>:    sub    $0x8,%rax
   0x000000000040111e <+24>:    jmp    0x401116 <main+16>

versus with gcc -g:

(gdb) disas main
Dump of assembler code for function main:
   0x0000000000401106 <+0>:     push   %rbp
   0x0000000000401107 <+1>:     mov    %rsp,%rbp
   0x000000000040110a <+4>:     mov    %edi,-0x14(%rbp)
   0x000000000040110d <+7>:     mov    %rsi,-0x20(%rbp)
   0x0000000000401111 <+11>:    movq   $0x404040,-0x8(%rbp)
   0x0000000000401119 <+19>:    movq   $0x404020,0x2f24(%rip)        # 0x404048 <_o+8>
   0x0000000000401124 <+30>:    jmp    0x401136 <main+48>
   0x0000000000401126 <+32>:    mov    -0x8(%rbp),%rax
   0x000000000040112a <+36>:    mov    0x8(%rax),%rax
   0x000000000040112e <+40>:    sub    $0x8,%rax
   0x0000000000401132 <+44>:    mov    %rax,-0x8(%rbp)
   0x0000000000401136 <+48>:    mov    -0x8(%rbp),%rax
   0x000000000040113a <+52>:    add    $0x8,%rax
   0x000000000040113e <+56>:    cmp    $0x404020,%rax
   0x0000000000401144 <+62>:    jne    0x401126 <main+32>
   0x0000000000401146 <+64>:    mov    $0x0,%eax
   0x000000000040114b <+69>:    pop    %rbp
   0x000000000040114c <+70>:    retq
The code is bogus. Note the assignment to "o" within the loop: you're subtracting two pointers, which can never result in a pointer in valid code. PTA knows this. Another package had a similar error, but I can't remember it offhand.
So it should be subtracting a size_t:

o = ((struct obj *)((const char *)(o->l.n) - (size_t)(const char *)&((struct obj *)0)->l));

or better:

o = ((struct obj *)((const char *)(o->l.n) - offsetof(struct obj, l)));

Could the compiler warn about problems like these?
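For illustration only, here is a minimal sketch of the offsetof-based form applied to the simplified struct list / struct obj from the reproducer. The helper names obj_from_node, count_objs and demo_two_objs are hypothetical and not taken from lvm2, and walking the list by node pointer is a choice made for this sketch, not necessarily what lvm2 does.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, NULL */

struct list { struct list *n; };
struct obj  { int n; struct list l; };

/* Hypothetical helper: recover the enclosing struct obj from a pointer to
 * its embedded list node. An integer byte offset is subtracted from the
 * node pointer; no pointer is subtracted from another pointer. */
static struct obj *obj_from_node(struct list *node)
{
    return (struct obj *)((char *)node - offsetof(struct obj, l));
}

/* Count the objects on a circular list whose sentinel is `head`.
 * The loop walks node pointers and stops when it is back at the head,
 * so the head itself is never converted into a struct obj pointer. */
static int count_objs(struct list *head)
{
    int count = 0;

    assert(head != NULL);
    for (struct list *pos = head->n; pos != head; pos = pos->n) {
        struct obj *o = obj_from_node(pos);
        o->n = count++;   /* touch the object to show it is usable */
    }
    return count;
}

/* Build head -> a -> b -> head and count the linked objects. */
static int demo_two_objs(void)
{
    struct list head;
    struct obj a = { 0 }, b = { 0 };

    head.n = &a.l;
    a.l.n = &b.l;
    b.l.n = &head;
    return count_objs(&head);
}
```

In this shape the only pointer arithmetic is applied to nodes that really are embedded in a struct obj, and the loop condition compares plain struct list pointers.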
- Check the other macros
- Consider modernising them with offsetof (stddef.h) and maybe container_of like the kernel
- Also consider type checking like the kernel
- Check the rest of the code base for any similar problems
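As a rough sketch of what a type-checked container_of could look like: the macro below is hypothetical, and is neither the lvm2 macro nor the kernel's. The kernel's container_of relies on GNU C extensions; this sketch swaps in a plain ISO C trick instead, a constant conditional expression whose two arms must have compatible pointer types, so passing a pointer of the wrong type should draw a compiler diagnostic. struct timer and timer_id_from_node are made-up names for the example.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, NULL */

/* Hypothetical macro. The condition is constant, so `ptr` is always the
 * value used; the second arm is never evaluated and exists only so the
 * compiler compares the type of `ptr` with a pointer to `member`. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(1 ? (ptr) : &((type *)0)->member) - \
              offsetof(type, member)))

struct list { struct list *n; };

/* Illustrative type with an embedded list node. */
struct timer { int id; struct list link; };

/* Map a list node back to its enclosing timer and read a field from it. */
static int timer_id_from_node(struct list *node)
{
    assert(node != NULL);
    return container_of(node, struct timer, link)->id;
}
```

Calling container_of with, say, an int * as the first argument is expected to produce a pointer type mismatch diagnostic, which is the kind of check the list above asks for.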
See also [bug 1812176] if that might be related.
(In reply to Jan Pokorný [poki] from comment #5)
> See also [bug 1812176] if that might be related.

Very likely if an easy fix hasn't been pushed out yet.
Already modified in upstream lvm2 with these 2 patches:

https://www.redhat.com/archives/lvm-devel/2020-March/msg00014.html
https://www.redhat.com/archives/lvm-devel/2020-March/msg00012.html
This bug appears to have been reported against 'rawhide' during the Fedora 33 development cycle. Changing version to 33.
This message is a reminder that Fedora 33 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 33 on 2021-11-30. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '33'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 33 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 33 changed to end-of-life (EOL) status on 2021-11-30. Fedora 33 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.