Bug 1809826 - gcc miscompiles lvm2 code without using -fno-tree-pta
Summary: gcc miscompiles lvm2 code without using -fno-tree-pta
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: lvm2
Version: 33
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: ---
Assignee: Zdenek Kabelac
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-03-03 23:45 UTC by Zdenek Kabelac
Modified: 2021-11-30 16:19 UTC (History)

Fixed In Version: lvm2-2.03.09-1.fc33
Clone Of:
Environment:
Last Closed: 2021-11-30 16:19:56 UTC
Type: Bug
Embargoed:


Attachments
Code showing 'endless' loop with gcc10 & -O2 (1.15 KB, text/x-csrc)
2020-03-03 23:45 UTC, Zdenek Kabelac

Description Zdenek Kabelac 2020-03-03 23:45:28 UTC
Created attachment 1667329
Code showing 'endless' loop with gcc10 & -O2

Description of problem:

The current version of lvm2 in rawhide/fc32/fc33, compiled with the latest gcc (gcc-c++-10.0.1-0.8.fc33.x86_64 in my case), does not work properly.

I've tried to extract example code showing the miscompilation of existing lvm2 code (this example approximates _timeout_thread() in daemons/dmeventd/dmeventd.c).

The code prints once when compiled with -O0 or with -O2 -fno-tree-pta,
but prints text in an endless loop when compiled with just -O2.

Leaving it to the 'experts' to diagnose the exact cause of this issue.
Is there some 'workaround' for the lvm2 code?


Version-Release number of selected component (if applicable):
gcc-10.0.1-0.8.fc33.x86_64

How reproducible:
gcc-10 -O2 

Workaround:
So far, compiling with '-fno-tree-pta' (that is, disabling -ftree-pta) seems to generate working code.

Comment 1 Alasdair Kergon 2020-03-04 03:01:04 UTC
Well it looks as though the optimiser is deciding that this:
  &v->field != (head);
which expands to
  &thread->timeout_list != (&_timeout_registry)
is always true so it's impossible to exit the loop.


Simplifying a bit we get:

struct list {
	struct list *n;
};

struct obj {
	int n;
	struct list l;
} _o;

struct list _l = { .n = &_o.l };

int main(int argc, char *argv[])
{
	struct obj *o = &_o;

	_o.l.n = &_l;

	while (&o->l != &_l)
		o = ((struct obj *)((const char *)(o->l.n) - (const char *)&((struct obj *)0)->l));

	return 0;
}


Compile with just -ftree-pta:

gcc -g -Og -fno-combine-stack-adjustments -fno-compare-elim -fno-cprop-registers -fno-defer-pop -fno-forward-propagate -fno-guess-branch-probability -fno-inline -fno-ipa-profile -fno-ipa-pure-const -fno-ipa-reference -fno-ipa-reference-addressable -fno-omit-frame-pointer -fno-reorder-blocks -fno-shrink-wrap -fno-split-wide-types -fno-toplevel-reorder -fno-tree-builtin-call-dce -fno-tree-ccp -fno-tree-ch -fno-tree-coalesce-vars -fno-tree-copy-prop -fno-tree-dce -fno-tree-dominator-opts -fno-tree-fre -fno-tree-sink -fno-tree-slsr -fno-tree-ter -ftree-pta 

on rawhide

gcc version 10.0.1 20200216 (Red Hat 10.0.1-0.8) (GCC) 

gives us this endless loop:

(gdb) disas main
Dump of assembler code for function main:
   0x0000000000401106 <+0>:	movq   $0x404020,0x2f37(%rip)        # 0x404048 <_o+8>
   0x0000000000401111 <+11>:	mov    $0x404040,%eax
   0x0000000000401116 <+16>:	mov    0x8(%rax),%rax
   0x000000000040111a <+20>:	sub    $0x8,%rax
   0x000000000040111e <+24>:	jmp    0x401116 <main+16>

versus the output with plain gcc -g:

(gdb) disas main
Dump of assembler code for function main:
   0x0000000000401106 <+0>:	push   %rbp
   0x0000000000401107 <+1>:	mov    %rsp,%rbp
   0x000000000040110a <+4>:	mov    %edi,-0x14(%rbp)
   0x000000000040110d <+7>:	mov    %rsi,-0x20(%rbp)
   0x0000000000401111 <+11>:	movq   $0x404040,-0x8(%rbp)
   0x0000000000401119 <+19>:	movq   $0x404020,0x2f24(%rip)        # 0x404048 <_o+8>
   0x0000000000401124 <+30>:	jmp    0x401136 <main+48>
   0x0000000000401126 <+32>:	mov    -0x8(%rbp),%rax
   0x000000000040112a <+36>:	mov    0x8(%rax),%rax
   0x000000000040112e <+40>:	sub    $0x8,%rax
   0x0000000000401132 <+44>:	mov    %rax,-0x8(%rbp)
   0x0000000000401136 <+48>:	mov    -0x8(%rbp),%rax
   0x000000000040113a <+52>:	add    $0x8,%rax
   0x000000000040113e <+56>:	cmp    $0x404020,%rax
   0x0000000000401144 <+62>:	jne    0x401126 <main+32>
   0x0000000000401146 <+64>:	mov    $0x0,%eax
   0x000000000040114b <+69>:	pop    %rbp
   0x000000000040114c <+70>:	retq

Comment 2 Jeff Law 2020-03-04 03:04:48 UTC
The code is bogus.  Note the assignment to "o" within the loop.  You're subtracting two pointers, which yields an integer and can never result in a pointer in valid code.  PTA knows this.

Another package had a similar error, but I can't remember it offhand.
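
To make the type issue concrete, here is a minimal sketch (not taken from lvm2) that uses C11 _Generic to ask the compiler what type each form of the expression has. The struct definitions are copied from comment 1; IS_CHAR_PTR, OLD_STYLE, NEW_STYLE, sample and the two functions are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

struct list {
	struct list *n;
};

struct obj {
	int n;
	struct list l;
};

/* Hypothetical helper: 1 if the expression has type 'const char *', else 0. */
#define IS_CHAR_PTR(e) _Generic((e), const char *: 1, default: 0)

/* Form used in the reproducer: pointer minus pointer (an integer distance). */
#define OLD_STYLE(node) \
	((const char *)(node) - (const char *)&((struct obj *)0)->l)

/* Form using offsetof: pointer minus integer offset (still a pointer). */
#define NEW_STYLE(node) \
	((const char *)(node) - offsetof(struct obj, l))

static struct obj sample;

/* Returns 0: the difference of two pointers has type ptrdiff_t. */
int old_style_is_pointer(void)
{
	return IS_CHAR_PTR(OLD_STYLE(&sample.l));
}

/* Returns 1: subtracting an integer offset leaves a pointer. */
int new_style_is_pointer(void)
{
	/* Sanity check: the offsetof form maps the member back to its object. */
	assert((const void *)NEW_STYLE(&sample.l) == (const void *)&sample);
	return IS_CHAR_PTR(NEW_STYLE(&sample.l));
}
```

Casting the OLD_STYLE result to (struct obj *) therefore converts an integer to a pointer, whereas the NEW_STYLE result is derived from the original list pointer.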

Comment 3 Alasdair Kergon 2020-03-04 03:33:37 UTC
So it should be subtracting a size_t:

  o = ((struct obj *)((const char *)(o->l.n) - (size_t)(const char *)&((struct obj *)0)->l));

or better

  o = ((struct obj *)((const char *)(o->l.n) - offsetof(struct obj, l)));

Could the compiler warn about problems like these?
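
For reference, a minimal sketch of the simplified test case from comment 1 with the offsetof form applied, wrapped in a function so the number of loop iterations can be checked. The names obj0, head and count_hops are hypothetical (the original used _o and _l), and the hop limit is only a guard so that a miscompiled loop aborts instead of spinning forever. The last hop still computes an address in front of the bare list head, as this list idiom always does, so treat it as an illustration rather than a statement about strict standard conformance.

```c
#include <assert.h>
#include <stddef.h>

struct list {
	struct list *n;
};

struct obj {
	int n;
	struct list l;
};

static struct obj obj0;
static struct list head = { .n = &obj0.l };

/* Walk from obj0 until the list head is reached; return the hop count. */
int count_hops(void)
{
	struct obj *o = &obj0;
	int hops = 0;

	obj0.l.n = &head;	/* close the ring: head -> obj0.l -> head */

	while (&o->l != &head) {
		/* Pointer minus integer offset: the result is derived from o->l.n. */
		o = (struct obj *)((char *)o->l.n - offsetof(struct obj, l));
		hops++;
		assert(hops <= 1);	/* guard against an endless loop */
	}

	return hops;
}
```

With the ring set up as above, the loop is expected to exit after a single hop.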

Comment 4 Alasdair Kergon 2020-03-04 03:41:46 UTC
- Check the other macros
  - Consider modernising them with offsetof (stddef.h) and maybe container_of like the kernel
  - Also consider type checking like the kernel
- Check the rest of the code base for any similar problems
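
As a rough illustration of the container_of idea with type checking, here is a minimal sketch in standard C. It is not lvm2's or the kernel's actual macro: the kernel version relies on GNU extensions, whereas this one uses a conditional expression so that passing a pointer of the wrong type draws a compiler diagnostic. The macro as defined here, obj_from_list and roundtrip_value are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical container_of: map a pointer to 'member' back to the
 * enclosing 'type'. The unevaluated second arm of the conditional forces
 * 'ptr' to be compatible with the member's pointer type.
 */
#define container_of(ptr, type, member) \
	((type *)((char *)(1 ? (ptr) : &((type *)0)->member) - \
		  offsetof(type, member)))

struct list {
	struct list *n;
};

struct obj {
	int n;
	struct list l;
};

/* Recover the object that embeds the given list node. */
struct obj *obj_from_list(struct list *node)
{
	return container_of(node, struct obj, l);
}

/* Round trip: object -> embedded node -> object, then read a field. */
int roundtrip_value(void)
{
	struct obj o = { .n = 42 };
	struct obj *back = obj_from_list(&o.l);

	assert(back == &o);
	return back->n;
}
```

Because the offset comes from offsetof, the subtraction is pointer minus integer, which is the form suggested in comment 3.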

Comment 5 Jan Pokorný [poki] 2020-03-10 17:20:54 UTC
See also [bug 1812176] if that might be related.

Comment 6 Alasdair Kergon 2020-03-10 17:25:56 UTC
(In reply to Jan Pokorný [poki] from comment #5)
> See also [bug 1812176] if that might be related.

Very likely if an easy fix hasn't been pushed out yet.

Comment 7 Zdenek Kabelac 2020-03-11 09:39:01 UTC
Already modified in upstream lvm2 with these 2 patches:

https://www.redhat.com/archives/lvm-devel/2020-March/msg00014.html
https://www.redhat.com/archives/lvm-devel/2020-March/msg00012.html

Comment 8 Ben Cotton 2020-08-11 15:31:42 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 33 development cycle.
Changing version to 33.

Comment 9 Ben Cotton 2021-11-04 17:35:29 UTC
This message is a reminder that Fedora 33 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 33 on 2021-11-30.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '33'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 33 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 10 Ben Cotton 2021-11-30 16:19:56 UTC
Fedora 33 changed to end-of-life (EOL) status on 2021-11-30. Fedora 33 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

