Bug 1126580 - need suppression of kernel commit #2062afb4f804a (gcc -fvar-tracking)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 21
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2014-08-04 20:24 UTC by Frank Ch. Eigler
Modified: 2014-12-22 02:32 UTC (History)

Fixed In Version: kernel-3.17.7-300.fc21
Doc Type: Bug Fix
Last Closed: 2014-12-21 06:36:04 UTC



Description Frank Ch. Eigler 2014-08-04 20:24:25 UTC
kernel commit #2062afb4f804a was an overreaction to a gcc bug that's already
fixed in f21/rawhide, and impedes functionality of perf/systemtap/crash.
Please nuke it in Fedora/RHEL builds that might inherit 3.16.

Comment 1 Josh Boyer 2014-08-05 10:18:02 UTC
Uuugghhh.

So I don't see upstream making this a conditional at this point.  Which means "nuke" requires us to carry a patch to disable it.  If that's what is required then I guess we can do that, but it's another deviation from upstream (and a fairly large one) that makes it harder for us to interact with them.

Can you summarize what the problem is for perf/systemtap/crash?

Comment 2 Jakub Jelinek 2014-08-05 10:24:19 UTC
With the upstream hack, you can lose debug info coverage of some to most of the variables, and debug info becomes significantly less accurate.  For tools like systemtap that is a show stopper.
Just try to build kernel with -fvar-tracking-assignments and -fno-var-tracking-assignments and compare dwlocstat output from those two builds.

I don't understand Linus' decision; perhaps he just thinks debug info is useless.  If there were some wrong-code at -O2, he would not build the kernel with -O0 just because of that.

Comment 3 Frank Ch. Eigler 2014-08-05 11:11:10 UTC
Josh, reversion can consist of a one character "#" addition to the top-level
Makefile.  The git comment was dozens of times larger than the patch - it
was not large.
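
To illustrate how small such a revert is, here is a minimal Python sketch that comments out the flag line in a Makefile's text. The exact shape of the Makefile line is an assumption (only the flag name -fno-var-tracking-assignments comes from this report), and comment_out_flag is a hypothetical helper, not part of any kernel tooling.

```python
FLAG = "-fno-var-tracking-assignments"


def comment_out_flag(makefile_text: str, flag: str = FLAG) -> str:
    """Prefix every not-yet-commented line that mentions `flag` with '#'."""
    out = []
    for line in makefile_text.splitlines(keepends=True):
        if flag in line and not line.lstrip().startswith("#"):
            line = "#" + line
        out.append(line)
    return "".join(out)


# Assumed shape of the line added upstream (illustrative, not copied
# from the kernel tree).
sample = (
    "KBUILD_CFLAGS += -O2\n"
    "KBUILD_CFLAGS += $(call cc-option, -fno-var-tracking-assignments)\n"
)
patched = comment_out_flag(sample)
print(patched, end="")
```

The helper is idempotent: running it again on already patched text changes nothing, because commented lines are skipped.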

Comment 4 Josh Boyer 2014-08-05 11:24:28 UTC
(In reply to Frank Ch. Eigler from comment #3)
> Josh, reversion can consist of a one character "#" addition to the top-level
> Makefile.  The git comment was dozens of times larger than the patch - it
> was not large.

I'm aware of that.  I'm not stupid.  The actual patch itself isn't really the problem.  Come on Frank, surely you know that.

The problem is this is unconditional upstream.  That means that upstream will never have kernel builds with this gcc option present, which means anyone that reverts this is deviating from upstream in a way that upstream can't (or more likely won't) recreate.  This may be theoretical, but it's still a concern.  A concern that seems warranted given the issue that prompted this in the first place.

I'll try and argue again for it at least being conditional like DEBUG_INFO is. I'd rather at least have this configurable upstream.  It should be fairly easy to make a case for this given that it breaks multiple things on multiple distros (if I understand the problem fully).

Comment 5 Frank Ch. Eigler 2014-08-05 11:31:22 UTC
"I'll try and argue again for it at least being conditional"

Sounds good.

Comment 6 Josh Boyer 2014-08-05 11:37:44 UTC
BTW, this commit is already in the stable kernels as well, so it's not contained to 3.16.

Comment 7 Justin M. Forbes 2014-08-20 13:35:01 UTC
The gcc bug is already fixed in Rawhide/F21, but has the fix been applied to F20 gcc?

Comment 8 Frank Ch. Eigler 2014-09-08 15:34:36 UTC
Users are encountering this problem on F20 now.
https://sourceware.org/bugzilla/show_bug.cgi?id=17362

Comment 9 Jakub Jelinek 2014-09-08 15:43:05 UTC
Not in F20 yet, it is in F21/rawhide/RHEL 7.0.z/RHEL 7.1 tree.
That said, I think the latent bug has only been triggered on the kernel with gcc 4.9.x and later, not with 4.8.x or earlier.

Comment 10 Josh Boyer 2014-09-08 18:24:42 UTC
Also, that sourceware.org report seems to be speculation.  It would be good to get confirmation.

Thinking about this more, I think this is something gcc/systemtap is going to have to deal with in an upstream manner.  It does systemtap very little good to only work on Fedora kernels.

Comment 11 Josh Stone 2014-09-08 18:42:20 UTC
(In reply to Josh Boyer from comment #10)
> Thinking about this more, I think this is something gcc/systemtap is going
> to have to deal with in an upstream manner.  It does systemtap very little
> good to only work on Fedora kernels.

There's nothing systemtap can do except what you see in that sourceware bug - apologize that the parameters you wanted aren't available.  If the compiler hides data somewhere without telling us about it in DWARF, we're stuck.  We'll still operate as much as possible, just limited to what DWARF information is actually available.

Comment 12 Josh Boyer 2014-09-08 18:48:29 UTC
(In reply to Josh Stone from comment #11)
> (In reply to Josh Boyer from comment #10)
> > Thinking about this more, I think this is something gcc/systemtap is going
> > to have to deal with in an upstream manner.  It does systemtap very little
> > good to only work on Fedora kernels.
> 
> There's nothing systemtap can do except what you see in that sourceware bug
> - apologize that the parameters you wanted aren't available.  If the
> compiler hides data somewhere without telling us about it in DWARF, we're
> stuck.  We'll still operate as much as possible, just limited to what DWARF
> information is actually available.

Yes, I understand that.  Which is exactly what you're going to have to do with every other distribution or upstream kernel built.

I tried getting it to be configurable, but I am certainly not the best person to argue on behalf of systemtap.  I am nothing more than an ill-informed middleman in that case.  I'd suggest restarting the conversation upstream and/or proposing alternative tests that could be run to verify gcc has the bug fixed at kernel build time.

Comment 13 Frank Ch. Eigler 2014-09-08 18:54:02 UTC
> I tried getting it to be configurable, but I am certainly not the best
> person to argue on behalf of systemtap.  

(It's not just systemtap that's affected, but crash, perf, etc.)

I hope someone does eventually convince Linus of a more precise
solution to the problem.  In the interim, let's not punish those
Fedora users whose compilers are fine (f21 + rawhide, f19/f20
coming soon after a gcc update).

Comment 14 Josh Stone 2014-10-16 16:59:15 UTC
I sent a kbuild patch to make VTA optional:
https://lkml.org/lkml/2014/10/6/461

Comment 15 Carlos O'Donell 2014-11-13 22:03:21 UTC
+1, this makes debugging hard. I wanted to debug an inotify issue and the systemtap output is useless because of this bug.

Comment 16 Josh Boyer 2014-11-13 22:15:57 UTC
How would you debug the issue on an upstream kernel?

Comment 17 Josh Boyer 2014-11-13 22:18:05 UTC
To clarify my question, you're an upstream glibc maintainer.  If you faced this issue on another distribution using an upstream kernel or even their kernel, what would you do?

Comment 18 Carlos O'Donell 2014-11-14 14:05:02 UTC
(In reply to Josh Boyer from comment #17)
> To clarify my question, you're an upstream glibc maintainer.  If you faced
> this issue on another distribution using an upstream kernel or even their
> kernel, what would you do?

File a bug with my distribution and ask politely for the kernel maintainers to revert the commit :-)

If I *had* to fix it, and I can, I'd rebuild a kernel myself with the patch reverted. However, every time you make me do that, it's less time I spend on glibc fixing issues for Fedora users (like making cloud images smaller for Fedora by splitting out localizations).

Comment 19 Josh Boyer 2014-11-14 14:22:04 UTC
My point is, if this is such a benefit to users then why are you the first to ask about it?  I'm not aware of any other distributions carrying a patch to enable it either, but I would love to be wrong there.

At any rate, before I add Josh's patch (which btw thank you for sending) I need to know exactly which gcc versions for each release contain the bugfix making it safe to enable so I can adjust the BR on gcc for the kernel.

Comment 21 Mark Wielaard 2014-11-14 15:44:49 UTC
(In reply to Josh Boyer from comment #19)
> My point is, if this is such a benefit to users then why are you the first
> to ask about it?

BTW. I am simply using an earlier kernel till this is fixed. Sorry for not explicitly asking. I saw the bug was already filed and the patch posted, so I assumed it would be fixed soon without having to ask or add a +1.

Comment 24 Josh Stone 2014-11-14 17:04:08 UTC
I was thinking it's been far long enough without response that I should resend that patch upstream.  Josh, any comment about it before I do?

Comment 25 Jakub Jelinek 2014-11-14 17:06:17 UTC
(In reply to Josh Boyer from comment #19)
> At any rate, before I add Josh's patch (which btw thank you for sending) I
> need to know exactly which gcc versions for each release contain the bugfix
> making it safe to enable so I can adjust the BR on gcc for the kernel.

So, when talking about the only known wrong-code caused by VTA, PR61801:
RHEL 6.6 : gcc >= 4.4.7-10 (though, the bug is just theoretical in 4.4-RH)
RHEL 7.0.z : gcc >= 4.8.2-16.2.el7
RHEL 7.1 : gcc >= 4.8.3-2.el7
Fedora 19 : gcc >= 4.8.3-2.fc19
Fedora 20 : gcc >= 4.8.3-2.fc20
Fedora 21 : gcc >= 4.9.1-2.fc21
Fedora 22 : gcc >= 4.9.1-2.fc22
Developer Toolset [12] : not fixed
Developer Toolset 3 : devtoolset-3-gcc >= 4.9.1-2.el{6,7}
(though, DTS isn't used for building kernel, so you only care about the gcc versions).
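
The minimum versions listed above could be checked mechanically along the following lines. This is only a sketch: the table is copied from the list in this comment, but parse_vr and has_vta_fix are hypothetical helpers, and the comparison is a simplified numeric one that ignores dist tags (such as .fc20 or .el7) and epochs. It is not RPM's real version-comparison algorithm.

```python
from itertools import takewhile

# Minimum gcc version-release carrying the PR61801 fix, per the list above.
MIN_FIXED_GCC = {
    "RHEL 6.6": "4.4.7-10",
    "RHEL 7.0.z": "4.8.2-16.2.el7",
    "RHEL 7.1": "4.8.3-2.el7",
    "Fedora 19": "4.8.3-2.fc19",
    "Fedora 20": "4.8.3-2.fc20",
    "Fedora 21": "4.9.1-2.fc21",
    "Fedora 22": "4.9.1-2.fc22",
}


def parse_vr(vr: str):
    """Split 'version-release' into two tuples of integers.

    Only the leading numeric components of the release are kept, so a
    dist tag like 'el7' or 'fc20' is dropped (a simplification).
    """
    version, _, release = vr.partition("-")
    ver = tuple(int(p) for p in version.split("."))
    rel = tuple(int(p) for p in takewhile(str.isdigit, release.split(".")))
    return ver, rel


def has_vta_fix(distro: str, gcc_vr: str) -> bool:
    """True if gcc_vr is at least the minimum fixed build for distro."""
    return parse_vr(gcc_vr) >= parse_vr(MIN_FIXED_GCC[distro])


print(has_vta_fix("Fedora 20", "4.8.3-2.fc20"))
print(has_vta_fix("Fedora 20", "4.8.2-7.fc20"))
```

A real packaging check would express the same constraint as a versioned build requirement on gcc and let RPM do the comparison.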

Comment 26 Josh Boyer 2014-11-14 17:13:21 UTC
When this scratch build completes (x86_64 only), could you guys try it out and see if the issue is fixed for you?

http://koji.fedoraproject.org/koji/taskinfo?taskID=8143450

Comment 27 Josh Boyer 2014-11-14 17:13:43 UTC
(In reply to Jakub Jelinek from comment #25)
> (In reply to Josh Boyer from comment #19)
> > At any rate, before I add Josh's patch (which btw thank you for sending) I
> > need to know exactly which gcc versions for each release contain the bugfix
> > making it safe to enable so I can adjust the BR on gcc for the kernel.
> 
> So, when talking about the only known wrong-code caused by VTA, PR61801:
> RHEL 6.6 : gcc >= 4.4.7-10 (though, the bug is just theoretical in 4.4-RH)
> RHEL 7.0.z : gcc >= 4.8.2-16.2.el7
> RHEL 7.1 : gcc >= 4.8.3-2.el7
> Fedora 19 : gcc >= 4.8.3-2.fc19
> Fedora 20 : gcc >= 4.8.3-2.fc20
> Fedora 21 : gcc >= 4.9.1-2.fc21
> Fedora 22 : gcc >= 4.9.1-2.fc22
> Developer Toolset [12] : not fixed
> Developer Toolset 3 : devtoolset-3-gcc >= 4.9.1-2.el{6,7}
> (though, DTS isn't used for building kernel, so you only care about the gcc
> versions).

Thanks Jakub.

Comment 28 Josh Boyer 2014-11-14 17:14:36 UTC
(In reply to Josh Stone from comment #24)
> I was thinking it's been far long enough without response that I should
> resend that patch upstream.  Josh, any comment about it before I do?

Nothing specific, no :\.  The only minor suggestion is to CC some of the other distro kernel maintainers and Andrew Morton, but I'm not sure that is going to make a difference.

Comment 29 Josh Stone 2014-11-21 18:30:03 UTC
(In reply to Josh Boyer from comment #26)
> When this scratch build completes (x86_64 only), could you guys try it out
> and see if the issue is fixed for you?
> 
> http://koji.fedoraproject.org/koji/taskinfo?taskID=8143450

Sorry I missed this earlier.  I confirmed that this scratch build fixes the "do_execve" inline parameters as noted in sourceware PR17362, and also corrects 10 debuginfo failures found in the full systemtap testsuite.

Comment 30 Josh Stone 2014-11-21 18:51:59 UTC
I resent my patch upstream: https://lkml.org/lkml/2014/11/21/505

Comment 31 Josh Boyer 2014-12-16 19:27:23 UTC
(In reply to Josh Stone from comment #30)
> I resent my patch upstream: https://lkml.org/lkml/2014/11/21/505

Applied in rawhide and a build kicked off.  I'll get it into F20 and F21 later today.

Thanks for the persistence.  One can hope eventually upstream will pick it up.

Comment 32 Fedora Update System 2014-12-17 19:01:50 UTC
kernel-3.17.7-300.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/kernel-3.17.7-300.fc21

Comment 33 Fedora Update System 2014-12-17 19:03:45 UTC
kernel-3.17.7-200.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/kernel-3.17.7-200.fc20

Comment 34 Fedora Update System 2014-12-19 18:31:12 UTC
Package kernel-3.17.7-200.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.17.7-200.fc20'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-17283/kernel-3.17.7-200.fc20
then log in and leave karma (feedback).

Comment 35 Fedora Update System 2014-12-21 06:36:04 UTC
kernel-3.17.7-200.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 36 Fedora Update System 2014-12-22 02:32:21 UTC
kernel-3.17.7-300.fc21 has been pushed to the Fedora 21 stable repository.  If problems still persist, please make note of it in this bug report.

