kernel commit #2062afb4f804a was an overreaction to a gcc bug that's already fixed in f21/rawhide, and impedes functionality of perf/systemtap/crash. Please nuke it in Fedora/RHEL builds that might inherit 3.16.
Uuugghhh. So I don't see upstream making this a conditional at this point. Which means "nuke" requires us to carry a patch to disable it. If that's what is required then I guess we can do that, but it's another deviation from upstream (and a fairly large one) that makes it harder for us to interact with them. Can you summarize what the problem is for perf/systemtap/crash?
With the upstream hack, you can lose debug info coverage for anywhere from some to most of the variables, and the debug info becomes significantly less accurate. For tools like systemtap that is a show stopper. Just try building the kernel once with -fvar-tracking-assignments and once with -fno-var-tracking-assignments, and compare the dwlocstat output from those two builds. I don't understand Linus' decision; perhaps he just thinks debug info is useless. If there were some wrong-code at -O2, he would not build the kernel with -O0 just because of that.
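The comparison suggested above could be scripted roughly as follows. This is only a sketch: it assumes `dwlocstat` is on the PATH and accepts a vmlinux path as its only argument, and the helper names `run_dwlocstat`, `diff_reports` and `compare_builds` are made up for illustration. It does not parse the report; it only shows a textual diff of the two outputs.

```python
import difflib
import subprocess


def run_dwlocstat(vmlinux_path):
    """Run dwlocstat on one vmlinux image and return its text report.

    Assumption: dwlocstat is installed and takes the file as a plain argument.
    """
    result = subprocess.run(
        ["dwlocstat", vmlinux_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def diff_reports(with_vta, without_vta):
    """Return a unified diff of two report texts (empty string if identical)."""
    lines = difflib.unified_diff(
        with_vta.splitlines(),
        without_vta.splitlines(),
        fromfile="with-vta",
        tofile="without-vta",
        lineterm="",
    )
    return "\n".join(lines)


def compare_builds(vta_vmlinux, no_vta_vmlinux):
    """Diff the dwlocstat reports of two kernel builds (hypothetical paths)."""
    return diff_reports(run_dwlocstat(vta_vmlinux), run_dwlocstat(no_vta_vmlinux))
```

Something like `print(compare_builds("build-vta/vmlinux", "build-novta/vmlinux"))` would then show where the two coverage reports differ; both paths are placeholders for wherever the two builds ended up.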
Josh, reversion can consist of a one character "#" addition to the top-level Makefile. The git comment was dozens of times larger than the patch - it was not large.
(In reply to Frank Ch. Eigler from comment #3)
> Josh, reversion can consist of a one character "#" addition to the top-level
> Makefile. The git comment was dozens of times larger than the patch - it
> was not large.
I'm aware of that. I'm not stupid. The actual patch itself isn't really the problem. Come on Frank, surely you know that. The problem is that this is unconditional upstream. That means that upstream will never have kernel builds with this gcc option present, which means anyone who reverts this is deviating from upstream in a way that upstream can't (or more likely won't) recreate. This may be theoretical, but it's still a concern. A concern that seems warranted given the issue that prompted this in the first place. I'll try and argue again for it at least being conditional like DEBUG_INFO is. I'd rather at least have this configurable upstream. It should be fairly easy to make a case for this given that it breaks multiple things on multiple distros (if I understand the problem fully).
"I'll try and argue again for it at least being conditional" Sounds good.
BTW, this commit is already in the stable kernels as well, so it's not confined to 3.16.
The gcc bug is already fixed in Rawhide/F21, but has the fix been applied to F20 gcc?
Users are encountering this problem on F20 now. https://sourceware.org/bugzilla/show_bug.cgi?id=17362
Not in F20 yet, it is in F21/rawhide/RHEL 7.0.z/RHEL 7.1 tree. That said, I think the latent bug has only been triggered on the kernel with gcc 4.9.x and later, not with 4.8.x or earlier.
Also, that sourceware.org report seems to be speculation. It would be good to get confirmation. Thinking about this more, I think this is something gcc/systemtap is going to have to deal with in an upstream manner. It does systemtap very little good to only work on Fedora kernels.
(In reply to Josh Boyer from comment #10)
> Thinking about this more, I think this is something gcc/systemtap is going
> to have to deal with in an upstream manner. It does systemtap very little
> good to only work on Fedora kernels.
There's nothing systemtap can do except what you see in that sourceware bug - apologize that the parameters you wanted aren't available. If the compiler hides data somewhere without telling us about it in DWARF, we're stuck. We'll still operate as much as possible, just limited to what DWARF information is actually available.
(In reply to Josh Stone from comment #11)
> (In reply to Josh Boyer from comment #10)
> > Thinking about this more, I think this is something gcc/systemtap is going
> > to have to deal with in an upstream manner. It does systemtap very little
> > good to only work on Fedora kernels.
>
> There's nothing systemtap can do except what you see in that sourceware bug
> - apologize that the parameters you wanted aren't available. If the
> compiler hides data somewhere without telling us about it in DWARF, we're
> stuck. We'll still operate as much as possible, just limited to what DWARF
> information is actually available.
Yes, I understand that. Which is exactly what you're going to have to do with every other distribution or upstream kernel build. I tried getting it to be configurable, but I am certainly not the best person to argue on behalf of systemtap. I am nothing more than an ill-informed middleman in that case. I'd suggest restarting the conversation upstream and/or proposing alternative tests that could be run to verify gcc has the bug fixed at kernel build time.
> I tried getting it to be configurable, but I am certainly not the best > person to argue on behalf of systemtap. (It's not just systemtap that's affected, but crash, perf, etc.) I hope someone does eventually convince Linus of a more precise solution to the problem. In the interim, let's not punish those Fedora users whose compilers are fine (f21 + rawhide, f19/f20 coming soon after a gcc update).
I sent a kbuild patch to make VTA optional: https://lkml.org/lkml/2014/10/6/461
+1, this makes debugging hard. I wanted to debug an inotify issue and the systemtap output is useless because of this bug.
How would you debug the issue on an upstream kernel?
To clarify my question, you're an upstream glibc maintainer. If you faced this issue on another distribution using an upstream kernel or even their kernel, what would you do?
(In reply to Josh Boyer from comment #17)
> To clarify my question, you're an upstream glibc maintainer. If you faced
> this issue on another distribution using an upstream kernel or even their
> kernel, what would you do?
File a bug with my distribution and ask politely for the kernel maintainers to revert the commit :-) If I *had* to fix it, and I can, I'd rebuild a kernel myself with the patch reverted. However, every time you make me do that, it's less time I spend on glibc fixing issues for Fedora users (like making cloud images smaller for Fedora by splitting out localizations).
My point is, if this is such a benefit to users then why are you the first to ask about it? I'm not aware of any other distributions carrying a patch to enable it either, but I would love to be wrong there. At any rate, before I add Josh's patch (which btw thank you for sending) I need to know exactly which gcc versions for each release contain the bugfix making it safe to enable so I can adjust the BR on gcc for the kernel.
(In reply to Josh Boyer from comment #19)
> My point is, if this is such a benefit to users then why are you the first
> to ask about it?
BTW, I am simply using an earlier kernel until this is fixed. Sorry for not explicitly asking. I saw the bug was already filed and the patch posted, so I assumed it would be fixed soon without having to ask or add a +1.
I was thinking it's been long enough without a response that I should resend that patch upstream. Josh, any comment about it before I do?
(In reply to Josh Boyer from comment #19)
> At any rate, before I add Josh's patch (which btw thank you for sending) I
> need to know exactly which gcc versions for each release contain the bugfix
> making it safe to enable so I can adjust the BR on gcc for the kernel.
So, when talking about the only known wrong-code caused by VTA, PR61801:
RHEL 6.6               : gcc >= 4.4.7-10 (though, the bug is just theoretical in 4.4-RH)
RHEL 7.0.z             : gcc >= 4.8.2-16.2.el7
RHEL 7.1               : gcc >= 4.8.3-2.el7
Fedora 19              : gcc >= 4.8.3-2.fc19
Fedora 20              : gcc >= 4.8.3-2.fc20
Fedora 21              : gcc >= 4.9.1-2.fc21
Fedora 22              : gcc >= 4.9.1-2.fc22
Developer Toolset [12] : not fixed
Developer Toolset 3    : devtoolset-3-gcc >= 4.9.1-2.el{6,7}
(though, DTS isn't used for building kernel, so you only care about the gcc versions).
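For anyone wanting to check a build host against the minimums listed above, a rough sketch follows. The table is copied from the comment; the function names `parse_vr` and `has_pr61801_fix` are hypothetical, and the comparison is a deliberate simplification that only looks at the numeric parts of version and release. It is not rpm's real version-comparison algorithm, so treat the result as a hint rather than a guarantee.

```python
# Minimum gcc version-release carrying the PR61801 fix, per the list above.
MIN_GCC = {
    "RHEL 6.6": "4.4.7-10",
    "RHEL 7.0.z": "4.8.2-16.2.el7",
    "RHEL 7.1": "4.8.3-2.el7",
    "Fedora 19": "4.8.3-2.fc19",
    "Fedora 20": "4.8.3-2.fc20",
    "Fedora 21": "4.9.1-2.fc21",
    "Fedora 22": "4.9.1-2.fc22",
}


def parse_vr(version_release):
    """Split 'X.Y.Z-R[.R2].dist' into two tuples of ints.

    Simplification: release parsing stops at the first non-numeric
    component, so the dist tag ('fc20', 'el7') is ignored.
    """
    version, _, release = version_release.partition("-")
    ver = tuple(int(part) for part in version.split("."))
    rel = []
    for part in release.split("."):
        if not part.isdigit():
            break
        rel.append(int(part))
    return ver, tuple(rel)


def has_pr61801_fix(distro, gcc_version_release):
    """True if the given gcc is at or above the listed minimum for distro."""
    return parse_vr(gcc_version_release) >= parse_vr(MIN_GCC[distro])
```

For example, `has_pr61801_fix("Fedora 20", "4.8.2-7.fc20")` is false under this comparison because 4.8.2 sorts below the listed 4.8.3-2.fc20. The version-release string itself could come from something like `rpm -q --qf '%{VERSION}-%{RELEASE}' gcc`.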
When this scratch build completes (x86_64 only), could you guys try it out and see if the issue is fixed for you? http://koji.fedoraproject.org/koji/taskinfo?taskID=8143450
(In reply to Jakub Jelinek from comment #25)
> (In reply to Josh Boyer from comment #19)
> > At any rate, before I add Josh's patch (which btw thank you for sending) I
> > need to know exactly which gcc versions for each release contain the bugfix
> > making it safe to enable so I can adjust the BR on gcc for the kernel.
>
> So, when talking about the only known wrong-code caused by VTA, PR61801:
> RHEL 6.6 : gcc >= 4.4.7-10 (though, the bug is just theoretical in 4.4-RH)
> RHEL 7.0.z : gcc >= 4.8.2-16.2.el7
> RHEL 7.1 : gcc >= 4.8.3-2.el7
> Fedora 19 : gcc >= 4.8.3-2.fc19
> Fedora 20 : gcc >= 4.8.3-2.fc20
> Fedora 21 : gcc >= 4.9.1-2.fc21
> Fedora 22 : gcc >= 4.9.1-2.fc22
> Developer Toolset [12] : not fixed
> Developer Toolset 3 : devtoolset-3-gcc >= 4.9.1-2.el{6,7}
> (though, DTS isn't used for building kernel, so you only care about the gcc
> versions).
Thanks Jakub.
(In reply to Josh Stone from comment #24)
> I was thinking it's been long enough without a response that I should
> resend that patch upstream. Josh, any comment about it before I do?
Nothing specific, no :\. The only minor suggestion is to CC some of the other distro kernel maintainers and Andrew Morton, but I'm not sure that is going to make a difference.
(In reply to Josh Boyer from comment #26)
> When this scratch build completes (x86_64 only), could you guys try it out
> and see if the issue is fixed for you?
>
> http://koji.fedoraproject.org/koji/taskinfo?taskID=8143450
Sorry I missed this earlier. I confirmed that this scratch build fixes the "do_execve" inline parameters as noted in sourceware PR17362, and also corrects 10 debuginfo failures found in the full systemtap testsuite.
I resent my patch upstream: https://lkml.org/lkml/2014/11/21/505
(In reply to Josh Stone from comment #30)
> I resent my patch upstream: https://lkml.org/lkml/2014/11/21/505
Applied in rawhide and a build kicked off. I'll get it into F20 and F21 later today. Thanks for the persistence. One can hope eventually upstream will pick it up.
kernel-3.17.7-300.fc21 has been submitted as an update for Fedora 21. https://admin.fedoraproject.org/updates/kernel-3.17.7-300.fc21
kernel-3.17.7-200.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/kernel-3.17.7-200.fc20
Package kernel-3.17.7-200.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.17.7-200.fc20'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-17283/kernel-3.17.7-200.fc20
then log in and leave karma (feedback).
kernel-3.17.7-200.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.
kernel-3.17.7-300.fc21 has been pushed to the Fedora 21 stable repository. If problems still persist, please make note of it in this bug report.