Bug 1207874 - Recursive error in radeon device driver module after resume from hibernation
Summary: Recursive error in radeon device driver module after resume from hibernation
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: xorg-x11-drv-ati
Version: 21
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: X/OpenGL Maintenance List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-03-31 23:51 UTC by gitne
Modified: 2016-02-24 17:33 UTC
CC List: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-12-02 10:42:57 UTC
Type: Bug
Embargoed:


Attachments
Kernel log of recursive error in radeon device driver module (1.61 MB, text/plain)
2015-03-31 23:51 UTC, gitne
lspci (22.45 KB, text/plain)
2015-03-31 23:56 UTC, gitne
cpuinfo (4.64 KB, text/plain)
2015-03-31 23:58 UTC, gitne
meminfo (1.20 KB, text/plain)
2015-04-01 00:00 UTC, gitne
iomem (2.61 KB, text/plain)
2015-04-01 00:55 UTC, gitne
ioports (1.48 KB, text/plain)
2015-04-01 00:56 UTC, gitne


Links
System        ID     Private  Priority  Status  Summary  Last Updated
Linux Kernel  77181  0        None      None    None     Never
Linux Kernel  95911  0        None      None    None     Never

Internal Links: 1207887

Description gitne 2015-03-31 23:51:31 UTC
Created attachment 1009330
Kernel log of recursive error in radeon device driver module

Description of problem:
Recursive error in radeon device driver module after resume from hibernation. See attached kernel log for details.

Version-Release number of selected component (if applicable):
Linux 3.18.9-200.fc21.x86_64 #1 SMP Mon Mar 9 15:10:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux (and later)

Steps to reproduce:
1. Run X.Org X Server 1.16.3 with the radeon X.Org driver 7.5.0 (plus GNOME 3.14 or whatever else you like) on an AMD A10-7800 Radeon R7.
2. Suspend to disk with, e.g., "systemctl hibernate".
3. Resume from hibernation.

Actual results:
The current login session is killed and the X.Org server falls back to AIGLX software rendering.

Expected results:
No error in the radeon device driver module.

Additional info:
See the attached cpuinfo and meminfo. The system boots with the default radeon kernel boot parameters.

Comment 1 gitne 2015-03-31 23:56:54 UTC
Created attachment 1009331
lspci

Comment 2 gitne 2015-03-31 23:58:47 UTC
Created attachment 1009332
cpuinfo

Comment 3 gitne 2015-04-01 00:00:12 UTC
Created attachment 1009333
meminfo

Comment 4 gitne 2015-04-01 00:55:36 UTC
Created attachment 1009352
iomem

Comment 5 gitne 2015-04-01 00:56:11 UTC
Created attachment 1009353
ioports

Comment 6 gitne 2015-05-03 07:29:37 UTC
Why is this bug report linked to FreeDesktop.org bug 89829? As far as I can tell, these bugs have nothing in common.

Comment 7 Felix Schwarz 2015-05-03 09:58:09 UTC
Not sure why I linked this specific fdo bug report as it is clearly not the same bug (as you noticed as well). I think I saw a very similar bug report upstream but can't find it now.

Anyhow, in my experience it's best to file your bug report upstream if you are willing to bisect or test new patches from the AMD developers. This is very likely a genuine upstream bug, and Fedora developers are usually busy fixing all the integration stuff.

Comment 8 gitne 2015-05-06 16:13:27 UTC
Linked to bug report upstream.

Comment 9 gitne 2015-05-06 16:33:15 UTC
This bug is obviously also linked to Linux kernel bugs 77181 and 60827. Mantas Mikulėnas has determined that git commit 4474f3a91f95 was the last known good commit. archiesix has determined that this bug has been present since at least kernel version 3.9.11.

It's a pity that actually *users* have to do the digging for this kind of information. It's all there, but kernel developers are obviously too tired or too lazy to do actual work after they have spent countless hours bragging about how genius they are in delivering fucked up work. If you can't do it, don't touch it.
Oh, and another "secret" has been revealed: The bug is caused by ring test failures. Wow! Who could have thought of that!?
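Since the comment above attributes the failure to ring test failures, a minimal Python sketch for pulling those lines out of an attached kernel log is shown below. It assumes the radeon messages contain text of the form "ring N test failed" (for example "radeon: ring 0 test failed (...)"); that wording is an assumption and may differ between kernel versions. The function name ring_test_failures and the script usage are hypothetical.

```python
import re
import sys
from typing import Iterable, List

# Assumed message shape, e.g. "radeon: ring 0 test failed (...)".
# The exact wording is an assumption and may vary between kernel versions.
RING_TEST_FAILED = re.compile(r"ring\s+(\d+)\s+test\s+failed", re.IGNORECASE)


def ring_test_failures(lines: Iterable[str]) -> List[int]:
    """Return the ring index from every line reporting a failed ring test, in log order."""
    failures: List[int] = []
    for line in lines:
        match = RING_TEST_FAILED.search(line)
        if match:
            failures.append(int(match.group(1)))
    return failures


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 ring_failures.py kernel.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for ring in ring_test_failures(log):
            print(f"ring {ring} test failed")
```

Returning ring indices in log order makes it easy to see whether a single ring fails on every resume or whether several rings fail together.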

Comment 10 Felix Schwarz 2015-05-06 20:42:27 UTC
Jakob: I think it's best not to assign this to Bugzilla's "kernel" component. At least the Intel bugs get assigned to the xorg-x11-... components because there are just too many "kernel" bugs in Fedora. I assume the AMD guys employ the same strategy.

> It's a pity that actually *users* have to do the digging for this kind of
> information. It's all there but kernel developers are obviously too tired or
> too lazy to do actual work after they have spent countless hours bragging
> about how genius they are in delivering fucked up work. If you can't do it,
> don't touch it.

I can understand your frustration but I think you should take some extra time before writing such statements. https://getfedora.org/code-of-conduct

Comment 11 Jérôme Glisse 2015-05-06 21:52:49 UTC
(In reply to Jacob Wisor from comment #9)
> This bug is obviously also linked to Linux kernel bugs 77181 and 60827.
> Mantas Mikulėnas has determined that git commit 4474f3a91f95 was the last
> known good to work. archiesix has determined that this bug persists even
> since kernel version 3.9.11.
> 
> It's a pity that actually *users* have to do the digging for this kind of
> information. It's all there but kernel developers are obviously too tired or
> too lazy to do actual work after they have spent countless hours bragging
> about how genius they are in delivering fucked up work. If you can't do it,
> don't touch it.
> Oh and another "secret" has been revealed: The bug is caused by ring test
> failures. Wow! Who could have thought of that!?

You obviously assume that if you are hitting this bug, so must all other people with the same hardware. Well, that's a wrong assumption: even within the same GPU family, each OEM (Asus, Sapphire, ...) customizes the video BIOS and selects different components for its boards, notably memory chips. Add different motherboards, system memory, system BIOSes, system PCIe chipsets, ... to the mix and you end up with vastly different configurations, in which each element might trigger a bug that only happens with that specific configuration.

So if you are hitting a bug like this, it is likely because none of the devs are hitting it on their hardware. You might be unlucky or the devs might be lucky. But you should not assume that, when it comes to hardware, a bug affecting a GPU family affects every GPU of that family. It is a lot more complex.

Comment 12 gitne 2015-05-29 08:20:03 UTC
(In reply to Felix Schwarz from comment #10)
> Jakob: I think it's best not to assign this to bugzilla's "kernel"
> component. At least the intel bugs get assigned to the xorg-x11-... because
> there are just too many "kernel" bugs in Fedora. I assume the AMD guys
> employ the same strategy.

I see, thank you for the info. I felt compelled to move this bug back to the kernel component because it is a kernel-space bug, not a user-space bug. The xorg-x11-drv-ati component seems to be for the X11 server's AMD driver module only, which of course runs in user space. But if, in this context, xorg-x11-drv-ati covers both the radeon kernel device driver module and the X11 server's AMD device driver module, then the bug should probably have stayed in xorg-x11-drv-ati. Do you want me to change it back?
 
> > It's a pity that actually *users* have to do the digging for this kind of
> > information. It's all there but kernel developers are obviously too tired or
> > too lazy to do actual work after they have spent countless hours bragging
> > about how genius they are in delivering fucked up work. If you can't do it,
> > don't touch it.
> 
> I can understand your frustration but I think you should take some extra
> time before writing such statements. https://getfedora.org/code-of-conduct

Okay, fair enough. However, I have no means to verify or to know that my problem is being taken seriously. There are plenty of bug reports in both Bugzilla systems that linger for a long time. And to be honest, I do not believe they are all being taken care of.
I have worked with bug trackers myself and fixed a lot of bugs in the course of my work, even the hard interdependent ones. Many people said it was impossible or too much work, but it can be done. You just need to be persistent and work with the reporters; otherwise you have little chance of actually fixing anything. So please work with me. Give me a modified kernel package, whatever, but work with me. If you do not have the exact hardware to replicate the error, then this situation calls all the more for working with the reporter by testing code. For now, I have the feeling that we have not been doing much more than juggling a bug report around in Bugzilla.

Comment 13 gitne 2015-05-29 08:57:03 UTC
(In reply to Jerome Glisse from comment #11)
> (In reply to Jacob Wisor from comment #9)
> > This bug is obviously also linked to Linux kernel bugs 77181 and 60827.
> > Mantas Mikulėnas has determined that git commit 4474f3a91f95 was the last
> > known good to work. archiesix has determined that this bug persists even
> > since kernel version 3.9.11.
> > 
> > It's a pity that actually *users* have to do the digging for this kind of
> > information. Its all there but kernel developers are obviously too tired or
> > too lazy to do actual work after they have spent countless hours bragging
> > about how genius they are in delivering fucked up work. If you can't do it,
> > don't touch it.
> > Oh and another "secret" has been revealed: The bug is caused by ring test
> > failures. Wow! Who could have thought of that!?
> 
> You obviously assume that if you are hitting this bug so must all other
> people with same hardware. Well that's a wrong assumption, even for a same
> family of GPU each of the OEM (Asus, Sapphire, ...) customize the video bios
> and select different components for their board notably memory chip. Add
> different motherboard, system memory, system bios, system PCIE chipset, ...
> to the mix and you end up with vastly different configurations in which each
> elements might trigger a bug that only happen with this specific
> configurations.

I am not running on any OEM hardware, no dedicated GPU, just the A10's integrated GPU. The only component I can think of that could add any "magic sauce" here is the system BIOS. And since the system BIOS is usually involved in transitioning to the hibernation or suspend states, it may indeed be a problem. However, I doubt that very much, because hibernation and suspend work just fine on Windows (running the AMD graphics device driver) *and* when running in VESA mode on Linux. So it is pretty easy to rule out a system BIOS bug, since all of the aforementioned software would have to work around such a bug in the same way. Hence, this bug can clearly only be attributed to the radeon kernel device driver module or the dynamically loaded Kaveri GPU firmware, which comes with the kernel.

> So if you are hitting a bug such like this, it is likely because none of the
> dev are hitting it on their hardware. You might be unlucky or the dev might
> be lucky. But you should not assume that when it comes to the hardware, a
> bug affecting a GPU family affects all the GPU of that family. It is a lot
> more complex.

First of all, I neither assumed nor stated that this bug affects the entire family of GPUs. All I have said is that the A10 integrated GPU should have quite a notable market penetration by now, so it should be relatively easy to find hardware for testing.
Secondly, if this is a bug specific to a certain configuration only (which, again, I highly doubt), then why not work together with the person who is experiencing the bug instead of just shrugging one's shoulders and dispensing platonic pity about the fact that the person is affected? I am sorry, but so far I have seen no serious attempt to work with me.

Comment 14 Fedora End Of Life 2015-11-04 11:45:15 UTC
This message is a reminder that Fedora 21 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 21. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '21'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 21 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 15 Fedora End Of Life 2015-12-02 10:43:03 UTC
Fedora 21 changed to end-of-life (EOL) status on 2015-12-01. Fedora 21 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

