Bug 855275 - Kernel Error during Bootup: [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: xorg-x11-drv-ati
Version: 17
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assigned To: X/OpenGL Maintenance List
QA Contact: Fedora Extras Quality Assurance
Depends On:
Blocks:
Reported: 2012-09-07 03:42 EDT by Ali Akcaagac
Modified: 2013-01-12 05:25 EST
CC List: 10 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-12-06 02:04:23 EST
Type: Bug


Attachments
messages.txt from /var/log (108.93 KB, text/plain)
2012-09-07 03:42 EDT, Ali Akcaagac
Greetings from kernel-3.6.5-1.fc17.i686 (761.68 KB, image/jpeg)
2012-11-02 08:02 EDT, Ali Akcaagac
Archive with *Screenshots* (3.74 MB, application/x-gzip)
2012-11-02 14:27 EDT, Egor Zaharov
Full of oops'es (19.44 KB, application/x-bzip)
2012-11-03 09:23 EDT, Ali Akcaagac
Logs from failed 3.6.10 session (32.24 KB, application/x-gzip)
2012-12-31 15:45 EST, Matt Slingsby

Description Ali Akcaagac 2012-09-07 03:42:30 EDT
Created attachment 610646 [details]
messages.txt from /var/log

I own a Lenovo E325 notebook (type NWX2UGE 12972UG) running F17 with the following kernel:

Linux localhost.localdomain 3.5.3-1.fc17.i686 #1 SMP Wed Aug 29 19:25:38 UTC 2012 i686 i686 i386 GNU/Linux

The built-in APU combines the CPU and a Radeon graphics core.

Whenever I boot Fedora I end up playing bingo. Quite often the system hangs during boot and never shows the blue Fedora logo.

I have to reboot several times (sometimes 5 times or more) before the little logo pops up and loading continues. So this is basically ALWAYS reproducible.

I already flashed a new BIOS on the notebook to see whether that solves the problem, but sadly it doesn't.

Attached is the message log, which provides more information about the architecture and system components. The log clearly shows a kernel error while scheduling work for the radeon driver, which as far as I understand may cause this problem.
Comment 1 Egor Zaharov 2012-09-28 16:18:51 EDT
Reproduced on Sony VAIO model VPCYB1S1R, with AMD E-350+Radeon HD 6310 APU.
Comment 2 Egor Zaharov 2012-09-28 16:27:04 EDT
(In reply to comment #1)
> Reproduced on Sony VAIO model VPCYB1S1R, with AMD E-350+Radeon HD 6310 APU.
(Forgot to mention:) on the 3.5.3-1.fc17.i686.PAE kernel.
Comment 3 Daniele Viganò 2012-11-01 13:35:22 EDT
Also, randomly, on Samsung NP305U1A (AMD E-450).

uname -a
Linux sam.daniele.vigano.me 3.6.2-4.fc17.x86_64 #1 SMP Wed Oct 17 02:43:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

dmesg log:
[    3.205027] [drm] Initialized radeon 2.24.0 20080528 for 0000:00:01.0 on minor 0
[   13.364085] radeon 0000:00:01.0: GPU lockup CP stall for more than 10000msec
[   13.364104] radeon 0000:00:01.0: GPU lockup (waiting for 0x0000000000000002 last fence id 0x0000000000000001)
[   37.260730] radeon 0000:00:01.0: couldn't schedule ib
[   37.260742] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
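To check whether a given boot hit this failure, one can scan the kernel log for the messages quoted above. The following is a minimal Python sketch, assuming that these three message fragments are a good enough signature for this bug (they are copied from the log above, not from any official diagnostic) and that each relevant line carries the usual `[seconds.microseconds]` kernel timestamp. The function name `find_radeon_lockups` is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Message fragments copied from the dmesg excerpt above. Treating them as the
# signature of this bug is an assumption, not an official diagnostic.
SIGNATURES = (
    "GPU lockup CP stall",
    "couldn't schedule ib",
    "Failed to schedule IB",
)

# Kernel timestamp such as "[   13.364085]", followed by the message text.
# Using re.search lets it match raw dmesg lines and syslog-prefixed lines alike.
TIMESTAMPED = re.compile(r"\[\s*(\d+\.\d+)\]\s*(.*)$")


def find_radeon_lockups(lines: Iterable[str]) -> List[Tuple[float, str]]:
    """Return (seconds since boot, message) for each line matching a signature."""
    hits: List[Tuple[float, str]] = []
    for line in lines:
        match = TIMESTAMPED.search(line.rstrip("\n"))
        if match is None:
            continue  # no kernel timestamp on this line: skip it
        seconds, message = float(match.group(1)), match.group(2)
        if any(sig in message for sig in SIGNATURES):
            hits.append((seconds, message))
    return hits
```

For example, `find_radeon_lockups(open("/var/log/messages", errors="replace"))` lists the matching lines with their boot-relative timestamps; an empty list means none of the signatures were found.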

Workaround: put the notebook in standby (in my case by pressing the hardware power button), then resume from standby.
Comment 4 Daniele Viganò 2012-11-01 13:39:29 EDT
Upstream bug report: https://bugzilla.kernel.org/show_bug.cgi?id=47481
Comment 5 Ali Akcaagac 2012-11-02 07:55:56 EDT
> workaround: put notebook in standby (in my case pressing the hw
> power button) and then resume from standby

Doesn't work for me.
Comment 6 Ali Akcaagac 2012-11-02 08:02:11 EDT
Created attachment 637051 [details]
Greetings from kernel-3.6.5-1.fc17.i686

This is what happened with today's update. Someone must have broken the graphics subsystem for this APU, which causes exactly this.

Normally it takes me 10 or so reboots to get Fedora 17 to load once because of the error. Now it takes about 10 reboots just to reach this screen, which appears at roughly the point where the blue Fedora spinner/logo usually pops up.

The square thing in this picture is the mouse pointer. Switching to a console works but shows a similarly broken picture.

So basically kernel-3.6.5-1.fc17.i686 is a no-go.
Comment 7 Ali Akcaagac 2012-11-02 08:03:39 EDT
Downgrading to kernel-3.6.3-1.fc17.i686 works as before, including the 10 or so reboots needed to get the system to boot once.

And this on a notebook that Lenovo sells as fully Linux compatible.
Comment 8 Egor Zaharov 2012-11-02 14:27:09 EDT
Created attachment 637198 [details]
Archive with *Screenshots*

Reproduced on kernels 3.5.4 through 3.6.3. But now it is something different...
Booting my Fedora 17 is still the same "playing bingo" game, but:
* Turning off rhgb slightly increases the chances of booting up.
* Sometimes it gets stuck on this message:
[    5.948975] fb: conflicting fb hw usage radeondrmfb vs VESA VGA - removing generic driver
Blacklisting vesafb slightly reduces the chances of this appearing, but does not eliminate it.
* Sometimes (more often when waking the notebook from hibernation) I see this, I forget the name, "artifact". It can be seen in the photos in the archive I attached.

That's all I can say for now. Booting the system is still a game of "bingo". Can't wait for a fix...
Comment 9 Ali Akcaagac 2012-11-03 09:23:41 EDT
Created attachment 637484 [details]
Full of oops'es

I have come to the point where I regret having switched from stable Ubuntu to Fedora. Within the past 11 months (Fedora 16 to 17) I have run into so many problems caused by untested software being pushed to the masses by Fedora that I ask myself where the quality people are.

I spent the entire last evening and the whole of today trying to sort this out and get a "halfway working" system again. No luck!

* Kernel downgraded. No luck!
* xorg-ati downgraded. No luck!
* Recreated the initramfs. No luck!
* abrt downgraded. No luck!
* Only "nomodeset" works, but that is worse than the previous situation with its 10 reboots.

Read the upstream bug report. Thanks for nothing, dudes!
Comment 10 Ali Akcaagac 2012-11-03 09:31:16 EDT
Does anyone know a quick and dirty workaround to make the system usable again? Something specific to downgrade? Not "nomodeset" or things like that; I am using that right now. I hope the solution is not called Windows.
Comment 11 Ali Akcaagac 2012-11-03 11:41:08 EDT
After some really frustrating hours I found something:

I stripped the system down to a minimum and tried to boot the kernel without any background services.

While recreating the initramfs with dracut, I noticed that dracut was searching for radeon microcode (ucode) files. I then read a few lines about the open source radeon driver on the freedesktop.org site and learned that certain Radeon cards need these microcode files.

I realized that, under some unknown circumstances, the "linux-firmware" package was missing or had been removed from this installation. I will check my two-day-old backup to see whether the package is present in that untouched, clean installation.

Try this. Maybe it helps:

sudo yum update
sudo yum install linux-firmware              <-- could be missing
sudo dracut --force --xz -v
sudo grub2-mkconfig -o /boot/grub2/grub.cfg

Somehow this got the system running again without the white stripes. I will try a couple of reboots from now on and see whether things keep working. Nonetheless, the "play bingo with 10 reboots" problem still seems to exist. Downgrading the ATI driver may help too:

sudo yum downgrade xorg-x11-drv-ati
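Since the fix found in this comment was a missing microcode package, a quick way to rule that out on another machine is to check whether the expected files exist. A minimal Python sketch follows. It assumes radeon microcode is installed under `/lib/firmware/radeon` (where Fedora's linux-firmware package puts it). The file names in `PALM_FIRMWARE` (`PALM_pfp.bin`, `PALM_me.bin`, `SUMO_rlc.bin`, which I believe the radeon driver loads for Palm-family APUs such as the E-350/E-450) are an assumption to verify against whatever dracut or dmesg reports for your chip; `missing_firmware` is a hypothetical helper.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed location of the radeon microcode installed by linux-firmware.
DEFAULT_FIRMWARE_DIR = Path("/lib/firmware/radeon")

# Example file names for Palm-family APUs (E-350/E-450). This list is an
# assumption; verify it against the names dracut or dmesg mention on your system.
PALM_FIRMWARE = ("PALM_pfp.bin", "PALM_me.bin", "SUMO_rlc.bin")


def missing_firmware(required: Iterable[str],
                     firmware_dir: Path = DEFAULT_FIRMWARE_DIR) -> List[str]:
    """Return the names from `required` that are not regular files in firmware_dir."""
    return [name for name in required if not (firmware_dir / name).is_file()]
```

If `missing_firmware(PALM_FIRMWARE)` returns a non-empty list, that points to the same situation described above: linux-firmware missing or incomplete, and the initramfs needing a rebuild after reinstalling it.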
Comment 12 Oliver Henshaw 2012-11-29 07:04:14 EST
The latest from the upstream bug seems to be that the original issue is fixed in 3.7 but still needs backporting to 3.6.x.
Comment 13 Ali Akcaagac 2012-11-29 07:29:30 EST
(In reply to comment #12)
> The latest from the upstream bug seems to be that the original issue is
> fixed in 3.7 but still needs backporting to 3.6.x.

I agree! I keep getting kernel updates for F17, like the 3.6.8 I am downloading right now, in the hope that this bug will disappear with newer versions.

Solving this issue should be treated as urgent, since being forced to power the system off and on can corrupt the file system and individual files. This has already happened to me once; luckily I had a backup of my F17 installation, so I was able to recover.

Either way, could someone from the Fedora kernel maintainers please backport this fix if it has not been done in the mainline stable series, or simply provide 3.7.x kernels?

This issue is more than frustrating and could easily be solved by shipping the fix. It also affects a lot of users with similar hardware!
Comment 14 Josh Boyer 2012-11-29 08:21:51 EST
3.7 kernels are in rawhide.  You can use those on f17 and f18.  Both f17 and f18 will be rebased to 3.7 in the not too distant future.  If you would like to test those 3.7 kernels and let us know if it actually resolves your issue, that would be very helpful.

The commit was CC'd to stable and should be in the next 3.6.x release.
Comment 15 Ali Akcaagac 2012-11-29 15:05:40 EST
(In reply to comment #14)
> 3.7 kernels are in rawhide.  You can use those on f17 and f18.  Both f17 and
> f18 will be rebased to 3.7 in the not too distant future.  If you would like
> to test those 3.7 kernels and let us know if it actually resolves your
> issue, that would be very helpful.

Yes, I tested "kernel-3.7.0-0.rc7.git1.1.fc19.i686" today on my F17 box. I did a dozen reboots (warm and cold) and the system came up perfectly every time. This was the first time that turning the machine on (or rebooting it) didn't end in frustration. Sadly the kernel is a debug build (and therefore slow).

> The commit was CC'd to stable and should be in the next 3.6.x release.

Oh yes, I can't wait until the next 3.6.x shows up in updates-testing. That will finally mean a lot less pain here :)

Thanks. I will report back once the new kernel shows up.
Comment 16 Josh Boyer 2012-12-03 18:54:36 EST
(In reply to comment #15)
> > The commit was CC'd to stable and should be in the next 3.6.x release.
> 
> Ohhh yes I can't wait until the next 3.6.x is showing up in updates-testing.
> This will finally gives a lot less pain here :)

The patch missed the official 3.6.9 stable release.  I've added it into the Fedora kernel git and started an F17 scratch build.  If those impacted by this bug could test the build below once it completes, that would be very appreciated.

http://koji.fedoraproject.org/koji/taskinfo?taskID=4753166
Comment 17 Ali Akcaagac 2012-12-04 04:42:57 EST
(In reply to comment #16)
> If those impacted by this bug could test the build below once it completes,
> that would be very appreciated.
> 
> http://koji.fedoraproject.org/koji/taskinfo?taskID=4753166

Thanks for the time and the work you've spent on this build. I downloaded it this morning (kernel bins) and installed it on my system.

I notice the same "cleaner" switching of the LCD off and on (simply put) that the 3.7.x kernel I tested showed.

However, in 10 reboots I was still caught by the black screen issue twice. I realized that I might have forgotten to re-run dracut to re-create the initramfs. I did that, rebooted a couple more times, and everything seems to work fine. Still, I am a bit worried about the two failures I ran into.

All in all this is a big improvement: from a 20% / 80% success/failure rate before (2 successful and 8 failed boots out of 10) to 90% / 10% (9 successful boots and 1 failed boot out of 10, because of the missing dracut rebuild). This seems to be much better.
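The before/after figures are simple ratios over ten boots each. As a rough sanity check, the Python sketch below computes the success rate and an approximate 95% Wilson score interval, one standard way to put error bars on a proportion estimated from a small number of trials. The function names are hypothetical, and treating each boot as an independent trial is an assumption.

```python
import math
from typing import Tuple


def success_rate(successes: int, attempts: int) -> float:
    """Share of boots that succeeded, as a percentage."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return 100.0 * successes / attempts


def wilson_interval(successes: int, attempts: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for the true success proportion."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    p = successes / attempts
    denom = 1 + z * z / attempts
    centre = (p + z * z / (2 * attempts)) / denom
    half = z * math.sqrt(p * (1 - p) / attempts + z * z / (4 * attempts * attempts)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)
```

With these tallies, the interval for 2 of 10 tops out at about 0.51, while the interval for 9 of 10 starts at about 0.60. The two ranges do not overlap, so the improvement is unlikely to be a fluke even with so few boots.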

I will now use this kernel for the next few days and report back. Maybe someone else could run some tests as well and provide feedback.
Comment 18 Egor Zaharov 2012-12-04 07:39:20 EST
(In reply to comment #16)
> (In reply to comment #15)
> > > The commit was CC'd to stable and should be in the next 3.6.x release.
> > 
> > Ohhh yes I can't wait until the next 3.6.x is showing up in updates-testing.
> > This will finally gives a lot less pain here :)
> 
> The patch missed the official 3.6.9 stable release.  I've added it into the
> Fedora kernel git and started an F17 scratch build.  If those impacted by
> this bug could test the build below once it completes, that would be very
> appreciated.
> 
> http://koji.fedoraproject.org/koji/taskinfo?taskID=4753166

I downloaded this kernel, installed it, and ran /usr/libexec/plymouth/plymouth-update-initrd before the tests (to make sure the test was clean).

10 reboots, no failures. I think the problem is fixed now.
Comment 19 Josh Boyer 2012-12-04 08:15:18 EST
Thanks to both of you.

I've submitted official builds for this to koji (the F17 build will be identical to the build you've already tested). Bodhi will leave the usual comments as things go into the repos, etc.
Comment 20 Ali Akcaagac 2012-12-04 08:19:40 EST
Well, we have to thank you for your time.

Finally a working AND painless boot process.

Could someone please tell me the difference between running plymouth-update-initrd and dracut? Which one should I prefer?
Comment 21 Egor Zaharov 2012-12-04 08:24:43 EST
(In reply to comment #20)
> Well we have to thank you for your time.
> 
> Finally a working AND painless boot process.
> 
> Could someone please tell me the difference between issuing
> plymouth-update-initrd and dracut ? Which one to prefer ?

There is no real difference:
[nexfwall@Sony-PCG31311V ~]$ cat /usr/libexec/plymouth/plymouth-update-initrd
#!/bin/bash
/sbin/new-kernel-pkg --package kernel --mkinitrd --dracut --depmod --install $(uname -r)

And /sbin/new-kernel-pkg is just a bash script too.
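In other words, the wrapper only calls `new-kernel-pkg` for the running kernel, and judging by its flags (`--mkinitrd --dracut --depmod`), that means rebuilding the initramfs with dracut and running depmod. The Python sketch below mirrors the exact command line from the script above; the helper names are hypothetical, and `platform.release()` stands in for `$(uname -r)`.

```python
import platform
import subprocess
from typing import List, Optional


def plymouth_update_initrd_argv(kernel_release: Optional[str] = None) -> List[str]:
    """Build the same new-kernel-pkg command line that the wrapper script runs."""
    release = kernel_release or platform.release()  # same value as `uname -r`
    return [
        "/sbin/new-kernel-pkg",
        "--package", "kernel",
        "--mkinitrd", "--dracut", "--depmod",
        "--install", release,
    ]


def run_plymouth_update_initrd(kernel_release: Optional[str] = None) -> None:
    # Needs root, and only works on a system that ships /sbin/new-kernel-pkg.
    subprocess.run(plymouth_update_initrd_argv(kernel_release), check=True)
```

Passing an explicit release string, for example `plymouth_update_initrd_argv("3.6.9-2.fc17.i686")`, targets a freshly installed kernel instead of the one currently running.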


Thank you, Josh, for your time.
Comment 22 Fedora Update System 2012-12-04 16:13:20 EST
kernel-3.6.9-4.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/kernel-3.6.9-4.fc18
Comment 23 Fedora Update System 2012-12-04 16:16:33 EST
kernel-3.6.9-2.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/kernel-3.6.9-2.fc17
Comment 24 Fedora Update System 2012-12-04 16:17:52 EST
kernel-3.6.9-2.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/kernel-3.6.9-2.fc16
Comment 25 Fedora Update System 2012-12-05 01:51:35 EST
Package kernel-3.6.9-2.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.6.9-2.fc17'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-19711/kernel-3.6.9-2.fc17
then log in and leave karma (feedback).
Comment 26 Fedora Update System 2012-12-06 02:04:26 EST
kernel-3.6.9-2.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 27 Fedora Update System 2012-12-06 23:25:18 EST
kernel-3.6.9-4.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 28 Fedora Update System 2012-12-11 15:17:50 EST
kernel-3.6.10-2.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/kernel-3.6.10-2.fc16
Comment 29 Fedora Update System 2012-12-17 21:33:50 EST
kernel-3.6.10-2.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 30 Matt Slingsby 2012-12-31 14:06:24 EST
I have an ATI 4850 and had not seen the "failed to schedule IB" problem until I upgraded to kernel-3.6.10-2.fc17.x86_64; now I see frequent lock-ups, screen corruption or failures to initialise Xorg. Booting with kernel-3.6.9-2.fc17.x86_64 works perfectly.
Comment 31 Ali Akcaagac 2012-12-31 15:05:54 EST
Well, for *us* the upstream regression fix at least solved the problem of cleanly booting Linux on F17. But, and this is something new, I am now running 3.6.11-3.fc18.i686 on F18. Whenever I switch to a console and then back to Xorg, I get the same issues Matt describes. This only started recently.

Here the screen goes "STROBO": it flashes black and white. CTRL+ALT+DEL cannot be triggered. The only way out is a hard shutdown with the power button.

Booting into GDM works perfectly. This must have been introduced recently; on F18 I also noticed that Xorg and some other components were updated.

This is a really sad situation for people with APUs as well as for people with discrete GPUs.

Matt, could you provide some logs, such as /var/log/messages and the Xorg log? I see some new radeon messages being printed.

Maybe the regressions are only partially fixed upstream.

I will provide some logs NEXT YEAR :)
Comment 32 Matt Slingsby 2012-12-31 15:45:38 EST
Created attachment 670869 [details]
Logs from failed 3.6.10 session

Attaching /var/log/messages and Xorg.log from a failed attempt to boot my machine with the 3.6.10 kernel. I was able to get to a console login with ctrl-alt-3 and shut down the system. I think this boot attempt just presented a snowy grey screen instead of GDM or anything helpful.
Comment 33 Ali Akcaagac 2012-12-31 16:37:08 EST
I see! You are getting the same problems I initially had, which is exactly what this regression fix from the upstream kernel is supposed to fix. So I wonder why you still see this in your logs.

May I ask you to try this:

sudo yum install linux-firmware              <-- could be missing
sudo depmod -ae
sudo dracut --force --xz -v
sudo /usr/libexec/plymouth/plymouth-generate-initrd
sudo grub2-mkconfig -o /boot/grub2/grub.cfg

On the next boot, describe what happens. I would also like to know what happens when you switch from Xorg to a console and then back to Xorg. Do you get this "STROBO"-like effect too?

With the same 3.6.10 kernel, try another boot with "nomodeset" added in GRUB. I would like to know the results.
Comment 34 Matt Slingsby 2012-12-31 18:26:30 EST
linux-firmware was already installed and up to date.

The problem is rather intermittent, and apart from one slight glitch it refuses to happen at the moment. I've run the other commands you listed and will monitor this over the next few days to see whether the problem recurs.
Comment 35 Matt Slingsby 2013-01-12 05:25:38 EST
I continued to get problems after running the commands you requested, but never with consistent symptoms.  As the machine affected was my primary PC, I bought a cheap NVidia card last weekend and am using that now instead. Sorry I can't help further.
