Bug 1339855 - intel_pch_fifo_underrun_irq_handler when switching terminals
Summary: intel_pch_fifo_underrun_irq_handler when switching terminals
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Fedora
Classification: Fedora
Component: xorg-x11-drv-intel
Version: 23
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
Assignee: Adam Jackson
QA Contact: Fedora Extras Quality Assurance
URL: https://bollocks.fedorapeople.org/ker...
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-05-26 00:51 UTC by bollocks
Modified: 2017-10-22 23:22 UTC (History)
12 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-07-01 00:24:51 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)

Description bollocks 2016-05-26 00:51:48 UTC
User-Agent:       Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0
Build Identifier: 

In any of the 4.5.* kernels that I've tried in F23 and F4 I've encountered an error where the kernel doesn't seem to initialize the driver correctly when I switch terminals using alt+ctrl+f2.  Sometimes after showing graphical glitches the getty login prompt shows up but other times it freezes with graphical glitches.  

In the cases where it eventually did show the getty login screen I was able to run dmesg | grep -i -E 'intel|drm|i9|error'. The output can be found at 
https://bollocks.fedorapeople.org/kernel45probs/error.txt

I think the last couple lines are the important ones.



Reproducible: Always

Steps to Reproduce:
1.Run any kernel version 4.5.x on a machine using intel iron lake graphics.
2.press ctrl+alt+f<number> to change terminals
3.repeat until graphics freeze
Actual Results:  
Graphics freeze.  Sometimes it manages to initialize with this dmesg output:

https://bollocks.fedorapeople.org/kernel45probs/error.txt


Expected Results:  
I expect the getty prompt to show up without glitches.  This is the normal dmesg output:
https://bollocks.fedorapeople.org/kernel45probs/normal.txt

I've tried updating my graphics driver from the intel website but this doesn't change anything.

Comment 1 Tomasz Torcz 2016-05-26 14:08:31 UTC
I see those error message during graphical usage under Wayland. It causes intermittent screen corruption, lasting under a second.

Linux sandworm.pipebreaker.pl 4.5.3-300.fc24.x86_64 #1 SMP Thu May 5 01:56:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
[Sun May 22 19:41:03 2016] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[Sun May 22 21:08:33 2016] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[Sun May 22 22:10:46 2016] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[Sun May 22 23:14:07 2016] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[Mon May 23 00:18:51 2016] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[Mon May 23 01:20:42 2016] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[Mon May 23 01:45:09 2016] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[Mon May 23 13:06:35 2016] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[Mon May 23 15:39:54 2016] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[Mon May 23 19:42:07 2016] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[Mon May 23 21:25:15 2016] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[Tue May 24 00:49:47 2016] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[Tue May 24 11:18:58 2016] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[Tue May 24 13:03:09 2016] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[Wed May 25 04:40:39 2016] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[Wed May 25 09:35:36 2016] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun

Hardware is Lenovo Thinkpad T400, 00:02.0 VGA compatible controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07)

Comment 2 bollocks 2016-05-27 01:45:41 UTC
Someone on IRC suggested it might be dynamic CPU frequency changes causing this.  I gave it a shot and still run in to the same problem.  I got some more data by looking at journalctl:
https://bollocks.fedorapeople.org/kernel45probs/journalctlerror.txt

It looks like the OS is still working after the i915 error.  That seems to suggest that a quick fix might be possible if you could detect the failure and retry.

Comment 3 bollocks 2016-05-27 05:14:10 UTC
I tried a few things.  I installed xorg-x11-drv-intel from the rawhide repo and the bug still occured with kernel 4.5.5.  I installed kernel 4.7 from rawhide.  I was unable to reproduce the bug exactly.  When switching tty the glitches still appeared but i was unable to get the screen to hang as before.  The FIFO underrun messages in dmesg did show up.  When I exited i3 and ended my xsession(i had booted in to multi-user.target and run startx) the screen froze and I was unable to change terminals.  I believe the problem still exists in 4.7 but it isn't as bad.  I tried several times to reproduce this and was unable to.  

I also figured out how to set the debug mode for drm so I could get more data from dmesg:
https://bollocks.fedorapeople.org/kernel45probs/drmdebug.txt

I'm considering compiling from the drm-intel git repo next.  Anyone have any ideas where to go from here?

Comment 4 bollocks 2016-05-30 23:08:35 UTC
My bug's component was changed to xorg-x11-drv-intel so I decided to take a look there.  I rebuilt xorg-x11-drv-intel with code from the intel git repo for xf86-video-intel. I didn't update intel-gpu-tools. I don't think it matters but I'll probably end up rebuilding it with updated intel-gpu-tools code anyway.  

I booted kernel 4.5.5-201 with drm.debug=0x3f.  From what I can tell 3f is the most verbose mode. I produced the crash and on reboot saved the journalctl output for the last boot:

https://bollocks.fedorapeople.org/kernel45probs/gpucrash-drv-160530.tar.gz

The Intel Graphics site suggested I dump the gpu registers with intel_reg_dumper and get the error state from /sys/class/drm/card0/error .  So I set up systemd to run this on shutdown:

https://bollocks.fedorapeople.org/kernel45probs/gpuregdump

since I wouldn't be able to access the machine after the display stopped responding.  The output from intel_reg_dumper was:

https://bollocks.fedorapeople.org/kernel45probs/gpureg-drv-160530.txt

The output from the error state file reported no error:

https://bollocks.fedorapeople.org/kernel45probs/errorstate-drv-160530.txt

I noticed with these drivers it became harder to cause the crash the longer the system was allowed to run.  If I logged in through lxdm it became less likely I would be able to cause the crash.  I estimate a 10% chance as opposed to the normal 75%.  There are some graphical glitches when switching even if it doesn't hang.  They are however different.  Instead of lines and blocks of random colors on the left and upper 10 columns and rows I get dashes about 5 pixels high and 40 pixels wide going across the screen. These only last about .1 seconds.  

Before I tried tying that script to systemd i tried starting a script that dumped gpu registers and read the error state every .5 seconds.  Running this script made it impossible to recreate the crash.  However if I removed the part that dumped the registers I was able to recreate the crash.  So dumping the registers seems to prevent the crash for some reason.  I'm not sure if repeated register dumping is the cause or a single register dump does the trick.  I'm not sure if reading the registers is what does it or the extra work is what prevents the crash.  

As I said before the longer the system is logged in the less likely the crash is to occur.  So I believe either the registers being read while the system is logged in or the system initializing stuff over time is causing it.

I think my next step will be to compile the kernel from the intel-drm-nightly branch.

Comment 5 bollocks 2016-05-31 01:12:58 UTC
I reported this upstream: https://bugs.freedesktop.org/show_bug.cgi?id=96284

Comment 6 bollocks 2016-06-11 02:23:27 UTC
I upgraded to F24 and was still able to reproduce the problem in kernel 4.5.5 and 4.5.6.  After upgrading to 4.5.7 from updates-testing I no longer have an issue with graphical glitches when switching to a tty in text mode and no longer have screen freezing.  

However when switching from a text mode to a graphics mode tty the glitches are a little worse.  There is no screen freezing so this update is a great improvement for me.  I did have one issue when I logged in using the LXDM log in screen and while I was logging in I switched to a text mode tty using ctrl+alt+f2.  I had no issues switching but when I switched back there was a weird graphics glitch where the wallpaper was tearing combination of my normal wall paper and the LXDM logging screen.  I believe LXDM puts the wall paper as the x11 root window because if I run i3 without logging in through LXDM I get a black wall paper.

When I started programs the programs showed up fine and operated normally.  I was able to log out and after reinitializing the graphical mode everything was fine.  This occurs whenever you log out of your WM and LXDM takes over and shows the log in screen.  It kind of gives you a blank slate.  After logging in again everything was fine.  I haven't been able to reproduce this error.

I'm learning to use ffmpeg and I think it allows you to do screen capture videos.  I'll try to get some captures of these glitches one I figure it out.

Comment 7 bollocks 2016-06-11 02:29:35 UTC
Also I am not seeing any FIFO underrun errors in my dmesg after upgrading to 4.5.7.  Some selected dmesg output:

dmesg | grep -E 'i915|drm'
[    1.945472] [drm] Initialized drm 1.1.0 20060810
[    2.116967] [drm] Memory usable by graphics device = 2048M
[    2.116971] [drm] Replacing VGA console driver
[    2.130750] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    2.130753] [drm] Driver supports precise vblank timestamp query.
[    2.184435] [drm] GMBUS [i915 gmbus panel] timed out, falling back to bit banging on pin 3
[    2.207213] [drm] RC6 disabled, disabling runtime PM support
[    2.224829] [drm] Initialized i915 1.6.0 20151218 for 0000:00:02.0 on minor 0
[    2.396594] fbcon: inteldrmfb (fb0) is primary device
[    3.598721] [drm:intel_enable_lvds [i915]] *ERROR* timed out waiting for panel to power on
[    3.807051] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
[ 6320.845035] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[57165.185029] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment

Comment 8 bollocks 2016-06-18 11:53:28 UTC
Everything seems to be working fine.  No glitches.  No screen freezes.  Transition between a console and xorg tty is smooth in either direction.

 I'm running Fedora 24 kernel 4.5.7-300.fc24.  That bogus alignment error is gone.  I'm running packages based on git clones made on Jun 15 2016.  The packages except libva-intel-driver can be found here:

https://copr.fedorainfracloud.org/coprs/bollocks/intelblood/

A current dmesg output:
dmesg | grep -E 'i915|drm'
[    0.542039] [drm] Initialized drm 1.1.0 20060810
[    0.542824] [drm] Memory usable by graphics device = 2048M
[    0.542839] [drm] Replacing VGA console driver
[    0.551898] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    0.551900] [drm] Driver supports precise vblank timestamp query.
[    0.602839] [drm] GMBUS [i915 gmbus panel] timed out, falling back to bit banging on pin 3
[    0.623498] [drm] RC6 disabled, disabling runtime PM support
[    0.625069] [drm] Initialized i915 1.6.0 20151218 for 0000:00:02.0 on minor 0
[    0.817481] fbcon: inteldrmfb (fb0) is primary device
[    2.029455] [drm:intel_enable_lvds] *ERROR* timed out waiting for panel to power on
[    2.238514] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device

Comment 9 bollocks 2016-06-30 21:00:33 UTC
I distro-synced all of my Intel graphics stack packages  and with kernel 4.5.7 and 4.6.3 the problem is completely gone. I'd say this bug no longer affects F24.  Next I'll try it out with a F23 install since created this bug report while using F23.

Comment 10 bollocks 2016-07-01 00:24:51 UTC
I tested it on F23 using kernel 4.5.7.  The glitches are gone.

Comment 11 bollocks 2016-07-01 00:26:06 UTC
Also no screen freezes.  Thanks everyone.

Comment 12 Sergio Basto 2017-10-22 23:22:33 UTC
@bollocks , I got in my hands on toshiba satellite pro S500-155

the dmesg | grep -E 'i915|drm' [1]
dmesg | grep -i -E 'intel|drm|i9|error' [2]

should I try something from https://copr.fedorainfracloud.org/coprs/bollocks/intelblood/ ? 



[    2.511292] [drm] Memory usable by graphics device = 2048M
[    2.511296] [drm] Replacing VGA console driver
[    2.520877] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    2.520880] [drm] Driver supports precise vblank timestamp query.
[    2.522402] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[    2.593334] [drm] RC6 disabled, disabling runtime PM support
[    2.595012] [drm] Initialized i915 1.6.0 20170619 for 0000:00:02.0 on minor 0
[    2.631353] fbcon: inteldrmfb (fb0) is primary device
[    3.669324] [drm:intel_enable_lvds [i915]] *ERROR* timed out waiting for panel to power on
[    3.715223] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
[   16.963723] snd_hda_intel 0000:00:1b.0: bound 0000:00:02.0 (ops i915_audio_component_bind_ops [i915])




[    0.000000] Reserving Intel graphics memory at 0x00000000bc000000-0x00000000bfffffff
[    0.022000] smpboot: CPU0: Intel(R) Core(TM) i3 CPU       M 370  @ 2.40GHz (family: 0x6, model: 0x25, stepping: 0x5)
[    0.022000] Performance Events: PEBS fmt1+, Westmere events, 16-deep LBR, Intel PMU driver.
[    0.881745] intel_idle: MWAIT substates: 0x1120
[    0.881746] intel_idle: v0.4.1 model 0x25
[    0.881895] intel_idle: lapic_timer_reliable_states 0xffffffff
[    0.922342] RAS: Correctable Errors collector initialized.
[    1.208202] ACPI Error: [GTF0] Namespace lookup failure, AE_NOT_FOUND (20170531/psargs-364)
[    1.208276] ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT0._SDD, AE_NOT_FOUND (20170531/psparse-550)
[    1.208369] ACPI Error: [GTF0] Namespace lookup failure, AE_NOT_FOUND (20170531/psargs-364)
[    1.208438] ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT0._GTF, AE_NOT_FOUND (20170531/psparse-550)
[    1.210169] ACPI Error: [GTF0] Namespace lookup failure, AE_NOT_FOUND (20170531/psargs-364)
[    1.210241] ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT0._SDD, AE_NOT_FOUND (20170531/psparse-550)
[    1.210334] ACPI Error: [GTF0] Namespace lookup failure, AE_NOT_FOUND (20170531/psargs-364)
[    1.210404] ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT0._GTF, AE_NOT_FOUND (20170531/psparse-550)
[    1.520532] ACPI Error: [GTF1] Namespace lookup failure, AE_NOT_FOUND (20170531/psargs-364)
[    1.520604] ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT1._SDD, AE_NOT_FOUND (20170531/psparse-550)
[    1.520697] ACPI Error: [GTF1] Namespace lookup failure, AE_NOT_FOUND (20170531/psargs-364)
[    1.520767] ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT1._GTF, AE_NOT_FOUND (20170531/psparse-550)
[    1.523242] ACPI Error: [GTF1] Namespace lookup failure, AE_NOT_FOUND (20170531/psargs-364)
[    1.523415] ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT1._SDD, AE_NOT_FOUND (20170531/psparse-550)
[    1.523607] ACPI Error: [GTF1] Namespace lookup failure, AE_NOT_FOUND (20170531/psargs-364)
[    1.523774] ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT1._GTF, AE_NOT_FOUND (20170531/psparse-550)
[    2.283384] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
[    2.283387] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[    2.510031] e1000e 0000:00:19.0 eth0: Intel(R) PRO/1000 Network Connection
[    2.510665] pci 0000:00:00.0: Intel HD Graphics Chipset
[    2.511292] [drm] Memory usable by graphics device = 2048M
[    2.511296] [drm] Replacing VGA console driver
[    2.520877] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    2.520880] [drm] Driver supports precise vblank timestamp query.
[    2.522402] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[    2.593334] [drm] RC6 disabled, disabling runtime PM support
[    2.595012] [drm] Initialized i915 1.6.0 20170619 for 0000:00:02.0 on minor 0
[    2.631353] fbcon: inteldrmfb (fb0) is primary device
[    3.669324] [drm:intel_enable_lvds [i915]] *ERROR* timed out waiting for panel to power on
[    3.715223] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
[   14.427667] intel ips 0000:00:1f.6: CPU TDP doesn't match expected value (found 25, expected 29)
[   14.428007] intel ips 0000:00:1f.6: IPS driver initialized, MCP temp limit 90
[   14.697073] tpm tpm0: [Hardware Error]: Adjusting reported timeouts: A 750->750000us B 2000->2000000us C 750->750000us D 750->750000us
[   14.903626] audit: type=1130 audit(1508713074.418:64): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-backlight@backlight:intel_backlight comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[   15.869053] tpm tpm0: A TPM error (7) occurred attempting to read a pcr value
[   15.871672] iTCO_wdt: Intel TCO WatchDog Timer Driver v1.11
[   16.963723] snd_hda_intel 0000:00:1b.0: bound 0000:00:02.0 (ops i915_audio_component_bind_ops [i915])
[   17.761751] input: HDA Intel MID Mic as /devices/pci0000:00/0000:00:1b.0/sound/card0/input11
[   17.761835] input: HDA Intel MID Headphone as /devices/pci0000:00/0000:00:1b.0/sound/card0/input12
[   17.761912] input: HDA Intel MID HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:1b.0/sound/card0/input13
[  430.849041] ACPI Error: [GTF1] Namespace lookup failure, AE_NOT_FOUND (20170531/psargs-364)
[  430.849045] ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT1._SDD, AE_NOT_FOUND (20170531/psparse-550)
[  430.849062] ACPI Error: [GTF1] Namespace lookup failure, AE_NOT_FOUND (20170531/psargs-364)
[  430.849065] ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT1._GTF, AE_NOT_FOUND (20170531/psparse-550)
[  430.854728] ACPI Error: [GTF1] Namespace lookup failure, AE_NOT_FOUND (20170531/psargs-364)
[  430.854732] ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT1._SDD, AE_NOT_FOUND (20170531/psparse-550)
[  430.854748] ACPI Error: [GTF1] Namespace lookup failure, AE_NOT_FOUND (20170531/psargs-364)
[  430.854751] ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT1._GTF, AE_NOT_FOUND (20170531/psparse-550)
[  431.884315] tpm tpm0: A TPM error (7) occurred attempting to read a pcr value
[  432.459397] ACPI Error: [GTF0] Namespace lookup failure, AE_NOT_FOUND (20170531/psargs-364)
[  432.459405] ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT0._SDD, AE_NOT_FOUND (20170531/psparse-550)
[  432.459423] ACPI Error: [GTF0] Namespace lookup failure, AE_NOT_FOUND (20170531/psargs-364)
[  432.459427] ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT0._GTF, AE_NOT_FOUND (20170531/psparse-550)
[  432.468063] ACPI Error: [GTF0] Namespace lookup failure, AE_NOT_FOUND (20170531/psargs-364)
[  432.468069] ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT0._SDD, AE_NOT_FOUND (20170531/psparse-550)
[  432.468085] ACPI Error: [GTF0] Namespace lookup failure, AE_NOT_FOUND (20170531/psargs-364)
[  432.468089] ACPI Error: Method parse/execution failed \_SB.PCI0.SAT0.PRT0._GTF, AE_NOT_FOUND (20170531/psparse-550)


Note You need to log in before you can comment on or make changes to this bug.