Bug 585920 - KMS:RV620PRO:HD3470 GPU lockup
Summary: KMS:RV620PRO:HD3470 GPU lockup
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: xorg-x11-drv-ati
Version: 6.0
Hardware: x86_64
OS: Linux
Priority: low
Severity: high
Target Milestone: rc
: ---
Assignee: Jérôme Glisse
QA Contact: desktop-bugs@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-04-26 12:38 UTC by Taunus
Modified: 2010-11-30 15:32 UTC
3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-11-30 15:32:33 UTC
Target Upstream Version:


Attachments
Some errors on /var/log/messages (1.75 KB, text/plain)
2010-04-27 07:30 UTC, Taunus

Description Taunus 2010-04-26 12:38:55 UTC
Description of problem:
After using the RHEL 6 beta for a couple of days, I have had it crash twice. The ThinkPad T400, which is in its docking station, suddenly stops responding. The image on the external screen vanishes and the screen goes into power-save mode. There is no kernel panic (at least the Caps Lock light isn't blinking). The machine does not reply to ping and does not respond to pressing Caps Lock.

Version-Release number of selected component (if applicable):
rhel 6 beta kernel

How reproducible:
I don't know

Steps to Reproduce:
1. Run the laptop for an unknown period of time
2. See it crash
  
Actual results:
The ThinkPad T400 with an ATI Mobility Radeon HD 3470 crashes.

Expected results:
Does not crash

Additional info:

Comment 2 RHEL Program Management 2010-04-26 14:13:44 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 3 Taunus 2010-04-26 18:42:17 UTC
I had another crash right after posting the bug. I was browsing gnome-look.org, which is also what I was doing when the previous crash occurred.
No Flash plugin is installed.

Comment 4 Taunus 2010-04-27 06:25:42 UTC
I've been now browsing gnome-look really hard for a while but no crash :-)

Comment 5 Taunus 2010-04-27 07:30:23 UTC
Created attachment 409385 [details]
Some errors on /var/log/messages

After about one hour I got a crash again. I searched /var/log/messages, but there is nothing logged at the time of the crash. Attached are some messages that occurred at boot time, approximately one hour before the crash.

Comment 6 Taunus 2010-04-27 08:43:25 UTC
Another crash after one hour of browsing the web. The T400 BIOS was upgraded to the latest version, 3.16-1.06.

Obviously, that did not help.

Comment 7 Taunus 2010-04-27 08:55:16 UTC
This is the same bug as this one:
https://bugzilla.redhat.com/show_bug.cgi?id=560829

Comment 8 Taunus 2010-04-30 06:43:32 UTC
With the nomodeset boot option, it does not crash. Please fix this anyway.
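Because the lockups disappear with nomodeset, it helps to confirm which mode a given boot actually used before comparing results. A minimal sketch, assuming only that /proc/cmdline holds the running kernel's command line; the function name has_nomodeset is hypothetical:

```shell
#!/bin/bash
# Sketch (hypothetical helper): exit status 0 if the kernel command line
# contains "nomodeset" as a separate word, non-zero otherwise.
# $1 = file holding the command line (defaults to /proc/cmdline).
has_nomodeset() {
    # -q: no output, only the exit status; -w: match whole words only
    grep -qw nomodeset "${1:-/proc/cmdline}"
}
```

For example, `has_nomodeset && echo "KMS disabled via nomodeset"`. The check only looks for the literal nomodeset word; radeon.modeset=0 also disables KMS for this driver and is not detected here. On RHEL 6 the option is normally added by appending nomodeset to the kernel line of the boot entry in /boot/grub/grub.conf.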

Comment 10 Taunus 2010-05-03 09:41:03 UTC
I have had a couple of panics with nomodeset set on the kernel command line; the Caps Lock light is blinking. I wonder where I could fetch the console output. It would be nice if the crash log output could be saved to a file or a removable device, such as a USB stick or the hard drive, before the system stops. Anyway, I'll try to get the log somehow.

Comment 11 Matěj Cepl 2010-05-03 13:55:38 UTC
(In reply to comment #10)
> I have gotten couple of panics when I have set nomodeset in kernel boot, caps
> lock light is blinking. I wonder where I could fetch the console output. It
> would be nice if it was possible to save the crash log output to a file or some
> removable device before stopping operations. Like a usb stick or something. Or
> harddrive. Anyway, I'll try to get the log somehow.    

You can try to boot into text mode (add 3 to the kernel command line) and then run the command startx as a normal user. If you are lucky, X will crash and you will be returned to the command line. /var/log/Xorg.0.log and /var/log/messages are the two pieces of information we are after, and the output of the dmesg command could show whatever was going on in the kernel.

All this information, attached to this bug as separate uncompressed attachments, would be very welcome.

Thank you
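The collection steps requested above can be scripted so the files come out as separate, uncompressed attachments. This is a minimal sketch; the function name collect_logs, the .txt file names and the optional log-directory argument are illustrative choices, not part of any Red Hat tool. Reading /var/log/messages normally requires root.

```shell
#!/bin/bash
# Sketch (hypothetical helper): copy the logs requested in this bug into one
# directory as separate, uncompressed files ready to attach.
# $1 = destination directory (created if missing)
# $2 = log directory (defaults to /var/log); overridable only so the
#      function can be tried against a copy of the logs.
collect_logs() {
    dest=$1
    logdir=${2:-/var/log}
    mkdir -p "$dest" || return 1
    for f in Xorg.0.log messages; do
        if [ -r "$logdir/$f" ]; then
            cp "$logdir/$f" "$dest/$f.txt"
        else
            echo "warning: cannot read $logdir/$f (run as root?)" >&2
        fi
    done
    # Kernel ring buffer; this may fail or be restricted for
    # unprivileged users on some systems.
    dmesg > "$dest/dmesg.txt" 2>/dev/null || echo "warning: dmesg failed" >&2
}
```

After X has crashed and you are back at the text console, run `collect_logs /tmp/bug585920` as root and attach the three files. It is worth doing this before starting X again, because the X server renames the previous log to Xorg.0.log.old on startup.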

Comment 12 Pekka Järveläinen 2010-05-12 07:58:05 UTC
CE: hpet increasing min_delta_ns to 22500 nsec
[drm:radeon_fence_wait] *ERROR* fence(ffff880054595b00:0x001C7521) 355ms timeout
[drm:radeon_fence_wait] *ERROR* last signaled fence(0x001C7521)
[drm:radeon_fence_wait] *ERROR* fence(ffff8800be987e80:0x001C7533) 507ms timeout going to reset GPU
radeon 0000:01:00.0: GPU softreset 
radeon 0000:01:00.0:   R_008010_GRBM_STATUS=0xE77304E0
radeon 0000:01:00.0:   R_008014_GRBM_STATUS2=0x00110103
radeon 0000:01:00.0:   R_000E50_SRBM_STATUS=0x200020C0
radeon 0000:01:00.0:   R_008020_GRBM_SOFT_RESET=0x00007FEE
radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
radeon 0000:01:00.0:   R_000E60_SRBM_SOFT_RESET=0x00000402
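When comparing logs like the one above across several crashes, a quick tally of radeon fence timeouts and soft resets can be easier to read than the raw lines. A small sketch, assuming the log text uses the message wording shown in this comment; the function name summarize_radeon_lockups is hypothetical:

```shell
#!/bin/bash
# Sketch (hypothetical helper): count radeon fence timeouts, timeouts that
# requested a GPU reset, and GPU soft resets in kernel log text on stdin.
summarize_radeon_lockups() {
    awk '
        # any fence wait that timed out, e.g. "355ms timeout"
        /radeon_fence_wait.*ms timeout/                 { timeouts++ }
        # the subset of timeouts after which the driver resets the GPU
        /radeon_fence_wait.*timeout going to reset GPU/ { resets_requested++ }
        # the reset itself
        /GPU softreset/                                 { softresets++ }
        END {
            printf "fence timeouts: %d\n", timeouts + 0
            printf "resets requested: %d\n", resets_requested + 0
            printf "GPU softresets: %d\n", softresets + 0
        }'
}
```

For example, `dmesg | summarize_radeon_lockups`, or feed it a saved netconsole capture. For the excerpt in this comment it reports two fence timeouts, one reset request and one GPU soft reset.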

Comment 13 Taunus 2010-05-24 11:37:00 UTC
I also got this:
netconsole: network logging started
[drm:radeon_fence_wait] *ERROR* fence(ffff88010de0b300:0x00002B8E) 510ms timeout going to reset GPU
radeon 0000:01:00.0: GPU softreset 
radeon 0000:01:00.0:   R_008010_GRBM_STATUS=0xA0003030
radeon 0000:01:00.0:   R_008014_GRBM_STATUS2=0x00000003
radeon 0000:01:00.0:   R_000E50_SRBM_STATUS=0x200000C0
radeon 0000:01:00.0:   R_008020_GRBM_SOFT_RESET=0x00007FEE
radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
radeon 0000:01:00.0:   R_000E60_SRBM_SOFT_RESET=0x00000402


netconsole seems to be a good way to catch console output. The sender machine runs this:
#!/bin/bash
# send data from sender to receiver

# set the console log level to 8 so that all kernel messages
# are printed to the console (and therefore sent by netconsole)
dmesg -n 8

# load the netconsole kernel module with parameters
# 10.0.0.1 = console sender IP, i.e. the computer with the problem
# 10.0.0.2 = receiver IP
# XX:XX:XX:XX:XX:XX = receiver eth0 MAC address
modprobe netconsole netconsole=4444@10.0.0.1/eth0,6666@10.0.0.2/XX:XX:XX:XX:XX:XX


And the receiver machine runs:
#!/bin/bash

# netconsole sends its messages as UDP packets
echo "Opening firewall port 6666 for UDP traffic"
/sbin/iptables -I RH-Firewall-1-INPUT -p udp -m udp --dport 6666 -j ACCEPT

echo "Listening on UDP port 6666 for any messages..."
nc -l -u 6666

Comment 14 Taunus 2010-05-24 11:40:28 UTC
modprobe netconsole netconsole=bunch-of-parameters

The whole modprobe command must be on a single line. I put this here just for reference.
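One way to avoid the line-wrapping problem is to build the parameter string in the script and pass it to modprobe as a single word. A sketch, assuming the parameter format documented for netconsole in the kernel source (Documentation/networking/netconsole.txt); the helper netconsole_param is hypothetical, and XX:XX:XX:XX:XX:XX is still a placeholder for the receiver's MAC address:

```shell
#!/bin/bash
# Sketch (hypothetical helper): print the netconsole module parameter in the
# documented form
#   netconsole=[src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr]
# $1 src port, $2 src IP, $3 network device,
# $4 target port, $5 target IP, $6 target MAC address
netconsole_param() {
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}
```

Usage, with the values from comment 13: `modprobe netconsole "$(netconsole_param 4444 10.0.0.1 eth0 6666 10.0.0.2 XX:XX:XX:XX:XX:XX)"`. The quotes keep the generated string as one argument, so it cannot be split across lines.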

Comment 15 Jérôme Glisse 2010-06-11 14:52:41 UTC
Do you still experience such lockups with more recent RHEL 6 software? Also, can you please attach the output of:
cat /proc/interrupts

Thx

Comment 16 Pekka Järveläinen 2010-06-11 18:57:23 UTC
 cat /proc/interrupts
           CPU0       CPU1       
  0:    3201736    3242296   IO-APIC-edge      timer
  1:          3          6   IO-APIC-edge      i8042
  4:          1          1   IO-APIC-edge    
  7:          0          0   IO-APIC-edge      parport0
  8:          0          1   IO-APIC-edge      rtc0
  9:        210      37884   IO-APIC-fasteoi   acpi
 12:       1043        988   IO-APIC-edge      i8042
 16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb6, yenta, radeon@pci:0000:01:00.0
 17:        553          1   IO-APIC-fasteoi   uhci_hcd:usb7, firewire_ohci
 18:         14         17   IO-APIC-fasteoi   uhci_hcd:usb8
 19:        111         95   IO-APIC-fasteoi   ehci_hcd:usb2
 20:          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
 21:        105         73   IO-APIC-fasteoi   uhci_hcd:usb4
 22:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
 23:      93719         70   IO-APIC-fasteoi   ehci_hcd:usb1
 27:          0          0   PCI-MSI-edge      pciehp
 29:     334724       4314   PCI-MSI-edge      ahci
 30:        677     897603   PCI-MSI-edge      eth0
 31:    1571523     627338   PCI-MSI-edge      iwlagn
 32:         64       2488   PCI-MSI-edge      HDA Intel
NMI:       1949        979   Non-maskable interrupts
LOC:    3503268    1680068   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
PND:          0          0   Performance pending work
RES:    1195579    3124567   Rescheduling interrupts
CAL:        104        190   Function call interrupts
TLB:      10665       3747   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:        118        118   Machine check polls
ERR:          1
MIS:          0

I now use nomodeset and have had no more crashes. Is there a kernel more recent than 2.6.32-19.el6.x86_64? I didn't find any updates at
ftp://ftp.redhat.com/pub/redhat/rhel/beta/6

Comment 17 Jérôme Glisse 2010-06-14 08:02:52 UTC
I need the output of cat /proc/interrupts with KMS enabled, i.e. without the nomodeset option, after a few minutes of X activity.
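The request is presumably about whether the radeon interrupt counter grows while X is in use; in the nomodeset output above, radeon shares IRQ 16 and both of its counters are 0. A minimal sketch that sums the per-CPU counts on every /proc/interrupts line mentioning radeon; the function name radeon_irq_total is hypothetical, and reading a flat counter as a symptom is an assumption, not a diagnosis stated in this bug:

```shell
#!/bin/bash
# Sketch (hypothetical helper): print the total interrupt count, summed over
# all CPU columns, for lines that mention "radeon".
# $1 = file in /proc/interrupts format (defaults to /proc/interrupts).
radeon_irq_total() {
    awk '/radeon/ {
        # field 1 is the IRQ number ("29:"); the per-CPU counts follow
        # until the first non-numeric field (the interrupt type)
        for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
    }
    END { print total + 0 }' "${1:-/proc/interrupts}"
}
```

To compare over time, take two readings a few minutes apart while X is active, e.g. `before=$(radeon_irq_total); sleep 180; after=$(radeon_irq_total); echo $((after - before))`.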

Comment 18 Taunus 2010-06-15 12:19:15 UTC
I would certainly like to try any updates for the RHEL 6 beta, but they are not available anywhere. If they are, where can I get them?

By the way:
http://mirrors.kernel.org/redhat/redhat/rhel/beta/6/
is fast if you need to download something.

Comment 19 Pekka Järveläinen 2010-06-18 10:34:13 UTC
without nomodeset option
cat /proc/interrupts
           CPU0       CPU1       
  0:      48766      53087   IO-APIC-edge      timer
  1:          4          5   IO-APIC-edge      i8042
  4:          1          1   IO-APIC-edge    
  7:          0          0   IO-APIC-edge      parport0
  8:          0          1   IO-APIC-edge      rtc0
  9:        211        536   IO-APIC-fasteoi   acpi
 12:       1104        927   IO-APIC-edge      i8042
 16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb6, yenta
 17:          0          5   IO-APIC-fasteoi   uhci_hcd:usb7, firewire_ohci
 18:        487         22   IO-APIC-fasteoi   uhci_hcd:usb8
 19:        105         97   IO-APIC-fasteoi   ehci_hcd:usb2
 20:          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
 21:        110         89   IO-APIC-fasteoi   uhci_hcd:usb4
 22:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
 23:       2746         55   IO-APIC-fasteoi   ehci_hcd:usb1
 27:          0          0   PCI-MSI-edge      pciehp
 29:        402       1438   PCI-MSI-edge      radeon
 30:      14286       2411   PCI-MSI-edge      ahci
 31:         78       3604   PCI-MSI-edge      eth0
 32:      10086       4548   PCI-MSI-edge      iwlagn
 33:         61        716   PCI-MSI-edge      HDA Intel
NMI:        271        140   Non-maskable interrupts
LOC:      52400      46392   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
PND:          0          0   Performance pending work
RES:      26706      47039   Rescheduling interrupts
CAL:         67        163   Function call interrupts
TLB:       1465       1296   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:          1          1   Machine check polls
ERR:          1
MIS:          0

Comment 20 RHEL Program Management 2010-07-15 14:46:24 UTC
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release. It has
been denied for the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 21 Taunus 2010-08-31 16:02:50 UTC
This no longer seems to be a problem on RHEL 6 Beta 2 Refresh.

Comment 23 Jérôme Glisse 2010-11-30 15:32:33 UTC
Closing this one, reopen if it's still an issue with final rhel6.

