Bug 1029144 - Repeatable bus error; __memcpy_sse2_unaligned/R600UploadToScreenCS on big images
Status: CLOSED EOL
Product: Fedora
Component: xorg-x11-drv-ati
Version: 20
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: X/OpenGL Maintenance List
QA Contact: Fedora Extras Quality Assurance
Reported: 2013-11-11 13:44 EST by rh
Modified: 2015-06-29 08:52 EDT
Last Closed: 2015-06-29 08:52:45 EDT
Doc Type: Bug Fix
Type: Bug


Attachments:
  Xorg.2.log with backtrace in (45.34 KB, text/x-log), 2013-11-11 14:36 EST, rh
  bt full from gdb (4.59 KB, text/plain), 2013-11-11 14:48 EST, rh
  Upstream Patch by Alex Deucher <alexander.deucher@amd.com> (2.63 KB, patch), 2014-01-11 10:52 EST, kubrick@fgv6.net

External Trackers:
  FreeDesktop.org 73083

Description rh 2013-11-11 13:44:27 EST
Description of problem:

The X server died with a bus error; see the backtrace below.

Version-Release number of selected component (if applicable):

xorg-x11-server-Xorg-1.14.3-4.fc20.x86_64
xorg-x11-drv-ati-7.2.0-3.20131101git3b38701.fc20.x86_64


How reproducible:

Had it twice so far

Steps to Reproduce:

I've triggered this twice in quick succession. I'm not 100% sure of the cause, but the 1st time I had just opened
http://www.theguardian.com/technology/2013/oct/07/nokia-lumia-1020-review-41-megapixel-camera
in a tab in Firefox; the 2nd time it died when I went back to Firefox and started to scroll down in that tab.

(I'm in KDE with a Radeon HD4350/RV710)

Actual results:
[ 26490.499] (EE)
[ 26490.499] (EE) Backtrace:
[ 26490.503] (EE) 0: /usr/bin/X (OsLookupColor+0x129) [0x4734f9]
[ 26490.503] (EE) 1: /lib64/libpthread.so.0 (__restore_rt+0x0) [0x322920f74f]
[ 26490.503] (EE) 2: /lib64/libc.so.6 (__memcpy_sse2_unaligned+0x29) [0x3228a93fd9]
[ 26490.504] (EE) 3: /usr/lib64/xorg/modules/drivers/radeon_drv.so (_init+0x1f297) [0x7fdeab3dbbb7]
[ 26490.504] (EE) 4: /usr/lib64/xorg/modules/libexa.so (exaMoveOutPixmap+0x43e0) [0x7fdeaa75ee90]
[ 26490.504] (EE) 5: /usr/bin/X (dixDestroyPixmap+0x1459) [0x437ea9]
[ 26490.504] (EE) 6: /usr/bin/X (SendErrorToClient+0x427) [0x43a1b7]
[ 26490.505] (EE) 7: /usr/bin/X (_init+0x3b1a) [0x42be0a]
[ 26490.505] (EE) 8: /lib64/libc.so.6 (__libc_start_main+0xf5) [0x3228a21d65]
[ 26490.505] (EE) 9: /usr/bin/X (_start+0x29) [0x428a25]
[ 26490.505] (EE) 10: ? (?+0x29) [0x29]
[ 26490.505] (EE)
[ 26490.506] (EE) Bus error at address 0x7fde97425000
[ 26490.506] (EE)
Fatal server error:
[ 26490.506] (EE) Caught signal 7 (Bus error). Server aborting


Expected results:

No crash!

Additional info:

The automated reporting thing added this as a dupe of bug 955617 which appears to be completely different in almost every sense other than a crashing X server.
Comment 1 rh 2013-11-11 13:48:51 EST
OK, just triggered it a 3rd time; it seems fully reproducible when scrolling down in Firefox on that page!
Comment 2 rh 2013-11-11 14:32:07 EST
I can repeat this in a simpler scenario; I'm doing it in a second VT to avoid killing my main session:

 * Change to a spare VT
 * X :2 &
 * export DISPLAY=:2
 * xterm &
 * fvwm &
 * firefox &
 * Bring up the URL above, scroll down slowly

so it's not specific to the KDE desktop I was running.
Comment 3 rh 2013-11-11 14:36:15 EST
Created attachment 822611
Xorg.2.log with backtrace in
Comment 4 rh 2013-11-11 14:38:01 EST
[dg@major log]$ su -c 'lspci -nn'
Password: 
00:00.0 Host bridge [0600]: Intel Corporation Core Processor DMI [8086:d131] (rev 11)
00:03.0 PCI bridge [0604]: Intel Corporation Core Processor PCI Express Root Port 1 [8086:d138] (rev 11)
00:08.0 System peripheral [0880]: Intel Corporation Core Processor System Management Registers [8086:d155] (rev 11)
00:08.1 System peripheral [0880]: Intel Corporation Core Processor Semaphore and Scratchpad Registers [8086:d156] (rev 11)
00:08.2 System peripheral [0880]: Intel Corporation Core Processor System Control and Status Registers [8086:d157] (rev 11)
00:08.3 System peripheral [0880]: Intel Corporation Core Processor Miscellaneous Registers [8086:d158] (rev 11)
00:10.0 System peripheral [0880]: Intel Corporation Core Processor QPI Link [8086:d150] (rev 11)
00:10.1 System peripheral [0880]: Intel Corporation Core Processor QPI Routing and Protocol Registers [8086:d151] (rev 11)
00:1a.0 USB controller [0c03]: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller [8086:3b3c] (rev 05)
00:1b.0 Audio device [0403]: Intel Corporation 5 Series/3400 Series Chipset High Definition Audio [8086:3b56] (rev 05)
00:1c.0 PCI bridge [0604]: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 1 [8086:3b42] (rev 05)
00:1c.1 PCI bridge [0604]: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 2 [8086:3b44] (rev 05)
00:1c.2 PCI bridge [0604]: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 3 [8086:3b46] (rev 05)
00:1c.3 PCI bridge [0604]: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 4 [8086:3b48] (rev 05)
00:1c.4 PCI bridge [0604]: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 5 [8086:3b4a] (rev 05)
00:1d.0 USB controller [0c03]: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller [8086:3b34] (rev 05)
00:1e.0 PCI bridge [0604]: Intel Corporation 82801 PCI Bridge [8086:244e] (rev a5)
00:1f.0 ISA bridge [0601]: Intel Corporation 5 Series Chipset LPC Interface Controller [8086:3b02] (rev 05)
00:1f.2 SATA controller [0106]: Intel Corporation 5 Series/3400 Series Chipset 6 port SATA AHCI Controller [8086:3b22] (rev 05)
00:1f.3 SMBus [0c05]: Intel Corporation 5 Series/3400 Series Chipset SMBus Controller [8086:3b30] (rev 05)
02:00.0 FireWire (IEEE 1394) [0c00]: VIA Technologies, Inc. VT6315 Series Firewire Controller [1106:3403]
02:00.1 IDE interface [0101]: VIA Technologies, Inc. VT6415 PATA IDE Host Controller [1106:0415] (rev a0)
04:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 03)
07:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] RV710 [Radeon HD 4350/4550] [1002:954f]
07:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] RV710/730 HDMI Audio [Radeon HD 4000 series] [1002:aa38]
ff:00.0 Host bridge [0600]: Intel Corporation Core Processor QuickPath Architecture Generic Non-Core Registers [8086:2c51] (rev 04)
ff:00.1 Host bridge [0600]: Intel Corporation Core Processor QuickPath Architecture System Address Decoder [8086:2c81] (rev 04)
ff:02.0 Host bridge [0600]: Intel Corporation Core Processor QPI Link 0 [8086:2c90] (rev 04)
ff:02.1 Host bridge [0600]: Intel Corporation Core Processor QPI Physical 0 [8086:2c91] (rev 04)
ff:03.0 Host bridge [0600]: Intel Corporation Core Processor Integrated Memory Controller [8086:2c98] (rev 04)
ff:03.1 Host bridge [0600]: Intel Corporation Core Processor Integrated Memory Controller Target Address Decoder [8086:2c99] (rev 04)
ff:03.4 Host bridge [0600]: Intel Corporation Core Processor Integrated Memory Controller Test Registers [8086:2c9c] (rev 04)
ff:04.0 Host bridge [0600]: Intel Corporation Core Processor Integrated Memory Controller Channel 0 Control Registers [8086:2ca0] (rev 04)
ff:04.1 Host bridge [0600]: Intel Corporation Core Processor Integrated Memory Controller Channel 0 Address Registers [8086:2ca1] (rev 04)
ff:04.2 Host bridge [0600]: Intel Corporation Core Processor Integrated Memory Controller Channel 0 Rank Registers [8086:2ca2] (rev 04)
ff:04.3 Host bridge [0600]: Intel Corporation Core Processor Integrated Memory Controller Channel 0 Thermal Control Registers [8086:2ca3] (rev 04)
ff:05.0 Host bridge [0600]: Intel Corporation Core Processor Integrated Memory Controller Channel 1 Control Registers [8086:2ca8] (rev 04)
ff:05.1 Host bridge [0600]: Intel Corporation Core Processor Integrated Memory Controller Channel 1 Address Registers [8086:2ca9] (rev 04)
ff:05.2 Host bridge [0600]: Intel Corporation Core Processor Integrated Memory Controller Channel 1 Rank Registers [8086:2caa] (rev 04)
ff:05.3 Host bridge [0600]: Intel Corporation Core Processor Integrated Memory Controller Channel 1 Thermal Control Registers [8086:2cab] (rev 04)
[dg@major log]$ 

[dg@major log]$ xrandr
Screen 0: minimum 320 x 200, current 3840 x 1080, maximum 8192 x 8192
VGA-0 disconnected (normal left inverted right x axis y axis)
HDMI-0 connected primary 1920x1080+1920+0 (normal left inverted right x axis y axis) 477mm x 268mm
   1920x1080      60.0*+
   1600x1200      60.0  
   1680x1050      59.9  
   1280x1024      75.0     60.0  
   1440x900       75.0     59.9  
   1152x864       75.0  
   1024x768       75.1     70.1     60.0  
   832x624        74.6  
   800x600        72.2     75.0     60.3     56.2  
   640x480        75.0     72.8     66.7     60.0  
   720x400        70.1  
DVI-0 connected 1920x1080+0+0 (normal left inverted right x axis y axis) 531mm x 298mm
   1920x1080      60.0*+
   1280x1024      75.0     60.0  
   1152x864       75.0  
   1024x768       75.1     60.0  
   800x600        75.0     60.3  
   640x480        75.0     60.0  
   720x400        70.1  

The monitors are a Dell S2409W and an Iiyama E2208HDS (I think the Iiyama is the one on HDMI).
Comment 5 rh 2013-11-11 14:48:47 EST
Created attachment 822613
bt full from gdb
Comment 6 rh 2013-11-11 15:20:19 EST
Not sure I can add much over that bt, one of the images on that page is:

RoryCJHR.jpg?psid=1[5] JPEG 7712x4352 7712x4352+0+0 8-bit sRGB 8.717MB 

which seems to match the src_obj size in the backtrace.

The pitch/height in the dst stuff in the backtrace look very bogus.
Comment 7 rh 2013-12-15 10:46:31 EST
Hmm, this is not as repeatable as it was a month ago, but I can still trigger it; at the moment it seems to need me to open another tab in Firefox before it triggers.

I thought this had gone and turned into bug 993463 but maybe not.
Comment 8 rh 2013-12-26 21:32:26 EST
In R600UploadToScreenCS, the failing case takes the "goto copy" branch marked below:

    if (!(driver_priv->tiling_flags & (RADEON_TILING_MACRO | RADEON_TILING_MICRO))) {
        if (!radeon_bo_is_referenced_by_cs(driver_priv->bo, info->cs)) {
            flush = FALSE;
            if (!radeon_bo_is_busy(driver_priv->bo, &dst_domain)) {
                goto copy;  /* <----- taken in the failing case */
            }
        }

I added some more debug just before the memcpy that fails:

R600UploadToScreenCS: raw dst=0x7ff1efed3000
R600UploadToScreenCS: dst=0x7ff1efed3000 copy_pitch=31232 src=0x226e4b8 size=30848

so it's not failing on the 1st copy exit 1 case (and we don't print the rest of the debug for short stuff)

02257000-02376000 rw-p 00000000 00:00 0                                  [heap]
7ff1efed3000-7ff1f8073000 rw-s 1a1a6a000 00:05 10713                     /dev/dri/card0

so that says to me that it's in range of the mapped device and the source is good.
The 'size' looks about right given the width of the image, so I don't see why it's blowing up at the moment.
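
The range check described above can be repeated offline. The following is a minimal Python sketch, not part of the driver or of the debug patch: it parses the two /proc/<pid>/maps lines quoted in this comment and tests whether the source and destination of the copy lie inside a mapping. The helper names (parse_maps_line, find_mapping) are hypothetical, and the 4 bytes per pixel and the 7712x4352 dimensions (taken from the JPEG in comment 6) are assumptions about which pixmap is being uploaded.

```python
# Sketch only: helper names are hypothetical, not taken from the X server or driver.
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int
    end: int      # exclusive, as in /proc/<pid>/maps
    perms: str
    path: str


def parse_maps_line(line):
    """Parse one line of /proc/<pid>/maps into a Mapping."""
    fields = line.split(None, 5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(start, end, fields[1], path)


def find_mapping(maps, addr, length=1):
    """Return the mapping that fully contains [addr, addr + length), or None."""
    for m in maps:
        if m.start <= addr and addr + length <= m.end:
            return m
    return None


# The two maps lines and the debug values quoted in this comment.
MAPS = [parse_maps_line(line) for line in (
    "02257000-02376000 rw-p 00000000 00:00 0                                  [heap]",
    "7ff1efed3000-7ff1f8073000 rw-s 1a1a6a000 00:05 10713                     /dev/dri/card0",
)]
DST, SRC, SIZE, COPY_PITCH = 0x7ff1efed3000, 0x226e4b8, 30848, 31232

dst_map = find_mapping(MAPS, DST, SIZE)
src_map = find_mapping(MAPS, SRC, SIZE)
mapping_len = dst_map.end - dst_map.start

print("dst in:", dst_map.path, "src in:", src_map.path)
print("row size == 7712 * 4:", SIZE == 7712 * 4)                   # assumed 32 bpp
print("mapping == pitch * 4352:", mapping_len == COPY_PITCH * 4352)
```

With the values above, both lookups succeed, the row size equals 7712 * 4, and the length of the /dev/dri/card0 mapping equals copy_pitch * 4352, so the bounds look consistent with the large image. Note that an address being inside a mapping only shows that the virtual range is valid; it does not show that the pages behind it can actually be backed.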
Comment 9 rh 2013-12-27 13:16:35 EST
OK, a bit more digging:

Program received signal SIGBUS, Bus error.
__memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:39
39              movdqu  %xmm8, (%rdi)
(gdb) info reg
rdi            0x7fee9b646000   140662785990656

From /proc/pid/maps of the X server
7fee9b646000-7feea37e6000 rw-s 185556000 00:05 9484     /dev/dri/card0

So the instruction that raises SIGBUS really is writing to the right address, and that address is the very start of the mapped area.

(gdb) print $_siginfo                                                                                                                                                                                                                        
$1 = {si_signo = 7, si_errno = 0, si_code = 2, _sifields = {_pad = {-1687920640, 32750, 0, 0, 1872902144, 32767, 0, 0, 91, 110, 0, 0, 0, 0, 119, 124, 1872902143, 32767,                                                                     
      574453248, -938495348, 0, 0, 0, 0, 27891376, 0, 682180569, 50}, _kill = {si_pid = -1687920640, si_uid = 32750}, _timer = {si_tid = -1687920640, si_overrun = 32750,                                                                    
      si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _rt = {si_pid = -1687920640, si_uid = 32750, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _sigchld = {                                                                                
      si_pid = -1687920640, si_uid = 32750, si_status = 0, si_utime = 8044053457088282624, si_stime = 32767}, _sigfault = {si_addr = 0x7fee9b646000}, _sigpoll = {                                                                           
      si_band = 140662785990656, si_fd = 0}}}                           

and si_code 2 is apparently:
         #define BUS_ADRERR      (__SI_FAULT|2)  /* non-existent physical address */
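
For reference, the decoding above can be written as a small lookup. This is a Python sketch with the Linux BUS_* si_code values hard-coded; the function names (describe_sigbus, addr_from_pad) are hypothetical. It also rebuilds si_addr from the first two 32-bit words of the _pad array printed by gdb, assuming the little-endian x86_64 layout, as a cross-check against the value in rdi.

```python
# Sketch only: the numeric values are the Linux si_code constants for SIGBUS;
# the function names are made up for this illustration.
BUS_CODES = {
    1: ("BUS_ADRALN", "invalid address alignment"),
    2: ("BUS_ADRERR", "non-existent physical address"),
    3: ("BUS_OBJERR", "object-specific hardware error"),
}


def describe_sigbus(si_code):
    """Return a readable description of a SIGBUS si_code value."""
    name, meaning = BUS_CODES.get(si_code, ("unknown", "not a code listed here"))
    return f"si_code {si_code}: {name} ({meaning})"


def addr_from_pad(lo, hi):
    """Rebuild a 64-bit si_addr from two signed 32-bit _pad words (low word first)."""
    return ((hi & 0xFFFFFFFF) << 32) | (lo & 0xFFFFFFFF)


# First two _pad words from the gdb output above.
fault_addr = addr_from_pad(-1687920640, 32750)

print(describe_sigbus(2))
print(hex(fault_addr))
```

The rebuilt address agrees with both rdi and the si_addr field in the _sigfault view of the union, so the fault address reported by the kernel is the first byte of the /dev/dri/card0 mapping.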
Comment 10 rh 2013-12-27 15:22:41 EST
Filed upstream as:
https://bugs.freedesktop.org/show_bug.cgi?id=73083
Comment 11 rh 2013-12-29 07:47:09 EST
Note the 'possible fix' in the upstream bug; it seems to be surviving with that.
Comment 12 kubrick@fgv6.net 2014-01-11 10:52:31 EST
Created attachment 848644
Upstream Patch by Alex Deucher <alexander.deucher@amd.com>

http://cgit.freedesktop.org/xorg/driver/xf86-video-ati/commit/?id=bcc454ea2fb239e13942270faec7801270615b9c

Seems to be a good fix. I tested it and the problem (that was 100% reproducible) is gone.

Please pull.
Comment 13 Samuel Sieb 2014-04-16 11:56:59 EDT
Is this patch going to get pulled into F20?
Comment 14 D. Hugh Redelmeier 2014-04-18 15:52:31 EDT
I think that I'm hitting this problem.  I got here from https://bugzilla.redhat.com/show_bug.cgi?id=955617 thanks to rh's comments.

Evidence: /var/tmp/abrt/ccpp-2014-04-18-15:03:41-835/core_backtrace shows

  __memcpy_sse2_unaligned
  R600UploadToScreenCS
  exaPutImage

xorg-x11-drv-ati-7.2.0-3.20131101git3b38701.fc20.x86_64

The most recent entry in the changelog for that RPM is dated Fri Nov 01 2013 so I guess that the patch has not been installed.


Also: this looks like it matches comment 20 in https://bugzilla.redhat.com/show_bug.cgi?id=1003221

Also: there's a chance this is the same as what I reported in comment 33 of https://bugzilla.redhat.com/show_bug.cgi?id=924076. The backtrace has some commonality.
Comment 15 Stas Sergeev 2014-05-31 07:52:37 EDT
Same here on F19 with all updates:
---
[  7137.021] (EE) Backtrace:
[  7137.038] (EE) 0: /usr/bin/X (OsLookupColor+0x129) [0x46f059]
[  7137.048] (EE) 1: /lib64/libpthread.so.0 (__restore_rt+0x0) [0x7fe2f84b8f8f]
[  7137.059] (EE) 2: /lib64/libc.so.6 (__memcpy_sse2+0x15b) [0x7fe2f780ff4b]
[  7137.069] (EE) 3: /usr/lib64/xorg/modules/drivers/radeon_drv.so (_init+0x1ec9f) [0x7fe2f67cf78f]
[  7137.076] (EE) 4: /usr/lib64/xorg/modules/libexa.so (exaMoveOutPixmap+0x42d4) [0x7fe2f5a948d4]
[  7137.076] (EE) 5: /usr/bin/X (dixDestroyPixmap+0x1109) [0x435099]
---
Comment 16 Hin-Tak Leung 2014-06-03 13:21:07 EDT
I am not sure why abrt-applet put mine under Bug 1003221. I no longer have the coredump or the mail to root, but mine is (two identical crashes when I switched VT, about 3 hours apart):

May 27 05:43:37 localhost gdm-Xorg-:0: (EE) 0: /usr/bin/Xorg (OsLookupColor+0x129) [0x473759]
May 27 05:43:37 localhost gdm-Xorg-:0: (EE) 1: /lib64/libpthread.so.0 (__restore_rt+0x0) [0x3fa9a0f74f]
May 27 05:43:37 localhost gdm-Xorg-:0: (EE) 2: /usr/bin/Xorg (mieqProcessDeviceEvent+0x125) [0x5839b5]
May 27 05:43:37 localhost gdm-Xorg-:0: (EE) 3: /usr/bin/Xorg (mieqProcessInputEvents+0xf7) [0x583b77]
May 27 05:43:37 localhost gdm-Xorg-:0: (EE) 4: /usr/bin/Xorg (ProcessInputEvents+0x19) [0x48b539]
May 27 05:43:37 localhost gdm-Xorg-:0: (EE) 5: /usr/bin/Xorg (xf86Wakeup+0x35d) [0x48be6d]
May 27 05:43:37 localhost gdm-Xorg-:0: (EE) 6: /usr/bin/Xorg (WakeupHandler+0x6d) [0x43e6ad]
May 27 05:43:37 localhost gdm-Xorg-:0: (EE) 7: /usr/bin/Xorg (WaitForSomething+0x1bf) [0x46a90f]
May 27 05:43:37 localhost gdm-Xorg-:0: (EE) 8: /usr/bin/Xorg (SendErrorToClient+0x111) [0x43a091]
May 27 05:43:37 localhost gdm-Xorg-:0: (EE) 9: /usr/bin/Xorg (_init+0x3b0a) [0x42c00a]
May 27 05:43:37 localhost gdm-Xorg-:0: (EE) 10: /lib64/libc.so.6 (__libc_start_main+0xf5) [0x3fa8e21d65]
May 27 05:43:37 localhost gdm-Xorg-:0: (EE) 11: /usr/bin/Xorg (_start+0x29) [0x428c35]
May 27 05:43:37 localhost gdm-Xorg-:0: (EE) 12: ? (?+0x29) [0x29]
May 27 05:43:37 localhost gdm-Xorg-:0: (EE)
Comment 17 rh 2014-06-04 04:19:04 EDT
Hin-Tak:
  I don't think that failure is this bug; this bug is only for the memcpy_sse* in the Pixmap code on Radeons.
Bug 1003221 gets all/most X crashes; so probably best to carry on there until you find one more specific.
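
The distinction drawn here (a memcpy_sse* frame called from the Radeon driver, as opposed to any X server crash) can be expressed as a rough filter over Xorg log backtraces. The Python sketch below is only a heuristic for illustration; it is not how the automated reporter groups crashes, and the names (FRAME_RE, parse_frames, looks_like_this_bug) are hypothetical. The sample frames are copied from the description and from comment 16.

```python
# Sketch only: a rough signature check, not the logic used by any reporting tool.
import re

# Matches frames such as:
#   (EE) 2: /lib64/libc.so.6 (__memcpy_sse2_unaligned+0x29) [0x3228a93fd9]
FRAME_RE = re.compile(r"\(EE\) (\d+): (\S+) \((\S+?)\+0x[0-9a-f]+\)")


def parse_frames(log_text):
    """Return (module, symbol) pairs for each backtrace frame, in log order."""
    return [(m.group(2), m.group(3)) for m in FRAME_RE.finditer(log_text)]


def looks_like_this_bug(log_text):
    """True if a memcpy_sse* frame is immediately followed by a radeon_drv.so frame."""
    frames = parse_frames(log_text)
    for (_, sym), (next_mod, _) in zip(frames, frames[1:]):
        if "memcpy_sse" in sym and next_mod.endswith("radeon_drv.so"):
            return True
    return False


THIS_BUG = """\
[ 26490.503] (EE) 2: /lib64/libc.so.6 (__memcpy_sse2_unaligned+0x29) [0x3228a93fd9]
[ 26490.504] (EE) 3: /usr/lib64/xorg/modules/drivers/radeon_drv.so (_init+0x1f297) [0x7fdeab3dbbb7]
"""
OTHER_CRASH = """\
May 27 05:43:37 localhost gdm-Xorg-:0: (EE) 2: /usr/bin/Xorg (mieqProcessDeviceEvent+0x125) [0x5839b5]
May 27 05:43:37 localhost gdm-Xorg-:0: (EE) 3: /usr/bin/Xorg (mieqProcessInputEvents+0xf7) [0x583b77]
"""

print(looks_like_this_bug(THIS_BUG), looks_like_this_bug(OTHER_CRASH))
```

The check deliberately accepts both __memcpy_sse2 and __memcpy_sse2_unaligned, since comment 15 shows the same crash path with the former on F19.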
Comment 18 rh 2014-10-04 15:45:28 EDT
This is looking pretty solid to me in F21; I can now view large images and that page; so I guess the:

xorg-x11-drv-ati-7.4.0-1.fc21.x86_64

has the upstream fix in.
Comment 19 Fedora End Of Life 2015-05-29 05:44:03 EDT
This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 20 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.
Comment 20 Fedora End Of Life 2015-06-29 08:52:45 EDT
Fedora 20 changed to end-of-life (EOL) status on 2015-06-23. Fedora 20 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.
