Bug 619244

Summary: Lockup in nouveau_bo_map_range
Product: Red Hat Enterprise Linux 6
Reporter: Rik van Riel <riel>
Component: xorg-x11-drv-nouveau
Assignee: Ben Skeggs <bskeggs>
Status: CLOSED WONTFIX
QA Contact: Desktop QE <desktop-qa-list>
Severity: medium
Priority: low
Version: 6.0
CC: rstrode
Target Milestone: rc
Keywords: RHELNAK, Triaged
Target Release: ---
Hardware: All
OS: Linux
Last Closed: 2017-12-06 10:41:03 UTC

Description Rik van Riel 2010-07-29 02:01:16 UTC
Description of problem:

On clicking the "Reboot" option in virt-manager (under the arrow on the right), while running a virtual machine that is scaled to the size of the window it is running in, X seizes up with an infinite SIGALRM loop.

Version-Release number of selected component (if applicable):

xorg-x11-server-Xorg-1.7.7-22.el6.x86_64

Steps to Reproduce:
1. run a virtual machine scaled to the window, through virt-manager
2. reboot the virtual machine, through virt-manager
3. log in over ssh to catch Xorg doing the following:
  
Actual results:

--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe)                       = -1 EINTR (Interrupted system call)
ioctl(9, 0x40086482, 0x7fff8950ec00)    = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe)                       = -1 EINTR (Interrupted system call)
ioctl(9, 0x40086482, 0x7fff8950ec00)    = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe)                       = -1 EINTR (Interrupted system call)
ioctl(9, 0x40086482, 0x7fff8950ec00)    = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe)                       = -1 EINTR (Interrupted system call)
ioctl(9, 0x40086482, 0x7fff8950ec00)    = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe)                       = -1 EINTR (Interrupted system call)
ioctl(9, 0x40086482, 0x7fff8950ec00)    = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe)                       = -1 EINTR (Interrupted system call)
ioctl(9, 0x40086482^C <unfinished ...>

Expected results:

The virtual machine restarts.

Additional info:

Is this the same SIGALRM loop I have noticed on Fedora 12 when opening a firefox tab while my mouse is crossing boundaries from one GPU/desktop to another?

This is on a different system though - the F12 system has two radeon cards, while the RHEL6 system has a single nvidia card.

Comment 2 RHEL Program Management 2010-07-29 02:27:33 UTC
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 3 Adam Jackson 2010-07-29 21:30:56 UTC
The SIGALRM is a symptom, not the disease.  We arm a timer when we start request processing so we can punish clients that take too much CPU time.  We don't disable it after the client's been sufficiently punished though.

So what's actually happening is you're stuck in that interruptible ioctl forever.  I don't know what that decodes to offhand, but since you're using virt-manager in the (insane) scaling mode I'm going to assume it's something to do with OpenGL.  Care to get a stack trace from gdb instead?
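The pattern described above (a periodic timer signal landing on a call that is blocked waiting for something else) can be reproduced in miniature. The following is only a sketch: `time.sleep` stands in for the blocking ioctl, the function name and the intervals are made up for illustration, and it must run in the main thread of a Unix process. It shows that the signal handler fires many times while the blocked call simply resumes each time and only returns once the thing it waits for is done.

```python
import signal
import time


def run_blocking_call_under_timer(duration=0.2, interval=0.01):
    """Block for `duration` seconds while SIGALRM fires every `interval` seconds.

    Returns (ticks, elapsed): how often the handler ran, and how long the
    blocking call took in total.
    """
    ticks = 0

    def on_alarm(signum, frame):
        # The handler only counts; it does nothing to end the wait.
        nonlocal ticks
        ticks += 1

    old_handler = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, interval, interval)
    start = time.monotonic()
    try:
        # Hypothetical stand-in for the blocking ioctl: Python resumes the
        # sleep after each interrupting signal until the full time has passed.
        time.sleep(duration)
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0, 0)  # disarm the timer
        signal.signal(signal.SIGALRM, old_handler)
    return ticks, time.monotonic() - start


if __name__ == "__main__":
    ticks, elapsed = run_blocking_call_under_timer()
    print(f"handler ran {ticks} times, call blocked for {elapsed:.3f}s")
```

If the awaited event never arrived, the same loop of signal and resume would continue indefinitely, which is what the strace output looks like.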

Comment 4 Adam Jackson 2010-07-29 21:37:31 UTC
Also, to be clear, are you talking about X in the host or in the guest?

Comment 5 Ray Strode [halfline] 2010-07-29 21:39:59 UTC
If you're talking about X in the host, then this may just be a duplicate of bug 617505

If you're talking about X in the guest, then bug 617505 is probably unrelated.

Comment 6 Rik van Riel 2010-07-30 14:02:00 UTC
I am talking about X in the host.  The SIGALRM loop continued after I killed the guest.

I'll try to get you a backtrace.

Comment 7 Rik van Riel 2010-08-01 22:56:11 UTC
Today I got X to lock up again.  All I had to do was drag the virt-manager window, without even opening the guest.  

Unfortunately the debuginfo packages do not seem to be complete, even after running the debuginfo-install command suggested by gdb. The missing symbols seem to be in libexa.so and other libs that are a part of xorg-x11-server-Xorg.

Just in case it is useful, there is another side effect: the lower 2/3 or so of the screen turns totally blank. I'm running in 1024x768 resolution.


Missing separate debuginfos, use: debuginfo-install xorg-x11-server-Xorg-1.7.7-22.el6.x86_64
(gdb) bt
#0  0x00000035892d95d7 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x0000003592e03388 in drmIoctl (fd=9, request=1074291842, 
    arg=0x7fff8bb80610) at xf86drm.c:184
#2  0x0000003592e0360b in drmCommandWrite (fd=<value optimized out>, 
    drmCommandIndex=<value optimized out>, data=<value optimized out>, 
    size=<value optimized out>) at xf86drm.c:2361
#3  0x00007fd226ce9dfd in nouveau_bo_wait (bo=0x21226f0, 
    cpu_write=<value optimized out>, no_wait=<value optimized out>, 
    no_block=<value optimized out>) at nouveau_bo.c:385
#4  0x00007fd226ce9fee in nouveau_bo_map_range (bo=0x21226f0, delta=0, 
    size=<value optimized out>, flags=12) at nouveau_bo.c:428
#5  0x00007fd226f0671e in ?? ()
   from /usr/lib64/xorg/modules/drivers/nouveau_drv.so
#6  0x00007fd2262a4297 in ?? () from /usr/lib64/xorg/modules/libexa.so
#7  0x00007fd2262a71b2 in ?? () from /usr/lib64/xorg/modules/libexa.so
#8  0x00007fd2262af9b9 in ?? () from /usr/lib64/xorg/modules/libexa.so
#9  0x00007fd2262a9ff5 in ?? () from /usr/lib64/xorg/modules/libexa.so
#10 0x00000000004b30cb in ?? ()
#11 0x000000000045780a in miPaintWindow ()
#12 0x0000000000457ba8 in miWindowExposures ()
#13 0x00000000004f8967 in ?? ()
#14 0x00000000005555eb in miHandleValidateExposures ()
#15 0x0000000000554a5d in miMoveWindow ()
#16 0x0000000000557dfe in ?? ()
#17 0x000000000043f902 in ConfigureWindow ()
#18 0x000000000044c467 in ?? ()
#19 0x000000000044ce8c in ?? ()
#20 0x0000000000421ffa in _start ()
(gdb)
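
The `request=1074291842` in frame #1 is the same number as the `0x40086482` in the strace output. As a rough aid, the sketch below splits it into the standard Linux ioctl fields (the common layout used on x86: 8 bits of command number, 8 bits of type, 14 bits of size, 2 bits of direction) and subtracts `DRM_COMMAND_BASE` (0x40) to get the driver-private command index. The function name and the returned dictionary keys are made up for this example, and the sketch does not try to name the nouveau command itself.

```python
# Field layout of a Linux ioctl request number (asm-generic layout, as on x86).
IOC_NRSHIFT, IOC_NRBITS = 0, 8
IOC_TYPESHIFT, IOC_TYPEBITS = 8, 8
IOC_SIZESHIFT, IOC_SIZEBITS = 16, 14
IOC_DIRSHIFT, IOC_DIRBITS = 30, 2

IOC_DIRECTIONS = {0: "none", 1: "write", 2: "read", 3: "read/write"}

# Driver-private DRM ioctls live in the range [0x40, 0xA0).
DRM_COMMAND_BASE = 0x40
DRM_COMMAND_END = 0xA0


def _field(value, shift, bits):
    """Extract a `bits`-wide field starting at bit `shift`."""
    return (value >> shift) & ((1 << bits) - 1)


def decode_ioctl(request):
    """Split an ioctl request number into its encoded fields."""
    nr = _field(request, IOC_NRSHIFT, IOC_NRBITS)
    decoded = {
        "direction": IOC_DIRECTIONS[_field(request, IOC_DIRSHIFT, IOC_DIRBITS)],
        "size": _field(request, IOC_SIZESHIFT, IOC_SIZEBITS),
        "type": chr(_field(request, IOC_TYPESHIFT, IOC_TYPEBITS)),
        "nr": nr,
        "driver_index": None,
    }
    # Only DRM ioctls (type 'd') in the driver-private range get an index.
    if decoded["type"] == "d" and DRM_COMMAND_BASE <= nr < DRM_COMMAND_END:
        decoded["driver_index"] = nr - DRM_COMMAND_BASE
    return decoded


if __name__ == "__main__":
    info = decode_ioctl(0x40086482)
    print(info)
```

For this request the fields come out as direction "write", size 8, type 'd' (the DRM ioctl type) and command number 0x82, that is, driver-private index 0x42. That is consistent with the backtrace: a write-only driver command issued through `drmCommandWrite` from `nouveau_bo_wait`.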

Comment 8 Rik van Riel 2010-08-01 22:59:07 UTC
I also found some nouveau error messages in my dmesg.  I do not know if these are from before or after X crashed:

[drm] nouveau 0000:04:04.0: PFIFO_DMA_PUSHER - Ch 1
[drm] nouveau 0000:04:04.0: PGRAPH_NOTIFY - nSource: DATA_ERROR, nStatus: BAD_ARGUMENT
[drm] nouveau 0000:04:04.0: PGRAPH_NOTIFY - Ch 1/3 Class 0x004a Mthd 0x0300 Data 0x01e03e90:0x00e3e2de
[drm] nouveau 0000:04:04.0: PGRAPH_NOTIFY - nSource: METHOD_CNT, nStatus: INVALID_STATE BAD_ARGUMENT
[drm] nouveau 0000:04:04.0: PGRAPH_NOTIFY - Ch 1/3 Class 0x004a Mthd 0x0c00 Data 0x01e03e90:0x00000000
[drm] nouveau 0000:04:04.0: PGRAPH_NOTIFY - nSource: METHOD_CNT, nStatus: INVALID_STATE BAD_ARGUMENT
[drm] nouveau 0000:04:04.0: PGRAPH_NOTIFY - Ch 1/3 Class 0x004a Mthd 0x0c04 Data 0x01e03e90:0x00000000
[drm] nouveau 0000:04:04.0: PGRAPH_NOTIFY - nSource: METHOD_CNT, nStatus: INVALID_STATE BAD_ARGUMENT
[drm] nouveau 0000:04:04.0: PGRAPH_NOTIFY - Ch 1/3 Class 0x004a Mthd 0x0c08 Data 0x01e03e90:0x00000000
[drm] nouveau 0000:04:04.0: PGRAPH_NOTIFY - nSource: METHOD_CNT, nStatus: INVALID_STATE BAD_ARGUMENT
[drm] nouveau 0000:04:04.0: PGRAPH_NOTIFY - Ch 1/3 Class 0x004a Mthd 0x0c0c Data 0x01e03e90:0x00000000
[drm] nouveau 0000:04:04.0: PGRAPH_NOTIFY - nSource: METHOD_CNT, nStatus: INVALID_STATE BAD_ARGUMENT
[drm] nouveau 0000:04:04.0: PGRAPH_NOTIFY - Ch 1/3 Class 0x004a Mthd 0x0c10 Data 0x01e03e90:0x00000000
[drm] nouveau 0000:04:04.0: PGRAPH_NOTIFY - nSource: METHOD_CNT, nStatus: INVALID_STATE BAD_ARGUMENT
[drm] nouveau 0000:04:04.0: PGRAPH_NOTIFY - Ch 1/3 Class 0x004a Mthd 0x0c14 Data 0x01e03e90:0x00000000
[drm] nouveau 0000:04:04.0: PGRAPH_NOTIFY - nSource: METHOD_CNT, nStatus: INVALID_STATE BAD_ARGUMENT
... (many more) ...
[drm] nouveau 0000:04:04.0: PGRAPH_NOTIFY - Ch 1/3 Class 0x004a Mthd 0x0df8 Data 0x01e03e90:0x00000000
[drm] nouveau 0000:04:04.0: PGRAPH_NOTIFY - nSource: METHOD_CNT, nStatus: INVALID_STATE BAD_ARGUMENT
[drm] nouveau 0000:04:04.0: PGRAPH_NOTIFY - Ch 1/3 Class 0x004a Mthd 0x0dfc Data 0x01e03e90:0x00000000
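
With this many near-identical lines, it can help to condense the log before reading it. The following is a small sketch, not part of any nouveau tooling: the regular expression is written against the line format pasted above, and the field names (`channel`, `sub`, `cls`, `mthd`) are labels chosen for this example, with the second number in `Ch 1/3` only assumed to be a subchannel. It extracts the class and method offset from each `Ch .../... Class ... Mthd ...` line and reports the range of methods seen per class.

```python
import re

# Matches lines such as:
# [drm] nouveau 0000:04:04.0: PGRAPH_NOTIFY - Ch 1/3 Class 0x004a Mthd 0x0300 Data 0x...:0x...
PGRAPH_RE = re.compile(
    r"PGRAPH_NOTIFY - Ch (?P<channel>\d+)/(?P<sub>\d+) "
    r"Class (?P<cls>0x[0-9a-fA-F]+) Mthd (?P<mthd>0x[0-9a-fA-F]+) "
    r"Data (?P<data0>0x[0-9a-fA-F]+):(?P<data1>0x[0-9a-fA-F]+)"
)


def parse_pgraph_lines(text):
    """Return one dict per PGRAPH_NOTIFY method line found in `text`."""
    records = []
    for line in text.splitlines():
        match = PGRAPH_RE.search(line)
        if match is None:
            continue  # nSource/nStatus lines and unrelated messages are skipped
        records.append({
            "channel": int(match["channel"]),
            "sub": int(match["sub"]),
            "cls": int(match["cls"], 16),
            "mthd": int(match["mthd"], 16),
        })
    return records


def method_range_by_class(records):
    """Map each class to (lowest method, highest method, number of lines)."""
    summary = {}
    for rec in records:
        low, high, count = summary.get(rec["cls"], (rec["mthd"], rec["mthd"], 0))
        summary[rec["cls"]] = (min(low, rec["mthd"]), max(high, rec["mthd"]), count + 1)
    return summary


if __name__ == "__main__":
    sample = """\
[drm] nouveau 0000:04:04.0: PGRAPH_NOTIFY - Ch 1/3 Class 0x004a Mthd 0x0300 Data 0x01e03e90:0x00e3e2de
[drm] nouveau 0000:04:04.0: PGRAPH_NOTIFY - nSource: METHOD_CNT, nStatus: INVALID_STATE BAD_ARGUMENT
[drm] nouveau 0000:04:04.0: PGRAPH_NOTIFY - Ch 1/3 Class 0x004a Mthd 0x0c00 Data 0x01e03e90:0x00000000
[drm] nouveau 0000:04:04.0: PGRAPH_NOTIFY - Ch 1/3 Class 0x004a Mthd 0x0c04 Data 0x01e03e90:0x00000000
"""
    for cls, (low, high, count) in method_range_by_class(parse_pgraph_lines(sample)).items():
        print(f"class 0x{cls:04x}: methods 0x{low:04x}..0x{high:04x} ({count} lines)")
```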

Comment 9 Adam Jackson 2010-08-02 13:46:37 UTC
Moving to nouveau.

Comment 10 Rik van Riel 2010-08-02 14:03:12 UTC
Btw, this is the video card:

04:04.0 VGA compatible controller: nVidia Corporation NV5M64 [RIVA TNT2 Model 64/Model 64 Pro] (rev 15)

Comment 11 Adam Jackson 2010-08-02 21:05:58 UTC
Drivers for cards released in 1999 are not 6.0 material.  Moving to 6.1.

Comment 12 RHEL Program Management 2011-01-07 04:33:21 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 13 Suzanne Logcher 2011-01-07 16:23:51 UTC
This request was erroneously denied for the current release of Red Hat
Enterprise Linux.  The error has been fixed and this request has been
re-proposed for the current release.

Comment 14 RHEL Program Management 2011-02-01 06:04:28 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 15 RHEL Program Management 2011-02-01 18:31:15 UTC
This request was erroneously denied for the current release of
Red Hat Enterprise Linux.  The error has been fixed and this
request has been re-proposed for the current release.

Comment 16 RHEL Program Management 2011-04-04 02:32:49 UTC
Since RHEL 6.1 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 17 RHEL Program Management 2011-10-07 16:16:18 UTC
Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 18 Jan Kurik 2017-12-06 10:41:03 UTC
Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com/