Bug 607077
Summary: | [mi] EQ overflowing. The server is probably stuck in an infinite loop | |
---|---|---|---
Product: | Red Hat Enterprise Linux 6 | Reporter: | Aleksandar Mihajlov <aleksandar.mihajlov>
Component: | xorg-x11-drv-nouveau | Assignee: | Ben Skeggs <bskeggs>
Status: | CLOSED WONTFIX | QA Contact: | Desktop QE <desktop-qa-list>
Severity: | high | Docs Contact: |
Priority: | low | |
Version: | 6.0 | CC: | ajschult784, matti.aarnio, rockowitz, vengmd
Target Milestone: | rc | Keywords: | Triaged
Target Release: | --- | Flags: | aleksandar.mihajlov: needinfo-
Hardware: | All | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2017-12-06 11:29:48 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Major release. This request is not yet committed for inclusion.

Backtrace:
0: /usr/bin/Xorg (xorg_backtrace+0x28) [0x46e898]
1: /usr/bin/Xorg (mieqEnqueue+0x1f4) [0x45ee24]
2: /usr/bin/Xorg (xf86PostMotionEventP+0xce) [0x4840be]
3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f78e9b11000+0x516f) [0x7f78e9b1616f]
4: /usr/bin/Xorg (0x400000+0x7b227) [0x47b227]
5: /usr/bin/Xorg (0x400000+0x10d163) [0x50d163]
6: /lib64/libpthread.so.0 (0x34f0000000+0xf0f0) [0x34f000f0f0]
7: /lib64/libc.so.6 (ioctl+0x7) [0x34ef4d69d7]
8: /usr/lib64/libdrm.so.2 (drmIoctl+0x23) [0x3500c03383]
9: /usr/lib64/libdrm.so.2 (drmCommandWrite+0x1b) [0x3500c0360b]
10: /usr/lib64/libdrm_nouveau.so.1 (0x7f78ed1e4000+0x2f1d) [0x7f78ed1e6f1d]
11: /usr/lib64/libdrm_nouveau.so.1 (nouveau_bo_map_range+0xfc) [0x7f78ed1e711c]
12: /usr/lib64/libdrm_nouveau.so.1 (0x7f78ed1e4000+0x2106) [0x7f78ed1e6106]
13: /usr/lib64/libdrm_nouveau.so.1 (nouveau_pushbuf_flush+0x29c) [0x7f78ed1e649c]
14: /usr/lib64/xorg/modules/drivers/nouveau_drv.so (0x7f78ed408000+0x3abdd) [0x7f78ed442bdd]
15: /usr/lib64/xorg/modules/libexa.so (0x7f78eafa5000+0xd196) [0x7f78eafb2196]
16: /usr/lib64/xorg/modules/libexa.so (0x7f78eafa5000+0xe072) [0x7f78eafb3072]
17: /usr/bin/Xorg (0x400000+0xcadc0) [0x4cadc0]
18: /usr/bin/Xorg (0x400000+0xc12de) [0x4c12de]
19: /usr/bin/Xorg (0x400000+0x421fc) [0x4421fc]
20: /usr/bin/Xorg (0x400000+0x21d8a) [0x421d8a]
21: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x34ef41eb1d]
22: /usr/bin/Xorg (0x400000+0x21949) [0x421949]

Thanks for the bug report. We have reviewed the information you have provided above, and there is some additional information we require that will be helpful in our diagnosis of this issue.
Please add drm.debug=0x04 to the kernel command line, restart the computer, wait until Xorg crashes, switch to a console (Ctrl-Alt-F2), then collect and attach

* your X server config file (/etc/X11/xorg.conf, if available),
* output of the dmesg command, and
* system log (/var/log/messages)

to the bug report as individual uncompressed file attachments using the bugzilla file attachment link above.

We will review this issue again once you've had a chance to attach this information. Thanks in advance.

Created attachment 427316 [details]
/var/log/messages
Created attachment 427317 [details]
output of dmesg
It happened again, but this time I couldn't switch to a console with Ctrl+Alt+F2, so I had to reboot the machine. Maybe it isn't just an X problem? I could ping the machine, but I couldn't log in over ssh. The first messages after the reboot start at Jun 28 10:23. I also attached the output of dmesg, but it was captured after the reboot. I don't know if this is useful, but it is all I have.

Created attachment 427576 [details]
output of dmesg 29.06.
Created attachment 427577 [details]
/var/log/messages 29.06.
Created attachment 427578 [details]
xorg.conf
Created attachment 427579 [details]
Xorg log 29.06.
Created attachment 427580 [details]
output of strace
Created attachment 427581 [details]
output of top command
Created attachment 427582 [details]
file descriptors of X
OK, this time I have more useful data. I could still access the machine even though X was frozen, so I collected more data. You can find:

* output of dmesg
* /var/log/messages
* xorg.conf
* Xorg.0.log
* output of strace (strace -p <PID of Xorg>)
* output of the top command (X is taking from 95% to 100% of CPU)
* list of Xorg file descriptors (ls -l /proc/<PID>/fd)

As far as I can see from strace, Xorg is stuck in:

.....
ioctl(11, 0x40086485, 0x7fff94dcac70) = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe) = -1 EINTR (Interrupted system call)
ioctl(11, 0x40086485, 0x7fff94dcac70) = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xe) = -1 EINTR (Interrupted system call)
ioctl(11, 0x40086485, 0x7fff94dcac70) = ? ERESTARTSYS (To be restarted)
......

where file descriptor 11 is /dev/dri/card0.

I hope this is more useful than the previous logs.

This issue has been proposed when we are only considering blocker issues in the current Red Hat Enterprise Linux release. It has been denied for the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current release, ask your support representative to file as a blocker on your behalf. Otherwise ask that it be considered for the next Red Hat Enterprise Linux release. **

This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux. If you would like it considered as an exception in the current release, please ask your support representative.

This request was erroneously denied for the current release of Red Hat Enterprise Linux.
The error has been fixed and this request has been re-proposed for the current release.

This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux. If you would like it considered as an exception in the current release, please ask your support representative.

This request was erroneously denied for the current release of Red Hat Enterprise Linux. The error has been fixed and this request has been re-proposed for the current release.

Since RHEL 6.1 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.

Did you still see this issue in 6.1?

Since RHEL 6.2 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.

I found a way to easily reproduce this error on Fedora 20. I always boot to init state 3 (text console), log in, and then explicitly start X with the command:

$ startx > x.log 2>&1 &

This enables me to show you the log extract below.

xorg-x11-drv-nouveau-1.0.9-2.fc20.x86_64
xorg-x11-server-Xorg-1.14.4-7.fc20.x86_64

The way I trigger this is simple:

0) Have a PC with a "GeForce GTX 550 Ti" video card, 16 GB RAM, 4+ x86-64 cores.
1) Have a text terminal open in X (the desktop suite does not matter).
2) Go to a directory with around 30 .doc files.
3) Run the command: ooffice *.doc
4) Wait about a minute (15-20 docs to open) and the X server should crash.

The screen goes black and a non-standard mouse cursor appears. The keyboard drops off the USB bus, while the (USB) mouse keeps working. I didn't test whether unplugging and re-plugging the keyboard recovers it.

Log in to the machine over the network (from another machine) and run "init 6" to reboot it. Otherwise the x.log file may be incomplete: if you just press the RESET button, the X server's alert data may not have been written all the way to disk.

The important thing is to have ooffice open many documents in quick succession, which seems to provoke some sort of timing-dependent deadlock.

A representative sample of 'file' output for these documents, which have 3-5 pages of text and no pictures:

B-CR-TS102204-14-Errata v1.doc: Composite Document File V2 Document, Little Endian, Os: Windows, Version 5.0, Code page: 1252, Title: CR template v1.5.0, Author: xxxxx, MCC, Keywords: CR, template, Template: 3gpp_70.dot, Last Saved By: xxxxx, Revision Number: 7, Name of Creating Application: Microsoft Word 8.0, Total Editing Time: 09:00, Last Printed: Fri Feb 13 14:58:00 2004, Create Time/Date: Wed Nov 24 13:34:00 2004, Last Saved Time/Date: Fri Dec 3 00:09:00 2004, Number of Pages: 1, Number of Words: 665, Number of Characters: 3791, Security: 0

---------------------------------
(EE) [mi] EQ overflowing. Additional events will be discarded until existing events are processed.
(EE)
(EE) Backtrace:
(EE) 0: /usr/bin/X (?+0x33) [0x583373]
(EE) 1: /usr/bin/X (?+0x33) [0x451453]
(EE) 2: /usr/lib64/xorg/modules/input/evdev_drv.so (_init+0x2d1d) [0x7f2212d07c8d]
(EE) 3: /usr/bin/X (?+0x2d1d) [0x48dfed]
(EE) 4: /usr/bin/X (?+0x2d1d) [0x4b73bd]
(EE) 5: /lib64/libpthread.so.0 (__restore_rt+0x0) [0x381fa0f74f]
(EE) 6: /lib64/libc.so.6 (ioctl+0x7) [0x381eeec067]
(EE) 7: /lib64/libdrm.so.2 (drmIoctl+0x34) [0x3825a036e4]
(EE) 8: /lib64/libdrm.so.2 (drmCommandWrite+0x1e) [0x3825a05fce]
(EE) 9: /lib64/libdrm_nouveau.so.2 (nouveau_bo_wait+0x99) [0x7f2215b196e9]
(EE) 10: /lib64/libdrm_nouveau.so.2 (nouveau_pushbuf_space+0xd1) [0x7f2215b1a9e1]
(EE) 11: /usr/lib64/xorg/modules/drivers/nouveau_drv.so (_init+0x1edf9) [0x7f2215d60e29]
(EE) 12: /usr/lib64/xorg/modules/libexa.so (exaEnableDisableFBAccess+0x91e) [0x7f22156e84fe]
(EE) 13: /usr/lib64/xorg/modules/libexa.so (exaEnableDisableFBAccess+0x120c) [0x7f22156e9c3c]
(EE) 14: /usr/lib64/xorg/modules/libexa.so (exaMoveOutPixmap+0x7fa5) [0x7f22156ee2b5]
(EE) 15: /usr/bin/X (?+0x7fa5) [0x533875]
(EE) 16: /usr/bin/X (?+0x7fa5) [0x52ce35]
(EE) 17: /usr/bin/X (?+0x7fa5) [0x441f25]
(EE) 18: /usr/bin/X (?+0x7fa5) [0x4304a5]
(EE) 19: /lib64/libc.so.6 (__libc_start_main+0xf5) [0x381ee21d65]
(EE) 20: /usr/bin/X (?+0xf5) [0x428d01]
(EE) 21: ? (?+0xf5) [0xf5]
(EE)
(EE) [mi] These backtraces from mieqEnqueue may point to a culprit higher up the stack.
(EE) [mi] mieq is *NOT* the cause. It is a victim.
(EE) [mi] EQ overflow continuing. 100 events have been dropped.
(EE)
(EE) Backtrace:
(EE) 0: /usr/bin/X (?+0x1) [0x451421]
(EE) 1: /usr/lib64/xorg/modules/input/evdev_drv.so (_init+0x2d1d) [0x7f2212d07c8d]
(EE) 2: /usr/bin/X (?+0x2d1d) [0x48dfed]
(EE) 3: /usr/bin/X (?+0x2d1d) [0x4b73bd]
(EE) 4: /lib64/libpthread.so.0 (__restore_rt+0x0) [0x381fa0f74f]
(EE) 5: /lib64/libc.so.6 (ioctl+0x7) [0x381eeec067]
(EE) 6: /lib64/libdrm.so.2 (drmIoctl+0x34) [0x3825a036e4]
(EE) 7: /lib64/libdrm.so.2 (drmCommandWrite+0x1e) [0x3825a05fce]
(EE) 8: /lib64/libdrm_nouveau.so.2 (nouveau_bo_wait+0x99) [0x7f2215b196e9]
(EE) 9: /lib64/libdrm_nouveau.so.2 (nouveau_pushbuf_space+0xd1) [0x7f2215b1a9e1]
(EE) 10: /usr/lib64/xorg/modules/drivers/nouveau_drv.so (_init+0x1edf9) [0x7f2215d60e29]
(EE) 11: /usr/lib64/xorg/modules/libexa.so (exaEnableDisableFBAccess+0x91e) [0x7f22156e84fe]
(EE) 12: /usr/lib64/xorg/modules/libexa.so (exaEnableDisableFBAccess+0x120c) [0x7f22156e9c3c]
(EE) 13: /usr/lib64/xorg/modules/libexa.so (exaMoveOutPixmap+0x7fa5) [0x7f22156ee2b5]
(EE) 14: /usr/bin/X (?+0x7fa5) [0x533875]
(EE) 15: /usr/bin/X (?+0x7fa5) [0x52ce35]
(EE) 16: /usr/bin/X (?+0x7fa5) [0x441f25]
(EE) 17: /usr/bin/X (?+0x7fa5) [0x4304a5]
(EE) 18: /lib64/libc.so.6 (__libc_start_main+0xf5) [0x381ee21d65]
(EE) 19: /usr/bin/X (?+0xf5) [0x428d01]
(EE) 20: ? (?+0xf5) [0xf5]
(EE)
(EE) [mi] EQ overflow continuing. 200 events have been dropped.
---------------------------------

dmesg showed the following (typical extracts only, not the thousands of repeats of the same pairs). Out of some 16 000 log lines, the majority are repeats like these four:

kernel: [467685.150774] nouveau E[ PGRAPH][0000:02:00.0] TRAP ch 2 [0x023fc00000 X[2079]]
kernel: [467685.150782] nouveau E[ PGRAPH][0000:02:00.0] SHADER 0xa0040a0e
kernel: [467685.150799] nouveau E[ PGRAPH][0000:02:00.0] TRAP ch 2 [0x023fc00000 X[2079]]
kernel: [467685.150803] nouveau E[ PGRAPH][0000:02:00.0] SHADER 0xa0040a0e

With those removed, the remaining kernel messages are:

kernel: [467685.155755] nouveau E[ PGRAPH][0000:02:00.0] GPC0/TPC0/MP trap: INVALID_OPCODE
kernel: [467685.155763] nouveau E[ PGRAPH][0000:02:00.0] GPC0/TPC2/MP trap: INVALID_OPCODE
kernel: [467685.155769] nouveau E[ PGRAPH][0000:02:00.0] GPC0/TPC3/MP trap: INVALID_OPCODE
kernel: [467685.155793] nouveau E[ PGRAPH][0000:02:00.0] GPC0/TPC1/MP trap: INVALID_OPCODE
kernel: nouveau E[ PGRAPH][0000:02:00.0] GPC0/TPC0/MP trap: INVALID_OPCODE
kernel: nouveau E[ PGRAPH][0000:02:00.0] GPC0/TPC2/MP trap: INVALID_OPCODE
kernel: nouveau E[ PGRAPH][0000:02:00.0] GPC0/TPC3/MP trap: INVALID_OPCODE
kernel: nouveau E[ PGRAPH][0000:02:00.0] GPC0/TPC1/MP trap: INVALID_OPCODE
kernel: [467685.938554] nouveau E[ PFIFO][0000:02:00.0] read fault at 0x0030b40000 [INVALID_STORAGE_TYPE] from PGRAPH/GPC0/(unknown enum 0x00000007) on channel 0x023fc00000 [X[2079]]
kernel: nouveau E[ PFIFO][0000:02:00.0] read fault at 0x0030b40000 [INVALID_STORAGE_TYPE] from PGRAPH/GPC0/(unknown enum 0x00000007) on channel 0x023fc00000 [X[2079]]

Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com/
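
The manual reduction of the kernel log described in the dmesg comment (setting aside the repeated TRAP/SHADER pairs to see what else is there) can be approximated with a short script. This is a minimal sketch and was not used in this bug: the function name `condense_kernel_log` is hypothetical, and it assumes every message has the `kernel: [timestamp] text` shape quoted in that comment, with the timestamp optional. It strips the timestamp and counts how often each distinct message occurs.

```python
import re
from collections import Counter

# Everything up to "kernel:", plus an optional "[seconds.micros]" timestamp.
# Assumption: log lines look like the excerpts quoted in this report.
_PREFIX = re.compile(r"^.*?kernel:\s*(?:\[\s*\d+\.\d+\]\s*)?")


def condense_kernel_log(lines):
    """Return (message, count) pairs, most frequent first, ignoring timestamps."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if "kernel:" not in line:
            continue  # not a kernel message
        counts[_PREFIX.sub("", line, count=1)] += 1
    return counts.most_common()


sample = [
    "kernel: [467685.150774] nouveau E[ PGRAPH][0000:02:00.0] TRAP ch 2 [0x023fc00000 X[2079]]",
    "kernel: [467685.150782] nouveau E[ PGRAPH][0000:02:00.0] SHADER 0xa0040a0e",
    "kernel: [467685.150799] nouveau E[ PGRAPH][0000:02:00.0] TRAP ch 2 [0x023fc00000 X[2079]]",
    "kernel: [467685.150803] nouveau E[ PGRAPH][0000:02:00.0] SHADER 0xa0040a0e",
    "kernel: [467685.155755] nouveau E[ PGRAPH][0000:02:00.0] GPC0/TPC0/MP trap: INVALID_OPCODE",
]

for message, count in condense_kernel_log(sample):
    print(f"{count:6d}  {message}")
```

Run over a full /var/log/messages, the rare entries (here the INVALID_OPCODE trap) end up at the bottom of the list instead of being buried among the repeats.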
Created attachment 426178 [details]
X log

Description of problem:
After several days of normal work, X stops working. It doesn't respond to Ctrl+Alt+Backspace. The only way to recover the machine is to reboot.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
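
The request number that repeats in the strace excerpt in the comments, `0x40086485`, can be unpacked to check which subsystem Xorg was stuck calling. The sketch below is an illustration and not part of the original report: the helper name `decode_ioctl` is hypothetical, and it assumes the generic Linux ioctl bit layout used on x86 (8-bit number, 8-bit type, 14-bit size, 2-bit direction). The DRM constants (`'d'` as the ioctl type, 0x40 to 0xA0 as the driver-specific command range) are taken from the kernel's DRM header. Which nouveau command the resulting index corresponded to in this kernel version is not established here.

```python
# Generic Linux ioctl request layout on x86:
# bits 0-7 number, bits 8-15 type, bits 16-29 argument size, bits 30-31 direction.
_DIRECTIONS = {0: "none", 1: "write", 2: "read", 3: "read/write"}

DRM_IOCTL_BASE = ord("d")   # ioctl type used by the DRM subsystem
DRM_COMMAND_BASE = 0x40     # first driver-specific DRM command number
DRM_COMMAND_END = 0xA0      # end of the driver-specific range (exclusive)


def decode_ioctl(request):
    """Split an ioctl request number into its encoded fields."""
    number = request & 0xFF
    type_code = (request >> 8) & 0xFF
    fields = {
        "direction": _DIRECTIONS[(request >> 30) & 0x3],
        "size": (request >> 16) & 0x3FFF,
        "type": chr(type_code),
        "number": number,
    }
    # For DRM requests in the driver range, report the driver-relative index.
    if type_code == DRM_IOCTL_BASE and DRM_COMMAND_BASE <= number < DRM_COMMAND_END:
        fields["driver_command"] = number - DRM_COMMAND_BASE
    return fields


print(decode_ioctl(0x40086485))
```

For the value from the strace output this gives a write-direction request of type 'd' with an 8-byte argument and driver command index 0x45. That is consistent with the drmCommandWrite frame in the backtraces: the server keeps re-issuing a driver-specific DRM call on /dev/dri/card0 that is interrupted by SIGALRM before it completes.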