Bug 856790

Summary:	abrt Xorg bug filing isn't useful
Product:	[Fedora] Fedora	Reporter:	Dave Airlie <airlied>
Component:	abrt	Assignee:	abrt <abrt-devel-list>
Status:	CLOSED WONTFIX	QA Contact:	Fedora Extras Quality Assurance <extras-qa>
Severity:	unspecified	Docs Contact:
Priority:	high
Version:	17	CC:	abrt-devel-list, dvlasenk, iprikryl, jfilak, jmoskovc, mmilata, mtoman, rvokal
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2013-08-01 16:19:23 UTC	Type:	Bug
Bug Blocks:	756771

Description Dave Airlie 2012-09-12 19:40:51 UTC

Currently abrt is filing lots of Xorg server crashes that aren't useful

a) missing debuginfo in the backtrace - so no way to actually tell what's going wrong

b) some backtraces aren't all-out server crashes, they are just rendering operations taking an increasingly long time (gpu could be being reset underneath)

c) attachments are less than useful, no dmesg, no /var/log/Xorg.?.log files.

d) binary driver bugs are being logged, nvidia_drv.so, fglrx_drv.so.

Even with these fixed, I'm not sure the X team is going to be able to spend time doing anything but closing the bugs unread, as it's flooding our bug inbox.

Comment 1 Jiri Moskovcak 2012-09-13 08:59:46 UTC

(In reply to comment #0)
> Currently abrt is filing lots of Xorg server crashes that aren't useful
> 
> a) missing debuginfo in the backtrace - so no way to actually tell what's
> going wrong

- we will improve this and won't allow reporting crashes with incomplete backtraces

> 
> b) some backtraces aren't all-out server crashes, they are just rendering
> operations taking an increasingly long time (gpu could be being reset
> underneath)

- is there a pattern we can look for to recognize such problems, so we can make ABRT ignore them?

> 
> c) attachments are less than useful, no dmesg, no /var/log/Xorg.?.log files.
> 

- that sounds like a bug
- Xorg reports from abrt should contain the Xorg*log files

> d) binary driver bugs are being logged, nvidia_drv.so, fglrx_drv.so.
> 

- we can simply ignore Xorg crashes when the kernel is tainted (even though we will lose some legit reports...)
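
A minimal sketch of such a taint check, not the actual ABRT code. It assumes the check runs from a shell event script and reads /proc/sys/kernel/tainted, where bit 0 is set once a proprietary module has been loaded. The function name kernel_has_proprietary_taint is hypothetical, and testing only bit 0 (rather than any non-zero value) is an assumption made here to lose fewer legitimate reports.

```shell
# Hypothetical helper: succeeds when the taint bitmask has bit 0 set,
# i.e. a proprietary module has been loaded.
# $1 is the taint value; when omitted, the running kernel's value is used.
kernel_has_proprietary_taint() {
    taint="${1:-$(cat /proc/sys/kernel/tainted 2>/dev/null || echo 0)}"
    [ $(( $taint & 1 )) -ne 0 ]
}

# Possible use in an event script (message text is illustrative):
# kernel_has_proprietary_taint &&
# { echo "kernel tainted by a proprietary module - not saving the crash"; exit 1; }
```

Passing the value as an argument keeps the helper easy to try out without a tainted kernel, e.g. `kernel_has_proprietary_taint 4097`.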

> Even with these fixed, I'm not sure the X team is going to be able to spend
> time doing anything but closing the bugs unread, as it's flooding our bug
> inbox.

- we have created a new server which will aggregate and filter the reports from ABRT which should significantly lower the noise in bugzilla

Please see:
https://retrace.fedoraproject.org/faf/problems/hot/

Comment 2 Dave Airlie 2012-09-13 10:24:42 UTC

(In reply to comment #1)
> (In reply to comment #0)
> > Currently abrt is filing lots of Xorg server crashes that aren't useful
> > 
> > a) missing debuginfo in the backtrace - so no way to actually tell what's
> > going wrong
> 
> - we will improve this and won't allow reporting crashes with incomplete
> backtraces

Yeah I'm not 100% sure how good the Xorg backtrace code is at using debug symbols properly, but at the moment it gives a couple of symbols in the server and nothing else useful.

> 
> > 
> > b) some backtraces aren't all-out server crashes, they are just rendering
> > operations taking an increasingly long time (gpu could be being reset
> > underneath)
> 
> - is there a pattern we can look for to recognize such problems, so we can
> make ABRT ignore them?

Not really; if the server hasn't died, it might just be a transitory stall.

> 
> > 
> > c) attachments are less than useful, no dmesg, no /var/log/Xorg.?.log files.
> > 
> 
> - that sounds like a bug
> - Xorg reports from abrt should contain the Xorg*log files

Most of them have two attachments; both look to be tarballs of xorg.conf.d dirs.


> > d) binary driver bugs are being logged, nvidia_drv.so, fglrx_drv.so.
> > 
> 
> - we can simply ignore Xorg crashes when the kernel is tainted (even though
> we will lose some legit reports...)

Or just grep for nvidia_drv.so or fglrx_drv.so and drop the report if found.

> 
> > Even with these fixed, I'm not sure the X team is going to be able to spend
> > time doing anything but closing the bugs unread, as it's flooding our bug
> > inbox.
> 
> - we have created a new server which will aggregate and filter the reports
> from ABRT which should significantly lower the noise in bugzilla
> 
> Please see:
> https://retrace.fedoraproject.org/faf/problems/hot/

Sounds good.

Comment 3 Denys Vlasenko 2012-10-16 10:40:40 UTC

(In reply to comment #0)
> Currently abrt is filing lots of Xorg server crashes that aren't useful
> 
> a) missing debuginfo in the backtrace - so no way to actually tell what's
> going wrong
> 
> b) some backtraces aren't all-out server crashes, they are just rendering
> operations taking an increasingly long time (gpu could be being reset
> underneath)
> 
> c) attachments are less than useful, no dmesg, no /var/log/Xorg.?.log files.
> 
> d) binary driver bugs are being logged, nvidia_drv.so, fglrx_drv.so.

The processing of xorg problems is described in this file:
/etc/libreport/events.d/xorg_event.conf

It's bash shell code. Please take a look at it.

I don't see why Xorg.0.log is missing in some BZs; the code to save it in xorg_event.conf is straightforward:
    test -f /var/log/Xorg.0.log && cp /var/log/Xorg.0.log .
Any ideas why it fails?

For now I am adding code to drop backtraces which contain nvidia_drv.so or fglrx_drv.so, like this:

EVENT=post-create analyzer=xorg
+        # Blacklist known binary-only modules:
+        grep /nvidia_drv.so backtrace &&
+        { echo "nvidia_drv.so was loaded - not saving the crash"; exit 1; }
+        grep /fglrx_drv.so backtrace &&
+        { echo "fglrx_drv.so was loaded - not saving the crash"; exit 1; }
         ...

and also adding saving of dmesg. Are there any other modules you want to add to the blacklist?
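
If the blacklist grows, the per-module grep lines could be folded into a loop. This is only a sketch under assumptions, not the committed code: the variable BLACKLISTED_MODULES and the function find_blacklisted_module are hypothetical names, and the list holds just the two modules named in this bug.

```shell
# Hypothetical list of binary-only modules; only the two named in this bug.
BLACKLISTED_MODULES="nvidia_drv.so fglrx_drv.so"

# Prints the first blacklisted module found in the backtrace file ($1)
# and succeeds; prints nothing and fails when none is present.
find_blacklisted_module() {
    for module in $BLACKLISTED_MODULES; do
        # -F matches the name literally, -q keeps grep itself silent.
        if grep -qF "/$module" "$1"; then
            echo "$module"
            return 0
        fi
    done
    return 1
}

# Possible use in the event script:
# module=$(find_blacklisted_module backtrace) &&
# { echo "$module was loaded - not saving the crash"; exit 1; }
```

Adding another module then only means extending the list.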

> Even with these fixed, I'm not sure the X team is going to be able to spend
> time doing anything but closing the bugs unread, as it's flooding our bug
> inbox.

The "flooding our bug inbox" situation must be avoided, even at the cost of losing some reports. Please let us know which reports are least useful/useless and we can just drop (not report) them. For example, maybe if Xorg.0.log is not found, then don't bother reporting?
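
A small sketch of that last idea, assuming the log has been copied into the problem directory as Xorg.0.log (as the cp line quoted above does). The function name should_report_xorg_problem is hypothetical, and treating an empty log the same as a missing one is an extra assumption.

```shell
# Hypothetical check: succeeds only if the problem directory ($1)
# contains a non-empty Xorg.0.log.
should_report_xorg_problem() {
    [ -s "$1/Xorg.0.log" ]
}

# Possible use in the event script:
# should_report_xorg_problem . ||
# { echo "Xorg.0.log is missing - not saving the crash"; exit 1; }
```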

Comment 4 Denys Vlasenko 2012-10-16 10:43:58 UTC

commit 88773edc1045800fbcc3f41d00345be46d743d98
Author: Denys Vlasenko <vda.linux>
Date:   Tue Oct 16 12:42:29 2012 +0200

    xorg_event: make post-create save dmesg, drop problems w/ binary modules

Comment 5 Denys Vlasenko 2012-10-16 17:51:21 UTC

(In reply to comment #0)
> a) missing debuginfo in the backtrace - so no way to actually tell what's
> going wrong

We could probably generate a better backtrace if the X server stopped trying to handle its crashes and just dumped core as other programs do.

As it stands now, the X server intercepts SIGSEGV etc., handles the crash internally (produces a backtrace and emits it to the log file), then exits, and we are limited to scrubbing the Xorg.0.log file.

A dedicated tool for automatic crash preprocessing based on coredump analysis can do better. Can't say we are there yet (I might be biased), but we are trying. :)


> b) some backtraces aren't all-out server crashes, they are just rendering
> operations taking an increasingly long time (gpu could be being reset
> underneath)

We can filter out these if you let us know how.
I see the following code in xorg-server-1.12.0/os/osinit.c:

if (sip->si_code == SI_USER) {
     ErrorF("Recieved signal %d sent by process %ld, uid %ld\n",
             ^^^^^^^^ BTW, typo here...
            signo, (long) sip->si_pid, (long) sip->si_uid);
} else {
    switch (signo) {
        case SIGSEGV:
        case SIGBUS:
        case SIGILL:
        case SIGFPE:
            ErrorF("%s at address %p\n", strsignal(signo), sip->si_addr);

I can make it so that only those backtraces which have the " at address " string preceding them are reported. That is, I can make it so that only SIGSEGV/SIGBUS/SIGILL/SIGFPE crashes are reported. Do you want this change?
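
A rough sketch of what that filter could look like in the event script, assuming it runs over a copy of the X server log. The function name is_fatal_signal_log is hypothetical, and the check only tests that the string occurs somewhere in the file; it does not verify where it sits relative to the backtrace.

```shell
# Hypothetical filter: succeeds if the log file ($1) contains a line of the
# form "<signal description> at address <addr>", which the osinit.c snippet
# above prints only for SIGSEGV, SIGBUS, SIGILL and SIGFPE.
is_fatal_signal_log() {
    grep -qF ' at address ' "$1"
}

# Possible use in the event script:
# is_fatal_signal_log Xorg.0.log ||
# { echo "no fatal signal found in the log - not saving the crash"; exit 1; }
```

Logs that only contain the SI_USER message (a signal sent by another process) would be dropped by this check.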

Comment 6 Adam Jackson 2012-10-19 22:40:39 UTC

(In reply to comment #5)
> (In reply to comment #0)
> > a) missing debuginfo in the backtrace - so no way to actually tell what's
> > going wrong
> 
> We could probably generate a better backtrace if the X server stopped trying to
> handle its crashes and just dumped core as other programs do.

Now that the retrace server exists that's not a terrible idea, assuming that actually works for suid-root apps.

Comment 7 Fedora End Of Life 2013-07-04 05:19:16 UTC

This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 17 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to change the
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 8 Fedora End Of Life 2013-08-01 16:19:33 UTC

Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.