Bug 781461

Summary: .xsession-errors file will fill up HDD inordinately
Product: [Fedora] Fedora
Reporter: Bill C. Riemers <briemers>
Component: gdm
Assignee: Ray Strode [halfline] <rstrode>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: low
Version: 16
CC: alfredo.maria.ferrari, collura, darwish.07, jmccann, johannbg, johannbg, john.kissane, kitsuta, lpoetter, metherid, mschmidt, mwoehlke.floss, neil.bryant, notting, plautrba, rstrode, systemd-maint, wtogami, xgl-maint
Keywords: FutureFeature
Hardware: All
OS: Linux
Doc Type: Enhancement
Clone Of: 495190
Clones: 781462
Last Closed: 2013-02-14 00:49:56 UTC
Bug Depends On: 495190
Bug Blocks: 473302, 781462

Description Bill C. Riemers 2012-01-13 14:25:01 UTC
I'm cloning this bug because it also happens on Fedora 14, 15, and 16, not just Rawhide.  I think Fedora 14 is EOL, so I will only clone for Fedora 15 and Fedora 16.  About the only way to recover is to reboot.  Typically GNOME will not be able to capture things like NetworkManager again if I simply log out, remove the file, toggle my init level to restart GNOME, and log back in.

To me, this is an EXTREMELY serious bug.  An error file should never be allowed to grow indefinitely.  What is worse, it is written at such a fast rate that virtually all the disk I/O is used.  And since Linux will gladly fill all available memory with the write cache, the system invariably ends up needing to swap, at which point it slows down so much that it might be easier to pull the power cord than to attempt a graceful recovery.  Really quite unacceptable.

Generally the last few thousand lines of the file are messages warning that the disk is almost full.  Great.  If my disk is almost full, fill it up with messages telling me that...

One has to look much earlier in the file to find the real error that caused the problem.  I have not reported this problem in the past because I needed to remove the file; I do not necessarily have the time to copy a 30-40 GB file to an external device just so I can parse through it for the error that caused it.  This time I actually used dd to copy the first 1 GB so I could examine it, so I will report that as a separate bug.  But to me the problem with the error file is far more critical.  Even if we fix the underlying application this time, next week we'll find another.  Worse, because of the problem with the error file, people will not be reporting those underlying errors.

In Java, if a message is repeated multiple times, the log typically just lists the repeats like:

Previous message repeated XX times.

Or if it is a repeated series:

Previous 6 messages repeated XX times.

Implementing something like this as well as a log rotation file cap would be a great idea.  But for a first pass, simple log rotation would do.
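The "Previous message repeated XX times" scheme described above could look roughly like the Python sketch below. It is an illustration only, not something gdm or a Java logging library ships: `collapse_repeats` and `max_period` are hypothetical names, it works on a finished list of lines rather than a live stream, and XX counts the copies after the first one.

```python
from typing import List


def _plural(n: int, word: str) -> str:
    return f"{n} {word}{'' if n == 1 else 's'}"


def collapse_repeats(lines: List[str], max_period: int = 6) -> List[str]:
    """Collapse a run of one repeated line, or of a repeated series of up to
    max_period lines, into a single copy plus a summary line."""
    out: List[str] = []
    i, n = 0, len(lines)
    while i < n:
        for p in range(1, max_period + 1):
            block = lines[i:i + p]
            if len(block) == p and lines[i + p:i + 2 * p] == block:
                # Count how many back-to-back copies of `block` start at i.
                copies = 2
                while lines[i + copies * p:i + (copies + 1) * p] == block:
                    copies += 1
                out.extend(block)
                what = "message" if p == 1 else f"{p} messages"
                out.append(f"Previous {what} repeated {_plural(copies - 1, 'time')}.")
                i += copies * p
                break
        else:
            # No repeated series starts here; keep the line unchanged.
            out.append(lines[i])
            i += 1
    return out


if __name__ == "__main__":
    log = ["disk almost full"] * 4 + ["err A", "err B"] * 3 + ["done"]
    for line in collapse_repeats(log):
        print(line)
```

Checking the shortest period first means a run of identical lines is reported as one message rather than as a two-line series; a line that reappears later without being adjacent to its earlier copy is left alone.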

Or for that matter, even occasionally closing and reopening the file would help, as it would mean that if the file was removed manually, it would not continue to consume disk space until a restart.
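A minimal Python sketch of the close-and-reopen idea, assuming the session's output passes through a small relay process that owns the log file (the class name `ReopeningWriter` is hypothetical). Before each write it checks whether the path still names the file it has open and reopens it if not, which is the same device/inode check that Python's `logging.handlers.WatchedFileHandler` performs.

```python
import os
import tempfile


class ReopeningWriter:
    """Append lines to `path`, reopening it whenever the file at `path` has
    been deleted or replaced since it was opened."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._fh = open(path, "a", encoding="utf-8")

    def _needs_reopen(self) -> bool:
        try:
            on_disk = os.stat(self.path)
        except FileNotFoundError:
            return True  # someone removed the file by hand
        ours = os.fstat(self._fh.fileno())
        # Same device and inode means the path still names the file we hold.
        return (on_disk.st_dev, on_disk.st_ino) != (ours.st_dev, ours.st_ino)

    def write_line(self, line: str) -> None:
        if self._needs_reopen():
            # Closing drops our hold on the old inode, so (if nothing else
            # holds it) a deleted file's blocks can be freed.
            self._fh.close()
            self._fh = open(self.path, "a", encoding="utf-8")
        self._fh.write(line.rstrip("\n") + "\n")
        self._fh.flush()

    def close(self) -> None:
        self._fh.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "xsession-errors")
    writer = ReopeningWriter(path)
    writer.write_line("error before the file was deleted")
    os.remove(path)  # simulate the user removing the huge log
    writer.write_line("error after the file was deleted")
    writer.close()
    with open(path, encoding="utf-8") as f:
        print(f.read(), end="")
```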


+++ This bug was initially created as a clone of Bug #495190 +++

Description of problem:
Given the right set of (not too hard to reproduce) circumstances, the .xsession-errors file can and will fill up hundreds of GiB until the HDD itself is full.

Version-Release number of selected component (if applicable):
Does not seem to be applicable.

How reproducible:
Moderately difficult to reproduce.  Essentially, you need an application that reliably writes the same error (or set of errors) into the .xsession-errors file over and over until the user takes some action.  The faster it generates errors, the better.  I've had this happen about six times inadvertently, so it should not be too difficult.

Steps to Reproduce:
1. Get an application to spam errors into the .xsession-errors file (located in /home/username/)
2. Wait.
  
Actual results:
.xsession-errors will continue to expand until it literally fills the HDD.  This has happened up to 500GiB for me.  Often the original errors that were spammed into the file are not reported because I need to delete the file and restart X to get my system to do much of anything.

Expected results:
.xsession-errors, ideally, would be capped in some manner depending on the free space left in the drive.  Old messages would be cycled out once the limit is reached.
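One way to derive the cap from free space and cycle old messages out, as a Python sketch. The 1% fraction, the 100 MiB hard cap and the function names are assumptions for illustration; reusing the `.xsession-errors.old` name keeps at most two generations. This only helps if the process writing the log reopens it after a rotation (a writer that keeps its descriptor goes on appending to the renamed file), and an overwritten `.old` file's space is only returned once no process has it open.

```python
import os
import shutil
import tempfile


def cap_from_free_space(path: str, fraction: float = 0.01,
                        hard_cap: int = 100 * 1024 * 1024) -> int:
    """Size limit for the log: `fraction` of the free space on the file
    system that holds `path`, but never more than `hard_cap` bytes."""
    directory = os.path.dirname(os.path.abspath(path))
    free = shutil.disk_usage(directory).free
    return min(hard_cap, int(free * fraction))


def rotate_if_over(path: str, limit: int) -> bool:
    """If `path` is bigger than `limit`, rename it to `path + ".old"`.

    os.replace overwrites any earlier .old file, so at most two
    generations exist and the oldest messages are cycled out.
    """
    try:
        size = os.path.getsize(path)
    except FileNotFoundError:
        return False
    if size <= limit:
        return False
    os.replace(path, path + ".old")
    return True


if __name__ == "__main__":
    log = os.path.join(tempfile.mkdtemp(), "xsession-errors")
    with open(log, "w") as f:
        f.write("spam\n" * 10_000)
    limit = cap_from_free_space(log, hard_cap=4096)
    print("limit:", limit, "rotated:", rotate_if_over(log, limit))
```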

Additional info:
This has happened on both my i686 laptop and x86_64 PC.  I don't know what it is with my penchant for applications spamming errors into the .xsession-errors file.

--- Additional comment from mcepl on 2009-04-10 11:44:32 EDT ---

Well, assigning to developers, but I don't think it is such a nice idea -- we should rather spend our time on fixing the application itself ;-).

--- Additional comment from kitsuta on 2009-04-10 13:27:20 EDT ---

(In reply to comment #1)
> Well, assigning to developers, but I don't think it is such a nice idea -- we
> should rather spend our time on fixing the application itself ;-).  

That would be nice, but fixing the applications takes more time, the underlying errors are harder to retrieve because the errors file needs to be deleted, and neglecting to put in a fail-safe means that we leave the problem open for the next application that decides to massively spam the .xsession-errors file.  I mentioned it in passing in the report, but this happened to me six separate times on two computers, and I am quite sure more than one application did this, since my two systems run fairly unique environments, different programs, etc.

I would love to be able to put in bug reports for the actual sources of the error spamming, but I have nowhere to move a multi-GiB file - and restarting with the file still on the system results in a system that hangs on boot.  :(

--- Additional comment from ajax on 2009-04-13 18:14:24 EDT ---

This is not an X server bug.  It is, if anything, an xinit bug.  We should probably do something like 'exec grep . | logger -p user.warning' instead, except that /var/log/messages seems to have become root-only, boo.
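For reference, the `grep . | logger -p user.warning` pipeline boils down to the following, sketched in Python with the standard `syslog` module. The function names and the `xsession` ident are hypothetical, and the `emit` parameter exists only so the filter can be tried without writing to the real system log.

```python
import sys
import syslog
from typing import Callable, Iterable


def forward(lines: Iterable[str], emit: Callable[[str], None]) -> int:
    """Send every non-empty line to `emit`, like `grep .` feeding `logger`.

    `grep .` matches any line with at least one character, so only truly
    empty lines are dropped. Returns the number of lines forwarded.
    """
    sent = 0
    for line in lines:
        text = line.rstrip("\n")
        if text:
            emit(text)
            sent += 1
    return sent


def to_syslog(text: str) -> None:
    # Same facility and priority as `logger -p user.warning`.
    syslog.syslog(syslog.LOG_USER | syslog.LOG_WARNING, text)


def relay_stdin_to_syslog() -> None:
    """Entry point for a real relay: pipe the session's output into it."""
    syslog.openlog(ident="xsession", facility=syslog.LOG_USER)
    forward(sys.stdin, to_syslog)


if __name__ == "__main__":
    # Harmless demo: print what would be sent instead of logging it.
    forward(["first error\n", "\n", "second error\n"], print)
```

This moves the output into the system log rather than limiting it, so on its own it does not stop a flood from filling /var/log.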

--- Additional comment from notting on 2009-04-14 11:34:58 EDT ---

syslog does some duplicate detection, but it's easy to defeat, just by logging alternating different messages, and /var/log will fill up just the same.

I don't think it's the responsibility of the X session to police the logs in this way; it can never do a really good job.
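The weakness described above is easy to see with a consecutive-only filter. The Python sketch below is a simplified stand-in for syslog-style duplicate suppression (the summary wording imitates classic syslogd and is not taken from any particular implementation): one repeated error collapses to two lines, while two errors that alternate are never adjacent duplicates and pass through untouched.

```python
from itertools import groupby
from typing import List


def collapse_consecutive(lines: List[str]) -> List[str]:
    """Collapse only *adjacent* duplicate lines."""
    out: List[str] = []
    for line, run in groupby(lines):
        count = sum(1 for _ in run)
        out.append(line)
        if count > 1:
            out.append(f"last message repeated {count - 1} times")
    return out


if __name__ == "__main__":
    same = ["error X"] * 1000
    alternating = ["error X", "error Y"] * 500
    print(len(collapse_consecutive(same)))         # 2
    print(len(collapse_consecutive(alternating)))  # 1000: nothing collapsed
```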

--- Additional comment from rstrode on 2009-04-14 11:37:00 EDT ---

For what it's worth, gdm used to force ~/.xsession-errors to a particular size, but we changed it to no longer do that:

http://mail.gnome.org/archives/gdm-list/2007-November/msg00017.html

See the above thread for details and the rationale.

--- Additional comment from kitsuta on 2009-04-14 12:21:17 EDT ---

Thank you all for your feedback so far.  If needed, I am happy to change the component for the report.

Thanks for the link to the thread.  I read it, and while I think the patch back then was a good idea, its purpose wasn't to let .xsession-errors grow inordinately for its own sake.  It looks like the way the capping was handled was breaking some important things, and the most elegant solution was to simply not cap .xsession-errors.  However, I think the rationale for choosing that solution may be outdated.  From the thread:

"- .xsession-errors is ideally empty all the time.  UI programs aren't
supposed to write to stdout/stderr, so when they do it's normally for
exceptional reasons."

"ideally" and "aren't supposed to" does not mean it doesn't happen.  In fact, my .xsession-errors is rarely empty, even when I'm not doing mean and nasty things to my system (e.g., moving several GB files off an NAS device).  Considering the topic is two years old, it's likely that this is more common now than it was before.

"- most files in a typical user's home directory are orders of
magnitude bigger than .xsession-errors"

Normally yes, but not when an application spams an error.  Even if it is "for exceptional reasons", if the application is ignored (i.e., the computer has been left to do a certain task overnight) or the error is otherwise allowed to go on, .xsession-errors becomes a monster.

"- given the above two things, it's very unlikely that that
.xsession-errors will ever hit the user limit"

The logic behind the lifting of the cap essentially assumes that the .xsession-errors file will be "very unlikely" to cause problems because of its size.  Unfortunately, my experience has shown that is no longer the case.

Additionally, it appears that the method for forcing the file to a particular size caused apps to die.  That is a clear reason for a patch, but it does not necessarily mean the .xsession-errors file can or should be left to grow unimpeded until it fills up the HDD.

If a complicated solution breaks things, perhaps a simpler solution would suffice, for example, replacing .xsession-errors with a new, blank file if it gets too large.  That seems like it would be a fairly non-breaking-things solution, but I'm not knowledgeable about OS programming.  All I'd really like is to find a good solution, so I'm not bothered if this is handled by something other than X session.

--- Additional comment from rstrode on 2009-04-14 15:17:56 EDT ---

We could potentially watch the file with gio and truncate it if it gets too large.
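A polling version of the watch-and-truncate idea, as a Python sketch. GIO file monitoring is swapped out for a plain size poll so the example needs only the standard library; `truncate_if_too_large`, the 50 MiB limit and the 60-second interval are hypothetical choices. Truncating in place, rather than deleting and recreating the file, is what returns the space while another process still holds the file open. If that process did not open the file with `O_APPEND`, its next write lands at its old offset and leaves a sparse file, which fits the huge leftover FD offset reported later in this bug. The same check could also be run from a cron job.

```python
import os
import tempfile
import time


def truncate_if_too_large(path: str, limit: int) -> bool:
    """Truncate `path` to zero bytes if it is larger than `limit` bytes.

    Truncating in place frees the blocks even while another process still
    has the file open; deleting the file would not.
    """
    try:
        if os.path.getsize(path) <= limit:
            return False
    except FileNotFoundError:
        return False
    os.truncate(path, 0)
    return True


def watch(path: str, limit: int = 50 * 1024 * 1024, interval: float = 60.0) -> None:
    """Hypothetical daemon loop: check the file every `interval` seconds."""
    while True:
        truncate_if_too_large(path, limit)
        time.sleep(interval)


if __name__ == "__main__":
    # Demo on a scratch file instead of the real ~/.xsession-errors.
    with tempfile.NamedTemporaryFile("w", delete=False) as f:
        f.write("spam\n" * 1000)
    print(truncate_if_too_large(f.name, 1024), os.path.getsize(f.name))
    os.remove(f.name)
```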

--- Additional comment from ajax on 2009-07-06 10:46:02 EDT ---

Well, for xinit this is kind of a WONTFIX as it's kind of out of scope.  But gdm might want to do better.

Reassigning to gdm.

--- Additional comment from alfredo.ferrari on 2009-07-13 05:46:14 EDT ---

This problem is really a killer. It makes user sessions crash all the time because of exceeded quota. My experience (Fedora 10 fully up-to-date) is that .xsession-errors grows by 300 MBytes per day and sometimes more. Most of the messages are obviously debugging output from some applications, completely irrelevant. Is there a way to switch off logging altogether?

--- Additional comment from mcepl on 2011-01-10 08:07:50 EST ---

(In reply to comment #9)
> This problem is really a killer.

Well, a kind of workaround for a hopeless situation is a cron job nuking the ~/.xsession-errors file at regular intervals (or when the file is bigger than some threshold).

--- Additional comment from fedora-admin-xmlrpc on 2011-06-21 11:29:42 EDT ---

This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

--- Additional comment from neil.bryant on 2011-10-28 15:53:44 EDT ---

Got here because a friend asked about a 500GB file on his machine. I have a simple suggestion, if I understand the mechanics correctly.

As I understand previous comments, the file contains (at least partly) stdout and stderr chatter from gui apps.

Would it be possible to get gdm to redirect/suppress those separately? My file is mostly eaten up with what looks like stdout from tracker. My feeling is that if I could set the equivalent of:
  1>/dev/null 2>~/.xsession-errors
that I could cut the file size by far more than half. If I could make the change via an entry into gconf or similar, so much the better.
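What is being asked for is ordinary per-stream redirection. The Python sketch below shows it for a single launched program; `launch_quietly` is a hypothetical helper, not a gdm or gconf setting, and it appends (`2>>`) rather than truncates so earlier errors survive.

```python
import os
import subprocess
import sys
import tempfile
from typing import List


def launch_quietly(argv: List[str], stderr_path: str) -> subprocess.Popen:
    """Start `argv` with stdout discarded and stderr appended to
    `stderr_path`, like `cmd 1>/dev/null 2>>stderr_path` in a shell."""
    with open(stderr_path, "ab") as err:
        # The child gets its own copy of the descriptor, so closing ours
        # when the `with` block ends does not affect it.
        return subprocess.Popen(argv, stdout=subprocess.DEVNULL, stderr=err)


if __name__ == "__main__":
    fd, log = tempfile.mkstemp()
    os.close(fd)
    chatty = "import sys; print('stdout chatter'); print('real error', file=sys.stderr)"
    launch_quietly([sys.executable, "-c", chatty], log).wait()
    with open(log) as f:
        print(f.read(), end="")  # only the stderr line is kept
    os.remove(log)
```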

That would at least make it easier to triage what needs attention in the log. And for users like my friend, it would probably shrink it to the point that an eventual restart happens before it gets too big.

I haven't a clue what the code looks like, so if this isn't feasible, etc, please disregard.

--- Additional comment from mw_triad.net on 2011-11-21 00:27:38 EST ---

Please add some sort of size limit or rotation method that does not require restarting X (to *all* DMs, if that is where it needs to happen). Fixing apps is not a solution; it is too easy for one rogue app to do Bad Things to a system as things currently stand. (And some of us are not willing to accept "restart X" as a "solution".)

Right now, my .xsession-errors is 0 bytes, thanks to 'truncate' (which was able to recover the disk space, fortunately!) and killing the offending process (knotify spitting out two gstreamer errors at the rate of *tens of MiB per second*... one, ironically, about being out of disk space), but the FD offset (next time something gets written to it) is floating somewhere around 1.3 TiB. (Yes, this broken process filled my *2 TB* HD.)

Really, I think writing directly to a file is broken. This needs to go through some kind of logger process that can perform live rotation when the file hits a certain size threshold. (And should have some form of protection against a rogue process thrashing the disk and wreaking havoc on its lifespan. Ideally by tracking output from every process separately and just dropping logging from anything that exceeds a rate limit.)
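The logger process proposed above could combine both ideas roughly as follows, sketched in Python: `logging.handlers.RotatingFileHandler` does the live size-based rotation, and a token bucket per source drops lines from any process that exceeds its rate. The class name, the 10 MiB / 2-backup settings and the rates are assumptions, and tracking output per process presumes the relay can tell writers apart, for example by giving each client its own pipe.

```python
import logging
import logging.handlers
import os
import tempfile
import time
from typing import Dict, Tuple


class PerSourceRateLimiter:
    """Token bucket per source: each source may write `burst` lines at once,
    then on average `rate` lines per second; anything beyond that is dropped."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.burst = burst
        # source -> (tokens left, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}
        self.dropped: Dict[str, int] = {}

    def allow(self, source: str, now: float) -> bool:
        tokens, last = self._buckets.get(source, (self.burst, now))
        # Refill for the time elapsed since this source last wrote.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._buckets[source] = (tokens - 1.0, now)
            return True
        self._buckets[source] = (tokens, now)
        self.dropped[source] = self.dropped.get(source, 0) + 1
        return False


def make_rotating_logger(path: str) -> logging.Logger:
    """Logger that rotates `path` live once it reaches 10 MiB, keeping
    path.1 and path.2 as the two previous generations."""
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=10 * 1024 * 1024, backupCount=2)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger = logging.getLogger("session-relay")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    limiter = PerSourceRateLimiter(rate=5.0, burst=20.0)
    log = make_rotating_logger(os.path.join(tempfile.mkdtemp(), "session.log"))
    for i in range(1000):  # one process flooding the log, knotify style
        if limiter.allow("pid 4242", time.monotonic()):
            log.info("pid 4242: gstreamer error %d", i)
    print("dropped:", limiter.dropped)
```

Counting drops per source leaves room for writing a single "N lines suppressed" summary later, so the fact that a process misbehaved is not lost entirely.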

--- Additional comment from notting on 2011-11-21 15:39:54 EST ---

Moving to systemd: running the session under systemd would provide a much saner place for solving this.

--- Additional comment from darwish.07 on 2012-01-06 06:54:56 EST ---

After about a week of normal Fedora 15 usage, .xsession-errors fills the entire partition, leading to random errors all over the place!

This should have a much higher priority than "low": an unsuspecting user would probably blame the entire distribution, stability-wise, and switch to a "more stable" distro. 

Thanks,

--- Additional comment from mschmidt on 2012-01-06 08:03:07 EST ---

(In reply to comment #22)

Yes, systemd's journal will resolve this nicely.
Until we get there, this works ;) :
ln -sf /dev/null ~/.xsession-errors

--- Additional comment from mw_triad.net on 2012-01-06 14:33:34 EST ---

(In reply to comment #24)
> (In reply to comment #22)
> 
> Yes, systemd's journal will resolve this nicely.
> Until we get there, this works ;) :
> ln -sf /dev/null ~/.xsession-errors

Where would one place this command to execute at the appropriate time? Unless it is done after the old log is moved but before X is redirected, all you'll do is make it so you can't find the ginormous file without lsof.

--- Additional comment from mschmidt on 2012-01-09 08:46:16 EST ---

I created the symlink just once. After reboot, it's still a symlink. gdm did not overwrite it with a regular file.

--- Additional comment from john.kissane on 2012-01-10 04:23:23 EST ---

(In reply to comment #24)
> (In reply to comment #22)
> 
> Yes, systemd's journal will resolve this nicely.
> Until we get there, this works ;) :
> ln -sf /dev/null ~/.xsession-errors

One of my colleagues ran into this issue as well with Fedora 14, where a 70 GB .xsession-errors file along with a 40 GB .xsession-errors.old file caused him to exceed his disk quota on an NFS volume. I'll get him to try this solution to see if it stops the problem from happening again.

Comment 1 Michal Schmidt 2012-01-13 15:00:55 UTC
Like https://bugzilla.redhat.com/show_bug.cgi?id=495190#c28, reassigning to gdm.

Comment 2 Fedora End Of Life 2013-01-16 22:13:44 UTC
This message is a reminder that Fedora 16 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 16. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '16'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 16's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 16 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 3 Fedora End Of Life 2013-02-14 00:50:02 UTC
Fedora 16 changed to end-of-life (EOL) status on 2013-02-12. Fedora 16 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.