Bug 258221

Summary: corrupts printers.conf
Product: Red Hat Enterprise Linux 5 Reporter: jmccann
Component: cupsAssignee: Tim Waugh <twaugh>
Status: CLOSED ERRATA QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: 5.0CC: cschalle, k.georgiou, security, sputhenp, ttsig, twaugh
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: RHBA-2008-0366 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-05-21 17:19:29 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
cups-error-log.txt.gz from this afternoon none

Description jmccann 2007-08-27 21:33:09 UTC
Description of problem:
After some period of time CUPS corrupts its printers.conf file and stops the
affected queue.

Version-Release number of selected component (if applicable):
cups-1.2.4-11.5.3.el5

How reproducible:
Has occurred 4 times now on different RHEL5 systems.

Steps to Reproduce:
1. Update printers.conf (using gui or directly)
2. Wait
3. printers.conf becomes corrupted
  
Actual results:
Here is a diff after moving the corrupted file out of the way and restoring the
last known good one:
sudo diff -u printers.conf{.bad,}
--- printers.conf.bad   2007-08-27 17:20:59.000000000 -0400
+++ printers.conf       2007-08-27 17:21:09.000000000 -0400
@@ -1,5 +1,5 @@
 # Printer configuration file for CUPS v1.2.4
-# Written by cupsd on 2007-08-24 18:02
+# Written by cupsd on 2007-07-27 12:23
 <Printer fifthlp>
 Info Xerox Phaser 7760GX
 Location Near room 510
@@ -48,10 +48,9 @@
 <DefaultPrinter secondlp>
 Info Xerox Phaser 7760GX
 Location Room 218
-DeviceURI 0{
-State Stopped
-StateMessage /usr/lib/cups/backend/0{ failed
-StateTime 1187992928
+DeviceURI lpd://ccprint.pha.jhu.edu/secondlp
+State Idle
+StateTime 1185552398
 Accepting Yes
 Shared Yes
 JobSheets none none

Comment 1 security 2007-09-10 15:56:22 UTC
Same here on cups-1.2.12-4.fc7
  * Device URI seems to be random characters
  * Only occuring on one of our devices: HP Laserjet 4200
  * This device is default printer
  * Corruption does not occur regularly or with any pattern.
  * So far have not located anything upstream or with other distro trackers

Comment 2 jmccann 2007-09-10 16:05:50 UTC
Corruption has only occurred for the default device in my case as well.  Also
irregular occurance.

Comment 3 Tim Waugh 2007-09-10 16:09:07 UTC
Has this ever occurred "spontaneously", without changing the CUPS configuration?
 What methods do both of you use for adjusting the CUPS configuration?
(system-config-printer, CUPS web interface, KDE tools, etc)

Comment 4 jmccann 2007-09-10 16:22:39 UTC
Hi Tim.

Yes, it has always occurred spontaneously without changing the configuration. 
We set the configuration at system install time and do not change it.

Comment 5 Tim Waugh 2007-09-12 09:08:47 UTC
I would very much like to see the /var/log/cups/error_log* file corresponding to
the time the problem occurred.  There will be several files there: error_log,
error_log.1, error_log.2 etc.  Please run this command:

ls -l /var/log/cups/error_log*

to see the timestamps on the files -- each timestamp was modified the very last
time the log file was written to.  If you can attach that file, and let me know
roughly what time and date the problem occurred, I hope I will be able to find
out more about why it happened.  Thanks.

Comment 6 jmccann 2007-09-12 17:16:43 UTC
Here is the bit of error_log for one occurrence of the problem.

I [13/Aug/2007:07:26:05 -0400] Adding start banner page "none" to job 13.
I [13/Aug/2007:07:26:05 -0400] Adding end banner page "none" to job 13.
I [13/Aug/2007:07:26:05 -0400] Job 13 queued on "secondlp" by "martel".
I [13/Aug/2007:07:26:05 -0400] Started filter /usr/lib/cups/filter/pstops (PID
17347) for job 13.
I [13/Aug/2007:07:26:05 -0400] Started backend /usr/lib/cups/backend/lpd (PID
17348) for job 13.
I [13/Aug/2007:07:26:09 -0400] Adding start banner page "none" to job 14.
I [13/Aug/2007:07:26:09 -0400] Adding end banner page "none" to job 14.
I [13/Aug/2007:07:26:09 -0400] Job 14 queued on "secondlp" by "martel".
I [13/Aug/2007:07:26:09 -0400] Started filter /usr/lib/cups/filter/pstops (PID
17356) for job 14.
I [13/Aug/2007:07:26:09 -0400] Started backend /usr/lib/cups/backend/0c<C2>^H^A
(PID 17357) for job 14.
E [13/Aug/2007:07:26:09 -0400] PID 17357 (/usr/lib/cups/backend/0c<C2>^H^A)
stopped with status 22!
I [13/Aug/2007:07:26:09 -0400] Hint: Try setting the LogLevel to "debug" to find
out more.
I [13/Aug/2007:07:26:09 -0400] [Job 14] Backend returned status 22 (unknown)
I [13/Aug/2007:07:26:09 -0400] Saving printers.conf...


Comment 7 Tim Waugh 2007-09-13 09:46:08 UTC
So at 07:26:05 the device URI is correct, but 4 seconds later it is incorrect. 
That is a really useful log snippet -- thanks!

Would it be possible to try increasing the verbosity of the error log on that
machine?  I'd like to get as much information as possible about what cupsd is up
to in that time -- perhaps it is receiving a weird browse packet or something. 
Here's what I'd like you to try:

1. Edit /etc/cups/cupsd.conf and change the 'LogLevel' line to read:

LogLevel debug2

2. Tell cupsd to reload its configuration file:

/sbin/service cups reload

This will put a lot of information in /var/log/cups/error_log so you might need
to keep an eye on free disk space.  To change it back at any time, change the
LogLevel line to read:

LogLevel info

and reload CUPS as above.

I would be really interested in seeing the debug2 messages from the error_log
file in between the correct backend being started and the incorrect one.

Comment 8 jmccann 2007-11-15 21:04:24 UTC
OK, I finally got this happen on a system with debug enabled.

I printed a number of documents today just fine.
Then I updated to RHEL5.1 using yum.
Then I printed from Firefox and the document didn't print.
When the yum updates finished I rebooted.
I tried twice more to print the document from Firefox and then discovered that
the queue was corrupted.

I'll attach the error log for today.

Comment 9 jmccann 2007-11-15 21:06:37 UTC
Created attachment 260391 [details]
cups-error-log.txt.gz from this afternoon

Comment 10 Tim Waugh 2007-11-20 12:14:06 UTC
I think I might have discovered the root cause of this.

Comment 12 jmccann 2007-11-29 23:50:05 UTC
I have noticed a strong correlation between this problem occurring and when we
manipulate the paper trays (ie. when changing paper).

Comment 13 RHEL Program Management 2007-12-03 20:45:40 UTC
This request was evaluated by Red Hat Product Management for
inclusion, but this component is not scheduled to be updated in
the current Red Hat Enterprise Linux release.  This request will
be reviewed for a future Red Hat Enterprise Linux release.

Comment 14 Tom Sightler 2007-12-10 16:57:45 UTC
Over the weekend we upgraded our primary print server from RHEL4.5 to RHEL5.1
which handles nearly 300 printers and we are now seeing this problem.  So far it
has happened only on our Intermec label printers.  These printers use a raw
print queue and the lpd backend but they make up approximately half of those 300
printers so it could be just luck.  The others are mostly postscript laser
printers using the Jetdirect backend.

This is causing us HUGE issues without production systems.  It's sad because one
of the reasons for somewhat "rushing" the 5.1 upgrade was to get the new
improved printer management (managing ~300 printers with the tools in RHEL4 was
nearly impossible).

Any test packages or patches out there?


Comment 15 Tom Sightler 2007-12-11 23:39:56 UTC
With lack of anything else to try I downloaded the most recent FC7 source rpm
(cups-1.2.12-8) and compiled for RHEL5.  We have now survived 24 hours without
the problem reoccurring.

I saw above that someone else reported similar issues with cups-1.2.12-4 but I
decided to take a stab anyway.  Are there any patches in 1.2.12-8 that might
possibly fix this issue?

Also, we continue to only see the failure on printers that use the "lpd"
backend.  We've noticed the problem seems to happen when the printer has any
type of issue that causes a job to fail (out of paper, paper jam, etc).  It
appears to happen when the cupsd process attempts to update the State, or
perhaps more specifically, the StateMessage in the printers.conf file.  I guess
we don't have proof of this, but in every case where we've been able to get
decent information from the user the printer was working shortly before the
failure, and then ran out of paper, or ink, or had some other problem that
needed user attention.


Comment 16 Tim Waugh 2007-12-12 15:34:37 UTC
Tom: yes, 1.2.12-8.fc7 contains the fix I had in mind while writing comment #10.
 I am sorry that I do not have a proper test package ready yet for Red Hat
Enterprise Linux 5.  Thank you very much for testing the Fedora 7 package --
that is very useful.

The nature of the problem is not limited to any particular backend or operation:
the actual error may happen quite some time earlier than the symptom becomes
apparent.

I am working to get a real test package ready, and to include this fix in Red
Hat Enterprise Linux 5.

Comment 17 Tom Sightler 2007-12-13 04:19:20 UTC
Thanks Tim,

It was unclear from the changelog if FC7 package included the fix but now that I 
know it does I feel even better.  We survived yet another day without issues so 
I was feeling pretty good, but that helps me relax a little.

I was going to try and dig through the patches to find the differences between 
1.2.12-4 and 1.2.12-8 and see if I could backport any fixes but since my 
immediate critical issue is addressed I'll just wait for any test packages that 
you come up with.  I have opened an official support case with Redhat support 
but they're still in the "collecting error logs" stage.

Thanks for your assistance,
Tom


Comment 18 Tim Waugh 2007-12-13 16:36:56 UTC
I have put test packages (1.2.4-11.14.el5_1.3.0.0.1) here:

http://people.redhat.com/twaugh/tmp/cups-rhel5-258221/

There are binary packages for i386 and x86_64 -- if you need a different
architecture please let me know.

These are not supported packages and should be used for testing only.

Comment 25 Tim Waugh 2008-05-06 15:39:59 UTC
*** Bug 378451 has been marked as a duplicate of this bug. ***

Comment 26 errata-xmlrpc 2008-05-21 17:19:29 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0366.html