Package: lpr-50-4

Caveat: For some reason, this was reproducible quite easily when the printer was hooked up to the serial port. I don't believe this is essential, though.

Description: 'lpr'ing two small files would frequently cause the second file to get stuck in the print queue. 'lpq' would report 'warning: no daemon present'. Restarting the daemon would cause the second job to be printed properly.

Detective work: The lpd daemon spawns off a child to take care of the first job. You lpr the second job, and a second daemon gets spawned off. During this time, the first daemon enters printjob.c, line 261 (the 'done' section). The first daemon believes it is done processing, but it doesn't release the lock until the exit on line 271. In that window, the daemon that was spawned to handle the second job hits lpd.c line 142 and sees that the first daemon has the lock, so the second daemon exits, assuming the first daemon will take care of printing the second job. Of course, the first daemon thinks it's done processing, so it doesn't bother printing the second job either, leaving the job queued up until the daemon is restarted (either explicitly, or through submission of another print job).

Fix: Get rid of the deadlock. This was causing our software to break pretty seriously. Right now, as a stop-gap, I've added a flock() to printjob.c:L262 to unlock the file. I *believe* this resolves the problem we were having, but I haven't had a chance to scrutinize the code closely enough to ensure that unlocking the file at that point is OK, or to see whether this removes the deadlock completely or just makes the timing harder to hit.
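The window described above can be modelled in a few lines. This is only a sketch, assuming the check at lpd.c line 142 amounts to a non-blocking flock() on the queue's lock file; the helper names and the lock path are hypothetical and are not taken from the lpr source. Because flock() locks belong to the open file description, two separate open() calls in one process stand in for the two daemons.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: open the lock file and try to take an exclusive
 * lock without blocking. Returns the fd on success, -1 if the lock is
 * already held elsewhere (or on any other error). */
int try_lock_queue(const char *lockpath)
{
    assert(lockpath != NULL);
    int fd = open(lockpath, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Models the lost job. Returns 1 if the second daemon gives up while the
 * first one still holds the lock after deciding the queue is empty. */
int second_daemon_gives_up(const char *lockpath)
{
    int first = try_lock_queue(lockpath);   /* first daemon owns the queue */
    if (first < 0)
        return -1;

    /* The first daemon has scanned the queue, found nothing, and is in its
     * 'done' section, but it has not exited yet. A new job arrives now and
     * a second daemon is started for it. */
    int second = try_lock_queue(lockpath);
    int gave_up = (second < 0);
    if (second >= 0)
        close(second);

    close(first);   /* lock released only here, after the second has left */
    return gave_up;
}
```

Calling second_daemon_gives_up("/tmp/lpd_race_demo.lock") should return 1: the second claimant is turned away even though the first one will never look at the queue again.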
This is fixed in Rawhide - we're using LPRng from now on.
This will have to be fixed, garh.
This is what I ended up doing, based upon your comments. I reordered the code to reduce the time during which a deadlock might happen (though not by much).

        /*
         * search the spool directory for more work.
         */
        nitems = getq(&queue);
        if (nitems == 0) {              /* no more work to do */
        done:
                flock(lfd,LOCK_UN);     /* Unlock the lock now, to avoid deadlocks */
                if (count > 0) {        /* Files actually printed */
                        if (!SF && !tof)
                                (void) write(ofd, FF, strlen(FF));
                        if (TR != NULL) /* output trailer */
                                (void) write(ofd, TR, strlen(TR));
                }
                (void) close(ofd);
                (void) wait(NULL);
                (void) unlink(tempfile);
                exit(0);
        } else if (nitems < 0) {
                syslog(LOG_ERR, "%s: can't scan %s", printer, SD);
                exit(1);
        }
        goto again;

I will close this when I push the Errata (which means QA and Docs have to sign off on it).
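The effect of moving the unlock to the top of the 'done' section can be checked in isolation. The sketch below is illustrative only: claim() and the lock path are hypothetical stand-ins, and it assumes the competing daemon tests the lock with a non-blocking flock(). It shows that once LOCK_UN has been issued, a second claimant gets in even though the first holder still has the file open and has cleanup left to do.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical stand-in for a daemon claiming the queue: returns an fd
 * holding an exclusive lock, or -1 if the lock is busy. */
static int claim(const char *lockpath)
{
    assert(lockpath != NULL);
    int fd = open(lockpath, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Returns 1 if a successor can take the lock after the first holder has
 * unlocked but before it has finished its cleanup and exited. */
int early_unlock_lets_successor_in(const char *lockpath)
{
    int first = claim(lockpath);
    if (first < 0)
        return -1;

    int blocked = claim(lockpath);      /* before the unlock: must fail */
    if (blocked >= 0) {
        close(blocked);
        close(first);
        return 0;
    }

    flock(first, LOCK_UN);              /* the reordered first step of 'done' */

    int second = claim(lockpath);       /* trailer and cleanup still pending */
    int ok = (second >= 0);
    if (second >= 0)
        close(second);
    close(first);
    return ok;
}
```

This only covers the time after the unlock. The gap between getq() returning 0 and the flock(lfd,LOCK_UN) call is still there, which matches the remark that the window is reduced but not removed.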
Okay, if a more guaranteed fix is needed, I could add a spinlock file lock, but I don't really want to do that, for obvious reasons. So if you need this reopened, I guess we add the spin lock.
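The comment does not say what the spin lock would look like. One possible reading, sketched here purely as an assumption, is that a newly started daemon retries the lock a few times with a short pause instead of exiting on the first failure. The function name, the retry count, and the delay are all hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical retry loop: try to take the queue lock up to 'attempts'
 * times, sleeping 'delay_ms' milliseconds after each failed try.
 * Returns the locked fd, or -1 if the lock never became free. */
int spin_lock_queue(const char *lockpath, int attempts, long delay_ms)
{
    assert(lockpath != NULL && attempts > 0 && delay_ms >= 0);

    int fd = open(lockpath, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    struct timespec delay;
    delay.tv_sec = delay_ms / 1000;
    delay.tv_nsec = (delay_ms % 1000) * 1000000L;

    for (int i = 0; i < attempts; i++) {
        if (flock(fd, LOCK_EX | LOCK_NB) == 0)
            return fd;              /* previous holder has let go */
        nanosleep(&delay, NULL);    /* busy: wait a little and retry */
    }

    close(fd);
    return -1;                      /* still held after all attempts */
}
```

A retry loop like this narrows the race rather than closing it: if the old daemon takes longer to exit than the total retry time, the job is still stranded, and every new lpd start pays the retry delay whenever a healthy daemon is busy printing. That is presumably part of the "obvious reasons" for not wanting it.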
Okay, we have a child bug: Bug #18853. Its contents:

Having problems printing to network printers. The jobs hang in the queue. When running 'lpc stat <printername>', jobs show as waiting. When I run 'ps -ef |grep lpd', 'lpd' shows as running more than once (usually twice). If I kill the second instance, the reports will occasionally print. Often it is a matter of a combination of killing the process and trying 'lpc restart <printername>' or 'lpc down/up <printername>'. I have not found a consistent way to get the reports to print.

I had a file sent to me, lpr-0.50-7.i386.rpm, and installed that. It seems to help a little, but we still have a problem.

Only 3 of 18 printers get hung. They are all at the bottom in printtool, but were not there to start with. I deleted and re-added 2 of them. I have also swapped hardware on one of them. The 3 printers are HP LaserJets: a model 4, a 5, and a 4050N. The 4050N has a built-in network card; the other 2 are using NetGear PS104 print servers. I have them set up on Linux as 'Remote Unix (lpd) queue', with Remote Host set to the IP address and Remote Queue set to Raw. These printers also exist on the NT side. I actually set them up there first, to use the software that comes with the print servers/network cards to assign the IP addresses to the devices. Then I created them on Linux.
*** Bug 18853 has been marked as a duplicate of this bug. ***
I will examine how to do more complete locking on lpr, but it won't be easy. The lpr-0.50-7 package is about the most that can easily be done to fix this, and anything more complete will take some serious thought to avoid creating NEW race conditions.
I'm not going to get to this.