Bug 452305 - RHTS test /kernel/standards/usex/1.9-29 fails sometimes
RHTS test /kernel/standards/usex/1.9-29 fails sometimes
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: cups
Version: 4.7
Hardware: i386 Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Tim Waugh
QA Contact: qe-baseos-daemons
Depends On:
Blocks:
 
Reported: 2008-06-20 16:00 EDT by Vivek Goyal
Modified: 2012-06-20 11:58 EDT (History)
7 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-06-20 11:58:04 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Full log for Comment #8 (328.57 KB, text/plain)
2009-08-28 14:31 EDT, Jeff Burke

Description Vivek Goyal 2008-06-20 16:00:10 EDT
Description of problem:

Sometimes (twice, to be precise) we have observed a "usex" RHTS test failure with
RHEL4 kernels.
Creating a bug to find out what's wrong.

Following are links to the two RHTS failures observed.

http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=3268791
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=3016172

Version-Release number of selected component (if applicable):
74.EL

How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:
Comment 1 Vivek Goyal 2009-07-10 11:01:40 EDT
Noticed one more instance.

http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=8976650
Comment 2 Dave Anderson 2009-07-10 11:42:48 EDT

In all 3 logs, the /usr/bin/lpq command, which is simply invoked by
usex as "/usr/bin/lpq", has hung:

FINAL SCREEN:

********************************************************************************
*  ID  BSIZE  MODE   POINTER   OPERATION         FILE NAME         PASS  STAT  *
*  --  -----  ----  ---------  ---------  -----------------------  ----  ----  *
*   1  /usr/bin/lpq  [                                          ]     2 <HANG> *
*   2  /usr/bin/lpq  [                                          ]     2 <HANG> *
*   3  /usr/bin/lpq  [                                          ]     1 <HANG> *
*   4  256 mb random           592.7 mb   virtual memory test            BKGD  *
*   5  1851851 dhrystones/second          dhrystone benchmark       766  BKGD  *
*   6  1785714 dhrystones/second          dhrystone benchmark       774  BKGD  *
*   7  1785714 dhrystones/second          dhrystone benchmark       850  BKGD  *
*   8  572.736 MWIPS           calibrate  whetstone benchmark         3  BKGD  *
*   9  589.074 MWIPS           Loop N1..  whetstone benchmark         3  BKGD  *
*  10  587.125 MWIPS           calibrate  whetstone benchmark         3  BKGD  *
*                                                                              *
*                                                                              *
********************************************************************************
*  Unix System EXerciser  *  USER SYSTEM IDLE  LOADAVG  TASKS/RUN   TEST TIME  *
*   USEX Version 1.9-29   *   96%    4%    0%    7.09      75/9     000:14:00  *
*                         *                                                    *
* dell-pe700-01 - i686 (2)*  PAGE-IN PAGE-OUT  FREEMEM  FREESWAP   CSW   INTR  *
*Linux 2.6.9-71.EL.vgoyal.*     0       0      31.0 mb   1.9 gb     61   1642  *
*                         *  SWAP-IN SWAP-OUT  BUFFERS   CACHED   FORKS        *
*    24May08  09:25:49    *     0       0      85.9 mb   3.2 gb     5          *
********************************************************************************

Perhaps it has something to do with the way the system is configured?
I don't have an RHTS machine allocated at the moment, so I can't test it.

So there are two options -- I can remove "lpq" from the bin test suite,
or you can bring up an RHEL4 RHTS machine, run /usr/bin/lpq by itself,
and determine whether it's a bug in lpq.

I'll leave this BZ in needinfo until somebody can tell me what to do.
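
A rough manual check on a scratch RHEL4 box could look like the sketch below;
the 30-second wait is just an arbitrary threshold for calling the command
"hung", not anything the usex test itself uses:

  /usr/bin/lpq >/dev/null 2>&1 &
  LPQ_PID=$!
  sleep 30
  if kill -0 "$LPQ_PID" 2>/dev/null; then
      echo "lpq ($LPQ_PID) still running after 30s -- looks hung"
      cat /proc/$LPQ_PID/wchan; echo    # kernel wait channel of the stuck task
  else
      echo "lpq exited on its own"
  fi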
Comment 3 Dave Anderson 2009-07-10 11:47:52 EDT
The "lpq" commands appear to be waiting on a socket, i.e., rather than
failing immediately:

lpq           S 00000001  2616 15230  14052                     (NOTLB)
c7354cb4 00000082 00000023 00000001 df5bd280 f719b9b0 00000019 c2e33de0 
       f719b9b0 00000000 c2e3c740 c2e3bde0 00000001 00000000 d06c1b80 000f4f00 
       f719b9b0 f702d9b0 f702db1c 00000001 00000004 c7354000 7fffffff 00000000 
Call Trace:
 [<c02dec10>] schedule_timeout+0xc4/0x159
 [<c01ad97f>] avc_has_perm_noaudit+0x8d/0xda
 [<c02a56b0>] ip_finish_output2+0x0/0x18d
 [<c02862b5>] release_sock+0xf/0x4f
 [<c0285dea>] sk_wait_data+0x88/0xbd
 [<c0120555>] autoremove_wake_function+0x0/0x2d
 [<c0120555>] autoremove_wake_function+0x0/0x2d
 [<c02dea62>] __cond_resched+0x14/0x39
 [<c02ab7ec>] tcp_recvmsg+0x379/0x681
 [<c02863e2>] sock_common_recvmsg+0x30/0x46
 [<c0283033>] sock_recvmsg+0xef/0x10c
 [<c0144de0>] buffered_rmqueue+0x17d/0x1a5
 [<c0120555>] autoremove_wake_function+0x0/0x2d
 [<c015d39a>] fget+0x3b/0x42
 [<c02842d7>] sys_recvfrom+0xa5/0xf5
 [<c014fd71>] handle_mm_fault+0xdc/0x193
 [<c011b03f>] do_page_fault+0x1ae/0x5c6
 [<c02acce3>] tcp_setsockopt+0x521/0x52f
 [<c028640d>] sock_common_setsockopt+0x15/0x19
 [<c0284340>] sys_recv+0x19/0x1d
 [<c0284a22>] sys_socketcall+0x17b/0x1fb
 [<c010a146>] do_syscall_trace+0xc0/0xca
 [<c02e053f>] syscall_call+0x7/0xb

If that's the correct behaviour, then the "lpq" command should probably
be removed from the "bin" test suite of commands.  But it's strange that
nobody's ever complained until now...
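
For reference, per-task traces like the one above can be pulled from an
affected machine with the magic SysRq "t" trigger (assuming root access and
SysRq support in the kernel); a minimal sketch:

  echo 1 > /proc/sys/kernel/sysrq     # make sure SysRq is enabled
  echo t > /proc/sysrq-trigger        # dump every task's kernel stack to the log
  dmesg | grep -A 25 'lpq '           # pull out the lpq entries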
Comment 4 Dave Anderson 2009-07-10 13:57:02 EDT
I can't reproduce it on an i686 running 2.6.9-89.ELsmp.  

I've run "usex -e" with the bin test suite running the 264 different
commands it found on the system, as well restricting each bin test
to running "lpq" over and over again.  They continue to run just fine,
returning this error message:

[root@sun-x4200-01 ~]# /usr/bin/lpq
lpq: error - no default destination available.
[root@sun-x4200-01 ~]# 

It's annoying that in each of the 3 log files posted, there's a segment
that looks like this:

<snip>....</snip>

which removes the individual test information for the first 5 or 6 tests.
The hanging "lpq" tests, as shown above, are running in test slots 1, 2
and 3, so any usex-internal test information for those tests has been
"snipped" out of the log files.

Anyway, I have no other suggestions.
Comment 5 Dave Anderson 2009-07-10 14:00:47 EDT
> I can't reproduce it on an i686 running 2.6.9-89.ELsmp.

BTW, that's on a freshly-allocated RHTS machine.
Comment 6 Jeff Burke 2009-07-10 15:28:43 EDT
David,
   FYI <snip>....</snip> is added intentionally. Basically, when a log file is too big, RHTS will keep the first 25 and the last 25 lines of the log, with the <snip> in the middle.

   At the top of the RHTS page there is a Binary Log section. In that section you will see the full logs for the test.
Comment 7 Dave Anderson 2009-07-10 16:40:56 EDT
(In reply to comment #6)
> David,
>    FYI <snip>....</snip> is added intentionally. Basically, when a log file
> is too big, RHTS will keep the first 25 and the last 25 lines of the log,
> with the <snip> in the middle.
> 
>    At the top of the RHTS page there is a Binary Log section. In that section
> you will see the full logs for the test.  

I don't know where the "RHTS page" is.  All I know about are the three
log files pointed to by this BZ, and those log files only have "Recipe ID"
and "Job ID" pointers.  I'm guessing that the RHTS pages you're talking about
are embedded in some drop-down list in the "Job ID" page(s), but
my eyes glazed over trying to find them...
Comment 8 PaulB 2009-08-28 14:09:52 EDT
All,
While reviewing the following kernel_workflow, this issue was noticed.

Name: RHEL4.U7, kernel 2.6.9-78.0.26.EL Fri Aug 28
Developer: JOlsa
Description: Test results for RHTS  Job#86862

Job#86862
i386 - kernel 2.6.9-78.0.26.EL 
/kernel/standards/usex/1.9-29
Recipe-254030
Test - /kernel/standards/usex/1.9-29 - FAIL
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=9813309

-PaulB
Comment 9 Jeff Burke 2009-08-28 14:31:56 EDT
Created attachment 359103 [details]
Full log for Comment #8
Comment 10 Dave Anderson 2009-08-28 15:17:29 EDT
This is the same issue as in the original report, where the
commands that were invoked simply as:

   /usr/bin/lpq

all ended up waiting (perceived as "hanging") on a socket forever,
or actually for ~12 minutes until the whole usex test sequence was
shut down.

The lpq man page states:

       lpq shows the current print queue status on the named printer.
       Jobs queued on the default destination will be shown if no printer
       or class is specified on the command-line.

So, given that no arguments are passed, it must be trying to determine
the "default destination"?  I don't have any insight into how "lpq" works,
but again, if I run it on my RHEL4 machine, it does this:

  # /usr/bin/lpq
  color-fl2 is ready
  no entries
  #

But if no printer configuration has been done at all, as is the case
with RHTS, I had presumed that the command would still return something?

And when I did try to reproduce it on a freshly-allocated RHTS machine, as
indicated in comment #4, it returned this: 

  [root@sun-x4200-01 ~]# /usr/bin/lpq
  lpq: error - no default destination available.
  [root@sun-x4200-01 ~]# 

And I'm guessing that even in your RHTS testing, it works like
the above 99% of the time?
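
For what it's worth, a quick sanity check of whether the CUPS client side
sees a default destination or a running scheduler at all (standard cups
client commands, nothing usex-specific) would be:

  lpstat -d      # prints the default destination, or says none is set
  lpstat -r      # reports whether the scheduler (cupsd) is running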

In any case, I'll be glad to simply remove the "lpq" command from the
set of usex "bin" test commands.
Comment 11 Dave Anderson 2009-08-28 15:35:01 EDT
Alternatively, when invoking the "usex" command, you can use the --exclude
option to exclude specific commands from the "bin" test suite:

  usex --exclude lpq

where the argument to --exclude can be a comma-separated list of
commands, as in

  usex --exclude lpq,ps,ls
Comment 12 Dave Anderson 2009-08-28 15:43:31 EDT
I've put in an RHTS reserve-workflow request for this last machine that 
exhibited the error:

  hp-dl120g5-01.rhts.eng.bos.redhat.com

I'll see if I can reproduce it.
Comment 13 Jeff Burke 2009-08-28 16:04:36 EDT
Dave,
   Yes, it is the same issue. lpq is in a hung state.  It does work 99% of the time with all the other test runs.  I don't recall seeing this issue in any release prior to U7, so I am not sure that removing lpq from the bin test is the right answer.  That just sounds like we are avoiding a potential problem.  In the end this may turn out to be an issue with the lpq command itself.

   Is there anything you can think of that we could add to help debug this? We have a lot of data, but it just doesn't seem to be the right data.  We can't seem to reproduce it either; it happens, and then we don't see it for a while.

Thanks,
Jeff
Comment 14 Dave Anderson 2009-08-28 16:08:47 EDT
>   Is there anything you can think of that we could add to help debug this? We
> have a lot of data but it just doesn't seem to be the right data.  We can't
> seem to reproduce it either.  It happens then we don't see it for a while.

Keep the machine alive instead of returning it to RHTS?
Comment 15 Dave Anderson 2009-09-03 17:05:42 EDT
I finally got that machine hp-dl120g5-01.rhts.eng.bos.redhat.com
where the problem was seen.  /usr/bin/lpq runs as expected:

   [root@hp-dl120g5-01 ~]# /usr/bin/lpq
  lpq: error - no default destination available.
  [root@hp-dl120g5-01 ~]# 

I'll run that same usex test setup as done in the report over
the weekend and see if it ever hangs.
Comment 16 Dave Anderson 2009-09-08 14:04:08 EDT
I ran the same usex test (3 bins, 1 vmem, 3 drys and 3 whets) for several
days with no problem on hp-dl120g5-01.rhts.eng.bos.redhat.com, which had
seen the problem (?) before.

All "lpq" runs in each bin test ran as expected, returning
"lpq: error - no default destination available".  I've also just
run a set of just bin tests that just run lpq repeatedly, and
that exhibits no problem.

So the only thing I can suggest is to rig the RHTS test such
that it will keep the machine alive if a "HANG" is perceived.
Comment 17 Vivek Goyal 2010-02-08 10:53:45 EST
Noticed one more instance of it.

http://rhts.redhat.com/cgi-bin/rhts/test_list.cgi?test_filter=/kernel/standards/usex/1.9-29&result=Fail&rwhiteboard=kernel%202.6.9-89.20.EL.vgoyal.test1&arch=i386&jobids=123882

Regarding lpq not returning immediately: it looks like in the case of failure, lpq is waiting on a socket. Could it be that there is a default destination and lpq is trying to query it? Maybe the network is down?  Just some wild guesses.
Comment 19 Tim Waugh 2010-02-08 11:56:50 EST
lpq connects to the CUPS server, usually localhost but configurable in /etc/cups/client.conf and ~/.cups/client.conf.

So lpq is "waiting" on a socket -- doing what, read()?  connect()?
Comment 20 Dave Anderson 2010-02-08 12:58:12 EST
The "fail" link from comment #17 has sysrq-t type output.
Here are the 3 lpq tasks:

lpq           S F6D647C0  2440 13625  12690                     (NOTLB)
ea4b3ca4 00000086 c0391660 f6d647c0 f77021f0 00000000 c21ca7c0 000f4e83 
       f1df18b0 f1df1a3c ea4b3000 7fffffff 00000000 ea4b3d24 c03221ef f1df18b0 
       00000000 e76e0730 e76e0730 e76e0730 e76e0730 00000002 c01d5a90 00000000 
Call Trace:
 [<c03221ef>] schedule_timeout+0xc6/0x17c
 [<c01d5a90>] avc_has_perm_noaudit+0x8d/0xda
 [<c02e39a9>] ip_finish_output+0x197/0x19f
 [<c02bf961>] sk_wait_data+0x87/0xbb
 [<c0121dab>] autoremove_wake_function+0x0/0x2d
 [<c0121dab>] autoremove_wake_function+0x0/0x2d
 [<c02e9c31>] tcp_recvmsg+0x379/0x651
 [<c02c00eb>] sock_common_recvmsg+0x30/0x46
 [<c02bc845>] sock_recvmsg+0xef/0x10c
 [<c0320fd1>] schedule+0x44d/0x606
 [<c0162bef>] anon_vma_prepare+0x1f/0x1cb
 [<c0121dab>] autoremove_wake_function+0x0/0x2d
 [<c01d5a30>] avc_has_perm_noaudit+0x2d/0xda
 [<c02bdb50>] sys_recvfrom+0xa5/0xf5
 [<c015e42b>] handle_mm_fault+0xec/0x212
 [<c011dedd>] do_page_fault+0x1ac/0x4dc
 [<c02eb0fb>] tcp_setsockopt+0x527/0x535
 [<c02c0116>] sock_common_setsockopt+0x15/0x19
 [<c02bdbb9>] sys_recv+0x19/0x1d
 [<c02be29b>] sys_socketcall+0x17b/0x1fb
 [<c010b9af>] do_syscall_trace+0xc0/0xc9
 [<c0323937>] syscall_call+0x7/0xb
lpq           S F65779C0  2620 13863  12689                     (NOTLB)
f21afca4 00000086 c0391660 f65779c0 f5fc8c90 00000000 e4fb8bc0 000f4e84 
       f6b9b770 f6b9b8fc f21af000 7fffffff 00000000 f21afd24 c03221ef f6b9b770 
       00000000 e76e0730 e76e0730 e76e0730 e76e0730 00000002 c01d5a90 00000000 
Call Trace:
 [<c03221ef>] schedule_timeout+0xc6/0x17c
 [<c01d5a90>] avc_has_perm_noaudit+0x8d/0xda
 [<c02e39a9>] ip_finish_output+0x197/0x19f
 [<c02bf961>] sk_wait_data+0x87/0xbb
 [<c0121dab>] autoremove_wake_function+0x0/0x2d
 [<c0121dab>] autoremove_wake_function+0x0/0x2d
 [<c02e9c31>] tcp_recvmsg+0x379/0x651
 [<c02c00eb>] sock_common_recvmsg+0x30/0x46
 [<c02bc845>] sock_recvmsg+0xef/0x10c
 [<c0320fd1>] schedule+0x44d/0x606
 [<c0162bef>] anon_vma_prepare+0x1f/0x1cb
 [<c0121dab>] autoremove_wake_function+0x0/0x2d
 [<c02bdb50>] sys_recvfrom+0xa5/0xf5
 [<c015e42b>] handle_mm_fault+0xec/0x212
 [<c011dedd>] do_page_fault+0x1ac/0x4dc
 [<c02eb0fb>] tcp_setsockopt+0x527/0x535
 [<c02c0116>] sock_common_setsockopt+0x15/0x19
 [<c02bdbb9>] sys_recv+0x19/0x1d
 [<c02be29b>] sys_socketcall+0x17b/0x1fb
 [<c010b9af>] do_syscall_trace+0xc0/0xc9
 [<c0323937>] syscall_call+0x7/0xb
lpq           S F0E19080  2620 13895  12688                     (NOTLB)
dbd84ca4 00000082 c0391660 f0e19080 f1df0170 00000000 294a1e40 000f4e85 
       f6b9a600 f6b9a78c dbd84000 7fffffff 00000000 dbd84d24 c03221ef f6b9a600 
       00000000 e76e0730 e76e0730 e76e0730 e76e0730 00000002 c01d5a90 00000000 
Call Trace:
 [<c03221ef>] schedule_timeout+0xc6/0x17c
 [<c01d5a90>] avc_has_perm_noaudit+0x8d/0xda
 [<c02e39a9>] ip_finish_output+0x197/0x19f
 [<c02bf961>] sk_wait_data+0x87/0xbb
 [<c0121dab>] autoremove_wake_function+0x0/0x2d
 [<c0121dab>] autoremove_wake_function+0x0/0x2d
 [<c02e9c31>] tcp_recvmsg+0x379/0x651
 [<c02c00eb>] sock_common_recvmsg+0x30/0x46
 [<c02bc845>] sock_recvmsg+0xef/0x10c
 [<c0320fd1>] schedule+0x44d/0x606
 [<c0162bef>] anon_vma_prepare+0x1f/0x1cb
 [<c0121dab>] autoremove_wake_function+0x0/0x2d
 [<c01d5a30>] avc_has_perm_noaudit+0x2d/0xda
 [<c02bdb50>] sys_recvfrom+0xa5/0xf5
 [<c015e42b>] handle_mm_fault+0xec/0x212
 [<c011dedd>] do_page_fault+0x1ac/0x4dc
 [<c02eb0fb>] tcp_setsockopt+0x527/0x535
 [<c02c0116>] sock_common_setsockopt+0x15/0x19
 [<c02bdbb9>] sys_recv+0x19/0x1d
 [<c02be29b>] sys_socketcall+0x17b/0x1fb
 [<c010b9af>] do_syscall_trace+0xc0/0xc9
 [<c0323937>] syscall_call+0x7/0xb
Comment 21 Tim Waugh 2010-02-08 17:57:31 EST
So recv().  So it has connected to some CUPS server somewhere, and is waiting for it to respond to a request.
Comment 22 Dave Anderson 2010-02-09 11:22:25 EST
On a fresh RHTS install, I'm presuming that /etc/cups/client.conf will
be a zero-length file, and that ~/.cups/client.conf won't exist.

So, for example, doing an strace on an RHEL5 RHTS machine I've got, the
recv() calls are made against fd 4, which is set up and used like this:

socket(PF_FILE, SOCK_STREAM, 0)         = 4
setsockopt(4, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = -1 EOPNOTSUPP (Operation not supported)
fcntl64(4, F_SETFD, FD_CLOEXEC)         = 0
connect(4, {sa_family=AF_FILE, path="/var/run/cups/cups.sock"...}, 26) = 0
send(4, "POST / HTTP/1.1\r\nContent-Length:"..., 134, 0) = 134
send(4, "\1\1@\2\0\0\0\1\1G\0\22attributes-charset\0\5"..., 416, 0) = 416
recv(4, "HTTP/1.1 100 Continue\r\n\r\nHTTP/1."..., 2048, 0) = 340
send(4, "POST / HTTP/1.1\r\nContent-Length:"..., 134, 0) = 134
send(4, "\1\1@\5\0\0\0\1\1G\0\22attributes-charset\0\5"..., 416, 0) = 416
recv(4, "HTTP/1.1 100 Continue\r\n\r\nHTTP/1."..., 2048, 0) = 340
send(4, "POST / HTTP/1.1\r\nContent-Length:"..., 133, 0) = 133
send(4, "\1\1@\1\0\0\0\1\1G\0\22attributes-charset\0\5"..., 75, 0) = 75
recv(4, "HTTP/1.1 100 Continue\r\n\r\nHTTP/1."..., 2048, 0) = 336

If the cups service is turned off, then no recv() calls
are made by lpq.

So it appears that, for whatever reason, the cups server on localhost
is not responding on these very rare occasions.  (RHEL4 only?)
Comment 23 Tim Waugh 2010-02-09 11:32:46 EST
Perhaps take a look in /var/log/cups/error_log and see if it has any clues about what cupsd is up to?
Comment 24 Dave Anderson 2010-02-09 11:41:56 EST
(In reply to comment #23)
> Perhaps take a look in /var/log/cups/error_log and see if it has any clues
> about what cupsd is up to?    

Good point -- Jeff, any chance you can work your magic with RHTS to save
that file in the case of an error?
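
Something as simple as the sketch below, run from the test's failure path,
would be enough to preserve the logs -- this is only an illustration, not the
actual RHTS change:

  tar cf /tmp/cupslog.tar /var/log/cups   # /tmp path is a placeholder for wherever RHTS collects logs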
Comment 25 Jeff Burke 2010-03-18 10:56:34 EDT
Dave,
  I have modified the test. The next time this happens we should have the logs.

Thanks,
Jeff
Comment 26 PaulB 2010-05-03 13:50:20 EDT
Dave,
The test was modified to tar up the cups log files. The issue was reproduced and we have the following data from the cups error_log:
$ cat error_log.1
I [30/Apr/2010:08:39:04 -0400] Listening to 0:631
I [30/Apr/2010:08:39:04 -0400] Loaded configuration file "/etc/cups/cupsd.conf"
I [30/Apr/2010:08:39:04 -0400] Configured for up to 100 clients.
I [30/Apr/2010:08:39:04 -0400] Allowing up to 100 client connections per host.
I [30/Apr/2010:08:39:04 -0400] Full reload is required.
I [30/Apr/2010:08:39:04 -0400] LoadPPDs: Wrote "/etc/cups/ppds.dat", 14 PPDs...
I [30/Apr/2010:08:39:04 -0400] Full reload complete.
E [30/Apr/2010:08:39:04 -0400] StartListening: Unable to find IP address for server name "localhost.localdomain" - Host name lookup failure
I [30/Apr/2010:08:47:45 -0400] Listening to 0:631
I [30/Apr/2010:08:47:45 -0400] Loaded configuration file "/etc/cups/cupsd.conf"
I [30/Apr/2010:08:47:45 -0400] Configured for up to 100 clients.
I [30/Apr/2010:08:47:45 -0400] Allowing up to 100 client connections per host.
I [30/Apr/2010:08:47:45 -0400] Full reload is required.
I [30/Apr/2010:08:47:46 -0400] LoadPPDs: Read "/etc/cups/ppds.dat", 14 PPDs...
I [30/Apr/2010:08:47:46 -0400] LoadPPDs: No new or changed PPDs...
I [30/Apr/2010:08:47:46 -0400] Full reload complete.
I [30/Apr/2010:08:53:28 -0400] Scheduler shutting down normally.
I [30/Apr/2010:08:56:10 -0400] Listening to 0:631
I [30/Apr/2010:08:56:10 -0400] Loaded configuration file "/etc/cups/cupsd.conf"
I [30/Apr/2010:08:56:10 -0400] Configured for up to 100 clients.
I [30/Apr/2010:08:56:10 -0400] Allowing up to 100 client connections per host.
I [30/Apr/2010:08:56:10 -0400] Full reload is required.
I [30/Apr/2010:08:56:10 -0400] LoadPPDs: Read "/etc/cups/ppds.dat", 14 PPDs...
I [30/Apr/2010:08:56:10 -0400] LoadPPDs: No new or changed PPDs...
I [30/Apr/2010:08:56:10 -0400] Full reload complete.
I [30/Apr/2010:09:02:48 -0400] Scheduler shutting down normally.
I [30/Apr/2010:09:05:34 -0400] Listening to 0:631
I [30/Apr/2010:09:05:34 -0400] Loaded configuration file "/etc/cups/cupsd.conf"
I [30/Apr/2010:09:05:34 -0400] Configured for up to 100 clients.
I [30/Apr/2010:09:05:34 -0400] Allowing up to 100 client connections per host.
I [30/Apr/2010:09:05:34 -0400] Full reload is required.
I [30/Apr/2010:09:05:35 -0400] LoadPPDs: Read "/etc/cups/ppds.dat", 14 PPDs...
I [30/Apr/2010:09:05:35 -0400] LoadPPDs: No new or changed PPDs...
I [30/Apr/2010:09:05:35 -0400] Full reload complete.
E [30/Apr/2010:09:05:35 -0400] StartListening: Unable to bind socket for address 00000000:631 - Address already in use.
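
The "Host name lookup failure" for localhost.localdomain in the first stanza
can be cross-checked on an affected machine with ordinary resolver tools,
for example:

  getent hosts localhost.localdomain    # should come back via /etc/hosts (127.0.0.1)
  grep localhost /etc/hosts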
 
===============================================================================
Data was taken from the following kernel test results:
Job#153584
i386 - kernel 2.6.9-78.0.31.EL hugemem
	/kernel/standards/usex/1.9-29
	Recipe-395218
	 Test - /kernel/standards/usex/1.9-29 - FAIL
	  http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=13830569
	   See this link for console out data
         
         Above noted cupslog.tar data is here:
         http://rhts.redhat.com/testlogs/2010/04/153584/395218/3181865/cupslog.tar

===============================================================================
-pbunyan
Comment 27 Dave Anderson 2010-05-03 14:19:15 EDT
> Dave,
> The test was modified to tar up the cups log files. The issue was reproduced and
> we have the following data from the cups error_log:

The logs would possibly be helpful/meaningful to Tim.
Comment 28 PaulB 2010-06-15 13:38:12 EDT
All,
This issue was seen again, during an RHEL4 kernel workflow.
Job# 162778: kernel 2.6.9-89.28.EL.vgoyal.test1
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=162778
Recipe# 416886:
http://rhts.redhat.com/cgi-bin/rhts/recipes.cgi?id=416886

Test FAILURE is here:
/kernel/standards/usex/1.9-29
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=14765582

Best,
-pbunyan
Comment 30 Vivek Goyal 2010-10-29 13:50:16 EDT
Tim,

Is the data you were looking for available in comment #26? IOW, are there any clues in this bz, and how can we go about fixing it? We do see it happen once in a while during RHEL4 RHTS testing.
Comment 31 Tim Waugh 2010-11-01 13:19:17 EDT
There's nothing out of the ordinary in the log file, although it is not very verbose.

The best guess I have for what's going on is that in some particular situation cupsd gets confused about which open client connections it needs to reply to.

Each lpq invocation causes at least 4 IPP requests to be issued.  Could it be that there are several lpq invocations running side by side, and that the problem occurs when there is a particular timing to the interleaving of requests between the two clients?

I wonder if running two continuous loops as follows would trigger the problem:

while :; do lpq &>/dev/null; done &
while :; do lpq &>/dev/null; done &

As for determining the cause, we'd need more verbose debug logging in /var/log/cups/error_log.  Here is how that can be achieved:

sed -i -e 's,^LogLevel.*,LogLevel debug2,' /etc/cups/cupsd.conf
sed -i -e '/^MaxLogSize/d' /etc/cups/cupsd.conf
echo MaxLogSize 0 >> /etc/cups/cupsd.conf
/sbin/service cups restart
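
After the restart, it's worth confirming that the settings took effect and
that debug entries are actually being written; a quick check:

  grep -E '^(LogLevel|MaxLogSize)' /etc/cups/cupsd.conf
  tail -n 20 /var/log/cups/error_log     # should now include debug-level lines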
Comment 37 Jiri Pallich 2012-06-20 11:58:04 EDT
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release you requested us to review is now End of Life.
Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to reconsider your feature request for an active release, please reopen the request via appropriate support channels and provide additional supporting details about the importance of this issue.
