Bug 1648697 - cups-browsed keeps re-adding printer after HTTP error (triggers high CPU load in cupsd)
Summary: cups-browsed keeps re-adding printer after HTTP error (triggers high CPU load...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: cups-filters
Version: 29
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Zdenek Dohnal
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-11-11 19:30 UTC by Benjamin Berg
Modified: 2018-12-17 19:12 UTC (History)
4 users (show)

Fixed In Version: cups-filters-1.20.3-11.fc29
Clone Of:
Environment:
Last Closed: 2018-12-17 19:12:19 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
cups-browsed log of the situation (41.77 KB, text/plain)
2018-11-11 19:30 UTC, Benjamin Berg
no flags Details
Log with HttpLocalTimeout 10 (34.05 KB, text/plain)
2018-11-12 18:06 UTC, Benjamin Berg
no flags Details
Log with build 30919475 (254.79 KB, text/plain)
2018-11-20 20:51 UTC, Benjamin Berg
no flags Details
Log with debug patch applied (69.97 KB, text/plain)
2018-11-21 21:29 UTC, Benjamin Berg
no flags Details

Description Benjamin Berg 2018-11-11 19:30:29 UTC
Created attachment 1504445 [details]
cups-browsed log of the situation

If I turn on another laptop which is exporting a few printers, then this causes very high CPU loads with cups-browsed seemingly failing to add a printer and retrying in a loop.

The printer in question is a network printer, which is not in my local network. So e.g. contacting the endpoint for that printer would obviously fail; that said, maybe there is also something wrong with the exporting machine.

Attaching the full log, I believe the relevant lines are:

Sun Nov 11 20:09:10 2018 HTTP timeout! (consider increasing HttpLocalTimeout/HttpRemoteTimeout value)
Sun Nov 11 20:09:10 2018 Unable to create/modify CUPS queue (Success)!

I have no idea what the HTTP request in question is.

Comment 1 Zdenek Dohnal 2018-11-12 16:26:51 UTC
Hi Ben,

thank you for reporting the issue! Do I understand you correctly when you are saying:

(In reply to Benjamin Berg from comment #0)
> If I turn on another laptop which is exporting a few printers, then this
> causes very high CPU loads with cups-browsed

you have 2 machines - the one which is exporting the print queues and the other one which experience the issue - both in the?

> 
> The printer in question is a network printer, which is not in my local
> network. So e.g. contacting the endpoint for that printer would obviously
> fail; that said, maybe there is also something wrong with the exporting
> machine.

If the setup is as I wrote above, then cups-browsed on 2nd machine is not trying to reach your printer, but your cups daemon on 1st machine, which exports the print queues.

> 
> Attaching the full log, I believe the relevant lines are:
> 
> Sun Nov 11 20:09:10 2018 HTTP timeout! (consider increasing
> HttpLocalTimeout/HttpRemoteTimeout value)
> Sun Nov 11 20:09:10 2018 Unable to create/modify CUPS queue (Success)!
> 
> I have no idea what the HTTP request in question is.

cups-browsed created http connection to local CUPS server when starting to create print queues, because cups-browsed creates print queues for local CUPS. Then cups-browsed sets timeout - 5s by default - for that connection. And the connection timeouts during getting ppd file.

Ben, would you mind trying to set HttpLocalTimeout to higher value (f.e. 10) in /etc/cups/cups-browsed.conf, restart cups-browsed service and see if it helps?

Comment 2 Benjamin Berg 2018-11-12 18:06:01 UTC
Created attachment 1504844 [details]
Log with HttpLocalTimeout 10

(In reply to Zdenek Dohnal from comment #1)
> (In reply to Benjamin Berg from comment #0)
> > If I turn on another laptop which is exporting a few printers, then this
> > causes very high CPU loads with cups-browsed
> 
> you have 2 machines - the one which is exporting the print queues and the
> other one which experience the issue - both in the?

Yep, ben-t450 is exporting the printers, ben-x1 is the Fedora machine with cups-browsed. Both in the same network.

> cups-browsed created http connection to local CUPS server when starting to
> create print queues, because cups-browsed creates print queues for local
> CUPS. Then cups-browsed sets timeout - 5s by default - for that connection.
> And the connection timeouts during getting ppd file.
> 
> Ben, would you mind trying to set HttpLocalTimeout to higher value (f.e. 10)
> in /etc/cups/cups-browsed.conf, restart cups-browsed service and see if it
> helps?

Yup, HttpLocalTimeout set to 10s works around the issue. It seems that cups simply needs the time to process the PPD(s)?

Comment 3 Zdenek Dohnal 2018-11-14 08:31:47 UTC
(In reply to Benjamin Berg from comment #2)
> Yup, HttpLocalTimeout set to 10s works around the issue. It seems that cups
> simply needs the time to process the PPD(s)?

It is not actually a workaround, it is correct way to solve the issue - some PPDs needs more time to process, but it isn't reason to keep timeout length bigger for PPDs, which are smaller and quicker to process.
I'm closing the issue as NOTABUG and I'll add the mention of the issue on https://fedoraproject.org/wiki/How_to_debug_printing_problems .

Comment 4 Zdenek Dohnal 2018-11-14 08:45:28 UTC
I have eyes and don't see - it is actually there already https://fedoraproject.org/wiki/How_to_debug_printing_problems#cups-browsed_consumes_large_amount_of_CPU

Comment 5 Benjamin Berg 2018-11-14 10:39:27 UTC
Sorry, I do not agree with the decision that this is NOTABUG.

I think it is fair to consider the failure of the operation an expected behaviour. In the end, one probably needs to draw a line somewhere as to expected cupsd response times.

However, cups-browsed is handling this error condition poorly. It is retrying the operation in a tight loop and using up system resources for an operation that has very little hope of succeeding. This failure case should be handled more gracefully be cups-browsed.

A simple back-off algorithm to only try to re-add the printer periodically would already do the trick.

Comment 6 Zdenek Dohnal 2018-11-14 13:36:36 UTC
Hmm, actually it is strange to have only 3s or 4s time frame for updating the queues, when TIMEOUT_RETRY interval should be 10s (which could give time to CUPS to recover from previous query).
It is caused by getting current time at the start of function, but it is used lower in the code for setting the timeout, so current time is not so current as I would expect and causes shorter timeouts. 

Would you mind trying the scratch build https://koji.fedoraproject.org/koji/taskinfo?taskID=30881492 ?

Comment 7 Zdenek Dohnal 2018-11-14 15:33:06 UTC
Just clarifying things - I don't consider reaching http timeout as bug in the case (for most PPDs timeout is not reached, user can set bigger timeout in the config). Retries after only 3s/4s instead of 10s is the bug. 

Reported upstream with patch https://github.com/OpenPrinting/cups-filters/pull/71

Comment 8 Benjamin Berg 2018-11-14 20:01:06 UTC
With that patch the CPU isn't maxed out all the time anymore and I have some seconds before cups-browsed tries to add the printer again.

So, a slight improvement, but it still doesn't give up trying to add the printer.

Comment 9 Zdenek Dohnal 2018-11-15 13:38:51 UTC
With the newest patch cups-browsed abandons his try to add a specific printer after 5 tries and skip the printer after that (until other cups-browsed restart). cups-browsed cannot abandon whole updating process, because there can be still functional print queues or newly discovered printers.

Would you mind trying this build?
https://koji.fedoraproject.org/koji/taskinfo?taskID=30899328

Comment 10 Benjamin Berg 2018-11-15 13:43:05 UTC
Nice, that sounds good! I'll try it tonight (or tomorrow morning) when I am back home.

Comment 11 Benjamin Berg 2018-11-15 22:23:42 UTC
Hm, weird, the new build does not seem to change anything. It is still retrying indefinitely.

Comment 12 Zdenek Dohnal 2018-11-16 08:03:07 UTC
Sorry, my bad - overlooked 'break' lines, I realized it this morning at home... this one should be ok:

https://koji.fedoraproject.org/koji/taskinfo?taskID=30917994

Comment 13 Benjamin Berg 2018-11-16 08:54:51 UTC
Ah, happens :)

Unfortunately, that still doesn't seem to make a difference.

Comment 14 Zdenek Dohnal 2018-11-16 09:23:31 UTC
Just reminding - it will should try to add each printer 5 times and then skipping it - so in your case, for your 3 printers, it will be do retries about for 3 minutes until cups-browsed starts to skip all of them (if they get timeouts) - 50s (5 retries, each in 10s timeframe) for the first, 50s the second e.g.

Comment 15 Zdenek Dohnal 2018-11-16 09:34:20 UTC
Actually, 'break' wasn't for cycle, but for inner 'switch', so https://koji.fedoraproject.org/koji/taskinfo?taskID=30899328 (from yesterday) should work (after 3 minutes cups-browsed skips the printers).

Comment 16 Zdenek Dohnal 2018-11-16 09:39:25 UTC
If (I hope I did not) I did not let max retries assigned to 10 (older patch had it)... for just to be sure https://koji.fedoraproject.org/koji/taskinfo?taskID=30919474

Comment 17 Benjamin Berg 2018-11-16 09:46:27 UTC
Thanks! I'll try again tonight.

Yeah, I probably had not waited long enough then. Though I believe only one of the printers was running into timeouts in my case.

Comment 18 Benjamin Berg 2018-11-16 19:08:02 UTC
Hm, something is still wrong. It is still looping (for like 6 minutes now).

# grep 'Fri Nov 16 20:' /var/log/cups/cups-browsed_log  | grep -c 'Non-raw queue Molly with PPD'
43

So, 43 timeouts already on that printer.

Comment 19 Zdenek Dohnal 2018-11-19 08:33:45 UTC
Darn... would you mind sharing the logs in the file and attach it?

Comment 20 Benjamin Berg 2018-11-20 20:51:55 UTC
Created attachment 1507463 [details]
Log with build 30919475

Sorry, didn't get around to it yesterday.

So, did a reinstall of
  https://koji.fedoraproject.org/koji/taskinfo?taskID=30919475
just to make sure I am on the right version.

The

Comment 21 Zdenek Dohnal 2018-11-21 12:02:16 UTC
No problem, thank you for cooperating!

I actually still see problem with short timeouts (the build below has attempt to fix it), plus not only Molly printer got timeout, but PDF too - but cups-browsed gracefully ignores that and thinks it is okay and sets state to CONFIRMED...

This is more 'debug' build https://koji.fedoraproject.org/koji/taskinfo?taskID=31034955 , I would like to see contents of p->timeouts variable when cups-browsed is running. Would you mind testing it and attaching debug log, please?

Comment 22 Benjamin Berg 2018-11-21 21:29:18 UTC
Created attachment 1507831 [details]
Log with debug patch applied

Hrm, so, after not seeing *any* change, I figured I would grab the source RPM. It seems like the patch was simply not applied :-/

A local rebuild in mock later, and it works correctly:

Wed Nov 21 22:27:56 2018 How many timeouts already the queue has? 5

And, there it stops. Attached the full log.

Comment 23 Zdenek Dohnal 2018-11-22 12:49:39 UTC
I don't know what to say than 'I'm sorry'... I tried to use different way to create the patches than I was used to, and made such mistake... at least skipping the printer works...

I'll show the patch to upstream and see on what we can agree.

Comment 24 Zdenek Dohnal 2018-12-03 16:25:04 UTC
Upstream merged the patch, I'll backport it to F29.

Comment 25 Benjamin Berg 2018-12-04 16:48:46 UTC
Nice. Thanks a lot for solving this!

Comment 26 Zdenek Dohnal 2018-12-14 14:18:42 UTC
Bodhi is extremely cooperative today, so I'll change the bug manually... https://bodhi.fedoraproject.org/updates/FEDORA-2018-317a8771d0 requested to testing.

Comment 27 Fedora Update System 2018-12-16 03:57:44 UTC
cups-filters-1.20.3-11.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-317a8771d0

Comment 28 Fedora Update System 2018-12-17 19:12:19 UTC
cups-filters-1.20.3-11.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.