Bug 1782216 - Print queue is paused after ipp backend ends with CUPS_BACKEND_STOP
Summary: Print queue is paused after ipp backend ends with CUPS_BACKEND_STOP
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: cups
Version: 8.2
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 8.0
Assignee: Zdenek Dohnal
QA Contact: Petr Sklenar
URL:
Whiteboard:
Depends On: 1782161
Blocks: 1949067
 
Reported: 2019-12-11 12:45 UTC by Zdenek Dohnal
Modified: 2024-12-20 18:57 UTC (History)
CC List: 5 users

Fixed In Version: cups-2.2.6-39.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1782161
Clones: 1949067 (view as bug list)
Environment:
Last Closed: 2021-11-09 19:38:31 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Attachments
CUPS job (88.99 KB, application/gzip)
2020-08-17 07:17 UTC, Pierre Ossman
no flags Details
failed and retried job (98.47 KB, text/plain)
2020-10-06 08:25 UTC, Pierre Ossman
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2021:4393 0 None None None 2021-11-09 19:38:37 UTC

Description Zdenek Dohnal 2019-12-11 12:45:24 UTC
+++ This bug was initially created as a clone of Bug #1782161 +++

Description of problem:

Printer is configured with the policy "retry-job", yet is paused anyway when a job fails.
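For context, the error policy is a per-queue setting; a minimal sketch of setting and verifying it (the queue name 'myprinter' is an example):

    # retry failed jobs instead of stopping the queue
    lpadmin -p myprinter -o printer-error-policy=retry-job
    # cupsd persists the policy as an "ErrorPolicy" line in printers.conf
    grep ErrorPolicy /etc/cups/printers.conf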


Version-Release number of selected component (if applicable):

cups-1.6.3-40.el7.x86_64


How reproducible:

Currently not at all. Retrying the job caused it to succeed.


Actual results:

Printer paused.


Expected results:

Printer moves on to the next job in the queue.


Additional info:

Not reported upstream as the version of CUPS in RHEL 7 is ancient. Upstream seems to have fixes on this theme in newer versions though:

https://github.com/apple/cups/issues/5517

Excerpt from error_log attached.

--- Additional comment from INVALID USER on 2019-12-11 10:09:15 UTC ---

Since this bug report was entered in Red Hat Bugzilla, the release flag has been set to ? to ensure that it is properly evaluated for this release.

--- Additional comment from Zdenek Dohnal on 2019-12-11 11:49:43 UTC ---

Hi Pierre

--- Additional comment from Zdenek Dohnal on 2019-12-11 12:43:30 UTC ---

D'oh, forgot that I already wrote something :)

Hi Pierre,

thank you for taking the time to report this issue! I appreciate the feedback and use reports such as this one to guide our efforts at improving our products. That being said, this bug tracking system is not a mechanism for requesting support, and I am not able to guarantee the timeliness or suitability of a resolution.

If this issue is critical or in any way time sensitive, please raise a ticket through the regular Red Hat support channels to ensure it receives the proper attention and prioritization to assure a timely resolution. 

For information on how to contact the Red Hat production support team, please visit:
    https://www.redhat.com/support/process/production/#howto

IMO the issue does not seem critical enough (it happens only in a few cases when the ipp backend reports CUPS_BACKEND_STOP, and the queue can be re-enabled with 'cupsenable' - example below) to have it fixed in RHEL 7, but I would like to include the fix in RHEL 8.

So I'm closing this bug as WONTFIX and I'll open it for RHEL 8.
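
For anyone affected in the meantime, a stopped queue can be resumed by hand (queue name is an example):

    # a stopped queue shows up as "disabled" here
    lpstat -p myprinter
    # clear the stopped state and let queued jobs proceed
    cupsenable myprinter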

Comment 1 Zdenek Dohnal 2020-03-10 12:54:59 UTC
Hi Pierre,

the issue seems to be connected to specific printer models - are you able to reproduce it on any of your printers?

If we are unable to reproduce the issue, I will close it.

The issue is already solved in the newer CUPS in Fedora, so unless you encounter it again, I would not press for the fix.

Comment 2 Pierre Ossman 2020-03-13 14:49:53 UTC
I'm afraid we don't know exactly which job caused this issue so we're not in a good position to reproduce this. We still have the specific printer that saw this though, a Lexmark CX727de.

I would say this is very important for robustness though, so if the fix for this is not in the version of CUPS in RHEL 8 then we would very much like to see an upgrade or backport in the future. Without the fix "retry-job" isn't really reliable.

Comment 3 Tomáš Hozza 2020-03-25 14:41:01 UTC
Hi.

Please reopen in case you have a reproducer.

Comment 4 Pierre Ossman 2020-08-17 07:17:10 UTC
Created attachment 1711558 [details]
CUPS job

I'm afraid it happened again on our new RHEL 8 system (same printer though).

Attached is a tarball with the job log and the control and data file from the queue.

Comment 5 Zdenek Dohnal 2020-08-17 08:39:51 UTC
Hi Pierre,

thanks for the info!

From the logs, the device returns an error for the IPP Validate-Job operation. It really looks like the similar issue reported upstream.

I'll apply the patch which you suggested and send you testing packages as soon as possible.

Unfortunately, this seems like a device bug, so reproducing it on my own will be complicated - the same can be said about automated testing.

If you have a valid RHEL subscription, would you mind creating a customer case for the issue at access.redhat.com? It will help me to prioritize the issue during planning.

Comment 6 Pierre Ossman 2020-08-17 08:46:23 UTC
Unfortunately it happens rarely here as well, so I can't say we can reliably test anything either. It is a bit of a problem when it happens though as users cannot resolve this themselves.

Doing a best effort fix based on the logs might be all that can be done right now.

We have a subscription so I'll create a support case for this.

Comment 8 Zdenek Dohnal 2020-08-17 11:17:12 UTC
Pierre,

I sent the testing rpms to Apurbita, who will share them within the customer case. It is better to share them via the secure customer case than on a public server.

Comment 11 Pierre Ossman 2020-08-27 07:37:30 UTC
The issue unfortunately still exists with the test packages:

> Aug 27 08:41:20 tl.cendio.se cupsd[178352]: ---- job-attributes-tag ----
> Aug 27 08:41:20 tl.cendio.se cupsd[178352]: media-col collection {media-size={x-dimension=21000 y-dimension=29700} media-top-margin=423 media-bottom-margin=423 media-left-margin=423 media-right-margin=423 media>
> Aug 27 08:41:20 tl.cendio.se cupsd[178352]: print-color-mode keyword color
> Aug 27 08:41:20 tl.cendio.se cupsd[178352]: sides keyword two-sided-long-edge
> Aug 27 08:41:20 tl.cendio.se cupsd[178352]: multiple-document-handling keyword separate-documents-uncollated-copies
> Aug 27 08:41:20 tl.cendio.se cupsd[178352]: ---- end-of-attributes-tag ----
> Aug 27 08:41:20 tl.cendio.se cupsd[178352]: Validate-Job: server-error-internal-error (Success)
> Aug 27 08:41:20 tl.cendio.se cupsd[178352]: ---- end-of-attributes-tag ----
> Aug 27 08:41:20 tl.cendio.se cupsd[178352]: PID 569560 (/usr/lib/cups/backend/ipp) stopped with status 4.
> Aug 27 08:41:20 tl.cendio.se cupsd[178352]: Hint: Try setting the LogLevel to "debug" to find out more.
> Aug 27 08:41:20 tl.cendio.se cupsd[178352]: Backend returned status 4 (stop printer)
> Aug 27 08:41:20 tl.cendio.se cupsd[178352]: Printer stopped due to backend errors; please consult the error_log file for details.

cups-2.2.6-38.el8.testing.x86_64
cups-libs-2.2.6-38.el8.testing.x86_64
cups-pk-helper-0.2.6-5.el8.x86_64
cups-filesystem-2.2.6-38.el8.testing.noarch
cups-filters-1.20.0-19.el8.x86_64
python3-cups-1.9.72-21.el8.x86_64
cups-filters-libs-1.20.0-19.el8.x86_64
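
For readers following the log: "Backend returned status 4" maps to CUPS_BACKEND_STOP in cups/backend.h. Unlike CUPS_BACKEND_FAILED (1), which lets the queue's error policy (such as retry-job) decide what happens, STOP always pauses the queue. A minimal sketch of a dummy backend demonstrating that behavior (the name 'stopdemo' and the path are examples for a stock RHEL install):

    #!/bin/sh
    # /usr/lib/cups/backend/stopdemo - hypothetical demo backend
    if [ $# -eq 0 ]; then
        # discovery mode: advertise one device and exit cleanly
        echo 'direct stopdemo "Unknown" "Always-stop demo backend"'
        exit 0
    fi
    # job mode: report a failure that tells cupsd to stop the queue,
    # which happens regardless of the configured error policy
    exit 4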

Comment 12 Zdenek Dohnal 2020-08-27 10:07:19 UTC
Thank you for letting me know, Pierre!

These results complicate the issue - the testing packages contain the upstream patch you sent the link to. If the patch doesn't fix the issue, then either:

A) the issue is not fixed in upstream

B) more code will need to be backported to do the trick

Ad B) I checked the upstream ipp backend code and one commit could be connected to the issue. I'll send rpms via Apurbita again, because they are only available on the internal network.

Ad A) Because of the nature of the issue (no reliable reproducer, happens with a specific printer model, happens only sometimes) and the current state of CUPS upstream (no commits since approx. March), I don't think the issue is critical enough to have its fix as a downstream patch.

If option A) is correct, I would like to ask you to cooperate with me and Apurbita:
we need to gather the information to be used upstream and come up with a proposed patch, and this ticket could be closed as UPSTREAM.

We can reopen the issue after upstream fixes/merges the possible patch.

To sum it up - I'll backport the other possibly relevant commit, add more debug printfs to understand why the previous patch didn't work, and share the rpms with Apurbita - we will see what to do based on the results.

Comment 14 Zdenek Dohnal 2020-08-27 10:46:02 UTC
Testing rpms were sent to Apurbita.

Comment 23 Zdenek Dohnal 2020-09-22 11:29:48 UTC
Hi Pierre and Apurbita,

the issue appeared 14th Aug and 27th Aug... would it be possible to have the testing build in your production for another month and then we can mark it as fixed? That is, provided fixing this in RHEL 8 is important for your business, because I hesitate to push a fix which neither I nor Pierre can verify for sure solves the issue...

The next major RHEL version will have a cups version which already contains the backported commits, so I'm wondering whether it would be better to wait for it.

Comment 24 Pierre Ossman 2020-09-23 06:46:35 UTC
(In reply to Zdenek Dohnal from comment #23)
> the issue appeared 14th Aug and 27th Aug... would it be possible to have the
> testing build in your production for another month and then we can mark it
> as fixed? 

Yes, that's okay with us. The test build has worked just fine otherwise.

Comment 25 Zdenek Dohnal 2020-09-29 11:37:07 UTC
Thanks for the confirmation, Pierre!

Then would you mind writing a status update (here or on the customer case) on the matter around 20th October? I'll mark the date in my calendar too, so I don't forget to ping you.

Comment 28 Pierre Ossman 2020-10-05 06:51:59 UTC
Sure, no problem.

Comment 29 Pierre Ossman 2020-10-05 11:23:17 UTC
I'm sorry to say that the bug appeared today again:

> Oct 05 13:01:17 tl.cendio.se cupsd[14577]: ---- end-of-attributes-tag ----
> Oct 05 13:01:17 tl.cendio.se cupsd[14577]: Validate-Job: server-error-internal-error (Success)
> Oct 05 13:01:17 tl.cendio.se cupsd[14577]: ---- end-of-attributes-tag ----
> Oct 05 13:01:17 tl.cendio.se cupsd[14577]: Validate-Job - got status > BAD_REQUEST, going to cleanup.
> Oct 05 13:01:17 tl.cendio.se cupsd[14577]: Cleanup - got into cleanup.
> Oct 05 13:01:17 tl.cendio.se cupsd[14577]: IPP status 500
> Oct 05 13:01:17 tl.cendio.se cupsd[14577]: Job Canceled 0
> Oct 05 13:01:17 tl.cendio.se cupsd[14577]: PID 2635022 (/usr/lib/cups/backend/ipp) stopped with status 4.
> Oct 05 13:01:17 tl.cendio.se cupsd[14577]: Hint: Try setting the LogLevel to "debug" to find out more.
> Oct 05 13:01:17 tl.cendio.se cupsd[14577]: PID 2635021 (/usr/lib/cups/filter/pdftopdf) did not catch or ignore signal 13.
> Oct 05 13:01:17 tl.cendio.se cupsd[14577]: Backend returned status 4 (stop printer)
> Oct 05 13:01:17 tl.cendio.se cupsd[14577]: Printer stopped due to backend errors; please consult the error_log file for details.

Let me know if you want the complete log for the job.

Comment 30 Zdenek Dohnal 2020-10-06 06:59:19 UTC
Thank you for letting me know, Pierre!

So what I got from logs:

The job again fails in the Validate-Job operation with an internal error, and the backend is stopped because neither 'monitor_job()' (the printer has no job-id associated with its object, which means the print job never even started) nor 'sigterm_handler()' (no SIGTERM reaches the ipp backend, because it exits 'normally') sets 'job_canceled' to -1/1.

To get the functionality brought by the upstream patch, 'something' needs to set 'job_canceled'. But the execution ends before a job is even created, so there is no sensible place to set 'job_canceled'.
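
For anyone tracing this in the source, the relevant spots can be located quickly; a sketch assuming the SRPM's tree is unpacked as cups-2.2.6/:

    # where the ipp backend sets and checks the cancellation flag
    grep -n 'job_canceled' cups-2.2.6/backend/ipp.c
    grep -n 'monitor_job\|sigterm_handler' cups-2.2.6/backend/ipp.c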

------------------------------------------------------------------------

These results are a problem... I'll be frank - as I wrote before, CUPS upstream is effectively inactive right now, so IMO I wouldn't go for a downstream patch for a device-specific issue unless there is a strong business reason.

But I would like to investigate as far as I can and report it upstream, so it will at least be tracked there and someday be addressed.

Would you mind continuing to work with me on the investigation, even if the solution will not arrive into RHEL, Pierre?

The only other thing that occurred to me is to check the validity of the Validate-Job request - I went through RFC 2911, which specifies the Validate-Job operation, and the request looks okay according to the logs.

Can you upload a job log for the same printer when the job is printed fine?

Would you mind trying what ipptool has to say about your printer? Check what this command outputs (with cups-ipptool installed):

$ ipptool -tv ipp://printer.lkpg.cendio.se:631/ipp/print -f <the file used when backend stopped> validate-job.test
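
(validate-job.test ships with the cups-ipptool package; ipptool finds it in its data directory, but its location can be checked with:

    rpm -ql cups-ipptool | grep validate-job
)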

Comment 31 Pierre Ossman 2020-10-06 08:25:05 UTC
Created attachment 1719304 [details]
failed and retried job

Here is the job that failed, and also the subsequent success once the printer was resumed. I hope that is enough?

It's unfortunate that CUPS has no decent upstream. But I don't think that means you can just ignore bugs in RHEL. We are after all using RHEL for a reason, instead of rolling our own. So I do expect this issue to be resolved, even if it means a Red Hat specific patch for it.

Comment 32 Pierre Ossman 2020-10-06 08:29:16 UTC
This is the output from ipptool:

# ipptool -tv ipp://printer.lkpg.cendio.se:631/ipp/print -f d00760-001 validate-job.test
"/usr/share/cups/ipptool/validate-job.test":
    Validate-Job:
        attributes-charset (charset) = utf-8
        attributes-natural-language (naturalLanguage) = en
        printer-uri (uri) = ipp://printer.lkpg.cendio.se:631/ipp/print
        requesting-user-name (nameWithoutLanguage) = root
        document-format (mimeMediaType) = application/octet-stream
        copies (integer) = 1
    Validate file/ticket using Validate-Job                              [PASS]
        RECEIVED: 96 bytes in response
        status-code = successful-ok (OK)
        attributes-charset (charset) = utf-8
        attributes-natural-language (naturalLanguage) = en-us
        status-message (textWithoutLanguage) = OK

Comment 33 Zdenek Dohnal 2020-10-06 11:56:19 UTC
(In reply to Pierre Ossman from comment #31)
> It's unfortunate that CUPS has no decent upstream.

Unfortunately Apple engineers haven't worked on the CUPS GitHub project since March, when they committed only several security fixes. That was their first activity since December, when Mike Sweet, the previous CUPS owner, left Apple. Nevertheless, this is just one aspect we consider on our side when working on issues in RHEL; it is not something that would stop us from fixing critical customer issues. In case a downstream patch were the only option, we would need a business impact statement from you (what the current impact is and what the impact on your business would be if not fixed) in order to make a final decision. Please understand that we have to consider the impact on all customers.

> But I don't think that means you can just ignore bugs in RHEL.

It is not my intention to ignore bugs in RHEL - I'm sorry it sounded that way - but I need to be more strict because of this situation. The current state of this issue is that it most probably looks like a firmware/hardware error. Based on my investigation so far, the IPP requests from CUPS follow the IPP specification, and the job which once failed in Validate-Job passes Validate-Job on the second try with the same setup, file and attributes, according to the logs.

There is still a step I would like to take in order to really confirm that this is not an issue on the CUPS side - we can check the packet traffic during the failure and the retry and see if it matches the logs.

Would you mind running the following command:

sudo tcpdump -s0 -U -i any -w printer_lkpg.pcap host printer.lkpg.cendio.se and port 631

on the machine you issue print jobs from, until you experience the issue and do a successful retry?

The command will capture all network traffic (on all interfaces) going to or coming from the printer you experience the issue with (according to the logs it is printer.lkpg.cendio.se) over port 631 (the IPP port).

Please take the job logs from those two jobs (failed and retried) and attach them together with the printer_lkpg.pcap file produced by the tcpdump command - I will need the exact times when those two jobs happened, to easily navigate the captured network traffic.
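
Once captured, the dump can be inspected without a GUI; a minimal sketch, assuming tshark from the wireshark-cli package:

    # list IPP requests and 'HTTP/1.1 100 Continue' replies with absolute timestamps
    tshark -r printer_lkpg.pcap -t ad -Y 'ipp || http.response.code == 100'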



The good news is that even if this would be really firmware/hardware issue, I have some ideas that could help to workaround the issue:

1) The issue is connected to the IPP operation Validate-Job, which we could avoid by printing via a different backend, e.g. the socket backend, if the printer has port 9100 open. This will strip the print queue of several features brought by IPP (toner and paper checks, better error handling), but it will not use the Validate-Job operation.

Please let me know if idea 1) is acceptable for you - I can help you with setting up a print queue via the socket backend, with a PPD file generated by CUPS or the driverless binary from cups-filters, if you want (a rough sketch follows below).
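
A rough sketch of that setup (queue name is an example; driverless is the cups-filters utility):

    # generate a PPD from the printer's advertised IPP attributes
    driverless cat ipp://printer.lkpg.cendio.se/ipp/print > lexmark.ppd
    # create a queue that sends raw data over port 9100, bypassing the ipp backend
    lpadmin -p lexmark-socket -E -v socket://printer.lkpg.cendio.se:9100 -P lexmark.ppd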

2) The IPP operation Validate-Job has been a recommended, not required, operation since IPP 2.0. On this note I can imagine a minimal downstream patch which skips the validation result if the error from the printer is 'INTERNAL_ERROR' and the IPP version is >= 2.0. In my humble opinion it would still comply with the RFC.



Also, if this eventually turns out to be a hardware issue, I would encourage you to also seek support from the printer vendor. We would be more than happy to work on solving your problem in case they provide some actionable information about what the printer firmware actually does.

This issue is not a general one (it cannot be reproduced with any printer) and it cannot be reliably reproduced even on your side. This makes it really hard for us to debug and to find a fix or workaround. I would like to thank you for your patience and for providing all the valuable debug information so far.

Comment 34 Pierre Ossman 2020-10-12 07:37:49 UTC
(In reply to Zdenek Dohnal from comment #33)
> 
> Would you mind running the following command:
> 
> sudo tcpdump -s0 -U -i any -w printer_lkpg.pcap host printer.lkpg.cendio.se
> and port 631
> 
> on the machine you issue print jobs from, until you experience the issue
> and do a successful retry?
> 

We can try. I'm a bit concerned about the size of such a dump though. We'll have to keep an eye on it.

> 
> Please let me know if idea 1) is acceptable for you - I can help you with
> setting up a print queue via the socket backend, with a PPD file generated
> by CUPS or the driverless binary from cups-filters, if you want.
> 

Likely not. Those status reports are useful and avoid hassle.

> 2) The IPP operation Validate-Job has been a recommended, not required,
> operation since IPP 2.0. On this note I can imagine a minimal downstream
> patch which skips the validation result if the error from the printer is
> 'INTERNAL_ERROR' and the IPP version is >= 2.0. In my humble opinion it
> would still comply with the RFC.
> 

Sounds good. Can we verify that the printer supports IPP 2.0 though?
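
(For reference, one way to check this, using the get-printer-attributes.test file bundled with cups-ipptool:

    # the printer lists the IPP versions it implements in ipp-versions-supported
    ipptool -tv ipp://printer.lkpg.cendio.se:631/ipp/print get-printer-attributes.test | grep ipp-versions-supported
)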

> 
> Also, if this eventually turns out to be a hardware issue, I would
> encourage you to also seek support from the printer vendor. We would be
> more than happy to work on solving your problem in case they provide some
> actionable information about what the printer firmware actually does.
> 
> This issue is not a general one (it cannot be reproduced with any printer)
> and it cannot be reliably reproduced even on your side. This makes it
> really hard for us to debug and to find a fix or workaround. I would like
> to thank you for your patience and for providing all the valuable debug
> information so far.

No worries. We understand that this issue will take some time given the circumstances.

I do note that the printer is not running the latest firmware. Would you like us to test an upgrade or would you like to continue testing with the current version?

Comment 35 Zdenek Dohnal 2020-10-12 08:22:27 UTC
(In reply to Pierre Ossman from comment #34)
> 
> I do note that the printer is not running the latest firmware. Would you
> like us to test an upgrade or would you like to continue testing with the
> current version?

Would you mind upgrading the printer firmware and then:

1) first, please try to reproduce with the stable CUPS packages while capturing the network traffic - if reproduced, please upload the job log and the capture covering the failed job and the successful retry

2) then install the testing CUPS packages, capture the network traffic again and, if reproduced, please upload the job log and the capture covering the failed job and the successful retry

Comment 36 Pierre Ossman 2020-10-12 10:15:06 UTC
Alright, printer is now updated from CXTAT.053.023 to CXTAT.073.023 and we are running cups-2.2.6-33.el8.x86_64. Let's see what happens.

Comment 37 Zdenek Dohnal 2020-10-12 10:19:36 UTC
Thank you! I'll set the needinfo again, since we need the results of testing.

Comment 38 Pierre Ossman 2020-10-19 12:30:30 UTC
I'm afraid it happens with the updated firmware as well. I've attached the relevant files to the support case.

Is the last round of testing packages still relevant, or do you want to generate new ones based on the latest updates to the stable cups package?

Comment 39 Zdenek Dohnal 2020-10-20 13:43:32 UTC
Hi Pierre,

thank you for the data!

The testing packages already contain changes from the latest updates, so no need to regenerate them.

I analyzed the traffic dump you uploaded and found the following:

1) The IPP packets for the Validate-Job operation from the failed and the resubmitted job are the same.

2) The difference is in how the printer acts after receiving the IPP Validate-Job request:

  A)  resubmitted job
After receiving the 'IPP Validate-Job Request', the printer sends a TCP ACK back to CUPS, then an 'HTTP/1.1 100 Continue' packet follows within 1s (for the resubmitted job it took 3ms). CUPS sends a TCP ACK to acknowledge the 'HTTP/1.1 100 Continue', and the printer sends the 'IPP Validate-Job Response'.

  B) failed job
The printer sends a TCP ACK back to CUPS, but no 'HTTP/1.1 100 Continue' arrives within 1s.
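
(A sketch for measuring that gap from the capture, assuming tshark as mentioned earlier:

    # print relative timestamps of the HTTP events to compare the 100-Continue latency
    tshark -r printer_lkpg.pcap -T fields -e frame.time_relative -e http.response.code -Y 'http'
)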

Comment 40 Pierre Ossman 2020-10-20 14:35:08 UTC
So should we proceed with trying to get a tcpdump with the testing packages, or do you have what you need already?

Comment 45 Zdenek Dohnal 2020-10-21 09:14:26 UTC
Hi Pierre,

yes, I have all I need for now.

The upstream patch I applied in the testing rpms doesn't have any impact on your issue - so no need to run tests with them.

Thank you for the uploaded files!

Comment 91 errata-xmlrpc 2021-11-09 19:38:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: cups security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4393
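
A quick way to confirm a system already carries the fix (the fixed build per this bug is cups-2.2.6-39.el8):

    # the installed cups build should be 2.2.6-39.el8 or newer
    rpm -q cups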

