Bug 1954469

Summary: Workaround occasional bundled recvmsg() from Avahi in ippfind
Product: [Fedora] Fedora Reporter: Zdenek Dohnal <zdohnal>
Component: cupsAssignee: Zdenek Dohnal <zdohnal>
Status: NEW --- QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 38CC: msekleta, mzeuom, pemensik, pizza, rdieter, twaugh, zdohnal
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Proposed patch
none
ippfind strace with bundled Avahi response
none
ippfind strace when a device is reported none

Description Zdenek Dohnal 2021-04-28 08:25:08 UTC
driverless binary, which is used for looking for driverless device uri or generating driverless PPDs, can fail sometimes even if the printer is available and seen via avahi-browse:

$ driverless list

The correct status:
DEBUG: Started ippfind (PID 39831)
"driverless:ipp://HP%20LaserJet%20M1536dnf%20MFP%20(42307C)._ipp._tcp.local/" en "HP" "HP LaserJet M1536dnf MFP, driverless, cups-filters 1.28.7" "MFG:HP;MDL:LaserJet M1536dnf MFP;CMD:PDF,PS,PCL,AppleRaster,URF;"
DEBUG: ippfind (PID 39831) exited with no errors.

The error which happens sometimes:
DEBUG: Started ippfind (PID 39806)
ERROR: ippfind (PID 39806) stopped with status 1!

I enabled debug logging on Avahi and I see that there is no resolver methods spawned when the error happens, but I haven't been able to isolate the issue in ippfind, which is executed from driverless.

I will ask the cups-filters upstream for the exact ippfind command they use in driverless (I wasn't able to set it up according driverless code).

Workaround:
- give it an other try

Comment 1 Zdenek Dohnal 2021-04-28 08:42:11 UTC
The bug influences every device listing - so f.e. gnome-control-center, lpinfo -v, system-config-printer and cups web ui are affected.

However, this bug isn't blocker, because driverless/IPP everywhere enabled printers can print via temporary queues or the device uri can be created manually by user. Common uris are:

ipp://IP:631/ipp/print
ipp://hostname:631/ipp/print

Comment 2 Solomon Peachy 2021-05-25 11:34:21 UTC
I'm seeing a similar, yet slightly different issue on a fully updated F34 box:

$  driverless list
DEBUG: Started ippfind (PID 152478)
ERROR: ippfind (PID 152478) stopped with status 2!

$ rpm -qf /usr/bin/driverless
cups-filters-1.28.8-1.fc34.x86_64

re-running 'driverless' returns the same error every time.

(FWIW 'ippfind' shows 27 entries..)

This is the cmdline driverless seems to invoke:

ippfind -T 0 _ipps._tcp _ipp._tcp ! --txt printer-type --and \( --txt-pdl image/pwg-raster --or --txt-pdl application/PCLm --txt-pdl image/urf --or --txt-pdl application/pdf \) -x echo -en "{service_scheme}\t{service_name}\t..." \;

Dropping the '!' seems to make things work for me.  Not sure why it's there.

Comment 3 Zdenek Dohnal 2021-05-26 07:41:04 UTC
Hi Solomon,

I have F33 and I installed the newest cups-filters from updates-testing (1.28.8-1), and I cannot get the same error. I'll upgrade my second laptop (where I want to have always the latest Fedora for testing) and see if I can reproduce.

Anyway, the error code from ippfind is for IPPFIND_EXIT_BONJOUR - browsing or resolving error - it is hit when ippfind:

- cannot create avahi poll,
- cannot create avahi client on that poll,
- cannot create avahi browser on the client,
- when connection to Avahi failed and we've been told to stop iterating over our Avahi poll
- cannot create resolver on avahi client, which does resolving of .local addresses

However regarding 'ippfind' command - my understanding is that driverless utility wants to show uris only for real devices, not for remote cups queues which are advertised by cups server in the LAN - mdns records which are from cupsd on LAN have this 'printer-type' field in txt record, so that's why 'ippfind' in 'driverless' uses '! --txt printer-type'.
AFAIK 'driverless' was created to get IPP-Everywhere URI and PPD based on IPP communication via current printer driver architecture, so users would slowly move to driverless tech and, especially since GUI toolkits didn't support temp queues for long time, to make driverless work for them (although via permanent queues).

So if your setup is in a similar way (having a server which shares the queues, all devices are in different network than your PC - meaning they aren't shown in avahi-browse, driverless shows nothing and ippfind shows all remote queues), IMO driverless returning nothing is expected.

If that's the case, what I think as a bug is that error status (which can be a bug or result of other configuration), and IMO it would be nice to have 'driverless' configurable for showing uris/generating PPDs for remote queues shared via mDNS.

Plus, since GTK can support temp queues in F33 and we don't need to install a permanent queue or run cups-browsed for driverless device to make it work in GTK, I'm considering removal of 'driverless' from /usr/lib/cups/backend and /usr/lib/cups/driver, so it will not be run during installation.

=======================================================================================

To sum it up:

1) your issue doesn't look like a bug (if you have the configuration as I described above) or a different bug - I would be grateful if you could confirm what is your configuration. I can file a different bug for that.

2a) if your setup is as described above, then it is a bug to return 2 as a status code. The correct one should be '1'.

2b) if you have a different setup, I need to investigate further - maybe avahi, nss-mdns and resolved didn't play along

3) there is a possible feature for 'driverless' to support option for remote cups queues, but since GUIs can support temp queues nowadays in some way, IMO usage of 'driverless' is reduced to cases when something doesn't work and you need to have PPD based on IPP communication and create a permanent queue for it, so I don't see that as a critical feature to implement.

Comment 4 Solomon Peachy 2021-06-05 02:09:26 UTC
For whatever reason, I'm now unable to recreate the 'status 2' response.  

Here is a rundown of my configuration:

 * No native IPP printers
 * All advertised printers are queues exported from CUPS (27 in all, 26 unique printers)
   * One queue is native postscript, the rest go through gutenprint
 * Server and Client PC are both on the same local network, connected via ethernet.
 * Server and Client are both fully updated Fedora 34 systems (cups-2.3.3op2-7, cups-filters-1.28.8-1, avahi-0.8-9)
 * 'driverless list' returns "status 1"
 * ippfind returns all 27 "printers" with ipp://.... URIs.
 * avahi-browse returns all printers, exported with _ipp._tcp, _ipps._tcp, and _printer._tcp over IPv4
   * all of the above are the same for both server and PC

Except for the number of printers on a single system, I don't think it's anything unusual.

Please let me know if there's any more information you need.

Comment 5 Zdenek Dohnal 2021-06-07 05:24:27 UTC
(In reply to Solomon Peachy from comment #4)
> For whatever reason, I'm now unable to recreate the 'status 2' response.

Ok, do let me know if it shows up again by a new bug.

Thank you in advance!

> 
> Here is a rundown of my configuration:
> 
>  * No native IPP printers
>  * All advertised printers are queues exported from CUPS (27 in all, 26
> unique printers)
>    * One queue is native postscript, the rest go through gutenprint
>  * Server and Client PC are both on the same local network, connected via
> ethernet.
>  * Server and Client are both fully updated Fedora 34 systems
> (cups-2.3.3op2-7, cups-filters-1.28.8-1, avahi-0.8-9)
>  * 'driverless list' returns "status 1"
>  * ippfind returns all 27 "printers" with ipp://.... URIs.
>  * avahi-browse returns all printers, exported with _ipp._tcp, _ipps._tcp,
> and _printer._tcp over IPv4
>    * all of the above are the same for both server and PC
> 
> Except for the number of printers on a single system, I don't think it's
> anything unusual.

Ok, so AFAIK driverless behaves correctly in your setup. driverless chose to show only actual printers, not print queues shared by CUPS. But let's check up with Till on printing-architecture.

My problem is that I have a real printer on my LAN, and sometimes driverless returns 1 instead of printer uri.

Comment 6 Ben Cotton 2021-08-10 12:59:59 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 35 development cycle.
Changing version to 35.

Comment 7 Ben Cotton 2022-11-29 16:55:59 UTC
This message is a reminder that Fedora Linux 35 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 35 on 2022-12-13.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '35'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 35 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 8 Zdenek Dohnal 2023-01-26 16:39:22 UTC
Ok, so I was able to get further into this - it happens because driverless runs ippfind with two searches at one time - _ipps._tcp and _ipp._tcp. The full ippfind command:

$ ippfind -T 0 _ipps._tcp _ipp._tcp ! --txt printer-type --and \( --txt-pdl image/pwg-raster --or --txt-pdl application/PCLm --or --txt-pdl image/urf --or --txt-pdl application/pdf \) -x echo -en {service_scheme}\t\t{service_name}\t{service_domain}\t\n \;

Those arguments cause ippfind starting two browse service instances in Avahi client - one for _ipps._tcp, the other for _ipp._tcp -  but Avahi daemon responds in one message for both instances (see the attached strace output) containing all events for both service browsers. However, ippfind thinks the Avahi data are ready if poll() in poll_callback() sees POLLIN event on Avahi related file descriptor - it sets a global variable avahi_got_data, which is checked in main() and if it is set, the main() will start to process the data which are saved in 'services' array from browse_callback() in case the received Avahi event was ItemNew.
Unfortunately, in case the data processing in main() ends earlier than ItemNew event is processed by browse_callback(), which happens a lot if the Avahi answer arrives in one message for both browser and the answer for the first browser is empty, ippfind processes it earlier and prematurely returns nothing.

As a fix I would increment avahi_got_data everytime ALL_FOR_NOW event arrives, which seems to indicate the response for the browser is complete, and avahi_got_data has to be equal to number of searches (browsers) we started before we start to process the responses or we time out.

I'll send the fix upstream for review, but in case it will be okay, this fix will give us reliable discovery of driverless printers in CUPS Web UI.

Comment 9 Ben Cotton 2023-02-07 14:52:04 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 38 development cycle.
Changing version to 38.

Comment 10 Zdenek Dohnal 2023-02-20 07:57:08 UTC
Created attachment 1945180 [details]
Proposed patch

Comment 11 Zdenek Dohnal 2023-02-27 09:05:34 UTC
Created attachment 1946680 [details]
ippfind strace with bundled Avahi response

strace of ippfind process which was acquired by:

$ strace -f -T -tt -v -Y -yy -s4092 -o ippfind.strace ippfind -T 0 _ipps._tcp _ipp._tcp ! --txt printer-type --and \( --txt-pdl image/pwg-raster --or --txt-pdl application/PCLm --or --txt-pdl image/urf --or --txt-pdl application/pdf \) -x echo -en {service_scheme}\t\t{service_name}\t{service_domain}\t\n \;

Avahi daemon bundles responses for two different browsers into one response, which causes ippfind to end prematurely and shows nothing.

Comment 12 Zdenek Dohnal 2023-02-27 09:08:00 UTC
Created attachment 1946681 [details]
ippfind strace when a device is reported

Strace output gathered by:

$ strace -v -f -tt -T -Y -yy -s4092 -o ippfind-good.strace ippfind -T 0 _ipps._tcp _ipp._tcp ! --txt printer-type --and \( --txt-pdl image/pwg-raster --or --txt-pdl application/PCLm --or --txt-pdl image/urf --or --txt-pdl application/pdf \) -x echo -en {service_scheme}\t\t{service_name}\t{service_domain}\t\n \;

Avahi now sends two responses in two separate sendmsg(), which ippfind is able to process correctly.

Comment 13 Zdenek Dohnal 2023-02-27 10:24:28 UTC
I'll reassign this bug to Avahi based on CUPS upstream review, because it is its 'bundling responses' behavior which caused the ippfind issue - the response is not processed in Avahi browsers in time before ippfind processes it further, ending up with empty ippfind output even if there is a device on the network.

We can't implement ippfind to wait until we have all data from Avahi as well, because Avahi doesn't seem to provide a reliable event which we can check to find out whether browsers got all data or not... so we check a descriptor with poll() and wait with processing the data till poll() indicates there are no data for reading on the descriptor. This means a race condition appears when poll() no longer see a data for reading and Avahi browsers haven't processed the data yet.

Comment 14 Zdenek Dohnal 2023-02-27 11:28:14 UTC
Reported upstream https://github.com/lathiat/avahi/issues/442

Comment 15 Zdenek Dohnal 2023-03-09 16:18:48 UTC
Avahi daemon itself sends separate DBUS messages, glibc sometimes bundles them - I'll try to work around in ippfind.