Bug 1577854

Summary: DNF search should use logical conjunction of terms
Product: [Fedora] Fedora Reporter: marekhwd
Component: dnf Assignee: Jaroslav Rohel <jrohel>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rawhide CC: carl, dmach, jmracek, jrohel, mhatina, packaging-team-maint, rpm-software-management, vmukhame
Target Milestone: --- Keywords: Reopened, Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-06-28 08:17:12 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description marekhwd 2018-05-14 09:41:51 UTC
Example of apt search:

  $ apt-cache search vim gtk
  vim-gnome - Vi IMproved - enhanced vi editor (dummy package)
  vim-gtk3 - Vi IMproved - enhanced vi editor - with GTK3 GUI
  vim-tiny - Vi IMproved - enhanced vi editor - compact version
  nescc - Programming Language for Deeply Networked Systems
  vim-gtk - Vi IMproved - enhanced vi editor - with GTK2 GUI
  vim-syntax-gtk - Syntax files to highlight GTK+ keywords in vim
  zathura - document viewer with a minimalistic interface

This searches descriptions too (for instance, zathura is a document viewer with Vi keybindings and a GTK frontend).

DNF search combines search terms with logical disjunction (OR), which is rarely what the user needs. Logical conjunction (AND) is the default in the Google search engine and probably in every other search engine available.
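The difference between the two modes can be sketched in a few lines of Python. This is only an illustration on toy data taken from the apt output above, not DNF's code; the names `PACKAGES`, `term_matches` and `search_terms` are made up for this example, and only names and summaries are searched.

```python
# Toy data: package name -> summary (copied from the apt output above).
PACKAGES = {
    "vim-gtk3": "Vi IMproved - enhanced vi editor - with GTK3 GUI",
    "vim-tiny": "Vi IMproved - enhanced vi editor - compact version",
    "nescc": "Programming Language for Deeply Networked Systems",
}


def term_matches(term, name, summary):
    """Case-insensitive substring match against the name or the summary."""
    term = term.lower()
    return term in name.lower() or term in summary.lower()


def search_terms(terms, packages, mode="and"):
    """Return sorted package names matching all terms ("and") or any term ("or")."""
    combine = all if mode == "and" else any
    return sorted(
        name
        for name, summary in packages.items()
        if combine(term_matches(t, name, summary) for t in terms)
    )


# Conjunction keeps only packages that match every term.
print(search_terms(["vim", "gtk"], PACKAGES, mode="and"))
# Disjunction also returns packages that match just one of the terms.
print(search_terms(["vim", "gtk"], PACKAGES, mode="or"))
```

With AND, every extra term can only narrow the result list; with OR, every extra term can only widen it.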

I couldn't file this bug under the 'dnf' component, because Bugzilla wanted me to pick a subcomponent and only 'repoquery' was available, which is not related to this issue, so I picked 'libdnf'.

Comment 1 marekhwd 2018-05-14 11:27:50 UTC
Even better example:

Ubuntu:
  user@xubu:~$ apt search apache output log
  apache2-utils - Apache HTTP Server (utility programs for web servers)
  subversion-tools - Assorted tools related to Apache Subversion
  apachedex - Compute APDEX from Apache-style logs
  collectd-core - statistics collection and monitoring daemon (core system)
  cronolog - Logfile rotator for web servers
  liblog-log4perl-perl - Perl port of the widely popular log4j logging package
  liblog4net-cil-dev - highly configurable logging API for the CLI
  liblog4net1.2-cil - highly configurable logging API for the CLI
  libluasandbox-bin - Generic Lua sandbox library for dynamic data analysis – utilities
  libluasandbox-dev - Generic Lua sandbox library for dynamic data analysis – development files
  libluasandbox0 - Generic Lua sandbox library for dynamic data analysis — dynamic library
  nagios-plugins-contrib - Plugins for nagios compatible monitoring systems
  syslog-ng-mod-kafka - Enhanced system logging daemon (kafka destination)
  viewvc - web interface for CVS and/or Subversion repositories
  webdruid - Web server log file analysis tool
  user@xubu:~$ apt-cache search apache output log | wc -l
  15
  user@xubu:~$ 

Fedora:
  [root@localhost ~]# dnf search apache output log | head -20
  Last metadata expiration check: 0:06:24 ago on Mon 14 May 2018 01:18:23 PM CEST.
  ===================== Name & Summary Matched: apache, log ======================
  apache-commons-logging.noarch : Apache Commons Logging
  ant-apache-log4j.noarch : Optional apache log4j tasks for ant
  apache-log4j-extras.noarch : Apache Extras Companion for Apache log4j
  perl-Apache-LogRegex.noarch : Parse a line from an Apache logfile into a hash
  apache-logging-parent.noarch : Parent pom for Apache Logging Services projects
  apache-commons-logging-javadoc.noarch : API documentation for
                                        : apache-commons-logging
  perl-Apache-LogFormat-Compiler.noarch : Compile a log format string to perl-code
  ===================== Name & Summary Matched: output, log ======================
  perl-Log-Handler.noarch : Log messages to several outputs
  rsyslog-elasticsearch.x86_64 : ElasticSearch output module for rsyslog
  perl-PCP-LogSummary.x86_64 : Performance Co-Pilot (PCP) Perl bindings for
                             : post-processing output of pmlogsummary
  ===================== Name & Summary Matched: apache, log ======================
  log4j-bom.noarch : Apache Log4j BOM
  log4j-web.noarch : Apache Log4j Web
  log4j-nosql.noarch : Apache Log4j NoSql
  log4j-jmx-gui.noarch : Apache Log4j JMX GUI
  log4j-taglib.noarch : Apache Log4j Tag Library
  [root@localhost ~]# dnf search apache output log | wc -l
  Last metadata expiration check: 0:06:47 ago on Mon 14 May 2018 01:18:23 PM CEST.
  1952
  [root@localhost ~]# 

DNF produces 1952 results! It doesn't even put the conjunction of all search terms at the beginning of the output. And I don't understand why matches for "apache, log" are listed twice (three times including "output, log"). DNF search results are very inconsistent, hard to read, and generally confusing for the user.

Comment 3 marekhwd 2018-05-14 12:07:43 UTC
Could you please elaborate as to why this was CLOSED with WONTFIX?

Comment 4 Jaroslav Rohel 2018-05-15 06:54:10 UTC
1. ...DNF search uses logical disjunction (OR) of search terms...
Using "OR" is not a bug but a feature.
Changes in behavior confuse users. We could satisfy you but upset the users who are comfortable with the current behavior. In general, I need a good argument for changing behavior.

The fact is that the results are grouped and the most relevant results are listed first. You can use the grep or head command.

2. ...And I don't even understand why matches for "apache, log" are listed twice...
Results are grouped and there are many combinations.
In the first (most relevant) group, both keywords "apache" and "log" are in the package name and at least one of them is in the summary.
In the second (less relevant) group, only the keyword "log" is in the package name and at least one of the keywords is in the summary.
It seems complicated, but there is sorting and grouping logic that tries to put the most relevant results at the beginning of the list.
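A rough Python sketch of such grouping, to show how the same header can appear more than once. This is an assumption based only on the explanation above, not DNF's actual implementation: the function `group_results`, the sort key, and the toy data (names and summaries copied from the Fedora output earlier in this report) are all illustrative.

```python
from collections import defaultdict

# Toy data: package name -> summary (copied from the dnf output above).
PACKAGES = {
    "apache-commons-logging": "Apache Commons Logging",
    "log4j-bom": "Apache Log4j BOM",
    "perl-Log-Handler": "Log messages to several outputs",
}
TERMS = ["apache", "output", "log"]


def group_results(terms, packages):
    """Group packages by which terms hit the name and which hit the summary."""
    terms = [t.lower() for t in terms]
    groups = defaultdict(list)
    for name, summary in packages.items():
        in_name = tuple(t for t in terms if t in name.lower())
        in_summary = tuple(t for t in terms if t in summary.lower())
        if in_name or in_summary:
            groups[(in_name, in_summary)].append(name)
    # Assumed ranking: more terms in the name first, then more in the summary.
    return sorted(
        groups.items(),
        key=lambda item: (-len(item[0][0]), -len(item[0][1])),
    )


for (in_name, in_summary), names in group_results(TERMS, PACKAGES):
    # The header shows the union of matched terms, so two groups that differ
    # only in *where* the terms matched end up with identical headers.
    header = ", ".join(sorted(set(in_name) | set(in_summary)))
    print(f"Matched: {header} -> {names}")
```

In this sketch "apache-commons-logging" and "log4j-bom" land in different groups but both get the header "apache, log", because the header does not say whether a keyword matched the name or the summary.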

Sorry I didn't make you happy. Maybe we will improve the listing in the future, but not at this time.

Comment 5 marekhwd 2018-05-15 08:30:17 UTC
1. Well, to me it doesn't seem like a feature but rather a relic of Yum's behavior. I challenge you to give me one use case where a user would want OR instead of AND. I can give you a use case for AND: I want NeoVim with a GTK frontend, so I search for

  dnf search vim gtk

All sane search interfaces use AND by default (see Google).

2. Then the output is confusing as hell. The grouping logic should cater for user friendliness.

I didn't expect you to make me happy. I'm a fairly experienced coder and I have contributed small fixes and features to a plethora of open source projects. When I see a problem, I first report it and possibly later poke at the code and submit patches. I could definitely help, but seeing this closed as WONTFIX within a few minutes of the report makes me wonder whether it's even a good use of time to try to contribute to this project: if this issue was closed in minutes, my work could be dismissed just as quickly, without any consideration of the actual argument.

DNF search output is so hard to parse that probably no one even tries, so I doubt anything would break. And I already linked user opinions of the "OR feature": https://www.reddit.com/r/Fedora/comments/5a69tx/dnf_search_output_is_difficult_to_read/

Comment 6 Jaroslav Rohel 2018-05-15 10:55:00 UTC
There is more than one point of view on this problem.
We discussed it in the team and we propose a solution.

Current behavior:
Search package metadata for the keywords. The keywords are combined with OR. Keywords are matched as case-insensitive substrings; globbing is supported. By default the command only looks at package names and summaries; failing that (or whenever "all" was given as an argument) it also matches against package descriptions and URLs. The results are sorted from the most relevant to the least.

Proposed behavior:
Search package metadata for the keywords. Keywords are matched as case-insensitive substrings; globbing is supported. By default the keywords are combined with AND and the command only looks at package names and summaries. If the option "--all" is given, the keywords are combined with OR and are also matched against package descriptions and URLs. The results are sorted from the most relevant to the least.
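The proposed rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the code shipped in DNF: `proposed_search` and its `search_all` parameter are hypothetical names standing in for the "--all" option, globbing and relevance sorting are left out, and the zathura description is made-up toy data.

```python
# Toy records; the zathura description is invented for this example.
PACKAGES = [
    {"name": "vim-gtk3",
     "summary": "Vi IMproved - enhanced vi editor - with GTK3 GUI",
     "description": "", "url": ""},
    {"name": "vim-tiny",
     "summary": "Vi IMproved - enhanced vi editor - compact version",
     "description": "", "url": ""},
    {"name": "zathura",
     "summary": "document viewer with a minimalistic interface",
     "description": "vim-like keybindings and a GTK frontend",
     "url": ""},
]


def proposed_search(terms, packages, search_all=False):
    """Default: AND over name and summary. search_all: OR over all fields."""
    if search_all:
        fields, combine = ("name", "summary", "description", "url"), any
    else:
        fields, combine = ("name", "summary"), all

    def hit(term, pkg):
        # Case-insensitive substring match in any of the selected fields.
        return any(term.lower() in pkg[field].lower() for field in fields)

    return [p["name"] for p in packages if combine(hit(t, p) for t in terms)]


print(proposed_search(["vim", "gtk"], PACKAGES))
print(proposed_search(["vim", "gtk"], PACKAGES, search_all=True))
```

The default call returns only the package matching both keywords in its name or summary, while the `search_all=True` call falls back to the wider OR search that also looks at descriptions and URLs.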

Comment 7 Jaroslav Mracek 2018-06-28 08:17:12 UTC
The issue is solved by dnf-3.0.1-1, which was released into rawhide.