Bug 678591

Summary: htdig doesn't give a proper error message when failing to execute external parser
Product: Red Hat Enterprise Linux 6
Reporter: Miroslav Vadkerti <mvadkert>
Component: htdig
Assignee: Adam Tkac <atkac>
Status: CLOSED NOTABUG
QA Contact: BaseOS QE Security Team <qe-baseos-security>
Severity: medium
Priority: medium
Version: 6.1
CC: grdetil, mkoci, ovasik, rvokal, sghosh
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Clone Of: 435741
Last Closed: 2011-02-18 17:26:23 UTC

Comment 1 Miroslav Vadkerti 2011-02-18 14:49:12 UTC
This bug can be reproduced in htdig-3.2.0-0.10.b6.el6

Comment 2 Gilles Detillieux 2011-02-18 16:21:05 UTC
Miroslav, could you please elaborate on the results you get when reproducing this problem in htdig on RHEL6?  How do your actual results compare to the ones I originally reported as bug 435741?  Looking at the patches and spec file in the source RPM for htdig (htdig-3.2.0-0.10.b6.el6.src.rpm), I can see that the patch for bug 435741 is in there and is being applied to the source, so I find it very puzzling that the problem still exists.  Can you confirm that version 3.2.0-0.10.b6.el6 is indeed the one and only version of htdig you have running on your system, and that htdig is parsing its own debugging output when running with -vvv and configured with an external_parsers attribute that it can't execute?

Comment 3 Gilles Detillieux 2011-02-18 16:32:21 UTC
Sorry, that should be -vvvv (4 v's) above, not -vvv.

Some useful commands to check things out would be the following, for which I'd be interested in seeing the output:

$ which htdig
(should be /usr/bin/htdig)
$ rpm -qf $(which htdig)
(should be htdig-3.2.0-0.10.b6.el6)
$ grep external_parsers /etc/htdig/htdig.conf

$ htdig -vvvv
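
As a complement to the commands above, the external_parsers line can also be checked directly to see which of the programs it names are actually executable. The following is only a sketch: check_parsers is a hypothetical helper name, and it assumes the attribute sits on a single line as whitespace-separated content-type/program pairs.

```shell
# Hypothetical helper: report whether each program named in the
# external_parsers attribute of an htdig config file is executable.
# Assumes the attribute is written on one line as "type program" pairs.
check_parsers() {
    # Print only the value of the external_parsers line, one word per line.
    sed -n 's/^external_parsers:[[:space:]]*//p' "$1" |
    tr -s ' \t' '\n\n' |
    # Read the words back two at a time: content type, then program path.
    while read -r ctype && read -r prog; do
        if [ -x "$prog" ]; then
            echo "ok: $prog ($ctype)"
        else
            echo "not executable: $prog ($ctype)"
        fi
    done
}
```

It could be run as: check_parsers /etc/htdig/htdig.conf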

Comment 4 Miroslav Vadkerti 2011-02-18 16:38:47 UTC
Gilles,

I have only htdig-3.2.0-0.10.b6.el6 installed.

# cat /etc/htdig/htdig.conf 
base_dir:           /var/lib/htdig
common_dir:     /usr/share/htdig
translate_latin1:       false
start_url:      http://www.google.com
external_parsers:    text/html /usr/local/bin/htmlparserfake

# htdig -vvv
ht://dig Start Time: Fri Feb 18 11:32:59 2011
	0:1:http://www.google.com/
New server: www.google.com, 80
 - Persistent connections: enabled
 - HEAD before GET: enabled
 - Timeout: 30
 - Connection space: 0
 - Max Documents: -1
 - TCP retries: 1
 - TCP wait time: 5
 - Accept-Language: 
Trying to retrieve robots.txt file
Making HTTP request on http://www.google.com/robots.txt
Header line: HTTP/1.1 200 OK
Header line: Content-Length: 5570
Header line: Content-Type: text/plain
Header line: Last-Modified: Mon, 14 Feb 2011 19:41:32 GMT
Header line: Date: Fri, 18 Feb 2011 16:33:00 GMT
Header line: Expires: Fri, 18 Feb 2011 16:33:00 GMT
Header line: Cache-Control: private, max-age=0
Header line: Vary: Accept-Encoding
Header line: X-Content-Type-Options: nosniff
Header line: Server: sffe
Header line: X-XSS-Protection: 1; mode=block
Request time: 1 secs
Header line: HTTP/1.1 200 OK
Header line: Content-Type: text/plain
Header line: Last-Modified: Mon, 14 Feb 2011 19:41:32 GMT
Header line: Date: Fri, 18 Feb 2011 16:33:00 GMT
Header line: Expires: Fri, 18 Feb 2011 16:33:00 GMT
Header line: Cache-Control: private, max-age=0
Header line: Vary: Accept-Encoding
Header line: X-Content-Type-Options: nosniff
Header line: Server: sffe
Header line: X-XSS-Protection: 1; mode=block
Header line: Transfer-Encoding: chunked
Request time: 0 secs
Parsing robots.txt file using myname = htdig
Robots.txt line: User-agent: *
Found 'user-agent' line: *
Robots.txt line: Disallow: /search
[snip]
	1:1:http://www.google.com/ skipped
pick: www.google.com, # servers = 1
> www.google.com supports HTTP persistent connections (infinite)
0:2:0:http://www.google.com/: Making HTTP request on http://www.google.com/
Header line: HTTP/1.1 302 Found
Header line: Location: http://www.google.cz/
Header line: Cache-Control: private
Header line: Content-Type: text/html; charset=UTF-8
Header line: Set-Cookie: PREF=ID=2aeef165b7e5cb44:FF=0:TM=1298046780:LM=1298046780:S=6Y13eJy7ZFw3AqXT; expires=Sun, 17-Feb-2013 16:33:00 GMT; path=/; domain=.google.com
Header line: Set-Cookie: NID=44=IHIeVx1KPEnPu8mPKVOc_OOaXVjYcxe_wSsIVvPkszdiulFy-t9bu1OKlSucqxPe_XinFRBrR49g8l31V3YfDiCvp1MaCl7WzBW6ruJqMrP-3bBYFcoPgij83296pztb; expires=Sat, 20-Aug-2011 16:33:00 GMT; path=/; domain=.google.com; HttpOnly
Header line: Date: Fri, 18 Feb 2011 16:33:00 GMT
Header line: Server: gws
Header line: Content-Length: 218
Header line: X-XSS-Protection: 1; mode=block
Request time: 0 secs
 redirect
redirect: http://www.google.cz/

   Rejected: URL not in the limits! pick: www.google.com, # servers = 1
> www.google.com supports HTTP persistent connections (infinite)
ht://dig End Time: Fri Feb 18 11:33:00 2011

According to our test case, this error message should appear:
External parser error: Can't execute /usr/local/bin/htmlparserfake
But I see none.

---------------------------
To your last post:
# which htdig
/usr/bin/htdig
# rpm -qf $(which htdig)
htdig-3.2.0-0.10.b6.el6.x86_64
# grep external_parsers /etc/htdig/htdig.conf
external_parsers:    text/html /usr/local/bin/htmlparserfake
# htdig -vvvv | grep External
(no output)

The whole output:
http://fpaste.org/IvUR/

Comment 5 Gilles Detillieux 2011-02-18 17:11:41 UTC
The problem is with the test case you give: htdig never even gets around to attempting to parse an HTML document, so no attempt is made to call the external parser.  The start_url of http://www.google.com redirects to http://www.google.cz/, which causes htdig to give this error:

   Rejected: URL not in the limits! pick: www.google.com, # servers = 1

This is because by default, limit_urls_to is set to the value of start_url, so that htdig doesn't stray off to other sites.  You should try a simple start_url that just gives you a single HTML page with no redirects as your test case, e.g.:

start_url: http://www.htdig.org/author.html

Then you should get the error:

execv: No such file or directory
External parser error: Can't execute /usr/local/bin/htmlparserfake

By the way, your first attribute, base_dir, isn't a standard htdig attribute name.  I think you mean "database_dir".
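
The distinction above (start URL rejected before any parsing, versus the external parser actually being attempted) can be checked mechanically on captured htdig -vvvv output. This is a rough sketch, not part of htdig: the function name and the three result labels are made up, and it only looks for the two messages quoted in this report.

```shell
# Hypothetical helper: classify htdig -vvvv output read from stdin.
# The labels printed here are invented for this sketch; the matched
# strings are the messages quoted in this bug report.
classify_htdig_log() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -qF "External parser error: Can't execute"; then
        # htdig got far enough to try running the external parser.
        echo "parser-error-reported"
    elif printf '%s\n' "$log" | grep -qF "Rejected: URL not in the limits"; then
        # A URL was rejected by the URL limits and no parser error was seen.
        echo "start-url-rejected"
    else
        echo "inconclusive"
    fi
}
```

For example: htdig -vvvv 2>&1 | classify_htdig_log (the 2>&1 is there on the assumption that some of the messages may go to stderr).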

Comment 6 Miroslav Vadkerti 2011-02-18 17:26:23 UTC
Thanks very much! I fixed the start_url and database_dir. I didn't have much time to dig into it, as the test wasn't originally written by me. The test now passes. Closing as not a bug.