Bug 126482 - htdig does not understand robots.txt file
Summary: htdig does not understand robots.txt file
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: htdig
Version: 2
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Phil Knirsch
QA Contact:
URL: ftp://ftp.ccsf.org/htdig-patches/3.2....
Whiteboard: patch
Depends On:
Blocks:
 
Reported: 2004-06-22 10:45 UTC by Jacek Piskozub
Modified: 2015-03-05 01:14 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-07-06 16:17:05 UTC
Type: ---
Embargoed:



Description Jacek Piskozub 2004-06-22 10:45:01 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040619

Description of problem:
htdig-3.2.0b5 does not index a website that has _any_ robots.txt file.
This is a known bug in this version of htdig, and a patch is available. See
ftp://ftp.ccsf.org/htdig-patches/3.2.0b5/robots.0

The patch is a very simple one:

--- htdig/Server.cc.orig	2003-10-27 17:28:52.000000000 -0600
+++ htdig/Server.cc	2003-11-13 11:31:24.000000000 -0600
@@ -338,6 +338,8 @@
 		
     String	fullpatt = "^[^:]*://[^/]*(";
     fullpatt << pattern << ')';
+    if (pattern.length() == 0)
+	fullpatt = "";
     _disallow.set(fullpatt, config->Boolean("case_sensitive"));
 }
 

I see the same symptoms on FC2 (i386). Removing the (correct!)
robots.txt file makes htdig index my site again.
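
For readers wondering why the two added lines matter, here is a minimal sketch of the effect, not the htdig code itself. It assumes that `pattern` holds the '|'-joined Disallow paths collected from robots.txt, and that an empty pattern string means "nothing is disallowed". The helper names `build_disallow` and `is_disallowed` are hypothetical, and std::regex stands in for htdig's own regex wrapper.

```cpp
#include <regex>
#include <string>

// Hypothetical helper mirroring the pattern construction shown in the patch.
// `pattern` stands for the '|'-joined Disallow paths (an assumption).
std::string build_disallow(const std::string& pattern, bool patched) {
    if (patched && pattern.empty())
        return "";  // patched behaviour: no rules, so no pattern at all
    return "^[^:]*://[^/]*(" + pattern + ")";
}

// Assumption: an empty pattern string means no URL is disallowed.
bool is_disallowed(const std::string& fullpatt, const std::string& url) {
    if (fullpatt.empty())
        return false;
    return std::regex_search(url, std::regex(fullpatt));
}
```

Without the patch, an empty `pattern` yields "^[^:]*://[^/]*()", whose empty group matches anything, so under these assumptions every URL on the server counts as disallowed. With the patch, the empty case disallows nothing, while a non-empty pattern such as "/private" still blocks only the matching paths.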

Version-Release number of selected component (if applicable):
htdig-3.2.0b5-7

How reproducible:
Always

Steps to Reproduce:
1. Set up a website
2. Create a robots.txt file (with at least a "User-agent: *" line)
3. Update to FC2
4. See that the word database is empty
    

Actual Results:  Searching the website gives an error saying that
db.words.db was not found

Expected Results:  The website is searchable

Comment 1 Phil Knirsch 2004-07-06 16:17:05 UTC
OK, looks sane and logical.

Included in htdig-3.2.0b6-1 and later.

Read ya, Phil

