Bug 839414 - pulp-admin repo discovery generates rogue requests (10 Mbit/sec) to mirrors
Status: CLOSED WONTFIX
Product: Pulp
Classification: Community
Component: user-experience
Version: Master
Hardware: x86_64 Linux
Priority: unspecified
Severity: urgent
Assigned To: pulp-bugs
QA Contact: Preethi Thomas
Keywords: Triaged
Reported: 2012-07-11 18:03 EDT by Kris Buytaert
Modified: 2014-01-02 17:08 EST
Doc Type: Bug Fix
Last Closed: 2014-01-02 17:08:37 EST
Type: Bug


Description Kris Buytaert 2012-07-11 18:03:47 EDT
Description of problem:

pulp-admin repo discovery generates rogue requests (10 Mbit/sec) to mirrors.

Version-Release number of selected component (if applicable):
pulp-1.1.11-1.el6.noarch


How reproducible:

[root@pulp pulp]#  pulp-admin  repo  discovery  --url  http://centos.mirror.triple-it.nl/6.3/os/x86_64/  --type yum 
Discovering urls with yum metadata, This could take some time...
2012-07-11 23:47:04,071 4070:140200049100544: pulp.server.webservices.controllers.services:INFO: services:457 Discovering compatible repo urls @ [http://centos.mirror.triple-it.nl/6.3/os/x86_64/]
Number of Urls Discovered (|): 97

The counter keeps going after 97, and even Ctrl-C does not stop it.

The only way to stop it is to stop pulp-server and delete the task from MongoDB.
(All tips on other ways to handle this are welcome.)


Also tested with http://centos.mirror.triple-it.nl/6.3/os/x86_64/
It does not happen, however, with http://be.mirror.eurid.eu/centos/6.3/os/x86_64/

Steps to Reproduce:
1. Run: pulp-admin repo discovery --url http://centos.cu.be/6.3/os/x86_64/ --type yum
2. Watch the counter go up.
3. Press Ctrl-C.
  
Actual results:
a) A continuously growing number of discovered repos (100+).
b) A constant flow of requests to $url/6.3/os/x86_64/isolinux/../isolinux...
c) The mirror's admin complaining that I was sending rogue requests at 10 Mbit/sec.


Expected results:

Detection of a couple of repos and the menu to select them
Comment 1 Sam Kottler 2012-07-17 10:27:29 EDT
I am able to reproduce this issue on a personal mirror at a slightly lower rate, but it still causes significant network traffic issues on AWS.
Comment 2 Pradeep Kilambi 2012-07-17 10:48:58 EDT
Discovery basically grabs the HTML, extracts the anchor tags, and looks for specific metadata. In this case, it looks like the page at the discovered URL contains a '../' link, which turns out to be a valid URL to discover, so discovery follows it and then traverses back down the tree. This obviously generates a whole bunch of requests as the depth is traversed. The URLs I tried without a '../' link, which is the usual case, work fine because the root is the URL we start with.

I'll see if there is an elegant way to detect this, but it is doing what is expected given how the tree is presented at the given URL.
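
To illustrate the pattern seen in the report (requests to .../isolinux/../isolinux...), here is a minimal Python 3 sketch. It is not Pulp's discovery code; the function names and the mirror.example host are made up for illustration. It assumes the crawler builds URLs by plain string concatenation, in which case every pass through a '../' link yields a new, longer URL string for a directory that was already visited. Resolving the link with urljoin collapses it back to a URL that a visited set could recognize.

```python
from urllib.parse import urljoin

def naive_join(page_url, href):
    # Hypothetical: plain concatenation keeps "../" segments in the URL string.
    return page_url + href

def normalized_join(page_url, href):
    # urljoin resolves "../" against the page URL (RFC 3986 reference resolution).
    return urljoin(page_url, href)

# Placeholder host, not one of the mirrors from the report.
page = "http://mirror.example/6.3/os/x86_64/isolinux/"

# Following "../" and then "isolinux/" with concatenation gives a new string
# each time, even though it points at the directory we started from.
up = naive_join(page, "../")
down_again = naive_join(up, "isolinux/")
print(down_again)

# With resolution, the same two hops lead back to the original URL.
resolved_up = normalized_join(page, "../")
resolved_down = normalized_join(resolved_up, "isolinux/")
print(resolved_down == page)
```

Normalizing alone does not stop the crawler from walking up the tree; it only makes repeated visits detectable.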
Comment 3 Michael Hrivnak 2012-07-18 12:09:41 EDT
I'll just add a touch more context:

Many repositories are set up as a web server just serving static files in a directory structure, with the web server generating indexes dynamically.  The default behavior of common web servers such as Apache is to generate a link for the parent directory.

http://httpd.apache.org/docs/2.2/mod/mod_autoindex.html

One problem is that the parent link comes in different formats.  If you look at the HTML source for the link below, you will see that the parent link is absolute, not relative.

http://be.mirror.eurid.eu/centos/6.3/os/x86_64/

However for this next repo, the link is relative ("../"):

http://centos.cu.be/6.3/os/x86_64/

Since this is default behavior of popular web servers, we should be able to identify and ignore parent directory links.  And because of the differences shown above, we'll need to watch for both relative and absolute paths that would take us up the tree.
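
One possible way to do that, sketched in Python 3 under stated assumptions: resolve every href against the page it came from, then keep it only if the result lies strictly below the URL discovery started with. Because the check runs on the resolved URL, relative ("../") and absolute parent links are handled the same way. The function name resolve_child and the mirror.example host are hypothetical, and this is not taken from Pulp's code.

```python
from urllib.parse import urljoin, urlsplit

def resolve_child(start_url, page_url, href):
    """Return the absolute URL for href if it stays below start_url, else None.

    Hypothetical helper for illustration only.
    """
    target = urlsplit(urljoin(page_url, href))
    start = urlsplit(start_url)

    # Reject links that leave the mirror we started on.
    if (target.scheme, target.netloc) != (start.scheme, start.netloc):
        return None

    # Treat the start URL as a directory prefix.
    start_path = start.path if start.path.endswith("/") else start.path + "/"

    # Reject the start directory itself and anything outside it,
    # which covers both relative and absolute parent links.
    if target.path == start_path or not target.path.startswith(start_path):
        return None

    # Drop query strings and fragments so equivalent links compare equal.
    return target._replace(query="", fragment="").geturl()

start = "http://mirror.example/centos/6.3/os/x86_64/"
print(resolve_child(start, start, "../"))              # relative parent link
print(resolve_child(start, start, "/centos/6.3/os/"))  # absolute parent link
print(resolve_child(start, start, "repodata/"))        # child directory
```

A crawler using this filter would still want a set of visited URLs, since a page below the start URL can link to a sibling or to itself.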
Comment 89 Randy Barlow 2014-01-02 17:08:37 EST
Pulp 2.X no longer has the repo discovery feature, so I am closing this bug.
