Bug 839414 - pulp-admin repo discovery generates rogue requests (10 Mbit/sec) to mirrors
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Pulp
Classification: Retired
Component: user-experience
Version: Master
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: pulp-bugs
QA Contact: Preethi Thomas
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-07-11 22:03 UTC by Kris Buytaert
Modified: 2014-01-02 22:08 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-01-02 22:08:37 UTC
Embargoed:



Description Kris Buytaert 2012-07-11 22:03:47 UTC
Description of problem:

pulp-admin repo discovery generates rogue requests  (10Mbit /sec) to mirrors 

Version-Release number of selected component (if applicable):
pulp-1.1.11-1.el6.noarch


How reproducible:

[root@pulp pulp]#  pulp-admin  repo  discovery  --url  http://centos.mirror.triple-it.nl/6.3/os/x86_64/  --type yum 
Discovering urls with yum metadata, This could take some time...
2012-07-11 23:47:04,071 4070:140200049100544: pulp.server.webservices.controllers.services:INFO: services:457 Discovering compatible repo urls @ [http://centos.mirror.triple-it.nl/6.3/os/x86_64/]
Number of Urls Discovered (|): 97

The count keeps growing past 97, and even Ctrl-C does not stop it.

The only way to stop it is to stop pulp-server and delete the task from mongo.
(Any tips on other ways to handle this are welcome.)


Also tested with http://centos.mirror.triple-it.nl/6.3/os/x86_64/
It does not happen, however, with http://be.mirror.eurid.eu/centos/6.3/os/x86_64/

Steps to Reproduce:
1. Run: pulp-admin repo discovery --url http://centos.cu.be/6.3/os/x86_64/ --type yum
2. Watch the counter go up.
3. Press Ctrl-C.
  
Actual results:
a) A continuously growing number of found repos (100+)
b) A constant flow of requests to $url/6.3/os/x86_64/isolinux/../isolinux... 
c) The repo admin complaining that I was sending rogue requests at 10Mbit/sec 
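
The request pattern in (b) suggests the same directory being reached through ever-longer paths. One common guard, shown here as a minimal sketch and not as Pulp's code, is to normalize each URL before fetching it and to skip URLs already seen. The names `normalize`, `should_visit`, and `seen` are hypothetical, and the example URLs are assumed from the truncated path above. This guard only removes duplicates; it does not by itself stop a crawl from climbing above its starting directory.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Collapse '.' and '..' path segments so equivalent URLs compare equal."""
    parts = urlsplit(url)
    path = posixpath.normpath(parts.path) if parts.path else "/"
    # normpath drops the trailing slash; restore it for directory URLs.
    if parts.path.endswith("/") and not path.endswith("/"):
        path += "/"
    # The fragment is dropped because it never reaches the server.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


# Normalized URLs that have already been offered to the crawler.
seen = set()


def should_visit(url):
    """Return True the first time a normalized URL is offered, False afterwards."""
    key = normalize(url)
    if key in seen:
        return False
    seen.add(key)
    return True
```

With this in place, `http://centos.cu.be/6.3/os/x86_64/isolinux/../isolinux/` and `http://centos.cu.be/6.3/os/x86_64/isolinux/` map to the same key, so only the first of the two would be requested.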


Expected results:

Detection of a couple of repos and the menu to select them

Comment 1 Sam Kottler 2012-07-17 14:27:29 UTC
I am able to reproduce this issue on a personal mirror at a slightly lower rate, but it still causes significant network traffic issues on AWS.

Comment 2 Pradeep Kilambi 2012-07-17 14:48:58 UTC
Basically, discovery grabs the HTML, extracts the anchor tags, and looks for specific metadata. In this case, it looks like the discovered URL contains a '../', which resolves to a valid URL to discover, so discovery now traverses that tree as well. This obviously generates a whole bunch of requests as the depth is traversed. The URLs I tried without a '../', which is usually the case, work fine because the root is the URL we start with.

I'll see if there is an elegant way to detect this, but it's doing what is expected given how the tree is presented at the given URL.
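
To illustrate the mechanism described in this comment, here is a minimal sketch using only the Python standard library. It is not Pulp's discovery code: `AnchorCollector`, `extract_links`, and the sample HTML are hypothetical stand-ins. It shows how a relative `../` anchor in a directory listing resolves to a perfectly valid URL one level above the starting point.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url, html):
    """Return absolute URLs for all anchors found in html."""
    parser = AnchorCollector()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.hrefs]


# Hypothetical autoindex listing with a relative parent link.
SAMPLE = '<a href="../">Parent Directory</a> <a href="repodata/">repodata/</a>'
links = extract_links("http://centos.cu.be/6.3/os/x86_64/", SAMPLE)
```

Here `links` contains `http://centos.cu.be/6.3/os/` alongside the `repodata/` URL, so a crawler that follows every extracted link would leave the directory it was asked to scan.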

Comment 3 Michael Hrivnak 2012-07-18 16:09:41 UTC
I'll just add a touch more context:

Many repositories are set up as a web server that just serves static files in a directory structure, with the web server generating indexes dynamically.  The default behavior of common web servers such as Apache is to generate a link for the parent directory.

http://httpd.apache.org/docs/2.2/mod/mod_autoindex.html

A problem is that the link is sometimes in different formats.  If you look at the HTML source for the link below, you will see that the parent link is absolute, not relative.

http://be.mirror.eurid.eu/centos/6.3/os/x86_64/

However for this next repo, the link is relative ("../"):

http://centos.cu.be/6.3/os/x86_64/

Since this is default behavior of popular web servers, we should be able to identify and ignore parent directory links.  And because of the differences shown above, we'll need to watch for both relative and absolute paths that would take us up the tree.
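
A minimal sketch of such a check, assuming the simplest rule: resolve every link against the page it came from and follow it only if the result lies strictly below the starting URL. This is an illustration, not Pulp's implementation; `is_below_root` is a hypothetical helper, and the absolute parent path used in the usage note is an assumed form of such a link.

```python
from urllib.parse import urljoin, urlsplit


def is_below_root(root_url, href, page_url=None):
    """Return True if href, resolved against page_url, stays under root_url."""
    resolved = urlsplit(urljoin(page_url or root_url, href))
    root = urlsplit(root_url)
    # Treat the root as a directory so "/x86_64" does not match "/x86_64-extra".
    root_path = root.path if root.path.endswith("/") else root.path + "/"
    return (
        resolved.scheme == root.scheme
        and resolved.netloc == root.netloc
        and resolved.path.startswith(root_path)
        and resolved.path != root_path  # skip links back to the root itself
    )
```

For the root `http://centos.cu.be/6.3/os/x86_64/`, the relative link `../` is rejected and `repodata/` is accepted. For the root `http://be.mirror.eurid.eu/centos/6.3/os/x86_64/`, an absolute parent link such as `/centos/6.3/os/` is rejected by the same comparison, so both link formats are handled by one rule.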

Comment 89 Randy Barlow 2014-01-02 22:08:37 UTC
Pulp 2.X no longer has the repo discovery feature, so I am closing this bug.

