Bug 767739

Summary: [PATCH] rpmlint always returns invalid URL for sources on Google Code
Product: [Fedora] Fedora
Reporter: Jason Tibbitts <j>
Component: rpmlint
Assignee: Tom "spot" Callaway <tcallawa>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 21
CC: a.badger, ktdreyer, manuel.wolfshant, michele, mizdebsk, sergio.pasra, tcallawa, tmz, ville.skytta
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: rpmlint-1.6-3.fc21
Doc Type: Bug Fix
Last Closed: 2015-06-07 16:05:14 UTC
Attachments:
Description: Minimal patch fixing the issue
Flags: none

Description Jason Tibbitts 2011-12-14 19:03:29 UTC
Created attachment 546858: Minimal patch fixing the issue

I got tired of people asking if rpmlint's invalid-url complaints for things hosted at googlecode.com could be ignored, so I decided to figure out what the underlying cause is.  I thought it might be some blacklisting based on User-Agent, but it turns out that Google simply returns 404 for HEAD requests of files hosted at googlecode.com.  Since check_url in AbstractCheck.py overrides the default request type and always uses HEAD, users always see invalid-url warnings for URLs containing googlecode.com.

I attach a minimal patch to use GET for googlecode.com URLs.  Alternately, _HeadRequest could be done away with entirely, but I'm sure there's some performance implication for always using GET over HEAD.  Let me know if you'd like this done differently, or if the set of sites needs to be configurable or something.
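
The approach described above can be sketched roughly as follows. This is not the attached patch: it is a minimal illustration written against Python 3's urllib.request (the thread itself refers to urllib2), and the names build_check_request and GET_ONLY_HOSTS are hypothetical.

```python
from urllib.parse import urlsplit
from urllib.request import Request

# Hypothetical list of hosts known to answer HEAD incorrectly.
# Only googlecode.com is named in this report.
GET_ONLY_HOSTS = ("googlecode.com",)


def build_check_request(url, get_only_hosts=GET_ONLY_HOSTS):
    """Return a HEAD request, or a GET request for hosts that reject HEAD."""
    host = urlsplit(url).hostname or ""
    # Match the host itself or any subdomain of it.
    needs_get = any(host == h or host.endswith("." + h) for h in get_only_hosts)
    return Request(url, method="GET" if needs_get else "HEAD")
```

Passing the host list as a parameter is one way the set of sites could be made configurable, should that be wanted.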

Comment 1 Tom "spot" Callaway 2011-12-14 19:21:31 UTC
This is a long standing bug in Google Code:
http://code.google.com/p/support/issues/detail?id=660

I keep hoping they'll unfsck themselves and fix it, but after a few years of Chromium, I no longer have faith.

I'd rather not carry this patch forever though, so if rpmlint upstream takes it, we will too.

Comment 2 Ville Skyttä 2011-12-14 21:11:41 UTC
With GET, stuff the URL points at gets actually downloaded; overriding the default request type always to HEAD is done specifically to avoid this.  For some reason which is not documented in the urllib2 docs as far as I can tell, with the submitted patch, the content is not downloaded entirely with GET but some of it is.

Downloading all of the content while rpmlinting a package is IMO inappropriate behavior and I do not plan to make upstream rpmlint do that, and because it is not documented in urllib2 why only some of it ends up downloaded (at least on my box), I'm not happy about taking the risk either.  Also, downloading a bit with GET and then suddenly closing the connection or closing the connection without reading the content doesn't sound like a well behaved HTTP client to me.  Using GET with the Range header set to a small value would sound better behaved, but I'm not sure I want to open that can of worms either.
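
As a rough illustration of the Range idea mentioned above, a request asking for only the first byte might be built as follows. This is a sketch for Python 3's urllib.request, not rpmlint code; build_range_request is a hypothetical name, and a server is free to ignore the Range header and answer 200 with the full body.

```python
from urllib.request import Request


def build_range_request(url, last_byte=0):
    """Build a GET request asking only for bytes 0..last_byte of the resource."""
    # "bytes=0-0" asks for exactly one byte. A server that honours ranges
    # replies 206 Partial Content; one that does not may send everything.
    return Request(url, headers={"Range": "bytes=0-%d" % last_byte}, method="GET")
```

A caller would still have to avoid reading the whole body when the server ignores the header, so this narrows the problem rather than removing it.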

Getting rid of the message can be done easily in rpmlint's config, for example:
addFilter("invalid-url .*\.googlecode\.com/.*HTTP Error 404")
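
To see what that filter would suppress, the pattern can be tried against sample messages with Python's re module. This assumes rpmlint applies filters as regular-expression searches over each output line; the two sample lines and the package name foo are invented for illustration.

```python
import re

# Same pattern as in the addFilter() call, written as a raw string.
FILTER = re.compile(r"invalid-url .*\.googlecode\.com/.*HTTP Error 404")


def is_filtered(message):
    """Return True if the message would be matched by the filter pattern."""
    return FILTER.search(message) is not None


# Hypothetical rpmlint output lines (package and file names are made up).
googlecode = ("foo.src: W: invalid-url Source0: "
              "http://foo.googlecode.com/files/foo-1.0.tar.gz HTTP Error 404: Not Found")
other = ("foo.src: W: invalid-url Source0: "
         "http://example.org/foo-1.0.tar.gz HTTP Error 404: Not Found")

print(is_filtered(googlecode), is_filtered(other))  # prints: True False
```

Note that the filter hides every 404 from googlecode.com, including ones for URLs that really are broken.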

Comment 3 Jason Tibbitts 2011-12-14 22:46:22 UTC
I can't imagine a trivial additional load on googlecode that's entirely their fault is somehow worse than producing either a false positive (URL marked bad when it's good) or a false negative (not even bothering to check what could be a bad URL just because we might download a small bit of extra data).

But I guess it's not worth fighting about.

Comment 4 Ville Skyttä 2011-12-15 21:21:40 UTC
Load on googlecode's server is a secondary concern.  The primary one is the possible time and bandwidth taken on end user systems if rpmlinting a package ends up actually downloading a non-trivial amount of content -- think for example packages containing tens of megabytes of sources.

Comment 5 Jason Tibbitts 2011-12-16 05:20:19 UTC
This is hilarious, but there's not much else I can add that isn't inflammatory.  I would hope to have a tool that actually checks what it says it checks, but it seems that's too much to wish for.

Comment 6 Mikolaj Izdebski 2012-08-09 11:39:58 UTC
This issue also affects URLs pointing to www.jboss.org.

rpmlint prints a warning:
> jboss-reflect.noarch: W: invalid-url URL: http://www.jboss.org HTTP Error 403: Forbidden

GET method works:

> $ echo "GET / HTTP/1.1
> Host: www.jboss.org
>
> " | nc www.jboss.org 80 | head -1
> HTTP/1.1 200 OK

But HEAD fails:

> $ echo "HEAD / HTTP/1.1
> Host: www.jboss.org
>
> " | nc www.jboss.org 80 | head -1
> HTTP/1.1 403 Forbidden

Comment 7 Ken Dreyer 2012-10-03 21:50:51 UTC
To generalize the problem for any servers that refuse HEAD requests: could rpmlint do a second request, using GET, and close the connection after receiving 1 byte of the HTTP body?
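
A minimal sketch of that fallback, assuming Python 3's urllib.request rather than the urllib2 code rpmlint used at the time. The name url_is_reachable and the opener parameter are hypothetical; the latter exists only so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request


def url_is_reachable(url, timeout=10, opener=urllib.request.urlopen):
    """Try HEAD first; if the server rejects it, retry once with GET."""
    for method in ("HEAD", "GET"):
        request = urllib.request.Request(url, method=method)
        try:
            with opener(request, timeout=timeout) as response:
                if method == "GET":
                    # Ask for a single byte, then let the context manager
                    # close the response without reading the rest.
                    response.read(1)
                return True
        except urllib.error.HTTPError:
            # The server answered with an error status; try the next method.
            continue
        except urllib.error.URLError:
            # Network-level failure; switching methods will not help.
            return False
    return False
```

The library may buffer more than one byte from the socket before the connection is closed, so this limits the download rather than guaranteeing exactly one byte on the wire.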

Comment 8 Fedora End Of Life 2013-04-03 17:48:07 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Comment 9 Michele Baldessari 2014-11-23 13:33:45 UTC
FWIW: this also fails with bitbucket.org downloads
(like https://bitbucket.org/lazka/mutagen/downloads/mutagen-1.26.tar.gz)

SPECS/python-mutagen.spec: I: checking-url https://bitbucket.org/lazka/mutagen/downloads/mutagen-1.26.tar.gz (timeout 10 seconds)
SPECS/python-mutagen.spec: W: invalid-url Source0: https://bitbucket.org/lazka/mutagen/downloads/mutagen-1.26.tar.gz HTTP Error 403: Forbidden

Comment 10 Fedora End Of Life 2015-01-09 16:54:40 UTC
This message is a notice that Fedora 19 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 19. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 19 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 11 Fedora End Of Life 2015-05-29 08:41:47 UTC
This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 20 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 12 Fedora Update System 2015-05-29 16:27:25 UTC
rpmlint-1.6-3.fc22 has been submitted as an update for Fedora 22.
https://admin.fedoraproject.org/updates/rpmlint-1.6-3.fc22

Comment 13 Fedora Update System 2015-05-29 16:27:32 UTC
rpmlint-1.6-3.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/rpmlint-1.6-3.fc21

Comment 14 Fedora Update System 2015-05-29 16:27:40 UTC
rpmlint-1.6-3.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/rpmlint-1.6-3.fc20

Comment 15 Fedora Update System 2015-05-30 15:52:00 UTC
Package rpmlint-1.6-3.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing rpmlint-1.6-3.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-9179/rpmlint-1.6-3.fc20
then log in and leave karma (feedback).

Comment 16 Fedora Update System 2015-06-07 16:05:14 UTC
rpmlint-1.6-3.fc22 has been pushed to the Fedora 22 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 17 Fedora Update System 2015-06-09 15:05:32 UTC
rpmlint-1.6-3.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 18 Fedora Update System 2015-06-09 15:20:32 UTC
rpmlint-1.6-3.fc21 has been pushed to the Fedora 21 stable repository.  If problems still persist, please make note of it in this bug report.