Bug 562209 - Booting boot.iso, installer unable to read network package metadata
Summary: Booting boot.iso, installer unable to read network package metadata
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: anaconda
Version: rawhide
Hardware: All
OS: Linux
low
high
Target Milestone: ---
Assignee: Anaconda Maintenance Team
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: F13Alpha, F13AlphaBlocker
TreeView+ depends on / blocked
 
Reported: 2010-02-05 15:59 UTC by James Laska
Modified: 2014-01-21 06:16 UTC (History)
11 users (show)

Fixed In Version: python-urlgrabber-3.9.1-5.fc13
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-02-26 21:07:30 UTC


Attachments (Terms of Use)
anaconda-logs.tgz (/tmp/*log, /tmp/anaconda-screenshots) (312.26 KB, application/x-compressed-tar)
2010-02-05 15:59 UTC, James Laska
no flags Details
a reproducer (711 bytes, text/plain)
2010-02-16 15:38 UTC, Kamil Dudka
no flags Details

Description James Laska 2010-02-05 15:59:00 UTC
Created attachment 389109 [details]
anaconda-logs.tgz (/tmp/*log, /tmp/anaconda-screenshots)

Description of problem:

Booting the installer using the boot.iso, the installer is unable to read the metadata from default mirrorlist URL.  I've entered several known good URL's and the error continues.  I believe something might be faulty with booting the boot.iso.  

Booting the vmlinuz/initrd.img using a network install.img (e.g. stage2=) does not exhibit this problem.

Version-Release number of selected component (if applicable):

 * anaconda-13.25

How reproducible:

 * 100% when booting boot.iso

Steps to Reproduce:
1. Download http://alt.fedoraproject.org/pub/alt/stage/rawhide-testing/i386/os/images/boot.iso
2. Boot (bare metal or KVM guest)
3. Proceed through install accepting all defaults
4. When prompted, activate networking

Actual results:

Unable to read package metadata from default mirrorlist URL (or any manually entered valid rawhide URLs).

10:55:44,321 WARNING anaconda: Try 1/10 for http://download.fedoraproject.org/pub/fedora/linux/development/x86_64/os/repodata/repomd.xml failed: [Errno 14] PYCURL ERROR 6 - ""
10:55:44,573 WARNING anaconda: Failed to get http://download.fedoraproject.org/pub/fedora/linux/development/x86_64/os/repodata/repomd.xml from mirror 1/1, or downloaded file is corrupt

Expected results:

 * Should read network repodata

Additional info:

 * If I manually enter the URL and attempt to download the repomd.xml file from tty2 ... it works fine
 * Adding as an Alpha blocker per the alpha release requirement (https://fedoraproject.org/wiki/Fedora_13_Alpha_Release_Criteria) - "#  The installer must boot (if appropriate) and run on all primary architectures from default live image, DVD, and boot.iso install media "

Comment 1 Chris Lumens 2010-02-05 22:16:43 UTC
This only happens if you do not have the network brought up in advance and must use the netconfig dialog.  So if you use updates=http://, or stage2=, or some other method you won't see this problem.

Error 6 is:

  CURLE_COULDNT_RESOLVE_HOST,    /* 6 */

Now, when anaconda brings up the network there should be a /etc/resolv.conf getting written out and the resolver cache gettting flushed.  However, if I break at the right place in yum (yumRepo.py:692) during anaconda and manually flush the resolver cache and then try the uh.urlgrab() call again, nothing's improved.  If I start up a different python process on tty2 and do the same thing, it works fine.

I'm at a loss for why we're seeing this in anaconda.  Is python-pycurl or libcurl perhaps holding on to the resolver cache information somehow and needs to be told to flush?

Comment 2 James Laska 2010-02-09 16:26:24 UTC
Jeff or Stepan, do you have any thoughts on the previous comment#1?

Comment 3 Jeffrey C. Ollie 2010-02-09 16:42:28 UTC
I have no idea although Chris' hypothesis sounds plausible.  This sounds more like it should be assigned to libcurl itself as pycurl is just a wrapper around libcurl.

Comment 4 James Laska 2010-02-09 17:14:30 UTC
Thanks Jeff, reassigning to libcurl as requested.

Comment 5 James Laska 2010-02-10 20:47:39 UTC
Kamil, any thoughts on this issue?

Comment 6 Kamil Dudka 2010-02-11 10:26:27 UTC
Rawhide curl does not use glibc/nss for name resolving.  It uses c-ares (bug #514771).  Can I somehow reproduce the bug out of anaconda?  It's merely impossible to debug it there...

Comment 7 Chris Lumens 2010-02-11 15:12:09 UTC
I have not tested this, but this procedure ought to work:

(1) Boot a machine without networking.
(2) Start a python shell.
(3) Do some action that uses pycurl, say urlgrabber.urlgrab("http://www.fedoraproject.org")
(4) That should fail with PYCURL ERROR #6.
(5) Without quitting the python shell, bring up the network.
(6) Repeat step #3.  That should also fail, if this bug is to be believed.

Comment 8 James Laska 2010-02-12 00:59:47 UTC
(In reply to comment #7)
> I have not tested this, but this procedure ought to work:

# python
>>> import urlgrabber
>>> urlgrabber.urlgrab("http://www.fedoraproject.org")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/site-packages/urlgrabber/grabber.py", line 612, in urlgrab
    return default_grabber.urlgrab(url, filename, **kwargs)
  File "/usr/lib/python2.6/site-packages/urlgrabber/grabber.py", line 976, in urlgrab
    return self._retry(opts, retryfunc, url, filename)
  File "/usr/lib/python2.6/site-packages/urlgrabber/grabber.py", line 880, in _retry
    r = apply(func, (opts,) + args, {})
  File "/usr/lib/python2.6/site-packages/urlgrabber/grabber.py", line 962, in retryfunc
    fo = PyCurlFileObject(url, filename, opts)
  File "/usr/lib/python2.6/site-packages/urlgrabber/grabber.py", line 1056, in __init__
    self._do_open()
  File "/usr/lib/python2.6/site-packages/urlgrabber/grabber.py", line 1314, in _do_open
    self._do_grab()
  File "/usr/lib/python2.6/site-packages/urlgrabber/grabber.py", line 1444, in _do_grab
    self._do_perform()
  File "/usr/lib/python2.6/site-packages/urlgrabber/grabber.py", line 1301, in _do_perform
    raise err
urlgrabber.grabber.URLGrabError: [Errno 14] PYCURL ERROR 6 - ""
>>> 
[1]+  Stopped                 python

# ifconfig  eth0
# dhclient eth0
# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 52:54:00:7E:25:B5  
          inet addr:10.10.10.152  Bcast:10.10.11.255  Mask:255.255.252.0

# fg
python
>>> urlgrabber.urlgrab("http://www.fedoraproject.org")
''

Comment 9 Kamil Dudka 2010-02-12 01:15:54 UTC
Any chance to run anaconda within a debugger?

Is it possible to catch a strace of the failure?

Comment 10 Chris Lumens 2010-02-12 14:41:04 UTC
Running anaconda in a debugger is not really doable.  However, any tree containing anaconda-13.24-1 or later has strace in the loader and stage2, so we can strace anaconda.  Doesn't look like jlaska was able to reproduce it.

Comment 11 Chris Lumens 2010-02-12 18:54:54 UTC
I ran strace -p on anaconda before enabling the network via the UI, and this is a snip of what I'm seeing.  Note the sin_addr argument to connect.  On a working install, strace will show the expected address here.  Full strace output available if needed.

connect(26, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
sendto(26, "\364\341\1\0\0\1\0\0\0\0\0\0\7mirrors\rfedoraproje"..., 43, MSG_NOSIGNAL, NULL, 0) = 43
clock_gettime(CLOCK_MONOTONIC, {285, 962307462}) = 0
clock_gettime(CLOCK_MONOTONIC, {285, 962516013}) = 0
clock_gettime(CLOCK_MONOTONIC, {285, 962717607}) = 0
poll([{fd=26, events=POLLIN|POLLRDNORM}], 1, 8123) = 1 ([{fd=26, revents=POLLERR}])
clock_gettime(CLOCK_MONOTONIC, {285, 963221462}) = 0
recvfrom(26, 0x7fffffff6140, 513, 0, 0x7fffffff6350, 0x7fffffff613c) = -1 ECONNREFUSED (Connection refused)
close(26)

Comment 12 Kamil Dudka 2010-02-16 15:38:06 UTC
Created attachment 394569 [details]
a reproducer

Attached is a py script triggering the state out of anaconda.  Does any documentation says it is a flaw in curl and/or c-ares?  If not, the bug should be reassigned back to anaconda or whatever...

Comment 13 Kamil Dudka 2010-02-16 16:00:48 UTC
Note the last test case only works with c-ares:

Traceback (most recent call last):
  File "./trigger_bz562209.py", line 13, in perf
    c.perform()
error: (6, 'Could not resolve host: www.fedoraproject.org (Could not contact DNS servers)')

open("/etc/resolv.conf", O_RDONLY)      = 3
open("/etc/nsswitch.conf", O_RDONLY)    = 3
open("/dev/urandom", O_RDONLY)          = 3
open("/dev/null", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
open("/etc/hosts", O_RDONLY)            = 4
OK

When using glibc/nss any of the tested cases does not work.

Comment 14 James Laska 2010-02-17 15:52:11 UTC
In IRC discussion with Kamil, he noted it wasn't clear when I filed this bug that F-12 does not use libcurl in loader, while F-13 does.  So it will be difficult to track down a specific change that introduced this, since it was a large change.

Kamil offered some suggestions ...

10:40:35  kdudka: so the portable solution is to not use libcurl before /etc/resolv.conf is valid
10:40:44  kdudka: or reload the module
10:41:16  kdudka: a less portable, though less intrusive right now, solution is to close the handle and create a new one
10:41:23  kdudka: but this will work only with c-ares
10:41:46  kdudka: we are going to drop c-ares for Fedora 14 because of several bugs, which are not easy to fix

Clumens noted that anaconda uses libcurl directly in stage#1 loader, but switches to using python-urlgrabber in stage#2.  So perhaps something in that transition is introducing this behavior?

I've added skvidal to the cc list in case this issue bleeds into the python-pycurl space.

Comment 15 seth vidal 2010-02-17 16:06:58 UTC
urlgrabber isn't intercepting anything at the perform level, that I'm aware of.


If there's a place I need to be looking at, I'll be glad to but I'm not sure how this happens at the urlgrabber layer.

Comment 16 Bill Nottingham 2010-02-17 17:38:52 UTC
(In reply to comment #14)
> Clumens noted that anaconda uses libcurl directly in stage#1 loader, but
> switches to using python-urlgrabber in stage#2.  So perhaps something in that
> transition is introducing this behavior?

The issue isn't what the loader uses or doesn't use, it appears to be just whether or not the loader sets up networking. If the loader doesn't, libcurl is initailized before networking is set up (presumably on module import), and it doesn't notice when the network actually is brought up.

Comment 17 Kamil Dudka 2010-02-17 17:40:39 UTC
(In reply to comment #14)
> In IRC discussion with Kamil, he noted it wasn't clear when I filed this bug
> that F-12 does not use libcurl in loader, while F-13 does.  So it will be
> difficult to track down a specific change that introduced this, since it was a
> large change.

James, thanks for summarizing it here!

> Clumens noted that anaconda uses libcurl directly in stage#1 loader, but

Is urlinstTransfer() in loader/urls.c the correct place to look at?

The function urlinstTransfer() does not seem to exist in f12 anaconda.  What's the equivalent there?

> switches to using python-urlgrabber in stage#2.  So perhaps something in that
> transition is introducing this behavior?

Chris, could you please point me to the right place in the code?

Was that code also changed between f12 and f13?

Comment 18 Kamil Dudka 2010-02-17 18:00:05 UTC
(In reply to comment #16)
> The issue isn't what the loader uses or doesn't use, it appears to be just
> whether or not the loader sets up networking. If the loader doesn't, libcurl is
> initailized before networking is set up (presumably on module import), and it
> doesn't notice when the network actually is brought up.    

Which module are you actually talking about?  pycurl, nor urlgrabber reads /etc/resolv.conf on import:

$ strace -e trace=open python <<< "import pycurl; pycurl.global_init(pycurl.GLOBAL_ALL)" 2>&1 | grep etc
open("/etc/ld.so.cache", O_RDONLY)      = 3
open("/etc/ld.so.cache", O_RDONLY)      = 4
open("/etc/selinux/config", O_RDONLY)   = 4

$ strace -e trace=open python <<< "import urlgrabber" 2>&1 | grep etc
open("/etc/ld.so.cache", O_RDONLY)      = 3
open("/etc/localtime", O_RDONLY)        = 7
open("/etc/ld.so.cache", O_RDONLY)      = 8
open("/etc/selinux/config", O_RDONLY)   = 8

However it happens on performing a transfer:

$ strace -e trace=open python <<< "import urlgrabber; urlgrabber.urlgrab('http://fedoraproject.org', '/dev/null')" 2>&1 | grep etcopen("/etc/ld.so.cache", O_RDONLY)      = 3
open("/etc/localtime", O_RDONLY)        = 7
open("/etc/ld.so.cache", O_RDONLY)      = 8
open("/etc/selinux/config", O_RDONLY)   = 8
open("/etc/nsswitch.conf", O_RDONLY)    = 3
open("/etc/host.conf", O_RDONLY)        = 3
open("/etc/resolv.conf", O_RDONLY)      = 3
open("/etc/ld.so.cache", O_RDONLY)      = 3
open("/etc/hosts", O_RDONLY|O_CLOEXEC)  = 3
open("/etc/ld.so.cache", O_RDONLY)      = 3
open("/etc/ld.so.cache", O_RDONLY)      = 3
open("/etc/gai.conf", O_RDONLY)         = -1 ENOENT (No such file or directory)

Comment 19 Chris Lumens 2010-02-17 18:19:28 UTC
> > Clumens noted that anaconda uses libcurl directly in stage#1 loader, but
> 
> Is urlinstTransfer() in loader/urls.c the correct place to look at?

urlinstTransfer is the sole user of libcurl in the loader, so if we want to investigate anaconda's use of libcurl that's the place to do it.  Note that there's no way out of urlinstTransfer without calling curl_easy_cleanup.

> The function urlinstTransfer() does not seem to exist in f12 anaconda.  What's
> the equivalent there?

urlinstStartTransfer and urlinstFinishTransfer are where you want to start.

> > switches to using python-urlgrabber in stage#2.  So perhaps something in that
> > transition is introducing this behavior?
> 
> Chris, could you please point me to the right place in the code?
> 
> Was that code also changed between f12 and f13?    

F12 and earlier uses our own url fetching code in loader, and yum/urlgrabber in stage2 (once the graphical installer starts, basically).  F13 and later uses libcurl in loader and yum/urlgrabber in stage2.  Note that the yum/urlgrabber stack has also changed in F13 to use pycurl.

I really don't see how changes in what loader uses could affect what happens in stage2, though.  As I said, I don't believe there's a way out of urlinstTransfer that doesn't result in curl_easy_cleanup getting called.

Comment 20 seth vidal 2010-02-17 18:27:30 UTC
urlgrabber in F12 was using pycurl, too.

It's not new to f13.

Comment 21 Chris Lumens 2010-02-18 16:08:32 UTC
Okay, let's sum things up.

This problem appears to be happening because the nameserver configuration provided by /etc/resolv.conf changes mid-installation.  The initial configuration was either empty or created when the network wasn't up, therefore is invalid.  The new configuration does not get used because the old configuration is cached.

We've seen problems like this before in anaconda, and the fix has always been to use glibc's res_init function to tell it to re-read the nameserver configuration.  However, that does not work in this case because libcurl uses a different resolver.  This resolver does not provide a similar function therefore we cannot use this fix.

One potential fix according to the reproducer above is to close down the handle provided by pycurl.Curl() and create a new one, as that effectively creates a new object and forces the nameserver configuration to be re-read.  However, anaconda does not have access to any of this information.  It doesn't create any pycurl handles - that's all hidden from us by the yum/urlgrabber stack.  Therefore, we can't really make that modification in anaconda.

A workaround is to bring up the network really early on, like in the loader by passing updates=.  For me, an unanswered question is why we're only seeing this now.  The only libcurl-related changes in anaconda are in the loader, not stage2, and are completely contained within one optional function.

Does this accurately sum up the current situation?

Comment 22 seth vidal 2010-02-18 16:22:43 UTC
Something to try out:

Yum keeps a grabber object per repo, ultimately.

this is where the pycurl.Curl() object gets setup - well - deep inside urlgrabber.

but we should be able to shut it all down by deleting the grabber objects in the yum repo objects.

so something like:

for repo in yumbaseobj.repos.listEnabled():
    repo._grab = None
    repo._grab_func = None

then the next time you use yum to fetch data it should setup a new grabber object.

Can you give that a try?

Comment 23 Kamil Dudka 2010-02-18 16:23:52 UTC
(In reply to comment #21)
> The only libcurl-related changes in anaconda are in the loader, not
> stage2, and are completely contained within one optional function.

f12 libcurl is not built on top of c-ares.  It was intorduced in f13 - see comment #6.  Is it possible to check it by forcing my own libcurl.so?  Is it sufficient to put the library on the updates floppy?

> Does this accurately sum up the current situation?    

AFAICT, yes.

Comment 24 Kamil Dudka 2010-02-18 17:01:42 UTC
Just to keep the bug up2date.  I've tried to force libcurl.so based on glibc/nss (instead of c-ares) and the error within anaconda did not occur.

Comment 25 Chris Lumens 2010-02-19 15:08:21 UTC
I looked into this further, and the suggestions in comment #22 did not help.  The reason for this is that the pycurl.Curl instance is created in urlgrabber.grabber outside any class, so merely importing urlgrabber.grabber before bringing up the network will trigger this issue.  I think anaconda and yum both do this.  We came up with a couple ideas for fixes:

(1) Have urlgrabber only create the Curl handle on demand.  This means no connection sharing, though.

(2) Add a method to urlgrabber to reset the Curl handle.  This means adding a method to the API that we don't really want people to use.

(3) Patching libcurl to handle network changes.

Comment 26 Kamil Dudka 2010-02-19 15:47:28 UTC
(In reply to comment #25)
> (3) Patching libcurl to handle network changes.    

The above implies patching of c-ares, including API/ABI changes in both libcurl and c-ares, or am I wrong?

Comment 27 seth vidal 2010-02-19 20:00:14 UTC
python-urlgrabber-3.9.1-5.fc13 built to implement #2.

reset_curl_obj()

Comment 28 Fedora Update System 2010-02-23 03:42:39 UTC
anaconda-13.29-1.fc13 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/anaconda-13.29-1.fc13

Comment 29 Fedora Update System 2010-02-23 14:04:59 UTC
python-urlgrabber-3.9.1-5.fc13,anaconda-13.29-1.fc13 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/python-urlgrabber-3.9.1-5.fc13,anaconda-13.29-1.fc13

Comment 30 Fedora Update System 2010-02-23 16:09:30 UTC
anaconda-13.29-1.fc13, python-urlgrabber-3.9.1-5.fc13 has been pushed to the Fedora 13 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update anaconda python-urlgrabber'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F13/FEDORA-2010-2688

Comment 31 James Laska 2010-02-23 17:45:00 UTC
Unable to test without a fix for the 'no input devices recognized' bug#566396.  Will retest once that issue is addressed.

Comment 32 Fedora Update System 2010-02-26 02:59:21 UTC
python-urlgrabber-3.9.1-5.fc13 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/F13/FEDORA-2010-2688

Comment 33 Fedora Update System 2010-02-26 11:53:21 UTC
python-urlgrabber-3.9.1-5.fc13 has been pushed to the Fedora 13 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update python-urlgrabber'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F13/FEDORA-2010-2688

Comment 34 James Laska 2010-02-26 14:01:14 UTC
Tested with F-13-Alpha-RC4 i386/x86_64.  The reported problem is resolved.

Comment 35 Fedora Update System 2010-03-02 00:52:06 UTC
python-urlgrabber-3.9.1-5.fc13 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/F13/FEDORA-2010-2688

Comment 36 Fedora Update System 2010-03-03 01:57:47 UTC
python-urlgrabber-3.9.1-5.fc13 has been pushed to the Fedora 13 stable repository.  If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.