Bug 537870 - Failure to download images/install.img - error reading header: cpio: read failed
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: anaconda
Version: rawhide
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: David Cantrell
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-11-16 16:00 UTC by James Laska
Modified: 2013-09-02 06:42 UTC (History)

Fixed In Version: anaconda-13.9-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-02-23 19:53:55 UTC
Type: ---
Embargoed:


Attachments:
/tmp/syslog (50.21 KB, text/plain)
2009-11-16 16:00 UTC, James Laska
/tmp/anaconda.log (2.12 KB, text/plain)
2009-11-16 16:05 UTC, James Laska

Description James Laska 2009-11-16 16:00:40 UTC
Created attachment 369726
/tmp/syslog

Description of problem:

 * Fails to activate networking 

Version-Release number of selected component (if applicable):

 * anaconda version 13.8

How reproducible:


Steps to Reproduce:
1. Initiate a manual install on a virtual machine or bare metal.
2. Provide a location for a remote install source.
  
Actual results:

                      ┌────────────┤ Error ├────────────┐                       
                      │                                 │                       
                      │ Unable to retrieve              │                       
                      │ http://download.fedoraproject.o │                       
                      │ rg/pub/fedora/linux/development │                       
                      │ /x86_64/os//images/install.img. │                       
                      │                                 │                       
                      │             ┌────┐              │                       
                      │             │ OK │              │                       
                      │             └────┘              │                       
                      │error reading header: cpio: read failed -│Success                
                      │                                 │                       
                      └─────────────────────────────────┘     


Expected results:

images/install.img downloads correctly.

Additional info:

 * See attached files (/tmp/syslog and anaconda.log).
 * From the failing system, I am able to ping other hosts, but only by IP.  It seems DNS might not be set up correctly?

Comment 1 James Laska 2009-11-16 16:05:16 UTC
Created attachment 369727
/tmp/anaconda.log

Comment 2 Chris Lumens 2009-11-16 16:16:32 UTC
According to your syslog, you've got nameserver information, but this looks likely to be the problem:

<185>Nov 16 15:03:02 NET: dhclient: failed to create default route: 10.10.11.254 dev eth0

Comment 3 Dan Williams 2009-11-17 02:52:55 UTC
I wonder why it's even trying to do that; when dhclient is run by NetworkManager, dhclient-script doesn't get run, and NetworkManager handles the default route instead.  But that message appears to come from dhclient-script's add_default_gateway() function.  Which I don't understand...

David, any idea here?  NM runs dhclient with a command-line like:

/sbin/dhclient -d -sf /usr/libexec/nm-dhcp-client.action -pf /var/run/dhclient-usb0.pid -lf /var/lib/dhclient/dhclient-f4419c0a-1740-4b2b-b61e-91935bdae692-usb0.lease -cf /var/run/nm-dhclient-usb0.conf usb0

using the custom script file of course...  the script handling seems a bit convoluted in dhclient, but I can't offhand see where it would ever be calling dhclient-script anywhere.

Comment 4 David Cantrell 2009-11-18 03:14:29 UTC
I'm definitely able to reproduce this locally.  NetworkManager and loader are working fine.  At least for me, I get a DHCP lease, NM does what it does, I have an IP, routing table configured, and /etc/resolv.conf written.

I hit OK at the error message for Unable to Download and change the hostname of my install server to the IP address.  It works after that.

It looks like our problem is with libcurl and DNS resolution.  I added this to loader/urls.c:

diff --git a/loader/urls.c b/loader/urls.c
index 495516a..24ceb33 100644
--- a/loader/urls.c
+++ b/loader/urls.c
@@ -104,6 +104,8 @@ int urlinstTransfer(struct loaderData_s *loaderData, struct 
                     char **extraHeaders, char *dest) {
     struct progressCBdata *cb_data;
     CURLcode status;
+    CURLSHcode sh;
+    CURLSH *sharedns = NULL;
     struct curl_slist *headers = NULL;
     char *version;
     FILE *f = NULL;
@@ -126,6 +128,34 @@ int urlinstTransfer(struct loaderData_s *loaderData, struct
     curl_easy_setopt(loaderData->curl, CURLOPT_URL, ui->url);
     curl_easy_setopt(loaderData->curl, CURLOPT_WRITEDATA, f);
 
+    if ((sharedns = curl_share_init()) != NULL) {
+        sh = curl_share_setopt(sharedns, CURLSHOPT_SHARE, CURL_LOCK_DATA_DNS);
+
+        if (sh != CURLSHE_OK) {
+            logMessage(ERROR, "%s: %d: %s", __func__, __LINE__,
+                       curl_easy_strerror(sh));
+            sh = curl_share_cleanup(sharedns);
+
+            if (sh != CURLSHE_OK) {
+                logMessage(ERROR, "%s: %d: %s", __func__, __LINE__,
+                           curl_easy_strerror(sh));
+            }
+
+            sharedns = NULL;
+        } else {
+            status = curl_easy_setopt(loaderData->curl, CURLOPT_SHARE,
+                                      sharedns);
+            if (status != CURLE_OK) {
+                logMessage(ERROR, "%s: %d: %s", __func__, __LINE__,
+                           curl_easy_strerror(status));
+            }
+        }
+    } else {
+        logMessage(ERROR, "%s: %d: curl_share_init() returned NULL",
+                   __func__, __LINE__);
+        sharedns = NULL;
+    }
+
     /* If a proxy was provided, add the options for that now. */
     if (loaderData->proxy && strcmp(loaderData->proxy, "")) {
         curl_easy_setopt(loaderData->curl, CURLOPT_PROXY, loaderData->proxy);
@@ -183,6 +213,12 @@ int urlinstTransfer(struct loaderData_s *loaderData, struct
     fclose(f);
     free(version);
 
+    sh = curl_share_cleanup(sharedns);
+    if (sh != CURLSHE_OK) {
+        logMessage(ERROR, "%s: %d: %s", __func__, __LINE__,
+                   curl_easy_strerror(sh));
+    }
+
     return status;
 }
 

*BUT*, that didn't work.  I need to read up on libcurl a bit more and figure out what's happening.  libcurl has CURLOPT_DNS_USE_GLOBAL_CACHE, which is marked as deprecated.  Instead, you are supposed to create a share object (the CURLSH handle above) and enable DNS sharing on it.  That's what I tried, but my first attempt didn't work.  Just wanted to let people know where I am.  It's definitely not a NetworkManager problem.

Comment 5 Dan Williams 2009-11-18 20:04:03 UTC
That's odd.  Why does curl care at all?  Doesn't it just do gethostbyname() or whatever and libc takes care of the DNS resolution?

Maybe the res_init() call in get_connection() somehow isn't doing what we want?  Can you try calling res_init() right before you start the libcurl requests?

Looks like the only place the loader calls res_init() is in get_connection():

        if (state == NM_STATE_CONNECTED) {
            logMessage(INFO, "%s (%d): NetworkManager connected",
                       __func__, __LINE__);
            res_init();
            g_object_unref(client);
            return 0;
        }

which should be fine, but let's try tossing res_init() in a few other places to work around glibc stupidity...
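
For illustration, the "call res_init() right before the request" idea could look roughly like the sketch below.  It uses only libc, not loader code: resolve_fresh() is a hypothetical helper name, and the choice to report a res_init() failure as EAI_SYSTEM is an assumption made for the example.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1
#endif

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <netdb.h>
#include <resolv.h>
#include <string.h>

/*
 * Hypothetical helper (not part of loader): ask the resolver to re-read
 * its configuration, then resolve `host`.
 * Returns 0 on success, or a getaddrinfo() error code on failure.
 */
int resolve_fresh(const char *host) {
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int rc;

    /* Re-read /etc/resolv.conf so nameservers written after DHCP are seen. */
    if (res_init() != 0)
        return EAI_SYSTEM;   /* assumption: map a res_init() failure to this */

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;       /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc == 0)
        freeaddrinfo(res);             /* only the success/failure is needed */

    return rc;
}
```

A caller would check resolve_fresh(hostname) == 0 before starting the transfer, and could log gai_strerror() of the return value otherwise.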

Comment 6 David Cantrell 2009-11-18 20:31:03 UTC
libcurl maintains some sort of state when you initialize it.  From their curl_easy_setopt() documentation:

"NOTE: the name resolve functions of various libc implementations don't re-read name server information unless explicitly told so (for example, by calling res_init(3)). This may cause libcurl to keep using the older server even if DHCP has updated the server info, and this may look like a DNS cache issue to the casual libcurl-app user."

What we were doing in loader was calling curl_global_init() once and then using that curl object throughout.  I moved the init calls for curl to our urlinstTransfer() function and added cleanup calls, so each time we need to use curl, we set things up, call curl, and clean up.  That fixes the problem we're seeing.

Comment 7 David Cantrell 2009-11-19 00:51:29 UTC
Fixed in commit 46312dc05b61d7fd18fe9710461eb9b0a9118607.

