Bug 431757

Summary: caching needed when "Starting install process."
Product: [Fedora] Fedora Reporter: John Reiser <jreiser>
Component: anacondaAssignee: Anaconda Maintenance Team <anaconda-maint-list>
Status: CLOSED WONTFIX QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low Docs Contact:
Priority: low    
Version: 9CC: dtimms, wwoods
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-08-01 15:36:12 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 235705    
Attachments:
Description Flags
anaconda.log
none
X.log
none
syslog
none
anaconda.log
none
X.log
none
follow tcp stream - child died
none
strace for one .hdr while "Starting install process" none

Description John Reiser 2008-02-06 19:44:38 UTC
Description of problem:  Installer hangs "Starting install process.  This may
take several minutes." after 15 minutes on a 2.5GHz machine with 2GB RAM.


Version-Release number of selected component (if applicable):
anaconda-11.4.0.28

How reproducible: Always


Steps to Reproduce:
1. Fedora 9 Alpha:  Boot today's i386 Rescue CD (2008-02-06), choose Install,
English, US kenboard, enable IPv4 only (no IPv6),
http://mirrors.kernel.org/fedroa/releases/test/9-Alpha/Fedora/i386/os
2. Enter timezone, password, manual partition (choose pre-partitioned 15GB as
root, format to ext3), DHCP, specify hostname, no GRUB, root password, default
Base + productivity.
3.
  
Actual results: Hangs at "Starting install process.  This may take several
minutes." Shell on C-A-F2 allows "top", which shows almost no CPU activity, but
processes cycle through Xorg, etc.


Expected results: Install continues normally.


Additional info: Will attach anaconda.log and X.log .

Comment 1 John Reiser 2008-02-06 19:45:57 UTC
Created attachment 294135 [details]
anaconda.log

Comment 2 John Reiser 2008-02-06 19:46:21 UTC
Created attachment 294136 [details]
X.log

Comment 3 John Reiser 2008-02-06 19:51:53 UTC
The box already runs Fedora 8 i386 and x86_64 and Ubuntu 7.04, all installed in
the last 7 days.

Comment 4 John Reiser 2008-02-06 20:59:11 UTC
Created attachment 294146 [details]
syslog

Comment 5 John Reiser 2008-02-06 20:59:31 UTC
Created attachment 294147 [details]
anaconda.log

Comment 6 John Reiser 2008-02-06 20:59:51 UTC
Created attachment 294148 [details]
X.log

Comment 7 John Reiser 2008-02-06 21:23:12 UTC
Package install started after 20 minutes.  TWENTY MINUTES.  Something is
*horribly* slow.  2.5GHz (Athlon X2 4800+) and 2GB RAM is not the fastest box
today, but it is a very long ways away from the advertised minimums.

Comment 8 Jeremy Katz 2008-02-06 23:10:48 UTC
I *suspect* that your latency was network based.  We have to do a lot of header
grabbing to actually start the install.  And that's not exactly fast

Comment 9 David Timms 2008-02-12 13:58:42 UTC
Created attachment 294659 [details]
follow tcp stream - child died

(In reply to comment #8)
> I *suspect* that your latency was network based.  We have to do a lot of
> header grabbing to actually start the install.  And that's not exactly fast

If it is going to take more than 1 second - please give good tui/gui.
Eg. retrieved header 1 of 1345, or a progress bar. Est time remaing after the
first two items complete...to give some impression.
====
However, my opinion:
Based on a vmware-server 1.0.4 test on a F8 kernel-2.6.23.14-115.fc8 host. My
machine is still saying "Starting Install Process - this may take several
minutes" after 3 hours.
On VT3, I can see WARNING : Try 1/10 ftp://gg/g.rpm Failed. This seems to be
happening every couple of files from download.fedora.redhat.com

Since it is in a vm, I ran wireshark on the host and see:

wait {30 seconds}
ftp login|cd|test|RESU {4 seconds <1-2kB/sec via gkrellm}
ftp data {< 2 seconds 5-20kB/sec}
- repeat wait {30 seconds} and so on.

Note: 
1. the ftp connection is not kept alive. it logs in|cd's back to the same
folder for every file.
2. often the "follow tcp stream" shows:
- 500 OOPS: vsf_sysutil_recv_peek: no data
- 500 OOPS: child died
during the download. In this case the data is downloading nicely, when the
above occurs. capture shows the ftp client sending a heap eg 10-30 TCP RSTs,
which the FTP server ignores for a bit, then aborts transfer.

The combination looks like it will take the machine all night++ to complete.

Comment 10 David Timms 2008-02-12 14:10:47 UTC
Very occasionally I see:
421 There are too many connections from your internet address.

in the ftp stream.
However, the client waits only 0.1s of seconds, connects logs in and starts the
cd|RESU process.

John: did you install actually complete ? 
did you give it 8 hours or more ?

Comment 11 John Reiser 2008-02-12 15:13:15 UTC
I succesfully completed two installs using the original F-9(Alpha) rescue CD and
http.  After  the clue about network latency (Comment #8 above), I watched the
activity light on the ethernet port during "Starting install process."  There
was an obvious cycle of about 2 or 3 seconds in length, repeated over and over.
 The number of packages is about 870, which would be 30 to 45 minutes for all
the <package>.hdr.  [Perhaps the remote end got faster from time to time.]

This is an obvious candidate for pipelining, parallelism (3 or 4 threads), or
for aggregating the *.hdr into a single file on the distribution medium in advance.

Comment 12 John Reiser 2008-02-12 15:59:50 UTC
(In reply to comment #9)
For some ranges of network difficulties (such as moderate traffic congestion on
route, high load on server or client, etc.) then http can be noticeably faster
than ftp.  http requires fewer network round trips per file (example: no
separate 'cd' command; the full path is in the original http GET request) and
has less state in a connection.  Sometimes http servers degrade more gracefully
(they just increase latency) than ftp servers (total cutoff based on number of
simultaneous connections.)

Comment 13 David Timms 2008-02-12 20:17:25 UTC
(In reply to comment #10)
> The combination looks like it will take the machine all night++ to complete.
The install completed; end=06:50, start=21:30 ie: 9hr:20. Installed size=2.5GB.
In a rough way that works out to 595kbit/sec for the duration {on a 10Mb adsl
connection that is traversing the pacific}.

For an additional data point, I will repeat with nearest .au mirror, then from
internal web server.

Comment 14 John Reiser 2008-02-20 18:55:04 UTC
Attaching strace [strace -o "|gzip -dc >strace.out.gz" -p <pid_of_anaconda>] and
analyzing the output suggests:

1. Cache the patience of the human by using a progress bar (bug #186789).

2. Re-use the connection for more than one .hdr; do not close the connection
every time.  Put a limit of something like one minute elapsed or 20 .hdr per
connection on this caching.

3. Cache the DNS: don't translate the mirror name into numeric IPx for every
.hdr.  Even if there is a caching DNS server on the same local network,
communicating with it is slow; and in many cases the nearest DNS is even slower
because it is via DSL or cable modem.  Put a limit of something like five
minutes on this caching.

4. Cache CPU time: use a 64-character local buffer to read http and ftp command
protocol.  The current 1-character reads on http require two system calls per
character, which is slower than a 150KB/s ADSL network!  (Evidence: the first
8192-byte read for payload [after the http GetPartialContent protocol overhead]
often returns more than the 1388 bytes [hint: MTU] of subsequent reads.)  

5. On DVD or CD, optimize usage of the device's buffer (often 2MB sequential)
and of the head position.  This can be done by splitting each .rpm into two File
Sections [see 6.5.1 of the Ecma-119.pdf ISO 9660 specification], with the first
section [extent] covering the .hdr sectors.  Group the .hdr sections together
sequentially; put the remainder of each file somewhere else.  This will reduce
seeks dramatically (bug #352481).

6. Fix .rpm to allow compressed ChangeLog and compressed FileList.  These are
*not* compressed in current .rpm!

Comment 15 John Reiser 2008-02-20 19:10:16 UTC
Created attachment 295440 [details]
strace for one .hdr while "Starting install process"

Here is current system call activity (763 calls) to retrieve one .hdr via http.
 Note the DNS resolution of mirrors.kernel.org, reading the http response using
a separate poll+recv for each character, payload read gets 3844 bytes the first
time and only 1388 the second, connection closed immediately.

Comment 16 Chris Lumens 2008-03-12 21:55:35 UTC
#1:  Done.

Comment 17 Will Woods 2008-04-01 17:23:15 UTC
Since there's a progress bar for this now, I wouldn't consider it a release
blocker. But there's still some work that could be done here. 

I suspect most of the work that will need to happen will be deep in the bowels
of urlgrabber. We might even consider moving to pycurl.. but these are both F10
items, at the earliest.

I'm moving this to F9Target, in hopes that someone will find a simple speedup
that we can use for F9.

Comment 18 Will Woods 2008-04-01 17:25:12 UTC
Oh yeah - see also bug 344301, regarding caching headers from the CD.

Comment 19 Will Woods 2008-04-01 17:25:52 UTC
and bug 344301 about poor FTP performance.

Comment 20 Will Woods 2008-04-01 17:27:11 UTC
Oops, sorry - bug 352481 is the one about CD caching. It was already mentioned
in comment #14.

Comment 21 Bug Zapper 2008-05-14 05:02:15 UTC
Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 22 Andy Lindeberg 2008-08-01 15:36:12 UTC
Item 1 has been taken care of, and we will consider any patches you submit for
items 3, 4, and 5. Please submit items 2 and 6 against the proper components
(urlgrabber and rpm, respectively).