Bug 382691

Summary:	createrepo takes a long time over nfs
Product:	[Fedora] Fedora
Reporter:	Don Zickus <dzickus>
Component:	createrepo
Assignee:	James Antill <james.antill>
Status:	CLOSED NOTABUG
QA Contact:	Fedora Extras Quality Assurance <extras-qa>
Severity:	low
Priority:	low
Version:	7
CC:	james.antill, lmacken
Keywords:	Reopened
Hardware:	All
OS:	Linux
Doc Type:	Bug Fix
Last Closed:	2007-11-14 22:59:24 UTC

Description Don Zickus 2007-11-14 15:55:52 UTC

Description of problem:
When I test the rhel5 kernel, I usually run 'make test' from the cvs directory.
This kicks off the createrepo command, which looks something like:

mkdir -p /work/public_html/dist-cvs-repos/kernel-2.6.18-56.el5
/usr/bin/createrepo \
  -o /home/dzickus/public_html/dist-cvs-repos/kernel-2.6.18-56.el5 \
  --baseurl http://porkchop.redhat.com/brewroot/packages/kernel/2.6.18/56.el5/ \
  /mnt/redhat/brewroot/packages/kernel/2.6.18/56.el5

where /home/dzickus is my autofs home directory up in Westford and /mnt/redhat
is NFS-mounted to get me the brewroot packages
(curly.devel.redhat.com:/vol/engineering/devarchive/redhat).

Currently it takes about 45 minutes for this to complete.  If I log on to
porkchop and run the same command there, it takes about a minute.

The NFS connection seems to be slowing things down.  I understand there will be
latency between Raleigh and Westford, and that running this on porkchop is the
ideal scenario.  However, I was hoping that running this over NFS in Westford
would take about 5 minutes instead of 45.

Version-Release number of selected component (if applicable):
createrepo-0.4.10-1.fc7

How reproducible:
very

Steps to Reproduce:
1. Run 'make test' over an NFS-mounted /mnt/redhat directory.
  
Actual results:
Slow: createrepo takes about 45 minutes to complete.

Expected results:
Fast: about 5 minutes when run over NFS.

Additional info:

Comment 1 Seth Vidal 2007-11-14 16:19:19 UTC

umm - why is this filed against createrepo? How could createrepo make the rate
of reading/writing files to nfs any faster?

Comment 2 Don Zickus 2007-11-14 19:26:35 UTC

I don't know the internals of what createrepo is doing and haven't noticed any
other performance problems with the other scripts I use on that directory.  

I just ran the following test:

- Copied all the rpms from packages/kernel/2.6.18/56.el5 to a local directory and
ran createrepo on that -> 20 minutes (copy) + 1.5 minutes (createrepo) = 21.5
minutes total

- Ran createrepo directly on the NFS-mounted rpms -> 26.5 minutes

It seems odd that it would be faster to copy everything over and run createrepo
on that rather than run createrepo on the nfs mount.

Perhaps the bug is somewhere else (glibc, kernel nfs, etc) but I don't know how
to prove that so I am starting here.  I am hoping someone with internal
knowledge of createrepo can help me diagnose and point me in a direction to
where I can find the root cause here.

Comment 3 James Antill 2007-11-14 21:18:15 UTC

Wait, you care about 21.5 mins vs. 26.5 mins? That's like a 23% increase. 1 min
vs. 45 mins, I can see why you'd care.

You could try using --cachedir to help out on the write side, but as Seth says,
we really can't do anything about the read side (apart from recommending you
have a real local copy in Westford).

Comment 4 Don Zickus 2007-11-14 21:32:16 UTC

Please re-read my comment #2.

I care about 1.5 minutes vs 26.5 minutes.  The 21.5 minutes was an example test
case to demonstrate that createrepo seems broken.

Comment 5 Seth Vidal 2007-11-14 21:36:20 UTC

So the total read time over NFS was 20 minutes, and
the total createrepo time was 1.5 minutes.

running createrepo on files in nfs means:
1. read the files over nfs
2. make the metadata

so, um.. what?
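
The breakdown above can be put into numbers. This is only a back-of-the-envelope sketch using the timings reported in comment 2; the variable names are made up for illustration.

```python
# Timings reported in comment 2, in minutes.
copy_over_nfs = 20.0       # cp of all rpms from the NFS mount to local disk
createrepo_local = 1.5     # createrepo run on the local copy
createrepo_on_nfs = 26.5   # createrepo run directly against the NFS mount

# Step 1 (read the files over NFS) plus step 2 (make the metadata).
copy_then_local = copy_over_nfs + createrepo_local

# Fraction of the copy-then-index run spent just moving bytes over NFS.
read_share = copy_over_nfs / copy_then_local

print(f"copy + local createrepo: {copy_then_local} min")
print(f"share spent reading over NFS: {read_share:.0%}")
print(f"direct NFS run is {createrepo_on_nfs - copy_then_local} min slower")
```

On these figures, reading the packages accounts for most of the wall-clock time whichever way the job is run.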

Comment 6 Don Zickus 2007-11-14 21:54:50 UTC

I assumed createrepo was smart in the sense it wouldn't read _all_ 2.6GB of data
in order to create the metadata but instead just read the rpm metadata which
would be considerably smaller.

Perhaps my assumption is wrong?

Comment 7 Seth Vidal 2007-11-14 22:06:25 UTC

createrepo has to checksum, and in some cases GPG-check, every package, so
reading only the rpm metadata is not enough.

I don't believe it is possible to checksum a file without reading ALL of it.
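
A minimal sketch of why that is, assuming a plain whole-file digest. This is not createrepo's actual code; the function name, the default algorithm, and the chunk size are all assumptions chosen for illustration.

```python
import hashlib


def checksum_file(path, algo="sha1", chunk_size=1 << 20):
    """Return (hex digest, number of bytes read) for the file at `path`."""
    digest = hashlib.new(algo)
    bytes_read = 0
    with open(path, "rb") as fh:
        # The digest depends on every byte of the file, so every chunk
        # has to be fetched from the (possibly remote) filesystem.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
            bytes_read += len(chunk)
    return digest.hexdigest(), bytes_read
```

The byte count returned always equals the file size, so checksumming 2.6GB of packages on an NFS mount means pulling 2.6GB across the network. At the roughly 20 minutes reported for the copy in comment 2, that works out to about 2 MB/s.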

Comment 8 Don Zickus 2007-11-15 15:07:27 UTC

Hmm.  Interesting.  I didn't realize createrepo checksums the whole package.  I
guess that would make sense for the long wait.

Now, out of curiosity, I thought (from a high-level perspective here) that rpm
would checksum and sign the payload, storing the result in the rpm header, while
yum/createrepo would just checksum and sign the header.  This way the payload
checksum is calculated at rpm creation time, saving yum the effort of
re-calculating it.  Then, upon remotely downloading an rpm, verification would
occur in two steps: verify the header, then use the info inside the header to
verify the payload.
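
A toy model of the two-step scheme described above. It is a sketch of the idea only and does not reflect how rpm, yum, or createrepo actually work; the dict layout, the JSON header, and every function name are hypothetical.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_package(name: str, payload: bytes) -> dict:
    # Build time: the payload digest is computed once and stored in the header.
    header = json.dumps(
        {"name": name, "payload_sha256": sha256_hex(payload)},
        sort_keys=True,
    ).encode()
    return {"header": header, "payload": payload}


def index_package(pkg: dict) -> str:
    # Repository side: only the small header is read and checksummed.
    return sha256_hex(pkg["header"])


def verify_download(pkg: dict, indexed_header_digest: str) -> bool:
    # Step 1: check the header against the repository metadata.
    if sha256_hex(pkg["header"]) != indexed_header_digest:
        return False
    # Step 2: check the payload against the digest recorded in the header.
    expected = json.loads(pkg["header"].decode())["payload_sha256"]
    return sha256_hex(pkg["payload"]) == expected
```

In this model the indexing step never touches the payload, which is the saving being asked about. It only holds if the digest inside the header can be trusted, and it does not cover the case from comment 7 where the repository metadata carries a checksum of the whole package file.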

I mean why should yum care about the payload?  All it does is handle a
collection of rpm headers, right?

Or is it because yum has to deal with broken rpm headers?

Just trying to overcome my ignorance in this area.  I have wandered into the rpm
source code numerous times to look at other performance problems.  The house of
mirrors scared me enough to leave it be.