Red Hat Bugzilla – Bug 382691
createrepo takes a long time over nfs
Last modified: 2014-01-21 18:00:41 EST
Description of problem:
When I test the rhel5 kernel, I usually run 'make test' from the cvs directory.
This kicks off the createrepo command, which looks something like
mkdir -p /work/public_html/dist-cvs-repos/kernel-2.6.18-56.el5
where /home/dzickus is my autofs home directory up in Westford and /mnt/redhat
is nfs-mounted to get me the brewroot packages.
Currently it takes about 45 minutes for this to complete. If I log on
to porkchop and run the same command, it takes around a minute.
It seems the nfs connection is slowing things down. I understand there will
be latency between Raleigh and Westford, and that running this on porkchop is
an ideal scenario. However, I was hoping that running it over nfs in Westford
would cost me about 5 minutes instead of 45.
Steps to Reproduce:
1. run make test over an nfs mounted /mnt/redhat directory
Umm - why is this filed against createrepo? How could createrepo make reading
and writing files over nfs any faster?
I don't know the internals of what createrepo is doing and haven't noticed any
other performance problems with the other scripts I use on that directory.
I just ran the following test:
- copied all the rpms from packages/kernel/2.6.18/56.el5 to a local directory and
ran createrepo on that -> 20 minutes (copy) + 1.5 minutes (createrepo) = 21.5 minutes total
- createrepo using the nfs-mounted rpms -> 26.5 minutes
It seems odd that it would be faster to copy everything locally and run
createrepo on the copy than to run createrepo directly on the nfs mount.
Perhaps the bug is somewhere else (glibc, kernel nfs, etc) but I don't know how
to prove that so I am starting here. I am hoping someone with internal
knowledge of createrepo can help me diagnose and point me in a direction to
where I can find the root cause here.
Wait, you care about 21.5 mins vs. 26.5 mins? That's only about a 23% increase.
1 min vs. 45 mins, I can see why you'd care.
You could try using --cachedir to help out on the write side, but as seth says
we really can't do anything about the read side (apart from recommending you
have a real local copy in westford).
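For reference, a --cachedir run would look something like this (the cache path
here is made up). createrepo keeps per-package results in that directory, so
packages that haven't changed since the last run don't have to be reprocessed:

```shell
# Hypothetical invocation: reuse cached per-package metadata between runs.
# The cache directory path is an example, not a convention.
createrepo --cachedir /var/tmp/createrepo-cache \
    /work/public_html/dist-cvs-repos/kernel-2.6.18-56.el5
```

Note this only helps on repeated runs; the first run still reads everything.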
Please re-read my comment #2.
I care about 1.5 minutes vs. 26.5 minutes. The 21.5-minute run was an example
test case to demonstrate that createrepo seems broken:
the total read time over nfs took 20 minutes;
the total createrepo time took 1.5 minutes.
Running createrepo on files over nfs means:
1. read the files over nfs
2. make the metadata
So, um... what?
I assumed createrepo was smart enough that it wouldn't read _all_ 2.6GB of data
to create the metadata, but would instead read just the rpm metadata, which is
considerably smaller.
Perhaps my assumption is wrong?
createrepo has to checksum, and in some cases gpg-check, every package, so
there's no way to read only that smaller data.
I don't believe it is possible to checksum a file without reading ALL of it.
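Right - a checksum is a function of every byte of the file. A minimal sketch in
Python of what "checksum a package" means (the helper name is made up, not
createrepo's actual code); over nfs, this pulls the entire file across the wire:

```python
import hashlib

def file_checksum(path, algo="sha256", chunk_size=1 << 20):
    """Checksum a file by streaming it in chunks.

    Every byte must pass through the hash, so there is no shortcut:
    reading only part of the file cannot produce the right digest.
    """
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()
```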
Hmm. Interesting. I didn't realize createrepo checksums the whole package; I
guess that explains the long wait.
Now, out of curiosity, I thought (from a high-level perspective) that rpm
would checksum and sign the payload, storing the result in the rpm header,
while yum/createrepo would checksum and sign only the header. That way the
payload checksum is calculated once at rpm creation time, saving yum the
effort of re-calculating it.
Then upon remote downloading an rpm, verification would occur in two steps:
verify the header, use the info inside the header to verify the payload.
I mean why should yum care about the payload? All it does is handle a
collection of rpm headers, right?
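For what it's worth, the scheme being described would look roughly like this
toy sketch (Python; this is NOT the real rpm format - the header layout, field
names, and functions here are all invented for illustration):

```python
import hashlib

def _header_digest(header: dict) -> str:
    # Toy stand-in for signing/hashing the header; deterministic encoding.
    return hashlib.sha256(repr(sorted(header.items())).encode()).hexdigest()

def build_package(payload: bytes) -> dict:
    # At build time: checksum the payload ONCE and record it in the header.
    header = {"payload_sha256": hashlib.sha256(payload).hexdigest()}
    return {"header": header,
            "header_sha256": _header_digest(header),
            "payload": payload}

def verify_package(pkg: dict) -> bool:
    # Step 1: verify the (small) header.
    if pkg["header_sha256"] != _header_digest(pkg["header"]):
        return False
    # Step 2: use the checksum stored inside the header to verify the payload.
    return (hashlib.sha256(pkg["payload"]).hexdigest()
            == pkg["header"]["payload_sha256"])
```

Even under a scheme like this, a tool that publishes a checksum of the whole
package file would still have to read every byte of it, which is what
createrepo does today.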
Or is it because yum has to deal with broken rpm headers?
Just trying to overcome my ignorance in this area. I've wandered into the rpm
source code numerous times to look at other performance problems; the house of
mirrors scared me enough to leave it be.