Bug 360291 - yum.resolveDeps infinite loop when mashing f7-updates-testing for ppc
Summary: yum.resolveDeps infinite loop when mashing f7-updates-testing for ppc
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: mash
Version: rawhide
Hardware: All
OS: Linux
urgent
high
Target Milestone: ---
Assignee: Bill Nottingham
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-10-31 14:30 UTC by Luke Macken
Modified: 2016-09-20 02:38 UTC (History)
6 users

Fixed In Version: 3.2.8-1.fc8
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-12-06 20:52:12 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
mash-ppc.out (3.47 KB, text/plain)
2007-11-02 22:57 UTC, Luke Macken
solve.py with excludes (1.33 KB, text/x-python)
2007-11-06 09:35 UTC, Tim Lauridsen
Patch to stop the looping (452 bytes, patch)
2007-11-07 13:54 UTC, Tim Lauridsen

Description Luke Macken 2007-10-31 14:30:32 UTC
Description of problem:
Since 10/23, we have been unable to mash a ppc updates-testing repository.  The
mash process gets stuck within yum.resolveDeps(), and continues to eat as much
memory as possible.

Output of one iteration (truncated once the first package checked comes up again):
http://lmacken.fedorapeople.org/mash.out

Output of `grep "Checking deps for" mash.out`
http://lmacken.fedorapeople.org/mash.deps

During one of my tests, I was able to trigger this bug when trying to mash only
this python package.  This mash never finished and had to be killed manually.
http://lmacken.fedorapeople.org/python-2.5-14.fc7.mash

Version-Release number of selected component (if applicable):
yum-3.2.7-1
mash-0.2.8-1

Comment 1 Luke Macken 2007-10-31 16:32:38 UTC
So I was just able to mash f7-updates, which included
python-2.5-14.fc7.ppc{,64}.rpm, so that may not be the culprit.

Comment 2 Tim Lauridsen 2007-11-01 13:52:15 UTC
Took a look at the mash code; there is a lot of os.fork()'ing. Maybe it would be
an idea to make a hacked version of mash without the forking. It might take a
little longer, but it should be less resource-hungry and safer.


Comment 3 Bill Nottingham 2007-11-01 13:59:54 UTC
Why would forking or not change the depsolver behavior?

Comment 4 Luke Macken 2007-11-01 14:17:47 UTC
FWIW, I'm running my mash tests with fork=False.

I just kicked off a mash of f7-updates-testing using the latest yum from git,
which contains some depsolver patches from Florian Festi.  We'll see what happens.

Comment 5 Tim Lauridsen 2007-11-01 14:20:18 UTC
(In reply to comment #3)
> Why would forking or not change the depsolver behavior?

It should not, but as far as I can see you are running several depsolves at the
same time in separate processes (one for each arch).
Running something in the background can introduce some weird issues in Python,
so IMHO it would be easier to debug if it ran in the foreground, to see whether
it is still possible to make the depsolver go nuts.


Comment 6 Tim Lauridsen 2007-11-01 14:21:11 UTC
(In reply to comment #4)
> FWIW, I'm running my mash tests with fork=False.
> 
> I just kicked off a mash of f7-updates-testing using the latest yum from git,
> which contains some depsolver patches from Florian Festi.  We'll see what happens.

As far as I can see in the mash git repo, fork=False is not working anymore.

Comment 7 Luke Macken 2007-11-01 19:29:41 UTC
The f7-updates-testing mash with the latest yum/mash code produced the same
results: two mash processes stuck depsolving multilib forever.

Comment 8 Luke Macken 2007-11-01 19:34:27 UTC
 5052 lmacken   18   0 2158m 1.9g 1392 R  2.3 95.1 107:26.77 mash                  

Also having trouble mashing i386/x86_64 updates-testing.

Comment 9 Bill Nottingham 2007-11-01 19:39:14 UTC
To eliminate the forking issue - is this just mashing one arch at a time? 

Comment 10 Luke Macken 2007-11-01 20:45:06 UTC
Yep, I commented out the os.fork() in Mash.doDepSolveAndMultilib.

Comment 11 Luke Macken 2007-11-02 22:57:59 UTC
Created attachment 247151 [details]
mash-ppc.out

Attached is the ppc{,64} multilib filelist from another failed attempt, just in
case someone can see something blatantly wrong with it.

Comment 12 Christopher Stone 2007-11-03 17:48:22 UTC
I am not sure if this is related, but my package does not show up in the testing
repos AFAICT:

https://admin.fedoraproject.org/updates/F7/FEDORA-2007-2621

This update involved changing a devel subpackage from noarch to arch-specific.

Comment 13 Seth Vidal 2007-11-03 17:53:28 UTC
Things that we know:
1. it only happens on ppc
2. it only happens on rawhide/f8 - even though f7 has the same mash/yum versions
3. there appears to be no rhyme or reason to the package set installed
in it when it loops
4. we can't seem to get a normal yum depsolving call to blow up like
this
5. it didn't happen with yum 3.2.5-3 or 3.2.7-*. It started happening
after 2 weeks of mashes on 3.2.5-3. So it clearly isn't anything NEW in
the yum code it's something older that is being triggered by a change in
packages.

Anyone else know any other definitive facts we know?


Comment 14 Luke Macken 2007-11-03 18:23:30 UTC
(In reply to comment #13)
> Things that we know:
> 1. it only happens on ppc
> 2. it only happens on rawhide/f8 - eventhough f7 has the same mash/yum versions

To clarify a bit, it happens *on* FC6 x86_64 when trying to compose a ppc F7
updates-testing repository, using mash-0.2.8-1 and yum-3.2.7-1/yum-3.2.5


Comment 15 Luke Macken 2007-11-03 18:49:56 UTC
Hmm.. random observation before I head out the door.

When my f7-updates-testing.mash file contains:

   arches = i386 x86_64

It doesn't even enter Mash.doDepSolveAndMultilib, and finishes in less than 2
minutes.  However, when I change the order of the arches to:

   arches = x86_64 i386

It solves for multilib, and seems to get into the same infinite loop.

Comment 16 Luke Macken 2007-11-05 17:28:22 UTC
So now I am unable to reproduce the ordering issue that I mentioned in my last
comment.  Regardless of the ordering of x86_64/i386, both cases add files for
multilib and never finish solving deps for it.

Comment 17 Luke Macken 2007-11-05 18:00:26 UTC
Here is how you can reproduce this issue locally.  I have been able to do so on
two different machines running FC6 and F7.  broken-repo.tar.bz2 is about 290 MB;
we need all of the RPMs because mash uses YumLocalPackage.

  $ wget http://publictest2.fedoraproject.org/broken-repo.tar.bz2
  $ tar -jxvf broken-repo.tar.bz2
  $ cd broken-repo
  $ ./solve.py

The solve.py that is in the tarball is based on the values that we're hitting in
mash.  You can find it here, for reference:
http://publictest2.fedoraproject.org/solve.py

Comment 18 Tim Lauridsen 2007-11-06 09:34:23 UTC
I have done a little testing with the broken repo, and I can reproduce the
error.

I found that removing
libselinux-2.0.14-8.fc7.x86_64.rpm
makes it work correctly, so the problem is somehow triggered by this package.

I have attached a modified solve.py that excludes
libselinux-2.0.14-8.fc7.x86_64.rpm



Comment 19 Tim Lauridsen 2007-11-06 09:35:32 UTC
Created attachment 248921 [details]
solve.py with excludes

Comment 20 Tim Lauridsen 2007-11-06 12:33:24 UTC
Added some extra debug output to the yum depsolver and found that
libgcj-devel-4.1.2-18.fc7.x86_64.rpm is the package causing the endless loop,
combined with libselinux-2.0.14-8.fc7.x86_64.rpm, in a transaction with a lot of
other packages.

The looping takes place in this piece of code in the Depsolve class
(yum/depsolve.py):

            # check Requires
            while CheckDeps:
                print "check Requires"
                self.cheaterlookup = {}
                if self.dsCallback: self.dsCallback.tscheck()
                CheckDeps, checkinstalls, checkremoves, missing = \
                    self._resolveRequires(errors)
                CheckInstalls |= checkinstalls
                CheckRemoves |= checkremoves

The self._resolveRequires method keeps returning CheckDeps = 1 while the
endless looping takes place.
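The `while CheckDeps` loop above is a fixed-point iteration: it only ends once a full pass over the requirements reports that nothing more needs checking. A minimal sketch of that shape follows, assuming a hypothetical `resolve_pass` callable as a stand-in for `_resolveRequires` (it is not yum's API). It adds an iteration cap, so a pass that keeps reporting more work is surfaced as an error instead of spinning forever.

```python
class DepsolveDidNotConverge(RuntimeError):
    """Raised when the requirement check keeps asking for another pass."""


def run_until_stable(resolve_pass, max_passes=100):
    """Repeat resolve_pass() until it reports no further work.

    resolve_pass is a hypothetical stand-in for Depsolve._resolveRequires:
    it returns a truthy value when the transaction needs another check.
    Returns the number of passes that were run.
    """
    for passes in range(1, max_passes + 1):
        if not resolve_pass():
            return passes
    raise DepsolveDidNotConverge(
        "still asking for another pass after %d passes" % max_passes)


if __name__ == "__main__":
    # Converges: reports work twice, then settles on the third pass.
    results = iter([True, True, False])
    print(run_until_stable(lambda: next(results)))

    # Mimics this bug: every pass reports that more checking is needed.
    try:
        run_until_stable(lambda: True, max_passes=10)
    except DepsolveDidNotConverge as exc:
        print(exc)
```

The cap does not fix anything by itself; it only turns a silent memory-eating hang into a visible failure that can be debugged.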

The interesting code in self._resolveRequires:

            missing_in_pkg = False
            for po, dep in thisneeds:
                (checkdep, missing, errormsgs) = self._processReq(po, dep)
                if checkdep:
                    print checkdep,po,missing,dep
                CheckDeps |= checkdep
                errors += errormsgs
                missing_in_pkg |= missing

I added the lines
                if checkdep:
                    print checkdep,po,missing,dep
to see which packages cannot be resolved.
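Each pass OR-combines the per-requirement results, so a single requirement that keeps returning `checkdep = 1` is enough to force yet another full pass, which matches the repeated `libgcj-devel` lines in the output that follows. A simplified sketch of that aggregation, with a hypothetical `process_req` callable returning the same `(checkdep, missing, errormsgs)` triple as `_processReq` (an illustration, not yum's code):

```python
def check_requirements(needs, process_req):
    """OR-combine per-requirement results the way the quoted loop does.

    needs is a list of (package, dependency) pairs; process_req is a
    hypothetical stand-in for Depsolve._processReq that returns
    (checkdep, missing, errormsgs).
    """
    check_deps = 0
    missing_in_pkg = False
    errors = []
    for po, dep in needs:
        checkdep, missing, errormsgs = process_req(po, dep)
        check_deps |= checkdep        # any 1 forces another pass
        errors += errormsgs
        missing_in_pkg |= missing
    return check_deps, missing_in_pkg, errors


if __name__ == "__main__":
    needs = [("libgcc", "/usr/sbin/libgcc_post_upgrade"),
             ("libgcj-devel", "/usr/lib64/libgcj.so.8rh")]

    def stuck_on_libgcj(po, dep):
        # Toy stand-in: only the libgcj-devel requirement requests a re-check.
        return (1 if po == "libgcj-devel" else 0), False, []

    print(check_requirements(needs, stuck_on_libgcj))  # (1, False, [])
```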

When the looping occurs, I get this output:

check Requires
1 libstdc++-devel - 4.1.2-18.fc7.x86_64 0 ('/usr/lib64/libstdc++.so.6', 0, '')
1 libgcj-devel - 4.1.2-18.fc7.x86_64 0 ('/usr/lib64/libgcj.so.8rh', 0, '')
1 libgcc - 4.1.2-18.fc7.i386 0 ('/usr/sbin/libgcc_post_upgrade', 0, '')
1 calc-stdrc - 2.12.2.1-9.fc7.x86_64 0 ('/usr/bin/calc', 0, '')
1 libgcc - 4.1.2-18.fc7.x86_64 0 ('/usr/sbin/libgcc_post_upgrade', 0, '')
check Requires
1 libgcj-devel - 4.1.2-18.fc7.x86_64 0 ('/usr/lib64/libgcj.so.8rh', 0, '')
check Requires
1 libgcj-devel - 4.1.2-18.fc7.x86_64 0 ('/usr/lib64/libgcj.so.8rh', 0, '')
check Requires
1 libgcj-devel - 4.1.2-18.fc7.x86_64 0 ('/usr/lib64/libgcj.so.8rh', 0, '')
check Requires
1 libgcj-devel - 4.1.2-18.fc7.x86_64 0 ('/usr/lib64/libgcj.so.8rh', 0, '')

If I remove some of the packages from the transaction, it does not loop forever.
For example, with these removed:
yum-updatesd-3.2.6-2.fc7.noarch.rpm
yum-utils-1.1.8-1.fc7.noarch.rpm
yum-versionlock-1.1.8-1.fc7.noarch.rpm


check Requires
1 libstdc++-devel - 4.1.2-18.fc7.x86_64 0 ('/usr/lib64/libstdc++.so.6', 0, '')
1 libgcj-devel - 4.1.2-18.fc7.x86_64 0 ('/usr/lib64/libgcj.so.8rh', 0, '')
1 libgcc - 4.1.2-18.fc7.i386 0 ('/usr/sbin/libgcc_post_upgrade', 0, '')
1 calc-stdrc - 2.12.2.1-9.fc7.x86_64 0 ('/usr/bin/calc', 0, '')
1 libgcc - 4.1.2-18.fc7.x86_64 0 ('/usr/sbin/libgcc_post_upgrade', 0, '')
1 yum-refresh-updatesd - 1.1.8-1.fc7.noarch 0 ('yum-updatesd', 0, '')
check Requires

It could have been packages other than these three; if you remove 3-8 random
packages from the transaction, it works.
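Removing packages until the loop disappears is essentially manual test-case reduction. The sketch below automates a simple greedy, one-package-at-a-time version of it. It assumes a hypothetical predicate `still_loops(pkgs)` that builds a transaction from `pkgs` and reports whether the depsolve fails to converge (for example via an iteration cap). This is a generic technique, not the procedure used in this bug, and the toy predicate below is invented for illustration; in the real case transaction size also seemed to matter.

```python
def reduce_trigger_set(pkgs, still_loops):
    """Greedily drop packages while the reduced set still reproduces the bug.

    pkgs is a list of package names that reproduces the loop; still_loops
    is a hypothetical predicate taking a list of names. Returns a smaller
    list from which removing any single package makes the loop go away.
    """
    current = list(pkgs)
    changed = True
    while changed:
        changed = False
        for pkg in list(current):
            candidate = [p for p in current if p != pkg]
            if still_loops(candidate):
                current = candidate   # pkg is not needed to trigger the loop
                changed = True
    return current


if __name__ == "__main__":
    pkgs = ["libgcj-devel", "libselinux", "yum-utils", "calc-stdrc", "ustr"]

    def toy_predicate(candidate):
        # Toy stand-in: "loops" only while both of these are present.
        return {"libgcj-devel", "libselinux"} <= set(candidate)

    print(reduce_trigger_set(pkgs, toy_predicate))  # ['libgcj-devel', 'libselinux']
```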





Comment 21 Tim Lauridsen 2007-11-06 13:14:05 UTC
This is the output from yum with debuglevel=9 just before each occurrence of:
1 libgcj-devel - 4.1.2-18.fc7.x86_64 0 ('/usr/lib64/libgcj.so.8rh', 0, '')


Checking deps for libgcj-devel.x86_64 0-4.1.2-18.fc7 - u
looking for ('libgcj', 'EQ', ('0', '4.1.2', '18.fc7')) as a requirement of
libgcj-devel.x86_64 0-4.1.2-18.fc7 - u
looking for ('/usr/lib64/libgcj.so.8rh', None, (None, None, None)) as a
requirement of libgcj-devel.x86_64 0-4.1.2-18.fc7 - u
looking for ('zlib-devel', None, (None, None, None)) as a requirement of
libgcj-devel.x86_64 0-4.1.2-18.fc7 - u
looking for ('/usr/lib64/libz.so', None, (None, None, None)) as a requirement of
libgcj-devel.x86_64 0-4.1.2-18.fc7 - u
looking for ('/bin/awk', None, (None, None, None)) as a requirement of
libgcj-devel.x86_64 0-4.1.2-18.fc7 - u
libgcj-devel requires: /usr/lib64/libgcj.so.8rh
Searching pkgSack for dep: /usr/lib64/libgcj.so.8rh
skipping reposetup, pkgsack exists
skipping reposetup, pkgsack exists
Potential match for /usr/lib64/libgcj.so.8rh from libgcj - 4.1.2-18.fc7.x86_64
libgcj already in ts, skipping this one
1 libgcj-devel - 4.1.2-18.fc7.x86_64 libgcj-devel-4.1.2-18.fc7.x86_64.rpm
('/usr/lib64/libgcj.so.8rh', 0, '')

Comment 22 Luke Macken 2007-11-06 15:13:19 UTC
Awesome, thanks for helping us dig into this, Tim.  I also noticed yesterday
that when I excluded ustr-1.0.1-5.fc7, I was able to avoid the loop -- but
untagging it from updates-testing did not fix the problem.

So, in terms of updates-testing, what do you recommend we do to temporarily
mitigate this issue so we can get people testing stuff again?  Untag
gcc-4.1.2-18.fc7 and/or libselinux-2.0.14-8.fc7?

Comment 23 Tim Lauridsen 2007-11-07 13:17:04 UTC
Untagging libselinux, or keeping the number of packages below 200, should also
work, but that can be hard to do.



Comment 24 Tim Lauridsen 2007-11-07 13:54:37 UTC
Created attachment 250051 [details]
Patch to stop the looping

This patch to yum stops the looping, but I don't know if it breaks the
depsolving logic.
Comments, please.

Comment 25 Jesse Keating 2007-11-07 14:51:18 UTC
Bill, this is your area, thoughts?

Comment 26 Seth Vidal 2007-11-07 15:06:04 UTC
I just looked at the patch. The checkdeps being set to 1 there doesn't make any
sense b/c we're not adding anything new to the transaction set. 
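Seth's point can be stated as an invariant: a requirement should only request another pass when resolving it actually changed the transaction set. The sketch below models the situation from Comment 21, where the provider (`libgcj`) is already in the transaction. `TransactionSet`, `process_req_buggy` and `process_req_fixed` are hypothetical simplifications for illustration; they are not yum's code, and the attached patch itself is not reproduced here.

```python
class TransactionSet:
    """Hypothetical, minimal stand-in for yum's transaction set."""

    def __init__(self, members=()):
        self.members = set(members)

    def add(self, name):
        """Add a package; return True only if the set actually grew."""
        if name in self.members:
            return False
        self.members.add(name)
        return True


def process_req_buggy(ts, provider):
    # Models the looping behavior: "already in ts, skipping this one",
    # yet checkdep = 1 is still returned, forcing another pass.
    ts.add(provider)
    return 1


def process_req_fixed(ts, provider):
    # Request another pass only if the transaction set changed.
    return 1 if ts.add(provider) else 0


if __name__ == "__main__":
    ts = TransactionSet({"libgcj", "libgcj-devel"})
    print(process_req_buggy(ts, "libgcj"))  # 1 on every pass
    print(process_req_fixed(ts, "libgcj"))  # 0, so the loop can stop
```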

Comment 27 Bill Nottingham 2007-11-07 17:13:02 UTC
So, Seth - you think the patch is correct?

Comment 28 Seth Vidal 2007-11-07 18:36:22 UTC
yah, I think it might be. Tim, go ahead and throw it past yum-devel. But unless
I'm way off it looks like it is correct.



Comment 29 Tim Lauridsen 2007-11-09 08:29:49 UTC
I have committed the patch to upstream yum.

Comment 30 Luke Macken 2007-11-09 14:50:29 UTC
I patched yum on releng1 yesterday and was able to successfully mash
f7-updates-testing.  Thanks for tracking down and patching this issue, Tim!

Comment 31 Fedora Update System 2007-12-06 17:39:13 UTC
yum-3.2.8-1.fc7 has been pushed to the Fedora 7 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update yum'

Comment 32 Fedora Update System 2007-12-06 20:52:08 UTC
yum-3.2.8-1.fc8 has been pushed to the Fedora 8 stable repository.  If problems still persist, please make note of it in this bug report.

