Bug 820612 - spacewalk-repo-sync hangs after encountering larger number of errors/missing packages
Summary: spacewalk-repo-sync hangs after encountering larger number of errors/missing ...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Spacewalk
Classification: Community
Component: Server
Version: 1.7
Hardware: x86_64
OS: Linux
unspecified
low
Target Milestone: ---
Assignee: Michael Mráka
QA Contact: Red Hat Satellite QA List
URL:
Whiteboard:
: 889885 965386 (view as bug list)
Depends On:
Blocks: space20
TreeView+ depends on / blocked
 
Reported: 2012-05-10 13:30 UTC by Dominik L. Borkowski
Modified: 2013-09-12 13:54 UTC (History)
5 users (show)

Fixed In Version: spacewalk-java-1.10.36-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-08-02 12:58:22 UTC
Embargoed:


Attachments (Terms of Use)

Description Dominik L. Borkowski 2012-05-10 13:30:29 UTC
Description of problem:

spacewalk-repo-sync 

Version-Release number of selected component (if applicable):

1.7

How reproducible:

Always. Happens on multiple repositories, from various servers, the only common element is that those repositories are missing packages, or problems reading their signatures.

Steps to Reproduce:
1. create a channel with repository pointing to one that may be missing packages, e.g.: http://download.opensuse.org/repositories/devel:/languages:/python/openSUSE_12.1/
2. attempt to synchronize the repository, either manually with spacewalk-repo-sync or via the web interface. it seems to happen more often when executed via the web interface/taskomatic
3. see that spacewalk-repo-sync eventually hangs, and requires SIGKILL 
  
Actual results:

sample session:

tail -f /var/log/rhn/repo-sync/*python* shows our last signs of activity:

failure: x86_64/python-pyrad-1.2-2.1.x86_64.rpm from opensuse12.1-x86_64-python: [Errno 256] No more mirrors to try.
638/904 : python-pyrss2gen-1.0.0-2.1-0.x86_64
failure: x86_64/python-pyrss2gen-1.0.0-2.1.x86_64.rpm from opensuse12.1-x86_64-python: [Errno 256] No more mirrors to try.
639/904 : python-pyst-0.2-6.1-0.x86_64
failure: x86_64/python-pyst-0.2-6.1.x86_64.rpm from opensuse12.1-x86_64-python: [Errno 256] No more mirrors to try.
640/904 : python-pyxmpp-debuginfo-1.1.1-2.1-0.x86_64
failure: x86_64/python-pyxmpp-debuginfo-1.1.1-2.1.x86_64.rpm from opensuse12.1-x86_64-python: [Errno 256] No more mirrors to try.


ps shows:

root     11755  4.2  0.6 422360 53996 ?        S    09:12   0:18 /usr/bin/python -u /usr/bin/spacewalk-repo-sync --channel opensuse12.1-x86_64-python --type yum


# strace -t -f -p 11755
Process 11755 attached - interrupt to quit
09:21:45 write(2, "failure: x86_64/python-pyxmpp-de"..., 130


# finally, after issuing kill -9, taskomatic's log shows:

INFO   | jvm 1    | 2012/05/10 09:28:00 | failure: x86_64/python-pymongo-debugsource-2.1.1-4.1.x86_64.rpm from opensuse12.1-x86_64-python: [Errno 256] No more mirrors to try.
INFO   | jvm 1    | 2012/05/10 09:28:00 | failure: x86_64/python-pypar-debuginfo-2.1.4_94-1.1.x86_64.rpm from opensuse12.1-x86_64-python: [Errno 256] No more mirrors to try.
INFO   | jvm 1    | 2012/05/10 09:28:00 | failure: x86_64/python-pypar-debugsource-2.1.4_94-1.1.x86_64.rpm from opensuse12.1-x86_64-python: [Errno 256] No more mirrors to try.
INFO   | jvm 1    | 2012/05/10 09:28:00 | failure: x86_64/python-pyrad-1.2-2.1.x86_64.rpm from opensuse12.1-x86_64-python: [Errno 256] No more mirrors to try.
INFO   | jvm 1    | 2012/05/10 09:28:00 | failure: x86_64/python-pyrss2gen-1.0.0-2.1.x86_64.rpm from opensuse12.1-x86_64-python: [Errno 256] No more mirrors to try.
INFO   | jvm 1    | 2012/05/10 09:28:00 | failure: x86_64/python-pyst-0.2-6.1.x86_64.rpm from opensuse12.1-x86_64-python: [Errno 256] No more mirrors to try.
INFO   | jvm 1    | 2012/05/10 09:28:00 | Traceback (most recent call last):
INFO   | jvm 1    | 2012/05/10 09:28:00 | 

Expected results:

Additional info:


Biggest issue here is that there can be only a single spacewalk-repo-sync process running, so a failure caused by one channel will block all the other updates.

Comment 1 Michael Mráka 2012-05-18 13:36:10 UTC
Well, I can't reproduce the bug with reproducer above.
And I can't reproduce the bug even with repo containing 3500+ missing packages.

# spacewalk-repo-sync -c test_repo_channel
Repo URL: http://download.opensuse.org/repositories/devel:/languages:/python/openSUSE_12.1/
Packages in repo:              3529
Packages already synced:          0
Packages to sync:              3529
...
3527/3529 : man-pages-es-extra-1.55-14.el6-0.noarch
failure: ../Packages/man-pages-es-extra-1.55-14.el6.noarch.rpm from spacewalk-n-centos6: [Errno 256] No more mirrors to try.
3528/3529 : iso-codes-3.16-2.el6-0.noarch
failure: ../Packages/iso-codes-3.16-2.el6.noarch.rpm from spacewalk-n-centos6: [Errno 256] No more mirrors to try.
3529/3529 : tftp-0.49-7.el6-0.x86_64
failure: ../Packages/tftp-0.49-7.el6.x86_64.rpm from spacewalk-n-centos6: [Errno 256] No more mirrors to try.
Repo http://xxx/pub/repo/ has 0 errata.
Sync completed.
Total time: 0:00:23


I guess this is your local network issue.

Comment 2 Dominik L. Borkowski 2012-05-19 00:21:12 UTC
Network issue is very unlikely, since the issue can be repeated with same repositories mirrored on a nearby server. The issue is always reproducible, regardless of source, and much larger repos from same sources succeed. Only correlation between spacewalk-repo-sync hanging and not seem to be repos with large number of missing packages.

One thing to note: this is almost always triggered when spacewalk-repo-sync is executed via taskomatic. I have a very hard time reproducing it when launching spacewalk-repo-sync directly from CLI.

Is there anything else I can provide to aid with debugging this issue?

Thanks!

Comment 3 Dominik L. Borkowski 2012-05-19 00:26:28 UTC
Network issue is very unlikely, since the issue can be repeated with same repositories mirrored on a nearby server. The issue is always reproducible, regardless of source, and much larger repos from same sources succeed. Only correlation between spacewalk-repo-sync hanging and not seem to be repos with large number of missing packages.

One thing to note: this is almost always triggered when spacewalk-repo-sync is executed via taskomatic. I have a very hard time reproducing it when launching spacewalk-repo-sync directly from CLI.

Is there anything else I can provide to aid with debugging this issue?

Thanks!

Comment 4 Jan Pazdziora 2013-03-08 13:24:55 UTC
The reproducer is actually pretty deterministic. The best way to see the behaviour is to just have a repodata with no packages. To achieve that, do:

# mkdir -p /var/www/html/pub/fedora-18/repodata
# cd /var/www/html/pub/fedora-18/repodata
# wget --no-directories -r --level 1 --no-parent http://dl.fedoraproject.org/pub/fedora/linux/releases/18/Everything/x86_64/os/repodata/

Then create channel and repository and point that repository to http://localhost/pub/fedora-18. Then click Sync Now.

Look at the log in /var/log/rhn/reposync -- it ends with

575/33868 : openldap-servers-2.4.33-3.fc18-0.x86_64
failure: Packages/o/openldap-servers-2.4.33-3.fc18.x86_64.rpm from fedora-18: [Errno 256] No more mirrors to try.
576/33868 : OpenImageIO-devel-1.0.9-1.fc18-0.i686
failure: Packages/o/OpenImageIO-devel-1.0.9-1.fc18.i686.rpm from fedora-18: [Errno 256] No more mirrors to try.

The lines with the "failure" string amount to 64551 bytes, the other lines total 29029 bytes. I assume that spacewalk-repo-sync outputs stuff both to the log file and to its stdout and stderr, and taskomatic tries to read the stdout output first and then things get stuck on that stderr pipe buffer.

Comment 7 Michael Mráka 2013-04-03 08:53:38 UTC
Fixed in spacewalk nightly by
commit 1e7b903cbcb421a2e4eb74a0590618a71e0a263d
    820612 - implemented executor with paralled stdout and stderr reading

Comment 8 Tomas Lestach 2013-04-03 16:22:13 UTC
+ checkstyle fix ...

spacewalk.git: 0a28394bdbdea829e4cbdc0d13efca02deb1bb29

Comment 9 Tomáš Kašpárek 2013-08-02 12:58:22 UTC
Fix for this bug is present in Spacewalk 2.0, closing this bug as CURRENTRELEASE.

Comment 10 Michael Mráka 2013-09-12 13:52:48 UTC
*** Bug 965386 has been marked as a duplicate of this bug. ***

Comment 11 Michael Mráka 2013-09-12 13:54:30 UTC
*** Bug 889885 has been marked as a duplicate of this bug. ***


Note You need to log in before you can comment on or make changes to this bug.