Bug 516556 - Repo sync hangs after about 2 hours
Summary: Repo sync hangs after about 2 hours
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Spacewalk
Classification: Community
Component: Server
Version: 0.6
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Tomas Lestach
QA Contact: Red Hat Satellite QA List
URL:
Whiteboard:
Depends On:
Blocks: space16
 
Reported: 2009-08-10 13:01 UTC by Mark Chappell
Modified: 2011-10-10 08:08 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2011-10-10 08:08:07 UTC
Embargoed:



Description Mark Chappell 2009-08-10 13:01:13 UTC
About 2 hours into a web-initiated repo sync, the sync hangs.  Command-line syncs seem to complete fine.

[root@gyne reposync]# ps -Af | grep repo
root      5034  2343  0 13:48 pts/2    00:00:00 grep repo
root     31978 30099  2 11:43 ?        00:02:33 /usr/bin/python -u /usr/bin/spacewalk-repo-sync --channel centos-53-i386-update --url http://www.mirrorservice.org/sites/mirror.centos.org/5/updates/i386/ --type yum --label centos-updates-i386
[root@gyne reposync]# strace -f -p 31978
Process 31978 attached - interrupt to quit
write(2, "Exception reported from gyne.bat"..., 1581

[root@gyne ~]# lsof -p 31978
COMMAND     PID USER   FD   TYPE    DEVICE     SIZE      NODE NAME
spacewalk 31978 root    2w  FIFO       0,6          247199101 pipe

[root@gyne ~]# lsof | grep 247199101
java      30099      root   98r     FIFO                0,6           247199101 pipe
spacewalk 31978      root    2w     FIFO                0,6           247199101 pipe


The java process appears to have stopped reading from the FIFO, so the python process blocks in its write to stderr.
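
For illustration, the blocking behaviour described here can be reproduced outside Spacewalk. The following is a minimal Python 3 sketch, not the taskomatic or spacewalk-repo-sync code: a child writes about 1 MiB to stderr, first with a parent that never reads the pipe and then with a parent that drains it in a background thread. The child command, the function names, and the timeouts are assumptions made for the example.

```python
import subprocess
import sys
import threading

# Hypothetical child: writes 1 MiB to stderr, far more than a typical
# pipe buffer (64 KiB by default on Linux), then exits.
CHILD = "import sys; sys.stderr.write('x' * (1 << 20)); sys.stderr.flush()"


def run_without_draining(timeout=2.0):
    """Start the child with stderr on a pipe and never read from it.

    Returns True if the child is still running after `timeout` seconds,
    which means it is blocked writing to the full pipe.
    """
    proc = subprocess.Popen([sys.executable, "-c", CHILD], stderr=subprocess.PIPE)
    try:
        proc.wait(timeout=timeout)
        return False
    except subprocess.TimeoutExpired:
        return True
    finally:
        # Clean up the stuck child so the demo does not leak a process.
        proc.kill()
        proc.stderr.close()
        proc.wait()


def run_with_draining(timeout=10.0):
    """Same child, but a background thread keeps reading stderr.

    Returns (exit code, number of bytes read from stderr).
    """
    proc = subprocess.Popen([sys.executable, "-c", CHILD], stderr=subprocess.PIPE)
    chunks = []

    def drain():
        # Read until EOF so the child can never fill the pipe.
        for chunk in iter(lambda: proc.stderr.read(65536), b""):
            chunks.append(chunk)

    reader = threading.Thread(target=drain, daemon=True)
    reader.start()
    returncode = proc.wait(timeout=timeout)
    reader.join()
    proc.stderr.close()
    return returncode, sum(len(chunk) for chunk in chunks)
```

With the unread pipe the child stays alive past the timeout, which matches the unfinished write(2, ...) in the strace output above. With the draining thread it runs to completion.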

Eventually the python process seems to get an IOError:
Exception type exceptions.IOError

Exception Handler Information
Traceback (most recent call last):
  File "/usr/share/rhn/satellite_tools/reposync.py", line 131, in import_packages
    self.print_msg(str(index+1) + "/" + str(len(to_download)) + " : "+ \
  File "/usr/share/rhn/satellite_tools/reposync.py", line 213, in print_msg
    print message
IOError: [Errno 32] Broken pipe
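
Errno 32 is EPIPE: the write failed because the read end of the pipe had been closed. As a rough sketch in Python 3 (where this surfaces as BrokenPipeError rather than the Python 2 IOError shown in the traceback), the error can be reproduced with a bare pipe, and a progress printer can be written to tolerate it. The print_msg below is a hypothetical stand-in, not the function from reposync.py, and the `undelivered` list is an assumption made for the example.

```python
import errno
import os


def write_to_closed_pipe():
    """Write to a pipe whose read end is closed; return the errno raised."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # the reader goes away
    try:
        os.write(write_fd, b"1/2943 : some-package\n")
    except OSError as exc:  # BrokenPipeError is a subclass of OSError
        return exc.errno
    finally:
        os.close(write_fd)
    return None


def print_msg(message, stream, undelivered):
    """Hypothetical tolerant progress printer.

    Writes `message` to `stream`. If the reader has gone away, the message
    is appended to the `undelivered` list instead of aborting the caller.
    Returns True if the message was written, False otherwise.
    """
    try:
        stream.write(message + "\n")
        stream.flush()
        return True
    except BrokenPipeError:
        undelivered.append(message)
        return False
```

Whether a sync should carry on after losing its output channel is a design choice. The sketch only shows that the failure can be caught at the point where the traceback was raised.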

Comment 1 Jan Pazdziora (Red Hat) 2010-11-19 16:03:43 UTC
Mass-moving to space13.

Comment 2 Jan Pazdziora (Red Hat) 2010-11-26 15:08:02 UTC
Tomas, could you please try to reproduce? If the spacewalk-repo-sync process fails for whatever reason (the remote httpd stopped, for example) and writes an error to stderr, will taskomatic handle it gracefully and let the process finish (albeit with an error)?

Comment 3 Tomas Lestach 2011-01-24 14:58:37 UTC
I wasn't able to reproduce the issue.

Every time the repo-sync finishes, it disappears from the ps -Af list, regardless of whether it finished successfully or failed.

Could you please send me a more detailed reproducer showing how to trigger the repo-sync so that it keeps hanging?

Comment 4 Mark Chappell 2011-01-24 16:44:20 UTC
I no longer work for my previous employer, so I don't have access to that particular host.  The way I could (reliably) reproduce the issue was by setting up a large repo to sync, Fedora/CentOS over the WAN.  However, I have not tried for some time.

Comment 5 Miroslav Suchý 2011-04-11 07:32:19 UTC
We did not have time for this one during Spacewalk 1.4 time frame. Mass moving to Spacewalk 1.5.

Comment 6 Miroslav Suchý 2011-04-11 07:36:44 UTC
We did not have time for this one during Spacewalk 1.4 time frame. Mass moving to Spacewalk 1.5.

Comment 7 Jan Pazdziora (Red Hat) 2011-07-20 11:50:09 UTC
Aligning under space16.

Comment 8 Jan Pazdziora (Red Hat) 2011-07-25 16:57:10 UTC
Tomáš, can we assume that with the latest scheduler code, the stdout is redirected to the log file, so it is no longer a pipe to the Java process, and as such this problem is resolved in the current releases?
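
To illustrate the general technique being asked about (not what taskomatic actually does): when the child's stdout and stderr are attached to a log file instead of a pipe, the child can never block on a reader that has stopped reading. A minimal Python 3 sketch follows; the helper names, the log file name, and the chatty child command are all assumptions made for the example.

```python
import os
import subprocess
import sys
import tempfile


def run_logged(argv, log_path, timeout=None):
    """Run argv with stdout and stderr appended to log_path; return the exit code."""
    with open(log_path, "ab") as log:
        proc = subprocess.Popen(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=log,                # a regular file, not a pipe
            stderr=subprocess.STDOUT,  # stderr goes to the same log file
        )
        return proc.wait(timeout=timeout)


def demo():
    """Run a chatty child (about 2 MiB of output); return (exit code, log size)."""
    child = (
        "import sys; sys.stdout.write('o' * (1 << 20)); "
        "sys.stderr.write('e' * (1 << 20))"
    )
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "reposync.log")  # hypothetical log name
        returncode = run_logged([sys.executable, "-c", child], log_path)
        return returncode, os.path.getsize(log_path)
```

The parent here never reads anything from the child, yet the child finishes, because writes to a regular file do not wait for a consumer.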

Comment 9 Tomas Lestach 2011-07-26 10:08:01 UTC
Execution of external commands fundamentally didn't change, so this isn't a reason for closing the BZ.

Comment 10 Jan Pazdziora (Red Hat) 2011-10-10 06:36:08 UTC
I just did a 2.5-hour spacewalk-repo-sync of the Fedora 15 repository, and the 2943 packages are now safely in. This was with Spacewalk nightly:

# rpm -qa | grep spacewalk | sort
spacewalk-admin-1.6.1-1.el5
spacewalk-backend-1.6.26-1.el5
spacewalk-backend-app-1.6.26-1.el5
spacewalk-backend-applet-1.6.26-1.el5
spacewalk-backend-config-files-1.6.26-1.el5
spacewalk-backend-config-files-common-1.6.26-1.el5
spacewalk-backend-config-files-tool-1.6.26-1.el5
spacewalk-backend-iss-1.6.26-1.el5
spacewalk-backend-iss-export-1.6.26-1.el5
spacewalk-backend-libs-1.6.26-1.el5
spacewalk-backend-package-push-server-1.6.26-1.el5
spacewalk-backend-server-1.6.26-1.el5
spacewalk-backend-sql-1.6.26-1.el5
spacewalk-backend-sql-oracle-1.6.26-1.el5
spacewalk-backend-tools-1.6.26-1.el5
spacewalk-backend-xml-export-libs-1.6.26-1.el5
spacewalk-backend-xmlrpc-1.6.26-1.el5
spacewalk-backend-xp-1.6.26-1.el5
spacewalk-base-1.6.25-1.el5
spacewalk-base-minimal-1.6.25-1.el5
spacewalk-branding-1.6.4-1.el5
spacewalk-certs-tools-1.6.5-1.el5
spacewalk-common-1.5.1-1.el5
spacewalk-config-1.6.3-1.el5
spacewalk-doc-indexes-1.1.1-1.el5
spacewalk-grail-1.6.25-1.el5
spacewalk-html-1.6.25-1.el5
spacewalk-java-1.6.55-1.el5
spacewalk-java-config-1.6.55-1.el5
spacewalk-java-lib-1.6.55-1.el5
spacewalk-java-oracle-1.6.55-1.el5
spacewalk-monitoring-1.4.1-1.el5
spacewalk-monitoring-selinux-1.6.2-1.el5
spacewalk-oracle-1.5.1-1.el5
spacewalk-pxt-1.6.25-1.el5
spacewalk-schema-1.6.19-1.el5
spacewalk-search-1.6.3-1.el5
spacewalk-selinux-1.6.1-1.el5
spacewalk-setup-1.5.11-1.el5
spacewalk-setup-jabberd-1.6.1-1.el5
spacewalk-slf4j-1.6.1-1.el5
spacewalk-sniglets-1.6.25-1.el5
spacewalk-taskomatic-1.6.55-1.el5

I propose closing CURRENTRELEASE.

We can always open a new bug if we get a fresh reproducer.

Comment 11 Tomas Lestach 2011-10-10 08:08:07 UTC
I agree with Comment#10. I also do not see any issues.

Closing CURRENTRELEASE.

