Bug 747631
| Summary: | Race condition: Satellite-sync hangs up forever | | |
|---|---|---|---|
| Product: | Red Hat Satellite 5 | Reporter: | Šimon Lukašík <slukasik> |
| Component: | Satellite Synchronization | Assignee: | Michael Mráka <mmraka> |
| Status: | CLOSED ERRATA | QA Contact: | Jan Hutař <jhutar> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 541 | CC: | cperry, jhutar, mmraka, msuchy |
| Target Milestone: | --- | Keywords: | Regression |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | spacewalk-backend-1.2.13-59 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-12-22 13:11:57 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 677498 | | |
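The `select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)` call seen in strace is a 0.1-second sleep: it comes from `time.sleep(0.1)`, which runs every time `out_queue.get_nowait()` raises `Queue.Empty` in this loop: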
```python
while list(itertools.ifilter(lambda x: x.isAlive(), all_threads)) or out_queue.qsize() > 0:
    try:
        (rpmManip, package, is_done) = out_queue.get_nowait()
    except Queue.Empty:
        time.sleep(0.1)
        continue
```
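The race condition is probably this: although `out_queue.qsize() > 0` is true, the subsequent `out_queue.get_nowait()` raises `Queue.Empty`. The Python documentation says `Queue.qsize()` returns only the approximate size of the queue, and that `qsize() > 0` does not guarantee that a subsequent `get()` will not block.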
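One way to avoid relying on `qsize()` is to let `get()` itself be the only emptiness check. The sketch below is illustrative only, not the Spacewalk fix: it uses Python 3 (`queue`, `Thread.is_alive()`) rather than the Python 2 names in the code above, and `produce`, `drain` and `run_demo` are hypothetical stand-ins for the download threads and the result-collecting loop. Thread liveness is sampled *before* each `get()`, so once no producer is alive and a `get()` times out, no further results can arrive.

```python
import queue
import threading
import time


def produce(out_q, items):
    # Hypothetical producer standing in for a download thread.
    for item in items:
        time.sleep(0.001)
        out_q.put(item)


def drain(out_q, threads, poll=0.1):
    """Collect results until every producer has exited and the queue is empty.

    qsize() is never consulted; get() is the only emptiness check.
    """
    results = []
    while True:
        # Sample liveness BEFORE touching the queue: if no producer is alive
        # now, every put() they will ever make has already completed.
        any_alive = any(t.is_alive() for t in threads)
        try:
            results.append(out_q.get(timeout=poll))
        except queue.Empty:
            if not any_alive:
                return results


def run_demo(n_threads=2, per_thread=50):
    # Start producers over disjoint ranges and return the sorted results.
    q = queue.Queue()
    workers = [
        threading.Thread(target=produce,
                         args=(q, range(i * per_thread, (i + 1) * per_thread)))
        for i in range(n_threads)
    ]
    for w in workers:
        w.start()
    results = drain(q, workers)
    for w in workers:
        w.join()
    return sorted(results)
```

With this ordering, an approximate queue size can neither end the loop early nor keep it spinning: the loop exits only after a timed-out `get()` that was preceded by a check showing no producer alive.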
The bug has been fixed in Spacewalk master by commit 970a100b43b9bafc34655915923a1d6b11408ab1 ("747631 - exit loop when all packages are finished").

Spacewalk package: spacewalk-backend-1.6.59-1
It was backported to SATELLITE-5.4 as commit fe292fc954ca7809f42e9640d48fa809f61437a4 ("747631 - exit loop when all packages are finished").
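The commit title suggests terminating on a count of finished packages instead of on thread liveness and `qsize()`. A minimal sketch of that idea follows, assuming the consumer knows in advance how many packages it queued; it is not the actual commit diff, and `download_worker` and `sync_packages` are hypothetical names written for Python 3.

```python
import queue
import threading


def download_worker(in_q, out_q):
    # Hypothetical worker: "downloads" each package and reports it as done.
    while True:
        try:
            package = in_q.get_nowait()
        except queue.Empty:
            return  # no work left for this thread
        out_q.put((package, True))


def sync_packages(packages, num_threads=4):
    in_q = queue.Queue()
    out_q = queue.Queue()
    for p in packages:
        in_q.put(p)  # fill the work queue before any worker starts
    threads = [threading.Thread(target=download_worker, args=(in_q, out_q))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    processed = []
    # Exit the loop when all packages are finished, instead of polling
    # thread liveness and the approximate qsize().
    while len(processed) < len(packages):
        package, is_done = out_q.get()  # blocks until a result arrives
        processed.append(package)
    for t in threads:
        t.join()
    return processed
```

The trade-off is that a worker that dies without reporting would leave `get()` blocked; a real implementation would need to report failures on the same queue or use `get(timeout=...)`.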
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2011-1848.html
Description of problem:
There is a race condition in multi-threaded satellite-sync. Roughly 1 in 200+ satellite-sync runs hangs forever. The hang occurs at the very end of the 'Downloading rpm packages' phase, before 'Processing rpm packages complete'. Strace of the satellite-sync process shows only `select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)`, so it is probably waiting for some resource.

Version-Release number of selected component (if applicable):
RHN Satellite 5.4.1, spacewalk-backend-1.2.13-55.el6sat.noarch

How reproducible:
The issue is not deterministic and it is very rare. However, I am able to reproduce it by running the reproducer in a loop.

Steps to Reproduce:
1. Run satellite-sync on some channel (I used a dump with 9 packages).

Actual results:
The process occasionally hangs.

Expected results:
The process never hangs.
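Because the hang shows up only about once in 200+ runs, a loop with a per-run timeout is a practical way to catch it. The harness below is a hypothetical sketch in Python 3: `run_until_hang` is not part of Spacewalk, and the command it runs is whatever satellite-sync invocation you use (no satellite-sync options are assumed here). The demo calls use the Python interpreter as a stand-in command.

```python
import subprocess
import sys


def run_until_hang(cmd, attempts, timeout):
    """Run cmd repeatedly; return the 1-based attempt that hung, or None.

    A run counts as hung if it does not finish within `timeout` seconds.
    """
    for attempt in range(1, attempts + 1):
        try:
            subprocess.run(cmd, timeout=timeout, check=False,
                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        except subprocess.TimeoutExpired:
            # subprocess.run() kills the child before re-raising.
            return attempt
    return None


# Stand-in commands: one that always finishes, one that always "hangs".
QUICK_CMD = [sys.executable, "-c", "pass"]
HANGING_CMD = [sys.executable, "-c", "import time; time.sleep(30)"]
```

For the real reproducer, pick a timeout comfortably longer than a normal sync of the test dump, so that a slow run is not mistaken for the hang.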