| Summary: | Disruption in Internet connectivity leaves a large number of sleeping grinder processes | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | [Retired] Pulp | Reporter: | Ernest W. Durbin III <ewdurbin> | ||||||
| Component: | user-experience | Assignee: | John Matthews <jmatthew> | ||||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Preethi Thomas <pthomas> | ||||||
| Severity: | medium | Docs Contact: | |||||||
| Priority: | unspecified | ||||||||
| Version: | 1.0.0 | CC: | skarmark | ||||||
| Target Milestone: | --- | Keywords: | Triaged | ||||||
| Target Release: | --- | ||||||||
| Hardware: | x86_64 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2012-02-24 20:15:18 UTC | Type: | --- | ||||||
| Attachments: |
Created attachment 534281 [details]
graph of processes on pulp server at time of IP connection break ~2300
Created attachment 534282 [details]
tarball of logfiles snipped around connection disruption ~2300
I was able to replicate this issue without using scheduled syncs:

1) pulp-admin repo create --id bad_url --feed http://bad_url_should_fail_no_data.com
2) pulp-admin repo sync --id bad_url

Repeat the sync several times and you see on each attempt the grinder activeobject processes are left running.

Example below:

$ ps auxf | grep grinder | wc -l
62
[jmatthews@jwm-devel pulp{master}$ sudo pulp-admin repo sync --id bad_url -F
Sync for repository bad_url started
Sync: Error
Item Details:
error: Exception: Traceback (most recent call last):
  File "/shared/repo/grinder/src/grinder/activeobject.py", line 429, in process
    retval = method(*args, **kwargs)
  File "/shared/repo/grinder/src/grinder/RepoFetch.py", line 51, in fetchItem
    verify_options=self.verify_options)
  File "/shared/repo/grinder/src/grinder/BaseFetch.py", line 328, in fetch
    checksum, headers, retryTimes, packages_location)
  File "/shared/repo/grinder/src/grinder/BaseFetch.py", line 328, in fetch
    checksum, headers, retryTimes, packages_location)
  File "/shared/repo/grinder/src/grinder/BaseFetch.py", line 270, in fetch
    curl.perform()
error: (6, 'Could not resolve host: bad_url_should_fail_no_data.com; Cannot allocate memory')
[jmatthews@jwm-devel pulp{master}$ ps auxf | grep grinder | wc -l
77

Issue was that we were not explicitly killing the activeobject processes if we encountered an exception when fetching metadata.
Commit is here: http://git.fedorahosted.org/git/?p=grinder.git;a=commitdiff;h=e9758fc8f07e7fda58da8dd021d7bb845b74c993

build: 0.255

[root@katello-test ~]# rpm -q pulp
pulp-0.0.255-1.el6.noarch
[root@katello-test ~]#
[root@katello-test ~]# pulp-admin -u admin -p admin repo create --id bad_url --feed http://bad_url_should_fail_no_data.com
Successfully created repository [ bad_url ]
[root@katello-test ~]# pulp-admin repo sync --id bad_url -F
error: error: operation failed: sslv3 alert certificate expired
[root@katello-test ~]#
[root@katello-test ~]#
[root@katello-test ~]# pulp-admin -u admin -p admin repo sync --id bad_url -F
Sync for repository bad_url started
Sync: Error
Item Details:
error: Exception: Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 429, in process
    retval = method(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/grinder/RepoFetch.py", line 51, in fetchItem
    verify_options=self.verify_options)
  File "/usr/lib/python2.6/site-packages/grinder/BaseFetch.py", line 328, in fetch
    checksum, headers, retryTimes, packages_location)
  File "/usr/lib/python2.6/site-packages/grinder/BaseFetch.py", line 328, in fetch
    checksum, headers, retryTimes, packages_location)
  File "/usr/lib/python2.6/site-packages/grinder/BaseFetch.py", line 270, in fetch
    curl.perform()
error: (6, "Couldn't resolve host 'bad_url_should_fail_no_data.com'")
[root@katello-test ~]# pulp-admin -u admin -p admin repo sync --id bad_url
Sync for repository bad_url started
Use "repo status" to check on the progress
[root@katello-test ~]# pulp-admin -u admin -p admin repo sync --id bad_url
Sync for repository bad_url started
Use "repo status" to check on the progress
[root@katello-test ~]# pulp-admin -u admin -p admin repo sync --id bad_url
Sync for repository bad_url started
Use "repo status" to check on the progress
[root@katello-test ~]# pulp-admin -u admin -p admin repo sync --id bad_url
Sync for repository bad_url started
Use "repo status" to check on the progress
[root@katello-test ~]# pulp-admin -u admin -p admin repo sync --id bad_url
Sync for repository bad_url started
Use "repo status" to check on the progress
[root@katello-test ~]# ps auxf | grep grinder | wc -l
1
[root@katello-test ~]# rpm -q pulp
pulp-0.0.255-1.el6.noarch
[root@katello-test ~]#

Pulp v1.0 is released

Closed Current Release. Pulp v1.0 is released.
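The root cause noted in the comments above (worker processes not being killed when the metadata fetch raises) can be illustrated with a small sketch. This is not the grinder code or the linked commit. The names `start_workers`, `stop_workers`, `fetch_metadata`, `sync`, and `MetadataFetchError` are hypothetical, and the workers are plain sleeping Python children standing in for the activeobject processes. The only point is that cleanup sits in a `finally` block, so it runs on the failure path as well.

```python
import subprocess
import sys


class MetadataFetchError(Exception):
    """Hypothetical error standing in for a failed metadata download."""


def start_workers(count):
    # Each worker is a child Python process that only sleeps; it stands in
    # for a child process waiting for download work.
    cmd = [sys.executable, "-c", "import time; time.sleep(60)"]
    return [subprocess.Popen(cmd) for _ in range(count)]


def stop_workers(workers, timeout=5.0):
    # Ask every child that is still running to exit.
    for proc in workers:
        if proc.poll() is None:
            proc.terminate()
    # Reap each child; escalate to kill() if it ignores the request.
    for proc in workers:
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()


def fetch_metadata(url):
    # Stand-in for the real fetch: always fails, like an unresolvable host.
    raise MetadataFetchError("Could not resolve host: %s" % url)


def sync(url, workers):
    try:
        fetch_metadata(url)
        # Package downloads would be handed to the workers here.
    finally:
        # Runs on success and on failure, so no worker is left behind.
        stop_workers(workers)


if __name__ == "__main__":
    workers = start_workers(3)
    try:
        sync("http://bad_url_should_fail_no_data.com", workers)
    except MetadataFetchError as exc:
        print("sync failed:", exc)
    print("all workers exited:", all(p.poll() is not None for p in workers))
```

Without the `finally` block, the exception would propagate past the cleanup call and every failed sync would leave its children running, which matches the growing process count in the reproduction above.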
Description of problem:
When a scheduled sync begins and the internet connection for the machine is not operational, the grinder processes started by the sync process appear to sleep. Even once the internet connection has been restored, these processes do not recover or die.

Version-Release number of selected component (if applicable):
Pulp community release 18
pulp 0.0.244-5.fc15
grinder 0.0.127-1.fc15
Fedora 15 2.6.40.6-0.fc15.x86_64

How reproducible:
I'm uncertain on this. Only have one Pulp box going

Steps to Reproduce:
1. Configure a repository for automated syncs. Schedule one in future
2. Before, or possibly during a scheduled sync tear down the internet connection
3. Verify that a number of grinder procs are sleeping `ps aux | grep grinder\/activeobject\.pyc`

Actual results:
The repository does not sync, a number of grinder object processes are sleeping. In order to remove them, a `service pulp-server restart` works.

Expected results:
Graceful failure... wait for next sync?

Additional info:
other things to follow.
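The check in step 3 of the reproduction steps can also be scripted. The sketch below assumes the usual 11-column `ps aux` layout (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND) and counts rows whose command matches `grinder/activeobject.pyc` and whose STAT starts with `S` (interruptible sleep). The function name `count_sleeping` and the sample rows are hypothetical and are not taken from the attached logs.

```python
import re

# Same target as the grep in the reproduction steps.
GRINDER_PATTERN = re.compile(r"grinder/activeobject\.pyc")

# Hypothetical `ps aux` output, made up for illustration only.
SAMPLE_PS_OUTPUT = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
apache    2101  0.0  0.4 212345  8120 ?        S    23:01   0:00 /usr/bin/python /usr/lib/python2.7/site-packages/grinder/activeobject.pyc
apache    2102  0.0  0.4 212345  8120 ?        Sl   23:01   0:00 /usr/bin/python /usr/lib/python2.7/site-packages/grinder/activeobject.pyc
apache    2103  1.2  0.4 212345  8120 ?        R    23:01   0:03 /usr/bin/python /usr/lib/python2.7/site-packages/grinder/activeobject.pyc
apache    2300  0.0  0.2 101010  4040 ?        S    22:10   0:00 /usr/sbin/httpd
"""


def count_sleeping(ps_output, pattern=GRINDER_PATTERN):
    """Count rows in `ps aux` text that match pattern and are sleeping."""
    count = 0
    for line in ps_output.splitlines():
        # Split into at most 11 fields so COMMAND keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or malformed row
        stat, command = fields[7], fields[10]
        if stat.startswith("S") and pattern.search(command):
            count += 1
    return count


if __name__ == "__main__":
    print(count_sleeping(SAMPLE_PS_OUTPUT))
```

To run this against a live system, the text could come from `subprocess.check_output(["ps", "aux"])` decoded to a string. One thing to keep in mind when reading the `ps auxf | grep grinder | wc -l` counts in the comments: that pipeline typically counts the `grep` process itself, so the final count of 1 on the fixed build is likely just `grep`.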