Description of problem:
Assume a content host (running goferd without a problem, connected to Sat/Caps port 5647) runs out of free disk space. This is a user error, but one that can easily happen. In this situation, any new foreman-task propagated to goferd on this machine stalls forever (or until the timeout for package install / capsule sync expires, presumably a matter of hours; I haven't waited that long). The usual symptom is that goferd has no established TCP connection for a long time, even though heartbeats are enabled.

It would be nice to catch this situation (sooner) and provide a meaningful error message in the foreman-task.

Version-Release number of selected component (if applicable):
(content host)
python-gofer-2.6.8-1.el7sat.noarch
python-gofer-proton-2.6.8-1.el7sat.noarch
gofer-2.6.8-1.el7sat.noarch
(Satellite)
ruby193-rubygem-foreman-tasks-0.6.15.7-1.el7sat.noarch

How reproducible:
100%

Steps to Reproduce:
1. Have a content host registered to Satellite6
2. Fill its disk (at least the /var partition)
3. hammer -u admin -p password content-host package remove --content-host-id UUID --organization-id 1 --packages sos
4. Monitor TCP connections established from goferd to Satellite/Capsule port 5647
5. (After a few minutes, free the disk on the content host)

Actual results:
3. times out / never finishes
4. no connection for a long time

Expected results:
3. finishes (sooner) with a self-explanatory error
4. an established TCP connection should be present almost all the time

Additional info:
I *think* the problem is that goferd fails to write a json file with pending work to /var/lib/gofer/messaging/pending/katelloplugin, so it has nothing to pick up later on. This failure to write should be reported as a failed task.

In parallel, goferd loses its TCP connection to qdrouterd for some time. In my reproducer it was re-established after some (tens of?) minutes; I have no idea what triggers this.
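For step 4 of the reproducer, one possible way to check for an established connection on a Linux content host is to parse /proc/net/tcp. The following is only a minimal sketch: the function names are hypothetical, it counts connections from any process (not just goferd), and it covers IPv4 only (/proc/net/tcp6 would need the same treatment).

```python
# Sketch: count ESTABLISHED TCP connections to a given remote port by
# parsing the text of /proc/net/tcp (Linux only, IPv4 only).
# Function names are hypothetical, chosen for this illustration.

ESTABLISHED = "01"  # state code for ESTABLISHED in /proc/net/tcp


def established_to_port(proc_net_tcp_text, remote_port):
    """Return how many ESTABLISHED connections have the given remote port."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        rem_address, state = fields[2], fields[3]
        # Addresses look like "0A00000A:160F" (hex IP, hex port).
        port_hex = rem_address.rsplit(":", 1)[1]
        if state == ESTABLISHED and int(port_hex, 16) == remote_port:
            count += 1
    return count


def check(port=5647, path="/proc/net/tcp"):
    """Read the live table and count connections to the given port."""
    with open(path) as handle:
        return established_to_port(handle.read(), port)
```

Calling `check()` repeatedly while the disk is full should show whether the count drops to zero and stays there.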
Removing "Improvement" because another symptom was detected: the goferd process consumes 100% CPU after a while. That sounds more like a bug than an improvement.
(In reply to Bryan Kearney from comment #4)
> so this is not fixed by 1295957?

Yes, thanks for spotting it. The goferd high CPU usage is no longer reproducible since qpid-proton 0.9-12 is in use.

Changing back to "[Improvement] foreman-task to have warning goferd failed due to disk full", since goferd should be robust enough to report a full disk / failure to create the json file in the katelloagent dir back to katello.
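To illustrate the kind of robustness requested here, the sketch below stores a request as JSON and turns a write failure (including ENOSPC, "No space left on device") into a status that could be sent back to the caller. This is not gofer's actual code: `store_pending`, its parameters, and the returned dictionary are all hypothetical and only show the general approach.

```python
import errno
import json
import os


def store_pending(directory, request_id, request):
    """Write a request as JSON; return a status dict instead of raising.

    Hypothetical helper for illustration, not part of gofer.
    """
    path = os.path.join(directory, request_id + ".json")
    tmp_path = path + ".tmp"
    try:
        with open(tmp_path, "w") as handle:
            json.dump(request, handle)
            handle.flush()
            os.fsync(handle.fileno())  # a full disk may only surface here
        os.rename(tmp_path, path)  # publish the file only once fully written
    except OSError as error:
        # Best-effort cleanup of a partially written temporary file.
        try:
            os.remove(tmp_path)
        except OSError:
            pass
        return {
            "succeeded": False,
            "disk_full": error.errno == errno.ENOSPC,
            "message": "cannot store request in %s: %s" % (directory, error),
        }
    return {"succeeded": True, "disk_full": False, "message": ""}
```

A caller could pass the `message` back as the task result whenever `succeeded` is false, so the foreman-task fails quickly with a readable reason instead of waiting for a timeout.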
Due to change... moving this out.
Thank you for your interest in Satellite 6. We have evaluated this request, and we do not expect this to be implemented in the product in the foreseeable future. We are therefore closing this out as WONTFIX. If you have any concerns about this, please feel free to contact Rich Jerrido or Bryan Kearney. Thank you.
The Satellite Team is attempting to provide an accurate backlog of bugzilla requests which we feel will be resolved in the next few releases. We do not believe this bugzilla will meet that criteria, and have plans to close it out in 1 month. This is not a reflection on the validity of the request, but a reflection of the many priorities for the product. If you have any concerns about this, feel free to contact Rich Jerrido or Bryan Kearney or your account team. If we do not hear from you, we will close this bug out. Thank you.
Thank you for your interest in Satellite 6. We have evaluated this request, and while we recognize that it is a valid request, we do not expect this to be implemented in the product in the foreseeable future. This is due to other priorities for the product, and not a reflection on the request itself. We are therefore closing this out as WONTFIX. If you have any concerns about this, please do not reopen. Instead, feel free to contact Rich Jerrido or Bryan Kearney. Thank you.