Bug 626353
Summary: recipe stuck in waiting without watchdog kicking in
Product: [Retired] Beaker
Reporter: Ales Zelinka <azelinka>
Component: beah
Assignee: Marian Csontos <mcsontos>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Version: 0.5
CC: bpeck, dcallagh, kbaker, mcsontos, rmancy
Target Milestone: future_maint
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2011-03-24 13:17:26 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Bug Blocks: 632609

Comment 2
Marian Csontos
2010-08-25 05:34:47 UTC
Bill, I seek your opinion: I have repeating implemented, but...

Are XML-RPC failures as seen in Comment 2 ever expected? [1]

If a call has correct parameters [2] and the network is fine, shall I simply repeat the call until it passes and let EWD kill the job if it does not? [3]

Thinking about it, the harness should not repeat the real calls but use a ping-like call instead: Bug 636093

[1] The first one. The second one is a consequence I am trying to get rid of.
[2] Let's suppose it does - otherwise something is broken already and the task/recipe will be broken anyway.
[3] For cases of broken network setup I filed Bug 636080.

Created attachment 449887
repeating-proxy: per-call repeating
Created attachment 449888
repeat task_start - until it pass
Though it would work for task_start, this will require a more sophisticated approach:

1. if the call fails, use a ping call (as in Bug 636093) to determine if the net is broken and wait until service is restored. Repeat ping until:
   1.1 the original call succeeds: go on with next calls.
   1.2 the ping succeeds: retry original call (2)
2. before repeating the call, try to get the original call's status:
   2.1 e.g. for task_start/task_end use task_info
   2.2 for task_result/upload_file repeat the call
3. the call must finish in finite time in the worst case and this must be reported:
   3.1 try to push new result
   3.2 message on console if the result fails

Thanks to Bill for kicking me off.

As this is not a high priority bug fix and I won't be able to provide and test the golden-grail solution for 0.5.58, I am pushing this ahead too.

...and once more.

Ping - Is this still an issue?

Risk at least. I haven't seen this issue for a long time. Won't mind closing this as fixed. Thanks.

closing, if seen again please re-open.
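
For reference, the retry procedure proposed earlier in this thread (ping until service is restored, check the call's status before repeating it, give up in finite time) could be sketched roughly as below. This is not the beah implementation or either attached patch. The names `call_with_recovery`, `CallFailed`, `ping` and `get_status` are hypothetical, and transport failures are assumed to surface as `OSError`.

```python
import time


class CallFailed(Exception):
    """Raised when the call could not be completed before the deadline."""


def call_with_recovery(call, ping, get_status=None, deadline=60.0, delay=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Run call(), recovering from transport errors (assumed to be OSError).

    call       -- zero-argument callable performing the real request
    ping       -- zero-argument callable for a cheap ping-like request
    get_status -- optional callable returning a truthy value if the
                  original call already took effect (e.g. a task_info lookup)
    """
    end = clock() + deadline
    while True:
        try:
            return call()
        except OSError:
            pass  # fall through to recovery

        # Step 1: ping until the service answers or the deadline passes.
        while True:
            if clock() >= end:
                # Step 3: finite time in the worst case; the caller reports it.
                raise CallFailed("gave up after %.1f s" % deadline)
            sleep(delay)
            try:
                ping()
                break
            except OSError:
                continue

        # Step 2: before repeating, check whether the call already went through.
        if get_status is not None:
            try:
                done = get_status()
            except OSError:
                continue  # service dropped again; restart recovery
            if done:
                return done
        # Otherwise loop around and repeat the original call.
```

A caller would catch `CallFailed`, try to push a failure result, and fall back to a console message if that also fails. Calls such as task_result or upload_file would pass no `get_status` and simply be repeated.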