Bug 1883827 - Crash on kill containers failure
Summary: Crash on kill containers failure
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: assisted-installer
Version: 4.6
Hardware: Unspecified
OS: Unspecified
high
unspecified
Target Milestone: ---
: ---
Assignee: Rom Freiman
QA Contact: Yuri Obshansky
URL:
Whiteboard: assisted-installer-prod
Depends On:
Blocks:
Reported: 2020-09-30 10:55 UTC by Yuval Goldberg
Modified: 2022-08-25 21:42 UTC (History)

Fixed In Version: v1.0.10.2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-25 21:42:05 UTC
Target Upstream Version:
Embargoed:


Description Yuval Goldberg 2020-09-30 10:55:51 UTC
Description of problem:
When a restart command is issued after failures on the agent side have left containers in a failed (exited) state, it crashes.


Steps to Reproduce:
1. DHCP mode with dhclient failure
2. restart command

Actual results:
Sep 29 18:26:49 localhost agent[1813]: time="29-09-2020 18:26:49" level=info msg="Sending step <execute-dfa8b817> reply output <b1b41ca48ddefa078dc542e9cb68ff3e8ea6ca63c7b81fdd41e74618b544a73b\n1a8d66af0f15c4d99b1d703f948caf26e608cd1185e888e4f5d1061d51a17288\n> error <Error: can only kill running containers. 03b76041d08357411fa35c6596b9df6f7b551f919b8dde12ab1209d4d7d798d9 is in state exited: container state improper\nError: can only kill running containers. 64c0645230d88b0dddc33db1992eb5c69a838425f98a3388f5509d5327876bdb is in state exited: container state improper\nError: can only kill running containers. 6aba004b9b3ccd5a0fef068da9b84f55f08426198e10b0dbbc6f549cc57752fc is in state exited: container state improper\nError: can only kill running containers. 6af5dd68bbf1ee78758f99502467d9e176267633867036496238aa820e52f4c9 is in state exited: container state improper\nError: can only kill running containers. 7d3d03158a8ee4432f323c1d3aef9c522aa84d72692fd714465a45d18ab20476 is in state exited: container state improper\nError: can only kill running containers. 034de39b55f9f9e547fad1b7286dfe8cc73a708acca31892f636d0e3729b0b23 is in state exited: container state improper\nError: can only kill running containers. a15ccb347c6b69bf8296701e6938c7f88dc63ca098153ed3c09dba8e7e4abcea is in state exited: container state improper\nError: can only kill running containers. 069a37b2850785cea15170acbdaf065b4e925f9c96f6622bafd13e30c8c25c92 is in state exited: container state improper\nError: can only kill running containers. 46082ed44bc1367a5f70adb7353e233ba7bdd702f56f9b473e97b821291bf9f2 is in state exited: container state improper\nError: can only kill running containers. 9ba88206a89d162a307713a5b4b1b982a21d37b956f94b0d2118c38a103c4db4 is in state exited: container state improper\n> exit-code <125>" file="step_processor.go:37" request_id=edc5fa66-362d-44b2-bfa3-bbc1df2544c0

Expected results:


Additional info:
- Fix agent side to support killed containers
- Assisted-service shouldn't crash in case of agent kill execution failure
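
The second point can be illustrated with a minimal sketch. It is written in Python for brevity and is not the assisted-service code; `StepReply` and `handle_step_reply` are hypothetical names, and the fields simply mirror what the agent log above reports (step ID, output, error text, exit code). The idea is only that a non-zero exit code from a kill step is recorded and reported rather than treated as fatal.

```python
from dataclasses import dataclass


@dataclass
class StepReply:
    """Hypothetical reply shape, mirroring the fields in the agent log."""
    step_id: str
    output: str
    error: str
    exit_code: int


def handle_step_reply(reply: StepReply) -> str:
    """Return a status string; never raise on a non-zero exit code."""
    if reply.exit_code == 0:
        return "ok"
    # A failed step is reported to the caller, not treated as fatal.
    if reply.error.strip():
        first_error = reply.error.splitlines()[0]
    else:
        first_error = "no error text"
    return (
        f"step {reply.step_id} failed with exit code "
        f"{reply.exit_code}: {first_error}"
    )
```

With the reply from the log above (exit code 125 and a list of "can only kill running containers" errors), this handler would return a failure description for the step and leave the service running.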

Comment 1 Ori Amizur 2020-10-08 09:39:30 UTC
The reset/cancel command should be fixed so that these errors do not happen.

Comment 2 Raz Regev 2020-10-12 08:13:19 UTC
Fixed. The command was changed from 'kill' to 'stop' and now exits with status 0.
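
As a rough illustration of the changed command, the sketch below builds the argument list that appears in the agent log for v1.0.10.2 (`/usr/bin/podman stop -i -t 5 assisted-installer`). The helper name `build_stop_command` is hypothetical and the real agent is not written in Python. The flag descriptions in the comments follow the podman documentation as I understand it and should be checked against the installed podman version.

```python
from typing import List


def build_stop_command(container: str, timeout_seconds: int = 5) -> List[str]:
    """Build a 'podman stop' invocation like the one seen in the agent log.

    -i : ignore errors for containers missing from the container store
    -t : seconds to wait before the container is forcibly stopped
    """
    return [
        "/usr/bin/podman",
        "stop",
        "-i",
        "-t",
        str(timeout_seconds),
        container,
    ]


# Hypothetical usage: pass the list to subprocess.run(...) and inspect
# returncode instead of assuming the container was still running.
```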

Comment 4 mchernyk 2020-10-29 21:01:15 UTC
The issue does not reproduce on v1.0.10.2:

Oct 29 20:37:24 master-0-1 agent[1984]: time="29-10-2020 20:37:24" level=info msg="Sending step <execute-cbbffa01> reply output <0a77c1e8e4f20612f98772cb3c9553ea41a2b371e295ec326102cae8ab96d135\n> error <> exit-code <0>" file="step_processor.go:46" request_id=65e3848d-cf39-41a0-96a3-7c1ade5ec390
Oct 29 20:39:24 master-0-1 agent[1984]: time="29-10-2020 20:39:24" level=info msg="Query for next steps" file="step_processor.go:101" request_id=da486687-9d82-49a3-a86a-f11935d8fd74
Oct 29 20:39:25 master-0-1 agent[1984]: time="29-10-2020 20:39:25" level=info msg="Executing step: <execute-eed33f25>, command: </usr/bin/podman>, args: <[stop -i -t 5 assisted-installer]>" file="step_processor.go:78" request_id=da486687-9d82-49a3-a86a-f11935d8fd74
Oct 29 20:39:25 master-0-1 agent[1984]: time="29-10-2020 20:39:25" level=info msg="Sending step <execute-eed33f25> reply output <0a77c1e8e4f20612f98772cb3c9553ea41a2b371e295ec326102cae8ab96d135\n> error <> exit-code <0>" file="step_processor.go:46" request_id=da486687-9d82-49a3-a86a-f11935d8fd74
Oct 29 20:41:25 master-0-1 agent[1984]: time="29-10-2020 20:41:25" level=info msg="Query for next steps" file="step_processor.go:101" request_id=ad5750b5-018f-4db6-91cd-ffbf402c6d27
Oct 29 20:41:26 master-0-1 agent[1984]: time="29-10-2020 20:41:26" level=info msg="Executing step: <execute-d588780c>, command: </usr/bin/podman>, args: <[stop -i -t 5 assisted-installer]>" file="step_processor.go:78" request_id=ad5750b5-018f-4db6-91cd-ffbf402c6d27
Oct 29 20:41:26 master-0-1 agent[1984]: time="29-10-2020 20:41:26" level=info msg="Sending step <execute-d588780c> reply output <0a77c1e8e4f20612f98772cb3c9553ea41a2b371e295ec326102cae8ab96d135\n> error <> exit-code <0>" file="step_processor.go:46" request_id=ad5750b5-018f-4db6-91cd-ffbf402c6d27
Oct 29 20:43:26 master-0-1 agent[1984]: time="29-10-2020 20:43:26" level=info msg="Query for next steps" file="step_processor.go:101" request_id=4c4cff4f-5bd5-4b2f-96d7-06ad4c6777a7
Oct 29 20:43:26 master-0-1 agent[1984]: time="29-10-2020 20:43:26" level=info msg="Executing step: <execute-2542ca18>, command: </usr/bin/podman>, args: <[stop -i -t 5 assisted-installer]>" file="step_processor.go:78" request_id=4c4cff4f-5bd5-4b2f-96d7-06ad4c6777a7
Oct 29 20:43:26 master-0-1 agent[1984]: time="29-10-2020 20:43:26" level=info msg="Sending step <execute-2542ca18> reply output <0a77c1e8e4f20612f98772cb3c9553ea41a2b371e295ec326102cae8ab96d135\n> error <> exit-code <0>" file="step_processor.go:46" request_id=4c4cff4f-5bd5-4b2f-96d7-06ad4c6777a7
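
For checking a longer agent log in the same way, a small parser can pull the step ID and exit code out of each "Sending step" line. This is only a sketch: the regular expression is inferred from the log lines quoted in this report, and `parse_step_reply` and `failed_steps` are hypothetical helper names.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Pattern inferred from the "Sending step ... exit-code <N>" lines above.
STEP_REPLY = re.compile(
    r"Sending step <(?P<step>[^>]+)> reply output <(?P<output>.*?)> "
    r"error <(?P<error>.*?)> exit-code <(?P<code>-?\d+)>",
    re.DOTALL,
)


def parse_step_reply(line: str) -> Optional[Tuple[str, int]]:
    """Return (step_id, exit_code), or None if the line is not a step reply."""
    match = STEP_REPLY.search(line)
    if match is None:
        return None
    return match.group("step"), int(match.group("code"))


def failed_steps(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Collect every step reply whose exit code is non-zero."""
    parsed = (parse_step_reply(line) for line in lines)
    return [item for item in parsed if item is not None and item[1] != 0]
```

Run over the v1.0.10.2 lines above, `failed_steps` would return an empty list; run over the line from the original description, it would report step `execute-dfa8b817` with exit code 125.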