Description of problem:
When booting hosts into the ISO, some or all of the hosts are not discovered by the service.

Agent logs:
Sep 21 11:41:29 master-0-1 podman[1860]: Trying to pull registry.stage.redhat.io/openshift4/assisted-installer-agent-rhel8:v4.6.0-15...
Sep 21 11:41:34 master-0-1 podman[1860]: Getting image source signatures
Sep 21 11:41:36 master-0-1 podman[1860]: Copying blob sha256:9d0d09b1ea44d90760e85af2ce11a721c1fd9c646015000e4453b10c661fd21c
Sep 21 11:41:36 master-0-1 podman[1860]: Copying blob sha256:8e7ef64fd2cf1d5fbafc966ca5339400a2a8d26dcd5025cae0bffeb76155b005
Sep 21 11:41:36 master-0-1 podman[1860]: Copying blob sha256:c4d668e229cd131e0a8e4f8218dca628d9cf9697572875e355fe4b247b6aa9f0
Sep 21 11:41:36 master-0-1 podman[1860]: Copying blob sha256:ec1681b6a383e4ecedbeddd5abc596f3de835aed6db39a735f62395c8edbff30
Sep 21 11:42:55 master-0-1 podman[1860]: Copying config sha256:8a15567bae378372803735ca3e4359cd5d91057b30ae54631c3ba82a7e6660fc
Sep 21 11:42:55 master-0-1 podman[1860]: Writing manifest to image destination
Sep 21 11:42:55 master-0-1 podman[1860]: Storing signatures
Sep 21 11:42:59 master-0-1 systemd[1]: agent.service: Start-pre operation timed out. Terminating.
Sep 21 11:42:59 master-0-1 systemd[1]: agent.service: Failed with result 'timeout'.
Sep 21 11:42:59 master-0-1 systemd[1]: Failed to start agent.service.
Sep 21 11:43:03 master-0-1 systemd[1]: agent.service: Service RestartSec=3s expired, scheduling restart.
Sep 21 11:43:03 master-0-1 systemd[1]: agent.service: Scheduled restart job, restart counter is at 6.
Sep 21 11:43:03 master-0-1 systemd[1]: Stopped agent.service.
Sep 21 11:43:03 master-0-1 systemd[1]: Starting agent.service...
Sep 21 11:43:03 master-0-1 podman[1973]: Trying to pull registry.stage.redhat.io/openshift4/assisted-installer-agent-rhel8:v4.6.0-15...
Sep 21 11:43:08 master-0-1 podman[1973]: Getting image source signatures
Sep 21 11:43:08 master-0-1 podman[1973]: Copying blob sha256:ec1681b6a383e4ecedbeddd5abc596f3de835aed6db39a735f62395c8edbff30
Sep 21 11:43:08 master-0-1 podman[1973]: Copying blob sha256:c4d668e229cd131e0a8e4f8218dca628d9cf9697572875e355fe4b247b6aa9f0
Sep 21 11:43:08 master-0-1 podman[1973]: Copying blob sha256:9d0d09b1ea44d90760e85af2ce11a721c1fd9c646015000e4453b10c661fd21c
Sep 21 11:43:09 master-0-1 podman[1973]: Copying blob sha256:8e7ef64fd2cf1d5fbafc966ca5339400a2a8d26dcd5025cae0bffeb76155b005
Sep 21 11:43:36 master-0-1 podman[1973]: Copying config sha256:8a15567bae378372803735ca3e4359cd5d91057b30ae54631c3ba82a7e6660fc
Sep 21 11:43:36 master-0-1 podman[1973]: Writing manifest to image destination
Sep 21 11:43:36 master-0-1 podman[1973]: Storing signatures
Sep 21 11:43:36 master-0-1 podman[1973]: Error: error creating container storage: size for layer "053c169f70b03d18472b4004472910fdb2465d55ae9335c422f2cd4b7479a21e" is unknown, failing getSize()
Sep 21 11:43:36 master-0-1 systemd[1]: agent.service: Control process exited, code=exited status=125
Sep 21 11:43:36 master-0-1 systemd[1]: agent.service: Failed with result 'exit-code'.
Sep 21 11:43:36 master-0-1 systemd[1]: Failed to start agent.service.
Sep 21 11:43:40 master-0-1 systemd[1]: agent.service: Service RestartSec=3s expired, scheduling restart.
Sep 21 11:43:40 master-0-1 systemd[1]: agent.service: Scheduled restart job, restart counter is at 7.
Sep 21 11:43:40 master-0-1 systemd[1]: Stopped agent.service.
Version-Release number of selected component (if applicable):
Staging:
{
  "release_tag": "v1.0.9.4-ds",
  "versions": {
    "assisted-ignition-generator": "quay.io/ocpmetal/assisted-ignition-generator:v1.0.9.4",
    "assisted-installer": "registry.stage.redhat.io/openshift4/assisted-installer-rhel8:v4.6.0-19",
    "assisted-installer-controller": "registry.stage.redhat.io/openshift4/assisted-installer-reporter-rhel8:v4.6.0-15",
    "assisted-installer-service": "quay.io/app-sre/assisted-service:b793c52",
    "discovery-agent": "registry.stage.redhat.io/openshift4/assisted-installer-agent-rhel8:v4.6.0-15",
    "image-builder": "quay.io/app-sre/assisted-iso-create:b793c52"
  }
}

How reproducible:
This does not reproduce every time. When it does, it sometimes affects all the nodes and sometimes only one node.

Steps to Reproduce:
1. Create Cluster
2. Generate and download ISO
3. Boot nodes into ISO

Actual results:
Node discovery fails

Expected results:
Nodes are discovered by the service

Additional info:
Maybe this is a timeout in the start-pre step of the systemd service (agent.service).
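To check the timeout theory against the log in the description, here is a minimal Python sketch that measures the time between the first "Trying to pull" line and the "Start-pre operation timed out" line. The function names are hypothetical, and the year passed to the parser is an arbitrary assumption because the journal lines do not include one. The result is 90 seconds, which matches systemd's default start timeout; whether agent.service sets its own TimeoutStartSec is not shown in this report.

```python
from datetime import datetime

# Two lines copied verbatim from the agent log in the description.
LOG = """\
Sep 21 11:41:29 master-0-1 podman[1860]: Trying to pull registry.stage.redhat.io/openshift4/assisted-installer-agent-rhel8:v4.6.0-15...
Sep 21 11:42:59 master-0-1 systemd[1]: agent.service: Start-pre operation timed out. Terminating.
"""

# Assumption: the log has no year, so a fixed one is supplied. Any year
# works as long as both timestamps fall on the same day.
ASSUMED_YEAR = 2020


def parse_ts(line: str) -> datetime:
    """Parse the 15-character 'Mon DD HH:MM:SS' prefix of a journal line."""
    return datetime.strptime(f"{ASSUMED_YEAR} {line[:15]}", "%Y %b %d %H:%M:%S")


def elapsed_seconds(log: str, start_marker: str, end_marker: str) -> float:
    """Seconds between the first line containing start_marker and the
    first later line containing end_marker."""
    start = None
    for line in log.splitlines():
        if start is None:
            if start_marker in line:
                start = parse_ts(line)
        elif end_marker in line:
            return (parse_ts(line) - start).total_seconds()
    raise ValueError("markers not found in log")


if __name__ == "__main__":
    print(elapsed_seconds(LOG, "Trying to pull", "Start-pre operation timed out"))
```

In the same log the image layers take from 11:41:36 to 11:42:55 to copy, so the pull alone uses most of that window.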
One of my two environments (seal32) reproduces this issue consistently. Hosts reach up to 20 retries and the agent fails to start. When I run the pull manually, the issue resolves:
sudo podman pull registry.stage.redhat.io/openshift4/assisted-installer-agent-rhel8:v4.6.0-15
So I guess it is a timeout issue.
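The manual workaround above can be scripted for debugging. The following is only a sketch: `run_with_retries` is a hypothetical helper, and the attempt count, 600-second timeout, and 3-second delay are assumed values, not settings taken from agent.service or from the linked PR. It gives each attempt its own generous timeout and retries on failure.

```python
import subprocess
import time
from typing import Sequence


def run_with_retries(
    cmd: Sequence[str],
    attempts: int = 3,        # assumed value
    timeout_s: float = 600.0, # assumed value, far above the 90 s seen in the log
    delay_s: float = 3.0,     # assumed value
) -> bool:
    """Run cmd until it exits with status 0 or attempts are exhausted.

    Returns True on success, False otherwise. A missing executable
    raises FileNotFoundError, which is deliberately not caught here.
    """
    for attempt in range(1, attempts + 1):
        try:
            # subprocess.run kills the child and raises TimeoutExpired
            # if it runs longer than timeout_s.
            result = subprocess.run(list(cmd), timeout=timeout_s)
            if result.returncode == 0:
                return True
        except subprocess.TimeoutExpired:
            pass
        if attempt < attempts:
            time.sleep(delay_s)
    return False
```

For the image in this report, the call would be `run_with_retries(["sudo", "podman", "pull", "registry.stage.redhat.io/openshift4/assisted-installer-agent-rhel8:v4.6.0-15"])`. This does not address the "size for layer ... is unknown" error seen in the logs.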
Reproducible on ocp-edge33.lab.eng.tlv2.redhat.com
1 master node failed; 2 masters and 2 workers are OK.

Sep 21 15:18:24 master-0-1 podman[6154]: Error: error creating container storage: size for layer "ccf04fbd6e1943f648d1c2980e96038edc02b543c597556>
Sep 21 15:18:24 master-0-1 systemd[1]: agent.service: Control process exited, code=exited status=125
Sep 21 15:18:24 master-0-1 systemd[1]: agent.service: Failed with result 'exit-code'.
Sep 21 15:18:24 master-0-1 systemd[1]: Failed to start agent.service.
PR - https://github.com/openshift/assisted-service/pull/406
Issue persists on staging:
{
  "release_tag": "v1.0.9.5-ds",
  "versions": {
    "assisted-ignition-generator": "quay.io/ocpmetal/assisted-ignition-generator:v1.0.9.5",
    "assisted-installer": "registry.stage.redhat.io/openshift4/assisted-installer-rhel8:v4.6.0-19",
    "assisted-installer-controller": "registry.stage.redhat.io/openshift4/assisted-installer-reporter-rhel8:v4.6.0-15",
    "assisted-installer-service": "quay.io/app-sre/assisted-service:7fd51db",
    "discovery-agent": "registry.stage.redhat.io/openshift4/assisted-installer-agent-rhel8:v4.6.0-15",
    "image-builder": "quay.io/app-sre/assisted-iso-create:7fd51db"
  }
}
Verified on staging:
{
  "release_tag": "v1.0.9.5-ds",
  "versions": {
    "assisted-ignition-generator": "quay.io/ocpmetal/assisted-ignition-generator:v1.0.9.5",
    "assisted-installer": "registry.stage.redhat.io/openshift4/assisted-installer-rhel8:v4.6.0-19",
    "assisted-installer-controller": "registry.stage.redhat.io/openshift4/assisted-installer-reporter-rhel8:v4.6.0-15",
    "assisted-installer-service": "quay.io/app-sre/assisted-service:27cfe0d",
    "discovery-agent": "registry.stage.redhat.io/openshift4/assisted-installer-agent-rhel8:v4.6.0-15",
    "image-builder": "quay.io/app-sre/assisted-iso-create:27cfe0d"
  }
}