Bug 1856535 - Discovery ocpmetal/agent failed to start on hosts
Summary: Discovery ocpmetal/agent failed to start on hosts
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: assisted-installer
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Ronnie Lazar
QA Contact: Yuri Obshansky
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-07-13 21:38 UTC by Yuri Obshansky
Modified: 2020-07-15 12:45 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-15 12:45:10 UTC
Target Upstream Version:
Embargoed:



Description Yuri Obshansky 2020-07-13 21:38:36 UTC
Description of problem:
Virtual environment.
The discovery agent (ocpmetal/agent) failed to start on the hosts:
[core@localhost ~]$ sudo journalctl -u agent
-- Logs begin at Mon 2020-07-13 21:19:06 UTC, end at Mon 2020-07-13 21:26:23 UTC. --
Jul 13 21:20:28 localhost systemd[1]: Starting agent.service...
Jul 13 21:20:29 localhost podman[1128]: 2020-07-13 21:20:29.356988325 +0000 UTC m=+0.536028425 system refresh
Jul 13 21:20:29 localhost podman[1128]: Trying to pull quay.io/ocpmetal/agent:latest...
Jul 13 21:20:32 localhost podman[1128]: Getting image source signatures
Jul 13 21:20:33 localhost podman[1128]: Copying blob sha256:90fc9795ab2927c8568d06190455b32a63f469a0c68e1b68378aae33fa599b18
Jul 13 21:20:33 localhost podman[1128]: Copying blob sha256:90e2fe808d18ae8e1429680d4f9d83c65dc0999fd3b2fe59798a39703f05ce0e
Jul 13 21:20:34 localhost podman[1128]: Copying blob sha256:5d20c808ce198565ff70b3ed23a991dd49afac45dece63474b27ce6ed036adc6
Jul 13 21:20:38 localhost podman[1128]: Copying config sha256:ad50532f294f987ab523297eea85485cf33939114d6f8f5325e83522193dba9d
Jul 13 21:20:38 localhost podman[1128]: Writing manifest to image destination
Jul 13 21:20:38 localhost podman[1128]: Storing signatures
Jul 13 21:20:41 localhost podman[1128]: 2020-07-13 21:20:41.247121278 +0000 UTC m=+12.426161433 image pull  
Jul 13 21:20:41 localhost podman[1128]: 2020-07-13 21:20:41.27285694 +0000 UTC m=+12.451897060 container create f1279e6cd657a666ccec3da4c58c39ba0>
Jul 13 21:20:41 localhost podman[1128]: 2020-07-13 21:20:41.610656444 +0000 UTC m=+12.789696547 container init f1279e6cd657a666ccec3da4c58c39ba0d>
Jul 13 21:20:41 localhost podman[1128]: 2020-07-13 21:20:41.616948141 +0000 UTC m=+12.795988274 container start f1279e6cd657a666ccec3da4c58c39ba0>
Jul 13 21:20:41 localhost podman[1128]: 2020-07-13 21:20:41.617066071 +0000 UTC m=+12.796106170 container attach f1279e6cd657a666ccec3da4c58c39ba>
Jul 13 21:20:41 localhost podman[1128]: 2020-07-13 21:20:41.67287649 +0000 UTC m=+12.851916635 container died f1279e6cd657a666ccec3da4c58c39ba0d3>
Jul 13 21:20:41 localhost podman[1128]: 2020-07-13 21:20:41.815353081 +0000 UTC m=+12.994393189 container remove f1279e6cd657a666ccec3da4c58c39ba>
Jul 13 21:20:41 localhost systemd[1]: Started agent.service.
Jul 13 21:20:41 localhost agent[1536]: WARNING: Unable to read board_asset_tag: open /sys/class/dmi/id/board_asset_tag: no such file or directory
Jul 13 21:20:41 localhost agent[1536]: WARNING: Unable to read board_serial: open /sys/class/dmi/id/board_serial: no such file or directory
Jul 13 21:20:41 localhost agent[1536]: WARNING: Unable to read board_vendor: open /sys/class/dmi/id/board_vendor: no such file or directory
Jul 13 21:20:41 localhost agent[1536]: WARNING: Unable to read board_version: open /sys/class/dmi/id/board_version: no such file or directory
Jul 13 21:20:41 localhost agent[1536]: time="13-07-2020 21:20:41" level=warning msg="Could not find motherboard serial number" file="machine_uuid>
Jul 13 21:20:41 localhost agent[1536]: time="13-07-2020 21:20:41" level=warning msg="Error registering host: Post http://192.168.39.216:30259/api>
Jul 13 21:21:41 localhost agent[1536]: WARNING: Unable to read board_asset_tag: open /sys/class/dmi/id/board_asset_tag: no such file or directory
Jul 13 21:21:41 localhost agent[1536]: WARNING: Unable to read board_serial: open /sys/class/dmi/id/board_serial: no such file or directory
Jul 13 21:21:41 localhost agent[1536]: WARNING: Unable to read board_vendor: open /sys/class/dmi/id/board_vendor: no such file or directory
Jul 13 21:21:41 localhost agent[1536]: WARNING: Unable to read board_version: open /sys/class/dmi/id/board_version: no such file or directory

No running containers on hosts:
# podman ps -a
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES

The image was downloaded:
# podman images
REPOSITORY               TAG      IMAGE ID       CREATED        SIZE
quay.io/ocpmetal/agent   latest   ad50532f294f   10 hours ago   180 MB

However, the agent process is running:
$ ps -ef | grep agent
root        1536       1  0 21:20 ?        00:00:00 /usr/local/bin/agent --host 192.168.39.216 --port 30259 --cluster-id 7af8f49f-d670-4238-98b5-03689ecda2f7 --agent-version quay.io/ocpmetal/agent:latest

Errors from agent.log:
time="13-07-2020 21:36:41" level=warning msg="Could not find motherboard serial number" file="machine_uuid_scanner.go:47"
time="13-07-2020 21:36:41" level=warning msg="Error registering host: Post http://192.168.39.216:30259/api/assisted-install/v1/clusters/7af8f49f-d670-4238-98b5-03689ecda2f7/hosts: dial tcp 192.168.39.216:30259: connect: connection refused" file="register_node.go:36" request_id=67da0e17-bba9-44af-849d-c5d191de7ab7





Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Ori Amizur 2020-07-15 08:23:51 UTC
The problem is caused by iptables filtering rule(s) that block traffic from the agent to bm-inventory.
The filtering rules appear to be created by libvirt, but this still has to be verified.
For how the problem was handled locally, please see the section "Handling iptables REJECT rules" in https://docs.google.com/document/d/1WDc5LQjNnqpznM9YFTGb9Bg1kqPVckgGepS4KBxGSqw/edit#heading=h.9eoh8w2mv54t
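
A minimal sketch of how candidate rules might be spotted on the machine that runs bm-inventory. This is not taken from the linked document; the helper name find_reject_rules is hypothetical, and it only assumes that rules are supplied on stdin in the format printed by `iptables -S`. Whether any rule it lists is the one blocking the agent still has to be checked by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of iptables, libvirt, or the assisted installer):
# reads rules in `iptables -S` format from stdin and prints only those
# whose target is REJECT.
find_reject_rules() {
    # -e is needed because the pattern itself starts with a dash.
    # `|| true` keeps the exit status at 0 when no REJECT rule is present.
    grep -E -e '-j REJECT( |$)' || true
}

# Example usage (requires root; run on the host where the rules are installed):
#   sudo iptables -S | find_reject_rules
```

If a REJECT rule on the path between the hosts and 192.168.39.216:30259 shows up in the output, that would be consistent with the "connection refused" error seen in agent.log; an empty result suggests looking elsewhere.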

