Bug 1856535

Summary: Discovery ocpmetal/agent failed to start on hosts
Product: OpenShift Container Platform
Reporter: Yuri Obshansky <yobshans>
Component: assisted-installer
Assignee: Ronnie Lazar <alazar>
assisted-installer sub component: discovery-agent
QA Contact: Yuri Obshansky <yobshans>
Status: CLOSED NOTABUG
Docs Contact:
Severity: urgent
Priority: unspecified
CC: aos-bugs, oamizur
Version: 4.5
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-07-15 12:45:10 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Yuri Obshansky 2020-07-13 21:38:36 UTC
Description of problem:
Virtual environment.
Discovery ocpmetal/agent failed to start on hosts
[core@localhost ~]$ sudo journalctl -u agent
-- Logs begin at Mon 2020-07-13 21:19:06 UTC, end at Mon 2020-07-13 21:26:23 UTC. --
Jul 13 21:20:28 localhost systemd[1]: Starting agent.service...
Jul 13 21:20:29 localhost podman[1128]: 2020-07-13 21:20:29.356988325 +0000 UTC m=+0.536028425 system refresh
Jul 13 21:20:29 localhost podman[1128]: Trying to pull quay.io/ocpmetal/agent:latest...
Jul 13 21:20:32 localhost podman[1128]: Getting image source signatures
Jul 13 21:20:33 localhost podman[1128]: Copying blob sha256:90fc9795ab2927c8568d06190455b32a63f469a0c68e1b68378aae33fa599b18
Jul 13 21:20:33 localhost podman[1128]: Copying blob sha256:90e2fe808d18ae8e1429680d4f9d83c65dc0999fd3b2fe59798a39703f05ce0e
Jul 13 21:20:34 localhost podman[1128]: Copying blob sha256:5d20c808ce198565ff70b3ed23a991dd49afac45dece63474b27ce6ed036adc6
Jul 13 21:20:38 localhost podman[1128]: Copying config sha256:ad50532f294f987ab523297eea85485cf33939114d6f8f5325e83522193dba9d
Jul 13 21:20:38 localhost podman[1128]: Writing manifest to image destination
Jul 13 21:20:38 localhost podman[1128]: Storing signatures
Jul 13 21:20:41 localhost podman[1128]: 2020-07-13 21:20:41.247121278 +0000 UTC m=+12.426161433 image pull  
Jul 13 21:20:41 localhost podman[1128]: 2020-07-13 21:20:41.27285694 +0000 UTC m=+12.451897060 container create f1279e6cd657a666ccec3da4c58c39ba0>
Jul 13 21:20:41 localhost podman[1128]: 2020-07-13 21:20:41.610656444 +0000 UTC m=+12.789696547 container init f1279e6cd657a666ccec3da4c58c39ba0d>
Jul 13 21:20:41 localhost podman[1128]: 2020-07-13 21:20:41.616948141 +0000 UTC m=+12.795988274 container start f1279e6cd657a666ccec3da4c58c39ba0>
Jul 13 21:20:41 localhost podman[1128]: 2020-07-13 21:20:41.617066071 +0000 UTC m=+12.796106170 container attach f1279e6cd657a666ccec3da4c58c39ba>
Jul 13 21:20:41 localhost podman[1128]: 2020-07-13 21:20:41.67287649 +0000 UTC m=+12.851916635 container died f1279e6cd657a666ccec3da4c58c39ba0d3>
Jul 13 21:20:41 localhost podman[1128]: 2020-07-13 21:20:41.815353081 +0000 UTC m=+12.994393189 container remove f1279e6cd657a666ccec3da4c58c39ba>
Jul 13 21:20:41 localhost systemd[1]: Started agent.service.
Jul 13 21:20:41 localhost agent[1536]: WARNING: Unable to read board_asset_tag: open /sys/class/dmi/id/board_asset_tag: no such file or directory
Jul 13 21:20:41 localhost agent[1536]: WARNING: Unable to read board_serial: open /sys/class/dmi/id/board_serial: no such file or directory
Jul 13 21:20:41 localhost agent[1536]: WARNING: Unable to read board_vendor: open /sys/class/dmi/id/board_vendor: no such file or directory
Jul 13 21:20:41 localhost agent[1536]: WARNING: Unable to read board_version: open /sys/class/dmi/id/board_version: no such file or directory
Jul 13 21:20:41 localhost agent[1536]: time="13-07-2020 21:20:41" level=warning msg="Could not find motherboard serial number" file="machine_uuid>
Jul 13 21:20:41 localhost agent[1536]: time="13-07-2020 21:20:41" level=warning msg="Error registering host: Post http://192.168.39.216:30259/api>
Jul 13 21:21:41 localhost agent[1536]: WARNING: Unable to read board_asset_tag: open /sys/class/dmi/id/board_asset_tag: no such file or directory
Jul 13 21:21:41 localhost agent[1536]: WARNING: Unable to read board_serial: open /sys/class/dmi/id/board_serial: no such file or directory
Jul 13 21:21:41 localhost agent[1536]: WARNING: Unable to read board_vendor: open /sys/class/dmi/id/board_vendor: no such file or directory
Jul 13 21:21:41 localhost agent[1536]: WARNING: Unable to read board_version: open /sys/class/dmi/id/board_version: no such file or directory

No running containers on hosts:
# podman ps -a
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES

Image downloaded:
# podman images
REPOSITORY               TAG      IMAGE ID       CREATED        SIZE
quay.io/ocpmetal/agent   latest   ad50532f294f   10 hours ago   180 MB

But the agent process is running:
$ ps -ef | grep agent
root        1536       1  0 21:20 ?        00:00:00 /usr/local/bin/agent --host 192.168.39.216 --port 30259 --cluster-id 7af8f49f-d670-4238-98b5-03689ecda2f7 --agent-version quay.io/ocpmetal/agent:latest

Errors from agent.log:
time="13-07-2020 21:36:41" level=warning msg="Could not find motherboard serial number" file="machine_uuid_scanner.go:47"
time="13-07-2020 21:36:41" level=warning msg="Error registering host: Post http://192.168.39.216:30259/api/assisted-install/v1/clusters/7af8f49f-d670-4238-98b5-03689ecda2f7/hosts: dial tcp 192.168.39.216:30259: connect: connection refused" file="register_node.go:36" request_id=67da0e17-bba9-44af-849d-c5d191de7ab7
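
The "connection refused" message above means the TCP connection from the host to 192.168.39.216:30259 fails at dial time, before any HTTP request is sent. A minimal sketch for repeating that check from the host without involving the agent, assuming bash (for its /dev/tcp redirection) and the coreutils timeout command are available there. The function name check_tcp is hypothetical, not part of the agent or the installer:

```shell
#!/usr/bin/env bash
# Hypothetical helper: probe a TCP endpoint by opening (and immediately
# closing) a connection through bash's /dev/tcp pseudo-device.
# Prints the result and returns 0 if the connection succeeded, 1 otherwise.
check_tcp() {
    local host="$1" port="$2"
    # The 3-second limit is an arbitrary choice so a silently dropped
    # connection does not hang the check.
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
        echo "reachable: ${host}:${port}"
    else
        echo "NOT reachable: ${host}:${port}"
        return 1
    fi
}

# Example, using the address and port from the agent command line above:
#   check_tcp 192.168.39.216 30259
```

A failure here only shows that the endpoint cannot be reached from this host. It does not by itself tell whether the service is down or a firewall rule is rejecting the traffic.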





Comment 1 Ori Amizur 2020-07-15 08:23:51 UTC
The problem is caused by iptables filtering rule(s) that block traffic from the agent to bm-inventory.
The filtering rules appear to be created by libvirt, but this still needs to be verified.
For how the problem was handled locally, see the section "Handling iptables REJECT rules" in https://docs.google.com/document/d/1WDc5LQjNnqpznM9YFTGb9Bg1kqPVckgGepS4KBxGSqw/edit#heading=h.9eoh8w2mv54t.
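
As a starting point for that verification, here is a small sketch that narrows `iptables -S` output down to REJECT rules. It assumes the rules of interest are in the default filter table on the hypervisor and that iptables is run as root. The helper name list_reject_rules is hypothetical, and the procedure from the linked document is not reproduced here:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read rules in `iptables -S` format on stdin and
# print only those whose target is REJECT.
# `|| true` keeps the exit status at 0 when no rule matches.
list_reject_rules() {
    grep -E -- '-j REJECT( |$)' || true
}

# Example, as root on the hypervisor:
#   iptables -S | list_reject_rules
```

Whether a listed rule was created by libvirt still has to be judged from its chain and interface names.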