Bug 1870047 - [Assisted-4.5][Agent] Log message is not informative enough when agent failed to register
Summary: [Assisted-4.5][Agent] Log message is not informative enough when agent failed...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: assisted-installer
Version: 4.5
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: ---
Assignee: Ori Amizur
QA Contact: Yuri Obshansky
URL:
Whiteboard: OCP-Metal-v1.0.9
: 1866344 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-08-19 09:28 UTC by Lital Alon
Modified: 2022-08-25 21:24 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-25 21:24:47 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Lital Alon 2020-08-19 09:28:53 UTC
Description of problem:
Log message is not informative enough when agent failed to register, due to connectivity issues (for example)

level=warning msg="Error registering host: Post http://10.35.64.32:6000/api/assisted-install/v1/clusters/3395e1e5-3978-4df8-8308-bdc98a50


Steps to Reproduce:
1. generate ISO and block node communication with the service
2. start the node with ISO
3. check agent logs

Actual results:
warning message that is not informative

Expected results:
error message, with proper message explaining why registration failed 


Additional Info:

Aug 19 09:17:11 localhost systemd[1]: Starting agent.service...
Aug 19 09:17:12 localhost podman[1478]: Trying to pull quay.io/ocpmetal/assisted-installer-agent:latest...
Aug 19 09:17:12 localhost podman[1478]:   Get https://quay.io/v2/: dial tcp: lookup quay.io on [::1]:53: read udp [::1]:38047->[::1]:53: read: connection refused
Aug 19 09:17:12 localhost podman[1478]: Error: unable to pull quay.io/ocpmetal/assisted-installer-agent:latest: unable to pull image: Error initializing source docker://quay.io/ocpmetal/assisted-installer->
Aug 19 09:17:12 localhost systemd[1]: agent.service: Control process exited, code=exited status=125
Aug 19 09:17:12 localhost systemd[1]: agent.service: Failed with result 'exit-code'.
Aug 19 09:17:12 localhost systemd[1]: Failed to start agent.service.
Aug 19 09:17:15 master-0-0 systemd[1]: agent.service: Service RestartSec=3s expired, scheduling restart.
Aug 19 09:17:15 master-0-0 systemd[1]: agent.service: Scheduled restart job, restart counter is at 1.
Aug 19 09:17:15 master-0-0 systemd[1]: Stopped agent.service.
Aug 19 09:17:15 master-0-0 systemd[1]: Starting agent.service...
Aug 19 09:17:15 master-0-0 podman[1646]: Trying to pull quay.io/ocpmetal/assisted-installer-agent:latest...
Aug 19 09:17:19 master-0-0 podman[1646]: Getting image source signatures
Aug 19 09:17:20 master-0-0 podman[1646]: Copying blob sha256:1e6c589a6bdc6168de946effc025c7c77fc87844bac1184432b8c2b32a33cf04
Aug 19 09:17:20 master-0-0 podman[1646]: Copying blob sha256:567a38b13300c859b88cd65706ddf5df8027838359f620e673dadb00e21edde0
Aug 19 09:17:20 master-0-0 podman[1646]: Copying blob sha256:390a3d946759d226768b8ed1f347558593b04b5a6d918b8b51723b9b56b5c0d6
Aug 19 09:17:30 master-0-0 podman[1646]: Copying config sha256:11e05608e174a9646c4030ea16a690ea56aa1552b5b588349ae20d7f7c7f5ae2
Aug 19 09:17:30 master-0-0 podman[1646]: Writing manifest to image destination
Aug 19 09:17:30 master-0-0 podman[1646]: Storing signatures
Aug 19 09:17:39 master-0-0 systemd[1]: Started agent.service.
Aug 19 09:17:39 master-0-0 agent[2115]: time="19-08-2020 09:17:39" level=warning msg="Error registering host: Post http://10.35.64.32:6000/api/assisted-install/v1/clusters/3395e1e5-3978-4df8-8308-bdc98a50

Comment 1 Ronnie Lazar 2020-08-25 14:59:13 UTC
*** Bug 1866344 has been marked as a duplicate of this bug. ***

Comment 2 Ronnie Lazar 2020-08-25 15:00:53 UTC
When we output the message about host not being able to register to the assisted-service we should also output why it cannot register.
In Assisted-Service we have the information, and we should pass it back to the agent so it will be printed to the logs

Comment 3 Ori Amizur 2020-09-03 11:23:03 UTC
Actually, there is not any additional information why the registration failed.  The error is returned from the go-swagger generated code.
This is the format of the log-message: 

s.Logger().Warnf("Error registering host: %s", err.Error())

It is not one of the known responses from swagger.

Comment 4 Ori Amizur 2020-09-04 10:28:42 UTC
After talking with lalon, it seems that the displayed messages are actually truncated.  When viewing messages with journalct, messages are truncated on the screen. To view the rest of the messages, the user has to scroll right. So, the messages above are actually truncated messages, that the remainder of the messages are not displayed

Comment 5 Ori Amizur 2020-09-06 06:44:26 UTC
For example, this is how a similar message is displayed:

Jun 08 09:13:55 localhost.localdomain agent[344837]: time="08-06-2020 06:13:55" level=warning msg="Error registering host: Post http://127.0.0.1:8080/api/assisted-install/v1/clusters/11111111-1111-1111-1111-111111111111/hosts: read tcp 127.0.0.1:57180->127.0.0.1:8080: read: connection reset by peer" file="register_node.go:34" request_id=ae50c7a0-4085-4100-96d9-54d20e76e82c

Comment 6 Ori Amizur 2020-09-13 08:01:40 UTC
This issue was used for agent logging improvements. PRs:
https://github.com/openshift/assisted-installer-agent/pull/32
https://github.com/openshift/assisted-installer-agent/pull/33

Comment 7 Lital Alon 2020-09-16 11:16:29 UTC
Verified on Staging, Sep16, v1.0.39


Note You need to log in before you can comment on or make changes to this bug.