Bug 1513945 - Ironic trying to contact IPA agent using its link local address during IPv6 deployment
Summary: Ironic trying to contact IPA agent using its link local address during IPv6 d...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ironic-python-agent
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 12.0 (Pike)
Assignee: Derek Higgins
QA Contact: mlammon
URL:
Whiteboard:
Depends On:
Blocks: 1335964
Reported: 2017-11-16 10:39 UTC by Derek Higgins
Modified: 2018-02-05 19:15 UTC

Fixed In Version: openstack-ironic-python-agent-2.2.2-3.el7ost
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-13 22:20:31 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 123977 0 None MERGED List zaqar in the developer project listing 2020-11-05 21:56:43 UTC
OpenStack gerrit 520582 0 None MERGED Ignore IPv6 link local addresses 2020-11-05 21:56:43 UTC
OpenStack gerrit 521949 0 None MERGED Ignore IPv6 link local addresses 2020-11-05 21:56:43 UTC
Red Hat Product Errata RHEA-2017:3462 0 normal SHIPPED_LIVE Red Hat OpenStack Platform 12.0 Enhancement Advisory 2018-02-16 01:43:25 UTC

Internal Links: 1459187

Description Derek Higgins 2017-11-16 10:39:38 UTC
Description of problem:
During a deployment over IPv6, the IPA agent sends ironic a "callback_url", which ironic then uses to communicate with the agent. In the latest version the agent appears to pick up the IPv6 link-local address instead of the routable IPv6 address.

This results in ironic failing to contact the IPA agent:
2017-11-16 02:58:08.463 22632 ERROR ironic.drivers.modules.agent_base_vendor IronicException: Error invoking agent command iscsi.start_iscsi_target for node e8b17346-e8b4-4fa2-920e-d50dda81282e. Error: HTTPConnectionPool(host='fe80::f816:3eff:fe4e:27f7', port=9999): Max retries exceeded with url: /v1/commands?wait=true (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x12599110>: Failed to establish a new connection: [Errno 22] EINVAL',))

Version-Release number of selected component (if applicable):
python-ironic-python-agent-2.2.2-0.20171027214812.bd8c6c7.el7ost.noarch
python-ironic-lib-2.10.0-1.el7ost.noarch
openstack-ironic-python-agent-2.2.2-0.20171027214812.bd8c6c7.el7ost.noarch

How reproducible:
every time

Additional info:

tcpdump shows the wrong IP being sent to ironic

POST /v1/heartbeat/e0c67420-4f66-4571-9db3-1190e0370a7b HTTP/1.1
Host: [fd00:1101::0001]:6385
Connection: keep-alive
Accept-Encoding: gzip, deflate
Accept: application/json
User-Agent: python-requests/2.11.1
X-OpenStack-Ironic-API-Version: 1.22
Content-Type: application/json
Content-Length: 59

{"callback_url": "http://[fe80::f816:3eff:fe4e:27f7]:9999"}


I suspect something has changed with the timing. The IPA agent used to select the SLAAC address; perhaps it is now running earlier in the process, before the SLAAC address is assigned.
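
The bad callback_url can be recognised from its host alone. A minimal sketch of such a check, using only the Python standard library; the helper name callback_host_is_link_local is hypothetical and is not part of ironic or IPA:

```python
import ipaddress
from urllib.parse import urlsplit


def callback_host_is_link_local(callback_url):
    """Return True if the URL's host is a link-local IP address.

    Hypothetical helper for illustration; not part of ironic or IPA.
    """
    # urlsplit().hostname strips the square brackets around IPv6 literals.
    host = urlsplit(callback_url).hostname
    if host is None:
        return False
    try:
        return ipaddress.ip_address(host).is_link_local
    except ValueError:
        # Host is a DNS name rather than an IP literal.
        return False
```

With the URL from the tcpdump above this returns True, whereas the same URL built from the fd00:1101:... address returns False.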

Comment 1 Derek Higgins 2017-11-16 11:39:23 UTC
Running some commands on a live IPA agent shows how this can happen if the fd00:1101:... address hasn't been assigned to eth0:

$ ip -o addr
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
1: lo    inet6 ::1/128 scope host \       valid_lft forever preferred_lft forever
2: eth0    inet6 fd00:1101::f816:3eff:fe4e:27f7/64 scope global mngtmpaddr dynamic \       valid_lft 86379sec preferred_lft 14379sec
2: eth0    inet6 fe80::f816:3eff:fe4e:27f7/64 scope link \       valid_lft forever preferred_lft forever

$ ip route get fd00:1101::1
fd00:1101::1 dev eth0 proto kernel src fd00:1101::f816:3eff:fe4e:27f7 metric 256 

# Removing the fd00:1101:... address makes the link-local address the route source (this is the command that IPA runs to find the IP address to use;
# see ironic_python_agent/agent.py  _get_route_source())

$ ip addr del fd00:1101::f816:3eff:fe4e:27f7/64 dev eth0 
$ ip route get fd00:1101::1
fd00:1101::1 dev eth0 proto kernel src fe80::f816:3eff:fe4e:27f7 metric 256 





As a test I inserted a 5-second sleep into the IPA agent code, and the deployment now works. So I think the agent is running this command before the interface gets its SLAAC address; in previous versions it must have run afterwards.
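
Rather than a fixed sleep, the lookup could in principle be retried until a non-link-local source appears. A rough sketch under that assumption; wait_for_routable_source, its parameters, and the injected lookup callable are all hypothetical, and this is not necessarily how the upstream patch behaves:

```python
import ipaddress
import time


def wait_for_routable_source(lookup, attempts=10, delay=1.0, sleep=time.sleep):
    """Call lookup() until it returns a non-link-local address.

    lookup is any callable returning an address string or None, for
    example a wrapper around `ip route get`. Returns the address, or
    None if no attempt produced a usable one.
    """
    for attempt in range(attempts):
        source = lookup()
        if source and not ipaddress.ip_address(source).is_link_local:
            return source
        # Do not sleep after the final attempt.
        if attempt < attempts - 1:
            sleep(delay)
    return None
```

Passing the sleep function as a parameter keeps the loop testable without real delays.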

Comment 2 Derek Higgins 2017-11-16 13:27:59 UTC
Fix proposed upstream
    https://review.openstack.org/#/c/520582/

Comment 9 Bob Fournier 2017-11-22 16:05:58 UTC
Fix merged downstream - https://code.engineering.redhat.com/gerrit/#/c/123977/

Comment 11 Derek Higgins 2017-11-24 12:18:02 UTC
Tested and fixed  

Got a successful IPv6 deploy with the following in the logs

Nov 23 09:37:58 localhost.localdomain ironic-python-agent[479]: 2017-11-23 09:37:57.883 479 INFO ironic_python_agent.agent [-] Ignoring link-local source to fd00:1101::0001: fd00:1101::1 dev eth0 proto kernel src fe80::f816:3eff:fed8:b16e metric 256
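
The log line suggests the agent now discards a link-local src returned by `ip route get`. A minimal sketch of that kind of check, written against the standard library ipaddress module rather than whatever the merged patch uses; the function name route_source is hypothetical:

```python
import ipaddress


def route_source(route_output):
    """Extract the `src` address from `ip route get` output.

    Returns None when there is no src field or when the source is a
    link-local address, so the caller can try again or fall back.
    """
    lines = route_output.strip().splitlines()
    if not lines:
        return None
    fields = lines[0].split()
    if "src" not in fields:
        return None
    idx = fields.index("src")
    if idx + 1 >= len(fields):
        return None
    source = fields[idx + 1]
    if ipaddress.ip_address(source).is_link_local:
        return None
    return source
```

Fed the two `ip route get fd00:1101::1` outputs from comment 1, this returns the fd00:1101:... address for the first and None for the second.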

Comment 14 errata-xmlrpc 2017-12-13 22:20:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462

