Bug 1566059

Summary: Scoped link local IPv6 addresses break VM listing (happens when ovirt-guest-agent is not installed but qemu-guest-agent is)
Product: [oVirt] ovirt-engine Reporter: Tomáš Golembiovský <tgolembi>
Component: BLL.Network Assignee: eraviv
Status: CLOSED CURRENTRELEASE QA Contact: Petr Matyáš <pmatyas>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.2.2.6 CC: bugs, danken, kmorgan, lsvaty, lveyde, michal.skrivanek, msheena, pmatyas, tgolembi
Target Milestone: ovirt-4.2.3 Keywords: AutomationBlocker
Target Release: --- Flags: rule-engine: ovirt-4.2+
ykaul: blocker+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ovirt-engine-4.2.3.4 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-05-14 15:11:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Network RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1551350    
Attachments:
Description Flags
Engine log none

Description Tomáš Golembiovský 2018-04-11 12:45:44 UTC
Created attachment 1420285 [details]
Engine log

Description of problem:

IPv6 addresses may contain a scope (zone) [1]. Such addresses have the form: <address>%<zone>
 
When guest agent provides IPv6 with scope in stats to the engine it breaks VM listing in the UI. engine.log contains the following errors:

2018-04-11 14:35:25,080+02 ERROR [org.ovirt.engine.core.bll.SearchQuery] (default task-10) [794ba721-326f-469f-b251-896172ea1519] Query 'SearchQuery' failed: StatementCallback; SQL [SELECT * FROM ((SELECT  vms.* FROM  vms  )  ORDER BY vm_name ASC ) as T1 OFFSET (1 -1) LIMIT 100]; ERROR: invalid input syntax for type inet: "fe80::343d:ee87:2a50:daf5%4"

[1] https://tools.ietf.org/html/rfc4007#section-11

Comment 1 Michal Skrivanek 2018-04-13 08:08:25 UTC
I do not mind an alternative temporary solution of trimming the report, but one way or the other it should be resolved by 4.2.3.
How/where do you suggest to fix this?

Comment 2 Dan Kenigsberg 2018-04-15 09:19:51 UTC
Michal, is this new in any way? What happens when the guest reports % to the rhv-4.1?

Regardless, Engine must never trust the guest and quote its data.

Comment 3 Tomáš Golembiovský 2018-04-15 18:29:11 UTC
(In reply to Dan Kenigsberg from comment #2)
> Michal, is this new in any way?

It is new in the sense that such IPs are reported from QEMU Guest Agent and QEMU Guest Agent polling is a new feature in 4.2.

> What happens when the guest reports % to the
> rhv-4.1?

I did not try this, but I expect the same problem there. That being said, it's probably not an issue. IPs reported by oVirt Guest Agent don't contain the scope (the source of information is different from QEMU Guest Agent). Unless something changes inside Windows in the future I don't think older RHV versions are in danger.

> 
> Regardless, Engine must never trust the guest and quote its data.

Comment 4 Yaniv Kaul 2018-04-17 19:57:41 UTC
(In reply to Tomáš Golembiovský from comment #3)
> (In reply to Dan Kenigsberg from comment #2)
> > Michal, is this new in any way?
> 
> It is new in the sense that such IPs are reported from QEMU Guest Agent and
> QEMU Guest Agent polling is a new feature in 4.2.

What is the possibility to revert back to ovirt-guest-agent until we can fix it?
Sounds somewhat easier than fix in engine, for the time being?
> 
> > What happens when the guest reports % to the
> > rhv-4.1?
> 
> I did not try this, but I expect the same problem there. That being said,
> it's probably not an issue. IPs reported by oVirt Guest Agent don't contain
> the scope (the source of information is different from QEMU Guest Agent).
> Unless something changes inside Windows in the future I don't think older
> RHV versions are in danger.
> 
> > 
> > Regardless, Engine must never trust the guest and quote its data.

So true!

Comment 5 Tomáš Golembiovský 2018-04-18 06:45:27 UTC
(In reply to Yaniv Kaul from comment #4)
> (In reply to Tomáš Golembiovský from comment #3)
> > (In reply to Dan Kenigsberg from comment #2)
> > > Michal, is this new in any way?
> > 
> > It is new in the sense that such IPs are reported from QEMU Guest Agent and
> > QEMU Guest Agent polling is a new feature in 4.2.
> 
> What is the possibility to revert back to ovirt-guest-agent until we can fix
> it?
> Sounds somewhat easier than fix in engine, for the time being?

This is only a problem when ovirt-guest-agent is not installed. If both are installed and running then data from ovirt-guest-agent is used.

> > 
> > > What happens when the guest reports % to the
> > > rhv-4.1?
> > 
> > I did not try this, but I expect the same problem there. That being said,
> > it's probably not an issue. IPs reported by oVirt Guest Agent don't contain
> > the scope (the source of information is different from QEMU Guest Agent).
> > Unless something changes inside Windows in the future I don't think older
> > RHV versions are in danger.
> > 
> > > 
> > > Regardless, Engine must never trust the guest and quote its data.
> 
> So true!

Comment 7 eraviv 2018-04-23 07:58:45 UTC
After looking into the details, here are my observations:

Engine:
-------
Initially engine saves all reported ips (both ipv4 and ipv6) to vm_dynamic.vm_ip as a single string in a 'text' type (e.g. "192.168.122.1 fe80::1").

When the 'vms' view is called it executes the stored procedure "fn_get_comparable_ip_list" which breaks the string into individual ips and tries to build an array of type 'inet' with each ip in a separate item of the array. 

'inet' is a built-in postgres type which stores only valid ip addresses and supports sorting thereof, but which does not support zone ids for ipv6 addresses. Engine has been using the inet type in the db for its sorting functionality.

The ipv6 address with the '%' is rejected when the inet array is being populated by the stored procedure, and an exception is thrown back to engine. 
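
The flow described above can be modelled roughly as follows. This is a Python sketch only, not the engine code and not the actual "fn_get_comparable_ip_list" procedure: `to_inet` and `comparable_ip_list` are hypothetical names, and the '%' check merely imitates the rejection that postgres reports for the 'inet' type.

```python
import ipaddress


def to_inet(token):
    """Imitate a cast to postgres 'inet': zone ids are not accepted."""
    if "%" in token:
        # Same wording as the message in engine.log; raised here by the
        # sketch itself, not by postgres.
        raise ValueError('invalid input syntax for type inet: "%s"' % token)
    return ipaddress.ip_address(token)


def comparable_ip_list(vm_ip):
    """Split the stored text value and convert every item (hypothetical)."""
    return [to_inet(token) for token in vm_ip.split()]


# A value without zone ids converts cleanly.
ok = comparable_ip_list("192.168.122.1 fe80::1")

# A single scoped address makes the whole conversion fail.
try:
    comparable_ip_list("192.168.122.1 fe80::343d:ee87:2a50:daf5%4")
    error = None
except ValueError as exc:
    error = str(exc)
```

Because the conversion happens inside the 'vms' view, one offending VM row is enough to fail the whole query, which matches the broken VM listing in the description.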


IPv6:
-----
- According to [1] non-global-scope ipv6 addresses may have a zone suffix in one of the formats:
        <address>%<zone_id>
        <address>%<zone_id>/<prefix>  

- According to [2] IPv6 requires a link-local address on every network interface on which the IPv6 protocol is enabled, even when routable addresses are also assigned. The link-local address is required for [...] IPv6-based protocols, such as DHCPv6.

- According to [3] link-local addresses are designed to be used for addressing on a single link for purposes such as automatic address configuration, neighbour discovery, or when no routers are present.

- According to the bug, qemu-guest-agent already reports non-global addresses with their zone_id appended to the address, and vdsm forwards them as-is.

- According to [4] and to postgres 'inet' type documentation there are no plans to support the zone_id in this type.
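
The two textual formats from [1] listed above can be taken apart with plain string operations. The sketch below is illustrative: `ScopedAddress` and `parse_scoped` are hypothetical names, and it assumes the zone id itself contains neither '%' nor '/'.

```python
from typing import NamedTuple, Optional


class ScopedAddress(NamedTuple):
    address: str
    zone_id: Optional[str]
    prefix: Optional[int]


def parse_scoped(text):
    """Split '<address>[%<zone_id>][/<prefix>]' into its parts."""
    # The prefix, if any, always comes last, so split it off first.
    rest, slash, prefix = text.partition("/")
    address, percent, zone_id = rest.partition("%")
    return ScopedAddress(
        address=address,
        zone_id=zone_id if percent else None,
        prefix=int(prefix) if slash else None,
    )


with_zone = parse_scoped("fe80::343d:ee87:2a50:daf5%4")
with_zone_and_prefix = parse_scoped("fe80::1%eth0/64")
plain = parse_scoped("2001:db8::1")
```

Note that the zone id is only meaningful on the node that reported it: a numeric index such as "4" or an interface name such as "eth0" both appear in practice.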

Conclusion:
-----------
Addresses with a zone_id suffix cannot be saved to the db because the 'inet' type does not support them, but they cannot be totally ignored because a reporting interface may only have a link-local address. 

So the suggested solution is to strip the ipv6 addresses of the zone_id on entering the engine, and saving them in the db without it.

As far as we currently know, addresses are not reported with their prefix attached, so no loss of current information is expected.
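
A minimal sketch of the suggested stripping, assuming it is applied to each reported address before it is stored. It is written in Python for brevity and is not the actual engine change; `strip_zone_id` and `sanitize_reported_ips` are hypothetical names. A '/<prefix>' suffix is preserved should one ever be reported.

```python
def strip_zone_id(ip):
    """Drop '%<zone_id>' from an address, keeping '/<prefix>' if present."""
    address, percent, rest = ip.partition("%")
    if not percent:
        return ip  # IPv4 or unscoped IPv6: nothing to strip
    _zone, slash, prefix = rest.partition("/")
    return address + slash + prefix


def sanitize_reported_ips(ips):
    """Apply the stripping to every address in a guest report."""
    return [strip_zone_id(ip) for ip in ips]


cleaned = sanitize_reported_ips(
    ["192.168.122.1", "fe80::343d:ee87:2a50:daf5%4", "fe80::1%eth0/64"]
)
```

After this step the stored value contains only strings that the 'inet' type accepts, so the sorting in the 'vms' view keeps working unchanged.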

When the addresses are requested from the engine via the REST API, they are reported by engine under their respective interfaces, so the zone_id is redundant in that use case. 
In the web-admin, only the first ipv4 and ipv6 addresses of the guests are actually visible (without hovering over the field), so it is assumed that a duplicate ipv6 will not appear unless hovered over.
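
To illustrate where such a duplicate could come from: two interfaces may report the same link-local address under different zone ids, and the two entries become identical once the zone is stripped. The sketch below uses made-up sample addresses. The order-preserving de-duplication is shown only as an option; nothing in this bug says the engine does it, and `unique_in_order` is a hypothetical helper.

```python
def unique_in_order(ips):
    """Remove repeated addresses while keeping the first occurrence."""
    seen = set()
    result = []
    for ip in ips:
        if ip not in seen:
            seen.add(ip)
            result.append(ip)
    return result


# Made-up report: the same link-local address seen on zones 4 and 7.
reported = ["10.0.0.5", "fe80::1%4", "fe80::1%7"]
stripped = [ip.partition("%")[0] for ip in reported]
deduplicated = unique_in_order(stripped)
```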

----------------------------------------------------
[1] https://tools.ietf.org/html/rfc4007#section-6
[2] https://en.wikipedia.org/wiki/Link-local_address#IPv6
[3] https://tools.ietf.org/html/rfc4291#section-2.5.6
[4] http://www.postgresql-archive.org/IPv6-link-local-addresses-and-init-data-type-td5905510.html

Comment 8 Tomáš Golembiovský 2018-04-23 10:18:54 UTC
(In reply to eraviv from comment #7)

> So the suggested solution is to strip the ipv6 addresses of the zone_id on
> entering the engine, and saving them in the db without it.

Seems good enough for me until we have a valid use-case for preserving it.

Comment 9 Martin Perina 2018-05-02 10:56:47 UTC
*** Bug 1573830 has been marked as a duplicate of this bug. ***

Comment 10 msheena 2018-05-07 08:23:27 UTC
Tomáš,
What setup and steps are required to recreate this?
I would also appreciate assistance on how to check what zone_id the qemu agent is reporting.

Comment 11 Tomáš Golembiovský 2018-05-07 10:18:55 UTC
You need a Windows VM with a recent QEMU-GA (ideally use the RHV-toolsSetup ISO from 4.2). You also need to disable the oVirt guest agent service to recreate the issue.
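
One possible way to check which zone_id QEMU-GA reports (not confirmed anywhere in this bug) is the agent's guest-network-get-interfaces command, for instance through `virsh qemu-agent-command <domain> '{"execute":"guest-network-get-interfaces"}'` on the host. The Python sketch below picks the scoped addresses out of such a reply. The sample reply is made up for illustration (only the IPv6 address is taken from the log above), and `scoped_addresses` is a hypothetical helper.

```python
import json

# Made-up sample reply; field names follow the QEMU guest agent protocol.
SAMPLE_REPLY = """
{"return": [
  {"name": "Ethernet",
   "hardware-address": "00:1a:4a:16:01:51",
   "ip-addresses": [
     {"ip-address-type": "ipv6",
      "ip-address": "fe80::343d:ee87:2a50:daf5%4",
      "prefix": 64},
     {"ip-address-type": "ipv4",
      "ip-address": "192.168.122.10",
      "prefix": 24}
   ]}
]}
"""


def scoped_addresses(reply_text):
    """Return (interface, address, zone_id) for every address with a zone."""
    found = []
    for iface in json.loads(reply_text)["return"]:
        # Interfaces without addresses may omit the "ip-addresses" key.
        for entry in iface.get("ip-addresses", []):
            address, percent, zone_id = entry["ip-address"].partition("%")
            if percent:
                found.append((iface["name"], address, zone_id))
    return found


scoped = scoped_addresses(SAMPLE_REPLY)
```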

Comment 12 Petr Matyáš 2018-05-11 10:46:37 UTC
Verified on ovirt-engine-4.2.3.5-0.1.el7.noarch

Comment 13 Sandro Bonazzola 2018-05-14 15:11:53 UTC
This bugzilla is included in oVirt 4.2.3 release, published on May 4th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

Comment 14 Dominik Holler 2019-04-15 16:15:54 UTC
*** Bug 1626220 has been marked as a duplicate of this bug. ***