Bug 1467845 - ovirt-guest-agent fails to get IPv4 addresses with RuntimeError: Could not open a NETLINK connection for eth0
Summary: ovirt-guest-agent fails to get IPv4 addresses with RuntimeError: Could not open...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: python-ethtool
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: beta
Target Release: ---
Assignee: Python Maintainers
QA Contact: Branislav Náter
URL:
Whiteboard:
Duplicates: 1486487 (view as bug list)
Depends On:
Blocks: 1630906
 
Reported: 2017-07-05 10:22 UTC by Raz Tamir
Modified: 2019-10-22 08:32 UTC (History)
24 users (show)

Fixed In Version: python-ethtool-0.8-8.el7
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-08-06 12:40:40 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
vm logs (105.41 KB, application/x-gzip)
2017-07-05 10:28 UTC, Raz Tamir
no flags Details
OGA use case (667 bytes, text/x-python)
2017-09-27 10:41 UTC, Tomáš Golembiovský
no flags Details
ovirt-guest-agent (46.02 KB, text/plain)
2018-10-21 15:17 UTC, Roni
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:2067 0 None None None 2019-08-06 12:40:42 UTC

Description Raz Tamir 2017-07-05 10:22:46 UTC
Description of problem:
When starting a VM that was thinly cloned from a template, I see that the VM is reported without an IP address although it has one:

[root@localhost var]# ifconfig 
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.35.82.188  netmask 255.255.254.0  broadcast 10.35.83.255
        inet6 fe80::21a:4aff:fe16:25bc  prefixlen 64  scopeid 0x20<link>
        ether 00:1a:4a:16:25:bc  txqueuelen 1000  (Ethernet)
        RX packets 3862  bytes 279513 (272.9 KiB)
        RX errors 0  dropped 43  overruns 0  frame 0
        TX packets 761  bytes 758489 (740.7 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0


The errors from ovirt-guest-agent.log:

Dummy-2::ERROR::2017-07-05 09:18:12,773::GuestAgentLinux2::217::root::Error retrieving network interfaces.
Traceback (most recent call last):
  File "/usr/share/ovirt-guest-agent/GuestAgentLinux2.py", line 213, in ethtool_list_nics
    'inet': self._get_ipv4_addresses(devinfo),
  File "/usr/share/ovirt-guest-agent/GuestAgentLinux2.py", line 189, in _get_ipv4_addresses
    for ip in dev.get_ipv4_addresses():
RuntimeError: Could not open a NETLINK connection for eth0
Dummy-1::ERROR::2017-07-05 09:18:16,632::GuestAgentLinux2::217::root::Error retrieving network interfaces.
Traceback (most recent call last):
  File "/usr/share/ovirt-guest-agent/GuestAgentLinux2.py", line 213, in ethtool_list_nics
    'inet': self._get_ipv4_addresses(devinfo),
  File "/usr/share/ovirt-guest-agent/GuestAgentLinux2.py", line 189, in _get_ipv4_addresses
    for ip in dev.get_ipv4_addresses():
SystemError: error return without exception set


Version-Release number of selected component (if applicable):
ovirt-guest-agent-common-1.0.13-3.el7ev.noarch
qemu-guest-agent-2.5.0-3.el7.x86_64
ethtool-4.8-1.el7.x86_64
python-ethtool-0.8-5.el7.x86_64
Red Hat Enterprise Linux Server release 7.3



How reproducible:
~30%

Steps to Reproduce:
1. Clone VM from template as thin copy
2. Start the VM
3.

Actual results:


Expected results:


Additional info:

Comment 1 Raz Tamir 2017-07-05 10:28:29 UTC
Created attachment 1294539 [details]
vm logs

Comment 2 Tomas Jelinek 2017-07-12 10:30:45 UTC
So the VM actually has the IP; it is just not reported properly. Targeting 4.2.

Comment 3 Red Hat Bugzilla Rules Engine 2017-07-12 10:30:52 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 4 Yaniv Kaul 2017-08-23 15:02:51 UTC
Can we please take a look and see what the latest status is? It continues to fail QE automation tests.

Comment 5 Tomáš Golembiovský 2017-08-24 10:47:50 UTC
I got a VM from Raz yesterday, and this looks like a race between network setup and the guest agent.

In the journal I found:

Aug 23 11:23:05.126492 localhost.localdomain network[695]: Bringing up loopback interface:  [  OK  ]
Aug 23 11:23:05.340381 localhost.localdomain network[695]: Bringing up interface eth0:
Aug 23 11:23:05.366656 localhost.localdomain NetworkManager[595]: <info>  [1503476585.3659] device (eth0): link connected
Aug 23 11:23:07.209084 localhost.localdomain dhclient[845]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 7 (xid=0x29d1da05)
Aug 23 11:23:07.588767 localhost.localdomain dhclient[845]: DHCPREQUEST on eth0 to 255.255.255.255 port 67 (xid=0x29d1da05)
Aug 23 11:23:07.591290 localhost.localdomain dhclient[845]: DHCPOFFER from 10.35.83.254
Aug 23 11:23:08.097425 localhost.localdomain dhclient[845]: DHCPACK from 10.35.83.254 (xid=0x29d1da05)


and in ovirt-guest-agent.log an entry that corresponds to the query for IP addresses:

Dummy-2::ERROR::2017-08-23 11:23:05,571::GuestAgentLinux2::217::root::Error retrieving network interfaces.                                                                    
Traceback (most recent call last):         
  File "/usr/share/ovirt-guest-agent/GuestAgentLinux2.py", line 213, in ethtool_list_nics                                                                                     
    'inet': self._get_ipv4_addresses(devinfo),                                         
  File "/usr/share/ovirt-guest-agent/GuestAgentLinux2.py", line 189, in _get_ipv4_addresses                                                                                   
    for ip in dev.get_ipv4_addresses():    
RuntimeError: Could not open a NETLINK connection for eth0                             

From this point onwards the python-ethtool module is in an inconsistent state and every subsequent attempt to query the IP addresses ends with an error:

Dummy-1::ERROR::2017-08-23 11:23:10,013::GuestAgentLinux2::217::root::Error retrieving network interfaces.                                                                    
Traceback (most recent call last):         
  File "/usr/share/ovirt-guest-agent/GuestAgentLinux2.py", line 213, in ethtool_list_nics                                                                                     
    'inet': self._get_ipv4_addresses(devinfo),                                         
  File "/usr/share/ovirt-guest-agent/GuestAgentLinux2.py", line 189, in _get_ipv4_addresses                                                                                   
    for ip in dev.get_ipv4_addresses():    
SystemError: error return without exception set                                        
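
The ordering can be checked by comparing the two timestamps quoted above. The following is a minimal sketch, assuming the journal and ovirt-guest-agent.log use the same clock and that the journal entry is from 2017 (the journal omits the year). The helper names are made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the journal and ovirt-guest-agent.log excerpts.
# Assumption: both logs use the same clock and the same date (2017-08-23).
JOURNAL_DHCPACK = "Aug 23 11:23:08.097425"
AGENT_FIRST_ERROR = "2017-08-23 11:23:05,571"


def parse_journal(stamp, year=2017):
    """Parse a journal timestamp; the journal omits the year, so it is passed in."""
    return datetime.strptime("%d %s" % (year, stamp), "%Y %b %d %H:%M:%S.%f")


def parse_agent(stamp):
    """Parse an ovirt-guest-agent.log timestamp (comma before the milliseconds)."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


def seconds_before_lease(agent_stamp, journal_stamp):
    """Return how many seconds the agent query preceded the DHCPACK."""
    delta = parse_journal(journal_stamp) - parse_agent(agent_stamp)
    return delta.total_seconds()


gap = seconds_before_lease(AGENT_FIRST_ERROR, JOURNAL_DHCPACK)
print("agent queried eth0 %.3f s before DHCPACK" % gap)
```

Under those assumptions the first failing query was logged about 2.5 seconds before dhclient received the DHCPACK, which is consistent with the agent querying eth0 while the interface was still being configured.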


Sadly, I'm still not able to reproduce the error after the boot. This makes it
hard to come up with a solution for this.

You can try to install the newer version of python-ethtool from Fedora.
python2-ethtool-0.13-1.fc26.x86_64 installs cleanly on RHEL 7.4 without
requiring any other dependencies. The code was heavily changed between 0.8 in
RHEL and latest 0.13 so it is possible the problem has been fixed.

Comment 6 Raz Tamir 2017-08-28 12:41:07 UTC
Executed our tier 2, twice, and I don't see the issue anymore with the new python2-ethtool package

Comment 7 Tomáš Golembiovský 2017-08-29 22:55:35 UTC
I've opened downstream bug 1486487 on python-ethtool. We'll have to live with the workaround until this is fixed in d/s package.

Comment 8 Tomáš Golembiovský 2017-08-30 09:12:42 UTC
On second thought, I'm moving the bug to platform.

Comment 10 Tomáš Golembiovský 2017-08-30 09:13:54 UTC
*** Bug 1486487 has been marked as a duplicate of this bug. ***

Comment 12 Gil Klein 2017-08-31 10:22:52 UTC
I'm asking to consider this for 7.4.z due to the impact on RHV automation tests.

Comment 13 Lumír Balhar 2017-09-26 10:18:38 UTC
Hello.

From the comments above it seems that the problem is solved in the newest version of python-ethtool. Because we should avoid rebasing the whole of python-ethtool to the newest version in RHEL, I can try to prepare a patch that fixes this bug.

The problem is that from the tracebacks above I cannot identify how python-ethtool is used and which functions are called.

Could you please point me to the part of ovirt-guest-agent/GuestAgentLinux2.py that invokes python-ethtool, so I can inspect how it is used?

Thank you.

Comment 14 Tomáš Golembiovský 2017-09-27 10:41:08 UTC
Hi Lumir,

the relevant code in OGA enumerates the devices and then requests the IP addresses and MAC address. I've created a minimal script that does the same thing. See the attachment.
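
For readers without access to the attachment, the usage pattern described here might look roughly like the sketch below. This is not the attached script. It assumes the python-ethtool calls `get_devices()` and `get_interfaces_info()`, and the `device`, `mac_address` and `address` attributes; only `get_ipv4_addresses()` is confirmed by the tracebacks. The function name `list_nics` is made up for illustration.

```python
try:
    import ethtool  # python-ethtool; may be absent on a development machine
except ImportError:
    ethtool = None


def list_nics(ethtool_module):
    """Collect name, MAC and IPv4 addresses for every network device.

    The module is passed in so the function can be exercised with a
    stand-in object on machines without python-ethtool installed.
    """
    nics = []
    for name in ethtool_module.get_devices():
        for info in ethtool_module.get_interfaces_info(name):
            nics.append({
                "name": info.device,
                "hw": info.mac_address,
                # This is the call that raises in the tracebacks above.
                "inet": [ip.address for ip in info.get_ipv4_addresses()],
            })
    return nics


if ethtool is not None:
    for nic in list_nics(ethtool):
        print(nic)
```

Running such a loop repeatedly during boot, while the interface is still being configured, is the situation in which the errors were observed.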

Comment 15 Tomáš Golembiovský 2017-09-27 10:41:58 UTC
Created attachment 1331363 [details]
OGA use case

Comment 16 Lumír Balhar 2017-10-06 11:39:46 UTC
Hello.

I spent some time trying to prepare a patch for your really simple usage of python-ethtool, but I failed, and I honestly think that rebasing is the only solution here.

I thought the codebase had changed so heavily because of the port to Python 3, but that is not the case. The history contains many changes to internal structures, renamed internal methods, fixes, a migration from libnl-1 to libnl-3, a reimplementation of IPv6 support, data type changes, etc.

Could you test OGA with a newer version of python-ethtool? I am asking because rebasing python-ethtool to the newest version could be problematic because a lot of important packages depend on it.

I suggest trying versions 0.9, 0.10 and 0.11 one by one, because the changes since v0.11 are mostly related to Python 3 compatibility, CI, docs etc. and are not that important for the module itself.

What do you think about it?

Comment 17 Tomáš Golembiovský 2017-10-06 14:15:25 UTC
It seemed to me that the important patch was added between 0.9 and 0.10, so it's possible 0.10 could work -- or at least fail with a more meaningful message.

Raz, could you please run the test suite with the following packages to see which if any of those fixes the issue?

https://kojipkgs.fedoraproject.org//packages/python-ethtool/0.11/8.fc26/x86_64/python-ethtool-0.11-8.fc26.x86_64.rpm
https://kojipkgs.fedoraproject.org//packages/python-ethtool/0.10/1.fc21/x86_64/python-ethtool-0.10-1.fc21.x86_64.rpm

Comment 18 Charalampos Stratakis 2017-10-10 14:16:53 UTC
One of the reasons that python-maintenance is reluctant to update the package is that important packages depend on it.

Packages that require python-ethtool as a runtime dependency:

firstboot
rhn-client-tools
subscription-manager
targetcli
tuna

Comment 19 Raz Tamir 2017-10-15 10:30:16 UTC
(In reply to Tomáš Golembiovský from comment #17)
> It seemed to me that the important patch was added between 0.9 and 0.10 so
> it's possible 0.10 could work -- or at least fail with a more meaningful
> message.
> 
> Raz, could you please run the test suite with the following packages to see
> which if any of those fixes the issue?

I'm currently using python2-ethtool-0.13-1.fc26.x86_64 as you suggested in comment #5 and seems like it solves our issue
> 
> https://kojipkgs.fedoraproject.org//packages/python-ethtool/0.11/8.fc26/
> x86_64/python-ethtool-0.11-8.fc26.x86_64.rpm
> https://kojipkgs.fedoraproject.org//packages/python-ethtool/0.10/1.fc21/
> x86_64/python-ethtool-0.10-1.fc21.x86_64.rpm

Comment 20 Charalampos Stratakis 2017-10-16 09:07:21 UTC
(In reply to Raz Tamir from comment #19)
> (In reply to Tomáš Golembiovský from comment #17)
> > It seemed to me that the important patch was added between 0.9 and 0.10 so
> > it's possible 0.10 could work -- or at least fail with a more meaningful
> > message.
> > 
> > Raz, could you please run the test suite with the following packages to see
> > which if any of those fixes the issue?
> 
> I'm currently using python2-ethtool-0.13-1.fc26.x86_64 as you suggested in
> comment #5 and seems like it solves our issue
> > 
> > https://kojipkgs.fedoraproject.org//packages/python-ethtool/0.11/8.fc26/
> > x86_64/python-ethtool-0.11-8.fc26.x86_64.rpm
> > https://kojipkgs.fedoraproject.org//packages/python-ethtool/0.10/1.fc21/
> > x86_64/python-ethtool-0.10-1.fc21.x86_64.rpm

Would it be possible to use a lower version though? In order for us to determine where the fix happened?

Comment 21 Charalampos Stratakis 2017-10-23 14:54:32 UTC
Adding the maintainers of the packages affected by a potential rebase of python-ethtool to CC for their opinion.

Comment 22 Tomáš Kašpárek 2017-10-24 11:24:15 UTC
As for rhn-client-tools, I'd say we're okay with this, as we were using python-ethtool for a long time while it was available in Fedora and we're using just a limited part of it.

However, I am still inclined towards cherry-picking the fix instead of a rebase.

Comment 23 Raz Tamir 2017-11-09 07:40:23 UTC
(In reply to Charalampos Stratakis from comment #20)
> (In reply to Raz Tamir from comment #19)
> > (In reply to Tomáš Golembiovský from comment #17)
> > > It seemed to me that the important patch was added between 0.9 and 0.10 so
> > > it's possible 0.10 could work -- or at least fail with a more meaningful
> > > message.
> > > 
> > > Raz, could you please run the test suite with the following packages to see
> > > which if any of those fixes the issue?
> > 
> > I'm currently using python2-ethtool-0.13-1.fc26.x86_64 as you suggested in
> > comment #5 and seems like it solves our issue
> > > 
> > > https://kojipkgs.fedoraproject.org//packages/python-ethtool/0.11/8.fc26/
> > > x86_64/python-ethtool-0.11-8.fc26.x86_64.rpm
> > > https://kojipkgs.fedoraproject.org//packages/python-ethtool/0.10/1.fc21/
> > > x86_64/python-ethtool-0.10-1.fc21.x86_64.rpm
> 
> Would it be possible to use a lower version though? In order for us to
> determine where the fix happened?

Is my input still relevant here?

Comment 24 Tomáš Golembiovský 2017-11-09 09:20:46 UTC
(In reply to Raz Tamir from comment #23)
> Is my input still relevant here?

Yes. Could you please test the two other versions of python-ethtool mentioned in comment #17, please? Let us know whether those versions work or produce errors.

Comment 25 Raz Tamir 2017-11-14 08:08:40 UTC
(In reply to Tomáš Golembiovský from comment #17)
> It seemed to me that the important patch was added between 0.9 and 0.10 so
> it's possible 0.10 could work -- or at least fail with a more meaningful
> message.
> 
> Raz, could you please run the test suite with the following packages to see
> which if any of those fixes the issue?
> 
> https://kojipkgs.fedoraproject.org//packages/python-ethtool/0.11/8.fc26/
> x86_64/python-ethtool-0.11-8.fc26.x86_64.rpm
> https://kojipkgs.fedoraproject.org//packages/python-ethtool/0.10/1.fc21/
> x86_64/python-ethtool-0.10-1.fc21.x86_64.rpm

Hi Tomáš,

I couldn't reproduce the bug with either RPM.

Comment 26 Charalampos Stratakis 2017-11-28 15:04:03 UTC
Since the deadline for the alpha has already passed is it ok if we try to tackle this for the beta?

Also we don't currently have any useful reproducer from our side, so would it be possible to provide one for us so we can then bisect the commit that fixed your issue? Or maybe outline the steps that you take in order to get those failures?

Comment 27 Raz Tamir 2017-11-28 15:41:30 UTC
Hi,

I couldn't reproduce it on versions later than python-ethtool-0.8-5.el7.x86_64

Currently working with python2-ethtool-0.13-1.fc26.x86_64 as suggested in comment #5 and it works fine for us

Comment 28 Tomas Orsava 2017-11-28 16:01:28 UTC
Hi Raz!

We're glad that the Fedora version of python2-ethtool solved the issue for you! And we'd like to fix the RHEL version as well, so you won't have to encounter the issue again in the future.

You said you couldn't reproduce it on versions later than python-ethtool-0.8-5.el7.x86_64. Does that mean that version 0.8-5 was the last one where you managed to reproduce the issue?


And further, we'd really appreciate it if you could explain how to reproduce the issue in more detail so we could take a look at it ourselves.

Thanks for your time!

Comment 29 Raz Tamir 2017-11-28 16:11:09 UTC
Hi Tomas,

(In reply to Tomas Orsava from comment #28)
> Hi Raz!
> 
> We're glad that the Fedora version of python2-ethtool solved the issue for
> you! And we'd like to fix the RHEL version as well, so you won't have to
> encounter the issue again in the future.
> 
> You said you couldn't reproduce it on versions later than
> python-ethtool-0.8-5.el7.x86_64, does it mean that version 0.8-5 was the
> last one where you managed to reproduce the issue?
Yes,
I couldn't reproduce it after this version.  Also tried:
https://kojipkgs.fedoraproject.org//packages/python-ethtool/0.11/8.fc26/x86_64/python-ethtool-0.11-8.fc26.x86_64.rpm
https://kojipkgs.fedoraproject.org//packages/python-ethtool/0.10/1.fc21/x86_64/python-ethtool-0.10-1.fc21.x86_64.rpm
As suggested in comment #17.
> 
> 
> And further, we'd really appreciate it if you could explain how to reproduce
> the issue in more detail so we could take a look at it ourselves.

The simplest way is to create a script that starts the VM, waits for the IP, and repeats until you get stuck and the VM does not get an IP.
> 
> Thanks for your time!
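
A loop of the kind described in this comment could be sketched as follows. This is a minimal illustration, not the script used by QE. The three callables `start_vm`, `reported_ip` and `stop_vm` are hypothetical hooks into whatever API drives the environment, and the timeout and polling values are arbitrary assumptions.

```python
import time


def reproduce(start_vm, reported_ip, stop_vm, max_iterations=500,
              timeout=300.0, poll=5.0, clock=time.time, sleep=time.sleep):
    """Boot a VM repeatedly until no IP is reported for it.

    Returns the 1-based iteration at which no IP was reported within
    `timeout` seconds, or None if every iteration succeeded.
    """
    for iteration in range(1, max_iterations + 1):
        start_vm()
        deadline = clock() + timeout
        ip = reported_ip()
        # Poll until an IP shows up or the deadline passes.
        while ip is None and clock() < deadline:
            sleep(poll)
            ip = reported_ip()
        if ip is None:
            return iteration  # leave the VM running for inspection
        stop_vm()
    return None
```

On a failing iteration the VM is deliberately left running so that ovirt-guest-agent.log and the journal can be collected from it.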

Comment 30 Charalampos Stratakis 2017-11-28 16:43:20 UTC
(In reply to Raz Tamir from comment #29)
> Hi Tomas,
> 
> (In reply to Tomas Orsava from comment #28)
> > Hi Raz!
> > 
> > We're glad that the Fedora version of python2-ethtool solved the issue for
> > you! And we'd like to fix the RHEL version as well, so you won't have to
> > encounter the issue again in the future.
> > 
> > You said you couldn't reproduce it on versions later than
> > python-ethtool-0.8-5.el7.x86_64, does it mean that version 0.8-5 was the
> > last one where you managed to reproduce the issue?
> Yes,
> I couldn't reproduce it after this version.  Also tried:
> https://kojipkgs.fedoraproject.org//packages/python-ethtool/0.11/8.fc26/
> x86_64/python-ethtool-0.11-8.fc26.x86_64.rpm
> https://kojipkgs.fedoraproject.org//packages/python-ethtool/0.10/1.fc21/
> x86_64/python-ethtool-0.10-1.fc21.x86_64.rpm
> As suggested in comment #17.
> > 
> > 
> > And further, we'd really appreciate it if you could explain how to reproduce
> > the issue in more detail so we could take a look at it ourselves.
> 
> The simplest way is to create a script to start VM, wait for IP and do it
> again until you get stuck and the VM does not get IP.
> > 
> > Thanks for your time!

Would you be able to provide us with said script or how to setup the environment? We are not really experienced with the workflows of ovirt.

Comment 39 Charalampos Stratakis 2018-01-04 15:30:34 UTC
(In reply to Miro Hrončok from comment #38)
> Could you please try with 0.9 as well?
> 
> https://kojipkgs.fedoraproject.org/packages/python-ethtool/0.9/2.fc19/x86_64/
> python-ethtool-0.9-2.fc19.x86_64.rpm

Any update on that request?

Comment 40 Raz Tamir 2018-01-07 12:31:22 UTC
I tested with https://kojipkgs.fedoraproject.org/packages/python-ethtool/0.9/2.fc19/x86_64/python-ethtool-0.9-2.fc19.x86_64.rpm and the issue did not reproduce.

python-ethtool-0.9-2.fc19.x86_64

Comment 41 Charalampos Stratakis 2018-04-30 13:52:50 UTC
These are the commits between versions 0.8 and 0.9 [0]. I will try to provide RPMs built from each commit.

[0] https://github.com/fedora-python/python-ethtool/compare/v0.8...v0.9

Comment 42 Charalampos Stratakis 2018-05-14 11:26:47 UTC
Here are the RPMs with all the changes between versions 0.8 and 0.9 applied consecutively on top of each other. Note that d9922c0 is a change that makes ethtool compatible with libnl3 while dropping libnl support, so it is not feasible to backport it to RHEL. Also, the builds will be garbage collected after a few days. Please take a look at them; they should let us determine which commit fixes your issue.

Commit c61363e: Already applied
Commit d9922c0: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16272668
Commit abab733: Doesn't build
Commit 052d432: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16272835
Commit f8b0e05: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16272991
Commit d3f5fd7: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16273178
Commit 2c2228b: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16273222
Commit 3d7572b: Upstream SPEC file changes
Commit d345876: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16273555
Commit 3aba60f: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16273643
Commit 7328447: MANIFEST.in changes
Commit f805e92: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16273720
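
Because each build contains all earlier changes, the first build that no longer shows the bug can be found with a binary search instead of testing every RPM. The sketch below illustrates this, assuming the bug reproduces on every build before the fixing commit and on none after it. The predicate `still_fails` is hypothetical and would have to run enough boot cycles to be trusted, given the roughly 30% reproduction rate reported in the description.

```python
def first_fixed_build(builds, still_fails):
    """Binary-search an ordered list of cumulative builds.

    `builds` must be ordered oldest to newest, each containing all earlier
    changes. `still_fails(build)` returns True when the bug reproduces.
    Returns the first build on which the bug no longer reproduces, or
    None if it reproduces on every build.
    """
    lo, hi = 0, len(builds)
    while lo < hi:
        mid = (lo + hi) // 2
        if still_fails(builds[mid]):
            lo = mid + 1   # fix must be in a later build
        else:
            hi = mid       # this build or an earlier one has the fix
    return builds[lo] if lo < len(builds) else None


# Commits from the list above that have a build; the others are skipped.
BUILDS = ["d9922c0", "052d432", "f8b0e05", "d3f5fd7",
          "2c2228b", "d345876", "3aba60f", "f805e92"]
```

With eight builds this needs at most four test runs rather than eight.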

Comment 43 Raz Tamir 2018-05-14 14:20:37 UTC
I'm not sure what the needinfo is for.

Comment 44 Charalampos Stratakis 2018-05-14 14:32:37 UTC
(In reply to Charalampos Stratakis from comment #42)
> Here are the rpm's with all the changes between 0.8 and 0.9 versions applied
> consecutively on top of each other. Do note though, that the d9922c0 is a
> change making ethtool compatible with libnl3 while dropping libnl support,
> so it is not feasible to backport this to RHEL. Also the builds will be
> garbage collected after a few days. Please take a look at them, we should be
> able to define by that which commit fixes your issue.  
> 
> Commit c61363e: Already applied
> Commit d9922c0:
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16272668
> Commit abab733: Doesn't build
> Commit 052d432:
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16272835
> Commit f8b0e05:
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16272991
> Commit d3f5fd7:
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16273178
> Commit 2c2228b:
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16273222
> Commit 3d7572b: Upstream SPEC file changes
> Commit d345876:
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16273555
> Commit 3aba60f:
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16273643
> Commit 7328447: MANIFEST.in changes
> Commit f805e92:
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16273720

Could you test those RPMs to figure out which ones reproduce the issue and which do not?

Comment 45 Charalampos Stratakis 2018-06-06 11:57:16 UTC
The builds have been garbage collected and no answer was provided.

If this no longer poses a serious issue, I will close the bug in a week. Most probably the fix happened after porting ethtool to libnl3, which is a change we will not make for compatibility reasons.

Comment 46 Raz Tamir 2018-06-07 14:49:46 UTC
(In reply to Charalampos Stratakis from comment #44)
> (In reply to Charalampos Stratakis from comment #42)
> > Here are the rpm's with all the changes between 0.8 and 0.9 versions applied
> > consecutively on top of each other. Do note though, that the d9922c0 is a
> > change making ethtool compatible with libnl3 while dropping libnl support,
> > so it is not feasible to backport this to RHEL. Also the builds will be
> > garbage collected after a few days. Please take a look at them, we should be
> > able to define by that which commit fixes your issue.  
> > 
> > Commit c61363e: Already applied
> > Commit d9922c0:
> > https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16272668
> > Commit abab733: Doesn't build
> > Commit 052d432:
> > https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16272835
> > Commit f8b0e05:
> > https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16272991
> > Commit d3f5fd7:
> > https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16273178
> > Commit 2c2228b:
> > https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16273222
> > Commit 3d7572b: Upstream SPEC file changes
> > Commit d345876:
> > https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16273555
> > Commit 3aba60f:
> > https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16273643
> > Commit 7328447: MANIFEST.in changes
> > Commit f805e92:
> > https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16273720
> 
> Could you test those rpm's to figure out which ones fail and which do not
> for the issue?

Hi Charalampos,

QE currently don't have the capacity to try all those RPMs.
Please reconsider the reproduction steps provided in comment #0

Thanks

Comment 47 Miro Hrončok 2018-06-07 14:58:41 UTC
There are literally no reproduction steps in comment #0

It has been explicitly said in comment #37 that it would take us ages to set the environment to reproduce this. So we provided the RPMs for you to test it. If you don't have the capacity to test a couple of RPMs, we don't really have the capacity to spend "ages" on this. So I think this is a Mexican standoff.

Should I close this then with insufficient data?

Comment 48 Raz Tamir 2018-06-07 15:30:40 UTC
(In reply to Miro Hrončok from comment #47)
> There are literally no reproduction steps in comment #0
Sorry, I got the comments confused.
This is what I meant:
"The simplest way is to create a script to start VM, wait for IP and do it again until you get stuck and the VM does not get IP."
> 
> It has been explicitly said in comment #37 that it would take us ages to set
> the environment to reproduce this. So we provided the RPMs for you to test
> it. If you don't have the capacity to test a couple of RPMs, we don't really
> have the capacity to spend "ages" on this. So I think this is a Mexican
> standoff.
> 
> Should I close this then with insufficient data?
No,

The data is not insufficient.
I provided many test results with many different RPMs.
At this time, we don't have room for this testing effort.
I lowered the severity of this bug so it won't be considered urgent.

Comment 49 Charalampos Stratakis 2018-06-07 15:46:34 UTC
(In reply to Raz Tamir from comment #48)
> (In reply to Miro Hrončok from comment #47)
> > There are literally no reproduction steps in comment #0
> Sorry got confused with the comments.
> this is what I meant:
> "The simplest way is to create a script to start VM, wait for IP and do it
> again until you get stuck and the VM does not get IP."
> > 

Please provide said script then. Really this is a waste of resources at this point, trying to second guess everything, then providing builds and not getting an answer.

> > It has been explicitly said in comment #37 that it would take us ages to set
> > the environment to reproduce this. So we provided the RPMs for you to test
> > it. If you don't have the capacity to test a couple of RPMs, we don't really
> > have the capacity to spend "ages" on this. So I think this is a Mexican
> > standoff.
> > 
> > Should I close this then with insufficient data?
> No,
> 
> There is no insufficient data here.
> I provided many tests results with many different RPMs.
> At this time, we don't have the room for this testing effort.
> I lowered the severity of this bug so it won't be considered as urgent

Comment 50 Honza Horak 2018-06-07 18:01:23 UTC
I took a look at the commits and found one that looks suspicious:
https://github.com/fedora-python/python-ethtool/commit/f8b0e05c03add3d0bd7736c3e3f81e8eb28bc7c3

So I created an RPM with just this one back-ported and put it in a place where it shouldn't be pruned:
https://hhorak.fedorapeople.org/python-ethtool-0.8-7.testbz1467845.el7.x86_64.rpm

Raz, if you could find time to test at least this one, it would help us move forward. Maybe I was lucky and picked the commit that fixes this.

Comment 55 Petr Viktorin (pviktori) 2018-08-27 11:13:26 UTC
Raz, could you test the RPM Honza provided in Comment #50?

Comment 56 Raz Tamir 2018-08-28 06:43:23 UTC
Liran,

As you already have the script for starting a VM in loop and test if the boot succeeds, could you take care of this request in comment #54?

Comment 57 Liran Rotenberg 2018-09-03 06:22:25 UTC
Tested with the given rpm in comment #50:
https://hhorak.fedorapeople.org/python-ethtool-0.8-7.testbz1467845.el7.x86_64.rpm


ovirt-guest-agent-common-1.0.14-3.el7ev.noarch
qemu-guest-agent-2.8.0-2.el7.x86_64
python-ethtool-0.8-7.testbz1467845.el7.x86_64
ethtool-4.8-7.el7.x86_64
Red Hat Enterprise Linux Server release 7.5 (Maipo)

I had 6 VMs with these RPMs, starting and shutting down in a loop.
In iteration number 108, I hit the BZ.

One of the VMs didn't report its IP back to the engine although it got one.

[root@vm-30-65 ~]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.35.30.65  netmask 255.255.255.0  broadcast 10.35.30.255
        inet6 fe80::21a:4aff:fe16:1041  prefixlen 64  scopeid 0x20<link>
        ether 00:1a:4a:16:10:41  txqueuelen 1000  (Ethernet)
        RX packets 20923  bytes 1823076 (1.7 MiB)
        RX errors 0  dropped 12  overruns 0  frame 0
        TX packets 2888  bytes 372956 (364.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

First occurrence in the ovirt-guest-agent log:
Dummy-1::INFO::2018-09-02 21:35:22,606::OVirtAgentLogic::322::root::Received an external command: refresh...
Dummy-2::ERROR::2018-09-02 21:35:22,611::GuestAgentLinux2::218::root::Error retrieving network interfaces.
Traceback (most recent call last):
  File "/usr/share/ovirt-guest-agent/GuestAgentLinux2.py", line 214, in ethtool_list_nics
    'inet': self._get_ipv4_addresses(devinfo),
  File "/usr/share/ovirt-guest-agent/GuestAgentLinux2.py", line 189, in _get_ipv4_addresses
    for ip in dev.get_ipv4_addresses():
RuntimeError: Could not open a NETLINK connection for eth0
Dummy-1::ERROR::2018-09-02 21:35:23,364::GuestAgentLinux2::218::root::Error retrieving network interfaces.
Traceback (most recent call last):
  File "/usr/share/ovirt-guest-agent/GuestAgentLinux2.py", line 214, in ethtool_list_nics
    'inet': self._get_ipv4_addresses(devinfo),
  File "/usr/share/ovirt-guest-agent/GuestAgentLinux2.py", line 189, in _get_ipv4_addresses
    for ip in dev.get_ipv4_addresses():
SystemError: error return without exception set

Then the log keeps repeating this error:
Dummy-2::ERROR::2018-09-03 00:39:23,072::GuestAgentLinux2::218::root::Error retrieving network interfaces.
Traceback (most recent call last):
  File "/usr/share/ovirt-guest-agent/GuestAgentLinux2.py", line 214, in ethtool_list_nics
    'inet': self._get_ipv4_addresses(devinfo),
  File "/usr/share/ovirt-guest-agent/GuestAgentLinux2.py", line 189, in _get_ipv4_addresses
    for ip in dev.get_ipv4_addresses():
SystemError: error return without exception set
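
To see when the RuntimeError stops and the SystemError loop starts, the agent log can be summarized per exception type. The following is only a minimal sketch, assuming the log keeps the `thread::LEVEL::timestamp::...` header format shown in the excerpts above; the function name `summarize_errors` and the regular expressions are illustrative and not part of ovirt-guest-agent.

```python
import re
from collections import Counter

# Header lines look like:
# Dummy-2::ERROR::2018-09-03 00:39:23,072::GuestAgentLinux2::218::root::Error ...
HEADER = re.compile(
    r"^(?P<thread>[^:]+)::(?P<level>[A-Z]+)::"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
)
# Last line of a traceback, e.g. "SystemError: error return without exception set"
EXC = re.compile(r"^(?P<name>[A-Za-z_]\w*(?:Error|Exception)): ")


def summarize_errors(lines):
    """Count exception types and remember when each one was first logged."""
    counts = Counter()
    first_seen = {}
    current_ts = None  # timestamp of the most recent ERROR header, if any
    for line in lines:
        header = HEADER.match(line)
        if header:
            is_error = header.group("level") == "ERROR"
            current_ts = header.group("ts") if is_error else None
            continue
        exc = EXC.match(line)
        if exc and current_ts is not None:
            name = exc.group("name")
            counts[name] += 1
            first_seen.setdefault(name, current_ts)
    return counts, first_seen
```

Applied to a saved copy of the log (for example `summarize_errors(open("ovirt-guest-agent.log"))`, where the file name is an assumption), the result shows how many times each exception type was logged and the timestamp of its first occurrence.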

Comment 58 Petr Viktorin (pviktori) 2018-09-07 13:58:17 UTC
Lumír, could you check how get_etherinfo_address can return NULL without setting an exception?
It looks like it could do that even on master.

Comment 59 Victor Stinner 2018-09-07 13:59:20 UTC
My bets are on these two lines which return NULL with no exception set:


vstinner@apu$ git diff
diff --git a/python-ethtool/etherinfo.c b/python-ethtool/etherinfo.c
index f90fea4..c78b507 100644
--- a/python-ethtool/etherinfo.c
+++ b/python-ethtool/etherinfo.c
@@ -237,6 +237,7 @@ PyObject * get_etherinfo_address(PyEtherInfo *self, nlQuery query)
     int err = 0;
 
     if (!self) {
+        /* missing exception */
         return NULL;
     }
 
@@ -279,6 +280,7 @@ PyObject * get_etherinfo_address(PyEtherInfo *self, nlQuery query)
         break;
 
     default:
+        /* missing exception */
         return NULL;
     }
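
For readers unfamiliar with the failure mode: CPython reports "SystemError: error return without exception set" when a C function returns NULL without having set an exception first. The sketch below is a pure-Python model of that contract, not the python-ethtool code. `NULL`, `PendingError`, `call`, and both `get_address_*` functions are hypothetical names used only for illustration; the NLQRY_* constants borrow the query names used in this bug, with made-up values.

```python
# Pure-Python model of the C-API error contract; all names are hypothetical.
NULL = object()  # stands in for a C NULL return value

NLQRY_ADDR4, NLQRY_ADDR6 = 0, 1  # illustrative values only


class PendingError(object):
    """Models the interpreter's 'currently set exception' slot."""

    def __init__(self):
        self.exc = None


def get_address_buggy(query, state):
    if query == NLQRY_ADDR4:
        return ["ipv4"]
    if query == NLQRY_ADDR6:
        return ["ipv6"]
    return NULL  # default branch: NULL with no exception set


def get_address_fixed(query, state):
    if query in (NLQRY_ADDR4, NLQRY_ADDR6):
        return get_address_buggy(query, state)
    # Set an exception before signalling failure, as the C API expects.
    state.exc = ValueError("unknown query type: %r" % (query,))
    return NULL


def call(func, query):
    """Models the interpreter checking the result of a C function."""
    state = PendingError()
    result = func(query, state)
    if result is NULL:
        if state.exc is None:
            raise SystemError("error return without exception set")
        raise state.exc
    return result
```

In this model, `call(get_address_buggy, 99)` raises SystemError, while `call(get_address_fixed, 99)` raises the ValueError that names the actual problem.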

Comment 60 Lumír Balhar 2018-09-10 12:03:40 UTC
(In reply to Victor Stinner from comment #59)
> My bets are on these two lines which return NULL with no exception set:
> 
> 
> vstinner@apu$ git diff
> diff --git a/python-ethtool/etherinfo.c b/python-ethtool/etherinfo.c
> index f90fea4..c78b507 100644
> --- a/python-ethtool/etherinfo.c
> +++ b/python-ethtool/etherinfo.c
> @@ -237,6 +237,7 @@ PyObject * get_etherinfo_address(PyEtherInfo *self,
> nlQuery query)
>      int err = 0;
>  
>      if (!self) {
> +        /* missing exception */
>          return NULL;
>      }
>  
> @@ -279,6 +280,7 @@ PyObject * get_etherinfo_address(PyEtherInfo *self,
> nlQuery query)
>          break;
>  
>      default:
> +        /* missing exception */
>          return NULL;
>      }

I am not sure. get_etherinfo_address is called from _ethtool_etherinfo_get_ipv4_addresses [0] (or its IPv6 counterpart), which is a method of the etherinfo object. There, the `self` parameter is checked and a proper exception is raised, and the `query` parameter is always set to NLQRY_ADDR4 or NLQRY_ADDR6, so the two missing exceptions mentioned above should have no effect.

[0] https://github.com/fedora-python/python-ethtool/blob/master/python-ethtool/etherinfo_obj.c#L161

Petr, how do you know that it may happen also in the newest ethtool?

In ethtool v0.8, the `self` parameter is also checked in ethinfo method and `query` is set to NLQRY_ADDR when the get_etherinfo function is called [0].
However, there are some suspicious returns in the get_etherinfo function [1] where no exception is set: [2] and [3].

[0] https://github.com/fedora-python/python-ethtool/blob/v0.8/python-ethtool/etherinfo_obj.c#L264
[1] https://github.com/fedora-python/python-ethtool/blob/v0.8/python-ethtool/etherinfo.c#L302
[2] https://github.com/fedora-python/python-ethtool/blob/v0.8/python-ethtool/etherinfo.c#L332
[3] https://github.com/fedora-python/python-ethtool/blob/v0.8/python-ethtool/etherinfo.c#L366

What can we do now? We can create a new rpm with all exceptions fixed and see which one will cause the problem again. Am I right?

Comment 61 Lumír Balhar 2018-09-11 12:20:21 UTC
Liran, how complicated would it be to set up our own environment where we'd be able to test it? Or would it be possible to gain access to a machine where we can reproduce your issue and make changes in place until we solve it?

We can create a new RPM with some patches, give it to you, wait for results, fix issues, and repeat. But because other bugs appeared during testing of the previous one, I think it would be better to let us test it ourselves and make changes until it is okay.

Comment 63 Lumír Balhar 2018-10-01 14:11:44 UTC
Hello.

With help from Liran and access to the testing platform, I did some tests. I had six virtual machines with python-ethtool installed from the RPM from comment #50, where the suspicious patch is included. I wrote a script using the oVirt SDK to reboot and check all virtual machines. I did almost 130 reboot loops (more than 750 individual reboots) without any problem.

It's really hard to say whether we found the fix for the original issue or whether it's just impossible to reproduce it again. The same obviously applies to the problem mentioned in comment #57. Moreover, if I added more patches, the result would be the same.

I don't know how we can inspect it further. I think the best approach is to backport the patch and then wait until we find a better way to reproduce possible future issues.

What do you think?
Lumír
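
The script itself is not attached to this bug. As a rough sketch of what such a reboot loop could look like, the function below takes the reboot and check steps as callables, so it makes no assumption about the oVirt SDK API; `run_reboot_loops`, `reboot`, and `check` are hypothetical names.

```python
def run_reboot_loops(vm_names, reboot, check, loops):
    """Reboot every VM once per loop and record VMs that fail the check.

    reboot(name) is expected to restart the VM and return once it is back up;
    check(name) is expected to return True when the guest agent reports
    network interfaces for that VM. Both are supplied by the caller.
    """
    reboots = 0
    failures = []  # (loop number, VM name) for every failed check
    for loop in range(1, loops + 1):
        for name in vm_names:
            reboot(name)
            reboots += 1
            if not check(name):
                failures.append((loop, name))
    return reboots, failures
```

With six VMs, 130 full loops would amount to 780 individual reboots, which is consistent with the "almost 130 loops, more than 750 reboots" figure above.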

Comment 64 Raz Tamir 2018-10-02 07:54:22 UTC
(In reply to Lumír Balhar from comment #63)
> Hello.
> 
> With a help from Liran and access to the testing platform, I did some tests.
> I had six virtual machines with python-ethtool installed from RPM from
> comment #50 where the suspicious patch is included. I wrote script using
> Ovirt SDK to reboot and check all virtual machines. I did almost 130 of
> reboot loops (more than 750 individual reboots) without any problem.
> 
> It's really hard to say whether we found the fix for original issue or it's
> just impossible to reproduce it again. The same obviously applies to the
> problem mentioned in comment #57. Moreover, if I'd add more patches, the
> result would be the same.
> 
> I don't know how we can inspect it more. I think that the best will be to
> backport the patch and then wait until we found some better way how to
> reproduce possible future issues.
> 
> What do you think?
> Lumír

Seems ok to me.
We can't test it forever, so we can assume it is fixed now.

Comment 65 Lumír Balhar 2018-10-09 06:51:18 UTC
Charris, could you help us to plan the backporting of the mentioned patch?

Comment 66 Roni 2018-10-21 15:15:21 UTC
The problem was reproduced again at: v4.2.7.3-0.1.el7ev
Versions from the VM:
- Red Hat Enterprise Linux Server release 7.6 Beta (Maipo)
- python-ethtool-0.8-7.el7.x86_64
- qemu-guest-agent-2.12.0-2.el7.x86_64
- ovirt-guest-agent-common-1.0.14-3.el7ev.noarch

See attached: ovirt-guest-agent.log taken from the VM

Art log:
https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-tier2/104/artifact/logs_per_test/networking/sr_iov/vm_test/TestSriovVm01/test_03_check_mac_of_vf_on_vm/art_test_runner.log

In the above Art log, the following key returns an empty list:
u'netIfaces': []

Comment 67 Roni 2018-10-21 15:17:36 UTC
Created attachment 1496170 [details]
ovirt-guest-agent

Comment 68 Charalampos Stratakis 2018-10-22 11:57:45 UTC
(In reply to Lumír Balhar from comment #65)
> Charris, could you help us to plan the backporting of the mentioned patch?

Definitely. Let's push it for 7.7

Comment 69 Raz Tamir 2018-10-22 12:18:12 UTC
Hi,

Do we have a working patch QE will be able to work with?
This is impacting our testing badly

Comment 70 Lumír Balhar 2018-10-23 08:35:20 UTC
Hello.

The patch I was testing python-ethtool with is the one mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1467845#c50

Commit on github: https://github.com/fedora-python/python-ethtool/commit/f8b0e05c03add3d0bd7736c3e3f81e8eb28bc7c3

Charris, could you please backport it to 7.7? It seems that fixing this has a high priority for QE.

Comment 71 Petr Balogh 2018-10-30 15:18:48 UTC
Hello,

Is there also a ppc64le RPM that we can use for the PPC images in our testing environments?

Could we use, for example, this one as a workaround? https://kojipkgs.fedoraproject.org//packages/python-ethtool/0.13/1.fc26/ppc64le/python2-ethtool-0.13-1.fc26.ppc64le.rpm

Thx

Comment 73 Petr Viktorin (pviktori) 2018-11-01 14:41:27 UTC
Did you have time to test that RPM?

Comment 74 Petr Balogh 2018-11-06 09:40:58 UTC
Our latest executions for rhv 4.2.7-7, where we use the latest guest agent image, should contain the mentioned RPM. But someone from the network team needs to confirm.

Comment 75 Petr Balogh 2018-11-06 12:21:51 UTC
Michael, as this is network related, can you please confirm? Thx

Comment 77 Petr Balogh 2018-11-06 12:54:02 UTC
Raz, I applied those RPMs in our runs, so I probably cannot do more here. If someone hits the issue again, they should update this ticket. I am not sure to whom I should move the needinfo from me, though.

Comment 78 Raz Tamir 2018-11-07 09:22:30 UTC
Thank you Petr.

I don't think we can do anything else to "test" the fix besides letting it run with our regression cycles and tracking it.

Comment 79 Lumír Balhar 2018-12-11 10:25:18 UTC
Hello.

I think that we should agree on a plan, because we cannot keep this bug open forever waiting for the next bug.

So, the question is: Are you guys satisfied with the RPM provided by Honza?

If so, we can plan how to deliver the same content to all users in the standard way.

Thank you and have a nice day.

Comment 80 Raz Tamir 2018-12-12 08:49:01 UTC
Hi Lumir,

As a follow-up to comment #78, I agree with you that we can't wait forever to see if the bug keeps reproducing, so please, let's take Honza's RPM and use it as a valid fix.
One more request I have is that the RPMs are very old and depend on a very old branch.  Could you please ensure the patches will be rebased on the latest branch?

Comment 81 Lumír Balhar 2018-12-12 10:45:48 UTC
Hello Raz.

I am sorry, but I may not have understood your last request. We will probably just apply the suspicious patch from Honza's RPMs to python-ethtool in all RHEL releases where a version < 0.9 is shipped, which means RHEL 7.4 to 7.7. We cannot update it because there are a lot of changes and they may cause incompatibilities with critical system tools.

Comment 83 Lumír Balhar 2018-12-14 15:08:36 UTC
After a quick discussion with Honza, we decided to fix this only in RHEL 7.7. Any issues or comments about that?

Comment 84 Lumír Balhar 2018-12-18 09:08:27 UTC
Scratch build for RHEL 7.7: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=19556038

Comment 89 Lumír Balhar 2019-03-25 14:13:47 UTC
Any updates here? If it has worked for more than three months, we might consider it tested.

Comment 93 errata-xmlrpc 2019-08-06 12:40:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2067

