Created attachment 937862 [details]
Relevant logs - host losing connectivity
Description of problem:
RHEL7 hosts lose connectivity with the engine and stay in the non-responsive state until the network service is restarted; only then does the host come back up.
It happens with both DHCP and static IP configured on rhevm.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Working setup with rhel7 host
The host loses connectivity during the evening/night and stays in the non-responsive state.
The host shouldn't lose connectivity with the engine. But if it does lose connectivity, I expect it to reconnect.
Created attachment 937865 [details]
/var/log/messages from my 2 rhel7 hosts
Created attachment 937867 [details]
Connectivity.logs from my 2 rhel7 hosts
I attached relevant logs from my two rhel7 servers.
connectivity.logs: in these logs you can see when the host lost connectivity with the engine.
It's not happening with static IPs; there is some issue with DHCP. Could you check on your machines whether the dhclient process is still alive after a few hours of running with DHCP?
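In case it helps with the check, here is a minimal sketch of one way to see whether dhclient is still running, by scanning /proc for a process whose name is "dhclient". The helper name find_pids_by_name and its proc_root parameter are made up for illustration; running `pgrep dhclient` on the host would do the same job.

```python
import os


def find_pids_by_name(name, proc_root="/proc"):
    """Return the sorted PIDs whose <proc_root>/<pid>/comm equals `name`.

    `find_pids_by_name` and `proc_root` are illustrative names, not part
    of any existing tool. Note that the kernel truncates `comm` to 15
    characters, which is long enough for "dhclient".
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "net"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            # The process exited between listdir() and open(), or is unreadable.
            continue
    return sorted(pids)


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        pids = find_pids_by_name("dhclient")
        if pids:
            print("dhclient is running, PID(s):", pids)
        else:
            print("dhclient is NOT running")
```

Running this once before leaving and again the next morning (or from cron with a timestamp) should show roughly when the process disappears.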
I will check that
I changed my rhel7 host from static IP to DHCP, and during the night the host lost connectivity with the engine.
I'm not sure if the dhclient process was alive at that point, but it was alive the last time I checked before going home.
I have changed the priority to urgent: this bug frequently aborts many of my test scripts, and all the storage guys hit this issue on a nearly daily basis.
It was alive when you checked before going home, was it alive when you returned the next day? (Even after losing connectivity)
The dhclient process wasn't alive when I returned the next day.
Thanks Michael. I managed to reproduce it as well on f20. We need to find out why the dhclient process quits.
OK Toni, thank you.
We are all waiting for a solution.
For now, I have configured all my rhel7 hosts with static IPs, so they won't lose connectivity.
OK, I went to Michael's machine and, after talking with Jiři Popelka, applied the patch for https://bugzilla.redhat.com/show_bug.cgi?id=1116004 there. The patch in question checks whether the arping answer belongs to a MAC address on the machine itself.
The issue didn't happen again, and I can confirm that the case that was making it fail was the same. Thus, I am marking this as a duplicate of bz#1116004.
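To illustrate the idea behind that check (this is not the actual patch from bz#1116004, only a rough sketch of the logic as described above): an arping reply should count as an address conflict only when the replying MAC is not one of the host's own. The function names below are hypothetical, and reading the MACs from /sys/class/net/<iface>/address is an assumption about how one might collect them on Linux.

```python
import os


def normalize_mac(mac):
    """Lower-case a MAC address and use ':' separators so values compare reliably."""
    return mac.strip().lower().replace("-", ":")


def local_macs(sys_net="/sys/class/net"):
    """Collect the MAC addresses of the local interfaces (hypothetical helper)."""
    macs = set()
    for iface in os.listdir(sys_net):
        try:
            with open(os.path.join(sys_net, iface, "address")) as f:
                mac = normalize_mac(f.read())
        except OSError:
            continue  # not an interface directory, or no readable address
        if mac and mac != "00:00:00:00:00:00":  # skip loopback-style all-zero MACs
            macs.add(mac)
    return macs


def is_real_conflict(reply_mac, own_macs):
    """True only if the arping reply came from a MAC that is not on this machine."""
    return normalize_mac(reply_mac) not in {normalize_mac(m) for m in own_macs}
```

With this shape, a reply carrying one of the host's own MAC addresses is ignored, and only a reply from a foreign MAC is treated as a duplicate address.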
*** This bug has been marked as a duplicate of bug 1116004 ***