Bug 1395617

Summary: rhcert-listener doesn't start after reboot in KDUMP nfs test on a machine with 2 NICs
Product: Red Hat Certification Program   Reporter: Rainer Koenig <Rainer.Koenig>
Component: redhat-certification-hardware   Assignee: Nobody <nobody>
Status: NEW ---   QA Contact: rhcert qe <rhcert-qe>
Severity: medium   Docs Contact:
Priority: unspecified
Version: 1.0   CC: gnichols
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:   Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:   Environment:
Last Closed:   Type: Bug
Regression: ---   Mount Type: ---
Documentation: ---   CRM:
Verified Versions:   Category: ---
oVirt Team: ---   RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---   Target Upstream Version:

Description Rainer Koenig 2016-11-16 10:00:55 UTC
Description of problem:
I ran the KDUMP nfs test from the web UI on a SUT that has 2 NICs.
After the crash is triggered, the machine reboots, but the web UI still reports
that it is waiting for a response.
Checking "rhcert-backend server status" shows that the rhcert-listener is not running on port 8009.
Running "rhcert-backend server start" solves the problem immediately.
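To confirm this symptom on the SUT without going through the web UI, a minimal Python sketch can probe whether anything accepts TCP connections on port 8009 after the reboot. It only checks that the port accepts a connection; it does not speak the rhcert protocol, and the helper name is_listening is chosen just for this sketch. The host to probe is an assumption: if the listener binds to one NIC's address rather than to all interfaces, pass that address (for example 192.168.2.138) instead of 127.0.0.1.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Port 8009 is taken from the report; the host is an assumption (see above).
    host, port = "127.0.0.1", 8009
    if is_listening(host, port):
        print(f"something is listening on {host}:{port}")
    else:
        print(f'nothing on {host}:{port}; try "rhcert-backend server start"')
```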

Version-Release number of selected component (if applicable):
redhat-certification-hardware-4.1-20161019.el7.noarch

How reproducible:
On the machine with 2 NICs: always
On a machine with just one NIC: never

Steps to Reproduce:
1. Install RHEL 7.3 and redhat-certification-4.1... on the SUT
2. Register the machine with the web UI
3. Run the kdump nfs test

Actual results:
Kdump is triggered, the machine crashes and reboots, and the web UI keeps waiting for a response.

Expected results:
The machine should reboot and the test should then proceed, meaning the web UI reconnects to the SUT.

Additional info:
I wanted to find out whether this depends on the NIC I use when registering the system. My machine has 2 IP addresses:
192.168.2.138 and 192.168.2.168. I tried the second one (.168): the first registration attempt produced no entry in the list on the web UI, and on the second attempt the machine was listed, but with the first address (.138). I suspect this surprising behavior is caused by the different metric values in the "ip route" output:

default via 192.168.2.1 dev enp0s25  proto static  metric 100 
default via 192.168.2.1 dev enp4s0  proto static  metric 101 
192.168.2.0/24 dev enp0s25  proto kernel  scope link  src 192.168.2.138  metric 100 
192.168.2.0/24 dev enp4s0  proto kernel  scope link  src 192.168.2.168  metric 101 
192.168.122.0/24 dev virbr0  proto kernel  scope link  src 192.168.122.1 
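The metric hypothesis can be illustrated with a small Python sketch that approximates how Linux picks an outgoing route from the table above: the longest matching prefix wins, and among equal prefixes the lowest metric wins. The helper names and the two destination addresses (192.168.2.50 on the local subnet, 203.0.113.10 off-subnet) are hypothetical, since the report does not give the web UI server's address. The source-address step is also simplified: the kernel picks an address configured on the outgoing interface, which here is approximated by the "src" of a connected route on that device.

```python
import ipaddress

# Output of "ip route" on the SUT, copied from the report above.
ROUTES = """\
default via 192.168.2.1 dev enp0s25  proto static  metric 100
default via 192.168.2.1 dev enp4s0  proto static  metric 101
192.168.2.0/24 dev enp0s25  proto kernel  scope link  src 192.168.2.138  metric 100
192.168.2.0/24 dev enp4s0  proto kernel  scope link  src 192.168.2.168  metric 101
192.168.122.0/24 dev virbr0  proto kernel  scope link  src 192.168.122.1
"""


def parse_routes(text):
    """Turn each line into {"dst": ..., keyword: value, ...}.

    Assumes every token after the destination is a keyword/value pair, which
    holds for the lines above but not for bare flags such as "linkdown".
    """
    routes = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        route = {"dst": tokens[0]}
        route.update(zip(tokens[1::2], tokens[2::2]))
        routes.append(route)
    return routes


def select_route(routes, dest):
    """Longest matching prefix wins; ties are broken by the lowest metric."""
    addr = ipaddress.ip_address(dest)
    best = None
    for route in routes:
        dst = "0.0.0.0/0" if route["dst"] == "default" else route["dst"]
        net = ipaddress.ip_network(dst)
        if addr not in net:
            continue
        rank = (-net.prefixlen, int(route.get("metric", "0")))
        if best is None or rank < best[0]:
            best = (rank, route)
    if best is None:
        raise LookupError(f"no route to {dest}")
    return best[1]


def source_address(routes, dest):
    """Return (device, source IP) approximating what the SUT would use for dest."""
    route = select_route(routes, dest)
    dev = route["dev"]
    if "src" in route:
        return dev, route["src"]
    # Default routes carry no "src": fall back to a connected route on the same device.
    for other in routes:
        if other.get("dev") == dev and "src" in other:
            return dev, other["src"]
    raise LookupError(f"no source address on {dev}")


if __name__ == "__main__":
    routes = parse_routes(ROUTES)
    # Hypothetical web UI server addresses: one on-subnet, one off-subnet.
    for server in ("192.168.2.50", "203.0.113.10"):
        print(server, "->", source_address(routes, server))
```

Under these assumptions, both destinations leave via enp0s25 with source 192.168.2.138, which would be consistent with the web UI listing the .138 address even when registering with .168.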

A workaround for this problem is to run "rhcert-backend server start", which starts the rhcert-listener manually.

Comment 1 Rainer Koenig 2016-12-13 13:33:59 UTC
The problem also occurs in RHCert 4.2. It seems to happen every time the SUT has more than one NIC.