971477 – [RHSC] Host (RHS Anshi) goes to Non-operational state after coming UP.

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	rhsc
Sub Component:
Version:	2.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Timothy Asir
QA Contact:	Shruti Sampat
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2013-06-06 15:48 UTC by Shruti Sampat
Modified:	2013-07-08 12:13 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2013-07-08 12:13:13 UTC
Embargoed:
Dependent Products:

Attachments
engine logs (3.34 MB, text/x-log) 2013-06-06 16:52 UTC, Shruti Sampat	no flags
vdsm logs (4.64 MB, text/x-log) 2013-06-06 17:02 UTC, Shruti Sampat	no flags

Description Shruti Sampat 2013-06-06 15:48:19 UTC

Description of problem:
---------------------------------------
After being added to a 3.1 cluster, an Anshi node comes UP initially and then goes to the Non-operational state. Both glusterd and vdsmd are running.

No message about the host's change of state from UP to Non-operational appears in the Events log.

The following message is seen in the Events log multiple times -

Bridged network ovirtmgmt is attached to multiple interfaces: eth2,eth0 on Host rhs-client20.lab.eng.blr.redhat.com.

The following is seen in the engine logs -

2013-06-06 20:52:23,003 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-16) Host 'rhs-client20.lab.eng.blr.redhat.com' moved to Non-Operational state because interface/s 'eth0, ' are down which needed by network/s 'ovirtmgmt, ' in the current cluster
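
For anyone searching the attached engine logs for the same event, the following is a minimal sketch that pulls the host, interfaces and networks out of a message of this form. The regular expression is written against the single message quoted above (joined onto one line here) and is an assumption about the message format, not something taken from the engine source. The function and variable names are hypothetical.

```python
import re

# Engine log message copied from this report, joined onto one line.
LINE = (
    "2013-06-06 20:52:23,003 INFO  "
    "[org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] "
    "(DefaultQuartzScheduler_Worker-16) Host "
    "'rhs-client20.lab.eng.blr.redhat.com' moved to Non-Operational state "
    "because interface/s 'eth0, ' are down which needed by "
    "network/s 'ovirtmgmt, ' in the current cluster"
)

# Assumed message shape, derived only from the line above.
PATTERN = re.compile(
    r"Host '(?P<host>[^']+)' moved to Non-Operational state "
    r"because interface/s '(?P<ifaces>[^']*)' are down which needed by "
    r"network/s '(?P<nets>[^']*)'"
)


def split_list(raw):
    """Split "eth0, " style lists, dropping empty trailing items."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def parse_non_operational(line):
    """Return host/interfaces/networks for a matching line, else None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return {
        "host": match.group("host"),
        "interfaces": split_list(match.group("ifaces")),
        "networks": split_list(match.group("nets")),
    }


print(parse_non_operational(LINE))
```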

Version-Release number of selected component (if applicable):
Red Hat Storage Console Version: 2.1.0-0.bb2.el6rhs
glusterfs 3.3.0.10rhs
vdsm-4.9.6-23.el6rhs.x86_64

How reproducible:
Intermittent

Steps to Reproduce:
1. Install RHS Anshi iso on storage server.
2. Update glusterfs to glusterfs 3.3.0.10rhs and vdsm to vdsm-4.9.6-23.el6rhs.x86_64.
3. Add host to a 3.1 cluster via the Console.

Actual results:
The host comes UP initially, then goes to the Non-operational state. On activating the host, it comes UP again, then returns to the Non-operational state, and the cycle repeats.

Expected results:
Host should remain UP.

Additional info:
The host is a physical machine. The contents of the file /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt -

[root@rhs-client20 u5_rpms]# cat /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt 
DEVICE=ovirtmgmt
TYPE=Bridge
ONBOOT=yes
DELAY=0
BOOTPROTO=dhcp
NM_CONTROLLED=no

The contents of /etc/sysconfig/network-scripts/ifcfg-eth2

[root@rhs-client20 u5_rpms]# cat /etc/sysconfig/network-scripts/ifcfg-eth2
DEVICE="eth2"
BRIDGE="ovirtmgmt"
BOOTPROTO="dhcp"
HWADDR="00:25:90:93:62:02"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
NM_CONTROLLED="yes"
ONBOOT="yes"
TYPE="Ethernet"

The contents of /etc/sysconfig/network-scripts/ifcfg-eth0 - 

[root@rhs-client20 u5_rpms]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE="eth0"
HWADDR="00:25:90:7C:2C:7A"
NM_CONTROLLED="yes"
ONBOOT="yes"
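
As a cross-check of the files above, the following is a minimal sketch that parses the ifcfg contents and lists which interfaces declare themselves as ports of a given bridge. The file contents are copied from this report. The helper names (`parse_ifcfg`, `declared_ports`) are hypothetical, and the parser handles only simple KEY=value lines, not full shell syntax.

```python
# ifcfg contents copied from this report.
IFCFG_ETH2 = '''DEVICE="eth2"
BRIDGE="ovirtmgmt"
BOOTPROTO="dhcp"
HWADDR="00:25:90:93:62:02"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
NM_CONTROLLED="yes"
ONBOOT="yes"
TYPE="Ethernet"
'''

IFCFG_ETH0 = '''DEVICE="eth0"
HWADDR="00:25:90:7C:2C:7A"
NM_CONTROLLED="yes"
ONBOOT="yes"
'''


def parse_ifcfg(text):
    """Parse simple KEY=value lines into a dict, dropping double quotes."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        cfg[key.strip()] = value.strip().strip('"')
    return cfg


def declared_ports(configs, bridge):
    """Return devices whose ifcfg file sets BRIDGE to the given bridge."""
    return sorted(c["DEVICE"] for c in configs if c.get("BRIDGE") == bridge)


configs = [parse_ifcfg(IFCFG_ETH0), parse_ifcfg(IFCFG_ETH2)]
print(declared_ports(configs, "ovirtmgmt"))  # ['eth2']
```

Going by these files alone, only eth2 declares ovirtmgmt as its bridge; ifcfg-eth0 has no BRIDGE entry.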

Comment 2 Shruti Sampat 2013-06-06 16:52:56 UTC

Created attachment 757756
engine logs

Comment 3 Shruti Sampat 2013-06-06 17:02:29 UTC

Created attachment 757758
vdsm logs

Comment 7 Dan Kenigsberg 2013-06-19 10:26:22 UTC

What does `brctl show` have on your faulty host? (just to rule out that vdsm is lying about the ovirtmgmt being connected to two nics)

{'ovirtmgmt': {'addr': '10.70.36.44', 'cfg': {'DELAY': '0', 'NM_CONTROLLED': 'no', 'BOOTPROTO': 'dhcp', 'DEVICE': 'ovirtmgmt', 'TYPE': 'Bridge', 'ONBOOT': 'yes'}, 'mtu': '1500', 'netmask': '255.255.254.0', 'stp': 'off', 'ports': ['eth0', 'eth2']}}

Does it reproduce on any other system?
Does it go away once you manually run the following?

  brctl delif ovirtmgmt eth0
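
The network dictionary quoted in this comment can be checked mechanically. The following is a minimal sketch that loads the reported structure with `ast.literal_eval` and lists any bridge with more than one port. The function name `multi_port_bridges` is hypothetical, and the dictionary is treated as a plain Python literal, which is an assumption about how it was printed.

```python
import ast

# Structure copied from this comment.
REPORTED = (
    "{'ovirtmgmt': {'addr': '10.70.36.44', 'cfg': {'DELAY': '0', "
    "'NM_CONTROLLED': 'no', 'BOOTPROTO': 'dhcp', 'DEVICE': 'ovirtmgmt', "
    "'TYPE': 'Bridge', 'ONBOOT': 'yes'}, 'mtu': '1500', "
    "'netmask': '255.255.254.0', 'stp': 'off', 'ports': ['eth0', 'eth2']}}"
)


def multi_port_bridges(networks):
    """Return {bridge: ports} for bridges reporting more than one port."""
    return {
        name: info["ports"]
        for name, info in networks.items()
        if len(info.get("ports", [])) > 1
    }


networks = ast.literal_eval(REPORTED)
print(multi_port_bridges(networks))  # {'ovirtmgmt': ['eth0', 'eth2']}
```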

Comment 8 Shruti Sampat 2013-06-19 11:04:09 UTC

This is the output of `brctl show` - 

[root@rhs-client20 ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
ovirtmgmt       8000.0025907c2c7a       no              eth0
                                                        eth2

Yes, it goes away after doing 'brctl delif ovirtmgmt eth0'.

I will try with another machine and let you know if it happens again.
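
To repeat this check on another machine, the following is a minimal sketch that parses `brctl show` output into a mapping of bridge name to attached interfaces. It assumes the layout shown in this comment (a header line, one line per bridge, and indented continuation lines for additional interfaces). The function name `parse_brctl_show` is hypothetical.

```python
# Output copied from this comment.
BRCTL_OUTPUT = """bridge name     bridge id               STP enabled     interfaces
ovirtmgmt               8000.0025907c2c7a       no              eth0
                                                        eth2
"""


def parse_brctl_show(output):
    """Map each bridge name to the list of interfaces attached to it."""
    bridges = {}
    current = None
    for line in output.splitlines()[1:]:  # skip the header line
        if not line.strip():
            continue
        fields = line.split()
        if line[0].isspace():
            # Indented continuation line: additional interface(s) only.
            if current is not None:
                bridges[current].extend(fields)
        else:
            # Bridge line: name, bridge id, STP flag, optional interface.
            current = fields[0]
            bridges[current] = fields[3:]
    return bridges


print(parse_brctl_show(BRCTL_OUTPUT))  # {'ovirtmgmt': ['eth0', 'eth2']}
```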

Comment 10 Shruti Sampat 2013-07-08 12:13:13 UTC

Closing as NOTABUG, because I am unable to reproduce the issue.
