Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 917370

Summary: Nova: Race during nova boot bridge creation and MySQL query request caused 'Lost connection to MySQL server during query'.
Product: Red Hat OpenStack Reporter: Omri Hochman <ohochman>
Component: openstack-novaAssignee: Nikola Dipanov <ndipanov>
Status: CLOSED NOTABUG QA Contact: Omri Hochman <ohochman>
Severity: urgent Docs Contact:
Priority: high    
Version: 2.0 (Folsom)CC: eglynn, ndipanov, ohochman
Target Milestone: asyncKeywords: Triaged
Target Release: 2.1   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-04-09 17:07:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 917534, 920662, 1102484    
Bug Blocks:    
Attachments:
Description Flags
All_logs
none
reproduce_seconnd_time none

Description Omri Hochman 2013-03-03 16:32:42 UTC
Nova: Race during nova boot bridge creation and MySQL query request caused 'Lost connection to MySQL server during query'.


Description:
------------
I lost connection to the controller for 16 minutes (no ssh, no ping), 
Immediately when I sent the 'nova boot command' (of the first instance). 
The problem occurred between  16:00:40 - 16:16:08. from looking at the /var/log/secure, It seems that the last thing was: "ip addr del 10.35.160.11/24 brd 10.35.160", it occurred just before controller lost connection. 
  
The connection resumed after 16 minutes - from looking at the /etc/nova/network.log, It seems that during that time of bridge creation (ip addr del), there was MySQL query request that couldn't return and failed with  
time out after 16 minutes ('Lost connection to MySQL server during query'). 



Environment (1 controller and 2 compute-nodes) :
------------------------------------------------
openstack-nova-compute-2012.2.3-4.el6ost.noarch
openstack-nova-api-2012.2.3-4.el6ost.noarch
openstack-nova-scheduler-2012.2.3-4.el6ost.noarch
openstack-nova-novncproxy-0.4-3.el6ost.noarch
openstack-nova-network-2012.2.3-4.el6ost.noarch
openstack-nova-common-2012.2.3-4.el6ost.noarch
openstack-selinux-0.1.2-5.el6ost.noarch
kernel-2.6.32-358.el6.x86_64
libvirt-0.10.2-18.el6.x86_64


/var/log/secure  (Note: "ip addr del" - in - 16:00:40) :
------------------------------------------------------
Mar  3 16:00:39 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf ip link show dev br100
Mar  3 16:00:39 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf brctl addbr br100
Mar  3 16:00:39 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf brctl setfd br100 0
Mar  3 16:00:39 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf brctl stp br100 off
Mar  3 16:00:39 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf ip link set br100 up
Mar  3 16:00:39 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf brctl addif br100 eth2
Mar  3 16:00:40 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf ip route del default via 10.35.160.254
Mar  3 16:00:40 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf ip addr show dev eth2 scope global
Mar  3 16:00:40 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf ip addr del 10.35.160.11/24 brd 10.35.160
.255 scope global dev eth2
Mar  3 16:16:08 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf ip addr add 10.35.160.11/24 brd 10.35.160
.255 scope global dev br100
Mar  3 16:16:08 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf ip route add default via 10.35.160.254
Mar  3 16:16:08 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf iptables-save -c -t filter
Mar  3 16:16:08 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf iptables-restore -c
Mar  3 16:16:08 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf iptables-save -c -t mangle
Mar  3 16:16:08 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf iptables-restore -c
Mar  3 16:16:08 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf iptables-save -c -t nat
Mar  3 16:16:09 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf iptables-restore -c
Mar  3 16:16:09 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf sysctl -w net.ipv4.ip_forward=1
Mar  3 16:16:09 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf ip addr show dev br100 scope global
Mar  3 16:16:09 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf ip route show dev br100
Mar  3 16:16:09 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf ip route del default dev br100
Mar  3 16:16:09 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf ip addr del 10.35.160.11/24 brd 10.35.160.255 scope global dev br100
Mar  3 16:16:09 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf ip addr add 192.168.32.1/24 brd 192.168.32.255 dev br100
Mar  3 16:16:09 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf ip addr add 10.35.160.11/24 brd 10.35.160.255 scope global dev br100
Mar  3 16:16:10 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf ip route add default via 10.35.160.254
Mar  3 16:16:10 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf FLAGFILE=["/usr/share/nova/nova-dist.conf


From /etc/nova/nova.conf (note - mysql ipaddr): 
----------------------------------------------
sql_connection=mysql://nova:2be505df0205432b.160.11/nova
flat_interface=eth2
public_interface=eth2

From /etc/nova/network.log : 
----------------------------
2013-03-03 11:13:43 12932 INFO nova.openstack.common.rpc.impl_qpid [-] Connected to AMQP server on 10.35.160.11:5672
2013-03-03 16:00:38 12932 INFO nova.openstack.common.rpc.impl_qpid [-] Connected to AMQP server on 10.35.160.11:5672
2013-03-03 16:16:08 12932 ERROR nova.service [-] model server went away
2013-03-03 16:16:08 12932 TRACE nova.service Traceback (most recent call last):
2013-03-03 16:16:08 12932 TRACE nova.service   File "/usr/lib/python2.6/site-packages/nova/service.py", line 549, in report_state
2013-03-03 16:16:08 12932 TRACE nova.service     self.service_id, state_catalog)
2013-03-03 16:16:08 12932 TRACE nova.service   File "/usr/lib/python2.6/site-packages/nova/db/api.py", line 185, in service_update
2013-03-03 16:16:08 12932 TRACE nova.service     return IMPL.service_update(context, service_id, values)
2013-03-03 16:16:08 12932 TRACE nova.service   File "/usr/lib/python2.6/site-packages/nova/db/sqlalchemy/api.py", line 111, in wrapper
2013-03-03 16:16:08 12932 TRACE nova.service     return f(*args, **kwargs)
2013-03-03 16:16:08 12932 TRACE nova.service   File "/usr/lib/python2.6/site-packages/nova/db/sqlalchemy/api.py", line 484, in service_update
2013-03-03 16:16:08 12932 TRACE nova.service     service_ref.save(session=session)
2013-03-03 16:16:08 12932 TRACE nova.service   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.8-py2.6-linux-x86_64.egg/sqlalchemy/orm/session.py", line 402, in _
_exit__
2013-03-03 16:16:08 12932 TRACE nova.service     self.commit()
2013-03-03 16:16:08 12932 TRACE nova.service   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.8-py2.6-linux-x86_64.egg/sqlalchemy/orm/session.py", line 318, in c
ommit
2013-03-03 16:16:08 12932 TRACE nova.service     t[1].commit()
2013-03-03 16:16:08 12932 TRACE nova.service   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.8-py2.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 2046, in 
commit
2013-03-03 16:16:08 12932 TRACE nova.service     self._do_commit()
2013-03-03 16:16:08 12932 TRACE nova.service   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.8-py2.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 2076, in 
_do_commit
2013-03-03 16:16:08 12932 TRACE nova.service     self.connection._commit_impl()
2013-03-03 16:16:08 12932 TRACE nova.service   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.8-py2.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1280, in 
_commit_impl
2013-03-03 16:16:08 12932 TRACE nova.service     self._handle_dbapi_exception(e, None, None, None, None)
2013-03-03 16:16:08 12932 TRACE nova.service   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.8-py2.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1277, in 
_commit_impl
2013-03-03 16:16:08 12932 TRACE nova.service     self.engine.dialect.do_commit(self.connection)
2013-03-03 16:16:08 12932 TRACE nova.service   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.8-py2.6-linux-x86_64.egg/sqlalchemy/dialects/mysql/base.py", line 1
891, in do_commit
2013-03-03 16:16:08 12932 TRACE nova.service     connection.commit()
2013-03-03 16:16:08 12932 TRACE nova.service OperationalError: (OperationalError) (2013, 'Lost connection to MySQL server during query') None None
2013-03-03 16:16:08 12932 TRACE nova.service 



/var/log/message: 
-------------------
Mar  3 16:00:21 puma01 dnsmasq[13176]: read /etc/hosts - 2 addresses
Mar  3 16:00:21 puma01 dnsmasq[13176]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
Mar  3 16:00:21 puma01 dnsmasq[13176]: read /var/lib/libvirt/dnsmasq/default.hostsfile
Mar  3 16:00:22 puma01 kernel: lo: Disabled Privacy Extensions
Mar  3 16:00:40 puma01 kernel: device eth2 entered promiscuous mode
Mar  3 16:00:40 puma01 kernel: br100: port 1(eth2) entering forwarding state
Mar  3 16:01:41 puma01 qpidd[8478]: 2013-03-03 16:01:41 error Connection 10.35.160.11:5672-10.35.160.11:52170 timed out: closing
Mar  3 16:01:41 puma01 qpidd[8478]: 2013-03-03 16:01:41 error Connection 10.35.160.11:5672-10.35.160.11:52346 timed out: closing
Mar  3 16:01:44 puma01 qpidd[8478]: 2013-03-03 16:01:44 error Connection 10.35.160.11:5672-10.35.160.11:52518 timed out: closing
Mar  3 16:01:44 puma01 qpidd[8478]: 2013-03-03 16:01:44 error Connection 10.35.160.11:5672-10.35.160.11:52519 timed out: closing
Mar  3 16:01:50 puma01 qpidd[8478]: 2013-03-03 16:01:50 error Connection 10.35.160.11:5672-10.35.160.15:34734 timed out: closing
Mar  3 16:02:13 puma01 qpidd[8478]: 2013-03-03 16:02:13 error Connection 10.35.160.11:5672-10.35.160.11:52399 timed out: closing
Mar  3 16:02:23 puma01 qpidd[8478]: 2013-03-03 16:02:23 error Connection 10.35.160.11:5672-10.35.160.15:34735 timed out: closing
Mar  3 16:02:37 puma01 qpidd[8478]: 2013-03-03 16:02:37 error Connection 10.35.160.11:5672-10.35.160.11:52425 timed out: closing
Mar  3 16:02:37 puma01 qpidd[8478]: 2013-03-03 16:02:37 error Connection 10.35.160.11:5672-10.35.160.11:52343 timed out: closing
Mar  3 16:02:37 puma01 qpidd[8478]: 2013-03-03 16:02:37 error Connection 10.35.160.11:5672-10.35.160.11:55567 timed out: closing
Mar  3 16:02:37 puma01 qpidd[8478]: 2013-03-03 16:02:37 error Connection 10.35.160.11:5672-10.35.160.13:37004 timed out: closing
Mar  3 16:02:38 puma01 qpidd[8478]: 2013-03-03 16:02:38 error Connection 10.35.160.11:5672-10.35.160.13:37005 timed out: closing
Mar  3 16:02:38 puma01 qpidd[8478]: 2013-03-03 16:02:38 error Connection 10.35.160.11:5672-10.35.160.11:52348 timed out: closing
Mar  3 16:02:38 puma01 qpidd[8478]: 2013-03-03 16:02:38 error Connection 10.35.160.11:5672-10.35.160.11:55585 timed out: closing
Mar  3 16:02:39 puma01 qpidd[8478]: 2013-03-03 16:02:39 error Connection 
10.35.160.11:5672-10.35.160.11:52426 timed out: closing
Mar  3 16:02:39 puma01 qpidd[8478]: 2013-03-03 16:02:39 error Connection 10.35.160.11:5672-10.35.160.11:52168 timed out: closing
Mar  3 16:02:40 puma01 qpidd[8478]: 2013-03-03 16:02:40 error Connection 10.35.160.11:5672-10.35.160.11:52341 timed out: closing
Mar  3 16:02:40 puma01 qpidd[8478]: 2013-03-03 16:02:40 error Connection 10.35.160.11:5672-10.35.160.11:52344 timed out: closing
Mar  3 16:16:08 puma01 kernel: __ratelimit: 1086 callbacks suppressed
Mar  3 16:16:08 puma01 kernel: type=1400 audit(1362320168.564:50311): avc:  denied  { sys_admin } for  pid=16497 comm="iptables-save" capability=21  scontext=unconfined_u
:system_r:iptables_t:s0 tcontext=unconfined_u:system_r:iptables_t:s0 tclass=capability
Mar  3 16:16:08 puma01 kernel: type=1400 audit(1362320168.564:50312): avc:  denied  { sys_resource } for  pid=16497 comm="iptables-save" capability=24  scontext=unconfine
d_u:system_r:iptables_t:s0 tcontext=unconfined_u:system_r:iptables_t:s0 tclass=capability
Mar  3 16:16:08 puma01 kernel: type=1400 audit(1362320168.733:50313): avc:  denied  { sys_admin } for  pid=16501 comm="iptables-restor" capability=21  scontext=unconfined
_u:system_r:iptables_t:s0 tcontext=unconfined_u:system_r:iptables_t:s0 tclass=capability
Mar  3 16:16:08 puma01 kernel: type=1400 audit(1362320168.733:50314): avc:  denied  { sys_resource } for  pid=16501 comm="iptables-restor" capability=24  scontext=unconfi
ned_u:system_r:iptables_t:s0 tcontext=unconfined_u:system_r:iptables_t:s0 tclass=capability
Mar  3 16:16:09 puma01 kernel: type=1400 audit(1362320169.081:50315): avc:  denied  { sys_admin } for  pid=16511 comm="iptables-save" capability=21  scontext=unconfined_u
:system_r:iptables_t:s0 tcontext=unconfined_u:system_r:iptables_t:s0 tclass=capability
Mar  3 16:16:09 puma01 kernel: type=1400 audit(1362320169.081:50316): avc:  denied  { sys_resource } for  pid=16511 comm="iptables-save" capability=24  scontext=unconfine
d_u:system_r:iptables_t:s0 tcontext=unconfined_u:system_r:iptables_t:s0 tclass=capability
Mar  3 16:16:09 puma01 kernel: type=1400 audit(1362320169.198:50317): avc:  denied  { sys_admin } for  pid=16515 comm="iptables-restor" capability=21  scontext=unconfined
_u:system_r:iptables_t:s0 tcontext=unconfined_u:system_r:iptables_t:s0 tclass=capability
Mar  3 16:16:09 puma01 kernel: type=1400 audit(1362320169.199:50318): avc:  denied  { sys_resource } for  pid=16515 comm="iptables-restor" capability=24  scontext=unconfi
ned_u:system_r:iptables_t:s0 tcontext=unconfined_u:system_r:iptables_t:s0 tclass=capability
Mar  3 16:16:11 puma01 dnsmasq[16552]: started, version 2.48 cachesize 150
Mar  3 16:16:11 puma01 dnsmasq[16552]: compile time options: IPv6 GNU-getopt DBus no-I18N DHCP TFTP "--bind-interfaces with SO_BINDTODEVICE"
Mar  3 16:16:11 puma01 dnsmasq-dhcp[16552]: DHCP, static leases only on 192.168.32.2, lease time 2m
Mar  3 16:16:11 puma01 dnsmasq[16552]: reading /etc/resolv.conf
Mar  3 16:16:11 puma01 dnsmasq[16552]: using nameserver 10.34.32.3#53
Mar  3 16:16:11 puma01 dnsmasq[16552]: using nameserver 10.35.28.1#53
Mar  3 16:16:11 puma01 dnsmasq[16552]: using nameserver 10.35.28.28#53
Mar  3 16:16:11 puma01 dnsmasq[16552]: read /etc/hosts - 2 addresses
Mar  3 16:16:11 puma01 dnsmasq[16552]: read /var/lib/nova/networks/nova-br100.conf
Mar  3 16:16:11 puma01 kernel: type=1400 audit(1362320171.793:50319): avc:  denied  { sys_admin } for  pid=16557 comm="iptables-save" capability=21  scontext=unconfined_u:system_r:iptables_t:s0 tcontext=unconfined_u:system_r:iptables_t:s0 tclass=capability
Mar  3 16:16:11 puma01 kernel: type=1400 audit(1362320171.793:50320): avc:  denied  { sys_resource } for  pid=16557 comm="iptables-save" capability=24  scontext=unconfined_u:system_r:iptables_t:s0 tcontext=unconfined_u:system_r:iptables_t:s0 tclass=capability
Mar  3 16:17:25 puma01 dnsmasq[16552]: read /etc/hosts - 2 addresses
Mar  3 16:17:25 puma01 dnsmasq[16552]: read /var/lib/nova/networks/nova-br100.conf
Mar  3 16:17:25 puma01 kernel: __ratelimit: 12 callbacks suppressed
Mar  3 16:17:25 puma01 kernel: type=1400 audit(1362320245.672:50327): avc:  denied  { sys_admin } for  pid=16594 comm="iptables-save" capability=21  scontext=unconfined_u:system_r:iptables_t:s0 tcontext=unconfined_u:system_r:iptables_t:s0 tclass=capability
Mar  3 16:17:25 puma01 kernel: type=1400 audit(1362320245.672:50328): avc:  denied  { sys_resource } for  pid=16594 comm="iptables-save" capability=24  scontext=unconfined_u:system_r:iptables_t:s0 tcontext=unconfined_u:system_r:iptables_t:s0 tclass=capability
Mar  3 16:17:25 puma01 kernel: type=1400 audit(1362320245.794:50329): avc:  denied  { sys_admin } for  pid=16598 comm="iptables-restor" capability=21  scontext=unconfined_u:system_r:iptables_t:s0 tcontext=unconfined_u:system_r:iptables_t:s0 tclass=capability
Mar  3 16:17:25 puma01 kernel: type=1400 audit(1362320245.794:50330): avc:  denied  { sys_resource } for  pid=16598 comm="iptables-restor" capability=24  scontext=unconfined_u:system_r:iptables_t:s0 tcontext=unconfined_u:system_r:iptables_t:s0 tclass=capability
Mar  3 16:17:26 puma01 kernel: type=1400 audit(1362320246.207:50331): avc:  denied  { sys_admin } for  pid=16608 comm="iptables-save" capability=21  scontext=unconfined_u:system_r:iptables_t:s0 tcontext=unconfined_u:system_r:iptables_t:s0 tclass=capability
Mar  3 16:17:26 puma01 kernel: type=1400 audit(1362320246.207:50332): avc:  denied  { sys_resource } for  pid=16608 comm="iptables-save" capability=24  scontext=unconfined_u:system_r:iptables_t:s0 tcontext=unconfined_u:system_r:iptables_t:s0 tclass=capability
Mar  3 16:17:26 puma01 kernel: type=1400 audit(1362320246.325:50333): avc:  denied  { sys_admin } for  pid=16612 comm="iptables-restor" capability=21  scontext=unconfined_u:system_r:iptables_t:s0 tcontext=unconfined_u:system_r:iptables_t:s0 tclass=capability
Mar  3 16:17:26 puma01 kernel: type=1400 audit(1362320246.325:50334): avc:  denied  { sys_resource } for  pid=16612 comm="iptables-restor" capability=24  scontext=unconfined_u:system_r:iptables_t:s0 tcontext=unconfined_u:system_r:iptables_t:s0 tclass=capability
Mar  3 16:17:26 puma01 dnsmasq-dhcp[16552]: DHCPRELEASE(br100) 192.168.32.2 fa:16:3e:54:3d:1d unknown lease
Mar  3 16:17:28 puma01 kernel: type=1400 audit(1362320248.220:50335): avc:  denied  { sys_admin } for  pid=16645 comm="iptables-save" capability=21  scontext=unconfined_u:system_r:iptables_t:s0 tcontext=unconfined_u:system_r:iptables_t:s0 tclass=capability
Mar  3 16:17:28 puma01 kernel: type=1400 audit(1362320248.220:50336): avc:  denied  { sys_resource } for  pid=16645 comm="iptables-save" capability=24  scontext=unconfined_u:system_r:iptables_t:s0 tcontext=unconfined_u:system_r:iptables_t:s0 tclass=capability
Mar  3 16:17:29 puma01 dnsmasq[16552]: read /etc/hosts - 2 addresses
Mar  3 16:17:29 puma01 dnsmasq[16552]: read /var/lib/nova/networks/nova-br100.conf
Mar  3 16:18:18 puma01 kernel: __ratelimit: 28 callbacks suppressed
Mar  3 16:18:18 puma01 kernel: type=1400 audit(1362320298.905:50351): avc:  denied  { sys_admin } for  pid=16745 comm="qemu-kvm" capability=21  scontext=unconfined_u:system_r:qemu_t:s0-s0:c0.c1023 tcontext=unconfined_u:system_r:qemu_t:s0-s0:c0.c1023 tclass=capability
Mar  3 16:18:18 puma01 kernel: type=1400 audit(1362320298.905:50352): avc:  denied  { sys_resource } for  pid=16745 comm="qemu-kvm" capability=24  scontext=unconfined_u:system_r:qemu_t:s0-s0:c0.c1023 tcontext=unconfined_u:system_r:qemu_t:s0-s0:c0.c1023 tclass=capability
Mar  3 16:18:18 puma01 kernel: type=1400 audit(1362320298.956:50353): avc:  denied  { sys_admin } for  pid=16745 comm="qemu-kvm" capability=21  scontext=unconfined_u:system_r:qemu_t:s0-s0:c0.c1023 tcontext=unconfined_u:system_r:qemu_t:s0-s0:c0.c1023 tclass=capability
Mar  3 16:18:18 puma01 kernel: type=1400 audit(1362320298.956:50354): avc:  denied  { sys_resource } for  pid=16745 comm="qemu-kvm" capability=24  scontext=unconfined_u:system_r:qemu_t:s0-s0:c0.c1023 tcontext=unconfined_u:system_r:qemu_t:s0-s0:c0.c1023 tclass=capability
Mar  3 16:18:20 puma01 kernel: kvm: 16745: cpu0 disabled perfctr wrmsr: 0xc1 data 0xabcd
Mar  3 16:18:20 puma01 kernel: type=1400 audit(1362320300.938:50355): avc:  denied  { sys_admin } for  pid=16748 comm="qemu-kvm" capability=21  scontext=unconfined_u:system_r:qemu_t:s0-s0:c0.c1023 tcontext=unconfined_u:system_r:qemu_t:s0-s0:c0.c1023 tclass=capability
Mar  3 16:18:20 puma01 kernel: type=1400 audit(1362320300.938:50356): avc:  denied  { sys_resource } for  pid=16748 comm="qemu-kvm" capability=24  scontext=unconfined_u:system_r:qemu_t:s0-s0:c0.c1023 tcontext=unconfined_u:system_r:qemu_t:s0-s0:c0.c1023 tclass=capability
Mar  3 16:18:21 puma01 kernel: type=1400 audit(1362320301.219:50357): avc:  denied  { sys_admin } for  pid=16748 comm="qemu-kvm" capability=21  scontext=unconfined_u:system_r:qemu_t:s0-s0:c0.c1023 tcontext=unconfined_u:system_r:qemu_t:s0-s0:c0.c1023 tclass=capability
Mar  3 16:18:21 puma01 kernel: type=1400 audit(1362320301.219:50358): avc:  denied  { sys_resource } for  pid=16748 comm="qemu-kvm" capability=24  scontext=unconfined_u:system_r:qemu_t:s0-s0:c0.c1023 tcontext=unconfined_u:system_r:qemu_t:s0-s0:c0.c1023 tclass=capability
Mar  3 16:18:24 puma01 kernel: fuse init (API version 7.13)
Mar  3 16:18:27 puma01 kernel: type=1400 audit(1362320307.029:50359): avc:  denied  { sys_admin } for  pid=16745 comm="qemu-kvm" capability=21  scontext=unconfined_u:system_r:qemu_t:s0-s0:c0.c1023 tcontext=unconfined_u:system_r:qemu_t:s0-s0:c0.c1023 tclass=capability
Mar  3 16:18:27 puma01 kernel: type=1400 audit(1362320307.029:50360): avc:  denied  { sys_resource } for  pid=16745 comm="qemu-kvm" capability=24  scontext=unconfined_u:system_r:qemu_t:s0-s0:c0.c1023 tcontext=unconfined_u:system_r:qemu_t:s0-s0:c0.c1023 tclass=capability
Mar  3 16:18:28 puma01 kernel: type=1400 audit(1362320308.130:50361): avc:  denied  { sys_admin } for  pid=16839 comm="iptables-save" capability=21  scontext=unconfined_u:system_r:iptables_t:s0-s0:c0.c1023 tcontext=unconfined_u:system_r:iptables_t:s0-s0:c0.c1023 tclass=capability
Mar  3 16:18:28 puma01 kernel: type=1400 audit(1362320308.130:50362): avc:  denied  { sys_resource } for  pid=16839 comm="iptables-save" capability=24  scontext=unconfined_u:system_r:iptables_t:s0-s0:c0.c1023 tcontext=unconfined_u:system_r:iptables_t:s0-s0:c0.c1023 tclass=capability
Mar  3 16:18:28 puma01 kernel: type=1400 audit(1362320308.264:50363): avc:  denied  { sys_admin } for  pid=16843 comm="iptables-restor" capability=21  scontext=unconfined_u:system_r:iptables_t:s0-s0:c0.c1023 tcontext=unconfined_u:system_r:iptables_t:s0-s0:c0.c1023 tclass=capability
Mar  3 16:18:28 puma01 kernel: type=1400 audit(1362320308.264:50364): avc:  denied  { sys_resource } for  pid=16843 comm="iptables-restor" capability=24  scontext=unconfined_u:system_r:iptables_t:s0-s0:c0.c1023 tcontext=unconfined_u:system_r:iptables_t:s0-s0:c0.c1023 tclass=capability
Mar  3 16:18:28 puma01 kernel: type=1400 audit(1362320308.624:50365): avc:  denied  { sys_admin } for  pid=16853 comm="iptables-save" capability=21  scontext=unconfined_u:system_r:iptables_t:s0-s0:c0.c1023 tcontext=unconfined_u:system_r:iptables_t:s0-s0:c0.c1023 tclass=capability
Mar  3 16:18:28 puma01 kernel: type=1400 audit(1362320308.624:50366): avc:  denied  { sys_resource } for  pid=16853 comm="iptables-save" capability=24  scontext=unconfined_u:system_r:iptables_t:s0-s0:c0.c1023 tcontext=unconfined_u:system_r:iptables_t:s0-s0:c0.c1023 tclass=capability
Mar  3 16:18:28 puma01 kernel: type=1400 audit(1362320308.744:50367): avc:  denied  { sys_admin } for  pid=16857 comm="iptables-restor" capability=21  scontext=unconfined_u:system_r:iptables_t:s0-s0:c0.c1023 tcontext=unconfined_u:system_r:iptables_t:s0-s0:c0.c1023 tclass=capability
Mar  3 16:18:28 puma01 kernel: type=1400 audit(1362320308.744:50368): avc:  denied  { sys_resource } for  pid=16857 comm="iptables-restor" capability=24  scontext=unconfined_u:system_r:iptables_t:s0-s0:c0.c1023 tcontext=unconfined_u:system_r:iptables_t:s0-s0:c0.c1023 tclass=capability
Mar  3 16:18:30 puma01 kernel: device vnet0 entered promiscuous mode
Mar  3 16:18:30 puma01 kernel: br100: port 2(vnet0) entering forwarding state
Mar  3 16:18:46 puma01 dnsmasq-dhcp[16552]: DHCPDISCOVER(br100) fa:16:3e:37:f6:f0 
Mar  3 16:18:46 puma01 dnsmasq-dhcp[16552]: DHCPOFFER(br100) 192.168.32.3 fa:16:3e:37:f6:f0 
Mar  3 16:18:46 puma01 dnsmasq-dhcp[16552]: DHCPREQUEST(br100) 192.168.32.3 fa:16:3e:37:f6:f0

Comment 2 Omri Hochman 2013-03-03 16:38:31 UTC
Created attachment 704577 [details]
All_logs

Comment 3 Russell Bryant 2013-03-06 22:50:49 UTC
Let's get the SELinux issues shown in your log sorted first to make sure that didn't have anything to do with the problem originally described.  The SELinux part looks like the same as bug 917534.

Comment 4 Lon Hohberger 2013-03-08 22:14:56 UTC
Note that in bug 917534, Dan and Eric point out that those particular AVCs are not generated by SELinux policy - the vfork system call ran out of some sort of resources - likely a ulimit is too low at some level.

Comment 5 Omri Hochman 2013-03-10 09:49:43 UTC
(In reply to comment #3)
> Let's get the SELinux issues shown in your log sorted first to make sure
> that didn't have anything to do with the problem originally described.  The
> SELinux part looks like the same as bug 917534.

Just to note, that I already encountered the SELinux issue in other occasion,  and It had no such an impact as described in this bug.

Comment 6 Lon Hohberger 2013-03-11 15:28:47 UTC
Yes, I'm pretty sure what's happening is that all of the processes created by dnsmasq have real UID of 'nobody' (and effective UID of root), so they're hitting the ulimit for the 'nobody' UID.

We probably need to increase the ulimit for number of processes.

Comment 7 Lon Hohberger 2013-03-11 18:40:07 UTC
The AVC parts we'll keep working on specifically in bug 917534 - the rest of the bug should be addressed separately.

Comment 8 Nikola Dipanov 2013-03-11 19:39:53 UTC
Omri,

keeping in mind what Lon explains in bug 917534 comment 15, would it be possible to try to make sure the MySQL connectivity issue never happens with proper rlimits set for the user, or better yet trying to artificially produce it now that we have what we think is the cause, so we can have a better chance of understanding how they are related if at all.

Moving this to async and adding a needinfo to try and reproduce with the information we have.

Comment 9 Lon Hohberger 2013-03-12 18:06:41 UTC
You should be able to reproduce the SELinux AVCs more reliably by setting an artificially low rlimit for the nova user.

Comment 11 Omri Hochman 2013-03-17 13:46:50 UTC
(In reply to comment #8)
> Omri,
> 
> keeping in mind what Lon explains in bug 917534 comment 15, would it be
> possible to try to make sure the MySQL connectivity issue never happens with
> proper rlimits set for the user, or better yet trying to artificially
> produce it now that we have what we think is the cause, so we can have a
> better chance of understanding how they are related if at all.
> 
> Moving this to async and adding a needinfo to try and reproduce with the
> information we have.


I was trying to reproduce artificially and failed, but Then when I've updated to newer openstack build and launch the first VMs the issue reproduced :

Again, just after booting the first instances I lost networking with the controller (no ping/ssh), this time the networking didn't recovered after 16 minutes, It took more than that, but eventually the connection resumed.

Environment:
-----------
openstack-nova-cert-2012.2.3-7.el6ost.noarch
openstack-nova-network-2012.2.3-7.el6ost.noarch
openstack-nova-compute-2012.2.3-7.el6ost.noarch
openstack-selinux-0.1.2-7.el6ost.noarch

console.log  (16 minutes until 'heartbeat timeout')
------------
2013-03-17 12:57:53 AUDIT nova.compute.manager [req-8edf7bec-090e-4718-a6bb-d174b22a96d8 00c37f45630747b28d60645d73ed1988 2b409314aff74e2aa014209afa07aa3c] [instance: c92
a3757-fef6-4cca-8084-6b747ef9aedf] Starting instance...
2013-03-17 12:57:53 AUDIT nova.compute.resource_tracker [req-8edf7bec-090e-4718-a6bb-d174b22a96d8 00c37f45630747b28d60645d73ed1988 2b409314aff74e2aa014209afa07aa3c] Attem
pting claim: memory 512 MB, disk 0 GB, VCPUs 1
2013-03-17 12:57:53 AUDIT nova.compute.resource_tracker [req-8edf7bec-090e-4718-a6bb-d174b22a96d8 00c37f45630747b28d60645d73ed1988 2b409314aff74e2aa014209afa07aa3c] Total
 memory: 32099 MB, used: 512 MB, free: 31587 MB
2013-03-17 12:57:53 AUDIT nova.compute.resource_tracker [req-8edf7bec-090e-4718-a6bb-d174b22a96d8 00c37f45630747b28d60645d73ed1988 2b409314aff74e2aa014209afa07aa3c] Memor
y limit: 48148 MB, free: 47636 MB
2013-03-17 12:57:53 AUDIT nova.compute.resource_tracker [req-8edf7bec-090e-4718-a6bb-d174b22a96d8 00c37f45630747b28d60645d73ed1988 2b409314aff74e2aa014209afa07aa3c] Total
 disk: 442 GB, used: 0 GB, free: 442 GB
2013-03-17 12:57:53 AUDIT nova.compute.resource_tracker [req-8edf7bec-090e-4718-a6bb-d174b22a96d8 00c37f45630747b28d60645d73ed1988 2b409314aff74e2aa014209afa07aa3c] Disk 
limit not specified, defaulting to unlimited
2013-03-17 12:57:53 AUDIT nova.compute.resource_tracker [req-8edf7bec-090e-4718-a6bb-d174b22a96d8 00c37f45630747b28d60645d73ed1988 2b409314aff74e2aa014209afa07aa3c] Total
 VCPUs: 12, used: 0
2013-03-17 12:57:53 AUDIT nova.compute.resource_tracker [req-8edf7bec-090e-4718-a6bb-d174b22a96d8 00c37f45630747b28d60645d73ed1988 2b409314aff74e2aa014209afa07aa3c] CPU l
imit: 192
2013-03-17 13:13:22 INFO nova.compute.resource_tracker [req-8edf7bec-090e-4718-a6bb-d174b22a96d8 00c37f45630747b28d60645d73ed1988 2b409314aff74e2aa014209afa07aa3c] Aborti
ng claim: [Claim c92a3757-fef6-4cca-8084-6b747ef9aedf: 512 MB memory, 0 GB disk, 1 VCPUS]
2013-03-17 13:13:22 12596 ERROR nova.openstack.common.rpc.impl_qpid [-] Failed to declare consumer for topic '2d7905592ad849d282020404983d3f91': heartbeat timeout
2013-03-17 13:13:22 12596 ERROR nova.openstack.common.rpc.impl_qpid [-] Unable to connect to AMQP server: [Errno 101] ENETUNREACH. Sleeping 1 seconds
2013-03-17 13:13:22 12596 ERROR nova.service [-] model server went away
2013-03-17 13:13:22 12596 TRACE nova.service Traceback (most recent call last):
2013-03-17 13:13:22 12596 TRACE nova.service   File "/usr/lib/python2.6/site-packages/nova/service.py", line 544, in report_state
2013-03-17 13:13:22 12596 TRACE nova.service     service_ref = db.service_get(ctxt, self.service_id)
2013-03-17 13:13:22 12596 TRACE nova.service   File "/usr/lib/python2.6/site-packages/nova/db/api.py", line 123, in service_get
2013-03-17 13:13:22 12596 TRACE nova.service     return IMPL.service_get(context, service_id)
2013-03-17 13:13:22 12596 TRACE nova.service   File "/usr/lib/python2.6/site-packages/nova/db/sqlalchemy/api.py", line 111, in wrapper
2013-03-17 13:13:22 12596 TRACE nova.service     return f(*args, **kwargs)
2013-03-17 13:13:22 12596 TRACE nova.service   File "/usr/lib/python2.6/site-packages/nova/db/sqlalchemy/api.py", line 343, in service_get
2013-03-17 13:13:22 12596 TRACE nova.service     filter_by(id=service_id).\
2013-03-17 13:13:22 12596 TRACE nova.service   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.8-py2.6-linux-x86_64.egg/sqlalchemy/orm/query.py", line 2156, in first
2013-03-17 13:13:22 12596 TRACE nova.service     ret = list(self[0:1])
2013-03-17 13:13:22 12596 TRACE nova.service   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.8-py2.6-linux-x86_64.egg/sqlalchemy/orm/query.py", line 2023, in __getitem__
2013-03-17 13:13:22 12596 TRACE nova.service     return list(res)
2013-03-17 13:13:22 12596 TRACE nova.service   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.8-py2.6-linux-x86_64.egg/sqlalchemy/orm/query.py", line 2227, in __iter__
2013-03-17 13:13:22 12596 TRACE nova.service     return self._execute_and_instances(context)
2013-03-17 13:13:22 12596 TRACE nova.service   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.8-py2.6-linux-x86_64.egg/sqlalchemy/orm/query.py", line 2240, in _execute_and_instances
2013-03-17 13:13:22 12596 TRACE nova.service     close_with_result=True)
2013-03-17 13:13:22 12596 TRACE nova.service   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.8-py2.6-linux-x86_64.egg/sqlalchemy/orm/query.py", line 2231, in _connection_from_session
2013-03-17 13:13:22 12596 TRACE nova.service     **kw)
2013-03-17 13:13:22 12596 TRACE nova.service   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.8-py2.6-linux-x86_64.egg/sqlalchemy/orm/session.py", line 730, in connection
2013-03-17 13:13:22 12596 TRACE nova.service     close_with_result=close_with_result)
2013-03-17 13:13:22 12596 TRACE nova.service   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.8-py2.6-linux-x86_64.egg/sqlalchemy/orm/session.py", line 736, in _connection_for_bind
2013-03-17 13:13:22 12596 TRACE nova.service     return engine.contextual_connect(**kwargs)
2013-03-17 13:13:22 12596 TRACE nova.service   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.8-py2.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 2490, in contextual_connect
2013-03-17 13:13:22 12596 TRACE nova.service     self.pool.connect(),
2013-03-17 13:13:22 12596 TRACE nova.service   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.8-py2.6-linux-x86_64.egg/sqlalchemy/pool.py", line 224, in connect
2013-03-17 13:13:22 12596 TRACE nova.service     return _ConnectionFairy(self).checkout()
2013-03-17 13:13:22 12596 TRACE nova.service   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.8-py2.6-linux-x86_64.egg/sqlalchemy/pool.py", line 387, in __init__
2013-03-17 13:13:22 12596 TRACE nova.service     rec = self._connection_record = pool._do_get()2013-03-17 13:13:22 12596 TRACE nova.service   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.8-py2.6-linux-x86_64.egg/sqlalchemy/pool.py", line 741, in _do_get
2013-03-17 13:13:22 12596 TRACE nova.service     con = self._create_connection()
2013-03-17 13:13:22 12596 TRACE nova.service   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.8-py2.6-linux-x86_64.egg/sqlalchemy/pool.py", line 188, in _create_connection
2013-03-17 13:13:22 12596 TRACE nova.service     return _ConnectionRecord(self)
2013-03-17 13:13:22 12596 TRACE nova.service   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.8-py2.6-linux-x86_64.egg/sqlalchemy/pool.py", line 270, in __init__
2013-03-17 13:13:22 12596 TRACE nova.service     self.connection = self.__connect()
2013-03-17 13:13:22 12596 TRACE nova.service   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.8-py2.6-linux-x86_64.egg/sqlalchemy/pool.py", line 330, in __connect
2013-03-17 13:13:22 12596 TRACE nova.service     connection = self.__pool._creator()
2013-03-17 13:13:22 12596 TRACE nova.service   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.8-py2.6-linux-x86_64.egg/sqlalchemy/engine/strategies.py", line 80, in connect
2013-03-17 13:13:22 12596 TRACE nova.service     return dialect.connect(*cargs, **cparams)
2013-03-17 13:13:22 12596 TRACE nova.service   File "/usr/lib64/python2.6/site-packages/SQLAlchemy-0.7.8-py2.6-linux-x86_64.egg/sqlalchemy/engine/default.py", line 281, in connect
2013-03-17 13:13:22 12596 TRACE nova.service     return self.dbapi.connect(*cargs, **cparams)
2013-03-17 13:13:22 12596 TRACE nova.service   File "/usr/lib64/python2.6/site-packages/MySQLdb/__init__.py", line 81, in Connect
2013-03-17 13:13:22 12596 TRACE nova.service     return Connection(*args, **kwargs)
2013-03-17 13:13:22 12596 TRACE nova.service   File "/usr/lib64/python2.6/site-packages/MySQLdb/connections.py", line 187, in __init__
2013-03-17 13:13:22 12596 TRACE nova.service     super(Connection, self).__init__(*args, **kwargs2)
2013-03-17 13:13:22 12596 TRACE nova.service OperationalError: (OperationalError) (2003, "Can't connect to MySQL server on '10.35.160.11' (101)") None None
2013-03-17 13:13:22 12596 TRACE nova.service 
2013-03-17 13:13:22 12596 ERROR nova.openstack.common.rpc.impl_qpid [-] Unable to connect to AMQP server: [Errno 101] ENETUNREACH. Sleeping 1 seconds
2013-03-17 13:13:22 12596 ERROR nova.openstack.common.rpc.impl_qpid [-] Failed to consume message from queue: heartbeat timeout

/var/log/secure:
-----------------
Mar 17 12:10:27 puma01 sshd[5715]: pam_unix(sshd:session): session closed for user root
Mar 17 12:57:52 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf ip link show dev br100
Mar 17 12:57:52 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf brctl addbr br100
Mar 17 12:57:52 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf brctl setfd br100 0
Mar 17 12:57:53 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf brctl stp br100 off
Mar 17 12:57:53 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf ip link set br100 up
Mar 17 12:57:53 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf brctl addif br100 eth2
Mar 17 12:57:53 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf ip route del default via 10.35.160.254
Mar 17 12:57:53 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf ip addr show dev eth2 scope global
Mar 17 12:57:53 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf ip addr del 10.35.160.11/24 brd 10.35.160.255 scope global dev eth2
Mar 17 13:33:18 puma01 login: pam_unix(login:session): session opened for user root by (uid=0)
Mar 17 13:33:18 puma01 login: DIALUP AT ttyS0 BY root
Mar 17 13:33:18 puma01 login: ROOT LOGIN ON ttyS0
Mar 17 14:37:29 puma01 sshd[6013]: pam_unix(sshd:session): session closed for user root
Mar 17 15:09:08 puma01 sshd[5862]: pam_unix(sshd:session): session closed for user root
Mar 17 15:09:08 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf ip addr add 10.35.160.11/24 brd 10.35.160.255 scope global dev br100
Mar 17 15:09:08 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf ip route add default via 10.35.160.254


/var/log/messages:
------------------
 Mar 17 12:12:25 puma01 kernel: lo: Disabled Privacy Extensions
Mar 17 12:39:34 puma01 puppet-agent[5577]: Finished catalog run in 0.10 seconds
Mar 17 12:57:53 puma01 kernel: device eth2 entered promiscuous mode
Mar 17 12:57:53 puma01 kernel: br100: port 1(eth2) entering forwarding state
Mar 17 12:59:27 puma01 qpidd[8323]: 2013-03-17 12:59:27 error Connection 10.35.160.11:5672-10.35.160.11:43394 timed out: closing
Mar 17 12:59:28 puma01 qpidd[8323]: 2013-03-17 12:59:28 error Connection 10.35.160.11:5672-10.35.160.11:43396 timed out: closing
Mar 17 12:59:47 puma01 qpidd[8323]: 2013-03-17 12:59:47 error Connection 10.35.160.11:5672-10.35.160.11:43602 timed out: closing
Mar 17 12:59:47 puma01 qpidd[8323]: 2013-03-17 12:59:47 error Connection 10.35.160.11:5672-10.35.160.11:43606 timed out: closing
Mar 17 12:59:51 puma01 qpidd[8323]: 2013-03-17 12:59:51 error Connection 10.35.160.11:5672-10.35.160.13:43825 timed out: closing
Mar 17 12:59:52 puma01 qpidd[8323]: 2013-03-17 12:59:52 error Connection 10.35.160.11:5672-10.35.160.13:43826 timed out: closing
Mar 17 12:59:52 puma01 qpidd[8323]: 2013-03-17 12:59:52 error Connection 10.35.160.11:5672-10.35.160.11:43968 timed out: closing
Mar 17 12:59:52 puma01 qpidd[8323]: 2013-03-17 12:59:52 error Connection 10.35.160.11:5672-10.35.160.11:43662 timed out: closing
Mar 17 12:59:52 puma01 qpidd[8323]: 2013-03-17 12:59:52 error Connection 10.35.160.11:5672-10.35.160.11:43618 timed out: closing
Mar 17 12:59:52 puma01 qpidd[8323]: 2013-03-17 12:59:52 error Connection 10.35.160.11:5672-10.35.160.11:43963 timed out: closing
Mar 17 12:59:52 puma01 qpidd[8323]: 2013-03-17 12:59:52 error Connection 10.35.160.11:5672-10.35.160.11:43940 timed out: closing
Mar 17 12:59:52 puma01 qpidd[8323]: 2013-03-17 12:59:52 error Connection 10.35.160.11:5672-10.35.160.13:43846 timed out: closing
Mar 17 12:59:53 puma01 qpidd[8323]: 2013-03-17 12:59:53 error Connection 10.35.160.11:5672-10.35.160.11:43604 timed out: closing
Mar 17 12:59:53 puma01 qpidd[8323]: 2013-03-17 12:59:53 error Connection 10.35.160.11:5672-10.35.160.11:43943 timed out: closing
Mar 17 12:59:53 puma01 qpidd[8323]: 2013-03-17 12:59:53 error Connection 10.35.160.11:5672-10.35.160.11:43609 timed out: closing
Mar 17 13:09:34 puma01 puppet-agent[5577]: (/File[/var/lib/puppet/lib]) Failed to generate additional resources using 'eval_generate': getaddrinfo: Name or service not known
Mar 17 13:09:34 puma01 puppet-agent[5577]: (/File[/var/lib/puppet/lib]) Could not evaluate: getaddrinfo: Name or service not known Could not retrieve file metadata for pu
ppet://theforeman.eng.lab.tlv.redhat.com/plugins: getaddrinfo: Name or service not known
Mar 17 13:09:35 puma01 puppet-agent[5577]: Could not retrieve catalog from remote server: getaddrinfo: Name or service not known
Mar 17 13:09:35 puma01 puppet-agent[5577]: Using cached catalog
Mar 17 13:09:35 puma01 puppet-agent[5577]: Could not retrieve catalog; skipping run
Mar 17 13:09:35 puma01 puppet-agent[5577]: Could not send report: getaddrinfo: Name or service not known
Mar 17 13:39:35 puma01 puppet-agent[5577]: (/File[/var/lib/puppet/lib]) Failed to generate additional resources using 'eval_generate': getaddrinfo: Name or service not kn
own
Mar 17 13:39:35 puma01 puppet-agent[5577]: (/File[/var/lib/puppet/lib]) Could not evaluate: getaddrinfo: Name or service not known Could not retrieve file metadata for pu
ppet://theforeman.eng.lab.tlv.redhat.com/plugins: getaddrinfo: Name or service not known
Mar 17 13:39:36 puma01 puppet-agent[5577]: Could not retrieve catalog from remote server: getaddrinfo: Name or service not known
Mar 17 13:39:36 puma01 puppet-agent[5577]: Using cached catalog
Mar 17 13:39:36 puma01 puppet-agent[5577]: Could not retrieve catalog; skipping run
Mar 17 13:39:36 puma01 puppet-agent[5577]: Could not send report: getaddrinfo: Name or service not known
Mar 17 14:09:36 puma01 puppet-agent[5577]: (/File[/var/lib/puppet/lib]) Failed to generate additional resources using 'eval_generate': getaddrinfo: Name or service not kn
own
Mar 17 14:09:36 puma01 puppet-agent[5577]: (/File[/var/lib/puppet/lib]) Could not evaluate: getaddrinfo: Name or service not known Could not retrieve file metadata for pu
ppet://theforeman.eng.lab.tlv.redhat.com/plugins: getaddrinfo: Name or service not known
Mar 17 14:09:37 puma01 puppet-agent[5577]: Could not retrieve catalog from remote server: getaddrinfo: Name or service not known
Mar 17 14:09:37 puma01 puppet-agent[5577]: Using cached catalog
Mar 17 14:09:37 puma01 puppet-agent[5577]: Could not retrieve catalog; skipping run
Mar 17 14:09:37 puma01 puppet-agent[5577]: Could not send report: getaddrinfo: Name or service not known
Mar 17 14:39:38 puma01 puppet-agent[5577]: (/File[/var/lib/puppet/lib]) Failed to generate additional resources using 'eval_generate': getaddrinfo: Name or service not known
Mar 17 14:39:38 puma01 puppet-agent[5577]: (/File[/var/lib/puppet/lib]) Could not evaluate: getaddrinfo: Name or service not known Could not retrieve file metadata for puppet://theforeman.eng.lab.tlv.redhat.com/plugins: getaddrinfo: Name or service not known
Mar 17 14:39:39 puma01 puppet-agent[5577]: Could not retrieve catalog from remote server: getaddrinfo: Name or service not known
Mar 17 14:39:39 puma01 puppet-agent[5577]: Using cached catalog
Mar 17 14:39:39 puma01 puppet-agent[5577]: Could not retrieve catalog; skipping runMar 17 14:39:39 puma01 puppet-agent[5577]: Could not send report: getaddrinfo: Name or service not known
Mar 17 15:09:11 puma01 dnsmasq[7638]: started, version 2.48 cachesize 150
Mar 17 15:09:11 puma01 dnsmasq[7638]: compile time options: IPv6 GNU-getopt DBus no-I18N DHCP TFTP "--bind-interfaces with SO_BINDTODEVICE"
Mar 17 15:09:11 puma01 dnsmasq-dhcp[7638]: DHCP, static leases only on 192.168.36.2, lease time 2m
Mar 17 15:09:11 puma01 dnsmasq[7638]: reading /etc/resolv.conf
Mar 17 15:09:11 puma01 dnsmasq[7638]: using nameserver 10.34.32.3#53
Mar 17 15:09:11 puma01 dnsmasq[7638]: using nameserver 10.35.28.1#53
Mar 17 15:09:11 puma01 dnsmasq[7638]: using nameserver 10.35.28.28#53
Mar 17 15:09:11 puma01 dnsmasq[7638]: read /etc/hosts - 2 addresses
Mar 17 15:09:11 puma01 dnsmasq[7638]: read /var/lib/nova/networks/nova-br100.conf
Mar 17 15:09:47 puma01 puppet-agent[5577]: Finished catalog run in 0.17 seconds
Mar 17 15:10:57 puma01 dnsmasq[7638]: read /etc/hosts - 2 addresses
Mar 17 15:10:57 puma01 dnsmasq[7638]: read /var/lib/nova/networks/nova-br100.conf
Mar 17 15:10:57 puma01 dnsmasq-dhcp[7638]: DHCPRELEASE(br100) 192.168.36.2 fa:16:3e:10:85:eb unknown lease
Mar 17 15:11:01 puma01 dnsmasq[7638]: read /etc/hosts - 2 addresses
Mar 17 15:11:01 puma01 dnsmasq[7638]: read /var/lib/nova/networks/nova-br100.conf
Mar 17 15:11:28 puma01 kernel: kvm: 8086: cpu0 disabled perfctr wrmsr: 0xc1 data 0xabcd
Mar 17 15:11:33 puma01 kernel: fuse init (API version 7.13)
Mar 17 15:11:46 puma01 kernel: device vnet0 entered promiscuous mode
Mar 17 15:11:46 puma01 kernel: br100: port 2(vnet0) entering forwarding state
Mar 17 15:12:02 puma01 dnsmasq-dhcp[7638]: DHCPDISCOVER(br100) fa:16:3e:5f:96:d1 
Mar 17 15:12:02 puma01 dnsmasq-dhcp[7638]: DHCPOFFER(br100) 192.168.36.3 fa:16:3e:5f:96:d1 
Mar 17 15:12:02 puma01 dnsmasq-dhcp[7638]: DHCPREQUEST(br100) 192.168.36.3 fa:16:3e:5f:96:d1 
Mar 17 15:12:02 puma01 dnsmasq-dhcp[7638]: DHCPACK(br100) 192.168.36.3 fa:16:3e:5f:96:d1 fed101

Comment 12 Omri Hochman 2013-03-17 13:54:38 UTC
Created attachment 711397 [details]
reproduce_seconnd_time

adding all nova logs including messages/secure.

Comment 14 Omri Hochman 2013-03-17 14:51:45 UTC
Just few things worth emphasize: 
-------------------------------
1)Not sure It's related, but on my setup I use only one NIC for flat_interface and public_interface. 

Private interface for Flat DHCP on the Nova compute servers  [eth1] : eth2 
Public interface on the Nova network server  [eth0] : eth2
Private interface for Flat DHCP on the Nova network server  [eth1] : eth2

2) Under /etc/nova/noca.conf  mysql://nova:7a3b46d10c3d4998.160.11/nova

3) The Issue happens when I boot the first VM. 
At same time '/var/log/secure' shows: 
"Mar 17 12:57:53 puma01 sudo:     nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf ip addr del 10.35.160.11/24 brd 10.35.160.255 scope global dev eth"

from that moment there's no connection:  "Unable to connect to AMQP server: [Errno 101] ENETUNREACH"

Comment 15 Nikola Dipanov 2013-04-09 17:07:51 UTC
Comment #14 is very revealing.

If you are trying to set up nova FlatDHCP single-host-networking with more than one compute host - then compute hosts need to have 2 nics, one of which is dedicated

This is explained in more details here:

http://docs.openstack.org/trunk/openstack-compute/admin/content/configuring-flat-dhcp-networking.html

The basic idea is that the network to which VMs are connecting to the networking node running dnsmasq (and possibly other VMs on other compute hosts) needs to be a separate logical network as the IP addresses operating on it are those from the private range, and the traffic is bridged by the bridges each compute host sets up.

Further to that - nics on this network need to be set to promiscuous mode as they need to receive frames intended for VMs which will have their own MAC addresses.

Closing this as not a bug.