Bug 1484470 - oslo_rootwrap.client holds socket affecting instance spawning
Summary: oslo_rootwrap.client holds socket affecting instance spawning
Keywords:
Status: CLOSED DUPLICATE of bug 1451082
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-oslo-rootwrap
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: 10.0 (Newton)
Assignee: Victor Stinner
QA Contact: Shai Revivo
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-08-23 16:51 UTC by Pablo Caruana
Modified: 2021-03-11 15:40 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-10-24 09:45:07 UTC
Target Upstream Version:


Attachments (Terms of Use)

Description Pablo Caruana 2017-08-23 16:51:51 UTC
1)- after a fresh deployment of overcloud,  openvswitch-agent is showed as dead, making the any tenant instance deployment fails by not able to bind the port.
    After reboot of compute node, the patchport is present again, and all operations on the VM level (deployment, reboot, shutdown, etc) are working again.

Investigating the issue looks still in the same family than https://bugzilla.redhat.com/show_bug.cgi?id=1451082 for 
-close inherited filedescriptors after forking
-Allow rootwrap-daemon to timeout and exit If the client side abnormally exits, its rootwrap daemon cannot receive a shutdown message and will be left forever. Let it timeout and exit to save such cases.

openstack-neutron-9.3.1-2.1.el7ost.noarch                   
openstack-neutron-openvswitch-9.3.1-2.1.el7ost.noarch       
openstack-neutron-sriov-nic-agent-9.3.1-2.1.el7ost.noarch   
openvswitch-2.6.1-10.git20161206.el7fdp.x86_64             
python-neutron-9.3.1-2.1.el7ost.noarch                      
python-openvswitch-2.6.1-10.git20161206.el7fdp.noarch       
python-ryu-4.9-2.1.el7ost.noarch                            
python-ryu-common-4.9-2.1.el7ost.noarch 
python-oslo-rootwrap-5.1.1-1.el7ost.noarc     

Neutron and python-ryu are marked for last available erratas for RHOSP 10.               


[openvswitch-agent.log]
- Server was deployed and  rootwrap daemon process with pid=8775

    1 2017-08-22 10:47:56.512 8542 INFO neutron.common.config [-] Logging enabled!
      2 2017-08-22 10:47:56.512 8542 INFO neutron.common.config [-] /usr/bin/neutron-openvswitch-agent version 9.3.1
      3 2017-08-22 10:47:56.514 8542 WARNING oslo_config.cfg [-] Option "rabbit_hosts" from group "oslo_messaging_rabbit" is deprecated for removal.  Its value may be silently ignored in the future.
      4 2017-08-22 10:47:56.515 8542 WARNING oslo_config.cfg [-] Option "rabbit_password" from group "oslo_messaging_rabbit" is deprecated for removal.  Its value may be silently ignored in the future.
      5 2017-08-22 10:47:56.515 8542 WARNING oslo_config.cfg [-] Option "rabbit_userid" from group "oslo_messaging_rabbit" is deprecated for removal.  Its value may be silently ignored in the future.
      6 2017-08-22 10:47:56.517 8542 INFO ryu.base.app_manager [-] loading app neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp
      7 2017-08-22 10:47:56.821 8542 INFO ryu.base.app_manager [-] loading app ryu.app.ofctl.service
      8 2017-08-22 10:47:56.822 8542 INFO ryu.base.app_manager [-] loading app ryu.controller.ofp_handler
      9 2017-08-22 10:47:56.827 8542 INFO ryu.base.app_manager [-] instantiating app neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp of OVSNeutronAgentRyuApp
     10 2017-08-22 10:47:56.827 8542 INFO ryu.base.app_manager [-] instantiating app ryu.controller.ofp_handler of OFPHandler
     11 2017-08-22 10:47:56.828 8542 INFO ryu.base.app_manager [-] instantiating app ryu.app.ofctl.service of OfctlService
     12 2017-08-22 10:47:57.031 8542 INFO oslo_rootwrap.client [-] Spawned new rootwrap daemon process with pid=8775


-- Then at 14:50 logging was enabled that includes restarting services

329 2017-08-22 14:50:57.114 24253 INFO neutron.common.config [-] Logging enabled!

--first signals ryu.lib.hub was no able to connect as  pid=8775 was still using the socket.

    508 2017-08-22 14:50:57.318 24253 INFO ryu.base.app_manager [-] loading app ryu.app.ofctl.service
    509 2017-08-22 14:50:57.319 24253 INFO ryu.base.app_manager [-] loading app ryu.controller.ofp_handler
    510 2017-08-22 14:50:57.319 24253 INFO ryu.base.app_manager [-] instantiating app neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp of OVSNeutronAgentRyuApp
    511 2017-08-22 14:50:57.320 24253 INFO ryu.base.app_manager [-] instantiating app ryu.controller.ofp_handler of OFPHandler
    512 2017-08-22 14:50:57.320 24253 INFO ryu.base.app_manager [-] instantiating app ryu.app.ofctl.service of OfctlService
    513 2017-08-22 14:50:57.323 24253 DEBUG neutron.callbacks.manager [-] Subscribe: <function init_handler at 0x5a4a488> Open vSwitch agent after_init subscribe /usr/lib/python2.7/site-packages/neutron/callbacks/manager.py:42
    514 2017-08-22 14:50:57.323 24253 DEBUG neutron.agent.linux.utils [-] Running command: ['ip', 'addr', 'show', 'to', '172.17.2.13'] create_process /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:89
    515 2017-08-22 14:50:57.350 24253 ERROR ryu.lib.hub [-] hub: uncaught exception: Traceback (most recent call last):
    516   File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 54, in _launch
    517     return func(*args, **kwargs)
    518   File "/usr/lib/python2.7/site-packages/ryu/controller/controller.py", line 97, in __call__
    519     self.ofp_ssl_listen_port)
    520   File "/usr/lib/python2.7/site-packages/ryu/controller/controller.py", line 120, in server_loop
    521     datapath_connection_factory)
    522   File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 117, in __init__
    523     self.server = eventlet.listen(listen_info)
    524   File "/usr/lib/python2.7/site-packages/eventlet/convenience.py", line 43, in listen
    525     sock.bind(addr)
    526   File "/usr/lib64/python2.7/socket.py", line 224, in meth
    527     return getattr(self._sock,name)(*args)
    528 error: [Errno 98] Address already in use
    529 


This is mentioned there is a previos pid using the socket. this was associated with pid being in sleep state

8775  0.0  0.0 193388  2832 ?S    10:47   0:00 sudo neutron-rootwrap-daemon /etc/neutron/rootwrap.conf   (this is a sudo wrapper allowing neutron run ovs client and similars operations at root)

Noticed the State:  S (sleeping), 
Pocess in S or D state is usually in a blocking system call, such as reading or writing to a file or the network, waiting for a called program to finish, or while waiting on semaphores or other synchronization primitives. It will go into the sleep state while waiting.

on our scenario the pid was waiting on the  child  8776 

 strace -p 8775
Process 8775 attached
select(13, [6 12], [], NULL, NULL


ps  -ef | grep -i 8775
root      8775     1  0 10:47 ?00:00:00 sudo neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
root      8776  8775  0 10:47 ?00:00:00 /usr/bin/python2 /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf


Checking the socket was in used for 8775

[root@overcloud-compute-0neutron]# ss -lnp|grep 6633
tcp    LISTEN     51     50     127.0.0.1:6633  *:*   users:(("sudo",pid=8775,fd=5))

[root@overcloud-compute-0neutron]# netstat -tulpn |grep 6633
tcp       51      0 127.0.0.1:6633  0.0.0.0:*       LISTEN      8775/sudo


[root@overcloud-compute-0neutron]# ps -aux | grep 8775
root      8775  0.0  0.0 193388  2832 ?S    10:47   0:00 sudo neutron-rootwrap-daemon /etc/neutron/rootwrap.conf

After killing the child one and freeing the socket service was able to start again, neutron agent-list showing openvswitch.agent being alive again as bein able to run the OMMAND=/bin/neutron-rootwrap /etc/neutron/rootwrap.conf ovsdb-client monitor Interface name,ofport,external_ids --format=json

Aug 22 15:55:47 overcloud-compute-0 sudo[28614]:  neutron : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
Aug 22 15:55:47 overcloud-compute-0 sudo[28622]:  neutron : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/neutron-rootwrap /etc/neutron/rootwrap.conf ovsdb-client monitor Interface name,ofport,external_ids --format=json

--OVS proccesses being restarted correctly.

[root@overcloud-compute-0neutron]# ps -ef | grep ovs
root     28334     1  0 15:50 ?00:00:00 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach
root     28368     1  0 15:50 ?00:00:00 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
root     28622 28589  0 15:55 ?00:00:00 sudo neutron-rootwrap /etc/neutron/rootwrap.conf ovsdb-client monitor Interface name,ofport,external_ids --format=json
root     28624 28622  0 15:55 ?00:00:00 /usr/bin/python2 /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf ovsdb-client monitor Interface name,ofport,external_ids --format=json
root     28628 28624  0 15:55 ?00:00:00 /bin/ovsdb-client monitor Interface name,ofport,external_ids --format=json


-Rest of the logs entries:

   1584 2017-08-22 15:55:42.361 28589 INFO ryu.base.app_manager [-] loading app ryu.app.ofctl.service
   1585 2017-08-22 15:55:42.362 28589 INFO ryu.base.app_manager [-] loading app ryu.controller.ofp_handler
   1586 2017-08-22 15:55:42.362 28589 INFO ryu.base.app_manager [-] instantiating app neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp of OVSNeutronAgentRyuApp
   1587 2017-08-22 15:55:42.362 28589 INFO ryu.base.app_manager [-] instantiating app ryu.controller.ofp_handler of OFPHandler
   1588 2017-08-22 15:55:42.363 28589 INFO ryu.base.app_manager [-] instantiating app ryu.app.ofctl.service of OfctlService
   1589 2017-08-22 15:55:42.364 28589 DEBUG neutron.callbacks.manager [-] Subscribe: <function init_handler at 0x4d04488> Open vSwitch agent after_init subscribe /usr/lib/python2.7/site-packages/neutron/callbacks/manager.py:42
   1590 2017-08-22 15:55:42.365 28589 DEBUG neutron.agent.linux.utils [-] Running command: ['ip', 'addr', 'show', 'to', '172.17.2.13'] create_process /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:89
   1591 2017-08-22 15:55:42.375 28589 DEBUG neutron.agent.linux.utils [-] Exit code: 0 execute /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:150
   1592 2017-08-22 15:55:42.440 28589 DEBUG neutron.agent.ovsdb.impl_idl [-] Running txn command(idx=0): AddBridgeCommand(datapath_type=system, may_exist=True, name=br-int) do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:98
   1593 2017-08-22 15:55:42.440 28589 DEBUG neutron.agent.ovsdb.impl_idl [-] Transaction caused no change do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:125


root@overcloud-compute-0neutron]# systemctl restart neutron-openvswitch-agent.service
[root@overcloud-compute-0neutron]# systemctl -l  status neutron-openvswitch-agent.service
● neutron-openvswitch-agent.service - OpenStack Neutron Open vSwitch Agent
   Loaded: loaded (/usr/lib/systemd/system/neutron-openvswitch-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2017-08-22 15:55:41 UTC; 14s ago
  Process: 28582 ExecStartPre=/usr/bin/neutron-enable-bridge-firewall.sh (code=exited, status=0/SUCCESS)
 Main PID: 28589 (neutron-openvsw)
   CGroup: /system.slice/neutron-openvswitch-agent.service
   ├─28589 /usr/bin/python2 /usr/bin/neutron-openvswitch-agent --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/openvswitch_agent.ini --config-dir /etc/neutron/conf.d/common --config-dir /etc/neutron/conf.d/neutron-openvswitch-agent --log-file /var/log/neutron/openvswitch-agent.log
   ├─28614 sudo neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
   ├─28615 /usr/bin/python2 /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
   ├─28622 sudo neutron-rootwrap /etc/neutron/rootwrap.conf ovsdb-client monitor Interface name,ofport,external_ids --format=json
   ├─28624 /usr/bin/python2 /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf ovsdb-client monitor Interface name,ofport,external_ids --format=json
   └─28628 /bin/ovsdb-client monitor Interface name,ofport,external_ids --format=json

Aug 22 15:55:41 overcloud-compute-0 neutron-enable-bridge-firewall.sh[28582]: net.bridge.bridge-nf-call-arptables = 1
Aug 22 15:55:41 overcloud-compute-0 neutron-enable-bridge-firewall.sh[28582]: net.bridge.bridge-nf-call-iptables = 1
Aug 22 15:55:41 overcloud-compute-0 neutron-enable-bridge-firewall.sh[28582]: net.bridge.bridge-nf-call-ip6tables = 1
Aug 22 15:55:41 overcloud-compute-0 systemd[1]: Started OpenStack Neutron Open vSwitch Agent.
Aug 22 15:55:41 overcloud-compute-0 neutron-openvswitch-agent[28589]: Guru meditation now registers SIGUSR1 and SIGUSR2 by default for backward compatibility. SIGUSR1 will no longer be registered in a future release, so please use SIGUSR2 to generate reports.
Aug 22 15:55:41 overcloud-compute-0 neutron-openvswitch-agent[28589]: Option "verbose" from group "DEFAULT" is deprecated for removal.  Its value may be silently ignored in the future.
Aug 22 15:55:41 overcloud-compute-0 neutron-openvswitch-agent[28589]: Option "rpc_backend" from group "DEFAULT" is deprecated for removal.  Its value may be silently ignored in the future.
Aug 22 15:55:42 overcloud-compute-0 neutron-openvswitch-agent[28589]: Option "notification_driver" from group "DEFAULT" is deprecated. Use option "driver" from group "oslo_messaging_notifications".
Aug 22 15:55:47 overcloud-compute-0 sudo[28614]:  neutron : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
Aug 22 15:55:47 overcloud-compute-0 sudo[28622]:  neutron : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/neutron-rootwrap /etc/neutron/rootwrap.conf ovsdb-client monitor Interface name,ofport,external_ids --format=json

Comment 6 Pablo Caruana 2017-10-24 09:46:28 UTC
Archiving this bugzilla as duplicated of https://bugzilla.redhat.com/show_bug.cgi?id=1451082 covered  officially as per: https://access.redhat.com/errata/RHBA-2017:2653.


Note You need to log in before you can comment on or make changes to this bug.