Bug 871722
| Summary: | 3.x [Scalability] Failed to attach more than ~3350 vlan-networks to host's nic due to unexpected exception in getVdsCaps | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Rami Vaknin <rvaknin> | ||||
| Component: | vdsm | Assignee: | lpeer <lpeer> | ||||
| Status: | CLOSED WONTFIX | QA Contact: | meital avital <mavital> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 3.2.0 | CC: | acathrow, bazulay, danken, hateya, iheim, jkt, lpeer, yeylon | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Linux | ||||||
| Whiteboard: | network | ||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2014-01-12 12:23:48 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | Network | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
|
||||||
The vdsm.log excerpt is from a call to delNetwork(vlan3350)
Thread-132709::DEBUG::2012-10-29 07:09:41,213::BindingXMLRPC::874::vds::(wrapper) client [10.35.160.91]::call delNetwork with ('vlan3350', '3350', '', ['eth3']) {} flowID [37e06cbd]
Any idea why this was attempted, or why it has no trace in engine.log?
According to the correlation id, it's probably operation done by engine's transaction roll-back:
2012-10-29 07:06:41,164 INFO [org.ovirt.engine.core.bll.AttachNetworkToVdsInterfaceCommand] (ajp-/127.0.0.1:8702-3) [37e06cbd] Running command: AttachNetworkToVdsInterfaceCommand internal: false. Entities affected : ID: b79e6836-1fad-11e2-81c1-8febf9096a31 Type: VDS
2012-10-29 07:06:41,166 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.AddNetworkVDSCommand] (ajp-/127.0.0.1:8702-3) [37e06cbd] START, AddNetworkVDSCommand(HostName = puma01, HostId = b79e6836-1fad-11e2-81c1-8febf9096a31, networkName=vlan3350, oldNetworkName=null, hostAddr=null, checkConnectivity=false, connectionTimeout=0, vlanId=3350, bondName=null, nics=[eth3], inetAddr=, networkMask=null, gateway=, stp=false, bondingOptions=null, bootProtocol=None, vmNetwork=true, network=vlan3350 {id=71bc913c-a1a8-459b-9285-69d0362a058e, description=null, subnet=null, gateway=null, type=null, vlan_id=3350, stp=false, storage_pool_id=5058a6dd-37cb-44ed-b811-d84aee9aae4c, mtu=0, vmNetwork=true, cluster=network_cluster {id={clusterId=null, networkId=null}, status=NonOperational, is_display=false, required=true}}), log id: 2a320cb6
2012-10-29 07:09:41,168 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (ajp-/127.0.0.1:8702-3) [37e06cbd] XML RPC error in command AddNetworkVDS ( HostName = puma01 ), the error was: java.util.concurrent.TimeoutException, TimeoutException:
2012-10-29 07:09:41,169 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.AddNetworkVDSCommand] (ajp-/127.0.0.1:8702-3) [37e06cbd] FINISH, AddNetworkVDSCommand, log id: 2a320cb6
2012-10-29 07:09:41,169 ERROR [org.ovirt.engine.core.bll.AttachNetworkToVdsInterfaceCommand] (ajp-/127.0.0.1:8702-3) [37e06cbd] Command org.ovirt.engine.core.bll.AttachNetworkToVdsInterfaceCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSNetworkException:
2012-10-29 07:10:00,000 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-87) Autorecovering hosts is disabled, skipping
2012-10-29 07:10:00,001 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-87) Autorecovering storage domains is disabled, skipping
2012-10-29 07:10:43,738 ERROR [org.ovirt.engine.core.bll.AttachNetworkToVdsInterfaceCommand] (ajp-/127.0.0.1:8702-3) [37e06cbd] Transaction rolled-back for command: org.ovirt.engine.core.bll.AttachNetworkToVdsInterfaceCommand.
2012-10-29 07:10:43,758 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (ajp-/127.0.0.1:8702-3) Operation Failed: [Network error during communication with the Host.]
Yes, there has been a network error when vdsm attempted to respond
MainProcess|Thread-132580::DEBUG::2012-10-29 07:09:43,411::configNetwork::248::root::(writeBackupFile) Persistently backed up /var/lib/vdsm/netconfback/logicalnetworks/vlan3350 (until next 'set safe config')
Thread-132580::DEBUG::2012-10-29 07:09:43,511::BindingXMLRPC::880::vds::(wrapper) return addNetwork with {'status': {'message': 'Done', 'code
': 0}}
Thread-132580::ERROR::2012-10-29 07:09:43,516::SecureXMLRPCServer::73::root::(handle_error) client ('10.35.160.91', 36988)
Traceback (most recent call last):
File "/usr/lib64/python2.6/SocketServer.py", line 560, in process_request_thread
self.finish_request(request, client_address)
File "/usr/lib64/python2.6/site-packages/vdsm/SecureXMLRPCServer.py", line 69, in finish_request
client_address)
File "/usr/lib64/python2.6/SocketServer.py", line 322, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "/usr/lib64/python2.6/SocketServer.py", line 617, in __init__
self.handle()
File "/usr/lib64/python2.6/BaseHTTPServer.py", line 329, in handle
self.handle_one_request()
File "/usr/lib64/python2.6/BaseHTTPServer.py", line 323, in handle_one_request
method()
File "/usr/lib64/python2.6/SimpleXMLRPCServer.py", line 494, in do_POST
self.wfile.write(response)
File "/usr/lib64/python2.6/socket.py", line 324, in write
self.flush()
File "/usr/lib64/python2.6/socket.py", line 303, in flush
self._sock.sendall(buffer(data, write_offset, buffer_size))
File "/usr/lib64/python2.6/ssl.py", line 203, in sendall
v = self.send(data[count:])
File "/usr/lib64/python2.6/ssl.py", line 174, in send
v = self._sslobj.write(data)
error: [Errno 32] Broken pipe
Over 3k vlans on a single host should be ok given that physical switches scale only to a few hundreds. |
Created attachment 635981 [details] vdsm and engine logs Versions: vdsm-4.9.6-39.0.el6_3.x86_64 si22 Scenario: 1. Create 4096 vlan-networks in a data center and attach them to a cluster 2. Add the above vlan-networks to host's nic Results: I've managed to attach ~3350 networks to the host nic without any problem while using a bare-metal host (Intel Xeon E5645, 2.40GHz, 32GB RAM), then the next ~20 networks succedded/failed alternately, then all the rest network attachements failed. Note that every ifconfig command (with output to /dev/null) on this order of host networks takes ~40 seconds to return while "ip addr" takes less than 1 second. Additional Information: From vdsm.log, same output dumped by manually running of getVdsCaps =================================================================== Thread-132709::ERROR::2012-10-29 07:10:43,737::BindingXMLRPC::892::vds::(wrapper) unexpected error Traceback (most recent call last): File "/usr/share/vdsm/BindingXMLRPC.py", line 879, in wrapper res = f(*args, **kwargs) File "/usr/share/vdsm/BindingXMLRPC.py", line 346, in delNetwork return api.delNetwork(bridge, vlan, bond, nics, options) File "/usr/share/vdsm/API.py", line 1164, in delNetwork _netinfo = netinfo.NetInfo() File "/usr/lib64/python2.6/site-packages/vdsm/netinfo.py", line 369, in __init__ _netinfo = get() File "/usr/lib64/python2.6/site-packages/vdsm/netinfo.py", line 308, in get for bridge in bridges()]) KeyError: 'vlan3350' Side effect in rhevm ==================== 2012-10-29 06:43:17,498 INFO [org.ovirt.engine.core.bll.AttachNetworkToVdsInterfaceCommand] (ajp-/127.0.0.1:8702-12) [5c1ed607] Running command: AttachNetworkToVdsInterfaceCommand internal: false. Entities affected : ID: b79e6836-1fad-11e2-81c1-8febf9096a31 Type: VDS 2012-10-29 06:43:17,501 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.AddNetworkVDSCommand] (ajp-/127.0.0.1:8702-12) [5c1ed607] START, AddNetworkVDSCommand(HostName = puma01, HostId = b79e6836-1fad-11e2-81c1-8febf9096a31, networkName=vlan3346, oldNetworkName=null, hostAddr=null, checkConnectivity=false, connectionTimeout=0, vlanId=3346, bondName=null, nics=[eth3], inetAddr=, networkMask=null, gateway=, stp=false, bondingOptions=null, bootProtocol=None, vmNetwork=true, network=vlan3346 {id=2c2747be-f33f-445e-871e-12cb1f91bbd8, description=null, subnet=null, gateway=null, type=null, vlan_id=3346, stp=false, storage_pool_id=5058a6dd-37cb-44ed-b811-d84aee9aae4c, mtu=0, vmNetwork=true, cluster=network_cluster {id={clusterId=null, networkId=null}, status=NonOperational, is_display=false, required=true}}), log id: 2e839d41 2012-10-29 06:46:17,504 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (ajp-/127.0.0.1:8702-12) [5c1ed607] XML RPC error in command AddNetworkVDS ( HostName = puma01 ), the error was: java.util.concurrent.TimeoutException, TimeoutException: 2012-10-29 06:46:17,504 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.AddNetworkVDSCommand] (ajp-/127.0.0.1:8702-12) [5c1ed607] FINISH, AddNetworkVDSCommand, log id: 2e839d41 2012-10-29 06:46:17,504 ERROR [org.ovirt.engine.core.bll.AttachNetworkToVdsInterfaceCommand] (ajp-/127.0.0.1:8702-12) [5c1ed607] Command org.ovirt.engine.core.bll.AttachNetworkToVdsInterfaceCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSNetworkException: 2012-10-29 06:47:13,825 ERROR [org.ovirt.engine.core.bll.AttachNetworkToVdsInterfaceCommand] (ajp-/127.0.0.1:8702-12) [5c1ed607] Transaction rolled-back for command: org.ovirt.engine.core.bll.AttachNetworkToVdsInterfaceCommand.