Bug 616016 - [vdsm] [libvirt] (scale) host fails to connect to qemu sockets (60 guests) within given time-out set
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vdsm
Version: 6.1
Hardware: All Linux
Priority: low, Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Dan Kenigsberg
QA Contact: Haim
Depends On:
Blocks: 655920
Reported: 2010-07-19 09:36 EDT by Haim
Modified: 2014-01-12 19:46 EST

See Also:
Fixed In Version: vdsm-4.9-17.el6.x86_64
Doc Type: Bug Fix
Last Closed: 2011-08-19 11:18:26 EDT


Attachments
recovry triouts log (9.67 KB, text/plain)
2010-07-19 09:39 EDT, Haim
full vdsm log. (518.50 KB, application/x-gzip)
2010-09-20 07:32 EDT, Haim
Description Haim 2010-07-19 09:36:41 EDT
Description of problem:

My system runs about 180 VMs divided between 3 hosts (60 VMs per host). When I restarted the libvirtd service on the SPM machine, the system started to behave badly.


The vdsm service went down (as a result of the broken connection to libvirt) and restarted. When it came back up, it tried to connect to all qemu sockets but apparently failed to do so within the given time (which is 60 seconds?), so the server went into the non-operational state.
I tried it several times with no success and got the following error:

clientIFinit::DEBUG::2010-07-19 14:37:49,752::clientIF::1095::vds::Trying to recover b1009d0a-9925-4782-bd12-ec011a1e2644
clientIFinit::INFO::2010-07-19 14:37:49,824::clientIF::487::vds::network None: using 0
clientIFinit::DEBUG::2010-07-19 14:37:49,827::clientIF::639::vds::Total desktops after creation of b1009d0a-9925-4782-bd12-ec011a1e2644 is 63
clientIFinit::DEBUG::2010-07-19 14:37:49,829::clientIF::1095::vds::Trying to recover 1361a5fa-bc79-4e1d-9d62-d0c7250da41c
clientIFinit::INFO::2010-07-19 14:37:49,902::clientIF::487::vds::network None: using 0
clientIFinit::DEBUG::2010-07-19 14:37:49,905::clientIF::639::vds::Total desktops after creation of 1361a5fa-bc79-4e1d-9d62-d0c7250da41c is 64
clientIFinit::DEBUG::2010-07-19 14:37:49,907::clientIF::1095::vds::Trying to recover 6723379a-e34f-4a9b-9f34-bde5650b8d59
clientIFinit::INFO::2010-07-19 14:37:49,982::clientIF::487::vds::network None: using 0
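
To judge how long each recovery attempt takes, the timestamps on the "Trying to recover" lines above can be compared. The following is a minimal sketch, not part of vdsm: it assumes the `::`-separated field layout seen in this excerpt (thread, level, timestamp, module, line number, logger, message), and the names `recovery_events` and `mean_gap_seconds` are hypothetical helpers introduced only for illustration.

```python
from datetime import datetime

# Lines copied from the excerpt in this report.
SAMPLE = """\
clientIFinit::DEBUG::2010-07-19 14:37:49,752::clientIF::1095::vds::Trying to recover b1009d0a-9925-4782-bd12-ec011a1e2644
clientIFinit::INFO::2010-07-19 14:37:49,824::clientIF::487::vds::network None: using 0
clientIFinit::DEBUG::2010-07-19 14:37:49,829::clientIF::1095::vds::Trying to recover 1361a5fa-bc79-4e1d-9d62-d0c7250da41c
clientIFinit::DEBUG::2010-07-19 14:37:49,907::clientIF::1095::vds::Trying to recover 6723379a-e34f-4a9b-9f34-bde5650b8d59
"""

PREFIX = "Trying to recover "


def recovery_events(log_text):
    """Return a list of (timestamp, vm_id) for each 'Trying to recover' line."""
    events = []
    for line in log_text.splitlines():
        # Assumed layout: thread::level::timestamp::module::lineno::logger::message
        parts = line.split("::", 6)
        if len(parts) != 7 or not parts[6].startswith(PREFIX):
            continue
        stamp = datetime.strptime(parts[2], "%Y-%m-%d %H:%M:%S,%f")
        events.append((stamp, parts[6][len(PREFIX):].strip()))
    return events


def mean_gap_seconds(events):
    """Average time between consecutive recovery attempts, or None if fewer than two."""
    if len(events) < 2:
        return None
    span = (events[-1][0] - events[0][0]).total_seconds()
    return span / (len(events) - 1)


if __name__ == "__main__":
    found = recovery_events(SAMPLE)
    print(len(found), "recovery attempts, mean gap:", mean_gap_seconds(found))
```

Running the same helpers over the full attached log would show whether the attempts slow down as more VMs are recovered. The three sample lines alone are too few to draw a conclusion.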

When I killed all qemu processes and activated the server again, it managed to come up.
This was not reproducible with 30 VMs (qemu sockets).

I think we should increase the time-out and make it proportional to the number of open qemu sockets.
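
A rough illustration of that suggestion follows. This is only a sketch under stated assumptions and not how vdsm implements recovery: the 60-second base is the unconfirmed figure guessed above, the 2-second per-VM allowance is an arbitrary placeholder, and `recovery_timeout`, `recover_all` and the `connect` callback are hypothetical names.

```python
import time


def recovery_timeout(vm_count, base_seconds=60.0, per_vm_seconds=2.0):
    """Timeout that grows linearly with the number of qemu sockets to reconnect.

    base_seconds and per_vm_seconds are assumed values, not vdsm settings.
    """
    if vm_count < 0:
        raise ValueError("vm_count must be non-negative")
    return base_seconds + per_vm_seconds * vm_count


def recover_all(vm_ids, connect, clock=time.monotonic):
    """Call connect(vm_id) for each VM until the scaled deadline has passed.

    Returns (recovered, skipped); VMs reached after the deadline are skipped.
    """
    vm_ids = list(vm_ids)
    deadline = clock() + recovery_timeout(len(vm_ids))
    recovered, skipped = [], []
    for vm_id in vm_ids:
        if clock() >= deadline:
            skipped.append(vm_id)
            continue
        connect(vm_id)
        recovered.append(vm_id)
    return recovered, skipped
```

With these placeholder numbers, 30 VMs would get 120 seconds and 60 VMs would get 180 seconds, instead of one fixed limit for both cases.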

Repro steps:

1) Run 60 VMs on the SPM
2) Restart the libvirtd service
Comment 1 Haim 2010-07-19 09:38:07 EDT
vdsm-4.9-10.el6.x86_64
libvirt-0.8.1-15.el6.x86_64
Comment 2 Haim 2010-07-19 09:39:59 EDT
Created attachment 432887
recovry triouts log
Comment 3 Dan Kenigsberg 2010-09-12 08:15:30 EDT
I am sorry for my late response, but this bug is missing important information. What does it mean that "system started to behave badly"? Did libvirt reconnect to all VMs? Did vdsm respond to a local vdsClient? Is there any timeout in vdsm.log?
Comment 4 Haim 2010-09-20 07:32:35 EDT
Created attachment 448436
full vdsm log.

Dan, I managed to reproduce it and have the following info:

1) vdsClient does not respond and hangs forever.
2) virsh shows the same behavior and hangs forever.
3) The vdsm log shows the following (the full log is also attached):

Thread-55985::ERROR::2010-09-20 13:50:15,800::libvirtvm::961::vds::(wrapper) connection to libvirt broken. taking vdsm down.
Thread-55985::DEBUG::2010-09-20 13:50:15,801::clientIF::118::vds::(prepareForShutdown) cannot run prepareForShutdown twice

Please note that it's not an SPM issue.

More info:
top - 13:59:54 up 21:42,  3 users,  load average: 170.51, 156.80, 145.07
Tasks: 485 total,  23 running, 462 sleeping,   0 stopped,   0 zombie
Cpu(s): 46.1%us, 46.7%sy,  0.0%ni,  0.0%id,  6.4%wa,  0.0%hi,  0.7%si,  0.0%st
Mem:  32871336k total, 10800092k used, 22071244k free,   138340k buffers
Swap: 16383992k total,        0k used, 16383992k free,  1041104k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                           
21296 vdsm      20   0 2459m 422m 2980 D 200.0  1.3  10:48.66 qemu-kvm                                                                                         
25516 vdsm      20   0 3203m 418m 2980 D 600.0  1.3  13:12.77 qemu-kvm                                                                                         
 5957 vdsm      20   0  837m  15m 2948 S 102.7  0.0  50:42.00 qemu-kvm                                                                                         
15140 vdsm      20   0 1457m 151m 2968 S 21.8  0.5   3:25.77 qemu-kvm                                                                                          
14651 vdsm      20   0 1177m 196m 2968 S 17.0  0.6   3:22.21 qemu-kvm                                                                                          
 3787 vdsm      20   0 1037m 395m 2980 R 16.6  1.2  10:04.90 qemu-kvm                                                                                          
32613 vdsm      20   0 2287m 396m 2980 R 16.6  1.2   8:53.40 qemu-kvm                                                                                          
22383 vdsm      20   0 1347m 439m 2980 S 16.0  1.4  15:18.40 qemu-kvm                                                                                          
21546 vdsm      20   0 2499m 425m 2980 S 15.7  1.3  11:04.20 qemu-kvm                                                                                          
 1260 vdsm      20   0 2011m 394m 2984 R 15.0  1.2   8:38.26 qemu-kvm                                                                                          
 9674 vdsm      20   0 1243m 399m 2980 R 14.3  1.2   8:11.38 qemu-kvm                                                                                          
31654 vdsm      20   0 2267m 408m 2980 D 14.0  1.3   9:20.45 qemu-kvm                                                                                          
 6922 vdsm      20   0 2017m 389m 2980 S 13.7  1.2   5:25.40 qemu-kvm                                                                                          
 3443 vdsm      20   0 1627m 401m 2980 S 12.7  1.2  10:03.78 qemu-kvm                                                                                          
 5142 vdsm      20   0 1435m 406m 2980 R 12.7  1.3   9:18.88 qemu-kvm                                                                                          
13066 vdsm      20   0 1177m 219m 2968 D 12.7  0.7   3:41.12 qemu-kvm                                                                                          
 9235 vdsm      20   0 1563m 403m 2980 S 12.4  1.3   8:17.13 qemu-kvm                                                                                          
30548 vdsm      20   0 2715m 415m 2980 R 12.4  1.3   9:49.82 qemu-kvm                                                                                          
  821 vdsm      20   0 2651m 418m 2980 R 12.1  1.3   9:16.03 qemu-kvm                                                                                          
 1498 vdsm      20   0 2203m 394m 2980 S 12.1  1.2   8:40.78 qemu-kvm                                                                                          
 3127 vdsm      20   0 1431m 409m 2980 S 12.1  1.3  10:14.47 qemu-kvm                                                                                          
12718 vdsm      20   0 1563m 401m 2980 S 12.1  1.3   7:05.40 qemu-kvm                                                                                          
21421 vdsm      20   0 1283m 450m 2980 S 12.1  1.4  15:38.72 qemu-kvm                                                                                          
25011 vdsm      20   0 1943m 443m 2980 S 12.1  1.4  16:29.32 qemu-kvm                                                                                          
 1710 vdsm      20   0 1947m 394m 2980 S 11.7  1.2   8:19.94 qemu-kvm                                                                                          
 1936 vdsm      20   0 1771m 413m 2980 S 11.7  1.3  10:33.77 qemu-kvm                                                                                          
 5524 vdsm      20   0 2523m 394m 2980 R 11.7  1.2   5:38.62 qemu-kvm


[root@magenta-vdsd ~]# vdsClient -s 0 list table




^CTraceback (most recent call last):
  File "/usr/share/vdsm/vdsClient.py", line 1975, in <module>
    code, message = commands[command][0](commandArgs)
  File "/usr/share/vdsm/vdsClient.py", line 172, in do_list
    response = self.s.list(True)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1199, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1489, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.6/site-packages/M2Crypto/m2xmlrpclib.py", line 49, in request
    h.endheaders()
  File "/usr/lib64/python2.6/httplib.py", line 904, in endheaders
    self._send_output()
  File "/usr/lib64/python2.6/httplib.py", line 776, in _send_output
    self.send(msg)
  File "/usr/lib64/python2.6/httplib.py", line 735, in send
    self.connect()
  File "/usr/lib64/python2.6/site-packages/M2Crypto/httpslib.py", line 50, in connect
    self.sock.connect((self.host, self.port))
  File "/usr/lib64/python2.6/site-packages/M2Crypto/SSL/Connection.py", line 185, in connect
    ret = self.connect_ssl()
  File "/usr/lib64/python2.6/site-packages/M2Crypto/SSL/Connection.py", line 178, in connect_ssl
    return m2.ssl_connect(self.ssl, self._timeout

Hope you have everything you need; if not, ping me.
Comment 5 Dan Kenigsberg 2010-09-26 07:08:35 EDT
This issue seems to be resolved in vdsm-4.9-17.el6.x86_64.
Comment 6 Haim 2010-12-15 04:32:32 EST
Verified in vdsm-4.9-30.

Ran about 65 VMs on a single host and managed to use both APIs (vds-cli and virsh) to query the running VMs:

vdsm in: 
real    0m1.484s
user    0m0.685s
sys     0m0.101s

libvirt in: 
real    0m0.180s
user    0m0.007s
sys     0m0.033s
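
A comparable check can be scripted so that a hung client, as seen in comment 4, is reported instead of blocking forever. The following is a minimal sketch; `time_command` is a hypothetical helper and the 30-second limit is an arbitrary assumption.

```python
import subprocess
import time


def time_command(argv, timeout_seconds=30.0):
    """Run argv and return (elapsed_seconds, returncode).

    returncode is None when the command was killed for exceeding the timeout.
    """
    start = time.monotonic()
    try:
        completed = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
        code = completed.returncode
    except subprocess.TimeoutExpired:
        code = None
    return time.monotonic() - start, code
```

For example, `time_command(["vdsClient", "-s", "0", "list", "table"])` would time the query from comment 4, and `time_command(["virsh", "list"])` would time the libvirt side. A `None` return code indicates the hang described earlier in this bug.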
