Created attachment 946136 [details] vdsm log from rhel 7 host Description of problem: During successful migration, we see several errors in vdsm log: Thread-56::ERROR::2014-10-12 21:21:53,498::guestagent::356::vm.Vm::(desktopLock) vmId=`e7ae437a-e487-46c2-a293-95f7f0659925`::desktopLock failed Traceback (most recent call last): File "/usr/share/vdsm/virt/guestagent.py", line 354, in desktopLock self._forward("lock-screen") File "/usr/share/vdsm/virt/guestagent.py", line 224, in _forward self._sock.send(message) File "/usr/lib64/python2.6/socket.py", line 167, in _dummy raise error(EBADF, 'Bad file descriptor') error: [Errno 9] Bad file descriptor The guest is running Fedora 20 from tlv pxe, probably without guest agent (I did not install one). Version-Release number of selected component (if applicable): master Oct 12 How reproducible: Seen on both rhel 7 and 6.5, 2/2 migrations. Steps to Reproduce: 1. Run vm on rhel 7 or 6.5 machine with one iscsi disk 2. Migrate to other machine 3. Migration succeeds Actual results: Unexpected errors in log Expected results: Clean log
Created attachment 946137 [details] vdsm log from rhel 6 host
it is expected in this flow. Maybe make the err msg a bit more pretty / info level.
still want to consider this "soon"
This bug is not marked for z-stream, yet the milestone is for a z-stream version, therefore the milestone has been reset. Please set the correct milestone or add the z-stream flag.
In oVirt testing is done on single stream by default. Therefore I'm removing the 4.0 flag. If you think this bug must be tested in 4.0 as well, please re-add the flag. Please note we might not have testing resources to handle the 4.0 clone.
The previous logs in the bug used log level info, making the flow unclear. I reproduce the issue again with current master and some patches for improving logging: https://gerrit.ovirt.org/49343 In the old logs, we had 4 different threads calling desktopLock and failing in the same time. I could not reproduce this now, maybe because the code was changed, or maybe something with the setup was different. Now we see only one failed desktopLock call for each migration. To reproduce: 1. Have 2 hosts 2. Start vm on one host 3. Open console 4. Migrate to other host The attached log contains 3 instances of this error. Here is annotated version of one of them: 1. Migration finished, in migration downtime thread (Thread-114) 2015-11-28 17:58:25,846 DEBUG [virt.vm] (Thread-114) vmId=`ecc85cb8-a80f-4822-8c53-a07423939499`::stopping migration downtime thread [migration:454(stop)] 2015-11-28 17:58:25,846 INFO [virt.vm] (Thread-114) vmId=`ecc85cb8-a80f-4822-8c53-a07423939499`::migration took 10 seconds to complete [migration:368(_startUnderlyingMigration)] 2015-11-28 17:58:25,848 DEBUG [virt.vm] (Thread-115) vmId=`ecc85cb8-a80f-4822-8c53-a07423939499`::migration downtime thread exiting [migration:451(run)] 2015-11-28 17:58:25,849 INFO [virt.vm] (Thread-114) vmId=`ecc85cb8-a80f-4822-8c53-a07423939499`::Changed state to Down: Migration succeeded (code=4) [vm:1293(setDownStatus)] 2. Guest agent stopped 2015-11-28 17:58:25,849 INFO [virt.vm] (Thread-114) vmId=`ecc85cb8-a80f-4822-8c53-a07423939499`::Stopping connection [guestagent:336(stop)] 2015-11-28 17:58:25,849 DEBUG [vds] (Thread-114) Delete fileno 34 from listener. [vmchannels:214(unregister)] At this point, guest agent socket is closed, and guest agent _stopped variable is True. 3. 2 seconds later, another thread (Thread-117) call desktopLock 2015-11-28 17:58:27,448 DEBUG [virt.vm] (Thread-117) vmId=`ecc85cb8-a80f-4822-8c53-a07423939499`::desktopLock called [guestagent:376(desktopLock)] 2015-11-28 17:58:27,448 ERROR [virt.vm] (Thread-117) vmId=`ecc85cb8-a80f-4822-8c53-a07423939499`::desktopLock failed [guestagent:379(desktopLock)] Traceback (most recent call last): File "/usr/share/vdsm/virt/guestagent.py", line 377, in desktopLock self._forward("lock-screen") File "/usr/share/vdsm/virt/guestagent.py", line 244, in _forward self._sock.sendall(message) File "/usr/lib64/python2.7/socket.py", line 224, in meth return getattr(self._sock,name)(*args) File "/usr/lib64/python2.7/socket.py", line 170, in _dummy raise error(EBADF, 'Bad file descriptor') The call failed because the socket was closed by Thread-114. The main issue seems to be incorrect usage of guest agent from 2 threads One thread stop it, and the other try to send a message to a stopped guest agent. A secondary issue is failing in a more clear way when the guest agent is stopped. This can be hard to to with the current code, since it uses the _stopped variable also for keeping the connection state (stopped == disconnected). It seems that guest agent should keep more states (stopped, connecting, connected).
Created attachment 1100001 [details] Vdsm log using debug log level with current master
This bug was fixed and is slated to be in the upcoming version. As we are focusing our testing at this phase on severe bugs, this bug was closed without going through its verification step. If you think this bug should be verified by QE, please set its severity to high and move it back to ON_QA