|Summary:||python logging problem - when rotating to new log file, logger checks file's stat when the file does not exist|
|Product:||Red Hat Enterprise Linux 6||Reporter:||vvyazmin <vvyazmin>|
|Component:||python||Assignee:||Robert Kuska <rkuska>|
|Status:||CLOSED ERRATA||QA Contact:||Jan Kepler <jkejda>|
|Version:||6.4||CC:||abaron, amureini, bazulay, danken, fholec, hateya, iheim, jberan, jkejda, jkt, lnovich, lpeer, rkuska, ybronhei|
|Fixed In Version:||python-2.6.6-47.el6||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2013-11-21 09:13:59 UTC||Type:||Bug|
|oVirt Team:||---||RHEL 7.3 requirements from Atomic Host:|
|Cloudforms Team:||---||Target Upstream Version:|
Description email@example.com 2013-03-07 18:26:51 UTC
Created attachment 706757 [details] ## Logs vdsm, rhevm Description of problem: SPM server turned to “Non Responsive” state during creation a pool with multiple VM's Version-Release number of selected component (if applicable): RHEVM 3.1.3 - SI27.3 environment: RHEVM: rhevm-3.1.0-50.el6ev.noarch VDSM: vdsm-4.10.2-1.6.el6.x86_64 LIBVIRT: libvirt-0.10.2-18.el6.x86_64 QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.355.el6_4.1.x86_64 SANLOCK: sanlock-2.6-2.el6.x86_64 How reproducible: 100% Steps to Reproduce: My environment: DC: FC Hosts: Servers x 4 (tigris[01-02].environment) SD: XIO connected via FC switch (8Gbit) VM's: Pool with 1000 VM's (automatic pool) 1. Start create a pool with multiple VM's (in my case it was one pool with 1000 VM's) Actual results: After approximately 388 VM's (57 min.) , SPM server turned to “Non Responsive” state. Expected results: Successfully create a pool Additional info: After I rebooted SPM host, system return to work normally, and I continue create a relevant pool. /var/log/ovirt-engine/engine.log /var/log/vdsm/vdsm.log Thread-5991::DEBUG::2013-03-06 14:01:02,711::misc::83::Storage.Misc.excCmd::(<lambda>) '/bin/dd iflag=direct skip=5 bs=512 if=/dev/e69b2953-20a9-4f68-95b4-c1162d62e615/metadata count=1' (cwd None) 62c03d4e-6cf5-480d-a3c1-55bf7ba85e22::ERROR::2013-03-06 14:01:02,068::volume::514::Storage.Volume::(create) Unexpected error Traceback (most recent call last): File "/usr/share/vdsm/storage/volume.py", line 476, in create srcVolUUID, imgPath, volPath) File "/usr/share/vdsm/storage/blockVolume.py", line 186, in _create mdSlot = dom.mapMetaOffset(volUUID, VOLUME_MDNUMBLKS) File "/usr/share/vdsm/storage/blockSD.py", line 711, in mapMetaOffset return self.getFreeMetadataSlot(slotSize) File "/usr/share/vdsm/storage/blockSD.py", line 744, in getFreeMetadataSlot occupiedSlots = self._getOccupiedMetadataSlots() File "/usr/share/vdsm/storage/blockSD.py", line 716, in _getOccupiedMetadataSlots for lv in lvm.getLV(self.sdUUID): File "/usr/share/vdsm/storage/lvm.py", line 847, in getLV lv = _lvminfo.getLv(vgName, lvName) File "/usr/share/vdsm/storage/lvm.py", line 604, in getLv lvs = self._reloadlvs(vgName) File "/usr/share/vdsm/storage/lvm.py", line 414, in _reloadlvs with self._oplock.acquireContext(LVM_OP_RELOAD): File "/usr/lib64/python2.6/contextlib.py", line 16, in __enter__ return self.gen.next() File "/usr/share/vdsm/storage/misc.py", line 1202, in acquireContext self.acquire(operation) File "/usr/share/vdsm/storage/misc.py", line 1215, in acquire self.log.debug("Got the operational mutex") File "/usr/lib64/python2.6/logging/__init__.py", line 1044, in debug self._log(DEBUG, msg, args, **kwargs) File "/usr/lib64/python2.6/logging/__init__.py", line 1173, in _log self.handle(record) File "/usr/lib64/python2.6/logging/__init__.py", line 1183, in handle self.callHandlers(record) File "/usr/lib64/python2.6/logging/__init__.py", line 1220, in callHandlers hdlr.handle(record) File "/usr/lib64/python2.6/logging/__init__.py", line 679, in handle self.emit(record) File "/usr/lib64/python2.6/logging/handlers.py", line 396, in emit stat = os.stat(self.baseFilename) OSError: [Errno 2] No such file or directory: '/var/log/vdsm/vdsm.log' 62c03d4e-6cf5-480d-a3c1-55bf7ba85e22::DEBUG::2013-03-06 14:01:02,750::resourceManager::538::ResourceManager::(releaseResource) Trying to release resource 'e69b2953-20a9-4f68-95b4-c1162d62e615_imageNS.4b312846-8fea-4d71-93ce-cc8bd92fc576' 62c03d4e-6cf5-480d-a3c1-55bf7ba85e22::DEBUG::2013-03-06 14:01:02,759::resourceManager::553::ResourceManager::(releaseResource) Released resource 'e69b2953-20a9-4f68-95b4-c1162d62e615_imageNS.4b312846-8fea-4d71-93ce-cc8bd92fc576' (0 active users) 62c03d4e-6cf5-480d-a3c1-55bf7ba85e22::DEBUG::2013-03-06 14:01:02,763::resourceManager::558::ResourceManager::(releaseResource) Resource 'e69b2953-20a9-4f68-95b4-c1162d62e615_imageNS.4b312846-8fea-4d71-93ce-cc8bd92fc576' is free, finding out if anyone is waiting for it. 62c03d4e-6cf5-480d-a3c1-55bf7ba85e22::DEBUG::2013-03-06 14:01:02,767::resourceManager::565::ResourceManager::(releaseResource) No one is waiting for resource 'e69b2953-20a9-4f68-95b4-c1162d62e615_imageNS.4b312846-8fea-4d71-93ce-cc8bd92fc576', Clearing records. 62c03d4e-6cf5-480d-a3c1-55bf7ba85e22::ERROR::2013-03-06 14:01:02,775::task::833::TaskManager.Task::(_setError) Task=`62c03d4e-6cf5-480d-a3c1-55bf7ba85e22`::Unexpected error Traceback (most recent call last): File "/usr/share/vdsm/storage/task.py", line 840, in _run return fn(*args, **kargs) File "/usr/share/vdsm/storage/task.py", line 307, in run return self.cmd(*self.argslist, **self.argsdict)
Comment 1 Yaniv Kaul 2013-03-07 18:32:50 UTC
OSError: [Errno 2] No such file or directory: '/var/log/vdsm/vdsm.log is a good hint - there's a bug that permissions change, please verify which permission vdsm.log has.
Comment 2 Haim 2013-03-07 19:58:42 UTC
(In reply to comment #1) > OSError: [Errno 2] No such file or directory: '/var/log/vdsm/vdsm.log is a > good hint - there's a bug that permissions change, please verify which > permission vdsm.log has. reboot solved the problem, so it doesn't look related to the permission change bug, maybe its related in someway, Yaniv?
Comment 3 firstname.lastname@example.org 2013-03-10 08:41:25 UTC
permissions for /var/log/vdsm/vdsm.log [root@tigris01 ~]# ls -lah /var/log/vdsm/vdsm.log -rw-r--r--. 1 vdsm kvm 24M Mar 10 08:40 /var/log/vdsm/vdsm.log
Comment 4 Maor 2013-03-10 13:27:36 UTC
Something here looks fishy, how come we get, no such file for /var/log/vdsm/vdsm.log, but this log line, is getting written there. and also it gets fixed after reboot. Can u please reproduce this again
Comment 5 Allon Mureinik 2013-03-10 14:46:42 UTC
Discussed with Yaniv B. - he is researching the log issue, which seems like the key issue here. If there is an additional storage bug, please reopen after this issue is cleared
Comment 6 Yaniv Bronhaim 2013-03-12 17:22:30 UTC
Apparently its python bug in logging package code. They check if the file exists, then run stat on file. If the file was deleted between the check and the stat it will throw an exception. We catch that exception and log the exception. So the next emit sees that the file is gone and creates it.
Comment 7 Saggi Mizrahi 2013-03-12 17:26:09 UTC
It appears to be a bug in pythons logging. They keep using this racy pattern: if not os.path.exists(self.baseFilename): ... else: stat = os.stat(self.baseFilename) ... This means that if you happen to logrotate in the middle of this it will throw an exception and the next time you log it will recreate the file which seems to follow the pattern we're seeing. This exists in python all they way up to 2.7 and even 3.X. I'll need to talk to dmalcolm replacement about patching python for us.
Comment 8 Yaniv Bronhaim 2013-03-27 11:16:52 UTC
Moging the bug to python.
Comment 9 Haim 2013-03-27 11:59:36 UTC
(In reply to comment #8) > Moging the bug to python. you closed the bug as NOTABUG? where is the python bug?
Comment 10 Yaniv Bronhaim 2013-03-27 12:13:15 UTC
didn't mean to change the status
Comment 12 Bohuslav "Slavek" Kabrda 2013-05-07 09:23:34 UTC
It seems that there is an upstream issue filed for this bug  and it even has patch . Could you please try to apply that patch and see if it fixes your problem? (It is for 2.7 branch, but AFAICS it should apply and work ok.)  http://bugs.python.org/issue14632  http://hg.python.org/cpython/rev/15a33d7d2b50
Comment 13 Yaniv Bronhaim 2013-05-12 12:01:15 UTC
Currently I haven't managed to reproduce it, it happened only once to our QE. The patch seems to fit the issue and should solve or decrease the possibility for the exception.. will it be possible to apply backport of this patch to python 2.6 for rhel6.5?
Comment 17 Robert Kuska 2013-06-24 08:44:46 UTC
Created attachment 764511 [details] Proposed patch - watched filehandler
Comment 23 errata-xmlrpc 2013-11-21 09:13:59 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-1582.html