919163 – python logging problem - when rotating to new log file, logger checks file's stat when the file does not exist

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 919163 - python logging problem - when rotating to new log file, logger checks file's stat when the file does not exist

Summary: python logging problem - when rotating to new log file, logger checks file's ...

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	python
Sub Component:
Version:	6.4
Hardware:	x86_64
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	rc
Target Release:	---
Assignee:	Robert Kuska
QA Contact:	Jan Kepler
Docs Contact:
URL:
Whiteboard:	infra
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2013-03-07 18:26 UTC by vvyazmin@redhat.com
Modified:	2016-04-18 10:16 UTC (History)
CC List:	14 users (show)
Fixed In Version:	python-2.6.6-47.el6
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2013-11-21 09:13:59 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
## Logs vdsm, rhevm (4.82 MB, application/x-gzip) 2013-03-07 18:26 UTC, vvyazmin@redhat.com	no flags	Details
Proposed patch - watched filehandler (2.54 KB, patch) 2013-06-24 08:44 UTC, Robert Kuska	no flags	Details \| Diff
View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2013:1582	0	normal	SHIPPED_LIVE	Moderate: python security, bug fix, and enhancement update	2013-11-20 21:39:43 UTC

Description vvyazmin@redhat.com 2013-03-07 18:26:51 UTC

Created attachment 706757 [details]
## Logs vdsm, rhevm

Description of problem:  SPM server turned to “Non Responsive” state during  creation a pool with multiple VM's


Version-Release number of selected component (if applicable):
RHEVM 3.1.3 - SI27.3 environment:

RHEVM: rhevm-3.1.0-50.el6ev.noarch
VDSM: vdsm-4.10.2-1.6.el6.x86_64
LIBVIRT: libvirt-0.10.2-18.el6.x86_64
QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.355.el6_4.1.x86_64
SANLOCK: sanlock-2.6-2.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
My environment:
DC: FC
Hosts: Servers x 4 (tigris[01-02].environment)
SD: XIO connected via FC switch (8Gbit)
VM's: Pool with 1000 VM's (automatic pool)

1. Start create a pool with multiple VM's (in my case it was one pool with 1000 VM's)
  
Actual results:
After approximately 388 VM's (57 min.) , SPM server turned to “Non Responsive” state.

Expected results:
Successfully create a pool

Additional info:
After I rebooted SPM host, system return to work normally, and I continue create a relevant pool.

/var/log/ovirt-engine/engine.log

/var/log/vdsm/vdsm.log

Thread-5991::DEBUG::2013-03-06 14:01:02,711::misc::83::Storage.Misc.excCmd::(<lambda>) '/bin/dd iflag=direct skip=5 bs=512 if=/dev/e69b2953-20a9-4f68-95b4-c1162d62e615/metadata count=1' (cwd None)
62c03d4e-6cf5-480d-a3c1-55bf7ba85e22::ERROR::2013-03-06 14:01:02,068::volume::514::Storage.Volume::(create) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/volume.py", line 476, in create
    srcVolUUID, imgPath, volPath)
  File "/usr/share/vdsm/storage/blockVolume.py", line 186, in _create
    mdSlot = dom.mapMetaOffset(volUUID, VOLUME_MDNUMBLKS)
  File "/usr/share/vdsm/storage/blockSD.py", line 711, in mapMetaOffset
    return self.getFreeMetadataSlot(slotSize)
  File "/usr/share/vdsm/storage/blockSD.py", line 744, in getFreeMetadataSlot
    occupiedSlots = self._getOccupiedMetadataSlots()
  File "/usr/share/vdsm/storage/blockSD.py", line 716, in _getOccupiedMetadataSlots
    for lv in lvm.getLV(self.sdUUID):
  File "/usr/share/vdsm/storage/lvm.py", line 847, in getLV
    lv = _lvminfo.getLv(vgName, lvName)
  File "/usr/share/vdsm/storage/lvm.py", line 604, in getLv
    lvs = self._reloadlvs(vgName)
  File "/usr/share/vdsm/storage/lvm.py", line 414, in _reloadlvs
    with self._oplock.acquireContext(LVM_OP_RELOAD):
  File "/usr/lib64/python2.6/contextlib.py", line 16, in __enter__
    return self.gen.next()
  File "/usr/share/vdsm/storage/misc.py", line 1202, in acquireContext
    self.acquire(operation)
  File "/usr/share/vdsm/storage/misc.py", line 1215, in acquire
    self.log.debug("Got the operational mutex")
  File "/usr/lib64/python2.6/logging/__init__.py", line 1044, in debug
    self._log(DEBUG, msg, args, **kwargs)
  File "/usr/lib64/python2.6/logging/__init__.py", line 1173, in _log
    self.handle(record)
  File "/usr/lib64/python2.6/logging/__init__.py", line 1183, in handle
    self.callHandlers(record)
  File "/usr/lib64/python2.6/logging/__init__.py", line 1220, in callHandlers
    hdlr.handle(record)
  File "/usr/lib64/python2.6/logging/__init__.py", line 679, in handle
    self.emit(record)
  File "/usr/lib64/python2.6/logging/handlers.py", line 396, in emit
    stat = os.stat(self.baseFilename)
OSError: [Errno 2] No such file or directory: '/var/log/vdsm/vdsm.log'
62c03d4e-6cf5-480d-a3c1-55bf7ba85e22::DEBUG::2013-03-06 14:01:02,750::resourceManager::538::ResourceManager::(releaseResource) Trying to release resource 'e69b2953-20a9-4f68-95b4-c1162d62e615_imageNS.4b312846-8fea-4d71-93ce-cc8bd92fc576'
62c03d4e-6cf5-480d-a3c1-55bf7ba85e22::DEBUG::2013-03-06 14:01:02,759::resourceManager::553::ResourceManager::(releaseResource) Released resource 'e69b2953-20a9-4f68-95b4-c1162d62e615_imageNS.4b312846-8fea-4d71-93ce-cc8bd92fc576' (0 active users)
62c03d4e-6cf5-480d-a3c1-55bf7ba85e22::DEBUG::2013-03-06 14:01:02,763::resourceManager::558::ResourceManager::(releaseResource) Resource 'e69b2953-20a9-4f68-95b4-c1162d62e615_imageNS.4b312846-8fea-4d71-93ce-cc8bd92fc576' is free, finding out if anyone is waiting for it.
62c03d4e-6cf5-480d-a3c1-55bf7ba85e22::DEBUG::2013-03-06 14:01:02,767::resourceManager::565::ResourceManager::(releaseResource) No one is waiting for resource 'e69b2953-20a9-4f68-95b4-c1162d62e615_imageNS.4b312846-8fea-4d71-93ce-cc8bd92fc576', Clearing records.
62c03d4e-6cf5-480d-a3c1-55bf7ba85e22::ERROR::2013-03-06 14:01:02,775::task::833::TaskManager.Task::(_setError) Task=`62c03d4e-6cf5-480d-a3c1-55bf7ba85e22`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 840, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 307, in run
    return self.cmd(*self.argslist, **self.argsdict)

Comment 1 Yaniv Kaul 2013-03-07 18:32:50 UTC

OSError: [Errno 2] No such file or directory: '/var/log/vdsm/vdsm.log is a good hint - there's a bug that permissions change, please verify which permission vdsm.log has.

Comment 2 Haim 2013-03-07 19:58:42 UTC

(In reply to comment #1)
> OSError: [Errno 2] No such file or directory: '/var/log/vdsm/vdsm.log is a
> good hint - there's a bug that permissions change, please verify which
> permission vdsm.log has.

reboot solved the problem, so it doesn't look related to the permission change bug, maybe its related in someway, Yaniv?

Comment 3 vvyazmin@redhat.com 2013-03-10 08:41:25 UTC

permissions for /var/log/vdsm/vdsm.log

[root@tigris01 ~]# ls -lah /var/log/vdsm/vdsm.log
-rw-r--r--. 1 vdsm kvm 24M Mar 10 08:40 /var/log/vdsm/vdsm.log

Comment 4 Maor 2013-03-10 13:27:36 UTC

Something here looks fishy, how come we get, no such file for /var/log/vdsm/vdsm.log, but this log line, is getting written there. and also it gets fixed after reboot.
Can u please reproduce this again

Comment 5 Allon Mureinik 2013-03-10 14:46:42 UTC

Discussed with Yaniv B. - he is researching the log issue, which seems like the key issue here.

If there is an additional storage bug, please reopen after this issue is cleared

Comment 6 Yaniv Bronhaim 2013-03-12 17:22:30 UTC

Apparently its python bug in logging package code. They check if the file exists, then run stat on file. If the file was deleted between the check and the stat it will throw an exception. We catch that exception and log the exception. So the next emit sees that the file is gone and creates it.

Comment 7 Saggi Mizrahi 2013-03-12 17:26:09 UTC

It appears to be a bug in pythons logging.

They keep using this racy pattern:

        if not os.path.exists(self.baseFilename):
        ...
        else:
            stat = os.stat(self.baseFilename)
            ...

This means that if you happen to logrotate in the middle of this it will
throw an exception and the next time you log it will recreate the file
which seems to follow the pattern we're seeing.

This exists in python all they way up to 2.7 and even 3.X.
I'll need to talk to dmalcolm replacement about patching python
for us.

Comment 8 Yaniv Bronhaim 2013-03-27 11:16:52 UTC

Moging the bug to python.

Comment 9 Haim 2013-03-27 11:59:36 UTC

(In reply to comment #8)
> Moging the bug to python.

you closed the bug as NOTABUG? where is the python bug?

Comment 10 Yaniv Bronhaim 2013-03-27 12:13:15 UTC

didn't mean to change the status

Comment 12 Bohuslav "Slavek" Kabrda 2013-05-07 09:23:34 UTC

It seems that there is an upstream issue filed for this bug [1] and it even has patch [2]. Could you please try to apply that patch and see if it fixes your problem? (It is for 2.7 branch, but AFAICS it should apply and work ok.)


[1] http://bugs.python.org/issue14632
[2] http://hg.python.org/cpython/rev/15a33d7d2b50

Comment 13 Yaniv Bronhaim 2013-05-12 12:01:15 UTC

Currently I haven't managed to reproduce it, it happened only once to our QE. The patch seems to fit the issue and should solve or decrease the possibility for the exception.. will it be possible to apply backport of this patch to python 2.6 for rhel6.5?

Comment 17 Robert Kuska 2013-06-24 08:44:46 UTC

Created attachment 764511 [details]
Proposed patch - watched filehandler

Comment 23 errata-xmlrpc 2013-11-21 09:13:59 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-1582.html

Note You need to log in before you can comment on or make changes to this bug.