Bug 1327522 - pulp to log warn messages when swallowing an exception
Summary: pulp to log warn messages when swallowing an exception
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Pulp
Version: 6.1.8
Hardware: x86_64
OS: Linux
medium
high vote
Target Milestone: Unspecified
Assignee: satellite6-bugs
QA Contact: Peter Ondrejka
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-04-15 09:56 UTC by Pavel Moravec
Modified: 2021-04-06 17:56 UTC (History)
13 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-02-21 16:54:37 UTC
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Pulp Redmine 2136 0 Normal CLOSED - CURRENTRELEASE publish step error handling incorrectly assumes open file 2017-01-16 21:02:39 UTC

Description Pavel Moravec 2016-04-15 09:56:35 UTC
Description of problem:
In /usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py: line 165:

    def process(self):
..
            try:
                self.total_units = self._get_total()
                self.report_progress()
                self.initialize()
                self.report_progress()
..
            finally:
                # Always call finalize to allow cleanup of file handles
                self.finalize()
            self.post_process()

Assume an exception (example below) is raised such that the file expected to be created has not been created at all. Calling self.finalize() then fails with:

Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664) Task pulp.server.managers.repo.publish.publish[d83cb083-161b-472d-ba43-05d121dde87f] raised unexpected: OSError(2, 'No such file or directory')
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664) Traceback (most recent call last):
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)   File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)     R = retval = fun(*args, **kwargs)
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 328, in __call__
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)     return super(Task, self).__call__(*args, **kwargs)
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)   File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 437, in __protected_call__
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)     return self.run(*args, **kwargs)
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)   File "/usr/lib/python2.7/site-packages/pulp/server/managers/repo/publish.py", line 99, in publish
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)     transfer_repo, conduit, call_config)
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)   File "/usr/lib/python2.7/site-packages/pulp/server/managers/repo/publish.py", line 127, in _do_publish
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)     publish_report = publish_repo(transfer_repo, conduit, call_config)
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 495, in wrap_f
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)     return f(*args, **kwargs)
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)   File "/usr/lib/python2.7/site-packages/pulp_rpm/plugins/distributors/yum/distributor.py", line 143, in publish_repo
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)     return self._publisher.publish()
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 558, in publish
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)     return self.process_lifecycle()
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 503, in process_lifecycle
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)     super(PluginStep, self).process_lifecycle()
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 127, in process_lifecycle
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)     step.process()
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 211, in process
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)     self.finalize()
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)   File "/usr/lib/python2.7/site-packages/pulp_rpm/plugins/distributors/yum/publish.py", line 419, in finalize
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)     self.other_context.checksum)
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)   File "/usr/lib/python2.7/site-packages/pulp_rpm/plugins/distributors/yum/metadata/repomd.py", line 82, in add_metadata_file_metadata
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)     timestamp_element.text = str(int(os.path.getmtime(file_path)))
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)   File "/usr/lib64/python2.7/genericpath.py", line 54, in getmtime
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)     return os.stat(filename).st_mtime
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664) OSError: [Errno 2] No such file or directory: '/var/lib/pulp/working/repos/Default_Organization-Red_Hat_Enterprise_Linux_Server-Red_Hat_Enterprise_Linux_7_Server_RPMs_x86_64_7Server/distributors/yum_distributor/repodata/other.xml.gz'

This happens due to the fact that MetadataFileContext finalize method sees the file is not opened, so it does nothing, but PublishRpmStep finalize assumes there has been created file for other_context. (same applies when an exception is raised for file_lists or primary context).

The exception above is practically impossible to troubleshoot. It is so due to the fact a swallowed exception is silently ignored. Simple printing warning with exception details would help a lot to understand what happened.

One real example what exception could be raised & swallowed:

Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448) 01612026: caught exception: Traceback (most recent call last):
Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448)   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 183, in process
Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448)     self.initialize()
Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448)   File "/usr/lib/python2.7/site-packages/pulp_rpm/plugins/distributors/yum/publish.py", line 403, in initialize
Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448)     context.initialize()
Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448)   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/metadata_writer.py", line 74, in initialize
Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448)     self._open_metadata_file_handle()
Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448)   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/metadata_writer.py", line 364, in _open_metadata_file_handle
Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448)     content = gzip_handle.read(BUFFER_SIZE)
Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448)   File "/usr/lib64/python2.7/gzip.py", line 261, in read
Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448)     self._read(readsize)
Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448)   File "/usr/lib64/python2.7/gzip.py", line 325, in _read
Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448)     self._read_eof()
Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448)   File "/usr/lib64/python2.7/gzip.py", line 347, in _read_eof
Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448)     hex(self.crc)))
Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448) IOError: CRC check failed 0x7d4c4209 != 0x60a4e457L

I.e. CRC check on the gzip file is wrong -> exception raised -> silently ignored -> no other (or filelist or primary) context file exists -> cryptic "No such file exists" raised.

Another possibility might be (havent tested/seen) file permissions, too many open FDs, some I/O problem,.. many.

Please add some warning what exception was raised, like:


                # Double check & return if we have been canceled
                if self.canceled:
                    return
            except Exception as e:
                _LOG.warn('caught exception: %s' % traceback.format_exc())
            finally:
                # Always call finalize to allow cleanup of file handles
                self.finalize()


Version-Release number of selected component (if applicable):
pulp-server-2.6.0.20-1.el7sat.noarch


How reproducible:
100% (if reproducer known)


Steps to Reproduce:
1. Somehow let damage primar / other / filelist gzip for some repo (not sure on this step)
2. Try to sync & publish the repo
3. Try to understand from logs what happened


Actual results:
2. fails
3. no idea what happened, "No such file or directory: other.xml.gz" is use-less info hiding the real cause.


Expected results:
2. fails
3. original exception raised that prevented opening the metadata / context file.


Additional info:

Comment 2 Bryan Kearney 2016-07-08 20:22:46 UTC
Per 6.3 planning, moving out non acked bugs to the backlog

Comment 5 pulp-infra@redhat.com 2016-11-21 18:52:01 UTC
The Pulp upstream bug status is at NEW. Updating the external tracker on this bug.

Comment 6 pulp-infra@redhat.com 2016-11-21 18:52:04 UTC
The Pulp upstream bug priority is at Normal. Updating the external tracker on this bug.

Comment 7 Michael Hrivnak 2016-12-05 20:15:30 UTC
I suggest re-triaging this for priority and severity.

This bug is very difficult to reproduce. If pulp fails to open a new file for writing, the error handling tries to then close that file. This just causes the error to be reported incorrectly, obscuring the original (highly unlikely) problem.

That is silly, but very unlikely to happen in the real world. It's worth fixing, but given that information, I suggest the severity should not be "high", and perhaps the priority should not even be "medium".

Comment 8 pulp-infra@redhat.com 2016-12-12 21:31:43 UTC
The Pulp upstream bug status is at ASSIGNED. Updating the external tracker on this bug.

Comment 9 pulp-infra@redhat.com 2016-12-12 23:31:40 UTC
The Pulp upstream bug status is at POST. Updating the external tracker on this bug.

Comment 10 pulp-infra@redhat.com 2016-12-16 21:12:28 UTC
The Pulp upstream bug status is at MODIFIED. Updating the external tracker on this bug.

Comment 11 pulp-infra@redhat.com 2016-12-16 21:31:32 UTC
All upstream Pulp bugs are at MODIFIED+. Moving this bug to POST.

Comment 14 pulp-infra@redhat.com 2017-01-10 02:02:24 UTC
The Pulp upstream bug status is at ON_QA. Updating the external tracker on this bug.

Comment 15 pulp-infra@redhat.com 2017-01-16 21:02:40 UTC
The Pulp upstream bug status is at CLOSED - CURRENTRELEASE. Updating the external tracker on this bug.

Comment 16 Peter Ondrejka 2018-01-23 16:34:58 UTC
Verified on Sat 6.3 snap 33

Comment 17 Satellite Program 2018-02-21 16:54:37 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
> > 
> > For information on the advisory, and where to find the updated files, follow the link below.
> > 
> > If the solution does not work for you, open a new bug report.
> > 
> > https://access.redhat.com/errata/RHSA-2018:0336


Note You need to log in before you can comment on or make changes to this bug.