Bug 1051658 - satellite-sync DST handling issue: "'NoneType' object is unsubscriptable" error on MST/PST Satellites with RHEL 6 Supplementary Channel RHBA-2013:1729-1 (erratum modified within 1hr before DST)
Summary: satellite-sync DST handling issue: "'NoneType' object is unsubscriptable" err...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Spacewalk
Classification: Community
Component: Server
Version: 2.0
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
Assignee: Stephen Herr
QA Contact: Red Hat Satellite QA List
URL:
Whiteboard:
: 1057265 (view as bug list)
Depends On: 1038207
Blocks: 1063340 space21
TreeView+ depends on / blocked
 
Reported: 2014-01-10 19:42 UTC by Stephen Herr
Modified: 2016-12-06 21:49 UTC (History)
16 users (show)

Fixed In Version: spacewalk-backend-2.1.45-1
Doc Type: Bug Fix
Doc Text:
Clone Of: 1038207
: 1063340 (view as bug list)
Environment:
Last Closed: 2014-03-04 13:07:56 UTC
Embargoed:


Attachments (Terms of Use)

Description Stephen Herr 2014-01-10 19:42:43 UTC
+++ This bug was initially created as a clone of Bug #1038207 +++

Description of problem:

Satellite sync is failing for supplementary channels: 

rhel-x86_64-server-supplementary-5 
rhel-x86_64-server-supplementary-6
rhel-x86_64-workstation-supplementary-6
rhel-x86_64-client-supplementary-6

Believe this might be a possible CDN issue. We are currently trying to get a dump to a customer to confirm syncs against a dump will succeed. 

How reproducible:

satellite-sync for supplementary channels

so far only external customers, not internal


 Error when syncing satellite:

 ------------------------------------------------------------ 
   SYNC ERROR: unhandled exception occurred:

  Exception reported from its-rhnwb
  Time: Tue Nov 26 04:13:27 2013
  Exception type <type 'exceptions.TypeError'>

  Exception Handler Information
  Traceback (most recent call last):
   File "/usr/bin/satellite-sync", line 139, in main
     return satsync.Runner().main()
   File "/usr/lib/python2.6/site-packages/spacewalk/satellite_tools/satsync.py", line 225, in main
     ret = method()

   [...]

  File "/usr/lib/python2.6/site-packages/spacewalk/server/importlib/errataImport.py", line 56, in _preprocessErratum
     self._processPackage(package)
   File "/usr/lib/python2.6/site-packages/spacewalk/server/importlib/importLib.py", line 666, in _processPackage
     Import._processPackage(self, package)
   File "/usr/lib/python2.6/site-packages/spacewalk/server/importlib/importLib.py", line 646, in _processPackage
     evr.append(package[f])
  TypeError: 'NoneType' object is unsubscriptable

--- Additional comment from Shannon Hughes on 2014-01-09 11:40:51 EST ---

Some more progress this morning. This might be a DST related issue. There are no issues accessing the pickled satsync/short-packages/63/rhn-package-798463 file. We do read in the modified dates. From rhnLib.py we convert the value to a timestamp, 

1383466145.0

When looking at the cache we open a locked file object and then use the python os module to grab the modified timestamp: 

        if self.modified:
            if os.fstat(fd.fileno())[ST_MTIME] != self.modified:               
                fd.close()
                raise KeyError(name)

>>> import os
>>> import sys
>>> fd = open('/var/cache/rhn/satsync/short-packages/63/rhn-package-798463', 'r')
>>> import fcntl
>>> from stat import ST_MTIME
>>> s = os.fstat(fd.fileno())[ST_MTIME]
>>> s
1383462545

The two values are 3600 seconds apart, which screams DST issue to me. When i run the timestamp function from rhnLib in the python interpreter I get back a value that matches the os.fstat call: 

>>> t = rhnLib.timestamp("20131103010905")
>>> t
1383462545.0

--- Additional comment from Shannon Hughes on 2014-01-09 17:18:00 EST ---

Going to be changing this to a satellite bug as sherr and I have a good idea of the issue with the way mktime being called from satellite is deciding on DST. Full summary will be coming shortly.

--- Additional comment from Stephen Herr on 2014-01-09 17:22:56 EST ---

For some reason (that I have yet to determine) this bug affects only people whos Satellites are in MST or PST.

We can see on Satellites that reproduce this error that time.mktime (which just calls the linux C implementation of mktime, see 'man mktime'). That the same date-time string returns two different times during the course of the sat-sync. So for example (in mountain time) if the last_modified date of the package in question is "2013-11-03 01:09:05", then even though that date string does not change, at one point in the satsync time.mktime returns 1383462545.0 and at another point in the sat-sync time.mktime returns 1383466145.0.

Note that both are actually valid timestamps for that last_modified date. November 3rd is the end of daylight savings time in the US, so the 1:00 hour is repeated as we "fall back". Note also that in that last_modified date there is no timezone information or any other way for mktime to determine if it should return the dst timestamp or non-dst timestamp.

The mktime manpage states about the tm_isdst variable (which is -1 when we pass it in since we don't know if dst was in effect or not): "...and a negative value means that mktime() should (use timezone information and system databases to) attempt to determine whether  DST  is in effect at the specified time." Something during the sat-sync changes the system state such that mktime's guess as to whether to return the dst timestamp or not changes. The linux time functions save their state in a variable named "daylight", so the difference may be something as simple as "The previously evaluated time was in dst / non dst".

The timestamp that gets returned matters, because for some reason we lookup the package data from an in-memory cache by package_id *and* timestamp. This has always been the case as far back in the history as I am able to look. Because look up the package data by timestamp, when we use a different timestamp for a package we have previously read we get a None object back, which leads to the TypeError reported in comment 0.

I can't tell you why this only affects servers in Mountain or Pacific time. I can't tell you what exactly is changing to make linux's mktime function change its mind about what the timestamp should be. But this is the problem. I think the fix should be to update the sat-sync code to only use pid to look up package information, not pid and timestamp.

Comment 1 Stephen Herr 2014-01-10 19:51:07 UTC
Committing to Spacewalk master:
3143007bf42ca1f063cdb7a949b4a1b0157bd38f

Comment 2 Matej Kollar 2014-01-17 12:15:44 UTC
Switching MODIFIED Spacewalk bugs to ON_QA before 2.1 release.

Comment 3 Stephen Herr 2014-01-24 22:27:07 UTC
In bug 1057265 jhutar discovered a problem with sat-syncing from channel dumps that turned out to be due to this bug. Fixing:

ce24ca518c0914b78b8884a1f5f854cb300b96ed

Comment 4 Stephen Herr 2014-01-24 22:28:51 UTC
*** Bug 1057265 has been marked as a duplicate of this bug. ***

Comment 5 Matej Kollar 2014-03-04 13:07:56 UTC
Spacewalk 2.1 has been released.
https://fedorahosted.org/spacewalk/wiki/ReleaseNotes21

Comment 6 Matej Kollar 2014-03-04 13:08:58 UTC
Spacewalk 2.1 has been released.
https://fedorahosted.org/spacewalk/wiki/ReleaseNotes21

Comment 7 Stephen Herr 2016-12-06 21:49:45 UTC
(In reply to Stephen Herr from comment #0)
> I can't tell you why this only affects servers in Mountain or Pacific time.
> I can't tell you what exactly is changing to make linux's mktime function
> change its mind about what the timestamp should be. But this is the problem. I > think the fix should be to update the sat-sync code to only use pid to look up > package information, not pid and timestamp.

I realized I never added an additional comment to clarify, so I'll do so now since this gets referenced every once in a while.

The problem is with Daylight Savings Time and mktime. Mktime allows you to tell it if the given datetime is in DST or not. If you don't tell it, it figures it out based on the date / time. If it *can't* figure it out, because for example you are in the "fall back" hour that is repeated, then it *guesses* based on the *last* date / time that you passed to mktime.

So which unix timestamp mktime returns for a given second in the fall-back hour is therefore non-deterministic. Sometimes it may assume the DST timestamp, sometimes the non-DST timestamp, based on whatever the last datetime it processed was.

So somewhere in the process when we were processing a long list of datetimes it chose one and then later in the process it chose the other. And since the combination of pid and timestamp was required for the unique key, it couldn't find what it was looking for and eventually resulted in trying to subscript a None object.


Note You need to log in before you can comment on or make changes to this bug.