Bug 1479533

Summary: [starter-us-east-1] error from yum module during upgrade
Product: OpenShift Container Platform Reporter: Justin Pierce <jupierce>
Component: Installer Assignee: Luke Meyer <lmeyer>
Status: CLOSED ERRATA QA Contact: Johnny Liu <jialiu>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 3.6.0 CC: aos-bugs, jokerman, mmccomas
Target Milestone: ---   
Target Release: 3.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-11-28 22:06:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Justin Pierce 2017-08-08 18:00:17 UTC
Description of problem:

Version-Release number of the following components:
openshift-ansible v3.6.173.0.5
ansible 2.2.3.0

How reproducible:
Rare

Steps to Reproduce:
1. Ran the upgrade on a large HA cluster (>100 nodes). The failure occurred on one node.

Actual results:

Using module file /usr/lib/python2.7/site-packages/ansible/modules/core/packaging/os/yum.py
<54.174.162.89> ESTABLISH SSH CONNECTION FOR USER: root
<54.174.162.89> SSH: EXEC ssh -o ControlMaster=auto -o ControlPersist=600s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 -o ControlPath=/home/opsmedic/.ansible/cp/ansible-ssh-%h-%p-%r 54.174.162.89 '/bin/sh -c '"'"'/usr/bin/python && sleep 0'"'"''
fatal: [starter-us-east-1-node-compute-8bcb6]: FAILED! => {
    "changed": false, 
    "failed": true, 
    "invocation": {
        "module_args": {
            "conf_file": null, 
            "disable_gpg_check": false, 
            "disablerepo": null, 
            "enablerepo": null, 
            "exclude": null, 
            "install_repoquery": true, 
            "list": null, 
            "name": [
                "rQ4"
            ], 
            "state": "latest", 
            "update_cache": false, 
            "validate_certs": true
        }
    }, 
    "msg": "Traceback (most recent call last):\n  File \"/usr/bin/yum\", line 29, in <module>\n    yummain.user_main(sys.argv[1:], exit_code=True)\n  File \"/usr/share/yum-cli/yummain.py\", line 370, in user_main\n    errcode = main(args)\n  File \"/usr/share/yum-cli/yummain.py\", line 179, in main\n    result, resultmsgs = base.doCommands()\n  File \"/usr/share/yum-cli/cli.py\", line 573, in doCommands\n    return self.yum_cli_commands[self.basecmd].doCommand(self, self.basecmd, self.extcmds)\n  File \"/usr/share/yum-cli/yumcommands.py\", line 1626, in doCommand\n    ypl = base.returnPkgLists(extcmds, repoid=repoid)\n  File \"/usr/share/yum-cli/cli.py\", line 1400, in returnPkgLists\n    ignore_case=True, repoid=repoid)\n  File \"/usr/lib/python2.7/site-packages/yum/__init__.py\", line 3005, in doPackageLists\n    for (n,a,e,v,r) in self.up.getUpdatesList():\n  File \"/usr/lib/python2.7/site-packages/yum/__init__.py\", line 1093, in <lambda>\n    up = property(fget=lambda self: self._getUpdates(),\n  File \"/usr/lib/python2.7/site-packages/yum/__init__.py\", line 838, in _getUpdates\n    self._up = rpmUtils.updates.Updates(self.rpmdb.simplePkgList(), self.pkgSack.simplePkgList())\n  File \"/usr/lib/python2.7/site-packages/yum/__init__.py\", line 1074, in <lambda>\n    pkgSack = property(fget=lambda self: self._getSacks(),\n  File \"/usr/lib/python2.7/site-packages/yum/__init__.py\", line 778, in _getSacks\n    self.repos.populateSack(which=repos)\n  File \"/usr/lib/python2.7/site-packages/yum/repos.py\", line 386, in populateSack\n    sack.populate(repo, mdtype, callback, cacheonly)\n  File \"/usr/lib/python2.7/site-packages/yum/yumRepo.py\", line 242, in populate\n    mydbtype)\n  File \"/usr/lib/python2.7/site-packages/yum/yumRepo.py\", line 287, in _check_uncompressed_db_gen\n    cached=repo.cache)\n  File \"/usr/lib/python2.7/site-packages/yum/misc.py\", line 1165, in repo_gen_decompress\n    return decompress(filename, dest=dest, check_timestamps=True)\n  
File \"/usr/lib/python2.7/site-packages/yum/misc.py\", line 1152, in decompress\n    os.utime(out, (fi.st_mtime, fi.st_mtime))\nOSError: [Errno 2] No such file or directory: '/var/cache/yum/x86_64/7Server/rhel-7-server-rpms/gen/primary_db.sqlite'\n", 
    "rc": 1, 
    "results": []
}

Comment 1 Scott Dodson 2017-08-08 18:08:54 UTC
We need to add failure tolerance to all node operations rather than just the drain and upgrade phases.

Comment 2 Scott Dodson 2017-09-25 15:25:03 UTC
We've added retries around yum transactions.
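
For readers unfamiliar with the approach, here is a minimal Python sketch of the retry idea. It is not the openshift-ansible change itself: the names run_with_retries and flaky_yum_transaction are hypothetical, and the simulated failure only imitates the transient OSError from the traceback above. In a playbook the same idea is usually expressed with the task keywords retries, delay, and until.

```python
import errno
import time


def run_with_retries(action, retries=3, delay=1.0, retry_on=(OSError,), sleep=time.sleep):
    """Call action() until it succeeds or `retries` attempts have been made.

    Hypothetical helper for illustration only. Exceptions listed in
    `retry_on` are treated as transient; anything else propagates at once.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return action()
        except retry_on as err:
            last_error = err
            if attempt < retries:
                # Wait before the next attempt so a concurrent cache
                # refresh (assumed cause) has time to finish.
                sleep(delay)
    # All attempts failed: surface the last error to the caller.
    raise last_error


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_yum_transaction():
        # Simulated stand-in for a yum call: fails twice with ENOENT,
        # then succeeds. No real yum API is used here.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise OSError(errno.ENOENT, "No such file or directory")
        return "transaction complete"

    print(run_with_retries(flaky_yum_transaction, retries=3, delay=0.1))
```

A bounded retry count keeps a persistently broken node failing visibly instead of stalling the whole upgrade.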

Comment 4 Johnny Liu 2017-09-27 14:58:21 UTC
Already verified in https://bugzilla.redhat.com/show_bug.cgi?id=1482551#c8; result: PASS.

Comment 8 errata-xmlrpc 2017-11-28 22:06:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188