Bug 1361185

Summary: race condition of yum and rhn_check
Product: Red Hat Enterprise Linux 7 Reporter: Paulo Andrade <pandrade>
Component: rhn-client-toolsAssignee: Michael Mráka <mmraka>
Status: CLOSED ERRATA QA Contact: Pavel Studeník <pstudeni>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.2CC: jhutar, mmraka, pandrade, pstudeni, tlestach
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-01 22:19:00 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Paulo Andrade 2016-07-28 13:06:26 UTC
User had an issue and a reproducer would be:

# yum clean all; yum search rpmbuild
[...]
Traceback (most recent call last):
File "/bin/yum", line 29, in <module>
yummain.user_main(sys.argv[1:], exit_code=True)
File "/usr/share/yum-cli/yummain.py", line 365, in user_main
errcode = main(args)
File "/usr/share/yum-cli/yummain.py", line 174, in main
result, resultmsgs = base.doCommands()
File "/usr/share/yum-cli/cli.py", line 573, in doCommands
return self.yum_cli_commands[self.basecmd].doCommand(self, self.basecmd, self.extcmds)
File "/usr/share/yum-cli/yumcommands.py", line 1726, in doCommand
return base.search(extcmds)
File "/usr/share/yum-cli/cli.py", line 1440, in search
for (po, keys, matched_value) in matching:
File "/usr/lib/python2.7/site-packages/yum/_init_.py", line 3207, in searchGenerator
for sack in self.pkgSack.sacks.values():
File "/usr/lib/python2.7/site-packages/yum/_init_.py", line 1079, in <lambda>
pkgSack = property(fget=lambda self: self._getSacks(),
File "/usr/lib/python2.7/site-packages/yum/_init_.py", line 784, in _getSacks
self.repos.populateSack(which=repos)
File "/usr/lib/python2.7/site-packages/yum/repos.py", line 383, in populateSack
sack.populate(repo, mdtype, callback, cacheonly)
File "/usr/lib/python2.7/site-packages/yum/yumRepo.py", line 260, in populate
dobj = repo_cache_function(xml, csum)
File "/usr/lib64/python2.7/site-packages/sqlitecachec.py", line 46, in getPrimary
self.repoid))
TypeError: Can not create providesname index: disk I/O error

  User was told to use a systemtap script to check for
file access, and when the problem happens, an overlapping
open/unlink was found:

[...]
open   : yum[10870] "/var/cache/yum/x86_64/7Server/rhel-7-server-rpms/gen/primary.xml.sqlite-journal"        <- yum open
unlink : rhn_check[10882] "/var/cache/yum/x86_64/7Server/rhel-7-server-rpms/gen/primary.xml.sqlite-journal"
open   : rhn_check[10882] "/var/cache/yum/x86_64/7Server/rhel-7-server-rpms/gen/primary.xml.sqlite-journal"
unlink : rhn_check[10882] "/var/cache/yum/x86_64/7Server/rhel-7-server-rpms/gen/primary.xml.sqlite-journal"
open   : rhn_check[10882] "/var/cache/yum/x86_64/7Server/rhel-7-server-rpms/gen/primary.xml.sqlite-journal"
unlink : rhn_check[10882] "/var/cache/yum/x86_64/7Server/rhel-7-server-rpms/gen/primary.xml.sqlite-journal"
open   : rhn_check[10882] "/var/cache/yum/x86_64/7Server/rhel-7-server-rpms/gen/primary.xml.sqlite-journal"  <- rhn_check open
unlink : rhn_check[10882] "/var/cache/yum/x86_64/7Server/rhel-7-server-rpms/gen/primary.xml.sqlite-journal"  <- rhn_check unlink
unlink : yum[10870] "/var/cache/yum/x86_64/7Server/rhel-7-server-rpms/gen/primary.xml.sqlite-journal"        <- yum unlink

  The last yum unlink causes the error, that it "detects" as
a disk I/O error when it attempts to unlink a file that only
it (yum) should have open.

  User has rhn_check running every 5 minutes.

Comment 3 Tomas Lestach 2016-10-26 09:04:51 UTC
Following the reproducer steps I see no traceback ...

# yum clean all; yum search rpmbuild
Loaded plugins: langpacks, product-id, rhnplugin, search-disabled-repos,
              : subscription-manager
Cleaning repos: rhel-x86_64-server-7
Cleaning up everything
Loaded plugins: langpacks, product-id, rhnplugin, search-disabled-repos,
              : subscription-manager
This system is receiving updates from RHN Classic or Red Hat Satellite.
rhel-x86_64-server-7                                     | 1.5 kB     00:00     
rhel-x86_64-server-7/group                               | 626 kB     00:00     
rhel-x86_64-server-7/updateinfo                          | 1.2 MB     00:00     
rhel-x86_64-server-7/primary                             |  15 MB     00:00     
rhel-x86_64-server-7                                                11396/11396
============================ N/S matched: rpmbuild =============================
perl-macros.x86_64 : Macros for rpmbuild

  Name and summary matches only, use "search all" for everything.
#

Comment 4 Tomas Lestach 2016-10-26 09:05:50 UTC
To correctly triage the BZ, can we get more information on this BZ? And to enhance reproducer steps?

Comment 5 Paulo Andrade 2016-10-26 11:43:03 UTC
The problem was concurrent access from rhn_check in
the background, and the user running yum.

The systemtap script used was:
"""
#!/usr/bin/stap

global mypath = "/var/cache/yum/x86_64/7Server/rhel-7-server-rpms/gen/primary.xml.sqlite-journal";
global quoted;

probe begin {
    printf("Stap ready.\n")
    quoted = sprintf("\"%s\"", mypath)
}

probe syscall.open {
    if (filename == quoted)
      printf("open   : %s[%d] %s\n", execname(), tid(), quoted)
}

probe syscall.unlink {
    if (pathname == quoted)
      printf("unlink : %s[%d] %s\n", execname(), tid(), pathname)
}

probe syscall.unlinkat {
    if (pathname_str == quoted)
      printf("unlinkat: %s[%d] %s\n", execname(), tid(), pathname_str)
}

probe syscall.rename {
    if (oldpath == quoted || newpath == quoted)
      printf("rename : %s[%d] %s -> %s\n", execname(), tid(), oldpath, newpath)
}

probe end {
    printf("Stap done.\n");
}
"""

And the part of the output showing the race was:
"""
open   : yum[10870] "/var/cache/yum/x86_64/7Server/rhel-7-server-rpms/gen/primary.xml.sqlite-journal"        <- yum open
unlink : rhn_check[10882] "/var/cache/yum/x86_64/7Server/rhel-7-server-rpms/gen/primary.xml.sqlite-journal"
open   : rhn_check[10882] "/var/cache/yum/x86_64/7Server/rhel-7-server-rpms/gen/primary.xml.sqlite-journal"
unlink : rhn_check[10882] "/var/cache/yum/x86_64/7Server/rhel-7-server-rpms/gen/primary.xml.sqlite-journal"
open   : rhn_check[10882] "/var/cache/yum/x86_64/7Server/rhel-7-server-rpms/gen/primary.xml.sqlite-journal"
unlink : rhn_check[10882] "/var/cache/yum/x86_64/7Server/rhel-7-server-rpms/gen/primary.xml.sqlite-journal"
open   : rhn_check[10882] "/var/cache/yum/x86_64/7Server/rhel-7-server-rpms/gen/primary.xml.sqlite-journal"  <- rhn_check open
unlink : rhn_check[10882] "/var/cache/yum/x86_64/7Server/rhel-7-server-rpms/gen/primary.xml.sqlite-journal"  <- rhn_check unlink
unlink : yum[10870] "/var/cache/yum/x86_64/7Server/rhel-7-server-rpms/gen/primary.xml.sqlite-journal"        <- yum unlink
"""

  rhn_check and yum probably should have an exclusive lock
to prevent either from accessing the above files.

  A reproducer may not be too easy to build. It would
be something like:

script1:
while :; do rhn_check; done

script2:
while :; do yum clean all; yum search rpmbuild; done

what should trigger the condition relatively fast.


The original issue was due to running rhn_check every 5
minutes, and yum from time to time, so the problem would
infrequently happen.

Comment 6 Michael Mráka 2017-03-02 15:24:52 UTC
Fixed in upstream
commit 63010f56bae6a794ae6317bb9870edcc14518393
    1361185 - use yum's lock to prevent concurent yum & rhn_check actions

Comment 9 Pavel Studeník 2017-06-23 16:06:13 UTC
run:

>> while :; do rhn_check; done
>> while :; do yum clean all; yum search rpmbuild; done
....
    self.repos.populateSack(which=repos)
  File "/usr/lib/python2.7/site-packages/yum/repos.py", line 386, in populateSack
    sack.populate(repo, mdtype, callback, cacheonly)
  File "/usr/lib/python2.7/site-packages/yum/yumRepo.py", line 263, in populate
    dobj = repo_cache_function(xml, csum)
  File "/usr/lib64/python2.7/site-packages/sqlitecachec.py", line 46, in getPrimary
    self.repoid))
TypeError: Can not create providesname index: disk I/O error


Verified: yum-rhn-plugin-2.0.1-9.el7.noarch

Comment 10 errata-xmlrpc 2017-08-01 22:19:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2039