Bug 1282838

Summary: Extremely slow channel.software.syncErrata API leads to spacewalk-clone-by-date traceback
Product: [Community] Spacewalk Reporter: Grant Gainey <ggainey>
Component: APIAssignee: Grant Gainey <ggainey>
Status: CLOSED CURRENTRELEASE QA Contact: Red Hat Satellite QA List <satqe-list>
Severity: medium Docs Contact:
Priority: medium    
Version: 2.4CC: bishop, ggainey, jalviso, kshirsal, mc, risantam, satqe-list, tkasparek, tpapaioa
Target Milestone: ---Keywords: Patch
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: spacewalk-java-2.5.12-1 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1220487 Environment:
Last Closed: 2016-06-16 07:48:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1484117    

Description Grant Gainey 2015-11-17 15:22:07 UTC
+++ This bug was initially created as a clone of Bug #1220487 +++

Description of problem:

Run spacewalk-clone-by-date for a large base channel, and it errors out when calling the channel.software.syncErrata API:

****
# spacewalk-clone-by-date -y -d "2015-05-01" -u admin -p XXXXXX --channels=rhel-x86_64-server-6 clone-rhel-x86_64-server-6
Reading repository information.
Using issue_date.

By continuing the following will be cloned:
  rhel-x86_64-server-6 -> clone-rhel-x86_64-server-6  (2899/2899 Errata)
Cloning Errata into clone-rhel-x86_64-server-6 (2899):
________________________________________
######################################## - complete
Traceback (most recent call last):
  File "/usr/bin/spacewalk-clone-by-date", line 412, in <module>
    sys.exit(abs(main() or 0))
  File "/usr/bin/spacewalk-clone-by-date", line 402, in main
    return cloneByDate.main(args)
  File "/usr/share/rhn/utils/cloneByDate.py", line 282, in main
    cloner.clone(options.skip_depsolve)
  File "/usr/share/rhn/utils/cloneByDate.py", line 464, in clone
    cloner.process()
  File "/usr/share/rhn/utils/cloneByDate.py", line 621, in process
    self.clone()
  File "/usr/share/rhn/utils/cloneByDate.py", line 730, in clone
    self.remote_api.sync_errata(self.to_label)
  File "/usr/share/rhn/utils/cloneByDate.py", line 872, in sync_errata
    self.client.channel.software.syncErrata(self.auth_token, to_label)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1199, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1489, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1243, in request
    headers
xmlrpclib.ProtocolError: <ProtocolError for localhost/rpc/api: 503 Service Temporarily Unavailable>

Alternatively, run an API script that calls channel.software.syncErrata for a clone channel directly.

****
# cat api-sync-errata.py 
#!/usr/bin/python
import xmlrpclib

SATELLITE_URL = "http://localhost/rpc/api"
SATELLITE_LOGIN = "admin"
SATELLITE_PASSWORD = "XXXXXX"

client = xmlrpclib.Server(SATELLITE_URL, verbose=0)

key = client.auth.login(SATELLITE_LOGIN, SATELLITE_PASSWORD)

print client.channel.software.syncErrata(key, "clone-rhel-x86_64-server-6")
client.auth.logout(key)

# ./api-sync-errata.py
Traceback (most recent call last):
  File "./api-sync-errata.py", line 12, in <module>
    print client.channel.software.syncErrata(key, "clone-rhel-x86_64-server-6-4")
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1199, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1489, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1243, in request
    headers
xmlrpclib.ProtocolError: <ProtocolError for localhost/rpc/api: 503 Service Temporarily Unavailable>
****

Version-Release number of selected component (if applicable):

spacewalk-java-2.3.8-102.el6sat.noarch

How reproducible:

100%

Steps to Reproduce:

See description above.

Actual results:

503 error when running spacewalk-clone-by-date, due to slow channel.software.syncErrata API.

Expected results:

No error, syncErrata API finishes in a reasonable amount of time.

Additional info:

The API runs extremely slowly for a few reasons:

1.) ChannelSoftwareHandler.syncErrata() calls ChannelManager.listErrataNeedingResync() and ChannelManager.listErrataPackagesForResync(), which call the datasource queries 
list_errata_needing_sync and list_packages_needing_sync in Errata_queries.xml, respectively, which return a List of ErrataOverview or PackageOverview objects. These objects are only used to get the corresponding errata or package id's. If there are many errata in the cloned channel that need to be synced, then creating all of these objects just to get their id's is inefficient, and we could replace these queries with ones that return the id's in the first place.

****
./com/redhat/rhn/frontend/xmlrpc/channel/software/ChannelSoftwareHandler.java:

    public Integer syncErrata(User loggedInUser, String channelLabel) {
[...]
        List<ErrataOverview> errata = ChannelManager.listErrataNeedingResync(channel,
                loggedInUser);
        List<Long> eids = new ArrayList<Long>();
        for (ErrataOverview e : errata) {
            eids.add(e.getId());
        }

        List<PackageOverview> packages = ChannelManager
                .listErrataPackagesForResync(channel, loggedInUser);
        List<Long> pids = new ArrayList<Long>();
        for (PackageOverview p : packages) {
            pids.add(p.getId());
        }

        ChannelEditor.getInstance().addPackages(loggedInUser, channel, pids);

        for (Long eid : eids) {
            Errata e = ErrataManager.lookupErrata(eid, loggedInUser);
            if (e.isPublished() && e.isCloned()) {
                ErrataFactory.syncErrataDetails((PublishedClonedErrata) e);
            }
            else {
                log.fatal("Tried to sync errata with id " + eid +
                        " But it was not published or was not cloned");
            }
        }
****

2.) The queries used by list_errata_needing_sync and list_packages_needing_sync queries themselves are extremely slow. In my testing on Satellite 5.7 w/ embedded postgresql db, for a clone of rhel-x86_64-server-6 that already has all the errata synced, the queries ran for up to an hour without completing. The results can be obtained much more quickly without the inner joins on rhnErrata, rhnChannelErrata, etc. In my testing, the syncErrata API completed in 0.5 seconds, rather then timing out and never completing, with the new queries (see attached patch).

--- Additional comment from Grant Gainey on 2015-10-19 13:30:22 EDT ---

The patch submitted with this BZ does have a significant impact on the syncErrata call - I see it going from ~1.3s to ~0.5s. But syncErrata isn't the "long pole" in the channel-cloning process - indeed, even in its current suboptimal state it's a tiny part of the total time.

Some examples pre-patch:

===
[root@foo-011 conf]# time spacewalk-clone-by-date -y -d "2015-07-01" -u admin -p <pwd> --channels=rhel-x86_64-server-6 clone-rhel-x86_64-server-6 --no-errata-sync
...
real	81m38.495s
user	47m33.629s
sys	0m21.989s
[root@foo-011 conf]# time ./api-sync-errata.py 
1

real	0m1.118s
user	0m0.057s
sys	0m0.089s
[root@foo-011 conf]# 
===

and post-patch:

===
[root@foo-011 conf]# time spacewalk-clone-by-date -y -d "2015-07-01" -u admin -p <pwd> --channels=rhel-x86_64-server-6 clone-rhel-x86_64-server-6 --no-errata-sync
...
real	75m18.503s
user	45m48.657s
sys	0m22.711s
[root@foo-011 conf]# time ./api-sync-errata.py 
1

real	0m0.554s
user	0m0.031s
sys	0m0.025s
[root@foo-011 conf]#
===

Comment 1 Grant Gainey 2015-11-17 16:22:28 UTC
spacewalk.github: 2be48080365ed26f12de18818c3327e84af0dcf9

Comment 2 Tomáš Kašpárek 2016-06-16 07:48:10 UTC
Spacewalk 2.5 was released.

Comment 3 Michael Calmer 2016-09-06 09:31:15 UTC
Hi Grant: would you explain why you reverted this fix in git and what is the replacement for it?

Comment 4 Grant Gainey 2016-09-06 21:30:09 UTC
I reverted it because a) it didn't help much, and b) the queries were broken in ways that broke errata-cloning.

See https://bugzilla.redhat.com/show_bug.cgi?id=1367915 for more details.

There have been a couple of 'fixes' to clone-by-date recently that actually broke required functionality - see https://bugzilla.redhat.com/show_bug.cgi?id=1369185 for another example.

Comment 5 Eric Herget 2017-09-28 18:08:02 UTC
This BZ closed some time during 2.5, 2.6 or 2.7.  Adding to 2.7 tracking bug.