Bug 829485

Summary: errata.clone and errata.cloneAsOriginal times out with 503 Service Temporarily Unavailable when errate list > 50 items
Product: [Community] Spacewalk Reporter: Stephen Herr <sherr>
Component: APIAssignee: Stephen Herr <sherr>
Status: CLOSED CURRENTRELEASE QA Contact: Red Hat Satellite QA List <satqe-list>
Severity: high Docs Contact:
Priority: high    
Version: 1.8CC: cperry, csuleski, dtrainor, mhuth, mmello, nigjones, prutledg, slukasik, tlestach, xdmoon
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: spacewalk-java-1.8.98-1 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 745688 Environment:
Last Closed: 2012-11-01 16:21:13 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 745688, 838672    
Bug Blocks: 871344    

Description Stephen Herr 2012-06-06 21:17:00 UTC
+++ This bug was initially created as a clone of Bug #745688 +++

Description of problem:
If the errata list is bigger than about 50 the errata.clone and errata.cloneAsOriginal API methods encounter a 503 Service Temporarily Unavailable error after 60 seconds.  However the errata are actually being processed in the background and if we wait for some time then we can see that all the errata were cloned ok.

When you clone a bunch of errata via the webui, the clone action is performed in background and the webui is directly usable and an Internal Server Error doesn't result.  However, the same action performed by the API is a blocking action and eventually the 503 error is the result, yet the errata are still being cloned in the background.

Is it possible for the errata.clone and errata.cloneAsOriginal API calls to be non-blocking (ie don't result in the 503 error) and the errata still be executed in background?

Version-Release number of selected component (if applicable):
Satellite 5.4.1

How reproducible:
Always

Steps to Reproduce:
1. Write an API script to have errata.clone() or errata.cloneAsOriginal() process more than 50 errata
2. 60 seconds after the call is made a xmlrpclib.ProtocolError exception occurs with a 503 Service Temporarily Unavailable error
3. The errata continue to be processed in the background and sometime later (eg an hour) the errata are fully cloned.
  
Actual results:
xmlrpclib.ProtocolError: <ProtocolError for gss-sat.gsslab.pnq.redhat.com/rpc/api: 503 Service Temporarily Unavailable>

Expected results:
errata.clone() and errata.cloneAsOriginal() are non-blocking and return before the timeout, like the webUI with cloning errata.

Additional info:
Problem officially known and reported here:

https://bugzilla.redhat.com/show_bug.cgi?id=678721#c6
"when cloning more than 8 errata within the API, the API times out with: 503 Service Temporarily Unavailable"

https://bugzilla.redhat.com/show_bug.cgi?id=591291#c21
"The errata merging process is no longer completed within the api call, but on the background. Note: It takes about 28 minutes on my test machine."

--- Additional comment from mhuth on 2011-10-13 00:16:10 EDT ---

Running this script:
---------------------
#!/usr/bin/python
import xmlrpclib

saturl = "<saturl>"
username = "<username"
password = "<password>"

sat = xmlrpclib.Server(saturl, verbose=1)
key = sat.auth.login(username, password)

errataList = sat.channel.software.mergeErrata(key, "rhel-x86_64-server-optional-6", "opt6-clone3", "1970-01-01", "2011-09-10")
cloneErrataList = []
for errata in errataList:
   cloneErrataList.append(errata['advisory_name'])

ret = sat.errata.cloneAsOriginal(key, "opt6-clone3", cloneErrataList)
print ret

sat.auth.logout(key)
---------------------

... results in:
reply: 'HTTP/1.1 503 Service Temporarily Unavailable\r\n'
header: Date: Mon, 10 Oct 2011 05:57:33 GMT
header: Content-Length: 402
header: Connection: close
header: Content-Type: text/html; charset=iso-8859-1
Traceback (most recent call last):
  File "./merge_errata3.py", line 16, in <module>
    ret = sat.errata.cloneAsOriginal(key, "rs-clone4", cloneErrataList)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1199, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1489, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1243, in request
    headers
xmlrpclib.ProtocolError: <ProtocolError for gss-sat.gsslab.pnq.redhat.com/rpc/api: 503 Service Temporarily Unavailable>

... and in the /var/log/httpd/error_log on the satellite:
[Mon Oct 10 11:28:33 2011] [error] (70007)The timeout specified has expired: ajp_ilink_receive() can't receive header
[Mon Oct 10 11:28:33 2011] [error] ajp_read_header: ajp_ilink_receive failed
[Mon Oct 10 11:28:33 2011] [error] (120006)APR does not understand this error code: proxy: dialog to (null) (*) failed
... so the webserver timed out getting a response back from tomcat and hence the ISE.

--- Additional comment from mhuth on 2011-10-13 18:40:54 EDT ---

Expanded reproducer steps:

1. In the Satellite WebUI, create a clone of a channel, eg rhel-x86_64-server-optional-6.  Select 'original state of the channel' when creating the clone.
2. In an API script, use channel.software.mergeErrata() API method to get a list of errata to be applied to the cloned channel.  Make sure the list > 50 items 
3. Pass this list of errata to errata.clone() or errata.cloneAsOriginal()
4. After 60 seconds the method call will time out with the ISE message:

xmlrpclib.ProtocolError: <ProtocolError for
gss-sat.gsslab.pnq.redhat.com/rpc/api: 503 Service Temporarily Unavailable>

5. However the errata cloning process is performed in the background and may take a while.  The calls can be seen in /var/log/rhn/rhn_web_api.log:

[2011-10-11 06:55:24,745] INFO  - REQUESTED FROM: *callerIp* CALL: auth.login(admin, ********) CALLER: (admin) TIME: 0.0070 seconds
[2011-10-11 06:55:25,556] INFO  - REQUESTED FROM: *callerIp* CALL: channel.software.mergeErrata(39569x046414df1c65709ffa173eeaa032f4f8, rhel-x86_64-server-optional-6, opt6-clone3, 1970-01-01, 2011-09-10) CALLER: (admin) TIME: 0.233 seconds
...
[2011-10-11 08:17:19,962] INFO  - REQUESTED FROM: *callerIp* CALL: errata.cloneAsOriginal(39569x046414df1c65709ffa173eeaa032f4f8, opt6-clone3, [RHSA-2011:1242, RHBA-2011:1124, RHEA-2011:0654, RHBA-2011:0935, ... RHSA-2011:1154]) CALLER: (admin) TIME: 4912.745 seconds

cloneAsOriginal() method started at 6.55am and finished at 8.17am.

Hope that helps,
Mark

--- Additional comment from tlestach on 2012-01-25 11:06:38 EST ---

this commit might help ...

spacewalk.git: 5cfe38aea6d5574654117aabae20ba45d8ca96a0

(just note for myself, need to verify)

--- Additional comment from tpapaioa on 2012-05-16 13:08:00 EDT ---

=== In Red Hat Customer Portal Case 00638950 ===
--- Comment by Papaioannou, Tasos on 5/16/2012 1:07 PM ---

In my testing, the timeouts were due to the fact that the errata.clone* APIs do not perform the errata cloning asynchronously, whereas errata cloning from the web UI does.

1.) clone errata from the Web UI:

***
code/src/com/redhat/rhn/frontend/action/channel/manage/ConfirmErrataAction.java:


        if (requestContext.wasDispatched("Clone Errata")) {
            Map params = new HashMap();
            params.put("cid", cid);
            return getStrutsDelegate().forwardParams(mapping.findForward("clone"), params);
        }


code/src/com/redhat/rhn/frontend/action/channel/manage/PublishErrataAction.java:

        ErrataManager.publishErrataToChannelAsync(currentChan, errataIds, user);
***


2.) clone errata via errata.cloneAsOriginal API (via custom script or spacewalk-clone-by-date command):

***
./code/src/com/redhat/rhn/frontend/xmlrpc/errata/ErrataHandler.java:

    public Object[] cloneAsOriginal(String sessionKey, String channelLabel,
            List<String> advisoryNames) throws InvalidChannelRoleException {
        return clone(sessionKey, channelLabel, advisoryNames, true);

    public Object[] clone(String sessionKey, String channelLabel,
            List advisoryNames) throws InvalidChannelRoleException {
        return clone(sessionKey, channelLabel, advisoryNames, false);

    private Object[] clone(String sessionKey, String channelLabel,
            List<String> advisoryNames, boolean inheritAllPackages) {

[...]

        //Now publish them all to the channel in a single shot
        List<Errata> published = ErrataFactory.publishToChannel(errataToPublish, channel,
                loggedInUser, true);
        for (Errata e : published) {
            ErrataFactory.save(e);
        }
***


Increasing the value of the Timeout parameter in /etc/httpd/conf/httpd.conf to a sufficiently large number of seconds, then restarting httpd, avoids the error (but the API still takes a long time to run, e.g., 30-40 minutes for a couple hundred advisories).

Comment 1 Stephen Herr 2012-06-06 21:26:33 UTC
I have created two new API methods that clone errata asynchronously, cloneAsync() and cloneAsOriginalAsync(). These methods will enqueue the errata cloning and return, and will not wait for the actual cloning to occur.

I have also updated the spacewalk-clone-by-date script to include a --background option, which will use the asynchronous methods instead of the synchronous ones. 

Committed to Spacewalk git: faee056b1eacf35b36d34a79c2aaaffb58c0255e

Comment 2 Stephen Herr 2012-06-07 13:56:52 UTC
and 85b1ab06a230843ccd263cb626b84c0ac6f59f00
updated man page ^

Comment 3 Stephen Herr 2012-06-07 14:05:51 UTC
and 24e19395e13ddc8d88dcbeace669db62e388bdec
(fixed typo)

Comment 4 Stephen Herr 2012-06-22 20:05:38 UTC
and 32b55cbe8222607a9c8f59d109f3f90ed96e1fee
This commit fixes some database issues.

Comment 5 Stephen Herr 2012-06-26 18:47:15 UTC
This commit fixes the last of the known issues, the deadlock if you try to schedule a synchronous job while the asyncrhonous jobs are happening in the background. Instead of a deadlock the user is presented with an error and told to wait a little while.

7713d370ccb509980056ecd18a5846960f964458

Comment 7 Jan Pazdziora 2012-10-30 19:25:25 UTC
Moving ON_QA. Packages that address this bugzilla should now be available in yum repos at http://yum.spacewalkproject.org/nightly/

Comment 8 Jan Pazdziora 2012-11-01 16:21:13 UTC
Spacewalk 1.8 has been released: https://fedorahosted.org/spacewalk/wiki/ReleaseNotes18