Description of problem:
The Fedora infrastructure runs a script every 10 minutes (via cron) to synchronize the assignee and default CC from our package database to Bugzilla. Up until the last maintenance window on May 10th, this script ran in under 10 minutes. Since then, the script takes much longer and ends with a `502 Server Error: Proxy Error`.

Version-Release number of selected component (if applicable):
- python-bugzilla-1.2.0-1.el7.noarch
- current bugzilla

How reproducible:
always

Steps to Reproduce:
1. run https://github.com/fedora-infra/pkgdb2/blob/master/utility/pkgdb-sync-bugzilla
2. wait :)

Actual results:
The error occurs when we call bugzilla.add_edit_component() at:
https://github.com/fedora-infra/pkgdb2/blob/master/utility/pkgdb-sync-bugzilla#L420

Expected results:
The component is edited correctly and quickly.

Additional info:
From the discussion with mkeir on IRC (#fedora-admin):

mkeir | pingou, I can see some "login_required" errors in the XMLRPC log for the IP matching the server after prior cookie sessions. It may be best to raise this as a BZ for the development team to look at

Since I was told that timing information would be of interest, here is a run of the script. Of course, when I tried to reproduce the error I did not get it, so this is a run that completed fine but took a while:

# date -u && time PKGDB2_CONFIG=/etc/pkgdb2/pkgdb2.cfg /usr/bin/pkgdb-sync-bugzilla && date -u
Mon May 11 08:13:47 UTC 2015

real    66m0.319s
user    18m1.121s
sys     1m10.144s
Mon May 11 09:19:48 UTC 2015

The 502 error occurred just before that run.
I've started another run of the script to see if I can generate the error again, but regardless of the 502 error, the script is now running more than 6 times slower than it used to :(
The Bugzilla master database server has been experiencing some severe load spikes recently, and investigation is ongoing. I believe that mkeir added some mod_qos configuration to throttle the rate of requests from your script, in an attempt to determine whether it was the source of the load spikes. The high load and the throttling will both be contributing to the increased run-time of your script.

At present, the evidence I can see in the logs suggests that the pkgdb-sync-bugzilla script is not the primary source of the load spikes we are seeing, but that the script can be improved. It seems to make an excessive number of Component.update RPC calls, and the number of those calls increased dramatically on May 10. For the three days before that, pkgdb02.phx2.fedoraproject.org hit Bugzilla about 25,000 times per day; on May 10, it hit Bugzilla about 200,000 times.

From inspection of the logging data, it appears that the script rewrites all component ownership information on every run, rather than only what has changed since the previous run. For example, the script called Component.update 608 times for the MySQL component on May 10. Looking at the attached log extract, there may also be other problems with that component: the script makes separate calls for "MySQL" and "mysql", with different ownership details for each, and some calls include a component description while others do not. In any case, the assignee and default_cc for the component do not change from one call to the next, so there is no need to make those calls.

Ideally, the script should sync only changes to the data and not rewrite data that Bugzilla already has up to date. Doing that would result in a much shorter run-time for the script as well as reducing the load on Bugzilla.
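The "sync only what changed" suggestion above amounts to diffing the desired ownership (from pkgdb) against what Bugzilla already has, and calling Component.update only for the differences. A minimal sketch, assuming the script can build both mappings beforehand (how it fetches Bugzilla's current ownership is not shown); `Ownership` and `components_to_update` are hypothetical names, not part of pkgdb2 or python-bugzilla:

```python
from typing import Dict, NamedTuple, Tuple


class Ownership(NamedTuple):
    """Assignee and default CC list for one component (hypothetical model)."""
    assignee: str
    default_cc: Tuple[str, ...]


def components_to_update(
    desired: Dict[str, Ownership],
    current: Dict[str, Ownership],
) -> Dict[str, Ownership]:
    """Return only the components whose ownership differs from Bugzilla's copy.

    `desired` is what pkgdb says; `current` is what Bugzilla already has.
    A component missing from `current` is treated as changed.
    CC lists are compared as sets so that a different ordering alone
    does not trigger a write.
    """
    changes: Dict[str, Ownership] = {}
    for name, want in desired.items():
        have = current.get(name)
        if (
            have is None
            or have.assignee != want.assignee
            or set(have.default_cc) != set(want.default_cc)
        ):
            changes[name] = want
    return changes
```

With this filter, a run in which nothing changed in pkgdb would issue no Component.update calls at all, instead of rewriting every component.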
Finally, note that it is possible to update multiple components in a single call to Component.update, as described in https://bugzilla.redhat.com/docs/en/html/api/extensions/RedHat/lib/WebService/Component.html#Update_Components. Updating multiple components in a single call is considerably faster than updating the same number of components in separate calls, and again will help to reduce the load on Bugzilla.
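Batching the changed components, as the Component.update documentation linked above allows, could be sketched as follows. This assumes a hypothetical `send_update` callable that issues a single Component.update RPC for a list of components; the exact request format of the Red Hat extension is defined in that documentation and is not reproduced here, and the batch size of 50 is an arbitrary assumption:

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def sync_in_batches(
    updates: Iterable[T],
    send_update: Callable[[List[T]], None],
    size: int = 50,
) -> int:
    """Call the hypothetical `send_update` once per batch instead of once
    per component, and return the number of RPC calls made."""
    calls = 0
    for batch in batched(updates, size):
        send_update(batch)
        calls += 1
    return calls
```

For example, 120 changed components would then cost 3 calls (batches of 50, 50 and 20) rather than 120.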
I agree that this script could be optimized, but it's still taking 6 times longer than it used to :)
When I ran the tests, and earlier when I was running into these 502 errors, I was talking with mkeir on IRC, who told me that the mod_qos config had been disabled. Together with him, we figured out that several instances of our cron job were running, which is what produced the traffic you were seeing. I think these multiple instances of the script were, at least partly, caused by the script taking much longer, so that cron started another copy while the first was still running. We are now using a lock to ensure this does not happen again.

As for the MySQL vs mysql example you give, pkgdb has both components:
https://admin.fedoraproject.org/pkgdb/package/mysql/
https://admin.fedoraproject.org/pkgdb/package/MySQL/
Both are retired, but they still exist, and I guess they probably also exist in Bugzilla's component list for Fedora.

We have done some work on pkgdb to list all the packages that are retired on all active branches, with the idea that they could be removed from Bugzilla:
https://admin.fedoraproject.org/pkgdb/api/#list_packages_retired
but this might deserve a bug report of its own.

I will see if we can rework the script to be smarter and to update multiple components at once.
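The lock mentioned above prevents a slow run from overlapping with the next cron invocation. The comment does not say how that lock is implemented; one common approach is a non-blocking exclusive `flock`, sketched here with the standard `fcntl` module (Unix only). The helper name `single_instance` and any lock-file path are hypothetical:

```python
import fcntl
from contextlib import contextmanager
from typing import Iterator, TextIO


@contextmanager
def single_instance(lock_path: str) -> Iterator[TextIO]:
    """Hold an exclusive, non-blocking flock on `lock_path` while active.

    Raises BlockingIOError immediately if another process (or another
    open file description in this process) already holds the lock, so an
    overlapping cron run can exit instead of piling up behind the first.
    """
    lock_file = open(lock_path, "w")
    try:
        # LOCK_NB makes flock fail at once rather than wait for the holder.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        try:
            yield lock_file
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
    finally:
        lock_file.close()
```

The sync would run inside `with single_instance(...)`, and a `BlockingIOError` would be treated as "a previous run is still in progress, skip this one". The lock is released automatically if the process dies, because the kernel drops it when the file is closed.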
(In reply to Pierre-Yves Chibon from comment #4)
> I agree that this script could be optimized, but it's still taking it 6
> times longer than it used to :)

The increased run-time was at least partly due to another user overloading the system with large numbers of concurrent search queries. We're working with that user to improve their application so that it uses Bugzilla more fairly, and we have also asked them to use staging systems for testing rather than the live system.

> As for the MySQL vs mysql example you give, pkgdb has both components:
> https://admin.fedoraproject.org/pkgdb/package/mysql/
> https://admin.fedoraproject.org/pkgdb/package/MySQL/
> Both are retired but they still exists and I guess they probably exists also
> in bugzilla's component list for Fedora.

This might be a problem when Bugzilla changes its backend database from MySQL to PostgreSQL in a few months' time. Pg is case-insensitive by default, so I would expect that it might match both of these for a Component.update call instead of matching only the correct one. For retired components that's probably not a problem, but if there are other instances where the names of live components differ only in case, Component.update is likely to misbehave.

> We have made some work on pkgdb to list all the package that are retired on
> all active branches, with the idea that they could be removed from bugzilla:
> https://admin.fedoraproject.org/pkgdb/api/#list_packages_retired
> but this might deserve a bug report of its own.

Note that a retired component can't be deleted if it is referenced by one or more bugs. Instead, the component is marked as "Not Enabled for Bugs", which prevents any new bugs from being filed against the component and prevents any existing bugs from being moved to that component.

> I will see if we can port the script to be and smarter and doing the update
> of multiple components at once.

That would be appreciated.
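To find live components whose names differ only in case, the situation in which Component.update is expected to misbehave after the database change, the component list could be checked in advance. A minimal sketch, assuming the names are already available as a list of strings (for example from pkgdb); `str.casefold()` is used as an approximation of case-insensitive matching, and the database's actual comparison rules may differ:

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def case_collisions(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group component names that differ only by letter case.

    Returns a mapping from the case-folded name to the distinct original
    spellings, keeping only names that have more than one spelling.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        key = name.casefold()
        if name not in groups[key]:
            groups[key].append(name)
    return {key: spellings for key, spellings in groups.items() if len(spellings) > 1}
```

For the example in this bug, `case_collisions(["MySQL", "mysql", "python"])` would report the single group `{"mysql": ["MySQL", "mysql"]}`.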
Feel free to ping the Bugzilla team if you have any further questions about the API.
I think the issues that led to the reported problem have been resolved. The user who triggered the original load spike has been moved to a test instance of Bugzilla, and recent logging data shows that the number of hits from pkgdb02.phx2.fedoraproject.org fell from 633,866 in March to 54,241 in July, with a corresponding fall in the number of calls to Component.update. Please file new bugs for any future problems with accessing Red Hat Bugzilla.