Hide Forgot
Description of problem: I installed a fresh setup. All cds and rhua nodes are installed from scratch. I added multiple repos. And cds sync ran on its scheduled interval during yesterday night. I found few sync failures in logs. Today when I triggered individual cds sync, sync ran for few minutes ( around 10-15) and failed with an error. I'm observing the same behaviour since morning. I tried sync one after another on two nodes and got same error. I'll add all outputs in seperate comments. Please note that when cds sync failed at that time goferd was running on cds nodes. Version-Release number of selected component (if applicable): rhui-tools 2.0.37 pulp 0.204 How reproducible: always on my fresh setup Steps to Reproduce: Add two or more large repos on rhua, then sync cds with rhua Actual results: PulpException: 'Error on the CDS during sync; check the server log for more information' Expected results: CDS sync should be successfull without any error. Additional info:
------------------------------------------------------------------------------ -= Red Hat Update Infrastructure Management Tool =- -= CDS Synchronization Status =- Last Refreshed: 16:31:58 (updated every 5 seconds, ctrl+c to exit) cds00120 .................................................... [ UP ] cds00121 .................................................... [ UP ] Next Sync Last Sync Last Result ------------------------------------------------------------------------------ cds00120 07-12-2011 20:08 07-12-2011 16:24 Error cds00121 07-12-2011 20:08 07-12-2011 16:31 Error Connected: dhcp201-141.englab.pnq.redhat.com ------------------------------------------------------------------------------ ^Crhui (sync) => Logs from cds node: ============== [root@dhcp201-121 ~]# tail -f /var/log/gofer/agent.log 2011-07-12 16:16:02,176 [ERROR][worker-0] __call__() @ dispatcher.py:359 - The following errors occurred during the CDS sync: Error synchronizing repository [rhel-server-6-releases-6Server-x86_64] Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/gofer/messaging/dispatcher.py", line 356, in __call__ retval = method(*args, **keywords) File "/usr/lib/gofer/plugins/cdsplugin.py", line 118, in sync cdslib.sync(sync_data) File "/usr/lib/python2.6/site-packages/pulp/cds/cdslib.py", line 191, in sync raise Exception('The following errors occurred during the CDS sync: ' + ', '.join(error_messages)) Exception: The following errors occurred during the CDS sync: Error synchronizing repository [rhel-server-6-releases-6Server-x86_64] [root@dhcp201-121 ~]#
rhui (sync) => vc Select a CDS: 1 - cds00120 2 - cds00121 Enter value (1-2) or 'b' to abort: 2 Start Time: 07-12-2011 16:10 Finish Time: 07-12-2011 16:16 Elapsed Time: 0:05:17 Result: Error Exception: PulpException('Error on the CDS during sync; check the server log for more information',) Traceback: Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/pulp/server/tasking/task.py", line 329, in run result = self.callable(*self.args, **self.kwargs) File "/usr/lib/python2.6/site-packages/pulp/server/api/cds.py", line 568, in cds_sync self.dispatcher.sync(cds, payload) File "/usr/lib/python2.6/site-packages/pulp/server/cds/dispatcher.py", line 138, in sync self._send(stub.sync, data) File "/usr/lib/python2.6/site-packages/pulp/server/cds/dispatcher.py", line 170, in _send result = func(*args) File "/usr/lib/python2.6/site-packages/gofer/messaging/stub.py", line 71, in __call__ return self.stub._send(request, opts) File "/usr/lib/python2.6/site-packages/gofer/messaging/stub.py", line 142, in _send any=opts.any) File "/usr/lib/python2.6/site-packages/gofer/messaging/policy.py", line 121, in send return self.__getreply(sn, reader) File "/usr/lib/python2.6/site-packages/gofer/messaging/policy.py", line 173, in __getreply return self.__onreply(envelope) File "/usr/lib/python2.6/site-packages/gofer/messaging/policy.py", line 189, in __onreply raise RemoteException.instance(reply) PulpException: 'Error on the CDS during sync; check the server log for more information'
pulp.log: =========== 2011-07-12 16:16:01,626 28844:139986621933312: pulp.server.api.cds:ERROR: cds:585 CDS threw an error during sync to CDS [dhcp201-121.englab.pnq.redhat.com] Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/pulp/server/api/cds.py", line 568, in cds_sync self.dispatcher.sync(cds, payload) File "/usr/lib/python2.6/site-packages/pulp/server/cds/dispatcher.py", line 138, in sync self._send(stub.sync, data) File "/usr/lib/python2.6/site-packages/pulp/server/cds/dispatcher.py", line 170, in _send result = func(*args) File "/usr/lib/python2.6/site-packages/gofer/messaging/stub.py", line 71, in __call__ return self.stub._send(request, opts) File "/usr/lib/python2.6/site-packages/gofer/messaging/stub.py", line 142, in _send any=opts.any) File "/usr/lib/python2.6/site-packages/gofer/messaging/policy.py", line 121, in send return self.__getreply(sn, reader) File "/usr/lib/python2.6/site-packages/gofer/messaging/policy.py", line 173, in __getreply return self.__onreply(envelope) File "/usr/lib/python2.6/site-packages/gofer/messaging/policy.py", line 189, in __onreply raise RemoteException.instance(reply) CdsMethodException 2011-07-12 16:16:01,670 28844:139986621933312: pulp.server.tasking.task:ERROR: task:380 Task failed: Task 657a44b5-ac73-11e0-81c4-525400cc6441: CdsApi.cds_sync(dhcp201-121.englab.pnq.redhat.com, ) Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/pulp/server/tasking/task.py", line 329, in run result = self.callable(*self.args, **self.kwargs) File "/usr/lib/python2.6/site-packages/pulp/server/api/cds.py", line 568, in cds_sync self.dispatcher.sync(cds, payload) File "/usr/lib/python2.6/site-packages/pulp/server/cds/dispatcher.py", line 138, in sync self._send(stub.sync, data) File "/usr/lib/python2.6/site-packages/pulp/server/cds/dispatcher.py", line 170, in _send result = func(*args) File "/usr/lib/python2.6/site-packages/gofer/messaging/stub.py", line 71, in __call__ return self.stub._send(request, opts) File "/usr/lib/python2.6/site-packages/gofer/messaging/stub.py", line 142, in _send any=opts.any) File "/usr/lib/python2.6/site-packages/gofer/messaging/policy.py", line 121, in send return self.__getreply(sn, reader) File "/usr/lib/python2.6/site-packages/gofer/messaging/policy.py", line 173, in __getreply return self.__onreply(envelope) File "/usr/lib/python2.6/site-packages/gofer/messaging/policy.py", line 189, in __onreply raise RemoteException.instance(reply) PulpException: 'Error on the CDS during sync; check the server log for more information'
from the below outputs , we can see that sync was running for a while (used space increasing in folowing outputs) [root@dhcp201-121 ~]# df Filesystem 1K-blocks Used Available Use% Mounted on /dev/mapper/vg_dhcp201121-lv_root 49130312 15103252 31531368 33% / tmpfs 498576 0 498576 0% /dev/shm /dev/vda1 495844 31582 438662 7% /boot [root@dhcp201-121 ~]# df Filesystem 1K-blocks Used Available Use% Mounted on /dev/mapper/vg_dhcp201121-lv_root 49130312 15103372 31531248 33% / tmpfs 498576 0 498576 0% /dev/shm /dev/vda1 495844 31582 438662 7% /boot [root@dhcp201-121 ~]# df Filesystem 1K-blocks Used Available Use% Mounted on /dev/mapper/vg_dhcp201121-lv_root 49130312 15103692 31530928 33% / tmpfs 498576 0 498576 0% /dev/shm /dev/vda1 495844 31582 438662 7% /boot [root@dhcp201-121 ~]# df Filesystem 1K-blocks Used Available Use% Mounted on /dev/mapper/vg_dhcp201121-lv_root 49130312 15103836 31530784 33% / tmpfs 498576 0 498576 0% /dev/shm /dev/vda1 495844 31582 438662 7% /boot [root@dhcp201-121 ~]# df Filesystem 1K-blocks Used Available Use% Mounted on /dev/mapper/vg_dhcp201121-lv_root 49130312 15075024 31559596 33% / tmpfs 498576 0 498576 0% /dev/shm /dev/vda1 495844 31582 438662 7% /boot [root@dhcp201-121 ~]# df Filesystem 1K-blocks Used Available Use% Mounted on /dev/mapper/vg_dhcp201121-lv_root 49130312 15058892 31575728 33% / tmpfs 498576 0 498576 0% /dev/shm /dev/vda1 495844 31582 438662 7% /boot [root@dhcp201-121 ~]# ^C [root@dhcp201-121 ~]#
As I mentioned on irc, try syncing other repo urls and see if you see the same issue. Based on what I see in pulp-cds logs the issue seems to be that this particular repo has a mismatch in filelist data and its not able to retrieve it.
I removed the "rhel6 release" repo and CDS sync was successfull with other repos. cds00120 .................................................... [ UP ] cds00121 .................................................... [ UP ] Next Sync Last Sync Last Result ------------------------------------------------------------------------------ cds00120 07-14-2011 20:08 07-13-2011 21:02 Success cds00121 07-14-2011 20:08 07-14-2011 15:40 Success Connected: dhcp201-141.englab.pnq.redhat.com ------------------------------------------------------------------------------ ^Crhui (sync) => exit [root@dhcp201-141 mongodb]# pulp-admin -u admin -p admin cds list +------------------------------------------+ CDS Instances +------------------------------------------+ Name cds00121 Hostname dhcp201-121.englab.pnq.redhat.com Description RHUI CDS Cluster rhui-cds-cluster Sync Schedule 2011-07-12T02:08:27+05:30/PT6H Repos rhel-server-6-updates-6Server-x86_64, rhel-server-6-optional-updates-6Server-x86_64, rhui-1.1-5Server-i386, rhui-1.1-5Server-x86_64, rhui-1.2-5Server-x86_64, rhui-1.2-5Server-i386 Last Sync 2011-07-14 15:40:43+05:30 Status: Responding Yes Last Heartbeat 2011-07-14 10:20:01.993358+00:00 Name cds00120 Hostname dhcp201-120.englab.pnq.redhat.com Description RHUI CDS Cluster rhui-cds-cluster Sync Schedule 2011-07-12T02:08:07+05:30/PT6H Repos rhel-server-6-updates-6Server-x86_64, rhel-server-6-optional-updates-6Server-x86_64, rhui-1.1-5Server-i386, rhui-1.1-5Server-x86_64, rhui-1.2-5Server-x86_64, rhui-1.2-5Server-i386 Last Sync 2011-07-13 21:02:42+05:30 Status: Responding No Last Heartbeat None
As discussed on irc, this seems to be issue with bad metadata on rhua. Sachin is working on retesting this on a fresh rhua setup. moving to on_qa.
I synced [rhel-server-6-releases-6Server-x86_64] repo from CDN to RHUA and then sync one CDS node from RHUA with same repo. Sync ran successfully. ----------------------------------------------------------------------------- -= Red Hat Update Infrastructure Management Tool =- -= CDS Synchronization Status =- Last Refreshed: 17:34:48 (updated every 5 seconds, ctrl+c to exit) cds108 ...................................................... [ UP ] Next Sync Last Sync Last Result ------------------------------------------------------------------------------ cds108 07-15-2011 19:07 07-15-2011 12:17 Success Connected: dhcp201-109.englab.pnq.redhat.com ------------------------------------------------------------------------------ ^Crhui (sync) => [root@dhcp201-109 mongodb]# pulp-admin -u admin -p admin cds list +------------------------------------------+ CDS Instances +------------------------------------------+ Name cds108 Hostname dhcp201-108.englab.pnq.redhat.com Description RHUI CDS Cluster rhui-cds-cluster Sync Schedule 2011-07-15T01:07:05+05:30/PT6H Repos rhel-server-6-releases-6Server-x86_64, rhel-server-6-optional-updates-6Server-x86_64, repo_001 Last Sync 2011-07-15 12:17:56+05:30 Status: Responding Yes Last Heartbeat 2011-07-15 12:06:40.093379+00:00 [root@dhcp201-109 mongodb]# So seems that the issue is not with the rhel6 releases repo. Currently I'm moving this to verified as its not reproducible.
moving to release pending
closing out, product released