Bug 720657

Summary: CDS sync failed while syncing multiple repos from rhua
Product: Red Hat Update Infrastructure for Cloud Providers Reporter: Sachin Ghai <sghai>
Component: CDSAssignee: Jay Dobies <jason.dobies>
Status: CLOSED CURRENTRELEASE QA Contact: wes hayutin <whayutin>
Severity: high Docs Contact:
Priority: unspecified    
Version: 2.0CC: kbidarka, pkilambi, sghai
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-05-31 12:52:02 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Sachin Ghai 2011-07-12 12:07:25 UTC
Description of problem:
I installed a fresh setup. All cds and rhua nodes are installed from scratch. I added multiple repos. And cds sync ran on its scheduled interval during yesterday night. I found few sync failures in logs. 

Today when I triggered individual cds sync, sync ran for few minutes ( around 10-15) and failed with an error.

I'm observing the same behaviour since morning. I tried sync one after another on two nodes and got same error. I'll add all outputs in seperate comments.

Please note that when cds sync failed at that time goferd was running on cds nodes.

Version-Release number of selected component (if applicable):
rhui-tools 2.0.37
pulp 0.204

How reproducible:
always on my fresh setup

Steps to Reproduce:
Add two or more large repos on rhua, then sync cds with rhua
  
Actual results:
PulpException: 'Error on the CDS during sync; check the server log for more information'


Expected results:
CDS sync should be successfull without any error.

Additional info:

Comment 1 Sachin Ghai 2011-07-12 12:09:25 UTC
------------------------------------------------------------------------------
             -= Red Hat Update Infrastructure Management Tool =-


-= CDS Synchronization Status =-

Last Refreshed: 16:31:58
(updated every 5 seconds, ctrl+c to exit)


cds00120 .................................................... [  UP  ]
cds00121 .................................................... [  UP  ]


Next Sync                    Last Sync                    Last Result         
------------------------------------------------------------------------------
cds00120
07-12-2011 20:08             07-12-2011 16:24             Error      

cds00121
07-12-2011 20:08             07-12-2011 16:31             Error      


                                  Connected: dhcp201-141.englab.pnq.redhat.com
------------------------------------------------------------------------------
^Crhui (sync) => 



Logs from cds node:
==============

[root@dhcp201-121 ~]# tail -f /var/log/gofer/agent.log

2011-07-12 16:16:02,176 [ERROR][worker-0] __call__() @ dispatcher.py:359 - The following errors occurred during the CDS sync: Error synchronizing repository [rhel-server-6-releases-6Server-x86_64]
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/gofer/messaging/dispatcher.py", line 356, in __call__
    retval = method(*args, **keywords)
  File "/usr/lib/gofer/plugins/cdsplugin.py", line 118, in sync
    cdslib.sync(sync_data)
  File "/usr/lib/python2.6/site-packages/pulp/cds/cdslib.py", line 191, in sync
    raise Exception('The following errors occurred during the CDS sync: ' + ', '.join(error_messages))
Exception: The following errors occurred during the CDS sync: Error synchronizing repository [rhel-server-6-releases-6Server-x86_64]
[root@dhcp201-121 ~]#

Comment 2 Sachin Ghai 2011-07-12 12:09:51 UTC
rhui (sync) => vc

Select a CDS: 
  1  - cds00120
  2  - cds00121
Enter value (1-2) or 'b' to abort: 2

  Start Time:     07-12-2011 16:10
  Finish Time:    07-12-2011 16:16
  Elapsed Time:   0:05:17
  Result:         Error
  Exception:      PulpException('Error on the CDS during sync; check the server log for more information',)
  Traceback:
Traceback (most recent call last):

  File "/usr/lib/python2.6/site-packages/pulp/server/tasking/task.py", line 329, in run
    result = self.callable(*self.args, **self.kwargs)

  File "/usr/lib/python2.6/site-packages/pulp/server/api/cds.py", line 568, in cds_sync
    self.dispatcher.sync(cds, payload)

  File "/usr/lib/python2.6/site-packages/pulp/server/cds/dispatcher.py", line 138, in sync
    self._send(stub.sync, data)

  File "/usr/lib/python2.6/site-packages/pulp/server/cds/dispatcher.py", line 170, in _send
    result = func(*args)

  File "/usr/lib/python2.6/site-packages/gofer/messaging/stub.py", line 71, in __call__
    return self.stub._send(request, opts)

  File "/usr/lib/python2.6/site-packages/gofer/messaging/stub.py", line 142, in _send
    any=opts.any)

  File "/usr/lib/python2.6/site-packages/gofer/messaging/policy.py", line 121, in send
    return self.__getreply(sn, reader)

  File "/usr/lib/python2.6/site-packages/gofer/messaging/policy.py", line 173, in __getreply
    return self.__onreply(envelope)

  File "/usr/lib/python2.6/site-packages/gofer/messaging/policy.py", line 189, in __onreply
    raise RemoteException.instance(reply)

PulpException: 'Error on the CDS during sync; check the server log for more information'

Comment 3 Sachin Ghai 2011-07-12 12:10:45 UTC
pulp.log:
===========
2011-07-12 16:16:01,626 28844:139986621933312: pulp.server.api.cds:ERROR: cds:585 CDS threw an error during sync to CDS [dhcp201-121.englab.pnq.redhat.com]
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/pulp/server/api/cds.py", line 568, in cds_sync
    self.dispatcher.sync(cds, payload)
  File "/usr/lib/python2.6/site-packages/pulp/server/cds/dispatcher.py", line 138, in sync
    self._send(stub.sync, data)
  File "/usr/lib/python2.6/site-packages/pulp/server/cds/dispatcher.py", line 170, in _send
    result = func(*args)
  File "/usr/lib/python2.6/site-packages/gofer/messaging/stub.py", line 71, in __call__
    return self.stub._send(request, opts)
  File "/usr/lib/python2.6/site-packages/gofer/messaging/stub.py", line 142, in _send
    any=opts.any)
  File "/usr/lib/python2.6/site-packages/gofer/messaging/policy.py", line 121, in send
    return self.__getreply(sn, reader)
  File "/usr/lib/python2.6/site-packages/gofer/messaging/policy.py", line 173, in __getreply
    return self.__onreply(envelope)
  File "/usr/lib/python2.6/site-packages/gofer/messaging/policy.py", line 189, in __onreply
    raise RemoteException.instance(reply)
CdsMethodException
2011-07-12 16:16:01,670 28844:139986621933312: pulp.server.tasking.task:ERROR: task:380 Task failed: Task 657a44b5-ac73-11e0-81c4-525400cc6441: CdsApi.cds_sync(dhcp201-121.englab.pnq.redhat.com, )
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/pulp/server/tasking/task.py", line 329, in run
    result = self.callable(*self.args, **self.kwargs)
  File "/usr/lib/python2.6/site-packages/pulp/server/api/cds.py", line 568, in cds_sync
    self.dispatcher.sync(cds, payload)
  File "/usr/lib/python2.6/site-packages/pulp/server/cds/dispatcher.py", line 138, in sync
    self._send(stub.sync, data)
  File "/usr/lib/python2.6/site-packages/pulp/server/cds/dispatcher.py", line 170, in _send
    result = func(*args)
  File "/usr/lib/python2.6/site-packages/gofer/messaging/stub.py", line 71, in __call__
    return self.stub._send(request, opts)
  File "/usr/lib/python2.6/site-packages/gofer/messaging/stub.py", line 142, in _send
    any=opts.any)
  File "/usr/lib/python2.6/site-packages/gofer/messaging/policy.py", line 121, in send
    return self.__getreply(sn, reader)
  File "/usr/lib/python2.6/site-packages/gofer/messaging/policy.py", line 173, in __getreply
    return self.__onreply(envelope)
  File "/usr/lib/python2.6/site-packages/gofer/messaging/policy.py", line 189, in __onreply
    raise RemoteException.instance(reply)
PulpException: 'Error on the CDS during sync; check the server log for more information'

Comment 4 Sachin Ghai 2011-07-12 12:13:15 UTC
from the below outputs , we can see that sync was running for a while (used space increasing in folowing outputs)

[root@dhcp201-121 ~]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/vg_dhcp201121-lv_root
                      49130312  15103252  31531368  33% /
tmpfs                   498576         0    498576   0% /dev/shm
/dev/vda1               495844     31582    438662   7% /boot
[root@dhcp201-121 ~]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/vg_dhcp201121-lv_root
                      49130312  15103372  31531248  33% /
tmpfs                   498576         0    498576   0% /dev/shm
/dev/vda1               495844     31582    438662   7% /boot
[root@dhcp201-121 ~]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/vg_dhcp201121-lv_root
                      49130312  15103692  31530928  33% /
tmpfs                   498576         0    498576   0% /dev/shm
/dev/vda1               495844     31582    438662   7% /boot
[root@dhcp201-121 ~]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/vg_dhcp201121-lv_root
                      49130312  15103836  31530784  33% /
tmpfs                   498576         0    498576   0% /dev/shm
/dev/vda1               495844     31582    438662   7% /boot
[root@dhcp201-121 ~]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/vg_dhcp201121-lv_root
                      49130312  15075024  31559596  33% /
tmpfs                   498576         0    498576   0% /dev/shm
/dev/vda1               495844     31582    438662   7% /boot
[root@dhcp201-121 ~]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/vg_dhcp201121-lv_root
                      49130312  15058892  31575728  33% /
tmpfs                   498576         0    498576   0% /dev/shm
/dev/vda1               495844     31582    438662   7% /boot
[root@dhcp201-121 ~]# ^C
[root@dhcp201-121 ~]#

Comment 5 Pradeep Kilambi 2011-07-13 17:10:06 UTC
As I mentioned on irc, try syncing other repo urls and see if you see the same issue. Based on what I see in pulp-cds logs the issue seems to be that this particular repo has a mismatch in filelist data and its not able to retrieve it.

Comment 6 Sachin Ghai 2011-07-14 10:24:06 UTC
I removed the "rhel6 release" repo and CDS sync was successfull with other repos.

cds00120 .................................................... [  UP  ]
cds00121 .................................................... [  UP  ]


Next Sync                    Last Sync                    Last Result         
------------------------------------------------------------------------------
cds00120
07-14-2011 20:08             07-13-2011 21:02             Success    

cds00121
07-14-2011 20:08             07-14-2011 15:40             Success    


                                  Connected: dhcp201-141.englab.pnq.redhat.com
------------------------------------------------------------------------------
^Crhui (sync) => exit
[root@dhcp201-141 mongodb]# pulp-admin -u admin -p admin cds list
+------------------------------------------+
                CDS Instances
+------------------------------------------+

Name                	cds00121                 
Hostname            	dhcp201-121.englab.pnq.redhat.com
Description         	RHUI CDS                 
Cluster             	rhui-cds-cluster         
Sync Schedule       	2011-07-12T02:08:27+05:30/PT6H
Repos               	rhel-server-6-updates-6Server-x86_64, rhel-server-6-optional-updates-6Server-x86_64, rhui-1.1-5Server-i386, rhui-1.1-5Server-x86_64, rhui-1.2-5Server-x86_64, rhui-1.2-5Server-i386
Last Sync           	2011-07-14 15:40:43+05:30
Status:
   Responding       	Yes                      
   Last Heartbeat   	2011-07-14 10:20:01.993358+00:00


Name                	cds00120                 
Hostname            	dhcp201-120.englab.pnq.redhat.com
Description         	RHUI CDS                 
Cluster             	rhui-cds-cluster         
Sync Schedule       	2011-07-12T02:08:07+05:30/PT6H
Repos               	rhel-server-6-updates-6Server-x86_64, rhel-server-6-optional-updates-6Server-x86_64, rhui-1.1-5Server-i386, rhui-1.1-5Server-x86_64, rhui-1.2-5Server-x86_64, rhui-1.2-5Server-i386
Last Sync           	2011-07-13 21:02:42+05:30
Status:
   Responding       	No                       
   Last Heartbeat   	None

Comment 7 Pradeep Kilambi 2011-07-14 13:39:45 UTC
As discussed on irc, this seems to be issue with bad metadata on rhua. Sachin is working on retesting this on a fresh rhua setup. moving to on_qa.

Comment 8 Sachin Ghai 2011-07-15 12:14:06 UTC
I synced [rhel-server-6-releases-6Server-x86_64] repo from CDN to RHUA and then sync one CDS node from RHUA with same repo. Sync ran successfully.





-----------------------------------------------------------------------------
             -= Red Hat Update Infrastructure Management Tool =-


-= CDS Synchronization Status =-

Last Refreshed: 17:34:48
(updated every 5 seconds, ctrl+c to exit)


cds108 ...................................................... [  UP  ]


Next Sync                    Last Sync                    Last Result         
------------------------------------------------------------------------------
cds108
07-15-2011 19:07             07-15-2011 12:17             Success    


                                  Connected: dhcp201-109.englab.pnq.redhat.com
------------------------------------------------------------------------------
^Crhui (sync) => 


[root@dhcp201-109 mongodb]# pulp-admin -u admin -p admin cds list
+------------------------------------------+
                CDS Instances
+------------------------------------------+

Name                	cds108                   
Hostname            	dhcp201-108.englab.pnq.redhat.com
Description         	RHUI CDS                 
Cluster             	rhui-cds-cluster         
Sync Schedule       	2011-07-15T01:07:05+05:30/PT6H
Repos               	rhel-server-6-releases-6Server-x86_64, rhel-server-6-optional-updates-6Server-x86_64, repo_001
Last Sync           	2011-07-15 12:17:56+05:30
Status:
   Responding       	Yes                      
   Last Heartbeat   	2011-07-15 12:06:40.093379+00:00


[root@dhcp201-109 mongodb]# 



So seems that the issue is not with the rhel6 releases repo. Currently I'm moving this to verified as its not reproducible.

Comment 9 wes hayutin 2011-08-01 21:38:39 UTC
moving to release pending

Comment 10 wes hayutin 2012-05-31 12:52:02 UTC
closing out, product released