Bug 711389

Summary: CDS sync status remains in "running" state and heartbeat stops responding
Product: Red Hat Update Infrastructure for Cloud Providers
Reporter: Sachin Ghai <sghai>
Component: CDS
Assignee: Jay Dobies <jason.dobies>
Status: CLOSED DUPLICATE
QA Contact: wes hayutin <whayutin>
Severity: high
Priority: unspecified
Version: 2.0
CC: kbidarka, sghai
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-06-07 12:52:34 UTC

Description Sachin Ghai 2011-06-07 11:17:03 UTC
Description of problem:
I registered a new CDS (cds0065) and associated multiple repos with it (custom and Red Hat content repos). I set the sync interval to one hour. When the CDS sync started after an hour, it ran fine for the first couple of minutes, and then the CDS node status suddenly went down. I then checked the heartbeat using 'cds list'. The node was not responding.

The sync status remains in the "running" state even after a couple of hours.

I noticed this happens with large repo syncs, and only on scheduled syncs.



------------------------------------------------------------------------------
             -= Red Hat Update Infrastructure Management Tool =-


-= CDS Synchronization Status =-

Last Refreshed: 15:28:52
(updated every 50 seconds, ctrl+c to exit)


cds00193 .................................................... [  UP  ]
cds0065 ..................................................... [ DOWN ]


Next Sync                    Last Sync                    Last Result         
------------------------------------------------------------------------------
cds00193
06-07-2011 16:23             06-07-2011 15:23             finished   

cds0065
06-07-2011 14:45             Never                        running    


                                          Connected: dhcp193-79.pnq.redhat.com
------------------------------------------------------------------------------
^Crhui (sync) => 



[root@dhcp193-79 pulp]# pulp-admin -u admin -p admin cds list
+------------------------------------------+
                CDS Instances
+------------------------------------------+

Name                	cds0065                  
Hostname            	dhcp193-65.pnq.redhat.com
Description         	None                     
Group               	None                     
Sync Schedule       	2011-06-07T13:45:23+05:30/PT1H
Repos               	repo101, repo102, rhel-server-6-optional-releases-6Server-x86_64, rhel-server-6-releases-6Server-x86_64, rhui-1.2-5Server-i386, rhui-1.2-5Server-x86_64
Last Sync           	Never                    
Status:
   Responding       	No                       
   Last Heartbeat   	2011-06-07 09:25:34.755349+00:00


Name                	cds00193                 
Hostname            	dhcp193-193.pnq.redhat.com
Description         	None                     
Group               	None                     
Sync Schedule       	2011-06-07T14:23:29+05:30/PT1H
Repos               	None                     
Last Sync           	2011-06-07 14:25:40+05:30
Status:
   Responding       	Yes                      
   Last Heartbeat   	2011-06-07 09:27:49.343612+00:00


[root@dhcp193-79 pulp]# 
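
Note on reading the schedule: the "Sync Schedule" value above looks like an ISO 8601 start time followed by a repeat interval (start/PT1H). The following is only a rough Python sketch for working out when the next run would be expected at a given moment. The function names are made up for illustration and are not part of pulp or rhui-tools, only hour/minute/second intervals are handled, and it assumes the rhui-manager "Last Refreshed" time is local time (+05:30).

```python
import re
from datetime import datetime, timedelta

# Only the time part of an ISO 8601 duration (hours/minutes/seconds) is handled.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def parse_schedule(schedule):
    """Split 'start/duration' into a timezone-aware start and an interval."""
    start_text, duration_text = schedule.split("/")
    start = datetime.fromisoformat(start_text)
    match = _DURATION.match(duration_text)
    if match is None:
        raise ValueError("unsupported duration: %r" % duration_text)
    hours, minutes, seconds = (int(g or 0) for g in match.groups())
    interval = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    if interval <= timedelta(0):
        raise ValueError("interval must be positive: %r" % duration_text)
    return start, interval


def next_run(schedule, now):
    """Return the first scheduled run at or after 'now'."""
    start, interval = parse_schedule(schedule)
    if now <= start:
        return start
    periods = (now - start) // interval  # whole intervals already elapsed
    candidate = start + periods * interval
    return candidate if candidate >= now else candidate + interval


if __name__ == "__main__":
    schedule = "2011-06-07T13:45:23+05:30/PT1H"  # cds0065, from 'cds list'
    # Assumption: the rhui-manager refresh time 15:28:52 is local time (+05:30).
    refreshed = datetime.fromisoformat("2011-06-07T15:28:52+05:30")
    print(next_run(schedule, refreshed).isoformat())
```

Under those assumptions this prints 2011-06-07T15:45:23+05:30, whereas the rhui-manager screen refreshed at 15:28:52 still lists 14:45 as the next sync for cds0065.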


Version-Release number of selected component (if applicable):
pulp 0.186
rhui-tools 2.0.26

How reproducible:
I started testing yesterday and have hit this issue 3 times so far.

Steps to Reproduce:
1. Register a CDS node.
2. Associate multiple large repos with it.
3. Wait for the scheduled sync to start.
4. Check the sync status using rhui-manager.
  
Actual results:
The heartbeat stops responding and the sync status remains "running".
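
For reference, the "Responding" flag can be cross-checked by hand from the "Last Heartbeat" timestamps in the 'cds list' output. This is a minimal Python sketch of that check, not how pulp itself decides it. The 30 second timeout and the reference time of 09:27:50 UTC are assumptions chosen for illustration. Neither value appears in the tool output.

```python
from datetime import datetime, timedelta, timezone


def heartbeat_age(last_heartbeat, now):
    """Age of a heartbeat given in the 'cds list' timestamp format."""
    return now - datetime.fromisoformat(last_heartbeat)


def is_responding(last_heartbeat, now, timeout=timedelta(seconds=30)):
    """True if the heartbeat is no older than the (hypothetical) timeout."""
    return heartbeat_age(last_heartbeat, now) <= timeout


if __name__ == "__main__":
    # Assumption: the listing was taken at about 09:27:50 UTC.
    now = datetime(2011, 6, 7, 9, 27, 50, tzinfo=timezone.utc)
    heartbeats = {
        "cds0065": "2011-06-07 09:25:34.755349+00:00",
        "cds00193": "2011-06-07 09:27:49.343612+00:00",
    }
    for name, stamp in heartbeats.items():
        print(name, heartbeat_age(stamp, now), is_responding(stamp, now))
```

With those assumed values, the cds0065 heartbeat is a little over two minutes old and is reported as not responding, while cds00193 is under a second old.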

Expected results:
CDS sync should work properly for large repos.

Additional info:

Comment 1 Sachin Ghai 2011-06-07 11:19:40 UTC
CDS sync status (for cds0065) in rhui-manager displays "running":
=========================================

------------------------------------------------------------------------------
             -= Red Hat Update Infrastructure Management Tool =-


-= CDS Synchronization Status =-

Last Refreshed: 16:40:22
(updated every 50 seconds, ctrl+c to exit)


cds00193 .................................................... [  UP  ]
cds0065 ..................................................... [ DOWN ]


Next Sync                    Last Sync                    Last Result         
------------------------------------------------------------------------------
cds00193
06-07-2011 17:23             06-07-2011 16:23             finished   

cds0065
06-07-2011 14:45             Never                        running    


                                          Connected: dhcp193-79.pnq.redhat.com

------------------------------------------------------------------------------


This is from the CDS node. Disk usage does not change across repeated df runs, which means packages are not being downloaded.

[root@dhcp193-65 Packages]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/vg_dhcp19365-lv_root
                      19134332   6256356  11905996  35% /
tmpfs                   251696         0    251696   0% /dev/shm
/dev/vda1               495844     30226    440018   7% /boot
[root@dhcp193-65 Packages]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/vg_dhcp19365-lv_root
                      19134332   6256356  11905996  35% /
tmpfs                   251696         0    251696   0% /dev/shm
/dev/vda1               495844     30226    440018   7% /boot
[root@dhcp193-65 Packages]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/vg_dhcp19365-lv_root
                      19134332   6256356  11905996  35% /
tmpfs                   251696         0    251696   0% /dev/shm
/dev/vda1               495844     30226    440018   7% /boot
[root@dhcp193-65 Packages]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/vg_dhcp19365-lv_root
                      19134332   6256356  11905996  35% /
tmpfs                   251696         0    251696   0% /dev/shm
/dev/vda1               495844     30226    440018   7% /boot
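
The same comparison can be done mechanically instead of reading the df output by eye. This is a small Python sketch that assumes the default df layout shown above, including the long device name wrapping onto its own line. The helper names are made up for illustration.

```python
SNAPSHOT = """\
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/vg_dhcp19365-lv_root
                      19134332   6256356  11905996  35% /
tmpfs                   251696         0    251696   0% /dev/shm
/dev/vda1               495844     30226    440018   7% /boot
"""


def parse_df(text):
    """Map each mount point to its 'Used' value (1K blocks)."""
    used_by_mount = {}
    pending_device = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Filesystem":
            continue
        if len(fields) == 1:
            # Long device names wrap: the numbers follow on the next line.
            pending_device = fields[0]
            continue
        if pending_device is not None:
            fields = [pending_device] + fields
            pending_device = None
        if len(fields) < 6 or not fields[2].isdigit():
            continue  # not a df data row (for example a shell prompt)
        used_by_mount[fields[5]] = int(fields[2])
    return used_by_mount


def changed_mounts(before, after):
    """Return {mount: (used_before, used_after)} for mounts that differ."""
    mounts = sorted(set(before) | set(after))
    return {m: (before.get(m), after.get(m))
            for m in mounts if before.get(m) != after.get(m)}


if __name__ == "__main__":
    first = parse_df(SNAPSHOT)
    second = parse_df(SNAPSHOT)  # the later df runs above are identical
    print(changed_mounts(first, second) or "no change in disk usage")
```

An empty result between snapshots taken some time apart matches the observation that nothing is being written to the CDS.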

Comment 2 Jay Dobies 2011-06-07 12:52:34 UTC
I saw this yesterday too. Gofer crashed (no errors in gofer's logs, but the process wasn't running) and the sync perpetually remained in the running state.

The "running" state issue will likely be addressed in one of the other sync-status-related bugs. I think the heartbeat issue is a gofer problem, and it is already being worked on in bug 711329.

*** This bug has been marked as a duplicate of bug 711329 ***