Description of problem:

I registered a new CDS (cds0065) and associated multiple repos with it (custom and Red Hat content repos). I set the sync interval to one hour. When the scheduled CDS sync started an hour later, it ran fine for the first couple of minutes, and then the CDS node status suddenly went DOWN. I then checked the heartbeat using 'cds list': the CDS was not responding. The sync status remains in the "running" state even after a couple of hours. I have noticed this happens with large repo syncs, and only on scheduled syncs.

------------------------------------------------------------------------------
             -= Red Hat Update Infrastructure Management Tool =-

-= CDS Synchronization Status =-

Last Refreshed: 15:28:52
(updated every 50 seconds, ctrl+c to exit)

cds00193 .................................................... [  UP  ]
cds0065 ..................................................... [ DOWN ]

                 Next Sync          Last Sync          Last Result
------------------------------------------------------------------------------
cds00193         06-07-2011 16:23   06-07-2011 15:23   finished
cds0065          06-07-2011 14:45   Never              running

                                      Connected: dhcp193-79.pnq.redhat.com
------------------------------------------------------------------------------
^Crhui (sync) =>

[root@dhcp193-79 pulp]# pulp-admin -u admin -p admin cds list
+------------------------------------------+
              CDS Instances
+------------------------------------------+

Name                cds0065
Hostname            dhcp193-65.pnq.redhat.com
Description         None
Group               None
Sync Schedule       2011-06-07T13:45:23+05:30/PT1H
Repos               repo101, repo102, rhel-server-6-optional-releases-6Server-x86_64,
                    rhel-server-6-releases-6Server-x86_64, rhui-1.2-5Server-i386,
                    rhui-1.2-5Server-x86_64
Last Sync           Never
Status:
   Responding       No
   Last Heartbeat   2011-06-07 09:25:34.755349+00:00

Name                cds00193
Hostname            dhcp193-193.pnq.redhat.com
Description         None
Group               None
Sync Schedule       2011-06-07T14:23:29+05:30/PT1H
Repos               None
Last Sync           2011-06-07 14:25:40+05:30
Status:
   Responding       Yes
   Last Heartbeat   2011-06-07 09:27:49.343612+00:00
[root@dhcp193-79 pulp]#

Version-Release number of selected component (if applicable):
pulp 0.186
rhui-tools 2.0.26

How reproducible:
I started testing yesterday and have hit this issue 3 times.

Steps to Reproduce:
1. Register a CDS node.
2. Associate multiple large repos with it.
3. Wait for the scheduled sync to start.
4. Check the sync status using rhui-manager.

Actual results:
The heartbeat stops responding and the sync status remains "running".

Expected results:
CDS sync should work properly for large repos.

Additional info:
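As a rough way to relate the stalled heartbeat to the scheduled sync, the values from the 'cds list' output above can be compared directly. This is only a sketch: the function names parse_schedule and last_run_at_or_before are hypothetical helpers, not part of pulp or rhui-tools, and the parser assumes the schedule is always of the form "start/PTnHnMnS" as seen in this report.

```python
import re
from datetime import datetime, timedelta


def parse_schedule(schedule):
    """Split 'start/PTnHnMnS' into (start datetime, interval timedelta).

    Hypothetical helper: only handles the simple hour/minute/second
    durations seen in this report, not the full ISO 8601 grammar.
    """
    start_text, duration_text = schedule.split("/")
    match = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?", duration_text)
    if match is None:
        raise ValueError("unsupported duration: " + duration_text)
    hours, minutes, seconds = (int(group or 0) for group in match.groups())
    interval = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    return datetime.fromisoformat(start_text), interval


def last_run_at_or_before(start, interval, moment):
    """Return the most recent scheduled run at or before `moment`."""
    if moment < start:
        return None
    completed_intervals = (moment - start) // interval
    return start + completed_intervals * interval


# Values copied from the 'cds list' output for cds0065.
start, interval = parse_schedule("2011-06-07T13:45:23+05:30/PT1H")
heartbeat = datetime.fromisoformat("2011-06-07 09:25:34.755349+00:00")

last_run = last_run_at_or_before(start, interval, heartbeat)
print("scheduled run before last heartbeat:", last_run.isoformat())
print("heartbeat lag after that run:", heartbeat - last_run)
```

The scheduled run it reports matches the 14:45 "Next Sync" shown for cds0065 in rhui-manager, so the last heartbeat was recorded after that scheduled sync was due to start.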
CDS sync status (for cds0065) in rhui-manager displays running:
=========================================

------------------------------------------------------------------------------
             -= Red Hat Update Infrastructure Management Tool =-

-= CDS Synchronization Status =-

Last Refreshed: 16:40:22
(updated every 50 seconds, ctrl+c to exit)

cds00193 .................................................... [  UP  ]
cds0065 ..................................................... [ DOWN ]

                 Next Sync          Last Sync          Last Result
------------------------------------------------------------------------------
cds00193         06-07-2011 17:23   06-07-2011 16:23   finished
cds0065          06-07-2011 14:45   Never              running

                                      Connected: dhcp193-79.pnq.redhat.com
------------------------------------------------------------------------------

This is from the CDS node. Disk usage does not change between runs of df, which means package downloading is not running.

[root@dhcp193-65 Packages]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/vg_dhcp19365-lv_root
                      19134332   6256356  11905996  35% /
tmpfs                   251696         0    251696   0% /dev/shm
/dev/vda1               495844     30226    440018   7% /boot
[root@dhcp193-65 Packages]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/vg_dhcp19365-lv_root
                      19134332   6256356  11905996  35% /
tmpfs                   251696         0    251696   0% /dev/shm
/dev/vda1               495844     30226    440018   7% /boot
[root@dhcp193-65 Packages]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/vg_dhcp19365-lv_root
                      19134332   6256356  11905996  35% /
tmpfs                   251696         0    251696   0% /dev/shm
/dev/vda1               495844     30226    440018   7% /boot
[root@dhcp193-65 Packages]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/vg_dhcp19365-lv_root
                      19134332   6256356  11905996  35% /
tmpfs                   251696         0    251696   0% /dev/shm
/dev/vda1               495844     30226    440018   7% /boot
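The repeated df check above could be scripted along these lines. This is a minimal sketch, not an existing tool: sample_used_bytes and growth are hypothetical names, and the sample count and delay passed to sample_used_bytes are arbitrary choices rather than values taken from this report.

```python
import shutil
import time


def sample_used_bytes(path, count, delay_seconds):
    """Record used bytes on the filesystem holding `path`, `count` times."""
    samples = []
    for index in range(count):
        samples.append(shutil.disk_usage(path).used)
        if index < count - 1:
            time.sleep(delay_seconds)
    return samples


def growth(samples):
    """Difference between the last and first sample (0 means no change)."""
    return samples[-1] - samples[0]


# "Used" column for / from the four df runs above (in 1K-blocks).
df_used_kblocks = [6256356, 6256356, 6256356, 6256356]
print(growth(df_used_kblocks))  # 0
```

On the CDS node itself, something like growth(sample_used_bytes("/", 4, 30)) would repeat the same check live; a result of 0 while the sync claims to be running is consistent with no packages being downloaded.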
I saw this yesterday too. Gofer crashed (no errors in gofer's logs, but the process wasn't running) and the sync perpetually remained in the running state.

The stuck "running" status will likely be addressed in one of the other sync status related bugs. The heartbeat problem is, I think, a gofer issue and is already being worked on under bug 711329.

*** This bug has been marked as a duplicate of bug 711329 ***