Bug 1540981

Summary: cdn-sync fails with error "SYNC ERROR: attempting to display as much information as possible".
Product: Red Hat Satellite 5 Reporter: Deepannagaraj Nagarathinam <dnagarat>
Component: Satellite Synchronization Assignee: Grant Gainey <ggainey>
Status: CLOSED ERRATA QA Contact: Radovan Drazny <rdrazny>
Severity: medium Docs Contact:
Priority: medium    
Version: 580 CC: ggainey, rdrazny, s.eerkes, tlestach
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: spacewalk-backend-2.5.3-162-sat Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-03-06 15:47:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1450111    
Attachments:
Description Flags
error message when syncing none

Description Deepannagaraj Nagarathinam 2018-02-01 13:02:56 UTC
Description of problem:

- The cdn-sync command failed due to insufficient disk space. After the disk space is increased, running cdn-sync again fails with the error below.
~~~~~~
SYNC ERROR: attempting to display as much information as possible
~~~~~~

Version-Release number of selected component (if applicable):

- Red Hat Satellite v 5.8

How reproducible:

- Always

Steps to Reproduce:

- Build a Satellite 5.8 server with limited disk space.
- Try to sync a channel that is larger than the available disk space.
- Once the sync has failed, increase the disk space and try to sync the channel again.

Actual results:

- Synchronization failed with the error below.
~~~~~~
SYNC ERROR: attempting to display as much information as possible
~~~~~~

Expected results:

- Synchronization should complete successfully and the error should be more precise.

Additional info:

Comment 1 Deepannagaraj Nagarathinam 2018-02-01 13:05:32 UTC
Hi Team,

As a workaround, we can move the checksum_cache file on the satellite server to a temporary location.

------
# mv /var/cache/reposync/checksum_cache /tmp/
# cdn-sync
------

Deepannagaraj N.

Comment 2 Jan Dobes 2018-02-01 13:45:25 UTC
There is probably not enough space in the /var/cache location. Was the disk space increased on the partition containing /var/cache? The checksum_cache file is used to speed up subsequent cdn-sync runs, so that the checksums of all RPMs in a channel do not need to be recomputed. If this file gets too big, it can be safely deleted.
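The idea of a checksum cache described above can be illustrated with a minimal sketch. This is not the spacewalk-backend implementation: the function names and the cache layout (path mapped to modification time, size, and checksum) are assumptions made for illustration only.

```python
import hashlib
import os


def file_checksum(path, algo="sha256", chunk_size=1024 * 1024):
    """Compute a file's checksum by reading it in chunks."""
    digest = hashlib.new(algo)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_checksum(path, cache):
    """Return the checksum for path, recomputing only if the file changed.

    cache maps path -> (mtime, size, checksum); this layout is hypothetical.
    """
    stat = os.stat(path)
    entry = cache.get(path)
    if entry is not None and entry[0] == stat.st_mtime and entry[1] == stat.st_size:
        return entry[2]  # cache hit: the file is not read again
    checksum = file_checksum(path)
    cache[path] = (stat.st_mtime, stat.st_size, checksum)
    return checksum
```

Because every entry can be recomputed from the RPM itself, discarding the whole cache only costs time on the next run, which is consistent with the remark that the file can be safely deleted.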

Comment 3 Deepannagaraj Nagarathinam 2018-02-01 13:48:31 UTC
Hello Jan,

Thank you for the quick response.

Yes, the disk space was increased, but cdn-sync still fails to run afterwards. The error is also not very clear, so it will be difficult for the customer to understand.

Deepannagaraj N.

Comment 4 Jan Dobes 2018-02-01 15:12:42 UTC
What is the output of the following command?
# df -h

Because Satellite uses multiple locations to store files, some partitions may require more space than others, depending on the customer's disk partition layout.
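A per-location check along these lines can be sketched as follows. The list of directories is an assumption and should be adjusted to the actual layout; the helper name is hypothetical. The percentage is approximate and may differ slightly from what `df -h` prints, because reserved blocks are counted as used here.

```python
import os


def free_space_report(paths):
    """Return {path: (free_bytes, percent_used)} for each existing path."""
    report = {}
    for path in paths:
        if not os.path.exists(path):
            continue  # skip locations that are not present on this host
        stats = os.statvfs(path)
        total = stats.f_blocks * stats.f_frsize
        free = stats.f_bavail * stats.f_frsize  # space available to non-root users
        used_pct = 100.0 * (total - free) / total if total else 0.0
        report[path] = (free, used_pct)
    return report


# Hypothetical list of locations; adjust to the actual partition layout.
LOCATIONS = ["/var/cache/rhn", "/var/satellite", "/tmp"]

if __name__ == "__main__":
    for path, (free, used_pct) in sorted(free_space_report(LOCATIONS).items()):
        print("%-20s %8.1f GiB free, %5.1f%% used" % (path, free / 1024.0 ** 3, used_pct))
```

When everything is mounted under a single filesystem, all listed paths report the same numbers.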

I agree the error message could be clearer.

Comment 5 Deepannagaraj Nagarathinam 2018-02-01 16:49:53 UTC
Hello Jan,

In my test Satellite server, I have everything mounted under "/".

~~~~~~
# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_vm-lv_root
                      181G  135G   38G  79% /
tmpfs                 2.4G  4.0K  2.4G   1% /dev/shm
/dev/sda1             477M   75M  377M  17% /boot
/dev/mapper/vg_vm-lv_home
                      9.8G   36M  9.2G   1% /home
~~~~~~

The sizes above are after increasing the disk space.

Deepannagaraj N.

Comment 6 Tomas Lestach 2018-02-01 17:33:37 UTC
Not sure if we can reproduce this, as I remember at least two other BZs dealing with limited space that did not hit this issue, but let's try.
If there were an internal reproducer, that would be great.

Comment 8 Jan Dobes 2018-02-02 14:26:48 UTC
I suppose we could update the error message shown when the file is corrupted or there isn't enough disk space, so that it advises the user to delete this file or to check whether there is enough space.

Comment 9 Tomas Lestach 2018-02-02 17:11:10 UTC
Thanks Deepannagaraj!
Ack, Jan.

Comment 10 Sander Eerkes 2018-02-13 14:29:42 UTC
A corrupted /var/opt/satellite/var/cach/reposync/checksum_cache file caused the problem.
The cdn-sync command gave a strange Python error; my suggestion is to give a cleaner error when this problem arises again.
After I deleted /var/opt/satellite/var/cach/reposync/checksum_cache, the file was rebuilt and I was able to sync again.

Suggestion:
Better error handling in the cdn-sync command, or in case of an error automatically rebuild (or delete) the file.

Reproduce:
Replace the /var/opt/satellite/var/cach/reposync/checksum_cache file with a broken one.
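The "automatically rebuild" suggestion could look roughly like the sketch below. It assumes the cache is a pickled dictionary, which this report does not confirm, and `load_cache` is a hypothetical helper rather than a function from spacewalk-backend.

```python
import pickle
import sys


def load_cache(path):
    """Load a pickled cache; on a missing or corrupted file, start empty."""
    try:
        with open(path, "rb") as handle:
            cache = pickle.load(handle)
    except (IOError, OSError):
        return {}  # file missing or unreadable: nothing cached yet
    except (EOFError, pickle.UnpicklingError, AttributeError, ImportError, IndexError):
        # Empty, truncated, or otherwise damaged file: warn and rebuild from scratch.
        sys.stderr.write("WARNING: %s looks corrupted, rebuilding it\n" % path)
        return {}
    # Anything other than a dictionary is treated as unusable as well.
    return cache if isinstance(cache, dict) else {}
```

Starting from an empty dictionary means the next run recomputes every checksum once, which is slower but has the same end result as deleting the file by hand.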

Comment 11 Grant Gainey 2018-02-13 16:28:12 UTC
(In reply to Sander Eerkes from comment #10)
> A corrupted /var/opt/satellite/var/cach/reposync/checksum_cache file caused
> the problem.
> The cdn-sync command gave a strange Python error; my suggestion is to give a
> cleaner error when this problem arises again.
> After I deleted /var/opt/satellite/var/cach/reposync/checksum_cache, the
> file was rebuilt and I was able to sync again.
> 
> Suggestion:
> Better error handling in the cdn-sync command, or in case of an error
> automatically rebuild (or delete) the file.
> 
> Reproduce:
> Replace the /var/opt/satellite/var/cach/reposync/checksum_cache file with a
> broken one.

Concur. One note: the checksum_cache path is actually

  /var/cache/rhn/reposync/checksum_cache

Comment 12 Grant Gainey 2018-02-13 16:51:42 UTC
Proposed wording change:

===
[root@host ~]# cdn-sync -c rhn-tools-rhel-x86_64-server-6
11:50:19 ======================================
11:50:19 | Channel: rhn-tools-rhel-x86_64-server-6
11:50:19 ======================================
11:50:19 Sync of channel started.
SYNC ERROR: Perhaps checksum_cache is corrupted? Remove /var/cache/rhn/reposync/checksum_cache and retry.
<type 'exceptions.EOFError'>
[root@host ~]# rm /var/cache/rhn/reposync/checksum_cache
[root@host ~]# cdn-sync -c rhn-tools-rhel-x86_64-server-6
11:50:31 ======================================
11:50:31 | Channel: rhn-tools-rhel-x86_64-server-6
11:50:31 ======================================
11:50:31 Sync of channel started.
11:50:31 Repo URL: https://cdn.redhat.com/content/dist/rhel/server/6/6Server/x86_64/rhn-tools/os
11:50:32 Packages in repo:               226
11:50:32 No new packages to sync.
11:50:33 Repo https://cdn.redhat.com/content/dist/rhel/server/6/6Server/x86_64/rhn-tools/os has comps file fbecca9e03f6e988b833143039e9583454cc84f7-comps.xml.
11:50:33 Repo https://cdn.redhat.com/content/dist/rhel/server/6/6Server/x86_64/rhn-tools/os has 61 errata.
11:50:33 No new errata to sync.
11:50:34 Kickstartable tree not detected (no valid treeinfo file)
11:50:34 Sync of channel completed in 0:00:02.
11:50:34 Total time: 0:00:02
[root@host-8-243-229 ~]#

Comment 13 Grant Gainey 2018-02-13 17:14:45 UTC
spacewalk.github:
daac1eb0e4f8c9b301d9793e0be93d20776c1a70

Comment 15 Grant Gainey 2018-02-13 17:18:16 UTC
Let's not make comments private unless we have to - discussion that isn't exposing customer data or internal machine names is fine to have in public comments.

Comment 17 Sander Eerkes 2018-02-14 08:15:43 UTC
The error I got with the checksum_cache file was different, since I think the file was only partly corrupted.
After I removed the file, the problem was solved.
See my error message:

https://drive.google.com/file/d/1Qcxfdk8AEQ3iN8Vh6JbjjulONjP29Ren/view?usp=sharing

P.S. I made a screenshot as I was not able to copy and paste it here.

Comment 18 Sander Eerkes 2018-02-14 08:26:42 UTC
The file was corrupted during a new full sync of a new channel.
After the file system filled up to 100%, we aborted the cdn-sync.
Then we increased the file system size; after the resize it was at 65% (75G free).
Then we ran cdn-sync again, and at that moment we saw the error.
The error was really strange; see the link above.
When we removed the checksum_cache file, everything was back to normal.
So in my case the error message was not clear at all; that's why this ticket was created.
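The sequence above (disk full, sync aborted, cache unreadable afterwards) is what one would expect if a cache file is rewritten in place and the write is cut short. A common mitigation, shown here only as a sketch and not as something this report says the fix does, is to write to a temporary file and rename it over the old one. The pickle format and the helper name are assumptions.

```python
import os
import pickle
import tempfile


def save_cache_atomically(cache, path):
    """Write the cache to a temp file, then rename it over the old one."""
    directory = os.path.dirname(path) or "."
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".checksum_cache.")
    try:
        with os.fdopen(fd, "wb") as handle:
            pickle.dump(cache, handle)
            handle.flush()
            os.fsync(handle.fileno())  # make sure the data reached the disk
        os.rename(tmp_path, path)  # atomic on POSIX within one filesystem
    except Exception:
        # On any failure (for example a full disk) the old cache stays untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With this approach a failed write leaves the previous cache file in place instead of a truncated one.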

Comment 19 Sander Eerkes 2018-02-14 09:21:10 UTC
Created attachment 1395828 [details]
error message when syncing

Error due to (partly?) corrupted checksum_cache file.

Comment 20 Grant Gainey 2018-02-14 16:28:26 UTC
(In reply to Sander Eerkes from comment #18)
> The file was corrupted during a new full sync of a new channel.
> After the file system filled up to 100%, we aborted the cdn-sync.
> Then we increased the file system size; after the resize it was at 65% (75G free).
> Then we ran cdn-sync again, and at that moment we saw the error.
> The error was really strange; see the link above.
> When we removed the checksum_cache file, everything was back to normal.
> So in my case the error message was not clear at all; that's why this ticket
> was created.

Sander - yup, exactly so. The proposed fix makes the problem and the possible fix clearer:

===
SYNC ERROR: Perhaps checksum_cache is corrupted? Remove /var/cache/rhn/reposync/checksum_cache and retry.
===
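How such a message could be produced is sketched below. The mapping from exception type to message is an assumption based on the `EOFError` shown in the proposed output; the actual change is in the spacewalk commit referenced in comment 13, and `describe_sync_error` is a hypothetical name.

```python
import pickle

CHECKSUM_CACHE = "/var/cache/rhn/reposync/checksum_cache"  # path given in this report


def describe_sync_error(exc, cache_path=CHECKSUM_CACHE):
    """Build a SYNC ERROR message; hint at the cache file for load failures."""
    if isinstance(exc, (EOFError, pickle.UnpicklingError)):
        # Errors typical of an empty or truncated cache get an actionable hint.
        return ("SYNC ERROR: Perhaps checksum_cache is corrupted? "
                "Remove %s and retry.\n%r" % (cache_path, type(exc)))
    # Everything else keeps the generic wording.
    return "SYNC ERROR: attempting to display as much information as possible\n%r" % (exc,)
```

For example, `print(describe_sync_error(EOFError()))` prints the hint with the cache path followed by the exception type.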

Comment 21 Radovan Drazny 2018-02-28 16:43:10 UTC
Reproduced using the reproducer from comment #7 on spacewalk-backend-2.5.3-160. Creating an empty /var/cache/rhn/reposync/checksum_cache file was enough to reproduce the error during cdn-sync.

After updating to spacewalk-backend-2.5.3-162, "corrupting" the checksum_cache file again, and running cdn-sync, the output and error message were as follows:

# cdn-sync -c rhel-x86_64-server-7 --no-packages
05:25:20 ======================================
05:25:20 | Channel: rhel-x86_64-server-7
05:25:20 ======================================
05:25:20 Sync of channel started.
SYNC ERROR: Perhaps checksum_cache is corrupted? Remove /var/cache/rhn/reposync/checksum_cache and retry.
<type 'exceptions.EOFError'>

After removing the checksum_cache file, cdn-sync was successful again.

VERIFIED

Comment 24 errata-xmlrpc 2018-03-06 15:47:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0393

Comment 25 Sander Eerkes 2018-03-06 16:09:37 UTC
(In reply to Grant Gainey from comment #20)
> (In reply to Sander Eerkes from comment #18)
> > The file was corrupted during a new full sync of a new channel.
> > After the file system filled up to 100%, we aborted the cdn-sync.
> > Then we increased the file system size; after the resize it was at 65% (75G free).
> > Then we ran cdn-sync again, and at that moment we saw the error.
> > The error was really strange; see the link above.
> > When we removed the checksum_cache file, everything was back to normal.
> > So in my case the error message was not clear at all; that's why this ticket
> > was created.
> 
> Sander - yup, exactly so. The proposed fix makes the problem and the possible
> fix clearer:
> 
> ===
> SYNC ERROR: Perhaps checksum_cache is corrupted? Remove
> /var/cache/rhn/reposync/checksum_cache and retry.
> ===


Hi Grant,

Is this also fixed for Satellite 6.X or is there no checksum_cache in 6.X?

Gr, Sander.

Comment 26 Grant Gainey 2018-03-06 16:12:47 UTC
(In reply to Sander Eerkes from comment #25)
> (In reply to Grant Gainey from comment #20)
> > (In reply to Sander Eerkes from comment #18)
> > > The file was corrupted during a new full sync of a new channel.
> > > After the file system filled up to 100%, we aborted the cdn-sync.
> > > Then we increased the file system size; after the resize it was at 65% (75G free).
> > > Then we ran cdn-sync again, and at that moment we saw the error.
> > > The error was really strange; see the link above.
> > > When we removed the checksum_cache file, everything was back to normal.
> > > So in my case the error message was not clear at all; that's why this ticket
> > > was created.
> > 
> > Sander - yup, exactly so. The proposed fix makes the problem and the possible
> > fix clearer:
> > 
> > ===
> > SYNC ERROR: Perhaps checksum_cache is corrupted? Remove
> > /var/cache/rhn/reposync/checksum_cache and retry.
> > ===
> 
> 
> Hi Grant,
> 
> Is this also fixed for Satellite 6.X or is there no checksum_cache in 6.X?
> 
> Gr, Sander.

Hi Sander,

Sat6 is a completely different product - there's no overlap between this BZ and anything in Sat6 or its subprojects (pulp would be the thing that fills this particular niche, I think). cdn-sync is a Sat5-only thing.