Bug 698277

Summary: guarantee dc_prepare_repos runs successfully
Product: [Retired] CloudForms Cloud Engine
Reporter: Dave Johnson <dajohnso>
Component: aeolus-conductor
Assignee: Jan Provaznik <jprovazn>
Status: CLOSED CURRENTRELEASE
QA Contact: Dave Johnson <dajohnso>
Severity: high
Docs Contact:
Priority: unspecified
Version: 0.3.1
CC: akarol, cpelland, dajohnso, deltacloud-maint, jprovazn, ssachdev, whayutin
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-06-21 10:57:44 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 697919    
Attachments:
logs.tgz (flags: none)
aeolus-configure logs (flags: none)

Description Dave Johnson 2011-04-20 15:05:43 UTC
Created attachment 493512 [details]
logs.tgz

Description of problem:

Over the last week I have run into numerous instances where the AMI comes online and dc_prepare_repos has failed.  When this occurs and you try to build an image while selecting packages, you get the error "Internal Server Error failed to read cached packages info, run 'rake dc:prepare_repos'".

On IRC it was mentioned that pulp is around the corner and that it would be a waste of time to address a problem that has been there forever.  I somewhat agree, but it concerns me that I have kept hitting this over the past week: I hadn't seen it before, and now I seem to hit it all the time.

I did 6 AMI instances this morning; 4 of them came online with a failed dc_prepare_repos.  That pushed me over the edge and I think we need to do something about this.


Version-Release number of selected component (if applicable):
aeolus-conductor-doc-0.0.3-6.el6.x86_64
aeolus-conductor-0.0.3-6.el6.x86_64
aeolus-conductor-daemons-0.0.3-6.el6.x86_64
aeolus-configure-2.0.0-8.el6.noarch



Additional info:

Instance #1 error  (ec2-50-16-82-137)
===================================================
Wed Apr 20 09:24:18 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): rake aborted!
Wed Apr 20 09:24:18 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): failed to fetch http://download.fedoraproject.org/pub/fedora/linux/releases/14/Fedora/x86_64/os/repodata/db4d7a09b6cd42ac07c9655864843c0df5cf76db37806d6a009f4c6430d66264-primary.xml.gz: <snip: binary gzip data; embedded file name /srv/pungi/14.RC1/14/Fedora/x86_64/os/.repodata/primary.xml>
Wed Apr 20 09:24:22 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice):
Wed Apr 20 09:24:22 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): (See full trace by running task with --trace)
Wed Apr 20 09:24:22 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): (in /usr/share/aeolus-conductor)
Wed Apr 20 09:24:22 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (err): change from notrun to 0 failed: /usr/bin/rake dc:prepare_repos returned 1 instead of one of [0] at /usr/share/aeolus-configure/modules/aeolus_recipe/manifests/conductor.pp:115


Instance #2 success  (ec2-184-72-69-101)
===========================================================
Wed Apr 20 09:22:05 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): (in /usr/share/aeolus-conductor)
Wed Apr 20 09:22:05 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): executed successfully


Instance #3 error  (ec2-75-101-227-16)
=================================================================
Wed Apr 20 09:22:37 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): rake aborted!
Wed Apr 20 09:22:37 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): failed to fetch http://download.fedoraproject.org/pub/fedora/linux/releases/13/Fedora/x86_64/os/repodata/repomd.xml: <?xml version="1.0" encoding="UTF-8"?>
Wed Apr 20 09:22:37 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): <repomd xmlns="http://linux.duke.edu/metadata/repo" xmlns:rpm="http://linux.duke.edu/metadata/rpm">
Wed Apr 20 09:22:37 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice):   <revision>1273711547</revision>
Wed Apr 20 09:22:37 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice):   <data type="other_db">

<snip>

Wed Apr 20 09:22:37 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): </repomd>
Wed Apr 20 09:22:37 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice):
Wed Apr 20 09:22:37 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): (See full trace by running task with --trace)
Wed Apr 20 09:22:37 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): (in /usr/share/aeolus-conductor)
Wed Apr 20 09:22:37 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (err): change from notrun to 0 failed: /usr/bin/rake dc:prepare_repos returned 1 instead of one of [0] at /usr/share/aeolus-configure/modules/aeolus_recipe/manifests/conductor.pp:115


Instance #4 error  (ec2-50-16-169-110)
=====================================================
Wed Apr 20 09:54:39 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): rake aborted!
Wed Apr 20 09:54:39 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): failed to fetch http://download.fedoraproject.org/pub/fedora/linux/releases/13/Fedora/x86_64/os/repodata/ed88d22fca1c8bcc07d85bb677d5f8f45422a373a53b6dd213d57d7dfc278878-primary.xml.gz: <snip: binary gzip data; embedded file name /srv/pungi/13.RC3/13/Fedora/x86_64/os/.repodata/primary.xml>
Wed Apr 20 09:54:44 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice):
Wed Apr 20 09:54:44 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): (See full trace by running task with --trace)
Wed Apr 20 09:54:44 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): (in /usr/share/aeolus-conductor)
Wed Apr 20 09:54:44 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (err): change from notrun to 0 failed: /usr/bin/rake dc:prepare_repos returned 1 instead of one of [0] at /usr/share/aeolus-configure/modules/aeolus_recipe/manifests/conductor.pp:115


Instance #5 error  (ec2-184-73-119-66)
============================================
Wed Apr 20 09:54:51 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): rake aborted!
Wed Apr 20 09:54:51 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): failed to fetch http://download.fedoraproject.org/pub/fedora/linux/releases/13/Fedora/x86_64/os/repodata/repomd.xml: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
Wed Apr 20 09:54:51 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): <html><head>
Wed Apr 20 09:54:51 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): <title>404 Not Found</title>
Wed Apr 20 09:54:51 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): </head><body>
Wed Apr 20 09:54:51 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): <h1>Not Found</h1>
Wed Apr 20 09:54:51 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): <p>The requested URL /releases/13/Fedora/x86_64/os/repodata/repomd.xml was not found on this server.</p>
Wed Apr 20 09:54:51 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): <hr>
Wed Apr 20 09:54:51 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): <address>Apache/2.2.3 (CentOS) Server at fedora.mirror.netriplex.com Port 80</address>
Wed Apr 20 09:54:51 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): </body></html>
Wed Apr 20 09:54:51 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice):
Wed Apr 20 09:54:51 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): (See full trace by running task with --trace)
Wed Apr 20 09:54:51 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): (in /usr/share/aeolus-conductor)
Wed Apr 20 09:54:51 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (err): change from notrun to 0 failed: /usr/bin/rake dc:prepare_repos returned 1 instead of one of [0] at /usr/share/aeolus-configure/modules/aeolus_recipe/manifests/conductor.pp:115


Instance #6 success (ec2-72-44-57-77)
=============================================
Wed Apr 20 09:54:47 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): (in /usr/share/aeolus-conductor)
Wed Apr 20 09:54:47 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): executed successfully

Comment 1 Mike Orazi 2011-04-25 16:03:38 UTC
Failure of dc_prepare_repos almost always corresponds to an issue
with the selected yum repo.

I'm not sure there is a good short term fix to make this more reliable.

Comment 2 Jan Provaznik 2011-04-28 14:09:16 UTC
Not sure if this patch helps, but it's worth trying:
https://fedorahosted.org/pipermail/aeolus-devel/2011-April/001285.html

Comment 3 Dave Johnson 2011-04-29 22:22:49 UTC
Did this make it into the latest AMI?  If it did, I ran into the same issue 2 out of 3 times on the latest AMI, ami-6e807f07, and I didn't see any logging about retries (would I?).

If it didn't make it into the AMI, it probably needs to be in there; for whatever reason I continue to encounter this when the AMI is deployed inside the EC2 cloud.
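
For reference, the kind of retry-with-logging behaviour asked about here could look roughly like the sketch below. This is not the patch linked in comment 2, whose contents are not shown in this report; the names fetch_once, fetch_with_retries and FetchError, and the attempts/delay defaults, are all hypothetical. It only assumes that a fetch either returns a body or raises, and that each failed attempt should leave a log line before the next try. Note that Net::HTTP.get_response does not follow redirects, so a real fix for a mirror redirector would need extra handling.

```ruby
require 'net/http'
require 'uri'
require 'logger'

# Hypothetical error type for a failed repository fetch.
class FetchError < StandardError; end

# Fetch a URL once; raise FetchError on any non-2xx response.
def fetch_once(url)
  response = Net::HTTP.get_response(URI.parse(url))
  unless response.is_a?(Net::HTTPSuccess)
    raise FetchError, "failed to fetch #{url}: HTTP #{response.code}"
  end
  response.body
end

# Call the fetcher up to `attempts` times, logging every failed attempt.
# The fetcher is injectable so the retry logic can be exercised offline.
def fetch_with_retries(url, attempts: 3, delay: 2,
                       logger: Logger.new($stderr),
                       fetcher: method(:fetch_once))
  tries = 0
  begin
    tries += 1
    fetcher.call(url)
  rescue FetchError, SocketError, Timeout::Error, SystemCallError => e
    # Give up and re-raise the last error once the attempts are used up.
    raise if tries >= attempts
    logger.warn("attempt #{tries}/#{attempts} for #{url} failed " \
                "(#{e.message}); retrying in #{delay}s")
    sleep(delay)
    retry
  end
end

# Example call (performs a real network request, so left commented out):
# body = fetch_with_retries(
#   "http://download.fedoraproject.org/pub/fedora/linux/releases/13/Fedora/x86_64/os/repodata/repomd.xml")
```

With something like this in place, each failed attempt would show up as a warning in the log, which answers the "would I see retries?" question for this sketch at least.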

Comment 4 Jan Provaznik 2011-05-04 08:30:44 UTC
The patch should be in the new rpms; the patch commit is 1cd3de1b25a43bbeeb3dc0f0cc20be8cdb1de74a

Comment 5 wes hayutin 2011-06-14 15:39:57 UTC
moving to on_qa for review

Comment 6 Aziza Karol 2011-06-21 10:30:19 UTC
This issue still exists.

Error observed in aeolus-configure logs:
Tue Jun 21 01:58:13 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): rake aborted!
Tue Jun 21 01:58:13 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): no such file to load -- util/repository_manager
Tue Jun 21 01:58:13 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice):
Tue Jun 21 01:58:13 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): (See full trace by running task with --trace)
Tue Jun 21 01:58:13 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): (in /usr/share/aeolus-conductor)
Tue Jun 21 01:58:13 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (err): change from notrun to 0 failed: /usr/bin/rake dc:prepare_repos returned 1 instead of one of [0] at /usr/share/aeolus-configure/modules/aeolus/manifests/conductor.pp:114

logs attached.

[root@nec-em19 aeolus-configure]# rpm -qa  | grep aeolus
rubygem-aeolus-cli-0.0.1-1.fc14.20110620142346git1c969a7.noarch
aeolus-conductor-doc-0.3.0-0.fc14.20110620142346git1c969a7.noarch
aeolus-conductor-daemons-0.3.0-0.fc14.20110620142346git1c969a7.noarch
aeolus-conductor-0.3.0-0.fc14.20110620142346git1c969a7.noarch
aeolus-all-0.3.0-0.fc14.20110620142346git1c969a7.noarch
aeolus-configure-2.0.1-0.fc14.20110602110128git5cb9257.noarch
[root@nec-em19 aeolus-configure]#

Comment 7 Aziza Karol 2011-06-21 10:31:17 UTC
Created attachment 505795 [details]
aeolus-configure logs

Comment 8 Jan Provaznik 2011-06-21 10:57:44 UTC
The error reported by Aziza is different from, and not related to, the original issue. This last error is filed as bug https://bugzilla.redhat.com/show_bug.cgi?id=714757

The original error can't be reproduced any more as this functionality was removed -> closing this bug.

Comment 9 wes hayutin 2011-12-08 14:16:46 UTC
perm close