Created attachment 493512 [details]
logs.tgz

Description of problem:
Over the last week I have run into numerous instances where the AMI comes online and dc_prepare_repos has failed. When this occurs and you try to build an image with selected packages, you get the error "Internal Server Error failed to read cached packages info, run 'rake dc:prepare_repos'".

In IRC it was mentioned that pulp is around the corner and that it would be a waste of time to address a problem that has been there forever. I somewhat agree; however, it concerns me that I had not seen this until the past week and now seem to hit it all the time. I brought up 6 AMI instances this morning, and 4 of them came online with a failed dc_prepare_repos. That pushed me over the edge, and I think we need to do something about this.

Version-Release number of selected component (if applicable):
aeolus-conductor-doc-0.0.3-6.el6.x86_64
aeolus-conductor-0.0.3-6.el6.x86_64
aeolus-conductor-daemons-0.0.3-6.el6.x86_64
aeolus-configure-2.0.0-8.el6.noarch

Additional info:

Instance #1 error (ec2-50-16-82-137)
===================================================
Wed Apr 20 09:24:18 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): rake aborted!
Wed Apr 20 09:24:18 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): failed to fetch http://download.fedoraproject.org/pub/fedora/linux/releases/14/Fedora/x86_64/os/repodata/db4d7a09b6cd42ac07c9655864843c0df5cf76db37806d6a009f4c6430d66264-primary.xml.gz: <binary gzip data; embedded file name: /srv/pungi/14.RC1/14/Fedora/x86_64/os/.repodata/primary.xml>
<snip>
Wed Apr 20 09:24:22 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice):
Wed Apr 20 09:24:22 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): (See full trace by running task with --trace)
Wed Apr 20 09:24:22 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): (in /usr/share/aeolus-conductor)
Wed Apr 20 09:24:22 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (err): change from notrun to 0 failed: /usr/bin/rake dc:prepare_repos returned 1 instead of one of [0] at /usr/share/aeolus-configure/modules/aeolus_recipe/manifests/conductor.pp:115

Instance #2 success (ec2-184-72-69-101)
===================================================
Wed Apr 20 09:22:05 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): (in /usr/share/aeolus-conductor)
Wed Apr 20 09:22:05 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): executed successfully

Instance #3 error (ec2-75-101-227-16)
===================================================
Wed Apr 20 09:22:37 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): rake aborted!
Wed Apr 20 09:22:37 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): failed to fetch http://download.fedoraproject.org/pub/fedora/linux/releases/13/Fedora/x86_64/os/repodata/repomd.xml: <?xml version="1.0" encoding="UTF-8"?>
Wed Apr 20 09:22:37 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): <repomd xmlns="http://linux.duke.edu/metadata/repo" xmlns:rpm="http://linux.duke.edu/metadata/rpm">
Wed Apr 20 09:22:37 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): <revision>1273711547</revision>
Wed Apr 20 09:22:37 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): <data type="other_db">
<snip>
Wed Apr 20 09:22:37 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): </repomd>
Wed Apr 20 09:22:37 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice):
Wed Apr 20 09:22:37 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): (See full trace by running task with --trace)
Wed Apr 20 09:22:37 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): (in /usr/share/aeolus-conductor)
Wed Apr 20 09:22:37 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (err): change from notrun to 0 failed: /usr/bin/rake dc:prepare_repos returned 1 instead of one of [0] at /usr/share/aeolus-configure/modules/aeolus_recipe/manifests/conductor.pp:115

Instance #4 error (ec2-50-16-169-110)
===================================================
Wed Apr 20 09:54:39 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): rake aborted!
Wed Apr 20 09:54:39 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): failed to fetch http://download.fedoraproject.org/pub/fedora/linux/releases/13/Fedora/x86_64/os/repodata/ed88d22fca1c8bcc07d85bb677d5f8f45422a373a53b6dd213d57d7dfc278878-primary.xml.gz: <binary gzip data; embedded file name: /srv/pungi/13.RC3/13/Fedora/x86_64/os/.repodata/primary.xml>
<snip>
Wed Apr 20 09:54:44 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): <binary gzip data, continued>
Wed Apr 20 09:54:44 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice):
Wed Apr 20 09:54:44 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): (See full trace by running task with --trace)
Wed Apr 20 09:54:44 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): (in /usr/share/aeolus-conductor)
Wed Apr 20 09:54:44 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (err): change from notrun to 0 failed: /usr/bin/rake dc:prepare_repos returned 1 instead of one of [0] at /usr/share/aeolus-configure/modules/aeolus_recipe/manifests/conductor.pp:115

Instance #5 error (ec2-184-73-119-66)
===================================================
Wed Apr 20 09:54:51 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): rake aborted!
Wed Apr 20 09:54:51 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): failed to fetch http://download.fedoraproject.org/pub/fedora/linux/releases/13/Fedora/x86_64/os/repodata/repomd.xml: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
Wed Apr 20 09:54:51 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): <html><head>
Wed Apr 20 09:54:51 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): <title>404 Not Found</title>
Wed Apr 20 09:54:51 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): </head><body>
Wed Apr 20 09:54:51 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): <h1>Not Found</h1>
Wed Apr 20 09:54:51 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): <p>The requested URL /releases/13/Fedora/x86_64/os/repodata/repomd.xml was not found on this server.</p>
Wed Apr 20 09:54:51 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): <hr>
Wed Apr 20 09:54:51 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): <address>Apache/2.2.3 (CentOS) Server at fedora.mirror.netriplex.com Port 80</address>
Wed Apr 20 09:54:51 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): </body></html>
Wed Apr 20 09:54:51 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice):
Wed Apr 20 09:54:51 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): (See full trace by running task with --trace)
Wed Apr 20 09:54:51 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): (in /usr/share/aeolus-conductor)
Wed Apr 20 09:54:51 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (err): change from notrun to 0 failed: /usr/bin/rake dc:prepare_repos returned 1 instead of one of [0] at /usr/share/aeolus-configure/modules/aeolus_recipe/manifests/conductor.pp:115

Instance #6 success (ec2-72-44-57-77)
===================================================
Wed Apr 20 09:54:47 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): (in /usr/share/aeolus-conductor)
Wed Apr 20 09:54:47 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): executed successfully
Failure of dc_prepare_repos almost always corresponds to an issue with the selected yum repo. I'm not sure there is a good short-term fix to make this more reliable.
Not sure if this patch helps, but it's worth trying: https://fedorahosted.org/pipermail/aeolus-devel/2011-April/001285.html
Did this make it into the latest AMI? If it did, I ran into the same issue 2 out of 3 times on the latest AMI, ami-6e807f07, and I didn't see any logging about retries (would I?). If it didn't make it into the AMI, it probably needs to be in there; for whatever reason I continue to encounter this when the AMI is deployed inside the EC2 cloud.
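For reference, here is a rough sketch of what a retry with logging around the metadata fetch could look like. This is not the patch linked above and not Conductor code: `fetch_with_retries`, `FetchError`, and the injected `fetcher` lambda are hypothetical names used only for illustration, and the URL is a placeholder. The point is that every failed attempt produces a log line, so a retry would be visible in the logs.

```ruby
require 'logger'

# Hypothetical error type raised by the injected fetcher on a bad response.
class FetchError < StandardError; end

# Hypothetical helper: calls fetcher.call(url) up to `attempts` times,
# logs every failed attempt, and re-raises the last error if all fail.
def fetch_with_retries(url, attempts: 3, delay: 0, logger: Logger.new($stderr), fetcher:)
  last_error = nil
  1.upto(attempts) do |attempt|
    begin
      return fetcher.call(url)
    rescue FetchError => e
      last_error = e
      logger.warn("fetch of #{url} failed (attempt #{attempt}/#{attempts}): #{e.message}")
      sleep(delay) if delay > 0 && attempt < attempts
    end
  end
  raise last_error
end

# Stub fetcher that fails twice and then succeeds; a real one would wrap
# an HTTP client and raise FetchError on a non-200 response.
calls = 0
flaky = lambda do |_url|
  calls += 1
  raise FetchError, "404 Not Found" if calls < 3
  "<repomd/>"
end

body = fetch_with_retries("http://example.invalid/repodata/repomd.xml",
                          fetcher: flaky, logger: Logger.new(nil))
```

Passing the fetcher in keeps the retry logic testable without network access; with the default logger, each failed attempt would show up as a WARN line on stderr.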
patch should be in new rpms, patch commit is 1cd3de1b25a43bbeeb3dc0f0cc20be8cdb1de74a
moving to on_qa for review
This issue still exists. Error observed in aeolus-configure logs:

Tue Jun 21 01:58:13 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): rake aborted!
Tue Jun 21 01:58:13 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): no such file to load -- util/repository_manager
Tue Jun 21 01:58:13 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice):
Tue Jun 21 01:58:13 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): (See full trace by running task with --trace)
Tue Jun 21 01:58:13 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (notice): (in /usr/share/aeolus-conductor)
Tue Jun 21 01:58:13 -0400 2011 /Stage[main]/Aeolus::Conductor/Exec[dc_prepare_repos]/returns (err): change from notrun to 0 failed: /usr/bin/rake dc:prepare_repos returned 1 instead of one of [0] at /usr/share/aeolus-configure/modules/aeolus/manifests/conductor.pp:114

Logs attached.

[root@nec-em19 aeolus-configure]# rpm -qa | grep aeolus
rubygem-aeolus-cli-0.0.1-1.fc14.20110620142346git1c969a7.noarch
aeolus-conductor-doc-0.3.0-0.fc14.20110620142346git1c969a7.noarch
aeolus-conductor-daemons-0.3.0-0.fc14.20110620142346git1c969a7.noarch
aeolus-conductor-0.3.0-0.fc14.20110620142346git1c969a7.noarch
aeolus-all-0.3.0-0.fc14.20110620142346git1c969a7.noarch
aeolus-configure-2.0.1-0.fc14.20110602110128git5cb9257.noarch
[root@nec-em19 aeolus-configure]#
Created attachment 505795 [details] aeolus-configure logs
The error reported by Aziza is different from, and unrelated to, the original issue. This last error is filed as bug https://bugzilla.redhat.com/show_bug.cgi?id=714757

The original error can no longer be reproduced because this functionality was removed -> closing this bug.
perm close