Bug 2152305 - The ppc64le builds frequently time out and fail.
Summary: The ppc64le builds frequently time out and fail.
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Copr
Classification: Community
Component: backend
Version: unspecified
Hardware: ppc64le
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Copr Team
Reported: 2022-12-10 13:06 UTC by Balint Cristian
Modified: 2023-01-02 15:53 UTC (History)
Last Closed: 2023-01-02 15:53:00 UTC

Description Balint Cristian 2022-12-10 13:06:39 UTC
Description
===========

First, this *only* happens on ppc64le.
It has not been observed on x86-64/aarch64 builders, even under shortage/queue conditions.

* Many (if not all) ppc64le builds systematically fail.
* Some examples (builds that were canceled):

  $ copr list-builds rezso/HDL | grep cancel | awk '{print $1}'

  5119900, 5119899, 5119882, 5119868, 5119855
  5116866, 5116863, 5116862, 5116861, 5116860
  5116859, 5116857

* Failures *always* end up with some kind of timeout:
  "Connection timed out during banner exchange"
  "Connection to XX.XX.XX.XX port 22 timed out"

  E.g. https://download.copr.fedorainfracloud.org/results/rezso/HDL/fedora-rawhide-ppc64le/05119899-verible/builder-live.log.gz
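
To gauge how many builds hit these SSH timeouts, a builder log can be scanned for the two messages quoted above. The following is only a minimal Python sketch, assuming the log has already been downloaded and decompressed to text; the name `find_ssh_timeouts` and the regular expressions are illustrative and not part of Copr.

```python
import re

# Patterns for the two SSH timeout messages quoted in this report.
# The address in the second message varies, so it is matched loosely.
TIMEOUT_PATTERNS = {
    "banner_exchange": re.compile(r"Connection timed out during banner exchange"),
    "port_22": re.compile(r"Connection to \S+ port 22 timed out"),
}


def find_ssh_timeouts(log_text):
    """Return a dict mapping each pattern name to the number of matching lines."""
    counts = {name: 0 for name in TIMEOUT_PATTERNS}
    for line in log_text.splitlines():
        for name, pattern in TIMEOUT_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample built from the messages quoted in this report.
    sample = (
        "Connection timed out during banner exchange\n"
        "Connection to XX.XX.XX.XX port 22 timed out\n"
        "Cloning into '.'...\n"
    )
    print(find_ssh_timeouts(sample))
```

A non-zero count for either key would suggest the build failed while the backend was trying to reach the builder over SSH, rather than during the build itself.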


Occurrence:
===========

* Before the latest scheduled upgrade there were only a few (waivable) failures.
* After the latest scheduled upgrade these failures occur frequently.

There may sometimes be a shortage/queue of ppc64le builders, but this doesn't explain such failures.

Comment 1 Balint Cristian 2022-12-11 10:02:12 UTC
Update
=======


 * Another strange behaviour, specific to ppc64le, which might hint at firewall or configuration issues even though "internet access" is checked:
 
  <...>
  Cloning into '.'...
  fatal: unable to access 'https://github.com/chipsalliance/Surelog.git/': Failed to connect to github.com port 443 after 1039 ms: No route to host
  <...>

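A "No route to host" error like the one above can be told apart from a git-level problem by attempting a plain TCP connection from the builder. The following is only a minimal Python sketch, assuming Python is available on the builder; the helper name `can_connect` is illustrative and not part of Copr.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Try a TCP connection; return (True, None) or (False, error message)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        # Covers "No route to host", refused connections, timeouts and DNS failures.
        return False, str(exc)


if __name__ == "__main__":
    # Host and port taken from the git error message quoted above.
    ok, error = can_connect("github.com", 443)
    print("reachable" if ok else f"unreachable: {error}")
```

If this check fails in the same way on ppc64le builders but succeeds on x86-64/aarch64 builders, that would point at the builder network setup rather than at the package being built.
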
Comment 2 Jakub Kadlčík 2022-12-13 00:29:52 UTC
Hello Balint,
thank you for the report.

I think this is related to the following issue:
https://github.com/fedora-copr/copr/issues/2433

It is my priority for this week.

Comment 3 Jakub Kadlčík 2023-01-02 15:53:00 UTC
Sorry it took so long; I had a hard time debugging this and needed some help from @praiskup.
It should be fixed now; more information is in the GitHub issue above.

Please let us know if you see this happening again.

