Bug 1272535 - Server and bridge regularly crash when releng is signing rpms in batched mode
Summary: Server and bridge regularly crash when releng is signing rpms in batched mode
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: sigul
Version: rawhide
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Patrick Uiterwijk
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
: 801168 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2015-10-16 16:24 UTC by Amanda Carter
Modified: 2016-08-15 16:42 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-08-15 16:42:50 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
backtrace using pyrasite and gdb of sigul-bridge (15.39 KB, text/plain)
2015-11-16 12:14 UTC, Nikos Mavrogiannopoulos
no flags Details
pyrasite.patch (525 bytes, patch)
2015-11-24 11:48 UTC, Miloslav Trmač
no flags Details | Diff

Description Amanda Carter 2015-10-16 16:24:26 UTC
Description of problem:

[From dgilmore; reply from mitr]

> to the point that when I sign rpms I run the
> following in a different screen window to keep things moving along
>
> while true; do sleep 45; lsof |grep python|grep TCP|grep FIN_WAIT2|sed -e
> 's|
> python||g' -e 's|ausil.*||g'|xargs kill; done
>
> which is quite ugly,  the bridge locks up and needs to be restarted
> somewhat
> frequently and the backend server occasionally locks up also.
>

Yeah ☹

I haven’t been able to reproduce these crashes locally, and IIRC I have
more or less proven that there isn’t a bug in the sigul code; which makes
it ar more likely that I have been wrong in the analysis than that the code
is bug-free, but either way, I haven’t been able to find the bug and I
haven’t had other ideas how to proceed looking for it.

So the best alternative I could come up with has been to “just” rewrite the
whole thing to avoid the nested TLS usage (probably easier than remotely
debugging the whole underlying library stack without a reproducer), but
that’s a big enough project that I can’t do that in my non-existent spare
time.

Ideally, someone else could take a look and find the bug obvious and
trivial to fix. Without that, we are looking at rewriting at least the
sigul transport layer (src/double_tls.py, ~20% of the codebase). The RPC
client/validation/server/integration with koji/gpg/rpm might be salvageable
or at least could be useful as an a starting point.

Comment 1 Dennis Gilmore 2015-10-16 17:31:06 UTC
The while loop is to deal with a case where the client is waiting on a final ack that the bridge thinks it has been sent. it causes the client to deadlock waiting on something that will never come.

Comment 2 Amanda Carter 2015-11-04 21:02:11 UTC
*** Bug 801168 has been marked as a duplicate of this bug. ***

Comment 3 Amanda Carter 2015-11-04 21:03:39 UTC
See some details from previous reports of similar problems in BZ#801168 marked as a dupe of this one.

Comment 4 Kevin Fenzi 2015-11-09 20:46:04 UTC
ok. I sat down to gather more info today. 

The releng scripting basically boils down to calling: 

sigul --batch sign-rpm --store-in-koji --koji-only --v3-signature <keyname> rpm

(when rpm is 1 rpm), and 'sign-rpms' if it's more than one. 

So, for testing I did the following: 

* Gathered a list of all the rpms produced by a fedora-23 texlive build: 
https://koji.fedoraproject.org/koji/buildinfo?buildID=686733 (5159 rpms)

* ran: 
sigul sign-rpms --output /home/fedora/kevin/foobar --v3-signature fedora-23 `cat texlive-rpms-list`

* Entered my passphrase for the f23 key. 

* Waited until it hung. 

I can attach the entire server and bridge logs if you like. 
The last few lines of each are: 

bridge: 

2015-11-09 20:34:45,266 DEBUG: sign-rpms:send replies: Started handling 'id' = '\x00\x00\x02\x16', 'status' = '\x00\x00\x00\x00'
2015-11-09 20:34:45,266 DEBUG: sign-rpms:read requests: Started handling 'id' = '\x00\x00\x03\xcd', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-cmsd-doc', 'rpm-release' = '13.fc23', 'rpm-sigmd5' = '\xad/I\xb3K\xd7\xf4\x8brcN\xc8!\x88;\xac', 'rpm-version' = 'svn18787.0'
2015-11-09 20:34:45,266 DEBUG: Subreply: 'id' = '\x00\x00\x02A', 'status' = '\x00\x00\x00\x00'
2015-11-09 20:34:45,268 INFO: Subrequest: 'id' = '\x00\x00\x03\xcd', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-cmsd-doc', 'rpm-release' = '13.fc23', 'rpm-sigmd5' = '\xad/I\xb3K\xd7\xf4\x8brcN\xc8!\x88;\xac', 'rpm-version' = 'svn18787.0'
2015-11-09 20:34:45,452 DEBUG: sign-rpms:koji requests: Started handling reply 'id' = '\x00\x00\x02\x17', 'status' = '\x00\x00\x00\x00'
2015-11-09 20:34:45,453 DEBUG: sign-rpms:koji requests: Started handling request 'id' = '\x00\x00\x03\x93', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-circuitikz', 'rpm-release' = '13.fc23', 'rpm-sigmd5' = '\xf6lu$"Q\'t\x18ak6W\x8f\x80\xcc', 'rpm-version' = 'svn28665.0.3.0'
2015-11-09 20:34:45,453 DEBUG: sign-rpms:read requests: Started handling 'id' = '\x00\x00\x03\xce', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-cmtiup', 'rpm-release' = '13.fc23', 'rpm-sigmd5' = '\xa8\xffS\x91\x95\xa6\xcf\xbe;\x9e\xa5;>\x95\xea\xd3', 'rpm-version' = 'svn20512.1.3a'
2015-11-09 20:34:45,453 INFO: Subrequest: 'id' = '\x00\x00\x03\xce', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-cmtiup', 'rpm-release' = '13.fc23', 'rpm-sigmd5' = '\xa8\xffS\x91\x95\xa6\xcf\xbe;\x9e\xa5;>\x95\xea\xd3', 'rpm-version' = 'svn20512.1.3a'
2015-11-09 20:34:45,454 DEBUG: sign-rpms:send replies: Started handling 'id' = '\x00\x00\x02\x17', 'status' = '\x00\x00\x00\x00'
2015-11-09 20:34:45,455 DEBUG: sign-rpms:read replies: Started handling 'id' = '\x00\x00\x02B', 'status' = '\x00\x00\x00\x00'
2015-11-09 20:34:45,455 DEBUG: Subreply: 'id' = '\x00\x00\x02B', 'status' = '\x00\x00\x00\x00'

vault: 

2015-11-09 20:27:35,217 INFO: Subrequest: 'id' = '\x00\x00\x03\x0e', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-canoniclayout', 'rpm-release' = '13.fc23', 'rpm-sigmd5' = 'D53\xde9oW\xa5\xbaFF\xa3\xf4\x93\xe1\xd3', 'rpm-version' = 'svn24523.0.4'
2015-11-09 20:27:35,595 INFO: Signed RPM ('texlive-boisik', 4L, 'svn15878.0.5', '13.fc23', 'noarch', '364acb3259369b950069591310cf76aded1b5ae6130ce9571e2d312cfb1a0c77c8c7ac1ad862f01576340dc82e369d4c8d7ecef47c30a6c6ece2ed4e9f8958ef') with key fedora-23
2015-11-09 20:27:35,595 DEBUG: sign-rpms:signing: Started handling ('texlive-boisik-doc', 4L, 'svn15878.0.5', '13.fc23', 'noarch', '15db7859449d3e8ac6e54f109b71bb8c87421978880c11a050bfc6e910287bdd4817aab9d4bc30012e2c9623e5abe3b232dd88c4a9c09dc0ce555004699ec665')
2015-11-09 20:27:35,597 DEBUG: sign-rpms:requests: Started handling 'id' = '\x00\x00\x03\x0f', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-canoniclayout-doc', 'rpm-release' = '13.fc23', 'rpm-sigmd5' = '\xfa\xf0\xc9\xda\xd6?xf\x82\x04\xf7z\xb7\xe9D\xcb', 'rpm-version' = 'svn24523.0.4'
2015-11-09 20:27:35,597 INFO: Subrequest: 'id' = '\x00\x00\x03\x0f', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-canoniclayout-doc', 'rpm-release' = '13.fc23', 'rpm-sigmd5' = '\xfa\xf0\xc9\xda\xd6?xf\x82\x04\xf7z\xb7\xe9D\xcb', 'rpm-version' = 'svn24523.0.4'
2015-11-09 20:27:35,984 INFO: Signed RPM ('texlive-boisik-doc', 4L, 'svn15878.0.5', '13.fc23', 'noarch', '15db7859449d3e8ac6e54f109b71bb8c87421978880c11a050bfc6e910287bdd4817aab9d4bc30012e2c9623e5abe3b232dd88c4a9c09dc0ce555004699ec665') with key fedora-23
2015-11-09 20:27:35,984 DEBUG: sign-rpms:signing: Started handling ('texlive-boites', 4L, 'svn32235.1.1', '13.fc23', 'noarch', '99e46617a9ad6c5bf0017a8fb4a61e0341a9224fc1840aacba758e024ae7c80e0f53e34bcc130cabf1ad375369902e3f89d6a8414b8ff4f33ff678902d69f7fb')
2015-11-09 20:27:35,985 DEBUG: sign-rpms:requests: Started handling 'id' = '\x00\x00\x03\x10', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-cantarell', 'rpm-release' = '13.fc23', 'rpm-sigmd5' = 'M\xcb\x10\x19\xd1l\xa1J\xb07\x13\xa4a\xad\xd0&', 'rpm-version' = 'svn27066.2.4'
2015-11-09 20:27:35,986 INFO: Subrequest: 'id' = '\x00\x00\x03\x10', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-cantarell', 'rpm-release' = '13.fc23', 'rpm-sigmd5' = 'M\xcb\x10\x19\xd1l\xa1J\xb07\x13\xa4a\xad\xd0&', 'rpm-version' = 'svn27066.2.4'
2015-11-09 20:27:36,360 INFO: Signed RPM ('texlive-boites', 4L, 'svn32235.1.1', '13.fc23', 'noarch', '99e46617a9ad6c5bf0017a8fb4a61e0341a9224fc1840aacba758e024ae7c80e0f53e34bcc130cabf1ad375369902e3f89d6a8414b8ff4f33ff678902d69f7fb') with key fedora-23

Let me know if you want the full logs or other debugging info...

Comment 5 Kevin Fenzi 2015-11-09 23:03:18 UTC
Using pyrasite on the vault I see its threads are: 

Thread 0x7f6fdaa8a700
  File "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
    self.__bootstrap_inner()

  File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
    self.run()

  File "<string>", line 167, in run

  File "/usr/lib64/python2.7/code.py", line 243, in interact
    more = self.push(line)

  File "/usr/lib64/python2.7/code.py", line 265, in push
    more = self.runsource(source, self.filename)

  File "/usr/lib64/python2.7/code.py", line 87, in runsource
    self.runcode(code)

  File "/usr/lib64/python2.7/code.py", line 103, in runcode
    exec code in self.locals

  File "<console>", line 3, in <module>
Thread 0x7f6feb00d740
  File "/usr/share/sigul/server.py", line 1552, in <module>
    main()

  File "/usr/share/sigul/server.py", line 1532, in main
    (_, status) = os.waitpid(child_pid, 0)

Not sure that helps, but thought I would add it.

Comment 6 Nikos Mavrogiannopoulos 2015-11-10 14:23:03 UTC
I was able to reproduce the issue even without --store-in-koji. Just with the same amount of packages. Hopefully we get something useful out of it soon.

Comment 7 Miloslav Trmač 2015-11-10 21:29:53 UTC
(In reply to Kevin Fenzi from comment #5)
> Using pyrasite on the vault I see its threads are:

FWIW on the vault, almost nothing interesting is happening in the top-level process; every request is handled by a new child, and that child has another helper process (and then there is RPM and GPG).  Backtraces from all relevant processes (all sigul Python processes, for starters) might be more useful.

(Trying to reproduce the reproduction by Nikos…)

Comment 8 Nikos Mavrogiannopoulos 2015-11-11 14:25:38 UTC
(In reply to Kevin Fenzi from comment #4)
> ok. I sat down to gather more info today. 
> 
> The releng scripting basically boils down to calling: 
> 
> sigul --batch sign-rpm --store-in-koji --koji-only --v3-signature <keyname>
> rpm
> 
> (when rpm is 1 rpm), and 'sign-rpms' if it's more than one. 
> 
> So, for testing I did the following: 
> 
> * Gathered a list of all the rpms produced by a fedora-23 texlive build: 
> https://koji.fedoraproject.org/koji/buildinfo?buildID=686733 (5159 rpms)
> 
> * ran: 
> sigul sign-rpms --output /home/fedora/kevin/foobar --v3-signature fedora-23
> `cat texlive-rpms-list`
> 
> * Entered my passphrase for the f23 key. 
> 
> * Waited until it hung. 

Hi Kevin, could you clarify what do you mean by it hung? Which process hung? Are the server and bridge unable to serve any other connections? What is the output of the client in the "hung" state when run with the -v -v options?

Comment 9 Kevin Fenzi 2015-11-11 16:43:33 UTC
(In reply to Nikos Mavrogiannopoulos from comment #8)

> Hi Kevin, could you clarify what do you mean by it hung? Which process hung?
> Are the server and bridge unable to serve any other connections? What is the
> output of the client in the "hung" state when run with the -v -v options?

Everything stopped processing. No more logs on the vault/server, bridge or client. 

New client attempts just hang and don't get any answer from the bridge. 

In order to bring things back up, the server and bridge must both be restarted.

Comment 10 Nikos Mavrogiannopoulos 2015-11-12 17:01:38 UTC
The situation when it blocks is the following:

Connections:

client <--established--> bridge <--established--> server

client has two processes. The parent has a socket with the child in FIN_WAIT2 mode, that is waiting a termination from the child. The child has its socket in CLOSE_WAIT mode, that is, it has not yet called close in that socket.

The client's child which is blocked indefinitely on nspr_poll() - that explains why it hasn't called close in the previous socket. As he has only one more socket with the bridge, it is most likely waiting input from the bridge.

bridge has two connections hanging indefinitely in CLOSE_WAIT mode. They are connections to koji and kojipkg. For some reason I cannot perform lsof/gdb/ or strace in that system. I'll have to retry with the bridge in F23.

Comment 11 Nikos Mavrogiannopoulos 2015-11-16 12:14:03 UTC
Created attachment 1094863 [details]
backtrace using pyrasite and gdb of sigul-bridge

Attaching the backtrace of the sigul bridge.

Comment 12 Miloslav Trmač 2015-11-18 20:54:25 UTC
OK, thanks to Nikos arranging for a good testing environment, sigul-0.101 is available at http://people.redhat.com/mitr/rpmsigner/rhel6/ . 

tl:dr; Please give this a try; it should fix sign-rpms and do nothing for sign-rpm.


Detailed ChangeLog at https://git.fedorahosted.org/cgit/sigul.git/tree/NEWS?h=sigul-0.101 .

Notes:

* This fixes a hang in (sigul sign-rpms) which we we able to reproduce. (As a side-effect, the bridge may now be performing two Koji requests concurrently, instead of only one at a time.)

* In case of some exceptions, the processes could hang without printing anything. Now the exception is properly logged, but the hang has not been fixed yet (it is, AFAIK, theoretical [reproduced for me by a permission issue you are certainly not running into], and we wanted to have something for you to test ASAP), tracked as #1283364 .  Note that if this scenario happens, the logged exception may not be at the bottom of the log! Search for “Unexpected error” or “Terminated by an exception”.)

* Both of these affect _only_ (sigul sign-rpms) as that has been the provided reproducer.  (I’ll now try running (sigul ign-rpm) on the same set of RPMs overnight, we’ll see how that fares.)

Comment 13 Kevin Fenzi 2015-11-23 18:15:58 UTC
So, I updated to 0.101 on the bridge, vault and bodhi-backend01 client. 

I was able to do a full run of texlive signing (5200+ packages) in
about 33 minutes with no lockup. ;) 

I then did a run of 471 packages via the releng scripts that use koji,
etc. That ran fine for a while, but then seems to have gotten stuck. 

On the bridge: 

2015-11-19 21:51:23,808 DEBUG: sign-rpms:send requests: Started handling 'id' = '\x00\x00\x01\xd5', 'rpm-arch' = 'i686', 'rpm-epoch' = '', 'rpm-name' = 'gchemtable', 'rpm-release' = '10.fc23', 'rpm-sigmd5' = 'h\xc2[SV)\x17\x91\x1b&;\xa65\x89\xc2\xea', 'rpm-version' = '0.14.10'
2015-11-19 21:51:23,872 DEBUG: sign-rpms:send requests: Started handling 'id' = '\x00\x00\x01\xd6', 'rpm-arch' = 'x86_64', 'rpm-epoch' = '2', 'rpm-name' = 'libpng-debuginfo', 'rpm-release' = '1.fc23', 'rpm-sigmd5' = '0\xac\x1aki\x1c\x08\x96\x1d\xf6%\xee\x16\xdd\x17\xc9', 'rpm-version' = '1.6.19'
2015-11-19 21:51:23,938 DEBUG: sign-rpms:send requests finished, exc_info: None
2015-11-19 21:51:23,938 DEBUG: Sending final EOFs to sign-rpms:read replies...
2015-11-19 21:51:23,939 DEBUG: Waiting for sign-rpms:read replies...
2015-11-19 21:59:20,062 ERROR: I/O error: NSPR connection reset

On the vault: 

015-11-19 21:43:06,247 INFO: Subrequest: 'id' = '\x00\x00\x01\xcf', 'rpm-arch' = 'armv7hl', 'rpm-epoch' = '', 'rpm-name' = 'geany-plugins-geanygendoc', 'rpm-release' = '1.fc23', 'rpm-sigmd5' = '\x8bE=\x99-`\x91\xb5\x90\x94\xf6\xa8\xc2\x90\xbfQ', 'rpm-version' = '1.26'
2015-11-19 21:43:06,279 DEBUG: sign-rpms:requests: Started handling 'id' = '\x00\x00\x01\xd0', 'rpm-arch' = 'x86_64', 'rpm-epoch' = '', 'rpm-name' = 'pypy-devel', 'rpm-release' = '1.fc23', 'rpm-sigmd5' = '\x9c+\xdd\xe2\xf1\xd7\n>\xac\x1a\x99\x13tE\xac\x11', 'rpm-version' = '4.0.0'
2015-11-19 21:43:06,279 INFO: Subrequest: 'id' = '\x00\x00\x01\xd0',
'rpm-arch' = 'x86_64', 'rpm-epoch' = '', 'rpm-name' = 'pypy-devel',
'rpm-release' = '1.fc23', 'rpm-sigmd5' =
'\x9c+\xdd\xe2\xf1\xd7\n>\xac\x1a\x99\x13tE\xac\x11', 'rpm-version' =  
'4.0.0'

I then killed the bridge and restarted it and on the vault got: 

2015-11-19 21:57:48,171 DEBUG: (child) _CombiningBuffer: data dropped
2015-11-19 21:57:51,387 DEBUG: sign-rpms:replies: Started handling
('gcrystal', None, '0.14.10', '10.fc23', 'armv7hl',
'98ecdf8fcd018a5367c44a86c48e1cbd56dde6616e16a2dbace8773e38971a5d49474f785802865a51401cee58577916e6a2657df5610b3871b6c2b666eac771')

This is using a different path/command line here... 

sigul --batch sign-rpms --store-in-koji --koji-only --v3-signature fedora-23 compiz-0.8.9-2.fc23.x86_64.rpm nordugrid-arc-cache-service-5.0.4-1.fc23.armv7hl.rpm nordugrid-arc-5.0.4-1.fc
23.x86_64.rpm babl-devel-0.1.14-1.fc23.armv7hl.rpm compiz-devel-0.8.9-2.fc23.x86_64.rpm lxdm-debuginfo-0.5.2-2.D
20151121gitd5b7ae6b.fc23.armv7hl.rpm perl-thrift-0.9.1-17.fc23.noarch.rpm...413more

Can you test the koji path's there? Or would that be something you would need a staging instance in our infrastructure for?

I just tried to duplicate the issue again today and couldn't. I will see if I can get a reproducer or at least see it again. Thanks.

Comment 14 Miloslav Trmač 2015-11-24 11:48:36 UTC
Created attachment 1098145 [details]
pyrasite.patch

Just to make sure, reiterating:
> Note that if this scenario happens, the logged exception may not be at the
> bottom of the log! Search for “Unexpected error” or “Terminated by an
> exception”.)

We can test a from-Koji path, but not --store-in-koji, so also not --koji-only; the RPMs are being sent to clients. Also, for completeness, we must have
> koji_do_proxy_auth = False
in sigul/settings.py; not appropriate for your use.


As for more information, thanks for pointing out the existence of pyrasite; that seems to be the best tool available at least so far. 

What I found useful:
The “Dumping thread stacks” example from http://pyrasite.readthedocs.org/en/latest/Payloads.html is a good starting point, if possible when run on _all_ processes on all three hosts.  

Those backtraces do not capture the exceptions from already terminated threads, though; so at that point interactive examination may be needed, e.g. once I ended up running (pyrasite shell $pid) and things like
> sys._current_frames().items()[2][1].f_back.f_back.f_back.f_locals
> sys._current_frames().items()[2][1].f_back.f_back.f_back.f_locals['koji_queue']._qsize()
> sys._current_frames().items()[2][1].f_back.f_back.f_locals['threads'][2].exc_info
> traceback.print_exception(*sys._current_frames().items()[2][1].f_back.f_back.f_locals['threads'][2].exc_info)

Also, running pyrasite as root on a sigul-owned processes did not work because the sigul-owned processes could not connect back to the root-owned socket; the attached pyrasite patch (very insecure on a multi-user machine!) helped.

Comment 15 Miloslav Trmač 2015-11-24 11:53:42 UTC
(In reply to Miloslav Trmač from comment #14)
> …and things like
> …
(To clarify, this is not a request for capturing the output of precisely those commands; that would be backtrace-specific.)

Comment 16 Kevin Fenzi 2015-11-25 18:25:15 UTC
I got a lockup today and logs on the bridge had: 

2015-11-25 18:01:11,312 DEBUG: sign-rpms:send replies: Started handling 'id' = '\x00\x00\x00\xf7', 'status' = '\x00\x00\x00\x00'
2015-11-25 18:01:11,313 ERROR: NSPR error
Traceback (most recent call last):
  File "/usr/share/sigul/utils.py", line 559, in run
    self._real_run()
  File "/usr/share/sigul/bridge.py", line 878, in _real_run
    self.__handle_one_rpm(rpm)
  File "/usr/share/sigul/bridge.py", line 884, in __handle_one_rpm
    self.__conn.client_buf.write(rpm.reply_header_data)
  File "/usr/share/sigul/double_tls.py", line 880, in write
    self.__socket.send(utils.u32_pack(len(data)))
NSPRError: (PR_HOST_UNREACHABLE_ERROR) Host is unreachable.

but then it kept going for a while. Still don't have a solid reproducer. ;(

Comment 18 Miloslav Trmač 2015-11-26 13:29:22 UTC
(In reply to Kevin Fenzi from comment #16)
> I got a lockup today and logs on the bridge had: 
> 
> 2015-11-25 18:01:11,312 DEBUG: sign-rpms:send replies: Started handling 'id'
> = '\x00\x00\x00\xf7', 'status' = '\x00\x00\x00\x00'
> 2015-11-25 18:01:11,313 ERROR: NSPR error
> Traceback (most recent call last):
>   File "/usr/share/sigul/utils.py", line 559, in run
>     self._real_run()
>   File "/usr/share/sigul/bridge.py", line 878, in _real_run
>     self.__handle_one_rpm(rpm)
>   File "/usr/share/sigul/bridge.py", line 884, in __handle_one_rpm
>     self.__conn.client_buf.write(rpm.reply_header_data)
>   File "/usr/share/sigul/double_tls.py", line 880, in write
>     self.__socket.send(utils.u32_pack(len(data)))
> NSPRError: (PR_HOST_UNREACHABLE_ERROR) Host is unreachable.

Thanks, this is sufficient to show that #1283364 will need to be fixed. And that I was wrong about ‘Search for “Unexpected error” or “Terminated by an exception”.)', the error messages can have other forms, like the above.

> but then it kept going for a while.

That is expected; in this case things hang only after the vault can’t sign any more because too many replies are in flight, which eventually prevents the bridge from queuing more incoming requests.

> Still don't have a solid reproducer. ;(

FWIW, the bridge got “host unreachable” (or something NSPR mapped to “host unreachable”) while trying to send data to the client. Could this be related to a signing client shutting down/suspending a laptop, a VPN dropping, or something like that?  (Not that that would fix sigul, of course.)

Comment 19 Kevin Fenzi 2015-11-26 18:41:51 UTC
(In reply to Miloslav Trmač from comment #18)

> FWIW, the bridge got “host unreachable” (or something NSPR mapped to “host
> unreachable”) while trying to send data to the client. Could this be related
> to a signing client shutting down/suspending a laptop, a VPN dropping, or
> something like that?  (Not that that would fix sigul, of course.)

Shouldn't be. The bridge is a vm on a cisco UCS blade center, and the client host is another vm on the same blade center (although a different blade). There's no vpns or suspends going on.

Comment 20 Miloslav Trmač 2015-11-26 21:35:27 UTC
sigul-0.102 is available at http://people.redhat.com/mitr/rpmsigner/rhel6/ , with the same deployment mechanism and/or issues as the last time.

Again, this only affects sign-rpms and does nothing for sign-rpm.

Detailed ChangeLog at https://git.fedorahosted.org/cgit/sigul.git/tree/NEWS?h=sigul-0.102 .

With this, exceptions in any of the worker threads should cause the whole processing pipeline to eventually terminate.

Note that it takes the Linux kernel, per tcp(7), “13 to 30 minutes” for it to detect that a host is completely dead and not responding to incoming data at all, and only then is user-space notified. The RPM items will in the mean time queue in the various socket buffers, and eventually in inter-thread queues; so depending on the processing speed, between the time a host goes away and sigul terminates, the logs may show activity or they may show no progress.  The Recv-Q and Send-Q columns in (ss -nat) may suggest such queueing, or high values in retry counts within (ss -nato).


(FWIW a possible way to reproduce something similar, if not exactly equivalent, in the harness by Nikos is to (docker stop sigul-client); that will result in “connection reset” instead of “host unreachable”.)

For example, for me, with the bridge downloading RPMs from koji but sending them back to the client, (docker stop sigul-client) causes:
A The bridge→client socket buffers fill up, within a minute
B Bridge processing of replies continues until ~200 RPMs are queued
C Then vault→bridge socket buffers fill up
D Then the two vault queues fill up (until ~200 RPMs queued, again)
(Vault stops logging activity at this point)
E Then the bridge→vault socket buffers fill up
F Request processing in the bridge continues until ~200 RPMs are queued
(Bridge stops logging activity at this point)
G The client→bridge socket buffers, usually full for me most of the time because the client processing is faster than the bridge, would eventually empty and the bridge would block until more data from the (actually missing) client arrives.

I have seen this process terminate at D and F so far; i.e. there has always been visible activity logged on the bridge at least up to the point of an ECONNRESET and the bridge and server terminating the processing of this request; but that may not be the case for you.  Still, the process with data queued in its send buffer should terminate within 30 minutes or so if the peer completely goes away.

[This is different from a peer hanging; a peer hanging but continuing to ACK that it can not receive more data will keep the connection open indefinitely.  But that is clearly not the case for your PR_HOST_UNREACHABLE_ERROR.]


FWIW all of this will “only” convert server hangs into signing failures; it will still be necessary to either resolve the cause of the “host unreachable” errors (which are unlikely to be within sigul) or to have clients retry after that network error goes away.

Comment 21 Kevin Fenzi 2015-11-27 16:41:33 UTC
Thanks for all the info. 

I have updated to 0.102 here. Will let you know any results.

Comment 22 Till Maas 2015-11-28 08:55:16 UTC
With both client and brige updated to 0.102 I get the following error on the bridge trying so sign Rawhide:

2015-11-28 08:52:27,341 ERROR: sign-rpms:send replies: I/O error: NSPR connection reset
2015-11-28 08:52:41,424 DEBUG: sign-rpms:koji replies: Sending queue EOF failed, queue already orphaned
2015-11-28 08:52:41,424 INFO: sign-rpms:koji replies: Writing to work queue failed, queue orphaned
2015-11-28 08:52:42,362 DEBUG: sign-rpms:read replies: Exception, shutting down write side as well
2015-11-28 08:52:42,362 DEBUG: sign-rpms:read replies: Sending queue EOF failed, queue already orphaned
2015-11-28 08:52:42,362 INFO: sign-rpms:read replies: Writing to work queue failed, queue orphaned
2015-11-28 08:52:42,363 ERROR: sign-rpms:send requests: I/O error: NSPR connection reset
2015-11-28 08:52:56,447 DEBUG: sign-rpms:koji requests: Sending queue EOF failed, queue already orphaned
2015-11-28 08:52:56,447 INFO: sign-rpms:koji requests: Writing to work queue failed, queue orphaned

Comment 23 Till Maas 2015-11-28 08:59:08 UTC
it seems like sigul was not done writing errors when I posted comment:22 - bug 1283364 contains the full error.

Comment 24 Kevin Fenzi 2016-01-18 17:46:52 UTC
So, the last week I was pushing updates the batches all worked fine for me aside from one case where the client didn't see that the server was all done. 

Till: Can you test batches with f24 again and see what issues (if any) remain?

(Since it looks like 1283364 was fixed by iptables rule adjustments?)

Comment 25 Dennis Gilmore 2016-01-20 17:29:43 UTC
I got a lockup on the client today

client side shows

14961 pts/8    S+     0:04  |   \_ /usr/bin/python ./sigulsign_unsigned.py fedora-23 -v --write-all --sigul-batch-size=50 libtool-2.4.6-8.fc23 llvm-3.7.0-3.fc23 elfutils-0.165-2.fc23 gnome-calendar-3.18.2.1-2.fc23 nfs-utils-1.3.3-6.rc3.fc23 kernel-4.3.3-301.fc23 python-pandas-0.17.1-1.fc23 ardour4-4.6.0-1.fc23 nodejs-mysql-2.6.2-3.fc23 perl-WWW-Twilio-TwiML-1.05-1.fc23 libdkimpp-1.0.8-1.fc23 python-flower-0.8.3-5.fc23 qmmp-0.9.6-1.fc23 pcre2-10.21-1.fc23 php-twig-1.23.3-1.fc23 docker-compose-1.5.2-4.fc23 openscap-daemon-0.1.1-2.fc23 perl-Net-GitHub-0.82-1.fc23 python-boto3-1.2.3-1.fc23 perl-Getopt-Lucid-1.06-1.fc23 perl-Date-Holidays-DE-1.7-1.fc23 gofer-2.7.1-1.fc23 mk-files-20151111-1.fc23 fontopia-1.1-2.fc23 vdr-2.2.0-8.fc23 python-ivi-0.14.9-3.fc23 dcap-2.47.10-1.fc23 wordwarvi-1.1-1.git6beed31.fc23 nordugrid-arc-5.0.5-1.fc23 nordugrid-arc-doc-2.0.6-1.fc23 coreboot-utils-4.2-1.fc23 ocserv-0.10.11-1.fc23 perl-Razor-Agent-2.85-21.fc23 python-voluptuous-0.8.8-1.fc23 python-photutils-0.2-1.fc23 weechat-1.4-1.fc23 batctl-2015.2-1.fc23 fastd-17-4.fc23 libuecc-6-1.fc23 openttd-1.5.3-1.fc23 gphotoframe-2.0.2-2.hg2084299dffb6.fc23 yash-2.40-1.fc23 prosody-0.9.9-2.fc23 php-di-symfony2-bridge-1.1.0-1.fc23 php-di-phpdoc-reader-2.0.1-1.fc23 php-di-invoker-1.2.0-1.fc23 php-mnapoli-phpunit-easymock-0.2.1-1.fc23 php-container-interop-1.1.0-1.fc23 php-di-5.2.0-1.fc23 checkstyle-6.13-1.fc23 rubygem-pkg-config-1.1.7-1.fc23 dislocker-0.5.1-2.fc23 libsigrok-0.3.0-4.fc23 hash-slinger-2.7-1.fc23 docker-client-3.1.9-1.fc23 freeciv-2.5.2-1.fc23 metis-5.1.0-10.fc23 voms-2.0.12-7.fc23 php-pear-PHP-CodeSniffer-2.5.1-1.fc23 golang-github-DataDog-datadog-go-0-0.1.gitb050cd8.fc23 mediainfo-0.7.81-1.fc23 libmediainfo-0.7.81-1.fc23 perl-System-Command-1.117-1.fc23 python-ldap3-0.9.8.6-3.fc23 ibus-anthy-1.5.8-1.fc23 elementary-1.16.1-1.fc23 perl-Log-Report-Optional-1.02-1.fc23 yoshimi-1.3.8.1-1.fc23 scap-security-guide-0.1.28-1.fc23 eclipse-pydev-4.5.1-1.fc23 gsi-openssh-7.1p2-1.fc23 evas-generic-loaders-1.16.0-2.fc23 texlive-2014-18.20140525_r34255.fc23 python-gammu-2.5-1.fc23 nsd-4.1.7-4.fc23 efl-1.16.1-2.fc23 mg-20160118-1.fc23 pidgin-window-merge-0.3-5.fc23 sssd-1.13.3-2.fc23 docker-registry-0.9.1-4.fc23 scalapack-2.0.2-12.fc23 cutter-1.2.5-1.fc23 qucs-0.0.18-7.fc23 python-bcrypt-2.0.0-3.fc23 pcsc-lite-1.8.15-1.fc23 vagrant-libvirt-0.0.32-1.fc23 perl-Number-Fraction-2.00-1.fc23 mkvtoolnix-8.8.0-1.fc23 colorhug-client-0.2.8-1.fc23 python-monotonic-0.5-1.fc23 hwdata-0.285-2.fc23 gnome-2048-3.18.2-2.fc23 rubygem-test-unit-rr-1.0.5-1.fc23 rubygem-test-unit-3.1.7-100.fc23 mozilla-noscript-2.9.0.2-1.fc23 php-pecl-propro-1.0.2-1.fc23 php-pecl-raphf-1.1.2-1.fc23 gnome-abrt-1.2.1-1.fc23 libinput-1.1.4-3.fc23 lxqt-notificationd-0.10.0-3.fc23 liblxqt-0.10.0-7.fc23 lxqt-panel-0.10.0-4.fc23 lxqt-config-0.10.0-3.fc23 lxqt-common-0.10.0-7.fc23 lxqt-powermanagement-0.10.0-3.fc23 lxqt-policykit-0.10.0-3.fc23 lxqt-globalkeys-0.10.0-4.fc23 lxqt-session-0.10.0-3.fc23 lxqt-about-0.10.0-4.fc23 libqtxdg-1.3.0-2.fc23 lxqt-sudo-0.10.0-4.fc23 pcmanfm-qt-0.10.0-2.fc23
19440 pts/8    Sl+    0:00  |       \_ /usr/bin/python /usr/share/sigul/client.py --batch sign-rpms --store-in-koji --koji-only --v3-signature fedora-23 eclipse-pydev-debuginfo-4.5.1-1.fc23.x86_64.rpm texlive-tabvar-svn28908.1.7-18.fc23.noarch.rpm texlive-supertabular-svn15878.4.1a-18.fc23.noarch.rpm texlive-dtk-doc-svn27443.1.29-18.fc23.noarch.rpm texlive-listings-svn33095.1.5c-18.fc23.noarch.rpm texlive-bidi-doc-svn32896.14.6-18.fc23.noarch.rpm texlive-figsize-svn18784.0.1-18.fc23.noarch.rpm texlive-newfile-doc-svn15878.1.0c-18.fc23.noarch.rpm efl-debuginfo-1.16.1-2.fc23.armv7hl.rpm texlive-babel-albanian-svn30254.1.0c-18.fc23.noarch.rpm texlive-threeparttable-doc-svn17383.0-18.fc23.noarch.rpm texlive-pxbase-svn28516.0.5-18.fc23.noarch.rpm qucs-0.0.18-7.fc23.x86_64.rpm texlive-latexdiff-doc-svn30961.1.0.3-18.fc23.noarch.rpm texlive-digiconfigs-svn15878.0.5-18.fc23.noarch.rpm texlive-vwcol-doc-svn15878.0.1-18.fc23.noarch.rpm texlive-ogham-svn24876.0-18.fc23.noarch.rpm texlive-sourcecodepro-doc-svn32006.2.2-18.fc23.noarch.rpm texlive-jfontmaps-bin-svn29848.0-18.20140525_r34255.fc23.noarch.rpm texlive-ifplatform-svn21156.0.3a-18.fc23.noarch.rpm texlive-etoolbox-doc-svn20922.2.1-18.fc23.noarch.rpm texlive-dialogl-svn28946.0-18.fc23.noarch.rpm texlive-slantsc-svn25007.2.11-18.fc23.noarch.rpm texlive-covington-doc-svn17750.0-18.fc23.noarch.rpm texlive-hepparticles-svn15878.0-18.fc23.noarch.rpm texlive-pst-2dplot-svn15878.1.5-18.fc23.noarch.rpm texlive-biblatex-dw-svn31752.1.6a-18.fc23.noarch.rpm texlive-lgreek-svn21818.0-18.fc23.noarch.rpm scalapack-mpich-2.0.2-12.fc23.armv7hl.rpm texlive-edmac-svn15878.0-18.fc23.noarch.rpm texlive-altfont-doc-svn15878.1.1-18.fc23.noarch.rpm texlive-chemcono-doc-svn17119.1.3-18.fc23.noarch.rpm texlive-memoir-doc-svn31044.3.7b-18.fc23.noarch.rpm texlive-chextras-doc-svn27118.1.01-18.fc23.noarch.rpm texlive-lfb-svn15878.1.0-18.fc23.noarch.rpm texlive-gfsbaskerville-svn19440.1.0-18.fc23.noarch.rpm texlive-tpic2pdftex-bin-svn29741.0-18.20140525_r34255.fc23.noarch.rpm texlive-bibleref-svn21145.1.14-18.fc23.noarch.rpm texlive-auto-pst-pdf-doc-svn23723.0.6-18.fc23.noarch.rpm texlive-uri-svn21608.0-18.fc23.noarch.rpm texlive-libris-svn19409.1.007-18.fc23.noarch.rpm texlive-todonotes-doc-svn28362.0-18.fc23.noarch.rpm texlive-shadowtext-svn26522.0.3-18.fc23.noarch.rpm texlive-sfmath-svn15878.0.8-18.fc23.noarch.rpm texlive-thumbs-doc-svn33134.1.0n-18.fc23.noarch.rpm texlive-unamthesis-doc-svn29799.2.02-18.fc23.noarch.rpm texlive-tkz-linknodes-doc-svn22833.1.0c-18.fc23.noarch.rpm texlive-mdframed-doc-svn31075.1.9b-18.fc23.noarch.rpm texlive-bgteubner-svn25892.2.02-18.fc23.noarch.rpm texlive-gsftopk-bin-svn30088.0-18.20140525_r34255.fc23.x86_64.rpm
19441 pts/8    S+     0:00  |           \_ /usr/bin/python /usr/share/sigul/client.py --batch sign-rpms --store-in-koji --koji-only --v3-signature fedora-23 eclipse-pydev-debuginfo-4.5.1-1.fc23.x86_64.rpm texlive-tabvar-svn28908.1.7-18.fc23.noarch.rpm texlive-supertabular-svn15878.4.1a-18.fc23.noarch.rpm texlive-dtk-doc-svn27443.1.29-18.fc23.noarch.rpm texlive-listings-svn33095.1.5c-18.fc23.noarch.rpm texlive-bidi-doc-svn32896.14.6-18.fc23.noarch.rpm texlive-figsize-svn18784.0.1-18.fc23.noarch.rpm texlive-newfile-doc-svn15878.1.0c-18.fc23.noarch.rpm efl-debuginfo-1.16.1-2.fc23.armv7hl.rpm texlive-babel-albanian-svn30254.1.0c-18.fc23.noarch.rpm texlive-threeparttable-doc-svn17383.0-18.fc23.noarch.rpm texlive-pxbase-svn28516.0.5-18.fc23.noarch.rpm qucs-0.0.18-7.fc23.x86_64.rpm texlive-latexdiff-doc-svn30961.1.0.3-18.fc23.noarch.rpm texlive-digiconfigs-svn15878.0.5-18.fc23.noarch.rpm texlive-vwcol-doc-svn15878.0.1-18.fc23.noarch.rpm texlive-ogham-svn24876.0-18.fc23.noarch.rpm texlive-sourcecodepro-doc-svn32006.2.2-18.fc23.noarch.rpm texlive-jfontmaps-bin-svn29848.0-18.20140525_r34255.fc23.noarch.rpm texlive-ifplatform-svn21156.0.3a-18.fc23.noarch.rpm texlive-etoolbox-doc-svn20922.2.1-18.fc23.noarch.rpm texlive-dialogl-svn28946.0-18.fc23.noarch.rpm texlive-slantsc-svn25007.2.11-18.fc23.noarch.rpm texlive-covington-doc-svn17750.0-18.fc23.noarch.rpm texlive-hepparticles-svn15878.0-18.fc23.noarch.rpm texlive-pst-2dplot-svn15878.1.5-18.fc23.noarch.rpm texlive-biblatex-dw-svn31752.1.6a-18.fc23.noarch.rpm texlive-lgreek-svn21818.0-18.fc23.noarch.rpm scalapack-mpich-2.0.2-12.fc23.armv7hl.rpm texlive-edmac-svn15878.0-18.fc23.noarch.rpm texlive-altfont-doc-svn15878.1.1-18.fc23.noarch.rpm texlive-chemcono-doc-svn17119.1.3-18.fc23.noarch.rpm texlive-memoir-doc-svn31044.3.7b-18.fc23.noarch.rpm texlive-chextras-doc-svn27118.1.01-18.fc23.noarch.rpm texlive-lfb-svn15878.1.0-18.fc23.noarch.rpm texlive-gfsbaskerville-svn19440.1.0-18.fc23.noarch.rpm texlive-tpic2pdftex-bin-svn29741.0-18.20140525_r34255.fc23.noarch.rpm texlive-bibleref-svn21145.1.14-18.fc23.noarch.rpm texlive-auto-pst-pdf-doc-svn23723.0.6-18.fc23.noarch.rpm texlive-uri-svn21608.0-18.fc23.noarch.rpm texlive-libris-svn19409.1.007-18.fc23.noarch.rpm texlive-todonotes-doc-svn28362.0-18.fc23.noarch.rpm texlive-shadowtext-svn26522.0.3-18.fc23.noarch.rpm texlive-sfmath-svn15878.0.8-18.fc23.noarch.rpm texlive-thumbs-doc-svn33134.1.0n-18.fc23.noarch.rpm texlive-unamthesis-doc-svn29799.2.02-18.fc23.noarch.rpm texlive-tkz-linknodes-doc-svn22833.1.0c-18.fc23.noarch.rpm texlive-mdframed-doc-svn31075.1.9b-18.fc23.noarch.rpm texlive-bgteubner-svn25892.2.02-18.fc23.noarch.rpm texlive-gsftopk-bin-svn30088.0-18.20140525_r34255.fc23.x86_64.rpm

[ausil@bodhi-backend01 scripts]$ lsof |grep python|grep TCP|grep FIN_WAIT2
python    19449            ausil    7u     IPv4 2846994384       0t0        TCP localhost:52957->localhost:55554 (FIN_WAIT2)
python    19449 19452      ausil    7u     IPv4 2846994384       0t0        TCP localhost:52957->localhost:55554 (FIN_WAIT2)
python    19450            ausil    7u     IPv4 2846761826       0t0        TCP bodhi-backend01.phx2.fedoraproject.org:56028->sign-bridge1:44334 (FIN_WAIT2)




on the bridge 

tail -50 /var/log/sigul_bridge.log-20160117
2016-01-20 17:28:51,869 INFO: Subrequest: 'id' = '\x00\x00\x00 ', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-arabxetex-doc', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = '\x04\xafryX\x1e\x80q2!\xfd\xf2\xd1)\xa8\x84', 'rpm-version' = 'svn17470.v1.1.4'
2016-01-20 17:28:51,880 DEBUG: sign-rpms:read requests: Started handling 'id' = '\x00\x00\x00!', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-l2tabu-doc', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = '\x1cy\tc\xfa\x94\x80\x1c\xda\x0f\x0c*.\xd7\x1a\x02', 'rpm-version' = 'svn24038.2.3'
2016-01-20 17:28:51,880 INFO: Subrequest: 'id' = '\x00\x00\x00!', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-l2tabu-doc', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = '\x1cy\tc\xfa\x94\x80\x1c\xda\x0f\x0c*.\xd7\x1a\x02', 'rpm-version' = 'svn24038.2.3'
2016-01-20 17:28:51,886 DEBUG: sign-rpms:read requests: Started handling 'id' = '\x00\x00\x00"', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-kotex-utils', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = 'L\x02V&[\x16\xe8,m\x01\x8d\x16<\x05\x02\x98', 'rpm-version' = 'svn32101.2.0.1'
2016-01-20 17:28:51,886 INFO: Subrequest: 'id' = '\x00\x00\x00"', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-kotex-utils', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = 'L\x02V&[\x16\xe8,m\x01\x8d\x16<\x05\x02\x98', 'rpm-version' = 'svn32101.2.0.1'
2016-01-20 17:28:51,892 DEBUG: sign-rpms:read requests: Started handling 'id' = '\x00\x00\x00#', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-hitec', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = '\x06|Y\xc5-\xa2\x89\xfb\xd8*\xc2\xeaa(\n\x02', 'rpm-version' = 'svn15878.0.0_beta_'
2016-01-20 17:28:51,892 INFO: Subrequest: 'id' = '\x00\x00\x00#', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-hitec', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = '\x06|Y\xc5-\xa2\x89\xfb\xd8*\xc2\xeaa(\n\x02', 'rpm-version' = 'svn15878.0.0_beta_'
2016-01-20 17:28:51,898 DEBUG: sign-rpms:read requests: Started handling 'id' = '\x00\x00\x00$', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-feynmf', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = '\xeakE\xe3\xc6\x99\n2*\xd16^\x8f\xe6nT', 'rpm-version' = 'svn17259.1.08'
2016-01-20 17:28:51,898 INFO: Subrequest: 'id' = '\x00\x00\x00$', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-feynmf', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = '\xeakE\xe3\xc6\x99\n2*\xd16^\x8f\xe6nT', 'rpm-version' = 'svn17259.1.08'
2016-01-20 17:28:51,905 DEBUG: sign-rpms:read requests: Started handling 'id' = '\x00\x00\x00%', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-l3packages', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = '\x8b3f\x17\x85>1\x95F\x99\xdaC\x9e\xc4\xdd\xa8', 'rpm-version' = 'svn32537.SVN_4634'
2016-01-20 17:28:51,905 INFO: Subrequest: 'id' = '\x00\x00\x00%', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-l3packages', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = '\x8b3f\x17\x85>1\x95F\x99\xdaC\x9e\xc4\xdd\xa8', 'rpm-version' = 'svn32537.SVN_4634'
2016-01-20 17:28:51,911 DEBUG: sign-rpms:read requests: Started handling 'id' = '\x00\x00\x00&', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-aecc', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = '\xdc\x1b\xff\xfe\x8f\xbff\x86\xbfPZ\xa4u\x9c\xe5N', 'rpm-version' = 'svn28574.0'
2016-01-20 17:28:51,912 INFO: Subrequest: 'id' = '\x00\x00\x00&', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-aecc', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = '\xdc\x1b\xff\xfe\x8f\xbff\x86\xbfPZ\xa4u\x9c\xe5N', 'rpm-version' = 'svn28574.0'
2016-01-20 17:28:51,917 DEBUG: sign-rpms:read requests: Started handling 'id' = "\x00\x00\x00'", 'rpm-arch' = 'x86_64', 'rpm-epoch' = '', 'rpm-name' = 'blacs-common', 'rpm-release' = '12.fc23', 'rpm-sigmd5' = '\x03\x96\xff\xa0J8R\x98\x18\xf6\xfd\x99r\x04;\xbc', 'rpm-version' = '2.0.2'
2016-01-20 17:28:51,917 INFO: Subrequest: 'id' = "\x00\x00\x00'", 'rpm-arch' = 'x86_64', 'rpm-epoch' = '', 'rpm-name' = 'blacs-common', 'rpm-release' = '12.fc23', 'rpm-sigmd5' = '\x03\x96\xff\xa0J8R\x98\x18\xf6\xfd\x99r\x04;\xbc', 'rpm-version' = '2.0.2'
2016-01-20 17:28:51,923 DEBUG: sign-rpms:read requests: Started handling 'id' = '\x00\x00\x00(', 'rpm-arch' = 'i686', 'rpm-epoch' = '', 'rpm-name' = 'scalapack-debuginfo', 'rpm-release' = '12.fc23', 'rpm-sigmd5' = '"Ly\xa8)\x11\xef\xbe\xe3t\xf8As\xe0\xd2M', 'rpm-version' = '2.0.2'
2016-01-20 17:28:51,923 INFO: Subrequest: 'id' = '\x00\x00\x00(', 'rpm-arch' = 'i686', 'rpm-epoch' = '', 'rpm-name' = 'scalapack-debuginfo', 'rpm-release' = '12.fc23', 'rpm-sigmd5' = '"Ly\xa8)\x11\xef\xbe\xe3t\xf8As\xe0\xd2M', 'rpm-version' = '2.0.2'
2016-01-20 17:28:51,929 DEBUG: sign-rpms:read requests: Started handling 'id' = '\x00\x00\x00)', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-xetexfontinfo-doc', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = '\'F\x9d$\x12\xc2\xe5\xa01\xc8\x01"\xb8j\xd7\x12', 'rpm-version' = 'svn15878.0'
2016-01-20 17:28:51,929 INFO: Subrequest: 'id' = '\x00\x00\x00)', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-xetexfontinfo-doc', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = '\'F\x9d$\x12\xc2\xe5\xa01\xc8\x01"\xb8j\xd7\x12', 'rpm-version' = 'svn15878.0'
2016-01-20 17:28:51,954 DEBUG: sign-rpms:read requests: Started handling 'id' = '\x00\x00\x00*', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-icsv-doc', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = "V\x18\x8d\x8bi\x9dg'\xbeH)\xdc\xa9\x95\xe3\x0f", 'rpm-version' = 'svn15878.0.2'
2016-01-20 17:28:51,954 INFO: Subrequest: 'id' = '\x00\x00\x00*', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-icsv-doc', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = "V\x18\x8d\x8bi\x9dg'\xbeH)\xdc\xa9\x95\xe3\x0f", 'rpm-version' = 'svn15878.0.2'
2016-01-20 17:28:51,965 DEBUG: sign-rpms:read requests: Started handling 'id' = '\x00\x00\x00+', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-tdsfrmath', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = '\xff\xda\xcd\x97\x1d\xdcQ;7\xdb(\x7f\x8a\x8b\xd4\x01', 'rpm-version' = 'svn15878.1.3'
2016-01-20 17:28:51,965 INFO: Subrequest: 'id' = '\x00\x00\x00+', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-tdsfrmath', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = '\xff\xda\xcd\x97\x1d\xdcQ;7\xdb(\x7f\x8a\x8b\xd4\x01', 'rpm-version' = 'svn15878.1.3'
2016-01-20 17:28:51,972 DEBUG: sign-rpms:read requests: Started handling 'id' = '\x00\x00\x00,', 'rpm-arch' = 'noarch', 'rpm-epoch' = '', 'rpm-name' = 'scap-security-guide', 'rpm-release' = '1.fc23', 'rpm-sigmd5' = '\x00\xd8\xcd\x8f\xd4\x87\x15B\x14lD&\xd0>K\xa7', 'rpm-version' = '0.1.28'
2016-01-20 17:28:51,972 INFO: Subrequest: 'id' = '\x00\x00\x00,', 'rpm-arch' = 'noarch', 'rpm-epoch' = '', 'rpm-name' = 'scap-security-guide', 'rpm-release' = '1.fc23', 'rpm-sigmd5' = '\x00\xd8\xcd\x8f\xd4\x87\x15B\x14lD&\xd0>K\xa7', 'rpm-version' = '0.1.28'
2016-01-20 17:28:51,978 DEBUG: sign-rpms:read requests: Started handling 'id' = '\x00\x00\x00-', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-arabi', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = 'J\xe9\xfd\xbep\xf0\xa9QZ\xcaD\xd8\xc4yZ\x94', 'rpm-version' = 'svn25095.1.1'
2016-01-20 17:28:51,978 INFO: Subrequest: 'id' = '\x00\x00\x00-', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-arabi', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = 'J\xe9\xfd\xbep\xf0\xa9QZ\xcaD\xd8\xc4yZ\x94', 'rpm-version' = 'svn25095.1.1'
2016-01-20 17:28:51,984 DEBUG: sign-rpms:read requests: Started handling 'id' = '\x00\x00\x00.', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-lewis-doc', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = "qpi\xfc\xd8,4b\xb8'\x9ce\xf07\xa9\x02", 'rpm-version' = 'svn15878.0.1'
2016-01-20 17:28:51,984 INFO: Subrequest: 'id' = '\x00\x00\x00.', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-lewis-doc', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = "qpi\xfc\xd8,4b\xb8'\x9ce\xf07\xa9\x02", 'rpm-version' = 'svn15878.0.1'
2016-01-20 17:28:51,990 DEBUG: sign-rpms:read requests: Started handling 'id' = '\x00\x00\x00/', 'rpm-arch' = 'x86_64', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-kpathsea-bin', 'rpm-release' = '18.20140525_r34255.fc23', 'rpm-sigmd5' = "N)\x94m;\xb3|\xa2N\xc5\xfe\xd6\x0e\xd7'9", 'rpm-version' = 'svn30088.0'
2016-01-20 17:28:51,990 INFO: Subrequest: 'id' = '\x00\x00\x00/', 'rpm-arch' = 'x86_64', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-kpathsea-bin', 'rpm-release' = '18.20140525_r34255.fc23', 'rpm-sigmd5' = "N)\x94m;\xb3|\xa2N\xc5\xfe\xd6\x0e\xd7'9", 'rpm-version' = 'svn30088.0'
2016-01-20 17:28:51,998 DEBUG: sign-rpms:read requests: Started handling 'id' = '\x00\x00\x000', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-scheme-minimal', 'rpm-release' = '18.20140525_r34255.fc23', 'rpm-sigmd5' = 'q\x07\x88oU\xb5(\x1ew\t\x83\xa3\x8c\xefnm', 'rpm-version' = 'svn13822.0'
2016-01-20 17:28:51,998 INFO: Subrequest: 'id' = '\x00\x00\x000', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-scheme-minimal', 'rpm-release' = '18.20140525_r34255.fc23', 'rpm-sigmd5' = 'q\x07\x88oU\xb5(\x1ew\t\x83\xa3\x8c\xefnm', 'rpm-version' = 'svn13822.0'
2016-01-20 17:28:52,004 DEBUG: sign-rpms:read requests: Started handling 'id' = '\x00\x00\x001', 'rpm-arch' = 'i686', 'rpm-epoch' = '', 'rpm-name' = 'mkvtoolnix', 'rpm-release' = '1.fc23', 'rpm-sigmd5' = '%\x07K8rq\xbd\x94\x11\xae8\xf4G\x17s\xa2', 'rpm-version' = '8.8.0'
2016-01-20 17:28:52,004 INFO: Subrequest: 'id' = '\x00\x00\x001', 'rpm-arch' = 'i686', 'rpm-epoch' = '', 'rpm-name' = 'mkvtoolnix', 'rpm-release' = '1.fc23', 'rpm-sigmd5' = '%\x07K8rq\xbd\x94\x11\xae8\xf4G\x17s\xa2', 'rpm-version' = '8.8.0'
2016-01-20 17:28:52,005 DEBUG: sign-rpms:read requests finished, exc_info: None
2016-01-20 17:28:52,005 DEBUG: Sending final EOFs to sign-rpms:koji requests...
2016-01-20 17:28:52,005 DEBUG: Waiting for sign-rpms:koji requests...
2016-01-20 17:28:53,221 DEBUG: sign-rpms:koji requests: Started handling request 'id' = '\x00\x00\x00\x01', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-hyphen-uppersorbian', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = '\xfe\x0e3\x8e\x80\xb9\x80\x80%\xc1T\xc7rT\xfe)', 'rpm-version' = 'svn23085.0'
2016-01-20 17:28:53,221 DEBUG: sign-rpms:send requests: Started handling 'id' = '\x00\x00\x00\x00', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-gmverb', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = 'J\xd9\xcc\xc9\x05\xe7\xbb\x1b\x11\xb7-\xc1cu\x9e?', 'rpm-version' = 'svn24288.v0.98'
2016-01-20 17:28:53,398 DEBUG: sign-rpms:koji requests: Started handling request 'id' = '\x00\x00\x00\x02', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-figflow', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = ')l\xad$\xea\x85\x11\x0fX\xcc\x03q\xf6-\x01\xb7', 'rpm-version' = 'svn21462.0'
2016-01-20 17:28:53,582 DEBUG: sign-rpms:koji requests: Started handling request 'id' = '\x00\x00\x00\x03', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-pst-ob3d', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = "\x12\xa0\xd58|\x1d\x0cY\xee'\xc7w\xebs\x15\xac", 'rpm-version' = 'svn15878.0.21'
2016-01-20 17:28:53,787 DEBUG: sign-rpms:koji requests: Started handling request 'id' = '\x00\x00\x00\x04', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-plates-doc', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = '?\x01a\x875\x85\xaf`\xc4\xd8\x89\xaf\xe0O\x08<', 'rpm-version' = 'svn15878.0.1'
2016-01-20 17:28:54,161 DEBUG: sign-rpms:koji requests: Started handling request 'id' = '\x00\x00\x00\x05', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-actuarialangle-doc', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = '~\xca\xa6`}R\x95`5H\xd0\x84O\x8b\x1es', 'rpm-version' = 'svn28004.0'
2016-01-20 17:28:54,348 DEBUG: sign-rpms:koji requests: Started handling request 'id' = '\x00\x00\x00\x06', 'rpm-arch' = 'i686', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-pstools-bin', 'rpm-release' = '18.20140525_r34255.fc23', 'rpm-sigmd5' = '\xcb3zO\x8a\xb1\xf7\xda\x82\xe1\x13$\xef\xaf\xb6J', 'rpm-version' = 'svn30088.0'
2016-01-20 17:28:54,534 DEBUG: sign-rpms:koji requests: Started handling request 'id' = '\x00\x00\x00\x07', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-pkuthss-doc', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = 'C\xafe\x03\x8d\xac\xa7\x05\xd5 \xc5\xcc\x8a\xf9\x9d\x82', 'rpm-version' = 'svn33464.1.5.2'
2016-01-20 17:28:54,709 DEBUG: sign-rpms:koji requests: Started handling request 'id' = '\x00\x00\x00\x08', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-munich', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = '\xdf\x80c5(\x03kU{\xce\xcd\x93\xdf\xd1\xb3\xf2', 'rpm-version' = 'svn15878.0'
2016-01-20 17:28:54,758 DEBUG: sign-rpms:send requests: Started handling 'id' = '\x00\x00\x00\x01', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-hyphen-uppersorbian', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = '\xfe\x0e3\x8e\x80\xb9\x80\x80%\xc1T\xc7rT\xfe)', 'rpm-version' = 'svn23085.0'
2016-01-20 17:28:54,785 DEBUG: sign-rpms:send requests: Started handling 'id' = '\x00\x00\x00\x02', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-figflow', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = ')l\xad$\xea\x85\x11\x0fX\xcc\x03q\xf6-\x01\xb7', 'rpm-version' = 'svn21462.0'
2016-01-20 17:28:54,807 DEBUG: sign-rpms:send requests: Started handling 'id' = '\x00\x00\x00\x03', 'rpm-arch' = 'noarch', 'rpm-epoch' = '4', 'rpm-name' = 'texlive-pst-ob3d', 'rpm-release' = '18.fc23', 'rpm-sigmd5' = "\x12\xa0\xd58|\x1d\x0cY\xee'\xc7w\xebs\x15\xac", 'rpm-version' = 'svn15878.0.21'

Comment 26 Till Maas 2016-01-20 19:21:57 UTC
@Dennis: If there is a stateful firewall on bodhi-backend01 it will most likely be the reason why the client hung. Ansible contains the firewall rules for autosign01 that I used to work around this.

Comment 27 Kevin Fenzi 2016-01-20 23:41:40 UTC
I put these same rules in place on bodhi-backend now.

Comment 28 Till Maas 2016-02-09 18:05:49 UTC
it seems that there are no crashs with the firewall disabled. However it appears to me that the client exists without errors even though not all packages were signed (not 100% sure).

btw. did you notice https://fedorahosted.org/sigul/ticket/2 regardig a different patch required for sigul now?

Comment 29 Jan Kurik 2016-02-24 13:51:04 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 24 development cycle.
Changing version to '24'.

More information and reason for this action is here:
https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora24#Rawhide_Rebase

Comment 30 Fedora Admin XMLRPC Client 2016-08-09 14:55:44 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 31 Patrick Uiterwijk 2016-08-14 21:11:10 UTC
(In reply to Till Maas from comment #28)
> it seems that there are no crashs with the firewall disabled. However it
> appears to me that the client exists without errors even though not all
> packages were signed (not 100% sure).

Did you ever manage to dig into this?
Either discover that this is still a real problem or that it is not?

Comment 32 Patrick Uiterwijk 2016-08-15 16:42:50 UTC
nirik | to my knowledge that is fixed.

Kevin says that this issue was fixed, so I'll close this bug out.
If it turns out that this bug still happens, feel free to reopen.


Note You need to log in before you can comment on or make changes to this bug.