Bug 1624514

Summary: pgrep/pkill fail with large uids
Product: Red Hat Enterprise Linux 7 Reporter: todd_lewis
Component: procps-ng Assignee: Jan Rybar <jrybar>
Status: CLOSED ERRATA QA Contact: Karel Volný <kvolny>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.7-Alt CC: kvolny, todd_lewis
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: procps-ng-3.3.10-28.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1827731 (view as bug list) Environment:
Last Closed: 2020-09-29 20:36:50 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1827731, 2119083    
Attachments:
Description                                  Flags
Fixes uname to long uid conversion           none
Fix same problem with gids as well as uids   none

Description todd_lewis 2018-08-31 21:18:01 UTC
Description of problem:
The pgrep and pkill utilities fail to convert symbolic names to numeric
uids if the numeric values are 2^31 (2147483648) or above.

Version-Release number of selected component (if applicable):
3.3.10-17 on RHEL7 (but same behavior on 3.3.12-3 on Fedora 28)

How reproducible:
Always

Steps to Reproduce:
1. Create user with uid >= 2^31 (2147483648)
2. Start processes under that userid
3. Use "pgrep -u thatuser" or "pkill -u thatuser" to list or kill those processes

Actual results:
No process listed / killed

Expected results:
Some processes listed / killed

Additional info:
Specifying the numeric uid works as documented, but the conversion
fails for symbolic uids. Behold:

 $ getent passwd goodid badid
 goodid:x:2147483647:2147483647:procps-ng demo a:/home/goodid:/bin/bash
 badid:x:2147483648:2147483648:procps-ng demo b:/home/badid:/bin/bash

 $ sudo runuser -l goodid bash -c "sleep 60" &
 [4] 6716
 $ sudo runuser -l badid bash -c "sleep 60" &
 [5] 6717

 $ pgrep -a -u goodid,badid
 6716 sleep 60

 $ pgrep -a -u 2147483647,2147483648
 6716 sleep 60
 6717 sleep 60
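
Since the numeric form works as shown above, one possible stop-gap is to resolve the names outside pgrep and pass only numbers. The following is a minimal sketch, not part of procps-ng; the helper names `numeric_uid_list` and `pgrep_by_user` are hypothetical and chosen for illustration only.

```python
import pwd
import subprocess


def numeric_uid_list(names, lookup=pwd.getpwnam):
    """Translate user names into a comma-separated list of numeric uids.

    `lookup` defaults to pwd.getpwnam; it is a parameter only so the
    translation can be exercised without real accounts.
    """
    return ",".join(str(lookup(name).pw_uid) for name in names)


def pgrep_by_user(names):
    """Run `pgrep -a -u` with numeric uids only (hypothetical helper).

    pgrep then never has to convert a name itself, which is the step
    that fails in this report.
    """
    result = subprocess.run(
        ["pgrep", "-a", "-u", numeric_uid_list(names)],
        capture_output=True,
        text=True,
        check=False,  # pgrep exits non-zero when nothing matches
    )
    return result.stdout.splitlines()
```

With the two demo accounts above, `numeric_uid_list(["goodid", "badid"])` would produce the same `2147483647,2147483648` argument that was typed by hand in the working example.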

Comment 2 Jan Rybar 2018-09-03 08:08:27 UTC
Hello,
thank you for your report and your interest in functionality of pgrep/pkill.
Can I ask for information about any specific use-case of creating a user with UID > 2^31, in which pgrep/pkill is required?
Thank you and have nice and prosperous new week.

Comment 3 todd_lewis 2018-09-03 10:50:01 UTC
(In reply to Jan Rybar from comment #2)
> Hello,
> thank you for your report and your interest in functionality of pgrep/pkill.
> Can I ask for information about any specific use-case of creating a user
> with UID > 2^31, in which pgrep/pkill is required?
> Thank you and have nice and prosperous new week.

You can. We are in the process of automating with Ansible the installation and configuration of a large suite of SAS products for consistency across new dev, test, support, and production environments, on the order of ~70 bare metal hosts. These will replace existing older environments on aging hardware. This involves the creation of around a half dozen userids on the systems, instances of which already exist on the older systems. We elected to retain the existing username to numeric id mapping. At least one of these userids (the one that cost us over a day of work because of this problem) has had the numeric uid of 3333333333 since long before my group became involved with the project.

Note that except for pgrep/pkill, that admittedly crazy id number has not caused a problem in either the older RHEL6 or new RHEL7 systems.

I'm not advocating for huge uid numbers, but since they clearly do work otherwise, it seems pgrep and pkill -- which do the Right Thing when given the numbers directly -- ought to work when given the name. Instead they fail silently when converting the name to the number.

Comment 4 todd_lewis 2018-09-04 12:52:41 UTC
Created attachment 1480773
Fixes uname to long uid conversion

The attached patch preserves the high bit of the unsigned numeric uid as returned from getpwnam(). From my limited testing, this fixed the reported problem.
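
To illustrate why the high bit matters, here is a small sketch of the integer conversions involved. It assumes a 32-bit C int and a 64-bit C long, and it only models the arithmetic with ctypes; which side of the comparison inside pgrep passes through an int is an assumption made for the example, not something taken from the procps-ng source.

```python
import ctypes


def as_c_int(value):
    """Value after conversion to a 32-bit C int (two's complement wrap)."""
    return ctypes.c_int32(value).value


def as_c_long(value):
    """Value after conversion to a 64-bit C long (assumed LP64 platform)."""
    return ctypes.c_int64(value).value


def lookup_side(uid):
    # Assumed: the unsigned uid from getpwnam() is widened straight to long.
    return as_c_long(uid)


def process_side(uid):
    # Assumed: the uid being compared against passed through an int first.
    return as_c_long(as_c_int(uid))


def patched_lookup_side(uid):
    # Assumed shape of a fix: route the lookup through the same conversion.
    return as_c_long(as_c_int(uid))


for uid in (2147483647, 2147483648, 3333333333):
    print(uid, lookup_side(uid) == process_side(uid),
          patched_lookup_side(uid) == process_side(uid))
```

Under these assumptions the two sides agree for 2147483647 and disagree from 2^31 upward, which is consistent with the goodid/badid behavior in the description. Once both sides use the same conversion, they agree for every uid.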

Comment 5 todd_lewis 2018-09-04 14:49:51 UTC
Created attachment 1480807
Fix same problem with gids as well as uids

Turns out the same problem exists with long (>= 2^31) gids. This patch fixes both.

Comment 6 Jan Rybar 2018-09-11 09:26:58 UTC
Hello Todd,
Thank you for your patch proposals.
I tried to investigate why el.num is a 'long', which causes many implicit casts anyway, and it seems I'll have to ask the authors themselves about the reason.
Meanwhile I'll test these integer casts, but I guess the fix will be included in RHEL-7.7 at the earliest.

Thank you very much.
Have a nice day.

Comment 7 todd_lewis 2018-12-20 14:25:27 UTC
Any movement on this? It's been over three months, and we've got
  * clear data lost due to bad cast (works with numeric id, fails w/ symbolic)
  * paying customers having to do silly work-arounds
  * the most dead simple fix a C program can have, and
  * working patch in hand
What can I do from this end to facilitate pushing this forward? Thanks.

Comment 8 Jan Rybar 2018-12-29 20:37:38 UTC
Hello Todd,

I can assure you that I keep monitoring this bug.
I noticed that you filed a merge request upstream on GitLab. As you can see, activity on the upstream side has cooled down lately.
It is advised to implement the fix within RHEL after it has been accepted upstream.
The nearest RHEL release that could contain the patch is RHEL-7.7.

Thank you for your patience.
I wish you happy New Year.

Comment 15 errata-xmlrpc 2020-09-29 20:36:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (procps-ng bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4017