Bug 2223974

Summary:	Hang in os::Linux::get_namespace_pid with jps command
Product:	Red Hat Enterprise Linux 8	Reporter:	Paulo Andrade <pandrade>
Component:	java-11-openjdk	Assignee:	Andrew John Hughes <ahughes>
Status:	CLOSED MIGRATED	QA Contact:	OpenJDK QA <java-qa>
Severity:	low	Docs Contact:
Priority:	unspecified
Version:	---	CC:	myamazak
Target Milestone:	rc	Keywords:	MigratedToJIRA
Target Release:	---	Flags:	pm-rhel: mirror+
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2023-09-12 23:14:06 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Paulo Andrade 2023-07-19 12:51:17 UTC

The user had a core dump of a process that was apparently hung in
os::Linux::get_namespace_pid. My previous analysis, from almost
a year ago:

"""
  Thread #0 is waiting for thread #1 to finish.

  Thread #1 is in the fgetc call in the "for (;;)" loop below:

// Determine if the vmid is the parent pid for a child in a PID namespace.
// Return the namespace pid if so, otherwise -1.
int os::Linux::get_namespace_pid(int vmid) {
  char fname[24];
  int retpid = -1;

  snprintf(fname, sizeof(fname), "/proc/%d/status", vmid);
  FILE *fp = os::fopen(fname, "r");

  if (fp) {
    int pid, nspid;
    int ret;
    while (!feof(fp) && !ferror(fp)) {
      ret = fscanf(fp, "NSpid: %d %d", &pid, &nspid);
      if (ret == 1) {
        break;
      }
      if (ret == 2) {
        retpid = nspid;
        break;
      }
      for (;;) {
        int ch = fgetc(fp);
        if (ch == EOF || ch == (int)'\n') break;
      }
    }
    fclose(fp);
  }
  return retpid;
}
"""

  Suspected issues:

* errno value is 3:
  #define ESRCH            3      /* No such process */

* The fp flags have _IO_ERR_SEEN set:
  #define _IO_ERR_SEEN          0x0020

* The FILE *fp was opened on "/proc/8503/status".
  Maybe there was a race condition and that process had already exited.

* The fgetc call should be returning EOF, not hanging, so it might be
  some issue with the procfs file.

  I suspect it might be related to vm.swappiness=1 in /etc/sysctl.conf.

  It would be useful to have a hung jps process present while the
sosreport is generated, as this could provide some extra data.

  It would also help to know where in the kernel it is hung, for
example, from the output of:

for pid in $(pidof jps); do echo ==$pid==; cat /proc/$pid/stack; done

while it is hung.

  If related to vm.swappiness=1, the system should be in some low-memory
state. Experimenting with the default vm.swappiness=60 should settle this.

  This small program would reproduce the hang if it were a general
case, but the trigger is probably some more complex condition...
"""
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <sys/wait.h>

int
main(int argc, char *argv[])
{
  pid_t     pid;
  FILE     *fp;

  if ((pid = fork()) == -1) {
    perror("failed to fork");
    exit(1);
  }
  if (pid == 0) {
    printf("child: about to sleep 3\n");
    execl("/usr/bin/sleep", "sleep", "3", (char *)NULL);
    /* execl only returns on failure; do not fall through into main */
    perror("failed to start sleep");
    _exit(1);
  }
  else {
    int ch, status;
    sleep(1);
    char path[256];
    /* pid_t is not long: cast to match %ld, and bound the write */
    snprintf(path, sizeof(path), "/proc/%ld/status", (long)pid);
    if ((fp = fopen(path, "r")) == NULL) {
      perror("failed to open /proc/pid/status");
      exit(1);
    }
    printf("parent: opened %s\n", path);
    do {
      printf("parent: waiting for %d\n", pid);
      if (waitpid(pid, &status,  WUNTRACED | WCONTINUED) == -1) {
	perror("failed to waitpid");
	exit(1);
      }
    } while (!WIFEXITED(status));
    printf("parent: process %d exited\n", pid);
    for (;;) {
      errno = 0;  /* so a stale errno is not reported for a good read */
      ch = fgetc(fp);
      printf("ch = %d, errno = %s, feof = %d, ferror = %d\n",
             ch, strerror(errno), feof(fp), ferror(fp));
      if (ch == EOF || feof(fp)) {
        break;
      }
      fputc(ch, stdout);
    }
  }
  return 0;
}
"""

  The user experienced the issue again.
  Besides vm.swappiness=1, the user also has several entries of the form:

$USER	hard	nofile	819200
$USER	soft	nofile	819200

in /etc/security/limits.conf for several different users.

  The user ran perf when the issue happened again, and the process is
indeed looping in the kernel and using too much CPU time.

  The fix should be mostly trivial in the OpenJDK code: if EOF is
returned, exit the main loop and return -1, rather than just breaking
out of the inner for loop and returning to the main while loop.
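
  A minimal sketch of that idea in plain C, not the actual OpenJDK
patch. It assumes the hang comes from the user-space loop retrying a
read that keeps failing, and it swaps the fscanf/fgetc pair for a
line-based fgets loop so that EOF or a read error ends the scan
immediately. The names parse_namespace_pid and get_namespace_pid_sketch
are hypothetical and exist only for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper (not from the JDK): scan an open status stream
 * for the NSpid line. Returns the second pid on that line, or -1. */
static int parse_namespace_pid(FILE *fp) {
  char line[256];

  /* fgets returns NULL on EOF and on read errors alike, so a failing
   * read ends the loop instead of being retried. */
  while (fgets(line, sizeof(line), fp) != NULL) {
    int pid, nspid;
    int ret = sscanf(line, "NSpid: %d %d", &pid, &nspid);
    if (ret == 2) {
      return nspid;  /* two values: process is in a PID namespace */
    }
    if (ret == 1) {
      return -1;     /* NSpid line present, but only one value */
    }
  }
  return -1;         /* EOF or read error before any NSpid line */
}

/* Hypothetical stand-in for os::Linux::get_namespace_pid. */
int get_namespace_pid_sketch(int vmid) {
  char fname[32];
  int retpid = -1;

  snprintf(fname, sizeof(fname), "/proc/%d/status", vmid);
  FILE *fp = fopen(fname, "r");
  if (fp != NULL) {
    retpid = parse_namespace_pid(fp);
    fclose(fp);
  }
  return retpid;
}
```

  With this shape there is a single loop whose only exit conditions are
a parsed NSpid line or a NULL from fgets, so there is no inner loop
that can hand control back to an outer one after EOF.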

...
    78.34%     0.00%  jps              [unknown]                 [k] 0000000000000000
            |
            ---0
               |          
               |--75.76%--read
               |          |          
               |          |--45.28%--entry_SYSCALL_64_after_hwframe
...

Comment 1 RHEL Program Management 2023-09-12 23:12:18 UTC

Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 2 RHEL Program Management 2023-09-12 23:14:06 UTC

This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.