ADVISORY: JDK1.3.0 under Linux:

Below is a clipping from Sun on working around JVM crashes under high thread counts in JVM 1.3 for Linux:

On Linux, use a larger signal number for the HotSpot thread suspension/resumption
handler. The signal number is specified by the environment variable
_JAVA_SR_SIGNUM. Setting it to a number larger than SIGSEGV (11) solves
the problem. A good number to use is 12, which is SIGUSR2. Using signal 16
to work around the problem might introduce problems of its own.

So in tcsh, "setenv _JAVA_SR_SIGNUM 12" can solve the problem.

xxxxx@xxxxx 2001-03-02

Evaluation

On Linux, this bug is caused by how the thread suspension/resumption
mechanism is implemented in HotSpot. Suspension and resumption are
implemented with a signal handler, SR_handler, that uses SIGUSR1 (= 10)
by default. The thread is suspended by an infinite loop of sigsuspend()
inside SR_handler until another SIGUSR1 arrives to resume the thread.
SR_handler does not mask other signals. This makes sense because the
thread might be suspended for a long time and we don't want to hide
signals such as SIGINT.

Currently on Linux, if a thread has more than one synchronous signal
pending, they are delivered in order of increasing signal number.
Unfortunately SIGSEGV is 11. For example, suppose the thread raises SIGSEGV
(e.g. a null pointer dereference in compiled code, which is handled in the
SEGV handler and results in a NullPointerException being thrown) and,
before that signal is handled, the thread is suspended (e.g. by GC). The
thread then has two pending signals: SIGUSR1 (10) for suspension and
SIGSEGV (11). SIGUSR1 is handled first because its number is lower.
SR_handler is invoked and changes some state variables of the current Java
thread, but before it reaches sigsuspend(), the kernel delivers the next
pending signal, SIGSEGV, and invokes the SEGV handler. Inside the SEGV
handler, HotSpot tries to locate the fault address in Java code.
This fails because the program counter saved in the ucontext now points
into SR_handler, which cannot be mapped to any Java method.

Changing the signal number used by SR_handler works around this problem.
With a larger signal number for suspension/resumption, the SEGV handler is
always invoked first, with the ucontext correctly set up for the SEGV site,
so it has the correct fault address to look up in the code cache and the
assertion failures no longer occur.

Another possible workaround is to mask SIGSEGV for SR_handler, so that the
SEGV handler is invoked only after SR_handler finishes. In that case, the
SEGV handler still has the correct fault address to look up.

See also bug 4401965.

xxxxx@xxxxx 2001-03-02

-------------------------------------------------

The Linux part is fixed by using SIGUSR2 for suspension/resumption.

We don't use a signal handler for thread suspension/resumption on Windows,
so the cause there needs further investigation. The Windows part is tracked
as bug 4423483.

xxxxx@xxxxx 2001-03-08
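The two workarounds in the evaluation (a suspend/resume signal numbered above SIGSEGV, and SIGSEGV added to the handler's signal mask) can be illustrated with plain POSIX calls. The C sketch below is not HotSpot code. `sr_handler`, `choose_sr_signal` and `install_sr_handler` are hypothetical names, the fallback to SIGUSR2 is an assumption modeled on the Linux fix, and the real SR_handler's sigsuspend() loop is reduced to a call counter. It enforces only the "larger than SIGSEGV" rule (it does not reject signal 16), and it uses the signal macros rather than the Linux x86 numbers 10, 11 and 12 so that it also compiles on other POSIX systems.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Number of times the stand-in suspend/resume handler has run. */
static volatile sig_atomic_t sr_handler_calls = 0;

/* Hypothetical stand-in for HotSpot's SR_handler. The real handler parks
 * the thread in a sigsuspend() loop until the resume signal arrives; this
 * sketch only counts invocations. */
static void sr_handler(int sig)
{
    (void)sig;
    sr_handler_calls++;
}

/* Hypothetical helper: turn a value such as the contents of
 * _JAVA_SR_SIGNUM into a suspend/resume signal number. Only numbers
 * larger than SIGSEGV are accepted, so a pending SIGSEGV (lower number)
 * is delivered first. Anything else falls back to SIGUSR2. */
static int choose_sr_signal(const char *value)
{
    char *end;
    long n;

    if (value == NULL || *value == '\0')
        return SIGUSR2;

    errno = 0;
    n = strtol(value, &end, 10);
    if (errno != 0 || *end != '\0' || n <= SIGSEGV || n > INT_MAX)
        return SIGUSR2;
    return (int)n;
}

/* Install sr_handler on `sig`, also applying the second workaround:
 * SIGSEGV is in sa_mask, so while sr_handler runs the kernel will not
 * deliver a pending SIGSEGV on top of it. That SIGSEGV is delivered once
 * sr_handler returns and the previous signal mask is restored.
 * Returns 0 on success, -1 (with errno set) on failure. */
static int install_sr_handler(int sig)
{
    struct sigaction sa;

    assert(sig > SIGSEGV); /* the ordering rule from the advisory */

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sr_handler;
    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIGSEGV);
    sa.sa_flags = 0;
    return sigaction(sig, &sa, NULL);
}
```

For the JVM itself, the knob remains the environment variable from the advisory (in tcsh, `setenv _JAVA_SR_SIGNUM 12`). The helpers here only restate its rule in code. Combining both workarounds in one sketch is a choice made for brevity; the evaluation presents them as alternatives.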