ADVISORY: JDK1.3.0 under Linux:
Submitted by kebernet on Fri, 04/13/2001 - 14:35
Below is a clipping from Sun on working around JVM crashes under high
thread counts in JVM 1.3 for Linux:
+ On Linux, use a larger signal number for the HotSpot thread
+ suspension/resumption handler. The signal number is specified by the
+ environment variable _JAVA_SR_SIGNUM. Setting it to a number larger
+ than SIGSEGV (11) will solve the problem. A good number to use is 12,
+ which is SIGUSR2. Using signal 16 as the workaround might cause
+ problems of its own.
+
+ So on tcsh, "setenv _JAVA_SR_SIGNUM 12" can solve the problem.
+
+
+ xxxxx@xxxxx 2001-03-02
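
When the JVM is started from a launcher script rather than an interactive tcsh session, the workaround above can be sketched in Python. This is only an illustration and not part of Sun's note: the helper names `env_with_sr_signal` and `run_jvm` are hypothetical, and only the variable name `_JAVA_SR_SIGNUM` comes from the advisory. The sketch reads the number from `signal.SIGUSR2` instead of hard-coding 12, assuming the launcher runs on the same machine as the JVM.

```python
import os
import signal
import subprocess


def env_with_sr_signal(base_env=None, signum=signal.SIGUSR2):
    """Return a copy of base_env with _JAVA_SR_SIGNUM set to signum.

    Hypothetical helper: only the variable name _JAVA_SR_SIGNUM is
    taken from the advisory, everything else is illustrative.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["_JAVA_SR_SIGNUM"] = str(int(signum))
    return env


def run_jvm(args):
    """Launch `java` with the workaround applied (assumes java is on PATH)."""
    return subprocess.run(["java", *args], env=env_with_sr_signal(), check=True)


if __name__ == "__main__":
    # Show the value a child JVM would see, without launching one.
    print("_JAVA_SR_SIGNUM =", env_with_sr_signal()["_JAVA_SR_SIGNUM"])
```

Building the environment in a separate function keeps the parent process's own environment unchanged, so only the child JVM sees the variable.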
+ Evaluation
+
+ On Linux, this bug is caused by how the thread suspension/resumption
+ mechanism is implemented in HotSpot. Suspension and resumption are
+ implemented with the signal handler SR_handler, which uses SIGUSR1
+ (=10) by default. The thread is suspended by an infinite loop of
+ sigsuspend() inside SR_handler until another SIGUSR1 arrives to resume
+ the thread. SR_handler does not mask other signals. This makes sense
+ because the thread might be suspended for a long time and we don't
+ want to hide things like SIGINT.
+
+ Currently on Linux, if one thread has more than one synchronous signal
+ pending, they are delivered in order of increasing signal number.
+ Unfortunately SIGSEGV is 11. So, for example, if the thread raises
+ SEGV (e.g. a null pointer dereference in compiled code, which is
+ handled in the SEGV handler and results in a NullPointerException) and
+ is suspended (e.g. by GC) before the SEGV is handled, it will have two
+ pending signals (SIGUSR1 = 10 for suspension, and SIGSEGV = 11).
+ SIGUSR1 is handled first because it is smaller. SR_handler is invoked
+ and changes some state variables of the current Java thread, but
+ before it goes into sigsuspend, the kernel delivers the next pending
+ signal, SIGSEGV, and invokes the SEGV handler. The SEGV handler tries
+ to locate the fault address in Java code. This fails because the
+ return address in ucontext now points to SR_handler, which clearly
+ cannot be mapped to any Java method!
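
The "two pending signals" situation described above can be observed from user space. The following is a rough Python sketch under several assumptions: it uses SIGUSR1 and SIGUSR2 as stand-ins (raising a real SIGSEGV from Python is not practical), it dequeues with `sigwait()` rather than through handlers, and the function name `drain_pending_user_signals` is made up for the example. It shows only that one thread can hold several pending signals and hand them back one at a time. On Linux the lower number would be expected to come back first, but POSIX leaves that order unspecified, so the sketch prints the order instead of asserting it.

```python
import signal
import threading


def drain_pending_user_signals():
    """Leave SIGUSR1 and SIGUSR2 pending on this thread, then dequeue them.

    Returns (pending, order): the set reported by sigpending() and the
    order in which sigwait() handed the two signals back.
    """
    both = {signal.SIGUSR1, signal.SIGUSR2}
    # Block both signals so they stay pending instead of being delivered.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, both)
    try:
        me = threading.get_ident()
        # Send the higher-numbered signal first on purpose.
        signal.pthread_kill(me, signal.SIGUSR2)
        signal.pthread_kill(me, signal.SIGUSR1)
        pending = signal.sigpending() & both
        # Each sigwait() call consumes one pending signal; no handler runs.
        order = [signal.sigwait(both), signal.sigwait(both)]
    finally:
        # Restore the caller's signal mask.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return pending, order


if __name__ == "__main__":
    pending, order = drain_pending_user_signals()
    print("pending:", sorted(int(s) for s in pending))
    print("dequeue order:", [int(s) for s in order])
    print("SIGSEGV is", int(signal.SIGSEGV))
```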
+
+ Changing the signal number used by SR_handler works around this
+ problem: with a larger signal number for suspension/resumption, the
+ SEGV handler is always invoked first, with the ucontext correctly set
+ up for the SEGV site, so it has the correct fault address to look up
+ in the code cache and we no longer see assertion failures.
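
The rule this workaround relies on (pick a suspension/resumption signal numbered above SIGSEGV) can be checked against the local platform. The helper below is a hypothetical sketch, not HotSpot's own validation. It reads the numbers from Python's `signal` module because signal numbers vary by architecture. On the x86 Linux systems the advisory describes, SIGUSR1, SIGSEGV and SIGUSR2 are 10, 11 and 12.

```python
import signal


def sorts_after_segv(signum):
    """True if signum is a valid signal number greater than SIGSEGV."""
    return int(signal.SIGSEGV) < int(signum) < signal.NSIG


if __name__ == "__main__":
    # Compare the default (SIGUSR1) and the suggested replacement (SIGUSR2).
    for sig in (signal.SIGUSR1, signal.SIGSEGV, signal.SIGUSR2):
        print(f"{sig.name:8} = {int(sig):2}  after SIGSEGV: {sorts_after_segv(sig)}")
```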
+
+ Another possible workaround is to mask SIGSEGV for SR_handler, so that
+ the SEGV handler is only invoked after SR_handler finishes. In that
+ case, the SEGV handler still has the correct fault address to look up.
+
+ See also 4401965
+
+ xxxxx@xxxxx 2001-03-02
+
+ -------------------------------------------------
+
+ The Linux part is fixed by using SIGUSR2 for suspension/resumption.
+
+ We don't use a signal handler for thread suspension/resumption on
+ Windows. The cause there needs further investigation. The Windows
+ part is tracked as 4423483.
+
+ xxxxx@xxxxx 2001-03-08
+






