Thursday, March 13, 2014

JDBC Connection reset error -> Java SecureRandom -> Linux /dev/(u)random

Some time ago, we were encountering JDBC connection resets quite frequently in a standalone Java job that was kicked off every 5 minutes from cron.


Env details
JDBC Driver: Oracle 11.2.0.2.0 JDBC
JVM: java version "1.7.0_25" OpenJDK Runtime Environment (rhel-2.3.10.4.el5-x86_64) OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)
OS:  Red Hat Enterprise Linux Server release 6.2 (Santiago)
Database Server: Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production

The exception stack trace is shown below:

Exception in thread "main" java.sql.SQLRecoverableException: IO Error: Connection reset
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:428)
at oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:536)
at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:228)
at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:521)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:215)
...

The first observation from the logs was that the connect call hung for somewhere between 60 and 90 seconds before the connection was reset.

Some investigation soon revealed something unexpected. Apparently, the 11g JDBC driver initializes the java.security.SecureRandom class to generate random numbers, possibly for use in the client-server handshake during initial session setup. On Linux, the call that generates a seed for SecureRandom can block if /dev/random does not have sufficient entropy available. While the client is blocked, the server sees no activity from it and, after a certain interval, resets the TCP connection. This is what the exception is about.
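To see whether seed generation is slow on a given box, a small standalone timing test can help. The following is only a sketch: the class name SeedTiming, the helper method, and the 8-byte seed size are arbitrary choices for illustration, and it does not reproduce whatever the Oracle driver actually does internally.

```java
import java.security.SecureRandom;

public class SeedTiming {

    // Times a single generateSeed call and returns the elapsed milliseconds.
    // numBytes is an arbitrary size picked for the experiment.
    static long timeGenerateSeedMillis(SecureRandom random, int numBytes) {
        long start = System.nanoTime();
        byte[] seed = random.generateSeed(numBytes); // may block on a blocking entropy source
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
        if (seed.length != numBytes) {
            throw new IllegalStateException("unexpected seed length: " + seed.length);
        }
        return elapsedMillis;
    }

    public static void main(String[] args) {
        SecureRandom random = new SecureRandom();
        System.out.println("algorithm: " + random.getAlgorithm());
        // Repeat a few times: the delay tends to show up once the entropy pool is drained.
        for (int i = 1; i <= 5; i++) {
            System.out.println("generateSeed(8) call " + i + ": "
                    + timeGenerateSeedMillis(random, 8) + " ms");
        }
    }
}
```

Running this with and without -Djava.security.egd=file:///dev/urandom on the affected host should make any difference visible. The timings depend heavily on the kernel version and on how busy the machine is.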

The easy workaround is to set a JVM system property, i.e. pass -Djava.security.egd=file:///dev/urandom on the Java command line. This works because reads from /dev/urandom do not block even in the absence of entropy and simply continue to return (pseudo-)random bytes of lower quality.
There is also an alternative, roughly equivalent setting in the java.security file, as described in point 1 below.
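When it is unclear which of the two settings a particular JVM has picked up, it can be printed from inside the JVM. The sketch below assumes only the standard System.getProperty and java.security.Security.getProperty calls; the class name EntropySourceCheck and the helper method are made up for illustration. It reports what is configured, not which device the provider ends up reading.

```java
import java.security.SecureRandom;
import java.security.Security;

public class EntropySourceCheck {

    // Returns the configured seed source. The java.security.egd system property
    // takes precedence when set; otherwise the securerandom.source entry from
    // the java.security file applies.
    static String configuredSource() {
        String egd = System.getProperty("java.security.egd");
        if (egd != null && !egd.isEmpty()) {
            return egd + " (from -Djava.security.egd)";
        }
        String source = Security.getProperty("securerandom.source");
        if (source == null || source.isEmpty()) {
            return "not configured";
        }
        return source + " (from java.security)";
    }

    public static void main(String[] args) {
        System.out.println("seed source: " + configuredSource());
        // The default algorithm gives a hint about which implementation is in use.
        System.out.println("default SecureRandom algorithm: " + new SecureRandom().getAlgorithm());
    }
}
```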

The practical upshot is that one of the above settings is recommended even if no Connection reset exceptions are encountered. This is because some unnecessary blocking of the connect may still occur, though it may not always last long enough to trigger a reset from the server. Certainly this can cut down the database connect time in most cases. In theory, there could be security risks arising from the use of /dev/urandom, but this may not be the weakest link in the app security chain.

Though the problem can be worked around relatively easily, it is probably worthwhile to get some more background, as it is not entirely obvious that a connection reset from the database server can be directly related to a random seed generation call from Java.

man urandom provides a lot more detail on /dev/random and /dev/urandom but an easy way of seeing the difference in behavior is to try the following a few times on Linux.

hexdump /dev/random
and
hexdump /dev/urandom

The first will typically produce output slowly, while the second just zips through. The number of bits of entropy available at any point can be seen using cat /proc/sys/kernel/random/entropy_avail
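The same counter can be logged from the Java job itself, which is handy for correlating slow connects with a drained pool. This is a minimal sketch: the class name EntropyAvail and the convention of returning -1 when the file cannot be read (for example on a non-Linux system) are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class EntropyAvail {

    static final Path ENTROPY_FILE = Paths.get("/proc/sys/kernel/random/entropy_avail");

    // Returns the kernel's entropy estimate in bits, or -1 if it cannot be read.
    static int readEntropyBits() {
        if (!Files.isReadable(ENTROPY_FILE)) {
            return -1;
        }
        try {
            String text = new String(Files.readAllBytes(ENTROPY_FILE), StandardCharsets.US_ASCII);
            return Integer.parseInt(text.trim());
        } catch (IOException | NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println("entropy_avail: " + readEntropyBits() + " bits");
    }
}
```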

Note: Mac OS X uses a different mechanism, and the man page (man urandom) mentions the use of the Yarrow pseudo-random number generator (PRNG), with the result that /dev/random and /dev/urandom are equivalent and reads do not block. Things are similar for Windows.

Still, there are further twists in the tale that are a bit tedious to get into but it may be worth mentioning some of the key ones.

1. The config file $JAVA_HOME/jre/lib/security/java.security has a setting for securerandom.source that does not behave as expected:

securerandom.source=file:/dev/urandom
#
# The entropy gathering device is described as a URL and can also
# be specified with the system property "java.security.egd". For example,
#  -Djava.security.egd=file:/dev/urandom
# Specifying this system property will override the securerandom.source
# setting.

Although this setting should result in /dev/urandom being used for seed generation, that does not actually happen. One of the following values has the desired effect instead:
file:///dev/urandom
file:/dev/./urandom
file:/dev/../dev/urandom

This seems to be a Java quirk/bug, and the same holds for -Djava.security.egd.

2. SecureRandom has a nextBytes method for getting the actual random bytes. This is not the call that blocks. Instead, not surprisingly, it is getSeed or generateSeed that can potentially block when /dev/random is in effect.

3. Virtualized environments tend to suffer from a lack of entropy:
http://www.phoronix.com/scan.php?page=news_item&px=MTI5NzY
http://www.forbes.com/2009/07/30/cloud-computing-security-technology-cio-network-cloud-computing.html
