java - Timeout errors from Tomcat talking to MongoDB in Azure
My system is a Tomcat 7 server running on Ubuntu, talking to a MongoDB cluster running on CentOS. I have this running on AWS and it is working fine.
I brought up the exact same thing on Azure and am having constant, seemingly random timeouts when the Tomcat app tries to query MongoDB. A typical error is:
    jan 31 08:13:54 catalina.out: jan 31, 2014 4:14:09 pm com.mongodb.dbportpool goterror
    jan 31 08:13:54 catalina.out: warning: emptying dbportpool xxx.cloudapp.net/xxx.xxx.xxx.xxx:21191 b/c of error
    jan 31 08:13:54 catalina.out: java.net.socketexception: connection timed out
    jan 31 08:13:54 catalina.out:     at java.net.socketinputstream.socketread0(native method)
    jan 31 08:13:54 catalina.out:     at java.net.socketinputstream.read(socketinputstream.java:146)
    jan 31 08:13:54 catalina.out:     at java.io.bufferedinputstream.fill(bufferedinputstream.java:235)
    jan 31 08:13:54 catalina.out:     at java.io.bufferedinputstream.read1(bufferedinputstream.java:275)
    jan 31 08:13:54 catalina.out:     at java.io.bufferedinputstream.read(bufferedinputstream.java:334)
    jan 31 08:13:54 catalina.out:     at org.bson.io.bits.readfully(bits.java:46)
    jan 31 08:13:54 catalina.out:     at org.bson.io.bits.readfully(bits.java:33)
    jan 31 08:13:54 catalina.out:     at org.bson.io.bits.readfully(bits.java:28)
    jan 31 08:13:54 catalina.out:     at com.mongodb.response.<init>(response.java:40)
    jan 31 08:13:54 catalina.out:     at com.mongodb.dbport.go(dbport.java:142)
    jan 31 08:13:54 catalina.out:     at com.mongodb.dbport.call(dbport.java:92)
    jan 31 08:13:54 catalina.out:     at com.mongodb.dbtcpconnector.innercall(dbtcpconnector.java:244)
    jan 31 08:13:54 catalina.out:     at com.mongodb.dbtcpconnector.call(dbtcpconnector.java:216)
    jan 31 08:13:54 catalina.out:     at com.mongodb.dbapilayer$mycollection.__find(dbapilayer.java:288)
    jan 31 08:13:54 catalina.out:     at com.mongodb.db.command(db.java:262)
    jan 31 08:13:54 catalina.out:     at com.mongodb.db.command(db.java:244)
    jan 31 08:13:54 catalina.out:     at com.mongodb.dbcollection.getcount(dbcollection.java:985)
    jan 31 08:13:54 catalina.out:     at com.mongodb.dbcollection.getcount(dbcollection.java:956)
    jan 31 08:13:54 catalina.out:     at com.mongodb.dbcollection.getcount(dbcollection.java:931)
    jan 31 08:13:54 catalina.out:     at com.mongodb.dbcollection.count(dbcollection.java:878)
    jan 31 08:13:54 catalina.out:     at com.eweware.service.base.store.impl.mongo.dao.basedaoimpl._exists(basedaoimpl.java:788)
    jan 31 08:13:54 catalina.out:     at com.eweware.service.base.store.impl.mongo.dao.groupdaoimpl._exists(groupdaoimpl.java:18)
I am using Java driver 2.11.4 and initializing it as follows:
    builder.autoConnectRetry(true)
           .connectionsPerHost(10)
           .writeConcern(WriteConcern.FSYNCED)
           .connectTimeout(30000)
           .socketKeepAlive(true);
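For context on the last option: socketKeepAlive(true) asks the driver to enable SO_KEEPALIVE on its sockets, but it does not choose how long a connection may sit idle before the first probe is sent; on Linux that timing comes from the operating system. A minimal sketch of the same flag on a plain java.net.Socket, using only the JDK (the class and method names here are made up for illustration, and the local listener is just a stand-in for a MongoDB host):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class, not part of the MongoDB driver.
class KeepAliveDemo {

    // Connects to host:port, turns on SO_KEEPALIVE, and reports whether the flag is set.
    static boolean connectWithKeepAlive(String host, int port, int connectTimeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            // Enables keepalive probes; WHEN they start is decided by the OS,
            // not by this call.
            socket.setKeepAlive(true);
            return socket.getKeepAlive();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Opens a throwaway listener on a free local port and connects to it.
    static boolean demoAgainstLocalListener() {
        try (ServerSocket server = new ServerSocket(0)) {
            return connectWithKeepAlive("127.0.0.1", server.getLocalPort(), 30000);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("SO_KEEPALIVE enabled: " + demoAgainstLocalListener());
    }
}
```

The flag being on says nothing about whether a probe is sent soon enough to keep an idle connection open, which is why the kernel setting matters as well.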
In reading around the interwebs, I saw some materials suggesting there is an Azure issue, along with some C# suggestions, but I have not seen anything on how to correct this for Java.
Some more details:
- This happens whether MongoDB is a single node or a replica set.
- This happens whether or not the server is under load. In fact, the server seems to perform better under load than from a cold start. Under constant load the timeouts seem to subside.
- The timeouts appear random, in that there is no discernible pattern in which calls fail or when they fail. It can go an hour with no issues, and at other times every call will fail.
- If I retry the calls on a timeout error, they eventually work. Sometimes it takes >100 retries, other times a single one will work.
Here is the retry code I am trying:
    private DBObject findOneRetry(DBObject criteria, DBObject fields, DBCollection collection) throws SystemErrorException {
        DBObject obj = null;
        // Loop up to and including MAX_RETRIES so the final attempt can reach the throw below.
        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            try {
                obj = collection.findOne(criteria, fields); // getting SocketException inside here
                return obj;
            } catch (Exception e) {
                if (attempt >= MAX_RETRIES) {
                    // Out of attempts: surface the last failure to the caller.
                    throw new SystemErrorException(makeErrorMessage("findOneRetry", "find", attempt, e, null), e, ErrorCodes.SERVER_DB_ERROR);
                } else {
                    logger.warning(getClass().getName() + ": findOneRetry failed, will retry in attempt #" + attempt + " in collection " + _getCollection());
                }
            }
        }
        return obj;
    }
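A bounded retry of this kind can also be pulled out into a small generic helper, so that every query does not need its own loop. The following is only a sketch under a few assumptions: Java 8 or later for the lambda syntax, failures that surface as unchecked exceptions, and a simple linear pause between attempts. The Retry class, withRetries, and the backoff parameter are hypothetical names, not part of the MongoDB driver.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of the MongoDB driver.
class Retry {

    // Runs op up to maxAttempts times. After each failure except the last it
    // sleeps backoffMs * attempt milliseconds; the last failure is rethrown.
    static <T> T withRetries(Supplier<T> op, int maxAttempts, long backoffMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(backoffMs * attempt);
                    } catch (InterruptedException ie) {
                        // Preserve the interrupt flag and stop retrying.
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("interrupted while waiting to retry", ie);
                    }
                }
            }
        }
        throw last;
    }
}
```

With something like this, the call inside the method above might become Retry.withRetries(() -> collection.findOne(criteria, fields), MAX_RETRIES, 200). Retrying only works around the symptom, so it is no substitute for finding out why the connections time out.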
Any suggestions on how to correct this?
Thanks in advance!
This is due to a high tcp_keepalive_time on the CentOS server.
On the mongos server:
    sudo nano /proc/sys/net/ipv4/tcp_keepalive_time
Change 7200 to 60.
Restart the Azure instance.
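The reasoning behind lowering the value can be illustrated with a small check: a keepalive probe only helps if it is sent before whatever sits between Tomcat and MongoDB gives up on an idle connection. The sketch below assumes an idle cutoff of 240 seconds purely for illustration; that figure is not from this thread and should be replaced with whatever your environment actually enforces. The class and method names are made up.

```java
// Hypothetical helper for reasoning about keepalive timing.
class KeepaliveMath {

    // Parses the contents of /proc/sys/net/ipv4/tcp_keepalive_time (a number of seconds).
    static long parseSeconds(String rawFileContents) {
        return Long.parseLong(rawFileContents.trim());
    }

    // True if the first keepalive probe goes out before an idle connection would be dropped.
    static boolean probeArrivesInTime(long keepaliveTimeSeconds, long idleTimeoutSeconds) {
        return keepaliveTimeSeconds < idleTimeoutSeconds;
    }

    public static void main(String[] args) {
        long assumedIdleTimeout = 240; // ASSUMPTION: illustrative value only
        System.out.println("7200s keepalive in time? " + probeArrivesInTime(parseSeconds("7200\n"), assumedIdleTimeout));
        System.out.println("60s keepalive in time?   " + probeArrivesInTime(parseSeconds("60\n"), assumedIdleTimeout));
    }
}
```

Under that assumption the 7200-second default comes far too late, while 60 seconds leaves plenty of margin. That would also fit the observation that a busy server, whose connections are rarely idle, sees fewer timeouts.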
Update: to make sure the VM always has this value of tcp_keepalive_time, add the line:
    bash -c 'echo 60 > /proc/sys/net/ipv4/tcp_keepalive_time'
to:
    /etc/rc.d/rc.local
Update to the update: on flavors of Linux that have an /etc/sysctl.d/ directory, create a file in that directory, e.g. mongo.conf, containing:
    net.ipv4.tcp_keepalive_time = 60
Then run:
    sysctl -p /etc/sysctl.d/mongo.conf
Verify the change with:
    sysctl net.ipv4.tcp_keepalive_time
This should survive reboots on the RedHat-based and Debian-based systems I've come across. You'll want to be sure to check /etc/sysctl.conf and the other files in /etc/sysctl.d to see if the variable is being set anywhere else, and make the appropriate changes.
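Checking for other assignments of the variable can be scripted. Here is a rough sketch in Java that looks through /etc/sysctl.conf and the *.conf files in /etc/sysctl.d for lines that set net.ipv4.tcp_keepalive_time. The class name is made up, and the parsing is deliberately simple: it treats lines starting with # or ; as comments and only recognizes the dotted form of the key, so treat it as a starting point rather than a complete sysctl parser.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for finding competing sysctl settings.
class SysctlScan {
    static final String KEY = "net.ipv4.tcp_keepalive_time";

    // True if the line is a non-comment "key = value" assignment of KEY.
    static boolean assignsKey(String line) {
        String trimmed = line.trim();
        if (trimmed.startsWith("#") || trimmed.startsWith(";")) {
            return false;
        }
        int eq = trimmed.indexOf('=');
        return eq > 0 && trimmed.substring(0, eq).trim().equals(KEY);
    }

    // Returns "file: line" for every assignment of KEY found in the given files.
    static List<String> findAssignments(List<Path> files) {
        List<String> hits = new ArrayList<>();
        for (Path file : files) {
            if (!Files.isRegularFile(file)) {
                continue;
            }
            try {
                for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                    if (assignsKey(line)) {
                        hits.add(file + ": " + line.trim());
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Path> files = new ArrayList<>();
        files.add(Paths.get("/etc/sysctl.conf"));
        Path dir = Paths.get("/etc/sysctl.d");
        if (Files.isDirectory(dir)) {
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.conf")) {
                for (Path p : stream) {
                    files.add(p);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        for (String hit : findAssignments(files)) {
            System.out.println(hit);
        }
    }
}
```

If more than one file shows up in the output, the settings may override one another depending on the order in which they are loaded, so it is simplest to keep the assignment in a single place.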