2011/09/20

Go away legendaries!

A few days ago I found a performance issue in an SOA architecture.

JEE Architecture (simplified)


Browser ----(Comet)----> WEB ----(CXF/SOAP)----> SOA ----> SGBD

A real-time (RT) like implementation was required to deliver notifications to connected front-end users. Because the backend is a SOAP server, true RT is not possible between the WEB and SOA instances.
The existing implementation worked as follows: one thread is created on each session creation event. This thread, located in the WEB instance, periodically performs a SOAP request and is tied to the lifecycle of the user's session.
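
To make the per-session design concrete, here is a minimal sketch in plain Java. It is not the actual application code: the class name `PerSessionPoller`, the `soapCall` callback (a stand-in for the CXF/SOAP request) and the two `onSession...` hooks are assumptions made for illustration. In a real web application they would be wired to the container's session events.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Sketch of the old approach: one polling thread per user session. */
class PerSessionPoller {
    private final Map<String, Thread> pollers = new ConcurrentHashMap<>();
    private final Consumer<String> soapCall; // hypothetical stand-in for the SOAP request
    private final long periodMillis;

    PerSessionPoller(Consumer<String> soapCall, long periodMillis) {
        this.soapCall = soapCall;
        this.periodMillis = periodMillis;
    }

    /** Assumed to be called from the session creation event. */
    void onSessionCreated(String sessionId) {
        Thread t = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    soapCall.accept(sessionId);   // one backend request per user per period
                    Thread.sleep(periodMillis);
                }
            } catch (InterruptedException e) {
                // The session ended: let the thread finish.
            }
        }, "poller-" + sessionId);
        t.setDaemon(true);
        pollers.put(sessionId, t);
        t.start();
    }

    /** Assumed to be called from the session destruction event. */
    void onSessionDestroyed(String sessionId) {
        Thread t = pollers.remove(sessionId);
        if (t != null) {
            t.interrupt();
        }
    }

    /** Number of polling threads currently registered (one per live session). */
    int threadCount() {
        return pollers.size();
    }
}
```

With this shape, the number of threads and of periodic backend requests grows linearly with the number of connected users.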

I suggested moving to another implementation: a single thread that manages the notifications of all users.
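
A single-thread design could look roughly like the sketch below. Everything named here is hypothetical: `SharedNotificationPoller`, the `fetchAll` function (assumed to retrieve pending notifications for a set of sessions, for example in one backend request) and the `deliver` callback (assumed to push messages to the browser over Comet). The post does not say whether the real code batches its SOAP requests, so treat the batching as an assumption.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Sketch of the suggested approach: one scheduled thread serves all sessions. */
class SharedNotificationPoller {
    private final Set<String> sessions = ConcurrentHashMap.newKeySet();
    private final Function<Set<String>, Map<String, List<String>>> fetchAll; // hypothetical backend call
    private final BiConsumer<String, List<String>> deliver;                  // hypothetical push to the browser
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    SharedNotificationPoller(Function<Set<String>, Map<String, List<String>>> fetchAll,
                             BiConsumer<String, List<String>> deliver) {
        this.fetchAll = fetchAll;
        this.deliver = deliver;
    }

    void register(String sessionId)   { sessions.add(sessionId); }
    void unregister(String sessionId) { sessions.remove(sessionId); }

    /** One polling cycle: a single fetch for every registered session. */
    void pollOnce() {
        if (sessions.isEmpty()) {
            return;
        }
        Map<String, List<String>> pending = fetchAll.apply(new HashSet<>(sessions));
        pending.forEach((id, messages) -> {
            if (sessions.contains(id)) {          // skip sessions that ended meanwhile
                deliver.accept(id, messages);
            }
        });
    }

    /** Runs pollOnce() periodically on the single scheduler thread. */
    void start(long periodMillis) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                pollOnce();
            } catch (RuntimeException e) {
                // Swallow the failure so that later cycles still run.
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

Here the thread count stays constant whatever the number of users; only the size of the session set changes.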

Performance Issue?

Does this new implementation perform better on every performance facet: CPU, memory and response time? Is CPU usage better with several small tasks?

Benchmark

Let's look at the JMX results for different use cases, all with these settings:
TT=5s (think time)
duration=270s (total time, including loading)
1 of 2 cores shared by WEB and SOA (so only 50% CPU available)
SOA and WEB on the same instance (so no network connections)
SGBD (the DBMS) is MySQL on the LAN, ping 19ms

Old implementation

First, with the existing implementation, 100 VU

WEB (lightweight) side results, 100VU

SOA side results, 100VU

All available CPU is used, the SOA thread pool reaches 54 concurrent threads, and the GC ran very frequently.

Next scenario, same implementation, 200 VU



WEB (lightweight) side results, 200VU

SOA side results, 200VU

Again, all available CPU is used, the SOA thread pool reaches 70 concurrent threads, and the GC ran as fast as possible once the 256M memory limit was reached. In addition, we see 50% more memory usage on the client side due to thread management.

Next scenario, same implementation, 200 VU, plenty of memory

As the GC ran too many times, I increased the SOA memory to a limit that could not be reached.
Client side

SOA side
In this implementation, GC was not the root cause of the CPU overhead.

New implementation

For the new implementation, to keep this post short, I will show only the 200VU results.
WEB (lightweight) side results, 200VU

SOA side results, 200VU
That's it: barely 1% CPU used on both sides, only 2 GC runs on SOA, and its initial thread pool (30) did not grow.

Conclusion

In addition, the old implementation's results could be worse if the loopback interface were not used, because of the HTTP network connections.
I hope this showcase will chase away this kind of IT urban legend for good.