profiling

Profiling, the extrapolation of information about something, based on known qualities, may refer specifically to: [read more at http://en.wikipedia.org/wiki/Profiling]

  • Still not enough, we were forced to profile the Java code and make some big changes.... (from part 1)

    Profiling

    You profile an application for speed, memory usage, or memory leaks (or any combination of those). Our application is fast enough at the moment; our major concern is optimizing memory usage and thus avoiding swapping memory to disk.

    Some words about architecture


    It is not possible to profile an application without a deep understanding of the architecture behind it. The Product Catalog is an innovative product: a meta model for storing insurance products in a database. A Product is read only and can derive instances that we call Policies. Policies are the holders of user data, containing no text, just values, and sharing a structure similar to the Product. This lets the Product know everything about (cross) default values, (cross) validations, multiple texts, attribute order/length/type, etc., and thus separates the definition (Products) from the implementation (Policies). Products and Policies can be fully described with Bricks and Attributes in a tree structure.

    Reduce the number of objects created


    Looking at the code, we saw that too many Products were loaded in memory (17 Products amount to roughly 15,000 objects: Attributes, Bricks, Texts, Values and ValueRanges). While this clearly gives a speed advantage on an application server, it simply kills the offline platform with its 1 GB of RAM (remember, the memory really free is 500 MB).
    The problem is that Attributes and Bricks use, or can use, a lot of fields/meta data in the database, which translate into simple Java types in memory (Strings for UUIDs and for meta data keys and values). We started by looking at the profiler and at the 100 MB used by the product cache.
    Reducing this number of objects was the first priority. A lot of them are meta data which are common and spread across the Product tree in memory. Since avoiding the creation of unneeded objects is always recommended, we decided to remove duplicate elements in the tree by singularizing them (sharing a single instance). This is possible because the Product is read only and made of identical meta data keys and meta data values.

    Entropy and cardinality of meta data
    An Attribute may have an unlimited number of meta texts (among other things); common meta data keys are the "long", "short" and "help" text descriptions in 4 languages (en_us, fr_ch, de_ch, it_ch). While this is not a problem in the database, it makes the Product object tree quite huge in the Product cache (which contains the Product Value Objects). Counting some of them in the database returns stunning results.
    We found 60,000 "long" texts, which translate into 60,000 String text keys and 60,000 String text values (the worst case scenario, since text values may not all be reusable). Reducing this number of objects is done quite easily by not creating new instances of String, Decimal and Integer objects and by always returning the same object instance (we keep them in a Map and return either a new instance or a previously cached one).
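
    Since the technique boils down to a canonicalizing cache, here is a minimal sketch of it, assuming a hypothetical ValueInterner helper (class and method names are illustrative, not the actual Product Catalog API):

    import java.util.HashMap;
    import java.util.Map;

    /**
     * Canonicalizing cache: always returns the same instance for equal values,
     * so thousands of identical meta data keys/values share a single object.
     * Safe in our case because Products are read only.
     */
    public final class ValueInterner {

        private final Map<Object, Object> cache = new HashMap<Object, Object>();

        @SuppressWarnings("unchecked")
        public synchronized <T> T intern(T value) {
            if (value == null) {
                return null;
            }
            Object canonical = cache.get(value);
            if (canonical == null) {
                cache.put(value, value);   // the first occurrence becomes the canonical instance
                canonical = value;
            }
            return (T) canonical;          // every later caller gets the same instance back
        }
    }

    While building the Product tree, every meta data key or value read from the database would then be passed through intern(), for example interner.intern(resultSet.getString("lang")), so that "de_ch" exists only once in memory.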

    Large object cardinality but poor entropy
    By running two or three SQL statements and trying to distinguish the real distinct values, we found that a lot of this meta data is made of a relatively small number of different values. By just storing a limited number of Strings like "0", "1", "2" ... "99", "default", "long", "short", "de_ch", "fr_ch", we reached a cache efficiency and object instance reuse of 99%.
    After that "small" change in the way value objects (VO) are created and connected, a Java String object that previously contained "de_ch" and existed 10,000 times in memory is now replaced across all Attributes/Bricks by one single instance!

     The gain is simply phenomenal: memory usage drops by more than 50%.

    Reducing the number of objects in memory 
    Instead of storing thousands of Product text Strings in memory, we decided to allocate them on disk, using the Java reflection API and a Dynamic Proxy.

    The idea is to save all Strings in one or more files on disk, the position and length of each text being saved in the corresponding Value Object. So basically we trade the space used by a String in memory for a long (the String position in the file, relative to the start of the file) and an int (the length of the String) primitive type.

    References:  Proxy  - InvocationHandler
    Summary: Java String disk-based allocation
    Code snippet: soon
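
    A minimal sketch of the idea, assuming a hypothetical Text interface and a single RandomAccessFile store (the real implementation may differ): only a long and an int stay in memory, and the String itself is read back from disk on demand through a Dynamic Proxy.

    import java.io.RandomAccessFile;
    import java.lang.reflect.InvocationHandler;
    import java.lang.reflect.Method;
    import java.lang.reflect.Proxy;

    /** Value Object view of a text; only getValue() is shown in this sketch. */
    interface Text {
        String getValue();
    }

    /** InvocationHandler keeping only a long (position) and an int (length) in memory. */
    class DiskTextHandler implements InvocationHandler {

        private final RandomAccessFile store; // shared file containing all the texts
        private final long position;          // offset of this text in the file
        private final int length;             // length of this text in bytes

        DiskTextHandler(RandomAccessFile store, long position, int length) {
            this.store = store;
            this.position = position;
            this.length = length;
        }

        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            if ("getValue".equals(method.getName())) {
                byte[] buffer = new byte[length];
                synchronized (store) {
                    store.seek(position);    // jump to the saved offset
                    store.readFully(buffer); // read the text back from disk on demand
                }
                return new String(buffer, "UTF-8");
            }
            throw new UnsupportedOperationException(method.getName());
        }

        /** Wraps a (position, length) pair into a Text proxy. */
        static Text create(RandomAccessFile store, long position, int length) {
            return (Text) Proxy.newProxyInstance(
                    Text.class.getClassLoader(),
                    new Class<?>[] { Text.class },
                    new DiskTextHandler(store, position, length));
        }
    }

    Methods like toString(), equals() or hashCode() would need to be handled as well; they are left out to keep the sketch short.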

    Use better data structures
    Java has a lot of quality libraries; the Apache Commons Collections are well known. Javolution is a real-time library which reduces the garbage collector impact. We have used FastTable and FastMap where it makes sense.

    For example, the class FastTable has the following advantages over the widely used java.util.ArrayList:
    • No large array allocation (for large collections, multi-dimensional arrays are employed). The garbage collector is not stressed with large chunks of memory to allocate (which are likely to trigger a full garbage collection due to memory fragmentation).
    • Supports concurrent access/iteration without synchronization if the collection values are not removed/inserted.
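
    A small usage example, assuming the Javolution 5.x style API (class names and packages may differ between Javolution versions):

    import java.util.List;
    import java.util.Map;

    import javolution.util.FastMap;
    import javolution.util.FastTable;

    public class FastCollectionsExample {

        public static void main(String[] args) {
            // FastTable: a List backed by small array blocks instead of one big array,
            // so growing it never forces the allocation of a single huge memory chunk.
            List<String> languages = new FastTable<String>();
            languages.add("de_ch");
            languages.add("fr_ch");
            languages.add("it_ch");
            languages.add("en_us");

            // FastMap: a Map with predictable allocation behaviour, handy for meta data lookups.
            Map<String, String> metaData = new FastMap<String, String>();
            metaData.put("default", "0");
            metaData.put("long", "a long text key");

            System.out.println(languages.size() + " languages, " + metaData.size() + " meta data entries");
        }
    }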

    Different caching strategies
    By design, the ProductCatalog is able to use many caching strategies. One of them, named "NoCache", limits the number of objects in memory to the bare minimum and redirects all product access to the database. In a single-user environment, and since products reside in only 4 tables (so only 4 selects to read all data from the DB and some VOs to rebuild the tree are needed), the throughput is more than enough.
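
    A rough sketch of what such a pluggable strategy can look like (all names are illustrative, not the actual ProductCatalog API):

    /** Illustrative placeholder for the read-only product tree. */
    interface Product { }

    /** Illustrative DAO: rebuilds a Product tree from the 4 database tables. */
    interface ProductDao {
        Product load(String uuid);
    }

    /** Pluggable caching strategy for product access. */
    interface ProductCacheStrategy {
        Product getProduct(String uuid);
    }

    /** Server side: keeps every loaded Product in memory, fast but memory hungry. */
    class FullCacheStrategy implements ProductCacheStrategy {
        private final java.util.Map<String, Product> cache = new java.util.HashMap<String, Product>();
        private final ProductDao dao;

        FullCacheStrategy(ProductDao dao) { this.dao = dao; }

        public Product getProduct(String uuid) {
            Product product = cache.get(uuid);
            if (product == null) {
                product = dao.load(uuid);  // only done once per product
                cache.put(uuid, product);
            }
            return product;
        }
    }

    /** Offline side ("NoCache"): keeps nothing in memory, every access goes to the database. */
    class NoCacheStrategy implements ProductCacheStrategy {
        private final ProductDao dao;

        NoCacheStrategy(ProductDao dao) { this.dao = dao; }

        public Product getProduct(String uuid) {
            return dao.load(uuid);  // acceptable throughput for a single desktop user
        }
    }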

    More to come...



    References
  • We have been working for 3 days on tuning a big application: 

    • A client-server, enterprise-grade application,
    • Runs on 2 JVMs (Tomcat/application server) with 4 GB of RAM each!
    • Runs on 2 dual-core AMD 64-bit servers,
    • 64-bit Linux,
    • Has a lot of parallel users; more than 10,000 are registered,
    • Uses a product meta model which separates definition from implementation data,
    • JavaServer Faces, Java, Ajax.
     This application is simply consuming too much memory for the offline version. Our objective is to make that big application run:

    • With the same code as above,
    • On Windows XP,
    • On an IBM T40 (Intel Pentium M 1.6 GHz, DDR266/PC2100),
    • In 1 JVM with 500 MB in Tomcat,
    • With 1 GB of physical RAM,
    • For 1 desktop user who may also run Lotus Notes and Microsoft Office at the same time...
    There are already a lot of good resources and valuable advice on the internet (Google is your friend :-)). Before digging into the code, and since the code is already in production, we did some tuning on the components first.
    By tuning each component involved, one after the other, we follow the principle: let's do some quick wins first, before changing algorithms and increasing the risk of breaking something....
    In order to back up each change with some statistics, the first step was to develop a test case with Web Stress Tool (commercial), but Apache JMeter (... replace with your favorite web testing tool) would have done the job.

    At the OS Level

    By trying to convince the company to turn the anti-virus off for some files and directories. It was scanning XHTML, JavaScript, XML, class files, images, so nearly everything... during EACH file access. Note that the user has no Windows rights to alter these files.


    MySQL 5 (we are already using the latest 5.x branch, by luck)

    By removing TCP database access and using named pipes only (+30 to +50% performance),
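
    Assuming the application connects through MySQL Connector/J, switching from TCP to the Windows named pipe is mostly a matter of the JDBC URL; a sketch (database name and credentials are placeholders, and the server must be started with --enable-named-pipe):

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class NamedPipeConnectionExample {

        public static void main(String[] args) throws Exception {
            Class.forName("com.mysql.jdbc.Driver");   // pre-JDBC4 driver registration

            // No host/port in the URL: the connection goes through the named pipe instead of TCP.
            String url = "jdbc:mysql:///productcatalog"
                    + "?socketFactory=com.mysql.jdbc.NamedPipeSocketFactory";

            Connection connection = DriverManager.getConnection(url, "user", "password");
            System.out.println("Connected over named pipe: " + !connection.isClosed());
            connection.close();
        }
    }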

    By installing MySQL Enterprise Advisor and Monitor (you can request a free trial key here) and looking at what the advisor recommends. Attention: this tool has been developed for monitoring servers, and some of its recommendations are simply not always applicable. In our case we are constrained by memory, remember less than 500 MB, so we did not blindly follow the advice. Basic things were done, like adding indexes (where it makes sense, to avoid full table scans and reduce slow queries) and increasing buffers.

    By switching to MyISAM (multi-threaded with table locking) instead of InnoDB (multi-threaded with row locking), and also avoiding having other storage engines with different algorithms running in parallel.

    MyISAM is the default storage engine for the MySQL relational database management system. It is based on the older ISAM code but has many useful extensions. In recent MySQL versions, the InnoDB engine has widely started to replace MyISAM due to its support for transactions, referential integrity constraints, and higher concurrency. Each MyISAM table is stored on disk in three files. The files have names that begin with the table name and have an extension to indicate the file type. MySQL uses a .frm file to store the definition of the table, but this file is not a part of the MyISAM engine; instead it is a part of the server. The data file has a .MYD (MYData) extension. The index file has a .MYI (MYIndex) extension. [Wikipedia]

    InnoDB is a storage engine for MySQL, included as standard in all current binaries distributed by MySQL AB. Its main enhancement over other storage engines available for use with MySQL is ACID-compliant transaction support, similar to PostgreSQL, along with declarative referential integrity (foreign key support). InnoDB became a product of Oracle Corporation after their acquisition of Innobase Oy in October 2005. The software is dual licensed. It is distributed under the GNU General Public License, but can also be licensed to parties wishing to combine InnoDB in proprietary software. [Wikipedia]

    What are the differences, and why you may also want to use MyISAM for single-user applications...
    1. InnoDB recovers from a crash or other unexpected shutdown by replaying its logs. MyISAM must fully scan and repair or rebuild any indexes or possibly tables which had been updated but not fully flushed to disk. Since the InnoDB approach is approximately fixed time while the MyISAM time grows with the size of the data files, InnoDB offers greater perceived availability and reliability as database sizes grow.
    2. MyISAM relies on the operating system for caching reads and writes to the data rows while InnoDB does this within the engine itself, combining the row caches with the index caches. Dirty (changed) database pages are not immediately sent to the operating system to be written by InnoDB, which can make it substantially faster than MyISAM in some situations.
    3. InnoDB stores data rows physically in primary key order while MyISAM typically stores them mostly in the order in which they are added. This corresponds to the MS SQL Server feature of “Clustered Indexes” and the Oracle feature known as "index organized tables." When the primary key is selected to match the needs of common queries this can give a substantial performance benefit. For example, customer bank records might be grouped by customer in InnoDB but by transaction date with MyISAM, so InnoDB would likely require fewer disk seeks and less RAM to retrieve and cache a customer account history. On the other hand, inserting data in orders that differ substantially from primary key (PK) order will presumably require that InnoDB do a lot of reordering of data in order to get it into PK order. This places InnoDB at a slight disadvantage in that it does not permit insertion order based table structuring.
    4. InnoDB currently does not provide the compression and terse row formats provided by MyISAM, so both the disk and cache RAM required may be larger. A lower overhead format is available for MySQL 5.0, reducing overhead by about 20% and use of page compression is planned for a future version.
    5. When operating in fully ACID-compliant modes, InnoDB must do a flush to disk at least once per transaction, though it will combine flushes for inserts from multiple connections. For typical hard drives or arrays, this will impose a limit of about 200 update transactions per second. If you require higher transaction rates, disk controllers with write caching and battery backup will be required in order to maintain transactional integrity. InnoDB also offers several modes which reduce this effect, naturally leading to a loss of transactional integrity. MyISAM has none of this overhead, but only because it does not support transactions. [WikiPedia]
    For us, the speed of MyISAM clearly outweighs its drawbacks for a desktop application.

    JSF tuning

    Obvious settings here (JSF is lacking finer-grained tuning settings; serialization occurs during the model life cycle and consumes memory and CPU, so we may dig deeper later):
    • javax.faces.STATE_SAVING_METHOD to server
    • org.apache.myfaces.COMPRESS_STATE_IN_SESSION to true, since memory is the biggest constraint for us
    • org.apache.myfaces.NUMBER_OF_VIEWS_IN_SESSION to 0
    • facelets.BUFFER_SIZE to 8192
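
    For reference, these settings end up as context parameters in the web application's web.xml; a sketch with the values above:

    <!-- web.xml excerpt: JSF / MyFaces / Facelets tuning parameters listed above -->
    <context-param>
        <param-name>javax.faces.STATE_SAVING_METHOD</param-name>
        <param-value>server</param-value>
    </context-param>
    <context-param>
        <param-name>org.apache.myfaces.COMPRESS_STATE_IN_SESSION</param-name>
        <param-value>true</param-value>
    </context-param>
    <context-param>
        <param-name>org.apache.myfaces.NUMBER_OF_VIEWS_IN_SESSION</param-name>
        <param-value>0</param-value>
    </context-param>
    <context-param>
        <param-name>facelets.BUFFER_SIZE</param-name>
        <param-value>8192</param-value>
    </context-param>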

    Tomcat tuning

    Nothing big can be done here... For me, Tomcat is really missing a dynamic web application loader: Tomcat simply loads all applications found in /webapps at startup, even if they are never used; they are never removed from memory or serialized to disk. Tomcat 4.1 seems to have a memory footprint of 22 MB; moving to the latest Tomcat 6.0 is too big a change for us now, but we might reconsider it in the future. Removing Java libraries which are not used from WEB-INF/lib, by trial and error, can still save some precious bytes, as it is pretty common when you use frameworks to end up with jars you do not need, for example: junit.jar, JDBC drivers, jms.jar, ... Moving common libraries to shared/lib may also help remove duplicate jars from the webapp class loaders and from memory.


    JVM tuning

    Java 1.5 and Java 1.6 have made a lot of progress, and the JIT compiler found in Java 1.5/1.6 is getting more and more aggressive... The basic rule is to turn on the JVM GC log (by adding -Xloggc:<file> [-XX:+PrintGCDetails]) and to analyze it offline with a tool like GCViewer (free). The JIT is doing a pretty good job, as the application runs faster and faster over time, but that is just a feeling ;-)
    By analyzing the GC logs we were able to optimize and avoid big misconfiguration mistakes; once more, a lot of articles and books are available on how to tune a JVM. Sadly, Java has no advisor at the moment, nor does it use genetic algorithms to tune itself... It remains a dream for now.

    By using an empirical approach, which means:
    1. change JVM parameters -> run the test cases -> decide if we give CPU away or minimize RAM usage -> go back to 1.

    We came down to the following exotic parameters (Xms and Xmx are not of any help here, since they really depend on your application and on how memory is managed internally):

     -XX:+AggressiveOpts -XX:-UseConcMarkSweepGC

    By the way, I have been using them with Eclipse + JDK 1.6 for months. The page "A Collection of JVM Options", compiled by Joseph D. Mocker (Sun Microsystems, Inc.), has been of great help during this stage.

    Still not enough, we were forced to profile the Java code and make some big changes....