Scalability

Scalability is the ability of a system, network, or process to handle a growing amount of work in a capable manner, or its ability to be enlarged to accommodate that growth. Read more at Wikipedia.

  • Resources such as JavaScript and CSS files can be compressed before being sent to the browser, improving network efficiency and application load time in certain cases. If you are not running Apache with mod_deflate or nginx in front of your web application, you may need to implement resource compression yourself….

    Wait! Don’t start writing your own filter to compress files like CSS, HTML, text, or JavaScript: it is far more difficult than you think to properly handle HTTP response headers, MIME types, and caching. In one sentence, don’t reinvent the wheel: use Ehcache, for example.

    Ehcache is an open source, standards-based cache used to boost performance, offload the database, and simplify scalability. Ehcache is robust, proven, and full-featured, which has made it the most widely used Java-based cache. It can scale from in-process with one or more nodes through to a mixed in-process/out-of-process configuration with terabyte-sized caches. For applications needing a coherent distributed cache, Ehcache uses the open source Terracotta Server Array.

    In the pom.xml of your project, add the following dependency on ehcache-web:

    <dependency>
        <groupId>net.sf.ehcache</groupId>
        <artifactId>ehcache-web</artifactId>
        <version>2.0.4</version>
    </dependency>

    In your web.xml, declare the filter and map it to the resource types you want compressed:

    <filter>
     <filter-name>CompressionFilter</filter-name>
     <filter-class>net.sf.ehcache.constructs.web.filter.GzipFilter</filter-class>
    </filter>
    <filter-mapping>
     <filter-name>CompressionFilter</filter-name>
     <url-pattern>*.css</url-pattern>
    </filter-mapping>
    <filter-mapping>
     <filter-name>CompressionFilter</filter-name>
     <url-pattern>*.html</url-pattern>
    </filter-mapping>
    <filter-mapping>
     <filter-name>CompressionFilter</filter-name>
     <url-pattern>*.js</url-pattern>
    </filter-mapping>
    <filter-mapping>
     <filter-name>CompressionFilter</filter-name>
     <url-pattern>/*</url-pattern>
    </filter-mapping>

    Read more at the Ehcache Web Caching page.
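    For the curious, the heavy lifting behind such a filter boils down to java.util.zip’s GZIP streams. The sketch below (my own illustration, not Ehcache’s code) shows the round trip a response body makes; everything else the filter handles for you — Accept-Encoding negotiation, Content-Encoding and Vary headers, MIME-type checks, caching — is exactly the part that is hard to get right by hand:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {
    // Compress a byte array with GZIP, as a compression filter does to a response body.
    static byte[] gzip(byte[] plain) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(plain);
        }
        return buf.toByteArray();
    }

    // Decompress, as the browser does when it sees Content-Encoding: gzip.
    static byte[] gunzip(byte[] packed) throws Exception {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws Exception {
        // A repetitive "CSS file" compresses very well.
        String css = ".box { margin: 0; padding: 0; }\n".repeat(200);
        byte[] plain = css.getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(plain);

        System.out.println("plain=" + plain.length + " gzipped=" + packed.length);
        System.out.println("lossless=" + css.equals(new String(gunzip(packed), StandardCharsets.UTF_8)));
    }
}
```

    Text-heavy resources like CSS and JavaScript routinely shrink by 70–90%, which is why the filter mappings above target exactly those types.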

    As a bonus, here is the equivalent Gzip configuration for the famous challenger among HTTP servers, nginx:

     ##
     # Gzip Settings
     ##
     gzip  on;
     gzip_http_version 1.1;
     gzip_vary on;
     gzip_comp_level 6;
     gzip_proxied any;
     gzip_types text/plain text/css application/json application/x-javascript
                text/xml application/xml application/xml+rss text/javascript
                application/javascript text/x-js;
     gzip_buffers 16 8k;
     gzip_disable "MSIE [1-6]\.(?!.*SV1)";
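    The gzip_comp_level 6 above is the usual sweet spot: higher levels cost noticeably more CPU for marginal byte savings. You can get a feel for the trade-off with java.util.zip.Deflater, which exposes the same zlib level scale nginx uses (a sketch for illustration, not nginx’s implementation):

```java
import java.util.zip.Deflater;

public class CompressionLevels {
    // Deflate `data` at the given zlib compression level (1 = fastest, 9 = smallest output).
    static int deflatedSize(byte[] data, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(data);
        deflater.finish();
        byte[] buf = new byte[data.length + 64];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf); // count compressed bytes produced
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        byte[] html = "<li class=\"item\">some repetitive markup</li>\n".repeat(500).getBytes();
        for (int level : new int[] {1, 6, 9}) {
            System.out.println("level " + level + ": " + deflatedSize(html, level) + " bytes");
        }
    }
}
```

    On typical markup the jump from level 1 to 6 saves real bytes, while 6 to 9 barely moves the needle, which is why 6 is a sane default for a busy server.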


    or, for the number one HTTP server, Apache, using mod_deflate (/etc/apache2/conf.d/deflate.conf):

    <Location />
    # Insert filter
    SetOutputFilter DEFLATE
    
    AddOutputFilterByType DEFLATE text/plain
    AddOutputFilterByType DEFLATE text/xml
    AddOutputFilterByType DEFLATE application/xhtml+xml
    AddOutputFilterByType DEFLATE text/css
    AddOutputFilterByType DEFLATE application/xml
    AddOutputFilterByType DEFLATE image/svg+xml
    AddOutputFilterByType DEFLATE application/rss+xml
    AddOutputFilterByType DEFLATE application/atom+xml
    AddOutputFilterByType DEFLATE application/x-javascript
    AddOutputFilterByType DEFLATE text/html
    
    # Netscape 4.x has some problems...
    BrowserMatch ^Mozilla/4 gzip-only-text/html
    
    # Netscape 4.06-4.08 have some more problems
    BrowserMatch ^Mozilla/4\.0[678] no-gzip
    
    # MSIE masquerades as Netscape, but it is fine
    BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
    # Don't compress images
    SetEnvIfNoCase Request_URI \
    \.(?:gif|jpe?g|png)$ no-gzip dont-vary
    
    # Make sure proxies don't deliver the wrong content
    Header append Vary User-Agent env=!dont-vary
    </Location>
  • At SD Forum 2006, Randy Shoup and Dan Pritchett, both with eBay, gave a presentation on eBay's architecture. Pritchett subsequently posted his presentation slides on his blog, The eBay Architecture.
    Predictably, the presentation contained a few inspiring statistics, such as:
    • 15,000 application instances running in eight data centers
    • 212,000,000 registered users
    • 1 billion page views per day
    • 26 billion SQL queries and updates per day
    • Over 2 petabytes of data
    • $1,590 worth of goods traded per second
    • Over 1 billion photos
    • 7 languages
    • 99.94% uptime

    Other stats in the presentation related to development process and features, such as:

    • Over 300 new features released each quarter
    • Over 100,000 lines of code released every two weeks

"According to the presentation, the goal of eBay's current architecture is to handle an additional ten-fold increase in traffic, something eBay expects to reach within a few short years. Another architecture objective is to be able to handle peak loads, and for components to gracefully degrade under unusual load or in the case of system failures." Read more HERE.

  • This PDF is a lot more technical, but reveals some of the challenges Wikipedia faces in maintaining its infrastructure and response times.
    Wikipedia is simply the biggest multilingual free-content encyclopedia on the Internet: over 7 million articles in over 200 languages, and still growing.


    [...]Started as Perl CGI script running on single server in 2001, site has grown into distributed platform, containing multiple technologies, all of them open. The principle of openness
    forced all operation to use free & open-source software only. Having commercial alternatives out of question, WikiPedia had the challenging task to build efficient platform of freely
    available components. [...]

    Once again, worth reading if you are into web development, performance, and scalability. It seems that lighttpd is more and more used for serving static files (HTML, images, JS, CSS, PDF...) instead of the venerable Apache.

    If you still have a few fresh neurons to burn, you can read about the overall system architecture and more HERE.
    The most important news here is that Wikipedia currently uses APC, so I chose the right PHP cache ;-)