infrastructure

Infrastructure is the basic physical and organizational structure needed for the operation of a society or enterprise, or the services and facilities necessary for an economy to function. Read more at Wikipedia.

  • Google’s Jeff Dean was one of the keynote speakers at an ACM workshop on large-scale computing systems, and discussed some of the technical details of the company’s mighty infrastructure, which is spread across dozens of data centers around the world. His presentation gives some insight into what is going on at Google, and how they have found innovative solutions in their never-ending quest for speed and bandwidth. The figures impressed me a lot!

    You will learn about some of their in-house technologies, namely:

    • Google File System (GFS): a scalable distributed file system for large distributed data-intensive applications.
    • MapReduce: a software framework introduced by Google to support distributed computing on large data sets on clusters of computers [Wikipedia]; see the Hadoop project for a free, open-source Java MapReduce implementation.
    • BigTable: a compressed, high-performance, proprietary database system built on the Google File System (GFS); see the Hadoop HBase project for something similar.
    • Their new project, Spanner: a storage and computation system that spans all of their data centers.
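To make the MapReduce idea from the list above a bit more concrete, here is a minimal single-process sketch in Python. It is not Google's or Hadoop's API: the function names (`map_words`, `reduce_counts`, `run_mapreduce`) are made up for illustration, and a real framework would spread the map and reduce calls across many machines instead of looping over them locally.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(document: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: sum all the counts emitted for a single word."""
    return word, sum(counts)


def run_mapreduce(
    documents: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Hypothetical local driver: map, group by key (shuffle), then reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for document in documents:
        for key, value in mapper(document):
            groups[key].append(value)  # shuffle: collect values per key
    return dict(reducer(key, values) for key, values in groups.items())


docs = ["the quick brown fox", "the lazy dog", "The fox"]
word_counts = run_mapreduce(docs, map_words, reduce_counts)
print(word_counts)
```

The point of splitting the work this way is that the map calls are independent of each other, and so are the reduce calls for different keys, which is what lets a cluster run them in parallel.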

    Read this great document online: http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf (if it disappears, ask me for a copy).

  • At SD Forum 2006, Randy Shoup and Dan Pritchett, both with eBay, gave a presentation on eBay's architecture. Pritchett subsequently posted his presentation slides on his blog, The eBay Architecture.
    Predictably, the presentation contained a few inspiring statistics, such as:
    • 15,000 application instances running in eight data centers
    • 212,000,000 registered users
    • 1 billion page views per day
    • 26 billion SQL queries and updates per day
    • Over 2 petabytes of data
    • $1,590 worth of goods traded per second
    • Over 1 billion photos
    • 7 languages
    • 99.94% uptime
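To put the daily totals above in perspective, here is a small back-of-the-envelope calculation in Python. It is only a sketch: it assumes the load is spread evenly over the day (real traffic has peaks) and a 365-day year, and the variable names are mine, not eBay's.

```python
SECONDS_PER_DAY = 24 * 60 * 60
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year

# Figures quoted from the presentation.
page_views_per_day = 1_000_000_000
sql_ops_per_day = 26_000_000_000
goods_usd_per_second = 1_590
uptime_fraction = 0.9994


def per_second(daily_total: float) -> float:
    """Average rate per second, assuming an even spread over 24 hours."""
    return daily_total / SECONDS_PER_DAY


page_views_per_second = per_second(page_views_per_day)
sql_ops_per_second = per_second(sql_ops_per_day)
sql_ops_per_page_view = sql_ops_per_day / page_views_per_day
goods_usd_per_day = goods_usd_per_second * SECONDS_PER_DAY
downtime_hours_per_year = (1 - uptime_fraction) * HOURS_PER_YEAR

print(f"page views per second:  {page_views_per_second:,.0f}")
print(f"SQL ops per second:     {sql_ops_per_second:,.0f}")
print(f"SQL ops per page view:  {sql_ops_per_page_view:.0f}")
print(f"goods traded per day:   ${goods_usd_per_day:,}")
print(f"downtime per year:      {downtime_hours_per_year:.2f} hours")
```

On average that works out to roughly 11,600 page views and 301,000 SQL operations per second, about 26 SQL operations per page view, and a little over 5 hours of downtime per year.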

    Other stats in the presentation relate to the development process and features, such as:

    • Over 300 new features released each quarter
    • Over 100,000 lines of code released every two weeks

    "According to the presentation, the goal of eBay's current architecture is to handle an additional ten-fold increase in traffic, something eBay expects to reach within a few short years. Another architecture objective is to be able to handle peak loads, and for components to gracefully degrade under unusual load or in the case of system failures."

 

  • If you want to know a bit more about the infrastructure used by Google...

  • I am looking for new hardware, probably an Athlon 64 with RAID 5 support and an NVIDIA chipset. It's a shame that nForce4 is not well supported on Linux. Anyway, I came across this Wikipedia page on Google's hardware. Quite interesting reading.

    Google platform



  • This PDF is a lot more technical, but reveals some of the challenges Wikipedia faces in maintaining its infrastructure and response times.
    Wikipedia is simply the biggest multilingual free-content encyclopedia on the Internet. Over 7 million articles in over 200 languages, and still growing.


    [...] Started as Perl CGI script running on single server in 2001, site has grown into distributed platform, containing multiple technologies, all of them open. The principle of openness
    forced all operation to use free & open-source software only. Having commercial alternatives out of question, Wikipedia had the challenging task to build efficient platform of freely
    available components. [...]

    Once again, worth reading if you are into web development, performance and scalability. It seems that lighttpd is increasingly used to serve static files (HTML, images, JS, CSS, PDF...) instead of the venerable Apache.

    If you still have some fresh neurons to burn, you can also read up on the overall system architecture and more.
    The most important news here is that Wikipedia currently uses APC, so I chose the right PHP cache ;-)
  • Kyle Cordes’s blog post on the “YouTube Scalability Talk”. It’s definitely worth reading!