Hi everyone,

Good list, Eugene.
Allow me to add a few inline comments on the points where I think there is more to it.


On 3/1/12 2:07 PM, Eugene Lidede (Synergy) wrote:
1)      Caching is king. Much like man shall not live on bread alone, so shan’t web servers functionality rely on disk-based databases alone… There are no alternatives, especially for high volume traffic. Try Googling:

Amen, but most web apps will gain more from front-end optimization than from back-end optimization.
See: http://www.youtube.com/watch?v=BTHvs3V8DBA

d.      Squid among others. Each has its best scenario implementation


Use Varnish instead of Squid; it is SO much more efficient (PHK is a genius).

2)      The web-facing side of your application ought to see your DB as storage and storage alone, not as a runtime working set as you would with a desktop application. If you need to join two or three tables to come up with content for a page, join those tables offline into a different table not while your users are staring at  their screens.


I disagree with this point. Precalculation (data warehousing, or whatever you want to call it) is an extreme measure. RDBMSs are extremely fast at combining relational data if you have the right indexes, and maintaining a separate cache inside your data storage is troublesome.
Personally I often (almost always) keep a server-side cache of the final HTML page, since despite being dynamic in nature, pages change rarely compared to how often they are read. But creating a separate pre-joined table is something I would do extremely rarely.
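
To make the HTML page cache idea concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the PageCache class, its TTL-based expiry and the render callback are hypothetical names, and a real setup would more likely use memcached or the cache that ships with your framework.

```python
import time


class PageCache:
    """Tiny in-memory cache of rendered HTML pages, keyed by URL path.

    Hypothetical helper for illustration; entries expire after a fixed TTL.
    """

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self._store = {}            # path -> (expires_at, html)

    def get_or_render(self, path, render):
        """Return cached HTML for path, calling render(path) only on a miss."""
        now = self.clock()
        entry = self._store.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]
        html = render(path)         # the expensive part: queries + templating
        self._store[path] = (now + self.ttl, html)
        return html

    def invalidate(self, path):
        """Drop a page when the underlying data changes."""
        self._store.pop(path, None)
```

The joins still run inside render, but only once per expiry period or invalidation instead of once per request.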

3)      Cache as much as you can at your application level. When you are done caching, kindly cache some more!

Note: Use of RDBMS to maintain user sessions or authentication beyond the initial name/password matching is 90’s programming, you need to enroll for that chipuka certification thingy! There is a good reason why servers come with several GBs of RAM nowadays as standard,


Sometimes old-school is the best school ;-)
Storing sessions and authentication in an RDBMS lets you easily migrate and scale live sessions across multiple servers!
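
A rough sketch of what I mean, in Python. SQLite is used here only so the example is self-contained; across several web servers you would point this at a shared database server instead. The table layout and the function names (open_store, create_session, load_session) are my own assumptions, not any framework's API.

```python
import json
import secrets
import sqlite3
import time


def open_store(path=":memory:"):
    """Open the session store and create the (hypothetical) sessions table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sessions ("
        "id TEXT PRIMARY KEY, data TEXT NOT NULL, expires_at REAL NOT NULL)"
    )
    return conn


def create_session(conn, data, ttl_seconds=1800, now=None):
    """Store session data and return a random session id for the cookie."""
    now = time.time() if now is None else now
    sid = secrets.token_hex(16)
    conn.execute(
        "INSERT INTO sessions (id, data, expires_at) VALUES (?, ?, ?)",
        (sid, json.dumps(data), now + ttl_seconds),
    )
    conn.commit()
    return sid


def load_session(conn, sid, now=None):
    """Return the session data, or None if the id is unknown or expired."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT data FROM sessions WHERE id = ? AND expires_at > ?",
        (sid, now),
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Because the state lives in the database rather than in one server's RAM, any server behind the load balancer can pick up a request for an existing session.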

5)      When you cannot for the life of you cache anymore, tell others to cache for you via http headers like:

a.       Cache-Control:

b.      Expires:

c.       Last-Modified:


Generally this should be the first thing to do rather than the last. And don't forget ETags; they are extremely useful for dynamic content.
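
As a minimal sketch of how ETags help with dynamic content (Python; make_etag and respond are hypothetical helpers, and hashing the rendered body is just one possible way to derive the tag): the server tags each response, and when the browser sends that tag back in If-None-Match and the page has not changed, it answers 304 with an empty body.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a strong ETag from the response body (quoted, as HTTP requires)."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def respond(body: bytes, if_none_match=None):
    """Return (status, headers, body) for a GET, honouring If-None-Match.

    Simplified: weak validators (W/"...") and the "*" form are not handled.
    """
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=0, must-revalidate"}
    if if_none_match is not None:
        client_tags = [tag.strip() for tag in if_none_match.split(",")]
        if etag in client_tags:
            # Client copy is still current: send headers only, no body.
            return 304, headers, b""
    return 200, headers, body
```

Note that this version still renders the page to compute the hash, so it saves bandwidth rather than server work; deriving the tag from a version number or last-update timestamp would avoid the rendering too.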

b.      The web browsers do not give marks for good – or poorly implemented “for”, “while” and “repeat” loops nor fancy problem solving code, instead they timeout. If you have complex logic to implement, do it via stored procedures – as the data goes into the DB not while its being retrieved, spare your scripts for rendering, spare your server some CPU cycles, and spare us the environment.


I have never been a fan of stored procedures. The work that needs to be done is the same; whether it is done in the database or in your code makes little difference in most cases. Database vendors, on the other hand, love to teach you how to write them, since once you have built stored procedures for "their" DB, your application is tied closely to that DBMS.
I would argue that pushing logic to the DB makes your solution less scalable: the DB is usually the bottleneck if/when you scale your application across multiple servers, and in that case you want the DB to do as little work as possible.

8)      Size matters:

b.      Eliminate whitespace in your generated HTML if you can. This is easy if you use templates for your pages that are filled in by the scripts. You could have indented sources on dev, that you “compile” to remove white-space and “check-out” into prod.


Personally I do this, but in the bigger picture the gain mostly disappears once you use compression: all those spaces compress to almost nothing.
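
If you want to check this on your own pages, a small Python sketch along these lines will do. The sample page and the crude strip_whitespace regex are my own stand-ins for illustration; real templates and real minifiers will give different numbers.

```python
import gzip
import re


def strip_whitespace(html: str) -> str:
    """Crude minifier: drop whitespace that sits between two tags."""
    return re.sub(r">\s+<", "><", html.strip())


def sizes(html: str) -> dict:
    """Byte counts for the page, raw and minified, with and without gzip."""
    raw = html.encode("utf-8")
    mini = strip_whitespace(html).encode("utf-8")
    return {
        "raw": len(raw),
        "minified": len(mini),
        "raw_gzip": len(gzip.compress(raw)),
        "minified_gzip": len(gzip.compress(mini)),
    }


def sample_page(rows: int = 200) -> str:
    """Made-up indented list page, standing in for template output."""
    lines = [
        '        <li><a href="/item/%d">item %d</a></li>\n' % (i, i)
        for i in range(rows)
    ]
    return "<ul>\n" + "".join(lines) + "</ul>\n"


if __name__ == "__main__":
    for name, size in sizes(sample_page()).items():
        print("%-14s %6d bytes" % (name, size))
```

Compare the difference between raw and minified with the difference between raw_gzip and minified_gzip; the second gap is the saving you actually get when compression is enabled.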

9)      Benchmark to know your limits. Apache has a nice tool called ab for this kind of thing. And when you reach your limits, bow out gracefully like the educated fellow you are, not like the habitual drunk at the locals.


ab is a poor man's tool for performance testing. It is good for getting a baseline stress test, but it will tell you very little about real-life performance.
The best way to do it is to identify a set of use cases, distribute them according to your expected load, and feed them into a tool like JMeter.
I can highly recommend "The Art of Application Performance Testing" by Ian Molyneaux. It is a real eye-opener of a book (or was for me when I read it).
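
To illustrate what "distribute according to expected load" means, here is a minimal Python sketch. The use-case names and percentages in USE_CASE_MIX are invented for the example, and build_schedule is a hypothetical helper; in practice you would express the same mix inside your load-testing tool rather than in a script like this.

```python
import random
from collections import Counter

# Hypothetical traffic mix: share of requests per use case (sums to 1.0).
USE_CASE_MIX = {
    "browse_front_page": 0.60,
    "search": 0.25,
    "login": 0.10,
    "checkout": 0.05,
}


def build_schedule(mix, total_requests, seed=0):
    """Pick a use case for each simulated request, weighted by the mix."""
    rng = random.Random(seed)       # seeded so a test run is repeatable
    names = list(mix)
    weights = [mix[name] for name in names]
    return rng.choices(names, weights=weights, k=total_requests)


if __name__ == "__main__":
    schedule = build_schedule(USE_CASE_MIX, 10000)
    for name, count in Counter(schedule).most_common():
        print("%-18s %5d" % (name, count))
```

The point is that the test traffic resembles what real users do, instead of hammering a single URL the way a plain ab run does.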


Then there are the more general architecture considerations: if you want to get the most out of one server, use a stateful model (J2EE / .NET) with lots of caching layers; if you want to scale easily across multiple servers, go for a stateless design with persistent caching.

regards
Mike