@Eugene, nice! Very nice. I'm copying, printing and sticking this on my wall.

On Thu, Mar 1, 2012 at 2:07 PM, Eugene Lidede (Synergy) <eugene@synergy.co.ke> wrote:

The problem at KNEC could be bad app/DB design, inadequate infrastructure, or a conspiracy… I can't tell for sure, but I can draw on my experience, having been down this path.

 

In the case I was involved in, we threw more of everything at our implementation: hours, skills, relationships, life, bandwidth, hardware and yet more hardware, but it would only work for a while. We quit shared hosting and set up dedicated servers, but to no avail!

 

Over the years, here are some of the pointers I know for sure work:

 

1) Caching is king. Much as man shall not live on bread alone, so shall your web servers not rely on disk-based databases alone… There is no alternative, especially for high-volume traffic. Try Googling:

a. Memcache
b. PHP APC
c. Apache mod_proxy, or
d. Squid, among others. Each has scenarios where it works best; a rough sketch of the pattern they all share follows this list.
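All of these boil down to the same move: check RAM first, hit the disk-based DB only on a miss. A minimal cache-aside sketch in Java (the class name and loadFromDb() are invented for illustration; a real deployment would use one of the tools above so the cache is shared across processes and machines):

    import java.util.concurrent.ConcurrentHashMap;

    public class ResultCache {
        // process-local stand-in for what Memcache and friends do across machines
        private static final ConcurrentHashMap<String, String> CACHE =
                new ConcurrentHashMap<String, String>();

        public static String getResult(String indexNumber) {
            String cached = CACHE.get(indexNumber);
            if (cached != null) return cached;      // hit: pure RAM, no disk
            String fresh = loadFromDb(indexNumber); // miss: one slow lookup...
            CACHE.put(indexNumber, fresh);          // ...then RAM forever after
            return fresh;
        }

        private static String loadFromDb(String indexNumber) {
            return ""; // hypothetical stand-in for the real disk-based query
        }
    }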

2) The web-facing side of your application ought to see your DB as storage and storage alone, not as a runtime working set the way a desktop application would. If you need to join two or three tables to come up with content for a page, join those tables offline into a different table, not while your users are staring at their screens.

3) Cache as much as you can at your application level. When you are done caching, kindly cache some more!

Note: Using an RDBMS to maintain user sessions or authentication beyond the initial name/password match is 90s programming; you need to enroll for that chipuka certification thingy! There is a good reason servers come with several GBs of RAM as standard nowadays.
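To make that concrete, here is a rough sketch of RAM-held sessions in Java (all names invented for illustration): the DB is consulted exactly once, at login, and never again for that session.

    import java.util.UUID;
    import java.util.concurrent.ConcurrentHashMap;

    public class SessionStore {
        private static final ConcurrentHashMap<String, String> SESSIONS =
                new ConcurrentHashMap<String, String>();

        // called once, right after the only DB hit: the name/password check
        public static String create(String username) {
            String token = UUID.randomUUID().toString();
            SESSIONS.put(token, username);
            return token; // handed back to the client as a cookie
        }

        // every later request: a RAM lookup, the RDBMS never hears about it
        public static String lookup(String token) {
            return SESSIONS.get(token);
        }
    }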

4) When you cache in RAM, you will benefit from extra hard disk life. Ask folks who play MP3s from mechanical disks how often theirs crash.

While relational databases are convenient for retrieval, they deliver this with a comparatively heavy footprint in system resources. This is of course made worse by poor or inefficient indices and generally bad database/app design.

 

Respect standards

5) When you cannot for the life of you cache any more, tell others to cache for you via HTTP headers (a servlet sketch follows this list):

a. Cache-Control:
b. Expires:
c. Last-Modified:
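In a Java servlet, for instance, setting these is a few lines. A rough sketch (the one-day max-age is an arbitrary choice for results that no longer change):

    import java.io.IOException;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class ResultsServlet extends HttpServlet {
        // results are immutable once published, so remember when that was
        private static final long PUBLISHED = System.currentTimeMillis();

        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            // invite browsers and intermediaries to cache for a day
            resp.setHeader("Cache-Control", "public, max-age=86400");
            resp.setDateHeader("Expires", System.currentTimeMillis() + 86400000L);
            resp.setDateHeader("Last-Modified", PUBLISHED);
            resp.getWriter().write("...result page body...");
        }
    }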

6) Just like in a KNEC exam (sic), answer what you are asked, not with your own random stuff. Process and obey HTTP request headers like those below, and respond accordingly (a sketch follows the list):

a. If-Modified-Since
b. If-Match
c. If-None-Match, etc.
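Continuing the servlet sketch above, honouring If-Modified-Since is a few lines at the top of doGet() (note HTTP dates carry one-second resolution, hence the division):

    // inside doGet(), before doing any other work at all
    long since = req.getDateHeader("If-Modified-Since"); // -1 when absent
    if (since != -1 && PUBLISHED / 1000 <= since / 1000) {
        resp.setStatus(HttpServletResponse.SC_NOT_MODIFIED); // 304, empty body
        return; // nothing rendered, almost nothing sent down the wire
    }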

If you have been caching as in points 1 to 3, this ought to be a trivial exercise. If you obey the above two points, my experience shows you can eliminate 30% or more of your processing requirements, and the bandwidth that goes with them.

7) Speed that thrills does not kill: if you have been caching and observing standards so far, you will have noticed an increase in speed. Dispense with each request quickly.

a. If you use keep-alives, whether at web server level or at DB level via persistent connections, reconsider. This is contentious, but in my experience keep-alives in many instances chew up valuable resources idling, waiting and generally being of no use.

b. Web browsers do not award marks for well (or poorly) implemented "for", "while" and "repeat" loops, nor for fancy problem-solving code; they simply time out. If you have complex logic to implement, do it in stored procedures as the data goes into the DB, not while it is being retrieved. Spare your scripts for rendering, spare your server some CPU cycles, and spare us the environment.

c. I am sure you have already optimized your queries to the best of your ability.

8) Size matters:

a. Compress your output.
b. Eliminate whitespace in your generated HTML if you can. This is easy if your pages are built from templates that scripts fill in: keep indented sources in dev, then "compile" them to strip whitespace when you check out into prod.
c. Your .js and .css assets ought to be pre-compressed into .gz and served as such. For ye Joomla fanatics, combine your JS and CSS assets into as few files as possible; spare your server the agony of serving 20+ 10KB files. A pre-compression sketch follows this list.
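Pre-compressing is a one-off build step, not a per-request cost. A rough Java sketch (file names invented; in practice the gzip command does the same job):

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.zip.GZIPOutputStream;

    public class PreCompress {
        // build-time: app.js -> app.js.gz, later served as-is with
        // Content-Encoding: gzip, so the server never compresses per request
        static void gzip(String src) throws IOException {
            FileInputStream in = new FileInputStream(src);
            GZIPOutputStream out =
                    new GZIPOutputStream(new FileOutputStream(src + ".gz"));
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
            in.close();
            out.close(); // finishes the gzip stream and writes the trailer
        }

        public static void main(String[] args) throws IOException {
            gzip("app.js");  // hypothetical combined JS bundle
            gzip("app.css"); // hypothetical combined stylesheet
        }
    }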

9) Benchmark to know your limits. Apache has a nice tool called ab for this kind of thing. And when you reach your limits, bow out gracefully like the educated fellow you are, not like the habitual drunk at the local.
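For example, something like

    ab -n 10000 -c 100 http://yourserver/results.html

(10,000 requests, 100 at a time; both numbers and the URL are illustrative) will report requests per second, the time taken per request at various percentiles, and failed requests, which tells you exactly where "gracefully" has to begin.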

10) Most importantly, monitor your parameters: bandwidth, disk activity, RAM utilization, hardware health, power consumption, etc., and compare these to the number of users online. If possible, do what the telcos do with their ARPU metric: reduce your entire server operation to a per-user figure, e.g. bandwidth required per so many users, or per unit of revenue, or any other combo that makes business sense.

 

These are my two bits, scraped together from here and there over the years, on mitigating high load. We did trip on load afterwards, but the steps above had left several servers from our initial buying frenzy sitting idle.

 

 

Regards

 

 

From: skunkworks-bounces@lists.my.co.ke [mailto:skunkworks-bounces@lists.my.co.ke] On Behalf Of solomon kariri
Sent: Thursday, March 01, 2012 11:13 AM
To: Skunkworks Mailing List


Subject: Re: [Skunkworks] KNEC WEBSITE

 

 

On Thu, Mar 1, 2012 at 10:43 AM, Peter Karunyu <pkarunyu@gmail.com> wrote:

@Solomon, kindly oblige me with answers to the questions below...

Let's assume traffic of 1.5 million users. Since there were about 400,000 candidates, each one of them submits a request, and each one tells at most 3 siblings to do the same :-)

OK, so 1.5 million simultaneous requests. Each request will take at most 10 milliseconds, so we need 15 million milliseconds in total, i.e. 15,000 seconds. Let's assume we have 1,000 threads servicing concurrently, so we require 15 seconds to service all of it. 15 seconds is very little, especially since it is really practically impossible to have 1.5 million requests happening at EXACTLY the same time. So that will be no bottleneck.

On Thu, Mar 1, 2012 at 10:24 AM, solomon kariri <solomonkariri@gmail.com> wrote:

OK, in my opinion:

All this data is read-only.

It's so little it can fit into RAM.

I believe the limit should be bandwidth. OK, let's assume this implementation:

First of all, they get rid of that PHP file and replace it with a simple index.html. That way it is just served, with nothing processed to generate HTML, plus it will be cached by the browser.

They then add JavaScript that simply does an AJAX query, receives a JSON response and generates the relevant HTML to display it. That moves quite a lot of processing to the client side.

They will need a PHP file on the server side to service this JSON request, no? And I think there is no processing per se; all they are doing is fetching data and displaying it.

Well, first of all, I won't use PHP. If it were me, I would use Java; that's what I'm good at and what I can explain things with very easily. So they will have a servlet; the data will be loaded into a static array the first time the server starts, and the array stays in memory forever.

 

 

On the server, they can simply load all the records into an array and sort on index number.

Assuming they are using PHP, an array might not cut it, since it would have to be created for each request, and 1.5m requests is a tad too many. On the other hand, if they have an in-memory MySQL table indexed on the candidate's index number, the entire table is loaded into RAM, making it a bit faster. Making the index number column not allow NULLs and then using it in the WHERE clause will probably make the search query really, really fast.

With my Java approach, the array is created only once, when the server starts. I don't know much about how PHP does this. As for the MySQL approach, it is most likely to keep going back to the file system once in a while; what I want is a system that NEVER goes back to the hard disk to look anything up. All the information is in RAM. We already know index numbers are unique, and we have them already, so there is no need for NOT NULL constraints, and no SQL queries run anywhere. SQL queries need to be parsed and optimized, and I don't know how well MySQL does this, or its query caching protocol; but all in all, with my approach MySQL doesn't come up anywhere except the first time the server starts and the data is loaded into the array.


Secondly, by playing around with key_buffer_size, they can actually load the entire index into RAM, making searches even faster!

This is totally unnecessary with my approach.

 

 

That index number can actually be treated as a long, so no complex comparisons. The sorting is done just once, when the server starts, since the data doesn't change; it takes O(n log n) time, which will be like 5 seconds at the maximum. For any request, a binary search is done on the sorted data and the response is offered immediately. Since the data doesn't change, they can have a pool of threads servicing requests and performing binary searches concurrently. Each search takes O(log n) time, which is negligible for the amount of data involved.
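A rough sketch of that servicing core in Java (the Candidate class and the load() feed are assumptions; the data would come out of the database exactly once, at startup):

    import java.util.Arrays;
    import java.util.Comparator;

    public class ResultsIndex {
        static final class Candidate {
            final long indexNumber;
            final String result;
            Candidate(long indexNumber, String result) {
                this.indexNumber = indexNumber;
                this.result = result;
            }
        }

        private static Candidate[] data; // read-only once load() has run

        // once, at startup: the only O(n log n) work ever done
        public static void load(Candidate[] candidates) {
            Arrays.sort(candidates, new Comparator<Candidate>() {
                public int compare(Candidate a, Candidate b) {
                    return a.indexNumber < b.indexNumber ? -1
                         : a.indexNumber > b.indexNumber ? 1 : 0;
                }
            });
            data = candidates;
        }

        // per request: O(log n) binary search; safe from any number of
        // threads at once because nothing is mutated after startup
        public static String lookup(long indexNumber) {
            int lo = 0, hi = data.length - 1;
            while (lo <= hi) {
                int mid = (lo + hi) >>> 1;
                if (data[mid].indexNumber < indexNumber) lo = mid + 1;
                else if (data[mid].indexNumber > indexNumber) hi = mid - 1;
                else return data[mid].result;
            }
            return null; // unknown index number
        }
    }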

You know, why are we searching in the first place? The data is read-only! So why not adopt a search-once, display-many-times strategy? When a candidate is searched for the first time, cache the results and display the cached copy to the other 3 siblings!

No, I wouldn't suggest caching on the server side, just the client side. We can make the JavaScript use GET requests and tell the browser that the results are cacheable; that way repeat requests from the same browser will use the cache. On the server side, RAM access is already quite fast, and we don't want to use up so much RAM storing caches of every result.

 

But wait a minute: we know that at most 400,000 students will search, so why not search for them before they do and cache the results? Write a simple routine which outputs the results for all these students to static files.

NO, not at all. That would involve disk access, and disk access is usually very slow compared to RAM and processor speeds; we are trying as much as possible to avoid ANY disk access.

 

If we are dealing with static files, then we can get rid of Apache and instead use Nginx or lighttpd.

So we can't use this, because the file-system-based approach is not recommended.

 

If they want to keep access logs as well, that's pretty simple: they create a simple in-memory queue, add an entry to the queue, and leave the job of writing it to disk/database to a separate thread or a number of threads. That way the slow disk access speeds don't affect response time. With that, the only limit left is bandwidth. Actually, with a 5 Mbps up and down link they will be sorted; all people are looking for is text, most of the time.
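A rough sketch of that queue in Java (writeToDisk() is a made-up stand-in for whatever batch insert or file append they choose):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.LinkedBlockingQueue;

    public class AccessLog {
        private static final LinkedBlockingQueue<String> QUEUE =
                new LinkedBlockingQueue<String>();

        // request threads call this: an in-RAM enqueue, never a disk wait
        public static void log(String entry) {
            QUEUE.offer(entry);
        }

        // one background thread drains the queue and writes in batches
        public static void startWriter() {
            Thread writer = new Thread(new Runnable() {
                public void run() {
                    List<String> batch = new ArrayList<String>();
                    while (true) {
                        try {
                            batch.add(QUEUE.take()); // block until work arrives
                            QUEUE.drainTo(batch);    // then grab the backlog too
                        } catch (InterruptedException e) {
                            return;
                        }
                        writeToDisk(batch);          // hypothetical batch write
                        batch.clear();
                    }
                }
            });
            writer.setDaemon(true);
            writer.start();
        }

        private static void writeToDisk(List<String> batch) { /* stand-in */ }
    }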

So I just wonder: is this so hard to implement, or am I missing something?

If only the techies there were diligent, they could solve this problem at zero cost, since all the tools and solutions they need are open source.
 

Actually, I can add something here to make it more efficient. Seek times on disks are usually slow; disks are quite good at batch writes, though. So instead of saving the logs to disk/database directly, the thread responsible for this simply blocks access to the incoming queue's lock for about 5 ms every 2 minutes, creates a new empty queue, and keeps a copy of the current one. RAM copying is quite fast; it is just a matter of switching a memory reference over to the newly created queue. It then unblocks, and queueing continues. Instead of processing the copied queue in place, it simply serializes it in one batch write to disk and frees the space it was occupying in RAM, leaving it available for new entries. The serialized queues can then be processed later, even on another machine.
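One way to sketch that swap in Java: an atomic reference change stands in for the brief lock described above, so enqueueing threads are effectively never blocked (writeToDisk() is again a made-up stand-in):

    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicReference;

    public class SwappingLog {
        private static final AtomicReference<Queue<String>> CURRENT =
                new AtomicReference<Queue<String>>(
                        new ConcurrentLinkedQueue<String>());

        public static void log(String entry) {
            CURRENT.get().add(entry); // lock-free enqueue on the live queue
        }

        public static void start() {
            ScheduledExecutorService timer =
                    Executors.newSingleThreadScheduledExecutor();
            timer.scheduleAtFixedRate(new Runnable() {
                public void run() {
                    // the "5 ms" moment: swap in a fresh, empty queue...
                    Queue<String> old = CURRENT.getAndSet(
                            new ConcurrentLinkedQueue<String>());
                    // ...then serialize the old one at leisure, off the hot path
                    writeToDisk(old); // hypothetical batch write / serialization
                }
            }, 2, 2, TimeUnit.MINUTES);
        }

        private static void writeToDisk(Queue<String> q) { /* stand-in */ }
    }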

 

On Thu, Mar 1, 2012 at 9:51 AM, James Kagwe <kagwejg@gmail.com> wrote:

Surprising that they don't want to fix a problem that occurs only once a year, yet the system is only relevant once a year. It's better not to offer a service than to offer a substandard one. They must build the required capacity or just kill the service altogether; otherwise it's just a waste of resources. They could probably learn from the electoral commission's tallying system.



On 3/1/2012 8:52 AM, Peter Karunyu wrote:

A member of this list who knows someone at KNEC said here that they know what the problem is and how to fix it; they just don't see the logic in fixing a problem which occurs once a year.

So, in addition to lamenting here, why don't we think a little bit outside the box?

We could propose a solution which not only works for this annual occurrence, but also for other problems they have which we don't know about. For example, how about coming up with a solution which they can use to disseminate ALL exam results online, not just KCSE? That should save them quite a bit in paper and printing costs.

But I think the real cause of this problem is lack of accountability: the CIRT team @ CCK focuses solely on security, the Ministry of Info. focuses on policies, KICTB focuses on implementing some of those policies and a few other things, but not the quality of software, and the directorate of e-government provides oversight of these systems. So if my opinions here are correct, someone @ Dr. Kate Getao's office is sleeping on the job.

On Thu, Mar 1, 2012 at 8:11 AM, Bernard Owuor <b_owuor@yahoo.com> wrote:

True. The fact that you can see "Failed connection to MySQL DB" means that there's more than enough infrastructure.
(1) You get a response from the server
- this means there is sufficient bandwidth, and the webserver that hosts the app has sufficient CPU cycles

(2) they're using mysql
Apart from potential limitations in the number of connections on Windows, you can easily do 500 to 1,000 simultaneous connections. Only one connection is needed, though, so this should not be an issue.

Obviously, the architecture is poor and the app is not tested. The developer really skimped on their computer science classes, or didn't have any at all.


--- On Wed, 2/29/12, Rad! <conradakunga@gmail.com> wrote:


From: Rad! <conradakunga@gmail.com>
Subject: Re: [Skunkworks] KNEC WEBSITE
To: "Skunkworks Mailing List" <skunkworks@lists.my.co.ke>
Date: Wednesday, February 29, 2012, 1:57 PM

 

Why are we assuming the problem is the infrastructure?

On Wednesday, February 29, 2012, Solomon Mbũrũ Kamau wrote:

Can we do a harambee, like the one we did the other day, to purchase a server (or servers) for KNEC and give it to them as a gift?

On 29 February 2012 17:38, ndungu stephen <ndungustephen@gmail.com> wrote:

But of course..






--
Regards,
Peter Karunyu
-------------------





 

--

Solomon Kariri,

Software Developer,
Cell: +254736 729 450
Skype: solomonkariri

