On Thu, Mar 1, 2012 at 10:43 AM, Peter Karunyu <pkarunyu@gmail.com> wrote:
@Solomon, kindly indulge my questions below...
Let's assume a traffic of 1.5 million users. Since there were about 400,000 candidates, each one of them submits a request, and each one tells at most 3 siblings to do the same :-)
OK, so 1.5 million simultaneous requests. Each request will take at most 10 milliseconds, so we need 15 million milliseconds of total service time, which is 15,000 seconds. Let's assume we have 1,000 threads servicing requests concurrently; then we need just 15 seconds to serve everything. Fifteen seconds is very little, especially considering that it is practically impossible for 1.5 million requests to arrive at EXACTLY the same time. So that will be no bottleneck.
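Just to lay the arithmetic out plainly (the 10 ms per request and the 1,000 threads are assumptions, not measurements):

    1,500,000 requests x 10 ms          = 15,000,000 ms = 15,000 s of total service time
    15,000 s / 1,000 concurrent threads = 15 s of wall-clock time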
I believe the limit should be bandwidth. OK, let's assume this implementation:
First of all, they get rid of that PHP file and replace it with a simple index.html. That way it is simply served as-is, with no processing needed to generate HTML, and it will be cached by the browser.
They then add a JavaScript file that simply does an AJAX query, receives a JSON response and generates the relevant HTML to display it. That will move quite a lot of processing to the client side.
They will need a PHP file on the server side to service this JSON request, no? And I think there is no processing per se; all they are doing is fetching data and displaying it.
Well, first of all, I wouldn't use PHP. If it were me, I would use Java; that's what I'm good at and what I can explain things with very easily. So they will have a servlet, the data will be loaded into a static array the first time the server starts, and the array stays in memory forever.
On the server, they can simply load all the records into an array and sort on index number.
Assuming they are using PHP, an array might not cut it since it would have to be created for each request, and 1.5 million requests is a tad too many. On the other hand, if they have an in-memory MySQL table indexed on the candidate's index number, the entire table is loaded into RAM, making it a bit faster. Making the index number column NOT NULL and then using it in the WHERE clause will probably make the search results query really, really fast.
With my Java approach, the array is created only once, when the server starts; I don't know much about how PHP does this. As for the MySQL idea, it is likely to keep going back to the file system once in a while, and what I want is a system that NEVER goes back to the hard disk to look anything up; all the information is in RAM. We already know the index numbers are unique, and we have them already, so there is no need for NOT NULL constraints, and no SQL queries run anywhere. SQL queries need to be parsed and optimized, and I don't know how well MySQL does this or how its query cache behaves, but all in all, with my approach MySQL doesn't come up anywhere except the first time the server starts and the data is loaded into the array.
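Something along these lines is what I have in mind, just a sketch (the JDBC URL, credentials, table name and column names here are made up for illustration):

    import java.sql.*;
    import java.util.*;
    import javax.servlet.ServletContextEvent;
    import javax.servlet.ServletContextListener;

    // Runs once when the web app starts; after this, MySQL is never touched again.
    public class ResultsLoader implements ServletContextListener {

        // Read-only after startup: indexNumbers[i] is the candidate's index number,
        // resultJson[i] is the pre-built JSON string for that candidate.
        // Both arrays are sorted by index number.
        public static long[] indexNumbers;
        public static String[] resultJson;

        @Override
        public void contextInitialized(ServletContextEvent sce) {
            TreeMap<Long, String> sorted = new TreeMap<>();
            try (Connection c = DriverManager.getConnection(
                     "jdbc:mysql://localhost/knec", "user", "pass");
                 Statement st = c.createStatement();
                 ResultSet rs = st.executeQuery(
                     "SELECT index_no, result_json FROM results")) {
                while (rs.next()) {
                    sorted.put(rs.getLong(1), rs.getString(2));
                }
            } catch (SQLException e) {
                throw new RuntimeException("Could not load results at startup", e);
            }
            indexNumbers = new long[sorted.size()];
            resultJson = new String[sorted.size()];
            int i = 0;
            for (Map.Entry<Long, String> e : sorted.entrySet()) {
                indexNumbers[i] = e.getKey();
                resultJson[i] = e.getValue();
                i++;
            }
        }

        @Override
        public void contextDestroyed(ServletContextEvent sce) { }
    }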
Secondly, by playing around with key_buffer_size, they can actually load the entire index into RAM, making searches even faster!
This is totally unnecessary with my approach.
That index number can actually be treated as a long, so no complex comparison is needed. The sorting is done just once, when the server starts, since the data doesn't change; that takes O(n log n) time, which will be like 5 seconds at the very most. For any request, a binary search is done on the sorted data and the response is returned immediately. Since the data doesn't change, they can have a pool of threads servicing the requests and performing the binary searches concurrently. Each search takes O(log n) time, which is negligible for the amount of data involved.
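The lookup servlet would then be something like this (again just a sketch; it assumes the arrays built by the loader above, and the "index" request parameter name is made up):

    import java.io.IOException;
    import java.util.Arrays;
    import javax.servlet.http.*;

    // Sketch of the lookup servlet: binary search over the read-only sorted array.
    public class ResultsServlet extends HttpServlet {

        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            long indexNo;
            try {
                indexNo = Long.parseLong(req.getParameter("index"));
            } catch (NumberFormatException e) {
                resp.sendError(HttpServletResponse.SC_BAD_REQUEST, "Invalid index number");
                return;
            }
            // O(log n) lookup; no locking needed because the data never changes
            // after startup.
            int pos = Arrays.binarySearch(ResultsLoader.indexNumbers, indexNo);
            resp.setContentType("application/json");
            if (pos < 0) {
                resp.getWriter().write("{\"error\":\"not found\"}");
            } else {
                resp.getWriter().write(ResultsLoader.resultJson[pos]);
            }
        }
    }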
You know, why are we searching in the first place? The data is read-only! So why not adopt a strategy of search once, display many times? If a candidate is searched for the first time, cache the results and display the cached results to the other 3 siblings!
No, I wouldn't suggest caching on the server side, just the client side. We can make the JavaScript use GET and tell the browser that the results are cacheable. That way, repeated requests from the same browser will use its cache. On the server side, RAM access is already very fast, and we don't want to use up so much RAM storing caches of every result.
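Telling the browser to cache the GET responses is just a matter of headers; a small helper like this would do it, called from the lookup servlet's doGet just before the JSON is written (the one-day lifetime is an arbitrary illustrative choice):

    import javax.servlet.http.HttpServletResponse;

    // Marks a response as cacheable by the browser so the same browser
    // never asks for the same candidate twice within the cache lifetime.
    public final class CacheHeaders {
        private CacheHeaders() { }

        public static void markCacheable(HttpServletResponse resp) {
            resp.setHeader("Cache-Control", "public, max-age=86400");   // one day
            resp.setDateHeader("Expires", System.currentTimeMillis() + 86400000L);
        }
    }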
But wait a minute, we know that at most 400,000 students will search, so why not search for them before they do and cache the results? Write a simple routine which outputs the results for all these students to static files.
No, not at all. That would involve disk access, and disk access is usually very slow compared to RAM and processor speed; we are trying as much as possible to avoid ANY disk access.
If we are dealing with static files, then we can get rid of Apache and instead use Nginx or lighttpd.
So we can't use this, because the file-system-based approach is not recommended.
If they want to keep access logs as well, that's pretty simple: they create a simple in-memory queue, add an entry to it for each request, and leave the job of writing those entries to disk or the database to a separate thread or a number of threads. That way, the slow disk access speeds don't affect response time. With that, the only limit left will be bandwidth. Actually, with a 5 Mbps up and down link they will be sorted; all people are looking for is text, most of the time.
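Going back to that access-log queue, here is a sketch of what I mean using only the standard library (the log file name and the entry format are made up):

    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.PrintWriter;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Request threads call log(); a single background thread does the slow disk
    // writes, so disk latency never sits on the response path.
    public final class AccessLog {

        private static final BlockingQueue<String> QUEUE = new LinkedBlockingQueue<>();

        static {
            Thread writer = new Thread(new Runnable() {
                public void run() {
                    try (PrintWriter out =
                             new PrintWriter(new FileWriter("access.log", true), true)) {
                        while (true) {
                            out.println(QUEUE.take());   // blocks until an entry arrives
                        }
                    } catch (InterruptedException | IOException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
            writer.setDaemon(true);
            writer.start();
        }

        // Called from the request thread; never waits on the disk.
        public static void log(String entry) {
            QUEUE.offer(entry);
        }
    }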
So I just wonder, is this so hard to implement, or am I missing something?
If only the techies there were diligent, they could solve this problem at zero cost, since all the tools and solutions they need are open source.
Actually, I can add something here to make it more efficient. Seek times on disks are usually slow, but disks are quite good at batch writes. So instead of saving the log entries to disk/database one by one, the thread responsible for this simply blocks access to the incoming queue's lock for about 5 ms every 2 minutes, creates a new empty queue and keeps a reference to the current one. RAM copying is quite fast; it is just a matter of switching the reference to the newly created queue. Then it unblocks and queueing can continue, and instead of processing the detached queue entry by entry, it simply serializes it in one batch write to disk and frees the space it was occupying in RAM, leaving that space available for new queueing. The serialized queues can then be processed later, even on another machine.
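A sketch of that swap-and-batch variant, again standard-library Java only; the file name, the 2-minute period and the use of plain Java serialization are just illustrative choices:

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.ObjectOutputStream;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Timer;
    import java.util.TimerTask;

    // Batch variant: request threads append to an in-memory list; every 2 minutes
    // the current list is swapped out under a very short lock and the detached copy
    // is written to disk in one batch.
    public final class BatchedAccessLog {

        private static final Object LOCK = new Object();
        private static List<String> current = new ArrayList<String>();

        static {
            Timer timer = new Timer("access-log-flusher", true);   // daemon thread
            timer.scheduleAtFixedRate(new TimerTask() {
                public void run() {
                    List<String> toWrite;
                    synchronized (LOCK) {               // held only for a reference swap
                        toWrite = current;
                        current = new ArrayList<String>();
                    }
                    if (toWrite.isEmpty()) {
                        return;
                    }
                    String file = "access-" + System.currentTimeMillis() + ".ser";
                    try (ObjectOutputStream out =
                             new ObjectOutputStream(new FileOutputStream(file))) {
                        out.writeObject(toWrite);       // one sequential batch write
                    } catch (IOException e) {
                        e.printStackTrace();            // sketch: real code would handle this
                    }
                }
            }, 120000L, 120000L);                        // every 2 minutes
        }

        // Called from request threads; only contends for the lock during the brief swap.
        public static void log(String entry) {
            synchronized (LOCK) {
                current.add(entry);
            }
        }
    }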
On Thu, Mar 1, 2012 at 9:51 AM, James Kagwe <kagwejg@gmail.com> wrote:
It's surprising that they don't want to fix a problem that occurs only once a year, yet the system is only relevant once a year. It's better not to offer a service than to offer a substandard one. They must build the required capacity or just kill the service altogether; otherwise it's just a waste of resources. They could probably learn from the electoral commission's tallying system.
On 3/1/2012 8:52 AM, Peter Karunyu wrote:
A member of this list who knows someone in KNEC said here that
they know what the problem is, they know how to fix it, they just
don't see the logic in fixing a problem which occurs once a year.
So, in addition to lamenting here, why don't we think a little bit outside the box:
We propose a solution which not only works for this annual occurrence, but also works for other problems they have which we don't know about. For example, how about coming up with a solution which they can use to disseminate ALL exam results online, not just KCSE? That should save them quite a bit in paper and printing costs.
But I think the real cause of this problem is lack of accountability: the CIRT team at CCK focuses solely on security, the Ministry of Information focuses on policies, and KICTB focuses on implementing some of those policies and a few other things, but none of that covers software quality. The directorate of e-government provides oversight of these systems. So if my opinions here are correct, someone at Dr. Kate Getao's office is sleeping on the job.
On Thu, Mar 1, 2012 at 8:11 AM, Bernard Owuor <b_owuor@yahoo.com>
wrote:
True. The fact that you can see "Failed connection to mysql DB" means that there's more than enough infrastructure.
(1) You get a response from the server
- this means there is sufficient bandwidth, and the
webserver that hosts the app has sufficient CPU cycles
(2) They're using MySQL
Apart from potential limitations in the number of connections on Windows, you can easily do 500 - 1000 simultaneous connections. Only one connection is needed, though, so this should not be an issue.
Obviously, the architecture is poor and the app is not
tested. The developer really skimped on their computer
science classes, or didn't have any at all.