This series of articles will introduce a pet project of mine, KCK, a self-priming, intelligent cache designed to maximize the performance of data pipelines and data-driven websites. Its goal is to make fast, Python-powered networked systems a bit faster.
For this first article, I’m going to define what I mean by “fast, Python-powered networked systems.” If you’re not running some version of my flavor of such a system, or if you haven't put much thought into architecture generally, then my project will likely just add complexity and you’ll kick yourself for ever having looked at it. Beware all who pass through these gates, for here be dragons.
So, my idea of a well-formed website rests on a few simple principles:
I've built systems this way in Perl, Ruby and now Python. The front-end systems make HTTP requests to the back-end servers, and the back-end servers talk to the databases and do whatever other heavy lifting needs to happen.
If you structure a site this way, build a proper database schema, and make efficient queries, these systems scale very well because the front-end servers are able to scale independently of the back-end servers, and heavy load on one doesn't affect the performance of the other.
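To make the split concrete, here's a minimal sketch of the two tiers using only the Python standard library. Everything in it is an assumption made for illustration: the `users` table, the `/users/<id>` route, and the `render_profile` function are hypothetical, and SQLite stands in for a real database server. The point is only that the front-end code never touches the database; it makes an HTTP request and renders whatever the back end returns.

```python
import json
import sqlite3
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import ProxyHandler, build_opener

# Opener that ignores any system proxy settings, so the local demo
# always talks directly to the loopback back end.
_opener = build_opener(ProxyHandler({}))


def make_db():
    # In-memory SQLite database standing in for PostgreSQL or MySQL.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (id, name) VALUES (1, 'Ada')")
    conn.commit()
    return conn


class BackendHandler(BaseHTTPRequestHandler):
    """Back-end tier: the only code that talks to the database."""

    db = make_db()
    db_lock = threading.Lock()  # serialize access to the shared connection

    def do_GET(self):
        # Hypothetical route: /users/<id>
        try:
            user_id = int(self.path.rsplit("/", 1)[-1])
        except ValueError:
            self.send_error(400)
            return
        with self.db_lock:
            row = self.db.execute(
                "SELECT id, name FROM users WHERE id = ?", (user_id,)
            ).fetchone()
        if row is None:
            self.send_error(404)
            return
        body = json.dumps({"id": row[0], "name": row[1]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def render_profile(backend_url, user_id):
    """Front-end tier: no database access, just an HTTP call to the back end."""
    with _opener.open(f"{backend_url}/users/{user_id}") as resp:
        user = json.load(resp)
    return f"<h1>Profile: {user['name']}</h1>"


def demo():
    # Port 0 asks the OS for any free port.
    server = ThreadingHTTPServer(("127.0.0.1", 0), BackendHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address
        return render_profile(f"http://{host}:{port}", 1)
    finally:
        server.shutdown()
        server.server_close()
```

In a real deployment the two tiers would run as separate processes on separate machines, which is what lets each one scale on its own. Calling `demo()` runs both in one process purely to show the request path.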
The diagram below shows the basic structure of these systems.
For many web-based business applications, this architecture is enough to handle hundreds or even thousands of simultaneous users.
Because the front-end and back-end servers are horizontally scalable, the database ends up being the final bottleneck. If the system isn't doing much on a per-user basis, if the database is properly indexed and if the queries are efficient, a database like PostgreSQL or MySQL can handle a LOT of traffic.
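As a rough illustration of what "properly indexed" buys you, the sketch below asks the database for its query plan before and after adding an index. It uses SQLite from the standard library as a stand-in; PostgreSQL and MySQL have their own `EXPLAIN` output, and the `orders` table and `idx_orders_customer` index are hypothetical names chosen for the example.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    # EXPLAIN QUERY PLAN returns one row per plan step; the
    # human-readable description is the last column of each row.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


def demo_index():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        [(i % 100, i * 1.5) for i in range(10_000)],
    )

    sql = "SELECT total FROM orders WHERE customer_id = ?"

    # Without an index, the planner has to scan the whole table.
    before = query_plan(conn, sql, (42,))

    # With an index on the filtered column, it can search the index instead.
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    after = query_plan(conn, sql, (42,))

    conn.close()
    return before, after
```

Printing the two values returned by `demo_index()` shows the plan switching from a full table scan to a search that uses `idx_orders_customer`. The exact wording of the plan varies between SQLite versions.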
But when the fundamentals have all been properly addressed and execution time continues to exceed performance goals, the options for improving responsiveness, apart from simply buying bigger, faster computers, look like:
KCK was built to provide vastly increased scalability at a greatly reduced cost for web systems. But underperforming reporting and billing systems, or any other systems whose performance is ultimately a function of how fast the database can answer queries, have similar options available even if their architecture differs from the web system discussed here, and KCK may well provide performance benefits in those contexts as well.
KCK is designed to be a flexible and powerful solution for each of these options:
The diagram below shows how KCK fits into a system’s architecture.
Again, KCK is a self-priming, intelligent cache designed to maximize the performance of data pipelines and data-driven websites.
I’ve been looking forward to sharing the MVP, which I loosely defined as the point where KCK can both reduce operational cost and improve performance by an order of magnitude, and where I have the Jupyter notebooks to prove it.
I'm close. Future posts will spell it all out. More to come.
Feedback welcomed, contributors championed.