The days of running a website on a single server are not quite over. This is still the norm for a lot of sites and, if it's properly implemented, the site will handle a modest amount of traffic without issue.
With that said, it doesn't take much to overwhelm a site running on a single server. An unexpected burst of interest from a mention on Reddit or an article in the newspaper is often the culprit, but a new feature that significantly increases pressure on the web server can cause problems as well, and will often force a rollback of the release while the issue is resolved.
But keeping up with spikes in demand is only one reason to think about scaling a website. Another is availability: if the network connecting a single web server to the internet goes down, the site goes down with it.
The solution to both the demand-spike problem and the availability problem is the same. Whether a site is running in a datacenter or in the cloud at AWS or Google, the idea is to set up what's known as a "load balancer" that distributes incoming requests across the array of web servers it fronts.
If you really want to be resilient, the thing to do is to distribute your web servers across different parts of the region, country or world, so no single disaster can take down your site in its entirety.
From a system configuration perspective, this is a pretty straightforward concept. The whole thing requires just a few clicks in AWS and the HAProxy learning curve isn't altogether steep for the DIY crowd.
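For the DIY route, a minimal HAProxy configuration for this setup might look something like the sketch below. The server names and addresses are placeholders, not a real deployment:

```
# haproxy.cfg (minimal sketch; addresses are placeholders)
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend www
    bind *:80
    default_backend web_servers

backend web_servers
    balance roundrobin
    server web1 10.0.1.11:8000 check
    server web2 10.0.1.12:8000 check
```

The `check` keyword tells HAProxy to health-check each backend and stop routing to a server that goes down, which is what buys you the availability improvement.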
Things can get a bit tricky, however, if your development team didn't plan for this architecture in advance.
The biggest problems have to do with caching and sessions. The caching issue involves the potential for different servers to have cached values that should be the same but aren't because of timing differences. In many cases, these discrepancies aren't particularly bothersome and can be overlooked. But the session problem is more serious. This issue has to do with a user's session data being stored in the memory of one server and therefore being unavailable on the next server the load balancer routes that user to.
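The session problem is easy to see in a small simulation. The sketch below models two hypothetical app servers that each keep sessions in their own process memory, behind a round-robin balancer; all the names here are illustrative, not any particular framework's API:

```python
# Sketch of the session problem: two app servers, each with its own
# in-process session store, behind a round-robin "load balancer".

import itertools

class AppServer:
    def __init__(self, name):
        self.name = name
        self.sessions = {}  # per-process, in-memory session store

    def handle(self, session_id, key=None, value=None):
        # Fetch (or create) this server's copy of the session,
        # optionally writing one key into it.
        session = self.sessions.setdefault(session_id, {})
        if key is not None and value is not None:
            session[key] = value
        return session

servers = [AppServer("web1"), AppServer("web2")]
balancer = itertools.cycle(servers)  # round-robin routing

# Request 1 lands on web1 and records the logged-in user.
next(balancer).handle("sess-42", key="user", value="alice")

# Request 2 lands on web2: as far as it knows, the session is empty.
print(next(balancer).handle("sess-42"))  # → {}
```

The user appears logged in on one request and anonymous on the next, purely depending on which server the balancer happens to pick.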
Both of these issues can be solved using a shared key/value store which serves as the shared memory of the group of web servers. There are some performance considerations to bear in mind, and the transition can mean a fairly significant rewrite of code to make use of the shared data storage, but there really isn't a good way around it: to scale, you have to move to a shared data access model.
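Reworking the earlier sketch shows the idea. Both servers now read and write sessions through one shared store; in production that store would be something like Redis or Memcached, but a plain dict stands in here so the example is self-contained:

```python
# Sketch of the fix: all servers share one key/value store. A dict
# stands in for a networked store such as Redis or Memcached.

import itertools
import json

class SharedKVStore:
    """Stand-in for a networked store: values travel as strings."""
    def __init__(self):
        self._data = {}

    def get(self, key):
        raw = self._data.get(key)
        return json.loads(raw) if raw is not None else None

    def set(self, key, value):
        self._data[key] = json.dumps(value)

class AppServer:
    def __init__(self, name, store):
        self.name = name
        self.store = store  # shared across all servers, not per-process

    def handle(self, session_id, key=None, value=None):
        # Always load the session from the shared store, and write
        # any change back so the other servers see it.
        session = self.store.get(session_id) or {}
        if key is not None and value is not None:
            session[key] = value
            self.store.set(session_id, session)
        return session

store = SharedKVStore()
balancer = itertools.cycle([AppServer("web1", store),
                            AppServer("web2", store)])

# Request 1 lands on web1; request 2 lands on web2 and sees the same data.
next(balancer).handle("sess-42", key="user", value="alice")
print(next(balancer).handle("sess-42"))  # → {'user': 'alice'}
```

Note the serialization step: because a real store sits across the network, every read and write pays a round trip and a (de)serialization cost, which is the performance consideration mentioned above.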
At the end of the day, your code just can't depend on data stored in the memory or on the filesystem of the machine the web server is running on because, over multiple requests, a cache data mismatch is extremely likely to occur. And this will, at best, lead to a bad user experience and, at worst, lead to both a bad user experience and a corrupt database.
The good news is, once you have switched to a shared memory solution based on a networked key/value store, you can scale to handle huge volumes of incoming requests and, if you're running in the cloud, you can automatically spin up 20 servers to handle a spike in demand and drop back down to one or two when traffic returns to normal levels.
For Django and Rails developers, this strategy is absolutely trivial to implement. But regardless of your development environment, and even if it requires a fair amount of code changes, the model is straightforward and easy to understand. And it beautifully eliminates the problem of data inconsistency, one of the biggest issues you can face scaling web applications.
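In Django's case, for example, the switch is mostly configuration. The settings fragment below is a sketch assuming a Redis instance reachable at the placeholder URL shown; the built-in Redis cache backend ships with Django 4.0 and later:

```python
# settings.py fragment (sketch): point Django's cache and sessions at a
# shared Redis instance. The LOCATION URL is a placeholder.

CACHES = {
    "default": {
        # Built-in Redis cache backend, available in Django 4.0+.
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379",
    }
}

# Store session data in the shared cache instead of the local database
# server's files or this machine's memory.
SESSION_ENGINE = "django.contrib.sessions.backends.cache"
SESSION_CACHE_ALIAS = "default"
```

With settings like these, every web server in the pool reads caches and sessions from the same place, so it no longer matters which server the load balancer picks for any given request.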