It’s been nearly 10 years since Affino last ran on Linux, and at the time there was simply no demand for it, surprising as that might seem. There was also still a fair amount of confusion in the market over proprietary software such as Affino running on Open Source platforms.
Everything has changed in the interim, and as Affino is now primarily a SaaS platform, the equation has flipped on its head. Since the vast majority of Affino sites run on the Amazon Web Services cloud, including all the Affino-run ones, the OS question has shifted towards auto-scaling, rapid deployment, performance and cost.
We still love Microsoft and Windows, and Windows will remain core to our Data platform moving ahead, but we see Linux playing a much greater part in the future for Affino application servers.
Affino 7 is already running well on Apache and Tomcat, and these are our Web and Java service technologies of choice moving ahead. They make it very easy for us to embrace Linux, and today is a red-letter day as Affino is running on Linux again for the first time in the 21st century.
It’s early days yet, and it will probably take a week to iron out the remaining issues, but the time is right for us to embrace Linux and the incredible automation it offers. It will allow us to realise a lot of the goals we have with Affino 7 much sooner than we had anticipated.
Also, Affino on Linux is Fast.
It took a bit longer in the end to make the full transition to Linux, as we had to overcome dozens of bugs spread across most of the underlying platforms to do so. Fortunately the platform updates have been frequent, and in the interim all the platforms we’re using have improved to the extent that there are no longer any issues.
We’ve also had to deal with long-running tasks, i.e. what happens when a task outlives the server it runs on. Once servers become disposable, as they do on Scalr, where they are created whenever demand requires and removed as soon as it subsides, it’s essential that long-running tasks are handled gracefully. In the future we will use auto-scaling to distribute tasks across multiple servers as they become available, to speed them up.
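One common way to let a task survive the loss of its server is to checkpoint progress somewhere that outlives the node, so a replacement can pick up where the last one stopped. The sketch below illustrates that general idea only; it is not how Affino implements it. The names `run_resumable`, `checkpoint_path` and `should_stop` are hypothetical, and the checkpoint is assumed to live on storage shared between servers.

```python
import json
import os


def _save_checkpoint(path, next_index):
    # Write to a temp file first, then swap it in, so a server dying
    # mid-write never leaves a half-written checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp, path)


def run_resumable(items, process, checkpoint_path, should_stop=lambda: False):
    """Process items in order, recording progress after each one.

    Returns True if every item was processed, False if the run was
    interrupted (for example because the server is being retired).
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]

    for i in range(start, len(items)):
        if should_stop():
            return False  # progress is already saved; another node can resume
        process(items[i])
        _save_checkpoint(checkpoint_path, i + 1)
    return True
```

A server that is about to be removed would make `should_stop` return true; the next server to call `run_resumable` with the same checkpoint path carries on from the first unprocessed item. If a server dies between processing an item and saving the checkpoint, that item is processed again, so `process` should be safe to repeat.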
We’ve greatly reduced the time it takes for an Affino instance to start up, since every second counts when sites are busy. This is something we will focus on with every new Affino release. We can now also be less frugal with server resources and focus more on performance and less on stability, since downtime is now virtually zero when servers fail.
We’ve further evolved the separation of files from the application servers. Most files now run from separate file repositories, whilst some are only kept for the life of the application servers. A lot of tuning has gone into this.
We’ve moved our search to Solr and as a result have set up a separate Solr cloud, since the Solr indexes can’t run effectively on disposable application servers. No doubt we’ll extend this further as the need to scale the indexes arises.
Finally, we’ve evolved the way we handle caching, since auto-scaling means we can now scale to meet demand peaks rather than simply serving cached content.
The benefits of instant scaling and fail-over are immense and have been well worth the effort. We’re now busy looking at what else we can automate.
We’ve had a very interesting couple of months reviewing all our assumptions about how to set up and run servers and networks. The reality is that working with dynamic systems like Scalr, where the application server is entirely disposable, means that all assumptions need to be reviewed and re-evaluated against real use cases.
Many of the methodologies we developed for keeping application servers as robust as possible in the pre-scalable cloud era, i.e. the last 20 years, absolutely work against you. With scalable nodes, priorities shift from trying to make individual nodes as robust as possible to making site uptime as high as possible.
The most basic lesson is that if a node shows any sign of dropped performance, it needs to be killed immediately and replaced by a new one. It took us a while to learn that one. Our learnings are still ongoing, but we’ve rewritten major aspects of our application platform over the past couple of months to further improve how it works in the post-server era.
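To make the "replace, don’t repair" rule concrete, here is a minimal sketch of the kind of check an operator might run against recent response times. It is an illustration only and does not use Scalr’s API; the function name `nodes_to_replace`, the baseline and the 1.5x tolerance are all assumptions picked for the example.

```python
from statistics import median


def nodes_to_replace(latencies_ms, baseline_ms, tolerance=1.5):
    """Return the names of nodes whose median latency is above the limit.

    latencies_ms maps a node name to its recent response times in
    milliseconds. A node with no samples is left alone here; deciding
    what to do with a silent node is a separate health check.
    """
    limit = baseline_ms * tolerance
    return sorted(
        name
        for name, samples in latencies_ms.items()
        if samples and median(samples) > limit
    )


recent = {
    "app-1": [100, 110, 105],
    "app-2": [100, 400, 420],
}
print(nodes_to_replace(recent, baseline_ms=100))
```

Using the median rather than the mean keeps a single slow request from condemning an otherwise healthy node. Any node the check returns would be terminated and left for auto-scaling to replace, rather than investigated in place.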