At Affino we always aim to have the most reliable, high-quality stats throughout, and so we update the stats engines regularly to improve Affino’s statistical accuracy.
The single biggest factor in generating accurate web stats is distinguishing a human view from a bot view. Affino has a host of ways to do this and we continue to evolve the discovery process. Crucially, underlying everything is that we audit every page impression or call that happens in Affino and classify it as User, Admin, Bot or Other. If you go to Control > Audit you can drill down into this.
On the average site around 5% of traffic is a human page view, and 95% is something else. On Affino.com it is 4.925% human as an example.
We classify a view as a bot view once we have identified that the impression comes from a bot IP address, has a bot user agent, or often some combination of these. A single bot might have a number of IP addresses, user agent strings and behaviours that we use to identify it.
When a bot hits the site we don’t count it as an impression, as we simply don’t know what’s happening with the content. Normally it is an engine scraping it, indexing it, or simply checking that the page is there. We only count human impressions, and only then once we have run a set of additional checks to verify that the page was fully viewed and interacted with.
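As a rough illustration of classifying an impression by IP address and user agent, here is a minimal Python sketch. It is not Affino's actual implementation: the `Impression` type, the `classify` function, the address range and the user agent markers are all hypothetical, and the sketch leaves out the Other category and the behavioural checks.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

# Hypothetical signatures for illustration only; a real list would be far
# larger and maintained over time.
BOT_NETWORKS = [ip_network("203.0.113.0/24")]  # reserved documentation range
BOT_AGENT_MARKERS = ("bot", "crawler", "spider")


@dataclass
class Impression:
    ip: str
    user_agent: str
    is_admin: bool = False


def classify(imp: Impression) -> str:
    """Return 'Bot', 'Admin' or 'User' for a single audited impression."""
    addr = ip_address(imp.ip)
    agent = imp.user_agent.lower()
    # A match on either the IP address or the user agent is enough.
    if any(addr in net for net in BOT_NETWORKS):
        return "Bot"
    if any(marker in agent for marker in BOT_AGENT_MARKERS):
        return "Bot"
    if imp.is_admin:
        return "Admin"
    return "User"
```

In this sketch a bot signature takes priority over everything else, so a call from a known bot address is never counted as a user view, whatever its user agent claims.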
In this latest release we added a host of additional logging and log processing to support ABC logs.
The ABC has certain rules for identifying bots and administration views, and these are quite different from Affino’s bot identification processes. To generate ABC logs we capture all the traffic, and only when you enable ABC logging do we process it and generate the ABC log data. At no point does this affect the core Affino stats. It means that ABC will report very different site traffic levels to Affino, and in initial trials we have noted that this difference is up to 250%. Crucially, ABC is looking to provide comparability in its statistical profiling between sites by treating all sites the same.
It means that if you want your stats to tell different stories then you have options in how you go about it.
The reason we have as our primary goal the commitment to accuracy is so that you can know exactly how well your campaigns are doing, and so that we can provide accurate audience identification and conversion rates. Bots can really disrupt your campaign and conversion analysis, and give you misleading impressions, so we feel it is crucial that we focus the stats on real people and their actions and decisions.
There are over 2,000 bots which regularly interact with Affino sites, and these are just the ones Affino has identified. There will likely be more stealthy, well-behaved, undeclared bots that Affino has yet to surface. Many of the identified bots declare themselves and adhere to the bot guidance provided in the robots.txt on each site. Equally, many violate robots.txt and in fact try to access pages specifically identified as excluded, and the malicious ones attempt to penetrate Affino in some capacity.
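To show what checking a request against robots.txt can look like, here is a short sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `/private/` path and the `violates_robots` helper are assumptions made for the example, not Affino's rules or code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def violates_robots(user_agent: str, path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots.txt tells this user agent not to fetch the path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, path)
```

A bot that repeatedly requests paths for which this returns True is ignoring the site's guidance, which is one signal that it may be badly coded or malicious.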
We have updated Affino in hundreds of ways over the past quarter specifically to fend off malicious or, with a generous interpretation, badly coded bots. Frequently bots are coded to seek out vulnerabilities and generate errors which can expose elements of the underlying Affino structure. We are systematically identifying and updating aspects of Affino each day as they are flagged to prevent malicious access or errors.
In terms of bot discovery, every time a potential bot is identified it goes into a bot inbox. In a typical week Affino identifies around 200 new potential bots. The team reviews these each Friday and does a deep analysis of their behaviour.
If they are confirmed as bots we add rules to clearly identify them. If we find a bot to be malicious, or potentially destructive to our customers’ sites (e.g. DDoS-like behaviour), we put it on the block list and prevent it from accessing any client site. Once a bot is identified and placed on a bot list, any future traffic from it is logged and excluded from the general user stats.
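The inbox, review and block list flow described above can be sketched roughly as follows. This is an illustrative model only: the `BotRegistry` class, its method names and the returned labels are hypothetical, and the real review is a manual analysis rather than a pair of flags.

```python
from dataclasses import dataclass, field


@dataclass
class BotRegistry:
    inbox: set = field(default_factory=set)       # potential bots awaiting review
    bot_list: set = field(default_factory=set)    # confirmed bots, excluded from user stats
    block_list: set = field(default_factory=set)  # malicious bots, denied access

    def flag(self, signature: str) -> None:
        """Queue a suspicious signature for review unless it is already confirmed."""
        if signature not in self.bot_list:
            self.inbox.add(signature)

    def review(self, signature: str, is_bot: bool, is_malicious: bool = False) -> None:
        """Record the outcome of the team's review of one inbox entry."""
        self.inbox.discard(signature)
        if not is_bot:
            return
        self.bot_list.add(signature)
        if is_malicious:
            self.block_list.add(signature)

    def handle(self, signature: str) -> str:
        """Decide what happens to future traffic carrying this signature."""
        if signature in self.block_list:
            return "blocked"
        if signature in self.bot_list:
            return "logged_as_bot"
        return "user_candidate"  # still subject to the additional human-view checks
```

For example, after `registry.flag("ExampleBot/1.0")` and `registry.review("ExampleBot/1.0", is_bot=True)`, later calls to `registry.handle("ExampleBot/1.0")` return `"logged_as_bot"`, so that traffic stays in the audit but out of the user stats.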
As mentioned previously, some bots intentionally, or in some cases unintentionally, violate the robots.txt guidance and in effect perform denial of service attacks on Affino sites. These bots can hit individual sites thousands or even tens of thousands of times each day. More importantly, they frequently launch multiple simultaneous calls in the same second (or even millisecond), which could not be better designed to cause issues and trigger security shutdowns or cloud scaling.
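One simple way to spot this kind of burst is to count calls per client within each one-second window. The sketch below assumes a hypothetical list of `(client_id, unix_timestamp)` pairs and an arbitrary threshold of five calls per second. It illustrates the idea and is not how Affino's detection actually works.

```python
from collections import Counter


def burst_clients(hits, max_per_second=5):
    """Return the clients that exceed max_per_second calls in any single second.

    hits is an iterable of (client_id, unix_timestamp) pairs.
    """
    # Bucket each call by client and by the whole second it arrived in.
    per_second = Counter((client, int(ts)) for client, ts in hits)
    return {client for (client, _), count in per_second.items() if count > max_per_second}
```

A client making ten calls inside one second would be returned by `burst_clients`, while a client making ten calls spread over ten seconds would not.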
We have rolled out a new bot blocking engine which will allow us to block bots in seconds and then roll that out to every Affino site. This will help considerably with ensuring the fastest site response times and in reducing scaling events which result in momentary slowdowns.
We are not anticipating updating the core analysis engine for some time now. However, each week we will continue to add new bots to the list and further refine the bot detection and blocking, which might in turn affect your statistics. At the time of identifying and blocking an individual bot we have little idea of the future impact on Affino site traffic levels (especially on individual sites), but in terms of malicious bots we know it can dramatically improve the performance of a host of sites.
Over the coming months we will be rolling out major advances in how Affino’s stats are presented, including the new page dashboard as well as a host of new dashboards and high-level reports throughout Affino.