AdRoll is building products that let customers of any size, with deep marketing experience or none at all, run high-performing marketing campaigns. And with over 20,000 customers in over 150 countries, we’ve been fairly successful at it.
Our main product has been Retargeting. The basic concept is simple: when one of your customers leaves your site without buying, we enable you to reach them with the most appropriate messaging while they browse other sites or use their phones. This is great because it’s easily measurable and it performs as well as search.
The hard part of Retargeting is the complexity of the infrastructure behind it, which is called Real-Time Bidding, or RTB. The basic constraint in RTB is that we are latency bound. When we talk about real time, we mean it in the literal sense of a classic firm real-time system: our machines receive about 60B requests each day and need to respond to each one of them in less than 100ms. A 1% error rate on 60B requests means 600M errors, and that’s too much opportunity cost.
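A quick back-of-the-envelope check of that error budget, using the figures above:

```python
# Sanity check on the opportunity cost quoted above.
requests_per_day = 60 * 10**9   # ~60B bid requests per day
error_rate = 0.01               # a 1% failure rate
failed_per_day = int(requests_per_day * error_rate)
print(f"{failed_per_day:,} lost opportunities per day")  # 600,000,000
```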
A 100ms latency ceiling is an especially challenging problem at global scale, though. We have customers in over 150 countries, and light in fiber travels at about 120,000 miles per second. That’s a real problem, because at that speed a signal takes about 60ms to make the round trip between New York and Paris. That’s the majority of the time we have available to bid on a single impression.
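The speed-of-light math works out as follows (the New York–Paris distance here is an approximate great-circle figure):

```python
# Round-trip time for light in fiber between New York and Paris.
FIBER_SPEED = 120_000       # miles per second (~2/3 the speed of light in vacuum)
NY_PARIS_MILES = 3_630      # approximate great-circle distance
round_trip_ms = 2 * NY_PARIS_MILES / FIBER_SPEED * 1_000
print(f"{round_trip_ms:.1f} ms")  # 60.5 ms
```

More than half of a 100ms budget is consumed by physics alone, before any computation happens.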
If you want to operate in RTB, you need global presence from the beginning. Building that ourselves would have been extremely capital intensive: signing data-center and bandwidth contracts, hiring engineers, and creating legal entities in every region of the world we needed to reach. With our infrastructure on Amazon AWS we never have to worry about that. We just opened our business in Japan, and it took us only a few minutes to bootstrap our infrastructure and start serving that region of the world with the required low latency.
But it’s not enough to have our machines across the world; our data needs to be there too, and it needs to be queryable with the same low latency. Operating a massively scalable data store that handles over 1M requests per second is certainly an interesting task, but it’s not what differentiates us for our customers. We’d rather focus on product and our algorithms. So we built only the parts of the infrastructure we needed and started using DynamoDB to store information. Today we have almost 500B items stored in DynamoDB overall, with no engineers of our own working on its operation (only the ones inside Amazon), while at the same time we have seen consistently low latencies, under 5ms, throughout this growth period.
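As a minimal sketch of this access pattern, here is the kind of single-item point read we rely on. The table name (`user_profiles`) and key attribute (`cookie_id`) are hypothetical, chosen for illustration; only the boto3 calls are real API:

```python
# Sketch of a low-latency point read against DynamoDB.
# "user_profiles" and "cookie_id" are hypothetical names for illustration.

def profile_key(cookie_id):
    """Build the partition-key dict for a profile item."""
    return {"cookie_id": cookie_id}

def fetch_profile(cookie_id, table_name="user_profiles"):
    """Fetch one item by partition key (requires AWS credentials to run)."""
    import boto3  # imported lazily so the sketch is readable without AWS set up
    table = boto3.resource("dynamodb").Table(table_name)
    resp = table.get_item(Key=profile_key(cookie_id))
    return resp.get("Item")

print(profile_key("abc123"))  # {'cookie_id': 'abc123'}
```

Point reads by partition key are what keeps latency in the single-digit milliseconds regardless of table size.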
Having data sit statically, however, is not enough: it needs to be analyzed, and you need to be able to store a lot of it. Moreover, we can only achieve our full potential if every AdRoll team can access the data quickly, whenever they need it. All of our teams coordinate using Amazon S3. We’re now storing over 300TB of new compressed data in S3 every month, in over 17M new files, and in March 2015 we stored as much data as we did in the first four months of 2014 combined. This data is our core asset, and the more accessible it is, the more value we can extract from it; that is why our teams generate over 10B requests per month against our buckets.
We also like to experiment with new services when they come out, and Lambda was no exception. Polling is not a scalable way to discover new files in S3, especially when you add 17M of them per month. When Lambda was released, a couple of engineers quickly prototyped a new flow that triggers Lambda functions from S3 events. We were able to take this change into production very quickly, and now every team receives notifications for the files they are interested in as soon as they are added, solving a major bottleneck.
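A handler for these notifications can be very small. The sketch below only parses the event payload S3 delivers to Lambda; the bucket and key names are made up, and real code would do team-specific work with each file:

```python
# Minimal Lambda handler for S3 event notifications.
# Bucket and key names below are hypothetical examples.

def handler(event, context=None):
    """Extract (bucket, key) pairs from an S3 event notification payload."""
    new_files = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        new_files.append((s3["bucket"]["name"], s3["object"]["key"]))
    return new_files

# Example payload shaped like an S3 "ObjectCreated" notification:
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "adroll-logs"},
                "object": {"key": "2015/03/part-0001.gz"}}}
    ]
}
print(handler(sample_event))  # [('adroll-logs', '2015/03/part-0001.gz')]
```

Because Lambda invokes this on each object-created event, no consumer ever has to poll the bucket.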