In the last few months, I’ve been asked to speak at a couple of industry events focused on Big Data technologies, including a keynote at the Hadoop Summit, to share what the team at Webtrends has been doing with Big Data. Just last week, I spoke to the Apache Spark Meetup group in Seattle, alongside Big Data thought leaders like Denny Lee from Databricks. These events are great ways to share what we know about Big Data and learn how to use it better, so I always walk away reinvigorated.
There has been tremendous interest from the Big Data community in the Webtrends journey and our approach to creating a platform that unlocks the value of the digital marketing data we collect on behalf of our clients. We are taking an innovative approach to applying Big Data technologies to the specific problems of digital analysts and marketers. Through a combination of open source technologies, including Hadoop/HDFS, Samza and Spark, our clients will have an unlimited view into what their customers are doing, a better understanding of what is happening on their digital properties, and insight into the lifetime value of their visitors. There has been a lot of hype about the power of Big Data, but its promise is quickly becoming a reality for Webtrends clients.
As the Chief Architect at Webtrends, it’s my job to ensure we use technology innovation to solve the challenges of our clients. This blog will hopefully provide you with a little information on not just where we are about to be, but what is already a reality.
Our data collection facilities around the world already collect data in near real time, and in less than 40 milliseconds we transfer it back to our data processing datacenters, where the data is ingested and stored for client use. Powered by Apache Spark, our Big Data platform has been ingesting client data since 2014, while we have simultaneously been developing solutions for our clients. This data is already available to every Analytics On Demand client via Webtrends Explore, which provides unlimited ad-hoc data exploration. And via Webtrends Streams, the speed of this architecture enables our clients to take action on their data as soon as it is collected. The next step is to make this data available in real time for analysis and reporting. This is not limited to certain data or key metrics: all collected data will be available for exploration in real time, and that is not far away.
We’ve really put Spark to work along with HDFS and Hadoop. Even at 13 billion events a day, and growing, we’re able to transform the data as a real-time stream. Webtrends has always delivered analytics for websites and other digital properties, and with the advent of the Internet of Things (IoT), this data has multiplied drastically in both volume and diversity. Reconciling activity across devices and sessions is very important to the digital marketer, and our streaming capabilities already do this with lossless data collection, processing and storage. At no point in the data pipeline does our platform lose the raw event.
Any analytics provider that has been around for more than five years has stored aggregate data and visitor-level data in different data stores. The uses of the data are very different, so separate data stores were the only way to ensure response time SLAs were met. But with this architectural approach come trade-offs: the separate data stores limited what you could do with the data. Our Big Data platform allows us to keep visitor, session and event data next to each other in the same data store. We are excited about the benefits of having data in one place that is fully sessionized and can be associated with a visitor’s lifetime value. This approach already enables our clients to fire event-based triggers while individuals are still on their site.
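To make the idea of a single, sessionized store more concrete, here is a minimal Python sketch. It is an illustration only and not how the Webtrends platform is implemented: the 30-minute inactivity timeout, the event fields (`visitor_id`, `timestamp`, `type`, `revenue`), the `cart_add` trigger and the `ingest` function are all assumptions made for the example, and events are assumed to arrive in timestamp order for each visitor.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional

# Assumed inactivity timeout that closes a session (hypothetical value).
SESSION_GAP_SECONDS = 30 * 60


@dataclass
class VisitorRecord:
    """One visitor's sessions, raw events and lifetime value, kept together."""
    sessions: list = field(default_factory=list)  # list of lists of raw events
    lifetime_value: float = 0.0
    last_seen: Optional[float] = None


def ingest(store, event, on_trigger=None):
    """Add one raw event to its visitor's record.

    Opens a new session when the gap since the visitor's previous event
    exceeds SESSION_GAP_SECONDS, and calls on_trigger for 'cart_add'
    events (a hypothetical trigger condition).
    """
    visitor = store[event["visitor_id"]]
    ts = event["timestamp"]
    if visitor.last_seen is None or ts - visitor.last_seen > SESSION_GAP_SECONDS:
        visitor.sessions.append([])
    visitor.sessions[-1].append(event)  # the raw event is stored unchanged
    visitor.last_seen = ts
    visitor.lifetime_value += event.get("revenue", 0.0)
    if on_trigger is not None and event["type"] == "cart_add":
        on_trigger(event["visitor_id"], event)
    return visitor


# Usage: one store holds visitors, their sessions and their events.
store = defaultdict(VisitorRecord)
ingest(store, {"visitor_id": "v1", "timestamp": 0, "type": "page_view"})
ingest(store, {"visitor_id": "v1", "timestamp": 60, "type": "cart_add"},
       on_trigger=lambda visitor_id, ev: print("trigger for", visitor_id))
```

Because the trigger runs during ingestion, it fires while the session is still open, and the same record can answer session-level and lifetime-level questions without joining separate stores.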
Having this data collected in real time and stored together opens up a lot of opportunities for our clients. Soon, using LDA (Latent Dirichlet allocation) and cluster analysis, we can start classifying visitors’ behavior by their propensity to take certain actions while they are still on the site – is this person going to buy, abandon, or convert? That insight will allow clients to understand customer value and trigger automated actions that engage visitors via emails, display ad offers, or personalized campaigns to optimize the experience and results.
Many of our clients are developing their own Big Data initiatives and/or customer intelligence databases, for which the Webtrends analytics data (at the visitor level) is one of many critical data sources. Webtrends has always provided data extracts for this purpose, but the latency between data collection and data availability is now too long. Our Big Data platform will enable us to shorten the window between the time a visitor buys a product on the website and the time that visitor-level record is available for consumption by a client’s customer intelligence system. New applications and encrypted data connectors will be able to deliver insights soon after the underlying events happen, supporting our clients’ internal data initiatives in a much more timely manner.
So, I am excited to get these new capabilities into the hands of our clients. The Big Data infrastructure is in place, and the data is already there and available. We are in the process of building and testing new, intuitive data access and delivery methods that will make real-time analysis and data integration a reality. 2016 is going to be fun.