Season 4, Episode 1
Today, nearly all businesses are using some form of behavioral customer data to drive insight. There is a wide spectrum of sophistication emerging, from small businesses using basic website analytics to understand customer journeys, to SaaS businesses putting behavioral data at the heart of their product, processes and decision-making.
In this episode of the Orbit podcast Tim Harrison, a specialist in Hg’s data science and analytics team, speaks with Yali Sassoon, co-founder at Snowplow Analytics – a business helping organisations to unlock value from rich customer data.
Tim and Yali discuss the starting points, main challenges and latest trends in unlocking valuable data.
We see that companies that are driving value and innovating with customer behaviour data are:
- Asking the right questions.
Being clear on what business problem the data is going to solve. It is better to invest in a group of people looking to move the needle on a specific issue, rather than investing in data infrastructure and data tooling, data scientists and analysts for data’s sake.
- Building a data-driven culture.
Without backing from the wider business, a data team working in isolation is unlikely to yield value. Ultimately the business, and the questions it wants to answer, should drive investments in technology, data stack and team, not the other way around.
- Adopting an agile approach to behaviour data projects.
Imperfectly answering questions about customer behaviour is better than not answering them at all. Early insights provide a foundation for getting better and more sophisticated over time.
The emerging trends show more sophisticated businesses looking at real-time data insight on their customers. Organizations are getting more comfortable with leveraging customer behaviour data for churn models, constantly identifying who is at risk of leaving, and using how customers interact with their platform to personalize the experience. The future will be less about analytics and reporting, and more about using the data proactively to improve the customer experience.
Hello everyone, and welcome to Orbit, the Hg podcast series, where we speak to leaders and innovators from across the software and technology ecosystem, discussing the key trends changing how we all do business. I’m Tim Harrison, a specialist in Hg’s data science and analytics team. And today, I’m delighted to be joined by Yali Sassoon, co-founder at Snowplow Analytics. Snowplow enables businesses to unlock data, to better understand their customers, and I’ve invited Yali to discuss how businesses can use data science to best leverage this customer behavior data and product telemetry. So, before we dive in, Yali, why don’t we kick off? You can introduce yourself properly, and perhaps tell us the story of how Snowplow started.
The history of Snowplow Analytics and the data problem it set out to solve
Thanks very much, Tim. Hello, everybody. My name is Yali. As Tim has already alluded to, I’m a data guy. So, I’ve been a data guy all my life. If you go very far back in time, I studied maths and science at university and then history and philosophy of science. And I was really fundamentally interested in how, over a period of centuries, we learned as human beings to reason about more and more disciplines in a numeric, numbers-driven way. So, when I finished university, I got into consultancy, and that was a good opportunity to take a systematic, scientific approach to understanding how businesses work, and use that to help them work better. I did that as a strategy consultant and operations consultant. I then got a chance to work in a tech company, a startup in the UK that I had a lot of passion for, a company called Openads that later became OpenX.
And then 10 years ago, I co-founded Snowplow. At the time, my co-founder Alex and I were consultants. We were working with a lot of businesses in the UK, as we’re both based in London, helping them do digital product development and customer marketing more effectively, and showing how data could be used to do both of those things. And one of the data sets that we were really excited about was behavioral data. So, we worked with customers, in many cases, who were very used to using transactional data from loyalty cards and those sorts of things to understand and segment their customer base based on what they’d bought, and that could tell you a lot. But we were really interested in using behavioral data to better understand customers, to track them through more of their journey: not just which products they’d bought, but which they’d shown an interest in, and where they’d shown that interest, as an example.
And what we realized, in 2012, was that the technology possibilities for creating and processing that data at scale were opening up. It was becoming a lot easier because of open source big data technologies like Hadoop, which no one really talks about anymore but was huge in 2012, and cloud services like Amazon Web Services, which made it easy for people with small budgets like the two of us to spin up clusters and use them to help our customers process really large volumes of data. So, that was really exciting. That led us to start Snowplow. And the idea behind Snowplow, when we started in 2012, was to enable any business to collect granular behavioral data from their website… We were primarily web focused at the beginning… And warehouse that data and then ask any question that they wanted of that data themselves.
So, our customers were using Google Analytics and Site Catalyst at the time. And those tools would enable customers to collect that data, but it forced them to use that data to do very specific things, to do conversion rate optimization or to optimize their digital marketing spend. And we really felt like there was a load more possibility that could be done with this data. We wanted to realize some of that possibility. And so we built Snowplow as a project to enable us to do that for our customers. And then we were really excited that other people in the world sort of started picking up our software and using it, and that encouraged us to build a business around it, and 10 years later, here we are.
And did that vision of starting, am I correct in saying it started as an open source project, and that sort of other people started using that Snowplow code base? Is that how you always envisioned it?
It’d be nice to give us that much credit. I think what happened was we built this initial version of the technology, and we were really excited about it. This would let us work with any one of our customers, and they could collect their own data and ask any question of that data, and that felt very liberating. It felt like we were freeing ourselves and our customers from the shackles of these digital analytics solutions that were charging to collect customer data and then charging to give customers back their own data so they could do what they wanted with it, in the case of Adobe Site Catalyst at the time, which has since become Adobe Analytics. And so the idea behind open sourcing was to enable other people to be liberated from the shackles of their service providers, and really enable any clever data person to be able to ask any question of the data.
So, one of the things that I’ve seen strongly over my previous career as a data analyst is that big breakthroughs, or the times that data really makes a difference, often come not in the analysis but in the question you ask. If you ask an interesting question and then use data to answer it, that’s when you really drive value. And I really felt like the digital analytics industry was locked into everybody asking and answering the same questions, using the same tools, and that’s what we were rebelling against. And so it was a very natural thing to open source this, and see what kind of innovation that would unleash. But honestly, when we open sourced it, we didn’t know we were going to build a business around it. We didn’t have any idea how many other people were interested in doing what we were doing and were excited about these possibilities.
And our expectation was that there probably would not be very many of them, because we’d open sourced other projects and started other businesses that hadn’t done so well. So, we were very pleasantly surprised, and then we built the business around the project later. So, the project was first, and the project was open source at its inception for those reasons of setting people free and liberating them to collect their own data. And then the business came second, but there wasn’t some grand plan to build a business from the beginning. It was just excitement about the possibilities, and enabling people to realize those possibilities.
Early applications for customer behavior data insights in business
That’s really fascinating. And you said something there that I want to touch on, which is that point about enabling new questions to be asked, if I’ve understood that correctly. Do you have an example of the way people were stuck in their thinking before, when they were maybe trying to do customer behavior data insights, that was unlocked when they actually had this data in their own systems and at their fingertips?
So, a couple of examples at the time that have since become really commonplace… So, to take a retailer example, we were very interested in understanding, across the set of products that a retailer would offer, “How often was each product viewed and then how often was each product added to basket,” and look at that ratio, and that’s something that you couldn’t do back in 2012 with Google Analytics, which was the tool that most people were using, but that was really important, because that ratio would tell you, or those two numbers would tell you, you might have products that lots of people were looking at, but nobody was adding to basket. And that tells you something. Either you’re over promoting that product, or lots of people are interested in that product, but maybe it’s wrongly priced or wrongly described, because the look to book ratio is very low.
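To make the look-to-book idea concrete: the ratio Yali describes is straightforward to compute once you have event-level behavioral data in a warehouse. A minimal sketch in Python, where the product IDs and event names are illustrative assumptions, not a real Snowplow schema:

```python
from collections import Counter

def look_to_book(events):
    """Per-product ratio of add-to-basket events to view events.

    `events` is an iterable of (product_id, event_type) pairs,
    a simplified stand-in for event-level data in a warehouse.
    """
    views, adds = Counter(), Counter()
    for product_id, event_type in events:
        if event_type == "view":
            views[product_id] += 1
        elif event_type == "add_to_basket":
            adds[product_id] += 1
    # A product with many views but a low ratio may be over-promoted,
    # mispriced, or badly described; a high ratio with few views may
    # deserve more promotion.
    return {p: adds[p] / views[p] for p in views}

events = [
    ("sku1", "view"), ("sku1", "view"), ("sku1", "add_to_basket"),
    ("sku2", "view"), ("sku2", "view"), ("sku2", "view"),
]
ratios = look_to_book(events)  # sku1 converts at 0.5, sku2 at 0.0
```

In practice this would be a SQL query over warehoused events rather than in-memory Python, but the two numbers per product are the same.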
But at the other end, you might have products that everybody who looks at adds it to their basket, so they’re really compelling, but there are very few people going to view those products. And so those are very actionable numbers that are really helpful for an online retailer, driving up the yield on their site. And that wasn’t something that was really possible in those tools, because people weren’t really asking those questions. The tooling is now much better, and many more of those questions can be answered. Another question that we were very interested in was, “What is the journey that a customer goes through, especially with a high purchase item, or if you’re a SaaS provider, that they’re going through, looking to evaluate a SaaS solution before they sign up?” And that might be a journey that spans multiple sessions when they’re educating themselves on different aspects of the product and, “Which parts of that education seem to be critical for what type of user?”
And that seemed to be a very fundamental and very interesting thing for any organization to want to explore. If I’m a SaaS provider and there’s one product that I offer, what are the stage gates that somebody goes through in their own mind when they’re evaluating my tool, and which parts of my content were most helpful in moving them forwards? And that was something that was just very, very difficult to do with digital analytics tools in 2012. They were all based around sessions. So, they worked really well where you wanted to understand what somebody did in a defined period of time, coming in off an ad and then potentially purchasing.
So, a bookshop might be able to get most of the analytics that they needed, but we were seeing all these different digital businesses, some of which were B2B, some of which were B2C. You think about dating sites. You think about online games. There are all these interesting customer journeys that you want to understand, and actually start asking open questions around, and the tools didn’t really let you explore that in any meaningful way. But actually understanding that purchase journey, if you’re a SaaS company, is absolutely critical, and if you understand it well, you understand the stage gates. You can optimize them, with significant impacts on your business growth. So, what questions you ask matters deeply.
Applications in B2B SaaS
That’s a really cool point, and I think there, the distinction, sort of one challenge that we’ve often seen is that maybe a B2B company knows of these use cases in the B2C context, where there’re retailers with millions of customers, and they can really build up statistics. Maybe you could help us understand. So, even for a B2B SaaS company, when you’re looking at that maybe initial customer acquisition journey, or marketing or sales journey, is that where this customer behavior data can also play a role, with those multiple sessions? Is that what you mean?
Absolutely, yeah. The questions are often quite similar across different types of businesses and different types of industries. Nearly every business is interested in some kind of customer acquisition journey, and the longer and more complicated that journey is, the more interesting it is, probably the more important it is, and the more challenging it is to understand. And you alluded to one of the challenges. If you’re a B2C business with millions of customers, there are a whole variety of statistical methods that can help you build that understanding that might not be open to you if you’re a B2B SaaS provider with a smaller, more niche, but potentially much higher value customer base. But those questions are just as valuable, and that understanding is just as valuable. It just means that the approach you need to take to building that understanding is different, but it starts with asking those same really important questions.
That’s really fascinating. And it’s really great to understand that potential value still, even in the B2B space. Having worked with a lot of sort of different companies, one thing I’ve heard a few times is that customer behavior data is often seen as a topic that CTOs or heads of data think has potentially got a bit of a barrier to entry, or is difficult to get started with. In your opinion, one, I guess, is that true, and sort of what is the best way for businesses to get started with using customer behavior data, sort of particularly in the B2B SaaS space? And maybe are there any sort of must have requirements that you need to be doing in your data capability beforehand?
That’s a good question. I’d like to think that nearly all businesses are doing something with behavioral data, even if it might be very simple. So, there’s a kind of a spectrum you can draw, and on one end of the spectrum, normally, the first thing organizations are doing is some level of marketing attribution, trying to understand, “I’ve got a website,” “I’ve got a product,” “The different people that are coming and looking at my product, where are they coming from, and how many of them are signing up?” And there’s some level of that going on. I think there are lots of businesses where they’re like, “We’ve got some numbers, but they don’t really tell us much, because we know that a purchase cycle is long, and we can’t really map the user over that long journey.”
And maybe it’s not one user. Maybe there are five or six or 10 different people involved in a purchase journey, and trying to figure out that these 10 cookies, or really these 10 people across these eight devices, all belong to one company and are all part of the same journey. That can be difficult, but there should be something going on, just around understanding traffic levels and so on. And then I expect, on the product side, there’s some basic product analytics in place like, “How many daily active users are engaging in our products?” “What are engagement levels looking like over time?” “If there are key funnels or workflows, tracking people through those funnels with a view to optimizing them, maybe there’s an acquisition flow,” and there are several steps in onboarding into the product, and tracking there. So, those are the simplest uses of behavioral data that I think most organizations will have something in place for, right over to the other end of the scale, where you get very sophisticated B2B SaaS companies, companies like Datadog, for example, that put behavioral data at the heart of their business.
And they’re using the data to understand the user journey in detail, all the way up to signing up to the product. And then they’re using behavioral data within the product to understand the product experience, and improve it for their users, and point their users at dashboards and content that’s most likely to be interesting to them, to understand which subsets of the customer base might be at risk of churning, or seem to be engaging less, and are there ways to reengage? Then are there ways to push other customers to use the product more effectively, and drive usage and drive value for those customers? And they’re consuming the data with AI and ML algorithms to identify those segments and activate them. So, there’s this real spectrum in the industry between what the most advanced organizations are doing, and what some of the least advanced organizations are doing.
And if the question is, “What does an organization need to do to get started,” “Is there anything they need to have in place first,” I think there’s a couple of things. So, I think there needs to be a commitment to using data to make a difference in the business, and some organizations don’t have that. And in those organizations, it’s just difficult, because answering these questions with data takes time and resources. And so unless there’s a critical mass of people that are committed to doing that, and ideally commitment at a management level, then that becomes a difficult thing to do. This stuff requires people and technology and processes and culture and time. So, that’s one sort of prerequisite and then the other thing that I think is really helpful, and is often missed, is building that set of questions and prioritizing based on the impact to the business.
If you’ve got a business where retention is the key business driver, then understanding what the drivers of churn are, and who’s likely to churn, becomes business critical, and that’s where your data resource, if you have one, should be focused. Similarly, if your goal is growing the business and winning new logos, then that customer acquisition piece becomes important. So, having clarity on the areas of focus and then the questions that you’d want answered from a data perspective to help you make a difference to those areas of focus, that, I think, is really valuable, because then, instead of investing in data infrastructure and data tooling and data scientists and data analysts for data’s sake, you are investing in a group of people to move some specific needle in some specific direction. And that, I think, is a much better starting point for moving up the data maturity curve, and starting to do this stuff in a more sophisticated way.
I find it really interesting that when we talked about sort of what the must haves were, it wasn’t an answer of, “Yeah, you need a data warehouse and a data lake and these technologies,” but really it was, if I’ve understood correctly, more coming from the people and the data driven culture. I guess you’ve seen a lot of companies go through that transformation. What are the sort of the key things you look out for to identify whether that culture is ready, and sort of any other ideas you have for how to get there, for data teams that want to start doing this but maybe they’re not sure if their organizations are there yet?
So, a few thoughts there, because there are different sorts of signals that you can look for. So, one is talking to the people in an organization, especially the people in different lines of business, to understand where data is being used to date, and where there’s an intention to use data. And that’s valuable, because you get some view of the extent to which data is part of the way the organization makes decisions, and the extent to which data is valued by different stakeholders in an organization. So, it’s possible you get organizations where there are people who are being very data driven, but they’re doing it within their own domains, and outside those domains, maybe that’s not really appreciated, and that’s not really how things are done. There are others where that’s done and that’s celebrated and then that’s emulated everywhere else, and people are really pushing that.
So, that’s something to look for. Something else we often ask about, to come to your point, is the technology setup. I’m a firm believer that the business and the questions should drive the investments in technology, not the other way around, but we do see a lot of organizations invest in the tech first and then have to go and source the questions, which I think is a bit back to front, but is quite common. And there, when you try and talk to different people about data, they’ll talk about the technology. And when you push them on how the technology has been used to make a difference, if there’s something missing there, that, to me, is very risky. We saw that in our own space when we started Snowplow 10 years ago.
Take this with a pinch of salt, but I think we were quite ahead of our time. And a lot of our early adopters were very ahead of their time, and they were very technology and architecture focused, and they started adopting our tech with a load of gusto, and a load of excitement. And then, when the economic climate changed, suddenly our stakeholders at a lot of these companies disappeared, and the Snowplow contracts were terminated. And we realized that our stakeholders in these organizations hadn’t necessarily sold the rest of the business on what they were doing, but probably more crucially, they hadn’t demonstrated value. And it’s frustrating, because I know some of them had built some really great stuff. They’d used the behavioral data to build really rich insights, but they hadn’t taken the rest of the business on the journey.
So, either the business had made great decisions but had not understood how they’d been made, and then cut the infrastructure that had supported them, or it had never actually gone on to implement the decisions and realize the benefit. There’s a trap where you can invest in the technology but not leverage it in a very public way to show value, and I think that’s a trap that data teams often fall into. I think there’s another trap where you’re very aware that there are these business questions to answer. You then look at what technology is required to solve those problems, and you’re like, “That’s a really big investment, and that’s difficult. We don’t really have the people and the time. So, let’s park it and muddle through.” That’s another mistake to make, because actually, imperfectly answering these questions is better than not answering them, and provides a foundation for getting better over time.
So, you want to take an agile approach. Be very clear what questions are valuable to answer. Focus on getting some sort of answer, but understand that maybe your first answer is quite limited, and then iterate on that as you’re able to show value. To bring it back to your question… Sorry, I’m not very good at being succinct… You can assess the business on how much infrastructure they’ve invested in, and the effectiveness of their data stack. “Do they have a data warehouse? Do they have a data lake? Have they got investments in BI and AI tooling? Do lots of people have access to that?” That’s interesting. And then it’s interesting to talk to the individual teams and find out how they’re using the data, and then some combination of the two. Hopefully, they’re using the data to do loads of amazing things, and they’ve got the stack there, but if there’s a mismatch, or an imbalance, then that normally spells some sort of trouble.
That’s extremely interesting. And there’s loads I want to unpick there, but I guess, thinking on… One phrase that you mentioned there was “driving value,” and measuring the ROI and the impact. Thinking about the kinds of projects that we often see, things like churn prediction, upsell and cross-sell prediction, and propensity modeling, finding the best opportunities with our customers: what do you see as best practice in how to demonstrate that value to the business? Do you recommend trying to make that ROI case early on, upfront at the beginning of the project, or do you say, “Well, actually, we’ve got to do a six week sprint, get a proof of concept and then demonstrate value”? Or how do you normally see that working for your customers?
I think we see both. My personal preference is to say, “Look, our churn today is here, and we want to move it here, and this is worth this amount to the business, and we are going to take a data driven approach to improving our retention. And so what we’re going to do is try and identify the users who are likely to churn. Then we’re going to try and validate whether we’re good at identifying them, and there are statistical approaches we can take to doing that. And then when we get good at spotting them, we’re going to talk to them… To talk to a subset of them… To try and understand why they’re churning. And we can also talk to ones that have already churned, where we know they churned. And then we’re going to use that to create a set of hypotheses and validate them in data, and try and make changes that are going to make a difference to that number. So, maybe we’re going to market to and engage these users around the product. Maybe we’re going to make changes to the product.”
It’s an iterative cycle. You’re coming up with different ideas to reduce churn. You’re using data to figure out whether those ideas make a difference or not. And maybe as part of a quarterly OKR process, you’re coming back to the business saying, “Look, we set out this quarter to move the retention percentage from here to here, and we only moved it 70% of the way, but this is how we did it, and these are the ideas we have to continue moving it in the next quarter.” So, be data driven from the beginning. Set the target and the value that it would unlock for the business, but be clear that the data is going to help us come up with ideas for how we do it, keep us honest, and measure our progress against that.
But how quickly we’re able to do that and how much we’re able to do is uncertain, and I think most people in the business, if that argument is made clearly and repeated enough times, will accept it. And it generally is possible to show progress. How much progress is often a source of contention: “It’s not fast enough,” or, “It is fast enough,” or whatever that is. But if people can see the numbers are moving in the right direction, then you’re building goodwill to then take the same approach and apply it to other parts of the business.
Operationalizing data insights
And I guess there’s a big step between building your model, whether that’s machine learning or just some feature correlation in the customer behavior data that is correlated with or predictive of likelihood of churn, or likelihood of a cross-sell, if that’s one of the use cases, and actually operationalizing it and then driving either a reduction in churn or new cross-sell opportunities. How do you see that connection between, I guess, what is probably the Data Team or the Analytics Team driving that initial modeling and analysis, and the Business Teams?
So, I think before you get to the ML model, there are probably some really quick and dirty proxies that are quite easy to operationalize. So, take the churn example: how much time your customer spends in the product is, hopefully, not too difficult a thing to calculate. Actually, time is one of those things that’s weirdly more difficult to calculate than it should be, but if you’ve got some idea and some measure of time, some level of engagement, chances are that if that’s below some threshold, the likelihood of churn is much, much higher. And so you don’t need to engineer a particularly difficult ML model to identify that. So, start off with that. Show that it’s predictive, and that’s hopefully something that can easily be calculated, and get that flow working between the data team and the business team so that, for example, you’re identifying, at a particular point in the month, which of the accounts you think are at risk of churning.
You’re running some sort of marketing, or some sort of program on them, to try and prevent that. You’ve got a holdout group. So, you’re testing, “Am I making a difference versus my control group?” And then you are iterating. Get that working before you’ve gone and built an ML-based model, and if you can show results there but you think your prediction isn’t that good, then the next step is to go and build a model. But I think it’d be a mistake to go and build a really elaborate churn model that takes loads of different features about loads of different aspects of your customers’ behavior, train that until it’s really, really accurate, then put it live, then get some results, then run the program, then realize that you’ve got a load of workflow to figure out with the Business Team, because you might have lost six months in that process.
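The quick-and-dirty flow Yali describes, flagging low-engagement accounts against a threshold and then reserving a holdout group so the retention program can be measured, can be sketched in a few lines of Python. The account names, the 30-minute threshold, and the holdout fraction are all illustrative assumptions:

```python
import random

def flag_at_risk(minutes_by_account, threshold=30):
    """Crude churn proxy: flag accounts whose time-in-product this
    month fell below a threshold (numbers are illustrative)."""
    return [a for a, mins in minutes_by_account.items() if mins < threshold]

def split_holdout(at_risk, holdout_frac=0.25, seed=42):
    """Randomly reserve a control group so the retention program can
    be measured against doing nothing."""
    rng = random.Random(seed)
    shuffled = sorted(at_risk)  # sort first so the split is reproducible
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * holdout_frac)
    return shuffled[cut:], shuffled[:cut]  # (treated, holdout)

# Hypothetical monthly minutes-in-product per account
usage = {"acme": 12, "globex": 95, "hooli": 20, "initech": 4,
         "pied_piper": 8, "umbrella": 40}
at_risk = flag_at_risk(usage)           # the low-engagement accounts
treated, holdout = split_holdout(at_risk)
```

The treated group gets the re-engagement program; comparing its retention against the untouched holdout group is what tells you whether the program, and not chance, moved the number.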
A really interesting message, and I think it’s one that we see as well in the Hg Data Team: there are often cases where we do machine learning models, and that can be highly predictive and highly powerful, and really drive impact. But often, especially when maybe there hasn’t been so much done before, some simple analytics, looking at a few key drivers, can be a really quick win. Maybe on that topic… You kind of touched on it there with minutes in product… Are there any other interesting signals you’ve seen over time, either for churn prediction or maybe cross-sells and upsells, where you’ve been surprised, or found that a customer behavior data signal actually has a lot of predictive power?
So, time is worth dwelling on, because it just comes up again and again and again. Somebody told me, but I can’t remember who, and this might not be accurate, that at Netflix, which is famous for its recommendation algorithm, one of the key features it uses is how much time people spend dwelling on different series when they’re going through the menu, and that is very predictive. That may or may not be true. I think that’s super interesting. Across our customer base, which is a lot of SaaS, a lot of retail, some FinTechs and media companies, time spent is definitely one of the most predictive signals. With churn, there’s an obvious one that sometimes gets missed. There are certain parts of the account screen where, if somebody’s looking around, chances are they’re looking for the cancellation button. So, that can be pretty, pretty predictive, but pretty crude. It might be a bit late by the time-
Time hovering over the “cancel” button?
Exactly, and then engagement with the support team… The other thing that we sometimes see being interesting is, if the customer is in a workflow and they don’t get to the end of that workflow and so you see them maybe retrying that workflow multiple times, and they’re spinning… They’re trying to accomplish something and they’re not managing to do it… If that happens more than a certain amount, that is often predictive that their product experience is plummeting, and they might be at risk of churning.
The big trends in SaaS – Data to power and personalize products
That’s pretty fascinating, and it’s these additional signals that kind of go beyond just that basic, “How many minutes are they spending in the product,” that sort of other metrics of time within. I guess there, all our businesses are interested in sort of the churn and potential for upsell. I guess, thinking of the CTOs and CPOs, what kind of use cases do you see actually where customer behavior data can be used to sort of actually enhance the core product itself, either in sort of feedback, but also have you seen use cases where really customer behavior data sort of provides live feedback and live engagement with the customer?
That is a big trend. So, we’re seeing more and more SaaS companies actually using the data to power parts of their product. Sometimes we see customer facing metrics. You might be a SaaS company that provides… I don’t know… An email service to your users, and being able to provide statistics back to administrators of the account, showing who within the organization is using the email service, and how often, and where, can be valuable in helping those businesses understand the value they’re getting from the SaaS tool. So, those customer facing metrics can be powerful. Sometimes the behavioral data is used to power parts of the product in more subtle ways, like recommending content or personalizing the experience. So, based on what the user is doing, can we help them accomplish what they want to accomplish? That’s something we see more and more, and it’s a really nice combination of very predictive behavioral data and machine learning, used to make informed guesses about what somebody is or isn’t going to be interested in.
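The customer-facing metrics described here amount to a simple aggregation over behavioral events, surfaced back to account administrators. A minimal sketch, with a hypothetical event shape (a real report would also break down by time, channel, and feature):

```python
from collections import defaultdict


def usage_report(events):
    """Aggregate per-user activity for an account-level usage report.

    `events` is a list of dicts with a hypothetical "user" key
    identifying who in the organisation performed each action.
    Returns (user, action_count) pairs, most active first, so
    administrators can see who is using the tool and how much.
    """
    report = defaultdict(int)
    for e in events:
        report[e["user"]] += 1
    return sorted(report.items(), key=lambda kv: kv[1], reverse=True)
```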
Just to bring that one to life with a real example: is that a user in a workflow, clicking certain buttons and being directed to the bit they’re most likely to use next, or is it more when they’re doing a search, and the recommendations that can be surfaced?
It’s more search and recommendations, so if they’re engaging in one part of the product experience, then around that, highlighting other parts of the experience that might be relevant. So, generally, I’ve seen it much less in the actual workflows, where I think that the focus is more on if the workflow is directing users to make each step in the workflow easier. Sometimes they can be a bit more blurred, especially if there are workflow steps where you’re… I don’t know… Maybe you’re making a presentation, and the set of backgrounds you’re choosing from. Then there are opportunities to use behavioral data and other things that you know about, either this person or other people in their organization, the type of presentations that they’ve created to suggest the backgrounds, or the templates that are most likely to help this user at this point in time when they’re creating their presentation, as an example.
The big challenges to behavioural data implementation
Really interesting, and I think that’s a really big step, actually using this customer behavior data not just for descriptive reporting and predictive analytics, but as part of the live customer experience, to enhance the product offering. One point I wanted to touch on: we mentioned one challenge being, whether you call it the “data driven culture,” getting not just the tech stack but also the people and process right. Are there any other big challenges you’ve seen, that come up time and time again and cause big difficulties in customer behavior data projects? What should companies watch out for?
With behavioral data in particular, there are quite significant organizational challenges. Especially if you are a company whose product or service is offered on a large number of different channels, you often have different product engineering squads responsible for each of those products, and then you want to collect and create data across those different platforms and channels in a consistent way. So you’re asking those different product engineering teams to instrument tracking as per a specification, and that’s a challenge. It’s a big coordination effort, and what happens is companies are on a journey. They start using this data in quite simple ways, and when they’re using it in those simple ways, if there are differences in how those teams have instrumented the tracking, or some of the tracking has better coverage than others, it probably doesn’t matter.
But, as they start moving up in sophistication, as they start maturing in their use of data, as they start using this data to predict who is likely to churn and who is not, suddenly each line of data matters a lot more. If you’re missing a couple of lines, or a couple of lines mean something different from what you expect, they might fundamentally change how you think about the intention of this particular user. And what often happens at that stage in the journey is that people get frustrated that the quality of the data isn’t there to support the more sophisticated data use cases, and that is a good thing, weirdly.
So, it creates a huge amount of frustration, but what it means is that the company is moving from one level of sophistication to another, and now needs to invest in improving the data quality. No organization ever invests in data quality ahead of having those challenges, because it’s a big investment: spending all that time diligencing the tracking and the quality of the data just isn’t worth it in the absence of that business value. So you’ve got to get to the point where you need the good data before you actually invest in the good data, and that’s a perfectly natural part of the journey.
So, you see companies starting with, for example, packaged analytics solutions, which are very quick and simple to get up and running, and get to value nice and quickly. Then, as they move on to more of these ML and AI powered use cases, as they try to understand their customers’ acquisition journeys, or their propensity to churn, in more nuanced ways, they often need to migrate to tools like Snowplow, and they need to migrate their approaches to think more about things like data quality that didn’t matter so much before.
And the thing I’m always urging is that there’re often a lot of fights and upsets and frustration, but that is a natural part of the process. You invest in the data quality and it comes and then these use cases follow, and the key thing is you don’t just invest in data quality and then hope that magic happens. You do it because then you want to execute in those use cases and then, as soon as you have the data, you execute and then you show the business that investment was worth it, because otherwise, you’re asking a lot of people to put in a lot of work, and you’re not showing them how they’re getting value, and that can be a really… That’s a common failure made. “This person has taken up a load of my team’s time getting us to do all this work on data, and we haven’t seen any of the benefits of that.”
Brilliant, and that’s part of their journey, which is they now need to go back and look at the data quality, and I guess, with that comes that sort of lack of trust phase, where people sort of have the numbers, but you lack that trust. And then you say the sort of real sort of way of mitigating that is that, when you have that problem, sort of to lean into it and say, “Now we need to address data quality, and have a single source of truth, or a unified view, of what a user is in different channels, or what minutes and products are, and is that really the solution, sort of having one customer behavior data system that goes through all these different channels, or one sort of single definition?
It helps, but I don’t think it’s the solution, necessarily. So, I am skeptical when somebody is like, “This is the platform. This is the solution. This is the time that we take the data that’s bad now, and we’re going to fix all the problems.” I think we’ve got to be honest, and say, “The data quality is here, and it needs to get to here to execute on this use case, and to execute on this use case after that, we’re going to have to do this extra work, and push it up. So, we need to make this investment, but we’re doing it because, to my earlier point, we want to unlock this use case and unlock this business value.” So, communicate that to the business. Find the team, or the place where you can do the data quality work, and prove that out, so don’t do it across the business.
That’s much too hard. If you’re a multichannel, multi-platform business pick, the platform with the Product Engineering Team, that are most data driven and most likely to look at the data and most likely to understand and care about it, and work with them on a POC. Show that, for that platform, you can, and for the people the customers that engaging through that platform, accurately predict churn, accurately predict upsell, or reliably make some difference to the business and then take that and go and work with each team sequentially to go and push through that sort of change.
And there are points in the journey where you will need to upgrade, or swap different data tooling in and out, to help support that quality mission. The tooling around data quality has become a lot more developed, and that ecosystem a lot richer, because many more organizations are on this journey and need that help, and that’s definitely a big part of where Snowplow plays. But, again, it’s all about the people and the processes and the journey and the politics, and the technology comes second. You can buy the best technology in the world, but if you promise everybody you’re going to fix all this stuff in eight months and then these 10 use cases are going to go live, you’ll probably end up with a lot of anger directed at you, unfortunately.
Really relevant, and very, very interesting to see it that way… Looking ahead, what trends do you see emerging from this 10 year journey that you’ve been on with Snowplow? I’d imagine there’s been a big delta from those days to now. Where do you see us going in the next three, five, 10 years?
It’s really exciting to look ahead. I think a lot of the last 10 years has been about the disaggregation of the analytic stack. So, if we go back 10 years ago, the way people worked with behavioral data was through packaged analytics solutions. It was through solutions like Google Analytics and Adobe, and what’s happened since is companies are being much more imaginative about the behavioral data that they’re collecting, and the way they’re using it, and they’re investing in data warehouses. They can store their own data in tools like Snowplow, so they can create their own data, and AI tools like DataRobot and H2O and SageMaker, so they can build their own models and streaming architectures and so on, so they can activate that data in interesting ways. So, we’re seeing sort of this idea of the modern data stack.
I think that is going to become a lot more mainstream. Many more organizations are getting to the level of sophistication where they’re owning their own data, thinking deeply about what data they’re creating and how they’re using it, and piecing together different tools to meet different objectives as they work through that journey. Today, a subset of companies are on that journey, and that number is growing very quickly, which is really exciting. In terms of the evolution of the industry, I then see a lot more AI and ML in production. There are a lot of organizations now experimenting with that in a development mode, but I think we’re going to see a lot more of it in a real-time mode, as organizations get more comfortable with things like churn models, which they want to run constantly, constantly identifying who is at risk of leaving, and similarly personalization engines and so on. So, much more of a focus on real time.
And so that’s going to have some interesting implications. I think the tooling for working with data in real time is going to get a lot better, and I think for businesses, what that means is that the bar around what they do with data is going to get a lot higher, which is really exciting. So, what’s cutting edge today, I think, will become a lot more standard, and where we’ll start to see real differentiation is in real time. It’ll be less on analytics and reporting. I think it’s in really pretty good shape, and organizations have become very good at it, and it’s more about how we’re using the data proactively to improve our customer experience.
That really resonates. And the part about what companies are doing with data, the bar getting higher and higher, is one that we see, and one that helps me wake up in the morning, so I’m really excited about the journey there and where customer behavior data plays its role. I think there’s loads more we could discuss, and I’ve got a whole load of questions that we’ll have to save for another podcast session on the modern data stack and where that’s going, but Yali, we’re out of time. So, I want to thank you once again for a fantastic session, and I really appreciate your time and look forward to talking with you again in the future.
Thanks so much, Tim. It’s been a pleasure talking to you.