Data Mesh - The future for data led organisations

The good, the bad and the ugly

We are privileged to work with many types of organisation across multiple industries, so we get to see the good, the bad and the ugly of data processes.

Many companies haven’t focused on their data strategy because data simply hasn't been central to their success. They’ve served customers well without one for a long time, so why invest in something where the ROI is unclear?

However, there is usually a breaking point:

  • The time required for manually getting answers

  • A large and ever-growing number of data sources

  • No clear single source of truth

  • The nagging feeling of untapped data potential

  • Data compliance, data security, data breaches

  • And finally, the fear of competitors gaining an advantage

This leads to a familiar shout around the office:

We need to do something about our data!

Where do you start?

When you start a data architectural build, is there a right way?

What if you buy the tech and data resources and then find out it’s built the wrong way for your organisation in 5 years?

Well, over the last few months I’ve been interviewing Heads of Data with this question in mind. If we start with the end in mind, what is the north star we should be marching towards?


Key finding 1

The main obstacle with any developed data analytics function is the bottleneck of data and skillset – the more central intelligence you have, the more outer teams want to use it, so you need to find a way of allowing data and insights to flow.

So how do you do that if data is so complex that only your tech wizards can make sense of it?

You need cultural, technical, and strategic changes. At a high level, you need to:

  1. Invest in technology - Think cloud, scalable and accessible.

  2. Build a data-focused team - Not just your data squad; include others who aren’t necessarily data savvy but are experts in their own field.

  3. Communicate insights effectively - Insights and data need to be well presented, preferably with a story-telling, visual approach.



Key finding 2

If you need a modern data environment, there is an emerging trend to be aware of, and it’s not something you can buy off the shelf.

For those with limited time, here is the three-point summary (and it’s called a Data Mesh):

  1. A decentralised data structure with domain/product teams who own the analytical and operational data.

  2. Self-serve infrastructure, data products and federated governance.

  3. A data-driven decision-making culture across the organisation.

And what’s the benefit of doing this?

You’ll double the analytical throughput.

Let’s break those points down, so we understand what they mean.

A decentralised data structure – In most organisations, you’ll have a centralised data structure, where the data team manages the data warehouse and is often a bottleneck for reports and analysis. Conversely, a decentralised structure pushes the data and knowledge out to the teams that need the insights. So how do you manage that, I hear you ask?

With domain/product teams that are responsible for their data and for building data products. These teams know their domain data and processes better than the central data team does, and are essentially guardians of the data and products created there.

To facilitate this, they need a self-serve data platform infrastructure on which they can create data products that meet the specific needs of the domain. These domain products are then available to other domains for cross-domain analysis.

All of this needs to be managed by an overarching federated governance – an adherence to the organisational rules and industry regulations.
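
For the more technically minded, here is a rough sketch of how those pieces could fit together in code. It is only an illustration under my own assumptions: the `DataProduct` and `Catalog` names, their fields and the single governance rule are all hypothetical, not part of any Data Mesh standard or tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """A dataset a domain team publishes for others to use (hypothetical shape)."""
    name: str
    domain: str                 # owning domain team, e.g. "crm"
    owner: str                  # named contact inside that team
    description: str
    columns: tuple[str, ...]


@dataclass
class Catalog:
    """Self-serve registry: domains publish products, other domains discover them."""
    _products: dict[str, DataProduct] = field(default_factory=dict)

    def publish(self, product: DataProduct) -> None:
        # One illustrative federated rule: every product needs an owner and a description.
        if not product.owner or not product.description:
            raise ValueError(f"{product.name}: owner and description are required")
        self._products[product.name] = product

    def find(self, column: str) -> list[DataProduct]:
        """Return every published product that exposes the given column."""
        return [p for p in self._products.values() if column in p.columns]


catalog = Catalog()
catalog.publish(DataProduct(
    name="customer_view",
    domain="crm",
    owner="crm-team@example.com",
    description="One row per customer across store and online channels",
    columns=("customer_id", "email", "lifetime_value"),
))

# A different domain team looks for anything keyed on customer_id.
for product in catalog.find("customer_id"):
    print(product.domain, product.name)
```

The point of the sketch is the division of labour: the domain team decides what goes into the product, while the shared catalogue applies the same rule to everyone at the point of publishing.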

Clear as mud? Perhaps a visual will help (with thanks to datamesh-architecture.com)

If you’re like me, an example always helps as well.

Let’s imagine a retailer with both bricks-and-mortar stores and an online shop. A key domain is likely to be CRM with a Customer View product, which almost all departments/teams will want to use.

  • The CRM domain team has in-depth knowledge of the customer.

  • Operational data is ingested into the defined Customer View data product, likely a Single Customer View, which would be used for answering 90% of the questions usually asked.

  • The CRM domain team can build its own data products, such as a customer segmentation, which are then available to other domains.

Other domain teams for this example are likely to be:

  • Management Support - a team dedicated to delivering the dashboards, reports and KPIs that the C-level require.

  • Product - the team developing and monitoring the product offering, for stock and demand levels, pricing structure, etc.
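
To make the CRM example a little more concrete, here is a minimal sketch of a Single Customer View and a segmentation built on top of it. The order records, the matching on email address and the spend threshold are all assumptions for illustration; a real Customer View would need far more careful identity matching.

```python
from collections import defaultdict

# Hypothetical operational records from the two sales channels.
store_orders = [
    {"email": "ann@example.com", "amount": 40.0},
    {"email": "bob@example.com", "amount": 15.0},
]
online_orders = [
    {"email": "Ann@Example.com", "amount": 85.0},
    {"email": "cat@example.com", "amount": 220.0},
]


def build_customer_view(*sources):
    """Merge orders from every channel into one row per customer."""
    view = defaultdict(lambda: {"orders": 0, "total_spend": 0.0})
    for source in sources:
        for order in source:
            key = order["email"].strip().lower()  # naive identity match on email
            view[key]["orders"] += 1
            view[key]["total_spend"] += order["amount"]
    return dict(view)


def segment(customer, high_value_threshold=100.0):
    """Derived data product: a simple spend-based segment (threshold is illustrative)."""
    if customer["total_spend"] >= high_value_threshold:
        return "high_value"
    return "standard"


customer_view = build_customer_view(store_orders, online_orders)
segments = {email: segment(row) for email, row in customer_view.items()}
print(segments)
```

Other domains, such as Management Support or Product, would consume `segments` as a finished product rather than rebuilding it from the raw orders.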


So, this all sounds great but how do you get there?

Key finding 3

You can’t just jump straight to this structure as it will take significant time to evolve processes, platforms and people (there are tonnes of reading material if you’re keen to know more, see the links at the bottom).

From the perspective of many of our clients, this guiding north star is quite far off. The good news is that there are many roads to this nirvana and thanks to some of the great people I have spoken with, we can use real life experience to guide us.

  1. Start with a centralised data warehouse.

    • Even though the end goal is decentralised, you need to walk before you run.

    • Note - we can get you up and running extremely quickly with this, see here.

  2. As the company grows, there will be noticeable assets that people congregate around.

    • Focus on one of these as the starting point.

  3. Keep data engineers and data scientists centralised to begin with.

  4. Start pushing data analysts out to the teams/departments, so they begin building domain knowledge.

  5. Use newer technology such as reverse ETL tools to activate data in operational platforms, rather than relying on domain data engineers.

  6. Once established, push out data engineers and data scientists to bolster and evolve the “Domain team”.
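
If reverse ETL (step 5) is a new term, the idea is simply to take modelled data out of the warehouse and push it into the tools people already work in. The sketch below shows the pattern in miniature. It uses an in-memory SQLite database as a stand-in warehouse and a made-up `FakeMarketingTool` class as the destination; the commercial tools do this through their own connectors, which are not shown here.

```python
import sqlite3

# Stand-in warehouse: an in-memory SQLite database with a modelled table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE customer_segments (email TEXT PRIMARY KEY, segment TEXT)"
)
warehouse.executemany(
    "INSERT INTO customer_segments VALUES (?, ?)",
    [("ann@example.com", "high_value"), ("bob@example.com", "standard")],
)


class FakeMarketingTool:
    """Hypothetical operational platform; a real one would be called over its API."""

    def __init__(self):
        self.contacts = {}

    def upsert_contact(self, email, fields):
        # Create the contact if missing, then overwrite the given fields.
        self.contacts.setdefault(email, {}).update(fields)


def sync_segments(conn, destination):
    """Reverse ETL in miniature: read modelled rows, push them to the destination."""
    rows = conn.execute("SELECT email, segment FROM customer_segments").fetchall()
    for email, segment in rows:
        destination.upsert_contact(email, {"segment": segment})
    return len(rows)


tool = FakeMarketingTool()
synced = sync_segments(warehouse, tool)
print(synced, tool.contacts)
```

The appeal for a growing team is that this sync becomes configuration in a tool rather than a pipeline an engineer has to build and maintain.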


Where are you on this journey?

Organisations develop at different speeds, so you don’t need to fast-track anything, but you do need to know where you are and have a plan for where you’re going.

I’ve drawn this diagram in many forms over the years (my team will know it well) and this is the latest iteration, using the Microsoft Azure stack as we know it extremely well.

Data evolution

Stage 1 - Siloed data

  • You have all the tools and software for everything; they’re great at doing their own thing, but can you get a report that shows your team’s timesheet data against the cost of delivering the product? Probably not without some elaborate Excel machinations.

Stage 2 - Business Intelligence 101

  • The first step to dealing with these multiple data sources is to use a Business Intelligence tool such as Power BI, Tableau or Qlik.

  • Now you finally have answers, but as you keep adding data sources into the data model, the dashboard slows down and even getting the thing open means you’ll have time to make a cup of tea.

Stage 3 - Data warehouse intelligence

  • Data flows nicely through the system via APIs, RPA and manual files into a data warehouse, where your analysts can deploy their talents to have your reporting tool pointing at the right data. This is what you’ve been working towards, but as soon as it takes hold (from an organisational culture point of view), the backlog begins.

  • Data requests for report changes, deep dive analysis, new data sources, changes to data sources, fixes and issues. The Business As Usual list grows and grows and the strain of a centralised system starts to show.

Stage 4 - Domain team intelligence

  • Domain teams are set up so they can answer their own questions, build products and define new data for others to use.

  • The strain is taken off the central data team, who can now focus on bigger projects and high-end advanced analytics.

Stage 5 - Artificial Intelligence…

  • Unknown territory but a decentralised system needs constant maintenance and governance, which could become overwhelming and costly if the system grows too big.

  • Watch this space, as there is likely a way of simplifying this with technology and Artificial Intelligence.


Worried that you’re at stage zero?

If you’re not even at stage one, then now is the time to start your research and discussions.

Below are the three main areas to focus on for tech stack, with links to all:

  1. Pick your cloud instance of choice (Azure, GCP, AWS, Snowflake, Databricks)

  2. Utilise technology for ingestion (Fivetran, Data Factory, Talend, AWS and Google offerings)

  3. And reverse ETL tools (Hightouch, Census, Dataform, Airbyte and Grouparoo)

Finally, as you might imagine, we recommend you utilise expert advice for choosing the right tech for your organisation’s strategy, and build in stages.

Please reach out with comments, questions, suggestions or just for a coffee and chat.

Alternatively, join the conversation in our LinkedIn post.

About the author

Louis Keating is the Founder of White Box, with over two decades of experience in data science and advanced analytics.

 

Get in touch if you’d like to understand what your data-driven roadmap could look like.

 

Further reading and other considerations

Is the Data Mesh the only way?

Definitely not. I had a great conversation about how Data Virtualisation can be just as efficient, perhaps more so, but it will require extensive API development.

How complicated is it to set up a Data Mesh?

For complete transparency, I’ve never set one up, so I’m reliant on the people I’ve spoken to and the research I’ve pulled together. It is complicated, and you will need a diverse set of skills with buy-in from top management.

As with any strategic change, you need a thorough plan and, in my opinion, you need to showcase it with a smaller subset first: a Proof of Concept to gain full buy-in from the C-level.

How does the data governance work?

You need to establish clear policies and procedures for data privacy, security, and compliance, as you’ll have domain teams managing their own data products.
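
One way to picture this is a shared set of rules that every domain team's data product is checked against before it is published. The sketch below is my own illustration: the policy fields, the limits and the `check_product` function are hypothetical, and your actual rules would come from your organisation and regulators.

```python
# Hypothetical organisation-wide policy, agreed centrally and applied by every domain.
POLICY = {
    "required_fields": ("owner", "retention_days", "classification"),
    "allowed_classifications": ("public", "internal", "confidential"),
    "max_retention_days": 730,
}


def check_product(metadata, policy=POLICY):
    """Return a list of policy violations for one data product's metadata."""
    problems = []
    for name in policy["required_fields"]:
        if name not in metadata:
            problems.append(f"missing {name}")
    classification = metadata.get("classification")
    if (classification is not None
            and classification not in policy["allowed_classifications"]):
        problems.append(f"unknown classification: {classification}")
    if metadata.get("retention_days", 0) > policy["max_retention_days"]:
        problems.append("retention exceeds policy maximum")
    # Personal data must never be published as public.
    if metadata.get("contains_pii") and classification == "public":
        problems.append("PII cannot be classified as public")
    return problems


customer_view = {
    "owner": "crm-team@example.com",
    "classification": "public",
    "retention_days": 365,
    "contains_pii": True,
}
print(check_product(customer_view))
```

The domain team still owns the product; the central function only owns the rules and the check.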

 

Links for further reading:

https://www.datamesh-architecture.com/

https://www.datamesh-architecture.com/data-product-canvas

https://cloud.google.com/architecture/data-mesh

https://martinfowler.com/articles/data-mesh-principles.html

https://medium.sqldbm.com/data-mesh-overhyped-misunderstood-and-useful-e65c60ba6643

https://www.astera.com/type/blog/data-warehouse-concepts/



