Data Hubs: Why, How, and the Future

Roosa Säntti
5 Jul 2022

Data is both a challenge and an opportunity for any business today. Ensuring it’s collected, structured and brought together in a meaningful way from various systems is one part. Making it accessible in a user-friendly way to those who need it and can use it to drive the business is the other. Today’s bonus is to do all this in a sustainable way – moving from BIG data to smart data, collecting and using what is needed instead of gathering it all “just in case”.

A data hub can address those requirements, but it’s a solution that needs internal buy-in to execute effectively. This makes communicating the potential value of such a project to stakeholders an essential early step. So, how do you deliver and ‘get it done’? 

The technical foundation is the core, but a data hub doesn’t bring true value to a business without people: data governance, data culture, data literacy and a business-driven data development model are needed to turn your data into sunshine.

Main principles for a modern data hub

The initial struggle for a data team is the immense amount of data being generated, processed and stored each day by a business – not forgetting external data such as from the internet, IoT and social media. It may be of some reassurance to recognize that this is a struggle most organizations face.

Data teams are set the task of continuously improving their data platform ecosystem whilst also helping a business extract value from their data through insights and predictions with the help of analytics, AI (artificial intelligence) and ML (machine learning).

Data scientists and analysts will use this growth in data volume and richness whilst also discovering newly available data assets, understanding their provenance, and taking appropriate actions based on the insights.

In turn, this feeds back to the data team, which is responsible for ensuring that data is available to the right person at the right time and in the right format.

A data hub, or data platform, delivers a solution to these challenges by collecting your organization’s useful data into a centralized platform that is:

  • scalable to accommodate future data growth
  • flexible in that it supports diverse types of data sets: structured, semi-structured and unstructured
  • reliable to ensure data is always available for consumption
  • secure so that the data is shared with all the data consumers in a controlled manner

A maxim says that no two companies have the same data structures. Even if only partially true, it makes the point that a data hub has to be able to adapt to different use cases.

When designing the platform, the main things to consider are:

  • Data Push vs Data Pull
  • General vs Specific
  • Online vs Offline
  • Centralized vs Federated Governance

Major cloud platforms and services such as MS Azure, AWS, GCP and Snowflake all provide a good basis for your data hub. The final tech selection should therefore be based on your enterprise architecture and on the business use cases and requirements.

People matter: Data Governance, Culture and Literacy

Once a centralised data platform is designed, the immediate need is to start implementing effective data governance, so that data ownership, roles, responsibilities and decision-making processes are identified, assigned and in action.

Many companies still lack clear roles for data ownership. Data owners are especially needed in the context of data hubs to ensure conceptual data models are created and kept up to date. To prevent a data hub from becoming a data swamp, you need to define the data you bring in. The starting point of any analytical or AI development activity should be a conceptual data model – by this, we mean describing your business with data. Once the conceptual model is defined, you may move on to the next layers of data modelling – logical and physical.

Data culture and literacy are often considered the softer sides of data development, but we have learned that they are the hardest. Data culture is built when people learn by doing and see how data can support a business and solve business problems. Culture eats strategy for breakfast, and hence, it cannot be forced. Data literacy means that people in the organisation know how to read data – reports, dashboards, raw data, etc. – and also understand their position in the data supply chain: where their role fits in creating and using data.

Business value as a driver for development

Once you have the foundations of a centralised data hub in place, the fundamental question arises: what should you do with it, and how can you ensure it serves the business in the right way and with the right priority?

These questions become especially crucial when an organization’s maturity grows and there is increased demand for AI and analytics projects. Data culture is the key and makes businesses demand more.

Usually, even a small team can build a platform that has the fundamentals in place, but once demand grows, companies start to struggle with how to prioritize and scale the development.

What you develop on your data hub, and how, should always be business case driven – each analytics and AI project should have an expected business value. It should be systematically verified that the actual result – a data product – creates that value. This is where the role of the business is critical: only business stakeholders can say whether the product serves its purpose.

Our experience at Capgemini has taught us these three aspects are the most important to consider for analytics and AI development:

  1. Prioritization – Business value is the key
  2. Organizing – Teams know their data
  3. Development – Where data meets design and AI

1 – Prioritization – Business value is the key

Every development project should have an expected business value and be described using, for example, a business model canvas. This should include data availability analysis, expected business value, development cost, cost of delay, and business case owners. The forum to decide on the prioritization of the use cases should be a mix of people representing business, IT and data development teams.
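As a rough illustration of how such a prioritization forum might compare use cases, the sketch below ranks a small backlog. The field names, the example figures and the scoring formula (expected value plus cost of delay, divided by development cost) are all assumptions made for this example, not a method prescribed here; any real scoring model should be agreed by the business, IT and data representatives.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    """Hypothetical summary of one business case; all fields are illustrative."""
    name: str
    owner: str               # business case owner
    data_available: bool     # outcome of the data availability analysis
    expected_value: float    # expected business value per month
    development_cost: float  # estimated one-off development cost
    cost_of_delay: float     # value lost per month of delay


def priority_score(uc: UseCase) -> float:
    """Assumed score: (expected value + cost of delay) per unit of development cost."""
    if not uc.data_available:
        # Without data the use case cannot start, so it drops to the bottom.
        return 0.0
    return (uc.expected_value + uc.cost_of_delay) / uc.development_cost


def prioritize(use_cases):
    """Return the use cases sorted from highest to lowest score."""
    return sorted(use_cases, key=priority_score, reverse=True)


# Made-up figures, purely for illustration.
backlog = [
    UseCase("Churn prediction", "Sales", True, 40_000, 100_000, 10_000),
    UseCase("Demand forecast", "Supply chain", True, 30_000, 40_000, 10_000),
    UseCase("Sensor anomaly alerts", "Operations", False, 80_000, 60_000, 20_000),
]
ranked = [uc.name for uc in prioritize(backlog)]
print(ranked)
```

A simple ratio like this keeps the discussion transparent: anyone in the forum can see why one use case ranks above another and challenge the inputs rather than the result.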

2 – Organizing – Teams know their data

A data development team should know enough about the business and its data, and should be multi-functional so that it can run an agile development project independently. The target is to create a data product that is owned by the business area. Creating data definitions step by step allows data equity to be built as the project progresses.

The development teams must follow commonly agreed ways of working and development processes. Data sets are defined and shared transparently, for example through a data catalog, which in turn creates an internal data marketplace. Data(set) re-usability is born.
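To make the idea of a shared catalog concrete, here is a minimal in-memory sketch. The `DatasetEntry` and `DataCatalog` classes, their fields and the example data sets are hypothetical and invented for this illustration; a real data hub would normally rely on the catalog service of the chosen platform.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """Hypothetical catalog record describing one shared data set."""
    name: str
    owner: str        # business area that owns the data product
    description: str
    tags: set = field(default_factory=set)


class DataCatalog:
    """Toy catalog: register data sets once, let every team search them."""

    def __init__(self):
        self._entries = {}

    def register(self, entry: DatasetEntry) -> None:
        # Refuse duplicates so each data set has exactly one definition.
        if entry.name in self._entries:
            raise ValueError(f"dataset already registered: {entry.name}")
        self._entries[entry.name] = entry

    def search(self, tag: str) -> list:
        """Return the names of all data sets carrying the given tag, sorted."""
        return sorted(e.name for e in self._entries.values() if tag in e.tags)


catalog = DataCatalog()
catalog.register(DatasetEntry(
    "customer_orders", "Sales", "One row per order line", {"customer", "sales"}))
catalog.register(DatasetEntry(
    "customer_master", "Marketing", "One row per customer", {"customer"}))

found = catalog.search("customer")
print(found)
```

Requiring an owner and a description at registration time is one way to tie the catalog back to data ownership, so that re-use does not come at the cost of unclear definitions.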

3 – Development – Where data meets design and AI

A data hub is a perfect basis for agile and scalable service development and deployment. However, it is not only about tech since the business value and end-user experience should always be kept in mind. To achieve this, the service design method provides a way to steer the development process.

Operating on the same data platform, with governed, high-quality data from the early trials onwards, mitigates the pain points of productizing the services. The applications may vary from simple BI dashboards to state-of-the-art AI services. Whenever machine learning is involved, it is crucial that there are standardized processes to build, deploy and operationalize ML models. This approach is called MLOps, a close relative of DevOps. It provides robust life-cycle management for ML solutions, which are often sensitive to many kinds of changes, especially changes in the input data.
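One small piece of such a life cycle is monitoring whether incoming data still resembles the data a model was trained on. The sketch below is a deliberately simple, assumed check: it measures how far the mean of a new batch has moved from the training mean, in units of the training standard deviation. The function names, the threshold of 2.0 and the sample values are illustrative assumptions, and production setups typically use richer statistical tests.

```python
import statistics


def mean_shift(baseline, current):
    """Distance between the current and baseline means, in baseline standard deviations."""
    base_mean = statistics.mean(baseline)
    base_std = statistics.stdev(baseline)  # sample standard deviation
    if base_std == 0:
        raise ValueError("baseline has no variation, shift is undefined")
    return abs(statistics.mean(current) - base_mean) / base_std


def has_drifted(baseline, current, threshold=2.0):
    """Flag drift when the mean moved more than `threshold` standard deviations.

    The default threshold is an arbitrary example value, not a recommendation.
    """
    return mean_shift(baseline, current) > threshold


# Made-up feature values, purely for illustration.
training_values = [10.0, 11.0, 9.0, 10.0, 12.0, 8.0]
similar_batch = [10.0, 11.0, 9.0, 10.0]
shifted_batch = [20.0, 21.0, 19.0, 22.0]

print(has_drifted(training_values, similar_batch))
print(has_drifted(training_values, shifted_batch))
```

In an MLOps pipeline, a check of this kind would usually run on each new batch of input data and trigger an alert or a retraining step when drift is flagged.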

To discover more about data hubs and related technology solutions, speak with our expert Insights & Data team.

Author

Roosa Säntti

Head of Insights & Data

Email: roosa.santti@capgemini.com
