
From big data to right data: data platforms in the age of eco-responsibility

Nicolas Ydder
1st September 2023

The current practice of collecting and storing ever larger amounts of data is often counterproductive and environmentally damaging. It is essential to embrace a frugal data approach that prioritizes quality over quantity, reducing both data collection and environmental impact. This shift entails forming a governance body and implementing operational changes such as optimizing storage and differentiating storage modes. Taking a lifecycle approach and focusing on the information needed, rather than on the data itself, can also lead to data-saving choices. This reinforces control over data assets, reduces costs and risks, and aligns with a company’s CSR strategy.

When it comes to data, it has long been assumed that the more you collect, the better off you are. After all, it might always come in handy. And with the rise of big data and the cloud, there seemed to be no reason to limit ourselves, since the infrastructures could support gigantic volumes without strain. However, we now know that masses of data, measured in petabytes, are rarely useful and can even be counterproductive, as it becomes impossible to find one’s way around them. As a result, up to 80 percent of the data collected is never used.

But even unused data requires transfers, handling, processing, storage, security measures, and more. And since all this happens at a very large scale, the resulting environmental footprint is far from negligible. In the European Union, for example, data centers accounted for 2.7 percent of electricity demand in 2018 and are projected to reach 3.21 percent by 2030 at the current pace. At a time when IT is being criticized for its growing ecological impact, fighting this “infobesity” is a priority in terms of eco-responsibility.

Cultural and strategic issues

Data frugality means breaking with the habit and ease of keeping everything. From exhaustive big data, we need to move to right data: prioritize quality over quantity by collecting and keeping only what is useful. But choosing and selecting takes effort and change, and is therefore above all a cultural issue. Everyone must be aware that, despite its “immaterial” nature, the accumulation of data is a waste that harms both the company and the environment, and that reducing it is everyone’s responsibility: IT, in charge of infrastructure; the data organization, in charge of data assets; and the business lines, the only ones able to evaluate the value and interest of datasets.

Necessary governance

Data frugality is thus becoming a common goal. However, not everyone will have the same point of view on what should be kept, for what purpose and under what conditions. It is, therefore, necessary to set up a governance body to define policy, guidelines, and roles, and to arbitrate differences of opinion. At the technical level, it can be supported by a “design authority,” embodied by architects and business decision-makers who will issue rules, manage their deployment, and ensure they are applied rigorously and consistently.

One of the reasons for the inflation of data is that no one is responsible for keeping volumes under control. Within the governance framework, it is therefore essential that someone takes on this role. It will be up to that person to ensure data delivers optimal business impact, and to veto uses whose environmental impact outweighs the benefits. To make such decisions, more detailed indicators than data volume and storage cost will need to be put in place, and all of this should be managed at the project-portfolio level.
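To support this arbitration, the indicators could be as simple as a per-dataset scorecard. Below is a minimal sketch in Python; the metric names, the scoring scale, and the decision rule are hypothetical and would need to be defined by the governance body itself.

    from dataclasses import dataclass

    # Hypothetical per-dataset indicators, going beyond raw volume and storage cost.
    @dataclass
    class DatasetIndicators:
        name: str
        volume_gb: float                   # raw storage footprint
        monthly_storage_cost_eur: float    # FinOps view
        accesses_last_90_days: int         # is the data actually used?
        business_value_score: int          # 1-5, assessed by the business owner
        estimated_kgco2e_per_year: float   # environmental proxy, model-dependent

        def keep(self) -> bool:
            # Crude rule of thumb: unused, low-value datasets are candidates for deletion.
            return self.accesses_last_90_days > 0 or self.business_value_score >= 4

    portfolio = [
        DatasetIndicators("clickstream_raw", 12_000, 260.0, 0, 2, 1_800.0),
        DatasetIndicators("sales_mart", 150, 3.2, 420, 5, 22.0),
    ]
    for ds in portfolio:
        print(ds.name, "keep" if ds.keep() else "review for deletion")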

“Data frugality reinforces the control of data assets, reduces costs and risks, and aligns with a company’s CSR strategy, making it a transformative approach in the age of eco-responsibility.”

Operational measures

In practical terms, data frugality involves several operational measures, many of which can begin to be implemented without waiting for the governance framework to be established. These actions are the priority to obtain the first significant gains and to initiate a change in perception.

  • Storage: Much of the storage footprint can be rationalized without debate, simply by leveraging platform features such as deduplication to optimize storage devices, with positive returns in line with a “FinOps” approach.
  • Access: Another priority area that is relatively easy to implement is differentiating storage modes over the data lifecycle: “hot” access for the most immediately needed data, “warm” access for less frequent analysis and reporting, “cold” storage for precautionary archiving, and finally tape archiving for purely historical purposes (a lifecycle-policy sketch follows this list).
  • Format: At the data-architecture level, several technical levers can help limit volumes, such as compression (provided the gains are not absorbed by overly frequent decompression), binary serialization (to optimize object storage), data virtualization (to avoid unnecessary replication), and data sharing.
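To make the tiering idea concrete, here is a minimal sketch of a lifecycle policy, assuming an AWS S3-based data lake managed with boto3 (the article does not prescribe a particular platform); the bucket name, prefixes, and day thresholds are purely illustrative.

    import boto3  # assumption: an AWS S3 data lake; other platforms offer equivalent tiering

    s3 = boto3.client("s3")

    # Hypothetical policy: hot for 30 days, warm (infrequent access) until day 90,
    # cold archive afterwards, and deletion after one year unless retention rules apply.
    s3.put_bucket_lifecycle_configuration(
        Bucket="analytics-data-lake",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "tier-and-expire-raw-events",
                    "Filter": {"Prefix": "raw/events/"},
                    "Status": "Enabled",
                    "Transitions": [
                        {"Days": 30, "StorageClass": "STANDARD_IA"},   # warm
                        {"Days": 90, "StorageClass": "DEEP_ARCHIVE"},  # cold
                    ],
                    "Expiration": {"Days": 365},  # delete once no longer needed
                }
            ]
        },
    )

The point is less the specific service than the principle: the platform, not a person, moves data to cheaper, lower-energy tiers and deletes it when its useful life is over.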

Adopt a lifecycle approach

But the most important change must occur at the level of data projects and products, where we must now focus on information, on what we need to know, and not on the data that enables us to know it.

In this way, it is possible to make data-saving choices throughout the lifecycle without detriment to business results (a short illustration follows below):

  • Use third-party data, if available, rather than collecting and owning one’s own.
  • Filter data at the source and pre-process it so that only what makes sense is brought up.
  • Choose pre-trained or data-saving algorithms (few-shot learning, zero-shot learning).
  • Determine the threshold of precision/relevance that is just necessary and do not extend calculations beyond it.
  • Keep only the results, not the raw data that produced them (or keep only representative samples).
  • When possible and relevant, be satisfied with aggregated results and averages rather than detailed figures.

Note that all these measures need to be documented and traceable, in case there is a need to account for the various sorting and deletion decisions made.
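As a rough illustration of filtering at the source and keeping only aggregated results, here is a minimal sketch using pandas and Parquet; the paths, column names, and thresholds are hypothetical.

    import pandas as pd  # illustrative sketch; assumes pyarrow (and s3fs for the S3 paths)

    # Filter at the source: read only the columns and rows needed for the KPI,
    # rather than pulling the full raw dataset onto the platform.
    orders = pd.read_parquet(
        "s3://analytics-data-lake/raw/orders/2023/",
        engine="pyarrow",
        columns=["order_date", "country", "amount"],
        filters=[("order_date", ">=", "2023-01-01")],
    )

    # Keep only the aggregated result (monthly revenue per country),
    # not the detailed rows that produced it.
    monthly_revenue = (
        orders.assign(month=pd.to_datetime(orders["order_date"]).dt.to_period("M").astype(str))
        .groupby(["month", "country"], as_index=False)["amount"]
        .sum()
    )

    # Store the small, compressed aggregate; the raw extract is discarded.
    monthly_revenue.to_parquet(
        "s3://analytics-data-lake/curated/monthly_revenue.parquet",
        compression="gzip",
    )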

If they coordinate their efforts, business, IT, and data organizations have many levers to reduce data volumes and their environmental footprint. Above all, by instilling the reflex of constantly questioning the meaning and usefulness of what is collected, this policy of frugality reinforces control of data assets, which in turn also helps reduce costs and associated risks. For the time being, no regulation requires companies to moderate the amount of data they collect, but, from every point of view, it is in their best interest to anticipate this.

Data frugality is a concern that transcends the scope of green IT alone and intersects with the broader framework of a company’s CSR strategy. This approach to data collection and storage aligns with the transformation towards a more data-driven organization. Therefore, driving the change towards data frugality must be a top-level decision and included in the strategic objectives for the environment.

INNOVATION TAKEAWAYS

PRIORITIZE QUALITY OVER QUANTITY

By collecting and keeping only what is useful, which requires a cultural and strategic shift towards a data frugality approach.

ESTABLISH A GOVERNANCE BODY

To define a policy, guidelines, roles, and to arbitrate differences of opinion. Implement operational measures such as optimizing storage and differentiating storage modes.

WORK THE DATA ARCHITECTURE

Several technical levers can help limit volumes, such as compression (provided that the gains are not absorbed by too frequent decompression), binary serialization (optimization of object storage), data virtualization (to avoid unnecessary replication), and data sharing.

Interesting read?

Capgemini’s Innovation publication, Data-powered Innovation Review | Wave 6, features 19 such fascinating articles, crafted by leading experts from Capgemini and key technology partners like Google, Starburst, Microsoft, Snowflake, and Databricks. Learn about generative AI, collaborative data ecosystems, and an exploration of how data and AI can enable the biodiversity of urban forests. Find all previous waves here.

Nicolas Ydder

Managing Data Architect, Insights and Data France, Capgemini 
Nicolas Ydder is one of the key leads of the CTO Office within Insights & Data France. He has nearly 10 years of experience in architecture and has been at the forefront of technology and innovation as one of the pillars of the Applied Innovation Exchange in Toulouse. He now applies his knowledge of innovation and data for an aerospace company.