Gaming on our smartphones, hanging out on social media, watching a video, planning next weekend or working from home: we have come to take it for granted that we use the internet for every aspect of our lives. We put up with advertising so that we can use services for free, and we forget that we pay a high price for it: our data is collected and processed in ways that turn it into a gold mine.
“Oh, cool, there's a competition for a trip to the city I've always wanted to see.” - “Look, there's a personality quiz that will give you great tips. Subscribe to the newsletter and we'll tell you your result.” - “Take part in the survey and you'll immediately find out how to become rich, beautiful and famous.” The intention behind such offers is not always obvious at first glance: they cleverly exploit human behavior to collect data.
There is not necessarily any malicious intent or commercial interest behind this. Data is also collected, processed and evaluated for scientific purposes or out of social responsibility. Protecting personal data is a fundamental requirement. At least, it should be. For the purposes of this blog post, however, let's assume that all data protection and data security requirements are met and that all regulations are complied with.
What kind of data is required?
Data is a very abstract term to begin with. Some may think of an online banking password that is stolen so that someone can empty the account. But data can be valuable in many different ways, depending on the purpose for which it is collected.
Generally speaking, data is information or facts that come in different types and forms and can be found in practically every area of life. Data is therefore not an invention of the 21st century: it was already being collected when information was still mainly passed on in printed form. Telephone directories are a good example: they collected telephone numbers and assigned them to the names of their owners, creating, in effect, the contact list of the last century.
Structured data includes, for example, telephone numbers, e-mail addresses, names and home addresses. While these are data about people, product information includes article numbers, prices, weights and stock levels. Have you checked the weather forecast today? That also requires data: temperature, humidity, wind speed and so on. These few examples alone show that data plays a role everywhere.
Data can also come from all kinds of media: texts, images, videos and audio clips. Medical readings, geographical data and personal data can be collected as well. There is probably no data that cannot be meaningful in some context.
This article is not concerned with how this data is collected. The central question here is why it is collected and where its benefits lie for us humans.
Preparing data in a way that people can understand
It should be clear to everyone that huge amounts of data can be collected in a short space of time. And then? A series of numbers or letters on its own is of little use. If I want to know whether to take my rain jacket or my sunglasses to a meeting in the park tomorrow, I don't want to trawl through endless sequences of numbers or consult a crystal ball.
So it's all about data quality and preparing the collected data in such a way that it ultimately provides me with the information I need.
Data cleansing is the first step towards readability. Outliers must be identified, and strategies must be developed for dealing with duplicates and missing values.
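To make these three cleansing tasks a little more concrete, here is a minimal sketch in Python. It assumes hypothetical hourly temperature readings and one possible set of strategies: duplicates are dropped by keeping the first reading per hour, outliers are flagged with the median absolute deviation (MAD), and both outliers and missing values are replaced by the mean of their nearest valid neighbours. The data, the threshold `k=5.0` and the function names are illustrative assumptions, not the method behind our platform.

```python
from statistics import median

# Hypothetical hourly temperature readings: (hour, value in °C).
# None marks a missing value; hour 2 appears twice (a duplicate);
# 58.0 at hour 5 is an implausible spike (an outlier).
readings = [
    (0, 14.1), (1, 13.8), (2, 13.5), (2, 13.5),
    (3, None), (4, 12.9), (5, 58.0), (6, 12.4),
]

def deduplicate(rows):
    """Keep the first reading for each hour and drop later duplicates."""
    seen = set()
    result = []
    for hour, value in rows:
        if hour not in seen:
            seen.add(hour)
            result.append((hour, value))
    return result

def flag_outliers(rows, k=5.0):
    """Return the hours whose value deviates from the median by more than
    k times the median absolute deviation (MAD). Missing values are skipped."""
    values = [v for _, v in rows if v is not None]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return {h for h, v in rows if v is not None and mad > 0 and abs(v - med) > k * mad}

def fill_missing(rows):
    """Replace each missing value with the mean of its nearest valid
    neighbours (one on each side, if available)."""
    values = [v for _, v in rows]
    filled = []
    for i, (hour, value) in enumerate(rows):
        if value is None:
            prev = next((values[j] for j in range(i - 1, -1, -1) if values[j] is not None), None)
            nxt = next((values[j] for j in range(i + 1, len(values)) if values[j] is not None), None)
            neighbours = [v for v in (prev, nxt) if v is not None]
            value = sum(neighbours) / len(neighbours) if neighbours else None
        filled.append((hour, value))
    return filled

def clean(rows):
    """Deduplicate, treat outliers as missing, then fill the gaps."""
    rows = deduplicate(rows)
    outliers = flag_outliers(rows)
    rows = [(h, None if h in outliers else v) for h, v in rows]
    return fill_missing(rows)

cleaned = clean(readings)
```

Replacing an outlier is only one option; depending on the use case it may be better to keep it and mark it, because an extreme value is not always an error.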
The next step is to standardize the data so that it is comparable. To do this, values need to be converted into standard formats and measurements into consistent units (to stay with the weather example: I want to know the temperature in degrees Celsius, not Fahrenheit).
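A minimal sketch of this standardization step, assuming hypothetical raw records that mix US and German date formats and temperatures in Fahrenheit and Celsius. It uses the standard conversion C = (F − 32) × 5/9 and normalizes dates to ISO 8601 (YYYY-MM-DD). The field names and the list of accepted date formats are illustrative assumptions.

```python
from datetime import datetime

def to_celsius(value, unit):
    """Convert a temperature to degrees Celsius. unit is 'C', 'F' or 'K'."""
    if unit == "C":
        return value
    if unit == "F":
        return (value - 32) * 5 / 9
    if unit == "K":
        return value - 273.15
    raise ValueError(f"unknown temperature unit: {unit!r}")

# Hypothetical date formats that occur in the source data:
# US style, German style and ISO 8601.
DATE_FORMATS = ("%m/%d/%Y", "%d.%m.%Y", "%Y-%m-%d")

def to_iso_date(text):
    """Parse a date in one of the known formats and return it as YYYY-MM-DD."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")

# Two hypothetical records describing the same day in different conventions.
raw = [
    {"date": "06/15/2024", "temp": 68.0, "unit": "F"},
    {"date": "15.06.2024", "temp": 21.5, "unit": "C"},
]

standardized = [
    {"date": to_iso_date(r["date"]), "temp_c": round(to_celsius(r["temp"], r["unit"]), 1)}
    for r in raw
]
# [{'date': '2024-06-15', 'temp_c': 20.0}, {'date': '2024-06-15', 'temp_c': 21.5}]
```

Once both records share one date format and one unit, they can be compared, sorted and aggregated directly.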
The data can then be visualized so that correlations become clear at a glance. This can be done with diagrams, colors, shapes and labels. Interactive models are even clearer: they let you filter the data according to your own preferences and reveal specific details on demand.
Our model for public transport in the Rhine-Main region follows precisely this interactive approach. Try it out for yourself: https://prohive.balcik.tech/de/zur-plattform
The interactive approach is clearer than a rigid document with static tables.
It all depends on the context
Figures alone offer no added value. The data must tell its story so that we humans can place it meaningfully in context and derive insights from it. The interactive approach makes it possible to put abstract data into an understandable context. Only when we recognize a logical structure can we draw conclusions and derive recommendations for action. Or the data can help us form an opinion in the first place, based on correlations that only become visible once it has been processed.
How the data is processed in an individual case depends, among other things, on the target group. Specialists need different information from a general audience, which is looking for reliable information that is prepared in an understandable way and is as accessible as possible.
For the technical side of processing, there are numerous tools and technologies that make implementation easier. They range from programming languages and business intelligence tools (BI tools) to specialized software and no-code solutions that enable fast presentation without in-depth programming knowledge.
The fact that we have only briefly touched on the actual procedure here should not obscure the importance of data preparation: only careful preparation makes even complex data sets accessible and understandable for a target group. This forms the basis for well-founded decisions.
Save data and keep it up to date
Collecting data is one thing; keeping it up to date is another. Update processes must be planned and implemented from the outset. Backup strategies ensure regular backups and consistency across different versions. Database management requires efficient storage and sensible access methods. Data integrity ensures consistency between different systems. Documentation ensures that metadata is maintained and changes are traceable.
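One common way to check integrity between two systems is to compare checksums of the records instead of the records themselves. The following is a minimal sketch, assuming that both systems can export their data as hypothetical `{id: record}` dictionaries. Each record is serialized as canonical JSON (sorted keys) and hashed with SHA-256, so that the same content produces the same digest regardless of key order. The stop names and IDs are made-up illustrations.

```python
import hashlib
import json

def record_digest(record):
    """SHA-256 of a record serialized as canonical JSON (sorted keys, compact separators)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def find_inconsistencies(primary, replica):
    """Compare two {id: record} mappings and report ids that are missing
    on either side or whose contents differ."""
    shared = primary.keys() & replica.keys()
    return {
        "missing_in_replica": sorted(primary.keys() - replica.keys()),
        "missing_in_primary": sorted(replica.keys() - primary.keys()),
        "differing": sorted(
            key for key in shared
            if record_digest(primary[key]) != record_digest(replica[key])
        ),
    }

# Hypothetical exports from two systems.
primary = {
    "stop-1": {"name": "Stop A", "lines": ["L1", "L2"]},
    "stop-2": {"name": "Stop B", "lines": ["L1"]},
    "stop-3": {"name": "Stop C", "lines": ["L3"]},
}
replica = {
    "stop-1": {"lines": ["L1", "L2"], "name": "Stop A"},  # same content, different key order
    "stop-2": {"name": "Stop B", "lines": ["L1", "L2"]},  # content has drifted
}

report = find_inconsistencies(primary, replica)
# {'missing_in_replica': ['stop-3'], 'missing_in_primary': [], 'differing': ['stop-2']}
```

In practice the digests would be computed on each system separately, so that only the short hashes, not the full records, need to be transferred for the comparison.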
At the other end of the data collection chain is, so to speak, waste collection. For data to stay up to date, there must also be a way of dealing with data waste. First, it must be defined how outdated or no longer relevant data is handled and how often data is renewed. Only then can the data be cleansed and/or archived. This also serves resource efficiency: processing only the relevant data saves storage and computing power.
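Such rules for outdated data can be expressed as a simple retention policy. The sketch below assumes a hypothetical policy with two thresholds: records that have not been updated for more than a year are archived, and records older than five years are deleted. Both thresholds, the `last_updated` field and the `RetentionPolicy` class are illustrative assumptions; real values depend on the use case and on legal retention requirements.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical policy: records older than archive_after are archived,
    records older than delete_after are removed entirely."""
    archive_after: timedelta
    delete_after: timedelta

def apply_policy(records, policy, today):
    """Split records into (active, archive, delete) by the age of their last update."""
    active, archive, delete = [], [], []
    for record in records:
        age = today - record["last_updated"]
        if age > policy.delete_after:
            delete.append(record)
        elif age > policy.archive_after:
            archive.append(record)
        else:
            active.append(record)
    return active, archive, delete

policy = RetentionPolicy(archive_after=timedelta(days=365), delete_after=timedelta(days=5 * 365))
records = [
    {"id": 1, "last_updated": date(2024, 5, 1)},
    {"id": 2, "last_updated": date(2022, 1, 10)},
    {"id": 3, "last_updated": date(2017, 3, 3)},
]

active, archive, delete = apply_policy(records, policy, today=date(2024, 6, 15))
# active: record 1, archive: record 2, delete: record 3
```

Passing `today` in explicitly rather than calling `date.today()` inside the function keeps the policy easy to test and to rerun for a given renewal cycle.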
Last but not least, sustainability must also be taken into account when handling data: environmental protection does not stop at data analysis.