The World of Data

Article 3 - October 14, 2022

Nowadays, loads of data are produced and gathered on a daily basis. It has been said that the number of bytes in the digital universe is 40 times bigger than the number of stars in the observable universe. The pandemic even accelerated this into a huge data drift, so data is everywhere and in everything! All this data can be useful for different purposes. However, this vast universe of information brings some challenges with it. At AxonJay, we like to see these challenges as opportunities. Hence, we welcome this universe of data chaos and data drift.

Look at the source

We can distinguish data based on its source: public (aka open) data, web data or purchased data. The first two options are available for free, while the last one comes at a price.

Power to the people

Public data can be delivered by governments or independent organisations. The Belgian government, for example, offers its ‘Crossroads Bank for Enterprises’ for free online. Similar databases can be found in other countries as well. Eurostat, the statistical office of the European Union, is another well-known open data source. It publishes high-quality European statistics and indicators ranging from agriculture and fisheries to economy and finance.

Crawling, Scraping and … Peeling?!

When data isn’t publicly offered, we have to follow another route. The most direct one is getting the content presented on websites through crawling, scraping and peeling. Traditionally, web crawling is about finding URLs, while scraping is about extracting data from a website. Both are slow methods for gathering data.
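To make the difference between crawling and scraping concrete, here is a minimal sketch using only the Python standard library. The page content, the URL and the class names (company-name, employees) are made up for illustration, and the HTML is inlined so that nothing is actually fetched from the web. It shows the classical approach only, not AxonJay's web peeling.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical page, inlined so the sketch runs without network access.
PAGE_URL = "https://example.com/companies/"
PAGE_HTML = """
<html><body>
  <h1 class="company-name">Example Corp</h1>
  <span class="employees">42</span>
  <a href="/companies/page2">Next</a>
  <a href="https://example.com/about">About</a>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Crawling: find the URLs a page links to."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


class FieldExtractor(HTMLParser):
    """Scraping: extract the text of elements with chosen class names."""

    def __init__(self, wanted_classes):
        super().__init__()
        self.wanted = set(wanted_classes)
        self.current = None  # class of the element we are currently inside
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in self.wanted:
            self.current = cls

    def handle_data(self, data):
        # Store the first non-empty text found after a wanted start tag.
        if self.current and data.strip():
            self.fields[self.current] = data.strip()
            self.current = None


def crawl(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def scrape(html, wanted_classes):
    parser = FieldExtractor(wanted_classes)
    parser.feed(html)
    return parser.fields


links = crawl(PAGE_HTML, PAGE_URL)
fields = scrape(PAGE_HTML, ["company-name", "employees"])
```

In a real setting, every link found by the crawler would have to be downloaded and scraped in turn, which is exactly why these methods are slow.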

Web peeling, on the other hand, is a new process applied by AxonJay. Here we analyse the websites before even opening them. This makes it possible to avoid unnecessary crawling and scraping, which makes it faster and more efficient than the classical crawling and scraping processes.

The Price Is Right, or is it?

When data isn’t publicly available or if it takes too long to gather the necessary information using other methods, we can think of buying data. This seems like a simple process, but a lot of factors need to be taken into account:

What is the price we’re paying? How is the data offered? How much of the data can we actually use? What is the quality of the data? What is the update frequency? Who are we doing business with? … And maybe even more importantly, why do we want or need this data?

Neither buying data for the sake of data nor being extremely conservative with purchasing data is a solution. The solution should match the challenge. If we’re in need of an off-road bicycle and buy a Formula One racing car, it will cost loads of money and it will not be suited to the challenge. On the other hand, if we buy a new pair of shoes, the outcome will probably be cheaper, but we won’t cover the full distance fast enough. The biggest challenge is to know what you need, find the proper solution and pay the correct amount for it.

Facts vs. behaviour

Factual data and behavioural data are two different kinds of company data. Both are useful, but they are used for different purposes. Factual data can encompass contact information, the sector, the number of employees, etc. and gives the official view of a company. Behavioural data encompasses news articles, social media feeds, changes in ownership, changes in key employees, etc. and shows the true behaviour of a company based on real-time updates.

In the behaviour examples, the word ‘change’ is an important one. Often a change in factual data equals behavioural data, as long as you track the changes. The change itself is sometimes even more important than the value the data changes to. For example, the fact that a company gets a new CEO says more than who the new CEO is actually going to be. Behavioural data can therefore be subdivided into slow-changing and fast-changing data. Slow-changing data (e.g. addresses, management) can indicate small or big changes, depending on the specific change, while fast-changing data (e.g. news, new products) tends to reflect less fundamental behavioural changes.

It is important to identify relevant and useful sources, but knowledge about a topic is key to understanding what might be usable. Linking a data source to the questions you want answered is not always as straightforward as it looks. It can even be something as small as the flight records of private jets, which have been shown to be useful in predicting upcoming mergers and acquisitions.
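The idea that a tracked change in factual data becomes behavioural data can be sketched as a comparison of two snapshots of the same company record. This is only an illustration: the field names and values are invented, and detect_changes is a hypothetical helper, not part of any AxonJay product.

```python
def detect_changes(previous, current):
    """Compare two snapshots of a company record and list what changed.

    Each change is reported with the field name, the old value and the
    new value, so the change itself can be treated as a behavioural signal.
    """
    changes = []
    for field in sorted(set(previous) | set(current)):
        old, new = previous.get(field), current.get(field)
        if old != new:
            changes.append({"field": field, "old": old, "new": new})
    return changes


# Invented snapshots of the same (fictional) company at two points in time.
snapshot_2021 = {"name": "Example Corp", "ceo": "A. Janssens", "employees": 42}
snapshot_2022 = {"name": "Example Corp", "ceo": "B. Peeters", "employees": 42}

events = detect_changes(snapshot_2021, snapshot_2022)
```

Here the only event is the change of CEO. The factual snapshots on their own say little, but the list of events is what a behavioural analysis would work with.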

Put it to good use

You can have all the data in the world, but if it isn’t usable, you can’t work with it. An often overlooked problem with data is how to store it. The structure of the data of the Crossroads Bank for Enterprises in Belgium is similar to, but not exactly the same as, the structure of its counterparts in the UK, Denmark, Slovenia or France. If you want to combine information from different countries, you need to be prepared to structure everything correctly. Any data scientist will tell you that, due to the great variety of data present, preparing all of it can take up more than 90% of their time. Getting the data is one thing. Combining all of it into a coherent and understandable whole is another.
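As a rough illustration of what structuring everything correctly can involve, the sketch below maps records from two registries onto one shared layout. The field names on both sides are invented for the example and do not reflect the actual schemas of the Belgian or UK registries. Real sources also differ in formats, codes and completeness, which this sketch ignores.

```python
# Hypothetical per-country mappings: source field name -> unified field name.
FIELD_MAPS = {
    "BE": {"enterprise_no": "company_id", "denomination": "name", "zip": "postal_code"},
    "UK": {"company_number": "company_id", "company_name": "name", "postcode": "postal_code"},
}


def normalise(record, country):
    """Rename the fields of one source record to the unified layout."""
    mapping = FIELD_MAPS[country]
    # Fields missing in the source record become None in the unified record.
    unified = {target: record.get(source) for source, target in mapping.items()}
    unified["country"] = country
    return unified


# Invented example records in the two hypothetical source layouts.
belgian = {"enterprise_no": "0123.456.789", "denomination": "Voorbeeld NV", "zip": "2000"}
british = {"company_number": "01234567", "company_name": "Example Ltd"}

combined = [normalise(belgian, "BE"), normalise(british, "UK")]
```

After this step every record has the same keys, so records from different countries can be stored and queried together. The missing UK postal code shows up as an explicit gap instead of a silently absent field.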

The Self-Machine-Learning Platform™ from AxonJay is designed to solve this time-consuming problem. It has standardised processes for loads of data from different sources, coming in all shapes and sizes. Not only does it turn this chaotic data universe into a coherent and understandable whole, it also provides it in a usable format without the need for even one data scientist to have a look at it, saving everybody loads of data headaches and time.

The Tribe of AxonJay

Sources:

Data vs. stars:
https://seedscientific.com/how-much-data-is-created-every-day/

Private jets vs. M&A:
https://marcellus.in/story/hedge-funds-are-tracking-private-jets-to-find-the-next-megadeal/#