Big Data, Machine Learning: how to make the most of your data

Not a day goes by without hearing or reading an economic report in which the magic word is pronounced: “Data!” Data Scientist, Data Intelligence, Data Management and of course Big Data. For the few people who are not put off by these terms and who are a little more open-minded, the notions of Machine Learning, Deep Learning or Artificial Intelligence often arrive in a second wave of trendy expressions.

Experts in this field will learn nothing (or not much) from this article. The idea is simply to demystify these concepts for business and industry professionals, who too often neglect the digital gold mine that lies dormant at the bottom of their IT systems, just waiting to be used.

Big Data in all its forms

“Big data” is probably the term that intimidates the most and scares away most ordinary people (i.e. those who don’t spend their lives in front of a computer!). It can be translated simply as “a lot of data”. The amount of digital data generated daily around the world is growing exponentially. More and more data is being stored, which creates both problems and opportunities. In practice, very few companies or organizations currently have to worry about truly big data. But don’t worry: you don’t need billions or even millions of records to start exploring their potential. A few thousand rows may already contain essential hidden information… and it is data scientists who can find it and make the most of it!

Data Scientist: modern-day statistician

“Data Scientists” sit halfway between statistics and computer science. They use the power of computers to apply statistical methods ranging from the most proven to the most revolutionary! A data scientist is like an engineer: they cannot be a specialist in everything, but they have the mindset and the training to analyze any situation and find the most suitable solution.

Artificial Intelligence: from the simplest to the most complex

These modeling methods, proven or revolutionary, are grouped under the term “Artificial Intelligence”. It is a strange combination of words, but that’s not what the debate is about. Contrary to what one might think, artificial intelligence can be disconcertingly simple. A simple Excel formula on a customer file that says, “IF the temperature expected during the weekend exceeds 25° THEN I will run a special offer on hats, ELSE I will offer a free sweater with every two bought” is already a small algorithm that fits within the framework of Artificial Intelligence. Nothing to be afraid of. But of course, we can propose more sophisticated and interesting algorithms. This is often when we hear about “Machine Learning”.
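The hat-or-sweater rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `weekend_offer`, its parameter and the returned labels are hypothetical, and the 25° threshold is taken from the example in the text.

```python
def weekend_offer(forecast_temperature_c):
    """Pick a promotion from the weekend temperature forecast (hypothetical helper)."""
    # "Exceeds 25°" is read as strictly greater than 25.
    if forecast_temperature_c > 25:
        return "special offer on hats"
    return "sweater offer"


# Usage: apply the rule to a few forecast temperatures.
for temperature in (18, 25, 31):
    print(temperature, "->", weekend_offer(temperature))
```

The rule is written entirely by a human: nothing here is learned from data, which is exactly what distinguishes it from the methods discussed next.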

Machine Learning: almost automatic learning

I like to explain to marketing and communication specialists that Machine Learning is nothing more and nothing less than super regression! Remember the math classes in school. Regression consists in drawing a line (straight or curved) that passes as close as possible to a series of dots. This line is the model or prediction: the result of the algorithm. Several candidate lines can be found, and each one can be scored on its relevance, i.e. on how closely it passes to the points. The Data Scientist will then be able to choose the best model according to the scores obtained. Why “super regression”? Because there are dozens of methods to produce regressions or classifications (good wine or mediocre wine?) depending on the structure and size of the available data. This is what “machine learning” is all about. Why learning? Because the computer itself calculates the parameters of the equation.
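To make the idea concrete, here is a minimal sketch in plain Python, under stated assumptions: the dots (`xs`, `ys`) are made-up values, and the helper names `fit_line` and `mean_squared_error` are hypothetical. It fits two candidate models, a straight line and a simple curve, scores each by how far it passes from the dots, and keeps the better one.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept.

    The computer calculates both parameters itself: this is the "learning".
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(predictions, ys):
    """Score a model: average squared distance to the dots (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(predictions, ys)) / len(ys)


# Made-up dots, for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 4.9, 10.2, 16.8, 26.1, 37.0]

# Model 1: a straight line, y = slope * x + intercept.
slope, intercept = fit_line(xs, ys)
straight_predictions = [slope * x + intercept for x in xs]

# Model 2: a curve, y = a * x^2 + b, fitted as a straight line in x^2.
squares = [x * x for x in xs]
a, b = fit_line(squares, ys)
curved_predictions = [a * s + b for s in squares]

# Score both models and keep the one that passes closest to the dots.
scores = {
    "straight": mean_squared_error(straight_predictions, ys),
    "curved": mean_squared_error(curved_predictions, ys),
}
best_model = min(scores, key=scores.get)
print(scores, "best:", best_model)
```

In a real project the score would be computed on data the model has not seen during fitting, and the candidate models would come from a dedicated library rather than hand-written formulas.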

Deep Learning: it’s the computer that chooses

In Machine Learning as described above, it is the human who determines the variables on which to make the calculations (for example: number of orders in the month, date of the last order, amount of each order, etc.). When it is the computer that chooses the significant variables for its modeling, it is called “Deep Learning”. Take a simple example: it is impossible for a human to define exhaustively what distinguishes an image of a dog from an image of a cat. Algorithms such as neural networks, which fall within the scope of “Deep Learning”, can learn that distinction on their own. Deep Learning is particularly well suited to automatic image recognition and text classification (spam or not? hate speech or not? etc.).

Data Intelligence and Data Management

Data Management is simply the industrialized management of your databases. It is a highly IT-oriented discipline: a good data manager will provide you with well-structured and easily accessible data, the true fuel for your models. Feed them poor data and you will get poor results, or in other words: “garbage in, garbage out”.

Data Intelligence comes down to a state of mind: using one’s data (at the very least) and, better still, using it intelligently, to extract value and support one’s decisions in line with one’s strategy. This is what we advocate to our clients.

In conclusion

At 37.5, we focus our offering on Machine Learning, creating predictive models that add value to databases, which do not necessarily need to contain millions of rows. Data management, storage and infrastructure can also be provided by our partners: this is the playground of “Data Engineers” and “Data Managers”.


Nicolas CLÉMENT

Consultant Data Intelligence
