The Data Fabric for Machine Learning. Part 1.

Introduction

If you search for machine learning online you’ll find around 2,050,000,000 results. Yeah, for real. It’s not easy to find a description or definition that fits every use case, but there are some amazing ones. Here I’ll propose a different definition of machine learning, focusing on a new paradigm: the data fabric.
Objectives
General

Explain the connection between the data fabric and machine learning.

Specifics

Give a description of the data fabric and of the ecosystems used to create it.
Explain in a few words what machine learning is.
Propose a way of visualizing machine learning insights inside the data fabric.

Main theory

If we can construct a data fabric that supports all the data in the company, then a business insight inside it can be thought of as a dent in that fabric. The automatic process of discovering that insight is called machine learning.
Section 1. What is the Data Fabric?

I’ve talked about the data fabric before, and I gave a definition of it (I’ll repeat it below).

There are several terms we should mention when we talk about the data fabric: graphs, knowledge graphs, ontology, semantics and linked data. Read the article mentioned above if you want their definitions; with those in hand, we can say that:

The Data Fabric is the platform that supports all the data in the company: how it’s managed, described, combined and universally accessed. This platform is built on an Enterprise Knowledge Graph to create a uniform and unified data environment.

Let’s break that definition into parts. The first thing we need is a knowledge graph.

The knowledge graph consists of integrated collections of data and information that also contain huge numbers of links between different pieces of data. The key here is that, instead of looking for possible answers, under this new model we’re seeking an answer. We want the facts; where those facts come from is less important. The data here can represent concepts, objects, things, people and really whatever you have in mind. The graph fills in the relationships, the connections between the concepts.
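To make "data plus links" concrete, here is a minimal sketch of a knowledge graph stored as subject-predicate-object triples in plain Python. The entity and relation names (`Customer:ana`, `purchased`, and so on) are hypothetical examples. A real data fabric would sit on a graph database or RDF store rather than a Python set. The point is that asking a question becomes looking up a fact, not searching through candidate answers.

```python
from collections import defaultdict

# Hypothetical triples: (subject, predicate, object).
# Nodes can stand for anything: people, products, places, concepts.
TRIPLES = {
    ("Customer:ana", "purchased", "Product:laptop"),
    ("Customer:ana", "livesIn", "City:Madrid"),
    ("Product:laptop", "category", "Category:electronics"),
    ("Customer:luis", "purchased", "Product:laptop"),
}


class KnowledgeGraph:
    """Tiny in-memory knowledge graph indexed by subject."""

    def __init__(self, triples):
        self._out = defaultdict(set)  # subject -> {(predicate, object)}
        for s, p, o in triples:
            self._out[s].add((p, o))

    def facts_about(self, subject):
        """Return every (predicate, object) edge leaving a node, sorted."""
        return sorted(self._out.get(subject, set()))

    def objects(self, subject, predicate):
        """Answer a direct question: which nodes does `subject` reach via `predicate`?"""
        return sorted(o for p, o in self._out.get(subject, set()) if p == predicate)


kg = KnowledgeGraph(TRIPLES)
print(kg.objects("Customer:ana", "purchased"))  # ['Product:laptop']
print(kg.facts_about("Customer:ana"))
```

Indexing by subject keeps "tell me about X" lookups cheap. A production store would also index by predicate and object so that questions can start from any side of an edge.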

Knowledge graphs also allow you to create structures for the relationships in the graph. With them, it’s possible to set up a framework to study data and its relation to other data (remember ontology?).
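The "structure for the relationships" is what an ontology provides. The sketch below uses a hand-written toy schema; the class names and relations are hypothetical and do not come from any standard ontology. For each relation, the schema declares which class of node the relation may start from (its domain) and end at (its range). The code then checks triples against those rules. Standards such as RDF Schema express the same idea with `rdfs:domain` and `rdfs:range`.

```python
# Hypothetical toy ontology: every node has one class,
# and every relation declares its allowed domain and range.
NODE_CLASS = {
    "Customer:ana": "Customer",
    "Product:laptop": "Product",
    "City:Madrid": "City",
}

RELATION_SCHEMA = {
    # relation: (domain class, range class)
    "purchased": ("Customer", "Product"),
    "livesIn": ("Customer", "City"),
}


def violations(triples):
    """Return the triples that break the schema, each paired with a reason."""
    problems = []
    for s, p, o in triples:
        if p not in RELATION_SCHEMA:
            problems.append(((s, p, o), "unknown relation"))
            continue
        domain, range_ = RELATION_SCHEMA[p]
        if NODE_CLASS.get(s) != domain:
            problems.append(((s, p, o), f"subject is not a {domain}"))
        elif NODE_CLASS.get(o) != range_:
            problems.append(((s, p, o), f"object is not a {range_}"))
    return problems


triples = [
    ("Customer:ana", "purchased", "Product:laptop"),  # valid
    ("Product:laptop", "livesIn", "City:Madrid"),     # wrong domain
    ("Customer:ana", "owns", "Product:laptop"),       # relation not in schema
]
for triple, reason in violations(triples):
    print(triple, "->", reason)
```

The schema is what lets the graph "know" that a product cannot live in a city. That knowledge is the difference between a pile of links and a framework for studying how data relates to other data.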

In this context we can ask our data lake this question:

What exists here?
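In graph terms, asking the data lake "What exists here?" means asking which kinds of things it holds and how they are connected. The minimal sketch below assumes that each node carries its class through a hypothetical `type` relation; in RDF this role is played by `rdf:type`. It answers the question by counting nodes per class and listing the relations in use.

```python
from collections import Counter

# Hypothetical triples; the "type" relation records what kind of thing each node is.
TRIPLES = [
    ("ana", "type", "Customer"),
    ("luis", "type", "Customer"),
    ("laptop", "type", "Product"),
    ("madrid", "type", "City"),
    ("ana", "purchased", "laptop"),
    ("ana", "livesIn", "madrid"),
]


def what_exists(triples):
    """Count how many nodes of each class the graph contains, sorted by class name."""
    counts = Counter(o for s, p, o in triples if p == "type")
    return dict(sorted(counts.items()))


def relations_in_use(triples):
    """List the relation names that connect nodes, excluding the typing relation."""
    return sorted({p for s, p, o in triples if p != "type"})


print(what_exists(TRIPLES))       # {'City': 1, 'Customer': 2, 'Product': 1}
print(relations_in_use(TRIPLES))  # ['livesIn', 'purchased']
```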

The concept of the data lake is important too, because we need a place to store our data, govern it and run our jobs. But we need a smart data lake, a place that understands what we have and how to use it; that’s one of the benefits of having a data fabric.

The data fabric should be uniform and unified, meaning that we should make an effort to organize all the data in the organization in one place and actually manage and govern it.
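Being "uniform and unified" means that records from different systems end up described with the same identifiers and the same vocabulary. The sketch below assumes two hypothetical sources, a CRM and a web shop, that name the same customer differently. A hand-written mapping table links both keys to one canonical node, so the merged graph holds a single customer with facts from both systems. In practice this entity-resolution step is much harder, so the mapping here is simply taken as given.

```python
# Two hypothetical sources describing the same customer with different keys and field names.
crm_rows = [{"crm_id": "C-001", "full_name": "Ana Ruiz", "city": "Madrid"}]
shop_rows = [{"user": "ana.r@example.com", "bought": "laptop"}]

# Assumed identity mapping: each source-local key -> one canonical entity ID.
ID_MAP = {"C-001": "customer/ana", "ana.r@example.com": "customer/ana"}


def unify(crm_rows, shop_rows, id_map):
    """Translate every source record into triples that share one vocabulary."""
    graph = set()
    for row in crm_rows:
        subject = id_map[row["crm_id"]]
        graph.add((subject, "name", row["full_name"]))
        graph.add((subject, "livesIn", row["city"]))
    for row in shop_rows:
        subject = id_map[row["user"]]
        graph.add((subject, "purchased", row["bought"]))
    return graph


unified = unify(crm_rows, shop_rows, ID_MAP)
for triple in sorted(unified):
    print(triple)
```

After unification, a question such as "where does the customer who bought a laptop live?" can be answered from one place. Before, it required joining two systems that never agreed on who the customer was.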
