The Data Catalog Drives Digital Transformation – Artificial Intelligence Drives the Catalog


The Data Management category of products began with a focus on Data Integration, Master Data Management, Data Quality and management of Data Dictionaries. Today, the category has grown in importance and strategic value, with products that enhance discoverability and usability of an organization’s data by its employees. Essentially, Data Management has shifted from a tactical focus on documentation and regulatory compliance to a proactive focus on driving adoption of Analytics and accelerating data-driven thinking. At the center of this change is the modern Data Catalog.

The Importance of the Catalog

Data Catalogs began life as little more than repositories for database schema, sometimes accompanied by business documentation around the database tables and columns. In the present technology environment, Data Catalogs are business-oriented directories that help users find the data they need, quickly. Instead of looking up a table name and reading its description, users can search for business entities, then find data sets related to them, so they can quickly perform analysis and derive insights. That’s a 180-degree turn toward the business and digital transformation.

While this newer, business-oriented role for Data Catalogs is progressive, it does not come without effort. A Data Catalog is powerful only if its content is comprehensive and authoritative. Conversely, Data Catalogs that are missing key business or technical information will see poor adoption and can hinder an organization’s goals around building a data-driven culture. But how can enterprises, with their vast array of databases, applications and – increasingly – Data Lakes, build a catalog that is accurate and complete?

Begin to Build

One way to build a Data Catalog is to team business domain experts with technologists and have them work through the systems to which their expertise applies. Step-by-step, table-by-table and column-by-column, these experts can build out the knowledge base that is the Data Catalog. The problem with this approach is that it’s slow – slower, in fact, than the rate at which most organizations are adding new databases and data sets to their data landscape. As such, this approach is unsustainable.

Adding to the complexity, it’s increasingly the case that subject matter experts’ knowledge won’t cover databases in their entirety, and “tribal knowledge” is what’s really required to make a Data Catalog comprehensive and trustworthy. This then leads to an approach of “crowdsourcing” catalog information across business units and, indeed, the entire enterprise, to build out the catalog.

While the inclusivity of such an approach can be helpful, relying on crowdsourcing to augment business domain experts and build an authoritative catalog won’t get the job done. Crowdsourcing alone is a wing-and-a-prayer approach to Data Management.

Enter AI and ML

In the modern data arena, Artificial Intelligence and Machine Learning must be used alongside subject matter expertise and crowdsourcing in order to fully leverage their value and keep up with today’s explosive growth of data. Business domain expertise and crowdsourcing anchor the catalog. Machine Learning scales that knowledge across an enterprise’s data estate to make the catalog comprehensive.

Artificial Intelligence and Machine Learning can be used to discover relationships within databases and Data Lakes, as well as across multiple such repositories. While some of these relationships may be captured in metadata, many will not be. Machine Learning, by analyzing the data itself, can find these hidden relationships, allowing experts to confirm the discoveries and make them even more accurate going forward.
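
As a concrete illustration of the idea (the tables, columns, and threshold below are hypothetical examples, not any vendor’s method), overlap analysis on column values can surface candidate relationships that metadata alone would miss:

```python
# A minimal sketch of value-overlap relationship discovery: columns from
# different tables whose values overlap heavily are candidate join keys.
# Table names, column names, and the threshold are hypothetical.
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two sets of column values."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Pretend these were profiled from systems with no shared metadata.
columns = {
    ("crm.customers", "cust_id"): ["C001", "C002", "C003", "C004"],
    ("billing.invoices", "customer_ref"): ["C002", "C003", "C004", "C005"],
    ("hr.employees", "emp_id"): ["E100", "E101", "E102", "E103"],
}

# Score every cross-table column pair; high overlap suggests a relationship
# that a subject matter expert can then confirm or reject.
for col_a, col_b in combinations(columns, 2):
    score = jaccard(columns[col_a], columns[col_b])
    if score > 0.5:  # threshold an expert would tune in practice
        print(f"candidate relationship: {col_a} <-> {col_b} (jaccard={score:.2f})")
```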

Leveraging this relationship discovery helps extrapolate expert and crowd-sourced information in the catalog. When business entities are defined and associated with certain data elements, that same knowledge can be applied to related elements without having to be entered again. When business entities are tagged, the tags from related entities can be applied as well, so that discovered relationships can yield discovered tags.
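
A minimal sketch of that tag-propagation idea, assuming a hypothetical relationship graph and tag set:

```python
# A minimal sketch of tag propagation: tags applied to one cataloged
# element flow to elements linked by confirmed relationships.
# The relationship graph and the tags are hypothetical examples.
relationships = {
    "crm.customers.cust_id": ["billing.invoices.customer_ref"],
    "billing.invoices.customer_ref": ["crm.customers.cust_id"],
}
tags = {"crm.customers.cust_id": {"PII", "Customer"}}

def propagate(tags, relationships):
    """Suggest tags across related elements; experts confirm afterwards."""
    suggested = {}
    for element, related in relationships.items():
        for other in related:
            inherited = tags.get(other, set()) - tags.get(element, set())
            if inherited:
                suggested.setdefault(element, set()).update(inherited)
    return suggested

print(propagate(tags, relationships))
# e.g. {'billing.invoices.customer_ref': {'PII', 'Customer'}}
```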

[READ MORE]

Pillars of Data Governance Readiness: Enterprise Data Management Methodology

Facebook’s data woes continue to dominate the headlines and further highlight the importance of having an enterprise-wide view of data assets. The high-profile case is somewhat different from other prominent data scandals, as it wasn’t a “breach,” per se. But questions of negligence persist, and in all cases, data governance is an issue.

This week, the Wall Street Journal ran a story titled “Companies Should Beware Public’s Rising Anxiety Over Data.” It discusses an IBM poll of 10,000 consumers in which 78% of U.S. respondents say a company’s ability to keep their data private is extremely important, yet only 20% completely trust organizations they interact with to maintain data privacy. In fact, 60% indicate they’re more concerned about cybersecurity than a potential war.

The piece concludes with a clear lesson for CIOs: “they must make data governance and compliance with regulations such as the EU’s General Data Protection Regulation [GDPR] an even greater priority, keeping track of data and making sure that the corporation has the ability to monitor its use, and should the need arise, delete it.”

With a more thorough data governance initiative and a better understanding of data assets, their lineage and useful shelf-life, and the privileges behind their access, Facebook likely could have gotten ahead of the problem and quelled it before it became an issue. Sometimes erasure is the best approach if the reward from keeping data onboard is outweighed by the risk.

But perhaps Facebook is lucky the issue arose when it did. Once the GDPR goes into effect, this type of data snare would make the company non-compliant, as the regulation requires direct consent from the data owner (as well as notification within 72 hours if there is an actual breach).

Considering GDPR, as well as the gargantuan PR fallout and governmental inquiries Facebook faced, companies can’t afford such data governance mistakes.

During the past few weeks, we’ve been exploring each of the five pillars of data governance readiness in detail and how they come together to provide a full view of an organization’s data assets. In this blog, we’ll look at enterprise data management methodology as the fourth key pillar.

Enterprise Data Management in Four Steps

Enterprise data management methodology addresses the need for data governance within the wider data management suite, with all components and solutions working together for maximum benefits.

A successful data governance initiative should both improve a business’ understanding of data lineage/history and install a working system of permissions to prevent access by the wrong people. On the flip side, successful data governance makes data more discoverable, with better context so the right people can make better use of it.

This is the nature of Data Governance 2.0 – helping organizations better understand their data assets and making them easier to manage and capitalize on – and it succeeds where Data Governance 1.0 stumbled.

Enterprise Data Management: So where do you start?

Metadata management provides the organization with the contextual information concerning its data assets. Without it, data governance essentially runs blind.

The value of metadata management lies in the ability to govern common and reference data across the organization with cross-departmental standards and definitions. This allows data to be shared and reused, reduces data redundancy and storage, avoids data errors due to incorrect choices or duplications, and supports data quality and analytics capabilities.
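
As an illustration only, the contextual record a catalog might maintain for a single asset could look like the following sketch (the field names are generic examples, not any particular product’s schema):

```python
# A generic sketch of the contextual metadata a catalog record might hold;
# field names are illustrative assumptions, not a specific product's schema.
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    dataset: str                       # physical location of the asset
    business_term: str                 # shared, cross-departmental definition
    owner: str                         # accountable data steward
    tags: set = field(default_factory=set)
    lineage: list = field(default_factory=list)  # upstream sources

entry = CatalogEntry(
    dataset="warehouse.sales.orders",
    business_term="Customer Order",
    owner="sales-data-stewards",
    tags={"Reference", "Finance"},
    lineage=["erp.orders_raw"],
)
print(entry.business_term, "->", entry.dataset)
```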

[READ MORE]

Data Privacy and Blockchain in the Age of IoT

Thanks to the internet of things (IoT), the world is connected like never before. While that fact has opened up a vast array of opportunities when it comes to communication and data sharing across platforms, it also comes with concerns. Specifically, how can we ensure that our personal information is protected? And what roles do data scientists and big data play in protecting that sensitive information?

Blockchain technology is becoming an essential tool that can protect HIPAA-protected medical data and other forms of personal information that are worth quite a bit on the dark web. Society as a whole has come to favor wireless connectivity and 24/7 accessibility, but we need to ensure that our data privacy remains intact in the age of IoT-driven technology.
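
To make the core idea concrete, here is a minimal hash-chain sketch in Python (illustrative only; real blockchains add distributed consensus, digital signatures, and much more). Each record commits to the hash of its predecessor, so tampering with any stored record is immediately detectable:

```python
# A minimal hash-chain sketch: each block stores the hash of its
# predecessor, so altering any record invalidates the rest of the chain.
# Illustrative only; the records shown are hypothetical.
import hashlib, json

def block_hash(block):
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append(chain, record):
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"record": record, "prev": prev})

def verify(chain):
    return all(chain[i]["prev"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

chain = []
append(chain, {"patient": "anon-17", "event": "lab result stored"})
append(chain, {"patient": "anon-17", "event": "record accessed"})
print(verify(chain))        # True: chain is intact
chain[0]["record"]["event"] = "tampered"
print(verify(chain))        # False: the change breaks the chain
```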

What Is Your Personal Data Worth?

As your personal data can take myriad forms, the amount it’s worth can vary considerably. For example, advertisers are interested in your consumer profile and shopping behavior data to develop personalized ads and run targeted campaigns. To cybercriminals, who aim to leverage personal data in order to commit identity theft, that data can be worth far more.

But that type of data sharing isn’t all bad: Consumer data can also be used by companies that have issued a product recall or by legal professionals putting together a claim against a negligent manufacturer. By identifying consumers who have purchased a defective product, companies can properly retrieve products at the individual level. In some instances, this can protect you and your family from injury and even death, as exemplified by the Fisher-Price Rock ‘n Play Sleeper, which was recalled in April 2019.

But where are such organizations finding your data? Much of your consumer profile data can be found on your social media sites — especially Facebook, which has no problem sharing your data with advertisers. That’s because the corporation brings in an enormous amount of revenue from those advertisers.

[READ MORE]

An Introduction to Deep Learning and Neural Networks

It seems as if not a week goes by in which the artificial intelligence concepts of deep learning and neural networks make it into media headlines, either in coverage of an exciting new use case or in an opinion piece speculating whether such rapid advances in AI will eventually replace the majority of human labor. Deep learning has improved speech recognition, genomic sequencing, and visual object recognition, among many other areas.

The availability of exceptionally powerful computer systems at a reasonable cost, the influx of large swathes of data that define the so-called Age of Big Data, and the talents of data scientists have together provided the foundation for the accelerated growth and use of deep learning and neural networks.

Companies are now beginning to adopt AI frameworks and libraries, such as MXNet, a deep learning framework that lets users train deep learning models in a variety of languages. There are also dedicated AI platforms aimed at supporting data scientists in deep learning modeling and training, which professionals can integrate into their workflows.

It’s important, though, to specify that deep learning, neural networks, and machine learning are not interchangeable terms. This article helps to clarify the definitions for you with an introduction to deep learning and neural networks.

Deep Learning and Neural Networks Defined

Neural Network

An artificial neural network, shortened to neural network for simplicity, is a computer system that has the ability to learn how to perform tasks without any task-specific programming. For example, a simple neural network might learn how to recognize images that contain elephants using data alone.

The term neural network comes from the inspiration behind the architectural design of these systems, which was to mimic the basic structure of a biological brain’s own neural network so that computers could perform specific tasks.

The neural network has a layered design, with an input layer, an output layer, and one or more hidden layers between them. Mathematical functions—termed neurons—operate at every layer. Each neuron receives inputs and produces an output. Initially, random weights are associated with the inputs, making the output of each neuron random. However, by using an algorithm that feeds errors back through the network, the system adapts the weights at each neuron and becomes better at producing an accurate output.
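
As a toy illustration of this design (not production code, and not tied to any specific framework), the following NumPy sketch builds one hidden layer, starts from random weights, and feeds errors back through the network to improve the output while learning the XOR function:

```python
# A toy neural network in NumPy: one hidden layer, random initial weights,
# and gradient-based error feedback (backpropagation), learning XOR.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)    # input layer -> hidden layer
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)    # hidden layer -> output layer
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(10000):
    # Forward pass: each "neuron" weights its inputs and applies a nonlinearity.
    hidden = sigmoid(X @ W1 + b1)
    out = sigmoid(hidden @ W2 + b2)
    # Backward pass: feed the output error back through the network
    # and nudge every weight toward a more accurate output.
    d_out = (out - y) * out * (1 - out)
    d_hidden = (d_out @ W2.T) * hidden * (1 - hidden)
    W2 -= 0.5 * hidden.T @ d_out; b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_hidden;   b1 -= 0.5 * d_hidden.sum(axis=0)

print(out.round(2).ravel())  # should approach [0, 1, 1, 0]
```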

[READ MORE]

Two-thirds of the world’s population are now connected by mobile devices

Two-thirds of the world’s population are connected by mobile devices, according to data from GSMA.

This milestone of 5 billion unique mobile subscribers globally was achieved in Q2 2017. By 2020, almost 75% of the global population will be connected by mobile.

Here are the key takeaways from the report:

  • Smartphones will continue to drive new mobile subscriptions. By 2020, new smartphone users will account for 66% of new global connections, up from 53% in Q2 2017.
  • Developing markets will account for the largest share of new mobile subscription growth over the forecast period. Forty percent of new subscribers will stem from five markets: India, China, Nigeria, Indonesia, and Pakistan.
  • But mobile growth is slowing. It took around four years to reach 5 billion mobile users, compared with the three-and-a-half years it took to reach 4 billion. This suggests it’s going to take longer to reach 6 billion users, as the pool of new mobile users continues to shrink.

Affordability, content relevance, and digital literacy are likely bigger inhibitors to mobile internet adoption than a lack of network infrastructure is. Two-thirds of the 3.7 billion consumers who aren’t connected to the internet are within range of 3G or 4G networks. This suggests that device cost, a lack of relevant apps and content, and not knowing how to use the device are the primary barriers to mobile adoption.

[READ MORE]

The Inextricable Link Between Cloud Technology And The New, Untethered Workforce

Cloud computing has changed more than just how applications are bought, where they run and how data is stored.

It has changed the interaction between customers, code and business outcomes. More importantly for business information technology executives, it creates opportunities to lead initiatives well beyond the traditional IT stack — into areas ranging from e-learning to customer service. And instead of simply being a more efficient way of doing work, it’s giving IT leaders the ability to reshape work for the better.

No wonder a Harvard Business Review story hails the cloud as “the most impactful information technology of our time.”

Business leaders are turning to enterprise cloud technology because the nature of work itself is changing. It can no longer be defined as a single place in a fixed office, and a job description is more of a fuzzy guideline than an out-and-out rule. As a result, work is stifled when it’s bounded by a predictable, cookie-cutter stack of devices and software.

According to the 2018 Deloitte Global Human Capital Trends report, employees at 91 percent of organizations work outside their designated functional areas. Thirty-five percent do so regularly. The old model of static software installations on fixed computers doesn’t flex or scale to these demands.

To keep track of who’s doing what and how, companies are dramatically increasing their reliance on cloud collaboration and social media interaction for work communication. In the Deloitte report, 70 percent of organizations say they will expand their use of online collaboration platforms, and 67 percent will make more use of work-based social media. To free up time, they’ll curtail phone calls, which 30 percent of businesses expect to decrease, and face-to-face meetings, which 44 percent of respondents project will decline.

The earlier waves of enterprise cloud tech focused on transforming a functional area or customer-facing process into a browser tab. That was valuable, but a screen full of browser tabs is little different from a desktop full of application icons. And the growth in these solutions created provisioning headaches, security challenges and regulatory risks. It was difficult to ensure consistent, centrally controlled permissions that gave employees the right amount of access at the right times, and to track in a consistent manner how sensitive information was used.

[READ MORE]

What Will We Do When The World’s Data Hits 163 Zettabytes In 2025?

The shock value of the recent prediction by research group IDC that the world will be creating 163 zettabytes of data a year by 2025 depends on three things.

Firstly, who knows what a zettabyte is (one trillion gigabytes)? Secondly, what is the current annual data creation rate (16.3ZB)? And thirdly, do these figures mean anything in a world where we take for granted that data will expand exponentially forever and mostly accept the future reality of autonomous cars and intelligent personal assistants, yet have little real idea of how they will change our lives?

IDC’s paper, Data Age 2025, perhaps answers only the first two questions. Forecasting a ten-fold increase in worldwide data by 2025, it envisions an era focused on creating, utilizing, and managing “life critical” data necessary for the smooth running of daily life.

Consumers will create, share and access data on a completely device-neutral basis. The cloud will grow well beyond previous expectations and corporate leaders will have unparalleled opportunities to leverage new business opportunities, it predicts. But firms will also need to make strategic choices on data collection, utilization and location.

I recently interviewed Jeff Fochtman, vice president of marketing at Seagate Technology, the data storage group with a $14 billion market capitalization that sponsored the IDC report.

Critical Problems

“For individuals and businesses, a data deluge can cause problems in just being able to manage, store and access that data,” he says.

“But the thing that jumps out at me is the critical problems that data is going to solve. On an increasingly populated planet, data is going to solve the things impacting on the human experience: traffic, how we move around, how we grow food and how we feed the population globally.

[READ MORE]

Forbes Insights: The Rise Of The Customer Data Platform And What It Means To Businesses

Treasure Data and Forbes Insights recently partnered to present a broad-ranging survey, Data Versus Goliath: Customer Data Strategies to Disrupt the Disruptors, that uncovers the attitudes and perceptions of today’s marketing leaders. This article, written by the Forbes Insights team, highlights some of the key takeaways from the survey and originally appeared on the Forbes website on June 20, 2018.

For years, marketing executives have sought an elusive 360-degree view of their customers. Meanwhile, the nature of customer data analytics and customer experience (CX) design is evolving dramatically within today’s organizations.

The good news is that much of the data needed to inform this view of the customer is already being collected and stored by enterprises or their partner organizations. The bad news is that this data is typically maintained in separate systems, across organizational silos, and often cannot be surfaced at the time it’s needed to contribute to, or enhance, a specific customer experience—let alone inform a larger customer experience strategy.

For the most part, we are still in the early stages of customer data analytics, as indicated by a new survey of 400 marketing leaders, conducted by Forbes Insights and Treasure Data. According to the survey, it still takes marketers too much time to analyze and draw conclusions about the success of a marketing campaign or a change to the customer experience—47% say it takes more than a week, while another 47% say it takes three to five days.

And the tools and solutions to accelerate CX development still need to be put into place. A majority of executives, 52%, report that while they are leveraging a variety of tools and technologies in functions or lines of business, there is little coordination among them and a lack of the right tools. Only 19% report having a robust set of analytics tools and technology services supporting customer-data-driven decisions and campaigns.

Yet there is an emerging approach to bringing customer data into one place: the customer data platform, or CDP. This new generation of systems is designed to bring all this disparate data about customers into a single intelligent environment and provide a synchronized, well-integrated view of the customer. These platforms are seeing widespread adoption across enterprises, as supported by the Forbes Insights/Treasure Data survey. Some 78% of organizations either have, or are developing, a customer data platform.

Understanding This New Type Of Platform

Customer data platforms are more broad-based than the traditional CRM systems that have been in place in many organizations for years. While CRM systems are designed to enable management and analysis of a particular customer channel, CDPs bring data from across corporate channels into a single platform. Although CRM and business intelligence solutions have provided some intelligence about customer trends, CDPs tie customer data directly to marketing and sales initiatives.
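
As a simplified sketch of what bringing cross-channel data “into a single platform” involves, consider merging records from hypothetical web, CRM, and in-store systems into one profile (the source names, fields, and naive email-based matching below are illustrative assumptions, not any vendor’s implementation):

```python
# A simplified sketch of cross-channel profile unification in a CDP:
# records from separate systems are merged into one customer view.
# Sources, fields, and the email-based match rule are hypothetical.
from collections import defaultdict

records = [
    {"source": "web",   "email": "ana@example.com", "pages_viewed": 12},
    {"source": "crm",   "email": "ana@example.com", "segment": "premium"},
    {"source": "store", "email": "ana@example.com", "last_purchase": "2018-05-02"},
]

profiles = defaultdict(dict)
for rec in records:
    key = rec["email"].lower()          # naive identity resolution
    for name, value in rec.items():
        if name not in ("source", "email"):
            profiles[key][name] = value

print(dict(profiles))
# {'ana@example.com': {'pages_viewed': 12, 'segment': 'premium',
#                      'last_purchase': '2018-05-02'}}
```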

[READ MORE]

6 Ways Companies Use Predictive Analytics Across Industries

Can we really know what outcomes are likely? It may not be as far-fetched as it sounds. Predictive analytics can give us a highly accurate “Crystal Ball,” allowing us to see into the future, leveraging insights gleaned from large data sets and advanced machine learning (ML) algorithms.

Predictive analytics is the use of data, algorithms, and ML techniques to assign ‘scores’ to various user segments based on historical data. Its goal is to assess the likelihood of future events — such as a purchase or customer churn — so that a specific action can be taken. Using predictive analytics, we can estimate the likely outcomes for future customers and business activities with a useful degree of confidence.

  1. Identification of Customers Likely to Churn

    In modern growth marketing efforts, churn is a crucial statistic. The old axiom rings true: “It’s cheaper to keep an existing customer than to find a new one.” Predictive retention models can identify which customers are most likely to churn — and companies can respond by reaching out to them with education on product benefits or other promotions. Predictive scoring can also identify a set of behaviors in customers who are less likely to churn. Messaging likely churners and steering them toward the behaviors of those low-churn customers is a valuable outcome for any business (a minimal scoring sketch follows this list).

  2. Recommendations for eCommerce Cross-selling and Upselling

    If you’re a retailer selling a variety of products, predictive scoring can help you tailor your ‘recommended for you’ product placements by analyzing historical customer data and applying customer profiles to offer look-alike targeting for optimal conversion. For example, someone who has purchased hiking boots might be shown advertising for other outdoor gear — while someone who has bought kitchenware might be shown ads for related kitchen accessories.
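
As referenced in the first item above, here is a minimal churn-scoring sketch using scikit-learn (the features and data are synthetic illustrations; real retention models use far richer behavioral histories):

```python
# A minimal churn-scoring sketch: fit a model on historical behavior,
# then score current customers by churn likelihood. Synthetic data;
# the feature choices are illustrative assumptions only.
from sklearn.linear_model import LogisticRegression

# Historical customers: [logins_per_month, support_tickets] -> churned?
X_train = [[20, 0], [18, 1], [2, 5], [1, 4], [15, 2], [3, 6]]
y_train = [0, 0, 1, 1, 0, 1]

model = LogisticRegression().fit(X_train, y_train)

# Score active customers; a high probability flags likely churners
# for retention outreach or targeted promotions.
current = {"cust_a": [19, 1], "cust_b": [2, 7]}
for cust, features in current.items():
    p = model.predict_proba([features])[0][1]
    print(f"{cust}: churn probability {p:.2f}")
```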

[READ MORE]

Enterprise Data Management: What You Need to Know

The increased pressure to implement enterprise data management is not the result of a fad or herd mentality. There are real operational needs driving this movement. Chief among these is the urgent demand to make data accessible and useful. Businesses need to be able to put their data to work fueling decisions, empowering efficiencies, and shaping company direction. That means data must be standardized, converted to useful forms, and stored where it is secure but still accessible to users.

If you are lucky, your company was able to get a jump on its data management, implementing proactive measures over the last few years to handle the swelling data tide. Unfortunately, many enterprises weren’t quite so vigilant and now find themselves scrambling to get a handle on their data, which is growing and changing by the day.

The Many Sides of Enterprise Data

According to IDC, businesses are managing a volume of data that is growing at an average of 40% a year. Not only are companies handling more data, but the types of data are expanding as well. Data streams contain everything from inventory figures and financial information to videos, images, and other unstructured data coming in from social media, mobile, and the Internet of Things (IoT). All of these varied data types need to be centralized, organized, and made accessible and usable to the business. That is the true mission of enterprise data management.

[READ MORE]