How to use Informatica Power Center as a RESTful Web Service Client?
Introduction
In today's world, data is ubiquitous and critical to the business, which steadily increases the need for integration across platforms such as cloud services and web service providers. When it comes to data integration, the business needs effective communication between its software systems and its ETL tool.
This blog post explains what a REST web service is and how you can create a Power Center workflow that uses a REST-based method to access web services via the HTTP transformation.
REST Overview
A web service provides a common platform that allows two applications running on different platforms to communicate and exchange messages over HTTP. Web services can be accessed using different methods or styles. In the web service world, REpresentational State Transfer (REST) is a stateless client-server architecture in which web services are exposed as URLs. Resources in a RESTful system are typically accessed through the standard HTTP methods GET, POST, PUT, and DELETE.
Because REST is an architectural style rather than a protocol, it is not tied to a single transport: a RESTful design most commonly runs over plain HTTP, and it can even be layered over existing SOAP services.
High-Level REST Architecture
FIGURE 1: HIGH-LEVEL REST ARCHITECTURE
REST call using Informatica Power Center
Let us explain with the help of a sample web service used in the retail industry. In retail, customers purchase various items from the retailer both online and offline, and all of this transactional data is fed into a centralized repository called a data warehouse. The item details are then further normalized into data marts to form a common schema of facts and dimensions. In a large-scale environment, designing this data model can be a cumbersome task.
To overcome this, the business needs to access the data over the network, and the best way to do so is through a web service. The final order details, pricing, payment details, and shipping information are translated internally and saved in a third-party system in either XML or JSON format. This third-party system, hosted on a secured network, is accessed via a URL. Since the system is exposed through REST API calls, we use Informatica Power Center as the client that accesses the web service API. Informatica Data Quality (IDQ) can also be integrated separately with Power Center and Informatica MDM to strengthen enterprise data quality.
In Power Center, you can use an HTTP transformation to send a request and receive a response from a REST Web service.
Pre-requisites to configure REST call using HTTP Transformation
A valid REST-based URL that returns either XML or JSON
Valid SSL certificates to connect to a REST URL outside of the network
Basic authentication mechanism (username and password)
Step by Step Process
Creating a Source File
Create a source file which specifies the search parameters to be passed as input to the HTTP transformation.
FIGURE 2: SOURCE FILE
Create HTTP Transformation
Create HTTP transformation with the following input ports:
Username
Password
OrderNumber
FIGURE 3: HTTP TRANSFORMATION
Base URL
Set the Base URL to get the Order details based on the input parameters passed from the source file.
Base URL: https://test-swaggerUI/orders/search
FIGURE 4: BASE URL SETTINGS
Set the HTTP Header parameters
Header parameters in the HTTP transformation carry header data for the request and response in a specified format such as XML or JSON. The parameters are:
Default Value: Specify the content types accepted by the web service. By default the response is in JSON; if you need XML, specify 'text/xml'
HTTP Name: For a REST API the default value is "Accept"
FIGURE 5: HEADER PARAMETER SETTINGS
HTTP Method Selection Type
Specify the HTTP method to apply to the URL. In our example we are retrieving order detail information, so specify the type as "GET".
FIGURE 6: HTTP METHOD SELECTION TYPE
HTTP Output
Pass the HTTP output to either a flat file or an XML parser to parse the information and load it into the target table. In this example, we write the output of the HTTP transformation to a flat file.
This is how the final mapping looks:
FIGURE 7: FINAL MAPPING
Create a connection in Workflow Manager and run the workflow
Navigate to Workflow Manager -> Connections -> Applications -> HTTP Transformation
FIGURE 8: HTTP APPLICATION TYPE
Name: Enter a meaningful name for the HTTP connection
User Name: Enter the username for the REST API URL
Password: Enter the password for the URL
Authentication Type: Select the Authentication type as “Basic”
Create a workflow for the mapping and run it. Finally, view the output; in our example, the order number is "ORD-123".
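At run time, the Integration Service is essentially issuing an HTTP GET with basic authentication and the Accept header configured above. The standalone Java sketch below reproduces that call outside Power Center, which can be handy for verifying the URL, credentials, and certificates before running the workflow; the URL, username, password, and order number parameter are placeholders taken from this example, not values from any real system.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class OrderLookupCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder values mirroring the example: base URL plus the order number as a query parameter
        String requestUrl = "https://test-swaggerUI/orders/search?orderNumber=ORD-123";
        String user = "apiUser";        // hypothetical credentials
        String password = "apiPassword";

        HttpURLConnection conn = (HttpURLConnection) new URL(requestUrl).openConnection();
        conn.setRequestMethod("GET");                            // HTTP method selection type
        conn.setRequestProperty("Accept", "application/json");   // header parameter; use "text/xml" for XML
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes("UTF-8"));
        conn.setRequestProperty("Authorization", "Basic " + token); // basic authentication

        System.out.println("HTTP status: " + conn.getResponseCode());
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // raw JSON/XML payload, as the HTTP output port would carry it
            }
        }
    }
}
```

If this call fails with an SSL handshake error, the certificate issue described in the next section is the likely cause.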
FIGURE 9: FINAL OUTPUT
Issues faced while connecting to HTTP Transformation
The most common issue while using the HTTP transformation is an "SSL Certificate Error". When you run the workflow that uses the final URL, you will see SSL certificate errors in the session logs.
Root Cause
This error occurs because the connection to the web service URL or REST API does not include the certificate data needed to authenticate the connection from the certificate provider's end: the certificate bundle does not contain a certificate from the Certificate Authority that the web service provider uses.
Resolution
Download the SSL certificates for the URL and add them to the certificate bundle located at $INFA_HOME/server/bin/ca-bundle.crt
Refer to the KB article below to learn how to add those certificates:
HOW TO: Extract certificates from a HTTPS URL and add to ca-bundle.crt file
Conclusion
This concludes how to create a Power Center workflow that uses REST-based methods to access a web service through the HTTP transformation and invokes HTTP commands to get data from the web service resource.
Connecting Informatica PowerCenter to Teradata
Teradata - Overview
PowerCenter works with many databases, and Teradata is a distinctive one. Informatica PowerCenter integrates the Teradata database into any business system and serves as the technology foundation for controlling data movement. In Informatica PowerCenter, ODBC is used to connect to Teradata tables and their data.
This blog helps you create, configure, and execute a PowerCenter workflow on Windows that can read data from and write data to a Teradata database.
What’s unique about Teradata database?
Teradata is an RDBMS that uses multiple processors for parallel processing. Because of its linear scalability, performance increases as you add nodes.
Configuring and Executing a PowerCenter Workflow
Let us look at the set of steps for configuring a Teradata ODBC connection in Informatica PowerCenter.
Prerequisites
Ensure that you have the latest version of VMware Workstation Player
Install a Teradata Express VM on the VMware player (this acts as the Teradata server)
Make sure you have Teradata Tools and Utilities for Windows (TTU)
Informatica PowerCenter
Download Teradata Express for VMware Player
Download the Teradata Tools and Utilities - Windows Installation Package (this includes the Teradata ODBC driver, SQL Assistant, and Administrator)
This table lists the TTU versions and the corresponding compatible PowerCenter versions.
Configuration in Teradata VMware
After installing the Teradata VM, power on the virtual machine.
Log in to the VM by providing the username and password.
Log in to Teradata Studio Express, which resides inside the VM.
Create a new database, user, tables, and data as per your requirement. Finally, check that the Network Adapter setting is set to "Bridged" so that the Teradata VM is visible to all the machines on the network.
Configuration In PowerCenter
If the PowerCenter server and client applications are on Windows machines, you need to perform the steps below on both the server and the client.
Add an entry to the hosts file for the IP address of the Teradata VM
After the installation of TTU in Windows,
Create a System DSN in ODBC Administrator with ‘Teradata’ as a driver.
Give a name for the Data Source.
Specify the IP or VM name in the ‘Name or IP address’ field.
Give the username and password of your Teradata database.
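PowerCenter itself talks to Teradata through the ODBC DSN configured above, but a quick way to confirm that the VM is reachable and the credentials work, independently of the DSN, is a small JDBC test. This is only a sanity-check sketch: it assumes the Teradata JDBC driver (terajdbc4.jar from TTU) is on the classpath, and the host name, database, and credentials are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class TeradataConnectivityCheck {
    public static void main(String[] args) throws Exception {
        // Older driver versions need explicit registration; newer ones auto-register
        Class.forName("com.teradata.jdbc.TeraDriver");

        // Placeholder host, database, and credentials for the Teradata Express VM
        String url = "jdbc:teradata://tdexpress-vm/DATABASE=sample_db";
        try (Connection conn = DriverManager.getConnection(url, "dbc", "dbc");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT CURRENT_TIMESTAMP")) { // trivial query to prove the session works
            while (rs.next()) {
                System.out.println("Connected. Server time: " + rs.getString(1));
            }
        }
    }
}
```

If this test cannot reach the VM, the PowerCenter session will not be able to either, which is where the network adapter settings discussed later come into play.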
Sample Mapping
Create a mapping in the PowerCenter Designer application with your requirements.
Reading Data from Teradata Database
Writing Data to Teradata Database
Relational Connection Configuration
Open the Workflow manager and create a new relational connection
Name the relational connection
Provide the username and password for the database from which tables are accessed
Give the name of the database in the Database name attribute and the data source name in the Data source name attribute.
Connection Environment SQL: Specify the SQL if any to run every time when a connection is made
Transaction Environment SQL: Specify the SQL if any to run at the beginning of each transaction.
Challenges faced
The workflow failed with a connectivity error when trying to reach the Teradata VM.
Choosing the appropriate Adapter Settings
This error occurred because the Informatica server could not connect to the Teradata VM when "NAT" was the network adapter setting for the VM. To resolve this, switch the network adapter setting to "Bridged".
With NAT, the VM does not have its own IP address on the external network; the host system shares a single network identity that is not visible on the external network.
With bridged networking, by contrast, the VM connects to the network through the network adapter on the host system and has its own IP address.
Connecting Host and Guest
The connection between the host and the guest VM (the Teradata VM) failed while pinging from the command prompt, even though the connection between the guest and other machines on the network (except the host) was fine.
This error was resolved by following the steps below on the host machine:
Open Network and sharing center
Select your adapter that has Internet
Click Properties button from the Connection Status Window
Uncheck DNE Lightweight Filter in the Adapter's properties
Click OK and wait for your adapter to reset by itself
Conclusion
This completes the execution of a PowerCenter workflow on Windows to read data from and write data to a Teradata database.
Published by - Mastech InfoTrellis
Connecting MongoDB using IBM DataStage
Introduction
MongoDB is an open-source, document-oriented, schema-less database system. It does not organize data using the rules of a classical relational data model. Unlike relational databases, where data is stored in columns and rows, MongoDB is built on an architecture of collections and documents: a collection holds different documents, and data is stored in the form of JSON-style documents. MongoDB supports dynamic queries on documents using a document-based query language.
This blog post explains how MongoDB can be integrated with IBM DataStage with an illustration.
Why MongoDB?
For the past two decades we have been using relational databases as the data store because they were the only option available. With the introduction of NoSQL, we have more options to choose from based on the requirement. MongoDB is predominantly used in the insurance and travel industries.
We can extract any semi-structured data and load it into MongoDB through any of the integration tools. Extracting from MongoDB is also easier and faster compared to relational databases.
MongoDB integration with IBM DataStage
Since IBM DataStage does not have a dedicated external stage to integrate with MongoDB, we use the Java Integration stage to load data into, or extract data from, MongoDB.
Since MongoDB is a schema-free database, we can take structured or semi-structured data extracted through DataStage and load it into MongoDB.
Prerequisites
Make sure you have Java installed on your machine.
Install the Eclipse IDE.
The Java code requires the MongoDB jar below to be imported into the package to use MongoDB functions.
The Java code also requires the jar file below to be imported into the package to extract or load data from DataStage.
mongo-java-driver-2.11.3.jar or a higher version if available (download it from the internet)
jar (It is available on the DataStage server. Location: /opt/IBM/InformationServer/Server/DSEngine/java/lib)
Illustration of a DataStage job
Create a job in DataStage to parse the sample XML below.
The XML contains one person's information and one or more person name objects linked to that person.
FIGURE 1: PERSON XML LINKED TO PERSON NAME & PERSON DATA
2. In this job, the link lnk_PersonInfo_in carries the person information
3. The link lnk_PersonNameInfo_in carries the person name information
4. In this job, we can directly use the Java Integration Stage to insert data into MongoDB for the person information link
5. Develop Java code to load person data into MongoDB.
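The insert code was shown only as a screenshot in the original post; here is a minimal sketch of what such logic might look like with the 2.x Java driver listed in the prerequisites. Inside the Java Integration stage this would sit in the stage's per-row processing logic rather than in a main method, and the database, collection, and field names (persondb, person, _id, firstName, lastName) are hypothetical.

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;

public class LoadPerson {
    public static void main(String[] args) throws Exception {
        MongoClient mongoClient = new MongoClient("localhost", 27017); // MongoDB host and port
        DB db = mongoClient.getDB("persondb");                          // hypothetical database name
        DBCollection person = db.getCollection("person");               // hypothetical collection name

        // One document per row arriving on lnk_PersonInfo_in
        BasicDBObject doc = new BasicDBObject("_id", "P1001")           // person identifier from the XML
                .append("firstName", "John")
                .append("lastName", "Doe");
        person.insert(doc);

        mongoClient.close();
    }
}
```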
The Result in MongoDB after inserting person information:
6. Create another job to load the PersonName information into MongoDB through the Java Integration Stage
FIGURE 2: LOADING PERSON NAME INFO INTO MONGODB
7. Below is the Java code to update PersonName information for the respective _id’s
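Again, the original update code was an image; a minimal sketch with the same 2.x driver might look like the following, pushing each person-name sub-document into an array on the person document whose _id matches. The personNames array field, like the other names, is a hypothetical choice for illustration.

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;

public class LoadPersonName {
    public static void main(String[] args) throws Exception {
        MongoClient mongoClient = new MongoClient("localhost", 27017);
        DBCollection person = mongoClient.getDB("persondb").getCollection("person");

        // Row arriving on lnk_PersonNameInfo_in: the parent _id plus the name attributes
        BasicDBObject query = new BasicDBObject("_id", "P1001");        // match the existing person document
        BasicDBObject nameDoc = new BasicDBObject("nameType", "LEGAL")
                .append("givenName", "John")
                .append("familyName", "Doe");

        // $push appends the name sub-document to the personNames array on the matched document
        BasicDBObject update = new BasicDBObject("$push", new BasicDBObject("personNames", nameDoc));
        person.update(query, update);

        mongoClient.close();
    }
}
```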
8. To integrate the Java code in DataStage:
Export the java code as a jar file (LoadParty.jar)
Place the LoadParty.jar and mongo-java-driver-2.11.3.jar in the DataStage server at any location.
9. Configure the Jar files in Java Transformation stage
Java Transformation stage used to load Person Data information
Java Transformation stage used to load Person Name Data information
10. Final result in MongoDB:
Conclusion
Currently there is no dedicated external stage for MongoDB in DataStage. Extracting from and loading to MongoDB would become simpler if such a stage were introduced in the future.
EDW Readiness Checklist for adding new Data Sources
Overview
It is common practice to make changes to underlying systems, either to correct problems or to provide support for new features needed by the business. One such change is adding a new source system to your existing Enterprise Data Warehouse (EDW).
This blog post examines the issue of adding new source systems to an EDW environment: how to manage customizations in an existing EDW, what type of analysis has to be done before the project starts, which areas are impacted, and the solution steps required.
Enterprise Data Warehouse
An Enterprise Data Warehouse (EDW) is a conceptual architecture that stores subject-oriented, integrated, time-variant, and non-volatile data for decision making. It separates the analysis workload from the transaction workload and enables an organization to consolidate data from several sources. An EDW includes the various source systems, ETL (extract, transform, and load), a staging area, the data warehouse, various data marts, and BI reporting, as shown in the EDW architecture.
FIGURE 1: ENTERPRISE DATA WAREHOUSE ARCHITECTURE
Why do we need to add a new source system/data in an existing EDW?
An organization can have many transactional source systems. At the time of building the EDW, the organization may or may not consider all of them. Over time, it needs to add the left-out source systems, or newly arrived source systems, into the existing EDW for its decision-making reports.
Challenges in Adding a New Source System/Data to the existing EDW
Let us illustrate with a real-world scenario.
Assume that the existing EDW has the data model shown in Figure 2 (Existing EDW Data Model). The business has decided to add data from a new source system (Figure 1: Source 4) to the existing EDW. This new source system will populate data into all the existing dimension tables and also carries additional information that requires a new dimension table (Figure 3: Store Dimension).
FIGURE 2: EXISTING EDW DATA MODEL
Adding data by creating a new dimension table in the EDW does not pose an issue, because we can create new ETL jobs, staging tables, dimension tables, marts, and reports accordingly.
Pain Point
The problem arises when we try to populate new source data into the existing dimension/fact tables or marts in the EDW without proper analysis. This can break the existing ETL jobs, upstream and downstream applications, and the reporting process, ultimately corrupting the EDW.
FIGURE 3: EDW DATA MODEL AFTER ADDING A NEW SOURCE SYSTEM
What kind of analysis must be performed before commencing a project?
Analysis has to be performed in the below-mentioned areas:
New Source System/Data
Any organization with many lines of business, global or local, stores its data in multiple source systems. No two source systems will be completely identical in terms of data model, table DDL (data definition language), and the information stored.
Before introducing a new source system to the EDW, the data has to be analyzed at the row and column level (maximum length, type, and frequency of the data). It is recommended to use a profiling tool (such as IDQ or IBM IIA) to get a complete picture of your data.
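Until a dedicated profiling tool is in place, even a small ad-hoc check can show whether the new source will fit the existing DDL. The JDBC sketch below (the connection URL, table, and column names are placeholders) reports the row count, distinct-value count, and maximum length of a candidate column, which is the kind of row- and column-level evidence needed before mapping it to an existing dimension.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ColumnProfile {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details for the new source system
        String url = "jdbc:oracle:thin:@newsource-host:1521/SRC";
        String table = "SALES_TXN";
        String column = "STORE_ID";

        String sql = "SELECT COUNT(*), COUNT(DISTINCT " + column + "), MAX(LENGTH(" + column + ")) FROM " + table;
        try (Connection conn = DriverManager.getConnection(url, "profiler", "secret");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            if (rs.next()) {
                System.out.println("Rows:            " + rs.getLong(1));
                System.out.println("Distinct values: " + rs.getLong(2));
                System.out.println("Max length:      " + rs.getInt(3));
            }
        }
    }
}
```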
Target DIM/Fact tables
It is crucial to have a complete picture of the existing EDW dimension/fact tables that the new source system data will become part of.
Accommodating the new Source System Data
Most of the EDW table columns are defined as INTEGER, but a problem arises when the new source system uses the BIGINT data type: a BIGINT value cannot fit in an INTEGER field.
In this scenario, we had to change the schema of the existing DW table from INTEGER to BIGINT, while ensuring there was no impact on the existing data received from other source systems.
Similarly, we found many other changes required in the existing EDW to accommodate the new source system data.
Mapping
After completing the analysis of the source and target tables, we have to create the mapping document (the project's "bible") for smoothly marrying the new source system to the existing EDW. It should clearly define which DDLs need to be changed in the staging and target database tables.
Changing the DDL of a table is not an easy task, because the table already holds a huge volume of indexed data. It is recommended to consult a database architect, or to perform the change under the supervision of an expert.
ETL Jobs
Any change to the existing target table DDL sends ripples through the ETL jobs. Most EDWs store data as SCD2/SCD3, and implementing SCD2/SCD3 requires lookups on the target tables. In the scenario above, we changed a target table column datatype from INTEGER to BIGINT; the existing ETL jobs that perform a lookup on the same table and expect the INTEGER datatype will start failing. It is therefore mandatory to do a rigorous impact analysis of the existing ETL jobs.
Upstream and Downstream Applications
The data stored in the EDW is loaded from upstream applications and consumed by downstream applications such as reporting and data analysis tools. Any change in the ETL jobs or in the EDW will impact these applications, so adding new source data to the existing EDW without proper impact analysis of these applications can have adverse effects.
Conclusion
Mastech InfoTrellis' 11-plus years of expertise in building MDM, ETL, and data warehouse solutions, and in adding new source systems/data to existing EDWs, has helped us perform rigorous impact analysis and arrive at the right solution sets without corrupting the EDW.
Published by Mastech InfoTrellis
Big Data and the Internet of Things (IoT)
The Internet has come a long way from shaky dial-up to the interconnected, revolutionary world we live in. This new wave of interconnectedness is called the IoT (Internet of Things), or sometimes IoE (Internet of Everything).
What is IoT?
In its simplest terms, IoT refers to the inter-networking of physical devices. These devices are capable of making network connections and exchanging data with a cloud infrastructure or with similar devices in their network. Many of the devices in this arena are information-generating devices such as GPS units, temperature sensors, and pressure sensors, or information-receiving devices, mainly actuators such as motors and relays; collectively these are known as the edge points of the network. The other components are the gateways, which aggregate information from edge devices, and the cloud, which stores and processes the information received from gateways and devices.
In some architectures, we can do not-so-complex data processing at the edge devices or gateways, which is called fog computing. In such a configuration, it is possible to sense and control the "connected objects" remotely, thus reducing the amount of data transported to the cloud for storage and processing. It is also possible for the devices to communicate and share information with each other, run advanced machine learning techniques on the data, and make decisions by themselves.
The launch of IPv6 has made it possible to connect billions of devices to the same network, enabling the creation of more sophisticated networks. Other technology standards like Li-Fi, BLE, and Z-Wave give IoT a push by providing very low-energy means of communication between devices in the network.
Conceptual architecture of an IoT system:
Today, IoT applications range from tracking down your lost keys or mobile phone using Bluetooth and other wireless technologies, to remotely monitoring and managing your home to cut down on bills and resource usage, to engaging with the data exhaust produced by your city or neighborhood. These are just a few examples of what we can achieve with sensors, actuators, and networked intelligence. IoT will soon be part of many aspects of our lives, such as consumer goods, smart homes and cities, manufacturing, and transportation.
IoT’s Big Data Problem
As mentioned earlier, the continuum of sensors and devices interconnected through a variety of communication protocols like Bluetooth, BLE, ZigBee, and GSM generates huge volumes of data every second. Considering that billions of such devices may be connected to the same network, the amount of data a typical system generates runs into several million megabytes per second. For example, at the 2015 Paris Air Show, Bombardier showcased its C Series jetliner, which is fitted with 5,000 sensors that generate up to 10 GB of data per second. There are many similar industrial cases that produce terabytes of operational data every day. The solution to processing such huge data comes with big data technologies.
The speed at which big data and IoT are developing is tremendous, and it affects all areas of technology and business as it increases the benefits for organizations and individuals. The growth of data produced via IoT has widely affected the big data landscape, and it makes big data analytics challenging because huge amounts of data must be collected and processed from the various sensors in the IoT environment. The analytics range from simple drill-downs to complex optimizations performed on the ingested data.
IoT analytics challenges:
Because IoT data is created by devices operating remotely under widely varying environmental conditions, and is communicated over long distances, often across different networking technologies, the analyses become challenging. Some of the problems are:
Data volume: The data flowing into an organization can grow very large very quickly, with millions of IoT devices and their various sensors sending data on a regular basis. Organizations need to adapt to processing the huge volume of data that flows in on an ongoing basis. The data volumes and computing resources that IoT demands will soon outpace all the other organizational data combined.
Problems with time and space: IoT devices are located in various time zones and geographical locations. This added information needs to be captured for precise analytics, which again increases the volume and complexity.
Data quality: The quality of the generated data is the decision maker in IoT analytics and needs to be trusted. The quality of the analytics depends on how clean and authentic the data is and how quickly we can derive value from it.
The key challenge to big data technologies is to visualize and uncover insights from various types of IoT data – structured, unstructured, real time etc.
Big IoT Data Analytics
Big IoT data analytics should include both:
Batch Analytics:
Tasks that require huge volumes of data are typically handled by batch operations. The datasets can be processed from permanent distributed storage using Hadoop MapReduce or with in-memory computation using Apache Spark. Apache Pig and Hive are used for data querying and analysis. Since these run on cheap commodity servers in a distributed manner, they are the best bet for processing historical data and deriving insights and predictive models from it.
Today, most modern data analytics tools, such as Teradata or Tableau, can hook directly into HDFS to process data and generate reports and dashboards.
(Pseudo) Real-time Analytics:
These analytics refer to systems that depend on instantaneous feedback based on the data received from the sensors, for example an IoT-based health care system that receives data from numerous sensors on a patient's body. One important feature of such a system is to aggregate real-time data from the sensors and run algorithms that automatically detect situations needing immediate medical attention. When such a situation is detected, a medical provider or an emergency response system should be notified immediately; the analysis-response cycle should take only a few seconds, as every second can be a matter of life and death. Other scenarios include fraud detection and security breaches, where unusual behavior is flagged for immediate action.
A classic Hadoop-based solution might not work in the above cases because it relies on MapReduce, which is considerably slower and involves costly I/O operations. The solution is to augment the Hadoop ecosystem with a faster real-time engine such as Spark or Storm.
Following are different options for implementing the real-time layer:
Apache Storm, Kafka, and Trident: highly scalable, reliable, distributed, fast real-time computing to process high-velocity data
Spark Streaming: an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams (a minimal example follows)
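As a small illustration of the real-time layer, the Java sketch below uses Spark Streaming to read newline-delimited sensor readings from a socket and flag, every second, any reading that crosses a threshold. The host, port, message format, and threshold value are all assumptions made for the example.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class SensorThresholdAlert {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("SensorThresholdAlert").setMaster("local[2]");
        // One-second micro-batches: the (pseudo) real-time analysis-response cycle
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));

        // Assumed feed: "deviceId,temperature" lines arriving on a TCP socket
        JavaReceiverInputDStream<String> readings = jssc.socketTextStream("gateway-host", 9999);

        // Keep only readings above the assumed threshold of 40.0
        JavaDStream<String> alerts = readings
                .filter(line -> Double.parseDouble(line.split(",")[1]) > 40.0);

        // In a real system this would notify a provider or emergency service instead of printing
        alerts.print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```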
How industries are using Big IoT data:
GE’s Big Bet on Data and Analytics
GE is venturing into newer visions for operational technology (OT) on top of industrial machinery. They connect machines via the cloud and use data analytics to help predict breakdowns and assess the overall health of the machinery.
The Supermarket of the Future
The supermarket of the future will enhance the human shopping experience, providing off-the-shelf technology with airy layouts, easy-to-reach items, and informative screens suspended at eye level. Imagine being able to access every bit of information about the produce you are buying, from the location and climatic conditions where it grew to the chemical treatments applied along its journey to the shelf right in front of you. Coop Italia's supermarket is designed to give users such a rich shopping experience.
Published by Mastech InfoTrellis
Approaching Data as an Enterprise Asset
If you walk into a meeting with all your senior executives and pose the question:
“Do you consider and treat your data as an Enterprise Asset?"
The response you will get is:
“Of course we do.”
The problem in most organizations, however, is that while it is recognized that data is a corporate asset, the practices surrounding the data do not support the automatic response of Yes We Do.
What does it really mean, to treat your data as an enterprise asset? If we were talking about office equipment, corporate offices, fleets of vehicles or many of the other tangible things a corporation may consider an asset, you would hear things like:
Planning
Acquisition
Operation and Maintenance
Improvement
Monitoring
Disposal
When we consider the practices a typical corporation performs for the data assets, we would probably find a similar list of things, but it will most likely read more like:
Some Planning
Acquisition
Operation and Some Maintenance
Little Improvement
No Monitoring
Some Disposal
Is this a harsh assessment of how data gets treated at the enterprise level? For some organizations, maybe, but they're probably more the exception than the rule. It's not surprising, though, that this is the reality for many organizations, as the problem is one that has grown over time as the advances in technology and sources of information have grown at rates far greater than our ability or desire to implement change. In other words:
You Are Not Alone!
The Paradigm Shift
Managing data as an enterprise asset requires a fundamental shift in the way a typical organization operates.
Today, many organizations are divided into silos, with each responsible for its own line of business or business function. Communication and cooperation across the silos is sporadic at best. Decisions, even when made with the enterprise in mind, are basically self-serving, with only limited contribution to or support for the other silos. This self-serving nature contributes to data sprawl, adding complexity, duplication, and additional cost for maintaining the data assets.
IT projects are justified, funded, and managed within the silos that need the benefits provided by the projects. Projects can have cross-silo benefits and responsibilities which are typically evaluated in terms of that silo's benefit and cost.
To manage data at the enterprise level, the enterprise has to change, learning to break down these silo-based barriers and create an environment where decisions and actions support the entire enterprise, not the silo. Achieving this level of cooperation requires executive-level backing, not only to start the initiative but also as an ongoing process to address issues as they arise in the future. While making the silos go away may seem an impossible task, executives can foster the enterprise view by making funding available for enterprise-wide initiatives that benefit the enterprise. Enterprise stakeholders can form committees to provide governance and oversight for enterprise initiatives.
How do you get your executives excited about approaching such significant change to support data at the enterprise level? Relate the opportunities to executive hot spots:
Regulatory - requirements for privacy, data retention, government reporting
Cost reduction – for IT projects, productivity improvements, business agility, data sharing
Data quality – providing cost reduction, customer satisfaction
Intellectual property – supporting discovery, retention and propagation
Analytics – supporting improved decision making, fraud detection, increased revenues
Planning and Governance
Data governance is one of the key requirements for successful data asset management at the enterprise level. At its simplest, data governance is about the processes that control the creation, accessing, sharing, using, and retiring of information, and what happens when there is a problem.
Data governance should:
Develop a strategy. Decide what data to manage, identify critical and master data. Determine the value of data based on cost of collection, maintenance, business value, risk of lost or inaccuracies.
Establish a committee of the various corporate entities or lines of business that are in a position to understand the data, speak for their line of business, and be able to make a decision
Establish a set of policies to define data integrity, quality, security, classifications and use of data
Establish a set of standards to control how to implement the policies, like naming conventions, data modelling, tools, technologies, and methodologies
Procedures for addressing quality issues, business rule issues, data naming issues, and security issues
Exception management and remediation
Integration of governance into IT project management cycles with check points to provide continuous oversight
Establish a set of penalties for noncompliance (enforced governance requires both rules and penalties)
Documentation and metadata requirements
Shared Data Access
To successfully manage data at the enterprise level you need to manage the creation, maintenance, and inquiry of that information from shared applications. Many organizations are looking to Service Oriented Architecture (SOA) as an approach to provide this shared computing environment for the enterprise. Interestingly enough, in order for SOA to work effectively, the same paradigm shift described earlier is also required here. Applications supporting the entire enterprise cannot be funded and controlled by a silo. Enterprise applications must be handled outside of the silo to be able to support every line of business. Funding for enterprise services cannot be the responsibility of the silo as these will get lower priority compared to the silo's other initiatives.
Silos provide ownership for applications and shared enterprise applications also require ownership. The owner needs to be responsible for the application so that decisions made and actions taken are for the benefit of all stakeholders and not just a specific business group.
Master data management (MDM) is one of these shared enterprise applications that requires the cooperation of all stakeholders to be successful. Master data management is an enterprise utility application that can benefit a wide cross section of the business and provide the controlled, shared environment for success.
Invest in Data Quality
Many organizations are not really aware of the quality of their data. Quality issues normally arise as a result of business intelligence initiatives as they tend to highlight the situations where data is not reflecting what the business thinks it should. Marketing campaigns having high levels of failed delivery highlight bad customer contact information, and other business processes that do not function as intended all highlight the same problem.
How can you recognize when you have trustworthy data quality? If you can answer these questions with Yes, you most likely have trustworthy data quality:
Is my data accurate? Do the values make sense? Are the values meaningful?
Is my data valid? Are the values in range? Are the dates proper?
Is my data complete? Are time periods missing? Are mandatory fields empty?
Is my data consistent? Do I get the same information from every source?
Is my information timely? Can my data tell me what happened today, yesterday, last month?
If you are lucky enough to have a high level of data quality, then in order to maintain it you need data stewards who are responsible for dealing with any anomalies that may arise. Data stewards respond to identified data issues and perform corrective actions whenever possible. Common data issues that data stewards deal with are duplication, bad customer addresses, missing customer records and a host of other issues.
Improve Data Collection
Many data issues are created at the point of entry. Data collection applications may not validate information enough, data collection methods may be inferior, and the people entering data may not understand what is required or the importance of the data being collected. When problems affecting data quality are identified, action should be taken to correct the problem as soon as possible. If the data was important enough to collect, it should be important enough to collect properly.
Control Data Sharing
Data being collected at one point and shared with other applications should be tracked and validated that the data is being sourced from the proper place and is fit for the intended purpose. Data that is shared should also be sourced from the system of record. If data is sourced from an application which is not the source, you risk some processing being done on that data that causes unwanted side effects.
Data replication is a fact of life. Applications may need local copies of shared data to enable some functionality, or you may need to have a local copy to achieve the performance thresholds you required. Replication is a valid process as long as the reasons for the replication are good enough to bypass using the central store, and the replication is properly documented so all downstream consumers of data are known.
Metadata and Documentation
One major aspect of data sharing is having the ability to locate, understand, and access the data. A large client once estimated that 40% of all their IT project effort was spent discovering data they already had. Metadata (data about data) is an important resource for supporting an enterprise data environment. While creating and maintaining metadata is not usually high on a project's list of things to do, it can provide a wide variety of benefits such as:
Intellectual property retention
Reduced IT costs by providing readily accessible information to support new development
Improved data quality by having clear definition of business rules
Better decision making by identifying appropriate sources for business purpose
Metadata creation and maintenance should be part of the project lifecycle, and like governance, have check points which must be satisfied to proceed with the project. Metadata is most reliable when automation for the collection and maintenance of the metadata can be employed, but that only works for metadata of physical things, while high level definitions and descriptions require human intervention.
Monitoring your Data
Monitor your data so you can understand how it changes over time. Monitor the quality of your data so problem sources can be identified earlier. Monitoring your data can provide business insight. If you are monitoring your MDM customer information you can see customer growth or decline, types of customers, and changes in customer behaviour. Monitoring allows you to see trends in your data and your business so you can become more proactive and less reactive.
Data has a life cycle
Like tangible assets, your data has a life cycle, and at some point it no longer provides the intended value and should be retired. Data retention may be governed by legal obligations, by government legislation, or by business value. Storage costs are significantly cheaper than before, but the hidden costs of managing all this historical data should not be overlooked.
Identify when your data should be retired and either purge it or archive it for long term storage to allow your data environment to keep to a manageable size.
Enterprise data management is a program
If you've gotten this far and decided this is too much to tackle, you can relax as this is an evolutionary process that will take years. Like any major initiative, trying to accomplish it in one big bang is rarely a good idea. Enterprise data management is a program that is always active and growing, but it is a practice that cannot be ignored. The scale of the problem may be different for every organization, but every company will face the issues caused by unmanaged data. Recognizing that enterprise data management is basically a corporate utility supporting all aspects of the business can help position it for easier adoption or expansion depending on the state of your enterprise data management practices.
Blueprint for a successful Data Quality Program
Data Quality - Overview
Corporations have started to realize that the data accumulated over the years is proving to be an invaluable asset for the business. The data is analyzed, and business strategies are devised based on the outcome of the analytics. The accuracy of the predictions, and hence the success of the business, depends on the quality of the data on which the analytics is performed. So it becomes all the more important for the business to manage data as a strategic asset so that its benefits can be fully exploited.
This blog aims to provide a blueprint for a highly successful Data Quality project, practices to be followed for improving Data Quality, and how companies can make the right data-driven decisions by following these best practices.
Source Systems and Data Quality Measurement
To measure the quality of the data, a third-party data quality tool should hook on to the source system and measure the data quality. A detailed discussion with the owners of the systems identified for data quality measurement needs to be undertaken at a very early stage of the project. Many system owners may not have an issue with allowing a third-party data quality tool to access their data directly.
But some systems have regulatory compliance constraints because of which the system owners will not permit other users or applications to access their systems directly. In such a scenario, the system owner and the data quality architect will have to agree upon the format in which the data will be extracted from the source system and shared with the data quality measurement team for assessing the data quality.
Some of the Data Quality tools that are leaders in the market are Informatica, IBM, SAP, SAS, Oracle, Syncsort, Talend.
The data quality architecture should be flexible enough to absorb data from such systems in any standard format, such as CSV files, APIs, or messages. Care should be taken that the data being made available for data quality measurement is extracted and shared in an automated way.
Environment Setup
If the data quality tool is going to connect directly to the source system, evaluating the source systems' metadata across the various environments is another important activity that should be carried out in the initial days of the data quality measurement program. The tables or objects that hold the source data should be identical across environments; if they are not, decisions should be taken to sync them up, and this should be completed before developers are onboarded to the project.
If the data quality team is going to receive data in the form of files, then the location in which the files or data will be shared should be identified and the shared location created with the help of the infrastructure team. The data quality tool should also be configured so that it can read the files available in the shared folder.
Data Quality Measurement Architecture
FIGURE 1: DATA QUALITY MEASUREMENT ARCHITECTURE
Data Acquisition
The architecture for the data acquisition layer (how we get the data for DQ measurement) should be documented and socialized with the business users, data owners, and data stewards. The data quality tool's connectivity to the different source systems built on disparate technologies should be established. Systems that don't allow ETL tools to access their data directly should be able to send or FTP data to a shared folder.
Data that is not directly available for consumption by data quality measurement should be delivered in an agreed-upon format, and the data quality team will validate its conformity before it is consumed. If the files don't comply with the agreed-upon format, the data acquisition and the subsequent data quality measurement should not be kicked off, and a notification mail should be triggered to the team responsible for the source data. A similar communication protocol should be implemented for a failure at any stage of the data quality measurement.
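A minimal sketch of that conformity check: before the DQ measurement is kicked off, verify that the delivered file carries the agreed-upon header and that every record has the expected number of delimited fields. The file name, delimiter, and expected header below are assumptions for illustration; in practice the failure branch would also trigger the e-mail notification to the source team.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class SourceFileConformityCheck {
    // Agreed-upon layout (hypothetical): pipe-delimited with these columns
    private static final String EXPECTED_HEADER = "OPPORTUNITY_ID|CUSTOMER_ID|PRODUCT_CODE|AMOUNT|CREATED_DATE";

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("/shared/dq/inbox/opportunity_20240101.dat"));

        if (lines.isEmpty() || !lines.get(0).equals(EXPECTED_HEADER)) {
            System.err.println("REJECT: header does not match the agreed-upon format");
            System.exit(1); // non-zero exit: the scheduler skips DQ measurement and notifies the source team
        }
        int expectedFields = EXPECTED_HEADER.split("\\|").length;
        for (int i = 1; i < lines.size(); i++) {
            if (lines.get(i).split("\\|", -1).length != expectedFields) {
                System.err.println("REJECT: record " + (i + 1) + " has the wrong number of fields");
                System.exit(1);
            }
        }
        System.out.println("ACCEPT: file conforms; DQ measurement can be kicked off");
    }
}
```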
End to End Automation
The entire process of data acquisition, data quality measurement, report generation, and report distribution through the data quality dashboard refresh should happen with zero manual intervention. Similarly, the decision to hold or purge data (namely the source data received in the form of files and the data used for data quality measurement) needs to be documented and agreed upon with the stakeholders. The purging process should also be automated and scheduled to happen at agreed-upon intervals.
Difference Reports and Data Quality Dashboards
In an ecosystem of applications, there will be a few attributes that traverse across systems; for example, an Opportunity ID created in the POS traverses to other systems. The data quality rule for those attributes will be the same in all the systems they are found in. A landscaping exercise across all the source systems should be conducted and an inventory of such attributes traveling across systems created. Reusable components should be built to measure the data quality of those attributes.
The data stewards will be interested in knowing the degree of consistency and integrity of a domain's data across the landscape. This will be used as an instrument by the business users to support any decisions made as part of the data quality measurement program on enhancing the data, or other decisions on the content and validity of the data.
This landscaping activity will also help the architecture team create two important assets that the data stewards and business users rely on: data lineage and the data dictionary.
Conclusion
Data quality is more of a business problem than a technical problem. The success of a data quality program depends on the synergy between the data stewards, application owners, and IT teams. The winning team will have the right combination of representatives from each of these groups, along with representation from the project sponsor and business owners.
Published by - Data Engineering Solutions
Tableau - Your Visual Interface to Data
In today's competitive business environment, leveraging an organization's data assets is vital to building the successful global business enterprises of the future. Many organizations spend huge sums of money on business intelligence/data warehousing (BI/DW) solutions, yet these initiatives have yielded a lower-than-expected return on investment (ROI) because they have failed to identify and address the basic aspects influencing the outcome of these BI initiatives.
Business Intelligence (BI) products come in all shapes and sizes. Some emphasize architecture, while others tout a flashy interface. Some cost hundreds of thousands of dollars, while others cost hundreds.
A good BI product gets consistent upgrades and therefore always stays current with the latest technology and trends. Every good business intelligence tool must incorporate certain important features. This blog post details one such BI tool: Tableau.
Tableau extends the value of data across any organization's verticals. It empowers the business with the freedom to explore data in a trusted environment without limiting it to pre-defined questions, wizards, or chart types. With its ease of deployment, robust integration, simple scalability, and excellent reliability, an organization no longer has to choose between empowering the business and protecting its data: with Tableau you can finally do both.
The following guidelines trace the highlights and capabilities of the Tableau software, separated into four categories.
1 High-level highlights
1.1 Open Architecture
A frequently overlooked element, architecture remains one of the most critical parts of business intelligence. Some vendors build their product on their own proprietary architecture, while others build on open architectures and frameworks.
1.2 Wide database support
While some products support only a single database or platform, a BI tool must support any database or platform. Going a step further, it should also build applications that can pull information from multiple sources of data. Tableau 10 adds even more data source options, as shown below:
Figure 1: Tableau v10
1.3 Real-time information
Some BI software delivers day-old or even week-old information. Tableau builds applications that deliver real-time information straight from your database. Why is this so important? Business intelligence helps you make quick, informed decisions based on the most current data possible; a lack of real-time data hinders decision making and significantly limits the benefits of BI. For example, the City of Boston is using Tableau in a pilot program in the mayor's office to set up dashboards for various departments in the city. If it works as envisioned, the mayor will be able to walk into a control center that contains screens showing the dashboards and not only get an idea of what's happening in those departments at any given time, but also ask questions about particular indicators on the dashboards and get answers while standing in front of the screens.
1.4 Self-service capabilities
In the past, the IT department controlled reporting and BI capabilities within the organization. Nowadays, Tableau includes self-service capabilities that let end users create their own BI and reporting applications.
1.5 Mobile support
Modern business intelligence software must span all devices and platforms. Tableau supports the full range of mobile devices across operating systems: iOS, Android, etc.
1.6 Support for Data Marts or Data Warehouses
Data mart and data warehouse structures allow Tableau to work with data from numerous source systems. They pull data from the transactional databases that run the business and use the power of OLAP for the detailed investigation required for ad-hoc reporting and analysis.
2 Security highlights
2.1 Application level security
Application-level security lets you control Tableau application access on a per-role or per-user basis. This typically includes a role-based system, which shows different menu options to different users based on their role.
2.2 Row-level (or multi-tenant) security
A basic part of Tableau BI and reporting applications, multi-tenant security lets you control data access within a single application at the row level. As illustrated in the image below, this means different users access the same application but see different data.
Figure 2: Multi-tenant Security
2.3 Single sign-on
Tableau's session/client authentication process, single sign-on (SSO), lets users enter their name and password in just one place and access multiple related applications. It authenticates the user for every authorized application and eliminates login prompts when switching between applications in a single session.
2.4 User privilege parameters
User privilege parameters let you customize features and security for individual users or user roles. Saved to a user's profile, these parameters control user-specific features throughout each Tableau BI application. Why is this so essential? User privilege parameters offer wide control over many application aspects: they let you control the look and feel, add or hide user options, limit user capabilities, and more. For example, assume your organization created a web pivot table. User privilege parameters could control which users have the authority to export that pivot table to a PDF report or spreadsheet; unauthorized users would not see the "export" option in the application.
2.5 Flexible authentication choices
Many organizations already use multiple application authentication sources. For example, your CRM system may authenticate users against one user table, while your email system may use a completely different authentication source. Tableau provides flexible authentication options, letting you authenticate your applications using whatever authentication sources you already have in place.
2.6 Application activity auditing
Tableau application activity auditing lets IT staff log end-user activity for sign-on and sign-off events. This lets IT departments monitor when users sign in, which applications they access, and when they log off.
3 Must have features
3.1 Ad-hoc reporting
Ad-hoc reports let end users create and share work at run time. Users select the data elements they wish to see in the report at run time, and can then export the report to a format of their choosing or email the report to other users directly from the web browser.
3.2 Ranking report
A Tableau ranking report creates variable rankings over numerous dimensions while allowing different selection criteria at run time. For instance, suppose you want a report that lists your top 25 best customers of the last year, or the top (or bottom) 5 sales representatives last month. A ranking report makes this simple.
3.3 Executive Dashboards
A dashboard gives a real-time perspective of your business using multiple, easy-to-read charts. Tableau dashboards offer key information customized to each executive's responsibilities and areas of interest. For instance, a CEO wants charts showing revenue over the previous year, month, and week, while the customer service manager wants charts showing the average time needed to resolve issues.
Figure 3: Dashboard
3.4 Pivot table/OLAP
Pivot tables automatically extract, organize, and summarize data. Frequently used for investigating data, making comparisons, and finding patterns, the flexibility offered by pivot tables makes them one of the most popular BI applications.
Figure 4: Pivot Table
3.5 What-if analysis
A what-if analysis in Tableau lets you evaluate potential business changes before you make them. Using past data, it shows how different changes might influence certain parts of your business. For instance, what would happen if you raised prices by 10%? What if you lowered prices and increased quantity?
3.6 Geospatial/mapping applications
This type of application takes your geographical data and displays it graphically on a map. It enables organizations to gain location-based insight, either to gain a competitive edge, to improve organizational performance management, or both.
4 Advanced highlights
4.1 Intelligent Alerts
A Tableau Server threshold alert automatically sends an email or SMS message to the appropriate party when data reaches a pre-defined threshold. For example, intelligent alerts can instantly advise the CEO whenever a customer cancels their account, or whenever sales numbers reach unusual levels.
4.2 Collaboration
As we learned from the rise of social networking, the web provides the ideal collaboration platform, and this concept translates seamlessly to business intelligence. Tableau BI applications incorporate commenting and let you interact with other collaborators directly within the application.
4.3 Cloud-Compatibility BI
Most analysts and specialists agree on one point: cloud computing is the future. Tableau on the cloud promises close to 100% uptime and scalability while avoiding the effort and cost of in-house hardware. Tableau Online is the Tableau analytics platform fully hosted in the cloud. Publish dashboards and share your discoveries with anyone; invite colleagues or customers to explore hidden opportunities with interactive visualizations and accurate data, all easily accessible from a browser or on the go with mobile apps.
4.4 Built-in ETL
Tableau's ETL capability lets you extract data from multiple source systems, transform it into a single format, and load that data into a target database. It gives end users a simple approach to incorporate data from multiple locations in a distributed storage structure while providing a unified view for users.
Why do Big Data projects fail?
When you read the marketing spin on Big Data and the tools available today, you may deduce that there is much upside and not much downside to implementing a Big Data project. Nevertheless, you will quickly find that this is not the case. It is not as simple as typing "apt-get install Hadoop" into a Linux command window, after which everything installs, you ask it complex questions, and it gives you sage advice. It is tough to get a Big Data project working.
This post is not trying to dissuade you from attempting a Big Data project. If done correctly, it can give you critical information about your customers and products that can separate you from the pack and make your company a leader in the marketplace.
However, you need to go into the project understanding what you are getting into and having the right resources assigned to give you the highest chance of success.
Here are four reasons why I think Big Data projects typically fail:
1. Underestimating how complex it is to get data, clean it, store it, and then analyze it.
One of the top reasons a big data project fails is because companies don’t know where all their data is.
Consequently, one of the first things you will need to do is understand where all of your data lives, develop connectors to extract that data from its source, clean the data, and put it in a form that can be analyzed. If you are using Hadoop for this, then once you have identified the data sources, you will need to build connectors to those sources and extract the data. You then have to make sure the data is relatively free of inconsistencies like multiple records for the same customer, misspellings, etc. Then, that data must be put into a standard format (JSON, XML, …) and stored in a high-performance file system such as HDFS. All of this requires some sophisticated programming in a language such as Java. You will also have to build out a large cluster of servers on which the MapReduce nodes can be set up, and this infrastructure must be kept configured, maintained, and monitored.
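To make the cleaning and standardization step a little more concrete, here is a minimal sketch. The field names, the de-duplication key (email), and the inline records are hypothetical; a real pipeline would use a proper JSON library and write to HDFS rather than standard output.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch: normalize raw customer rows, drop duplicates by a
// hypothetical natural key (email), and emit one JSON-like line per record.
public class CustomerCleaner {

    public static void main(String[] args) {
        // Stand-in for rows extracted from a source system.
        List<String[]> rawRows = List.of(
                new String[]{"101", " Jane Doe ", "JANE.DOE@EXAMPLE.COM"},
                new String[]{"102", "John Smith", "john.smith@example.com"},
                new String[]{"103", "Jane Doe",   "jane.doe@example.com"}   // duplicate of 101
        );

        // Keyed by normalized email so repeated customers collapse to one record.
        Map<String, String> deduped = new LinkedHashMap<>();
        for (String[] row : rawRows) {
            String id    = row[0].trim();
            String name  = row[1].trim();
            String email = row[2].trim().toLowerCase();
            deduped.putIfAbsent(email, String.format(
                    "{\"id\":\"%s\",\"name\":\"%s\",\"email\":\"%s\"}", id, name, email));
        }

        // In a real pipeline these lines would be written to HDFS for later MapReduce jobs.
        deduped.values().forEach(System.out::println);
    }
}
```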
2. You don’t know where the data is and what condition it is in.
As mentioned, Big Data is about “Big Data,” up to petabytes of data all stored in one place so that analytics operations can be performed against it. But to gather and store this data, you need to know where it is in the first place, and it lurks in places you may not always think of. For example, almost every company has a CRM or point-of-sale system that keeps up with its customer data. It is usually a relational database and is used daily to transact business with the company’s customers. And, because you cannot know in advance what data you will need for analytics, you will need to extract all of this data and put it in the analytics “Data Lake.” But there is much more data available to be analyzed: web log files, spreadsheets, text messages with customers, feedback from Facebook, Instagram posts and comments, and much more. All of this data needs to go into the data lake in a standard form, and extract APIs need to be built to get it.
3. You don’t know what questions to ask (the data).
For the big data project to have any value to a company, it needs to provide the team with actionable information. But, it’s still a dumb computer, so it must be explicitly asked what questions you want to be answered. Because of the amount of data available in a big data ecosystem, you can ask some pretty interesting questions. Something as broad as “what is the age of most of my current customers,” to something as specific as “what zip code do most customers that buy blue caps live in.” You can also look for patterns of behavior and lots of other exciting things, and the computer can learn as more and more data is analyzed and results are stored. Still, the company’s operations and marketing teams need to be able to articulate to the data analysts and programmers what questions they want to study.
4. It is expensive and takes time (java programmers, lots of hardware and complex administration, expensive analysts).
I hope you have gathered from the first three challenges that it’s not easy to get a big data project off the ground, much less keep it running. So, the last reason a project like this fails is a team that lacks the necessary skills or is not given enough time and budget. If you have never looked at what a MapReduce study looks like, you may want to take a look at this word count example, and this example only counts words and returns results. To add to the complexity, you need to develop in Java (although other programming languages will work, Java is the de-facto standard environment), and you probably want to use an IDE such as Eclipse to manage things. All of this requires some pretty skilled developers who understand how to write MapReduce studies and can work with analysts to know what they need to program. These are not junior programmers. For the actual analysis, your data analytics team will need to understand how to develop the complex study algorithms and be able to articulate these to the developers to implement. These are senior analysts with a deep understanding of the data. And finally, the team needs to be led by a strong project manager who either is, or has the sponsorship of, a senior executive.
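For reference, a Hadoop MapReduce word count looks roughly like the sketch below. Even this "hello world" of MapReduce needs a mapper, a reducer, and a job driver, which hints at the skill level a real study requires. The class name and the input/output paths are illustrative.

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: split each input line into tokens and emit (word, 1) pairs.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sum the counts emitted for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Job driver: wire up mapper, combiner, reducer, and the HDFS input/output paths.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```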
All of this costs both time and money. The good news is that once the initial project is finished, subsequent studies and changes to data sources can become somewhat routine, and the costs will decrease. But you still need to understand that even maintenance will not be without cost or effort.
For a big data project to be successful, you must have the right team in place before you start, you must have defined expectations, and you must have the money and patience to let the project play out. If done correctly, a big data project can pay off big time. It’s just quite a climb to reach the top.
Published by Mastech InfoTrellis
0 notes
Text
The Data-Driven Ecosystem and Domain Ontologies
As your enterprise moves toward being data-driven, the ability to derive a domain ontology from your company’s data will become ever more important. In order to move to this deep analytical process, it is important to understand the amount of data required and the state the data must be in before you attempt any deep analytics studies.
Data – Where is it?
The first need is to discover where all of the domain data for your enterprise actually exists. Some of it is self-evident, such as your ERP, CRM, SFA and other internal sources. Other sources such as social media (Instagram, Facebook, Twitter, Yelp, etc.) come to mind. However, there are also external sources such as lexisnexis.com, census.gov and movoto.com that can be harvested, as well as web site data from Google, Bing and others. Some of the website data, such as transaction logs and HTTP logs, is lesser known but can be just as important to have on hand.
The trick is to know where your data resides, retrieve this data and store it in an area where it can be modeled and acted upon.
Have the data – Now what?
Just because you have collected as much data as you can about your organization does not mean it is necessarily ready for use. The old adage "garbage in, garbage out" certainly applies here, and the effect is magnified if you are letting the computer make decisions based on ontologies derived from this data.
So, an extensive cleansing and deduplication effort needs to be performed on the raw data in order to make it reliable and usable for any deep analytics study. This leads us to the need to build out the data sources in stages, an ideal use for the MIT Enterprise Data Bus.
The process to develop reliable data that can be trusted as a source for deep analytics and decision centric studies such as ontologies is as follows:
Find the data: extensive research must be performed on potential sources of your enterprise data, and that data must be retrieved into a storage platform that can handle massive amounts of data, the “Data Ocean”. You will need to attach metadata to the files in order to keep track of what they contain.
Once the raw data has been brought into the Data Ocean, a number of cleansing, deduplication, and business rules must be applied to it to make it reliable and available for use in analytics processes. This ocean of data is then available for use in data lake creation, big data operations, Machine Learning programs, or Artificial Intelligence.
Becoming a data-driven enterprise is a journey, and some initial investment must be made in order to gain long-term benefits. Most companies do not have the infrastructure or governance in place to fully utilize the corpus of their data, but with planning and a disciplined approach to the management and engineering of your data, long-term benefits can be achieved, and data can drive a competitive advantage for the enterprise.
What the heck is an Ontology?
Ontologies are a deep, almost existential, understanding of a domain. In the case of business this equates to having insights into very complex elements of a particular domain. This could include such domains as customers, products, manufacturing, supply chain, or any other domain that is key to an enterprise’s success.
The thing about ontologies is that they need very trustworthy data in order to come up with the best analysis of a domain. Because you are letting the computer develop insights the business will have to act on, you want those insights to have the best possible data to work with.
So, the data is the thing. The better data you have, the better your ontologies and the better your outcome when you act on those ontologies to gain a competitive advantage over your competitors.
Published by Mastech InfoTrellis
0 notes
Text
Enterprise Data Bus – What? How? And Why?
What is an Enterprise Data Bus (EDB)? An EDB is a collection of data management and analysis tools that are installed and orchestrated so that the corpus of enterprise data is made available for analysis and business value creation. An EDB is the analytics node of the data fabric. It can be used as an integration hub and connector layer of the data fabric. An EDB embraces the DataOps concept by allowing an agile method for automation and orchestration of the enterprise’s data journey, following data from creation to its eventual use for deep analytics, data science, and automated business decisions.
While working with clients on their journey to becoming a data-driven enterprise, one of the key challenges encountered is the reliability of the data being used to drive business decisions. There is an almost unlimited amount of data available to an enterprise. It must be properly ingested, stored, transformed, and managed for it to be useful in business decisions and digital transformations.
Many times we have seen Big Data and Data Science initiatives derailed by the availability, reliability, and infrastructure performance issues that come with large data sets, which require very precise data quality. The concepts that are applied to Master Data Management need to be applied similarly to the source data for Data Science and Analytics studies. This means that many of the tools that are used for MDM can be leveraged within an EDB.
Podcast
Data Ingestion and Enterprise Intelligence
Hosted by Jeff Pohlmann, VP and GM, Data Engineering, Mastech InfoTrellis
Listen Now
As the EDB evolves with more data sources, transformations, and workflows, it can be seen as part of the data fabric’s Enterprise Integration Hub. Here, many sources and consumers of data orchestrate data exchange through a common, robust set of infrastructure.
As illustrated in the architecture discussion below, the EDB is designed in logical layers. These logical layers can be implemented in several physical configurations, depending on the need for scalability, extensibility, or performance of individual layers.
As with any complex architecture, planning is the best way to mitigate failure. Enough time needs to be allocated to discussing business requirements and technology constraints before attempting a physical implementation of an EDB. Knowledge of the underlying tools chosen to implement the EDB is also crucial, and experience with big data and data integration projects will be useful. Start small and grow the implementation in increments.
The architecture will not change, but the volumes and the number of data sources and consumers need to be validated and managed as the implementation grows.
The EDB Architecture
An EDB ecosystem is architected in layers. Each layer has its own set of scalable component services orchestrated in a manner to allow the best possible utilization of resources to create trusted analytics data. The EDB is built on top of the enterprise core components of Security, Monitoring, DataOps, and Data Governance.
By implementing an Enterprise Data Bus, organizations can effectively gather their data in the Ingestion/Integration layer and store the raw, unfiltered data in the Data Ocean. From there, study data is transformed through the Data Engineering layer and stored in the Data Lake as reliable, fit-for-purpose data. Once in the Data Lake, reliable and transformed data can be used for any number of analytics and data science purposes.
Not only can the EDB supply data to Data Science and deep analytics, but it can also be a trusted source for data warehouses, data marts, operational data stores, and other systems and applications that may need highly reliable data.
Since the EDB ecosystem can be bi-directional, results from analytics studies and AI can be used to act upon data in the source systems, creating better operational systems.
EDB by Layer
Source – These are sources of data that can be used for the enterprise data journey. They can be internal systems (CRM, ERP, MDM, SFA, etc.), external structured data (lead generators, demographic data, web transaction data, etc.), or unstructured external data (Yelp posts, Twitter tweets, Instagram photos and comments), among others.
Ingestion/Integration – This layer provides the tools required to find, capture, and store the data in its initial landing site, the data ocean. Several tools, all open source, are included with the base implementation of the EDB. They are used to capture data from the sources, perform simple transformations on that data (such as converting it to a common format and tagging it with metadata), and store it in a form the Engineering layer can then access. The benefit is that you can rapidly gather large amounts of data in a relatively raw state and store it in a high-performance container (the Data Ocean) on cost-effective, scalable, high-volume infrastructure. (A minimal sketch of this hand-off follows the layer descriptions below.)
Data Ocean – This is a high-performance file system and data store that holds source data in its raw state for use with analytics or other data intelligence activities. This model allows fast, high-volume data capture and storage. Its benefit is the ability to capture large data sets on fast, cost-effective infrastructure and to serve large amounts of data quickly as needed by analytics and business intelligence systems.
Data Engineering – In this layer, the raw data from the Ocean is transformed into fit-for-purpose data used by analytics and business intelligence projects. Data is transformed using open source or proprietary tools for data matching, data cleansing, data type matching, language translation, and the several other transformations required for later use. The output of this layer is in the form required for a particular study. This is the critical layer that produces the clean, reliable data needed for analytics and BI studies. Without this key layer, significant rework is required for studies, as data must be cleaned and corrected, which results in lost productivity hours and bad decisions made by the analytics studies.
Data created by the Engineering layer is stored in the Data Lake: clean, reliable data kept in a form that can easily be used by analytics tools.
Data Lake – This is another high-performance, high-capacity data store, similar to the Data Ocean, where cleaned, ready-to-use data is stored. It can also be used as a staging area for Engineering, analytics, and BI work as it progresses. The Data Lake, while almost identical to the Data Ocean in tools, is implemented in a separate compute space to allow for scaling without impacting the other layers of the architecture. The benefit of this layer is having a repository of clean and reliable data on which to run analytics studies. It saves significant time and cost by eliminating rework or poor decisions during the analysis process.
Analytics and Data Science – While this is the domain of the analyst and Data Scientist, the EDB provides the tools and connectivity that enable the analytics teams. The tools here are typically specified by the analytics teams and implementers as part of the overall EDB ecosystem. Any number of Ontology and Analysis Lakes can be created depending on the organization of the study groups.
Because the EDB connects all layers in a bi-directional manner, results from analytics studies can act upon source systems to add or correct data in their data stores.
Data Engineering Solutions
Bottom Line - While many 360-degree tools typically focus on one domain (customer, product, etc.), MIT, by implementing EDB, MDM, and Ontologies, focuses on the entire domain of an enterprise, architecting the ecosystem required to get a true global view of its business domains and enabling the data-driven enterprise through data engineering solutions.
0 notes
Text
IICS and Data Solutions on the DX Platform
Overview
The move to the cloud is fully in force, and the number of organizations integrating data in hybrid environments has multiplied two-fold. Nearly three-quarters of respondents integrating data in hybrid and cloud environments cited poor data quality in cloud services, restricted API access, and corporate security and compliance policies as being among the key problems in implementations. Aside from big data challenges, the largest issue was an absence of knowledge and skills within their organization’s IT departments on how to integrate with cloud services.
iPaaS (Integration Platform as a Service)
For the past few years, cloud data management has been defined by Integration Platform-as-a-Service (iPaaS). iPaaS is a set of integration tools delivered from a public cloud that requires no on-premises hardware or software. iPaaS was specifically designed to handle the lightweight messaging and document standards (REST, JSON, etc.) employed by today’s cloud apps.
iPaaS consists of four capabilities:
Cloud Data Integration
Cloud Application and Process Integration
API Management
Connectivity
This approach, though positive, provides a restricted view, calling for a next-generation iPaaS that can meet additional vital capabilities.
IICS (Informatica Intelligent Cloud Services)
IICS is a next-generation integration platform as a service that addresses these key problems with data and application integration, API management, and data management in the cloud, along with data quality, security, and Master Data Management (MDM) capabilities natively engineered for the cloud.
IICS is also designed to serve multiple enterprise user personas, promote deployment best practices through templates, and enable DevOps strategies for solution delivery. Templates are prebuilt logic for common data integration patterns such as data preparation, cleansing, and warehousing. IICS users can create their own solution templates and save them as assets within the platform that can then be reused by others throughout the enterprise, improving consistency and fostering best practices. IICS exposes APIs that facilitate DevOps practices, including automation of solution monitoring and continuous delivery through integration with external source code management systems and release and deployment pipelines.
Capabilities
IICS delivers integration capabilities unified by data intelligence, implemented in container-based microservices that support deployment in hybrid, cloud, and multi-cloud environments.
Capabilities are being delivered in four distinct but unified clouds:
Integration cloud
Master Data Management cloud
Data quality and governance cloud
Data security cloud
AI Engine
The CLAIRE engine, Informatica's AI engine, is now part of the next-generation iPaaS - not as an endpoint in an integration pipeline, but in the intelligent automation of data integration activities, trained by data and informed by people. In the cloud, the CLAIRE engine could potentially have access to metadata generated by many IICS implementations, learning best practices for data integration tasks such as mapping, transformation, movement, augmentation, federation, and aggregation, leading to intelligent assistance during development and automation of mundane activities.
Conclusion
The common theme of the next-generation iPaaS is technology consolidation and user experience, delivering capabilities that speak to business value. Microservices and container architectures are providing software vendors with the opportunity to unify what in the past has been disparate, while also providing the opportunity to move quickly into hybrid and multi-cloud configurations to satisfy the demands of digital transformation.
You can visit - data solutions
0 notes
Text
Preparing for an MDM Implementation
There are many tasks you can perform to make your MDM implementation go more smoothly. Many people think being prepared is only about the technical things, but there are many organizational, data, and system-related challenges that must be addressed to have a successful MDM implementation.
Some of the following preparation tasks may seem to be project work, but once you start, you will appreciate the groundwork done as part of your preparation.
Organizational Preparation
An MDM integration is a business driven initiative, and the business needs to be involved and properly prepared for the tasks ahead. Consider the following organizational points before you start your implementation.
Define the Business Goal
There are many business drivers for wanting to implement an MDM solution:
Complete view of customer profile
Improved data quality
Improved customer experience
Upsell and cross-sell opportunities
Centralization of customer information
Reduced development and system costs
Reduced customer communication costs
Improved business decisions by having better data
It is key to identify what your business goals are for implementing an MDM strategy and prioritize if many benefits are to be realized. Understanding the business drivers will help focus the activity when decisions are to be made.
Whitepaper
Simplifying MDM On The Cloud - Is it Right for You?
Download Here
Create a Strategy and Roadmap
An MDM implementation is a program, not a project, and as such benefits from having a strategy. Most organizations will take years to complete their MDM journey, and recognizing this early on will help set proper expectations.
Create an MDM assessment strategy and a roadmap to answer these questions to help keep things on track:
What needs to be done?
What sequence should it be done in?
What resources will be required?
What benefits will I receive and when?
What is it going to cost?
Define your Success Criteria
It is important to recognize when you have had success and be able to measure and communicate that success to your stakeholders. Your MDM implementation will run over many phases and each phase is designed to achieve a business goal. Having defined your criteria for success and being able to measure your rate of success will assist in keeping the program on track, and assist with future funding to achieve the ultimate goal. Having concrete milestones is essential in order to measure and demonstrate progress in such a large undertaking.
Get an Executive Sponsor
An MDM implementation is a program that will require many years to execute in most organizations. If you are considering an MDM implementation, you probably already have numerous systems collecting, massaging, and reporting on customer information, and you are probably suffering from the diversity, volume, duplication, and data quality issues that come with them. It took years to create the problem, and it will take some years to straighten it all out.
To have success, get an executive sponsor to back the initiative. Many organizations are project oriented, and even funding can be limited to a project-by-project basis. Long-term initiatives like MDM require a longer vision and sustained support in order to be ultimately successful, and only an executive can provide that backing.
Many organizations operate in a series of silos. The very definition of master data indicates it is a shared, common interest set of information - which is at odds with a silo based organization. Executive sponsorship assists in having stakeholders work together.
Get an Owner
Some organizations do not have the concept of application ownership. A successful MDM implementation benefits from having an application owner whose sole responsibility is to look after the MDM application.
MDM is a foundational application, which means that the business value and function that MDM supplies is directly tied to the clients that use it. MDM, being a foundational application, will be used by many client applications in the organization, and typically each one of those clients will have an agenda for customer information. The application owner can make sure that any decisions related to the MDM application are in the best interests of all of the stakeholders.
Bring Governance to the Party
Establish a data governance body to oversee the master data.
If you are new to master data management, you are probably looking at mastering your Party, or person and organization information. Master data by definition is information which is of interest to a cross section of the enterprise. Interest means more than just having the information, but it also embodies how information is to be sourced and used, which can differ amongst the various stakeholders in the company. Governance gives you a process where you can resolve issues in the definition and use of information to ensure that the master data can service the needs of all the stakeholders in a way acceptable to all.
Governance is important to establish things like:
Definitions of data elements
Identifying sources of information
Identifying remediation processes for problems
Authorizing exceptions and follow-up compliance
Keeping MDM from becoming the dumping ground for Party data because it’s easy
Establish the rules and enforce them
Some organizations practice data stewardship and not data governance. The big difference between the approaches is whether you really need to comply or not. Governance implies rules and enforcement, and this is what master data management really needs.
In many organizations the master data is distributed amongst many different systems, is not centrally managed with a common set of data integrity and quality rules, and is copied and modified in multiple locations. This is the MDM problem you are trying to solve, and after all the hard work of implementing an MDM solution you do not want to end up with the exact same problem on a shiny new technology platform.
Establish rules for:
Where master data comes from
When is it acceptable to deviate from a rule
Provide an exception process with regular review and forced compliance based on defined timelines
Process to add new attributes
Process to control replication and when it's acceptable
Work Together
Many organizations are divided into silos and few initiatives span these silos. Your MDM implementation will span silos. Start early to establish the working relationship between the silos. Identify your stakeholders and get buy-in for the program.
With SOA, You Need Governance
Modern MDM technology solutions are based on a Service Oriented Architecture (SOA) approach and are integrated into your systems using a service bus. MDM, being a foundational application, will provide a set of services that should be designed to meet the needs of the entire organization, and thus requires careful planning and governance to ensure that your services are strategic and you don't end up with JBOWS (Just a Bunch Of Web Services).
A SOA governance function is an important organizational body in an IT organization. Services need to be well defined, have business rules that are well documented and understood, and be able to service the needs of a wide variety of clients. Commercial MDM products can offer hundreds of services out of the box, contributing a significant number of services to the service catalogue.
System and Data Preparation
With the organization ready to embark on your MDM journey, there are preparation tasks that will help keep things moving along and give you a better understanding of the real challenge before you.
Identify your Master Data
Not all information about a customer (or whatever domain your MDM implementation covers) is master data. Master data is information that is of interest to more than a single area of the organization. Master data is the key information that many people are interested in and can benefit from having a common view.
Your master data can be centrally managed and your other related domain information can exist in the systems that own or master that information. The Enterprise Service Bus is the common vehicle used to integrate the different sources of information to provide a consolidated view to the clients in a transparent manner.
Take a first cut at what the master data for the domain you are implementing will be composed of. During the implementation you can refine the definition. By going through the process of the initial cut, you will also get to exercise your Data Governance committee and processes, providing valuable experience for when timetables are tighter due to implementation targets.
Identify your Consumers and Producers
Identify the applications that produce master data and those that consume master data. Document how these applications:
Store their master data: in a database, a file, or something else?
How they update the master data. Do they create a key?
Do they share master data?
Are additions and updates made by other applications sent to the current application to keep it up to date?
What volume of master data exists?
How many add, update and inquiry transactions are executed?
What information is created used or stored?
What are your sources of data enrichment?
How is it shared?
How often?
How are they sent?
How often?
You may be looking at this list and thinking we should already have all this information. The truth is, yes, you should - but in reality, you probably don't have it readily available.
You may also be looking at this information and thinking that this is all project work. These questions, however, are very high level and will serve only as the starting place when the implementation begins. This high-level information will be required to fully understand the scope of your MDM implementation, the number of systems affected, and the kinds of processing that will be impacted.
While executing the program, significant details about each one of these questions will need to be answered in order to solve the technology problems and ensure solutions account for the various categories of use cases.
Embrace the Metadata
Your MDM implementation will require you to gather significant information about your master data and systems that produce and consume it. Make sure that you retain this intellectual property by capturing the information in a metadata repository.
The MDM implementation will require you to capture:
Where master data comes from
Who uses it
What is it used for
What data is used
What rules are associated with production and consumption
Descriptions and definitions of the data
Fit for purpose
And numerous other bits of information. This is a good base of important information that will be useful to you going forward.
Consider the canonical model
The canonical model is a definition of what your master data is to contain, how elements are named and defined and how it is organized. The major MDM products offer a predefined data model that is significant and well thought out. Having your own data model protects you from dependency on a vendor’s definition and provides you an opportunity to establish your own naming standards and rules for the data.
Your customer information, or other domain information you are implementing, will have additional attributes which are not necessarily part of your master data. The canonical model provides the opportunity to create a comprehensive model that can service the larger need of the organization.
The enterprise service bus serves as the vehicle to translate requests for master data information from your canonical model to the technical requirements of your solution, and responses from the solution back to your clients.
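As a hedged illustration (not the data model of any particular MDM product), a first cut at a canonical Party model might look something like the sketch below, with your own naming standards applied; the enterprise service bus would then translate between this shape and each vendor or client format. All element names and the sample values are assumptions for discussion.

```java
import java.time.LocalDate;
import java.util.List;

// Illustrative canonical Party model - element names and structure are
// assumptions for discussion, not a vendor data model.
public class CanonicalPartyModel {

    public record PostalAddress(String line1, String city, String region,
                                String postalCode, String countryCode) {}

    public record ContactMethod(String type, String value) {}   // e.g. "EMAIL", "PHONE"

    public record Party(String partyId,            // enterprise-wide master identifier
                        String partyType,          // "PERSON" or "ORGANIZATION"
                        String legalName,
                        LocalDate effectiveDate,
                        List<PostalAddress> addresses,
                        List<ContactMethod> contactMethods) {}

    public static void main(String[] args) {
        Party party = new Party("P-000123", "PERSON", "Jane Doe",
                LocalDate.of(2020, 1, 15),
                List.of(new PostalAddress("1 Main St", "Toronto", "ON", "M5V 1A1", "CA")),
                List.of(new ContactMethod("EMAIL", "jane.doe@example.com")));
        System.out.println(party);
    }
}
```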
Profile your Data
Profile your data to gain an understanding of the anomalies and problems that will need to be solved when you attempt to create a master store of the information.
As part of your MDM implementation you will want to establish a set of rules designed to address data quality issues. It is important to understand the impact of those rules on your existing data so that, come time for your first production load, you don't end up excluding vast amounts of data because it didn't fit the rules.
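A minimal profiling pass can be scripted long before any tooling decision is made. The sketch below assumes a hypothetical pipe-delimited customer extract (with a header row) passed as the first argument and simply counts nulls and distinct values per column, which is often enough to surface the anomalies your quality rules will trip over.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Minimal column profiler for a pipe-delimited extract: row count,
// null/blank count, and distinct-value count per column.
public class SimpleProfiler {

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Path.of(args[0]));  // hypothetical extract file
        String[] headers = lines.get(0).split("\\|", -1);           // first row assumed to be a header

        Map<String, Integer> nulls = new HashMap<>();
        Map<String, Set<String>> distinct = new HashMap<>();

        for (String line : lines.subList(1, lines.size())) {
            String[] values = line.split("\\|", -1);
            for (int i = 0; i < headers.length; i++) {
                String value = i < values.length ? values[i].trim() : "";
                if (value.isEmpty()) {
                    nulls.merge(headers[i], 1, Integer::sum);
                } else {
                    distinct.computeIfAbsent(headers[i], k -> new HashSet<>()).add(value);
                }
            }
        }

        System.out.println("Rows profiled: " + (lines.size() - 1));
        for (String column : headers) {
            System.out.printf("%-20s nulls=%d distinct=%d%n",
                    column, nulls.getOrDefault(column, 0),
                    distinct.getOrDefault(column, Set.of()).size());
        }
    }
}
```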
Implementation of data governance
In Summary
It may seem like a lot of things need to be done just to prepare for an MDM implementation, but in reality it is in line with the scope of the work to come. No organization embarks on a multiple year business and IT initiative without significant planning and preparation.
The size of the problem to be solved will vary in each organization.
The existence and maturity of organizational processes such as governance will vary in each organization.
The timelines will vary in each organization.
The requirements to be successful are the same in each organization.
#data governance#mdm#master data management#datagovernanceimplementation#implementationofdatagovernance
0 notes
Text
Next-Generation MDM - Architecting Intelligence for the Future
Master Data Management (MDM) can best be described as an organization's core data repository containing the basic information essential to conducting business. The recording and management of an organization's master data can be attributed to processes and technologies developed in the 1990s, although the practice can be traced further back to the days long before computers were invented, when master data was manually maintained and was usually the "contact" information core to an organization's business function.
As advances in technology came about, managing the core business data became a defined process that included people, processes, and technology. What had initially been "contact" information expanded into four general domains: Customers, Products, Locations, and Others. Accurate master data was critical when analyzing transactional and non-operational data. MDM grew to be the link that facilitated data sharing between and within organizations – it gained the name "The Golden Record."
Whitepaper
AI and Knowledge Graphs – For better context of a large data corpus
Read Now
Today, we find ourselves inundated with vast amounts of data and struggle to distill relevant insights from the seas of information. There exist many ways to organize data in all varieties of technologies. The challenge remains to add context to data and turn it into useful information in a common language that can be understood and used across the organization.
The Next Generation of Master Data Management has moved from the "Golden Record" to the "Golden Profile." The comparison below outlines the digital transformation undergone by the MDM ecosystem to emerge as a relevant, contextualized system that delivers the right data to the right consumer at the right time.
The Next Generation of MDM forms the foundation of an organization's Data Fabric. It embodies a knowledge graph that connects the business context to the data and ensures data sharing across business units. By connecting data, organizations can leverage customer 360, KYC, Digital Twin, Digital Thread, and AI/ML opportunities through a trusted, integrated, and connected ecosystem.
Key Design Tenets
The Next Generation of MDM is a design concept and not a list of tools and/or applications. The design tenets of Next-Gen MDM that provide unparalleled flexibility and intelligence include:
Proven and Flexible Data Model
Container-Based Micro-Service Design
Graph DB
Supports CI/CD Pipeline
ML-Enabled (data integration, data quality, matching, auto-merge, ever-greening, event management, data remediation, etc.)
Security and Access Control
Audit Trail
International Languages (including double-byte languages)
Multi-tenant
Data Residency Compliant Readiness
Edge-computing Readiness
Extensibility
Scalability
High availability & DR Capable
These design tenets have been developed to fulfill current state requirements and deliver capabilities that extend into the future too. While no technology is "permanent," extensibility is a crucial factor that sets Next-Gen MDM apart.
The starting point need not be intimidating; however, different activities require different skill sets to move the initiative forward. Shaping the new ecosystem calls for shifting some traditional roles and incorporating some other necessary roles.
Mastech InfoTrellis is a globally recognized leader with over 100 successful implementations of advanced data management solutions. Founded by the original developers of IBM and other MDM product stacks, Mastech InfoTrellis is one of the pioneers in building Next-Gen products, including MDM, C360, and Veriscope.
Embrace the power of data monetization and automated data intelligence and extract maximum value with Mastech InfoTrellis experts charting a measurable path for your Next-Gen MDM integration.
0 notes
Text
Strategic and tactical things to consider when building a Minimum Viable Knowledge Graph
In today’s environment, one does not have to be a particularly large organization to generate a ton of internal data. Interestingly enough, it is in large organizations that Data and IT teams have grown accustomed to structured, taxonomical data architecture and multi-system storage. With multiple systems, data is highly prone to inconsistencies and duplication, especially as it is stored across different applications. It becomes, therefore, ironically harder to get to a trusted 360° view of customers and the business, and doubly challenging to make significant improvements to customer satisfaction since wrong messaging becomes more likely as data volume increases.
Still, gut reaction tells the traditional data engineer to create a single centralized master hub for storing the entire core customer data. Yet, with so many interconnections and nested relationships to account for and scale, there needs to be a paradigm shift in how such connections are linked and mined. And this is why Knowledge Graph’s time is now.
Knowledge Graphs integrate entities from multiple sources with their properties, relationships, and concepts in a network-like structure, i.e., every linkage is meaningful and contextual. This nature of knowledge graphs, therefore, gifts business leaders, analysts, and data scientists with a more holistic view of their business spanning from different levels of suppliers to products, productivity, and to customers simultaneously. Tertiary data, or those that may not be directly linked to customers or the business, can also be accommodated on the graph. This means traceability of insights, full accountability on the “why” of things, as well as the beginning of next-gen scenario simulations into the future. In other words, Knowledge Graphs facilitate enterprise intelligence.
But the construction and maintenance of decently intelligent Knowledge Graphs need to be demystified. Consider the challenges and resourcing needs for the following:
1. Understanding the multiple source systems where the data currently reside
Taking inventory of not just the data but also their multiple source systems is the most important step in data integration towards knowledge graphing. On the one hand, there is the pure mapping out of the underlying structure of each source system and the types of data residing in it – but this is only half the foundation. Business requirements and metrics must be simultaneously gathered and vetted as they are the very determinants of how the data will be used in the first place. The combination of business requirements and source system mapping would then fuel the scope for the Knowledge Graph construction. The sooner this is done, the better, so that the right contextual schema can be written. These are not just tasks to be checked off on a project plan in a vacuum. It is actually a highly collaborative and communicative effort across teams. Naturally, when business needs and the use of data are discussed, Analytics (yes, Analytics) must be discussed. Analytics is the catalyst to insights within the data, while the understanding of the existing data systems ensures implementable business solutions as an outcome. This is why, at Mastech InfoTrellis, we have Analytics Advisors who have multiple years of experience building analytics solutions at scale for Fortune 1000 organizations. They ensure that pitfalls are avoided when designing databases and architecture whose main internal consumers are analytically-minded use cases.
2. Designing the graph ontology – the real source of competitive advantage
What makes one Knowledge Graph smarter than another is how its data is contextualized. This is where ontologies come in. For knowledge graphing in particular, ontologies are sets of vertices and edges that map data attributes to their relevant schema. With vertices representing real-world entities and edges representing relationships between those entities, ontologies instantly inject comprehensive context into a graph, making it easy to access hidden interactions like never before. Even more impressive are the self-learning properties that these ontologies can have (depending on how well versed in AI/ML the analytical resource building the graph is). If done properly, the Knowledge Graph's Entity Relationship schema can go unmanned and be self-patching for a long time, evolving as it draws more and more linkages across data domains.
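As a hedged sketch of what "vertices and edges mapping attributes to a schema" can mean in practice, a first ontology fragment for a retail-style domain might be declared as plain data before any graph database is chosen. The entity names, attributes, and relationship names below are invented purely for illustration.

```java
import java.util.List;

// Tiny ontology fragment: vertex types carry attributes, edge types connect
// vertex types with a named, directional relationship. Names are illustrative.
public class OntologySketch {

    public record VertexType(String name, List<String> attributes) {}

    public record EdgeType(String name, String fromVertex, String toVertex) {}

    public static void main(String[] args) {
        List<VertexType> vertices = List.of(
                new VertexType("Customer", List.of("customerId", "name", "segment")),
                new VertexType("Product",  List.of("sku", "category", "price")),
                new VertexType("Store",    List.of("storeId", "region")));

        List<EdgeType> edges = List.of(
                new EdgeType("PURCHASED",  "Customer", "Product"),
                new EdgeType("SHOPPED_AT", "Customer", "Store"),
                new EdgeType("STOCKED_BY", "Product",  "Store"));

        // The same declarations would later drive schema creation in whichever
        // graph store is selected, and give analysts a shared vocabulary.
        vertices.forEach(v -> System.out.println("Vertex: " + v));
        edges.forEach(e -> System.out.println("Edge:   " + e));
    }
}
```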
Some practitioners wait until the business requirements and scope related to the Knowledge Graph are ready before they think of how they are going to inject context into the graph. However, to gain the most competitive advantage, ontologies should be designed simultaneously. Especially when plenty of domain expertise is required, ontologies can actually make business requirements more succinct, thereby streamlining the data requests for a given business problem. They guide data collection and engineering right down to choosing the correct tech stack and graph software. This, in essence, is why we have Ontology Design and Knowledge Graph Readiness Assessment as key offerings in our Data Science Kiosk. To immediately reap the benefits and ROI of knowledge graphing, an organization must be conscientious of its analytical talent and its analytics engine (which is really data architecture), not just its data management.
3. Data Profiling
Once data floods into the Knowledge Graph, data profiling needs to happen to quickly evaluate the quality and content of the data and to ensure its integrity, accuracy, and completeness. Oftentimes this is a validation/quality-assurance exercise to verify that predefined rules and standards are preserved and to discover anomalies, if any. Data profiling is especially important when the data is gathered from multiple source systems, and we want to make sure that quality and consistency are not compromised during the transfer.
To perform this process effectively, we have developed a data profiling bot known as “Rover,” which is extremely useful in examining as well as collecting statistics or informative summaries about the database/file it is analyzing. Rover plays a crucial role in any data-intensive project to assess the data quality and improve the accuracy of data in corporate databases.
For Knowledge Graph construction, in particular, Rover not only quickly validates what’s in the database but also helps test out the pre-designed ontologies to make sure that they can be flooded with ample data once the graph is built. He makes the stakeholders aware of whether their data would be enough to create a Minimum Viable Graph (MVG) to support their analyses and AI initiatives. Last but not least, Rover also exemplifies the kind of automation that can be built on top of MVGs; this is why he lives virtually in the Data Science Kiosk, which is powered by our AI Accelerators (Ontology Design, Smart Ingestion, Entity.ai, Smart Data Prep Assistants, Feature Miners, and Smart Storytellers).
4. Integrating your Knowledge Graph “insights” back to business as usual
This last point is an important one. For Knowledge Graph insights to be useful (and worth it), they should be easily foldable back into one's current environment. Let's face it – knowledge graphing is not an all-or-none undertaking, but an evolution. That's why the concept of an MVG is important, which is also why the first set of ontologies designed is pivotal. If the first ontologies and MVG do not address the right use cases, it could be a long waiting game for impatient stakeholders. If extracting insights is difficult, if the partial build of a graph actually impedes BAU processes, or (worse) if integration with BAU systems is expensive or impossible, the buy-in may not come easily. What good are such insights if they cannot be acted upon? These are the kinds of considerations our Knowledge Graph Readiness Assessment would expose.
Conclusion
The biggest advantage of building Knowledge Graphs is that they provide a unified view of customer and enterprise data on a global schema that captures the interrelationships between the data items represented across multiple databases. They help you gain important insights about customers and come with numerous applications for multiple business use cases. When done right, Knowledge Graphs are the portal to true Customer360 and 5G-ready hyper-personalization. When done wrong, done inefficiently, or not assigned their proper resources, Knowledge Graphs can become a grand science experiment with not much ROI to show for the effort. Is your organization Knowledge Graph-ready?
For more details - enterprise knowledge graph
0 notes
Text
Cloud Ops Is a Key Enabler of Successful Digital Transformation
This article was originally published in DATAVERSITY on July 27, 2021
My first job out of graduate school was to develop an e-commerce application for a manufacturing behemoth. The idea was to build a web-based application for internal sales teams to place orders on behalf of customers and eliminate paper processing. Don’t judge it by today’s standards. At the time, almost two decades ago, it was a revolutionary idea and a tremendous undertaking. As a new computer science grad, I was excited and took on this task with immense enthusiasm.
The first functional web order center was released a few months later. After that we’d do three to four major releases per year, with many point releases in between to add small features and fix bugs. We soon developed a complex pricing system to automatically calculate thousands of customizable parts and products, and started rolling it out to other countries outside of the U.S. By the time I left the company three and a half years later, the web order center was rolled out to over 30 countries, with over $1 billion going through it every year.
Looking back, that was my first digital transformation experience. At the time, I had no idea that my entire professional career would evolve from it. My 15-year journey with IBM took me around the world, working with many customers across industries to help them transform their businesses one way or another. I have witnessed many successful transformations where companies emerge or stay on as industry leaders, but I have also seen many enterprises struggle to transform and adjust to the new ways of doing business.
A study by Gartner found that half of CEOs expect their industry to be substantially or unrecognizably transformed by digital. While talking to IT leaders around the world, I learned that many of them believe digital transformation is the biggest challenge, but also a once-in-a-lifetime opportunity, of our generation. However, research shows that 70% of all digital transformations fail. No clearly defined best practices, poorly defined tool integrations, limited ability to deploy across platforms, and/or struggles to implement new technologies top the list of reasons.
Disciplined Approach Through Cloud Operations (Cloud Ops)
Cloud is an essential part of any digital transformation strategy. Cloud Operations (Cloud Ops) brings Agile and DevOps (a set of practices that combines software development and IT operations) to the cloud. Bringing the DevOps methodology and traditional IT operations to the cloud-based infrastructure allows team members to collaborate more effectively across the collective hybrid cloud ecosystem.
The most common reason digital transformations fail is due to how the transformation is executed. Setting clear goals, having a consistent approach, and reliable methodologies to achieve those goals are keys to a successful transformation. By applying the reliable and proven Agile and DevOps approach to cloud operations, you have a better chance of improving the success rate of a digital transformation.
Predictable Execution and Operational Excellence
In the world of Cloud Ops or DevOps, automation is your friend. It reduces human error, improves quality, and speeds up processes.
One of the objectives of the DevOps philosophy is continuous operations and zero downtime. The idea is that you can keep updating the software or deploying new features without disrupting the application or services. In other words, the production environment never goes down. Customers won’t ever be affected while you put out new features or fix bugs.
If DevOps is done successfully, organizations can deploy software hundreds or even thousands of times per day. Through rigorous automated testing and deployment pipelines, it’s easy to deploy changes across many functions on a continuous basis, which means features are released to customers more frequently, bugs are fixed more quickly, operations are optimized continuously, and so on.
Companies that can successfully manage these changes consistently with predictable outcomes will come out as winners in the marketplace.
Innovation at a Faster Pace with Cloud Ops
Cloud Ops transforms your teams to have Agile, DevOps, and digital in their DNA and make your digital transformation journey more successful. Incorporation of cloud services can facilitate innovation initiatives. At the heart of modern cloud operation practices is the true integration with open-source capabilities to accelerate the continuous delivery of IT innovation in a hybrid cloud world. If done right, it’ll set your organization way ahead of the competition.
It’s just a matter of time before all companies become IT companies at their core. The ability to innovate faster and to deliver and deploy those innovations to consumers quickly will be the key to staying ahead of the competition and succeeding in this industrial revolution.
Learn more about cloud services here
0 notes
Link
Mastech InfoTrellis Enterprise Intelligence Hub uses enterprise data governance and modular data architecture to power enterprise business intelligence.
0 notes