#datafundamentals
matzdeveloper · 2 years ago
Azure Data Fundamentals - Part1
Data is generated everywhere by different systems, applications and devices, in multiple structures and formats.
Data is a valuable asset that provides useful information and supports critical business decisions when analyzed properly.
Capturing, storing and analyzing data has become a primary requirement for every company across the globe.
Finding out Different Data Formats
Data is organized in a structure that represents entities, and each entity has one or more attributes or characteristics.
Data can be classified into different formats
-Structured
-Semi-Structured
-Unstructured
-Structured This data follows a fixed schema, so every record has the same fields and properties. The schema is organized in a tabular format of rows and columns. Each row represents one instance of a data entity and each column represents an attribute of that entity.
-Semi-Structured This data has some structure but allows variation between entity instances. One example of semi-structured data is JSON (JavaScript Object Notation).
-Unstructured This data has no specific structure. Examples include documents, images, audio, video and binary files.
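The three formats above can be contrasted in a few lines of Python. This is only a sketch: the customer records, the field names and the `field_names` helper are made up for illustration and do not come from any Azure service.

```python
import json

# Structured: every row has the same fixed set of columns (a fixed schema).
columns = ("id", "name", "city")
rows = [
    (1, "Asha", "Pune"),
    (2, "Ben", "Leeds"),
]

# Semi-structured: JSON documents may vary between entity instances.
customers_json = """
[
  {"id": 1, "name": "Asha", "contacts": [{"type": "email", "value": "asha@example.com"}]},
  {"id": 2, "name": "Ben", "city": "Leeds"}
]
"""
customers = json.loads(customers_json)

# Unstructured: raw bytes with no schema that a data store can interpret.
image_bytes = bytes([0x89, 0x50, 0x4E, 0x47])  # first four bytes of a PNG file


def field_names(records):
    """Return the sorted field names used by each record."""
    return [sorted(record.keys()) for record in records]


# Every structured row matches the schema; the JSON records do not share one.
print(all(len(row) == len(columns) for row in rows))
print(field_names(customers))
```

The two JSON records describe the same kind of entity but carry different fields, which is exactly the variation a fixed tabular schema would not allow.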
Various options to store data in files
Two broad categories of data store are in common use: file stores and databases.
-File store Data is stored as files on a hard disk, on removable media such as USB drives, or in a central shared location in the cloud.
The file format used to store data depends on a number of factors, including
-The type of data being stored
-The applications that will need to read, write and process the data
-Whether the data files must be readable by humans or optimized for efficient storage and processing
Common File Formats
-Delimited text files Data is separated with field delimiters and row terminators. The most commonly used format is CSV (comma-separated values).
-JSON Data is represented in a hierarchical document schema that is used to define objects with multiple attributes.
-XML Data is represented using tags enclosed in angle brackets to define elements and attributes.
-BLOB (Binary Large Object) Data is stored in binary format (0s and 1s). Common types of data stored as binary include images, audio, video and application-specific documents.
-Optimized file formats Some specialized file formats have been developed that enable compression, indexing and efficient storage and processing.
Common optimized file formats include Avro, ORC (Optimized Row Columnar) and Parquet.
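To make the text-based formats above concrete, here is a minimal sketch that writes the same record as CSV, JSON and XML using only the Python standard library. The `products` record and its field names are hypothetical. Avro, ORC and Parquet are left out because they need third-party libraries.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample record used only for illustration.
products = [{"id": "1", "name": "Widget", "price": "2.50"}]

# Delimited text (CSV): a comma is the field delimiter, "\r\n" the row terminator.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "name", "price"])
writer.writeheader()
writer.writerows(products)
csv_text = buffer.getvalue()

# JSON: a hierarchical document of objects with named attributes.
json_text = json.dumps(products)

# XML: elements and attributes written as tags in angle brackets.
root = ET.Element("products")
item = ET.SubElement(root, "product", id="1")
ET.SubElement(item, "name").text = "Widget"
ET.SubElement(item, "price").text = "2.50"
xml_text = ET.tostring(root, encoding="unicode")

print(csv_text)
print(json_text)
print(xml_text)
```

All three outputs are human readable, which is the trade-off mentioned earlier: easy to inspect, but less compact than a binary or optimized format.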
Various options to store data in databases
Data is stored in two kinds of database
-Relational Database
-Non-Relational Database
-Relational Database This is used to store and query structured data. Data is stored in tables that represent entities. Each instance of an entity is assigned a primary key that uniquely identifies it, and these keys are used to reference the entity instance in other tables. Tables are managed and queried using SQL, which is based on an ANSI standard.
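A small sketch of primary keys and cross-table references, using Python's built-in `sqlite3` module as a stand-in for any relational database. The `customer` and `orders` tables and their columns are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces references when asked

# Each table represents an entity; the primary key identifies each instance.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id),"
    " total REAL NOT NULL)"
)

conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Asha')")
conn.executemany(
    "INSERT INTO orders (id, customer_id, total) VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 15.0)],
)

# The customer's key stored in orders.customer_id links the two tables.
result = conn.execute(
    "SELECT c.name, COUNT(o.id), SUM(o.total)"
    " FROM customer AS c JOIN orders AS o ON o.customer_id = c.id"
    " GROUP BY c.id"
).fetchall()
print(result)
```

The join works only because each order carries the primary key of the customer it belongs to.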
-Non-Relational Database This is often referred to as a NoSQL database. Four types of non-relational database are in common use
Key-Value Database - Each record consists of a unique key and an associated value
Document Database - A specific form of key-value database in which the value is a JSON document
Column Family Database - Stores tabular data in rows and columns, where the columns can be divided into groups known as column families
Graph Database - Stores entities as nodes, with links that define the relationships between them
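The four non-relational models above can be imitated with plain Python structures. This is a toy sketch of the data shapes only, not of how any real NoSQL engine stores data. All keys, names and the `neighbours` helper are hypothetical.

```python
import json

# Key-value: a unique key maps to a value the store treats as opaque.
key_value_store = {"session:42": b"\x01\x02\x03"}

# Document: still key -> value, but the value is a JSON document that can be parsed.
document_store = {
    "customer:1": json.dumps({"name": "Asha", "orders": [{"id": 10, "total": 25.0}]}),
}

# Column family: each row key holds named groups (families) of columns.
column_family_store = {
    "customer:1": {
        "identity": {"name": "Asha"},
        "contact": {"email": "asha@example.com"},
    },
}

# Graph: entities are nodes; links (edges) define relationships between them.
nodes = {
    "asha": {"type": "person"},
    "ben": {"type": "person"},
    "sales": {"type": "department"},
}
edges = [
    ("asha", "works_in", "sales"),
    ("ben", "works_in", "sales"),
    ("ben", "reports_to", "asha"),
]


def neighbours(node, relation):
    """Return the targets linked from `node` by edges labelled `relation`."""
    return [target for source, label, target in edges if source == node and label == relation]


order_total = json.loads(document_store["customer:1"])["orders"][0]["total"]
print(order_total)
print(neighbours("ben", "reports_to"))
```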
Understand Transactional data processing solutions
A transactional system records transactions that encapsulate specific events the organization wants to track. Transactional systems are often high volume, sometimes handling millions of transactions every day. The work they perform is often referred to as Online Transactional Processing (OLTP). OLTP systems support so-called ACID semantics
-Atomicity Each transaction is treated as a single unit, which either succeeds completely or fails completely.
-Consistency A transaction can only take the data in the database from one valid state to another.
-Isolation Concurrent transactions cannot interfere with one another.
-Durability Once a transaction has been committed, it remains committed.
OLTP systems are often used to support line-of-business (LOB) applications.
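Atomicity is the easiest ACID property to demonstrate. The sketch below uses Python's built-in `sqlite3` module as a stand-in for an OLTP database. The `account` table, the `transfer` and `balances` helpers and the amounts are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move `amount` between accounts as one transaction; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, target)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, source)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return (id, balance) pairs ordered by account id."""
    return conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()


ok = transfer(conn, 1, 2, 30)       # both updates succeed and are committed together
failed = transfer(conn, 1, 2, 500)  # the debit breaks the CHECK, so the credit is undone too
print(ok, failed, balances(conn))
```

In the failed transfer the credit to account 2 has already run when the debit is rejected, yet the rollback leaves both balances as they were after the first transfer. The transaction either happens completely or not at all.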
Understand Analytical data processing solutions
Analytics can be based on a snapshot of the data at a given point in time, or on a series of snapshots. Analytical processing typically uses read-only (or read-mostly) systems that store vast volumes of historical data.
A typical analytical solution looks like this
1. Data files are stored in a central data lake for analysis.
2. An ETL (extract, transform, load) process copies data from files and OLTP databases into a data warehouse.
3. Data in the data warehouse is aggregated into an OLAP (online analytical processing) model.
4. Data in the data lake, data warehouse and OLAP model can be queried to produce reports, visualizations and dashboards.
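The four stages described above can be sketched end to end on a tiny scale. Here an in-memory string stands in for a file in the data lake, an in-memory SQLite database stands in for the data warehouse, and a summary table stands in for the OLAP model. All names and figures are hypothetical, and a real solution would use dedicated services for each stage.

```python
import csv
import io
import sqlite3

# Stage 1: a raw file as it might land in a data lake (hypothetical sample data).
raw_sales_csv = "region,amount\nnorth,10\nsouth,5\nnorth,7\n"

# Stage 2: ETL. Extract rows from the file, transform the types, load a warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (region TEXT NOT NULL, amount INTEGER NOT NULL)")
extracted = csv.DictReader(io.StringIO(raw_sales_csv))
transformed = [(row["region"].upper(), int(row["amount"])) for row in extracted]
warehouse.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", transformed)

# Stage 3: pre-aggregate the warehouse data, standing in for an OLAP model.
warehouse.execute(
    "CREATE TABLE sales_by_region AS"
    " SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
)

# Stage 4: query the aggregate to feed a report or dashboard.
report = warehouse.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
print(report)
```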
Different types of user might perform data analytics work at different stages
-Data scientists might work directly with files in a data lake to explore and model data.
-Data analysts might query tables directly to produce reports and visualizations.
-Business users might consume aggregated data in the form of reports and dashboards.
Keep Learning! Keep Enjoying!