There are two kinds of data processing systems:
- Transaction Systems (OLTP)
- Analytical Systems (OLAP)
Online Transaction Processing (OLTP) captures transaction data from sources such as banking activity and retail checkouts and stores it in a database.
Online Analytical Processing (OLAP) applies complex queries to large amounts of historical data, aggregated from OLTP databases and other sources, for data mining, analytics, and business intelligence projects.
OLTP | OLAP
Handles the day-to-day transactions that result from enterprise operations | Analyzes information in the database to support management decisions
Small, discrete units of work | Big-picture view of the information held in the database
Often high volume | Volume can be high or low depending on the requirement
Data is processed very quickly | Processing takes seconds, minutes, or hours depending on the amount of data
Controls and runs essential business operations in real time | Used to plan, solve problems, support decisions, and discover hidden insights
Normalized databases for efficiency | Denormalized databases for analysis
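The contrast in the table can be sketched with Python's built-in sqlite3 module. The `sales` table, its columns, and the sample values are hypothetical, and a real deployment would normally run the analytical query against a separate warehouse rather than the transactional database. Both run against one in-memory database here only to keep the sketch short.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, store TEXT, amount REAL)"
)


def record_sale(store, amount):
    """OLTP-style work: one small, discrete write inside a transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO sales (store, amount) VALUES (?, ?)", (store, amount)
        )


def sales_by_store():
    """OLAP-style work: an aggregate query over the accumulated history."""
    rows = conn.execute(
        "SELECT store, COUNT(*), SUM(amount) FROM sales "
        "GROUP BY store ORDER BY store"
    )
    return rows.fetchall()


for store, amount in [("north", 10.0), ("north", 25.5), ("south", 7.25)]:
    record_sale(store, amount)

report = sales_by_store()
print(report)  # [('north', 2, 35.5), ('south', 1, 7.25)]
```

Each call to `record_sale` touches a single row, while `sales_by_store` scans every row to produce a summary. That difference in access pattern is the main reason the two workloads are usually served by differently designed databases.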
DATA INGESTION
- Data ingestion is the process of obtaining data from multiple sources and importing it into one centralised location.
- This data can arrive as a continuous stream or in batches.
- Raw data can be stored in a DBMS, as files, or in other forms of fast, easily accessible storage.
- Data ingestion might perform:
  - Filtering: e.g., rejecting suspicious, corrupt, or duplicated data
  - Simple transformation: converting data into a standard form
NOTE: During ingestion we can make only simple transformations, not complex or critical ones.
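The filtering and simple transformation steps described above might look like the following minimal sketch. The record layout, the field names, and the two accepted date formats are assumptions made for illustration and do not come from any particular ingestion tool.

```python
from datetime import datetime

# Hypothetical raw records from different sources; field names are assumptions.
raw_records = [
    {"id": "1", "amount": "19.99", "date": "2024-01-05"},
    {"id": "2", "amount": "abc", "date": "2024-01-06"},    # corrupt amount
    {"id": "1", "amount": "19.99", "date": "2024-01-05"},  # duplicate of id 1
    {"id": "3", "amount": "5.00", "date": "06/01/2024"},   # day/month/year
]


def parse_date(text):
    """Simple transformation: convert known date formats to ISO 8601."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def ingest(records):
    """Filter out corrupt or duplicated records and standardise the rest."""
    seen_ids = set()
    accepted, rejected = [], []
    for record in records:
        try:
            clean = {
                "id": int(record["id"]),
                "amount": float(record["amount"]),
                "date": parse_date(record["date"]),
            }
        except (KeyError, ValueError):
            rejected.append(record)  # corrupt or incomplete record
            continue
        if clean["id"] in seen_ids:
            rejected.append(record)  # duplicate record
            continue
        seen_ids.add(clean["id"])
        accepted.append(clean)
    return accepted, rejected


accepted, rejected = ingest(raw_records)
print(len(accepted), "accepted,", len(rejected), "rejected")
```

Rejected records are kept in a separate list rather than silently dropped, so they can be inspected later. Nothing here joins or aggregates data, which matches the note that ingestion is limited to simple transformations.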
DATA PROCESSING
- Data processing takes data in raw form, cleans it and converts it into a more meaningful format.
- The result is a database or data warehouse that you can query and use to generate visualisations.
- Data processing might perform:
  - Data cleaning: removing anomalies, applying filters and transformations
  - Data wrangling: capturing, filtering, cleaning, combining, and aggregating data
NOTE: During processing we can make more complex transformations. E.g., Azure Synapse Analytics is used to store cleaned and transformed data.
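As a rough illustration of cleaning and wrangling, the sketch below removes an anomalous value, combines two datasets, and aggregates the result. The datasets, the field names, and the "ten times the median" anomaly rule are all assumptions chosen for the example. Real pipelines would use rules suited to their own data.

```python
from collections import defaultdict
from statistics import median

# Hypothetical ingested data; names and values are illustrative only.
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 22.0},
    {"customer_id": 2, "amount": 18.0},
    {"customer_id": 2, "amount": 5000.0},  # anomaly
    {"customer_id": 3, "amount": 21.0},
]
customer_regions = {1: "EU", 2: "US", 3: "US"}


def remove_anomalies(rows, factor=10):
    """Data cleaning: drop amounts more than `factor` times the median."""
    limit = factor * median(row["amount"] for row in rows)
    return [row for row in rows if row["amount"] <= limit]


def total_by_region(rows, regions):
    """Data wrangling: combine orders with regions, then aggregate."""
    totals = defaultdict(float)
    for row in rows:
        totals[regions[row["customer_id"]]] += row["amount"]
    return dict(totals)


cleaned = remove_anomalies(orders)
summary = total_by_region(cleaned, customer_regions)
print(summary)  # {'EU': 42.0, 'US': 39.0}
```

The `summary` dictionary plays the role of the processed output that would be loaded into a database or data warehouse for querying and visualisation.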
DATA EXPLORATION
- Data exploration is the process of piecing together the puzzle to find the message in the data, working in feedback cycles of defining hypotheses, analyzing data, and visualizing results.
- Exploration is a deep-dive analysis of the data in search of better insights.
- The explored data can be visualized using tools such as Tableau and Power BI.