-->

Section 01 - Introduction

SQL Basics for Data Engineering

SQL

SQL (Structured Query Language) is the standard language for working with relational data: you use it to ask questions of the data, filter results, and transform tables.

In data engineering, SQL is a foundational skill. You will use it to clean data, build transformations, and debug pipelines.
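
As a rough illustration of "ask questions, filter results, and transform tables", the sketch below runs one query against an in-memory SQLite database through Python's standard sqlite3 module. The orders table, its columns, and its rows are made up for this example.

```python
import sqlite3

# In-memory database; the "orders" table and its rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "alice", 120.0, "paid"),
        (2, "bob", 35.5, "cancelled"),
        (3, "alice", 60.0, "paid"),
        (4, "carol", 15.0, "paid"),
    ],
)

# Question: how much has each customer paid?
# WHERE filters rows, GROUP BY + SUM transforms them, ORDER BY sorts the answer.
totals = conn.execute(
    """
    SELECT customer, SUM(amount) AS total_paid
    FROM orders
    WHERE status = 'paid'
    GROUP BY customer
    ORDER BY total_paid DESC
    """
).fetchall()

for customer, total_paid in totals:
    print(customer, total_paid)

conn.close()
```

The cancelled order is excluded by the filter, and the two paid orders for alice are combined into a single row.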

Database

A database is nothing more than a set of related information. For example, a telephone book is a database of the names, phone numbers and addresses of all the people living in a particular region.

Relational Databases (SQL)

  • Structured data with predefined schemas (tables, rows and columns).
  • Relationships enforced via foreign keys between tables.
  • ACID compliance (Atomicity, Consistency, Isolation and Durability).

Examples: PostgreSQL, MySQL, Oracle, SQL Server, Teradata
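
The three properties above can be sketched with SQLite, used here only because it ships with Python; the servers listed above differ in syntax details and defaults. The customers and orders tables are hypothetical. The example declares a schema up front, lets the database reject a row that breaks a foreign key, and shows atomicity: when one statement in a transaction fails, the earlier valid statement is undone too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Predefined schema: column names, types, and constraints are declared first.
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
        amount      REAL NOT NULL
    )
    """
)
conn.execute("INSERT INTO customers VALUES (1, 'alice')")
conn.commit()

# Atomicity: both inserts belong to one transaction. The second one points at
# a customer that does not exist, so the whole transaction is rolled back.
fk_rejected = False
try:
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("INSERT INTO orders VALUES (10, 1, 50.0)")   # valid row
        conn.execute("INSERT INTO orders VALUES (11, 99, 20.0)")  # no customer 99
except sqlite3.IntegrityError:
    fk_rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(fk_rejected, order_count)

conn.close()
```

After the failed transaction the orders table is still empty, even though the first insert was valid on its own.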

Non-Relational Databases (NoSQL)

  • Schema flexibility (add fields on the fly).
  • Horizontal scaling (built for distributed systems).
  • High performance for simple queries (key-value lookups).

Examples: Document (MongoDB), Wide-column (Apache Cassandra), Key-value (Redis, DynamoDB), Graph (Neo4j)

Terminology

Term | Definition
--- | ---
Entity | A real-world object, concept, or thing that needs to be stored and managed. It is a fundamental concept in database design.
Column | An individual piece of data stored in a table.
Row | A set of columns that together completely describe an entity or some action on an entity. Also called a record.
Table | A set of rows, held either in memory or on permanent storage.
Result set | Another name for a non-persistent table, generally the result of an SQL query.
Primary key | One or more columns that can be used as a unique identifier for each row in a table.
Foreign key | One or more columns that can be used together to identify a single row in another table.
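
To make these terms concrete, here is a small sketch using Python's sqlite3 module. The department and employee tables and their contents are invented for illustration. Each table holds rows describing one kind of entity, emp_id and dept_id act as primary keys, employee.dept_id is a foreign key into department, and the output of the final SELECT is a result set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce foreign keys

# Table: a set of rows. Each row here describes one department entity.
conn.execute(
    "CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
# employee.dept_id is a foreign key: it identifies a single row in department.
conn.execute(
    """
    CREATE TABLE employee (
        emp_id    INTEGER PRIMARY KEY,
        full_name TEXT NOT NULL,
        dept_id   INTEGER REFERENCES department (dept_id)
    )
    """
)
conn.executemany(
    "INSERT INTO department VALUES (?, ?)", [(1, "Data"), (2, "Finance")]
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(100, "Ada", 1), (101, "Grace", 1), (102, "Linus", 2)],
)
conn.commit()

# Primary key: a second row with emp_id 100 is rejected as a duplicate.
try:
    conn.execute("INSERT INTO employee VALUES (100, 'Duplicate', 2)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
conn.rollback()

# Result set: a non-persistent table produced by a query.
cursor = conn.execute(
    """
    SELECT e.full_name AS employee, d.name AS department
    FROM employee AS e
    JOIN department AS d ON d.dept_id = e.dept_id
    WHERE d.name = 'Data'
    ORDER BY e.emp_id
    """
)
column_names = [col[0] for col in cursor.description]  # columns of the result set
result_set = cursor.fetchall()                         # rows of the result set
print(column_names, result_set)

conn.close()
```

The result set has its own columns and rows but is not stored anywhere; it disappears once the program stops holding on to it.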

SQL Statement Classes

  1. SQL schema statements (DDL, Data Definition Language) - used to define the data structures stored in a database. CREATE, DROP, ALTER, TRUNCATE
  2. SQL data statements (DML, Data Manipulation Language) - used to manipulate the data held in the structures defined with DDL. INSERT, UPDATE, DELETE. Some vendors (Oracle, for example) also class EXPLAIN PLAN and LOCK TABLE as DML.
  3. SQL query statements (DQL, Data Query Language) - used to retrieve data stored in a database. SELECT
  4. SQL control statements (DCL, Data Control Language) - used to control rights and permissions. GRANT, REVOKE
  5. SQL transaction statements (TCL, Transaction Control Language) - used to manage transactions. COMMIT, ROLLBACK, SAVEPOINT, SET TRANSACTION
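
The sketch below walks through four of these classes in order, again using SQLite through Python's sqlite3 module as an assumed test bed. The account table is hypothetical. SQLite has no users or permissions, so GRANT and REVOKE (DCL) are not shown, and it has no TRUNCATE statement either; a server database such as PostgreSQL or MySQL is needed to try those.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# transaction statements below are issued explicitly rather than by the driver.
conn = sqlite3.connect(":memory:", isolation_level=None)

# DDL: define and change data structures.
conn.execute(
    "CREATE TABLE account (account_id INTEGER PRIMARY KEY, balance REAL NOT NULL)"
)
conn.execute("ALTER TABLE account ADD COLUMN owner TEXT")

# TCL wraps the DML: BEGIN ... COMMIT delimits one transaction.
conn.execute("BEGIN")
# DML: change the data held in the table.
conn.execute(
    "INSERT INTO account (account_id, balance, owner) VALUES (1, 100.0, 'alice')"
)
conn.execute(
    "INSERT INTO account (account_id, balance, owner) VALUES (2, 50.0, 'bob')"
)
conn.execute("SAVEPOINT before_update")
conn.execute("UPDATE account SET balance = balance - 500 WHERE account_id = 1")
# TCL: undo only the work done since the savepoint (the UPDATE).
conn.execute("ROLLBACK TO SAVEPOINT before_update")
conn.execute("DELETE FROM account WHERE account_id = 2")
conn.execute("COMMIT")

# DQL: read the data back.
rows = conn.execute(
    "SELECT account_id, balance, owner FROM account ORDER BY account_id"
).fetchall()
print(rows)

# DDL again: remove the structure entirely.
conn.execute("DROP TABLE account")
remaining_tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

conn.close()
```

Only the row for account 1 survives, with its original balance: the UPDATE was undone by the rollback to the savepoint, and the DELETE removed account 2 before the commit.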