Author: Nejat Hakan (nejat.hakan@outlook.de)
PayPal Me: https://paypal.me/nejathakan
Database Basics
Introduction What is a Database
Welcome to the world of databases! In today's digital age, data is everywhere. From social media posts and online shopping transactions to scientific research and business analytics, vast amounts of information are constantly being generated, processed, and stored. But how do we manage this deluge of data effectively? How do we ensure it's organized, accessible, secure, and consistent? The answer lies in databases and the systems designed to manage them.
At its core, a database is simply an organized collection of structured information, or data, typically stored electronically in a computer system. Think of it as a highly efficient digital filing cabinet. Instead of rummaging through physical folders, you use specialized software to store, retrieve, update, and manage the data.
The software used to interact with databases – to create them, manage them, query them, and control access – is called a Database Management System (DBMS). When we talk about databases like PostgreSQL, MySQL, MongoDB, or SQLite, we are often referring to the DBMS, although the term "database" is frequently used to encompass both the data itself and the system managing it.
Why are Databases Necessary?
Imagine trying to manage customer information for a large online store using simple text files or spreadsheets.
- Scalability: As the number of customers and orders grows, searching, sorting, and updating information in flat files becomes incredibly slow and inefficient. Databases are designed to handle massive amounts of data and scale effectively.
- Data Integrity: How do you ensure that every customer has a unique ID, or that an order always refers to an existing product? Databases enforce rules (constraints) to maintain the accuracy, consistency, and reliability of data.
- Concurrency: Multiple users or applications often need to access and modify data simultaneously (e.g., multiple customers placing orders at the same time). A DBMS handles concurrent access, preventing conflicts and ensuring data remains consistent. Simple file systems typically lock the entire file, which is highly inefficient.
- Data Security: Databases provide mechanisms to control who can access what data and what operations they can perform (e.g., read-only access for analysts, full access for administrators).
- Data Retrieval: Databases offer powerful query languages (like SQL) that allow you to retrieve specific subsets of data based on complex criteria far more efficiently than searching through unstructured files.
- Backup and Recovery: A DBMS provides tools and mechanisms for backing up data regularly and recovering it in case of hardware failure, software bugs, or human error.
A Brief Evolution
Early data management involved primitive file systems. Then came hierarchical and network models in the 1960s and 70s, which organized data in tree-like or graph-like structures. While an improvement, they were often complex and inflexible.
The major breakthrough came in the 1970s with Dr. E.F. Codd's invention of the relational model, which organizes data into simple tables (relations) with rows and columns. This model, combined with the development of Structured Query Language (SQL), revolutionized data management and remains the dominant paradigm today for many applications.
More recently, the rise of big data, the internet, and highly distributed applications led to the development of NoSQL (Not Only SQL) databases. These offer different data models (key-value, document, column-family, graph) designed to handle specific challenges like massive scale, high availability, and unstructured or semi-structured data, sometimes sacrificing some of the strict consistency guarantees of relational databases.
Types of Databases (Brief Overview)
- Relational Databases (SQL): Organize data in tables with predefined schemas. Examples: PostgreSQL, MySQL, MariaDB, SQLite, Oracle Database, SQL Server. They use SQL for data definition and manipulation.
- NoSQL Databases: A broad category encompassing various non-relational models.
    - Key-Value Stores: Simple pairs of keys and values. Examples: Redis, Memcached.
    - Document Databases: Store data in flexible, JSON-like documents. Examples: MongoDB, Couchbase.
    - Column-Family Stores: Store data in columns rather than rows, optimized for large datasets and aggregations. Examples: Cassandra, HBase.
    - Graph Databases: Focus on relationships between data points (nodes and edges). Examples: Neo4j, ArangoDB.
In this section, we will delve deeper into these concepts, starting with the foundational relational model and SQL, and then exploring the diverse world of NoSQL. We will focus on how these databases operate within the Linux environment, providing practical skills through hands-on workshops.
1. Relational Databases and SQL
The relational model, introduced by E.F. Codd, is the foundation for the most widely used type of database system. Its elegance and power stem from its simple, intuitive structure based on mathematical set theory and relational algebra. In a relational database, data is organized into tables, which are formally called relations.
Core Concepts of the Relational Model
- Tables (Relations): A table represents a collection of related data entries organized in a two-dimensional grid of columns and rows. Each table typically represents a specific type of entity, like "Customers", "Products", or "Orders".
    - Example: A `Students` table might store information about university students.
- Rows (Tuples or Records): Each row in a table represents a single instance or record of the entity type the table describes.
    - Example: In the `Students` table, each row would represent one specific student.
- Columns (Attributes or Fields): Each column represents a specific characteristic or attribute of the entity type. All entries in a given column are of the same data type (e.g., integer, text, date).
    - Example: The `Students` table might have columns like `student_id` (integer), `first_name` (text), `last_name` (text), `date_of_birth` (date), `major` (text).
- Schema: The schema defines the structure of the database. It specifies the tables, the columns within each table, the data types for each column, and the relationships between tables (using keys and constraints). Defining the schema is a critical first step in designing a relational database.
- Keys: Keys are special columns (or sets of columns) used to uniquely identify rows and establish relationships between tables.
    - Primary Key (PK): A column (or combination of columns) whose value uniquely identifies each row in a table. Primary keys cannot contain NULL values and must be unique. Every table in a well-designed relational database should have a primary key.
        - Example: `student_id` in the `Students` table would likely be the primary key.
    - Foreign Key (FK): A column (or combination of columns) in one table that refers to the primary key of another table. Foreign keys are used to link related tables and enforce referential integrity (ensuring that a row in one table cannot reference a non-existent row in another table).
        - Example: An `Enrollments` table might have a `student_id` column that is a foreign key referencing the `student_id` primary key in the `Students` table. This ensures that every enrollment record corresponds to a valid student.
- Data Types: Each column must be assigned a specific data type, which dictates the kind of data it can store (e.g., `INTEGER`, `VARCHAR(255)` for variable-length text up to 255 characters, `TEXT` for longer text, `DATE`, `TIMESTAMP`, `BOOLEAN`, `NUMERIC(10, 2)` for numbers with fixed precision). Using appropriate data types helps ensure data integrity and optimize storage and performance.
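These concepts map directly onto SQL. As a minimal, illustrative sketch (PostgreSQL syntax; the columns simply echo the examples above, not a finished design):

```sql
-- A table (relation) whose columns (attributes) each have a data type,
-- with student_id serving as the primary key.
CREATE TABLE Students (
    student_id    INTEGER PRIMARY KEY,  -- uniquely identifies each row
    first_name    VARCHAR(100),
    last_name     VARCHAR(100),
    date_of_birth DATE,
    major         TEXT
);
```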
Structured Query Language (SQL)
SQL is the standard language used to communicate with relational databases. It's a declarative language, meaning you specify what data you want, and the DBMS figures out how to retrieve or manipulate it. SQL commands are generally categorized into several sub-languages:
- Data Definition Language (DDL): Used to define and modify the database structure (the schema).
    - `CREATE`: Creates database objects (e.g., `CREATE DATABASE`, `CREATE TABLE`).
    - `ALTER`: Modifies existing database objects (e.g., `ALTER TABLE ... ADD COLUMN`).
    - `DROP`: Deletes database objects (e.g., `DROP TABLE`).
- Data Manipulation Language (DML): Used to insert, retrieve, update, and delete data within the tables.
    - `SELECT`: Retrieves data from one or more tables. This is the most frequently used SQL command.
    - `INSERT`: Adds new rows of data into a table.
    - `UPDATE`: Modifies existing data in a table.
    - `DELETE`: Removes rows of data from a table.
- Data Control Language (DCL): Used to manage user permissions and access control.
    - `GRANT`: Gives specific permissions (e.g., `SELECT`, `INSERT`) on database objects to users or roles.
    - `REVOKE`: Removes previously granted permissions.
- Transaction Control Language (TCL): Used to manage transactions, ensuring data consistency.
    - `BEGIN TRANSACTION` (or `START TRANSACTION`): Marks the beginning of a transaction.
    - `COMMIT`: Saves all changes made during the current transaction permanently.
    - `ROLLBACK`: Discards all changes made during the current transaction, restoring the database to its state before the transaction began.
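The four sub-languages are easiest to tell apart side by side. A hedged sketch (the table name `demo` and the role `some_user` are invented for illustration):

```sql
-- DDL: define structure
CREATE TABLE demo (id INTEGER PRIMARY KEY, name TEXT);
ALTER TABLE demo ADD COLUMN created DATE;

-- DML: work with the data itself
INSERT INTO demo (id, name) VALUES (1, 'example');
SELECT id, name FROM demo;

-- DCL: control access (assumes a role named some_user already exists)
GRANT SELECT ON demo TO some_user;

-- TCL: group changes into an all-or-nothing unit
BEGIN;
UPDATE demo SET name = 'changed' WHERE id = 1;
COMMIT;
```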
ACID Properties
Relational databases are renowned for adhering to the ACID properties, which guarantee the reliability of database transactions:
- Atomicity: Ensures that each transaction is treated as a single, indivisible unit. Either all operations within the transaction are completed successfully, or none of them are. If any part of the transaction fails, the entire transaction is rolled back. This prevents partial updates that could leave the database in an inconsistent state.
    - Example: Transferring money involves debiting one account and crediting another. Atomicity ensures that both actions happen or neither happens; you never end up with money deducted but not credited.
- Consistency: Guarantees that a transaction brings the database from one valid state to another. Any data written to the database must be valid according to all defined rules, including constraints, cascades, and triggers. If a transaction would violate these rules, it is rolled back.
    - Example: If a column has a `UNIQUE` constraint, a transaction attempting to insert a duplicate value will fail, maintaining consistency.
- Isolation: Ensures that concurrent transactions execute independently without interfering with each other. The intermediate state of one transaction should not be visible to other transactions until it is committed. This prevents issues like dirty reads (reading uncommitted data) or lost updates. Databases achieve this through locking mechanisms or multi-version concurrency control (MVCC).
    - Example: If two users try to book the last available seat on a flight simultaneously, isolation ensures that only one transaction succeeds, while the other fails or waits.
- Durability: Guarantees that once a transaction has been committed, its changes are permanent and will survive subsequent system failures (like power outages or crashes). This is typically achieved by writing changes to transaction logs stored on non-volatile storage before acknowledging the commit.
    - Example: Once you receive confirmation that your online order is placed (transaction committed), you can be confident the order data is saved permanently, even if the server crashes immediately afterward.
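The money-transfer example can be sketched in SQL. This is an illustration only; the `accounts` table and its columns are invented for the example:

```sql
-- Assumes: accounts(account_id INTEGER PRIMARY KEY, balance NUMERIC(10, 2))
BEGIN;

UPDATE accounts SET balance = balance - 100.00 WHERE account_id = 1;  -- debit
UPDATE accounts SET balance = balance + 100.00 WHERE account_id = 2;  -- credit

-- If both updates succeeded, make them permanent:
COMMIT;
-- Had anything gone wrong, issuing ROLLBACK instead would undo both
-- updates, so money is never deducted without being credited.
```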
Popular Relational Databases on Linux
Linux is a very popular platform for hosting relational databases. Some common choices include:
- PostgreSQL: A powerful, open-source object-relational database system known for its standards compliance, robustness, extensibility, and strong feature set. It's often favored for complex applications requiring high data integrity.
- MySQL: The world's most popular open-source relational database, known for its ease of use, speed, and large community support. It's widely used for web applications (often part of the LAMP stack - Linux, Apache, MySQL, PHP/Python/Perl).
- MariaDB: A community-developed fork of MySQL, created by the original developers of MySQL. It aims to maintain high compatibility with MySQL while adding new features and improvements, and remaining fully open-source.
- SQLite: An embedded relational database library. Instead of a server process, SQLite stores the entire database in a single file on the host machine. It's incredibly lightweight and commonly used in applications (like web browsers, mobile apps) that need a local, self-contained database without the overhead of a full client-server system.
Understanding the relational model and SQL is fundamental for anyone working with structured data, regardless of the specific RDBMS used.
Workshop Installing and Connecting to PostgreSQL
This workshop guides you through installing the PostgreSQL database server on a Linux system, performing initial configuration, and connecting to it using the `psql` command-line client. We will use PostgreSQL because of its robustness and SQL standards compliance, making it an excellent choice for learning.
Prerequisites:
- A Linux system (e.g., Ubuntu 20.04/22.04, Debian 11/12, Fedora 36+, CentOS/RHEL 8+).
- Access to the terminal with `sudo` privileges.
- Internet connection to download packages.
Steps:
- Update Package Lists: Always start by ensuring your system's package list is up-to-date.
    - On Debian/Ubuntu: `sudo apt update`
    - On Fedora/CentOS/RHEL: `sudo dnf check-update`
- Install PostgreSQL Server and Client: Install the main PostgreSQL server package and the client utilities, which include `psql`.
    - On Debian/Ubuntu: install the `postgresql` and `postgresql-contrib` packages (for example, `sudo apt install postgresql postgresql-contrib -y`). The `postgresql-contrib` package includes useful extensions and utilities.
    - On Fedora: Fedora often uses modules; you might need to list available streams and enable one first, or simply install the default version.

        ```bash
        # Check available versions (optional)
        # sudo dnf module list postgresql

        # Install default version
        sudo dnf install postgresql-server postgresql-contrib -y

        # Initialize the database cluster (REQUIRED on Fedora/RHEL)
        sudo postgresql-setup --initdb

        # Enable and start the service (REQUIRED on Fedora/RHEL)
        sudo systemctl enable postgresql
        sudo systemctl start postgresql
        ```

    - On CentOS/RHEL: PostgreSQL might be available via AppStream or require enabling the official PostgreSQL repository. Using the official repo is often recommended for newer versions.
        - Option A (Using AppStream - may be older version):

            ```bash
            # Check available versions (optional)
            # sudo dnf module list postgresql

            # Enable a specific stream (e.g., 13)
            # sudo dnf module enable postgresql:13 -y

            sudo dnf install postgresql-server postgresql-contrib -y
            sudo postgresql-setup --initdb
            sudo systemctl enable postgresql
            sudo systemctl start postgresql
            ```

        - Option B (Using Official PostgreSQL Repo - Recommended for latest versions): Visit https://www.postgresql.org/download/linux/redhat/ for instructions on adding the repository, then install using `dnf`.
- Verify Installation and Service Status: Check if the PostgreSQL service is running: `sudo systemctl status postgresql`. You should see output indicating the service is "active (running)". If not, try starting it: `sudo systemctl start postgresql`. On Debian/Ubuntu, the service usually starts automatically after installation.
- Understand PostgreSQL User Authentication: By default, PostgreSQL uses an authentication method called `peer` authentication for local connections. This means it checks the operating system username of the client connecting via a Unix domain socket and allows access if it matches a valid PostgreSQL username. During installation, a PostgreSQL superuser named `postgres` is created, associated with the system's `postgres` user.
- Connect as the `postgres` Superuser: To perform administrative tasks, you need to connect as the `postgres` user. Use `sudo` to switch to the system `postgres` user and then run `psql`: `sudo -u postgres psql`. You should now see the `psql` prompt, which looks like `postgres=#`. This indicates you are connected to the default database (also named `postgres`) as the `postgres` user.
    - To exit `psql`, type `\q` and press Enter.
- Set a Password for the `postgres` User (Recommended): While connected as `postgres` in `psql`, set a strong password. This is crucial if you plan to allow password-based connections later (e.g., over the network). Run `ALTER ROLE postgres WITH PASSWORD 'YourStrongPassword';`, replacing `'YourStrongPassword'` with a secure password. You should see `ALTER ROLE` as confirmation.
- Create a New Database User: Let's create a dedicated user for your own projects instead of always using the superuser. We'll create a user named `university_admin`. While still in `psql` as the `postgres` user, run `CREATE USER university_admin WITH PASSWORD 'AnotherSecurePassword';`, replacing `'AnotherSecurePassword'` with a strong password. You should see `CREATE ROLE`.
- Create a New Database: Let's create a database named `university_db` and make `university_admin` the owner: `CREATE DATABASE university_db OWNER university_admin;`. You should see `CREATE DATABASE`.
- Grant Privileges (Optional but Good Practice): Although `university_admin` owns the database, explicit grants are often clearer. Let's grant all privileges on the new database to the new user: `GRANT ALL PRIVILEGES ON DATABASE university_db TO university_admin;`. You should see `GRANT`.
    - Now, exit `psql`: `\q`
- Connect as the New User: You can now connect directly to the new database using the user you created: `psql -U university_admin -d university_db -h localhost -W`
    - `-U university_admin`: Specifies the PostgreSQL username.
    - `-d university_db`: Specifies the database name to connect to.
    - `-h localhost`: Specifies the host (use `localhost` for local connections). If you omit `-h`, `psql` might try `peer` authentication via sockets first. Using `-h localhost` often forces password authentication (if configured).
    - `-W`: Prompts for the password you set for `university_admin`.

    Enter the password (`AnotherSecurePassword`) when prompted. Your prompt should now look like `university_db=>`, indicating you are connected to `university_db` as `university_admin`.
. -
Explore
psql
Commands: Insidepsql
, you can use backslash commands:\l
: List all databases.\du
: List all users (roles).\dt
: List tables in the current database (will be empty now).\d <table_name>
: Describe a specific table (once you create one).\conninfo
: Show information about the current connection.\q
: Quitpsql
.
Congratulations! You have successfully installed PostgreSQL, created a dedicated user and database, and learned how to connect using `psql`. You are now ready to start designing and interacting with your relational database on Linux. Remember to exit `psql` (`\q`) when you are finished.
2. Data Modeling Fundamentals
Before you start creating tables and inserting data, it's crucial to design the structure of your database. This process is called data modeling. A well-designed data model ensures data integrity, reduces redundancy, improves query performance, and makes the database easier to understand and maintain. The goal is to accurately represent the real-world entities and their relationships within the constraints of the relational model.
Entities Attributes and Relationships
- Entities: An entity represents a distinct object, person, place, concept, or event about which you want to store information. Entities typically become tables in your database.
    - Examples: `Student`, `Course`, `Professor`, `Department`, `Product`, `Order`.
- Attributes: An attribute is a property or characteristic of an entity. Attributes become the columns in your database table. Each attribute has a specific data type.
    - Examples: For the `Student` entity, attributes might be `student_id`, `first_name`, `last_name`, `email`, `date_of_birth`, `major`. For the `Course` entity, attributes might be `course_code`, `course_title`, `credits`, `department_id`.
- Relationships: Relationships describe how different entities are associated with each other. Understanding relationships is key to linking tables correctly using foreign keys. There are three main types of relationships:
    - One-to-One (1:1): Each instance of one entity is related to exactly one instance of another entity, and vice versa. This is less common. It might be used to split a table with many columns or for security reasons.
        - Example: A `User` entity might have a 1:1 relationship with a `UserProfile` entity containing optional details. One user has one profile, and one profile belongs to one user.
    - One-to-Many (1:N): One instance of an entity can be related to many instances of another entity, but each instance of the second entity is related to only one instance of the first. This is very common.
        - Example: One `Department` can have many `Professors`, but each `Professor` belongs to only one `Department`. The foreign key (`department_id`) would typically be placed in the `Professors` table (the "many" side).
    - Many-to-Many (M:N): One instance of an entity can be related to many instances of a second entity, and one instance of the second entity can also be related to many instances of the first.
        - Example: A `Student` can enroll in many `Courses`, and a `Course` can have many `Students` enrolled. To implement M:N relationships in a relational database, you need an associative table (also called a junction or linking table). This table sits between the two entities and contains foreign keys referencing the primary keys of both related tables.
        - Implementation: Create an `Enrollments` table with columns like `enrollment_id` (PK), `student_id` (FK referencing `Students`), `course_code` (FK referencing `Courses`), and potentially other attributes like `enrollment_date` or `grade`.
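The 1:N case is worth seeing in DDL, because the placement of the foreign key is the whole trick. A hedged sketch (table and column names follow the Department/Professor example above; the exact types are assumptions):

```sql
-- The "one" side
CREATE TABLE Departments (
    department_id   SERIAL PRIMARY KEY,
    department_name VARCHAR(100) NOT NULL
);

-- The "many" side: each professor row points at exactly one department
CREATE TABLE Professors (
    professor_id  SERIAL PRIMARY KEY,
    name          VARCHAR(100) NOT NULL,
    department_id INTEGER NOT NULL REFERENCES Departments(department_id)
);
```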
Keys and Constraints Revisited
- Primary Key (PK): Essential for uniquely identifying each row. Can be a single column (`student_id`) or multiple columns (a composite key, e.g., `(order_id, product_id)` in an `OrderItems` table). Often, an auto-incrementing integer (`SERIAL` or `IDENTITY` column) is used as a surrogate key when no natural unique identifier exists or when the natural key is complex.
- Foreign Key (FK): Enforces referential integrity, ensuring relationships between tables are valid. When defining a foreign key, you usually specify what should happen if the referenced primary key is updated or deleted (e.g., `ON DELETE CASCADE` - delete related rows, `ON DELETE SET NULL` - set the FK to NULL, `ON DELETE RESTRICT` - prevent deletion if referenced).
- Constraints: Rules enforced by the database to ensure data accuracy and consistency.
    - `NOT NULL`: Ensures a column cannot contain NULL values.
    - `UNIQUE`: Ensures all values in a column (or set of columns) are unique across the table. NULL values are typically still allowed, since NULLs are not considered equal to each other; PostgreSQL, for example, permits multiple NULLs in a `UNIQUE` column. Primary keys are implicitly `UNIQUE` and `NOT NULL`.
    - `CHECK`: Specifies a condition that must be true for any value entered into a column.
        - Example: `CHECK (credits > 0)` on a `Courses` table.
    - `DEFAULT`: Provides a default value for a column if none is specified during insertion.
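A compact sketch tying these constraints together (illustrative names and types; PostgreSQL syntax):

```sql
CREATE TABLE Courses (
    course_id   SERIAL PRIMARY KEY,           -- surrogate key
    course_code VARCHAR(10) UNIQUE NOT NULL,  -- natural unique identifier
    credits     INTEGER CHECK (credits > 0),  -- CHECK constraint
    active      BOOLEAN DEFAULT TRUE          -- DEFAULT value
);

-- A referencing table with an explicit delete rule:
CREATE TABLE Reviews (
    review_id INTEGER PRIMARY KEY,
    course_id INTEGER REFERENCES Courses(course_id) ON DELETE CASCADE
);
```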
Normalization
Normalization is the process of organizing the columns and tables of a relational database to minimize data redundancy and improve data integrity. It involves decomposing tables into smaller, well-structured tables and defining relationships between them. The goal is to isolate data so that additions, deletions, and modifications of an attribute can be made in just one table. Redundancy wastes space and can lead to inconsistencies (update anomalies).
Normalization is typically described in terms of normal forms (NFs):
- First Normal Form (1NF):
    - Rule: Each column must contain atomic (indivisible) values, and each column must have a unique name within the table. There should be no repeating groups of columns. Each row must be unique (enforced by a primary key).
    - Violation Example: A `StudentCourses` column containing a comma-separated list of course codes (`'CS101, MATH202'`).
    - Fix: Separate courses into individual rows, likely in a separate `Enrollments` table linked to the `Students` and `Courses` tables.
- Second Normal Form (2NF):
    - Prerequisite: Must be in 1NF.
    - Rule: All non-key attributes must be fully functionally dependent on the entire primary key. This rule applies mainly when a table has a composite primary key (a PK made of multiple columns). It means no non-key column should depend on only part of the composite key.
    - Violation Example: Table `OrderItems (OrderID, ProductID, Quantity, OrderDate, ProductName)`. Assume `(OrderID, ProductID)` is the composite PK. Here, `ProductName` depends only on `ProductID` (part of the PK), not the whole key. `OrderDate` depends only on `OrderID` (part of the PK). `Quantity` depends on the full key.
    - Fix: Decompose into separate tables: `Orders(OrderID (PK), OrderDate)`, `Products(ProductID (PK), ProductName)`, and `OrderItems(OrderID (FK), ProductID (FK), Quantity)`. The primary key for `OrderItems` would be `(OrderID, ProductID)`.
- Third Normal Form (3NF):
    - Prerequisite: Must be in 2NF.
    - Rule: No non-key attribute should be transitively dependent on the primary key. This means a non-key attribute should not depend on another non-key attribute.
    - Violation Example: Table `Students (StudentID (PK), Name, Major, Department, DepartmentHead)`. Here, `DepartmentHead` depends on `Department`, which in turn depends on `StudentID`. This is a transitive dependency. If the head of a department changes, you'd have to update multiple student records.
    - Fix: Decompose into `Students (StudentID (PK), Name, Major, DepartmentID (FK))` and `Departments (DepartmentID (PK), DepartmentName, DepartmentHead)`.
Higher normal forms (BCNF, 4NF, 5NF) exist but are less commonly applied in practice. Achieving 3NF is often considered a good balance between reducing redundancy and maintaining practical usability. Sometimes, denormalization (intentionally violating normalization rules) is done selectively to improve read performance, but it should be done cautiously as it reintroduces redundancy and potential update anomalies.
Entity-Relationship Diagrams (ERDs)
An ERD is a visual representation of the entities, attributes, and relationships in a database model. It's a blueprint used during the design phase. Common notations include Crow's Foot notation.
- Entities: Represented by rectangles.
- Attributes: Listed inside the entity rectangles (PKs and FKs often indicated).
- Relationships: Represented by lines connecting related entities. Symbols on the ends of the lines (like a crow's foot for "many" and a single stroke for "one") indicate the cardinality (1:1, 1:N, M:N).
Drawing an ERD helps clarify the structure and identify potential issues before writing any `CREATE TABLE` statements.
Workshop Designing a Simple University Database Schema
In this workshop, we will apply data modeling concepts to design a simple schema for our `university_db`. We'll define entities, attributes, keys, and relationships, and then translate this design into SQL `CREATE TABLE` statements using PostgreSQL syntax.
Scenario:
We need to store information about Students, Courses, and which student is enrolled in which course.
Entities:
- `Students`
- `Courses`
- `Enrollments` (associative entity for the M:N relationship between Students and Courses)
Attributes and Keys:
- `Students` Table:
    - `student_id`: Primary Key (PK), uniquely identifies a student. Let's use a serial integer type that auto-increments.
    - `first_name`: Text, cannot be null.
    - `last_name`: Text, cannot be null.
    - `email`: Text, must be unique, cannot be null.
    - `enrollment_date`: Date, default to the current date.
- `Courses` Table:
    - `course_id`: Primary Key (PK), uniquely identifies a course. Let's use a serial integer type.
    - `course_code`: Text, unique identifier like 'CS101', must be unique, cannot be null.
    - `course_title`: Text, cannot be null.
    - `credits`: Integer, must be positive.
- `Enrollments` Table:
    - `enrollment_id`: Primary Key (PK), uniquely identifies an enrollment record. Serial integer type.
    - `student_id`: Foreign Key (FK) referencing `Students(student_id)`. Cannot be null. If a student is deleted, their enrollments should also be deleted (CASCADE).
    - `course_id`: Foreign Key (FK) referencing `Courses(course_id)`. Cannot be null. If a course is deleted, related enrollments should also be deleted (CASCADE).
    - `enrollment_timestamp`: Timestamp with time zone, default to the current timestamp.
    - Constraint: A student should not be able to enroll in the same course more than once. We need a UNIQUE constraint on the combination of `student_id` and `course_id`.
Entity-Relationship Diagram (Text-Based Representation):
```
+---------------+       +------------------+       +--------------+
|   Students    |       |   Enrollments    |       |   Courses    |
+---------------+       +------------------+       +--------------+
| PK student_id |---|<--| FK student_id    |       | PK course_id |
| first_name    |       | FK course_id     |-->|---| course_code  |
| last_name     |       | PK enrollment_id |       | course_title |
| email (UQ)    |       | enrollment_ts    |       | credits      |
| enrollment_dt |       | UQ (student_id,  |       +--------------+
+---------------+       |     course_id)   |
                        +------------------+
```

- `PK`: Primary Key
- `FK`: Foreign Key
- `UQ`: Unique Constraint
- `---|<`: Represents a one-to-many relationship (one Student can have many Enrollments; one Course can have many Enrollments).
Steps:
- Connect to Your Database: If you are not already connected, connect to `university_db` as the `university_admin` user you created in the previous workshop: `psql -U university_admin -d university_db -h localhost -W`. Enter the password when prompted.
Create the
Students
Table: Execute the following SQL command inpsql
. Note the use ofSERIAL
for auto-incrementing PK,VARCHAR
for text with length limits (adjust as needed),UNIQUE
,NOT NULL
, andDEFAULT
.If successful, you should seeCREATE TABLE Students ( student_id SERIAL PRIMARY KEY, first_name VARCHAR(100) NOT NULL, last_name VARCHAR(100) NOT NULL, email VARCHAR(255) UNIQUE NOT NULL, enrollment_date DATE DEFAULT CURRENT_DATE );
CREATE TABLE
. You can verify by typing\dt
(shows the table) or\d Students
(shows table details). -
Create the
Courses
Table: Execute the following SQL command. Note theCHECK
constraint.You should seeCREATE TABLE Courses ( course_id SERIAL PRIMARY KEY, course_code VARCHAR(10) UNIQUE NOT NULL, course_title VARCHAR(200) NOT NULL, credits INTEGER CHECK (credits > 0) );
CREATE TABLE
. Verify with\dt
or\d Courses
. -
Create the
Enrollments
Table: Execute the following SQL command. This table linksStudents
andCourses
. Note theFOREIGN KEY
definitions withON DELETE CASCADE
and the compositeUNIQUE
constraint.You should seeCREATE TABLE Enrollments ( enrollment_id SERIAL PRIMARY KEY, student_id INTEGER NOT NULL, course_id INTEGER NOT NULL, enrollment_timestamp TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP, CONSTRAINT fk_student FOREIGN KEY(student_id) REFERENCES Students(student_id) ON DELETE CASCADE, CONSTRAINT fk_course FOREIGN KEY(course_id) REFERENCES Courses(course_id) ON DELETE CASCADE, -- Ensure a student cannot enroll in the same course multiple times CONSTRAINT unique_student_course UNIQUE (student_id, course_id) );
CREATE TABLE
. Verify with\dt
or\d Enrollments
. Notice how\d Enrollments
shows the primary key, foreign key constraints, and the unique constraint. -
Review the Schema: Use
psql
commands to review the tables you created:\dt
: List all tables (courses
,enrollments
,students
).\d Students
: Describe theStudents
table structure, including columns, types, constraints, and indexes (like the primary key and unique email index).\d Courses
: Describe theCourses
table.\d Enrollments
: Describe theEnrollments
table, paying attention to the foreign key constraints and the unique constraint.
You have now successfully designed a basic relational database schema and implemented it using SQL DDL commands in PostgreSQL. This structure minimizes redundancy (student/course details are stored only once) and enforces integrity (valid student/course references, unique enrollments). You are now ready to populate these tables with data. Remember to exit `psql` (`\q`) when finished.
3. Basic SQL Operations SELECT INSERT UPDATE DELETE
Now that we have designed and created the structure (schema) for our `university_db`, the next step is to interact with the data within the tables. The core SQL commands for this belong to the Data Manipulation Language (DML): `INSERT`, `SELECT`, `UPDATE`, and `DELETE`.
1. INSERT Adding Data
The `INSERT` statement is used to add new rows (records) into a table.
- Syntax (Specifying Columns): This is the recommended form, as it is independent of column order.
- Syntax (Implicit Columns): Requires a value for every column, in table order. Less robust if the table structure changes.
- Inserting Multiple Rows: A single `INSERT` can add several rows at once by listing multiple value tuples.
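As an illustration, here are hedged sketches of the three forms against the `Students` table from our schema. The sample names and emails are invented, and the implicit form assumes the column order from the earlier workshop (`student_id`, `first_name`, `last_name`, `email`, `enrollment_date`):

```sql
-- Specifying columns (recommended): order-independent; omitted columns get defaults
INSERT INTO Students (first_name, last_name, email)
VALUES ('Jane', 'Doe', 'jane.doe@example.edu');

-- Implicit columns: a value for every column, in table order
-- (DEFAULT fills the generated id and default date)
INSERT INTO Students
VALUES (DEFAULT, 'John', 'Doe', 'john.doe@example.edu', DEFAULT);

-- Multiple rows in one statement
INSERT INTO Students (first_name, last_name, email)
VALUES ('Ann', 'Lee', 'ann.lee@example.edu'),
       ('Tom', 'Ray', 'tom.ray@example.edu');
```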
- Important Notes:
  - Values must match the data types of the corresponding columns.
  - Text (string) values must be enclosed in single quotes (e.g., `'John Doe'`).
  - Numeric values are not quoted (e.g., `123`, `99.9`).
  - Date/timestamp values are usually quoted (e.g., `'2023-10-27'`, `'2023-10-27 10:30:00'`). The exact format can depend on the database configuration, but ISO 8601 format (`YYYY-MM-DD`) is generally standard.
  - You don't need to specify values for columns with `DEFAULT` values (like our `enrollment_date` or `enrollment_timestamp`) or for `SERIAL`/`IDENTITY` columns (like our primary keys); the database generates them automatically.
  - `INSERT` statements must satisfy all constraints (`NOT NULL`, `UNIQUE`, `CHECK`, `FOREIGN KEY`). If a constraint is violated, the `INSERT` fails.
2. SELECT Retrieving Data
The `SELECT` statement is arguably the most powerful and frequently used SQL command. It retrieves data from one or more tables based on specified criteria.
- Basic Syntax: name the columns you want, then the table.
- Selecting All Columns: Use `*` (asterisk) as a shortcut, but explicitly listing columns is often better for clarity and performance, especially in application code.
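For example, sketched against the tables from our schema:

```sql
-- Basic form: list only the columns you need
SELECT first_name, last_name FROM Students;

-- Shortcut: all columns (fine interactively, avoid in application code)
SELECT * FROM Courses;
```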
- Filtering Data with `WHERE`: The `WHERE` clause filters rows based on a condition.
  ```sql
  SELECT column1, column2 FROM table_name WHERE condition;
  -- e.g., WHERE credits > 3 or WHERE first_name = 'Alice'
  ```
  - Comparison Operators: `=`, `!=` or `<>`, `>`, `<`, `>=`, `<=`
  - Logical Operators: `AND`, `OR`, `NOT`
  - Other Operators: `BETWEEN`, `IN` (matches values in a list), `LIKE` (pattern matching, with the `%` wildcard for multiple characters and `_` for a single character), `IS NULL`, `IS NOT NULL`.
- Sorting Results with `ORDER BY`: Sorts the retrieved rows based on one or more columns, ascending (`ASC`, the default) or descending (`DESC`).
- Limiting Results with `LIMIT`: Restricts the number of rows returned (useful for pagination or finding the top N results).
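A short sketch of both clauses, using the tables from our schema:

```sql
-- Sort students by last name (add DESC to reverse the order)
SELECT first_name, last_name FROM Students ORDER BY last_name ASC;

-- The two highest-credit courses
SELECT course_code, credits FROM Courses ORDER BY credits DESC LIMIT 2;
```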
- Aggregating Data with `GROUP BY` and Aggregate Functions: Groups rows that share the same values in specified columns and calculates aggregate statistics for each group.
  - Common Aggregate Functions: `COUNT()` (counts rows), `SUM()`, `AVG()` (average), `MAX()`, `MIN()`.
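For instance, counting enrollments per course in our schema:

```sql
-- One row per course_id, with the number of enrollment records for each
SELECT course_id, COUNT(*) AS enrollment_count
FROM Enrollments
GROUP BY course_id;
```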
- Filtering Groups with `HAVING`: Similar to `WHERE`, but filters groups after aggregation, based on the results of aggregate functions; `WHERE` filters rows before aggregation.
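Extending the aggregation example above, `HAVING` keeps only the groups that pass a condition on the aggregate:

```sql
-- Only courses with more than one enrollment
SELECT course_id, COUNT(*) AS enrollment_count
FROM Enrollments
GROUP BY course_id
HAVING COUNT(*) > 1;
```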
- Joining Tables: `SELECT` becomes truly powerful when combining data from multiple related tables using `JOIN` clauses (covered in more advanced SQL sections). The basic idea is to link tables based on matching foreign key and primary key values.
3. UPDATE Modifying Data
The `UPDATE` statement is used to modify existing data in a table.
- Syntax: set one or more columns to new values for the rows matched by the `WHERE` clause.
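A generic skeleton of the statement:

```sql
UPDATE table_name
SET column1 = value1,
    column2 = value2
WHERE condition;  -- only the matching rows are changed
```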
- CRITICAL WARNING: If you omit the `WHERE` clause in an `UPDATE` statement, ALL ROWS in the table will be updated! Always double-check your `WHERE` clause before executing an `UPDATE`. It is often wise to run a `SELECT` statement with the same `WHERE` clause first to verify which rows will be affected.
- Example: Change the email address for a specific student.
  ```sql
  -- First, verify the student exists and you have the right ID
  SELECT student_id, first_name, last_name, email
  FROM Students
  WHERE email = 'old.email@example.com';

  -- If correct, perform the update
  UPDATE Students
  SET email = 'new.email@example.com'
  WHERE student_id = 123; -- Assuming 123 is the ID found above
  ```
4. DELETE Removing Data
The `DELETE` statement is used to remove one or more rows from a table.
- Syntax: remove the rows matched by the `WHERE` clause.
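A generic skeleton of the statement:

```sql
DELETE FROM table_name
WHERE condition;  -- without WHERE, every row in the table is removed
```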
- CRITICAL WARNING: Similar to `UPDATE`, if you omit the `WHERE` clause in a `DELETE` statement, ALL ROWS in the table will be deleted! Always be extremely careful and verify your `WHERE` clause. Consider using transactions (`BEGIN`, `COMMIT`, `ROLLBACK`) for potentially risky delete operations.
- Example: Remove a specific enrollment record.
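A hedged sketch against our `Enrollments` table (the specific IDs are illustrative):

```sql
-- First, verify exactly which row will be removed
SELECT enrollment_id, student_id, course_id
FROM Enrollments
WHERE student_id = 1 AND course_id = 2;

-- Then delete that specific enrollment
DELETE FROM Enrollments
WHERE student_id = 1 AND course_id = 2;
```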
- Note on Foreign Keys: If other tables have foreign key constraints referencing the row you are trying to delete, the `DELETE` operation may fail unless the constraint was defined with `ON DELETE CASCADE` (which automatically deletes the referencing rows) or `ON DELETE SET NULL` (which sets the foreign key column in the referencing rows to NULL). Our `Enrollments` table uses `ON DELETE CASCADE`, so deleting a student automatically deletes their enrollments.
Mastering these four DML commands (`INSERT`, `SELECT`, `UPDATE`, `DELETE`) is essential for effectively managing data within a relational database. Always pay close attention to the `WHERE` clauses in `UPDATE` and `DELETE` statements to avoid unintended data modification or loss.
Workshop Populating and Querying the University Database
In this workshop, we will use the DML commands (`INSERT`, `SELECT`, `UPDATE`, `DELETE`) learned in the previous section to populate our `university_db` tables (`Students`, `Courses`, `Enrollments`) with sample data, and then practice retrieving and manipulating that data.
Prerequisites:
- PostgreSQL server running.
- The `university_db` database and tables (`Students`, `Courses`, `Enrollments`) created as per the previous workshop.
- Connected to `university_db` as `university_admin` using `psql`.
Steps:
- Insert Data into `Students` Table: Let's add a few students. Notice we don't specify `student_id` (it's `SERIAL`) or `enrollment_date` (it uses `DEFAULT CURRENT_DATE`).
  ```sql
  INSERT INTO Students (first_name, last_name, email) VALUES
  ('Alice', 'Smith', 'alice.s@example.edu'),
  ('Bob', 'Johnson', 'bob.j@example.com'),
  ('Charlie', 'Williams', 'charlie.w@example.edu'),
  ('David', 'Brown', 'david.b@example.net');
  ```
  You should see `INSERT 0 4`, indicating 4 rows were inserted.
- Verify Student Data: Use `SELECT` to see the data you just inserted. Note how `student_id` and `enrollment_date` were automatically populated. (Expected output: 4 rows with IDs 1-4, the names/emails you entered, and the current date.)
- Insert Data into `Courses` Table: Add some courses. We don't specify `course_id` (it's `SERIAL`).
  ```sql
  INSERT INTO Courses (course_code, course_title, credits) VALUES
  ('CS101', 'Introduction to Computer Science', 3),
  ('MATH202', 'Calculus II', 4),
  ('PHYS101', 'General Physics I', 4),
  ('ENG101', 'Composition I', 3);
  ```
  You should see `INSERT 0 4`.
- Verify Course Data: (Expected output: 4 rows with IDs 1-4 and the course details, matching the input.)
- Insert Data into `Enrollments` Table: Now, let's enroll students in courses. We need the `student_id` and `course_id` values. Assuming the IDs were generated sequentially starting from 1:
  - Alice (ID 1) enrolls in CS101 (ID 1) and MATH202 (ID 2).
  - Bob (ID 2) enrolls in CS101 (ID 1).
  - Charlie (ID 3) enrolls in MATH202 (ID 2) and PHYS101 (ID 3).
  - David (ID 4) enrolls in ENG101 (ID 4).
  ```sql
  INSERT INTO Enrollments (student_id, course_id) VALUES
  (1, 1), -- Alice in CS101
  (1, 2), -- Alice in MATH202
  (2, 1), -- Bob in CS101
  (3, 2), -- Charlie in MATH202
  (3, 3), -- Charlie in PHYS101
  (4, 4); -- David in ENG101
  ```
  You should see `INSERT 0 6`.
- Verify Enrollment Data: (Expected output: 6 rows with enrollment IDs 1-6, the student/course ID pairs, and timestamps.)
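The verification steps above can be run with simple `SELECT` statements, for example:

```sql
SELECT * FROM Students;     -- 4 rows; IDs and enrollment_date auto-filled
SELECT * FROM Courses;      -- 4 rows with the course details
SELECT * FROM Enrollments;  -- 6 rows with timestamps
```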
- Practice `SELECT` Queries: Now, let's retrieve specific information.
  - List all student names (first and last).
  - Find all courses worth 4 credits.
  - Find students whose email ends with '.edu'.
  - List students ordered by last name.
  - Find the course IDs Alice Smith (student_id 1) is enrolled in.
  - Count how many enrollment records exist in total.
  - Count students per course (using `GROUP BY`).
  - (Slightly more advanced) List the names of students enrolled in CS101 (course_id 1). This requires a JOIN, which links tables. (Expected output: Alice Smith, Bob Johnson.)
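One possible set of queries matching the bullets above, sketched against this workshop's schema and IDs:

```sql
-- Student names
SELECT first_name, last_name FROM Students;

-- Courses worth 4 credits
SELECT * FROM Courses WHERE credits = 4;

-- Emails ending in '.edu'
SELECT first_name, last_name, email FROM Students WHERE email LIKE '%.edu';

-- Students ordered by last name
SELECT first_name, last_name FROM Students ORDER BY last_name;

-- Courses Alice (student_id 1) is enrolled in
SELECT course_id FROM Enrollments WHERE student_id = 1;

-- Total number of enrollment records
SELECT COUNT(*) FROM Enrollments;

-- Enrollment counts per course
SELECT course_id, COUNT(*) AS student_count FROM Enrollments GROUP BY course_id;

-- Names of students enrolled in CS101 (course_id 1)
SELECT s.first_name, s.last_name
FROM Students s
JOIN Enrollments e ON e.student_id = s.student_id
WHERE e.course_id = 1;
```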
- Practice `UPDATE`: Let's assume Bob Johnson changed his email. His `student_id` is 2.
  - First, verify the current email. (Expected: bob.j@example.com)
  - Now, update it. You should see `UPDATE 1`.
  - Verify the change. (Expected: robert.johnson@example.com)
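The three sub-steps above can be run as:

```sql
SELECT email FROM Students WHERE student_id = 2;  -- verify the current value

UPDATE Students
SET email = 'robert.johnson@example.com'
WHERE student_id = 2;                             -- expect: UPDATE 1

SELECT email FROM Students WHERE student_id = 2;  -- confirm the change
```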
- Practice `DELETE`: Let's say Alice Smith (student_id 1) drops MATH202 (course_id 2).
  - First, verify the enrollment exists. (Expected: one row showing this enrollment.)
  - Now, delete it. You should see `DELETE 1`.
  - Verify it's gone. (Expected: only the enrollment row for course_id 1 should remain for student_id 1.)
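The three sub-steps above can be run as:

```sql
SELECT * FROM Enrollments WHERE student_id = 1 AND course_id = 2;  -- verify it exists

DELETE FROM Enrollments WHERE student_id = 1 AND course_id = 2;    -- expect: DELETE 1

SELECT * FROM Enrollments WHERE student_id = 1;                    -- only course_id 1 remains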
- (Optional) Test `ON DELETE CASCADE`: Let's delete David Brown (student_id 4). Because of `ON DELETE CASCADE` on the `Enrollments` table, his enrollment record(s) should also be deleted automatically.
  - Verify David's enrollment. (Expected: one row for course_id 4.)
  - Delete David from the Students table. You should see `DELETE 1`.
  - Verify David is gone from Students. (Expected: 0 rows.)
  - Verify David's enrollment is also gone. (Expected: 0 rows.)
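The cascade test can be run as:

```sql
SELECT * FROM Enrollments WHERE student_id = 4;  -- one row (course_id 4)

DELETE FROM Students WHERE student_id = 4;       -- expect: DELETE 1

SELECT * FROM Students WHERE student_id = 4;     -- 0 rows
SELECT * FROM Enrollments WHERE student_id = 4;  -- 0 rows: the cascade removed it
```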
This workshop provided hands-on practice with the fundamental DML operations. You learned how to insert data while respecting constraints, query data using various `SELECT` clauses (`WHERE`, `ORDER BY`, `GROUP BY`), modify data safely using `UPDATE` with `WHERE`, and remove data using `DELETE` with `WHERE`, while observing the effect of foreign key constraints like `ON DELETE CASCADE`. Feel free to experiment further with different queries. Exit `psql` with `\q` when you are done.
4. Introduction to NoSQL Databases
While relational databases (using SQL) are powerful and suitable for a vast range of applications involving structured data and requiring strong consistency (ACID properties), they aren't always the best fit for every scenario. The rise of the internet, social media, mobile applications, and the Internet of Things (IoT) generated unprecedented volumes, velocities, and varieties of data (often called "Big Data"). Handling this scale and diversity exposed some limitations of traditional relational databases, leading to the development of NoSQL databases.
NoSQL stands for "Not Only SQL". It's an umbrella term for database management systems that differ from classic relational databases in significant ways, particularly in their data models, scaling strategies, and consistency guarantees. They emerged to address needs like:
- Massive Scalability: Handling terabytes or petabytes of data distributed across many servers (horizontal scaling). Relational databases traditionally scale vertically (bigger server), which has limits.
- High Availability: Ensuring the database remains operational even if some servers fail.
- Flexible Data Models: Managing unstructured or semi-structured data (like user profiles with varying fields, sensor readings, social media posts) without a rigid, predefined schema.
- High Performance: Achieving very high read/write throughput for specific access patterns.
The CAP Theorem
Understanding NoSQL often involves understanding the CAP theorem, a fundamental concept in distributed systems formulated by Eric Brewer. It states that a distributed data store can only provide two of the following three guarantees simultaneously:
- Consistency (C): Every read receives the most recent write or an error. All nodes in the system see the same data at the same time. (This is different from, but related to, the 'C' in ACID.)
- Availability (A): Every request receives a (non-error) response, without guaranteeing it contains the most recent write. The system remains operational even if some nodes fail.
- Partition Tolerance (P): The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes (i.e., the network partitions).
In a distributed system, network partitions (P) are a reality that must be tolerated. Therefore, the CAP theorem essentially forces a trade-off between Consistency (C) and Availability (A) when a partition occurs.
- CP Systems (Consistency + Partition Tolerance): These systems prioritize consistency. If a partition occurs, they might become unavailable for some requests (e.g., writes, or reads from the affected partition) to avoid returning potentially stale or inconsistent data. Many relational databases configured for replication fall here, as well as some NoSQL systems like HBase.
- AP Systems (Availability + Partition Tolerance): These systems prioritize availability. Even during a partition, all nodes remain available for reads and writes. However, this means some reads might return stale data until the partition heals and data synchronization catches up. This is often referred to as eventual consistency. Many NoSQL databases (like Cassandra, DynamoDB, Riak, Couchbase) are designed as AP systems.
BASE Properties
In contrast to the strict ACID guarantees of relational databases, many NoSQL databases (especially AP systems) adhere to the BASE philosophy:
- Basically Available: The system guarantees availability (as per the CAP theorem).
- Soft State: The state of the system may change over time, even without input, due to eventual consistency. Data might be inconsistent temporarily across nodes.
- Eventually Consistent: If no new updates are made to a given data item, eventually all accesses to that item will return the last updated value. Data gradually propagates through the distributed system.
This trade-off (sacrificing immediate consistency for availability and scalability) is acceptable or even desirable for many modern web applications where temporary inconsistencies (e.g., seeing an old profile picture for a few seconds) are less critical than the system being always available and scalable.
Types of NoSQL Databases
NoSQL is not a single technology but a category. The main types differ significantly in their data models and use cases:
-
Key-Value Stores:
- Model: Simplest model. Stores data as a collection of key-value pairs, like a dictionary or hash map. The 'value' can be anything (a string, number, JSON object, image, video). The database operations are typically limited to
GET(key)
,PUT(key, value)
,DELETE(key)
. - Pros: Extremely fast for simple lookups, highly scalable, simple model.
- Cons: Querying by value is often inefficient or impossible; relationships between data items aren't inherently managed.
- Examples: Redis, Memcached, Riak KV, Amazon DynamoDB (also a document store).
- Use Cases: Caching, session management, user profiles, real-time leaderboards.
-
Document Databases:
- Model: Stores data in documents, typically using formats like JSON (JavaScript Object Notation) or BSON (Binary JSON) or XML. Documents are self-contained units of data, and unlike rows in relational tables, documents within the same collection can have different structures (schema flexibility). Documents can contain nested structures (arrays and sub-documents).
- Pros: Flexible schema, natural mapping to object-oriented programming structures, querying based on document content is possible and often indexed. Good general-purpose NoSQL choice.
- Cons: Can lead to data redundancy if not modeled carefully; complex queries involving joins across different document types can be less efficient than in SQL.
- Examples: MongoDB, Couchbase, ArangoDB (multi-model), Elasticsearch (also search engine).
- Use Cases: Content management systems, user profiles, product catalogs, blogging platforms, mobile app data.
-
Column-Family (or Wide-Column) Stores:
- Model: Stores data in tables with rows and columns, but optimized for column-based access. Data for a particular column is stored contiguously on disk. Rows can have different columns, and columns can be grouped into "column families". Think of it as a two-dimensional key-value store (row key -> column family -> column -> value/timestamp).
- Pros: Extremely scalable for huge datasets (petabytes), high write performance, efficient querying over specific columns across many rows. Good for aggregation queries.
- Cons: Can be more complex to model and manage than other types. Read performance can vary depending on access patterns. Eventual consistency is common.
- Examples: Apache Cassandra, Apache HBase, Google Bigtable.
- Use Cases: Big data analytics, logging, time-series data, recommendation engines, IoT data streams.
-
Graph Databases:
- Model: Designed specifically to store and navigate relationships. Data is represented as nodes (entities) and edges (relationships connecting nodes). Both nodes and edges can have properties (attributes).
- Pros: Excellent performance for traversing complex relationships (e.g., "find friends of friends"). Intuitive model for highly interconnected data.
- Cons: Can be less efficient for queries involving aggregation over the entire dataset compared to other models. Scaling strategies can differ.
- Examples: Neo4j, ArangoDB (multi-model), JanusGraph, Amazon Neptune.
- Use Cases: Social networks, recommendation engines (e.g., "people who bought this also bought..."), fraud detection, network/IT operations mapping, knowledge graphs.
Choosing Between SQL and NoSQL
The choice isn't always SQL vs. NoSQL; sometimes the best solution involves using both (polyglot persistence).
-
Choose SQL/Relational when:
- Data is structured and schema is well-defined and relatively stable.
- Strong consistency (ACID) is paramount (e.g., financial transactions).
- You need complex querying capabilities involving multiple tables (JOINs).
- Mature technology and wide tooling support are important.
-
Choose NoSQL when:
- Dealing with massive amounts of data requiring horizontal scaling.
- High availability is more critical than immediate consistency.
- Data is unstructured or semi-structured, or the schema evolves rapidly.
- You need extremely high read/write throughput for specific (often simpler) query patterns.
- Your data model naturally fits one of the NoSQL types (key-value, document, graph, column-family).
Understanding the different NoSQL paradigms opens up new possibilities for handling diverse data challenges, especially in the context of modern, large-scale applications running on distributed Linux environments.
Workshop Setting Up and Using Redis as a Key Value Store
This workshop introduces you to Redis, a popular open-source, in-memory data structure store often used as a high-performance key-value database, cache, and message broker. We will install Redis on Linux, connect using its command-line interface (`redis-cli`), and perform basic key-value operations.
Prerequisites:
- A Linux system (e.g., Ubuntu, Debian, Fedora, CentOS/RHEL).
- Access to the terminal with `sudo` privileges.
- Internet connection.
Steps:
- Update Package Lists:
  - On Debian/Ubuntu: use `apt`.
  - On Fedora/CentOS/RHEL: use `dnf`.
- Install Redis Server:
  - On Debian/Ubuntu: install the `redis-server` package.
  - On Fedora: install the `redis` package.
  - On CentOS/RHEL (using the EPEL repository): the Extra Packages for Enterprise Linux (EPEL) repository often provides Redis.
- Start and Enable Redis Service: On most systems using `systemd`, you need to start the service and enable it to launch on boot. Debian/Ubuntu often starts it automatically post-installation.
- Verify Redis Service Status: Look for "active (running)" in the service status output.
- Connect Using `redis-cli`: The Redis installation includes a command-line client. You should see a prompt like `127.0.0.1:6379>`, indicating you're connected to the Redis server running on the default host (`127.0.0.1`) and port (`6379`). You can test the connection with the `PING` command; Redis should reply with `PONG`.
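The steps above can be sketched for Debian/Ubuntu as follows (package and service names differ slightly per distribution; on Fedora the package and service are both `redis`):

```shell
# Update package lists and install Redis (Debian/Ubuntu)
sudo apt update
sudo apt install -y redis-server

# Start now and enable at boot (systemd)
sudo systemctl start redis-server
sudo systemctl enable redis-server
sudo systemctl status redis-server   # look for "active (running)"

# Connect and test
redis-cli ping                       # should print PONG
redis-cli                            # opens the 127.0.0.1:6379> prompt
```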
- Basic Key-Value Operations:
  - `SET key value`: Stores a string value associated with a key. If the key already exists, it is overwritten. (Expected output: `OK`.)
  - `GET key`: Retrieves the string value associated with a key. Returns `(nil)` if the key doesn't exist.
  - `KEYS pattern`: Lists keys matching a pattern (`*` matches any sequence, `?` matches a single character). Warning: use `KEYS` with caution on production databases with many keys, as it can be slow and block other operations; it's fine for learning/debugging.
  - `EXISTS key`: Checks if a key exists. Returns `(integer) 1` if it exists, `(integer) 0` otherwise.
  - `DEL key [key ...]`: Deletes one or more keys and their associated values. Returns the number of keys deleted.
  - `INCR key` / `DECR key`: Atomically increments or decrements the integer value stored at a key. If the key doesn't exist, it is created with value 0 first. Useful for counters.
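A sample `redis-cli` session exercising the commands above. The `user:100:username` and `app:config:api_key` keys and their values follow this workshop's examples; `page:views` is a hypothetical counter key:

```
127.0.0.1:6379> SET user:100:username alice_wonder
OK
127.0.0.1:6379> SET app:config:api_key xyz789abc
OK
127.0.0.1:6379> GET user:100:username
"alice_wonder"
127.0.0.1:6379> KEYS user:*
1) "user:100:username"
127.0.0.1:6379> EXISTS user:100:username
(integer) 1
127.0.0.1:6379> DEL app:config:api_key
(integer) 1
127.0.0.1:6379> INCR page:views
(integer) 1
```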
- Setting Key Expiration: Redis is often used for caching, where data should automatically expire.
  - `EXPIRE key seconds`: Sets a timeout (in seconds) on an existing key. Returns `(integer) 1` when the expiration is set. Check the remaining time with `TTL key`; once the timeout elapses, `GET` returns `(nil)`.
  - `SETEX key seconds value`: Atomically sets a key to a value with an expiration time.
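A sketch of an expiration session; the key names are hypothetical and the `TTL` values shown are illustrative (they count down in real time):

```
127.0.0.1:6379> SET session:abc123 active
OK
127.0.0.1:6379> EXPIRE session:abc123 60
(integer) 1
127.0.0.1:6379> TTL session:abc123
(integer) 58
127.0.0.1:6379> SETEX cache:front_page 300 "cached page body"
OK
127.0.0.1:6379> TTL cache:front_page
(integer) 300
```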
- Other Data Structures (Brief Mention): Redis is more than just key-value; it supports other structures like Lists (`LPUSH`, `RPUSH`, `LRANGE`), Hashes (`HSET`, `HGET`, `HGETALL`, good for storing object fields), Sets (`SADD`, `SMEMBERS`, `SINTER`), and Sorted Sets (`ZADD`, `ZRANGE`, useful for leaderboards). Explore these with `HELP command_name` inside `redis-cli`.
- Exit `redis-cli`: Type `exit` or press `Ctrl+D`.
You have successfully installed Redis, connected to it, and performed fundamental key-value operations, including setting expirations. This demonstrates the simplicity and speed of key-value stores like Redis, commonly used in Linux environments for caching, session management, and other high-performance tasks.
5. Database Administration Basics on Linux
Managing databases involves more than just designing schemas and running queries. Database Administration (DBA) encompasses tasks related to maintaining the health, security, performance, and reliability of database systems. While DBA is a deep field, understanding the basics is crucial for anyone running database-dependent applications on Linux. We'll focus on common tasks, primarily using PostgreSQL as an example, but the concepts apply broadly.
User and Permission Management
Controlling who can access the database and what they can do is fundamental for security. Databases use roles (often called users) and privileges.
- Roles/Users: Represent entities that can connect to the database and own objects. Roles can have attributes like login permission (`LOGIN`), superuser status (`SUPERUSER`), the ability to create databases (`CREATEDB`), or the ability to create roles (`CREATEROLE`).
  - Creating a User (PostgreSQL)
  - Creating a Role (Group): Roles can also act like groups to manage permissions collectively.
  - Assigning a User to a Role
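In PostgreSQL these three steps might look as follows (`myapp_user` and `readonly_group` are the example names used later in this section; the password is a placeholder):

```sql
-- Create a login-capable user
CREATE ROLE myapp_user WITH LOGIN PASSWORD 'change_me';

-- Create a group role (no login)
CREATE ROLE readonly_group;

-- Add the user to the group
GRANT readonly_group TO myapp_user;
```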
- Privileges: Permissions granted on specific database objects (databases, schemas, tables, columns, sequences, functions). Common privileges include `SELECT`, `INSERT`, `UPDATE`, `DELETE`, `TRUNCATE`, `REFERENCES`, `TRIGGER`, `CREATE`, `CONNECT`, `TEMPORARY`, `EXECUTE`, and `USAGE`.
  - Granting Privileges (PostgreSQL):
    ```sql
    -- Grant connect access to a database
    GRANT CONNECT ON DATABASE mydatabase TO myapp_user;

    -- Grant usage access to a schema (necessary before granting table privileges)
    GRANT USAGE ON SCHEMA public TO myapp_user;
    GRANT USAGE ON SCHEMA public TO readonly_group; -- Grant to role

    -- Grant specific privileges on a table
    GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE products TO myapp_user;
    GRANT SELECT ON ALL TABLES IN SCHEMA public TO readonly_group; -- Grant SELECT on all current tables

    -- Set default privileges for future tables (important!)
    ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO readonly_group;
    ```
  - Revoking Privileges (PostgreSQL)
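Revocation mirrors `GRANT`; for example, against the objects and roles used above:

```sql
-- Take back write access on a table
REVOKE INSERT, UPDATE, DELETE ON TABLE products FROM myapp_user;

-- Remove all database-level privileges from a role
REVOKE ALL PRIVILEGES ON DATABASE mydatabase FROM myapp_user;
```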
- Linux Interaction: Database user management is distinct from Linux system users, although authentication methods like `peer` (used by default for `postgres` locally) link them. Network connections typically use password methods (`md5` or `scram-sha-256` in PostgreSQL) defined in the `pg_hba.conf` file.
Basic Backup and Restore Strategies
Data loss can be catastrophic. Regular backups are essential. Strategies vary, but common types include:
- Logical Backups: Export the database structure (DDL) and data (DML `INSERT` statements or copy commands) into a human-readable (or compressed) file. Platform-independent, but can be slower to restore for very large databases.
  - PostgreSQL (`pg_dump`, `pg_dumpall`):
    ```shell
    # Dump a single database (common)
    pg_dump -U db_user -h localhost -W -F c -f mydatabase_backup.dump mydatabase
    # -U: DB user, -h: host, -W: prompt for password
    # -F c: custom format (compressed, flexible)
    # -f: output file
    # mydatabase: database name

    # Dump all databases (roles, tablespaces) - needs superuser
    pg_dumpall -U postgres -h localhost -W -f alldatabases.sql
    ```
  - MySQL/MariaDB (`mysqldump`)
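The MySQL/MariaDB equivalent might look like this (user and database names are placeholders; `-p` prompts for the password):

```shell
# Dump a single database to a plain SQL file
mysqldump -u db_user -p mydatabase > mydatabase_backup.sql

# Dump all databases (needs a privileged account)
mysqldump -u root -p --all-databases > alldatabases.sql
```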
- Physical Backups: Copy the actual data files that make up the database. Faster restore times, especially for large databases, but often platform/version dependent and more complex to set up correctly (requires a database shutdown, or specific tools like `pg_basebackup` for online backups).
Restoring Data:
- PostgreSQL (`psql`, `pg_restore`):
  ```shell
  # Restore from a plain SQL dump
  psql -U db_user -h localhost -W -d mydatabase -f mydatabase_backup.sql

  # Restore from a custom-format dump (use pg_restore)
  # Assumes database 'mydatabase' exists and is empty
  pg_restore -U db_user -h localhost -W -d mydatabase -v mydatabase_backup.dump
  # -d: database to restore into
  # -v: verbose output
  ```
- MySQL/MariaDB (`mysql` client)
Backup Frequency and Retention: Depends on how critical the data is and how often it changes. Common strategies involve daily full backups, possibly supplemented by more frequent incremental backups or transaction log archiving (Point-in-Time Recovery - PITR). Store backups securely, preferably off-site or in a different location. Test your restore process regularly!
Monitoring Database Performance
Keeping an eye on performance helps identify bottlenecks and resource issues.
- Linux System Tools: Standard tools are invaluable.
  - `top` / `htop`: Monitor CPU and memory usage of database processes (e.g., `postgres`, `mysqld`).
  - `iostat`: Check disk I/O activity. High `iowait` can indicate disk bottlenecks.
  - `vmstat` / `free -h`: Monitor memory usage and swapping. Excessive swapping kills database performance.
  - `netstat` / `ss`: Check network connections to the database.
- Database Logs: Contain vital information about errors, slow queries, connection attempts, checkpoints, etc. The location depends on the distribution and configuration (e.g., `/var/log/postgresql/`, `/var/log/mysql/`). Configure logging levels appropriately (e.g., log slow queries above a certain threshold).
- Database-Specific Tools/Queries:
  - PostgreSQL:
    - `SELECT * FROM pg_stat_activity;`: Shows current connections and the queries being executed.
    - `EXPLAIN ANALYZE <your_query>;`: Shows the query plan and actual execution time; crucial for optimizing slow queries.
    - The `pg_stat_statements` extension: Tracks execution statistics for all queries.
    - Monitoring tools: pgAdmin, pganalyze, Datadog, etc.
  - MySQL/MariaDB:
    - `SHOW PROCESSLIST;`: Shows current connections/queries.
    - `EXPLAIN <your_query>;`: Shows the query execution plan.
    - The Performance Schema / `sys` schema: Provides detailed performance metrics.
    - Monitoring tools: MySQL Workbench, Percona Monitoring and Management (PMM), etc.
Configuration Files
Database behavior is controlled by configuration files. Editing them requires understanding the parameters, and changes usually require a database service restart to take effect. Always back up configuration files before editing!
- PostgreSQL:
  - `postgresql.conf`: Main server configuration (memory allocation such as `shared_buffers` and `work_mem`; logging; network settings; autovacuum settings). Often located within the data directory (e.g., `/var/lib/postgresql/<version>/main/`) or in `/etc/postgresql/<version>/main/`.
  - `pg_hba.conf`: Client authentication control (Host-Based Authentication). Defines which users/networks can connect using which methods (peer, md5, scram-sha-256, etc.). Crucial for security.
  - `pg_ident.conf`: Maps system usernames to database usernames (used for `ident` authentication).
- MySQL/MariaDB:
  - `my.cnf` or `my.ini`: Main configuration file (locations vary: `/etc/my.cnf`, `/etc/mysql/my.cnf`, `/etc/mysql/mysql.conf.d/`). Contains settings for storage engines (e.g., the InnoDB buffer pool size), logging, network ports, etc.
Basic Security Considerations
- Network Access: Use `pg_hba.conf` (PostgreSQL) or firewall rules (`ufw`, `firewalld`) together with bind-address settings (in `postgresql.conf` or `my.cnf`) to restrict connections to trusted hosts/networks only. Avoid exposing databases directly to the public internet unless absolutely necessary and properly secured.
- Strong Passwords: Enforce strong passwords for all database users. Use secure authentication methods (`scram-sha-256` is preferred in PostgreSQL).
- Principle of Least Privilege: Grant users only the permissions they absolutely need to perform their tasks. Avoid using superuser accounts for applications.
- Regular Updates: Keep the database server and the underlying Linux OS patched against known vulnerabilities.
- Encryption: Consider encrypting data at rest (at the filesystem level, or via Transparent Data Encryption if supported) and in transit (using SSL/TLS connections).
- Auditing: Configure logging to track important actions (logins, DDL changes, etc.).
Basic Linux DBA skills are essential for ensuring your databases run smoothly, securely, and reliably. Start with user management, backups, basic monitoring, and understanding key configuration files.
Workshop Backing Up and Restoring the PostgreSQL University Database
This workshop provides practical experience with fundamental backup and restore operations for your PostgreSQL `university_db`, using the standard `pg_dump` and `psql`/`pg_restore` command-line tools on Linux.
Prerequisites:
- PostgreSQL server running, with the `university_db` database populated from the previous workshop.
- Access to the Linux terminal.
- The password for the `university_admin` user (or the `postgres` superuser if needed).
- The `pg_dump` and `pg_restore`/`psql` command-line tools installed (usually included with the `postgresql-client` or main `postgresql` package).
Scenario:
We will create a backup of the `university_db`, simulate data loss by dropping the database, and then restore it from the backup.
Steps:
- Create a Backup Directory: It's good practice to store backups in a dedicated location.
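For example (the `~/db_backups` path is the one used in the following steps):

```shell
mkdir -p ~/db_backups
cd ~/db_backups
```
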
- Perform the Backup using `pg_dump`: We'll use the custom format (`-F c`), which is compressed and generally recommended, to back up `university_db` as the `university_admin` user.
  - `-U university_admin`: Connect as this PostgreSQL user.
  - `-h localhost`: Connect to the server on the local machine.
  - `-W`: Prompt for the `university_admin` password.
  - `-F c`: Use the custom archive format (compressed).
  - `-f university_db_backup.dump`: Write the output to this file.
  - `university_db`: The name of the database to dump.
  Enter the password for `university_admin` when prompted. If successful, the command completes without errors and you'll have a file named `university_db_backup.dump` in your `~/db_backups` directory.
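Assembled from the flags listed above, the full command is:

```shell
cd ~/db_backups
pg_dump -U university_admin -h localhost -W -F c -f university_db_backup.dump university_db
```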
3. Examine the Backup File (Optional): Since it's a custom format, you can't view it directly as SQL text. However, `pg_restore` can list its contents: the items archived in the dump file (schema definitions, table data, constraints, etc.).
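For instance, `pg_restore`'s `-l` (list) mode prints the archive's table of contents without restoring anything:

```shell
# List the items stored in the custom-format dump
pg_restore -l university_db_backup.dump
```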
4. Simulate Data Loss (Drop the Database): WARNING: This step permanently deletes the database. Ensure your backup completed successfully in Step 2 before proceeding. To drop a database, you usually need higher privileges (like the `postgres` superuser) or to be the database owner (`university_admin` should work here, but sometimes superuser access is needed if there are active connections). You also cannot be connected to the database you intend to drop, so connect to the default `postgres` database first, using the `postgres` Linux user. Now, inside `psql`, connected as `postgres` to the `postgres` database, drop `university_db`. You might get an error if other users are connected; if so, you may need to terminate connections first (an advanced topic) or ensure no other `psql` sessions or applications are using `university_db`. If successful, you'll see `DROP DATABASE`. Verify the database is gone by listing databases with `\l`; `university_db` should no longer be listed. Then exit `psql` with `\q`.
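One way to run the whole sequence, assuming the common Linux setup where the `postgres` OS user can connect via peer authentication (adjust the `sudo` usage to your environment):

```shell
# Connect to the default 'postgres' database as the postgres superuser,
# drop university_db, then list the remaining databases
sudo -u postgres psql -d postgres <<'SQL'
DROP DATABASE university_db;
\l
SQL
```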
5. Recreate the Empty Database: Before restoring, the target database needs to exist. We'll recreate it, ensuring the original owner (`university_admin`) is set; again, use the `postgres` superuser for this. Inside `psql`, issue the `CREATE DATABASE` command; you should see `CREATE DATABASE` echoed on success. Verify its existence with `\l`; `university_db` should be listed again, owned by `university_admin`. Then exit `psql` with `\q`.
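A sketch of the recreation step under the same peer-authentication assumption:

```shell
# Recreate an empty university_db owned by university_admin, then verify
sudo -u postgres psql -d postgres <<'SQL'
CREATE DATABASE university_db OWNER university_admin;
\l
SQL
```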
6. Restore the Database using `pg_restore`: Now, use `pg_restore` to load the data from your backup file into the newly created, empty database. We'll connect as `university_admin`, since that user owns the database and should have permission to create the objects defined in the dump file.

```shell
cd ~/db_backups  # Make sure you are in the directory with the dump file
pg_restore -U university_admin -h localhost -W -d university_db -v university_db_backup.dump
```

   - `-U university_admin`: Connect as this user.
   - `-h localhost`: Connect to the local server.
   - `-W`: Prompt for the password.
   - `-d university_db`: Restore into this database.
   - `-v`: Verbose mode (shows progress).
   - `university_db_backup.dump`: The backup file to restore from.

Enter the `university_admin` password. You should see verbose output as `pg_restore` processes the dump file, recreating tables and constraints and loading data; look for completion messages without errors.
7. Verify the Restoration: Connect to the database again and check whether the tables and data are back. Enter the password, then, inside `psql`:
   - List tables (expected: `courses`, `enrollments`, and `students` should be listed).
   - Check data in one of the tables (expected: you should see the student data that existed before you dropped the database, likely Alice, Bob, and Charlie, as David was deleted in a previous workshop unless you re-added him).
   - Check enrollments (expected: you should see the enrollment data corresponding to the restored students and courses).
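The checks above can be scripted in one go; `-W` still prompts for the `university_admin` password, and the table names follow the earlier workshops:

```shell
# Reconnect as university_admin and spot-check the restored schema and data
psql -U university_admin -h localhost -W -d university_db <<'SQL'
\dt
SELECT * FROM students;
SELECT * FROM enrollments;
SQL
```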
8. Exit `psql` with `\q`.
Congratulations! You have successfully backed up a PostgreSQL database using `pg_dump`, simulated a failure by dropping the database, and restored it using `pg_restore`. This is a critical skill for any database administrator or developer working with databases on Linux. Remember to schedule regular backups as part of any production deployment.
Conclusion
This journey through Database Basics has equipped you with a foundational understanding of how data is organized, managed, and manipulated, particularly within the Linux environment. We started with the core concept of what a database is and why Database Management Systems (DBMS) are essential for handling data efficiently, reliably, and securely.
We delved into the ubiquitous Relational Model, exploring its core components: tables, rows, columns, schemas, and the crucial role of primary and foreign keys in maintaining data integrity and defining relationships. We learned about SQL (Structured Query Language), the standard language for interacting with relational databases, covering its main sub-languages (DDL, DML, DCL, TCL) and focusing on the fundamental DML commands: `INSERT`, `SELECT`, `UPDATE`, and `DELETE`. The importance of the ACID properties (Atomicity, Consistency, Isolation, Durability) in guaranteeing reliable transactions within relational systems was also highlighted. Our practical workshops focused on PostgreSQL, guiding you through installation, connection, schema design using data modeling principles like normalization, and manipulating data within a sample university database.
Recognizing that the relational model isn't the only solution, we explored the diverse world of NoSQL databases. We discussed the motivations behind their development – handling Big Data challenges like scale, availability, and flexible data structures. We touched upon the CAP theorem and the BASE properties, understanding the trade-offs often made between consistency and availability in distributed NoSQL systems. We surveyed the major NoSQL categories – Key-Value Stores, Document Databases, Column-Family Stores, and Graph Databases – understanding their unique data models and common use cases. The workshop provided a taste of NoSQL using Redis, demonstrating the simplicity and speed of key-value operations.
Finally, we covered essential Database Administration Basics relevant to Linux environments. This included managing users and privileges, implementing fundamental backup and restore strategies using tools like `pg_dump` and `pg_restore`, performing basic performance monitoring using Linux system tools and database-specific commands, understanding the importance of configuration files (like `postgresql.conf` and `pg_hba.conf`), and applying key security considerations.
The key takeaway is that choosing the right database technology – whether SQL or a specific type of NoSQL – depends heavily on the specific requirements of your application regarding data structure, consistency needs, scalability demands, and query patterns. Often, modern applications employ a polyglot persistence approach, using multiple database types for different tasks.
The knowledge and practical skills gained here provide a solid platform for further exploration. Potential next steps include diving deeper into advanced SQL (joins, subqueries, window functions), learning about database indexing and performance optimization, exploring specific database technologies in more detail (e.g., advanced PostgreSQL features, MongoDB query language, Cassandra data modeling), or investigating database replication and high-availability setups. Databases are a cornerstone of modern software development and data science, and continued learning in this area is highly valuable.