Author: Nejat Hakan
Email: nejat.hakan@outlook.de
PayPal Me: https://paypal.me/nejathakan
API Development with Flask
Introduction
Welcome to the world of API development using Flask, a popular Python web framework. This guide is designed for Linux users, particularly university students seeking a deep understanding of how to build robust and scalable Application Programming Interfaces (APIs). We will start with the fundamentals and gradually progress to more advanced topics, equipping you with the knowledge and practical skills needed for real-world API development.
What is an API?
An Application Programming Interface (API) acts as an intermediary, a contract, or a set of rules that allows different software applications to communicate with each other. Think of it like a waiter in a restaurant. You (the client application) don't go directly into the kitchen (the server application) to prepare your food. Instead, you give your order (a request) to the waiter (the API), who communicates it to the kitchen. The kitchen prepares the meal (processes the request and retrieves data), and the waiter brings it back to you (the response).
APIs are fundamental to modern software development, enabling:
- Decoupling: Front-end applications (like web browsers or mobile apps) can be developed independently from back-end services.
- Integration: Different services, potentially built with different technologies, can exchange data and functionality. For example, a weather app might use an API from a meteorological service.
- Reusability: A single backend API can serve multiple clients (web, mobile, desktop).
- Abstraction: APIs hide the complex internal implementation details of a service, exposing only the necessary functionality.
Types of Web APIs:
While various API architectures exist, the most common in web development is REST (Representational State Transfer).
- REST: An architectural style, not a strict protocol. It relies on standard HTTP methods (GET, POST, PUT, DELETE), uses URLs (Uniform Resource Locators) to identify resources, and often exchanges data in formats like JSON (JavaScript Object Notation) or XML (Extensible Markup Language). REST APIs are stateless, meaning each request from a client must contain all the information needed to understand and process it; the server does not store any client context between requests. We will primarily focus on REST APIs in this guide.
- SOAP (Simple Object Access Protocol): A stricter protocol-based standard that often uses XML for message formatting and typically relies on HTTP or SMTP for transmission. It has built-in standards for security and transactions but is generally considered more complex and verbose than REST.
- GraphQL: A query language for APIs developed by Facebook. It allows clients to request exactly the data they need and nothing more, potentially reducing the number of requests and the amount of data transferred compared to REST.
What is Flask?
Flask is a microframework for Python based on the Werkzeug WSGI toolkit and the Jinja2 templating engine. The term "micro" doesn't mean Flask lacks functionality; rather, it signifies that the core framework is simple, lightweight, and aims to keep dependencies minimal. It doesn't make many decisions for you, such as which database ORM (Object-Relational Mapper) or authentication library to use. This provides developers with significant flexibility.
Key Characteristics of Flask:
- Minimalist Core: Provides basic tools like routing, request handling, response generation, and templating.
- Extensible: Offers a rich ecosystem of extensions (e.g., Flask-SQLAlchemy for databases, Flask-Migrate for migrations, Flask-JWT-Extended for authentication) that can be easily integrated to add specific functionalities.
- Flexibility: Developers have the freedom to choose libraries and design patterns that best suit their project needs.
- Werkzeug & Jinja2: Built upon robust and well-regarded libraries. Werkzeug handles the WSGI (Web Server Gateway Interface) layer, managing requests and responses, while Jinja2 is a powerful templating engine for rendering dynamic HTML (though less critical for pure API development where JSON is the primary response format).
- Built-in Development Server: Comes with a simple server suitable for development and testing.
- Integrated Unit Testing Support: Facilitates writing and running tests for your application.
Flask vs. Django (Brief Comparison):
Django is another popular Python web framework, often described as "batteries-included."
- Scope: Django is a full-stack framework, providing an ORM, admin interface, authentication system, and more out-of-the-box. Flask is a microframework, requiring extensions for similar features.
- Flexibility: Flask offers more flexibility in choosing components. Django is more opinionated, guiding developers towards specific ways of doing things.
- Learning Curve: Flask generally has a gentler initial learning curve due to its smaller core. Django's comprehensive nature can be initially overwhelming but provides a lot of built-in structure.
- Use Cases: Flask excels in smaller applications, microservices, and APIs where flexibility is paramount. Django is often favored for larger, complex applications where its built-in features provide rapid development capabilities.
Why Flask for APIs?
Flask's characteristics make it an excellent choice for building APIs:
- Simplicity: Its minimalist nature makes it easy to get started quickly. Writing a basic API endpoint requires very little boilerplate code.
- Flexibility: You can choose the best libraries for your specific needs (e.g., database interaction, serialization, authentication) without being tied to built-in components you might not need.
- Explicit Control: Flask doesn't hide much, giving you clearer control over the request/response cycle, which is crucial for API development.
- Performance: Being lightweight, Flask can be very performant, especially when paired with efficient WSGI servers like Gunicorn or uWSGI.
- Python Ecosystem: Leverages the vast and powerful Python ecosystem for tasks like data manipulation (Pandas, NumPy), database access, machine learning integration, etc.
- Ideal for Microservices: Its small footprint and flexibility make Flask a popular choice for building individual microservices within a larger distributed system.
Setting up the Linux Environment
Before we start coding, let's ensure your Linux environment is ready. Most modern Linux distributions come with Python pre-installed.
1. Verify Python Installation:
Open your terminal and type:
python3 --version
pip3 --version
You should see output indicating the installed versions (Python 3.8.x or higher is recommended). `pip` is the package installer for Python. If `python3` or `pip3` are not found, you'll need to install them using your distribution's package manager.
- Debian/Ubuntu:
  sudo apt update
  sudo apt install python3 python3-pip python3-venv
- Fedora/CentOS/RHEL:
  sudo dnf install python3 python3-pip
  (Use `yum` instead of `dnf` on older CentOS/RHEL versions.)
2. Understanding Virtual Environments (`venv`):
It is highly recommended to use virtual environments for every Python project. A virtual environment creates an isolated directory containing a specific Python interpreter and its own set of installed packages. This prevents package conflicts between different projects and keeps your global Python installation clean.
- Why use them? Imagine Project A needs version 1.0 of a library, but Project B needs version 2.0. Without virtual environments, installing version 2.0 for Project B might break Project A. Virtual environments solve this by isolating dependencies.
- Creating a Virtual Environment:
  Navigate to the directory where you want to create your Flask project. Let's call it `my_flask_api`.
  mkdir my_flask_api
  cd my_flask_api
  python3 -m venv venv
  # The command structure is: python3 -m venv <name_of_environment_directory>
  # 'venv' is a conventional name for the environment directory.
  This creates a `venv` subdirectory within `my_flask_api`.
- Activating the Virtual Environment:
  Before installing packages or running your application, you must activate the environment:
  source venv/bin/activate
  Your terminal prompt should now change, often prepended with `(venv)`, indicating the environment is active. Any `pip install` commands will now install packages into this isolated `venv` directory.
- Deactivating the Virtual Environment: When you're done working on the project, simply type:
  deactivate
3. Installing Flask:
With your virtual environment activated, install Flask using pip:
pip install Flask
Pip will download and install Flask and its dependencies (Werkzeug, Jinja2, ItsDangerous, Click).
You are now ready to start building your first Flask API!
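As a quick sanity check (run with the virtual environment active), you can confirm the package imports and report its installed version from Python. Note this sketch uses `importlib.metadata` rather than the old `flask.__version__` attribute, which was removed in Flask 3.0:

```python
# Confirm Flask is importable in the active environment and print its version
import importlib.metadata

import flask  # raises ImportError if Flask is not installed

print(importlib.metadata.version("flask"))
```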
Workshop Setting Up Your Environment
This workshop guides you through setting up your development environment on Linux for the upcoming Flask API projects.
Goal:
Create a project directory, set up a Python virtual environment, and install Flask.
Steps:
- Open Your Terminal: Launch your preferred terminal application on your Linux system.
- Create a Project Directory: Choose a location for your projects (e.g., `~/projects`). Create a directory specifically for this guide's work.
  # Navigate to where you store projects (create it if it doesn't exist)
  mkdir -p ~/projects
  cd ~/projects
  # Create the main directory for our API development work
  mkdir flask_api_course
  cd flask_api_course
  # Create a directory for the first basic API project
  mkdir basic_api
  cd basic_api
  # Verify your current directory
  pwd
  # Expected output: /home/your_username/projects/flask_api_course/basic_api (or similar)
- Create a Python Virtual Environment: Inside the `basic_api` directory, create a virtual environment named `venv`.
  python3 -m venv venv
  You should now see a `venv` subdirectory:
  ls
  # Expected output: venv
- Activate the Virtual Environment: Activate the newly created environment.
  source venv/bin/activate
  Observe your terminal prompt. It should now start with `(venv)`, like:
  (venv) your_username@hostname:~/projects/flask_api_course/basic_api$
  This confirms the environment is active.
- Install Flask: Use `pip` to install Flask within the active virtual environment.
  pip install Flask
  You will see output indicating Flask and its dependencies are being downloaded and installed.
- Verify Flask Installation: You can check if Flask is installed correctly.
  pip freeze
  # Expected output (versions might differ):
  # Click==...
  # Flask==...
  # itsdangerous==...
  # Jinja2==...
  # MarkupSafe==...
  # Werkzeug==...
  `pip freeze` lists all packages installed in the current environment. Seeing `Flask` in the list confirms the installation.
- (Optional) Deactivate and Reactivate: Practice deactivating and reactivating the environment.
  deactivate
  source venv/bin/activate
Outcome: You now have a dedicated project directory (`basic_api`) with an isolated Python environment where Flask is installed. You are ready to write your first Flask application in the next section. Remember to always activate your virtual environment (`source venv/bin/activate`) when working on the project.
1. Your First Flask API
Let's dive straight into creating a minimal but functional Flask API. This section covers the absolute basics: creating a Flask application instance, defining a route, running the development server, and returning a simple JSON response.
Hello World API
The "Hello, World!" of web frameworks is typically displaying that text in a browser. For APIs, the equivalent is often returning a simple JSON message.
1. Create the Application File:
Inside your `basic_api` directory (where your `venv` folder resides), create a Python file named `app.py`.
# Make sure you are in ~/projects/flask_api_course/basic_api
# Make sure your virtual environment is active: source venv/bin/activate
touch app.py
2. Write the Basic Flask Code:
Open `app.py` in your favorite text editor (such as VS Code, Vim, Nano, or Gedit) and add the following code:
# app.py
from flask import Flask, jsonify
# 1. Create an instance of the Flask class
# __name__ tells Flask where to look for resources like templates and static files.
app = Flask(__name__)
# 2. Define a route and a view function
# The @app.route decorator binds a URL path ('/') to the hello_world function.
@app.route('/')
def hello_world():
"""This function is executed when someone accesses the root URL ('/')."""
# 3. Return a response
# We return a simple string for now.
return "Hello, World! This is my first Flask API."
# 4. Define another route for a JSON response
@app.route('/api/hello')
def hello_api():
"""This function returns a JSON response."""
# Prepare data as a Python dictionary
message_data = {
"message": "Hello from the API!",
"version": "1.0",
"status": "success"
}
# Use jsonify to convert the dictionary to a JSON response
# It also sets the Content-Type header to 'application/json'
return jsonify(message_data)
# 5. Run the application (only if this script is executed directly)
# The check `if __name__ == '__main__':` ensures this code
# doesn't run if the file is imported as a module elsewhere.
if __name__ == '__main__':
# `debug=True` enables the interactive debugger and automatic reloading.
# NEVER use debug=True in a production environment!
# `host='0.0.0.0'` makes the server accessible from other devices on your network.
# Default is '127.0.0.1' (localhost), only accessible from your machine.
app.run(host='0.0.0.0', port=5000, debug=True)
Explanation:
- `from flask import Flask, jsonify`: Imports the necessary classes. `Flask` is the core application class, and `jsonify` is a helper function to create JSON responses.
- `app = Flask(__name__)`: Creates an instance of the Flask application. `__name__` is a special Python variable that holds the name of the current module. Flask uses this to determine the application's root path.
- `@app.route('/')`: This is a Python decorator. Decorators modify or enhance functions. `@app.route()` registers the function that follows it (`hello_world`) as a handler for requests to the specified URL path (`/`, the root URL).
- `def hello_world(): ...`: This is called a "view function". It contains the logic that processes the request and returns a response. Here, it simply returns a string.
- `@app.route('/api/hello')`: Defines another route at the `/api/hello` path.
- `def hello_api(): ...`: This view function prepares a Python dictionary and uses `jsonify(message_data)` to convert it into a proper JSON response with the correct `Content-Type: application/json` header. This is crucial for clients expecting JSON.
- `if __name__ == '__main__': app.run(...)`: This block starts Flask's built-in development server only when the script `app.py` is run directly (not imported).
  - `host='0.0.0.0'`: Listens on all available network interfaces. This allows you to access the API from other devices on your local network or even a virtual machine if networking is configured correctly. The default `'127.0.0.1'` (localhost) only allows connections from the same machine.
  - `port=5000`: Specifies the port number to listen on. 5000 is the Flask default.
  - `debug=True`: Enables debug mode. This provides:
    - Interactive Debugger: If an error occurs, a detailed traceback is shown in the browser, allowing inspection of variables. SECURITY RISK: Never use in production.
    - Auto-Reloader: The server automatically restarts when it detects code changes, speeding up development.
3. Run the Development Server:
Make sure your virtual environment is active (`source venv/bin/activate`) and you are in the `basic_api` directory. Run the application:
python3 app.py
You should see output similar to this:
* Serving Flask app 'app' (lazy loading)
* Environment: development
* Debug mode: on
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:5000
* Running on http://<your-local-ip>:5000
* Restarting with stat
* Debugger is active!
* Debugger PIN: xxx-xxx-xxx
The server is now running and listening for requests on port 5000.
4. Test the API Endpoints:
Open your web browser or use a tool like `curl` in another terminal window:
- Test the root URL:
  - Browser: Navigate to `http://127.0.0.1:5000/`
  - `curl`:
    curl http://127.0.0.1:5000/
  You should see the text: Hello, World! This is my first Flask API.
- Test the JSON API endpoint:
  - Browser: Navigate to `http://127.0.0.1:5000/api/hello`
  - `curl`:
    curl http://127.0.0.1:5000/api/hello
  You should see the JSON response:
  {"message":"Hello from the API!","status":"success","version":"1.0"}
  If using `curl`, notice that `jsonify` automatically set the `Content-Type` header. You can see headers with `curl -i`:
  curl -i http://127.0.0.1:5000/api/hello
5. Stop the Server:
Go back to the terminal where the server is running and press `Ctrl+C`.
Understanding Routes (`@app.route`)
The `@app.route()` decorator is fundamental to Flask. It maps URL paths to your Python functions (view functions).
- Syntax: `@app.route(rule, **options)`
  - `rule`: The URL path as a string (e.g., `'/'`, `'/users'`, `'/items/<int:item_id>'`).
  - `options`: Keyword arguments that configure the route. The most common is `methods`.
HTTP Methods
Web applications, especially APIs, rely heavily on HTTP methods (also called "verbs") to define the action intended for a resource. The primary methods used in REST APIs are:
- GET: Retrieve data. Used for fetching resources (e.g., get a list of users, get details of a specific user). Should be safe (no side effects like data modification) and idempotent (multiple identical requests have the same effect as one).
- POST: Create a new resource. Used for submitting data to be processed, often resulting in the creation of a new entity (e.g., create a new user, post a new message). Not idempotent (multiple identical POST requests will likely create multiple resources).
- PUT: Update an existing resource entirely. Used to replace a resource at a specific URL with the provided data (e.g., update a user's entire profile). Should be idempotent (multiple identical PUT requests with the same data will result in the same final state).
- PATCH: Partially update an existing resource. Used to apply partial modifications to a resource (e.g., update only the user's email address). Idempotency depends on the nature of the patch operation.
- DELETE: Remove a resource. Used to delete the resource identified by the URL (e.g., delete a specific user). Should be idempotent (deleting something multiple times results in the same state - it's gone).
By default, a Flask route defined with `@app.route()` only listens for GET requests (and implicitly HEAD and OPTIONS, which browsers often use). To handle other methods, you use the `methods` argument:
# Need to import request from flask
from flask import Flask, jsonify, request
# ... rest of app setup ...

@app.route('/api/items', methods=['GET', 'POST'])
def handle_items():
    if request.method == 'POST':
        # Logic to create a new item
        return jsonify({"message": "Item created"}), 201  # 201 Created status
    else:  # GET request
        # Logic to retrieve items would go here
        return jsonify({"items": []})
We will explore the `request` object in more detail in the next section.
Returning JSON Data (`jsonify`)
APIs commonly communicate using JSON because it's lightweight, human-readable, and easily parsed by JavaScript (and most other languages).
Flask's `jsonify()` function does more than just convert a Python dictionary or list to a JSON string:
- Serialization: Converts Python objects (dicts, lists, strings, numbers, booleans, None) into a JSON formatted string.
- Response Object: Wraps the JSON string in a Flask `Response` object.
- MIME Type: Sets the `Content-Type` header of the response to `application/json`. This is crucial for clients to correctly interpret the response body.
Prefer `jsonify()` when returning JSON data from your Flask API endpoints. Since Flask 1.1, returning a dictionary directly from a view is automatically serialized as JSON (and since Flask 2.2 the same applies to lists), but on older versions it raises an error instead. Using `jsonify` explicitly is unambiguous and works consistently across versions.
# Good practice
@app.route('/good')
def good_json():
data = {"status": "ok"}
return jsonify(data) # Correctly sets Content-Type: application/json
# Also valid on Flask >= 1.1, where returned dicts are auto-converted with jsonify
@app.route('/less-good')
def less_good_json():
    data = {"status": "maybe"}
    return data  # Serialized as JSON on Flask >= 1.1; older versions raise an error
Workshop Creating a Simple "Hello API"
Let's reinforce the concepts by building a slightly different "Hello API" from scratch, focusing on JSON output.
Goal:
Create a Flask API with two endpoints: one that returns a static welcome message in JSON, and another that greets a user by name (passed as part of the URL).
Steps:
- Ensure Setup:
  - Make sure you are in the `basic_api` directory (`cd ~/projects/flask_api_course/basic_api`).
  - Make sure your virtual environment is active (`source venv/bin/activate`).
  - You should have `app.py` from the previous explanation. You can either modify it or rename it (e.g., `mv app.py app_old.py`) and create a new `app.py`. Let's create a new one for clarity.
- Create `app.py`: Create a new file named `app.py` and add the following code:
  # app.py
  from flask import Flask, jsonify

  # Create the Flask application instance
  app = Flask(__name__)

  # Endpoint 1: Static welcome message
  @app.route('/api/welcome')
  def welcome():
      """Returns a static welcome message in JSON format."""
      response_data = {
          "message": "Welcome to our Simple API!",
          "endpoints_available": [
              "/api/welcome",
              "/api/greet/<name>"
          ]
      }
      return jsonify(response_data)

  # Endpoint 2: Personalized greeting (using a variable route)
  # <name> captures the part of the URL after /api/greet/ and passes it
  # as an argument called 'name' to the greet_user function.
  @app.route('/api/greet/<name>')
  def greet_user(name):
      """Returns a personalized greeting in JSON format."""
      # Basic input validation/sanitization is often needed here in real apps!
      # For now, we'll assume the name is safe.
      response_data = {
          "greeting": f"Hello, {name}! Nice to meet you."
      }
      return jsonify(response_data)

  # Run the development server
  if __name__ == '__main__':
      print("Starting Flask development server...")
      print("Access Welcome API at: http://127.0.0.1:5000/api/welcome")
      print("Access Greet API at: http://127.0.0.1:5000/api/greet/YourName")
      app.run(host='0.0.0.0', port=5000, debug=True)
- Run the Application: In your terminal (with the virtual environment active):
  python3 app.py
- Test the Endpoints:
  - Welcome Endpoint: Open a new terminal or use your browser.
    curl http://127.0.0.1:5000/api/welcome
  - Greet Endpoint: Try different names in the URL.
    curl http://127.0.0.1:5000/api/greet/Alice
    # Expected Output:
    # {
    #   "greeting": "Hello, Alice! Nice to meet you."
    # }
    curl http://127.0.0.1:5000/api/greet/Bob
    # Expected Output:
    # {
    #   "greeting": "Hello, Bob! Nice to meet you."
    # }
    Try using spaces (URL encoding `%20` might be needed or automatically handled by `curl`):
    curl "http://127.0.0.1:5000/api/greet/Ada%20Lovelace"
- Stop the Server: Press `Ctrl+C` in the terminal running the Flask app.
Outcome: You have successfully created a Flask API with two distinct GET endpoints. One returns static JSON data, and the other demonstrates basic routing with variable parts, accepting input directly from the URL path and incorporating it into the JSON response. You also practiced using `jsonify` consistently.
2. Routing and Request Handling
Now that you can create basic endpoints, let's explore Flask's routing capabilities in more detail and learn how to access incoming request data, which is essential for building interactive APIs.
Variable Rules in Routes
As seen in the previous workshop (`/api/greet/<name>`), you can define dynamic parts in your URL routes. These parts are captured and passed as keyword arguments to your view function.
- Syntax: `<variable_name>`
- Example: `@app.route('/users/<user_id>')` will match URLs like `/users/123`, `/users/abc`, etc. The value (`123` or `abc`) will be passed as the `user_id` argument to the view function.
from flask import Flask, jsonify
app = Flask(__name__)
@app.route('/product/<product_sku>')
def get_product(product_sku):
# In a real app, you'd look up the product_sku in a database
return jsonify({
"message": f"Fetching details for product SKU: {product_sku}",
"sku": product_sku
})
if __name__ == '__main__':
app.run(debug=True)
# Test with: curl http://127.0.0.1:5000/product/XYZ-789
By default, captured variables are treated as strings.
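You can verify this without starting a server by using Flask's built-in test client. A minimal sketch (the `/echo/<value>` route is hypothetical, for illustration only):

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/echo/<value>')
def echo(value):
    # A captured URL segment always arrives as a str by default
    return jsonify({"value": value, "python_type": type(value).__name__})

# Exercise the route without starting a server, via Flask's test client
with app.test_client() as client:
    body = client.get('/echo/123').get_json()
    print(body["value"], body["python_type"])  # 123 str
```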
Type Converters
Sometimes, you need the captured variable to be a specific type, like an integer or a float. Flask provides built-in converters for this.
- Syntax: `<converter:variable_name>`
- Common Converters:
  - `string`: (Default) Accepts any text without a slash.
  - `int`: Accepts positive integers.
  - `float`: Accepts positive floating-point values.
  - `path`: Like `string`, but also accepts slashes (useful for capturing file paths).
  - `uuid`: Accepts UUID strings.
Example using `int`:
from flask import Flask, jsonify, abort
app = Flask(__name__)
# Sample data (replace with database later)
users = {
1: {"name": "Alice", "email": "alice@example.com"},
2: {"name": "Bob", "email": "bob@example.com"}
}
@app.route('/api/users/<int:user_id>')
def get_user_by_id(user_id):
# user_id is now guaranteed to be an integer by Flask's routing
print(f"Requested user ID: {user_id}, Type: {type(user_id)}")
user = users.get(user_id)
if user:
return jsonify(user)
else:
# Abort with a 404 Not Found error if user doesn't exist
abort(404, description=f"User with ID {user_id} not found")
# Custom error handler for 404 errors (more on this later)
@app.errorhandler(404)
def resource_not_found(e):
return jsonify(error=str(e)), 404
if __name__ == '__main__':
app.run(host='0.0.0.0', port=5000, debug=True)
# Test with:
# curl http://127.0.0.1:5000/api/users/1 -> Success (Alice)
# curl http://127.0.0.1:5000/api/users/2 -> Success (Bob)
# curl http://127.0.0.1:5000/api/users/3 -> 404 Not Found (JSON error)
# curl http://127.0.0.1:5000/api/users/abc -> 404 Not Found (Flask routing fails match)
If you try to access `/api/users/abc`, Flask's router won't even match the route because "abc" cannot be converted to an integer by the `int:` converter, resulting in a standard 404 Not Found response (likely HTML in debug mode, or a plain text 404). If you request `/api/users/3`, the route matches and the `get_user_by_id` function runs, but since user 3 isn't in our `users` dictionary, we explicitly `abort(404)`, triggering our custom JSON 404 error handler.
Using converters provides automatic type validation at the routing level.
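The remaining converters work the same way. A short sketch (hypothetical routes, separate from the users example above) covering `float`, `path`, and `uuid`:

```python
import uuid

from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/api/price/<float:amount>')
def show_price(amount):
    # amount arrives as a Python float (the URL must contain a dot, e.g. 19.99)
    return jsonify({"amount": amount, "type": type(amount).__name__})

@app.route('/api/files/<path:file_path>')
def show_file(file_path):
    # path accepts slashes: /api/files/docs/readme.txt captures "docs/readme.txt"
    return jsonify({"file_path": file_path})

@app.route('/api/resources/<uuid:resource_id>')
def show_resource(resource_id):
    # resource_id is a uuid.UUID instance; convert it to str before serializing
    return jsonify({"resource_id": str(resource_id)})

with app.test_client() as client:
    print(client.get('/api/price/19.99').get_json())
    print(client.get('/api/files/docs/readme.txt').get_json())
    print(client.get(f'/api/resources/{uuid.uuid4()}').get_json())
    # A non-matching value falls through to 404, just like the int converter
    print(client.get('/api/price/cheap').status_code)  # 404
```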
Accessing Request Data (the `request` object)
APIs often need to receive data from the client beyond just the URL path. This could be query parameters, data in the request body (like JSON payloads for POST/PUT requests), or headers. Flask provides the global `request` object (imported from `flask`) to access this information within a view function.
Important: The `request` object is a context local. This means it acts like a global variable, but its value is specific to the current incoming request and thread. You can safely access it within a view function without passing it explicitly.
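This context-local behavior can be observed directly with `app.test_request_context()`, a built-in Flask helper that simulates an incoming request without running a server:

```python
from flask import Flask, request

app = Flask(__name__)

# Outside of a request, the context-local `request` is unbound; pushing a
# test request context makes it refer to this simulated request.
with app.test_request_context('/api/search?q=flask&limit=10', method='GET'):
    print(request.path)               # /api/search
    print(request.method)             # GET
    print(request.args.get('q'))      # flask
    print(request.args.get('limit'))  # 10 (a string: query values are not auto-converted)
```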
from flask import Flask, jsonify, request, abort
app = Flask(__name__)
# ... (add users dictionary and error handler from previous example) ...
1. Query Parameters (`request.args`):
These are key-value pairs appended to the URL after a `?`, separated by `&`. Example: `/search?q=flask&limit=10`. They are typically used with GET requests for filtering, sorting, or pagination.
- `request.args`: An `ImmutableMultiDict` (a dictionary-like object that can hold multiple values for the same key).
- `request.args.get(key, default=None, type=None)`: Recommended way to access parameters. It returns `None` if the key doesn't exist (avoiding `KeyError`) and allows specifying a default value and a type conversion function (e.g., `type=int`).
@app.route('/api/search')
def search():
query = request.args.get('q') # Get the 'q' parameter
limit = request.args.get('limit', default=20, type=int) # Get 'limit', default 20, as int
sort_by = request.args.get('sort')
if not query:
return jsonify({"error": "Missing 'q' query parameter"}), 400 # Bad Request
# In a real app, perform search based on query, limit, sort_by
results = [
f"Result for '{query}' - 1",
f"Result for '{query}' - 2",
]
return jsonify({
"query_received": query,
"limit_applied": limit,
"sorting": sort_by if sort_by else "default",
"results": results[:limit] # Apply the limit
})
# Test with:
# curl http://127.0.0.1:5000/api/search?q=python -> Uses limit=20
# curl "http://127.0.0.1:5000/api/search?q=flask&limit=5&sort=date" -> Uses limit=5, sort=date
# curl http://127.0.0.1:5000/api/search -> Error 400
# curl "http://127.0.0.1:5000/api/search?q=test&limit=abc" -> limit will be default (20) due to type conversion failure
(Note: quote URLs containing `&` in most shells to prevent misinterpretation.)
2. Request Body Data (`request.json`, `request.form`, `request.data`):
Used primarily with POST, PUT, PATCH methods to send data to the server.
- `request.json`: Parses the incoming request body as JSON and returns a Python dictionary or list. Depending on the Flask version, a body that isn't valid JSON or a `Content-Type` header that isn't `application/json` either yields `None` or raises an error. Use `request.get_json()` for more control (e.g., `force=True` to parse even without the correct header, `silent=True` to return `None` on error instead of raising one). This is the most common way to receive data in modern APIs.
- `request.form`: Parses data submitted from an HTML form (`Content-Type: application/x-www-form-urlencoded` or `multipart/form-data`). Returns an `ImmutableMultiDict`.
- `request.data`: Returns the raw request body as bytes. Useful if the data isn't form data or JSON, or if you need to process it differently.
- `request.files`: Used specifically for file uploads (`multipart/form-data`). Returns an `ImmutableMultiDict` where values are `FileStorage` objects.
Example using `request.get_json()` for a POST request:
# In-memory storage for simplicity
items_db = {}
next_item_id = 1
@app.route('/api/items', methods=['POST'])
def create_item():
global next_item_id # Allow modification of the global variable
# Check if the request content type is JSON
if not request.is_json:
return jsonify({"error": "Request must be JSON"}), 415 # Unsupported Media Type
# Get JSON data from the request body
    # Use get_json(silent=True) so parse failures return None instead of raising an error
    item_data = request.get_json(silent=True)
# Basic validation
if not item_data:
return jsonify({"error": "Invalid JSON data"}), 400
if 'name' not in item_data or 'price' not in item_data:
return jsonify({"error": "Missing 'name' or 'price' in request body"}), 400
if not isinstance(item_data['name'], str) or not isinstance(item_data['price'], (int, float)):
return jsonify({"error": "'name' must be a string, 'price' must be a number"}), 400
# Create the new item
new_item = {
"id": next_item_id,
"name": item_data['name'],
"price": item_data['price'],
"description": item_data.get("description") # Optional field
}
items_db[next_item_id] = new_item
next_item_id += 1
# Return the created item and a 201 Created status code
return jsonify(new_item), 201
# Add a GET endpoint to see the items
@app.route('/api/items', methods=['GET'])
def get_all_items():
return jsonify(list(items_db.values()))
# Test POST with curl:
# curl -X POST http://127.0.0.1:5000/api/items \
# -H "Content-Type: application/json" \
# -d '{"name": "Laptop", "price": 1200.50, "description": "A powerful laptop"}'
#
# curl -X POST http://127.0.0.1:5000/api/items \
# -H "Content-Type: application/json" \
# -d '{"name": "Mouse", "price": 25}'
#
# Test GET:
# curl http://127.0.0.1:5000/api/items
- `-X POST`: Specifies the HTTP method.
- `-H "Content-Type: application/json"`: Sets the required header.
- `-d '...'`: Provides the request body data (the JSON string).
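If you prefer testing from Python rather than `curl`, Flask's test client can issue the same JSON POST without a running server. This sketch re-declares a simplified version of the `create_item` endpoint so it is self-contained:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
items_db = {}
next_item_id = 1

@app.route('/api/items', methods=['POST'])
def create_item():
    global next_item_id
    if not request.is_json:
        return jsonify({"error": "Request must be JSON"}), 415
    item_data = request.get_json(silent=True)
    if not item_data or 'name' not in item_data or 'price' not in item_data:
        return jsonify({"error": "Missing 'name' or 'price'"}), 400
    new_item = {"id": next_item_id, "name": item_data['name'], "price": item_data['price']}
    items_db[next_item_id] = new_item
    next_item_id += 1
    return jsonify(new_item), 201

with app.test_client() as client:
    # json=... sets the body and the Content-Type: application/json header for us
    resp = client.post('/api/items', json={"name": "Laptop", "price": 1200.50})
    print(resp.status_code)  # 201
    print(resp.get_json())

    # A malformed request (missing 'price') is rejected with 400
    resp = client.post('/api/items', json={"name": "Mouse"})
    print(resp.status_code)  # 400
```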
3. Request Headers (`request.headers`):
Access incoming request headers (e.g., `Authorization`, `Accept`, `User-Agent`).
- `request.headers`: A dictionary-like object (case-insensitive keys).
- `request.headers.get('Header-Name')`: Access a specific header.
@app.route('/api/show-headers')
def show_headers():
user_agent = request.headers.get('User-Agent')
accept_header = request.headers.get('Accept')
# Convert headers to a regular dictionary for jsonify compatibility
headers_dict = dict(request.headers)
return jsonify({
"message": "Showing selected request headers",
"user_agent": user_agent,
"accept_header": accept_header,
"all_headers": headers_dict
})
# Test with:
# curl http://127.0.0.1:5000/api/show-headers
# curl -H "X-Custom-Header: MyValue" http://127.0.0.1:5000/api/show-headers
Handling Different HTTP Methods in One Route
As shown previously, you can specify multiple methods in the `methods` list of `@app.route()`. Inside the view function, you then use `request.method` to determine which HTTP method was used for the current request and execute the appropriate logic.
# Assume items_db exists from previous example
@app.route('/api/items/<int:item_id>', methods=['GET', 'PUT', 'DELETE'])
def handle_single_item(item_id):
item = items_db.get(item_id)
if request.method == 'GET':
if item:
return jsonify(item)
else:
abort(404, description=f"Item with ID {item_id} not found")
elif request.method == 'PUT':
if not item:
abort(404, description=f"Item with ID {item_id} not found")
if not request.is_json:
return jsonify({"error": "Request must be JSON"}), 415
update_data = request.get_json()
if not update_data or 'name' not in update_data or 'price' not in update_data:
return jsonify({"error": "Missing 'name' or 'price' in request body for PUT"}), 400
# Update the item in place (or replace entirely)
item['name'] = update_data['name']
item['price'] = update_data['price']
item['description'] = update_data.get('description', item.get('description')) # Update if provided
items_db[item_id] = item # Ensure it's updated in our 'db'
return jsonify(item) # Return updated item
elif request.method == 'DELETE':
if item:
del items_db[item_id]
# No Content response typically has an empty body
return '', 204 # 204 No Content status
else:
abort(404, description=f"Item with ID {item_id} not found")
# Test PUT (assuming item 1 exists):
# curl -X PUT http://127.0.0.1:5000/api/items/1 \
# -H "Content-Type: application/json" \
# -d '{"name": "Gaming Laptop", "price": 1500.00, "description": "Upgraded laptop"}'
#
# Test DELETE (assuming item 2 exists):
# curl -X DELETE http://127.0.0.1:5000/api/items/2
# curl http://127.0.0.1:5000/api/items/2 -> Should now be 404
Workshop Building a Simple Data Retrieval API
Goal:
Create an API to manage a simple in-memory collection of "tasks". Implement endpoints to:
- Get all tasks (`GET /api/tasks`).
- Get a specific task by ID (`GET /api/tasks/<int:task_id>`).
- Create a new task (`POST /api/tasks`).
- Allow filtering tasks by status using query parameters (e.g., `/api/tasks?status=pending`).
Steps:
1. Setup:
   - Continue working in the `basic_api` directory.
   - Make sure the virtual environment is active (`source venv/bin/activate`).
   - Create a new `app.py` or modify the existing one. Let's start fresh.
2. Create `app.py`:

# app.py
from flask import Flask, jsonify, request, abort

app = Flask(__name__)

# In-memory storage for tasks (list of dictionaries)
tasks_db = [
    {"id": 1, "title": "Learn Flask Basics", "status": "completed"},
    {"id": 2, "title": "Build First API", "status": "completed"},
    {"id": 3, "title": "Explore Routing", "status": "pending"},
]
next_task_id = 4  # Keep track of the next ID to assign

# Error Handler for 404
@app.errorhandler(404)
def not_found(error):
    # The description comes from abort(404, description=...)
    return jsonify({"error": error.description}), 404

# Error Handler for 400
@app.errorhandler(400)
def bad_request(error):
    return jsonify({"error": error.description}), 400

# Error Handler for 415
@app.errorhandler(415)
def unsupported_media_type(error):
    return jsonify({"error": error.description}), 415

# 1. Get all tasks (with optional filtering by status)
@app.route('/api/tasks', methods=['GET'])
def get_tasks():
    # Check for optional 'status' query parameter
    status_filter = request.args.get('status')  # e.g., 'pending' or 'completed'
    if status_filter:
        # Filter tasks based on the provided status
        filtered_tasks = [task for task in tasks_db if task['status'] == status_filter]
        return jsonify(filtered_tasks)
    else:
        # Return all tasks if no filter is applied
        return jsonify(tasks_db)

# 2. Get a specific task by ID
@app.route('/api/tasks/<int:task_id>', methods=['GET'])
def get_task(task_id):
    # Find the task with the matching ID using a generator expression
    # and next() to get the first match or None
    task = next((task for task in tasks_db if task['id'] == task_id), None)
    if task:
        return jsonify(task)
    else:
        abort(404, description=f"Task with ID {task_id} not found")

# 3. Create a new task
@app.route('/api/tasks', methods=['POST'])
def create_task():
    global next_task_id
    # Expecting JSON data
    if not request.is_json:
        abort(415, description="Request must be JSON")
    data = request.get_json()
    # Basic validation
    if not data or 'title' not in data:
        abort(400, description="Missing 'title' in request body")
    if not isinstance(data['title'], str) or len(data['title'].strip()) == 0:
        abort(400, description="'title' must be a non-empty string")
    # Create the new task dictionary
    new_task = {
        "id": next_task_id,
        "title": data['title'].strip(),
        "status": "pending"  # New tasks default to pending
    }
    # Add to our 'database'
    tasks_db.append(new_task)
    next_task_id += 1
    # Return the created task and 201 status code
    return jsonify(new_task), 201

if __name__ == '__main__':
    print("Task API running...")
    print("Endpoints:")
    print("  GET  /api/tasks")
    print("  GET  /api/tasks?status=<status>")
    print("  GET  /api/tasks/<task_id>")
    print("  POST /api/tasks (Body: {'title': 'Your task title'})")
    app.run(host='0.0.0.0', port=5000, debug=True)
3. Run the Application:
   python app.py
4. Test the Endpoints (use `curl` or an API testing tool like Postman/Insomnia):
   - Get All Tasks: (Should return the initial 3 tasks)
   - Get Pending Tasks: (Should return only task 3)
   - Get Completed Tasks: (Should return tasks 1 and 2)
   - Get Task by ID (Success): (Should return task 1)
   - Get Task by ID (Not Found): (Should return a 404 status and a JSON error message)
   - Create New Task (Success): (Should return the new task with id=4 and status 201 Created)
   - Create New Task (Missing Title): (Should return a 400 status and a JSON error message)
   - Create New Task (Non-JSON): (Should return a 415 status and a JSON error message)
   - Verify Creation: (Should now show 4 tasks, including "Learn Advanced Flask")
5. Stop the Server: Press Ctrl+C.
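If you prefer scripted checks over manual curl commands, Flask's built-in test client can drive the same endpoints in-process. The sketch below re-creates a trimmed version of the workshop app inline so it is self-contained (the routes mirror the workshop's `app.py`; the exact validation details are simplified):

```python
# Sketch: exercising the workshop's endpoints with Flask's test client.
# This is a trimmed, self-contained re-creation of the workshop app.
from flask import Flask, jsonify, request, abort

app = Flask(__name__)
tasks_db = [
    {"id": 1, "title": "Learn Flask Basics", "status": "completed"},
    {"id": 2, "title": "Build First API", "status": "completed"},
    {"id": 3, "title": "Explore Routing", "status": "pending"},
]
next_task_id = 4

@app.errorhandler(404)
def not_found(error):
    # Return the abort() description as JSON
    return jsonify({"error": error.description}), 404

@app.route('/api/tasks', methods=['GET'])
def get_tasks():
    status = request.args.get('status')
    if status:
        return jsonify([t for t in tasks_db if t['status'] == status])
    return jsonify(tasks_db)

@app.route('/api/tasks/<int:task_id>', methods=['GET'])
def get_task(task_id):
    task = next((t for t in tasks_db if t['id'] == task_id), None)
    if task is None:
        abort(404, description=f"Task with ID {task_id} not found")
    return jsonify(task)

@app.route('/api/tasks', methods=['POST'])
def create_task():
    global next_task_id
    data = request.get_json(silent=True)
    if not data or 'title' not in data:
        return jsonify({"error": "Missing 'title'"}), 400
    new_task = {"id": next_task_id, "title": data['title'], "status": "pending"}
    tasks_db.append(new_task)
    next_task_id += 1
    return jsonify(new_task), 201

# Drive the API without starting a real server
client = app.test_client()
assert client.get('/api/tasks').status_code == 200
assert len(client.get('/api/tasks?status=pending').get_json()) == 1
assert client.get('/api/tasks/99').status_code == 404
created = client.post('/api/tasks', json={"title": "Learn Advanced Flask"})
assert created.status_code == 201 and created.get_json()['id'] == 4
print("All checks passed")
```

The test client issues real request/response cycles through Flask's routing and error handlers, so the status codes and JSON bodies match what curl would see.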
Outcome: You have built a functional API for managing tasks using GET and POST methods. You've implemented variable routes with type converters (`<int:task_id>`), accessed query parameters (`request.args`), handled JSON request bodies (`request.get_json()`), performed basic validation, and used `abort` to signal errors gracefully, returning appropriate status codes and JSON error messages via custom error handlers.
3. Response Formatting and Status Codes
Returning data is only part of the story; how you structure the response and what status code you use significantly impacts the usability and correctness of your API. Clients rely on these details to understand the outcome of their requests.
Crafting JSON Responses
We've already seen `jsonify` as the primary tool for creating JSON responses. Consistency in your JSON structure is key for API consumers. Common patterns include:
- Data Envelope: Wrapping the primary data within a key, often `data`. This allows adding metadata alongside the main payload, and works for both single objects and lists.
- Direct Data: For simple cases, returning the object or list directly is also common.
- Error Responses: Always return errors in a consistent JSON format. Include a descriptive message and potentially an error code or type. (A simpler `{"error": ...}` shape, as used in our previous error handlers, also works.)
Choose a pattern and stick to it throughout your API. The data envelope approach offers more flexibility for adding metadata later without breaking clients that expect a specific top-level structure.
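To make the three patterns concrete, here is a minimal sketch of each shape as plain Python dictionaries (field names such as `data`, `meta`, and `error` are conventions, not Flask requirements):

```python
# Illustrative only: three common JSON response shapes.
import json

# 1. Data envelope: primary payload under "data", with room for metadata.
envelope = {
    "data": [{"id": 1, "title": "Learn Flask Basics"}],
    "meta": {"count": 1, "page": 1},
}

# 2. Direct data: the object or list itself, with no wrapper.
direct = [{"id": 1, "title": "Learn Flask Basics"}]

# 3. Consistent error shape: a message plus an optional machine-readable code.
error = {"error": {"code": "not_found", "message": "Task with ID 99 not found"}}

# All three serialize to valid JSON; pick one convention and use it everywhere.
for payload in (envelope, direct, error):
    print(json.dumps(payload))
```

In a Flask view, any of these dictionaries would simply be passed to `jsonify`.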
Standard HTTP Status Codes
HTTP status codes are three-digit numbers returned by the server indicating the result of the client's request. Using the correct codes is crucial for RESTful API design. Clients (and intermediate proxies/caches) use these codes to determine how to proceed.
Here are some of the most important categories and codes for APIs:
2xx Success: The request was successfully received, understood, and accepted.
- `200 OK`: Standard response for successful GET requests. Also used for successful PUT/PATCH updates if no specific content is returned or if the updated resource is returned in the body.
- `201 Created`: The request has been fulfilled and resulted in one or more new resources being created. Typically returned after successful POST requests. The response body should contain the newly created resource(s), and the `Location` header should contain the URL of the new resource.

  # Inside a POST view function after creating 'new_task'
  # Need: from flask import url_for
  response = jsonify(new_task)
  response.status_code = 201
  # Optionally set the Location header
  response.headers['Location'] = url_for('get_task', task_id=new_task['id'], _external=True)
  return response

  (Note: `url_for` generates URLs based on view function names.)
- `202 Accepted`: The request has been accepted for processing, but the processing has not been completed (e.g., for asynchronous operations). The response might indicate how to track the status later.
- `204 No Content`: The server successfully processed the request but does not need to return any content. Often used for successful DELETE requests or PUT updates where returning the resource is unnecessary. The response body must be empty.

3xx Redirection: Further action needs to be taken by the client to fulfill the request. (Less common in typical data APIs, more in web apps.)
- `301 Moved Permanently`: The requested resource has been assigned a new permanent URI.
- `302 Found` / `307 Temporary Redirect`: The resource temporarily resides under a different URI.

4xx Client Errors: The client seems to have made an error.
- `400 Bad Request`: The server cannot process the request due to a client error (e.g., malformed request syntax, invalid request message framing, deceptive request routing, missing required parameters, validation errors). This is a very common error code.
- `401 Unauthorized`: Authentication is required, and the client has not provided valid credentials or is not authenticated. The response should include a `WWW-Authenticate` header.
- `403 Forbidden`: The client is authenticated but does not have permission to access the requested resource. The server understood the request but refuses to authorize it.
- `404 Not Found`: The server cannot find the requested resource. This applies to non-existent URLs or specific resources identified within a valid URL (e.g., `/users/999` where user 999 doesn't exist).
- `405 Method Not Allowed`: The method specified in the request (e.g., POST) is not allowed for the resource identified by the request URI (e.g., trying to POST to a read-only resource). The response must include an `Allow` header listing the valid methods (e.g., `Allow: GET, PUT`). Flask often handles this automatically if a route exists but doesn't support the requested method.
- `409 Conflict`: The request could not be completed due to a conflict with the current state of the resource (e.g., trying to create a resource that already exists with a unique identifier).
- `415 Unsupported Media Type`: The server refuses to accept the request because the payload format (indicated by the `Content-Type` header) is not supported by the target resource for the requested method.
- `422 Unprocessable Entity`: The server understands the content type and syntax of the request entity, but was unable to process the contained instructions (e.g., semantic errors, validation errors in the data itself). Often used when `400 Bad Request` is too generic for validation failures.
- `429 Too Many Requests`: The user has sent too many requests in a given amount of time ("rate limiting").

5xx Server Errors: The server failed to fulfill an apparently valid request.
- `500 Internal Server Error`: A generic error message, given when an unexpected condition was encountered and no more specific message is suitable. This usually indicates a bug in the server-side code. Avoid letting unhandled exceptions bubble up to the user as raw 500 errors in production; use error handlers to catch them and return a structured JSON error response instead.
- `501 Not Implemented`: The server does not support the functionality required to fulfill the request (e.g., an unrecognized request method).
- `503 Service Unavailable`: The server is currently unable to handle the request due to temporary overloading or maintenance. This is usually a temporary condition.
Returning Status Codes in Flask:
You can return the status code as a second item in the tuple returned by the view function:
return jsonify({"message": "Resource created"}), 201
return jsonify({"error": "Invalid input"}), 400
return '', 204 # Empty body for 204 No Content
Alternatively, you can create a `Response` object explicitly (as `jsonify` does internally) and set its `status_code` attribute.
Customizing Headers in Responses
Besides the `Content-Type` set by `jsonify`, you might need to set other HTTP headers in your response (e.g., `Location`, `Cache-Control`, `WWW-Authenticate`, or custom headers like `X-Request-ID`).
You can return headers as a third item in the tuple returned by the view function (as a dictionary or list of tuples):
@app.route('/api/resource', methods=['POST'])
def create_resource():
# ... creation logic ...
new_id = 123
data = {"id": new_id, "message": "Resource created"}
headers = {
'Location': f'/api/resource/{new_id}', # Relative URL
'X-Custom-Info': 'Some value'
}
# Return (body, status_code, headers)
return jsonify(data), 201, headers
# Test with curl -i to see headers
# curl -i -X POST http://127.0.0.1:5000/api/resource
You can also modify the `headers` attribute of an explicit `Response` object before returning it.
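As a sketch of this explicit-`Response` approach (the route path and header values here are illustrative, not part of the workshop API):

```python
# Sketch: build the Response object yourself, then set status_code and
# headers attributes before returning, instead of using the tuple form.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/api/resource/<int:res_id>')
def get_resource(res_id):
    response = jsonify({"id": res_id})   # jsonify() returns a Response object
    response.status_code = 200           # same effect as returning (body, 200)
    response.headers['Cache-Control'] = 'no-store'
    response.headers['X-Request-ID'] = 'demo-123'  # illustrative custom header
    return response

# Exercise the route in-process with the test client
client = app.test_client()
resp = client.get('/api/resource/7')
print(resp.status_code, resp.headers.get('X-Request-ID'))
```

Both styles produce the same wire-level response; the explicit object is handy when headers or the status depend on logic that runs after the body is built.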
Workshop Enhancing API Responses
Goal:
Refine the Task API from the previous workshop by:
- Implementing standard HTTP status codes (`200`, `201`, `204`, `400`, `404`).
- Adding the `Location` header for `201 Created` responses.
- Implementing a `DELETE` method for tasks, returning `204 No Content`.
- Adopting a consistent JSON response structure (a simple `data` envelope for success, `error` for failures).
Steps:
1. Setup:
   - Continue in the `basic_api` directory.
   - Activate the virtual environment (`source venv/bin/activate`).
   - Use the `app.py` from the previous workshop as a starting point.
- Continue in the
2. Modify `app.py`:

# app.py
from flask import Flask, jsonify, request, abort, url_for  # Added url_for

app = Flask(__name__)

# In-memory storage
tasks_db = [
    {"id": 1, "title": "Learn Flask Basics", "status": "completed"},
    {"id": 2, "title": "Build First API", "status": "completed"},
    {"id": 3, "title": "Explore Routing", "status": "pending"},
]
next_task_id = 4

# --- Consistent Response Functions ---
def make_success_response(data_payload, status_code=200):
    """Creates a standardized success JSON response."""
    response = jsonify({"status": "success", "data": data_payload})
    response.status_code = status_code
    return response

def make_error_response(message, status_code):
    """Creates a standardized error JSON response."""
    response = jsonify({"status": "error", "error": {"message": message}})
    response.status_code = status_code
    return response

# --- Error Handlers ---
@app.errorhandler(404)
def not_found(error):
    return make_error_response(error.description or "Resource not found", 404)

@app.errorhandler(400)
def bad_request(error):
    return make_error_response(error.description or "Bad request", 400)

@app.errorhandler(415)
def unsupported_media_type(error):
    return make_error_response(error.description or "Unsupported media type", 415)

@app.errorhandler(500)  # Catch unexpected errors
def internal_server_error(error):
    # Log the error here in a real application!
    print(f"!!! Internal Server Error: {error}")
    return make_error_response("An internal server error occurred", 500)

# --- Routes ---
@app.route('/api/tasks', methods=['GET'])
def get_tasks():
    status_filter = request.args.get('status')
    if status_filter:
        filtered_tasks = [task for task in tasks_db if task['status'] == status_filter]
        return make_success_response(filtered_tasks)  # Default 200 OK
    else:
        return make_success_response(tasks_db)  # Default 200 OK

@app.route('/api/tasks/<int:task_id>', methods=['GET'])
def get_task(task_id):
    task = next((task for task in tasks_db if task['id'] == task_id), None)
    if task:
        return make_success_response(task)  # Default 200 OK
    else:
        # Abort triggers the 404 error handler
        abort(404, description=f"Task with ID {task_id} not found")

@app.route('/api/tasks', methods=['POST'])
def create_task():
    global next_task_id
    if not request.is_json:
        abort(415, description="Request must be JSON")
    data = request.get_json()
    if not data or 'title' not in data:
        abort(400, description="Missing 'title' in request body")
    # Check the type BEFORE calling .strip(), so a non-string title
    # yields a clean 400 instead of an AttributeError (500)
    if not isinstance(data['title'], str) or len(data['title'].strip()) == 0:
        abort(400, description="'title' must be a non-empty string")
    title = data['title'].strip()
    new_task = {
        "id": next_task_id,
        "title": title,
        "status": "pending"
    }
    tasks_db.append(new_task)
    next_task_id += 1
    # Create a 201 Created response with Location header
    response = make_success_response(new_task, 201)  # Use 201 status code
    try:
        # Generate the URL for the newly created task
        task_url = url_for('get_task', task_id=new_task['id'], _external=True)
        response.headers['Location'] = task_url
    except Exception as e:
        # Log this error in a real app - url_for might fail if endpoint name changes etc.
        print(f"Warning: Could not generate Location header. Error: {e}")
    return response

# New DELETE endpoint
@app.route('/api/tasks/<int:task_id>', methods=['DELETE'])
def delete_task(task_id):
    # No 'global' needed: del only mutates the list, it does not reassign it
    task_index = -1
    for i, task in enumerate(tasks_db):
        if task['id'] == task_id:
            task_index = i
            break
    if task_index != -1:
        del tasks_db[task_index]
        # Return 204 No Content with an empty body
        return '', 204
    else:
        abort(404, description=f"Task with ID {task_id} not found")

# Add PUT for updating task status (example)
@app.route('/api/tasks/<int:task_id>/status', methods=['PUT'])
def update_task_status(task_id):
    task = next((task for task in tasks_db if task['id'] == task_id), None)
    if not task:
        abort(404, description=f"Task with ID {task_id} not found")
    if not request.is_json:
        abort(415, description="Request must be JSON")
    data = request.get_json()
    if not data or 'status' not in data:
        abort(400, description="Missing 'status' in request body")
    new_status = data['status']
    if new_status not in ['pending', 'completed']:
        abort(400, description="Invalid status value. Must be 'pending' or 'completed'.")
    task['status'] = new_status
    return make_success_response(task)  # 200 OK

if __name__ == '__main__':
    print("Enhanced Task API running...")
    print("Endpoints:")
    print("  GET    /api/tasks")
    print("  GET    /api/tasks?status=<status>")
    print("  POST   /api/tasks (Body: {'title': '...'})")
    print("  GET    /api/tasks/<id>")
    print("  DELETE /api/tasks/<id>")
    print("  PUT    /api/tasks/<id>/status (Body: {'status': 'pending'|'completed'})")
    app.run(host='0.0.0.0', port=5000, debug=True)
3. Run the Application:
   python app.py
4. Test the Enhanced Endpoints (use `curl -i` to see status codes and headers):
   - Get All Tasks (Check Status 200 and structure):
   - Create Task (Check Status 201, Location header, structure):

     curl -i -X POST http://127.0.0.1:5000/api/tasks \
       -H "Content-Type: application/json" \
       -d '{"title": "Review Status Codes"}'
     # Expected: HTTP/1.1 201 Created, Content-Type: application/json
     # Expected: Location header pointing to http://.../api/tasks/4
     # Body: {"status": "success", "data": {"id": 4, "title": "Review Status Codes", "status": "pending"}}

   - Get Created Task (Check Status 200):
   - Update Task Status (Check Status 200):
   - Delete Task (Check Status 204):
   - Try to Get Deleted Task (Check Status 404):
   - Try Invalid Request (Check Status 400):
5. Stop the Server: Press Ctrl+C.
Outcome: You have significantly improved the Task API by implementing standard HTTP status codes, providing meaningful `Location` headers upon resource creation, correctly handling `DELETE` requests with `204 No Content`, and standardizing the JSON response format for both success and error cases using helper functions. This makes the API more predictable and easier for clients to consume correctly. You also added a basic `PUT` example and a generic 500 error handler.
4. Working with Databases (SQLAlchemy)
So far, our APIs have used simple Python lists or dictionaries stored in memory (`tasks_db`, `items_db`). This approach has a major drawback: the data is volatile. Every time you stop and restart the Flask application, all the data is lost. For any real-world application, you need persistent storage: a way to store data permanently so it survives application restarts and system reboots.
Relational databases (like SQLite, PostgreSQL, MySQL) are the most common solution for storing structured data in web applications. Interacting directly with databases using raw SQL queries can be tedious and error-prone. An Object-Relational Mapper (ORM) provides a higher-level abstraction, allowing you to interact with your database using Python objects and methods, translating these operations into SQL behind the scenes.
SQLAlchemy is the de facto standard ORM in the Python world. It's incredibly powerful and flexible. Flask-SQLAlchemy is a Flask extension that integrates SQLAlchemy seamlessly into your Flask applications, simplifying configuration and session management.
Why Use an ORM like SQLAlchemy?
- Abstraction: Write database interactions using Python code (classes, objects, methods) instead of raw SQL strings.
- Database Agnosticism: Write code that can often work with different database backends (SQLite, PostgreSQL, MySQL) with minimal changes (mainly in the connection string).
- Productivity: Reduces boilerplate code for common database operations (CRUD - Create, Read, Update, Delete).
- Security: Helps prevent SQL injection vulnerabilities when used correctly, as it typically handles parameter escaping.
- Maintainability: Keeps database logic organized within model definitions and object interactions.
Setting Up Flask-SQLAlchemy
1. Installation:
First, you need to install the Flask-SQLAlchemy extension and potentially a database driver (though SQLite support is built into Python).
- Create a New Project Directory: Let's keep things organized: `mkdir intermediate_api && cd intermediate_api`
- Create and Activate Virtual Environment: `python3 -m venv venv && source venv/bin/activate`
- Install Flask and Flask-SQLAlchemy: `pip install Flask Flask-SQLAlchemy` (Flask-SQLAlchemy will install SQLAlchemy as a dependency).
- (Optional) Install a PostgreSQL Driver (if using PostgreSQL), e.g., `pip install psycopg2-binary`
2. Configuration:
Flask-SQLAlchemy is configured through your Flask application's configuration dictionary (`app.config`). The most crucial setting is `SQLALCHEMY_DATABASE_URI`, which tells SQLAlchemy where your database is located.
- Database URI Format:
  - SQLite: `sqlite:////path/to/database.db` (absolute path; note the four slashes initially for absolute paths) or `sqlite:///relative/path/database.db` (relative path; three slashes). For an in-memory SQLite database (useful for testing): `sqlite:///:memory:`
  - PostgreSQL: `postgresql://username:password@host:port/database_name`
  - MySQL: `mysql://username:password@host:port/database_name` (requires a driver like `mysqlclient` or `PyMySQL`)
- Integrating with Flask: Create your main application file (e.g., `app.py`) and configure it:

# intermediate_api/app.py
import os
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

# Determine the base directory of the project
basedir = os.path.abspath(os.path.dirname(__file__))

app = Flask(__name__)

# --- Database Configuration ---
# Option 1: SQLite (simple file-based database)
# Creates a file named 'app.db' in the project's instance folder.
# The instance folder is a good place for files not tracked by version control.
# We create it below if it doesn't exist.
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///' + os.path.join(basedir, 'instance', 'app.db')

# Option 2: PostgreSQL (Example - replace with your actual credentials)
# Ensure PostgreSQL server is running and database 'myapidb' exists
# app.config['SQLALCHEMY_DATABASE_URI'] = 'postgresql://user:password@localhost:5432/myapidb'

# This setting disables a feature of Flask-SQLAlchemy that we don't need
# and that warns us if not set.
app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False

# Create the SQLAlchemy database instance
# This object provides access to SQLAlchemy functions and classes like db.Model
db = SQLAlchemy(app)

# We will define our models and routes below/in other files later

# Ensure instance folder exists
try:
    os.makedirs(app.instance_path)
except OSError:
    pass  # Already exists

@app.route('/')
def index():
    return "Database configuration is set up!"

if __name__ == '__main__':
    # Important: Create tables before running the app if they don't exist
    with app.app_context():  # Need app context to interact with db
        print("Creating database tables if they don't exist...")
        db.create_all()
        print("Database tables checked/created.")
    app.run(host='0.0.0.0', port=5000, debug=True)
Explanation:
- We import `os` to help construct file paths reliably.
- We define `basedir`.
- We set `SQLALCHEMY_DATABASE_URI`. Using `os.path.join` ensures the path separators are correct for the operating system. We place the SQLite file inside an `instance` folder relative to our `app.py` file. This folder is conventionally used for instance-specific files (like databases or configuration files) that shouldn't be committed to version control.
- `SQLALCHEMY_TRACK_MODIFICATIONS = False` disables event tracking, which consumes extra memory and is often unnecessary. It's recommended to explicitly set it to `False`.
- `db = SQLAlchemy(app)` creates the central database object, linking SQLAlchemy to our Flask app.
- We ensure the `instance` directory exists.
- Inside the `if __name__ == '__main__':` block, before `app.run()`, we use `with app.app_context(): db.create_all()`. This command inspects all classes inheriting from `db.Model` and creates the corresponding database tables if they don't already exist. It needs an application context to know which app's configuration (specifically the database URI) to use.
Defining Models
Models are Python classes that represent tables in your database. They inherit from `db.Model`, provided by Flask-SQLAlchemy. Each attribute of the class, defined using `db.Column`, corresponds to a column in the database table.
Let's redefine our `Task` entity as a SQLAlchemy model:
# Add this within intermediate_api/app.py, after db = SQLAlchemy(app)
class Task(db.Model):
# Define the table name (optional, defaults to class name lowercase)
__tablename__ = 'tasks'
# Define columns
id = db.Column(db.Integer, primary_key=True) # Auto-incrementing primary key
title = db.Column(db.String(150), nullable=False) # Text column, max 150 chars, required
description = db.Column(db.Text, nullable=True) # Longer text column, optional
status = db.Column(db.String(50), nullable=False, default='pending') # String, required, default value
created_at = db.Column(db.DateTime, nullable=False, default=db.func.now()) # Timestamp, defaults to current time
updated_at = db.Column(db.DateTime, nullable=False, default=db.func.now(), onupdate=db.func.now()) # Updates on modification
def __repr__(self):
"""Provide a helpful representation when printing the object."""
return f"<Task {self.id}: {self.title} ({self.status})>"
def to_dict(self):
"""Convert the Task object into a dictionary for JSON serialization."""
return {
'id': self.id,
'title': self.title,
'description': self.description,
'status': self.status,
'created_at': self.created_at.isoformat() if self.created_at else None, # Format datetime for JSON
'updated_at': self.updated_at.isoformat() if self.updated_at else None
}
# --- The rest of the app.py code (routes, run block) follows ---
Explanation:
- `class Task(db.Model):`: Defines a model class inheriting from `db.Model`.
- `__tablename__ = 'tasks'`: Explicitly sets the table name. If omitted, SQLAlchemy would infer it as `task`.
- `id = db.Column(db.Integer, primary_key=True)`: Defines an integer column named `id`. `primary_key=True` marks it as the primary key. For integer primary keys, most databases automatically handle auto-incrementing.
- `title = db.Column(db.String(150), nullable=False)`: Defines a string (VARCHAR) column `title` with a maximum length of 150 characters. `nullable=False` means this column cannot be empty in the database (equivalent to `NOT NULL` in SQL).
- `description = db.Column(db.Text, nullable=True)`: Defines a `Text` column for potentially longer strings. `nullable=True` (the default) allows this field to be empty (`NULL`).
- `status = db.Column(db.String(50), nullable=False, default='pending')`: A string column with a default value set at the database level.
- `created_at = db.Column(db.DateTime, ..., default=db.func.now())`: A datetime column. `default=db.func.now()` tells the database to use its current time function when inserting a row if no value is provided.
- `updated_at = db.Column(..., default=db.func.now(), onupdate=db.func.now())`: Similar to `created_at`, but `onupdate=db.func.now()` tells the database to automatically update this timestamp whenever the row is modified.
- `__repr__(self)`: A standard Python method that defines how the object should be represented as a string (e.g., when printed). Useful for debugging.
- `to_dict(self)`: A custom helper method we've added. This is crucial for API development: since SQLAlchemy model instances are complex objects, we need a way to convert them into simple Python dictionaries that `jsonify` can understand. This method handles that conversion, including formatting `datetime` objects into ISO 8601 strings, a standard format for JSON.
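The reason `to_dict` calls `isoformat()` is that `datetime` objects are not JSON-serializable on their own; a short stdlib sketch (values here are made up for illustration):

```python
# Why to_dict() converts datetimes: json cannot serialize datetime objects,
# so they must become strings first. ISO 8601 is the standard choice.
import json
from datetime import datetime, timezone

# An illustrative timestamp such as a Task's created_at value
created_at = datetime(2024, 5, 1, 12, 30, 0, tzinfo=timezone.utc)

# json.dumps(created_at) would raise TypeError; the ISO string works fine
payload = {"id": 1, "created_at": created_at.isoformat()}
print(json.dumps(payload))
# prints {"id": 1, "created_at": "2024-05-01T12:30:00+00:00"}
```

ISO 8601 strings sort chronologically and are parsed natively by most client languages (e.g., `Date` in JavaScript), which is why it is the conventional wire format.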
Basic CRUD Operations with SQLAlchemy
Flask-SQLAlchemy manages database sessions for you. The `db.session` object is used to stage changes (additions, updates, deletions) before committing them to the database.
1. Creating Records (Create):
- Instantiate your model class with data.
- Add the object to the session: `db.session.add(object)`.
- Commit the transaction: `db.session.commit()`.
def create_new_task(title, description=None, status='pending'):
"""Creates and saves a new task."""
new_task = Task(title=title, description=description, status=status)
try:
db.session.add(new_task) # Stage the new task for insertion
db.session.commit() # Execute the insert operation in the database
print(f"Created task: {new_task}")
return new_task
except Exception as e:
db.session.rollback() # Roll back changes if an error occurs during commit
print(f"Error creating task: {e}")
return None
2. Querying Records (Read):
SQLAlchemy provides a powerful query API accessible via `Model.query`.
- Get all records: `Task.query.all()` returns a list of `Task` objects.
- Get by primary key: `Task.query.get(primary_key_value)` returns a single `Task` object or `None` if not found. This is very efficient.
- Filtering: `Task.query.filter_by(attribute=value)` or `Task.query.filter(Task.attribute == value)`.
  - `filter_by`: Simple keyword arguments. `Task.query.filter_by(status='pending').all()`
  - `filter`: More flexible, allows complex comparisons (`<`, `>`, `!=`, `like`, etc.). `Task.query.filter(Task.status != 'completed').all()`
- Getting the first result: `.first()` returns the first matching object or `None`. `Task.query.filter_by(title='Learn Flask').first()`
- Counting results: `.count()` returns the number of rows matching the query. `Task.query.filter_by(status='pending').count()`
- Ordering results: `.order_by(Model.attribute)` or `.order_by(db.desc(Model.attribute))`. `Task.query.order_by(Task.created_at).all()`
def find_task_by_id(task_id):
"""Finds a task by its primary key."""
# .get() is optimized for primary key lookups
task = Task.query.get(task_id)
return task
def find_tasks_by_status(status_value):
"""Finds all tasks with a specific status."""
tasks = Task.query.filter_by(status=status_value).order_by(db.desc(Task.created_at)).all()
# Example: Find pending tasks, newest first
return tasks
def get_all_tasks():
"""Gets all tasks from the database."""
return Task.query.all()
3. Updating Records (Update):
- Fetch the existing record (e.g., using `Task.query.get()`).
- Modify the attributes of the fetched object.
- Add the object to the session (SQLAlchemy is often smart enough to track changes, but `db.session.add()` is safe).
- Commit the transaction: `db.session.commit()`.
```python
def update_task_status(task_id, new_status):
    """Updates the status of an existing task."""
    task = Task.query.get(task_id)
    if task:
        task.status = new_status  # Modify the object's attribute
        # Note: updated_at should be handled automatically by onupdate=db.func.now()
        try:
            db.session.commit()  # Commit the changes
            print(f"Updated task: {task}")
            return task
        except Exception as e:
            db.session.rollback()
            print(f"Error updating task: {e}")
            return None
    else:
        print(f"Task with ID {task_id} not found for update.")
        return None
```
4. Deleting Records (Delete):
- Fetch the existing record.
- Delete the object from the session: `db.session.delete(object)`.
- Commit the transaction: `db.session.commit()`.
```python
def remove_task(task_id):
    """Deletes a task from the database."""
    task = Task.query.get(task_id)
    if task:
        try:
            db.session.delete(task)  # Stage the deletion
            db.session.commit()      # Execute the delete operation
            print(f"Deleted task with ID: {task_id}")
            return True
        except Exception as e:
            db.session.rollback()
            print(f"Error deleting task: {e}")
            return False
    else:
        print(f"Task with ID {task_id} not found for deletion.")
        return False
```
Important: Session Management
- `db.session.commit()`: Saves all staged changes (adds, updates, deletes) to the database transactionally. If any part fails, the entire transaction is usually rolled back by the database.
- `db.session.rollback()`: Explicitly discards any changes staged in the current session since the last commit. Crucial for error handling to prevent partial updates.
- `db.session.remove()` / `db.session.close()`: Flask-SQLAlchemy typically handles session cleanup automatically at the end of each request. You usually don't need to call these manually in a standard Flask request cycle.
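The commit-or-rollback discipline described above can be captured once in a small helper. This is a sketch of the well-known `session_scope` context-manager pattern from the SQLAlchemy documentation; the helper name is ours, and you would pass it `db.session`:

```python
from contextlib import contextmanager

@contextmanager
def session_scope(session):
    """Commit the session on success; roll back and re-raise on any error."""
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise

# Hypothetical usage inside a view function:
# with session_scope(db.session) as s:
#     s.add(Task(title="Example"))
```

This keeps every write path consistent: a forgotten `rollback()` in one handler can otherwise leave the session in a broken state for subsequent requests.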
Workshop Building a Persistent Task API
Goal:
Refactor the Task API to use Flask-SQLAlchemy and a SQLite database for persistent storage. Implement full CRUD functionality.
Steps:
1. Setup:
   - You should be in the `intermediate_api` directory.
   - Virtual environment active (`source venv/bin/activate`).
   - `Flask` and `Flask-SQLAlchemy` installed.
2. Create `app.py`:
   Create a new `app.py` file (or clear the existing one) and add the following code, combining the setup, model definition, and new route implementations.

```python
# intermediate_api/app.py
import os
from flask import Flask, jsonify, request, abort
from flask_sqlalchemy import SQLAlchemy
from sqlalchemy import exc  # Import specific SQLAlchemy exceptions

# Determine the base directory of the project
basedir = os.path.abspath(os.path.dirname(__file__))

app = Flask(__name__)

# --- Database Configuration ---
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///' + os.path.join(basedir, 'instance', 'tasks.db')  # Use a specific db file name
app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False

# Initialize extensions
db = SQLAlchemy(app)

# --- Database Model ---
class Task(db.Model):
    __tablename__ = 'tasks'

    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String(150), nullable=False)
    description = db.Column(db.Text, nullable=True)
    status = db.Column(db.String(50), nullable=False, default='pending')  # Valid statuses: pending, completed
    created_at = db.Column(db.DateTime, server_default=db.func.now())  # Use server_default for portability
    updated_at = db.Column(db.DateTime, server_default=db.func.now(), onupdate=db.func.now())

    def __repr__(self):
        return f"<Task {self.id}: {self.title} ({self.status})>"

    def to_dict(self):
        """Convert the Task object into a dictionary for JSON serialization."""
        return {
            'id': self.id,
            'title': self.title,
            'description': self.description,
            'status': self.status,
            # Ensure datetime objects exist before calling isoformat()
            'created_at': self.created_at.isoformat() if self.created_at else None,
            'updated_at': self.updated_at.isoformat() if self.updated_at else None
        }

# Ensure instance folder exists
try:
    os.makedirs(app.instance_path)
except OSError:
    pass

# --- Helper Functions for Responses (Optional but Recommended) ---
def make_success_response(data_payload, status_code=200):
    response = jsonify({"status": "success", "data": data_payload})
    response.status_code = status_code
    return response

def make_error_response(message, status_code):
    response = jsonify({"status": "error", "error": {"message": message}})
    response.status_code = status_code
    return response

# --- Error Handlers ---
@app.errorhandler(404)
def not_found(error):
    return make_error_response(error.description or "Resource not found", 404)

@app.errorhandler(400)
def bad_request(error):
    return make_error_response(error.description or "Bad request", 400)

@app.errorhandler(415)
def unsupported_media_type(error):
    return make_error_response(error.description or "Unsupported media type", 415)

@app.errorhandler(422)
def unprocessable_entity(error):
    return make_error_response(error.description or "Unprocessable entity", 422)

@app.errorhandler(500)
def internal_server_error(error):
    # It's good practice to rollback the session in case of internal errors
    db.session.rollback()
    # Log the actual error here in a real app
    print(f"!!! Internal Server Error: {error}")
    return make_error_response("An internal server error occurred", 500)

# --- API Routes ---

# GET /api/tasks - Retrieve all tasks or filter by status
@app.route('/api/tasks', methods=['GET'])
def get_tasks():
    status_filter = request.args.get('status')
    query = Task.query  # Start with base query
    if status_filter:
        if status_filter not in ['pending', 'completed']:
            abort(400, description="Invalid status filter. Use 'pending' or 'completed'.")
        query = query.filter_by(status=status_filter)
    # Order by creation date by default (newest first)
    tasks = query.order_by(db.desc(Task.created_at)).all()
    # Convert list of Task objects to list of dictionaries
    tasks_dict = [task.to_dict() for task in tasks]
    return make_success_response(tasks_dict)

# GET /api/tasks/<id> - Retrieve a single task
@app.route('/api/tasks/<int:task_id>', methods=['GET'])
def get_task(task_id):
    # Use get_or_404: fetches by primary key, automatically aborts with 404 if not found
    task = Task.query.get_or_404(task_id, description=f"Task with ID {task_id} not found")
    return make_success_response(task.to_dict())

# POST /api/tasks - Create a new task
@app.route('/api/tasks', methods=['POST'])
def create_task():
    if not request.is_json:
        abort(415, description="Request must be JSON")
    data = request.get_json()
    if not data or 'title' not in data:
        abort(400, description="Missing 'title' in request body")
    title = data['title']
    # Check the type BEFORE calling .strip(), otherwise a non-string title raises AttributeError
    if not isinstance(title, str) or len(title.strip()) == 0:
        abort(400, description="'title' must be a non-empty string")
    title = title.strip()

    # Get optional description
    description = data.get('description')
    if description is not None and not isinstance(description, str):
        abort(400, description="'description' must be a string if provided")

    # Create Task instance; status defaults to 'pending'
    new_task = Task(title=title, description=description)

    try:
        db.session.add(new_task)
        db.session.commit()
        # After commit, new_task will have its ID and default values populated
        return make_success_response(new_task.to_dict(), 201)  # Use 201 Created status
    except exc.SQLAlchemyError as e:
        # Catch potential database errors
        db.session.rollback()
        print(f"Database error on create: {e}")
        abort(500, description="Database error occurred during task creation.")
    except Exception as e:
        # Catch other unexpected errors
        db.session.rollback()
        print(f"Unexpected error on create: {e}")
        abort(500, description="An unexpected error occurred.")

# PUT /api/tasks/<id> - Update an existing task (full update)
@app.route('/api/tasks/<int:task_id>', methods=['PUT'])
def update_task(task_id):
    task = Task.query.get_or_404(task_id, description=f"Task with ID {task_id} not found for update")
    if not request.is_json:
        abort(415, description="Request must be JSON")
    data = request.get_json()
    if not data:
        abort(400, description="Missing JSON data in request body")

    # --- Validation for PUT (expecting all required fields) ---
    if 'title' not in data or 'status' not in data:
        abort(400, description="Missing 'title' or 'status' in request body for PUT")
    title = data['title']
    status = data['status']
    description = data.get('description')  # Optional
    if not isinstance(title, str) or len(title.strip()) == 0:
        abort(400, description="'title' must be a non-empty string")
    title = title.strip()
    if status not in ['pending', 'completed']:
        abort(400, description="Invalid status value. Must be 'pending' or 'completed'.")
    if description is not None and not isinstance(description, str):
        abort(400, description="'description' must be a string if provided")
    # --- End Validation ---

    # Update the task object's attributes
    task.title = title
    task.status = status
    task.description = description
    # Note: updated_at is handled by the database `onupdate` trigger

    try:
        # db.session.add(task)  # Usually not needed for updates, session tracks changes
        db.session.commit()
        return make_success_response(task.to_dict())  # 200 OK default
    except exc.SQLAlchemyError as e:
        db.session.rollback()
        print(f"Database error on update: {e}")
        abort(500, description="Database error occurred during task update.")
    except Exception as e:
        db.session.rollback()
        print(f"Unexpected error on update: {e}")
        abort(500, description="An unexpected error occurred.")

# PATCH /api/tasks/<id> - Partially update an existing task (Example: only update status)
@app.route('/api/tasks/<int:task_id>', methods=['PATCH'])
def patch_task(task_id):
    task = Task.query.get_or_404(task_id, description=f"Task with ID {task_id} not found for update")
    if not request.is_json:
        abort(415, description="Request must be JSON")
    data = request.get_json()
    if not data:
        abort(400, description="Missing JSON data in request body")

    updated = False  # Flag to check if any changes were made

    # Update title if provided
    if 'title' in data:
        title = data['title']
        if not isinstance(title, str) or len(title.strip()) == 0:
            abort(400, description="'title' must be a non-empty string if provided")
        task.title = title.strip()
        updated = True

    # Update description if provided
    if 'description' in data:
        description = data['description']
        if description is not None and not isinstance(description, str):
            abort(400, description="'description' must be a string if provided")
        task.description = description
        updated = True

    # Update status if provided
    if 'status' in data:
        status = data['status']
        if status not in ['pending', 'completed']:
            abort(400, description="Invalid status value. Must be 'pending' or 'completed'.")
        task.status = status
        updated = True

    if not updated:
        # No recognized fields were present in the PATCH data
        abort(400, description="No valid fields provided for update in PATCH request.")

    try:
        db.session.commit()
        return make_success_response(task.to_dict())  # 200 OK default
    except exc.SQLAlchemyError as e:
        db.session.rollback()
        print(f"Database error on patch: {e}")
        abort(500, description="Database error occurred during task update.")
    except Exception as e:
        db.session.rollback()
        print(f"Unexpected error on patch: {e}")
        abort(500, description="An unexpected error occurred.")

# DELETE /api/tasks/<id> - Delete a task
@app.route('/api/tasks/<int:task_id>', methods=['DELETE'])
def delete_task(task_id):
    task = Task.query.get_or_404(task_id, description=f"Task with ID {task_id} not found for deletion")
    try:
        db.session.delete(task)
        db.session.commit()
        # Return 204 No Content with an empty body
        return '', 204
    except exc.SQLAlchemyError as e:
        db.session.rollback()
        print(f"Database error on delete: {e}")
        abort(500, description="Database error occurred during task deletion.")
    except Exception as e:
        db.session.rollback()
        print(f"Unexpected error on delete: {e}")
        abort(500, description="An unexpected error occurred.")

# --- Main Execution ---
if __name__ == '__main__':
    # Create database tables if they don't exist BEFORE running the app
    with app.app_context():
        print("Creating database tables if they don't exist...")
        # db.drop_all()  # Uncomment to clear database on every restart (for testing)
        db.create_all()
        print("Database tables checked/created.")

        # Optional: Seed initial data if table is empty
        if Task.query.count() == 0:
            print("Seeding initial tasks...")
            initial_tasks = [
                Task(title="Learn Flask-SQLAlchemy", description="Understand models and sessions", status="pending"),
                Task(title="Build Persistent API", description="Refactor Task API", status="pending"),
                Task(title="Test CRUD Operations", description="Use curl or Postman", status="pending")
            ]
            db.session.bulk_save_objects(initial_tasks)
            db.session.commit()
            print("Initial tasks seeded.")

    print("\nPersistent Task API running...")
    print(f"Database located at: {app.config['SQLALCHEMY_DATABASE_URI']}")
    print("Endpoints:")
    print("  GET    /api/tasks")
    print("  GET    /api/tasks?status=<status>")
    print("  POST   /api/tasks      (Body: {'title': '...', 'description': '...'})")
    print("  GET    /api/tasks/<id>")
    print("  PUT    /api/tasks/<id> (Body: {'title': '...', 'status': '...', 'description': '...'})")
    print("  PATCH  /api/tasks/<id> (Body: {'title': '...' or 'status': '...' or 'description': '...'})")
    print("  DELETE /api/tasks/<id>\n")
    app.run(host='0.0.0.0', port=5000, debug=True)
```
3. Run the Application:
   Open your terminal in the `intermediate_api` directory (with `venv` active) and start the server. Observe the output: you should see messages indicating that the database tables are being checked/created and potentially seeded. The application will then start.
   - Check: Look inside the `intermediate_api` directory. You should now see an `instance` folder containing a `tasks.db` file. This is your SQLite database.
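The exact start command is not shown above; with the layout used in this workshop it is simply the following (the project path is illustrative, adjust it to your own):

```shell
cd ~/projects/flask_api_course/intermediate_api  # adjust to wherever your project lives
source venv/bin/activate                         # activate the virtual environment
python app.py                                    # start the Flask development server
```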
4. Test the Persistent API (using `curl`):
   - Get All Tasks (should include seeded tasks):
   - Create a New Task:
   - Get the New Task by ID (e.g., ID 4 if it's the 4th task):
   - Update Task (PUT - requires all fields):
   - Partially Update Task (PATCH - update only status):
   - Get All Tasks Again (verify changes):
   - Delete a Task (e.g., delete task 1):
   - Verify Deletion:
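The concrete `curl` invocations for these checks did not survive formatting; the following commands, matched to the endpoints defined in `app.py`, are a reasonable reconstruction (titles and IDs are illustrative):

```shell
# Get all tasks (should include the seeded tasks)
curl -i http://localhost:5000/api/tasks

# Create a new task
curl -i -X POST -H "Content-Type: application/json" \
     -d '{"title": "New persistent task", "description": "Created via curl"}' \
     http://localhost:5000/api/tasks

# Get the new task by ID (e.g., ID 4 if it is the 4th task)
curl -i http://localhost:5000/api/tasks/4

# Full update (PUT requires title and status)
curl -i -X PUT -H "Content-Type: application/json" \
     -d '{"title": "Updated task", "status": "completed", "description": "Done"}' \
     http://localhost:5000/api/tasks/4

# Partial update (PATCH - only status)
curl -i -X PATCH -H "Content-Type: application/json" \
     -d '{"status": "completed"}' \
     http://localhost:5000/api/tasks/2

# Get all tasks again, filtered, to verify changes
curl -i "http://localhost:5000/api/tasks?status=completed"

# Delete task 1
curl -i -X DELETE http://localhost:5000/api/tasks/1

# Verify deletion (expect a 404 response)
curl -i http://localhost:5000/api/tasks/1
```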
5. Stop and Restart the Server:
   - Press `Ctrl+C` in the terminal running Flask.
   - Run `python app.py` again.
   - Test the API again. You should see that the data persisted (tasks created/updated/deleted in the previous run are still in that state, except for task 1, which was deleted). The initial seeding logic only runs if the table is completely empty.
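The verification request was omitted above; once the server is back up, a quick check (assuming the default port 5000) is:

```shell
curl -i http://localhost:5000/api/tasks   # previously created/updated tasks should still be present
```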
Outcome: You have successfully built a REST API with persistent data storage using Flask-SQLAlchemy and SQLite. You learned how to define models, configure the database connection, perform CRUD operations using `db.session`, and serialize model objects into JSON using a `to_dict` method. You also implemented PATCH for partial updates and improved error handling. Your API data now survives application restarts.
5. Blueprints and Application Structure (Advanced)
As your Flask application grows beyond a few simple routes, keeping all the code in a single app.py
file becomes increasingly difficult to manage and maintain. The code becomes cluttered, harder to read, and collaboration becomes challenging. Furthermore, you might want to reuse parts of your application or organize features into logical groups. This is where Flask's Blueprints and well-defined project structures come into play.
This section delves into these advanced techniques, enabling you to build larger, more organized, and maintainable Flask APIs suitable for complex real-world scenarios. We will also introduce the Application Factory Pattern, a standard practice for creating Flask application instances in a flexible and testable way.
Why Structure Matters
Imagine building a large house. You wouldn't just throw all the materials and tools into one big pile. Instead, you'd have blueprints (plans), separate rooms for different functions (kitchen, bedroom), organized storage for tools, and defined processes for construction. Similarly, structuring a web application offers significant advantages:
- Modularity: Breaking the application into smaller, independent components (like Blueprints) makes each part easier to understand, develop, and test in isolation.
- Maintainability: When code related to a specific feature (e.g., user management, product catalog) is grouped, finding and fixing bugs or adding new functionality becomes much simpler. Changes in one module are less likely to break unrelated parts of the application.
- Scalability: A well-structured application is easier to scale, both in terms of codebase size and potentially team size. Different developers can work on different modules concurrently.
- Reusability: Blueprints can be designed to be reusable across different projects or even mounted at different URL prefixes within the same application.
- Testability: Smaller, focused modules are generally easier to unit test than a single, monolithic file.
What are Blueprints?
A Blueprint in Flask is an object that allows you to record operations (like registering routes and error handlers) that you intend to execute later on an actual application object. Think of it as a template or a mini-application that captures a subset of your application's functionality.
Key Concepts:
- Grouping: Blueprints group related routes, view functions, static files, templates, and error handlers.
- Deferred Registration: Operations registered on a Blueprint are not active until the Blueprint is registered with a Flask application instance (`app`).
- Namespace: When registered, Blueprints can be given a URL prefix (e.g., `/api/users`, `/admin`) and their own namespace, helping to avoid naming collisions between view functions or URL endpoints defined in different parts of the application.
Essentially, Blueprints provide a way to structure your Flask application into distinct components without requiring separate application objects for each component.
Creating a Blueprint
Creating a Blueprint is straightforward. You instantiate the `Blueprint` class, providing a name for the Blueprint and the import name (usually `__name__`), which Flask uses to locate associated resources.
```python
# Example: Creating a Blueprint for user-related routes
from flask import Blueprint, jsonify, request  # jsonify/request are needed by the views below

# Syntax: Blueprint('blueprint_name', import_name, **options)
users_bp = Blueprint('users', __name__)

# Now, instead of @app.route, you use @<blueprint_name>.route
@users_bp.route('/profile')
def get_user_profile():
    # Logic to fetch user profile
    return jsonify({"username": "current_user", "email": "user@example.com"})

@users_bp.route('/settings', methods=['GET', 'POST'])
def update_user_settings():
    if request.method == 'POST':
        # Logic to update settings
        pass
    # Logic to display settings form or data
    return jsonify({"message": "Settings endpoint"})

# You can also register error handlers specific to this blueprint
@users_bp.app_errorhandler(404)  # Note: use app_errorhandler for app-wide handlers from a blueprint
def handle_404_user_error(error):
    # Custom 404 handling; app_errorhandler makes this app-wide once the blueprint
    # is registered. More commonly, register app-wide handlers on the main app
    # or a core blueprint, and use @users_bp.errorhandler for blueprint-only handling.
    return jsonify({"error": "User resource not found"}), 404
```
Explanation:
- `from flask import Blueprint`: Import the necessary class.
- `users_bp = Blueprint('users', __name__)`: Create a Blueprint instance.
  - `'users'`: The name of the Blueprint. This is used internally by Flask and is especially important for `url_for()`.
  - `__name__`: The import name. Flask uses this to determine the root path for the Blueprint (e.g., to find associated template folders if you were using them).
- `@users_bp.route(...)`: Routes are defined using the Blueprint's `route` decorator, not the application's (`app.route`).
- Error Handlers: You can register error handlers specific to a blueprint using `@blueprint_name.errorhandler()` or application-wide handlers using `@blueprint_name.app_errorhandler()`. The latter is often preferred for consistency unless you need truly Blueprint-specific error pages.
Registering a Blueprint
A Blueprint is inactive until it's registered with a Flask application instance using the `app.register_blueprint()` method.
```python
from flask import Flask

# Assume users_bp is defined in another file (e.g., my_app/users/routes.py)
# from my_app.users.routes import users_bp

app = Flask(__name__)

# Register the blueprint

# Option 1: Register without a prefix (routes added directly under app's root)
# app.register_blueprint(users_bp)
# Access routes like: /profile, /settings

# Option 2: Register with a URL prefix
app.register_blueprint(users_bp, url_prefix='/api/v1/users')
# Access routes like: /api/v1/users/profile, /api/v1/users/settings

# Other app configurations and routes...

if __name__ == '__main__':
    app.run(debug=True)
```
Key `register_blueprint` Options:
- `blueprint`: The Blueprint object to register.
- `url_prefix`: An optional string that will be prepended to all URLs defined in the Blueprint. This is extremely useful for versioning APIs (`/api/v1`, `/api/v2`) or grouping features (`/admin`, `/shop`).
- `subdomain`: Optionally register the Blueprint for a specific subdomain.
- `url_defaults`: A dictionary of default values for URL variables in the Blueprint's routes.
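A practical consequence of the Blueprint namespace is how `url_for()` endpoints are written: the endpoint name becomes `<blueprint_name>.<view_function_name>`, and the registered `url_prefix` is included in the generated URL. A minimal, self-contained sketch (the view and names are illustrative):

```python
from flask import Flask, Blueprint, jsonify, url_for

users_bp = Blueprint('users', __name__)

@users_bp.route('/profile')
def get_user_profile():
    return jsonify({"username": "demo"})

app = Flask(__name__)
app.register_blueprint(users_bp, url_prefix='/api/v1/users')

with app.test_request_context():
    # The endpoint is namespaced by the blueprint name, and the
    # url_prefix given at registration time is part of the URL.
    print(url_for('users.get_user_profile'))  # /api/v1/users/profile
```

Inside a view that belongs to the same blueprint, you can abbreviate the endpoint to `url_for('.get_user_profile')` (leading dot = current blueprint).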
Project Structure with Blueprints
Using Blueprints naturally leads to organizing your project into multiple files and directories (modules/packages). There's no single "correct" structure, but common patterns emerge. A popular approach, especially for larger applications, is to create a main application package:
```
/your_project_root/
|-- venv/                  # Virtual environment
|-- instance/              # Instance-specific files (e.g., database, secrets)
|   |-- tasks.db           # Example SQLite DB
|
|-- app/                   # Your main application package (can be named differently)
|   |-- __init__.py        # Makes 'app' a package, often contains the app factory
|   |-- config.py          # Configuration classes/settings
|   |-- extensions.py      # Initialize Flask extensions (like db, migrate, jwt)
|   |-- models.py          # SQLAlchemy model definitions (can be split further)
|   |
|   |-- tasks/             # Blueprint package for 'tasks' feature
|   |   |-- __init__.py    # Defines the tasks_bp Blueprint object
|   |   |-- routes.py      # Routes and view functions for tasks
|   |   |-- models.py      # (Optional) Models specific to tasks
|   |   |-- schemas.py     # (Optional) Marshmallow schemas for tasks (covered later)
|   |
|   |-- users/             # (Example) Another Blueprint package
|   |   |-- __init__.py
|   |   |-- routes.py
|   |
|   |-- static/            # Static files (CSS, JS, images) - less relevant for pure APIs
|   |-- templates/         # HTML templates - less relevant for pure APIs
|
|-- tests/                 # Directory for automated tests
|   |-- ...
|
|-- migrations/            # Flask-Migrate database migration scripts (covered later)
|-- requirements.txt       # Project dependencies
|-- run.py                 # Script to create app instance using factory and run the server
|-- wsgi.py                # WSGI entry point for production servers (Gunicorn, uWSGI)
```
Explanation of Key Components:
- `app/` (or `project_name/`): The core application package.
- `app/__init__.py`: Crucial file. It signifies that the `app` directory is a Python package. It's the ideal place to define the Application Factory function (`create_app`).
- `app/config.py`: Defines configuration settings (database URI, secret keys, etc.), often using classes for different environments (Development, Testing, Production).
- `app/extensions.py`: Instantiates Flask extensions (like `db = SQLAlchemy()`) without associating them with a specific app instance yet. This avoids circular imports. The association happens later in the factory function using `db.init_app(app)`.
- `app/models.py`: Contains your SQLAlchemy database model definitions. For very large applications, you might move models into their respective feature packages (e.g., `app/tasks/models.py`).
- `app/tasks/` (Blueprint Package):
  - `__init__.py`: Often just imports the Blueprint object created in `routes.py` (`from .routes import tasks_bp`) or defines it here.
  - `routes.py` (or `views.py`): Defines the Blueprint object (`tasks_bp = Blueprint(...)`) and registers all routes (`@tasks_bp.route(...)`) and potentially Blueprint-specific error handlers for this feature. Imports models and uses extensions (`db`) as needed.
- `run.py` / `wsgi.py`: Entry points for running the application. They import the `create_app` factory, call it to get an app instance, and then run it (using `app.run()` for development in `run.py`) or expose it for a WSGI server (in `wsgi.py`).
This structure promotes separation of concerns and makes the application much easier to navigate and scale.
Application Factory Pattern (`create_app`)
Instead of creating the Flask `app` object globally at the top of a file, the Application Factory pattern involves creating the application instance inside a function, typically called `create_app()`.
Motivation:
- Testing: You can create multiple instances of your application with different configurations (e.g., one using a testing database, another for development). This is essential for reliable automated testing.
- Configuration Management: Easily load different configurations based on environment variables or function arguments passed to the factory.
- Avoiding Circular Imports: By initializing extensions (`db.init_app(app)`) inside the factory after the `app` object is created, you avoid complex import issues that can arise when extensions need access to the app config, and view functions (in Blueprints) need access to the extensions.
- Deployment: Simplifies creating app instances for different deployment scenarios (e.g., WSGI servers).
Implementation:
The `create_app` function typically lives in the main application package's `__init__.py` (`app/__init__.py`).
```python
# app/__init__.py
import os
from flask import Flask

from .config import Config, DevelopmentConfig, ProductionConfig  # Assuming config classes are defined
from .extensions import db  # Import extension instances

# Import blueprint objects
from .tasks.routes import tasks_bp
# from .users.routes import users_bp  # Example for another blueprint

def create_app(config_class=None):
    """Application Factory Function"""
    app = Flask(__name__, instance_relative_config=True)  # Enable instance folder config

    # --- Load Configuration ---
    if config_class is None:
        # Determine config based on environment variable or default to Development
        env_config = os.getenv('FLASK_CONFIG', 'development').capitalize() + 'Config'
        try:
            config_class = globals()[env_config]
        except KeyError:
            config_class = DevelopmentConfig
    app.config.from_object(config_class)

    # Load instance config, if it exists (sensitive data)
    # Example: instance/config.py could contain SECRET_KEY = '...'
    # Ensure instance/config.py is in .gitignore
    app.config.from_pyfile('config.py', silent=True)

    # --- Initialize Extensions ---
    # Pass the app instance to the extension objects
    db.init_app(app)
    # other_extension.init_app(app)

    # --- Register Blueprints ---
    # Use url_prefix to group task routes under /api/tasks
    app.register_blueprint(tasks_bp, url_prefix='/api/tasks')
    # app.register_blueprint(users_bp, url_prefix='/api/users')

    # --- Register Application-Wide Error Handlers (Optional) ---
    # Can define general error handlers here if not handled adequately in Blueprints
    @app.errorhandler(404)
    def handle_app_404(error):
        # Generic 404 if not caught by a specific blueprint handler
        return {"error": "Resource not found at this URL"}, 404

    # --- Database Creation (Optional - Consider Flask-Migrate) ---
    # This ensures tables are created when the app starts, if needed.
    # Better handled by migration tools like Flask-Migrate in larger projects.
    with app.app_context():
        # db.drop_all()  # Use with caution during development
        db.create_all()
        # Seed data logic could go here as well

    print(f"App created with config: {config_class.__name__}")
    print(f"Registered Blueprints: {list(app.blueprints.keys())}")
    print(f"Database URI: {app.config.get('SQLALCHEMY_DATABASE_URI')}")

    return app
```
Using the Factory:
You would then create a `run.py` or `wsgi.py` at the project root:
```python
# run.py (for development)
import os
from app import create_app

# Load environment variables if using a .env file (pip install python-dotenv)
# from dotenv import load_dotenv
# load_dotenv()

# Create the app instance using the factory
# Optionally pass a specific config: create_app(ProductionConfig)
# Otherwise, it defaults based on FLASK_CONFIG or to DevelopmentConfig
app = create_app()

if __name__ == '__main__':
    # Use Flask's development server (debug=True should be handled by config)
    app.run(host='0.0.0.0', port=5000)  # Port/host can also be in config
```

```python
# wsgi.py (for production servers like Gunicorn)
import os
from app import create_app

# from dotenv import load_dotenv
# load_dotenv()

# Create app instance, typically forcing Production config
# from app.config import ProductionConfig
# application = create_app(ProductionConfig)  # Or rely on FLASK_CONFIG env var
application = create_app()  # Factory determines config based on env var or defaults
```

A production WSGI server then imports the `application` object from `wsgi.py`, for example:

```shell
gunicorn "wsgi:application"
```
Workshop Refactoring the Task API with Blueprints and Factory Pattern
Goal:
Reorganize the persistent Task API (from the SQLAlchemy section) using a dedicated Blueprint for task routes and implementing the Application Factory pattern for better structure and testability.
Steps:
1. Create Project Structure:
   Navigate to your main projects directory (`cd ~/projects/flask_api_course`). Create the new structure:

```shell
mkdir advanced_api
cd advanced_api

# Create app package and subdirectories
mkdir app
mkdir app/tasks
mkdir instance  # For the database file

# Create initial empty Python files (use 'touch' command or your editor)
touch app/__init__.py
touch app/config.py
touch app/extensions.py
touch app/models.py
touch app/tasks/__init__.py
touch app/tasks/routes.py

# Create top-level run script
touch run.py

# Create requirements file
touch requirements.txt

# Setup virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
echo "Flask" >> requirements.txt
echo "Flask-SQLAlchemy" >> requirements.txt
# echo "python-dotenv" >> requirements.txt  # Optional, for .env files
pip install -r requirements.txt

# Optional: Initialize Git
# git init
# echo "venv/" >> .gitignore
# echo "instance/" >> .gitignore
# echo "__pycache__/" >> .gitignore
# echo "*.pyc" >> .gitignore
# echo ".env" >> .gitignore
# git add .
# git commit -m "Initial project structure for Advanced API"
```
2. Define Configuration (`app/config.py`):
   Set up basic configuration classes.

```python
# app/config.py
import os

basedir = os.path.abspath(os.path.dirname(__file__))
# Go up one level to the project root relative to this file's directory
project_root = os.path.dirname(basedir)
instance_path = os.path.join(project_root, 'instance')

class Config:
    """Base configuration."""
    SECRET_KEY = os.environ.get('SECRET_KEY', os.urandom(24))  # Important for sessions, CSRF, etc.
    SQLALCHEMY_TRACK_MODIFICATIONS = False
    # Define instance path explicitly if needed, though instance_relative_config=True in factory helps
    # INSTANCE_PATH = instance_path

class DevelopmentConfig(Config):
    """Development configuration."""
    DEBUG = True
    SQLALCHEMY_DATABASE_URI = os.environ.get('DEV_DATABASE_URL') or \
        'sqlite:///' + os.path.join(instance_path, 'tasks_dev.db')  # Use dev specific db
    # Ensure instance folder exists for SQLite development db
    if 'sqlite' in SQLALCHEMY_DATABASE_URI and not os.path.exists(instance_path):
        try:
            os.makedirs(instance_path)
            print(f"Created instance folder at: {instance_path}")
        except OSError as e:
            print(f"Error creating instance folder: {e}")

class TestingConfig(Config):
    """Testing configuration."""
    TESTING = True
    SQLALCHEMY_DATABASE_URI = os.environ.get('TEST_DATABASE_URL') or \
        'sqlite:///:memory:'  # Use in-memory SQLite for tests
    WTF_CSRF_ENABLED = False  # Disable CSRF forms protection in tests

class ProductionConfig(Config):
    """Production configuration."""
    DEBUG = False
    # Example for PostgreSQL - get from environment variables
    SQLALCHEMY_DATABASE_URI = os.environ.get('DATABASE_URL') or \
        'sqlite:///' + os.path.join(instance_path, 'tasks_prod.db')  # Use prod specific db
    # Add other production settings: logging, security headers, etc.

# Dictionary to easily access configs by name
config_by_name = dict(
    development=DevelopmentConfig,
    testing=TestingConfig,
    production=ProductionConfig
)
```
3. Initialize Extensions (`app/extensions.py`):
   Instantiate SQLAlchemy (and potentially others later).
4. Define Models (`app/models.py`):
   Move the `Task` model here. Import `db` from `.extensions`.

```python
# app/models.py
import datetime  # Import datetime directly

from .extensions import db  # Import db from extensions.py

class Task(db.Model):
    __tablename__ = 'tasks'

    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String(150), nullable=False)
    description = db.Column(db.Text, nullable=True)
    status = db.Column(db.String(50), nullable=False, default='pending')
    # Use server_default for database-level defaults, more portable than python default
    created_at = db.Column(db.DateTime, server_default=db.func.now())
    updated_at = db.Column(db.DateTime, server_default=db.func.now(), onupdate=db.func.now())

    def __repr__(self):
        return f"<Task {self.id}: {self.title} ({self.status})>"

    def to_dict(self):
        """Convert the Task object into a dictionary for JSON serialization."""
        return {
            'id': self.id,
            'title': self.title,
            'description': self.description,
            'status': self.status,
            'created_at': self.created_at.isoformat() if isinstance(self.created_at, datetime.datetime) else str(self.created_at),
            'updated_at': self.updated_at.isoformat() if isinstance(self.updated_at, datetime.datetime) else str(self.updated_at)
        }
```

   Note: `to_dict` was adjusted slightly to handle potential non-datetime values more gracefully, though ideally they should always be datetimes after retrieval.
- Create Task Blueprint (`app/tasks/__init__.py`): Define the Blueprint instance.
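This file's contents are not reproduced in the text; a minimal sketch of what it typically contains (the Blueprint instance, plus a bottom-of-file import of the routes module so the route decorators actually run):

```python
# app/tasks/__init__.py
from flask import Blueprint

# 'tasks' is the blueprint's name (used in endpoint names like 'tasks.get_task');
# __name__ lets Flask locate the package's resources.
tasks_bp = Blueprint('tasks', __name__)

# In the real file, import the routes module at the very bottom so that
# routes.py can do `from . import tasks_bp` without a circular import,
# and so the @tasks_bp.route decorators execute:
# from . import routes
```

The routes import is commented out here only because this sketch stands alone; in the project it must be present, or no routes get registered.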
- Define Task Routes (`app/tasks/routes.py`): Move all task-related routes here. Use `@tasks_bp.route`, import `db` and `Task`, and use `request`, `jsonify`, `abort`.

```python
# app/tasks/routes.py
from flask import request, jsonify, abort
from . import tasks_bp  # Import the blueprint instance from __init__.py
from ..extensions import db  # Import db from the main extensions module
from ..models import Task  # Import the Task model from the main models module
from sqlalchemy import exc

# --- Helper Functions (could be moved to a shared utils module) ---
def make_success_response(data_payload, status_code=200):
    response = jsonify({"status": "success", "data": data_payload})
    response.status_code = status_code
    return response

def make_error_response(message, status_code):
    response = jsonify({"status": "error", "error": {"message": message}})
    response.status_code = status_code
    return response

# --- Routes attached to tasks_bp ---
# Note: the URL prefix '/api/tasks' is added when registering the blueprint

@tasks_bp.route('/', methods=['GET'])  # Corresponds to GET /api/tasks/
def get_tasks():
    status_filter = request.args.get('status')
    query = Task.query
    if status_filter:
        if status_filter not in ['pending', 'completed']:
            abort(400, description="Invalid status filter. Use 'pending' or 'completed'.")
        query = query.filter_by(status=status_filter)
    tasks = query.order_by(db.desc(Task.created_at)).all()
    tasks_dict = [task.to_dict() for task in tasks]
    return make_success_response(tasks_dict)

@tasks_bp.route('/<int:task_id>', methods=['GET'])  # GET /api/tasks/<id>
def get_task(task_id):
    task = Task.query.get_or_404(task_id, description=f"Task with ID {task_id} not found")
    return make_success_response(task.to_dict())

@tasks_bp.route('/', methods=['POST'])  # POST /api/tasks/
def create_task():
    if not request.is_json:
        abort(415, description="Request must be JSON")
    data = request.get_json()
    if not data or 'title' not in data:
        abort(400, description="Missing 'title' in request body")
    title = data['title'].strip()
    if not isinstance(title, str) or len(title) == 0:
        abort(400, description="'title' must be a non-empty string")
    description = data.get('description')
    if description is not None and not isinstance(description, str):
        abort(400, description="'description' must be a string if provided")
    new_task = Task(title=title, description=description)
    try:
        db.session.add(new_task)
        db.session.commit()
        # Note: Location header generation needs url_for('.get_task', ...)
        # We might add that later or omit it for simplicity now.
        return make_success_response(new_task.to_dict(), 201)
    except exc.SQLAlchemyError as e:
        db.session.rollback()
        print(f"Database error on create: {e}")
        abort(500, description="Database error occurred during task creation.")
    except Exception as e:
        db.session.rollback()
        print(f"Unexpected error on create: {e}")
        abort(500, description="An unexpected error occurred.")

@tasks_bp.route('/<int:task_id>', methods=['PUT'])  # PUT /api/tasks/<id>
def update_task(task_id):
    task = Task.query.get_or_404(task_id, description=f"Task with ID {task_id} not found for update")
    if not request.is_json:
        abort(415, description="Request must be JSON")
    data = request.get_json()
    if not data:
        abort(400, description="Missing JSON data")
    if 'title' not in data or 'status' not in data:
        abort(400, description="Missing 'title' or 'status'")
    title = data['title'].strip()
    status = data['status']
    description = data.get('description')
    if not isinstance(title, str) or len(title) == 0:
        abort(400, description="'title' required")
    if status not in ['pending', 'completed']:
        abort(400, description="Invalid status")
    if description is not None and not isinstance(description, str):
        abort(400, description="'description' must be string")
    task.title = title
    task.status = status
    task.description = description
    try:
        db.session.commit()
        return make_success_response(task.to_dict())
    except exc.SQLAlchemyError as e:
        db.session.rollback()
        print(f"DB error: {e}")
        abort(500, description="DB error on update.")
    except Exception as e:
        db.session.rollback()
        print(f"Error: {e}")
        abort(500, description="Unexpected error on update.")

@tasks_bp.route('/<int:task_id>', methods=['PATCH'])  # PATCH /api/tasks/<id>
def patch_task(task_id):
    task = Task.query.get_or_404(task_id, description=f"Task with ID {task_id} not found")
    if not request.is_json:
        abort(415, description="Request must be JSON")
    data = request.get_json()
    if not data:
        abort(400, description="Missing JSON data")
    updated = False
    if 'title' in data:
        title = data['title'].strip()
        if not isinstance(title, str) or len(title) == 0:
            abort(400, description="'title' required")
        task.title = title
        updated = True
    if 'description' in data:
        description = data['description']
        if description is not None and not isinstance(description, str):
            abort(400, description="'description' must be string")
        task.description = description
        updated = True
    if 'status' in data:
        status = data['status']
        if status not in ['pending', 'completed']:
            abort(400, description="Invalid status")
        task.status = status
        updated = True
    if not updated:
        abort(400, description="No valid fields provided for update")
    try:
        db.session.commit()
        return make_success_response(task.to_dict())
    except exc.SQLAlchemyError as e:
        db.session.rollback()
        print(f"DB error: {e}")
        abort(500, description="DB error on patch.")
    except Exception as e:
        db.session.rollback()
        print(f"Error: {e}")
        abort(500, description="Unexpected error on patch.")

@tasks_bp.route('/<int:task_id>', methods=['DELETE'])  # DELETE /api/tasks/<id>
def delete_task(task_id):
    task = Task.query.get_or_404(task_id, description=f"Task ID {task_id} not found")
    try:
        db.session.delete(task)
        db.session.commit()
        return '', 204
    except exc.SQLAlchemyError as e:
        db.session.rollback()
        print(f"DB error: {e}")
        abort(500, description="DB error on delete.")
    except Exception as e:
        db.session.rollback()
        print(f"Error: {e}")
        abort(500, description="Unexpected error on delete.")

# --- Blueprint-Specific Error Handlers (optional example) ---
# These catch errors only within routes defined by this blueprint,
# if not handled by more specific app error handlers.
@tasks_bp.errorhandler(400)
def handle_task_bad_request(error):
    # You could customize the error message further here if needed
    return make_error_response(error.description or "Bad request in tasks API", 400)
```
- Implement Application Factory (`app/__init__.py`): Bring everything together here.

```python
# app/__init__.py
import os
from flask import Flask, jsonify

from .config import config_by_name  # Import the config dictionary
from .extensions import db  # Import extension instances
from .tasks import tasks_bp  # Import the blueprint instance
from .models import Task  # Import models so they are known to SQLAlchemy

def create_app(config_name='development'):
    """Application factory function."""
    app = Flask(__name__, instance_relative_config=True)

    # --- Load Configuration ---
    try:
        config_object = config_by_name[config_name]
        app.config.from_object(config_object)
        print(f"Loading configuration: {config_name}")
    except KeyError:
        raise ValueError(
            f"Invalid configuration name: {config_name}. "
            f"Choose from {list(config_by_name.keys())}"
        )
    # Load instance config if it exists (e.g., instance/config.py)
    app.config.from_pyfile('config.py', silent=True)

    # --- Initialize Extensions ---
    db.init_app(app)
    # migrate.init_app(app, db)  # If using Flask-Migrate

    # --- Register Blueprints ---
    app.register_blueprint(tasks_bp, url_prefix='/api/tasks')
    # Register other blueprints here...

    # --- Application-Wide Error Handlers ---
    @app.errorhandler(404)
    def handle_not_found(error):
        # Check if the description comes from abort()
        message = error.description if hasattr(error, 'description') else \
            "The requested URL was not found on the server."
        return jsonify({"status": "error", "error": {"message": message}}), 404

    @app.errorhandler(500)
    def handle_internal_error(error):
        # Log the actual error in production!
        print(f"Internal Server Error: {error}")  # Print for debug purposes
        db.session.rollback()  # Roll back the session
        return jsonify({"status": "error", "error": {"message": "An internal server error occurred."}}), 500

    @app.errorhandler(400)  # Example generic handler
    def handle_bad_request(error):
        message = error.description if hasattr(error, 'description') else "Bad request."
        return jsonify({"status": "error", "error": {"message": message}}), 400

    @app.errorhandler(415)  # Example generic handler
    def handle_unsupported_media_type(error):
        message = error.description if hasattr(error, 'description') else "Unsupported Media Type."
        return jsonify({"status": "error", "error": {"message": message}}), 415

    # --- Database Creation & Seeding ---
    # This is okay for simple cases but less flexible than migrations
    # (consider Flask-Migrate or CLI commands for real projects).
    @app.cli.command("init-db")
    def init_db_command():
        """Clear existing data and create new tables."""
        # Make sure we have an application context
        with app.app_context():
            db.drop_all()
            db.create_all()
            print("Initialized the database.")
            # Seed initial data
            if Task.query.count() == 0:
                print("Seeding initial tasks...")
                initial_tasks = [
                    Task(title="Refactor with Blueprints", status="completed"),
                    Task(title="Implement App Factory", status="completed"),
                    Task(title="Test Refactored API", status="pending")
                ]
                db.session.bulk_save_objects(initial_tasks)
                db.session.commit()
                print("Initial tasks seeded.")

    # Optional: create tables automatically if the DB doesn't exist (less controlled).
    # Be cautious with this in production or with migrations.
    # with app.app_context():
    #     db.create_all()

    # --- Simple Health Check Route ---
    @app.route('/health')
    def health_check():
        return jsonify({"status": "ok"})

    print("-" * 40)
    print(f"Application '{app.name}' created.")
    print(f"Configuration: {config_name}")
    print(f"Debug Mode: {app.config['DEBUG']}")
    print(f"Database URI: {app.config['SQLALCHEMY_DATABASE_URI']}")
    print(f"Registered Blueprints: {list(app.blueprints.keys())}")
    print("-" * 40)

    return app
```

We added an `init-db` CLI command for more explicit database setup.
Create Runner Script (
run.py
):# run.py import os from app import create_app # Determine the configuration name (e.g., from FLASK_CONFIG env var) # Default to 'development' if not set config_name = os.getenv('FLASK_CONFIG', 'development') print(f"Starting app with config: {config_name}") app = create_app(config_name) if __name__ == '__main__': # Port and host can also be loaded from config if needed port = int(os.getenv('PORT', 5000)) app.run(host='0.0.0.0', port=port) # Debug is set by the config object
- Initialize and Test:
- Initialize Database: Open your terminal in the `advanced_api` directory (with the `venv` active). Run the custom Flask CLI command:

```bash
# Ensure the FLASK_APP environment variable is set (needed for the 'flask' command)
export FLASK_APP=run.py

# Or, if using python-dotenv, create a .flaskenv file with:
# FLASK_APP=run.py
# FLASK_CONFIG=development

# Run the init-db command
flask init-db

# Expected output:
# Initialized the database.
# Seeding initial tasks...
# Initial tasks seeded.
```

Verify that `instance/tasks_dev.db` has been created.
- Run the Application:

```bash
python run.py

# Or using the Flask CLI (respects FLASK_APP, FLASK_DEBUG from env vars/config):
# flask run --host=0.0.0.0 --port=5000
```

You should see the startup messages from the `create_app` factory.
- Test Endpoints (using `curl`): Notice the `/api/tasks` prefix is now required for all task routes.

```bash
# Health check (defined in the app factory)
curl http://127.0.0.1:5000/health

# Get all tasks (now under /api/tasks/)
curl http://127.0.0.1:5000/api/tasks/

# Get task 1
curl http://127.0.0.1:5000/api/tasks/1

# Create a task
curl -X POST http://127.0.0.1:5000/api/tasks/ \
     -H "Content-Type: application/json" \
     -d '{"title": "Implement Authentication"}'

# Get all tasks again (verify creation)
curl http://127.0.0.1:5000/api/tasks/

# Delete a task (e.g., task 3)
curl -i -X DELETE http://127.0.0.1:5000/api/tasks/3

# Verify deletion
curl http://127.0.0.1:5000/api/tasks/3  # Should be 404
```
- Stop the Server: Press `Ctrl+C`.
Outcome: You have successfully refactored the Task API into a much more organized structure using Blueprints and the Application Factory pattern. The core logic for tasks is now neatly contained within the `app/tasks` package. The application is configured via classes in `config.py`, extensions are initialized centrally, and the `create_app` function provides a flexible entry point for creating app instances. This structure is significantly more scalable and maintainable for larger projects.
6. Input Validation and Serialization (Marshmallow)
In the previous sections, we manually validated incoming request data using `if` conditions and dictionaries, and we manually converted our SQLAlchemy model instances into dictionaries using a `to_dict()` method for JSON responses. While this works for simple APIs, it quickly becomes:
- Repetitive: Writing the same validation logic (checking types, required fields) across multiple endpoints is tedious and violates the DRY (Don't Repeat Yourself) principle.
- Error-Prone: It's easy to miss edge cases or introduce inconsistencies in manual validation and serialization.
- Poor Separation of Concerns: Validation and serialization logic clutter your view functions (route handlers), mixing data transformation/validation with request handling and business logic.
- Difficult to Maintain: Changing a model often requires updating validation logic and `to_dict` methods in multiple places.
To address these challenges, the Python ecosystem offers excellent libraries for data validation and serialization/deserialization. Marshmallow is the most popular and powerful library for this purpose, especially within the Flask community when paired with the Flask-Marshmallow extension.
What is Marshmallow?
Marshmallow is an ORM/ODM/framework-agnostic library for:
- Serialization: Converting complex data types, such as objects (like our SQLAlchemy models), into native Python data types (dictionaries, lists, etc.) that can be easily rendered into standard formats like JSON. This replaces our manual `to_dict()` method.
- Deserialization: Parsing and converting native Python data types (typically obtained from incoming request data like JSON) back into application-level objects or validated data structures.
- Validation: Validating incoming data against predefined rules (e.g., required fields, data types, length constraints, specific choices) during the deserialization process.
Think of a Marshmallow Schema as a declarative layer that defines how your data should be structured, validated, and transformed between its internal representation (e.g., a `Task` object) and its external representation (e.g., JSON).
Setting up Marshmallow
Flask-Marshmallow provides helpful integration features, including generating schemas directly from SQLAlchemy models.
1. Installation:
Ensure your virtual environment for the `advanced_api` project is active (`source venv/bin/activate`).
```bash
pip install Flask-Marshmallow marshmallow-sqlalchemy
# Flask-Marshmallow brings in the Marshmallow core
# marshmallow-sqlalchemy is needed for SQLAlchemy model integration

# Add to requirements.txt
echo "Flask-Marshmallow" >> requirements.txt
echo "marshmallow-sqlalchemy" >> requirements.txt
pip freeze > requirements.txt  # Optional: update requirements with exact versions
```
2. Initialization: Like other Flask extensions, we initialize Flask-Marshmallow using the factory pattern.
- `app/extensions.py`: Instantiate the Marshmallow object.
- `app/__init__.py`: Initialize it with the app instance inside `create_app`.

```python
# app/__init__.py
import os
from flask import Flask, jsonify
from .config import config_by_name
from .extensions import db, ma  # Import ma
# ... other imports ...

def create_app(config_name='development'):
    # ... (Flask app creation and config loading) ...

    # --- Initialize Extensions ---
    db.init_app(app)
    ma.init_app(app)  # Initialize Marshmallow with the app
    # ... (other extensions) ...

    # --- Register Blueprints ---
    # ... (blueprint registration) ...

    # --- Error Handlers ---
    # ... (error handler definitions) ...

    # Add an error handler for Marshmallow's ValidationError
    from marshmallow import ValidationError  # Import ValidationError

    @app.errorhandler(ValidationError)
    def handle_marshmallow_validation(err):
        # err.messages is a dictionary containing validation errors:
        # {"field_name": ["error message 1", ...], ...}
        app.logger.warning(f"Marshmallow Validation Error: {err.messages}")  # Log the error
        return jsonify({
            "status": "error",
            "error": {
                "message": "Input validation failed",
                "details": err.messages
            }
        }), 422  # 422 Unprocessable Entity is appropriate for validation errors

    # ... (CLI commands, health check, etc.) ...

    return app
```

We added a dedicated error handler for `marshmallow.ValidationError`. When Marshmallow's `load` method encounters validation errors, it raises this exception. Our handler catches it, logs the details, and returns a standardized 422 error response containing the specific validation messages from Marshmallow.
Defining Schemas
Schemas are classes that inherit from `ma.Schema` (provided by Flask-Marshmallow) or specialized schema types like `ma.SQLAlchemyAutoSchema`.

Let's create a schema for our `Task` model. For efficiency, and to avoid redefining fields already present in our SQLAlchemy model, we'll use `SQLAlchemyAutoSchema`. This automatically inspects the `Task` model and generates corresponding Marshmallow fields.
- Create `app/tasks/schemas.py`:

```python
# app/tasks/schemas.py
from ..extensions import db, ma  # Import the db and Marshmallow instances
from ..models import Task  # Import the Task model
from marshmallow import fields, validate  # Import fields and validation helpers

class TaskSchema(ma.SQLAlchemyAutoSchema):
    """
    Marshmallow schema for the Task model, for serialization and deserialization.
    Uses SQLAlchemyAutoSchema to automatically generate fields from the Task model.
    """
    class Meta:
        model = Task  # Specify the SQLAlchemy model to introspect
        load_instance = True  # Optional: deserialize to a model instance directly
        # exclude = ("updated_at",)  # Optional: fields to exclude from serialization

        # Include the SQLAlchemy session so Marshmallow can work with the
        # existing db session when creating/updating model instances during load.
        sqla_session = db.session  # Use the session from Flask-SQLAlchemy

    # --- Field Overrides & Validation ---
    # Although fields are generated automatically, we can override them
    # to add validation or customize serialization/deserialization.

    # Make 'title' required on input (load), and add length validation
    title = fields.String(
        required=True,
        validate=validate.Length(min=1, max=150,
                                 error="Title must be between 1 and 150 characters.")
    )

    # 'status' defaults in the model, so it is not strictly required on input.
    # For PUT (full update), we'll handle the requirement check in the view
    # logic or a separate schema; here we primarily validate the allowed choices.
    status = fields.String(
        required=False,
        validate=validate.OneOf(["pending", "completed"],
                                error="Status must be either 'pending' or 'completed'.")
    )

    # Ensure 'description', if provided, is a string
    description = fields.String(required=False)  # Allows null/omission

    # Make certain fields read-only (cannot be provided during input/load)
    id = fields.Integer(dump_only=True)  # dump_only: only used for serialization output
    created_at = fields.DateTime(dump_only=True, format='iso')  # ISO 8601 output
    updated_at = fields.DateTime(dump_only=True, format='iso')

# Optional: schema for PATCH operations where all fields are optional
class TaskPatchSchema(TaskSchema):
    class Meta(TaskSchema.Meta):
        # Override Meta options if needed for PATCH
        pass

    # Make fields optional for PATCH by setting required=False
    title = fields.String(
        required=False,
        validate=validate.Length(min=1, max=150,
                                 error="Title must be between 1 and 150 characters.")
    )
    # Status validation remains, but it is not required
    status = fields.String(
        required=False,
        validate=validate.OneOf(["pending", "completed"],
                                error="Status must be 'pending' or 'completed'.")
    )
    # Description is already optional
```
Explanation:
- `TaskSchema(ma.SQLAlchemyAutoSchema)`: Inherits from `SQLAlchemyAutoSchema` to link with the `Task` model.
- `class Meta:`: Inner class to configure the schema.
  - `model = Task`: Tells the schema which SQLAlchemy model to base itself on.
  - `load_instance = True`: When `schema.load(data)` is called, it will attempt to return an instance of the `Task` model (either new or updated) instead of just a dictionary. This is very convenient for ORM integration.
  - `sqla_session = db.session`: Critical for `load_instance=True`. It tells Marshmallow to use the current Flask-SQLAlchemy database session when creating or updating model instances.
- Field Overrides: Even with `AutoSchema`, you can explicitly define fields. This is useful for:
  - Adding validation (`required=True`, `validate=...`).
  - Marking fields as output-only (`dump_only=True`). This prevents clients from trying to set values for fields like `id` and `created_at` during POST/PUT/PATCH requests.
  - Marking fields as input-only (`load_only=True`).
  - Specifying output formatting (`format='iso'` for datetimes).
- Validation Helpers: Marshmallow provides useful validators like `validate.Length`, `validate.OneOf`, `validate.Range`, etc.
- `TaskPatchSchema`: We created a separate schema inheriting from `TaskSchema` specifically for PATCH operations. In this schema, we override fields like `title` and `status` to set `required=False`, reflecting the nature of PATCH where any subset of fields can be provided.
Serialization (Object -> JSON)
Now, let's replace the `to_dict()` method calls in our GET routes with Marshmallow serialization using `schema.dump()`.
- Modify `app/tasks/routes.py` (GET routes):

```python
# app/tasks/routes.py
from flask import request, jsonify, abort
from . import tasks_bp
from ..extensions import db
from ..models import Task
from .schemas import TaskSchema, TaskPatchSchema  # Import schemas
from sqlalchemy import exc
from marshmallow import ValidationError  # For specific handling if needed

# Instantiate schemas once (outside the functions, to avoid re-creation on every request).
# Use many=True for lists, many=False (the default) for single objects.
task_schema = TaskSchema()
tasks_schema = TaskSchema(many=True)
task_patch_schema = TaskPatchSchema()  # Schema for PATCH operations

# --- Helper Functions --- (keep them or integrate into responses)
def make_success_response(data_payload, status_code=200):
    response = jsonify({"status": "success", "data": data_payload})
    response.status_code = status_code
    return response

# ... make_error_response ... (keep this)

# --- Routes ---

@tasks_bp.route('/', methods=['GET'])
def get_tasks():
    status_filter = request.args.get('status')
    query = Task.query
    if status_filter:
        # Basic validation for the filter value
        if status_filter not in ['pending', 'completed']:
            abort(400, description="Invalid status filter. Use 'pending' or 'completed'.")
        query = query.filter_by(status=status_filter)
    tasks = query.order_by(db.desc(Task.created_at)).all()
    # tasks_schema.dump() automatically converts the list of Task objects
    # into a list of dictionaries according to the TaskSchema definition.
    result = tasks_schema.dump(tasks)
    return make_success_response(result)

@tasks_bp.route('/<int:task_id>', methods=['GET'])
def get_task(task_id):
    task = Task.query.get_or_404(task_id, description=f"Task with ID {task_id} not found")
    # task_schema.dump() converts the single Task object into a dictionary.
    result = task_schema.dump(task)
    return make_success_response(result)

# ... (keep POST, PUT, PATCH, DELETE for now) ...
```
Explanation:
- We import `TaskSchema` and `TaskPatchSchema`.
- We instantiate the schemas once, outside the view functions. `TaskSchema(many=True)` creates an instance specifically designed to handle lists of objects.
- In `get_tasks`, `tasks_schema.dump(tasks)` takes the list of `Task` model instances and returns a list of dictionaries, ready for `jsonify`.
- In `get_task`, `task_schema.dump(task)` takes the single `Task` instance and returns a dictionary.
- The manual `to_dict()` method in the `Task` model is no longer needed for serialization and can be removed.
Deserialization and Validation (JSON -> Object Data)
The real power comes from using `schema.load(data)` in your POST, PUT, and PATCH routes. This method performs two actions:
- Deserialization: Converts the input data (e.g., the dictionary from `request.get_json()`) into the structure defined by the schema. If `load_instance=True`, it attempts to create/update a model instance.
- Validation: Applies all validation rules defined in the schema (
required
,validate
, etc.). If validation fails, it raises aValidationError
.
Let's refactor the write operations:
-
Modify `app/tasks/routes.py` (POST, PUT, PATCH routes):

```python
# app/tasks/routes.py
# ... (imports and schema instantiations remain the same) ...

# --- Routes ---
# ... (GET routes remain the same) ...

@tasks_bp.route('/', methods=['POST'])
def create_task():
    json_data = request.get_json()
    if not json_data:
        # Use make_error_response or abort for consistency
        return make_error_response("No input data provided", 400)
        # abort(400, description="No input data provided")
    try:
        # Validate and deserialize the input data using the schema.
        # Because load_instance=True, this returns a Task model instance.
        # We pass the session explicitly so the new instance is associated with it.
        new_task = task_schema.load(json_data, session=db.session)

        # Persist the new task instance (created by schema.load)
        db.session.add(new_task)
        db.session.commit()

        # Serialize the newly created task for the response
        result = task_schema.dump(new_task)
        return make_success_response(result, 201)
    except ValidationError as err:
        # Handled locally here for clarity. Alternatively, just re-raise (or
        # use `pass` and let the exception propagate): the app-level
        # handle_marshmallow_validation handler in app/__init__.py would
        # then catch it and return the standardized 422 response.
        return make_error_response(f"Input validation failed: {err.messages}", 422)
    except exc.SQLAlchemyError as e:
        # Catch potential database errors
        db.session.rollback()
        print(f"Database error on create: {e}")
        abort(500, description="Database error occurred during task creation.")
    except Exception as e:
        # Catch other unexpected errors
        db.session.rollback()
        print(f"Unexpected error on create: {e}")
        abort(500, description="An unexpected error occurred.")

@tasks_bp.route('/<int:task_id>', methods=['PUT'])
def update_task(task_id):
    task = Task.query.get_or_404(task_id, description=f"Task ID {task_id} not found for update")
    json_data = request.get_json()
    if not json_data:
        return make_error_response("No input data provided", 400)
    try:
        # Validate and deserialize input data into the existing task instance.
        # load(data, instance=existing_instance, session=...) updates it in place.
        updated_task = task_schema.load(
            json_data,
            instance=task,       # Pass the existing task to update
            session=db.session,  # Provide the session
            partial=False        # Ensure all required fields are present for PUT
        )
        db.session.commit()

        # Serialize the updated task
        result = task_schema.dump(updated_task)
        return make_success_response(result)
    except ValidationError as err:
        return make_error_response(f"Input validation failed: {err.messages}", 422)
    except exc.SQLAlchemyError as e:
        db.session.rollback()
        print(f"Database error on update: {e}")
        abort(500, description="Database error occurred during task update.")
    except Exception as e:
        db.session.rollback()
        print(f"Unexpected error on update: {e}")
        abort(500, description="An unexpected error occurred.")

@tasks_bp.route('/<int:task_id>', methods=['PATCH'])
def patch_task(task_id):
    task = Task.query.get_or_404(task_id, description=f"Task ID {task_id} not found for update")
    json_data = request.get_json()
    if not json_data:
        return make_error_response("No input data provided", 400)
    try:
        # Use the specific TaskPatchSchema for partial updates;
        # partial=True allows missing fields.
        updated_task = task_patch_schema.load(
            json_data,
            instance=task,       # Update the existing task
            session=db.session,  # Provide the session
            partial=True         # Missing fields are ignored
        )

        # Marshmallow's load with partial=True won't error on empty data,
        # so check that at least one field was provided.
        if not json_data.keys():
            abort(400, description="No fields provided for update in PATCH request.")

        db.session.commit()

        # Serialize the updated task (use the main schema for output)
        result = task_schema.dump(updated_task)
        return make_success_response(result)
    except ValidationError as err:
        return make_error_response(f"Input validation failed: {err.messages}", 422)
    except exc.SQLAlchemyError as e:
        db.session.rollback()
        print(f"Database error on patch: {e}")
        abort(500, description="Database error occurred during task update.")
    except Exception as e:
        db.session.rollback()
        print(f"Unexpected error on patch: {e}")
        abort(500, description="An unexpected error occurred.")

@tasks_bp.route('/<int:task_id>', methods=['DELETE'])
def delete_task(task_id):
    # No validation needed for delete beyond checking existence
    task = Task.query.get_or_404(task_id, description=f"Task ID {task_id} not found")
    try:
        db.session.delete(task)
        db.session.commit()
        return '', 204  # No Content
    except exc.SQLAlchemyError as e:
        db.session.rollback()
        print(f"Database error on delete: {e}")
        abort(500, description="Database error occurred during task deletion.")
    except Exception as e:
        db.session.rollback()
        print(f"Unexpected error on delete: {e}")
        abort(500, description="An unexpected error occurred.")

# --- Blueprint-Specific Error Handlers ---
# Remove these, or keep them if blueprint-specific handling is needed.
# The app-level handlers are generally sufficient.
# @tasks_bp.errorhandler(400) ...
```
Explanation:
- `schema.load(json_data, session=db.session)` (POST): Takes the raw JSON dictionary and validates it against `TaskSchema`. If valid, it creates a new `Task` model instance (because `load_instance=True`). We pass `db.session` so the new instance is associated with the current transaction.
- `schema.load(json_data, instance=task, session=db.session, partial=False)` (PUT): Takes the raw JSON and validates it against `TaskSchema`. The `instance=task` argument tells Marshmallow to update the existing `task` object with the loaded data instead of creating a new one. `partial=False` (the default for `load`) enforces that all fields not marked as `dump_only` in the schema must be present in the input, which aligns with the semantics of PUT (full replacement).
- `task_patch_schema.load(json_data, instance=task, session=db.session, partial=True)` (PATCH): Uses the `TaskPatchSchema` (where fields are optional). `instance=task` updates the existing object. Crucially, `partial=True` tells `load` to update only the fields actually present in `json_data` and not to error if other fields are missing.
- Error Handling: The `try...except ValidationError` block catches validation errors raised by `schema.load()`. Our app-level error handler (`handle_marshmallow_validation`) takes care of formatting the 422 response when the exception propagates. Other exceptions (database errors, etc.) are handled as before.
- Committing: After a successful `load` that modifies an instance (directly or via `load_instance=True`), you still need to call `db.session.commit()` to persist the changes to the database. Marshmallow only handles object manipulation and validation in memory.
Workshop: Integrating Marshmallow into the Task API
Goal:
Apply Marshmallow schemas to the refactored Task API (using Blueprints and the Application Factory) for robust validation and serialization, eliminating manual checks and `to_dict()`.
Steps:
1. Prerequisites:
    - Ensure you are in the `advanced_api` project directory.
    - Ensure the virtual environment is active (`source venv/bin/activate`).
    - Make sure `Flask-Marshmallow` and `marshmallow-sqlalchemy` are installed (`pip install Flask-Marshmallow marshmallow-sqlalchemy`) and added to `requirements.txt`.
2. Initialize the Marshmallow Extension:
    - Modify `app/extensions.py` to add `ma = Marshmallow()`.
    - Modify `app/__init__.py` to import `ma` and call `ma.init_app(app)` inside `create_app`.
    - Add the `handle_marshmallow_validation` error handler within `create_app` in `app/__init__.py` (as shown in the explanation above).
3. Create the Task Schemas:
    - Create the file `app/tasks/schemas.py`.
    - Define `TaskSchema` inheriting from `ma.SQLAlchemyAutoSchema` inside this file.
    - Configure its `Meta` class (`model=Task`, `load_instance=True`, `sqla_session=db.session`).
    - Override fields (`title`, `status`, `id`, `created_at`, `updated_at`) to add validation (`required`, `validate.Length`, `validate.OneOf`) and serialization control (`dump_only`, `format`).
    - Define the optional `TaskPatchSchema` inheriting from `TaskSchema` and making the relevant fields not required (`required=False`).
4. Refactor the Task Routes (`app/tasks/routes.py`):
    - Import `TaskSchema` and `TaskPatchSchema` from `.schemas`.
    - Import `ValidationError` from `marshmallow`.
    - Instantiate the schemas outside the view functions: `task_schema`, `tasks_schema`, `task_patch_schema`.
    - GET routes (`/` and `/<id>`): Replace `task.to_dict()` / the list comprehension with `task_schema.dump(task)` / `tasks_schema.dump(tasks)`.
    - POST route (`/`):
        - Get `request.get_json()`.
        - Inside a `try` block, call `new_task = task_schema.load(json_data, session=db.session)`.
        - Call `db.session.add(new_task)` and `db.session.commit()`.
        - Serialize the result using `task_schema.dump(new_task)`.
        - Add an `except ValidationError as err:` block (it can re-raise or return `make_error_response`; the app-level handler will catch the error if it isn't handled locally). Keep the other exception handlers.
    - PUT route (`/<id>`):
        - Fetch the existing `task`.
        - Get `request.get_json()`.
        - Inside a `try` block, call `updated_task = task_schema.load(json_data, instance=task, session=db.session, partial=False)`.
        - Call `db.session.commit()`.
        - Serialize using `task_schema.dump(updated_task)`.
        - Add `except ValidationError as err:`.
    - PATCH route (`/<id>`):
        - Fetch the existing `task`.
        - Get `request.get_json()`.
        - Inside a `try` block, call `updated_task = task_patch_schema.load(json_data, instance=task, session=db.session, partial=True)`.
        - Add the check `if not json_data.keys(): abort(400, ...)`.
        - Call `db.session.commit()`.
        - Serialize using `task_schema.dump(updated_task)`.
        - Add `except ValidationError as err:`.
    - DELETE route (`/<id>`): No changes needed here regarding Marshmallow.
5. Remove the `to_dict()` Method:
    - Go to `app/models.py` and delete the `to_dict(self)` method from the `Task` class. It is no longer used.
6. Test Thoroughly:
    - Initialize the database: Run `flask init-db` if needed.
    - Run the application: `python run.py`.
    - Test success cases (using `curl`):
        - `GET /api/tasks/`
        - `GET /api/tasks/1`
        - `POST /api/tasks/` with valid data (`{"title": "Test Marshmallow", "description": "..."}`) -> Check the 201 response and data format.
        - `PUT /api/tasks/1` with valid full data (`{"title": "Updated Title", "status": "completed", "description": "..."}`) -> Check the 200 response.
        - `PATCH /api/tasks/1` with partial data (`{"status": "pending"}`) -> Check the 200 response.
        - `DELETE /api/tasks/2` -> Check the 204 response.
    - Test validation error cases:
        - `POST /api/tasks/` with a missing title (`{}`) -> Expect 422; check the error details.
        - `POST /api/tasks/` with an empty title (`{"title": ""}`) -> Expect 422.
        - `POST /api/tasks/` with an invalid status (`{"title": "Bad Status", "status": "invalid"}`) -> Expect 422.
        - `PUT /api/tasks/1` with a missing status (`{"title": "Incomplete PUT"}`) -> Expect 422 (because `partial=False`).
        - `PATCH /api/tasks/1` with an invalid status (`{"status": "invalid"}`) -> Expect 422.
    - Test other errors:
        - `GET /api/tasks/999` -> Expect 404.
        - Send non-JSON data to POST/PUT/PATCH -> Expect 415 (if the `request.is_json` check is still in place) or potentially other errors depending on Flask/Werkzeug handling.
Outcome: Your API now uses Marshmallow for robust data validation and serialization. Your view functions are cleaner, focusing on request handling and database interaction, while the schema definitions handle the data transformation and validation rules declaratively. Error handling for validation failures is standardized through the ValidationError
handler, providing informative responses to the client.
7. Authentication and Authorization (JWT)
Most real-world APIs need to control who can access them and what actions different users are allowed to perform. Simply exposing CRUD operations on your data to the public internet is usually not desirable or secure. This is where Authentication and Authorization come in.
This section explores these critical concepts and demonstrates how to implement a common and robust solution using JSON Web Tokens (JWT) with the Flask-JWT-Extended extension.
Understanding the Concepts
It's crucial to distinguish between Authentication and Authorization:
- Authentication (AuthN): "Who are you?"
    - This is the process of verifying the identity of a client (a user, another service, etc.) trying to access your API.
    - The client typically presents some form of credentials (like a username/password, an API key, or a token).
    - The server validates these credentials against a trusted source (e.g., a user database).
    - If successful, the server knows who the client is.
    - Example: Logging into a website with your username and password authenticates you.
- Authorization (AuthZ): "What are you allowed to do?"
    - This process occurs after successful authentication.
    - It determines whether the authenticated client has the necessary permissions to perform the requested action on a specific resource.
    - Permissions are often based on roles (e.g., admin, editor, viewer) or specific access rights.
    - Example: After logging in (authentication), you might be authorized to read articles but not to publish new ones unless you have an 'editor' role.
For APIs, especially stateless REST APIs, managing authentication and authorization on every request requires a mechanism that doesn't rely on traditional server-side sessions stored in memory.
Common Authentication Methods for APIs
Several strategies exist for API authentication:
- HTTP Basic Authentication:
    - How it works: The client sends the username and password in the `Authorization` HTTP header, Base64 encoded (`Authorization: Basic base64(username:password)`).
    - Pros: Simple to implement, widely supported.
    - Cons: Sends credentials with every request. MUST be used over HTTPS to prevent credentials from being intercepted as plain text (Base64 is easily decoded). Not suitable for third-party applications accessing user data. Generally considered insecure for modern web APIs unless strictly over HTTPS and for very simple use cases.
- API Keys:
    - How it works: The server issues a unique secret key to each client. The client includes this key in requests, often in a custom HTTP header (e.g., `X-API-Key: your_secret_key`) or as a query parameter. The server validates the key.
    - Pros: Relatively simple. Good for server-to-server communication or tracking usage by different applications.
    - Cons: Keys are often long-lived; if compromised, they grant access until revoked. Managing revocation can be complex. A key doesn't inherently represent a specific user session, and it typically carries no expiration time or fine-grained permissions.
- Token-Based Authentication (JWT):
    - How it works: This is the most common approach for modern APIs, especially those serving web and mobile front-ends.
        - The user authenticates once (e.g., with username/password).
        - The server generates a signed token (usually a JWT) containing information (claims) about the user (e.g., user ID, roles) and an expiration time.
        - The server sends this token back to the client.
        - The client stores the token (e.g., in browser local storage or memory) and includes it in the `Authorization` header of subsequent requests (typically as a Bearer token: `Authorization: Bearer <token>`).
        - The server verifies the token's signature (to ensure it wasn't tampered with) and checks its expiration. Since the user's identity and potentially roles are inside the token payload, the server doesn't need to perform a database lookup on every request just to know who the user is (though it might for authorization checks).
    - Pros: Stateless (the server doesn't need to store session state). Scalable (verification is CPU-bound, not I/O-bound). Secure when implemented correctly over HTTPS (the signature prevents tampering). Flexible (can carry custom claims). Widely adopted standard.
    - Cons: Tokens can be large if they contain many claims. If a token is compromised before it expires, it can be used by an attacker (mitigated by short expiration times and refresh tokens/blocklists). Revoking a token before it expires requires extra mechanisms (such as blocklists).
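To make the Basic-auth caveat concrete (Base64 is encoding, not encryption), here is a quick standard-library sketch; the credentials are invented for the demo:

```python
import base64

# Hypothetical credentials - never use real ones in examples
credentials = "alice:s3cret"

# What the client puts in the Authorization header
header_value = "Basic " + base64.b64encode(credentials.encode()).decode()
print(header_value)  # Basic YWxpY2U6czNjcmV0

# Anyone who intercepts the header recovers the credentials trivially
encoded = header_value.split(" ", 1)[1]
recovered = base64.b64decode(encoded).decode()
assert recovered == credentials  # no secret was needed to "decrypt" it
```

This is why Basic authentication is only acceptable over HTTPS, where the whole header is protected in transit.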
We will focus on JWT as it's highly relevant for typical web API development.
JSON Web Tokens (JWT) Structure
A JWT typically consists of three parts, separated by dots (`.`):

- Header: Contains metadata about the token, like the type (`JWT`) and the signing algorithm used (`HS256`, `RS256`, etc.). Base64Url encoded.
- Payload: Contains the claims – statements about the entity (typically the user) and additional data. There are registered claims (standardized, like `iss` issuer, `exp` expiration time, `sub` subject/user ID), public claims, and private claims (custom data). Base64Url encoded.
- Signature: Created by taking the encoded header, the encoded payload, and a secret key (known only to the server), and signing them using the algorithm specified in the header.
The signature ensures integrity. Anyone can decode the header and payload (they are just Base64Url encoded, not encrypted), but only someone with the secret key can create a valid signature. The server uses the secret key to verify that the header and payload haven't been changed since the token was issued.
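A minimal standard-library sketch of this structure (HS256 via `hmac`; the secret and claims are invented for the demo) shows both properties at once: anyone can read the payload, but only the secret holder can produce a signature that verifies:

```python
import base64
import hashlib
import hmac
import json

def b64url_encode(data: bytes) -> str:
    # JWT uses URL-safe Base64 with the '=' padding stripped
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def b64url_decode(data: str) -> bytes:
    # Restore the stripped padding before decoding
    return base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))

def sign_hs256(header_b64: str, payload_b64: str, secret: bytes) -> str:
    signing_input = f"{header_b64}.{payload_b64}".encode()
    return b64url_encode(hmac.new(secret, signing_input, hashlib.sha256).digest())

secret = b"demo-secret"  # invented for the demo; never hardcode real secrets
header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
payload = b64url_encode(json.dumps({"sub": 1, "role": "admin"}).encode())
token = f"{header}.{payload}.{sign_hs256(header, payload, secret)}"

# Anyone can read the payload without knowing the secret...
claims = json.loads(b64url_decode(token.split(".")[1]))
print(claims)  # {'sub': 1, 'role': 'admin'}

# ...but only the secret holder can produce a valid signature
h, p, s = token.split(".")
assert hmac.compare_digest(sign_hs256(h, p, secret), s)           # genuine token verifies
forged = b64url_encode(json.dumps({"sub": 1, "role": "root"}).encode())
assert not hmac.compare_digest(sign_hs256(h, forged, secret), s)  # tampering breaks it
```

In practice, a library such as Flask-JWT-Extended handles all of this for you; the sketch only illustrates what "signed, not encrypted" means.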
Implementing JWT with Flask-JWT-Extended
Flask-JWT-Extended is a popular Flask extension that simplifies JWT implementation.
1. Setup and Configuration:
- Installation: (Assuming you're still in the `advanced_api` project with `venv` active.)

```bash
pip install Flask-JWT-Extended
```

- Configuration (`app/config.py`): You MUST set a secret key for signing the JWTs. This should be a strong, random, secret value, ideally loaded from environment variables or an instance config file (and never committed to version control).

```python
# app/config.py
import os
# ... other imports ...

class Config:
    # ... other base config ...
    # Use the existing SECRET_KEY or define a specific JWT one
    JWT_SECRET_KEY = os.environ.get('JWT_SECRET_KEY', 'default-super-secret-key-for-dev-change-me!')
    # Recommended: Use a different secret than Flask's general SECRET_KEY
    # Recommended: Load from an environment variable in production

    # Optional: Configure token expiration times (in seconds)
    JWT_ACCESS_TOKEN_EXPIRES = 3600      # 1 hour
    JWT_REFRESH_TOKEN_EXPIRES = 604800   # 7 days

# ... DevelopmentConfig, TestingConfig, ProductionConfig ...
# Ensure JWT_SECRET_KEY is set securely in ProductionConfig, likely via environment vars

class ProductionConfig(Config):
    # ...
    JWT_SECRET_KEY = os.environ.get('JWT_SECRET_KEY')  # Must be set in the prod env
    # Make sure JWT_SECRET_KEY is actually set in the environment in production
    if not JWT_SECRET_KEY:
        raise ValueError("No JWT_SECRET_KEY set for production environment")
    # ...
```

- Initialization (`app/extensions.py` and `app/__init__.py`):

```python
# app/extensions.py
from flask_sqlalchemy import SQLAlchemy
from flask_marshmallow import Marshmallow
from flask_jwt_extended import JWTManager  # Import JWTManager

db = SQLAlchemy()
ma = Marshmallow()
jwt = JWTManager()  # Instantiate JWTManager
```

```python
# app/__init__.py
# ... imports ...
from .extensions import db, ma, jwt  # Import jwt

def create_app(config_name='development'):
    # ... app creation, config loading ...

    # --- Initialize Extensions ---
    db.init_app(app)
    ma.init_app(app)
    jwt.init_app(app)  # Initialize JWTManager with the app

    # --- Register Blueprints ---
    # ...

    # --- Error Handlers ---
    # ... other handlers ...
    # Note: Flask-JWT-Extended provides default handlers for common JWT errors
    # (ExpiredSignatureError, InvalidHeaderError, etc.) which return JSON errors.
    # You can customize them using @jwt.expired_token_loader, @jwt.invalid_token_loader
    # etc. if needed. The default handlers are often sufficient.

    # ... CLI, health check ...
    return app
```
2. Creating Tokens (Login): Typically, you create tokens when a user successfully logs in.
- Need a User Model: First, we need a way to store users and their passwords. Let's add a simple `User` model.

```python
# app/models.py
from .extensions import db
import datetime
from werkzeug.security import generate_password_hash, check_password_hash  # For passwords

# ... (Task model definition) ...

class User(db.Model):
    __tablename__ = 'users'
    id = db.Column(db.Integer, primary_key=True)
    username = db.Column(db.String(80), unique=True, nullable=False)
    email = db.Column(db.String(120), unique=True, nullable=False)
    password_hash = db.Column(db.String(128), nullable=False)
    role = db.Column(db.String(50), nullable=False, default='user')  # e.g., 'user', 'admin'
    created_at = db.Column(db.DateTime, server_default=db.func.now())

    def set_password(self, password):
        """Create hashed password."""
        self.password_hash = generate_password_hash(password)

    def check_password(self, password):
        """Check hashed password."""
        return check_password_hash(self.password_hash, password)

    def __repr__(self):
        return f'<User {self.username} ({self.role})>'

    def to_dict(self):
        # Basic serialization (can use Marshmallow later if needed)
        return {
            'id': self.id,
            'username': self.username,
            'email': self.email,
            'role': self.role,
            'created_at': self.created_at.isoformat() if self.created_at else None
        }
```

We use the `werkzeug.security` helpers to securely hash passwords. **Never store plain text passwords!**

- Update the `init-db` command: Modify the `init-db` command in `app/__init__.py` to also create the users table and seed default users.

```python
# app/__init__.py -> inside create_app() -> inside init_db_command()
# ... after db.create_all() ...
print("Initialized the database.")

# Seed initial data
if Task.query.count() == 0:  # Seed tasks only if empty
    # ... (seeding tasks) ...
    print("Initial tasks seeded.")

# Seed default users if none exist
if User.query.count() == 0:
    print("Seeding default admin user...")
    admin_user = User(username='admin', email='admin@example.com', role='admin')
    admin_user.set_password('password')  # Set a secure password in real apps!
    db.session.add(admin_user)
    # Add a regular user
    reg_user = User(username='user', email='user@example.com', role='user')
    reg_user.set_password('password')
    db.session.add(reg_user)
    db.session.commit()
    print("Default users seeded (admin/password, user/password). CHANGE passwords!")
```

(Remember to run `flask init-db` again after adding the model and the seeding logic.)

- Create the Authentication Blueprint and Login Route:

```python
# app/auth/__init__.py
from flask import Blueprint

auth_bp = Blueprint('auth', __name__)

from . import routes  # Import routes to register them
```

```python
# app/auth/routes.py
from flask import request, jsonify
from . import auth_bp
from ..models import User
from ..extensions import db
from flask_jwt_extended import create_access_token, create_refresh_token, get_jwt_identity
from marshmallow import ValidationError

# Note: Using a simple make_error_response for consistency.
# Could use Marshmallow schemas for input validation here too.
def make_error_response(message, status_code):
    response = jsonify({"status": "error", "error": {"message": message}})
    response.status_code = status_code
    return response

@auth_bp.route('/login', methods=['POST'])
def login():
    data = request.get_json()
    if not data or 'username' not in data or 'password' not in data:
        return make_error_response("Username and password required", 400)

    username = data['username']
    password = data['password']
    user = User.query.filter_by(username=username).first()

    # Verify the user exists and the password is correct
    if user and user.check_password(password):
        # Identity can be anything that uniquely identifies the user (e.g., user.id)
        # Can also add additional claims (like role) if needed in the token payload
        identity = user.id
        additional_claims = {"role": user.role}  # Example: include role in token
        access_token = create_access_token(identity=identity, additional_claims=additional_claims)
        refresh_token = create_refresh_token(identity=identity)  # Usually only carries the identity
        return jsonify(access_token=access_token, refresh_token=refresh_token), 200
    else:
        return make_error_response("Invalid username or password", 401)  # Unauthorized

# Optional: Register route (example)
@auth_bp.route('/register', methods=['POST'])
def register():
    data = request.get_json()
    if not data or not all(k in data for k in ('username', 'email', 'password')):
        return make_error_response("Missing username, email, or password", 400)

    if User.query.filter((User.username == data['username']) | (User.email == data['email'])).first():
        return make_error_response("Username or email already exists", 409)  # Conflict

    new_user = User(
        username=data['username'],
        email=data['email'],
        role='user'  # Default role for registration
    )
    new_user.set_password(data['password'])

    try:
        db.session.add(new_user)
        db.session.commit()
        # Maybe return user info (without the password hash) or just a success message
        return jsonify({"message": "User registered successfully", "user_id": new_user.id}), 201
    except Exception as e:
        db.session.rollback()
        print(f"Error during registration: {e}")
        return make_error_response("Failed to register user due to server error", 500)
```

- Register the Auth Blueprint (`app/__init__.py`):

```python
# app/__init__.py -> inside create_app()
# ... other imports ...
from .auth import auth_bp  # Import the auth blueprint

def create_app(config_name='development'):
    # ... app creation ...
    # ... initialize extensions ...

    # --- Register Blueprints ---
    app.register_blueprint(tasks_bp, url_prefix='/api/tasks')
    app.register_blueprint(auth_bp, url_prefix='/auth')  # Register auth routes under /auth

    # ... error handlers, cli ...
    return app
```
3. Protecting Endpoints:
Use the `@jwt_required()` decorator on routes that require a valid access token.

- Modify `app/tasks/routes.py`: Add the decorator to all task routes (or only the specific ones needing protection). Note that `delete_task` uses `get_jwt()`, so it must be imported as well.

```python
# app/tasks/routes.py
# ... other imports ...
from flask_jwt_extended import jwt_required, get_jwt_identity, get_jwt  # Import decorators/helpers

# ... schema instantiations ...

@tasks_bp.route('/', methods=['GET'])
@jwt_required()  # Protect this route
def get_tasks():
    current_user_id = get_jwt_identity()  # Get the user ID from the token
    print(f"Accessing get_tasks as user ID: {current_user_id}")
    # ... (rest of the function - potentially filter tasks by user ID later) ...
    # ... (serialization) ...
    return make_success_response(result)

@tasks_bp.route('/<int:task_id>', methods=['GET'])
@jwt_required()  # Protect this route
def get_task(task_id):
    current_user_id = get_jwt_identity()
    print(f"Accessing get_task({task_id}) as user ID: {current_user_id}")
    task = Task.query.get_or_404(task_id, description=f"Task with ID {task_id} not found")
    # Add an authorization check here later if needed (e.g., may this user see this task?)
    result = task_schema.dump(task)
    return make_success_response(result)

@tasks_bp.route('/', methods=['POST'])
@jwt_required()  # Protect this route
def create_task():
    current_user_id = get_jwt_identity()
    print(f"Creating task as user ID: {current_user_id}")
    # ... (validation using schema load) ...
    # Optionally associate the task with the user: new_task.user_id = current_user_id
    # ... (commit, serialize) ...
    return make_success_response(result, 201)

@tasks_bp.route('/<int:task_id>', methods=['PUT'])
@jwt_required()  # Protect this route
def update_task(task_id):
    current_user_id = get_jwt_identity()
    print(f"Updating task {task_id} as user ID: {current_user_id}")
    # ... (fetch task) ...
    # Add authorization check
    # ... (validation using schema load) ...
    # ... (commit, serialize) ...
    return make_success_response(result)

@tasks_bp.route('/<int:task_id>', methods=['PATCH'])
@jwt_required()  # Protect this route
def patch_task(task_id):
    current_user_id = get_jwt_identity()
    print(f"Patching task {task_id} as user ID: {current_user_id}")
    # ... (fetch task) ...
    # Add authorization check
    # ... (validation using schema load) ...
    # ... (commit, serialize) ...
    return make_success_response(result)

@tasks_bp.route('/<int:task_id>', methods=['DELETE'])
@jwt_required()  # Protect this route
def delete_task(task_id):
    current_user_id = get_jwt_identity()
    jwt_claims = get_jwt()  # Get the full claims if needed (e.g., for the role)
    current_user_role = jwt_claims.get('role')
    print(f"Attempting delete task {task_id} as user ID: {current_user_id}, Role: {current_user_role}")
    # ... (fetch task) ...
    # Add authorization check (e.g., only an admin or the task owner can delete)
    # ... (delete, commit) ...
    return '', 204

# ... (Rest of the file, error handlers if any) ...
```
If you try to access these protected task endpoints without a valid `Authorization: Bearer <token>` header, Flask-JWT-Extended will automatically return a 401 Unauthorized error (JSON formatted by default).
4. Refreshing Tokens: Access tokens are typically short-lived for security. Refresh tokens are longer-lived and are used to obtain new access tokens without requiring the user to log in again.
- Add a Refresh Route (`app/auth/routes.py`):

```python
# app/auth/routes.py
# ... imports ...
from flask_jwt_extended import jwt_required, get_jwt_identity, create_access_token, get_jwt

# ... login, register ...

@auth_bp.route('/refresh', methods=['POST'])
@jwt_required(refresh=True)  # Requires a valid *refresh* token
def refresh():
    current_user_id = get_jwt_identity()  # Get the identity from the refresh token
    # Optionally fetch the user's role again (storing it in the refresh token is less common)
    user = User.query.get(current_user_id)
    if not user:
        return make_error_response("User not found for token identity", 404)

    # Create a new access token with potentially updated claims
    additional_claims = {"role": user.role}
    new_access_token = create_access_token(identity=current_user_id, additional_claims=additional_claims)
    return jsonify(access_token=new_access_token), 200
```

The client calls this endpoint with its valid refresh token when the access token expires.
5. Handling Token Errors:
Flask-JWT-Extended handles common errors like expired tokens, missing tokens, and invalid signatures by default, returning appropriate JSON errors and status codes (e.g., 401, 422). You can customize these handlers if needed using decorators like `@jwt.expired_token_loader`.
6. Storing Tokens and Security Considerations:
- Client-Side Storage: The client needs to store the access and refresh tokens received from the `/login` endpoint. Common places are:
    - Memory: Store in JavaScript variables. Lost on page refresh/tab close. Generally secure against XSS if not exposed globally.
    - Session Storage: Browser storage, cleared when the browser tab is closed. Accessible via JavaScript (XSS risk).
    - Local Storage: Browser storage, persists until cleared manually or by code. Accessible via JavaScript (XSS risk).
    - Cookies: Can be sent automatically by the browser. `HttpOnly` cookies are inaccessible to JavaScript, mitigating the risk of XSS stealing the token directly. Cookies are vulnerable to Cross-Site Request Forgery (CSRF) attacks, so CSRF protection mechanisms (e.g., Flask-WTF CSRF tokens, the SameSite cookie attribute) are essential if using cookies for tokens.
- Security:
- HTTPS: ALWAYS use HTTPS to protect tokens (and any sensitive data) in transit.
- XSS (Cross-Site Scripting): If storing tokens where JavaScript can access them (Local/Session Storage), be extremely vigilant about preventing XSS vulnerabilities in your frontend, as malicious scripts could steal the tokens.
- CSRF (Cross-Site Request Forgery): If using cookies, implement CSRF protection.
- Token Expiration: Keep access token lifetimes short (minutes to an hour). Use refresh tokens for longer sessions.
- Token Revocation (Logout): JWTs are inherently difficult to revoke before expiration. Common strategies for implementing logout include:
- Client-Side Only: Simply delete the token from client storage. The token remains technically valid until expiration but the client no longer sends it. Doesn't prevent replay if the token was compromised.
- Blocklisting: Maintain a server-side list (e.g., in Redis or a database) of revoked token identifiers (using the `jti` claim). Check this list on each request. Flask-JWT-Extended has support for this (`@jwt.token_in_blocklist_loader`, `JWT_BLOCKLIST_ENABLED`). This adds state back to your system.
Implementing Authorization
Now that users are authenticated, we need to control what they can do.
1. Role-Based Access Control (RBAC):
A common approach is to assign roles to users (as we did with the `role` field in the `User` model: 'user', 'admin'). You then check the authenticated user's role before allowing certain actions.
2. Checking Permissions in Routes:
Within your protected routes, you can access the user's identity (e.g., ID) using `get_jwt_identity()` and the custom claims (like the role) using `get_jwt()`.
```python
# Example within app/tasks/routes.py -> delete_task
from flask_jwt_extended import jwt_required, get_jwt_identity, get_jwt

@tasks_bp.route('/<int:task_id>', methods=['DELETE'])
@jwt_required()
def delete_task(task_id):
    current_user_id = get_jwt_identity()
    jwt_claims = get_jwt()  # Get all claims from the token payload
    current_user_role = jwt_claims.get('role')  # Access the 'role' claim we added
    print(f"Delete request for task {task_id} by User ID: {current_user_id}, Role: {current_user_role}")

    task = Task.query.get_or_404(task_id)

    # --- Authorization Check ---
    # Example: Only the 'admin' role can delete any task
    if current_user_role != 'admin':
        # Non-admins cannot delete (could add logic for users deleting their own tasks later)
        return make_error_response("Permission denied: Admin role required to delete tasks.", 403)  # 403 Forbidden

    # If the authorization check passes:
    try:
        db.session.delete(task)
        db.session.commit()
        return '', 204
    except Exception as e:
        db.session.rollback()
        print(f"Error deleting task: {e}")
        abort(500, "Failed to delete task")
```
3. Using Claims and Custom Decorators:
- Claims: Including roles or specific permissions directly in the JWT payload (as we did with `additional_claims={"role": user.role}`) makes them readily available via `get_jwt()`.
- Custom Decorators: For more complex or reusable authorization logic, you can create custom decorators that combine the JWT check with additional permission checks.

```python
# Example: app/auth/decorators.py
from functools import wraps
from flask_jwt_extended import verify_jwt_in_request, get_jwt
from flask import jsonify

def admin_required():
    """Decorator to ensure the user has the 'admin' role."""
    def wrapper(fn):
        @wraps(fn)
        def decorator(*args, **kwargs):
            verify_jwt_in_request()  # Ensure a JWT is present and valid
            claims = get_jwt()
            if claims.get("role") == "admin":
                return fn(*args, **kwargs)
            else:
                # Using jsonify directly here, or use the make_error_response helper
                return jsonify(msg="Permission denied: Admins only!"), 403
        return decorator
    return wrapper

# Usage in routes.py:
# from ..auth.decorators import admin_required
#
# @tasks_bp.route('/<int:task_id>', methods=['DELETE'])
# @admin_required()  # Custom decorator combines the JWT check and the role check
# def delete_task(task_id):
#     # No need to check the role again here
#     task = Task.query.get_or_404(task_id)
#     # ... deletion logic ...
```
Workshop: Implementing JWT Authentication for the Task API
Goal:
Secure the Task API using Flask-JWT-Extended. Add user registration and login. Protect task endpoints, requiring users to be logged in. Implement a basic admin role check for deleting tasks.
Steps:
1. Prerequisites:
    - You are in the `advanced_api` project directory.
    - The virtual environment is active.
    - `Flask-JWT-Extended` and `Werkzeug` are installed and in `requirements.txt`.
2. Configure JWT:
    - Ensure `JWT_SECRET_KEY` is set in `app/config.py` (use a default for development, but plan for environment variables in production).
    - Set the token expiration times (`JWT_ACCESS_TOKEN_EXPIRES`, `JWT_REFRESH_TOKEN_EXPIRES`) if desired.
3. Initialize JWTManager:
    - Instantiate `jwt = JWTManager()` in `app/extensions.py`.
    - Call `jwt.init_app(app)` in the `create_app` factory in `app/__init__.py`.
4. Add the User Model:
    - Define the `User` class with `username`, `email`, `password_hash`, `role`, `set_password`, and `check_password` in `app/models.py`.
5. Update Database Initialization:
    - Modify the `init-db` command in `app/__init__.py` to create the `users` table and seed the default 'admin' and 'user' accounts.
    - Run `flask init-db` in your terminal to apply the changes to your database (`instance/tasks_dev.db`).
6. Create the Auth Blueprint:
    - Create `app/auth/__init__.py` and `app/auth/routes.py`.
    - Define `auth_bp = Blueprint(...)` in `app/auth/__init__.py`.
    - Implement the `/login` and `/register` routes in `app/auth/routes.py` using `create_access_token`, `create_refresh_token`, the `User` model, and password hashing. Include the `role` in `additional_claims` for the access token.
    - Implement the `/refresh` route using `@jwt_required(refresh=True)`.
    - Register `auth_bp` with the prefix `/auth` in `app/__init__.py`.
7. Protect the Task Endpoints:
    - In `app/tasks/routes.py`, import `jwt_required`, `get_jwt_identity`, and `get_jwt`.
    - Add the `@jwt_required()` decorator to all task routes (`GET`, `POST`, `PUT`, `PATCH`, `DELETE`).
8. Implement the Authorization Check:
    - In the `delete_task` function in `app/tasks/routes.py`:
        - Get the JWT claims using `claims = get_jwt()`.
        - Get the user's role: `role = claims.get('role')`.
        - Add an `if role != 'admin':` check and return a 403 Forbidden error if the user is not an admin.
9. Test the Flow:
    - Run the application: `python run.py`.
    - Attempt access without a token: `curl http://127.0.0.1:5000/api/tasks/` -> Expect 401 Unauthorized.
    - (Optional) Register a new user: `curl -X POST -H "Content-Type: application/json" -d '{"username": "testuser", "email":"test@test.com", "password": "password"}' http://127.0.0.1:5000/auth/register` -> Expect 201.
    - Login as 'user' (the account seeded by `init-db`) and copy the `access_token` from the response:

      ```bash
      curl -X POST -H "Content-Type: application/json" \
           -d '{"username": "user", "password": "password"}' \
           http://127.0.0.1:5000/auth/login
      ```

    - Access a protected route with the token. Replace `<ACCESS_TOKEN>` with the token you copied:

      ```bash
      curl -H "Authorization: Bearer <ACCESS_TOKEN>" http://127.0.0.1:5000/api/tasks/
      ```

    - Attempt a delete as 'user' (assuming task 1 exists) -> Expect 403 Forbidden:

      ```bash
      curl -X DELETE -H "Authorization: Bearer <ACCESS_TOKEN>" http://127.0.0.1:5000/api/tasks/1
      ```

    - Login as 'admin' the same way (`{"username": "admin", "password": "password"}`) and copy its access token.
    - Attempt the delete as 'admin' (replace `<ADMIN_ACCESS_TOKEN>`) -> Expect 204 No Content:

      ```bash
      curl -X DELETE -H "Authorization: Bearer <ADMIN_ACCESS_TOKEN>" http://127.0.0.1:5000/api/tasks/1
      ```

    - (Optional) Test refresh: Use the refresh token obtained from login with the `/auth/refresh` endpoint (protected by `@jwt_required(refresh=True)`). Send the refresh token as a Bearer token.
Outcome: You have successfully implemented authentication and basic role-based authorization for your Task API using Flask-JWT-Extended. Users must now log in to obtain tokens, and these tokens are required to access task data. You've also restricted the delete operation to users with the 'admin' role, demonstrating a fundamental authorization pattern.
8. Database Migrations (Alembic & Flask-Migrate)
As your application evolves, your database schema (the structure of your tables, columns, relationships) will inevitably need to change. You might need to add a new column, modify an existing one, create a new table, or establish relationships between tables.
Simply changing your SQLAlchemy models (`app/models.py`) is not enough. Your live database needs to be updated to reflect these changes without losing existing data. Doing this manually (e.g., using `ALTER TABLE` SQL commands directly) is risky, error-prone, and difficult to manage, especially in team environments and across different deployment stages (development, staging, production).
This is where database migrations come in. A database migration system allows you to manage and apply incremental changes to your database schema in a structured, version-controlled way.
The Problem with db.create_all()
We previously used db.create_all()
within our application factory or startup script. While convenient for initial setup, db.create_all()
has significant limitations:
- Only Creates: It only creates tables and columns that do not already exist.
- Doesn't Update: It will not modify existing tables. If you add a column to your model, `db.create_all()` will not add that column to the corresponding table in the database. If you remove a column from your model, `db.create_all()` will not drop it from the database table.
- No Downgrades: It offers no way to revert schema changes.
- Data Loss Risk: Relying on `db.drop_all()` followed by `db.create_all()` during development is feasible but catastrophic in production, as it deletes all existing data.
Therefore, db.create_all()
is only suitable for the very initial creation of the database schema or in testing scenarios where data loss is acceptable. For managing schema changes in an evolving application, you need a proper migration tool.
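To see the "only creates, never updates" behavior concretely, the standard-library snippet below mimics what `db.create_all()` effectively does against SQLite: it issues create-if-not-exists DDL, so re-running it after the model has gained a column changes nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Initial schema, analogous to the first db.create_all()
conn.execute("CREATE TABLE IF NOT EXISTS tasks (id INTEGER PRIMARY KEY, title TEXT)")

# Later the model gains a due_date column, but re-running the same
# create-if-not-exists statement leaves the existing table untouched:
conn.execute(
    "CREATE TABLE IF NOT EXISTS tasks "
    "(id INTEGER PRIMARY KEY, title TEXT, due_date DATE)"
)
cols = [row[1] for row in conn.execute("PRAGMA table_info(tasks)")]
print(cols)  # ['id', 'title'] -- no due_date column was added
```

A migration tool exists precisely to issue the missing `ALTER TABLE` in a controlled, versioned way.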
Alembic and Flask-Migrate
Alembic is a powerful, flexible database migration tool written by the creator of SQLAlchemy. It provides a framework for generating, managing, and applying migration scripts.
Flask-Migrate is a Flask extension that integrates Alembic seamlessly into your Flask application. It provides Flask CLI commands (flask db ...
) that simplify the process of using Alembic within the context of your Flask app and SQLAlchemy models.
Benefits:
- Version Control for Schema: Migration scripts are stored as Python files, typically in a
migrations/
directory, which can (and should) be committed to your version control system (like Git). This tracks the history of your schema changes. - Repeatable Changes: Migrations can be applied consistently across different environments (developer machines, testing, staging, production).
- Incremental Updates: Apply changes step-by-step without losing data.
- Upgrades & Downgrades: Alembic scripts define both an `upgrade` function (to apply the change) and a `downgrade` function (to revert it), allowing you to move forward and backward through schema versions.
- Autogeneration: Alembic can compare your SQLAlchemy models to the current state of the database and automatically generate a draft migration script for the detected changes. This significantly speeds up the process, though generated scripts often need review and sometimes manual adjustment.
Setting Up Flask-Migrate
Let's integrate Flask-Migrate into our advanced_api
project.
1. Installation:
(Ensure your advanced_api
virtual environment is active)
pip install Flask-Migrate

# Record the new dependency in requirements.txt
pip freeze > requirements.txt
2. Initialization:
Follow the standard extension pattern: instantiate in `extensions.py` and initialize in the factory.

- `app/extensions.py`: add a `Migrate()` instance alongside the other extensions.
- `app/__init__.py`:

```python
# app/__init__.py
# ... other imports ...
from .extensions import db, ma, jwt, migrate  # Import migrate
from .models import User, Task  # IMPORTANT: Import all models here!

def create_app(config_name='development'):
    # ... app creation, config loading ...

    # --- Initialize Extensions ---
    db.init_app(app)
    ma.init_app(app)
    jwt.init_app(app)
    # Initialize Flask-Migrate AFTER db and with the app and db instances
    migrate.init_app(app, db)

    # --- Register Blueprints ---
    # ...

    # --- Error Handlers ---
    # ...

    # --- Database Initialization Command (Keep or Modify) ---
    # Keep the init-db command for initial setup if desired,
    # but migrations will handle ongoing changes.
    # Note: Ensure all models are imported before db operations
    # like create_all or migrate.
    @app.cli.command("init-db")
    def init_db_command():
        # ... (existing init-db logic) ...
        # Consider removing db.create_all() from here if exclusively using
        # migrations, or keep it for the very first setup before migrations exist.
        pass  # Adjust as needed

    # Remove the automatic db.create_all() call from the factory if using
    # migrations. Let 'flask db upgrade' handle table creation based on migrations.
    # with app.app_context():
    #     db.create_all()  # REMOVE or comment out this line

    # ... rest of factory ...
    return app
```

Key Points:
- `migrate.init_app(app, db)`: Initialize Flask-Migrate, linking it to both the Flask application (`app`) and the SQLAlchemy instance (`db`).
- Import Models: Crucially, all your SQLAlchemy models (`User`, `Task`, etc.) must be imported somewhere before `migrate.init_app` is called or before migration commands are run. Importing them in `app/__init__.py`, or ensuring they are imported via blueprint registration, is common practice. Alembic needs to know about all models to detect changes correctly.
- Remove `db.create_all()`: Once you start using migrations, you should generally remove the automatic `db.create_all()` call from your application startup. The database schema will now be managed entirely by applying migration scripts. The very first migration will effectively create all the initial tables.
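Putting the key points together, `app/extensions.py` now looks something like this sketch (assuming the `db`, `ma`, and `jwt` instances used earlier in this guide; adjust to the extensions your project actually uses):

```python
# app/extensions.py -- sketch; unbound instances, wired to the app in create_app()
from flask_sqlalchemy import SQLAlchemy
from flask_marshmallow import Marshmallow
from flask_jwt_extended import JWTManager
from flask_migrate import Migrate

db = SQLAlchemy()
ma = Marshmallow()
jwt = JWTManager()
migrate = Migrate()  # New: linked to the app and db via migrate.init_app(app, db)
```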
The Migration Workflow
Flask-Migrate provides commands via flask db
. Make sure your FLASK_APP
environment variable is set (e.g., export FLASK_APP=run.py
).
1. Initialize the Migration Environment (flask db init
):
This command needs to be run only once per project.
- What it does:
  - Creates a `migrations/` directory.
  - Inside `migrations/`, it creates:
    - `versions/`: This subdirectory will hold individual migration script files.
    - `script.py.mako`: A template file used for generating migration scripts.
    - `env.py`: A Python script that Alembic runs when executing commands. It defines how to connect to your database, find your models, and configure migration behavior. Flask-Migrate configures this automatically to work with your Flask app context and SQLAlchemy models.
    - `alembic.ini`: The main configuration file for Alembic, specifying the database connection (handled by Flask-Migrate via `env.py`), the location of scripts, etc.
- Important: Add the `migrations/` directory to your version control (Git).
2. Generate a Migration Script (flask db migrate
):
Whenever you change your SQLAlchemy models (add/remove/modify models or columns), run this command:
# Make a change in app/models.py first!
# Example: Add a 'due_date' column to the Task model
# class Task(db.Model):
# ...
# due_date = db.Column(db.Date, nullable=True) # Add this line
# ...
# Then run the migrate command:
flask db migrate -m "Add due_date column to Task model"
- `-m "..."`: The message provides a short description of the change and becomes part of the generated script's filename. Use meaningful messages!
- What it does:
  - Alembic inspects your current models (defined in `app/models.py` and imported).
  - It connects to your database (using the `SQLALCHEMY_DATABASE_URI` from your Flask config) and inspects its current schema.
  - It compares the models and the database schema.
  - If it detects differences, it automatically generates a new Python script in the `migrations/versions/` directory (e.g., `migrations/versions/xxxxxxxxxxxx_add_due_date_column_to_task_model.py`).
  - This script contains an `upgrade()` function with Alembic operations (like `op.add_column()`) to apply the changes, and a `downgrade()` function with operations (like `op.drop_column()`) to revert them.
- Review: Always review the autogenerated script before applying it. Autogeneration is good but not perfect, especially for complex changes like renaming columns/tables, changing column types with data implications, or dealing with constraints. You might need to edit the script manually.
3. Apply the Migration (flask db upgrade
):
This command applies pending migration scripts to your database.
flask db upgrade # Apply all pending migrations (equivalent to: flask db upgrade head)
# Or apply only up to a specific version: flask db upgrade <revision_id>
- What it does:
  - Checks the database for a special table named `alembic_version`. This table stores the revision ID of the last migration applied to the database.
  - Finds all migration scripts in `migrations/versions/` that haven't been applied yet (i.e., are newer than the version stored in `alembic_version`).
  - Executes the `upgrade()` function of each pending migration script in chronological order, updating the database schema.
  - Updates the `alembic_version` table with the ID of the latest applied migration.
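The bookkeeping described above can be sketched as a toy migration runner in pure Python and SQLite. The revision IDs and steps here are invented for illustration; Alembic's real mechanics are far richer, but the core loop is the same: read the version table, apply what's pending in order, record the new head.

```python
import sqlite3

# Invented revisions; real Alembic scripts live in migrations/versions/
MIGRATIONS = [
    ("rev1", lambda c: c.execute(
        "CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT)")),
    ("rev2", lambda c: c.execute(
        "ALTER TABLE tasks ADD COLUMN due_date DATE")),
]

def upgrade(conn):
    # 1. Ensure the bookkeeping table exists and read the current revision
    conn.execute("CREATE TABLE IF NOT EXISTS alembic_version (version_num TEXT)")
    row = conn.execute("SELECT version_num FROM alembic_version").fetchone()
    current = row[0] if row else None
    # 2. Run every migration newer than the current revision, in order
    revs = [rev for rev, _ in MIGRATIONS]
    start = revs.index(current) + 1 if current else 0
    for _, step in MIGRATIONS[start:]:
        step(conn)
    # 3. Record the newest revision as applied
    conn.execute("DELETE FROM alembic_version")
    conn.execute("INSERT INTO alembic_version VALUES (?)", (revs[-1],))

conn = sqlite3.connect(":memory:")
upgrade(conn)
cols = [r[1] for r in conn.execute("PRAGMA table_info(tasks)")]
print(cols)  # ['id', 'title', 'due_date']
upgrade(conn)  # Re-running finds nothing pending: it is a no-op
```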
4. Revert a Migration (flask db downgrade
):
This command reverts the last applied migration(s).
flask db downgrade # Revert the very last migration
# Or revert to a specific version: flask db downgrade <revision_id>
# Or revert all migrations (use with extreme caution): flask db downgrade base
- What it does:
  - Executes the `downgrade()` function of the specified migration(s) in reverse chronological order.
  - Updates the `alembic_version` table accordingly.
Other Useful Commands:
- `flask db current`: Shows the revision ID of the migration currently applied to the database.
- `flask db history`: Lists all migration scripts and indicates the current position.
- `flask db show <revision_id>`: Displays information about a specific migration script.
- `flask db stamp head`: Marks the current database schema as up-to-date with the latest migration script without actually running the migrations. Useful if the database schema already matches the models, or when setting up migrations on an existing database.
Autogeneration Example
If you added due_date = db.Column(db.Date, nullable=True)
to the Task
model and ran flask db migrate -m "Add due_date"
, the generated script (migrations/versions/xxxxxxxxxxxx_add_due_date.py
) might look something like this:
"""Add due_date
Revision ID: xxxxxxxxxxxx
Revises: <previous_revision_id_or_blank>
Create Date: YYYY-MM-DD HH:MM:SS.ffffff
"""
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision = 'xxxxxxxxxxxx' # The unique ID for this migration
down_revision = '<previous_revision_id_or_blank>' # ID of the migration before this one
branch_labels = None
depends_on = None
def upgrade():
# ### commands auto generated by Alembic - please adjust! ###
with op.batch_alter_table('tasks', schema=None) as batch_op:
batch_op.add_column(sa.Column('due_date', sa.Date(), nullable=True))
# ### end Alembic commands ###
def downgrade():
# ### commands auto generated by Alembic - please adjust! ###
with op.batch_alter_table('tasks', schema=None) as batch_op:
batch_op.drop_column('due_date')
# ### end Alembic commands ###
- `revision` / `down_revision`: Link migrations together in a sequence.
- `upgrade()`: Contains the operations to apply the change (add the `due_date` column). `op.batch_alter_table` is used for SQLite compatibility.
- `downgrade()`: Contains the operations to revert the change (drop the `due_date` column).
Workshop: Adding Migrations to the Task API
Goal:
Integrate Flask-Migrate into the advanced_api
project and use it to add a due_date
column to the Task
model.
Steps:
- Prerequisites:
  - Working `advanced_api` project with virtual environment active.
  - `Flask-Migrate` installed (`pip install Flask-Migrate`).
- Integrate Flask-Migrate:
  - Add `migrate = Migrate()` to `app/extensions.py`.
  - Import `migrate` in `app/__init__.py`.
  - Call `migrate.init_app(app, db)` within the `create_app` factory.
  - Ensure all models (`User`, `Task`) are imported in `app/__init__.py` before `migrate.init_app`.
  - Crucially: Comment out or remove the `db.create_all()` call within `create_app` (if you haven't already). Schema management will now be handled by migrations. (The `init-db` command might still call `db.create_all`, which is okay for the very first setup before any migrations exist, but `flask db upgrade head` is the preferred way to set up and update the schema going forward.)
- Initialize Migration Repository:
  - Open your terminal in the `advanced_api` root directory.
  - Set the `FLASK_APP` environment variable: `export FLASK_APP=run.py`.
  - Run the init command: `flask db init`
  - Verify that the `migrations/` directory and its contents (`alembic.ini`, `env.py`, `script.py.mako`, `versions/`) are created.
  - (If using Git) Add and commit the `migrations/` directory.
- Stamp Initial Database (If Applicable):
  - If your database (`instance/tasks_dev.db`) already exists and matches your current models (because you ran `init-db` or `db.create_all()` before), you need to tell Alembic that the database is already at the "head" (latest) state before you make any model changes. If it's a fresh database, or you deleted it, you can skip this step.
  - Run: `flask db stamp head`
- Make a Model Change:
  - Edit `app/models.py`.
  - Add a `due_date` column to the `Task` model: `due_date = db.Column(db.Date, nullable=True)`
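In context, the edited model might look like this sketch (the surrounding columns are illustrative; keep your existing fields exactly as they are):

```python
# app/models.py (sketch -- only the relevant parts of the Task model shown)
class Task(db.Model):
    __tablename__ = 'tasks'

    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String(100), nullable=False)  # illustrative existing column
    # ... other existing columns (description, status, timestamps, ...) ...
    due_date = db.Column(db.Date, nullable=True)  # the new, optional column
```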
- Generate the Migration Script:
  - In the terminal: `flask db migrate -m "Add due_date column to Task model"`
  - Observe the output. It should detect the new column and report the path to the generated script (e.g., `migrations/versions/xxxxxxxxxxxx_add_due_date_column_to_task_model.py`).
  - Review the Script: Open the generated file and examine the `upgrade()` and `downgrade()` functions. Ensure they correctly reflect the intended change (adding and dropping the `due_date` column).
- Apply the Migration:
  - In the terminal: `flask db upgrade`
  - Observe the output. It should indicate that it's running the migration script.
  - Verify (Optional): You can use a database tool (like the `sqlite3` CLI or DB Browser for SQLite) to inspect the database file (`instance/tasks_dev.db`) and confirm that the `tasks` table now has a `due_date` column and that the `alembic_version` table contains the revision ID of the migration you just applied.
- Update Schema and Routes (Optional but Recommended):
  - Modify `app/tasks/schemas.py` (`TaskSchema`) to include the new `due_date` field. Mark it as optional (`required=False`) and potentially `dump_only=True`, or add validation as needed. If you want clients to be able to set it, declare it as a regular (not dump-only) field.
  - Modify the `create_task`, `update_task`, and `patch_task` routes in `app/tasks/routes.py` to handle the optional `due_date` field from the request data, passing it to the model when creating/updating via the schema load. The `SQLAlchemyAutoSchema` with `load_instance=True` should handle assigning the loaded `due_date` value automatically if the field exists in the schema and the input data.
- (Optional) Test Downgrade:
  - In the terminal: `flask db downgrade`
  - Verify using a DB tool that the `due_date` column has been removed from the `tasks` table and that the `alembic_version` table is updated (or empty if it was the first migration).
  - Run `flask db upgrade` again to re-apply the migration for subsequent steps.
- Commit Changes:
  - (If using Git) Add and commit the generated migration script and any model/schema changes.
Outcome: You have successfully integrated database migrations into your project using Flask-Migrate. You learned how to initialize the migration environment, generate migration scripts based on model changes, and apply/revert those changes to your database in a controlled manner. Your database schema evolution is now manageable and version-controlled.
9. Testing Flask APIs
Writing automated tests for your API is not just a good practice; it's essential for building robust, reliable, and maintainable applications. Tests verify that your code behaves as expected, prevent regressions (unintentionally breaking existing functionality when adding new features or fixing bugs), and serve as living documentation for your API's behavior.
Flask provides excellent support for testing, and when combined with powerful testing frameworks like pytest, you can create comprehensive test suites for your APIs.
Types of Tests
For web APIs, we typically focus on several levels of testing:
- Unit Tests:
  - Focus: Test the smallest possible units of code in isolation (e.g., a single function, a method within a class, a utility).
  - Goal: Verify the correctness of the unit's logic, independent of external dependencies like databases, external APIs, or the full request/response cycle.
  - Techniques: Often involves mocking or stubbing dependencies to isolate the unit under test.
  - Example: Testing a helper function that formats data, testing a specific validation rule in a Marshmallow schema, testing a method on a SQLAlchemy model without interacting with the database session.
- Integration Tests:
  - Focus: Test the interaction between different units or components of your application.
  - Goal: Verify that integrated parts work together as expected.
  - Techniques: May involve interacting with a real (but typically temporary or test-specific) database, making requests to multiple related endpoints, or testing the flow through several layers of your application (e.g., route -> service logic -> model -> database).
  - Example: Testing if creating a task via a POST request correctly stores the data in the test database and if a subsequent GET request retrieves the same data. Testing if authentication middleware correctly integrates with route protection.
- End-to-End (E2E) Tests (or Functional Tests):
  - Focus: Test the entire application flow from the perspective of a client.
  - Goal: Verify that the complete system works as intended, simulating real user scenarios.
  - Techniques: Typically involves running the entire application stack (including web server, database) and making actual HTTP requests to the API endpoints, then asserting the responses.
  - Example: Simulating a user registering, logging in, creating a task, fetching the task, updating it, and finally deleting it, all through API calls.
For API development, integration tests often provide the most value, as they directly verify the API contracts (endpoints, request/response formats, status codes) and their interaction with core components like the database. Unit tests are crucial for complex business logic or utility functions.
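To make the unit level concrete, here is a self-contained pytest-style test of a hypothetical formatting helper; note that it touches no Flask machinery and no database:

```python
def format_task_summary(title, status):
    """Hypothetical helper: render a one-line summary of a task."""
    return f"[{status.upper()}] {title.strip()}"

def test_format_task_summary():
    # Pure logic check: whitespace is trimmed, status is upper-cased
    assert format_task_summary("  Write docs ", "pending") == "[PENDING] Write docs"

test_task = test_format_task_summary()  # pytest would discover and run this for you
```

Because the function has no external dependencies, the test is fast, deterministic, and needs none of the fixtures introduced below.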
Setting Up the Testing Environment
We'll use pytest
, a popular and powerful Python testing framework, along with Flask's built-in test client.
1. Installation:
(Ensure your advanced_api
virtual environment is active)
pip install pytest pytest-flask
# Record the new dependencies (pytest-flask provides helpful pytest fixtures for Flask)
pip freeze > requirements.txt
2. Test Directory Structure: Create a dedicated directory for tests at the root of your project.
# In advanced_api root directory
mkdir tests
touch tests/__init__.py # Make 'tests' a package
touch tests/conftest.py # pytest configuration and fixtures file
Test files within this directory should have names starting with test_ (e.g., `tests/test_tasks.py`, `tests/test_auth.py`) so that pytest discovers them.
3. pytest Configuration (conftest.py
):
The conftest.py
file is a special pytest file where you can define fixtures. Fixtures are functions that provide a fixed baseline state or setup for your tests. They are a core feature of pytest, enabling reusable setup/teardown logic.
A crucial fixture for Flask testing is one that creates an instance of your application configured specifically for testing and provides a test client.
# tests/conftest.py
import pytest
from app import create_app # Import your application factory
from app.config import TestingConfig # Import the testing configuration
from app.extensions import db as _db # Import your db instance
@pytest.fixture(scope='session')
def app():
"""
Session-wide test Flask application. Configured for testing.
Handles creation and cleanup.
'session' scope means this fixture runs once per test session.
"""
print("\n--- Creating Test App Instance ---")
# Create app with testing config
_app = create_app(config_name='testing') # Use TestingConfig
# Establish an application context before running tests
ctx = _app.app_context()
ctx.push()
yield _app # Provide the app instance to tests
print("\n--- Cleaning Up Test App Context ---")
ctx.pop()
@pytest.fixture(scope='function')
def client(app):
"""
Provides a Flask test client for making requests.
'function' scope means a new client is created for each test function.
Depends on the 'app' fixture.
"""
print("--- Creating Test Client ---")
return app.test_client()
@pytest.fixture(scope='session')
def db(app):
"""
Session-wide test database. Handles creation and cleanup.
Depends on the 'app' fixture.
"""
print("--- Setting up Test Database ---")
# Use the db instance associated with the test app
with app.app_context():
# Ensure all models are imported if not done elsewhere (e.g., in app factory)
# from app.models import User, Task # May not be needed if factory imports them
# Create tables using the test app's config (e.g., in-memory SQLite)
_db.create_all()
yield _db # Provide the db instance to tests
# --- Cleanup ---
print("\n--- Tearing down Test Database ---")
with app.app_context():
_db.session.remove() # Ensure session is closed
_db.drop_all() # Drop all tables
@pytest.fixture(scope='function')
def session(db):
"""
Creates a new database session for each test function. Rolls back changes.
'function' scope ensures test isolation. Depends on the 'db' fixture.
"""
connection = db.engine.connect()
transaction = connection.begin()
# Bind the session to this connection/transaction
options = dict(bind=connection, binds={})
test_session = db.create_scoped_session(options=options)
# Make the session available on the db object for convenience,
# similar to how Flask-SQLAlchemy manages sessions per request
db.session = test_session
print("--- Starting Test DB Session ---")
yield test_session # Provide the session to the test function
# --- Cleanup for the function scope ---
print("--- Rolling back Test DB Session ---")
test_session.remove() # Also rolls back because we didn't commit
transaction.rollback() # Explicitly rollback the transaction
connection.close() # Close the connection
Explanation of Fixtures:
- `app()`: Creates the Flask app instance using `TestingConfig`. The `scope='session'` means this happens only once for the entire test run. It pushes an application context so that things like `url_for` work within tests. `yield` pauses the fixture to run the tests and then resumes for cleanup (popping the context).
- `client(app)`: Depends on the `app` fixture. Uses `app.test_client()` to create a client that can simulate HTTP requests to the application without needing a running server. `scope='function'` provides a clean client for each test.
- `db(app)`: Depends on the `app` fixture. Uses the test app context to create all database tables (using the test database URI, e.g., in-memory SQLite). `scope='session'` ensures tables are created once. After all tests in the session run, it drops all tables.
- `session(db)`: Depends on the `db` fixture. This is crucial for test isolation. `scope='function'` means it runs for every test function. It starts a database transaction, provides the session to the test, and then rolls back the transaction after the test finishes. This ensures that database changes made in one test do not affect subsequent tests.
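The transaction-and-rollback idea behind the `session` fixture can be demonstrated with the standard library alone: run a "test" inside a transaction, roll it back, and the next "test" sees a clean table.

```python
import sqlite3

def run_isolated(conn, work):
    """Run `work` inside a transaction, then roll it back -- the same idea
    the `session` fixture uses to keep each test from affecting the next."""
    conn.execute("BEGIN")
    try:
        work(conn)
    finally:
        conn.rollback()

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # take manual control of transactions
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT)")

# A "test" inserts a row...
run_isolated(conn, lambda c: c.execute("INSERT INTO tasks (title) VALUES ('temp')"))

# ...but the rollback leaves the table empty for the next "test"
count = conn.execute("SELECT COUNT(*) FROM tasks").fetchone()[0]
print(count)  # 0
```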
Writing Tests
Test functions in pytest are typically simple functions whose names start with test_
. You declare the fixtures you need as arguments to your test function, and pytest automatically provides them.
Example: Testing the Task API (tests/test_tasks.py
)
# tests/test_tasks.py
import pytest
from app.models import Task, User # Import models
from flask import json # For decoding JSON responses
# Helper function to get auth tokens (adapt as needed)
def get_tokens(client, username='admin', password='password'):
"""Helper to log in and return access/refresh tokens."""
res = client.post('/auth/login', json={'username': username, 'password': password})
if res.status_code == 200:
data = json.loads(res.data)
return data.get('access_token'), data.get('refresh_token')
pytest.fail(f"Failed to log in as {username}. Status: {res.status_code}, Data: {res.data.decode()}")
return None, None
# --- Test Cases ---
def test_health_check(client):
"""Test the basic health check endpoint."""
res = client.get('/health')
assert res.status_code == 200
data = json.loads(res.data)
assert data['status'] == 'ok'
def test_get_tasks_unauthenticated(client):
"""Test that accessing tasks requires authentication."""
res = client.get('/api/tasks/')
assert res.status_code == 401 # Unauthorized
def test_get_tasks_authenticated_empty(client, session):
"""Test getting tasks when authenticated but no tasks exist."""
access_token, _ = get_tokens(client) # Login as default admin/user
res = client.get('/api/tasks/', headers={'Authorization': f'Bearer {access_token}'})
assert res.status_code == 200
data = json.loads(res.data)
assert data['status'] == 'success'
assert isinstance(data['data'], list)
assert len(data['data']) == 0
def test_create_task(client, session):
"""Test creating a new task."""
access_token, _ = get_tokens(client)
headers = {'Authorization': f'Bearer {access_token}', 'Content-Type': 'application/json'}
task_payload = {'title': 'My Test Task', 'description': 'Testing creation'}
res = client.post('/api/tasks/', headers=headers, json=task_payload)
assert res.status_code == 201 # Created
data = json.loads(res.data)
assert data['status'] == 'success'
assert data['data']['title'] == task_payload['title']
assert data['data']['description'] == task_payload['description']
assert data['data']['status'] == 'pending' # Default status
assert 'id' in data['data']
task_id = data['data']['id']
# Verify task exists in the database (using the test session fixture)
task_in_db = session.get(Task, task_id) # Use session.get for primary key lookup
assert task_in_db is not None
assert task_in_db.title == task_payload['title']
def test_create_task_invalid_payload(client, session):
"""Test creating a task with missing title (validation error)."""
access_token, _ = get_tokens(client)
headers = {'Authorization': f'Bearer {access_token}', 'Content-Type': 'application/json'}
invalid_payload = {'description': 'Only description'} # Missing title
res = client.post('/api/tasks/', headers=headers, json=invalid_payload)
assert res.status_code == 422 # Unprocessable Entity (due to Marshmallow validation)
data = json.loads(res.data)
assert data['status'] == 'error'
assert 'title' in data['error']['details'] # Check specific validation error
def test_get_single_task(client, session):
"""Test retrieving a single task by its ID."""
# 1. Create a task first (can also use a fixture to create sample data)
access_token, _ = get_tokens(client)
headers = {'Authorization': f'Bearer {access_token}', 'Content-Type': 'application/json'}
task_payload = {'title': 'Task to Get', 'description': 'Details...'}
create_res = client.post('/api/tasks/', headers=headers, json=task_payload)
assert create_res.status_code == 201
task_id = json.loads(create_res.data)['data']['id']
# 2. Get the created task
res = client.get(f'/api/tasks/{task_id}', headers={'Authorization': f'Bearer {access_token}'})
assert res.status_code == 200
data = json.loads(res.data)
assert data['status'] == 'success'
assert data['data']['id'] == task_id
assert data['data']['title'] == 'Task to Get'
def test_get_nonexistent_task(client, session):
"""Test retrieving a task that doesn't exist."""
access_token, _ = get_tokens(client)
res = client.get('/api/tasks/99999', headers={'Authorization': f'Bearer {access_token}'}) # Assume ID 99999 doesn't exist
assert res.status_code == 404 # Not Found
# --- Authorization Tests ---
def test_delete_task_as_admin(client, session):
"""Test that an admin can delete a task."""
# 1. Create a task
user_token, _ = get_tokens(client, username='user') # Create as regular user
headers_user = {'Authorization': f'Bearer {user_token}', 'Content-Type': 'application/json'}
task_payload = {'title': 'Task to be deleted'}
create_res = client.post('/api/tasks/', headers=headers_user, json=task_payload)
assert create_res.status_code == 201
task_id = json.loads(create_res.data)['data']['id']
# 2. Login as admin and delete
admin_token, _ = get_tokens(client, username='admin')
headers_admin = {'Authorization': f'Bearer {admin_token}'}
delete_res = client.delete(f'/api/tasks/{task_id}', headers=headers_admin)
assert delete_res.status_code == 204 # No Content
# 3. Verify task is gone
get_res = client.get(f'/api/tasks/{task_id}', headers=headers_admin)
assert get_res.status_code == 404
def test_delete_task_as_user(client, session):
"""Test that a regular user cannot delete a task (unless they own it - logic not implemented yet)."""
# 1. Create a task (e.g., by admin or another user)
admin_token, _ = get_tokens(client, username='admin')
headers_admin = {'Authorization': f'Bearer {admin_token}', 'Content-Type': 'application/json'}
task_payload = {'title': 'Admin Task'}
create_res = client.post('/api/tasks/', headers=headers_admin, json=task_payload)
assert create_res.status_code == 201
task_id = json.loads(create_res.data)['data']['id']
# 2. Login as regular user and attempt delete
user_token, _ = get_tokens(client, username='user')
headers_user = {'Authorization': f'Bearer {user_token}'}
delete_res = client.delete(f'/api/tasks/{task_id}', headers=headers_user)
assert delete_res.status_code == 403 # Forbidden
# 3. Verify task still exists
get_res = client.get(f'/api/tasks/{task_id}', headers=headers_admin) # Check as admin
assert get_res.status_code == 200
Explanation:
- Fixtures as Arguments: Test functions declare `client` and `session` as arguments to get the test client and isolated database session.
- Test Client Usage: `client.get()`, `client.post()`, `client.delete()`, etc., are used to simulate HTTP requests.
  - `headers`: Pass authentication tokens or `Content-Type`.
  - `json`: Send a JSON payload in the request body (automatically sets `Content-Type: application/json`).
- Assertions: `assert res.status_code == ...` checks the HTTP status. `json.loads(res.data)` decodes the JSON response body. Assertions check the structure and values within the response data.
- Database Verification: Tests that modify data (like `test_create_task`) often include assertions that directly query the database using the `session` fixture to ensure the change was persisted correctly.
- Test Isolation: Because the `session` fixture rolls back changes after each test, `test_create_task` doesn't leave the created task in the database for `test_get_tasks_authenticated_empty` to see. Each test starts with a clean slate (empty tables, as defined by the `db` fixture's scope).
Running Tests
Navigate to your project's root directory (advanced_api
) in the terminal (with the virtual environment active) and simply run:
pytest
# Or for more verbose output:
# pytest -v
# Or to run tests only in a specific file:
# pytest tests/test_tasks.py
# Or to run a specific test function by name:
# pytest -k test_create_task
Pytest will automatically discover files named test_*.py
or *_test.py
and functions named test_*
within them, execute them, and report the results (passes, failures, errors).
Workshop: Writing Tests for the Auth Endpoints
Goal: Add tests for the /auth/login
, /auth/register
, and /auth/refresh
endpoints.
Steps:
- Create Test File: Create `tests/test_auth.py`.
- Add Imports and Fixtures: Import necessary modules (`pytest`, `json`, `User`) and include fixtures like `client` and `session`.
- Write Test Functions: Create tests for:
  - `test_register_success`: Test successful user registration (POST `/auth/register`). Verify 201 status, the response message, and that the user exists in the database (using `session`).
  - `test_register_missing_fields`: Test registration with missing data -> Expect 400.
  - `test_register_duplicate_username`: Try registering with an existing username -> Expect 409 Conflict.
  - `test_register_duplicate_email`: Try registering with an existing email -> Expect 409 Conflict.
  - `test_login_success`: Use a known user (e.g., default 'admin' or 'user' seeded by `init-db`) -> Expect 200; check for `access_token` and `refresh_token` in the response.
  - `test_login_wrong_password`: Use a correct username with a wrong password -> Expect 401.
  - `test_login_nonexistent_user`: Use a username that doesn't exist -> Expect 401.
  - `test_login_missing_fields`: Send incomplete login data -> Expect 400.
  - `test_refresh_token_success`:
    - Log in to get a valid refresh token.
    - Use the refresh token to POST to `/auth/refresh`.
    - Expect 200 and a new `access_token` in the response.
  - `test_refresh_with_access_token`: Try using an access token on the refresh endpoint -> Expect 401/422 (Flask-JWT-Extended should reject it).
  - `test_refresh_invalid_token`: Send an invalid/expired refresh token -> Expect 401/422.
-
Run Tests: Run
pytest tests/test_auth.py
orpytest
to execute all tests. Debug any failures.
Outcome: You will have a test suite covering the core authentication functionality, ensuring registration, login, and token refresh work as expected under various conditions, including error cases. This builds confidence in your authentication system.
10. Deployment Strategies
Developing a Flask API on your local machine using the development server (flask run or app.run()) is convenient, but it's not suitable for production. The development server is single-threaded by default, not designed for performance or security under load, and lacks features needed for a robust deployment.
Deploying a Python web application like Flask typically involves several components working together on a Linux server:
- WSGI Server: Handles concurrent requests efficiently and communicates with your Flask application using the Web Server Gateway Interface (WSGI) standard. Popular choices: Gunicorn, uWSGI.
- Web Server (Reverse Proxy): Sits in front of the WSGI server. Handles incoming connections from the internet, serves static files directly (if any), manages SSL/TLS encryption (HTTPS), performs load balancing (if needed), and forwards dynamic requests to the WSGI server. Popular choice: Nginx.
- Process Manager: Ensures your WSGI server process(es) are running, automatically restarts them if they crash, and manages startup/shutdown. Popular choice: Systemd (built into most modern Linux distributions).
- Database: Your production database server (PostgreSQL, MySQL, etc.). Usually runs as a separate service.
- Code Deployment: A mechanism to get your application code onto the server (e.g., Git clone/pull, SCP, CI/CD pipeline).
Common Deployment Stacks
A very common and robust stack on Linux is:
Nginx (Reverse Proxy) <-> Gunicorn (WSGI Server) <-> Your Flask App Managed by Systemd.
Let's break down the roles and configuration.
Gunicorn (WSGI Server)
Gunicorn ('Green Unicorn') is a pure-Python WSGI HTTP server. It's simple to configure, widely used, and performant.
- Installation:
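With the project's virtual environment active, Gunicorn is installed like any other dependency:

```shell
# Install Gunicorn into the virtual environment
pip install gunicorn
# Record it with the rest of the dependencies
pip freeze > requirements.txt
```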
- Running Gunicorn: You typically run Gunicorn from the command line, pointing it to your WSGI application object. For our factory pattern using wsgi.py:
# Ensure you are in the project root (advanced_api)
# Make sure wsgi.py exists and correctly creates the app instance using the factory
# Example wsgi.py:
# import os
# from app import create_app
#
# config_name = os.getenv('FLASK_CONFIG', 'production')
# application = create_app(config_name)

# Command to run:
gunicorn --workers 3 --bind 0.0.0.0:5000 "wsgi:application"

# Or bind to a Unix socket (often preferred when using Nginx on the same machine).
# -m 007 sets the socket file permissions:
# gunicorn --workers 3 --bind unix:/path/to/your/project/app.sock "wsgi:application" -m 007
- --workers: Number of worker processes to handle requests concurrently. A common starting point is (2 * number_of_cpu_cores) + 1. Adjust based on load testing.
- --bind: The address and port (e.g., 0.0.0.0:5000) or Unix socket path (e.g., unix:/path/to/app.sock) Gunicorn should listen on. Binding to 0.0.0.0 allows connections from Nginx (or externally), while a Unix socket is often slightly more efficient for local communication between Nginx and Gunicorn.
- "wsgi:application": Tells Gunicorn where to find your WSGI application object. It means: look in the wsgi.py file for a variable named application.
- -m 007: (If using sockets) Sets permissions on the socket file so the web server (Nginx) user can read/write to it.
You usually don't run this command directly in production; instead, you configure a process manager like Systemd to run it.
Nginx (Reverse Proxy)
Nginx is a high-performance web server commonly used as a reverse proxy.
- Installation (Linux):
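The exact command is distribution-specific; on Debian/Ubuntu-style systems (assumed here) it would be:

```shell
# Debian/Ubuntu (package name may differ on other distributions)
sudo apt update
sudo apt install -y nginx
# Confirm the service is up
systemctl status nginx
```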
- Configuration: You configure Nginx by creating server block files (virtual hosts), typically in /etc/nginx/sites-available/, and then enabling them by creating symbolic links in /etc/nginx/sites-enabled/.
Example Nginx Server Block (/etc/nginx/sites-available/your_api):
server {
    listen 80; # Listen on port 80 for HTTP
    # listen 443 ssl; # Enable this line and below for HTTPS
    server_name your_domain.com www.your_domain.com; # Replace with your domain or server IP

    # --- SSL/TLS Configuration (Essential for Production!) ---
    # Uncomment and configure if using HTTPS (recommended)
    # ssl_certificate /etc/letsencrypt/live/your_domain.com/fullchain.pem; # Path to your SSL cert
    # ssl_certificate_key /etc/letsencrypt/live/your_domain.com/privkey.pem; # Path to your private key
    # include /etc/letsencrypt/options-ssl-nginx.conf; # Recommended SSL options (from Certbot)
    # ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # Diffie-Hellman params (from Certbot)

    # --- Proxying Requests to Gunicorn ---
    location / { # Match all requests
        # Redirect HTTP to HTTPS (if SSL enabled)
        # if ($scheme != "https") {
        #     return 301 https://$host$request_uri;
        # }

        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Adjust based on how Gunicorn is bound:
        # Option 1: Gunicorn listening on TCP port 5000
        # proxy_pass http://127.0.0.1:5000;
        # Option 2: Gunicorn listening on a Unix socket
        proxy_pass http://unix:/path/to/your/project/app.sock; # Match Gunicorn --bind path

        proxy_read_timeout 300s; # Increase timeout if needed
        proxy_connect_timeout 75s;
    }

    # --- Optional: Serving Static Files Directly (if applicable) ---
    # location /static {
    #     alias /path/to/your/project/app/static;
    #     expires 30d; # Cache static files
    # }

    # --- Logging ---
    access_log /var/log/nginx/your_api_access.log;
    error_log /var/log/nginx/your_api_error.log;
}
- listen: Specifies the port (80 for HTTP, 443 for HTTPS).
- server_name: Your domain name(s) or server's IP address.
- SSL Config: Essential for production HTTPS. Use tools like Certbot (Let's Encrypt) to obtain and manage free SSL certificates easily.
- location /: This block defines how matching requests are handled.
- proxy_set_header: Passes important information about the original request to Gunicorn/Flask (like the original host and client IP). X-Forwarded-Proto $scheme tells Flask whether the original request was HTTP or HTTPS.
- proxy_pass: Forwards the request to the Gunicorn process (either via TCP or Unix socket). Make sure this matches how Gunicorn is bound!
- proxy_*_timeout: Adjust timeouts if your API has long-running requests.
- Enabling the Site and Restarting Nginx:
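Assuming the server block above was saved as /etc/nginx/sites-available/your_api, enabling it follows the usual pattern:

```shell
# Enable the site by symlinking it into sites-enabled
sudo ln -s /etc/nginx/sites-available/your_api /etc/nginx/sites-enabled/
# Check the full configuration for syntax errors first
sudo nginx -t
# Apply the new server block
sudo systemctl reload nginx
```

Running nginx -t before reloading means a typo in the config fails loudly instead of taking down the running server.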
Systemd (Process Manager)
Systemd is the standard init system and service manager on most modern Linux distributions. We can create a service file to manage our Gunicorn process.
- Create Service File (/etc/systemd/system/your_api.service):
[Unit]
Description=Gunicorn instance to serve your_api
# Start after the network is available
After=network.target
# If using PostgreSQL, maybe start after postgresql.service
# After=network.target postgresql.service

[Service]
# User/Group to run the service as (create a dedicated user if needed).
# The User should own the project files; the Group is often the one Nginx runs
# as, for socket permissions. Note: systemd does not allow comments after values,
# so they are kept on their own lines here.
User=your_deploy_user
Group=www-data
# Working directory for the Gunicorn process (absolute path to project root)
WorkingDirectory=/path/to/your/project/advanced_api
# The virtual environment's activate script is not needed directly, but the
# gunicorn executable must be found: specify its absolute path.
# The socket path unix:app.sock is relative to WorkingDirectory.
ExecStart=/path/to/your/project/advanced_api/venv/bin/gunicorn --workers 3 --bind unix:app.sock -m 007 "wsgi:application"
# OR if binding to a TCP port:
# ExecStart=/path/to/your/project/advanced_api/venv/bin/gunicorn --workers 3 --bind 0.0.0.0:5000 "wsgi:application"

# Environment variables (optional, load from file or define here)
# Environment="FLASK_CONFIG=production"
# Secrets are better handled via files/vaults:
# Environment="JWT_SECRET_KEY=your_production_secret"
# EnvironmentFile=/path/to/your/project/.env

# Restart policy: wait 5s before restarting after a failure
Restart=always
RestartSec=5s

# Logging (optional, defaults to journald)
StandardOutput=journal
StandardError=journal
SyslogIdentifier=your_api

[Install]
# Enable the service to start on boot
WantedBy=multi-user.target
- [Unit]: Describes the service and its dependencies.
- [Service]: Defines how to run the service.
  - User/Group: Important for permissions. The User should own the project files. The Group might need to be www-data (or nginx) if using a Unix socket so Nginx can access it.
  - WorkingDirectory: Set this to your project's root directory.
  - ExecStart: The full command to start Gunicorn. Use the absolute path to the gunicorn executable inside your virtual environment. Make sure the --bind argument here matches the proxy_pass in your Nginx config. If using a socket, the path unix:app.sock is relative to the WorkingDirectory.
  - Environment/EnvironmentFile: Set environment variables needed by your Flask app (like FLASK_CONFIG, database URLs, secret keys). Avoid putting secrets directly in the service file. Use EnvironmentFile or other secure methods.
  - Restart: Tells Systemd to restart the service if it fails.
- [Install]: Defines how the service should be enabled.
- Managing the Service:
# Reload Systemd to recognize the new service file
sudo systemctl daemon-reload
# Start the service
sudo systemctl start your_api
# Check the status
sudo systemctl status your_api
# View logs (if using journald); -f follows the log
sudo journalctl -u your_api -f
# Stop the service
sudo systemctl stop your_api
# Enable the service to start automatically on boot
sudo systemctl enable your_api
Containerization with Docker
Docker provides another powerful way to package and deploy your Flask application and its dependencies. It isolates your application environment, making deployments more consistent and reproducible across different machines.
- Dockerfile: Defines the steps to build a container image for your application.
# Dockerfile
# Use an official Python runtime as a parent image
FROM python:3.9-slim

# Prevent Python from writing .pyc files
ENV PYTHONDONTWRITEBYTECODE 1
# Prevent Python from buffering stdout/stderr
ENV PYTHONUNBUFFERED 1

# Set work directory
WORKDIR /app

# Install system dependencies (if any)
# RUN apt-get update && apt-get install -y --no-install-recommends some-package

# Install Python dependencies
# Copy only requirements first to leverage the Docker cache
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy project code into the container
COPY . .

# Set the user (optional but good practice)
# RUN addgroup --system app && adduser --system --group app
# USER app

# Expose the port Gunicorn will run on (matches Gunicorn bind, NOT Nginx port)
EXPOSE 5000

# Define the command to run the application using Gunicorn
# Use production config, bind to 0.0.0.0 inside the container
CMD ["gunicorn", "--workers", "4", "--bind", "0.0.0.0:5000", "wsgi:application"]
- docker-compose.yml (Optional): For managing multi-container applications (e.g., your API, Nginx, database).
# docker-compose.yml
version: '3.8'

services:
  web:
    build: . # Build the image from the Dockerfile in the current directory
    command: gunicorn --workers 4 --bind 0.0.0.0:5000 wsgi:application
    volumes:
      - .:/app # Mount current directory (for development, maybe remove for prod)
      - ./instance:/app/instance # Mount instance folder
    ports:
      - "5000:5000" # Map host port 5000 to container port 5000 (for direct access)
    environment:
      FLASK_CONFIG: production
      # Load other env vars from a .env file (Compose automatically loads .env)
      # DATABASE_URL: postgresql://user:password@db:5432/mydatabase
      # JWT_SECRET_KEY: ${JWT_SECRET_KEY} # Get from host env or .env
    depends_on:
      - db # Wait for db service to be ready (basic check)
    networks:
      - app-network

  nginx: # Example Nginx service
    image: nginx:latest
    ports:
      - "80:80" # Map host port 80 to Nginx container port 80
      - "443:443" # Map host port 443 to Nginx container port 443
    volumes:
      - ./nginx.conf:/etc/nginx/conf.d/default.conf # Mount your Nginx config
      # - /path/to/ssl/certs:/etc/ssl/certs # Mount SSL certs
    depends_on:
      - web
    networks:
      - app-network

  db: # Example PostgreSQL service
    image: postgres:13
    volumes:
      - postgres_data:/var/lib/postgresql/data/ # Persist data
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
      POSTGRES_DB: mydatabase
    networks:
      - app-network

networks:
  app-network:
    driver: bridge

volumes:
  postgres_data:
- Workflow: Build the image (docker build -t your-api-image .), then run it (docker run ...) or use Docker Compose (docker-compose up). You'd still typically run Nginx outside the API container (or as another container) to act as the reverse proxy.
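As a sketch of that workflow (image and container names are placeholders):

```shell
# Build the image from the Dockerfile in the current directory
docker build -t your-api-image .
# Run it detached, mapping host port 5000 to the container's port 5000
docker run -d --name your-api -p 5000:5000 -e FLASK_CONFIG=production your-api-image
# Or start the whole stack defined in docker-compose.yml
docker-compose up -d
```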
Deployment Checklist
- Code: Ensure your latest code is on the server (Git pull, CI/CD).
- Dependencies: Install production dependencies (pip install -r requirements.txt).
- Configuration:
  - Set FLASK_CONFIG=production.
  - Provide the production database URI.
  - Set a strong, unique JWT_SECRET_KEY (and other secrets) securely (environment variables, a .env file outside the repo, or a secrets management tool).
  - Ensure DEBUG=False and TESTING=False in the production config.
- Database Migrations: Apply any pending migrations (flask db upgrade).
- WSGI Server: Configure Gunicorn/uWSGI (workers, binding).
- Reverse Proxy: Configure Nginx (server name, SSL/TLS, proxy pass to WSGI).
- Process Manager: Create and enable a Systemd service (or equivalent) to manage the WSGI server process.
- HTTPS: Configure SSL/TLS certificates (e.g., using Let's Encrypt/Certbot). Force HTTPS redirection.
- Firewall: Configure the server firewall (e.g., ufw) to allow traffic only on necessary ports (e.g., 80/HTTP, 443/HTTPS, maybe SSH).
- Logging: Set up proper logging for Nginx, Gunicorn, and your Flask application to monitor errors and activity.
- Monitoring: Implement monitoring tools to track server health, application performance, and errors.
Workshop: Preparing for Deployment (Conceptual)
Goal: Outline the steps and configuration needed to deploy the advanced_api project using Gunicorn, Nginx, and Systemd on a hypothetical Linux server. (Actually performing the deployment requires a server environment.)
Steps:
- Create wsgi.py: At the root of advanced_api, create wsgi.py:
# wsgi.py
import os
from app import create_app

# Ensure production config is used by default or via environment variable
config_name = os.getenv('FLASK_CONFIG', 'production')
application = create_app(config_name)

# Optional: Log application startup in WSGI context
if __name__ != '__main__':  # Check if run by a WSGI server
    import logging
    gunicorn_logger = logging.getLogger('gunicorn.error')
    # getLogger() always returns a logger, so check for attached handlers instead
    if gunicorn_logger.handlers:
        application.logger.handlers.extend(gunicorn_logger.handlers)
        application.logger.setLevel(gunicorn_logger.level)
        application.logger.info(f"Flask app '{application.name}' created for WSGI server with config '{config_name}'.")
    else:
        application.logger.info("Gunicorn logger not found, using default Flask logger.")
- Create Sample Systemd Service File: Create a template your_api.service file (save it locally for reference; you'd place it in /etc/systemd/system/ on the server):
# your_api.service (Template)
[Unit]
Description=Gunicorn instance for Advanced Task API
After=network.target

[Service]
# REPLACE deploy_user with the actual user
User=deploy_user
Group=www-data
# REPLACE /path/to/project with the actual deployment path
WorkingDirectory=/path/to/project/advanced_api
Environment="FLASK_CONFIG=production"
# Load secrets from a .env file:
# EnvironmentFile=/path/to/project/.env
# REPLACE /path/to/project with the actual venv path
ExecStart=/path/to/project/advanced_api/venv/bin/gunicorn --workers 3 --bind unix:app.sock -m 007 "wsgi:application"
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target
- Create Sample Nginx Config: Create a template your_api_nginx.conf file (save locally, place in /etc/nginx/sites-available/ on the server):
# your_api_nginx.conf (Template)
server {
    listen 80;
    server_name your_domain.com; # REPLACE with actual domain/IP
    # Add HTTPS config here for production

    location / {
        include proxy_params; # Standard proxy headers, usually in /etc/nginx/proxy_params
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        # REPLACE /path/to/project with the actual deployment path
        proxy_pass http://unix:/path/to/project/advanced_api/app.sock;
    }

    access_log /var/log/nginx/advanced_api_access.log;
    error_log /var/log/nginx/advanced_api_error.log;
}
- Review Production Config: Go to app/config.py and ensure ProductionConfig sets DEBUG = False, TESTING = False, and securely loads necessary variables like SQLALCHEMY_DATABASE_URI and JWT_SECRET_KEY (ideally from environment variables).
- Deployment Steps (Mental Walkthrough):
- Provision a Linux server.
- Install Python, pip, Nginx, database (e.g., PostgreSQL).
- Create a deployment user (deploy_user).
- Set up the firewall (ufw).
- Clone your project code to /path/to/project/advanced_api.
- Create and activate a virtual environment (venv).
- Install dependencies (pip install -r requirements.txt). Don't forget gunicorn.
- Set up the production database and user.
- Configure environment variables (e.g., in /path/to/project/.env or system-wide). Ensure the deploy user can read them.
- Apply database migrations (export FLASK_APP=wsgi.py; flask db upgrade).
- Place the Systemd service file (your_api.service) in /etc/systemd/system/, replacing placeholders. Run systemctl daemon-reload, systemctl start your_api, systemctl enable your_api. Check status and logs.
- Place the Nginx config file (your_api_nginx.conf) in /etc/nginx/sites-available/, replacing placeholders. Create a symlink in sites-enabled. Set up SSL certificates. Run nginx -t, systemctl restart nginx.
- Test accessing your API via the domain name.
Outcome: While you haven't performed a live deployment, you have created the necessary configuration file templates (wsgi.py, Systemd service, Nginx config) and mentally walked through the essential steps involved in deploying your Flask API to a production-like Linux environment.
11. Rate Limiting
When you expose an API to the internet, you need to protect it from abuse, whether intentional (malicious attacks) or unintentional (poorly behaving clients sending excessive requests). Rate limiting is a crucial technique for controlling the number of requests a client can make to your API within a specific time window.
Why Implement Rate Limiting?
- Prevent Denial-of-Service (DoS) / Brute-Force Attacks: Limits the speed at which attackers can hammer your login endpoints or resource-intensive operations.
- Ensure Fair Usage: Prevents a single misbehaving or overly aggressive client from consuming all server resources and degrading performance for other users.
- Manage Costs: If your API relies on paid external services, rate limiting can help control usage costs.
- Maintain Service Stability: Protects your backend infrastructure (servers, databases) from being overwhelmed by sudden spikes in traffic.
Flask-Limiter Extension
Flask-Limiter is an excellent extension that makes implementing various rate limiting strategies in Flask applications straightforward.
1. Installation:
(In your advanced_api virtual environment)
pip install Flask-Limiter
# Add it to requirements.txt (pip freeze rewrites the file, capturing Flask-Limiter too)
pip freeze > requirements.txt
2. Initialization: Integrate it into your application factory. Flask-Limiter needs access to the request context to identify clients, so it's initialized with the app. It also requires a storage backend to keep track of request counts.
- Configuration (app/config.py): Flask-Limiter needs a storage backend URI. For simple cases, in-memory works, but this won't scale across multiple Gunicorn workers or servers. Redis or Memcached are highly recommended for production.
# app/config.py
import os
# ...

class Config:
    # ... other config ...

    # --- Rate Limiter Configuration ---
    # In-memory storage (suitable for development/single process)
    RATELIMIT_STORAGE_URI = "memory://"

    # Redis Example (Recommended for production with multiple workers/servers)
    # Requires a running Redis server and the 'redis' package (pip install redis)
    # RATELIMIT_STORAGE_URI = os.environ.get('RATELIMIT_REDIS_URL', 'redis://localhost:6379/0')

    # Memcached Example
    # Requires a running Memcached server and the 'pymemcache' package
    # RATELIMIT_STORAGE_URI = os.environ.get('RATELIMIT_MEMCACHED_URL', 'memcached://localhost:11211')

    # Default rate limits applied to all routes unless overridden
    # Format: "count per interval[;another limit]" e.g., "100 per hour;10 per minute"
    RATELIMIT_DEFAULT = "200 per day;50 per hour"

    # Strategy to identify clients ('ip' is common)
    RATELIMIT_STRATEGY = 'ip'  # Options: 'ip', 'host', custom based on headers etc.

class DevelopmentConfig(Config):
    # ...
    # Maybe relax limits for development
    RATELIMIT_DEFAULT = "500 per hour;50 per minute"

class ProductionConfig(Config):
    # ...
    # Use Redis or Memcached for storage
    RATELIMIT_STORAGE_URI = os.environ.get('RATELIMIT_REDIS_URL', 'redis://localhost:6379/0')
    # Define sensible production limits
    RATELIMIT_DEFAULT = "1000 per day;100 per hour;10 per minute"
    if not RATELIMIT_STORAGE_URI or RATELIMIT_STORAGE_URI == "memory://":
        print("WARNING: Rate limiting is using in-memory storage in production!")

# ... TestingConfig ...
class TestingConfig(Config):
    # ...
    RATELIMIT_ENABLED = False  # Disable rate limiting during tests
- app/extensions.py: Instantiate the Limiter.
# app/extensions.py
# ... other imports ...
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address  # Default key function (by IP)

# ... db, ma, jwt, migrate ...
limiter = Limiter(
    key_func=get_remote_address,  # Function to identify clients (default: request IP)
    # storage_uri will be set from app config during init_app
    # default_limits will be set from app config during init_app
)
- app/__init__.py: Initialize with the app instance.
# app/__init__.py
# ... imports ...
from .extensions import db, ma, jwt, migrate, limiter  # Import limiter

def create_app(config_name='development'):
    # ... app creation, config loading ...

    # --- Initialize Extensions ---
    db.init_app(app)
    ma.init_app(app)
    jwt.init_app(app)
    migrate.init_app(app, db)
    # Initialize Limiter - AFTER loading config
    limiter.init_app(app)

    # --- Register Blueprints ---
    # ...

    # --- Error Handlers ---
    # Flask-Limiter raises RateLimitExceeded, but has its own default 429 handler.
    # You can customize it using @app.errorhandler(429) if needed.
    # ...

    # ... rest of factory ...
    return app
Applying Rate Limits
Flask-Limiter offers several ways to apply limits:
- Global Limits: Set via RATELIMIT_DEFAULT in the Flask config. Applies to all routes unless explicitly exempted or overridden.
- Decorator (@limiter.limit): Apply specific limits to individual routes or blueprints.
- Blueprint Limits: Apply limits to all routes within a specific Blueprint during registration.
Examples:
- Applying to a Specific Route (app/tasks/routes.py): This adds a specific limit in addition to any global limits. A request must pass all applicable limits.
- Applying to a Blueprint (app/__init__.py or blueprint definition): You can register limits directly onto a blueprint object before registering it with the app. Applying limits directly to blueprints is less common than using decorators on specific routes or relying on global defaults.
# Option 1: Decorate the blueprint object directly (e.g., in app/tasks/__init__.py)
# app/tasks/__init__.py
# from flask import Blueprint
# from ..extensions import limiter
#
# tasks_bp = Blueprint('tasks', __name__)
# limiter.limit("60 per hour")(tasks_bp)  # Apply limit to all routes in tasks_bp
# from . import routes

# Option 2: Use limiter.limit decorators within the blueprint routes file.
# This is often clearer as the limit is next to the route.
- Dynamic Limits (Based on User/Token): You can define limits based on the authenticated user.
# Example key function based on JWT identity
def get_user_id_func():
    # Return a default value if no JWT identity is present (e.g., for public routes)
    # Note: Requires request context
    from flask_jwt_extended import get_jwt_identity
    try:
        # Check if a JWT identity exists
        identity = get_jwt_identity()
        return str(identity) if identity else get_remote_address()
    except RuntimeError:
        # No request context or JWT setup issue, fall back to IP
        return get_remote_address()

# In app/extensions.py, instantiate Limiter with the custom key func:
# limiter = Limiter(key_func=get_user_id_func)
# Limits will now be tracked per user ID (if logged in)
# or per IP address (if not logged in).
- Exempting Routes:
How it Works & Headers
- When a request comes in, Flask-Limiter identifies the client using the key_func (e.g., IP address).
- It checks the configured storage (memory, Redis, etc.) for the number of requests made by that client within the defined time windows for all applicable limits.
- If any limit is exceeded, it aborts the request with a 429 Too Many Requests status code.
- Flask-Limiter automatically adds helpful HTTP headers to the response:
  - X-RateLimit-Limit: The request limit for the current window.
  - X-RateLimit-Remaining: The number of requests remaining in the window.
  - X-RateLimit-Reset: The UTC epoch timestamp when the limit resets.
  - Retry-After: (Sent on 429 responses) The number of seconds the client should wait before retrying.
Workshop: Adding Basic Rate Limiting
Goal: Apply global rate limits and a specific, stricter limit to the login endpoint.
Steps:
- Install: pip install Flask-Limiter and add it to requirements.txt. If using Redis for storage (recommended for simulating production), also pip install redis.
- Configure:
  - In app/config.py:
    - Add RATELIMIT_STORAGE_URI. Choose memory:// for simplicity or redis://localhost:6379/0 if you have Redis running.
    - Add RATELIMIT_DEFAULT (e.g., "100 per hour;10 per minute").
    - Set RATELIMIT_ENABLED = False in TestingConfig.
- Initialize:
  - Add limiter = Limiter(...) to app/extensions.py, using get_remote_address.
  - Call limiter.init_app(app) in app/__init__.py.
- Apply Specific Limit:
  - In app/auth/routes.py, import limiter from ..extensions.
  - Add the @limiter.limit("5 per minute") decorator to the /auth/login route function, placing it below the @auth_bp.route decorator.
- Test:
  - Run the app: python run.py.
  - Hit a normal endpoint repeatedly: Use curl in a loop or a simple script to hit GET /api/tasks/ (requires a valid token obtained previously) more than 10 times within a minute (or whatever your RATELIMIT_DEFAULT minute limit is). You should eventually get a 429 Too Many Requests response. Check the X-RateLimit-* headers on successful responses.
  - Hit the login endpoint repeatedly: Send POST requests to /auth/login (even with invalid credentials) more than 5 times within a minute. You should get a 429 response sooner than for the tasks endpoint.
  - Check Headers: Observe the X-RateLimit-* headers in the responses using curl -i.
Outcome: You've implemented basic rate limiting for your API, protecting it against excessive requests using both global defaults and endpoint-specific rules. You understand how to configure Flask-Limiter and apply limits using decorators.
12. Caching Strategies
Caching is a fundamental technique for improving the performance and scalability of web applications and APIs. It involves storing the results of expensive or frequently accessed computations/data retrievals (like database queries or complex calculations) in a faster, temporary storage location (the cache). Subsequent requests for the same data can then be served directly from the cache, bypassing the original computation and significantly reducing response times and server load.
Why Use Caching in APIs?
- Reduced Latency: Serving responses from a fast cache (like memory or Redis) is much quicker than querying a database or calling external services.
- Reduced Server Load: Lessens the burden on your application servers, databases, and other backend resources, allowing your API to handle more concurrent users.
- Improved Scalability: Caching helps your application scale more effectively by reducing bottlenecks in slower parts of the system.
- Cost Savings: Can reduce database costs or API call costs to external services.
Caching Layers
Caching can occur at multiple levels:
- Client-Side: Browsers cache responses based on HTTP headers like Cache-Control, Expires, and ETag. While important for web pages, this is less directly controlled by the API developer for typical API clients, though setting appropriate headers is good practice.
- CDN (Content Delivery Network): CDNs cache static assets and potentially API responses geographically closer to users.
- Reverse Proxy (Nginx): Nginx can be configured to cache responses from your application server.
- Application-Level: Caching implemented within your Flask application itself. This gives you fine-grained control over what gets cached, when, and for how long. This is our focus here.
Flask-Caching Extension
Flask-Caching is the standard extension for adding caching capabilities to Flask applications. It supports various caching backends.
1. Installation:
(In your advanced_api virtual environment)
pip install Flask-Caching
# Install backend specific libraries if needed:
# pip install redis # For Redis backend
# pip install pymemcache # For Memcached backend
Add Flask-Caching and any backend drivers to requirements.txt.
2. Configuration (app/config.py):
You need to tell Flask-Caching which backend to use and provide connection details.
# app/config.py
import os
# ...

class Config:
    # ... other config ...

    # --- Caching Configuration ---
    # See Flask-Caching docs for more options: https://flask-caching.readthedocs.io/en/latest/
    CACHE_TYPE = "SimpleCache"  # Default: In-memory cache (suitable for development)
    CACHE_DEFAULT_TIMEOUT = 300  # Default cache timeout in seconds (5 minutes)

    # Example: Redis Cache (Recommended for multi-process/server production)
    # Requires 'redis' package and Redis server running
    # CACHE_TYPE = "RedisCache"
    # CACHE_REDIS_URL = os.environ.get('CACHE_REDIS_URL', 'redis://localhost:6379/1')  # Use different DB than rate limiter

    # Example: Memcached Cache
    # Requires 'pymemcache' package and Memcached server running
    # CACHE_TYPE = "MemcachedCache"
    # CACHE_MEMCACHED_SERVERS = [os.environ.get('CACHE_MEMCACHED_URL', '127.0.0.1:11211')]

class DevelopmentConfig(Config):
    # ...
    pass  # Inherits SimpleCache

class ProductionConfig(Config):
    # ...
    CACHE_TYPE = os.environ.get('CACHE_TYPE', 'RedisCache')
    CACHE_REDIS_URL = os.environ.get('CACHE_REDIS_URL', 'redis://localhost:6379/1')
    CACHE_DEFAULT_TIMEOUT = 60 * 15  # 15 minutes default in prod
    if CACHE_TYPE == "SimpleCache":
        print("WARNING: Caching is using SimpleCache (in-memory) in production!")

class TestingConfig(Config):
    # ...
    CACHE_TYPE = "NullCache"  # Disable caching during tests to avoid side effects
3. Initialization:
- app/extensions.py:
- app/__init__.py:
# app/__init__.py
# ... imports ...
from .extensions import db, ma, jwt, migrate, limiter, cache  # Import cache

def create_app(config_name='development'):
    # ... app creation ...

    # --- Load Configuration ---
    # Config must be loaded before initializing extensions that use it
    # ... config loading ...

    # --- Initialize Extensions ---
    db.init_app(app)
    ma.init_app(app)
    jwt.init_app(app)
    migrate.init_app(app, db)
    limiter.init_app(app)
    cache.init_app(app)  # Initialize Cache with app (reads config)

    # ... blueprints, error handlers, etc ...
    return app
Using the Cache
Flask-Caching primarily uses decorators to cache the results of view functions.
1. Caching View Functions (`@cache.cached`):
This decorator caches the entire response of a view function.
# app/tasks/routes.py
# ... imports ...
from ..extensions import cache  # Import the cache instance
# ... schemas, limiter, helpers ...

@tasks_bp.route('/', methods=['GET'])
@jwt_required()
@cache.cached(timeout=60)  # Cache the response for 60 seconds
def get_tasks():
    print("*** Executing get_tasks logic (Not Cached) ***")  # Log to see when the function actually runs
    # ... (filtering logic) ...
    tasks = query.order_by(db.desc(Task.created_at)).all()
    result = tasks_schema.dump(tasks)
    return make_success_response(result)
@tasks_bp.route('/<int:task_id>', methods=['GET'])
@jwt_required()
# Cached per URL path (which includes task_id). 'view/%s' is already the
# default key_prefix; it is shown explicitly here for illustration.
@cache.cached(timeout=300, key_prefix='view/%s')
def get_task(task_id):
    print(f"*** Executing get_task logic for ID {task_id} (Not Cached) ***")
    task = Task.query.get_or_404(task_id)
    result = task_schema.dump(task)
    return make_success_response(result)

# IMPORTANT: Do NOT cache endpoints that modify data (POST, PUT, PATCH, DELETE)!
# Caching these would lead to inconsistent state.
# ... other routes (POST, PUT, PATCH, DELETE should NOT be cached) ...
- `@cache.cached(timeout=...)`: Caches the return value (the Flask `Response` object) of the decorated function.
- `timeout`: Cache duration in seconds (overrides `CACHE_DEFAULT_TIMEOUT`).
- Cache Key: By default, the key is built from the request path only, so `/api/tasks/?status=pending` and `/api/tasks/` share a cache key. Pass `query_string=True` to `@cache.cached` if query parameters should produce separate cache entries (and remember those extra keys when invalidating).
- `key_prefix`: Allows customizing the generated cache key. `view/%s` is the default format string, where `%s` is replaced by the request path. Caution: with path-only keys, an authenticated endpoint serves one user's cached response to every user; for per-user data, supply a custom `make_cache_key` that includes the user's identity.
2. Memoization (`@cache.memoize`):
Caches the result of any function (not just view functions) based on its arguments. Useful for caching results of expensive internal computations or data lookups that might be called multiple times within a single request or across different requests with the same inputs.
# Example: caching a computationally expensive function (e.g., in a utils module)
import time
from app.extensions import cache

@cache.memoize(timeout=3600)  # Cache for 1 hour
def expensive_calculation(param1, param2):
    print(f"*** Performing expensive calculation({param1}, {param2}) ***")
    time.sleep(2)  # Simulate heavy work
    return {'result': param1 * param2}

# In a view function:
@tasks_bp.route('/calculate')
@jwt_required()
def calculate_stuff():
    # The first time this is called with specific params, it runs.
    # Subsequent calls with the same params (within the timeout) get the cached result.
    calc1 = expensive_calculation(10, 5)
    calc2 = expensive_calculation(20, 3)
    calc3 = expensive_calculation(10, 5)  # This call will hit the cache
    return jsonify(calc1=calc1, calc2=calc2, calc3=calc3)
- The cache key for `@cache.memoize` is based on the function name and the values of its arguments.
3. Manual Cache Access:
You can interact with the cache directly using `cache.get(key)`, `cache.set(key, value, timeout=...)`, and `cache.delete(key)`.
# Manual cache example
from app.extensions import cache

def get_user_profile_data(user_id):
    cache_key = f"user_{user_id}_profile"
    cached_data = cache.get(cache_key)
    if cached_data is not None:  # Check against None: a cached falsy value is still a hit
        print(f"--- Cache HIT for {cache_key} ---")
        return cached_data
    print(f"--- Cache MISS for {cache_key} ---")
    # Simulate fetching data from the DB or an external service
    user_data = {"id": user_id, "name": f"User {user_id}", "preference": "dark_mode"}
    cache.set(cache_key, user_data, timeout=600)  # Cache for 10 minutes
    return user_data
4. Cache Invalidation: This is one of the hardest problems in caching. When underlying data changes (e.g., a task is updated or deleted), you need to remove the outdated data from the cache.
- Timeout-Based: The simplest method. Data expires automatically after the timeout. Suitable for data that doesn't change too frequently or where slightly stale data is acceptable.
- Manual Deletion: Explicitly delete cache keys when data changes. Requires careful tracking of which keys might be affected by an update.
# Inside the update_task route, after a successful commit:
@tasks_bp.route('/<int:task_id>', methods=['PUT'])
@jwt_required()
def update_task(task_id):
    # ... fetch task, validate, update ...
    try:
        db.session.commit()
        # --- Invalidate Cache ---
        cache.delete(f"view//api/tasks/{task_id}")  # Delete the specific task cache
        cache.delete("view//api/tasks/")            # Delete the task list cache (simplistic)
        # More sophisticated invalidation might target specific list cache keys,
        # e.g., based on status filters if those are cached separately.
        cache.delete_memoized(get_task, task_id)    # Delete the memoized version, if used
        result = task_schema.dump(updated_task)
        return make_success_response(result)
    except Exception as e:
        # ... error handling ...
- Pattern-Based Deletion: Some backends (like Redis) support deleting keys based on patterns (e.g., `cache.delete_many("view//api/tasks/*")`), but use this with caution, as it can be slow. `cache.clear()` removes everything.
Workshop: Caching Task List and Details
Goal: Apply caching to the `GET /api/tasks/` and `GET /api/tasks/<id>` endpoints to improve performance. Implement manual cache clearing when a task is updated or deleted.
Steps:
- Install & Configure:
  - `pip install Flask-Caching` (and e.g. `redis` if using Redis). Add to `requirements.txt`.
  - Configure `CACHE_TYPE`, `CACHE_DEFAULT_TIMEOUT`, and potentially `CACHE_REDIS_URL` in `app/config.py`. Set `CACHE_TYPE = "NullCache"` in `TestingConfig`.
- Initialize:
  - Add `cache = Cache()` to `app/extensions.py`.
  - Call `cache.init_app(app)` in `app/__init__.py`.
- Apply Caching Decorators:
  - In `app/tasks/routes.py`, import `cache` from `..extensions`.
  - Add the `@cache.cached(timeout=60)` decorator to the `get_tasks` function (below `@jwt_required`).
  - Add the `@cache.cached(timeout=300)` decorator to the `get_task` function (below `@jwt_required`).
  - Add `print()` statements inside both functions (before fetching data) to show when the function logic actually runs (a cache miss).
- Implement Cache Invalidation:
  - In the `update_task` (PUT) and `patch_task` (PATCH) functions: after `db.session.commit()` completes successfully, delete the relevant cache keys. Note that `cache.delete_memoized(get_task, task_id)` only applies to functions decorated with `@cache.memoize`; for `@cache.cached` view functions you need the key Flask-Caching generated, which depends on the request path. A simpler, broader approach for this workshop:
    - `cache.delete(f"view//api/tasks/{task_id}")` — the key for the specific task view
    - `cache.delete("view//api/tasks/")` — the key for the main task list view (simplistic)
  - In the `delete_task` function: after `db.session.commit()`, delete the cache keys the same way:
    - `cache.delete(f"view//api/tasks/{task_id}")`
    - `cache.delete("view//api/tasks/")`
  - In the `create_task` function: after `db.session.commit()`, invalidate the list view cache: `cache.delete("view//api/tasks/")`
- Test Caching Behavior:
  - Run the application: `python run.py`.
  - Cache Miss: Use `curl` (with auth token) to `GET /api/tasks/1`. Check the Flask console output for the "Executing get_task logic..." message.
  - Cache Hit: Immediately `GET /api/tasks/1` again. The response should be faster, and you should not see the "Executing..." message in the console.
  - Cache Miss (List): `GET /api/tasks/`. Check the console for "Executing get_tasks logic...".
  - Cache Hit (List): `GET /api/tasks/` again. You shouldn't see the message.
  - Invalidation (Update): Update task 1 using `PUT` or `PATCH`.
  - Check Cache Miss: Immediately `GET /api/tasks/1` again. You should see the "Executing..." message because the cache was cleared by the update. `GET /api/tasks/` should also show a cache miss again.
  - Invalidation (Delete): Delete another task (e.g., task 2).
  - Check Cache Miss: `GET /api/tasks/` should show a cache miss.
Outcome: You have successfully implemented application-level caching for read-heavy endpoints using Flask-Caching. You can observe the performance difference between cache hits and misses and understand the importance of cache invalidation when data changes.
13. Asynchronous Operations & Background Tasks
Web servers and frameworks like Flask typically handle incoming requests synchronously. When a request arrives, a worker process/thread picks it up, executes the view function, interacts with databases or other services, generates the response, and sends it back. During this time, that worker is blocked and cannot handle other requests.
If your API needs to perform tasks that take a significant amount of time (e.g., sending emails, processing images/videos, generating complex reports, calling slow third-party APIs), doing this directly within the request-response cycle can lead to:
- Long Response Times: Users have to wait until the long task completes, leading to a poor user experience and potentially request timeouts (e.g., from Nginx or the client).
- Worker Starvation: Long-running tasks tie up your limited pool of WSGI workers (Gunicorn processes), reducing your API's capacity to handle concurrent requests and potentially leading to service unavailability under load.
The solution is to offload these long-running operations to background tasks that run outside the normal request-response cycle. The API endpoint can then quickly acknowledge the request (e.g., return a `202 Accepted` status) and let the background task run independently.
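To make the pattern concrete before introducing Celery, here is a minimal sketch using only Python's standard library. Plain threads are not a substitute for a real task queue (they die with the process and offer no retries, persistence, or distribution), but they show the shape of "acknowledge now, work later":

```python
import threading
import time

results = {}  # In-memory "result store" for this sketch only

def long_running_job(job_id, payload):
    """Stands in for slow work (sending an email, resizing an image, ...)."""
    time.sleep(0.1)  # Simulated work
    results[job_id] = f"processed:{payload}"

def handle_request(payload):
    """Stands in for a view: queue the work, then respond immediately."""
    job_id = str(len(results) + 1)
    worker = threading.Thread(target=long_running_job, args=(job_id, payload))
    worker.start()
    # The caller gets an immediate acknowledgement instead of waiting.
    return {"status": 202, "job_id": job_id}, worker

response, worker = handle_request("report.pdf")
print(response["status"])  # The 202 response exists before the job finishes
worker.join()              # Demo only: wait so we can inspect the result
print(results[response["job_id"]])
```

Celery replaces the thread with a durable broker and separate worker processes, which is what the rest of this section builds.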
Celery: Distributed Task Queue
Celery is the most popular and powerful framework for handling background tasks and scheduling in Python. It's a distributed system, meaning tasks can be executed by separate worker processes, potentially running on different machines, making it highly scalable.
Celery Core Components:
- Task Queue (Message Broker): A message broker (like Redis or RabbitMQ) acts as the central hub. When your Flask app wants to run a background task, it sends a message containing the task name and arguments to the broker.
- Celery Workers: Separate Python processes that constantly monitor the message broker for new tasks. When a worker finds a task it can handle, it pulls the message from the queue and executes the corresponding Python function.
- Result Backend (Optional): A backend (like Redis, a database, etc.) to store the results or status of completed tasks, allowing your application to check if a task finished and what its output was.
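The division of labor among these three components can be mimicked with the standard library. This toy sketch is purely conceptual: a real broker adds persistence, acknowledgements, and cross-machine delivery, all of which are omitted here:

```python
import queue
import threading

task_queue = queue.Queue()   # Stands in for the message broker
result_backend = {}          # Stands in for the result backend

def registered_task(x, y):
    """A task the worker knows how to execute."""
    return x + y

def worker_loop():
    """Stands in for a Celery worker: pull messages and execute them."""
    while True:
        message = task_queue.get()
        if message is None:  # Shutdown signal
            break
        task_id, func, args = message
        result_backend[task_id] = func(*args)  # Store the outcome
        task_queue.task_done()

worker = threading.Thread(target=worker_loop)
worker.start()

# "Producer" side (the Flask app): enqueue a task message and carry on
task_queue.put(("task-1", registered_task, (2, 3)))
task_queue.join()            # Demo only: wait until the worker finishes
task_queue.put(None)         # Tell the worker to shut down
worker.join()
print(result_backend["task-1"])  # → 5
```

The key property to notice: the producer only puts a *message* on the queue; the function actually runs in a different thread (in Celery, a different process or machine).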
Integration with Flask:
Celery integrates well with Flask. The setup involves configuring Celery to work within the Flask application context so that tasks can access Flask config, extensions (like `db`), and so on.
1. Installation:
pip install celery[redis] # Installs celery and redis client library
# OR
# pip install celery[librabbitmq] # For RabbitMQ
# Add to requirements.txt
echo "celery[redis]" >> requirements.txt # Or appropriate broker
pip freeze > requirements.txt
2. Configuration: Celery needs configuration, often alongside Flask config.
- Create `celery_worker.py` (example): It's common to have a separate entry point for Celery workers.
# celery_worker.py (at project root)
import os
from app import create_app, celery  # Import create_app and the Celery instance (defined next)

# Load FLASK_CONFIG from the environment; default to development for the worker context if needed
flask_env = os.getenv('FLASK_CONFIG', 'development')

# Create a Flask app instance so Celery can use its context
app = create_app(config_name=flask_env)
app.app_context().push()  # Push the context for the worker
- Configure Celery within Flask (`app/extensions.py` and `app/__init__.py`): You need to create a Celery instance and configure its broker and result backend.
# app/extensions.py
# ... other imports ...
from celery import Celery, Task as CeleryTask
# ... db, ma, jwt, migrate, limiter, cache ...

# Base task class that runs every task inside the Flask app context,
# so tasks can use db, config, and the other extensions.
class FlaskTask(CeleryTask):
    flask_app = None  # Assigned in create_app() once the Flask app exists

    def __call__(self, *args, **kwargs):
        with self.flask_app.app_context():
            return super().__call__(*args, **kwargs)

# Instantiate Celery without an app; it is configured inside the factory,
# where the Flask app (and its config) exist.
celery = Celery(__name__, task_cls=FlaskTask)
# app/__init__.py
# ... imports ...
from .extensions import db, ma, jwt, migrate, limiter, cache, celery  # Import the celery instance
from . import models  # Ensure models are imported

def create_app(config_name='development'):
    app = Flask(__name__, instance_relative_config=True)
    # ... config loading ...
    app.config.from_object(config_by_name[config_name])
    app.config.from_pyfile('config.py', silent=True)

    # --- Configure Celery ---
    # Update the Celery config from the Flask config.
    # Make sure CELERY_BROKER_URL and CELERY_RESULT_BACKEND are set in the Flask config.
    celery_config = {
        'broker_url': app.config.get('CELERY_BROKER_URL', 'redis://localhost:6379/2'),  # Use a separate Redis DB
        'result_backend': app.config.get('CELERY_RESULT_BACKEND', 'redis://localhost:6379/3'),
        'include': ['app.tasks.background'],  # Modules where tasks are defined
        'task_ignore_result': app.config.get('CELERY_TASK_IGNORE_RESULT', True),  # Don't store results unless needed
        'result_expires': app.config.get('CELERY_RESULT_EXPIRES', 3600),  # How long to keep results (if stored)
    }
    celery.conf.update(celery_config)

    # Link the Flask app to the Celery base task class so FlaskTask can push
    # an app context around each task (celery.Task is FlaskTask here).
    celery.Task.flask_app = app

    # --- Initialize other Extensions ---
    db.init_app(app)
    # ... other init_app calls ...

    # --- Register Blueprints ---
    # ...
    # --- Error Handlers / CLI etc. ---
    # ...
    return app
- Add `CELERY_BROKER_URL` and `CELERY_RESULT_BACKEND` to your `app/config.py` (e.g., pointing to Redis URLs, using different database numbers than caching/rate limiting).
- `include`: Tells Celery where to find your task definitions.
3. Defining Tasks:
Create task functions decorated with `@celery.task`.
- Create `app/tasks/background.py` (matches the `include` path in the Celery config):
# app/tasks/background.py
import time
from ..extensions import celery, db  # Import the celery instance, and db if needed
from ..models import Task  # Import models if the task interacts with them
from celery.utils.log import get_task_logger  # Celery's logger

logger = get_task_logger(__name__)  # Use Celery's logger inside tasks

@celery.task(bind=True, default_retry_delay=30, max_retries=3)
# bind=True gives access to 'self' (the task instance)
def process_task_data(self, task_id):
    """Example background task that processes data for a given task ID."""
    logger.info(f"Starting background processing for Task ID: {task_id}")
    try:
        # Database access happens within the app context (handled by the FlaskTask base class)
        task = Task.query.get(task_id)
        if not task:
            logger.warning(f"Task ID {task_id} not found in database.")
            return {"status": "failed", "reason": "Task not found"}

        # Simulate long processing
        logger.info(f"Processing task: {task.title}")
        time.sleep(10)  # Simulate 10 seconds of work

        # Example: update the task status or description
        task.description = (task.description or "") + " [Processed]"
        task.status = "completed"  # Or some other status
        db.session.commit()

        logger.info(f"Finished processing Task ID: {task_id}")
        # The return value is stored in the result backend if results are not ignored
        return {"status": "success", "task_id": task_id, "new_description": task.description}
    except Exception as e:
        logger.error(f"Error processing Task ID {task_id}: {e}", exc_info=True)
        # Retry the task automatically on failure (up to max_retries)
        raise self.retry(exc=e)

@celery.task
def send_confirmation_email(recipient, subject, body):
    """Example task to send an email (replace with actual email logic)."""
    logger.info(f"Simulating sending email to {recipient}")
    logger.info(f"Subject: {subject}")
    logger.info(f"Body: {body}")
    time.sleep(3)  # Simulate network delay
    logger.info("Email 'sent'.")
    # In a real app, use Flask-Mail or another email library here
    return {"status": "sent", "recipient": recipient}
4. Calling Tasks from Flask Views:
Use the `.delay()` or `.apply_async()` method of your task function to send it to the queue.
# Example usage in app/tasks/routes.py
# Import the background task functions
from .background import process_task_data, send_confirmation_email
# ... other imports, schemas, etc. ...

@tasks_bp.route('/<int:task_id>/process', methods=['POST'])
@jwt_required()
def trigger_task_processing(task_id):
    """Endpoint to trigger background processing for a task."""
    # Ensure the task exists before queueing (optional; the task itself could check)
    task = Task.query.get(task_id)
    if not task:
        return make_error_response(f"Task {task_id} not found.", 404)

    # Send the task to the Celery queue.
    # .delay() is a shortcut for .apply_async() with default options.
    task_instance = process_task_data.delay(task_id)
    logger.info(f"Queued background processing for Task ID: {task_id}. Celery Task ID: {task_instance.id}")

    # Return 202 Accepted, indicating the request was accepted for processing.
    # Optionally include the Celery task ID for tracking.
    return jsonify({
        "message": "Task processing has been queued.",
        "celery_task_id": task_instance.id,
        "task_uri": url_for('tasks.get_processing_status', celery_task_id=task_instance.id, _external=True)  # Optional status-check URL
    }), 202
# Optional: endpoint to check task status (requires a result backend)
@tasks_bp.route('/process/status/<string:celery_task_id>', methods=['GET'])
@jwt_required()
def get_processing_status(celery_task_id):
    """Check the status of a Celery background task."""
    from celery.result import AsyncResult

    task_result = AsyncResult(celery_task_id, app=celery)  # Get the result object by task ID
    response = {
        'task_id': celery_task_id,
        'status': task_result.status,  # PENDING, STARTED, SUCCESS, FAILURE, RETRY, REVOKED
        'result': None
    }
    if task_result.successful():
        response['result'] = task_result.get()  # The task's return value
    elif task_result.failed():
        # Get the exception/traceback (be careful about exposing details)
        try:
            task_result.get()  # get() re-raises the task's exception
        except Exception as e:
            response['result'] = {"error": str(e)}  # Store the error message
    elif task_result.status == 'PENDING':
        response['result'] = "Task is waiting in the queue."
    elif task_result.status == 'STARTED':
        response['result'] = "Task execution has started."
    elif task_result.status == 'RETRY':
        response['result'] = "Task is being retried."
    return jsonify(response), 200
5. Running Celery Workers: You need to run separate Celery worker processes in your production environment (managed by Systemd or similar).
# In your project root (with venv active)
# Ensure FLASK_APP and FLASK_CONFIG are set if worker needs app context during import
# Make sure Redis/RabbitMQ server is running
# Command to start a worker:
celery -A celery_worker.celery worker --loglevel=info
# -A points to your Celery application instance (celery_worker module, celery variable)
# --loglevel sets the logging level
# Can add -c N to specify concurrency (number of child processes/threads per worker)
You would create another Systemd service file specifically for the Celery worker process(es), similar to how you managed the Gunicorn process.
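Such a unit file might look like the sketch below. Every path, user, and environment value here is an assumption to adapt to your own deployment (project directory, virtualenv location, service user):

```ini
# /etc/systemd/system/celery-worker.service -- illustrative sketch only
[Unit]
Description=Celery worker for the Flask task API
After=network.target redis.service

[Service]
Type=simple
User=www-data
Group=www-data
WorkingDirectory=/srv/advanced_api
Environment="FLASK_CONFIG=production"
ExecStart=/srv/advanced_api/venv/bin/celery -A celery_worker.celery worker --loglevel=info
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

After placing the file, `sudo systemctl daemon-reload` followed by `sudo systemctl enable --now celery-worker` starts the worker and keeps it running across reboots.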
Workshop: Adding a Background Email Task
Goal: When a new task is created via the API, trigger a background task to simulate sending a confirmation email.
Steps:
- Install Celery: `pip install celery[redis]` (or your chosen broker) and add it to `requirements.txt`. Ensure a Redis server is running locally.
- Configure Celery:
  - Update `app/config.py` with `CELERY_BROKER_URL` and `CELERY_RESULT_BACKEND` (e.g., `redis://localhost:6379/2` and `/3`). Set `CELERY_TASK_IGNORE_RESULT = False` if you want to test the status endpoint.
  - Add the `FlaskTask` base class and the `celery = Celery(...)` instantiation to `app/extensions.py`.
  - Update `app/__init__.py` to configure Celery from the Flask config (`celery.conf.update(...)`) and link the app context to the base task class. Make sure `include` points to where tasks will be defined (e.g., `app.tasks.background`).
- Create Worker Entry Point: Create `celery_worker.py` at the project root.
- Define Email Task:
  - Create `app/tasks/background.py`.
  - Define the `send_confirmation_email(recipient, subject, body)` task inside it, decorated with `@celery.task`. Add logging and a `time.sleep()` to simulate work.
- Trigger Task:
  - In `app/tasks/routes.py`, import the `send_confirmation_email` task.
  - Inside the `create_task` function, after `db.session.commit()`, call the background task:
# ... inside create_task, after commit ...
try:
    # Assume user info is available, e.g., from the JWT or a related object.
    # For the demo, use a placeholder email.
    user_email = "user@example.com"  # Replace with actual logic later
    subject = f"Task Created: {new_task.title}"
    body = f"Your task '{new_task.title}' (ID: {new_task.id}) was created successfully."
    send_confirmation_email.delay(user_email, subject, body)
    logger.info(f"Queued confirmation email for Task ID: {new_task.id}")
except Exception as mail_err:
    # Log the error, but don't fail the main request if queuing the email fails
    logger.error(f"Failed to queue confirmation email for Task {new_task.id}: {mail_err}")

result = task_schema.dump(new_task)
return make_success_response(result, 201)
- Run:
  - Start Redis Server: Make sure it's running.
  - Start Celery Worker: Open a new terminal, activate the venv, and run: `celery -A celery_worker.celery worker --loglevel=info`
  - Start Flask App: Open another terminal, activate the venv, and run: `python run.py`
- Test:
  - Use `curl` to `POST` a new task to `/api/tasks/`.
  - The API response should return quickly (201 Created).
  - Observe the Celery worker terminal: you should see log messages indicating the `send_confirmation_email` task was received and executed (including the `time.sleep`).
  - (Optional) If you implemented the status check endpoint and didn't ignore results, try querying `/api/tasks/process/status/<celery_task_id>` to see the task's outcome.
Outcome: You have successfully integrated Celery to run a simulated email task in the background when a new API task is created. The API response remains fast, while the potentially time-consuming operation happens asynchronously via the Celery worker.
14. API Documentation (Swagger/OpenAPI)
Good documentation is crucial for any API. It tells consumers (frontend developers, mobile app developers, other backend services, third parties) how to interact with your API: what endpoints are available, what HTTP methods they support, what parameters they expect (path, query, body), what the request/response formats are, what status codes to expect, and how authentication works. Clear, accurate, and up-to-date documentation significantly speeds up integration and reduces errors.
Manually writing and maintaining API documentation (e.g., in Markdown files or wikis) can be tedious and prone to becoming outdated as the API evolves. A more efficient approach is to generate documentation automatically from your code or from a specification.
OpenAPI Specification (Swagger)
The OpenAPI Specification (formerly known as Swagger Specification) is the industry standard for describing RESTful APIs. It provides a language-agnostic format (usually JSON or YAML) to define your API's:
- Endpoints (paths) and supported HTTP operations (GET, POST, PUT, DELETE, etc.).
- Input parameters (path, query, header, cookie, request body).
- Request and response body schemas (data models).
- Authentication methods.
- Metadata like contact information, license, terms of service.
Having an OpenAPI document for your API enables several powerful possibilities:
- Interactive Documentation: Tools like Swagger UI and ReDoc can render the OpenAPI document as a user-friendly, interactive web page where users can explore endpoints and even try making requests directly from the browser.
- Code Generation: Generate client SDKs (in various programming languages) and server stubs automatically from the specification.
- Automated Testing: Use the specification to validate API requests and responses during testing.
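To make the format concrete, here is a hedged sketch of a minimal OpenAPI 3.0 document for a single task-list endpoint. The paths and schema below are illustrative, not our API's final specification; in practice the document usually lives in a YAML or JSON file:

```python
import json

# A minimal OpenAPI 3.0 document, expressed as a Python dict for illustration.
openapi_doc = {
    "openapi": "3.0.3",
    "info": {"title": "Task API", "version": "1.0.0"},
    "paths": {
        "/api/tasks/": {
            "get": {
                "summary": "List all tasks",
                "responses": {
                    "200": {
                        "description": "A JSON array of tasks",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "type": "array",
                                    "items": {"$ref": "#/components/schemas/Task"},
                                }
                            }
                        },
                    }
                },
            }
        }
    },
    "components": {
        "schemas": {
            "Task": {
                "type": "object",
                "properties": {
                    "id": {"type": "integer", "readOnly": True},
                    "title": {"type": "string"},
                },
                "required": ["title"],
            }
        }
    },
}

print(openapi_doc["info"]["title"])  # → Task API
print(json.dumps(openapi_doc["paths"]["/api/tasks/"]["get"]["summary"]))
```

The tools discussed next generate exactly this kind of document from your code, so you rarely write it by hand, but reading one helps when debugging what a generator produced.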
Generating OpenAPI Docs from Flask Code
Several Flask extensions help generate OpenAPI specifications and interactive documentation directly from your Flask code, reducing the need for manual spec writing. We'll look at two popular options: Flask-RESTx and APIFlask.
Option 1: Flask-RESTx (Integrated, Opinionated)
Flask-RESTx is a Flask extension specifically designed for building REST APIs. It provides high-level abstractions for routing, request parsing, response formatting, and excellent automatic Swagger documentation generation.
- Key Concepts:
  - `Api`: The main entry point, attached to your Flask app or a Blueprint.
  - `Namespace`: Groups related resources under a common path prefix and structures the documentation.
  - `Resource`: Class-based views whose methods correspond to HTTP verbs (e.g., `get()`, `post()`, `put()`).
  - `@api.expect` / `@ns.expect`: Decorator to specify the expected input data model or parser.
  - `@api.marshal_with` / `@ns.marshal_with`: Decorator to define and format the output data structure using Flask-RESTx models (`api.model`).
  - `reqparse`: Built-in request argument parser (though Marshmallow can also be integrated).
  - `api.model`: Defines data schemas used for input validation and output marshalling, which are translated directly into the OpenAPI spec.
- Pros:
- Highly integrated: Provides routing, marshalling, parsing, and docs in one package.
- Excellent, automatic Swagger UI generation with minimal extra code.
- Can enforce consistency and reduce boilerplate for standard REST patterns.
- Cons:
  - Opinionated: Requires structuring your API with its `Resource` and `Namespace` classes, which can feel different from standard Flask views/Blueprints.
  - Learning curve associated with its specific components (`reqparse`, `api.model`).
  - Refactoring an existing Flask app (like ours) to use Flask-RESTx can be significant.
- Example Snippet:
# Conceptual example - requires installing Flask-RESTx
from flask import Flask, Blueprint
from flask_restx import Api, Resource, fields, Namespace

app = Flask(__name__)
blueprint = Blueprint('api', __name__, url_prefix='/api')
api = Api(blueprint, version='1.0', title='Sample API', description='A sample API with Flask-RESTx')

ns = Namespace('tasks', description='Task operations')
api.add_namespace(ns)

# Define a data model for Swagger docs and marshalling
task_model = api.model('Task', {
    'id': fields.Integer(readonly=True, description='The task unique identifier'),
    'title': fields.String(required=True, description='The task title')
})

# In-memory list for the demo
TASKS = []

@ns.route('/')  # Corresponds to /api/tasks/
class TaskList(Resource):
    @ns.doc('list_tasks')
    @ns.marshal_list_with(task_model)  # Use the model for the output list
    def get(self):
        """List all tasks"""
        return TASKS

    @ns.doc('create_task')
    @ns.expect(task_model, validate=True)   # Expect input matching the model
    @ns.marshal_with(task_model, code=201)  # Use the model for output; set the status code
    def post(self):
        """Create a new task"""
        new_task = api.payload  # Access the validated input payload
        new_task['id'] = len(TASKS) + 1
        TASKS.append(new_task)
        return new_task, 201

app.register_blueprint(blueprint)
Running this app and navigating to `/api/` (the blueprint prefix) automatically displays the Swagger UI documentation.
Option 2: APIFlask (Modern, Marshmallow-centric)
APIFlask is a newer framework built on top of Flask and heavily inspired by FastAPI (for Python async frameworks). It focuses on providing easy OpenAPI specification generation using standard Flask decorators and Marshmallow schemas.
- Key Concepts:
  - Uses standard Flask `Blueprint`s and view functions.
  - Decorators (`@app.input()`, `@app.output()`, `@app.auth_required()`) handle request validation, response formatting, and auth information using Marshmallow schemas directly.
  - Generates the OpenAPI spec and integrates Swagger UI and ReDoc automatically.
- Pros:
- Feels very familiar to Flask developers (uses standard decorators and Blueprints).
- Excellent integration with Marshmallow for validation and serialization (which we are already using).
- Less opinionated about application structure compared to Flask-RESTx.
- Provides both Swagger UI and ReDoc documentation views out-of-the-box.
- Cons:
- Newer than Flask-RESTx, potentially less battle-tested in extremely large-scale applications (though actively developed and popular).
- Requires understanding Marshmallow schemas well (which is often a good thing anyway).
- Example Snippet (integrating with our existing app):
# Example modifications for APIFlask (pip install APIFlask)

# 1. Change app creation in app/__init__.py
from apiflask import APIFlask  # instead of: from flask import Flask

def create_app(config_name='development'):
    # Use APIFlask instead of Flask
    app = APIFlask(__name__,
                   title="Advanced Task API",  # Docs title
                   version="1.0.0",            # Docs version
                   instance_relative_config=True)
    # ... rest of factory ...
    # APIFlask handles OpenAPI spec generation based on the decorators below

# 2. Modify routes in app/tasks/routes.py
from apiflask import APIBlueprint  # Use APIBlueprint when separating routes
from werkzeug.exceptions import NotFound  # HTTP exceptions
from flask_jwt_extended import jwt_required, get_jwt_identity  # JWT decorators remain
from ..schemas import TaskSchema, TaskPatchSchema  # Keep schema imports
# ... other imports ...

tasks_bp = APIBlueprint('tasks', __name__)

task_schema = TaskSchema()
tasks_schema = TaskSchema(many=True)
task_patch_schema = TaskPatchSchema()

@tasks_bp.route('/', methods=['GET'])
@tasks_bp.output(tasks_schema)  # Define the output schema via decorator
@jwt_required()
def get_tasks():
    # ... fetch tasks query ...
    tasks = query.order_by(db.desc(Task.created_at)).all()
    # No need to dump manually; the @output decorator handles it
    return tasks  # Return the model objects directly

@tasks_bp.route('/<int:task_id>', methods=['GET'])
@tasks_bp.output(task_schema)  # Define the output schema
@jwt_required()
def get_task(task_id):
    task = Task.query.get(task_id)
    if task is None:
        raise NotFound(f"Task with ID {task_id} not found")  # Use Werkzeug exceptions
    return task  # Return the model object

@tasks_bp.route('/', methods=['POST'])
@tasks_bp.input(task_schema)  # Define the INPUT schema for validation & deserialization
@tasks_bp.output(task_schema, status_code=201)  # Define the output schema
@jwt_required()
def create_task(data):  # Validated data is injected by the @input decorator
    # 'data' is a dictionary validated against TaskSchema
    new_task = Task(**data)  # Create a model instance from the validated data
    # ... db.session.add, db.session.commit ...
    return new_task  # Return the model object

# Refactor PUT/PATCH similarly with @input and @output.
# For PATCH, consider @input(TaskPatchSchema(partial=True)).
# Error handling may need adjustment; APIFlask has its own conventions.
With APIFlask, you replace `Flask` with `APIFlask` in your factory, then use decorators like `@app.input(Schema)` and `@app.output(Schema)` on your view functions. APIFlask uses these decorators to validate requests, serialize responses, and generate the OpenAPI spec. Navigating to `/docs` (Swagger UI) or `/redoc` shows the generated documentation.
Option 3: Other Tools (e.g., Flask-Swagger-UI, Connexion)
- Flask-Swagger-UI: A simpler extension that just serves the Swagger UI static files. You still need to provide the OpenAPI specification file (JSON/YAML) yourself, either by writing it manually or generating it with another tool (such as the Marshmallow-based generator `apispec`). Less automation.
- Connexion: A framework built on top of Flask that takes an "OpenAPI-first" approach. You write the OpenAPI specification (YAML/JSON) first, and Connexion maps endpoints defined in the spec to your Python view functions, handling routing and request/response validation based on the spec.
Choosing an Approach
- Flask-RESTx: Good choice if you like its opinionated, resource-based structure and want highly integrated, automatic documentation with minimal fuss, especially for new projects. Be prepared for its specific way of doing things.
- APIFlask: Excellent choice if you prefer standard Flask routing and view functions, are already using Marshmallow (or want to), and desire automatic OpenAPI generation with less structural enforcement. Often easier to integrate into existing Flask apps using Marshmallow.
- Connexion: Ideal if you prefer a "design-first" approach, writing the OpenAPI specification manually before implementing the code.
- Flask-Swagger-UI + Manual/Other Spec Generation: Suitable if you need maximum control over the spec generation process or are using tools other than Marshmallow/Flask-RESTx models. Requires more manual effort.
Given that our project already uses Marshmallow extensively, APIFlask would likely be the smoothest integration path for adding automated documentation.
Workshop: Adding Documentation with APIFlask
Goal: Integrate APIFlask into the `advanced_api` project to automatically generate OpenAPI documentation (Swagger UI) for the Task and Auth endpoints.
Steps:
- Install APIFlask (inside your project's virtual environment): `pip install apiflask`.
- Update the Application Factory (`app/__init__.py`):
  - Change the import: `from flask import Flask` to `from apiflask import APIFlask`.
  - Change the app instantiation: `app = Flask(...)` to `app = APIFlask(__name__, title="Advanced Task API", version="1.0", instance_relative_config=True)`.
  - (Important) APIFlask's built-in error handling often takes precedence. Review or remove some of the custom `@app.errorhandler` definitions (especially for 400, 404, 422, 500), as APIFlask might provide suitable defaults or require different customization (`@app.error_processor`). For this workshop, let's comment out our custom `handle_marshmallow_validation`, `handle_not_found`, `handle_bad_request`, and `handle_unsupported_media_type` handlers in `app/__init__.py` to let APIFlask handle them based on decorators and raised exceptions. Keep `handle_internal_error` for general 500s, perhaps.
- Update the Blueprint Definitions (Optional but Recommended):
  - Change `from flask import Blueprint` to `from apiflask import APIBlueprint` in `app/tasks/__init__.py` and `app/auth/__init__.py`.
  - Update the instantiation: `tasks_bp = Blueprint(...)` to `tasks_bp = APIBlueprint(...)` (similarly for `auth_bp`).
- Refactor the Task Routes (`app/tasks/routes.py`):
  - Import `HTTPError` or specific exceptions (e.g., `NotFound`, `UnprocessableEntity`) from `werkzeug.exceptions` or `apiflask.exceptions`.
  - Remove the manual `make_success_response` and `make_error_response` helpers (APIFlask handles the response structure via `@output`).
  - Replace `jsonify` calls with returning Python objects/dictionaries directly.
  - Replace `abort(code, description=...)` calls with raising appropriate Werkzeug exceptions (e.g., `raise NotFound("Task not found")`, `raise UnprocessableEntity("Validation failed")`).
  - `get_tasks`:
    - Add the `@tasks_bp.output(tasks_schema)` decorator.
    - Return the list of `Task` objects directly: `return tasks`.
  - `get_task`:
    - Add the `@tasks_bp.output(task_schema)` decorator.
    - Replace `Task.query.get_or_404(...)` with `task = Task.query.get(task_id)` and add `if task is None: raise NotFound("Task not found")`.
    - Return the `task` object directly.
  - `create_task`:
    - Remove the `request.get_json()` call and the manual validation checks.
    - Add the `@tasks_bp.input(task_schema)` decorator. The function argument `data` will receive the validated dict.
    - Add the `@tasks_bp.output(task_schema, status_code=201)` decorator.
    - Change the function signature to `def create_task(data):`.
    - Create the model: `new_task = Task(**data)`.
    - Commit to the DB.
    - Return the `new_task` object.
    - Remove the `try...except ValidationError` block (APIFlask handles this via `@input`). Keep the DB error handling.
  - `update_task` (PUT):
    - Remove `request.get_json()` and the manual validation.
    - Add `@tasks_bp.input(task_schema)`. Change the signature to `def update_task(task_id, data):`.
    - Add `@tasks_bp.output(task_schema)`.
    - Fetch the existing task; raise `NotFound` if not found.
    - Update the task attributes: `for key, value in data.items(): setattr(task, key, value)`.
    - Commit to the DB.
    - Return the updated `task` object.
    - Remove the `ValidationError` handler.
  - `patch_task` (PATCH):
    - Use `@tasks_bp.input(TaskPatchSchema)` (or `TaskSchema(partial=True)` if you didn't create `TaskPatchSchema`). Change the signature to `def patch_task(task_id, data):`.
    - Add `@tasks_bp.output(task_schema)`.
    - Fetch the existing task; raise `NotFound` if missing.
    - Update the attributes: `for key, value in data.items(): setattr(task, key, value)`.
    - Commit to the DB.
    - Return the updated `task` object.
    - Remove the `ValidationError` handler.
  - `delete_task`:
    - No input/output schemas are needed, but add `@tasks_bp.doc(responses={204: 'Task deleted successfully'})` for better docs.
    - Fetch the task; raise `NotFound` if missing.
    - Perform the authorization check (raise `Forbidden("Admin role required")` on failure).
    - Delete the task and commit to the DB.
    - Return an empty string and status code: `return '', 204`.
- Refactor the Auth Routes (`app/auth/routes.py`, optional but good for consistency):
  - You could also refactor login/register using `@input` and `@output` with dedicated Marshmallow schemas for login credentials and user registration data. This would provide automatic validation and documentation for these endpoints as well. For simplicity in this workshop, we can leave them as is, but they won't appear as nicely in the generated docs without schemas.
  - Add basic documentation decorators if not fully refactoring:

    ```python
    @auth_bp.route('/login', methods=['POST'])
    @auth_bp.doc(description='Log in to obtain JWT tokens.',
                 responses={401: 'Invalid credentials', 400: 'Missing data'})
    def login():
        ...  # existing logic unchanged

    @auth_bp.route('/register', methods=['POST'])
    @auth_bp.doc(description='Register a new user.',
                 responses={409: 'User already exists', 400: 'Missing data'})
    def register():
        ...  # existing logic unchanged
    ```
- Run and Check Docs:
  - Run the application: `python run.py`.
  - Open your browser and navigate to:
    - `http://127.0.0.1:5000/docs` (Swagger UI)
    - `http://127.0.0.1:5000/redoc` (ReDoc UI)
  - Explore the documentation. You should see the 'tasks' endpoints listed, with details about the expected input (from `@input`/schemas) and output formats (from `@output`/schemas). Authentication requirements (JWT) should also be indicated. You can even try executing requests directly from the Swagger UI after authorizing using the login endpoint.
Outcome: You have integrated APIFlask into your project, leveraging its ability to generate interactive API documentation from your existing Marshmallow schemas and view functions using decorators. This significantly improves the usability and discoverability of your API for consumers. You've seen how `@input` and `@output` streamline validation and serialization while simultaneously powering the documentation.
15. Security Best Practices
Beyond authentication and authorization, several other security considerations are vital when building and deploying web APIs. Overlooking these can expose your application and users to significant risks.
- HTTPS Everywhere:
  - Why: Encrypts data in transit between the client and the server, preventing eavesdropping and man-in-the-middle attacks. This is non-negotiable for any API handling sensitive data, including login credentials and JWT tokens.
  - How: Obtain SSL/TLS certificates (Let's Encrypt provides free ones via tools like Certbot) and configure your reverse proxy (Nginx) to use them for port 443. Implement HTTP Strict Transport Security (HSTS) headers to instruct browsers to always connect via HTTPS. Redirect all HTTP traffic to HTTPS.
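As a concrete illustration, an Nginx server block implementing these points might look roughly like the sketch below. The domain, certificate paths, and upstream address are placeholders (Certbot typically writes a similar block for you, so treat this as orientation rather than a drop-in config):

```nginx
# Redirect all plain-HTTP traffic to HTTPS
server {
    listen 80;
    server_name api.example.com;              # placeholder domain
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    server_name api.example.com;

    # Certificate paths as typically created by Certbot (placeholders)
    ssl_certificate     /etc/letsencrypt/live/api.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/api.example.com/privkey.pem;

    # HSTS: tell browsers to always use HTTPS for this host
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

    location / {
        proxy_pass http://127.0.0.1:8000;     # Gunicorn upstream (placeholder)
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```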
- Input Validation:
  - Why: Never trust client input. Malicious users can send malformed, unexpected, or oversized data to exploit vulnerabilities (e.g., SQL injection, Cross-Site Scripting (XSS) if data is rendered elsewhere, Denial of Service).
  - How:
    - Use robust validation libraries (like Marshmallow, as we did) to define expected data types, formats, lengths, and allowed values.
    - Validate all input sources: request bodies, query parameters, path parameters, headers.
    - Reject invalid requests early (e.g., with 400 or 422 responses).
    - Be specific about validation errors returned to the client.
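To make the "reject early, be specific" idea concrete without tying it to any particular library, here is a small hand-rolled validator in plain Python. In the real project Marshmallow does this work; the field rules here (title length, allowed statuses) are purely illustrative:

```python
# Validate-early illustration: collect all field errors, then reject the
# whole payload with a specific, per-field error report.

def validate_task_payload(payload):
    """Return a dict of field -> error message; an empty dict means valid."""
    errors = {}

    title = payload.get("title")
    if not isinstance(title, str) or not title.strip():
        errors["title"] = "title is required and must be a non-empty string"
    elif len(title) > 100:
        errors["title"] = "title must be at most 100 characters"

    status = payload.get("status", "pending")
    if status not in {"pending", "in_progress", "done"}:
        errors["status"] = "status must be one of: pending, in_progress, done"

    return errors

# A view would turn a non-empty error dict into a 400/422 response body.
print(validate_task_payload({"title": ""}))
print(validate_task_payload({"title": "Write docs", "status": "done"}))
```

The point of returning all errors at once, rather than failing on the first, is that the client can fix its request in a single round trip.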
- Output Encoding & Content Types:
  - Why: Primarily relevant if API data is consumed by web browsers, but good practice regardless. Ensures data is interpreted correctly and prevents XSS if responses are ever rendered as HTML.
  - How:
    - Always set the correct `Content-Type` header for your responses (e.g., `application/json`). Flask's `jsonify` and tools like APIFlask/Flask-RESTx handle this well for JSON.
    - If returning user-generated content that might be displayed in HTML, ensure it's properly encoded/sanitized to prevent XSS.
- Rate Limiting:
  - Why: Protects against DoS attacks, brute-forcing, and resource exhaustion.
  - How: Implement rate limiting (e.g., using Flask-Limiter) based on IP address, user ID, API key, or a combination. Apply stricter limits to sensitive or expensive endpoints (like login).
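Under the hood, the kind of per-key limit a tool like Flask-Limiter enforces can be approximated in a few lines of plain Python. This is an illustrative sliding-window sketch (the limit and window values are arbitrary, and a production limiter would use shared storage such as Redis rather than in-process memory):

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most `limit` hits per `window` seconds for each key (e.g. an IP)."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # key -> timestamps of recent hits

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[key]
        # Drop timestamps that have fallen outside the window
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) < self.limit:
            q.append(now)
            return True
        return False  # the caller should respond with HTTP 429 Too Many Requests

limiter = SlidingWindowLimiter(limit=3, window=60)
results = [limiter.allow("10.0.0.1", now=t) for t in (0, 1, 2, 3)]
print(results)  # first three requests allowed, fourth rejected
```

Because the state lives in process memory, this sketch would not work correctly across multiple Gunicorn workers; that is exactly why the production advice above is to back the limiter with Redis or Memcached.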
- Authentication & Authorization:
  - Why: Controls who can access the API and what they can do.
  - How:
    - Use strong authentication mechanisms (like JWT with secure secrets and HTTPS).
    - Implement proper authorization checks (e.g., role-based access control) within endpoints to ensure users only access resources they are permitted to.
    - Don't expose internal IDs or sensitive data unnecessarily in tokens or responses.
- Secrets Management:
  - Why: Hardcoding sensitive information (API keys, database passwords, JWT secrets) directly into source code is extremely risky.
  - How:
    - Use environment variables (`os.getenv()`).
    - Use configuration files placed outside the version-controlled codebase (e.g., in the `instance` folder, or `.env` files loaded via `python-dotenv`).
    - Utilize dedicated secrets management systems (like HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager) for production environments.
    - Ensure secret files are included in your `.gitignore`.
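A common fail-fast pattern for the environment-variable approach is to refuse to start when a secret is missing, rather than silently falling back to an insecure default. A minimal sketch (the helper name and the exact config keys it would guard are illustrative):

```python
import os

def require_env(name):
    """Read a secret from the environment, failing fast if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Required environment variable {name} is not set; "
            "refusing to start with a missing or default secret."
        )
    return value

# In a production config class this might be used as, e.g.:
#   JWT_SECRET_KEY = require_env("JWT_SECRET_KEY")
os.environ["JWT_SECRET_KEY"] = "example-secret"  # demo value for this sketch only
print(require_env("JWT_SECRET_KEY"))
```

Crashing at startup with a clear message is far safer than running in production with a default secret baked into the repository.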
- Dependency Management & Security Audits:
  - Why: Vulnerabilities are often found in third-party libraries your application depends on.
  - How:
    - Keep your dependencies (Flask, extensions, other libraries listed in `requirements.txt`) up-to-date.
    - Use tools like `pip freeze > requirements.txt` to pin specific working versions.
    - Regularly audit dependencies for known vulnerabilities using tools like `pip-audit`, `safety`, or GitHub's Dependabot.
- Proper Error Handling:
  - Why: Detailed error messages and stack traces leaked to the client can reveal internal application structure or sensitive information useful to attackers.
  - How:
    - Catch exceptions gracefully.
    - In production (`DEBUG=False`), return generic, user-friendly error messages to the client (e.g., "An internal server error occurred").
    - Log detailed error information (including stack traces) server-side for debugging. Configure proper logging (e.g., to files or a centralized logging service).
- Security Headers:
  - Why: Instruct browsers on how to behave when handling your site's content, mitigating certain attacks like clickjacking and XSS. While less critical for pure APIs not directly serving HTML to browsers, they are important if your Flask app ever serves web pages or is accessed via browsers.
  - How: Configure your reverse proxy (Nginx) or use Flask extensions (like Flask-Talisman) to add headers like:
    - `Strict-Transport-Security` (HSTS)
    - `Content-Security-Policy` (CSP)
    - `X-Content-Type-Options: nosniff`
    - `X-Frame-Options: DENY` or `SAMEORIGIN`
    - `Referrer-Policy`
- Regular Security Testing:
  - Why: Proactively find vulnerabilities before attackers do.
  - How:
    - Perform code reviews focusing on security aspects.
    - Use static analysis security testing (SAST) tools.
    - Conduct dynamic analysis security testing (DAST) using scanners or manual penetration testing against your deployed API.
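The "generic message to the client, full details to the server log" rule from Proper Error Handling can be captured in a small helper. This is a plain-Python sketch (in the Flask app the equivalent logic would live in an error handler or an APIFlask error processor; the function name is illustrative):

```python
import logging
import traceback

logger = logging.getLogger("api.errors")

def error_response(exc, debug=False):
    """Build a JSON-ready error body: details go to the server log,
    and the client only sees a generic message unless debug mode is on."""
    # Log the exception (and the active traceback, when called from an except block)
    logger.error("Unhandled exception: %s\n%s", exc, traceback.format_exc())
    if debug:
        # Development: expose the real message to speed up debugging
        return {"message": str(exc)}, 500
    # Production: never leak internals to the client
    return {"message": "An internal server error occurred"}, 500

body, status = error_response(ValueError("db password invalid"), debug=False)
print(body, status)
```

Note how the sensitive detail ("db password invalid") reaches the log but never the response body when `debug` is off.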
Workshop: Reviewing Security Posture
Goal: Review the current state of the `advanced_api` project against the security best practices discussed and identify areas for improvement (no coding required, just analysis).
Steps:
- HTTPS:
  - Check: Is HTTPS currently enforced? (No, our Nginx template mentioned it but didn't fully configure it.)
  - Improvement: Need to obtain SSL certs and configure Nginx for HTTPS and redirection.
- Input Validation:
  - Check: Are we validating input? (Yes, largely handled by Marshmallow schemas via APIFlask's `@input` decorator now.) Path parameters (`<int:task_id>`) are type-checked by Flask routing. Query parameters (the `status` filter) have basic validation.
  - Improvement: Could add more specific validation rules to schemas if needed (e.g., regex for certain string formats). Ensure all external input is covered.
- Output Encoding/Content Type:
  - Check: Is the `Content-Type` correct? (Yes, APIFlask's `@output` decorator and direct `jsonify` ensure `application/json`.)
  - Improvement: Generally good for a JSON API.
- Rate Limiting:
  - Check: Is it implemented? (Yes, using Flask-Limiter with global defaults and a specific limit on login.)
  - Improvement: Ensure production uses a scalable backend (Redis/Memcached). Review whether the limits are appropriate for the expected load. Consider user-specific limits.
- Authentication & Authorization:
  - Check: Implemented? (Yes, JWT for AuthN; a basic role check on DELETE for AuthZ.)
  - Improvement: Authorization is minimal. Need checks to ensure users can only modify/view their own tasks (unless admin). Add a `user_id` foreign key to the `Task` model, associate tasks with users on creation, and add ownership checks in the PUT/PATCH/DELETE/GET (single task) routes. Role management could be more sophisticated.
- Secrets Management:
  - Check: How are secrets handled? (`JWT_SECRET_KEY` and `SQLALCHEMY_DATABASE_URI` are loaded from config, which can load from environment variables or instance config, but the defaults might be insecure.)
  - Improvement: Strictly enforce loading all secrets (DB URI with password, JWT secret) from environment variables or a secrets file (`.env`, `instance/config.py`) in production. Ensure these are not in Git. Use stronger default secrets or remove defaults entirely for the production config.
- Dependency Management:
  - Check: Are dependencies pinned? (`requirements.txt` exists; `pip freeze` was used.)
  - Improvement: Set up regular dependency scanning (e.g., GitHub Dependabot, `pip-audit`). Keep dependencies updated.
- Error Handling:
  - Check: How are errors handled? (APIFlask provides default handlers. We have a generic 500 handler that logs internally.)
  - Improvement: Ensure no sensitive details leak in production error responses. Enhance server-side logging (configure Flask and Gunicorn logging properly, maybe centralize logs).
- Security Headers:
  - Check: Are they set? (No, not explicitly added.)
  - Improvement: Configure Nginx to add relevant headers (HSTS, X-Content-Type-Options, etc.).
- Regular Testing:
  - Check: Do we have tests? (Yes, using pytest for tasks and auth.)
  - Improvement: Expand test coverage, especially for authorization logic and edge cases. Consider integrating SAST tools.
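The ownership rule proposed under Authentication & Authorization can be isolated in a small pure function, which keeps it easy to unit-test independently of Flask. A sketch, assuming the `user_id` foreign key and role field suggested above (the helper name is hypothetical):

```python
def can_modify_task(task_owner_id, requester_id, requester_role):
    """Owners may modify their own tasks; admins may modify any task."""
    if requester_role == "admin":
        return True
    return task_owner_id == requester_id

# In a route, a False result would translate into raising Forbidden (HTTP 403).
print(can_modify_task(task_owner_id=7, requester_id=7, requester_role="user"))   # owner
print(can_modify_task(task_owner_id=7, requester_id=9, requester_role="user"))   # stranger
print(can_modify_task(task_owner_id=7, requester_id=9, requester_role="admin"))  # admin
```

Factoring the rule out like this also makes it trivial to cover the authorization edge cases called for under Regular Testing.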
Outcome: This review highlights that while core security features like AuthN/AuthZ, validation, and rate limiting are implemented, crucial areas like HTTPS enforcement, robust authorization (ownership), secure secrets management in production, and dependency scanning need further attention for a production-ready API.
Conclusion
Congratulations! You have journeyed through the fundamentals and advanced concepts of building RESTful APIs using Flask on Linux. Starting from a simple "Hello, World!", you progressed through:
- Core Flask Concepts: Routing, request handling, response generation (`jsonify`).
- HTTP Methods & Status Codes: Understanding and implementing REST principles.
- Persistent Storage: Integrating SQLAlchemy and Flask-SQLAlchemy for database interaction.
- Application Structure: Organizing code with Blueprints and the Application Factory pattern (`create_app`).
- Data Validation & Serialization: Using Marshmallow and Flask-Marshmallow (or APIFlask) for robust data handling and replacing manual `to_dict` methods.
- Authentication & Authorization: Securing endpoints with JWT via Flask-JWT-Extended and implementing basic role checks.
- Database Migrations: Managing schema changes reliably with Alembic and Flask-Migrate.
- Testing: Writing unit and integration tests using pytest and Flask's test client.
- Deployment: Understanding production deployment stacks involving WSGI servers (Gunicorn), reverse proxies (Nginx), and process managers (Systemd).
- Rate Limiting: Protecting your API from abuse using Flask-Limiter.
- Caching: Improving performance with Flask-Caching.
- Background Tasks: Offloading long-running operations using Celery.
- API Documentation: Generating interactive documentation with APIFlask (or alternatives like Flask-RESTx).
- Security Best Practices: Reviewing essential security considerations beyond basic authentication.
Flask's microframework nature provides flexibility, while its rich ecosystem of extensions allows you to build powerful, complex, and secure APIs. Remember that building great APIs is an ongoing process involving continuous learning, testing, and refinement.
Further Exploration:
- Advanced Authorization: Explore attribute-based access control (ABAC), libraries like Flask-Principal, or integrate with OAuth2 providers.
- GraphQL: Investigate building GraphQL APIs with Flask using libraries like Graphene-Python.
- WebSockets: For real-time communication, explore Flask-SocketIO.
- Advanced Testing: Mocking complex dependencies, property-based testing, load testing.
- CI/CD: Automate testing and deployment using Continuous Integration/Continuous Deployment pipelines (e.g., GitHub Actions, GitLab CI, Jenkins).
- Monitoring & Logging: Integrate tools like Prometheus, Grafana, Sentry, ELK stack for in-depth monitoring and centralized logging.
- Microservices: Apply Flask to build individual microservices within a larger distributed system.
This guide has equipped you with a strong foundation. Keep building, experimenting, and consulting the excellent documentation for Flask and its extensions. Happy coding!