Schema Design & Normalization
Database schema design is the process of organizing and structuring how data is stored in a database so it can be efficiently stored, retrieved, and updated.
In other words, it is all about designing the tables in our database.
In this lesson, we'll practice designing a database and use dbdiagram.io to create an ERD.
Table of Contents:
Terms
Schema Design - the process of organizing and structuring how data is stored in a database so it can be efficiently stored, retrieved, and updated.
ERD (Entity Relationship Diagram) - a visual representation of the contents of each entity (a.k.a. "table") in your database as well as the relationships between entities.
DBML (Database Markup Language) - a language used to generate ERDs by tools like dbdiagram.io
Normal Forms - a series of guidelines to help achieve a well-structured and efficient database schema.
Normalization - the process of organizing data in a database to eliminate redundancy and inconsistent dependencies.
Why is schema design important?
A well-designed database schema ensures:
data consistency
minimized redundancy
improved performance
Database schema designs are most often communicated through an Entity Relationship Diagram
So, how do we get from an idea to an ERD like this?
Steps For Designing A Database
Let's say we're designing a database for a school that tracks:
students at the school
teachers at the school
the courses that are available
which students are taking which courses
which teachers teach which courses
When designing a database schema, you'll typically follow these steps:
Identify tables (a.k.a "entities")
Define columns (a.k.a "fields" or "properties")
Determine relationships (one-to-many or many-to-many)
Normalize (more on this soon!)
Q: With your group, take 10 minutes and brainstorm for the first 3 steps for your own project
Database Markup Language
Database Markup Language was created to standardize the way that database schemas are communicated. When paired with a tool like https://dbdiagram.io/
, we can quickly define our database schema and create an Entity Relationship Diagram.
Visit https://dbdiagram.io/ and clear out the diagram editor.
Step 1 - Identify Tables
Okay, let's recall that we're designing a database for a school that tracks:
students at the school
teachers at the school
the courses that are available
which students are taking which courses
which teachers teach which courses
Using DBML, we can define a table and generate an ERD for that table by simply writing:
This creates a table called students
with a single id
column.
Below, you can see that I've also created tables for teachers
, classes
and enrollments
Step 2 - Define Columns
When defining columns, we should always indicate:
the name of the column (typically written in
underscore_case
)the type of the data (
integer
,string
,boolean
,date
, etc...)
For example, using DBML, we might define our students
table like this:
Step 3 - Determine Relationships
Determining relationships involves defining primary keys, foreign keys, and drawing the connections between those primary and foreign keys.
Using DBML we can simply append PK
to the end of a column to make it a primary key:
To define relationships between tables, we can use the syntax
The "bigger" side of the
<
represents the "many" side of the relationship
Or, using the dbdiagram.io tool, we can click and drag to draw connections between the tables:
Notice the lines drawn between tables are also labeled:
The
1
indicates the "one" side of a "one-to-" relationshipThe
*
indicates the "many" side of a "many-to-" relationship
In the end, our DBML for this database schema looks like this:
Practice
In groups, use dbdiagram.io
to design a database schema for your application.
If you haven't yet decided on an application, here are a few ideas that you can start with:
An online store that sells laptops.
A photo-sharing application with commenting features (Instagram).
An online library catalog.
A restaurant reservation system.
An concert management system.
A hotel booking system.
Step 4 - Normalizing Tables
Database normalization aims to ensure that our database minimizes redundancy (no duplicate data) and maintains data integrity (the data is accurate and complete).
To do this step, we need to understand the normal forms: a series of guidelines to help achieve a well-structured and efficient database schema.
Normalization is the process of making sure our database adheres to these guidelines.
There are many normal forms but the first two can get us quite far.
First Normal Form (1NF)
A table complies with the first normal form if it satisfies the following conditions:
All columns contain atomic values (i.e., each value is indivisible).
Each column has a unique name.
The order in which data is stored does not matter.
1NF helps eliminate duplicate data by ensuring that each attribute has a single, indivisible value.
For example, this table does not comply with the 1NF because the products
column contains multiple values:
1
Avery
Laptop, Mouse, Keyboard
2
Blake
Laptop, Monitor
3
Charles
Monitor, Trackpad
This leads to several problems:
Data Redundancy: The same product is repeated multiple times across different rows, leading to unnecessary duplication of data.
Data Inconsistency: If a product name needs to be updated, it must be done in multiple places, which can lead to inconsistencies if not done correctly.
Querying and Updating Challenges: Performing queries or updates on this data structure can be difficult and inefficient. For example, it is cumbersome to find all customers who ordered a specific product or to add or remove a product for a customer.
To fix it, we can make
1
Avery
Laptop
1
Avery
Mouse
1
Avery
Keyboard
2
Blake
Laptop
2
Blake
Monitor
3
Charles
Monitor
3
Charles
Trackpad
Q: How do you know that the table below is NOT in the first normal form? How would you fix it?
1
Alice
Math, Science, History
2
Bob
Science, English, Mathematics
3
Carol
History, Math, English
Second Normal Form (2NF)
A table complies with 2NF if it meets the following criteria:
It is already in 1NF.
All non-key columns are fully dependent on the primary key (i.e., there are no partial dependencies).
2NF helps remove redundancy by ensuring that each non-key attribute is fully dependent on the primary key. This prevents attributes from being repeated across multiple rows.
Take a look at this table order_details
which shows the relationships between an order placed by a customer and the products in each order.
This is NOT compliant with the second normal form:
1
1
1
Laptop
1
Avery
2
1
2
Monitor
1
Avery
3
2
1
Laptop
2
Blake
4
3
3
Trackpad
3
Charles
In this table, the following partial dependencies exist:
product_name
depends onproduct_id
, not on the primaryid
keycustomer_name
depends oncustomer_id
, not on the primaryid
key
In other words, we don't need both the product_id
and the product_name
to identify a product. We should rely exclusively on the product_id
to uniquely identify a product.
To reach 2NF, we must eliminate partial dependencies by removing these partial dependencies:
The
products
table ties eachproduct.id
to uniqueproduct.name
idname1
Laptop
2
Monitor
3
Trackpad
The
customers
table ties eachcustomer.id
to a uniquecustomer.name
idname1
Avery
2
Blake
3
Charles
The
orders
table ties whichcustomer.id
placed whichorder.id
idcustomer_id1
1
2
2
3
3
The
order_items
"junction/association" table tracks which products are associated with each order:idorder_idproduct_id1
1
1
2
1
2
3
2
1
4
3
3
Q: How do you know that the table below is NOT in 2NF? How would you fix it?
1
1
Alice
Math
2
1
Alice
Science
3
1
Alice
History
4
2
Bob
Science
5
2
Bob
English
6
2
Bob
Mathematics
7
3
Carol
History
8
3
Carol
Math
9
3
Carol
English
Last updated