Skip to main content

DATABASE DESIGN & NORMALIZATION

 

Database Design and Normalization

Introduction

  • Good database design is crucial for efficient data management and retrieval
  • Normalization is a technique used to design good table structures
  • Goals of normalization:
    1. Minimize data redundancy
    2. Ensure data consistency
    3. Simplify data maintenance
    4. Avoid anomalies in data operations (insertion, deletion, update)

Problems with Unnormalized Tables

1. Repetition Anomaly

  • Definition: Unnecessary repetition of data across multiple rows
  • Example: Repeating project details (Project_id, Project_name) for each employee working on the project
  • Consequences:
    • Wastes storage space
    • Slows down data retrieval
    • Increases risk of data inconsistencies

2. Insertion Anomaly

  • Definition: Difficulty in inserting certain data without including other, potentially unavailable data
  • Example: Unable to add a new employee without assigning them to a project
  • Consequences:
    • Limits flexibility in data entry
    • May lead to incomplete or inaccurate data

3. Deletion Anomaly

  • Definition: Unintended loss of data when deleting a record
  • Example: Deleting an employee's project assignment also removes their personal details
  • Consequences:
    • Risk of losing important, unrelated data
    • Complicates data management and historical record-keeping

4. Updating Anomaly

  • Definition: Inconsistency when updating repeated data
  • Example: Updating an employee's salary requires changes in multiple rows
  • Consequences:
    • Increases risk of data inconsistencies
    • Makes data maintenance more time-consuming and error-prone

Normalization Process

Normalization involves organizing data into progressively more structured forms:

First Normal Form (1NF)

  • Objectives:
    1. Eliminate repeating groups
    2. Create separate tables for each set of related data
    3. Identify each record with a unique field (primary key)
  • Rules:
    1. Each column should contain atomic (indivisible) values
    2. Each column should have a unique name
    3. The order of data stored doesn't matter
    4. Each record should be unique
  • Example:
    • Before 1NF:
      | Emp_no | Emp_name | Project_ids | Project_names | |--------|----------|------------------|-------------------------| | E1 | John | P1, P2 | MERCURY, VENUS |
    • After 1NF:
      | Emp_no | Emp_name | Project_id | Project_name | |--------|----------|------------|--------------| | E1 | John | P1 | MERCURY | | E1 | John | P2 | VENUS |

Second Normal Form (2NF)

  • Objectives:
    1. Meet all 1NF requirements
    2. Remove partial dependencies
  • Rules:
    1. Be in 1NF
    2. All non-key attributes must depend on the entire primary key
  • Example:
    • Before 2NF:
      | Emp_no | Project_id | Emp_name | Project_name | Grade | Salary | |--------|------------|----------|--------------|-------|--------| | E1 | P1 | John | MERCURY | G3 | 50000 | | E1 | P2 | John | VENUS | G3 | 50000 |
    • After 2NF:
      Employee Table: | Emp_no | Emp_name | Grade | Salary | |--------|----------|-------|--------| | E1 | John | G3 | 50000 | Project Table: | Project_id | Project_name | |------------|--------------| | P1 | MERCURY | | P2 | VENUS | Employee_Project Table: | Emp_no | Project_id | |--------|------------| | E1 | P1 | | E1 | P2 |

Third Normal Form (3NF)

  • Objectives:
    1. Meet all 2NF requirements
    2. Remove transitive dependencies
  • Rules:
    1. Be in 2NF
    2. No non-key attribute should depend on another non-key attribute
  • Example:
    • Before 3NF:
      | Emp_no | Emp_name | Grade | Salary | |--------|----------|-------|--------| | E1 | John | G3 | 50000 |
    • After 3NF:
      Employee Table: | Emp_no | Emp_name | Grade | |--------|----------|-------| | E1 | John | G3 | Grade_Salary Table: | Grade | Salary | |-------|--------| | G3 | 50000 |

Benefits of Normalization

  1. Reduced data redundancy
  2. Improved data consistency
  3. Greater flexibility in data queries
  4. Easier data maintenance and updates
  5. Better support for data integrity constraints
  6. More accurate representation of real-world entities and relationships

Key Concepts

  • Primary Key: A unique identifier for each record in a table
  • Foreign Key: A field that refers to the primary key in another table
  • Composite Key: A primary key that consists of two or more attributes
  • Atomic Values: Values that cannot be further divided
  • Partial Dependency: When a non-key attribute depends on only part of a composite key
  • Transitive Dependency: When a non-key attribute depends on another non-key attribute

Practical Considerations

  • While full normalization (up to 3NF or beyond) is theoretically ideal, practical implementations may sometimes denormalize for performance reasons
  • Always consider the specific requirements of your application and the nature of data access patterns
  • Normalization is an iterative process; start with 1NF and progressively refine the structure

Conclusion

  • Normalization is a powerful technique for creating efficient, consistent, and flexible database designs
  • Understanding and applying normalization principles is crucial for database designers and administrators
  • Always balance theoretical ideals with practical considerations in real-world applications

Comments

Popular posts from this blog

ORACLE Express Edition: Getting Started

1. Introduction to Oracle Database 21c Express Edition (XE) - Free, lightweight version of Oracle Database - Ideal for learning and small-scale applications - Limited to 12GB of user data and uses up to 2GB of RAM 2. Installation and Setup 2.1 Installing Oracle 21c XE 1. Download Oracle 21c XE from: https://www.oracle.com/database/technologies/xe-downloads.html 2. Run the installer:    - Windows: Double-click the .exe file    - Linux: Use `rpm` or `yum` command 3. Follow the installation wizard:  Accept the license agreement Choose an installation location (default is usually fine) Set a password for the SYS, SYSTEM, and PDBADMIN accounts (write this down!) Select the option to start the database service automatically (recommended)  4. Complete the installation: Wait for the installation process to finish Note down the database connection details provided at the end The default container database (CDB) name is XE The default pluggable database (PDB) nam...

Types of DBMS

 DBMS Types Hierarchical DBMS: Structure: Data organized in a tree-like structure with parent-child relationships. Example: IBM's Information Management System (IMS) Use case: Organizational charts, file systems Imagine a family tree: Grandparent Parent 1 Child 1 Child 2 Parent 2 Child 3 Network DBMS: Structure: Data organized in a graph-like structure allowing many-to-many relationships. Example: Integrated Data Store (IDS) Use case: Complex relationship modeling Think of a social network: Person A is friends with Person B and Person C Person B is friends with Person A and Person D Person C is friends with Person A and Person D Relational DBMS: Structure: Data organized in tables with rows and columns, using relationships between tables. Examples: MySQL, Oracle, PostgreSQL Use case: Most common type, used in various applications Imagine a library system: Books table: (ID, Title, Author, ISBN) Members table: (ID, Name, Address, Phone) Loans table...

Key Terminologies used in Relational Database

 1. Database Definition: A database is a structured collection of data stored and managed electronically. It allows for efficient data retrieval, manipulation, and management. Characteristics: - Structured : Data is organized in tables with rows and columns. - Persistent: Data remains available over time. - Accessible : Allows querying and manipulation through a Database Management System (DBMS). 2. Database Management System (DBMS) Definition: A DBMS is software that facilitates the creation, manipulation, and management of databases. It provides an interface for users and applications to interact with the data. Characteristics : - Data Definition: Allows creation and modification of database schemas. - Data Manipulation : Supports querying, updating, and deleting data. - Data Security: Manages user access and data integrity.  3. Table (Relation) Definition: A table, also known as a relation, is a collection of rows and columns used to store data in a relational database. C...