9.3 DATABASES (CIE)

TOPIC 9.3 | DATABASES

ON THIS PAGE
9.3 Understand the purpose of a primary key and identify a suitable primary key for a given database table.

ALSO IN THIS SECTION
9.1 DEFINING A DATABASE
9.2 DATA TYPES
9.3 PRIMARY KEYS
9.4 SQL (STRUCTURED QUERY LANGUAGE)
KEY TERMINOLOGY

WHAT IS A PRIMARY KEY

In database management, a primary key is a unique identifier for each record in a database table. It is a specific choice of a minimal set of columns (attributes) that uniquely define a row in a table. The primary key's main purposes are:

Uniqueness | Ensures that each record in the table is distinct from all others. No two rows can have the same primary key value, which helps in maintaining data integrity.
Indexing | Primary keys are often used to create indexes in a database, allowing for faster search and retrieval of data.
Relationships | They are crucial in establishing relationships between tables. In relational databases, primary keys from one table are often used as foreign keys in another, creating a link between the two tables.

PRIMARY KEY CHARACTERISTICS

Uniqueness | It must uniquely identify each record.
Non-Modifiable | Once created, the value of the primary key should not change. It should be a stable identifier for the record.
Not Null | A primary key cannot be null. Every record needs to have a value for the primary key.
Simplicity | Ideally, it should be compact and consist of as few columns as possible

IDENTIFYING A SUITABLE PRIMARY KEY

When determining a suitable primary key for a given database table, consider the following:

Natural vs. Surrogate Keys | A natural key is derived from existing data that naturally identifies a record (like an email address in a user table), while a surrogate key is a unique identifier created by the database (like an auto-incremented number).
Data Type and Length | Choose a data type that is appropriate for the key (e.g., integer, GUID) and ensure it is of optimal length for performance.
Consistency | Ensure the key is consistently available and does not change over time.

Example
Consider a database table for storing customer information. The table includes fields like CustomerID, Name, Email, Phone, and Address. In this scenario, CustomerID is a suitable primary key. It is unique for each customer, non-modifiable, always present, and simple (likely an integer).

A well-chosen primary key ensures efficient data management and retrieval, and it plays a critical role in maintaining the integrity and structure of the database.

In the example above it may not be a good idea to use the phone number or email address as the primary key for the following reasons

Redistribution of Phone Numbers | Phone numbers can be reassigned or redistributed to different individuals over time. This can lead to conflicts or inaccuracies in the database, as the same phone number might be associated with different users at different times.
Length and Format Variability | Phone numbers vary in length and format, depending on the country and whether they are mobile or landline numbers. This variability can cause inconsistencies in the database structure, complicating data management and potentially affecting performance due to non-uniform indexing.
Duplication Issues | Both phone numbers and email addresses may be shared by multiple individuals, such as family members or employees at the same company. This shared usage poses a problem for uniqueness – a fundamental requirement of primary keys. If one family member or employee has already registered with a particular phone number or email, the same contact information cannot be used to uniquely identify another individual in the database.
Changeability and Mutability | Phone numbers and email addresses are not immutable. Users may change their contact information for various reasons, such as changing phone providers or creating new email accounts. Since primary keys ideally should be stable and unchanging, using such mutable data can lead to significant challenges in maintaining data integrity and consistency.