IB COMPUTER SCIENCE OPTION A DATABASES

DATABASES | RELATIONAL DATABASES

ALSO IN THIS TOPIC
BASIC CONCEPTS
RELATIONAL DATABASES 2.1 - 2.8
YOU ARE HERE | RELATIONAL DATABASES 2.9 - 2.20
DATABASE MANAGEMENT
DATABASE MODELS AND ANALYSIS
DATABASE KEY TERMINOLOGY
DATABASE ANSWERS

Topics from the International Baccalaureate (IB) 2014 Computer Science Guide.

SECTION 1 | DATABASE TERMINOLOGY

Table: A table is a collection of related data organised in rows and columns. Tables are used to store data in a database and are often named based on the type of data they contain.
Record: A record is a collection of data that represents a single entity in a table. A record is also known as a row, and it typically contains information about a specific item or object, such as a customer, order, or product.
Field: A field is a single piece of data stored in a record. A field is also known as a column, and it represents a specific attribute or characteristic of the entity represented by the record.
Primary Key: A primary key is a field or combination of fields in a table that uniquely identifies each record in the table. A primary key is used to enforce data integrity and ensure that no two records in the table are identical.
Secondary Key: A secondary key is a field or combination of fields in a table that is not the primary key but can be used to access and query data in the table.
Foreign Key: A foreign key is a field in a table that refers to the primary key of another table. A foreign key is used to create a relationship between two tables and ensure data integrity across the tables.
Candidate Key: A candidate key is a field or combination of fields in a table that could be used as the primary key but is not currently used for that purpose. A candidate key is used to ensure that no two records in the table are identical.
Composite Primary Key: A composite primary key is a primary key that consists of two or more fields in a table. A composite primary key is used when a single field is not sufficient to uniquely identify each record in the table.
Join: A join is a database operation that combines data from two or more tables based on a related field. A join is used to combine data from multiple tables into a single result set that can be used for data analysis or reporting.

SECTION 2 | WHAT IS AN INNER JOIN

An inner join is a type of join operation in a database that combines data from two or more tables based on a common field. An inner join returns only the rows from each table that have matching values in the specified field, excluding any rows that do not have matching values.

Here is an example to illustrate how an inner join works:

Suppose you have two tables, a "Customers" table and an "Orders" table. The "Customers" table contains information about each customer, such as their name and address, while the "Orders" table contains information about each order, such as the order number and the customer who placed the order. Both tables have a common field, such as a customer ID.

To perform an inner join between these two tables, you would specify the customer ID field as the common field. The inner join would then return only the rows from each table where there is a matching customer ID, and exclude any rows where there is no matching customer ID.

For example, suppose the "Customers" table has a row with a customer ID of 123 and a name of "John Smith", and the "Orders" table has a row with an order number of 456 and a customer ID of 123. When you perform an inner join between these tables, the result set would contain only the row with customer ID 123, and exclude any other rows where there is no matching customer ID.

In summary, an inner join is a type of join operation that combines data from two or more tables based on a common field, returning only the rows that have matching values in the specified field. Inner joins are commonly used in database management to combine data from multiple tables into a single result set for data analysis or reporting.

SECTION 3 | ISSUES WITH REDUNDANT DATA

Redundant data refers to data that is unnecessarily duplicated or repeated in a database. Redundant data can cause a number of issues, including:

Data Inconsistency: When data is stored redundantly, it is possible for different copies of the same data to become inconsistent.
Data Integrity: Redundant data can also compromise data integrity by making it more difficult to maintain the accuracy and completeness of the data. When data is stored redundantly, it is more difficult to ensure that all copies of the data are updated consistently and accurately.
Storage Costs: Redundant data can also be costly in terms of storage space. When data is duplicated unnecessarily, it takes up more space in the database, which can increase storage costs and reduce system performance.
Maintenance Costs: Redundant data can also increase the cost of maintaining and updating the database. When data is stored redundantly, it requires additional effort to keep all copies of the data up to date and accurate.
Security Risks: Redundant data can also pose security risks by increasing the number of potential attack points for malicious actors. If redundant data is not properly secured, it can be more easily accessed and manipulated by unauthorised users.

Redundant data can cause a number of issues for a database, including data inconsistency, compromised data integrity, increased storage and maintenance costs, and security risks. By eliminating or minimising redundant data in a database, organisations can improve the accuracy and consistency of their data, reduce storage and maintenance costs, and enhance data security.

SECTION 4 | REFERENTIAL INTEGRITY

Referential integrity is an important concept in database design, particularly in a normalised database, it ensures that the relationships between tables in a database are maintained and that data is accurate and consistent across the tables.

Referential integrity ensures that the data in the database is accurate and consistent. By enforcing relationships between tables, referential integrity ensures that data is entered consistently and that all related data is kept up to date.

Referential integrity also helps to ensure data integrity in the database by preventing the insertion of invalid data into the database. By enforcing relationships between tables, referential integrity ensures that data is entered into the correct tables with the correct relationships, reducing the risk of data errors and inconsistencies.

This ensures that the relationships between tables in a database are maintained, ensuring data accuracy and consistency, data integrity, improved database performance, and enhanced data security. By ensuring that data is accurate, consistent, and secure, referential integrity helps organisations to make better decisions and operate more effectively.

SECTION 5 | NORMALISATION

The normalisation process is used to organise data in a database into tables and establish relationships between them. The process involves several steps, each of which is designed to remove data redundancies and dependencies. The three most commonly used normal forms are 1st Normal Form (1NF), 2nd Normal Form (2NF), and 3rd Normal Form (3NF). Here are the differences between each of these normal forms:

1st Normal Form (1NF): In 1NF, each table in a database contains only atomic values, meaning that each column contains only a single value. This means that data is not stored in a repeating group or array format, and each table has a primary key that uniquely identifies each row.
2nd Normal Form (2NF): In 2NF, the table must be in 1NF and each non-key column must be functionally dependent on the entire primary key. This means that each non-key column must be uniquely determined by the primary key, and cannot be determined by a subset of the primary key.
3rd Normal Form (3NF): In 3NF, the table must be in 2NF and all non-key columns must be independent of each other. This means that each non-key column should contain only data that is related to the primary key, and not contain data that is related to other non-key columns.

To summarise, 1NF requires that each table contain only atomic values, 2NF requires that each non-key column be functionally dependent on the entire primary key, and 3NF requires that all non-key columns be independent of each other. These normal forms are used to ensure that the data in a database is organised efficiently, and is free from data redundancies and dependencies.

SECTION 6 | BENEFITS OF A NORMALISED DATABASE

A normalised database is a database that is structured in accordance with the principles of database normalisation. Database normalisation is a process used to organise data in a database into tables and establish relationships between them, with the aim of reducing data redundancy and ensuring data integrity. A normalised database has the following characteristics:

Minimal Data Redundancy: A normalised database minimises data redundancy by organising data into tables and removing data that is repeated or duplicated unnecessarily. This helps to reduce the size of the database and improve database performance.
Consistent Data: A normalised database ensures that data is consistent across tables by removing data redundancies and dependencies. This helps to improve data integrity and reduce the risk of data inconsistencies.
Reduced Update Anomalies: A normalised database reduces the risk of update anomalies by ensuring that data is stored in the appropriate table and that each table contains only a single, logically related category of data. This helps to ensure that updates to the data are made only once and that the data remains consistent across the database.
Increased Scalability: A normalised database is highly scalable, meaning that it can be easily expanded or modified to accommodate new data or changing business needs. This is because the database is organised into tables, which can be modified or added as needed without affecting the rest of the database.
Improved Query Performance: A normalised database often has better query performance because data is organised into smaller, more manageable tables. This allows queries to be processed more quickly and efficiently, resulting in faster data retrieval times.
Simplified Maintenance: A normalised database is easier to maintain because data is organised into tables, making it easier to identify and fix errors or inconsistencies in the data. This helps to reduce the cost and effort required to maintain the database over time.

A normalised database is structured in accordance with the principles of database normalisation, with the aim of reducing data redundancy and ensuring data integrity. A normalised database is characterised by minimal data redundancy, consistent data, reduced update anomalies, increased scalability, improved query performance, and simplified maintenance.

MORE ON NORMALISATION CLICK HERE

SECTION 7 | DATA TYPES

The term 'Data Type' refers to the type of data used, for example it could be text, numbers, dates or time, boolean (Yes/No), Currency or an object such as an image or link. Each data type has its own data format, for example a data might be written DD/MM/YY or MM/DD/YY. Before setting up a database you will need to decide on data types and data formats for each field within your database. Once the data type and format for each field is set when you first create the database it should not be changed and it will restrict the data that is allowed to be entered, this then helps to ensure data integrity. Data integrity is the completeness, correctness or accuracy of data.

Imagine the problems that could occur if you did not defining the date data format when you set up the database. Giving the date 11th Jan 2021 as an example, an English person might enter this data as 11/01/21 an American person might enter this data as 01/11/21.

The list below shows some data types with example data formats.
Text: Two options within the Text data type are short text and long text. Short text is used for under 256 characters to be entered, as standard most databases as set to short text, and you would need to specify long text at the set up stage if you want to change this. Note: numbers such as phone numbers are often set as text data types, this is because they start with a 0 (zero) which can cause problems for number fields. Exam Note: The data type of a phone number has been a popular examination question.

Numbers: Numbers can normally be formatted as integers, decimal, scientific.

Boolean: Boolean fields are used when you want to enforce one of two options for example YES or NO, ON or OFF, M or F, TRUE or FALSE, 1 or 0. Some database software may only allow 1 or 0 as options in the boolean datatype selection.

Date/Time: Data and time are often combined, and after selecting this as a data type you select the format. Time options may be 12 or 24 hour formats and then the format such as hh/mm/ss. Date option normally include the structure of the date such as DD/MM/YYYY.

Currency: Currency datatype will allow you to choose the currency used for example $ or £, it will also allow you to specify the number of decimal places.

Object: An object would normally be something that you cannot enter via the keyboard such as music or a picture, but you could also have items such as hyperlinks as objects.

SECTION 8 | ENTITY RELATIONSHIP DIAGRAMS

An entity-relationship diagram (ERD) is a graphical representation of the relationships between entities in a database. It is used to model the data and relationships that exist within a database, and is a key tool in the database design process.

An ERD consists of entities, attributes, and relationships. An entity is a person, place, thing, or event that is relevant to the database, and is represented by a rectangle on the diagram. An attribute is a characteristic or property of an entity, and is represented by an oval or ellipse. A relationship is a connection between entities, and is represented by a line that connects the related entities.

ERDs are used to model complex databases, allowing developers to visualise the relationships between entities and to identify any potential issues or inconsistencies in the design. They are often used in conjunction with other tools, such as data flow diagrams and data dictionaries, to ensure that the database is well-designed and meets the requirements of the stakeholders. ERDs can also be used to communicate the design of the database to non-technical stakeholders, such as business analysts and project managers, in a clear and understandable way.

MORE ON ENTITY RELATIONSHIP DIAGRAMS CLICK HERE

SECTION 9 | QUERIES

A query is a request for data from a database. By executing a query, a user can retrieve and manipulate data stored in the database. A query can also be used to provide a view of a database, allowing users to see the data in a specific format that is customised to their needs. Queries can provide a view of a database in some of the following ways:

Selecting Fields: When creating a query, the user can select the fields they want to view from the database. This allows them to focus on specific information that is relevant to their needs. For example, a user might create a query that selects only the customer name, order number, and date for all orders placed in the last month.
Filtering Data: Queries can also be used to filter data based on specific criteria. This allows the user to view only the data that meets their specific needs. For example, a user might create a query that only shows orders from a specific region or orders that contain a certain product.
Sorting Data: Queries can also be used to sort data in a specific way. This allows the user to view the data in a way that makes sense for their needs. For example, a user might create a query that sorts orders by order date, so that the most recent orders appear at the top.
Grouping Data: Queries can also be used to group data together based on specific criteria. This allows the user to view data in a summarised format. For example, a user might create a query that groups orders by region or by product category.
Calculating Data: Queries can also be used to calculate data based on specific criteria. This allows the user to view the data in a way that is useful for their needs. For example, a user might create a query that calculates the average order size or total sales for a specific period.

By combining these techniques, a query can provide a view of a database that is customised to the user's needs. This allows the user to access the data they need in a way that is easy to understand and use.

SECTION 10 | SIMPLE QUERIES VS COMPLEX QUERIES

A simple query is a basic request for data from a database, typically involving only a single table and a small number of fields. A simple query is usually straightforward and easy to understand, and can be created using simple query languages or graphical user interfaces.

In contrast, a complex query is a more sophisticated request for data from a database, often involving multiple tables and complex operations. A complex query can be used to retrieve data that meets specific criteria or to perform advanced calculations or data manipulations.

Some key differences between simple and complex queries:

Complexity: Simple queries are less complex than complex queries, typically involving only a single table and a small number of fields. Complex queries, on the other hand, involve multiple tables, complex operations, and advanced functions.
Purpose: Simple queries are often used to retrieve specific data from a database, while complex queries are used to perform advanced data manipulation and analysis.
Performance: Simple queries are usually faster and more efficient than complex queries, as they involve less data and processing. Complex queries, on the other hand, can be slower and more resource-intensive, especially if they involve large amounts of data or complex calculations.
Ease of Use: Simple queries are generally easier to create and understand than complex queries. Simple queries can often be created using simple query languages or graphical user interfaces, while complex queries may require more advanced technical skills and knowledge.

Simple queries are basic requests for data from a database, involving a single table and a small number of fields. They are typically straightforward and easy to understand. Complex queries, on the other hand, involve multiple tables, complex operations, and advanced functions, and are used to perform advanced data manipulation and analysis. They can be more resource-intensive and require more advanced technical skills and knowledge.

SECTION 11 | CREATING A QUERY

Constructing a query involves creating a request for data from a database. The methods used to construct a query can vary depending on the type of database, the query language being used, and the specific requirements of the user. Although at IB level it is quite likely that a query will be done using SQL, here are some different methods that can be used:

Graphical User Interfaces (GUIs): Many database management systems (DBMS) provide graphical user interfaces (GUIs) that allow users to create queries using a visual interface. Users can select tables and fields, add filters and sorting criteria, and build complex queries using drag-and-drop functionality.
Query Languages: Query languages such as SQL (Structured Query Language) and LINQ (Language-Integrated Query) can be used to construct queries. These languages provide a syntax for creating queries that can be executed on a database. SQL is a standard language used for creating and managing relational databases, while LINQ is a .NET Framework component used to query collections and databases.
Stored Procedures: A stored procedure is a set of precompiled SQL statements that can be executed on a database. Stored procedures can be created to perform specific tasks or to retrieve data that meets specific criteria. They can be called from applications or other stored procedures to retrieve data from the database.
Data Access Layers: Data access layers provide a way to abstract the database from the application code. They provide a set of methods and functions that can be used to retrieve data from the database. The data access layer can be used to create queries, and the results can be returned to the application code for further processing.
Object-Relational Mapping (ORM): ORM tools provide a way to map database tables to object-oriented code. The ORM tool can be used to create queries and retrieve data from the database. The results can be returned as objects that can be used by the application code.
Web-Based Interfaces: Web-based interfaces can be used to create and execute queries from a web browser. These interfaces can provide a simple way to access the database from anywhere with an internet connection.

There are several methods that can be used to construct a query, including graphical user interfaces, query languages, stored procedures, data access layers, object-relational mapping tools, and web-based interfaces. The choice of method depends on the specific requirements of the user and the database management system being used.

ALSO IN THIS TOPIC

BASIC CONCEPTS
RELATIONAL DATABASES 2.1 - 2.8
RELATIONAL DATABASES 2.9 - 2.20
DATABASE MANAGEMENT
DATABASE MODELS AND ANALYSIS
DATABASE KEY TERMINOLOGY
DATABASE ANSWERS