SQL

SQL

2019-10-14T23:46:37.165Z

Basics

SELECT

Retrieving from a single table:

SELECT * FROM tableA;
SELECT column1, column2 FROM tableA;

Retrieving from multiple tables:

SELECT first_table.column, second_table.column
FROM first_table
JOIN second_table
    ON join_condition

SELECT first_table.column, second_table.column, third_table.column
FROM first_table
JOIN second_table
    ON join_condition
JOIN third_table
    ON join_condition

You can also use SELECT in subqueries when creating a table.

CREATE TABLE table AS
SELECT ...

Avoid SELECT * and fetch only the columns you need.

UPDATE

UPDATE table
SET column1 = value1
WHERE condition;

DELETE

DELETE FROM table
WHERE condition;

JOIN

A JOIN lets you combine data from two different tables.

Joins without JOIN statements:

SELECT
    products.title,
    purchases.quantity
FROM
    products,
    purchases
WHERE
    products.id = purchases.product_id
LIMIT 5;

INNER JOIN

An inner INNER JOIN of A and B gives the result of A intersect B, i.e. the inner part of a Venn diagram intersection. An INNER JOIN gives the intersection of the two tables, i.e. only the rows they have in common.

Examples: If a value exists in the column a in one table but not in the column b in another other table, then the join fails for the rows containing that value and those rows are excluded from the result set.

a | b
--+--
3 | 3
4 | 4
SELECT *
FROM orders
JOIN customer
    ON orders.customer_id = customer.customer_id;

JOIN specifies the second table. JOIN defaults to INNER JOIN if only JOIN is used.

ON specifies the JOIN condition, i.e. how the JOIN must be executed. If the names of the columns used to join the two tables are identical (table1.colX = table2.colX), it can be shortened with the keyword USING:

SELECT *
FROM orders
JOIN customer
    USING (customer_id);
-- identical to:
-- ON orders.customer_id = customer.customer_id;
-- It only works if the column name is identical.

The word before the dot is the table. The word after the dot is the column.

An alias can be given to a table name:

SELECT order_id, c.last_name
FROM orders o
JOIN customer c
ON o.customer_id = c.customer_id;

Best practice: An SQL join is still more efficient than the nested selects approach. Get to know your ORM and take control of joins.

Warning: AS is only used for the resulting columns right after SELECT in the first line, not for table names.

Warning: Try to avoid the implicit JOIN:

SELECT *
FROM customers c, orders o
WHERE c.customer_id = o.customer_id;

A table can be joined to itself, either as an INNER JOIN or as an OUTER JOIN.

SELECT *
FROM employees e1
JOIN employees e2
    ON e1.reports_to = e2.id;

OUTER JOIN

An outer join of A and B gives the results of A union B, i.e. the outer parts of a Venn diagram union. A LEFT JOIN will give all rows in A, plus any common rows in B.

a |  b
--+-----
1 | null
2 | null
3 |    3
4 |    4

In an INNER JOIN, you are only retrieving records that match the ON condition. But what if some of the records do not match the condition and you still want to include them? Two types of OUTER JOIN:

  • LEFT JOIN: Retrieve all the records from both tables, but ignore the condition for the first table.
  • RIGHT JOIN: Retrieve all the records from both tables, but ignore the condition for the second table.

Warning: Try to avoid RIGHT JOINs--only use INNER JOINs and LEFT JOINs.

The OUTER keyword is optional if you are already using LEFT or RIGHT.

SELECT *
FROM products p
LEFT JOIN order_items oi
    ON p.product_id = oi.product_id; -- all rows from p are returned

Or from multiple tables:

SELECT *
FROM products p
LEFT JOIN order_items oi
    ON p.product_id = oi.product_id;
LEFT JOIN shippers sh
    ON oi.shipper_id = sh.shipper_id; -- all rows from p and sh are returned

NATURAL JOIN

A JOIN that uses the common columns, i.e., columns that have the same name.

SELECT *
FROM orders o
NATURAL JOIN customers c

Warning: In a NATURAL JOIN, the database engine infers the common columns, try to avoid using NATURAL JOINs.

CROSS JOIN

Retrieves every record from the first and second tables. (Remember, OUTER JOINs retrieve every record from one of the tables and the common rows in the other.) This is why CROSS JOINs need no condition.

SELECT *
FROM customers c
CROSS JOIN products p

INSERT

INSERT INTO table
VALUES (value1, value2);

INSERT INTO table (column1, column2)
VALUES (value1, value2);

INSERT INTO table (column1)
VALUES (value1), (value2), (value3);

Operators

<, >, <=, =>, =, !=

SELECT *
FROM customers
WHERE purchases > 10;

AND, OR

SELECT *
FROM customers
WHERE state = 'VA' AND state = 'FL';

NOT

SELECT *
FROM customers
WHERE NOT state = 'VA';

IN

SELECT *
FROM customers
WHERE state IN ('VA', 'FL');

BETWEEN

SELECT *
FROM customers
WHERE points BETWEEN 10 AND 20;

(Values are inclusive.)

LIKE

SELECT *
FROM customers
WHERE last_name LIKE 'b%';
-- % stands for any number of characters
-- _ stands for a single character

Avoid LIKE expressions with leading wildcards (e.g., '%TERM').

REGEXP

SELECT *
FROM customers
WHERE last_name REGEXP 'field$';
-- ^ beginning
-- $ end
-- | logical or
-- [abcd] any single character listed
-- [a-d] range of single character listed

UNION

Combine records retrieved from two or more queries.

SELECT
    order_id,
    order_date,
    'Active' as status
FROM orders
WHERE order_date >= '2019-01-01'
UNION
SELECT
    order_id,
    order_date,
    'Archived' as status
FROM orders
WHERE order_date < '2019-01-01'

Modifiers

WHERE

SELECT * FROM tableA WHERE age > 18;

A poorly written WHERE clause is the first ingredient of a slow query. The most reliable method for arriving at the best execution plan is to avoid unnecessary filters in the SQL statement.

ORDER BY

SELECT * FROM tableA ORDER BY age;
SELECT * FROM tableA ORDER BY age DESC;
SELECT * FROM tableA ORDER BY last_name, first_name;
-- ASC is default

AS

(alias for column)

SELECT name, age * 2 AS double_age
FROM tableA;

(with string value, alias for row)

SELECT 'Totals' as date_range
SUM(invoice_total) as total_sales,
SUM(pay_total) as total_pay,
FROM tableA
date_range | total_sales | total_pay
-----------+-------------+-----------
Totals     | 1539.8      | 231.5

DISTINCT

(unique values)

SELECT DISTINCT state FROM customers;

IS NULL

SELECT * FROM customers WHERE phone_number IS NULL;
SELECT * FROM customers WHERE phone_number IS NOT NULL;

When looking for a NULL value, you have to use IS NULL instead of = NULL. This is because the SQL standard does not define NULL as a value but rather as a placeholder for a missing or unknown value. Consequently, no value can be NULL.

LIMIT

SELECT * FROM customers LIMIT 5;
SELECT * FROM customers LIMIT 2, 3;
-- If there are two numbers, the first is the offset and the second is the limited.
-- In this example, two rows are skipped and the next three are listed.

Comments

-- This is a comment.

Placeholder

The question mark (?) is a placeholder where a parameter will be plugged in from your application language. Many databases offer a proprietary extension, such as a colon (

).

SELECT message FROM messages
WHERE processed = 'No' AND receiver = ?

Data types

Best practice: Use the smallest datatype that suits your needs. The less data you are using, the faster your queries are going to be.

String types

  • CHAR (fixed length)
  • VARCHAR (64 KB, 65,525 characters)
  • MEDIUMTEXT (16 MB, 16 million characters)
  • LARGETEXT (4 GB)
  • TINYTEXT (255 bytes)

Best practice: Set a number for VARCHAR() to simplify database maintenance, such as 50 for short strings and 255 for medium-length strings.

Numeric types

  • TINYINT (1 byte)
  • UNSIGNED TINYINT (0 to 255)
  • SMALLINT
  • MEDIUMINT
  • INT
  • BIGINT
  • DECIMAL (precision, scale) (synonyms: DEC, NUMERIC, FIXED)
  • DOUBLE
  • FLOAT

Boolean types

  • TRUE
  • FALSE

Enum types

  • ENUM('small', 'medium', 'large') (Using any other value for this field will cause an error. However, as a best practice, ENUMs should be avoided in favor of a lookup table with codes for the enumerated options.)

Date types

  • DATE
  • TIME
  • DATETIME
  • TIMESTAMP
  • YEAR

Blob types

For storing large blobs of binary data, e.g. containing images or videos.

  • TINYBLOB
  • BLOB
  • MEDIUMBLOB
  • LONGBLOB

Relational databases are designed for working with structuring data, not binary data. Blobs increase database size and slows down backups.

JSON type

  • JSON

Intermediate

Constraints

A constraint is simply a restriction placed on one or more columns of a table. There are several different types of constraints, including:

  • Primary key constraints: Identify the column or columns to guarantee uniqueness within a table.
  • Foreign key constraints: Restrict one or more columns to contain only values found in another table's primary key columns. They may also restrict the allowable values in other tables if update cascade or delete cascade rules are established.
  • Unique constraints: Restrict one or more columns to contain unique values within a table (Primary key constraints are a special type of unique constraint.)
  • Check constraints: Restrict the allowable values for a column.

Primary key

A primary key is a column that uniquely identifies the rows in a table. Primary key columns are by definition unique.

A composite primary key is one made up of two columns that uniquely identify the rows in a table. Example, an enrollments table containing a course_id and a student_id and using the combination as a primary key. (This would prevent a student from enrolling on the same course twice.) To create a composite primary key, tick the checkbox for the primary key in both columns.

A primary key should be short, so although an e-mail column would work as a unique identifier, it has too many characters. Also, ideally a primary key should not change.

Each table should have its own primary key.

Primary key columns that are simply INT identifiers should have auto-increment.

Foreign key

Whenever we establish a relationship between two tables, one table is the parent or primary key table, and the other is child or foreign key table.

A foreign key is a column in one table that references the primary key in another table.

ON UPDATE CASCADE and ON DELETE CASCADE specify, in the event of updating or deleting a primary key, whether this update or deletion should cascade down to the foreign keys in the child tables. Instead of CASCADE, there is also RESTRICT, which prevents the update or deleting from occurring if there the parent has a child. Other options: SET NULL (causes orphan records) and NO ACTION.

ON UPDATE CASCADE: Known as a cascading update, this type of foreign key constraint features the ON UPDATE CASCADE directive after the REFERENCES directive. This clause directs the server to automatically propagate any changes in the parent to all child rows. So, if you change the primary key value (parent), any tables that have foreign key references to the primary ky value (children) will be changed accordingly.

ON UPDATE DELETE: With this version of the constraint in place, the server will now update child rows in the product table when a row in the product_type table is updated, as well as delete child rows in the product table when a row in the product_type table is deleted.

Primary key     | Foreign key
----------------+----------------
Parent          | Child
Referenced      | Referencing
Independent     | Dependent
Defined here    | Defined elsewhere

With foreign key constraints in place, if a user attempts to insert a new row or change an existing row such that a foreign key column does not have a matching value in the parent table, the server raises an error. The foreign key constraint does not let you change a child row if there is no corresponding value in the parent.

CREATE DATABASE

CREATE DATABASE IF NOT EXISTS mydb;

DROP DATABASE

DROP DATABASE IF EXISTS mydb;

CREATE TABLE

CREATE TABLE IF NOT EXISTS customers
(
    customer_id INT PRIMARY KEY AUTO_INCREMENT,
    first_name VARCHAR(50) NOT NULL
    points INT NOT NULL DEFAULT 0
    email VARCHAR(255) NOT NULL UNIQUE
);

ALTER TABLE

ALTER TABLE customers
    ADD last_name VARCHAR(50) NOT NULL AFTER first_name
    MODIFY COLUMN first_name VARCHAR(55) DEFAULT ''
    DROP points;

DROP TABLE

DROP TABLE IF EXISTS customers;

FOREIGN KEY

CREATE TABLE orders
(
    order_id INT PRIMARY KEY
    customer_id INT NOT NULL
    FOREIGN KEY fk_orders_customers (customer_id)
        REFERENCES customers (customer_id)
        --customer_id in orders table references customer_id in customers table
    ON UPDATE CASCADE
    ON DELETE NO ACTION
);

Convention: fk underscore childTable (i.e. the foreign key table, referring table) underscore parentTable (i.e. the primary key table, referenced table).

DROP FOREIGN KEY

ALTER TABLE orders
    ADD PRIMARY KEY (order_id),
    DROP PRIMARY KEY,
    DROP FOREIGN KEY fk_orders_customers;
    ADD FOREIGN KEY fk_orders_customers (customer_id)
        REFERENCES customers (customer_id)
    ON UPDATE CASCADE
    ON DELETE NO ACTON;

Advanced

Functions

Aggregate functions

Aggregate functions summarize data.

  • MAX();
  • MIN();
  • AVG();
  • SUM();
  • COUNT();
SELECT MAX(invoice_total) AS highest
FROM invoices;

In COUNT(), by default null values are excluded and duplicate values are included.

Aggregate functions have their own modifiers:

GROUP BY

Collect aggregate results by column.

SELECT
    SUM(invoice_total),
    client_id
FROM invoices
GROUP BY client_id

GROUP BY goes always after the WHERE modifier. GROUP BY can then be followed by the ORDER BY modifier.

HAVING

HAVING is the equivalent of a WHERE modifier for GROUP BY. WHERE filters data before the rows are grouped; HAVING filters data after the rows are grouped.

SELECT
    client_id,
    SUM(invoice_total) as total_sales,
FROM invoices
GROUP BY client_id
HAVING total_sales > 500;

The column in the HAVING modifier must be one of the selected columns. WHERE does not have this requirement.

WITH ROLLUP

WITH ROLLUP returns one extra row that summarizes each group in the GROUP BY clause. (Available in MySQL only.)

Numeric functions

ROUND()
SELECT ROUND(5.71, 1);
-- returns 5.7
TRUNCATE()
SELECT TRUNCATE(5.7146, 2);
-- returns 5.71
CEILING()
SELECT CEILING(5.7);
-- returns 6
FLOOR()
SELECT FLOOR(5.7);
-- returns 5
ABS()
SELECT ABS(-5.2)
-- returns 5.2
RAND()
SELECT RAND()
-- returns a random floating-point number

String functions

LENGTH()
SELECT LENGTH('hello');
-- returns 5
UPPER() / LOWER()
SELECT UPPER('hello')
-- returns HELLO
TRIM()
SELECT TRIM('   hello')
-- returns hello
LEFT() / RIGHT()
SELECT LEFT('kindergarten', 6)
-- returns kinder
SUBSTRING()
SELECT SUBSTRING('kindergarten', 3, 5)
-- arguments are start position and length
-- returns nderg
-- omitting the second argument means till the end
LOCATE()
SELECT LOCATE('n', 'kindergarten');
-- returns 3
CONCAT()
SELECT CONCAT('first', 'last')
-- returns firstlast

Date functions

NOW()
SELECT NOW();
-- returns current date and time
CURDATE()
SELECT CURDATE();
-- returns current date
CURTIME()
SELECT CURTIME();
-- returns current time
YEAR()
SELECT YEAR(NOW());
-- returns current year

Also available: MONTH(), DAY(), MINUTE(), SECOND(), DAYNAME(), MONTHNAME().

EXTRACT()
SELECT EXTRACT(DAY FROM NOW());
-- returns current day number
DATE_FORMAT()
DATE_FORMAT(NOW(), '%m %d %y');
-- returns formatted date
TIME_FORMAT()
TIME_FORMAT(NOW(), '%m %d %y');
-- returns formatted date
DATE_ADD()
SELECT DATE_ADD(NOW(), INTERVAL 1 DAY);
-- returns tomorrow
DATE_SUB()
SELECT DATE_SUB(NOW(), INTERVAL 1 DAY);
-- returns tomorrow
DATE_DIFF()
SELECT DATEDIFF('2019-01-05', '2019-01-01');
-- returns 4 (always in days)
TIME_TO_SEC()
SELECT TIME_TO_SEC('09:02') - TIME_TO_SEC('09:00');
-- returns 120

Conditional functions

IFNULL()

Replace a null value with a string.

SELECT
    order_id,
    IF_NULL(shipper_id, 'Not assigned')
FROM orders;
COALESCE()

Replace a null value with the value for that row in another column; if that value is also null, then return a string.

SELECT
    order_id,
    COALESCE(shipper_id, comments, 'Not assigned')
FROM orders;
IF()
SELECT
    order_id,
    order_date,
    IF(
        YEAR(order_date) = YEAR(NOW()),
        'Active',
        'Archived'
    ) AS category
FROM orders
CASE / WHEN / THEN / ELSE
SELECT
    order_id,
    order_date,
    CASE
        WHEN YEAR(order_date) = YEAR(NOW()) THEN 'Active'
        WHEN YEAR(order_date) = YEAR(NOW()- 1) THEN 'Last year'
        WHEN YEAR(order_date) < YEAR(NOW()-1 ) THEN 'Active'
        ELSE 'Future'
    END CASE AS category
FROM orders

Subqueries

A query within a query. The child query retrieves a piece of data that is used in the parent query.

SELECT *
FROM products
WHERE unit_price > (
    SELECT unit_price
    FROM products
    WHERE product_id = 3
);
-- Get the price for an ID. Then use the price as part of the comparison
-- to get a list of products with a higher price than the price for that ID.

If your subquery returns more than one row, you will not be able to use it on one side of an equality condition.

Subqueries are often used with the IN operator:

SELECT *
WHERE product_id NOT IN (
    SELECT product_id
    FROM order_items
);
-- Get the list of product IDs that have been ordered. Then look for product IDs
-- that are not there, i.e., that have not been ordered yet.

Subquery modifiers

ALL
SELECT *
FROM invoices
WHERE invoice_total > (
    SELECT MAX(invoice_total)
    FROM invoices
    WHERE client_id = 3
);
-- This query selects invoices that are higher than all invoices of client 3.

With the ALL keyword:

SELECT *
FROM invoices
WHERE invoice_total > ALL (
    SELECT invoice_total
    FROM invoices
    WHERE client_id = 3
);
-- This query selects also invoices that are higher than all invoices of
-- client 3. The subquery retrieves all the invoices of client 3, i.e. multiple
-- values are returned. The parent query compares each row against ALL the values
-- in the subquery. If true, the row is returned.
ANY/SOME

Same as ALL, but one true suffices.

SELECT *
FROM clients
WHERE client_id IN (
    SELECT client_id
    FROM invoices
    GROUP BY client_id
    HAVING COUNT(*) >= 2
);
-- Select the clients with at least two invoices.

SELECT *
FROM clients
WHERE client_id = ANY (
    SELECT client_id
    FROM invoices
    GROUP BY client_id
    HAVING COUNT(*) >= 2
);
-- Select the clients with at least two invoices. Same as above but with the ANY keyword.

Correlated subqueries

SELECT *
FROM clients c
WHERE EXISTS (
    SELECT client_id
    FROM invoices
    WHERE client_id = c.client_id
)

The subquery is correlated to the parent query via the clients (c) table reference. EXISTS returns true if at least one row is returned from the subquery. If the subquery returns false, the parent query is not executed.

EXISTS makes the query more efficient if the subquery returns a large result set, because EXISTS whittles the subquery result down to a simple true/false value.

A correlated subquery is dependent on its containing statement from which it references one or more columns.

Below, c.cust_id in the subquery refers to (i.e., correlates to) the alias c defined in the containing or main query. The reference to c.cust_id at the very end of the subquery is what makes the subquery correlated; the containing query must supply values for c.cust_id for the subquery to execute.

SELECT c.cust_id, c.cust_type_cd, c.city
FROM customer c
WHERE 2 = (SELECT COUNT(*)
FROM account a
WHERE a.cust_id = c.cust_id);

Subquery location

Subqueries may appear in the WHERE clause, in the SELECT clause and in the FROM clause.

SELECT
    invoice_id,
    invoice_total,
    (SELECT AVG(invoice_total) FROM invoices) as invoice_average,
    invoice_total - SELECT(invoice_average) as difference
FROM invoices;

Views

You create a view by naming a SELECT statement and storing the query under that name.

Creating a view

Create a view and use it just like a table. (Views do not store data.)

CREATE VIEW sales_by_client AS
SELECT
    c.client_id,
    c.name,
    SUM(invoice_total) AS sales
FROM clients c
JOIN invoices i USING (client_id)
GROUP BY cliend_id, name;

SELECT *
FROM sales_by_client
WHERE total_sales > 500;

SELECT *
FROM sales_by_client
JOIN clients USING (client_id);

To edit a view, you can drop it and recreate it:

DROP VIEW sales_by_client;

Or you can alter the existing view;

CREATE OR REPLACE sales_by_client AS
-- edited view contents

The best practice is to save the view (the CREATE VIEW X AS statement) in an sql file and place it under source control.

Updating or deleting a view

DELETE FROM sales_by_client
WHERE invoice_id = 1;

UPDATE sales_by_client
SET due_date = DATE_ADD(due_date, INTERVAL 2 DAY)
WHERE invoice_id = 1;

Preventing exclusions

WITH CHECK OPTION prevents UPDATE and DELETE statements from excluding rows from the view.

CREATE VIEW sales_by_client AS
SELECT
    c.client_id,
    c.name,
    SUM(invoice_total) AS sales
-- ...
WITH CHECK OPTION;

Stored procedures

Stored procedures are simply stored queries.

Do not mix your SQL code with your application language. Use stored procedures to separate, organize and version control your SQL code.

Also, some db engines optimize stored procedures, so stored procedures can sometimes be executed faster.

Creating stored procedures

To create a stored procedure, change a delimiter from semicolon to $$ (as is convention) to take the CREATE PROCEDURE and BEGIN/END clause all as one unit. Then, change the delimiter back to semicolon.

DELIMITER $$
CREATE PROCEDURE get_clients()
BEGIN
    SELECT * FROM clients;
END$$
DELIMITER ;

To call the stored procedure from your SQL code:

CALL get_clients();

Otherwise, call it from your application language.

Deleting stored procedures

DROP PROCEDURE IF EXISTS get_clients;

IF EXISTS can be omitted, but this will cause an error if you drop the procedure and then try to drop it again when it does not exist anymore.

Calling stored procedures from Node

let mysql = require('mysql')
let config = require('./config.js')

let connection = mysql.createConnection(config)

let sql = `CALL filterTodo(?)`

connection.query(sql, true, (error, results, fields) => {
  if (error) {
    return console.error(error.message)
  }
  console.log(results[0])
})

connection.end()

Creating a stored procedure with parameter

DELIMITER $$
CREATE PROCEDURE get_clients_by_state
(
    state CHAR(2)
)
BEGIN
    SELECT * FROM clients c;
    WHERE c.state = state;
    -- state column = state parameter
END$$
DELIMITER ;

CALL get_clients_by_state('CA');

Adding a default value:

DELIMITER $$
CREATE PROCEDURE get_clients_by_state
(
    state CHAR(2)
)
BEGIN
    IF STATE IS NULL THEN
        SET state = 'CA';
        -- default state is CA
    END IF;
    SELECT * FROM clients c;
    WHERE c.state = state;
    -- state column = state parameter
END$$
DELIMITER ;

CALL get_clients_by_state();
-- default argument CA is passed

Trigger

A trigger is a block of SQL code that gets executed before or after an insert, update or delete statement.

DELIMITER $$
CREATE TRIGGER payments_after_insert
    AFTER INSERT ON payments
    FOR EACH ROW
BEGIN
    UPDATE invoices
    SET payment_total = payment total + NEW.amount
    WHERE invoice_id = NEW.invoice_id;
END$$
DELIMITER ;

-- NEW is the row that was just inserted
-- OLD is the row that was just updated or deleted

The convention for naming triggers is:

tablename_tempword_action

payments_before_insert
customs_after_update

To delete the trigger:

DROP TRIGGER IF EXISTS payments_after_insert;

Events

An event is a block of SQL code that gets executed according to a schedule.

DELIMITER $$
CREATE EVENT yearly_delete_stale_rows
ON SCHEDULE
    EVERY 1 YEAR STARTS '2019-01-01'
DO BEGIN
    DELETE FROM payments_audit
    WHERE action_date < NOW()
END$$
DELIMITER ;

To delete an event:

DROP EVENT IF EXISTS yearly_delete_stale_rows;

To edit an event:

DELIMITER $$
ALTER EVENT yearly_delete_stale_rows
ON SCHEDULE
-- ...

Transactions

A single block of code containing two or more SQL commands—they all succeed or fail together as a unit.

START TRANSACTION;

INSERT INTO orders (customer_id, order_date, status) VALUES (1, '2019-01-01', 1);

INSERT INTO order_items
VALUES (LAST_INSERT_ID(), 1, 1);

COMMIT;

If the client disconnects during the transaction, all the commands executed up to that point get rolled back.

Concurrency

Concurrency is the situation where two or more clients are using the database at the same time.

Transaction isolation levels

Transaction isolation levels solve concurrency problems, which concern mostly what data is retrieved by a client while another client acts on that same data.

Concurrency problems:

  • Lost updates
  • Dirty reads
  • Non-repeating reads
  • Phantom reads

Transaction isolation levels:

  • READ UNCOMMITTED (fastest, no logs)
  • READ COMMITTED (fewer logs)
  • REPEATABLE READ (most commonly used)
  • SERIALIZABLE (slowest, most logs)

A lower isolation level gives us more concurrency, but more concurrency increases the risk for concurrency problems.

The more we increase the isolation level, the more performance problems are created, because more logs are necessary to isolate transactions.

There is also READ UNCOMMITED, which is used when data is rarely updated and you wish to achieve better performance.

To view the current isolation level:

SHOW VARIABLES LIKE 'transaction_isolation';

To change the isolation level:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- for next transaction

SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- for all transactions in session

SET GLOBAL TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- for all transactions globally

Database design

Relationships

Entity relational (ER) diagrams allow you to draw the conceptual model for the relations in the database.

draw.io is a popular ER modeling tool.

Relationships:

  • one to one
  • one to many (e.g., customer with more than one address)
  • many to many are supported in relational databases only through link tables

To create a many-to-many relationship, we need to create a link table, such as a table that collects student IDs and courses IDs. This way, we get a table with a one-to-many relationship to the link table and another table with a one-to-many relationship to the link table. The link table thus serves as an intermediary supporting a many-to-many relationship. In other words, whenever you have a many-to-many relationship, you need to break it down into two one-to-many relationships.

Normalization

Normalization is setting rules to prevent data duplication. There are seven rules or 'forms' but the first three forms are the most frequently used.

1NF

First Normal Form: Each cell should have a single value and we should not have repeated columns.

2NF

Second Normal Form: Every table should describe one entity and every column in that table should describe that entity.

For example, in an orders table, there should not be a customer_name column—there should be a customer_id column. In other words, if a column in a table does not have unique contents, its contents will be repeated, and if so, then the contents deserve their own table with a numeric primary key identifier.

3NF

Third Normal Form: A column in a table should not be derived from other columns. This is because an update to the main columns will need to be reflected also in the dependent column.

For example, if you have an invoices column, a payments column and a balance column (i.e., the difference between invoices and payments), the balance column should be deleted. Another example, if you have first_name, last_name and full_name, you should do away with full_name.

Best practice

Focus on eliminating duplication. Duplicate values show that the design is not normalized.

Indexing

An index is a separate table that refers to the indexed table and that ordered. It serves to quicken queries.

In the same way that a person uses an index to find words within a publication, a database server uses indexes to locate rows in a table. The purpose of an index is to make it easier to retrieve a subset of a table's rows and columns without the need to inspect every row in the table.

Creating an index does not change the table data; it just creates a new data structure that refers to the table. A database index is, after all, very much like the index at the end of a book: it occupies its own space, it is highly redundant, and it refers to the actual information stored in a different place.

Searching in a database index is like searching in a printed telephone directory. The key concept is that all entries are arranged in a well-defined order. Finding data in an ordered data set is fast and easy because the sort order determines each entry's position. The database automatically creates an index for the primary key.

Once created, the database maintains the index automatically. It applies every insert, delete and update to the index, thus causing maintenance overhead for write operations. Rebuilding an index does not improve performance on the long run.

Every index is a table (a special type of table, but still a table). Therefore, every time a row is added to or removed from a table, all indexes on that table must be modified. When a row is updated, any indexes on the column or columns that were affected need to be modified as well. Therefore, the more indexes you have, the more work the server needs to do to keep all schema objects up-to-date, which tends to slow things down.

Best practices:

  • Make sure all primary key columns are indexed. (Most servers automatically create unique indexes when you create primary key constraints.)
  • Build indexes on all columns that are referenced in foreign key constraints.
  • Index any columns that will frequently be used to retrieve data. Most date columns are good candidates, along with short (3- to 50-character) string columns.

Reserve indexes for performance-critical queries. Use indexes to speed up queries; do not index columns blindly. Columns that are primary keys are indexed by default.

Always try to use index-only scans, i.e. queries that use only indexed columns.

To create an index on a column:

CREATE INDEX idx_state ON customers (state);

To view indexes on a table:

SHOW INDEXES IN customers;

To delete an index on a table:

DROP idx_state ON customers;

To force the use of a particular index:

SELECT *
FROM customers
USE INDEX (idx_state_lastname)
WHERE state = 'NY' and last_name = 'A%';

Best practice: Index columns often used in WHERE clauses. Then do the same with the ORDER BY clauses. And finally, do the same with the SELECT clause. This way you get a covering index, so the query does not need to touch the tables and works only with the covering index.

You can use ASC and DESC modifiers in the index declaration.

CREATE INDEX sales_dt_pr
    ON sales (sale_date ASC, product_id DESC);

Try to avoid duplicate indexes, redundant indexes and unused indexes. Check existing indexes before creating new ones.

Partial indexes

A partial index is useful for commonly used WHERE conditions that use constant values—like the status code in the following example:

SELECT message
FROM messages
WHERE processed = 'N'
AND receiver = ?;

With partial indexing you can limit the index to include only the unprocessed messages. The syntax for this is surprisingly simple: a WHERE clause.

CREATE INDEX messages_todo
ON messages (receiver)
WHERE processed = 'N';

Prefix indexes

For string columns, an index may require a lot more memory and will not perform well.

When indexing string columns, include only a limited number of characters in the index:

CREATE INDEX idx_lastname ON customers (last_name(20));

Full-text indexes

Full-text indexes allow you to implement a search engine in your application.

CREATE FULLTEXT INDEX idx_title_body ON posts (title, body);

SELECT *
FROM posts
WHERE MATCH(title, body) AGAINST('react redux');

There is natural language mode and BOOLEAN MODE, which allows you to forcefully include or exclude certain terms from the search:

SELECT *
FROM posts
WHERE MATCH(title, body) AGAINST('react -redux +form' IN BOOLEAN MODE);
-- one term that must not appear, one term that must appear

SELECT *
FROM posts
WHERE MATCH(title, body) AGAINST('"handling a form"' IN BOOLEAN MODE);
-- exact phrase

Composite indexes

You can create an index on two columns simultaneously:

CREATE INDEX idx_state_points ON customers (state, points)
SELECT customer_id FROM customers
WHERE state = 'CA' AND points > 1000;
-- index on both columns for faster querying

Composite indexes are more efficient for queries that use the two indexed columns, as opposed to two separate indexes on those same columns.

The most important consideration when defining a concatenated index is how to choose the column order so it can be used as often as possible. But even though the two-index solution delivers very good select performance as well, the single-index solution is preferable. The fewer indexes a table has, the better the insert, delete and update performance.

Order of columns: The most frequently used column comes first. The column with higher cardinality (i.e., number of unique values) comes first, to narrow down your search as much as possible before the other column is searched. However, if the WHERE clause contains an equality and then a LIKE condition, the lower cardinality column should be favored.

Users

To create a user:

CREATE USER john IDENTIFIED BY 'mypassword';
-- password
CREATE USER john@127.0.0.1;
-- IP address, localhost, domain

View users:

SELECT user FROM mysql.user;

Delete users:

DROP USER john

To change a password:

SET PASSWORD john = 'mynewpassword';

Privileges

To grant privileges:

GRANT SELECT, INSERT, UPDATE, DELETE, EXECUTE
ON database_name.*
TO john;
-- database dot star means all tables in the db

EXECUTE refers to executing stored procedures.

GRANT ALL
ON *.*
TO john;

To view privileges on a user:

SHOW GRANTS FOR john;
SHOW GRANTS;
-- unspecified user means current user

To revoke privileges:

REVOKE INSERT, UPDATE, DELETE, EXECUTE
ON database_name.*
FROM john;

Order of execution

SQL order of execution:

FROM        -- Specify the tables to be queried
WHERE       -- Filter the returned rows
GROUP BY    -- Aggregate rows
HAVING      -- Filter the aggregates
SELECT      -- Specify the columns to show
ORDER BY    -- Order the returned rows
LIMIT       -- Limit the returned rows