MySQL Indexes - Complete Guide

by Aaron • 11/22/2022, 8:34:12 AM

What is an Index
Index Classification
Index Disadvantages
Index Optimization
When Indexes Become Ineffective
Use Cases

What is an Index

A data structure that helps storage engines quickly retrieve data. MySQL primarily uses B+ Tree structures.

Advantages and Disadvantages

Advantages:

Reduces disk I/O operations, increasing data retrieval speed
Unique indexes ensure data uniqueness
GROUP BY and ORDER BY operations can significantly improve efficiency and reduce CPU consumption

Disadvantages:

Requires additional storage space for index data structures
Creating and maintaining indexes requires extra time; as data increases, the time consumption also increases, slowing down data modification operations

B+ Tree

find 8
step1:  1 <= 8 < 18
step2:  6 <= 8 < 12
step3:  6, "8", 10
finish

              "1",        18,         36
             /          |
           /            |
         /              |
   1, "6", 12         18, 24, 30
  /    |    \         /    |   \
1      6    12       18   24      ..........
3     "8"    15      20   26
5     10    17       22   28

Why Not Red-Black Trees

B+ Tree has fewer search operations: The complexity of balanced trees is approximately equal to tree height log_d^N. Red-black trees have d = 2, while B+ Trees have d much larger than 2.

Index Classification

By Number of Indexed Fields

Single field: Single column index

Example: ALTER TABLE member ADD INDEX email_index (email);
Multiple fields: Composite index

Example: ALTER TABLE member ADD INDEX email_tel_index (email, tel);

By Primary Key Indexing

Primary Key Index: Indexing on the primary key
1. Indexes based on the primary key; if no primary key is set, MySQL automatically generates a hidden field as the primary key index
2. Every table has a primary key index
3. B+ Tree leaf nodes store complete data
Secondary Index (Auxiliary Index): Indexing on non-primary keys
1. Indexes based on non-primary keys
2. B+ Tree leaf nodes store primary key values
3. Querying involves two steps: 1) Find primary key value using secondary index, 2) Use primary key index to find complete data (called “table lookup”)
4. If all required query data is available from the secondary index, table lookup is not needed

By Physical Storage

Clustered Index: B+ Tree leaf nodes store complete data
Non-Clustered Index: B+ Tree leaf nodes store primary key values

Other Types

Unique Index
1. Values are unique, can be NULL
2. Checks for uniqueness during CREATE, UPDATE, and INSERT operations

Index Optimization

Covering Index Optimization: Obtain data from secondary index without table lookup
Auto-increment: During INSERT, no data movement is required, and B+ Tree space fragmentation is reduced
NOT NULL: NULL values require additional storage for null index records
Place more selective index columns on the left (Selectivity: ratio of unique index values to total records)

When Indexes Become Ineffective

B+ Trees sort indexes, so if WHERE conditions cannot obtain continuous range data, the index becomes ineffective

Left or left-right wildcard matching on indexed columns

-- Examples:
SELECT * FROM user WHERE name LIKE "%ron";
SELECT * FROM user WHERE name LIKE "%ro%";

Using functions on indexed columns

-- Example:
SELECT * FROM user WHERE LENGTH(name) = 6;

-- MySQL 8.0+ allows functional indexes
ALTER TABLE user ADD KEY index_name_length ((LENGTH(name)));

Performing calculations on indexed columns

-- Example:
SELECT * FROM user WHERE id + 1 = 6;

Implicit type conversion on indexed columns

-- Example:
SELECT * FROM user WHERE name = 123;

-- MySQL's type conversion rule converts string to number for comparison

Composite index not following leftmost matching principle

-- Example:
-- Index: (a, b, c)
WHERE a = 6 AND b = 6 AND c = 6  -- ✓ Uses index
WHERE a = 6 AND b = 6            -- ✓ Uses index
WHERE a = 6                      -- ✓ Uses index

WHERE b = 6 AND c = 6            -- ✗ Full table scan
WHERE b = 6                      -- ✗ Full table scan

OR conditions

WHERE condition1 OR condition2

-- If condition2 doesn't have an index, the entire query becomes ineffective
-- Both conditions need indexes for the query to use indexes effectively

Use Cases

Small tables: Full table scans are often more efficient than creating indexes
Very large tables: The cost of creating and maintaining indexes becomes too high (consider partitioning)
High selectivity columns: Indexes work best on columns with many unique values
Frequently queried columns: Columns used in WHERE, ORDER BY, and JOIN conditions

Best Practices

Create indexes on frequently queried columns
Use composite indexes for multi-column queries
Monitor index usage with EXPLAIN
Remove unused indexes
Consider covering indexes to avoid table lookups
Be mindful of index maintenance overhead

Understanding these principles will help you design efficient database schemas and write optimized queries that make the best use of MySQL’s indexing capabilities.