MySQL Indexes - Complete Guide
by Aaron • 11/22/2022, 8:34:12 AM
Table of Contents
- What is an Index
- Index Classification
- Index Disadvantages
- Index Optimization
- When Indexes Become Ineffective
- Use Cases
What is an Index
A data structure that helps storage engines quickly retrieve data. MySQL primarily uses B+ Tree structures.
Advantages and Disadvantages
Advantages:
- Reduces disk I/O operations, increasing data retrieval speed
- Unique indexes ensure data uniqueness
- GROUP BY and ORDER BY operations can significantly improve efficiency and reduce CPU consumption
Disadvantages:
- Requires additional storage space for index data structures
- Creating and maintaining indexes requires extra time; as data increases, the time consumption also increases, slowing down data modification operations
B+ Tree
find 8
step1: 1 <= 8 < 18
step2: 6 <= 8 < 12
step3: 6, "8", 10
finish
"1", 18, 36
/ |
/ |
/ |
1, "6", 12 18, 24, 30
/ | \ / | \
1 6 12 18 24 ..........
3 "8" 15 20 26
5 10 17 22 28
Why Not Red-Black Trees
- B+ Tree has fewer search operations: The complexity of balanced trees is approximately equal to tree height logdN. Red-black trees have d = 2, while B+ Trees have d much larger than 2.
Index Classification
By Number of Indexed Fields
-
Single field: Single column index
Example:
ALTER TABLE member ADD INDEX email_index (email);
-
Multiple fields: Composite index
Example:
ALTER TABLE member ADD INDEX email_tel_index (email, tel);
By Primary Key Indexing
-
Primary Key Index: Indexing on the primary key
- Indexes based on the primary key; if no primary key is set, MySQL automatically generates a hidden field as the primary key index
- Every table has a primary key index
- B+ Tree leaf nodes store complete data
-
Secondary Index (Auxiliary Index): Indexing on non-primary keys
- Indexes based on non-primary keys
- B+ Tree leaf nodes store primary key values
- Querying involves two steps: 1) Find primary key value using secondary index, 2) Use primary key index to find complete data (called “table lookup”)
- If all required query data is available from the secondary index, table lookup is not needed
By Physical Storage
- Clustered Index: B+ Tree leaf nodes store complete data
- Non-Clustered Index: B+ Tree leaf nodes store primary key values
Other Types
- Unique Index
- Values are unique, can be NULL
- Checks for uniqueness during CREATE, UPDATE, and INSERT operations
Index Optimization
- Covering Index Optimization: Obtain data from secondary index without table lookup
- Auto-increment: During INSERT, no data movement is required, and B+ Tree space fragmentation is reduced
- NOT NULL: NULL values require additional storage for null index records
- Place more selective index columns on the left (Selectivity: ratio of unique index values to total records)
When Indexes Become Ineffective
B+ Trees sort indexes, so if WHERE conditions cannot obtain continuous range data, the index becomes ineffective
-
Left or left-right wildcard matching on indexed columns
-- Examples: SELECT * FROM user WHERE name LIKE "%ron"; SELECT * FROM user WHERE name LIKE "%ro%";
-
Using functions on indexed columns
-- Example: SELECT * FROM user WHERE LENGTH(name) = 6; -- MySQL 8.0+ allows functional indexes ALTER TABLE user ADD KEY index_name_length ((LENGTH(name)));
-
Performing calculations on indexed columns
-- Example: SELECT * FROM user WHERE id + 1 = 6;
-
Implicit type conversion on indexed columns
-- Example: SELECT * FROM user WHERE name = 123; -- MySQL's type conversion rule converts string to number for comparison
-
Composite index not following leftmost matching principle
-- Example: -- Index: (a, b, c) WHERE a = 6 AND b = 6 AND c = 6 -- ✓ Uses index WHERE a = 6 AND b = 6 -- ✓ Uses index WHERE a = 6 -- ✓ Uses index WHERE b = 6 AND c = 6 -- ✗ Full table scan WHERE b = 6 -- ✗ Full table scan
-
OR conditions
WHERE condition1 OR condition2 -- If condition2 doesn't have an index, the entire query becomes ineffective -- Both conditions need indexes for the query to use indexes effectively
Use Cases
- Small tables: Full table scans are often more efficient than creating indexes
- Very large tables: The cost of creating and maintaining indexes becomes too high (consider partitioning)
- High selectivity columns: Indexes work best on columns with many unique values
- Frequently queried columns: Columns used in WHERE, ORDER BY, and JOIN conditions
Best Practices
- Create indexes on frequently queried columns
- Use composite indexes for multi-column queries
- Monitor index usage with EXPLAIN
- Remove unused indexes
- Consider covering indexes to avoid table lookups
- Be mindful of index maintenance overhead
Understanding these principles will help you design efficient database schemas and write optimized queries that make the best use of MySQL’s indexing capabilities.