Peak performance MySQL tuning training for operators

by Aurimas Mikalauskas

29 May 2023

The focus of this course was to get the best possible performance out of MySQL.The target audience - DBAs, SREs and alike. It was presented by Aurimas Mikalauskas at the Vinted office some 5 years ago. Aurimas was working as a consultant for Percona at the time. I’m not sure if the course is still available though. I got to watch a recording of the session at Vinted.

Contents of the course

Why I watched it

Whenever I asked for recommendations on how to improve my technical skills our staff engineer would recommend me to watch the recording of this course. I agreed that it is a good idea, but kept postponing it. Then I switched domains, asked my new staff engineer how to improve my relevant technical skills - and got the same recommendation again.

How I watched it

For the better half of the course I watched it and took vigorous notes, pausing the course whenever I needed to write more things down. This approach cost me about an hour to get through 15 minutes of the recording - while there were more than 8 hours of the recording in total. At some point I got frustrated enough and switched to taking additional notes on the printouts of the slides. This made me feel a bit uncomfortable at first - like I wasn’t able to focus on the content as much. But eventually I got used to it.

After finishing the course I went through my notes, highlighting different things in different colors - the things that seemed interesting or useful in general, the things I would like to learn more about, the things I planned to mention in this post. So far this is more or less the same approach I usually take with reading technical books. However, there was one new thing I tried - discussing what I’ve read with Chat GPT. I asked about things like how concepts from different views (e.g. tablespaces vs B+ trees) relate to each other or requested additional resources for some topics.

What I found the most valuable

As the course seems to have been aimed at DBAs and SREs, some parts (e.g. hardware and OS, MySQL configuration) flew right over my head. However things like schema and query optimisation seem invaluable for any backend engineer. MySQL server architecture was interesting for my general curiosity.

I would like to have some of the learnings always at hand, so I’ll summarize them in this blog post. However I’m not sure if it is OK to share the information from the course, so I will focus on the things that are available in official MySQL documentation.

Usage of the EXPLAIN statement

I catch myself in a lot of self-conscious moments where someone from my colleagues says “a proper senior developer should definitely know X” and I panic that apparently I am not a proper senior developer yet. One of these X’es that stood out for me was using MySQL explain statements. I guess the author of the talk agrees with this sentiment as there was quite a bit of course dedicated to explaining the EXPLAIN statement. There is also quite a bit on the topic in the official MySQL documentation. I think the best way to store this for a future reference is by organizing it into a table (I’ve omitted some fields that seemed less valuable):

Output field Possible values Meaning
id EXPLAIN returns a row of information for each table used in the SELECT statement. It lists the tables in the output in the order that MySQL would read them while processing the statement.
type describes how tables are joined.
system The table has only one row (= system table)
const The table has at most one matching row, which is read at the start of the query. Because there is only one row, values from the column in this row can be regarded as constants by the rest of the optimizer
eq_ref One row is read from this table for each combination of rows from the previous tables. It is used when all parts of an index are used by the join and the index is a PRIMARY KEY or UNIQUE NOT NULL index.
ref Is used if the join uses only a leftmost prefix of the key or if the key is not a PRIMARY KEY or UNIQUE index (in other words, if the join cannot select a single row based on the key value). If the key that is used matches only a few rows, this is a good join type.
ref_or_null This join type is like ref, but with the addition that MySQL does an extra search for rows that contain NULL values.
index_merge Indicates that the Index Merge optimization is used.
range Only rows that are in a given range are retrieved, using an index to select the rows.
index The index tree is scanned. This occurs in two ways: if the index is a covering index for the queries only the index tree is scanned (in this case, the extra column says ‘Using index’); A full table scan is performed using reads from the index to look up data rows in index order (uses index does not appear in the extra column)
ALL A full table scan is done.
possible_keys Indicates the indexes from which MySQL can choose to find the rows in this table. If this column is NULL there are no relevant indexes.
key The key (index) that MySQL actually decided to use
key_len The key_len column indicates the length of the key that MySQL decided to use. The value of key_len enables you to determine how many parts of a multiple-part key MySQL actually uses.
rows The number of rows MySQL believes it must examine to execute the query (rough estimate)
extra This column contains additional information about how MySQL resolves the query. One example was given above when describing index join type.

Indices

A big part of the course was dedicated to how data is stored in InnoDB and why that is important when thinking about your schema and queries. As for the overall InnoDB architecture I think a picture speaks a thousand words and there is a good one in the official documentation.

The part I want to understand and remember the most are indices.

Clustered Index. InnoDB stores row data in the leaf nodes of a special B+ tree index called clustered index. The index is called “clustered” because the rows are sorted by the primary key values - rows with related index values are clustered together.

Let’s say I want to store the technical books I read in an InnoDB table. Here are the 5 most recent of them:

Id Tittle Author Rating
1 High Performance Browser Networking Ilya Grigorik 5
2 Fundamentals of Software Architecture Mark Richards 3
3 Database Reliability Engineering Charity Majors 3
4 Staff Engineer Will Larson 4
5 Kubernetes Up & Running Kelsey Hightower 5

To make things more interesting I would like to use the title as the primary key. The B+ tree (limited to 3 children per node) would look something like this:

Titles B+ tree index

The implications of storing data in such structure:

Secondary Indices. If I needed to add another index or few for my table (e.g. because there is a use case to also look up books by their rating or author) those would be secondary indexes. They would also be stored as B+ trees, but the leaf nodes would contain the value of the primary key, instead of the actual row data. The implications of this approach:

Compound indices. Both clustered and secondary indices can be composed of multiple columns. In such a case still a single B+ tree is used to store the index, but it is ordered by all the values comprising the index, starting from the leftmost one. This has some implications for the queries:

Other impressions

The most valuable thing I got out of this course is the feeling that I have some understanding about how MySQL works. Not having it was becoming an issue for me as everytime I had to do a database related task I would get anxious that maybe there is some big and important nuance I’m oblivious about. I feel like I would at least know the keywords worth looking into.

Book recommendations

I watched the course, took some notes and even summarized the most interesting things in this blog post. However, I would still like to have a good resource about MySQL query optimisation always at hand. I haven’t bought it yet, but “High Performance MySQL” looks like a good candidate for this as it is well rated (4.43/5 stars) and the last edition was published recently (2021) enough.

© 2024. All rights reserved.