current_mutex

Changelog entry:

Deadlock could occur when these four things happened at the same time:
1) An old dump thread was waiting for the binary log to grow.
2) The slave server that replicates from the old dump thread tried to
   reconnect. During reconnection, the new dump thread tried to kill the
   old dump thread.

5.1.46sp1 stack traces for a sample deadlock:
------------
thread 5148: insert
..
mysqld-debug.exe!MYSQL_BIN_LOG::purge_logs_before_date Line 3639
mysqld-debug.exe!MYSQL_BIN_LOG::rotate_and_purge Line 4527
Base Thread Start
waiting for pthread_mutex_lock(&LOCK_thread_count);
thread 2320 owns the mutex.
----------------
thread 2320: reconnecting slave doing COM_BINLOG_DUMP
mysqld-debug.exe!kill_zombie_dump_threads Line 1103
Base Thread Start
waiting for pthread_mutex_lock(&tmp->LOCK_thd_data); // Lock from delete
thread 5040 holds the mutex.

- KILL is a generic statement that probably has a reason to take locks in
  this order.
- It's probably necessary that kill_zombie_dump_threads takes
  LOCK_thread_count before old_dump_thd->LOCK_thd_data, because that
  prevents old_dump_thd from being deleted under the feet of
  kill_zombie_dump_threads.
- So the lock order should be:
  LOCK_thread_count, THD::LOCK_thd_data, LOCK_log

So it's the log rotation that violates the lock order, by taking LOCK_log
before LOCK_thread_count. And in fact, it seems unnecessary to block
creating or destroying threads just because we rotate the log.

We can fix the bug as follows:
- Associate a counter with each binlog file.
- Increase and decrease the counter as dump threads start or stop reading
  a log.
- In log_in_use, just read the counter instead of iterating through all
  active threads in the system.
- Enforce this lock order:
  LOCK_thread_count, THD::LOCK_thd_data, LOCK_log, LOCK_index

It seems that LOCK_index is always held anyway in all places where we need
to access the counters, so we are only removing complexity, not adding it.
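
The counter-based fix proposed in this report could look roughly like the sketch below. It is not the actual server code: BinlogIndex, start_reading, stop_reading and the lock_index_ member are hypothetical names, and std::mutex stands in for the server's pthread mutexes. The point it illustrates is that log_in_use only reads a per-file counter under the index lock, so log rotation never needs LOCK_thread_count.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for the binlog index; NOT MySQL's MYSQL_BIN_LOG.
class BinlogIndex {
public:
  // Called when a dump thread starts reading `file`.
  void start_reading(const std::string &file) {
    std::lock_guard<std::mutex> guard(lock_index_);
    ++readers_[file];
  }

  // Called when a dump thread stops reading `file`.
  void stop_reading(const std::string &file) {
    std::lock_guard<std::mutex> guard(lock_index_);
    auto it = readers_.find(file);
    // Every stop_reading must match an earlier start_reading.
    assert(it != readers_.end() && it->second > 0);
    if (--it->second == 0) readers_.erase(it);
  }

  // Counterpart of log_in_use: reads the counter only, and never
  // walks the list of active threads.
  bool log_in_use(const std::string &file) const {
    std::lock_guard<std::mutex> guard(lock_index_);
    auto it = readers_.find(file);
    return it != readers_.end() && it->second > 0;
  }

private:
  mutable std::mutex lock_index_;       // plays the role of LOCK_index
  std::map<std::string, int> readers_;  // file name -> active dump threads
};
```

In this sketch each method takes the index lock itself to keep the example self-contained. The report notes that the server already holds LOCK_index wherever the counters are needed, so a real patch would presumably rely on the caller holding it instead.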