如何测试pos的bug(02-分析慢查询导致主从宕机)

快鱼网 24 0

标签

mysql主从中断, 慢查询,慢查询太大,my_net_writ()

作者

Matthew

时间

2022-02-09

告警

出现告警!触发告警名: 192.168.32.79:MySQL IO thread stopped告警开始时间:2022.02.09 09:08:20告警状态: PROBLEM告警级别: 严重监控项值:Slave IO thread running (06_slave_192.168.32.79:mysql.slave.Slave_IO_Running): Connecting原始事件ID: 当前时间:2022.02.09 09:08:23查看错误日志

数据库异常的时间。022-02-09T08:42:12.+08:00 [Note] Aborted connection to db: 'finenter' user: 'finenter' host: '192.168.50.56' (Got an error reading communication packets)2022-02-09T01:07:34.Z mysqld_safe Number of processes running now: -02-09T01:07:34.Z mysqld_safe mysqld restarted2022-02-09T09:07:34.+08:00 0 [Warning] 'NO_ZERO_DATE', 'NO_ZERO_IN_DATE' and 'ERROR_FOR_DIVISION_BY_ZERO' sql modes should be used with strict mode. They will be merged with strict mode in a future release.2022-02-09T09:07:34.+08:00 0 [Note] --secure-file-priv is set to NULL. Operations related to importing and exporting data are disabled2022-02-09T09:07:34.+08:00 0 [Note] /usr/local/mysql/bin/mysqld (mysqld 5.7.16-log) starting as process ..这一段是mysql开始启动的sql2022-02-09T09:07:37.+08:00 0 [Note] InnoDB: Opened 3 undo tablespaces2022-02-09T09:07:37.+08:00 0 [Note] InnoDB: 3 undo tablespaces made active2022-02-09T09:07:37.+08:00 0 [Note] InnoDB: Highest supported file format is Barracuda.2022-02-09T09:07:37.+08:00 0 [Note] InnoDB: Log scan progressed past the checkpoint lsn -02-09T09:07:37.+08:00 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number -02-09T09:07:37.+08:00 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number -02-09T09:07:37.+08:00 0 [Note] InnoDB: Database was not shutdown normally!2022-02-09T09:07:37.+08:00 0 [Note] InnoDB: Starting crash recovery.2022-02-09T09:07:37.+08:00 0 [Note] InnoDB: Transaction was in the XA prepared state.2022-02-09T09:07:37.+08:00 0 [Note] InnoDB: Transaction was in the XA prepared state.2022-02-09T09:07:37.+08:00 0 [Note] InnoDB: Transaction was in the XA prepared state.2022-02-09T09:07:37.+08:00 0 [Note] InnoDB: Transaction was in the XA prepared state.2022-02-09T09:07:37.+08:00 0 [Note] InnoDB: 2 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo数据库启动完成2022-02-09T09:07:41.+08:00 0 [Note] /usr/local/mysql/bin/mysqld: ready for connections.Version: '5.7.16-log' socket: '/usr/local/mysql/mysql.sock' port: 3306 MySQL Community Server (GPL)2022-02-09T09:07:44.+08:00 47 [Note] Aborted connection 47 to db: 'unconnected' user: 'app_finenter' host: '192.168.50.116' (Got an error reading communication packets)2022-02-09T09:07:45.+08:00 75 [Note] Aborted connection 75 to db: 'unconnected' user: 'app_finenter' host: '192.168.50.115' (Got an error reading communication packets)2022-02-09T09:08:34.+08:00 259 [Note] Start binlog_dump to master_thread_id(259) slave_server(), pos(, 4)2022-02-09T09:08:34.+08:00 260 [Note] Start binlog_dump to master_thread_id(260) slave_server(3280), pos(, 4)2022-02-09T09:08:34.+08:00 261 [Note] Start binlog_dump to master_thread_id(261) slave_server(3279), pos(, 4)2022-02-09T09:08:34.+08:00 262 [Note] Start binlog_dump to master_thread_id(262) slave_server(3309), pos(, 4)2022-02-09T09:08:34.+08:00 264 [Note] Start binlog_dump to master_thread_id(264) slave_server(3217), pos(, 4)2022-02-09T09:08:34.+08:00 261 [Note] Aborted connection 261 to db: 'unconnected' user: 'repl' host: '192.168.32.79' (Failed on my_net_write())2022-02-09T09:08:34.+08:00 263 [Note] Start binlog_dump to master_thread_id(263) slave_server(3331), pos(, 4)2022-02-09T09:09:12.+08:00 0 [Note] InnoDB: Buffer pool(s) load completed at 9:09:12重启原因

用pt工具分析一个比较大的慢查询文件,资源不够用导致mysql异常重启了从库报错

Last_IO_Error: Fatal error: Failed to run 'after_read_event' hook分析源码

我们知道,mysqld是一个多线程的C/S架构的网络应用,因此少不了通过网络来读写数据,所以可能会出现写数据失败的情况。如果mysql的错误日志中出现此类错误,就说明是mysqld在向客户端发送网络包时失败导致的,当然,引申到复制场景,则说明是复制过程中,master向slave推送binlog时,写网络数据包失败。查看master_dump_thread 逻辑

需要确定master在send binlog失败的情况下退出dump thread的逻辑,根据错误日志提示,进入到相关的代码查看。错误代码在如下位置inline int Binlog_sender::send_packet(){ DBUG_ENTER("Binlog_sender::send_packet"); DBUG_PRINT("info", ("Sending event of type %s", Log_event::get_type_str( (Log_event_type)m_packet.ptr()[1 + EVENT_TYPE_OFFSET]))); // We should always use the same buffer to guarantee that the reallocation // logic is not broken. if (DBUG_EVALUATE_IF("simulate_send_error", true, my_net_write( m_thd->get_protocol_classic()->get_net(), (uchar*) m_packet.ptr(), m_packet.length()))) { set_unknow_error("Failed on my_net_write()"); DBUG_RETURN(1); }

调用关系

(gdb) bt#0 Binlog_sender::send_packet (this=0x7fead0) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:1158#1 0xf74e in Binlog_sender::send_packet_and_flush (this=0x7fead0) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:1182#2 0xe181 in Binlog_sender::send_heartbeat_event (this=0x7fead0, log_pos=504) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:1143#3 0xee01 in Binlog_sender::wait_with_heartbeat (this=0x7fead0, log_pos=504) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:633#4 0xecd7 in Binlog_sender::wait_new_events (this=0x7fead0, log_pos=504) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:599#5 0xe938 in Binlog_sender::get_binlog_end_pos (this=0x7fead0, log_cache=0x7fea) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:365#6 0xc5e0 in Binlog_sender::send_binlog (this=0x7fead0, log_cache=0x7fea, start_pos=123) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:313#7 0xc1b4 in Binlog_sender::run (this=0x7fead0) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:225结果层层返回到Binlog_sender::run

大致看下Binlog_sender::run的逻辑

void Binlog_sender::run(){ while (!has_error() && !m_thd->killed) { if (send_binlog(&log_cache, start_pos)) break; }}

解释到这里,大概就清楚了吧。

标签: binlog

抱歉,评论功能暂时关闭!