51Testing软件测试论坛
标题: 按月分区的方法和性能比较 [打印本页]
作者: TimiZheng 时间: 2019-10-16 15:48
标题: 按月分区的方法和性能比较
按月分区方法的比较
1、事出有因
在生产系统中发现在高并发时,分区表常常被锁住,而记录的锁非常奇怪,是最早分区中的一个数据,而实际删除的时候都限制了分区字段的值,这个值是一个between A and B ,A B之间并不包括最早分区。innodb的行级锁是按索引来的,无论如何都不应该会将他的这行记录锁住。
说明存在问题:我们的分区可能有问题或者mysql本身的bug
以下主要针对分区方式的分区解析和行锁进行的测试。
2、表结构 -按月分区的两种方式,第一种是按照 日期/100 得到的结果进行分区
- CREATE TABLE `test_stock_history_1` (
- `rtime` bigint(20) unsigned NOT NULL,
- `oid` int(10) unsigned DEFAULT NULL,
- `fund_key` bigint(20) unsigned NOT NULL,
- `trans_date` date NOT NULL DEFAULT '1700-01-01',
- ......
- ......
- `createtime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
- `updatetime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
- `broker` int(10) DEFAULT NULL,
- `terminal` char(3) DEFAULT NULL,
- `dealway` char(3) DEFAULT NULL,
- `type1` char(10) DEFAULT NULL,
- `type2` char(20) DEFAULT NULL,
- UNIQUE KEY `idx_sh_fto` (`fund_key`,`trans_date`,`oid`) USING BTREE,
- KEY `idx_sh_fts` (`fund_key`,`trans_date`,`stock_code`) USING BTREE
- ) ENGINE=InnoDB DEFAULT CHARSET=utf8
- |
复制代码第二种分区是直接按照日期的方式取值范围进行分区:
- | test_stock_history_2 | CREATE TABLE `test_stock_history_2` (
- `rtime` bigint(20) unsigned NOT NULL,
- `oid` int(10) unsigned DEFAULT NULL,
- `fund_key` bigint(20) unsigned NOT NULL,
- `trans_date` date NOT NULL DEFAULT '1700-01-01',
- ......
- ......
- `createtime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
- `updatetime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
- `broker` int(10) DEFAULT NULL,
- `terminal` char(3) DEFAULT NULL,
- `dealway` char(3) DEFAULT NULL,
- `type1` char(10) DEFAULT NULL,
- `type2` char(20) DEFAULT NULL,
- UNIQUE KEY `idx_sh_fto` (`fund_key`,`trans_date`,`oid`) USING BTREE,
- KEY `idx_sh_fts` (`fund_key`,`trans_date`,`stock_code`) USING BTREE
- ) ENGINE=InnoDB DEFAULT CHARSET=utf8
- |
复制代码
以上两种表,除了分区方式不同,其他字段、索引均相同,每个分区中存放的数据也相同
3、按照索引的执行计划解释
- explain partitions (select * from test_stock_history_1 where fund_key=3100120 and trans_date >= '2016-01-01' and trans_date<= '2016-01-03' );
- +----+-------------+----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+-----------------------+------------+---------+------+------+-------------+
- | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
- +----+-------------+----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+-----------------------+------------+---------+------+------+-------------+
- | 1 | SIMPLE | test_stock_history_1 | p_201407,p_201408,p_201409,p_201410,p_201411,p_201412,p_201501,p_201502,p_201503,p_201504,p_201505,p_201506,p_201507,p_201508,p_201509,p_201510,p_201511,p_201512,p_201601,p_201602,p_201603,p_201604,p_201605,p_201606,p_201607,p_201608,p_201609,p_201610,p_201611,p_201612,p_catchall | range | idx_sh_fto,idx_sh_fts | idx_sh_fto | 11 | NULL | 10 | Using where |
- +----+-------------+----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+-----------------------+------------+---------+------+------+-------------+
- 发现是所有分区。
- mysql> explain partitions (select * from test_stock_history_2 where fund_key=3100120 and trans_date >= '2016-01-01' and trans_date<= '2016-01-03' );
- +----+-------------+----------------------+------------+-------+-----------------------+------------+---------+------+------+-------------+
- | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
- +----+-------------+----------------------+------------+-------+-----------------------+------------+---------+------+------+-------------+
- | 1 | SIMPLE | test_stock_history_2 | p_201601 | range | idx_sh_fto,idx_sh_fts | idx_sh_fto | 11 | NULL | 1 | Using where |
- +----+-------------+----------------------+------------+-------+-----------------------+------------+---------+------+------+-------------+
复制代码 2、查询解释
尽管上面的分区表解释不同,但是他们的执行计划解释是完全相同的
这种分区方式看起来能够定位具体的分区,看起来好像比较好,需要进行测试。
两个表同时导入3千万数据(loaddata) ,花费时间几乎一致。
删除
3、事务锁的情况
3.1、表一方式的锁
从监控的结果看,是锁住了一条毫无关系的一行记录。并且在实验环境下这种情况是完全可重现的
- 执行事务1
- start transaction;
- delete from test_stock_history_1 where fund_key = 3100190 and trans_date between '2016-05-01' and '2016-05-20';
- 执行事务2
- start transaction;
- delete from test_stock_history_1 where fund_key = 3100190 and trans_date between '2016-04-01' and '2016-04-20';
复制代码
原来个人认为这两条记录毫无关系,按照innodb的行锁规则,1和2无关,2是不会被锁的,但是实际执行中被锁住了。
- 检查事务和锁表的情况:
- select * from innodb_trx \G
- *************************** 1. row ***************************
- trx_id: 2798703
- trx_state: LOCK WAIT
- trx_started: 2016-06-27 18:36:17
- trx_requested_lock_id: 2798703:328:17:459
- trx_wait_started: 2016-06-27 18:36:17
- trx_weight: 2
- trx_mysql_thread_id: 2
- trx_query: delete from test_stock_history_1 where fund_key = 3100190 and trans_date between '2016-04-01' and '2016-04-20'
- trx_operation_state: starting index read
- trx_tables_in_use: 31
- trx_tables_locked: 31
- trx_lock_structs: 2
- trx_lock_memory_bytes: 360
- trx_rows_locked: 1
- trx_rows_modified: 0
- trx_concurrency_tickets: 0
- trx_isolation_level: REPEATABLE READ
- trx_unique_checks: 1
- trx_foreign_key_checks: 1
- trx_last_foreign_key_error: NULL
- trx_adaptive_hash_latched: 0
- trx_adaptive_hash_timeout: 10000
- trx_is_read_only: 0
- trx_autocommit_non_locking: 0
- *************************** 2. row ***************************
- trx_id: 2798701
- trx_state: RUNNING
- trx_started: 2016-06-27 18:35:01
- trx_requested_lock_id: NULL
- trx_wait_started: NULL
- trx_weight: 107
- trx_mysql_thread_id: 3
- trx_query: NULL
- trx_operation_state: NULL
- trx_tables_in_use: 0
- trx_tables_locked: 0
- trx_lock_structs: 92
- trx_lock_memory_bytes: 13864
- trx_rows_locked: 84
- trx_rows_modified: 15
- trx_concurrency_tickets: 0
- trx_isolation_level: REPEATABLE READ
- trx_unique_checks: 1
- trx_foreign_key_checks: 1
- trx_last_foreign_key_error: NULL
- trx_adaptive_hash_latched: 0
- trx_adaptive_hash_timeout: 10000
- trx_is_read_only: 0
- trx_autocommit_non_locking: 0
- 2 rows in set (0.00 sec)
复制代码
- mysql> select * from innodb_locks \G
- *************************** 1. row ***************************
- lock_id: 2798703:328:17:459
- lock_trx_id: 2798703
- lock_mode: X
- lock_type: RECORD
- lock_table: `test`.`test_stock_history_1`
- lock_index: idx_sh_fto
- lock_space: 328
- lock_page: 17
- lock_rec: 459
- lock_data: 3100310, 1031394, 0
- *************************** 2. row ***************************
- lock_id: 2798701:328:17:459
- lock_trx_id: 2798701
- lock_mode: X
- lock_type: RECORD
- lock_table: `test`.`test_stock_history_1`
- lock_index: idx_sh_fto
- lock_space: 328
- lock_page: 17
- lock_rec: 459
- lock_data: 3100310, 1031394, 0
- 2 rows in set (0.00 sec)
复制代码
文章来源:沧海大声啸的博客作者:Kervin(博为峰网校讲师)
作者: TimiZheng 时间: 2019-10-16 15:51
- mysql> select * from innodb_lock_waits \G
- *************************** 1. row ***************************
- requesting_trx_id: 2798703
- requested_lock_id: 2798703:328:17:459
- blocking_trx_id: 2798701
- blocking_lock_id: 2798701:328:17:459
- 1 row in set (0.00 sec)
复制代码
从上面的结果看被锁住的记录在p_201407分区,这是第一个分区
索引对应的值是:3100310, 1031394, 0
但删除的时候跟这个毫无关系,无论fund_key还是分区。
- mysql> select fund_key,trans_date,oid from test.test_stock_history_1 limit 1;
- +----------+------------+------+
- | fund_key | trans_date | oid |
- +----------+------------+------+
- | 3100310 | 2014-07-02 | 0 |
- +----------+------------+------+
复制代码
出来就是这个被锁住的记录
作者: TimiZheng 时间: 2019-10-16 15:54
怀疑:如果无论是谁删除锁住的都是这行记录,那岂不是变成了全表锁是一个道理了?
验证:继续按事务1锁住,执行事务3:
mysql> delete From test_stock_history_1 where fund_key =4414000 and trans_date between '2016-03-01' and '2016-03-31';
Query OK, 18 rows affected (0.00 sec)
能够非常快的
再开窗口,继续执行事务4
mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)
mysql> delete From test_stock_history_1 where fund_key =4414000 and trans_date between '2016-03-01' and '2016-03-31';
事务4被阻塞了,查看锁:
- mysql> select * from innodb_trx \G
- *************************** 1. row ***************************
- trx_id: 2798734
- trx_state: LOCK WAIT
- trx_started: 2016-06-27 19:12:46
- trx_requested_lock_id: 2798734:328:2219:258
- trx_wait_started: 2016-06-27 19:12:46
- trx_weight: 2
- trx_mysql_thread_id: 7
- trx_query: delete From test_stock_history_1 where fund_key =4414000 and trans_date between '2016-03-01' and '2016-03-31'
- trx_operation_state: starting index read
- trx_tables_in_use: 31
- trx_tables_locked: 31
- trx_lock_structs: 2
- trx_lock_memory_bytes: 360
- trx_rows_locked: 1
- trx_rows_modified: 0
- trx_concurrency_tickets: 0
- trx_isolation_level: REPEATABLE READ
- trx_unique_checks: 1
- trx_foreign_key_checks: 1
- trx_last_foreign_key_error: NULL
- trx_adaptive_hash_latched: 0
- trx_adaptive_hash_timeout: 10000
- trx_is_read_only: 0
- trx_autocommit_non_locking: 0
- *************************** 2. row ***************************
- trx_id: 2798733
- trx_state: RUNNING
- trx_started: 2016-06-27 19:11:33
- trx_requested_lock_id: NULL
- trx_wait_started: NULL
- trx_weight: 91
- trx_mysql_thread_id: 2
- trx_query: NULL
- trx_operation_state: NULL
- trx_tables_in_use: 0
- trx_tables_locked: 0
- trx_lock_structs: 86
- trx_lock_memory_bytes: 13864
- trx_rows_locked: 64
- trx_rows_modified: 5
- trx_concurrency_tickets: 0
- trx_isolation_level: REPEATABLE READ
- trx_unique_checks: 1
- trx_foreign_key_checks: 1
- trx_last_foreign_key_error: NULL
- trx_adaptive_hash_latched: 0
- trx_adaptive_hash_timeout: 10000
- trx_is_read_only: 0
- trx_autocommit_non_locking: 0
- *************************** 3. row ***************************
- trx_id: 2798725
- trx_state: RUNNING
- trx_started: 2016-06-27 18:49:50
- trx_requested_lock_id: NULL
- trx_wait_started: NULL
- trx_weight: 101
- trx_mysql_thread_id: 3
- trx_query: NULL
- trx_operation_state: NULL
- trx_tables_in_use: 0
- trx_tables_locked: 0
- trx_lock_structs: 86
- trx_lock_memory_bytes: 13864
- trx_rows_locked: 84
- trx_rows_modified: 15
- trx_concurrency_tickets: 0
- trx_isolation_level: REPEATABLE READ
- trx_unique_checks: 1
- trx_foreign_key_checks: 1
- trx_last_foreign_key_error: NULL
- trx_adaptive_hash_latched: 0
- trx_adaptive_hash_timeout: 10000
- trx_is_read_only: 0
- trx_autocommit_non_locking: 0
- 3 rows in set (0.00 sec)
复制代码
- mysql> select * from innodb_locks \G
- *************************** 1. row ***************************
- lock_id: 2798734:328:2219:258
- lock_trx_id: 2798734
- lock_mode: X
- lock_type: RECORD
- lock_table: `test`.`test_stock_history_1`
- lock_index: idx_sh_fto
- lock_space: 328
- lock_page: 2219
- lock_rec: 258
- lock_data: 4414290, 1031393, 0
- *************************** 2. row ***************************
- lock_id: 2798733:328:2219:258
- lock_trx_id: 2798733
- lock_mode: X
- lock_type: RECORD
- lock_table: `test`.`test_stock_history_1`
- lock_index: idx_sh_fto
- lock_space: 328
- lock_page: 2219
- lock_rec: 258
- lock_data: 4414290, 1031393, 0
- 2 rows in set (0.00 sec)
- mysql> select * from innodb_lock_waits \G
- *************************** 1. row ***************************
- requesting_trx_id: 2798734
- requested_lock_id: 2798734:328:2219:258
- blocking_trx_id: 2798733
- blocking_lock_id: 2798733:328:2219:258
- 1 row in set (0.00 sec)
复制代码
从这个上面看,他锁住的记录同样和我们操作的记录完全无关 fund_key= 4414290
那我们就直接删除4414290 的记录看看
mysql> delete From test_stock_history_1 where fund_key = 4414290 and trans_date between '2014-07-01' and '2014-07-31';
这条也是被锁住
所以这些信息并没有欺骗我们,这行是真的被锁住了。
总结:从表一上相关的锁情况看,分区字段进行了一次计算之后,锁已经乱了,这可能是个bug,也可能是其他原因,我并没有找出他的原因所在。(如果有兄弟能查出原因,希望能告知我一声)
只从结果上,锁住的行范围变了,可能会有多种fund_key 对应锁到同一行了,在我们系统中高并发时可能看到这样的锁监控记录。
作者: TimiZheng 时间: 2019-10-16 15:57
3.2 表二的事务锁情况
表二的数据结构和数据完全一样,但是分区方式不同。
做同样的事务操作分析
事务1
mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)
- mysql> delete from test_stock_history_2 where fund_key = 3100190 and trans_date between '2016-05-01' and '2016-05-20';
- Query OK, 15 rows affected (0.00 sec)
复制代码
- 事务2
- mysql> start transaction;
- Query OK, 0 rows affected (0.00 sec)
- mysql> delete from test_stock_history_2 where fund_key = 3100190 and trans_date between '2016-04-01' and '2016-04-20';
- Query OK, 11 rows affected (0.00 sec)
复制代码
事务1和事务2之间,因为日期不同,按照行锁的规则,他们并不会互斥,所以事务2非常顺利的执行下去了
继续在事务2下执行
- mysql> delete from test_stock_history_2 where fund_key = 3100190 and trans_date between '2016-05-01' and '2016-05-20';
复制代码
作者: TimiZheng 时间: 2019-10-16 16:00
这条就被锁住了
- mysql> select * from innodb_trx \G
- *************************** 1. row ***************************
- trx_id: 2798760
- trx_state: LOCK WAIT
- trx_started: 2016-06-27 19:39:57
- trx_requested_lock_id: 2798760:381:18:2
- trx_wait_started: 2016-06-27 19:41:27
- trx_weight: 22
- trx_mysql_thread_id: 2
- trx_query: delete from test_stock_history_2 where fund_key = 3100190 and trans_date between '2016-05-01' and '2016-05-20'
- trx_operation_state: starting index read
- trx_tables_in_use: 1
- trx_tables_locked: 1
- trx_lock_structs: 11
- trx_lock_memory_bytes: 2936
- trx_rows_locked: 25
- trx_rows_modified: 11
- trx_concurrency_tickets: 0
- trx_isolation_level: REPEATABLE READ
- trx_unique_checks: 1
- trx_foreign_key_checks: 1
- trx_last_foreign_key_error: NULL
- trx_adaptive_hash_latched: 0
- trx_adaptive_hash_timeout: 10000
- trx_is_read_only: 0
- trx_autocommit_non_locking: 0
- *************************** 2. row ***************************
- trx_id: 2798759
- trx_state: RUNNING
- trx_started: 2016-06-27 19:39:46
- trx_requested_lock_id: NULL
- trx_wait_started: NULL
- trx_weight: 25
- trx_mysql_thread_id: 3
- trx_query: NULL
- trx_operation_state: NULL
- trx_tables_in_use: 0
- trx_tables_locked: 0
- trx_lock_structs: 10
- trx_lock_memory_bytes: 2936
- trx_rows_locked: 32
- trx_rows_modified: 15
- trx_concurrency_tickets: 0
- trx_isolation_level: REPEATABLE READ
- trx_unique_checks: 1
- trx_foreign_key_checks: 1
- trx_last_foreign_key_error: NULL
- trx_adaptive_hash_latched: 0
- trx_adaptive_hash_timeout: 10000
- trx_is_read_only: 0
- trx_autocommit_non_locking: 0
- 2 rows in set (0.00 sec)
复制代码- mysql> select * from innodb_locks \G
- *************************** 1. row ***************************
- lock_id: 2798760:381:18:2
- lock_trx_id: 2798760
- lock_mode: X
- lock_type: RECORD
- lock_table: `test`.`test_stock_history_2`
- lock_index: idx_sh_fto
- lock_space: 381
- lock_page: 18
- lock_rec: 2
- lock_data: 3100190, 1032357, 0
- *************************** 2. row ***************************
- lock_id: 2798759:381:18:2
- lock_trx_id: 2798759
- lock_mode: X
- lock_type: RECORD
- lock_table: `test`.`test_stock_history_2`
- lock_index: idx_sh_fto
- lock_space: 381
- lock_page: 18
- lock_rec: 2
- lock_data: 3100190, 1032357, 0
- 2 rows in set (0.00 sec)
复制代码- mysql> select * from innodb_lock_waits \G
- *************************** 1. row ***************************
- requesting_trx_id: 2798760
- requested_lock_id: 2798760:381:18:2
- blocking_trx_id: 2798759
- blocking_lock_id: 2798759:381:18:2
- 1 row in set (0.00 sec)
复制代码
从上面看他锁住的行信息, lock_data: 3100190, 1032357, 0
这个就是正确的。
插入和删除测试:
插入性能基本相同
删除测试:
作者: TimiZheng 时间: 2019-10-16 16:02
删除一次之后,已经没有数据了,再进行一次删除,因为已经全部没有数据,所以这个时候其实都是空的
测试删除的语句delete from table where fund_key between A and B ;
测试结果:
表1 :
当只有一个进程时37.5秒
1467193557.537469388:start execute delete /root/parttest/sqldir/delesql0.sql
1467193595.066862604:end execute delete /root/parttest/sqldir/delesql0.sql. execute time = 37529
同时两个进程,42秒
1467193905.955952753:end execute delete /root/parttest/sqldir/delesql1.sql. execute time = 42855
1467193907.442418878:end execute delete /root/parttest/sqldir/delesql0.sql. execute time = 44342
同时3个进程47.8秒
1467200320.559988788:end execute delete /root/parttest/sqldir/delesql1.sql. execute time = 47822
1467200320.559988144:end execute delete /root/parttest/sqldir/delesql2.sql. execute time = 47822
1467200320.560782711:end execute delete /root/parttest/sqldir/delesql0.sql. execute time = 47823
同时10个进程,需要53.9
同时50个进程,需要74.8秒
发现进程数量越多,花费的时间越长,完全出乎我们的意料
这个结果完全超出我们的医疗。
当50个进程的时候,查锁记录情况,查到48个记录。
- mysql> select * from information_schema.innodb_locks \G
- *************************** 1. row ***************************
- lock_id: 5311932:328:17:459
- lock_trx_id: 5311932
- lock_mode: X
- lock_type: RECORD
- lock_table: `test`.`test_stock_history_1`
- lock_index: idx_sh_fto
- lock_space: 328
- lock_page: 17
- lock_rec: 459
- lock_data: 3100310, 1031394, 0
- *************************** 2. row ***************************
- lock_id: 5311931:328:17:459
- lock_trx_id: 5311931
- lock_mode: X
- lock_type: RECORD
- lock_table: `test`.`test_stock_history_1`
- lock_index: idx_sh_fto
- lock_space: 328
- lock_page: 17
- lock_rec: 459
- lock_data: 3100310, 1031394, 0
- *************************** 3. row ***************************
- lock_id: 5311930:328:17:459
- lock_trx_id: 5311930
- lock_mode: X
- lock_type: RECORD
- lock_table: `test`.`test_stock_history_1`
- lock_index: idx_sh_fto
- lock_space: 328
- lock_page: 17
- lock_rec: 459
- lock_data: 3100310, 1031394, 0
- *************************** 4. row ***************************
- 总共48条锁记录
复制代码
4、结论
从上面的结果来看,对分区字段进行计算后作为分区字段的方式,会有弊端,只要出现2个方面:
1 是当条件是between 或者> < 组成的,那么分区信息就会错误。
2 行锁位置出现了问题,可能导致更大范围的锁,最终影响到高并发时的性能。
所以建议尽可能的使用原始值进行分区。
欢迎光临 51Testing软件测试论坛 (http://bbs.51testing.com/) |
Powered by Discuz! X3.2 |