51Testing软件测试论坛

标题: 千万级数据分区存储方案测试 [打印本页]

作者: lsekfe    时间: 2021-9-1 10:47
标题: 千万级数据分区存储方案测试
 场景
  基于项目中的业务场景,对数据量大,对时间比较敏感(比如历史数据基本不用,热点数据集中在近期)的数据采用分区存储。 本次测试流程主要是创建测试表、插入千万条数据(本次两千万),查看分区表执行情况。
  创建表
  1.  CREATE TABLE test (
  2.     id BIGINT(20) auto_increment,
  3.   area_id INT (8) NOT NULL DEFAULT 0,
  4.   create_time datetime NOT NULL ,
  5.   num INT (8) NOT NULL DEFAULT 0,
  6.   PRIMARY KEY (id, area_id, create_time)
  7.   ) ENGINE = INNODB DEFAULT CHARSET = utf8 PARTITION BY RANGE (TO_DAYS(create_time))(
  8.   PARTITION p0
  9.   VALUES
  10.   LESS THAN (TO_DAYS('2019-11-01')) ENGINE = INNODB,
  11.   PARTITION p20191101
  12.   VALUES
  13.   LESS THAN (TO_DAYS('2019-11-02')) ENGINE = INNODB,
  14.   PARTITION p20191102
  15.   VALUES
  16.   LESS THAN (TO_DAYS('2019-11-03')) ENGINE = INNODB,
  17.   PARTITION p20191103
  18.   VALUES
  19.   LESS THAN (TO_DAYS('2019-11-04')) ENGINE = INNODB,
  20.   PARTITION p20191104
  21.   VALUES
  22.   LESS THAN (TO_DAYS('2019-11-05')) ENGINE = INNODB,
  23.   PARTITION p20191105
  24.   VALUES
  25.   LESS THAN (TO_DAYS('2019-11-06')) ENGINE = INNODB,
  26.   PARTITION p20191106
  27.   VALUES
  28.   LESS THAN (TO_DAYS('2019-11-07')) ENGINE = INNODB,
  29.   PARTITION p20191107
  30.   VALUES
  31.   LESS THAN (TO_DAYS('2019-11-08')) ENGINE = INNODB,
  32.   PARTITION p20191108
  33.   VALUES
  34.   LESS THAN (TO_DAYS('2019-11-09')) ENGINE = INNODB,
  35.   PARTITION p20191109
  36.   VALUES
  37.   LESS THAN (TO_DAYS('2019-11-10')) ENGINE = INNODB
  38.   );
复制代码
创建存储过程
  1. DELIMITER //
  2.       create PROCEDURE insert_test(in num INT, in time varchar(10))
  3.       BEGIN
  4.           DECLARE rowid INT DEFAULT 0;
  5.           DECLARE create_time datetime;
  6.   DECLARE area_id INT;
  7.   SET @exedata = "";
  8.           WHILE rowid < num DO
  9.               SET create_time = (select str_to_date(CONCAT(time,' ',LPAD(FLOOR(0 + (RAND() * 23)),2,0),':',LPAD(FLOOR(0 + (RAND() * 59)),2,0),':',LPAD(FLOOR(0 + (RAND() * 59)),2,0)), "%Y-%m-%d %H:%i:%s") from dual);
  10.   SET area_id = (select FLOOR(910000 + (RAND() * 9)));
  11.   SET rowid = rowid + 1;
  12.               IF length(@exedata)>0 THEN
  13.               SET @exedata = CONCAT(@exedata,',');
  14.               END IF;
  15.               SET @exedata=concat(@exedata,"('",area_id,"','",create_time,"','",rowid,"')");
  16.               IF rowid%5000=0
  17.               THEN
  18.                   SET @exesql =concat("insert into test(area_id,create_time,num) values ", @exedata);
  19.                   prepare stmt from @exesql;
  20.                   execute stmt;
  21.                   DEALLOCATE prepare stmt;
  22.                   SET @exedata = "";
  23.               END IF;
  24.           END WHILE;
  25.           IF length(@exedata)>0
  26.           THEN
  27.               SET @exesql =concat("insert into test(area_id,create_time,num) values ", @exedata);
  28.               prepare stmt from @exesql;
  29.               execute stmt;
  30.               DEALLOCATE prepare stmt;
  31.           END IF;
  32.       END //
  33.   DELIMITER ;
复制代码
调用存储过程插入数据
  1.  CALL insert_test (1000000, '2019-10-31');
  2.   CALL insert_test (1000000, '2019-11-01');
  3.   CALL insert_test (1000000, '2019-11-02');
  4.   CALL insert_test (1000000, '2019-11-03');
  5.   CALL insert_test (1000000, '2019-11-04');
  6.   CALL insert_test (1000000, '2019-11-05');
  7.   CALL insert_test (1000000, '2019-11-06');
  8.   CALL insert_test (1000000, '2019-11-07');
  9.   CALL insert_test (1000000, '2019-11-08');
  10.   CALL insert_test (1000000, '2019-11-09');
复制代码
分析
  1、两千万数据下,根据日期按小时平均在1s左右;保持单分区数据不变情况下,后期sql性能影响基本不会太大。
  2、此执行计划为全表扫描,经测试后发现分区后where已经问题不大,后期可尝试将主键优化(去掉ID)后,执行计划将来到range级别;

  3、未分区表,同样数据走索引情况下,多次执行平均在2.5s左右。
  1.  select PARTITION_NAME as "分区",TABLE_ROWS as "行数" from information_schema.partitions where table_schema="test" and table_name="test";
  2.   explain partitions  select  *  from  test  where  area_id = 910005 and  create_time > '2019-11-02 00:00:00' and create_time < '2019-11-02 23:59:59';
  3.   explain
  4.   select avg(num) , DATE_FORMAT(create_time,'%Y%m%d%H') time   from  test  
  5.   where  area_id = 910005 and  create_time > '2019-11-06 00:00:00' and create_time < '2019-11-06 23:59:59'
  6.   group by time
复制代码
[attach]134199[/attach]



作者: 千里    时间: 2021-9-26 16:04
这个DBA可能会关注得更多




欢迎光临 51Testing软件测试论坛 (http://bbs.51testing.com/) Powered by Discuz! X3.2