MongoDB Sharding: Shard Key Testing

The Shard Key

  1. A chunk is the smallest unit of storage in sharding; it is a contiguous set of documents.
  2. A shard holds many chunks, and a chunk lives on exactly one shard.
  3. Data is partitioned into the chunks on each shard according to the shard key. For example, {{"Wid":10000} -->> {"Wid":10010}} is one chunk, and Wid is the shard key, i.e. a field of the collection (see the sketch after this list).
  4. Queries are routed by the same field, so it plays the role of the partition key in a relational split-database/split-table setup.
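
A minimal mongo-shell sketch of the idea (the database and collection names here are made up; Wid is the example field from item 3):

// run on a mongos, assuming the current database is mydb
sh.enableSharding("mydb")
db.mycoll.ensureIndex({ Wid: 1 })               // the shard key must be backed by an index
sh.shardCollection("mydb.mycoll", { Wid: 1 })
// mongos now routes this query to whichever chunk covers Wid = 10005,
// e.g. the chunk {"Wid":10000} -->> {"Wid":10010}
db.mycoll.find({ Wid: 10005 })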

Requirements

  1. Query locality: each query should only need to hit a single shard to fetch its data, which keeps reads fast.
  2. Data balance: data should be spread evenly across the machines; if it concentrates on one machine, that machine becomes a read/write bottleneck and drags down overall speed.

    To achieve both goals, the shard key has to be chosen well.

Notes and Principles

  1. The shard key must be backed by an index, i.e. there is an index on the shard-key fields.
  2. The shard key is usually made up of two fields, the first coarse-grained and the second finer-grained, i.e. a compound shard key.

Choosing a Shard Key

How to choose:

Low-cardinality shard key: if a shard key has only N distinct values, there can be at most N chunks and therefore at most N shards. As the data volume grows, you end up with very large chunks that cannot be split. If the reason for wanting a low-cardinality key is that you need to query heavily on that field, use a compound shard key and make sure the second field has a large number of distinct values.

Random shard key: a random key (e.g. an MD5 hash) will eventually spread the chunks evenly across the shards. This is often assumed to be a good choice because it fixes the write-concentration problem of an ascending key, but the very randomness means successive reads are likely to land on different chunks, so data is constantly paged from disk into memory and disk I/O becomes the bottleneck.
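
Incidentally, MongoDB's built-in hashed shard key gives the same behaviour as the MD5-style random key described above without maintaining the hash yourself; a minimal sketch (the collection name is made up):

// writes spread evenly across shards, but reads and range queries on the key become scatter-gather
sh.shardCollection("mydb.events", { _id: "hashed" })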

Ascending shard key: the benefit of an ascending key is data locality. The newest data sits together, and since most applications access new data far more often than old data, the working set stays in memory and read performance is good. Typical keys of this kind are timestamps, dates, ObjectId, and auto-increment primary keys (e.g. data imported from SQL Server). The downside is that new documents are always inserted into the "last" chunk (shard), which creates a single, unsplittable hotspot: writes are not distributed.

Coarsely-ascending key plus search key: implemented with a compound shard key like {coarselyAscending:1, search:1}, where each value of coarselyAscending should ideally correspond to a few dozen to a few hundred chunks, and search should be a field the application routinely queries by.

     Note: the search field must not itself be ascending, otherwise the whole compound key degenerates into an ascending key. It should be non-ascending, fairly randomly distributed, and of reasonable cardinality.
     The first field of the compound key provides data locality; the second provides query isolation.

At first there is a single big chunk covering ((-∞,-∞),(∞,∞)). When it fills up, MongoDB automatically splits it into two, for example:

 ((-∞,-∞),("2012-07","susan"))

 [("2012-07","susan"),(∞,∞))

Suppose it is still July. Writes are then spread evenly across the two chunks: data with a username below "susan" goes to chunk 1, data above "susan" goes to chunk 2, and the whole thing runs smoothly. When August arrives, MongoDB starts creating 2012-08 chunks and the distribution stays balanced (not instantaneously; there is a smoothing-out period). By September the July data is no longer being accessed, so it falls out of memory and stops consuming resources.
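
A sketch of this setup in the shell, with made-up database and collection names (a user_events collection keyed on {date, username} as in the example above); the resulting chunk ranges can be read back from the config database:

// run on a mongos, assuming the current database is mydb
db.user_events.ensureIndex({ date: 1, username: 1 })
sh.shardCollection("mydb.user_events", { date: 1, username: 1 })
// list the chunk ranges in order; after the first split you would see
// a boundary like { date: "2012-07", username: "susan" }
db.getSiblingDB("config").chunks.find(
    { ns: "mydb.user_events" },
    { min: 1, max: 1, shard: 1, _id: 0 }
).sort({ min: 1 })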

Testing

Inserts

As described above, setting the right shard key before inserting the data matters a great deal for the queries that follow!

  • Writing the playback-stall-rate data into the MongoDB cluster. The raw record currently lives in Redis:
10.1.3.22:10100[1]> get "201603240728|167838166|liaoning|unicom|-"
"{\"totalCount\":\"10\",\"user_count\":\"8\",\"bufCnt_not_none\":0,\"buf_user_Count\":0,\"bufCnt_sum\":0}"

Analysis: a single-field shard key runs into the following problems here:

  1. New data is accessed more often than old data -- a timestamp, date, ObjectId or auto-increment primary key would suit -- but these do not distribute writes.
  2. Chunks should be spread evenly across the shards -- an MD5-style key would do that -- but it keeps pulling data from disk into memory, and disk I/O is slow.

    So we use a compound shard key, i.e. {coarselyAscending:1, search:1}:

    1) Take the day part of the time string as the coarselyAscending value, i.e. $time_day = substr("201603240728", 0, 8)

    2) We also need something GUID-like and random that is frequently queried. No such field exists yet, so either take the MD5 of 167838166|liaoning|unicom, or simply use the loc value directly.

So:

// create the index backing the shard key, then shard the collection on it
db.apm_web_kpi.ensureIndex({'time_day': 1, 'loc': 1})
db.runCommand( { shardcollection : "kstat.apm_web_kpi", key:{"time_day":1, "loc":1}});
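
For illustration, a document built from the Redis record shown earlier would carry the two shard-key fields like this (a sketch; storing the metrics flat like this is an assumption, the actual layout is up to the Apm_Web_Kpi model):

// shard-key fields derived from the Redis key "201603240728|167838166|liaoning|unicom|-"
db.apm_web_kpi.insert({
    time_day: "20160324",          // substr of the time string -- coarsely ascending
    loc: "liaoning",               // the search field
    totalCount: 10, user_count: 8,
    bufCnt_not_none: 0, buf_user_Count: 0, bufCnt_sum: 0
})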

Timing: the log data was originally written to Redis in 0.058 s (58 ms). With the MongoDB insert code added, the total time is at least 0.095 s (95 ms), an increase of roughly 30-40 ms; occasionally it reaches 0.507 s, i.e. a worst-case increase of roughly 0.45 s.

Inserting the data is also fast:

[root@ ...Stream]# cat /data/logs/FanxingStream-access_20160623123*.log|grep apm|awk '{print $19}' | awk '{sum+=$1} END {print "Average = ", sum/NR}'
Average =  0.0636
[root@ ...Stream]# cat /data/logs/FanxingStream-access_20160623153*.log|grep apm|awk '{print $19}' | awk '{sum+=$1} END {print "Average = ", sum/NR}'
Average =  0.2406

#Later we found that part of the slowness was caused by a connectivity problem with the shard on 164. After lowering that shard's weight (or not connecting to 164 at all), the speed improved dramatically (about 30 ms on average) ---- possibly as fast as saving to Redis!!
cat /data/logs/FanxingStream-access_20160623172*.log|grep apm|awk '{print $19}' | awk '{sum+=$1} END {print "Average = ", sum/NR}'
Average =  0.0908

There are three machines and three query entry points (mongos); a connection is chosen at random according to its weight, so inserts are load-balanced across them.

<?php
/*
 * Added 2016-03-22
 */
$d = '/data/service/';
ini_set('display_errors','On');
error_reporting(E_ERROR | E_WARNING | E_PARSE);
//on the production server this is
include(__DIR__ . '/../kstat/config/config.php');
require_once ROOT_DIR . '/apps/Autoloader.php';
apps\Autoloader::register();
$t1 = microtime(true);
require_once dirname(__FILE__) . '/../lib/Shanty-Mongo/vendor/autoload.php';
set_include_path(implode(PATH_SEPARATOR, array(get_include_path())));
define('TESTS_SHANTY_MONGO_DB', 'kstat');
//addConnections
$connections = array(
        'masters' => array(
                0 => array('host' => '10.1.32.164', 'port' => '20000', 'database' => TESTS_SHANTY_MONGO_DB),
                1 => array('host' => '10.1.1.185', 'port' => '20000', 'database' => TESTS_SHANTY_MONGO_DB),
                2 => array('host' => '10.1.1.81', 'port' => '20000', 'database' => TESTS_SHANTY_MONGO_DB),
        ),
//if replication is configured, add 'slaves' => array(....) here for queries
);
$data = .... //data omitted
Shanty_Mongo::addConnections($connections);
\apps\mongo\models\Apm_Web_Kpi::insertBatch($data, array('w'=>true));

$t2 = microtime(true);
echo "\n".'Elapsed '.round($t2-$t1,3).' s';

Use printShardingStatus(db.getSisterDB("config"),1); to view the shard distribution:

       {  "_id" : "kstat",  "primary" : "shard2",  "partitioned" : true }
                kstat.apm_web_kpi
                        shard key: { "time_day" : 1, "loc" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                shard1  10
                                shard2  11
                                shard3  10
                        { "time_day" : { "$minKey" : 1 }, "loc" : { "$minKey" : 1 } } -->> { "time_day" : "20160622", "loc" : "-" } on : shard1 Timestamp(2, 0) 
                        { "time_day" : "20160622", "loc" : "-" } -->> { "time_day" : "20160622", "loc" : "beijing" } on : shard3 Timestamp(3, 0) 
                        { "time_day" : "20160622", "loc" : "beijing" } -->> { "time_day" : "20160622", "loc" : "chongqing" } on : shard1 Timestamp(4, 0) 
                        { "time_day" : "20160622", "loc" : "chongqing" } -->> { "time_day" : "20160622", "loc" : "fujian" } on : shard3 Timestamp(5, 0) 
                        { "time_day" : "20160622", "loc" : "fujian" } -->> { "time_day" : "20160622", "loc" : "gansu" } on : shard1 Timestamp(6, 0) 
                        { "time_day" : "20160622", "loc" : "gansu" } -->> { "time_day" : "20160622", "loc" : "guangdong" } on : shard3 Timestamp(7, 0) 
                        { "time_day" : "20160622", "loc" : "guangdong" } -->> { "time_day" : "20160622", "loc" : "guangxi" } on : shard1 Timestamp(8, 0) 
                        { "time_day" : "20160622", "loc" : "guangxi" } -->> { "time_day" : "20160622", "loc" : "guizhou" } on : shard3 Timestamp(9, 0) 
                        { "time_day" : "20160622", "loc" : "guizhou" } -->> { "time_day" : "20160622", "loc" : "hainan" } on : shard1 Timestamp(10, 0) 
                        { "time_day" : "20160622", "loc" : "hainan" } -->> { "time_day" : "20160622", "loc" : "hebei" } on : shard3 Timestamp(11, 0) 
                        { "time_day" : "20160622", "loc" : "hebei" } -->> { "time_day" : "20160622", "loc" : "heilongjiang" } on : shard1 Timestamp(12, 0) 
                        { "time_day" : "20160622", "loc" : "heilongjiang" } -->> { "time_day" : "20160622", "loc" : "henan" } on : shard3 Timestamp(13, 0) 
                        { "time_day" : "20160622", "loc" : "henan" } -->> { "time_day" : "20160622", "loc" : "hubei" } on : shard1 Timestamp(14, 0) 
                        { "time_day" : "20160622", "loc" : "hubei" } -->> { "time_day" : "20160622", "loc" : "hunan" } on : shard3 Timestamp(15, 0) 
                        { "time_day" : "20160622", "loc" : "hunan" } -->> { "time_day" : "20160622", "loc" : "jiangsu" } on : shard1 Timestamp(16, 0) 
                        { "time_day" : "20160622", "loc" : "jiangsu" } -->> { "time_day" : "20160622", "loc" : "jiangxi" } on : shard3 Timestamp(17, 0) 
                        { "time_day" : "20160622", "loc" : "jiangxi" } -->> { "time_day" : "20160622", "loc" : "jilin" } on : shard1 Timestamp(18, 0) 
                        { "time_day" : "20160622", "loc" : "jilin" } -->> { "time_day" : "20160622", "loc" : "liaoning" } on : shard3 Timestamp(19, 0) 
                        { "time_day" : "20160622", "loc" : "liaoning" } -->> { "time_day" : "20160622", "loc" : "neimenggu" } on : shard1 Timestamp(20, 0) 
                        { "time_day" : "20160622", "loc" : "neimenggu" } -->> { "time_day" : "20160622", "loc" : "ningxia" } on : shard3 Timestamp(21, 0) 
                        { "time_day" : "20160622", "loc" : "ningxia" } -->> { "time_day" : "20160622", "loc" : "qinghai" } on : shard2 Timestamp(21, 1) 
                        { "time_day" : "20160622", "loc" : "qinghai" } -->> { "time_day" : "20160622", "loc" : "shandong" } on : shard2 Timestamp(1, 22) 
                        { "time_day" : "20160622", "loc" : "shandong" } -->> { "time_day" : "20160622", "loc" : "shanxijin" } on : shard2 Timestamp(1, 23) 
                        { "time_day" : "20160622", "loc" : "shanxijin" } -->> { "time_day" : "20160622", "loc" : "shanxishan" } on : shard2 Timestamp(1, 24) 
                        { "time_day" : "20160622", "loc" : "shanxishan" } -->> { "time_day" : "20160622", "loc" : "sichuan" } on : shard2 Timestamp(1, 25) 
                        { "time_day" : "20160622", "loc" : "sichuan" } -->> { "time_day" : "20160622", "loc" : "tianjin" } on : shard2 Timestamp(1, 26) 
                        { "time_day" : "20160622", "loc" : "tianjin" } -->> { "time_day" : "20160622", "loc" : "unknown" } on : shard2 Timestamp(1, 27) 
                        { "time_day" : "20160622", "loc" : "unknown" } -->> { "time_day" : "20160622", "loc" : "xinjiang" } on : shard2 Timestamp(1, 28) 
                        { "time_day" : "20160622", "loc" : "xinjiang" } -->> { "time_day" : "20160622", "loc" : "yunnan" } on : shard2 Timestamp(1, 29) 
                        { "time_day" : "20160622", "loc" : "yunnan" } -->> { "time_day" : "20160622", "loc" : "zhejiang" } on : shard2 Timestamp(1, 30) 
                        { "time_day" : "20160622", "loc" : "zhejiang" } -->> { "time_day" : { "$maxKey" : 1 }, "loc" : { "$maxKey" : 1 } } on : shard2 Timestamp(1, 31) 
  • Storing the original disaster-recovery gateway logs in MongoDB. The source produces roughly 230 records per minute.
# The original raw data was stored in Redis, like this:
10.1.3.22:10100[2]> get "rg_201606211428|10.1.80.173|d.k.com|-|-"
"{\"costtime\":\"89.837\",\"request\":\"107\",\"success\":\"98\",\"request_all\":\"107\",\"flow\":\"22730.000\"}" 
# As described above, to keep queries fast, the month plus the per-minute data time is used as the shard key and index here
db.recover_gateway.ensureIndex({'time_month': 1, 'data_time': 1})
db.runCommand( { shardcollection : "kstat.recover_gateway", key:{"time_month":1, "data_time":1}});

After inserting the data (about 600,000 documents so far), the shard information shows the chunks are split fairly evenly.
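
To double-check the balance, the chunk count per shard can be pulled from the config database (a quick sketch, run on a mongos):

db.getSiblingDB("config").chunks.aggregate([
    { $match: { ns: "kstat.recover_gateway" } },           // this collection only
    { $group: { _id: "$shard", chunks: { $sum: 1 } } }     // chunks per shard
])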

The insert speed is also quite good:

[root@ ...Stream]# cat /data/logs/FanxingStream-access_20160623123*.log|grep recover|awk '{print $20}' | awk '{sum+=$1} END {print "Average = ", sum/NR}'
Average =  0.0295071
[root@ ...Stream]# cat /data/logs/FanxingStream-access_20160623153*.log|grep recover|awk '{print $20}' | awk '{sum+=$1} END {print "Average = ", sum/NR}'
Average =  0.115879

#Later we found that part of the slowness was caused by a connectivity problem with the shard on 164. After lowering that shard's weight (or not connecting to 164 at all), the speed improved dramatically (about 20 ms on average) ---- possibly as fast as saving to Redis!!
 cat /data/logs/FanxingStream-access_20160623172*.log|grep recover|awk '{print $20}' | awk '{sum+=$1} END {print "Average = ", sum/NR}'
Average =  0.0479784
MongoDB Enterprise mongos> db.recover_gateway.count()
604959
MongoDB Enterprise mongos> db.printShardingStatus();
--- Sharding Status --- 
  sharding version: {
        "_id" : 1,
        "minCompatibleVersion" : 5,
        "currentVersion" : 6,
        "clusterId" : ObjectId("5761190bcf18d754f39197f2")
}
  shards:
        {  "_id" : "shard1",  "host" : "shard1/10.1.1.185:22001,10.1.32.164:22001" }
        {  "_id" : "shard2",  "host" : "shard2/10.1.1.185:22002,10.1.32.164:22002" }
        {  "_id" : "shard3",  "host" : "shard3/10.1.1.185:22003,10.1.32.164:22003" }
  active mongoses:
        "3.2.6" : 3
  balancer:
        Currently enabled:  yes
        Currently running:  yes
                Balancer lock taken at Wed Jun 22 2016 09:59:04 GMT+0800 (CST) by NingBo_10_1_32_164:20000:1465981193:-461618926:Balancer:653185907
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours: 
                4 : Success
                1 : Failed with error 'aborted', from shard2 to shard3
                1 : Failed with error 'aborted', from shard2 to shard1
  databases:
        {  "_id" : "kstat",  "primary" : "shard2",  "partitioned" : true }
                kstat.recover_gateway
                        shard key: { "time_month" : 1, "data_time" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                shard1  2
                                shard2  3
                                shard3  2
                        { "time_month" : { "$minKey" : 1 }, "data_time" : { "$minKey" : 1 } } -->> { "time_month" : "201606", "data_time" : "201606211430" } on : shard1 Timestamp(2, 0) 
                        { "time_month" : "201606", "data_time" : "201606211430" } -->> { "time_month" : "201606", "data_time" : "201606211651" } on : shard3 Timestamp(3, 0) 
                        { "time_month" : "201606", "data_time" : "201606211651" } -->> { "time_month" : "201606", "data_time" : "201606211943" } on : shard1 Timestamp(4, 0) 
                        { "time_month" : "201606", "data_time" : "201606211943" } -->> { "time_month" : "201606", "data_time" : "201606212239" } on : shard3 Timestamp(5, 0) 
                        { "time_month" : "201606", "data_time" : "201606212239" } -->> { "time_month" : "201606", "data_time" : "201606220142" } on : shard2 Timestamp(5, 1) 
                        { "time_month" : "201606", "data_time" : "201606220142" } -->> { "time_month" : "201606", "data_time" : "201606220734" } on : shard2 Timestamp(4, 3) 
                        { "time_month" : "201606", "data_time" : "201606220734" } -->> { "time_month" : { "$maxKey" : 1 }, "data_time" : { "$maxKey" : 1 } } on : shard2 Timestamp(4, 4) 
        {  "_id" : "recover_gateway",  "primary" : "shard1",  "partitioned" : true }

At this point, trying to add another unique index fails with an error: ---

MongoDB Enterprise mongos> db.recover_gateway.ensureIndex({"data_time":1, "domain":1, "server_ip":1}, {unique:true});
{
        "raw" : {
                "shard1/10.1.1.185:22001,10.1.32.164:22001" : {
                        "createdCollectionAutomatically" : false,
                        "numIndexesBefore" : 2,
                        "ok" : 0,
                        "errmsg" : "cannot create unique index over { data_time: 1.0, domain: 1.0, server_ip: 1.0 } with shard key pattern { time_month: 1.0, data_time: 1.0 }",
                        "code" : 67,
                        "$gleStats" : {
                                "lastOpTime" : Timestamp(1466523585, 2),
                                "electionId" : ObjectId("7fffffff0000000000000002")
                        }
                },
                "shard2/10.1.1.185:22002,10.1.32.164:22002" : {
                        "createdCollectionAutomatically" : false,
                        "numIndexesBefore" : 2,
                        "ok" : 0,
                        "errmsg" : "cannot create unique index over { data_time: 1.0, domain: 1.0, server_ip: 1.0 } with shard key pattern { time_month: 1.0, data_time: 1.0 }",
                        "code" : 67,
                        "$gleStats" : {
                                "lastOpTime" : Timestamp(1466561137, 15),
                                "electionId" : ObjectId("7fffffff0000000000000006")
                        }
                },
                "shard3/10.1.1.185:22003,10.1.32.164:22003" : {
                        "createdCollectionAutomatically" : false,
                        "numIndexesBefore" : 2,
                        "ok" : 0,
                        "errmsg" : "cannot create unique index over { data_time: 1.0, domain: 1.0, server_ip: 1.0 } with shard key pattern { time_month: 1.0, data_time: 1.0 }",
                        "code" : 67,
                        "$gleStats" : {
                                "lastOpTime" : Timestamp(1466553619, 13),
                                "electionId" : ObjectId("7fffffff0000000000000002")
                        }
                }
        },
        "code" : 67,
        "ok" : 0,
        "errmsg" : "{ shard1/10.1.1.185:22001,10.1.32.164:22001: \"cannot create unique index over { data_time: 1.0, domain: 1.0, server_ip: 1.0 } with shard key pattern { time_month: 1.0, data_time: 1.0 }\", shard2/10.1.1.185:22002,10.1.32.164:22002: \"cannot create unique index over { data_time: 1.0, domain: 1.0, server_ip: 1.0 } with shard key pattern { time_month: 1.0, data_time: 1.0 }\", shard3/10.1.1.185:22003,10.1.32.164:22003: \"cannot create unique index over { data_time: 1.0, domain: 1.0, server_ip: 1.0 } with shard key pattern { time_month: 1.0, data_time: 1.0 }\" }"
}

Reason: the unique-index restriction. On a sharded collection, MongoDB can only enforce uniqueness through indexes that contain the full shard key as a prefix; it cannot guarantee cluster-wide uniqueness on any other combination of fields.
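
If those three fields really must be unique, the documented way is to make the unique index start with the full shard key; a sketch against this collection:

// allowed: the unique index is prefixed by the shard key { time_month, data_time }
db.recover_gateway.ensureIndex(
    { time_month: 1, data_time: 1, domain: 1, server_ip: 1 },
    { unique: true }
)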

Queries

There are three machines and three query entry points (mongos); a connection is chosen at random according to its weight.

<?php
/*
 * Added 2016-03-22
 */
$d = '/data/service/';
ini_set('display_errors','On');
error_reporting(E_ERROR | E_WARNING | E_PARSE);
//on the production server this is
include(__DIR__ . '/../kstat/config/config.php');
require_once ROOT_DIR . '/apps/Autoloader.php';
apps\Autoloader::register();
$t1 = microtime(true);
require_once dirname(__FILE__) . '/../lib/Shanty-Mongo/vendor/autoload.php';
set_include_path(implode(PATH_SEPARATOR, array(get_include_path())));
define('TESTS_SHANTY_MONGO_DB', 'kstat');
//addConnections
$connections = array(
        'masters' => array(
                0 => array('host' => '10.1.32.164', 'port' => '20000', 'database' => TESTS_SHANTY_MONGO_DB, 'weight' => 10),
                1 => array('host' => '10.1.1.185', 'port' => '20000', 'database' => TESTS_SHANTY_MONGO_DB, 'weight' => 5000),
                2 => array('host' => '10.1.1.81', 'port' => '20000', 'database' => TESTS_SHANTY_MONGO_DB, 'weight' => 5000),
        ),
//if replication is configured, add 'slaves' => array(....) here for queries
);
Shanty_Mongo::addConnections($connections);
//\apps\mongo\models\Apm_Web_Kpi::insertBatch($data, array('w'=>true));
$all = \apps\mongo\models\Recover_Gateway::all(array('data_time' => '201606211720')); //missing the shard-key prefix time_month
//$all = \apps\mongo\models\Recover_Gateway::all(array('time_month' =>'201606', 'data_time' => '201606221720')); //uses the full compound shard key
$a = ($all->export()); //export all documents matching the query as an array -- this is the main time cost
$t2 = microtime(true);
echo "\n".'Elapsed '.round($t2-$t1,3).' s';
// ----- when the query does NOT fully use the compound shard-key index ----
//310k documents: 0.275 s
//600k documents: 0.729 s
//840k documents: 0.955 s
// ----- when the query fully uses the compound shard key ----
//310k documents: 0.01 s
//600k documents: 0.024 s
//840k documents: 0.104 s
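
The gap between the two sets of timings comes down to whether mongos can target a single shard; this can be checked from the shell with explain() (a sketch):

// missing the shard-key prefix time_month: mongos has to ask every shard (scatter-gather)
db.recover_gateway.find({ data_time: "201606211720" }).explain()
// full compound shard key present: only the shard owning that chunk is queried (SINGLE_SHARD)
db.recover_gateway.find({ time_month: "201606", data_time: "201606211720" }).explain()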

Summary: a well-chosen shard key dramatically improves CRUD performance and makes full use of sharding's advantages.

Follow-up: keep observing and testing read/write performance at the scale of millions to tens of millions of documents.