qps/rate limit & reduce expire seek range & expire-key hash #204
piaoairy219 wants to merge 61 commits into distributedio:master from
Conversation
…ash. Just after reading the command we check whether it is in the command list we support; otherwise it is skipped
…new limit 2 just balance the limit based on the number of active titan servers 3 unit test of rate limit passed
2 add setlimit.sh
2 change qps/rate metrics to 3 labels: namespace, command, localip
…and prevent the node currently writing the expire-keys region from having a higher load than other nodes
2 fix expire_test.go: the gc key can be fetched before runExpire/doExpire if expireat < now; add 2 test cases for hash/string expire: check the key is deleted and the gc key exists after runExpire
…ed&round&seek&commit metrics label
…ad for dashboard, replace 1.4 with 2, decrease bucket count by 50%
| expireKeyPrefix = []byte("$sys:0:at:")
| sysExpireLeader = []byte("$sys:0:EXL:EXLeader")
| hashExpireKeyPrefix = expireKeyPrefix[:len(expireKeyPrefix)-1]
Storing the hash value directly in the expire key is indeed a good approach, but is it necessary to define a new variable?
Could the hash value be written directly at the position of the digit 0 in expireKeyPrefix = []byte("$sys:0:at:")? That way old data would remain compatible, and the changes to the existing key concatenation would be smaller.
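A minimal sketch of this suggestion, assuming the hash slot replaces the literal digit in the prefix; `buildExpireKey` and the timestamp-free layout are illustrative names, not helpers from the PR:

```go
package main

import "fmt"

// Sketch: write the hash slot where the literal 0 used to sit in
// "$sys:0:at:", instead of defining a separate hashExpireKeyPrefix.
// Slot 0 reproduces the legacy prefix exactly, so old keys stay readable.
// buildExpireKey is an illustrative name, not the PR's actual helper.
func buildExpireKey(slot int, key string) string {
	return fmt.Sprintf("$sys:%d:at:%s", slot, key)
}

func main() {
	fmt.Println(buildExpireKey(0, "mykey")) // legacy-compatible form
	fmt.Println(buildExpireKey(7, "mykey")) // hashed form
}
```

One caveat: with 256 slots the field needs more than one digit, so a fixed-width encoding would be required to keep key ordering stable.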
| }
| type CommandLimiter struct {
| 	localIp string
| return nil, errors.New(rateLimit.InterfaceName + " adds is empty")
| }
| if rateLimit.LimiterNamespace == "" {
If there is no namespace supplied, can we just regard it as a global limitation?
@piaoairy219
| if rateLimit.LimiterNamespace == "" {
| 	return nil, errors.New("limiter-namespace is configured with empty")
| }
| if rateLimit.WeightChangeFactor <= 1 {
A configuration validator can be used here.
| if !(rateLimit.UsageToDivide > 0 && rateLimit.UsageToDivide < rateLimit.UsageToMultiply && rateLimit.UsageToMultiply < 1) {
| 	return nil, errors.New("should config 0 < usage-to-divide < usage-to-multiply < 1")
| }
| if rateLimit.InitialPercent > 1 || rateLimit.InitialPercent <= 0 {
| 	return nil, errors.New("initial-percent should in (0, 1]")
| }
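The checks quoted above could be collected into one validator, as the review suggests. A minimal sketch, with field names mirroring the PR's rate-limit config but the `Validate` method itself hypothetical:

```go
package main

import (
	"errors"
	"fmt"
)

// RateLimit mirrors the configuration fields validated in the quoted code.
type RateLimit struct {
	LimiterNamespace   string
	WeightChangeFactor float64
	UsageToDivide      float64
	UsageToMultiply    float64
	InitialPercent     float64
}

// Validate gathers the scattered checks into one place, so the constructor
// only needs a single `if err := conf.Validate(); err != nil` call.
func (r *RateLimit) Validate() error {
	if r.LimiterNamespace == "" {
		return errors.New("limiter-namespace is configured with empty")
	}
	if r.WeightChangeFactor <= 1 {
		return errors.New("weight-change-factor should be > 1")
	}
	if !(r.UsageToDivide > 0 && r.UsageToDivide < r.UsageToMultiply && r.UsageToMultiply < 1) {
		return errors.New("should config 0 < usage-to-divide < usage-to-multiply < 1")
	}
	if r.InitialPercent > 1 || r.InitialPercent <= 0 {
		return errors.New("initial-percent should be in (0, 1]")
	}
	return nil
}

func main() {
	conf := RateLimit{LimiterNamespace: "ns", WeightChangeFactor: 2,
		UsageToDivide: 0.3, UsageToMultiply: 0.6, InitialPercent: 0.5}
	fmt.Println(conf.Validate()) // <nil> for a valid config
}
```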
| strUnit = limitStr[len(limitStr)-1]
| if strUnit == 'k' || strUnit == 'K' {
| 	unit = 1024
| 	limitStr = limitStr[:len(limitStr)-1]
Wrap this into a function
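A sketch of such a helper, parsing a limit string with an optional k/K suffix into a number; `parseLimit` is an illustrative name, and the m/M case is an assumed extension beyond the quoted code:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLimit converts strings like "500", "2k" or "4M" into an absolute
// value, wrapping the suffix handling from the quoted snippet into one
// reusable function.
func parseLimit(s string) (int64, error) {
	unit := int64(1)
	switch {
	case strings.HasSuffix(s, "k"), strings.HasSuffix(s, "K"):
		unit, s = 1024, s[:len(s)-1]
	case strings.HasSuffix(s, "m"), strings.HasSuffix(s, "M"):
		unit, s = 1024*1024, s[:len(s)-1]
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, err
	}
	return n * unit, nil
}

func main() {
	v, _ := parseLimit("2k")
	fmt.Println(v) // 2048
}
```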
| v, ok := l.limiters.Load(limiterName)
| var commandLimiter *CommandLimiter
| if !ok {
| 	commandLimiter = l.init(limiterName)
The name init seems too generic for its feature
| limiterName := k.(string)
| commandLimiter := v.(*CommandLimiter)
| if commandLimiter != nil {
| 	averageQps := commandLimiter.reportLocalStat(l.conf.GlobalBalancePeriod)
| return key
| }
| func getNamespaceAndCmd(limiterName string) []string {
Returning the values namespace, command string seems to be more friendly
| }
| }
| func (l *LimitersMgr) getLimit(limiterName string, isQps bool) (int64, int) {
Avoid using a boolean as an argument. Refactor it to getQPSLimit and getRateLimit.
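A sketch of the suggested refactor; the `LimitersMgr` here is a stub, and only the method shapes mirror the PR:

```go
package main

import "fmt"

// Stub manager holding separate QPS and data-rate limits.
type LimitersMgr struct {
	qpsLimit, rateLimit int64
	qpsBurst, rateBurst int
}

// The original boolean-parameter method stays private...
func (l *LimitersMgr) getLimit(limiterName string, isQps bool) (int64, int) {
	if isQps {
		return l.qpsLimit, l.qpsBurst
	}
	return l.rateLimit, l.rateBurst
}

// ...while call sites use wrappers that say what they mean
// instead of passing true/false.
func (l *LimitersMgr) getQPSLimit(name string) (int64, int)  { return l.getLimit(name, true) }
func (l *LimitersMgr) getRateLimit(name string) (int64, int) { return l.getLimit(name, false) }

func main() {
	l := &LimitersMgr{qpsLimit: 1000, qpsBurst: 10, rateLimit: 1 << 20, rateBurst: 64}
	q, _ := l.getQPSLimit("ns:get")
	r, _ := l.getRateLimit("ns:get")
	fmt.Println(q, r)
}
```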
| //"getbit": Desc{Proc: AutoCommit(GetBit), Cons: Constraint{3, flags("r"), 1, 1, 1}},
| //"bitcount": Desc{Proc: AutoCommit(BitCount), Cons: Constraint{-2, flags("r"), 1, 1, 1}},
| //"bitpos": Desc{Proc: AutoCommit(BitPos), Cons: Constraint{-3, flags("r"), 1, 1, 1}},
| } else {
| 	continue
| }
| }
The CanExecute logic is well designed, but could the returned data be wrong in some cases?

multi
lpush key 1
xxx zz
exec

I understand this is a security consideration, but I have two questions:
- Why is this feature needed? If a client sends a well-formed protocol message with merely an unknown command name, can it really be treated as an abnormal user?
- Disconnecting after accumulating 3 bad commands seems a bit too strict. If this feature is necessary, it would be better to make it a configuration option.
| "bitcount": BitCount,
| //"getbit": GetBit,
| //"bitpos": BitPos,
| //"bitcount": BitCount,
| return kv.txn.Destory(obj, key)
| }
| if err := expireAt(kv.txn.t, mkey, obj.ID, obj.Type, obj.ExpireAt, at); err != nil {
Is the IsExpired check already validated before this point, or would it be more elegant to implement it inside this function?

If the key has already expired, should a special error be returned so the user knows the set failed?
| if i == 0 {
| 	isQps = true
| }
| limit, burst := l.getLimit(limiterName, isQps)
Smart but weird, maybe we should define the closure first and reuse it.
| qpsLw LimiterWrapper
| rateLw LimiterWrapper
QPS has the same meaning as request rate; we should use friendlier names for the limitation types, like Command and DataFlow.
| for i := 0; i < EXPIRE_HASH_NUM; i++ {
| 	expireHash := fmt.Sprintf("%04d", i)
| 	go startExpire(sysdb, &conf.Expire, ls, expireHash)
| }
Could the expire-related logic be encapsulated into a standalone method in expire.go?
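A minimal sketch of such an encapsulation: the fan-out loop moves into one function, and a `launch` callback stands in for `go startExpire(...)` so the example is self-contained. `StartExpireWorkers` is an illustrative name; `EXPIRE_HASH_NUM` comes from the PR.

```go
package main

import "fmt"

// EXPIRE_HASH_NUM mirrors the PR's 256 expire-key hash slots.
const EXPIRE_HASH_NUM = 256

// StartExpireWorkers wraps the worker fan-out loop in one place in expire.go.
// In the real code, launch would be `go startExpire(sysdb, &conf.Expire, ls, h)`.
func StartExpireWorkers(launch func(expireHash string)) {
	for i := 0; i < EXPIRE_HASH_NUM; i++ {
		launch(fmt.Sprintf("%04d", i))
	}
}

func main() {
	n := 0
	StartExpireWorkers(func(h string) { n++ })
	fmt.Println(n) // 256
}
```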
| expireLogFlag = "[Expire]"
| metricsLabel = expire_unhash_worker
| }
Could this KeyPrefix concatenation be encapsulated into a standalone method?
| }
| func runExpire(db *DB, batchLimit int) {
| func runExpire(db *DB, batchLimit int, expireHash string, lastExpireEndTs int64) int64 {
How big is the performance gap between passing lastExpireEndTs as the iterator's start position each round versus always starting from expireKeyPrefix?
In practice we found the impact is significant, and it depends heavily on the usage scenario. In some scenarios, e.g. importing new data every day with a TTL of 1 day, most of the data expires at about the same time. Once those expired keys are deleted they become tombstones, and when RocksDB looks up data it must traverse and skip tombstones; with too many of them, say millions, traversal performance degrades dramatically.
| if err != nil {
| 	txn.Rollback()
| 	zap.L().Error("[Expire] commit failed", zap.Error(err))
| 	zap.L().Error(expireLogFlag+" commit failed", zap.Error(err))
If the commit fails, should we return 0 here?
| func (cl *CommandLimiter) checkLimit(cmdName string, cmdArgs []string) {
| 	d := cl.qpsLw.waitTime(1)
| 	time.Sleep(d)
Why not use the Wait method?
| return limit, int(burst)
| }
| func (l *LimitersMgr) CheckLimit(namespace string, cmdName string, cmdArgs []string) {
Add a comment for the public method
| //iter gets keys in [key, upperBound), so using now+1 as the 2nd parameter will fetch keys prefixed with "at:<now>:"
| //we seek up to "at:<now>" instead of "at;", which reduces the seek range and touches as few deleted expired keys as possible.
| //this should reduce the multi-day expire delays and get/mget timeouts caused by the rocksdb tombstone problem
| LimitConnection bool
| MaxConnection int64
It is too verbose to use two variables here. Maybe we should use MaxConnection with value 0 as unlimited.
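A sketch of that simplification, dropping the boolean and treating `MaxConnection == 0` as unlimited; `connectionAllowed` is an illustrative name:

```go
package main

import "fmt"

// connectionAllowed reports whether a new client may connect.
// MaxConnection == 0 means unlimited, replacing the separate
// LimitConnection bool from the PR's config.
func connectionAllowed(maxConnection, clientsNum int64) bool {
	return maxConnection == 0 || clientsNum < maxConnection
}

func main() {
	fmt.Println(connectionAllowed(0, 1000000)) // true: 0 means no limit
	fmt.Println(connectionAllowed(100, 100))   // false: already at the cap
}
```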
| ListZipThreshold int
| LimitConnection bool
| MaxConnection int64
| MaxConnectionWait int64
How is this variable used? Why sleep for a while when the limit is exceeded instead of closing the connection immediately?
| LimitConnection bool
| MaxConnection int64
| MaxConnectionWait int64
| ClientsNum int64
Use an atomic variable to avoid locks
| func Call(ctx *Context) {
| 	ctx.Name = strings.ToLower(ctx.Name)
| if _, ok := txnCommands[ctx.Name]; ok && ctx.Server.LimitersMgr != nil {
txnCommands is not used anymore, use commands instead.
1 limit namespace&command qps/rate across all titan servers
2 reduce the expire seek range to avoid the rocksdb tombstone problem
3 hash expire-keys into 256 prefixes to improve expire handling speed and prevent expire-key writes from concentrating on a single node
4 handle empty/illegal commands
5 if the connection has been closed by the client, drop the remaining commands instead of processing them
6 limit max-connection