Conn.AsyncWrite() performance tracking #423
Test example:
conn.Write(): 0.1 ms
conn.AsyncWrite(): 40 ms
The time difference was measured between "start" and "stop".
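(A minimal sketch of the measurement being described, assuming gnet v2's Conn API as used later in this thread; measureWriteLatency is a made-up helper name. Note that calling Write from outside the event loop is not concurrency-safe, as discussed further down.)

package sketch

import (
	"log"
	"time"

	"github.com/panjf2000/gnet/v2"
)

// measureWriteLatency contrasts the two paths from the report above: a
// synchronous Write that returns once the data is handed off, and an
// AsyncWrite whose elapsed time also includes waiting in the event loop's
// task queue before the actual write happens.
func measureWriteLatency(c gnet.Conn, buf []byte) {
	start := time.Now()
	if _, err := c.Write(buf); err != nil {
		log.Printf("Write error: %v", err)
	}
	log.Printf("Write took %v", time.Since(start)) // "start" to "stop" for the sync path

	start = time.Now()
	_ = c.AsyncWrite(buf, func(_ gnet.Conn, err error) error {
		log.Printf("AsyncWrite completed in %v (err=%v)", time.Since(start), err)
		return nil
	})
}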
Thanks for the info here; however, there is little point in focusing on the latency of every single call of AsyncWrite().
@panjf2000 Hi, I've found that asyncWrite has performance problems when our game gateway forwards game data. Is there a thread-safe way to write back to the client in real time?
// Initialize the connection
func (c *serverConn) init(conn gnet.Conn) error {
c.conn = conn
// Start a goroutine that consumes the data the game server stores into the chWrite channel (via the Push method below) and pushes it back to the user's client
gopool.Go(c.write)
return nil
}
// Push sends a message asynchronously; it receives all data that is about to be sent to the client
func (c *serverConn) Push(msg []byte, msgType ...int) error {
c.chWrite <- chWrite{typ: dataPacket, msg: msg}
return nil
}
// write consumes the chWrite channel and returns the data to the client
func (c *serverConn) write() {
for {
select {
case write, ok := <-c.chWrite:
if !ok {
return
}
buf, err := pack(write.msg)
if err != nil {
log.Errorf("packet message error: %v", err)
continue
}
if err = c.doWrite(buf); err != nil {
log.Errorf("connection:[%v] uid:[%v] write message error: %v", c.ID(), c.UID(), err)
}
}
}
}
// With 1000 connections, when broadcasting the other players' position data to each client, the data volume here is large. Printing timestamps via the method below shows that the time from calling AsyncWrite to the write actually completing keeps growing, eventually reaching tens of seconds
func (c *serverConn) doWrite(buf []byte) (err error) {
if atomic.LoadInt32(&c.state) == int32(network.ConnClosed) {
return
}
start := time.Now()
err = c.conn.AsyncWrite(buf, func(conn gnet.Conn, err error) error {
if err != nil {
prom.GateServerWriteErrorCounter.WithLabelValues(util.ToString(c.id), util.ToString(c.uid), err.Error()).Inc()
}
spend := time.Since(start).Milliseconds()
if spend > 10 {
log.Warnf("cid:%+v uid:%+v asyncWrite spend:%+v ms", c.id, c.uid, spend)
}
return nil
})
return err
}
When the connection count grows to 1000, with packets of roughly 512 B and a forwarding rate on the order of several hundred thousand packets per second, the latency climbs steadily.
Currently …
You can use …
@panjf2000 My scenario is the same as issue #236. Plain Go TCP with synchronous writes currently has no bottleneck; it just shows higher CPU usage. I originally wanted to call c.Write, which in my tests has no write latency, but it is indeed not concurrency-safe: sending from another thread runs into the problem in this issue, concurrent reads and writes on the underlying conn_map #525.
Any progress on this? We've run into something similar: at 2,000 connections and a given QPS it is fairly slow, and at 20,000 connections with the same QPS the latency shoots straight up.
+1, same problem here. By the looks of it, a single event-loop thread saturates under load, so writes can't keep up with the rate at which data is produced asynchronously. We found that increasing this value lowers the latency: gnet/internal/netpoll/epoll_events.go, line 33 at ab69eec
One question: what is MaxAsyncTasksAtOneTime for? With the MaxAsyncTasksAtOneTime limit, won't heavy load keep AsyncTasks from being processed in time, causing the queue to back up, latency to rise, and in the worst case an OOM?
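(For context, the pattern being asked about looks roughly like this; a simplified sketch of a per-iteration task cap, not gnet's actual code:)

package sketch

// task stands in for one queued asynchronous job, e.g. an AsyncWrite.
type task func()

type eventLoop struct {
	tasks chan task // stand-in for the real unbounded asyncTaskQueue
}

const maxAsyncTasksAtOneTime = 256 // illustrative cap

// runTasksOnce pops at most maxAsyncTasksAtOneTime tasks per poll cycle. If
// producers enqueue faster than the cap allows, the backlog (and with it the
// observed AsyncWrite latency) grows on every cycle.
func (el *eventLoop) runTasksOnce() {
	for i := 0; i < maxAsyncTasksAtOneTime; i++ {
		select {
		case t := <-el.tasks:
			t()
		default:
			return // queue drained within this cycle
		}
	}
	// Any leftovers wait for the next poll cycle.
}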
But it can also starve asynchronous tasks if the application makes a large number of AsyncWrite() calls. We simulated this internally: with MaxAsyncTasksAtOneTime removed and the load saturated, the average AsyncWrite() latency dropped from 1.8 s to 200 ms. Could tasks that are AsyncWrite be exempt from the limit?
What's your specific scenario using AsyncWrite()?
Simplified scenario: for every request received from a client, the server calls AsyncWrite many times (e.g., 10).
During stress testing it behaved much like that commenter described: latency "jumps" straight up. We strongly suspected a queue was backing up, which is how we noticed asyncTaskQueue, and then MaxAsyncTasksAtOneTime, and then our suspicion settled there.
Then, considering the extreme case, we decided we would rather let the event loop starve (that slows accepting requests, which slows clients sending requests, which slows the server processing them, and finally slows the rate at which AsyncWrite calls are produced; it self-balances, so it's not a big problem) than let AsyncWrite tasks pile up in asyncTaskQueue (latency spikes, worst case OOM). So we removed the MaxAsyncTasksAtOneTime limit and tested again.
We use it as a gateway for group chat, where one request corresponds to multiple asyncWrite calls. I'm quite curious about the event-loop code here. As I recall, Redis's event loop handles the tasks left over from the previous iteration first, then this round's tasks; Node.js's event loop has a similar mechanism.
The event-loop mechanism here seems a bit odd: within a round of tasks, the ones that write data back should be handled first (see the sketch below). If it worked that way, overall balance could be achieved even when the read/write ratio is uneven.
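(A sketch of the ordering being suggested, modeled on the Redis/Node.js behavior described above; the types are hypothetical, not gnet's:)

package sketch

type job func()

type loop struct {
	pendingWriteBacks []job // write-back tasks left over from earlier rounds
	thisRound         []job // tasks generated by this round's events
}

// tick drains leftover write-backs before this round's tasks, so reads can't
// indefinitely outpace writes even when the read/write ratio is skewed.
func (l *loop) tick() {
	for _, j := range l.pendingWriteBacks {
		j()
	}
	l.pendingWriteBacks = l.pendingWriteBacks[:0]

	for _, j := range l.thisRound {
		j()
	}
	l.thisRound = l.thisRound[:0]
}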
Our discussion and guesses were exactly the same as what this guy wrote (lll¬ω¬)... we too think it's the MaxAsyncTasksAtOneTime problem.
To clarify, I didn't anticipate scenarios with an extremely large number of asynchronous writes per read when I added MaxAsyncTasksAtOneTime. But I think it's worth pondering whether it would make sense to put the events of asynchronous write in urgentAsyncTaskQueue.
What puzzles us is the role of asyncTaskQueue. If I'm reading the code correctly, the event loop has no way to guarantee that asyncTaskQueue doesn't back up: it is an unbounded list. The worst case is an OOM, which is hard to trigger in practice because the event loop does keep consuming tasks; what you actually observe is the queue backing up and latency spiking. Whether a task in asyncTaskQueue is an AsyncWrite or some other event, even if it isn't urgent it can't simply be dropped, so it has to execute sooner or later, and it executes on the event-loop thread, …
I think I've made myself clear about delaying tasks.
If you were trying to write multiple responses to one request, would using AsyncWritev() help?
Sorry, I finally understand. In the other extreme case for asyncTaskQueue (one side enqueueing without limit while the event loop keeps dequeueing until the queue is empty), the event loop can indeed be starved. Maybe the simplest, safest approach is to make MaxAsyncTasksAtOneTime a configurable option and let users choose? (A sketch of what that might look like is below.)
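(A sketch of what such a knob might look like, following gnet's functional-options pattern; WithMaxAsyncTasksAtOneTime is a hypothetical name, not a real gnet option:)

package sketch

// Options mirrors the shape of a functional-options config struct.
type Options struct {
	// MaxAsyncTasksAtOneTime caps how many queued async tasks one
	// event-loop iteration may run; 0 would keep the built-in default.
	MaxAsyncTasksAtOneTime int
}

type Option func(*Options)

// WithMaxAsyncTasksAtOneTime is the hypothetical configurable knob being
// suggested above.
func WithMaxAsyncTasksAtOneTime(n int) Option {
	return func(opts *Options) {
		opts.MaxAsyncTasksAtOneTime = n
	}
}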
This was an alternative in my mind. But adding a new configuration option that requires users to understand the rationale behind it might make things more complicated than they already are. Maybe the simpler and better approach is to put the events of asynchronous write in urgentAsyncTaskQueue.
@IrineSistiana @someview Could you try out #563 and see if it reduces the latencies for your use cases? Thanks! |
I believe setting MaxAsyncTasksAtOneTime=999999 has the same effect as #563. The pressure test looks good, but this is a preliminary test with no practical weight: we haven't actually run into this problem in our production environment, perhaps because our load is small.
New thought: maybe introduce a new option:

type Options struct {
	...
	// AsyncWriteNoDelay indicates whether to ignore the MaxAsyncTasksAtOneTime
	// limitation and write the data to the peer immediately.
	AsyncWriteNoDelay bool
}

func WithAsyncWriteNoDelay(noDelay bool) Option {
	return func(opts *Options) {
		opts.AsyncWriteNoDelay = noDelay
	}
}
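(If this proposal were adopted, usage would presumably look like the following; hypothetical, since WithAsyncWriteNoDelay is only being proposed here and is not part of gnet's released API:)

package sketch

import "github.com/panjf2000/gnet/v2"

// serve shows the hypothetical option in use; eventHandler is any
// gnet.EventHandler implementation, and the address is an example.
func serve(eventHandler gnet.EventHandler) error {
	return gnet.Run(eventHandler, "tcp://:9000", gnet.WithAsyncWriteNoDelay(true))
}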
Thought: I think when talking about TCP writes, all Go users assume the operation has no delay (e.g., Go's net package sets TCPNoDelay by default), and when gnet users call AsyncWrite, they believe it is no-delay as well. I don't think using urgentAsyncTaskQueue will cause starvation, so I prefer putting AsyncWrite in urgentAsyncTaskQueue by default.
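(For reference on the standard-library behavior mentioned here: Go enables TCP_NODELAY by default on TCP connections, and it can be toggled via (*net.TCPConn).SetNoDelay:)

package sketch

import "net"

// setNoDelay makes the default explicit: Go's net package already disables
// Nagle's algorithm (TCP_NODELAY on) for TCP connections out of the box.
func setNoDelay(c net.Conn) {
	if tc, ok := c.(*net.TCPConn); ok {
		_ = tc.SetNoDelay(true) // true (no delay) is already Go's default
	}
}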
At this point, I'm more worried about the precedence of events than I am about the starving issue. You see, there are some events like Wake() …
My understanding (from the code and our pressure tests, not from a production environment): the latency issue we are facing arises because, when the load is high (but not yet at the limit), the queue can pile up within a very short period and then go undrained for many event-loop cycles because of the MaxAsyncTasksAtOneTime limit. It is not because the write I/O is slow.
Although Wake and AsyncWrite have different priorities, under normal circumstances (not at the CPU limit, so asyncTaskQueue can be cleared in one cycle and there are no latency issues) the latency of Wake should be (not tested, just a guess) only slightly lower (<1 ms; event-loop cycles are fast) than that of AsyncWrite, unless the event loop is at the CPU limit and the events are queued up.
When the event loop is starving, you are already in big trouble. That may happen in tests, but it can't happen in a production environment; the only real solution then is to switch to multi-threading. So I think we don't need to worry about the starving issue. Also, there is no mechanism that guarantees the execution order of these two calls: a user cannot expect that a Wake call will always be invoked before AsyncWrite.
Piling tasks into an unlimited queue, with no enqueue restriction and no dequeue guarantee, is very risky. So I don't think we should worry about priority either, and we should use a single task queue.
OOM caused by the unlimited queue should be the last thing we care about. Like any memory allocator in any language or kernel, we don't account for the OOM of users' programs. If you need more memory, just throw more physical memory at it, or get more machines and split your services. The same rule applies to any networking framework or library: if your network traffic is already so heavy that all the memory has been taken up by your service, then you should "distributed-system" your service and do some load balancing to mitigate the memory consumption.
While I find your points about the event sequence sensible, I'm not with you on the suggestion of a single task queue. We have only a few kinds of events for the task queues at the moment, but more may come along in the future; that isn't necessarily going to happen, but it could, and I'd like to keep that possibility open. As a result, we'd need hierarchical (priority) queuing in that more complicated scenario.
Thus, I have a more pragmatic solution for this: retain the current two queues, but enqueue all tasks into …
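(A sketch of the two-tier draining described above, simplified from the discussion rather than taken from gnet's source: the urgent queue is always drained in full, while the low-priority queue keeps the per-iteration cap:)

package sketch

type work func()

type taskQueues struct {
	urgent []work // e.g. asynchronous writes: always fully drained
	normal []work // other tasks: still subject to the cap
}

const capPerIteration = 256 // illustrative value

// drain runs urgent tasks to completion every iteration, so AsyncWrite can't
// pile up behind the cap, while low-priority tasks keep the cap to preserve
// event-loop responsiveness.
func (q *taskQueues) drain() {
	for _, w := range q.urgent {
		w()
	}
	q.urgent = q.urgent[:0]

	n := len(q.normal)
	if n > capPerIteration {
		n = capPerIteration
	}
	for _, w := range q.normal[:n] {
		w()
	}
	q.normal = q.normal[n:]
}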
Basically no different from the results of the earlier scenario tests; it still feels like the root cause is multiple connections queuing up to write. With more connections, say on the order of tens of thousands, the asyncwritev latency climbs (payload of 500 bytes) to around 200 ms. If connections can write concurrently, there is no such latency problem; we verified that with the netpoll library.
Tracking the performance of Conn.AsyncWrite()
Need some benchmarks and scenarios to help diagnose it.