Fixing high memory usage in SQLAlchemy when iterating over a large number of ORM result objects

silentime

Blog categories: orm, python

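Python has a peculiarity in how it manages memory. Memory it has allocated is not returned to the operating system right away, even when an object's reference count drops to 0. Instead, the interpreter keeps it as a cache for future allocations and only reclaims it at some later point. So when a SQLAlchemy query returns a large number of rows (more than about 1000) and you need to iterate over every ORM object, calling query().all() makes memory usage surge, because SQLAlchemy builds all of the objects in memory at once. Here is the memory jump when iterating over more than 290,000 records: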

[I 160802 18:17:05 xxxx:134] c6833 Memory:   3.7%   662M/7870M
[I 160802 18:18:53 xxxx:140] c6833 after xxxx
[I 160802 18:18:53 xxxx:141] c6833 Memory:  29.7%  2716M/7870M
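To see the effect without a database, the minimal sketch below uses the standard-library tracemalloc module to compare peak Python memory in two cases. In the first, every object is materialized in a list, as query().all() does. In the second, objects are consumed one at a time. FakeRow is a hypothetical stand-in for an ORM object, not a SQLAlchemy class. tracemalloc counts Python allocations rather than the process memory reported in the logs above, so the numbers are only indicative.

```python
import tracemalloc


class FakeRow:
    """Hypothetical stand-in for an ORM object (not a SQLAlchemy class)."""
    __slots__ = ("id", "name")

    def __init__(self, id, name):
        self.id = id
        self.name = name


def generate_rows(n):
    # Produce rows one at a time, the way a streaming cursor would.
    for i in range(n):
        yield FakeRow(i, "name-%d" % i)


def peak_bytes(consume, n):
    """Return the peak traced memory (bytes) while `consume` processes n rows."""
    tracemalloc.start()
    try:
        consume(generate_rows(n))
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def process_all_at_once(rows):
    # Analogous to query().all(): every object is alive at the same time.
    materialized = list(rows)
    total = 0
    for row in materialized:
        total += row.id
    return total


def process_streaming(rows):
    # Analogous to batched iteration: each object can be freed after use.
    total = 0
    for row in rows:
        total += row.id
    return total


if __name__ == "__main__":
    n = 100_000
    print("all at once:", peak_bytes(process_all_at_once, n))
    print("streaming:  ", peak_bytes(process_streaming, n))
```

Both functions compute the same result. Only the number of objects alive at the same time differs, and that number drives the peak.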

After switching to query().yield_per(1000), much less memory is allocated:

[I 160802 18:12:15 xxxx:134] b1213 Memory:   2.9%   600M/7870M
[I 160802 18:13:39 xxxx:140] b1213 after xxxx
[I 160802 18:13:39 xxxx:141] b1213 Memory:   9.4%  1112M/7870M
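The post does not show its model or query, so here is a minimal sketch of the change under a few assumptions. It assumes SQLAlchemy 1.4 or later, which provides declarative_base in sqlalchemy.orm. It also uses a hypothetical Record model on an in-memory SQLite database.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Record(Base):
    # Hypothetical table, used only for this example.
    __tablename__ = "records"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))


def count_with_yield_per(session, batch_size=1000):
    """Iterate over every Record, loading ORM objects batch_size rows at a time."""
    count = 0
    # Instead of session.query(Record).all(), which builds every object up front.
    for record in session.query(Record).yield_per(batch_size):
        # Process one object here; only the current batch needs to stay loaded.
        count += 1
    return count


if __name__ == "__main__":
    engine = create_engine("sqlite://")  # in-memory database for the demo
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()
    session.add_all([Record(name="r%d" % i) for i in range(5000)])
    session.commit()
    print(count_with_yield_per(session))
```

The loop body is the same as with .all(). Only the way the query is consumed changes. The Query documentation lists caveats for yield_per, notably around eager loading of collections, so check it before combining the two.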

The official SQLAlchemy documentation describes yield_per as follows:

The purpose of this method is when fetching very large result sets (> 10K rows), to batch results in sub-collections and yield them out partially, so that the Python interpreter doesn’t need to declare very large areas of memory which is both time consuming and leads to excessive memory use. The performance from fetching hundreds of thousands of rows can often double when a suitable yield-per setting (e.g. approximately 1000) is used, even with DBAPIs that buffer rows (which are most).

SQLAlchemy official documentation: http://docs.sqlalchemy.org/en/latest/orm/query.html
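The quoted passage describes fetching results in sub-collections. The same idea can be sketched at the DBAPI level with the standard-library sqlite3 module. There, cursor.fetchmany() pulls a fixed number of rows per call, so the generator holds only one batch at a time. This is an analogy, not how SQLAlchemy implements yield_per internally. The helper iter_in_batches and the records table are hypothetical.

```python
import sqlite3


def iter_in_batches(cursor, batch_size=1000):
    """Yield rows from an executed cursor, fetching batch_size rows per call.

    Only one batch is held by this generator at a time, instead of the whole
    result set that cursor.fetchall() would return.
    """
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield from batch


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO records (name) VALUES (?)",
        [("r%d" % i,) for i in range(10_000)],
    )
    cur = conn.execute("SELECT id, name FROM records ORDER BY id")
    print(sum(1 for _ in iter_in_batches(cur, batch_size=1000)))
    conn.close()
```

As the quote notes, most DBAPIs buffer rows on the client anyway. Batching therefore mainly limits how many Python objects exist at once, not the size of the driver's own buffer.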


2016-08-03 11:11