Python

Python Story

On 2024/10/25, in Python, by admin

PPT download: Python Story.ppt

http://www.slideshare.net/nnfish/python-story


Pylons Beginner Tutorial by Example – Cookies and Sessions

On 2024/07/03, in Python, by admin

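This post covers using cookies and sessions in Pylons.

The examples build on the code from the previous post, "Pylons Beginner Tutorial by Example – Database Operations". Let's try cookies first by adding a new cookietest controller.

Modify the index method so that it displays the cookie: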

    def index(self):
        name = 'NULL'
        if request.cookies.has_key('name'):
            name = request.cookies['name']
        return 'cookie name=%s' % name

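Cookies are read through the request.cookies object, which behaves like a dictionary. Check that the key exists (with has_key, or the in operator) before reading it, so that a missing cookie does not raise KeyError. Alternatively, wrap the lookup in try…except.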
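The three ways of reading a possibly missing key can be compared side by side. This is a minimal sketch in Python 3 that uses an ordinary dict as a stand-in for request.cookies (an assumption, so that it runs without Pylons); read_cookie is a hypothetical helper name.

```python
# A plain dict stands in for request.cookies (assumption: dict-like lookups).
cookies = {'lang': 'zh'}

def read_cookie(cookies, key, default='NULL'):
    """Return cookies[key], or default when the key is missing."""
    # Membership test: the modern spelling of has_key().
    if key in cookies:
        return cookies[key]
    return default

# Same result with try/except around the lookup.
try:
    name = cookies['name']
except KeyError:
    name = 'NULL'

# dict.get() folds the check and the default into one call.
lang = cookies.get('lang', 'NULL')
```

All three give the same answer; get() is usually the shortest when a default value is acceptable.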

Next, add another method that writes the cookie.

    def writecookie(self):
        response.set_cookie("name", "smallfish")
        return "write cookie ok"

This only sets a simple value. set_cookie accepts more parameters; the full signature is:

set_cookie(self, key, value='', max_age=None, path='/', domain=None, secure=None,
                 httponly=False, version=None, comment=None, expires=None, overwrite=False)

In practice the parameters you usually need to set are max_age, path, domain and expires.
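To see what these parameters turn into on the wire, the following sketch builds a Set-Cookie value with the Python 3 standard library. It does not use Pylons or its set_cookie; build_cookie_header and the example.com domain are hypothetical names chosen for illustration.

```python
from http.cookies import SimpleCookie

def build_cookie_header(name, value, max_age=None, path='/', domain=None):
    """Return the value of a Set-Cookie header for one cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    morsel = cookie[name]
    morsel['path'] = path              # URL prefix the cookie applies to
    if max_age is not None:
        morsel['max-age'] = max_age    # lifetime in seconds
    if domain is not None:
        morsel['domain'] = domain      # host(s) the cookie is sent to
    return morsel.OutputString()

header = build_cookie_header('name', 'smallfish', max_age=3600,
                             domain='example.com')
```

Leaving max_age (and expires) unset yields a session cookie that the browser discards when it closes.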

Now let's try sessions:

smallfish@debian:~/workspace/python/hellodb$ paster controller sessiontest
Creating /home/smallfish/workspace/python/hellodb/hellodb/controllers/sessiontest.py
Creating /home/smallfish/workspace/python/hellodb/hellodb/tests/functional/test_sessiontest.py

As in the cookie example, the controller has two methods: index displays the session value and writesession writes it.

    def index(self):
        name = session.get('name', 'NULL')
        return 'session name=%s' % name

    def writesession(self):
        session['name'] = 'smallfish'
        session.save()
        return "save session ok"

In index, the 'NULL' passed to get is the default value. In writesession, note that after setting a session value you have to call save().

To delete session data, try the following:

del session['name']
# delete the whole session
session.delete()

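At this point we have covered most of the things commonly needed for web development in Pylons: URLs, templates, the database and sessions.

The next part will cover deploying a Pylons application behind Nginx.

Pylons Beginner Tutorial by Example – Database Operations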

On 2024/07/01, in Python, by admin

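The first two posts covered the general Pylons development workflow, forms and file uploads; the approach is much the same as in traditional web development. This post gives a brief introduction to using a database in Pylons.

The focus here is the ORM framework SQLAlchemy. Three ORMs currently get the most attention in the Python community: SQLAlchemy, SQLObject and Storm. I actually looked into Storm first, but after hearing that Xia Ge (@marchliu) was not very happy with it in real applications, I turned to SQLAlchemy, which he recommended. Of course, you can also work with the database directly through the corresponding DB-API library.

The example code uses PostgreSQL, with psycopg2 as the Python driver. Configuring and using PostgreSQL is not covered here; please search for it.

Installation on Debian/Ubuntu is simple:

sudo aptitude install python-psycopg2

Create a test database, for example test: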

smallfish@debian:~/workspace/python/hello$ su postgres
postgres@debian:/home/smallfish/workspace/python/hello$ createdb -O smallfish test
postgres@debian:/home/smallfish/workspace/python/hello$ exit
smallfish@debian:~/workspace/python/hello$ psql -h 127.0.0.1 -p 5432 -U smallfish test
Password for user smallfish:
psql (8.4.4)
SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)

Type "help" for help.

test=#

The database side is ready; now on to Pylons.

Create a new project with database support. Note the "Enter sqlalchemy" prompt: the default is False, so change it to True:

smallfish@debian:~/workspace/python$ paster create -t pylons hellodb
Selected and implied templates:
  Pylons#pylons  Pylons application template

Variables:
  egg:      hellodb
  package:  hellodb
  project:  hellodb
Enter template_engine (mako/genshi/jinja2/etc: Template language) ['mako']:
Enter sqlalchemy (True/False: Include SQLAlchemy 0.5 configuration) [False]: True
Creating template pylons
Creating directory ./hellodb

With True selected, the generated development.ini contains the corresponding database configuration options.

Next, create a new db controller:

smallfish@debian:~/workspace/python$ cd hellodb/
smallfish@debian:~/workspace/python/hellodb$ paster controller db
Creating /home/smallfish/workspace/python/hellodb/hellodb/controllers/db.py
Creating /home/smallfish/workspace/python/hellodb/hellodb/tests/functional/test_db.py

Edit development.ini and add the database configuration. smallfish:123456 is the PostgreSQL username/password, 127.0.0.1:5432 is the host address/port, and the trailing test is the database name.

# SQLAlchemy database URL
sqlalchemy.url = postgresql://smallfish:123456@127.0.0.1:5432/test
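The parts of such a database URL can be pulled apart with the standard library. The sketch below uses Python 3's urllib.parse purely to illustrate the layout; it is not how SQLAlchemy parses the URL internally, and describe_db_url is a hypothetical helper name.

```python
from urllib.parse import urlsplit

def describe_db_url(url):
    """Split a driver://user:password@host:port/database URL into its parts."""
    parts = urlsplit(url)
    return {
        'driver': parts.scheme,
        'user': parts.username,
        'password': parts.password,
        'host': parts.hostname,
        'port': parts.port,                   # converted to int by urlsplit
        'database': parts.path.lstrip('/'),   # path is '/test'
    }

info = describe_db_url('postgresql://smallfish:123456@127.0.0.1:5432/test')
```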

Edit hellodb/model/__init__.py and add the definition of a table named msg and its columns:

"""The application's model objects"""
from hellodb.model.meta import Session, metadata

from sqlalchemy import orm, schema, types
from datetime import datetime

def init_model(engine):
    """Call me before using any of the tables or classes in the model"""
    Session.configure(bind=engine)

def now():
    return datetime.now()

msg_table = schema.Table('msg', metadata,
    schema.Column('id', types.Integer, schema.Sequence('msg_seq_id', optional=True), primary_key=True),
    schema.Column('content', types.Text(), nullable=False),
    schema.Column('addtime', types.DateTime(), default=now),
)

class Msg(object):
    pass

orm.mapper(Msg, msg_table)

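The example msg table is simple, with three columns: ID, content and time.

Apart from importing a few modules from the sqlalchemy package, the code above consists of the column definitions for the table and an empty Msg class.

The last line performs the mapping, binding Msg to msg_table.

Do we now have to create the table in the database by hand? No, there is a simple way to initialize the database: paster setup-app development.ini: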

smallfish@debian:~/workspace/python/hellodb$ paster setup-app development.ini
Running setup_config() from hellodb.websetup
20:08:43,619 INFO  [sqlalchemy.engine.base.Engine.0x...854c] [MainThread] select version()
20:08:43,619 INFO  [sqlalchemy.engine.base.Engine.0x...854c] [MainThread] {}
20:08:43,625 INFO  [sqlalchemy.engine.base.Engine.0x...854c] [MainThread] select current_schema()
20:08:43,625 INFO  [sqlalchemy.engine.base.Engine.0x...854c] [MainThread] {}
20:08:43,631 INFO  [sqlalchemy.engine.base.Engine.0x...854c] [MainThread] select relname from pg_class c join pg_namespace n on n.oid=c.relnamespace where n.nspname=current_schema() and lower(relname)=%(name)s
20:08:43,631 INFO  [sqlalchemy.engine.base.Engine.0x...854c] [MainThread] {'name': u'msg'}
20:08:43,637 INFO  [sqlalchemy.engine.base.Engine.0x...854c] [MainThread]
CREATE TABLE msg (
        id SERIAL NOT NULL,
        content TEXT NOT NULL,
        addtime TIMESTAMP WITHOUT TIME ZONE,
        PRIMARY KEY (id)
)

20:08:43,637 INFO  [sqlalchemy.engine.base.Engine.0x...854c] [MainThread] {}
20:08:43,732 INFO  [sqlalchemy.engine.base.Engine.0x...854c] [MainThread] COMMIT

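The log output above includes the SQL statement that creates the table. The SERIAL type comes from the Sequence definition on the Column in __init__.py. In PostgreSQL, serial is roughly the equivalent of an auto-increment ID (auto_increment) in MySQL.

If you now connect to PostgreSQL and inspect the database, you can see that the table and the sequence have been created.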

test=# \d
           List of relations
 Schema |    Name    |   Type   |   Owner
--------+------------+----------+-----------
 public | msg        | table    | smallfish
 public | msg_id_seq | sequence | smallfish
(2 rows)

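The preparation is now complete: the database is initialized, the configuration file is set up and the example controller exists.

Now let's add database reads and writes to the controller code.

First create a form template, db.htm, used to add a record and save it to the table: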

<form action="/db/add" method="post">
    <input type="text" name="content" />
    <br />
    <input type="submit" value="save" />
</form>

Change the controller's index method to the following. It simply renders the template:

class DbController(BaseController):

    def index(self):
        return render('/db.htm')

Add an add method, matching the /db/add path in the form above:

    def add(self):
        content = request.POST['content']
        from hellodb import model
        msg = model.Msg()
        msg.content = content
        model.meta.Session.add(msg)
        model.meta.Session.commit()
        return "add %s ok..." % content

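That completes the insert. The method reads the text field from the POST data and then creates a Msg object (the class defined in the model above).

Note that after add you must call commit explicitly; only then is the row actually saved to the database.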
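Why the explicit commit matters can be shown without Pylons. This is a rough sketch that swaps in the standard-library sqlite3 module (Python 3) for SQLAlchemy and PostgreSQL; the table is a simplified stand-in for msg, and count_rows is a hypothetical helper.

```python
import sqlite3

# In-memory database standing in for the PostgreSQL "test" database.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE msg (id INTEGER PRIMARY KEY, content TEXT NOT NULL)')
conn.commit()

def count_rows(conn):
    """Return the number of rows currently in msg."""
    return conn.execute('SELECT COUNT(*) FROM msg').fetchone()[0]

# An insert that is rolled back never reaches the table.
conn.execute('INSERT INTO msg (content) VALUES (?)', ('lost',))
conn.rollback()
after_rollback = count_rows(conn)

# An insert followed by commit is stored permanently.
conn.execute('INSERT INTO msg (content) VALUES (?)', ('kept',))
conn.commit()
after_commit = count_rows(conn)
```

Here after_rollback is 0 and after_commit is 1: pending changes live in a transaction until commit makes them durable, which is the same reason Session.commit() is needed after Session.add().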

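Open http://127.0.0.1:5000/db/index in a browser and add some data. You can then query PostgreSQL and see that the rows have been inserted.

Next, pass some values from index to the template to display the data just added: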

    def index(self):
        from hellodb import model
        c.msgs = model.meta.Session.query(model.Msg).all()
        return render('/db.htm')

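c.msgs can be thought of as a global variable; you can see where c comes from in the first few lines of the controller. Modify the db.htm template to display the records: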

% for msg in c.msgs:
<p>${msg.id}: ${msg.content} / ${msg.addtime}</p>
% endfor

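It is just an ordinary for loop that iterates over the c.msgs passed from the index method. Mako templates are quite readable, aren't they?

Refresh http://127.0.0.1:5000/db/index again and the added data appears on the page.

After frantically entering a few dozen records, showing them all on one page looks rather crude, doesn't it?

Next is the pagination component from the webhelpers package used with Pylons; of course, you can also write your own pagination logic. Here is an example: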

    def list(self):
        from webhelpers import paginate
        from hellodb import model
        msgs = model.meta.Session.query(model.Msg)
        c.paginator = paginate.Page(
            msgs,
            page=int(request.params.get('page', 1)),
            items_per_page = 10,
        )
        return render("/list.htm")

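Import paginate, then pass the database query object to paginate.Page. The page argument is the page number taken from the request parameters, and items_per_page is simply the number of items per page, 10 here.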
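For readers who want to write the pagination logic themselves, the arithmetic behind a page object looks roughly like this. It is a minimal Python 3 sketch over a plain list, not the webhelpers implementation; paginate_list and the keys of the returned dict are hypothetical names loosely modelled on the pager variables.

```python
import math

def paginate_list(items, page=1, items_per_page=10):
    """Return the slice of items for one page plus some bookkeeping."""
    item_count = len(items)
    page_count = max(1, math.ceil(item_count / items_per_page))
    page = min(max(1, page), page_count)      # clamp out-of-range pages
    start = (page - 1) * items_per_page
    return {
        'items': items[start:start + items_per_page],
        'page': page,
        'page_count': page_count,
        'item_count': item_count,
        'first_item': start + 1 if item_count else 0,
        'last_item': min(start + items_per_page, item_count),
    }

msgs = ['msg %d' % i for i in range(1, 26)]   # 25 fake records
page3 = paginate_list(msgs, page=3)
```

With 25 records and 10 per page there are three pages, and the last one holds records 21 to 25.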

The corresponding template, list.htm, is as follows:

% if len(c.paginator):

% for msg in c.paginator:
<p>${msg.id}: ${msg.content}</p>
% endfor

<p> ${c.paginator.pager("count: $item_count $first_item to $last_item , $link_first $link_previous $link_next $link_last")} </p>
% endif

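The for loop is the same as in the earlier example, with a pager line added below it. The variable names make their purpose clear: the total number of items, the range of items currently shown, and the usual first, previous, next and last page links.

The link text defaults to <<, <, > and >>. To change it to words, see the documentation. On the first page, the first-page and previous-page links are not shown. Anyone who has implemented pagination has probably written similar code.

Now visit http://127.0.0.1:5000/db/list. To see the effect you need to enter more data; the pager only appears once there are more than 10 records.

That's it: there is now example code for adding and displaying database records, plus pagination. For deletes, updates and the like, please refer to the SQLAlchemy documentation.

Pylons Beginner Tutorial by Example – Forms and File Uploads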

On 2024/06/30, in Python, by admin

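Continuing from the previous post, "Pylons Beginner Tutorial by Example – Hello", this one shows how to submit forms and upload files in Pylons.

Reusing the hello project from the previous post, add a form method to HiController: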

    def form(self):
        return render('/form.mako')

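After adding it, visit http://127.0.0.1:5000/hi/form; it will report an error.

It is a Server Error, and the message makes it clear that the template file does not exist. Other errors can be inspected on this page too, and it comes with a powerful interactive debugger. In production this feature is normally turned off, for obvious reasons.

So let's write a form template containing just a text field and a submit button.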

<form action="/hi/submit" method="post">
name: <input type="text" name="name" />
<br />
<input type="submit" value="submit" />
</form>

Then add a submit method to handle the form submission:

    def submit(self):
        return "hello, name: %s" % request.params['name']

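request.params contains the parameters submitted through the form or the URL. For POST data, it is better to follow the upload example below and use request.POST. For a more detailed list of attributes, check the documentation or explore with dir().

Now let's try a file upload. First add a variable to development.ini that holds the directory where uploaded files are stored.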

[app:main]
upload_dir = %(here)s/upload

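When the server starts, %(here)s is replaced with the directory that contains the configuration file, so the path above is the upload folder under the project directory.

Modify the form above to add a file input. Note the multipart/form-data attribute, which is required for uploads.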
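The %(here)s substitution is ordinary INI interpolation. The sketch below imitates it with Python 3's configparser by supplying here as a default value; in a real Pylons application the server provides this value itself, and the project path used here is only an assumed example.

```python
from configparser import ConfigParser

INI_TEXT = """
[app:main]
upload_dir = %(here)s/upload
"""

# Assumption: the project lives in this directory.
parser = ConfigParser(defaults={'here': '/home/smallfish/workspace/python/hello'})
parser.read_string(INI_TEXT)

# Interpolation happens when the value is read.
upload_dir = parser.get('app:main', 'upload_dir')
```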

<form action="/hi/submit" method="post"  enctype= "multipart/form-data">
name: <input type="text" name="name" />
<br />
file: <input type="file" name="file" />
<br />
<input type="submit" value="submit" />
</form>

Modify the submit method to handle the file content:

    def submit(self):
        name   = request.POST['name']
        myfile = request.POST['file']
        import os
        import shutil
        from pylons import config
        local_name = os.path.join(config['app_conf']['upload_dir'], myfile.filename)
        local_file = open(local_name, "wb")
        shutil.copyfileobj(myfile.file, local_file)
        myfile.file.close()
        local_file.close()
        return "hello, name: %s, upload: %s" % (name, myfile.filename)

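The imports inside the method are only for the sake of the example; in real code put them at the top of the module. POST data is available through request.POST.

config['app_conf']['upload_dir'] is the path defined in development.ini above. You have to create this directory yourself.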

smallfish@debian:~/workspace/python/hello$ mkdir upload

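OK, all the code changes are done. Visit http://127.0.0.1:5000/hi/form again.

Try an upload; afterwards the file should appear in the upload folder.

This is only an example, of course. You still need to sanitize the uploaded file name to guard against special characters.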
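One possible way to sanitize the name before joining it with upload_dir is sketched below in Python 3. safe_filename, the allowed character set and the fallback name are all assumptions for illustration; adjust them to whatever your application needs.

```python
import os
import re

def safe_filename(filename, default='upload.bin'):
    """Reduce a client-supplied file name to a harmless base name."""
    # Some browsers send a full client path; keep only the last component.
    name = os.path.basename(filename.replace('\\', '/'))
    # Replace everything outside a conservative whitelist.
    name = re.sub(r'[^A-Za-z0-9._-]', '_', name)
    # Drop leading dots so the result is never '.', '..' or a hidden file.
    name = name.lstrip('.')
    return name or default

examples = [safe_filename(n) for n in
            ('../../etc/passwd', 'C:\\docs\\my report.txt', '..')]
```

The three examples come out as passwd, my_report.txt and upload.bin, so a crafted name can no longer escape the upload directory.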

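Pylons is a lightweight MVC web framework for Python. Like TurboGears, it is assembled from a collection of excellent components: Routes for request URL dispatch, Mako for templates and the SQLAlchemy ORM for the database layer. These are only the defaults; you can pick other components to suit your preferences, for example Jinja2 or Genshi for templates, or SQLObject as the ORM. You are free to mix and match.

Enough talk; let's install it.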

smallfish@debian:~$ sudo aptitude install python-pylons

On Debian/Ubuntu systems you can install it directly with aptitude; you can also use easy_install or install from source.

smallfish@debian:~$ sudo easy_install Pylons

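For more installation instructions, see the installation section of the official site: http://pylonshq.com/docs/en/1.0/gettingstarted/#installing

Installation done. Let's write the classic Hello program.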

smallfish@debian:~/workspace/python$ paster create -t pylons hello
Selected and implied templates:
  Pylons#pylons  Pylons application template

Variables:
  egg:      hello
  package:  hello
  project:  hello
Enter template_engine (mako/genshi/jinja2/etc: Template language) ['mako']:
Enter sqlalchemy (True/False: Include SQLAlchemy 0.5 configuration) [False]:
Creating template pylons
Creating directory ./hello

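The rest of the output is omitted; here is a brief explanation. Paste can generate parts of a Pylons application automatically, including controllers, and can also run an HTTP server for testing.

-t selects the template used to create the project. You can list the available templates as follows; see the help output for more.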

smallfish@debian:~/workspace/python/hello$ paster create --list-templates
Available templates:
  basic_package:   A basic setuptools-enabled package
  paste_deploy:    A web application deployed through paste.deploy
  pylons:          Pylons application template
  pylons_minimal:  Pylons minimal application template

Take a look at the directory structure of hello:

smallfish@debian:~/workspace/python/hello$ ls
development.ini  ez_setup.py  hello.egg-info  README.txt  setup.py
docs             hello        MANIFEST.in     setup.cfg   test.ini

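I won't explain every file here. The main body of the program lives in the hello/hello directory; development.ini and test.ini are the server configuration files for the development and test environments respectively. Let's run it first and see the result.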

smallfish@debian:~/workspace/python/hello$ paster serve --reload development.ini
Starting subprocess with file monitor
Starting server in PID 1519.
serving on http://127.0.0.1:5000

Open http://127.0.0.1:5000 in a browser. See the page? Congratulations.

Next, write a simple controller that displays a greeting.

smallfish@debian:~/workspace/python/hello$ paster controller hi
Creating /home/smallfish/workspace/python/hello/hello/controllers/hi.py
Creating /home/smallfish/workspace/python/hello/hello/tests/functional/test_hi.py

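This generates the controller and its test file. Because paster was started with --reload, it picks up the change without a restart, so you can go straight to http://127.0.0.1:5000/hi/index in the browser.

Simple, isn't it? Open hi.py; it looks roughly like this: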

class HiController(BaseController):

    def index(self):
        # Return a rendered template
        #return render('/hi.mako')
        # or, return a response
        return 'Hello World'

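As the commented-out lines show, you can return a rendered template instead; template files go in the templates directory.

Remove the return 'Hello World' line above and use return render('/hi.mako') instead: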

smallfish@debian:~/workspace/python/hello$ vi hello/templates/hi.mako
% for key, value in request.environ.items():
 ${key} = ${value}
% endfor

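Refresh http://127.0.0.1:5000/hi/index and you should see the environment variables printed.

That's all for today. Next time there will be a complete example covering URLs, templates, the database and sessions.

Python ConfigParser and ConfigObj: Ordering When Reading and Writing INI Files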

On 2024/04/19, in Python, by admin

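By default, ConfigParser does not keep sections in the order they were added. In the following code they come out in alphabetical order: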

>>> from ConfigParser import ConfigParser
>>> cf = ConfigParser()
>>> cf.add_section('d')
>>> cf.set('d', 'name', 'smallfish')
>>> cf.add_section('a')
>>> cf.set('a', 'name', 'smallfish2')
>>> cf.write(open('d:/a.ini', 'w'))
>>> cf = None

The generated file looks like this:

[a]
name = smallfish2
[d]
name = smallfish

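I went through the official documentation and found little about section ordering in ConfigParser; after all, the dictionary it uses internally is unordered, so changing this would probably mean modifying the source. (Python 2.7 and later default to an ordered dictionary and keep insertion order.) There is, however, a nice library called ConfigObj that preserves order, and it offers a lot more besides. Download: http://www.voidspace.org.uk/python/configobj.html

A code snippet: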

>>> from configobj import ConfigObj
>>> config = ConfigObj()
>>> config.filename = 'd:/a.ini'
>>> config['d'] = {}
>>> config['d']['name'] = 'smallfish'
>>> config['a'] = {}
>>> config['a']['name'] = 'smallfish2'
>>> config.write()

The generated file looks like this:

[d]
name = smallfish
[a]
name = smallfish2
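On current Python versions the standard library gives the same ordered result without an extra dependency. The sketch below repeats the first experiment with Python 3's configparser and writes to an in-memory buffer instead of d:/a.ini (a substitution made only so that the example has no side effects).

```python
import io
from configparser import ConfigParser

cf = ConfigParser()
cf.add_section('d')
cf.set('d', 'name', 'smallfish')
cf.add_section('a')
cf.set('a', 'name', 'smallfish2')

# Write to a string buffer rather than a file on disk.
buf = io.StringIO()
cf.write(buf)
text = buf.getvalue()
```

Here [d] is written before [a], matching the order in which the sections were added.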

web.py Database Operations Guide

On 2024/03/19, in Python, by admin

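Official site: http://webpy.org/

web.py is a small, flexible framework; the latest stable release is 0.33. This post does not cover web development and focuses on its database operations.

Many Python programmers like to write their own database wrapper class at first, and I was no exception. After reading the web.py source, though, I found its database layer compact and practical. Recommended for the lazy.

Enough talk. Installation first; there are two ways:

1. With easy_install. If you don't have this tool, see: https://chenxiaoyu.org/blog/archives/23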

easy_install web.py

2. Download the source and install it. URL: http://webpy.org/static/web.py-0.33.tar.gz . After unpacking, run:

python setup.py install

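That completes the web.py installation. To use its db features you also need the driver module for your database, such as MySQLdb or psycopg2. To try the connection pool (database pool) feature, install DBUtils as well. All of these modules can be installed with easy_install.

Let's start using it!

1. Import the module and define the database connection db.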

import web
db = web.database(dbn='postgres', db='mydata', user='dbuser', pw='')

2. select queries

# query a table
entries = db.select('mytable')

# where clause
myvar = dict(name="Bob")
results = db.select('mytable', myvar, where="name = $name")
results = db.select('mytable', where="id>100")

# select specific columns
results = db.select('mytable', what="id,name")

# order by
results = db.select('mytable', order="post_date DESC")

# group
results = db.select('mytable', group="color")

# limit
results = db.select('mytable', limit=10)

# offset
results = db.select('mytable', offset=10)

3. Update

db.update('mytable', where="id = 10", value1 = "foo")

4. Delete

db.delete('mytable', where="id=10")

5. Complex queries

# count
results = db.query("SELECT COUNT(*) AS total_users FROM users")
print results[0].total_users

# join
results = db.query("SELECT * FROM entries JOIN users ON entries.author_id = users.id")

# pass values through vars to prevent SQL injection
results = db.query("SELECT * FROM users WHERE id=$id", vars={'id':10})

6. Multiple databases (web.py newer than 0.3)

db1 = web.database(dbn='mysql', db='dbname1', user='foo')
db2 = web.database(dbn='mysql', db='dbname2', user='foo')

print db1.select('foo', where='id=1')
print db2.select('bar', where='id=5')

7. Transactions

t = db.transaction()
try:
    db.insert('person', name='foo')
    db.insert('person', name='bar')
except:
    t.rollback()
    raise
else:
    t.commit()

# on Python 2.5+ you can use with
from __future__ import with_statement
with db.transaction():
    db.insert('person', name='foo')
    db.insert('person', name='bar')
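The reason step 5 passes values through vars instead of formatting them into the SQL string can be demonstrated with any DB-API driver. This sketch uses Python 3's built-in sqlite3 rather than web.py, with an invented users table; find_by_name is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)')
conn.executemany('INSERT INTO users (name) VALUES (?)',
                 [('Bob',), ('Alice',)])

def find_by_name(conn, name):
    """Look up users by name with a bound parameter, never string formatting."""
    cur = conn.execute('SELECT id, name FROM users WHERE name = :name',
                       {'name': name})
    return cur.fetchall()

normal = find_by_name(conn, 'Bob')
# A classic injection attempt is treated as a literal value, so nothing matches.
attack = find_by_name(conn, "Bob' OR '1'='1")
```

Because the driver binds the value separately from the statement, the quote characters in the malicious input never become part of the SQL.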

Python (Stackless) + MongoDB: Analyzing a 2 GB Apache Log

On 2024/03/04, in Apache, MongoDB, Python, by admin

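Why Stackless? http://www.stackless.com

Stackless can be thought of simply as an enhanced version of Python, and its most eye-catching feature is microthreads. Microthreads are lightweight threads: compared with OS threads, switching between them costs fewer resources and sharing data between them is easier, and the resulting code is more concise and readable than multithreaded code. The project is famously used by EVE Online and performs very well under concurrency. It installs like regular Python, and you could consider replacing the system Python with it. :)

Why MongoDB? http://www.mongodb.org

The official site lists many popular applications that use MongoDB, such as sourceforge and github. What are its advantages over an RDBMS? The most obvious are speed and performance: it can be used like a key-value store, yet it also supports database-style queries (distinct, group, random access, indexes and so on). Another selling point is simplicity: whether it is the application itself, the documentation or the third-party APIs, a quick skim is almost enough to get started. One regret is that the stored data files are large, between 2 and 4 times the size of the raw data. The Apache log tested in this post is 2 GB, and the generated data files take 6 GB. Ouch… I hope newer versions slim this down, though it is clearly a consequence of trading space for speed.

Besides the two pieces of software mentioned above, you also need to install the pymongo module. http://api.mongodb.org/python/

The module can be installed from source or with easy_install; I won't go into detail here.

1. Extract the data to be saved from the Apache log, such as the IP, time, GET/POST method, response status code and so on.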

import re

# Raw strings keep the backslashes intact; the dot in HTTP/1\.x is escaped.
fmt_str  = r'(?P<ip>[.\d]+) - - \[(?P<time>.*?)\] "(?P<method>.*?) (?P<uri>.*?) HTTP/1\.\d" (?P<status>\d+) (?P<length>.*?) "(?P<referer>.*?)" "(?P<agent>.*?)"'
fmt_name = re.findall(r'\?P<(.*?)>', fmt_str)
fmt_re   = re.compile(fmt_str)

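This defines a regular expression that extracts the fields of each log line. fmt_name is the list of group names found between the angle brackets.

2. Define the MongoDB variables, including the name of the collection to store into. Connection uses the default host and port.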
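To check what the pattern extracts, it helps to run it against a single line. The sketch below (Python 3) does that with an invented sample line in Apache combined log format; SAMPLE and parse_line are hypothetical, and groupdict() is used as a shortcut for zipping the group names with the matched values.

```python
import re

FMT_STR = (r'(?P<ip>[.\d]+) - - \[(?P<time>.*?)\] '
           r'"(?P<method>.*?) (?P<uri>.*?) HTTP/1\.\d" '
           r'(?P<status>\d+) (?P<length>.*?) '
           r'"(?P<referer>.*?)" "(?P<agent>.*?)"')
FMT_RE = re.compile(FMT_STR)

# Invented log line, for illustration only.
SAMPLE = ('192.168.0.12 - - [04/Mar/2010:10:15:32 +0800] '
          '"GET /index.html HTTP/1.1" 200 2326 '
          '"http://example.com/" "Mozilla/5.0"')

def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    m = FMT_RE.search(line)
    return m.groupdict() if m else None

record = parse_line(SAMPLE)
```

Every value in record is a string, so fields such as status or length need converting with int() if you want numeric queries later.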

from pymongo import Connection

conn     = Connection()
apache   = conn.apache
logs     = apache.logs

3. Save a log line

def make_line(line):
    m = fmt_re.search(line)
    if m:
        logs.insert(dict(zip(fmt_name, m.groups())))

4. Read the Apache log file

def make_log(log_path):
    with open(log_path) as fp:
        for line in fp:
            make_line(line.strip())

5. Run it.

if __name__ == '__main__':
    make_log('d:/apachelog.txt')

That is roughly the whole script. The Stackless part is not included here; see the following code for reference:

import stackless
def print_x(x):
    print x
stackless.tasklet(print_x)('one')
stackless.tasklet(print_x)('two')
stackless.run()

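Calling a tasklet only puts the operation into a queue; run is what actually executes it. Here it mainly serves to replace the earlier threading-based approach of analyzing several logs in parallel.

Addendum:

The Apache log is 2 GB, about 6.71 million lines. The generated database takes 6 GB.

Hardware: Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz desktop

OS: RHEL 5.2, ext3 file system

Other: Stackless 2.6.4, MongoDB 1.2

Up to about 3 million saved records everything was fine. CPU, memory and insert speed were all good, at roughly 8,000-9,000 records per second, much the same as earlier tests on my laptop. Beyond that, memory consumption shot up and inserts slowed down. At about 5 million records, CPU usage reached 40% and memory consumption 2.1 GB. Speed and efficiency seemed to pick up again when the second 2 GB data file was being created. The final result was not very satisfying.

I later reran the test on the laptop with 10 million records, and it was noticeably faster than the 6.71 million run above. My initial suspicion is that two things affect performance and speed:

1. File system differences. The laptop runs Ubuntu 9.10 with ext4. From what I found, ext3 and ext4 differ in large-file read/write performance.

2. Regular expression matching. Every line is matched and extracted individually; for large files there should still be room for optimization.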

Fixing the PYTHON_EGG_CACHE Error Under mod_python

On 2009/12/16, in Apache, Python, by admin

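Environment: Linux, Apache, Python (mod_python)

After moving to a new machine where mod_python had not been set up yet, import MySQLdb in some applications failed with the following error: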

ExtractionError: Can't extract file(s) to egg cache
The following error occurred while trying to extract file(s) to the Python egg
cache:
  [Errno 13] Permission denied: '/root/.python-eggs'
The Python egg cache directory is currently set to:
  /root/.python-eggs
Perhaps your account does not have write access to this directory?  You can
change the cache directory by setting the PYTHON_EGG_CACHE environment
variable to point to an accessible directory.

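There are two ways to fix it:

1. Set the PYTHON_EGG_CACHE environment variable. SetEnv is an Apache configuration directive, not a shell command:

SetEnv PYTHON_EGG_CACHE /tmp/aaa/

Make sure the directory is owned by (or writable by) the apache user, or simply set its permissions to 777.

2. Convert the egg file into a directory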

$ cd /python-path/site-packages/
$ mv MySQL_python-1.2.3c1-py2.5-linux-x86_64.egg foo.zip
$ mkdir MySQL_python-1.2.3c1-py2.5-linux-x86_64.egg
$ cd MySQL_python-1.2.3c1-py2.5-linux-x86_64.egg
$ unzip ../foo.zip
$ rm ../foo.zip
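A related variant of the first fix, not part of the original steps, is to set the variable from Python itself before the egg-based module is imported. The sketch below (Python 3) shows the idea; use_egg_cache and the python-eggs directory under the system temp folder are assumptions, and whether this is appropriate depends on how your application is deployed.

```python
import os
import tempfile

def use_egg_cache(path):
    """Create path if needed, point PYTHON_EGG_CACHE at it, report writability."""
    os.makedirs(path, exist_ok=True)
    os.environ['PYTHON_EGG_CACHE'] = path
    return os.access(path, os.W_OK)

# Assumed location: a python-eggs folder under the system temp directory.
cache_dir = os.path.join(tempfile.gettempdir(), 'python-eggs')
writable = use_egg_cache(cache_dir)

# The import that triggered the error would come only after this point,
# e.g.: import MySQLdb
```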

Running Remote Commands over SSH with Pexpect

On 2009/12/15, in Python, by admin

pexpect is a Python module that can be installed with: easy_install pexpect

Here pexpect is used to run ssh and check uptime and disk usage (df -h) on remote hosts.

#ssh_cmd.py
#coding:utf-8
import pexpect

def ssh_cmd(ip, user, passwd, cmd):
    ssh = pexpect.spawn('ssh %s@%s "%s"' % (user, ip, cmd))
    r = ''
    try:
        # expect() takes regular expressions, so the literal ( ) ? are escaped
        i = ssh.expect(['password: ', r'continue connecting \(yes/no\)\?'])
        if i == 1:
            # first connection: accept the host key, then wait for the password prompt
            ssh.sendline('yes')
            ssh.expect('password: ')
        ssh.sendline(passwd)
    except pexpect.EOF:
        ssh.close()
    else:
        r = ssh.read()
        ssh.expect(pexpect.EOF)
        ssh.close()
    return r

hosts = '''
192.168.0.12:smallfish:1234:df -h,uptime
192.168.0.13:smallfish:1234:df -h,uptime
'''

for host in hosts.split("\n"):
    if host:
        ip, user, passwd, cmds = host.split(":")
        for cmd in cmds.split(","):
            print "-- %s run:%s --" % (ip, cmd)
            print ssh_cmd(ip, user, passwd, cmd)

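Each line of hosts has the format host IP:username:password:commands (multiple commands separated by commas).
The script prints the result of each command; you could assemble the output into HTML and send it by mail for a nicer presentation.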
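The parsing of the hosts text can be separated from the SSH calls, which makes it easy to check on its own. This is a small Python 3 sketch; parse_hosts and the dictionary keys are hypothetical names, and the format assumes that neither the user name nor the password contains a colon.

```python
HOSTS = '''
192.168.0.12:smallfish:1234:df -h,uptime
192.168.0.13:smallfish:1234:df -h,uptime
'''

def parse_hosts(text):
    """Turn 'ip:user:password:cmd1,cmd2' lines into a list of dicts."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue          # skip blank lines
        # Split on the first three colons only, so commands may contain ':'.
        ip, user, passwd, cmds = line.split(':', 3)
        entries.append({'ip': ip, 'user': user, 'passwd': passwd,
                        'cmds': cmds.split(',')})
    return entries

entries = parse_hosts(HOSTS)
```

Each entry can then be fed to ssh_cmd in a loop, one command at a time.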

smallfish weblog