Tech

CPU Tier List

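After years away from hands-on hardware, I've completely lost track of the CPU market.

For checking performance and finding good value, a tier list like this is about all I rely on.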

Tech

Online Backup for SQLite

Python's built-in sqlite3 module has no online backup feature in Python 2 or in Python 3 before 3.7.

So the usual backup approach looks like this:

import sqlite3
try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO        # Python 3

def init_sqlite_db(app):
    # Dump the on-disk database into an in-memory SQL script
    con = sqlite3.connect(app.config['SQLITE_DATABASE'])
    tempfile = StringIO()
    for line in con.iterdump():
        tempfile.write('%s\n' % line)
    con.close()
    tempfile.seek(0)
    # Create an in-memory database and replay the script into it
    app.sqlite = sqlite3.connect(":memory:")
    app.sqlite.cursor().executescript(tempfile.read())
    app.sqlite.commit()
    app.sqlite.row_factory = sqlite3.Row

At its core, this also relies on the connection's iterdump() method.
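The same iterdump() idea works without a temporary file and without Flask. Below is a minimal Python 3 sketch. It assumes you only want to copy one open connection into a fresh in-memory database. The function name `clone_via_iterdump` is made up for illustration.

```python
import sqlite3


def clone_via_iterdump(src):
    """Copy the schema and rows of `src` into a new in-memory database."""
    # iterdump() yields the SQL statements (CREATE TABLE, INSERT, ...) that rebuild src
    script = "\n".join(src.iterdump())
    dst = sqlite3.connect(":memory:")
    # executescript() runs the whole dump, including its BEGIN/COMMIT
    dst.executescript(script)
    return dst


src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE foo (bar INTEGER)")
src.execute("INSERT INTO foo VALUES (123)")
src.commit()  # commit before dumping
copy = clone_via_iterdump(src)
print(copy.execute("SELECT * FROM foo").fetchall())  # [(123,)]
```

This rebuilds the database from SQL text, so it is simple but not a page-level copy. Large databases pay the cost of re-parsing every INSERT.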

However, if you pull in a library, sqlitebck, things get much simpler.

$ pip install sqlitebck

Usage:

# Basic usage example - memory database saved into file:
>>> import sqlite3
>>> conn = sqlite3.connect(':memory:')
>>> curr = conn.cursor()
# Create table and put there some data:
>>> curr.execute('CREATE TABLE foo (bar INTEGER)')
<sqlite3.Cursor object at 0xb73b2800>
>>> curr.execute('INSERT INTO foo VALUES (123)')
<sqlite3.Cursor object at 0xb73b2800>
>>> curr.close()
>>> conn.commit()
>>> import sqlitebck
# Save in memory database (conn) into file:
>>> conn2 = sqlite3.connect('/tmp/in_memory_sqlite_db_save.db')
>>> sqlitebck.copy(conn, conn2)
>>> conn.close()
>>> curr2 = conn2.cursor()
# Check if data is in file database:
>>> curr2.execute('SELECT * FROM foo');
<sqlite3.Cursor object at 0xb73b2860>
>>> curr2.fetchall()
[(123,)]
# To load the file database back into memory, copy into a fresh
# in-memory connection (conn was closed above):
>>> conn = sqlite3.connect(':memory:')
>>> sqlitebck.copy(conn2, conn)

Also, to improve performance, use SQLite's reserved database name :memory: to get an in-memory database.
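Python 3.7 added `sqlite3.Connection.backup()`, which wraps SQLite's online backup API directly. On those versions neither the dump trick nor sqlitebck is strictly needed. Here is a minimal sketch; the helper name `online_backup` is illustrative and not from the original post.

```python
import sqlite3


def online_backup(src, target_path, pages=-1):
    """Copy the live database behind `src` into `target_path` (Python 3.7+).

    pages=-1 copies everything in one step; a positive value copies that many
    pages per step, sleeping briefly between steps.
    """
    target = sqlite3.connect(target_path)
    with target:  # commits on success
        src.backup(target, pages=pages)
    return target


# Same data as the sqlitebck example: in-memory source, copied out
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE foo (bar INTEGER)")
mem.execute("INSERT INTO foo VALUES (123)")
mem.commit()
saved = online_backup(mem, ":memory:")  # in practice, a file path like "backup.db"
print(saved.execute("SELECT * FROM foo").fetchall())
```

Swapping the arguments (file connection as `src`, `:memory:` as target) loads a file database into memory. That is the same two-way use as `sqlitebck.copy`.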

Tech

True Random Numbers: Random.org

Most everyday programming uses pseudo-random numbers. Random.org offers true random numbers as an online service.

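It offers many kinds. Roughly:

Random integers within a range
Shuffling a given list of strings
Random images
Random string generation
Password generation
…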

Fetching them over the network is also simple:

$ curl "https://www.random.org/integers/?num=1&min=1&max=100&col=1&base=10&format=plain&rnd=new"
16
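The same request can be made from Python using only the standard library. This is a sketch: the query parameters are copied from the curl command, and `format=plain` with `col=1` returns one number per line. The helper names `integers_url`, `parse_plain` and `fetch_integers` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://www.random.org/integers/"


def integers_url(num, lo, hi):
    """Build the plain-text query used in the curl example."""
    params = {"num": num, "min": lo, "max": hi, "col": 1,
              "base": 10, "format": "plain", "rnd": "new"}
    return BASE_URL + "?" + urlencode(params)


def parse_plain(body):
    """format=plain returns whitespace-separated integers."""
    return [int(tok) for tok in body.split()]


def fetch_integers(num, lo, hi, timeout=10):
    """Network call: returns `num` random integers in [lo, hi]."""
    with urlopen(integers_url(num, lo, hi), timeout=timeout) as resp:
        return parse_plain(resp.read().decode("ascii"))
```

`fetch_integers(1, 1, 100)` should return a one-element list. Building the URL and parsing the response are kept separate from the network call so each can be checked offline.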

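Of course, there is also a more formal API service: api.random.org

It will probably start charging around the middle of this year (2017).

PS: We used this for the annual-party raffle. It keeps the draw fair and impartial. (The site offers a digital-signature service, which serves as hard proof.)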
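For a raffle like that, one approach is to ask random.org for a random permutation of 1..N and read the winners off the front. This is a sketch, not necessarily what was used at the party. The `/sequences/` URL follows the same plain-text pattern as the integers query, but it is an assumption to check against random.org's HTTP interface documentation. The helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

SEQUENCES_URL = "https://www.random.org/sequences/"  # assumed plain-text endpoint


def sequence_url(n):
    """URL asking random.org for a random permutation of 1..n, one per line."""
    params = {"min": 1, "max": n, "col": 1, "format": "plain", "rnd": "new"}
    return SEQUENCES_URL + "?" + urlencode(params)


def fetch_permutation(n, timeout=10):
    """Network call: returns the shuffled integers 1..n as a list."""
    with urlopen(sequence_url(n), timeout=timeout) as resp:
        return [int(tok) for tok in resp.read().decode("ascii").split()]


def draw_winners(names, permutation, k):
    """Take the first k entries of a 1-based permutation as the winners."""
    if sorted(permutation) != list(range(1, len(names) + 1)):
        raise ValueError("permutation must be a shuffle of 1..len(names)")
    return [names[i - 1] for i in permutation[:k]]
```

Typical use is `draw_winners(names, fetch_permutation(len(names)), 3)`. Keeping the raw permutation alongside the name list lets anyone re-check the draw afterwards.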

Tech

JD Union promotion added a new protocol option

Have a look. If you drive the page with PhantomJS, you now need to add a few more click events:

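from selenium.webdriver.common.by import By
# `driver`: an existing WebDriver (e.g. PhantomJS) already on the promotion page
# Select the adtType_4 option (radio button, then its label)
driver.find_element(By.ID, "adtType_4").click()
driver.find_element(By.XPATH, "//div[@id='adtTypeDiv']/div/label[4]").click()
# Select the new protocol option protocol_1 (radio button, then its label)
driver.find_element(By.ID, "protocol_1").click()
driver.find_element(By.XPATH, "//div[@id='protocollDiv']/div/label[1]").click()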

With Python requests, you need to add a new field to the POST data:

protocol=2
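Here is a sketch of the requests side. It assumes your crawler already builds the other form fields and holds a logged-in session. Those fields are not documented here, so `fields` is a placeholder, and the function name is hypothetical. The URL is the getAdvcode endpoint used in the next post. Building an unsent `requests.Request` lets you inspect the encoded body offline.

```python
import requests

ADVCODE_URL = "https://media.jd.com/getAdvcode/1"  # endpoint quoted in the next post


def build_advcode_request(fields):
    """Return an unsent POST whose form data includes the new protocol field.

    `fields` is a placeholder for whatever form fields your crawler already sends.
    """
    data = dict(fields)
    data["protocol"] = 2  # the newly required field
    return requests.Request("POST", ADVCODE_URL, data=data)


req = build_advcode_request({})  # no other fields, just to show the encoding
print(req.prepare().body)  # protocol=2
# With a logged-in session, prepare through the session so its cookies are attached:
# resp = session.send(session.prepare_request(req))
```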

Tech

Suppressing SSL warnings in Python requests

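When requests accesses an HTTPS site whose certificate has a problem, it raises an exception (requests.exceptions.SSLError).

Frankly, when you're writing a crawler, who cares about certificates or whether things are secure?

Usually you add this to the requests get or post call:

verify=False

For example:

resp = requests.post("https://media.jd.com/getAdvcode/1", data=data, verify=False)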

However, this means requests will emit a warning:

connectionpool.py:730: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html (This warning will only appear once by default.)

Warnings are unexpected log output, and lots of them get annoying.

Turn them off like this:

import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning
# Silence InsecureRequestWarning for the whole process
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
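The call above silences the warning for the whole process. You may want to keep the warning everywhere except in the crawler calls that deliberately use verify=False. In that case the standard warnings module can scope it. This sketch imports the warning class from urllib3 directly, which is installed alongside requests because requests depends on it. The context-manager name is made up.

```python
import contextlib
import warnings

from urllib3.exceptions import InsecureRequestWarning


@contextlib.contextmanager
def insecure_requests_quiet():
    """Ignore InsecureRequestWarning only inside the with-block."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", InsecureRequestWarning)
        yield


# Usage:
# with insecure_requests_quiet():
#     resp = requests.post(url, data=data, verify=False)
```

The filters are restored on exit, so other code still sees the warning. Note that `warnings.catch_warnings` is not thread-safe. In a multithreaded crawler, the global `disable_warnings` call is the simpler choice.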

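PS: A certificate error like this often means a man-in-the-middle has been detected. If you use a rotating proxy and certificate problems like this start showing up, it's best to drop that proxy (it is probably capturing your HTTPS data).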

Tech

A Chrome extension for exporting cookies

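This cookie exporter works quite well.

To use it with Python, export the cookies in LWP format, then load them with Python's cookielib library (http.cookiejar in Python 3).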

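import requests
try:
    import cookielib                    # Python 2
except ImportError:
    import http.cookiejar as cookielib  # Python 3

r = requests.Session()
# "cookies" is the LWP-format file exported from the browser
r.cookies = cookielib.LWPCookieJar("cookies")
r.cookies.load(ignore_discard=True)
r.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36"
})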

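The cookie file has to match the format cookielib expects, and there seems to be a string check on the first line. This is from cookielib's LWPCookieJar: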

def _really_load(self, f, filename, ignore_discard, ignore_expires):
    magic = f.readline()
    if not re.search(self.magic_re, magic):
        msg = ("%r does not look like a Set-Cookie3 (LWP) format "
               "file" % filename)
        raise LoadError(msg)

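magic_re is defined as:

magic_re = r"^\#LWP-Cookies-(\d+\.\d+)"

In other words, the first line of the cookie file has to be:

#LWP-Cookies-2.0

Honestly, that's pretty clunky. After exporting with EditThisCookie, you have to fix up the header line by hand.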
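Instead of editing the header by hand every time, a small helper can prepend it. This Python 3 sketch reuses the same magic_re pattern and only rewrites files that don't already pass the check. The function name is hypothetical.

```python
import re

LWP_HEADER = "#LWP-Cookies-2.0"
# Same pattern cookielib checks against the first line
MAGIC_RE = re.compile(r"^\#LWP-Cookies-(\d+\.\d+)")


def ensure_lwp_header(path):
    """Prepend the LWP magic line to an exported cookie file if missing.

    Returns True if the file was rewritten, False if it already had a header.
    """
    with open(path, encoding="utf-8") as f:
        text = f.read()
    first_line = text.split("\n", 1)[0]
    if MAGIC_RE.search(first_line):
        return False
    with open(path, "w", encoding="utf-8") as f:
        f.write(LWP_HEADER + "\n" + text)
    return True
```

Run it once on the exported file before `LWPCookieJar(...).load()`. It only fixes the header; the lines below it still need to be `Set-Cookie3:` lines, which is what an LWP export is expected to contain.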

Tech

Sublime Text 3: replacing tabs with spaces

Preferences -> Settings (user settings)

{
    "translate_tabs_to_spaces": true
}

Tech

A fast, handy Python console

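Sublime Text is a very good text editor, and one of its plugins is arguably the best replacement for IDLE.

SublimeREPL: a fast, handy Python shell

With it you can quickly try out code and debug programs.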

Tech

Setting up the pyspider crawler framework on CentOS 7

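pyspider is an open-source crawler framework developed entirely in China.

Its scheduling, fetching, task processing, and result statistics are all quite good.

Installing pyspider on CentOS still has a few small pitfalls.

Here are the basic installation steps:

Read more…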

Tech

Raspberry Pi Pinout

How about an irresponsible one-picture summary?

Read more…