Markdown study guides
http://wowubuntu.com/markdown/#list
http://sspai.com/25137
ELK on ubuntu 14.04
https://www.digitalocean.com/community/tutorials/how-to-install-elasticsearch-logstash-and-kibana-4-on-ubuntu-14-04  
A few TCP-related URLs
Two good articles on TCP tuning: Part 1: Lessons learned tuning TCP and Nginx in EC2; Part 2: Lessons learned tuning TCP and Nginx in EC2
PHP extension development with Zephir
A quick introduction to Zephir: it is a high-level language that lets PHP developers write compilable, statically typed code. It was created by the excellent Phalcon team to build the second version of their framework. Its syntax is elegant, writing extensions with it is very convenient, and its execution speed is said to be roughly on par with C. The build pipeline is: Zephir -> C -> binary.
Zephir is aimed at building object-oriented libraries and frameworks, so unlike a plain C extension it cannot be used to write non-OOP modules.
For more details, see the official site: http://zephir-lang.com/
Installing Zephir
Prerequisites: building a PHP extension and using Zephir requires the following:
   gcc >= 4.x/clang >= 3.x
   re2c 0.13 or later
   gnu make 3.81 or later
   autoconf 2.31 or later
   automake 1.14 or later
   libpcre3
   php development headers and tools
On Ubuntu, the required packages can be installed like this:
$ sudo apt-get update  
$ sudo apt-get install git gcc make re2c php5 php5-json php5-dev libpcre3-dev   (on CentOS the dev package is php-devel)
Also make sure the PHP development tools are installed (they come with the PHP installation):
$ phpize -v  
Configuring for:  
PHP Api Version:         20121113  
Zend Module Api No:      20121212  
For now the Zephir compiler has to be cloned from GitHub:
$ git clone https://github.com/phalcon/zephir  
Run the Zephir installer (it compiles/creates the parser):
$ cd zephir  
$ ./install-json  
$ ./install -c  
On macOS you may also need to install automake before running the installer.
Developing an extension with Zephir
The first thing to do is generate an extension skeleton, which gives us the basic structure of the extension. In this example we create an extension named "utils":
$ zephir init utils  
Afterwards there is a "utils" directory under the current directory:
utils/
  ext/
  utils/
The ext/ directory contains the code the compiler generates to build the extension (C source).
The utils/ directory holds our Zephir source code.
Switch into the working directory, i.e. utils, and start writing code:
$ cd utils  
$ ls  
ext/ utils/ config.json  
config.json holds the project configuration and can be used to change Zephir's behaviour.
Add our first class:
// utils/utils/greeting.zep  
namespace Utils;  
class Greeting  
{  
   public static function say()  
   {  
       echo "hello world!";  
   }  
}  
Now tell Zephir to compile the project into an extension; run this inside the project directory:
$ zephir build  
If everything goes well, it prints:
...
Extension installed!
Add extension=utils.so to your php.ini
Don't forget to restart your web server
A first check:
$ php -m
[PHP Modules]
....
SPL
standard
tokenizer
utils
....
Good. Now call it from PHP:
<?php  
echo Utils\Greeting::say(), "\n";  
It prints hello world! Success!
Shell script to SCAN and delete Redis keys
#!/bin/sh
[ "$#" -eq 1 ] || { echo "ERROR: 1 argument required, $# provided"; exit 1; }
cursor=-1
keys=""
while [ "$cursor" -ne 0 ]; do
    if [ "$cursor" -eq -1 ]; then
        cursor=0
    fi
    reply=`redis-cli SCAN $cursor MATCH "$1"`
    # first token of the reply is the next cursor, the rest (if any) are keys
    cursor=`expr "$reply" : '\([0-9]*\)'`
    keys=${reply#[0-9]*[[:space:]]}
    if [ "$keys" != "$reply" ] && [ -n "$keys" ]; then
        redis-cli DEL $keys
    fi
done
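On Redis 2.8+ the same cleanup can be done without parsing SCAN replies by hand. A minimal sketch, assuming your redis-cli build supports the --scan and --pattern options:
#!/bin/sh
# Sketch only: relies on redis-cli --scan (Redis 2.8+); verify it exists on your install.
[ "$#" -eq 1 ] || { echo "usage: $0 <pattern>"; exit 1; }
# Stream matching keys and delete them in batches of 100 to keep argument lists short.
redis-cli --scan --pattern "$1" | xargs -r -n 100 redis-cli DEL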
Bundler notes (repost)
Bundler has pretty much become standard equipment for Rails. Here are my notes from using it.
Rationale
The simplest way to use Bundler is to create a Gemfile in the current directory and then run bundle (you need to install Bundler first, of course):
touch Gemfile
bundle
But the Gemfile has no content yet, so you get "WARNING: The Gemfile specifies no dependencies". Write the following into the Gemfile:
source "http://rubygems.org"
gem "rails", "3.0.0.rc"
gem "rack-cache"
gem "nokogiri", "~> 1.4.2"
Run bundle again and you will find a new Gemfile.lock file in the current directory.
Bundler looks for a Gemfile in the current directory, resolves it according to its rules, installs the gems, and finally records the installed gem versions and dependencies in Gemfile.lock.
source
Note this line in the Gemfile above:
source "http://rubygems.org"
It specifies the source searched when installing gems. The default is http://rubygems.org, the official source; in China you can use http://ruby.taobao.org/ (sincere thanks to Taobao). You can specify multiple sources and Bundler will search them in order.
Group
In a Gemfile you can put gems into groups:
source "http://rubygems.org"
gem "rails", "3.2.2"
gem "rack-cache", :require => "rack/cache"
gem "nokogiri", "~> 1.4.2"
group :development do
  gem "sqlite3"
end
group :production do
  gem "pg"
end
Then bundle install --without production skips installing the gems in the :production group.
Likewise, Bundler.require(:default, :development) requires rails, rack/cache, nokogiri and sqlite3 but not pg.
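As a concrete illustration (mine, not from the original post), a minimal boot file for a non-Rails app could wire this up as follows; the RACK_ENV handling is just an example:
# boot.rb - minimal sketch of loading gems by group with Bundler
require "rubygems"
require "bundler"
# Load the :default group plus the group for the current environment,
# e.g. RACK_ENV=development pulls in sqlite3 but not pg.
Bundler.require(:default, (ENV["RACK_ENV"] || "development").to_sym)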
require
By default, Bundler.require requires the file with the same name as the gem.
You can change that in three ways:
gem "redis", :require => ["redis/connection/hiredis", "redis"]
gem "webmock", :require => false
gem "rack-cache", :require => "rack/cache"
The first form loads both redis/connection/hiredis and redis when the redis gem is required. The second loads nothing. The third loads rack/cache when rack-cache is required.
version
Gem versions in the Gemfile mostly matter for later upgrades. Over time, gems that are not kept up to date can become a latent risk to the project (for example, a bug in some gem could leave the site open to attack).
There are generally three attitudes toward this:
Optimistic version constraint
Pessimistic version constraint
Absolute version constraint (super pessimistic)
The first:
gem 'devise', '>= 1.3.4'
This assumes every devise version after 1.3.4 is usable; running bundle update fetches and installs the latest devise.
The second:
gem 'library', '~> 2.2'
Here bundle update fetches the latest version of library that is at least 2.2 and below 3.0.
The third:
gem 'library', '2.2'
Here bundle update will not update library at all.
Of the three, most people recommend the second: it keeps gems reasonably up to date while being unlikely to break functionality, since by versioning convention an unchanged major version means an unchanged interface. See the gem versioning policies for details.
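One nuance worth spelling out (my addition, not in the original post): '~>' pins to the precision you give it, as this illustrative Gemfile fragment shows:
# "~> 2.2"   allows >= 2.2.0 and < 3.0.0  (minor and patch updates)
gem 'library', '~> 2.2'
# "~> 2.2.0" allows >= 2.2.0 and < 2.3.0  (patch updates only)
gem 'library', '~> 2.2.0'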
Original article: http://blog.zlxstar.me/blog/2012/08/28/bundler-tips/
Nginx extension development tutorial
https://www.airpair.com/nginx/extending-nginx-tutorial (noting this down for reference)
Lua FFI in practice (repost)
由来
FFI库,是LuaJIT中最重要的一个扩展库。它允许从纯Lua代码调用外部C函数,使用C数据结构。有了它,就不用再像Lua标准math库一样,编写Lua扩展库。把开发者从开发Lua扩展C库(语言/功能绑定库)的繁重工作中释放出来。
FFI简介
FFI库,允许从纯Lua代码调用外部C函数,使用C数据结构。
FFI库最大限度的省去了使用C手工编写繁重的Lua/C绑定的需要。不需要学习一门独立/额外的绑定语言——它解析普通C声明。这样可以从C头文件或参考手册中,直接剪切,粘贴。它的任务就是绑定很大的库,但不需要捣鼓脆弱的绑定生成器。
FFI紧紧的整合进了LuaJIT(几乎不可能作为一个独立的模块)。JIT编译器为Lua代码直接访问C数据结构而产生的代码,等同于一个C编译器应该生产的代码。在JIT编译过的代码中,调用C函数,可以被内连处理,不同于基于Lua/C API函数调用。
这一页将简要介绍FFI库的使用方法。
激励范例:调用外部C函数
真的很用容易去调用一个外部C库函数:
① local ffi = require("ffi") ② ffi.cdef[[ int printf(const char* fmt, ...); ]] ③ ffi.C.printf("Hello %s!", "world")
以上操作步骤,如下:
① 加载FFI库 ② 为函数增加一个函数声明。这个包含在`中括号`对之间的部分,是标准C语法。. ③ 调用命名的C函数——非常简单
事实上,背后的实现远非如此简单:③ 使用标准C库的命名空间ffi.C。通过符号名("printf")索引这个命名空间,自动绑定标准C库。索引结果是一个特殊类型的对象,当被调用时,执行printf函数。传递给这个函数的参数,从Lua对象自动转换为相应的C类型。
Ok,使用printf()不是一个壮观的示例。你也可能使用了io.write()和string.format()。但你有这个想法…… 以下是一个Windows平台弹出消息框的示例:
local ffi = require("ffi")
ffi.cdef[[
int MessageBoxA(void *w, const char *txt, const char *cap, int type);
]]
ffi.C.MessageBoxA(nil, "Hello world!", "Test", 0)
Bing! 再一次, 远非如此简单,不?
和要求使用Lua/C API去绑定函数的努力相比:
创建一个外部C文件,
增加一个C函数,遍历和检查Lua传递的参数,并调用这个真实的函数,
传统的处理方式
增加一个模块函数列表和对应的名字,
增加一个luaopen_*函数,并注册所有模块函数,
编译并链接为一个动态库(DLL),
并将库文件迁移到正确的路径,
编写Lua代码,加载模块
等等……
最后调用绑定函数。
唷!(很不爽呀!)
激励示例: 使用C数据结构
FFI库允许你创建,并访问C数据结构。当然,其主要应用是C函数接口。但,也可以独立使用。
Lua构建在高级数据类型之上。它们很灵活、可扩展,而且是动态的。这就是我们大家都喜欢Lua的原因所在。唉,针对特殊任务,你需要一个低级的数据结构时,这可能会低效。例如,一个超大的不同结构的数组,需要通过一张超大的表,存储非常多的小表来实现。这需要大量的内存开销以及性能开销。
这里是一个库的草图,操作一个彩图,以及一个基准。首先,朴素的Lua版本,如下:
local floor = math.floor
 local function image_ramp_green(n)
 local img = {}
 local f = 255/(n-1)
 for i=1,n do
   img[i] = { red = 0, green = floor((i-1)*f), blue = 0, alpha = 255 }
 end
 return img
end
 local function image_to_grey(img, n)
 for i=1,n do
   local y = floor(0.3*img[i].red + 0.59*img[i].green + 0.11*img[i].blue)
   img[i].red = y; img[i].green = y; img[i].blue = y
 end
end
 local N = 400*400
local img = image_ramp_green(N)
for i=1,1000 do
 image_to_grey(img, N)
end
以上代码,创建一个160.000像素的一张表,其中每个元素是一张持有4个范围0至255的数字值的表。首先,创建了一张绿色斜坡的图(1D,为了简单化),然后进行1000次灰阶转换操作。实在很蠢蛋,可是我需要一个简单示例……
以下是FFI版本代码。其中,被修改的部分加粗标注:
① local ffi = require("ffi") ffi.cdef[[ typedef struct { uint8_t red, green, blue, alpha; } rgba_pixel; ]] ② local function image_ramp_green(n) local img = ffi.new("rgba_pixel[?]", n) local f = 255/(n-1) ③ for i=0,n-1 do ④ img[i].green = i*f img[i].alpha = 255 end return img end local function image_to_grey(img, n) ③ for i=0,n-1 do ⑤ local y = 0.3*img[i].red + 0.59*img[i].green + 0.11*img[i].blue img[i].red = y; img[i].green = y; img[i].blue = y end end local N = 400*400 local img = image_ramp_green(N) for i=1,1000 do image_to_grey(img, N) end
Ok, 这是不是太困难:
① 首先,加载FFI库,声明底层数据类型。这里我们选择一个数据结构,持有4字节字段,每一个由4x8 RGBA像素组成。
② 通过ffi.new()直接创建这个数据结构——其中'?'是一个占位符,变长数组元素个数。
③ C数据是基于0的(zero-based),所以索引必须是0 到 n-1。你可能需要分配更多的元素,而不仅简化转换一流代码。
④ 由于ffi.new()默认0填充(zero-fills)数组, 我们仅需要设置绿色和alpha字段。
⑤ 调用math.floor()的过程可以省略,因为转换为整数时,浮点数已经被向0截断。这个过程隐式的发生在数据被存储在每一个像素的字段时。
现在让我们看一下主要影响的变更:
首先,内存消耗从22M降到640K(4004004字节)。少了35x。所以,表确实有一个显著的开销。BTW(By the Way: 顺便说一句): 原始Lua程序在x64平台应该消耗40M内存。
其次,性能:纯Lua版本运行耗时9.57秒(使用Lua解析器52.9秒),而FFI版本在我的主机上耗时0.48秒(YMMV: 因人而异)。快了20x(比Lua解析器快了110x`)。
狂热的读者,可能注意到了为颜色将纯Lua代码版本转为使用数组索引([1] 替换 .red, [2] 替换 .green 等)应该更加紧凑和更快。这个千真万确(大约1.7x)。从结构切换到数组也会有帮助。
虽然最终的代码不是惯用的,而容易出错。它仍然没有得到甚至接近FFI版本代码的性能。同时,高级数据结构不容易传递给别的C函数,尤其是I/O函数,没有过分转换处罚。
待续
扩展阅读
LuaJit FFI Library
Terra
LPEG: Parsing Expression Grammars For Lua, version 0.12
Lua中通过ffi调用c的结构体变量
使用 luajit 的 ffi 绑定 zeromq
Playing with LuaJIT FFI
LuaJIT FFI 调用 Curl 示例
Lua String Templates
Standalone FFI library for calling C functions from lua
Lua游戏开发实践指南
Lua程序设计:第2版
Beginning Lua Programming
安装LuaJIT
mkdir -p ~/lua-ffi_in_action && cd ~/lua-ffi_in_action
git clone http://luajit.org/git/luajit-2.0.git
cd luajit-2.0
make && make install
祝大家玩的开心
Original article: http://guiquanz.me/2013/05/19/lua-ffi-intro/
Assorted PHP optimization notes (repost)
Articles about PHP optimization usually teach you how to write efficient code. This one takes a different angle and shows how to configure an efficient environment, which serves the same goal.
pool
A somewhat depressing fact is that most PHP programmers overlook the value of pools. The pool here is not a database connection pool but a process pool: PHP-FPM can run several pools at the same time, each with its own configuration, and the pools are completely independent of one another.
What is the benefit? By default PHP-FPM runs a single pool and every request executes in it. Once some requests start to clog up, they can easily drag the whole pool down with them. With multiple pools you can route different classes of requests to different pools; then if some requests clog up, only their own pool suffers, which limits the blast radius of a failure.
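A rough sketch of what multiple pools look like in the PHP-FPM pool configuration; the pool names, ports and file paths below are made-up examples, not the article's setup:
; /etc/php5/fpm/pool.d/www.conf - normal site traffic
[www]
listen = 127.0.0.1:9000
pm = static
pm.max_children = 100

; /etc/php5/fpm/pool.d/admin.conf - a separate pool; if admin requests pile up,
; the [www] pool keeps serving normal traffic.
[admin]
listen = 127.0.0.1:9001
pm = static
pm.max_children = 20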
listen
Although Nginx and PHP can be deployed on different servers, in practice most people run them on the same machine, which leaves two choices for the connection: TCP or a Unix socket.
Compared with TCP, a Unix socket skips overhead such as the TCP three-way handshake, so it is somewhat more efficient. Note, however, that a Unix socket has none of TCP's reliability mechanisms, so it is best to set backlog and somaxconn fairly high, otherwise it becomes unstable under high concurrency.
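A hedged sketch of the knobs involved; the values are placeholders for illustration, not recommendations:
; PHP-FPM pool config: listen on a Unix socket and raise the listen queue
listen = /var/run/php5-fpm.sock
listen.backlog = 65535

# /etc/sysctl.conf: the kernel caps the effective backlog at net.core.somaxconn
net.core.somaxconn = 65535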
pm
Process management comes in dynamic and static flavors. Dynamic mode starts a small number of processes and then adjusts the count in real time according to the number of requests. Its advantage is obvious, saving resources, and so is its drawback: under a burst of requests the system has to busily fork new processes, which inevitably hurts performance. Static mode forks the full set of processes up front and keeps the count fixed regardless of load. Compared with dynamic mode it consumes more resources, but under high concurrency it never has to pay for expensive forks.
For high-traffic sites, unless server resources are tight, static mode is without doubt the better choice.
pm.max_children
How many PHP processes should be started? Before giving your own answer, have a look at the following articles:
php-fpm的max_chindren的一些误区
Should PHP Workers Always Equal Number Of CPUs
A CPU core can only handle one request at any given moment. When there are more requests than cores, the CPU divides its time into slices and rotates among them; since multiple tasks are being scheduled, context switching eats part of the performance. From that angle the process count should equal the number of cores, so that each process has a dedicated core and context-switch losses are minimized. But that conclusion only holds when requests are CPU-bound. Typical web requests are IO-bound: database queries and other IO inevitably leave the CPU in the WAIT state, i.e. wasted, for a good share of the time. In that case, having more processes than cores lets the CPU switch to another request whenever IO happens. That does add some context-switch overhead, but it is far better than sitting in WAIT.
So how many is right? To sort this out we need to look at memory as well as the CPU:
(screenshot of top output showing the VIRT, RES and SHR columns for php-fpm processes)
In the top output above, the memory-related columns are VIRT, RES and SHR. VIRT is the theoretical memory footprint and can usually be ignored. RES is the actual resident memory, and at first glance the number looks frightening: does a single PHP process really use that much? A large part of it is shared memory, which is what SHR shows, so the real footprint of one PHP process is roughly RES minus SHR, usually around 10 MB. By that estimate 1 GB of RAM can support roughly a hundred PHP processes and 10 GB roughly a thousand. Of course more is not automatically better; it is best to watch the number of active connections via PHP-FPM's status page and adjust from there.
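One quick way to sanity-check the per-process number on a running box is a one-liner like the following (a sketch: the process name php5-fpm and the RSS column position are assumptions that may differ on your system, and RSS still includes shared pages, so it overestimates slightly):
# Average resident memory (MB) per php-fpm process; RSS (KB) is column 8 of `ps -ly`.
ps -ylC php5-fpm | awk '$8 ~ /^[0-9]+$/ {sum += $8; n++} END {if (n) printf "%.1f MB avg over %d processes\n", sum/n/1024, n}'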
Note: for background on web concurrency models, see Fan Kai's article 「Web并发模型粗浅探讨」 (a shallow look at web concurrency models).
Original article: http://huoding.com/2014/12/25/398#comment-345047
We already use basically everything described above. Under high concurrency the Unix socket really is unstable for us; backlog and somaxconn have both been raised and it is still not stable enough. Perhaps the values are still too small; more testing is needed.
10 examples of Linux ss command to monitor network connections (repost)
ss - socket statistics
In a previous tutorial we saw how to use the netstat command to get statistics on network/socket connections. However the netstat command has long been deprecated and replaced by the ss command from the iproute suite of tools.
The ss command can show more information than netstat and is faster. netstat reads various /proc files to gather its information, an approach that becomes slow when there are many connections to display.
The ss command gets its information directly from kernel space. The options used with ss are very similar to netstat's, making it an easy replacement.
So in this tutorial we are going to see few examples of how to use the ss command to check the network connections and socket statistics.
1. List all connections
The simplest command is to list out all connections.
$ ss | less Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port u_str ESTAB 0 0 * 15545 * 15544 u_str ESTAB 0 0 * 12240 * 12241 u_str ESTAB 0 0 @/tmp/dbus-2hQdRvvg49 12726 * 12159 u_str ESTAB 0 0 * 11808 * 11256 u_str ESTAB 0 0 * 15204 * 15205 .....
We are piping the output to less so that the output is scrollable. The output will contain all tcp, udp and unix socket connection details.
2. Filter out tcp,udp or unix connections
To view only tcp or udp or unix connections use the t, u or x option.
$ ss -t State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 0 192.168.1.2:43839 108.160.162.37:http ESTAB 0 0 192.168.1.2:43622 199.59.149.201:https ESTAB 0 0 192.168.1.2:33141 83.170.73.249:ircd ESTAB 0 0 192.168.1.2:54028 74.125.135.125:xmpp-client
$ ss -t OR $ ss -A tcp
By default the "t" option alone reports only those connections that are "established" or "connected"; it does not report tcp sockets that are "listening". Use the '-a' option together with t to report them all at once.
List all udp connections
$ ss -ua State Recv-Q Send-Q Local Address:Port Peer Address:Port UNCONN 0 0 192.168.1.2:48268 *:* UNCONN 0 0 192.168.1.2:56575 *:* UNCONN 0 0 *:40309 *:* UNCONN 0 0 192.168.1.2:56879 *:* UNCONN 0 0 *:49014 *:* UNCONN 0 0 192.168.1.2:53124 *:* UNCONN 0 0 127.0.1.1:domain *:*
$ ss -a -A udp
The a option tells ss to report both "CONNECTED" and "LISTENING" sockets. Since UDP is a connection-less protocol, just "ss -u" will not report anything in most cases. Therefore we use the "a" option to report all UDP connections (connected and listening).
Similarly use the x option to list out all unix socket connections.
3. Do not resolve hostname
To get the output faster, use the "n" option to prevent ss from resolving ip addresses to hostnames. But this will prevent resolution of port numbers as well.
$ ss -nt State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 0 192.168.1.2:43839 108.160.162.37:80 ESTAB 0 0 192.168.1.2:51350 74.125.200.84:443 ESTAB 0 0 192.168.1.2:33141 83.170.73.249:6667 ESTAB 0 0 192.168.1.2:54028 74.125.135.125:5222 ESTAB 0 0 192.168.1.2:48156 66.196.120.44:5050
4. Show only listening sockets
This will list out all the listening sockets. For example apache web server opens a socket connection on port 80 to listen for incoming connections.
$ ss -ltn State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 5 127.0.1.1:53 *:* LISTEN 0 128 127.0.0.1:631 *:* LISTEN 0 128 ::1:631 :::*
The above command lists out all "listening" "tcp" connections. The n option disables hostname resolution of the ip addresses giving the output faster.
To list out all listening udp connections replace t by u
$ ss -lun State Recv-Q Send-Q Local Address:Port Peer Address:Port UNCONN 0 0 127.0.1.1:53 *:* UNCONN 0 0 *:68 *:* UNCONN 0 0 192.168.1.2:123 *:* UNCONN 0 0 127.0.0.1:123 *:* UNCONN 0 0 *:123 *:* UNCONN 0 0 *:5353 *:* UNCONN 0 0 *:47799 *:* UNCONN 0 0 *:25322 *:* UNCONN 0 0 :::54310 :::* .....
5. Print process name and pid
To print out the process name/pid which owns the connection use the p option
$ ss -ltp State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 128 127.0.0.1:9050 *:* LISTEN 0 128 *:90 *:* LISTEN 0 128 *:db-lsp *:* users:(("dropbox",3566,32)) LISTEN 0 5 127.0.0.1:6600 *:* LISTEN 0 128 127.0.0.1:9000 *:* users:(("php5-fpm",1620,0),("php5-fpm",1619,0))
In the above output the last column contains the process name and pid; in this example "dropbox" is a process name and 3566 is its pid.
$ sudo ss -ltp [sudo] password for enlightened: State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 100 127.0.0.1:smtp *:* users:(("master",2051,12)) LISTEN 0 128 *:90 *:* users:(("nginx",1701,6),("nginx",1700,6),("nginx",1699,6),("nginx",1697,6),("nginx",1696,6)) LISTEN 0 5 127.0.0.1:6600 *:* users:(("mpd",2392,5)) LISTEN 0 128 127.0.0.1:9000 *:* users:(("php5-fpm",1620,0),("php5-fpm",1619,0),("php5-fpm",1616,7)) LISTEN 0 16 *:2633 *:* users:(("oned",1853,16)) LISTEN 0 50 127.0.0.1:mysql *:* users:(("mysqld",1095,10)) LISTEN 0 5 127.0.1.1:domain *:* users:(("dnsmasq",1347,5)) LISTEN 0 32 *:ftp *:* users:(("vsftpd",1051,3)) LISTEN 0 128 *:ssh *:* users:(("sshd",1015,3)) LISTEN 0 128 127.0.0.1:ipp *:* users:(("cupsd",688,11)) LISTEN 0 128 :::http :::* users:(("apache2",5322,4),("apache2",5321,4),("apache2",5317,4),("apache2",5316,4),("apache2",5313,4),("apache2",2505,4)) LISTEN 0 128 :::ssh :::* users:(("sshd",1015,4)) LISTEN 0 128 ::1:ipp :::* users:(("cupsd",688,10))
6. Print summary statistics
The s option prints out the statistics.
$ ss -s Total: 526 (kernel 0) TCP: 10 (estab 7, closed 0, orphaned 0, synrecv 0, timewait 0/0), ports 0 Transport Total IP IPv6 * 0 - - RAW 0 0 0 UDP 15 9 6 TCP 10 9 1 INET 25 18 7 FRAG 0 0 0
7. Display timer information
With the '-o' option, timer information for each connection is also displayed. The timer field shows which timer is currently running (keepalive here), how long until it fires, and the retry count.
$ ss -tn -o State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 0 192.168.1.2:43839 108.160.162.37:80 ESTAB 0 0 192.168.1.2:36335 204.144.140.26:80 timer:(keepalive,26sec,0) ESTAB 0 0 192.168.1.2:33141 83.170.73.249:6667 ESTAB 0 0 192.168.1.2:58857 74.121.141.84:80 timer:(keepalive,23sec,0) ESTAB 0 0 192.168.1.2:42794 173.194.40.239:80 timer:(keepalive,32sec,0)
8. Display only IPv4 or IPv6 socket connections
To display only IPv4 socket connections use the '-f inet' or '-4' option.
$ ss -tl -f inet State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 128 127.0.0.1:9050 *:* LISTEN 0 128 *:90 *:* LISTEN 0 128 *:db-lsp *:* LISTEN 0 5 127.0.0.1:6600 *:*
To display only IPv6 connections use the '-f inet6' or '-6' option.
$ ss -tl6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 100 ::1:smtp :::* LISTEN 0 128 :::12865 :::* LISTEN 0 128 :::http :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 ::1:ipp :::*
9. Filtering connections by tcp state
The ss command supports filters that can be used to display only specific connections. The filter expression should come after all options. The ss command accepts filters in the following format:
$ ss [ OPTIONS ] [ STATE-FILTER ] [ ADDRESS-FILTER ]
Now here are some examples of how to filter socket connections by socket states. To display all Ipv4 tcp sockets that are in "connected" state.
$ ss -t4 state established Recv-Q Send-Q Local Address:Port Peer Address:Port 0 0 192.168.1.2:54436 165.193.246.23:https 0 0 192.168.1.2:43386 173.194.72.125:xmpp-client 0 0 192.168.1.2:38355 199.59.150.46:https 0 0 192.168.1.2:56198 108.160.162.37:http
Display sockets with state time-wait
$ ss -t4 state time-wait Recv-Q Send-Q Local Address:Port Peer Address:Port 0 0 192.168.1.2:42261 199.59.150.39:https 0 0 127.0.0.1:43541 127.0.0.1:2633
The state can be either of the following
1. established
2. syn-sent
3. syn-recv
4. fin-wait-1
5. fin-wait-2
6. time-wait
7. closed
8. close-wait
9. last-ack
10. closing
11. all - All of the above states
12. connected - All the states except for listen and closed
13. synchronized - All the connected states except for syn-sent
14. bucket - Show states which are maintained as minisockets, i.e. time-wait and syn-recv
15. big - Opposite to bucket state
Note that many states like syn-sent, syn-recv would not show any sockets most of the time, since sockets remain in such states for a very short time. It would be ideal to use the watch command to detect such socket states in real time.
Here is an example
$ watch -n 1 "ss -t4 state syn-sent"
After running the above command, try opening some website in a browser or download something from some url. Immediately you should see socket connections appearing in the output, but for a very short while.
Every 1.0s: ss -t4 state syn-sent Tue Apr 1 10:07:33 2014 Recv-Q Send-Q Local Address:Port Peer Address:Port 0 1 192.168.1.2:55089 202.79.210.121:https 0 1 192.168.1.2:33733 203.84.220.80:https 0 1 192.168.1.2:36240 106.10.198.33:https
10. Filter connections by address and port number
Apart from tcp socket states, the ss command also supports filtering based on address and port number of the socket. The following examples demonstrate that.
Display all socket connections with source or destination port of ssh.
$ ss -at '( dport = :ssh or sport = :ssh )' State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 *:ssh *:* LISTEN 0 128 :::ssh :::*
Sockets with destination port 443 or 80
$ ss -nt '( dst :443 or dst :80 )' State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 0 192.168.1.2:58844 199.59.148.82:443 ESTAB 0 0 192.168.1.2:55320 165.193.246.23:443 ESTAB 0 0 192.168.1.2:56198 108.160.162.37:80 ESTAB 0 0 192.168.1.2:54889 192.241.177.148:443 ESTAB 0 0 192.168.1.2:39893 173.255.230.5:80 ESTAB 0 0 192.168.1.2:33440 38.127.167.38:443
The following syntax would also work
$ ss -nt dst :443 or dst :80
More examples
# Filter by address
$ ss -nt dst 74.125.236.178
# CIDR notation is also supported
$ ss -nt dst 74.125.236.178/16
# Address and port combined
$ ss -nt dst 74.125.236.178:80
Ports can also be filtered with dport/sport options. Port numbers must be prefixed with a ":".
$ ss -nt dport = :80 State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 0 192.168.1.2:56198 108.160.162.37:80 ESTAB 0 0 192.168.1.2:39893 173.255.230.5:80 ESTAB 0 0 192.168.1.2:55043 74.125.236.178:80
The above is the same as ss -nt dst :80
Some more examples of filtering
# source address is 127.0.0.1 and source port is greater than 5000
$ ss -nt src 127.0.0.1 sport gt :5000
# local smtp (port 25) sockets
$ sudo ss -ntlp sport eq :smtp
# source port numbers greater than 1024
$ sudo ss -nt sport gt :1024
# sockets with remote ports less than 100
$ sudo ss -nt dport \< :100
# connections to remote port 80
$ sudo ss -nt state connected dport = :80
The following operators are supported when comparing port numbers
<= or le : Less than or equal to port
>= or ge : Greater than or equal to port
== or eq : Equal to port
!= or ne : Not equal to port
<  or lt : Less than port
>  or gt : Greater than port
Summary
The above examples cover most of what the ss command supports. For more information check the man pages.
Documentation of the filter syntax can be found in the package iproute2-doc that can be installed on debian/ubuntu systems
$ sudo apt-get install iproute2-doc
The file /usr/share/doc/iproute2-doc/ss.html contains details about the ss command filter syntax.
Original article: http://www.binarytides.com/linux-ss-command/
Using dmesg to find processes with heavy Linux IO
Environment: Ubuntu
1. Enable block IO logging
sudo sysctl vm.block_dump=1
2. Once enabled, the kernel logs every process's accesses to disk blocks; view them with dmesg
dmesg
[442825.284270] mysqld(11600): READ block 6676888 on xvdb2 (8 sectors)
[442825.289893] mysqld(11600): READ block 11543240 on xvdb2 (8 sectors)
[442825.291317] mysqld(11600): READ block 11543248 on xvdb2 (24 sectors)
3. Aggregate with awk to find the processes hitting the disk the most
dmesg |awk -F " " '{print $2}'|sort|uniq -c|sort -rn|head -n 100
1564 mysqld(11600):
994 python(11474):
302 nginx(6171):
136 mysqld(29743):
126 mysqld(15528):
71 ntpd(772):
62 mysqld(16837):
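If you want the totals grouped by program name rather than by individual pid, a small variation on the same awk (assuming the block_dump log format shown above) is:
# Strip the "(pid):" suffix so all processes of one program are summed together
dmesg | awk '/(READ|WRITE|dirtied)/ {sub(/\(.*/, "", $2); print $2}' | sort | uniq -c | sort -rn | head -n 20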
4. When you are done debugging, don't forget to turn block IO logging off again.
sudo sysctl vm.block_dump=0
IO: synchronous, asynchronous, blocking and non-blocking (corrected edition) (repost)
This is the best explanation of the topic I have seen, so I am reposting it here for everyone to learn from.
当你发现自己最受欢迎的一篇blog其实大错特错时,这绝对不是一件让人愉悦的事。 《 IO - 同步,异步,阻塞,非阻塞 》是我在开始学习epoll和libevent的时候写的,主要的思路来自于文中的那篇link。写完之后发现很多人都很喜欢,我还是非常开心的,也说明这个问题确实困扰了很多人。随着学习的深入,渐渐的感觉原来的理解有些偏差,但是还是没引起自己的重视,觉着都是一些小错误,无伤大雅。直到有位博友问了一个问题,我重新查阅了一些更权威的资料,才发现原来的文章中有很大的理论错误。我不知道有多少人已经看过这篇blog并受到了我的误导,鄙人在此表示抱歉。俺以后写技术blog会更加严谨的。 一度想把原文删了,最后还是没舍得。毕竟每篇blog都花费了不少心血,另外放在那里也可以引以为戒。所以这里新补一篇。算是亡羊补牢吧。
言归正传。 同步(synchronous) IO和异步(asynchronous) IO,阻塞(blocking) IO和非阻塞(non-blocking)IO分别是什么,到底有什么区别?这个问题其实不同的人给出的答案都可能不同,比如wiki,就认为asynchronous IO和non-blocking IO是一个东西。这其实是因为不同的人的知识背景不同,并且在讨论这个问题的时候上下文(context)也不相同。所以,为了更好的回答这个问题,我先限定一下本文的上下文。 本文讨论的背景是Linux环境下的network IO。 本文最重要的参考文献是Richard Stevens的“UNIX® Network Programming Volume 1, Third Edition: The Sockets Networking ”,6.2节“I/O Models ”,Stevens在这节中详细说明了各种IO的特点和区别,如果英文够好的话,推荐直接阅读。Stevens的文风是有名的深入浅出,所以不用担心看不懂。本文中的流程图也是截取自参考文献。
 Stevens在文章中一共比较了五种IO Model:     blocking IO     nonblocking IO     IO multiplexing     signal driven IO     asynchronous IO 由于signal driven IO在实际中并不常用,所以我这只提及剩下的四种IO Model。 再说一下IO发生时涉及的对象和步骤。 对于一个network IO (这里我们以read举例),它会涉及到两个系统对象,一个是调用这个IO的process (or thread),另一个就是系统内核(kernel)。当一个read操作发生时,它会经历两个阶段:  1 等待数据准备 (Waiting for the data to be ready)  2 将数据从内核拷贝到进程中 (Copying the data from the kernel to the process) 记住这两点很重要,因为这些IO Model的区别就是在两个阶段上各有不同的情况。
 blocking IO  在linux中,默认情况下所有的socket都是blocking,一个典型的读操作流程大概是这样:
当用户进程调用了recvfrom这个系统调用,kernel就开始了IO的第一个阶段:准备数据。对于network io来说,很多时候数据在一开始还没有到达(比如,还没有收到一个完整的UDP包),这个时候kernel就要等待足够的数据到来。而在用户进程这边,整个进程会被阻塞。当kernel一直等到数据准备好了,它就会将数据从kernel中拷贝到用户内存,然后kernel返回结果,用户进程才解除block的状态,重新运行起来。 所以,blocking IO的特点就是在IO执行的两个阶段都被block了。
 non-blocking IO
linux下,可以通过设置socket使其变为non-blocking。当对一个non-blocking socket执行读操作时,流程是这个样子:
从图中可以看出,当用户进程发出read操作时,如果kernel中的数据还没有准备好,那么它并不会block用户进程,而是立刻返回一个error。从用户进程角度讲 ,它发起一个read操作后,并不需要等待,而是马上就得到了一个结果。用户进程判断结果是一个error时,它就知道数据还没有准备好,于是它可以再次发送read操作。一旦kernel中的数据准备好了,并且又再次收到了用户进程的system call,那么它马上就将数据拷贝到了用户内存,然后返回。 所以,用户进程其实是需要不断的主动询问kernel数据好了没有。
 IO multiplexing
IO multiplexing这个词可能有点陌生,但是如果我说select,epoll,大概就都能明白了。有些地方也称这种IO方式为event driven IO。我们都知道,select/epoll的好处就在于单个process就可以同时处理多个网络连接的IO。它的基本原理就是select/epoll这个function会不断的轮询所负责的所有socket,当某个socket有数据到达了,就通知用户进程。它的流程如图:
当用户进程调用了select,那么整个进程会被block,而同时,kernel会“监视”所有select负责的socket,当任何一个socket中的数据准备好了,select就会返回。这个时候用户进程再调用read操作,将数据从kernel拷贝到用户进程。 这个图和blocking IO的图其实并没有太大的不同,事实上,还更差一些。因为这里需要使用两个system call (select 和 recvfrom),而blocking IO只调用了一个system call (recvfrom)。但是,用select的优势在于它可以同时处理多个connection。(多说一句。所以,如果处理的连接数不是很高的话,使用select/epoll的web server不一定比使用multi-threading + blocking IO的web server性能更好,可能延迟还更大。select/epoll的优势并不是对于单个连接能处理得更快,而是在于能处理更多的连接。) 在IO multiplexing Model中,实际中,对于每一个socket,一般都设置成为non-blocking,但是,如上图所示,整个用户的process其实是一直被block的。只不过process是被select这个函数block,而不是被socket IO给block。
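(Added, not part of the original article.) A minimal sketch of the IO-multiplexing model described above, using Python's selectors module: one process watches many sockets and only calls recv on the ones the kernel reports as readable:
import selectors
import socket

sel = selectors.DefaultSelector()      # uses epoll on Linux

srv = socket.socket()
srv.bind(("127.0.0.1", 8000))
srv.listen(128)
srv.setblocking(False)
sel.register(srv, selectors.EVENT_READ)

while True:
    # Blocks here (the "select" stage) until at least one registered socket is ready
    for key, _ in sel.select():
        if key.fileobj is srv:
            conn, _ = srv.accept()          # listening socket is ready, accept() will not block
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)
        else:
            data = key.fileobj.recv(4096)   # the actual read (the recvfrom stage)
            if data:
                key.fileobj.sendall(b"echo: " + data)
            else:
                sel.unregister(key.fileobj)
                key.fileobj.close()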
 Asynchronous I/O
linux下的asynchronous IO其实用得很少。先看一下它的流程:
用户进程发起read操作之后,立刻就可以开始去做其它的事。而另一方面,从kernel的角度,当它受到一个asynchronous read之后,首先它会立刻返回,所以不会对用户进程产生任何block。然后,kernel会等待数据准备完成,然后将数据拷贝到用户内存,当这一切都完成之后,kernel会给用户进程发送一个signal,告诉它read操作完成了。
  到目前为止,已经将四个IO Model都介绍完了。现在回过头来回答最初的那几个问题:blocking和non-blocking的区别在哪,synchronous IO和asynchronous IO的区别在哪。 先回答最简单的这个:blocking vs non-blocking。前面的介绍中其实已经很明确的说明了这两者的区别。调用blocking IO会一直block住对应的进程直到操作完成,而non-blocking IO在kernel还准备数据的情况下会立刻返回。
在说明synchronous IO和asynchronous IO的区别之前,需要先给出两者的定义。Stevens给出的定义(其实是POSIX的定义)是这样子的:     A synchronous I/O operation causes the requesting process to be blocked until that I/O operationcompletes;     An asynchronous I/O operation does not cause the requesting process to be blocked;  两者的区别就在于synchronous IO做”IO operation”的时候会将process阻塞。按照这个定义,之前所述的blocking IO,non-blocking IO,IO multiplexing都属于synchronous IO。有人可能会说,non-blocking IO并没有被block啊。这里有个非常“狡猾”的地方,定义中所指的”IO operation”是指真实的IO操作,就是例子中的recvfrom这个system call。non-blocking IO在执行recvfrom这个system call的时候,如果kernel的数据没有准备好,这时候不会block进程。但是,当kernel中数据准备好的时候,recvfrom会将数据从kernel拷贝到用户内存中,这个时候进程是被block了,在这段时间内,进程是被block的。而asynchronous IO则不一样,当进程发起IO 操作之后,就直接返回再也不理睬了,直到kernel发送一个信号,告诉进程说IO完成。在这整个过程中,进程完全没有被block。
各个IO Model的比较如图所示:
经过上面的介绍,会发现non-blocking IO和asynchronous IO的区别还是很明显的。在non-blocking IO中,虽然进程大部分时间都不会被block,但是它仍然要求进程去主动的check,并且当数据准备完成以后,也需要进程主动的再次调用recvfrom来将数据拷贝到用户内存。而asynchronous IO则完全不同。它就像是用户进程将整个IO操作交给了他人(kernel)完成,然后他人做完后发信号通知。在此期间,用户进程不需要去检查IO操作的状态,也不需要主动的去拷贝数据。 最后,再举几个不是很恰当的例子来说明这四个IO Model: 有A,B,C,D四个人在钓鱼: A用的是最老式的鱼竿,所以呢,得一直守着,等到鱼上钩了再拉杆; B的鱼竿有个功能,能够显示是否有鱼上钩,所以呢,B就和旁边的MM聊天,隔会再看看有没有鱼上钩,有的话就迅速拉杆; C用的鱼竿和B差不多,但他想了一个好办法,就是同时放好几根鱼竿,然后守在旁边,一旦有显示说鱼上钩了,它就将对应的鱼竿拉起来; D是个有钱人,干脆雇了一个人帮他钓鱼,一旦那个人把鱼钓上来了,就给D发个短信。
Original article: http://blog.csdn.net/historyasamirror/article/details/5778378
Tables on SSD, Redo/Binlog/SYSTEM-tablespace on HDD (repost)
I recently did a disk bound DBT-2 benchmarking on SSD/HDD (MySQL 5.4.0, InnoDB). Now I'm pretty confident that storing tables on SSD, redo/Binlog/SYSTEM-tablespace on HDD will be one of the best practices for the time being. This post is a detailed benchmarking report. (This post is very long and focusing on InnoDB only. If you are familiar with HDD/SSD/InnoDB architecture and understand what my blog title means, skipping section 1 (general theory) then reading from section 2 (benchmarking results) would be fine. ) 1. General Theory of HDD, SSD and InnoDB SSD is often called as a disruptive storage technology. Currently storage capacity is much smaller and unit price is much higher than HDD, but the situation is very rapidly changing. In the near future many people will use SSD instead of HDD. From DBA's standpoint, you have a couple of choices for storage allocation. - Storing all files on SSD, not using HDD at all - Storing all files on HDD, not using SSD at all - Using SSD and HDD altogether (some files on SSD, others on HDD). Which is the best approach? My favorite approach is storing tables on SSD, storing Redo Log files, Binary Log files, and SYSTEM-tablespace(ibdata) on HDD. I describe a detailed reason and some DBT-2 benchmarking results below. 1.1 HDD is very good at sequential writes if write cache is enabled Using battery backed up write cache(BBWC) is one of the best practices for RDBMS world. BBWC is normally equipped with hardware raid controller. Normally InnoDB flushes data to disks (redo log files) per each commit to guarantee durability. You can change this behavior by changing innodb_flush_log_at_trx_commit parameter to 0 or 2, but default setting (1) is recommended if you need durability. Without write cache, flushing to disks require disk rotation and disk seek. Redo logs are sequentially written, so disk seek doesn’t happen when the disk is dedicated for redo log files, but disk rotation still happens. Average disk rotation overhead is 1/2 round. Theoretical maximum throughput is only 500 flushes per second when using single 15000RPM drive (15,000 * 2 / 60 seconds = 500). If using write cache, the situation is greatly improved. Flushing to write cache does not require disk seek/rotation so finishes very quickly. The data in write cache will be written to disk with optimal order (i.e. writing many data at one time), so total throughput is highly improved. Over 10,000 fsync() per second is not impossible.
Sometimes people say that write cache is not effective because the total write volume is the same. This is not correct. On HDD, disk seek & rotation overhead is huge. By using write cache, internally a lot of write operations are converted into small number of write operations. Then the total number of disk seeks & rotations can be much smaller. For random-write oriented files (i.e. index files), disk seeks & rotations still happen (but reduced so still very effective) , but for sequential-write oriented files (i.e. REDO log files) write cache is very effective. Write cache needs to be protected by battery in order not to be invalidated by power failure etc. Using H/W RAID + BBWC + HDD is used for years and now it’s a proven technology. You can setup with pretty reasonable cost. Note that HDD/SSD storage devices also have write cache themselves, but don't turn on. It's very dangerous because it's not protected by battery so data in cache is destroyed by power failure etc. 1.2 SSD is very good at random reads, good at random writes, not so good at sequential writes, compared to HDD Currently read/write performance on SSD highly depends on products and device drivers. SSD is very fast for random reads. I could get over 5,000 *random reads* on single Intel X25-E SSD drive. For HDD, normally only a few hundreads of random reads is possible. Over 10 times difference is huge. For sequential reads, performance difference is smaller but there is still a significant difference on some SSDs(When I tested, single X25-E drive was two times faster than two SAS 15,000RPM RAID1 drives). For writes, I heard both positive and negative information. I tested Intel X25-E SSD then it worked very well (I got over 2,000 random writes on single drive with write cache). But I heard from many people who tested different SSDs that some SSDs don't perform well for writes. Some cases it was much slower than HDD, some cases write performance gradually dropped after a few months, some cases it freezed for a while. You would also be concerned with a "Write Endurance" issue on SSD when running on production environments. On HDD, there is a huge difference between random writes and sequential writes. Sequential writes is very fast on HDD with write cache. On the other hand, on SSD, there is not so much difference between random writes and sequential writes. Though performance highly depends on SSD drives themselves, you probably notice that sequential write performance is not so different between HDD and SSD (or even faster on HDD). Since HDD has a longer history and is cheaper than SSD, currently there is not a strong reason to use SSD for sequential write oriented files. 1.3 MySQL/InnoDB files  From these perspectives, it would make sense to locate random i/o oriented files on SSD, sequential write oriented files on HDD. Let's classify MySQL/InnoDB files as follows. Random i/o oriented: - Table files (*.ibd) - UNDO segments (ibdata) Sequential write oriented: - REDO log files (ib_logfile*) - Binary log files (binlog.XXXXXX) - Doublewrite buffer (ibdata) - Insert buffer (ibdata) - Slow query logs, error logs, general query logs, etc By default, table files (*.ibd) are not created but included in InnoDB's SYSTEM-tablespace(ibdata). By using "innodb_file_per_table" parameter, *.ibd files are created then table/index data are stored there. Table files are of course randomly read/written so storing on SSD is better. Note that write cache is also very effective on SSD so using H/W raid with BBWC + SSD would be nice. 
REDO log files and Binary log files are transactional logs. They are sequentially written so you can get very good performance on HDD with write cache. Sequential disk read will happen on recovery, but normally this will not cause a performance problem because log file size is normally much smaller than data files and sequential reads are much faster than random reads (happening on data files). Doublewrite buffer is a special feature for InnoDB. InnoDB first writes flushed pages to the "doublewrite buffer", then writing the pages to their correct positions on data files. This is to avoid page corruption (Without doublewrite buffer, page might get corrupted if power failure etc happens during writing to disks). Writing to doublewrite buffer is sequential so highly optimized for HDD. Sequential read will happen on recovery. MySQL has a parameter "skip_innodb_doublewrite" to disable doublewrite buffer, but disabling is dangerous. The amount of write data to doublewrite buffer is equivalent to that to data areas, so this is not negligible. Insert buffer is also a special feature for InnoDB. If non-unique, secondary index blocks are not in memory, InnoDB inserts entries to a "insert buffer" to avoid random disk i/o operations. Periodically, the insert buffer is merged into the secondary index trees in the database. Insert buffer enables to reduce the number of disk i/o operations by merging i/o requests to the same block, and random i/o operations can be sequential. So insert buffer is also highly optimized for HDD. Both sequential writes and reads will happen in normal operations. UNDO segments are random i/o oriented. To guarantee MVCC, innodb needs to register old images in UNDO segments. Reading previous images from UNDO segments on disk requires random reads. If you run a very long transaction with repeatable read (i.e. mysqldump --single-transaction) or running a long query, a lot of random reads might happen, so storing UNDO segments on SSD would be better in that case. If you run only short transactions/queries, this will not be an issue. Based on the above, I want to use "innodb_file_per_table", then storing *.ibd files on SSD, storing ib_logfile* (REDO log files), binary logs, ibdata (SYSTEM-tablespace) on HDD. 2 Benchmarking I ran DBT-2(disk i/o intensive) to verify whether my assumption was correct or not. 2.1 Benchmarking environment Sun Fire X4150 CPU: Intel Xeon 3.3GHz, quad core, 8 cores in total HDD: SAS 15,000RPM, 2 disks, RAID 1, write cache enabled on H/W raid controller SSD: Intel X25-E, Single drive, write cache enabled on drive (Note that this is dangerous and not recommended for production at all. I didn't have a H/W raid controller working well on Intel X25-E so I just turned on for benchmarking purpose only) OS: RedHat Enterprise Linux 5.3 (2.6.18-128.el5) Filesystem: ext3 I/O Scheduler: noop MySQL version: 5.4.0 DBT-2 option: ./run_workload.sh -n -s 100 -t 1 -w 100 -d 600 -c 20 (Approximate datafile size is 9.5 GB) my.cnf: [mysqld] basedir=/root/mysql5400 datadir=/ssd/mysql-data innodb_data_home_dir= innodb_data_file_path=/hdd1/ibdata1:500M:autoextend innodb_file_per_table innodb_log_group_home_dir=/hdd/log innodb_log_files_in_group=2 innodb_log_file_size=512M innodb_flush_log_at_trx_commit=1 innodb_buffer_pool_size=2G innodb_flush_method=O_DIRECT innodb_support_xa=0 (Binlog is disabled) This is a disk-bound configuration. If you increase innodb_buffer_pool_size higher, you can get better results but I intentionally set lower to do disk i/o intensive loads. 
2.2 Benchmarking results 2.2.1 HDD vs SSD  All files on HDD(two drives, RAID1): 3447.25 All files on SSD(single drive) : 14842.44 (The number is NOTPM, higher is better) I got 4.3 times better result on SSD. As Vadim, Matt, and other many people have already shown, just replacing HDD with SSD works very well for DBT-2 benchmarking. The following is iostat result.
HDD 3447.25 NOTPM Device: rrqm/s wrqm/s r/s w/s rMB/s sdf1 4.60 80.03 363.48 377.30 6.52 wMB/s avgrq-sz avgqu-sz await svctm %util 10.14 46.06 17.89 24.18 1.35 99.98 -----cpu----- us sy id wa st 3 1 50 46 0 SSD 14842.44 NOTPM Device: rrqm/s wrqm/s r/s w/s rMB/s sda1 0.00 11.90 1738.65 1812.33 30.73 wMB/s avgrq-sz avgqu-sz await svctm %util 46.42 44.49 4.01 1.13 0.27 95.03 -----cpu----- us sy id wa st 18 5 57 20 0
2.2.2 Storing some files on SSD, others on HDD  All files on SSD : 14842.44 Redo log on HDD: 15539.8 Redo log and ibdata on HDD: 23358.63 Redo log and ibdata on tmpfs: 24076.43 Redo log and ibdata on separate SSD drives: 20450.78 These are quite interesting results. Let's see one by one. 2.2.2.1 Redo log file on HDD  Storing just redo log files on HDD did not have so good effect (just 4.7% improvement). DBT-2 internally executes 14.5 DML(insert/update/delete) statements per transaction on average, so commit frequency is less than typical web applications. If you execute commit more frequently, performance difference will be bigger.The following is a statistics about how many bytes were written during single DBT-2 benchmarking. You can get this information by iostat -m. MB written on SSD: 31658 MB written on HDD: 1928 So 31658MB were written to SSD but only 1928MB were written to HDD. 20 times difference is huge so just moving redo logs from SSD to HDD would not have big impact. Note that this statistics highly depends on applications. For DBT-2, this doesn't work anyway. 2.2.2.2 Redo log and ibdata on HDD By storing redo log and ibdata on HDD, I got much better result (14842.44->23358.63, 57% improvement!). Dirty pages are written twice (one for doublewrite buffer, the other for actual data area) so by moving ibdata, in which doublewrite buffer is allocated, to different drives, you can decrease the amount of writes to SSD by half. Since doublewrite buffer is sequentially written, it fits for HDD very well. Here is a result of iostat -m. MB written on SSD : 23151 MB written on HDD : 25761 iostat & vmstat results are as follows. Apparently HDD was not so busy so working very well for sequential writes.
SSD-HDD 23358.63NOTPM Device: rrqm/s wrqm/s r/s w/s rMB/s sda1(SATA SSD) 0.03 0.00 2807.15 1909.00 49.48 sdf1(SAS HDD) 0.00 547.08 0.38 737.22 0.01 wMB/s avgrq-sz avgqu-sz await svctm %util 35.91 37.08 2.90 0.61 0.18 84.69 40.03 111.17 0.13 0.18 0.11 7.79 -----cpu----- us sy id wa st 28 9 49 14 0
2.2.2.3 Redo log and ibdata on tmpfs I was interested in whether storing redo log and ibdata on HDD can get the "best" performance or not. To verify this, I tested to store these files on tmpfs. This is not useful in production environment at all, but it's fine to check performance higher limit. tmpfs should be faster than any other high-end SSD(including PCIe SSD). If there is not so big performance difference, moving these files from HDD to very fast SSD is not needed. Here is a result. Redo log and ibdata on HDD(NOTPM): 23358.63 Redo log and ibdata on tmpfs(NOTPM): 24076.43 I got only 3% performance improvement. Actually this is not suprise because HDD was not maxed out when testing "redo log and ibdata on HDD". 2.2.2.4 Redo log and ibdata on separate SSD drive Some people might be interested in how performance is different if using two SSD drives, one for *ibd, the other for Redo/ibdata. The following is a result. Redo log and ibdata on HDD(NOTPM): 23358.63 Redo log and ibdata on separate SSD drives(NOTPM): 20450.78 So I got 12.5% worse result compared to HDD. Does this mean Intel X25-E is worse for sequential writes than SAS HDD? Actually the situation seemed not so simple. When I did a very simple sequential write oriented benchmarking (mysqlslap insert) on single drive, I got very close numbers compared to HDD. So there must be other reasons for performance drop. The following is iostat result.
SSD-SSD 20450.78NOTPM Device: rrqm/s wrqm/s r/s w/s rMB/s sda1(SATA SSD) 0.00 0.00 2430.32 1657.15 43.32 sdb1(SATA SSD) 0.00 12.60 0.75 1017.43 0.01 wMB/s avgrq-sz avgqu-sz await svctm %util 31.06 37.27 2.29 0.56 0.16 67.28 34.46 69.33 0.57 0.56 0.27 27.02 SSD-HDD 23358.63NOTPM Device: rrqm/s wrqm/s r/s w/s rMB/s sda1(SATA SSD) 0.03 0.00 2807.15 1909.00 49.48 sdf1(SAS HDD) 0.00 547.08 0.38 737.22 0.01 wMB/s avgrq-sz avgqu-sz await svctm %util 35.91 37.08 2.90 0.61 0.18 84.69 40.03 111.17 0.13 0.18 0.11 7.79
sda activity on SSD-SSD test was about 15% lower than on SSD-HDD test even though sdb was not fully busy. I have not identified the reason yet,but currently I assume that SATA interface (Host Bus Adapter port) got a bit saturated. I'll try other H/W components (SATA HDD, using different HBAs which has many more ports, etc) and see what will happen in the near future. Anyway, any high end SSD won't beat tmpfs numbers(24076.43NOTPM) so HDD(23358.63NOTPM) would be enough. 3 Conclusion  Now I'm pretty confident that for many InnoDB based applications it would be a best practice to store *.ibd files on SSD, and store Redo log files, binary log files, and ibdata (SYSTEM-tablespace) on HDD. I intentionally disabled binary logs for these benchmarking because currently binlog has concurrency issues (breaking InnoDB group-commit). Hopefully this issue will be fixed soon in future 5.4 so I'll be able to show you detailed results at that time. In theory same principles as redo logs(storing on HDD) can be applied. I have done many other benchmarks (disabling doublewrite buffer, running with AUTOCOMMIT, UNDO log performance difference between HDD and SSD, etc) so I'd like to share some interesting results in other posts. A very interesting point I have seen is that "The amount of disk reads/writes" really matters on SSD, which does not matter on HDD. This is because disk seeks & rotation overhead is very small on SSD, so data transfer speed relatively got important. For example, doublewrite buffer really influenced performance on SSD, which did not have impact on HDD. This point might force architecture changes for RDBMS. From this perspective, I am highly impressed by PBXT. I'd like to do intensive tests on PBXT in the near future.
POSTED BY YOSHINORI MATSUNOBU
Original article: http://yoshinorimatsunobu.blogspot.tw/2009/05/tables-on-ssd-redobinlogsystem.html
golang: a look at the underlying structure of common data types (repost)
Although golang is implemented in C and is often called the next-generation C, it differs from C quite a bit. It defines a rich set of data types and structures, which either map directly to C types or are implemented as C structs. Understanding how golang's types and data structures are implemented underneath helps us understand the language better and write better code.
Basic types
The source is in $GOROOT/src/pkg/runtime/runtime.h. Let's look at the basic types first:
/*
 * basic types
 */
typedef signed char             int8;
typedef unsigned char           uint8;
typedef signed short            int16;
typedef unsigned short          uint16;
typedef signed int              int32;
typedef unsigned int            uint32;
typedef signed long long int    int64;
typedef unsigned long long int  uint64;
typedef float                   float32;
typedef double                  float64;
 #ifdef _64BIT
typedef uint64          uintptr;
typedef int64           intptr;
typedef int64           intgo; // Go's int
typedef uint64          uintgo; // Go's uint
#else
typedef uint32          uintptr;
typedef int32           intptr;
typedef int32           intgo; // Go's int
typedef uint32          uintgo; // Go's uint
#endif
 /*
 * defined types
 */
typedef uint8           bool;
typedef uint8           byte;
int8, uint8, int16, uint16, int32, uint32, int64, uint64, float32 and float64 map directly to the corresponding C types, which is easy to see with a little C background. uintptr and intptr are unsigned and signed pointer-sized integers, guaranteed to be 8 bytes on 64-bit platforms and 4 bytes on 32-bit platforms; uintptr is mainly used for pointer arithmetic in golang. intgo and uintgo are presumably not named int and uint because int is already a type name in C (and uintgo simply mirrors intgo); they correspond to golang's int and uint. As the definitions show, int and uint are platform-sized: 8 bytes on 64-bit and 4 bytes on 32-bit, so if you have an explicit size requirement, choose int32/int64 or uint32/uint64. The underlying type of byte is uint8. Here is a quick test:
package main
 import (
        "fmt"
        "reflect"
)
 func main() {
        var b byte = 'D'
        fmt.Printf("output: %v\n", reflect.TypeOf(b).Kind())
}
$ cd $GOPATH/src/basictype_test
$ go build
$ ./basictype_test
output: uint8
A type has both a static type and an underlying type. For the variable b in the code above, byte is its static type and uint8 is its underlying type. This distinction matters; the concept will come up again and again.
The rune type
rune is an alias for int32 and is used to represent Unicode code points. You usually need it when handling Chinese text; iterating a string with the range keyword also yields runes (see the sketch below).
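A small illustration (mine, not from the original article) of the difference between indexing bytes and ranging over runes:
package main

import "fmt"

func main() {
    s := "Go语言"
    // len counts bytes: 8 here, because each Chinese character takes 3 bytes in UTF-8.
    fmt.Println(len(s))
    // range decodes UTF-8 and yields runes (int32 code points) with their byte offsets.
    for i, r := range s {
        fmt.Printf("%d:%c ", i, r) // 0:G 1:o 2:语 5:言
    }
    fmt.Println()
    // Converting to []rune gives the character count: 4.
    fmt.Println(len([]rune(s)))
}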
The string type
The string type is backed by a C struct.
struct String
{
        byte*   str;
        intgo   len;
};
The str member is the byte array and len is its length. Strings in golang are immutable; initializing a string variable means initializing this underlying structure. As for why str is a byte array rather than runes: a plain for loop over a string iterates by byte; if needed, you can convert the string to a []rune or iterate with range. Let's look at an example:
$GOPATH/src
----basictype_test
--------main.go
package main
 import (
    "fmt"
    "unsafe"
)
 func main() {
    var str string = "hi, 陈一回~"
    p := (*struct {
        str uintptr
        len int
    })(unsafe.Pointer(&str))
     fmt.Printf("%+v\n", p)
}
$ cd $GOPATH/src/basictype_test
$ go build
$ ./basictype_test
output: &{str:135100456 len:14}
The built-in len on a string simply reads the len field from the underlying struct; no extra work is needed, which of course requires len to be set when the string is initialized.
The slice type
The slice type is likewise backed by a C struct.
struct  Slice
{               // must not move anything
    byte*   array;      // actual data
    uintgo  len;        // number of elements
    uintgo  cap;        // allocated number of elements
};
It has three members: array is the backing array, len the number of elements actually stored, and cap the total capacity. A slice is initialized with the built-in make, or with an array-style literal. When make is used, the first argument is the slice type, the second is len, and the optional third is cap; if it is omitted, cap equals len. It is common to pass cap so the slice is pre-allocated, avoiding frequent reallocation.
package main
 import (
    "fmt"
    "unsafe"
)
 func main() {
    var slice []int32 = make([]int32, 5, 10)
    p := (*struct {
        array uintptr
        len   int
        cap   int
    })(unsafe.Pointer(&slice))
     fmt.Printf("output: %+v\n", p)
}
$ cd $GOPATH/src/basictype_test
$ go build
$ ./basictype_test
output: &{array:406958176 len:5 cap:10}
Because a slice points at a backing array, and a slice can be created directly from an array with slicing syntax, you need to understand the relationship between slices and arrays, otherwise you may write buggy code without noticing. For example:
package main
 import (
    "fmt"
)
 func main() {
    var array = [...]int32{1, 2, 3, 4, 5}
    var slice = array[2:4]
    fmt.Printf("改变slice之前: array=%+v, slice=%+v\n", array, slice)
    slice[0] = 234
    fmt.Printf("改变slice之后: array=%+v, slice=%+v\n", array, slice)
}
$ cd $GOPATH/src/basictype_test
$ go build
$ ./basictype_test
改变slice之前: array=[1 2 3 4 5], slice=[3 4]
改变slice之后: array=[1 2 234 4 5], slice=[234 4]
You can clearly see that after slice was modified, array changed as well. A slice created from an array points at that array, i.e. the slice's backing array is array, so changing the slice really means changing its backing array. If elements are added or removed, len changes and cap may change too.
How does the slice point into array? The slice's array pointer points at the element of array with index 2 (because the slice was created with array[2:4]); len records the number of elements (2), and cap extends from there to the end of array (3 here).
The reason cap only may change is that cap is the total capacity; adding or removing elements does not necessarily change it. Let's look at another example:
package main
 import (
    "fmt"
)
 func main() {
    var array = [...]int32{1, 2, 3, 4, 5}
    var slice = array[2:4]
    slice = append(slice, 6, 7, 8)
    fmt.Printf("改变slice之前: array=%+v, slice=%+v\n", array, slice)
    slice[0] = 234
    fmt.Printf("改变slice之后: array=%+v, slice=%+v\n", array, slice)
}
$ cd $GOPATH/src/basictype_test
$ go build
$ ./basictype_test
改变slice之前: array=[1 2 3 4 5], slice=[3 4 6 7 8]
改变slice之后: array=[1 2 3 4 5], slice=[234 4 6 7 8]
After the append, modifications to slice no longer affect array. The append made the slice allocate a new backing array, so it no longer points at the array defined earlier.
Obviously the same rules apply to slices created from other slices. See the code:
package main
 import (
    "fmt"
)
 func main() {
    var slice1 = []int32{1, 2, 3, 4, 5}
    var slice2 = slice1[2:4]
    fmt.Printf("改变slice2之前: slice1=%+v, slice2=%+v\n", slice1, slice2)
    slice2[0] = 234
    fmt.Printf("改变slice2之后: slice1=%+v, slice2=%+v\n", slice1, slice2)
}
$ cd $GOPATH/src/basictype_test
$ go build
$ ./basictype_test
改变slice2之前: slice1=[1 2 3 4 5], slice2=[3 4]
改变slice2之后: slice1=[1 2 234 4 5], slice2=[234 4]
slice1 and slice2 share one backing array, so modifying an element of slice2 changes slice1 as well.
package main
 import (
    "fmt"
)
 func main() {
    var slice1 = []int32{1, 2, 3, 4, 5}
    var slice2 = slice1[2:4]
    fmt.Printf("改变slice2之前: slice1=%+v, slice2=%+v\n", slice1, slice2)
    slice2 = append(slice2, 6, 7, 8)
    fmt.Printf("改变slice2之后: slice1=%+v, slice2=%+v\n", slice1, slice2)
}
$ cd $GOPATH/src/basictype_test
$ go build
$ ./basictype_test
改变slice2之前: slice1=[1 2 3 4 5], slice2=[3 4]
改变slice2之后: slice1=[1 2 3 4 5], slice2=[3 4 6 7 8]
An append, on the other hand, can make slice1 or slice2 allocate a new backing array, after which appends and writes to one no longer affect the other.
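One caveat worth adding (my note, not in the original article): append only allocates a new array when the new length exceeds cap. While it still fits within cap, append writes into the shared backing array and does affect the other slice:
package main

import "fmt"

func main() {
    slice1 := []int32{1, 2, 3, 4, 5}
    slice2 := slice1[2:4]         // len=2, cap=3 (cap runs to the end of slice1's array)
    slice2 = append(slice2, 6)    // still within cap: no reallocation, overwrites slice1[4]
    fmt.Println(slice1)           // [1 2 3 4 6]
    slice2 = append(slice2, 7, 8) // exceeds cap: new backing array, the slices now diverge
    slice2[0] = 234
    fmt.Println(slice1, slice2)   // [1 2 3 4 6] [234 4 6 7 8]
}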
The interface type
Interfaces have a fairly involved implementation in golang. $GOROOT/src/pkg/runtime/type.h defines:
struct Type
{
    uintptr size;
    uint32 hash;
    uint8 _unused;
    uint8 align;
    uint8 fieldAlign;
    uint8 kind;
    Alg *alg;
    void *gc;
    String *string;
    UncommonType *x;
    Type *ptrto;
};
And $GOROOT/src/pkg/runtime/runtime.h defines:
struct Iface
{
    Itab*   tab;
    void*   data;
};
struct Eface
{
    Type*   type;
    void*   data;
};
struct  Itab
{
    InterfaceType*  inter;
    Type*   type;
    Itab*   link;
    int32   bad;
    int32   unused;
    void    (*fun[])(void);
};
An interface is really a struct with two members: a pointer to the data and the type information of the value. Eface is the structure used underneath interface{}. Because an interface carries type information, reflection is possible; reflection is essentially looking up the metadata in these underlying structures. The complete implementation is in $GOROOT/src/pkg/runtime/iface.c.
package main
 import (
    "fmt"
    "unsafe"
)
 func main() {
    var str interface{} = "Hello World!"
    p := (*struct {
        tab  uintptr
        data uintptr
    })(unsafe.Pointer(&str))
     fmt.Printf("%+v\n", p)
}
$ cd $GOPATH/src/basictype_test
$ go build
$ ./basictype_test
output: &{tab:134966528 data:406847688}
The map type
golang's map is implemented as a hash table; the source is in $GOROOT/src/pkg/runtime/hashmap.c.
struct Hmap
{
    uintgo  count;
    uint32  flags;
    uint32  hash0;
    uint8   B;
    uint8   keysize;
    uint8   valuesize;
    uint16  bucketsize;
     byte    *buckets;
    byte    *oldbuckets;
    uintptr nevacuate;
};
The test code is as follows:
package main
 import (
    "fmt"
    "unsafe"
)
 func main() {
    var m = make(map[string]int32, 10)
    m["hello"] = 123
    p := (*struct {
        count      int
        flags      uint32
        hash0      uint32
        B          uint8
        keysize    uint8
        valuesize  uint8
        bucketsize uint16
         buckets    uintptr
        oldbuckets uintptr
        nevacuate  uintptr
    })(unsafe.Pointer(&m))
     fmt.Printf("output: %+v\n", p)
}
$ cd $GOPATH/src/basictype_test
$ go build
$ ./basictype_test
output: &{count:407032064 flags:0 hash0:134958144 B:192 keysize:0 valuesize:64 bucketsize:30063 buckets:540701813 oldbuckets:0 nevacuate:0}
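The output above looks like garbage (count is a huge number for a map holding one key) because a map variable is itself a pointer to the runtime Hmap, so &m is a pointer to that pointer; casting it directly interprets the pointer value as the struct fields. A sketch of reading the header through one more level of indirection (my correction, not from the original post; the field layout is copied from the struct above and may differ between Go versions):
package main

import (
    "fmt"
    "unsafe"
)

// Layout copied from the Hmap struct shown above (version-dependent assumption).
type hmapHeader struct {
    count      int
    flags      uint32
    hash0      uint32
    B          uint8
    keysize    uint8
    valuesize  uint8
    bucketsize uint16
    buckets    uintptr
    oldbuckets uintptr
    nevacuate  uintptr
}

func main() {
    m := make(map[string]int32, 10)
    m["hello"] = 123
    // &m points at the map variable, which is itself a pointer to the hmap,
    // so dereference once before reinterpreting it as the header layout.
    p := *(**hmapHeader)(unsafe.Pointer(&m))
    fmt.Printf("output: %+v\n", p) // count should now print 1
}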
golang has quite a few pitfalls; you need to study the internals, otherwise it is easy to fall into them.
Original article: http://studygolang.com/articles/706
A few thoughts on the things a man needs to get done (repost)
  1,事业永远第一    虽然金钱不是万能的,但没有钱是万万不能的,虽然这句话很俗,但绝对有道理,所以30岁之前,请把你大部分精力放在你的事业上.    2,别把钱看得太重    不要抱怨自己现在工资低,银行存款4位数以下,看不到前途,现在要做的就是努力学习,即使你文凭再高,怎么把理论运用到实践还是需要一个很长的锻炼过 程,社会永远是一所最博大的大学,它让你学到的知识远比你在学校学到的重要得多,所以同样,你也别太介意学历低.30岁之前靠自己能力买车买房的人还是极 少.    3,学会体谅父母    别嫌他们唠叨,等你为人父了你就知道可怜天下父母心,在他们眼里你还是个孩子,但他们真的老了,现在得你哄他们开心了,也许只要你的一个电话,一点小礼物,就可以让他们安心,很容易做到.    4,交上好朋友    朋友对你一生都影响重大,不��去结识太多酒肉朋友,至少得有一个能在关键时刻帮助你的朋友,如果遇到这么一个人,就好好把握,日后必定有用,不管他现在是富还是穷.    5,别太相信爱情    心中要有爱,但请别说也别相信那些琼瑶阿姨小说里面的山盟海誓,世上本无永恒,重要的是责任,但女人心海底针,心变了,一切都成枉然,你要做的就是该 出手时就出手,该放手时别犹豫.30岁之前的爱情不是假的,但只是大多数人都没有能真正把握好的能力,所以学会量力而行.    6,别担心至今还保留初吻    爱情不在多而在精,别以为自己20多岁还没碰过女孩子就害怕自己永远找不到老婆.以后你会有很多机会认识女孩子,要知道这个社会虽然男人多于女人,但 现实是女人其实比男人更担心这个问题.男人30一枝花,你在升值而不是贬值,成熟的爱情往往更美丽更长久,所以不要像疯狗一样看到女孩就想追,学会品味寂 寞.    7,不要沉迷于任何东西    所谓玩物而丧志,网络游戏是你在出校门之前玩的,你现在没有多余的时间和精力花费到这上面,否则你透支的东西以后都得偿还.一个人要有兴趣,爱好,但请分清楚轻重.    8,年轻没有失败    不要遇到挫折就灰心,年轻人要时刻保持积极向上的态度.失败了,重来过;失去了,再争取别的。错过了,要分析,下次来,要把握;幼稚了,下次,成熟 点。不要紧,会好的,哪怕到了极点,也不要放弃,相信一定可以挺过去。不要消极,会好的。曾经的错,过去了,总不能回味在过去。现在的,很好,累完了,很 舒服。不要伤,总会有人在支撑你。    9,不要轻易崇拜或者鄙视一个人    人都有偶像,但请拥有你自己的个性.不要刻意去模仿一个人,因为你就是你,是唯一的,独一无二的,要有自信.也不要全盘否定一个人,每个人是有价值的,如果你不能理解他,也请学会接受.    10,要有责任心.    不管你曾经怎样,但请从现在开始做一个正直的人.男人要有责任心,无论是工作还是生活上,一个有责任心的人才能让别人有安全感,才能让别人觉得你是一 个值得信赖的人.我们不要懦弱,但请不要伤害爱你的人和你爱的人,尤其是善良的女孩,因为这个世界善良的女孩不多了,即使不想拥有,但也请让她保持她美丽 的心.    11,男人的外貌并不重要.    不要为自己的长相身高而过分担心,一个心地善良,为人正直的男人远比那些空有英俊相貌,挺拔身材但内心龌龊的男人要帅得多.如果有人以貌取人,请不要太在意,因为你不用去为一个低级趣味的人而难过.    12,学会保护身体    不要以为现在抽烟喝酒,熬夜通宵也没什么事.那是因为你的身体正处于你一生的黄金时段.30岁以后你就能明白力不从心这个词的意义了,身体是革命的本钱,没有好的身体什么也做不了,所以要尽量让自己过有规律的健康生活.    13,别觉得一事无成.    你现在还没有资格谈成功,当然如果你有千万资产的除外.一开始太固定的职业并不一定是好事,或许在不断的改行当中,你会学到更丰富的知识,而且可以挖掘出自己的潜能,找到最适合你的工作.    14,请认真工作    即使你现在的工作再怎么无聊再怎么低级,也请你认真去对待,要知道任何成功人士都是从最小的事做起,或许你现在学不到多么了不起的知识,但起码你要学会良好的工作态度和工作方法,这对以后很重要.    15,请认真对待感情.    不要羡慕那些换女人像换鞋一样的花花公子,逢场作戏的爱情只是让你浪费时间浪费精力,一个人最痛苦的不是找不到爱人,而是心中没有了爱,当你把我爱你 3   个字变成你最容易说的一句话时,那么你在爱情的世界里已经很难找到真正的幸福了.爱情没有公平,总有一个人比对方付出得多,即使没有结果,也别觉得不值,   因为你的付出不光是为了她,也是为了你自己的爱,为爱付出是很可贵的,赞自己一下.    16.请留一点童心    在内心深处,哪怕只是一个很小的角落里,请保持一份童心,不是幼稚,但有的时候单纯一点会让你很快乐.所以不要太计较得失,生活本无完美. 
From: http://www.iteye.com/topic/1135999
Incorrect crc32 results in Python 2.x
In Python 2.x, crc32 over non-ASCII input comes out negative, which differs from what other languages compute. A quick search shows it is a signedness/bit-inversion issue in Python; here is one correct approach:
def _crc32(v):
    """
    Generates the crc32 hash of the v.
    @return: str, the str value for the crc32 of the v
    """
    return int('0x%x' % (binascii.crc32(v) & 0xffffffff),16) 
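Since binascii.crc32(v) & 0xffffffff is already a non-negative Python integer, the hex round-trip through int('0x%x' % ...) is unnecessary; an equivalent, simpler sketch:
import binascii

def _crc32(v):
    """Return the CRC32 of v as an unsigned 32-bit integer (same result on Python 2 and 3)."""
    return binascii.crc32(v) & 0xffffffff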
Some misconceptions about php-fpm's max_children (repost)
Nginx + FPM has basically become the mainstream setup, and pm.max_children is one of the settings we pay the most attention to.
First, note a prerequisite setting: pm = static/dynamic.
This option determines how FPM spawns its worker processes:
static: FPM forks pm.max_children worker processes directly at startup. dynamic: FPM forks start_servers processes at startup and adjusts the count dynamically with load, never exceeding max_children.
static is generally recommended: it avoids judging the load dynamically, which improves performance, at the cost of some extra system memory.
The above tells us that max_children is the number of worker processes. It is widely believed that the more you configure, the more concurrent requests you can handle; that is a big misconception. 1) Looking at the FPM source, the master process and the workers communicate over pipes, so more processes mean more management overhead and more process-switching overhead. More fundamentally, the number of FPM processes that can actually execute in parallel will never exceed the number of CPUs. Raising QPS by simply opening more workers is a misunderstanding: starting more processes does not give you more CPUs to run them.
2) But if too few workers are running and the server is busy, Nginx may hand a request to FPM and find every worker occupied, with no idle worker to accept it, and the result is a 502.
So how should the worker count be configured? In theory, workers = number of CPUs is the most reasonable, but because of point 2 every worker may still be busy with an unfinished request, and 502s become frequent. Opening more workers only avoids the 502 by temporarily parking requests; it is a mitigation, not a cure. It does not increase real concurrency and it adds load to the system, so choosing a sensible worker count matters.
In the end, speed wins. Only by making the code as efficient as possible and squeezing each request down to the shortest time does a single worker finish sooner, and the number of requests handled per unit of time naturally rises.
You can then estimate max_children from how many requests one worker handles per unit of time. If the slowest request completes within 100 ms and 100 requests arrive within those 100 ms, then in theory you need 100 worker processes to hold the requests.
But the worst-case request time is affected by many external factors and is hard to predict. There is a shortcut for setting max_children: start by setting it to a fairly large value, run steadily for a while, look at max active processes in FPM's status page, and then set max_children somewhat higher than that (see the sketch below for enabling the status page).
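To make that observable, the FPM status page has to be enabled first. A minimal sketch; the pool path and port below are illustrative assumptions, not values from the original post:
; PHP-FPM pool config
pm.status_path = /fpm-status

# Nginx server block: expose the status page to localhost only
location = /fpm-status {
    access_log off;
    allow 127.0.0.1;
    deny all;
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass 127.0.0.1:9000;
}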
Hope these notes are of some help.
Original article: http://www.guangla.com/post/2014-03-14/40061238121