Fantastechnol - Personal Reading Digest Technical Digest and Memo: Network


2007/12/01

Secure NFS

NFS Security

NFS (Network File System) is a widely used but primitive protocol that lets computers share files over a network. Its main security problems are that it traditionally runs over UDP, which offers no connection-level protection, that its transactions are not encrypted, and that hosts and users cannot easily be authenticated. The measures below address these problems.

First, a word on how the NFS service operates. An NFS server holds a file system (or directory) that it exports to NFS clients. An NFS client must import (mount) the exported file system before it can access it. Each measure below is annotated with where it applies: on server, on client, on client & server, or misc (miscellaneous).

NFS file systems should be installed on a separate disk or partition (on server)

Keeping exported file systems on a separate disk or partition ensures that a malicious user cannot fill the entire hard disk simply by writing large files to the export. A full disk could otherwise crash other services that share it.

Prevent normal users on an NFS client from mounting an NFS file system (on server)

This can be done by adding the 'secure' parameter to an entry in /etc/exports. The option makes the server accept requests only from source ports below 1024, which on a Unix client only root can open. For example:

/home nfs-client(secure)

where /home is the file system to be exported and nfs-client is the NFS client (specify its IP address or domain name).

Export an NFS file system in an appropriate permission mode (on server)

Suppose that clients only need to read the exported file system. Exporting it read-only prevents both accidental and deliberate modification of the files. This is done with the 'ro' parameter in /etc/exports:

/home nfs-client(ro)

Restrict exporting an NFS file system to a certain set of NFS clients (on server)

Specify only a specific set of NFS clients that will be allowed to mount an NFS file system. If possible, use numeric IP addresses or fully qualified domain names, instead of aliases.

Use the 'root_squash' option in /etc/exports on the NFS server if possible (on server)

With this option, requests made by the user 'root' on the NFS client are mapped to the unprivileged user 'nobody' on the NFS server. This prevents root on the client from acting with superuser privileges on the server and modifying files there without authorisation. Here is an example:

/home nfs-client(root_squash)
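The export options discussed so far (secure, ro, root_squash) can be combined in a single entry. The sketch below only prints such an entry; make_export_line is a hypothetical helper name, and /home and nfs-client are the placeholders used in the text.

```shell
#!/bin/sh
# Print one /etc/exports entry in the form: <directory> <client>(opt1,opt2,...)
# make_export_line is a hypothetical helper, not part of the NFS utilities.
make_export_line() {
    dir=$1
    client=$2
    shift 2
    opts=$(printf '%s,' "$@")   # join the remaining arguments with commas
    opts=${opts%,}              # drop the trailing comma
    printf '%s %s(%s)\n' "$dir" "$client" "$opts"
}

# Read-only export, privileged source ports only, root mapped to nobody
make_export_line /home nfs-client ro secure root_squash
```

After editing /etc/exports by hand with a line like this, running exportfs -ra on the server is the usual way to make it re-read the file.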

Disable suid (set user ID) on an NFS file system (on client)

Add the 'nosuid' option, which makes the client ignore set-user-ID bits, to the entry in /etc/fstab (the file that determines which file systems are mounted automatically at startup). This prevents files on the NFS server that have the suid bit set, e.g. Trojan horses, from running with elevated privileges on the NFS client, whether they are planted deliberately or run by accident by root on the client; either could lead to a root compromise of the client. Here is an example of an /etc/fstab entry on the client that uses 'nosuid':

nfs-server:/home /mnt/nfs nfs ro,nosuid 0 0

where nfs-server is the IP address or domain name of the NFS server and /home is the directory on the server to be mounted at /mnt/nfs on the client. In addition, the 'noexec' option can be used to disable execution of any file on the mount:

nfs-server:/home /mnt/nfs nfs ro,nosuid,noexec 0 0
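As a rough way to audit a client, the sketch below reads fstab-format lines on standard input and prints the mount points of NFS entries whose option field lacks nosuid. The function name nfs_missing_nosuid is made up for this illustration, and it assumes the file system type field is exactly "nfs".

```shell
#!/bin/sh
# Print the mount point of every NFS entry that does not list nosuid.
# Fields in fstab: 1=device 2=mount point 3=type 4=options
nfs_missing_nosuid() {
    awk '$1 !~ /^#/ && $3 == "nfs" && ("," $4 ",") !~ /,nosuid,/ { print $2 }'
}
```

A typical call would be nfs_missing_nosuid < /etc/fstab; no output means every NFS entry already carries nosuid.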

Install the most recent patches for NFS and portmapper (on client & server)

NFS has appeared among the ten most common vulnerabilities reported by CERT and has been widely exploited. The NFS server and the portmapper on your systems must therefore be kept up to date with security patches.

Perform encryption over NFS traffic using SSH (on client & server)

Secure Shell (SSH) is best known for secure remote access, but it can also tunnel traffic between an NFS client and server so that the NFS traffic is encrypted. The steps below show how to encrypt NFS traffic using SSH.

The following diagram shows how the NFS and SSH services cooperate:

nfs-client                          nfs-server
mount --- SSH <=================> SSHD --- NFS

When you mount an NFS directory on the client, the mount goes through SSH. Once the mount is in place, NFS traffic in both directions passes through the tunnel and is encrypted.

In the diagram the NFS server is at address nfs-server (substitute the IP address or domain name of your NFS server) and the NFS client is at address nfs-client. Make sure that SSH and the NFS-related services are installed on both systems.

The configuration has two parts, one on the NFS server and one on the NFS client, described in the two sections below.

NFS server configuration

The two steps in this section are carried out on the NFS server.

Export an NFS directory to itself

For example, if the NFS server's IP address is 10.226.43.154 and the directory to be exported is /home, add the following line to /etc/exports:

/home 10.226.43.154(rw,root_squash)

The directory is exported to the server itself, rather than to the NFS client's IP address as usual, because (as the diagram above shows) the NFS data on the server is handed to SSHD, which runs at 10.226.43.154, and not directly to the client. SSHD then forwards the data securely to the client through the tunnel.

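Note that the directory is exported read-write (rw). root_squash means that whoever mounts this directory does not obtain root privileges on the NFS server. Because SSHD connects to the NFS service from a non-privileged source port, some servers may also need the 'insecure' export option before they accept the tunnelled mount.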

Restart NFS and SSH daemons

On Red Hat 7.2, you can restart NFS and SSHD manually with the following commands:

# /sbin/service nfs restart
# /sbin/service sshd restart

To have them started automatically at boot on Red Hat 7.2, add the two lines below to the startup file /etc/rc.d/rc.local:

/sbin/service nfs start
/sbin/service sshd start

The name nfs in the commands above refers to a shell script that starts two services, namely NFS and MOUNTD.
NFS client configuration

The three steps below are carried out on the NFS client.

Find the ports of NFS and MOUNTD on the NFS server

On the NFS client, use the following command to find the NFS and MOUNTD ports on the NFS server:

# rpcinfo -p nfs-server

program vers proto port
100000 2 tcp 111 portmapper
100000 2 udp 111 portmapper
100003 2 tcp 2049 nfs
100003 2 udp 2049 nfs
100021 1 udp 1136 nlockmgr
100021 3 udp 1136 nlockmgr
100021 4 udp 1136 nlockmgr
100011 1 udp 789 rquotad
100011 2 udp 789 rquotad
100011 1 tcp 792 rquotad
100011 2 tcp 792 rquotad
100005 2 udp 2219 mountd
100005 2 tcp 2219 mountd

Note the lines for nfs and mountd: the port column shows that nfs uses port 2049 and mountd uses port 2219.
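Reading the ports off the listing by eye is error-prone, so here is a small sketch that extracts them with awk. The helper name rpc_port is an assumption for this example; it expects rpcinfo -p output in the five-column layout shown above.

```shell
#!/bin/sh
# rpc_port SERVICE PROTO
# Read `rpcinfo -p` output on stdin and print the port of the first
# line whose service name and protocol match.
# Columns: 1=program 2=version 3=protocol 4=port 5=service name
rpc_port() {
    awk -v svc="$1" -v proto="$2" '$5 == svc && $3 == proto { print $4; exit }'
}
```

For example, rpcinfo -p nfs-server | rpc_port mountd tcp would print the mountd TCP port, which can then be used when setting up the tunnel.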
Set up the tunnel using SSH

On the NFS client, bind a local SSH port to NFS port 2049:

# ssh -f -c blowfish -L 7777:nfs-server:2049 -l tony nfs-server /bin/sleep 86400
tony@nfs-server's password:
#

where:

-c blowfish tells SSH to use the Blowfish algorithm for encryption.

-L 7777:nfs-server:2049 binds the SSH client to local port 7777 (or any other free port you choose) and forwards it to port 2049 on the NFS server at address nfs-server.

-l tony nfs-server logs in to the SSH server at address nfs-server (specify its IP address or domain name) as the user tony.

/bin/sleep 86400 runs sleep on the server instead of spawning a shell, which keeps the tunnel open for one day (86,400 seconds). You can specify a larger number.

The line with tony@nfs-server's password: prompts the user tony for a password to complete authentication.

Also on the NFS client, bind another local SSH port to MOUNTD port 2219:

# ssh -f -c blowfish -L 8888:nfs-server:2219 -l tony nfs-server /bin/sleep 86400
tony@nfs-server's password:
#

where -L 8888:nfs-server:2219 binds this SSH client to local port 8888 (or any other free port except 7777, which is already in use) and forwards it to port 2219 on the NFS server at address nfs-server.

On the NFS client, mount the NFS directory /home through the two SSH ports 7777 and 8888 at a local directory, say /mnt/nfs:

# mount -t nfs -o tcp,port=7777,mountport=8888 localhost:/home /mnt/nfs

Normally the mount command names the remote host (by IP address or domain name) and the remote NFS directory (/home), followed by the local directory (/mnt/nfs). Here we mount from localhost instead of nfs-server because the decrypted data emerges at the client end of the tunnel (the left end in the diagram above), which is on the local host.

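Alternatively, to record the mount in /etc/fstab, add the following line. Note the noauto option: the file system is not mounted at boot, when the tunnels do not yet exist, but it can be mounted with a plain mount /mnt/nfs once they are up.

localhost:/home /mnt/nfs/ nfs tcp,rsize=8192,wsize=8192,intr,rw,bg,nosuid,port=7777,mountport=8888,noauto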
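The client-side steps can be collected into one place. The sketch below is a dry run: it only prints the commands for a given server, login name, and pair of remote ports, so they can be reviewed before being run as root. The function name print_tunnel_plan is hypothetical, the local ports 7777 and 8888 are the ones used in the text, and the -c blowfish option is left out on the assumption that the default SSH cipher is acceptable.

```shell
#!/bin/sh
# print_tunnel_plan SERVER USER NFS_PORT MOUNTD_PORT
# Print (do not execute) the two tunnel commands and the mount command.
print_tunnel_plan() {
    server=$1
    user=$2
    nfs_port=$3
    mountd_port=$4
    local_nfs=7777       # local end of the NFS tunnel
    local_mountd=8888    # local end of the mountd tunnel
    echo "ssh -f -L ${local_nfs}:${server}:${nfs_port} -l ${user} ${server} /bin/sleep 86400"
    echo "ssh -f -L ${local_mountd}:${server}:${mountd_port} -l ${user} ${server} /bin/sleep 86400"
    echo "mount -t nfs -o tcp,port=${local_nfs},mountport=${local_mountd} localhost:/home /mnt/nfs"
}

print_tunnel_plan nfs-server tony 2049 2219
```

The remote ports passed in (2049 and 2219 here) should be the ones reported by rpcinfo for your own server.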

Allow only traffic from authorised NFS clients to the NFS server (on server)

Suppose that the NFS server provides the NFS service and nothing else, so that three ports are in use on the server: the RPC portmapper (port 111), NFS (port 2049), and mountd (port 2219). Traffic to the server can then be filtered with the iptables firewall running locally on the NFS server (iptables must be installed for the following commands), so that only authorised NFS clients can reach it. The rules below allow TCP traffic from the authorised subnet 10.226.43.0/24 to the portmapper, NFS, and mountd ports; if clients also use UDP, repeat them with -p udp.

# iptables -A INPUT -i eth0 -p tcp -s 10.226.43.0/24 --dport 111 -j ACCEPT
# iptables -A INPUT -i eth0 -p tcp -s 10.226.43.0/24 --dport 2049 -j ACCEPT
# iptables -A INPUT -i eth0 -p tcp -s 10.226.43.0/24 --dport 2219 -j ACCEPT

Deny everything else:

# iptables -A INPUT -i eth0 -p tcp -s 0/0 --dport 111 -j DROP
# iptables -A INPUT -i eth0 -p tcp -s 0/0 --dport 2049 -j DROP
# iptables -A INPUT -i eth0 -p tcp -s 0/0 --dport 2219 -j DROP
# iptables -A INPUT -i eth0 -s 0/0 -j DROP

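Clients normally locate the NFS services through the portmapper, so blocking port 111 stops ordinary clients from finding them. A client that already knows the port numbers can still connect directly, which is why ports 2049 and 2219 are filtered as well.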
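Since NFS, mountd, and the portmapper may be reached over both TCP and UDP, the rule set can be generated with two nested loops instead of being typed out. This is a sketch only: nfs_fw_rules is a hypothetical name, the interface eth0 and the mountd port 2219 come from the example above and will differ on other systems, and the function prints the rules rather than applying them.

```shell
#!/bin/sh
# nfs_fw_rules SUBNET
# Print ACCEPT rules for the portmapper, NFS and mountd ports over TCP and
# UDP from the given subnet, followed by a catch-all DROP.
nfs_fw_rules() {
    subnet=$1
    for port in 111 2049 2219; do
        for proto in tcp udp; do
            echo "iptables -A INPUT -i eth0 -p $proto -s $subnet --dport $port -j ACCEPT"
        done
    done
    echo "iptables -A INPUT -i eth0 -j DROP"
}

nfs_fw_rules 10.226.43.0/24
```

Once the printed rules look right, the output can be piped to sh by root on the NFS server.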

Alternatively, you can use TCP wrappers to filter access to the portmapper. Add the line

portmapper: 10.226.43.0/24

to /etc/hosts.allow to allow access to the portmapper only from subnet 10.226.43.0/24.

Also add the line below to /etc/hosts.deny to deny access from all other hosts:

portmapper: ALL

Filter out Internet traffic to the NFS service on the routers and firewalls (misc)

Many organisations have computers that are visible on the Internet. If the NFS service is visible as well, block Internet traffic to ports 111 (portmapper), 2049 (NFS), and 2219 (mountd) on your routers or firewalls to prevent unauthorised access to these ports. With iptables as your firewall, use rules such as the following (shown for TCP; repeat with -p udp as needed, and use the FORWARD chain instead of INPUT if the firewall is a separate machine in front of the server):

# iptables -A INPUT -i eth0 -p tcp -d nfs-server --dport 111 -j DROP
# iptables -A INPUT -i eth0 -p tcp -d nfs-server --dport 2049 -j DROP
# iptables -A INPUT -i eth0 -p tcp -d nfs-server --dport 2219 -j DROP

Use the software tool NFSwatch to monitor NFS traffic (misc)

NFSwatch lets you monitor the NFS packets flowing between an NFS client and server. It can be downloaded from ftp://ftp.cerias.purdue.edu/pub/tools/unix/netutils/nfswatch/. Monitoring matters because, if malicious activity is under way or has already taken place, the log created by NFSwatch helps trace how it happened and where it came from. To monitor NFS packets between nfs-server and nfs-client, use the command:

# nfswatch -dst nfs-server -src nfs-client

all hosts                 Wed Aug 28 10:12:40 2002    Elapsed time: 00:03:10
Interval packets:     1098 (network)      818 (to host)        0 (dropped)
Total packets:       23069 (network)    14936 (to host)        0 (dropped)
                      Monitoring packets from interface lo
                     int   pct   total                         int   pct   total
ND Read                0    0%       0   TCP Packets           461   56%   13678
ND Write               0    0%       0   UDP Packets           353   43%    1051
NFS Read             160   20%     271   ICMP Packets            0    0%       0
NFS Write              1    0%       1   Routing Control         0    0%      36
NFS Mount              0    0%       7   Address Resolution      2    0%      76
YP/NIS/NIS+            0    0%       0   Reverse Addr Resol      0    0%       0
RPC Authorization    166   20%     323   Ethernet/FDDI Bdcst     4    0%     179
Other RPC Packets      5    1%      56   Other Packets           2    0%     131
                                   1 file system
File Sys             int   pct   total
tmp(32,17)             0    0%      15

Specify the IP address (or domain name) of the source with -src and that of the destination with -dst.

2007/11/30

Linux QoS Setup Script (Akihiro)

I set up QoS on my Linux router, and it is working very nicely. I realise this topic is of no interest at all to people who do not use Linux (laughs), but here is the script I used, for anyone who wants to try QoS. After running the script, the command

tc -s class ls dev eth0

shows HTTP and FTP traffic being shaped.

=== Script begins ===

#!/bin/sh

################
# Reset existing rules
/sbin/tc qdisc del dev eth0 root
/sbin/tc qdisc del dev eth1 root

##############################
# Create the root qdiscs and parent classes

## Attach a cbq qdisc as the root of eth0, with handle 10 ##
/sbin/tc qdisc add dev eth0 root handle 10: cbq bandwidth 10Mbit avpkt 1000 cell 8

## Attach a cbq qdisc as the root of eth1, with handle 11 ##
/sbin/tc qdisc add dev eth1 root handle 11: cbq bandwidth 10Mbit avpkt 1000 cell 8

## Create a 10Mbit/sec class with priority 8 (classid 10:1).
## Classes whose parent is 10:1 can then use at most 10Mbit/sec
## of bandwidth.
/sbin/tc class add dev eth0 parent 10:0 classid 10:1 cbq bandwidth 10Mbit rate 10Mbit allot 1514 cell 8 weight 1Mbit prio 8 maxburst 20 avpkt 1000

## Create a 10Mbit/sec class with priority 8 (classid 11:1).
## Classes whose parent is 11:1 can then use at most 10Mbit/sec
## of bandwidth.
/sbin/tc class add dev eth1 parent 11:0 classid 11:1 cbq bandwidth 10Mbit rate 10Mbit allot 1514 cell 8 weight 1Mbit prio 8 maxburst 20 avpkt 1000

##################################
# Bandwidth-control classes for eth0

## Create a 10Mbit/sec class with priority 1, classid 10:61, parent 10:1, ##
## and attach a tbf qdisc to it ##
/sbin/tc class add dev eth0 parent 10:1 classid 10:61 cbq bandwidth 10Mbit rate 10Mbit allot 1514 cell 8 weight 1Mbit prio 1 maxburst 20 avpkt 1000 bounded
/sbin/tc qdisc add dev eth0 parent 10:61 tbf rate 10Mbit buffer 10Kb/8 limit 15Kb

## Create a 10Mbit/sec class with priority 3, classid 10:63, parent 10:1, ##
## and attach a tbf qdisc to it ##
/sbin/tc class add dev eth0 parent 10:1 classid 10:63 cbq bandwidth 10Mbit rate 10Mbit allot 1514 cell 8 weight 1Mbit prio 3 maxburst 20 avpkt 1000 bounded
/sbin/tc qdisc add dev eth0 parent 10:63 tbf rate 10Mbit buffer 10Kb/8 limit 15Kb

## Create a 10Mbit/sec class with priority 5, classid 10:65, parent 10:1, ##
## and attach a tbf qdisc to it ##
/sbin/tc class add dev eth0 parent 10:1 classid 10:65 cbq bandwidth 10Mbit rate 10Mbit allot 1514 cell 8 weight 1Mbit prio 5 maxburst 20 avpkt 1000 bounded
/sbin/tc qdisc add dev eth0 parent 10:65 tbf rate 10Mbit buffer 10Kb/8 limit 15Kb

## Create a class with priority 7, classid 10:67, parent 10:1, ##
## and attach a tbf qdisc to it; this class is rate-limited to 224Kbit/sec ##
/sbin/tc class add dev eth0 parent 10:1 classid 10:67 cbq bandwidth 10Mbit rate 224Kbit allot 1514 cell 8 weight 22Kbit prio 7 maxburst 20 avpkt 1000 bounded
/sbin/tc qdisc add dev eth0 parent 10:67 tbf rate 224Kbit buffer 10Kb/8 limit 15Kb

##################################
# Bandwidth-control classes for eth1

## Create a 10Mbit/sec class with priority 1, classid 11:61, parent 11:1, ##
## and attach a tbf qdisc to it ##
/sbin/tc class add dev eth1 parent 11:1 classid 11:61 cbq bandwidth 10Mbit rate 10Mbit allot 1514 cell 8 weight 1Mbit prio 1 maxburst 20 avpkt 1000 bounded
/sbin/tc qdisc add dev eth1 parent 11:61 tbf rate 10Mbit buffer 10Kb/8 limit 15Kb

## Create a 10Mbit/sec class with priority 3, classid 11:63, parent 11:1, ##
## and attach a tbf qdisc to it ##
/sbin/tc class add dev eth1 parent 11:1 classid 11:63 cbq bandwidth 10Mbit rate 10Mbit allot 1514 cell 8 weight 1Mbit prio 3 maxburst 20 avpkt 1000 bounded
/sbin/tc qdisc add dev eth1 parent 11:63 tbf rate 10Mbit buffer 10Kb/8 limit 15Kb

## Create a 10Mbit/sec class with priority 5, classid 11:65, parent 11:1, ##
## and attach a tbf qdisc to it ##
/sbin/tc class add dev eth1 parent 11:1 classid 11:65 cbq bandwidth 10Mbit rate 10Mbit allot 1514 cell 8 weight 1Mbit prio 5 maxburst 20 avpkt 1000 bounded
/sbin/tc qdisc add dev eth1 parent 11:65 tbf rate 10Mbit buffer 10Kb/8 limit 15Kb

## Create a 10Mbit/sec class with priority 7, classid 11:67, parent 11:1, ##
## and attach a tbf qdisc to it ##
/sbin/tc class add dev eth1 parent 11:1 classid 11:67 cbq bandwidth 10Mbit rate 10Mbit allot 1514 cell 8 weight 1Mbit prio 7 maxburst 20 avpkt 1000 bounded
/sbin/tc qdisc add dev eth1 parent 11:67 tbf rate 10Mbit buffer 10Kb/8 limit 15Kb

##########################################
# Define which traffic goes to each class

## DNS
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport 53 0xffff flowid 10:61
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport 53 0xffff flowid 11:61

## SSH
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport 22 0xffff flowid 10:63
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport 22 0xffff flowid 11:63

## HTTP
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport 80 0xffff flowid 10:63
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport 80 0xffff flowid 11:63
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport 443 0xffff flowid 10:63
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport 443 0xffff flowid 11:63

## MAIL
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport 25 0xffff flowid 10:63
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport 25 0xffff flowid 11:63
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport 110 0xffff flowid 10:63
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport 110 0xffff flowid 11:63
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport 143 0xffff flowid 10:63
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport 143 0xffff flowid 11:63

## NTP
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport 123 0xffff flowid 10:63
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport 123 0xffff flowid 11:63

## FTP
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport 20 0xffff flowid 10:65
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport 20 0xffff flowid 11:65
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport 21 0xffff flowid 10:65
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport 21 0xffff flowid 11:65

## Everything else
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dst any flowid 10:67
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip dst any flowid 11:67

=== Script ends ===
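The per-service filter lines in the script above come in pairs that differ only in port and class, so they lend themselves to a loop. The sketch below prints the same pairs from a port-to-class table; qos_filter is a hypothetical helper, and the device names, handles, and class numbers are the ones the script uses.

```shell
#!/bin/sh
# qos_filter PORT MINOR
# Print the filter pair for one service: destination port on eth0 (handle 10:)
# and source port on eth1 (handle 11:), both sent to class <handle>:MINOR.
qos_filter() {
    port=$1
    minor=$2
    echo "/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport $port 0xffff flowid 10:$minor"
    echo "/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport $port 0xffff flowid 11:$minor"
}

# port:class pairs as in the script (DNS, SSH, HTTP(S), mail, NTP, FTP)
for entry in 53:61 22:63 80:63 443:63 25:63 110:63 143:63 123:63 20:65 21:65; do
    qos_filter "${entry%%:*}" "${entry##*:}"
done
```

Adding a service then means adding one port:class entry to the list instead of copying two long lines.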

A Complete Example of NAT with QoS - Linux Advanced Routing & Traffic Control HOWTO

15.10. A complete example of NAT with QoS

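I'm Pedro Larroy. Here I describe a common setup in which a private network with many users is connected to the Internet through a Linux router that has a public IP address and performs network address translation (NAT). I use this QoS setup to provide Internet access to 198 users in a university dorm (I am one of them, and also the administrator). The users are all heavy users of peer-to-peer programs, so proper traffic control is essential. I hope this serves as a practical example for interested lartc readers.

I first take a practical, step-by-step approach, and at the end explain how to make the process happen automatically at boot time. The network in this example is a private LAN connected to the Internet through a Linux router with a single public IP address. Extending it to several public addresses is very easy and only takes a few additional iptables rules. To get things working you need the following: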

Linux kernel 2.4.18 or later
    If you use 2.4.18, you need the HTB patch.

iproute
    The tc binary must support HTB. A precompiled binary is distributed with HTB.

iptables

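15.10.1. Let's begin optimizing that scarce bandwidth

First we set up some qdiscs in which we will classify the traffic. We create an htb qdisc with six classes of ascending priority. Each class always gets its assigned rate but can also use bandwidth that other classes do not need. Classes with higher priority (that is, a lower prio number) are offered the spare bandwidth first. Our connection is ADSL with 2Mbit/s downstream and 300kbit/s upstream. I use 240kbit/s as the ceiling rate because, above that, latency starts to grow, presumably because of buffering somewhere along the connection. Determine this parameter experimentally, raising and lowering it while watching the latency to nearby hosts.

Adjust CEIL to about 75% of your upstream bandwidth limit, and where eth0 appears, use the public interface that connects to the Internet. To begin, run the following in a root shell:

CEIL=240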
tc qdisc add dev eth0 root handle 1: htb default 15
tc class add dev eth0 parent 1: classid 1:1 htb rate ${CEIL}kbit ceil ${CEIL}kbit
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 80kbit ceil 80kbit prio 0
tc class add dev eth0 parent 1:1 classid 1:11 htb rate 80kbit ceil ${CEIL}kbit prio 1
tc class add dev eth0 parent 1:1 classid 1:12 htb rate 20kbit ceil ${CEIL}kbit prio 2
tc class add dev eth0 parent 1:1 classid 1:13 htb rate 20kbit ceil ${CEIL}kbit prio 2
tc class add dev eth0 parent 1:1 classid 1:14 htb rate 10kbit ceil ${CEIL}kbit prio 3
tc class add dev eth0 parent 1:1 classid 1:15 htb rate 30kbit ceil ${CEIL}kbit prio 3
tc qdisc add dev eth0 parent 1:12 handle 120: sfq perturb 10
tc qdisc add dev eth0 parent 1:13 handle 130: sfq perturb 10
tc qdisc add dev eth0 parent 1:14 handle 140: sfq perturb 10
tc qdisc add dev eth0 parent 1:15 handle 150: sfq perturb 10

This creates an htb tree one level deep, something like this:

                +---------+
                | root 1: |
                +---------+
                     |
+---------------------------------------+
| class 1:1                             |
+---------------------------------------+
  |      |      |      |      |      |
+----+ +----+ +----+ +----+ +----+ +----+
|1:10| |1:11| |1:12| |1:13| |1:14| |1:15|
+----+ +----+ +----+ +----+ +----+ +----+

classid 1:10 htb rate 80kbit ceil 80kbit prio 0

This is the highest-priority class. Packets in this class have the lowest delay and get spare bandwidth first, so it is a good idea to keep its ceil modest. It carries packets that benefit most from low delay, such as interactive traffic: ssh, telnet, dns, quake3, irc, and packets with the SYN flag set.

classid 1:11 htb rate 80kbit ceil ${CEIL}kbit prio 1

This is the first class for bulk traffic. In this example it carries traffic from the local web server (source port 80) and requests for web pages (destination port 80).

classid 1:12 htb rate 20kbit ceil ${CEIL}kbit prio 2

This class holds traffic with the Maximize-Throughput bit set in the TOS field, together with traffic that local processes on the router send to the Internet. The classes that follow therefore carry only traffic that is routed through the box.

classid 1:13 htb rate 20kbit ceil ${CEIL}kbit prio 2

This class is for bulk traffic from the other NATed machines that needs higher priority.

classid 1:14 htb rate 10kbit ceil ${CEIL}kbit prio 3

This class holds mail-related traffic (SMTP, pop3, and so on) and packets with the Minimize-Cost bit set in the TOS field.

classid 1:15 htb rate 30kbit ceil ${CEIL}kbit prio 3

Finally, this class holds the remaining traffic from the NATed machines behind the router. kazaa, edonkey, and the like go here so that they do not interfere with other services.
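One property of the class setup worth checking is that the guaranteed rates of the six child classes do not add up to more than the rate of the parent class 1:1. The sketch below does that sum with shell arithmetic; the rate values are copied from the commands above, and the check itself is an addition of this digest rather than part of the HOWTO.

```shell
#!/bin/sh
CEIL=240
# Guaranteed rates in kbit of classes 1:10 through 1:15
rates="80 80 20 20 10 30"

total=0
for r in $rates; do
    total=$((total + r))
done

if [ "$total" -le "$CEIL" ]; then
    echo "ok: child rates sum to ${total}kbit, within the ${CEIL}kbit parent"
else
    echo "warning: child rates sum to ${total}kbit, above the ${CEIL}kbit parent"
fi
```

If CEIL is lowered for a slower uplink, the child rates need to be scaled down with it for the check to keep passing.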
15.10.2. Classifying packets

We have set up the qdiscs, but no packets are classified yet, so at the moment every outgoing packet goes to 1:15 (because we used tc qdisc add dev eth0 root handle 1: htb default 15). Now we need to say which packets go where. This is the most important part.

Now we set up the filters so that packets can be classified with iptables. I nearly always prefer iptables for this job because it is flexible and keeps a packet count for each rule. Also, with the RETURN target, packets do not need to traverse every rule. Run the following commands:

tc filter add dev eth0 parent 1:0 protocol ip prio 1 handle 1 fw classid 1:10
tc filter add dev eth0 parent 1:0 protocol ip prio 2 handle 2 fw classid 1:11
tc filter add dev eth0 parent 1:0 protocol ip prio 3 handle 3 fw classid 1:12
tc filter add dev eth0 parent 1:0 protocol ip prio 4 handle 4 fw classid 1:13
tc filter add dev eth0 parent 1:0 protocol ip prio 5 handle 5 fw classid 1:14
tc filter add dev eth0 parent 1:0 protocol ip prio 6 handle 6 fw classid 1:15

This simply tells the kernel that packets carrying a given FWMARK value (handle x fw) go to the corresponding class (classid x:x). Next we see how to mark packets with iptables.
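The six filter commands follow a fixed pattern: firewall mark n maps to class 1:(n+9). As an illustration, the sketch below prints them from a loop; fw_filters is a hypothetical function name and the device is passed as an argument.

```shell
#!/bin/sh
# fw_filters DEVICE
# Print one fw filter per firewall mark 1..6, mapping mark n to class 1:(n+9),
# i.e. mark 1 -> 1:10 up to mark 6 -> 1:15.
fw_filters() {
    dev=$1
    mark=1
    while [ "$mark" -le 6 ]; do
        echo "tc filter add dev $dev parent 1:0 protocol ip prio $mark handle $mark fw classid 1:$((mark + 9))"
        mark=$((mark + 1))
    done
}

fw_filters eth0
```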

First, you need to understand how packets traverse the iptables chains:

        +------------+                +---------+               +-------------+
Packet -| PREROUTING |--- routing ----| FORWARD |-------+-------| POSTROUTING |- Packets
input   +------------+    decision    +---------+       |       +-------------+    out
                             |                          |
                        +-------+                    +--------+
                        | INPUT |-- Local process ---| OUTPUT |
                        +-------+                    +--------+

We assume all tables exist and the default policies are ACCEPT (-P ACCEPT). If you have not touched iptables yet, the defaults should be fine. Our private network is a class B network with address 172.17.0.0/16, and the public IP is 212.170.21.172.

Next we tell the kernel to actually do NAT, so that clients on the private network can start talking to the outside:

echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -s 172.17.0.0/255.255.0.0 -o eth0 -j SNAT --to-source 212.170.21.172

Now check that packets are flowing through 1:15:

tc -s class show dev eth0

To start marking packets, add rules to the PREROUTING chain of the mangle table:

iptables -t mangle -A PREROUTING -p icmp -j MARK --set-mark 0x1
iptables -t mangle -A PREROUTING -p icmp -j RETURN

Now, when you ping some site on the Internet from the private network, you should see the packet count of 1:10 increase. Check it:

tc -s class show dev eth0

The -j RETURN keeps packets from traversing the remaining rules, so icmp packets are not matched against any rule below the RETURN. Keep this in mind. Now we add more rules to handle TOS properly:

iptables -t mangle -A PREROUTING -m tos --tos Minimize-Delay -j MARK --set-mark 0x1
iptables -t mangle -A PREROUTING -m tos --tos Minimize-Delay -j RETURN
iptables -t mangle -A PREROUTING -m tos --tos Minimize-Cost -j MARK --set-mark 0x5
iptables -t mangle -A PREROUTING -m tos --tos Minimize-Cost -j RETURN
iptables -t mangle -A PREROUTING -m tos --tos Maximize-Throughput -j MARK --set-mark 0x6
iptables -t mangle -A PREROUTING -m tos --tos Maximize-Throughput -j RETURN

Now prioritize ssh packets:

iptables -t mangle -A PREROUTING -p tcp -m tcp --sport 22 -j MARK --set-mark 0x1
iptables -t mangle -A PREROUTING -p tcp -m tcp --sport 22 -j RETURN

It is a good idea to prioritize packets that begin tcp connections, that is, packets with the SYN flag set:

iptables -t mangle -I PREROUTING -p tcp -m tcp --tcp-flags SYN,RST,ACK SYN -j MARK --set-mark 0x1
iptables -t mangle -I PREROUTING -p tcp -m tcp --tcp-flags SYN,RST,ACK SYN -j RETURN

And so on. When you have finished adding rules to PREROUTING in the mangle table, terminate the PREROUTING chain with:

iptables -t mangle -A PREROUTING -j MARK --set-mark 0x6

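Traffic that has not been marked so far therefore goes to 1:15. This last step is actually unnecessary because the default class is 1:15, but I mark the packets anyway to keep the whole setup consistent and to be able to see the counter for this rule.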

It is a good idea to do the same for the OUTPUT chain, so repeat these commands with -A OUTPUT in place of -A PREROUTING (s/PREROUTING/OUTPUT/). Traffic generated locally (on the Linux router) is then classified as well. I finish the OUTPUT chain with -j MARK --set-mark 0x3 so that local traffic gets a higher priority.
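Rather than editing a copy of the rules by hand, the chain name can be made a parameter. The sketch below prints a shortened rule set (only the icmp and ssh rules from above) for a given chain, ending with that chain's catch-all mark. mark_rules is a hypothetical name, and the output is meant to be reviewed before being run as root.

```shell
#!/bin/sh
# mark_rules CHAIN FINAL_MARK
# Print MARK/RETURN rule pairs for CHAIN in the mangle table, followed by
# a final rule that marks everything not matched earlier.
mark_rules() {
    chain=$1
    final=$2
    echo "iptables -t mangle -A $chain -p icmp -j MARK --set-mark 0x1"
    echo "iptables -t mangle -A $chain -p icmp -j RETURN"
    echo "iptables -t mangle -A $chain -p tcp -m tcp --sport 22 -j MARK --set-mark 0x1"
    echo "iptables -t mangle -A $chain -p tcp -m tcp --sport 22 -j RETURN"
    echo "iptables -t mangle -A $chain -j MARK --set-mark $final"
}

mark_rules PREROUTING 0x6   # forwarded traffic: unmatched packets go to 1:15
mark_rules OUTPUT 0x3       # local traffic: unmatched packets go to 1:12
```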
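15.10.3. Improving our setup

Now the whole setup works. Take time to look at the graphs, watch where your bandwidth is being spent, and decide how you want it to be used. Spend plenty of time on this. In my case it finally made the Internet connection work really well; without it, we would have been plagued by constant timeouts and new tcp connections would have been allotted almost no bandwidth.

If you find that certain classes are full most of the time, it can help to attach another queueing discipline to them so that bandwidth is shared more fairly:

tc qdisc add dev eth0 parent 1:13 handle 130: sfq perturb 10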
tc qdisc add dev eth0 parent 1:14 handle 140: sfq perturb 10
tc qdisc add dev eth0 parent 1:15 handle 150: sfq perturb 10

15.10.4. Making all of the above start at boot

There are of course many ways to do this. In my case I wrote a script, /etc/init.d/packetfilter, that accepts the arguments [start | stop | stop-tables | start-tables | reload-tables]; it configures the qdiscs, loads the needed kernel modules, and so behaves like a daemon. The same script loads the iptables rules from /etc/network/iptables-rules, a file whose contents can be saved with iptables-save and restored with iptables-restore.

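Controlling bandwidth per port

Last updated: 2007/1/1

[Overview]

I want to control the bandwidth of the data a server sends on a per-port basis (that is, per protocol, such as HTTP or FTP).

[Solution]

The kernel's QoS (Quality of Service) features make bandwidth control relatively easy. However, only traffic sent by the server can be controlled, not traffic it receives, so if you want to limit FTP uploads, for example, you also need to use the features of the FTP daemon.

Preparation

The kernel's QoS features require iproute and tc, but recent distributions already include them, so there is no need to install them. tc allows many kinds of QoS control, but it is almost impossible to configure without spending considerable time to understand it thoroughly. For per-port bandwidth control, a script called cbq.init makes the configuration easy, so that is what is used here. Download cbq.init and set it up to start automatically at boot. On Red Hat based systems it works as is, but on SuSE the path to tc is different, so instead of the mv on the second line, convert it with sed as shown on the third line.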
# wget http://jaist.dl.sourceforge.net/sourceforge/cbqinit/cbq.init-v0.7.3
# mv cbq.init-v0.7.3 /etc/init.d/cbq.init
(# sed -e "s/TC=\/sbin\/tc/TC=\/usr\/sbin\/tc/g" cbq.init-v0.7.3 > /etc/init.d/cbq.init)
# chmod 755 /etc/init.d/cbq.init
# chkconfig --add cbq.init

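QoS configuration

The names and locations of the cbq.init configuration files used for QoS control are fixed by default.

[Location of the configuration files]

Configuration files are placed under the /etc/sysconfig/cbq/ directory, so create it as follows:
# mkdir /etc/sysconfig/cbq

[Configuration files]

The name and format of the configuration files are also fixed, as shown below. Several configuration files can be written as long as their clsid (class ID) values differ.

File name: cbq-<clsid>.<name>

cbq-: Fixed; it must be written exactly like this.
<clsid>: In effect the CBQ class ID, a value from 2 to 65535 in decimal (0002-FFFF in hexadecimal). It must not duplicate that of another configuration file.
<name>: A nickname for the class ID; choose anything that is easy for you to recognise.

Example: cbq-1280.My_first_shaper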

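[Configuration file parameters]

Each entry below gives the category, the parameter, a description, and whether the parameter is required or optional.

1. Device: DEVICE=<ifname>,<bandwidth>[,<weight>] (required)
Example: DEVICE=eth0,10Mbit,1Mbit

<ifname>: Name of the interface to be shaped.
<bandwidth>: Physical speed of the interface. Specify 100Mbit for 100BASE-TX and 10Mbit for 10BASE-T.
<weight>: A parameter proportional to <bandwidth>; as a rule, set it to 1/10 of <bandwidth>.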
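
2. Class: RATE=<speed> (required)
Example: RATE=5Mbit

<speed>: Bandwidth allocated to this class. The units Kbit and Mbit can be used. bps, Kbps, and Mbps can also be used, but note that they mean bytes per second and make the relation to the interface speed hard to see, so they are best avoided.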

3. Class: WEIGHT=<speed> (required)
Example: WEIGHT=500kbit

<speed>: A parameter tied to RATE; as a rule, set it to 1/10 of RATE (WEIGHT ~= RATE / 10, rounded as convenient).
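
4. Class: PRIO=<1-8> (optional; default: 5)
Example: PRIO=5

Traffic priority from 1 to 8. Lower values are served first, so this can be used to differentiate between protocols (for example, giving SSH top priority).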
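
5. Filter: RULE=[[saddr[/prefix]][:port],][daddr[/prefix]][:port] (required)

This is where the address or network and the port to be controlled are specified. The first part, [saddr[/prefix]], is the source address or network of the packets to be controlled, and [daddr[/prefix]] is the destination to which the server running this script sends data. Note that the separator "," belongs at the end of the first (source) part.

Examples:

- Controlling content delivery in response to requests to a WWW server
Packets whose source is port 80 of the server are controlled, so set RULE to [server address:80,] as below. Because matching is on the source, do not forget the trailing ",".

RULE=192.168.1.100:80,

+---------+
|  linux  |-eth0------*-[client]
+---------+
Server: 192.168.1.100      Client: any

80 --------------> any

- Controlling download traffic from an FTP server
Data downloads from an FTP server use different ports in active and passive mode.
In active mode the data is sent over a connection (ftp-data) whose server side is port 20, so configure the following:

RULE=192.168.1.100:20,

In passive mode, control is possible only with a daemon that lets you specify the ports used on the server side. Both Proftpd and vsftpd, which are covered on the author's site, can do this, so configure a port range. Download data is sent in packets whose source is one of those ports, so the rule specifies a range in the form [start port/AND mask].
The AND mask works in the same way as a network subnet mask (the /24 in 192.168.1.0/24), written in hexadecimal. For example, for the 32 ports from 4096 to 4127 the rule is [ 4096/0xffe0 ]: as shown below, ANDing any number from 4096 to 4127 with [ 0xffe0 ] gives 4096, so they are all treated alike. It follows that the start port must have its low n bits set to 0, where n depends on the number of ports used; otherwise unrelated ports are limited as well. The range 4000-4029 shown in the Proftpd and other configuration examples therefore needs to be changed.

4096 (0x1000)     0001000000000000  [start port]
mask (0xffe0)     1111111111100000
-----------------------------------
AND               0001000000000000

4127 (0x101f)     0001000000011111  [end port]
mask (0xffe0)     1111111111100000
-----------------------------------
AND               0001000000000000

RULE=192.168.1.100:4096/0xffe0,

+---------+
|  linux  |-eth0------*-[client]
+---------+
Server: 192.168.1.100      Client: any

20/4096-4127 --------------> any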
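
6. Timer: TIME=[<dow>,<dow>, ...,<dow>/]<from>-<till>;<rate>/<weight> (optional)
Example: TIME=0,1,2,5/18:00-06:00;256Kbit/25Kbit
TIME=18:00-06:00;256Kbit/25Kbit

This setting uses a timer to apply a bandwidth different from the values set above.
<dow>: Days of the week on which the rule applies, 0-6, where 0 is Sunday.
<from>-<till>: Start and end time of the rule, in 24-hour format.
<rate>/<weight>: Same as items 2 and 3 above.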
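The AND-mask matching described for the RULE parameter can be checked with shell arithmetic before a range is committed to a configuration file. The sketch below tests whether a port falls inside a base/mask pair; port_matches is a hypothetical helper, not something cbq.init provides.

```shell
#!/bin/sh
# port_matches PORT BASE MASK
# Succeed when (PORT AND MASK) equals BASE, which is how a rule such as
# 4096/0xffe0 selects the 32 ports 4096..4127.
port_matches() {
    [ $(( $1 & $3 )) -eq "$2" ]
}

for p in 4095 4096 4127 4128; do
    if port_matches "$p" 4096 0xffe0; then
        echo "$p: matched by 4096/0xffe0"
    else
        echo "$p: not matched"
    fi
done
```

Only 4096 and 4127 are reported as matched here; 4095 and 4128 lie just outside the 32-port block.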

[Configuration examples] Place files like the following in the /etc/sysconfig/cbq directory.

- cbq-100.http: an example that keeps excessive access to the WWW server from using up the line.

DEVICE=eth0,100Mbit,10Mbit
RATE=5Mbit
WEIGHT=500Kbit
PRIO=5
RULE=192.168.1.100:80,

- cbq-101.ftp: an example that limits downloads from the FTP server.

DEVICE=eth0,100Mbit,10Mbit
RATE=10Mbit
WEIGHT=1Mbit
PRIO=6
RULE=192.168.1.100:20,
RULE=192.168.1.100:4096/0xffe0,
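Because the weight values are by convention one tenth of the corresponding bandwidth, a configuration file can be generated from just the link speed and the class rate. The sketch below prints such a file; cbq_config is a hypothetical helper, it takes the link speed in Mbit and the class rate in Kbit as plain integers, and it assumes a link speed of at least 10 Mbit so that the integer division does not yield 0.

```shell
#!/bin/sh
# cbq_config DEVICE LINK_MBIT RATE_KBIT PRIO RULE
# Print a cbq.init configuration with both weights set to 1/10 of the
# corresponding bandwidth (integer division).
cbq_config() {
    dev=$1
    link=$2
    rate=$3
    prio=$4
    rule=$5
    echo "DEVICE=${dev},${link}Mbit,$((link / 10))Mbit"
    echo "RATE=${rate}Kbit"
    echo "WEIGHT=$((rate / 10))Kbit"
    echo "PRIO=${prio}"
    echo "RULE=${rule}"
}

# Equivalent of the cbq-100.http example, with the rate written as 5000Kbit
cbq_config eth0 100 5000 5 192.168.1.100:80,
```

Redirecting the output to a file such as /etc/sysconfig/cbq/cbq-100.http gives a file in the format described above.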

Starting cbq.init

cbq.init is itself a startup script, so simply start it. Once it is running, test whether traffic is limited as configured.
# /etc/init.d/cbq.init start

tc - traffic control Linux QoS control tool

Milan P. Stanic

mps@rns-nis.co.yu

Contents
1 What is QoS
2 command syntax
3 Queueing disciplines
3.1 Class Based Queue
3.2 Priority
3.3 FIFO
3.4 TBF
3.5 RED
3.6 GRED
3.7 SFQ
3.8 ATM
3.9 Dsmark
3.10 INGRESS
4 classes
4.1 CBQ
5 filters (or classifier)
5.1 filter rsvp
5.2 filter u32
5.3 filter fw
5.4 filter route
5.5 tcindex
6 police
Bibliography
Appendix

About this document

This document is intended as a (comprehensive) description of the tc command utility from the iproute2 package.

The primary motivation for this work is my wish to learn about QoS in Linux (and about QoS in general). If you find errors or big mistakes in this document, that is because I do not yet fully understand QoS. I hope it will improve over time.

It is based on kernel 2.4 and iproute2 version 000305.

It is far from complete or free of errors. I am writing it for my own learning only, and I am not sure whether (or when) it will be finished. But I am working on it (especially when I have time :).

All of the text is taken from different documents from the net, from the linux-diffserv and linux-net mailing lists, and from the Linux kernel source files.

Disclaimer: Use at your own risk. I am not responsible for any loss or damage, in any sense, that results from using information from this document.

1 What is QoS

When the kernel has several packets to send out over a network device, it has to decide which ones to send first, which ones to delay, and which ones to drop. This is the job of the packet scheduler, and several different algorithms for how to do this "fairly" have been proposed.

With the Linux QoS subsystem (which is built from kernel building blocks and user space tools such as the ip and tc command line utilities), very flexible traffic control is possible.

2 command syntax

tc (traffic controller) is the user level program which can be used to create queues and associate them with network devices. It is used to set up various kinds of queues and to associate classes with each of those queues. It is also used to set up the filters by which packets are classified.

Usage: tc [ OPTIONS ] OBJECT { COMMAND | help }

where OBJECT := { qdisc | class | filter }

OPTIONS := { -s[tatistics] | -d[etails] | -r[aw] }

Where it's expecting a number for BPS, it understands some suffixes: kbps (*1024), mbps (*1024*1024), kbit (*1024/8), and mbit (*1024*1024/8). If I'm reading the code correctly, "BPS" means Bytes Per Second; if you give a number without a suffix, it assumes you want BITS per second (it divides the number you give it by 8). It also understands bps as a suffix.

Where it's expecting a time value, it seems it understands suffixes of s, sec, and secs for seconds, ms, msec, and msecs for milliseconds, and us, usec, and usecs for microseconds.

Where it wants a size parameter, it assumes non-suffixed numbers to be specified in bytes. It also understands suffixes of k and kb to mean kilobytes (*1024), m and mb to mean megabytes (*1024*1024), kbit to mean kilobit (*1024/8), and mbit to mean megabits (*1024*1024/8).

1Mbit == 128Kbps or 1 megabit is 128 kilobytes per second

bps = bits/sec (uhmm...)

kbps = bytes/sec * 1024

mbps = bytes/sec * 1024 * 1024

kbit = bits/sec * 1024

mbit = bits/sec * 1024 * 1024

In the examples Xbit and Xbps are used interchangeably, although tc treats them very differently.

note: this is very confusing

note: make sure whenever you are dealing with memory related things like queue size, buffer size that their units are in bytes and when it is bandwidth and rate related parameters the units are in bits.
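To make the suffix table concrete, here is a small sketch that converts a rate string to bytes per second using exactly the multipliers listed above. The function rate_to_bytes is a hypothetical helper, not tc's own parser, and it only handles whole numbers. The multipliers are the ones this document describes for iproute2 000305; other releases may differ, so check tc(8) for your version.

```shell
#!/bin/sh
# rate_to_bytes RATE: convert a rate string to bytes per second using the
# suffix table described in this section (hypothetical helper).
rate_to_bytes() {
    num=${1%%[!0-9]*}                 # leading digits
    rest=${1#"$num"}                  # whatever follows them
    suffix=$(printf '%s' "$rest" | tr 'A-Z' 'a-z')
    [ -n "$num" ] || { echo "no number in: $1" >&2; return 1; }
    case $suffix in
        "")    echo $(( num / 8 )) ;;                 # bare number: bits/sec
        bps)   echo "$num" ;;                         # bytes/sec
        kbps)  echo $(( num * 1024 )) ;;
        mbps)  echo $(( num * 1024 * 1024 )) ;;
        kbit)  echo $(( num * 1024 / 8 )) ;;
        mbit)  echo $(( num * 1024 * 1024 / 8 )) ;;
        *)     echo "unknown suffix: $suffix" >&2; return 1 ;;
    esac
}

rate_to_bytes 1Mbit     # 131072 bytes/sec
rate_to_bytes 128Kbps   # 131072 bytes/sec, the same rate
```

The two calls reproduce the "1Mbit == 128Kbps" equivalence stated above.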

3 Queueing disciplines

Each network device has a queuing discipline associated with it, which controls how packets enqueued on that device are treated. It can be viewed with ip command:

root@dl:# ip link show

1: lo: <LOOPBACK,UP> mtu 3924 qdisc noqueue

link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

2: eth0: <BROADCAST,MULTICAST,PROMISC,UP> mtu 1500 qdisc pfifo_fast qlen 100

link/ether 52:54:00:de:bf:19 brd ff:ff:ff:ff:ff:ff

3: tap0: <BROADCAST,MULTICAST,NOARP> mtu 1500 qdisc noop

link/ether fe:fd:00:00:00:00 brd ff:ff:ff:ff:ff:ff
Generally, a queueing discipline ("qdisc") is a black box, which is able to enqueue packets and to dequeue them (when the device is ready to send something) in an order and at times determined by the algorithm hidden in it.

By default the queueing discipline is pfifo_fast, which cannot be manipulated with tc. It is assigned to a device when the device is started or when another qdisc is deleted from the device. This qdisc has 3 bands, which are processed from band 0 to band 2; while there is a packet queued in a higher priority band (lower number), the lower priority bands are not processed.

Qdisc's are:

FIFO - simple FIFO (packet (p-FIFO) or byte (b-FIFO) )
PRIO - n-band strict priority scheduler
TBF - token bucket filter
CBQ - class based queue
CSZ - Clark-Scott-Zhang
SFQ - stochastic fair queue
RED - random early detection
GRED - generalized random early detection
TEQL - traffic equalizer
ATM - asynchronous transfer mode
DSMARK - DSCP (Diff-Serv Code Point)marker/remarker
Qdiscs are divided into two categories:

- "queues", which have no internal structure visible from outside.

- "schedulers", which split all the packets into "traffic classes", using "packet classifiers". ? i.e. qdiscs which can split packets into "traffic classes"

In turn, classes may have child qdiscs (as a rule, queues) attached to them etc. etc. etc.

note: Certain qdiscs can have children and they are classful, and others are leaves (describe it!)

classful qdiscs: CBQ, ATM, DSMARK, CSZ and the ( p-FIFO ???? or prio )

leaf qdiscs: TBF, FIFO, SFQ, RED, GRED, TEQL

note: classful qdiscs can also be leaves

The syntax for managing queuing discipline is:

Usage: tc qdisc [ add | del | replace | change | get ] dev STRING

[ handle QHANDLE ] [ root | ingress | parent CLASSID ]

[ estimator INTERVAL TIME_CONSTANT ]

[ [ QDISC_KIND ] [ help | OPTIONS ] ]

tc qdisc show [ dev STRING ] [ingress]

Where:

QDISC_KIND := { [p|b]fifo | tbf | prio | cbq | red | etc. }

OPTIONS := ... try tc qdisc add <desired QDISC_KIND> help
add
adds a qdisc to device dev
del
delete qdisc from device dev
replace
replace the qdisc with another
handle
represents the unique handle that is assigned by the user to the queuing discipline. No two queuing disciplines can have the same handle. Qdisc handles always have minor number equal to zero.
root
indicates that the queue is at the root of a link sharing hierarchy and owns all bandwidth on that device. A device can have only one root qdisc.
ingress
policing on the ingress
parent
represents the handle of the parent queuing discipline.
dev
is the network device to which we want to attach the qdisc
estimator
is used to determine if the requirements of the queue have been satisfied. The INTERVAL and the TIME_CONSTANT are two parameters that are of very high significance to the estimator. The estimator estimates the bandwidth used by each class over the appropriate time interval, to determine whether or not each class has been receiving its link sharing bandwidth.
Usage: ... estimator INTERVAL TIME-CONST

INTERVAL is interval between measurements

TIME-CONST is averaging time constant

Example: ... est 1sec 8sec
The time constant for the estimator is a critical parameter; this time constant determines the interval over which the router attempts to enforce the link-sharing guidelines.

[1] Unfortunately, rate estimation is not a very easy task. F.e. I did not find a simple way to estimate the current peak rate and even failed to formulate the problem. So I preferred not to build an estimator into the scheduler, but to run this task separately. Ideally, it should be kernel thread(s), but for now it runs from timers, which puts apparent top bounds on the number of rated flows, has minimal overhead on small, but is enough to handle controlled load service, sets of aggregates.

We measure rate over A=(1<<interval) seconds and evaluate EWMA:

avrate = avrate*(1-W) + rate*W

where W is chosen as negative power of 2: W = 2^(-ewma_log)

The resulting time constant is:

T = A/(-ln(1-W))

NOTES.

* The stored value for avbps is scaled by 2^5, so that maximal rate is 1Gbit, avpps is scaled by 2^10.

* Minimal interval is HZ/4=250msec (it is the greatest common divisor for HZ=100 and HZ=1024 8)), maximal interval is (HZ/4)*2^EST_MAX_INTERVAL = 8sec. Shorter intervals are too expensive, longer ones can be implemented at user level painlessly.
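The estimator formulas can be tried out numerically. The sketch below uses awk for the floating point work; both function names are hypothetical, and the measurement period A is passed directly in seconds rather than as the kernel's interval index.

```shell
#!/bin/sh
# ewma_time_constant A_SEC EWMA_LOG: print T = A / (-ln(1 - W)),
# with W = 2^(-EWMA_LOG). Hypothetical helper; EWMA_LOG must be >= 1.
ewma_time_constant() {
    awk -v a="$1" -v k="$2" 'BEGIN {
        w = 2 ^ (-k)
        printf "%.2f\n", a / (-log(1 - w))
    }'
}

# ewma_step AVRATE RATE EWMA_LOG: one update avrate*(1-W) + rate*W.
ewma_step() {
    awk -v av="$1" -v r="$2" -v k="$3" 'BEGIN {
        w = 2 ^ (-k)
        printf "%.2f\n", av * (1 - w) + r * w
    }'
}

ewma_time_constant 1 3    # 1-second period, W = 1/8
ewma_step 1000 2000 3     # previous average 1000, new sample 2000
```

With W = 1/8 a new sample contributes one eighth of its value, so a single step moves the average from 1000 to 1125, and the time constant comes out at a little under 8 measurement periods.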

You *have* to declare first, the CBQ qdisc, then the CBQ "parent" class, and then (optionally, I think), the CBQ "leaf " classes.

I'm not 100% sure of what I've just said. It's just how I think it works.

To stop QoS completely, use the following for eth0:

tc qdisc del dev eth0 root

3.1 Class Based Queue

In CBQ, every class has variables idle and avgidle and parameter maxidle used in computing the limit status for the class, and the parameter offtime used in determining how long to restrict throughput for overlimit classes.

idle:
The variable idle is the difference between the desired time and the measured actual time between the most recent packet transmissions for the last two packets sent from this class. When the connection is sending more than its allocated bandwidth, then idle is negative. When the connection is sending perfectly at its allotted rate, then idle is zero.
avgidle:
The variable avgidle is the average of idle, and it is computed using an exponential weighted moving average (EWMA). When avgidle is zero or lower, the class is overlimit (the class has been exceeding its allocated bandwidth in a recent short time interval).
maxidle:
The parameter maxidle gives an upper bound for avgidle. Thus maxidle limits the credit given to a class that has recently been under its allocation.
offtime:
The parameter offtime gives the time interval that an overlimit class must wait before sending another packet. This parameter determines the steady-state burst size for a class when the class is running over its limit.
minidle:
The minidle parameter gives a (negative) lower bound for avgidle. Thus, a negative minidle lets the scheduler remember that a class has recently used more than its allocated bandwidth.
Usage: ... cbq bandwidth BPS avpkt BYTES [ mpu BYTES ]

[ cell BYTES ] [ ewma LOG ]
bandwidth
represents the maximum bandwidth available to the device to which the queue is attached.
avpkt
represents the average packet size. This is used in determining the transmission time which is given as Transmission Time t = average packet size / Link Bandwidth
mpu
represents the minimum number of bytes that will be sent in a packet. Packets smaller than mpu are treated as having size mpu. This is done because for ethernet-like interfaces the minimum packet size is 64. This value is usually set to 64.
cell
represents the boundaries of the bytes in the packets that are transmitted. It is used to index into an rtab table, that maintains the packet transmission times for various packet sizes.
A CBQ class is automatically generated when a CBQ qdisc is created. ??

note: rtab is rate table?

note: mariano: should first declare a cbq "parent" class (which uses all the bandwidth) and then declare the two "leaf" classes.

CBQ is a complex qdisc, and to understand it fully it is good to read Sally Floyd and Van Jacobson's paper.

3.2 Priority

Simple priority queue

Usage: ... prio bands NUMBER priomap P1 P2...

Where:
bands
number of bands to add (default 3)
priomap
defines what the priomap looks like (defaults to the 3-band scheduler map)
So if you define more than 3 bands, make sure to re-define the priomap

In prio as long as there is data to be dequeued in the higher priority queue, prio will favor the higher queue.
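The advice about re-defining the priomap can be sketched as follows. The helper prio_cmd is hypothetical: it only prints a tc command and does not run it, and the check that every priomap entry is below the band count is a sanity check added for illustration. The device eth0, the handle 1: and the map values are assumptions.

```shell
#!/bin/sh
# prio_cmd DEV BANDS MAP...: print (not run) a tc command for a prio qdisc,
# after checking that every priomap entry names an existing band.
prio_cmd() {
    dev=$1
    bands=$2
    shift 2
    for band in "$@"; do
        if [ "$band" -ge "$bands" ]; then
            echo "priomap entry $band needs more than $bands bands" >&2
            return 1
        fi
    done
    echo "tc qdisc add dev $dev root handle 1: prio bands $bands priomap $*"
}

# Four bands: priomap entries may now point at bands 0 through 3.
prio_cmd eth0 4 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 3
```

Here the last priority is sent to the new fourth band (band 3); with the default 3-band map that band would never receive traffic.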

3.3 FIFO

A simple First-In-First-Out queue which provides basic store-and-forward capability. FIFO is the default qdisc on most real interfaces.

Usage: ... [p|b]fifo [ limit NUMBER ]
"b" stands for bytes, while "p" stands for packets.

limit
maximum length of the queue in bytes for bfifo and in packets for pfifo
This means that the maximum length of the fifo queue is measured in bytes in the first case and in number of packets in the second case.

small note: The fifo queue can be set to 0, but this still allows a single packet to be enqueued.

3.4 TBF

Token Bucket Filter is a qdisc that holds tokens and works like this: if there is a token in the bucket, a packet can be enqueued and a token is taken. The kernel puts tokens into the bucket at intervals.

Usage: ... tbf limit BYTES burst BYTES[/BYTES] rate KBPS

[ mtu BYTES[/BYTES] ] [ peakrate KBPS ] [ latency TIME ]
limit
is the number of bytes that can be queued
burst
specifies the size of a burst, i.e. how much can be sent within a given unit of time without creating scheduling concerns
rate
is used indirectly in qdiscs: in tc, rate is used to calculate the transmission time required for each packet size from mpu to mtu. Another definition: the rate option is what controls bandwidth. AFAIK `bandwidth' represents the `real' bandwidth of the device.
mtu
is maximum transfer unit
peakrate
max short term rate
latency
max latency to queuing
Jamal: TBF is influenced by quite a few parameters; peakrate, rate, MTU, burst size etc. It will do what you ask it to ;-> And at times it will let bursts flood the gate i.e you might end up sending at wire speed. What are your parameters like?
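The token bucket idea can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions, not how the kernel implements TBF: time advances in whole ticks, tokens are counted in bytes, the queue always has a packet waiting, and at most one packet leaves per tick. The name tbf_sim and all parameter values are made up.

```shell
#!/bin/sh
# tbf_sim RATE BURST PKT TICKS: each tick adds RATE tokens (bytes) to the
# bucket, capped at BURST; a waiting packet of PKT bytes is sent only when
# the bucket holds at least PKT tokens. Prints the number of packets sent.
tbf_sim() {
    rate=$1
    burst=$2
    pkt=$3
    ticks=$4
    tokens=$burst      # the bucket starts full
    sent=0
    i=0
    while [ "$i" -lt "$ticks" ]; do
        tokens=$(( tokens + rate ))
        if [ "$tokens" -gt "$burst" ]; then
            tokens=$burst
        fi
        if [ "$tokens" -ge "$pkt" ]; then
            tokens=$(( tokens - pkt ))
            sent=$(( sent + 1 ))
        fi
        i=$(( i + 1 ))
    done
    echo "$sent"
}

tbf_sim 500 1500 1000 10   # 500 bytes of tokens per tick, 1000-byte packets
```

In this run the full bucket lets packets leave on the first two ticks, after which one packet goes out every second tick, which is the configured 500 bytes per tick.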

3.5 RED

Random Early Detection discards packets even when there is space in the queue. As the queue length increases, the drop probability also increases. This approach lets the sender be notified that congestion is likely before it actually occurs.

Usage: ... red limit BYTES min BYTES max BYTES avpkt BYTES burst PACKETS

probability PROBABILITY bandwidth KBPS [ ecn ]
limit
actual physical size of the queue
min
minimum threshold in Kilobytes
max
maximum threshold in Kilobytes.
avpkt
is average packet size
burst
is burstiness (from Jamal: used to compute time constant ) ???
probability
should be random drop probability
bandwidth
should be the real bandwidth of the interface
ecn
? explicit congestion notification (flag or what)
Always make sure that min < max < limit
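How the drop probability grows with queue length can be sketched with the textbook RED rule: no drops below min, a linear ramp up to the configured probability between min and max, and every packet dropped at or above max. This is an assumption-laden illustration, not the Linux implementation; it leaves out the averaging of the queue size and the per-packet count correction of the original algorithm. The name red_drop_prob is hypothetical.

```shell
#!/bin/sh
# red_drop_prob AVG MIN MAX MAXP: drop probability for an average queue size
# AVG (bytes): 0 below MIN, a linear ramp up to MAXP between MIN and MAX,
# and 1 at or above MAX. Hypothetical helper.
red_drop_prob() {
    awk -v avg="$1" -v min="$2" -v max="$3" -v maxp="$4" 'BEGIN {
        if (avg < min)       p = 0
        else if (avg >= max) p = 1
        else                 p = maxp * (avg - min) / (max - min)
        printf "%.4f\n", p
    }'
}

red_drop_prob 5000  10000 30000 0.02   # below min
red_drop_prob 20000 10000 30000 0.02   # halfway between min and max
red_drop_prob 30000 10000 30000 0.02   # at max
```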

3.6 GRED

Generalized RED is used in DiffServ implementation and it has virtual queue (VQ) within physical queue. Currently, the number of virtual queues is limited to 16.

GRED is configured in two steps. First the generic parameters are configured to select the number of virtual queues DPs and whether to turn on the RIO-like buffer sharing scheme. Also at this point, a default virtual queue is selected.

The second step is used to set parameters for individual virtual queues.

Usage: ... gred DP drop-probability limit BYTES min BYTES max BYTES

avpkt BYTES burst PACKETS probability PROBABILITY bandwidth KBPS

[prio value]

OR ... gred setup DPs <num of DPs> default <default DP> [grio]
setup
identifies that this is a generic setup for GRED
DPs
is the number of virtual queues
default
specifies default virtual queue
grio
turns on the RIO-like buffering scheme
limit
defines the virtual queue "physical" limit in bytes
min
defines the minimum threshold value in bytes
max
defines the maximum threshold value in bytes
avpkt
is the average packet size in bytes
bandwidth
is the wire-speed of the interface
burst
is the number of average-sized packets allowed to burst
probability
defines the drop probability in the range (0...)
DP
identifies the virtual queue assigned to these parameters
drop-probability
?
prio
identifies the virtual queue priority if grio was set in general parameters

3.7 SFQ

Stochastic Fair Queue, as its name implies. It processes queues in round-robin order.

Usage: ... sfq [ perturb SECS ] [ quantum BYTES ]
perturb
is the number of seconds after which the hashing function is changed, so that hash collisions are confined to a small time interval (the perturb interval).
quantum
is DRR (Deficit Round Robin) round quantum like in CBQ.

3.8 ATM

Used to re-direct flows from the default path to ATM VCs. Each flow can have its own ATM VC, but multiple flows can also share the same VC.

Werner: ATM qdisc is different. It takes packets from some traffic stream (no matter what interface or such), and sends it over specific (and typically dedicated) ATM connections.

Werner: Then there's the case of qdiscs that don't really queue data, e.g. sch_dsmark or sch_atm.

3.9 Dsmark

The Diff-serv marker isn't really a queuing discipline. It marks packets according to specified rules. It is configured first as a qdisc and after that as a class (if it is used for classification).

Usage: dsmark indices INDICES [ default_index DEFAULT_INDEX ] [ set_tc_index ]
indices
is the size of the table of (mask,value) pairs. See below. (maybe mask value)
default_index
is used if the classifier finds no match
set_tc_index
if set, retrieves the content of the DS field and stores it in skb->tc_index
When invoked to create a class, its parameters are:

Usage: ... dsmark [ mask MASK ] [ value VALUE ]
mask
mask on DSCP (default 0xff)
value
value to OR with (default 0)
Outgoing DSCP = (Incoming DSCP AND mask) OR value

Where Incoming DSCP is the DSCP value of the original incoming packet, and Outgoing DSCP is the DSCP that the packet will be assigned as it leaves the queue.
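The marking formula is a plain bitwise expression and can be evaluated in the shell. The helper name dsmark_out and the sample values are made up for illustration.

```shell
#!/bin/sh
# dsmark_out INCOMING MASK VALUE: print (INCOMING AND MASK) OR VALUE in hex.
dsmark_out() {
    printf '0x%02x\n' $(( ($1 & $2) | $3 ))
}

dsmark_out 0x00 0xff 0x00   # defaults: the field passes through unchanged
dsmark_out 0x01 0x03 0xb8   # keep the low two bits, force the rest to 0xb8
```

With the default mask 0xff and value 0 the incoming field is left untouched; a narrower mask such as 0x03 preserves only the bits it covers and lets value set the others.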

3.10 INGRESS

if present, the ingress qdisc is invoked for each packet arriving on the respective interface

ingress is a qdisc that only classifies but doesn't queue

the usual classifiers, classifier combinations, and policing functions can be used

the classification result is stored in skb->tc_index, a la sch_dsmark

if the classification returns a "drop" result (TC_POLICE_SHOT), the packet is discarded. Otherwise, it is accepted.

Since there is no queue for implicit rate limiting (via PRIO, TBF, CBQ, etc.), rate limiting must be done explicitly via policing. This is still done exactly like policing on egress.

4 classes

mps: should I explain what is class and their intimacy with qdisc? Yes? Classes are main component of the QoS. (stupid explanation)

The syntax for creating a class is shown below:

tc class [ add | del | change | get ] dev STRING

[ classid CLASSID ] [ root | parent CLASSID ]

[ [ QDISC_KIND ] [ help | OPTIONS ] ]

tc class show [ dev STRING ] [ root | parent CLASSID ]
Where: QDISC_KIND := { prio | cbq | etc. }

OPTIONS := ... try tc class add <desired QDISC_KIND> help

The QDISC_KIND can be one of the queuing disciplines that support classes. The interpretation of the fields:

classid
represents the handle that is assigned to the class by the user. It consists of a major number and a minor number, which have been discussed already.
root
indicates that the class represents the root class in the link sharing hierarchy.
parent
indicates the handle of the parent of the queuing discipline.

4.1 CBQ

This algorithm classifies the waiting packets into a tree-like hierarchy of classes; the leaves of this tree are in turn scheduled by separate algorithms (called "disciplines" in this context).

Usage: ... cbq bandwidth BPS rate BPS maxburst PKTS [ avpkt BYTES ]

[ minburst PKTS ] [ bounded ] [ isolated ]

[ allot BYTES ] [ mpu BYTES ] [ weight RATE ]

[ prio NUMBER ] [ cell BYTES ] [ ewma LOG ]

[ estimator INTERVAL TIME_CONSTANT ]

[ split CLASSID ] [ defmap MASK/CHANGE ]
bandwidth
represents the maximum bandwidth that is available to the queuing discipline owned by this class. It is only used as helper value to compute min/max idle values from maxburst and avpkt.
rate
represents the bandwidth that is allocated to this class. rate should be set to the desired bandwidth (you want) to allocate to a given traffic class. The kernel does not use this directly. It uses pre-calculated rate translation tables. It is used to compute overlimit status of class.
maxburst
represents the number of bytes that will be sent in the longest possible burst.
avpkt
represents the average number of bytes in a packet belonging to this class.
minburst
represents the number of bytes that will be sent in the shortest possible burst.
bounded
indicates that the class cannot borrow unused bandwidth from its ancestors. If this is not specified, then the class can borrow unused bandwidth from the parent (default off).
isolated
indicates that the class will not share bandwidth with any of non-descendant classes
allot
allot is MTU + MAC header
mpu
is explained at page
weight
should be made proportional to the rate.(explain CBQ is implemented using Weighted Round Robin algorithm)
prio
represents the priority that is assigned to this class. priority of value 0 is highest (most important) and value 7 is lowest.
cell
represents the boundaries of the bytes in the packets that are transmitted. It is used to index into an rtab table, that maintains the packet transmission times for various packet sizes.
ewma
is explained at page
estimator
is explained at page
split
field is used for fast access. This is normally the root of the CBQ tree. It can be set to any node in the hierarchy thereby enabling the use of a simple and fast classifier, which is configured only for a limited set of keys to point to this node. Only classes with split node set to this node will be matched. The type of service (TOS in the IP header) and sk->priority is not used for this purpose.
defmap
says that best effort traffic not classified by other means will fall to this class. defmap is a bitmap of logical priorities served by this class
A note about CBQ class setup:

cbq class has fifo qdisc attached by default

You *have* to declare first, the CBQ qdisc, then the CBQ "parent" class, and then (optionally, I think), the CBQ "leaf " classes. I'm not 100% sure of what I've just said. It's just how I think it works.
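The declaration order described above (qdisc, then the "parent" class, then the "leaf" classes) might look like the following sketch. It is a dry run: the hypothetical wrapper run only prints each command unless APPLY=1 is set, because changing qdiscs requires root. The interface name, handles, rates and other values are illustrative assumptions, not recommendations.

```shell
#!/bin/sh
# run CMD...: print the command; execute it only when APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# cbq_setup DEV: qdisc first, then the parent class, then two leaf classes.
cbq_setup() {
    dev=$1
    run tc qdisc add dev "$dev" root handle 1:0 cbq bandwidth 100Mbit avpkt 1000 cell 8
    run tc class add dev "$dev" parent 1:0 classid 1:1 cbq bandwidth 100Mbit \
        rate 6Mbit weight 0.6Mbit prio 8 allot 1514 cell 8 maxburst 20 avpkt 1000 bounded
    run tc class add dev "$dev" parent 1:1 classid 1:3 cbq bandwidth 100Mbit \
        rate 5Mbit weight 0.5Mbit prio 5 allot 1514 cell 8 maxburst 20 avpkt 1000
    run tc class add dev "$dev" parent 1:1 classid 1:4 cbq bandwidth 100Mbit \
        rate 3Mbit weight 0.3Mbit prio 5 allot 1514 cell 8 maxburst 20 avpkt 1000
}

cbq_setup eth0
```

Each leaf names the parent class 1:1 as its parent, and the parent in turn hangs off the qdisc handle 1:0, which is why the three steps have to come in this order.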

5 filters (or classifier)

Filters are used to classify (map) packets to certain classes based on properties of the packet, e.g. the TOS byte in the IP header, IP addresses, port numbers etc. Queuing disciplines use filters to assign incoming packets to one of their classes. Filters can be maintained per class or per queuing discipline, depending on the design of the queuing discipline. Filters are maintained in filter lists. Filter lists are ordered by priority, in ascending order. Also, the entries are keyed by the protocol to which they apply, e.g., IP, UDP etc. Filters for the same protocol on the same filter list must have different priority values.

Filters vary in scope

Filters have meters associated with them (TB+rate estimator)

Usage: tc filter [ add | del | change | get ] dev STRING

[ pref PRIO ] [ protocol PROTO ]

[ estimator INTERVAL TIME_CONSTANT ]

[ root | classid CLASSID ] [ handle FILTERID ]

[ [ FILTER_TYPE ] [ help | OPTIONS ] ]
or

tc filter show [ dev STRING ] [ root | parent CLASSID ]

Where:

FILTER_TYPE := { rsvp | u32 | fw | route | etc. }

FILTERID := ... format depends on classifier, see there

OPTIONS := ... try tc filter add <desired FILTER_KIND> help
The interpretation of the fields:

pref
represents the priority that is assigned to the filter.
protocol
is used by the filter to identify packets belonging only to that protocol. As already mentioned, no two filters can have the same priority and protocol field.
root
indicates that the filter is at the root of the link sharing hierarchy.
classid
represents the handle of the class to which the filter is applied.
handle
represents the handle by which the filter is identified uniquely. The format of the filter is different for different classifiers.
estimator
is explained at page

5.1 filter rsvp

Use RSVP protocol for classification

Usage: ... rsvp ipproto PROTOCOL session DST[/PORT | GPI ]

[ sender SRC[/PORT | GPI ]

[ classid CLASSID ] [ police POLICE_SPEC ]

[ tunnelid ID ] [ tunnel ID skip NUMBER ]

Where:

GPI := { flowlabel NUMBER | spi/ah SPI | spi/esp SPI |

u{8|16|32} NUMBER mask MASK at OFFSET}

POLICE_SPEC := ... look at TBF

FILTERID := X:Y
Compared to the general packet classification problem, RSVP needs only several relatively simple rules:

(dst, protocol) are always specified, so that we are able to hash them.

ipproto
is one of the IP protocol (TCP, UDP and maybe other)
session
is destination (address?) with or without port, or gpi (Generalized Port Identifier)
src
may be exact, or may be wildcard, so that we can keep a hash table plus one wildcard entry.
source
port (or flow label) is important only if src is given.
police
specification is explained on the page , and it should be, but tc gives (with help command) reference to TBF?
The rsvp filter is used to distinguish an application session (dst port, dst ip address). In a DiffServ edge router it can be used to mark packets of specific applications in order to be classified in the appropriate PHB.

Alexey: IMPLEMENTATION.

We use a two level hash table: The top level is keyed by destination address and protocol ID, every bucket contains a list of "rsvp sessions", identified by destination address, protocol and DPI(="Destination Port ID"): triple (key, mask, offset).

Every bucket has a smaller hash table keyed by source address (cf. RSVP flowspec) and one wildcard entry for wildcard reservations. Every bucket is again a list of "RSVP flows", selected by source address and SPI(="Source Port ID" here rather than "security parameter index"): triple (key, mask, offset).

All the packets with IPv6 extension headers (but AH and ESP) and all fragmented packets go to the best-effort traffic class.

Two "port id"'s seems to be redundant, rfc2207 requires only one "Generalized Port Identifier". So that for classic ah, esp (and udp,tcp) both *pi should coincide or one of them should be wildcard.

At first sight, this redundancy is just a waste of CPU resources. But DPI and SPI add the possibility to assign different priorities to GPIs. Look also at note 4 about tunnels below.

One complication is the case of tunneled packets. We implement it as following: if the first lookup matches a special session with "tunnelhdr" value not zero, flowid doesn't contain the true flow ID, but the tunnel ID (1...255). In this case, we pull tunnelhdr bytes and restart lookup with tunnel ID added to the list of keys. Simple and stupid 8)8) It's enough for PIMREG and IPIP.

Two GPIs make it possible to parse even GRE packets. F.e. DPI can select ETH_P_IP (and necessary flags to make tunnelhdr correct) in GRE protocol field and SPI matches GRE key. Is it not nice? 8)8)

Well, as result, despite its simplicity, we get a pretty powerful classification engine.

Panagiotis Stathopoulos: Well an rsvp filter is used to distinguish an application session (dst port dst ip address). In a DiffServ edge router it can be used to mark packets of specific applications in order to be classified in the appropriate PHB.

note: I have to read more about RSVP

5.2 filter u32

Anything in the header can be used for classification

The U32 filter is the most advanced filter available in the current implementation. It is entirely based on hash tables, which makes it robust when there are many filter rules.

Usage: ... u32 [ match SELECTOR ... ] [ link HTID ] [ classid CLASSID ]

[ police POLICE_SPEC ] [ offset OFFSET_SPEC ]

[ ht HTID ] [ hashkey HASHKEY_SPEC ]

[ sample SAMPLE ]

or u32 divisor DIVISOR

Where: SELECTOR := SAMPLE SAMPLE ...

SAMPLE := { ip | ip6 | udp | tcp | icmp | u{32|16|8} } SAMPLE_ARGS FILTERID := X:Y:Z

match
SELECTOR contains definition of the pattern, that will be matched to the currently processed packet. Precisely, it defines which bits are to be matched in the packet header and nothing more, but this simple method is very powerful.
link

classid

police

offset

ht
is hash table
hashkey
is the key to hash table
sample
is a protocol such as IP or a higher layer protocol such as UDP, TCP or ICMP. sample can also be one of the keywords u32, u16 or u8, which specify the length of the pattern in bits. PATTERN and MASK should follow, of the length defined by the previous keyword. The OFFSET parameter is the offset, in bytes, at which to start matching. If the nexthdr+ keyword is given, the offset is relative to the start of the upper layer header.
police
specification is explained on the page
The syntax here is match ip <item> <value> <mask>

So match ip protocol 6 0xff matches protocol 6, TCP. (See /etc/protocols) match ip dport 0x17 0xffff is TELNET (/etc/services). Note that the number is hexadecimal, not decimal.
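Since the examples write port numbers in hexadecimal, a small helper can do the conversion from the decimal numbers found in /etc/services. The function name u32_dport_match is hypothetical; it only prints the selector text and does not call tc.

```shell
#!/bin/sh
# u32_dport_match PORT: print a u32 selector for a decimal destination port,
# written in hexadecimal as in the telnet example. Hypothetical helper.
u32_dport_match() {
    printf 'match ip dport 0x%x 0xffff\n' "$1"
}

u32_dport_match 23   # telnet
u32_dport_match 80   # http
```

The first call reproduces the telnet selector quoted above (23 decimal is 0x17).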

note: (mps) ht - hash table HTID Hash Table ID is fh - filter handle in filter show

The filters are packed to hash tables of key nodes with a set of 32bit key/mask pairs at every node. Nodes reference next level hash tables etc.

It seems that it represents the best middle point between speed and manageability both by human and by machine.

It is especially useful for link sharing combined with QoS; pure RSVP doesn't need such a general approach and can use much simpler (and faster) schemes.

5.3 filter fw

Classifier mapping ipchains' fwmark to traffic class

Usage: ... fw [ classid CLASSID ] [ police POLICE_SPEC ]

POLICE_SPEC := ... look at TBF

CLASSID := X:Y
classid
is class handle
police
specification is explained on the page , and it should be, but tc gives (with help command) reference to TBF?

5.4 filter route

Use routing table decisions for classification

Usage: ... route [ from REALM | fromif TAG ] [ to REALM ]

[ flowid CLASSID ] [ police POLICE_SPEC ]

POLICE_SPEC := ... look at TBF

CLASSID := X:Y
from
REALM is realm in ip route table
fromif
TAG is interface tag
to
REALM is (again) ip route table realm
flowid
CLASSID is class to which packet (if passed) is
police
specification is explained on the page , and it should be, but tc gives (with help command) reference to TBF?
For now we assume that route tags < 256. It allows to use direct table lookups, instead of hash tables.
For now we assume that "from TAG" and "fromdev DEV" statements are mutually exclusive.
"to TAG from ANY" has higher priority, than "to ANY from XXX"

5.5 tcindex

Use tc_index internal tag in skb to select classes.

Usage: ... tcindex [ hash SIZE ] [ mask MASK ] [ shift SHIFT ] [ pass_on | fall_through ] [ classid CLASSID ] [ police POLICE_SPEC ]
hash
is the size of the lookup table
mask
is the bit mask (this explanation is worthless)
shift
the mask right by SHIFT number
pass_on
defines that this packet will pass
fall_through

classid
is the class to which filter is attached
police
specification is explained on the page
note: key = (skb->tc_index >> shift) & mask

6 police

The purpose of policing is to ensure that traffic does not exceed certain bounds. For simplicity, we will assume a broad definition of policing and consider it to comprise all kinds of traffic control actions that depend in some way on the traffic volume.

We consider four types of policing mechanisms:

policing decisions by filters
refusal to enqueue a packet
dropping of a packet from an "inner" queueing discipline
dropping of packet when enqueuing a new one
Usage: ... police rate BPS burst BYTES[/BYTES] [ mtu BYTES[/BYTES] ]

[ peakrate BPS ] [ avrate BPS ] [ ACTION ]

Where: ACTION := reclassify | drop | continue
rate
is the long-term rate attached to the meter
peakrate
this is the peakrate a flow is allowed to burst in the short-term. Basically this upper-bounds the rate.
mtu
a packet exceeding this size will be dropped. The default value is 2KB. This is fine with ethernet whose MTU is 1.5KB but will not be fine with Gigabit ethernet exploiting Jumbo frames for example. It also will not be valid for the lo device whose MTU is defined by amongst other things how much RAM you have. You must set this value if you have exceptions to the rule.
ACTION
exceed/non-exceed: This defines which action is taken when a flow exceeds its allocation and when it does not. The actions are:
pass
(?)
reclassify
used by CBQ to go to BE (Best Effort, ask Jamal?)
drop
simply drops packet
continue
- lookup the next filter rule with lower priority
note: "drop" is only recognized by the following qdiscs: atm, cbq, dsmark, and (ingress - really?). In particular, prio ignores it.

Bibliography

1
A. N. Kuznetsov, docs from iproute2

2
Werner Almesberger, Linux Network Traffic Control - Implementation Overview

3
Jamal Hadi Salim, IP Quality of Service on Linux http://????

4
Saravanan Radhakrishnan, Linux - Advanced Networking Overview http://qos.ittc.ukans.edu/howto/howto.html

5
Almesberger, Jamal Hadi Salim, Alexey Kuznetsov - Differentiated Services on Linux

6
linux-diffserv mailing list linux-diffserv@lrc.di.epfl.ch

7
Sally Floyd, Van Jacobson - Link-sharing and Resource Management Models for Packet Networks

8
Sally Floyd, Van Jacobson - Random Early Detection Gateways for Congestion Avoidance

9
Related Cisco documents from http://www.cisco.com/

10
Lixia Zhang, Steve Deering, Deborah Estrin, Scott Shenker, Daniel Zappala - RSVP: A New Resource ReSerVation Protocol

11
Related RFC's

12
and many others

Appendix

note: flowid is sometimes class handle sometimes something else

mariano - good setup for me: If you remove the router and then the modem line becomes ppp0 (instead of eth0), you should declare that ppp0 has "bandwidth 30K". Then, the classes should use "bandwidth 30K rate 20K" and "bandwidth 30K rate 10K"

2007/11/29

Limit Bandwidth with bwmod for apache (Per Vhost/Directory)

Limit Bandwidth (Per Vhost/Directory)

The main goal is to be able to "assign" a maximum (or fixed) bandwidth to a vhost.
This is achieved by inserting small delays while sending the data, thus limiting the top speed a client can use. For example, if we assign 100 kb/s to a vhost, the first user will be able to download at 100 kb/s. If another user starts downloading, each will get at most 50 kb/s; with a third, 33 kb/s each, and so on.
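
The sharing arithmetic can be sketched in a few lines of shell. This assumes the limit is split evenly between concurrent clients, as the description above suggests; per_client_rate is a hypothetical helper for illustration and is not part of the Apache module.

```shell
#!/bin/bash
# per_client_rate LIMIT CLIENTS: even split of a vhost limit (integer kb/s).
# Hypothetical helper; it only illustrates the arithmetic.
per_client_rate() {
    local limit="$1" clients="$2"
    echo $(( limit / clients ))
}

for n in 1 2 3; do
    echo "${n} client(s): $(per_client_rate 100 "$n") kb/s each"
done
```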

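Japanese translation of the iptables tutorial

Iptables Tutorial 1.2.0
Oskar Andreasson

Tatsuya Nonogaki - Japanese translation
http://www.asahi-net.or.jp/~aa4t-nngk/
Japanese translation v.1.1.1

Copyright © 2001-2005 Oskar Andreasson

Copyright © 2005-2006 Tatsuya Nonogaki

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1, published by the Free Software Foundation; the Introduction and its subsections are Invariant Sections, the Front-Cover Text is "Original Author: Oskar Andreasson", and there are no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".

All scripts in this tutorial are free software. You can redistribute and/or modify them under the terms of the GNU General Public License, version 2, as published by the Free Software Foundation.

These scripts are distributed in the hope that they will be useful, but *WITHOUT ANY WARRANTY*; without even the implied warranty of merchantability or fitness for a particular purpose. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this tutorial. If not, write to the Free Software Foundation (the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA).

Dedications

I dedicate this document to my wonderful sister. She has supported me and inspired me. She is the source of my happiness and a ray of light. Thank you!

I would also like to dedicate this work to the Linux developers and maintainers who do incredibly hard work, the people who bring this wonderful operating system to the world.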
Table of Contents
About the author
How to read
Prerequisites
Conventions used in this document
<Conventions used in this Japanese translation>
1. Introduction
1.1. Why this document was written
1.2. How it was written
1.3. Terms used in this document
2. A review of TCP/IP
2.1. TCP/IP layers
2.2. IP characteristics
2.3. IP headers
2.4. TCP characteristics
2.5. TCP headers
2.6. UDP characteristics
2.7. UDP headers
2.8. ICMP characteristics
2.9. ICMP headers
2.9.1. ICMP Echo Request/Reply
2.9.2. ICMP Destination Unreachable
2.9.3. Source Quench
2.9.4. Redirect
2.9.5. TTL equals 0
2.9.6. Parameter problem
2.9.7. Timestamp request/reply
2.9.8. Information request/reply
2.10. TCP/IP destination-driven routing
2.11. Summary
3. What is IP filtering
3.1. What is an IP filter
3.2. IP filtering terms and expressions
3.3. How to plan an IP filter
3.4. Summary
4. What is network address translation
4.1. What NAT is used for, and basic terms
4.2. Caveats when using NAT
4.3. Example NAT machine for understanding the concept
4.3.1. What is needed to build a NAT machine
4.3.2. Placement of the NAT machine
4.3.3. How to place proxies
4.3.4. The final stage of building the NAT machine
4.4. Summary
5. Preparations
5.1. Where to get iptables
5.2. Kernel setup
5.3. User-land setup
5.3.1. Compiling the user-land applications
5.3.2. Installation on Red Hat 7.1
6. Traversing of tables and chains
6.1. General
6.2. mangle table
6.3. nat table
6.4. filter table
7. The state machine
7.1. Introduction
7.2. conntrack entries
7.3. User-land states
7.4. TCP connections
7.5. UDP connections
7.6. ICMP connections
7.7. Default connections
7.8. Complex protocols and connection tracking
8. Saving and restoring large rule-sets
8.1. Speed considerations
8.2. Drawbacks of restore
8.3. iptables-save
8.4. iptables-restore
9. How a rule is built
9.1. Basics of the iptables command
9.2. Tables
9.3. Commands
10. Iptables matches
10.1. Generic matches
10.2. Implicit matches
10.2.1. TCP matches
10.2.2. UDP matches
10.2.3. ICMP matches
10.3. Explicit matches
10.3.1. AH/ESP match
10.3.2. Conntrack match
10.3.3. DSCP match
10.3.4. ECN match
10.3.5. Helper match
10.3.6. IP range match
10.3.7. Length match
10.3.8. Limit match
10.3.9. MAC match
10.3.10. Mark match
10.3.11. Multiport match
10.3.12. Owner match
10.3.13. Packet type match
10.3.14. Recent match
10.3.15. State match
10.3.16. TCPMSS match
10.3.17. TOS match
10.3.18. TTL match
10.3.19. Unclean match
11. Iptables targets and jumps
11.1. ACCEPT target
11.2. CLASSIFY target
11.3. DNAT target
11.4. DROP target
11.5. DSCP target
11.6. ECN target
11.7. LOG target
11.8. MARK target
11.9. MASQUERADE target
11.10. MIRROR target
11.11. NETMAP target
11.12. QUEUE target
11.13. REDIRECT target
11.14. REJECT target
11.15. RETURN target
11.16. SAME target
11.17. SNAT target
11.18. TCPMSS target
11.19. TOS target
11.20. TTL target
11.21. ULOG target
12. Debugging your scripts
12.1. Debugging, a necessity
12.2. Bash debugging techniques
12.3. System tools useful for debugging
12.4. Iptables debugging
12.5. Other debugging tools
12.5.1. Nmap
12.5.2. Nessus
12.6. Summary
13. The rc.firewall file
13.1. Example rc.firewall
13.2. Explanation of rc.firewall
13.2.1. Configuration options
13.2.2. Initial loading of extra modules
13.2.3. proc setup
13.2.4. Placement of rules in the different chains
13.2.5. Setting up default policies
13.2.6. Creating user-defined chains in the filter table
13.2.7. INPUT chain
13.2.8. FORWARD chain
13.2.9. OUTPUT chain
13.2.10. PREROUTING chain of the nat table
13.2.11. Starting SNAT and the POSTROUTING chain
14. Example scripts
14.1. Structure of the rc.firewall.txt script
14.1.1. The structure
14.2. rc.firewall.txt
14.3. rc.DMZ.firewall.txt
14.4. rc.DHCP.firewall.txt
14.5. rc.UTIN.firewall.txt
14.6. rc.test-iptables.txt
14.7. rc.flush-iptables.txt
14.8. Limit-match.txt
14.9. Pid-owner.txt
14.10. Recent-match.txt
14.11. Sid-owner.txt
14.12. Ttl-inc.txt
14.13. Iptables-save ruleset
15. Graphical user interfaces for iptables/netfilter
15.1. fwbuilder
15.2. Turtle Firewall Project
15.3. Integrated Secure Communications System
15.4. IPMenu
15.5. Easy Firewall Generator
15.6. Summary
A. Detailed explanations of special commands
A.1. Listing your active rule-set
A.2. Updating and flushing your tables
B. Common problems and questions
B.1. Problems loading modules
B.2. State NEW packets with no SYN bit set
B.3. SYN/ACK and NEW packets
B.4. Internet Service Providers that use reserved IP addresses
B.5. Letting DHCP requests through iptables
B.6. mIRC DCC problems
C. ICMP types
D. TCP options
E. Other resources and links
F. Acknowledgments
G. History
H. GNU Free Documentation License
0. PREAMBLE
1. APPLICABILITY AND DEFINITIONS
2. VERBATIM COPYING
3. COPYING IN QUANTITY
4. MODIFICATIONS
5. COMBINING DOCUMENTS
6. COLLECTIONS OF DOCUMENTS
7. AGGREGATION WITH INDEPENDENT WORKS
8. TRANSLATION
9. TERMINATION
10. FUTURE REVISIONS OF THIS LICENSE
How to use this License for your documents
I. GNU General Public License
0. Preamble
1. TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
2. How to Apply These Terms to Your New Programs
J. Example scripts code-base
J.1. Example rc.firewall script
J.2. Example rc.DMZ.firewall script
J.3. Example rc.UTIN.firewall script
J.4. Example rc.DHCP.firewall script
J.5. Example rc.flush-iptables script
J.6. Example rc.test-iptables script
List of Tables
6-1. Packets destined for the local host (our own machine)
6-2. Packets with the local host (our own machine) as source
6-3. Forwarded packets
7-1. User-land states
7-2. Internal states
7-3. Supported complex protocols
9-1. Tables
9-2. Commands
9-3. Options
10-1. Generic matches
10-2. TCP matches
10-3. UDP matches
10-4. ICMP matches
10-5. AH match options
10-6. ESP match options
10-7. Conntrack match options
10-8. DSCP match options
10-9. ECN match options
10-10. ECN field in IP
10-11. Helper match options
10-12. IP range match options
10-13. Length match options
10-14. Limit match options
10-15. MAC match options
10-16. Mark match options
10-17. Multiport match options
10-18. Owner match options
10-19. Packet type match options
10-20. Recent match options
10-21. State matches
10-22. TCPMSS match options
10-23. TOS matches
10-24. TTL matches
11-1. CLASSIFY target options
11-2. DNAT target
11-3. DSCP target options
11-4. ECN target options
11-5. LOG target options
11-6. MARK target options
11-7. MASQUERADE target
11-8. NETMAP target options
11-9. REDIRECT target
11-10. REJECT target
11-11. SAME target options
11-12. SNAT target options
11-13. TCPMSS target options
11-14. TOS target
11-15. TTL target
11-16. ULOG target
C-1. ICMP types
D-1. TCP options

Firewall Builder

Object-oriented GUI and set of compilers for various firewall platforms. Compilers are currently implemented for iptables, ipfilter, OpenBSD pf, ipfw, the Cisco PIX firewall, and router access lists.

2007/11/28

Dynamic iptables firewalls Flexible (and fun) network security

Common threads --
Dynamic iptables firewalls
Flexible (and fun) network security

Japanese
http://www.ibm.com/developerworks/jp/linux/library/l-fw-index/
Chinese
http://www.ibm.com/developerworks/cn/linux/network/dif/index.html

Level: Intermediate

Daniel Robbins (drobbins@gentoo.org), President/CEO, Gentoo Technologies, Inc.

01 Apr 2001
Firewalls are good and fun, but what do you do when you need to make rapid, complex changes to your firewall rules? Easy. Use Daniel Robbins' dynamic firewall scripts that are demonstrated in this article. You can use these scripts to increase your network security and responsiveness, and to inspire your own creative designs.

The best way to see the benefits of dynamic firewall scripts is to see them in action. To do this, let's imagine that I'm a sysadmin at an ISP, and I've recently set up a Linux-based firewall to protect my customers and internal systems from malicious users on the Internet. To do this, my firewall uses the new Linux 2.4 iptables stateful functionality to allow new outgoing connections to be established by my customers and servers, and of course to allow new incoming connections, but only to "public" services, such as web, ftp, ssh, and SMTP. Since I used a deny-by-default design, any from-Internet connections to non-public services, such as the squid proxy cache or Samba server, are automatically rejected. So far, I have a pretty decent firewall that offers a good level of protection for everyone at my ISP.

For the first week or so, the firewall works great, but then something ugly happens: Bob, my arch-nemesis (who works at a competing ISP), decides that he wants to flood my network with packets in an attempt to deny service to my customers. Unfortunately, Bob has carefully studied my firewall and knows that while I'm protecting many internal services, ports 25 and 80 must be publicly accessible so that I can receive mail and serve HTTP requests. Bob decides to take advantage of this fact by launching a bandwidth-sucking attack against my web and mail server.

About a minute or so after Bob begins his attack, I notice that my uplinks start becoming saturated with packets. After taking a look at the situation with tcpdump, I determine that this is yet another Bob attack, and I figure out what IP addresses he's using to launch it. Now that I have this information, all that I need to do is block these IP addresses, and that should solve the problem -- a simple solution, or so I think.

Responding to an attack

I quickly load my firewall setup script into vi and begin hacking away at my iptables rules, modifying my firewall so that it'll block those evil incoming Bob packets. After a minute or so, I find the exact place to make the appropriate DROP rule additions, and I add them. Then, I start and stop the firewall...ooops, made a bit of a mistake when I added the rules. I load up the firewall scripts again, fix the problem, and thirty seconds later the firewall has been tweaked to block Bob's attack of the month. At first, it seems like I successfully thwarted the attack...until the helpdesk phones begin ringing. Apparently, Bob was able to disrupt my network for about 10 minutes, and now my customers are calling to find out what's going on. Even worse, after a few minutes pass, I notice that our uplinks again start to become saturated. This time, Bob appears to be using a brand-new set of IP addresses for his attacks. In response, I begin feverishly hacking away at our firewall scripts, except this time, I'm a bit panicky -- maybe my solution isn't so good after all.

Here's what went wrong in the above scenario. Although I had a decent firewall in place and also quickly identified the cause of the network problem, I was unable to modify the behavior of my firewall to respond to the threat in time. Of course, when your network is under attack, you want to be able to respond immediately, and being forced to hack away at your master firewall setup script in a panicked state is not only stressful, but also very inefficient.

ipdrop

It would be far better if I had a special "ipdrop" script that's specifically designed to insert just the rules needed to block the IP address that I specify. With such a script, blocking an IP address is no longer a two-minute ordeal; instead, it takes five seconds. And since the script shields me from the task of editing firewall rules by hand, it eliminates a major source of errors. All that's left for me to do is to determine the IP address that I'd like to block, and then type:

# ipdrop 129.24.8.1 on
IP 129.24.8.1 drop on.

Immediately, the ipdrop script would block 129.24.8.1, Bob's current evil IP address of the week. This script dramatically improves your defenses, because now an IP block is a no-brainer. Now, let's take a look at my implementation of the ipdrop script:

#!/bin/bash

source /usr/local/share/dynfw.sh

args 2 $# "${0} IPADDR {on/off}" "Drops packets to/from IPADDR. Good for obnoxious networks/hosts/DoS"

if [ "$2" == "on" ]
then
    # rules will be appended or inserted as normal
    APPEND="-A"
    INSERT="-I"
    rec_check ipdrop "$1" "$1 already blocked" on
    record ipdrop "$1"
elif [ "$2" == "off" ]
then
    # rules will be deleted instead
    APPEND="-D"
    INSERT="-D"
    rec_check ipdrop "$1" "$1 not currently blocked" off
    unrecord ipdrop "$1"
else
    echo "Error: \"off\" or \"on\" expected as second argument"
    exit 1
fi

# block outside IP address that's causing problems
# attacker's incoming TCP connections will take a minute or so to time out,
# reducing DoS effectiveness.

iptables $INSERT INPUT -s "$1" -j DROP
iptables $INSERT OUTPUT -d "$1" -j DROP
iptables $INSERT FORWARD -d "$1" -j DROP
iptables $INSERT FORWARD -s "$1" -j DROP

echo "IP ${1} drop ${2}."

ipdrop: the explanation

If you take a look at the last four iptables lines, you'll see the actual commands that insert the appropriate rules into the firewall tables. As you can see, the definition of the $INSERT shell variable varies, depending on whether we're running in "on" or "off" mode. When the iptables lines execute, the particular rules will be inserted or deleted appropriately.

Now, let's look at the function of the rules themselves, which should work perfectly with any type of existing firewall, or even on a system with no firewall; all you need is iptables support built-in to your 2.4 kernel. We block incoming packets arriving from the evil IP (first iptables line), block outgoing packets headed for the evil IP (next iptables line), and then turn off forwarding in either direction for this particular IP (last two iptables lines.) Once these rules are in place, your system will simply discard any packets that fall into one of these categories.

Another quick note: you'll also notice calls to "rec_check", "unrecord", "record", and "args". These are special helper bash functions defined in "dynfw.sh". The "record" function records the blocked ip in the /root/.dynfw-ipdrop file, while the "unrecord" removes the entry from /root/.dynfw-ipdrop. The "rec_check" function is used to abort the script with an error message if you attempt to re-block an already-blocked IP, or unblock an IP that isn't currently being blocked. The "args" function takes care of making sure that we receive the correct number of command-line arguments, and also handles printing helpful usage information. I've created a dynfw-1.0.tar.gz that contains all these tools; see the Resources section at the end of this article for more information.
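
To make the bookkeeping concrete, here is a minimal sketch of how helpers with this behaviour could be written. It is not the code shipped in dynfw.sh: the function bodies, the is_recorded helper and the DYNFW_DIR variable are assumptions made for illustration (the article stores its record files under /root).

```shell
#!/bin/bash
# Hypothetical re-implementation of the bookkeeping helpers described above.
# DYNFW_DIR is an assumption of this sketch; the article uses /root.
DYNFW_DIR="${DYNFW_DIR:-/root}"

# record NAME KEY: append KEY to the record file of script NAME.
record() {
    echo "$2" >> "${DYNFW_DIR}/.dynfw-$1"
}

# is_recorded NAME KEY: succeed if KEY is an exact line in the record file.
is_recorded() {
    grep -Fxq -- "$2" "${DYNFW_DIR}/.dynfw-$1" 2>/dev/null
}

# unrecord NAME KEY: drop every exact KEY line from the record file.
unrecord() {
    local file="${DYNFW_DIR}/.dynfw-$1"
    [ -f "$file" ] || return 0
    grep -Fxv -- "$2" "$file" > "${file}.tmp" || true
    mv "${file}.tmp" "$file"
}

# rec_check NAME KEY MESSAGE MODE: abort when the request makes no sense,
# i.e. "on" for a recorded key or "off" for a key that is not recorded.
rec_check() {
    if [ "$4" = "on" ] && is_recorded "$1" "$2"; then
        echo "Error: $3" >&2
        exit 1
    elif [ "$4" = "off" ] && ! is_recorded "$1" "$2"; then
        echo "Error: $3" >&2
        exit 1
    fi
}
```

Keeping the state in a plain text file, one entry per line, means it can be inspected or repaired with ordinary tools when a script is interrupted halfway.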

tcplimit

This next dynamic firewall script is useful if you need to limit the usage of a particular TCP-based network service, possibly something that generates a heavy CPU load on your end. Called "tcplimit", this script takes a TCP port, a rate, a scale, and "on" or "off" as arguments:

# tcplimit 873 5 minute on
Port 873 new connection limit (5/minute, burst=5) on.

tcplimit uses the new iptables "state" module (make sure you've enabled this in your kernel or loaded the module) to allow only a certain number of new, incoming connections in a specific period of time. In this example, the firewall will allow only five new connections to my rsync server (port 873) per minute -- and it's possible to specify the desired number of connections you'd like per second/minute/hour or day, as needed.

tcplimit offers a good way of limiting non-essential services -- so that a flood of traffic to a non-essential service doesn't disrupt your network or server. In my case, I use tcplimit to set a maximum upper bound for rsync usage to prevent my DSL line from becoming saturated by too many rsync connections. Connection-limited services are recorded in /root/.dynfw-tcplimit, and if I ever want to turn the new connection limiting off, I can simply type:

# tcplimit 873 5 minute off
Port 873 new connection limit off.

tcplimit works by creating a completely new chain in the "filter" table. This new chain will reject all packets that exceed our specified limit. Then, a single rule is inserted into the INPUT chain that redirects all incoming NEW connection packets headed to the target port (873 in this case) to this special chain, effectively placing a limit on new, incoming connections while not affecting packets that are part of an established connection.

When tcplimit is turned off, the INPUT rule and special chain are deleted. This is the kind of fancy stuff that really highlights the importance of having a well-tested, reliable script manage the firewall rules for you. As with ipdrop, the tcplimit script should be compatible with any type of firewall, or even no firewall, as long as you have the proper iptables functionality enabled in your kernel.
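
The chain-and-jump arrangement described above can be sketched as a dry run that only prints the iptables commands. This is a guess at the shape of the rules, not the actual tcplimit source: the function name tcplimit_cmds, the chain name tcplimit-PORT and the choice of RETURN for packets within the limit are all assumptions.

```shell
#!/bin/bash
# Dry-run sketch of the tcplimit idea: print iptables commands, do not run them.
# tcplimit_cmds and the chain name "tcplimit-PORT" are hypothetical.
tcplimit_cmds() {
    local port="$1" rate="$2" scale="$3" mode="$4"
    local chain="tcplimit-${port}"
    # Rule that sends NEW connection packets for the port to the special chain.
    local jump="INPUT -p tcp --dport ${port} -m state --state NEW -j ${chain}"
    if [ "$mode" = "on" ]; then
        echo "iptables -N ${chain}"
        # Packets within the limit return to INPUT for normal processing.
        echo "iptables -A ${chain} -m limit --limit ${rate}/${scale} --limit-burst ${rate} -j RETURN"
        # Everything over the limit is rejected.
        echo "iptables -A ${chain} -j REJECT"
        echo "iptables -I ${jump}"
    else
        # Remove the jump first, then flush and delete the chain.
        echo "iptables -D ${jump}"
        echo "iptables -F ${chain}"
        echo "iptables -X ${chain}"
    fi
}

tcplimit_cmds 873 5 minute on
```

Printing the commands first makes it easy to check the rule order before anything touches the running firewall.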

host-tcplimit

host-tcplimit is a lot like tcplimit, but it limits new TCP connections coming in from a particular IP address and heading for a particular TCP port on your server(s). host-tcplimit is particularly useful for preventing a particular person from abusing your network resources. For example, let's say you're running a CVS server, and you discover that a particular new developer appears to have set up a script that updates his sources with the repository every 10 minutes, using up a huge amount of unnecessary network resources over the course of a day. However, while you're in the process of composing an e-mail to him explaining the error of his ways, you receive an incoming message that reads as follows:

Hi guys!

I'm really excited to be part of your development project. I just set up a
script to update my local copy of the code every ten minutes. I'm about to
leave on a two-week cruise, but when I get back, my sources will be totally
up-to-date and I'll be ready to help out! I'm heading out the door now...see
you in two weeks!

Sincerely,

Mr. Newbie

For such situations, a simple host-tcplimit command will solve the problem:

# host-tcplimit 1.1.1.1 2401 1 day on

Now, Mr. Newbie (IP address 1.1.1.1) is limited to one CVS connection (port 2401) per day, saving oodles of network bandwidth.
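
A per-host limit of this kind could plausibly be built from the iptables limit and state matches with a source-address match added. The dry-run sketch below only prints commands; host_tcplimit_cmds and the chain name hostlimit-PORT are hypothetical, and a real script would need a chain name that is unique per host and port and short enough for iptables to accept.

```shell
#!/bin/bash
# Dry-run sketch of a per-host connection limit: prints iptables commands only.
# host_tcplimit_cmds and the chain name are hypothetical; this simplified
# version supports a single limited host per port.
host_tcplimit_cmds() {
    local ip="$1" port="$2" rate="$3" scale="$4"
    local chain="hostlimit-${port}"
    echo "iptables -N ${chain}"
    # Within the limit: hand the packet back to INPUT.
    echo "iptables -A ${chain} -m limit --limit ${rate}/${scale} --limit-burst ${rate} -j RETURN"
    # Over the limit: reject the new connection.
    echo "iptables -A ${chain} -j REJECT"
    # Only NEW connections from this source to this port enter the chain.
    echo "iptables -I INPUT -s ${ip} -p tcp --dport ${port} -m state --state NEW -j ${chain}"
}

host_tcplimit_cmds 1.1.1.1 2401 1 day
```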

user-outblock

The last and possibly most intriguing of all my dynamic firewall scripts is user-outblock. This script provides an ideal way to allow a particular user to telnet or ssh into your system, yet not allow this user to establish any new outgoing connections from the command-line. Here's an example of a situation where user-outblock would come in handy. Let's say that a particular family has an account at my ISP. Mom and Dad use a graphical e-mail client to read their mail and occasionally surf the Web, but their son happens to be an aspiring hacker, and generally uses his shell access to do naughty things to other people's computers.

One day, you find that he's established ssh connections with several systems that appear to belong to the Pakistani military -- ouch. You'd like to help direct this youth towards more beneficial activities, so you do the following:

First, you do an audit of your system and make sure that you remove the suid bit from all your network binaries, like ssh:

# chmod u-s /usr/bin/ssh

Now, any processes that he tries to use to interact with the network will be owned by his UID. You can now use user-outblock to block all outgoing TCP connections initiated by this UID (which happens to be 2049):

# user-outblock 2049 on
UID 2049 block on.

Now, he can log in and read his mail, but he's not going to be using your servers to establish ssh connections and the like. Now, he could install an ssh client on his home PC. However, it's not too hard to whip up another dynamic firewall script that limits his home PC to Web, mail, and outgoing ssh connections (to your servers only).
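
One way such a block could be expressed is with the iptables owner match on the OUTPUT chain. The sketch below prints a single rule rather than applying it; user_outblock_cmd is a hypothetical name, and restricting the rule to NEW connections (so that his existing login session keeps working) is an assumption about how the real script behaves.

```shell
#!/bin/bash
# Dry-run sketch: print the rule that blocks new outgoing TCP connections
# made by processes owned by a given UID. user_outblock_cmd is hypothetical.
user_outblock_cmd() {
    local uid="$1" mode="$2" op
    case "$mode" in
        on)  op="-I" ;;   # insert the rule
        off) op="-D" ;;   # delete the same rule again
        *)   echo "Error: \"on\" or \"off\" expected as second argument" >&2
             return 1 ;;
    esac
    echo "iptables ${op} OUTPUT -p tcp -m owner --uid-owner ${uid} -m state --state NEW -j DROP"
}

user_outblock_cmd 2049 on
```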

Resources
Because I've found these dynamic firewall scripts so helpful, I've put together a neat little tarball (dynfw-1.0.tar.gz) that you can download and install on your machine.

To install, extract the tarball and run the included install.sh script. This script will install a shared bash script to /usr/local/share/dynfw.sh, and install the dynamic firewall scripts themselves to /usr/local/sbin. If you'd like them to end up in /usr/share and /usr/sbin instead, simply type this before running install.sh:

# export PREFIX=/usr

I've also added a dynamic firewall scripts section to the Gentoo Linux Web site that you can visit to get the latest version of the tarball. I'd like to continue improving and adding to the collection, making a truly useful resource for sysadmins planetwide. Now that we have iptables in the kernel, it's time to start taking advantage of it!

tcpdump is an essential tool for exploring low-level packet exchanges and verifying that your firewall is working correctly. If you don't have it, get it. If you've got it, start using it. If you're using it...good for you. :)

Visit the home page for the netfilter team to find lots of excellent resources, including the iptables sources, and Rusty's excellent "unreliable guides". These include a basic networking concepts HOWTO, a netfilter (iptables) HOWTO, a NAT HOWTO, and a netfilter hacking HOWTO for developers.

Thankfully, there are a lot of good online netfilter resources; however, don't forget the basics. The iptables man page is very detailed and is a shining example of what a man page should be.

There's now an Advanced Linux Routing and Traffic Control HOWTO available. There's a good section that shows how to use iptables to mark packets, and then use Linux routing functionality to route the packets based on these marks.

There's a netfilter (iptables) mailing list available, as well as one for netfilter developers. You can also access the mailing list archives at these URLs.

About the author

Residing in Albuquerque, New Mexico, Daniel Robbins is the President/CEO of Gentoo Technologies, Inc., the creator of Gentoo Linux, an advanced Linux for the PC, and the Portage system, a next-generation ports system for Linux. He has also served as a contributing author for the Macmillan books Caldera OpenLinux Unleashed, SuSE Linux Unleashed, and Samba Unleashed. Daniel has been involved with computers in some fashion since the second grade, when he was first exposed to the Logo programming language as well as a potentially dangerous dose of Pac Man. This probably explains why he has since served as a Lead Graphic Artist at SONY Electronic Publishing/Psygnosis. Daniel enjoys spending time with his wife, Mary, and his new baby daughter, Hadassah. You can contact Daniel at drobbins@gentoo.org.

2007/11/27

vsftpd configuration memo

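RedHat7.3
vsftpd-1.1.3
Configuring vsftpd to chroot each user into a different directory
In the example below, users are chrooted to ~/public_html by default, and only the user hoge is chrooted to ~/ (/home/hoge).
Settings in vsftpd.conf

chroot_list_enable=YES
local_root=public_html (the default chroot location)
user_config_dir=/etc/vsftpd_user_conf (the directory holding the config files for users who should be chrooted somewhere other than the default)
/etc/vsftpd.chroot_list

List the names of the users to be chrooted. Include the user hoge here as well. If you specified a file name with chroot_list_file, write them in that file instead of /etc/vsftpd.chroot_list.
Create the directory for the per-user configuration files

mkdir /etc/vsftpd_user_conf (create the directory specified by user_config_dir in vsftpd.conf)
Create a file (the user configuration file) named after the user who should be chrooted somewhere other than the default

touch /etc/vsftpd_user_conf/hoge
In the user configuration file created above, write the directory to chroot into

local_root=/home/hoge (something like ~/ does not seem to work)

I referred to the vsftpd documentation (and its side-by-side translation; thanks, Mr. Takada).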
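
The per-user steps above can be wrapped in a small shell helper. This is only a sketch: write_user_root is a hypothetical name, and the demonstration writes into a temporary directory rather than /etc/vsftpd_user_conf so that it can be tried without root.

```shell
#!/bin/bash
# write_user_root CONF_DIR USER ROOT
# Hypothetical helper: create CONF_DIR if needed and write a per-user
# file containing only the local_root override for USER.
write_user_root() {
    local conf_dir="$1" user="$2" root="$3"
    mkdir -p "$conf_dir"
    # Use an absolute path; the memo notes that ~/ does not seem to work.
    echo "local_root=${root}" > "${conf_dir}/${user}"
}

# Demonstration in a scratch directory (stand-in for /etc/vsftpd_user_conf).
demo_dir="$(mktemp -d)"
write_user_root "$demo_dir" hoge /home/hoge
cat "${demo_dir}/hoge"
```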

TLS/SSL encryption on vsftpd

The release of vsftpd version 2 brought some major updates to the FTP package and the most notable is the inclusion of TLS/SSL encryption for securing authentication and data transfers between clients and server.
You should only enable TLS/SSL if you really need it. If you only intend to cater for anonymous users on your server, then you should not implement encryption.

To enable the TLS/SSL security controls, the vsftpd version must have been compiled with its support. To find out if your version has been compiled with SSL support, execute the following command at the prompt.
[bash]# ldd /usr/sbin/vsftpd | grep ssl

If the command displays the libssl line in its output, then your version is ready to support TLS/SSL. If libssl is not in the output, then your version of vsftpd does not support encryption; you will either have to recompile the source code yourself, or convince your distribution developers to consider it for inclusion.
libssl.so.6 => /lib/libssl.so.6 (0x001bf000)

Before the server is able to do any encryption, it requires the generation of a private key and a digital certificate. During the key generation process you will be asked several questions regarding the server name, organisational name and country code.
PREFERRED METHOD..
[bash]# cd /etc/pki/tls/certs
[bash]# make vsftpd.pem
ALTERNATE METHOD..
[bash]# openssl req -x509 -nodes -days 730 -newkey rsa:1024 \
-keyout /etc/pki/tls/certs/vsftpd.pem \
-out /etc/pki/tls/certs/vsftpd.pem

Both commands above are suitable for creating your certificates. The second command creates an X509 SSL certificate with a life of 2 years (-days 730).
Country Name (2 letter code) [GB]:AU
State or Province Name (full name) [Berkshire]:QLD
Locality Name (eg, city) [Newbury]:Brisbane
Organization Name (eg, company) [My Company Ltd]:Miles Brennan
Organizational Unit Name (eg, section) []:Home Linux Server
Common Name (eg, your name or your server's hostname) []:galaxy.example.com
Email Address []:sysadmin@example.com

If you are using the server for legitimate business use and you want to provide a level of security assurance to your customers, then you should use a key that has been signed by a Certificate Authority.

The contents of the /etc/pki/tls/certs/vsftpd.pem file should be checked to ensure it has a private key and digital certificate. If any of the identifying details in the X509 change or have been entered incorrectly, you can easily regenerate new keys until the details are correct.

The vsftpd.pem file should also be secured so only root has access to the file. This does not affect the server if it is running as a non-privileged account, as the keys are loaded before dropping into non-privileged mode.
[bash]# cat /etc/pki/tls/certs/vsftpd.pem
[bash]# openssl x509 -in /etc/pki/tls/certs/vsftpd.pem -noout -text
[bash]# chmod 600 /etc/pki/tls/certs/vsftpd.pem
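
The check that the file holds both a private key and a certificate can also be scripted. The sketch below looks only for the PEM header lines and does not validate the key or certificate themselves; pem_has_key_and_cert is a hypothetical helper name.

```shell
#!/bin/bash
# pem_has_key_and_cert FILE
# Hypothetical helper: succeed only if FILE contains both a certificate
# header and an (unencrypted) private key header. It checks the PEM
# markers only, not whether the contents are valid.
pem_has_key_and_cert() {
    local pem="$1"
    grep -q -- '-----BEGIN CERTIFICATE-----' "$pem" &&
    grep -Eq -- '-----BEGIN (RSA |EC )?PRIVATE KEY-----' "$pem"
}

# Example call (run as root, since the file should be readable by root only):
#   pem_has_key_and_cert /etc/pki/tls/certs/vsftpd.pem && echo "key and certificate present"
```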

The configuration file now needs to be adjusted to include support for TLS/SSL encryption. The following are the recommended parameters; details of each parameter can be obtained from the vsftpd.conf man page ("man vsftpd.conf").
[bash]# vi /etc/vsftpd/vsftpd.conf

ssl_enable=YES
allow_anon_ssl=NO
force_local_data_ssl=NO
force_local_logins_ssl=YES

ssl_tlsv1=YES
ssl_sslv2=NO
ssl_sslv3=NO

rsa_cert_file=/etc/pki/tls/certs/vsftpd.pem

The service should now be restarted for the changes to take effect.
[bash]# /etc/init.d/vsftpd restart

For TLS/SSL encryption to be fully implemented, the FTP client application also needs to support secure connections.

TLS/SSL Enabled FTP Clients
The Linux based gFTP client is enabled for TLS/SSL connections, however it initially rejects self-signed server certificates. This can be fixed by disabling the "Verify SSL Peer" setting in options. When making connections, be sure to select the FTPS protocol.

The Windows-based SmartFTP client is also enabled for TLS/SSL connections. The FTP server first needs to be configured as a "Favourite Site", then the properties need to be adjusted to use the "FTP over SSL Explicit" protocol. Save the changes and connect.
