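I have set up QoS on my Linux router, and it is working very nicely. This topic is probably of no interest at all to anyone who doesn't use Linux (lol). I'm posting the script I used, so if you want to use QoS, please use it as a reference. After running the script, enter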

tc -s class ls dev eth0

and you can see how HTTP and FTP traffic is being controlled.

# Reset any existing rules
/sbin/tc qdisc del dev eth0 root
/sbin/tc qdisc del dev eth1 root

# Create the root qdiscs and parent classes

## Attach cbq as the root qdisc of eth0, with handle 10: ##
/sbin/tc qdisc add dev eth0 root handle 10: cbq bandwidth 10Mbit avpkt 1000 cell 8

## Attach cbq as the root qdisc of eth1, with handle 11: ##
/sbin/tc qdisc add dev eth1 root handle 11: cbq bandwidth 10Mbit avpkt 1000 cell 8

## Create a 10Mbit/sec class with priority 8 (classid 10:1).
## From here on, classes whose parent is 10:1 can use at most
## 10Mbit/sec of bandwidth.
/sbin/tc class add dev eth0 parent 10:0 classid 10:1 cbq bandwidth 10Mbit rate 10Mbit allot 1514 cell 8 weight 1Mbit prio 8 maxburst 20 avpkt 1000

## Create a 10Mbit/sec class with priority 8 (classid 11:1).
## From here on, classes whose parent is 11:1 can use at most
## 10Mbit/sec of bandwidth.
/sbin/tc class add dev eth1 parent 11:0 classid 11:1 cbq bandwidth 10Mbit rate 10Mbit allot 1514 cell 8 weight 1Mbit prio 8 maxburst 20 avpkt 1000

# Create the bandwidth control classes for eth0

## Create a 10Mbit/sec class with priority 1, classid 10:61, parent 10:1, ##
## and attach a tbf qdisc to it ##
/sbin/tc class add dev eth0 parent 10:1 classid 10:61 cbq bandwidth 10Mbit rate 10Mbit allot 1514 cell 8 weight 1Mbit prio 1 maxburst 20 avpkt 1000 bounded
/sbin/tc qdisc add dev eth0 parent 10:61 tbf rate 10Mbit buffer 10Kb/8 limit 15Kb

## Create a 10Mbit/sec class with priority 3, classid 10:63, parent 10:1, ##
## and attach a tbf qdisc to it ##
/sbin/tc class add dev eth0 parent 10:1 classid 10:63 cbq bandwidth 10Mbit rate 10Mbit allot 1514 cell 8 weight 1Mbit prio 3 maxburst 20 avpkt 1000 bounded
/sbin/tc qdisc add dev eth0 parent 10:63 tbf rate 10Mbit buffer 10Kb/8 limit 15Kb

## Create a 10Mbit/sec class with priority 5, classid 10:65, parent 10:1, ##
## and attach a tbf qdisc to it ##
/sbin/tc class add dev eth0 parent 10:1 classid 10:65 cbq bandwidth 10Mbit rate 10Mbit allot 1514 cell 8 weight 1Mbit prio 5 maxburst 20 avpkt 1000 bounded
/sbin/tc qdisc add dev eth0 parent 10:65 tbf rate 10Mbit buffer 10Kb/8 limit 15Kb

## Create a class with priority 7, classid 10:67, parent 10:1, ##
## and attach a tbf qdisc to it; this class is limited to 224Kbit/sec ##
/sbin/tc class add dev eth0 parent 10:1 classid 10:67 cbq bandwidth 10Mbit rate 224Kbit allot 1514 cell 8 weight 22Kbit prio 7 maxburst 20 avpkt 1000 bounded
/sbin/tc qdisc add dev eth0 parent 10:67 tbf rate 224Kbit buffer 10Kb/8 limit 15Kb

# Create the bandwidth control classes for eth1

## Create a 10Mbit/sec class with priority 1, classid 11:61, parent 11:1, ##
## and attach a tbf qdisc to it ##
/sbin/tc class add dev eth1 parent 11:1 classid 11:61 cbq bandwidth 10Mbit rate 10Mbit allot 1514 cell 8 weight 1Mbit prio 1 maxburst 20 avpkt 1000 bounded
/sbin/tc qdisc add dev eth1 parent 11:61 tbf rate 10Mbit buffer 10Kb/8 limit 15Kb

## Create a 10Mbit/sec class with priority 3, classid 11:63, parent 11:1, ##
## and attach a tbf qdisc to it ##
/sbin/tc class add dev eth1 parent 11:1 classid 11:63 cbq bandwidth 10Mbit rate 10Mbit allot 1514 cell 8 weight 1Mbit prio 3 maxburst 20 avpkt 1000 bounded
/sbin/tc qdisc add dev eth1 parent 11:63 tbf rate 10Mbit buffer 10Kb/8 limit 15Kb

## Create a 10Mbit/sec class with priority 5, classid 11:65, parent 11:1, ##
## and attach a tbf qdisc to it ##
/sbin/tc class add dev eth1 parent 11:1 classid 11:65 cbq bandwidth 10Mbit rate 10Mbit allot 1514 cell 8 weight 1Mbit prio 5 maxburst 20 avpkt 1000 bounded
/sbin/tc qdisc add dev eth1 parent 11:65 tbf rate 10Mbit buffer 10Kb/8 limit 15Kb

## Create a 10Mbit/sec class with priority 7, classid 11:67, parent 11:1, ##
## and attach a tbf qdisc to it ##
/sbin/tc class add dev eth1 parent 11:1 classid 11:67 cbq bandwidth 10Mbit rate 10Mbit allot 1514 cell 8 weight 1Mbit prio 7 maxburst 20 avpkt 1000 bounded
/sbin/tc qdisc add dev eth1 parent 11:67 tbf rate 10Mbit buffer 10Kb/8 limit 15Kb

# Define which traffic each bandwidth class applies to

## DNS
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport 53 0xffff flowid 10:61
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport 53 0xffff flowid 11:61

## SSH
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport 22 0xffff flowid 10:63
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport 22 0xffff flowid 11:63

## HTTP, HTTPS
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport 80 0xffff flowid 10:63
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport 80 0xffff flowid 11:63
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport 443 0xffff flowid 10:63
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport 443 0xffff flowid 11:63

## Mail (SMTP, POP3, IMAP)
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport 25 0xffff flowid 10:63
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport 25 0xffff flowid 11:63
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport 110 0xffff flowid 10:63
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport 110 0xffff flowid 11:63
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport 143 0xffff flowid 10:63
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport 143 0xffff flowid 11:63

## NTP
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport 123 0xffff flowid 10:63
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport 123 0xffff flowid 11:63

## FTP
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport 20 0xffff flowid 10:65
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport 20 0xffff flowid 11:65
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport 21 0xffff flowid 10:65
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport 21 0xffff flowid 11:65

## Everything else
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dst any flowid 10:67
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip dst any flowid 11:67
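
The filter section repeats the same pair of commands for every service: a dport match on eth0 and an sport match on eth1. As a rough sketch, a small helper could print each pair instead of writing it out by hand. The function name filter_pair and its dry-run behaviour (printing rather than executing) are assumptions made for illustration; they are not part of the script above.

```shell
#!/bin/sh
# Hypothetical helper (not in the original script): print the eth0/eth1
# filter pair for one port. $1 = port, $2 = class minor number (61, 63, ...).
filter_pair() {
    echo "/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport $1 0xffff flowid 10:$2"
    echo "/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport $1 0xffff flowid 11:$2"
}

# Dry run: print the commands for DNS, SSH and the FTP control port.
filter_pair 53 61
filter_pair 22 63
filter_pair 21 65
```

Reviewing the printed commands first and only then piping them to a root shell keeps mistakes away from the live interfaces.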

15.10. Example of a full NAT solution with QoS

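I'm Pedro Larroy. Here I describe a common setup in which a private network with many users is connected to the Internet through a Linux router that has a public IP address and performs network address translation (NAT). I use this QoS setup to give Internet access to 198 users in a university dorm, where I live and administer the network. The users here are all heavy users of peer-to-peer programs, so proper traffic control is a must. I hope this serves as a practical example for interested lartc readers.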

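First I take a practical, step-by-step approach, and at the end I explain how to make the process run automatically at boot time. The network this example applies to is a private LAN connected to the Internet through a Linux router that has a single public IP address. Extending it to several public IP addresses is very easy; it only takes a few more iptables rules. To get things working we need: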

A Linux 2.4.18 or later kernel installed

If you use 2.4.18, you will have to apply the HTB patch.

tc binaries that support HTB. Precompiled binaries are distributed together with HTB.

15.10.1. Let's begin optimizing that scarce bandwidth

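First we set up some qdiscs in which we will classify the traffic. We create an htb qdisc with 6 classes of ascending priority. Then we have classes that always get their allocated rate but can also use the bandwidth that other classes don't need. Classes with higher priority (that is, a lower prio number) get the excess bandwidth first. Our connection is ADSL with 2Mb down and 300kbit/s up. I use 240kbit/s as the ceiling rate because above that, latency starts to grow, probably because of buffering somewhere along the connection. Determine this parameter experimentally, raising and lowering it while watching the latency to nearby hosts.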

Adjust CEIL to about 75% of your upstream bandwidth limit as a starting point. Wherever eth0 appears, substitute the public interface you use for Internet access. To begin, run the following in a root shell:

CEIL=240
tc qdisc add dev eth0 root handle 1: htb default 15
tc class add dev eth0 parent 1: classid 1:1 htb rate ${CEIL}kbit ceil ${CEIL}kbit
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 80kbit ceil 80kbit prio 0
tc class add dev eth0 parent 1:1 classid 1:11 htb rate 80kbit ceil ${CEIL}kbit prio 1
tc class add dev eth0 parent 1:1 classid 1:12 htb rate 20kbit ceil ${CEIL}kbit prio 2
tc class add dev eth0 parent 1:1 classid 1:13 htb rate 20kbit ceil ${CEIL}kbit prio 2
tc class add dev eth0 parent 1:1 classid 1:14 htb rate 10kbit ceil ${CEIL}kbit prio 3
tc class add dev eth0 parent 1:1 classid 1:15 htb rate 30kbit ceil ${CEIL}kbit prio 3
tc qdisc add dev eth0 parent 1:12 handle 120: sfq perturb 10
tc qdisc add dev eth0 parent 1:13 handle 130: sfq perturb 10
tc qdisc add dev eth0 parent 1:14 handle 140: sfq perturb 10
tc qdisc add dev eth0 parent 1:15 handle 150: sfq perturb 10
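
A quick arithmetic check can confirm that the guaranteed rates of the six leaf classes do not add up to more than the parent's rate, which is the usual guideline for HTB. The sketch below only does the sum; the variable names are chosen here for illustration and the rates are copied from the commands above.

```shell
#!/bin/sh
# Sketch: check that the guaranteed rates (kbit) of 1:10 .. 1:15 fit inside CEIL.
CEIL=240
RATES="80 80 20 20 10 30"   # rates of classes 1:10 .. 1:15, from the commands above

total=0
for r in $RATES; do
    total=$((total + r))
done

if [ "$total" -le "$CEIL" ]; then
    echo "ok: $total of $CEIL kbit allocated"
else
    echo "oversubscribed: $total > $CEIL kbit"
fi
```

If you change CEIL for your own line, scaling the individual rates so that the sum still fits is the simplest way to keep the guarantees meaningful.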

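We have just created an htb tree with one level of depth. It looks something like this:

+---------+
| root 1: |
+---------+
     |
+---------------------------------------+
| class 1:1                             |
+---------------------------------------+
  |      |      |      |      |      |
+----+ +----+ +----+ +----+ +----+ +----+
|1:10| |1:11| |1:12| |1:13| |1:14| |1:15|
+----+ +----+ +----+ +----+ +----+ +----+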

classid 1:10 htb rate 80kbit ceil 80kbit prio 0

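This is the highest priority class. Packets in this class have the lowest delay and get excess bandwidth first, so it is a good idea to keep the ceil of this class limited. We send through this class the packets that benefit most from low delay, such as interactive traffic: ssh, telnet, dns, quake3, irc, and packets with the SYN flag set.

classid 1:11 htb rate 80kbit ceil ${CEIL}kbit prio 1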

This is the first class in which we put bulk traffic. In this example it carries the traffic coming from the local web server (source port 80) and requests for web pages (destination port 80).

classid 1:12 htb rate 20kbit ceil ${CEIL}kbit prio 2

In this class I put traffic with the Maximize-Throughput TOS bit set, and the traffic that goes from local processes on the router to the Internet. The following classes therefore carry only traffic that is routed through the box.

classid 1:13 htb rate 20kbit ceil ${CEIL}kbit prio 2

This class is for the traffic of other NATed machines that need higher priority for their bulk traffic.

classid 1:14 htb rate 10kbit ceil ${CEIL}kbit prio 3

Mail traffic (SMTP, pop3 and so on) and packets with the Minimize-Cost TOS bit set go here.

classid 1:15 htb rate 30kbit ceil ${CEIL}kbit prio 3

Finally, this class holds the traffic from the NATed machines behind the router. kazaa, edonkey and similar traffic goes here so that it does not interfere with other services.

15.10.2. Classifying packets

We have created the qdisc setup, but no packet classification has been done yet, so for now every outgoing packet goes to 1:15 (because we used: tc qdisc add dev eth0 root handle 1: htb default 15). Now we need to tell which packets go where. This is the most important part.

Now we set up the filters so that we can classify packets with iptables. I prefer to do this with iptables in almost every case, because it is flexible and keeps a packet count for each rule. Also, with the RETURN target, packets don't have to traverse all the rules. We execute the following commands:

tc filter add dev eth0 parent 1:0 protocol ip prio 1 handle 1 fw classid 1:10
tc filter add dev eth0 parent 1:0 protocol ip prio 2 handle 2 fw classid 1:11
tc filter add dev eth0 parent 1:0 protocol ip prio 3 handle 3 fw classid 1:12
tc filter add dev eth0 parent 1:0 protocol ip prio 4 handle 4 fw classid 1:13
tc filter add dev eth0 parent 1:0 protocol ip prio 5 handle 5 fw classid 1:14
tc filter add dev eth0 parent 1:0 protocol ip prio 6 handle 6 fw classid 1:15

This just tells the kernel that packets carrying a specific FWMARK value (handle x fw) go to the corresponding class (classid x:x). Next comes how to mark packets with iptables.
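
The six filters follow one pattern: mark N is sent to class 1:1(N-1), so mark 1 maps to 1:10 and mark 6 to 1:15. As a sketch of that mapping, the loop below prints the same commands; the function name print_fw_filters is an assumption introduced here, and the function only prints, it does not run tc.

```shell
#!/bin/sh
# Sketch: print the six fw filters. Mark N maps to class 1:1(N-1),
# i.e. mark 1 -> 1:10 ... mark 6 -> 1:15.
print_fw_filters() {
    mark=1
    while [ "$mark" -le 6 ]; do
        echo "tc filter add dev eth0 parent 1:0 protocol ip prio $mark handle $mark fw classid 1:1$((mark - 1))"
        mark=$((mark + 1))
    done
}

print_fw_filters
```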

First, you have to understand how packets traverse the iptables filters:

        +------------+                +---------+               +-------------+
Packet -| PREROUTING |--- routing-----| FORWARD |-------+-------| POSTROUTING |- Packets
input   +------------+    decision    +---------+       |       +-------------+    out
                             |                          |
                        +-------+                    +--------+
                        | INPUT |---- Local process -| OUTPUT |
                        +-------+                    +--------+

I assume that all the tables exist and that the default policy is ACCEPT (-P ACCEPT). If you haven't touched iptables yet, the defaults should be ok. Our private network is a class B network, and the router has a single public IP address.

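Next we instruct the kernel to actually do NAT, so that clients in the private network can start talking to the outside. In the second command, <private-network> and <public-ip> stand for your own private network and public IP address.

echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -s <private-network> -o eth0 -j SNAT --to-source <public-ip>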

Now check that packets are flowing through 1:15:

tc -s class show dev eth0

To start marking packets, add rules to the PREROUTING chain of the mangle table:

iptables -t mangle -A PREROUTING -p icmp -j MARK --set-mark 0x1
iptables -t mangle -A PREROUTING -p icmp -j RETURN

Now, when you ping from the private network to some site on the Internet, you should see the packet count of 1:10 increase. Check it:

tc -s class show dev eth0

We used -j RETURN so that packets don't traverse the remaining rules. ICMP packets will not be matched against the rules that come after RETURN. Keep that in mind. Now we can add more rules to handle TOS properly:

iptables -t mangle -A PREROUTING -m tos --tos Minimize-Delay -j MARK --set-mark 0x1
iptables -t mangle -A PREROUTING -m tos --tos Minimize-Delay -j RETURN
iptables -t mangle -A PREROUTING -m tos --tos Minimize-Cost -j MARK --set-mark 0x5
iptables -t mangle -A PREROUTING -m tos --tos Minimize-Cost -j RETURN
iptables -t mangle -A PREROUTING -m tos --tos Maximize-Throughput -j MARK --set-mark 0x6
iptables -t mangle -A PREROUTING -m tos --tos Maximize-Throughput -j RETURN

Now prioritize ssh packets:

iptables -t mangle -A PREROUTING -p tcp -m tcp --sport 22 -j MARK --set-mark 0x1
iptables -t mangle -A PREROUTING -p tcp -m tcp --sport 22 -j RETURN

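It is a good idea to prioritize the packets that begin tcp connections, that is, those with the SYN flag set. The two rules are inserted at the head of the chain with explicit positions, so that the MARK rule stays ahead of its RETURN:

iptables -t mangle -I PREROUTING 1 -p tcp -m tcp --tcp-flags SYN,RST,ACK SYN -j MARK --set-mark 0x1
iptables -t mangle -I PREROUTING 2 -p tcp -m tcp --tcp-flags SYN,RST,ACK SYN -j RETURN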

And so on. When we are done adding rules to PREROUTING in mangle, we finish the PREROUTING chain with:

iptables -t mangle -A PREROUTING -j MARK --set-mark 0x6

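Traffic that has not been marked up to this point therefore goes to 1:15. In fact this last step is unnecessary, because the default class is 1:15, but I mark these packets anyway to keep the whole setup consistent and to be able to see the counter on that rule.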

It is a good idea to do the same for the OUTPUT chain, so repeat those commands with -A OUTPUT instead of -A PREROUTING (s/PREROUTING/OUTPUT/). Traffic generated locally (on the Linux router) is then classified as well. I finish the OUTPUT chain with -j MARK --set-mark 0x3 so that local traffic gets a higher priority.
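
The s/PREROUTING/OUTPUT/ substitution can be applied mechanically if the PREROUTING rules are kept as text. The sketch below assumes that approach; the function name to_output_rules is made up for this illustration, and the two rules in the here-document are copied from the ICMP example earlier in this section.

```shell
#!/bin/sh
# Sketch: derive OUTPUT rules from PREROUTING rules with the substitution
# mentioned in the text. Reads rules on stdin, prints the rewritten rules.
to_output_rules() {
    sed 's/PREROUTING/OUTPUT/'
}

to_output_rules <<'EOF'
iptables -t mangle -A PREROUTING -p icmp -j MARK --set-mark 0x1
iptables -t mangle -A PREROUTING -p icmp -j RETURN
EOF
```

The final catch-all rule still has to be written by hand, since the OUTPUT chain ends with mark 0x3 rather than 0x6.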

15.10.3. Improving our setup

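Now the whole setup is working. Look at the graphs, see where your bandwidth is being spent, and decide how you want it to be spent. Take plenty of time over this. In my case, I finally got the Internet connection working really well. Without this work, we would suffer constant timeouts, and newly created tcp connections would get almost no bandwidth at all.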

If you find that some class is full most of the time, it can be a good idea to attach another queueing discipline to it so that bandwidth is shared more fairly:

tc qdisc add dev eth0 parent 1:13 handle 130: sfq perturb 10
tc qdisc add dev eth0 parent 1:14 handle 140: sfq perturb 10
tc qdisc add dev eth0 parent 1:15 handle 150: sfq perturb 10

15.10.4. Making all of this active at boot

This can of course be done in many ways. In my case, I wrote a script /etc/init.d/packetfilter that accepts the arguments [start | stop | stop-tables | start-tables | reload-tables]; it configures the qdiscs, loads the kernel modules that are needed, and so behaves much like a daemon. The same script also loads the iptables rules from /etc/network/iptables-rules. The contents of that file can be saved with iptables-save and restored with iptables-restore.


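With the kernel's QoS (Quality of Service) features, bandwidth control is relatively easy. However, only traffic sent from the server can be controlled; received traffic cannot. To control FTP uploads, for example, you also need to use the daemon's own features.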

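Using the kernel's QoS (Quality of Service) features requires iproute and tc, but recent distributions already include them, so no installation is needed. tc allows many kinds of QoS control, but it is almost impossible to configure unless you spend considerable time understanding it properly. The cbq.init script makes per-port bandwidth control easy to set up, so it is used here. Download cbq.init as shown below and arrange for it to start automatically at boot. On RedHat-based systems it works as is; on SuSE the path to tc differs, so instead of the mv on the second line, convert the script with sed as on the third line.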
# wget http://jaist.dl.sourceforge.net/sourceforge/cbqinit/cbq.init-v0.7.3
# mv cbq.init-v0.7.3 /etc/init.d/cbq.init
(# sed -e "s/TC=\/sbin\/tc/TC=\/usr\/sbin\/tc/g" cbq.init-v0.7.3 > /etc/init.d/cbq.init)
# chmod 755 /etc/init.d/cbq.init
# chkconfig --add cbq.init


The names and location of the cbq.init configuration files used for QoS control are fixed by default.


# mkdir /etc/sysconfig/cbq



File name: cbq-<clsid>.<name>

cbq-: this prefix is fixed; use it exactly as written.

Example: cbq-1280.My_first_shaper

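No.  Type    Parameter                                Notes
1    Device  DEVICE=<ifname>,<bandwidth>[,<weight>]   required

     <weight>: a parameter proportional to <bandwidth>; as a rule, set it to 1/10 of <bandwidth>.

2    Class   RATE=<speed>                             required

     <speed>: the bandwidth allocated to this class. Kbit and Mbit can be used as units. bps, Kbps and Mbps can also be used, but they mean bytes/sec and make the relation to the interface speed hard to see, so it is safer to avoid them.

3    Class   WEIGHT=<speed>                           required

     <speed>: a parameter tied to RATE; as a rule, set it to 1/10 of RATE (WEIGHT ~= RATE / 10, rounded as convenient).

4    Class   PRIO=<1-8>                               optional, default: 5

     Traffic priority from 1 to 8. Lower values are served first, so this can be used to rank protocols (for example, to give SSH top priority).

5    Filter  RULE=[[saddr[/prefix]][:port],][daddr[/prefix]][:port]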


Since we want to control packets whose source is port 80 on the server, set [server address:80,] in RULE. Because the control is keyed on the source, don't forget the trailing ",".

| linux |-eth0------*-[client]
Server: Client: any

80 --------------> any


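In passive mode, control is only possible with a daemon that lets you specify the ports used on the server side. Both Proftpd and vsftpd, which are covered on this site, support this, so configure the port range to use. Download data is sent in packets whose source is one of those ports, which means specifying a range, so set [start port/AND mask] as follows.
The AND mask works the same way as a network subnet mask (such as /24), written in hexadecimal. For example, for the 32 ports from 4096 to 4127 the setting is [ 4096/0xffe0 ]: as shown below, ANDing any number from 4096 to 4127 with [ 0xffe0 ] gives 4096, so all of them are treated alike. As this shows, the start port must be a value whose low n bits are 0, where n depends on the number of ports used; otherwise unrelated ports are limited as well. The range 4000-4029 shown in the configuration examples for Proftpd and others therefore has to be changed.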

4096 (0x1000)   0001000000000000   [start port]
mask (0xffe0)   1111111111100000
AND             0001000000000000   = 4096

4127 (0x101f)   0001000000011111   [end port]
mask (0xffe0)   1111111111100000
AND             0001000000000000   = 4096
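
The mask arithmetic can be checked with plain shell arithmetic. The following is a minimal sketch; the function names port_mask and covered_range are invented here and are not part of cbq.init. It assumes the number of ports is a power of two.

```shell
#!/bin/sh
# Sketch: AND mask for a block of COUNT ports (COUNT must be a power of two).
port_mask() {
    printf '0x%04x\n' $(( 0xffff & ~($1 - 1) ))
}

# Print the first and last port matched by START/MASK.
covered_range() {
    first=$(( $1 & $2 ))
    last=$(( first | (0xffff & ~$2) ))
    echo "$first-$last"
}

port_mask 32                 # prints 0xffe0
covered_range 4096 0xffe0    # prints 4096-4127
covered_range 4000 0xffe0    # prints 4000-4031, two ports more than 4000-4029
```

The last call shows why a range such as 4000-4029 does not fit: 30 is not a power of two, so the nearest mask also catches ports 4030 and 4031.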

| linux |-eth0------*-[client]
Server: Client: any

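20/4096-4127 --------------> any

RULE is required.

6    Timer   TIME=[<dow>,<dow>, ...,<dow>/]<from>-<till>;<rate>/<weight>   optional

     <dow>: the days of the week on which the rule applies, 0-6, where 0 is Sunday.
     <rate>/<weight>: same as items 2 and 3 above.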

[Configuration examples] Place files like the following in the /etc/sysconfig/cbq directory.

- cbq-100.http: keeps heavy access to the WWW server from using up the whole line.


- cbq-101.ftp: limits downloads from the FTP server.
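
To make the file format concrete, here is a hypothetical pair of files built only from the parameters described in the table above. Every value is a placeholder assumption: the interface, the rates, the priorities and the server address 192.0.2.10 (a documentation-only address) are not the original page's settings. The sketch writes to a scratch directory rather than /etc/sysconfig/cbq.

```shell
#!/bin/sh
# Sketch only: all values below are placeholder assumptions, not the
# original configuration. Files go to a scratch directory, not /etc.
CBQ_DIR="${CBQ_DIR:-$(mktemp -d)}"

# HTTP: shape packets whose source is port 80 on the server (note the trailing comma).
cat > "$CBQ_DIR/cbq-100.http" <<'EOF'
DEVICE=eth0,10Mbit,1Mbit
RATE=512Kbit
WEIGHT=51Kbit
PRIO=5
RULE=192.0.2.10:80,
EOF

# FTP: shape the active-mode data port and the passive range 4096-4127.
cat > "$CBQ_DIR/cbq-101.ftp" <<'EOF'
DEVICE=eth0,10Mbit,1Mbit
RATE=256Kbit
WEIGHT=25Kbit
PRIO=6
RULE=192.0.2.10:20,
RULE=192.0.2.10:4096/0xffe0,
EOF

echo "wrote example files to $CBQ_DIR"
```

After adjusting the values for your own server, the files would be copied into /etc/sysconfig/cbq before starting cbq.init.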



# /etc/init.d/cbq.init start

tc - traffic control Linux QoS control tool

1 What is QoS

When the kernel has several packets to send out over a network device, it has to decide which ones to send first, which ones to delay, and which ones to drop. This is the job of the packet scheduler, and several different algorithms for how to do this "fairly" have been proposed.

With the Linux QoS subsystem (which is built from kernel building blocks and user-space tools such as the ip and tc command-line utilities), very flexible traffic control is possible.

2 command syntax

tc (traffic controller) is the user-level program used to create queues and associate them with network devices. It is used to set up various kinds of queues and to associate classes with each of those queues. It is also used to set up the filters by which packets are classified.

Usage: tc [ OPTIONS ] OBJECT { COMMAND | help }

where OBJECT := { qdisc | class | filter }

OPTIONS := { -s[tatistics] | -d[etails] | -r[aw] }

Where tc expects a number for BPS, it understands some suffixes: kbps (*1024), mbps (*1024*1024), kbit (*1024/8), and mbit (*1024*1024/8). If I'm reading the code correctly, "BPS" means bytes per second; if you give a number without a suffix, it assumes you mean bits per second (it divides the number you give it by 8). It also understands bps as a suffix.

Where it's expecting a time value, it seems it understands suffixes of s, sec, and secs for seconds, ms, msec, and msecs for milliseconds, and us, usec, and usecs for microseconds.

Where it wants a size parameter, it assumes non-suffixed numbers to be specified in bytes. It also understands suffixes of k and kb to mean kilobytes (*1024), m and mb to mean megabytes (*1024*1024), kbit to mean kilobit (*1024/8), and mbit to mean megabits (*1024*1024/8).

1Mbit == 128Kbps, that is, 1 megabit is 128 kilobytes per second

bps = bytes/sec

kbps = bytes/sec * 1024

mbps = bytes/sec * 1024 * 1024

kbit = bits/sec * 1024

mbit = bits/sec * 1024 * 1024

In examples, Xbit and Xbps are often used interchangeably, even though tc treats them very differently.

note: this is very confusing

note: whenever you deal with memory-related values such as queue size or buffer size, make sure the units are bytes; for bandwidth- and rate-related parameters, the units are bits.
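
The "1Mbit == 128Kbps" line can be reproduced with integer arithmetic. This sketch simply applies the 1024-based factors listed in this section; the function name mbit_to_kbps is an assumption, and other tc releases may use different multipliers, so check the tc manual page for your version before relying on the numbers.

```shell
#!/bin/sh
# Sketch: convert Mbit to kilobytes per second using this section's
# 1024-based factors (mbit = *1024*1024/8 bytes/sec, kbps = 1024 bytes/sec).
mbit_to_kbps() {
    echo $(( $1 * 1024 * 1024 / 8 / 1024 ))
}

mbit_to_kbps 1     # prints 128
mbit_to_kbps 10    # prints 1280
```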

3 Queueing disciplines

Each network device has a queuing discipline associated with it, which controls how packets enqueued on that device are treated. It can be viewed with ip command:

root@dl:# ip link show
1: lo: <LOOPBACK,UP> mtu 3924 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,PROMISC,UP> mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 52:54:00:de:bf:19 brd ff:ff:ff:ff:ff:ff
3: tap0: <BROADCAST,MULTICAST,NOARP> mtu 1500 qdisc noop
    link/ether fe:fd:00:00:00:00 brd ff:ff:ff:ff:ff:ff

Generally, a queueing discipline ("qdisc") is a black box that can enqueue packets and dequeue them (when the device is ready to send something) in an order and at times determined by the algorithm hidden inside it.

By default, the queueing discipline is pfifo_fast, which cannot be manipulated with tc. It is assigned to a device when the device is started or when another qdisc is deleted from the device. This qdisc has 3 bands, which are processed from band 0 to band 2; as long as there is a packet queued in a higher-priority band (lower number), the lower-priority bands are not processed.

Qdisc's are:

FIFO - simple FIFO (packet (p-FIFO) or byte (b-FIFO) )
PRIO - n-band strict priority scheduler
TBF - token bucket filter
CBQ - class based queue
CSZ - Clark-Scott-Zhang
SFQ - stochastic fair queue
RED - random early detection
GRED - generalized random early detection
TEQL - traffic equalizer
ATM - asynchronous transfer mode
DSMARK - DSCP (Diff-Serv Code Point)marker/remarker

Qdiscs are divided into two categories:

- "queues", which have no internal structure visible from outside.

- "schedulers", which split all packets into "traffic classes" using "packet classifiers".

In turn, classes may have child qdiscs (as a rule, queues) attached to them, and so on.

note: Certain qdiscs can have children and are classful; others are leaves (describe it!)

classful qdiscs: CBQ, ATM, DSMARK, CSZ and the ( p-FIFO ???? or prio )

leaf qdiscs: TBF, FIFO, SFQ, RED, GRED, TEQL

note: classful qdiscs can also be leaves

The syntax for managing queuing discipline is:

Usage: tc qdisc [ add | del | replace | change | get ] dev STRING
       [ handle QHANDLE ] [ root | ingress | parent CLASSID ]
       [ [ QDISC_KIND ] [ help | OPTIONS ] ]

       tc qdisc show [ dev STRING ] [ingress]

QDISC_KIND := { [p|b]fifo | tbf | prio | cbq | red | etc. }

OPTIONS := ... try tc qdisc add <desired QDISC_KIND> help

add: adds a qdisc to device dev
del: deletes the qdisc from device dev
replace: replaces the qdisc with another
handle: represents the unique handle that is assigned by the user to the queuing discipline. No two queuing disciplines can have the same handle. Qdisc handles always have a minor number equal to zero.
root: indicates that the queue is at the root of a link sharing hierarchy and owns all bandwidth on that device. There can be only one root qdisc per device.
ingress: policing on the ingress
parent: represents the handle of the parent queuing discipline.
dev: is the network device to which we want to attach the qdisc
estimator: is used to determine whether the requirements of the queue have been satisfied. INTERVAL and TIME_CONSTANT are two parameters of very high significance to the estimator. The estimator estimates the bandwidth used by each class over the appropriate time interval, to determine whether or not each class has been receiving its link sharing bandwidth.
Usage: ... estimator INTERVAL TIME-CONST

INTERVAL is interval between measurements

TIME-CONST is averaging time constant

Example: ... est 1sec 8sec
The time constant for the estimator is a critical parameter; this time constant determines the interval over which the router attempts to enforce the link-sharing guidelines.

[1]Unfortunately, rate estimation is not a very easy task. F.e. I did not find a simple way to estimate the current peak rate and even failed to formulate the problem. So I preferred not to built an estimator into the scheduler, but run this task separately. Ideally, it should be kernel thread(s), but for now it runs from timers, which puts apparent top bounds on the number of rated flows, has minimal overhead on small, but is enough to handle controlled load service, sets of aggregates.

We measure rate over A=(1<<interval) seconds and evaluate EWMA:

avrate = avrate*(1-W) + rate*W

where W is chosen as negative power of 2: W = 2^(-ewma_log)

The resulting time constant is:

T = A/(-ln(1-W))


* The stored value for avbps is scaled by 2^5, so that maximal rate is 1Gbit, avpps is scaled by 2^10.

* Minimal interval is HZ/4=250msec (it is the greatest common divisor for HZ=100 and HZ=1024 8)), maximal interval is (HZ/4)*2^EST_MAX_INTERVAL = 8sec. Shorter intervals are too expensive, longer ones can be implemented at user level painlessly.
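
To get a feel for the time constant, the formula T = A/(-ln(1-W)) with W = 2^(-ewma_log) can be evaluated with awk. This is only a numeric sketch of the formula quoted above; the function name ewma_time_constant and the sample values are assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch: time constant T = A / (-ln(1 - W)) with W = 2^(-ewma_log).
# $1 = measurement interval A in seconds, $2 = ewma_log.
ewma_time_constant() {
    awk -v a="$1" -v k="$2" 'BEGIN {
        w = 1 / (2 ^ k)
        printf "%.2f\n", a / (-log(1 - w))
    }'
}

ewma_time_constant 1 3    # prints 7.49
```

With a 1 second interval and ewma_log = 3 the time constant comes out at roughly 7.5 seconds, which is in the neighbourhood of the "est 1sec 8sec" example.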

You *have* to declare the CBQ qdisc first, then the CBQ "parent" class, and then (optionally, I think) the CBQ "leaf" classes.

I'm not 100% sure of what I've just said; it's just how I think it works.

To stop QoS completely, use the following for eth0:

tc qdisc del dev eth0 root

3.1 Class Based Queue

In CBQ, every class has variables idle and avgidle and parameter maxidle used in computing the limit status for the class, and the parameter offtime used in determining how long to restrict throughput for overlimit classes.

The variable idle is the difference between the desired time and the measured actual time between the most recent packet transmissions for the last two packets sent from this class. When the connection is sending more than its allocated bandwidth, idle is negative. When the connection is sending exactly at its allotted rate, idle is zero.
The variable avgidle is the average of idle, and it is computed using an exponentially weighted moving average (EWMA). When avgidle is zero or lower, the class is overlimit (the class has been exceeding its allocated bandwidth in a recent short time interval).
The parameter maxidle gives an upper bound for avgidle. Thus maxidle limits the credit given to a class that has recently been under its allocation.
The parameter offtime gives the time interval that an overlimit class must wait before sending another packet. This parameter determines the steady-state burst size for a class when the class is running over its limit.
The minidle parameter gives a (negative) lower bound for avgidle. Thus, a negative minidle lets the scheduler remember that a class has recently used more than its allocated bandwidth.
Usage: ... cbq bandwidth BPS avpkt BYTES [ mpu BYTES ]
       [ cell BYTES ] [ ewma LOG ]
bandwidth: represents the maximum bandwidth available to the device to which the queue is attached.
avpkt: represents the average packet size. This is used in determining the transmission time, which is given as Transmission Time t = average packet size / Link Bandwidth
mpu: represents the minimum number of bytes that will be sent in a packet. Packets smaller than mpu are treated as being of size mpu. This is done because for ethernet-like interfaces the minimum packet size is 64. This value is usually set to 64.
cell: represents the boundaries of the bytes in the packets that are transmitted. It is used to index into an rtab table, which maintains the packet transmission times for various packet sizes.
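
As a worked instance of "Transmission Time t = average packet size / Link Bandwidth", the sketch below computes the time in microseconds using integer arithmetic. The function name tx_time_usec is an assumption, and the bandwidth is passed as plain bits per second to keep unit suffixes out of the picture.

```shell
#!/bin/sh
# Sketch: transmission time in microseconds, t = avpkt / bandwidth.
# $1 = average packet size in bytes, $2 = link bandwidth in bits per second.
tx_time_usec() {
    echo $(( $1 * 8 * 1000000 / $2 ))
}

tx_time_usec 1000 10000000    # prints 800
```

So with avpkt 1000 on a link of 10,000,000 bits per second, one average packet occupies the wire for about 800 microseconds.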
A CBQ class is automatically generated when a CBQ qdisc is created. ??

note: rtab is rate table?

note: mariano: should first declare a cbq "parent" class (which uses all the bandwidth) and then declare the two "leaf" classes.

CBQ is a complex qdisc; to understand it fully, it is good to read Sally Floyd and Van Jacobson's paper.

3.2 Priority

Simple priority queue

Usage: ... prio bands NUMBER priomap P1 P2...

bands: number of bands to add (default 3)
priomap: defines what the priomap looks like (defaults to the 3-band scheduler map)
So if you define more than 3 bands, make sure to redefine the priomap.

In prio, as long as there is data to be dequeued in a higher-priority queue, prio will favor that queue.

3.3 FIFO

Simple First-In-First-Out queue which provides basic store-and-forward capability. FIFO is default qdisc on most real interfaces.

Usage: ... [p|b]fifo [ limit NUMBER ]
"b" stands for bytes, while "p" stands for packets.

limit: maximum length of the queue, in bytes for bfifo and in packets for pfifo
This means that the maximum length of the fifo queue is measured in bytes in the first case and in number of packets in the second case.

small note: The fifo queue can be set to 0, but this still allows a single packet to be enqueued.

3.4 TBF

Token Bucket Filter is a qdisc that holds tokens: if there is a token in the bucket, a packet can be sent and a token is taken. The kernel puts tokens into the bucket at regular intervals.

Usage: ... tbf limit BYTES burst BYTES[/BYTES] rate KBPS
       [ mtu BYTES[/BYTES] ] [ peakrate KBPS ] [ latency TIME ]
limit: is the number of bytes that can be queued
burst: specifies the burst size, that is, how much can be sent within a given unit of time without creating scheduling concerns
rate: is used indirectly in qdiscs: in tc, rate is used to calculate the transmission time required for each packet size from mpu to mtu. Another definition: the rate option is what controls bandwidth. AFAIK `bandwidth' represents the `real' bandwidth of the device.
mtu: is the maximum transfer unit
peakrate: maximum short-term rate
latency: maximum queuing latency
Jamal: TBF is influenced by quite a few parameters; peakrate, rate, MTU, burst size etc. It will do what you ask it to ;-> And at times it will let bursts flood the gate i.e you might end up sending at wire speed. What are your parameters like?

3.5 RED

Random Early Detection discards packets even when there is space in the queue. As the queue length increases, the drop probability also increases. This approach lets the sender be notified that congestion is likely before it actually appears.

Usage: ... red limit BYTES min BYTES max BYTES avpkt BYTES burst PACKETS
       probability PROBABILITY bandwidth KBPS [ ecn ]
limit: actual physical size of the queue
min: minimum threshold in Kilobytes
max: maximum threshold in Kilobytes.
avpkt: is the average packet size
burst: is burstiness (from Jamal: used to compute time constant ) ???
probability: should be the random drop probability
bandwidth: should be the real bandwidth of the interface
ecn: ? explicit congestion notification (flag or what)
Always make sure that min < max < limit
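
The ordering rule min < max < limit is easy to check before handing values to tc. The helper below is a hypothetical sketch; the name red_params_ok and the byte values are assumptions picked only to exercise the check.

```shell
#!/bin/sh
# Sketch: succeed only when min < max < limit (all three in the same unit).
# $1 = min, $2 = max, $3 = limit.
red_params_ok() {
    [ "$1" -lt "$2" ] && [ "$2" -lt "$3" ]
}

red_params_ok 15000 45000 60000 && echo "min < max < limit holds"
red_params_ok 45000 15000 60000 || echo "rejected: min must be below max"
```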

3.6 GRED

Generalized RED is used in DiffServ implementation and it has virtual queue (VQ) within physical queue. Currently, the number of virtual queues is limited to 16.

GRED is configured in two steps. First the generic parameters are configured to select the number of virtual queues DPs and whether to turn on the RIO-like buffer sharing scheme. Also at this point, a default virtual queue is selected.

The second step is used to set parameters for individual virtual queues.

Usage: ... gred DP drop-probability limit BYTES min BYTES max BYTES
       avpkt BYTES burst PACKETS probability PROBABILITY bandwidth KBPS
       [prio value]

       OR ... gred setup DPs <num of DPs> default <default DP> [grio]
setup: identifies that this is a generic setup for GRED
DPs: is the number of virtual queues
default: specifies the default virtual queue
grio: turns on the RIO-like buffering scheme
limit: defines the virtual queue ``physical'' limit in bytes
min: defines the minimum threshold value in bytes
max: defines the maximum threshold value in bytes
avpkt: is the average packet size in bytes
bandwidth: is the wire-speed of the interface
burst: is the number of average-sized packets allowed to burst
probability: defines the drop probability in the range (0...)
DP: identifies the virtual queue assigned to these parameters
prio: identifies the virtual queue priority if grio was set in general parameters

3.7 SFQ

Stochastic Fair Queue, as its name implies. It processes queues in round-robin order.

Usage: ... sfq [ perturb SECS ] [ quantum BYTES ]
perturb: is the number of seconds after which the hashing function is changed, to confine hash collisions to a small time interval (the perturb interval).
quantum: is the DRR (Deficit Round Robin) round quantum, like in CBQ.

3.8 ATM

Used to re-direct flows from the default path to ATM VCs. Each flow can have its own ATM VC, but multiple flows can also share the same VC.

Werner: ATM qdisc is different. It takes packets from some traffic stream (no matter what interface or such), and sends it over specific (and typically dedicated) ATM connections.

Werner: Then there's the case of qdiscs that don't really queue data, e.g. sch_dsmark or sch_atm.

3.9 Dsmark

Diff-serv marker isn't really a queuing discipline. It marks packets according to a specified rule. It is configured first as a qdisc and after that as a class (if it is used for classification).

Usage: dsmark indices INDICES [ default_index DEFAULT_INDEX ] [ set_tc_index ]
indices: is the size of the table of (mask,value) pairs. See below. (maybe mask value)
default_index: is used if the classifier finds no match
set_tc_index: if set, retrieves the content of the DS field and stores it in skb->tc_index
When invoked to create a class, its parameters are:

Usage: ... dsmark [ mask MASK ] [ value VALUE ]
mask: mask on DSCP (default 0xff)
value: value to OR with (default 0)
Outgoing DSCP = (Incoming DSCP AND mask) OR value

Where Incoming DSCP is the DSCP value of the original incoming packet, and Outgoing DSCP is the DSCP that the packet will be assigned as it leaves the queue.
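
The remarking rule is a plain bitwise expression, so it can be tried out with shell arithmetic. In this sketch the function name dsmark_out and the sample byte values are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch: Outgoing DSCP = (Incoming DSCP AND mask) OR value.
# $1 = incoming value, $2 = mask, $3 = value; prints the result in hex.
dsmark_out() {
    printf '0x%02x\n' $(( ($1 & $2) | $3 ))
}

dsmark_out 0xb8 0x03 0x28   # prints 0x28
dsmark_out 0x2e 0xff 0x00   # prints 0x2e
```

The second call uses the defaults named above (mask 0xff, value 0), which leave the incoming value unchanged.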


if present, the ingress qdisc is invoked for each packet arriving on the respective interface

ingress is a qdisc that only classifies but doesn't queue

the usual classifiers, classifier combinations, and policing functions can be used

the classification result is stored in skb->tc_index, a la sch_dsmark

if the classification returns a "drop" result (TC_POLICE_SHOT), the packet is discarded. Otherwise, it is accepted.

Since there is no queue for implicit rate limiting (via PRIO, TBF, CBQ, etc.), rate limiting must be done explicitly via policing. This is still done exactly like policing on egress.

4 classes

mps: should I explain what is class and their intimacy with qdisc? Yes? Classes are main component of the QoS. (stupid explanation)

The syntax for creating a class is shown below:

tc class [ add | del | change | get ] dev STRING

[ classid CLASSID ] [ root | parent CLASSID ]

[ [ QDISC_KIND ] [ help | OPTIONS ] ]

tc class show [ dev STRING ] [ root | parent CLASSID ]
Where: QDISC_KIND := { prio | cbq | etc. }

OPTIONS := ... try tc class add <desired QDISC_KIND> help

The QDISC_KIND can be one of the queuing disciplines that support classes. The interpretation of the fields:

classid represents the handle that is assigned to the class by the user. It consists of a major number and a minor number, which have been discussed already.
root indicates that the class represents the root class in the link sharing hierarchy.
parent indicates the handle of the parent of the queuing discipline.

4.1 CBQ

This algorithm classifies the waiting packets into a tree-like hierarchy of classes; the leaves of this tree are in turn scheduled by separate algorithms (called "disciplines" in this context).

Usage: ... cbq bandwidth BPS rate BPS maxburst PKTS [ avpkt BYTES ]

[ minburst PKTS ] [ bounded ] [ isolated ]

[ allot BYTES ] [ mpu BYTES ] [ weight RATE ]

[ prio NUMBER ] [ cell BYTES ] [ ewma LOG ]


[ split CLASSID ] [ defmap MASK/CHANGE ]
bandwidth represents the maximum bandwidth that is available to the queuing discipline owned by this class. It is only used as a helper value to compute min/max idle values from maxburst and avpkt.
rate represents the bandwidth that is allocated to this class. Set rate to the bandwidth you want to allocate to a given traffic class. The kernel does not use this value directly; it uses pre-calculated rate translation tables. It is used to compute the overlimit status of the class.
maxburst represents the number of bytes that will be sent in the longest possible burst.
avpkt represents the average number of bytes in a packet belonging to this class.
minburst represents the number of bytes that will be sent in the shortest possible burst.
bounded indicates that the class cannot borrow unused bandwidth from its ancestors. If this is not specified, then the class can borrow unused bandwidth from the parent (default off).
isolated indicates that the class will not share bandwidth with any non-descendant classes.
allot is MTU + MAC header
mpu is explained at page
weight should be made proportional to the rate. (explain CBQ is implemented using Weighted Round Robin algorithm)
prio represents the priority that is assigned to this class. Priority value 0 is highest (most important) and value 7 is lowest.
cell represents the boundaries of the bytes in the packets that are transmitted. It is used to index into an rtab table, which maintains the packet transmission times for various packet sizes.
is explained at page
is explained at page
split field is used for fast access. This is normally the root of the CBQ tree. It can be set to any node in the hierarchy, thereby enabling the use of a simple and fast classifier, which is configured only for a limited set of keys to point to this node. Only classes with split node set to this node will be matched. The type of service (TOS in the IP header) and sk->priority are not used for this purpose.
defmap says that best effort traffic, not classified by other means, will fall to this class. defmap is a bitmap of logical priorities served by this class.
A note about CBQ class setup:

cbq class has fifo qdisc attached by default

You *have* to declare first, the CBQ qdisc, then the CBQ "parent" class, and then (optionally, I think), the CBQ "leaf " classes. I'm not 100% sure of what I've just said. It's just how I think it works.

5 filters (or classifier)

Filters are used to classify (map) packets to certain classes based on properties of the packet, e.g. the TOS byte in the IP header, IP addresses, port numbers etc. Queuing disciplines use filters to assign incoming packets to one of their classes. Filters can be maintained per class or per queuing discipline, depending on the design of the queuing discipline. Filters are maintained in filter lists. Filter lists are ordered by priority, in ascending order. Also, the entries are keyed by the protocol to which they apply, e.g., IP, UDP etc. Filters for the same protocol on the same filter list must have different priority values.
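
A rough Python sketch of such a filter list follows. It is an illustration only, not how the kernel stores filters: the Filter and FilterList names, the dictionary packet representation, and the class IDs are all made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Filter:
    prio: int                          # lower value is consulted first
    protocol: str                      # protocol this entry is keyed by
    matches: Callable[[dict], bool]    # predicate over a packet description
    classid: str                       # class the packet is mapped to


class FilterList:
    def __init__(self):
        self._filters = []

    def add(self, flt):
        # Filters for the same protocol must have different priority values.
        for existing in self._filters:
            if existing.protocol == flt.protocol and existing.prio == flt.prio:
                raise ValueError("duplicate (protocol, prio) pair")
        self._filters.append(flt)
        self._filters.sort(key=lambda f: f.prio)   # ascending priority order

    def classify(self, packet) -> Optional[str]:
        # The first matching filter, in priority order, decides the class.
        for flt in self._filters:
            if flt.protocol == packet["protocol"] and flt.matches(packet):
                return flt.classid
        return None


if __name__ == "__main__":
    filters = FilterList()
    filters.add(Filter(2, "ip", lambda p: p.get("tos") == 0x10, "1:10"))
    filters.add(Filter(1, "ip", lambda p: p.get("dport") == 22, "1:20"))
    print(filters.classify({"protocol": "ip", "tos": 0x10, "dport": 22}))
```

The packet in the usage example matches both filters, and the one with the lower priority value is consulted first.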

Filters vary in scope

Filters have meters associated with them (TB+rate estimator)

Usage: tc filter [ add | del | change | get ] dev STRING

[ pref PRIO ] [ protocol PROTO ]


[ root | classid CLASSID ] [ handle FILTERID ]

[ [ FILTER_TYPE ] [ help | OPTIONS ] ]

tc filter show [ dev STRING ] [ root | parent CLASSID ]


FILTER_TYPE := { rsvp | u32 | fw | route | etc. }

FILTERID := ... format depends on classifier, see there

OPTIONS := ... try tc filter add <desired FILTER_KIND> help
The interpretation of the fields:

pref represents the priority that is assigned to the filter.
protocol is used by the filter to identify packets belonging only to that protocol. As already mentioned, no two filters can have the same priority and protocol field.
root indicates that the filter is at the root of the link sharing hierarchy.
classid represents the handle of the class to which the filter is applied.
handle represents the handle by which the filter is identified uniquely. The format of the filter is different for different classifiers.
is explained at page

5.1 filter rsvp

Use RSVP protocol for classification

Usage: ... rsvp ipproto PROTOCOL session DST[/PORT | GPI ]

[ sender SRC[/PORT | GPI ] ]

[ classid CLASSID ] [ police POLICE_SPEC ]

[ tunnelid ID ] [ tunnel ID skip NUMBER ]


GPI := { flowlabel NUMBER | spi/ah SPI | spi/esp SPI |

u{8|16|32} NUMBER mask MASK at OFFSET}

POLICE_SPEC := ... look at TBF

Comparing to general packet classification problem, RSVP needs only several relatively simple rules:

(dst, protocol) are always specified, so that we are able to hash them.

ipproto is one of the IP protocols (TCP, UDP and maybe others)
session is the destination (address?) with or without port, or GPI (Generalized Port Identifier)
src may be exact, or may be a wildcard, so that we can keep a hash table plus one wildcard entry.
source port (or flow label) is important only if src is given.
police specification is explained on the page , and it should be, but tc gives (with help command) a reference to TBF?
rsvp filter is used to distinguish an application session (dst port dst ip address). In a DiffServ edge router it can be used to mark packets of specific applications in order to be classified in the appropriate PHB.


We use a two level hash table: The top level is keyed by destination address and protocol ID, every bucket contains a list of "rsvp sessions", identified by destination address, protocol and DPI(="Destination Port ID"): triple (key, mask, offset).

Every bucket has a smaller hash table keyed by source address (cf. RSVP flowspec) and one wildcard entry for wildcard reservations. Every bucket is again a list of "RSVP flows", selected by source address and SPI(="Source Port ID" here rather than "security parameter index"): triple (key, mask, offset).
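
The two-level lookup can be sketched with nested Python containers. This is a simplification for illustration, not the kernel data structure: RsvpTable, add_flow and lookup are hypothetical names, DPI is reduced to a plain port number instead of a (key, mask, offset) triple, and SPI is left out.

```python
WILDCARD = None   # stands for a wildcard sender


class RsvpTable:
    def __init__(self):
        # Top level: (destination address, protocol) -> list of sessions.
        # Each session is (dpi, flows), where flows maps source address -> classid.
        self.sessions = {}

    def add_flow(self, dst, proto, dpi, src, classid):
        bucket = self.sessions.setdefault((dst, proto), [])
        for session_dpi, flows in bucket:
            if session_dpi == dpi:
                flows[src] = classid
                return
        bucket.append((dpi, {src: classid}))

    def lookup(self, dst, proto, dpi, src):
        # First level: hash on destination address and protocol.
        for session_dpi, flows in self.sessions.get((dst, proto), []):
            if session_dpi == dpi:
                # Second level: exact source first, then the wildcard entry.
                if src in flows:
                    return flows[src]
                return flows.get(WILDCARD)
        return None


if __name__ == "__main__":
    table = RsvpTable()
    table.add_flow("10.0.0.2", "udp", 5004, "10.0.0.1", "1:10")
    table.add_flow("10.0.0.2", "udp", 5004, WILDCARD, "1:20")
    print(table.lookup("10.0.0.2", "udp", 5004, "10.0.0.1"))
    print(table.lookup("10.0.0.2", "udp", 5004, "10.0.0.9"))
```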

All the packets with IPv6 extension headers (but AH and ESP) and all fragmented packets go to the best-effort traffic class.

Two "port id"s seem to be redundant; rfc2207 requires only one "Generalized Port Identifier". So for classic ah, esp (and udp, tcp) both *pi should coincide, or one of them should be a wildcard.

At first sight, this redundancy is just a waste of CPU resources. But DPI and SPI add the possibility to assign different priorities to GPIs. Look also at note 4 about tunnels below.

One complication is the case of tunneled packets. We implement it as following: if the first lookup matches a special session with "tunnelhdr" value not zero, flowid doesn't contain the true flow ID, but the tunnel ID (1...255). In this case, we pull tunnelhdr bytes and restart lookup with tunnel ID added to the list of keys. Simple and stupid 8)8) It's enough for PIMREG and IPIP.

Two GPIs make it possible to parse even GRE packets. For example, DPI can select ETH_P_IP (and the necessary flags to make tunnelhdr correct) in the GRE protocol field and SPI matches the GRE key. Is it not nice? 8)8)

Well, as result, despite its simplicity, we get a pretty powerful classification engine.

Panagiotis Stathopoulos: Well an rsvp filter is used to distinguish an application session (dst port dst ip address). In a DiffServ edge router it can be used to mark packets of specific applications in order to be classified in the appropriate PHB.

note: I have to read more about RSVP

5.2 filter u32

Anything in the header can be used for classification

The U32 filter is the most advanced filter available in the current implementation. It is entirely based on hash tables, which makes it robust when there are many filter rules.

Usage: ... u32 [ match SELECTOR ... ] [ link HTID ] [ classid CLASSID ]

[ police POLICE_SPEC ] [ offset OFFSET_SPEC ]

[ ht HTID ] [ hashkey HASHKEY_SPEC ]

[ sample SAMPLE ]

or u32 divisor DIVISOR


SAMPLE := { ip | ip6 | udp | tcp | icmp | u{32|16|8} } SAMPLE_ARGS FILTERID := X:Y:Z

SELECTOR contains definition of the pattern, that will be matched to the currently processed packet. Precisely, it defines which bits are to be matched in the packet header and nothing more, but this simple method is very powerful.




HTID is the hash table
HASHKEY_SPEC is the key to the hash table
SAMPLE is a protocol such as IP, or a higher layer protocol such as UDP, TCP or ICMP. sample can also be one of the keywords u32, u16 or u8, which specify the length of the pattern in bits. PATTERN and MASK should follow, of the length defined by the previous keyword. The OFFSET parameter is the offset, in bytes, at which to start matching. If the nexthdr+ keyword is given, the offset is relative to the start of the upper layer header.
police specification is explained on the page
The syntax here is match ip <item> <value> <mask>

So match ip protocol 6 0xff matches protocol 6, TCP. (See /etc/protocols) match ip dport 0x17 0xffff is TELNET (/etc/services). Note that the number is hexadecimal, not decimal.
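
To make the value/mask idea concrete, here is a small Python sketch of such a match applied to raw header bytes. It only illustrates the principle and is not the u32 implementation: u32_match is a made-up helper, the packet is hand-built, and the byte offsets assume an IPv4 header without options (protocol at byte 9, TCP destination port at byte 22).

```python
import struct


def u32_match(packet, offset, width, value, mask):
    """Compare `width` bytes at `offset` against value under mask."""
    field = int.from_bytes(packet[offset:offset + width], "big")
    return (field & mask) == (value & mask)


# 20-byte IPv4 header (protocol 6 = TCP) followed by a 20-byte TCP header
# with source port 40000 and destination port 0x17 (23, TELNET).
IP_HEADER = bytes([0x45, 0, 0, 40, 0, 0, 0, 0, 64, 6, 0, 0,
                   10, 0, 0, 1, 10, 0, 0, 2])
TCP_HEADER = struct.pack("!HH", 40000, 0x17) + bytes(16)
PACKET = IP_HEADER + TCP_HEADER

if __name__ == "__main__":
    # Counterpart of "match ip protocol 6 0xff"
    print(u32_match(PACKET, offset=9, width=1, value=6, mask=0xFF))
    # Counterpart of "match ip dport 0x17 0xffff"
    print(u32_match(PACKET, offset=22, width=2, value=0x17, mask=0xFFFF))
```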

note: (mps) ht - hash table HTID Hash Table ID is fh - filter handle in filter show

The filters are packed to hash tables of key nodes with a set of 32bit key/mask pairs at every node. Nodes reference next level hash tables etc.

It seems that it represents the best middle point between speed and manageability both by human and by machine.

It is especially useful for link sharing combined with QoS; pure RSVP doesn't need such a general approach and can use much simpler (and faster) schemes.

5.3 filter fw

Classifier mapping ipchains' fwmark to traffic class

Usage: ... fw [ classid CLASSID ] [ police POLICE_SPEC ]

POLICE_SPEC := ... look at TBF

CLASSID is the class handle
specification is explained on the page , and it should be, but tc gives (with help command) reference to TBF?

5.4 filter route

Use routing table decisions for classification

Usage: ... route [ from REALM | fromif TAG ] [ to REALM ]

[ flowid CLASSID ] [ police POLICE_SPEC ]

POLICE_SPEC := ... look at TBF

REALM is realm in ip route table
TAG is interface tag
REALM is (again) ip route table realm
CLASSID is class to which packet (if passed) is
specification is explained on the page , and it should be, but tc gives (with help command) reference to TBF?
For now we assume that route tags < 256. This allows direct table lookups to be used instead of hash tables.
For now we assume that "from TAG" and "fromdev DEV" statements are mutually exclusive.
"to TAG from ANY" has higher priority, than "to ANY from XXX"

5.5 tcindex

Use tc_index internal tag in skb to select classes.

Usage: ... tcindex [ hash SIZE ] [ mask MASK ] [ shift SHIFT ] [ pass_on | fall_through ] [ classid CLASSID ] [ police POLICE_SPEC ]
hash is the size of the lookup table
mask is the bit mask (this explanation is worthless)
shift is the number of bits to shift right by
pass_on defines that this packet will pass

is the class to which filter is attached
specification is explained on the page
note: key = (skb->tc_index & mask) >> shift

6 police

The purpose of policing is to ensure that traffic does not exceed certain bounds. For simplicity, we will assume a broad definition of policing and consider it to comprise all kinds of traffic control actions that depend in some way on the traffic volume.

We consider four types of policing mechanisms:

policing decisions by filters
refusal to enqueue a packet
dropping of a packet from an ``inner'' queueing discipline
dropping of packet when enqueuing a new one
Usage: ... police rate BPS burst BYTES[/BYTES] [ mtu BYTES[/BYTES] ]

[ peakrate BPS ] [ avrate BPS ] [ ACTION ]

Where: ACTION := reclassify | drop | continue
rate is the long-term rate attached to the meter
peakrate is the peak rate at which a flow is allowed to burst in the short term. Basically this upper-bounds the rate.
mtu: a packet exceeding this size will be dropped. The default value is 2KB. This is fine with ethernet, whose MTU is 1.5KB, but will not be fine with Gigabit ethernet exploiting Jumbo frames, for example. It also will not be valid for the lo device, whose MTU is defined by, amongst other things, how much RAM you have. You must set this value if you have exceptions to the rule.
exceed/non-exceed: this defines which action is taken when a flow exceeds its allocated rate or stays within it. The actions are:
reclassify - used by CBQ to go to BE (Best Effort, ask Jamal?)
drop - simply drops the packet
continue - look up the next filter rule with lower priority
note: "drop" is only recognized by the following qdiscs: atm, cbq, dsmark, and (ingress - really?). In particular, prio ignores it.
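
The meter behind rate, burst and mtu can be pictured as a token bucket. The Python sketch below is a simplified model under stated assumptions, not the kernel policer: the class name TokenBucketPolicer is made up, time is passed in explicitly, peakrate and avrate are ignored, and the string "ok" simply stands for a packet that conforms.

```python
class TokenBucketPolicer:
    def __init__(self, rate_bps, burst_bytes, mtu=2048, action="drop"):
        self.rate = rate_bps / 8.0       # refill speed in bytes per second
        self.burst = float(burst_bytes)  # bucket depth in bytes
        self.mtu = mtu                   # larger packets are always dropped
        self.action = action             # what to return for exceeding packets
        self.tokens = float(burst_bytes)
        self.last = 0.0

    def police(self, size, now):
        if size > self.mtu:
            return "drop"
        # Refill the bucket for the time that has passed, up to the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return "ok"
        return self.action


if __name__ == "__main__":
    meter = TokenBucketPolicer(rate_bps=8000, burst_bytes=1500)
    print(meter.police(1500, now=0.0))   # uses up the whole burst
    print(meter.police(100, now=0.0))    # bucket is empty
    print(meter.police(400, now=0.5))    # 500 bytes have been refilled
```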


A. N. Kuznetsov, docs from iproute2

Werner Almesberger, Linux Network Traffic Control - Implementation Overview

Jamal Hadi Salim, IP Quality of Service on Linux http://????

Saravanan Radhakrishnan, Linux - Advanced Networking Overview http://qos.ittc.ukans.edu/howto/howto.html

Almesberger, Jamal Hadi Salim, Alexey Kuznetsov - Differentiated Services on Linux

linux-diffserv mailing list linux-diffserv@lrc.di.epfl.ch

Sally Floyd, Van Jacobson - Link-sharing and Resource Management Models for Packet Networks

Sally Floyd, Van Jacobson - Random Early Detection Gateways for Congestion Avoidance

Related Cisco documents from http://www.cisco.com/

Lixia Zhang, Steve Deering, Deborah Estrin, Scott Shenker, Daniel Zapalla - RSVP: A New Resource ReSerVation Protocol

Related RFC's

and many others


Setting events 10210, 10211, 10212, and 10225 can be done by adding the following line for each event in the init.ora file:

Event = "event_number trace name errorstack forever, level 10"

When event 10210 is set, the data blocks are checked for corruption by checking their integrity. Data blocks that don't match the format are marked as soft corrupt.

When event 10211 is set, the index blocks are checked for corruption by checking their integrity. Index blocks that don't match the format are marked as soft corrupt.

When event 10212 is set, the cluster blocks are checked for corruption by checking their integrity. Cluster blocks that don't match the format are marked as soft corrupt.

When event 10225 is set, the fet$ and uset$ dictionary tables are checked for corruption by checking their integrity. Blocks that don't match the format are marked as soft corrupt.

Set event 10231 in the init.ora file to cause Oracle to skip software- and media-corrupted blocks when performing full table scans:

Event="10231 trace name context forever, level 10"

Set event 10233 in the init.ora file to cause Oracle to skip software- and media-corrupted blocks when performing index range scans:

Event="10233 trace name context forever, level 10"

To dump an Oracle block, you can use the command below from 8.x onwards:

This command dumps data block 9 of datafile 11 into the USER_DUMP_DEST directory.

Dumping Redo Logs file blocks:

SQL> ALTER SYSTEM DUMP LOGFILE '/usr/oracle8/product/admin/udump/rl.log';

Rollback segment block corruption causes problems (ORA-1578) while starting up the database.

With the help of Oracle Support, you can use the underscore parameter below to start up the database.



This parameter is normally used to debug corruptions that happen on disk.

The following V$ views contain information about blocks marked logically corrupt:


When this parameter is set, Oracle computes the checksum again while reading a block from disk into the cache and compares it with the value stored in the block.

If they differ, the block is corrupted on disk. Oracle marks the block as corrupt and signals an error. There is an overhead involved in setting this parameter.


Oracle will catch stray writes made by processes in the buffer cache.

Oracle 9i new RMAN features:

Obtain the datafile numbers and block numbers for the corrupted blocks. Typically, you obtain this output from the standard output, the alert.log, trace files, or a media management interface. For example, you may see the following in a trace file:

ORA-01578: ORACLE data block corrupted (file # 9, block # 13)
ORA-01110: data file 9: '/oracle/dbs/tbs_91.f'
ORA-01578: ORACLE data block corrupted (file # 2, block # 19)
ORA-01110: data file 2: '/oracle/dbs/tbs_21.f'

$rman target =rman/rman@rmanprod
RMAN> run {
2> allocate channel ch1 type disk;
3> blockrecover datafile 9 block 13 datafile 2 block 19;
4> }

Recovering Data blocks Using Selected Backups:

# restore from backupset

# restore from datafile image copy

# restore from backupset with tag "mondayAM"

# restore using backups made before one week ago

# restore using backups made before SCN 100

# restore using backups made before log sequence 7024

The main goal is to be able to "assign" a maximum (or fixed) bandwidth to a vhost.
This is achieved by inserting small delays while sending the data, thus limiting the top speed a client can use. For example, if we assign 100kb/s to a vhost, the first user will be able to download at 100kb/s. If another user starts downloading, each will get 50kb/s at most; with a third, 33kb/s each, and so on.
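
The arithmetic can be sketched as follows. This is only an illustration of the idea, not the module's code: per_client_rate and delay_for_chunk are hypothetical helpers, and an even split between active clients is assumed.

```python
def per_client_rate(vhost_limit, active_clients):
    """Split the vhost limit evenly between the clients downloading right now."""
    if active_clients < 1:
        raise ValueError("need at least one active client")
    return vhost_limit / active_clients


def delay_for_chunk(chunk_bytes, rate_bytes_per_s):
    """Time one chunk should take at the given rate. A sender that pauses
    until this much time has passed per chunk never exceeds the rate."""
    return chunk_bytes / rate_bytes_per_s


if __name__ == "__main__":
    for clients in (1, 2, 3):
        print(clients, int(per_client_rate(100, clients)))
```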

J. Example scripts code-base
J.1. Example rc.firewall script
J.2. Example rc.DMZ.firewall script
J.3. Example rc.UTIN.firewall script
J.4. Example rc.DHCP.firewall script
J.5. Example rc.flush-iptables script
J.6. Example rc.test-iptables script
Object-oriented GUI and set of compilers for various firewall platforms. Currently implemented compilers for iptables, ipfilter, OpenBSD pf, ipfw, Cisco PIX firewall and routers access lists.

Another great new feature in tablespace management is rename tablespace.
Tablespace Rename Overview

In Oracle 10g, you can simply rename a tablespace TBS01 to TBS02 by issuing the following command:


However, you must follow the rules when renaming a tablespace:
You must set compatibility level to at least 10.0.1.
You cannot rename the SYSTEM or SYSAUX tablespaces.
You cannot rename an offline tablespace.
You cannot rename a tablespace that contains offline datafiles.
Renaming a tablespace does not change its tablespace identifier.
Renaming a tablespace does not change the name of its datafiles.
Tablespace Rename Benefits

Tablespace rename provides the following benefits:
It simplifies the process of tablespace migration within a database.
It simplifies the process of transporting a tablespace between two databases.

Example 1: Rename a tablespace within a database. In Oracle9i or earlier releases, you must take the following steps to rename a tablespace from OLD_TBS to NEW_TBS:
Create a new tablespace NEW_TBS.
Copy all objects from OLD_TBS to NEW_TBS.
Drop tablespace OLD_TBS.

In Oracle 10g, you can accomplish the same thing in one step and rename tablespace OLD_TBS to NEW_TBS.


Example 2: Transport a tablespace between two databases. In the following example (see figure 3.2), you cannot transport a tablespace TBS01 from database A to database B in the previous release of Oracle server because database B also has a tablespace called TBS01. In Oracle 10g, you can simply rename TBS01 to TBS02 in database B before transporting tablespace TBS01.

Oracle 11g XDB Guide 28 Using Protocols to Access the Repository

Overview of Oracle XML DB Protocol Server

As described in Chapter 2, "Getting Started with Oracle XML DB" and Chapter 21, "Accessing Oracle XML DB Repository Data", Oracle XML DB Repository provides a hierarchical data repository in the database, designed for XML. Oracle XML DB Repository maps path names (or URLs) onto database objects of XMLType and provides management facilities for these objects.

Oracle XML DB also provides the Oracle XML DB protocol server. This supports standard Internet protocols, FTP, WebDAV, and HTTP(S), for accessing its hierarchical repository or file system. Note that HTTPS provides secure access to Oracle XML DB Repository.

These protocols can provide direct access to Oracle XML DB for many users without having to install additional software. The user names and passwords to be used with the protocols are the same as those for SQL*Plus. Enterprise users are also supported. Database administrators can use these protocols and resource APIs such as DBMS_XDB to access Automatic Storage Management (ASM) files and folders in the repository virtual folder /sys/asm.

When accessing virtual folder /sys/asm using Oracle XML DB protocols, you must log in as a DBA user other than SYS.

Oracle XML DB protocols are not supported on EBCDIC platforms.
Session Pooling

Oracle XML DB protocol server maintains a shared pool of sessions. Each protocol connection is associated with one session from this pool. After a connection is closed the session is put back into the shared pool and can be used to serve later connections.

Session pooling improves performance of HTTP(S) by avoiding the cost of re-creating session states, especially when using HTTP 1.0, which creates new connections for each request. For example, a couple of small files can be retrieved by an existing HTTP/1.1 connection in the time necessary to create a database session. You can tune the number of sessions in the pool by setting session-pool-size in Oracle XML DB xdbconfig.xml file, or disable it by setting pool size to zero.

Session pooling can affect users writing Java servlets, because other users can see session state initialized by another request for a different user. Hence, servlet writers should only use session memory, such as Java static variables, to hold data for the entire application rather than for a particular user. State for each user must be stored in the database or in a lookup table, rather than assuming that a session will only exist for a single user.
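
The effect on session state can be shown with a toy pool in Python. This is a sketch under assumptions, not Oracle's implementation: SessionPool, acquire and release are made-up names, and a session is just a dictionary.

```python
class SessionPool:
    def __init__(self, size):
        self.size = size      # maximum number of idle sessions kept
        self.idle = []
        self.created = 0

    def acquire(self):
        # Reuse an idle session when one exists; otherwise pay the cost of
        # creating a new one.
        if self.idle:
            return self.idle.pop()
        self.created += 1
        return {"id": self.created, "state": {}}

    def release(self, session):
        # Sessions go back to the pool with whatever state they hold.
        # A pool size of zero disables pooling.
        if len(self.idle) < self.size:
            self.idle.append(session)


if __name__ == "__main__":
    pool = SessionPool(size=1)
    first = pool.acquire()
    first["state"]["user"] = "alice"   # per-user data kept in session memory
    pool.release(first)
    second = pool.acquire()            # a different connection, same session
    print(second["state"])
```

The second connection sees the state left behind by the first, which is why per-user data belongs in the database or in a lookup table rather than in session memory.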

Figure 28-1 illustrates the Oracle XML DB protocol server components and how they are used to access files in Oracle XML DB Repository and other data. Only the relevant components of the repository are shown

Figure 28-1 Oracle XML DB Architecture: Protocol Server

Description of "Figure 28-1 Oracle XML DB Architecture: Protocol Server "

Oracle XML DB Protocol Server Configuration Management

Oracle XML DB protocol server uses configuration parameters stored in /xdbconfig.xml to initialize its startup state and manage session level configuration. The following section describes the protocol-specific configuration parameters that you can configure in the Oracle XML DB configuration file. The session pool size and timeout parameters cannot be changed dynamically, that is, you will need to restart the database in order for these changes to take effect.

Configuring Protocol Server Parameters

Table 28-1 shows the parameters common to all protocols. All parameter names in this table, except those starting with /xdbconfig, are relative to the following XPath in the Oracle XML DB configuration schema:

FTP-specific parameters – Table 28-2 shows the FTP-specific parameters. These are relative to the following XPath in the Oracle XML DB configuration schema:

HTTP(S)/WebDAV specific parameters, except servlet-related parameters – Table 28-3 shows the HTTP(S)/WebDAV-specific parameters. These parameters are relative to the following XPath in the Oracle XML DB configuration schema:

You must either configure the port separately for each node of a Real Application Cluster (RAC) or configure it for one node and then restart the database instances on the other nodes. See "Configuring Oracle XML DB Using xdbconfig.xml".

Table 28-1 Common Protocol Configuration Parameters

Parameter Description

Specifies the mapping of file extensions to mime types. When a resource is stored in Oracle XML DB Repository, and its mime type is not specified, this list of mappings is used to set its mime type.

Specifies the mapping of file extensions to languages. When a resource is stored in Oracle XML DB Repository, and its language is not specified, this list of mappings is used to set its language.

Specifies the mapping of file extensions to encodings. When a resource is stored in Oracle XML DB Repository, and its encoding is not specified, this list of mappings is used to set its encoding.

Specifies the list of filename extensions that are treated as XML content by Oracle XML DB.

Maximum number of sessions that are kept in the protocol server session pool

If a connection is idle for this time (in hundredths of a second), then the shared server serving the connection is freed up to serve other connections.

Time (in hundredths of a second) after which a session (and consequently the corresponding connection) will be terminated by the protocol server if the connection has been idle for that time. This parameter is used only if the specific protocol session timeout is not present in the configuration

Specifies the default schema location for a given namespace. This is used if the instance XML document does not contain an explicit xsi:schemaLocation attribute.

Time period after which a WebDAV lock on a resource becomes invalid. This could be overridden by a Timeout specified by the client that locks the resource.

Table 28-2 Configuration Parameters Specific to FTP

Parameter Description

Size of the buffer, in bytes, used to read data from the network during an FTP put operation. Set buffer-size to larger values for higher put performance. There is a trade-off between put performance and memory usage. Value can be from 1024 to 1048496, inclusive; the default value is 8192.

Port on which FTP server listens. By default, this is 0, which means that FTP is disabled. FTP is disabled by default because the FTP specification requires that passwords be transmitted in clear text, which can present a security hazard. To enable FTP, set this parameter to the FTP port to use, such as 2100.

Protocol over which the FTP server runs. By default, this is tcp.

A user-defined welcome message that is displayed whenever an FTP client connects to the server. If this parameter is empty or missing, then the following default welcome message is displayed: "Unauthorized use of this FTP server is prohibited and may be subject to civil and criminal prosecution."

Time (in hundredths of a second) after which an FTP connection will be terminated by the protocol server if the connection has been idle for that time.

Table 28-3 Configuration Parameters Specific to HTTP(S)/WebDAV (Except Servlet Parameters)

Parameter Description

Port on which the HTTP(S)/WebDAV server listens, using protocol http-protocol. By default, this is 0, which means that HTTP is disabled. If this parameter is empty (), then the default value of 0 applies. An empty parameter is not recommended.

This parameter must be present, whether or not it is empty; otherwise, validation of xdbconfig.xml against XML schema xdbconfig.xsd fails. The value must be different from the value of http2-port; otherwise, an error is raised.

Port on which the HTTP(S)/WebDAV server listens, using protocol http2-protocol.

This parameter is optional, but, if present, then http2-protocol must also be present; otherwise, an error is raised. The value must be different from the value of http-port; otherwise, an error is raised. An empty parameter () also raises an error.

Protocol over which the HTTP(S)/WebDAV server runs on port http-port. Must be either TCP or TCPS.

This parameter must be present; otherwise, validation of xdbconfig.xml against XML schema xdbconfig.xsd fails. An empty parameter () also raises an error.

Protocol over which the HTTP(S)/WebDAV server runs on port http2-port. Must be either TCP or TCPS. If this parameter is empty (), then the default value of TCP applies. (An empty parameter is not recommended.)

This parameter is optional, but, if present, then http2-port must also be present; otherwise, an error is raised.

Time (in hundredths of a second) after which an HTTP(S) session (and consequently the corresponding connection) will be terminated by the protocol server if the connection has been idle for that time.

Maximum size (in bytes) of an HTTP(S) header

Maximum size (in bytes) of an HTTP(S) request body

List of filenames that are considered welcome files. When an HTTP(S) get request for a container is received, the server first checks if there is a resource in the container with any of these names. If so, then the contents of that file are sent, instead of a list of resources in the container.

The character set in which an HTTP(S) protocol server assumes incoming URL is encoded when it is not encoded in UTF-8 or the Content-Type field Charset parameter of the request.

Indication of whether or not anonymous HTTP access to Oracle XML DB Repository data is allowed using an unlocked ANONYMOUS user account. The default value is false, meaning that unauthenticated access to repository data is blocked. See "Anonymous Access to Oracle XML DB Repository using HTTP".

Configuring Secure HTTP (HTTPS)

Enable the HTTP Listener to Use SSL

A DBA must carry out the following steps, to configure the HTTP Listener for SSL.

Create a wallet for the server and import a certificate – Use Oracle Wallet Manager to do the following:

Create a wallet for the server.

If a valid certificate with distinguished name (DN) of the server is not available, create a certificate request and submit it to a certificate authority. Obtain a valid certificate from the authority.

Import a valid certificate with the distinguished name (DN) of the server into the server.

Save the new wallet in obfuscated form, so that it can be opened without a password.

Specify the wallet location to the server – Use Oracle Net Manager to do this. Ensure that the configuration is saved to disk. This step updates files sqlnet.ora and listener.ora.

Disable client authentication at the server, since most Web clients do not have certificates. Use Oracle Net Manager to do this. This step updates file sqlnet.ora.

Create a listening end point that uses TCP/IP with SSL – Use Oracle Net Manager to do this. This step updates file listener.ora.

Enable TCPS Dispatcher

A DBA must edit the database pfile to enable launching of a TCPS dispatcher during database startup. The following line must be added to the file, where SID is the SID of the database:

The database pfile location depends on your operating system, as follows:

MS Windows – PARENT/admin/orcl/pfile, where PARENT is the parent folder of folder ORACLE_HOME

Unix, Linux – $ORACLE_HOME/admin/$ORACLE_SID/pfile
Interaction with Oracle XML DB File-System Resources

The protocol specifications, RFC 959 (FTP), RFC 2616 (HTTP), and RFC 2518 (WebDAV) implicitly assume an abstract, hierarchical file system on the server side. This is mapped to Oracle XML DB Repository. The repository provides:

Name resolution.

Security based on access control lists (ACLs). An ACL is a list of access control entries that determine which principals have access to a given resource or resources. See also Chapter 27, "Repository Resource Security".

The ability to store and retrieve any content. The repository can store both binary data input through FTP and XML schema-based documents.

Protocol Server Handles XML Schema-Based or Non-Schema-Based XML Documents

Oracle XML DB protocol server enhances the protocols by always checking if XML documents being inserted are based on XML schemas registered in Oracle XML DB Repository.

If the incoming XML document specifies an XML schema, then the Oracle XML DB storage to use is determined by that XML schema. This functionality is especially useful when you must store XML documents object-relationally in the database using simple protocols like FTP or WebDAV instead of using SQL statements.

If the incoming XML document is not XML schema-based, then it is stored as a binary document.
Event-Based Logging

In certain cases, it may be useful to log the requests received and responses sent by a protocol server. This can be achieved by setting event number 31098 to level 2. To set this event, add the following line to your init.ora file and restart the database:
event="31098 trace name context forever, level 2"
Using FTP and Oracle XML DB Protocol Server

The following sections describe FTP features supported by Oracle XML DB.
Oracle XML DB Protocol Server: FTP Features

File Transfer Protocol (FTP) is one of the oldest and most popular protocols on the net. FTP is specified in RFC 959 and provides access to heterogeneous file systems in a uniform manner. FTP works by providing well-defined commands (methods) for communication between the client and the server. The transfer of command messages and the return of status happen on a single connection. However, a new connection is opened between the client and the server for data transfer. With HTTP(S), by contrast, commands and data are transferred using a single connection.

FTP is implemented by dedicated clients at the operating system level, file-system explorer clients, and browsers. FTP is typically session-oriented: a user session is created through an explicit logon, a number of files or directories are downloaded and browsed, and then the connection is closed.

For security reasons, FTP is disabled by default. This is because the IETF FTP protocol specification requires that passwords be transmitted in clear text. FTP is disabled by configuring the FTP server port as zero (0). To enable FTP, set the ftp-port parameter to the FTP port to use, such as 2100.

FTP Features That Are Not Supported

Oracle XML DB implements FTP, as defined by RFC 959, with the exception of the following optional features:

Record-oriented files. Only the FILE structure of the STRU method is supported. This is the most widely used structure for transferring files, and it is the default in the specification. Structure mount is not supported.


Allocate. This pre-allocates space before file transfer.

Account. This uses the insecure Telnet protocol.

FTP Client Methods That Are Supported

For access to the repository, Oracle XML DB supports the following FTP client methods.

cdup – change working directory to parent directory

cwd – change working directory

dele – delete file (not directory)

list, nlst – list files in working directory

mkd – create directory

noop – do nothing (but timeout counter on connection is reset)

pasv, port – establish a TCP data connection

pwd – get working directory

quit – close connection and quit FTP session

retr – retrieve data using an established connection

rmd – remove directory

rnfr, rnto – rename file (two-step process: from file, to file)

stor – store data using an established connection

syst – get system version

type – change data type: ascii or image binary types only

user, pass – user login
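
As a rough illustration of how these methods fit together in one session, the following Python sketch uses the standard ftplib module. The host name myhost, port 2100, folder /public, and resource name example.xml are placeholders, not values taken from this chapter, and the sketch assumes that the server's nlst reply contains bare resource names.

```python
from ftplib import FTP
from io import BytesIO


def fetch_resource(ftp, folder, name):
    """Download one resource using the cwd, nlst, type and retr methods."""
    ftp.cwd(folder)                      # cwd: change working directory
    if name not in ftp.nlst():           # nlst: list names in the working directory
        raise FileNotFoundError(name)
    buffer = BytesIO()
    # retrbinary sends "TYPE I" (image/binary) and then the RETR command.
    ftp.retrbinary(f"RETR {name}", buffer.write)
    return buffer.getvalue()


def main():
    """Open a session, fetch one resource, and close the session."""
    ftp = FTP()
    ftp.connect("myhost", 2100)          # placeholder host and FTP port
    ftp.login("username", "password")    # user, pass (sent in clear text)
    try:
        data = fetch_resource(ftp, "/public", "example.xml")
        print(f"retrieved {len(data)} bytes")
    finally:
        ftp.quit()                       # quit: close connection and end the session
```

Calling main() runs the whole session against a reachable server. fetch_resource accepts any object that offers cwd, nlst, and retrbinary, so it can also be exercised without a network connection.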

FTP Quote Methods

Oracle Database supports several FTP quote methods, which provide information directly to Oracle XML DB.

rm_r – Remove a file or folder. If the target is a folder, recursively remove all files and folders that it contains.
quote rm_r

rm_f – Forcibly remove a resource.
quote rm_f

rm_rf – Combines rm_r and rm_f: Forcibly and recursively removes files and folders.
quote rm_rf

set_nls_locale – Specify the character-set encoding to be used for file and directory names in FTP methods (including names in method responses).
quote set_nls_locale { | NULL}

Only IANA character-set names can be specified. If nls_locale is set to NULL or is not set, then the database character set is used.

set_charset – Specify the character set of the data to be sent to the server.
quote set_charset { | NULL}

The set_charset method applies to only text files, not binary files, as determined by the file-extension mapping to MIME types that is defined in configuration file xdbconfig.xml.

If the parameter provided to set_charset is not NULL, then it specifies the character set of the data.

If the parameter provided to set_charset is NULL, or if no set_charset command is given, then the MIME type of the data determines the character set for the data.

If the MIME type is not text/xml, then the data is not assumed to be XML. The database character set is used.

If the MIME type is text/xml, then the data represents an XML document.

If a byte order mark (BOM) is present in the XML document, then it determines the character set of the data.

If there is no BOM, then:

If there is an encoding declaration in the XML document, then it determines the character set of the data.

If there is no encoding declaration, then the UTF-8 character set is used.
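
The decision rules above for set_charset can be summarized in a short Python sketch. It is only an illustration of the stated order of checks, not the server's implementation. The function name, the returned charset labels, and the treatment of only UTF-8 and UTF-16 byte order marks are assumptions of the sketch.

```python
import codecs
import re

# Byte order marks recognized by this sketch and the charset each implies (assumed labels).
_BOM_CHARSETS = (
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16"),
    (codecs.BOM_UTF16_BE, "UTF-16"),
)

# Matches an encoding declaration such as <?xml version="1.0" encoding="EUC-JP"?>.
_ENC_DECL = re.compile(rb'<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def ftp_data_charset(set_charset_arg, mime_type, data, db_charset):
    """Return the charset assumed for text data stored through FTP."""
    if set_charset_arg is not None:
        return set_charset_arg           # explicit set_charset value wins
    if mime_type != "text/xml":
        return db_charset                # not XML: database character set
    for bom, charset in _BOM_CHARSETS:
        if data.startswith(bom):
            return charset               # BOM decides
    match = _ENC_DECL.match(data)
    if match:
        return match.group(1).decode("ascii")   # encoding declaration decides
    return "UTF-8"                       # no BOM, no declaration
```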
Using FTP with ASM Files

Automatic Storage Management (ASM) organizes database files into disk groups for simplified management and added benefits such as database mirroring and I/O balancing. Database administrators can use protocols and resource APIs to access ASM files in the Oracle XML DB repository virtual folder /sys/asm. All files in /sys/asm are binary.

Typical uses are listing, copying, moving, creating, and deleting ASM files and folders. Example 28-1 is an example of navigating the ASM virtual folder and listing the files in a subfolder.

Example 28-1 Navigating ASM Folders

The structure of the ASM virtual folder, /sys/asm, is described in Chapter 21, "Accessing Oracle XML DB Repository Data". In this example, the disk groups are DATA and RECOVERY; the database name is MFG; and the directories created for aliases are dbs and tmp. This example navigates to a subfolder, lists its files, and copies a file to the local file system.
ftp> open myhost 7777
ftp> user system
Password required for SYSTEM
Password: password
ftp> cd /sys/asm
ftp> ls
ftp> cd DATA
ftp> ls
ftp> cd dbs
ftp> ls
ftp> binary
ftp> get t_dbl.f, t_axl.f
ftp> put my_db2.f

In this example, after connecting to and logging onto database myhost (first three lines), FTP methods cd and ls are used to navigate and list folders, respectively. When in folder /sys/asm/DATA/dbs, FTP command get is used to copy files t_db1.f and t_ax1.f to the current folder of the local file system. Then, FTP command put is used to copy file my_db2.f from the local file system to folder /sys/asm/DATA/dbs.

Database administrators can copy ASM files from one database server to another, as well as between the database and a local file system. Example 28-2 shows copying between two databases. For this, the proxy FTP client method can be used, if available. The proxy method provides a direct connection to two different remote FTP servers.

Example 28-2 copies an ASM file from one database to another. Terms with the suffix 1 correspond to database server1; terms with the suffix 2 correspond to database server2. Note that, depending on your FTP client, the passwords you type might be echoed on your screen. Take the necessary precautions so that others do not see these passwords.

Example 28-2 Transferring ASM Files Between Databases with FTP proxy Method
1 ftp> open server1 port1
2 ftp> user username1
3 Password required for USERNAME1
4 Password: password-for-username1
5 ftp> cd /sys/asm/DATAFILE/MFG/DATAFILE
6 ftp> proxy open server2 port2
7 ftp> proxy user username2
8 Password required for USERNAME2
9 Password: password-for-username2
10 ftp> proxy cd /sys/asm/DATAFILE/MFG/DATAFILE
11 ftp> proxy put dbs2.f tmp1.f
12 ftp> proxy get dbs1.f tmp2.f

In this example:

Line 1 opens an FTP control connection to the Oracle XML DB FTP server, server1.

Lines 2–4 log the DBA onto server1 as USERNAME1.

Line 5 navigates to /sys/asm/DATAFILE/MFG/DATAFILE on server1.

Line 6 opens an FTP control connection to the second database server, server2. At this point, the FTP command proxy ? could be issued to see the available FTP commands on the secondary connection. (This is not shown.)

Lines 7–9 log the DBA onto server2 as USERNAME2.

Line 10 navigates to /sys/asm/DATAFILE/MFG/DATAFILE on server2.

Line 11 copies ASM file dbs2.f from server2 to ASM file tmp1.f on server1.

Line 12 copies ASM file dbs1.f from server1 to ASM file tmp2.f on server2.
Using FTP on the Standard Port Instead of the Oracle XML DB Default Port

You can use the Oracle XML DB configuration file, /xdbconfig.xml, to configure FTP to listen on any port. By default, FTP listens on a nonstandard, unprotected port. To use FTP on the standard port, 21, your DBA must do the following:

(UNIX only) Use this shell command to ensure that the owner and group of executable file tnslsnr are root:
% chown root:root $ORACLE_HOME/bin/tnslsnr

(UNIX only) Add the following entry to the listener file, listener.ora, where hostname is your host name:
(ADDRESS = (PROTOCOL = TCP) (HOST = hostname) (PORT = 21))

(UNIX only) Stop, then restart the listener, using the following shell commands, where user_id and group_id are your UNIX user and group identifiers, respectively:
% lsnrctl stop
% tnslsnr LISTENER -user user_id -group group_id &

Use the ampersand (&) to execute the second command in the background. Do not use lsnrctl start to start the listener.

Use PL/SQL procedure DBMS_XDB.setFTPPort with SYS as SYSDBA to set the FTP port number to 21 in the Oracle XML DB configuration file /xdbconfig.xml:
SQL> exec DBMS_XDB.setFTPPort(21);

Force the database to reregister with the listener, using this SQL statement:

Check that the listener is correctly configured, using this shell command:
% lsnrctl status

FTP Server Session Management

Oracle XML DB protocol server also provides session management for this protocol. After a short wait for a new command, FTP returns to the protocol layer and the shared server is freed up to serve other connections. The duration of this short wait is configurable by changing parameter call-timeout in the Oracle XML DB configuration file. For high-traffic sites, call-timeout should be shorter, so that more connections can be served. When new data arrives on the connection, the FTP server is re-invoked with the fresh data. So, the long-running nature of FTP does not affect the number of connections that can be made to the protocol server.

See Also:
"Configuring Oracle XML DB Using xdbconfig.xml" for information about configuring Oracle XML DB parameters
Handling Error 421. Modifying the Default Timeout Value of an FTP Session

If you are frequently disconnected from the server and must reconnect and traverse the entire directory before the next operation, you might need to modify the default timeout value for FTP sessions. A session that is idle for longer than this period is disconnected. You can increase the timeout value (default = 6000 centiseconds, that is, 60 seconds) by modifying the configuration document as follows and then restarting the database:

Example 28-3 Modifying the Default Timeout Value of an FTP Session
newconfig XMLType;
INTO newconfig
FTP Client Failure in Passive Mode

Do not use FTP in passive mode to connect remotely to a server that has HOSTNAME configured in listener.ora as localhost or as a loopback address. If the HOSTNAME specified in server file listener.ora is localhost or a loopback address, then the server is configured for local use only. If you try to connect remotely to such a server using FTP in passive mode, the FTP client fails. This is because the server passes the loopback IP address (derived from HOSTNAME) to the client, which makes the client try to connect to itself, not to the server.
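
One way to recognize this situation from the client side is to inspect the server's reply to the pasv method. The Python sketch below is an illustration only. It uses ftplib.parse227 from the standard library to extract the advertised address and reports whether that address is a loopback address such as 127.0.0.1. The sample reply strings in the usage note are made up for the example.

```python
from ftplib import parse227
from ipaddress import ip_address


def pasv_target(reply):
    """Parse a 227 reply and report whether the data address is loopback.

    Returns (is_loopback, host, port).
    """
    host, port = parse227(reply)         # e.g. ("127.0.0.1", 2100)
    return ip_address(host).is_loopback, host, port
```

For example, pasv_target("227 Entering Passive Mode (127,0,0,1,8,52)") reports a loopback target, which a remote client cannot use to reach the server, whereas a reply that advertises a routable address does not.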
Using HTTP(S) and Oracle XML DB Protocol Server

Oracle XML DB implements HyperText Transfer Protocol (HTTP) version 1.1, as defined in the RFC 2616 specification.
Oracle XML DB Protocol Server: HTTP(S) Features

The Oracle XML DB HTTP(S) component in the Oracle XML DB protocol server implements the RFC2616 specification with the exception of the following optional features:

gzip and compress transfer encodings

byte-range headers

The TRACE method (used for proxy error debugging)

Cache-control directives (these require you to specify expiration dates for content, and are not generally used)

TE, Trailer, Vary & Warning headers

Weak entity tags

Web common log format

Multi-homed Web server

HTTP(S) Features That Are Not Supported

Digest Authentication (RFC 2617) is not supported. Oracle XML DB supports Basic Authentication, where a client sends the user name and password in clear text in the Authorization header.
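
To make the clear-text point concrete, the following Python sketch builds and then reverses a Basic Authentication header. The credentials are placeholders. Base64 is an encoding, not encryption, so anyone who can read the header can recover the password, which is one reason to prefer HTTPS.

```python
import base64


def basic_auth_header(user, password):
    """Build the value of an Authorization header for Basic Authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def decode_basic_auth(header_value):
    """Recover (user, password) from a Basic Authorization header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authentication header")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password
```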
HTTP(S) Client Methods That Are Supported

For access to the repository, Oracle XML DB supports the following HTTP(S) client methods.

OPTIONS – get information about available communication options

GET – get document/data (including headers)

HEAD – get headers only, without document body

PUT – store data in resource

DELETE – delete resource

The semantics of these HTTP(S) methods are in accordance with WebDAV. Servlets and Web services may support additional HTTP(S) methods, such as POST.

Using HTTP(S) on a Standard Port Instead of an Oracle XML DB Default Port

You can use the Oracle XML DB configuration file, /xdbconfig.xml, to configure HTTP(S) to listen on any port. By default, HTTP(S) listens on a nonstandard, unprotected port. To use HTTP or HTTPS on a standard port (80 for HTTP, 443 for HTTPS), your DBA must do the following:

(UNIX only) Use this shell command to ensure that the owner and group of executable file tnslsnr are root:
% chown root:root $ORACLE_HOME/bin/tnslsnr

(UNIX only) Add the following entry to the listener file, listener.ora, where hostname is your host name, and port_number is 80 for HTTP or 443 for HTTPS:
(ADDRESS = (PROTOCOL = TCP) (HOST = hostname) (PORT = port_number))

(UNIX only) Stop, then restart the listener, using the following shell commands, where user_id and group_id are your UNIX user and group identifiers, respectively:
% lsnrctl stop
% tnslsnr LISTENER -user user_id -group group_id &

Use the ampersand (&) to execute the second command in the background. Do not use lsnrctl start to start the listener.

Use PL/SQL procedure DBMS_XDB.setHTTPPort with SYS as SYSDBA to set the HTTP(S) port number to port_number in the Oracle XML DB configuration file /xdbconfig.xml, where port_number is 80 for HTTP or 443 for HTTPS:
SQL> exec DBMS_XDB.setHTTPPort(port_number);

Force the database to reregister with the listener, using this SQL statement:

Check that the listener is correctly configured:
% lsnrctl status

HTTPS: Support for Secure HTTP

If properly configured, you can access Oracle XML DB Repository in a secure fashion, using HTTPS. See "Configuring Secure HTTP (HTTPS)" for configuration information.

If Oracle Database is installed on Microsoft Windows XP with Service Pack 2 (SP2), then you must use HTTPS for WebDAV access to Oracle XML DB Repository, or else you must make appropriate modifications to the Windows XP Registry. For information about the latter, see http://www.microsoft.com/technet/prodtechnol/winxppro/maintain/sp2netwk.mspx#XSLTsection129121120120
Anonymous Access to Oracle XML DB Repository using HTTP

Configuration parameter allow-repository-anonymous-access controls whether or not anonymous HTTP access to Oracle XML DB Repository data is allowed using an unlocked ANONYMOUS user account. The default value is false, meaning that unauthenticated access to repository data is blocked. To allow anonymous HTTP access to the repository, you must set this parameter to true, and unlock the ANONYMOUS user account.

There is an inherent security risk associated with allowing anonymous access to the repository.

Parameter allow-repository-anonymous-access does not control anonymous access to the repository using servlets. Each servlet has its own security-role-ref parameter value to control its access.

Using Java Servlets with HTTP(S)

Oracle XML DB supports Java servlets. To use a Java servlet, it must be registered with a unique name in the Oracle XML DB configuration file, along with parameters to customize its action. It should be compiled, and loaded into the database. Finally, the servlet name must be associated with a pattern, which can be an extension such as *.jsp or a path name such as /a/b/c or /sys/*, as described in Java servlet application program interface (API) version 2.2.

While processing an HTTP(S) request, the path name for the request is matched with the registered patterns. If there is a match, then the protocol server invokes the corresponding servlet with the appropriate initialization parameters. For Java servlets, the existing Java Virtual Machine (JVM) infrastructure is used. This starts the JVM if need be, which in turn runs a Java method to initialize the servlet, create response and request objects, pass these on to the servlet, and run it.
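
The pattern matching can be pictured with a small Python sketch. It follows the general precedence used in the Java servlet specification (exact path first, then the longest /path/* prefix, then a *.ext extension). Whether the protocol server applies exactly this precedence is an assumption, and the servlet names are hypothetical.

```python
def match_servlet(path, patterns):
    """Return the servlet name registered for path, or None.

    patterns maps a URL pattern (exact path, "/prefix/*", or "*.ext")
    to a servlet name.
    """
    if path in patterns:                              # 1. exact match
        return patterns[path]
    prefixes = [p for p in patterns if p.endswith("/*")]
    for pattern in sorted(prefixes, key=len, reverse=True):
        base = pattern[:-2]                           # strip the trailing "/*"
        if path == base or path.startswith(base + "/"):
            return patterns[pattern]                  # 2. longest path prefix
    last_segment = path.rsplit("/", 1)[-1]
    if "." in last_segment:
        extension = last_segment.rsplit(".", 1)[-1]
        return patterns.get("*." + extension)         # 3. extension match
    return None
```

With patterns such as {"/a/b/c": "ExactServlet", "/sys/*": "SysServlet", "*.jsp": "JspServlet"}, a request for /sys/schemas/x.jsp goes to SysServlet, because the prefix rule is tried before the extension rule.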

Embedded PL/SQL Gateway

You can use the PL/SQL gateway to implement a Web application entirely in PL/SQL. There are two implementations of the PL/SQL gateway:

mod_plsql – a plug-in of Oracle HTTP Server that lets you invoke PL/SQL stored procedures using HTTP(S). Oracle HTTP Server is a component of both Oracle Application Server and Oracle Database; it should not be confused with the HTTP component of the Oracle XML DB protocol server.

the embedded PL/SQL gateway – a gateway implementation that runs in the Oracle XML DB HTTP listener.

With the PL/SQL gateway (either implementation), a Web browser sends an HTTP(S) request in the form of a URL that identifies a stored procedure and provides it with parameter values. The gateway translates the URL, calls the stored procedure with the parameter values, and returns output (typically HTML) to the Web-browser client.

Using the embedded PL/SQL gateway simplifies installation, configuration, and administration of PL/SQL based Web applications. The embedded gateway uses the Oracle XML DB protocol server, not Oracle HTTP Server. Its configuration is defined by the Oracle XML DB configuration file, /xdbconfig.xml. However, the recommended way to configure the embedded gateway is to use the procedures in PL/SQL package DBMS_EPG, not to edit file /xdbconfig.xml.

Sending Multibyte Data From a Client

When a client sends multibyte data in a URL, RFC 2718 specifies that the client should send the URL using the %HH format, where HH is the hexadecimal notation of the byte value in UTF-8 encoding. The following are URL examples that can be sent to Oracle XML DB in an HTTP(S) or WebDAV context:

Oracle XML DB processes the requested URL, any URLs within an IF header, any URLs within the DESTINATION header, and any URLs in the REFERRED header that contain multibyte data.

The default-url-charset configuration parameter can be used to accept requests from some clients that use other, nonconforming, forms of URL, with characters that are not ASCII. If a request with such characters fails, try setting this value to the native character set of the client environment. The character set used in such URL fields must be specified with an IANA charset name.

default-url-charset controls the encoding for nonconforming URLs. It is not required to be set unless a nonconforming client that does not send the Content-Type charset is used.

Characters That Are Not ASCII In URLs

Characters that are not ASCII that appear in URLs passed to an HTTP server should be converted to UTF-8 and escaped in the %HH format, where HH is the hexadecimal notation of the byte value. For flexibility, the Oracle XML DB protocol server interprets each incoming URL by testing whether it is encoded in one of the following character sets, in the order presented here:


Charset parameter of the Content-Type field of the request, if specified

Character set, if specified, in the default-url-charset configuration parameter

Character set of the database
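
The %HH escaping and the ordered decoding attempts can be sketched in Python as follows. This is an illustration, not the server's code: the function names and the sample path are made up, and the caller supplies the candidate character sets in the order in which they should be tried.

```python
from urllib.parse import quote, unquote_to_bytes


def escape_path(path, charset="utf-8"):
    """Escape a URL path in %HH form; a conforming client uses UTF-8."""
    return quote(path, safe="/", encoding=charset)


def decode_path(escaped, candidate_charsets):
    """Undo %HH escaping, then try each candidate charset in order.

    Entries that are None (not specified) are skipped.
    Returns (decoded_path, charset_used).
    """
    raw = unquote_to_bytes(escaped)
    for charset in candidate_charsets:
        if charset is None:
            continue
        try:
            return raw.decode(charset), charset
        except (UnicodeDecodeError, LookupError):
            continue                     # not valid in this charset: try the next
    raise ValueError("URL is not valid in any candidate character set")
```

A nonconforming client that escapes the bytes of its native character set instead of UTF-8 produces a URL that fails the UTF-8 attempt and is picked up by a later candidate, which is the role that default-url-charset plays.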

Controlling Character Sets for HTTP(S)

The following sections describe how character sets are controlled for data transferred using HTTP(S).
Request Character Set

The character set of the HTTP(S) request body is determined with the following algorithm:

The Content-Type header is evaluated. If the Content-Type header specifies a charset value, the specified charset is used.

The MIME type of the document is evaluated as follows:

If the MIME type is "*/xml", the character set is determined as follows:

- If a BOM is present, then UTF-16 is used.

- If an encoding declaration is present, the specified encoding is used.

- If neither a BOM nor an encoding declaration is present, UTF-8 is used.

If the MIME type is text, ISO8859-1 is used.

If the MIME type is neither "*/xml" nor text, the database character set is used.

There is a difference between HTTP(S) and SQL or FTP. For text documents, the default is ISO8859-1, as specified by the IETF.org RFC 2616: HTTP 1.1 Protocol Specification.
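
The request-body algorithm above can be written as a short Python function. This is a sketch that follows the steps as stated, not the server's implementation. The function names are hypothetical, "*/xml" is read as any MIME type whose subtype is xml, and only UTF-16 byte order marks are checked, which is an assumption.

```python
import codecs
import re

_UTF16_BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
_ENC_DECL = re.compile(rb'<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def _split_content_type(value):
    """Return (mime_type, charset or None) from a Content-Type header value."""
    parts = [part.strip() for part in value.split(";")]
    charset = None
    for param in parts[1:]:
        name, _, val = param.partition("=")
        if name.strip().lower() == "charset" and val.strip():
            charset = val.strip().strip('"')
    return parts[0].lower(), charset


def request_charset(content_type, body, db_charset):
    """Return the charset assumed for an HTTP(S) request body."""
    mime_type, charset = _split_content_type(content_type)
    if charset:
        return charset                   # step 1: Content-Type charset
    main_type, _, subtype = mime_type.partition("/")
    if subtype == "xml":                 # step 2: "*/xml"
        if body.startswith(_UTF16_BOMS):
            return "UTF-16"
        match = _ENC_DECL.match(body)
        if match:
            return match.group(1).decode("ascii")
        return "UTF-8"
    if main_type == "text":
        return "ISO8859-1"               # HTTP default for text documents
    return db_charset                    # anything else
```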
Response Character Set

The response generated by Oracle XML DB HTTP Server is in the character set specified in the Accept-Charset field of the request. Accept-Charset can have a list of character sets. Based on the q-value, Oracle XML DB chooses one that does not require conversion. This might not necessarily be the charset with the highest q-value. If Oracle XML DB cannot find one, then the conversion is based on the highest q-value.
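
A loose Python sketch of this selection follows. It assumes that "does not require conversion" means the accepted charset equals the charset in which the content is already held, here called stored_charset. That parameter, the function names, and the fallback when the header lists nothing usable are all assumptions of the sketch. Wildcard entries (*) are not handled.

```python
def parse_accept_charset(header):
    """Return a list of (charset, q) pairs in header order; q defaults to 1.0."""
    preferences = []
    for item in header.split(","):
        fields = [field.strip() for field in item.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0              # unparsable q-value: treat as not acceptable
        preferences.append((fields[0].lower(), q))
    return preferences


def choose_response_charset(header, stored_charset):
    """Prefer an accepted charset that needs no conversion, else the highest q."""
    accepted = [(name, q) for name, q in parse_accept_charset(header) if q > 0]
    if not accepted:
        return stored_charset.lower()    # assumption: nothing usable was listed
    for name, _ in accepted:
        if name == stored_charset.lower():
            return name                  # no conversion needed
    return max(accepted, key=lambda pair: pair[1])[0]
```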
Using WebDAV and Oracle XML DB

Web Distributed Authoring and Versioning (WebDAV) is an IETF standard protocol used to provide users with a file-system interface to Oracle XML DB Repository over the Internet. The most popular way of accessing a WebDAV server folder is through WebFolders on Microsoft Windows 2000 or Microsoft NT.

WebDAV is an extension to the HTTP 1.1 protocol that lets an HTTP server act as a file server. It lets clients perform remote Web content authoring through a coherent set of methods, headers, request body formats and response body formats. For example, a DAV-enabled editor can interact with an HTTP/WebDAV server as if it were a file system. WebDAV provides operations to store and retrieve resources, create and list contents of resource collections, lock resources for concurrent access in a coordinated manner, and to set and retrieve resource properties.
Oracle XML DB WebDAV Features

Oracle XML DB supports the following WebDAV features:

Foldering, specified by RFC2518

Access Control

WebDAV is a set of extensions to the HTTP(S) protocol that allow you to edit or manage your files on remote Web servers. WebDAV can also be used, for example, to:

Share documents over the Internet

Edit content over the Internet

WebDAV Features That Are Not Supported

Oracle XML DB supports the contents of RFC2518, with the following exceptions:

Lock-NULL resources create zero-length resources in the file system, and cannot be converted to folders.

The COPY, MOVE and DELETE methods comply with section 2 of the Internet Draft titled 'Binding Extensions to WebDAV'.

Depth-infinity locks

Only Basic Authentication is supported.
Supported WebDAV Client Methods

For access to the repository, Oracle XML DB supports the following HTTP(S)/WebDAV client methods.

PROPFIND (WebDAV-specific) – get properties for a resource

PROPPATCH (WebDAV-specific) – set or remove resource properties

LOCK (WebDAV-specific) – lock a resource (create or refresh a lock)

UNLOCK (WebDAV-specific) – unlock a resource (remove a lock)

COPY (WebDAV-specific) – copy a resource

MOVE (WebDAV-specific) – move a resource

MKCOL (WebDAV-specific) – create a folder resource (collection)

Using WebDAV with Microsoft Windows XP SP2

If Oracle Database is installed on Microsoft Windows XP with Service Pack 2 (SP2), then you must use a secure connection (HTTPS) for WebDAV access to Oracle XML DB Repository, or else you must make appropriate modifications to the Windows XP Registry.

Using Oracle XML DB and WebDAV: Creating a WebFolder in Microsoft Windows

To create a WebFolder in Windows 2000, follow these steps:

Start > My Network Places.

Double-click Add Network Place.

Click Next.

Type the location of the folder, for example:

See Figure 28-2.

Click Next.

Enter any name to identify this WebFolder.

Click Finish.

You can now access Oracle XML DB Repository the same way that you access any Windows folder.

