Linux QoS configuration script
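I set up QoS on my Linux router, and everything feels nice and snappy now. Then again, this topic probably holds zero interest for anyone who doesn't use Linux (lol). I'm posting the script I wrote below, so if you want to use QoS, feel free to use it as a reference. After running the script, enter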
tc -s class ls dev eth0
and you can see HTTP, FTP and the other traffic being controlled.
=== Script starts here ===
#!/bin/sh
################
# Reset the rules (an error here only means that no qdisc was attached yet)
/sbin/tc qdisc del dev eth0 root
/sbin/tc qdisc del dev eth1 root
##############################
# Create the root qdiscs and the parent classes
## Attach cbq as the root qdisc of eth0, with handle 10 ##
/sbin/tc qdisc add dev eth0 root handle 10: cbq bandwidth 10Mbit avpkt 1000 cell 8
## Attach cbq as the root qdisc of eth1, with handle 11 ##
/sbin/tc qdisc add dev eth1 root handle 11: cbq bandwidth 10Mbit avpkt 1000 cell 8
## Create a 10Mbit/sec class with priority 8 (classid 10:1)
## From here on, classes whose parent is 10:1 can use up to
## 10Mbit/sec of bandwidth.
/sbin/tc class add dev eth0 parent 10:0 classid 10:1 cbq bandwidth 10Mbit rate 10Mbit allot 1514 cell 8 weight 1Mbit prio 8 maxburst 20 avpkt 1000
## Create a 10Mbit/sec class with priority 8 (classid 11:1)
## From here on, classes whose parent is 11:1 can use up to
## 10Mbit/sec of bandwidth.
/sbin/tc class add dev eth1 parent 11:0 classid 11:1 cbq bandwidth 10Mbit rate 10Mbit allot 1514 cell 8 weight 1Mbit prio 8 maxburst 20 avpkt 1000
##################################
# Create the bandwidth-control classes on eth0
## Create a 10Mbit/sec class with priority 1, classid 10:61, parent 10:1, ##
## and attach a tbf qdisc to it ##
/sbin/tc class add dev eth0 parent 10:1 classid 10:61 cbq bandwidth 10Mbit rate 10Mbit allot 1514 cell 8 weight 1Mbit prio 1 maxburst 20 avpkt 1000 bounded
/sbin/tc qdisc add dev eth0 parent 10:61 tbf rate 10Mbit buffer 10Kb/8 limit 15Kb
## Create a 10Mbit/sec class with priority 3, classid 10:63, parent 10:1, ##
## and attach a tbf qdisc to it ##
/sbin/tc class add dev eth0 parent 10:1 classid 10:63 cbq bandwidth 10Mbit rate 10Mbit allot 1514 cell 8 weight 1Mbit prio 3 maxburst 20 avpkt 1000 bounded
/sbin/tc qdisc add dev eth0 parent 10:63 tbf rate 10Mbit buffer 10Kb/8 limit 15Kb
## Create a 10Mbit/sec class with priority 5, classid 10:65, parent 10:1, ##
## and attach a tbf qdisc to it ##
/sbin/tc class add dev eth0 parent 10:1 classid 10:65 cbq bandwidth 10Mbit rate 10Mbit allot 1514 cell 8 weight 1Mbit prio 5 maxburst 20 avpkt 1000 bounded
/sbin/tc qdisc add dev eth0 parent 10:65 tbf rate 10Mbit buffer 10Kb/8 limit 15Kb
## Create a class with priority 7, classid 10:67, parent 10:1, ##
## and attach a tbf qdisc to it; this class is limited to 224Kbit/sec ##
/sbin/tc class add dev eth0 parent 10:1 classid 10:67 cbq bandwidth 10Mbit rate 224Kbit allot 1514 cell 8 weight 22Kbit prio 7 maxburst 20 avpkt 1000 bounded
/sbin/tc qdisc add dev eth0 parent 10:67 tbf rate 224Kbit buffer 10Kb/8 limit 15Kb
##################################
# Create the bandwidth-control classes on eth1
## Create a 10Mbit/sec class with priority 1, classid 11:61, parent 11:1, ##
## and attach a tbf qdisc to it ##
/sbin/tc class add dev eth1 parent 11:1 classid 11:61 cbq bandwidth 10Mbit rate 10Mbit allot 1514 cell 8 weight 1Mbit prio 1 maxburst 20 avpkt 1000 bounded
/sbin/tc qdisc add dev eth1 parent 11:61 tbf rate 10Mbit buffer 10Kb/8 limit 15Kb
## Create a 10Mbit/sec class with priority 3, classid 11:63, parent 11:1, ##
## and attach a tbf qdisc to it ##
/sbin/tc class add dev eth1 parent 11:1 classid 11:63 cbq bandwidth 10Mbit rate 10Mbit allot 1514 cell 8 weight 1Mbit prio 3 maxburst 20 avpkt 1000 bounded
/sbin/tc qdisc add dev eth1 parent 11:63 tbf rate 10Mbit buffer 10Kb/8 limit 15Kb
## Create a 10Mbit/sec class with priority 5, classid 11:65, parent 11:1, ##
## and attach a tbf qdisc to it ##
/sbin/tc class add dev eth1 parent 11:1 classid 11:65 cbq bandwidth 10Mbit rate 10Mbit allot 1514 cell 8 weight 1Mbit prio 5 maxburst 20 avpkt 1000 bounded
/sbin/tc qdisc add dev eth1 parent 11:65 tbf rate 10Mbit buffer 10Kb/8 limit 15Kb
## Create a 10Mbit/sec class with priority 7, classid 11:67, parent 11:1, ##
## and attach a tbf qdisc to it ##
/sbin/tc class add dev eth1 parent 11:1 classid 11:67 cbq bandwidth 10Mbit rate 10Mbit allot 1514 cell 8 weight 1Mbit prio 7 maxburst 20 avpkt 1000 bounded
/sbin/tc qdisc add dev eth1 parent 11:67 tbf rate 10Mbit buffer 10Kb/8 limit 15Kb
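The eth0 and eth1 class blocks above repeat one pattern: four CBQ classes with prio 1, 3, 5 and 7 under the parent class, each with its own tbf qdisc. They differ only in the device, the handle major number, and the prio 7 class, which is limited to 224Kbit on eth0 but not on eth1. As a sketch (the function name make_classes and the DRY_RUN switch are assumptions, not part of the original script), a loop can generate them; by default it only prints the commands, so the output can be compared with the script before anything is applied.

```shell
#!/bin/sh
# Sketch: generate the per-interface CBQ classes and tbf qdiscs shown above.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 runs them (as root).
TC=${TC:-/sbin/tc}

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# make_classes DEV MAJOR BULK_RATE BULK_WEIGHT
#   DEV          interface (eth0 or eth1)
#   MAJOR        root qdisc handle (10 for eth0, 11 for eth1)
#   BULK_RATE    rate of the prio 7 class
#   BULK_WEIGHT  weight of the prio 7 class (about BULK_RATE / 10)
make_classes() {
    dev=$1; major=$2
    for prio in 1 3 5 7; do
        rate=10Mbit; weight=1Mbit
        if [ "$prio" = 7 ]; then rate=$3; weight=$4; fi
        # Class minor numbers 61, 63, 65, 67 encode the priority.
        run $TC class add dev "$dev" parent "$major:1" classid "$major:6$prio" \
            cbq bandwidth 10Mbit rate "$rate" allot 1514 cell 8 \
            weight "$weight" prio "$prio" maxburst 20 avpkt 1000 bounded
        run $TC qdisc add dev "$dev" parent "$major:6$prio" \
            tbf rate "$rate" buffer 10Kb/8 limit 15Kb
    done
}

make_classes eth0 10 224Kbit 22Kbit
make_classes eth1 11 10Mbit 1Mbit
```

The printed lines should match the eth0 and eth1 class sections of the script one for one; only then is it worth rerunning with DRY_RUN=0.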
##########################################
# Define which traffic goes to each bandwidth class
## DNS
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport 53 0xffff flowid 10:61
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport 53 0xffff flowid 11:61
## SSH
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport 22 0xffff flowid 10:63
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport 22 0xffff flowid 11:63
## HTTP
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport 80 0xffff flowid 10:63
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport 80 0xffff flowid 11:63
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport 443 0xffff flowid 10:63
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport 443 0xffff flowid 11:63
## MAIL
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport 25 0xffff flowid 10:63
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport 25 0xffff flowid 11:63
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport 110 0xffff flowid 10:63
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport 110 0xffff flowid 11:63
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport 143 0xffff flowid 10:63
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport 143 0xffff flowid 11:63
## NTP
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport 123 0xffff flowid 10:63
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport 123 0xffff flowid 11:63
## FTP
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport 20 0xffff flowid 10:65
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport 20 0xffff flowid 11:65
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dport 21 0xffff flowid 10:65
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip sport 21 0xffff flowid 11:65
## Everything else
/sbin/tc filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dst any flowid 10:67
/sbin/tc filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip dst any flowid 11:67
=== End of script ===
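The filter section of the script repeats one pattern per service: on eth0 the filter matches the destination port, on eth1 the source port, and both send the packet to the class with the same minor number. Here is a sketch that generates the same filters from a port list; port_filter, all_filters and DRY_RUN are hypothetical names, while the port-to-class mapping is copied from the script.

```shell
#!/bin/sh
# Sketch: generate the u32 port filters of the script above.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 runs them (as root).
TC=${TC:-/sbin/tc}

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# port_filter PORT MINOR: eth0 matches the destination port, eth1 the source
# port, and both send the packet to class <major>:MINOR.
port_filter() {
    run $TC filter add dev eth0 parent 10:0 protocol ip prio 100 u32 \
        match ip dport "$1" 0xffff flowid "10:$2"
    run $TC filter add dev eth1 parent 11:0 protocol ip prio 100 u32 \
        match ip sport "$1" 0xffff flowid "11:$2"
}

all_filters() {
    port_filter 53 61                          # DNS
    for p in 22 80 443 25 110 143 123; do      # SSH, HTTP(S), mail, NTP
        port_filter "$p" 63
    done
    for p in 20 21; do                         # FTP
        port_filter "$p" 65
    done
    # Catch-all, added last as in the original script.
    run $TC filter add dev eth0 parent 10:0 protocol ip prio 100 u32 match ip dst any flowid 10:67
    run $TC filter add dev eth1 parent 11:0 protocol ip prio 100 u32 match ip dst any flowid 11:67
}

all_filters
```

With the default DRY_RUN=1 this prints the script's 22 filter commands in the same order, so adding a service becomes a one-line change to the port list.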
2007/11/30
Full NAT example with QoS - Linux Advanced Routing & Traffic Control HOWTO
Linux Advanced Routing & Traffic Control HOWTO
Chapter 15. Cookbook
15.10. Full NAT example with QoS
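I'm Pedro Larroy. Here I'll describe a common setup: a private network with many users connected to the Internet through a Linux router that has a public IP address and performs network address translation (NAT). I use this QoS setup to give Internet access to 198 users in a university dorm (I'm one of them, and also the admin). All of them are heavy peer-to-peer users, so proper traffic control is a must. I hope this serves as a practical example for interested lartc readers.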
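First I'll take a practical, step-by-step approach, and at the end I'll explain how to make the whole process run automatically at boot. The network this example applies to is a private LAN connected to the Internet through a Linux router with a single public IP address. Extending it to several public addresses is very easy: you only need a few more iptables rules. To get this working you need the following: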
Linux kernel 2.4.18 or later installed
If you use 2.4.18, you need to apply the HTB patch.
iproute
A tc binary with HTB support; a precompiled binary is distributed with HTB.
iptables
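15.10.1. Let's start optimizing that scarce bandwidth
First we set up some qdiscs in which we will classify the traffic. We create an htb qdisc with 6 classes of ascending priority. Each class always gets its allocated rate and can also use bandwidth that other classes don't need; classes with higher priority (that is, a lower prio number) get the spare bandwidth first. Our connection is an ADSL line with 2Mb down and 300kbit/s up. I use 240kbit/s as the ceiling because above that, latency starts to grow, probably because some buffer somewhere along the link fills up. Tune this parameter experimentally, raising and lowering it while watching the latency to some nearby hosts.
Adjust CEIL to 75% of your upstream bandwidth limit, and wherever eth0 appears, substitute the public interface you use to reach the Internet. To begin, run the following in a root shell:
CEIL=240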
tc qdisc add dev eth0 root handle 1: htb default 15
tc class add dev eth0 parent 1: classid 1:1 htb rate ${CEIL}kbit ceil ${CEIL}kbit
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 80kbit ceil 80kbit prio 0
tc class add dev eth0 parent 1:1 classid 1:11 htb rate 80kbit ceil ${CEIL}kbit prio 1
tc class add dev eth0 parent 1:1 classid 1:12 htb rate 20kbit ceil ${CEIL}kbit prio 2
tc class add dev eth0 parent 1:1 classid 1:13 htb rate 20kbit ceil ${CEIL}kbit prio 2
tc class add dev eth0 parent 1:1 classid 1:14 htb rate 10kbit ceil ${CEIL}kbit prio 3
tc class add dev eth0 parent 1:1 classid 1:15 htb rate 30kbit ceil ${CEIL}kbit prio 3
tc qdisc add dev eth0 parent 1:12 handle 120: sfq perturb 10
tc qdisc add dev eth0 parent 1:13 handle 130: sfq perturb 10
tc qdisc add dev eth0 parent 1:14 handle 140: sfq perturb 10
tc qdisc add dev eth0 parent 1:15 handle 150: sfq perturb 10
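The text recommends 75% of the uplink, but the commands use CEIL=240 with a 300 kbit/s uplink, which is 80%. The guaranteed rates of 1:10 to 1:15 also add up to 80+80+20+20+10+30 = 240 kbit, exactly that CEIL; since HTB can only honor those guarantees if they fit within the parent's rate, lowering CEIL to 225 (75%) would call for lowering some child rates as well. A sketch of the arithmetic, with UPLINK_KBIT and PERCENT as assumed inputs and ceil_kbit and sum_rates as hypothetical names:

```shell
#!/bin/sh
# Sketch: derive CEIL from the uplink speed instead of hard-coding it, and
# check that the guaranteed child rates fit inside it.
UPLINK_KBIT=${UPLINK_KBIT:-300}   # ADSL uplink from the text
PERCENT=${PERCENT:-75}            # percentage recommended by the text

ceil_kbit() {
    echo $(( $1 * $2 / 100 ))
}

# Guaranteed rates (kbit) of classes 1:10 ... 1:15, copied from the commands above.
RATES="80 80 20 20 10 30"

sum_rates() {
    total=0
    for r in $RATES; do total=$((total + r)); done
    echo "$total"
}

CEIL=$(ceil_kbit "$UPLINK_KBIT" "$PERCENT")
SUM=$(sum_rates)
echo "CEIL=${CEIL}kbit, sum of child rates=${SUM}kbit"
if [ "$SUM" -gt "$CEIL" ]; then
    echo "warning: child rates exceed the rate of parent class 1:1" >&2
fi
```

With the defaults it reports CEIL=225kbit and warns, because the child rates total 240kbit; with PERCENT=80 the two numbers match.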
Here we have just created a one-level-deep htb tree, like this:
                   +---------+
                   | root 1: |
                   +---------+
                        |
    +---------------------------------------+
    |               class 1:1               |
    +---------------------------------------+
      |      |      |      |      |      |
    +----+ +----+ +----+ +----+ +----+ +----+
    |1:10| |1:11| |1:12| |1:13| |1:14| |1:15|
    +----+ +----+ +----+ +----+ +----+ +----+
classid 1:10 htb rate 80kbit ceil 80kbit prio 0
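This is the highest-priority class. Packets in this class get the lowest delay and are the first to receive spare bandwidth, so it's a good idea to keep this class's ceil low. Send through this class the packets that benefit most from low delay, such as interactive traffic: ssh, telnet, dns, quake3, irc, and packets with the SYN flag set.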
classid 1:11 htb rate 80kbit ceil ${CEIL}kbit prio 1
This is the first class where bulk traffic goes. In this example that means traffic from the local web server (source port 80) and web page requests (destination port 80).
classid 1:12 htb rate 20kbit ceil ${CEIL}kbit prio 2
This class holds traffic with the Maximize-Throughput bit set in the TOS field, plus traffic sent to the Internet by local processes on the router. The remaining classes therefore carry only traffic routed through this machine.
classid 1:13 htb rate 20kbit ceil ${CEIL}kbit prio 2
This class is for bulk traffic from other NATed machines that needs a higher priority.
classid 1:14 htb rate 10kbit ceil ${CEIL}kbit prio 3
Mail traffic (SMTP, pop3 and so on) goes here, together with packets that have the Minimize-Cost bit set in the TOS field.
classid 1:15 htb rate 30kbit ceil ${CEIL}kbit prio 3
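Finally, this class holds the traffic from the NATed machines behind the router. kazaa, edonkey and the like go here so that they don't interfere with other services.
15.10.2. Classifying packets
We have set up the qdiscs, but no packets are being classified yet, so right now every outgoing packet goes through 1:15 (because we used tc qdisc add dev eth0 root handle 1: htb default 15). Now we need to say which packets go where. This is the most important part.
Now we set up the filters so we can classify packets with iptables. I almost always prefer iptables for this job, because it is flexible and keeps a packet count for each rule. Also, with the RETURN target, packets don't have to traverse every rule. Run the following commands:
tc filter add dev eth0 parent 1:0 protocol ip prio 1 handle 1 fw classid 1:10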
tc filter add dev eth0 parent 1:0 protocol ip prio 2 handle 2 fw classid 1:11
tc filter add dev eth0 parent 1:0 protocol ip prio 3 handle 3 fw classid 1:12
tc filter add dev eth0 parent 1:0 protocol ip prio 4 handle 4 fw classid 1:13
tc filter add dev eth0 parent 1:0 protocol ip prio 5 handle 5 fw classid 1:14
tc filter add dev eth0 parent 1:0 protocol ip prio 6 handle 6 fw classid 1:15
We have simply told the kernel to send packets with a given FWMARK value (handle x fw) to the matching class (classid x:x). Next, we'll see how to mark packets with iptables.
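The six fw filters above follow a single rule: firewall mark N goes to class 1:(9+N) with filter priority N. A loop makes that mapping explicit. DEV, DRY_RUN and fw_filters are assumptions for this sketch, which prints the commands unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: mark N -> class 1:(9+N), filter priority N, as in the commands above.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 runs them (as root).
DEV=${DEV:-eth0}

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

fw_filters() {
    mark=1
    while [ "$mark" -le 6 ]; do
        # The minors are produced as text ("10" ... "15"); tc reads class
        # minors as hex in both the class and the filter commands, so they match.
        run tc filter add dev "$DEV" parent 1:0 protocol ip prio "$mark" \
            handle "$mark" fw classid "1:$((9 + mark))"
        mark=$((mark + 1))
    done
}

fw_filters
```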
First you have to understand how packets traverse the iptables filters:
        +------------+                +---------+               +-------------+
Packet -| PREROUTING |--- routing-----| FORWARD |-------+-------| POSTROUTING |- Packets
input   +------------+    decision    +---------+       |       +-------------+    out
              |                                         |
          +-------+                                 +--------+
          | INPUT |---- Local process --------------| OUTPUT |
          +-------+                                 +--------+
I assume all your tables exist and their default policy is ACCEPT (-P ACCEPT). If you haven't touched iptables yet, the defaults should be fine. Our private network is a class B network, 172.17.0.0/16, and our public IP is 212.170.21.172.
Next we tell the kernel to actually do NAT, so that clients on the private network can start talking to the outside world:
echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -s 172.17.0.0/255.255.0.0 -o eth0 -j SNAT --to-source 212.170.21.172
Now check that packets are flowing through 1:15:
tc -s class show dev eth0
You can start marking packets by adding rules to the PREROUTING chain of the mangle table:
iptables -t mangle -A PREROUTING -p icmp -j MARK --set-mark 0x1
iptables -t mangle -A PREROUTING -p icmp -j RETURN
Now, when you ping some Internet host from a machine on the private network, you should see the packet count for 1:10 go up. Check it:
tc -s class show dev eth0
We used -j RETURN so that these packets don't traverse the remaining rules: ICMP packets are never matched by any rule after the RETURN. Keep that in mind. Now let's add more rules to handle TOS properly:
iptables -t mangle -A PREROUTING -m tos --tos Minimize-Delay -j MARK --set-mark 0x1
iptables -t mangle -A PREROUTING -m tos --tos Minimize-Delay -j RETURN
iptables -t mangle -A PREROUTING -m tos --tos Minimize-Cost -j MARK --set-mark 0x5
iptables -t mangle -A PREROUTING -m tos --tos Minimize-Cost -j RETURN
iptables -t mangle -A PREROUTING -m tos --tos Maximize-Throughput -j MARK --set-mark 0x6
iptables -t mangle -A PREROUTING -m tos --tos Maximize-Throughput -j RETURN
Now prioritize ssh packets:
iptables -t mangle -A PREROUTING -p tcp -m tcp --sport 22 -j MARK --set-mark 0x1
iptables -t mangle -A PREROUTING -p tcp -m tcp --sport 22 -j RETURN
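Packets that open TCP connections, that is, packets with the SYN flag set, should get priority too. These two rules are inserted (-I) at the top of the chain instead of appended, so give them explicit positions: the MARK rule must end up before its RETURN rule, and inserting both at the default position 1 in this order would put RETURN first, so the packets would never be marked.
iptables -t mangle -I PREROUTING 1 -p tcp -m tcp --tcp-flags SYN,RST,ACK SYN -j MARK --set-mark 0x1
iptables -t mangle -I PREROUTING 2 -p tcp -m tcp --tcp-flags SYN,RST,ACK SYN -j RETURN
And so on. When you have finished adding rules to the PREROUTING chain of the mangle table, terminate the chain with:
iptables -t mangle -A PREROUTING -j MARK --set-mark 0x6
With this, any traffic not marked so far goes to 1:15. Strictly speaking this last step is unnecessary, since 1:15 is the default class, but I mark these packets anyway to keep the whole setup consistent and to get a counter for this rule.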
It's a good idea to do the same for the OUTPUT chain: repeat these commands with -A OUTPUT instead of -A PREROUTING (s/PREROUTING/OUTPUT/). That way, traffic generated locally on the Linux router is classified too. I end the OUTPUT chain with -j MARK --set-mark 0x3 so that local traffic gets a somewhat higher priority.
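The MARK/RETURN pairs in 15.10.2 all follow one pattern, and the OUTPUT copy differs only in the chain name and the final catch-all mark (0x6 for PREROUTING, 0x3 for OUTPUT). The sketch below (mark_pair, mark_chain and DRY_RUN are hypothetical names) emits both chains using -A only, appending the SYN pair first so it is matched before the other rules. It copies the marks from the commands, including 0x6 for Maximize-Throughput, even though the class list places that traffic in 1:12 (mark 0x3).

```shell
#!/bin/sh
# Sketch: emit the mangle-table rules of 15.10.2 for one chain.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 runs them (as root).

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# mark_pair CHAIN MARK MATCH...: a MARK rule followed by a RETURN rule
mark_pair() {
    chain=$1; mark=$2; shift 2
    run iptables -t mangle -A "$chain" "$@" -j MARK --set-mark "$mark"
    run iptables -t mangle -A "$chain" "$@" -j RETURN
}

# mark_chain CHAIN LAST_MARK
mark_chain() {
    chain=$1; last=$2
    mark_pair "$chain" 0x1 -p tcp -m tcp --tcp-flags SYN,RST,ACK SYN   # first
    mark_pair "$chain" 0x1 -p icmp
    mark_pair "$chain" 0x1 -m tos --tos Minimize-Delay
    mark_pair "$chain" 0x5 -m tos --tos Minimize-Cost
    mark_pair "$chain" 0x6 -m tos --tos Maximize-Throughput
    mark_pair "$chain" 0x1 -p tcp -m tcp --sport 22
    # Catch-all mark for everything not matched above
    run iptables -t mangle -A "$chain" -j MARK --set-mark "$last"
}

mark_chain PREROUTING 0x6
mark_chain OUTPUT 0x3
```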
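15.10.3. Improving our setup
Now the whole setup is working. Spend time looking at graphs of how your bandwidth is being used and thinking about how you want it to be used. After doing this for a long time, I finally got the Internet connection working really well. Otherwise we would have suffered constant timeouts, and newly created tcp connections would have gotten no bandwidth at all.
If some class is full most of the time, it may be a good idea to attach another queueing discipline to it so that bandwidth is shared more fairly. For example, sfq (note that the commands in 15.10.1 already attached these sfq qdiscs to 1:13-1:15, so on that setup the commands below are rejected because the handles are already in use):
tc qdisc add dev eth0 parent 1:13 handle 130: sfq perturb 10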
tc qdisc add dev eth0 parent 1:14 handle 140: sfq perturb 10
tc qdisc add dev eth0 parent 1:15 handle 150: sfq perturb 10
15.10.4. Making all of the above start at boot
There are of course many ways to do this. In my case, I wrote a script, /etc/init.d/packetfilter, that accepts options such as [start | stop | stop-tables | start-tables | reload-tables]; it configures the qdiscs, loads the required kernel modules, and behaves like a daemon. The same script also loads the iptables rules from /etc/network/iptables-rules, a file you can save with iptables-save and restore with iptables-restore.
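The author's script is not shown, so the following is only a minimal sketch of that kind of init script. It handles start, stop and reload-tables (stop-tables and start-tables are left out), and the function names, DEV, RULES and DRY_RUN are assumptions; only the rule file path comes from the text. sch_htb, sch_sfq and cls_fw are the kernel modules for the htb and sfq qdiscs and the fw classifier used above.

```shell
#!/bin/sh
# Sketch of an /etc/init.d/packetfilter-style script.
# DRY_RUN=1 (the default) prints commands instead of running them.
DEV=${DEV:-eth0}
RULES=${RULES:-/etc/network/iptables-rules}

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

start_qdiscs() {
    run modprobe sch_htb
    run modprobe sch_sfq
    run modprobe cls_fw
    run tc qdisc add dev "$DEV" root handle 1: htb default 15
    # ... the remaining tc class/qdisc/filter commands from 15.10.1-15.10.2 go here
}

stop_qdiscs() {
    run tc qdisc del dev "$DEV" root
}

load_tables() {
    # iptables-restore reads the saved rules from standard input
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "iptables-restore < $RULES"
    else
        iptables-restore < "$RULES"
    fi
}

packetfilter() {
    case "$1" in
        start)         start_qdiscs; load_tables ;;
        stop)          stop_qdiscs ;;
        reload-tables) load_tables ;;
        *) echo "usage: $0 {start|stop|reload-tables}" >&2; return 1 ;;
    esac
}

# Dispatch only when called with an argument, so the file can also be sourced.
if [ $# -gt 0 ]; then
    packetfilter "$@"
fi
```

Saving the current rules first with iptables-save > /etc/network/iptables-rules gives the start and reload-tables actions something to load.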
Controlling bandwidth per port
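Last updated: 2007/1/1
[Overview]
I want to control the bandwidth of data sent from the server on a per-port basis (that is, per protocol, such as HTTP or FTP).
[Solution]
The kernel's QoS (Quality of Service) feature makes bandwidth control relatively easy. However, it can only shape traffic that the server sends, not traffic it receives, so if you want to control FTP uploads, for example, you also need to use the FTP daemon's own features.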
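Preparation
The kernel's QoS feature requires iproute and tc, but recent distributions already include them, so there is usually nothing to install. tc can do many kinds of QoS control, but it is nearly impossible to configure without spending a lot of time understanding it thoroughly. For per-port bandwidth control, however, a script called cbq.init makes setup easy, so we use it here. Download cbq.init as shown below and set it to start automatically at boot. On Red Hat-based systems the script works as is, but on SuSE tc lives at a different path, so instead of the mv on the second line, convert the file with sed as on the third line.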
# wget http://jaist.dl.sourceforge.net/sourceforge/cbqinit/cbq.init-v0.7.3
# mv cbq.init-v0.7.3 /etc/init.d/cbq.init
(# sed -e "s/TC=\/sbin\/tc/TC=\/usr\/sbin\/tc/g" cbq.init-v0.7.3 > /etc/init.d/cbq.init)
# chmod 755 /etc/init.d/cbq.init
# chkconfig --add cbq.init
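The sed step above hard-codes /usr/sbin/tc. A slightly more general sketch (fix_tc_path is a hypothetical name) rewrites the TC= setting to whatever path you pass, or to the tc found on PATH when no path is given:

```shell
#!/bin/sh
# Sketch: generalized form of the SuSE sed step. Rewrites TC=/sbin/tc in the
# downloaded script to the given tc path (or the tc found on PATH).
fix_tc_path() {
    src=$1; dst=$2
    tc_path=${3:-$(command -v tc)}
    if [ -z "$tc_path" ]; then
        echo "tc not found; pass its path as the third argument" >&2
        return 1
    fi
    # '|' as the sed delimiter avoids escaping every '/' in the paths.
    sed -e "s|TC=/sbin/tc|TC=$tc_path|g" "$src" > "$dst"
}
```

For example, fix_tc_path cbq.init-v0.7.3 /etc/init.d/cbq.init /usr/sbin/tc replaces the mv or sed line, after which chmod and chkconfig follow as above.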
QoS configuration
The names and location of the configuration files that cbq.init uses are fixed by default.
[Configuration file location]
Configuration files go in the /etc/sysconfig/cbq/ directory, so create it as follows:
# mkdir /etc/sysconfig/cbq
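[Configuration files]
The file name and format are also fixed, as follows. You can have several configuration files as long as their clsid (class ID) values differ.
File name: cbq-<clsid>.<name>
cbq-: a fixed prefix; use it exactly as shown.
<clsid>: effectively the CBQ class ID, written in hexadecimal in the range 0002-FFFF (2-65535 in decimal). It must not duplicate the clsid of another configuration file.
<name>: a nickname for the class ID; pick anything that is easy for you to recognize.
Example: cbq-1280.My_first_shaper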
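A small checker for this naming rule can catch typos before cbq.init reads the directory. This sketch assumes that <clsid> is written as at most four hexadecimal digits in the range 0002-FFFF; cbq.init's own parsing may be more or less strict, and check_cbq_name is a hypothetical name.

```shell
#!/bin/sh
# Sketch: validate a cbq-<clsid>.<name> file name and print its clsid.
check_cbq_name() {
    base=${1##*/}                       # strip any directory part
    case "$base" in
        cbq-*.*) ;;
        *) echo "bad file name: $base" >&2; return 1 ;;
    esac
    rest=${base#cbq-}
    clsid=${rest%%.*}                   # text between "cbq-" and the first "."
    case "$clsid" in
        ''|*[!0-9A-Fa-f]*) echo "clsid is not hexadecimal: $clsid" >&2; return 1 ;;
    esac
    if [ ${#clsid} -gt 4 ]; then
        echo "clsid has more than 4 hex digits: $clsid" >&2
        return 1
    fi
    value=$((0x$clsid))
    if [ "$value" -lt 2 ]; then
        echo "clsid out of range (0002-FFFF): $clsid" >&2
        return 1
    fi
    echo "$clsid"
}
```

For instance, for f in /etc/sysconfig/cbq/cbq-*; do check_cbq_name "$f"; done lists each clsid and reports any file that breaks the rule.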
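[Configuration file parameters]
1. Device: DEVICE=<ifname>,<bandwidth>[,<weight>] (required)
   Example: DEVICE=eth0,10Mbit,1Mbit
   <ifname>: the interface whose bandwidth is controlled.
   <bandwidth>: the physical speed of the interface: 100Mbit for 100BASE-TX, 10Mbit for 10BASE-T.
   <weight>: a parameter proportional to <bandwidth>; as a rule, set it to 1/10 of <bandwidth>.
2. Class: RATE=<speed> (required)
   Example: RATE=5Mbit
   The bandwidth assigned to this class, in Kbit or Mbit. bps, Kbps and Mbps are also accepted, but they mean bytes/sec and make the relation to the interface speed harder to see, so it is safer to avoid them.
3. Class: WEIGHT=<speed> (required)
   Example: WEIGHT=500kbit
   The counterpart of RATE; as a rule, set it to 1/10 of RATE (WEIGHT ~= RATE / 10, rounded as convenient).
4. Class: PRIO=<1-8> (optional, default 5)
   Example: PRIO=5
   Traffic priority from 1 to 8. Lower values are served first, so this can be used to favor one protocol over another (for example, to give SSH top priority).
5. Filter: RULE=[[saddr[/prefix]][:port],][daddr[/prefix]][:port] (required)
   Specifies the addresses/networks and ports to control. The first part, [saddr[/prefix]], is the source address/network of the packets to control; [daddr[/prefix]] is the host to which the server running this script sends data. Note that the "," separating the two belongs at the end of the first part.
   Examples:
   - Controlling content delivered in response to requests to a WWW server
     The packets to control have the server's port 80 as their source, so set RULE to [server address:80,] as below. Because the match is on the source, don't forget the trailing ",".
     RULE=192.168.1.100:80,
         +---------+
         |  linux  |-eth0------*-[client]
         +---------+
         Server: 192.168.1.100       Client: any
                   80 --------------> any
   - Controlling download traffic from an FTP server
     FTP downloads use different ports in active and passive mode.
     In active mode, data is sent over the connection whose server-side port is 20 (ftp-data), so use:
     RULE=192.168.1.100:20,
     In passive mode, the traffic can only be controlled if the daemon lets you choose which ports the server uses. ProFTPD and vsftpd, both covered on this site, support this, so configure a port range. Download data is then sent in packets whose source port lies in that range, so specify the range as [start port/AND mask].
     The AND mask works like a subnet mask (the /24 in 192.168.1.0/24), written in hexadecimal. For example, if you configure the 32 ports from 4096 to 4127, the setting is [4096/0xffe0]: as shown below, ANDing any number from 4096 to 4127 with 0xffe0 gives 4096, so all of them are treated alike. It follows that the start port must have its low n bits set to 0, where 2^n is the number of ports used (rounded up to a power of two); otherwise unrelated ports are limited too. For that reason, the 4000-4029 range shown in the ProFTPD example must be changed.
         4096 (0x1000)  0001000000000000  [start port]
         mask (0xffe0)  1111111111100000
         --------------------------------
         AND            0001000000000000  (= 4096)
         4127 (0x101f)  0001000000011111  [end port]
         mask (0xffe0)  1111111111100000
         --------------------------------
         AND            0001000000000000  (= 4096)
     RULE=192.168.1.100:4096/0xffe0,
         +---------+
         |  linux  |-eth0------*-[client]
         +---------+
         Server: 192.168.1.100       Client: any
         20/4096-4127 --------------> any
6. Timer: TIME=[<dow>,<dow>, ...,<dow>/]<from>-<till>;<rate>/<weight> (optional)
   Examples: TIME=0,1,2,5/18:00-06:00;256Kbit/25Kbit
             TIME=18:00-06:00;256Kbit/25Kbit
   A timer switches the class to a bandwidth different from the values set above.
   <dow>: the days of the week on which the rule applies, 0-6, where 0 is Sunday.
   <from>-<till>: the start and end times of the rule, in 24-hour format.
   <rate>/<weight>: same as items 2 and 3 above.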
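The AND-mask rule for passive-FTP port ranges can be checked with shell arithmetic. This sketch (port_mask is a hypothetical name) rounds the number of ports up to a power of two, refuses a start port that is not a multiple of that block size, and prints the port/mask for RULE= together with the range the mask actually covers.

```shell
#!/bin/sh
# Sketch: compute the start/mask form used in RULE= for a block of ports.
# port_mask START COUNT
port_mask() {
    start=$1; count=$2
    size=1
    while [ "$size" -lt "$count" ]; do   # round COUNT up to a power of two
        size=$((size * 2))
    done
    if [ $((start % size)) -ne 0 ]; then
        echo "start port $start is not a multiple of $size" >&2
        return 1
    fi
    mask=$((65536 - size))               # e.g. 32 ports -> 0xffe0
    printf '%d/0x%04x covers %d-%d\n' "$start" "$mask" "$start" $((start + size - 1))
}
```

port_mask 4096 32 prints 4096/0xffe0 covers 4096-4127, the setting used in the example. port_mask 4000 30 prints 4000/0xffe0 covers 4000-4031, showing that the 30-port ProFTPD range would also throttle ports 4030 and 4031.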
[Configuration examples] Place files like the following in the /etc/sysconfig/cbq directory.
- cbq-100.http: an example that keeps heavy access to the WWW server from using up the whole line.
DEVICE=eth0,100Mbit,10Mbit
RATE=5Mbit
WEIGHT=500Kbit
PRIO=5
RULE=192.168.1.100:80,
- cbq-101.ftp: an example that limits downloads from the FTP server.
DEVICE=eth0,100Mbit,10Mbit
RATE=10Mbit
WEIGHT=1Mbit
PRIO=6
RULE=192.168.1.100:20,
RULE=192.168.1.100:4096/0xffe0,
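Both examples follow the WEIGHT ~= RATE / 10 rule of thumb from the parameter table (5Mbit/500Kbit and 10Mbit/1Mbit). A sketch of a checker for one configuration file; it only understands Kbit and Mbit and treats 1 Mbit as 1000 Kbit, which is enough for a ratio check. to_kbit and check_weight are hypothetical names.

```shell
#!/bin/sh
# Sketch: check WEIGHT against RATE / 10 in a cbq configuration file.
to_kbit() {
    case "$1" in
        *[Mm]bit) echo $(( ${1%[Mm]bit} * 1000 )) ;;
        *[Kk]bit) echo "${1%[Kk]bit}" ;;
        *) echo "unsupported unit: $1" >&2; return 1 ;;
    esac
}

check_weight() {
    rate=$(sed -n 's/^RATE=//p' "$1")
    weight=$(sed -n 's/^WEIGHT=//p' "$1")
    rate_k=$(to_kbit "$rate") || return 1
    weight_k=$(to_kbit "$weight") || return 1
    expected=$((rate_k / 10))
    if [ "$weight_k" -eq "$expected" ]; then status=ok; else status=differs; fi
    echo "${1##*/}: RATE=$rate WEIGHT=$weight RATE/10=${expected}Kbit $status"
}
```

Running for f in /etc/sysconfig/cbq/cbq-*; do check_weight "$f"; done reports each file; for cbq-100.http it prints RATE=5Mbit WEIGHT=500Kbit RATE/10=500Kbit ok.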
Starting cbq.init
cbq.init is itself an init script, so simply start it. Once it is running, test that traffic is limited as configured.
# /etc/init.d/cbq.init start
tc - traffic control Linux QoS control tool
Milan P. Stanic
mps@rns-nis.co.yu
Contents
1 What is QoS
2 command syntax
3 Queueing disciplines
3.1 Class Based Queue
3.2 Priority
3.3 FIFO
3.4 TBF
3.5 RED
3.6 GRED
3.7 SFQ
3.8 ATM
3.9 Dsmark
3.10 INGRESS
4 classes
4.1 CBQ
5 filters (or classifier)
5.1 filter rsvp
5.2 filter u32
5.3 filter fw
5.4 filter route
5.5 tcindex
6 police
Bibliography
Appendix
About this document
This document is meant to be a (comprehensive) description of the tc command utility from the iproute2 package.
The primary motivation for this work is my wish to learn about QoS in Linux (and about QoS in general). If you find errors or big mistakes in this document, that is because I don't yet fully understand QoS. I hope it will improve over time.
It is based on kernel 2.4 and iproute2 version 000305.
It is far from complete and is not free of errors. I am writing it for my own learning only, and I am not sure whether (or when) it will be finished. But I am working on it (especially when I have time :).
All of the text is taken from various documents on the net, from the linux-diffserv and linux-net mailing lists, and from the Linux kernel source files.
Disclaimer: Use at your own risk. I am not responsible for any loss or damage you incur by using information from this document.
1 What is QoS
When the kernel has several packets to send out over a network device, it has to decide which ones to send first, which ones to delay, and which ones to drop. This is the job of the packet scheduler, and several different algorithms for how to do this "fairly" have been proposed.
With the Linux QoS subsystem (which is built from kernel building blocks and user-space tools such as the ip and tc command-line utilities), it is possible to do very flexible traffic control.
2 command syntax
tc (traffic controller) is the user-level program used to create queues and associate them with network devices. It is used to set up various kinds of queues and to associate classes with each of those queues. It is also used to set up the filters by which packets are classified.
Usage: tc [ OPTIONS ] OBJECT { COMMAND | help }
where OBJECT := { qdisc | class | filter }
OPTIONS := { -s[tatistics] | -d[etails] | -r[aw] }
Where it expects a number for BPS, it understands some suffixes: kbps (*1024), mbps (*1024*1024), kbit (*1024/8), and mbit (*1024*1024/8). If I'm reading the code correctly, "BPS" means Bytes Per Second; if you give a number without a suffix, it assumes you mean BITS per second (it divides the number you give it by 8). It also understands bps as a suffix.
Where it's expecting a time value, it seems it understands suffixes of s, sec, and secs for seconds, ms, msec, and msecs for milliseconds, and us, usec, and usecs for microseconds.
Where it wants a size parameter, it assumes non-suffixed numbers to be specified in bytes. It also understands suffixes of k and kb to mean kilobytes (*1024), m and mb to mean megabytes (*1024*1024), kbit to mean kilobit (*1024/8), and mbit to mean megabits (*1024*1024/8).
1Mbit == 128Kbps, i.e. 1 megabit per second is 128 kilobytes per second
bps  = bytes/sec
kbps = bytes/sec * 1024
mbps = bytes/sec * 1024 * 1024
kbit = bits/sec * 1024
mbit = bits/sec * 1024 * 1024
Many examples use Xbit and Xbps interchangeably, although tc treats them very differently (Xbps is eight times larger than Xbit).
note: this is very confusing
note: whenever you deal with memory-related parameters such as queue size or buffer size, make sure their units are bytes; for bandwidth- and rate-related parameters the units are bits.
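A minimal POSIX sh sketch of the suffix rules above, assuming the 1024-based multipliers described in this section (other iproute2 versions may use different ones). The function rate_to_Bps is hypothetical, not part of tc, and only accepts whole numbers:

```sh
#!/bin/sh
# rate_to_Bps RATE
# Hypothetical helper: convert a tc-style rate (e.g. 10Mbit, 64kbps, 8000)
# to bytes per second using the 1024-based rules described above.
# Whole numbers only; tc itself also accepts fractions such as 1.5mbit.
rate_to_Bps() {
    num=$(printf '%s\n' "$1" | sed 's/[^0-9].*$//')
    suffix=$(printf '%s\n' "$1" | sed 's/^[0-9]*//' | tr 'A-Z' 'a-z')
    [ -n "$num" ] || { echo "not a number: $1" >&2; return 1; }
    case "$suffix" in
        '')   echo $(( num / 8 )) ;;                 # bare number: bits/s
        kbit) echo $(( num * 1024 / 8 )) ;;          # kilobits/s
        mbit) echo $(( num * 1024 * 1024 / 8 )) ;;   # megabits/s
        bps)  echo "$num" ;;                         # bytes/s
        kbps) echo $(( num * 1024 )) ;;              # kilobytes/s
        mbps) echo $(( num * 1024 * 1024 )) ;;       # megabytes/s
        *)    echo "unknown suffix: $suffix" >&2; return 1 ;;
    esac
}

rate_to_Bps 1mbit    # 131072 bytes/s, i.e. 128 kilobytes per second
rate_to_Bps 10Mbit   # 1310720 bytes/s
rate_to_Bps 1mbps    # 1048576 bytes/s, eight times more than 1mbit
```

The last two calls show why mixing up mbit and mbps matters: the same number means eight times more bandwidth.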
3 Queueing disciplines
Each network device has a queuing discipline associated with it, which controls how packets enqueued on that device are treated. It can be viewed with the ip command:
root@dl:# ip link show
1: lo: <LOOPBACK,UP> mtu 3924 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,PROMISC,UP> mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 52:54:00:de:bf:19 brd ff:ff:ff:ff:ff:ff
3: tap0: <BROADCAST,MULTICAST,NOARP> mtu 1500 qdisc noop
    link/ether fe:fd:00:00:00:00 brd ff:ff:ff:ff:ff:ff
Generally, a queueing discipline ("qdisc") is a black box that can enqueue packets and dequeue them (when the device is ready to send something) in the order and at the times determined by the algorithm hidden inside it.
By default the queueing discipline is pfifo_fast, which cannot be manipulated with tc. It is assigned to a device when the device is started or when another qdisc is deleted from the device. This qdisc has 3 bands, which are processed from band 0 to band 2: as long as there is a packet queued in a higher-priority band (lower number), the lower-priority bands are not served.
Qdisc's are:
FIFO - simple FIFO (packet (p-FIFO) or byte (b-FIFO))
PRIO - n-band strict priority scheduler
TBF - token bucket filter
CBQ - class based queue
CSZ - Clark-Scott-Zhang
SFQ - stochastic fair queue
RED - random early detection
GRED - generalized random early detection
TEQL - traffic equalizer
ATM - asynchronous transfer mode
DSMARK - DSCP (Diff-Serv Code Point) marker/remarker
Qdiscs are divided into two categories:
- "queues", which have no internal structure visible from outside.
- "schedulers", which split all the packets into "traffic classes" using "packet classifiers".
In turn, classes may have child qdiscs (as a rule, queues) attached to them, and so on.
note: Certain qdiscs can have children; these are classful, and the others are leaves (describe it!)
classful qdiscs: CBQ, PRIO, ATM, DSMARK, CSZ
leaf qdiscs: TBF, FIFO, SFQ, RED, GRED, TEQL
note: classful qdiscs can also be leaves
The syntax for managing queuing discipline is:
Usage: tc qdisc [ add | del | replace | change | get ] dev STRING
[ handle QHANDLE ] [ root | ingress | parent CLASSID ]
[ estimator INTERVAL TIME_CONSTANT ]
[ [ QDISC_KIND ] [ help | OPTIONS ] ]
tc qdisc show [ dev STRING ] [ingress]
Where:
QDISC_KIND := { [p|b]fifo | tbf | prio | cbq | red | etc. }
OPTIONS := ... try tc qdisc add <desired QDISC_KIND> help
add
adds a qdisc to device dev
del
delete qdisc from device dev
replace
replace the qdisc with another
handle
represents the unique handle that is assigned by the user to the queuing discipline. No two queuing disciplines can have the same handle. Qdisc handles always have minor number equal to zero.
root
indicates that the queue is at the root of a link-sharing hierarchy and owns all the bandwidth on that device. A device can have only one root qdisc.
ingress
attaches the qdisc used for policing on ingress (see INGRESS)
parent
represents the handle of the parent class (or qdisc) to which this qdisc is attached.
dev
is the network device to which we want to attach the qdisc
estimator
is used to determine whether the requirements of the queue have been satisfied. INTERVAL and TIME_CONSTANT are two parameters of great significance to the estimator. The estimator estimates the bandwidth used by each class over the appropriate time interval, to determine whether or not each class has been receiving its link-sharing bandwidth.
Usage: ... estimator INTERVAL TIME-CONST
INTERVAL is interval between measurements
TIME-CONST is averaging time constant
Example: ... est 1sec 8sec
The time constant for the estimator is a critical parameter; this time constant determines the interval over which the router attempts to enforce the link-sharing guidelines.
[1] Unfortunately, rate estimation is not a very easy task. For example, I did not find a simple way to estimate the current peak rate and even failed to formulate the problem. So I preferred not to build an estimator into the scheduler, but to run this task separately. Ideally, it should be kernel thread(s), but for now it runs from timers, which puts an apparent upper bound on the number of rated flows; it has minimal overhead on small sets, but is enough to handle controlled load service, sets of aggregates.
We measure the rate over $A = 2^{\text{interval}}$ seconds (A = 1<<interval) and evaluate an EWMA:
$$\text{avrate} = \text{avrate}\cdot(1-W) + \text{rate}\cdot W$$
where W is chosen as a negative power of 2:
$$W = 2^{-\text{ewma\_log}}$$
The resulting time constant is:
$$T = \frac{A}{-\ln(1-W)}$$
NOTES.
* The stored value for avbps is scaled by $2^5$, so that the maximal rate is ~1Gbit; avpps is scaled by $2^{10}$.
* The minimal interval is HZ/4 = 250msec (it is the greatest common divisor for HZ=100 and HZ=1024), and the maximal interval is $(HZ/4)\cdot 2^{\text{EST\_MAX\_INTERVAL}}$ = 8sec. Shorter intervals are too expensive; longer ones can be implemented at user level painlessly.
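The estimator formulas above can be checked with a small sketch. It assumes the averaging interval A is given directly in seconds and uses awk for the floating-point math; est_time_constant and ewma_step are hypothetical helper names, not kernel or tc functions:

```sh
#!/bin/sh
# est_time_constant A EWMA_LOG
#   A         averaging interval in seconds (A = 2^interval)
#   EWMA_LOG  weight exponent, W = 2^(-EWMA_LOG)
# Prints T = A / (-ln(1 - W)) in seconds, rounded to 2 decimals.
est_time_constant() {
    awk -v a="$1" -v k="$2" 'BEGIN {
        w = 1 / (2 ^ k)
        printf "%.2f\n", a / (-log(1 - w))
    }'
}

# ewma_step AVRATE RATE EWMA_LOG
# One estimator update: avrate*(1-W) + rate*W.
ewma_step() {
    awk -v avg="$1" -v r="$2" -v k="$3" 'BEGIN {
        w = 1 / (2 ^ k)
        printf "%.4f\n", avg * (1 - w) + r * w
    }'
}

# With A = 1 s, each extra step of ewma_log roughly doubles T:
for k in 1 2 3 4; do
    printf 'ewma_log=%s T=%s s\n' "$k" "$(est_time_constant 1 "$k")"
done
```

So "est 1sec 8sec" asks for a time constant of about 8 seconds; with A = 1 s, ewma_log = 3 gives about 7.5 s and ewma_log = 4 about 15.5 s. I assume tc picks a representable value close to the requested one, but I have not checked exactly how it rounds.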
You *have* to declare the CBQ qdisc first, then the CBQ "parent" class, and then (optionally, I think) the CBQ "leaf" classes.
I'm not 100% sure of what I've just said. It's just how I think it works.
3.0.0.1 To stop QoS completely, use the following for eth0:
tc qdisc del dev eth0 root
3.1 Class Based Queue
In CBQ, every class has variables idle and avgidle and parameter maxidle used in computing the limit status for the class, and the parameter offtime used in determining how long to restrict throughput for overlimit classes.
idle:
The variable idle is the difference between the measured actual time and the desired time between the two most recent packet transmissions from this class. When the connection is sending more than its allocated bandwidth, idle is negative. When the connection is sending at exactly its allotted rate, idle is zero.
avgidle:
The variable avgidle is the average of idle, computed using an exponential weighted moving average (EWMA). When avgidle is zero or lower, the class is overlimit (the class has been exceeding its allocated bandwidth in a recent short time interval).
maxidle:
The parameter maxidle gives an upper bound for avgidle. Thus maxidle limits the credit given to a class that has recently been under its allocation.
offtime:
The parameter offtime gives the time interval that an overlimit class must wait before sending another packet. This parameter determines the steady-state burst size for a class when the class is running over its limit.
minidle:
The minidle parameter gives a (negative) lower bound for avgidle. Thus, a negative minidle lets the scheduler remember that a class has recently used more than its allocated bandwidth.
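To see how avgidle, maxidle and minidle interact, here is a deliberately simplified model in sh/awk. It assumes avgidle starts at 0, takes the idle samples as plain numbers, and ignores offtime and borrowing from parent classes; cbq_avgidle is a hypothetical name, and the kernel's fixed-point implementation differs:

```sh
#!/bin/sh
# cbq_avgidle W MAXIDLE MINIDLE IDLE...
# Simplified model: feed successive idle samples (actual minus desired
# time between packets, in any consistent unit), update avgidle as an
# EWMA bounded by [MINIDLE, MAXIDLE], and print the limit status.
# offtime and borrowing from parent classes are deliberately left out.
cbq_avgidle() {
    w=$1; maxidle=$2; minidle=$3; shift 3
    printf '%s\n' "$@" | awk -v w="$w" -v hi="$maxidle" -v lo="$minidle" '
        BEGIN { avgidle = 0 }
        {
            avgidle = (1 - w) * avgidle + w * $1   # EWMA of idle
            if (avgidle > hi) avgidle = hi         # maxidle caps the credit
            if (avgidle < lo) avgidle = lo         # minidle bounds the debt
            status = (avgidle <= 0) ? "overlimit" : "underlimit"
            printf "%.3f %s\n", avgidle, status
        }'
}

# Two idle gaps build up credit, then two back-to-back packets use it up.
cbq_avgidle 0.5 1 -0.25 2 2 -1 -1
```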
Usage: ... cbq bandwidth BPS avpkt BYTES [ mpu BYTES ]
[ cell BYTES ] [ ewma LOG ]
bandwidth
represents the maximum bandwidth available to the device to which the queue is attached.
avpkt
represents the average packet size. It is used to determine the transmission time of a packet: transmission time t = average packet size / link bandwidth.
mpu
represents the minimum number of bytes that will be sent in a packet. Packets smaller than mpu are counted as mpu bytes. This is done because on Ethernet-like interfaces the minimum packet size is 64 bytes, so this value is usually set to 64.
cell
represents the granularity, in bytes, of the packet sizes that are transmitted. It is used to index into the rtab table, which holds the packet transmission times for various packet sizes.
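A sketch of the arithmetic behind avpkt, mpu and cell, assuming rates in bytes per second as produced by tc's unit rules (10Mbit = 1310720 bytes/s). As far as I can tell, the kernel indexes the rate table with size >> cell_log, where cell = 2^cell_log; tx_time_us and rtab_slot are hypothetical helpers:

```sh
#!/bin/sh
# tx_time_us SIZE RATE_BPS [MPU]
# Transmission time t = size / rate, in whole microseconds (truncated).
# Packets smaller than MPU are counted as MPU bytes.
tx_time_us() {
    size=$1; rate=$2; mpu=${3:-0}
    [ "$size" -lt "$mpu" ] && size=$mpu
    echo $(( size * 1000000 / rate ))
}

# rtab_slot SIZE CELL
# Rate-table slot for SIZE when each slot covers CELL bytes
# (CELL must be a power of two; slot = SIZE >> log2(CELL)).
rtab_slot() {
    size=$1; cell=$2; log=0
    while [ $(( 1 << log )) -lt "$cell" ]; do log=$(( log + 1 )); done
    echo $(( size >> log ))
}

# avpkt 1000 on a 10Mbit link (1310720 bytes/s in tc units):
tx_time_us 1000 1310720      # about 762 us
tx_time_us 40 1310720 64     # a 40-byte packet is charged as 64 bytes
rtab_slot 1000 8             # slot 125 of the table
```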
A CBQ class is automatically generated when a CBQ qdisc is created. ??
note: rtab is rate table?
note: mariano: should first declare a cbq "parent" class (which uses all the bandwidth) and then declare the two "leaf" classes.
CBQ is a complex qdisc, and to understand it fully it is good to read Sally Floyd and Van Jacobson's paper.
3.2 Priority
Simple priority queue
Usage: ... prio bands NUMBER priomap P1 P2...
Where:
bands
number of bands to add (default 3)
priomap
defines the priomap (defaults to the 3-band scheduler map)
So if you define more than 3 bands, make sure to re-define the priomap.
With prio, as long as there is data to be dequeued in a higher-priority band, prio always serves that band first.
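A sketch of how the priomap is read: it is a list of 16 band numbers, indexed by the packet's logical priority (skb->priority). To the best of my knowledge the map below is the default 3-band map; band_for_prio is a hypothetical helper, and the 4-band map in the comment is only an illustration:

```sh
#!/bin/sh
# Default 3-band priomap: entry N is the band used for packets whose
# logical priority (skb->priority, 0..15) is N. Band 0 is served first.
DEFAULT_PRIOMAP="1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1"

# band_for_prio PRIORITY [PRIOMAP]
band_for_prio() {
    prio=$1
    map=${2:-$DEFAULT_PRIOMAP}
    set -- $map          # split the map into $1..$16
    shift "$prio"        # drop the entries before index PRIORITY
    echo "$1"
}

band_for_prio 0   # best effort -> band 1
band_for_prio 6   # interactive -> band 0
band_for_prio 1   # filler      -> band 2

# With 4 bands, give tc a full 16-entry map; this hypothetical map sends
# priority 15 to the new band 3 (not run here):
#   tc qdisc add dev eth0 root handle 1: prio bands 4 \
#       priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 3
```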
3.3 FIFO
Simple First-In-First-Out queue, which provides basic store-and-forward capability. FIFO is the default qdisc on most real interfaces.
Usage: ... [p|b]fifo [ limit NUMBER ]
"b" stands for bytes, while "p" stands for packets.
limit
maximum length of the queue in bytes for bfifo and in packets for pfifo
This means that the maximum length of the fifo queue is measured in bytes in the first case and in number of packets in the second case.
small note: The fifo queue can be set to 0, but this still allows a single packet to be enqueued.
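A hypothetical example of the two FIFO variants (eth0 and the limits are assumptions; the commands need root and iproute2):

```sh
#!/bin/sh
# Packet-limited FIFO on eth0, then swap it for a byte-limited one.
tc qdisc add dev eth0 root handle 1: pfifo limit 100        # at most 100 packets
tc qdisc replace dev eth0 root handle 1: bfifo limit 64kb   # at most 65536 bytes
tc -s qdisc show dev eth0                                    # inspect the counters
tc qdisc del dev eth0 root                                   # back to the default qdisc
```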
3.4 TBF
The Token Bucket Filter is a qdisc built around a bucket of tokens: a packet can be sent only if there are enough tokens in the bucket, and sending it takes tokens out of the bucket. The kernel puts new tokens into the bucket at regular intervals.
Usage: ... tbf limit BYTES burst BYTES[/BYTES] rate KBPS
[ mtu BYTES[/BYTES] ] [ peakrate KBPS ] [ latency TIME ]
limit
is the number of bytes that can be queued
burst
specifies the size of the bucket in bytes, i.e. how much can be sent in a single burst without waiting for new tokens
rate
is used indirectly in qdiscs: tc uses the rate to calculate the transmission time required for each packet size from mpu to mtu. Another definition: the rate option is what controls the bandwidth. AFAIK `bandwidth' represents the `real' bandwidth of the device.
mtu
is the maximum transfer unit
peakrate
is the maximum short-term rate
latency
is the maximum time a packet may wait in the queue (it can be given instead of limit)
Jamal: TBF is influenced by quite a few parameters: peakrate, rate, MTU, burst size etc. It will do what you ask it to ;-> And at times it will let bursts flood the gate, i.e. you might end up sending at wire speed. What are your parameters like?
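The token bucket idea can be modelled in a few lines. This sketch is a simplification, not the kernel code: it assumes the bucket starts full, every packet is no larger than the bucket (burst), and no packet is dropped (the real qdisc drops once more than limit bytes are waiting). Each input line is "arrival_time size", and tbf_sim (a hypothetical name) prints the departure time of each packet:

```sh
#!/bin/sh
# tbf_sim RATE_BPS BURST_BYTES < lines of "arrival_seconds size_bytes"
# Prints the time (in seconds) at which each packet leaves the bucket.
tbf_sim() {
    awk -v rate="$1" -v burst="$2" '
        BEGIN { tokens = burst; last = 0; dep = 0 }   # bucket starts full
        {
            t = ($1 > dep) ? $1 : dep                 # FIFO: wait for previous packet
            tokens += (t - last) * rate               # refill since last event
            if (tokens > burst) tokens = burst        # bucket never overflows
            if (tokens < $2) {                        # not enough tokens yet:
                t += ($2 - tokens) / rate             # wait until there are
                tokens = $2
            }
            tokens -= $2                              # sending consumes tokens
            last = t; dep = t
            printf "%.3f\n", t
        }'
}

# 1000 bytes/s, 2000-byte bucket, three 1000-byte packets at t=0:
# the first two leave at once (the burst), the third waits 1 s.
printf '0 1000\n0 1000\n0 1000\n' | tbf_sim 1000 2000

# A real TBF might be set up like this (illustrative values, not run here):
#   tc qdisc add dev eth0 root tbf rate 220kbit latency 50ms burst 1540
```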
3.5 RED
Random Early Detection discards packets even when there is still space in the queue. As the queue length increases, the drop probability also increases. This approach notifies senders that congestion is likely before it actually occurs.
Usage: ... red limit BYTES min BYTES max BYTES avpkt BYTES burst PACKETS
probability PROBABILITY bandwidth KBPS [ ecn ]
limit
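is the actual physical size of the queue in bytes
min
is the minimum threshold in bytes (suffixes such as KB may be used)
max
is the maximum threshold in bytes (suffixes such as KB may be used)
avpkt
is the average packet size
burst
is the burstiness allowance, in packets (Jamal: it is used to compute the time constant) ???
probability
is the maximum drop probability, reached as the average queue size approaches max
bandwidth
should be the real bandwidth of the interface
ecn
enables Explicit Congestion Notification: packets are marked instead of dropped (a flag)
Always make sure that min < max < limit.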
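A sketch of the classic RED drop probability as a function of the average queue size, assuming probability is the maximum drop probability reached just below max. It leaves out the packet counter that RED uses to spread drops more evenly; red_drop_prob is a hypothetical helper:

```sh
#!/bin/sh
# red_drop_prob MIN MAX MAX_P AVG
# Classic RED drop probability for an average queue size AVG (bytes):
#   AVG <  MIN        -> 0
#   MIN <= AVG < MAX  -> MAX_P * (AVG - MIN) / (MAX - MIN)
#   AVG >= MAX        -> 1 (every packet is dropped)
red_drop_prob() {
    awk -v min="$1" -v max="$2" -v maxp="$3" -v avg="$4" 'BEGIN {
        if (avg < min)       p = 0
        else if (avg >= max) p = 1
        else                 p = maxp * (avg - min) / (max - min)
        printf "%.4f\n", p
    }'
}

# Thresholds as in a queue configured with (not run here):
#   tc qdisc add dev eth0 root red limit 60000 min 15000 max 45000 \
#       avpkt 1000 burst 20 probability 0.02 bandwidth 10Mbit
red_drop_prob 15000 45000 0.02 10000   # below min: 0.0000
red_drop_prob 15000 45000 0.02 30000   # halfway:   0.0100
red_drop_prob 15000 45000 0.02 50000   # above max: 1.0000
```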
3.6 GRED
Generalized RED is used in the DiffServ implementation; it has virtual queues (VQs) within one physical queue. Currently, the number of virtual queues is limited to 16.
GRED is configured in two steps. First, the generic parameters are configured to select the number of virtual queues (DPs) and whether to turn on the RIO-like buffer sharing scheme. A default virtual queue is also selected at this point.
The second step is used to set parameters for individual virtual queues.
Usage: ... gred DP drop-probability limit BYTES min BYTES max BYTES
avpkt BYTES burst PACKETS probability PROBABILITY bandwidth KBPS
[prio value]
OR ... gred setup DPs <num of DPs> default <default DP> [grio]
setup
identifies that this is a generic setup for GRED
DPs
is the number of virtual queues
default
specifies default virtual queue
grio
turns on the RIO-like buffering scheme
limit
defines the virtual queue ``physical'' limit in bytes
min
defines the minimum threshold value in bytes
max
defines the maximum threshold value in bytes
avpkt
is the average packet size in bytes
bandwidth
is the wire-speed of the interface
burst
is the number of average-sized packets allowed to burst
probability
defines the drop probability in the range (0...)
DP
identifies the virtual queue assigned to these parameters
drop-probability
?
prio
identifies the virtual queue priority if grio was set in general parameters
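A hypothetical two-step GRED configuration following the usage above (device, DP numbers and thresholds are made up; the remaining virtual queues would be configured the same way):

```sh
#!/bin/sh
# Step 1: generic setup with 4 virtual queues, default DP 1 and
# RIO-like buffer sharing (grio) turned on.
tc qdisc add dev eth0 root gred setup DPs 4 default 1 grio

# Step 2: one "change" per virtual queue; prio is the virtual queue
# priority used because grio was set above.
tc qdisc change dev eth0 root gred limit 60KB min 15KB max 45KB \
    burst 20 avpkt 1000 bandwidth 10Mbit DP 1 probability 0.02 prio 1
tc qdisc change dev eth0 root gred limit 60KB min 10KB max 30KB \
    burst 20 avpkt 1000 bandwidth 10Mbit DP 2 probability 0.04 prio 2
```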
3.7 SFQ
Stochastic Fair Queueing, as its name implies, tries to share the link fairly among flows: packets are hashed into a number of queues, which are processed in round-robin order.
Usage: ... sfq [ perturb SECS ] [ quantum BYTES ]
perturb
is the number of seconds after which the hashing function is changed, so that a hash collision between flows lasts only for a short time (the perturb interval).
quantum
is the DRR (Deficit Round Robin) round quantum, as in CBQ.
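A hypothetical SFQ example (eth0 and the values are assumptions):

```sh
#!/bin/sh
# Fair sharing among flows on eth0, changing the hash every 10 seconds;
# quantum is set to one full Ethernet frame (1514 bytes).
tc qdisc add dev eth0 root sfq perturb 10 quantum 1514

# SFQ is often attached below a class instead of at the root, e.g. under
# class 10:65 of a CBQ tree (handle 65: is an arbitrary choice):
#   tc qdisc add dev eth0 parent 10:65 handle 65: sfq perturb 10
```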
3.8 ATM
Used to re-direct flows from the default path to ATM VCs. Each flow can have its own ATM VC, but multiple flows can also share the same VC.
Werner: The ATM qdisc is different. It takes packets from some traffic stream (no matter what interface or such) and sends them over specific (and typically dedicated) ATM connections.
Werner: Then there's the case of qdiscs that don't really queue data, e.g. sch_dsmark or sch_atm.
3.9 Dsmark
The Diff-serv marker isn't really a queuing discipline. It marks packets according to specified rules. It is configured first as a qdisc and then as a class (if it is used for classification).
Usage: dsmark indices INDICES [ default_index DEFAULT_INDEX ] [ set_tc_index ]
indices
is the size of the table of (mask, value) pairs. See below. (maybe mask value)
default_index
is used if the classifier finds no match
set_tc_index
if set, retrieves the content of the DS field and stores it in skb->tc_index
When invoked to create a class, its parameters are:
Usage: ... dsmark [ mask MASK ] [ value VALUE ]
mask
mask on DSCP (default 0xff)
value
value to OR with (default 0)
Outgoing DSCP = (Incoming DSCP AND mask) OR value
Where Incoming DSCP is the DSCP value of the original incoming packet, and Outgoing DSCP is the DSCP that the packet will be assigned as it leaves the queue.
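The remarking formula is plain bit arithmetic. As far as I can tell from sch_dsmark, mask and value are applied to the whole 8-bit DS field (the DSCP occupies its upper six bits), so EF (DSCP 46) is written as value 0xb8. ds_remark is a hypothetical helper:

```sh
#!/bin/sh
# ds_remark IN MASK VALUE
# Outgoing = (Incoming AND mask) OR value, computed on the 8-bit DS field.
ds_remark() {
    printf '0x%02x\n' $(( ($1 & $2) | $3 ))
}

# mask 0x3 keeps only the two low bits; value 0xb8 writes DSCP 46
# (EF, expedited forwarding) into the upper six bits.
ds_remark 0x00 0x03 0xb8    # -> 0xb8
ds_remark 0x03 0x03 0xb8    # -> 0xbb, low bits preserved
ds_remark 0x28 0xff 0x00    # defaults (mask 0xff, value 0): unchanged, 0x28

# The matching tc commands might look like this (not run here):
#   tc qdisc add dev eth0 handle 1:0 root dsmark indices 64 set_tc_index
#   tc class change dev eth0 classid 1:1 dsmark mask 0x3 value 0xb8
```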
3.10 INGRESS
if present, the ingress qdisc is invoked for each packet arriving on the respective interface
ingress is a qdisc that only classifies but doesn't queue
the usual classifiers, classifier combinations, and policing functions can be used
the classification result is stored in skb->tc_index, a la sch_dsmark
if the classification returns a "drop" result (TC_POLICE_SHOT), the packet is discarded. Otherwise, it is accepted.
Since there is no queue for implicit rate limiting (via PRIO, TBF, CBQ, etc.), rate limiting must be done explicitly via policing. This is still done exactly like policing on egress.
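A hypothetical ingress policing example, assuming eth0 and a limit of about 1mbit for all incoming IP traffic; because ingress has no queue, packets over the limit are simply dropped by the policer:

```sh
#!/bin/sh
# Attach the ingress qdisc (its handle is conventionally ffff:).
tc qdisc add dev eth0 handle ffff: ingress

# Match every IP packet and police it to 1mbit with a 10k burst.
tc filter add dev eth0 parent ffff: protocol ip prio 50 u32 \
    match ip src 0.0.0.0/0 police rate 1mbit burst 10k drop flowid :1

# To remove it again:
#   tc qdisc del dev eth0 ingress
```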
4 classes
mps: should I explain what a class is and how closely it is tied to its qdisc? Yes? Classes are the main component of QoS. (stupid explanation)
The syntax for creating a class is shown below:
tc class [ add | del | change | get ] dev STRING
[ classid CLASSID ] [ root | parent CLASSID ]
[ [ QDISC_KIND ] [ help | OPTIONS ] ]
tc class show [ dev STRING ] [ root | parent CLASSID ]
Where: QDISC_KIND := { prio | cbq | etc. }
OPTIONS := ... try tc class add <desired QDISC_KIND> help
The QDISC_KIND can be one of the queuing disciplines that support classes. The interpretation of the fields:
classid
represents the handle that is assigned to the class by the user. It consists of a major number and a minor number, which have been discussed already.
root
indicates that the class represents the root class in the link sharing hierarchy.
parent
indicates the handle of the parent (qdisc or class) of this class.
4.1 CBQ
This algorithm classifies the waiting packets into a tree-like hierarchy of classes; the leaves of this tree are in turn scheduled by separate algorithms (called "disciplines" in this context).
Usage: ... cbq bandwidth BPS rate BPS maxburst PKTS [ avpkt BYTES ]
[ minburst PKTS ] [ bounded ] [ isolated ]
[ allot BYTES ] [ mpu BYTES ] [ weight RATE ]
[ prio NUMBER ] [ cell BYTES ] [ ewma LOG ]
[ estimator INTERVAL TIME_CONSTANT ]
[ split CLASSID ] [ defmap MASK/CHANGE ]
bandwidth
represents the maximum bandwidth that is available to the queuing discipline owned by this class. It is only used as a helper value to compute the min/max idle values from maxburst and avpkt.
rate
represents the bandwidth that is allocated to this class; set it to the bandwidth you want to allocate to a given traffic class. The kernel does not use this value directly; it uses pre-calculated rate translation tables, which are used to compute the overlimit status of the class.
maxburst
represents the number of packets (of avpkt size) that will be sent in the longest possible burst.
avpkt
represents the average number of bytes in a packet belonging to this class.
minburst
represents the number of packets that will be sent in the shortest possible burst.
bounded
indicates that the class cannot borrow unused bandwidth from its ancestors. If this is not specified, the class can borrow unused bandwidth from its parent (default off).
isolated
indicates that the class will not share bandwidth with any non-descendant classes
allot
allot is MTU + MAC header
mpu
is explained in section 3.1
weight
should be made proportional to the rate. (explain: CBQ is implemented using the Weighted Round Robin algorithm)
prio
represents the priority that is assigned to this class. priority of value 0 is highest (most important) and value 7 is lowest.
cell
represents the granularity, in bytes, of the packet sizes that are transmitted. It is used to index into the rtab table, which holds the packet transmission times for various packet sizes.
ewma
is explained in section 3.1 (the EWMA used for avgidle)
estimator
is explained in section 3
split
field is used for fast access. This is normally the root of the CBQ tree. It can be set to any node in the hierarchy, thereby enabling the use of a simple and fast classifier, which is configured only for a limited set of keys to point to this node. Only classes with their split node set to this node will be matched. The type of service (TOS in the IP header) and sk->priority are not used for this purpose.
defmap
says that best-effort traffic not classified by other means will fall into this class. defmap is a bitmap of the logical priorities served by this class.
A note about CBQ class setup:
A CBQ class has a FIFO qdisc attached by default.
You *have* to declare the CBQ qdisc first, then the CBQ "parent" class, and then (optionally, I think) the CBQ "leaf" classes. I'm not 100% sure of what I've just said. It's just how I think it works.
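A sketch of that order on a 10Mbit eth0, in the style of a typical CBQ script; the handles, rates and the HTTP port filter are made-up examples, and the commands need root:

```sh
#!/bin/sh
TC=/sbin/tc
DEV=eth0

# 1. root qdisc
$TC qdisc add dev $DEV root handle 10: cbq bandwidth 10Mbit avpkt 1000 cell 8

# 2. parent class using the whole link
$TC class add dev $DEV parent 10:0 classid 10:1 cbq bandwidth 10Mbit \
    rate 10Mbit allot 1514 cell 8 weight 1Mbit prio 8 maxburst 20 avpkt 1000

# 3. leaf classes: 6Mbit for web traffic, 4Mbit for the rest; "bounded"
#    keeps each leaf from borrowing beyond its rate, and weight is kept
#    proportional to rate
$TC class add dev $DEV parent 10:1 classid 10:10 cbq bandwidth 10Mbit \
    rate 6Mbit allot 1514 cell 8 weight 600Kbit prio 5 maxburst 20 avpkt 1000 bounded
$TC class add dev $DEV parent 10:1 classid 10:20 cbq bandwidth 10Mbit \
    rate 4Mbit allot 1514 cell 8 weight 400Kbit prio 5 maxburst 20 avpkt 1000 bounded

# 4. filters: replies from a local web server (source port 80) go to
#    10:10, everything else to 10:20
$TC filter add dev $DEV parent 10:0 protocol ip prio 1 u32 \
    match ip sport 80 0xffff flowid 10:10
$TC filter add dev $DEV parent 10:0 protocol ip prio 2 u32 \
    match ip src 0.0.0.0/0 flowid 10:20
```

Running "tc -s class ls dev eth0" afterwards shows the per-class counters.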
5 filters (or classifier)
Filters are used to classify (map) packets to certain classes based on properties of the packet, e.g. the TOS byte in the IP header, IP addresses, port numbers etc. Queuing disciplines use filters to assign incoming packets to one of their classes. Filters can be maintained per class or per queuing discipline, depending on the design of the queuing discipline. Filters are maintained in filter lists. Filter lists are ordered by priority, in ascending order. The entries are also keyed by the protocol to which they apply, e.g. IP, UDP etc. Filters for the same protocol on the same filter list must have different priority values.
Filters vary in scope.
Filters have meters associated with them (TB+rate estimator)
Usage: tc filter [ add | del | change | get ] dev STRING
[ pref PRIO ] [ protocol PROTO ]
[ estimator INTERVAL TIME_CONSTANT ]
[ root | classid CLASSID ] [ handle FILTERID ]
[ [ FILTER_TYPE ] [ help | OPTIONS ] ]
or
tc filter show [ dev STRING ] [ root | parent CLASSID ]
Where:
FILTER_TYPE := { rsvp | u32 | fw | route | etc. }
FILTERID := ... format depends on classifier, see there
OPTIONS := ... try tc filter add <desired FILTER_KIND> help
The interpretation of the fields:
pref
represents the priority that is assigned to the filter.
protocol
is used by the filter to identify packets belonging only to that protocol. As already mentioned, no two filters can have the same priority and protocol field.
root
indicates that the filter is at the root of the link sharing hierarchy.
classid
represents the handle of the class to which the filter is applied.
handle
represents the handle by which the filter is uniquely identified. The format of the handle differs between classifiers.
estimator
is explained in section 3
5.1 filter rsvp
Use RSVP protocol for classification
Usage: ... rsvp ipproto PROTOCOL session DST[/PORT | GPI ]
[ sender SRC[/PORT | GPI ]
[ classid CLASSID ] [ police POLICE_SPEC ]
[ tunnelid ID ] [ tunnel ID skip NUMBER ]
Where:
GPI := { flowlabel NUMBER | spi/ah SPI | spi/esp SPI |
u{8|16|32} NUMBER mask MASK at OFFSET}
POLICE_SPEC := ... look at TBF
FILTERID := X:Y
Comparing to general packet classification problem, RSVP needs only several relatively simple rules:
(dst, protocol) are always specified, so that we are able to hash them.
ipproto
is an IP protocol (TCP, UDP and maybe others)
session
is destination (address?) with or without port, or gpi (Generalized Port Identifier)
src
may be exact, or may be wildcard, so that we can keep a hash table plus one wildcard entry.
source
port (or flow label) is important only if src is given.
police
specification is explained in section 6, as it should be, but tc (with the help command) refers to TBF instead?
The rsvp filter is used to distinguish an application session (destination port and destination IP address). In a DiffServ edge router it can be used to mark packets of specific applications so that they are classified into the appropriate PHB.
Alexey: IMPLEMENTATION.
We use a two level hash table: The top level is keyed by destination address and protocol ID, every bucket contains a list of "rsvp sessions", identified by destination address, protocol and DPI(="Destination Port ID"): triple (key, mask, offset).
Every bucket has a smaller hash table keyed by source address (cf. RSVP flowspec) and one wildcard entry for wildcard reservations. Every bucket is again a list of "RSVP flows", selected by source address and SPI(="Source Port ID" here rather than "security parameter index"): triple (key, mask, offset).
All the packets with IPv6 extension headers (but AH and ESP) and all fragmented packets go to the best-effort traffic class.
Two "port id"s seem to be redundant; rfc2207 requires only one "Generalized Port Identifier". So for classic ah, esp (and udp, tcp) both *pi should coincide or one of them should be wildcard.
At first sight, this redundancy is just a waste of CPU resources. But DPI and SPI add the possibility to assign different priorities to GPIs. Look also at note 4 about tunnels below.
One complication is the case of tunneled packets. We implement it as following: if the first lookup matches a special session with "tunnelhdr" value not zero, flowid doesn't contain the true flow ID, but the tunnel ID (1...255). In this case, we pull tunnelhdr bytes and restart lookup with tunnel ID added to the list of keys. Simple and stupid 8)8) It's enough for PIMREG and IPIP.
Two GPIs make it possible to parse even GRE packets. F.e. DPI can select ETH_P_IP (and necessary flags to make tunnelhdr correct) in GRE protocol field and SPI matches GRE key. Is it not nice? 8)8)
Well, as result, despite its simplicity, we get a pretty powerful classification engine.
Panagiotis Stathopoulos: Well, an rsvp filter is used to distinguish an application session (dst port, dst ip address). In a DiffServ edge router it can be used to mark packets of specific applications in order to be classified in the appropriate PHB.
note: I have to read more about RSVP
5.2 filter u32
Anything in the header can be used for classification
The U32 filter is the most advanced filter available in the current implementation. It is entirely based on hash tables, which makes it robust when there are many filter rules.
Usage: ... u32 [ match SELECTOR ... ] [ link HTID ] [ classid CLASSID ]
[ police POLICE_SPEC ] [ offset OFFSET_SPEC ]
[ ht HTID ] [ hashkey HASHKEY_SPEC ]
[ sample SAMPLE ]
or u32 divisor DIVISOR
Where: SELECTOR := SAMPLE SAMPLE ...
SAMPLE := { ip | ip6 | udp | tcp | icmp | u{32|16|8} } SAMPLE_ARGS
FILTERID := X:Y:Z
match
SELECTOR contains definition of the pattern, that will be matched to the currently processed packet. Precisely, it defines which bits are to be matched in the packet header and nothing more, but this simple method is very powerful.
link
classid
police
offset
ht
is the hash table
hashkey
is the key into the hash table
sample
is a protocol such as IP, or a higher-layer protocol such as UDP, TCP or ICMP. sample can also be one of the keywords u32, u16 or u8, which specify the length of the pattern in bits. PATTERN and MASK should follow, of the length defined by the previous keyword. The OFFSET parameter is the offset, in bytes, at which to start matching. If the nexthdr+ keyword is given, the offset is relative to the start of the upper-layer header.
police
specification is explained in section 6
The syntax here is match ip <item> <value> <mask>
So match ip protocol 6 0xff matches protocol 6, TCP (see /etc/protocols), and match ip dport 0x17 0xffff matches TELNET (see /etc/services). Note that 0x17 is hexadecimal (23 decimal), not decimal.
note: (mps) ht is the hash table; the HTID (Hash Table ID) shows up as fh (filter handle) in the output of tc filter show
The filters are packed to hash tables of key nodes with a set of 32bit key/mask pairs at every node. Nodes reference next level hash tables etc.
It seems that it represents the best middle point between speed and manageability both by human and by machine.
It is especially useful for link sharing combined with QoS; pure RSVP doesn't need such a general approach and can use much simpler (and faster) schemes.
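The hexadecimal note above is easy to check with printf. The commented filters show how the matches might be combined, assuming a qdisc with handle 10: and classes 10:1 and 10:2 (hypothetical). Lower pref values are checked first, so the more specific telnet filter gets the lower number:

```sh
#!/bin/sh
# Port numbers in u32 matches can be written in hex or decimal;
# printf converts between the two (23 is telnet, see /etc/services).
printf '0x%x\n' 23      # -> 0x17
printf '%d\n' 0x17      # -> 23

# Hypothetical filters (not run here):
#   telnet (TCP, destination port 0x17) to class 10:2
#   tc filter add dev eth0 parent 10:0 protocol ip prio 1 u32 \
#       match ip protocol 6 0xff match ip dport 0x17 0xffff flowid 10:2
#   all other TCP (protocol 6) to class 10:1
#   tc filter add dev eth0 parent 10:0 protocol ip prio 2 u32 \
#       match ip protocol 6 0xff flowid 10:1
```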
5.3 filter fw
Classifier mapping ipchains' fwmark to traffic class
Usage: ... fw [ classid CLASSID ] [ police POLICE_SPEC ]
POLICE_SPEC := ... look at TBF
CLASSID := X:Y
classid
is the class handle
police
specification is explained in section 6, as it should be, but tc (with the help command) refers to TBF instead?
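A hypothetical fw example. It assumes iptables (the netfilter interface of kernel 2.4) is used to set the mark instead of ipchains, and that a qdisc with handle 10: and class 10:10 already exist on eth0:

```sh
#!/bin/sh
# Mark forwarded HTTP traffic with fwmark 1 (for locally generated
# traffic the OUTPUT chain would be used instead of PREROUTING).
iptables -t mangle -A PREROUTING -p tcp --dport 80 -j MARK --set-mark 1

# "handle 1" is the fwmark value; matching packets go to class 10:10.
tc filter add dev eth0 parent 10:0 protocol ip prio 1 handle 1 fw classid 10:10
```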
5.4 filter route
Use routing table decisions for classification
Usage: ... route [ from REALM | fromif TAG ] [ to REALM ]
[ flowid CLASSID ] [ police POLICE_SPEC ]
POLICE_SPEC := ... look at TBF
CLASSID := X:Y
from
REALM is a realm in the ip routing table
fromif
TAG is an interface tag
to
REALM is (again) an ip routing table realm
flowid
CLASSID is the class to which the packet (if matched) is assigned
police
specification is explained in section 6, as it should be, but tc (with the help command) refers to TBF instead?
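A hypothetical route example: the route to 192.168.10.0/24 is tagged with realm 10, and the route classifier sends packets using that route to class 1:10 (addresses, realm, device and class are assumptions):

```sh
#!/bin/sh
# Tag the route with realm 10.
ip route add 192.168.10.0/24 dev eth1 realm 10

# Classify packets whose route has destination realm 10 into class 1:10.
tc filter add dev eth1 parent 1:0 protocol ip prio 100 route to 10 flowid 1:10
```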
For now we assume that route tags are < 256. This allows direct table lookups instead of hash tables.
For now we assume that "from TAG" and "fromdev DEV" statements are mutually exclusive.
"to TAG from ANY" has higher priority than "to ANY from XXX".
5.5 tcindex
Use tc_index internal tag in skb to select classes.
Usage: ... tcindex [ hash SIZE ] [ mask MASK ] [ shift SHIFT ] [ pass_on | fall_through ] [ classid CLASSID ] [ police POLICE_SPEC ]
hash
is the size of the lookup table
mask
is the bit mask applied to skb->tc_index
shift
shifts the masked value right by SHIFT bits
pass_on
defines that this packet will pass
fall_through
classid
is the class to which matching packets are assigned
police
specification is explained in section 6
note: key = (skb->tc_index & mask) >> shift
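The key computation can be checked with shell arithmetic. This sketch assumes the key is (skb->tc_index & mask) >> shift, which is how I read cls_tcindex.c, and that dsmark's set_tc_index has copied the whole DS byte into tc_index; tcindex_key is a hypothetical helper:

```sh
#!/bin/sh
# tcindex_key TC_INDEX MASK SHIFT
# Key used by the tcindex classifier: (tc_index & mask) >> shift.
tcindex_key() {
    echo $(( ($1 & $2) >> $3 ))
}

# "mask 0xfc shift 2" drops the two low bits of the DS byte and
# leaves the DSCP:
tcindex_key 0xb8 0xfc 2    # DS byte 0xb8 -> DSCP 46 (EF)
tcindex_key 0x28 0xfc 2    # DS byte 0x28 -> DSCP 10 (AF11)

# The corresponding filter might be (not run here):
#   tc filter add dev eth0 parent 1:0 protocol ip prio 1 tcindex mask 0xfc shift 2
```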
6 police
The purpose of policing is to ensure that traffic does not exceed certain bounds. For simplicity, we will assume a broad definition of policing and consider it to comprise all kinds of traffic control actions that depend in some way on the traffic volume.
We consider four types of policing mechanisms:
policing decisions by filters
refusal to enqueue a packet
dropping of a packet from an ``inner'' queueing discipline
dropping of packet when enqueuing a new one
Usage: ... police rate BPS burst BYTES[/BYTES] [ mtu BYTES[/BYTES] ]
[ peakrate BPS ] [ avrate BPS ] [ ACTION ]
Where: ACTION := reclassify | drop | continue
rate
is the long-term rate attached to the meter
peakrate
this is the peakrate a flow is allowed to burst in the short-term. Basically this upper-bounds the rate.
mtu
a packet exceeding this size will be dropped. The default value is 2KB. This is fine with ethernet whose MTU is 1.5KB but will not be fine with Gigabit ethernet exploiting Jumbo frames for example. It also will not be valid for the lo device whose MTU is defined by amongst other things how much RAM you have. You must set this value if you have exceptions to the rule.
ACTION
exceed/non-exceed: This lets you define which action is taken when a flow either exceeds its allocated rate or does not. The actions are:
pass
(?)
reclassify
used by CBQ to go to BE (Best Effort, ask Jamal?)
drop
simply drops packet
continue
- lookup the next filter rule with lower priority
note: "drop" is only recognized by the following qdiscs: atm, cbq, dsmark, and (ingress - really?). In particular, prio ignores it.
Bibliography
[1] A. N. Kuznetsov, docs from iproute2
[2] Werner Almesberger, Linux Network Traffic Control - Implementation Overview
[3] Jamal Hadi Salim, IP Quality of Service on Linux, http://????
[4] Saravanan Radhakrishnan, Linux - Advanced Networking Overview, http://qos.ittc.ukans.edu/howto/howto.html
[5] Werner Almesberger, Jamal Hadi Salim, Alexey Kuznetsov - Differentiated Services on Linux
[6] linux-diffserv mailing list, linux-diffserv@lrc.di.epfl.ch
[7] Sally Floyd, Van Jacobson - Link-sharing and Resource Management Models for Packet Networks
[8] Sally Floyd, Van Jacobson - Random Early Detection Gateways for Congestion Avoidance
[9] Related Cisco documents from http://www.cisco.com/
[10] Lixia Zhang, Steve Deering, Deborah Estrin, Scott Shenker, Daniel Zappala - RSVP: A New Resource ReSerVation Protocol
[11] Related RFCs
[12] and many others
Appendix
note: flowid is sometimes class handle sometimes something else
mariano - good setup for me: If you remove the router and then the modem line becomes ppp0 (instead of eth0), you should declare that ppp0 has "bandwidth 30K". Then, the classes should use "bandwidth 30K rate 20K" and "bandwidth 30K rate 10K"
Milan P. Stanic
mps@rns-nis.co.yu
Contents
1 What is QoS
2 command syntax
3 Queueing disciplines
3.1 Class Based Queue
3.2 Priority
3.3 FIFO
3.4 TBF
3.5 RED
3.6 GRED
3.7 SFQ
3.8 ATM
3.9 Dsmark
3.10 INGRESS
4 classes
4.1 CBQ
5 filters (or classifier)
5.1 filter rsvp
5.2 filter u32
5.3 filter fw
5.4 filter route
5.5 tcindex
6 police
Bibliography
Appendix
About this document
This document should be (comprehensive) description of tc command utility from iproute2 package.
Primary motivation for this work is my wish to learn about QoS in Linux (and about QoS in general). If you find errors or big mistakes in this document that is because I don't yet understand QoS. I hope it will improve over time.
It is based on kernel 2.4 and iproute2 version 000305
It is far from to be complete and/or without errors. I am writing it for purpose of my learning only, and I am not sure will (and when) it be finished. But, I am working on it (especially when I have time :).
All of the text is taken from different documents from the net, from linux-diffserv, linux-net mailing lists and from the Linux kernel source files.
Disclaimer: Use at your own risk. I am not responsible if you make loss or damage in any sense by using information from this document.
1 What is QoS
When the kernel has several packets to send out over a network device, it has to decide which ones to send first, which ones to delay, and which ones to drop. This is the job of the packet scheduler, and several different algorithms for how to do this "fairly" have been proposed.
With Linux QoS subsystem (which is constructed of the building blocks of the kernel and user space tools like ip and tc command line utilities) it is possible to make very flexible traffic control.
2 command syntax
tc (traffic controller) is the user level program which can be used to create and associate queues with the network devices. It is used to set up various kinds of queues and associate classes with each of those queues. It is also used to set up filters by which the packets is classified.
Usage: tc [ OPTIONS ] OBJECT { COMMAND | help }
where OBJECT := { qdisc | class | filter }
OPTIONS := { -s[tatistics] | -d[etails] | -r[aw] }
Where it expects a number for BPS, it understands these suffixes: kbps (*1024), mbps (*1024*1024), kbit (*1024/8), and mbit (*1024*1024/8). If I'm reading the code correctly, "BPS" means Bytes Per Second. If you give a number without a suffix, it assumes you mean BITS per second (it divides the number you give it by 8). It also understands bps as a suffix.
Where it's expecting a time value, it seems it understands suffixes of s, sec, and secs for seconds, ms, msec, and msecs for milliseconds, and us, usec, and usecs for microseconds.
Where it wants a size parameter, it assumes non-suffixed numbers to be specified in bytes. It also understands suffixes of k and kb to mean kilobytes (*1024), m and mb to mean megabytes (*1024*1024), kbit to mean kilobit (*1024/8), and mbit to mean megabits (*1024*1024/8).
1Mbit == 128Kbps or 1 megabit is 128 kilobytes per second
bps = bytes/sec
kbps = bytes/sec * 1024
mbps = bytes/sec * 1024 * 1024
kbit = bits/sec * 1024
mbit = bits/sec * 1024 * 1024
In many examples Xbit and Xbps are used interchangeably, although tc treats them very differently.
note: this is very confusing
note: make sure whenever you are dealing with memory related things like queue size, buffer size that their units are in bytes and when it is bandwidth and rate related parameters the units are in bits.
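The suffix rules above can be checked with a minimal Python sketch. It encodes only the suffixes and 1024-based multipliers listed in this section (for the iproute2 version this text is based on). The function name rate_to_bytes_per_sec is hypothetical. Newer tc releases may treat suffixes differently, so check tc(8) for your version.

```python
# Rate suffixes as described above, converted to bytes per second.
# Multipliers are 1024-based, following the iproute2 version this text covers.
RATE_SUFFIXES = {
    "kbps": 1024,             # kilobytes per second
    "mbps": 1024 * 1024,      # megabytes per second
    "kbit": 1024 / 8,         # kilobits per second
    "mbit": 1024 * 1024 / 8,  # megabits per second
    "bps": 1,                 # bytes per second
}


def rate_to_bytes_per_sec(text: str) -> float:
    """Convert a tc-style rate such as '10Mbit' or '128kbps' to bytes/sec."""
    text = text.strip().lower()
    # Try longer suffixes first so that 'kbps' is not read as 'k' + 'bps'.
    for suffix in sorted(RATE_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * RATE_SUFFIXES[suffix]
    # A bare number is taken as bits per second, i.e. divided by 8.
    return float(text) / 8


if __name__ == "__main__":
    # 1Mbit == 128Kbps: both are 131072 bytes per second.
    print(rate_to_bytes_per_sec("1Mbit"), rate_to_bytes_per_sec("128kbps"))
```

Mixing up "kbit" and "kbps" changes a rate by a factor of 8. That is exactly the confusion noted above.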
3 Queueing disciplines
Each network device has a queuing discipline associated with it, which controls how packets enqueued on that device are treated. It can be viewed with the ip command:
root@dl:# ip link show
1: lo: <LOOPBACK,UP> mtu 3924 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,PROMISC,UP> mtu 1500 qdisc pfifo_fast qlen 100
link/ether 52:54:00:de:bf:19 brd ff:ff:ff:ff:ff:ff
3: tap0: <BROADCAST,MULTICAST,NOARP> mtu 1500 qdisc noop
link/ether fe:fd:00:00:00:00 brd ff:ff:ff:ff:ff:ff
Generally, a queueing discipline ("qdisc") is a black box that can enqueue packets and dequeue them (when the device is ready to send something). The order and timing are determined by the algorithm hidden inside it.
By default, the queueing discipline is pfifo_fast, which cannot be manipulated with tc. It is assigned to a device when the device is started, or when another qdisc is deleted from the device. This qdisc has 3 bands, which are processed from band 0 to band 2. As long as a packet is queued in a higher-priority band (lower number), the lower-priority bands are not serviced.
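The strict band order of pfifo_fast can be illustrated with a minimal Python model. This is a sketch, not the kernel code. The class name ThreeBandFifo is hypothetical, and band selection is left out: the caller passes the band directly.

```python
from collections import deque


class ThreeBandFifo:
    """Toy model of pfifo_fast's dequeue order: band 0 first, band 2 last."""

    def __init__(self, bands: int = 3):
        self.bands = [deque() for _ in range(bands)]

    def enqueue(self, packet, band: int) -> None:
        self.bands[band].append(packet)

    def dequeue(self):
        # Scan from the highest-priority band (lowest number); a lower band
        # is only served when every band above it is empty.
        for queue in self.bands:
            if queue:
                return queue.popleft()
        return None


if __name__ == "__main__":
    q = ThreeBandFifo()
    q.enqueue("bulk", band=2)
    q.enqueue("normal", band=1)
    q.enqueue("interactive", band=0)
    print([q.dequeue() for _ in range(3)])  # ['interactive', 'normal', 'bulk']
```

Because the scan always restarts at band 0, a steady stream of band-0 traffic can starve bands 1 and 2.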
The qdiscs are:
FIFO - simple FIFO (packet (p-FIFO) or byte (b-FIFO) )
PRIO - n-band strict priority scheduler
TBF - token bucket filter
CBQ - class based queue
CSZ - Clark-Scott-Zhang
SFQ - stochastic fair queue
RED - random early detection
GRED - generalized random early detection
TEQL - traffic equalizer
ATM - asynchronous transfer mode
DSMARK - DSCP (Diff-Serv Code Point) marker/remarker
Qdiscs are divided into two categories:
- "queues", which have no internal structure visible from outside.
- "schedulers", which split all the packets into "traffic classes" using "packet classifiers".
In turn, classes may have child qdiscs (as a rule, queues) attached to them, and so on.
note: Certain qdiscs can have children; these are classful, and the others are leaves (describe it!)
classful qdiscs: CBQ, ATM, DSMARK, CSZ and PRIO
leaf qdiscs: TBF, FIFO, SFQ, RED, GRED, TEQL
note: classful qdiscs can also be leaves
The syntax for managing queuing discipline is:
Usage: tc qdisc [ add | del | replace | change | get ] dev STRING
[ handle QHANDLE ] [ root | ingress | parent CLASSID ]
[ estimator INTERVAL TIME_CONSTANT ]
[ [ QDISC_KIND ] [ help | OPTIONS ] ]
tc qdisc show [ dev STRING ] [ingress]
Where:
QDISC_KIND := { [p|b]fifo | tbf | prio | cbq | red | etc. }
OPTIONS := ... try tc qdisc add <desired QDISC_KIND> help
add
adds a qdisc to device dev
del
delete qdisc from device dev
replace
replace the qdisc with another
handle
represents the unique handle that is assigned by the user to the queuing discipline. No two queuing disciplines can have the same handle. Qdisc handles always have minor number equal to zero.
root
indicates that the queue is at the root of a link-sharing hierarchy and owns all bandwidth on that device. There can be only one root qdisc per device.
ingress
policing on the ingress
parent
represents the handle of the parent queuing discipline.
dev
is the network device to which we want to attach the qdisc
estimator
is used to determine whether the requirements of the queue have been satisfied. INTERVAL and TIME_CONSTANT are two parameters of great significance to the estimator. The estimator estimates the bandwidth used by each class over the appropriate time interval. This determines whether or not each class has been receiving its link-sharing bandwidth.
Usage: ... estimator INTERVAL TIME-CONST
INTERVAL is interval between measurements
TIME-CONST is averaging time constant
Example: ... est 1sec 8sec
The time constant for the estimator is a critical parameter; this time constant determines the interval over which the router attempts to enforce the link-sharing guidelines.
[1] Unfortunately, rate estimation is not a very easy task. For example, I did not find a simple way to estimate the current peak rate, and I even failed to formulate the problem. So I preferred not to build an estimator into the scheduler, but to run this task separately. Ideally, it should be kernel thread(s), but for now it runs from timers. This puts apparent upper bounds on the number of rated flows and has minimal overhead when that number is small. It is enough to handle controlled-load service and sets of aggregates.
We measure the rate over $A = 2^{\text{interval}}$ seconds (1<<interval in the source) and evaluate the EWMA:
$$\text{avrate} = \text{avrate}\cdot(1-W) + \text{rate}\cdot W$$
where W is chosen as a negative power of 2:
$$W = 2^{-\text{ewma\_log}}$$
The resulting time constant is:
$$T = \frac{A}{-\ln(1-W)}$$
NOTES.
* The stored value for avbps is scaled by $2^5$, so that the maximal rate is 1Gbit; avpps is scaled by $2^{10}$.
* The minimal interval is HZ/4 = 250 msec (it is the greatest common divisor for HZ=100 and HZ=1024). The maximal interval is $(HZ/4)\cdot 2^{\text{EST\_MAX\_INTERVAL}}$ = 8 sec. Shorter intervals are too expensive; longer ones can be implemented painlessly at user level.
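The EWMA update and the time-constant formula above can be evaluated numerically. Below is a minimal Python sketch. The function names are hypothetical, and the floating-point arithmetic only mirrors the formulas, not the kernel's scaled integer code.

```python
import math


def ewma_update(avrate: float, rate: float, ewma_log: int) -> float:
    """One estimator step: avrate = avrate*(1-W) + rate*W, W = 2**-ewma_log."""
    w = 2.0 ** -ewma_log
    return avrate * (1 - w) + rate * w


def time_constant(interval: int, ewma_log: int) -> float:
    """T = A / -ln(1-W), where A = 2**interval seconds (1<<interval)."""
    a = 2.0 ** interval
    w = 2.0 ** -ewma_log
    return a / -math.log(1 - w)


if __name__ == "__main__":
    # Measuring every 1 s (interval=0) with W = 1/8 (ewma_log=3) gives a time
    # constant of roughly 7.5 s, i.e. about eight measurement intervals.
    print(round(time_constant(0, 3), 2))  # 7.49
    # Starting from 0, one 800 bytes/s sample moves the average to 100.
    print(ewma_update(0.0, 800.0, 3))     # 100.0
```

A negative interval (for example -2, the 250 msec minimum) simply scales A, and therefore T, down by the same factor.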
You *have* to declare the CBQ qdisc first, then the CBQ "parent" class, and then (optionally, I think) the CBQ "leaf" classes.
I'm not 100% sure of what I've just said. It's just how I think it works.
To stop QoS completely on eth0, use the following:
tc qdisc del dev eth0 root
3.1 Class Based Queue
In CBQ, every class has variables idle and avgidle and parameter maxidle used in computing the limit status for the class, and the parameter offtime used in determining how long to restrict throughput for overlimit classes.
idle:
The variable idle is the difference between the measured actual time and the desired time between the most recent packet transmissions, i.e. between the last two packets sent from this class. When the connection is sending more than its allocated bandwidth, idle is negative. When the connection is sending perfectly at its allotted rate, idle is zero.
avgidle:
The variable avgidle is the average of idle, computed using an exponentially weighted moving average (EWMA). When avgidle is zero or lower, the class is overlimit (the class has been exceeding its allocated bandwidth in a recent short time interval).
maxidle:
The parameter maxidle gives an upper bound for avgidle. Thus maxidle limits the credit given to a class that has recently been under its allocation.
offtime:
The parameter offtime gives the time interval that an overlimit class must wait before sending another packet. This parameter determines the steady-state burst size for a class when the class is running over its limit.
minidle:
The minidle parameter gives a (negative) lower bound for avgidle. Thus, a negative minidle lets the scheduler remember that a class has recently used more than its allocated bandwidth.
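The interaction of idle, avgidle, maxidle and minidle can be sketched in Python. This is a simplified model under stated assumptions:
- idle is the measured minus the desired inter-departure time, so it is negative when the class sends too fast.
- The EWMA weight w is hypothetical.
- The real CBQ code uses scaled integer arithmetic and also derives offtime, which is not modeled here.

```python
def update_avgidle(avgidle: float, measured_gap: float, desired_gap: float,
                   w: float, maxidle: float, minidle: float) -> float:
    """Return the new avgidle after one packet from the class."""
    # idle < 0 when packets leave faster than the allocated rate allows.
    idle = measured_gap - desired_gap
    # Exponentially weighted moving average of idle.
    avgidle = (1 - w) * avgidle + w * idle
    # maxidle caps the credit; minidle (negative) bounds the debt.
    return max(minidle, min(maxidle, avgidle))


def is_overlimit(avgidle: float) -> bool:
    # A class whose avgidle is zero or lower is overlimit.
    return avgidle <= 0


if __name__ == "__main__":
    # Desired gap 1 ms (the class's rate); the class sends every 0.5 ms.
    avg = 0.002  # start with the maximum credit
    for _ in range(10):
        avg = update_avgidle(avg, 0.0005, 0.001, w=0.25,
                             maxidle=0.002, minidle=-0.002)
    print(is_overlimit(avg))  # True
```

The starting credit (capped by maxidle) lets the class send a few packets too fast before it becomes overlimit.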
Usage: ... cbq bandwidth BPS avpkt BYTES [ mpu BYTES ]
[ cell BYTES ] [ ewma LOG ]
bandwidth
represents the maximum bandwidth available to the device to which the queue is attached.
avpkt
represents the average packet size. This is used in determining the transmission time which is given as Transmission Time t = average packet size / Link Bandwidth
mpu
represents the minimum number of bytes that will be accounted for a packet. Packets smaller than mpu are counted as mpu bytes. This is done because on Ethernet-like interfaces the minimum packet size is 64 bytes, so this value is usually set to 64.
cell
represents the boundaries of the bytes in the packets that are transmitted. It is used to index into an rtab table, that maintains the packet transmission times for various packet sizes.
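The avpkt formula (t = size / bandwidth) and the cell and mpu parameters can be illustrated with a Python sketch of a rate table. It assumes the following:
- The table has 256 slots, indexed by packet_length // cell.
- Slot i is charged as (i + 1) * cell bytes, never less than mpu.

This mirrors the idea described above, not necessarily tc's exact code, and the function names are hypothetical.

```python
def build_rate_table(bytes_per_sec: float, cell: int = 8, mpu: int = 0,
                     slots: int = 256) -> list:
    """Precompute transmission times (seconds) for packet-size slots."""
    table = []
    for i in range(slots):
        size = (i + 1) * cell        # size charged for every packet in slot i
        size = max(size, mpu)        # small packets are counted as mpu bytes
        table.append(size / bytes_per_sec)  # t = size / bandwidth
    return table


def xmit_time(table: list, packet_len: int, cell: int = 8) -> float:
    """Look up the transmission time of one packet in the rate table."""
    slot = min(packet_len // cell, len(table) - 1)
    return table[slot]


if __name__ == "__main__":
    # A 1000 bytes/sec link, 8-byte cells, 64-byte minimum packet unit.
    table = build_rate_table(1000.0, cell=8, mpu=64)
    print(xmit_time(table, 10))    # 0.064 (charged as mpu = 64 bytes)
    print(xmit_time(table, 1000))  # 1.008 (slot 125, charged as 1008 bytes)
```

A larger cell makes the table cover bigger packets with the same number of slots, at the cost of coarser rounding. In this sketch, packets beyond the last slot are simply charged as the last slot.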
A CBQ class is automatically generated when a CBQ qdisc is created. ??
note: rtab is rate table?
note: mariano: should first declare a cbq "parent" class (which uses all the bandwidth) and then declare the two "leaf" classes.
CBQ is a complex qdisc. To understand it fully, it is good to read Sally Floyd and Van Jacobson's paper [7].
3.2 Priority
Simple priority queue
Usage: ... prio bands NUMBER priomap P1 P2...
Where:
bands
number of bands to add (default 3)
priomap
defines the priomap (defaults to the 3-band scheduler map)
So if you define more than 3 bands, make sure to redefine the priomap.
In prio as long as there is data to be dequeued in the higher priority queue, prio will favor the higher queue.
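The priomap is a 16-entry table that maps a packet's Linux priority (0 to 15) to a band. Below is a Python sketch, assuming the commonly shown default map "1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1" for 3 bands. The function names are hypothetical. The sketch also shows why more bands need a new priomap: the default map never selects a band above 2.

```python
# Default 3-band priomap: index = packet priority (0..15), value = band.
DEFAULT_PRIOMAP = [1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]


def select_band(priority: int, priomap=DEFAULT_PRIOMAP) -> int:
    """Return the band for a packet priority; only the low 4 bits are used."""
    return priomap[priority & 15]


def check_priomap(priomap, bands: int) -> None:
    """Reject priomaps that are the wrong size or name nonexistent bands."""
    if len(priomap) != 16:
        raise ValueError("priomap needs exactly 16 entries")
    if any(not 0 <= band < bands for band in priomap):
        raise ValueError("priomap refers to a band that does not exist")


if __name__ == "__main__":
    print(select_band(0), select_band(6), select_band(2))  # 1 0 2
    check_priomap(DEFAULT_PRIOMAP, bands=3)                # accepted
    # With 4 bands the default map is accepted, but band 3 is never used.
    print(3 in DEFAULT_PRIOMAP)                            # False
```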
3.3 FIFO
A simple First-In-First-Out queue that provides basic store-and-forward capability. FIFO (in its pfifo_fast variant) is the default qdisc on most real interfaces.
Usage: ... [p|b]fifo [ limit NUMBER ]
"b" stands for bytes, while "p" stands for packets.
limit
maximum length of the queue in bytes for bfifo and in packets for pfifo
This means that the maximum length of the fifo queue is measured in bytes in the first case and in number of packets in the second case.
small note: The fifo limit can be set to 0, but this still allows a single packet to be enqueued.
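The difference between the two kinds of limit can be shown with a tiny Python model. The class name is hypothetical. Only the tail-drop rule is modeled, not the limit-0 special case mentioned above.

```python
from collections import deque


class FifoSketch:
    """Tail-drop FIFO whose limit counts packets (pfifo) or bytes (bfifo)."""

    def __init__(self, limit: int, in_bytes: bool):
        self.limit = limit
        self.in_bytes = in_bytes
        self.queue = deque()
        self.backlog = 0  # bytes currently queued

    def enqueue(self, size: int) -> bool:
        used = self.backlog if self.in_bytes else len(self.queue)
        cost = size if self.in_bytes else 1
        if used + cost > self.limit:
            return False  # queue full: the packet is dropped
        self.queue.append(size)
        self.backlog += size
        return True


if __name__ == "__main__":
    bfifo = FifoSketch(limit=3000, in_bytes=True)
    pfifo = FifoSketch(limit=2, in_bytes=False)
    print([bfifo.enqueue(1500) for _ in range(3)])  # [True, True, False]
    print([pfifo.enqueue(64) for _ in range(3)])    # [True, True, False]
```

With bfifo, small packets use less of the limit, so many more of them fit. With pfifo, a 64-byte packet costs as much as a 1500-byte one.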
3.4 TBF
The Token Bucket Filter is a qdisc built around a bucket of tokens. A packet can be sent only if there are enough tokens in the bucket, and sending it takes those tokens out. The kernel refills the bucket at the configured rate, up to the bucket size.
Usage: ... tbf limit BYTES burst BYTES[/BYTES] rate KBPS
[ mtu BYTES[/BYTES] ] [ peakrate KBPS ] [ latency TIME ]
limit
is the number of bytes that can be queued
burst
is the size of the bucket in bytes: how much data can be sent at once, at full speed, before the rate limit takes effect
rate
is used indirectly: tc uses the rate to calculate the transmission time required for each packet size from mpu to mtu. Another definition: the rate option is what controls the bandwidth. AFAIK "bandwidth" represents the "real" bandwidth of the device.
mtu
is maximum transfer unit
peakrate
max short term rate
latency
max latency to queuing
Jamal: TBF is influenced by quite a few parameters: peakrate, rate, MTU, burst size etc. It will do what you ask it to ;-> And at times it will let bursts flood the gate, i.e. you might end up sending at wire speed. What are your parameters like?
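The token bucket idea can be sketched in Python. This is a simplified model under stated assumptions:
- rate is in bytes per second.
- burst is the bucket size in bytes.
- The bucket starts full.
- peakrate, mtu and the queue limit are ignored.

The class name is hypothetical.

```python
class TokenBucketSketch:
    def __init__(self, rate: float, burst: float):
        self.rate = rate      # tokens (bytes) added per second
        self.burst = burst    # bucket size in bytes
        self.tokens = burst   # assume the bucket starts full
        self.last = 0.0       # time of the last refill, in seconds

    def _refill(self, now: float) -> None:
        # Add tokens for the elapsed time, never beyond the bucket size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def can_send(self, size: int, now: float) -> bool:
        """Send (and pay for) a packet if enough tokens are available."""
        self._refill(now)
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False  # the packet has to wait in the queue


if __name__ == "__main__":
    tb = TokenBucketSketch(rate=1000.0, burst=3000.0)
    # A full bucket lets two 1500-byte packets go out back to back...
    print([tb.can_send(1500, 0.0) for _ in range(3)])  # [True, True, False]
    # ...after which the rate limits sending: 1.5 s later 1500 tokens exist.
    print(tb.can_send(1500, 1.5))                      # True
```

This also illustrates Jamal's remark: after an idle period a full bucket's worth of data goes out at wire speed.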
3.5 RED
Random Early Detection discards packets even when there is space in the queue. As the queue length increases, the drop probability also increases. This approach notifies senders that congestion is likely before it actually occurs.
Usage: ... red limit BYTES min BYTES max BYTES avpkt BYTES burst PACKETS
probability PROBABILITY bandwidth KBPS [ ecn ]
limit
actual physical size of the queue
min
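minimum threshold in bytes: the average queue size at which dropping may start
max
maximum threshold in bytes: the average queue size at which the drop probability reaches its maximum
avpkt
is the average packet size
burst
is the burstiness, in packets; it determines how fast the average queue size follows the real queue size (from Jamal: it is used to compute the time constant)
probability
is the maximum drop probability, reached when the average queue size reaches max
bandwidth
should be the real bandwidth of the interface
ecn
is a flag that enables Explicit Congestion Notification: packets from ECN-capable senders are marked instead of dropped
Always make sure that min < max < limit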
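The drop decision described above can be sketched in Python. This is a simplified version of classic RED under stated assumptions:
- The average queue size is an EWMA with a hypothetical weight w. In tc it is derived from burst and avpkt.
- The drop probability rises linearly from 0 at min to probability at max.
- Everything above max is dropped.
- The count-based correction used by real RED implementations is omitted.

```python
import random


def red_average(avg: float, queue_len: float, w: float) -> float:
    """Update the average queue size (bytes) with EWMA weight w."""
    return (1 - w) * avg + w * queue_len


def red_drop_probability(avg: float, min_th: float, max_th: float,
                         max_p: float) -> float:
    """Drop probability for a given average queue size."""
    if not min_th < max_th:
        raise ValueError("min must be smaller than max")
    if avg < min_th:
        return 0.0  # short queue: never drop
    if avg >= max_th:
        return 1.0  # long queue: always drop (or mark, with ecn)
    return max_p * (avg - min_th) / (max_th - min_th)


def red_should_drop(avg, min_th, max_th, max_p, rng=random.random) -> bool:
    """Random decision; rng can be replaced for deterministic testing."""
    return rng() < red_drop_probability(avg, min_th, max_th, max_p)


if __name__ == "__main__":
    # Halfway between min and max, half of the maximum probability applies.
    print(round(red_drop_probability(15000, 10000, 20000, 0.02), 4))  # 0.01
```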
3.6 GRED
Generalized RED is used in the DiffServ implementation; it has virtual queues (VQs) within the physical queue. Currently, the number of virtual queues is limited to 16.
GRED is configured in two steps. First the generic parameters are configured to select the number of virtual queues DPs and whether to turn on the RIO-like buffer sharing scheme. Also at this point, a default virtual queue is selected.
The second step is used to set parameters for individual virtual queues.
Usage: ... gred DP drop-probability limit BYTES min BYTES max BYTES
avpkt BYTES burst PACKETS probability PROBABILITY bandwidth KBPS
[prio value]
OR ... gred setup DPs <num of DPs> default <default DP> [grio]
setup
identifies that this is a generic setup for GRED
DPs
is the number of virtual queues
default
specifies default virtual queue
grio
turns on the RIO-like buffering scheme
limit
defines the virtual queue "physical" limit in bytes
min
defines the minimum threshold value in bytes
max
defines the maximum threshold value in bytes
avpkt
is the average packet size in bytes
bandwidth
is the wire-speed of the interface
burst
is the number of average-sized packets allowed to burst
probability
defines the drop probability in the range (0...)
DP
identifies the virtual queue assigned to these parameters
drop-probability
?
prio
identifies the virtual queue priority if grio was set in general parameters
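The two configuration steps can be mirrored by a small Python validator. It only checks the constraints stated in this section:
- at most 16 virtual queues;
- a default VQ that exists;
- min < max < limit for each VQ, as for RED;
- prio only meaningful with grio.

The names are hypothetical, DP numbers are assumed to run from 0 to DPs - 1, and the code does not call tc.

```python
from dataclasses import dataclass
from typing import Dict, Optional

MAX_VQS = 16  # the number of virtual queues is limited to 16


@dataclass
class GredSetup:  # step 1: gred setup DPs N default D [grio]
    dps: int
    default: int
    grio: bool = False


@dataclass
class GredVQ:  # step 2: parameters of one virtual queue, in bytes
    limit: int
    min_th: int
    max_th: int
    prio: Optional[int] = None


def validate(setup: GredSetup, vqs: Dict[int, GredVQ]) -> None:
    """Raise ValueError if the configuration breaks a stated constraint."""
    if not 1 <= setup.dps <= MAX_VQS:
        raise ValueError("DPs must be between 1 and 16")
    if not 0 <= setup.default < setup.dps:
        raise ValueError("default must name an existing virtual queue")
    for dp, vq in vqs.items():
        if not 0 <= dp < setup.dps:
            raise ValueError(f"DP {dp} is outside 0..{setup.dps - 1}")
        if not vq.min_th < vq.max_th < vq.limit:
            raise ValueError(f"DP {dp}: need min < max < limit")
        if vq.prio is not None and not setup.grio:
            raise ValueError(f"DP {dp}: prio is only used with grio")


if __name__ == "__main__":
    setup = GredSetup(dps=4, default=1, grio=True)
    vqs = {0: GredVQ(60000, 15000, 45000, prio=1),
           1: GredVQ(60000, 15000, 45000, prio=2)}
    validate(setup, vqs)  # passes silently
```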
3.7 SFQ
Stochastic Fair Queue, as its name implies, aims at fairness between flows: packets are hashed into per-flow queues, which are processed in round-robin order.
Usage: ... sfq [ perturb SECS ] [ quantum BYTES ]
perturb
is the number of seconds after which the hashing function is changed, so that any hash collision lasts only for a short time interval (the perturb interval).
quantum
is DRR (Deficit Round Robin) round quantum like in CBQ.
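Hashing plus round robin can be sketched in Python. Assumptions:
- Flows are identified by a tuple.
- A seeded zlib.crc32 stands in for the kernel's hash.
- There are 1024 buckets.
- Each active bucket sends one packet per turn instead of quantum bytes (DRR).

The class name SfqSketch is hypothetical.

```python
import zlib
from collections import deque


class SfqSketch:
    """Flows hashed into buckets; buckets served round-robin, one packet each."""

    def __init__(self, buckets: int = 1024, seed: int = 0):
        self.buckets = buckets
        self.seed = seed         # changed every `perturb` seconds
        self.queues = {}         # bucket -> deque of packets
        self.active = deque()    # non-empty buckets in service order

    def bucket_of(self, flow) -> int:
        data = repr(flow).encode() + self.seed.to_bytes(4, "little")
        return zlib.crc32(data) % self.buckets

    def perturb(self, new_seed: int) -> None:
        # New hash function: flows that collided will probably separate.
        # (This sketch only rehashes packets enqueued after the change.)
        self.seed = new_seed

    def enqueue(self, flow, packet) -> None:
        bucket = self.bucket_of(flow)
        queue = self.queues.setdefault(bucket, deque())
        if not queue:
            self.active.append(bucket)  # bucket becomes active
        queue.append(packet)

    def dequeue(self):
        if not self.active:
            return None
        bucket = self.active.popleft()
        queue = self.queues[bucket]
        packet = queue.popleft()
        if queue:
            self.active.append(bucket)  # still backlogged: back of the line
        return packet
```

If two flows land in different buckets, a flow with a long backlog cannot delay the other one by more than one packet per round.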
3.8 ATM
Used to re-direct flows from the default path to ATM VCs. Each flow can have its own ATM VC, but multiple flows can also share the same VC.
Werner: ATM qdisc is different. It takes packets from some traffic stream (no matter what interface or such), and sends it over specific (and typically dedicated) ATM connections.
Werner: Then there's the case of qdiscs that don't really queue data, e.g. sch_dsmark or sch_atm.
3.9 Dsmark
Diff-serv marker isn't really a queuing discipline. It marks packets according to a specified rule. It is configured first as a qdisc and after that as a class (if it is used for classification).
Usage: dsmark indices INDICES [ default_index DEFAULT_INDEX ] [ set_tc_index ]
indices
is the size of the table of (mask, value) pairs. See below. (maybe mask value)
default_index
is used if the classifier finds no match
set_tc_index
if set, retrieves the content of the DS field and stores it in skb->tc_index
When invoked to create a class, its parameters are:
Usage: ... dsmark [ mask MASK ] [ value VALUE ]
mask
mask on DSCP (default 0xff)
value
value to or with (default 0)
Outgoing DSCP = (Incoming DSCP AND mask) OR value
Where Incoming DSCP is the DSCP value of the original incoming packet, and Outgoing DSCP is the DSCP that the packet will be assigned as it leaves the queue.
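The remarking rule is a single bitwise expression. Below is a Python sketch. dsmark_remark is a hypothetical name, the defaults are the mask 0xff and value 0 given above, and the field is treated as one 8-bit byte.

```python
def dsmark_remark(incoming: int, mask: int = 0xFF, value: int = 0x00) -> int:
    """Outgoing = (Incoming AND mask) OR value, kept to 8 bits."""
    return ((incoming & mask) | value) & 0xFF


if __name__ == "__main__":
    print(hex(dsmark_remark(0x28)))                          # 0x28 (unchanged)
    print(hex(dsmark_remark(0x28, mask=0x03, value=0xB8)))   # 0xb8
```

The mask selects the bits to keep, and value supplies the bits to set. With the defaults, packets pass through unchanged.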
3.10 INGRESS
if present, the ingress qdisc is invoked for each packet arriving on the respective interface
ingress is a qdisc that only classifies but doesn't queue
the usual classifiers, classifier combinations, and policing functions can be used
the classification result is stored in skb->tc_index, a la sch_dsmark
if the classification returns a "drop" result (TC_POLICE_SHOT), the packet is discarded. Otherwise, it is accepted.
Since there is no queue for implicit rate limiting (via PRIO, TBF, CBQ, etc.), rate limiting must be done explicitly via policing. This is still done exactly like policing on egress.
4 classes
mps: should I explain what a class is and how closely it is tied to its qdisc? Yes? Classes are the main component of QoS. (stupid explanation)
The syntax for creating a class is shown below:
tc class [ add | del | change | get ] dev STRING
[ classid CLASSID ] [ root | parent CLASSID ]
[ [ QDISC_KIND ] [ help | OPTIONS ] ]
tc class show [ dev STRING ] [ root | parent CLASSID ]
Where: QDISC_KIND := { prio | cbq | etc. }
OPTIONS := ... try tc class add <desired QDISC_KIND> help
The QDISC_KIND can be one of the queuing disciplines that support classes. The interpretation of the fields:
classid
represents the handle that is assigned to the class by the user. It consists of a major number and a minor number, which have been discussed already.
root
indicates that the class represents the root class in the link sharing hierarchy.
parent
indicates the handle of the parent of the queuing discipline.
4.1 CBQ
This algorithm classifies the waiting packets into a tree-like hierarchy of classes; the leaves of this tree are in turn scheduled by separate algorithms (called "disciplines" in this context).
Usage: ... cbq bandwidth BPS rate BPS maxburst PKTS [ avpkt BYTES ]
[ minburst PKTS ] [ bounded ] [ isolated ]
[ allot BYTES ] [ mpu BYTES ] [ weight RATE ]
[ prio NUMBER ] [ cell BYTES ] [ ewma LOG ]
[ estimator INTERVAL TIME_CONSTANT ]
[ split CLASSID ] [ defmap MASK/CHANGE ]
bandwidth
represents the maximum bandwidth that is available to the queuing discipline owned by this class. It is only used as helper value to compute min/max idle values from maxburst and avpkt.
rate
represents the bandwidth that is allocated to this class. rate should be set to the desired bandwidth (you want) to allocate to a given traffic class. The kernel does not use this directly. It uses pre-calculated rate translation tables. It is used to compute overlimit status of class.
maxburst
represents the number of (average-sized) packets that will be sent in the longest possible burst
avpkt
represents the average number of bytes in a packet belonging to this class.
minburst
represents the number of (average-sized) packets that will be sent in the shortest possible burst
bounded
indicates that the class cannot borrow unused bandwidth from its ancestors. If this is not specified, then the class can borrow unused bandwidth from the parent (default off).
isolated
indicates that the class will not share bandwidth with any non-descendant classes
allot
allot is MTU + MAC header
mpu
is explained with the CBQ qdisc parameters in section 3.1
weight
should be made proportional to the rate (to explain: CBQ is implemented using the Weighted Round Robin algorithm)
prio
represents the priority that is assigned to this class. priority of value 0 is highest (most important) and value 7 is lowest.
cell
represents the boundaries of the bytes in the packets that are transmitted. It is used to index into an rtab table, that maintains the packet transmission times for various packet sizes.
ewma
is explained with the EWMA formula in the estimator description in section 3
estimator
is explained in section 3
split
field is used for fast access. This is normally the root of the CBQ tree. It can be set to any node in the hierarchy, enabling a simple and fast classifier that is configured for only a limited set of keys to point to this node. Only classes with their split node set to this node will be matched. The type of service (TOS in the IP header) and sk->priority are not used for this purpose.
defmap
says that best-effort traffic not classified by other means will fall into this class. defmap is a bitmap of the logical priorities served by this class.
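The bounded flag decides whether an overlimit class may borrow from its ancestors. Below is a minimal Python sketch under stated assumptions:
- An overlimit class walks up to its parent unless it is bounded.
- The walk stops at the first underlimit ancestor, or at a bounded one.
- isolated, priorities and weighted round robin are not modeled.

The class, field names and class ids are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CbqClassSketch:
    name: str
    underlimit: bool                  # e.g. avgidle > 0 for this class
    bounded: bool = False             # bounded: never borrow from ancestors
    parent: Optional["CbqClassSketch"] = None


def may_send(cls: CbqClassSketch) -> bool:
    """True if cls may send now, borrowing from ancestors when allowed."""
    current = cls
    while current is not None:
        if current.underlimit:
            return True               # own allocation, or an ancestor's spare
        if current.bounded:
            return False              # a bounded class cannot borrow further up
        current = current.parent
    return False


if __name__ == "__main__":
    root = CbqClassSketch("10:1", underlimit=True)
    capped = CbqClassSketch("10:61", underlimit=False, bounded=True, parent=root)
    free = CbqClassSketch("10:63", underlimit=False, parent=root)
    print(may_send(capped), may_send(free))  # False True
```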
A note about CBQ class setup:
cbq class has fifo qdisc attached by default
5 filters (or classifier)
Filters are used to classify (map) packets to certain classes based on properties of the packet, e.g. the TOS byte in the IP header, IP addresses, port numbers, etc. Queuing disciplines use filters to assign incoming packets to one of their classes. Filters can be maintained per class or per queuing discipline, depending on the design of the queuing discipline. Filters are maintained in filter lists, which are ordered by priority in ascending order. The entries are also keyed by the protocol to which they apply, e.g. IP, UDP, etc. Filters for the same protocol on the same filter list must have different priority values.
Filters vary in scope.
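The ordering rules above can be expressed as a small Python sketch:
- Filters are kept in ascending pref order and keyed by protocol.
- (pref, protocol) must be unique.
- The first matching filter decides the class.

FilterListSketch, the toy packet dicts and the class ids are hypothetical. Real filters are configured with tc filter, not with Python.

```python
from typing import Callable, List, Optional, Tuple

Packet = dict  # a toy packet, e.g. {"protocol": "ip", "dport": 80}


class FilterListSketch:
    def __init__(self) -> None:
        self.filters: List[Tuple[int, str, Callable[[Packet], bool], str]] = []

    def add(self, pref: int, protocol: str,
            match: Callable[[Packet], bool], classid: str) -> None:
        if any(p == pref and proto == protocol
               for p, proto, _, _ in self.filters):
            raise ValueError("this pref is already used for this protocol")
        self.filters.append((pref, protocol, match, classid))
        self.filters.sort(key=lambda f: f[0])  # ascending priority

    def classify(self, packet: Packet) -> Optional[str]:
        for _pref, protocol, match, classid in self.filters:
            if protocol == packet["protocol"] and match(packet):
                return classid  # the first match in pref order wins
        return None  # unclassified


if __name__ == "__main__":
    fl = FilterListSketch()
    fl.add(2, "ip", lambda p: True, "10:65")                   # catch-all
    fl.add(1, "ip", lambda p: p.get("dport") == 80, "10:61")   # HTTP first
    print(fl.classify({"protocol": "ip", "dport": 80}))  # 10:61
    print(fl.classify({"protocol": "ip", "dport": 22}))  # 10:65
```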
Filters have meters associated with them (TB+rate estimator)
Usage: tc filter [ add | del | change | get ] dev STRING
[ pref PRIO ] [ protocol PROTO ]
[ estimator INTERVAL TIME_CONSTANT ]
[ root | classid CLASSID ] [ handle FILTERID ]
[ [ FILTER_TYPE ] [ help | OPTIONS ] ]
or
tc filter show [ dev STRING ] [ root | parent CLASSID ]
Where:
FILTER_TYPE := { rsvp | u32 | fw | route | etc. }
FILTERID := ... format depends on classifier, see there
OPTIONS := ... try tc filter add <desired FILTER_KIND> help
The interpretation of the fields:
pref
represents the priority that is assigned to the filter.
protocol
is used by the filter to identify packets belonging only to that protocol. As already mentioned, no two filters can have the same priority and protocol field.
root
indicates that the filter is at the root of the link sharing hierarchy.
classid
represents the handle of the class to which the filter is applied.
handle
represents the handle by which the filter is identified uniquely. The format of the filter is different for different classifiers.
estimator
is explained in section 3
5.1 filter rsvp
Use RSVP protocol for classification
Usage: ... rsvp ipproto PROTOCOL session DST[/PORT | GPI ]
[ sender SRC[/PORT | GPI ]
[ classid CLASSID ] [ police POLICE_SPEC ]
[ tunnelid ID ] [ tunnel ID skip NUMBER ]
Where:
GPI := { flowlabel NUMBER | spi/ah SPI | spi/esp SPI |
u{8|16|32} NUMBER mask MASK at OFFSET}
POLICE_SPEC := ... look at TBF
FILTERID := X:Y
Comparing to general packet classification problem, RSVP needs only several relatively simple rules:
(dst, protocol) are always specified, so that we are able to hash them.
ipproto
is one of the IP protocol (TCP, UDP and maybe other)
session
is destination (address?) with or without port, or gpi (Generalized Port Identifier)
src
may be exact, or may be wildcard, so that we can keep a hash table plus one wildcard entry.
source
port (or flow label) is important only if src is given.
police
specification is explained in section 6; note, however, that tc's help output refers to TBF for it.
Alexey: IMPLEMENTATION.
We use a two level hash table: The top level is keyed by destination address and protocol ID, every bucket contains a list of "rsvp sessions", identified by destination address, protocol and DPI(="Destination Port ID"): triple (key, mask, offset).
Every bucket has a smaller hash table keyed by source address (cf. RSVP flowspec) and one wildcard entry for wildcard reservations. Every bucket is again a list of "RSVP flows", selected by source address and SPI(="Source Port ID" here rather than "security parameter index"): triple (key, mask, offset).
All packets with IPv6 extension headers (except AH and ESP) and all fragmented packets go to the best-effort traffic class.
Two "port id"s seem to be redundant, since RFC 2207 requires only one "Generalized Port Identifier". So for classic AH and ESP (and UDP, TCP), both *pi's should coincide, or one of them should be a wildcard.
At first sight, this redundancy is just a waste of CPU resources. But DPI and SPI add the possibility to assign different priorities to GPIs. See also the note about tunnels below.
One complication is the case of tunneled packets. We implement it as follows: if the first lookup matches a special session with a non-zero "tunnelhdr" value, flowid doesn't contain the true flow ID, but the tunnel ID (1...255). In this case, we pull tunnelhdr bytes and restart the lookup with the tunnel ID added to the list of keys. Simple and stupid 8)8) It's enough for PIMREG and IPIP.
Two GPIs make it possible to parse even GRE packets. For example, DPI can select ETH_P_IP (and the flags necessary to make tunnelhdr correct) in the GRE protocol field, and SPI matches the GRE key. Is it not nice? 8)8)
Well, as result, despite its simplicity, we get a pretty powerful classification engine.
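The two-level lookup described above can be sketched in Python. It is heavily simplified under stated assumptions:
- Exact destination ports and source addresses replace the (key, mask, offset) triples.
- Python dicts replace the hash tables.
- Tunnels are not handled.

The class name and class ids are hypothetical.

```python
from typing import Dict, Optional, Tuple


class RsvpClassifierSketch:
    """(dst, proto) -> session by dst port -> flow by src, plus a wildcard."""

    def __init__(self) -> None:
        # (dst, proto) -> {dport: {"flows": {src: classid}, "wildcard": classid}}
        self.table: Dict[Tuple[str, int], Dict[int, dict]] = {}

    def add(self, dst: str, proto: int, dport: int, classid: str,
            src: Optional[str] = None) -> None:
        sessions = self.table.setdefault((dst, proto), {})
        session = sessions.setdefault(dport, {"flows": {}, "wildcard": None})
        if src is None:
            session["wildcard"] = classid  # reservation for any sender
        else:
            session["flows"][src] = classid

    def classify(self, dst: str, proto: int, dport: int,
                 src: str) -> Optional[str]:
        session = self.table.get((dst, proto), {}).get(dport)
        if session is None:
            return None                    # no session: best effort
        # An exact sender match wins; otherwise fall back to the wildcard.
        return session["flows"].get(src, session["wildcard"])


if __name__ == "__main__":
    rsvp = RsvpClassifierSketch()
    rsvp.add("10.0.0.2", 17, 5004, "1:10", src="10.0.0.1")
    rsvp.add("10.0.0.2", 17, 5004, "1:20")                  # wildcard sender
    print(rsvp.classify("10.0.0.2", 17, 5004, "10.0.0.1"))  # 1:10
    print(rsvp.classify("10.0.0.2", 17, 5004, "10.0.0.7"))  # 1:20
```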
Panagiotis Stathopoulos: Well, an rsvp filter is used to distinguish an application session (destination port and destination IP address). In a DiffServ edge router it can be used to mark packets of specific applications so that they are classified into the appropriate PHB.
note: I have to read more about RSVP
5.2 filter u32
Anything in the header can be used for classification
The U32 filter is the most advanced filter available in the current implementation. It is entirely based on hash tables, which makes it robust when there are many filter rules.
Usage: ... u32 [ match SELECTOR ... ] [ link HTID ] [ classid CLASSID ]
[ police POLICE_SPEC ] [ offset OFFSET_SPEC ]
[ ht HTID ] [ hashkey HASHKEY_SPEC ]
[ sample SAMPLE ]
or u32 divisor DIVISOR
Where: SELECTOR := SAMPLE SAMPLE ...
SAMPLE := { ip | ip6 | udp | tcp | icmp | u{32|16|8} } SAMPLE_ARGS
FILTERID := X:Y:Z
match
SELECTOR contains definition of the pattern, that will be matched to the currently processed packet. Precisely, it defines which bits are to be matched in the packet header and nothing more, but this simple method is very powerful.
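link
links to another hash table (HTID): if the selector matches, the lookup continues in that table
classid
is the class to which matching packets are assigned
offset
specifies how to find the offset of the next header (for example, past the IP header) for matches in the linked hash table
ht
is the hash table (HTID) in which the filter is placed
hashkey
is the key to the hash table: it selects which packet bits choose a bucket in the linked hash table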
sample
is a protocol such as IP, or a higher-layer protocol such as UDP, TCP or ICMP. sample can also be one of the keywords u32, u16 or u8, which specify the length of the pattern in bits. PATTERN and MASK follow, with the length defined by that keyword. The OFFSET parameter is the offset, in bytes, at which to start matching. If the nexthdr+ keyword is given, the offset is relative to the start of the upper-layer header.
police
specification is explained in section 6
The syntax here is match ip <item> <value> <mask>
So match ip protocol 6 0xff matches protocol 6, TCP. (See /etc/protocols) match ip dport 0x17 0xffff is TELNET (/etc/services). Note that the number is hexadecimal, not decimal.
note (mps): ht is the hash table; the HTID (Hash Table ID) is shown as fh (filter handle) in the filter show output
The filters are packed to hash tables of key nodes with a set of 32bit key/mask pairs at every node. Nodes reference next level hash tables etc.
It seems that it represents the best middle point between speed and manageability both by human and by machine.
It is especially useful for link sharing combined with QoS; pure RSVP doesn't need such a general approach and can use much simpler (and faster) schemes.
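What a single match does can be shown with a minimal Python sketch: read a field at a byte offset, apply the mask, and compare. Assumptions:
- The packet is a hand-built IPv4/TCP packet with no IP options.
- The function names are hypothetical.
- Real u32 works on 32-bit words with hash tables and links, which are not modeled.

```python
def u32_match(packet: bytes, offset: int, value: int, mask: int,
              width: int) -> bool:
    """Compare (field & mask) with (value & mask); width is 1, 2 or 4 bytes."""
    field = int.from_bytes(packet[offset:offset + width], "big")
    return (field & mask) == (value & mask)


def ip_header_length(packet: bytes) -> int:
    """The IHL field (low 4 bits of byte 0) counts 32-bit words."""
    return (packet[0] & 0x0F) * 4


# A hand-built IPv4 + TCP packet: 10.0.0.1:12345 -> 10.0.0.2:23 (telnet).
IP_HEADER = bytes([
    0x45, 0x00, 0x00, 0x28,  # version 4, IHL 5; TOS; total length 40
    0x00, 0x01, 0x00, 0x00,  # identification; flags/fragment offset
    0x40, 0x06, 0x00, 0x00,  # TTL 64; protocol 6 (TCP); checksum not set
    10, 0, 0, 1,             # source address
    10, 0, 0, 2,             # destination address
])
TCP_HEADER = (12345).to_bytes(2, "big") + (23).to_bytes(2, "big") + bytes(16)
PACKET = IP_HEADER + TCP_HEADER

if __name__ == "__main__":
    # match ip protocol 6 0xff: the protocol byte is at offset 9.
    print(u32_match(PACKET, 9, 6, 0xFF, 1))  # True
    # match ip dport 0x17 0xffff: two bytes into the TCP header.
    dport_offset = ip_header_length(PACKET) + 2
    print(u32_match(PACKET, dport_offset, 0x17, 0xFFFF, 2))  # True
```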
5.3 filter fw
Classifier mapping ipchains' fwmark to traffic class
Usage: ... fw [ classid CLASSID ] [ police POLICE_SPEC ]
POLICE_SPEC := ... look at TBF
CLASSID := X:Y
classid
is class handle
police
specification is explained in section 6; note, however, that tc's help output refers to TBF for it.
5.4 filter route
Use routing table decisions for classification
Usage: ... route [ from REALM | fromif TAG ] [ to REALM ]
[ flowid CLASSID ] [ police POLICE_SPEC ]
POLICE_SPEC := ... look at TBF
CLASSID := X:Y
from
REALM is realm in ip route table
fromif
TAG is interface tag
to
REALM is (again) ip route table realm
flowid
CLASSID is the class to which the packet is assigned (if it matches)
police
specification is explained in section 6; note, however, that tc's help output refers to TBF for it.
For now we assume that route tags < 256. This allows direct table lookups instead of hash tables.
For now we assume that "from TAG" and "fromdev DEV" statements are mutually exclusive.
"to TAG from ANY" has higher priority, than "to ANY from XXX"
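The two rules above (tags < 256 allow direct table lookups, and "to TAG from ANY" beats "to ANY from XXX") can be sketched in Python. Assumptions:
- Only these two kinds of rule are modeled, each kept in a 256-entry list indexed by the tag.
- The class name and class ids are hypothetical.

```python
from typing import List, Optional


class RouteClassifierSketch:
    """'to TAG from ANY' rules take priority over 'to ANY from TAG' rules."""

    def __init__(self) -> None:
        self.to_rules: List[Optional[str]] = [None] * 256    # indexed by to-realm
        self.from_rules: List[Optional[str]] = [None] * 256  # indexed by from-realm

    @staticmethod
    def _check(tag: int) -> None:
        if not 0 <= tag < 256:
            raise ValueError("route tags are assumed to be < 256")

    def add_to(self, tag: int, classid: str) -> None:
        self._check(tag)
        self.to_rules[tag] = classid

    def add_from(self, tag: int, classid: str) -> None:
        self._check(tag)
        self.from_rules[tag] = classid

    def classify(self, to_realm: int, from_realm: int) -> Optional[str]:
        # Direct indexing, no hashing: possible because tags are small.
        classid = self.to_rules[to_realm]
        if classid is not None:
            return classid
        return self.from_rules[from_realm]


if __name__ == "__main__":
    r = RouteClassifierSketch()
    r.add_to(5, "1:10")
    r.add_from(7, "1:20")
    print(r.classify(5, 7))  # 1:10 (the "to" rule wins)
    print(r.classify(3, 7))  # 1:20
```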
5.5 tcindex
Use tc_index internal tag in skb to select classes.
Usage: ... tcindex [ hash SIZE ] [ mask MASK ] [ shift SHIFT ] [ pass_on | fall_through ] [ classid CLASSID ] [ police POLICE_SPEC ]
hash
is the size of the lookup table
mask
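is the bit mask applied to skb->tc_index before shifting
shift
is the number of bits by which the masked value is shifted right
pass_on
if no class is found for the resulting key, the filter fails and the next filter is consulted
fall_through
is the opposite of pass_on (and the default): the packet is classified even if no class exists for the resulting key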
classid
is the class to which filter is attached
police
specification is explained in section 6
note: key = (skb->tc_index & mask) >> shift
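The key computation can be sketched in Python, following key = (tc_index & mask) >> shift. Assumptions:
- The lookup table is a plain dict.
- A missing key returns None, i.e. "go on to the next filter" as with pass_on. fall_through is not modeled.
- The function names and class id are hypothetical.

```python
from typing import Dict, Optional


def tcindex_key(tc_index: int, mask: int, shift: int) -> int:
    """key = (tc_index & mask) >> shift"""
    return (tc_index & mask) >> shift


def tcindex_classify(tc_index: int, mask: int, shift: int,
                     table: Dict[int, str]) -> Optional[str]:
    """Look the key up; None means no class (pass_on: try the next filter)."""
    return table.get(tcindex_key(tc_index, mask, shift))


if __name__ == "__main__":
    # A DS byte of 0xb8 stored in tc_index; masking off the two low bits
    # and shifting right by 2 leaves the 6-bit code point, 46.
    print(tcindex_key(0xB8, 0xFC, 2))                      # 46
    print(tcindex_classify(0xB8, 0xFC, 2, {46: "1:1"}))    # 1:1
```

This pairs naturally with dsmark's set_tc_index, which stores the DS field in skb->tc_index for the filter to use.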
6 police
The purpose of policing is to ensure that traffic does not exceed certain bounds. For simplicity, we will assume a broad definition of policing and consider it to comprise all kinds of traffic control actions that depend in some way on the traffic volume.
We consider four types of policing mechanisms:
policing decisions by filters
refusal to enqueue a packet
dropping of a packet from an "inner" queueing discipline
dropping of packet when enqueuing a new one
Usage: ... police rate BPS burst BYTES[/BYTES] [ mtu BYTES[/BYTES] ]
[ peakrate BPS ] [ avrate BPS ] [ ACTION ]
Where: ACTION := reclassify | drop | continue
rate
is the long-term rate attached to the meter
peakrate
this is the peakrate a flow is allowed to burst in the short-term. Basically this upper-bounds the rate.
mtu
a packet exceeding this size will be dropped. The default value is 2KB. This is fine with ethernet whose MTU is 1.5KB but will not be fine with Gigabit ethernet exploiting Jumbo frames for example. It also will not be valid for the lo device whose MTU is defined by amongst other things how much RAM you have. You must set this value if you have exceptions to the rule.
ACTION
exceed/non-exceed: defines which action is taken when a flow exceeds its allocated rate, and when it does not. The actions are:
pass
lets the packet through
reclassify
used by CBQ to reclassify the packet as best-effort (BE) traffic (ask Jamal?)
drop
simply drops the packet
continue
looks up the next filter rule with lower priority
note: "drop" is only recognized by the following qdiscs: atm, cbq, dsmark, and (ingress - really?). In particular, prio ignores it.
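Policing can be sketched as a token-bucket meter that acts on each packet immediately instead of queueing it. Assumptions:
- rate is in bytes per second and burst is the bucket depth in bytes.
- The mtu default of 2048 stands in for the "2KB" mentioned above.
- Conforming packets get "pass", and exceeding ones get the configured exceed action.

The class name is hypothetical.

```python
class PolicerSketch:
    """Token-bucket meter: packets within rate/burst conform, others exceed."""

    def __init__(self, rate: float, burst: float, mtu: int = 2048,
                 exceed_action: str = "drop"):
        self.rate = rate                    # bytes per second
        self.burst = burst                  # bucket depth in bytes
        self.mtu = mtu                      # larger packets always exceed
        self.exceed_action = exceed_action  # e.g. "drop", "reclassify", "continue"
        self.tokens = burst
        self.last = 0.0

    def police(self, size: int, now: float) -> str:
        """Return 'pass' for a conforming packet, else the exceed action."""
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size > self.mtu or size > self.tokens:
            return self.exceed_action       # nothing is queued: act at once
        self.tokens -= size
        return "pass"


if __name__ == "__main__":
    p = PolicerSketch(rate=1000.0, burst=2000.0)
    print([p.police(1000, 0.0) for _ in range(3)])  # ['pass', 'pass', 'drop']
    print(p.police(3000, 10.0))                     # drop (bigger than mtu)
```

Unlike TBF, an exceeding packet is not delayed until tokens arrive. The action is decided at once.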
Bibliography
1
A. N. Kuznetsov, docs from iproute2
2
Werner Almesberger, Linux Network Traffic Control - Implementation Overview
3
Jamal Hadi Salim, IP Quality of Service on Linux http://????
4
Saravanan Radhakrishnan, Linux - Advanced Networking Overview http://qos.ittc.ukans.edu/howto/howto.html
5
Werner Almesberger, Jamal Hadi Salim, Alexey Kuznetsov - Differentiated Services on Linux
6
linux-diffserv mailing list linux-diffserv@lrc.di.epfl.ch
7
Sally Floyd, Van Jacobson - Link-sharing and Resource Management Models for Packet Networks
8
Sally Floyd, Van Jacobson - Random Early Detection Gateways for Congestion Avoidance
9
Related Cisco documents from http://www.cisco.com/
10
Lixia Zhang, Steve Deering, Deborah Estrin, Scott Shenker, Daniel Zapalla - RSVP: A New Resource ReSerVation Protocol
11
Related RFC's
12
and many others
Appendix
note: flowid is sometimes class handle sometimes something else
mariano - good setup for me: If you remove the router and then the modem line becomes ppp0 (instead of eth0), you should declare that ppp0 has "bandwidth 30K". Then, the classes should use "bandwidth 30K rate 20K" and "bandwidth 30K rate 10K"
2007/11/29
All About Data Blocks Corruption in Oracle
Vijaya R. Dumpa
Data Block Overview:
--------------------------------
| Common and Variable Header
| Table Directory
V Row Directory
Free Space
^ Row Data
|
--------------------------------
Oracle allocates logical database space for all data in a database. The units of database space allocation are data blocks (also called logical blocks, Oracle blocks, or pages), extents, and segments. The next level of logical database space is an extent. An extent is a specific number of contiguous data blocks allocated for storing a specific type of information. The level of logical database storage above an extent is called a segment. The high water mark is the boundary between used and unused space in a segment.
The header contains general block information, such as the block address and the type of segment (for example, data, index, or rollback).
Table Directory: this portion of the data block contains information about the table having rows in this block.
Row Directory: this portion of the data block contains information about the actual rows in the block (including addresses for each row piece in the row data area).
Free Space: this space is allocated for the insertion of new rows and for updates to rows that require additional space.
Row Data: this portion of the data block contains the rows in this block.
Analyzing the table structure to identify block corruption:
By analyzing a table and its associated objects, you can perform a detailed check of the data blocks to identify block corruption:
SQL> ANALYZE TABLE table_name VALIDATE STRUCTURE CASCADE;
(For indexes and clusters, use ANALYZE INDEX index_name VALIDATE STRUCTURE or ANALYZE CLUSTER cluster_name VALIDATE STRUCTURE CASCADE.)
Detecting data block corruption using the DBVERIFY utility:
DBVERIFY is an external command-line utility that performs a physical data structure integrity check. It can be run against datafiles of an offline or online database, and also against backup files. Integrity checks are significantly faster if you run them against an offline database.
Restrictions:
DBVERIFY checks are limited to cache-managed blocks. DBVERIFY works only with datafiles. It does not work against control files or redo logs.
The following example shows sample output from verifying the data file system_ts_01.dbf, from start block 9 to end block 25. The BLOCKSIZE parameter is required only if the file to be verified does not use a 2 KB block size. The LOGFILE parameter specifies the file to which logging information is written. FEEDBACK=2 displays one dot on the screen for every 2 blocks processed.
$ dbv file=system_ts_01.dbf start=9 end=25 blocksize=16384 logfile=dbvsys_ts.log feedback=2
DBVERIFY: Release 8.1.7.3.0 - Production on Fri Sep 13 14:11:52 2002
(c) Copyright 2000 Oracle Corporation. All rights reserved.
Output:
$ pg dbvsys_ts.log
DBVERIFY: Release 8.1.7.3.0 - Production on Fri Sep 13 14:11:52 2002
(c) Copyright 2000 Oracle Corporation. All rights reserved.
DBVERIFY - Verification starting : FILE = system_ts_01.dbf
DBVERIFY - Verification complete
Total Pages Examined : 17
Total Pages Processed (Data) : 10
Total Pages Failing (Data) : 0
Total Pages Processed (Index) : 2
Total Pages Failing (Index) : 0
Total Pages Processed (Other) : 5
Total Pages Empty : 0
Total Pages Marked Corrupt : 0
Total Pages Influx : 0
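A DBVERIFY log like the one above can also be checked from a shell script, for example in a nightly job. This minimal sketch assumes the log uses the `Total Pages ... : N` lines shown above, which is the 8.1.7 format; other releases may word them differently. check_dbv_log is a hypothetical helper. It adds up the "Failing" and "Marked Corrupt" counters, prints the total, and returns a non-zero status when the total is above zero.

```shell
#!/bin/sh
# check_dbv_log LOGFILE
# Hypothetical helper: sums the "Total Pages Failing (...)" and
# "Total Pages Marked Corrupt" counters of a DBVERIFY log.
# Prints the sum; returns 1 if it is non-zero, 0 otherwise.
check_dbv_log() {
    awk -F: '
        /Total Pages Failing/ || /Total Pages Marked Corrupt/ {
            gsub(/[^0-9]/, "", $2)   # keep only the digits after ":"
            bad += $2
        }
        END {
            print bad + 0
            exit (bad > 0)
        }
    ' "$1"
}
```

Usage: `check_dbv_log dbvsys_ts.log || echo "DBVERIFY reported problems"`.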
Detecting and reporting data block corruption using the DBMS_REPAIR package:
Note: this approach can only be used if the block "wrapper" is marked corrupt, for example if the block reports ORA-1578.
1. Create DBMS_REPAIR administration tables:
To create the repair tables, run the following procedure:
SQL> EXEC DBMS_REPAIR.ADMIN_TABLES('REPAIR_ADMIN', 1, 1, 'REPAIR_TS');
Table names must be prefixed with 'REPAIR_' or 'ORPHAN_'. If the second argument is 1, the procedure creates a repair table (prefix 'REPAIR_'). If it is 2, it creates an orphan key table (prefix 'ORPHAN_').
If the third argument is
1, the procedure performs 'create' operations.
2, the procedure performs 'delete' (purge) operations.
3, the procedure performs 'drop' operations.
The fourth argument is the tablespace in which the table is created.
2. Scanning a specific table or index using the DBMS_REPAIR.CHECK_OBJECT procedure:
In the following example, we check the table EMP, which belongs to the schema TEST, for possible corruption. Let's assume that we have created our administration table, called REPAIR_ADMIN, in schema SYS.
To check the table for block corruption, use the following procedure:
SQL> VARIABLE A NUMBER;
SQL> EXEC DBMS_REPAIR.CHECK_OBJECT('TEST', 'EMP', NULL, 1, 'REPAIR_ADMIN', NULL, NULL, NULL, NULL, :A);
SQL> PRINT A;
To see which blocks are corrupted, query the REPAIR_ADMIN table:
SQL> SELECT * FROM REPAIR_ADMIN;
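Steps 1 and 2 can also be run non-interactively by feeding SQL*Plus a generated script. The sketch below only prints the script so that you can review it first. To run it, pipe the output into a privileged SQL*Plus session such as `sqlplus -s "/ as sysdba"`. It reuses the names from the example (schema TEST, table EMP, repair table REPAIR_ADMIN, tablespace REPAIR_TS). The function name dbms_repair_check_sql is hypothetical. ADMIN_TABLES fails if the repair table already exists, so drop the first EXEC line on later runs.

```shell
#!/bin/sh
# dbms_repair_check_sql SCHEMA TABLE [REPAIR_TABLE] [TABLESPACE]
# Hypothetical helper: prints a SQL*Plus script that creates the
# DBMS_REPAIR repair table and runs CHECK_OBJECT on one table.
dbms_repair_check_sql() {
    schema=$1
    table=$2
    repair_table=${3:-REPAIR_ADMIN}
    tablespace=${4:-REPAIR_TS}
    cat <<EOF
VARIABLE a NUMBER
-- Step 1: table type 1 = repair table, action 1 = create
EXEC DBMS_REPAIR.ADMIN_TABLES('$repair_table', 1, 1, '$tablespace');
-- Step 2: object type 1 = table; number of corrupt blocks goes into :a
EXEC DBMS_REPAIR.CHECK_OBJECT('$schema', '$table', NULL, 1, '$repair_table', NULL, NULL, NULL, NULL, :a);
PRINT a
SELECT * FROM $repair_table;
EXIT
EOF
}
```

Usage: `dbms_repair_check_sql TEST EMP | sqlplus -s "/ as sysdba"`.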
3. Fixing corrupt blocks using the DBMS_REPAIR.FIX_CORRUPT_BLOCKS procedure:
SQL> VARIABLE A NUMBER;
SQL> EXEC DBMS_REPAIR.FIX_CORRUPT_BLOCKS('TEST', 'EMP', NULL, 1, 'REPAIR_ADMIN', NULL, :A);
SQL> SELECT MARKED_CORRUPT FROM REPAIR_ADMIN;
FIX_CORRUPT_BLOCKS marks the blocks as corrupt but does not recover their data. If you select from the EMP table now, you still get the error ORA-1578.
4. Skipping corrupt blocks using the DBMS_REPAIR.SKIP_CORRUPT_BLOCKS procedure:
SQL> EXEC DBMS_REPAIR.SKIP_CORRUPT_BLOCKS('TEST', 'EMP', 1, 1);
Note the effect of running the DBMS_REPAIR tool. Its main advantage is that queries can now retrieve the data beyond the corrupted block. However, the rows stored in the corrupted block itself are lost.
5. Identifying orphan keys using the DBMS_REPAIR.DUMP_ORPHAN_KEYS procedure. This procedure is useful for finding index entries that point to corrupt rows of the table. The orphan key table ORPHAN_ADMIN must first be created with ADMIN_TABLES('ORPHAN_ADMIN', 2, 1, 'REPAIR_TS').
SQL> EXEC DBMS_REPAIR.DUMP_ORPHAN_KEYS('TEST', 'IDX_EMP', NULL, 2, 'REPAIR_ADMIN', 'ORPHAN_ADMIN', NULL, :A);
If you see any records in the ORPHAN_ADMIN table, drop and re-create the index to avoid inconsistent query results.
6. The last thing to do when using the DBMS_REPAIR package is to run the DBMS_REPAIR.REBUILD_FREELISTS procedure, which reinitializes the free lists of the object:
SQL> EXEC DBMS_REPAIR.REBUILD_FREELISTS('TEST', 'EMP', NULL, 1);
NOTE
Setting events 10210, 10211, 10212, and 10225 can be done by adding the following line for each event in the init.ora file:
Event = "event_number trace name errorstack forever, level 10"
When event 10210 is set, the data blocks are checked for corruption by checking their integrity. Data blocks that don't match the format are marked as soft corrupt.
When event 10211 is set, the index blocks are checked for corruption by checking their integrity. Index blocks that don't match the format are marked as soft corrupt.
When event 10212 is set, the cluster blocks are checked for corruption by checking their integrity. Cluster blocks that don't match the format are marked as soft corrupt.
When event 10225 is set, the fet$ and uet$ dictionary tables are checked for corruption by checking their integrity. Blocks that don't match the format are marked as soft corrupt.
Set event 10231 in the init.ora file to cause Oracle to skip software- and media-corrupted blocks when performing full table scans:
Event="10231 trace name context forever, level 10"
Set event 10233 in the init.ora file to cause Oracle to skip software- and media-corrupted blocks when performing index range scans:
Event="10233 trace name context forever, level 10"
To dump an Oracle block, use the following command (available from Oracle 8.x onwards):
SQL> ALTER SYSTEM DUMP DATAFILE 11 BLOCK 9;
This command dumps data block 9 of datafile 11 into a trace file in the USER_DUMP_DEST directory.
Dumping redo log file blocks:
SQL> ALTER SYSTEM DUMP LOGFILE '/usr/oracle8/product/admin/udump/rl.log';
Corruption in rollback segment blocks causes problems (ORA-1578) while starting up the database.
With the help of Oracle Support, you can use the following undocumented (underscore) parameter to start up the database:
_CORRUPTED_ROLLBACK_SEGMENTS=(RBS_1, RBS_2)
DB_BLOCK_COMPUTE_CHECKSUM
This parameter is normally used to debug corruptions that happen on disk. When it is set, Oracle recomputes the checksum each time it reads a block from disk into the cache, and compares the result with the value stored in the block.
If they differ, the block is corrupted on disk: Oracle marks the block as corrupt and signals an error. Setting this parameter involves some overhead.
The following V$ views contain information about blocks marked logically corrupt:
V$BACKUP_CORRUPTION, V$COPY_CORRUPTION
DB_BLOCK_CACHE_PROTECT='TRUE'
With this parameter set, Oracle catches stray writes made by processes in the buffer cache.
New RMAN feature in Oracle 9i: block media recovery (BLOCKRECOVER):
Obtain the datafile numbers and block numbers for the corrupted blocks. Typically, you obtain this output from the standard output, the alert.log, trace files, or a media management interface. For example, you may see the following in a trace file:
ORA-01578: ORACLE data block corrupted (file # 9, block # 13)
ORA-01110: data file 9: '/oracle/dbs/tbs_91.f'
ORA-01578: ORACLE data block corrupted (file # 2, block # 19)
ORA-01110: data file 2: '/oracle/dbs/tbs_21.f'
$ rman target rman/rman@rmanprod
RMAN> run {
2> allocate channel ch1 type disk;
3> blockrecover datafile 9 block 13 datafile 2 block 19;
4> }
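When many blocks are affected, a script can pull the file and block numbers out of the ORA-01578 messages so you do not have to copy them by hand. This minimal sketch assumes the message text looks exactly like the trace-file lines above. blockrecover_from_log is a hypothetical helper that prints one BLOCKRECOVER command, in the same syntax as the run block above, for pasting into RMAN. Duplicate blocks are listed only once, and the order is preserved.

```shell
#!/bin/sh
# blockrecover_from_log [FILE...]
# Hypothetical helper: turns ORA-01578 messages (from files or stdin)
# into a single RMAN BLOCKRECOVER command. Returns 1 if none are found.
blockrecover_from_log() {
    list=$(sed -n 's/.*ORA-01578:.*file # *\([0-9][0-9]*\), block # *\([0-9][0-9]*\).*/DATAFILE \1 BLOCK \2/p' "$@" |
        awk '!seen[$0]++' |        # drop repeated file/block pairs
        paste -s -d ' ' -)         # join into one line
    [ -n "$list" ] || return 1
    printf 'BLOCKRECOVER %s;\n' "$list"
}
```

Usage: `blockrecover_from_log alert_prod.log`. The log file name here is only an example.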
Recovering Data blocks Using Selected Backups:
# restore from backupset
BLOCKRECOVER DATAFILE 9 BLOCK 13 DATAFILE 2 BLOCK 19 FROM BACKUPSET;
# restore from datafile image copy
BLOCKRECOVER DATAFILE 9 BLOCK 13 DATAFILE 2 BLOCK 19 FROM DATAFILECOPY;
# restore from backupset with tag "mondayAM"
BLOCKRECOVER DATAFILE 9 BLOCK 13 DATAFILE 2 BLOCK 19 FROM TAG = mondayAM;
# restore using backups made more than one week ago
BLOCKRECOVER DATAFILE 9 BLOCK 13 DATAFILE 2 BLOCK 19 RESTORE
UNTIL 'SYSDATE-7';
# restore using backups made before SCN 100
BLOCKRECOVER DATAFILE 9 BLOCK 13 DATAFILE 2 BLOCK 19 RESTORE UNTIL SCN 100;
# restore using backups made before log sequence 7024
BLOCKRECOVER DATAFILE 9 BLOCK 13 DATAFILE 2 BLOCK 19 RESTORE
UNTIL SEQUENCE 7024;
Limit Bandwidth with bwmod for Apache (Per Vhost/Directory)
Limit Bandwidth (Per Vhost/Directory)
The main goal is to be able to "assign" a maximum (or fixed) bandwidth to a vhost.
The module achieves this by inserting small delays while sending the data, which limits the top speed a client can reach. For example, if we assign 100kb/s to a vhost, the first user can download at 100kb/s. If another user starts downloading, each gets at most 50kb/s. With a third user, each gets 33kb/s, and so on.
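The numbers in that example come from an even split of the vhost's limit among the clients currently downloading. The sketch below reproduces only that arithmetic. It assumes an equal split rounded down. The real module works by inserting delays, so its actual per-client rates may differ. per_client_rate is a hypothetical name.

```shell
#!/bin/sh
# per_client_rate TOTAL CLIENTS
# Equal split of a vhost's bandwidth limit among active downloads,
# rounded down to an integer (same unit as TOTAL).
per_client_rate() {
    total=$1
    clients=$2
    if [ "$clients" -lt 1 ]; then
        clients=1   # with no other client, one download may use the whole limit
    fi
    echo $(( total / clients ))
}

for n in 1 2 3 4; do
    echo "$n client(s): $(per_client_rate 100 "$n")kb/s each"
done
```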
Japanese translation of the iptables tutorial
Iptables Tutorial 1.2.0
Oskar Andreasson
Tatsuya Nonogaki - Japanese translation
http://www.asahi-net.or.jp/~aa4t-nngk/
Japanese translation v.1.1.1
Copyright © 2001-2005 Oskar Andreasson
Copyright © 2005-2006 Tatsuya Nonogaki
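Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1, published by the Free Software Foundation. The Introduction and all of its sub-sections are Invariant Sections, "Original Author: Oskar Andreasson" is the Front-Cover Text, and there are no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".
All scripts in this tutorial are free software. You can redistribute them and/or modify them under the terms of the GNU General Public License, version 2, as published by the Free Software Foundation.
These scripts are distributed in the hope that they will be useful, but *WITHOUT ANY WARRANTY*; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this tutorial. If not, request one from the Free Software Foundation (the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA).
Dedication
I dedicate this document to my wonderful sister, who encouraged me and gave me ideas. She is my source of happiness and a ray of light. Thank you!
I would also like to dedicate this work to the Linux developers and maintainers, who do tremendously hard work: the people who bring us this wonderful operating system.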
Table of Contents
About the author
How to read
Prerequisites
Conventions used in this document
<Conventions used in this Japanese translation>
1. Introduction
1.1. Why this document was written
1.2. How it was written
1.3. Terms used in this document
2. TCP/IP review
2.1. TCP/IP layers
2.2. IP characteristics
2.3. IP headers
2.4. TCP characteristics
2.5. TCP headers
2.6. UDP characteristics
2.7. UDP headers
2.8. ICMP characteristics
2.9. ICMP headers
2.9.1. ICMP Echo Request/Reply
2.9.2. ICMP Destination Unreachable
2.9.3. Source Quench
2.9.4. Redirect
2.9.5. TTL equals 0
2.9.6. Parameter problem
2.9.7. Timestamp request/reply
2.9.8. Information request/reply
2.10. TCP/IP destination-driven routing
2.11. Summary
3. What is IP filtering
3.1. What is an IP filter
3.2. IP filtering terms and expressions
3.3. How to plan an IP filter
3.4. Summary
4. What is Network Address Translation
4.1. What NAT is used for, and terminology
4.2. Caveats when using NAT
4.3. Example NAT machine for understanding the concepts
4.3.1. What is needed to build a NAT machine
4.3.2. Placement of NAT machines
4.3.3. How to place proxies
4.3.4. The final stage of building the NAT machine
4.4. Summary
5. Preparations
5.1. Where to get iptables
5.2. Kernel setup
5.3. User-land setup
5.3.1. Compiling the user-land applications
5.3.2. Installation on Red Hat 7.1
6. Traversing tables and chains
6.1. General
6.2. The mangle table
6.3. The nat table
6.4. The filter table
7. The state machine
7.1. Introduction
7.2. conntrack entries
7.3. User-land states
7.4. TCP connections
7.5. UDP connections
7.6. ICMP connections
7.7. Default connections
7.8. Complex protocols and connection tracking
8. Saving and restoring large rule-sets
8.1. Speed considerations
8.2. Drawbacks of restore
8.3. iptables-save
8.4. iptables-restore
9. How a rule is built
9.1. Basics of the iptables command
9.2. Tables
9.3. Commands
10. iptables matches
10.1. Generic matches
10.2. Implicit matches
10.2.1. TCP matches
10.2.2. UDP matches
10.2.3. ICMP matches
10.3. Explicit matches
10.3.1. AH/ESP match
10.3.2. Conntrack match
10.3.3. DSCP match
10.3.4. ECN match
10.3.5. Helper match
10.3.6. IP range match
10.3.7. Length match
10.3.8. Limit match
10.3.9. MAC match
10.3.10. Mark match
10.3.11. Multiport match
10.3.12. Owner match
10.3.13. Packet type match
10.3.14. Recent match
10.3.15. State match
10.3.16. TCPMSS match
10.3.17. TOS match
10.3.18. TTL match
10.3.19. Unclean match
11. iptables targets and jumps
11.1. ACCEPT target
11.2. CLASSIFY target
11.3. DNAT target
11.4. DROP target
11.5. DSCP target
11.6. ECN target
11.7. LOG target
11.8. MARK target
11.9. MASQUERADE target
11.10. MIRROR target
11.11. NETMAP target
11.12. QUEUE target
11.13. REDIRECT target
11.14. REJECT target
11.15. RETURN target
11.16. SAME target
11.17. SNAT target
11.18. TCPMSS target
11.19. TOS target
11.20. TTL target
11.21. ULOG target
12. Debugging your scripts
12.1. Debugging, a necessity
12.2. Bash debugging techniques
12.3. System tools useful for debugging
12.4. Debugging iptables
12.5. Other debugging tools
12.5.1. Nmap
12.5.2. Nessus
12.6. Summary
13. The rc.firewall file
13.1. Example rc.firewall
13.2. Explanation of rc.firewall
13.2.1. Configuration options
13.2.2. Initial loading of extra modules
13.2.3. proc setup
13.2.4. Placement of rules in the various chains
13.2.5. Setting up default policies
13.2.6. Creating user-defined chains in the filter table
13.2.7. INPUT chain
13.2.8. FORWARD chain
13.2.9. OUTPUT chain
13.2.10. PREROUTING chain of the nat table
13.2.11. Starting SNAT and the POSTROUTING chain
14. Example scripts
14.1. rc.firewall.txt script structure
14.1.1. Structure
14.2. rc.firewall.txt
14.3. rc.DMZ.firewall.txt
14.4. rc.DHCP.firewall.txt
14.5. rc.UTIN.firewall.txt
14.6. rc.test-iptables.txt
14.7. rc.flush-iptables.txt
14.8. Limit-match.txt
14.9. Pid-owner.txt
14.10. Recent-match.txt
14.11. Sid-owner.txt
14.12. Ttl-inc.txt
14.13. Iptables-save ruleset
15. Graphical user interfaces for iptables/netfilter
15.1. fwbuilder
15.2. Turtle Firewall Project
15.3. Integrated Secure Communications System
15.4. IPMenu
15.5. Easy Firewall Generator
15.6. Summary
A. Detailed explanations of special commands
A.1. Listing the active rule-set
A.2. Updating and flushing tables
B. Common problems and questions
B.1. Problems loading modules
B.2. Packets in state NEW without the SYN bit set
B.3. SYN/ACK packets in state NEW
B.4. Internet service providers who use reserved IP addresses
B.5. Letting DHCP requests through iptables
B.6. mIRC DCC problems
C. ICMP types
D. TCP options
E. Other resources and links
F. Acknowledgments
G. History
H. GNU Free Documentation License
0. PREAMBLE
1. APPLICABILITY AND DEFINITIONS
2. VERBATIM COPYING
3. COPYING IN QUANTITY
4. MODIFICATIONS
5. COMBINING DOCUMENTS
6. COLLECTIONS OF DOCUMENTS
7. AGGREGATION WITH INDEPENDENT WORKS
8. TRANSLATION
9. TERMINATION
10. FUTURE REVISIONS OF THIS LICENSE
How to use this License for your documents
I. GNU General Public License
0. Preamble
1. TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
2. How to Apply These Terms to Your New Programs
J. Example scripts code-base
J.1. Example rc.firewall script
J.2. Example rc.DMZ.firewall script
J.3. Example rc.UTIN.firewall script
J.4. Example rc.DHCP.firewall script
J.5. Example rc.flush-iptables script
J.6. Example rc.test-iptables script
List of Tables
6-1. Packets destined for the local host (our own machine)
6-2. Packets originating from the local host (our own machine)
6-3. Forwarded packets
7-1. User-land states
7-2. Internal states
7-3. Supported complex protocols
9-1. Tables
9-2. Commands
9-3. Options
10-1. Generic matches
10-2. TCP matches
10-3. UDP matches
10-4. ICMP matches
10-5. AH match options
10-6. ESP match options
10-7. Conntrack match options
10-8. DSCP match options
10-9. ECN match options
10-10. ECN field in IP
10-11. Helper match options
10-12. IP range match options
10-13. Length match options
10-14. Limit match options
10-15. MAC match options
10-16. Mark match options
10-17. Multiport match options
10-18. Owner match options
10-19. Packet type match options
10-20. Recent match options
10-21. State matches
10-22. TCPMSS match options
10-23. TOS matches
10-24. TTL matches
11-1. CLASSIFY target options
11-2. DNAT target
11-3. DSCP target options
11-4. ECN target options
11-5. LOG target options
11-6. MARK target options
11-7. MASQUERADE target
11-8. NETMAP target options
11-9. REDIRECT target
11-10. REJECT target
11-11. SAME target options
11-12. SNAT target options
11-13. TCPMSS target options
11-14. TOS target
11-15. TTL target
11-16. ULOG target
C-1. ICMP types
D-1. TCP options
Firewall Builder
An object-oriented GUI and a set of compilers for various firewall platforms. Compilers are currently implemented for iptables, ipfilter, OpenBSD pf, ipfw, Cisco PIX firewalls, and Cisco router access lists.
Oracle Tips by Burleson Oracle10g Rename Tablespace
Oracle Tips by Burleson
Oracle10g Rename Tablespace
Another great new feature in tablespace management is the ability to rename a tablespace.
Tablespace Rename Overview
In Oracle 10g, you can simply rename a tablespace TBS01 to TBS02 by issuing the following command:
ALTER TABLESPACE tbs01 RENAME TO tbs02;
However, you must follow these rules when renaming a tablespace:
You must set the compatibility level (the COMPATIBLE parameter) to at least 10.0.1.
You cannot rename the SYSTEM or SYSAUX tablespaces.
You cannot rename an offline tablespace.
You cannot rename a tablespace that contains offline datafiles.
Renaming a tablespace does not change its tablespace identifier.
Renaming a tablespace does not change the names of its datafiles.
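If tablespace renames are scripted, for example as part of a transport procedure, the statement can be generated with the rules that can be checked up front. This is a minimal sketch. rename_tablespace_sql is a hypothetical helper that upper-cases the names and refuses SYSTEM and SYSAUX. The remaining rules (compatibility level, offline tablespace or datafiles) are left to the database, which raises an error in those cases.

```shell
#!/bin/sh
# rename_tablespace_sql OLD NEW
# Hypothetical helper: prints an ALTER TABLESPACE ... RENAME TO statement,
# refusing SYSTEM and SYSAUX before anything is sent to the database.
rename_tablespace_sql() {
    old=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    new=$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')
    case "$old" in
        ""|SYSTEM|SYSAUX)
            echo "cannot rename tablespace '$1'" >&2
            return 1 ;;
    esac
    if [ -z "$new" ]; then
        echo "new tablespace name is empty" >&2
        return 1
    fi
    printf 'ALTER TABLESPACE %s RENAME TO %s;\n' "$old" "$new"
}
```

Usage: `rename_tablespace_sql old_tbs new_tbs` prints `ALTER TABLESPACE OLD_TBS RENAME TO NEW_TBS;`, which can then be piped into SQL*Plus.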
Tablespace Rename Benefits
Tablespace rename provides the following benefits:
It simplifies the process of tablespace migration within a database.
It simplifies the process of transporting a tablespace between two databases.
Examples
Example 1: Rename a tablespace within a database. In Oracle9i and earlier releases, you had to take the following steps to rename a tablespace from OLD_TBS to NEW_TBS:
Create a new tablespace NEW_TBS.
Copy all objects from OLD_TBS to NEW_TBS.
Drop tablespace OLD_TBS.
In Oracle 10g, you can accomplish the same thing in one step by renaming tablespace OLD_TBS to NEW_TBS directly:
ALTER TABLESPACE old_tbs RENAME TO new_tbs;
Example 2: Transport a tablespace between two databases. In the following example (see figure 3.2), releases before Oracle 10g could not transport tablespace TBS01 from database A to database B, because database B also has a tablespace called TBS01. In Oracle 10g, you can simply rename TBS01 to TBS02 in database B before transporting tablespace TBS01.
Oracle 11g XDB Guide 28 Using Protocols to Access the Repository
28 Using Protocols to Access the Repository
This chapter describes how to access Oracle XML DB Repository data using the FTP and HTTP(S)/WebDAV protocols.
This chapter contains these topics:
Overview of Oracle XML DB Protocol Server
Oracle XML DB Protocol Server Configuration Management
Using FTP and Oracle XML DB Protocol Server
Using HTTP(S) and Oracle XML DB Protocol Server
Using WebDAV and Oracle XML DB
Overview of Oracle XML DB Protocol Server
As described in Chapter 2, "Getting Started with Oracle XML DB" and Chapter 21, "Accessing Oracle XML DB Repository Data", Oracle XML DB Repository provides a hierarchical data repository in the database, designed for XML. Oracle XML DB Repository maps path names (or URLs) onto database objects of XMLType and provides management facilities for these objects.
Oracle XML DB also provides the Oracle XML DB protocol server. It supports the standard Internet protocols FTP, WebDAV, and HTTP(S) for accessing its hierarchical repository, or file system. HTTPS provides secure access to Oracle XML DB Repository.
These protocols can provide direct access to Oracle XML DB for many users without having to install additional software. The user names and passwords to be used with the protocols are the same as those for SQL*Plus. Enterprise users are also supported. Database administrators can use these protocols and resource APIs such as DBMS_XDB to access Automatic Storage Management (ASM) files and folders in the repository virtual folder /sys/asm.
See Also:
Chapter 21, "Accessing Oracle XML DB Repository Data" for more information about accessing repository information, and restrictions on that access
Note:
When accessing virtual folder /sys/asm using Oracle XML DB protocols, you must log in as a DBA user other than SYS.
Oracle XML DB protocols are not supported on EBCDIC platforms.
Session Pooling
Oracle XML DB protocol server maintains a shared pool of sessions. Each protocol connection is associated with one session from this pool. After a connection is closed, the session is put back into the shared pool and can be used to serve later connections.
Session pooling improves performance of HTTP(S) by avoiding the cost of re-creating session states, especially when using HTTP 1.0, which creates a new connection for each request. For example, a couple of small files can be retrieved by an existing HTTP/1.1 connection in the time necessary to create a database session. You can tune the number of sessions in the pool by setting session-pool-size in the Oracle XML DB configuration file, xdbconfig.xml, or disable pooling by setting session-pool-size to zero.
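To make the pooling behavior concrete, here is a toy Python model. It is illustrative only and not Oracle's implementation: the class `SessionPool` and its `created` counter are assumptions used to show that a closed connection's session is reused, that at most `size` idle sessions are kept (mirroring session-pool-size), and that a size of zero disables pooling so that every connection pays the session-creation cost.

```python
class SessionPool:
    """Toy model of a shared session pool (illustrative, not Oracle code)."""

    def __init__(self, size):
        self.size = size        # analogous to session-pool-size; 0 disables pooling
        self.idle = []          # sessions waiting to be reused
        self.created = 0        # how many times the expensive setup ran

    def acquire(self):
        """Attach a session to a new connection, reusing an idle one if possible."""
        if self.idle:
            return self.idle.pop()
        self.created += 1
        return {"id": self.created}   # stands in for costly session-state setup

    def release(self, session):
        """Return a session when its connection closes; drop it if the pool is full."""
        if len(self.idle) < self.size:
            self.idle.append(session)


def serve(pool, connections):
    """Simulate sequential connections that each open, use, and close a session."""
    for _ in range(connections):
        session = pool.acquire()
        pool.release(session)
    return pool.created
```

With a pool size of 10, one hundred sequential connections create a single session; with a size of 0 they create one hundred.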
Session pooling can affect users writing Java servlets, because other users can see session state initialized by another request for a different user. Hence, servlet writers should only use session memory, such as Java static variables, to hold data for the entire application rather than for a particular user. State for each user must be stored in the database or in a lookup table, rather than assuming that a session will only exist for a single user.
See Also:
Chapter 32, "Writing Oracle XML DB Applications in Java"
Figure 28-1 illustrates the Oracle XML DB protocol server components and how they are used to access files in Oracle XML DB Repository and other data. Only the relevant components of the repository are shown.
Figure 28-1 Oracle XML DB Architecture: Protocol Server
Oracle XML DB Protocol Server Configuration Management
Oracle XML DB protocol server uses configuration parameters stored in /xdbconfig.xml to initialize its startup state and manage session-level configuration. The following section describes the protocol-specific configuration parameters that you can set in the Oracle XML DB configuration file. The session pool size and timeout parameters cannot be changed dynamically: you must restart the database for changes to them to take effect.
See Also:
"Configuring Oracle XML DB Using xdbconfig.xml"
Configuring Protocol Server Parameters
Common parameters – Table 28-1 shows the parameters common to all protocols. All parameter names in this table, except those starting with /xdbconfig, are relative to the following XPath in the Oracle XML DB configuration schema:
/xdbconfig/sysconfig/protocolconfig/common
FTP-specific parameters – Table 28-2 shows the FTP-specific parameters. These are relative to the following XPath in the Oracle XML DB configuration schema:
/xdbconfig/sysconfig/protocolconfig/ftpconfig
HTTP(S)/WebDAV specific parameters, except servlet-related parameters – Table 28-3 shows the HTTP(S)/WebDAV-specific parameters. These parameters are relative to the following XPath in the Oracle XML DB configuration schema:
/xdbconfig/sysconfig/protocolconfig/httpconfig
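As a sketch of how these relative XPaths resolve, the following Python reads a few parameters from a trimmed, hypothetical xdbconfig.xml fragment using the standard xml.etree.ElementTree module. The real configuration file uses an XML namespace and contains many more elements; the namespace is omitted here to keep the example short, and the parameter values are illustrative, not defaults.

```python
import xml.etree.ElementTree as ET

# Trimmed, namespace-free stand-in for xdbconfig.xml (illustrative values only).
SAMPLE = """
<xdbconfig>
  <sysconfig>
    <call-timeout>300</call-timeout>
    <protocolconfig>
      <common><session-pool-size>50</session-pool-size></common>
      <ftpconfig><ftp-port>2100</ftp-port></ftpconfig>
      <httpconfig><http-port>8080</http-port></httpconfig>
    </protocolconfig>
  </sysconfig>
</xdbconfig>
"""

# Base XPaths from the text above, written relative to the <xdbconfig> root.
BASES = {
    "common": "sysconfig/protocolconfig/common",
    "ftp": "sysconfig/protocolconfig/ftpconfig",
    "http": "sysconfig/protocolconfig/httpconfig",
}


def get_param(root, name, section="common"):
    """Look up a parameter the way the tables name it.

    Names starting with /xdbconfig are absolute; all others are relative to
    the base XPath of the given section.
    """
    if name.startswith("/xdbconfig/"):
        path = name[len("/xdbconfig/"):]   # ElementTree paths are relative to root
    else:
        path = f"{BASES[section]}/{name}"
    return root.findtext(path)


root = ET.fromstring(SAMPLE)
```

Here `get_param(root, "session-pool-size")` resolves under the common XPath, `get_param(root, "ftp-port", "ftp")` under ftpconfig, and `get_param(root, "/xdbconfig/sysconfig/call-timeout")` is treated as an absolute name.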
Note:
You must either configure the port separately for each node of a Real Application Cluster (RAC) or configure it for one node and then restart the database instances on the other nodes. See "Configuring Oracle XML DB Using xdbconfig.xml".
See Also:
Chapter 34, "Administering Oracle XML DB" for more information about the configuration file xdbconfig.xml
"xdbconfig.xsd: XML Schema for Configuring Oracle XML DB"
"Configuring Default Namespace to Schema Location Mappings" for more information about the schemaLocation-mappings parameter
"Configuring XML File Extensions" for more information about the xml-extensions parameter
For examples of the usage of these parameters, see the configuration file, xdbconfig.xml.
Table 28-1 Common Protocol Configuration Parameters
Parameter – Description
extension-mappings/mime-mappings
Specifies the mapping of file extensions to MIME types. When a resource is stored in Oracle XML DB Repository, and its MIME type is not specified, this list of mappings is used to set its MIME type.
extension-mappings/lang-mappings
Specifies the mapping of file extensions to languages. When a resource is stored in Oracle XML DB Repository, and its language is not specified, this list of mappings is used to set its language.
extension-mappings/encoding-mappings
Specifies the mapping of file extensions to encodings. When a resource is stored in Oracle XML DB Repository, and its encoding is not specified, this list of mappings is used to set its encoding.
xml-extensions
Specifies the list of filename extensions that are treated as XML content by Oracle XML DB.
session-pool-size
Maximum number of sessions that are kept in the protocol server session pool.
/xdbconfig/sysconfig/call-timeout
If a connection is idle for this time (in hundredths of a second), then the shared server serving the connection is freed up to serve other connections.
session-timeout
Time (in hundredths of a second) after which a session (and consequently the corresponding connection) will be terminated by the protocol server if the connection has been idle for that time. This parameter is used only if the protocol-specific session timeout is not present in the configuration.
schemaLocation-mappings
Specifies the default schema location for a given namespace. This is used if the instance XML document does not contain an explicit xsi:schemaLocation attribute.
/xdbconfig/sysconfig/default-lock-timeout
Time period after which a WebDAV lock on a resource becomes invalid. This can be overridden by a Timeout value specified by the client that locks the resource.
Table 28-2 Configuration Parameters Specific to FTP
Parameter – Description
buffer-size
Size of the buffer, in bytes, used to read data from the network during an FTP put operation. Set buffer-size to larger values for higher put performance. There is a trade-off between put performance and memory usage. Value can be from 1024 to 1048496, inclusive; the default value is 8192.
ftp-port
Port on which FTP server listens. By default, this is 0, which means that FTP is disabled. FTP is disabled by default because the FTP specification requires that passwords be transmitted in clear text, which can present a security hazard. To enable FTP, set this parameter to the FTP port to use, such as 2100.
ftp-protocol
Protocol over which the FTP server runs. By default, this is tcp.
ftp-welcome-message
A user-defined welcome message that is displayed whenever an FTP client connects to the server. If this parameter is empty or missing, then the following default welcome message is displayed: "Unauthorized use of this FTP server is prohibited and may be subject to civil and criminal prosecution."
session-timeout
Time (in hundredths of a second) after which an FTP connection will be terminated by the protocol server if the connection has been idle for that time.
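The two numeric rules in Table 28-2 (the buffer-size range and ftp-port 0 meaning "disabled") can be checked before editing the configuration. This is a hypothetical Python helper, not part of Oracle XML DB; the upper port bound of 65535 is the general TCP limit rather than something the table states.

```python
def check_ftp_config(ftp_port, buffer_size=8192):
    """Validate FTP settings against Table 28-2 (hypothetical helper).

    Returns a short description of the effective FTP state, or raises
    ValueError for an out-of-range value.
    """
    # Documented range for buffer-size, inclusive; 8192 is the documented default.
    if not 1024 <= buffer_size <= 1048496:
        raise ValueError("buffer-size must be between 1024 and 1048496 bytes")
    # Port 0 is how FTP is disabled by default.
    if ftp_port == 0:
        return "FTP disabled"
    if not 0 < ftp_port < 65536:
        raise ValueError("ftp-port must be a valid TCP port number")
    return f"FTP enabled on port {ftp_port}, buffer {buffer_size} bytes"
```

For instance, `check_ftp_config(2100)` describes FTP as enabled on port 2100 with the default buffer, while `check_ftp_config(0)` reports it as disabled.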
Table 28-3 Configuration Parameters Specific to HTTP(S)/WebDAV (Except Servlet Parameters)
Parameter – Description
http-port
Port on which the HTTP(S)/WebDAV server listens, using protocol http-protocol. By default, this is 0, which means that HTTP is disabled. If this parameter is empty, then the default value of 0 applies. An empty parameter is not recommended.
This parameter must be present, whether or not it is empty; otherwise, validation of xdbconfig.xml against XML schema xdbconfig.xsd fails. The value must be different from the value of http2-port; otherwise, an error is raised.
http2-port
Port on which the HTTP(S)/WebDAV server listens, using protocol http2-protocol.
This parameter is optional, but, if present, then http2-protocol must also be present; otherwise, an error is raised. The value must be different from the value of http-port; otherwise, an error is raised. An empty parameter also raises an error.
http-protocol
Protocol over which the HTTP(S)/WebDAV server runs on port http-port. Must be either TCP or TCPS.
This parameter must be present; otherwise, validation of xdbconfig.xml against XML schema xdbconfig.xsd fails. An empty parameter also raises an error.
http2-protocol
Protocol over which the HTTP(S)/WebDAV server runs on port http2-port. Must be either TCP or TCPS. If this parameter is empty, then the default value of TCP applies. (An empty parameter is not recommended.)
This parameter is optional, but, if present, then http2-port must also be present; otherwise, an error is raised.
session-timeout
Time (in hundredths of a second) after which an HTTP(S) session (and consequently the corresponding connection) will be terminated by the protocol server if the connection has been idle for that time.
max-header-size
Maximum size (in bytes) of an HTTP(S) header.
max-request-body
Maximum size (in bytes) of an HTTP(S) request body.
webappconfig/welcome-file-list
List of filenames that are considered welcome files. When an HTTP(S) GET request for a container is received, the server first checks if there is a resource in the container with any of these names. If so, then the contents of that file are sent, instead of a list of resources in the container.
default-url-charset
The character set that an HTTP(S) protocol server assumes an incoming URL is encoded in, when the URL is encoded neither in UTF-8 nor in the character set given by the Charset parameter of the request's Content-Type field.
allow-repository-anonymous-access
Indication of whether or not anonymous HTTP access to Oracle XML DB Repository data is allowed using an unlocked ANONYMOUS user account. The default value is false, meaning that unauthenticated access to repository data is blocked. See "Anonymous Access to Oracle XML DB Repository using HTTP".
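The interdependent port and protocol rules in Table 28-3 are easy to get wrong, so here is a hypothetical Python checker that encodes them. It is a sketch of the documented rules, not the validation Oracle performs: the configuration is modeled as a dictionary where a missing key means the element is absent and an empty string means the element is present but empty, and protocol names are compared case-insensitively as a simplifying assumption.

```python
def check_http_config(cfg):
    """Apply the http-port/http2-port rules from Table 28-3 (hypothetical helper).

    cfg maps parameter names to strings; a missing key means the element is
    absent, and an empty string means it is present but empty.
    Returns the effective (port, protocol) listeners or raises ValueError.
    """
    if "http-port" not in cfg:
        raise ValueError("http-port must be present")
    if not cfg.get("http-protocol"):
        raise ValueError("http-protocol must be present and non-empty")
    # An empty http-port falls back to 0 (HTTP disabled).
    http_port = int(cfg["http-port"] or 0)
    listeners = [(http_port, cfg["http-protocol"].upper())]

    has_port2 = "http2-port" in cfg
    has_proto2 = "http2-protocol" in cfg
    if has_port2 != has_proto2:
        raise ValueError("http2-port and http2-protocol must be used together")
    if has_port2:
        if cfg["http2-port"] == "":
            raise ValueError("http2-port must not be empty")
        http2_port = int(cfg["http2-port"])
        if http2_port == http_port:
            raise ValueError("http-port and http2-port must differ")
        # An empty http2-protocol defaults to TCP.
        listeners.append((http2_port, (cfg["http2-protocol"] or "TCP").upper()))

    # Both protocols must be TCP or TCPS (compared case-insensitively here).
    for _, protocol in listeners:
        if protocol not in ("TCP", "TCPS"):
            raise ValueError(f"unsupported protocol {protocol!r}")
    return listeners
```

A configuration with http-port 8080 over TCP and http2-port 4443 over TCPS passes and yields two listeners; reusing 8080 for http2-port, or supplying http2-port without http2-protocol, is rejected.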
Configuring Secure HTTP (HTTPS)
To enable Oracle XML DB Repository to use secure HTTP connections (HTTPS), a DBA must configure the database accordingly: configure parameters http2-port and http2-protocol, enable the HTTP Listener to use SSL, and enable launching of the TCPS Dispatcher. After doing this, the DBA must stop, then restart, the database and the listener.
See Also:
"Configuring Oracle XML DB Using xdbconfig.xml" for information about configuring Oracle XML DB parameters
Enable the HTTP Listener to Use SSL
A DBA must carry out the following steps to configure the HTTP Listener for SSL.
1. Create a wallet for the server and import a certificate – Use Oracle Wallet Manager to do the following:
Create a wallet for the server.
If a valid certificate with distinguished name (DN) of the server is not available, create a certificate request and submit it to a certificate authority. Obtain a valid certificate from the authority.
Import a valid certificate with the distinguished name (DN) of the server into the server.
Save the new wallet in obfuscated form, so that it can be opened without a password.
See Also:
Oracle Database Advanced Security Administrator's Guide for information about how to create a wallet
2. Specify the wallet location to the server – Use Oracle Net Manager to do this. Ensure that the configuration is saved to disk. This step updates files sqlnet.ora and listener.ora.
3. Disable client authentication at the server, since most Web clients do not have certificates. Use Oracle Net Manager to do this. This step updates file sqlnet.ora.
4. Create a listening end point that uses TCP/IP with SSL – Use Oracle Net Manager to do this. This step updates file listener.ora.
See Also:
Oracle Database Advanced Security Administrator's Guide for detailed information regarding steps 1 through 4
Enable TCPS Dispatcher
A DBA must edit the database pfile to enable launching of a TCPS dispatcher during database startup. The following line must be added to the file, where SID is the SID of the database:
dispatchers=(protocol=tcps)(service=SIDxdb)
The database pfile location depends on your operating system, as follows:
MS Windows – PARENT/admin/orcl/pfile, where PARENT is the parent folder of folder ORACLE_HOME
Unix, Linux – $ORACLE_HOME/admin/$ORACLE_SID/pfile
Interaction with Oracle XML DB File-System Resources
The protocol specifications, RFC 959 (FTP), RFC 2616 (HTTP), and RFC 2518 (WebDAV) implicitly assume an abstract, hierarchical file system on the server side. This is mapped to Oracle XML DB Repository. The repository provides:
Name resolution.
Security based on access control lists (ACLs). An ACL is a list of access control entries that determine which principals have access to a given resource or resources. See also Chapter 27, "Repository Resource Security".
The ability to store and retrieve any content. The repository can store both binary data input through FTP and XML schema-based documents.
See Also:
http://www.ietf.org/rfc/rfc959.txt
http://www.ietf.org/rfc/rfc2616.txt
http://www.ietf.org/rfc/rfc2518.txt
Protocol Server Handles XML Schema-Based or Non-Schema-Based XML Documents
Oracle XML DB protocol server enhances the protocols by always checking if XML documents being inserted are based on XML schemas registered in Oracle XML DB Repository.
If the incoming XML document specifies an XML schema, then the Oracle XML DB storage to use is determined by that XML schema. This functionality is especially useful when you must store XML documents object-relationally in the database using simple protocols like FTP or WebDAV instead of using SQL statements.
If the incoming XML document is not XML schema-based, then it is stored as a binary document.
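The storage decision described above can be modeled in a few lines. This is an illustrative Python sketch of the logic, not Oracle's implementation: the set of registered schema URLs and the `default_locations` mapping (standing in for the schemaLocation-mappings parameter from Table 28-1) are assumptions, and a document that names a schema that is not registered is treated here as non-schema-based, a case the text does not spell out.

```python
import xml.etree.ElementTree as ET

XSI = "{http://www.w3.org/2001/XMLSchema-instance}"


def storage_for(document, registered, default_locations=None):
    """Decide how an incoming XML document would be stored (illustrative model).

    registered: set of schema URLs assumed to be registered in the repository.
    default_locations: namespace -> schema URL, standing in for the
    schemaLocation-mappings configuration parameter.
    """
    root = ET.fromstring(document)
    locations = []
    pairs = root.get(XSI + "schemaLocation")
    if pairs:
        # xsi:schemaLocation holds whitespace-separated (namespace, location) pairs.
        locations = pairs.split()[1::2]
    elif root.get(XSI + "noNamespaceSchemaLocation"):
        locations = [root.get(XSI + "noNamespaceSchemaLocation")]
    elif default_locations and root.tag.startswith("{"):
        # No explicit schema: fall back to the configured mapping for the namespace.
        namespace = root.tag[1:].split("}", 1)[0]
        if namespace in default_locations:
            locations = [default_locations[namespace]]
    for location in locations:
        if location in registered:
            return ("schema-based", location)
    return ("binary", None)
```

The design keeps the explicit xsi attributes ahead of the configured default, matching the description of schemaLocation-mappings as a fallback used only when the document does not name its schema.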
Event-Based Logging
In certain cases, it may be useful to log the requests received and responses sent by a protocol server. This can be achieved by setting event number 31098 to level 2. To set this event, add the following line to your init.ora file and restart the database:
event="31098 trace name context forever, level 2"
Using FTP and Oracle XML DB Protocol Server
The following sections describe FTP features supported by Oracle XML DB.
Oracle XML DB Protocol Server: FTP Features
File Transfer Protocol (FTP) is one of the oldest and most popular protocols on the net. FTP is specified in RFC959 and provides access to heterogeneous file systems in a uniform manner. FTP works by providing well-defined commands (methods) for communication between the client and the server. The transfer of command messages and the return of status happens on a single connection. However, a new connection is opened between the client and the server for data transfer. With HTTP(S), commands and data are transferred using a single connection.
FTP is implemented by dedicated clients at the operating system level, file-system explorer clients, and browsers. FTP is typically session-oriented: a user session is created through an explicit logon, a number of files or directories are downloaded and browsed, and then the connection is closed.
Note:
For security reasons, FTP is disabled, by default. This is because the IETF FTP protocol specification requires that passwords be transmitted in clear text. Disabling is done by configuring the FTP server port as zero (0). To enable FTP, set the ftp-port parameter to the FTP port to use, such as 2100.
See Also:
RFC 959: FTP Protocol Specification – http://www.ietf.org/rfc/rfc959.txt
"Configuring Oracle XML DB Using xdbconfig.xml" for information about configuring parameters
FTP Features That Are Not Supported
Oracle XML DB implements FTP, as defined by RFC 959, with the exception of the following optional features:
Record-oriented files: only the FILE structure of the STRU method is supported. This is the most widely used structure for transfer of files, and it is the default specified by the RFC. Structure mount is not supported.
Append.
Allocate. This pre-allocates space before file transfer.
Account. This uses the insecure Telnet protocol.
Abort.
FTP Client Methods That Are Supported
For access to the repository, Oracle XML DB supports the following FTP client methods.
cdup – change working directory to parent directory
cwd – change working directory
dele – delete file (not directory)
list, nlst – list files in working directory
mkd – create directory
noop – do nothing (but timeout counter on connection is reset)
pasv, port – establish a TCP data connection
pwd – get working directory
quit – close connection and quit FTP session
retr – retrieve data using an established connection
rmd – remove directory
rnfr, rnto – rename file (two-step process: from file, to file)
stor – store data using an established connection
syst – get system version
type – change data type: ascii or image binary types only
user, pass – user login
See Also:
"FTP Quote Methods" for supported FTP quote methods
"Using FTP with ASM Files" for an example of using FTP method proxy
FTP Quote Methods
Oracle Database supports several FTP quote methods, which provide information directly to Oracle XML DB.
rm_r – Remove file or folder. If it is a folder, recursively remove all files and folders contained in it.
quote rm_r
rm_f – Forcibly remove a resource.
quote rm_f
rm_rf – Combines rm_r and rm_f: Forcibly and recursively removes files and folders.
quote rm_rf
set_nls_locale – Specify the character-set encoding to be used for file and directory names in FTP methods (including names in method responses).
quote set_nls_locale { | NULL}
Only IANA character-set names can be specified for this parameter. If nls_locale is set to NULL or is not set, then the database character set is used.
set_charset – Specify the character set of the data to be sent to the server.
quote set_charset { | NULL}
The set_charset method applies only to text files, not binary files, as determined by the file-extension mapping to MIME types that is defined in configuration file xdbconfig.xml.
If the parameter provided to set_charset is a character-set name (not NULL), then it specifies the character set of the data.
If the parameter provided to set_charset is NULL, or if no set_charset command is given, then the MIME type of the data determines the character set for the data.
If the MIME type is not text/xml, then the data is not assumed to be XML, and the database character set is used.
If the MIME type is text/xml, then the data represents an XML document.
If a byte order mark (BOM) is present in the XML document, then it determines the character set of the data.
If there is no BOM, then:
If there is an encoding declaration in the XML document, then it determines the character set of the data.
If there is no encoding declaration, then the UTF-8 character set is used.
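The rules above form a decision chain. The following Python sketch is illustrative only: `data_charset` is a hypothetical function, the database character set default of AL32UTF8 is an assumed example value, and the encoding declaration is found with a simplified regular expression rather than a full XML parser. As the first rule says, it applies only to text data.

```python
import codecs
import re

# BOMs checked in order; UTF-32 comes before UTF-16 because the UTF-32
# little-endian BOM begins with the UTF-16 little-endian BOM.
BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]
# Simplified match for <?xml ... encoding="..."?> at the start of the data.
ENCODING_DECL = re.compile(rb'^<\?xml[^>]*encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def data_charset(data, mime_type, set_charset=None, db_charset="AL32UTF8"):
    """Pick the character set for uploaded text data following the rules above."""
    if set_charset is not None:          # an explicit quote set_charset value wins
        return set_charset
    if mime_type != "text/xml":          # not treated as XML: database character set
        return db_charset
    for bom, name in BOMS:               # a BOM determines the character set
        if data.startswith(bom):
            return name
    match = ENCODING_DECL.match(data)    # otherwise the encoding declaration
    if match:
        return match.group(1).decode("ascii")
    return "UTF-8"                       # no BOM and no declaration
```

For example, an XML upload that begins with `<?xml version="1.0" encoding="Shift_JIS"?>` and no BOM would be read as Shift_JIS, while the same bytes sent after `quote set_charset ISO-8859-1` would be read as ISO-8859-1.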
Using FTP with ASM Files
Automatic Storage Management (ASM) organizes database files into disk groups for simplified management and added benefits such as database mirroring and I/O balancing. Database administrators can use protocols and resource APIs to access ASM files in the Oracle XML DB repository virtual folder /sys/asm. All files in /sys/asm are binary.
Typical uses are listing, copying, moving, creating, and deleting ASM files and folders. Example 28-1 is an example of navigating the ASM virtual folder and listing the files in a subfolder.
Example 28-1 Navigating ASM Folders
The structure of the ASM virtual folder, /sys/asm, is described in Chapter 21, "Accessing Oracle XML DB Repository Data". In this example, the disk groups are DATA and RECOVERY; the database name is MFG; and the directories created for aliases are dbs and tmp. This example navigates to a subfolder, lists its files, and copies a file to the local file system.
ftp> open myhost 7777
ftp> user system
Password required for SYSTEM
Password: password
ftp> cd /sys/asm
ftp> ls
DATA
RECOVERY
ftp> cd DATA
ftp> ls
dbs
MFG
ftp> cd dbs
ftp> ls
t_dbl.f
t_axl.f
ftp> binary
ftp> get t_dbl.f
ftp> get t_axl.f
ftp> put my_db2.f
In this example, after connecting to the database on host myhost and logging on (the first four lines), FTP methods cd and ls are used to navigate and list folders, respectively. When in folder /sys/asm/DATA/dbs, FTP command get is used to copy files t_dbl.f and t_axl.f to the current folder of the local file system. Then, FTP command put is used to copy file my_db2.f from the local file system to folder /sys/asm/DATA/dbs.
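The same download can be scripted instead of typed interactively. Below is a minimal sketch using Python's standard ftplib module; the function name `fetch_asm_files` is hypothetical, and it assumes an already logged-in `ftplib.FTP` object (for example one opened with `FTP().connect("myhost", 7777)` followed by `login("system", password)`). `FTP.retrbinary` switches the transfer to binary (TYPE I) mode itself, which matches the `binary` command in the session above.

```python
import os


def fetch_asm_files(ftp, folder, names, dest_dir):
    """Copy ASM files from a repository folder to a local directory.

    ftp is an already-logged-in ftplib.FTP object, or any object with the
    same cwd/retrbinary methods. retrbinary issues TYPE I (binary) itself.
    """
    ftp.cwd(folder)                              # like "cd /sys/asm/DATA/dbs"
    copied = []
    for name in names:
        local_path = os.path.join(dest_dir, name)
        with open(local_path, "wb") as out:
            ftp.retrbinary(f"RETR {name}", out.write)   # like "get name"
        copied.append(local_path)
    return copied
```

Taking the FTP object as a parameter, rather than opening the connection inside the function, keeps the credentials out of the helper and lets the same code run against a stand-in object when testing.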
Database administrators can copy ASM files from one database server to another, as well as between the database and a local file system. Example 28-2 shows copying between two databases. For this, the proxy FTP client method can be used, if available. The proxy method provides a direct connection to two different remote FTP servers.
Example 28-2 copies an ASM file from one database to another. Terms with the suffix 1 correspond to database server1; terms with the suffix 2 correspond to database server2. Note that, depending on your FTP client, the passwords you type might be echoed on your screen. Take the necessary precautions so that others do not see these passwords.
Example 28-2 Transferring ASM Files Between Databases with FTP proxy Method
1 ftp> open server1 port1
2 ftp> user username1
3 Password required for USERNAME1
4 Password: password-for-username1
5 ftp> cd /sys/asm/DATAFILE/MFG/DATAFILE
6 ftp> proxy open server2 port2
7 ftp> proxy user username2
8 Password required for USERNAME2
9 Password: password-for-username2
10 ftp> proxy cd /sys/asm/DATAFILE/MFG/DATAFILE
11 ftp> proxy put dbs2.f tmp1.f
12 ftp> proxy get dbs1.f tmp2.f
In this example:
Line 1 opens an FTP control connection to the Oracle XML DB FTP server, server1.
Lines 2–4 log the DBA onto server1 as USERNAME1.
Line 5 navigates to /sys/asm/DATAFILE/MFG/DATAFILE on server1.
Line 6 opens an FTP control connection to the second database server, server2. At this point, the FTP command proxy ? could be issued to see the available FTP commands on the secondary connection. (This is not shown.)
Lines 7–9 log the DBA onto server2 as USERNAME2.
Line 10 navigates to /sys/asm/DATAFILE/MFG/DATAFILE on server2.
Line 11 copies ASM file dbs2.f from server2 to ASM file tmp1.f on server1.
Line 12 copies ASM file dbs1.f from server1 to ASM file tmp2.f on server2.
Using FTP on the Standard Port Instead of the Oracle XML DB Default Port
You can use the Oracle XML DB configuration file, /xdbconfig.xml, to configure FTP to listen on any port. By default, FTP listens on a nonstandard, unprotected port. To use FTP on the standard port, 21, your DBA must do the following:
(UNIX only) Use this shell command to ensure that the owner and group of executable file tnslsnr are root:
% chown root:root $ORACLE_HOME/bin/tnslsnr
(UNIX only) Add the following entry to the listener file, LISTENER.ora, where hostname is your host name:
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP) (HOST = hostname) (PORT = 21))
(PROTOCOL_STACK = (PRESENTATION = FTP) (SESSION = RAW)))
(UNIX only) Stop, then restart the listener, using the following shell commands, where user_id and group_id are your UNIX user and group identifiers, respectively:
% lsnrctl stop
% tnslsnr LISTENER -user user_id -group group_id &
Use the ampersand (&) to execute the second command in the background. Do not use lsnrctl start to start the listener.
Use PL/SQL procedure DBMS_XDB.setftpport with SYS as SYSDBA to set the FTP port number to 21 in the Oracle XML DB configuration file /xdbconfig.xml:
SQL> exec DBMS_XDB.setFTPPort(21);
Force the database to reregister with the listener, using this SQL statement:
SQL> ALTER SYSTEM REGISTER;
Check that the listener is correctly configured, using this shell command:
% lsnrctl status
See Also:
Oracle Database Net Services Reference for information about listener parameters and file LISTENER.ora
Oracle Database Net Services Reference, section "Port Number Limitations" for information about running on privileged ports
FTP Server Session Management
Oracle XML DB protocol server also provides session management for FTP. After a short wait for a new command, FTP returns to the protocol layer and the shared server is freed up to serve other connections. The duration of this short wait is configurable by changing parameter call-timeout in the Oracle XML DB configuration file. For high-traffic sites, call-timeout should be shorter, so that more connections can be served. When new data arrives on the connection, the FTP server is re-invoked with fresh data. Thus, the long-running nature of FTP does not affect the number of connections that can be made to the protocol server.
See Also:
"Configuring Oracle XML DB Using xdbconfig.xml" for information about configuring Oracle XML DB parameters
Handling Error 421. Modifying the Default Timeout Value of an FTP Session
If you are frequently disconnected from the server and have to reconnect and traverse the entire directory before doing the next operation, you may need to modify the default timeout value for FTP sessions. If a session is idle for longer than this period, it is disconnected. You can increase the timeout value (default = 6000 centiseconds) by modifying the configuration document as follows and then restarting the database:
Example 28-3 Modifying the Default Timeout Value of an FTP Session
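DECLARE
  newconfig XMLType;
BEGIN
  -- Copy the current configuration, changing the FTP session timeout
  -- (value in hundredths of a second).
  SELECT
    updateXML(
      DBMS_XDB.cfg_get(),
      '/xdbconfig/sysconfig/protocolconfig/ftpconfig/session-timeout/text()',
      123456789)
    INTO newconfig
    FROM DUAL;
  -- Write the modified configuration back to /xdbconfig.xml.
  DBMS_XDB.cfg_update(newconfig);
END;
/
COMMIT;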
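Because the session and call timeouts are expressed in hundredths of a second, it can help to convert a value before writing it into the configuration. The helpers below are hypothetical and only do the arithmetic.

```python
def centiseconds_to_seconds(value):
    """Convert a timeout in hundredths of a second (xdbconfig units) to seconds."""
    return value / 100


def describe(value):
    """Render a centisecond timeout as days, hours, minutes, and seconds."""
    total = value // 100                 # whole seconds; the sub-second part is dropped
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}d {hours}h {minutes}m {seconds}s"
```

By this arithmetic, the default of 6000 centiseconds is one minute, and the value 123456789 used in Example 28-3 is about 14 days and 7 hours (14d 6h 56m 7s).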
FTP Client Failure in Passive Mode
Do not use FTP in passive mode to connect remotely to a server that has HOSTNAME configured in Listener.ora as localhost or 127.0.0.1. If the HOSTNAME specified in server file Listener.ora is localhost or 127.0.0.1, then the server is configured for local use only. If you try to connect remotely to the server using FTP in passive mode, the FTP client will fail. This is because the server passes IP address 127.0.0.1 (derived from HOSTNAME) to the client, which makes the client try to connect to itself, not to the server.
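The failure is visible in the server's reply to the PASV command. Per RFC 959, the reply has the form `227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)`, where the first four numbers are the IPv4 address and the port is p1 × 256 + p2. The small Python parser below is a sketch for inspecting such replies; the function names are hypothetical, and the sample address and port are illustrative.

```python
import re

PASV_REPLY = re.compile(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)")


def pasv_endpoint(reply):
    """Extract the (host, port) a server advertises in its 227 PASV reply."""
    match = PASV_REPLY.search(reply)
    if not reply.startswith("227") or match is None:
        raise ValueError(f"not a PASV reply: {reply!r}")
    numbers = [int(n) for n in match.groups()]
    host = ".".join(str(n) for n in numbers[:4])
    port = numbers[4] * 256 + numbers[5]     # the port is sent as two bytes
    return host, port


def is_loopback_advert(reply):
    """True when a remote client would be told to connect to its own loopback."""
    host, _ = pasv_endpoint(reply)
    return host.startswith("127.")
```

A server whose listener HOSTNAME is 127.0.0.1 would advertise something like `(127,0,0,1,8,52)`, which decodes to 127.0.0.1 port 2100; a remote client following that address connects to itself.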
Using HTTP(S) and Oracle XML DB Protocol Server
Oracle XML DB implements HyperText Transfer Protocol (HTTP) 1.1, as defined in the RFC 2616 specification.
Oracle XML DB Protocol Server: HTTP(S) Features
The Oracle XML DB HTTP(S) component in the Oracle XML DB protocol server implements the RFC2616 specification with the exception of the following optional features:
gzip and compress transfer encodings
byte-range headers
The TRACE method (used for proxy error debugging)
Cache-control directives (these require you to specify expiration dates for content, and are not generally used)
TE, Trailer, Vary & Warning headers
Weak entity tags
Web common log format
Multi-homed Web server
See Also:
RFC 2616: HTTP 1.1 Protocol Specification—http://www.ietf.org/rfc/rfc2616.txt
HTTP(S) Features That Are Not Supported
Digest Authentication (RFC 2617) is not supported. Oracle XML DB supports Basic Authentication, where a client sends the user name and password in clear text in the Authorization header.
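To see why Basic Authentication counts as clear text, note that the Authorization header is only base64-encoded, not encrypted. The hypothetical Python helpers below build such a header and decode it again; the credentials scott/tiger are example values.

```python
import base64


def basic_auth_header(user, password):
    """Build the Authorization header value used by HTTP Basic Authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def decode_basic_auth(header):
    """Recover user and password from a Basic header: base64 is not encryption."""
    scheme, _, token = header.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic Authorization header")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password
```

Anyone who can observe the request can run the equivalent of `decode_basic_auth`, which is why HTTPS is the safer way to reach the repository.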
HTTP(S) Client Methods That Are Supported
For access to the repository, Oracle XML DB supports the following HTTP(S) client methods.
OPTIONS – get information about available communication options
GET – get document/data (including headers)
HEAD – get headers only, without document body
PUT – store data in resource
DELETE – delete resource
The semantics of these HTTP(S) methods are in accordance with WebDAV. Servlets and Web services may support additional HTTP(S) methods, such as POST.
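As a client-side sketch, the following Python builds requests for the methods listed above with the standard urllib.request module. The function `repository_request` is hypothetical, and the base URL (for example http://dbhost:8080 for a server whose http-port is 8080) and repository path are assumptions. The request is only constructed here; pass it to `urllib.request.urlopen` to send it, adding an Authorization header through `headers` when the server requires credentials.

```python
import urllib.request

# Repository methods listed above; servlets may accept others, such as POST.
SUPPORTED = ("OPTIONS", "GET", "HEAD", "PUT", "DELETE")


def repository_request(base_url, path, method="GET", body=None, headers=None):
    """Build (but do not send) a request for one of the supported methods."""
    if method not in SUPPORTED:
        raise ValueError(f"{method} is not in the list of supported methods")
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(url, data=body, headers=headers or {}, method=method)
```

For example, `repository_request("http://dbhost:8080", "/public/hello.txt", "PUT", b"hi", {"Content-Type": "text/plain"})` describes storing a small text resource, and the same path with `"HEAD"` would fetch only its headers.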
See Also:
"Supported WebDAV Client Methods" for supported HTTP(S) client methods involving WebDAV
Using HTTP(S) on a Standard Port Instead of an Oracle XML DB Default Port
You can use the Oracle XML DB configuration file, /xdbconfig.xml, to configure HTTP(S) to listen on any port. By default, HTTP(S) listens on a nonstandard, unprotected port. To use HTTP or HTTPS on a standard port (80 for HTTP, 443 for HTTPS), your DBA must do the following:
(UNIX only) Use this shell command to ensure that the owner and group of executable file tnslsnr are root:
% chown root:root $ORACLE_HOME/bin/tnslsnr
(UNIX only) Add the following entry to the listener file, LISTENER.ora, where hostname is your host name, and port_number is 80 for HTTP or 443 for HTTPS:
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP) (HOST = hostname) (PORT = port_number))
(PROTOCOL_STACK = (PRESENTATION = HTTP) (SESSION = RAW)))
(UNIX only) Stop, then restart the listener, using the following shell commands, where user_id and group_id are your UNIX user and group identifiers, respectively:
% lsnrctl stop
% tnslsnr LISTENER -user user_id -group group_id &
Use the ampersand (&) to run the second command in the background. Do not use lsnrctl start to start the listener.
Connect as SYS AS SYSDBA and use PL/SQL procedure DBMS_XDB.setHTTPPort to set the HTTP(S) port number to port_number in the Oracle XML DB configuration file /xdbconfig.xml, where port_number is 80 for HTTP or 443 for HTTPS:
SQL> exec DBMS_XDB.setHTTPPort(port_number);
Force the database to reregister with the listener, using this SQL statement:
SQL> ALTER SYSTEM REGISTER;
Check that the listener is correctly configured:
% lsnrctl status
See Also:
Oracle Database Net Services Reference for information about listener parameters and file LISTENER.ora
Oracle Database Net Services Reference, section "Port Number Limitations" for information about running on privileged ports
HTTPS: Support for Secure HTTP
When HTTPS is properly configured, you can access Oracle XML DB Repository securely. See "Configuring Secure HTTP (HTTPS)" for configuration information.
Note:
If Oracle Database is installed on Microsoft Windows XP with Service Pack 2 (SP2), then you must use HTTPS for WebDAV access to Oracle XML DB Repository, or else you must make appropriate modifications to the Windows XP Registry. For information about the latter, see http://www.microsoft.com/technet/prodtechnol/winxppro/maintain/sp2netwk.mspx#XSLTsection129121120120
Anonymous Access to Oracle XML DB Repository using HTTP
Configuration parameter allow-repository-anonymous-access controls whether or not anonymous HTTP access to Oracle XML DB Repository data is allowed using an unlocked ANONYMOUS user account. The default value is false, meaning that unauthenticated access to repository data is blocked. To allow anonymous HTTP access to the repository, you must set this parameter to true, and unlock the ANONYMOUS user account.
Caution:
There is an inherent security risk associated with allowing anonymous access to the repository.
Parameter allow-repository-anonymous-access does not control anonymous access to the repository using servlets. Each servlet has its own security-role-ref parameter value to control its access.
See Also:
Table 28-3 for information about parameter allow-repository-anonymous-access
"Configuring Oracle XML DB Using xdbconfig.xml" for information about configuring Oracle XML DB parameters
"Configuring Oracle XML DB Servlets" for information about parameter security-role-ref
Using Java Servlets with HTTP(S)
Oracle XML DB supports Java servlets. To use a Java servlet, you must register it with a unique name in the Oracle XML DB configuration file, along with parameters that customize its action. You must also compile the servlet and load it into the database. Finally, you must associate the servlet name with a pattern. The pattern can be an extension such as *.jsp or a path name such as /a/b/c or /sys/*, as described in Java servlet application program interface (API) version 2.2.
While processing an HTTP(S) request, the protocol server matches the path name of the request against the registered patterns. If there is a match, then the protocol server invokes the corresponding servlet with the appropriate initialization parameters. Java servlets use the existing Java Virtual Machine (JVM) infrastructure. The JVM is started if necessary, and it runs a Java method that does the following:
- initializes the servlet
- creates the request and response objects
- passes those objects to the servlet
- runs the servlet
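The matching step can be pictured with the general Servlet API mapping rules: an exact path first, then the longest /prefix/* pattern, then a *.extension pattern. The following Python code is a sketch of those generic rules, not Oracle XML DB's internal implementation. The servlet names are hypothetical.

```python
from typing import Dict, Optional


def match_servlet(path: str, mappings: Dict[str, str]) -> Optional[str]:
    """Return the servlet registered for a request path, or None if none matches."""
    # 1. Exact path match, for example "/a/b/c".
    if path in mappings:
        return mappings[path]
    # 2. Longest path-prefix match, for example "/sys/*".
    best_prefix, best_servlet = None, None
    for pattern, servlet in mappings.items():
        if pattern.endswith("/*"):
            prefix = pattern[:-2]
            if path == prefix or path.startswith(prefix + "/"):
                if best_prefix is None or len(prefix) > len(best_prefix):
                    best_prefix, best_servlet = prefix, servlet
    if best_servlet is not None:
        return best_servlet
    # 3. Extension match on the last path segment, for example "*.jsp".
    last_segment = path.rsplit("/", 1)[-1]
    if "." in last_segment:
        return mappings.get("*." + last_segment.rsplit(".", 1)[-1])
    return None


# Hypothetical registrations using the three pattern kinds named in the text.
MAPPINGS = {"/a/b/c": "ExactServlet", "/sys/*": "SysServlet", "*.jsp": "JspServlet"}
```

In this ordering, a path-prefix pattern takes precedence over an extension pattern. For example, /sys/x/page.jsp maps to SysServlet rather than JspServlet.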
See Also:
Chapter 32, "Writing Oracle XML DB Applications in Java"
Embedded PL/SQL Gateway
You can use the PL/SQL gateway to implement a Web application entirely in PL/SQL. There are two implementations of the PL/SQL gateway:
mod_plsql – a plug-in of Oracle HTTP Server that lets you invoke PL/SQL stored procedures using HTTP(S). Oracle HTTP Server is a component of both Oracle Application Server and Oracle Database; it should not be confused with the HTTP component of the Oracle XML DB protocol server.
the embedded PL/SQL gateway – a gateway implementation that runs in the Oracle XML DB HTTP listener.
With the PL/SQL gateway (either implementation), a Web browser sends an HTTP(S) request in the form of a URL that identifies a stored procedure and provides it with parameter values. The gateway translates the URL, calls the stored procedure with the parameter values, and returns output (typically HTML) to the Web-browser client.
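A rough Python sketch of the URL translation step follows. It assumes a URL layout of /DAD_name/procedure?parameters, where DAD stands for database access descriptor. The DAD name myapp and the procedure hr.show_emp are invented for illustration. Real gateways also handle details, such as flexible parameter passing, that this sketch ignores.

```python
from typing import Dict, List, Tuple
from urllib.parse import parse_qs, urlsplit


def translate_gateway_url(url: str, dad: str) -> Tuple[str, Dict[str, List[str]]]:
    """Split a gateway URL into (procedure name, parameter values)."""
    parts = urlsplit(url)
    prefix = "/" + dad + "/"
    if not parts.path.startswith(prefix):
        raise ValueError("URL is not under DAD " + dad)
    procedure = parts.path[len(prefix):]  # for example "hr.show_emp"
    params = parse_qs(parts.query)        # repeated names become lists of values
    return procedure, params
```

For http://host:8080/myapp/hr.show_emp?emp_id=7369, the sketch yields the procedure name hr.show_emp and the parameter emp_id with value 7369. The gateway would then call that procedure and return its output to the browser.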
Using the embedded PL/SQL gateway simplifies installation, configuration, and administration of PL/SQL based Web applications. The embedded gateway uses the Oracle XML DB protocol server, not Oracle HTTP Server. Its configuration is defined by the Oracle XML DB configuration file, /xdbconfig.xml. However, the recommended way to configure the embedded gateway is to use the procedures in PL/SQL package DBMS_EPG, not to edit file /xdbconfig.xml.
See Also:
Oracle Database Advanced Application Developer's Guide for information on using and configuring the embedded PL/SQL gateway
Chapter 34, "Administering Oracle XML DB" for information on the configuration definition of the embedded gateway in /xdbconfig.xml
Oracle Fusion Middleware Administrator's Guide for Oracle HTTP Server for conceptual information about using the PL/SQL gateway
Oracle HTTP Server mod_plsql User's Guide for information about mod_plsql
Sending Multibyte Data From a Client
When a client sends multibyte data in a URL, RFC 2718 specifies that the client should send the URL using the %HH format, where HH is the hexadecimal notation of the byte value in UTF-8 encoding. The following are URL examples that can be sent to Oracle XML DB in an HTTP(S) or WebDAV context:
http://urltest/xyz%E3%81%82%E3%82%A2
http://%E3%81%82%E3%82%A2
http://%E3%81%82%E3%82%A2/abc%E3%81%86%E3%83%8F.xml
Oracle XML DB processes multibyte data in the requested URL and in any URLs within the IF, DESTINATION, and REFERRED headers.
The default-url-charset configuration parameter can be used to accept requests from some clients that use other, nonconforming, forms of URL, with characters that are not ASCII. If a request with such characters fails, try setting this value to the native character set of the client environment. The character set used in such URL fields must be specified with an IANA charset name.
Parameter default-url-charset controls the encoding used for nonconforming URLs. You need to set it only if you use a nonconforming client that does not send a charset parameter in the Content-Type header.
See Also:
RFC 2616: HTTP 1.1 Protocol Specification, http://www.ietf.org/rfc/rfc2616.txt
"Configuring Oracle XML DB Using xdbconfig.xml" for information about configuring Oracle XML DB parameters
Characters That Are Not ASCII In URLs
Non-ASCII characters in URLs passed to an HTTP server should be converted to UTF-8 and escaped in %HH format, where HH is the hexadecimal notation of the byte value. For flexibility, the Oracle XML DB protocol server interprets an incoming URL by testing whether it is encoded in each of the following character sets, in this order:
UTF-8
Charset parameter of the Content-Type field of the request, if specified
Character set, if specified, in the default-url-charset configuration parameter
Character set of the database
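The decoding order can be sketched in Python as follows. The sketch makes two assumptions. First, a character set "matches" when the percent-decoded bytes decode without error under strict decoding. Second, Python codec names such as shift_jis stand in for the IANA or Oracle character-set names that the server actually uses.

```python
from typing import Optional
from urllib.parse import unquote_to_bytes


def decode_url(url: str, *, content_type_charset: Optional[str] = None,
               default_url_charset: Optional[str] = None,
               database_charset: str) -> str:
    """Decode a percent-encoded URL, trying candidate character sets in order."""
    raw = unquote_to_bytes(url)
    candidates = ["utf-8", content_type_charset, default_url_charset, database_charset]
    for charset in candidates:
        if charset is None:  # this source was not specified; skip it
            continue
        try:
            return raw.decode(charset)  # strict mode: invalid bytes raise an error
        except (UnicodeDecodeError, LookupError):
            continue
    raise ValueError("URL could not be decoded with any candidate character set")
```

For example, /%82%A0 is not valid UTF-8. With default-url-charset set to Shift_JIS, the sketch decodes it as /あ.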
See Also:
"Configuring Oracle XML DB Using xdbconfig.xml" for information about configuring Oracle XML DB parameters
Controlling Character Sets for HTTP(S)
The following sections describe how character sets are controlled for data transferred using HTTP(S).
Request Character Set
The character set of the HTTP(S) request body is determined with the following algorithm:
The Content-Type header is evaluated. If the Content-Type header specifies a charset value, the specified charset is used.
The MIME type of the document is evaluated as follows:
If the MIME type is "*/xml", the character set is determined as follows:
- If a BOM is present, then UTF-16 is used.
- If an encoding declaration is present, the specified encoding is used.
- If neither a BOM nor an encoding declaration is present, UTF-8 is used.
If the MIME type is text, ISO8859-1 is used.
If the MIME type is neither "*/xml" nor text, the database character set is used.
HTTP(S) differs from SQL and FTP in this respect: for text documents, HTTP(S) uses ISO8859-1 by default, as specified by RFC 2616: HTTP 1.1 Protocol Specification.
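The request character-set algorithm above can be sketched as a Python function, with the following assumptions:
- "MIME type is text" is read as any text/* type, after the */xml check.
- Only the UTF-16 byte order marks (FE FF and FF FE) are treated as a BOM.
- The encoding declaration is found with a simple regular expression rather than a full XML parser.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the start of the body.
_ENCODING_DECL = re.compile(rb'^<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def request_charset(content_type: str, body: bytes, database_charset: str) -> str:
    """Pick the character set of an HTTP(S) request body."""
    mime, _, params = content_type.partition(";")
    mime = mime.strip().lower()
    # Step 1: an explicit charset parameter in Content-Type wins.
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            return value.strip().strip('"')
    # Step 2: otherwise, fall back on the MIME type.
    if mime.endswith("/xml"):
        if body.startswith((b"\xfe\xff", b"\xff\xfe")):  # UTF-16 byte order mark
            return "UTF-16"
        match = _ENCODING_DECL.match(body)
        if match:
            return match.group(1).decode("ascii")
        return "UTF-8"
    if mime.startswith("text/"):
        return "ISO8859-1"  # HTTP default for text documents
    return database_charset
```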
Response Character Set
The response generated by Oracle XML DB HTTP Server is in the character set specified in the Accept-Charset field of the request. Accept-Charset can have a list of character sets. Based on the q-value, Oracle XML DB chooses one that does not require conversion. This might not necessarily be the charset with the highest q-value. If Oracle XML DB cannot find one, then the conversion is based on the highest q-value.
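The selection rule can be sketched as follows, with these assumptions:
- "Does not require conversion" means that the accepted character set equals the character set the response data is already in (source_charset).
- A missing or empty Accept-Charset header, or a wildcard * entry, means that any character set is acceptable, as in HTTP 1.1.
- Names are compared case-insensitively, and entries with q=0 are ignored.

```python
from typing import List, Tuple


def parse_accept_charset(header: str) -> List[Tuple[str, float]]:
    """Return (charset, q) pairs with q > 0, highest q first."""
    result = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, *params = [p.strip() for p in item.split(";")]
        q = 1.0
        for p in params:
            if p.lower().startswith("q="):
                q = float(p[2:])
        if q > 0:
            result.append((name.lower(), q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(result, key=lambda pair: -pair[1])


def choose_response_charset(header: str, source_charset: str) -> str:
    candidates = parse_accept_charset(header)
    if not candidates:
        return source_charset  # no preference stated: send the data as is
    for name, _ in candidates:  # walk in q order
        if name == "*" or name == source_charset.lower():
            return source_charset  # acceptable without conversion
    return candidates[0][0]  # otherwise convert to the highest-q charset
```

For example, with source data in UTF-8 and the header "iso-8859-1, utf-8;q=0.5", the sketch picks utf-8 even though its q-value is lower.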
Using WebDAV and Oracle XML DB
Web Distributed Authoring and Versioning (WebDAV) is an IETF standard protocol that provides users with a file-system interface to Oracle XML DB Repository over the Internet. The most popular way of accessing a WebDAV server folder is through WebFolders on Microsoft Windows 2000 or Microsoft Windows NT.
WebDAV is an extension to the HTTP 1.1 protocol that lets an HTTP server act as a file server. It lets clients perform remote Web content authoring through a coherent set of methods, headers, request body formats and response body formats. For example, a DAV-enabled editor can interact with an HTTP/WebDAV server as if it were a file system. WebDAV provides operations to store and retrieve resources, create and list contents of resource collections, lock resources for concurrent access in a coordinated manner, and to set and retrieve resource properties.
Oracle XML DB WebDAV Features
Oracle XML DB supports the following WebDAV features:
Foldering, specified by RFC2518
Access Control
WebDAV is a set of extensions to the HTTP(S) protocol that allow you to edit or manage your files on remote Web servers. WebDAV can also be used, for example, to:
Share documents over the Internet
Edit content over the Internet
See Also:
RFC 2518: WebDAV Protocol Specification, http://www.ietf.org/rfc/rfc2518.txt
WebDAV Features That Are Not Supported
Oracle XML DB supports the contents of RFC2518, with the following exceptions:
Lock-NULL resources create zero-length resources in the file system, and cannot be converted to folders.
The COPY, MOVE and DELETE methods comply with section 2 of the Internet Draft titled 'Binding Extensions to WebDAV'.
Depth-infinity locks
Only Basic Authentication is supported.
Supported WebDAV Client Methods
For access to the repository, Oracle XML DB supports the following HTTP(S)/WebDAV client methods.
PROPFIND (WebDAV-specific) – get properties for a resource
PROPPATCH (WebDAV-specific) – set or remove resource properties
LOCK (WebDAV-specific) – lock a resource (create or refresh a lock)
UNLOCK (WebDAV-specific) – unlock a resource (remove a lock)
COPY (WebDAV-specific) – copy a resource
MOVE (WebDAV-specific) – move a resource
MKCOL (WebDAV-specific) – create a folder resource (collection)
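The WebDAV methods are ordinary HTTP requests with extra method names, headers, and XML bodies. The following Python sketch builds, without sending, a PROPFIND request that asks for two standard DAV properties of a folder and its immediate children (Depth: 1), and an MKCOL request that creates a folder. The host, port, and paths are hypothetical.

```python
from urllib.request import Request

BASE_URL = "http://xdb.example.com:8080"  # hypothetical host and HTTP port

# Request body in the RFC 2518 propfind format, asking for two DAV: properties.
PROPFIND_BODY = b"""<?xml version="1.0" encoding="utf-8"?>
<D:propfind xmlns:D="DAV:">
  <D:prop><D:getcontentlength/><D:getlastmodified/></D:prop>
</D:propfind>"""


def propfind(path: str, depth: str = "1") -> Request:
    """Build a PROPFIND request for a resource and, with depth 1, its children."""
    req = Request(BASE_URL + path, data=PROPFIND_BODY, method="PROPFIND")
    req.add_header("Depth", depth)
    req.add_header("Content-Type", 'text/xml; charset="utf-8"')
    return req


def mkcol(path: str) -> Request:
    """Build an MKCOL request that creates a folder (collection)."""
    return Request(BASE_URL + path, method="MKCOL")
```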
See Also:
"HTTP(S) Client Methods That Are Supported" for additional supported HTTP(S) client methods
"Access Privileges" for information about WebDAV privileges
"Using WebDAV PROPPATCH to Add Metadata"
Using WebDAV with Microsoft Windows XP SP2
If Oracle Database is installed on Microsoft Windows XP with Service Pack 2 (SP2), then you must use a secure connection (HTTPS) for WebDAV access to Oracle XML DB Repository, or else you must make appropriate modifications to the Windows XP Registry.
See Also:
http://www.microsoft.com/technet/prodtechnol/winxppro/maintain/sp2netwk.mspx#XSLTsection129121120120 for information about making necessary modifications to the Windows XP registry
"Configuring Secure HTTP (HTTPS)"
Using Oracle XML DB and WebDAV: Creating a WebFolder in Microsoft Windows
To create a WebFolder in Windows 2000, follow these steps:
Start > My Network Places.
Double-click Add Network Place.
Click Next.
Type the location of the folder, for example:
http://Oracle_server_name:HTTP_port_number
See Figure 28-2.
Click Next.
Enter any name to identify this WebFolder.
Click Finish.
You can now access Oracle XML DB Repository the same way that you access any Windows folder.
Figure 28-2 Creating a WebFolder in Microsoft Windows
This chapter describes how to access Oracle XML DB Repository data using the FTP and HTTP(S)/WebDAV protocols.
This chapter contains these topics:
Overview of Oracle XML DB Protocol Server
Oracle XML DB Protocol Server Configuration Management
Using FTP and Oracle XML DB Protocol Server
Using HTTP(S) and Oracle XML DB Protocol Server
Using WebDAV and Oracle XML DB
Overview of Oracle XML DB Protocol Server
As described in Chapter 2, "Getting Started with Oracle XML DB" and Chapter 21, "Accessing Oracle XML DB Repository Data", Oracle XML DB Repository provides a hierarchical data repository in the database, designed for XML. Oracle XML DB Repository maps path names (or URLs) onto database objects of XMLType and provides management facilities for these objects.
Oracle XML DB also provides the Oracle XML DB protocol server, which supports the standard Internet protocols FTP, WebDAV, and HTTP(S) for accessing its hierarchical repository, or file system. HTTPS provides secure access to Oracle XML DB Repository.
These protocols can provide direct access to Oracle XML DB for many users without having to install additional software. The user names and passwords to be used with the protocols are the same as those for SQL*Plus. Enterprise users are also supported. Database administrators can use these protocols and resource APIs such as DBMS_XDB to access Automatic Storage Management (ASM) files and folders in the repository virtual folder /sys/asm.
See Also:
Chapter 21, "Accessing Oracle XML DB Repository Data" for more information about accessing repository information, and restrictions on that access
Note:
When accessing virtual folder /sys/asm using Oracle XML DB protocols, you must log in as a DBA user other than SYS.
Oracle XML DB protocols are not supported on EBCDIC platforms.
Session Pooling
Oracle XML DB protocol server maintains a shared pool of sessions. Each protocol connection is associated with one session from this pool. After a connection is closed the session is put back into the shared pool and can be used to serve later connections.
Session pooling improves HTTP(S) performance by avoiding the cost of re-creating session state. This matters especially with HTTP 1.0, which creates a new connection for each request. For example, a couple of small files can be retrieved over an existing HTTP/1.1 connection in the time needed to create a database session. You can tune the number of sessions in the pool by setting session-pool-size in the Oracle XML DB xdbconfig.xml file, or disable pooling by setting session-pool-size to zero.
Session pooling can affect users writing Java servlets, because other users can see session state initialized by another request for a different user. Hence, servlet writers should only use session memory, such as Java static variables, to hold data for the entire application rather than for a particular user. State for each user must be stored in the database or in a lookup table, rather than assuming that a session will only exist for a single user.
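The following toy Python model illustrates both points of this section. It is not Oracle XML DB's implementation: a "session" here is just a dictionary of state. Reusing a released session avoids creating a new one, but any state left in that session is visible to whoever acquires it next. That is why per-user data does not belong in session memory.

```python
from collections import deque


class SessionPool:
    """Toy session pool: released sessions are reused by later connections."""

    def __init__(self, size: int):
        self.size = size      # 0 disables pooling, like session-pool-size = 0
        self._idle = deque()
        self.created = 0      # how many sessions had to be created from scratch

    def acquire(self) -> dict:
        if self._idle:
            return self._idle.popleft()  # reuse: no creation cost
        self.created += 1
        return {"id": self.created}

    def release(self, session: dict) -> None:
        if len(self._idle) < self.size:
            self._idle.append(session)  # kept for reuse, state included
        # otherwise the session is discarded
```

In this model, if one connection stores a value in its session and releases it, the next connection from a different user receives the same dictionary with that value still in it.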
See Also:
Chapter 32, "Writing Oracle XML DB Applications in Java"
Figure 28-1 illustrates the Oracle XML DB protocol server components and how they are used to access files in Oracle XML DB Repository and other data. Only the relevant components of the repository are shown.
Figure 28-1 Oracle XML DB Architecture: Protocol Server
Oracle XML DB Protocol Server Configuration Management
Oracle XML DB protocol server uses configuration parameters stored in /xdbconfig.xml to initialize its startup state and manage session-level configuration. The following section describes the protocol-specific configuration parameters that you can configure in the Oracle XML DB configuration file. The session pool size and timeout parameters cannot be changed dynamically: you must restart the database for changes to them to take effect.
See Also:
"Configuring Oracle XML DB Using xdbconfig.xml"
Configuring Protocol Server Parameters
Common parameters – Table 28-1 shows the parameters common to all protocols. All parameter names in this table, except those starting with /xdbconfig, are relative to the following XPath in the Oracle XML DB configuration schema:
/xdbconfig/sysconfig/protocolconfig/common
FTP-specific parameters – Table 28-2 shows the FTP-specific parameters. These are relative to the following XPath in the Oracle XML DB configuration schema:
/xdbconfig/sysconfig/protocolconfig/ftpconfig
HTTP(S)/WebDAV specific parameters, except servlet-related parameters – Table 28-3 shows the HTTP(S)/WebDAV-specific parameters. These parameters are relative to the following XPath in the Oracle XML DB configuration schema:
/xdbconfig/sysconfig/protocolconfig/httpconfig
Note:
You must either configure the port separately for each node of a Real Application Cluster (RAC) or configure it for one node and then restart the database instances on the other nodes. See "Configuring Oracle XML DB Using xdbconfig.xml".
See Also:
Chapter 34, "Administering Oracle XML DB" for more information about the configuration file xdbconfig.xml
"xdbconfig.xsd: XML Schema for Configuring Oracle XML DB"
"Configuring Default Namespace to Schema Location Mappings" for more information about the schemaLocation-mappings parameter
"Configuring XML File Extensions" for more information about the xml-extensions parameter
For examples of the usage of these parameters, see the configuration file, xdbconfig.xml.
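As an illustration of how the relative parameter names map onto the configuration document, the following Python sketch reads parameters from a small sample with the standard ElementTree module. It also applies the rule from Table 28-1 that the common session-timeout is used only when a protocol-specific one is absent. The sample is a minimal hypothetical fragment, not a complete or schema-valid xdbconfig.xml. The namespace URI is assumed to be the one used by xdbconfig.xsd.

```python
import xml.etree.ElementTree as ET

NS = {"c": "http://xmlns.oracle.com/xdb/xdbconfig.xsd"}  # assumed namespace URI

# Hypothetical, heavily trimmed configuration fragment.
SAMPLE = """<xdbconfig xmlns="http://xmlns.oracle.com/xdb/xdbconfig.xsd">
  <sysconfig>
    <protocolconfig>
      <common><session-timeout>6000</session-timeout></common>
      <ftpconfig><ftp-port>2100</ftp-port></ftpconfig>
      <httpconfig><http-port>8080</http-port><session-timeout>18000</session-timeout></httpconfig>
    </protocolconfig>
  </sysconfig>
</xdbconfig>"""
ROOT = ET.fromstring(SAMPLE)


def protocol_param(root, section: str, name: str):
    """Read a parameter relative to /xdbconfig/sysconfig/protocolconfig/<section>."""
    el = root.find(f"c:sysconfig/c:protocolconfig/c:{section}/c:{name}", NS)
    return None if el is None else el.text


def effective_session_timeout(root, section: str):
    """Session timeout in seconds; fall back on the common value if needed."""
    value = protocol_param(root, section, "session-timeout")
    if value is None:
        value = protocol_param(root, "common", "session-timeout")
    return None if value is None else int(value) / 100  # hundredths of a second
```

With this sample, the HTTP session timeout is 18000 hundredths of a second (180.0 seconds). FTP has no session-timeout of its own, so it falls back on the common 6000 (60.0 seconds).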
Table 28-1 Common Protocol Configuration Parameters
Parameter – Description
extension-mappings/mime-mappings
Specifies the mapping of file extensions to mime types. When a resource is stored in Oracle XML DB Repository, and its mime type is not specified, this list of mappings is used to set its mime type.
extension-mappings/lang-mappings
Specifies the mapping of file extensions to languages. When a resource is stored in Oracle XML DB Repository, and its language is not specified, this list of mappings is used to set its language.
extension-mappings/encoding-mappings
Specifies the mapping of file extensions to encodings. When a resource is stored in Oracle XML DB Repository, and its encoding is not specified, this list of mappings is used to set its encoding.
xml-extensions
Specifies the list of filename extensions that are treated as XML content by Oracle XML DB.
session-pool-size
Maximum number of sessions that are kept in the protocol server session pool
/xdbconfig/sysconfig/call-timeout
If a connection is idle for this time (in hundredths of a second), then the shared server serving the connection is freed up to serve other connections.
session-timeout
Time (in hundredths of a second) after which the protocol server terminates a session (and consequently the corresponding connection) that has been idle for that time. This parameter is used only if the protocol-specific session timeout is not present in the configuration.
schemaLocation-mappings
Specifies the default schema location for a given namespace. This is used if the instance XML document does not contain an explicit xsi:schemaLocation attribute.
/xdbconfig/sysconfig/default-lock-timeout
Time period after which a WebDAV lock on a resource becomes invalid. A Timeout value specified by the client that locks the resource can override this period.
Table 28-2 Configuration Parameters Specific to FTP
Parameter – Description
buffer-size
Size of the buffer, in bytes, used to read data from the network during an FTP put operation. Set buffer-size to larger values for higher put performance. There is a trade-off between put performance and memory usage. Value can be from 1024 to 1048496, inclusive; the default value is 8192.
ftp-port
Port on which FTP server listens. By default, this is 0, which means that FTP is disabled. FTP is disabled by default because the FTP specification requires that passwords be transmitted in clear text, which can present a security hazard. To enable FTP, set this parameter to the FTP port to use, such as 2100.
ftp-protocol
Protocol over which the FTP server runs. By default, this is tcp.
ftp-welcome-message
A user-defined welcome message that is displayed whenever an FTP client connects to the server. If this parameter is empty or missing, then the following default welcome message is displayed: "Unauthorized use of this FTP server is prohibited and may be subject to civil and criminal prosecution."
session-timeout
Time (in hundredths of a second) after which an FTP connection will be terminated by the protocol server if the connection has been idle for that time.
Table 28-3 Configuration Parameters Specific to HTTP(S)/WebDAV (Except Servlet Parameters)
Parameter – Description
http-port
Port on which the HTTP(S)/WebDAV server listens, using protocol http-protocol. By default, this is 0, which means that HTTP is disabled. If this parameter is empty (
This parameter must be present, whether or not it is empty; otherwise, validation of xdbconfig.xml against XML schema xdbconfig.xsd fails. The value must be different from the value of http2-port; otherwise, an error is raised.
http2-port
Port on which the HTTP(S)/WebDAV server listens, using protocol http2-protocol.
This parameter is optional, but, if present, then http2-protocol must also be present; otherwise, an error is raised. The value must be different from the value of http-port; otherwise, an error is raised. An empty parameter (
http-protocol
Protocol over which the HTTP(S)/WebDAV server runs on port http-port. Must be either TCP or TCPS.
This parameter must be present; otherwise, validation of xdbconfig.xml against XML schema xdbconfig.xsd fails. An empty parameter (
http2-protocol
Protocol over which the HTTP(S)/WebDAV server runs on port http2-port. Must be either TCP or TCPS. If this parameter is empty (
This parameter is optional, but, if present, then http2-port must also be present; otherwise, an error is raised.
session-timeout
Time (in hundredths of a second) after which an HTTP(S) session (and consequently the corresponding connection) will be terminated by the protocol server if the connection has been idle for that time.
max-header-size
Maximum size (in bytes) of an HTTP(S) header
max-request-body
Maximum size (in bytes) of an HTTP(S) request body
webappconfig/welcome-file-list
List of filenames that are considered welcome files. When an HTTP(S) get request for a container is received, the server first checks if there is a resource in the container with any of these names. If so, then the contents of that file are sent, instead of a list of resources in the container.
default-url-charset
The character set that the HTTP(S) protocol server assumes for an incoming URL that is encoded neither in UTF-8 nor in the character set given by the charset parameter of the request's Content-Type field.
allow-repository-anonymous-access
Indication of whether or not anonymous HTTP access to Oracle XML DB Repository data is allowed using an unlocked ANONYMOUS user account. The default value is false, meaning that unauthenticated access to repository data is blocked. See "Anonymous Access to Oracle XML DB Repository using HTTP".
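The welcome-file check described for webappconfig/welcome-file-list can be pictured with the short Python sketch below. It assumes that when several welcome files are present, the one listed first in welcome-file-list wins. The table does not state the tie-breaking rule.

```python
from typing import List, Optional


def welcome_file(container_entries: List[str], welcome_list: List[str]) -> Optional[str]:
    """Return the welcome file to serve for a GET on a container, or None."""
    present = set(container_entries)
    for name in welcome_list:  # assumption: earlier list entries take priority
        if name in present:
            return name
    return None  # no welcome file: the server returns the list of resources instead
```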
Configuring Secure HTTP (HTTPS)
To enable Oracle XML DB Repository to use secure HTTP connections (HTTPS), a DBA must configure the database accordingly: configure parameters http2-port and http2-protocol, enable the HTTP Listener to use SSL, and enable launching of the TCPS Dispatcher. After doing this, the DBA must stop, then restart, the database and the listener.
See Also:
"Configuring Oracle XML DB Using xdbconfig.xml" for information about configuring Oracle XML DB parameters
Enable the HTTP Listener to Use SSL
A DBA must carry out the following steps to configure the HTTP Listener for SSL.
Create a wallet for the server and import a certificate – Use Oracle Wallet Manager to do the following:
Create a wallet for the server.
If a valid certificate with distinguished name (DN) of the server is not available, create a certificate request and submit it to a certificate authority. Obtain a valid certificate from the authority.
Import a valid certificate with the distinguished name (DN) of the server into the server.
Save the new wallet in obfuscated form, so that it can be opened without a password.
See Also:
Oracle Database Advanced Security Administrator's Guide for information about how to create a wallet
Specify the wallet location to the server – Use Oracle Net Manager to do this. Ensure that the configuration is saved to disk. This step updates files sqlnet.ora and listener.ora.
Disable client authentication at the server, since most Web clients do not have certificates. Use Oracle Net Manager to do this. This step updates file sqlnet.ora.
Create a listening end point that uses TCP/IP with SSL – Use Oracle Net Manager to do this. This step updates file listener.ora.
See Also:
Oracle Database Advanced Security Administrator's Guide for detailed information regarding steps 1 through 4
Enable TCPS Dispatcher
A DBA must edit the database pfile to enable launching of a TCPS dispatcher during database startup. The following line must be added to the file, where SID is the SID of the database:
dispatchers=(protocol=tcps)(service=SIDxdb)
The database pfile location depends on your operating system, as follows:
MS Windows – PARENT/admin/orcl/pfile, where PARENT is the parent folder of folder ORACLE_HOME
Unix, Linux – $ORACLE_HOME/admin/$ORACLE_SID/pfile
Interaction with Oracle XML DB File-System Resources
The protocol specifications, RFC 959 (FTP), RFC 2616 (HTTP), and RFC 2518 (WebDAV) implicitly assume an abstract, hierarchical file system on the server side. This is mapped to Oracle XML DB Repository. The repository provides:
Name resolution.
Security based on access control lists (ACLs). An ACL is a list of access control entries that determine which principals have access to a given resource or resources. See also Chapter 27, "Repository Resource Security".
The ability to store and retrieve any content. The repository can store both binary data input through FTP and XML schema-based documents.
See Also:
http://www.ietf.org/rfc/rfc959.txt
http://www.ietf.org/rfc/rfc2616.txt
http://www.ietf.org/rfc/rfc2518.txt
Protocol Server Handles XML Schema-Based or Non-Schema-Based XML Documents
Oracle XML DB protocol server enhances the protocols by always checking if XML documents being inserted are based on XML schemas registered in Oracle XML DB Repository.
If the incoming XML document specifies an XML schema, then the Oracle XML DB storage to use is determined by that XML schema. This functionality is especially useful when you must store XML documents object-relationally in the database using simple protocols like FTP or WebDAV instead of using SQL statements.
If the incoming XML document is not XML schema-based, then it is stored as a binary document.
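A simplified Python sketch of the first part of this check follows: finding which XML schema, if any, an incoming document names. It reads the standard xsi:noNamespaceSchemaLocation and xsi:schemaLocation attributes. If neither applies, it falls back on a namespace-to-location table, a hypothetical stand-in for the schemaLocation-mappings parameter. The sketch does not check whether the schema is registered. A result of None corresponds to a document that is not XML schema-based and is therefore stored as binary.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

XSI = "http://www.w3.org/2001/XMLSchema-instance"


def schema_for(xml_text: str, default_mappings: Dict[str, str]) -> Optional[str]:
    """Return the schema location an incoming document is based on, or None."""
    root = ET.fromstring(xml_text)
    # ElementTree spells a namespaced tag as "{namespace}local".
    namespace = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""
    if namespace == "":
        location = root.get("{%s}noNamespaceSchemaLocation" % XSI)
        if location:
            return location
    else:
        # xsi:schemaLocation holds whitespace-separated "namespace location" pairs.
        tokens = (root.get("{%s}schemaLocation" % XSI) or "").split()
        for ns, location in zip(tokens[0::2], tokens[1::2]):
            if ns == namespace:
                return location
    # No explicit xsi attribute: use the default mapping, if any.
    return default_mappings.get(namespace)
```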
Event-Based Logging
In certain cases, it may be useful to log the requests received and responses sent by a protocol server. This can be achieved by setting event number 31098 to level 2. To set this event, add the following line to your init.ora file and restart the database:
event="31098 trace name context forever, level 2"
Using FTP and Oracle XML DB Protocol Server
The following sections describe FTP features supported by Oracle XML DB.
Oracle XML DB Protocol Server: FTP Features
File Transfer Protocol (FTP) is one of the oldest and most popular protocols on the Internet. FTP is specified in RFC959 and provides uniform access to heterogeneous file systems. FTP works by providing well-defined commands (methods) for communication between the client and the server. Command messages and status replies are exchanged over a single connection, but a separate connection is opened between the client and the server for data transfer. With HTTP(S), by contrast, commands and data are transferred over a single connection.
FTP is implemented by dedicated clients at the operating system level, file-system explorer clients, and browsers. FTP is typically session-oriented: a user session is created through an explicit logon, a number of files or directories are downloaded and browsed, and then the connection is closed.
Note:
For security reasons, FTP is disabled, by default. This is because the IETF FTP protocol specification requires that passwords be transmitted in clear text. Disabling is done by configuring the FTP server port as zero (0). To enable FTP, set the ftp-port parameter to the FTP port to use, such as 2100.
See Also:
RFC 959: FTP Protocol Specification – http://www.ietf.org/rfc/rfc959.txt
"Configuring Oracle XML DB Using xdbconfig.xml" for information about configuring parameters
FTP Features That Are Not Supported
Oracle XML DB implements FTP, as defined by RFC 959, with the exception of the following optional features:
Record-oriented files: only the FILE structure of the STRU method is supported. This is the most widely used structure for file transfer, and it is the default in the FTP specification. Structure mount is not supported.
Append.
Allocate. This pre-allocates space before file transfer.
Account. This uses the insecure Telnet protocol.
Abort.
FTP Client Methods That Are Supported
For access to the repository, Oracle XML DB supports the following FTP client methods.
cdup – change working directory to parent directory
cwd – change working directory
dele – delete file (not directory)
list, nlst – list files in working directory
mkd – create directory
noop – do nothing (but timeout counter on connection is reset)
pasv, port – establish a TCP data connection
pwd – get working directory
quit – close connection and quit FTP session
retr – retrieve data using an established connection
rmd – remove directory
rnfr, rnto – rename file (two-step process: from file, to file)
stor – store data using an established connection
syst – get system version
type – change data type: ascii or image binary types only
user, pass – user login
See Also:
"FTP Quote Methods" for supported FTP quote methods
"Using FTP with ASM Files" for an example of using FTP method proxy
FTP Quote Methods
Oracle Database supports several FTP quote methods, which provide information directly to Oracle XML DB.
rm_r – Remove file or folder
quote rm_r
rm_f – Forcibly remove a resource.
quote rm_f
rm_rf – Combines rm_r and rm_f: Forcibly and recursively removes files and folders.
quote rm_rf
set_nls_locale – Specify the character-set encoding (
quote set_nls_locale {
Only IANA character-set names can be specified for
set_charset – Specify the character set of the data to be sent to the server.
quote set_charset {
The set_charset method applies to only text files, not binary files, as determined by the file-extension mapping to MIME types that is defined in configuration file xdbconfig.xml.
If the parameter provided to set_charset is
If the parameter provided to set_charset is NULL, or if no set_charset command is given, then the MIME type of the data determines the character set for the data.
If the MIME type is not text/xml, then the data is not assumed to be XML, and the database character set is used.
If the MIME type is text/xml, then the data represents an XML document.
If a byte order mark (BOM) is present in the XML document, then it determines the character set of the data.
If there is no BOM, then:
If there is an encoding declaration in the XML document, then it determines the character set of the data.
If there is no encoding declaration, then the UTF-8 character set is used.
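The rules above form a short decision procedure for text files. The following function is a minimal sketch of that procedure, not Oracle code. Its name and argument order are hypothetical. An empty argument stands for "not given", and `AL32UTF8` in the tests is only an example database character set. The sketch ignores binary files, to which set_charset does not apply.

```shell
#!/bin/sh
# ftp_data_charset SET_CHARSET MIME_TYPE BOM_CHARSET ENCODING_DECL DB_CHARSET
# Hypothetical helper mirroring the set_charset rules for text files.
# BOM_CHARSET is the character set indicated by a byte order mark, or empty.
ftp_data_charset() (
  set_charset=$1 mime=$2 bom_charset=$3 encoding_decl=$4 db_charset=$5
  if [ -n "$set_charset" ] && [ "$set_charset" != NULL ]; then
    echo "$set_charset"          # explicit set_charset value wins
  elif [ "$mime" != text/xml ]; then
    echo "$db_charset"           # not XML: database character set
  elif [ -n "$bom_charset" ]; then
    echo "$bom_charset"          # XML with a BOM
  elif [ -n "$encoding_decl" ]; then
    echo "$encoding_decl"        # XML encoding declaration
  else
    echo UTF-8                   # XML with neither: UTF-8
  fi
)
```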
Using FTP with ASM Files
Automatic Storage Management (ASM) organizes database files into disk groups for simplified management and added benefits such as database mirroring and I/O balancing. Database administrators can use protocols and resource APIs to access ASM files in the Oracle XML DB repository virtual folder /sys/asm. All files in /sys/asm are binary.
Typical uses are listing, copying, moving, creating, and deleting ASM files and folders. Example 28-1 shows how to navigate the ASM virtual folder and list the files in a subfolder.
Example 28-1 Navigating ASM Folders
The structure of the ASM virtual folder, /sys/asm, is described in Chapter 21, "Accessing Oracle XML DB Repository Data". In this example, the disk groups are DATA and RECOVERY; the database name is MFG; and the directories created for aliases are dbs and tmp. This example navigates to a subfolder, lists its files, and copies a file to the local file system.
ftp> open myhost 7777
ftp> user system
Password required for SYSTEM
Password: password
ftp> cd /sys/asm
ftp> ls
DATA
RECOVERY
ftp> cd DATA
ftp> ls
dbs
MFG
ftp> cd dbs
ftp> ls
t_db1.f
t_ax1.f
ftp> binary
ftp> get t_db1.f
ftp> get t_ax1.f
ftp> put my_db2.f
In this example, after connecting to host myhost and logging on as SYSTEM, FTP commands cd and ls are used to navigate and list folders, respectively. Because all files in /sys/asm are binary, command binary switches the transfer type to binary before any file is copied. When in folder /sys/asm/DATA/dbs, FTP command get is used to copy files t_db1.f and t_ax1.f to the current folder of the local file system. Then, FTP command put is used to copy file my_db2.f from the local file system to folder /sys/asm/DATA/dbs.
Database administrators can copy ASM files from one database server to another, as well as between the database and a local file system. Example 28-2 shows copying between two databases. For this, the proxy FTP client method can be used, if available. The proxy method provides a direct connection to two different remote FTP servers.
Example 28-2 copies an ASM file from one database to another. Terms with the suffix 1 correspond to database server1; terms with the suffix 2 correspond to database server2. Note that, depending on your FTP client, the passwords you type might be echoed on your screen. Take the necessary precautions so that others do not see these passwords.
Example 28-2 Transferring ASM Files Between Databases with FTP proxy Method
1 ftp> open server1 port1
2 ftp> user username1
3 Password required for USERNAME1
4 Password: password-for-username1
5 ftp> cd /sys/asm/DATAFILE/MFG/DATAFILE
6 ftp> proxy open server2 port2
7 ftp> proxy user username2
8 Password required for USERNAME2
9 Password: password-for-username2
10 ftp> proxy cd /sys/asm/DATAFILE/MFG/DATAFILE
11 ftp> proxy put dbs2.f tmp1.f
12 ftp> proxy get dbs1.f tmp2.f
In this example:
Line 1 opens an FTP control connection to the Oracle XML DB FTP server, server1.
Lines 2–4 log the DBA onto server1 as USERNAME1.
Line 5 navigates to /sys/asm/DATAFILE/MFG/DATAFILE on server1.
Line 6 opens an FTP control connection to the second database server, server2. At this point, the FTP command proxy ? could be issued to see the available FTP commands on the secondary connection. (This is not shown.)
Lines 7–9 log the DBA onto server2 as USERNAME2.
Line 10 navigates to /sys/asm/DATAFILE/MFG/DATAFILE on server2.
Line 11 copies ASM file dbs2.f from server2 to ASM file tmp1.f on server1.
Line 12 copies ASM file dbs1.f from server1 to ASM file tmp2.f on server2.
Using FTP on the Standard Port Instead of the Oracle XML DB Default Port
You can use the Oracle XML DB configuration file, /xdbconfig.xml, to configure FTP to listen on any port. By default, FTP listens on a nonstandard, unprotected port. To use FTP on the standard port, 21, your DBA must do the following:
(UNIX only) Use this shell command to ensure that the owner and group of executable file tnslsnr are root:
% chown root:root $ORACLE_HOME/bin/tnslsnr
(UNIX only) Add the following entry to the listener file, LISTENER.ora, where hostname is your host name:
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP) (HOST = hostname) (PORT = 21))
(PROTOCOL_STACK = (PRESENTATION = FTP) (SESSION = RAW)))
(UNIX only) Stop, then restart the listener, using the following shell commands, where user_id and group_id are your UNIX user and group identifiers, respectively:
% lsnrctl stop
% tnslsnr LISTENER -user user_id -group group_id &
Use the ampersand (&) to run the second command in the background. Do not use lsnrctl start to start the listener.
Use PL/SQL procedure DBMS_XDB.setFTPPort with SYS as SYSDBA to set the FTP port number to 21 in the Oracle XML DB configuration file /xdbconfig.xml:
SQL> exec DBMS_XDB.setFTPPort(21);
Force the database to reregister with the listener, using this SQL statement:
SQL> ALTER SYSTEM REGISTER;
Check that the listener is correctly configured, using this shell command:
% lsnrctl status
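The LISTENER.ora entry in the steps above differs between protocols only in the host, port, and presentation. The following function is a minimal sketch that prints the entry in the same form. The function name is hypothetical. It also warns on standard error when the port is below 1024, because UNIX reserves those ports for privileged processes. That restriction is why tnslsnr must be owned by root.

```shell
#!/bin/sh
# xdb_listener_entry HOST PORT PRESENTATION   (hypothetical helper)
# Prints a LISTENER.ora address entry; PRESENTATION is FTP or HTTP.
xdb_listener_entry() (
  if [ "$2" -lt 1024 ]; then
    echo "note: port $2 is a privileged port on UNIX" >&2
  fi
  printf '(DESCRIPTION =\n'
  printf '  (ADDRESS = (PROTOCOL = TCP) (HOST = %s) (PORT = %s))\n' "$1" "$2"
  printf '  (PROTOCOL_STACK = (PRESENTATION = %s) (SESSION = RAW)))\n' "$3"
)
```

For example, `xdb_listener_entry "$(hostname)" 21 FTP` prints the FTP entry. `xdb_listener_entry "$(hostname)" 80 HTTP` prints the corresponding HTTP entry described in the HTTP(S) section. The output is meant to be pasted into the listener definition, typically inside its DESCRIPTION_LIST.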
See Also:
Oracle Database Net Services Reference for information about listener parameters and file LISTENER.ora
Oracle Database Net Services Reference, section "Port Number Limitations" for information about running on privileged ports
FTP Server Session Management
Oracle XML DB protocol server also provides session management for this protocol. After a short wait for a new command, FTP returns to the protocol layer, and the shared server is freed to serve other connections. You can configure the duration of this short wait with parameter call-timeout in the Oracle XML DB configuration file. For high-traffic sites, call-timeout should be shorter, so that more connections can be served. When new data arrives on the connection, the FTP server is re-invoked with the fresh data. Consequently, long-running FTP sessions do not limit the number of connections that can be made to the protocol server.
See Also:
"Configuring Oracle XML DB Using xdbconfig.xml" for information about configuring Oracle XML DB parameters
Handling Error 421. Modifying the Default Timeout Value of an FTP Session
If you are frequently disconnected from the server and have to reconnect and traverse the entire directory before doing the next operation, you may need to modify the default timeout value for FTP sessions. If a session is idle for longer than this period, it is disconnected. You can increase the timeout value (default = 6000 centiseconds) by modifying the configuration document as follows and then restarting the database:
Example 28-3 Modifying the Default Timeout Value of an FTP Session
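DECLARE
  newconfig XMLType;
BEGIN
  -- Copy the current configuration, replacing the FTP session-timeout value
  SELECT updateXML(
           DBMS_XDB.cfg_get(),
           '/xdbconfig/sysconfig/protocolconfig/ftpconfig/session-timeout/text()',
           123456789)
    INTO newconfig
    FROM DUAL;
  -- Write the modified configuration back to /xdbconfig.xml
  DBMS_XDB.cfg_update(newconfig);
END;
/
COMMIT;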
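Because session-timeout is given in centiseconds (hundredths of a second), the default of 6000 is one minute. The value 123456789 used in Example 28-3 is about two weeks. The following function is a small sketch for checking such values before you write them into the configuration. The function name is hypothetical.

```shell
#!/bin/sh
# describe_centiseconds VALUE   (hypothetical helper)
# Converts a centisecond count into days and hh:mm:ss.cc.
describe_centiseconds() (
  cs=$1
  total_s=$((cs / 100)); frac=$((cs % 100))
  d=$((total_s / 86400))
  h=$((total_s % 86400 / 3600))
  m=$((total_s % 3600 / 60))
  s=$((total_s % 60))
  printf '%dd %02d:%02d:%02d.%02d\n' "$d" "$h" "$m" "$s" "$frac"
)
```

For example, `describe_centiseconds 6000` prints `0d 00:01:00.00`, and `describe_centiseconds 123456789` prints `14d 06:56:07.89`.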
FTP Client Failure in Passive Mode
Do not use FTP in passive mode to connect remotely to a server that has HOSTNAME configured in Listener.ora as localhost or 127.0.0.1. If the HOSTNAME specified in server file Listener.ora is localhost or 127.0.0.1, then the server is configured for local use only. If you try to connect remotely to the server using FTP in passive mode, the FTP client will fail. This is because the server passes IP address 127.0.0.1 (derived from HOSTNAME) to the client, which makes the client try to connect to itself, not to the server.
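In passive mode, the server answers the client's PASV command with a 227 reply of the form `(h1,h2,h3,h4,p1,p2)`, as defined by RFC 959. The client then opens the data connection to address h1.h2.h3.h4 on port p1 × 256 + p2. The following function is a sketch of that client-side step. Its name and the sample replies in the tests are hypothetical. It shows that a reply built from HOSTNAME=localhost sends a remote client to 127.0.0.1.

```shell
#!/bin/sh
# parse_pasv_reply REPLY   (hypothetical helper)
# Extracts the data-connection address from an FTP 227 reply.
parse_pasv_reply() (
  nums=${1#*\(}          # drop everything up to and including "("
  nums=${nums%%\)*}      # drop ")" and everything after it
  IFS=,
  set -- $nums           # h1 h2 h3 h4 p1 p2
  printf '%s.%s.%s.%s:%d\n' "$1" "$2" "$3" "$4" "$(( $5 * 256 + $6 ))"
)
```

For example, `parse_pasv_reply '227 Entering Passive Mode (127,0,0,1,195,80).'` prints `127.0.0.1:50000`. A client on another machine would therefore try to connect to itself.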
Using HTTP(S) and Oracle XML DB Protocol Server
Oracle XML DB implements the Hypertext Transfer Protocol (HTTP), version 1.1, as defined in the RFC 2616 specification.
Oracle XML DB Protocol Server: HTTP(S) Features
The Oracle XML DB HTTP(S) component in the Oracle XML DB protocol server implements the RFC2616 specification with the exception of the following optional features:
gzip and compress transfer encodings
byte-range headers
The TRACE method (used for proxy error debugging)
Cache-control directives (these require you to specify expiration dates for content, and are not generally used)
TE, Trailer, Vary & Warning headers
Weak entity tags
Web common log format
Multi-homed Web server
See Also:
RFC 2616: HTTP 1.1 Protocol Specification, http://www.ietf.org/rfc/rfc2616.txt
HTTP(S) Features That Are Not Supported
Digest Authentication (RFC 2617) is not supported. Oracle XML DB supports Basic Authentication, in which a client sends the user name and password in the Authorization header in clear text (Base64-encoded, not encrypted).
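To show why Basic Authentication amounts to clear text, the following sketch builds the header that a client sends. It assumes the `base64` utility from GNU coreutils or BSD is available. The function name is hypothetical. The test values are the example credentials from RFC 2617.

```shell
#!/bin/sh
# basic_auth_header USER PASSWORD   (hypothetical helper)
# Builds an HTTP Basic Authorization header. "user:password" is only
# Base64-encoded, so anyone who captures the header can decode it.
basic_auth_header() (
  creds=$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')   # no line wrapping
  printf 'Authorization: Basic %s\n' "$creds"
)
```

For example, `basic_auth_header Aladdin 'open sesame'` prints `Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==`. `base64 -d` (GNU coreutils) turns that string straight back into `Aladdin:open sesame`. Use HTTPS whenever passwords cross an untrusted network.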
HTTP(S) Client Methods That Are Supported
For access to the repository, Oracle XML DB supports the following HTTP(S) client methods.
OPTIONS – get information about available communication options
GET – get document/data (including headers)
HEAD – get headers only, without document body
PUT – store data in resource
DELETE – delete resource
The semantics of these HTTP(S) methods are in accordance with WebDAV. Servlets and Web services may support additional HTTP(S) methods, such as POST.
See Also:
"Supported WebDAV Client Methods" for supported HTTP(S) client methods involving WebDAV
Using HTTP(S) on a Standard Port Instead of an Oracle XML DB Default Port
You can use the Oracle XML DB configuration file, /xdbconfig.xml, to configure HTTP(S) to listen on any port. By default, HTTP(S) listens on a nonstandard, unprotected port. To use HTTP or HTTPS on a standard port (80 for HTTP, 443 for HTTPS), your DBA must do the following:
(UNIX only) Use this shell command to ensure that the owner and group of executable file tnslsnr are root:
% chown root:root $ORACLE_HOME/bin/tnslsnr
(UNIX only) Add the following entry to the listener file, LISTENER.ora, where hostname is your host name, and port_number is 80 for HTTP or 443 for HTTPS:
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP) (HOST = hostname) (PORT = port_number))
(PROTOCOL_STACK = (PRESENTATION = HTTP) (SESSION = RAW)))
(UNIX only) Stop, then restart the listener, using the following shell commands, where user_id and group_id are your UNIX user and group identifiers, respectively:
% lsnrctl stop
% tnslsnr LISTENER -user user_id -group group_id &
Use the ampersand (&) to run the second command in the background. Do not use lsnrctl start to start the listener.
Use PL/SQL procedure DBMS_XDB.setHTTPPort with SYS as SYSDBA to set the HTTP(S) port number to port_number in the Oracle XML DB configuration file /xdbconfig.xml, where port_number is 80 for HTTP or 443 for HTTPS:
SQL> exec DBMS_XDB.setHTTPPort(port_number);
Force the database to reregister with the listener, using this SQL statement:
SQL> ALTER SYSTEM REGISTER;
Check that the listener is correctly configured:
% lsnrctl status
See Also:
Oracle Database Net Services Reference for information about listener parameters and file LISTENER.ora
Oracle Database Net Services Reference, section "Port Number Limitations" for information about running on privileged ports
HTTPS: Support for Secure HTTP
If properly configured, you can access Oracle XML DB Repository in a secure fashion, using HTTPS. See "Configuring Secure HTTP (HTTPS)" for configuration information.
Note:
If Oracle Database is installed on Microsoft Windows XP with Service Pack 2 (SP2), then you must use HTTPS for WebDAV access to Oracle XML DB Repository, or else you must make appropriate modifications to the Windows XP Registry. For information about the latter, see http://www.microsoft.com/technet/prodtechnol/winxppro/maintain/sp2netwk.mspx#XSLTsection129121120120
Anonymous Access to Oracle XML DB Repository using HTTP
Configuration parameter allow-repository-anonymous-access controls whether or not anonymous HTTP access to Oracle XML DB Repository data is allowed using an unlocked ANONYMOUS user account. The default value is false, meaning that unauthenticated access to repository data is blocked. To allow anonymous HTTP access to the repository, you must set this parameter to true, and unlock the ANONYMOUS user account.
Caution:
There is an inherent security risk associated with allowing anonymous access to the repository.
Parameter allow-repository-anonymous-access does not control anonymous access to the repository using servlets. Each servlet has its own security-role-ref parameter value to control its access.
See Also:
Table 28-3 for information about parameter allow-repository-anonymous-access
"Configuring Oracle XML DB Using xdbconfig.xml" for information about configuring Oracle XML DB parameters
"Configuring Oracle XML DB Servlets" for information about parameter security-role-ref
Using Java Servlets with HTTP(S)
Oracle XML DB supports Java servlets. To use a Java servlet, you must register it with a unique name in the Oracle XML DB configuration file, along with parameters that customize its action. The servlet must also be compiled and loaded into the database. Finally, the servlet name must be associated with a pattern, which can be an extension such as *.jsp or a path name such as /a/b/c or /sys/*, as described in Java servlet application program interface (API) version 2.2.
While processing an HTTP(S) request, the protocol server matches the path name of the request against the registered patterns. If there is a match, then the protocol server invokes the corresponding servlet with the appropriate initialization parameters. For Java servlets, the existing Java Virtual Machine (JVM) infrastructure is used. The JVM is started if necessary, and it runs a Java method that initializes the servlet, creates the request and response objects, passes them to the servlet, and runs it.
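Pattern matching of this kind can be imitated with a shell `case` statement, whose patterns use the same `*` wildcard. This is a simplified sketch, not the protocol server's implementation. The servlet names are hypothetical. Because `case` uses the first pattern that matches, the patterns are ordered exact path first, then path prefix, then extension, which is the usual servlet-mapping precedence.

```shell
#!/bin/sh
# match_servlet PATH   (hypothetical helper)
# Returns the name of the servlet whose registered pattern matches PATH.
match_servlet() (
  case "$1" in
    /a/b/c)      echo ExactServlet ;;   # exact path mapping
    /sys|/sys/*) echo SysServlet ;;     # path-prefix mapping /sys/*
    *.jsp)       echo JspServlet ;;     # extension mapping *.jsp
    *)           echo none ;;           # no servlet mapped
  esac
)
```

For example, `/sys/page.jsp` is handled by SysServlet rather than JspServlet, because the path-prefix pattern is tried first.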
See Also:
Chapter 32, "Writing Oracle XML DB Applications in Java"
Embedded PL/SQL Gateway
You can use the PL/SQL gateway to implement a Web application entirely in PL/SQL. There are two implementations of the PL/SQL gateway:
mod_plsql – a plug-in of Oracle HTTP Server that lets you invoke PL/SQL stored procedures using HTTP(S). Oracle HTTP Server is a component of both Oracle Application Server and Oracle Database; it should not be confused with the HTTP component of the Oracle XML DB protocol server.
the embedded PL/SQL gateway – a gateway implementation that runs in the Oracle XML DB HTTP listener.
With the PL/SQL gateway (either implementation), a Web browser sends an HTTP(S) request in the form of a URL that identifies a stored procedure and provides it with parameter values. The gateway translates the URL, calls the stored procedure with the parameter values, and returns output (typically HTML) to the Web-browser client.
Using the embedded PL/SQL gateway simplifies installation, configuration, and administration of PL/SQL-based Web applications. The embedded gateway uses the Oracle XML DB protocol server, not Oracle HTTP Server. Its configuration is defined by the Oracle XML DB configuration file, /xdbconfig.xml. However, the recommended way to configure the embedded gateway is to use the procedures in PL/SQL package DBMS_EPG, not to edit file /xdbconfig.xml.
See Also:
Oracle Database Advanced Application Developer's Guide for information on using and configuring the embedded PL/SQL gateway
Chapter 34, "Administering Oracle XML DB" for information on the configuration definition of the embedded gateway in /xdbconfig.xml
Oracle Fusion Middleware Administrator's Guide for Oracle HTTP Server for conceptual information about using the PL/SQL gateway
Oracle HTTP Server mod_plsql User's Guide for information about mod_plsql
Sending Multibyte Data From a Client
When a client sends multibyte data in a URL, RFC 2718 specifies that the client should send the URL using the %HH format, where HH is the hexadecimal notation of the byte value in UTF-8 encoding. The following are URL examples that can be sent to Oracle XML DB in an HTTP(S) or WebDAV context:
http://urltest/xyz%E3%81%82%E3%82%A2
http://%E3%81%82%E3%82%A2
http://%E3%81%82%E3%82%A2/abc%E3%81%86%E3%83%8F.xml
Oracle XML DB processes multibyte data in the requested URL and in any URLs within the If, Destination, and Referer headers.
The default-url-charset configuration parameter can be used to accept requests from some clients that use other, nonconforming, forms of URL, with characters that are not ASCII. If a request with such characters fails, try setting this value to the native character set of the client environment. The character set used in such URL fields must be specified with an IANA charset name.
default-url-charset controls the encoding for nonconforming URLs. It is not required to be set unless a nonconforming client that does not send the Content-Type charset is used.
See Also:
RFC 2616: HTTP 1.1 Protocol Specification, http://www.ietf.org/rfc/rfc2616.txt
"Configuring Oracle XML DB Using xdbconfig.xml" for information about configuring Oracle XML DB parameters
Characters That Are Not ASCII In URLs
Characters that are not ASCII and appear in URLs passed to an HTTP server should be converted to UTF-8 and escaped in the %HH format, where HH is the hexadecimal notation of the byte value. For flexibility, the Oracle XML DB protocol server interprets an incoming URL by testing whether it is encoded in each of the following character sets, in the order presented here:
UTF-8
Charset parameter of the Content-Type field of the request, if specified
Character set, if specified, in the default-url-charset configuration parameter
Character set of the database
See Also:
"Configuring Oracle XML DB Using xdbconfig.xml" for information about configuring Oracle XML DB parameters
Controlling Character Sets for HTTP(S)
The following sections describe how character sets are controlled for data transferred using HTTP(S).
Request Character Set
The character set of the HTTP(S) request body is determined with the following algorithm:
The Content-Type header is evaluated. If the Content-Type header specifies a charset value, the specified charset is used.
The MIME type of the document is evaluated as follows:
If the MIME type is "*/xml", the character set is determined as follows:
- If a BOM is present, then UTF-16 is used.
- If an encoding declaration is present, the specified encoding is used.
- If neither a BOM nor an encoding declaration is present, UTF-8 is used.
If the MIME type is text, ISO8859-1 is used.
If the MIME type is neither "*/xml" nor text, the database character set is used.
HTTP(S) differs from SQL and FTP in this respect: for text documents, the default is ISO8859-1, as specified by RFC 2616: HTTP 1.1 Protocol Specification.
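The request-body algorithm above can be written as one `case` statement. The following function is a sketch, not Oracle code. Its name and argument order are hypothetical. An empty argument means "absent", `yes` marks a BOM, and `AL32UTF8` in the tests is only an example database character set. It applies the literal `*/xml` test from the list, so a type such as `application/atom+xml` falls through to the last branch.

```shell
#!/bin/sh
# http_request_charset CT_CHARSET MIME_TYPE HAS_BOM ENCODING_DECL DB_CHARSET
# Hypothetical helper mirroring the request character-set rules.
http_request_charset() (
  ct_charset=$1 mime=$2 has_bom=$3 encoding_decl=$4 db_charset=$5
  if [ -n "$ct_charset" ]; then
    echo "$ct_charset"                  # charset from Content-Type wins
  else
    case "$mime" in
      */xml)
        if [ "$has_bom" = yes ]; then echo UTF-16
        elif [ -n "$encoding_decl" ]; then echo "$encoding_decl"
        else echo UTF-8
        fi ;;
      text/*) echo ISO8859-1 ;;         # RFC 2616 default for text
      *)      echo "$db_charset" ;;     # anything else
    esac
  fi
)
```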
Response Character Set
The response generated by Oracle XML DB HTTP Server is in the character set specified in the Accept-Charset field of the request. Accept-Charset can have a list of character sets. Based on the q-value, Oracle XML DB chooses one that does not require conversion. This might not necessarily be the charset with the highest q-value. If Oracle XML DB cannot find one, then the conversion is based on the highest q-value.
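The following function is a rough sketch of this selection. It is a hypothetical helper, not the server's algorithm, and it makes three assumptions. First, "does not require conversion" is modelled as "equals the native character set", given by its IANA name. Second, ties in q-value go to the first charset listed. Third, the `*` wildcard is not handled. Entries with q=0 are treated as unacceptable.

```shell
#!/bin/sh
# pick_response_charset ACCEPT_CHARSET NATIVE_CHARSET   (hypothetical helper)
pick_response_charset() (
  printf '%s\n' "$1" | tr ',' '\n' | awk -v native="$2" '
    {
      gsub(/[ \t]/, "")                 # strip blanks around the token
      if ($0 == "") next
      n = split($0, part, ";")
      cs = part[1]; q = 1
      for (i = 2; i <= n; i++)
        if (part[i] ~ /^q=/) q = substr(part[i], 3) + 0
      if (q <= 0) next                  # q=0 means not acceptable
      if (nat == "" && toupper(cs) == toupper(native)) nat = cs
      if (best == "" || q > bestq) { best = cs; bestq = q }
    }
    END { print (nat != "" ? nat : best) }'
)
```

For example, with `Accept-Charset: ISO-8859-1;q=0.9, UTF-8;q=0.8` and native charset UTF-8, the sketch picks UTF-8 even though ISO-8859-1 has the higher q-value.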
Using WebDAV and Oracle XML DB
Web Distributed Authoring and Versioning (WebDAV) is an IETF standard protocol used to provide users with a file-system interface to Oracle XML DB Repository over the Internet. The most popular way of accessing a WebDAV server folder is through WebFolders on Microsoft Windows 2000 or Microsoft Windows NT.
WebDAV is an extension to the HTTP 1.1 protocol that lets an HTTP server act as a file server. It lets clients perform remote Web content authoring through a coherent set of methods, headers, request body formats and response body formats. For example, a DAV-enabled editor can interact with an HTTP/WebDAV server as if it were a file system. WebDAV provides operations to store and retrieve resources, create and list contents of resource collections, lock resources for concurrent access in a coordinated manner, and to set and retrieve resource properties.
Oracle XML DB WebDAV Features
Oracle XML DB supports the following WebDAV features:
Foldering, specified by RFC2518
Access Control
WebDAV is a set of extensions to the HTTP(S) protocol that allow you to edit or manage your files on remote Web servers. WebDAV can also be used, for example, to:
Share documents over the Internet
Edit content over the Internet
See Also:
RFC 2518: WebDAV Protocol Specification, http://www.ietf.org/rfc/rfc2518.txt
WebDAV Features That Are Not Supported
Oracle XML DB supports the contents of RFC2518, with the following exceptions:
Lock-NULL resources create zero-length resources in the file system, and cannot be converted to folders.
The COPY, MOVE and DELETE methods comply with section 2 of the Internet Draft titled 'Binding Extensions to WebDAV'.
Depth-infinity locks
Only Basic Authentication is supported.
Supported WebDAV Client Methods
For access to the repository, Oracle XML DB supports the following HTTP(S)/WebDAV client methods.
PROPFIND (WebDAV-specific) – get properties for a resource
PROPPATCH (WebDAV-specific) – set or remove resource properties
LOCK (WebDAV-specific) – lock a resource (create or refresh a lock)
UNLOCK (WebDAV-specific) – unlock a resource (remove a lock)
COPY (WebDAV-specific) – copy a resource
MOVE (WebDAV-specific) – move a resource
MKCOL (WebDAV-specific) – create a folder resource (collection)
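PROPFIND is the method that a WebDAV client typically sends to list a folder and read resource properties. The following function is a minimal sketch that prints a PROPFIND request body in the RFC 2518 format. The function name is hypothetical. The usage line assumes curl, a host named myhost, and HTTP port 8080, so check the port in your own `/xdbconfig.xml`.

```shell
#!/bin/sh
# propfind_body [PROPERTY...]   (hypothetical helper)
# With no arguments, requests all properties (allprop); otherwise requests
# only the named DAV: properties.
propfind_body() (
  printf '<?xml version="1.0" encoding="utf-8"?>\n'
  printf '<D:propfind xmlns:D="DAV:">\n'
  if [ "$#" -eq 0 ]; then
    printf '  <D:allprop/>\n'
  else
    printf '  <D:prop>\n'
    for p in "$@"; do
      printf '    <D:%s/>\n' "$p"
    done
    printf '  </D:prop>\n'
  fi
  printf '</D:propfind>\n'
)
```

Usage: `propfind_body getcontentlength getlastmodified | curl -u system -X PROPFIND -H 'Depth: 1' -H 'Content-Type: text/xml' --data-binary @- http://myhost:8080/public/`. The `Depth: 1` header limits the listing to the folder's immediate members.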
See Also:
"HTTP(S) Client Methods That Are Supported" for additional supported HTTP(S) client methods
"Access Privileges" for information about WebDAV privileges
"Using WebDAV PROPPATCH to Add Metadata"
Using WebDAV with Microsoft Windows XP SP2
If Oracle Database is installed on Microsoft Windows XP with Service Pack 2 (SP2), then you must use a secure connection (HTTPS) for WebDAV access to Oracle XML DB Repository, or else you must make appropriate modifications to the Windows XP Registry.
See Also:
http://www.microsoft.com/technet/prodtechnol/winxppro/maintain/sp2netwk.mspx#XSLTsection129121120120 for information about making necessary modifications to the Windows XP registry
"Configuring Secure HTTP (HTTPS)"
Using Oracle XML DB and WebDAV: Creating a WebFolder in Microsoft Windows
To create a WebFolder in Windows 2000, follow these steps:
Choose Start > My Network Places.
Double-click Add Network Place.
Click Next.
Type the location of the folder, for example:
http://Oracle_server_name:HTTP_port_number
See Figure 28-2.
Click Next.
Enter any name to identify this WebFolder.
Click Finish.
You can now access Oracle XML DB Repository the same way that you access any Windows folder.
Figure 28-2 Creating a WebFolder in Microsoft Windows