2007/12/10

ext3 became read only for journal corruption

*NOTE: I'm NOT SURE about the results of the following operations! Please back up all your data before executing them.

-----

The EL4 kernel is wacky when it comes to the I/O scheduler locking up and causing ext3 to remount RO. Various hardware hiccups can cause it to go RO.

And when it does.. you need to tread lightly or you could lose everything.

If your ext3 filesystem had problems and remounted read-only, I would strongly advise /against/ simply fscking it. Oftentimes when your filesystem has gone RO, it may have been that way for 30 minutes or more. Just rebooting or fscking is a great way to lose everything (i.e. everything being dumped into /lost+found/).

Instead, I would recommend the following steps (a consolidated sketch follows the list):
1) rebooting into a rescue CD environment (not allowing the rescue environment to mount or fsck your filesystems).
2) Nuke the ext3 journal:
tune2fs -O ^has_journal /dev/
(possibly doing the same for other problem partitions)
3) Do a fake fsck to see the extent of damage:
fsck -fn /dev/
(after checking things out.. use "-fy" once you're sure that it's safe)
4) Rebuild the journal with "tune2fs -j /dev/"
(rerun at least once until "clean" result is repeatable)
5) Mount and check things out,
"mkdir /mnt/tmp && mount -t ext3 /dev/ /mnt/tmp"
6) Gracefully umount & reboot:
"umount /mnt/tmp && shutdown -rf now && exit"

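Pulling the steps together, here is a minimal sketch of the whole sequence (the device names above were truncated; /dev/sda2 below is a hypothetical stand-in for the partition that went RO, and as noted above, back everything up first):

# run from a rescue CD shell; /dev/sda2 is a hypothetical example device
DEV=/dev/sda2

tune2fs -O ^has_journal $DEV    # 2) drop the (possibly corrupt) ext3 journal
fsck -fn $DEV                   # 3) dry run: report damage, change nothing
# inspect the output; only once you are sure it is safe:
fsck -fy $DEV                   #    let fsck actually repair
tune2fs -j $DEV                 # 4) recreate the journal
fsck -fn $DEV                   #    rerun until a "clean" result is repeatable
mkdir /mnt/tmp && mount -t ext3 $DEV /mnt/tmp    # 5) mount and check things out
umount /mnt/tmp && shutdown -rf now              # 6) graceful umount & reboot
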
2007/12/08

My thoughts on artificial intelligence



Human behavior and motives

Study (physiological maintenance, curiosity)
Challenge (?)
Labor (physiological maintenance)
Reproduction (physiological maintenance)
Power (physiological maintenance)

Motives

Desires (physiological, safety, love, esteem, self-actualization)

Executing human behavior

Judging the results of actions and improving on them

Characteristics of human thinking:
As long as there is a motive, people keep trying various things.
Some attempts work out and some do not.

The difference between evolution and intelligence

Artificial intelligence book list

1. Artificial Intelligence (《人工智能》), Nilsson (US), translated by Zheng Kougen, China Machine Press
2. Artificial Intelligence: A Guide to Intelligent Systems (《人工智能智能系统指南》, English edition, 2nd ed.), Negnevitsky, M. (Australia), China Machine Press
3. Artificial Intelligence: Theory and Practice (《人工智能:理论与实践》), Dean et al. (US), translated by Gu Guochang et al., Publishing House of Electronics Industry
4. Artificial Intelligence: Structures and Strategies for Complex Problem Solving (《人工智能:复杂问题求解的结构和策略》), George F. Luger (US), translated by Shi Zhongzhi, Zhang Yinkui, et al., China Machine Press
5. AI Techniques for Game Programming (《游戏编程中的人工智能技术》), Buckland (US), translated by Wu Zuzeng and Sha Ying, Tsinghua University Press
6. AI Game Programming Wisdom (《人工智能游戏编程真言》), Rabin, S. (US), ed., translated by Zhuang Yueting and Wu Fei, Tsinghua University Press

2007/12/06

dba_tablespaces.ALLOCATION_TYPE

ALLOCATION_TYPE can take three values (a query sketch follows the list):
1. SYSTEM: with this setting NEXT_EXTENT is empty and only the EXTENTS value applies; Oracle chooses the extent sizes. This is the default, and its minimum extent size is 64K.
2. USER: this setting lets us control NEXT_EXTENT. It appears in only two cases: the tablespace is dictionary-managed, or it was migrated from dictionary-managed to locally managed (via dbms_space_admin.tablespace_migrate_to_local).
3. UNIFORM: all extents have the same size, and temp tablespaces can only use this mode; in the two cases above the extent sizes may differ. The default uniform size is 1M.
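For reference, a quick way to see the setting for each tablespace (a minimal sqlplus sketch, run as a DBA user):

sqlplus -s / as sysdba <<'EOF'
col tablespace_name format a25
select tablespace_name, extent_management, allocation_type
from dba_tablespaces;
EOF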

flashback database

V$RECOVERY_STATUS
v$flash_recovery_area_usage
V$RECOVERY_FILE_DEST
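
For example, a quick look at flash recovery area usage through the second view (a small sketch; one row per file type):

sqlplus -s / as sysdba <<'EOF'
select file_type, percent_space_used, percent_space_reclaimable, number_of_files
from v$flash_recovery_area_usage;
EOF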

Oracle RAC failover

SELECT FAILOVER_TYPE , FAILOVER_METHOD
FROM v$session WHERE sid = (SELECT sid FROM v$mystat WHERE ROWNUM = 1) ;
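
As far as I understand it, the query shows NONE unless the session came in through a TAF-enabled service. A tnsnames.ora entry that enables TAF looks roughly like this (a sketch with hypothetical names RACDB and node1-vip/node2-vip):

cat >> $ORACLE_HOME/network/admin/tnsnames.ora <<'EOF'
RACDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = node1-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = node2-vip)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVICE_NAME = racdb)
      (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 20)(DELAY = 5))
    )
  )
EOF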

Incrementally updated backups: rolling forward image copies, explained

RUN {
RECOVER COPY OF DATABASE WITH TAG 'incr_update';
BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG 'incr_update' DATABASE;
}

Incrementally updated backups: rolling forward image copy backups
The Oracle documentation calls this: Incrementally Updated Backups: Rolling Forward Image Copy Backups.
How it works: first create an image copy of the datafiles, then periodically apply to that copy the blocks that changed since the highest SCN recorded in the copy. Incrementally updated backups give fast recovery: if the copy is rolled forward daily, at most one day of REDO has to be applied to complete a recovery.
To create an incrementally updated backup, the format is:
BACKUP... FOR RECOVER OF COPY WITH TAG
A basic incrementally updated backup example, referred to below as the "basic" script:
RUN {
RECOVER COPY OF DATABASE WITH TAG 'incr_update';
BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG 'incr_update'
DATABASE;
}
To understand the script, first consider what happens when each command runs with no datafile copy and no incremental backup present.
1. With no level 0 backup or datafile copy, BACKUP INCREMENTAL LEVEL 1... FOR RECOVER OF COPY WITH TAG... cannot produce a level 1 incremental backup, but RMAN creates an image copy of the datafiles under the datafile destination with the specified tag.
2. With no level 0 backup or datafile copy, RECOVER COPY OF DATABASE WITH TAG... prints some messages but raises no error.

Now walk through the whole basic script:
The first run has no datafile copy and no incremental backup, so RECOVER COPY OF DATABASE WITH TAG 'incr_update' does nothing; BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG 'incr_update' DATABASE produces an image copy of the datafiles.
On the second run, the first run's BACKUP INCREMENTAL LEVEL 1... FOR RECOVER OF COPY WITH TAG... produced an image copy but there is still no level 1 incremental backup, so RECOVER COPY OF DATABASE WITH TAG 'incr_update' again does nothing; BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG 'incr_update' DATABASE produces a level 1 incremental backup.
On the third run, RECOVER COPY OF DATABASE WITH TAG 'incr_update' applies the level 1 incremental backup from the second run to the image copy, and a new level 1 incremental backup is produced.
Every subsequent run applies the previous level 1 incremental backup to the image copy and then produces a new level 1 incremental backup.

If recovery is needed, we first restore from the image copy and the last level 1 incremental backup, then apply REDO.
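
A sketch of that recovery path (illustrative only, assuming the database is mounted and the rolled-forward image copies are usable; not a tested runbook):

rman target / <<'EOF'
# point the datafiles at the rolled-forward image copies
SWITCH DATABASE TO COPY;
# apply the last level 1 incremental plus archived/online redo
RECOVER DATABASE;
ALTER DATABASE OPEN;
EOF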

find oracle shared memory handle


ps -ef | grep vasdb | grep oracle | grep LOCAL
#note down the {pid}

sqlplus sys@op.. as sysdba
oradebug setospid {pid}
oradebug ipc
exit

cd {your udump dir}
ls -lt
vasdb_ora_xxxx.trc

grep Handle vasdb_ora_xxxx.trc
#you will find the memory handle.
#if you can start the second database in nomount mode,
#you can compare the two handle strings and find the difference.
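
Putting the comparison together, something like this works (a sketch; both trace file names are hypothetical):

# dump 'oradebug ipc' from both instances first, then:
diff <(grep Handle vasdb_ora_23332.trc) <(grep Handle vasdb2_ora_23483.trc)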

the result of my server

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /u01/oracle/app/oracle/product/10.2.0/db_1
System name: Linux
Node name: test
Release: 2.4.21-37.ELsmp
Version: #1 SMP Wed Sep 7 13:28:55 EDT 2005
Machine: i686
Instance name: orcl
Redo thread mounted by this instance: 0
Oracle process number: 15
Unix process pid: 23332, image: oracle@test (TNS V1-V3)

*** 2007-12-05 21:09:32.165
*** SERVICE NAME:() 2007-12-05 21:09:32.165
*** SESSION ID:(159.1) 2007-12-05 21:09:32.165
Received ORADEBUG command 'ipc' from process Unix process pid: 23483, image:
Dump of unix-generic skgm context
areaflags 000000e7
realmflags 0000000f
mapsize 00000800
protectsize 00001000
lcmsize 00001000
seglen 00200000
largestsize 0000000080000000
smallestsize 0000000000400000
stacklimit 0xbe0776e0
stackdir -1
mode 660
magic acc01ade
Handle: 0xccd9088 `/u01/oracle/app/oracle/product/10.2.0/db_1orcl'
Dump of unix-generic realm handle `/u01/oracle/app/oracle/product/10.2.0/db_1orcl', flags = 00000000

EXPDP DATAPUMP EXCLUDE/INCLUDE parameters

The EXCLUDE and INCLUDE parameters available with expdp/impdp can be used as metadata filters, so that one can specify objects such as tables, indexes, triggers, and procedures to be excluded or included during an export or import operation.

syntax:

EXCLUDE=[object_type]:[name_clause],[object_type]:[name_clause]

INCLUDE=[object_type]:[name_clause],[object_type]:[name_clause]

examples:

expdp SCHEMAS=scott EXCLUDE=SEQUENCE,TABLE:"IN ('EMP','DEPT')"

impdp SCHEMAS=scott INCLUDE=PACKAGE,FUNCTION,PROCEDURE,TABLE:"='EMP'"

The name_clause is a SQL expression that is used as a filter on the object names of the object. It consists of a SQL operator and the values against which the object names of the specified type are to be compared. If no name_clause is provided, all objects of the specified type are excluded/included. The name clause must be separated from the object type with a colon.

Examples of operator-usage:

EXCLUDE=SEQUENCE

or: EXCLUDE=TABLE:"IN ('EMP','DEPT')"

or: EXCLUDE=INDEX:"= 'MY_INDX'"

or: INCLUDE=PROCEDURE:"LIKE 'MY_PROC_%'"

or: INCLUDE=TABLE:"> 'E'"

The parameter can also be stored in a parameter (par) file, as shown:

Parameter file: exp.par

DIRECTORY = my_dir

DUMPFILE = exp_tab.dmp

LOGFILE = exp_tab.log

SCHEMAS = scott

INCLUDE = TABLE:"IN ('EMP', 'DEPT')"


expdp system/manager parfile=exp.par

If a parameter file is not used, then on Unix special care needs to be taken with the syntax of expdp and impdp; in particular, all single and double quotes need to be escaped with the special character '\'. The syntax for Windows and Unix:

Windows:

D:\> expdp system/manager DIRECTORY=my_dir DUMPFILE=exp_tab.dmp LOGFILE=exp_tab.log SCHEMAS=scott INCLUDE=TABLE:\"IN ('EMP', 'DEP')\"

Unix:

% expdp system/manager DIRECTORY=my_dir DUMPFILE=exp_tab.dmp LOGFILE=exp_tab.log SCHEMAS=scott INCLUDE=TABLE:\"IN \(\'EMP\', \'DEP\'\)\"

Improper use of EXCLUDE or INCLUDE can give you any of the errors mentioned below, so to avoid them please be careful and read the entire post carefully.

ORA-39001: invalid argument value

ORA-39071: Value for INCLUDE is badly formed.

ORA-00936: missing expression

or:

ORA-39001: invalid argument value

ORA-39071: Value for EXCLUDE is badly formed.

ORA-00904: "DEPT": invalid identifier

or:

ORA-39001: invalid argument value

ORA-39041: Filter "INCLUDE" either identifies all object types or no object types.

or:

ORA-39001: invalid argument value

ORA-39041: Filter "EXCLUDE" either identifies all object types or no object types.

or:

ORA-39001: invalid argument value

ORA-39038: Object path "USER" is not supported for TABLE jobs.

or:

UDE-00011: parameter include is incompatible with parameter exclude

or:

ksh: syntax error: '(' unexpected

or:

ORA-31655: no data or metadata objects selected for job

Thanks and Regards

Parikshit

Fw: A case of handling a database that would not start after an abnormal shutdown



Author: fuyuncat

Source: www.HelloDBA.com





A system suddenly lost power, and after the server came back up Oracle would not start. Startup reported the following error:
ORA-01102 cannot mount database in EXCLUSIVE mode



An ORA-01102 error can have the following possible causes:

1. In an HA system, another node has already started the instance and is holding the resources shared by the two machines (such as raw devices on the disk array);



2. When Oracle was shut down abnormally, some resources were not released, generally for one of the following reasons:

a. Oracle's shared memory segments or semaphores were not released;

b. Oracle's background processes (such as SMON, PMON, DBWn) were not terminated;

c. the lk and sgadef.dbf files used for locking memory were not deleted.



First, although ours is an HA system, the instance on the standby node has always remained shut down, which we confirmed by checking the database status on the standby node.

Second, the database went down because of the power failure, and the server was rebooted after power was restored, so we ruled out points a and b of the second cause. That left point c as the most suspicious.

Check the $ORACLE_HOME/dbs directory:
$ cd $ORACLE_HOME/dbs
$ ls sgadef*
sgadef* not found
$ ls lk*
lkORA92



Sure enough, the lk file had not been deleted. Remove it:
$ rm lk*



Start the database again: success.



If you suspect that shared memory was not released, you can inspect it with:
$ipcs -mop
IPC status from /dev/kmem as of Thu Jul 6 14:41:43 2006
T ID KEY MODE OWNER GROUP NATTCH CPID LPID
Shared Memory:
m 0 0x411c29d6 --rw-rw-rw- root root 0 899 899
m 1 0x4e0c0002 --rw-rw-rw- root root 2 899 901
m 2 0x4120007a --rw-rw-rw- root root 2 899 901
m 458755 0x0c6629c9 --rw-r----- root sys 2 9113 17065
m 4 0x06347849 --rw-rw-rw- root root 1 1661 9150
m 65541 0xffffffff --rw-r--r-- root root 0 1659 1659
m 524294 0x5e100011 --rw------- root root 1 1811 1811
m 851975 0x5fe48aa4 --rw-r----- oracle oinstall 66 2017 25076



Then remove the shared memory segment by its ID:
$ipcrm –m 851975



For semaphores, you can inspect them with:
$ ipcs -sop
IPC status from /dev/kmem as of Thu Jul 6 14:44:16 2006
T ID KEY MODE OWNER GROUP
Semaphores:
s 0 0x4f1c0139 --ra------- root root
... ...
s 14 0x6c200ad8 --ra-ra-ra- root root
s 15 0x6d200ad8 --ra-ra-ra- root root
s 16 0x6f200ad8 --ra-ra-ra- root root
s 17 0xffffffff --ra-r--r-- root root
s 18 0x410c05c7 --ra-ra-ra- root root
s 19 0x00446f6e --ra-r--r-- root root
s 20 0x00446f6d --ra-r--r-- root root
s 21 0x00000001 --ra-ra-ra- root root
s 45078 0x67e72b58 --ra-r----- oracle oinstall



Using the semaphore ID, remove the semaphore with:
$ipcrm -s 45078



If Oracle processes were not shut down, find the surviving Oracle processes with:
$ ps -ef|grep ora
oracle 29976 1 0 Jun 22 ? 0:52 ora_dbw0_ora92
oracle 29978 1 0 Jun 22 ? 0:51 ora_dbw1_ora92
oracle 5128 1 0 Jul 5 ? 0:00 oracleora92 (LOCAL=NO)
... ...



Then kill the processes with kill -9:
$kill -9



Summary:

When an ORA-01102 error occurs, you can check and troubleshoot in the following order (a scripted sketch of the checks follows the list):
If it is an HA system, check whether another node has already started the instance;
Check whether Oracle processes still exist; if so, kill them;
Check whether semaphores still exist; if so, remove them;
Check whether shared memory segments still exist; if so, remove them;
Check whether the memory lock files lk and sgadef.dbf exist; if so, delete them.
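
A scripted version of that checklist might look like this (a rough sketch for Unix; the SID ora92 comes from the listings above and is otherwise hypothetical, and each ipcrm/rm step should only run once you are sure no live instance owns the resource):

# hypothetical SID; adjust to your environment
export ORACLE_SID=ora92

ps -ef | grep "[o]ra_.*$ORACLE_SID"     # 1) surviving background processes?
# kill -9 <pid>                         #    kill them if any are found
ipcs -s | grep oracle                   # 2) leftover semaphores?
# ipcrm -s <semid>
ipcs -m | grep oracle                   # 3) leftover shared memory segments?
# ipcrm -m <shmid>
ls $ORACLE_HOME/dbs/lk* $ORACLE_HOME/dbs/sgadef* 2>/dev/null   # 4) lock files?
# rm $ORACLE_HOME/dbs/lkORA92           #    delete them last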

Vista Transformation Pack Lite 3.0

reference link: http://cowscorpion.com/Theme/VistaTransformationPackLite.html

download link: http://cowscorpion.com/cgi/ccount/click.php?id=2153

Vista Transformation Pack is an install package, including Visual Styles, that gives XP/2003 a Windows Vista-like look.
Applying a visual style modifies Windows system files, so use it at your own risk.

Vista Transformation Pack Lite is the Lite edition of Vista Transformation Pack (Full). Vista Visual Styles Pack is a pack containing only the Visual Styles.

Differences from the Full edition:
- 1 font (Segoe UI, Bold only)
- 2 skins (Longhorn SideBar, Vista), Vista clock
- 3 visual styles (Aero Style (Glass - 50xx), Aero Style (Glass - 51xx) and Aero Style (Vista - Beta 1))
- 5 wallpapers (Keynote Glass+Widescreen, Keynote Grass+Widescreen and WinV_Wallpaper)

UXPatcher
http://www.softpedia.com/get/System/OS-Enhancements/UXTheme-MultiPatcher.shtml

UXTheme MultiPatcher description
Uxtheme Multi-patcher - A patch that will change your uxtheme.dll on Windows XP
Uxtheme Multi-patcher will allow you to use any 3rd party msstyle theme on Windows XP/SP1/SP2 or Windows Server 2003 (with
Themes enabled) just by patching the uxtheme.dll (dynamic link library).

It was quite a disappointment that 4.0 claimed to work on x64 OSes but didn't. This time it won't disappoint you: I tested the method, and it can finally bypass the 32-bit redirection system. Grab the world's first 32-bit patcher that works on an x64 OS today!

Note: If you already have a patched uxtheme.dll (and so can use 3rd-party visual styles), you don't need to update it with this one.

2007/12/05

Hyper-Threading speeds Linux

IBM's report says Hyper-Threading speeds up Linux, but based on my own Oracle database tests the improvement is neither obvious nor stable.

------
Hyper-Threading speeds Linux

Multiprocessor performance on a single processor


Level: Introductory

Duc Vianney (dvianney@us.ibm.com), Linux Kernel Performance Group, Linux Technology Center, IBM

01 Jan 2003
The Intel Xeon processor introduces a new technology called Hyper-Threading (HT) that, to the operating system, makes a single processor behave like two logical processors. When enabled, the technology allows the processor to execute multiple threads simultaneously, in parallel within each processor, which can yield significant performance improvement. We set out to quantify just how much improvement you can expect to see.

The current Linux symmetric multiprocessing (SMP) kernel at both the 2.4 and 2.5 versions was made aware of Hyper-Threading, and performance speed-up had been observed in multithreaded benchmarks (see Resources later in this article for articles with more details).

This article gives the results of our investigation into the effects of Hyper-Threading (HT) on the Linux SMP kernel. It compares the performance of a Linux SMP kernel that was aware of Hyper-Threading to one that was not. The system under test was a multithreading-enabled, single-CPU Xeon. The benchmarks used in the study covered areas within the kernel that could be affected by Hyper-Threading, such as the scheduler, low-level kernel primitives, the file server, the network, and threaded support.

The results on Linux kernel 2.4.19 show Hyper-Threading technology could improve multithreaded applications by 30%. Current work on Linux kernel 2.5.32 may provide performance speed-up as much as 51%.

Introduction

Intel's Hyper-Threading Technology enables two logical processors on a single physical processor by replicating, partitioning, and sharing the resources within the Intel NetBurst microarchitecture pipeline.

Replicated resources create copies of the resources for the two threads:
All per-CPU architectural states
Instruction pointers, renaming logic
Some smaller resources (such as return stack predictor, ITLB, etc.)

Partitioned resources divide the resources between the executing threads:
Several buffers (Re-Order Buffer, Load/Store Buffers, queues, etc.)

Shared resources make use of the resources as needed between the two executing threads:
Out-of-Order execution engine
Caches

Typically, each physical processor has a single architectural state on a single processor core to service threads. With HT, each physical processor has two architectural states on a single core, making the physical processor appear as two logical processors to service threads. The system BIOS enumerates each architectural state on the physical processor. Since Hyper-Threading-aware operating systems take advantage of logical processors, those operating systems have twice as many resources to service threads.

Hyper-Threading support in the Xeon processor

The Xeon processor is the first to implement Simultaneous Multi-Threading (SMT) in a general-purpose processor. (See Resources for more information on the Xeon family of processors.) To achieve the goal of executing two threads on a single physical processor, the processor simultaneously maintains the context of multiple threads that allow the scheduler to dispatch two potentially independent threads concurrently.

The operating system (OS) schedules and dispatches threads of code to each logical processor as it would in an SMP system. When a thread is not dispatched, the associated logical processor is kept idle.

When a thread is scheduled and dispatched to a logical processor, LP0, the Hyper-Threading technology utilizes the necessary processor resources to execute the thread.

When a second thread is scheduled and dispatched on the second logical processor, LP1, resources are replicated, divided, or shared as necessary in order to execute the second thread. Each processor makes selections at points in the pipeline to control and process the threads. As each thread finishes, the operating system idles the unused processor, freeing resources for the running processor.

The OS schedules and dispatches threads to each logical processor, just as it would in a dual-processor or multi-processor system. As the system schedules and introduces threads into the pipeline, resources are utilized as necessary to process two threads.

Hyper-Threading support in Linux kernel 2.4

Under the Linux kernel, a Hyper-Threaded processor with two virtual processors is treated as a pair of real physical processors. As a result, the scheduler that handles SMP should be able to handle Hyper-Threading as well. The support for Hyper-Threading in Linux kernel 2.4.x began with 2.4.17 and includes the following enhancements:
128-byte lock alignment
Spin-wait loop optimization
Non-execution based delay loops
Detection of Hyper-Threading enabled processor and starting the logical processor as if machine was SMP
Serialization in MTRR and Microcode Update driver as they affect shared state
Optimization to scheduler when system is idle to prioritize scheduling on a physical processor before scheduling on logical processor
Offset user stack to avoid 64K aliasing

Kernel performance measurement

To assess the effects of Hyper-Threading on the Linux kernel, we measured the performance of kernel benchmarks on a system containing the Intel Xeon processor with HT. The hardware was a single-CPU, 1.6 GHz Xeon MP processor with SMT, 2.5 GB of RAM, and two 9.2 GB SCSI disk drives. The kernel under measurement was stock version 2.4.19 configured and built with SMP enabled. The kernel Hyper-Threading support was specified by the boot option acpismp=force for Hyper-Threading and noht for no Hyper-Threading. The existence of Hyper-Threading support can be seen by using the command cat /proc/cpuinfo to show the presence of two processors, processor 0 and processor 1. Note the ht flag in Listing 1 for CPUs 0 and 1. In the case of no Hyper-Threading support, the data will be displayed for processor 0 only.

Listing 1. Output from cat /proc/cpuinfo showing Hyper-Threading support
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 1
model name : Intel(R) Genuine CPU 1.60GHz
stepping : 1
cpu MHz : 1600.382
cache size : 256 KB
. . .
fpu : yes
fpu_exception: yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht
tm
bogomips : 3191.60
processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 1
model name : Intel(R) Genuine CPU 1.60GHz
stepping : 1
cpu MHz : 1600.382
cache size : 256 KB
. . .
fpu : yes
fpu_exception: yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht
tm
bogomips : 3198.15



Linux kernel benchmarks

To measure Linux kernel performance, five benchmarks were used: LMbench, AIM Benchmark Suite IX (AIM9), chat, dbench, and tbench. The LMbench benchmark times various Linux application programming interfaces (APIs), such as basic system calls, context switching latency, and memory bandwidth. The AIM9 benchmark provides measurements of user application workload. The chat benchmark is a client-server workload modeled after a chat room. The dbench benchmark is a file server workload, and tbench is a TCP workload. Chat, dbench, and tbench are multithreaded benchmarks, while the others are single-threaded benchmarks.

Effects of Hyper-Threading on Linux APIs

The effects of Hyper-Threading on Linux APIs were measured by LMbench, which is a microbenchmark containing a suite of bandwidth and latency measurements. Among these are cached file read, memory copy (bcopy), memory read/write (and latency), pipe, context switching, networking, filesystem creates and deletes, process creation, signal handling, and processor clock latency. LMbench stresses the following kernel components: scheduler, process management, communication, networking, memory map, and filesystem. The low level kernel primitives provide a good indicator of the underlying hardware capabilities and performance.

To study the effects of Hyper-Threading, we focused on latency measurements that measure time of message control, (in other words, how fast a system can perform some operation). The latency numbers are reported in microseconds per operation.

Table 1 shows a partial list of kernel functions tested by LMbench. Each data point is the average of three runs, and the data have been tested for their convergence to assure that they are repeatable when subjected to the same test environment. In general, there is no performance difference between Hyper-Threading and no Hyper-Threading for those functions that are running as a single thread. However, for those tests that require two threads to run, such as the pipe latency test and the three process latency tests, Hyper-Threading seems to degrade their latency times. The configured stock SMP kernel is denoted as 2419s. If the kernel was configured without Hyper-Threading support, it is denoted as 2419s-noht. With Hyper-Threading support, the kernel is listed as 2419s-ht.

Table 1. Effects of Hyper-Threading on Linux APIs
Kernel function / 2419s-noht / 2419s-ht / Speed-up
Simple syscall 1.10 1.10 0%
Simple read 1.49 1.49 0%
Simple write 1.40 1.40 0%
Simple stat 5.12 5.14 0%
Simple fstat 1.50 1.50 0%
Simple open/close 7.38 7.38 0%
Select on 10 fd's 5.41 5.41 0%
Select on 10 tcp fd's 5.69 5.70 0%
Signal handler installation 1.56 1.55 0%
Signal handler overhead 4.29 4.27 0%
Pipe latency 11.16 11.31 -1%
Process fork+exit 190.75 198.84 -4%
Process fork+execve 581.55 617.11 -6%
Process fork+/bin/sh -c 3051.28 3118.08 -2%
Note: Data are in microseconds: smaller is better.


The pipe latency test uses two processes communicating through a UNIX pipe to measure interprocess communication latencies via socket. The benchmark passes a token back and forth between the two processes. The degradation is 1%, which is small to the point of being insignificant.

The three process tests involve process creation and execution under Linux. The purpose is to measure the time taken to create a basic thread of control. For the process fork+exit test, the data represents the latency time taken to split a process into two (nearly) identical copies and have one exit. This is how new processes are created -- but it is not very useful since both processes are doing the same thing. In this test, Hyper-Threading causes a 4% degradation.

In the process fork+execve, the data represents the time it takes to create a new process and have that new process run a new program. This is the inner loop of all shells (command interpreters). This test sees 6% degradation due to Hyper-Threading.

In the process fork+/bin/sh -c test, the data represents the time taken to create a new process and have that new process run a new program by asking the system shell to find that program and run it. This is how the C library interface called system is implemented. This call is the most general and the most expensive. Under Hyper-Threading, this test runs 2% slower compared to non-Hyper-Threading.

Effects of Hyper-Threading on Linux single-user application workload

The AIM9 benchmark is a single user workload designed to measure the performance of hardware and operating systems. The results are shown in Table 2. Most of the tests in the benchmark performed identically in Hyper-Threading and non-Hyper-Threading, except for the sync file operations and Integer Sieves. The three operations, Sync Random Disk Writes, Sync Sequential Disk Writes, and Sync Disk Copies, are approximately 35% slower in Hyper-Threading. On the other hand, Hyper-Threading provided a 60% improvement over non-Hyper-Threading in the case of Integer Sieves.

Table 2. Effects of Hyper-Threading on AIM9 workload
Test and description / 2419s-noht / 2419s-ht / Speed-up
add_double Thousand Double Precision Additions per second 638361 637724 0%
add_float Thousand Single Precision Additions per second 638400 637762 0%
add_long Thousand Long Integer Additions per second 1479041 1479041 0%
add_int Thousand Integer Additions per second 1483549 1491017 1%
add_short Thousand Short Integer Additions per second 1480800 1478400 0%
creat-clo File Creations and Closes per second 129100 139700 8%
page_test System Allocations & Pages per second 161330 161840 0%
brk_test System Memory Allocations per second 633466 635800 0%
jmp_test Non-local gotos per second 8666900 8694800 0%
signal_test Signal Traps per second 142300 142900 0%
exec_test Program Loads per second 387 387 0%
fork_test Task Creations per second 2365 2447 3%
link_test Link/Unlink Pairs per second 54142 59169 9%
disk_rr Random Disk Reads (K) per second 85758 89510 4%
disk_rw Random Disk Writes (K) per second 76800 78455 2%
disk_rd Sequential Disk Reads (K) per second 351904 356864 1%
disk_wrt Sequential Disk Writes (K) per second 154112 156359 1%
disk_cp Disk Copies (K) per second 104343 106283 2%
sync_disk_rw Sync Random Disk Writes (K) per second 239 155 -35%
sync_disk_wrt Sync Sequential Disk Writes (K) per second 97 60 -38%
sync_disk_cp Sync Disk Copies (K) per second 97 60 -38%
disk_src Directory Searches per second 48915 48195 -1%
div_double Thousand Double Precision Divides per second 37162 37202 0%
div_float Thousand Single Precision Divides per second 37125 37202 0%
div_long Thousand Long Integer Divides per second 27305 27360 0%
div_int Thousand Integer Divides per second 27305 27332 0%
div_short Thousand Short Integer Divides per second 27305 27360 0%
fun_cal Function Calls (no arguments) per second 30331268 30105600 -1%
fun_cal1 Function Calls (1 argument) per second 112435200 112844800 0%
fun_cal2 Function Calls (2 arguments) per second 97587200 97843200 0%
fun_cal15 Function Calls (15 arguments) per second 44748800 44800000 0%
sieve Integer Sieves per second 15 24 60%
mul_double Thousand Double Precision Multiplies per second 456287 456743 0%
mul_float Thousand Single Precision Multiplies per second 456000 456743 0%
mul_long Thousand Long Integer Multiplies per second 167904 168168 0%
mul_int Thousand Integer Multiplies per second 167976 168216 0%
mul_short Thousand Short Integer Multiplies per second 155730 155910 0%
num_rtns_1 Numeric Functions per second 92740 92920 0%
trig_rtns Trigonometric Functions per second 404000 405000 0%
matrix_rtns Point Transformations per second 875140 891300 2%
array_rtns Linear Systems Solved per second 579 578 0%
string_rtns String Manipulations per second 2560 2564 0%
mem_rtns_1 Dynamic Memory Operations per second 982035 980019 0%
mem_rtns_2 Block Memory Operations per second 214590 215390 0%
sort_rtns_1 Sort Operations per second 481 472 -2%
misc_rtns_1 Auxiliary Loops per second 7916 7864 -1%
dir_rtns_1 Directory Operations per second 2002000 2001000 0%
shell_rtns_1 Shell Scripts per second 95 97 2%
shell_rtns_2 Shell Scripts per second 95 96 1%
shell_rtns_3 Shell Scripts per second 95 97 2%
series_1 Series Evaluations per second 3165270 3189630 1%
shared_memory Shared Memory Operations per second 174080 174220 0%
tcp_test TCP/IP Messages per second 65835 66231 1%
udp_test UDP/IP DataGrams per second 111880 112150 0%
fifo_test FIFO Messages per second 228920 228900 0%
stream_pipe Stream Pipe Messages per second 170210 171060 0%
dgram_pipe DataGram Pipe Messages per second 168310 170560 1%
pipe_cpy Pipe Messages per second 245090 243440 -1%
ram_copy Memory to Memory Copy per second 490026708 492478668 1%


Effects of Hyper-Threading on Linux multithreaded application workload

To measure the effects of Hyper-Threading on Linux multithreaded applications, we use the chat benchmark, which is modeled after a chat room. The benchmark includes both a client and a server. The client side of the benchmark will report the number of messages sent per second; the number of chat rooms and messages will control the workload. The workload creates a lot of threads and TCP/IP connections, and sends and receives a lot of messages. It uses the following default parameters:
Number of chat rooms = 10
Number of messages = 100
Message size = 100 bytes
Number of users = 20

By default, each chat room has 20 users. A total of 10 chat rooms will have 20x10 = 200 users. For each user in the chat room, the client will make a connection to the server. So since we have 200 users, we will have 200 connections to the server. Now, for each user (or connection) in the chat room, a "send" thread and a "receive" thread are created. Thus, a 10-chat-room scenario will create 10x20x2 = 400 client threads and 400 server threads, for a total of 800 threads. But there's more.

Each client "send" thread will send the specified number of messages to the server. For 10 chat rooms and 100 messages, the client will send 10x20x100 = 20,000 messages. The server "receive" thread will receive the corresponding number of messages. The chat room server will echo each of the messages back to the other users in the chat room. Thus, for 10 chat rooms and 100 messages, the server "send" thread will send 10x20x100x19 or 380,000 messages. The client "receive" thread will receive the corresponding number of messages.

The test starts by starting the chat server in a command-line session and the client in another command-line session. The client simulates the workload and the results represent the number of messages sent by the client. When the client ends its test, the server loops and accepts another start message from the client. In our measurement, we ran the benchmark with 20, 30, 40, and 50 chat rooms. The corresponding number of connections and threads are shown in Table 3.

Table 3. Number of chat rooms and threads tested
Number of chat rooms / Number of connections / Number of threads / Number of messages sent / Number of messages received / Total number of messages
20 400 1,600 40,000 760,000 800,000
30 600 2,400 60,000 1,140,000 1,200,000
40 800 3,200 80,000 1,520,000 1,600,000
50 1000 4,000 100,000 1,900,000 2,000,000


Table 4 shows the performance impact of Hyper-Threading on the chat workload. Each data point represents the geometric mean of five runs. The data set clearly indicates that Hyper-Threading could improve the workload throughput from 22% to 28%, depending on the number of chat rooms. Overall, Hyper-Threading will boost the chat performance by 24% based on the geometric mean of the 4 chat room samples.

Table 4. Effects of Hyper-Threading on chat throughput
Number of chat rooms / 2419s-noht / 2419s-ht / Speed-up
20 164,071 202,809 24%
30 151,530 184,803 22%
40 140,301 171,187 22%
50 123,842 158,543 28%
Geometric Mean 144,167 178,589 24%
Note: Data is the number of messages sent by client: higher is better.


Figure 1. Effects of Hyper-Threading on the chat workload


Effects of Hyper-Threading on Linux multithreaded file server workload

The effect of Hyper-Threading on the file server was measured with dbench and its companion test, tbench. dbench is similar to the well-known NetBench benchmark from the Ziff-Davis Media benchmark program, which lets you measure the performance of file servers as they handle network file requests from clients. However, while NetBench requires an elaborate setup of actual physical clients, dbench simulates the 90,000 operations typically run by a NetBench client by sniffing a 4 MB file called client.txt to produce the same workload. The contents of this file are file operation directives such as SMBopenx, SMBclose, SMBwritebraw, SMBgetatr, etc. Those I/O calls correspond to the Server Message Block (SMB) protocol that the SMBD server in SAMBA would produce in a NetBench run. The SMB protocol is used by Microsoft Windows 3.11, NT and 95/98 to share disks and printers.

In our tests, a total of 18 different types of I/O calls were used including open file, read, write, lock, unlock, get file attribute, set file attribute, close, get disk free space, get file time, set file time, find open, find next, find close, rename file, delete file, create new file, and flush file buffer.

dbench can simulate any number of clients without going through the expense of a physical setup. dbench produces only the filesystem load, and it does no networking calls. During a run, each client records the number of bytes of data moved and divides this number by the amount of time required to move the data. All client throughput scores are then added up to determine the overall throughput for the server. The overall I/O throughput score represents the number of megabytes per second transferred during the test. This is a measurement of how well the server can handle file requests from clients.

dbench is a good test for Hyper-Threading because it creates a high load and activity on the CPU and I/O schedulers. The ability of Hyper-Threading to support multithreaded file serving is severely tested by dbench because many files are created and accessed simultaneously by the clients. Each client has to create about 21 megabytes worth of test data files. For a test run with 20 clients, about 420 megabytes of data are expected. dbench is considered a good test to measure the performance of the elevator algorithm used in the Linux filesystem. dbench is used to test the working correctness of the algorithm, and whether the elevator is aggressive enough. It is also an interesting test for page replacement.
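
For reference, a typical invocation looks something like this (a sketch; it assumes the dbench package is installed and client.txt is in the working directory):

dbench 20            # file-server workload with 20 simulated clients
tbench_srv &         # start the tbench server on this host ...
tbench 20 localhost  # ... then drive it with 20 clients over TCP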

Table 5 shows the impact of HT on the dbench workload. Each data point represents the geometric mean of five runs. The data indicates that Hyper-Threading would improve dbench from as little as 9% to as much as 29%. The overall improvement is 18% based on the geometric mean of the five test scenarios.

Table 5. Effects of Hyper-Threading on dbench throughput
Number of clients / 2419s-noht / 2419s-ht / Speed-up
20 132.82 171.23 29%
30 131.43 169.55 29%
60 119.95 133.77 12%
90 111.89 121.81 9%
120 99.31 114.92 16%
Geometric Mean 118.4 140.3 18%
Note: Data are throughput in MB/sec: higher is better.


Figure 2. Effects of Hyper-Threading on the dbench workload


tbench

tbench is another file server workload similar to dbench. However, tbench produces only the TCP and process load. tbench does the same socket calls that SMBD would do under a netbench load, but tbench does no filesystem calls. The idea behind tbench is to eliminate SMBD from the netbench test, as though the SMBD code could be made fast. The throughput results of tbench tell us how fast a netbench run could go if we eliminated all filesystem I/O and SMB packet processing. tbench is built as part of the dbench package.

Table 6 depicts the impact of Hyper-Threading on the tbench workload. As before, each data point represents the geometric mean of five runs. Hyper-Threading definitely would improve tbench throughput, from 22% to 31%. The overall improvement is 27% based on the geometric mean of the five test scenarios.

Table 6. Effects of Hyper-Threading on tbench throughput
Number of clients / 2419s-noht / 2419s-ht / Speed-up
20 60.98 79.86 31%
30 59.94 77.82 30%
60 55.85 70.19 26%
90 48.45 58.88 22%
120 37.85 47.92 27%
Geometric Mean 51.84 65.77 27%
Note: Data are throughput in MB/sec: higher is better.


Figure 3. Effects of Hyper-Threading on the tbench workload


Hyper-Threading support in Linux kernel 2.5.x

Linux kernel 2.4.x has been aware of HT since the release of 2.4.17. The kernel 2.4.17 knows about the logical processor, and it treats a Hyper-Threaded processor as two physical processors. However, the scheduler used in the stock kernel 2.4.x is still considered naive for not being able to distinguish the resource contention problem between two logical processors versus two separate physical processors.

Ingo Molnar has pointed out scenarios in which the current scheduler gets things wrong (see Resources for a link). Consider a system with two physical CPUs, each of which provides two virtual processors. If there are two tasks running, the current scheduler would let them both run on a single physical processor, even though far better performance would result from migrating one process to the other physical CPU. The scheduler also doesn't understand that migrating a process from one virtual processor to its sibling (a logical CPU on the same physical CPU) is cheaper (due to cache loading) than migrating it across physical processors.

The solution is to change the way the run queues work. The 2.5 scheduler maintains one run queue per processor and attempts to avoid moving tasks between queues. The change is to have one run queue per physical processor that is able to feed tasks into all of the virtual processors. Throw in a smarter sense of what makes an idle CPU (all virtual processors must be idle), and the resulting code "magically fulfills" the needs of scheduling on a Hyper-Threading system.

In addition to the run queue change in the 2.5 scheduler, there are other changes needed to give the Linux kernel the ability to leverage HT for optimal performance. Those changes were discussed by Molnar (again, please see Resources for more on that) as follows.
HT-aware passive load-balancing:
The IRQ-driven balancing has to be per-physical-CPU, not per-logical-CPU. Otherwise, it might happen that one physical CPU runs two tasks while another physical CPU runs no task; the stock scheduler does not recognize this condition as "imbalance." To the scheduler, it appears as if the first two CPUs have 1-1 task running while the second two CPUs have 0-0 tasks running. The stock scheduler does not realize that the two logical CPUs belong to the same physical CPU.

"Active" load-balancing:
This is when a logical CPU goes idle and causes a physical CPU imbalance. This is a mechanism that simply does not exist in the stock 1:1 scheduler. The imbalance caused by an idle CPU can be solved via the normal load-balancer. In the case of HT, the situation is special because the source physical CPU might have just two tasks running, both runnable. This is a situation that the stock load-balancer is unable to handle, because running tasks are hard to migrate away. This migration is essential -- otherwise a physical CPU can get stuck running two tasks while another physical CPU stays idle.

HT-aware task pickup:
When the scheduler picks a new task, it should prefer all tasks that share the same physical CPU before trying to pull in tasks from other CPUs. The stock scheduler only picks tasks that were scheduled to that particular logical CPU.

HT-aware affinity:
Tasks should attempt to "stick" to physical CPUs, not logical CPUs.

HT-aware wakeup:
The stock scheduler only knows about the "current" CPU, it does not know about any sibling. On HT, if a thread is woken up on a logical CPU that is already executing a task, and if a sibling CPU is idle, then the sibling CPU has to be woken up and has to execute the newly woken-up task immediately.

At this writing, Molnar has provided a patch to stock kernel 2.5.32 implementing all the above changes by introducing the concept of a shared runqueue: multiple CPUs can share the same runqueue. A shared, per-physical-CPU runqueue fulfills all of the HT-scheduling needs listed above. Obviously this complicates scheduling and load-balancing, and the effects on the SMP and uniprocessor scheduler are still unknown.

The change in Linux kernel 2.5.32 was designed to affect Xeon systems with more than two CPUs, especially in the load-balancing and thread affinity arenas. Due to hardware resource constraints, we were only able to measure its effects in our one-CPU test environment. Using the same testing process employed in 2.4.19, we ran the three workloads, chat, dbench, and tbench, on 2.5.32. For chat, HT could bring as much as a 60% speed-up in the case of 40 chat rooms. The overall improvement was about 45%. For dbench, 27% was the high speed-up mark, with the overall improvement about 12%. For tbench, the overall improvement was about 35%.

Table 7. Effects of Hyper-Threading on Linux kernel 2.5.32 chat workload
Number of chat rooms 2532s-noht 2532s-ht Speed-up
20 137,792 207,788 51%
30 138,832 195,765 41%
40 144,454 231,509 47%
50 137,745 191,834 39%
Geometric Mean 139,678 202,034 45%
dbench workload
Number of clients 2532s-noht 2532s-ht Speed-up
20 142.02 180.87 27%
30 129.63 141.19 9%
60 84.76 86.02 1%
90 67.89 70.37 4%
120 57.44 70.59 23%
Geometric Mean 90.54 101.76 12%
tbench workload
Number of clients 2532s-noht 2532s-ht Speed-up
20 60.28 82.23 36%
30 60.12 81.72 36%
60 59.73 81.2 36%
90 59.71 80.79 35%
120 59.73 79.45 33%
Geometric Mean 59.91 81.07 35%
Note: chat data is the number of messages sent by the client/sec; dbench and tbench data are in MB/sec.


Conclusion

Intel Xeon Hyper-Threading is definitely having a positive impact on the Linux kernel and multithreaded applications. The speed-up from Hyper-Threading could be as high as 30% in stock kernel 2.4.19, and as high as 51% in kernel 2.5.32, thanks to drastic changes in the scheduler's run queue support and Hyper-Threading awareness.
ACKNOWLEDGMENTS:
The author would like to thank Intel's Sunil Saxena for invaluable information gleaned at the LinuxWorld Conference Session Performance tuning for threaded applications -- with a look at Hyper-Threading at the LinuxWorld Conference in San Francisco, August 2002.



Resources
You can download the chat benchmark from the Linux Benchmark Suite Homepage.



The README file from dbench is courtesy of SAMBA.



More information on LMbench can be found at the LMbench home page.



The home of the Ziff-Davis NetBench benchmarking test gives more details of their test suite.



The Linux elevator algorithm is discussed in the November 23, 2000 edition of the Linux Weekly News Kernel Development section.



An August 2002 note on Hyper-Threading posted by Ingo Molnar to the kernel list is reprinted in the Linux Weekly News.



Another August 2002 LWN article also discusses the scheduler and Hyper-Threading (among other things).



Learn about IBM's developer contributions to Linux at the IBM Linux Technology Center.



Find more resources for Linux developers in the developerWorks Linux zone.



About the author


Duc Vianney works in operating system performance evaluation and measurement in computer architectures and Java. He is with the Linux Kernel Performance Group at the IBM Linux Technology Center. Duc has written several articles on Java performance for IBM developerWorks and PartnerWorld. You can contact Duc at dvianney@us.ibm.com.

Fw: How much do you know about the *_name parameters in Oracle? (from warehouse.itpub)

1. db_name (database name)

SQL> connect xys/manager as sysdba
Connected.
SQL> show user
USER is "SYS"
SQL> show parameter db_name

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_name string test1

db_name is the database's name. It is probably useful mainly to Oracle itself; for us it has little direct use. db_name is recorded in the controlfile, the datafile headers, and the redo, so changing it is fairly troublesome. There are two methods: 1. recreate the controlfile, after which the database must be opened with RESETLOGS; 2. use nid (see the sketch below). Also, at database creation db_name is limited to at most 8 characters. Even though 10g no longer complains at creation time, look at how many characters the dictionary can actually hold and you will see why input longer than 8 characters raises no error yet still gets truncated; I once helped a company deal with problems caused by a truncated db_name.

SQL> desc v$database;
Name Null? Type
----------------------------------------- -------- ------------------------

DBID NUMBER
NAME VARCHAR2(9)

SQL>
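As an aside on the second rename method mentioned above, the nid utility is driven roughly like this (a sketch from memory with hypothetical names; check the 10g utilities guide before trying it):

# the database must be shut down cleanly and restarted in MOUNT mode first
sqlplus -s / as sysdba <<'EOF'
shutdown immediate
startup mount
exit
EOF
# change the name only; SETNAME=YES keeps the DBID unchanged
nid TARGET=sys/manager DBNAME=newname SETNAME=YES
# afterwards update db_name in the spfile/pfile and start the database
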
Choosing an appropriate db_name at database creation is therefore quite important. db_name has one more important role: dynamic listener registration. Regardless of whether service_names is specified, or what its value is, PMON always uses db_name when dynamically registering with the listener.

SQL> host lsnrctl status

LSNRCTL for 32-bit Windows: Version 10.2.0.1.0 - Production on 03-DEC-2007 10:19:36

Copyright (c) 1991, 2005, Oracle. All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for 32-bit Windows: Version 10.2.0.1.0 - Production
Start Date 03-DEC-2007 09:29:47
Uptime 0 days 0 hr. 49 min. 50 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Log File e:\oracle\product\10.2.0\db_1\network\log\listener.log

Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=xys)(PORT=1521)))
Services Summary...
Service "TEST2.COM" has 1 instance(s).
Instance "inst_test", status READY, has 1 handler(s) for this service...
Service "TEST3.COM" has 1 instance(s).
Instance "inst_test", status READY, has 1 handler(s) for this service...
Service "test1.COM" has 1 instance(s).
Instance "inst_test", status READY, has 1 handler(s) for this service...
Service "test1_XPT.COM" has 1 instance(s).
Instance "inst_test", status READY, has 1 handler(s) for this service...
The command completed successfully

SQL> show parameter service_names

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
service_names string TEST2, TEST3

We see that service_names is TEST2, TEST3, yet the lsnrctl status output also contains:

Service "test1.COM" has 1 instance(s).
Instance "inst_test", status READY, has 1 handler(s) for this service...

2. instance_name (instance name)

SQL> show parameter instance_name

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
instance_name string inst_test
Apart from dynamic listener registration, I have found no other use for instance_name so far. Perhaps Oracle uses it to distinguish instances? But instance_name alone surely cannot distinguish them completely; at the very least the SID is also needed. Look at the dynamically registered listener information shown above: inst_test there is the instance_name.

3. SID: System Identifier

The SID identifies the instance's shared memory on a host, but may not uniquely distinguish this instance
from other instances

The doc explains the SID as identifying the instance's shared memory on the host, so the SID mainly deals with the OS.

The SID can be queried in the database with the following statement:

SQL> select instance_name from v$instance;

INSTANCE_NAME
----------------
tsid

Although the v$instance column instance_name looks like the instance name, it actually stores the SID. Under Windows, SIDs cannot be duplicated, regardless of whether the ORACLE_HOME is the same or not. (The "different ORACLE_HOME" case matters on Unix/Linux: there, as long as different Oracle versions are installed under different ORACLE_HOMEs, instances with the same SID can be created.) On Windows it cannot be done, and not because of Oracle: it is a limitation of Windows services. Two services cannot have the same service name, and Oracle service names have the form OracleServiceSID. Because the service name embeds the SID, identical SIDs would yield identical service names, which Windows does not allow. Hence different instances with the same SID cannot be created under Windows.

4. service_names (service names)

Note the plural: service_names can take multiple values. Apart from its use in dynamic listener registration, I have not found any other use for these service names; additions are welcome. For Data Guard it is recommended to use the same service_names on the primary and the standby, which may help make switchover as transparent as possible, at least when static registration is not configured; and if static registration is configured, make sure the service names entered in the listener configuration on primary and standby are the same. Again, the point is to make switchover as transparent as possible. The following query displays service_names:

SQL> show parameter service_names

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
service_names string TEST2, TEST3

Here I specified two values, test2 and test3. Now look at how the dynamically registered listener uses the service names; part of the listener status is:

Service "TEST2.COM" has 1 instance(s).
Instance "inst_test", status READY, has 1 handler(s) for this service...
Service "TEST3.COM" has 1 instance(s).
Instance "inst_test", status READY, has 1 handler(s) for this service...

The displayed service names carry the suffix COM because I set db_domain.

5. db_domain (database domain)

SQL> show parameter db_domain

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_domain string COM

The doc says it is separated by periods and may be up to 128 characters including the periods. I have never changed one that long and have not verified it; feel free to try. db_domain is mainly used in distributed databases: each database taking part in distributed transactions should have a db_domain. Whether they must be identical the doc does not say, and I do not know; a previous employer had a distributed development environment, but I never paid attention to this there. In advanced replication, however, the databases holding the objects to be synchronized absolutely must have db_domain set; whether the values must match I also cannot say, though when I configured replication I set db_domain to the same value everywhere. Another use of db_domain: when databases with the same db_name must be created in the same OS domain, it is best to give them different db_domain values, so that global_name stays unique within the domain. The doc recommends the same: Oracle recommends that you specify DB_DOMAIN as a unique string for all databases in a domain.

When db_domain is specified, creating a db link automatically appends db_domain to the db link name (doc:

If you omit the domains from the name of a database link, Oracle expands the name by qualifying the database with the domain of your local database as it currently exists in the data dictionary, and then stores the link name in the data dictionary. The characters valid in a database domain name are: alphanumeric characters, underscore (_), and number sign (#).

). I seem to remember 9i behaving that way, but I verified on 10g that it does not:

SQL> create database link dbl_test using 'orcl';

Database link created.

SQL> select db_link from dba_db_links;

DB_LINK
----------------------------------------------------------------------------

DBL
DBL_TEST
ORCL

One more point to note: You must set this parameter for every instance, and multiple instances must have the same value in Real Application Clusters.

6. global_name (global database name)

global_name is formed as db_name.db_domain; the doc says:
Understanding How Global Database Names Are Formed
A global database name is formed from two components: a database name and a domain. The database name and the domain name are determined by the following initialization parameters at database creation:
Component Parameter Requirements Example
Database name DB_NAME Must be eight characters or less. sales
Domain containing the database DB_DOMAIN Must follow standard Internet conventions. Levels in domain names must be separated by dots and the order of domain names is from leaf to root, left to right. us.acme.com
But testing shows that Oracle does not keep db_name.db_domain and global_name in sync; I do not know why. For global_name, Oracle provides a view, sys.global_name, which is based on props$ (you can check the view creation script); what we ultimately access is a public synonym, global_name:

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------

test1

According to the doc, the query above should have returned test1.com; this has puzzled me for a long time.

Still, we can modify global_name:

SQL> alter database rename global_name to test1.com;

Database altered.

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------

TEST1.COM

SQL> alter database rename global_name to test123.com;

Database altered.

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------

TEST123.COM

SQL>

Note that once the domain has been appended, it cannot be removed with the command above. For example:

SQL> alter database rename global_name to test123.com;

Database altered.

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------

TEST123.COM

SQL> alter database rename global_name to test1;

Database altered.

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------

TEST1.COM

SQL> alter database rename global_name to test123;

Database altered.

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------

TEST123.COM

SQL>

However, you can update global_name (or props$) directly to strip the suffix:

SQL> update global_name set global_name='test1';

1 row updated.

SQL> commit;

Commit complete.

SQL> select * from global_name;

GLOBAL_NAME
---------------------------------------------------------------

test1

SQL>

global_name, too, is mainly used in distributed databases; I have only used it in advanced replication.

For details, see the following link:
http://download.oracle.com/docs/ ... dmin.htm#sthref4096


7. global_names is a boolean. Why mention it? Because global_names and global_name look very similar. global_names determines whether creating a db link forces the use of the remote database's global_name: if global_names=true, the db link name must be the remote database's global_name, otherwise the db link will not connect after creation. Tested below; the default is false.
SQL> show parameter global_names

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
global_names boolean TRUE
SQL> select count(*) from t_emp@dbl;
select count(*) from t_emp@dbl
*
ERROR at line 1:
ORA-02085: database link DBL connects to ORCL

SQL> col db_link format a10
SQL> col host format a10
SQL> col owner format a10
SQL> col username format a10
SQL> select * from dba_db_links;

OWNER DB_LINK USERNAME HOST CREATED
---------- ---------- ---------- ---------- --------------
SYS DBL TEST orcl 01-DEC-07
SYS ORCL TEST orcl 01-DEC-07

SQL> select count(*) from t_emp@orcl;

COUNT(*)
----------
4

SQL>
8. When configuring static listener registration through Net Manager, what exactly should be entered for the global database name (GLOBAL_DBNAME)? And when configuring a net service name (tnsnames) through Net Manager, what is the service name (SERVICE_NAME) that must be entered?

listener.ora contains:

SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(GLOBAL_DBNAME = test)
(ORACLE_HOME = E:\oracle\product\10.2.0\db_1)
(SID_NAME = tsid)
)
)

LISTENER =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = xys)(PORT = 1521))
)

--================================

tnsnames.ora contains:

TEST =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = xys)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = test)
)
)

--================================

To state it plainly: when configuring static listener registration, you can enter anything for the global database name (GLOBAL_DBNAME), as long as GLOBAL_DBNAME in listener.ora and SERVICE_NAME in tnsnames.ora match. The test below shows the effect:

Note in particular that GLOBAL_DBNAME = test, while the database's db_name, global_name, and service_names at this point are as follows:

SQL> show parameter db_name

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_name string test1
SQL> select *from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------

test1

SQL> alter database rename global_name to abcd.yu;

数据库已更改。

SQL> select *from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------

ABCD.YU

SQL>
SQL> show parameter service_names

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
service_names string TEST2, TEST3
SQL>

The value test that I supplied when configuring the listener and tnsnames has no relationship to db_name, global_name, or service_names. Now look at tnsping:

C:\>tnsping test
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)
(HOST = xys)(PORT = 1521))) (CONNECT_DATA = (SERVICE_NAME = test)))
OK (30 msec)

C:\>

9. db_unique_name

DB_UNIQUE_NAME is a 10g parameter. When configuring a Data Guard environment, each database in the DG configuration must be given a unique value for it. Before the DB_UNIQUE_NAME parameter existed, building DG on the same machine required setting the LOCK_NAME_SPACE parameter in the standby parameter file; in 10g, with db_unique_name available, LOCK_NAME_SPACE has been deprecated!

Automatic Storage Management (ASM) in Oracle Database 10g

Automatic Storage Management (ASM) is a new feature introduced in Oracle 10g to simplify the storage of Oracle datafiles, controlfiles and logfiles.
Overview of Automatic Storage Management (ASM)
Initialization Parameters and ASM Instance Creation
Startup and Shutdown of ASM Instances
Administering ASM Disk Groups
Disks
Templates
Directories
Aliases
Files
Checking Metadata
ASM Filenames
ASM Views
SQL and ASM
Migrating to ASM Using RMAN
Overview of Automatic Storage Management (ASM)
Automatic Storage Management (ASM) simplifies administration of Oracle related files by allowing the administrator to reference disk groups rather than individual disks and files, which are managed by ASM. The ASM functionality is an extension of the Oracle Managed Files (OMF) functionality that also includes striping and mirroring to provide balanced and secure storage. The new ASM functionality can be used in combination with existing raw and cooked file systems, along with OMF and manually managed files.

The ASM functionality is controlled by an ASM instance. This is not a full database instance, just the memory structures and as such is very small and lightweight.

The main components of ASM are disk groups, each of which comprises several physical disks that are controlled as a single unit. The physical disks are known as ASM disks, while the files that reside on the disks are known as ASM files. The locations and names of the files are controlled by ASM, but user-friendly aliases and directory structures can be defined for ease of reference.

The level of redundancy and the granularity of the striping can be controlled using templates. Default templates are provided for each file type stored by ASM, but additional templates can be defined as needed.

Failure groups are defined within a disk group to support the required level of redundancy. For two-way mirroring you would expect a disk group to contain two failure groups so individual files are written to two locations.

In summary ASM provides the following functionality:
Manages groups of disks, called disk groups.
Manages disk redundancy within a disk group.
Provides near-optimal I/O balancing without any manual tuning.
Enables management of database objects without specifying mount points and filenames.
Supports large files.
Initialization Parameters and ASM Instance Creation
The initialization parameters that are of specific interest for an ASM instance are:
INSTANCE_TYPE - Set to ASM or RDBMS depending on the instance type. The default is RDBMS.
DB_UNIQUE_NAME - Specifies a globally unique name for the database. This defaults to +ASM but must be altered if you intend to run multiple ASM instances.
ASM_POWER_LIMIT - The maximum power for a rebalancing operation on an ASM instance. The valid values range from 1 to 11, with 1 being the default. The higher the limit, the more resources are allocated, resulting in faster rebalancing operations. This value is also used as the default when the POWER clause is omitted from a rebalance operation.
ASM_DISKGROUPS - The list of disk groups that should be mounted by an ASM instance during instance startup, or by the ALTER DISKGROUP ALL MOUNT statement. ASM configuration changes are automatically reflected in this parameter.
ASM_DISKSTRING - Specifies a value that can be used to limit the disks considered for discovery. Altering the default value may improve the speed of disk group mount time and the speed of adding a disk to a disk group. Changing the parameter to a value which prevents the discovery of already mounted disks results in an error. The default value is NULL allowing all suitable disks to be considered.
Incorrect usage of parameters in ASM or RDBMS instances results in ORA-15021 errors.
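
For reference, a slightly fuller ASM parameter file built from the parameters above might look like this (a sketch only; the disk string is invented and the disk group name matches the one created later in this article):

INSTANCE_TYPE=ASM
ASM_POWER_LIMIT=3
ASM_DISKSTRING='/devices/disk*'
ASM_DISKGROUPS='disk_group_1'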

To create an ASM instance first create a file called init+ASM.ora in the /tmp directory containing the following information.
INSTANCE_TYPE=ASM
Next, using SQL*Plus, connect to the idle instance.
export ORACLE_SID=+ASM
sqlplus / as sysdba
Create an spfile using the contents of the init+ASM.ora file.
SQL> CREATE SPFILE FROM PFILE='/tmp/init+ASM.ora';

File created.
Finally, start the instance with the NOMOUNT option.
SQL> startup nomount
ASM instance started

Total System Global Area 125829120 bytes
Fixed Size 1301456 bytes
Variable Size 124527664 bytes
Database Buffers 0 bytes
Redo Buffers 0 bytes
SQL>
The ASM instance is now ready to use for creating and mounting disk groups. To shutdown the ASM instance issue the following command.
SQL> shutdown
ASM instance shutdown
SQL>
Once an ASM instance is present, disk groups can be used for the following parameters in database instances (INSTANCE_TYPE=RDBMS) to allow ASM file creation (see the example after this list):
DB_CREATE_FILE_DEST
DB_CREATE_ONLINE_LOG_DEST_n
DB_RECOVERY_FILE_DEST
CONTROL_FILES
LOG_ARCHIVE_DEST_n
LOG_ARCHIVE_DEST
STANDBY_ARCHIVE_DEST
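For example, directing all new datafiles of a database instance into a disk group is a single statement (a sketch reusing the disk_group_1 group from this article; SCOPE=BOTH assumes the instance uses an spfile):

ALTER SYSTEM SET DB_CREATE_FILE_DEST = '+disk_group_1' SCOPE=BOTH;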
Startup and Shutdown of ASM Instances
ASM instances are started and stopped in a similar way to normal database instances. The options for the STARTUP command are:
FORCE - Performs a SHUTDOWN ABORT before restarting the ASM instance.
MOUNT - Starts the ASM instance and mounts the disk groups specified by the ASM_DISKGROUPS parameter.
NOMOUNT - Starts the ASM instance without mounting any disk groups.
OPEN - This is not a valid option for an ASM instance.
The options for the SHUTDOWN command are:
NORMAL - The ASM instance waits for all connected ASM instances and SQL sessions to exit then shuts down.
IMMEDIATE - The ASM instance waits for any SQL transactions to complete then shuts down. It doesn't wait for sessions to exit.
TRANSACTIONAL - Same as IMMEDIATE.
ABORT - The ASM instance shuts down instantly.
Administering ASM Disk Groups
Disks
Disk groups are created using the CREATE DISKGROUP statement. This statement allows you to specify the level of redundancy:
NORMAL REDUNDANCY - Two-way mirroring, requiring two failure groups.
HIGH REDUNDANCY - Three-way mirroring, requiring three failure groups.
EXTERNAL REDUNDANCY - No mirroring for disks that are already protected using hardware mirroring or RAID.
In addition, failure groups and preferred names for disks can be defined. If the NAME clause is omitted, the disks are given a system generated name like "disk_group_1_0001". The FORCE option can be used to move a disk from another disk group into this one.
CREATE DISKGROUP disk_group_1 NORMAL REDUNDANCY
FAILGROUP failure_group_1 DISK
'/devices/diska1' NAME diska1,
'/devices/diska2' NAME diska2
FAILGROUP failure_group_2 DISK
'/devices/diskb1' NAME diskb1,
'/devices/diskb2' NAME diskb2;
Disk groups can be deleted using the DROP DISKGROUP statement.
DROP DISKGROUP disk_group_1 INCLUDING CONTENTS;
Disks can be added or removed from disk groups using the ALTER DISKGROUP statement. Remember that the wildcard "*" can be used to reference disks so long as the resulting string does not match a disk already used by an existing disk group.
-- Add disks.
ALTER DISKGROUP disk_group_1 ADD DISK
'/devices/disk*3',
'/devices/disk*4';

-- Drop a disk.
ALTER DISKGROUP disk_group_1 DROP DISK diska2;
Disks can be resized using the RESIZE clause of the ALTER DISKGROUP statement. The statement can be used to resize individual disks, all disks in a failure group or all disks in the disk group. If the SIZE clause is omitted the disks are resized to the size of the disk returned by the OS.
-- Resize a specific disk.
ALTER DISKGROUP disk_group_1
RESIZE DISK diska1 SIZE 100G;

-- Resize all disks in a failure group.
ALTER DISKGROUP disk_group_1
RESIZE DISKS IN FAILGROUP failure_group_1 SIZE 100G;

-- Resize all disks in a disk group.
ALTER DISKGROUP disk_group_1
RESIZE ALL SIZE 100G;
The UNDROP DISKS clause of the ALTER DISKGROUP statement allows pending disk drops to be undone. It will not revert drops that have completed, or disk drops associated with the dropping of a disk group.
ALTER DISKGROUP disk_group_1 UNDROP DISKS;
Disk groups can be rebalanced manually using the REBALANCE clause of the ALTER DISKGROUP statement. If the POWER clause is omitted the ASM_POWER_LIMIT parameter value is used. Rebalancing is only needed when the speed of the automatic rebalancing is not appropriate.
ALTER DISKGROUP disk_group_1 REBALANCE POWER 5;
Disk groups are mounted at ASM instance startup and unmounted at ASM instance shutdown. Manual mounting and dismounting can be accomplished using the ALTER DISKGROUP statement as seen below.
ALTER DISKGROUP ALL DISMOUNT;
ALTER DISKGROUP ALL MOUNT;
ALTER DISKGROUP disk_group_1 DISMOUNT;
ALTER DISKGROUP disk_group_1 MOUNT;
Templates
Templates are named groups of attributes that can be applied to the files within a disk group. The following examples show how templates can be created, altered and dropped.
-- Create a new template.
ALTER DISKGROUP disk_group_1 ADD TEMPLATE my_template ATTRIBUTES (MIRROR FINE);

-- Modify template.
ALTER DISKGROUP disk_group_1 ALTER TEMPLATE my_template ATTRIBUTES (COARSE);

-- Drop template.
ALTER DISKGROUP disk_group_1 DROP TEMPLATE my_template;
Available attributes include:
UNPROTECTED - No mirroring or striping regardless of the redundancy setting.
MIRROR - Two-way mirroring for normal redundancy and three-way mirroring for high redundancy. This attribute cannot be set for external redundancy.
COARSE - Specifies lower granularity for striping. This attribute cannot be set for external redundancy.
FINE - Specifies higher granularity for striping. This attribute cannot be set for external redundancy.
Directories
A directory hierarchy can be defined using the ALTER DISKGROUP statement to support ASM file aliasing. The following examples show how ASM directories can be created, modified and deleted.
-- Create a directory.
ALTER DISKGROUP disk_group_1 ADD DIRECTORY '+disk_group_1/my_dir';

-- Rename a directory.
ALTER DISKGROUP disk_group_1 RENAME DIRECTORY '+disk_group_1/my_dir' TO '+disk_group_1/my_dir_2';

-- Delete a directory and all its contents.
ALTER DISKGROUP disk_group_1 DROP DIRECTORY '+disk_group_1/my_dir_2' FORCE;
Aliases
Aliases allow you to reference ASM files using user-friendly names, rather than the fully qualified ASM filenames.
-- Create an alias using the fully qualified filename.
ALTER DISKGROUP disk_group_1 ADD ALIAS '+disk_group_1/my_dir/my_file.dbf'
FOR '+disk_group_1/mydb/datafile/my_ts.342.3';

-- Create an alias using the numeric form filename.
ALTER DISKGROUP disk_group_1 ADD ALIAS '+disk_group_1/my_dir/my_file.dbf'
FOR '+disk_group_1.342.3';

-- Rename an alias.
ALTER DISKGROUP disk_group_1 RENAME ALIAS '+disk_group_1/my_dir/my_file.dbf'
TO '+disk_group_1/my_dir/my_file2.dbf';

-- Delete an alias.
ALTER DISKGROUP disk_group_1 DELETE ALIAS '+disk_group_1/my_dir/my_file.dbf';
Attempting to drop a system alias results in an error.
Files
Files are not deleted automatically if they are created using aliases, as they are not Oracle Managed Files (OMF), or if a recovery is done to a point-in-time before the file was created. For these circumstances it is necessary to manually delete the files, as shown below.
-- Drop file using an alias.
ALTER DISKGROUP disk_group_1 DROP FILE '+disk_group_1/my_dir/my_file.dbf';

-- Drop file using a numeric form filename.
ALTER DISKGROUP disk_group_1 DROP FILE '+disk_group_1.342.3';

-- Drop file using a fully qualified filename.
ALTER DISKGROUP disk_group_1 DROP FILE '+disk_group_1/mydb/datafile/my_ts.342.3';
Checking Metadata
The internal consistency of disk group metadata can be checked in a number of ways using the CHECK clause of the ALTER DISKGROUP statement.
-- Check metadata for a specific file.
ALTER DISKGROUP disk_group_1 CHECK FILE '+disk_group_1/my_dir/my_file.dbf';

-- Check metadata for a specific failure group in the disk group.
ALTER DISKGROUP disk_group_1 CHECK FAILGROUP failure_group_1;
-- Check metadata for a specific disk in the disk group.
ALTER DISKGROUP disk_group_1 CHECK DISK diska1;

-- Check metadata for all disks in the disk group.
ALTER DISKGROUP disk_group_1 CHECK ALL;
ASM Views
The ASM configuration can be viewed using the V$ASM_% views, which often contain different information depending on whether they are queried from the ASM instance or a dependent database instance.
V$ASM_ALIAS
  ASM Instance: Displays a row for each alias present in every disk group mounted by the ASM instance.
  DB Instance: Returns no rows.

V$ASM_CLIENT
  ASM Instance: Displays a row for each database instance using a disk group managed by the ASM instance.
  DB Instance: Displays a row for the ASM instance if the database has open ASM files.

V$ASM_DISK
  ASM Instance: Displays a row for each disk discovered by the ASM instance, including disks which are not part of any disk group.
  DB Instance: Displays a row for each disk in disk groups in use by the database instance.

V$ASM_DISKGROUP
  ASM Instance: Displays a row for each disk group discovered by the ASM instance.
  DB Instance: Displays a row for each disk group mounted by the local ASM instance.

V$ASM_FILE
  ASM Instance: Displays a row for each file for each disk group mounted by the ASM instance.
  DB Instance: Displays no rows.

V$ASM_OPERATION
  ASM Instance: Displays a row for each long running operation executing in the ASM instance.
  DB Instance: Displays no rows.

V$ASM_TEMPLATE
  ASM Instance: Displays a row for each template present in each disk group mounted by the ASM instance.
  DB Instance: Displays a row for each template present in each disk group mounted by the ASM instance with which the database instance communicates.
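
For example, a quick look at the state and capacity of the mounted disk groups from the ASM instance (an illustrative query against V$ASM_DISKGROUP):

SELECT group_number, name, state, type, total_mb, free_mb
FROM v$asm_diskgroup;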

ASM Filenames
There are several ways to reference ASM files. Some forms are used during creation and some for referencing ASM files. The forms for file creation are incomplete, relying on ASM to create the fully qualified name, which can be retrieved from the supporting views. The forms of the ASM filenames are summarised below.
Fully Qualified ASM Filename: +dgroup/dbname/file_type/file_type_tag.file.incarnation
Numeric ASM Filename: +dgroup.file.incarnation
Alias ASM Filename: +dgroup/directory/filename
Alias ASM Filename with Template: +dgroup(template)/alias
Incomplete ASM Filename: +dgroup
Incomplete ASM Filename with Template: +dgroup(template)

SQL and ASM
ASM filenames can be used in place of conventional filenames for most Oracle file types, including controlfiles, datafiles, logfiles etc. For example, the following command creates a new tablespace with a datafile in the disk_group_1 disk group.
CREATE TABLESPACE my_ts DATAFILE '+disk_group_1' SIZE 100M AUTOEXTEND ON;
Migrating to ASM Using RMAN
The following method shows how a primary database can be migrated to ASM from a disk-based backup:
Disable change tracking (only available in Enterprise Edition) if it is currently being used.
SQL> ALTER DATABASE DISABLE BLOCK CHANGE TRACKING;
Shutdown the database.
SQL> SHUTDOWN IMMEDIATE
Modify the parameter file of the target database as follows:
Set the DB_CREATE_FILE_DEST and DB_CREATE_ONLINE_LOG_DEST_n parameters to the relevant ASM disk groups.
Remove the CONTROL_FILES parameter from the spfile so the control files will be moved to the DB_CREATE_* destination and the spfile gets updated automatically. If you are using a pfile the CONTROL_FILES parameter must be set to the appropriate ASM files or aliases.

Start the database in nomount mode.
RMAN> STARTUP NOMOUNT
Restore the controlfile into the new location from the old location.
RMAN> RESTORE CONTROLFILE FROM 'old_control_file_name';
Mount the database.
RMAN> ALTER DATABASE MOUNT;
Copy the database into the ASM disk group.
RMAN> BACKUP AS COPY DATABASE FORMAT '+disk_group';
Switch all datafiles to the new ASM location.
RMAN> SWITCH DATABASE TO COPY;
Open the database.
RMAN> ALTER DATABASE OPEN;
Create new redo logs in ASM and delete the old ones.
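A minimal sketch of that step (group numbers and sizes are illustrative; because DB_CREATE_ONLINE_LOG_DEST_n now points at ASM, ADD LOGFILE needs no filename):
SQL> ALTER DATABASE ADD LOGFILE GROUP 4 SIZE 50M;
SQL> ALTER DATABASE ADD LOGFILE GROUP 5 SIZE 50M;
SQL> ALTER SYSTEM SWITCH LOGFILE;
SQL> -- repeat the switch until the old groups show INACTIVE in V$LOG, then:
SQL> ALTER DATABASE DROP LOGFILE GROUP 1;
SQL> ALTER DATABASE DROP LOGFILE GROUP 2;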

Enable change tracking if it was being used.
SQL> ALTER DATABASE ENABLE BLOCK CHANGE TRACKING;
For more information see:
Using Automatic Storage Management
Migrating a Database into ASM

frequently used sql for ASM management

usage of diskgroup:

select name, 1-free_mb/total_mb used_pct, free_mb, total_mb from v$asm_diskgroup where name like 'DG..%';


list rawdevice usage:

select to_number(substr(path, 13)) rawno, path, name, 1-free_mb/total_mb used_pct, free_mb, total_mb from v$asm_disk ad where name like 'DG01%' order by 1;

log dml ddl trigger (I'm not sure whether it works)

Triggers to capture DDL and DML operations.

First create the logging tables ddl$trace and capt$sql.
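
Their DDL isn't shown in the original post; here is a minimal sketch inferred from the INSERT statements in the triggers below (the datatypes and lengths are my guesses):

CREATE TABLE ddl$trace (
  login_user    VARCHAR2(30),
  ddl_time      DATE,
  ip_address    VARCHAR2(30),
  audsid        NUMBER,
  schema_user   VARCHAR2(30),
  schema_object VARCHAR2(30),
  ddl_sql       VARCHAR2(4000)
);

CREATE TABLE capt$sql (
  login_user VARCHAR2(30),
  capt_time  DATE,
  ip_address VARCHAR2(30),
  audsid     NUMBER,
  owner      VARCHAR2(30),
  table_name VARCHAR2(30),
  sql_text   VARCHAR2(4000)
);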


Trigger to capture DDL operations:


CREATE OR REPLACE TRIGGER tr_trace_ddl
AFTER DDL ON DATABASE
DECLARE
sql_text ora_name_list_t;
state_sql ddl$trace.ddl_sql%TYPE;
BEGIN
FOR i IN 1..ora_sql_txt(sql_text) LOOP
state_sql := state_sql||sql_text(i);
END LOOP;
INSERT INTO ddl$trace(login_user,ddl_time,ip_address,audsid,
schema_user,schema_object,ddl_sql)
VALUES(ora_login_user,SYSDATE,sys_context('USERENV','IP_ADDRESS'),
userenv('SESSIONID'),
ora_dict_obj_owner,ora_dict_obj_name,state_sql);
EXCEPTION
WHEN OTHERS THEN
dbms_output.put_line ('Capture DDL Exception:'||SQLERRM);
END tr_trace_ddl;


Trigger to capture DML operations on a given table:


CREATE OR REPLACE TRIGGER tr_capt_sql
BEFORE DELETE OR INSERT OR UPDATE
ON your_table -- replace your_table with the target table
DECLARE
sql_text ora_name_list_t;
state_sql capt$sql.sql_text%TYPE;
BEGIN
FOR i IN 1..ora_sql_txt(sql_text) LOOP
state_sql := state_sql || sql_text(i);
END LOOP;
INSERT INTO capt$sql(login_user,capt_time,ip_address,audsid,owner,table_name,sql_text)
VALUES(ora_login_user,sysdate,sys_context('USERENV','IP_ADDRESS'),
userenv('SESSIONID'),'MANAGER','TEST',state_sql);
EXCEPTION
WHEN OTHERS THEN
dbms_output.put_line('Capture DML Exception:'||SQLERRM);
END tr_capt_sql;

log ddl sqls

from asktom

create table log
( operation varchar2(25),
  owner     varchar2(25),
  name      varchar2(25),
  extra     varchar2(4000) );

Table created.

tkyte@TKYTE816>
tkyte@TKYTE816> create or replace trigger ddl_trigger
after create or alter or drop on SCHEMA
declare
    l_sysevent varchar2(25);
    l_extra    varchar2(4000);
begin
    select ora_sysevent into l_sysevent from dual;

    if ( l_sysevent in ('DROP','CREATE') )
    then
        if l_sysevent = 'CREATE'
        then
            begin
                select 'storage ( initial ' || initial_extent ||
                       ' next ' || next_extent || ' .... )'
                  into l_extra
                  from all_tables
                 where table_name = ora_dict_obj_name
                   and owner = user;
            exception
                when no_data_found then null;
            end;
        end if;

        insert into log
        select ora_sysevent, ora_dict_obj_owner,
               ora_dict_obj_name, l_extra
          from dual;
    elsif ( l_sysevent = 'ALTER' )
    then
        insert into log
        select ora_sysevent, ora_dict_obj_owner,
               ora_dict_obj_name, sql_text
          from v$open_cursor
         where upper(sql_text) like 'ALTER%' ||
               ora_dict_obj_name || '%'
           and sid = ( select sid
                         from v$session
                        where audsid = userenv('sessionid') );
    end if;
end;
/

Trigger created.

tkyte@TKYTE816> drop table t;
Table dropped.

tkyte@TKYTE816> create table t ( x int );
Table created.

tkyte@TKYTE816> alter table t add y date;
Table altered.

tkyte@TKYTE816> select * from log;

OPERATION OWNER NAME EXTRA
--------- ----- ---- -------------------------------------------
DROP TKYTE T
CREATE TKYTE T storage ( initial 131072 next 131072 .... )
ALTER TKYTE T alter table t add y date

And one I wrote myself:

CREATE OR REPLACE TRIGGER TRG_TEST_DML
BEFORE INSERT OR DELETE OR UPDATE
ON TEST
REFERENCING OLD AS OLD NEW AS NEW
declare
n number;
stmt varchar2(4000);
sql_text ora_name_list_t;

-- log functions
function getnow return varchar2;
procedure logwrite(msg varchar2);
logopt boolean := true;
dirname v$parameter.value%type;
ft utl_file.file_type;
pgname varchar2(4000) := 'trg_test_dml';
fn varchar2(4000) := pgname||'_'||getnow||'.log';
procedure logopen is
begin
if logopt then
select value into dirname from v$parameter where name = 'utl_file_dir';
ft := utl_file.fopen(dirname, fn, 'a');
logwrite('log begin.');
end if;
end;
procedure logwrite(msg varchar2) is
begin
if logopt then
utl_file.put_line(ft, getnow || ' ' || msg);
utl_file.fflush(ft);
end if;
end;
procedure logclose is
begin
if logopt then
utl_file.fclose(ft);
end if;
end;
function getnow return varchar2 is
begin
return to_char(sysdate, 'yyyy.mm.dd_hh24.mi.ss');
end;
begin
logopen;
-- normal program begin

n := ora_sql_txt(sql_text);
logwrite(n);
if n is not null then
FOR i IN 1..n LOOP
stmt := stmt || sql_text(i);
END LOOP;
insert into sql_log(sql_text) values(stmt);
else
logwrite('WARNING: ora_sql_txt returns null');
end if;

-- normal program end
logwrite('log normal end.');
logclose;
exception
when others then
logwrite('log abnormal end.');
logclose;
raise;
end;
/
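
The trigger above writes captured statements into a sql_log table that is not shown; a minimal guess at its definition:

CREATE TABLE sql_log (sql_text VARCHAR2(4000));

It also assumes the utl_file_dir parameter points at a writable directory, since the logging helpers write a trace file there.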

TRIGGER LOGON ON DATABASE

NOTE: dba and sysdba will bypass all of the logon triggers.

CREATE OR REPLACE TRIGGER LOGON_DB_TRIGGER
AFTER LOGON ON DATABASE
BEGIN
FOR rec in (select username,program,SYS_CONTEXT('USERENV','IP_ADDRESS') ip from v$session
where audsid = userenv('sessionid')) loop
IF rec.IP = 'xxx.xxx.xxx.xxx' AND rec.USERNAME = 'HR' AND upper(rec.program) in ('MSACCESS.EXE','SQLPLUSW.EXE','JREW.EXE','MSEXCEL.EXE')
THEN
EXIT;
ELSIF upper(rec.program) in ('MSACCESS.EXE','SQLPLUSW.EXE','JREW.EXE','MSEXCEL.EXE') THEN
raise_application_error(-20001,'Access Deny');
END IF;
END LOOP;
END;

http://www.red-database-security.com/advisory/bypass_oracle_logon_trigger.html

Setting the Japanese LANG

http://www.shitomi.jp/server/sysconfig.html

System configuration / Fedora Core 4


On Linux, language settings are made in the file:

/etc/sysconfig/i18n


To set Japanese EUC:

---

LANG="ja_JP.eucJP"

SUPPORTED="ja_JP.eucJP:ja_JP:ja"

---

To set Japanese UTF-8:

---

LANG="ja_JP.UTF-8"

SUPPORTED="ja_JP.UTF-8:ja_JP:ja"

---

To set SJIS:

---

LANG="ja_JP.SJIS"

SUPPORTED="ja_JP.SJIS:ja_JP:ja"

---

and so on.

(These are the minimal entries; additional options can be appended.)


As for the value of these settings, it is basically a combination of a locale and a character encoding.


If you want to change the language temporarily rather than as the default, just rewrite the contents of the LANG environment variable.


http://www.searchman.info/fedoracore4/sev1030.html

Fixing garbled characters (mojibake) on Linux (Fedora)



http://www.geocities.co.jp/SiliconValley-PaloAlto/8090/toc.html

I18N Book, Table of Contents

modify ip of rac interconnect and vip

The VIPs are registered as highly available resources within the Oracle Cluster Registry (OCR).
If you want to use different VIPs than those previously configured, you need Metalink Note 283684.1, "How to Change Interconnect/Public Interface IP Subnet in a 10g Cluster".
Be aware of the following bug, which is not described in the Metalink note above: 4500688 - THE INTERFACE NAME SHOULD BE SPECIFY WHEN EXECUTING 'SRVCTL MODIFY NODEAPPS'. This means it is a better idea to modify the VIPs by explicitly specifying the network interface name, for instance:

srvctl modify nodeapps -n mynode1 -A 100.200.300.40/255.255.255.0/eth0

[original] oracle virtual private database

Application
application partitioning
application server node distribution
schema reference modification

RAC
service
node-specific execution

Network
bandwidth limit

VPD
vpd key column
oracle user privilege
vpd schema
vpd user role
vpd user
multi-level vpd db objects
oracle user privilege
vp utl_file_dirs
vp packages

Performance
partitioning
parallel processing
resource plan

Storage
raw devices mount
dg allocation rule
raw allocation rule
lun allocation rule
raid allocation rule
disk allocation rule

Backup
backup schedule
point in time tablespace restore
transport tablespace
flashback database

OS, Operation and Administration
os user privilege
single sign on (os/oracle)
batch distribution
shared folder and failover
peak load balance schedule
response time and resource usage monitoring

tablespace usage sql

I want to add diskgroup usage and file system usage.

--tablespace usage

SELECT D.TABLESPACE_NAME,SPACE "SUM_SPACE(M)",BLOCKS SUM_BLOCKS,SPACE-NVL(FREE_SPACE,0) "USED_SPACE(M)",
ROUND((1-NVL(FREE_SPACE,0)/SPACE)*100,2) "USED_RATE(%)",FREE_SPACE "FREE_SPACE(M)"
FROM
(SELECT TABLESPACE_NAME,ROUND(SUM(BYTES)/(1024*1024),2) SPACE,SUM(BLOCKS) BLOCKS
FROM DBA_DATA_FILES
GROUP BY TABLESPACE_NAME) D,
(SELECT TABLESPACE_NAME,ROUND(SUM(BYTES)/(1024*1024),2) FREE_SPACE
FROM DBA_FREE_SPACE
GROUP BY TABLESPACE_NAME) F
WHERE D.TABLESPACE_NAME = F.TABLESPACE_NAME(+)
UNION ALL -- include tempfiles, if any
SELECT D.TABLESPACE_NAME,SPACE "SUM_SPACE(M)",BLOCKS SUM_BLOCKS,
USED_SPACE "USED_SPACE(M)",ROUND(NVL(USED_SPACE,0)/SPACE*100,2) "USED_RATE(%)",
NVL(FREE_SPACE,0) "FREE_SPACE(M)"
FROM
(SELECT TABLESPACE_NAME,ROUND(SUM(BYTES)/(1024*1024),2) SPACE,SUM(BLOCKS) BLOCKS
FROM DBA_TEMP_FILES
GROUP BY TABLESPACE_NAME) D,
(SELECT TABLESPACE_NAME,ROUND(SUM(BYTES_USED)/(1024*1024),2) USED_SPACE,
ROUND(SUM(BYTES_FREE)/(1024*1024),2) FREE_SPACE
FROM V$TEMP_SPACE_HEADER
GROUP BY TABLESPACE_NAME) F
WHERE D.TABLESPACE_NAME = F.TABLESPACE_NAME(+);

-- tablespace usage including autoextend

SELECT D.TABLESPACE_NAME,FILE_NAME "FILE_NAME",SPACE "SUM_SPACE(M)",BLOCKS SUM_BLOCKS,SPACE-NVL(FREE_SPACE,0) "USED_SPACE(M)",
ROUND((1-NVL(FREE_SPACE,0)/SPACE)*100,2) "USED_RATE(%)",FREE_SPACE "FREE_SPACE(M)",AUTOEXTENSIBLE
FROM
(SELECT FILE_ID,FILE_NAME,TABLESPACE_NAME,ROUND(SUM(BYTES)/(1024*1024),2) SPACE,SUM(BLOCKS) BLOCKS
FROM DBA_DATA_FILES
GROUP BY TABLESPACE_NAME,FILE_ID,FILE_NAME) D,
(SELECT FILE_ID,TABLESPACE_NAME,ROUND(SUM(BYTES)/(1024*1024),2) FREE_SPACE
FROM DBA_FREE_SPACE
GROUP BY TABLESPACE_NAME,FILE_ID) E,
(SELECT FILE_ID,AUTOEXTENSIBLE FROM DBA_DATA_FILES) F
WHERE D.TABLESPACE_NAME = E.TABLESPACE_NAME(+) AND D.FILE_ID = E.FILE_ID(+) AND D.FILE_ID = F.FILE_ID(+)
UNION ALL -- include tempfiles, if any
SELECT D.TABLESPACE_NAME,FILE_NAME "FILE_NAME",SPACE "SUM_SPACE(M)",BLOCKS SUM_BLOCKS,
USED_SPACE "USED_SPACE(M)",ROUND(NVL(USED_SPACE,0)/SPACE*100,2) "USED_RATE(%)",
NVL(FREE_SPACE,0) "FREE_SPACE(M)",AUTOEXTENSIBLE
FROM
(SELECT FILE_ID,FILE_NAME,TABLESPACE_NAME,ROUND(SUM(BYTES)/(1024*1024),2) SPACE,SUM(BLOCKS) BLOCKS
FROM DBA_TEMP_FILES
GROUP BY TABLESPACE_NAME,FILE_ID,FILE_NAME) D,
(SELECT FILE_ID,TABLESPACE_NAME,ROUND(SUM(BYTES_USED)/(1024*1024),2) USED_SPACE,
ROUND(SUM(BYTES_FREE)/(1024*1024),2) FREE_SPACE
FROM V$TEMP_SPACE_HEADER
GROUP BY TABLESPACE_NAME,FILE_ID) E,
(SELECT FILE_ID,AUTOEXTENSIBLE FROM DBA_TEMP_FILES) F
WHERE D.TABLESPACE_NAME = E.TABLESPACE_NAME(+) AND D.FILE_ID = E.FILE_ID(+) AND D.FILE_ID = F.FILE_ID(+)
ORDER BY TABLESPACE_NAME,FILE_NAME;

-- extends based usage
SELECT Substr(df.tablespace_name,1,20) "Tablespace Name",
Substr(df.file_name,1,40) "File Name",
Round(df.bytes/1024/1024,2) "Size (M)",
Round(e.used_bytes/1024/1024,2) "Used (M)",
Round(f.free_bytes/1024/1024,2) "Free (M)",
Rpad(' '|| Rpad ('X',Round(e.used_bytes*10/df.bytes,0), 'X'),11,'-') "% Used"
FROM DBA_DATA_FILES DF,
(SELECT file_id,
Sum(Decode(bytes,NULL,0,bytes)) used_bytes
FROM dba_extents
GROUP by file_id) E,
(SELECT Max(bytes) free_bytes,
file_id
FROM dba_free_space
GROUP BY file_id) f
WHERE e.file_id (+) = df.file_id
AND df.file_id = f.file_id (+)
ORDER BY df.tablespace_name,
df.file_name;

Supporting Multilingual Databases with Unicode

Oracle® Database Globalization Support Guide
10g Release 1 (10.1)
Part Number B10749-02

6 Supporting Multilingual Databases with Unicode

This chapter illustrates how to use Unicode in an Oracle database environment. It includes the following topics:

Overview of Unicode

What is Unicode?

Implementing a Unicode Solution in the Database

Unicode Case Studies

Designing Database Schemas to Support Multiple Languages
Overview of Unicode

Dealing with many different languages in the same application or database has been complicated and difficult for a long time. To overcome the limitations of existing character encodings, several organizations began working on the creation of a global character set in the late 1980s. The need for this became even greater with the development of the World Wide Web in the mid-1990s. The Internet has changed how companies do business, with an emphasis on the global market that has made a universal character set a major requirement. A global character set needs to fulfill the following conditions:

Contain all major living scripts

Support legacy data and implementations

Be simple enough that a single implementation of an application is sufficient for worldwide use

A global character set should also have the following capabilities:

Support multilingual users and organizations

Conform to international standards

Enable worldwide interchange of data

This global character set exists, is in wide use, and is called Unicode.
What is Unicode?

Unicode is a universal encoded character set that enables information from any language to be stored using a single character set. Unicode provides a unique code value for every character, regardless of the platform, program, or language.

The Unicode standard has been adopted by many software and hardware vendors. Many operating systems and browsers now support Unicode. Unicode is required by standards such as XML, Java, JavaScript, LDAP, and WML. It is also synchronized with the ISO/IEC 10646 standard.

Oracle Corporation started supporting Unicode as a database character set in version 7. In Oracle Database 10g, Unicode support has been expanded. Oracle Database 10g supports Unicode 3.2.

See Also:
http://www.unicode.org for more information about the Unicode standard


This section contains the following topics:

Supplementary Characters

Unicode Encodings

Oracle's Support for Unicode
Supplementary Characters

The first version of Unicode was a 16-bit, fixed-width encoding that used two bytes to encode each character. This allowed 65,536 characters to be represented. However, more characters need to be supported, especially additional CJK ideographs that are important for the Chinese, Japanese, and Korean markets.

Unicode 3.2 defines supplementary characters to meet this need. It uses two 16-bit code points (known as a surrogate pair) to represent a single supplementary character. This enables an additional 1,048,576 characters to be defined. The Unicode 3.2 standard defines 45,960 supplementary characters.

Adding supplementary characters increases the complexity of Unicode, but it is less complex than managing several different encodings in the same configuration.
Unicode Encodings

Unicode 3.2 encodes characters in different ways: UTF-8, UCS-2, and UTF-16. Conversion between different Unicode encodings is a simple bit-wise operation that is defined in the Unicode standard.

This section contains the following topics:

UTF-8 Encoding

UCS-2 Encoding

UTF-16 Encoding

Examples: UTF-16, UTF-8, and UCS-2 Encoding
UTF-8 Encoding

UTF-8 is the 8-bit encoding of Unicode. It is a variable-width encoding and a strict superset of ASCII. This means that each and every character in the ASCII character set is available in UTF-8 with the same code point values. One Unicode character can be 1 byte, 2 bytes, 3 bytes, or 4 bytes in UTF-8 encoding. Characters from the European scripts are represented in either 1 or 2 bytes. Characters from most Asian scripts are represented in 3 bytes. Supplementary characters are represented in 4 bytes.

UTF-8 is the Unicode encoding supported on UNIX platforms and used for HTML and most Internet browsers. Other environments such as Windows and Java use UCS-2 encoding.

The benefits of UTF-8 are as follows:

Compact storage requirement for European scripts because it is a strict superset of ASCII

Ease of migration between ASCII-based character sets and UTF-8
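
You can observe the byte widths described above directly with SQL. A small illustration (it assumes an AL32UTF8 or UTF8 database character set; LENGTHB returns lengths in bytes, UNISTR builds the euro sign U+20AC, and TO_CHAR converts it from the national character set into the database character set):

SELECT LENGTHB('a') AS ascii_bytes,
       LENGTHB(TO_CHAR(UNISTR('\20AC'))) AS euro_bytes
FROM dual;

-- Expected in an AL32UTF8/UTF8 database: ASCII_BYTES = 1, EURO_BYTES = 3.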

See Also:

"Supplementary Characters"

Table B-2, "Unicode Character Code Ranges for UTF-8 Character Codes"

UCS-2 Encoding

UCS-2 is a fixed-width, 16-bit encoding. Each character is 2 bytes. UCS-2 is the Unicode encoding used by Java and Microsoft Windows NT 4.0. UCS-2 supports characters defined for Unicode 3.0, so there is no support for supplementary characters.

The benefits of UCS-2 over UTF-8 are as follows:

More compact storage for Asian scripts because all characters are two bytes

Faster string processing because characters are fixed-width

Better compatibility with Java and Microsoft clients

See Also:
"Supplementary Characters"

UTF-16 Encoding

UTF-16 encoding is the 16-bit encoding of Unicode. UTF-16 is an extension of UCS-2 because it supports the supplementary characters that are defined in Unicode 3.2 by using two UCS-2 code points for each supplementary character. UTF-16 is a strict superset of UCS-2.

One character can be either 2 bytes or 4 bytes in UTF-16. Characters from European and most Asian scripts are represented in 2 bytes. Supplementary characters are represented in 4 bytes. UTF-16 is the main Unicode encoding used by Microsoft Windows 2000.

The benefits of UTF-16 over UTF-8 are as follows:

More compact storage for Asian scripts because most of the commonly used Asian characters are represented in two bytes.

Better compatibility with Java and Microsoft clients

See Also:

"Supplementary Characters"

Table B-1, "Unicode Character Code Ranges for UTF-16 Character Codes"

Examples: UTF-16, UTF-8, and UCS-2 Encoding

Figure 6-1 shows some characters and their character codes in UTF-16, UTF-8, and UCS-2 encoding. The last character is a treble clef (a music symbol), a supplementary character that has been added to the Unicode 3.2 standard.

Figure 6-1 UTF-16, UTF-8, and UCS-2 Encoding Examples


Oracle's Support for Unicode

Oracle Corporation started supporting Unicode as a database character set in version 7. Table 6-1 summarizes the Unicode character sets supported by the Oracle database server.

Table 6-1 Unicode Character Sets Supported by the Oracle Database Server

AL24UTFFSS
  Supported in RDBMS Release: 7.2 - 8i
  Unicode Encoding: UTF-8
  Unicode Version: 1.1
  Database Character Set: Yes
  National Character Set: No

UTF8
  Supported in RDBMS Release: 8.0 - 10g
  Unicode Encoding: UTF-8
  Unicode Version: 2.1 (Oracle release 8.0 through Oracle8i release 8.1.6); 3.0 (Oracle8i release 8.1.7 and later)
  Database Character Set: Yes
  National Character Set: Yes (Oracle9i and Oracle Database 10g only)

UTFE
  Supported in RDBMS Release: 8.0 - 10g
  Unicode Encoding: UTF-EBCDIC
  Unicode Version: 2.1 (Oracle8i releases 8.0 through 8.1.6); 3.0 (Oracle8i release 8.1.7 and later)
  Database Character Set: Yes
  National Character Set: No

AL32UTF8
  Supported in RDBMS Release: 9i - 10g
  Unicode Encoding: UTF-8
  Unicode Version: 3.0 (Oracle9i Release 1); 3.1 (Oracle9i Release 2); 3.2 (Oracle Database 10g Release 1)
  Database Character Set: Yes
  National Character Set: No

AL16UTF16
  Supported in RDBMS Release: 9i - 10g
  Unicode Encoding: UTF-16
  Unicode Version: 3.0 (Oracle9i Release 1); 3.1 (Oracle9i Release 2); 3.2 (Oracle Database 10g Release 1)
  Database Character Set: No
  National Character Set: Yes

Implementing a Unicode Solution in the Database

You can store Unicode characters in an Oracle database in two ways.

You can create a Unicode database that enables you to store UTF-8 encoded characters as SQL CHAR datatypes (CHAR, VARCHAR2, CLOB, and LONG).

If you prefer to implement Unicode support incrementally or if you need to support multilingual data only in certain columns, then you can store Unicode data in either the UTF-16 or UTF-8 encoding form in SQL NCHAR datatypes (NCHAR, NVARCHAR2, and NCLOB). The SQL NCHAR datatypes are called Unicode datatypes because they are used only for storing Unicode data.

Note:
You can combine a Unicode database solution with a Unicode datatype solution.


The following sections explain how to use the two Unicode solutions and how to choose between them:

Enabling Multilingual Support with Unicode Databases

Enabling Multilingual Support with Unicode Datatypes

How to Choose Between a Unicode Database and a Unicode Datatype Solution

Comparing Unicode Character Sets for Database and Datatype Solutions
Enabling Multilingual Support with Unicode Databases

The database character set specifies the encoding to be used in the SQL CHAR datatypes as well as the metadata such as table names, column names, and SQL statements. A Unicode database is a database with a UTF-8 character set as the database character set. There are three Oracle character sets that implement the UTF-8 encoding. The first two are designed for ASCII-based platforms while the third one should be used on EBCDIC platforms.

AL32UTF8

The AL32UTF8 character set supports the latest version of the Unicode standard. It encodes characters in one, two, or three bytes. Supplementary characters require four bytes. It is for ASCII-based platforms.

UTF8

The UTF8 character set encodes characters in one, two, or three bytes. It is for ASCII-based platforms.

The UTF8 character set has supported Unicode 3.0 since Oracle8i release 8.1.7 and will continue to support Unicode 3.0 in future releases of the Oracle database server. Although specific supplementary characters were not assigned code points in Unicode until version 3.1, the code point range was allocated for supplementary characters in Unicode 3.0. If supplementary characters are inserted into a UTF8 database, then it does not corrupt the data in the database. The supplementary characters are treated as two separate, user-defined characters that occupy 6 bytes. Oracle Corporation recommends that you switch to AL32UTF8 for full support of supplementary characters in the database character set.

UTFE

The UTFE character set is for EBCDIC platforms. It is similar to UTF8 on ASCII platforms, but it encodes characters in one, two, three, and four bytes. Supplementary characters are converted as two 4-byte characters.

Example 6-1 Creating a Database with a Unicode Character Set

To create a database with the AL32UTF8 character set, use the CREATE DATABASE statement and include the CHARACTER SET AL32UTF8 clause. For example:
CREATE DATABASE sample
CONTROLFILE REUSE
LOGFILE
GROUP 1 ('diskx:log1.log', 'disky:log1.log') SIZE 50K,
GROUP 2 ('diskx:log2.log', 'disky:log2.log') SIZE 50K
MAXLOGFILES 5
MAXLOGHISTORY 100
MAXDATAFILES 10
MAXINSTANCES 2
ARCHIVELOG
CHARACTER SET AL32UTF8
NATIONAL CHARACTER SET AL16UTF16
DATAFILE
'disk1:df1.dbf' AUTOEXTEND ON,
'disk2:df2.dbf' AUTOEXTEND ON NEXT 10M MAXSIZE UNLIMITED
DEFAULT TEMPORARY TABLESPACE temp_ts
UNDO TABLESPACE undo_ts
SET TIME_ZONE = '+02:00';

Note:
Specify the database character set when you create the database.

Enabling Multilingual Support with Unicode Datatypes

An alternative to storing Unicode data in the database is to use the SQL NCHAR datatypes (NCHAR, NVARCHAR2, and NCLOB). You can store Unicode characters in columns of these datatypes regardless of how the database character set has been defined. The NCHAR datatype is a Unicode datatype exclusively. In other words, it stores data encoded as Unicode.

In releases before Oracle9i, the NCHAR datatype supported fixed-width Asian character sets that were designed to provide higher performance. Examples of fixed-width character sets are JA16SJISFIXED and ZHT32EUCFIXED. No Unicode character set was supported as the national character set before Oracle9i.

You can create a table using the NVARCHAR2 and NCHAR datatypes. The column length specified for the NCHAR and NVARCHAR2 columns is always the number of characters instead of the number of bytes:
CREATE TABLE product_information
( product_id NUMBER(6)
, product_name NVARCHAR2(100)
, product_description VARCHAR2(1000));


The encoding used in the SQL NCHAR datatypes is the national character set specified for the database. You can specify one of the following Oracle character sets as the national character set:

AL16UTF16

This is the default character set for SQL NCHAR datatypes. The character set encodes Unicode data in the UTF-16 encoding. It supports supplementary characters, which are stored as four bytes.

UTF8

When UTF8 is specified for SQL NCHAR datatypes, the data stored in the SQL datatypes is in UTF-8 encoding.

You can specify the national character set for the SQL NCHAR datatypes when you create a database using the CREATE DATABASE statement with the NATIONAL CHARACTER SET clause. The following statement creates a database with WE8ISO8859P1 as the database character set and AL16UTF16 as the national character set.

Example 6-2 Creating a Database with a National Character Set
CREATE DATABASE sample
CONTROLFILE REUSE
LOGFILE
GROUP 1 ('diskx:log1.log', 'disky:log1.log') SIZE 50K,
GROUP 2 ('diskx:log2.log', 'disky:log2.log') SIZE 50K
MAXLOGFILES 5
MAXLOGHISTORY 100
MAXDATAFILES 10
MAXINSTANCES 2
ARCHIVELOG
CHARACTER SET WE8ISO8859P1
NATIONAL CHARACTER SET AL16UTF16
DATAFILE
'disk1:df1.dbf' AUTOEXTEND ON,
'disk2:df2.dbf' AUTOEXTEND ON NEXT 10M MAXSIZE UNLIMITED
DEFAULT TEMPORARY TABLESPACE temp_ts
UNDO TABLESPACE undo_ts
SET TIME_ZONE = '+02:00';
How to Choose Between a Unicode Database and a Unicode Datatype Solution

To choose the right Unicode solution for your database, consider the following questions:

Programming environment: What are the main programming languages used in your applications? How do they support Unicode?

Ease of migration: How easily can your data and applications be migrated to take advantage of the Unicode solution?

Performance: How much performance overhead are you willing to accept in order to use Unicode in the database?

Type of data: Is your data mostly Asian or European? Do you need to store multilingual documents into LOB columns?

Type of applications: What type of applications are you implementing: a packaged application or a customized end-user application?

This section describes some general guidelines for choosing a Unicode database or a Unicode datatype solution. The final decision largely depends on your exact environment and requirements. This section contains the following topics:

When Should You Use a Unicode Database?

When Should You Use Unicode Datatypes?
When Should You Use a Unicode Database?

Use a Unicode database in the situations described in Table 6-2.

Table 6-2 Using a Unicode Database

Situation: You need easy code migration for Java or PL/SQL.
Explanation: If your existing application is mainly written in Java and PL/SQL and your main concern is to minimize the code changes required to support multiple languages, then you may want to use a Unicode database solution. If the datatypes used to store data remain as SQL CHAR datatypes, then the Java and PL/SQL code that accesses these columns does not need to change.

Situation: You have evenly distributed multilingual data.
Explanation: If the multilingual data is evenly distributed in existing schema tables and you are not sure which tables contain multilingual data, then you should use a Unicode database because it does not require you to identify the kind of data that is stored in each column.

Situation: Your SQL statements and PL/SQL code contain Unicode data.
Explanation: You must use a Unicode database. SQL statements and PL/SQL code are converted into the database character set before being processed. If the SQL statements and PL/SQL code contain characters that cannot be converted to the database character set, then those characters are lost. A common place to use Unicode data in a SQL statement is in a string literal.

Situation: You want to store multilingual documents in BLOB format and use Oracle Text for content searching.
Explanation: You must use a Unicode database. The BLOB data is converted to the database character set before being indexed by Oracle Text. If your database character set is not UTF8, then data is lost when the documents contain characters that cannot be converted to the database character set.

When Should You Use Unicode Datatypes?

Use Unicode datatypes in the situations described in Table 6-3.

Table 6-3 Using Unicode Datatypes

Situation: You want to add multilingual support incrementally.
Explanation: If you want to add Unicode support to the existing database without migrating the character set, then consider using Unicode datatypes to store Unicode data. You can add columns of the SQL NCHAR datatypes to existing tables or new tables to support multiple languages incrementally.

Situation: You want to build a packaged application.
Explanation: If you are building a packaged application to sell to customers, then you may want to build the application using SQL NCHAR datatypes. The SQL NCHAR datatype is a reliable Unicode datatype in which the data is always stored in Unicode, and the length of the data is always specified in UTF-16 code units. As a result, you need to test the application only once. The application will run on customer databases with any database character set.

Situation: You want better performance with single-byte database character sets.
Explanation: If performance is your main concern, then consider using a single-byte database character set and storing Unicode data in the SQL NCHAR datatypes. Databases that use a multibyte database character set such as UTF8 have a performance overhead.

Situation: You require UTF-16 support in Windows clients.
Explanation: If your applications are written in Visual C/C++ or Visual Basic running on Windows, then you may want to use the SQL NCHAR datatypes. You can store UTF-16 data in SQL NCHAR datatypes in the same way that you store it in the wchar_t buffer in Visual C/C++ and string buffer in Visual Basic. You can avoid buffer overflow in client applications because the length of the wchar_t and string datatypes match the length of the SQL NCHAR datatypes in the database.


Note:
You can use a Unicode database with Unicode datatypes.

Comparing Unicode Character Sets for Database and Datatype Solutions

Oracle provides two solutions to store Unicode characters in the database: a Unicode database solution and a Unicode datatype solution. After you select the Unicode database solution, the Unicode datatype solution or a combination of both, determine the character set to be used in the Unicode database or the Unicode datatype.

Table 6-4 contains advantages and disadvantages of different character sets for a Unicode database solution. The Oracle character sets that can be Unicode database character sets are AL32UTF8, UTF8, and UTFE.

Table 6-4 Character Set Advantages and Disadvantages for a Unicode Database Solution

AL32UTF8
Advantages: Supplementary characters are stored in 4 bytes, so there is no data conversion when supplementary characters are retrieved and inserted if the client setting is UTF-8. The storage for supplementary characters requires less disk space in AL32UTF8 than in UTF8.
Disadvantages: You cannot specify the length of SQL CHAR types in number of UCS-2 code points for supplementary characters. Supplementary characters are treated as one code point rather than the standard two code points. The binary order for SQL CHAR columns is different from the binary order of SQL NCHAR columns when the data consists of supplementary characters. As a result, CHAR columns and NCHAR columns do not always have the same sort for identical strings.

UTF8
Advantages: You can specify the length of SQL CHAR types in number of UCS-2 code points. The binary order of the SQL CHAR columns is always the same as the binary order of the SQL NCHAR columns when the data consists of the same supplementary characters. As a result, CHAR columns and NCHAR columns have the same sort for identical strings.
Disadvantages: Supplementary characters are stored as 6 bytes instead of the 4 bytes defined by Unicode 3.2. As a result, Oracle has to convert data for supplementary characters if the client setting is UTF-8.

UTFE
Advantages: This is the only Unicode character set for the EBCDIC platform. You can specify the length of SQL CHAR types in number of UCS-2 code points. The binary order of the SQL CHAR columns is always the same as the binary order of the SQL NCHAR columns when the data consists of the same supplementary characters. As a result, CHAR columns and NCHAR columns have the same sort for identical strings.
Disadvantages: Supplementary characters are stored as 8 bytes (two 4-byte sequences) instead of the 5 bytes defined by the Unicode standard. As a result, Oracle has to convert data for those supplementary characters. UTFE is not a standard encoding in the Unicode standard, so clients requiring standard UTF-8 encoding must convert data from UTFE to the standard encoding when data is retrieved and inserted.


Table 6-5 contains advantages and disadvantages of different character sets for a Unicode datatype solution. The Oracle character sets that can be national character sets are AL16UTF16 and UTF8. The default is AL16UTF16.

Table 6-5 Character Set Advantages and Disadvantages for a Unicode Datatype Solution

AL16UTF16
Advantages: Asian data in AL16UTF16 is usually more compact than in UTF8. As a result, you save disk space and have less disk I/O when most of the multilingual data stored in the database is Asian data. It is usually faster to process strings encoded in the AL16UTF16 character set than strings encoded in UTF8 because Oracle processes most characters in an AL16UTF16 encoded string as fixed-width characters. The maximum length limits for the NCHAR and NVARCHAR2 columns are 1000 and 2000 characters, respectively. Because the data is fixed-width, the lengths are guaranteed.
Disadvantages: European ASCII data requires more disk space to store in AL16UTF16 than in UTF8. If most of your data is European data, then it uses more disk space than if it were UTF8 data. The maximum lengths for NCHAR and NVARCHAR2 are 1000 and 2000 characters, which is less than the lengths for NCHAR (2000) and NVARCHAR2 (4000) in UTF8.

UTF8
Advantages: European data in UTF8 is usually more compact than in AL16UTF16. As a result, you save disk space and have better response time when most of the multilingual data stored in the database is European data. The maximum lengths for the NCHAR and NVARCHAR2 columns are 2000 and 4000 characters respectively, which is more than those for NCHAR (1000) and NVARCHAR2 (2000) in AL16UTF16. Although the maximum lengths of the NCHAR and NVARCHAR2 columns are larger in UTF8, the actual storage size is still bound by the byte limits of 2000 and 4000 bytes, respectively. For example, you can store 4000 UTF8 characters in an NVARCHAR2 column if all the characters are single byte, but only 4000/3 characters if all the characters are three bytes.
Disadvantages: Asian data requires more disk space to store in UTF8 than in AL16UTF16, so if most of your data is Asian data, disk space usage is not as efficient as with AL16UTF16. Although you can specify larger length limits for NCHAR and NVARCHAR2, you are not guaranteed to be able to insert the number of characters specified by these limits, because UTF8 allows variable-width characters. It is usually slower to process strings encoded in UTF8 than strings encoded in AL16UTF16 because UTF8 encoded strings consist of variable-width characters.

Unicode Case Studies

This section describes typical scenarios for storing Unicode characters in an Oracle database:

Example 6-3, "Unicode Solution with a Unicode Database"

Example 6-4, "Unicode Solution with Unicode Datatypes"

Example 6-5, "Unicode Solution with a Unicode Database and Unicode Datatypes"

Example 6-3 Unicode Solution with a Unicode Database

An American company running a Java application would like to add German and French support in the next release of the application. They would like to add Japanese support at a later time. The company currently has the following system configuration:

The existing database has a database character set of US7ASCII.

All character data in the existing database is composed of ASCII characters.

PL/SQL stored procedures are used in the database.

The database is around 300 GB.

There is a nightly downtime of 4 hours.

In this case, a typical solution is to choose UTF8 for the database character set because of the following reasons:

The database is very large and the scheduled downtime is short. Fast migration of the database to Unicode is vital. Because the database is in US7ASCII, the easiest and fastest way of enabling the database to support Unicode is to switch the database character set to UTF8 by issuing the ALTER DATABASE statement (see the sketch after this list). No data conversion is required because US7ASCII is a subset of UTF8.

Because most of the code is written in Java and PL/SQL, changing the database character set to UTF8 is unlikely to break existing code. Unicode support is automatically enabled in the application.

Because the application supports French, German, and Japanese, there are few supplementary characters. Both AL32UTF8 and UTF8 are suitable.
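
For reference, the character set switch mentioned in the first reason looks roughly like this (a sketch only; the full prerequisite checks for ALTER DATABASE CHARACTER SET are in the Globalization Support Guide, and the statement only works when the new character set is a superset of the old one):

SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER SYSTEM ENABLE RESTRICTED SESSION;
ALTER DATABASE OPEN;
ALTER DATABASE CHARACTER SET UTF8;
SHUTDOWN IMMEDIATE;
STARTUP;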

Example 6-4 Unicode Solution with Unicode Datatypes

A European company that runs its applications mainly on Windows platforms wants to add new Windows applications written in Visual C/C++. The new applications will use the existing database to support Japanese and Chinese customer names. The company currently has the following system configuration:

The existing database has a database character set of WE8ISO8859P1.

All character data in the existing database is composed of Western European characters.

The database is around 50 GB.

A typical solution is to take the following actions:

Use NCHAR and NVARCHAR2 datatypes to store Unicode characters

Keep WE8ISO8859P1 as the database character set

Use AL16UTF16 as the national character set

The reasons for this solution are:

Migrating the existing database to a Unicode database would require data conversion because the database character set is WE8ISO8859P1 (a Latin-1 character set), which is not a subset of UTF8. As a result, there would be some overhead in converting the data to UTF8.

The additional languages are supported in new applications only. They do not depend on the existing applications or schemas. It is simpler to use the Unicode datatype in the new schema and keep the existing schemas unchanged.

Only customer name columns require Unicode support. Using a single NCHAR column meets the customer's requirements without migrating the entire database.

Because the languages to be supported are mostly Asian languages, AL16UTF16 should be used as the national character set so that disk space is used more efficiently.

The lengths of the SQL NCHAR datatypes are defined as number of characters. This is the same as the way they are treated when using wchar_t strings in Windows C/C++ programs. This reduces programming complexity.

Existing applications using the existing schemas are unaffected.

Example 6-5 Unicode Solution with a Unicode Database and Unicode Datatypes

A Japanese company wants to develop a new Java application. The company expects that the application will support as many languages as possible in the long run.

In order to store documents as is, the company decided to use the BLOB datatype to store documents of multiple languages.

The company may also want to generate UTF-8 XML documents from the relational data for business-to-business data exchange.

The back-end has Windows applications written in C/C++ using ODBC to access the Oracle database.

In this case, the typical solution is to create a Unicode database using AL32UTF8 as the database character set and use the SQL NCHAR datatypes to store multilingual data. The national character set should be set to AL16UTF16. The reasons for this solution are as follows:

When documents of different languages are stored in BLOB format, Oracle Text requires the database character set to be one of the UTF-8 character sets. Because the applications may retrieve relational data as UTF-8 XML format (where supplementary characters are stored as four bytes), AL32UTF8 should be used as the database character set to avoid data conversion when UTF-8 data is retrieved or inserted.

Because applications are new and written in both Java and Windows C/C++, the company should use the SQL NCHAR datatype for its relational data. Both Java and Windows support the UTF-16 character datatype, and the length of a character string is always measured in the number of characters.

If most of the data is for Asian languages, then AL16UTF16 should be used with the SQL NCHAR datatypes because AL16UTF16 offers better performance and storage efficiency.
Designing Database Schemas to Support Multiple Languages

In addition to choosing a Unicode solution, the following issues should be taken into consideration when the database schema is designed to support multiple languages:

Specifying Column Lengths for Multilingual Data

Storing Data in Multiple Languages

Storing Documents in Multiple Languages in LOB Datatypes

Creating Indexes for Searching Multilingual Document Contents
Specifying Column Lengths for Multilingual Data

When you use NCHAR and NVARCHAR2 datatypes for storing multilingual data, the column size specified for a column is defined in number of characters. (The number of characters means the number of Unicode code units.) Table 6-6 shows the maximum size of the NCHAR and NVARCHAR2 datatypes for the AL16UTF16 and UTF8 national character sets.

Table 6-6 Maximum Datatype Size

National Character Set: AL16UTF16
  Maximum Column Size of NCHAR: 1000 characters
  Maximum Column Size of NVARCHAR2: 2000 characters

National Character Set: UTF8
  Maximum Column Size of NCHAR: 2000 bytes
  Maximum Column Size of NVARCHAR2: 4000 bytes


When you use CHAR and VARCHAR2 datatypes for storing multilingual data, the maximum length specified for each column is, by default, in number of bytes. If the database needs to support Thai, Arabic, or multibyte languages such as Chinese and Japanese, then the maximum lengths of the CHAR, VARCHAR, and VARCHAR2 columns may need to be extended. This is because the number of bytes required to encode these languages in UTF8 or AL32UTF8 may be significantly larger than the number of bytes for encoding English and Western European languages. For example, one Thai character in the Thai character set requires 3 bytes in UTF8 or AL32UTF8. In addition, the maximum column lengths for CHAR, VARCHAR, and VARCHAR2 datatypes are 2000 bytes, 4000 bytes, and 4000 bytes respectively. If applications need to store more than 4000 bytes, then they should use the CLOB datatype.
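Rather than hand-multiplying byte lengths, character-length semantics can declare the limit in characters instead (a brief sketch; the table name is invented for illustration):

-- Holds 100 characters regardless of how many bytes each needs,
-- up to the 4000-byte VARCHAR2 datatype limit.
CREATE TABLE product_description_thai
( description VARCHAR2(100 CHAR) );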
Storing Data in Multiple Languages

The Unicode character set includes characters of most written languages around the world, but it does not contain information about the language to which a given character belongs. In other words, a character such as ä does not contain information about whether it is a French or German character. In order to provide information in the language a user desires, data stored in a Unicode database should be accompanied by information identifying the language to which the data belongs.

There are many ways for a database schema to relate data to a language. The following sections provide different approaches:

Store Language Information with the Data

Select Translated Data Using Fine-Grained Access Control
Store Language Information with the Data

For data such as product descriptions or product names, you can add a language column (language_id) of CHAR or VARCHAR2 datatype to the product table to identify the language of the corresponding product information. This enables applications to retrieve the information in the desired language. The possible values for this language column are the 3-letter abbreviations of the valid NLS_LANGUAGE values of the database.

See Also:
Appendix A, "Locale Data" for a list of NLS_LANGUAGE values and their abbreviations


You can also create a view to select the data of the current language. For example:
ALTER TABLE scott.product_information ADD (language_id VARCHAR2(50));

CREATE OR REPLACE VIEW product AS
SELECT product_id, product_name
FROM product_information
WHERE language_id = sys_context('USERENV','LANG');
Select Translated Data Using Fine-Grained Access Control

Fine-grained access control enables you to limit the degree to which a user can view information in a table or view. Typically, this is done by appending a WHERE clause. When you add a WHERE clause as a fine-grained access policy to a table or view, Oracle automatically appends the WHERE clause to any SQL statements on the table at run time so that only those rows satisfying the WHERE clause can be accessed.

You can use this feature to avoid specifying the desired language of a user in the WHERE clause in every SELECT statement in your applications. The following WHERE clause limits the view of a table to the rows corresponding to the desired language of a user:
WHERE language_id = sys_context('userenv', 'LANG')


Specify this WHERE clause as a fine-grained access policy for product_information as follows:
CREATE OR REPLACE FUNCTION func1 (sch VARCHAR2, obj VARCHAR2)
RETURN VARCHAR2
AS
BEGIN
  RETURN 'language_id = sys_context(''userenv'', ''LANG'')';
END;
/

DBMS_RLS.ADD_POLICY ('scott', 'product_information', 'lang_policy', 'scott', 'func1', 'select');


Then the WHERE clause is automatically appended to any SELECT statement on the product_information table.
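
For illustration, with the policy in place a plain query behaves as if the predicate had been written out (a sketch of the effect, not of Oracle's internal rewrite):

-- What the application issues:
SELECT product_name FROM scott.product_information;

-- What is effectively executed:
SELECT product_name FROM scott.product_information
WHERE language_id = sys_context('userenv', 'LANG');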

See Also:
Oracle Database Application Developer's Guide - Fundamentals for more information about fine-grained access control

Storing Documents in Multiple Languages in LOB Datatypes

You can store documents in multiple languages in CLOB, NCLOB, or BLOB datatypes and set up Oracle Text to enable content search for the documents.

Data in CLOB columns is stored in a format that is compatible with UCS-2 when the database character set is multibyte, such as UTF8 or AL32UTF8. This means that the storage space required for an English document doubles when the data is converted. Storage for an Asian language document in a CLOB column requires less storage space than the same document in a LONG column using UTF8, typically around 30% less, depending on the contents of the document.

Documents in NCLOB format are also stored in a proprietary format that is compatible with UCS-2, regardless of the database character set or national character set. The storage space requirement is the same as for CLOB data. Document contents are converted to UTF-16 when they are inserted into an NCLOB column. If you want to store multilingual documents in a non-Unicode database, then choose NCLOB. However, content search on NCLOB is not yet supported.

Documents in BLOB format are stored as they are. No data conversion occurs during insertion and retrieval. However, SQL string manipulation functions (such as LENGTH or SUBSTR) and collation operations (such as NLSSORT and ORDER BY) cannot be applied to the BLOB datatype.

Table 6-7 lists the advantages and disadvantages of the CLOB, NCLOB, and BLOB datatypes when storing documents:

Table 6-7 Comparison of LOB Datatypes for Document Storage

CLOB
  Advantages:    Content search support; string manipulation support
  Disadvantages: Depends on database character set; data conversion is
                 necessary for insertion; cannot store binary documents

NCLOB
  Advantages:    Independent of database character set; string manipulation
                 support
  Disadvantages: No content search support; data conversion is necessary for
                 insertion; cannot store binary documents

BLOB
  Advantages:    Independent of database character set; content search
                 support; no data conversion, data stored as is; can store
                 binary documents such as Microsoft Word or Microsoft Excel
  Disadvantages: No string manipulation support

Creating Indexes for Searching Multilingual Document Contents

Oracle Text enables you to build indexes for content search on multilingual documents stored in CLOB format and BLOB format. It uses a language-specific lexer to parse the CLOB or BLOB data and produces a list of searchable keywords.

Create a multilexer to search multilingual documents. The multilexer chooses a language-specific lexer for each row, based on a language column. This section describes the high-level steps to create indexes for documents in multiple languages. It contains the following topics:

Creating Multilexers

Creating Indexes for Documents Stored in the CLOB Datatype

Creating Indexes for Documents Stored in the BLOB Datatype

See Also:
Oracle Text Reference

Creating Multilexers

The first step in creating the multilexer is to create a language-specific lexer preference for each supported language. The following example creates English, German, and Japanese lexer preferences with PL/SQL procedure calls (run the ctx_ddl calls in this section from an anonymous PL/SQL block or with the SQL*Plus EXECUTE command):
ctx_ddl.create_preference('english_lexer', 'basic_lexer');
ctx_ddl.set_attribute('english_lexer','index_themes','yes');
ctx_ddl.create_preference('german_lexer', 'basic_lexer');
ctx_ddl.set_attribute('german_lexer','composite','german');
ctx_ddl.set_attribute('german_lexer','alternate_spelling','german');
ctx_ddl.set_attribute('german_lexer','mixed_case','yes');
ctx_ddl.create_preference('japanese_lexer', 'JAPANESE_VGRAM_LEXER');


After the language-specific lexer preferences are created, they need to be gathered together under a single multilexer preference. First, create the multilexer preference, using the MULTI_LEXER object:
ctx_ddl.create_preference('global_lexer','multi_lexer');


Now add the language-specific lexers to the multilexer preference using the add_sub_lexer call:
ctx_ddl.add_sub_lexer('global_lexer', 'german', 'german_lexer');
ctx_ddl.add_sub_lexer('global_lexer', 'japanese', 'japanese_lexer');
ctx_ddl.add_sub_lexer('global_lexer', 'default','english_lexer');


This nominates the german_lexer preference to handle German documents, the japanese_lexer preference to handle Japanese documents, and the english_lexer preference to handle everything else, using DEFAULT as the language.
Creating Indexes for Documents Stored in the CLOB Datatype

The multilexer decides which lexer to use for each row based on a language column in the table, a character column that stores the language of the document. Use the Oracle language name to identify the language of a document in this column. For example, if you use the CLOB datatype to store your documents, then add the language column to the table where the documents are stored:
CREATE TABLE globaldoc
(doc_id NUMBER PRIMARY KEY,
language VARCHAR2(30),
text CLOB);
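
A few hypothetical rows, using Oracle language names in the language column so that the multilexer can pick the matching sub-lexer (languages without a sub-lexer fall through to the default English lexer):

INSERT INTO globaldoc VALUES (1, 'GERMAN',   'Dies ist ein Beispieldokument.');
INSERT INTO globaldoc VALUES (2, 'JAPANESE', 'これはサンプル文書です。');
INSERT INTO globaldoc VALUES (3, 'ENGLISH',  'This is a sample document.');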


To create an index for this table, use the multilexer preference and specify the name of the language column:
CREATE INDEX globalx ON globaldoc(text)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('lexer global_lexer
               language column language');
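
Once the index is built, the documents can be searched with the Oracle Text CONTAINS operator; a minimal example (the search term is arbitrary):

SELECT doc_id, language
  FROM globaldoc
 WHERE CONTAINS(text, 'database') > 0;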
Creating Indexes for Documents Stored in the BLOB Datatype

In addition to the language column, character set and format columns must be added to the table where the documents are stored. The character set column stores the character set of each document, using the Oracle character set name. The format column specifies whether a document is a text or binary document. For example, the CREATE TABLE statement can specify columns called characterset and format:
CREATE TABLE globaldoc (
doc_id NUMBER PRIMARY KEY,
language VARCHAR2(30),
characterset VARCHAR2(30),
format VARCHAR2(10),
text BLOB
);


You can put word-processing or spreadsheet documents into the table and specify binary in the format column. Documents in HTML, XML, or plain-text format can be put into the table with text specified in the format column.

Because there is a column in which to specify the character set, you can store text documents in different character sets.
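
A hypothetical row for a plain-text Japanese document (JA16EUC is an Oracle character set name; loading the actual document bytes into the BLOB, for example with DBMS_LOB, is omitted here):

INSERT INTO globaldoc (doc_id, language, characterset, format, text)
  VALUES (1, 'JAPANESE', 'JA16EUC', 'TEXT', EMPTY_BLOB());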

When you create the index, specify the names of the format and character set columns:
CREATE INDEX globalx ON globaldoc(text)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('filter inso_filter
               lexer global_lexer
               language column language
               format column format
               charset column characterset');


You can use the charset_filter if all documents are in text format. The charset_filter converts data from the character set specified in the charset column to the database character set.
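
A sketch of that variant, assuming every row holds a text document (the preference name global_filter is invented; CHARSET_FILTER and its charset attribute are part of Oracle Text, and the per-row charset column overrides the attribute):

-- Create a character set filter preference (PL/SQL calls, as above):
ctx_ddl.create_preference('global_filter', 'charset_filter');
ctx_ddl.set_attribute('global_filter', 'charset', 'UTF8');

-- Index using the charset filter in place of inso_filter; the format
-- column is no longer needed because all documents are text:
CREATE INDEX globalx ON globaldoc(text)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('filter global_filter
               lexer global_lexer
               language column language
               charset column characterset');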