本帖最后由 群发软件 于 2017-8-17 23:16 编辑
oracle诊断工具终于实现了全自动化并且更加快捷
RDA是Remote Diagnostic Agent 的简称,是oracle用来收集、分析数据库的工具,运行该工具不会改变系统的任何参数,RDA收集的相关数据非常全面,可以简化我们日常监控、分析数据库的工作,Oracle Support也建议我们在反馈相关问题时,提供RDA收集的数据,这样可以为快速解决问题提供一个有力的保证。
一、支持的操作系统
IBM AIX
Intel Linux (RedHat and SuSE)
HP-UX (10.* and 11.*)
HP Tru64
Sequent Dynix/Ptx
Sun Solaris (2.6 - 2.10)
Windows XP Professional
Windows 2000 Workstation and Server
Windows 2003 Server
二、支持的Oracle产品
Oracle RDBMS Server (Standard and Enterprise Editions)
Oracle RAC Cluster (Single Node Collection)
Oracle Application Server (iAS 1.0.2.x/9.0.x/10.1.2.x, HTTP Server)
Oracle Management Server and Intelligent Agent (Grid Server, Agent Server, DB Control)
OLAP Products (Express Server, Financial Analyzer, and Demand Planning Server)
Oracle Developer (Forms and Reports)
Oracle Collaboration Suites (Email Server and Internet Directory)
Oracle Networking products
三、主要用于诊断数据库的以下方面的问题
性能、安装、配置、升级、移植以及内部错误问题。
四、RDA的安装(版本4.12-080530)
工具下载: metalink上(需要metalink账号)
在Unix操作系统的安装说明:
1、工具下载完后不要在window操作系统下做解压缩,脚本的文件格式会破坏。
2、ftp上传是采用二进制方式。
3、执行解压缩命令
$ gunzip rda.tar.gz
$ tar xvf rda.tar
4、测试是否正常
$ cd rda
$ ./rda.sh -c
五、RDA的使用
在Unix操作系统中,oracle推荐在数据库创建用户下执行(一般即为oracle用户),第一次使用需要做一个采集的初始配置。
执行命令:
$ ./rda.sh -S
执行命令后,会有个向导化的界面让你根据当前数据库的实际安装的产品完成初始配置。 完成配置后rda目录下会生成配置文件 setup.cfg、setup.bak, 如果需要重新调整配置参数,可以直接修改该文件。
配置过程一般是默认值,或者按照提示输入参数,特殊地方如下:
-------------------------------------------------------------------------------
Enter an Oracle User ID (userid only) to view DBA_ and V$ tables
If RDA will be run under the Oracle software owner's ID, enter a '/' here,
and select Y at the SYSDBA prompt to avoid being prompted for the database
password at runtime.
Hit 'Return' to accept the default (system):/
Is '/' a sysdba user (will connect as sysdba) (Y/N)?
Hit 'Return' to accept the default (N):
> Y
-------------------------------------------------------------------------------
初始配置完毕后要采集数据,执行命令:
$ ./rda.sh
如果要看到详细的采集过程,可以执行命令:
$ ./rda.sh -v
Enter sysman user password:
Please re-enter it to confirm:
在采集过程中有可能需要你输入数据库相关用户的密码。
采集完成后,在rda目录下根据初始配置文件会生成格式为zip的报告文件以及采集日志, 默认文件位置 ./output/RDA.$machine_name.zip。
解压缩该文件,访问 <report_group>__start.htm,就可以看到这次采集的数据的详细信息。
$ cd output
$ ls -l *zip
-rw-r--r-- 1 oracle oinstall 1911090 11月 28 14:49 RDA.RDA_game.zip
$ sz RDA.RDA_game.zip
使用时的注意点:
1、有时不能生成zip格式的报告文件,这跟客户端是否安装压缩工具有关,需要自行解压缩相关报告文件。
2、如果需要安装已生成的配置再次采集,删除已生成的报告文件,然后执行命令:
$ ./rda.sh -Svf
重新设置后再执行采集命令:
$ ./rda.sh -v
六、采集数据分析
报告文件主要包括数据库对应机器的操作系统软、硬件环境、用户设置以及数据库详细的信息(初始参数、运行事件、日常监控等)。
在日常的维护当中,我们主要关注点:
1、关注RDBMS下的几个统计信息
Database SPFile Parameters spfile中数据库相关初始参数的设置
SGA Information 查看系统全局区当前内存信息
Sessions and Processes 采集时间内的数据库会话相关信息
V$System_Event 数据库实例整个运行期间所有进程事件的等待统计视图
V$Session_Wait 会话等待事件以及相关定位信息数据
Latch Information 竞争相关信息(翻译不是很准确)
Tablespaces 数据库表空间信息(表空间类型、对应数据文件使用率、是否可扩展等等)
Database Files 数据库对应数据文件信息(文件位置、使用率等)
Invalid Objects 失效的数据库对象,日常监控如果发现有失效,需要维护人员手工重新编译
all errors 日常执行的数据库错误
在 V$System_Event、 V$Session_Wait 主要关注(buffer busy waits、db file scattered read、db file sequential read、enqueue、free buffer waits、latch free、log file sync、log file paralle write 这些事件。
2、RDBMS Log/Trace Files 的信息
alert.log 数据库系统级日志
last errors 数据库最近的异常日志、根据trace文件位置,可以查看详细错误信息
--End--
ADDM(Automatic Database Diagnostic Monitor) 是植入oracle数据库的一个自诊断引擎.ADDM 通过检查和分析AWR获取的数据来判断Oracle数据库中可能的问题.
在Oracle9i及之前,DBA们已经拥有了很多很好用的性能分析工具,比如,tkprof、sql_trace、statspack、set event 10046&10053等等。这些工具能够帮助DBA很快的定位性能问题。但这些工具都只给出一些统计数据,然后再由DBA们根据自己的经验进行优化。
Oracle10g中推出了新的优化诊断工具:数据库自动诊断监视工具(Automatic Database Diagnostic Monitor :ADDM)和SQL优化建议工具(SQL Tuning Advisor: STA)。这两个工具的结合使用,能使DBA节省大量优化时间,也大大减少了系统宕机的危险。简单点说,ADDM就是收集相关的统计数据到自动工作量知识库(Automatic Workload Repository :AWR)中,而STA则根据这些数据,给出优化建议。
例如,一个系统资源紧张,出现了明显的性能问题,由以往的办法,做个一个statspack快照,等30分钟,再做一次。查看报告,发现’ db file scattered read’事件在top 5 events里面。根据经验,这个事件一般可能是因为缺少索引、统计分析信息不够新、热表都放在一个数据文件上导致IO争用等原因引起的。根据这些经验,我们需要逐个来定位排除,比如查看语句的查询计划、查看user_tables的last_analysed子段,检查热块等等步骤来最后定位出原因,并给出优化建议。但是,有了STA以后,它就可以根据ADDM采集到的数据直接给出优化建议,甚至给出优化后的语句。
ADDM能发现定位的问题包括:
•操作系统内存页入页出问题
•由于Oracle负载和非Oracle负载导致的CPU瓶颈问题
•导致不同资源负载的Top SQL语句和对象——CPU消耗、IO带宽占用、潜在IO问题、RAC内部通讯繁忙
•按照PLSQL和Java执行时间排的Top SQL语句.
•过多地连接 (login/logoff).
•过多硬解析问题——由于shared pool过小、书写问题、绑定大小不适应、解析失败原因引起的。
•过多软解析问题
•索引查询过多导致资源争用.
•由于用户锁导致的过多的等待时间 (通过包dbms_lock加的锁)
•由于DML锁导致的过多等待时间(例如锁住表了)
•由于管道输出导致的过多等待时间(如通过包dbms_pipe.put进行管道输出)
•由于并发更新同一个记录导致的过多等待时间(行级锁等待)
•由于ITL不够导致的过多等待时间(大量的事务操作同一个数据块)
•系统中过多的commit和rollback(logfile sync事件).
•由于磁盘带宽太小和其他潜在问题(如由于logfile太小导致过多的checkpoint,MTTR设置问题,过多的undo操作等等)导致的IO性能问题I
•对于DBWR进程写数据块,磁盘IO吞吐量不足
•由于归档进程无法跟上redo日至产生的速度,导致系统变慢
•redo数据文件太小导致的问题
•由于扩展磁盘分配导致的争用
•由于移动一个对象的高水位导致的争用问题
•内存太小问题——SGA Target, PGA, Buffer Cache, Shared Pool
•在一个实例或者一个机群环境中存在频繁读写争用的热块
•在一个实例或者一个机群环境中存在频繁读写争用的热对象
•RAC环境中内部通讯问题
•LMS进程无法跟上导致锁请求阻塞
•在RAC环境中由于阻塞和争用导致的实例倾斜
•RMAN导致的IO和CPU问题
•Streams和AQ问题
•资源管理等待事件
ADDM提供了一个整体的优化方案.基于一段时间内的AWR snapshots(默认一小时一次)可以执行ADDM 分析,它可以帮我们诊断在这段期间内数据库可能存在的瓶颈.
ADDM分析的目标是减小吞吐量的度量值, 在这里我们将它称为"DB TIME". DB TIME是一个累积值(数据库服务器处理用户请求所花费的时间). 它包括了等待时间和CPU处理的时间(针对所有活跃的用户进程而言),可以通过查询下面两个视图来获得它的值: V$SESS_TIME_MODEL, V$SYS_TIME_MODEL.
AWR收集的数据时放到内存中(share pool),通过一个新的后台进程MMON定期写到磁盘中。所以10g的share pool要求比以前版本更大,一般推荐比以前大15-20%。
注意: ADDM不会将处理用户响应时间作为调优的目标, 你应该使用"TRACE"技术来监控它.
通过减小"DB TIME", 使用同样多的系统资源,数据库服务器可以处理更多的用户请求,也就是提高了吞吐量. 通过ADDM报告的问题是按照DB time排序的.
二. ADDM 分析的结果
ADDM 分析的结果以一些"Finding"的样式来表达. 每个"Finding"都属于以下三种类型之一:
1. 问题: 描述了导致数据库性能问题的根源;
2. 征兆: 包含了可能导致其他问题的信息
3. 信息: 报告其他没有问题的模块
三. 设置ADDM
缺省情况下,ADDM已经被自动启用,通过初始化参数文件中的STATISTICS_LEVEL来控制.
这个参数应该被设置成TYPICAL或者ALL(缺省值是TYPICAL).如果你将这个参数设置成basic,很多Oracle的特性将被屏蔽.
ALTER SESSION SET STATISTICS_LEVEL= TYPICAL;
ADDM 对于I/O性能的评估分析在部分程度上依赖于这个DBIO_EXPECTED. 这个参数的含义是读取一个数据块所花费的平均时间(以微秒为单位). Oracle使用的是缺省值(10毫秒), 对于现在流行的硬盘来说, 这是一个比较合适的值.如果你的硬盘比较陈旧,或者你有一个非常好的RAM DISK,请修改这个值.
为了决定DBIO_EXPECTED这个参数该怎样去正确地配置,需要完成下面的步骤:
1. 基于你的机器的硬件,估量一下读取单个数据库块所花费的平均时间.
注意:这个度量应该针对随机的I/O(包括寻道的时间).传统的值应该属于5000-20000微秒这个区间.
2. 为接下来的ADDM执行设置一个时间参数. 例如:如果估计的值是8000微秒,你应该以SYS的身份执行
下面的过程:
EXECUTE DBMS_ADVISOR.SET_DEFAULT_PARAMETER ('ADDM','DBIO_EXPECTED',8000);
四. 通过Oracle Enterprise Manager来访问ADDM:
五. 诊断与ADDM相关的问题:
为了诊断数据库性能问题, ADDM分析可以跨越任意两个snapshots,只要它们满足下面两个条件:
1. 两个快照在创建过程中没有错误并且没有被删除;
2. 两个快照期间数据库不能发生关闭和启动的事件
(同statspack).
最简单的运行ADDM分析的方法就是运行Enterprise Manager.
另外,也可以手工地执行 $ORACLE_HOME/rdbms/admin/addmrpt.sql以及dbms_advisor包.
这些脚本和包可以被任何用户执行,只要它们被赋予了ADVISOR的角色.
5.1 使用addmrpt.sql来运行
和statspack包中的spreport.sql非常相似
5.2 使用dbms_advisor包:
基本步骤:
1) 创建一个task: dbms_advisor.create_task ;
2) 设置相关的参数:
START_SNAPSHOT,END_SNAPSHOT
(通过DBMS_ADVISOR.SET_TASK_PARAMETER来完成)
3) 执行这个task: DBMS_ADVISOR.E
六. 与 ADDM相关的视图:
DBA_ADVISOR_TASKS
DBA_ADVISOR_LOG
DBA_ADVISOR_RECOMMENDATIONS
DBA_ADVISOR_FINDINGS
七.工作采集、诊断过程
Oracle10g提供了一个图形化的界面(通过OEM),使这个工具使用起来非常简单。下面这里介绍一下如何通过sqlplus使用这个工具。
第一步:创建测试用的表
SQL> CREATE TABLE bigtab AS SELECT rownum as "id", a.* FROM dba_objects a;
Table created.
SQL> create table smalltab as select rownum as "id", a.* FROM dba_tables a;
Table created.
SQL> ALTER TABLE bigtab MODIFY (empno NUMBER);
Table altered.
SQL> DECLARE
2 n NUMBER;
3 BEGIN
4 FOR n IN 1..100
5 LOOP
6 INSERT INTO bigtab SELECT rownum as "id", a.* FROM dba_objects a;
7 COMMIT;
8 END LOOP;
9 END;
/
PL/SQL procedure successfully completed.
第二步:采集一次工作量快照
SQL> begin
2 dbms_workload_repository.create_snapshot('TYPICAL');
3 end;
4 /
PL/SQL procedure successfully completed.
第三步:进行一些高负荷操作
DECLARE
v_var number;
BEGIN
FOR n IN 1..6
LOOP
select count(*) into v_var from bigtab b, smalltab a;
END LOOP;
END;
/
PL/SQL procedure successfully completed.
第四步:再次采集一次工作量快照
要注意的是:两次快照之间的间隔时间必须足够(一般推荐30分钟左右),否则得到的ADDM报告中就会提示:THERE WAS NOT ENOUGH DATABASE TIME FOR ADDM ANALYSIS.
SQL> begin
2 dbms_workload_repository.create_snapshot('TYPICAL');
3 end;
4 /
PL/SQL procedure successfully completed.
第五步:创建一个优化诊断任务并执行
先获取到两次快照的ID:
SQL> select snap_id from
2 (SELECT * FROM dba_hist_snapshot
3 ORDER BY snap_id desc)
4 where rownum <=2;
SNAP_ID
--------
66
65
然后创建优化任务,并执行。
DECLARE
task_name VARCHAR2(30) := 'DEMO_ADDM01';
task_desc VARCHAR2(30) := 'ADDM Feature Test';
task_id NUMBER;
BEGIN
dbms_advisor.create_task('ADDM', task_id, task_name, task_desc, null);
dbms_advisor.set_task_parameter(task_name, 'START_SNAPSHOT', 65);
dbms_advisor.set_task_parameter(task_name, 'END_SNAPSHOT', 66);
dbms_advisor.set_task_parameter(task_name, 'INSTANCE', 1);
dbms_advisor.set_task_parameter(task_name, 'DB_ID', 1712582900);
dbms_advisor.execute_task(task_name);
END;
/
PL/SQL procedure successfully completed.
DBID 查看sql
SQL> select dbid from v$database;
DBID
----------
1712582900
其中,set_task_parameter是用来设置任务参数的。START_SNAPSHOT是起始快照ID,END_SNAPSHOT是结束快照ID,INSTANCE是实例号,对于单实例,一般是1,在RAC环境下,可以通过查询视图v$instance得到,DB_ID是数据库的唯一识别号,可以通过查询v$database查到。
第六步:查看优化建议结果
通知函数dbms_advisor.get_task_report可以得到优化建议结果。
SQL> SET LONG 1000000 PAGESIZE 0 LONGCHUNKSIZE 1000
SQL> COLUMN get_clob FORMAT a80
SQL> SELECT dbms_advisor.get_task_report('DEMO_ADDM01', 'TEXT', 'ALL') FROM DUAL;
DBMS_ADVISOR.GET_TASK_REPORT('
--------------------------------------------------------------------------------
DETAILED ADDM REPORT FOR TASK 'DEMO_ADDM01' WITH ID 243
-------------------------------------------------------
Analysis Period: 23-NOV-2005 from 15:02:27 to 16:06:42
Database ID/Instance: 1712582900/1
Database/Instance Names: EDGAR/edgar
Host Name: HUANGED
Database Version: 10.2.0.1.0
Snapshot Range: from 65 to 66
Database Time: 1463 seconds
Average Database Load: .4 active sessions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
FINDING 1: 100% impact (1463 seconds)
-------------------------------------
Significant virtual memory paging was detected on the host operating system.
RECOMMENDATION 1: Host Configuration, 100% benefit (1463 seconds)
ACTION: Host operating system was experiencing significant paging but no
particular root cause could be detected. Investigate processes that
do not belong to this instance running on the host that are consuming
significant amount of virtual memory. Also consider adding more
physical memory to the host.
FINDING 2: 100% impact (1463 seconds)
-------------------------------------
SQL statements consuming significant database time were found.
RECOMMENDATION 1: SQL Tuning, 68% benefit (998 seconds)
ACTION: Tune the PL/SQL block with SQL_ID "064wqx7c5b81z". Refer to the
"Tuning PL/SQL Applications" chapter of Oracle's "PL/SQL User's Guide
and Reference"
RELEVANT OBJECT: SQL statement with SQL_ID 064wqx7c5b81z
DECLARE
v_var number;
BEGIN
FOR n IN 1..10000
LOOP
select count(*) into v_var from bigtab b, smalltab a;
END LOOP;
END;
RECOMMENDATION 2: SQL Tuning, 67% benefit (986 seconds)
ACTION: Run SQL Tuning Advisor on the SQL statement with SQL_ID
"fvqfghq71cqns".
RELEVANT OBJECT: SQL statement with SQL_ID fvqfghq71cqns and
PLAN_HASH 3281046854
SELECT COUNT(*) FROM BIGTAB B, SMALLTAB A
RATIONALE: SQL statement with SQL_ID "fvqfghq71cqns" was executed 6
times and had an average elapsed time of 166 seconds.
FINDING 3: 69% impact (1002 seconds)
------------------------------------
Time spent on the CPU by the instance was responsible for a substantial part
of database time.
RECOMMENDATION 1: SQL Tuning, 67% benefit (986 seconds)
ACTION: Run SQL Tuning Advisor on the SQL statement with SQL_ID
"fvqfghq71cqns".
RELEVANT OBJECT: SQL statement with SQL_ID fvqfghq71cqns and
PLAN_HASH 3281046854
SELECT COUNT(*) FROM BIGTAB B, SMALLTAB A
RATIONALE: SQL statement with SQL_ID "fvqfghq71cqns" was executed 6
times and had an average elapsed time of 166 seconds.
RATIONALE: Average CPU used per execution was 162 seconds.
RECOMMENDATION 2: SQL Tuning, 2.1% benefit (30 seconds)
ACTION: Tune the PL/SQL block with SQL_ID "2b064ybzkwf1y". Refer to the
"Tuning PL/SQL Applications" chapter of Oracle's "PL/SQL User's Guide
and Reference"
RELEVANT OBJECT: SQL statement with SQL_ID 2b064ybzkwf1y
BEGIN EMD_NOTIFICATION.QUEUE_READY(:1, :2, :3); END;
RATIONALE: SQL statement with SQL_ID "2b064ybzkwf1y" was executed 125
times and had an average elapsed time of 0.26 seconds.
RATIONALE: Average CPU used per execution was 0.24 seconds.
FINDING 4: 2.2% impact (33 seconds)
-----------------------------------
PL/SQL execution consumed significant database time.
RECOMMENDATION 1: SQL Tuning, 2.2% benefit (33 seconds)
ACTION: Tune the PL/SQL block with SQL_ID "2b064ybzkwf1y". Refer to the
"Tuning PL/SQL Applications" chapter of Oracle's "PL/SQL User's Guide
and Reference"
RELEVANT OBJECT: SQL statement with SQL_ID 2b064ybzkwf1y
BEGIN EMD_NOTIFICATION.QUEUE_READY(:1, :2, :3); END;
RATIONALE: SQL statement with SQL_ID "2b064ybzkwf1y" was executed 125
times and had an average elapsed time of 0.26 seconds.
RATIONALE: Average time spent in PL/SQL execution was 0.26 seconds.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ADDITIONAL INFORMATION
----------------------
Wait class "Application" was not consuming significant database time.
Wait class "Commit" was not consuming significant database time.
Wait class "Concurrency" was not consuming significant database time.
Wait class "Configuration" was not consuming significant database time.
Wait class "Network" was not consuming significant database time.
Wait class "User I/O" was not consuming significant database time.
Session connect and disconnect calls were not consuming significant database
time.
Hard parsing of SQL statements was not consuming significant database time.
The analysis of I/O performance is based on the default assumption that the
average read time for one database block is 10000 micro-seconds.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TERMINOLOGY
-----------
DATABASE TIME: This is the ADDM's measurement of throughput. From the user's
point of view: this is the total amount of time spent by users waiting for
a response from the database after issuing a call (not including
networking). From the database instance point of view: this is the total
time spent by forground processes waiting for a database resource (e.g.,
read I/O), running on the CPU and waiting for a free CPU (run-queue). The
target of ADDM analysis is to reduce this metric as much as possible,
thereby reducing the instance's response time.
AVERAGE DATABASE LOAD: At any given time we can count how many users (also
called 'Active Sessions') are waiting for an answer from the instance. This
is the ADDM's measurement for instance load. The 'Average Database Load' is
the average of the the load measurement taken over the entire analysis
period. We get this number by dividing the 'Database Time' by the analysis
period. For example, if the analysis period is 30 minutes and the 'Database
Time' is 90 minutes, we have an average of 3 users waiting for a response.
IMPACT: Each finding has an 'Impact' associated with it. The impact is the
portion of the 'Database Time' the finding deals with. If we assume that
the problem described by the finding is completely solved, then the
'Database Time' will be reduced by the amount of the 'Impact'.
BENEFIT: Each recommendation has a 'benefit' associated with it. The ADDM
analysis estimates that the 'Database Time' can be reduced by the 'benefit'
amount if all the actions of the recommendation are performed.