【Oracle计数器】CPU used by this session详解

dawee 发表于 2007-6-19 10:24:29

查了资料，介绍说是“这是在用户调用开始和结束之间会话所占用的 CPU 时间（以 10 毫秒为单位）”，我整个场景跑下来，这个值一直处于85%，why？

rickyzhu 发表于 2007-6-19 13:51:09

说清楚点问题的来龙去脉好吗，你这样问问题无法回答。

Zee 发表于 2007-6-19 15:16:52

关注。sdlkfj8

shanxi 发表于 2007-6-19 15:32:10

ORACLE CPU过高的一次调整过程

发现ＣＰＵ占用率一般都维持在９０％以上，由于大量使用ＦＯＲ　ＵＰＤＡＴＥ，造成大量ＬＡＴＣＨ等等待现象，其中某些线程ＳＱＬ又运行非常频繁。很久以前就发现这些问题，给出相对意见，建议不采用ＦＯＲ　ＵＰＤＡＴＥ或在ＦＯＲ　ＵＰＤＡＴＥ上加ＮＯＷＡＩＴ（见http://spaces.msn.com/sunmoonking/blog/cns!E3BD9CBED01777CA!318.entry　　　“ELECT FOR UPDATE 相关的知识”一章　　），由于项目进程问题一直没进行程序修改，决定在这次解决掉，以解后顾之忧。

调整前要知道如下问题
   1，CPU是多少MHZ
   2，SERVER 端的CPU是否有负荷较重
   3，CLIENT 端的CPU是否有负荷较重
   4，空闲时间（如半夜）CPU使用率是否超过15%，如果超过则需要特别注意了
   5，CPU的PEAK LOAD
   6，CPU的IDLE STATE

影响ＣＰＵ的因素
1，高的无必要的解析会代价昂贵。
发现那些SQL运行了大量的PARSE
select sql_text, parse_calls, executions from v$sqlarea order by parse_calls desc;
SYS的总的PARSE情况
select name, value from v$sysstat where name = ’parse count%’;
只有硬解析才能减少，可以绑定变量，或增加每一个SESSION的CACHED CURSORS。
2，导致大量I/O的SQL也会明显占用CPU，如没有INDEX。BUFFER GETS一般会同CPU一块增长。可以通过v$sqlarea发现buffer_gets
3，其他等待时间，可以通过v$sesstat,v$sysstat查看

#VMSTAT 5 5 的CPU部分
   ususer用掉的
   sysystem用掉的
   id空闲

调整前
TOPAS
   Name          PIDCPU%PgSp Owner
   oracle    68841625.2 4.6 orasbp
   oracle    56965824.9 4.5 orasbp
   topas    676210 0.1 2.9 root
   syncd       77964 0.0 0.5 root
   hatsd    159792 0.0 8.3 root

或者ps aux|head
检查CPU数量
   /usr/sbin/bindprocessor -q
   The available processors are:0 1 2 3

STATSPACK的信息
                                                Total Wait wait Waits
Event                            Waits Timeouts Time (s) (ms) /txn
---------------------------- ------------ ---------- ---------- ------ --------
enqueue                         5,465    121    1,793 328    0.6
latch free                      2,986    2,669       21    7    0.3
根据消耗最多CPU的进程PID来得到SID详细信息
select s.sid from v$process p, v$session s
where s.paddr=p.addr and p.spid =&your_spid;
根据SID查SQL
SELECT SQL_TEXT from V$SQLTEXT_WITH_NEWLINES where HASH_VALUE
= (select sql_hash_value from v$session
where SID = <problem_SID_you_got_from_last_step>) ;
还有
select n.name,s.value
from v$statname n,V$sesstat s
where n.statistic# = s.statistic#
and value > 0
and s.sid = (select a.sid from v$process p,v$session a
where p.addr =a.paddr
and a.terminal = userenv(’terminal’))
order by n.class,n.name
用以上SQL完成SHELL(shell信息在后面whoit.sh)，运行
sh whoit.sh 688416来根据PID得到用户信息和SQL语句

在STATSPACK和我的whoit.sh都指定是这个ＳＱＬ的问题
SELECT demessageid, fromid, apptype, demessage,appversion,vovers
ion ,toid,tag FROM de_receivelogWHERE (status = :1 and rownum<
=1 and (dealtime is null or dealtime<sysdate) and delock=0) or (
delock=1 and Update_Date<sysdate-2/24) order by RECEIVETIME for
update

再看等待事件
select sid||' '||event||' '|| total_waits||' '||average_wait             from v$session_event where sid=25
SQL> /
SID||''||EVENT||''||TOTAL_WAITS||''||AVERAGE_WAIT
--------------------------------------------------------------------------------
25 latch free 46180 1
25 control file sequential read 4 0
25 log file sync 1 0
25 db file sequential read 202 0
25 db file scattered read 445 1
25 SQL*Net message to client 22 0
25 SQL*Net message from client 22 0

只有一个DEFAULT 池
NAME                                  HIT_RATIO
---------------------------------------- ----------
DEFAULT                               .88042806

设置db_keep_cache_size池，并ＫＥＥＰ表
ALTERTABLE customer STORAGE (BUFFER_POOLRECYCLE|KEEP|DFAULT)
SQL> analyze table sbpopt.de_receivelog compute statistics;
建立相关索引
alter tableTI_REPAIR_DEED storage(buffer_pool keep);
alter tablede_receivelogstorage(buffer_pool keep);

看到相关的等待都是LATCH FREE，enqueue，估计是由于SELECT FOR UPDATE并且全表扫描造成的
查看每个Session的CPU利用情况：
select ss.sid||' '||se.command||' '||ss.value CPU ||' '||se.username||' '||se.program
from v$sesstat ss, v$session se
where ss.statistic# in
(select statistic#
from v$statname
where name = 'CPU used by this session')
and se.sid=ss.sid
and ss.sid>6
order by ss.value

根据STATSPACK的HASH VALUE 用SQL>@sprepsql得到
STATSPACK SQL report for Hash Value: 1710202187Module: JDBC Thin Client
DB Name       DB Id Instance Inst Num Release Cluster Host
------------ ----------- ------------ -------- ----------- ------- ------------
SBP       3008872479 SBP             1 9.2.0.6.0 NO    svodbp01
Start Id Start Time       End Id    End Time    Duration(mins)
--------- ------------------- --------- ------------------- --------------
   44 23-May-06 14:13:01       45 23-May-06 14:28:00       14.98
SQL Statistics
~~~~~~~~~~~~~~
-> CPU and Elapsed Time are in seconds (s) for Statement Total and in
milliseconds (ms) for Per Execute
                                                   % Snap
                  Statement Total    Per Execute Total
                  ------------------------------------
   Buffer Gets:    6,938,821       7,608.4 63.34
      Disk Reads:             0          0.0 .00
Rows processed:          197          0.2
CPU Time(s/ms):          508          557.2
Elapsed Time(s/ms):          607          665.8
         Sorts:       1,292          1.4
   Parse Calls:          191             .2
   Invalidations:             0
   Version count:             1
Sharable Mem(K):          22
      Executions:          912
SQL Text
~~~~~~~~
SELECT demessageid, fromid, apptype, demessage,appversion,vovers
ion ,toid,tag FROM de_receivelogWHERE (status = :1 and rownum<
=1 and (dealtime is null or dealtime<sysdate) and delock=0) or (
delock=1 and Update_Date<sysdate-2/24) order by RECEIVETIME for
update nowait
===============================================================
15分钟内执行了912次，占用22K内存（不多），每次执行1.4个排序，共花了508秒，由于已经采取措施将此表ＫＥＥＰ与内存中，因此没有磁盘读，但每次的逻辑读7,608.4个块

然后将SQL语句的FOR UPDATE改成 FOR UPDATE NOWAIT；　　　　　这才是真正解决问题的地方
至此，通过观察ＣＰＵ一般维持在２５％以内。

等待事件
enqueue(队列)Indicates waits associated with internal queuing mechanism for locking various resources and components of Oracle. Please refer to Appendix B of the Oracle8i Reference Manual for the complete list of enqueues in Oracle.是一种保护共享资源的琐，锁可以保护纪录里面数据一类的共享资源，防止两个人同时修改该纪录，fifo，enqueue等待通常是 ST enqueue,HW enqueue ,TX4 enqueue。ST enqueue在数据字典管理方式表空间的物理空间分配和管理上发挥作用。当有此问题时可以转变为本地管理表空间或预先分配扩展。HW enqueue同段的高水位线一起使用，手工分配扩展可避免其上的等待。TX4 是最常见的，1，唯一索引的重复，需要COMMIT/ROLLBACK来释放ENQUEUE。2，多个并发的对同一位图索引片的修改，同上。3，多用户同时修改相同的数据块，如果已经没有空闲的ITL槽，则会引发一数据块的锁，运用增大initrans或maxtrans以容纳更多ITL槽的方法来解决，或增大PCTFREE。(ST:使用本地表空间或预先分配大扩展,HW预先分配扩展于高水水位线之上,TX4增大表或索引的initrans和maxtrans,TM为外键建立索引,察看日志放在快速磁盘上)
latch free(锁存器空闲，拴空闲) Indicates latch contention for the latch# that is being waited on. Ensure that you already have tuned the number of latches to their allowed maximums by setting the relevant init.ora parameters. If the problem persists, you should determine what is causing the contention for the latch and fix the underlying problem. Your goal should be to cure the disease not the symptom. A latch free event is a symptom of a bigger problem. For example, if the latch# derived from this is a library cache latch (assuming that the shared pool is configured appropriately), it may indicate a significant amount of hard parsing. This usually is a problem with applications that have hard-coded values in them. Either rewrite them with bind variables or upgrade to Oracle8i and use CURSOR_SHARING=force (or just look the other way).Latch Free是一种低级的同步锁机制，用以维持某些访寻和执行操作的顺序。Oracle通过enqueue来达成对重做线程/表/事务一类对象的并发使用，而通过Latch来达成对SGA中共享内存结构的保护。Latch速度快而成本低，往往通过单个的内存单元来实现。大部分Latch是互斥的。共享型Latch许可对其内存结构的并发读操作。当请求得到Latch但它已被其它进程占住时，将产生一条Latch free miss的纪录。大多数Latch问题关联到不使用绑定变量（Library Cache Latch），重做日志生成中问题（Redo Allocation Latch），内存缓冲区竞争问题（Cache Buffers LRU Chain）和内存缓冲区中‘热块’的数据块（Cache Buffers Chain）。也有一些Latch Waits是由软件错误而造成的。首先确认足够的LATCH数,然后确认那个锁存器(V$SESSION_WAIT的P2),再确认是什么引起的.

#cat whoit.sh
sqlplus /nolog <<EOF
connect / as sysdba
col machine format a30
col program format a40
set line 200
select sid,serial# ,username,osuser,machine,program,process,to_char(logon_time,'yyyy/mm/dd hh24:mi:ss')
         from v\$session where paddr in
         ( select addr from v\$process where spid in($1));
select sql_text from v\$sqltext_with_newlines
         where hash_value in
         (select SQL_HASH_VALUE from v\$session where
            paddr in (select addr from v\$process where spid=$1)
      )
         order by piece;
exit;
EOF

dawee 发表于 2007-6-21 17:44:03

原帖由 rickyzhu 于 2007-6-19 13:51 发表 http://bbs.51testing.com/images/common/back.gif
说清楚点问题的来龙去脉好吗，你这样问问题无法回答。

我其实就是想问这个指标是什么意思
我认为应该不是CPU的占用率，那应该是什么呢？

因为现在测试的一个系统，数据库服务器与应用服务器为同一台机器，在考虑CPU的占用率的时候，就要考虑到底是应用占用了CPU，还是数据库占用了CPU

Jerry_zhiyuan 发表于 2007-6-22 23:40:05

学习中

shanxi 发表于 2007-6-24 13:21:54

回复 #5 dawee 的帖子

没看到你上面那个回复有 CPU used by this session的详解吗?

不可教也!

dawee 发表于 2007-6-24 22:02:12

原帖由 shanxi 于 2007-6-24 13:21 发表 http://bbs.51testing.com/images/common/back.gif
没看到你上面那个回复有 CPU used by this session的详解吗?

不可教也!

首先感谢你的帮助

其次要说的是，你动辄就copy一个长篇大论出来，谁会有耐心看，恐怕你自己都没有搞的清清楚楚就直接贴过来给别个看

wenbei 发表于 2007-6-25 17:32:48

回复 #4 shanxi 的帖子

很好
值得学习

rhymesky 发表于 2007-7-2 13:01:24

谢谢分享！

shanxi 发表于 2007-7-2 13:25:12

回复 #8 dawee 的帖子

BackGround没有，怎么去理解这个概念
光给个名词解释肯定说不清楚

testtoone 发表于 2011-5-17 13:20:25

mark一下

页: [1]

51Testing软件测试论坛 's Archiver