Sunday, September 20, 2009

Hang Analyze for Oracle Database

The following blog was created because of the difficulty in getting Oracle give a reason for performance issues that are not repeatable. Normally when an SR is raised, after quite some time Oracle would ask for a list of files that we should have taken during the performance issue window.

The following steps would be best to find the root cause of a performance issue:

1) Using Truss/ Trace on the DB Process in the server

- Attaching to an existing process
- Use 'ps' to identify the process you wish to trace.
Eg: ps -ef | grep oracleTEST

Log in as a user who has permission to control the process you want to trace. This means logging in as either ROOT or the process owner.

Find a writable directory which has lots of spare space. Eg: We assume here /tmp has lots of free space.

Enter the relevant command:

truss -o /tmp/mytruss -p

or:

trace -o /tmp/mytruss -p

To follow child processes:

truss -o /tmp/mytruss -fae -p

This will run until the process (and any traced children) exit. You can interrupt the capture of information if only a snapshot of the trace is required.

2) Use of Hang Analyze in the Database

- Run the following in the Database which is hanging or in which a performance
issue is faced:

sqlplus '/ as sysdba'
oradebug setmypid
oradebug unlimit
oradebug hanganalyze 4
oradebug dump systemstate 266
quit

NOTE: THE SYSTEMSTATE should be taken carefully as it would bring the Instance/ Database to a standstill.

3) Upload files to Oracle Support

- Upload the relevant trace files created in the /tmp and udump folder to My Oracle Support for analysis


A Sample Action plan that I got for an issue is given below. I would suggest that it is best to follow this so that we do not end up Oracle telling us to do this N days after the issue was reported, by when this would be too late:

ACTION PLAN
===========
Steps suggested by Oracle Support during a performance issue is as follows:

Provide the 10046 trace for the slow select/update.
• To gather 10046 trace at the session level:
alter session set tracefile_identifier='10046';

alter session set timed_statistics = true;
alter session set statistics_level=all;
alter session set max_dump_file_size = unlimited;

alter session set events '10046 trace name context forever,level 12';

-- Execute the queries or operations to be traced here --

select * from dual;
exit;

Also provide the errorstack on the hung session.

connect / as sysdba
oradebug setospid
oradebug unlimit
oradebug dump errorstack 3
----wait for 10 secs----
oradebug dump errorstack 3
----wait for 10 secs----
oradebug dump errorstack 3
----wait for 10 secs----

Upload the traces generated in udump to metalink.

Using SQL*Plus connect as SYSDBA using the following command:
sqlplus " / as sysdba"
Do this 2 times in 2 separate windows, creating 2 sqlplus sessions (SQL1 and SQL2)
In SQL1 gather the hanganalyze by executing the following:
SQL1> oradebug setmypid
SQL1> oradebug unlimit;
SQL1> oradebug hanganalyze 3

In SQL2 gather the systemstates by executing the following:
SQL2> oradebug setmypid
SQL2> oradebug unlimit;
SQL2> oradebug dump systemstate 266

Gather a second hang analyze having waited at least 1 minute to give time to identify process state change
s. In SQL1 execute the following:
SQL1> oradebug hanganalyze 3

In SQL2 execute the following to collect a second systemstate dump:
SQL2> oradebug setmypid
SQL2> oradebug unlimit;
SQL2> oradebug dump systemstate 266

No comments:

Post a Comment