Sunday, April 10, 2011

SQL Plan Management

Overview

The goal of this page is to help teams stabilize their query execution plans. It contains scripts, queries, and documents that should help DBAs and SDEs build a more stable service.

Automatic Plan Capture

alter system set optimizer_capture_sql_plan_baselines = true scope = both;
alter system set optimizer_use_sql_plan_baselines = true scope = both;

Queries with plan recommendations

select sql_handle, plan_name, fixed
from dba_sql_plan_baselines
where accepted = 'NO'
and enabled = 'YES'
and creator = 'OLTP_USER'
and module = 'ServiceName';

More details about plan recommendation

select sql_text, plan_name, enabled, accepted, fixed, autopurge
from dba_sql_plan_baselines
where sql_handle = 'SYS_SQL_008fa0f9324d1f14';

Compare execution plans

select * from table(
dbms_xplan.display_sql_plan_baseline(
sql_handle=>'SYS_SQL_008fa0f9324d1f14',
format=>'basic'));

Compare all execution plans

select t.plan_table_output
from (
select distinct sql_handle
from dba_sql_plan_baselines
where accepted = 'NO' and creator = 'OLTP_USER') pb,
table(dbms_xplan.display_sql_plan_baseline(pb.sql_handle, NULL, 'basic')) t;

Identify sql_handle for given query in plan baselines

select sql_text from v$sql
where sql_id = 'anuq0jdxngy4k';

select sql_handle, plan_name, enabled, accepted, fixed from dba_sql_plan_baselines
where sql_text like '% %';  -- sql_text is a CLOB, so use LIKE with a fragment of the text from v$sql

Evolve execution plans

set serveroutput on
set long 10000
declare
report clob;
begin
report := dbms_spm.evolve_sql_plan_baseline(
sql_handle => 'SYS_SQL_006093f2f146bd13');
dbms_output.put_line(report);
end;
/

Evolve all execution plans

var report clob;
exec :report := dbms_spm.evolve_sql_plan_baseline();
print report

Drop plan baseline

set serveroutput on
declare
  l_plans_dropped pls_integer;
begin
  l_plans_dropped := dbms_spm.drop_sql_plan_baseline(
    sql_handle => 'SYS_SQL_006093f2f146bd13',
    plan_name  => 'SYS_SQL_PLAN_0b22ad06749f2b9c');
  dbms_output.put_line('Plans dropped: ' || l_plans_dropped);
end;
/

Drop all baselines

set serveroutput on
declare
  cursor delete_cursor is
    select distinct sql_handle, plan_name from dba_sql_plan_baselines;
  l_plans_dropped pls_integer;
begin
  for c_pdr in delete_cursor loop
    l_plans_dropped := dbms_spm.drop_sql_plan_baseline(c_pdr.sql_handle, c_pdr.plan_name);
    dbms_output.put_line('Query: ' || c_pdr.sql_handle || ' plan: ' || c_pdr.plan_name);
  end loop;
end;
/

Alter plan baseline

set serveroutput on
declare
l_plans_altered pls_integer;
begin
l_plans_altered := dbms_spm.alter_sql_plan_baseline(
sql_handle => 'SYS_SQL_006093f2f146bd13',
plan_name => 'SYS_SQL_PLAN_d90440b9ed3324c0',
attribute_name => 'ENABLED',
attribute_value => 'NO');
dbms_output.put_line('Plans Altered: ' || l_plans_altered);
end;
/
set serveroutput on
declare
l_plans_altered pls_integer;
begin
l_plans_altered := dbms_spm.alter_sql_plan_baseline(
sql_handle => 'SYS_SQL_006093f2f146bd13',
plan_name => 'SYS_SQL_PLAN_d90440b9ed3324c0',
attribute_name => 'ACCEPTED',
attribute_value => 'NO');
dbms_output.put_line('Plans Altered: ' || l_plans_altered);
end;
/

Query used in package

select t.plan_table_output
from (
select distinct sql_handle
from dba_sql_plan_baselines
where accepted = 'NO' and creator = 'OLTP_USER'
and last_verified is null) pb,
table(dbms_xplan.display_sql_plan_baseline(pb.sql_handle, NULL, 'basic')) t;

Turn off plan capture and use

alter system set optimizer_capture_sql_plan_baselines = false scope = both;
alter system set optimizer_use_sql_plan_baselines = false scope = both;

ORA-38141

When you see this error:

ORA-38141: SQL plan baseline SQL_PLAN_gzs1skpuycuhdfe752e07 does not exist

Causes of the error:

  • failing over to a standby without the parameters set properly
  • renaming a baseline plan

Background:

SQL Plan Baselines use outline tables. There is no one-to-one mapping between sql_id and sql_handle. When Oracle evolves a plan it obtains a signature, then uses that signature to build the outline data:

SELECT (xmltype(od.comp_data).extract('/outline_data')).getClobVal()
FROM sys.sqlobj$data od
WHERE od.signature = 12104247321009801218;

Then it prepares the record:

SELECT PLAN_TABLE_OUTPUT FROM TABLE(CAST(DBMS_XPLAN.PREPARE_RECORDS(:B1 , :B2 ) AS SYS.DBMS_XPLAN_TYPE_TABLE))

So if you get into this trouble, you will have to drop the incorrect plan names and recreate them. I couldn't find a better solution.

alter system set optimizer_capture_sql_plan_baselines = false;
alter system set optimizer_use_sql_plan_baselines = false;

declare
  cursor delete_cursor is
    select distinct sql_handle, plan_name
    from dba_sql_plan_baselines
    where rownum < 10000;
  l_plans_dropped pls_integer;
begin
  for c_pdr in delete_cursor loop
    l_plans_dropped := dbms_spm.drop_sql_plan_baseline(c_pdr.sql_handle, c_pdr.plan_name);
    -- dbms_output.put_line('Query: ' || c_pdr.sql_handle || ' plan: ' || c_pdr.plan_name);
  end loop;
end;
/
alter system set optimizer_capture_sql_plan_baselines = true scope = both;
alter system set optimizer_use_sql_plan_baselines = true scope = both;

ORA-06502

SYS_SQL_a76d706cc288f226 An uncaught error happened in display_sql_plan_baseline : ORA-06502: PL/SQL: numeric or value error

This is a known Oracle bug: Bug 8366552 (base Bug 7459168).

The workaround is to drop all baseline plans for that handle.
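
A minimal sketch, using the handle from the error above; passing NULL for plan_name makes dbms_spm.drop_sql_plan_baseline drop every plan under the handle:

set serveroutput on
declare
  l_plans_dropped pls_integer;
begin
  l_plans_dropped := dbms_spm.drop_sql_plan_baseline(
    sql_handle => 'SYS_SQL_a76d706cc288f226',
    plan_name  => NULL);
  dbms_output.put_line('Plans dropped: ' || l_plans_dropped);
end;
/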

Friday, September 24, 2010

STOPKEY in Top N, Pagination and Iteration queries



Web service or reporting teams frequently need to display the top N or last N rows from a table, or from a group of joined tables, with some criteria applied.

Engineers commonly make the following mistake when designing queries and indexes to solve the Top N problem: they create an index matching only the leading predicate.

Common misconceptions:

  • The order of the columns in an index is not important.
  • When building an index, we only need the selective columns used as query predicates.
  • select * from table where rownum < N, combined with an ORDER BY, returns the Top N rows.
  • The service has been tested and it works, until a customer with many records logs in.

How to identify the problem:

If Top N is not implemented, or is not implemented properly, the query can cause problems for both the application and the database. It can usually be recognized by increased customer latencies, by high CPU utilization when all sessions keep reading the same blocks, or by high I/O when each request asks for new data:

  • Latency
  • High user CPU
  • High I/O

Incorrect implementation:

The order of evaluation is WHERE, then ROWNUM, then ORDER BY: the data is read and cut off at N rows first, and only then sorted, so we end up with N arbitrary rows.
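
A sketch of that shape, using a hypothetical orders table:

select *
from orders
where customer_id = :cust
and rownum <= 5
order by order_date desc;
-- ROWNUM cuts the result to 5 arbitrary rows first; ORDER BY then sorts only those 5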


Correct but inefficient:

We pull the ORDER BY into an inner query and apply ROWNUM in the outer query. Now we get the correct data, but it is not efficient: the index is built on the predicate alone, which is not good enough for Top N.
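
The same hypothetical query, now correct:

select *
from (select *
      from orders
      where customer_id = :cust
      order by order_date desc)
where rownum <= 5;
-- correct result, but with an index on customer_id alone Oracle must fetch
-- and sort every matching row before the first 5 can be returned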


To have an efficient query, we have to build the index on both the WHERE clause column and the ORDER BY column. The query can then walk the index and stop when the counter reaches 5, so STOPKEY can do its job. How do we know it worked: the plan no longer contains a SORT ORDER BY STOPKEY step.
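
A sketch of such an index for the hypothetical query above:

create index orders_cust_date_ix on orders (customer_id, order_date);
-- the inner ORDER BY order_date desc is satisfied by walking this index
-- backwards (INDEX RANGE SCAN DESCENDING), so COUNT STOPKEY can stop
-- the scan after 5 rows with no sort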


So now that we have learned how to use STOPKEY and how to write an efficient Top N query, let's see how we can do it with pagination.

First, what we can't do: we can't start counting from the lower limit.
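
For example, this hypothetical attempt returns no rows at all, because ROWNUM is assigned as rows are produced: the first candidate row gets ROWNUM = 1 and immediately fails the filter:

select *
from (select *
      from orders
      where customer_id = :cust
      order by order_date desc)
where rownum > 10 and rownum <= 20;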


We have to fetch all the data up to the upper limit and only then filter out the bottom. Now that we know how to write pagination correctly, we also understand that we can't use it to paginate through an entire data set: every request reads all the rows up to the upper limit and throws away everything below the lower limit.
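
The classic correct form, sketched on the same hypothetical table: materialize ROWNUM in a middle query, stop at the upper limit, then filter out the rows below the lower limit:

select *
from (select t.*, rownum rn
      from (select *
            from orders
            where customer_id = :cust
            order by order_date desc) t
      where rownum <= :upper_limit)
where rn > :lower_limit;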


We have to iterate. How can we use Top N (STOPKEY) to achieve this? By remembering where we stopped last time and including that unique value in our query and our index. The index is now built on the predicates plus the unique ORDER BY value. Since you want to iterate through all the data, you can usually ORDER BY the same unique value you use to remember where you finished in the last iteration.
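
A sketch, assuming order_id is the unique value and the index is built on (customer_id, order_id):

select *
from (select *
      from orders
      where customer_id = :cust
      and order_id > :last_order_id  -- where we stopped in the last iteration
      order by order_id)
where rownum <= 100;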


So just remember the last value and pass it back in. You will not see SORT ORDER BY STOPKEY in the plan, and as you iterate through the data the number of consistent gets stays the same. That is what we want.

Now for more examples with different requirements. What if we have a predicate that requires a range scan that is not in sync with the ORDER BY column? The index is not sorted the way we want: its first column has multiple values and the index is sorted on that column, while we need the data sorted on the second column.

Flip the order of the columns in the index. Now we have both a usable index and properly sorted data.
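
A hypothetical example: the predicate is status in ('NEW', 'RETRY') and we sort by order_date. An index on (status, order_date) returns rows sorted within each status value only, so Oracle must sort; flipping it lets the query walk the index in ORDER BY order and filter status along the way:

create index orders_date_status_ix on orders (order_date, status);

select *
from (select *
      from orders
      where status in ('NEW', 'RETRY')
      order by order_date)
where rownum <= 5;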

One more example:

If the query joins multiple tables, the ORDER BY must be in sync with the order in which the query reads the data. As long as we can follow the first index, step into the second index and follow it, step into the third index and follow it, and come back, and all of this matches the ORDER BY clause, STOPKEY will work.
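
A sketch with two hypothetical tables: if the outer table is driven by an index on orders (customer_id, order_date) and the join steps into order_items through an index on (order_id), the rows come back already in ORDER BY order and STOPKEY can end the join early:

select *
from (select o.order_date, i.product_id
      from orders o, order_items i
      where o.customer_id = :cust
      and i.order_id = o.order_id
      order by o.order_date desc)
where rownum <= 5;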


So finally how do we know it works?

Bad execution plan:
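
An illustrative sketch of the shape (using the hypothetical orders examples above), with the sort step that tells us the index could not satisfy the ORDER BY:

SELECT STATEMENT
  COUNT STOPKEY
    VIEW
      SORT ORDER BY STOPKEY
        TABLE ACCESS FULL ORDERS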

Good execution plan:
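
And the shape we want, where the index walk replaces the sort so STOPKEY can end the scan early:

SELECT STATEMENT
  COUNT STOPKEY
    VIEW
      TABLE ACCESS BY INDEX ROWID ORDERS
        INDEX RANGE SCAN DESCENDING ORDERS_CUST_DATE_IX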

Rule:

The following must match:
  • the combination of the leading predicate columns and all ORDER BY columns
  • all of the index columns
  • both sorted in the same direction

Quick check: if your execution plan contains the following step, your implementation is incorrect:


SORT ORDER BY STOPKEY



Thursday, September 23, 2010

Data Scaling with Producer Consumer



Most applications that work with data eventually need to process more data than their initial solution was built for, so they start thinking about scaling. Here I will try to explain what not to do and what I would recommend.

First we have to identify components:


Usually, at the start, the application is built to process data synchronously: the same process fetches the data, processes it, and stores the result.


Due to data growth, or for some other reason, we reach the point where we can't process data as fast as new requests arrive, so we start falling behind. The first approach is usually to separate the producer and the consumer, since they are logically different things and one is usually faster than the other.

Now we have asynchronous processing: the Producer brings the data in and the Consumer processes it.

The problem is that the most frequent solution has the Producer store data into a table and the Consumer scan that same table for data to process, marking each record as processed once the work completes. As the data grows, it becomes harder and harder to find the rows that still have to be processed.


The solution is to put a queue table between the producer and the consumer. The Producer stores data into the queue; the Consumer picks it up, does the processing, and stores the result into the storage table. The queue record is removed after processing.


To scale Producers, all we have to do is bring up more instances of the same process.


The queue table needs a unique identifier, information about which Consumer owns the record, the time when the record becomes ready for processing, and a release version number. To scale the queue, all we have to do is create a new one: we scale horizontally.
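
A minimal sketch of such a queue table (all names hypothetical):

create table work_queue (
  queue_id    number primary key,    -- unique identifier
  payload     varchar2(4000),        -- the work to process
  consumer_id varchar2(30),          -- which Consumer picked the record up
  ready_time  date default sysdate,  -- when the record is ready for processing
  release_ver number                 -- release version number
);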


The real problem is how to scale Consumers.



The first solution is usually to create more instances of the same process. But Oracle reads are read-consistent, so most likely all consumers will pick up the same records.

To avoid different processes updating the same rows, developers implement optimistic locking: checking that a column value is still the same as when they picked the record up.

This prevents the multiple updates, but the work is still wasted and it will not scale.
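
A sketch of that check against the hypothetical work_queue table above; if another consumer got there first, the update matches zero rows:

update work_queue
set consumer_id = :me,
    release_ver = release_ver + 1
where queue_id = :id
and release_ver = :version_seen_at_read;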


The next solution I see, which prevents the first problem, is to lock the records so another consumer doesn't repeat the work. One Consumer picks up the data and locks the records. The remaining N-1 Consumers come looking for the same records and try to lock them. This causes a spin lock, and it is easy to identify by lock waits or high system CPU.


To avoid this, SELECT ... FOR UPDATE NOWAIT is used. Now we have prevented the spin locks, but we still haven't scaled: because of Oracle's consistent reads, we have to get our consumers to look for different records.
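
Sketched against the hypothetical queue table:

select queue_id, payload
from work_queue
where ready_time <= sysdate
for update nowait;
-- raises ORA-00054 immediately instead of blocking when rows are already locked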


We have to iterate through the data. Consumer I gets rows 1-M. Consumers II through N come along, see rows 1-M locked, and next time move on to rows (M+1)-(2M), and so on: one Consumer gets each batch and the next has to increment again. This scales, but it is not efficient: Consumer N has to read rows 1-M, (M+1)-(2M), ..., ((N-1)M+1)-(NM). That is a lot of data.


To avoid this, we use SELECT ... FOR UPDATE SKIP LOCKED. Now we can process much more data simply by increasing the number of Consumers.
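
The same hypothetical select with SKIP LOCKED: each consumer silently skips rows another consumer has locked and locks its own batch:

select queue_id, payload
from work_queue
where ready_time <= sysdate
for update skip locked;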


In the next blog post I will try to explain how to iterate through data without the need to use SELECT FOR UPDATE.