How often should Oracle database statistics be run?

In your experience, how often should Oracle database statistics be run? Our team of developers recently discovered that statistics hadn't been run on our production box in over two and a half months. That sounds like a long time to me, but I'm not a DBA.

Tags: database, performance, oracle, statistics

Answers:


In the case of a data warehouse-type system, you can consider collecting no statistics at all and relying on dynamic sampling (setting optimizer_dynamic_sampling to level 2 or above).
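
For instance, a minimal sketch of turning dynamic sampling up, first per session and then instance-wide (the level value of 4 is illustrative; 2 is the default from 10g onward and only samples tables that have no statistics):

-- Experiment at the session level first; the level (here 4) is illustrative
ALTER SESSION SET optimizer_dynamic_sampling = 4;

-- Then, if the plans look good, set it instance-wide (requires ALTER SYSTEM privilege)
ALTER SYSTEM SET optimizer_dynamic_sampling = 4;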


What Oracle version are you using? Check this page, which refers to Oracle 10g:

http://www.acs.ilstu.edu/docs/Oracle/server.101/b10752/stats.htm

It says:

The recommended approach to gathering statistics is to allow Oracle to automatically gather the statistics. Oracle gathers statistics on all database objects automatically and maintains those statistics in a regularly-scheduled maintenance job.


With 10g and higher versions of Oracle, the optimizer needs up-to-date statistics on tables and indexes to make good execution-plan decisions. How often you collect statistics is a tricky call. It depends on your application, schema, data rate, and business practices. Some third-party apps that were written to be backward compatible with older versions of Oracle do not perform well with the new optimizer; those applications require that their tables have no stats at all, so that the database falls back to rule-based execution plans. On average, though, Oracle recommends that stats be collected on tables with stale statistics. You can set tables to be monitored, check their state, and have them analyzed if and when they go stale. Often that is enough; sometimes it is not. It really depends on your database.

For my database, we have a set of OLTP tables that need nightly stats collection to maintain performance. Other tables are analyzed once a week. On our large DW database, we analyze as needed, because the tables are too large for regular analysis without affecting overall database load and performance. So the correct answer is: it depends on the application, the rate of data change, and the business needs.
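
As an illustration of the monitor-then-gather-stale approach described above (the APP schema and app.orders table are placeholders; on 10g+ monitoring is effectively automatic when STATISTICS_LEVEL is TYPICAL or ALL, so the ALTER TABLE is largely redundant there):

ALTER TABLE app.orders MONITORING;  -- explicit opt-in; mostly redundant on 10g+

BEGIN
  DBMS_STATS.GATHER_SCHEMA_STATS(
    ownname => 'APP',
    options => 'GATHER STALE',  -- only objects whose statistics Oracle flags as stale
    cascade => TRUE);           -- refresh index statistics as well
END;
/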


Make sure to balance the risk that fresh statistics cause undesirable changes to query plans against the risk that stale statistics can themselves cause query plans to change.

Imagine you have a bug database with a table ISSUE and a column CREATE_DATE where the values in the column increase more or less monotonically. Now assume that there is a histogram on this column that tells Oracle that the values are uniformly distributed between January 1, 2008 and September 17, 2008. This makes it possible for the optimizer to reasonably estimate the number of rows that would be returned if you were looking for all issues created last week (i.e. September 7-13). If the application continues to be used and the statistics are never updated, though, this histogram will become less and less accurate. So the optimizer's estimates for "issues created last week" will drift further and further from reality, and the worsening estimates may eventually cause Oracle to change the query plan for the worse.
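
A sketch of refreshing the histogram in that scenario (resolving the schema via USER and the bucket count of 254 are illustrative choices, not part of the answer above):

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => USER,                                -- current schema, for illustration
    tabname    => 'ISSUE',
    method_opt => 'FOR COLUMNS CREATE_DATE SIZE 254'); -- rebuild the CREATE_DATE histogram
END;
/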


Whenever the data changes "significantly".

If a table goes from 1 row to 200 rows, that's a significant change. When a table goes from 100,000 rows to 150,000 rows, that's not a terribly significant change. When a table goes from 1000 rows all with identical values in commonly-queried column X to 1000 rows with nearly unique values in column X, that's a significant change.

Statistics store information about item counts and relative frequencies -- things that let the optimizer "guess" how many rows will match given criteria. When it guesses wrong, it can pick a very suboptimal query plan.
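
To put rough numbers on "significantly", you can compare the DML Oracle has tracked since the last analyze against the dictionary row counts; a sketch, assuming table monitoring is active (the default on 10g+):

-- Flush the in-memory monitoring counters so the view is current
EXEC DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

-- Accumulated DML since the last analyze, next to the known row count
SELECT m.table_name,
       m.inserts + m.updates + m.deletes AS dml_since_analyze,
       t.num_rows,
       t.last_analyzed
FROM   user_tab_modifications m
JOIN   user_tables t ON t.table_name = m.table_name
ORDER  BY dml_since_analyze DESC;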


When I was managing a large multi-user planning system backed by Oracle, our DBA had a weekly job that gathered statistics. Also, when we rolled out a significant change that could affect or be affected by statistics, we would force the job to run out of cycle to get things caught up.


Since Oracle 11g, statistics are gathered automatically by default.

Two Scheduler windows are predefined upon installation of Oracle Database:

  • WEEKNIGHT_WINDOW starts at 10 p.m. and ends at 6 a.m. every Monday through Friday.
  • WEEKEND_WINDOW covers whole days Saturday and Sunday.

When were statistics last gathered?

SELECT owner, table_name, last_analyzed FROM all_tables ORDER BY last_analyzed DESC NULLS LAST;  -- Tables
SELECT owner, index_name, last_analyzed FROM all_indexes ORDER BY last_analyzed DESC NULLS LAST; -- Indexes

Status of automated statistics gathering?

SELECT * FROM dba_autotask_client WHERE client_name = 'auto optimizer stats collection';
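
If that query shows the task disabled and you want it back on (11g+), DBMS_AUTO_TASK_ADMIN can re-enable it; a sketch:

BEGIN
  DBMS_AUTO_TASK_ADMIN.ENABLE(
    client_name => 'auto optimizer stats collection',
    operation   => NULL,
    window_name => NULL);  -- NULLs = enable the client across all maintenance windows
END;
/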

Window Groups?

SELECT window_group_name, window_name FROM dba_scheduler_wingroup_members;

Window Schedules?

SELECT window_name, start_time, duration FROM dba_autotask_schedule;

Manually gather Database Statistics in this Schema:

EXEC dbms_stats.gather_schema_stats(ownname=>NULL, cascade=>TRUE); -- ownname=>NULL means the current schema; cascade=>TRUE includes the tables' indexes too.

Manually gather Database Statistics in all Schemas:

-- Probably need to CONNECT / AS SYSDBA
EXEC dbms_stats.gather_database_stats;


Generally it's not recommended to gather statistics on the whole database very frequently unless you have a strong justification, such as bulk inserts or big data changes happening often. Gathering statistics that frequently MAY switch queries onto new, poor execution plans, which can cost you a lot of time tuning every affected query. This is why you should test the impact of gathering new statistics on a test database first. If you don't have the time or the manpower for that, you should at least keep a fallback plan by backing up the original statistics before you gather new ones; then, if the queries don't perform as expected after the new statistics, you can easily restore the originals.

There is a very useful script that can help you back up the original statistics, gather new ones, and produce the SQL commands needed to restore the original statistics in case things don't go as expected. You can find the script at this link: http://dba-tips.blogspot.com/2014/09/script-to-ease-gathering-statistics-on.html
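
If you'd rather do it by hand, the script's core steps map onto plain DBMS_STATS calls; a sketch, where the APP schema and STATS_BACKUP table names are placeholders:

-- 1. Create a holding table and export the schema's current statistics into it
EXEC DBMS_STATS.CREATE_STAT_TABLE(ownname => 'APP', stattab => 'STATS_BACKUP');
EXEC DBMS_STATS.EXPORT_SCHEMA_STATS(ownname => 'APP', stattab => 'STATS_BACKUP');

-- 2. Gather fresh statistics
EXEC DBMS_STATS.GATHER_SCHEMA_STATS(ownname => 'APP', cascade => TRUE);

-- 3. If plans regress, restore the saved statistics
EXEC DBMS_STATS.IMPORT_SCHEMA_STATS(ownname => 'APP', stattab => 'STATS_BACKUP');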

