Published: May 1, 2005
By Author: Eric Hanson, Contributors: Gail Erickson, Lubor Kollar, Jason Ward
This document describes the improved indexed views capability of SQL Server 2005 Enterprise Edition. Indexed views are explained and specific scenarios in which they may provide performance improvements are discussed.
On This Page
What is an Indexed View?
Benefits of Using Indexed Views
What's new for Indexed Views in SQL Server 2005?
Design Considerations
Creating Indexed Views
Examples
Frequently Asked Questions about Indexed Views
For More Information
Benefits of Using Indexed Views
What's new for Indexed Views in SQL Server 2005?
Design Considerations
Creating Indexed Views
Examples
Frequently Asked Questions about Indexed Views
For More Information
What is an Indexed View?
For many years, Microsoft® SQL Server™ has supported the ability to create virtual tables known as views. Historically, these views served these main purposes:
- To provide a security mechanism that restricts users to a certain subset of data in one or more base tables.
- To provide a mechanism that allows developers to customize how users can logically view the data stored in base tables.
With SQL Server 2000, the functionality of SQL Server views was expanded to provide system performance benefits. It is possible to create a unique clustered index on a view, as well as nonclustered indexes, to improve data access performance on the most complex queries. In SQL Server 2000 and 2005, a view that has a unique clustered index is referred to as an indexed view. The discussion in this paper applies to SQL Server 2005. Much of the discussion also applies to SQL Server 2000.
From the Database Management System (DBMS) perspective, a view is a description of the data (a form of metadata). When a typical view is created, the metadata is defined by encapsulating a SELECT statement that defines a result set to be represented as a virtual table. When a view is referenced in the FROM clause of another query, this metadata is retrieved from the system catalog and expanded in place of the view's reference. After view expansion, the SQL Server query optimizer compiles a single execution plan for the executing query. The query optimizer searches though a set of possible execution plans for a query, and chooses the lowest-cost plan it can find, based on estimates of the actual time it will take to execute each query plan.
In the case of a non-indexed view, the portions of the view necessary to solve the query are materialized at run time. Any computations such as joins or aggregations are done during query execution for each query referencing the view1. After a unique clustered index is created on the view, the view's result set is materialized immediately and persisted in physical storage in the database, saving the overhead of performing this costly operation at execution time.
The indexed view can be used in a query execution in two ways. The query can reference the indexed view directly, or, more importantly, the query optimizer can select the view if it determines that the view can be substituted for some or all of the query in the lowest-cost query plan. In the second case, the indexed view is used instead of the underlying tables and their ordinary indexes. The view does not need to be referenced in the query for the query optimizer to use it during query execution. This allows existing applications to benefit from the newly created indexed views without changing those applications.
Note Indexed views are a feature of all versions of SQL Server 2000 and 2005. In the Developer and Enterprise editions of SQL Server 2000 and 2005, the query processor can use an indexed view to solve queries that structurally match the view, even if they don't refer to the view by name. In other versions, you must reference the view by name and use the NOEXPAND hint on the view reference to query the contents of an indexed view.
Performance Gains From Indexed Views
Using indexes to improve query performance is not a new concept; however, indexed views provide additional performance benefits that cannot be achieved using standard indexes. Indexed views can increase query performance in the following ways:
- Aggregations can be precomputed and stored in the index to minimize expensive computations during query execution.
- Tables can be prejoined and the resulting data set stored.
- Combinations of joins or aggregations can be stored.
The graph demonstrates the typical performance increases that can be achieved when the query optimizer uses an indexed view. The represented queries varied in complexity (for example, the number of aggregate calculations, the number of tables used, or the number of predicates) and included large multi-million row tables from a real production environment.
Figure 1
Using Nonclustered Indexes on Views
Secondary, nonclustered indexes on views can provide additional query performance. Similar to nonclustered indexes on tables, nonclustered indexes on views may provide more options for the query optimizer to choose from during the compilation process. For example, if the query includes columns not covered by the clustered index, the optimizer can choose one or more secondary indexes in the plan and avoid a time-consuming full scan of the indexed view or base tables.
Adding indexes to the schema increases the overhead on the database because the indexes will require on-going maintenance. Careful consideration should be given to finding the right balance of indexes and maintenance overhead.
Benefits of Using Indexed Views
Analyze your database workload before implementing indexed views. Use your knowledge of the queries as well as various tools (for example SQL Profiler) to identify the queries that can benefit from indexed views. Frequently occurring aggregations and joins are the best candidates for indexed views. Whether or not a query is asked frequently, it may be a candidate for an indexed view if it takes significant time to answer, and the value of getting the answer quickly is high. For example, some developers find it useful to create indexed views that pre-compute and store the answers to queries for reports run at the end of each month by senior executives.
Not all queries will benefit from indexed views. Similar to ordinary indexes, if the indexed views are not used, there is no benefit. In this case, not only are performance gains not realized, but the additional cost of disk space, maintenance, and optimization is incurred. However, when indexed views are used, they can provide significant improvements (by orders of magnitude) in data access. This is because the query optimizer uses the pre-computed results stored in the indexed view, which substantially reduces the cost of the query execution.
The query optimizer considers indexed views only for queries with nontrivial cost. This avoids situations where trying to match various indexed views during the query optimization costs more than the savings achieved by the indexed view usage. Indexed views are rarely used in queries with a cost of less than 1.
Applications that benefit from the implementation of indexed views include:
- Decision support workloads
- Data marts
- Data warehouses
- Online analytical processing (OLAP) stores and sources
- Data mining workloads
From the query type and pattern point of view, the benefiting applications can be characterized as those containing:
Joins and aggregations of large tables
- Repeated patterns of queries
- Repeated aggregations on the same or overlapping sets of columns
- Repeated joins of the same tables on the same keys
- Combinations of the above
On the contrary, online transaction processing (OLTP) systems with many writes, or database applications with frequent updates, may not be able to take advantage of indexed views because of the increased maintenance cost associated with updating both the view and underlying base tables.
How the Query Optimizer Uses Indexed Views
The SQL Server query optimizer automatically determines when an indexed view can be used for a given query execution. The view does not need to be referenced directly in the query for the optimizer to use it in the query execution plan. Therefore, existing applications may take advantage of the indexed views without any changes to the application itself; only the indexed views have to be created.
Optimizer Considerations
The query optimizer considers several conditions to determine if an indexed view can cover the entire query or a portion of it. These conditions correspond to a single FROM clause in the query and consist of the following:
- The tables in the query FROM clause must be a superset of the tables in the indexed view FROM clause.
- The join conditions in the query must be a superset of the join conditions in the view.
- The aggregate columns in the query must be derivable from a subset of the aggregate columns in the view.
- All expressions in the query select list must be derivable from the view select list or from the tables not included in the view definition.
- One predicate subsumes another if it matches a superset of the rows matched by the other. For example, "T.a=10" subsumes "T.a=10 and T.b=20." Any predicate subsumes itself. The part of the predicate of the view that restricts values of one table must subsume the part of the predicate of the query that restricts the same table. Furthermore, it must do so in a way that SQL Server can verify.
- All columns in the query search condition predicates that belong to tables in the view definition must appear in one or more of the following in the view definition:
- A GROUP BY list.
- The view select list if there is no GROUP BY.
- The same or equivalent predicate in the view definition.
Cases (1) and (2) allow SQL Server to apply a query predicate to rows from the view to further restrict the rows of the view. Number (3) is a special case where no filtering is needed on the column, so the column needn't appear in the view.
If the query contains more than one FROM clause (subqueries, derived tables, UNION), the optimizer may select several indexed views to process the query, and apply them to different FROM clauses.2
Example queries demonstrating these conditions are presented at the end of this document. Allowing the query optimizer to determine which indexes, if any, to use in the query execution plan is the recommended best practice.
Using the NOEXPAND view hint
When SQL Server processes queries that refer to views by name, the definitions of the views normally are expanded until they refer only to base tables. This process is called view expansion. It's a form of macro expansion.
The NOEXPAND view hint forces the query optimizer to treat the view like an ordinary table with a clustered index. It prevents view expansion. The NOEXPAND hint can only be applied if the indexed view is referenced directly in the FROM clause. For example,
SELECT Column1, Column2, ... FROM Table1, View1 WITH (NOEXPAND) WHERE ...
Use NOEXPAND if you want to be sure to have SQL Server process a query by reading the view itself instead of reading data from the base tables. If for some reason SQL Server chooses a query plan that processes the query against base tables when you'd prefer that it use the view, consider using NOEXPAND. You must use NOEXPAND in all versions of SQL Server other than Developer and Enterprise editions to have SQL Server process a query against an indexed view directly. You can see a graphical representation of the plan SQL Server chooses for a statement using the SQL Server Management Studio tool Display Estimated Execution Plan feature. Alternatively, you can see different non-graphical representations using SHOWPLAN_ALL, SHOWPLAN_TEXT, or SHOWPLAN_XML. See SQL Sever books online for a discussion of the different versions of SHOWPLAN.
Using the EXPAND VIEWS query hint
When processing a query that refers to a view by name, SQL Server always expands the views, unless you add the NOEXPAND hint to the view reference. It attempts to match indexed views to the expanded query, unless you specify the EXPAND VIEWS query hint in an OPTION clause at the end of the query. For example, suppose there is an indexed view View1 in the database. In the following query, View1 is expanded based on its logical definition (its CREATE VIEW statement), and then the EXPAND VIEWS option prevents the indexed view for View1 from being used in the plan to solve the query.
SELECT Column1, Column2, ... FROM Table1, View1 WHERE ... OPTION (EXPAND VIEWS)
Use EXPAND VIEWS if you want to be sure to have SQL Server process a query by accessing data directly from the base tables referenced by the query, instead of possibly accessing indexed views. EXPAND views may in some cases help eliminate lock contention that could be experienced with an indexed view. Both NOEXPAND and EXPAND VIEWS can help you evaluate performance with and without use of indexed views when you test your application.
What's new for Indexed Views in SQL Server 2005?
SQL Server 2005 contains many improvements for indexed views compared with SQL Server 2000. The set of indexable views has been expanded to include those based on:
- Scalar aggregates, including SUM and COUNT_BIG without GROUP BY.
- Scalar expressions and user-defined functions (UDFs). For example, given a table T(a int, b int, c int) and a scalar UDF dbo.MyUDF(@x int), an indexed view defined on T can contain a computed column such as a+b or dbo.MyUDF(a).
- Persisted imprecise columns. An imprecise column is one whose type is float or real, or a computed column that is derived from a float or real column. In SQL Server 2000, an imprecise column could be used in the select list of an indexed view if it was not part of the index key. An imprecise column could not be used elsewhere inside the view definition either, such as in the WHERE or FROM clauses. SQL Server 2005 allows an imprecise column to participate in the key or inside the view definition if the column is persisted in the base table. Persisted columns include regular columns and computed columns marked PERSISTED.
- The fundamental reason that imprecise, non-persisted columns can't participate in indexes or indexed views is that it is necessary to be able to detach a database from one machine and attach it to another. After the move, all computed column values stored in indexes or indexed views must be derivable in exactly the same way on the new hardware as on the old hardware, down to the individual bit. Otherwise, these indexed views are logically corrupted with respect to the new hardware. Because of this corruption, on the new hardware, queries to the indexed views could return different answers depending on whether the plan used the indexed view or the base tables to derive the view data. Furthermore, the indexed views couldn't be maintained correctly on the new machine. Unfortunately, floating point hardware on different machines (even with the same processor architecture from the same manufacturer) does not always stay 100% the same from version to version of the processor. A firmware upgrade might mean that (a*b) on the new hardware is not equal to (a*b) on the old hardware, for some floating point values a and b. For example, the results might be very close, but differ in the least significant bit. Persisting the imprecise computed values before indexing them solves this detach/attach inconsistency problem since all expressions are evaluated on the same machine during database update and maintenance of indexes and indexed views.
- Common Language Runtime (CLR) types. A major new feature of SQL Server 2005 is support for user-defined types (UDTs) and UDFs based on the CLR. Indexed views can be defined on CLR UDT columns, or expressions derived from those columns, provided that the columns or expressions are deterministic, and precise, persisted, or both. CLR user-defined aggregates cannot be used in an indexed view.
The capability of the optimizer to match queries to indexed views, and thus use them in query plans, has been enlarged to include:
- New expression types in the SELECT list or condition of a query or view that involve:
- Scalar expressions, such as (a+b)/2.
- Scalar aggregates.
- Scalar UDFs.
- Interval subsumption. The optimizer can detect whether interval conditions in an indexed view definition cover, or "subsume" interval conditions in a query. For example, the optimizer can determine that "a>10 and a<20" covers "a>12 and a<18."
- Expression equivalence. Certain expressions that can be shown to be the same even though they are syntactically different are treated the same. An example is that "a=b and c<>10" is equivalent to "10<>c and b=a."
In addition, if there are a large numbers of indexed views in the database, compilation performance for tables on which the views are defined is typically significantly faster in SQL Server 2005 than in SQL Server 2000.
Design Considerations
Identifying an appropriate set of indexes for a database system can be complex. While there are numerous possibilities to consider when designing ordinary indexes, adding indexed views to the schema dramatically increases the complexity of the design and the potential results. For example, indexed views can be used on:
- Any subset of tables referenced in the query.
- Any subset of the conditions in the query for that subset of tables.
- Grouping columns.
- Aggregate functions (for example, SUM).
Indexes on tables and indexed views should be designed concurrently to obtain the best results from each construct. Because both indexes and indexed views may be useful for a given query, designing them separately can lead to redundant recommendations that incur high storage and maintenance overhead. While tuning the physical design of a database, trade offs must be made between the performance requirements of a diverse set of queries and updates that the database system must support. Therefore, identifying an appropriate physical design for indexed views is a challenging task, and the Database Tuning Advisor should be used wherever it is possible.
Query optimization cost can increase substantially if there are many indexed views that the query optimizer may consider for a particular query. A query optimizer may consider all indexed views that are defined on any subset of tables in the query. Each view has to be investigated for the potential substitution before it is rejected. This may take same time, especially if there are hundreds of such views for a given query.
A view must meet several requirements before you can create a unique clustered index on it. During the design phase, consider these requirements:
- The view, and all tables referenced in the view, must be in the same database and have the same owner.
- The indexed view does not need to contain all the tables referenced in the query to be used by the optimizer.
- A unique clustered index must be created before any other indexes can be created on the view.
- Certain SET options (discussed later in this document) must be set correctly when the base tables, view, and index are created, and whenever data in the base tables and view are modified. In addition, the query optimizer will not consider the indexed view unless these SET options are correct.
- The view must be created using schema binding and any user-defined functions referenced in the view must also be created with the SCHEMABINDING option.
- Additional disk space will be required to hold the data defined by the indexed view.
Design Guidelines
Consider these guidelines when designing indexed views:
- Design indexed views that can be used by several queries or multiple operations.
- For example, an indexed view that contains the SUM of a column and the COUNT_BIG of a column can be used by queries that contain the functions SUM, COUNT, COUNT_BIG, or AVG. The queries will be faster because only a small number of rows from the view need to be retrieved rather than the full number of rows from the base tables and a portion of the computations required for performing the AVG function have already been done.
- Keep the index key compact.
- By using the fewest number of columns and bytes as possible in the index key, access to the rows of the indexed view can be done more efficiently because the indexed view rows may be are narrower, and key comparisons are faster than with a wider key. Additionally, the clustered index key is used as a row locator in every nonclustered index defined on the indexed view. The cost of a larger index key increases in proportion to the number of nonclustered indexes on the view.
- Consider the size of the resulting indexed view.
- In the case of pure aggregation, the indexed view may not provide any significant performance gains if its size is similar to the size of the original table.
- Design multiple smaller indexed views that accelerate parts of the process.
- You may not be able to always design an indexed view that addresses the entire query. Should that occur, consider creating several indexed views each performing a portion of the query.
Consider these examples:
- A frequently executed query aggregates data in one database, aggregates data in another database, and then joins the results. Because an indexed view cannot reference tables from more than one database, you cannot design a single view to perform the entire process. However, you can create an indexed view in each database that does the aggregation for that database. If the optimizer can match the indexed views against existing queries, at least the aggregation processing will be faster, without the need to recode existing queries. Although the join processing is not faster, the overall query is faster because it uses the aggregations stored in the indexed views.
- A frequently executed query aggregates data from several tables, and then uses UNION to combine the results. UNION is not allowed in an indexed view. You can design views to perform each of the individual aggregation operations. The optimizer can then select the indexed views to speed up queries with no need to recode the queries. While the UNION processing is not improved, the individual aggregation processes are improved.
Is There a Tool to Help Choose Indexed Views?
The Database Tuning Advisor (DTA3) is a SQL Server 2005 feature that helps database administrators tune their physical database design. DTA recommends indexed views in addition to recommending indexes on base tables, and table and index partitioning strategies. Using DTA enhances an administrator's ability to determine the combination of indexes, indexed views, and partitioning strategies that optimize the performance of the typical mix of queries executed against a database. DTA can recommend a wide variety of indexed views. These include ones that take advantage of the new features for indexed views for SQL Server 2005 described in the section What's New for Indexed Views in SQL Server 2005?. DTA doesn't eliminate the need for good judgment by the database administrator when designing physical storage structures. However, it can simplify the physical database design process. DTA operates in cooperation with the cost-based query optimizer by proposing a set of hypothetical index, indexed view, and partition structures. DTA uses the optimizer to estimate the cost of your workload with and without these structures, and recommends structures that provide low overall cost.
Because the Database Tuning Advisor forces all the required SET options (to ensure the result set is correct), its indexed view creation will succeed. However, your application may not be able to take advantage of the views if its option settings are not set as required. Inserts, updates, or deletes may fail on tables that participate in indexed view definitions if the SET options aren't specified as required.
Indexed views are similarly maintained; however, if the view references several tables, updating any of them may require updating the indexed view. Unlike ordinary indexes, a single row insert into any of the participating tables may cause multiple row changes in an indexed view. This is because the single row may join with multiple rows of another table. The same is true for updates and deletes. Consequently, the maintenance of an indexed view may be more expensive than maintaining an index on the table. Conversely, the maintenance of an indexed view with a highly selective condition may be much less expensive than maintaining an index on a table because most inserts, deletes and updates to base tables the view references will not affect the view. These operations can be filtered out with respect to the indexed view without accessing other database data.
In SQL Server, some views can be updated. When a view is updatable, the underlying base tables are modified directly through the view using INSERT, UPDATE and DELETE statements. Creating an index on a view does not prevent the view from being updatable. Updates to an indexed view really cause updates to the base table(s) underlying the view. These updates propagate back to the indexed view automatically as part of indexed view maintenance. For more information about updatable views, see Modifying Data Through a View in SQL Server Books Online for SQL Server 2005.
Maintenance Cost Considerations
- Additional storage is required in the database for the indexed view. The result set of an indexed view is physically persisted in the database in a manner similar to that of typical table storage
- SQL Server maintains views automatically; therefore, any changes to a base table on which a view is defined may initiate one or more changes in the view indexes. Thus, additional maintenance overhead is incurred.
The net performance improvement achieved by a view is the difference of the total query execution savings offered by the view and the cost to store and maintain the view.
It is relatively easy to approximate the required storage the view will consume. Evaluate the SELECT statement encapsulated by the view definition with the SQL Server Management Studio tool Display Estimated Execution Plan. This tool will yield an approximation of the number of rows returned by the query and the size of the row. By multiplying these two values together, it is possible to approximate the potential size of the view; however, this is only an approximation. The actual size of the index on the view can be accurately determined only by executing the query in the view definition, or creating the index on the view.
From the standpoint of automated maintenance considerations performed by SQL Server, the Display Estimated Execution Plan functionality may give some insight on the impact of this overhead. If a statement that modifies the view (UPDATE on the view, INSERT into a base table) is evaluated with SQL Server Management Studio, an execution plan displayed for the statement will include the maintenance operation for that statement. Taking this cost into consideration along with an idea of how many times this operation will occur in the production environment may indicate the potential cost of view maintenance.
As a general recommendation, any modifications or updates to the view or the base tables underlying the view should be performed in batches if possible, rather than singleton operations. This may reduce some overhead in the view maintenance.
Creating Indexed Views
The steps required to create an indexed view are critical to the successful implementation of the view.
- Verify the setting of ANSI_NULLS is correct for all existing tables that will be referenced in the view.
- Verify ANSI_NULLS is set correctly for the current session as shown in the table below before creating any new tables.
- Verify ANSI_NULLS and QUOTED_IDENTIFIER are set correctly for the current session as shown in the table below before creating the view.
- Verify the view definition is deterministic.
- Create the view using the WITH SCHEMABINDING option.
- Verify your session's SET options are set correctly as shown in the table below before creating the unique clustered index on the view.
- Create the unique clustered index on the view.
- The OBJECTPROPERTY function can be used to check the value of ANSI_NULLS and QUOTED_IDENTIFIER on an existing table or view.
Using SET Options to Obtain Consistent Results
Evaluating the same expression can produce different results in SQL Server 2005 if different SET options are enabled for the current session when the query is executed. For example, after the SET option CONCAT_NULL_YIELDS_NULL is set to ON, the expression 'abc' + NULL returns the value NULL. But after CONCAT_NULL_YIEDS_NULL is set to OFF, the same expression produces 'abc'. Indexed views require fixed values for several SET options for the current session and for objects referenced by the view to ensure that the views can be maintained correctly and return consistent results.
The SET options ANSI_NULLS and QUOTED_IDENTIFIER of the current session must both be set to ON at the time a view on which you wish to build an index is created. This is because these two options are stored with the view definition in the system catalogs.
The SET options of the current session must be set to the values shown in the Required Value column for the current session whenever these operations occur:
- An index is created on a view.
- There is any INSERT, UPDATE, or DELETE operation performed on any table participating in the indexed view.
- The indexed view is used by the query optimizer to produce the query plan.
SET Options | Required Value | Default Server Value | .Net SqlClient, OLE DB and ODBC Value | DB LIB Value |
---|---|---|---|---|
ANSI_NULLS | ON | OFF | ON | OFF |
ANSI_PADDING | ON | OFF | ON | OFF |
ANSI_WARNINGS | ON | OFF | ON | OFF |
CONCAT_NULL_YIELDS_NULL | ON | OFF | ON | OFF |
NUMERIC_ROUNDABORT | OFF | OFF | OFF | OFF |
QUOTED_IDENTIFIER | ON | OFF | ON | OFF |
The ARITHABORT option does have to be ON for the current session to create an indexed view, but it is implicitly ON in SQL Server 2005 once ANSI_WARNINGS is ON, so it does not need to be set explicitly. If you are using a .Net SqlClient, OLE DB or ODBC server connection, you do not have to modify any SET options from their defaults to create, use and maintain indexed views. All DB LIB values must be set correctly either at the server level using sp_configure or from the application using the SET command. For more information about SET options, see Using Options in SQL Server in SQL Server Books Online.
Using Deterministic Functions
The definition of an indexed view must be deterministic. A view is deterministic if all expressions in the select list, as well as the WHERE and GROUP BY clauses, are deterministic. Deterministic expressions always return the same result any time they are evaluated with a specific set of input values. Only deterministic functions may participate in deterministic expressions. For example, the DATEADD function is deterministic because it always returns the same result for any given set of argument values for its three parameters. GETDATE is not deterministic because it is always invoked with the same argument, yet the value it returns changes each time it is executed. For more information, see Deterministic and Nondeterministic Functions in SQL Server Books Online for SQL Server 2005.
Even if an expression is deterministic, if it contains float expressions, the exact result may depend on the processor architecture or version of microcode. To ensure data integrity in SQL Server 2005 when moving a database from one machine to another, such expressions can participate only as non-key columns of indexed views. Deterministic expressions that do not contain float expressions are called precise. Only deterministic expressions that are persisted and/or precise may participate in key columns and WHERE or GROUP BY clauses of indexed views. Persisted expressions are references to stored columns, including regular columns and computed columns marked PERSISTED.
Use the COLUMNPROPERTY function and IsDeterministic property to determine if a view column is deterministic. Use the COLUMNPROPERTY function and IsPrecise property to determine if a deterministic column in a view with SCHEMABINDING is precise. COLUMNPROPERTY returns 1 if the property is TRUE, 0 if FALSE, and NULL for invalid input. For example, in this script
CREATE TABLE T(a int, b real, c as getdate(), d as a+b) CREATE VIEW VT WITH SCHEMABINDING AS SELECT a, b, c, d FROM dbo.T SELECT object_id('VT'), COLUMNPROPERTY(object_id('VT'),'b','IsPrecise')
the SELECT returns 0 for IsPrecise because the b column is of type real. You can experiment with COLUMNPROPERTY to find out whether the other columns of T are deterministic and precise.
Additional Requirements
The set of views that can be indexed is a subset of the set of possible views. Any indexable view can exist with or without an index.
In addition to the requirements listed in the design guidelines, the Using Set Options to Obtain Consistent Results and Using Deterministic Functions sections, the following requirements must be met to create a unique clustered index on a view
Base Table Requirements
- Base tables referenced by the view must have the correct value of SET option ANSI_NULLS set at the time the table is created. The OBJECTPROPERTY function can be used to check the value of ANSI_NULLS on an existing table.
Function Requirements
User-defined functions referenced by the view must be created using the WITH SCHEMABINDING option.
View Requirements
- The view must be created using the WITH SCHEMABINDING option.
- Tables must be referenced by the view using two-part names (schemaname.tablename).
- User-defined functions must be referenced by the view using two-part names (schemaname.functionname).
- SET options ANSI_NULLS and QUOTED_IDENTIFIER must be set correctly.
View Restrictions
To create an index on a view in SQL Server 2005, the view definition must not contain:
ANY, NOT ANY | OPENROWSET, OPENQUERY, OPENDATASOURCE |
arithmetic on imprecise (float, real) values | OPENXML |
COMPUTE, COMPUTE BY | ORDER BY |
CONVERT producing an imprecise result | OUTER join |
COUNT(*) | reference to a base table with a disabled clustered index |
GROUP BY ALL | reference to a table or function in a different database |
Derived table (subquery in FROM list) | reference to another view |
DISTINCT | ROWSET function |
EXISTS, NOT EXISTS | self-join |
expressions on aggregate results (e.g. SUM(x)+SUM(x)) | STDEV, STDEVP, VAR, VARP, AVG |
full-text predicates (CONTAINS, FREETEXT, CONTAINSTABLE, FREETEXTTABLE) | Subquery |
imprecise constants (e.g. 2.34e5) | SUM on nullable expressions |
inline or table-valued functions | table hints (e.g. NOLOCK) |
MIN, MAX | text, ntext, image, filestream, or XML columns |
non-deterministic expressions | TOP |
non-unicode collations | UNION |
contradictions SQL Server 2005 can detect that mean the view would be empty (e.g. where 0=1 and ...) |
Note The indexed view may contain float and real columns; however, such columns cannot be included in the clustered index key if they are non-persisted computed columns.
GROUP BY Restrictions
If GROUP BY is present, the VIEW definition:
- Must contain COUNT_BIG(*).
- Must not contain HAVING, CUBE, ROLLUP, or GROUPING()
These restrictions are applicable only to the indexed view definition. A query can use an indexed view in its execution plan even if it does not satisfy these GROUP BY restrictions
Index Requirements
- The user executing the CREATE INDEX statement must be the view owner.
- If the view definition contains a GROUP BY clause, the key of the unique clustered index can reference only the columns specified in the GROUP BY clause.
- The index must not be created with the IGNORE_DUP_KEY option enabled.
Examples
The examples in this section illustrate the use of indexed views with two major groups of queries: aggregations and joins. They also demonstrate the conditions used by the query optimizer when determining if an indexed view is applicable. For information about a complete list of conditions, see "How the Query Optimizer Uses Indexed Views."
The queries are based on tables in AdventureWorks, the sample database provided in SQL Server 2005, and can be executed as written. You may want to use the Display Estimated Execution Plan tool in SQL Server Management Studio to view the plans selected by the query optimizer before and after the views are created. Although the examples demonstrate how the optimizer chooses the low cost execution plan, the AdventureWorks sample database is too small to show performance gains
Before you begin working on these examples, make sure your session has the correct options set by running these commands:
Setup
SET ANSI_NULLS ON SET ANSI_PADDING ON SET ANSI_WARNINGS ON SET CONCAT_NULL_YIELDS_NULL ON SET NUMERIC_ROUNDABORT OFF SET QUOTED_IDENTIFIER ON SET ARITHABORT ON
The following queries show two methods to return the five products with the largest total discount from the Sales.SalesOrderDetail table.
Query 1
SELECT TOP 5 ProductID, Sum(UnitPrice*OrderQty) - Sum(UnitPrice*OrderQty*(1.00-UnitPriceDiscount)) AS Rebate FROM Sales.SalesOrderDetail GROUP BY ProductID ORDER BY Rebate DESC
Query 2
SELECT TOP 5 ProductID, SUM(UnitPrice*OrderQty*UnitPriceDiscount) AS Rebate FROM Sales.SalesOrderDetail GROUP BY ProductID ORDER BY Rebate DESC
The execution plan selected by the query optimizer contains:
- A Clustered Index Scan on the Sales.SalesOrderDetail table with a row estimate of 121,317 rows.
- A Hash Match/Aggregate operator that puts the selected rows into a hash table based on the GROUP BY column and computes the SUM aggregation for each row.
- A TOP 5 sort operator based on the ORDER BY clause.
View 1
Adding an indexed view that includes the aggregations required for the Rebate column will change the query execution plan for Query 1. On a large table (multi-million rows), the query's performance would also improve significantly.
CREATE VIEW Vdiscount1 WITH SCHEMABINDING AS SELECT SUM(UnitPrice*OrderQty) AS SumPrice, SUM(UnitPrice*OrderQty*(1.00-UnitPriceDiscount)) AS SumDiscountPrice, COUNT_BIG(*) AS Count, ProductID FROM Sales.SalesOrderDetail GROUP BY ProductID GO CREATE UNIQUE CLUSTERED INDEX VDiscountInd ON Vdiscount1 (ProductID)
The execution plan for the first query shows that the Vdiscount1 view is used by the optimizer. However, the view will not be used by the second query because it does not contain the SUM(UnitPrice*OrderQty*UnitPriceDiscount) aggregate. Another indexed view can be created that will address both queries.
View 2
CREATE VIEW Vdiscount2 WITH SCHEMABINDING AS SELECT SUM(UnitPrice*OrderQty)AS SumPrice, SUM(UnitPrice*OrderQty*(1.00-UnitPriceDiscount))AS SumDiscountPrice, SUM(UnitPrice*OrderQty*UnitPriceDiscount)AS SumDiscountPrice2, COUNT_BIG(*) AS Count, ProductID FROM Sales.SalesOrderDetail GROUP BY ProductID GO CREATE UNIQUE CLUSTERED INDEX VDiscountInd ON Vdiscount2 (ProductID)
With this indexed view, after dropping Vdiscount1, the query execution plan for both queries now contains:
- A Clustered Index Scan on the Vdiscount2 view with a row estimate of 266 rows
- A TOP 5 Sort function based on the ORDER BY clause
The query optimizer selected the view because it provided the lowest execution cost even though it was not referenced in the query.
Query 3
Query 3 is similar to the previous queries, but ProductID is replaced by the column SalesOrderID, which is not included in the view definition. This violates the condition that all expressions in the query select list on tables in the view definition must be derivable from the view select list in order to use the indexed view in the query plan.
SELECT TOP 3 SalesOrderID, SUM(UnitPrice*OrderQty*UnitPriceDiscount) OrderRebate FROM Sales.SalesOrderDetail GROUP BY SalesOrderID ORDER BY OrderRebate DESC
A separate indexed view would be required to address this query. Vdiscount2 could be modified to include SalesOrderID; however, the resulting view would contain as many rows as the original table and would not provide a performance improvement over using the base table.
Query 4
This query produces the average price for each product.
SELECT p.Name, od.ProductID, AVG(od.UnitPrice*(1.00-od.UnitPriceDiscount)) AS AvgPrice, SUM(od.OrderQty) AS Units FROM Sales.SalesOrderDetail AS od, Production.Product AS p WHERE od.ProductID=p.ProductID GROUP BY p.Name, od.ProductID
Complex aggregates (for example, STDEV, VARIANCE, AVG) cannot be included in the definition of an index view. However, indexed views can be used to execute a query containing an AVG by including the simple aggregate functions that, when combined, perform the complex aggregation.
View 3
This indexed view contains the simple aggregate functions needed to perform an AVG function. When Query 4 is executed after the creation of View 3, the execution plan shows the view being used. The optimizer can derive the AVG expression from the view's simple aggregation columns Price and Count.
CREATE VIEW View3 WITH SCHEMABINDING AS SELECT ProductID, SUM(UnitPrice*(1.00-UnitPriceDiscount)) AS Price, COUNT_BIG(*) AS Count, SUM(OrderQty) AS Units FROM Sales.SalesOrderDetail GROUP BY ProductID GO CREATE UNIQUE CLUSTERED INDEX iv3 ON View3 (ProductID)
Query 5
This query is the same as Query 4, but includes one additional search condition. View 3 will work for this query even though the additional search condition references only columns from a table not included in the view definition.
SELECT p.Name, od.ProductID, AVG(od.UnitPrice*(1.00-UnitPriceDiscount)) AS AvgPrice, SUM(od.OrderQty) AS Units FROM Sales.SalesOrderDetail AS od, Production.Product AS p WHERE od.ProductID=p.ProductID AND p.Name like '%Red%' GROUP BY p.Name, od.ProductID
Query 6
The query optimizer cannot use View 3 for this query. The added search condition od.UnitPrice>10 contains a column from the table in the view definition but the column does not appear in the GROUP BY list nor does the search predicate appear in the view definition.
SELECT p.Name, od.ProductID, AVG(od.UnitPrice*(1.00-UnitPriceDiscount)) AS AvgPrice, SUM(od.OrderQty) AS Units FROM Sales.SalesOrderDetail AS od, Production.Product AS p WHERE od.ProductID = p.ProductID AND od.UnitPrice > 10 GROUP BY p.Name, od.ProductID
Query 7
In contrast, the query optimizer can use View 3 for Query 7 because the column defined in the new search condition od.ProductID between 0 and 995 is included in the GROUP BY clause in the view definition.
SELECT p.Name, od.ProductID, AVG(od.UnitPrice*(1.00-UnitPriceDiscount)) AS AvgPrice, SUM(od.OrderQty) AS Units FROM Sales.SalesOrderDetail AS od, Production.Product AS p WHERE od.ProductID = p.ProductID AND od.ProductID between 0 and 995 GROUP BY p.Name, od.ProductID
View 4
This view will satisfy the conditions for Query 6 by including the columns SumPrice and Count in the view definition to allow the AVG in the query to be computed.
CREATE VIEW View4 WITH SCHEMABINDING AS SELECT p.Name, od.ProductID, SUM(od.UnitPrice*(1.00-UnitPriceDiscount)) AS SumPrice, SUM(od.OrderQty) AS Units, COUNT_BIG(*) AS Count FROM Sales.SalesOrderDetail AS od, Production.Product AS p WHERE od.ProductID = p.ProductID AND od.UnitPrice > 10 GROUP BY p.Name, od.ProductID GO CREATE UNIQUE CLUSTERED INDEX VdiscountInd on View4 (Name, ProductID)
Query 8
The same index on View 4 will also be used for a query where a join to the table Sales.SalesOrderHeader is added. This query meets the condition that the tables listed in the query FROM clause are a superset of the tables in the FROM clause of the indexed view.
SELECT p.Name, od.ProductID, AVG(od.UnitPrice*(1.00-UnitPriceDiscount)) AS AvgPrice, SUM(od.OrderQty) AS Units FROM Sales.SalesOrderDetail AS od, Production.Product AS p, Sales.SalesOrderHeader AS o WHERE od.ProductID = p.ProductID AND o.SalesOrderID = od.SalesOrderID AND od.UnitPrice > 10 GROUP BY p.Name, od.Product
The final two queries are modifications of Query 8. Each modification violates one of the optimizer conditions and unlike Query 8, cannot use View4.
Query 8a
Q8a cannot use the indexed view because of the WHERE clause mismatch between UnitPrice > 10 in the view definition and UnitPrice > 25 in the query, and the fact that UnitPrice does not appear in the view. The query search condition predicate must be a superset of the search condition predicates in the view definition.
SELECT p.Name, od.ProductID, AVG(od.UnitPrice*(1.00-UnitPriceDiscount)) AvgPrice, SUM(od.OrderQty) AS Units FROM Sales.SalesOrderDetail AS od, Production.Product AS p, Sales.SalesOrderHeader AS o WHERE od.ProductID = p.ProductID AND o.SalesOrderID = od.SalesOrderID AND od.UnitPrice > 25 GROUP BY p.Name, od.ProductID
Query 8b
Observe that table Sales.SalesOrderHeader does not participate in the indexed view V4 definition. In spite of that, adding a predicate on this table will disallow using the indexed view because the added predicate may change or eliminate additional rows participating in the aggregates shown in Query 8b below.
SELECT p.Name, od.ProductID, AVG(od.UnitPrice*(1.00-UnitPriceDiscount)) AS AvgPrice, SUM(od.OrderQty) AS Units FROM Sales.SalesOrderDetail AS od, Production.Product AS p, Sales.SalesOrderHeader AS o WHERE od.ProductID = p.ProductID AND o.SalesOrderID = od.SalesOrderID AND od.UnitPrice > 10 AND o.OrderDate > '20040728' GROUP BY p.Name, od.ProductID
View 4a
View 4a extends View 4 by including the UnitPrice column in the select list and GROUP BY clause. Query 8a can use View 4a because the UnitPrice values, known to be greater than 10, to be filtered further to leave only those greater than 25. This is an example of interval subsumption.
CREATE VIEW View4a WITH SCHEMABINDING AS SELECT p.Name, od.ProductID, od.UnitPrice, SUM(od.UnitPrice*(1.00-UnitPriceDiscount)) AS SumPrice, SUM(od.OrderQty) AS Units, COUNT_BIG(*) AS Count FROM Sales.SalesOrderDetail AS od, Production.Product AS p WHERE od.ProductID = p.ProductID AND od.UnitPrice > 10 GROUP BY p.Name, od.ProductID, od.UnitPrice GO CREATE UNIQUE CLUSTERED INDEX VdiscountInd ON View4a (Name, ProductID, UnitPrice)
View 5
View 5 contains an expression in its select and GROUP BY lists. Notice that LineTotal is a computed column, so by itself it is an expression. This expression is in turn nested inside a call to the FLOOR function.
CREATE VIEW View5 WITH SCHEMABINDING AS SELECT FLOOR(LineTotal) FloorTotal, COUNT_BIG(*) C FROM Sales.SalesOrderDetail GROUP BY FLOOR(LineTotal) GO CREATE UNIQUE CLUSTERED INDEX iView5 ON View5(FloorTotal)
Query 9
Query 9 contains the expression FLOOR(LineTotal) in its select and GROUP BY lists. With the new extensions to view matching for expressions in SQL Server 2005, this query uses the index on View 5.
SELECT TOP 5 FLOOR(LineTotal), Count(*) FROM Sales.SalesOrderDetail GROUP BY FLOOR(LineTotal) ORDER BY COUNT(*) DESC
View 6
View 6 stores information about line items for the three days at the end of a month. This clusters together these rows on a small number of pages so that queries to Sales.SalesOrderDetail for these days can be satisfied quickly.
CREATE VIEW View6 WITH SCHEMABINDING AS SELECT SalesOrderID, SalesOrderDetailID, CarrierTrackingNumber, OrderQty, ProductID, SpecialOfferID, UnitPrice, UnitPriceDiscount, rowguid, ModifiedDate FROM Sales.SalesOrderDetail WHERE ModifiedDate IN ( convert(datetime, '2004-07-31', 120), convert(datetime, '2004-07-30', 120), convert(datetime, '2004-07-29', 120) ) GO CREATE UNIQUE CLUSTERED INDEX VEndJulyO4Ind ON View6(SalesOrderID, SalesOrderDetailID) GO
Query 10
The following query can match View 6 and the system can produce a plan that scans the VendJuly04Ind index on the view instead of scanning the entire Sales.SalesOrderDetail table. This also demonstrates expression equivalence (because the order of days is different in the query than in the view, and the data formats are different) and predicate subsumption (because the query asks for a subset of the results stored in the view).
SELECT h.*, SalesOrderDetailID, CarrierTrackingNumber, OrderQty, ProductID, SpecialOfferID, UnitPrice, UnitPriceDiscount, d.rowguid, d.ModifiedDate FROM Sales.SalesOrderHeader as h, Sales.SalesOrderDetail as d WHERE (d.ModifiedDate = '20040729' OR d.ModifiedDate = '20040730') and d.SalesOrderID=h.SalesOrderID
View 7
Developers also sometimes find it convenient to use indexed views to enforce specialized integrity constraints. For example, a constraint you can enforce with an indexed view is "Column a of table T is unique except there may be multiple 0 values in the column." Indexed view View7 below enforces this constraint. If you run the following script, it runs successfully until the final insert. That statement is disallowed because it adds a non-zero duplicate value.
USE tempdb GO CREATE TABLE T(a int) GO CREATE VIEW View7 WITH SCHEMABINDING AS SELECT a FROM dbo.T WHERE a <> 0 GO CREATE UNIQUE CLUSTERED INDEX IV on View7(a) GO -- legal: INSERT INTO T VALUES(1) INSERT INTO T VALUES(2) INSERT INTO T VALUES(0) INSERT INTO T VALUES(0) -- duplicate 0 -- dissalowed: INSERT INTO T VALUES(2)
Frequently Asked Questions about Indexed Views
Q. Why are there restrictions on the kind of views I can create an index on?
A. To make sure that it is logically possible to incrementally maintain the view, to restrict the ability to create a view that would be expensive to maintain, and to limit the complexity of the SQL Server implementation. A large set of views is non-deterministic and context dependent; their contents 'changes' independent of DML operations. These can't be indexed. Examples are any views that call GETDATE or SUSER_SNAME in their definition.
Q. Why does the first index on a view have to be CLUSTERED and UNIQUE?
A. It must be UNIQUE to allow easy lookup of records in the view by key value during indexed view maintenance, and to prevent creation of views with duplicates, which would require special logic to maintain. It must be clustered because only a clustered index can enforce uniqueness and store the rows at the same time.
Q. Why isn't my indexed view being picked up by the query optimizer for use in the query plan?
A. There are three primary reasons the indexed view may not be being chosen by the optimizer:
(1) You are using a version other than Enterprise or Developer edition of SQL Server. Only Enterprise and Developer editions support automatic query-to-indexed-view matching. Reference the indexed view by name and include the NOEXPAND hint to have the query processor use the indexed view in all other editions.
(2) The cost of using the indexed view may exceed the cost of getting the data from the base tables, or the query is so simple that a query against the base tables is fast and easy to find. This often happens when the indexed view is defined on small tables. You can use the NOEXPAND hint if you want to force the query processor to use the indexed view. This may require you to rewrite your query if you don't initially reference the view explicitly. You can get the actual cost of the query with NOEXPAND and compare it to the actual cost of the query plan that doesn't reference the view. If they are close, this may give you confidence that the decision of whether or not to use the indexed view doesn't matter.
(3) The query optimizer is not matching the query to the indexed view. Double-check the definition of the view and the definition of the query to make sure that a structural match between the two is possible. CASTS, converts, and other expressions that don't logically alter your query result may prevent a match. Also, there are limits to the expression normalization and equivalence and subsumption testing that SQL Server performs. It may not be able to show that some equivalent expressions are the same, or that one expression that is logically subsumed by the other is really subsumed, so it may miss a match.
Q. I update my data warehouse once a week. Indexed views speed up my queries a lot during the week, but slow down the weekly update? What should I do?
A. Consider dropping the indexed views before the weekly update, and creating them again afterwards.
Q. My view has duplicates, but I really want to maintain it. What can I do?
A. Consider creating a view that groups by all the columns or expressions in the view you want, and adds a COUNT_BIG(*) column, and then creating a unique clustered index on the grouping columns. The grouping process ensures uniqueness. This isn't really the same view, but it might satisfy your needs.
Q. I have a view defined on top of another view. SQL Server won't let me index the top-level view. What can I do?
A. Consider expanding the definition of the nested view by hand into the top-level view, and then indexing it, indexing the innermost view, or not indexing the view.
Q. Why do indexed views have to be defined WITH SCHEMABINDING?
A. So that
- all objects the view references are unambiguously identified using schemaname.objectname regardless of which user is accessing the view, and
- the objects referred to in the view definition can't be altered in a way that would make the view definition illegal or force SQL Server to recreate the index on the vie
Q. Why can't I use OUTER JOIN in an indexed view?
A. Rows can logically disappear from an indexed view based on OUTER JOIN when you insert data into a base table. This makes incrementally updating OUTER JOIN views relatively complex to implement, and the performance of the implementation would be slower than for views based on standard (INNER) JOIN.
For More Information
Microsoft SQL Server 2005 Books Online contains more information about indexed views. For additional information, see these resources:
- Microsoft SQL Server Developer Center at http://msdn2.microsoft.com/sqlserver/default.aspx.
- The Microsoft SQL Server TechNet site at
http://www.microsoft.com/technet/prodtechnol/sql. - The Microsoft SQL Server Web site at http://www.microsoft.com/sql.
- SQL Server Magazine at http://www.sqlmag.com.
- The microsoft.public.sqlserver.server and microsoft.public.sqlserver.datawarehouse newsgroups at news://news.microsoft.com.
- The Microsoft Official Curriculum courses on SQL Server. For up-to-date course information, see http://www.microsoft.com/trainingandservices
No comments:
Post a Comment