Referenced by: http://kb.tableausoftware.com/articles/knowledgebase/database-query-performance
For the DBA
If the above points are not sufficient to address your performance problems, it could be that the problems are at a deeper level than can be addressed by an end user. Tableau suggests that you engage your database administrator (DBA) and have them look at the following section for suggestions.
Seriously, know what you are asking
As pointed out earlier, knowing what you are asking the database to do is an important part of performance tuning. By running an audit or trace on the database, you can isolate the query that Tableau has passed to the query engine and check that it is what you expect. For example, does it have the expected GROUP BY and filter clauses? Is it doing aggregations in the query rather than returning raw field values?
As an example, in SQL Server start the Profiler tool to trace the queries being run (filter by application name = "Tableau 6.1", or by your user name if the server is busy). This will let you see what each query was and how long it took to return.
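On newer versions of SQL Server, where Profiler is deprecated, an Extended Events session captures the same information. A minimal sketch, assuming Tableau identifies itself with a client application name starting with "Tableau" (the session name, file path, and filter pattern are placeholders to adjust for your environment):

    -- Capture completed batches whose client app name starts with 'Tableau'.
    CREATE EVENT SESSION TableauQueries ON SERVER
    ADD EVENT sqlserver.sql_batch_completed (
        ACTION (sqlserver.client_app_name, sqlserver.session_id)
        WHERE (sqlserver.like_i_sql_unicode_string(sqlserver.client_app_name, N'Tableau%'))
    )
    ADD TARGET package0.event_file (SET filename = N'TableauQueries.xel');
    GO
    ALTER EVENT SESSION TableauQueries ON SERVER STATE = START;

The sql_batch_completed event includes the batch text and its duration, which is exactly the "what was the query and how long did it take" information mentioned above.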
Tune your indexes
Once you know the query being run, you can dump it into the execution plan estimator to see in more detail how the DBMS will process the query - this does not execute the query, but returns the estimated execution plan based on the statistics the server has collected.
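In SQL Server, for instance, you can request the estimated plan directly from a query window. A minimal sketch; the SELECT below is a hypothetical stand-in for a query captured from the Tableau trace:

    -- Return the estimated execution plan as XML instead of running the query.
    SET SHOWPLAN_XML ON;
    GO
    SELECT City, SUM(Sales) AS TotalSales   -- hypothetical captured query
    FROM dbo.Orders
    GROUP BY City;
    GO
    SET SHOWPLAN_XML OFF;
    GO

Management Studio's "Display Estimated Execution Plan" button does the same thing interactively.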
Based on the information returned, you can determine whether additional indexes need to be created (e.g., the kinds of queries end users are asking have changed, and the current indexing model no longer reflects them accurately).
This is a deep topic, but some basic principles are (illustrated in the sketch after this list):
- Make certain you have indexes on all columns that are part of table joins
- Make certain you have indexes on any column used in a filter
- Explicitly define primary keys
- Explicitly define foreign key relationships
- For large data sets, use table partitioning
- Define columns as NOT NULL where possible
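As a rough illustration of the principles above, here is a sketch in T-SQL against two hypothetical tables (the names, types, and columns are assumptions for illustration, not a prescription):

    -- Explicit primary key; non-nullable columns declared NOT NULL.
    CREATE TABLE dbo.Customers (
        CustomerID INT            NOT NULL PRIMARY KEY,
        City       NVARCHAR(100)  NOT NULL
    );

    CREATE TABLE dbo.Orders (
        OrderID    INT            NOT NULL PRIMARY KEY,
        CustomerID INT            NOT NULL
            REFERENCES dbo.Customers (CustomerID),  -- explicit foreign key
        OrderDate  DATE           NOT NULL,
        Sales      DECIMAL(18, 2) NOT NULL
    );

    -- Index the join column and a column commonly used in filters.
    CREATE INDEX IX_Orders_CustomerID ON dbo.Orders (CustomerID);
    CREATE INDEX IX_Orders_OrderDate  ON dbo.Orders (OrderDate);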
Use statistics
Database engines collect statistical information about indexes and column data stored in the database. These statistics are used by the query optimizer to choose the most efficient plan for retrieving or updating data. Good statistics allow the optimizer to accurately assess the cost of different query plans and choose a high-quality plan. For example, a common misunderstanding is that if you have indexes, the database will use those indexes to retrieve the records in your query. Not necessarily. If you create, say, an index on a column City and 90% of the values are 'Vancouver', the DBMS will most likely opt for a table scan instead of using the index, because its statistics tell it the index is not selective enough to be worth using.
Ensuring that database statistics are being collected and kept up to date will help the database engine generate better query plans, resulting in faster query performance.
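In SQL Server, for example, you can check when statistics were last updated and refresh them by hand; a minimal sketch, using the hypothetical dbo.Orders table from the indexing example above:

    -- When were statistics on this table last updated?
    SELECT s.name,
           STATS_DATE(s.object_id, s.stats_id) AS last_updated
    FROM sys.stats AS s
    WHERE s.object_id = OBJECT_ID(N'dbo.Orders');

    -- Refresh the statistics with a full scan of the table.
    UPDATE STATISTICS dbo.Orders WITH FULLSCAN;

Most databases can also be configured to update statistics automatically; the key is to verify that this is actually happening on the tables Tableau queries.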
Optimize the data model
Finally, the data model being queried can have a significant impact on the performance of queries. Ensuring that the structure of the data is aligned with the kinds of analysis end users will do is critical for good query performance. If you find yourself needing to design excessive joins, it could be an indication that the data model is not suited to the task at hand.
For example, it can be beneficial to create summary tables when most of your queries need only aggregated data - not base-level detail records.
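A sketch of such a summary table, again using the hypothetical Orders and Customers tables from above (in practice an indexed view or a scheduled rebuild job could play the same role):

    -- Precompute daily sales per city so aggregate-only dashboards
    -- never have to scan the base-level detail rows.
    SELECT c.City,
           o.OrderDate,
           SUM(o.Sales) AS TotalSales,
           COUNT(*)     AS OrderCount
    INTO dbo.DailyCitySales
    FROM dbo.Orders AS o
    JOIN dbo.Customers AS c
      ON c.CustomerID = o.CustomerID
    GROUP BY c.City, o.OrderDate;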
Again, this is a big topic beyond the scope of this article, but DBMS vendors have many whitepapers that describe their recommended best practices for data warehouse and data mart design.