Teradata Database Object Summary
In my previous blog post I showed how to build an Object Catalog of your Teradata data warehouse. Trouble is even on moderately sized systems your Object catalog could result in tens of thousands of rows. One of my object summaries is over thirty thousand rows. This can be overwhelming if you hand it off to management since they prefer a thirty thousand foot view.
So in order to provide a higher level overview of your Teradata system you can create a view on top of your object catalog that summarizes information at the database or user level providing information such perm space allocation/consumption, average table perm space consumption, average number of rows per table, number of tables with/without stats, and a count of each object type in the database.
The following view could be extended fairly easily to include additional metrics on such as the largest and smallest tables in both row count or perm space consumption. For my needs, I didn't need to include those items. The SQL can be found below:
Building a Teradata Object Catalog
Recently I was asked to provide an inventory of a production Teradata system that included a list of all user created objects and as much information about the objects that I could gather included perm space usage, row counts, and object type. Two of the three could easily be gathered using the data dictionary tables in DBC. However, the row count requirement made me pause for a moment and weigh the impact this could have. The production system that I was working on had hundreds of databases and thousands of tables.
I knew writing an SQL script that would build the SELECT queries that would perform a COUNT(*) against every table and capture that in a flat file or table on the system was trivial. However, when you have a system with thousands or tens of thousands of objects it would take quite some time to run all of those statements even if many of the tables are relatively small. It just didn't make good sense to take that approach.
Then I realized that nearly every table had some level of statistics collected against it and maintained at least semi-regularly depending on the DBA that was responsible for the database. The trouble is that Teradata's HELP STATISTICS command only shows you the number of unique values for each set of columns containing statistics. But then I remembered that the view I used as part of my stats maintenance routine did have a column for the number of rows. The views that I use as the basis for my stats maintenance routine were written by Dieter Noeth and is available for the general public to use courtesy of the TeradataForum. For the sake of brevity and maintaining a single source for the code I am not going to post the SQL for the views here but you can find them here in the Attachment Library over at the TeradataForum. (It should be noted that there are two different sets of views that support both the 32 bit and 64 bit versions of Teradata. Make sure you use the correct set of views for your installation.)
In order to keep things organized there are three views used to gather the information instead of creating a bunch of derived tables. The view Stats_Data is responsible for collapsing the number of fields into a comma delimited list for the first 16 columns and then extracting the pertinent binary data for the statistics collected on a given set of columns. The second view Stats_Basics_[32/64]bit extrapolates the binary data from the Stats_Data view into individual binary fields that require further refining. The final view Stats_Details composes the binary data into its final numeric and temporal values from which additional reporting and analysis can be done.
With these three views in place the Object_Catalog view could easily be constructed by simply bringing together a view additional data dictionary tables and the Stats_Details view. The Object_Catalog view displays the following attributes about the objects in each database on the system:
- Database Name - self-explanatory
- Object Name - self-explanatory
- Column Count - This is a count of the number of fields for the object in TVFields.
- Table Rows - The number of rows on the table when the stats were last collected. NULL if stats are not collected.
- Table Size - The current perm space being consumed by the table.
- Table Missing Stats - Indicator field for if stats exist on the table. NULL for non-table objects.
- Table Kind - A single character field categorizing the object type. (e.g. T, V, M, etc.)
- Table Kind Description - A description of the object type (e.g. Table, View, Macro, etc.)
The SQL for the Object_Catalog view can be found below:
Teradata Routine Maintenance: Perm Space History
A user over at TeradataQuestions recently asked a if there was a table in Teradata that allowed you to track the growth of tables or databases over time. The short answer to this question was no. The data dictionary tables in Teradata only provider the current information about tablesizes and the current and max sizes of databases. If you want to track this historically for chargeback or capacity planning you need to write your own maintenance routine.
The good news is this routine is fairly straightforward once you figure out what information you want to retain historically. The data dictionary objects that will be involved in this are DBC.Dbase, DBC.TVM, and DBC.DatabaseSpace. Originally, I was going to use the data dictionary view DBC.TableSize but opted to denormalize the table and retain the database level information here as well.
I am using a database called SysDBA in this example. This is where I prefer to keep all my DBA related objects that go above and beyond what Teradata provides in the Data Dictionary database (DBC), keeping my objects separate from their objects as a matter of personal preference. Below you will find the table definition that I am using. I opted to use the same data type that is found in DBC.DatabaseSpace for consistency sake, but that isn't to say you couldn't opt to retain the value in BIGINT or DECIMAL. Teradata is tracking the tables sizes in bytes but I have opted to also capture them in megabyte and gigabyte equivalents. I have also decided to retain both the current and max database size on this table as well by leveraging the ALL table. I have excluded PeakPerm from this, but it could easily be included if you feel there is value knowing what the PeakPerm was for a database when the data was capture in the table.
CREATE TABLE SysDBA.PermSpaceHist, NO FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL, CHECKSUM = DEFAULT ( DatabaseName VARCHAR(30) CHARACTERSET LATIN NOT CASESPECFIC NOT NULL, TableName VARCHAR(30) CHARACTERSET LATIN NOT CASESPECIFIC NOT NULL, TableSize FLOAT FORMAT '----,---,---,---,--9' NOT NULL, TableSizeMB FLOAT FORMAT '----,---,---,---,--9' NOT NULL, TableSizeGB FLOAT FORMAT '----,---,---,---,--9' NOT NULL, DatabaseSize FLOAT FORMAT '----,---,---,---,--9' NOT NULL, DatabaseSizeMB FLOAT FORMAT '----,---,---,---,--9' NOT NULL, DatabaseSizeGB FLOAT FORMAT '----,---,---,---,--9' NOT NULL, MaxDatabaseSize FLOAT FORMAT '----,---,---,---,--9' NOT NULL, MaxDatabaseSizeMB FLOAT FORMAT '----,---,---,---,--9' NOT NULL, MaxDatabaseSizeGB FLOAT FORMAT '----,---,---,---,--9' NOT NULL, CollectTimestamp TIMESTAMP(0) NOT NULL ) PRIMARY INDEX PIDX (DatabaseName) ;
I prefer to use a BTEQ script to maintain this table. This allows me to schedule it to run using cron from our ETL server. If you have an enterprise scheduling system or ETL tool that allows you to run custom scripts those would be appropriate locations to run this. This data is currently being captured on a weekly basis as there is no need to track the growth at a daily basis currently.
Below is core SQL to maintain this table.
INSERT INTO SysDBA.TableSizeHist SELECT D.DatabaseName , T.TVMName AS TableName , SUM(CASE WHEN TVMId <> '000000000000'XB THEN CurrentPermSpace ELSE NULL END ) AS CurrentTablePerm , CurrentTablePerm / 1024000.0000 AS CurrentTablePermMB , CurrentTablePerm / 1024000000.0000 AS CurrentTablePermGB , SUM(CASE WHEN TVMId = '000000000000'XB THEN CurrentPermSpace ELSE NULL END ) AS CurrentDBPerm , CurrentDBPerm / 1024000.0000 AS CurrentDBPermMB , CurrentDBPerm / 1024000000.0000 AS CurrentDBPermGB , SUM(CASE WHEN TVMId = '000000000000'XB THEN MaxPermSpace ELSE NULL END ) AS MaxDBPerm , MaxDBPerm / 1024000.0000 AS MaxDBPermMB , MaxDBPerm / 1024000000.0000 AS MaxDBPermGB , CURRENT_TIMESTAMP(0) FROM DBC.DatabaseSpace S INNER JOIN DBC.Dbase D ON D.DatabaseId = S.DatabaseId INNER JOIN DBC.TVM T ON T.TVMId = S.TableId WHERE D.DatabaseId <> '000000000000'XB /* All Database */ ORDER BY 1,2 GROUP BY 1,2 ;
Sabatically Speaking
Recently, Tim Ford (@SQLAgentman) threw down a challenge to a few members of the SQL Server community. His challenge was called Give Me A Coconut and Six Months, what would you do professionally if you were stranded on a deserted island and could devote all your time to these tasks. While I was not initially tagged by him to participate, I'm sure he wouldn't mind if I played in the same sandbox. (Right Tim? *grin*)
I assumed the shared responsibility of a Teradata environment with another DBA. Generally speaking, a single full time DBA should be able to handle this particular environment with ease if it was the only thing they were responsible for doing. Trouble is we are both 95% or better committed as developers and mentors to the others on the team.
If I were able to isolate myself in my Cave left to work here are some of things that I would like to get done:
Proper Development/Test/Production environment
We currently have the luxury of using both a production and disaster recovery/development Teradata systems. However, long before the DR system came online the developers were used to developing on the production system. This meant that development databases had to be created and named separately from their production counterparts and developers also had the ability to create objects within their user account. (For those of you not familiar with Teradata, users are simply databases with passwords. They can have permanent space assigned and own any object such as tables, views, macros, and stored procedures.)
To this day old habits die hard. Developers are still working on the production system because it is easier and they have easy access to production data for testing. The other DBA and I have been working hard to break their habit (through lot of database restores to the development system from production). For the most part we have been successful when it comes to new development. It is the support of the legacy environments that has been more difficult.
Furthermore, we have gone so far as to begin work on a methodology to establish a true development and test environment on the DR system. This environment introduces new concepts such as doing ETL through views instead of directly against the tables and separating business views from views for business intelligence tools (where shortcuts and other "tricks" are done to fool the tool). We have also laid out the separation of duties that are required as well. There would be a Librarian, UNIX batch administrator, Data Modeler and DBA functions. It's not to say those don't exist currently, it is just that the lines are blurred and the focus is on putting the customer first. Instead, there would be a more rigorous turn over process and proper use of source code version control for all ETL and DDL. We have most of this down on paper now and a little bit of the ground work done but haven't had the time to roll it out.
Backups
Our current backup strategy is good, but I think it could be better. Once we have implemented the process described above I think it could go a long way to helping our disaster recovery plan. With the ETL being done through views it allows me to move databases and tables around at will. I have considered organizing databases and tables in groupings of disaster recovery levels. In the event of a disaster I can simply start with my most critical database and restore additional databases as needed should the crisis continue.
The other challenge is that we currently do not have the luxury of a replication process. So in order to provide developers current data we have to rely on backups and restores between the environments. This has cluttered our archive and recovery schedule because no one wants to wait 30 minutes to tape scan a larger backup of 6 databases only to spend5 minutes restoring the actual data needed. The quick work around was to begin backing up individual databases.
Needless to say, there is quite a bit of time and effort required here to get this straightened out to find the best balance between efficient recovery time for development and a suitable disaster recovery plan as well. In lieu of replication, I dream of being able backup to a SAN and then having the SAN backed up to tape.
Maintenance Scripts
I've started to get some routine maintenance done automatically through scripts, but once you get started you realize there is a lot of things that you could automate. A slippery slope, so to speak. Among the things I have automated includes a statistics collection process, perm space warning (no automated perm growth in Teradata unless you write your own), dormant table identification, and secondary indexes that are not being used frequently are just a few of the things that I have automated. But there are several other things that would be of some value to automate. These include tracking table growth (replacing an inherited database growth process), identifying the top 10 worse performing workloads for the week by workload type (e.g. ETL, ad hoc, BI Tools) using DBQL, and a process to disable inactive users and drop dormant ones.
This work here is probably a good solid month's worth of effort. I'm sure that once I got started down this path there are quite a few other things that would be worth spending time on professionally. So a month might be enough time to satisfy this, but many of these things would make good blog series of "How-To's", so maybe I should ask for three months. What would you do if you had A Coconut and Six Months, hypothetically speaking of course?



