Rob Paller A blog about databases, business intelligence, and an outlet for my inner geek

Building a Teradata Object Catalog

Posted on December 7, 2009

Recently I was asked to provide an inventory of a production Teradata system that included a list of all user created objects and as much information about the objects that I could gather included perm space usage, row counts, and object type. Two of the three could easily be gathered using the data dictionary tables in DBC. However, the row count requirement made me pause for a moment and weigh the impact this could have. The production system that I was working on had hundreds of databases and thousands of tables.

I knew writing an SQL script that would build the SELECT queries that would perform a COUNT(*) against every table and capture that in a flat file or table on the system was trivial. However, when you have a system with thousands or tens of thousands of objects it would take quite some time to run all of those statements even if many of the tables are relatively small. It just didn't make good sense to take that approach.

Then I realized that nearly every table had some level of statistics collected against it and maintained at least semi-regularly depending on the DBA that was responsible for the database. The trouble is that Teradata's HELP STATISTICS command only shows you the number of unique values for each set of columns containing statistics. But then I remembered that the view I used as part of my stats maintenance routine did have a column for the number of rows. The views that I use as the basis for my stats maintenance routine were written by Dieter Noeth and is available for the general public to use courtesy of the TeradataForum. For the sake of brevity and maintaining a single source for the code I am not going to post the SQL for the views here but you can find them here in the Attachment Library over at the TeradataForum. (It should be noted that there are two different sets of views that support both the 32 bit and 64 bit versions of Teradata. Make sure you use the correct set of views for your installation.)

In order to keep things organized there are three views used to gather the information instead of creating a bunch of derived tables. The view Stats_Data is responsible for collapsing the number of fields into a comma delimited list for the first 16 columns and then extracting the pertinent binary data for the statistics collected on a given set of columns. The second view Stats_Basics_[32/64]bit extrapolates the binary data from the Stats_Data view into individual binary fields that require further refining. The final view Stats_Details composes the binary data into its final numeric and temporal values from which additional reporting and analysis can be done.

With these three views in place the Object_Catalog view could easily be constructed by simply bringing together a view additional data dictionary tables and the Stats_Details view. The Object_Catalog view displays the following attributes about the objects in each database on the system:

  • Database Name - self-explanatory
  • Object Name - self-explanatory
  • Column Count - This is a count of the number of fields for the object in TVFields.
  • Table Rows - The number of rows on the table when the stats were last collected. NULL if stats are not collected.
  • Table Size - The current perm space being consumed by the table.
  • Table Missing Stats - Indicator field for if stats exist on the table. NULL for non-table objects.
  • Table Kind - A single character field categorizing the object type. (e.g. T, V, M, etc.)
  • Table Kind Description - A description of the object type (e.g. Table, View, Macro, etc.)

The SQL for the Object_Catalog view can be found below:

  • Share/Bookmark

Teradata Routine Maintenance: Perm Space History

Posted on November 30, 2009

A user over at TeradataQuestions recently asked a if there was a table in Teradata that allowed you to track the growth of tables or databases over time. The short answer to this question was no. The data dictionary tables in Teradata only provider the current information about tablesizes and the current and max sizes of databases. If you want to track this historically for chargeback or capacity planning you need to write your own maintenance routine.

The good news is this routine is fairly straightforward once you figure out what information you want to retain historically. The data dictionary objects that will be involved in this are DBC.Dbase, DBC.TVM, and DBC.DatabaseSpace. Originally, I was going to use the data dictionary view DBC.TableSize but opted to denormalize the table and retain the database level information here as well.

I am using a database called SysDBA in this example. This is where I prefer to keep all my DBA related objects that go above and beyond what Teradata provides in the Data Dictionary database (DBC), keeping my objects separate from their objects as a matter of personal preference. Below you will find the table definition that I am using. I opted to use the same data type that is found in DBC.DatabaseSpace for consistency sake, but that isn't to say you couldn't opt to retain the value in BIGINT or DECIMAL. Teradata is tracking the tables sizes in bytes but I have opted to also capture them in megabyte and gigabyte equivalents. I have also decided to retain both the current and max database size on this table as well by leveraging the ALL table. I have excluded PeakPerm from this, but it could easily be included if you feel there is value knowing what the PeakPerm was for a database when the data was capture in the table.

CREATE TABLE SysDBA.PermSpaceHist, NO FALLBACK,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT
     (
      DatabaseName VARCHAR(30) CHARACTERSET LATIN NOT CASESPECFIC NOT NULL,
      TableName VARCHAR(30) CHARACTERSET LATIN NOT CASESPECIFIC NOT NULL,
      TableSize FLOAT FORMAT '----,---,---,---,--9' NOT NULL,
      TableSizeMB FLOAT FORMAT '----,---,---,---,--9' NOT NULL,
      TableSizeGB FLOAT FORMAT '----,---,---,---,--9' NOT NULL,
      DatabaseSize FLOAT FORMAT '----,---,---,---,--9' NOT NULL,
      DatabaseSizeMB FLOAT FORMAT '----,---,---,---,--9' NOT NULL,
      DatabaseSizeGB FLOAT FORMAT '----,---,---,---,--9' NOT NULL,
      MaxDatabaseSize FLOAT FORMAT '----,---,---,---,--9' NOT NULL,
      MaxDatabaseSizeMB FLOAT FORMAT '----,---,---,---,--9' NOT NULL,
      MaxDatabaseSizeGB FLOAT FORMAT '----,---,---,---,--9' NOT NULL,
      CollectTimestamp TIMESTAMP(0) NOT NULL
     )
PRIMARY INDEX PIDX (DatabaseName)
;

I prefer to use a BTEQ script to maintain this table. This allows me to schedule it to run using cron from our ETL server. If you have an enterprise scheduling system or ETL tool that allows you to run custom scripts those would be appropriate locations to run this. This data is currently being captured on a weekly basis as there is no need to track the growth at a daily basis currently.

Below is core SQL to maintain this table.

INSERT INTO SysDBA.TableSizeHist
SELECT D.DatabaseName
     , T.TVMName AS TableName
     , SUM(CASE WHEN TVMId <> '000000000000'XB
                THEN CurrentPermSpace
                ELSE NULL
           END ) AS CurrentTablePerm
     , CurrentTablePerm / 1024000.0000 AS CurrentTablePermMB
     , CurrentTablePerm / 1024000000.0000 AS CurrentTablePermGB
     , SUM(CASE WHEN TVMId = '000000000000'XB
                THEN CurrentPermSpace
                ELSE NULL
           END ) AS CurrentDBPerm
     , CurrentDBPerm / 1024000.0000 AS CurrentDBPermMB
     , CurrentDBPerm / 1024000000.0000 AS CurrentDBPermGB
     , SUM(CASE WHEN TVMId = '000000000000'XB
                THEN MaxPermSpace
                ELSE NULL
           END ) AS MaxDBPerm
     , MaxDBPerm / 1024000.0000 AS MaxDBPermMB
     , MaxDBPerm / 1024000000.0000 AS MaxDBPermGB
     , CURRENT_TIMESTAMP(0)
FROM DBC.DatabaseSpace S
INNER JOIN
     DBC.Dbase D
      ON D.DatabaseId = S.DatabaseId
INNER JOIN
     DBC.TVM T
      ON T.TVMId       = S.TableId
WHERE D.DatabaseId <> '000000000000'XB /* All Database */
ORDER BY 1,2
GROUP BY 1,2
;
  • Share/Bookmark
Tagged as: , , 6 Comments

Engaging the Teradata Community

Posted on October 2, 2009

If you are looking for Teradata developer or DBA resources, you are not going to find an equivalent to SQLServerPedia yet. The Teradata developers and DBA's aren't found tweeting about join indexes, DBQL, or fallback, nor are they  blogging about multi-value compression, UDFs, or Teradata Active System Management (TASM) on a regular basis. That doesn't mean the community doesn't exist you just have to know where to find it and how to engage it.

Piecing together the Teradata Community

Finding the pieces...

The TeradataForum

The TeradataForum has been around for over a decade now. However it is not a traditional forum, it is an independently operated, semi-moderated listserv that is maintained by John Hall.  A listserv is a lot like a party line in your Inbox. You can listen in and talk when you feel the need. However, there are some drawbacks in that conversations can't be easily taken offline as email addresses were stripped before the message goes out on the listserv. The website for the TeradataForum provides a collection of sample threads, archive of threads by year, and the ability to search the listserv via Google. All in all, it is a great resource that has been serving the community well for quite some time.

Teradata Forums

A few years back Teradata established the Teradata Forums, a traditional discussion forum, for Teradata users to congregate and discuss all things Teradata. It currently boasts over 8,000 users discussing over 4,500 topics. The Teradata Forums are a great place for those who would rather not participate in a listserv to engage the Teradata community when they have a problem or to give back to the community when they have a spare minute to answer some questions.

Teradata Developer Exchange

Announced just this summer, the Teradata Developer Exchange is the latest step toward building a stronger community for Teradata developers and DBA's. Here you will find articles submitting by some of Teradata's own brightest employees on topics ranging from Priority Scheduler, Statistics, Viewpoint, and UDF development. There is also a forum here as well that allows community members to discuss many of the same topics that are discussed on the Teradata Forums without having to leave the Developer Exchange to ask or answer a question. There is also a section here to download some of the things that may be useful to developers like an Eclipse plug-in or the recently released alpha version of SQL Assistant Java Edition.

TeradataQuestions

TeradataQuestions is the most recent member of the Teradata Community. TeradataQuestions is an independently run forum based on the Stack Overflow Knowledge Exchange. (The best example of this can be seen at StackOverflow.)  StackOverflow has become wildly successful in part because the community not only provides answers to the questions but they have the ability to flag the best answer and also vote on the question or the answers within the question. Participation in the community earns you reputation points and badges. Is there better way to encourage community involvement and self policing than by allowing your input to be voted up or down by your peers?

Get involved

The Teradata developer and DBA community is flourishing. You just have to know where the pieces are and figure out which pieces fit the missing pieces that make up your involvement in the community.  How are you engaging the Teradata developer and DBA community? Let me know in the comments...

  • Share/Bookmark

Five Ways to Embrace your Geekness

Posted on July 13, 2009

Geek Edition

Today is officially the day were us geeks have a chance to celebrate being a geek. It is an opportunity to enjoy all of our gadgets and time spent in front of the computer. Here are few ways that you can spend your day. I have done several of these already and others are on my list of things to do.

  1. Turn your $60 router into a $500+ router with Tomato Firmware. Tomato is a easy to use replacement firmware for Linksys, Buffalo, and other Broadcom-based routers. It has a simple and easy to use GUI, ability to monitor bandwidth, advance QOS, multiple DynDNS services, and other advanced features.
  2. Turbocharge your DNS services and add advanced content control using OpenDNS. By simply replacing your DNS Servers from your ISP with OpenDNS's DNS servers you gain anti-phishing protection via PhishTank and improved speed by automatically resolving your DNS requests with the DNS server that is located closest to you. Advanced content control is available to you if you choose to open a free account with them. Automatically determine how wide or restricted you want your internet to be for the entire house. No software installation is required on any computer in your house. I could dedicate an entire blog post to using this service. For more information, check out their website. I can't speak highly enough about this service.
  3. Run multiple virtual environments. Want to run Windows 7 RC on your system but don't want to dedicate a physical partition on your machine? You may want to consider trying Sun's VirtualBox. VirtualBox is available under the GPL. It supports a large number of Guest OSes including Ubuntu, Windows 7 RC, SUSE, Fedora, and many others. The host OS can be Windows, Linux, Macintosh, and OpenSolaris. The latest release, 3.0.0, includes support of Direct 3D 8/9, OpenGL 2.0, and Guest SMP with up to 32 virtual CPUs.  What else were you planning to do with that 1 terabyte hard drive?
  4. Run a SSH Proxy using PuTTY that you can access anywhere to tunnel your web browsing for increased security. Chad Perrin has provided a nice write up on how to configure a SSH proxy using PuTTY. Sadly even as Microsoft prepares to release Windows 7 they have yet to figure out a way to provide a built in SSH client or daemon. However, once you have installed Tomato firmware you can configure your own SSH server and with Port Forwarding access it from anywhere you have internet access. By signing up for a dynamic DNS hostname you can access your SSH proxy without having to remember your dynamically assigned IP address. (You can sign up for free a dynamic DNS hostname using DynDNS. Don't forget to modify the Firefox setting to use your network proxy to resolve your DNS entries as well. (about:confg -> network.proxy.socks_remote_dns = true)
  5. Brush up your 100 essential skills as a geek. Anton Olsen over at GeekDad has comprised a list of 100 essential skills any self-respecting geek should be able to accomplish. These skills inlcude being able to open a Master combination padlock in under 10 minutes, being a Google search ninja, and knowing the answer to life, the universe and everything.

Bonus Material: Do you find that your spouse, girlfriend or parents just doesn't understand your inner geek? Well here is something you can share with them: The Nerd Handbook. While your reading the Nerd Handbook be sure you add Rands to your RSS feeds, he is a fantastic blogger worth following.

Bonus Material: Here are some well known geeks worth following on Twitter: Chris Hardwick (@nerdist), Adam Savage (@donttrythis), Grant Imahara (@grantmahara), and Levar Burton (@levarburton)

I may just take a few of these items listed above and blow them out into full blog posts to show how I am using some of these. I glossed over many of them for the sake of your attention span because I know you have Twitter open, four other tabs in your browser, and your iPhone next to you playing Pandora or Last.fm.

Do you have a secret project or geek skill that you are keeping close to the chest? Why not share it below in the comments with the other readers.

  • Share/Bookmark