Finding Broken Views in Teradata

I have a few trouble tickets that have been lingering in my queue for longer than I care to admit. A couple of them involve views that point to base tables that no longer exist in the data model. I should have realized that when I suggested combining these incidents into a single change control to identify any and all broken views in the data warehouse, it would become my plane to land. After all, there were only 5,000 or so views to verify. That count includes both the 1:1 views and the VDM layer that sits on top of them, which is exposed to the BI tools and handles any role-based access controls that may be needed.

I have been using AtanaSoft’s AtanaSuite for the past 12 months, mostly because I am particularly fond of their QueryTool IDE over Teradata’s SQL Assistant. (The shining star in their suite of tools is really the CompressTool, but I digress; that is for another blog post.) I don’t really use the other tools in the suite, but the first thing I did was take an inventory of what was at my disposal in AtanaSuite that might help me make short order of the task at hand. It turns out there wasn’t anything that would do the job, and this wasn’t something I wanted to rely on a GUI on my desktop to manage anyway. Any self-respecting, lazy sysadmin would have realized that a repeatable task should be a scripted task.

No matter how stringent your change management process is, an environment with over 5,000 views is bound to end up, eventually, with a view pointing to a non-existent table or referencing a column that was renamed. Over the past two years I have come to recognize the strengths and weaknesses of our own change management process. Suffice it to say, there’s no shame in having a repeatable validation process in place.

I opted to build a stored procedure to facilitate this task. First, I built a cursor that included, among other things, a small SQL statement that would be used to validate each view.

 SELECT T1.DatabaseName
      , T1.TableName
      , 'SELECT NULL (TITLE ''ValidView'') FROM ' ||
         TRIM(T1.DatabaseName) || '.' || TRIM(T1.TableName) ||
        ' WHERE 1=2;' AS DMLText
   FROM "DBC"."Tables" T1
  INNER JOIN "DBC"."Databases" D1
     ON T1.DatabaseName = D1.DatabaseName
  WHERE T1.TableKind = 'V'
    AND D1.OwnerName = :iOwnerDatabaseName

You’ll notice in the cursor definition above there is an inbound parameter for the OwnerName. In the initial version of this stored procedure I was focusing on a subset of the views in the environment. I also wanted the validation SQL itself to be as lightweight as possible; the last thing I wanted was a statement that actually returned rows from a table with a cardinality in the tens of billions. I had a couple of options.

Option 1:

  SELECT CAST(COUNT(*) AS BIGINT)
    FROM {DatabaseName}.{TableName};

Recent improvements in the Teradata Optimizer have done wonders for this query. In Teradata 13.10 the optimizer performs a cylinder index scan. Remember to CAST() the COUNT(*) to a sufficiently large data type, as COUNT() returns an INTEGER data type by default.

  1) First, we lock a distinct {Database}."pseudo table" for read
     on a RowHash to prevent global deadlock for {Database}.{Table}. 
  2) Next, we lock {Database}.{Table} for read. 
  3) We do an all-AMPs SUM step to aggregate from {Database}.{Table}
     by way of a cylinder index scan with no residual conditions. 
     Aggregate Intermediate Results are computed globally, then placed
     in Spool 3.  The input table will not be cached in memory, but it
     is eligible for synchronized scanning.  The size of Spool 3 is
     estimated with high confidence to be 1 row (23 bytes).  The
     estimated time for this step is 34.64 seconds. 
  4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
     an all-rows scan into Spool 1 (all_amps), which is built locally
     on the AMPs.  The size of Spool 1 is estimated with high
     confidence to be 1 row (25 bytes).  The estimated time for this
     step is 0.00 seconds. 
  5) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.

Option 2:

  SELECT NULL (TITLE 'ValidView')
    FROM {DatabaseName}.{TableName}
   WHERE 1=2;

This turns out to be even more efficient than the cylinder index scan, so it is the method I opted to use in my code. The EXPLAIN:

  1) First, we do a single-AMP RETRIEVE step from
     {Database}.{Table} in view
     {ViewDatabase}.{ViewName} by way of the unique primary
     index "{Database}.{Table}.{Column1} = NULL,
     {Database}.{Table}.{Column2} = NULL" with
     unsatisfiable conditions locking row for access.  The estimated
     time for this step is 0.00 seconds. 
  -> The row is sent directly back to the user as the result of
     statement 1.  The total estimated time is 0.00 seconds.

A second cursor definition is used for the actual validation and also facilitates the logging in the error handler.

DECLARE cTestView CURSOR FOR TestView; -- Required for the actual validation process
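Putting the pieces together, the declaration section of such a procedure might look roughly like the sketch below. The procedure name, variable sizes, and layout are my assumptions; only the two cursor definitions come from the snippets above.

  REPLACE PROCEDURE ValidateViews (IN iOwnerDatabaseName VARCHAR(30))
  BEGIN
     DECLARE vcDatabaseName VARCHAR(30);
     DECLARE vcTableName    VARCHAR(30);
     DECLARE vcDMLText      VARCHAR(1024);
     DECLARE intCount       INTEGER DEFAULT 0;

     -- Static cursor over DBC that generates one validation statement per view
     DECLARE GetViewNames CURSOR FOR
        SELECT T1.DatabaseName
             , T1.TableName
             , 'SELECT NULL (TITLE ''ValidView'') FROM ' ||
                TRIM(T1.DatabaseName) || '.' || TRIM(T1.TableName) ||
               ' WHERE 1=2;' AS DMLText
          FROM "DBC"."Tables" T1
         INNER JOIN "DBC"."Databases" D1
            ON T1.DatabaseName = D1.DatabaseName
         WHERE T1.TableKind = 'V'
           AND D1.OwnerName = :iOwnerDatabaseName;

     -- Dynamic cursor; its statement is supplied at run time via PREPARE
     DECLARE cTestView CURSOR FOR TestView;

     /* ... error handlers and the main fetch loop go here ... */
  END;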

The stored procedure would not work without a thorough error-handling process. You have to be prepared to handle any error that may come up as a result of an invalid view definition. How you handle each error is really up to your business and/or reporting requirements, so I won’t go into the exact details here aside from listing the error codes your stored procedure should handle. The last two error codes are used to log insufficient access privileges that might otherwise have been disruptive or fallen through to your general error handler.

  • 3810 – Column doesn’t exist
  • 3807 – Table doesn’t exist
  • 3523 – User doesn’t have SELECT access to underlying table
  • 5315 – User doesn’t have SELECT WITH GRANT OPTION
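As a sketch of what that handling might look like, a single CONTINUE handler that records the offending view and error code is one option. The log table ViewValidationLog, its columns, and the intSQLCode variable here are hypothetical; adapt them to your own logging conventions.

  -- Hypothetical logging handler; handlers are declared after the cursors.
  -- intSQLCode would be declared alongside the other local variables.
  DECLARE CONTINUE HANDLER FOR SQLEXCEPTION
  BEGIN
     -- Capture SQLCODE immediately, before the next statement resets it
     SET intSQLCode = SQLCODE;
     INSERT INTO ViewValidationLog  -- hypothetical log table
           (DatabaseName, TableName, ErrorCode, LogTS)
     VALUES (:vcDatabaseName, :vcTableName, :intSQLCode, CURRENT_TIMESTAMP);
  END;

In practice you may prefer one handler per error code of interest (3810, 3807, 3523, 5315) so that unexpected errors still fall through to a general handler.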

The final component of the stored procedure is to actually process your cursor and perform the validation of the views. This is where the second cursor I mentioned above comes into play. When you PREPARE a statement, the Teradata parser validates the dynamic SQL that would be used to open the cursor without executing it, so errors such as those you are trapping in your error handler are raised as exceptions.

OPEN GetViewNames;
FETCH FIRST FROM GetViewNames INTO vcDatabaseName, vcTableName, vcDMLText;
WHILE (SQLCODE = 0) DO  -- exit once the FETCH runs off the end of the cursor
    SET intCount = intCount + 1;
    PREPARE TestView FROM vcDMLText;
    FETCH NEXT FROM GetViewNames INTO vcDatabaseName, vcTableName, vcDMLText;
END WHILE;
CLOSE GetViewNames;

What would take an individual hours to process on their own can be done in minutes or seconds as a scripted routine. The results from the log table can then be further processed into an email for you to read on Monday morning while working toward Inbox Zero.

What this stored procedure doesn’t do is validate that your 1:1 views include all of the columns in the underlying base table. There’s nothing to stop you from adding columns to a table and not referencing them in the views built upon that table. It also doesn’t validate any views built upon your 1:1 views for missing columns. That’s kind of a grey area, because you may purposely elect not to expose certain columns at this layer for security or use-case reasons.
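If you did want to catch that kind of column drift, comparing the base table’s column list against the view’s in DBC.Columns is one possible starting point. The database and object names below are placeholders, and this assumes your 1:1 views’ columns are recorded in the dictionary under the view name.

  SELECT B.ColumnName
    FROM "DBC"."Columns" B
   WHERE B.DatabaseName = 'BaseDatabase'
     AND B.TableName    = 'BaseTable'
     AND B.ColumnName NOT IN
         (SELECT V.ColumnName
            FROM "DBC"."Columns" V
           WHERE V.DatabaseName = 'ViewDatabase'
             AND V.TableName    = 'View_1_1');

Each row returned would be a base-table column the 1:1 view does not expose.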

If you extend this stored procedure to perform those validations, your logging routine could simply flag these exceptions as warnings or informational items, thus demoting their priority. The choice is yours.