Finding Broken Views in Teradata

I have a few trouble tickets that have been lingering in my queue for longer than I care to admit. A couple of them have to do with views that point to base tables that no longer exist in the data model. I should have realized that when I suggested these incidents be combined into a single change control to identify any and all broken views in the data warehouse, it would become my plane to land. After all, there were only 5,000 or so views that needed to be verified. That count includes both the 1:1 views and the VDM layer that sits on top of them, which is exposed to the BI tools and takes care of any role-based access controls that may be needed.

I have been using AtanaSoft’s AtanaSuite for the past 12 months, mostly because I am particularly fond of their QueryTool IDE over Teradata’s SQL Assistant. The shining star in their suite of tools is really the CompressTool, but I digress; that is for another blog post. I don’t use the other tools in the suite much, but the first thing I did was take an inventory of what was at my disposal in AtanaSuite that might help me make short order of the task at hand. It turns out there wasn’t anything that would do the job, and this wasn’t something I wanted to rely on a GUI on my desktop to manage anyway. Any self-respecting, lazy sysadmin would have realized that a repeatable task should be a scripted task.

No matter how stringent your change management process is, odds are that an environment with over 5,000 views is eventually bound to end up with a view pointing to a non-existent table or referencing a column that was renamed. Over the past two years I have come to recognize the strengths and weaknesses of our own change management process. Suffice it to say, there’s no shame in having a repeatable process in place.

I opted to build a stored procedure to facilitate this task. First I built a cursor that included, among other things, a small SQL statement that would be used to validate the view.

 SELECT T1.DatabaseName
      , T1.TableName
      , 'SELECT NULL (TITLE ''ValidView'') FROM ' ||
         TRIM(T1.DatabaseName) || '.' || TRIM(T1.TableName) ||
        ' WHERE 1=2;' AS DMLText
   FROM "DBC"."Tables" T1
  INNER JOIN "DBC"."Databases" D1
     ON T1.DatabaseName = D1.DatabaseName
  WHERE T1.TableKind = 'V'
    AND D1.OwnerName = :iOwnerDatabaseName

You’ll notice in the cursor definition above there is an input parameter for the OwnerName. In the initial version of this stored procedure I was focusing on a subset of the views in the environment. Also, when I initially wrote the SQL statement I wanted something that was going to be as lightweight as possible. The last thing I wanted was an SQL statement running against a table with a cardinality in the tens of billions actually returning rows. I had a couple of options.

Option 1:

  SELECT COUNT(*)
    FROM {DatabaseName}.{TableName};

Recent improvements in the Teradata Optimizer have done wonders for this query. In Teradata 13.10, the optimizer performs a cylinder index scan. Remember to CAST() the COUNT(*) result to a sufficiently large data type, as COUNT() returns an INTEGER data type by default.

  1) First, we lock a distinct {Database}."pseudo table" for read
     on a RowHash to prevent global deadlock for {Database}.{Table}. 
  2) Next, we lock {Database}.{Table} for read. 
  3) We do an all-AMPs SUM step to aggregate from {Database}.{Table}
     by way of a cylinder index scan with no residual conditions. 
     Aggregate Intermediate Results are computed globally, then placed
     in Spool 3.  The input table will not be cached in memory, but it
     is eligible for synchronized scanning.  The size of Spool 3 is
     estimated with high confidence to be 1 row (23 bytes).  The
     estimated time for this step is 34.64 seconds. 
  4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
     an all-rows scan into Spool 1 (all_amps), which is built locally
     on the AMPs.  The size of Spool 1 is estimated with high
     confidence to be 1 row (25 bytes).  The estimated time for this
     step is 0.00 seconds. 
  5) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.

Option 2:

  SELECT NULL (TITLE 'ValidView')
    FROM {DatabaseName}.{TableName}
   WHERE 1=2;

This turns out to be even more efficient than the cylinder index scan. As a result, this was the method I opted to use in my code. EXPLAIN:

  1) First, we do a single-AMP RETRIEVE step from
     {Database}.{Table} in view
     {ViewDatabase}.{ViewName} by way of the unique primary
     index "{Database}.{Table}.{Column1} = NULL,
     {Database}.{Table}.{Column2} = NULL" with
     unsatisfiable conditions locking row for access.  The estimated
     time for this step is 0.00 seconds. 
  -> The row is sent directly back to the user as the result of
     statement 1.  The total estimated time is 0.00 seconds.

A second cursor definition is used for the actual validation piece and also facilitates the logging in the error handler.

DECLARE cTestView CURSOR FOR TestView; -- Required for the actual validation process
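
For context, here is roughly how these declarations sit together at the top of the procedure. This is a sketch rather than my production code; the procedure name, variable names, and lengths are illustrative:

  REPLACE PROCEDURE ValidateViews (IN iOwnerDatabaseName VARCHAR(30))
  BEGIN
     DECLARE vcDatabaseName VARCHAR(30);
     DECLARE vcTableName    VARCHAR(30);
     DECLARE vcDMLText      VARCHAR(1024);
     DECLARE intCount       INTEGER DEFAULT 0;

     -- Driving cursor: one row per view, with the probe SQL to validate it
     DECLARE GetViewNames CURSOR FOR
        SELECT T1.DatabaseName
             , T1.TableName
             , 'SELECT NULL (TITLE ''ValidView'') FROM ' ||
               TRIM(T1.DatabaseName) || '.' || TRIM(T1.TableName) ||
               ' WHERE 1=2;' AS DMLText
          FROM "DBC"."Tables" T1
         INNER JOIN "DBC"."Databases" D1
            ON T1.DatabaseName = D1.DatabaseName
         WHERE T1.TableKind = 'V'
           AND D1.OwnerName = :iOwnerDatabaseName;

     -- Dynamic cursor bound to the prepared statement name
     DECLARE cTestView CURSOR FOR TestView;

     -- Condition handlers and the processing loop follow
  END;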

The stored procedure will not work without a thorough error handling process. You have to be prepared to handle any error that may come up as a result of an invalid view definition. How you handle the errors is really up to your business and/or reporting requirements. I won’t go into the exact details here aside from listing the error codes your stored procedure should handle. The last two error codes are used to log insufficient access privileges that would have otherwise been disruptive or fallen through to your general error handler.

  • 3810 – Column doesn’t exist
  • 3807 – Table doesn’t exist
  • 3523 – User doesn’t have SELECT access to underlying table
  • 5315 – User doesn’t have SELECT WITH GRANT OPTION
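
One way to trap these inside a Teradata stored procedure is a CONTINUE handler that captures SQLCODE and writes it to a log table. A minimal sketch, assuming a hypothetical log table named ViewValidationLog; your own logging routine may differ:

  DECLARE CONTINUE HANDLER FOR SQLEXCEPTION
  BEGIN
     DECLARE intErrorCode INTEGER;
     -- Capture SQLCODE before the next statement resets it
     SET intErrorCode = SQLCODE;
     INSERT INTO ViewValidationLog
        (DatabaseName, TableName, ErrorCode, LogTS)
     VALUES (:vcDatabaseName, :vcTableName, :intErrorCode, CURRENT_TIMESTAMP);
  END;

In practice you may prefer a dedicated handler per error code, so the two privilege errors (3523 and 5315) can be logged at a different severity than the broken-view errors (3807 and 3810).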

The final component of the stored procedure is to process your cursor and perform the actual validation of the views. This is where the second cursor I mentioned above comes into play. When the PREPARE statement runs, the Teradata parser checks the syntax and object references of the dynamic SQL statement that would be used to open the cursor. Errors, such as those you are trapping in your error handler, are returned as exceptions.

OPEN GetViewNames;
FETCH FIRST FROM GetViewNames INTO vcDatabaseName, vcTableName, vcDMLText;
WHILE (SQLCODE = 0) DO
    SET intCount = intCount + 1;
    PREPARE TestView FROM vcDMLText;
    FETCH NEXT FROM GetViewNames INTO vcDatabaseName, vcTableName, vcDMLText;
END WHILE;
CLOSE GetViewNames;

What would take an individual hours to process on their own can be done in minutes or seconds as a scripted routine. The results from the log table can then be further processed into an email for you to read on Monday morning while working toward Inbox Zero.

What this stored procedure doesn’t do is validate that your 1:1 views include all of the columns in the underlying base table. There’s nothing to stop you from adding columns to a table and not referencing them in the views built upon that table. It also doesn’t validate any views built upon your 1:1 views for missing columns. That’s kind of a grey area because you may purposely elect not to expose certain columns at this layer for security or use-case reasons.
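
If you wanted to catch that drift as well, the data dictionary turns it into a set-based comparison rather than a procedural one. A rough sketch, assuming the 1:1 view shares its base table’s name; the two database names are placeholders you would parameterize:

  -- Columns present in the base table but missing from the 1:1 view
  SELECT T.TableName
       , T.ColumnName
    FROM "DBC"."Columns" T
    LEFT JOIN "DBC"."Columns" V
      ON  V.DatabaseName = 'ViewDatabase'
     AND  V.TableName    = T.TableName
     AND  V.ColumnName   = T.ColumnName
   WHERE T.DatabaseName = 'TableDatabase'
     AND V.ColumnName IS NULL;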

If you extend this stored procedure to complete those validations, your logging routine could simply flag these exceptions as warnings or informational items, thus demoting their priority. The choice is yours.

Posted in Blog, Databases, Teradata

How-To: Expand Your VMware Drive Image

So you’ve been given a pre-configured VMware image of Windows XP and a copy of VMware Player for your work environment. The considerate IT intern who probably configured your VMware image has no idea what you do day in and day out, let alone what tools you need in order to get your job done. They only know that there is 4.7GB of space to work with on that DVD to cram your VMware image onto. So you end up with a Windows XP guest OS with only the latest version of Office loaded. All of that fits into the 10GB preallocated for your virtual hard disk and easily leaves 5-6GB of space. By the time you are done getting everything tweaked and configured just the way you like it, you find that you have only 200MB of space remaining. After all, the space requirements for your ETL software, database development software, database documentation, and other utilities add up.

This guide will show you how to expand the size of your VMware virtual hard disk and allocate that additional space to the partition in the guest OS.

Here are the things you will need before we proceed:

  • BACKUP of your VMware image.
  • Guest OS will need at least 75MB of free space.
  • Download a copy of EASEUS Partition Master Home Edition in the Guest OS.
  • BACKUP of your VMware image. – in case it wasn’t clear the first time I said it.

Step 0. Backup VMware Image

Before you proceed, you will want to make sure that you back up your VMware image. If you back up your hard drive regularly and are comfortable with your most recent restore point, then this step can be bypassed. If you don’t have a disaster recovery plan in place for your hard drive, I suggest you copy the entire folder containing your virtual machine to an external hard drive. It may not be the best solution, but it is better than nothing.

Disclaimer: If you have sensitive information contained within your guest OS, please make sure your backup is secured in a manner that meets your company policies. If your host system hard drive is encrypted, copying the VMware image to a non-encrypted external hard drive will likely produce a non-encrypted backup.

Step 1. Select Virtual Machine

With the VMware Player screen open, select the virtual machine you wish to manipulate. Then click on: “Edit virtual machine settings”

Continue reading

Posted in Blog, Tech

From Bits to Bytes

Turning a passion for computers into a successful career

My passion for computers started a little over 25 years ago when my dad brought home our first computer, a Commodore 64 (C64), and hooked it up to our TV. At first it served a single purpose for me and my brother – video game console. We would spend hours swapping cartridges playing games like Paperboy and Centipede. It wasn’t until I saw my dad spending countless hours entering programs from a computer magazine into the computer that I realized there might be something more than just playing games. Eventually the TV would be replaced with a dedicated monitor; the cassette player with a floppy disk drive; the C64 with a Commodore 128 (C128).

Adieu Q-Link AOL, Bonjour BBS

Along with the C128 came this thing called a modem that made funny sounds, and a subscription to Q-Link. While the early days of Q-Link were nothing like what AOL would eventually become, it provided a stepping stone to connecting with others online. Soon thereafter we stumbled across the world of Bulletin Board Systems (BBS), and the subscription to Q-Link was no longer necessary. We quickly found a group of BBSes that we would frequent, and it wasn’t long before we were invited to attend our first bulletin board meeting. I remember vividly how everyone gathered around the guy who had a 20 MB hard drive. It was there I was exposed to the online community and gaming on a BBS, and there was no turning back. I was addicted.

Soon thereafter I was running my own BBS on the Commodore 128. Within the first year of running the BBS, the Commodore 128 was replaced with my first PC, an 8086 XT running at 4.77 MHz. As the early 90s progressed, so did the successive replacements of computer hardware in my room. By the time I graduated high school, my BBS was connected to FidoNet, a network of message boards much like USENET newsgroups. It wouldn’t be long before my first exposure to the Internet while in college.

Reality Bites

While I was addicted to computers going through high school, my heart was set on pursuing a degree in meteorology in the Air Force. When I wasn’t eligible to apply for the Air Force Academy, I quickly realized that I could capitalize on my passion for computers and pursue a computer science degree instead. It was while taking my first (and only) class on databases that the instructor mentioned the opportunity to complete an internship with a regional retailer. A few of my friends and I applied, and I quickly went from converting Paradox databases to Microsoft Access as an intern to being hired as a full-time employee developing web applications against SQL Server. I finished college while working full-time.

It was while working on a satellite team under the data warehouse team that I got my first exposure to database administration with SQL Server. I found that I enjoyed working with T-SQL and the administrative aspects of SQL Server. Eventually, I grew tired of writing applications, and when the opportunity to become a junior DBA on the data warehouse team opened up, I jumped in with both feet. It was there that two other DBAs and a technical architect mentored me and solidified my career as a Teradata database administrator. They saw something in me that I hadn’t yet seen and knew that with the right guidance I would be on a path to a successful career in data warehousing. I was fortunate enough to work with them for a couple of years before making what would be the biggest career decision of my life at the time.

Knock, Knock… Who’s There?

Opportunity. It knocks only so many times before turning away and knocking on the door of someone else. I can’t tell you how many times I repeated those words as I struggled with the next career decision. I was asked to work with a small consulting firm on a one-year contract in the public sector on a data warehousing project as an ETL developer. I turned them down at first. I had a young family, my youngest son was only two, and even though the job was close enough that I could be home every night, the uncertainty beyond the initial 12 months was the deal breaker. When that firm contacted me again a few weeks later, I took the risk after talking to someone I trusted professionally for the advice I sorely needed. Those initial 12 months turned into six years.

Initially, my SQL experience was most critical as I worked on a payment accuracy system and a client master data model. However, my responsibilities as a database administrator increased over my time on the project. This allowed me to work closely with the system DBA team to help establish best practices and workload management guidelines that affected the entire data warehouse. Before long I was trusted by the client for my technical acumen.

During the latter portion of my time there, I started to become more involved in the Teradata community. Being an active member has allowed me to help others and establish myself as an expert among my peers. Twitter has also broadened my interactions with experts in the SQL Server, Oracle, business intelligence, data management, data quality, and data governance communities. As a result, I started to blog both for my employer and for my own site on various data-related topics. It was great exposure for me and gave me the opportunity to meet new people who share similar passions about data.

It was a combination of the mentor I turned to six years ago and someone I met via Twitter that helped provide guidance and insight about the most difficult career decision I have made so far – becoming an independent consultant. I have recently accepted a lead DBA position on another public sector project with an initial 12-month contract. Sound familiar?

Posted in Blog, Career