Naming Conventions In The Data Mart

What you call things matter. It matters to you, and it matters to others who interact with you. When you name things, your project, your children, your children’s stuffed animals, you typically consider two perspectives: what YOU think the name should be and what would OTHERS think about it.

Naming your data mart database tables is no different. And while when naming pets or stuffed animals, I would argue that what you think should matter just as much as what you think others will think about it, when it comes to tables and columns in your mart, I would suggest the OTHERS perspective should have a much higher weight.

Coming up with a naming convention and database standards at large is probably one of the first tasks you will face when starting to work on your reporting project implementation phase. In your logical designs and high levels diagrams you were able to use names that business users relate to, but when the time comes to start implementing the physical model, the DBAs, ETL developers and the rest of the technical database development team takes over and the needs of OTHERS, when it comes to what things are called, are usually completely ignored.

The fact of the matter is that most of the folks who are working on creating the data warehouse, operational data store or data mart, will never actually use it for any kind of reporting purpose. They are concerned with building database objects that would satisfy a variety of reporting needs, explicitly stated, and sometimes not. Structuring them in a way that will transcend the mind boggling fast pace of change in any modern organization. Systems come and go, business rules constantly change, personnel and minds shift. Designing a model that will hold throughout all that, and continue to deliver accurate, timely and reliable results is hard enough, and if that is accomplished, nobody except the database development team should have any say about what objects are called inside the database. Right? Well, they might be missing something.

Just like any other corporate project, the success of your reporting project is dependent on perception. The “rep” on the project is what will ultimately determine its faith. If people in the organization perceive the project to be a success, you will be a hero, but if, for whatever reason, the talk over the water fountain deems the project as a failure, you can forget about that promotion or extension you were eying.

And who will create this success or failure perception on your reporting project? Well, we can think of the people who will create this perception in terms of three groups.

First, you have your end users. These are typically the business users who defined the somewhat narrow requirements that launched your reporting initiative. They will never see the inside of your data warehouse or mart, and most of them have no idea what is an RDBMS or the faintest clue regarding how it works. They will see and use reports, dashboards and applications that rely on the data you produce, and their perception will ultimately rule. That is why you invest heavily in reporting technologies, and spend months refining fonts, colors, layouts and graphs. The end result of the monumental data effort has to look absolutely stunning, and the information has to be accurate. That is a given. If you cannot accomplish that, don’t bother getting started.

The third group is the technical implementation team. It consists of developers, business analysts, quality assurance folks and many others. This is the group the builds “the thing”. You better make sure they are all “groupies” and that the aura they project regarding anything related to the data work, is bright and positive.

The second group, in between the end users and the development team, is responsible for gluing the two. It’s the group of folks who design the interfaces between the database, and the reporting. The meta layer of your project. They are report and dashboards developers, universe designers or admin tool administrators or framework managers. They are tasked with understanding the tables, joins and structures of your database and translating those into something that can support the information needed from the end users. Some of them work directly for you as part of the project team, some you know of, but don’t manage, others, you may learn about in hind sight.

Why do these folks matter? Well, this group of individuals typically strudels the development and business worlds. They talk to everybody. They are often opinionated, and let folks on both sides of the project know what they think about the technical implementation as well as the business requirement.

They are the OTHERS you should consider when thinking about the naming convention you will deploy. Even if you are using a meta data layer such as BusinessObjects Universe, or OBIEE Admin Tool, these are the folks who will still go back to the database layer to check things, test, validate, understand, and design end products in the form of meta data layers, reports and dashboards and applications. This could be a fairly large group, and in a global situation, can be spread across the world.

If your report developers and meta data layer gurus start trashing the reporting data in front of your end business users because they cannot understand it – NOT BECAUSE IT IS NOT GOOD, BUT BECAUSE THEY CAN’T FOLLOW YOUR NAMING CONVENTIONS, you will be facing a perception problem.

So, how can you stay ahead of such a problem? Simply consider the needs of those who will be querying the database, as well as those who will be developing it, when naming things. Here are some concrete suggestions:

  1. All report developers and meta data layer designers are familiar with basic dimensional modeling concepts. They will expect to see dimensions and facts. Name your objects accordingly. Use a _FACT or _DIM suffix (some prefer prefix) to indicate the object orientation from a dimensional perspective. If you model an object as both (does that really happen often?), indicate that (_DIM_FACT). Use this same concept to indicate additional concepts such as aggregates.
  2. Avoid using acronyms and abbreviations if possible. If your RDBMS has restrictions on object names, and you have to shorten names, seek report developers and business users advise on appropriate short names to use. If you can’t do that, look to avoid cutting off long words, instead use shortening techniques that use letters from the beginning, middle and end of the word you seek to shorten. If you don’t have enough space to spell out CATEGORY, consider using CATEG, instead of CAT. A couple of months from now, It may save the report developer in Japan a lot of time and wonder as to why you would use domesticated animal  references in the reporting table names.
  3. Consider whether you need to indicate the database object type as part of its name. Your report and meta data layer developer will not care if they are using a view, a materialized view or a table. Your developers are using database tools that clearly indicate the object type, typically by displaying different types in different groupings. Especially if you are short on space for object names, do you really need to devote 3, 4 5, 6 or more character of the object names to indicate it is a _TBL_, _VW_ or _MV_?
  4. Column names matter as well. Once again, avoid acronyms and abbreviations as much as possible and spell out as much as you can to avoid ambiguity by someone who is not sitting right next to you right now, understanding exactly what you mean by some obscure three letter code you choose to use.
  5. Use a convenient and easy to follow naming convention for Primary and Foreign key column names. This is especially critical if you do not plan to enforce data integrity at the database level (which I personally typically don’t, but that’s a topic for another post..). It would make the reports and meta data layer developers life so much easier if they know that a CATEGORY_ID is a CATEGORY_ID is a CATEGORY_ID. Be clear about synthetic keys as opposed to natural keys, your reporting folks would care about that.
  6. Perhaps the most important of all is BE CONSISTENT. If your naming convention is consistent, even if it’s somewhat complex, people will eventually get it, and when they do, they will love the predictability and ease in which they can understand your models. Lack of consistency will throw the reporting team off, causing confusion, doubt and ultimately reduce the level of confidence in the mart.
This entry was posted in Data Warehousing and tagged , , , . Bookmark the permalink.