Friday, 15 March 2013

Warehouse Table Naming Conventions

 

These conventions are based upon Oracle’s BI Applications 7.9.6.3 standards.

The Analytic Data Warehouse (ADW) is an arbitrary name for the performance layer of your data-warehouse. The diagram below shows a typical data warehouse reference architecture with the Performance Layer circled in red.

image

Table Names

The ADW schema will predominantly hold tables built by ETL and queried by OBIEE

  • Fact tables contain columns which can be summed, averaged, etc.
  • Dimensional tables contain descriptive columns
  • ADW tables to be loosely named in the format business name/grain/type
  • Business name is a short business description of the table which is not tied to Northgate or e-Business Suite names
  • Grain is optional and describes table granularity (useful for aggregate tables)
    • For the time dimension, the level names DAY, WEEK, MONTH, YEAR should be used
  • Type is one of:
    • _A = aggregated fact table
    • _D = dimension table
    • _DH = dimension hierarchy table
    • _F = fact table
    • Staging tables (used during ETL but not by OBIEE) will have S appended to the
      • _DS = dimensional staging table
      • _FS = fact staging table
    • Materialized views should have MV appended to them
      • _DMV = dimensional materialized view
      • _FMV = fact materialized view
  • By default use singular names (facts maybe plural?)
  • Some examples:
    • TENANCY_D = Tenancy dimension
    • DEBT_DETAIL_F = Debt fact (lowest level detail)
    • DEBT_REGION_A = Debt fact aggregated to a regional level
    • DAY_D = Calendar dimension at day level (lowest level)
    • WEEK_D = Calendar dimension at week level
    • MONTH_D = Calendar dimension at month level
    • PROPERTY_UTILISATION_FMV = Property utilisation fact materialized view

Columns

  • Column names should be business relevant, non-technical, and not tied to Northgate or e-Business Suite names
  • Column suffixes:
    • WID - Identifier generated by ETL to link dimension and fact tables (except ROW_ID). For example TENANCY_WID would reference TENANCY_D.ROW_ID.
    • _ID - Source system identifier
    • _FLG - Flags Y/N
    • _IND - Indicator e.g. D,W,M
    • _DT - Date
    • _TS – Timestamp
    • _AMT - Amount
    • _CNT - Count
    • _PCT - Percentage
    • _CODE - A coded alphanumeric value that defines a possible value for a data element
    • _NAME - A short name that describes the code
    • _DESC - A long description for the code
  • Other conventions:
    • _NO - For descriptive number columns, i.e. TELEPHONE_NO
    • _TXT - For text fields which contain concatenated values etc.
  • System columns on each table:
    • ROW_WID - surrogate key to uniquely identify record (generated during ETL), first column in table
    • DATASOURCE_NUM_ID - Unique identifier of source system from which data was extracted. TBD numbers to use, 2nd column in table
    • INTEGRATION_ID - Unique identifier of fact or dimension in source system, 3rd column in table
    • W_INSERT_DT - Date inserted to Warehouse, at the end of table column definitions
    • W_UPDATE_DT - Date updated to Warehouse, at the end of table column definitions
    • ETL_RUN_ID - Identifier of ETL process that loaded, integrated, checked the row
  • Optional on each table:
    • CREATED_ON_DT - Date created in source system
    • CHANGED_ON_DT - Date changed in source system
    • DELETED_FLG - For logical deletes of warehouse records
  • Type 2 Slowly Changing Dimension columns:
    • EFFECTIVE_FROM_DT – Date that the dimensional record is effective from
    • EFFECTIVE_TO_DT – Date that the dimensional record is effective to
    • CURRENT_FLG – Is record current? Y/N
  • All ADW surrogate keys should be integers and generated from a sequence
  • Foreign keys should always use warehouse surrogate keys
  • The W_ prefix denotes a translated to warehouse code/name/description combination. (the concept of domain values)

Notes on Time Dimensions

  • Reporting, financial, and other calendars are all associated (de-normalized) onto each day of the year
  • Dimensional aggregates are needed by OBIEE for aggregate navigation. So WEEK_D, MONTH_D, and YEAR_D should also be created.
  • ETL updates dimension with reporting flags as necessary
  • The surrogate key ROW_WID is an integer representation of the period, of the format YYYYMMDD or YYYYMM or YYYY

Example Physical Table Column Conventions

Domain

Datatype

Notes

AMT

Number (12,2)

Amount

CNT

Integer

Count

CODE

Varchar2 (20)

Code e.g. TUS, TUD

DD

Number (2)

Day of month e.g. 1-31

DDD

Number (3)

Day of year e.g. 1-366

DESC

Varchar2 (200)

Code description

DS

Number (2)

Datasource e.g. 1=EBS

DT

Date

Date and time

FLG

Char (1)

Y/N flag

ID

Varchar2 (12)

Identifier

IND

Varchar2 (3)

Indicator e.g. D,W,M,Q,A

MM

Number (2)

Month of year e.g. 1-12

NAME

Varchar2 (250)

Name

NO

Number (12)

Number

PCT

Number (3,3)

Percentage

QQ

Number (2)

Quarter of year e.g. 1-4

TS

Timestamp

Timestamp

TXT

Varchar2 (200)

Text string e.g. +64 4 1234567

WW

Number (2)

Week of year e.g. 1-53

WID

Number (12)

Warehouse Identifier i.e. rowid

YYYY

Number (4)

Year e.g. 2012

 

Refresher on Slowly Changing Dimensions Types

(from Wikipedia)

clip_image002

No comments:

Post a Comment