These conventions are based upon Oracle’s BI Applications 7.9.6.3 standards.
The Analytic Data Warehouse (ADW) is an arbitrary name for the performance layer of your data-warehouse. The diagram below shows a typical data warehouse reference architecture with the Performance Layer circled in red.
Table Names
The ADW schema will predominantly hold tables built by ETL and queried by OBIEE
- Fact tables contain columns which can be summed, averaged, etc.
- Dimensional tables contain descriptive columns
- ADW tables to be loosely named in the format business name/grain/type
- Business name is a short business description of the table which is not tied to Northgate or e-Business Suite names
- Grain is optional and describes table granularity (useful for aggregate tables)  - For the time dimension, the level names DAY, WEEK, MONTH, YEAR should be used
 
- Type is one of:  - _A = aggregated fact table
- _D = dimension table
- _DH = dimension hierarchy table
- _F = fact table
- Staging tables (used during ETL but not by OBIEE) will have S appended to the  - _DS = dimensional staging table
- _FS = fact staging table
 
- Materialized views should have MV appended to them  - _DMV = dimensional materialized view
- _FMV = fact materialized view
 
 
- By default use singular names (facts maybe plural?)
- Some examples:  - TENANCY_D = Tenancy dimension
- DEBT_DETAIL_F = Debt fact (lowest level detail)
- DEBT_REGION_A = Debt fact aggregated to a regional level
- DAY_D = Calendar dimension at day level (lowest level)
- WEEK_D = Calendar dimension at week level
- MONTH_D = Calendar dimension at month level
- PROPERTY_UTILISATION_FMV = Property utilisation fact materialized view
 
Columns
- Column names should be business relevant, non-technical, and not tied to Northgate or e-Business Suite names
- Column suffixes:  - WID - Identifier generated by ETL to link dimension and fact tables (except ROW_ID). For example TENANCY_WID would reference TENANCY_D.ROW_ID.
- _ID - Source system identifier
- _FLG - Flags Y/N
- _IND - Indicator e.g. D,W,M
- _DT - Date
- _TS – Timestamp
- _AMT - Amount
- _CNT - Count
- _PCT - Percentage
- _CODE - A coded alphanumeric value that defines a possible value for a data element
- _NAME - A short name that describes the code
- _DESC - A long description for the code
 
- Other conventions:  - _NO - For descriptive number columns, i.e. TELEPHONE_NO
- _TXT - For text fields which contain concatenated values etc.
 
- System columns on each table:  - ROW_WID - surrogate key to uniquely identify record (generated during ETL), first column in table
- DATASOURCE_NUM_ID - Unique identifier of source system from which data was extracted. TBD numbers to use, 2nd column in table
- INTEGRATION_ID - Unique identifier of fact or dimension in source system, 3rd column in table
- W_INSERT_DT - Date inserted to Warehouse, at the end of table column definitions
- W_UPDATE_DT - Date updated to Warehouse, at the end of table column definitions
- ETL_RUN_ID - Identifier of ETL process that loaded, integrated, checked the row
 
- Optional on each table:  - CREATED_ON_DT - Date created in source system
- CHANGED_ON_DT - Date changed in source system
- DELETED_FLG - For logical deletes of warehouse records
 
- Type 2 Slowly Changing Dimension columns:  - EFFECTIVE_FROM_DT – Date that the dimensional record is effective from
- EFFECTIVE_TO_DT – Date that the dimensional record is effective to
- CURRENT_FLG – Is record current? Y/N
 
- All ADW surrogate keys should be integers and generated from a sequence
- Foreign keys should always use warehouse surrogate keys
- The W_ prefix denotes a translated to warehouse code/name/description combination. (the concept of domain values)
Notes on Time Dimensions
- Reporting, financial, and other calendars are all associated (de-normalized) onto each day of the year
- Dimensional aggregates are needed by OBIEE for aggregate navigation. So WEEK_D, MONTH_D, and YEAR_D should also be created.
- ETL updates dimension with reporting flags as necessary
- The surrogate key ROW_WID is an integer representation of the period, of the format YYYYMMDD or YYYYMM or YYYY
Example Physical Table Column Conventions
| Domain | Datatype | Notes | 
| AMT | Number (12,2) | Amount | 
| CNT | Integer | Count | 
| CODE | Varchar2 (20) | Code e.g. TUS, TUD | 
| DD | Number (2) | Day of month e.g. 1-31 | 
| DDD | Number (3) | Day of year e.g. 1-366 | 
| DESC | Varchar2 (200) | Code description | 
| DS | Number (2) | Datasource e.g. 1=EBS | 
| DT | Date | Date and time | 
| FLG | Char (1) | Y/N flag | 
| ID | Varchar2 (12) | Identifier | 
| IND | Varchar2 (3) | Indicator e.g. D,W,M,Q,A | 
| MM | Number (2) | Month of year e.g. 1-12 | 
| NAME | Varchar2 (250) | Name | 
| NO | Number (12) | Number | 
| PCT | Number (3,3) | Percentage | 
|  | Number (2) | Quarter of year e.g. 1-4 | 
| TS | Timestamp | Timestamp | 
| TXT | Varchar2 (200) | Text string e.g. +64 4 1234567 | 
| WW | Number (2) | Week of year e.g. 1-53 | 
| WID | Number (12) | Warehouse Identifier i.e. rowid | 
| YYYY | Number (4) | Year e.g. 2012 | 
Refresher on Slowly Changing Dimensions Types
(from Wikipedia)
 
No comments:
Post a Comment