Maintaining Link Tables

 

Building a Data Warehouse Automation community through knowledge sharing.

More advanced Link Object Staging Logic

When the source relationship is not based on the Business Keys

Figure 1

REMARK : This is a fictive model explaining a case and is not based on a real world example


Through this model we want to explain the more advanced logic necessary in the staging layer to solve the relationship and load the link table when the foreign/primary key relationship between the source objects is not based on the Business Key.

 

The relational source model as shown on figure 1 shows the relationship between an Invoice source table and a  Customer table.


The Business Keys are indicated by the U-sign front of the column and if they do not exist than the the Business Key is indicated by the Unique key. 

More specific in this model, it means :


  • Invoices:
    • The Business Key is INVOICE_NUMBER
  • Customers:
    • The Business Key is a composite key consisting of the columns:
      • FIRST_NAME
      • LAST_NAME
      • BIRTH_DATE
      • GENDER


The relationship between the 2 tables in this model is based on the customer_id which is the technical key (surrogate key) and is not based on the business key, meaning the combination of First_name, Last_name,Birth_date and Gender.


It means the following:

  • That the link table cannot be loaded only by using the Invoices table (where the relationship exists) but the a lookup is needed to find the Business Keys in the Customers table.
  • As the data in the landing area holds incremental data it means that you may not find all the records to solve the relationships during the lookup and that you have to do a lookup to the Customer Hub to find the missing related records and their Business Keys.

Figure 2


Figure 2 explains the 2 lookups that are needed to resolve the link between Invoice and Customer.

In the invoices landing table are 2 incremental records with which have 5 and 7 as the customer-id.

When the lookup happens to the Customers landing table it only holds the record with the Customer_Number = 5 which means that the record with Customer_Number 7 did not arrive into the landing area during this incremental load.


The only way to solve this is to do the lookup to the Satellite and that the Satellite holds the primary key of the Customers table = Customer_Number, so the lookup is possible.

Through the lookup to the HUB using the Hash key, the Business Keys kan be found resulting in a fully solved Link table.


Underneath you find the SQL used to load the invoices Staging table and which resolves the Link table between Invoices and Customers in a Oracle Database prior to 12C.


INSERT INTO moto_stg_ms_tst_ref.stg_invoices

 (

                    invoices_hkey

                    ,customers_hkey

                    ,lnk_invo_cust_hkey

                    ,load_date

                    ,src_bk

                    ,load_cycle_id

                    ,trans_timestamp

                    ,operation

                    ,record_type

                    ,invoice_number

                    ,invoice_customer_id

                    ,invoice_number_bk

                    ,business_key_bk

                    ,invoice_date

                    ,amount

                    ,discount

                    ,error_code

)

WITH

    distfkinv_cust AS

    (SELECT distinct invoice_customer_id

       FROM moto_ext_ms_tst_ref.ext_invoices)

  ,find_BK_fkinv_cust AS

    (

     SELECT hub.business_key_bk

           ,stg.invoice_customer_id

       FROM distfkinv_cust stg

       JOIN moto_fl_tst_ref.sat_ms_customers sat

         ON stg.invoice_customer_id = sat.customer_number

       JOIN moto_fl_tst_ref.hub_customers hub

         ON sat.customers_hkey = hub.customers_hkey

      WHERE sat.load_end_date = to_date('31/12/2399','DD/MM/YYYY')

      UNION

      SELECT ex.business_key_bk

             ,ex.customer_number invoice_customer_id

      FROM moto_ext_ms_tst_ref.ext_customers ex

      JOIN distfkinv_cust dist

        ON ex.customer_number = dist.invoice_customer_id

    )

SELECT UPPER(TO_CHAR(RAWTOHEX (DBMS_CRYPTO.HASH(TO_CLOB(r.invoice_number_bk),2)))) invoices_hkey    

       ,UPPER(TO_CHAR(RAWTOHEX (DBMS_CRYPTO.HASH(TO_CLOB(UPPER('moto_sales') || '#' || c.business_key_bk),2)))) customers_hkey    

       ,UPPER(TO_CHAR(RAWTOHEX (DBMS_CRYPTO.HASH(TO_CLOB(r.invoice_number_bk || '#' || UPPER('moto_sales') || '#' || c.business_key_bk),2)))) lnk_invo_cust_hkey 

       ,load_date

       ,UPPER('moto_sales') src_bk

       ,load_cycle_id

       ,r.trans_timestamp

       ,r.operation

       ,r.record_type

       ,r.invoice_number

       ,r.invoice_customer_id

       ,r.invoice_number_bk

       ,c.business_key_bk

       ,invoice_date

       ,amount

       ,discount

 FROM moto_ext_ms_tst_ref.ext_invoices r

 LEFT OUTER JOIN find_BK_fkinv_cust c ON (r.invoice_customer_id=c.invoice_customer_id)

;


An argument could be made that this Technical key could also be put into the Hub which results in one less lookup from the Satellite to the Hub.
As the definition of the Hub is to only contain Business Keys it was our assumption that it was best to put this key in the Satellite but make it a special attribute that does not take part in the Change of History, meaning it will not be part of the calculation of the Hash Difference Attribute

Figure 3


As an additional Information we have added the target Data Vault model to this Article, it is shown on figure 3.


Dirk Vermeiren

Tripwire Solutions N.V.



Email Me

Copyright @ 2019 Mireconsuling.com All Rights Reserved

Data Vault Community

Become member

Privacy Policy