Building a Data Warehouse Automation community through knowledge sharing.
More advanced Link Object Staging Logic
When the source relationship is not based on the Business Keys
REMARK : This is a fictive model explaining a case and is not based on a real world example
Through this model we want to explain the more advanced logic necessary in the staging layer to solve the relationship and load the link table when the foreign/primary key relationship between the source objects is not based on the Business Key.
The relational source model as shown on figure 1 shows the relationship between an Invoice source table and a Customer table.
The Business Keys are indicated by the U-sign front of the column and if they do not exist than the the Business Key is indicated by the Unique key.
More specific in this model, it means :
The relationship between the 2 tables in this model is based on the customer_id which is the technical key (surrogate key) and is not based on the business key, meaning the combination of First_name, Last_name,Birth_date and Gender.
It means the following:
Figure 2 explains the 2 lookups that are needed to resolve the link between Invoice and Customer.
In the invoices landing table are 2 incremental records with which have 5 and 7 as the customer-id.
When the lookup happens to the Customers landing table it only holds the record with the Customer_Number = 5 which means that the record with Customer_Number 7 did not arrive into the landing area during this incremental load.
The only way to solve this is to do the lookup to the Satellite and that the Satellite holds the primary key of the Customers table = Customer_Number, so the lookup is possible.
Through the lookup to the HUB using the Hash key, the Business Keys kan be found resulting in a fully solved Link table.
Underneath you find the SQL used to load the invoices Staging table and which resolves the Link table between Invoices and Customers in a Oracle Database prior to 12C.
INSERT INTO moto_stg_ms_tst_ref.stg_invoices
(SELECT distinct invoice_customer_id
FROM distfkinv_cust stg
JOIN moto_fl_tst_ref.sat_ms_customers sat
ON stg.invoice_customer_id = sat.customer_number
JOIN moto_fl_tst_ref.hub_customers hub
ON sat.customers_hkey = hub.customers_hkey
WHERE sat.load_end_date = to_date('31/12/2399','DD/MM/YYYY')
FROM moto_ext_ms_tst_ref.ext_customers ex
JOIN distfkinv_cust dist
ON ex.customer_number = dist.invoice_customer_id
SELECT UPPER(TO_CHAR(RAWTOHEX (DBMS_CRYPTO.HASH(TO_CLOB(r.invoice_number_bk),2)))) invoices_hkey
,UPPER(TO_CHAR(RAWTOHEX (DBMS_CRYPTO.HASH(TO_CLOB(UPPER('moto_sales') || '#' || c.business_key_bk),2)))) customers_hkey
,UPPER(TO_CHAR(RAWTOHEX (DBMS_CRYPTO.HASH(TO_CLOB(r.invoice_number_bk || '#' || UPPER('moto_sales') || '#' || c.business_key_bk),2)))) lnk_invo_cust_hkey
FROM moto_ext_ms_tst_ref.ext_invoices r
LEFT OUTER JOIN find_BK_fkinv_cust c ON (r.invoice_customer_id=c.invoice_customer_id)
An argument could be made that this Technical key could also be put into the Hub which results in one less lookup from the Satellite to the Hub.
As the definition of the Hub is to only contain Business Keys it was our assumption that it was best to put this key in the Satellite but make it a special attribute that does not take part in the Change of History, meaning it will not be part of the calculation of the Hash Difference Attribute
As an additional Information we have added the target Data Vault model to this Article, it is shown on figure 3.
Tripwire Solutions N.V.