ETL

ETL (Extract, Transform & Load):
  • Data is extracted from online transaction processing (OLTP) databases (today more commonly known simply as transactional databases) and from other data sources.
  • OLTP applications have high throughput, with large numbers of read and write requests.
  • They do not lend themselves well to data analysis or business intelligence tasks.
  • Data is then transformed in a staging area.
  • These transformations cover both data cleansing and optimizing the data for analysis.
  • The transformed data is then loaded into an online analytical processing (OLAP) database, today more commonly known simply as an analytics database.
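As a rough illustration of the three stages only, here is a minimal sketch. The table names, columns, sample rows, and the in-memory sqlite3 database are all invented for this example; a real pipeline would read from the OLTP source and write to the analytics database.

# Minimal ETL sketch; source and target are stand-ins for real databases.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Extract: pull raw rows from the transactional (OLTP) source table.
cur.execute("CREATE TABLE orders (order_id INTEGER, amount TEXT, country TEXT)")
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, " 100.50 ", "us"), (2, None, "in"), (3, "75", "US")])
raw_rows = cur.execute("SELECT order_id, amount, country FROM orders").fetchall()

# Transform: cleanse in the staging area (trim, default missing values,
# normalize case) and shape the data for analysis.
staged = [
    (order_id, float(amount.strip()) if amount else 0.0, country.upper())
    for order_id, amount, country in raw_rows
]

# Load: write the cleansed rows into the analytics (OLAP) target table.
cur.execute("CREATE TABLE orders_fact (order_id INTEGER, amount REAL, country TEXT)")
cur.executemany("INSERT INTO orders_fact VALUES (?, ?, ?)", staged)
conn.commit()
print(cur.execute("SELECT * FROM orders_fact").fetchall())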
------------------------------------------------------------------------------------------------------------------------
Data Retention: How long the data needs to remain available in the database.

Why is data retention required?

1) Performance degrades as the volume of data grows.
2) Storage and maintenance costs rise if the database keeps accumulating data.
3) Old records may no longer be useful to end users after a certain period of time.

Purging: Deleting data from the database once it has crossed the defined retention period.
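A minimal sketch of a purge job is shown below. The table name (audit_log), the timestamp column (created_at), the 90-day retention window, and the in-memory sqlite3 database are assumptions made only for illustration.

# Purge rows older than the retention window.
import sqlite3

RETENTION_DAYS = 90

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE audit_log (id INTEGER, created_at TEXT)")
cur.executemany("INSERT INTO audit_log VALUES (?, ?)",
                [(1, "2020-01-01"), (2, "2099-01-01")])

# Delete everything whose timestamp falls before the retention cutoff.
cur.execute(
    "DELETE FROM audit_log WHERE created_at < date('now', ?)",
    (f"-{RETENTION_DAYS} days",),
)
conn.commit()
print(cur.execute("SELECT COUNT(*) FROM audit_log").fetchone()[0])  # rows kept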
-----------------------------------------------------------------------------------------------------------------------

ETL testing process:

1) Requirement analysis (inputs: data model & mapping document)
2) Test Planning
3) Test design (Preparation of test cases)
4) Test execution
5) Defect Retesting
6) Sign Off
-----------------------------------------------------------------------------------------------------------------------
Testing Scenarios:

1) Table structure verification
2) Constraints check
3) Index check (Ensure that the index is created on the required columns)
4) Source data validation
5) Data count check
6) Data comparison check
7) Duplicate data validation (see the sample checks after this list)
8) Primary key & foreign key integrity check
9) Null check
10) Data precision check (Create test data in the source table with different precisions and ensure the loaded data has the precision as per the requirement)
11) Date format check
12) All business transformation rules validation
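Several of these scenarios (7: duplicates, 9: nulls, 11: date format) can be written as "zero rows expected" SQL checks. The sketch below runs them against an in-memory sqlite3 table; the target table name is borrowed from the data model further down, but the reg_date column and the exact SQL are assumptions for illustration.

# Each check should return zero rows against the loaded target table.
import sqlite3

CHECKS = {
    "duplicate_check": """
        SELECT reg_id, COUNT(*) FROM ab_registration_detl
        GROUP BY reg_id HAVING COUNT(*) > 1
    """,
    "null_check": """
        SELECT * FROM ab_registration_detl WHERE reg_id IS NULL
    """,
    "date_format_check": """
        SELECT * FROM ab_registration_detl
        WHERE reg_date IS NOT NULL AND date(reg_date) IS NULL
    """,
}

conn = sqlite3.connect(":memory:")  # stand-in for the real target connection
conn.execute("CREATE TABLE ab_registration_detl (reg_id INTEGER, reg_date TEXT)")
conn.execute("INSERT INTO ab_registration_detl VALUES (1, '2024-01-15')")

for name, sql in CHECKS.items():
    bad_rows = conn.execute(sql).fetchall()
    print(name, "PASS" if not bad_rows else f"FAIL ({len(bad_rows)} rows)")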
-----------------------------------------------------------------------------------------------------------------------
Types of transformations:

Active: The output record count of the transformation may or may not be equal to the input record count. (E.g. a filter condition such as age between 25 and 30)
Aggregator (Max, Min, Count, etc.)
Joiner
Union
Sorter

Passive: The output record count of the transformation is always equal to the input record count.
(E.g. an Expression transformation that concatenates the first name and last name columns)
Expression
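The difference can be seen in a toy, in-memory sketch. The sample records, column names, and plain-Python list operations below are illustrative stand-ins for transformations that would normally run inside the ETL tool.

# Active vs. passive transformations on in-memory rows.
rows = [
    {"first_name": "Asha", "last_name": "Rao", "age": 27},
    {"first_name": "Ben", "last_name": "Lee", "age": 41},
]

# Active (filter): output count may differ from input count.
filtered = [r for r in rows if 25 <= r["age"] <= 30]

# Passive (expression): one output row per input row, with a derived column.
with_full_name = [
    {**r, "full_name": f"{r['first_name']} {r['last_name']}"} for r in rows
]

print(len(rows), len(filtered), len(with_full_name))  # 2 1 2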
-----------------------------------------------------------------------------------------------------------------------
Basic Test cases

1) To validate the table structure and its correctness by comparing against the data model.
2) To validate that the source and target table counts match.
3) To confirm that records are unique for a particular ENTITY_NUM.
4) To ensure that columns defined as non-nullable in the data model document do not contain NULL values.
5) To verify that the data matches between the source and target tables (see the sketch after this list).
6) Format validation of date and timestamp columns.
7) Format validation of integer and float values.
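A minimal sketch of test cases 2 and 5 (count check and source-to-target data comparison), assuming a direct one-to-one mapping. The in-memory sqlite3 connections, the reg_name column, and the sample rows are placeholders for the real source and target databases.

# Count check and full data comparison between source and target.
import sqlite3

src = sqlite3.connect(":memory:")
tgt = sqlite3.connect(":memory:")
src.execute("CREATE TABLE registration_detl (reg_id INTEGER, reg_name TEXT)")
tgt.execute("CREATE TABLE ab_registration_detl (reg_id INTEGER, reg_name TEXT)")
src.executemany("INSERT INTO registration_detl VALUES (?, ?)", [(1, "A"), (2, "B")])
tgt.executemany("INSERT INTO ab_registration_detl VALUES (?, ?)", [(1, "A"), (2, "B")])

# Test case 2: source and target row counts should match.
src_count = src.execute("SELECT COUNT(*) FROM registration_detl").fetchone()[0]
tgt_count = tgt.execute("SELECT COUNT(*) FROM ab_registration_detl").fetchone()[0]
assert src_count == tgt_count, f"count mismatch: {src_count} vs {tgt_count}"

# Test case 5: the full row sets should match after the (direct) mapping.
src_rows = set(src.execute("SELECT reg_id, reg_name FROM registration_detl"))
tgt_rows = set(tgt.execute("SELECT reg_id, reg_name FROM ab_registration_detl"))
assert src_rows == tgt_rows, f"mismatched rows: {src_rows ^ tgt_rows}"
print("count and data comparison checks passed")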
-----------------------------------------------------------------------------------------------------------------------
Data model content:

Table information:
1) Source table: REGISTRATION_DETL
2) Target table: AB_REGISTRATION_DETL
3) Lookup table: DLY_REG_SEC

Input & loading information:
1) Loading frequency: DAILY
2) Loading logic: INCREMENTAL
3) Partition key: STATE_DC
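A hedged sketch of what the DAILY + INCREMENTAL loading logic could look like: only source rows changed since the last successful load are picked up. The last_load_ts bookmark, the updated_at column, and the in-memory sqlite3 table are assumptions for illustration; the table and partition column names follow the data model below.

# Incremental pick-up of changed rows from the source table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE registration_detl (reg_id INTEGER, state_dc TEXT, updated_at TEXT)")
conn.executemany("INSERT INTO registration_detl VALUES (?, ?, ?)",
                 [(1, "KA", "2024-01-01"), (2, "MH", "2024-01-05")])

last_load_ts = "2024-01-02"  # bookmark recorded by the previous daily run
delta = conn.execute(
    "SELECT reg_id, state_dc, updated_at FROM registration_detl WHERE updated_at > ?",
    (last_load_ts,),
).fetchall()
print(delta)  # only rows newer than the bookmark would be loaded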

Details:
1) Sr. No.: 1
2) Field: REG_ID
3) Description: REGISTRATION ID
4) Application Name/ Database Name: FINAL_DISTRIBUTION
5) Source table: REG_DETL
6) Source field: REG_ID
7) Lookup table: DLY_REG_SEC
8) Lookup column: REG_ID
9) Lookup return column: STATE_DESC
10) Data transformation logic: DIRECT
11) Fact (or) Dimension (or) Referred: DIMENSION
12) Key: PRI
13) Data type: int
14) Length: 11
15) Nullable: N
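One way to read this mapping: REG_ID is carried over directly from REG_DETL, and DLY_REG_SEC is used as a lookup on REG_ID to return STATE_DESC. A hedged sketch of validating that lookup against the loaded target is shown below; the STATE_DESC column on the target, the sample values, and the in-memory sqlite3 tables are assumptions made only for illustration.

# Compare the expected lookup result against what was loaded into the target.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reg_detl (reg_id INTEGER);
    CREATE TABLE dly_reg_sec (reg_id INTEGER, state_desc TEXT);
    CREATE TABLE ab_registration_detl (reg_id INTEGER, state_desc TEXT);
    INSERT INTO reg_detl VALUES (1);
    INSERT INTO dly_reg_sec VALUES (1, 'KARNATAKA');
    INSERT INTO ab_registration_detl VALUES (1, 'KARNATAKA');
""")

expected = set(conn.execute("""
    SELECT s.reg_id, l.state_desc
    FROM reg_detl s JOIN dly_reg_sec l ON s.reg_id = l.reg_id
"""))
loaded = set(conn.execute("SELECT reg_id, state_desc FROM ab_registration_detl"))
print("lookup check", "PASS" if expected == loaded else "FAIL")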



---------------------------------------------------------------------
Here is a high-level overview of a typical Glue reconciliation regression testing process:
    1. Identify the systems to be reconciled. This may include a data warehouse, data lake, and/or other data sources.
    2. Identify the data to be reconciled. This may include all data in the systems, or a specific subset of data.
    3. Identify the change that was made to one of the systems. This may be a new data source, a change to the data schema, or a change to the data processing logic.
    4. Generate test cases. This will involve identifying the different ways in which the data could be affected by the change, and creating test cases to verify that the data is still synchronized after the change.
    5. Execute the test cases. This may involve running Glue jobs, using a third-party tool, or manually comparing the data in the two systems (a minimal PySpark comparison sketch follows this list).
    6. Analyze the results. If any of the test cases fail, then the cause of the failure needs to be investigated and fixed.
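As an illustration of step 5, a row-level comparison between the two systems might look like the hedged PySpark sketch below. The hard-coded sample DataFrames stand in for reads from the Glue Data Catalog or S3, and the column names are invented.

# Surface rows that exist in one dataset but not the other.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("recon-regression").getOrCreate()

# In a real Glue job these would come from the Glue Data Catalog / S3 sources.
warehouse_df = spark.createDataFrame([(1, "A"), (2, "B")], ["id", "val"])
lake_df = spark.createDataFrame([(1, "A"), (2, "B"), (3, "C")], ["id", "val"])

missing_in_lake = warehouse_df.exceptAll(lake_df)       # rows only in the warehouse
missing_in_warehouse = lake_df.exceptAll(warehouse_df)  # rows only in the lake

print("only in warehouse:", missing_in_lake.count())
print("only in lake:", missing_in_warehouse.count())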

                                ---------------------------------------------------------------------

 

