Description
The steps below describe how to run both the STAR ETL Main process (aka STAR Quarterly Refresh) and the STAR COVID-19 mini-CDM. Aside from a few configuration differences, the processing steps are the same for both.
Technical Info
Server Names
- Unix Development: plt-nosql01.dhe.duke.edu
- Unix Production: plp-nosql01.dhe.duke.edu
- Windows (both Dev and Prod): vwp-pmndmc.dhe.duke.edu
Prerequisites
- Must have an Airflow user account for the particular server. Contact ACE Data Engineering team to request access.
- Must have RDP access to vwp-pmndmc.dhe.duke.edu. Create a ServiceNow ticket to request access.
Processing Steps
- For the Quarterly refresh only, load the latest RxNorm data used by the medication subject area processing.
- Log in using a PuTTY terminal (or SSH GUI) to the appropriate Unix server (Development or Production).
- Change directory to /dwhome/airflow_ci/master/dags/CDRN_ETL/RxNormLoad/
- Run file RxNormLoad.sh
- Login to the appropriate Airflow server via a web browser
- Development: http://plt-nosql01.dhe.duke.edu:8083/admin/
- Production: http://plp-nosql01.dhe.duke.edu:8083/admin/
- Edit the Airflow variable CDRN_ETL_Options_STAR
- Click Admin→Variables at the top of the Airflow web page.
- Click the Edit icon (pencil icon) for the entry CDRN_ETL_Options_STAR
- Make the following changes to these entries in the Airflow variable CDRN_ETL_Options_STAR:
- loadToDt: set to the previous day (mini-CDM refresh) or the last day of the previous month (quarterly refresh), in YYYY-MM-DD format
- network: set to CVD19 for the mini-CDM refresh or STAR for the quarterly refresh
- prevPrefix: set to CVDNEW_ for the mini-CDM refresh or the V##_ prefix used for the last STAR refresh
- newPrefix: set to CVDNEW_ for the mini-CDM refresh or the next available V##_ prefix for the current STAR refresh
- tempPrefix: set to ETL_
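As a sanity check on the loadToDt and prefix values above, the rules can be sketched in Python (the helper names here are illustrative only, not part of the ETL):

```python
from datetime import date, timedelta

def load_to_dt(run_date: date, quarterly: bool) -> str:
    """loadToDt value: last day of the previous month for the quarterly
    refresh, otherwise the previous day (mini-CDM), in YYYY-MM-DD format."""
    if quarterly:
        # First of the current month minus one day = last day of previous month.
        return (run_date.replace(day=1) - timedelta(days=1)).isoformat()
    return (run_date - timedelta(days=1)).isoformat()

def next_prefix(prev_prefix: str) -> str:
    """newPrefix for a quarterly refresh: the next V##_ number after prevPrefix."""
    n = int(prev_prefix.strip("V_"))
    return f"V{n + 1:02d}_"
```

For a quarterly refresh run on 2024-04-15 with prevPrefix V12_, this yields loadToDt 2024-03-31 and newPrefix V13_.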
- Start DAG process
- Click DAGs at the top of the Airflow web page.
- Scroll down the DAG list to the entry for STAR_ETL_Master.
- On the right side of the line, there are several icons. Click the Play button (Trigger DAG).
- Click OK on the confirmation popup window. This will start the DAG process.
- Monitor DAG process
- Click on the link for the DAG STAR_ETL_Master from the main Airflow dashboard. This will open the DAG details page.
- Click either Graph View or Tree View to see the progress status of the DAG.
- Graph View - Displays a graph representation of all tasks being run and the status of each (Running, Finished, Failed, etc.). Note that Task Duration does not display correctly when hovering over a task.
- Tree View - Displays a tree representation of a specific number of executions, including the current one. Task Duration does display correctly here when hovering over a task.
- The usual run time for the entire process is 10-12 hours.
- For quarterly refresh only, generate STAR hash token data.
- Log in using a PuTTY terminal (or SSH GUI) to the appropriate Unix server.
- Change directory to /dwhome/airflow_ci/master/dags/CDRN_ETL/Datavant/
- Run file hash_token_load_master.sh STAR
- For quarterly refresh only, execute post-ETL process
- Click DAGs at the top of the Airflow web page.
- Scroll down the DAG list to the entry for STAR_ETL_PostOp. Click the Play button (Trigger DAG) and confirm to start it.
- Run the following SQL to backfill any hash tokens missing from the previous step:
INSERT INTO HSC_PCOR.<<V##>>_HASH_TOKEN
SELECT PATID, 'XXX - S00000', 'XXX - S00000', 'XXX - S00000', 'XXX - S00000', 'XXX - S00000', 'XXX - S00000', 'Backfill'
FROM HSC_PCOR.<<V##>>_DEMOGRAPHIC WHERE PATID IN
(
SELECT PATID FROM HSC_PCOR.<<V##>>_DEMOGRAPHIC vd
MINUS
SELECT PATID FROM HSC_PCOR.<<V##>>_HASH_TOKEN
);
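The MINUS clause above simply selects PATIDs that exist in DEMOGRAPHIC but have no HASH_TOKEN row; the same set logic, sketched in Python for illustration only:

```python
def patids_to_backfill(demographic_patids, hash_token_patids):
    """Mirror of the SQL MINUS: PATIDs in DEMOGRAPHIC with no hash token row."""
    return sorted(set(demographic_patids) - set(hash_token_patids))
```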
- Once the DAG finishes, log in using a PuTTY terminal (or SSH GUI) to the appropriate Unix server.
- Change directory to /dwhome/apps/cdrnLoad
- Run the following command to gather optimizer statistics for the PCORNet tables:
nohup sqlplus 'HSC_PCOR/<<PASSWORD>>'@eclrprd1 @gatherStats_HSC_PCOR.sql &
- For mini-CDM only, generate and send N3C data
- Login to the Windows server vwp-pmndmc.dhe.duke.edu via Remote Desktop Protocol (RDP).
- Open Windows File Explorer.
- Navigate to D:\Apps\N3C\Project\
- Double-click the file N3C_Main.bat to start the N3C data script.
- The usual run time for the batch file is 8-10 hours.
- Check for errors in the log file H:\N3C\Project\N3C_debug.log
- For mini-CDM only, generate and send N3C hash tokens
- Return to the Airflow web page used earlier.
- Scroll down the DAG list to the entry for STAR_N3C_Hash_Tokens.
- Log in using a PuTTY terminal (or SSH GUI) to the appropriate Unix server.
- Change directory to /dwhome/airflow_ci/master/dags/CDRN_ETL/Datavant/
- Run file hash_token_load_master.sh N3C
- Transfer the ZIP file created in the previous step to your local computer.
- Log into the N3C hash token SFTP site at 34.132.183.26:2222 (setup required).
- The SFTP private key file is required.
- Transfer the ZIP file to the N3C hash token SFTP site.