Bulk Data Loading from a Zip

A friend is learning APEX to automate his business mentoring athletes. In his practice, he captures training performance metrics, then needs to analyze clients’ progress toward their goals over time. His instruments record an athlete’s performance data in a comma-separated values (CSV) format. Recently, he asked if APEX could load data from a CSV file.

After showing him how easy it was, he added another twist:”What if I have a directory full of CSV files, organized into subdirectories by athlete? After a week’s sessions with clients, there could be a lot of files. Will I have to import them one by one? ” Great question!

He said each directory reflects the client name, each file name includes the date and time of the session (SESSION_2023_08_11_13_45.csv), and each file has the same format. A perfect chance to show him three more APEX features:

  • Easy file uploading,
  • Simplified Zip file processing, and
  • Automatic data loading using a pre-defined column mapping.

The sample app you can download using the link at the end of the article illustrates the declarative configuration and small amount of code required to put these techniques into action.

Setting Up the Data Model

For simulation purposes, I chose a CSV file format that includes the name of an activity followed by two numerical values in each row like this:

AAA,333,887
BBB,123,899
CCC,445,1299

I created the following tables to store the athletes, the training sessions for each athlete, and the athlete session training activity data. I added a “staging” table to hold the uploaded data temporarily before moving it to the athlete session data table, related to the appropriate athlete’s training session.

Four tables used by the sample application to upload athlete training session data

Creating the Data Load Definition

The next step was creating an APEX Data Load Definition to capture how the CSV column data maps to columns in the EBA_DEMO_UPLOADED_SESSION_DATA table. In the APEX App Builder, under Shared Components, in the Data Sources section, I chose Data Load Definitions and clicked Create. On step 1, as shown below, I entered the name athlete_session_data_csv and chose the EBA_DEMO_UPLOADED_SESSION_DATA table.

Step 1 of defining a new data load definition: providing a name and picking a table.

On step 2 of the wizard, I dragged and dropped an example SESSION_2023_08_04_18_20.csv data file into the file upload drop zone.

Step 2 of defining a data load definition: provide a sample CSV file

On step 3, I unchecked the First line contains headers checkbox and mapped the values of the three CSV file columns to appropriate columns of the EBA_DEMO_UPLOADED_SESSION_DATA table. If the CSV file had contained an initial “header” row of column names, APEX might have been able to guess the mapping automatically. However, the files my friend showed me did not have headers so I wanted to show him that use case.

Step 3 of defining a data load definition: mapping the columns

To complete the wizard, I clicked on (Create Data Load).

Create Package for Handling the Upload

I used the Object Browser to create an eba_demo_bulk_data_load package with the spec below. The from_uploaded_zip_file() procedure accepts a single parameter for the name of the uploaded zip file to process. In the package body, I initially left the implementation of from_uploaded_zip_file() as a “TODO” task by using a null; statement. I’d later circle back to add in the upload processing code that I explain in detail below. For the moment, having the specification in place was enough to “wire up” the declarative interaction between the page in the next section and the procedure.

create or replace package eba_demo_bulk_data_load is 
    procedure from_uploaded_zip_file( 
        p_file_name in varchar2); 
end;

Uploading the Zip File

On the Home page, I added a single page item named P1_ZIP_FILE_UPLOAD of type File Browse… , set its Storage Type property to Table APEX_APPLICATION_TEMP_FILES, and its Purge File at property to End of Request. I also configured its File Types property to application/zip to guide the user to only choose zip files. Finally, I added a button named Upload_and_Load_All_Data to submit the page. This is the only configuration needed to upload a file for server-side processing during the page submit.

Adding a File Browse… page item to let an end user upload a zip file.

Processing the Zip File Contents

In the Processing tab of the Home page, I added an Invoke API page process to call the from_uploaded_zip_file() procedure in the eba_demo_bulk_data_load package. I configured its Server-side Condition so it only executes when the Upload_and_Load_All_Data button is pressed.

Adding an InvokeAPI page process to execute when the Upload button is pressed

As shown below, I selected the p_file_name function parameter in the Processing tab and used the Property Editor to configure its value to come from the P1_ZIP_FILE_UPLOAD page item. A File Browse… page item’s value gives the unique name of the uploaded file in the APEX_APPLICATION_TEMP_FILES view. We’ll use this info below to process the CSV files inside the uploaded zip file.

Configuring PL/SQL function argument value to come from P1_ZIP_FILE_UPLOAD page item

With these pieces in place, I had a working zip file upload page. Next, I turned my attention to the small amount of code required to “unpack” the zip file contents and process each CSV file it contained…

Loading Each CSV File in the Zip

In the implementation of the the from_uploaded_zip_file() procedure, I started by selecting the binary contents of the uploaded zip file from the APEX_APPLICATION_TEMP_FILES view into an l_zipfile BLOB variable, using the unique file name passed in the p_file_name parameter. Note that in the WHERE clause I’m using the NAME column and not the FILENAME column. The former is the unique name APEX assigns to the uploaded file (e.g. 12345678/athletes.zip), while the latter is just the “plain” name of the file the end-user provided (e.g. athletes.zip).

-- Get the uploaded zip file
select blob_content
into l_zipfile
from apex_application_temp_files
where name = p_file_name;

Next, using the handy apex_zip package, I get the list of CSV files inside this uploaded zip file:

-- Get list of files from the zip
l_files := apex_zip.get_files(l_zipfile);

Then I loop over the file names and load them one by one into the “staging” table using the pre-defined Data Loading Definition I created above. Before each iteration, I delete all the rows from the staging table, and in the next section I’ll explain how I determine the athlete and athlete session with which to associate the loaded data from the current CSV file in the loop. Notice I use another apex_zip package function get_file_content() to access the contents of the current CSV file. I pass that as the value of the p_data_to_load parameter of the apex_data_loading.load_data() function, referencing the static id of the data loading definition as the second parameter value.

-- Process the individual files inside the zip
for j in 1..l_files.count loop
    -- Start by clearing any existing rows out of the temporary upload table
    delete from eba_demo_uploaded_session_data;
    -- Get the contents of the current CSV file in the zip file
    l_csv_file := apex_zip.get_file_content(l_zipfile,l_files(j));
    -- Load data from current CSV file into temp data loading table
    -- using 'athlete_session_data_csv' data loading definition
    l_result := apex_data_loading.load_data(
                    p_data_to_load => l_csv_file,
                    p_static_id    => 'athlete_session_data_csv');
    /* See below for logic to associate data with athlete session */
end loop;

Moving Uploaded Data to Final Destination

After loading each CSV file into the “staging” table, I use the name of the file and the name of its directory to determine the date and time of the session as well as the athlete name. The split() function in the apex_string package comes in handy to break the file name into parts using the slash as the separator character. Assuming the file names will all be of the form athletes/AMINA/SESSION_2023_08_01_12_10.csv, we can see that the athlete name is the second part of the fully-qualified file name and the session date and time is encoded in the third part. Notice I’ve used a date format mask that includes the literal text "SESSION_" and ".csv" to easily extract the date with no additional fuss.

for j in 1..l_files.count loop
    /* See above for logic to load the current CSV file */

    -- Extract the name of the athlete from the current file name
    -- that we are assuming will be of the form:
    -- athletes/AMINA/SESSION_2023_08_01_12_10.csv
    l_filename_parts := apex_string.split(l_files(j),'/');
    l_athlete_name   := l_filename_parts(2);
    l_session_date   := to_date(l_filename_parts(3),
                                '"SESSION_"YYYY_MM_DD_HH24_MI".csv"');
    /* See below for logic to lookup or create athlete */
end loop;

Using straightforward code, I lookup the current athlete’s id based on their name. If the athlete name does not exist, I create a new Athlete row for them and return the new athlete id into l_athlete_id. Then I create a new athlete session for the athlete with the appropriate date and time extracted from the current CSV file name.

for j in 1..l_files.count loop
    /* See above for previous logic in the loop */
    -- Lookup the athlete id from their name. If not found, then
    -- create a new athlete with this name and return the newly
    -- assigned athlete id into l_athlete_id
    begin
        select id
          into l_athlete_id
          from eba_demo_athletes
         where name = l_athlete_name;
    exception
        when no_data_found then
            insert into eba_demo_athletes(name)
            values(l_athlete_name)
            returning id into l_athlete_id;
    end;
    -- Create a new athlete session for the athlete id
    -- and session date (inferred from the CSV file name above)
    insert into eba_demo_athlete_sessions(athlete_id, session_date)
    values (l_athlete_id, l_session_date)
    returning id into l_new_session_id;
    /* See below for final logic to move uploaded data in place */
end loop;

Finally, I select the uploaded performance data from the “staging” table and insert it into the Athlete Session Data table along with the correct Athlete Session foreign key so it’s related to the new athlete session for the current file’s athlete.

for j in 1..l_files.count loop
    /* See above for previous logic in the loop */
    -- Finally, move all the uploaded performance data for the current
    -- athlete's session into the newly created athlete session
    -- using the l_new_session_id returned above
    insert into eba_demo_athlete_session_data
           (athlete_session_id, activity, value1, value2)
    select  l_new_session_id,   activity, value1, value2
    from eba_demo_uploaded_session_data;
end loop;

Enabling Background Processing

Using the techniques I described in Page Processing Power Unchained, it would be very easy to introduce an execution chain into the Home page’s processing tab and make the current Invoke API page process be its child process. You would accomplish that by setting the Execution Chain property of the latter to the name of the newly-created chain. By setting the new execution chain’s Execute in Background property to ON, its Temporary File Handling property to Move, and its Temporary File Items to P1_ZIP_FILE_UPLOAD, the zip file CSV upload processing could be effortlessly done in the background while the user moves on to do other tasks. This extra flourish would be appropriate if there might be a large number of CSV files to process inside.

Trying Out the Sample App

You can download the APEX 23.1 sample app from here. This example athletes.zip file contains simulated training data for multiple sessions for four different athletes: BIFF, CHIP, ANNEENA, and WILMA. Try uploading the athletes.zip file then browsing the Athletes Session Data page.