Import CSV to Data Source (DMDS)

Overview

This example demonstrates how to import data from a CSV file into a Pricefx Data Source (objectType=DM with a dsUniqueName). Data Sources (DMDS) are the staging tables of Pricefx Analytics that feed Data Marts and are used in pricing calculations. The example includes the critical onCompletion flush step that finalizes the data load and makes the records available for use.

Files

routes/import-csv-to-datasource.xml

XML
<routes xmlns="http://camel.apache.org/schema/spring">
    <route id="import-csv-to-datasource">
        <from uri="file:{{import.datasource.directory}}?{{archive.file}}&amp;{{read.lock}}"/>
        <log message="Processing data source file: ${header.CamelFileNameOnly}" loggingLevel="INFO"/>
        <split aggregationStrategy="recordsCountAggregation" streaming="true">
            <tokenize token="\n" group="10000"/>
            <to uri="pfx-csv:unmarshal?skipHeaderRecord=true"/>
            <to uri="pfx-api:loaddata?objectType=DM&amp;dsUniqueName=Product&amp;mapper=import-csv-to-datasource.mapper"/>
        </split>
        <log message="Data source import complete. Total records: ${header.PfxTotalInputRecordsCount}" loggingLevel="INFO"/>
        <onCompletion onCompleteOnly="true">
            <log message="Flushing data source DMDS.Product" loggingLevel="INFO"/>
            <to uri="pfx-api:flush?dataFeedName=DMF.Product&amp;dataSourceName=DMDS.Product"/>
            <log message="Data source DMDS.Product flushed successfully" loggingLevel="INFO"/>
        </onCompletion>
    </route>
</routes>

mappers/import-csv-to-datasource.mapper.xml

XML
<mappers>
    <loadMapper id="import-csv-to-datasource.mapper">
        <body in="sku" out="key1"/>
        <body in="region" out="key2"/>
        <body in="effectiveDate" out="key3"/>
        <body in="listPrice" out="value1"/>
        <body in="costPrice" out="value2"/>
        <body in="minPrice" out="value3"/>
        <body in="discount" out="value4"/>
        <body in="margin" out="value5"/>
    </loadMapper>
</mappers>
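
The mapper above reads CSV columns by name, so the input file's header must match the mapper's in attributes. A hypothetical input file (the values are illustrative, not from the original example):

```csv
sku,region,effectiveDate,listPrice,costPrice,minPrice,discount,margin
SKU-1001,EMEA,2024-01-01,99.00,60.00,75.00,0.10,0.39
SKU-1002,APAC,2024-01-01,120.00,72.00,90.00,0.05,0.40
```

Because skipHeaderRecord=true is set on pfx-csv:unmarshal, the header row itself is not loaded as data, but it drives the column-to-field mapping.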

config/application.properties (snippet)

# Import directory for data source CSV files
import.datasource.directory=/data/imports/datasource
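
The {{archive.file}} and {{read.lock}} placeholders resolve to standard Camel file-component options. A hedged sketch of possible values (the property names come from the route; the values shown here are assumptions to adjust for your environment):

```properties
# Assumed value: move processed files into a date-stamped archive folder
archive.file=move=archive/${date:now:yyyyMMdd}/${file:name}
# Assumed value: wait until the file stops changing before reading it
read.lock=readLock=changed&readLockCheckInterval=1000
```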

How It Works

  1. File Pickup: The file: component monitors the configured directory for CSV files. {{archive.file}} moves processed files to a timestamped archive; {{read.lock}} waits until the file is fully written before reading.

  2. Splitting: The split with tokenize token="\n" group="10000" breaks the file into batches of 10,000 lines for streaming processing. This is essential for large data source files that can contain millions of rows.

  3. CSV Unmarshalling: pfx-csv:unmarshal parses each batch into a list of maps.

  4. Loading to Pricefx: pfx-api:loaddata with objectType=DM and dsUniqueName=Product targets the specific Data Source table named "Product". The mapper maps CSV columns to Data Source key and value fields.

  5. Flush on Completion: The onCompletion onCompleteOnly="true" block runs after the route completes successfully. The pfx-api:flush call finalizes the data load by flushing the data feed (DMF.Product) and data source (DMDS.Product). This step makes the loaded records available for use in pricing logic.
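
Only the unique name changes when the same pattern targets a different Data Source. A minimal sketch for a hypothetical Data Source named "Transaction" (the name is an assumption for illustration; note the matching DMF./DMDS. prefixes in the flush):

```xml
<!-- Hypothetical: load and flush a Data Source named "Transaction" -->
<to uri="pfx-api:loaddata?objectType=DM&amp;dsUniqueName=Transaction&amp;mapper=import-csv-to-datasource.mapper"/>
<onCompletion onCompleteOnly="true">
    <to uri="pfx-api:flush?dataFeedName=DMF.Transaction&amp;dataSourceName=DMDS.Transaction"/>
</onCompletion>
```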

Common Pitfalls

  • Missing flush: Without the pfx-api:flush step, loaded data remains in a staging state and is not available for pricing calculations. Always include a flush after loading to a Data Source.

  • Flush naming convention: The dataFeedName uses the prefix DMF. and the dataSourceName uses the prefix DMDS., both followed by the same unique name used in dsUniqueName. For example, if dsUniqueName=Product, then dataFeedName=DMF.Product and dataSourceName=DMDS.Product.

  • onCompletion placement: The onCompletion block must be placed inside the <route> element but outside the <split> block. Placing it inside the split will cause it to execute after every batch instead of once at the end.

  • onCompleteOnly vs default: Use onCompleteOnly="true" to ensure the flush only runs on successful completion. Without this flag, the flush would also run if the route fails, potentially flushing incomplete data.

  • No businessKeys for DMDS: Unlike Products and Customers, Data Source loads typically do not use businessKeys. Each flush replaces the entire data set. If you need incremental loads, consider using direct2ds=true instead.

  • Key/Value field naming: Data Sources use key1-key6 for dimension fields and value1-value30 for measure fields. The field names in Pricefx Administration determine what each key/value represents.
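
For the incremental-load case mentioned above, the direct2ds parameter can be added to the load endpoint. A hedged sketch combining it with this example's route (the parameter comes from the pitfall above; combining it with these exact options is an assumption):

```xml
<!-- Hypothetical incremental load: direct2ds=true writes straight to the Data Source -->
<to uri="pfx-api:loaddata?objectType=DM&amp;dsUniqueName=Product&amp;direct2ds=true&amp;mapper=import-csv-to-datasource.mapper"/>
```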