pfx-csv:unmarshal

Overview

Converts CSV text (from the exchange body) into a List<Map<String, String>> (default) or List<List<String>> (when useMaps=false).

Input: The exchange body must be convertible to an InputStream (e.g. String, byte[], File, or InputStream).

Output: The exchange body is replaced with the parsed collection.
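A minimal sketch of the input/output contract. It assumes a file endpoint as the source and that no header parameter is given, so the first line is used as the header per the header detection rules. The route id and the {{csv.inputDir}} placeholder are hypothetical. The log step uses Camel's Simple language to report how many rows the parsed collection contains.

XML
<route id="csvUnmarshalSketch">
    <!-- Hypothetical placeholder; the file body is convertible to an InputStream -->
    <from uri="file:{{csv.inputDir}}"/>
    <!-- No header parameter: the first line supplies the column names -->
    <to uri="pfx-csv:unmarshal"/>
    <!-- The body is now a List of Maps keyed by the header names -->
    <log message="Parsed ${body.size()} rows"/>
</route>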

Header detection:

  • If the header parameter is set, those names are used as the column names.

  • If header is not set, the first line is parsed as the header.

  • When used inside a <split> with camelSplitIndexAware=true (the default), the header from the first batch (split index 0) is cached via ExchangeCache and reused for subsequent batches.
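A sketch of the split case, assuming the file's first line is a header and the header parameter is omitted: only batch 0 contains the header line, and later batches rely on the cached copy. The route id and the {{csv.inputDir}} placeholder are hypothetical; camelSplitIndexAware=true is the default and is spelled out only for clarity.

XML
<route id="csvSplitAutoHeader">
    <from uri="file:{{csv.inputDir}}"/>
    <split>
        <!-- Batches of 5000 lines; only batch 0 starts with the header line -->
        <tokenize group="5000" token="\n"/>
        <!-- Batch 0 detects the header; later batches reuse the cached header -->
        <to uri="pfx-csv:unmarshal?camelSplitIndexAware=true"/>
        <log message="Batch ${exchangeProperty.CamelSplitIndex}: ${body.size()} rows"/>
    </split>
</route>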

Encoding: Determined by the CamelCharsetName exchange property. Defaults to UTF-8. BOM (Byte Order Mark) is automatically stripped.

Error reporting: If a parse error occurs inside a split, the error message reports the absolute line number in the original input, calculated from the split index and split size.
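A sketch of surfacing that message from within a split using Camel's doTry/doCatch. The exception type the component throws is not documented here, so catching java.lang.Exception is an assumption; narrow it once the actual type is known.

XML
<split>
    <tokenize group="5000" token="\n"/>
    <doTry>
        <to uri="pfx-csv:unmarshal?header=sku,label,price&amp;skipHeaderRecord=true"/>
        <doCatch>
            <!-- Assumed exception type; the message includes the absolute line number -->
            <exception>java.lang.Exception</exception>
            <log loggingLevel="ERROR" message="CSV parse error in batch ${exchangeProperty.CamelSplitIndex}: ${exception.message}"/>
        </doCatch>
    </doTry>
</split>

Catching the exception here stops it from failing the route. Drop the doTry if a parse error should abort the whole import.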

Properties

Parameter | Type | Default | Description
format | String | DEFAULT | Base CSVFormat preset name (e.g. DEFAULT, EXCEL, TDF, RFC4180).
delimiter | String | , | Field delimiter character. Supports Java escape sequences (e.g. \t for tab).
header | String | (auto-detect) | Comma-separated list of column names. When omitted, the first record is used as the header.
skipHeaderRecord | Boolean | false | Whether to skip the first record. Set to true when the input starts with a header line and header is also set; otherwise the header line is returned as a data row.
useMaps | Boolean | true | When true, each row becomes a Map. When false, each row becomes a List of values.
headerPolicy | NORMAL / STRICT | NORMAL | When STRICT, the header parameter is required and the header line in the CSV must match it exactly; otherwise the route fails.
quoteCharacter | Character | " | Character used to quote field values.
quoteDisabled | Boolean | false | Set to true to disable quoting entirely.
recordSeparator | String | Platform default | Record (line) separator. Accepts the tokens CR, LF, and CRLF.
nullString | String | (none) | String to interpret as null when reading.
trim | Boolean | (none) | Whether to trim leading and trailing whitespace from field values.
camelSplitIndexAware | Boolean | true | Uses the CamelSplitIndex exchange property (Exchange.SPLIT_INDEX) to detect split batches; the header is read from the first batch and cached for the rest.
forceSkipHeaderWhenPartOfSplit | Boolean | true | When true and the exchange is part of a split (split index > 0), the header line is skipped automatically.
lazyStartProducer | Boolean | false | (Advanced) Whether to defer starting the producer until the first message is processed.
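Several of these properties can be combined on one endpoint. A sketch for a semicolon-separated, Excel-style export, assuming the file uses the literal text NULL for missing values; both the semicolon delimiter and the NULL marker are illustrative assumptions, not defaults.

XML
<!-- EXCEL preset, semicolon delimiter, trimmed values, "NULL" read as null -->
<to uri="pfx-csv:unmarshal?format=EXCEL&amp;delimiter=;&amp;header=sku,label,price&amp;skipHeaderRecord=true&amp;trim=true&amp;nullString=NULL"/>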

Examples

Basic CSV Import (DMDS)

Read a CSV file, split into 5000-line batches, unmarshal, and load into a Pricefx Data Source:

XML
<route id="csvImportToDatasource">
    <from uri="file:{{import.fromUri}}"/>
    <split>
        <tokenize group="5000" token="\n"/>
        <to uri="pfx-csv:unmarshal?header=sku,label,price&amp;skipHeaderRecord=true&amp;delimiter=,"/>
        <to uri="pfx-api:loaddata?mapper=myMapper&amp;objectType=DM&amp;dsUniqueName=Product"/>
    </split>
    <onCompletion onCompleteOnly="true">
        <to uri="pfx-api:flush?dataFeedName=DMF.Product&amp;dataSourceName=DMDS.Product"/>
    </onCompletion>
</route>

Tab-Delimited File

Use a tab delimiter with the \t escape sequence:

XML
<to uri="pfx-csv:unmarshal?delimiter=\t&amp;header=sku,name,price&amp;skipHeaderRecord=true"/>

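Pipe-Delimited with Quoting Disabled

Use a pipe delimiter and disable quoting so quote characters in the data are not treated specially: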

XML
<to uri="pfx-csv:unmarshal?delimiter=|&amp;quoteDisabled=true&amp;header=id,name,value"/>

Strict Header Validation

Fail the route if the CSV file header does not exactly match the expected columns:

XML
<to uri="pfx-csv:unmarshal?header=sku,label,price&amp;headerPolicy=STRICT&amp;skipHeaderRecord=true"/>

Unmarshal to Lists Instead of Maps

When map keys are not needed (e.g. positional data):

XML
<to uri="pfx-csv:unmarshal?useMaps=false&amp;header=col1,col2,col3&amp;skipHeaderRecord=true"/>

Setting Encoding Explicitly

Set the CamelCharsetName exchange property before the unmarshal step when the file is not UTF-8:

XML
<setProperty name="CamelCharsetName">
    <constant>ISO-8859-1</constant>
</setProperty>
<to uri="pfx-csv:unmarshal?header=sku,price&amp;skipHeaderRecord=true"/>

Common Pitfalls

  1. Forgetting skipHeaderRecord=true when the file has a header line and the header parameter is also set. Without it, the header line itself is returned as the first data row.

  2. Header mismatch with headerPolicy=STRICT: the header line in the CSV must match the header parameter exactly (same order, same case, same number of columns).

  3. Ampersands in XML: XML-escape the & that separates URI query parameters; write &amp;, not &.

  4. Encoding issues: the component reads the CamelCharsetName exchange property for the character encoding. If it is not set, UTF-8 is assumed. A leading BOM is stripped automatically.

  5. Split batching header behavior: when camelSplitIndexAware=true (the default), the header is read from the first batch and cached. Set camelSplitIndexAware=false if each split chunk is independent and should be parsed on its own.
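Pitfall 1 side by side, assuming a file whose first line is the header sku,label,price:

XML
<!-- Wrong: without skipHeaderRecord, the line "sku,label,price" comes back as the first data row -->
<to uri="pfx-csv:unmarshal?header=sku,label,price"/>

<!-- Right: the header line is skipped and only data rows are returned -->
<to uri="pfx-csv:unmarshal?header=sku,label,price&amp;skipHeaderRecord=true"/>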