pfx-csv:streamingUnmarshal

Overview

Returns a lazy CSVParser (or ReusableCSVParser) as the exchange body instead of loading all records into memory. The caller is expected to iterate over it (typically via a downstream <split>).

Input: The exchange body must be convertible to an InputStream (e.g. String, byte[], File, or InputStream).

Output: The exchange body is replaced with a CSVParser iterator (or ReusableCSVParser when useReusableParser=true).

Header detection:

  • If header is set, those names are used.

  • If header is not set, withFirstRecordAsHeader() is applied so the first CSV line is consumed as the header.
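
The two header modes above can be sketched as endpoint fragments. Only options documented on this page are used. The column names are illustrative, and the note about skipHeaderRecord is an assumption based on how the option is described, not confirmed behaviour.

XML
<!-- Explicit header: these names are used as the column names.
     Assumption: if the file also starts with its own header line,
     skipHeaderRecord=true may be needed so that line is not read as data. -->
<to uri="pfx-csv:streamingUnmarshal?header=sku,label,price"/>

<!-- No header option: the first CSV line is consumed as the header. -->
<to uri="pfx-csv:streamingUnmarshal"/>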

Limitations

  • Cannot be used inside a <split> block. If a SPLIT_INDEX property is detected, a NonRecoverableException is thrown. Use streamingUnmarshal before the split and iterate the returned parser inside the split body.

  • The returned CSVParser is a forward-only iterator by default. Each record can only be read once unless useReusableParser=true is set.
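
A minimal sketch of the placement rule, assuming a hypothetical direct:handleRecord endpoint that processes one record at a time. The first fragment shows the arrangement that is rejected, the second the supported one.

XML
<!-- Rejected: the endpoint runs inside the split, where a SPLIT_INDEX
     property is present, so a NonRecoverableException is thrown. -->
<split>
    <simple>${body}</simple>
    <to uri="pfx-csv:streamingUnmarshal"/>
</split>

<!-- Supported: unmarshal first, then split over the returned parser. -->
<to uri="pfx-csv:streamingUnmarshal"/>
<split>
    <simple>${body}</simple>
    <!-- direct:handleRecord is a hypothetical endpoint used for illustration -->
    <to uri="direct:handleRecord"/>
</split>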

Properties

| Parameter | Type | Default | Description |
|---|---|---|---|
| format | String | DEFAULT | Base CSVFormat preset name (e.g. DEFAULT, EXCEL, TDF, RFC4180). Other options override values from this preset. |
| delimiter | String | , | Field delimiter character. Supports Java escape sequences (e.g. \t for tab). |
| header | String | (auto-detect) | Comma-separated list of column names. When omitted, withFirstRecordAsHeader() is applied so the first CSV line is consumed as the header. |
| skipHeaderRecord | Boolean | false | Whether to skip the header record in the stream. |
| quoteCharacter | Character | " | Character used to quote field values. |
| quoteDisabled | Boolean | false | Set to true to disable quoting entirely (sets quote character to null). |
| quoteMode | String | (none) | Apache Commons CSV QuoteMode name: ALL, ALL_NON_NULL, MINIMAL, NON_NUMERIC, NONE. |
| escapeCharacter | Character | (none) | Escape character for special characters inside fields. |
| commentMarker | Character | (none) | Character that marks comment lines (lines starting with this character are ignored). |
| recordSeparator | String | Platform default | Record (line) separator. |
| nullString | String | (none) | String to interpret as null when reading. |
| trim | Boolean | (none) | Whether to trim leading/trailing whitespace from field values. |
| ignoreSurroundingSpaces | Boolean | (none) | Whether to ignore spaces surrounding field values. |
| ignoreEmptyLines | Boolean | (none) | Whether to skip empty lines. |
| ignoreHeaderCase | Boolean | (none) | Whether header matching is case-insensitive. |
| allowMissingColumnNames | Boolean | (none) | Whether to tolerate missing column names in the header record. |
| useReusableParser | Boolean | false | When true, wraps the input in a ReusableCSVParser that supports re-iteration over the same stream (the stream is marked and reset). |
| lazyStartProducer | Boolean | false | (Advanced) Whether to defer producer creation until the first message is processed. |
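
As a rough illustration of how several of these options combine on one endpoint, the fragment below starts from the RFC4180 preset and overrides a few values. The option names come from the table above. The chosen values, including NULL as the null marker, are illustrative assumptions only.

XML
<!-- Preset RFC4180, with trimming, empty-line skipping and a null marker.
     The values are examples, not recommended settings. -->
<to uri="pfx-csv:streamingUnmarshal?format=RFC4180&amp;trim=true&amp;ignoreEmptyLines=true&amp;nullString=NULL"/>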

Examples

Streaming Unmarshal for Large Files

For very large CSV files that should not be loaded entirely into memory:

XML
<route id="streamingImport">
    <from uri="file:{{import.fromUri}}"/>
    <to uri="pfx-csv:streamingUnmarshal?header=sku,label,price"/>
    <split>
        <simple>${body}</simple>
        <!-- each iteration yields one CSVRecord -->
        <to uri="pfx-api:loaddata?mapper=myMapper&amp;objectType=DM&amp;dsUniqueName=Product"/>
    </split>
</route>

Reusable Parser for Multiple Passes

When you need to iterate the CSV data more than once (e.g. validation pass then import pass):

XML
<to uri="pfx-csv:streamingUnmarshal?header=sku,label,price&amp;useReusableParser=true"/>
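
A two-pass route might look roughly like the sketch below. It assumes that the exchange body still holds the ReusableCSVParser after the first split completes (no aggregation strategy is configured) and that the parser resets itself when iterated again. direct:validateRecord is a hypothetical endpoint; the file and loaddata URIs are copied from the streaming example above.

XML
<route id="validateThenImport">
    <from uri="file:{{import.fromUri}}"/>
    <to uri="pfx-csv:streamingUnmarshal?header=sku,label,price&amp;useReusableParser=true"/>
    <!-- Pass 1: validate every record (hypothetical endpoint) -->
    <split>
        <simple>${body}</simple>
        <to uri="direct:validateRecord"/>
    </split>
    <!-- Pass 2: iterate the same parser again and load the data -->
    <split>
        <simple>${body}</simple>
        <to uri="pfx-api:loaddata?mapper=myMapper&amp;objectType=DM&amp;dsUniqueName=Product"/>
    </split>
</route>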

Common Pitfalls

  1. Using streamingUnmarshal inside a <split> -- This throws a NonRecoverableException. The streaming unmarshal step must come before the split. The returned CSVParser is then iterated by the <split> body.

  2. Forward-only iteration -- The default CSVParser is a forward-only iterator. If you need to iterate the data more than once, set useReusableParser=true so the stream can be reset.

  3. Stream lifetime -- streamingUnmarshal avoids loading all records at once, but the underlying InputStream must stay open until iteration finishes. Ensure the input source is not closed prematurely (e.g. by a file component that deletes the file on completion before the split finishes).

  4. Ampersands in XML -- URI query parameters are separated by an ampersand, which must be XML-escaped in XML route definitions: write &amp; not &.