pfx-csv:streamingUnmarshal | Pricefx Knowledge Base

Overview

Returns a lazy CSVParser (or ReusableCSVParser) as the exchange body instead of loading all records into memory. The caller is expected to iterate over it (typically via a downstream <split>).

Input: The exchange body must be convertible to an InputStream (e.g. String, byte[], File, or InputStream).

Output: The exchange body is replaced with a CSVParser iterator (or ReusableCSVParser when useReusableParser=true).

Header detection:

If header is set, those names are used.
If header is not set, withFirstRecordAsHeader() is applied so the first CSV line is consumed as the header.
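
The two header modes above can be sketched as endpoint URIs. This is a minimal illustration, not a definitive configuration: the column names are placeholders, and the third variant assumes that `skipHeaderRecord=true` is the way to discard a header line that is present in the file when `header` is also supplied (the usual Apache Commons CSV behaviour).

```xml
<!-- Explicit header: column names are taken from the URI -->
<to uri="pfx-csv:streamingUnmarshal?header=sku,label,price"/>

<!-- No header option: the first CSV line is consumed as the header -->
<to uri="pfx-csv:streamingUnmarshal"/>

<!-- Assumption: explicit names for a file that also contains its own header line -->
<to uri="pfx-csv:streamingUnmarshal?header=sku,label,price&amp;skipHeaderRecord=true"/>
```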

Limitations

Cannot be used inside a <split> block. If a SPLIT_INDEX property is detected on the exchange, a NonRecoverableException is thrown. Call streamingUnmarshal before the split and let the <split> iterate over the returned parser.
The returned CSVParser is a forward-only iterator by default. Each record can only be read once unless useReusableParser=true is set.
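
As a sketch of the placement rule above, the fragments below contrast the unsupported and the supported ordering. The `direct:processRecord` endpoint is a hypothetical placeholder for whatever handles a single record.

```xml
<!-- Not supported: calling streamingUnmarshal inside a split throws NonRecoverableException -->
<split>
    <simple>${body}</simple>
    <to uri="pfx-csv:streamingUnmarshal?header=sku,label,price"/>
</split>

<!-- Supported: unmarshal first, then split over the returned parser -->
<to uri="pfx-csv:streamingUnmarshal?header=sku,label,price"/>
<split>
    <simple>${body}</simple>
    <!-- hypothetical endpoint that handles one CSVRecord -->
    <to uri="direct:processRecord"/>
</split>
```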

Properties

Parameter	Type	Default	Description
`format`	`String`	`DEFAULT`	Base CSVFormat preset name (e.g. `DEFAULT`, `EXCEL`, `TDF`, `RFC4180`). Other options override values from this preset.
`delimiter`	`String`	`,`	Field delimiter character. Supports Java escape sequences (e.g. backslash-t for tab).
`header`	`String`	(auto-detect)	Comma-separated list of column names. When omitted, `withFirstRecordAsHeader()` is applied so the first CSV line is consumed as the header.
`skipHeaderRecord`	`Boolean`	`false`	Whether to skip the header record in the stream.
`quoteCharacter`	`Character`	`"`	Character used to quote field values.
`quoteDisabled`	`Boolean`	`false`	Set to `true` to disable quoting entirely (sets quote character to `null`).
`quoteMode`	`String`	(none)	Apache Commons CSV `QuoteMode` name: `ALL`, `ALL_NON_NULL`, `MINIMAL`, `NON_NUMERIC`, `NONE`.
`escapeCharacter`	`Character`	(none)	Escape character for special characters inside fields.
`commentMarker`	`Character`	(none)	Character that marks comment lines (lines starting with this character are ignored).
`recordSeparator`	`String`	Platform default	Record (line) separator.
`nullString`	`String`	(none)	String to interpret as `null` when reading.
`trim`	`Boolean`	(none)	Whether to trim leading/trailing whitespace from field values.
`ignoreSurroundingSpaces`	`Boolean`	(none)	Whether to ignore spaces surrounding field values.
`ignoreEmptyLines`	`Boolean`	(none)	Whether to skip empty lines.
`ignoreHeaderCase`	`Boolean`	(none)	Whether header matching is case-insensitive.
`allowMissingColumnNames`	`Boolean`	(none)	Whether to tolerate missing column names in the header record.
`useReusableParser`	`Boolean`	`false`	When `true`, wraps the input in a `ReusableCSVParser` that supports re-iteration over the same stream (the stream is marked and reset).
`lazyStartProducer`	`Boolean`	`false`	(Advanced) Whether to defer producer creation until the first message is processed.
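
Several of the properties above can be combined in one endpoint URI. The following is only a sketch with illustrative values: it starts from the `RFC4180` preset and overrides a few options, and the `NULL` marker is an assumed convention of the input file, not something the component requires.

```xml
<!-- Preset RFC4180, with trimming, empty-line skipping and a null marker layered on top -->
<to uri="pfx-csv:streamingUnmarshal?format=RFC4180&amp;trim=true&amp;ignoreEmptyLines=true&amp;nullString=NULL"/>
```

Because the header option is omitted here, the first CSV line would be consumed as the header.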

Examples

Streaming Unmarshal for Large Files

For very large CSV files that should not be loaded entirely into memory:

XML

<route id="streamingImport">
    <from uri="file:{{import.fromUri}}"/>
    <to uri="pfx-csv:streamingUnmarshal?header=sku,label,price"/>
    <split>
        <simple>${body}</simple>
        <!-- each iteration yields one CSVRecord -->
        <to uri="pfx-api:loaddata?mapper=myMapper&amp;objectType=DM&amp;dsUniqueName=Product"/>
    </split>
</route>

Reusable Parser for Multiple Passes

When you need to iterate the CSV data more than once (e.g. validation pass then import pass):

XML

<to uri="pfx-csv:streamingUnmarshal?header=sku,label,price&amp;useReusableParser=true"/>
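
A fuller two-pass route might look like the sketch below. It rests on two assumptions that should be verified in your environment: that a `<split>` without an aggregation strategy leaves the original body (the `ReusableCSVParser`) in place afterwards, which is Camel's default behaviour, and that the reusable parser restarts from the first record when it is iterated a second time. The `direct:validateRow` endpoint is hypothetical.

```xml
<route id="validateThenImport">
    <from uri="file:{{import.fromUri}}"/>
    <to uri="pfx-csv:streamingUnmarshal?header=sku,label,price&amp;useReusableParser=true"/>

    <!-- Pass 1: validate every record (hypothetical validation endpoint) -->
    <split>
        <simple>${body}</simple>
        <to uri="direct:validateRow"/>
    </split>

    <!-- Pass 2: iterate the same parser again and load the data -->
    <split>
        <simple>${body}</simple>
        <to uri="pfx-api:loaddata?mapper=myMapper&amp;objectType=DM&amp;dsUniqueName=Product"/>
    </split>
</route>
```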

Common Pitfalls

Using streamingUnmarshal inside a <split> -- This throws a NonRecoverableException. The streaming unmarshal step must come before the split. The returned CSVParser is then iterated by the <split> body.
Forward-only iteration -- The default CSVParser is a forward-only iterator. If you need to iterate the data more than once, set useReusableParser=true so the stream can be reset.
Memory expectations -- While streamingUnmarshal avoids loading all records at once, the underlying InputStream must remain open for the duration of iteration. Ensure the input source is not closed prematurely (e.g. by a file component that deletes on completion before the split finishes).
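Ampersands in XML URIs -- Remember to XML-escape the ampersand that separates URI query parameters: write &amp;amp; rather than a bare &amp;. In the route XML this means ?a=1&amp;b=2, exactly as in the examples above.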
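
To illustrate the ampersand pitfall, the fragment below shows the escaped form next to the unescaped one. The unescaped form is kept inside a comment because it is not well-formed XML and would stop the route file from parsing.

```xml
<!-- Invalid, a bare ampersand in the attribute value:
     uri="pfx-csv:streamingUnmarshal?header=sku,label,price&useReusableParser=true" -->

<!-- Valid: each parameter separator is written as &amp; -->
<to uri="pfx-csv:streamingUnmarshal?header=sku,label,price&amp;useReusableParser=true"/>
```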