## Overview
Returns a lazy `CSVParser` (or `ReusableCSVParser`) as the exchange body instead of loading all records into memory. The caller is expected to iterate over it (typically via a downstream `<split>`).

**Input:** The exchange body must be convertible to an `InputStream` (e.g. `String`, `byte[]`, `File`, or `InputStream`).

**Output:** The exchange body is replaced with a `CSVParser` iterator (or `ReusableCSVParser` when `useReusableParser=true`).
Header detection:

- If `header` is set, those names are used.
- If `header` is not set, `withFirstRecordAsHeader()` is applied so the first CSV line is consumed as the header.
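The two modes can be illustrated with a toy sketch (this is not the component's implementation — the real parsing is done by Apache Commons CSV; the `parse` helper and `HeaderModes` class here are invented for illustration only):

```java
import java.util.*;

// Toy illustration of the two header modes. With an explicit header,
// every line is a data record; without one, the first line is consumed
// as the header and is NOT yielded as data.
public class HeaderModes {
    static List<Map<String, String>> parse(List<String> lines, String[] header) {
        String[] names = header;
        int start = 0;
        if (names == null) {                  // no explicit header:
            names = lines.get(0).split(",");  // first record becomes the header
            start = 1;                        // and is consumed, not yielded
        }
        List<Map<String, String>> records = new ArrayList<>();
        for (int i = start; i < lines.size(); i++) {
            String[] fields = lines.get(i).split(",");
            Map<String, String> rec = new LinkedHashMap<>();
            for (int j = 0; j < names.length && j < fields.length; j++) {
                rec.put(names[j], fields[j]);
            }
            records.add(rec);
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> csv = List.of("sku,label,price", "A1,Widget,9.99");
        // Auto-detect: first line consumed as header, one data record remains.
        System.out.println(parse(csv, null).size());             // 1
        System.out.println(parse(csv, null).get(0).get("sku"));  // A1
        // Explicit header: both lines are treated as data records.
        System.out.println(parse(csv, new String[]{"sku", "label", "price"}).size()); // 2
    }
}
```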
## Limitations

- Cannot be used inside a `<split>` block. If a `SPLIT_INDEX` property is detected, a `NonRecoverableException` is thrown. Use `streamingUnmarshal` before the split and iterate the returned parser inside the split body.
- The returned `CSVParser` is a forward-only iterator by default. Each record can only be read once unless `useReusableParser=true` is set.
## Properties
Most options map directly to Apache Commons CSV `CSVFormat` settings.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `format` | String | `DEFAULT` | Base `CSVFormat` preset name (e.g. `DEFAULT`). |
| `delimiter` | char | `,` | Field delimiter character. Supports Java escape sequences (e.g. `\t` for tab). |
| `header` | String | (auto-detect) | Comma-separated list of column names. When omitted, `withFirstRecordAsHeader()` is applied and the first record is consumed as the header. |
| `skipHeaderRecord` | boolean | `false` | Whether to skip the header record in the stream. |
| `quote` | char | `"` | Character used to quote field values. |
| | | | Set to … |
| `quoteMode` | String | (none) | Apache Commons CSV `QuoteMode` to apply. |
| `escape` | char | (none) | Escape character for special characters inside fields. |
| `commentMarker` | char | (none) | Character that marks comment lines (lines starting with this character are ignored). |
| `recordSeparator` | String | Platform default | Record (line) separator. |
| `nullString` | String | (none) | String to interpret as `null`. |
| `trim` | boolean | (none) | Whether to trim leading/trailing whitespace from field values. |
| `ignoreSurroundingSpaces` | boolean | (none) | Whether to ignore spaces surrounding field values. |
| `ignoreEmptyLines` | boolean | (none) | Whether to skip empty lines. |
| `ignoreHeaderCase` | boolean | (none) | Whether header matching is case-insensitive. |
| `allowMissingColumnNames` | boolean | (none) | Whether to tolerate missing column names in the header record. |
| `useReusableParser` | boolean | `false` | When `true`, the body is a `ReusableCSVParser` that can be iterated more than once. |
| `lazyStartProducer` | boolean | `false` | (Advanced) Whether to defer producer creation until the first message is processed. |
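Several options can be combined on the endpoint URI. A sketch (assuming the component exposes `delimiter` and `trim` options matching the Commons CSV settings of the same name; note the ampersands are XML-escaped as `&amp;`, as required inside XML routes):

```xml
<!-- semicolon-delimited file, explicit header, trimmed field values -->
<to uri="pfx-csv:streamingUnmarshal?delimiter=;&amp;header=sku,label,price&amp;trim=true"/>
```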
## Examples
### Streaming Unmarshal for Large Files
For very large CSV files that should not be loaded entirely into memory:
```xml
<route id="streamingImport">
    <from uri="file:{{import.fromUri}}"/>
    <to uri="pfx-csv:streamingUnmarshal?header=sku,label,price"/>
    <split>
        <simple>${body}</simple>
        <!-- each iteration yields one CSVRecord -->
        <to uri="pfx-api:loaddata?mapper=myMapper&amp;objectType=DM&amp;dsUniqueName=Product"/>
    </split>
</route>
```
### Reusable Parser for Multiple Passes
When you need to iterate the CSV data more than once (e.g. validation pass then import pass):
```xml
<to uri="pfx-csv:streamingUnmarshal?header=sku,label,price&amp;useReusableParser=true"/>
```
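A two-pass flow might look like the following sketch. The `direct:validateRecord` and `direct:importRecord` endpoints are hypothetical placeholders; the second `<split>` re-iterates the same body, which only works because `useReusableParser=true` allows the parser to be reset:

```xml
<to uri="pfx-csv:streamingUnmarshal?header=sku,label,price&amp;useReusableParser=true"/>
<!-- pass 1: validate every record -->
<split>
    <simple>${body}</simple>
    <to uri="direct:validateRecord"/>
</split>
<!-- pass 2: import; the reusable parser is reset and iterated again -->
<split>
    <simple>${body}</simple>
    <to uri="direct:importRecord"/>
</split>
```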
## Common Pitfalls
- **Using `streamingUnmarshal` inside a `<split>`** -- This throws a `NonRecoverableException`. The streaming unmarshal step must come before the split; the returned `CSVParser` is then iterated by the `<split>` body.
- **Forward-only iteration** -- The default `CSVParser` is a forward-only iterator. If you need to iterate the data more than once, set `useReusableParser=true` so the stream can be reset.
- **Memory expectations** -- While `streamingUnmarshal` avoids loading all records at once, the underlying `InputStream` must remain open for the duration of iteration. Ensure the input source is not closed prematurely (e.g. by a file component that deletes on completion before the split finishes).
- **Delimiter in XML** -- Remember to XML-escape the ampersand in URI query strings: use `&amp;`, not `&`.
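The forward-only pitfall is the same behavior as any plain Java `Iterator`: once drained, a second pass over the same iterator sees nothing. A minimal stdlib sketch (the `ForwardOnly` class is invented for illustration and is not part of the component):

```java
import java.util.Iterator;
import java.util.List;

// Illustrates the forward-only pitfall: a plain Iterator is exhausted
// after one pass, just like the default CSVParser exchange body.
public class ForwardOnly {
    public static int countSecondPass(Iterable<String> source) {
        Iterator<String> it = source.iterator();
        while (it.hasNext()) { it.next(); }   // first pass drains the iterator
        int second = 0;
        while (it.hasNext()) { it.next(); second++; } // same iterator: already empty
        return second;                         // a second pass sees zero elements
    }

    public static void main(String[] args) {
        System.out.println(countSecondPass(List.of("a", "b", "c"))); // prints 0
    }
}
```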