Since: PFIMCORE-463 (Oct 2022)
Transforms the given Excel file into the internal structure in a fully streaming way.
- Input: InputStream, the Excel file content (XLSX only).
- Output: Stream<Map<String, String>>, a lazy stream of rows for memory-efficient processing.
Key Differences from unmarshal
| | unmarshal | streamingUnmarshal |
|---|---|---|
| Format | XLS and XLSX | XLSX only |
| Return type | | Stream<Map<String, String>> |
| Memory footprint | Full file loaded | Substantially lower |
| Use case | Small to medium files | Large files (thousands of rows) |
Properties
| Option | Type | Default | Since | Description |
|---|---|---|---|---|
| | Boolean | | PFIMCORE-463 | Indicates whether the input contains a header record. |
| | Boolean | | PFIMCORE-463 | Determines whether to skip the header record in the output. |
| | String | | PFIMCORE-463 | Comma-separated list of headers to use. |
| | Integer | | PFIMCORE-463 | Index of the sheet with the required data. |
| | String | | PFIMCORE-463 | Name of the sheet. If filled, takes precedence over the sheet index. |
Limitations
- XLSX only: XLS files cannot be streamed. Use unmarshal for XLS files.
- Streaming resources must be properly closed. The component handles cleanup automatically via CleanupFunction callbacks.
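The component's cleanup callbacks are internal and are not shown in this page. As a general illustration of how a Java Stream can carry cleanup work, the following minimal sketch uses the standard Stream.onClose hook together with try-with-resources. The class StreamCleanupSketch, the method rowsWithCleanup, and the sample rows are hypothetical stand-ins for the real component output, not its actual implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Stream;

class StreamCleanupSketch {

    // Hypothetical stand-in for the component output: a row stream that
    // runs a cleanup action when it is closed.
    static Stream<Map<String, String>> rowsWithCleanup(AtomicBoolean released) {
        List<Map<String, String>> rows = List.of(
                Map.of("sku", "A-1", "price", "10"),
                Map.of("sku", "B-2", "price", "20"));
        return rows.stream().onClose(() -> released.set(true));
    }

    // Consumes the stream inside try-with-resources so close() always runs,
    // even if processing throws.
    static long countRows(AtomicBoolean released) {
        try (Stream<Map<String, String>> rows = rowsWithCleanup(released)) {
            return rows.count();
        }
    }

    public static void main(String[] args) {
        AtomicBoolean released = new AtomicBoolean(false);
        long count = countRows(released);
        System.out.println(count + " rows, resources released: " + released.get());
    }
}
```

Code that takes the stream out of the route and consumes it manually may want the same try-with-resources pattern, so that any registered close handlers run even when row processing fails.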
Examples
Default streaming unmarshal
<to uri="pfx-excel:streamingUnmarshal"/>
Stream large file and process in batches
<route>
  <from uri="file:inbox?fileName=large-products.xlsx"/>
  <to uri="pfx-excel:streamingUnmarshal"/>
  <split streaming="true">
    <simple>${body}</simple>
    <to uri="direct:processRow"/>
  </split>
</route>
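When rows are handled in Java code rather than as one exchange per row, the lazy stream can be grouped into fixed-size batches without loading the whole sheet. The following is a minimal sketch, assuming the rows arrive as Stream<Map<String, String>>. RowBatcher, forEachBatch, and the batch size are hypothetical names chosen for illustration and are not part of the component.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.stream.Stream;

class RowBatcher {

    // Pulls rows one at a time and hands them to the handler in lists of at
    // most batchSize elements. Only one batch is held in memory at a time.
    static void forEachBatch(Stream<Map<String, String>> rows,
                             int batchSize,
                             Consumer<List<Map<String, String>>> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        // try-with-resources closes the stream once processing ends.
        try (Stream<Map<String, String>> source = rows) {
            Iterator<Map<String, String>> it = source.iterator();
            List<Map<String, String>> batch = new ArrayList<>(batchSize);
            while (it.hasNext()) {
                batch.add(it.next());
                if (batch.size() == batchSize) {
                    handler.accept(batch);
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) {
                handler.accept(batch); // trailing partial batch
            }
        }
    }

    public static void main(String[] args) {
        // Sample rows standing in for the streamed sheet content.
        Stream<Map<String, String>> rows = Stream.of(
                Map.of("sku", "A-1"), Map.of("sku", "B-2"), Map.of("sku", "C-3"));
        forEachBatch(rows, 2, batch -> System.out.println("batch of " + batch.size()));
    }
}
```

Each batch is a fresh list, so a handler can keep or forward it safely while the next batch is being filled.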
Common Pitfalls
- XLS files will fail: This method supports only the XLSX format. Passing an XLS file will throw an exception. Use unmarshal for XLS files, or convert to XLSX first.
- Stream must be consumed: The output is a lazy Stream<Map>, not a List. It can only be iterated once. If you need to process the data multiple times, collect it to a list first (but this defeats the memory advantage).
- Use streaming="true" in split: When combining with Camel split, always set streaming="true" to maintain the low memory footprint. Without it, Camel materializes the entire stream before splitting.
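The single-use behaviour is standard for java.util.stream.Stream and can be reproduced without the component. In this minimal sketch, SingleUseStreamSketch and its sample rows are hypothetical. It shows that a second terminal operation on the same stream is rejected with an IllegalStateException.

```java
import java.util.Map;
import java.util.stream.Stream;

class SingleUseStreamSketch {

    // Returns true if a second terminal operation on the same stream fails.
    static boolean secondPassFails(Stream<Map<String, String>> rows) {
        rows.forEach(row -> { /* first pass consumes the stream */ });
        try {
            rows.forEach(row -> { /* second pass is not allowed */ });
            return false;
        } catch (IllegalStateException e) {
            // The stream has already been operated upon.
            return true;
        }
    }

    public static void main(String[] args) {
        // Sample rows standing in for the streamed sheet content.
        Stream<Map<String, String>> rows = Stream.of(
                Map.of("sku", "A-1"), Map.of("sku", "B-2"));
        System.out.println("second pass rejected: " + secondPassFails(rows));
    }
}
```

If several passes are really needed, doing all the work inside a single pass keeps the memory benefit, whereas collecting to a list trades it away.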