pfx-excel:streamingUnmarshal

Since: PFIMCORE-463 (Oct 2022)

Transforms the given Excel file into the internal structure in a fully streaming way: rows are read on demand instead of loading the whole file into memory.

  • Input: InputStream — the Excel file content (XLSX only).

  • Output: Stream<Map<String, String>> — a lazy stream of rows for memory-efficient processing.
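The effect of a lazy output can be illustrated with plain JDK streams. The sketch below does not call the component. `fakeRows` is a hypothetical stand-in for its output, and `LazyRowsDemo`, `firstRows` and `rowsRead` are illustrative names. It shows that rows are produced only when a consumer asks for them.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class LazyRowsDemo {

    /** Counts how many rows the stand-in source has actually produced. */
    static final AtomicInteger rowsRead = new AtomicInteger();

    /** Hypothetical stand-in for the component output: each row is built only when requested. */
    static Stream<Map<String, String>> fakeRows(int totalRows) {
        Iterator<Map<String, String>> it = new Iterator<>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < totalRows;
            }

            @Override
            public Map<String, String> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                index++;
                rowsRead.incrementAndGet();
                return Map.of("sku", "SKU-" + index);
            }
        };
        // Sequential stream that pulls from the iterator on demand.
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
    }

    /** Keeps only the first n rows; the remainder of the source is never read. */
    static List<Map<String, String>> firstRows(Stream<Map<String, String>> rows, int n) {
        return rows.limit(n).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, String>> head = firstRows(fakeRows(1_000_000), 3);
        System.out.println(head.size() + " rows kept, " + rowsRead.get() + " rows read");
    }
}
```

A `List`-based result would have to build all rows before the first one could be inspected, which is the difference the streaming variant is meant to avoid.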

Key Differences from unmarshal


| Aspect | unmarshal | streamingUnmarshal |
|---|---|---|
| Format | XLS and XLSX | XLSX only |
| Return type | List<Map> (all rows in memory) | Stream<Map> (lazy, on-demand) |
| Memory footprint | Full file loaded | Substantially lower |
| Use case | Small to medium files | Large files (thousands of rows) |

Properties

| Option | Type | Default | Since | Description |
|---|---|---|---|---|
| hasHeaderRecord | Boolean | true | PFIMCORE-463 | Indicates whether the input contains a header record. |
| skipHeaderRecord | Boolean | false | PFIMCORE-463 | Determines whether to skip the header record in the output. |
| header | String | | PFIMCORE-463 | Comma-separated list of headers to use. |
| sheetIndex | Integer | 0 | PFIMCORE-463 | Index of the sheet with the required data. |
| sheetName | String | | PFIMCORE-463 | Name of the sheet. If filled, takes precedence over sheetIndex. |
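To make the row shape concrete, the following sketch shows how a comma-separated header value could be paired with the cell values of one row to form a Map<String, String>. It illustrates the idea only and is not the component's implementation. The class `HeaderMapping`, the method `toRow`, the sample headers, and the handling of missing or extra cells are all assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HeaderMapping {

    /**
     * Pairs each header name with the cell at the same position.
     * Assumptions for this sketch: cells beyond the header list are ignored,
     * and missing cells become empty strings.
     */
    static Map<String, String> toRow(String header, List<String> cells) {
        String[] names = header.split(",");
        Map<String, String> row = new LinkedHashMap<>(); // preserves column order
        for (int i = 0; i < names.length; i++) {
            String value = i < cells.size() ? cells.get(i) : "";
            row.put(names[i].trim(), value);
        }
        return row;
    }

    public static void main(String[] args) {
        // prints {sku=A-1, name=Widget, price=9.99}
        System.out.println(toRow("sku,name,price", List.of("A-1", "Widget", "9.99")));
    }
}
```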

Limitations

  • XLSX only — XLS files cannot be streamed. Use unmarshal for XLS files.

  • Streaming resources must be closed once processing completes. The component handles this cleanup automatically via CleanupFunction callbacks.
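One way to avoid handing an XLS file to the streaming method by mistake is to check the file signature first. The sketch below relies on two well-known facts: an XLSX file is a ZIP container and begins with the bytes `50 4B 03 04`, while a legacy XLS file is an OLE2 compound document beginning with `D0 CF 11 E0 A1 B1 1A E1`. The class `ExcelFormatSniffer` and its method are hypothetical and not part of the component, and the check is a heuristic, since any other ZIP-based file matches the XLSX signature too.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class ExcelFormatSniffer {

    enum Format { XLSX, XLS, UNKNOWN }

    // ZIP local-file signature "PK\3\4" (also matches other ZIP-based files).
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};
    // OLE2 compound document signature used by legacy XLS files.
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    /** Peeks at the first 8 bytes and rewinds, so the stream can be passed on unchanged. */
    static Format sniff(BufferedInputStream in) {
        try {
            in.mark(8);
            byte[] head = in.readNBytes(8);
            in.reset();
            if (startsWith(head, ZIP_MAGIC)) return Format.XLSX;
            if (startsWith(head, OLE2_MAGIC)) return Format.XLS;
            return Format.UNKNOWN;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static boolean startsWith(byte[] data, int[] magic) {
        if (data.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] zipLike = {0x50, 0x4B, 0x03, 0x04, 0, 0, 0, 0};
        // prints XLSX
        System.out.println(sniff(new BufferedInputStream(new ByteArrayInputStream(zipLike))));
    }
}
```

Calling code could use the result to choose between unmarshal and streamingUnmarshal before the file reaches the component.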

Examples

Default streaming unmarshal

XML
<to uri="pfx-excel:streamingUnmarshal"/>

Stream a large file and process it row by row

XML
<route>
  <from uri="file:inbox?fileName=large-products.xlsx"/>
  <to uri="pfx-excel:streamingUnmarshal"/>
  <split streaming="true">
    <simple>${body}</simple>
    <to uri="direct:processRow"/>
  </split>
</route>
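The route above hands rows to direct:processRow one at a time. If fixed-size batches are needed instead, the rows can be grouped while still walking the stream only once. The following is a minimal sketch in plain Java, assuming the stream is available to custom code (for example in a bean or processor). `RowBatcher`, `forEachBatch` and the batch size are hypothetical names and values, not part of the component.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.stream.Stream;

class RowBatcher {

    /**
     * Walks the stream once and passes rows to the handler in lists of at most
     * batchSize elements, so only the current batch is accumulated.
     */
    static void forEachBatch(Stream<Map<String, String>> rows, int batchSize,
                             Consumer<List<Map<String, String>>> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        try (Stream<Map<String, String>> source = rows) { // closes the stream when done
            Iterator<Map<String, String>> it = source.iterator();
            List<Map<String, String>> batch = new ArrayList<>(batchSize);
            while (it.hasNext()) {
                batch.add(it.next());
                if (batch.size() == batchSize) {
                    handler.accept(batch);
                    batch = new ArrayList<>(batchSize); // fresh list; the handler may keep the old one
                }
            }
            if (!batch.isEmpty()) {
                handler.accept(batch); // trailing partial batch
            }
        }
    }

    public static void main(String[] args) {
        Stream<Map<String, String>> rows = Stream.of(
                Map.of("sku", "A"), Map.of("sku", "B"), Map.of("sku", "C"),
                Map.of("sku", "D"), Map.of("sku", "E"));
        forEachBatch(rows, 2, batch -> System.out.println(batch.size() + " rows"));
    }
}
```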

Common Pitfalls

  • XLS files will fail: this method supports only the XLSX format, and passing an XLS file throws an exception. Use unmarshal for XLS files, or convert them to XLSX first.

  • The stream is single-use: the output is a lazy Stream<Map>, not a List, so it can be iterated only once. If you need to process the data multiple times, collect it into a list first, but note that this gives up the memory advantage.

  • Use streaming="true" in split: when combining with Camel split, always set streaming="true" to maintain the low memory footprint. Without it, Camel materializes the entire stream before splitting.
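The single-use behaviour is a property of java.util.stream.Stream itself and can be reproduced without the component. The sketch below uses a small in-memory stream as a stand-in for the component output. `SingleUseStreamDemo` and its methods are illustrative names only.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SingleUseStreamDemo {

    /** Returns true if traversing the same stream a second time is rejected. */
    static boolean secondPassFails(Stream<Map<String, String>> rows) {
        rows.forEach(row -> { /* first pass consumes the stream */ });
        try {
            rows.count(); // second terminal operation on the same stream
            return false;
        } catch (IllegalStateException e) {
            return true; // the stream has already been operated upon
        }
    }

    /** Materializes the rows so they can be traversed repeatedly, at the cost of memory. */
    static List<Map<String, String>> materialize(Stream<Map<String, String>> rows) {
        return rows.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(secondPassFails(Stream.of(Map.of("sku", "A")))); // prints true

        List<Map<String, String>> all =
                materialize(Stream.of(Map.of("sku", "A"), Map.of("sku", "B")));
        // A list can be traversed as often as needed.
        long first = all.stream().count();
        long second = all.stream().count();
        System.out.println(first + " rows on the first pass, " + second + " on the second");
    }
}
```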