Overview
The pfx-gp component loads bulk data into a Greenplum database via SSH/SFTP. It uploads the CSV body to the Greenplum master host and uses gpload to perform MERGE, UPDATE, or INSERT operations. Use it for high-throughput database synchronisation from Pricefx to Greenplum.
URI pattern: pfx-gp:loaddata[?options]
Methods
|
Method |
Description |
|---|---|
|
|
Bulk-load the CSV body into a Greenplum table via gpload |
Parameters
Connection (required)
|
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
|
String |
— |
Greenplum master host IP or hostname |
|
|
Integer |
— |
Greenplum master port |
|
|
String |
— |
Target database name |
|
|
String |
— |
Database user |
|
|
String |
— |
Database password — use |
|
|
String |
— |
SSH/SFTP host for file upload |
|
|
int |
|
SSH port |
|
|
String |
— |
SSH username |
|
|
String |
— |
SSH password — use |
|
|
String |
|
Directory on SSH host for staging files |
Load target
|
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
|
String |
— |
Required. Target Greenplum table name |
|
|
String |
|
Load mode: |
|
|
List |
— |
Columns used to match existing rows (MERGE/UPDATE) |
|
|
List |
— |
Columns to update on match |
|
|
String |
— |
Custom SQL condition for update |
|
|
boolean |
|
Truncate table before loading |
Format
|
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
|
String |
|
Input format ( |
|
|
String |
|
Field delimiter |
|
|
String |
|
Escape character |
|
|
String |
|
Quote character |
|
|
boolean |
|
Input includes a header row |
|
|
List |
— |
Columns that must not be null |
|
|
int |
|
Number of tolerated errors before aborting |
|
|
boolean |
|
Log rejected rows instead of aborting |
|
|
int |
|
Start of gpload port range |
|
|
int |
|
End of gpload port range |
|
|
boolean |
|
Reuse gpload staging tables across runs |
Example
MERGE from Pricefx export to Greenplum
<routes xmlns="http://camel.apache.org/schema/spring">
<route id="exportToGreenplum">
<from uri="timer://sync?repeatCount=1"/>
<to uri="pfx-api:fetch?objectType=PX&filter=pricelistFilter"/>
<to uri="pfx-csv:marshal?delimiter=,"/>
<to uri="pfx-gp:loaddata
?masterHost={{gp.host}}
&masterPort={{gp.port}}
&database={{gp.database}}
&dbUser={{gp.user}}
&dbPassword={{gp.password}}
&sshHost={{gp.ssh.host}}
&sshUsername={{gp.ssh.user}}
&sshPassword={{gp.ssh.password}}
&table=pricelist_items
&mode=MERGE
&matchColumns=sku,pricelist_id
&header=true"/>
</route>
</routes>
Common Pitfalls
-
Never hardcode credentials — always use
{{property}}placeholders fordbPasswordandsshPassword. -
matchColumnsis required forMERGEandUPDATEmodes — omitting it causes all rows to be treated as inserts. -
Greenplum gpload requires network access from the IM instance to both the
masterHost(database port) and thesshHost(SSH port). Verify firewall rules before deployment. -
portFrom/portTodefine the port range used by gpload for parallel loading — ensure this range is open between Greenplum segments and the IM instance. -
Set
header=trueif your marshalled CSV includes a header row so gpload skips it during load.