When to Use api.find and api.stream
The rule is: whenever you need to retrieve a known, limited number of records, always use api.find and pass the limits (start row and maximum number of rows) as parameters.
Using api.find with a hard-coded limit such as api.find("P", 0, 200, ...), or with api.find("P", 0, api.getMaxFindResultsLimit(), ...), is bad practice when you need all matching rows, because you are assuming that the table contains no more rows than that limit. The exceptions are when you really want to load only a certain number of rows, or when you call api.find repeatedly in a loop, as shown below.
When you do not know the number of records, you have two options: use api.stream, or call api.find in a loop. In most cases, api.stream is preferred.
// Option 1: api.stream
def iter = api.stream("P", "sku", ["sku", "attribute1"], *filters)
try {
    while (iter.hasNext()) {
        def row = iter.next()
        // process the row...
        // if performance-intensive work is done here,
        // such as another database access or a datamart query,
        // then use api.find instead
    }
} finally {
    // always release the database connection, even if processing throws
    iter.close()
}
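// Option 2: api.find in a loop
def start = 0
def data = null
// the loop ends when api.find returns no more rows (an empty list is false in Groovy)
while (data = api.find("P", start, api.getMaxFindResultsLimit(),
        "sku", ["sku", "attribute1"], *filters)) {
    start += data.size()
    for (row in data) {
        // process the row
    }
}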
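The api.find loop relies on two Groovy details: the assignment inside while (...) is evaluated with Groovy truth, so the loop stops as soon as a page comes back empty, and *filters spreads a list into separate arguments. The following self-contained sketch runs outside Pricefx to make both visible. It is only an illustration under stated assumptions: the in-memory table, the page size of 100 and the fetchPage method are hypothetical stand-ins for the "P" table, api.getMaxFindResultsLimit() and api.find, and the filters here are plain closures rather than real Pricefx filter objects.

```groovy
// Hypothetical in-memory stand-in for the "P" table: 1 234 rows.
def table = (1..1234).collect { i -> [sku: "SKU-" + i, attribute1: i % 3 == 0 ? "A" : "B"] }

// Hypothetical stand-in for api.find: returns at most `limit` rows that match
// every filter, starting at offset `start`; returns an empty list when exhausted.
List<Map> fetchPage(List<Map> rows, int start, int limit, Closure... filters) {
    List<Map> matching = rows.findAll { row -> filters.every { f -> f(row) } }
    if (start >= matching.size()) {
        return []
    }
    return matching.subList(start, Math.min(start + limit, matching.size()))
}

int pageSize = 100                                 // stands in for api.getMaxFindResultsLimit()
def filters = [{ row -> row.attribute1 == "A" }]   // hypothetical filter written as a closure

int start = 0
List<Map> data = null
List<String> processed = []
int pages = 0
// Groovy truth: an empty list is false, so the loop ends on the first empty page.
// *filters spreads the list, so fetchPage receives each filter as a separate argument.
while (data = fetchPage(table, start, pageSize, *filters)) {
    pages++
    start += data.size()
    for (row in data) {
        processed << row.sku   // "process the row"
    }
}
println "processed ${processed.size()} rows in ${pages} pages"
```

In this sketch, 411 rows match and arrive in 5 non-empty pages. The loop makes one more call than there are pages, and that last call returns an empty list and ends the loop.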
The preferred way to load an unknown amount of data from the database is api.stream, with these exceptions:
- If the code within the loop takes significant time, use api.find instead. The reason is that api.stream keeps a connection to the database open for the whole duration of the processing, which can have a negative impact on performance, whereas api.find fetches the data at once and does not keep a connection open.
- The logic runs with input generation (syntax check) mode enabled.
See also Data Querying using api.find() and api.stream() and General Queries (Quick Reference).
Beware of Groovy Closure Performance
Groovy closures carry overhead, so be very careful when iterating over large amounts of data. To demonstrate this, here are four variants of a simple logic that sums up the numbers from 1 to n.
// collect + sum
(1..n).collect { it }.sum()

// while
long sum = 0
long i = 1
while (i <= n) {
    sum += i
    ++i
}
return sum

// each
long sum = 0
(1..n).each { sum += it }
return sum

// for
long sum = 0
for (long i = 1; i <= n; ++i) {
    sum += i
}
return sum
Here are the measured results for a list of size n. The duration is in milliseconds.
| Duration for list of size n [ms] | 1 000 x | 10 000 x | 100 000 x | 1 000 000 x | 10 000 000 x |
|---|---|---|---|---|---|
| collect + sum | 12 | 130 | 904 | 9 014 | 90 242 |
| each | 12 | 93 | 881 | 8 747 | 88 896 |
| while | 2 | 12 | 111 | 708 | 6 925 |
| for | 1 | 11 | 110 | 705 | 6 820 |
For a different example with slightly more complex logic, see https://dzone.com/articles/loops-performance-in-groovy
For small lists, the overhead adds only a few milliseconds to the total calculation time. For larger lists, it is much better to stick to the classic while loop or for loop: from n = 10 000 upwards, they were roughly 8 to 13 times faster than collect + sum or each in the measurements above.
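To reproduce a comparison like this on your own JVM, here is a minimal Groovy timing harness. It is a sketch, not the harness behind the table above. The method names viaCollectSum, viaEach, viaWhile, viaFor and timed are hypothetical. Each variant runs once without JVM warm-up, so absolute numbers will differ from the table and depend on the JVM and Groovy version. One deliberate change: the ranges are written as 1L..n so that collect + sum adds Long values. With an Integer range, sum() adds Integer values, and that can overflow once n(n + 1)/2 exceeds Integer.MAX_VALUE.

```groovy
// The four summation variants from the text, wrapped in methods taking n as a long.
long viaCollectSum(long n) { (1L..n).collect { it }.sum() as long }

long viaEach(long n) {
    long sum = 0
    (1L..n).each { sum += it }   // the closure updates the enclosing local variable
    return sum
}

long viaWhile(long n) {
    long sum = 0
    long i = 1
    while (i <= n) {
        sum += i
        ++i
    }
    return sum
}

long viaFor(long n) {
    long sum = 0
    for (long i = 1; i <= n; ++i) {
        sum += i
    }
    return sum
}

// Runs one variant once and returns [result, elapsed milliseconds].
List timed(Closure<Long> variant, long n) {
    long t0 = System.nanoTime()
    long result = variant(n)
    long ms = (System.nanoTime() - t0).intdiv(1000000)
    return [result, ms]
}

def variants = [
    'collect + sum': this.&viaCollectSum,
    'each'         : this.&viaEach,
    'while'        : this.&viaWhile,
    'for'          : this.&viaFor,
]

for (n in [1000L, 10000L, 100000L, 1000000L]) {
    variants.each { name, variant ->
        def (result, ms) = timed(variant, n)
        // every variant must agree with the closed form n(n + 1)/2
        assert result == (n * (n + 1)).intdiv(2)
        println "${name.padRight(14)} n=${n}: ${ms} ms"
    }
}
```

For more reliable numbers, run each variant several times and discard the first runs, so that JIT compilation does not distort the comparison.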