Refresh & Execution

Execute dataset queries and refresh data from datasources.

POST /api/datasets/{id}/_run

Run the dataset query with optional parameter overrides (without updating stored data).

Authentication: Required

Permission: dataset:write

Path Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| id | string | Dataset ID or slug |

Request Body:

| Field | Type | Description |
| --- | --- | --- |
| progress | number | Progress tracking identifier |
| params | object | Parameter overrides |

Example Request:

```json
{
  "params": {
    "year": 2023,
    "region": "West",
    "minAmount": 1000
  },
  "progress": 12345
}
```

Response:

```json
{
  "records": 5432,
  "duration": 2341,
  "success": true,
  "params": {
    "year": 2023,
    "region": "West",
    "minAmount": 1000
  }
}
```

Use Case:

Test query execution with different parameter values without affecting the dataset's stored data. Useful for previewing results before committing a refresh.


POST /api/datasets/{id}/_refresh

Refresh dataset data from the datasource.

Authentication: Required

Permission: dataset:refresh

Path Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| id | string | Dataset ID or slug |

Request Body:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| progress | number | - | Progress tracking identifier |
| params | object | - | Parameter overrides |
| syncFields | boolean | true | Synchronize field definitions |
| updateTypeMapping | boolean | true | Update Elasticsearch mapping |
| disableRefreshInterval | boolean | true | Disable auto-refresh during operation |

Example Request:

```json
{
  "params": {
    "year": 2024,
    "status": "completed"
  },
  "syncFields": true,
  "updateTypeMapping": true,
  "progress": 67890
}
```

Response:

```json
{
  "id": "sales-2024",
  "records": 125000,
  "previousRecordCount": 118500,
  "duration": 15234,
  "fieldsAdded": 2,
  "fieldsUpdated": 3,
  "dataUpdatedAt": "2024-02-08T15:30:00Z",
  "success": true
}
```

Behavior:

  1. Executes dataset query with optional parameter overrides
  2. Clears existing data from Elasticsearch index
  3. Indexes new data from query results
  4. Synchronizes field definitions (if syncFields=true)
  5. Updates Elasticsearch mapping (if updateTypeMapping=true)
  6. Updates dataset dataUpdatedAt timestamp
  7. Updates record count
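The sequence above can be sketched roughly as follows, assuming the query, index-clearing, and indexing layers are injectable callables. The helper names and dataset shape here are illustrative, not Informer's actual internals:

```python
from datetime import datetime, timezone

def refresh_dataset(dataset, run_query, clear_index, index_docs,
                    sync_fields=True, update_type_mapping=True, params=None):
    """Sketch of the refresh sequence described above.

    run_query/clear_index/index_docs stand in for the datasource and
    Elasticsearch layers (illustrative only).
    """
    rows = run_query(dataset["query"], params or {})   # 1. execute query
    clear_index(dataset["id"])                         # 2. clear existing data
    index_docs(dataset["id"], rows)                    # 3. index new data
    if sync_fields:                                    # 4. sync field definitions
        dataset["fields"] = sorted({k for row in rows for k in row})
    if update_type_mapping:                            # 5. update mapping (stubbed)
        dataset["mappingUpdated"] = True
    dataset["dataUpdatedAt"] = datetime.now(timezone.utc).isoformat()  # 6.
    dataset["records"] = len(rows)                     # 7. update record count
    return dataset
```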

Progress Tracking:

Pass a progress identifier to track operation status via server-sent events or polling.

Long-Running Operation

Dataset refresh can take several minutes for large datasets. The default timeout is extended for this endpoint. Monitor progress to track completion.
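A minimal polling sketch, assuming a status-fetching callable and a `{"state": ...}` response shape; neither is a documented Informer contract:

```python
import time

def wait_for_progress(fetch_status, progress_id, poll_interval=2.0, timeout=600.0):
    """Poll a progress identifier until the operation finishes.

    fetch_status is an injected callable returning a status dict;
    the "state" field and its values are assumptions for this sketch.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status(progress_id)
        if status.get("state") in ("completed", "failed"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError(f"progress {progress_id} did not finish within {timeout}s")
```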


POST /api/datasets/{id}/_refresh-internal

Internal refresh endpoint bypassing read access checks.

Authentication: Required

Permission: dataset:refresh

Path Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| id | string | Dataset ID or slug |

Request Body:

Same as _refresh endpoint.

Response:

Same as _refresh endpoint.

Use Case:

Internal use by templates, scheduled jobs, and system processes that need to refresh datasets without user-level read access validation. Marked as isInternal: true.

Internal Endpoint

This endpoint is primarily used by Informer's internal systems (template refresh, job execution) and should not be called directly by client applications.


POST /api/datasets/{id}/_benchmark

Benchmark dataset execution performance.

Authentication: Required

Permission: dataset:refresh

Path Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| id | string | Dataset ID or slug |

Request Body:

| Field | Type | Description |
| --- | --- | --- |
| progress | number | Progress tracking identifier |
| params | object | Parameter overrides |
| config | object | Benchmark configuration |

Benchmark Configuration:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| iterations | integer | 5 | Number of test runs |
| warmup | integer | 1 | Warmup iterations (excluded from results) |
| includeDataTransfer | boolean | false | Include data transfer time |
| includeIndexing | boolean | false | Include Elasticsearch indexing time |

Example Request:

```json
{
  "params": {
    "year": 2024
  },
  "config": {
    "iterations": 10,
    "warmup": 2,
    "includeDataTransfer": true,
    "includeIndexing": false
  },
  "progress": 11111
}
```

Response:

```json
{
  "iterations": 10,
  "warmup": 2,
  "results": [
    { "iteration": 1, "duration": 1234, "records": 125000 },
    { "iteration": 2, "duration": 1198, "records": 125000 },
    { "iteration": 3, "duration": 1256, "records": 125000 }
  ],
  "statistics": {
    "min": 1198,
    "max": 1256,
    "avg": 1229,
    "median": 1234,
    "stdDev": 24.5,
    "records": 125000,
    "avgRecordsPerSecond": 101706
  },
  "config": {
    "includeDataTransfer": true,
    "includeIndexing": false
  }
}
```

Statistics:

| Field | Description |
| --- | --- |
| min | Fastest execution time (ms) |
| max | Slowest execution time (ms) |
| avg | Average execution time (ms) |
| median | Median execution time (ms) |
| stdDev | Standard deviation |
| records | Number of records returned |
| avgRecordsPerSecond | Average throughput |

Use Case:

Measure query performance to identify bottlenecks, optimize queries, and establish performance baselines.
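The statistics block can be reproduced from the per-iteration results. This sketch uses Python's statistics module and assumes a population standard deviation and an ms-to-records/sec conversion, which may differ slightly from the server's exact aggregation:

```python
import statistics

def summarize_benchmark(results, records):
    """Compute the statistics block from per-iteration durations (ms)."""
    durations = [r["duration"] for r in results]
    avg = statistics.mean(durations)
    return {
        "min": min(durations),
        "max": max(durations),
        "avg": round(avg),
        "median": statistics.median(durations),
        "stdDev": round(statistics.pstdev(durations), 1),  # population std dev (assumption)
        "records": records,
        "avgRecordsPerSecond": round(records / (avg / 1000)),
    }
```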


POST /api/datasets/{id}/_benchmark-internal

Internal benchmark endpoint.

Authentication: Required

Permission: dataset:refresh

Path Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| id | string | Dataset ID or slug |

Request Body:

Same as _benchmark endpoint.

Response:

Same as _benchmark endpoint.

Use Case:

Internal use for system performance monitoring and automated benchmarking. Marked as isInternal: true.


Query Parameters

Dataset queries support parameterization for dynamic filtering and flexibility.

Parameter Definition

```json
{
  "name": "year",
  "label": "Year",
  "dataType": "number",
  "defaultValue": 2024,
  "required": true,
  "description": "Fiscal year for reporting"
}
```

Parameter Properties:

| Property | Type | Description |
| --- | --- | --- |
| name | string | Parameter identifier (used in query as {{name}}) |
| label | string | Display name |
| dataType | string | Data type: string, number, date, boolean |
| defaultValue | any | Default value if not provided |
| required | boolean | Must be provided for execution |
| description | string | Help text for users |

Query Usage

SQL query with parameters:

```sql
SELECT *
FROM orders
WHERE year = {{year}}
  AND region = {{region}}
  AND amount >= {{minAmount}}
```

Runtime Override

Override parameters at execution time:

```json
{
  "params": {
    "year": 2023,
    "region": "East",
    "minAmount": 5000
  }
}
```
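A rough sketch of how defaults, required checks, and {{name}} substitution could fit together. Informer's real engine binds values through the datasource driver; the plain string substitution below is shown only for clarity and would be injection-prone in production:

```python
import re

def resolve_params(definitions, overrides):
    """Apply defaults and required checks from parameter definitions."""
    resolved = {}
    for d in definitions:
        if d["name"] in overrides:
            resolved[d["name"]] = overrides[d["name"]]
        elif "defaultValue" in d:
            resolved[d["name"]] = d["defaultValue"]
        elif d.get("required"):
            raise ValueError(f"Required parameter '{d['name']}' not provided")
    return resolved

def render_query(sql, params):
    """Substitute {{name}} placeholders (for illustration only; not safe binding)."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: repr(params[m.group(1)]), sql)
```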

Refresh Strategies

Full Refresh

Complete data replacement (default behavior):

```json
{
  "syncFields": true,
  "updateTypeMapping": true
}
```

Best for:

  • Complete data sync
  • Schema changes
  • Field additions/removals

Incremental Refresh

Append new data without clearing:

```json
{
  "params": {
    "startDate": "2024-02-08"
  },
  "syncFields": false,
  "updateTypeMapping": false
}
```

Best for:

  • Log/event data
  • Time-series data
  • Append-only datasets

Incremental Implementation

For incremental refreshes, use query parameters to filter for new records since last refresh. Store the last refresh timestamp in dataset metadata.
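One possible implementation sketch, assuming the watermark is kept in a local JSON file rather than in dataset metadata:

```python
import json
from pathlib import Path

def incremental_refresh_body(state_file, now):
    """Build a _refresh body that only pulls rows since the last run.

    A local JSON state file is one possible watermark store; Informer's
    dataset metadata could hold it instead.
    """
    path = Path(state_file)
    last = "1970-01-01"  # first run: pull full history
    if path.exists():
        last = json.loads(path.read_text())["lastRefresh"]
    path.write_text(json.dumps({"lastRefresh": now}))
    return {
        "params": {"startDate": last},
        "syncFields": False,
        "updateTypeMapping": False,
    }
```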

Schema Update Only

Sync fields without refreshing data:

POST /api/datasets/{id}/_syncFields

Best for:

  • Field definition updates
  • Label/format changes
  • Schema synchronization

Performance Optimization

Query Optimization

  • Use indexes on filter columns
  • Limit row scans with WHERE clauses
  • Avoid SELECT * when possible
  • Use query parameters for dynamic filtering
  • Test with EXPLAIN before deployment

Elasticsearch Optimization

  • Batch size: 1000-5000 documents per batch
  • Bulk indexing: Use bulk API for large datasets
  • Refresh interval: Increase during bulk operations
  • Replicas: Disable during initial load, enable after
  • Force merge: After bulk load completes
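The settings side of this pattern could look like the following sketch. The values follow common Elasticsearch guidance for bulk loads and should be tuned per cluster:

```python
def bulk_load_settings(replicas_after=1):
    """Return (before, after) bodies for Elasticsearch's index settings API.

    Disable refresh and replicas during the bulk load, then restore them
    once indexing completes. Values here are example defaults, not rules.
    """
    before = {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}
    after = {"index": {"refresh_interval": "1s", "number_of_replicas": replicas_after}}
    return before, after

def chunked(docs, size=5000):
    """Yield batches sized for the bulk API (1000-5000 documents per batch)."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]
```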

Monitoring

  • Track execution time with benchmarking
  • Monitor index size growth
  • Watch memory usage during refresh
  • Review query logs for slow queries
  • Set alerts for failed refreshes

Error Handling

Common Errors

Query Timeout:

```json
{
  "statusCode": 500,
  "error": "Internal Server Error",
  "message": "Query execution timeout after 60000ms"
}
```

Missing Parameters:

```json
{
  "statusCode": 400,
  "error": "Bad Request",
  "message": "Required parameter 'year' not provided"
}
```

Datasource Connection:

```json
{
  "statusCode": 500,
  "error": "Internal Server Error",
  "message": "Unable to connect to datasource: Connection refused"
}
```

Elasticsearch Indexing:

```json
{
  "statusCode": 500,
  "error": "Internal Server Error",
  "message": "Elasticsearch indexing failed: Mapping conflict"
}
```

Retry Strategy

  • Connection errors: Retry with exponential backoff
  • Timeout errors: Increase timeout or optimize query
  • Mapping conflicts: Sync fields and retry
  • Out of memory: Reduce batch size or add resources
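A sketch of the backoff strategy for connection errors; the retryable exception types depend on your HTTP client, and only transient errors should be retried this way:

```python
import time

def retry_with_backoff(operation, retryable=(ConnectionError,),
                       max_attempts=5, base_delay=1.0, max_delay=60.0):
    """Retry a refresh call on transient errors with exponential backoff.

    Timeouts and mapping conflicts need the fixes listed above,
    not blind retries.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts:
                raise
            # 1s, 2s, 4s, 8s, ... capped at max_delay
            time.sleep(min(base_delay * 2 ** (attempt - 1), max_delay))
```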

Best Practices

Scheduling

  • Off-peak hours: Schedule refreshes during low-traffic periods
  • Stagger refreshes: Avoid concurrent large refreshes
  • Monitor duration: Track and optimize slow refreshes
  • Set timeouts: Prevent runaway queries

Data Quality

  • Validate data: Check record counts before/after
  • Monitor errors: Log and alert on failures
  • Test parameters: Validate parameter values
  • Backup data: Snapshot before major refreshes
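A minimal before/after record-count check for the validation step; the 50% drop threshold is an arbitrary example, so pick one matching your data's normal variation:

```python
def validate_refresh(previous_count, new_count, max_drop_ratio=0.5):
    """Flag suspicious refreshes by comparing record counts.

    Returns (ok, reason); callers can alert or roll back on failure.
    """
    if new_count == 0 and previous_count > 0:
        return False, "refresh returned zero records"
    if previous_count and new_count < previous_count * (1 - max_drop_ratio):
        return False, f"record count dropped from {previous_count} to {new_count}"
    return True, "ok"
```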

Security

  • Limit permissions: Restrict refresh to data wizards
  • Audit refreshes: Log all refresh operations
  • Validate queries: Sanitize SQL to prevent injection
  • Parameter validation: Check types and ranges