Refresh & Execution
Execute dataset queries and refresh data from datasources.
POST /api/datasets/{id}/_run
Run the dataset query with optional parameter overrides (without updating stored data).
Authentication: Required
Permission: dataset:write
Path Parameters:
| Parameter | Type | Description |
|---|---|---|
| id | string | Dataset ID or slug |
Request Body:
| Field | Type | Description |
|---|---|---|
| progress | number | Progress tracking identifier |
| params | object | Parameter overrides |
Example Request:
```json
{
  "params": {
    "year": 2023,
    "region": "West",
    "minAmount": 1000
  },
  "progress": 12345
}
```
Response:
```json
{
  "records": 5432,
  "duration": 2341,
  "success": true,
  "params": {
    "year": 2023,
    "region": "West",
    "minAmount": 1000
  }
}
```
Use Case:
Test query execution with different parameter values without affecting the dataset's stored data. Useful for previewing results before committing a refresh.
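A minimal client-side sketch of calling this endpoint (Python standard library only; the base URL and bearer-token auth scheme are assumptions about your deployment, not documented behavior):

```python
import json
from urllib import request

def build_run_body(params, progress=None):
    # Only include "progress" when a tracking identifier is supplied.
    body = {"params": params}
    if progress is not None:
        body["progress"] = progress
    return body

def run_dataset(base_url, dataset_id, token, params, progress=None):
    # Hypothetical base_url/token; adjust auth to your deployment.
    req = request.Request(
        f"{base_url}/api/datasets/{dataset_id}/_run",
        data=json.dumps(build_run_body(params, progress)).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Because `_run` never touches stored data, it is safe to call repeatedly while tuning parameter values.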
POST /api/datasets/{id}/_refresh
Refresh dataset data from the datasource.
Authentication: Required
Permission: dataset:refresh
Path Parameters:
| Parameter | Type | Description |
|---|---|---|
| id | string | Dataset ID or slug |
Request Body:
| Field | Type | Default | Description |
|---|---|---|---|
| progress | number | - | Progress tracking identifier |
| params | object | - | Parameter overrides |
| syncFields | boolean | true | Synchronize field definitions |
| updateTypeMapping | boolean | true | Update Elasticsearch mapping |
| disableRefreshInterval | boolean | true | Disable auto-refresh during operation |
Example Request:
```json
{
  "params": {
    "year": 2024,
    "status": "completed"
  },
  "syncFields": true,
  "updateTypeMapping": true,
  "progress": 67890
}
```
Response:
```json
{
  "id": "sales-2024",
  "records": 125000,
  "previousRecordCount": 118500,
  "duration": 15234,
  "fieldsAdded": 2,
  "fieldsUpdated": 3,
  "dataUpdatedAt": "2024-02-08T15:30:00Z",
  "success": true
}
```
Behavior:
- Executes dataset query with optional parameter overrides
- Clears existing data from Elasticsearch index
- Indexes new data from query results
- Synchronizes field definitions (if `syncFields=true`)
- Updates the Elasticsearch mapping (if `updateTypeMapping=true`)
- Updates the dataset's `dataUpdatedAt` timestamp
- Updates the record count
Progress Tracking:
Pass a progress identifier to track operation status via server-sent events or polling.
Dataset refresh can take several minutes for large datasets. The default timeout is extended for this endpoint. Monitor progress to track completion.
POST /api/datasets/{id}/_refresh-internal
Internal refresh endpoint bypassing read access checks.
Authentication: Required
Permission: dataset:refresh
Path Parameters:
| Parameter | Type | Description |
|---|---|---|
| id | string | Dataset ID or slug |
Request Body:
Same as _refresh endpoint.
Response:
Same as _refresh endpoint.
Use Case:
Internal use by templates, scheduled jobs, and system processes that need to refresh datasets without user-level read access validation. Marked as `isInternal: true`.
This endpoint is primarily used by Informer's internal systems (template refresh, job execution) and should not be called directly by client applications.
POST /api/datasets/{id}/_benchmark
Benchmark dataset execution performance.
Authentication: Required
Permission: dataset:refresh
Path Parameters:
| Parameter | Type | Description |
|---|---|---|
| id | string | Dataset ID or slug |
Request Body:
| Field | Type | Description |
|---|---|---|
| progress | number | Progress tracking identifier |
| params | object | Parameter overrides |
| config | object | Benchmark configuration |
Benchmark Configuration:
| Field | Type | Default | Description |
|---|---|---|---|
| iterations | integer | 5 | Number of test runs |
| warmup | integer | 1 | Warmup iterations (excluded from results) |
| includeDataTransfer | boolean | false | Include data transfer time |
| includeIndexing | boolean | false | Include Elasticsearch indexing time |
Example Request:
```json
{
  "params": {
    "year": 2024
  },
  "config": {
    "iterations": 10,
    "warmup": 2,
    "includeDataTransfer": true,
    "includeIndexing": false
  },
  "progress": 11111
}
```
Response:
```json
{
  "iterations": 10,
  "warmup": 2,
  "results": [
    { "iteration": 1, "duration": 1234, "records": 125000 },
    { "iteration": 2, "duration": 1198, "records": 125000 },
    { "iteration": 3, "duration": 1256, "records": 125000 }
  ],
  "statistics": {
    "min": 1198,
    "max": 1256,
    "avg": 1229,
    "median": 1234,
    "stdDev": 24.5,
    "records": 125000,
    "avgRecordsPerSecond": 101706
  },
  "config": {
    "includeDataTransfer": true,
    "includeIndexing": false
  }
}
```
Statistics:
| Field | Description |
|---|---|
| min | Fastest execution time (ms) |
| max | Slowest execution time (ms) |
| avg | Average execution time (ms) |
| median | Median execution time (ms) |
| stdDev | Standard deviation (ms) |
| records | Number of records returned |
| avgRecordsPerSecond | Average throughput (records/second) |
Use Case:
Measure query performance to identify bottlenecks, optimize queries, and establish performance baselines.
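As a sanity check, the statistics block can be reproduced client-side from the raw `results` array. A sketch (the server's exact rounding and which iteration's record count it reports may differ):

```python
import statistics

def summarize_benchmark(results):
    """Derive summary statistics from a list of
    {"iteration": n, "duration": ms, "records": count} entries."""
    durations = [r["duration"] for r in results]
    mean_ms = statistics.mean(durations)
    return {
        "min": min(durations),
        "max": max(durations),
        "avg": round(mean_ms),
        "median": statistics.median(durations),
        "stdDev": round(statistics.stdev(durations), 1),
        "records": results[-1]["records"],
        # Throughput: records divided by mean duration in seconds.
        "avgRecordsPerSecond": round(results[-1]["records"] / (mean_ms / 1000)),
    }
```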
POST /api/datasets/{id}/_benchmark-internal
Internal benchmark endpoint.
Authentication: Required
Permission: dataset:refresh
Path Parameters:
| Parameter | Type | Description |
|---|---|---|
id | string | Dataset ID or slug |
Request Body:
Same as _benchmark endpoint.
Response:
Same as _benchmark endpoint.
Use Case:
Internal use for system performance monitoring and automated benchmarking. Marked as `isInternal: true`.
Query Parameters
Dataset queries support parameterization for dynamic filtering and flexibility.
Parameter Definition
```json
{
  "name": "year",
  "label": "Year",
  "dataType": "number",
  "defaultValue": 2024,
  "required": true,
  "description": "Fiscal year for reporting"
}
```
Parameter Properties:
| Property | Type | Description |
|---|---|---|
| name | string | Parameter identifier (used in the query as {{name}}) |
| label | string | Display name |
| dataType | string | Data type: string, number, date, or boolean |
| defaultValue | any | Default value if not provided |
| required | boolean | Must be provided for execution |
| description | string | Help text for users |
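The required/dataType/defaultValue rules above can be enforced client-side before submitting a run. A hypothetical helper (the server performs its own validation; this only mirrors the documented rules):

```python
# Map each documented dataType to a membership check. Dates are assumed
# to travel as ISO-8601 strings; adjust if your deployment differs.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "date": lambda v: isinstance(v, str),
}

def resolve_params(definitions, overrides):
    """Merge runtime overrides with defaults, enforcing required/dataType."""
    resolved = {}
    for d in definitions:
        name = d["name"]
        value = overrides.get(name, d.get("defaultValue"))
        if value is None:
            if d.get("required"):
                raise ValueError(f"Required parameter '{name}' not provided")
            continue
        if not TYPE_CHECKS[d["dataType"]](value):
            raise ValueError(f"Parameter '{name}' must be a {d['dataType']}")
        resolved[name] = value
    return resolved
```

Failing fast here produces the same class of error the API would return as a 400 "Required parameter not provided".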
Query Usage
SQL query with parameters:
```sql
SELECT *
FROM orders
WHERE year = {{year}}
  AND region = {{region}}
  AND amount >= {{minAmount}}
```
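To illustrate how `{{name}}` placeholders resolve, here is a naive substitution sketch. This is for illustration only; it is an assumption about the template syntax's behavior, and real deployments should rely on the server's own parameter binding rather than client-side string substitution, which risks SQL injection:

```python
import re

def render_query(sql, params):
    """Replace {{name}} placeholders: quote strings, pass numbers through."""
    def sub(match):
        value = params[match.group(1)]
        if isinstance(value, str):
            # Double embedded quotes -- minimal escaping for illustration.
            return "'" + value.replace("'", "''") + "'"
        return str(value)
    return re.sub(r"\{\{(\w+)\}\}", sub, sql)
```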
Runtime Override
Override parameters at execution time:
```json
{
  "params": {
    "year": 2023,
    "region": "East",
    "minAmount": 5000
  }
}
```
Refresh Strategies
Full Refresh
Complete data replacement (default behavior):
```json
{
  "syncFields": true,
  "updateTypeMapping": true
}
```
Best for:
- Complete data sync
- Schema changes
- Field additions/removals
Incremental Refresh
Append new data without clearing:
```json
{
  "params": {
    "startDate": "2024-02-08"
  },
  "syncFields": false,
  "updateTypeMapping": false
}
```
Best for:
- Log/event data
- Time-series data
- Append-only datasets
For incremental refreshes, use query parameters to filter for new records since last refresh. Store the last refresh timestamp in dataset metadata.
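A sketch of that checkpointing pattern (the `startDate` parameter name is an assumption; it must match a parameter your dataset query actually filters on):

```python
from datetime import datetime, timezone

def build_incremental_refresh(last_refresh_iso):
    """Return a _refresh body for append-only data, plus the timestamp
    to store in dataset metadata for the next run."""
    body = {
        "params": {"startDate": last_refresh_iso},
        "syncFields": False,         # schema is stable between runs
        "updateTypeMapping": False,  # avoid mapping churn on append
    }
    # Checkpoint is taken before the refresh runs so no records are missed.
    next_checkpoint = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return body, next_checkpoint
```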
Schema Update Only
Sync fields without refreshing data:
POST /api/datasets/{id}/_syncFields
Best for:
- Field definition updates
- Label/format changes
- Schema synchronization
Performance Optimization
Query Optimization
- Use indexes on filter columns
- Limit row scans with WHERE clauses
- **Avoid `SELECT *`** when possible
- Use query parameters for dynamic filtering
- Test with EXPLAIN before deployment
Elasticsearch Optimization
- Batch size: 1000-5000 documents per batch
- Bulk indexing: Use bulk API for large datasets
- Refresh interval: Increase during bulk operations
- Replicas: Disable during initial load, enable after
- Force merge: After bulk load completes
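The batch-size and bulk-indexing points above can be combined: split documents into batches in the recommended 1000-5000 range and emit one NDJSON body per batch for the Elasticsearch `_bulk` API. A sketch (index name is illustrative):

```python
import json

def bulk_payloads(index, docs, size=2000):
    """Yield _bulk request bodies: an action line plus a source line
    per document, one payload per batch of `size` documents."""
    for i in range(0, len(docs), size):
        lines = []
        for doc in docs[i:i + size]:
            lines.append(json.dumps({"index": {"_index": index}}))
            lines.append(json.dumps(doc))
        # The _bulk API requires a trailing newline.
        yield "\n".join(lines) + "\n"
```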
Monitoring
- Track execution time with benchmarking
- Monitor index size growth
- Watch memory usage during refresh
- Review query logs for slow queries
- Set alerts for failed refreshes
Error Handling
Common Errors
Query Timeout:
```json
{
  "statusCode": 500,
  "error": "Internal Server Error",
  "message": "Query execution timeout after 60000ms"
}
```
Missing Parameters:
```json
{
  "statusCode": 400,
  "error": "Bad Request",
  "message": "Required parameter 'year' not provided"
}
```
Datasource Connection:
```json
{
  "statusCode": 500,
  "error": "Internal Server Error",
  "message": "Unable to connect to datasource: Connection refused"
}
```
Elasticsearch Indexing:
```json
{
  "statusCode": 500,
  "error": "Internal Server Error",
  "message": "Elasticsearch indexing failed: Mapping conflict"
}
```
Retry Strategy
- Connection errors: Retry with exponential backoff
- Timeout errors: Increase timeout or optimize query
- Mapping conflicts: Sync fields and retry
- Out of memory: Reduce batch size or add resources
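Exponential backoff for connection errors can be sketched generically; the caller supplies the operation and decides which exceptions are retryable (timeouts and mapping conflicts need the fixes described above, not blind retries):

```python
import time

def retry_with_backoff(operation, is_retryable, max_attempts=5, base_delay=1.0):
    """Run `operation`, retrying retryable failures with delays of
    base_delay * 2**attempt between tries. Re-raises on exhaustion."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            if attempt == max_attempts - 1 or not is_retryable(exc):
                raise
            time.sleep(base_delay * (2 ** attempt))
```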
Best Practices
Scheduling
- Off-peak hours: Schedule refreshes during low-traffic periods
- Stagger refreshes: Avoid concurrent large refreshes
- Monitor duration: Track and optimize slow refreshes
- Set timeouts: Prevent runaway queries
Data Quality
- Validate data: Check record counts before/after
- Monitor errors: Log and alert on failures
- Test parameters: Validate parameter values
- Backup data: Snapshot before major refreshes
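The before/after record-count check can use the `records` and `previousRecordCount` fields from the refresh response. A hypothetical guard (the 10% threshold is an arbitrary example, not a documented default):

```python
def validate_refresh(previous_count, new_count, max_drop=0.1):
    """Flag refreshes whose record count fell by more than max_drop.
    A sharp drop often signals a datasource or parameter problem."""
    if previous_count and new_count < previous_count * (1 - max_drop):
        raise ValueError(
            f"Record count dropped from {previous_count} to {new_count}"
        )
    return new_count
```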
Security
- Limit permissions: Restrict refresh to data wizards
- Audit refreshes: Log all refresh operations
- Validate queries: Sanitize SQL to prevent injection
- Parameter validation: Check types and ranges