Mapping & Index Management
Elasticsearch index and mapping operations for dataset storage.
GET /api/datasets/{id}/mapping
Get the Elasticsearch mapping for the dataset index.
Authentication: Required
Path Parameters:
| Parameter | Type | Description |
|---|---|---|
| id | string | Dataset ID or slug |
Response:
{
  "i5_abc123_sales_2024": {
    "mappings": {
      "properties": {
        "order_id": {
          "type": "keyword"
        },
        "customer_name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "amount": {
          "type": "double"
        },
        "order_date": {
          "type": "date"
        },
        "region": {
          "type": "keyword"
        }
      }
    }
  }
}
Mapping Properties:
- type - Elasticsearch field type
- fields - Multi-field definitions
- analyzer - Text analysis configuration
- format - Date format patterns
- ignore_above - Maximum indexed string length
Use Case:
Inspect the Elasticsearch mapping to understand how data is indexed and queried. Useful for debugging search issues and optimizing queries.
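When debugging, it can help to flatten the mapping response into a list of field paths and types. The following is a minimal sketch that assumes the response shape shown above (index name, then `mappings`, then `properties`); the helper names `iter_fields` and `field_types` are illustrative and not part of the API.

```python
from typing import Dict, Iterator, Tuple


def iter_fields(properties: Dict, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted_path, type) for every field and sub-field in a mapping."""
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        # Object fields may omit "type"; Elasticsearch treats them as "object".
        yield path, spec.get("type", "object")
        # Multi-fields (e.g. customer_name.keyword) live under "fields".
        yield from iter_fields(spec.get("fields", {}), prefix=f"{path}.")
        # Object and nested fields keep their children under "properties".
        yield from iter_fields(spec.get("properties", {}), prefix=f"{path}.")


def field_types(response: Dict) -> Dict[str, str]:
    """Flatten a mapping response keyed by index name into {path: type}."""
    result: Dict[str, str] = {}
    for index_body in response.values():
        props = index_body.get("mappings", {}).get("properties", {})
        result.update(iter_fields(props))
    return result


# Trimmed copy of the sample response above.
sample = {
    "i5_abc123_sales_2024": {
        "mappings": {
            "properties": {
                "order_id": {"type": "keyword"},
                "customer_name": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                },
                "amount": {"type": "double"},
            }
        }
    }
}

for path, es_type in field_types(sample).items():
    print(f"{path}: {es_type}")
```

A field that appears as `text` with a `.keyword` sub-field can be searched through the parent path and sorted or aggregated through the sub-field path.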
GET /api/datasets/{id}/index/status
Get Elasticsearch index status and health.
Authentication: Required
Path Parameters:
| Parameter | Type | Description |
|---|---|---|
| id | string | Dataset ID or slug |
Response:
{
  "index": "i5_abc123_sales_2024",
  "health": "green",
  "status": "open",
  "numberOfShards": 5,
  "numberOfReplicas": 1,
  "docsCount": 125000,
  "docsDeleted": 50,
  "storeSize": "45.7mb",
  "primaryStoreSize": "22.8mb"
}
Health Status:
| Status | Description |
|---|---|
| green | All shards allocated, fully operational |
| yellow | Primary shards allocated, some replicas missing |
| red | Some primary shards unallocated, data unavailable |
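A client can turn the status response into a short diagnostic message. This sketch assumes the response fields shown above (`index`, `health`, `status`); the function names are hypothetical. Treating `yellow` as queryable follows from the table: all primary shards are allocated, so every document is still reachable.

```python
# Descriptions taken from the Health Status table.
HEALTH_DESCRIPTIONS = {
    "green": "all shards allocated, fully operational",
    "yellow": "primary shards allocated, some replicas missing",
    "red": "some primary shards unallocated, data unavailable",
}


def describe_health(status: dict) -> str:
    """Return a one-line summary of an index status response."""
    health = status.get("health")
    if health not in HEALTH_DESCRIPTIONS:
        raise ValueError(f"unexpected health value: {health!r}")
    return f"{status['index']}: {health} ({HEALTH_DESCRIPTIONS[health]})"


def is_queryable(status: dict) -> bool:
    """True when the index is open and all primary shards are allocated."""
    return status.get("status") == "open" and status.get("health") in ("green", "yellow")


sample_status = {
    "index": "i5_abc123_sales_2024",
    "health": "green",
    "status": "open",
}
print(describe_health(sample_status))
```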
GET /api/datasets/{id}/index/count
Get the document count in the dataset index.
Authentication: Required
Path Parameters:
| Parameter | Type | Description |
|---|---|---|
| id | string | Dataset ID or slug |
Response:
{
  "count": 125000
}
GET /api/datasets/{id}/index/size
Get the index size in bytes.
Authentication: Required
Path Parameters:
| Parameter | Type | Description |
|---|---|---|
| id | string | Dataset ID or slug |
Response:
{
  "size": 45678901,
  "sizeFormatted": "43.5 MB",
  "primarySize": 22839450,
  "primarySizeFormatted": "21.8 MB"
}
Size Components:
- size - Total index size including replicas
- primarySize - Size of primary shards only
- sizeFormatted - Human-readable size
- primarySizeFormatted - Human-readable primary size
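Because `size` includes replicas and `primarySize` does not, the difference between them is the space used by replica shards. The sketch below derives that figure and an average document size; it assumes the document count comes from the count endpoint, and the function and key names are illustrative only.

```python
def size_breakdown(size_response: dict, doc_count: int) -> dict:
    """Derive replica usage and average document size from the size response."""
    total = size_response["size"]
    primary = size_response["primarySize"]
    return {
        # Bytes held by replica shards (total minus primaries).
        "replicaBytes": total - primary,
        # Average on-disk bytes per document, measured on primaries only.
        "avgPrimaryBytesPerDoc": primary / doc_count if doc_count else 0.0,
    }


sample_size = {"size": 45678901, "primarySize": 22839450}
breakdown = size_breakdown(sample_size, doc_count=125000)
print(breakdown)
```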
DELETE /api/es-index/{id}
Delete an Elasticsearch index by name (admin operation).
Authentication: Required
Permission: tenant:superuser
Path Parameters:
| Parameter | Type | Description |
|---|---|---|
| id | string | Elasticsearch index name |
Response:
{
  "acknowledged": true,
  "index": "i5_abc123_old_dataset"
}
Warning: This endpoint bypasses dataset-level permissions and deletes the Elasticsearch index directly. Use with extreme caution; deleted indices cannot be recovered.
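Since deletion is irreversible, a client may want to require the caller to repeat the index name before a request is even built. The sketch below only constructs the request and does not send it. The host `informer.example.com`, the bearer-token header, and the function name are assumptions; this document does not specify how authentication is supplied.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://informer.example.com"  # hypothetical host


def build_delete_index_request(index_name: str, token: str, confirm: str = "") -> urllib.request.Request:
    """Build (but do not send) a DELETE request for a single index.

    The caller must pass the index name a second time as `confirm`.
    """
    if confirm != index_name:
        raise ValueError("confirmation does not match the index name")
    path = "/api/es-index/" + urllib.parse.quote(index_name, safe="")
    return urllib.request.Request(
        BASE_URL + path,
        method="DELETE",
        # Assumed auth scheme; replace with whatever the deployment uses.
        headers={"Authorization": f"Bearer {token}"},
    )


request = build_delete_index_request(
    "i5_abc123_old_dataset", token="example-token", confirm="i5_abc123_old_dataset"
)
print(request.get_method(), request.full_url)
```

Sending the request (for example with `urllib.request.urlopen`) is left out deliberately so the sketch can be run without touching a live cluster.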
POST /api/vacuum-es-indices
Delete multiple Elasticsearch indices in batch (admin operation).
Authentication: Required
Permission: tenant:superuser
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
| indices | array | Yes | Array of index names to delete |
Example Request:
{
  "indices": [
    "i5_abc123_old_dataset_1",
    "i5_abc123_old_dataset_2",
    "i5_abc123_temp_import"
  ]
}
Response:
{
  "deleted": 3,
  "indices": [
    "i5_abc123_old_dataset_1",
    "i5_abc123_old_dataset_2",
    "i5_abc123_temp_import"
  ]
}
Use Case:
Batch delete orphaned or temporary indices during cleanup or vacuum operations.
Warning: This operation permanently deletes every index in the list. Verify the index list before executing.
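One way to verify the list is to validate it before serializing the request body. The sketch below rejects empty names and names containing `*` or `,`, and removes duplicates. Rejecting wildcards is a precaution chosen for this example; whether the endpoint would expand them is not stated here. The function name is hypothetical.

```python
import json
from typing import Iterable


def build_vacuum_body(indices: Iterable[str]) -> str:
    """Validate index names and return the JSON body for the vacuum request."""
    cleaned = []
    for raw in indices:
        name = raw.strip()
        if not name:
            raise ValueError("empty index name")
        # Precaution: never send patterns that could match more than one index.
        if "*" in name or "," in name:
            raise ValueError(f"wildcards and lists are not allowed: {name!r}")
        if name not in cleaned:
            cleaned.append(name)
    if not cleaned:
        raise ValueError("no indices given")
    return json.dumps({"indices": cleaned})


body = build_vacuum_body([
    "i5_abc123_old_dataset_1",
    "i5_abc123_old_dataset_2",
    "i5_abc123_old_dataset_2",  # duplicate, dropped
    "i5_abc123_temp_import",
])
print(body)
```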
Elasticsearch Mapping Types
String Fields
Text (full-text search):
{
  "type": "text",
  "analyzer": "standard",
  "search_analyzer": "standard"
}
Keyword (exact match, sorting, aggregations):
{
  "type": "keyword",
  "ignore_above": 256
}
Multi-field (both text and keyword):
{
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}
Numeric Fields
Integer:
{
  "type": "long"
}
Decimal:
{
  "type": "double"
}
Date Fields
{
  "type": "date",
  "format": "strict_date_optional_time||epoch_millis"
}
Boolean Fields
{
  "type": "boolean"
}
Object Fields
{
  "type": "object",
  "properties": {
    "street": { "type": "text" },
    "city": { "type": "keyword" },
    "zip": { "type": "keyword" }
  }
}
Nested Fields
{
  "type": "nested",
  "properties": {
    "name": { "type": "keyword" },
    "quantity": { "type": "long" }
  }
}
Index Management Best Practices
Shard Configuration
- Small datasets (<1GB): 1 shard
- Medium datasets (1-10GB): 2-3 shards
- Large datasets (>10GB): 5+ shards
- Replicas: 1 for production, 0 for development
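The sizing guidance above can be expressed as a small lookup. This is a sketch of those rules of thumb only: the function names are hypothetical, sizes are in gigabytes, and datasets of exactly 1 GB or 10 GB are placed in the medium band as an assumption, since the list does not say which band the boundaries belong to.

```python
from typing import Optional, Tuple


def suggest_shards(size_gb: float) -> Tuple[int, Optional[int]]:
    """Return the (minimum, maximum) suggested primary shard count.

    A maximum of None means "no upper bound given" (the 5+ case).
    """
    if size_gb < 0:
        raise ValueError("size must be non-negative")
    if size_gb < 1:
        return (1, 1)
    if size_gb <= 10:
        return (2, 3)
    return (5, None)


def suggest_replicas(environment: str) -> int:
    """Return 1 replica for production and 0 for development."""
    replicas = {"production": 1, "development": 0}
    if environment not in replicas:
        raise ValueError(f"unknown environment: {environment!r}")
    return replicas[environment]


for size in (0.5, 5, 40):
    print(size, suggest_shards(size))
```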
Mapping Strategy
- Plan ahead: Mapping changes require reindexing
- Use keyword for exact match: IDs, categories, statuses
- Use text for search: Names, descriptions, comments
- Multi-field common strings: Enable both search and sorting
- Set ignore_above: Prevent indexing of very long strings
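The strategy above can be applied mechanically by tagging each string field with how it will be used. The sketch below builds a `properties` block from such tags. The tag names (`exact`, `search`, `both`) and function names are invented for this example; the generated field definitions match the String Fields examples earlier in this section.

```python
def string_field_mapping(usage: str, ignore_above: int = 256) -> dict:
    """Return a field mapping for a string field based on its intended use."""
    keyword = {"type": "keyword", "ignore_above": ignore_above}
    if usage == "exact":    # IDs, categories, statuses
        return keyword
    if usage == "search":   # names, descriptions, comments
        return {"type": "text"}
    if usage == "both":     # searchable and sortable
        return {"type": "text", "fields": {"keyword": keyword}}
    raise ValueError(f"unknown usage: {usage!r}")


def build_properties(fields: dict) -> dict:
    """Map {field_name: usage} to an Elasticsearch "properties" block."""
    return {name: string_field_mapping(usage) for name, usage in fields.items()}


properties = build_properties({
    "order_id": "exact",
    "notes": "search",
    "customer_name": "both",
})
print(properties)
```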
Performance Optimization
- Refresh interval: Increase for bulk imports
- Disable replicas: During initial load
- Use bulk API: For large data imports
- Force merge: After bulk operations complete
- Delete by query: Clean up data efficiently
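For reference, the first three tips correspond to standard Elasticsearch index settings and to the newline-delimited format of the Elasticsearch bulk API. The sketch below shows the payloads only. Whether Informer exposes these settings directly is not covered here, and the restore values (`1s` refresh, 1 replica) are assumptions based on the Elasticsearch default and the replica guidance above.

```python
import json
from typing import Iterable, Optional

# Applied before a large import: pause refreshes and drop replicas.
BULK_LOAD_SETTINGS = {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}
# Applied afterwards (assumed target values), before any force merge.
AFTER_LOAD_SETTINGS = {"index": {"refresh_interval": "1s", "number_of_replicas": 1}}


def bulk_body(index: str, docs: Iterable[dict], id_field: Optional[str] = None) -> str:
    """Build an NDJSON body for the Elasticsearch bulk API.

    Each document becomes two lines: an action line and the source line.
    The body must end with a newline.
    """
    lines = []
    for doc in docs:
        action = {"index": {"_index": index}}
        if id_field is not None:
            action["index"]["_id"] = doc[id_field]
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n" if lines else ""


payload = bulk_body(
    "i5_abc123_sales_2024",
    [{"order_id": "A1", "amount": 19.5}, {"order_id": "A2", "amount": 7.0}],
    id_field="order_id",
)
print(payload, end="")
```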
Monitoring
- Watch cluster health: Investigate yellow or red status promptly
- Monitor shard sizes: Keep under 50GB per shard
- Track query performance: Optimize slow queries
- Review index stats: Identify growth patterns
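The 50GB guideline can be checked approximately by combining `primarySize` from the size endpoint with `numberOfShards` from the status endpoint. This sketch computes an average, so it will not detect one oversized shard among several small ones. The function names are hypothetical, and the limit is interpreted as 50 × 1024³ bytes.

```python
MAX_SHARD_BYTES = 50 * 1024 ** 3  # 50GB guideline, read as binary gigabytes


def average_primary_shard_bytes(primary_size: int, number_of_shards: int) -> float:
    """Average size of one primary shard, in bytes."""
    if number_of_shards <= 0:
        raise ValueError("number_of_shards must be positive")
    return primary_size / number_of_shards


def shards_within_limit(primary_size: int, number_of_shards: int) -> bool:
    """True when the average primary shard is at or under the guideline."""
    return average_primary_shard_bytes(primary_size, number_of_shards) <= MAX_SHARD_BYTES


# Values from the sample responses in this section.
print(average_primary_shard_bytes(22839450, 5))
print(shards_within_limit(22839450, 5))
```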
Index Lifecycle
Creation
- Dataset created
- Elasticsearch index created with default mapping
- Fields indexed based on first data batch
- Mapping dynamically updated as needed
Updates
- Field definitions updated
- Mapping updated via field sync
- Data reindexed if types change
- Old mapping preserved for existing docs
Deletion
- Dataset deleted
- Elasticsearch index deleted
- All documents removed
- Mapping removed
Vacuum
- Orphaned indices identified
- Admin reviews index list
- Batch deletion performed
- Disk space reclaimed
Note: Informer includes a vacuum job that identifies and removes orphaned Elasticsearch indices. Configure its schedule via the admin vacuum endpoint.
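Conceptually, an orphaned index is one that exists in Elasticsearch but is no longer referenced by any dataset. The sketch below illustrates that idea as a set difference. It is not Informer's actual detection logic, which is not described here; the `i5_` prefix filter is an assumption taken from the sample index names in this section, and the function name is hypothetical.

```python
from typing import Iterable, List


def find_orphans(
    existing_indices: Iterable[str],
    dataset_indices: Iterable[str],
    prefix: str = "i5_",  # assumed naming prefix, from the samples above
) -> List[str]:
    """Return indices that exist but are not referenced by any dataset.

    Indices without the expected prefix are ignored so that unrelated
    indices in the same cluster are never proposed for deletion.
    """
    unreferenced = set(existing_indices) - set(dataset_indices)
    return sorted(name for name in unreferenced if name.startswith(prefix))


orphans = find_orphans(
    existing_indices=[
        "i5_abc123_sales_2024",
        "i5_abc123_old_dataset_1",
        "i5_abc123_temp_import",
        "unrelated_logs",
    ],
    dataset_indices=["i5_abc123_sales_2024"],
)
print(orphans)
```

The resulting list is what an admin would review before passing it to the batch deletion endpoint.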