Mapping & Index Management

Elasticsearch index and mapping operations for dataset storage.

GET /api/datasets/{id}/mapping

Get the Elasticsearch mapping for the dataset index.

Authentication: Required

Path Parameters:

Parameter | Type | Description
id | string | Dataset ID or slug

Response:

{
"i5_abc123_sales_2024": {
"mappings": {
"properties": {
"order_id": {
"type": "keyword"
},
"customer_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"amount": {
"type": "double"
},
"order_date": {
"type": "date"
},
"region": {
"type": "keyword"
}
}
}
}
}

Mapping Properties:

  • type - Elasticsearch field type
  • fields - Multi-field definitions
  • analyzer - Text analysis configuration
  • format - Date format patterns
  • ignore_above - Maximum string length to index; longer values are not indexed

Use Case:

Inspect the Elasticsearch mapping to understand how data is indexed and queried. Useful for debugging search issues and optimizing queries.
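
When debugging, it can help to flatten the mapping response into a simple field-to-type table. The following is a minimal sketch that works on an already-parsed response body; the function name `field_types` is illustrative and not part of the API.

```python
from typing import Any, Dict


def field_types(mapping_response: Dict[str, Any]) -> Dict[str, str]:
    """Flatten a mapping response into {dotted.field.path: type}."""
    result: Dict[str, str] = {}

    def walk(properties: Dict[str, Any], prefix: str) -> None:
        for name, spec in properties.items():
            path = f"{prefix}{name}"
            # Object fields may omit "type"; Elasticsearch treats them as "object".
            result[path] = spec.get("type", "object")
            # Multi-fields such as customer_name.keyword live under "fields".
            for sub_name, sub_spec in spec.get("fields", {}).items():
                result[f"{path}.{sub_name}"] = sub_spec.get("type", "object")
            # Object and nested fields carry their own "properties".
            if "properties" in spec:
                walk(spec["properties"], f"{path}.")

    # The response is keyed by index name, as in the sample above.
    for index_body in mapping_response.values():
        walk(index_body.get("mappings", {}).get("properties", {}), "")
    return result


sample = {
    "i5_abc123_sales_2024": {
        "mappings": {
            "properties": {
                "order_id": {"type": "keyword"},
                "customer_name": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                },
                "amount": {"type": "double"},
            }
        }
    }
}

for path, es_type in sorted(field_types(sample).items()):
    print(f"{path}: {es_type}")
```

A field that shows up as `text` without a `.keyword` sub-field is a common reason why sorting or exact-match filtering on it fails.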


GET /api/datasets/{id}/index/status

Get Elasticsearch index status and health.

Authentication: Required

Path Parameters:

Parameter | Type | Description
id | string | Dataset ID or slug

Response:

{
"index": "i5_abc123_sales_2024",
"health": "green",
"status": "open",
"numberOfShards": 5,
"numberOfReplicas": 1,
"docsCount": 125000,
"docsDeleted": 50,
"storeSize": "45.7mb",
"primaryStoreSize": "22.8mb"
}

Health Status:

Status | Description
green | All shards allocated, fully operational
yellow | Primary shards allocated, some replicas missing
red | Some primary shards unallocated, data unavailable
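
A client might interpret the status response roughly as follows. This is a sketch that assumes the field names shown in the sample response; the helper names are hypothetical.

```python
from typing import Any, Dict


def is_fully_searchable(status: Dict[str, Any]) -> bool:
    """True when the index is open and every primary shard is allocated.

    Both green and yellow mean all primaries are allocated; yellow only
    signals reduced redundancy because some replicas are missing.
    """
    return status["status"] == "open" and status["health"] in ("green", "yellow")


def deleted_ratio(status: Dict[str, Any]) -> float:
    """Fraction of documents in the index that are marked as deleted."""
    total = status["docsCount"] + status["docsDeleted"]
    return status["docsDeleted"] / total if total else 0.0


status = {
    "index": "i5_abc123_sales_2024",
    "health": "green",
    "status": "open",
    "docsCount": 125000,
    "docsDeleted": 50,
}

print(is_fully_searchable(status))
print(f"{deleted_ratio(status):.4%} of documents are deleted")
```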

GET /api/datasets/{id}/index/count

Get the document count in the dataset index.

Authentication: Required

Path Parameters:

Parameter | Type | Description
id | string | Dataset ID or slug

Response:

{
"count": 125000
}

GET /api/datasets/{id}/index/size

Get the index size in bytes.

Authentication: Required

Path Parameters:

Parameter | Type | Description
id | string | Dataset ID or slug

Response:

{
"size": 45678901,
"sizeFormatted": "43.5 MB",
"primarySize": 22839450,
"primarySizeFormatted": "21.8 MB"
}

Size Components:

  • size - Total index size including replicas
  • primarySize - Size of primary shards only
  • sizeFormatted - Human-readable size
  • primarySizeFormatted - Human-readable primary size
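
The raw byte values are convenient for derived metrics. As an illustration, the sketch below combines the size response with the count response to estimate replica overhead and average document size; the function names are hypothetical.

```python
from typing import Any, Dict


def replica_bytes(size_response: Dict[str, Any]) -> int:
    """Bytes used by replica shards: total size minus primary size."""
    return size_response["size"] - size_response["primarySize"]


def bytes_per_document(size_response: Dict[str, Any], count_response: Dict[str, Any]) -> float:
    """Average primary-shard bytes per document (0.0 for an empty index)."""
    count = count_response["count"]
    return size_response["primarySize"] / count if count else 0.0


size_response = {"size": 45678901, "primarySize": 22839450}
count_response = {"count": 125000}

print(replica_bytes(size_response))
print(round(bytes_per_document(size_response, count_response), 2))
```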

DELETE /api/es-index/{id}

Delete an Elasticsearch index directly by its index name (admin operation).

Authentication: Required

Permission: tenant:superuser

Path Parameters:

Parameter | Type | Description
id | string | Elasticsearch index name

Response:

{
"acknowledged": true,
"index": "i5_abc123_old_dataset"
}

Administrative Operation

This endpoint bypasses dataset-level permissions and directly deletes Elasticsearch indices. Use with extreme caution. Deleted indices cannot be recovered.
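
Because deletion is irreversible, a client may want an explicit confirmation step before sending the request. The sketch below uses only the Python standard library. The base URL and the Bearer token header are assumptions, since this section does not state how authentication is passed, and both helper functions are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_delete_request(base_url: str, index_name: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request for one index."""
    quoted = urllib.parse.quote(index_name, safe="")
    url = f"{base_url.rstrip('/')}/api/es-index/{quoted}"
    # Assumption: the API accepts a Bearer token in the Authorization header.
    headers = {"Authorization": f"Bearer {token}"}
    return urllib.request.Request(url, method="DELETE", headers=headers)


def delete_index(base_url: str, index_name: str, token: str, confirm: str) -> dict:
    """Delete an index only if the caller repeats its exact name in `confirm`."""
    if confirm != index_name:
        raise ValueError("confirmation does not match the index name; nothing deleted")
    with urllib.request.urlopen(build_delete_request(base_url, index_name, token)) as resp:
        return json.load(resp)


# Inspect the request without sending it.
req = build_delete_request("https://informer.example.com", "i5_abc123_old_dataset", "TOKEN")
print(req.get_method(), req.full_url)
```

Requiring the caller to repeat the index name guards against deleting the wrong index through a mistyped variable or a stale selection.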


POST /api/vacuum-es-indices

Delete multiple Elasticsearch indices in batch (admin operation).

Authentication: Required

Permission: tenant:superuser

Request Body:

Field | Type | Required | Description
indices | array | Yes | Array of index names to delete

Example Request:

{
"indices": [
"i5_abc123_old_dataset_1",
"i5_abc123_old_dataset_2",
"i5_abc123_temp_import"
]
}

Response:

{
"deleted": 3,
"indices": [
"i5_abc123_old_dataset_1",
"i5_abc123_old_dataset_2",
"i5_abc123_temp_import"
]
}

Use Case:

Batch delete orphaned or temporary indices during cleanup or vacuum operations.

Batch Deletion

This operation permanently deletes multiple indices. Verify the index list before executing.
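
One way to verify the list is to build the request body through a guard that rejects any index still in use, then compare the response against the request. This is a sketch under the assumption that the caller maintains its own list of protected index names; the helpers are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def build_vacuum_payload(candidates: Iterable[str], protected: Iterable[str]) -> Dict[str, List[str]]:
    """Return the request body, refusing to include any protected index."""
    candidates = list(candidates)
    in_use = sorted(set(candidates) & set(protected))
    if in_use:
        raise ValueError(f"refusing to delete indices still in use: {in_use}")
    # De-duplicate while keeping the caller's order.
    return {"indices": list(dict.fromkeys(candidates))}


def not_deleted(payload: Dict[str, List[str]], response: Dict[str, Any]) -> List[str]:
    """Names that were requested but are absent from the response."""
    return sorted(set(payload["indices"]) - set(response["indices"]))


payload = build_vacuum_payload(
    ["i5_abc123_old_dataset_1", "i5_abc123_old_dataset_2", "i5_abc123_temp_import"],
    protected=["i5_abc123_sales_2024"],
)
print(payload)
```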


Elasticsearch Mapping Types

String Fields

Text (full-text search):

{
"type": "text",
"analyzer": "standard",
"search_analyzer": "standard"
}

Keyword (exact match, sorting, aggregations):

{
"type": "keyword",
"ignore_above": 256
}

Multi-field (both text and keyword):

{
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
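
To illustrate why the multi-field form is useful, the sketch below builds standard Elasticsearch query DSL bodies: a full-text `match` on the `text` field, sorted by its `keyword` sub-field, and an exact `term` filter. The helper names are hypothetical, and this section does not say whether Informer accepts raw query DSL, so treat the bodies as plain Elasticsearch examples.

```python
from typing import Any, Dict


def search_and_sort(field: str, text: str, size: int = 10) -> Dict[str, Any]:
    """Full-text search on `field`, sorted by its `.keyword` sub-field."""
    return {
        "size": size,
        # "match" analyzes the input, so it suits text fields.
        "query": {"match": {field: text}},
        # Sorting needs the un-analyzed keyword sub-field.
        "sort": [{f"{field}.keyword": {"order": "asc"}}],
    }


def exact_filter(field: str, value: str) -> Dict[str, Any]:
    """Exact-match lookup; "term" is not analyzed, so it suits keyword fields."""
    return {"query": {"term": {field: value}}}


print(search_and_sort("customer_name", "acme"))
print(exact_filter("region", "EMEA"))
```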

Numeric Fields

Integer:

{
"type": "long"
}

Decimal:

{
"type": "double"
}

Date Fields

{
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
}
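
With this format, a date value can be sent either as an ISO 8601 string or as milliseconds since the Unix epoch. A minimal sketch of producing both from a timezone-aware `datetime` (the helper names are illustrative):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_iso(dt: datetime) -> str:
    """ISO 8601 string in UTC, matching strict_date_optional_time."""
    utc = dt.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def to_epoch_millis(dt: datetime) -> int:
    """Whole milliseconds since 1970-01-01T00:00:00Z, matching epoch_millis."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


order_date = datetime(2024, 1, 15, tzinfo=timezone.utc)
print(to_iso(order_date))           # 2024-01-15T00:00:00.000Z
print(to_epoch_millis(order_date))  # 1705276800000
```

Both helpers expect timezone-aware values; a naive `datetime` would be interpreted in local time by `to_iso` and rejected by `to_epoch_millis`.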

Boolean Fields

{
"type": "boolean"
}

Object Fields

{
"type": "object",
"properties": {
"street": { "type": "text" },
"city": { "type": "keyword" },
"zip": { "type": "keyword" }
}
}

Nested Fields

{
"type": "nested",
"properties": {
"name": { "type": "keyword" },
"quantity": { "type": "long" }
}
}
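
Nested fields are queried with Elasticsearch's `nested` query, which evaluates all conditions against the same array element. With a plain object field, arrays are flattened and that per-element association is lost. The sketch below builds such a query body; the field path `items` is a hypothetical name for a field using the mapping above.

```python
from typing import Any, Dict


def nested_filter(path: str, conditions: Dict[str, Any]) -> Dict[str, Any]:
    """Match documents where ONE nested element satisfies every condition."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "bool": {
                        # Sub-fields are addressed by their full dotted path.
                        "filter": [
                            {"term": {f"{path}.{name}": value}}
                            for name, value in conditions.items()
                        ]
                    }
                },
            }
        }
    }


print(nested_filter("items", {"name": "widget", "quantity": 3}))
```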

Index Management Best Practices

Shard Configuration

  • Small datasets (<1GB): 1 shard
  • Medium datasets (1-10GB): 2-3 shards
  • Large datasets (>10GB): 5+ shards
  • Replicas: 1 for production, 0 for development
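
These guidelines can be encoded as a small lookup. The sketch below assumes 1 GB means 1024^3 bytes and treats exactly 10 GB as medium; neither boundary detail is specified above, and the function names are hypothetical.

```python
from typing import Optional, Tuple

GB = 1024 ** 3  # Assumption: binary gigabytes.


def recommended_shards(size_bytes: int) -> Tuple[int, Optional[int]]:
    """Return (min, max) primary shards; max is None when open-ended."""
    if size_bytes < GB:
        return (1, 1)
    if size_bytes <= 10 * GB:
        return (2, 3)
    return (5, None)


def recommended_replicas(production: bool) -> int:
    """One replica in production, none in development."""
    return 1 if production else 0


print(recommended_shards(45678901))
print(recommended_shards(25 * GB))
```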

Mapping Strategy

  • Plan ahead: Mapping changes require reindexing
  • Use keyword for exact match: IDs, categories, statuses
  • Use text for search: Names, descriptions, comments
  • Multi-field common strings: Enable both search and sorting
  • Set ignore_above: Prevent indexing of very long strings

Performance Optimization

  • Refresh interval: Increase for bulk imports
  • Disable replicas: During initial load
  • Use bulk API: For large data imports
  • Force merge: After bulk operations complete
  • Delete by query: Clean up data efficiently
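
At the Elasticsearch level, the first three tips correspond to index settings and the newline-delimited bulk format. The sketch below shows the standard settings bodies and a bulk body builder. Whether Informer exposes these settings directly is not stated here, so treat this as plain Elasticsearch usage; `bulk_body` is a hypothetical helper, and the restore values assume the default 1s refresh and one replica.

```python
import json
from typing import Any, Dict, Iterable

# Applied before a large import: no periodic refresh, no replicas.
BULK_LOAD_SETTINGS = {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}

# Applied after the import (assumed targets: default refresh, one replica).
RESTORE_SETTINGS = {"index": {"refresh_interval": "1s", "number_of_replicas": 1}}


def bulk_body(index: str, docs: Iterable[Dict[str, Any]]) -> str:
    """Build an NDJSON bulk body: one action line, then one source line per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""


print(bulk_body("i5_abc123_sales_2024", [{"order_id": "A1", "amount": 10.5}]), end="")
```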

Monitoring

  • Watch cluster health: Prevent yellow/red status
  • Monitor shard sizes: Keep under 50GB per shard
  • Track query performance: Optimize slow queries
  • Review index stats: Identify growth patterns
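
The shard size guideline can be checked from values the status and size endpoints already return. The sketch below divides the primary size by the shard count, which gives an average only, since individual shards may be larger; it also assumes 50 GB means 50 * 1024^3 bytes.

```python
SHARD_LIMIT_BYTES = 50 * 1024 ** 3  # Assumption: binary gigabytes.


def average_primary_shard_bytes(primary_size: int, number_of_shards: int) -> float:
    """Average size of one primary shard in bytes."""
    if number_of_shards <= 0:
        raise ValueError("number_of_shards must be positive")
    return primary_size / number_of_shards


def shard_size_ok(primary_size: int, number_of_shards: int) -> bool:
    """True when the average primary shard stays under the 50 GB guideline."""
    return average_primary_shard_bytes(primary_size, number_of_shards) < SHARD_LIMIT_BYTES


# primarySize from the size endpoint, numberOfShards from the status endpoint.
print(average_primary_shard_bytes(22839450, 5))
print(shard_size_ok(22839450, 5))
```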

Index Lifecycle

Creation

  1. Dataset created
  2. Elasticsearch index created with default mapping
  3. Fields indexed based on first data batch
  4. Mapping dynamically updated as needed

Updates

  1. Field definitions updated
  2. Mapping updated via field sync
  3. Data reindexed if types change
  4. Old mapping preserved for existing docs

Deletion

  1. Dataset deleted
  2. Elasticsearch index deleted
  3. All documents removed
  4. Mapping removed

Vacuum

  1. Orphaned indices identified
  2. Admin reviews index list
  3. Batch deletion performed
  4. Disk space reclaimed

Automatic Cleanup

Informer includes a vacuum job that identifies and removes orphaned Elasticsearch indices. Configure the schedule via the admin vacuum endpoint.
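
Conceptually, identifying orphans is a set difference between the indices that exist and the indices that datasets still reference. The sketch below illustrates the idea only and is not Informer's actual vacuum logic; the prefix `i5_abc123_` is taken from the sample index names and is assumed to scope the search.

```python
from typing import Iterable, List


def find_orphans(existing: Iterable[str], referenced: Iterable[str], prefix: str) -> List[str]:
    """Indices under `prefix` that no dataset references any more."""
    live = set(referenced)
    return sorted(
        name for name in existing
        if name.startswith(prefix) and name not in live
    )


existing = [
    "i5_abc123_sales_2024",
    "i5_abc123_old_dataset_1",
    "i5_abc123_temp_import",
    "other_system_index",
]
referenced = ["i5_abc123_sales_2024"]

print(find_orphans(existing, referenced, prefix="i5_abc123_"))
```

Restricting candidates to a known prefix keeps indices owned by other systems out of the deletion list.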