Mapping & Index Management

Elasticsearch index and mapping operations for dataset storage.

GET /api/datasets/{id}/mapping

Get the Elasticsearch mapping for the dataset index.

Authentication: Required

Path Parameters:

Parameter	Type	Description
`id`	string	Dataset ID or slug

Response:

{
  "i5_abc123_sales_2024": {
    "mappings": {
      "properties": {
        "order_id": {
          "type": "keyword"
        },
        "customer_name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "amount": {
          "type": "double"
        },
        "order_date": {
          "type": "date"
        },
        "region": {
          "type": "keyword"
        }
      }
    }
  }
}

Mapping Properties:

type - Elasticsearch field type
fields - Multi-field definitions
analyzer - Text analysis configuration
format - Date format patterns
ignore_above - Maximum string length to index; longer values are not indexed

Use Case:

Inspect the Elasticsearch mapping to understand how data is indexed and queried. Useful for debugging search issues and optimizing queries.
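
Example (sketch):

When debugging, a flat list of field paths and types is often easier to scan than the nested mapping. The following is a minimal sketch, not part of the API: it assumes the response has the shape shown above, and the helper names `flatten_field_types` and `field_types_from_response` are hypothetical.

```python
from typing import Any, Dict


def flatten_field_types(properties: Dict[str, Any], prefix: str = "") -> Dict[str, str]:
    """Return {dotted field path: Elasticsearch type} for a mapping 'properties' object."""
    fields: Dict[str, str] = {}
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        # Object fields may omit "type"; Elasticsearch treats them as "object".
        fields[path] = spec.get("type", "object")
        # Multi-fields (for example customer_name.keyword) live under "fields".
        for sub_name, sub_spec in spec.get("fields", {}).items():
            fields[f"{path}.{sub_name}"] = sub_spec["type"]
        # Object and nested fields carry their own "properties".
        if "properties" in spec:
            fields.update(flatten_field_types(spec["properties"], prefix=f"{path}."))
    return fields


def field_types_from_response(response: Dict[str, Any]) -> Dict[str, str]:
    """Flatten the first (normally only) index entry of a mapping response."""
    index_body = next(iter(response.values()))
    return flatten_field_types(index_body["mappings"].get("properties", {}))


# Trimmed copy of the sample response above.
sample = {
    "i5_abc123_sales_2024": {
        "mappings": {
            "properties": {
                "order_id": {"type": "keyword"},
                "customer_name": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                },
                "amount": {"type": "double"},
            }
        }
    }
}

field_types = field_types_from_response(sample)
for path, es_type in field_types.items():
    print(f"{path}: {es_type}")
```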

GET /api/datasets/{id}/index/status

Get Elasticsearch index status and health.

Authentication: Required

Path Parameters:

Parameter	Type	Description
`id`	string	Dataset ID or slug

Response:

{
  "index": "i5_abc123_sales_2024",
  "health": "green",
  "status": "open",
  "numberOfShards": 5,
  "numberOfReplicas": 1,
  "docsCount": 125000,
  "docsDeleted": 50,
  "storeSize": "45.7mb",
  "primaryStoreSize": "22.8mb"
}

Health Status:

Status	Description
`green`	All shards allocated, fully operational
`yellow`	Primary shards allocated, some replicas missing
`red`	At least one primary shard unallocated, some data unavailable
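
Example (sketch):

A client can turn the status response into a few yes/no checks. This is a minimal sketch under stated assumptions: the field names come from the sample response above, `docsCount` is assumed to exclude deleted documents, and `summarize_index_status` is a hypothetical helper.

```python
from typing import Any, Dict


def summarize_index_status(status: Dict[str, Any]) -> Dict[str, Any]:
    """Derive simple checks from an index status response."""
    health = status["health"]
    live = status["docsCount"]
    deleted = status["docsDeleted"]
    total = live + deleted
    return {
        # green and yellow both mean every primary shard is allocated.
        "all_primaries_available": health in ("green", "yellow"),
        # Only green means every replica is allocated as well.
        "fully_replicated": health == "green",
        # Share of documents marked deleted but not yet merged away.
        "deleted_ratio": deleted / total if total else 0.0,
    }


sample_status = {
    "index": "i5_abc123_sales_2024",
    "health": "green",
    "status": "open",
    "numberOfShards": 5,
    "numberOfReplicas": 1,
    "docsCount": 125000,
    "docsDeleted": 50,
}

summary = summarize_index_status(sample_status)
```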

GET /api/datasets/{id}/index/count

Get the document count in the dataset index.

Authentication: Required

Path Parameters:

Parameter	Type	Description
`id`	string	Dataset ID or slug

Response:

{
  "count": 125000
}

GET /api/datasets/{id}/index/size

Get the index size in bytes.

Authentication: Required

Path Parameters:

Parameter	Type	Description
`id`	string	Dataset ID or slug

Response:

{
  "size": 45678901,
  "sizeFormatted": "43.5 MB",
  "primarySize": 22839450,
  "primarySizeFormatted": "21.8 MB"
}

Size Components:

size - Total index size including replicas
primarySize - Size of primary shards only
sizeFormatted - Human-readable size
primarySizeFormatted - Human-readable primary size
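
Example (sketch):

Because `size` includes replicas and `primarySize` does not, the difference gives the space used by replica copies. The sketch below also estimates average bytes per document by combining `primarySize` with the count endpoint; both helper names are hypothetical.

```python
from typing import Any, Dict


def replica_bytes(size_response: Dict[str, Any]) -> int:
    """Bytes used by replica shards: total size minus primary size."""
    return size_response["size"] - size_response["primarySize"]


def average_doc_bytes(size_response: Dict[str, Any], doc_count: int) -> float:
    """Approximate on-disk bytes per document, based on primary shards only."""
    if doc_count <= 0:
        return 0.0
    return size_response["primarySize"] / doc_count


sample_size = {"size": 45678901, "primarySize": 22839450}

print(replica_bytes(sample_size))
print(round(average_doc_bytes(sample_size, 125000), 1))
```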

DELETE /api/es-index/{id}

Delete a single Elasticsearch index by name (admin operation).

Authentication: Required

Permission: tenant:superuser

Path Parameters:

Parameter	Type	Description
`id`	string	Elasticsearch index name

Response:

{
  "acknowledged": true,
  "index": "i5_abc123_old_dataset"
}

Administrative Operation

This endpoint bypasses dataset-level permissions and directly deletes Elasticsearch indices. Use with extreme caution. Deleted indices cannot be recovered.

POST /api/vacuum-es-indices

Delete multiple Elasticsearch indices in batch (admin operation).

Authentication: Required

Permission: tenant:superuser

Request Body:

Field	Type	Required	Description
`indices`	array	Yes	Array of index names to delete

Example Request:

{
  "indices": [
    "i5_abc123_old_dataset_1",
    "i5_abc123_old_dataset_2",
    "i5_abc123_temp_import"
  ]
}

Response:

{
  "deleted": 3,
  "indices": [
    "i5_abc123_old_dataset_1",
    "i5_abc123_old_dataset_2",
    "i5_abc123_temp_import"
  ]
}

Use Case:

Batch delete orphaned or temporary indices during cleanup or vacuum operations.

Batch Deletion

This operation permanently deletes multiple indices. Verify the index list before executing.
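
Example (sketch):

One way to verify the index list before executing is to build the request body through a guard that removes duplicates and refuses names known to be in use, then compare the response against the request. This is a sketch, not Informer functionality: `build_vacuum_request`, `missing_from_response`, and the `protected` list are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def build_vacuum_request(indices: Iterable[str], protected: Iterable[str] = ()) -> Dict[str, List[str]]:
    """Build a request body, dropping blanks and duplicates and rejecting protected names."""
    protected_set = set(protected)
    seen = set()
    cleaned: List[str] = []
    for raw in indices:
        name = raw.strip()
        if not name or name in seen:
            continue
        if name in protected_set:
            raise ValueError(f"refusing to delete protected index: {name}")
        seen.add(name)
        cleaned.append(name)
    if not cleaned:
        raise ValueError("no indices to delete")
    return {"indices": cleaned}


def missing_from_response(request_body: Dict[str, List[str]], response: Dict[str, Any]) -> List[str]:
    """Return requested index names that the response does not report as deleted."""
    deleted = set(response.get("indices", []))
    return [name for name in request_body["indices"] if name not in deleted]


body = build_vacuum_request(
    ["i5_abc123_old_dataset_1", "i5_abc123_old_dataset_2", "i5_abc123_old_dataset_1"],
    protected=["i5_abc123_sales_2024"],
)
```

Comparing the `deleted` count in the response with `len(body["indices"])` gives a quick second check.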

Elasticsearch Mapping Types

String Fields

Text (full-text search):

{
  "type": "text",
  "analyzer": "standard",
  "search_analyzer": "standard"
}

Keyword (exact match, sorting, aggregations):

{
  "type": "keyword",
  "ignore_above": 256
}

Multi-field (both text and keyword):

{
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}
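
Example (sketch):

With a multi-field mapping, the parent field serves analyzed full-text queries and the `.keyword` sub-field serves exact matches and sorting. The sketch below builds standard Elasticsearch query DSL bodies as Python dictionaries. The function names are hypothetical, and how a query body reaches Elasticsearch is outside the scope of this page.

```python
from typing import Any, Dict


def full_text_search(field: str, text: str) -> Dict[str, Any]:
    """Analyzed search against the text field (matches individual terms)."""
    return {"query": {"match": {field: text}}}


def exact_match_sorted(field: str, value: str) -> Dict[str, Any]:
    """Exact match and sort using the keyword sub-field of a multi-field."""
    keyword_field = f"{field}.keyword"
    return {
        "query": {"term": {keyword_field: value}},
        "sort": [{keyword_field: "asc"}],
    }


search_body = full_text_search("customer_name", "acme")
exact_body = exact_match_sorted("customer_name", "Acme Corp")
```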

Numeric Fields

Integer:

{
  "type": "long"
}

Decimal:

{
  "type": "double"
}

Date Fields

{
  "type": "date",
  "format": "strict_date_optional_time||epoch_millis"
}

Boolean Fields

{
  "type": "boolean"
}

Object Fields

{
  "type": "object",
  "properties": {
    "street": { "type": "text" },
    "city": { "type": "keyword" },
    "zip": { "type": "keyword" }
  }
}

Nested Fields

{
  "type": "nested",
  "properties": {
    "name": { "type": "keyword" },
    "quantity": { "type": "long" }
  }
}
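
Example (sketch):

Nested fields are queried with the Elasticsearch `nested` query so that conditions apply to the same array element rather than across elements. In this sketch the field path `items` and the function name are assumptions; the sub-field names `name` and `quantity` come from the mapping above.

```python
from typing import Any, Dict


def nested_item_query(path: str, name: str, min_quantity: int) -> Dict[str, Any]:
    """Match documents with one nested element that has this name AND at least min_quantity."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "bool": {
                        "filter": [
                            {"term": {f"{path}.name": name}},
                            {"range": {f"{path}.quantity": {"gte": min_quantity}}},
                        ]
                    }
                },
            }
        }
    }


# "items" is a hypothetical field mapped with the nested type shown above.
nested_body = nested_item_query("items", "widget", 10)
```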

Index Management Best Practices

Shard Configuration

Small datasets (<1GB): 1 shard
Medium datasets (1-10GB): 2-3 shards
Large datasets (>10GB): 5+ shards
Replicas: 1 for production, 0 for development
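
Example (sketch):

The size tiers above can be expressed as a small lookup. This sketch assumes 1 GB means 1024^3 bytes and places a dataset of exactly 10 GB in the medium tier; `recommended_primary_shards` is a hypothetical helper that returns a (minimum, maximum) pair, with `None` meaning no upper bound.

```python
from typing import Optional, Tuple

GIB = 1024 ** 3  # assumption: the guideline's "GB" is treated as 1024^3 bytes


def recommended_primary_shards(size_bytes: int) -> Tuple[int, Optional[int]]:
    """Map a dataset size to the shard range suggested by the guideline above."""
    if size_bytes < 1 * GIB:
        return (1, 1)      # small: 1 shard
    if size_bytes <= 10 * GIB:
        return (2, 3)      # medium: 2-3 shards
    return (5, None)       # large: 5 or more shards


def recommended_replicas(production: bool) -> int:
    """1 replica for production, 0 for development."""
    return 1 if production else 0
```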

Mapping Strategy

Plan ahead: Changing the type of an existing field requires reindexing
Use keyword for exact match: IDs, categories, statuses
Use text for search: Names, descriptions, comments
Multi-field common strings: Enable both search and sorting
Set ignore_above: Skip indexing of keyword values longer than the limit
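
Example (sketch):

A quick audit of a mapping can list text fields that lack a keyword sub-field and therefore cannot be used for sorting or aggregations as mapped. This is a sketch; the helper name and the `notes` and `address` fields in the sample are hypothetical.

```python
from typing import Any, Dict, List


def text_fields_without_keyword(properties: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return dotted paths of text fields that have no keyword sub-field."""
    missing: List[str] = []
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if spec.get("type") == "text":
            sub_types = {sub.get("type") for sub in spec.get("fields", {}).values()}
            if "keyword" not in sub_types:
                missing.append(path)
        # Recurse into object and nested fields.
        if "properties" in spec:
            missing.extend(text_fields_without_keyword(spec["properties"], prefix=f"{path}."))
    return missing


sample_properties = {
    "customer_name": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    },
    "notes": {"type": "text"},
    "address": {
        "type": "object",
        "properties": {"street": {"type": "text"}, "city": {"type": "keyword"}},
    },
}

unsortable = text_fields_without_keyword(sample_properties)
```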

Performance Optimization

Refresh interval: Increase for bulk imports
Disable replicas: During initial load
Use bulk API: For large data imports
Force merge: After bulk operations complete
Delete by query: Remove matching documents without dropping the index
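
Example (sketch):

For deployments with direct access to Elasticsearch, the first three tips correspond to an index settings update before the load, a newline-delimited bulk body, and a settings update afterwards. The sketch below only builds those request bodies. The function names and default values are assumptions, and this page does not state whether Informer exposes these settings itself.

```python
import json
from typing import Any, Dict, Iterable


def bulk_load_settings(refresh_interval: str = "30s") -> Dict[str, Any]:
    """Settings body for the load: slower refresh, no replicas."""
    return {"index": {"refresh_interval": refresh_interval, "number_of_replicas": 0}}


def restore_settings(refresh_interval: str = "1s", replicas: int = 1) -> Dict[str, Any]:
    """Settings body to apply once the load has finished."""
    return {"index": {"refresh_interval": refresh_interval, "number_of_replicas": replicas}}


def bulk_body(index: str, docs: Iterable[Dict[str, Any]]) -> str:
    """Build a bulk API body: one action line plus one source line per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


payload = bulk_body(
    "i5_abc123_sales_2024",
    [{"order_id": "A-1", "amount": 10.5}, {"order_id": "A-2", "amount": 99.0}],
)
```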

Monitoring

Watch cluster health: Investigate yellow or red status promptly
Monitor shard sizes: Keep under 50GB per shard
Track query performance: Optimize slow queries
Review index stats: Identify growth patterns
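
Example (sketch):

The shard size guideline can be checked from values the status and size endpoints already return. This sketch divides the primary size evenly across shards, which is only an approximation because real shards are rarely identical in size. The helper names are hypothetical, and 50 GB is treated as 50 * 1024^3 bytes.

```python
MAX_SHARD_BYTES = 50 * 1024 ** 3  # assumption: 50 GB interpreted as 50 * 1024^3 bytes


def average_primary_shard_bytes(primary_size: int, number_of_shards: int) -> float:
    """Approximate size of each primary shard, assuming an even distribution."""
    if number_of_shards <= 0:
        raise ValueError("number_of_shards must be positive")
    return primary_size / number_of_shards


def shards_within_limit(primary_size: int, number_of_shards: int) -> bool:
    """True when the average primary shard is under the 50 GB guideline."""
    return average_primary_shard_bytes(primary_size, number_of_shards) < MAX_SHARD_BYTES


# Values taken from the sample size and status responses.
print(average_primary_shard_bytes(22839450, 5))
print(shards_within_limit(22839450, 5))
```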

Index Lifecycle

Creation

Dataset created
Elasticsearch index created with default mapping
Fields indexed based on first data batch
Mapping dynamically updated as needed

Updates

Field definitions updated
Mapping updated via field sync
Data reindexed if types change
Old mapping preserved for existing docs
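
Example (sketch):

Whether an update needs a reindex depends on whether any existing field changes type; new fields alone do not. The sketch below compares the top-level `properties` of two mappings. It is an illustration of that rule, not Informer's field sync logic, and it does not descend into nested properties.

```python
from typing import Any, Dict, List


def diff_mappings(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, List[str]]:
    """Compare two 'properties' objects by field name and type (top level only)."""
    added = sorted(name for name in new if name not in old)
    removed = sorted(name for name in old if name not in new)
    retyped = sorted(
        name
        for name in new
        if name in old
        and old[name].get("type", "object") != new[name].get("type", "object")
    )
    return {"added": added, "removed": removed, "retyped": retyped}


def needs_reindex(diff: Dict[str, List[str]]) -> bool:
    """A type change on an existing field cannot be applied in place."""
    return bool(diff["retyped"])


old_props = {"amount": {"type": "long"}, "region": {"type": "keyword"}}
new_props = {
    "amount": {"type": "double"},
    "region": {"type": "keyword"},
    "order_date": {"type": "date"},
}

changes = diff_mappings(old_props, new_props)
```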

Deletion

Dataset deleted
Elasticsearch index deleted
All documents removed
Mapping removed

Vacuum

Orphaned indices identified
Admin reviews index list
Batch deletion performed
Disk space reclaimed
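
Example (sketch):

The first vacuum step amounts to a set difference between the indices that exist and the indices that datasets still reference. In this sketch the `i5_` prefix is inferred from the sample index names on this page, and both input lists are assumed to be gathered by other means; `find_orphaned_indices` is a hypothetical helper.

```python
from typing import Iterable, List


def find_orphaned_indices(
    existing_indices: Iterable[str],
    dataset_indices: Iterable[str],
    prefix: str = "i5_",  # assumption: prefix inferred from the sample index names
) -> List[str]:
    """Return prefixed indices that no dataset references, sorted for review."""
    known = set(dataset_indices)
    return sorted(
        name
        for name in existing_indices
        if name.startswith(prefix) and name not in known
    )


candidates = find_orphaned_indices(
    existing_indices=[
        "i5_abc123_sales_2024",
        "i5_abc123_temp_import",
        "i5_abc123_old_dataset_1",
        ".kibana",
    ],
    dataset_indices=["i5_abc123_sales_2024"],
)
```

The resulting list is intended for the admin review step, not for unattended deletion.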

Automatic Cleanup

Informer includes a vacuum job that identifies and removes orphaned Elasticsearch indices. Configure the schedule via the admin vacuum endpoint.
