Customize the Search Index

Under the hood, Canopy uses FlexSearch to power the search index. FlexSearch is a fast, memory-efficient full-text search library that is also highly customizable. Canopy provides several ways to customize the search index.

Note: This guide assumes you have a Canopy IIIF project. See the Create a Project guide to get started.

Use Case

Suppose you'd like to use Canopy IIIF to create a digital exhibit featuring Arabic manuscripts, such as the Arabic Manuscripts from West Africa collection provided by Northwestern University. The IIIF Manifest data contains both Arabic script and English text in its label and summary properties.

You'd like to customize the search configuration in the following three ways:

  • Support the "Arabic" character set in search (in addition to the default "Latin").
  • Include text from Manifest summary values in search results.
  • Include additional Manifest metadata in search results. Our example Manifests include "Contributor" and "Alternate Title" as metadata items, and we would like to surface these in search results.

Implementation

Add search configuration

Set up a Canopy IIIF project with the following configuration, including the search property with its default values.

config/canopy.json
{
  "collection": "https://api.dc.library.northwestern.edu/api/v2/collections/59ec43f9-a96c-4314-9b44-9923790b371c?as=iiif&size=100",
  "search": {
    "enabled": true,
    "flexSearch": {
      "bidirectional": false,
      "charset": "latin:extra",
      "document": {
        "index": [
          {
            "bidirectional": true,
            "depth": 3,
            "field": "label",
            "resolution": 9,
            "tokenize": "full"
          },
          {
            "field": "metadata",
            "resolution": 2
          },
          {
            "field": "summary",
            "resolution": 1
          }
        ]
      },
      "optimize": true,
      "tokenize": "strict"
    },
    "index": {
      "metadata": {
        "all": false,
        "enabled": true
      },
      "summary": {
        "enabled": false
      }
    }
  }
}

Support additional language charsets

Edit config/canopy.json and add the additional language encoding, arabic:extra, to the search.flexSearch.charset property. Because the project now uses more than one language encoding, the value must be an array of strings rather than a single string.

config/canopy.json
{
  "collection": "https://api.dc.library.northwestern.edu/api/v2/collections/59ec43f9-a96c-4314-9b44-9923790b371c?as=iiif&size=100",
  "search": {
    "enabled": true,
    "flexSearch": {
      "bidirectional": false,
      "charset": ["latin:extra", "arabic:extra"],
      "document": {
        "index": [
          {
            "bidirectional": true,
            "depth": 3,
            "field": "label",
            "resolution": 9,
            "tokenize": "full"
          },
          {
            "field": "metadata",
            "resolution": 2
          },
          {
            "field": "summary",
            "resolution": 1
          }
        ]
      },
      "optimize": true,
      "tokenize": "strict"
    },
    "index": {
      "metadata": {
        "all": false,
        "enabled": true
      },
      "summary": {
        "enabled": false
      }
    }
  }
}
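To decide which encodings a collection needs, it can help to check which scripts actually occur in its labels. The following is a minimal sketch, not part of Canopy: suggestCharsets is a hypothetical helper, and the Unicode ranges it uses are rough approximations (the basic Arabic block, and ASCII letters plus the Latin-1 and Latin Extended ranges). The charset names are the ones used in this guide.

```typescript
// Hypothetical helper (not part of Canopy or FlexSearch): suggest charset
// entries based on which scripts occur in a list of sample strings.
function suggestCharsets(texts: string[]): string[] {
  let hasLatin = false;
  let hasArabic = false;
  for (const text of texts) {
    // Approximation: ASCII letters plus Latin-1 Supplement and Latin Extended-A/B.
    if (/[A-Za-z\u00C0-\u024F]/.test(text)) hasLatin = true;
    // Approximation: the basic Arabic Unicode block only.
    if (/[\u0600-\u06FF]/.test(text)) hasArabic = true;
  }
  const charsets: string[] = [];
  if (hasLatin) charsets.push("latin:extra");
  if (hasArabic) charsets.push("arabic:extra");
  return charsets;
}

// Sample strings taken from the example Manifest used in this guide.
const sampleTexts: string[] = [
  "محبة مع خاتم النبي يوسف.",
  "Translated title: Love fāʼidah with the amulet of Prophet Yūsuf",
];

console.log(suggestCharsets(sampleTexts));
```

For the two sample strings the helper suggests both encodings, which matches the array configured above. A collection with only English labels would yield a single entry.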

Include summary in search results

The default search configuration indexes only Manifest label and metadata values.

To include Manifest summary values in the search index, set search.index.summary.enabled to true.

config/canopy.json
{
  "collection": "https://api.dc.library.northwestern.edu/api/v2/collections/59ec43f9-a96c-4314-9b44-9923790b371c?as=iiif&size=100",
  "search": {
    "enabled": true,
    "flexSearch": {
      "bidirectional": false,
      "charset": ["latin:extra", "arabic:extra"],
      "document": {
        "index": [
          {
            "bidirectional": true,
            "depth": 3,
            "field": "label",
            "resolution": 9,
            "tokenize": "full"
          },
          {
            "field": "metadata",
            "resolution": 2
          },
          {
            "field": "summary",
            "resolution": 1
          }
        ]
      },
      "optimize": true,
      "tokenize": "strict"
    },
    "index": {
      "metadata": {
        "all": false,
        "enabled": true
      },
      "summary": {
        "enabled": true
      }
    }
  }
}

Curate metadata labels for indexing

Implementers may choose to index all, part, or none of the metadata in Manifests. By default, Canopy IIIF indexes only values whose labels are listed in the metadata property of the config/canopy.json file.

Our source IIIF Collection has Manifests with specific metadata content to index, and we want to limit indexing to the Date, Subject, Contributor, and Alternate Title labels. In this example Manifest, the Alternate Title value "Translated title: Love fāʼidah with the amulet of Prophet Yūsuf" and the Contributor value "Falke, ʻUmar, 1893-1962 (Collector)" would be included in the index.

https://api.dc.library.northwestern.edu/api/v2/works/2ca1b09b-cbad-43dd-82bf-a7fa807269d8?as=iiif
{
  "@context": "http://iiif.io/api/presentation/3/context.json",
  "id": "https://api.dc.library.northwestern.edu/api/v2/works/2ca1b09b-cbad-43dd-82bf-a7fa807269d8?as=iiif",
  "type": "Manifest",
  "label": {
    "none": [
      "محبة مع خاتم النبي يوسف."
    ]
  },
  "metadata": [
    {
      "label": {
        "none": [
          "Alternate Title"
        ]
      },
      "value": {
        "none": [
          "Translated title: Love fāʼidah with the amulet of Prophet Yūsuf"
        ]
      }
    },
    {
      "label": {
        "none": [
          "Contributor"
        ]
      },
      "value": {
        "none": [
          "Falke, ʻUmar, 1893-1962 (Collector)"
        ]
      }
    },
    ...
  ],
  ...
}

Update the config/canopy.json file to include these labels in the metadata property.

config/canopy.json
{
  "collection": "https://api.dc.library.northwestern.edu/api/v2/collections/59ec43f9-a96c-4314-9b44-9923790b371c?as=iiif&size=100",
  "metadata": ["Date", "Subject", "Contributor", "Alternate Title"],
  "search": {
    "enabled": true,
    "flexSearch": {
      "bidirectional": false,
      "charset": ["latin:extra", "arabic:extra"],
      "document": {
        "index": [
          {
            "bidirectional": true,
            "depth": 3,
            "field": "label",
            "resolution": 9,
            "tokenize": "full"
          },
          {
            "field": "metadata",
            "resolution": 2
          },
          {
            "field": "summary",
            "resolution": 1
          }
        ]
      },
      "optimize": true,
      "tokenize": "strict"
    },
    "index": {
      "metadata": {
        "all": false,
        "enabled": true
      },
      "summary": {
        "enabled": true
      }
    }
  }
}
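The effect of the label list and the search.index.metadata flags can be pictured with a small sketch. This is not Canopy's implementation: selectMetadataValues and the type names are hypothetical, and the sketch assumes that enabled set to false indexes nothing, all set to true indexes every entry, and otherwise only entries whose label appears in the metadata list are kept. The third sample entry is invented to show a label being filtered out.

```typescript
// Minimal shapes for the parts of a IIIF Presentation 3 Manifest used here.
type LanguageMap = Record<string, string[]>;

interface MetadataEntry {
  label: LanguageMap;
  value: LanguageMap;
}

// Mirrors the search.index.metadata object in config/canopy.json.
interface MetadataIndexOptions {
  enabled: boolean;
  all: boolean;
}

// Collect every string in a language map such as {"none": ["Contributor"]}.
function flattenLanguageMap(map: LanguageMap): string[] {
  const out: string[] = [];
  for (const lang of Object.keys(map)) {
    for (const text of map[lang] ?? []) {
      out.push(text);
    }
  }
  return out;
}

// Hypothetical helper (not Canopy's code): choose which metadata values to index.
// Assumption: enabled=false keeps nothing, all=true keeps every entry,
// otherwise only entries whose label appears in `labels` are kept.
function selectMetadataValues(
  metadata: MetadataEntry[],
  labels: string[],
  options: MetadataIndexOptions
): string[] {
  if (!options.enabled) return [];
  const values: string[] = [];
  for (const entry of metadata) {
    const entryLabels = flattenLanguageMap(entry.label);
    const wanted =
      options.all || entryLabels.some((l) => labels.indexOf(l) !== -1);
    if (wanted) {
      for (const v of flattenLanguageMap(entry.value)) values.push(v);
    }
  }
  return values;
}

const sampleMetadata: MetadataEntry[] = [
  {
    label: { none: ["Alternate Title"] },
    value: { none: ["Translated title: Love fāʼidah with the amulet of Prophet Yūsuf"] },
  },
  {
    label: { none: ["Contributor"] },
    value: { none: ["Falke, ʻUmar, 1893-1962 (Collector)"] },
  },
  // Hypothetical entry, added only to show a label that is filtered out.
  { label: { none: ["Example Unlisted Label"] }, value: { none: ["example value"] } },
];

const configuredLabels = ["Date", "Subject", "Contributor", "Alternate Title"];
const indexed = selectMetadataValues(sampleMetadata, configuredLabels, {
  enabled: true,
  all: false,
});
console.log(indexed);
```

With the configuration above, only the Alternate Title and Contributor values are selected. Under the stated assumption, switching all to true would also select the unlisted entry.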

Tip: To confirm text is being indexed for search, open the file .canopy/index.json and verify your custom data is being added to the index.
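If the index file is large, a short script can do this check for you. The sketch below makes no assumption about the structure of .canopy/index.json: containsText is a hypothetical helper that walks any parsed JSON value and looks for a substring. The sampleIndex object is an invented stand-in, and the real file's shape may differ.

```typescript
// Recursively check whether any string inside a parsed JSON value contains `needle`.
// It makes no assumption about how the index file is structured.
function containsText(value: unknown, needle: string): boolean {
  if (typeof value === "string") return value.indexOf(needle) !== -1;
  if (Array.isArray(value)) {
    return value.some((item: unknown) => containsText(item, needle));
  }
  if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    return Object.keys(record).some((key) => containsText(record[key], needle));
  }
  return false;
}

// Hypothetical stand-in for parsed index data; the real file's shape may differ.
// In a project you could load the real file with Node instead, for example:
//   JSON.parse(fs.readFileSync(".canopy/index.json", "utf8"))
const sampleIndex: unknown = [
  {
    label: "محبة مع خاتم النبي يوسف.",
    metadata: ["Falke, ʻUmar, 1893-1962 (Collector)"],
  },
];

// Report whether each search term appears anywhere in the data.
for (const term of ["Falke", "Prophet Yūsuf", "يوسف"]) {
  console.log(term, containsText(sampleIndex, term) ? "found" : "missing");
}
```

A term reported as missing suggests that the corresponding field or label is not yet enabled in config/canopy.json, or that the index has not been rebuilt since the change.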

Validate search customizations

Verify your customizations are working by searching for:

  • An Arabic phrase (e.g. "مجموع الفوائد.")
  • A Manifest summary value (e.g. "Fāʼidah of Prophet Yūsuf on gaining people's love and respect.")
  • A Manifest metadata value (e.g. "Falke", or "Prophet Yūsuf")
