Uncle Joe's (喬叔) Elastic Stack Professional Training

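Engine Search: Advanced Analysis

Introduction

Earlier chapters showed how to capture the Search Request that App Search sends to Elasticsearch when an Engine search runs. Combined with the custom Analysis and Mapping configurations covered earlier in this series, that let us dissect how App Search executes a search. This article goes a step further and examines how the Search Request changes when we use other App Search features, namely Synonyms, Curations, and Relevance Tuning.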

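Prerequisites

  • Basic knowledge of the Elasticsearch Query DSL.

  • Please read the earlier articles in this series first.

Key Takeaways

  • How App Search uses Elasticsearch under the hood when its Synonyms, Curations, and Relevance Tuning features are in use.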


Preparing Test Data

First, we add a few documents to support the use cases that follow. Import these three documents directly with the App Search Data Importer.

[
  {
    "id": "park_rocky-mountain",
    "title": "Rocky Mountain",
    "description": "Bisected north to south by the Continental Divide, this portion of the Rockies has ecosystems varying from over 150 riparian lakes to montane and subalpine forests to treeless alpine tundra. Wildlife including mule deer, bighorn sheep, black bears, and cougars inhabit its igneous mountains and glacial valleys. Longs Peak, a classic Colorado fourteener, and the scenic Bear Lake are popular destinations, as well as the historic Trail Ridge Road, which reaches an elevation of more than 12,000 feet (3,700 m).",
    "visitors": 4517585,
    "location": "40.4,-105.58",
    "date": "1915-01-26T06:00:00Z"
  },
  {
    "id": "yangming-mountain",
    "title": "Yangming Mountain",
    "description": "Yangmingshan National Park is one of the nine national parks in Taiwan, located between Taipei and New Taipei City. The districts that house parts of the park grounds include Taipei's Beitou and Shilin Districts; and New Taipei's Wanli, Jinshan and Sanzhi Districts. The national park is known for its cherry blossoms, hot springs, sulfur deposits, fumaroles, venomous snakes, and hiking trails, including Taiwan's tallest dormant volcano, Qixing (Seven Star) Mountain (1,120 m).",
    "visitors": 123123,
    "location": "25.17,121.56",
    "date": "1985-09-15T16:00:00Z"
  },
  {
    "id": "Himalaya-mountain",
    "title": "Himalaya",
    "description": "The Himalayas, is a mountain range in Asia separating the plains of the Indian subcontinent from the Tibetan Plateau. The range has many of Earth's highest peaks, including the highest, Mount Everest, at the border between Nepal and China. The Himalayas include over fifty mountains exceeding 7,200 m (23,600 ft) in elevation, including ten of the fourteen 8,000-metre peaks. By contrast, the highest peak outside Asia (Aconcagua, in the Andes) is 6,961 m (22,838 ft) tall.",
    "visitors": 52700000,
    "location": "27.59,86.55",
    "date": "1900-01-01T00:00:00Z"
  }
]

Synonym Queries

To see how synonym queries are executed, we first create a synonym set. In this example, rocky and yangming are configured as synonyms.

Next, in the Query Tester, we run a query with rocky as the keyword.

Then we check the slowlog to see the Search Request that Elasticsearch actually received.

The formatted Search Request payload is:

{
  "from": 0,
  "size": 10,
  "timeout": "2000ms",
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "must": [
              {
                "bool": {
                  "should": [
                    {
                      "multi_match": {
                        "query": "rocky",
                        "fields": [
                          "description$string^1.0",
                          "description$string.delimiter^0.4",
                          "description$string.joined^0.75",
                          "description$string.prefix^0.1",
                          "description$string.stem^0.95",
                          "external_id^1.0",
                          "title$string^1.0",
                          "title$string.delimiter^0.4",
                          "title$string.joined^0.75",
                          "title$string.prefix^0.1",
                          "title$string.stem^0.95"
                        ],
                        "type": "cross_fields",
                        "operator": "OR",
                        "slop": 0,
                        "prefix_length": 0,
                        "max_expansions": 50,
                        "minimum_should_match": "1<-1 3<49%",
                        "zero_terms_query": "NONE",
                        "auto_generate_synonyms_phrase_query": true,
                        "fuzzy_transpositions": true,
                        "boost": 1
                      }
                    },
                    {
                      "constant_score": {
                        "filter": {
                          "multi_match": {
                            "query": "rocky",
                            "fields": [
                              "description$string.intragram^0.1",
                              "external_id.intragram^0.1",
                              "title$string.intragram^0.1"
                            ],
                            "type": "best_fields",
                            "operator": "OR",
                            "slop": 0,
                            "prefix_length": 0,
                            "max_expansions": 50,
                            "minimum_should_match": "35%",
                            "zero_terms_query": "NONE",
                            "auto_generate_synonyms_phrase_query": true,
                            "fuzzy_transpositions": true,
                            "boost": 1
                          }
                        },
                        "boost": 0.1
                      }
                    },
                    {
                      "bool": {
                        "should": [
                          {
                            "multi_match": {
                              "query": "rocky yangming",
                              "fields": [
                                "description$string^1.0",
                                "description$string.delimiter^0.4",
                                "description$string.joined^0.75",
                                "description$string.prefix^0.1",
                                "description$string.stem^0.95",
                                "external_id^1.0",
                                "title$string^1.0",
                                "title$string.delimiter^0.4",
                                "title$string.joined^0.75",
                                "title$string.prefix^0.1",
                                "title$string.stem^0.95"
                              ],
                              "type": "cross_fields",
                              "operator": "OR",
                              "slop": 0,
                              "prefix_length": 0,
                              "max_expansions": 50,
                              "zero_terms_query": "NONE",
                              "auto_generate_synonyms_phrase_query": true,
                              "fuzzy_transpositions": true,
                              "boost": 0.75
                            }
                          }
                        ],
                        "adjust_pure_negative": true,
                        "boost": 1
                      }
                    }
                  ],
                  "adjust_pure_negative": true,
                  "boost": 1
                }
              }
            ],
            "adjust_pure_negative": true,
            "boost": 1
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  },
  "_source": {
    "includes": [
      "date$date",
      "visitors$float",
      "description$string",
      "external_id",
      "location$location",
      "title$string",
      "engine_id"
    ],
    "excludes": []
  },
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    },
    {
      "_doc": {
        "order": "desc"
      }
    }
  ],
  "highlight": {
    "fragment_size": 300,
    "number_of_fragments": 1,
    "type": "plain",
    "highlight_query": {
      "multi_match": {
        "query": "rocky",
        "fields": [
          "description$string.prefix^1.0",
          "description$string.stem^1.0",
          "title$string.prefix^1.0",
          "title$string.stem^1.0"
        ],
        "type": "best_fields",
        "operator": "OR",
        "slop": 0,
        "prefix_length": 0,
        "max_expansions": 50,
        "zero_terms_query": "NONE",
        "auto_generate_synonyms_phrase_query": true,
        "fuzzy_transpositions": true,
        "boost": 1
      }
    },
    "order": "score",
    "require_field_match": false,
    "encoder": "html",
    "fields": {
      "description$string.stem": {
        "fragment_size": 100,
        "no_match_size": 100
      },
      "description$string.prefix": {
        "fragment_size": 100,
        "no_match_size": 100
      },
      "title$string.stem": {
        "fragment_size": 100,
        "no_match_size": 100
      },
      "title$string.prefix": {
        "fragment_size": 100,
        "no_match_size": 100
      }
    }
  }
}

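Compared with the Request generated by a basic App Search search, introduced earlier, this one contains one additional multi_match query, wrapped together with the original queries in the should clause of a bool query.

The keyword of this added query contains both rocky and yangming, which tells us one thing:

App Search handles Synonyms on the Application side rather than through Elasticsearch's built-in synonym dictionary mechanism. When the App Search Search API receives the keyword rocky, the Application first finds that rocky belongs to a Synonym definition, expands that synonym set into rocky yangming, and sends the expanded text to Elasticsearch as an extra query. That is what produces the added clause in the request above. Because it is a synonym match, this clause is given a lower boost of 0.75.

The benefit of this approach is flexibility. Synonyms are configured freely by users in the App Search admin UI, so handling them on the Application side means there is no separate synonym dictionary for Elasticsearch to reference and maintain. For the same reason, synonyms are applied at search time: the keyword is expanded against the synonym definitions when the search runs, and every term in the set is queried to find the documents that contain any of them. The alternative, applying synonyms at indexing time, would consult the synonym dictionary up front and write the related terms into the index.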
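The application-side expansion described above can be sketched roughly as follows. This is a minimal illustration, not App Search's actual implementation: `SYNONYM_SETS`, `expand_synonyms`, and `build_should_clauses` are hypothetical names, the field list is shortened, and the only values taken from the request above are the expanded text `rocky yangming` and the boost of 0.75.

```python
from typing import Dict, List, Optional, Set

# Hypothetical synonym sets, standing in for what an admin configures in App Search.
SYNONYM_SETS: List[Set[str]] = [{"rocky", "yangming"}]

# Shortened field list; the real request lists many more sub-fields.
FIELDS: List[str] = ["description$string^1.0", "title$string^1.0"]

SYNONYM_BOOST = 0.75  # value observed in the request above


def expand_synonyms(keyword: str, synonym_sets: List[Set[str]]) -> Optional[str]:
    """Return the keyword plus its synonyms, or None when no synonym set applies."""
    terms = keyword.lower().split()
    expanded = list(terms)
    for synonym_set in synonym_sets:
        if any(term in synonym_set for term in terms):
            # sorted() only makes the output order deterministic for this sketch.
            for synonym in sorted(synonym_set):
                if synonym not in expanded:
                    expanded.append(synonym)
    return " ".join(expanded) if len(expanded) > len(terms) else None


def build_should_clauses(keyword: str, synonym_sets: List[Set[str]]) -> List[Dict]:
    """Build the original clause and, if synonyms apply, one extra lower-boost clause."""
    clauses: List[Dict] = [
        {"multi_match": {"query": keyword, "fields": FIELDS,
                         "type": "cross_fields", "boost": 1.0}}
    ]
    expanded = expand_synonyms(keyword, synonym_sets)
    if expanded is not None:
        clauses.append({
            "bool": {"should": [
                {"multi_match": {"query": expanded, "fields": FIELDS,
                                 "type": "cross_fields", "boost": SYNONYM_BOOST}}
            ]}
        })
    return clauses


if __name__ == "__main__":
    print(expand_synonyms("rocky", SYNONYM_SETS))
    print(len(build_should_clauses("mountain", SYNONYM_SETS)))
```

Because the expansion happens before the request is built, changing a synonym set only changes the next query; nothing has to be reindexed.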

Curation

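How does Curation work under the hood?

We use the keyword mountain as an example and start from its original search results, in which Yangming Mountain has the lowest score.

We first create a new Curation in App Search for the keyword mountain.

Then we move Yangming Mountain, originally the lowest-scoring result, into the promoted documents.

When we search for mountain again, Yangming Mountain now has a Score of 1 and is ranked first.

Checking the slowlog to see what happened underneath, we now find 2 log entries.

It turns out that App Search splits a curated search into two parts:

  1. Fetch the curated items

  2. Query the remaining documents (excluding the curated items)

Let's look at each query in turn:

Fetching the Curated Items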

{
  "from": 0,
  "size": 1,
  "timeout": "2000ms",
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {
            "boost": 1
          }
        }
      ],
      "filter": [
        {
          "bool": {
            "must": [
              {
                "terms": {
                  "external_id": [
                    "yangming-mountain"
                  ],
                  "boost": 1
                }
              }
            ],
            "adjust_pure_negative": true,
            "boost": 1
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  },
  "_source": {
    "includes": [
      "date$date",
      "visitors$float",
      "description$string",
      "external_id",
      "location$location",
      "title$string",
      "engine_id"
    ],
    "excludes": []
  },
  "highlight": {
    "fragment_size": 300,
    "number_of_fragments": 1,
    "type": "plain",
    "highlight_query": {
      "multi_match": {
        "query": "Mountain",
        "fields": [
          "description$string.prefix^1.0",
          "description$string.stem^1.0",
          "title$string.prefix^1.0",
          "title$string.stem^1.0"
        ],
        "type": "best_fields",
        "operator": "OR",
        "slop": 0,
        "prefix_length": 0,
        "max_expansions": 50,
        "zero_terms_query": "NONE",
        "auto_generate_synonyms_phrase_query": true,
        "fuzzy_transpositions": true,
        "boost": 1
      }
    },
    "order": "score",
    "require_field_match": false,
    "encoder": "html",
    "fields": {
      "description$string.stem": {
        "fragment_size": 100,
        "no_match_size": 100
      },
      "description$string.prefix": {
        "fragment_size": 100,
        "no_match_size": 100
      },
      "title$string.stem": {
        "fragment_size": 100,
        "no_match_size": 100
      },
      "title$string.prefix": {
        "fragment_size": 100,
        "no_match_size": 100
      }
    }
  }
}

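Looking closely, the main difference in this request is the filter clause:

a terms query explicitly fetches the promoted document by its external_id. The must clause is a match_all, which gives every hit a score of 1, and a filter does not contribute to scoring, which is why the promoted document shows a Score of 1.

Although this merely fetches a single document, it is still executed as a Query. My guess is that one main purpose is to reuse the same Highlighting mechanism and to make it simple to combine the final results.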
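As a rough sketch, the curated-item request could be assembled like this. `build_curated_query` is a hypothetical helper, not an App Search function, and setting `size` to the number of promoted IDs is an assumption based on the single observed request (one promoted document, `size` of 1). The `_source` and `highlight` sections are omitted for brevity.

```python
import json
from typing import Dict, List


def build_curated_query(promoted_ids: List[str]) -> Dict:
    """Build a request that fetches only the promoted documents by external_id."""
    return {
        "from": 0,
        # Assumption: one slot per promoted document, as in the observed request.
        "size": len(promoted_ids),
        "query": {
            "bool": {
                # match_all scores each hit 1.0; the filter below adds nothing to the score.
                "must": [{"match_all": {"boost": 1}}],
                "filter": [
                    {"bool": {"must": [{"terms": {"external_id": promoted_ids}}]}}
                ],
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_curated_query(["yangming-mountain"]), indent=2))
```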

Querying the Remaining Documents (Excluding Curated Items)

The purpose of this part is to fetch the documents other than the promoted ones.

{
  "from": 0,
  "size": 9,
  "timeout": "2000ms",
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "must": [
              {
                "bool": {
                  "should": [
                    {
                      "multi_match": {
                        "query": "Mountain",
                        "fields": [
                          "description$string^1.0",
                          "description$string.delimiter^0.4",
                          "description$string.joined^0.75",
                          "description$string.prefix^0.1",
                          "description$string.stem^0.95",
                          "external_id^1.0",
                          "title$string^1.0",
                          "title$string.delimiter^0.4",
                          "title$string.joined^0.75",
                          "title$string.prefix^0.1",
                          "title$string.stem^0.95"
                        ],
                        "type": "cross_fields",
                        "operator": "OR",
                        "slop": 0,
                        "prefix_length": 0,
                        "max_expansions": 50,
                        "minimum_should_match": "1<-1 3<49%",
                        "zero_terms_query": "NONE",
                        "auto_generate_synonyms_phrase_query": true,
                        "fuzzy_transpositions": true,
                        "boost": 1
                      }
                    },
                    {
                      "constant_score": {
                        "filter": {
                          "multi_match": {
                            "query": "Mountain",
                            "fields": [
                              "description$string.intragram^0.1",
                              "external_id.intragram^0.1",
                              "title$string.intragram^0.1"
                            ],
                            "type": "best_fields",
                            "operator": "OR",
                            "slop": 0,
                            "prefix_length": 0,
                            "max_expansions": 50,
                            "minimum_should_match": "35%",
                            "zero_terms_query": "NONE",
                            "auto_generate_synonyms_phrase_query": true,
                            "fuzzy_transpositions": true,
                            "boost": 1
                          }
                        },
                        "boost": 0.1
                      }
                    }
                  ],
                  "adjust_pure_negative": true,
                  "boost": 1
                }
              }
            ],
            "adjust_pure_negative": true,
            "boost": 1
          }
        }
      ],
      "filter": [
        {
          "bool": {
            "must_not": [
              {
                "terms": {
                  "external_id": [
                    "yangming-mountain"
                  ],
                  "boost": 1
                }
              }
            ],
            "adjust_pure_negative": true,
            "boost": 1
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  },
  "_source": {
    "includes": [
      "date$date",
      "visitors$float",
      "description$string",
      "external_id",
      "location$location",
      "title$string",
      "engine_id"
    ],
    "excludes": []
  },
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    },
    {
      "_doc": {
        "order": "desc"
      }
    }
  ],
  "highlight": {
    "fragment_size": 300,
    "number_of_fragments": 1,
    "type": "plain",
    "highlight_query": {
      "multi_match": {
        "query": "Mountain",
        "fields": [
          "description$string.prefix^1.0",
          "description$string.stem^1.0",
          "title$string.prefix^1.0",
          "title$string.stem^1.0"
        ],
        "type": "best_fields",
        "operator": "OR",
        "slop": 0,
        "prefix_length": 0,
        "max_expansions": 50,
        "zero_terms_query": "NONE",
        "auto_generate_synonyms_phrase_query": true,
        "fuzzy_transpositions": true,
        "boost": 1
      }
    },
    "order": "score",
    "require_field_match": false,
    "encoder": "html",
    "fields": {
      "description$string.stem": {
        "fragment_size": 100,
        "no_match_size": 100
      },
      "description$string.prefix": {
        "fragment_size": 100,
        "no_match_size": 100
      },
      "title$string.stem": {
        "fragment_size": 100,
        "no_match_size": 100
      },
      "title$string.prefix": {
        "fragment_size": 100,
        "no_match_size": 100
      }
    }
  }
}

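Compared with a regular search, the Search Request generated here mainly adds a must_not query inside filter.

It explicitly excludes the promoted item that has already been handled separately. Note also that size is 9 here and 1 in the curated request, which together make up the usual page of 10.

Finally, App Search merges the results of the two queries. That is how App Search Curation works under the hood.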
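The second request and the final merge can be sketched as below. This is an illustration under assumptions rather than App Search's real code: `build_organic_query` and `merge_results` are hypothetical names, the field list is shortened, the `size` arithmetic is inferred from the observed 9 + 1 split, and re-sorting promoted hits by their curated order is an assumption (a terms query does not guarantee any order).

```python
from typing import Dict, List


def build_organic_query(keyword: str, promoted_ids: List[str], page_size: int = 10) -> Dict:
    """Build the regular search request, excluding documents already promoted."""
    return {
        "from": 0,
        # Assumption: promoted documents take up slots on the page.
        "size": page_size - len(promoted_ids),
        "query": {
            "bool": {
                "must": [
                    {"multi_match": {
                        "query": keyword,
                        # Shortened; the real request lists many more sub-fields.
                        "fields": ["description$string^1.0", "title$string^1.0"],
                        "type": "cross_fields",
                    }}
                ],
                "filter": [
                    {"bool": {"must_not": [{"terms": {"external_id": promoted_ids}}]}}
                ],
            }
        },
    }


def merge_results(promoted_ids: List[str], curated_hits: List[Dict],
                  organic_hits: List[Dict]) -> List[Dict]:
    """Put promoted hits first, in curated order, followed by the organic hits."""
    by_id = {hit["_source"]["external_id"]: hit for hit in curated_hits}
    ordered = [by_id[doc_id] for doc_id in promoted_ids if doc_id in by_id]
    return ordered + organic_hits


if __name__ == "__main__":
    promoted = ["yangming-mountain"]
    curated = [{"_score": 1.0, "_source": {"external_id": "yangming-mountain"}}]
    organic = [{"_score": 2.3, "_source": {"external_id": "park_rocky-mountain"}}]
    for hit in merge_results(promoted, curated, organic):
        print(hit["_source"]["external_id"], hit["_score"])
```

Keeping the two requests separate means the promoted document never competes on score with the organic results; its position comes purely from the merge step.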

Relevance Tuning

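What effect does Relevance Tuning have on the query? Let's run an experiment directly and raise the boost of the title field from 1 to 3.

Next, we again run a search in the Query Tester. The results returned for the keyword mountain are now somewhat different after adjusting the relevance boost.

Let's look at the payload of this query in the slowlog.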

{
  "from": 0,
  "size": 10,
  "timeout": "2000ms",
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "must": [
              {
                "bool": {
                  "should": [
                    {
                      "multi_match": {
                        "query": "mountain",
                        "fields": [
                          "description$string^1.0",
                          "description$string.delimiter^0.4",
                          "description$string.joined^0.75",
                          "description$string.prefix^0.1",
                          "description$string.stem^0.95",
                          "external_id^1.0",
                          "title$string^3.0",
                          "title$string.delimiter^1.2",
                          "title$string.joined^2.25",
                          "title$string.prefix^0.3",
                          "title$string.stem^2.85"
                        ],
                        "type": "cross_fields",
                        "operator": "OR",
                        "slop": 0,
                        "prefix_length": 0,
                        "max_expansions": 50,
                        "minimum_should_match": "1<-1 3<49%",
                        "zero_terms_query": "NONE",
                        "auto_generate_synonyms_phrase_query": true,
                        "fuzzy_transpositions": true,
                        "boost": 1
                      }
                    },
                    {
                      "constant_score": {
                        "filter": {
                          "multi_match": {
                            "query": "mountain",
                            "fields": [
                              "description$string.intragram^0.1",
                              "external_id.intragram^0.1"
                            ],
                            "type": "best_fields",
                            "operator": "OR",
                            "slop": 0,
                            "prefix_length": 0,
                            "max_expansions": 50,
                            "minimum_should_match": "35%",
                            "zero_terms_query": "NONE",
                            "auto_generate_synonyms_phrase_query": true,
                            "fuzzy_transpositions": true,
                            "boost": 1
                          }
                        },
                        "boost": 0.1
                      }
                    },
                    {
                      "constant_score": {
                        "filter": {
                          "multi_match": {
                            "query": "mountain",
                            "fields": [
                              "title$string.intragram^0.3"
                            ],
                            "type": "best_fields",
                            "operator": "OR",
                            "slop": 0,
                            "prefix_length": 0,
                            "max_expansions": 50,
                            "minimum_should_match": "35%",
                            "zero_terms_query": "NONE",
                            "auto_generate_synonyms_phrase_query": true,
                            "fuzzy_transpositions": true,
                            "boost": 1
                          }
                        },
                        "boost": 0.3
                      }
                    }
                  ],
                  "adjust_pure_negative": true,
                  "boost": 1
                }
              }
            ],
            "adjust_pure_negative": true,
            "boost": 1
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  },
  "_source": {
    "includes": [
      "date$date",
      "visitors$float",
      "description$string",
      "external_id",
      "location$location",
      "title$string",
      "engine_id"
    ],
    "excludes": []
  },
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    },
    {
      "_doc": {
        "order": "desc"
      }
    }
  ],
  "highlight": {
    "fragment_size": 300,
    "number_of_fragments": 1,
    "type": "plain",
    "highlight_query": {
      "multi_match": {
        "query": "mountain",
        "fields": [
          "description$string.prefix^1.0",
          "description$string.stem^1.0",
          "title$string.prefix^1.0",
          "title$string.stem^1.0"
        ],
        "type": "best_fields",
        "operator": "OR",
        "slop": 0,
        "prefix_length": 0,
        "max_expansions": 50,
        "zero_terms_query": "NONE",
        "auto_generate_synonyms_phrase_query": true,
        "fuzzy_transpositions": true,
        "boost": 1
      }
    },
    "order": "score",
    "require_field_match": false,
    "encoder": "html",
    "fields": {
      "description$string.stem": {
        "fragment_size": 100,
        "no_match_size": 100
      },
      "description$string.prefix": {
        "fragment_size": 100,
        "no_match_size": 100
      },
      "title$string.stem": {
        "fragment_size": 100,
        "no_match_size": 100
      },
      "title$string.prefix": {
        "fragment_size": 100,
        "no_match_size": 100
      }
    }
  }
}

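Here we can see that with the title field's weight tripled, every title-related boost value also becomes 3 times its original value.

For intragram, which uses constant_score, the title field is split out into a separate query with boost: 0.3.

So a Relevance Tuning adjustment made in the UI is reflected directly in the boost assigned to each field when the Search Request is assembled.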
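The scaling seen in the request can be reproduced with a small sketch. The base sub-field weights are read off the requests in this article; the rule "multiply each base weight by the field's Relevance Tuning weight" and the grouping of intragram fields by weight are inferences from those requests, and `weighted_fields` and `intragram_clauses` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, List

# Base boosts per sub-field, as observed in the untuned requests.
BASE_SUBFIELD_BOOSTS: Dict[str, float] = {
    "": 1.0, ".delimiter": 0.4, ".joined": 0.75, ".prefix": 0.1, ".stem": 0.95,
}
INTRAGRAM_BASE_BOOST = 0.1


def weighted_fields(field: str, weight: float) -> List[str]:
    """Scale every sub-field boost of one field by its tuning weight."""
    return [
        f"{field}{suffix}^{round(base * weight, 2)}"
        for suffix, base in BASE_SUBFIELD_BOOSTS.items()
    ]


def intragram_clauses(keyword: str, weights: Dict[str, float]) -> List[Dict]:
    """Emit one constant_score clause per distinct weight (inferred grouping)."""
    groups: Dict[float, List[str]] = defaultdict(list)
    for field, weight in weights.items():
        boost = round(INTRAGRAM_BASE_BOOST * weight, 2)
        groups[boost].append(f"{field}.intragram^{boost}")
    return [
        {"constant_score": {
            "filter": {"multi_match": {"query": keyword, "fields": sorted(fields),
                                       "type": "best_fields",
                                       "minimum_should_match": "35%"}},
            "boost": boost,
        }}
        for boost, fields in sorted(groups.items())
    ]


if __name__ == "__main__":
    print(weighted_fields("title$string", 3))
    tuned = {"description$string": 1, "external_id": 1, "title$string": 3}
    for clause in intragram_clauses("mountain", tuned):
        print(clause["constant_score"]["boost"],
              clause["constant_score"]["filter"]["multi_match"]["fields"])
```

With a weight of 3, `weighted_fields("title$string", 3)` yields the same five title entries that appear in the tuned request, and the intragram helper produces two clauses, one at 0.1 and one at 0.3.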

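Summary

This exploration shows how App Search uses Elasticsearch to implement Synonyms, Curations, and Relevance Tuning. Some features rely on functionality Elasticsearch already provides. Some combine it with more advanced usage to make management on the Application side easier. Others do not use an Elasticsearch feature at all and are handled directly on the Application side. Each approach has its intended benefits and corresponding trade-offs, which makes them a good reference when we build advanced product search features on Elasticsearch.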


Last updated 2 years ago

Screenshots in the original article (not reproduced here), in order of appearance: synonym setting; synonym query tester; synonym request slowlog; synonym additional request; curation original request; curation - create; curation - manage curation; curation - new search; curation slowlog; curation must promoted item; curation must_not promoted item; relevance tuning - setting; relevance tuning - query tester; relevance tuning - normal fields; relevance tuning - intragram field.