Elasticsearch Cookbook

Before we start

Elasticsearch is a full-text search engine. It has two main purposes.

The first is to search its content with a query and retrieve the matching documents.

The second is to sort these documents by their relevance to the query. To do so, Elasticsearch computes a score for each document. Every part of the query influences this score, but the most sophisticated feature is the ability to split a text field into words (tokens) and weight each of them according to how often it appears in the document and across the whole corpus. You can find more information about scoring in the Elasticsearch documentation.
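To make this more concrete, here is a sketch of Lucene's classic TF/IDF "practical scoring function", assuming the default similarity of Elasticsearch 2.x (Elasticsearch 5.0 switched the default to BM25, which follows the same ideas with different formulas). Here freq is the number of occurrences of the term t in the field, numDocs the number of documents in the index, docFreq the number of documents containing t and numTerms the number of terms in the field; index-time boosts are ignored:

```latex
\begin{aligned}
\mathrm{score}(q,d) &= \mathrm{queryNorm}(q)\cdot\mathrm{coord}(q,d)\cdot\sum_{t\in q}\mathrm{tf}(t,d)\cdot\mathrm{idf}(t)^{2}\cdot\mathrm{boost}(t)\cdot\mathrm{norm}(t,d)\\
\mathrm{tf}(t,d) &= \sqrt{\mathrm{freq}}\\
\mathrm{idf}(t) &= 1+\ln\frac{\mathrm{numDocs}}{\mathrm{docFreq}+1}\\
\mathrm{norm}(t,d) &= \frac{1}{\sqrt{\mathrm{numTerms}}}
\end{aligned}
```

coord rewards documents that match more of the query terms, and queryNorm only normalizes scores so that different queries are comparable. In practice, rare terms (low docFreq) weigh more than common ones, and a match in a short field weighs more than the same match in a long one.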

Installation

We encourage you to try out Elasticsearch while reading this cookbook. To do so, you will need cURL, a terminal (Linux, Mac, Cygwin…) and, optionally, Docker to speed up the installation.

You can also trust the outputs shown in this cookbook and skip this installation chapter.

Launch Elasticsearch

Here is a quick way to run Elasticsearch with Docker; you can also install it by following the installation documentation.

To launch Elasticsearch, run this line in your terminal:

docker run -p 9200:9200 elasticsearch:2.3

(To stop Elasticsearch, press Ctrl-C.)

The container we just launched is reachable on port 9200 of localhost. If you installed Elasticsearch another way, adapt the examples of this cookbook to your installation.

Check that Elasticsearch is reachable

Run the following command:

curl -g -X GET "http://localhost:9200/"

Below is an example reply. This cookbook assumes that your Elasticsearch version.number is between 2.3 and 5.x:

{
  "name" : "Edwin Jarvis",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "2.3.4",
    "build_hash" : "e455fd0c13dceca8dbbdbb1665d068ae55dabe3f",
    "build_timestamp" : "2016-06-30T11:24:31Z",
    "build_snapshot" : false,
    "lucene_version" : "5.5.0"
  },
  "tagline" : "You Know, for Search"
}
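The container may need a few seconds before it answers. If you script the steps of this cookbook, a small polling loop can wait for it. The sketch below assumes a POSIX shell and cURL; wait_for_es is a hypothetical helper written for this cookbook, not part of Elasticsearch:

```shell
# Hypothetical helper (not part of Elasticsearch): poll a URL until it answers
# with an HTTP success code, or give up after a number of attempts one second apart.
wait_for_es() {
  url=${1:-http://localhost:9200/}
  attempts=${2:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s: silent, -f: treat HTTP errors as failures, -o /dev/null: discard the body
    if curl -s -f -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

For instance, wait_for_es && curl -g -X GET "http://localhost:9200/?pretty" only sends the request once the instance is up.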

Data insertion

From now on, we add the ?pretty query parameter to requests in order to get human-readable outputs.

Mapping creation

We first provide Elasticsearch with the mapping (RDBMS: schema) of the data we want to index. The request below creates an index called example (RDBMS: database) with a new document type (RDBMS: table) called blogpost, made of 6 fields (RDBMS: columns).

curl -g -X PUT "http://localhost:9200/example/?pretty" -d '{
  "settings" : {
    "index" : {
      "number_of_shards" : 1
    }
  },
  "mappings": {
    "blogpost": {
      "properties": {
        "author": {
          "type": "string",
          "analyzer": "standard"
        },
        "title": {
          "type": "string",
          "analyzer": "english"
        },
        "body": {
          "type": "string",
          "analyzer": "english"
        },
        "tags": {
          "type": "string",
          "index": "not_analyzed"
        },
        "status": {
          "type": "string",
          "index": "not_analyzed"
        },
        "publish_date": {
          "type": "date",
          "format": "yyyy-MM-dd||epoch_millis"
        }
      }
    }
  }
}'

Reply:

{
  "acknowledged" : true
}

Document creation

curl -g -X PUT "http://localhost:9200/example/blogpost/1?pretty" -d '{
  "author": "John Doe",
  "title": "I love cats",
  "body": "They are so cute",
  "tags": [ "pet", "animal", "cat" ],
  "status": "pending",
  "publish_date": "2016-08-03"
}'

curl -g -X PUT "http://localhost:9200/example/blogpost/2?pretty" -d '{
  "author": "John Doe",
  "title": "I like dogs",
  "body": "They are loyal",
  "tags": [ "pet", "animal", "dog" ],
  "status": "published",
  "publish_date": "2016-08-01"
}'

curl -g -X PUT "http://localhost:9200/example/blogpost/3?pretty" -d '{
  "author": "John Smith",
  "title": "I hate fish",
  "body": "They do not bring the ball back",
  "tags": [ "pet", "animal", "fish" ],
  "status": "pending",
  "publish_date": "2017-08-03"
}'

curl -g -X PUT "http://localhost:9200/example/blogpost/4?pretty" -d '{
  "author": "Jane Doe",
  "title": "I hate cheese cake",
  "body": "I prefer chocolat cake",
  "tags": [ "food", "cake" ],
  "status": "archived",
  "publish_date": "1985-08-03"
}'

curl -g -X PUT "http://localhost:9200/example/blogpost/5?pretty" -d '{
  "author": "Will Smith",
  "title": "I admire lions",
  "body": "They are so regal",
  "tags": [ "wild animal", "animal", "lion" ],
  "status": "published",
  "publish_date": "2016-08-02"
}'

Replies:

{
  "_index" : "example",
  "_type" : "blogpost",
  "_id" : "1",
  "_version" : 1,
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : true
}
{
  "_index" : "example",
  "_type" : "blogpost",
  "_id" : "2",
  "_version" : 1,
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : true
}
{
  "_index" : "example",
  "_type" : "blogpost",
  "_id" : "3",
  "_version" : 1,
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : true
}
{
  "_index" : "example",
  "_type" : "blogpost",
  "_id" : "4",
  "_version" : 1,
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : true
}
{
  "_index" : "example",
  "_type" : "blogpost",
  "_id" : "5",
  "_version" : 1,
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : true
}
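Indexing documents one request at a time is fine for 5 documents. For larger data sets, the _bulk API accepts many operations in a single request, as newline-delimited JSON: an action line followed by the document source. A minimal sketch, assuming each document fits on one line and reusing two of the documents above (bulk_index_lines and bulk_index_examples are hypothetical helpers):

```shell
# Hypothetical helper: print the two NDJSON lines that the _bulk API expects
# for one document (an action line, then the document source).
# Assumes the document JSON is already on a single line.
bulk_index_lines() {
  id=$1
  doc=$2
  printf '{"index":{"_index":"example","_type":"blogpost","_id":"%s"}}\n%s\n' "$id" "$doc"
}

# Index several documents in one request (requires a running Elasticsearch).
# --data-binary @- reads the body from stdin and keeps its newlines,
# which a plain -d @- would strip.
bulk_index_examples() {
  {
    bulk_index_lines 1 '{"author":"John Doe","title":"I love cats","body":"They are so cute","tags":["pet","animal","cat"],"status":"pending","publish_date":"2016-08-03"}'
    bulk_index_lines 2 '{"author":"John Doe","title":"I like dogs","body":"They are loyal","tags":["pet","animal","dog"],"status":"published","publish_date":"2016-08-01"}'
  } | curl -s -g -X POST "http://localhost:9200/_bulk?pretty" --data-binary @-
}
```

Running bulk_index_examples against the index above overwrites documents 1 and 2, so their _version increases. If you remove "_id" from the action line, Elasticsearch generates the IDs itself.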

The id

The number (1 to 5) at the end of each request URL defines the ID of the document (RDBMS: primary key). For the sake of this example, we explicitly set the ID of each document; if you do not specify one, Elasticsearch generates an ID automatically (in that case, send the request with POST instead of PUT). Even though the ID is actually a string, you can use numbers for convenience.

The body

The body of the request must contain the content of the document you want to create. As you can see, the structure of the document matches our mapping. As a result, Elasticsearch will analyze and index our document as specified.

The structure

As you can see, we insert an array into a field meant to hold a string. This is one of the features of Elasticsearch: any field can hold an array of values of its defined type. For example, the tags field is defined as a string, but we use it as an array of strings, and that is totally fine. Another feature of Elasticsearch is that fields can be nested to build complex documents. This is not covered in this cookbook, but you can find more information in the Elasticsearch documentation.

Useful commands

First of all, let’s take a look at some commands to explore your Elasticsearch instance.

List indices

List all available indices on your Elasticsearch instance:

curl -g "http://localhost:9200/_cat/indices?pretty"

Reply:

yellow open example 1 1 5 0 10.4kb 10.4kb
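This reply has no header by default. As a sketch, the hypothetical helper below labels each column of such a line; the column order is assumed to be the Elasticsearch 2.x one (5.x adds a uuid column after the index name):

```shell
# Hypothetical helper: label the columns of one line of `_cat/indices` output.
# Column order assumed from Elasticsearch 2.x.
describe_cat_indices_line() {
  echo "$1" | awk '{
    printf "health=%s status=%s index=%s pri=%s rep=%s docs.count=%s docs.deleted=%s store.size=%s pri.store.size=%s\n",
      $1, $2, $3, $4, $5, $6, $7, $8, $9
  }'
}
```

Adding ?v to the URL makes Elasticsearch print the column names itself. The index is yellow because its replica (number_of_replicas defaults to 1) cannot be allocated on a single-node cluster.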

Get an index mapping

The mapping of an index is made of the mappings of all the document types (collections) it contains. The following command returns the whole index definition, including its mappings and settings (use http://localhost:9200/example/_mapping?pretty to get the mappings alone):

curl -g -X GET "http://localhost:9200/example/?pretty"

Reply:

{
  "example" : {
    "aliases" : { },
    "mappings" : {
      "blogpost" : {
        "properties" : {
          "author" : {
            "type" : "string",
            "analyzer" : "standard"
          },
          "body" : {
            "type" : "string",
            "analyzer" : "english"
          },
          "publish_date" : {
            "type" : "date",
            "format" : "yyyy-MM-dd||epoch_millis"
          },
          "status" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "tags" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "title" : {
            "type" : "string",
            "analyzer" : "english"
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1474364614778",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "UXxlOo1uSy-vIlvo_8o5vA",
        "version" : {
          "created" : "2040099"
        }
      }
    },
    "warmers" : { }
  }
}

Basic queries

Search queries are sent to the _search endpoint, and the body of the request is a JSON object representing the query. The examples below use the POST method, which works with every HTTP client; Elasticsearch also accepts GET with a request body. We present here the most common ways to use the different queries, together with the options that modify their behaviour. For more details about these options, see the Elasticsearch documentation.

The search endpoint (and the match_all query)

The search endpoint accepts several query parameters that control the output of the search. By default, it returns the first 10 results, sorted by score. The match_all query matches all the documents in the collection; it is also useful combined with other queries, in a bool query for instance.

Without query parameters

curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "match_all": {}
  }
}'

Reply:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {
        "author" : "John Doe",
        "title" : "I love cats",
        "body" : "They are so cute",
        "tags" : [ "pet", "animal", "cat" ],
        "status" : "pending",
        "publish_date" : "2016-08-03"
      }
    }, {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "2",
      "_score" : 1.0,
      "_source" : {
        "author" : "John Doe",
        "title" : "I like dogs",
        "body" : "They are loyal",
        "tags" : [ "pet", "animal", "dog" ],
        "status" : "published",
        "publish_date" : "2016-08-01"
      }
    }, {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "3",
      "_score" : 1.0,
      "_source" : {
        "author" : "John Smith",
        "title" : "I hate fish",
        "body" : "They do not bring the ball back",
        "tags" : [ "pet", "animal", "fish" ],
        "status" : "pending",
        "publish_date" : "2017-08-03"
      }
    }, {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "4",
      "_score" : 1.0,
      "_source" : {
        "author" : "Jane Doe",
        "title" : "I hate cheese cake",
        "body" : "I prefer chocolat cake",
        "tags" : [ "food", "cake" ],
        "status" : "archived",
        "publish_date" : "1985-08-03"
      }
    }, {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "5",
      "_score" : 1.0,
      "_source" : {
        "author" : "Will Smith",
        "title" : "I admire lions",
        "body" : "They are so regal",
        "tags" : [ "wild animal", "animal", "lion" ],
        "status" : "published",
        "publish_date" : "2016-08-02"
      }
    } ]
  }
}

This returns all the documents in the blogpost collection, since there are fewer than 10 of them.

With from and size query parameters

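To change this behaviour, two query parameters are available: from (the number of results to skip, 0 by default) and size (the maximum number of results to return, 10 by default):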

curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty&from=3&size=2" -d '{
  "query": {
    "match_all": {}
  }
}'

Reply:

{
  "took" : 28,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "4",
      "_score" : 1.0,
      "_source" : {
        "author" : "Jane Doe",
        "title" : "I hate cheese cake",
        "body" : "I prefer chocolat cake",
        "tags" : [ "food", "cake" ],
        "status" : "archived",
        "publish_date" : "1985-08-03"
      }
    }, {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "5",
      "_score" : 1.0,
      "_source" : {
        "author" : "Will Smith",
        "title" : "I admire lions",
        "body" : "They are so regal",
        "tags" : [ "wild animal", "animal", "lion" ],
        "status" : "published",
        "publish_date" : "2016-08-02"
      }
    } ]
  }
}

This skips the first 3 results and returns the next 2, i.e. the 4th and 5th results of the search query. This is how you paginate results.
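from and size can also be sent in the request body. For page-based navigation, from is simply (page - 1) * size. A sketch with hypothetical helpers page_body and search_page, assuming 1-based page numbers:

```shell
ES_URL=${ES_URL:-http://localhost:9200}

# Hypothetical helper: build the body of the search for a given 1-based page.
# from = (page - 1) * page_size is the number of results to skip.
page_body() {
  page=$1
  page_size=$2
  from=$(( (page - 1) * page_size ))
  printf '{"from": %d, "size": %d, "query": {"match_all": {}}}' "$from" "$page_size"
}

# Fetch one page (requires a running Elasticsearch).
search_page() {
  curl -s -g -X POST "$ES_URL/example/blogpost/_search?pretty" -d "$(page_body "$1" "$2")"
}
```

For instance, search_page 2 3 skips 3 results and returns the 4th and 5th, like the request above. Deep pagination gets expensive: from + size cannot exceed the index.max_result_window setting (10000 by default), and beyond that a scroll is the better tool.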

The scroll query parameter

The scroll query parameter is useful when dealing with huge data sets, or when you want to be sure the data set will not change during your processing: the results reflect the state of the index at the time of the initial search. We recommend reading the Elasticsearch documentation for more details.
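A scroll starts with a normal search that has a scroll parameter (how long to keep the search context alive). Each reply carries a _scroll_id to send to the _search/scroll endpoint to get the next batch, until a batch comes back empty. A minimal sketch, assuming a POSIX shell and compact JSON replies (no ?pretty); extract_scroll_id, scroll_next_body and scroll_all are hypothetical helpers, and sorting on _doc is used because it is the cheapest order when relevance does not matter:

```shell
ES_URL=${ES_URL:-http://localhost:9200}

# Extract the "_scroll_id" value from a search reply without extra tools.
# Assumption: the id contains no double quotes (it is base64-like).
extract_scroll_id() {
  sed -n 's/.*"_scroll_id" *: *"\([^"]*\)".*/\1/p' | head -n 1
}

# Body asking for the next batch; "1m" keeps the search context alive 1 more minute.
scroll_next_body() {
  printf '{"scroll": "1m", "scroll_id": "%s"}' "$1"
}

# Walk the whole blogpost collection, 2 documents at a time (requires a running Elasticsearch).
scroll_all() {
  reply=$(curl -s -X POST "$ES_URL/example/blogpost/_search?scroll=1m" \
    -d '{"size": 2, "query": {"match_all": {}}, "sort": ["_doc"]}')
  # Stop as soon as a batch comes back with an empty hits array
  while ! printf '%s' "$reply" | grep -q '"hits" *: *\[ *]'; do
    printf '%s\n' "$reply"
    id=$(printf '%s' "$reply" | extract_scroll_id)
    [ -n "$id" ] || break
    reply=$(curl -s -X POST "$ES_URL/_search/scroll" -d "$(scroll_next_body "$id")")
  done
}
```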

The ids query

Returns the documents whose ID is in the given list.

curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "ids": {
      "values": ["2", "4"]
    }
  }
}'

Reply:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "2",
      "_score" : 1.0,
      "_source" : {
        "author" : "John Doe",
        "title" : "I like dogs",
        "body" : "They are loyal",
        "tags" : [ "pet", "animal", "dog" ],
        "status" : "published",
        "publish_date" : "2016-08-01"
      }
    }, {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "4",
      "_score" : 1.0,
      "_source" : {
        "author" : "Jane Doe",
        "title" : "I hate cheese cake",
        "body" : "I prefer chocolat cake",
        "tags" : [ "food", "cake" ],
        "status" : "archived",
        "publish_date" : "1985-08-03"
      }
    } ]
  }
}

The query_string query

The query_string query is a way to “talk” almost directly to Lucene, the core engine of Elasticsearch: the query is written in the Lucene query syntax (field:value, AND, OR, ranges…). If you are used to Solr, it will look familiar.

curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "query_string": {
      "query": "_id:1 OR _id:2"
    }
  }
}'

Reply:

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.35355338,
    "hits" : [ {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "1",
      "_score" : 0.35355338,
      "_source" : {
        "author" : "John Doe",
        "title" : "I love cats",
        "body" : "They are so cute",
        "tags" : [ "pet", "animal", "cat" ],
        "status" : "pending",
        "publish_date" : "2016-08-03"
      }
    }, {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "2",
      "_score" : 0.35355338,
      "_source" : {
        "author" : "John Doe",
        "title" : "I like dogs",
        "body" : "They are loyal",
        "tags" : [ "pet", "animal", "dog" ],
        "status" : "published",
        "publish_date" : "2016-08-01"
      }
    } ]
  }
}

The match query

The match query is the one to use for full-text search. The query text (here: “hate cake”) is analyzed (lowercased, tokenized…) and then matched against the analyzed version of the field (which is also lowercased, tokenized…). As a result, the choice of the analyzer applied to a field is very important. To learn more about analyzers, we recommend reading the Elasticsearch documentation.

The result is a set of matching documents, each with a relevance score.

curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "match": {
      "title":"hate cake"
    }
  }
}'

Reply:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.2201192,
    "hits" : [ {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "4",
      "_score" : 1.2201192,
      "_source" : {
        "author" : "Jane Doe",
        "title" : "I hate cheese cake",
        "body" : "I prefer chocolat cake",
        "tags" : [ "food", "cake" ],
        "status" : "archived",
        "publish_date" : "1985-08-03"
      }
    }, {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "3",
      "_score" : 0.23384948,
      "_source" : {
        "author" : "John Smith",
        "title" : "I hate fish",
        "body" : "They do not bring the ball back",
        "tags" : [ "pet", "animal", "fish" ],
        "status" : "pending",
        "publish_date" : "2017-08-03"
      }
    } ]
  }
}

You can see that the second document does not contain “cake” in its title but still matches. This is because, by default, the match query combines the analyzed terms with the or operator. To return only the documents matching all the terms, use the and operator:

curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "match": {
      "title": {
        "query": "hate cake",
        "operator": "and"
      }
    }
  }
}'

Reply:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.2201192,
    "hits" : [ {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "4",
      "_score" : 1.2201192,
      "_source" : {
        "author" : "Jane Doe",
        "title" : "I hate cheese cake",
        "body" : "I prefer chocolat cake",
        "tags" : [ "food", "cake" ],
        "status" : "archived",
        "publish_date" : "1985-08-03"
      }
    } ]
  }
}
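Between these two extremes, the match query also accepts a minimum_should_match parameter, which sets how many of the query terms a document must contain. The sketch below requires at least 2 of the 3 terms of “hate cheese cake” (MSM_BODY and msm_search are hypothetical names):

```shell
ES_URL=${ES_URL:-http://localhost:9200}

# Middle ground between "or" and "and": at least 2 of the analyzed terms must match.
MSM_BODY='{
  "query": {
    "match": {
      "title": {
        "query": "hate cheese cake",
        "minimum_should_match": "2"
      }
    }
  }
}'

# Requires a running Elasticsearch.
msm_search() {
  curl -s -g -X POST "$ES_URL/example/blogpost/_search?pretty" -d "$MSM_BODY"
}
```

With our data, only document 4 should match: “I hate fish” contains only one of the three terms. The parameter also accepts percentages such as "75%".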

The prefix query

The prefix query matches all the documents where the given field has a value that begins with the given string. In the following example, we want to match all the documents where the value of field status begins with pub:

curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "prefix": {
      "status":"pub"
    }
  }
}'

Reply:

{
  "took" : 107,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "2",
      "_score" : 1.0,
      "_source" : {
        "author" : "John Doe",
        "title" : "I like dogs",
        "body" : "They are loyal",
        "tags" : [ "pet", "animal", "dog" ],
        "status" : "published",
        "publish_date" : "2016-08-01"
      }
    }, {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "5",
      "_score" : 1.0,
      "_source" : {
        "author" : "Will Smith",
        "title" : "I admire lions",
        "body" : "They are so regal",
        "tags" : [ "wild animal", "animal", "lion" ],
        "status" : "published",
        "publish_date" : "2016-08-02"
      }
    } ]
  }
}

The range query

The range query matches all the documents where the value of the given field falls within the specified range. In the following example, we want to match all the documents whose publish_date is between the two specified dates (both included, since we use gte and lte):

curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "range": {
      "publish_date": {
        "gte": "2016-08-01",
        "lte": "2016-08-31",
        "format": "yyyy-MM-dd"
      }
    }
  }
}'

Reply:

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {
        "author" : "John Doe",
        "title" : "I love cats",
        "body" : "They are so cute",
        "tags" : [ "pet", "animal", "cat" ],
        "status" : "pending",
        "publish_date" : "2016-08-03"
      }
    }, {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "2",
      "_score" : 1.0,
      "_source" : {
        "author" : "John Doe",
        "title" : "I like dogs",
        "body" : "They are loyal",
        "tags" : [ "pet", "animal", "dog" ],
        "status" : "published",
        "publish_date" : "2016-08-01"
      }
    }, {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "5",
      "_score" : 1.0,
      "_source" : {
        "author" : "Will Smith",
        "title" : "I admire lions",
        "body" : "They are so regal",
        "tags" : [ "wild animal", "animal", "lion" ],
        "status" : "published",
        "publish_date" : "2016-08-02"
      }
    } ]
  }
}

The term query

The term query finds exact matches on the indexed value of a field. It should not be used on analyzed fields: for those, the indexed value is a modified version of the input (lowercased, tokenized, stemmed…), so an exact match on the original text usually fails. Analyzers are explained throughout this cookbook; come back to this section once they are clearer.

curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "term": {
      "status": "pending"
    }
  }
}'

Reply:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.5108256,
    "hits" : [ {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "1",
      "_score" : 1.5108256,
      "_source" : {
        "author" : "John Doe",
        "title" : "I love cats",
        "body" : "They are so cute",
        "tags" : [ "pet", "animal", "cat" ],
        "status" : "pending",
        "publish_date" : "2016-08-03"
      }
    }, {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "3",
      "_score" : 1.5108256,
      "_source" : {
        "author" : "John Smith",
        "title" : "I hate fish",
        "body" : "They do not bring the ball back",
        "tags" : [ "pet", "animal", "fish" ],
        "status" : "pending",
        "publish_date" : "2017-08-03"
      }
    } ]
  }
}
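To see what is actually indexed for an analyzed field, the _analyze API runs a text through the analyzer configured for a field of the mapping. A sketch, assuming the text contains no double quotes (analyze_body and analyze_field are hypothetical helpers):

```shell
ES_URL=${ES_URL:-http://localhost:9200}

# Hypothetical helper: body of an _analyze request that reuses the analyzer
# of a field of the mapping (assumes no double quotes in the arguments).
analyze_body() {
  printf '{"field": "%s", "text": "%s"}' "$1" "$2"
}

# Show the tokens Elasticsearch would index for that field (requires a running instance).
analyze_field() {
  curl -s -g -X POST "$ES_URL/example/_analyze?pretty" -d "$(analyze_body "$1" "$2")"
}
```

For instance, analyze_field title "I love cats" should show lowercased, stemmed tokens (“cats” becomes “cat”). This is why a term query for "I love cats" on the title field finds nothing, while the term query on the not_analyzed status field works as shown above.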

The terms query

The terms query behaves exactly like term, but accepts several values: a document matches if its field exactly matches any of them.

curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "terms": {
      "status": ["pending", "archived"]
    }
  }
}'

Reply:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.7524203,
    "hits" : [ {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "4",
      "_score" : 0.7524203,
      "_source" : {
        "author" : "Jane Doe",
        "title" : "I hate cheese cake",
        "body" : "I prefer chocolat cake",
        "tags" : [ "food", "cake" ],
        "status" : "archived",
        "publish_date" : "1985-08-03"
      }
    }, {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "1",
      "_score" : 0.46769896,
      "_source" : {
        "author" : "John Doe",
        "title" : "I love cats",
        "body" : "They are so cute",
        "tags" : [ "pet", "animal", "cat" ],
        "status" : "pending",
        "publish_date" : "2016-08-03"
      }
    }, {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "3",
      "_score" : 0.46769896,
      "_source" : {
        "author" : "John Smith",
        "title" : "I hate fish",
        "body" : "They do not bring the ball back",
        "tags" : [ "pet", "animal", "fish" ],
        "status" : "pending",
        "publish_date" : "2017-08-03"
      }
    } ]
  }
}

The exists query

The exists query matches the documents where a given field is present:

curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "exists": {
      "field": "author"
    }
  }
}'

Reply:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {
        "author" : "John Doe",
        "title" : "I love cats",
        "body" : "They are so cute",
        "tags" : [ "pet", "animal", "cat" ],
        "status" : "pending",
        "publish_date" : "2016-08-03"
      }
    }, {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "2",
      "_score" : 1.0,
      "_source" : {
        "author" : "John Doe",
        "title" : "I like dogs",
        "body" : "They are loyal",
        "tags" : [ "pet", "animal", "dog" ],
        "status" : "published",
        "publish_date" : "2016-08-01"
      }
    }, {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "3",
      "_score" : 1.0,
      "_source" : {
        "author" : "John Smith",
        "title" : "I hate fish",
        "body" : "They do not bring the ball back",
        "tags" : [ "pet", "animal", "fish" ],
        "status" : "pending",
        "publish_date" : "2017-08-03"
      }
    }, {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "4",
      "_score" : 1.0,
      "_source" : {
        "author" : "Jane Doe",
        "title" : "I hate cheese cake",
        "body" : "I prefer chocolat cake",
        "tags" : [ "food", "cake" ],
        "status" : "archived",
        "publish_date" : "1985-08-03"
      }
    }, {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "5",
      "_score" : 1.0,
      "_source" : {
        "author" : "Will Smith",
        "title" : "I admire lions",
        "body" : "They are so regal",
        "tags" : [ "wild animal", "animal", "lion" ],
        "status" : "published",
        "publish_date" : "2016-08-02"
      }
    } ]
  }
}

The missing query

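The missing query is deprecated (and removed in Elasticsearch 5.0). Elasticsearch recommends using the exists query in a must_not occurrence of a bool compound query instead (and this will introduce you to the bool query :-) ). Since every document in our collection has an author, the query below matches nothing.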

curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "author"
        }
      }
    }
  }
}'

Reply:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

Sorting the result set

If you want to sort your result set in an order other than the default _score sort, or combine the _score sort with other fields, you can specify the sort order alongside the query:

curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "match_all": {}
  },
  "sort": [
    {"status": {"order": "asc"}}
  ]
}'

Reply:

{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : null,
    "hits" : [ {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "4",
      "_score" : null,
      "_source" : {
        "author" : "Jane Doe",
        "title" : "I hate cheese cake",
        "body" : "I prefer chocolat cake",
        "tags" : [ "food", "cake" ],
        "status" : "archived",
        "publish_date" : "1985-08-03"
      },
      "sort" : [ "archived" ]
    }, {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "1",
      "_score" : null,
      "_source" : {
        "author" : "John Doe",
        "title" : "I love cats",
        "body" : "They are so cute",
        "tags" : [ "pet", "animal", "cat" ],
        "status" : "pending",
        "publish_date" : "2016-08-03"
      },
      "sort" : [ "pending" ]
    }, {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "3",
      "_score" : null,
      "_source" : {
        "author" : "John Smith",
        "title" : "I hate fish",
        "body" : "They do not bring the ball back",
        "tags" : [ "pet", "animal", "fish" ],
        "status" : "pending",
        "publish_date" : "2017-08-03"
      },
      "sort" : [ "pending" ]
    }, {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "2",
      "_score" : null,
      "_source" : {
        "author" : "John Doe",
        "title" : "I like dogs",
        "body" : "They are loyal",
        "tags" : [ "pet", "animal", "dog" ],
        "status" : "published",
        "publish_date" : "2016-08-01"
      },
      "sort" : [ "published" ]
    }, {
      "_index" : "example",
      "_type" : "blogpost",
      "_id" : "5",
      "_score" : null,
      "_source" : {
        "author" : "Will Smith",
        "title" : "I admire lions",
        "body" : "They are so regal",
        "tags" : [ "wild animal", "animal", "lion" ],
        "status" : "published",
        "publish_date" : "2016-08-02"
      },
      "sort" : [ "published" ]
    } ]
  }
}

When _score is not used in the sort, it is not computed and is returned as null in the reply.
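Several sort criteria can be combined: each one only breaks the ties of the previous one. The sketch below (SORT_BODY and sorted_search are hypothetical names) sorts by status, then by publish_date from newest to oldest, and sets track_scores to true so that _score is still computed even though it is not a sort key:

```shell
ES_URL=${ES_URL:-http://localhost:9200}

# status ascending first, then publish_date descending for equal statuses;
# track_scores asks Elasticsearch to compute _score anyway.
SORT_BODY='{
  "query": { "match_all": {} },
  "sort": [
    { "status": { "order": "asc" } },
    { "publish_date": { "order": "desc" } }
  ],
  "track_scores": true
}'

# Requires a running Elasticsearch.
sorted_search() {
  curl -s -g -X POST "$ES_URL/example/blogpost/_search?pretty" -d "$SORT_BODY"
}
```

With our data, the expected order is document 4 (archived), then 3 and 1 (pending, newest first), then 5 and 2 (published, newest first).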

The bool (Boolean) query

(optional) You may want to explore the theory first, to understand the paradigm behind this kind of query. Thankfully, you can find a good resource on Wikipedia.

In the boolean compound query, there are 4 occurrence types:

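  • must and should are used to filter AND score the documents: every must clause has to match, while should clauses are optional as soon as the query has a must or filter clause (otherwise at least one should clause has to match).
  • filter and must_not are used to filter the documents (keeping or excluding the ones that match) but don’t influence the score.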

This is what it looks like when every occurrence type is used:

curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "bool": {
      "must": {
        "match": {
          "author": {
            "query": "John Doe",
            "operator": "and"
          }
        }
      },
      "filter": {
        "term": {"tags": "animal" }
      },
      "must_not": {
        "range": {
          "publish_date": {"gte": "1985-01-01", "lte": "2016-01-01" }
        }
      },
      "should": [
        {"term": {"tags": "pet" }},
        {"term": {"tags": "dog" }}
      ]
    }
  }
}'

Reply (don’t spend too much time reading it; we explain each occurrence type and its effect below):

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 2.4638538,
    "hits": [ {
      "_index": "example",
      "_type": "blogpost",
      "_id": "2",
      "_score": 2.4638538,
      "_source": {
        "author": "John Doe",
        "title": "I like dogs",
        "body": "They are loyal",
        "tags": [ "pet", "animal", "dog" ],
        "status": "published",
        "publish_date": "2016-08-01"
      }
    }, {
      "_index": "example",
      "_type": "blogpost",
      "_id": "1",
      "_score": 0.78557956,
      "_source": {
        "author": "John Doe",
        "title": "I love cats",
        "body": "They are so cute",
        "tags": [ "pet", "animal", "cat" ],
        "status": "pending",
        "publish_date": "2016-08-03"
      }
    } ]
  }
}

You can find a full description in the Bool Query documentation.
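To see why document 2 scores higher than document 1 (it matches both should clauses, while document 1 only matches pet), you can add "explain": true to the search body: each hit then carries an _explanation tree detailing how its _score was computed. A sketch reusing the must and should clauses above (EXPLAIN_BODY and explain_search are hypothetical names):

```shell
ES_URL=${ES_URL:-http://localhost:9200}

# Same must + should clauses as above, with score explanations turned on.
EXPLAIN_BODY='{
  "explain": true,
  "query": {
    "bool": {
      "must": { "match": { "author": { "query": "John Doe", "operator": "and" } } },
      "should": [
        { "term": { "tags": "pet" } },
        { "term": { "tags": "dog" } }
      ]
    }
  }
}'

# Requires a running Elasticsearch.
explain_search() {
  curl -s -g -X POST "$ES_URL/example/blogpost/_search?pretty" -d "$EXPLAIN_BODY"
}
```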

The filter occurrence type

The filter occurrence type lets you filter documents with additional queries without affecting the score. You can even use a bool query in a filter occurrence type. Below are a few ways to write basic filter requests; it is up to you to choose your favorite.

Within each group below, the examples are equivalent: as you will see, there are different ways to achieve the same result with the filter occurrence type.

Using a logical AND operator between fields

curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "bool": {
      "filter": {
        "query_string": {
          "query": "status:published AND publish_date:[2015-01-01 TO *]"
        }
      }
    }
  }
}'
curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "bool": {
      "filter": [
        {"term": {"status": "published" }},
        {"range": {"publish_date": {"gte": "2015-01-01" }}}
      ]
    }
  }
}'

Both examples above generate the same result:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.0,
    "hits": [ {
      "_index": "example",
      "_type": "blogpost",
      "_id": "2",
      "_score": 0.0,
      "_source": {
        "author": "John Doe",
        "title": "I like dogs",
        "body": "They are loyal",
        "tags": [ "pet", "animal", "dog" ],
        "status": "published",
        "publish_date": "2016-08-01"
      }
    }, {
      "_index": "example",
      "_type": "blogpost",
      "_id": "5",
      "_score": 0.0,
      "_source": {
        "author": "Will Smith",
        "title": "I admire lions",
        "body": "They are so regal",
        "tags": [ "wild animal", "animal", "lion" ],
        "status": "published",
        "publish_date": "2016-08-02"
      }
    } ]
  }
}

Notice that both documents have a score of 0: this is because the bool query only uses the filter occurrence type, which does not contribute to the score.
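To see what a filter does when another clause does compute a score, you can combine it with a must clause. The sketch below is our own example, not part of the original walkthrough: it assumes the example index used throughout this cookbook and a hypothetical ES_URL variable that defaults to http://localhost:9200. The filter keeps only published documents, while the match on title computes the score; we would expect only the “I like dogs” post to be returned, with a score above 0.

```shell
# Assumption: Elasticsearch 2.3-5.x with the "example" index from this cookbook.
# ES_URL is a hypothetical helper variable; override it if your node runs elsewhere.
ES_URL="${ES_URL:-http://localhost:9200}"

# filter restricts the result set without scoring;
# must contributes the score of each remaining document.
QUERY='{
  "query": {
    "bool": {
      "must": {"match": {"title": "dogs"}},
      "filter": {"term": {"status": "published"}}
    }
  }
}'

curl -g -s -X POST "$ES_URL/example/blogpost/_search?pretty" -d "$QUERY" \
  || echo "Could not reach Elasticsearch at $ES_URL" >&2
```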

Using a logical AND operator between terms

curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "bool": {
      "filter": {
        "query_string": {
          "query": "author:(john AND doe)"
        }
      }
    }
  }
}'
curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "bool": {
      "filter": {
        "match": {
          "author": {
            "query": "john doe",
            "operator": "and"
          }
        }
      }
    }
  }
}'
curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "bool": {
      "filter": [
        {"match": {"author": "john" }},
        {"match": {"author": "doe" }}
      ]
    }
  }
}'

All examples above generate the same result:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.0,
    "hits": [ {
      "_index": "example",
      "_type": "blogpost",
      "_id": "1",
      "_score": 0.0,
      "_source": {
        "author": "John Doe",
        "title": "I love cats",
        "body": "They are so cute",
        "tags": [ "pet", "animal", "cat" ],
        "status": "pending",
        "publish_date": "2016-08-03"
      }
    }, {
      "_index": "example",
      "_type": "blogpost",
      "_id": "2",
      "_score": 0.0,
      "_source": {
        "author": "John Doe",
        "title": "I like dogs",
        "body": "They are loyal",
        "tags": [ "pet", "animal", "dog" ],
        "status": "published",
        "publish_date": "2016-08-01"
      }
    } ]
  }
}

Using a logical OR operator between fields

curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "bool": {
      "filter": {
        "query_string": {
          "query": "title:love OR tags:lion"
        }
      }
    }
  }
}'
curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "should": [
            {"match": {"title": "love"}},
            {"match": {"tags": "lion"}}
          ]
        }
      }
    }
  }
}'

Both examples above generate the same result:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.0,
    "hits": [ {
      "_index": "example",
      "_type": "blogpost",
      "_id": "1",
      "_score": 0.0,
      "_source": {
        "author": "John Doe",
        "title": "I love cats",
        "body": "They are so cute",
        "tags": [ "pet", "animal", "cat" ],
        "status": "pending",
        "publish_date": "2016-08-03"
      }
    }, {
      "_index": "example",
      "_type": "blogpost",
      "_id": "5",
      "_score": 0.0,
      "_source": {
        "author": "Will Smith",
        "title": "I admire lions",
        "body": "They are so regal",
        "tags": [ "wild animal", "animal", "lion" ],
        "status": "published",
        "publish_date": "2016-08-02"
      }
    } ]
  }
}

Using a logical OR operator between terms

curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "bool": {
      "filter": {
        "query_string": {
          "query": "status:(published OR pending OR refused)"
        }
      }
    }
  }
}'
curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "should": [
            {"term": {"status": "published" }},
            {"term": {"status": "pending" }},
            {"term": {"status": "refused" }}
          ],
          "minimum_should_match": 1
        }
      }
    }
  }
}'
curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "bool": {
      "filter": {
        "match": {
          "status": {
            "query": "published pending refused",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}'

The last query is tricky. We specified 3 terms in the query, but because the status field is not analyzed, the match query does not analyze the query string either: by default, the whole string would be looked up as a single term. To split the query string into terms, we have to force the use of the standard analyzer, which tokenizes the string "published pending refused" into the 3 following terms: ["published", "pending", "refused"].

Reply:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 0.0,
    "hits": [ {
      "_index": "example",
      "_type": "blogpost",
      "_id": "1",
      "_score": 0.0,
      "_source": {
        "author": "John Doe",
        "title": "I love cats",
        "body": "They are so cute",
        "tags": [ "pet", "animal", "cat" ],
        "status": "pending",
        "publish_date": "2016-08-03"
      }
    }, {
      "_index": "example",
      "_type": "blogpost",
      "_id": "2",
      "_score": 0.0,
      "_source": {
        "author": "John Doe",
        "title": "I like dogs",
        "body": "They are loyal",
        "tags": [ "pet", "animal", "dog" ],
        "status": "published",
        "publish_date": "2016-08-01"
      }
    }, {
      "_index": "example",
      "_type": "blogpost",
      "_id": "3",
      "_score": 0.0,
      "_source": {
        "author": "John Smith",
        "title": "I hate fish",
        "body": "They do not bring the ball back",
        "tags": [ "pet", "animal", "fish" ],
        "status": "pending",
        "publish_date": "2017-08-03"
      }
    }, {
      "_index": "example",
      "_type": "blogpost",
      "_id": "5",
      "_score": 0.0,
      "_source": {
        "author": "Will Smith",
        "title": "I admire lions",
        "body": "They are so regal",
        "tags": [ "wild animal", "animal", "lion" ],
        "status": "published",
        "publish_date": "2016-08-02"
      }
    } ]
  }
}
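Two optional checks related to the tricky match query above, sketched under the same assumptions as our other examples (the example index and a hypothetical ES_URL variable defaulting to http://localhost:9200). The _analyze API shows the tokens the standard analyzer produces for the query string; we would expect the 3 tokens published, pending and refused. The terms query is another common way to express an OR between exact values on a not analyzed field, without going through an analyzer at all; in filter context we would expect it to return the same four documents, but its reply is not reproduced here.

```shell
# Assumption: Elasticsearch 2.3-5.x on ES_URL (hypothetical variable) with the "example" index.
ES_URL="${ES_URL:-http://localhost:9200}"

# 1. Ask the standard analyzer how it splits the query string.
ANALYZE_BODY='{
  "analyzer": "standard",
  "text": "published pending refused"
}'
curl -g -s -X GET "$ES_URL/_analyze?pretty" -d "$ANALYZE_BODY" \
  || echo "Could not reach Elasticsearch at $ES_URL" >&2

# 2. Same OR between exact values, using a terms query instead of an analyzer.
TERMS_QUERY='{
  "query": {
    "bool": {
      "filter": {
        "terms": {"status": ["published", "pending", "refused"]}
      }
    }
  }
}'
curl -g -s -X POST "$ES_URL/example/blogpost/_search?pretty" -d "$TERMS_QUERY" \
  || echo "Could not reach Elasticsearch at $ES_URL" >&2
```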

Using a logical NOT operator

In the second example, we nest a bool query inside the filter occurrence type.

curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "bool": {
      "filter": {
        "query_string": {
          "query": "-status:pending"
        }
      }
    }
  }
}'
curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must_not": {
            "term": {"status": "pending" }
          }
        }
      }
    }
  }
}'

Both examples above generate the same result:

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.0,
    "hits": [ {
      "_index": "example",
      "_type": "blogpost",
      "_id": "2",
      "_score": 0.0,
      "_source": {
        "author": "John Doe",
        "title": "I like dogs",
        "body": "They are loyal",
        "tags": [ "pet", "animal", "dog" ],
        "status": "published",
        "publish_date": "2016-08-01"
      }
    }, {
      "_index": "example",
      "_type": "blogpost",
      "_id": "4",
      "_score": 0.0,
      "_source": {
        "author": "Jane Doe",
        "title": "I hate cheese cake",
        "body": "I prefer chocolat cake",
        "tags": [ "food", "cake" ],
        "status": "archived",
        "publish_date": "1985-08-03"
      }
    }, {
      "_index": "example",
      "_type": "blogpost",
      "_id": "5",
      "_score": 0.0,
      "_source": {
        "author": "Will Smith",
        "title": "I admire lions",
        "body": "They are so regal",
        "tags": [ "wild animal", "animal", "lion" ],
        "status": "published",
        "publish_date": "2016-08-02"
      }
    } ]
  }
}

The must_not occurrence type

The must_not occurrence type lets you specify queries that exclude documents from the result set. It acts like a logical NOT.

Usage of must_not with one query

curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "bool": {
      "must_not": {
        "term": {"status": "pending" }
      }
    }
  }
}'

Expected reply:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1.0,
    "hits": [ {
      "_index": "example",
      "_type": "blogpost",
      "_id": "2",
      "_score": 1.0,
      "_source": {
        "author": "John Doe",
        "title": "I like dogs",
        "body": "They are loyal",
        "tags": [ "pet", "animal", "dog" ],
        "status": "published",
        "publish_date": "2016-08-01"
      }
    }, {
      "_index": "example",
      "_type": "blogpost",
      "_id": "4",
      "_score": 1.0,
      "_source": {
        "author": "Jane Doe",
        "title": "I hate cheese cake",
        "body": "I prefer chocolat cake",
        "tags": [ "food", "cake" ],
        "status": "archived",
        "publish_date": "1985-08-03"
      }
    }, {
      "_index": "example",
      "_type": "blogpost",
      "_id": "5",
      "_score": 1.0,
      "_source": {
        "author": "Will Smith",
        "title": "I admire lions",
        "body": "They are so regal",
        "tags": [ "wild animal", "animal", "lion" ],
        "status": "published",
        "publish_date": "2016-08-02"
      }
    } ]
  }
}

Unlike filter, which gives a score of 0 when used alone, the must_not occurrence type gives a score of 1 when used alone. If you want to control this, you can either wrap the query in a constant_score query, which gives every matching document the same constant score, or put the bool query with the must_not occurrence type inside a filter (as we did in the previous example) to get a score of 0.
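The constant_score variant mentioned above can be sketched as follows, under the same assumptions as our other examples (the example index and a hypothetical ES_URL variable defaulting to http://localhost:9200). The bool query with must_not goes into the filter of a constant_score query, so every hit gets the same score. We would expect the same three documents as above; the score value itself is not shown because it may depend on your Elasticsearch version and on the boost parameter.

```shell
# Assumption: Elasticsearch 2.3-5.x on ES_URL (hypothetical variable) with the "example" index.
ES_URL="${ES_URL:-http://localhost:9200}"

# constant_score ignores relevance: every document that passes the filter
# receives the same score.
QUERY='{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must_not": {
            "term": {"status": "pending"}
          }
        }
      }
    }
  }
}'

curl -g -s -X POST "$ES_URL/example/blogpost/_search?pretty" -d "$QUERY" \
  || echo "Could not reach Elasticsearch at $ES_URL" >&2
```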

Usage of must_not with multiple queries

If you need more than one query in the must_not occurrence type, replace the query object with an array of query objects. The following request excludes all documents where the status field is equal to “pending” or the tags field contains “pet”:

curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "bool": {
      "must_not": [
        {"term": {"status": "pending" }},
        {"term": {"tags": "pet" }}
      ]
    }
  }
}'

Reply:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1.0,
    "hits": [ {
      "_index": "example",
      "_type": "blogpost",
      "_id": "4",
      "_score": 1.0,
      "_source": {
        "author": "Jane Doe",
        "title": "I hate cheese cake",
        "body": "I prefer chocolat cake",
        "tags": [ "food", "cake" ],
        "status": "archived",
        "publish_date": "1985-08-03"
      }
    }, {
      "_index": "example",
      "_type": "blogpost",
      "_id": "5",
      "_score": 1.0,
      "_source": {
        "author": "Will Smith",
        "title": "I admire lions",
        "body": "They belong to the Savanna",
        "tags": [ "wild animal", "animal", "lion" ],
        "status": "published",
        "publish_date": "2016-08-02"
      }
    } ]
  }
}

The must occurrence type

The must occurrence type can be used like the filter occurrence type, with the difference that it contributes to the score. Let’s look at the scores we get by replacing filter with must in the previous examples. The AND examples give the same score to all matching documents; this is due to the small number of documents in the corpus and their short length.
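To see where these scores come from, you can set the explain parameter in the search body: each hit then carries an _explanation object that breaks its score down into its components. A minimal sketch, reusing the second AND query of the next subsection and assuming the example index and a hypothetical ES_URL variable defaulting to http://localhost:9200; no output is shown since the breakdown depends on your version and data.

```shell
# Assumption: Elasticsearch 2.3-5.x on ES_URL (hypothetical variable) with the "example" index.
ES_URL="${ES_URL:-http://localhost:9200}"

# "explain": true adds an _explanation object to every hit,
# describing how its _score was computed.
QUERY='{
  "explain": true,
  "query": {
    "bool": {
      "must": [
        {"term": {"status": "published" }},
        {"range": {"publish_date": {"gte": "2015-01-01" }}}
      ]
    }
  }
}'

curl -g -s -X POST "$ES_URL/example/blogpost/_search?pretty" -d "$QUERY" \
  || echo "Could not reach Elasticsearch at $ES_URL" >&2
```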

Using a logical AND operator between fields

curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "status:published AND publish_date:[2015-01-01 TO *]"
        }
      }
    }
  }
}'
curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "bool": {
      "must": [
        {"term": {"status": "published" }},
        {"range": {"publish_date": {"gte": "2015-01-01" }}}
      ]
    }
  }
}'

Both examples above generate the same result, with the same scores:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1.8117931,
    "hits": [ {
      "_index": "example",
      "_type": "blogpost",
      "_id": "2",
      "_score": 1.8117931,
      "_source": {
        "author": "John Doe",
        "title": "I like dogs",
        "body": "They are loyal",
        "tags": [ "pet", "animal", "dog" ],
        "status": "published",
        "publish_date": "2016-08-01"
      }
    }, {
      "_index": "example",
      "_type": "blogpost",
      "_id": "5",
      "_score": 1.8117931,
      "_source": {
        "author": "Will Smith",
        "title": "I admire lions",
        "body": "They belong to the Savanna",
        "tags": [ "wild animal", "animal", "lion" ],
        "status": "published",
        "publish_date": "2016-08-02"
      }
    } ]
  }
}

Using a logical AND operator between terms

curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "author:(john AND doe)"
        }
      }
    }
  }
}'
curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "bool": {
      "must": {
        "match": {
          "author": {
            "query": "john doe",
            "operator": "and"
          }
        }
      }
    }
  }
}'
curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "bool": {
      "must": [
        {"match": {"author": "john" }},
        {"match": {"author": "doe" }}
      ]
    }
  }
}'

All examples above generate the same result:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1.0811163,
    "hits": [ {
      "_index": "example",
      "_type": "blogpost",
      "_id": "1",
      "_score": 1.0811163,
      "_source": {
        "author": "John Doe",
        "title": "I love cats",
        "body": "They are so cute",
        "tags": [ "pet", "animal", "cat" ],
        "status": "pending",
        "publish_date": "2016-08-03"
      }
    }, {
      "_index": "example",
      "_type": "blogpost",
      "_id": "2",
      "_score": 1.0811163,
      "_source": {
        "author": "John Doe",
        "title": "I like dogs",
        "body": "They are loyal",
        "tags": [ "pet", "animal", "dog" ],
        "status": "published",
        "publish_date": "2016-08-01"
      }
    } ]
  }
}

Using a logical OR operator between fields

curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "title:love OR tags:lion"
        }
      }
    }
  }
}'
curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "bool": {
      "must": {
        "bool": {
          "should": [
            {"match": {"title": "love"}},
            {"match": {"tags": "lion"}}
          ]
        }
      }
    }
  }
}'

Both examples above generate the same result:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.67751116,
    "hits": [ {
      "_index": "example",
      "_type": "blogpost",
      "_id": "5",
      "_score": 0.67751116,
      "_source": {
        "author": "Will Smith",
        "title": "I admire lions",
        "body": "They belong to the Savanna",
        "tags": [ "wild animal", "animal", "lion" ],
        "status": "published",
        "publish_date": "2016-08-02"
      }
    }, {
      "_index": "example",
      "_type": "blogpost",
      "_id": "1",
      "_score": 0.33875558,
      "_source": {
        "author": "John Doe",
        "title": "I love cats",
        "body": "They are so cute",
        "tags": [ "pet", "animal", "cat" ],
        "status": "pending",
        "publish_date": "2016-08-03"
      }
    } ]
  }
}

Using a logical OR operator between terms

curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "status:(published OR pending OR refused)"
        }
      }
    }
  }
}'
curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "bool": {
      "must": {
        "bool": {
          "should": [
            {"term": {"status": "published" }},
            {"term": {"status": "pending" }},
            {"term": {"status": "refused" }}
          ],
          "minimum_should_match": 1
        }
      }
    }
  }
}'
curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "bool": {
      "must": {
        "match": {
          "status": {
            "query": "published pending refused",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}'

Reply:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 0.22560257,
    "hits": [ {
      "_index": "example",
      "_type": "blogpost",
      "_id": "1",
      "_score": 0.22560257,
      "_source": {
        "author": "John Doe",
        "title": "I love cats",
        "body": "They are so cute",
        "tags": [ "pet", "animal", "cat" ],
        "status": "pending",
        "publish_date": "2016-08-03"
      }
    }, {
      "_index": "example",
      "_type": "blogpost",
      "_id": "2",
      "_score": 0.22560257,
      "_source": {
        "author": "John Doe",
        "title": "I like dogs",
        "body": "They are loyal",
        "tags": [ "pet", "animal", "dog" ],
        "status": "published",
        "publish_date": "2016-08-01"
      }
    }, {
      "_index": "example",
      "_type": "blogpost",
      "_id": "3",
      "_score": 0.22560257,
      "_source": {
        "author": "John Smith",
        "title": "I hate fish",
        "body": "They do not bring the ball back",
        "tags": [ "pet", "animal", "fish" ],
        "status": "pending",
        "publish_date": "2017-08-03"
      }
    }, {
      "_index": "example",
      "_type": "blogpost",
      "_id": "5",
      "_score": 0.22560257,
      "_source": {
        "author": "Will Smith",
        "title": "I admire lions",
        "body": "They belong to the Savanna",
        "tags": [ "wild animal", "animal", "lion" ],
        "status": "published",
        "publish_date": "2016-08-02"
      }
    } ]
  }
}

Using a logical NOT operator

curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "-status:pending"
        }
      }
    }
  }
}'
curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "bool": {
      "must": {
        "bool": {
          "must_not": {
            "term": {"status": "pending" }
          }
        }
      }
    }
  }
}'

(The second example is somewhat redundant, since we could use must_not directly, as in the must_not section.)

Both examples above generate the same result:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1.0,
    "hits": [ {
      "_index": "example",
      "_type": "blogpost",
      "_id": "2",
      "_score": 1.0,
      "_source": {
        "author": "John Doe",
        "title": "I like dogs",
        "body": "They are loyal",
        "tags": [ "pet", "animal", "dog" ],
        "status": "published",
        "publish_date": "2016-08-01"
      }
    }, {
      "_index": "example",
      "_type": "blogpost",
      "_id": "4",
      "_score": 1.0,
      "_source": {
        "author": "Jane Doe",
        "title": "I hate cheese cake",
        "body": "I prefer chocolat cake",
        "tags": [ "food", "cake" ],
        "status": "archived",
        "publish_date": "1985-08-03"
      }
    }, {
      "_index": "example",
      "_type": "blogpost",
      "_id": "5",
      "_score": 1.0,
      "_source": {
        "author": "Will Smith",
        "title": "I admire lions",
        "body": "They belong to the Savanna",
        "tags": [ "wild animal", "animal", "lion" ],
        "status": "published",
        "publish_date": "2016-08-02"
      }
    } ]
  }
}

The should occurrence type

The should occurrence type differs from the 3 others: it lets you specify queries that “SHOULD” match the documents. If it is used without the filter or must occurrence types, at least one of its queries has to match the document, so it can be seen as a logical OR operator. When filter or must is present, the should queries are optional by default and only raise the score of the documents they match. This behaviour can be modified with the minimum_should_match parameter, which specifies a number or a percentage of queries that have to match for a document to be selected. You can see all available value formats of minimum_should_match in the Elasticsearch documentation.
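To see what happens when a must clause is present, here is a sketch of our own, assuming the example index and a hypothetical ES_URL variable defaulting to http://localhost:9200. The must clause selects the published documents and the should clause becomes optional: we would expect both published posts to be returned, with “I like dogs” ranked first because it also matches the should query.

```shell
# Assumption: Elasticsearch 2.3-5.x on ES_URL (hypothetical variable) with the "example" index.
ES_URL="${ES_URL:-http://localhost:9200}"

# With a must clause present, no should query is required by default;
# matching the should query only increases the score.
QUERY='{
  "query": {
    "bool": {
      "must": {"term": {"status": "published"}},
      "should": {"match": {"title": "dogs"}}
    }
  }
}'

curl -g -s -X POST "$ES_URL/example/blogpost/_search?pretty" -d "$QUERY" \
  || echo "Could not reach Elasticsearch at $ES_URL" >&2
```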

Usage of minimum_should_match

curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "bool": {
      "should": [
        {"term": {"status": "published" }},
        {"term": {"tags": "cake" }},
        {"match": {"body": "regal" }}
      ]
    }
  }
}'

Since we use neither the filter nor the must occurrence type, minimum_should_match defaults to 1.

Reply:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.98358554,
    "hits": [ {
      "_index": "example",
      "_type": "blogpost",
      "_id": "5",
      "_score": 0.98358554,
      "_source": {
        "author": "Will Smith",
        "title": "I admire lions",
        "body": "They are so regal",
        "tags": [ "wild animal", "animal", "lion" ],
        "status": "published",
        "publish_date": "2016-08-02"
      }
    }, {
      "_index": "example",
      "_type": "blogpost",
      "_id": "4",
      "_score": 0.3945096,
      "_source": {
        "author": "Jane Doe",
        "title": "I hate cheese cake",
        "body": "I prefer chocolat cake",
        "tags": [ "food", "cake" ],
        "status": "archived",
        "publish_date": "1985-08-03"
      }
    }, {
      "_index": "example",
      "_type": "blogpost",
      "_id": "2",
      "_score": 0.24522427,
      "_source": {
        "author": "John Doe",
        "title": "I like dogs",
        "body": "They are loyal",
        "tags": [ "pet", "animal", "dog" ],
        "status": "published",
        "publish_date": "2016-08-01"
      }
    } ]
  }
}

Now let’s require at least 2 of the 3 queries to match by setting minimum_should_match to 2:

curl -g -X POST "http://localhost:9200/example/blogpost/_search?pretty" -d '{
  "query": {
    "bool": {
      "should": [
        {"term": {"status": "published" }},
        {"term": {"tags": "cake" }},
        {"match": {"body": "regal" }}
      ],
      "minimum_should_match": 2
    }
  }
}'

Reply:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.98358554,
    "hits": [ {
      "_index": "example",
      "_type": "blogpost",
      "_id": "5",
      "_score": 0.98358554,
      "_source": {
        "author": "Will Smith",
        "title": "I admire lions",
        "body": "They are so regal",
        "tags": [ "wild animal", "animal", "lion" ],
        "status": "published",
        "publish_date": "2016-08-02"
      }
    } ]
  }
}
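minimum_should_match also accepts a percentage. In the sketch below (same assumptions: the example index and a hypothetical ES_URL variable defaulting to http://localhost:9200), 67% of the 3 should queries is 2.01, which Elasticsearch rounds down to 2, so this query should behave like the previous one and return only the “I admire lions” post.

```shell
# Assumption: Elasticsearch 2.3-5.x on ES_URL (hypothetical variable) with the "example" index.
ES_URL="${ES_URL:-http://localhost:9200}"

# A positive percentage is applied to the number of should queries
# and rounded down: 3 * 67% = 2.01 -> at least 2 queries must match.
QUERY='{
  "query": {
    "bool": {
      "should": [
        {"term": {"status": "published" }},
        {"term": {"tags": "cake" }},
        {"match": {"body": "regal" }}
      ],
      "minimum_should_match": "67%"
    }
  }
}'

curl -g -s -X POST "$ES_URL/example/blogpost/_search?pretty" -d "$QUERY" \
  || echo "Could not reach Elasticsearch at $ES_URL" >&2
```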

Documentation links