Searching for Data

The search module provides functionality for querying the ANTARES Database. Queries are run against an ElasticSearch cluster, so they must be written in the ElasticSearch query syntax.

Let’s say that we are interested in finding all loci with:

  • Between 50 and 100 magnitude measurements

  • Tagged as a nuclear transient

We represent this query in Python as follows:

query = {
    "query": {
        "bool": {
            "filter": [
                {
                    "range": {
                        "properties.num_mag_values": {
                            "gte": 50,
                            "lte": 100,
                        }
                    }
                },
                {
                    "term": {
                        "tags": "nuclear_transient"
                    }
                }
            ]
        }
    }
}

We can then search the ANTARES database for matching objects:

from antares_client.search import search
first_result = next(search(query))

The search function returns an iterator over the loci in the result set. The result set is therefore not held in memory unless you materialize it yourself, for example with result_set = list(search(query)). Because result sets can be very large, we recommend against doing so. Prefer operations on the iterator instead:

for locus in search(query):
    do_something(locus)
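
When only the first few loci are needed, the iterator can be truncated lazily instead of being converted to a full list. The following is a minimal sketch using itertools.islice from the standard library. Both take and fake_search are hypothetical names introduced for illustration; fake_search stands in for search(query) so that the snippet runs without a connection to ANTARES.

```python
from itertools import islice


def take(results, n):
    """Return at most the first n items of an iterator, consuming no more than n."""
    return list(islice(results, n))


def fake_search():
    # Hypothetical stand-in for search(query): yields placeholder loci lazily.
    for i in range(1_000_000):
        yield {"locus_id": i}


# Only five items are ever generated; the remaining ones are never produced.
first_five = take(fake_search(), 5)
print([locus["locus_id"] for locus in first_five])
```

With the real client, the same helper would be called as take(search(query), 5).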

Query Helpers

We plan to provide a number of tools to simplify writing queries in the future. In the meantime, you can use the Python elasticsearch_dsl library to remove some of the boilerplate associated with structuring ElasticSearch queries.

If you’ve run pip install elasticsearch-dsl, you could also accomplish the previous example with:

from antares_client.search import search
from elasticsearch_dsl import Search

query = (
    Search()
    .filter("range", **{"properties.num_mag_values": {"gte": 50, "lte": 100}})
    .filter("term", tags="nuclear_transient")
    .to_dict()
)
first_result = next(search(query))
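
If adding a dependency is undesirable, the same query dictionary can be assembled with a small plain-Python function. This is only a sketch: mag_count_and_tag_query is a hypothetical helper written for this example and is not part of antares_client.

```python
def mag_count_and_tag_query(min_values, max_values, tag):
    """Build a query for loci with a bounded number of magnitude values and a given tag."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # Inclusive bounds on the number of magnitude measurements.
                    {
                        "range": {
                            "properties.num_mag_values": {
                                "gte": min_values,
                                "lte": max_values,
                            }
                        }
                    },
                    # Exact match on the tag name.
                    {"term": {"tags": tag}},
                ]
            }
        }
    }


query = mag_count_and_tag_query(50, 100, "nuclear_transient")
```

The resulting query can be passed to search(query) exactly like the hand-written dictionary.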

Query Syntax

Queries can have a complex and deeply nested structure. Most queries are nested within a bool structure, which allows multiple conditions to be combined. Let’s look at the conditional structures:

Must

A document must match every clause in order to be returned. Think of this as an analog to AND. You can supply multiple conditions by placing them in a list using square brackets ([]).

{
  "query":{
    "bool":{
      "must":[
         {
           "match":{
               "properties.passband.keyword": "g"
             }
         },
         {
           "range":{
             "properties.ztf_magdiff":{
               "gte": 0.25
             }
           }
         }
      ]
    }
  }
}

Should

Documents that match one or more of the criteria are returned. should is not exclusive; think of it as the analog to OR. These clauses can also be placed in a list.

{
  "query":{
    "bool":{
      "should":[
         {
           "range":{
             "properties.ztf_srmag1":{
               "gte": "16.01"
             }
           }
         },
         {
           "range":{
             "properties.ztf_srmag1":{
               "lte": "14.99"
             }
           }
         }
      ]
    }
  }
}

Must Not

must_not is the logical NOT operator: documents that match any of the listed clauses are excluded from the results.

{
  "query":{
    "bool":{
      "must_not":[
         {
           "match":{
               "properties.passband": "g"
             }
         },
         {
           "match":{
             "properties.passband": "R"
           }
         }
      ]
    }
  }
}
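
The must, should and must_not examples share the same outer shape, so the nesting can be generated rather than typed by hand. The sketch below assumes nothing beyond the dictionary layout shown in this section. bool_query is a hypothetical helper, not an antares_client function, and its filter_ parameter carries a trailing underscore only to avoid shadowing the Python built-in.

```python
def bool_query(must=None, should=None, must_not=None, filter_=None):
    """Assemble a bool query, omitting any clause type that was not supplied."""
    clauses = {
        "must": must,
        "should": should,
        "must_not": must_not,
        "filter": filter_,
    }
    # Keep only the clause types that received at least one condition.
    body = {name: list(items) for name, items in clauses.items() if items}
    return {"query": {"bool": body}}


# Reproduces the must_not example: exclude the "g" and "R" passbands.
query = bool_query(
    must_not=[
        {"match": {"properties.passband": "g"}},
        {"match": {"properties.passband": "R"}},
    ]
)
```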

Ranges

Ranges can have gt, lt, gte, lte (greater-than, less-than, greater-or-equal, less-or-equal respectively) conditions.

{
  "query":{
    "bool":{
      "should":[
         {
           "range":{
             "properties.ztf_srmag1":{
               "lt": "17.01",
               "gte": "16.01"
             }
           }
         },
         {
           "range":{
             "properties.ztf_srmag1":{
               "lte": "14.99"
             }
           }
         }
      ]
    }
  }
}
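
Mistyped operator names are easy to miss in a nested dictionary. As a sketch, a small function can build range clauses and reject anything other than the four comparison operators listed above. range_clause and ALLOWED_BOUNDS are hypothetical names for this example, and the check reflects only this helper's choice, not a restriction imposed by ElasticSearch or antares_client.

```python
ALLOWED_BOUNDS = {"gt", "gte", "lt", "lte"}


def range_clause(field, **bounds):
    """Return a range clause for `field`, rejecting unknown or missing bounds."""
    unknown = set(bounds) - ALLOWED_BOUNDS
    if unknown:
        raise ValueError(f"unsupported range operators: {sorted(unknown)}")
    if not bounds:
        raise ValueError("at least one bound is required")
    return {"range": {field: bounds}}


# Same structure as the example above, written with numeric bounds.
query = {
    "query": {
        "bool": {
            "should": [
                range_clause("properties.ztf_srmag1", gte=16.01, lt=17.01),
                range_clause("properties.ztf_srmag1", lte=14.99),
            ]
        }
    }
}
```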

Set Membership

You can search for records whose property takes one of a given set of values by using a terms clause.

{
  "query": {
    "bool": {
      "filter": {
        "terms": {
          "locus_id": [
            2042517,
            2085365,
            2471567,
            2627841,
            2761143,
            2797326,
            2822419,
            2896237
          ]
        }
      }
    }
  }
}
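
ElasticSearch caps the number of values allowed in a single terms clause (65,536 by default, through the index.max_terms_count setting); whether the ANTARES cluster changes that limit is not stated here. For long ID lists, one option is to split the values into several smaller queries. The sketch below assumes this approach is acceptable for your use case. terms_queries and its chunk_size default are hypothetical and chosen only for illustration.

```python
def terms_queries(field, values, chunk_size=1000):
    """Yield one terms-filter query per chunk of `values`."""
    values = list(values)
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        yield {"query": {"bool": {"filter": {"terms": {field: chunk}}}}}


locus_ids = [2042517, 2085365, 2471567, 2627841, 2761143]

# A deliberately small chunk size to show the splitting behaviour.
queries = list(terms_queries("locus_id", locus_ids, chunk_size=2))
print(len(queries))
```

Each generated query can then be passed to search in turn, and the resulting iterators processed one after another.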

Compound Queries

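You can combine these conditional clauses to write more advanced queries. Note that when a bool query also contains a must or filter clause, its should clauses become optional by default: they influence relevance scoring but no longer restrict which documents are returned. For example: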

{
  "query":{
    "bool":{
      "must_not":[
         {
           "match":{
               "properties.passband": "g"
             }
         },
         {
           "range":{
             "dec":{
               "gte":20.23,
               "lte":28.00
             }
           }
         }
      ],
      "must":[
        {
          "range":{
            "ingest_time":{
              "gte": 1551398400,
              "lt": 1554076800
            }
          }
        }
      ],
      "should":[
         {
           "range":{
             "ra":{
               "lte": 66.13
             }
           }
         }
      ]
    }
  }
}
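
The ingest_time bounds in the query above equal midnight UTC on 1 March 2019 and 1 April 2019 when read as Unix timestamps in seconds. Assuming that interpretation is correct (the unit is not stated here), such bounds can be computed from calendar dates rather than typed by hand. utc_epoch_seconds is a hypothetical helper written for this sketch.

```python
from datetime import datetime, timezone


def utc_epoch_seconds(year, month, day):
    """Return the Unix timestamp (whole seconds) of midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


# Half-open interval covering March 2019: start inclusive, end exclusive.
ingest_range = {
    "range": {
        "ingest_time": {
            "gte": utc_epoch_seconds(2019, 3, 1),
            "lt": utc_epoch_seconds(2019, 4, 1),
        }
    }
}
print(ingest_range["range"]["ingest_time"])
# {'gte': 1551398400, 'lt': 1554076800}
```

Using gte for the start and lt for the end keeps consecutive monthly ranges from overlapping.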