Persistent Query Service

As we are all painfully aware, searching within OpenSocial (OS) definitely needs some more work. It does not need to be overly complicated, though. Among other things, I would like to propose a new Persistent Query Service as a Core API. The service would allow Queries to be created by a client, then accessed repeatedly later via GET requests. All queries would be specific to individual users/resources. For instance:

  POST /api/query/@me HTTP/1.1 
  Host: example.org 
  Content-Type: application/json 
  Authorization: Bearer 123456abcdef 
  Link: <http://opensocial.org/specs/3.0>; rel="implements" 

  { 
    "displayName": "My Search", 
    "objectTypes": ["person","event","file"], 
    "filters" : [ 
      {"field":"displayName","op":"contains","value":"foo"}
    ] 
  }

I'm still working on the specific Query document format... essentially, it would be an extensible JSON structure that would allow basic kinds of filters to be established for the query.

The server would create the query and hand me back a URL for it...

  HTTP/1.1 201 Created 
  Location: /api/query/@me/query-id-1
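From the client side, the create step above might look roughly like the following Python sketch. It only builds the request and does not send it; the helper name build_create_query_request is made up for illustration, and the endpoint path and token are simply the ones from the example above.

```python
import json
from urllib.request import Request


def build_create_query_request(base_url, token, query_doc):
    """Build (but do not send) the POST that would create a persistent query.

    Hypothetical helper: the path /api/query/@me comes from the example
    above, everything else is an assumption of this sketch.
    """
    body = json.dumps(query_doc).encode("utf-8")
    return Request(
        url=base_url + "/api/query/@me",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,
        },
    )


query_doc = {
    "displayName": "My Search",
    "objectTypes": ["person", "event", "file"],
    "filters": [{"field": "displayName", "op": "contains", "value": "foo"}],
}
req = build_create_query_request("http://example.org", "123456abcdef", query_doc)
```

Passing req to urllib.request.urlopen would send it; a client would then read the Location header of the 201 response to learn the query URL.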

Afterwards, I can perform a GET at any time on that URL to get the current results, which would be returned using the basic Activity Streams collection format...

  GET /api/query/@me/query-id-1 HTTP/1.1 
  Authorization: Bearer 123456abcdef

...

  HTTP/1.1 200 OK 
  Content-Type: application/json 
  Link: <http://opensocial.org/specs/3.0>; rel="implements" 

  { 
    "displayName": "My Search", 
    "objectTypes": ["person","event","file"], 
    "filters" : [ 
      {"field":"displayName","op":"contains","value":"foo"}
    ], 
    "items" : [ 
      { 
        "objectType":"person", 
        "displayName": "Foo Bar Baz", 
        ... 
      }, 
      { 
        "objectType":"event", 
        "displayName": "Foo Event", 
        ... 
      } 
    ] 
  }

To modify the query later, I can simply perform a PUT or PATCH operation on the Query URL, or send a DELETE.

To determine which queries are available to a given resource, the server would provide a single non-modifiable query for all resources... e.g.

  GET /api/query/@me/@queries HTTP/1.1
  Authorization: Bearer 123456abcdef

Which would return a collection of the resource's available queries...

  HTTP/1.1 200 OK 
  Content-Type: application/json 
  Link: <http://opensocial.org/specs/3.0>; rel="implements" 

  { 
    "displayName": "My Queries", 
    "objectTypes": ["query"], 
    "items" : [ 
      { 
        "objectType": "query", 
        "displayName": "My Search", 
        "objectTypes": ["person","event","file"], 
        "filters" : [ 
          {"field":"displayName","op":"contains","value":"foo"}
        ], 
        "url":"http://example.org/api/query/@me/query-id-1" 
      }
    ] 
  }
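To make the lifecycle so far concrete (create, fetch current results, list, delete), here is a toy in-memory sketch in Python. It is not a proposed implementation: the QueryStore class and its method names are hypothetical, only the "contains" op is handled, and I am assuming "contains" ignores case because the sample results above match "foo" against "Foo Bar Baz". The "url" values are relative paths here, where the listing above shows absolute URLs.

```python
import itertools


def _matches(doc, obj):
    """True if obj has one of the query's objectTypes and passes every filter."""
    if obj.get("objectType") not in doc.get("objectTypes", []):
        return False
    for f in doc.get("filters", []):
        if f["op"] != "contains":
            raise ValueError("unsupported op in this sketch: %r" % f["op"])
        # Assumption: "contains" is case-insensitive.
        if f["value"].lower() not in str(obj.get(f["field"], "")).lower():
            return False
    return True


class QueryStore:
    """Toy stand-in for the proposed service (all names hypothetical)."""

    def __init__(self, objects):
        self.objects = objects      # the searchable data, a list of dicts
        self.queries = {}           # (owner, query_id) -> query document
        self._ids = itertools.count(1)

    def create(self, owner, doc):
        """POST /api/query/{owner}: store the query and return its URL path."""
        query_id = "query-id-%d" % next(self._ids)
        self.queries[(owner, query_id)] = dict(doc)
        return "/api/query/%s/%s" % (owner, query_id)

    def run(self, owner, query_id):
        """GET on the query URL: the query document plus its current items."""
        doc = self.queries[(owner, query_id)]
        items = [o for o in self.objects if _matches(doc, o)]
        return dict(doc, items=items)

    def list_queries(self, owner):
        """GET /api/query/{owner}/@queries: collection of stored queries."""
        items = [
            dict(doc, objectType="query", url="/api/query/%s/%s" % (o, qid))
            for (o, qid), doc in self.queries.items()
            if o == owner
        ]
        return {"displayName": "My Queries", "objectTypes": ["query"], "items": items}

    def delete(self, owner, query_id):
        """DELETE on the query URL."""
        del self.queries[(owner, query_id)]


store = QueryStore([
    {"objectType": "person", "displayName": "Foo Bar Baz"},
    {"objectType": "event", "displayName": "Foo Event"},
    {"objectType": "person", "displayName": "Someone Else"},
])
location = store.create("@me", {
    "displayName": "My Search",
    "objectTypes": ["person", "event", "file"],
    "filters": [{"field": "displayName", "op": "contains", "value": "foo"}],
})
results = store.run("@me", "query-id-1")
```

Because run re-evaluates the stored query on every call, objects added to the data later show up in subsequent GETs without the client re-sending the query.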

The system would obviously be free to provide its own collection of persistent queries. For example, a Forum or Community app built on the OpenSocial server would be able to create its own queries for specific resources: listing the members of the community, listing the most recent messages, returning the URLs of all the profile photos for the members of the forum, and so on. The point is that the Query Service would provide a single, consistent interface through which persistent search operations can be provided.

Strawman Query Document Syntax

Ok, this is just a strawman... I don't know if I would recommend going this direction or not but wanted to at least get it documented...

Basically, imagine a JSON-based filter expression format... each object is a "Test". A test can be either Discrete or Aggregate. It's easier to show by example... suppose I want to create a persistent query for the profile information of three specific people:

{
  "objectType":"query",
  "displayName": "My Query for users abc123, abc124 and abc125",
  "test":  {
    "and" : [
      {"field":"objectType", "value":"person"},
      {"or": [
        {"field": "id", "value": "abc123"},
        {"field": "id", "value": "abc124"},
        {"field": "id", "value": "abc125"}
      ]}
    ]
  }
}

The value of the "test" property is a "Test" object, which, in this case, happens to be an "Aggregate Test". Aggregate Test objects have one of three fields, "and", "or", or "not", whose value is an array of Test objects. In this example, the value of "test" is true if every test contained in the "and" array evaluates to true.

The "and" array contains two Test objects, one discrete and one aggregate. The discrete test, {"field":"objectType","value":"person"}, tests that the field "objectType" is equal to the value "person". The aggregate test, using "or", evaluates to true if any of the contained tests evaluates to true. The contained tests each test the value of the "id" field.

The plain English form of this query would be, "Return objects whose 'objectType' equals 'person' and whose 'id' equals 'abc123', 'abc124' or 'abc125'."
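The evaluation rules just described fit in a few lines of Python. This is only a sketch: the function name evaluate is mine, discrete tests are treated as plain equality, and the meaning of "not" (true when none of its contained tests pass) is an assumption, since the strawman does not pin it down.

```python
def evaluate(test, obj):
    """Return True if obj (a dict) satisfies the Test object."""
    if "and" in test:
        return all(evaluate(t, obj) for t in test["and"])
    if "or" in test:
        return any(evaluate(t, obj) for t in test["or"])
    if "not" in test:
        # Assumption: "not" holds when none of the contained tests pass.
        return not any(evaluate(t, obj) for t in test["not"])
    # Discrete test: the named field must equal the given value.
    return obj.get(test["field"]) == test["value"]


query = {
    "objectType": "query",
    "displayName": "My Query for users abc123, abc124 and abc125",
    "test": {
        "and": [
            {"field": "objectType", "value": "person"},
            {"or": [
                {"field": "id", "value": "abc123"},
                {"field": "id", "value": "abc124"},
                {"field": "id", "value": "abc125"},
            ]},
        ]
    },
}

matched = evaluate(query["test"], {"objectType": "person", "id": "abc124"})
```

Here matched is True; a person with any other id, or an event with id abc123, would fail the "and".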

Let's suppose the value of a field is itself an object that we need to test... for instance, we want only profiles whose "name" property contains the letters "snell", ignoring case...

{  
  "objectType":"query",  
  "displayName": "My Query",  
  "test":  {
    "and" : [
      {"field":"objectType", "value":"person"},
      {"field":"name",
        "test" : {
         "field" : "lastName",
         "op" : "contains",
         "value" : "snell",
          "ignore-case": true
        }
      }
    ]
  }
}

Here, we have two discrete tests that both must evaluate to true. The second test, however, specifies its own "test" rather than a "value". The scope of that test is the value of the "name" property. It checks whether the "lastName" property contains the value "snell" and specifies that case should be ignored.

I could choose to make that a bit more complex of a query if I wanted...

{  
  "objectType":"query",  
  "displayName": "My Query",  
  "test":  {
    "and" : [
      {"field":"objectType", "value":"person"},
      {"field":"name",
        "test" : {
          "and": [
            {
             "field" : "lastName",
             "op" : "contains",
             "value" : "snell",
             "ignore-case": true },
            {
             "field" : "firstName", 
             "op" : "starts-with",
             "value" : "ja",
             "ignore-case": true }
          ]
        }
      }
    ]
  }
}
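Extending the evaluation to nested tests, "op" and "ignore-case" could look like the Python sketch below. Several things here are my assumptions rather than part of the strawman: the function name evaluate, treating a missing "op" as equality (called "equals" internally), the "not" semantics, and returning false when a nested test is applied to a field that is not an object.

```python
def _lower(value):
    """Lowercase strings; leave other values untouched."""
    return value.lower() if isinstance(value, str) else value


def evaluate(test, scope):
    """Return True if scope (a dict) satisfies the Test object."""
    if "and" in test:
        return all(evaluate(t, scope) for t in test["and"])
    if "or" in test:
        return any(evaluate(t, scope) for t in test["or"])
    if "not" in test:
        # Assumption: "not" holds when none of the contained tests pass.
        return not any(evaluate(t, scope) for t in test["not"])

    actual = scope.get(test["field"]) if isinstance(scope, dict) else None
    if "test" in test:
        # Nested test: its scope is the value of the named field.
        return isinstance(actual, dict) and evaluate(test["test"], actual)

    expected = test["value"]
    if test.get("ignore-case"):
        actual, expected = _lower(actual), _lower(expected)
    op = test.get("op", "equals")  # assumption: no "op" means equality
    if op == "equals":
        return actual == expected
    if op not in ("contains", "starts-with"):
        raise ValueError("unsupported op: %r" % op)
    if not (isinstance(actual, str) and isinstance(expected, str)):
        return False
    return expected in actual if op == "contains" else actual.startswith(expected)


test = {
    "and": [
        {"field": "objectType", "value": "person"},
        {"field": "name", "test": {"and": [
            {"field": "lastName", "op": "contains",
             "value": "snell", "ignore-case": True},
            {"field": "firstName", "op": "starts-with",
             "value": "ja", "ignore-case": True},
        ]}},
    ]
}
person = {"objectType": "person",
          "name": {"firstName": "James", "lastName": "Snell"}}
matched = evaluate(test, person)
```

With this person, matched is True; changing firstName to "Bob" makes the nested "and" fail.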

Again, this is just a strawman... I'm still exploring the options for when the value of the field happens to be an array. Some means of determining whether the array is empty, and of testing the members of the array, will be required.

Now, this all assumes that a structured query syntax is necessary... we could punt on all that and go with string based evaluations based on other defined languages...

{  
  "objectType":"query",  
  "displayName": "My Query",  
  "sql": "SELECT * FROM PEOPLE WHERE ID IN ('abc123','abc124','abc125')"
}

We could actually allow implementations the freedom to choose among a variety of alternatives here by defining a single must-implement choice and allowing other expression languages to be specified via extension. We have plenty of options to choose from.

Again, this is all just brainstorming. Comments and feedback are welcome.
