Juxtapus

Team Name: Dr Tarantism

Introduction

Juxtapus is a new kind of search engine, designed to simplify the process of accessing open government data. The user asks a simple question in plain English, such as “What is the life expectancy of indigenous people?”, and Juxtapus seeks out the answer. What sets Juxtapus apart from conventional search engines, however, is that it delivers a single result: exactly what the user asked for.

Why did you choose to build it?

After looking closely at the kind of data that exists, I realised that an opportunity was staring me in the face: I should make accessing government data as simple as checking email, scrolling through a Twitter or Facebook newsfeed, or searching Google. I asked myself, “What do these tasks have in common?” They are simple, they involve as few steps as possible, they reduce our dependence on the mainstream media, and, most of all, they can be done spontaneously: you can find interesting information without a plan or purpose.

The Internet showed us that we could live in a world where information was no longer controlled by a handful of people. The early web, however, still required some technical expertise to make the most of it; at the very least, a person needed a basic understanding of HTML to create a web page. In time, though, services that simplified the process of creating a web presence proliferated and ushered us into a new digital age. The web became more personal and more user-friendly. I hope to achieve a similar result for open data.

What does it do?

The user asks a simple question, such as “How much income tax was paid per person in 2011-2012?”, and Juxtapus finds and delivers the data. (This example is purely hypothetical and does not reflect the ATO dataset.)

This begins with Juxtapus closely examining the user's search query. First, Juxtapus checks whether the user asked about a valid subject; in this case, the subject "income tax" is detected as valid. Next, Juxtapus checks whether the query contains a question preface that is valid for this particular subject (e.g. "how many", "where is"); here, the preface "how much" is detected as valid. Juxtapus then examines the query for "data shaping constraints" relevant to this question and topic: elements of the question that specify how the data should be limited or modified, for example restricting the results to a particular city or time period. In this case, "per person" is a valid shaping constraint that tells Juxtapus to group the results using the AVERAGE math function, and "in 2011-2012" is another valid shaping constraint that tells Juxtapus which financial year to retrieve. Finally, Juxtapus consults its database to determine which script to run for this particular combination of topic and question. That script locates the data and delivers it to the user.
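The four steps above can be sketched roughly as follows. This is a minimal illustration in Python, not Juxtapus' actual ASP.NET code: the `SubjectEntry` index structure, the `parse_query` function, the script name `income_tax_totals` and the financial-year pattern are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SubjectEntry:
    """One hypothetical index entry created by an analyst."""
    subject: str                      # e.g. "income tax"
    prefaces: dict[str, str]          # valid preface -> script name to run
    aggregates: dict[str, str] = field(default_factory=dict)  # phrase -> math function


@dataclass
class ParsedQuery:
    subject: str
    preface: str
    script: str
    aggregate: Optional[str]
    financial_year: Optional[str]


# Assumed format for a financial-year constraint, e.g. "2011-2012".
YEAR_PATTERN = re.compile(r"\b\d{4}-\d{4}\b")


def parse_query(query: str, index: list[SubjectEntry]) -> Optional[ParsedQuery]:
    text = query.lower().strip()
    for entry in index:
        # Step 1: does the query mention a valid subject?
        if entry.subject not in text:
            continue
        # Step 2: does it begin with a preface valid for this subject?
        preface = next((p for p in entry.prefaces if text.startswith(p)), None)
        if preface is None:
            continue
        # Step 3: collect data shaping constraints.
        aggregate = next(
            (fn for phrase, fn in entry.aggregates.items() if phrase in text), None
        )
        year = YEAR_PATTERN.search(text)
        # Step 4: the (subject, preface) pair selects the script to run.
        return ParsedQuery(
            subject=entry.subject,
            preface=preface,
            script=entry.prefaces[preface],
            aggregate=aggregate,
            financial_year=year.group(0) if year else None,
        )
    return None  # no valid subject/preface combination was found


# Hypothetical index with a single entry.
INDEX = [
    SubjectEntry(
        subject="income tax",
        prefaces={"how much": "income_tax_totals"},
        aggregates={"per person": "AVERAGE"},
    )
]
```

For example, `parse_query("How much income tax was paid per person in 2011-2012?", INDEX)` yields the subject "income tax", the preface "how much", the script "income_tax_totals", the aggregate "AVERAGE" and the financial year "2011-2012". Subject detection here is plain substring matching; a real system would need to cope with synonyms and misspellings.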

For a more detailed description visit: http://juxtapus.azurewebsites.net/about.aspx

Who is it for?

Initially, the intended audience is the general public and journalists.

In the future, however, I intend to extend Juxtapus’ capabilities so that it will be useful to data scientists, government departments and corporations. I will achieve this by allowing more abstract questions to be asked, such as “What are the outlying trends in our national statistics?” or “What correlations exist between past federal budgets and growth in the subsequent five years?”

What data have you reused?

In principle, Juxtapus can access any dataset that is available via a web service. I should admit, though, that I am assuming all such datasets contain tabular data in JSON format. If this is not always the case, my code would need to be extended to handle other kinds of dataset.
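A minimal Python sketch of what the "tabular JSON" assumption means in practice follows. It assumes that "tabular" means a top-level JSON array of flat objects; the function name `to_rows` is hypothetical, and real web services often wrap their records in an envelope object, which this sketch does not handle.

```python
import json
from typing import Any


def to_rows(payload: str) -> tuple[list[str], list[list[Any]]]:
    """Return (columns, rows) for a tabular JSON payload, else raise ValueError.

    Assumption: "tabular" means a JSON array of objects whose values are
    all scalars (string, number, boolean or null).
    """
    data = json.loads(payload)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of records")
    columns: list[str] = []
    for record in data:
        if not isinstance(record, dict):
            raise ValueError("every record must be a JSON object")
        for key, value in record.items():
            if isinstance(value, (dict, list)):
                raise ValueError(f"nested value in column {key!r} is not tabular")
            if key not in columns:
                columns.append(key)  # keep columns in first-seen order
    # Fill missing keys with None so every row has the same width.
    rows = [[record.get(col) for col in columns] for record in data]
    return columns, rows
```

Any payload that fails these checks is exactly the kind of dataset that would require the code to be extended.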

How have you reused it?

I have designed an app that allows my analysts (just me for the time being) to manually index any available dataset. Once a dataset has been indexed, all the user needs to do is ask a question related to it (or to several datasets) and Juxtapus will provide the answer.
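The manual index could be stored as a simple lookup table keyed by subject and preface, which is also what the final step of query processing consults. The sketch below uses Python's built-in `sqlite3` module purely for illustration; the table name `query_index`, its columns and the helper functions are hypothetical, and `https://example.org/income-tax.json` is a placeholder URL.

```python
import sqlite3
from typing import Optional


def create_index(conn: sqlite3.Connection) -> None:
    # One row per (subject, preface) pair that an analyst has indexed.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS query_index (
               subject     TEXT NOT NULL,
               preface     TEXT NOT NULL,
               dataset_url TEXT NOT NULL,
               script      TEXT NOT NULL,
               PRIMARY KEY (subject, preface)
           )"""
    )


def add_entry(conn: sqlite3.Connection, subject: str, preface: str,
              dataset_url: str, script: str) -> None:
    # Called by an analyst when indexing a dataset by hand.
    # Keys are lower-cased so lookups are case-insensitive.
    conn.execute(
        "INSERT OR REPLACE INTO query_index VALUES (?, ?, ?, ?)",
        (subject.lower(), preface.lower(), dataset_url, script),
    )
    conn.commit()


def find_script(conn: sqlite3.Connection, subject: str,
                preface: str) -> Optional[tuple[str, str]]:
    # Returns (dataset_url, script), or None if the pair was never indexed.
    return conn.execute(
        "SELECT dataset_url, script FROM query_index"
        " WHERE subject = ? AND preface = ?",
        (subject.lower(), preface.lower()),
    ).fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_index(conn)
    add_entry(conn, "Income tax", "How much",
              "https://example.org/income-tax.json", "income_tax_totals")
    # Prints ('https://example.org/income-tax.json', 'income_tax_totals')
    print(find_script(conn, "income tax", "how much"))
```

Keying on the (subject, preface) pair means each combination maps to exactly one script, and a question the analysts have not indexed simply returns no match.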

Technology used

I have not incorporated any open-source code beyond the standard ASP.NET libraries. I write all my own code, including the JSON deserialisation.

Future expansion (including entrepreneurial plans)

Abstract / Analytical Searches

In the future, it will be possible to ask questions that juxtapose several datasets against each other to support complex scientific work (hence the name “Juxtapus”). For example: “What numbers or trends frequently precede economic decline?” I would offer this as a premium service and charge a subscription fee for it.

Scripting Language

As the system expands, adding new scripts will become problematic if they must be hard-coded into the application, because programming is always prone to bugs and every new piece of hard-coded logic is another chance to introduce them into the application itself. My solution is to create a simple scripting language so that scripts can be uploaded to a database. Juxtapus then acts as an interpreter, executing the commands and doing its best to detect bugs.

Another advantage of this approach is that it allows me to offer a premium service in which external users (especially data scientists) can write their own scripts and execute them remotely.
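To make the interpreter idea concrete, here is a minimal Python sketch of a tiny, hypothetical script language with two commands, `FILTER` and `AVERAGE ... BY ...`. It is an illustration rather than the planned language: the command names and syntax are assumptions. "Detecting bugs" is modelled here as rejecting unknown commands, malformed arguments and misspelled column names, reporting the offending line number.

```python
from collections import defaultdict
from statistics import mean
from typing import Any

Row = dict[str, Any]


class ScriptError(Exception):
    """A bug detected in a script, reported with its line number."""


def _check_columns(line_no: int, rows: list[Row], *cols: str) -> None:
    # Catch misspelled column names instead of silently returning nothing.
    if not rows:
        return  # nothing to validate against
    known = set().union(*(row.keys() for row in rows))
    for col in cols:
        if col not in known:
            raise ScriptError(f"line {line_no}: unknown column {col!r}")


def run_script(script: str, rows: list[Row]) -> list[Row]:
    """Execute the script one line at a time, assuming uniform rows."""
    for line_no, line in enumerate(script.splitlines(), start=1):
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue  # skip blank lines and comments
        command, args = parts[0].upper(), parts[1:]
        if command == "FILTER":  # FILTER <column> <value>
            if len(args) != 2:
                raise ScriptError(f"line {line_no}: FILTER expects <column> <value>")
            col, value = args
            _check_columns(line_no, rows, col)
            rows = [row for row in rows if str(row.get(col)) == value]
        elif command == "AVERAGE":  # AVERAGE <column> BY <group column>
            if len(args) != 3 or args[1].upper() != "BY":
                raise ScriptError(f"line {line_no}: AVERAGE expects <column> BY <group>")
            value_col, group_col = args[0], args[2]
            _check_columns(line_no, rows, value_col, group_col)
            groups: dict[Any, list[float]] = defaultdict(list)
            for row in rows:
                groups[row[group_col]].append(row[value_col])
            rows = [{group_col: key, value_col: mean(vals)} for key, vals in groups.items()]
        else:
            raise ScriptError(f"line {line_no}: unknown command {parts[0]!r}")
    return rows
```

For example, the two-line script `FILTER year 2011-2012` followed by `AVERAGE tax BY state` keeps only the 2011-2012 rows and then averages the (hypothetical) `tax` column per state, while a script consisting of `SORT tax` is rejected with `line 1: unknown command 'SORT'`. Checks happen as each line runs, so a fuller design might validate the whole script before executing any of it.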

APIs are a brilliant innovation, but they have their limitations because they are based on the exchange of data. This paradigm is effective if you know what you are looking for and the data provider anticipates your needs. If you are doing more advanced work, however, it leaves much to be desired. A data scientist may need to sift through billions of rows of data, but APIs are not generally designed for this; many systems deliver only a hundred rows at a time. At that rate, downloading a billion rows would take 10 million API requests, which is simply not practical.
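The 10 million figure is simple arithmetic on the page size quoted above, taking one billion rows as an illustrative total. The rate of 10 requests per second in the second fraction is purely an assumption, included only to show the order of magnitude of the wait:

```latex
\[
\frac{10^{9}\ \text{rows}}{100\ \text{rows per request}} = 10^{7}\ \text{requests}
\qquad
\frac{10^{7}\ \text{requests}}{10\ \text{requests/s}} = 10^{6}\ \text{s} \approx 11.6\ \text{days}
\]
```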

My solution is to supplement the API mechanism by allowing premium users to upload (or commission the development of) custom scripts, which can then be triggered through the Juxtapus search bar (by entering a special command) or through the API.
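One way the search bar could tell a special command apart from an ordinary question is a reserved prefix. In this minimal Python sketch, the `!run ` prefix, the `route` function and the script registry are all hypothetical, since the command syntax is not yet specified; the same function could equally sit behind an API endpoint.

```python
from typing import Callable

# Hypothetical reserved prefix that marks a special command.
COMMAND_PREFIX = "!run "

Handler = Callable[[str], str]


def route(search_text: str, is_premium: bool,
          scripts: dict[str, Handler], answer_question: Handler) -> str:
    """Send ordinary questions to the parser and commands to custom scripts."""
    text = search_text.strip()
    if not text.startswith(COMMAND_PREFIX):
        return answer_question(text)  # a plain-English question
    if not is_premium:
        raise PermissionError("custom scripts are a premium feature")
    # "!run name rest of input" -> script name plus its argument string
    name, _, argument = text[len(COMMAND_PREFIX):].partition(" ")
    if name not in scripts:
        raise KeyError(f"no script named {name!r}")
    return scripts[name](argument)
```

Keeping the check in one routing function means the search bar and the API share the same premium-access rule.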

Other Entrepreneurial Aspects

I would like to use Juxtapus as a prototype for a broader business venture, and I believe I could create a new market for this technology. I imagine that many organisations struggle with the burden of customer service, which would explain why so many corporations send call centre jobs overseas. With Juxtapus, human labour could be used more efficiently to answer questions, and FAQ sections on websites would become redundant. Best of all, because Juxtapus is not truly AI and relies on human analysts to index the data, it would not take jobs away from human beings; it would create new jobs for Australians.

 

Datasets Used: 
I have not yet reached the point of actually connecting the datasets; I ran out of time. As explained above, however, the app is designed to connect to any dataset available via a web service, because it is a type of search engine.

Local Event Location: