Internal SQL Library
Introduction
The Internal SQL library is a tool to query the content of Java arrays
using the SQL language.
You call a java method, named
ISQLQuery.query, to
which you pass some arrays and an SQL query.
The arrays get grouped
in virtual database tables, which are queried with the SQL query.
The result set
is converted to arrays, returned by ISQLQuery.query.
The following picture shows how this is done:
In the figure:
-
Each input array has a name.
The name is
used to identify the array in the SQL query.
-
The SQL is similar to the one
we use to query a database. One of the most important differences is in the from clause: Since there are no real tables (we
are not working with a real database), we build our tables in the SQL
query itself. [a1,
a2] as a in the from
clause means that the two input arrays a1 and a2 become the columns of
the table a. In the rest of the SQL query a1 and a2 are always
mentioned as a.a1 and a.a2 (it is not possible to use the column name
without the table).
-
The output arrays c1 and c2 are the
result set of the query. They are the aliases (as ...) in the
select clause of the query.
When we call the
query function:
-
The SQL query is parsed,
building an internal structure. If it contain errors the call is
stopped and the errors are returned as exceptions.
-
The input arrays are applied
to the query internal structure.
-
The output arrays are filled
with the return values of the query.
Dependencies
The Internal SQL library uses ANTLR v 3.x to parse the SQL query with the grammar defined in ISQLSelect.g.
How to use it in a program
After linked the library to your project, you need to create an
instance of the class ISQLQuery.
The ISQLQuery constructor has two parameters:
- The log
parameter, which is an instance of the ILogger class.
That's needed for this library to be able to log information about what
it is doing.
On
the other side the library cannot decide which log library you use
(log4j, java.util.Logger...), so what you need to do is to make an
adapter that wraps a Logger object of the log library you use in the
ILogger interface.
An example of such adapter (just for test purposes) is SystemLogger,
that prints the log information directly to the console.
- The isDebug
parameter. If this is true the library logs a lot of debug information
that is very useful when testing but reduces substantially the
performance.
Once you created the ISQLQuery object, you call its query method to
apply the SQL query to the arrays.
The query method has the following parameters:
- The sql
parameter, which is the SQL query.
- The mapArrayByName
parameter, which contains the arrays that will be handled as table
columns by the SQL query. They are in form of a map name ->
array.
The name is the one that identifies the array in the query.
The array is an instance of the one of the following classes:
It contains the real array and its type.
The arrays elements are objects (Double, String, Boolean,
Timestamp). The reason why we did not use primitive arrays when
possible (double[] and boolean[]) is practical: handle the arrays in a
generic
way, avoiding code duplication.
If your arrays are based on primitives (double[] or boolean[]) you can use ArrayBuilder to convert them to the equivalent object arrays (Double[] or Boolean[]).
The query method returns a map name -> result value.
Again, the name is the same that is used in the query as alias of the result item (after the as word).
The result value is an instance of ITypedValue. It means that it can be an array or a constant. It is a constant if resulting from an expression containing only a constant or an operation of constants.
Limitations
The SQL dialect to write the queries
is not standard. These are the differences:
-
The FROM clause is used to build
the tables to query, using the input arrays (like in the
example we saw before, [a1,
a2] as a).
-
All input columns have to be written as <tableName>.<columnName>
(as a.a1 and b.b2 in the figure), everywhere in the SQL query. To write
columnName without tableName is an error.
-
You need to set an alias
(AS ...) for each output value in the SELECT clause. This is needed to
connect the results with the output arrays.
-
Only aggregate functions
are allowed in the SELECT clause, in the WHERE clause no functions at
all are allowed. Operators (+,-,*,/,%) are allowed.
-
No sub-queries are
allowed.
-
Only SELECT, FROM, WHERE, ORDER BY, GROUP
BY clauses are available.
-
The LIKE operator works on regular
expressions. Therefore don't write LIKE 'inter%'
but LIKE 'inter.*' .
We have these limitations because:
-
Queries are not run against a database,
but arrays in memory.
-
This is the first version of the library
and reliability was preferred to completeness.
It is not excluded that in future
versions some of this limitations will be removed.
Query Syntax
For who wants to understand precisely
the SQL used for these queries, there is a grammar file, called ISQLSelect.g.
Here I give a more human readable
explanation.
The query is in the following form:
SELECT
<expression1> as <alias1>, ...
<expressionN> as
<aliasN>
FROM
[<column1_1>, ... <columnN_1>] as
<table1>, ...
[<column1_M>, .. <columnP_M>] as
<tableM>
WHERE
<condition1> [AND|OR <condition2> ...
AND|OR
<conditionN>]
ORDER
BY
<column1>, ... <columnN>
GROUP
BY
<column1>, ... <columnN>
and composed by the
following parts:
SELECT clause
The SELECT clause contains the result
values of the query. There must be an alias (AS) for each value
field.
Each result value can be:
-
a table column
-
a constant
-
an operation (+, -, *, /, %) between two
expressions
-
an
aggregate function (SUM, AVERAGE, MAX, MIN) Other functions (exp, log,
...) are not implemented. If you need to apply a function to an input
or output array you can do it in the code that calls ISQLQuery.query, before or after the call.
-
a CASE expression:
CASE <column>
WHEN <column> = (or another operator)
<constant1> THEN <expression1>
WHEN <column> = (or another operator)
<constant2> THEN <expression2>
...
ELSE expressionN
END
In each WHEN condition you compare the column (always the same column)
with a constant (constant1, constant2...). When one of the WHEN
condition is true, the result value is set to the THEN expression. If
none of them is true, the result value is set to the ELSE expression.
FROM clause
The FROM clause contains the
definitions of the tables. Each definition is in the form
[<column1>, <column2..>] AS
<table>, like in the
query in the figure.
All columns in a table need to be of
the same size. If they have different size, you can
distribute
them in two or more tables and join them using the WHERE clause.
WHERE clause (optional)
The WHERE clause is, as in the
standard SQL, a set of predicates connected by AND and OR. Each
predicate is in the form:
-
a comparison between two expressions,
using standard operators ( =, <>, <, >) .
We saw an example of it in the figure query.
-
a comparison between an expression and a
regular expression, using the LIKE operator. For example:
WHERE a.a1 LIKE 'inter.*'
to search all the a.a1 values that start with 'inter'.
-
an IN clause: <expression>
IN (<constant1>, <constant2>...,
<constantN>). For example:
WHERE a.a1 IN (10, 12, 16 34)
to accept only values of the column a.a1 that are in the given list
An expressions in the WHERE clause can
be:
-
a table column
-
a constant
-
an operation (+, -, *, /) between two
expressions
ORDER BY clause (optional)
As in the standard SQL, the ORDER BY
clause consists of the columns by which the query result has to be
sorted.
It is in the form:
ORDER BY <column1>,
<column2>,
... <columnN>
GROUP BY clause (optional)
Also the GROUP BY clause is the same
as in the standard SQL.
When you have some aggregate function
in the SELECT clause (SUM, AVERAGE...) all the columns in the SELECT
clause that are not parameters of the average functions must be
mentioned in the GROUP BY clause.
The GROUP BY clause is in the form:
GROUP BY <column1>,
<column2>,
... <columnN>
Some additional information.
Constants can be of 4 types, like the arrays.
-
Texts. They need to be surrounded by
single quotes ('example')
-
Numbers (with or without decimals).
-
Dates. They need to be surrounded by sharp
characters and written as ISO dates (#2007-10-11#).
- Booleans (true or false).