A Chaining Breakthrough!

To further steal the great foundation of jQuery method chaining, I have realized a way to use a “back” function that will regress backward through the chained functions and return the last object that was manipulated. Wow, that makes it sound confusing but it is not. Imagine the following chained code.

$objSelect->addTable('table');

Previously this would have returned the original sqlSelect object. But what if we wanted to manipulate the sqlTableReference before moving on? Maybe we want to add a JOIN element or set an alias for the table. This would be impossible using the chaining methods. We would need to explicitly create the table separately with its own chaining methods, then pass that create sqlTableReference object to the sqlSelect object. Let us try returning the created, or passing, sqlTableReference when users call the addTable method on the sqlSelect object. This will allow further manipulation of that table object. Great. Now we need to get “back” to the original sqlSelect object so we can add maybe an sqlSelectExpression or an sqlConditionGroup. Simply call the new back() chaining method to get back to the original object.

$objSelect->addTable('table')
		->setAlias('tbl')
		->back()
		->addWhere($objConditionGroup);

Going further in, we may need to regress all the way back to the beginning, to the absolute original object that started the line of code. Call the end() chaining method and get there!

$objSelect->addTable('table')
		->setJoin('INNER')
		->addOn($objConditionGroup)
		->back()
		->back();

// OR JUST CALL end()
$objSelect->addTable('table')
		->setJoin('INNER')
		->addOn($objConditionGroup)
		->end();

So far this is all I have used to accomplish this. There has been little testing but the first tests work incredibly. I have placed the following code within an abstract class from which all chaining classes are extended.

abstract class sqlMethodChainer
{

	private $objCaller;

	public function setCaller(&$objCaller)
	{
		$this->objCaller = $objCaller;
		return $this;
	}


	public function back()
	{
		return $this->objCaller;
	}

	public function end()
	{
		$obj = $this;
		while (is_object($obj) && method_exists($obj, 'back')) {
			$obj = $obj->back();
		}
		return $obj;
	}
}

My First Statement

Well I have managed to create a simple SELECT statement. It is cumbersome at best and follows along the process flow introduced in the last post, “Select Statement“. I show this only to demonstrate the foundation of how the statement is ultimately put together. You will notice that each class is initiated using a “create” static method instead of explicitly calling the __construct magic method. The “create” method returns the initiated class that can be further manipulated using a method chaining functionality. The chaining is not used in the following example, however, this is where is will make my next improvements to reduce necessary code. There is very little content validation being done at this point on data being passed to the statement constructors. Of course, because each component is represented by different classes, I can easily add validation once the basic code is completed.

$objSelect = SQL::newQuery('SELECT');
$objTableReference = sqlTableReference::create('table');
$objSelect->addTable($objTableReference);
$objColumnDefinition = sqlColumnDefinition::create('column');
$objColumnDefinition->setTable($objTableReference);
$objSelectExpression = sqlSelectExpression::create($objColumnDefinition);
$objSelect->addSelect($objSelectExpression);
$objConditionGroup = sqlConditionGroup::create('AND');
$objWhereCondition = sqlWhereCondition::create($objColumnDefinition);
$objWhereCondition->setOperator('=');
$objWhereCondition->setValue('value');
$objConditionGroup->addCondition($objWhereCondition);
$objSelect->addWhere($objConditionGroup);

echo $objSelect->__toString();
// SELECT table.column FROM table WHERE (table.column = value)

SELECT Statement

Let us break down the simplest of statements, the SELECT statement. I will try to walk through the process of creating the following: “SELECT column FROM table WHERE column = value;”.  This will be the “long” way of doing it, by explicitly creating each instance of the statement components. In the end, I wish to have the methods of the statement class itself create its own component classes. This way the volume of code needed to create the statement will be greatly reduced. Also, during instance destruction, I can have the statement class destroy all of its components. Therefore, a user would only need to unset the statement class and free all the memory associated with the class. This example is only to determine a flow to the process. It seems quite complex and cumbersome. Hopes are to use a chaining system, as briefly described in “The Last Relationship, I Think“, to reduce code and simplify the process.

  1. Initiate an sqlSelect object.
  2. Create a table reference instance.
  3. Pass the table reference to the sqlSelect object.
  4. Using the table reference, initiate a column definition object.
  5. Create an sqlSelectExpression instance.
  6. Pass the column definition object to the sqlSelectExpression object.
  7. Pass the sqlSelectExpression object to the sqlSelect object.
  8. Create an sqlConditionGroup object.
  9. Create an sqlWhereCondition object.
  10. Pass the sqlWhereCondition object to the sqlConditionGroup object.
  11. Pass the sqlConditionGroup object to the sqlSelect instance.

Statement Types

I can now determine each SQL statement type that I want to build. Each type will be represented by a class, each extending the abstract sqlStatement class. These classes will be made up of multiple instances of sqlRelationship and sqlComponent classes. The statement classes will verify each object passed to determine if it “fits” within the construct of it’s particular statement type.  For example, a SELECT statement must contain an sqlSelectExpression but an INSERT statement cannot.  A very basic breakdown is shown below.

  • SELECT
    • sqlSelectExpression: (1,*)
    • sqlTableReference: (1,*)
    • sqlConditionGroup: (0,1)
    • sqlSorter: (0,*)
  • INSERT
    • sqlSetExpression: (1,*)
    • sqlTableReference: (1)
  • UPDATE
    • sqlSetExpression: (1,*)
    • sqlTableReference: (1,*)
    • sqlConditionGroup: (0,1)
    • sqlSorter: (0,*)
  • REPLACE
    • sqlSetExpression: (1,*)
    • sqlTableReference: (1)
  • DELETE
    • sqlTableReference: (1)
    • sqlConditionGroup: (0,1)
    • sqlSorter: (0,*)

Order/Group/Having Clauses

There are a few last minute clauses to an SQL statement that I need to address. The HAVING clause is fairly straight forward. I requires a WHERE condition, so that is what I will use. Any statement class that can take a HAVING clause will accept an sqlConditionGroup as an argument as long as it is of the WHERE type. The ORDER BY and GROUP BY clauses will have their own classes.

An abstract sqlSorter class will be extended to create both the sqlOrder and sqlGroup classes. The sqlSorter will, in turn, be extended from the sqlRelationship class. Utilizing the same name/value pairs to hold things like column definitions or functions and the value being the direction, ASC or DESC. I am imagining functionality that will allow a user to add an sqlSelectExpression to the sqlStatement class, then designate it as a column to ORDER or GROUP by. The method to add the expression will return a reference to the added expression, this expression will have a reference to the statement class that holds it. When the expression is designated as a sorter, it will call its “parent” statement class to add the appropriate column definition to an internal array of sorters.

$objSelect->addSelect($objTable)->setAS('col1')->orderBy(2);

The value passed to “orderBy” will be used to sort the array. This will allow users to set order by rules out of order while creating select or where expressions. Either of the sqlSorter children can contain any number of sqlComponent classes to create the finalized string.

sqlNewQuery-uml-20110111

Condition Groups

So I have five sqlRelationship classes that can be passed to the SQL statement class. (not discussed yet) The SELECT expressions, SET expressions, and table references can all be stored as arrays within the SQL  statement class. However, because any WHERE/ON condition can be a complex AND or OR expression, and that they can be embedded within one another, arrays will certainly not work. Example: “(WHERE|ON) col1=col2 AND (col3 > 0 OR col4 = ‘test’)”

I am going to use a condition group comprised of individual WHERE or ON conditions and groups. The condition group can be tagged as an AND or OR type. Once a single condition is passed to the group class, the class will designate itself a “where” or an “on” group. Subsequent conditions passed will need to be of the same type. This designation will be checked by any sqlStatement WHERE clause functions and any sqlTableReference ON clause function to make sure they are of the correct type.

Below is the newest of the proposed UML diagram. You can see the addition of an sqlCondition abstract class extending the sqlRelationship class by which both the WHERE and ON conditions are created. These two classes, WHERE and ON, must be passed to an sqlConditionGroup class before being passed to the final statement class. Example: an sqlTableReference will only accept a group and not a singular sqlCondition. My current relationships, aggregate, composite or association, have not been finalized and may be changed during class creation.

REMEMBER: None of these class are designed to be explicitly constructed. They are “helper” classes to be used by the sqlStatement class to assist with creating the SQL query statement.

sqlNewQuery-UML-20110105

Relationship Components

Much of the arguments passed to any of the sqlRelationship objects will be strings. However, upon my initial analysis, there are a couple of potential arguments that may benefit from an encapsulation into a class. This will help not only with conversion of these arguments to strings for inclusion into the SQL statement, but for validating the passed argument is of the correct type within the sqlRelationships. My first thought is to create class definitions for column definitions and functions.

Column definitions will contain an sqlTableReference object and a column name. The table object will contain the table name to be joined with the column name within the __toString() method.  Function objects will process the __toString() method depending upon the function being requested. This is the perfect opportunity for encapsulation. We may be able to add some future formatting validation to prevent function syntax errors.  By creating classes for these argument types we allow for easy future enhancements.

Other argument types like values and expression will be passed as simple strings. The abstract parent sqlRelationship class will perform basic scrubbing to prevent invalid or malicious code from being passed via these strings. The last argument type, an SQL object, is an object in itself and no need to create another class to hold this type.

With some further expansion with the relationships between the classes, my new UML Diagram is shown below. You will notice that the sqlFunction object can be composed of multiple sqlFunction objects to create complex embedded functions.  I have also decided to extend the sqlWhereCondition into the sqlOnCondition instead of it being an extension of the sqlRelationship abstract class. The sqlOnCondition, in turn, is composed of multiple where conditions. This may prove to be incorrect later, but I will stick with it for now.

sqlNewQuery-UML-20110101

The Last Relationship, I Think

There is one last statement clause that need to determine if I can fit the pieces into my “relationship” idea. That clause is the table reference. Each table reference will be a single relationship. Of course, a simple table reference will only be a name, no value or operator needed. Any JOIN table will need a name and a value, then maybe use the operator to hold the JOIN type. The name will be the table name, the value can be used to hold the ON clause. The ON clause itself being a collection of relationships. So each table reference will not be a name/value pair, per se, but a relationship none the less.

I have been pondering the user interface in the back of my mind during these brainstorms about “relationships”. I actually think a jQuery type chaining system would be easiest to use. Imagine this:

$objTable = tableReference::newTable('table1')->setAS('t1')->setJoin('INNER')->setON($objONclause);

I need to use a static construction function because using a standard “new Object()” method cannot be further manipulated. The static constructor will internally can its own standard __construct() method. But before I get too deep into that, I need to breakdown all these relationships and try to find a common structure. My initial thought is to have an abstract “relationship” class that will be extended into each of the different types: select expression, table reference, set expression, where condition, and join on condition.

sqlNewQuery-UML-20101222

Expanding the Relationship Theory

Lets continue to fit the SQL relationship theory into basic SQL statement syntax. You may recall, a statement “relationship”, as defined by myself, is a name/value pair used to create the building blocks of a complete SQL statement. The next step is to analyze the name/value pairs that make up the SET/VALUES expressions used within INSERT/REPLACE/UPDATE statements. Because these pairs are used to populate a database, the “name” portion of each name/value pair needs to be a column reference. An operator need not be defined for the only allowed value is “equal to” (=). The “value” portion can be as diverse as the SELECT statement breakdown. Basically anything that returns a scalar value can be used within the “value” portion of the relationship. No aliases are allowed.

SET/VALUES EXPRESSION
Type name operator value alias EXAMPLE
VALUES/SET column expr INSERT INTO tbltest (col) VALUES (1+1);
column value INSERT INTO tbltest (col) VALUES (‘value’);
column func INSERT INTO tbltest (col) VALUES (REPLACE(‘test’, ‘e’, ‘i’));
column column INSERT INTO tbltest (col1, col2) VALUES (1, col1*2);
column subselect INSERT INTO tbltest (col1, col2) VALUES (1, (SELECT col3 FROM tbltest2 WHERE id = 1));
column(1) subselect INSERT INTO tbltest (col1, col2) (SELECT col3, col4 FROM tbltest2 WHERE id = 2);
column(2)
column(3)
column(n)

Moving on to the WHERE clause we can easily expand the relationship theory. Each element of the WHERE clause can be viewed as a statement relationship, or a name/value pair. A WHERE relationship, or condition as termed my the MySQL documentation, is an expression that evaluates to true or false. You can use a subselect as both the name and the value portion of each WHERE condition, however, because these subselects are required to return a scalar value, the expression will always evaluate to the same result. Therefore, I have excluded the “subselect” = “subselect” relationship option.

WHERE CONDITION
Type name operator value alias EXAMPLE
WHERE expr SELECT * FROM tbltest WHERE (1=1);
expr * value SELECT * FROM tbltest WHERE ((2-1)=1);
expr * expr SELECT * FROM tbltest WHERE ((10-5)>(1+2));
func * value SELECT * FROM tbltest WHERE REPLACE(‘test’, ‘e’, ‘i’) = ’tist';
func * expr SELECT * FROM tbltest WHERE REPLACE(‘test’, ‘e’, ‘i’) = ’tist';
func * func SELECT * FROM tbltest WHERE SIN(20) >= ABS(-.05);
column * value SELECT * FROM tbltest WHERE col1 = 10;
column * expr SELECT * FROM tbltest WHERE col1 = (10-9);
column * func SELECT * FROM tbltest WHERE col1 = CEIL(12.145);
column * column SELECT * FROM tbltest WHERE col1 = col2;
column * subselect SELECT * FROM tbltest WHERE col1 IN (SELECT id FROM tbltest2);
columns = subselect SELECT * FROM tbltest WHERE (col1,col2) = (SELECT col3,col4 FROM tbltest2);

The last condition satisfies the ROW() subselect option. In this case a set of column references must equal the column values returned from the subselect. This will be satisfied by allowing the user to pass an array of column references as the “name” and a subselect as the “value”.

Each relationship listed above can be used within any JOIN statement ON clause. For each ON condition is basically a condition to determine “where” to link the tables. The only exception may be the multiple columns equal to a subselect. However, I think we can expand the relationship theory to any JOIN clause conditional statements.

Relationships

My initial thought was to break down an SQL statement into pieces. Seems like a fairly obvious idea so I am inclined to go with it. Let us assume these basic pieces: Statement Type, SELECT expression, FROM table references, JOIN clauses, INSERT/REPLACE/UPDATE values, WHERE clauses, and lastly and ORDER/GROUP/HAVING type clause. I think, and I stress “think”, that I can break these into smaller pieces. Each can be comprised of a collection of “relationships”, or even more simply as a collection of name/value pairs. Let us see how far I can take this theory.

The following table demonstrates the idea of name/value pair making up a SELECT expression. Each name/value pair represents a “Statement Relationship”. These example relationships should represent all possible combinations of column references (column), simple expressions (expr) (like 1+1), core MySQL functions (func), and full SELECT queries (subselect) that can used within the select expression. A sub-select statement used within the SELECT expression must be a scalar value, returning only a singular value. Each has the option of using an alias for easier parsing of the resulting record set.

SELECT EXPRESSION
Type name operator value alias EXAMPLE
SELECT value [opt] SELECT 23;
value * value [opt] SELECT 25 = 25;
column [opt] SELECT col1 FROM tbltest;
column * value [opt] SELECT col1 = 25 FROM tbltest;
column * column [opt] SELECT col1 = col2 FROM tbltest;
expr [opt] SELECT (1+1);
expr * value [opt] SELECT (1+1) < 5;
expr * expr [opt] SELECT (1+1) < (2+1);
func [opt] SELECT REPLACE(‘test’, ‘e’, ‘i’);
func * value [opt] SELECT REPLACE(‘test’, ‘e’, ‘i’) = ’tist';
func * expr [opt] SELECT SIN(20) < 1;
func * func [opt] SELECT SIN(20) = COS(20);
subselect [opt] SELECT (SELECT col1 FROM tbltest);
subselect * value [opt] SELECT (SELECT col1 FROM tbltest) = ‘value';
subselect * expr [opt] SELECT (SELECT col1 FROM tbltest) > (1+1);
subselect * subselect [opt] SELECT (SELECT col1 FROM tbltest) = (SELECT col2 FROM tbltest);