1. Do we need to consider data types other than string and interger such as date,char etc? 2. How would we specify primary key?(Column level or table level) 3. What would be column length for type integer? 4. do we need to consider constraints other than primary key (e.g. non-null)?
In the requirements page it is given that F(A1; : : : ;An) is a monotone function defined by a (+;*)-arithmetic expression with positive co-efficient over the attributes. What will be the format for this monotone function ? Can we get an example ? Also what will be the format for the component scoring functions ?
1. Do we need to ensure that the primary key constraint is maintained after each insertion into a table, or can this be assumed?
2. Can we use "DROP INDEX I ON TABLE T" rather than just "DROP INDEX I"? Most (but not all) versions of DROP INDEX include a reference to the table in the DROP INDEX statement (http://www.w3schools.com/sql/sql_drop.asp).
3. Are we required to support both the long and short form of the SQL INSERT statement, or is one or the other sufficient?
JavaCC is a parser GENERATOR, so you can use it to build a parser for the subset of SQL used in project #2. You are not supposed to use an open-source SQL parser.
How is this different from using an open source parser directly; I mean people can still get some open source grammar and generate the parser with minimum effort. On the contrary, if we are to write our own grammar, that is pretty challenging (and re-inventing the wheel). How exactly will this be evaluated?
You cannot simply use an open-source grammar because you are implementing a SUBSET of SQL. So valid SQL queries need to be flagged if they do not belong to the subset.
1. Three secondary indexes: for A, B, and C (the sorted lists were used only for presentation), and a primary index.
2. The scoring function can be defined by any arithmetic expression that uses addition and multiplication over attributes and positive constants. The expression can use any subset of attributes of the given relation.
When traversing secondary indexes that are built on integers formatted as strings, the order of traversal will be different than if the integers were treated as integers. For example, as strings, all integers beginning with "1" will appear before all integers beginning with "2", so "10" will appear before "2" when the strings are sorted in "increasing" order. This clearly differs from integers that are sorted in increasing order, in which 2 will appear before 10.
How should we handle this difference in strings vs integers in light of the threshold algorithm?
Assuming that all attribute values are positive integers, we can handle this issue by simply zero-padding all attribute values when converting them to strings prior to insertion into the data file and various indexes. Something like the following could be used:
String.format("%012d", integerValue)
Can we make the assumption that all attribute values will be positive so that we can minimize the impact of this integer/string ordering issue in the indexes?
Whenever there are multiple records with the same value for a particular column, I find that my top-k queries differ from mySQL's top-k queries. I narrowed this down to the fact that because the order of columns is different (although my sorted list and mySQL's sorted lists have the same values), the threshold is satisfied earlier in some cases and hence my output differs from mySQL's output.
There is however a big possibility that I have implemented the Threshold algorithm incorrectly.
1. Do we need to consider data types other than string and interger such as date,char etc?
ReplyDelete2. How would we specify primary key?(Column level or table level)
3. What would be column length for type integer?
4. do we need to consider constraints other than primary key (e.g. non-null)?
1. Only integer values need to be considered for the SQL interface.
Delete2. Column-level is fine.
3. As in standard SQL.
4. No.
Can we get a distribution of marks for the features? That will make it easier for us to evaluate our own project.
ReplyDeleteIn the requirements page it is given that F(A1; : : : ;An) is a monotone function defined by a (+;*)-arithmetic expression with positive co-efficient over the attributes. What will be the format for this monotone function ? Can we get an example ? Also what will be the format for the component scoring functions ?
ReplyDeleteAssuming R(A B C D):
DeleteA*B + 2*C
There are no component scoring functions. The function F is defined directly in terms of the attributes (as above).
--jc
1. Do we need to ensure that the primary key constraint is maintained after each insertion into a table, or can this be assumed?
ReplyDelete2. Can we use "DROP INDEX I ON TABLE T" rather than just "DROP INDEX I"? Most (but not all) versions of DROP INDEX include a reference to the table in the DROP INDEX statement (http://www.w3schools.com/sql/sql_drop.asp).
3. Are we required to support both the long and short form of the SQL INSERT statement, or is one or the other sufficient?
Thanks,
Jay
1. You can assume that (and prepare your test data accordingly).
Delete2. Just the short form, without a table reference.
3. The short form, without column names.
--jc
1. What will be the format of the where clause?
ReplyDeleteEx: (Salary > 10000)
2. Do we have to consider where clause with multiple conditions (connected by AND/OR)?
Do we also have to consider where conditions like:
DeleteAttribute LIKE/IN Expression/List?
1-2. WHERE clauses can be disjunctions of conjunctions of arithmetic comparisons (=,>,...)
ReplyDelete3. No nested subqueries.
Can we use a open source parser (eg: javacc) for the parser part?
ReplyDeleteYes, in facts this is recommended.
Delete--jc
To clarify:
DeleteJavaCC is a parser GENERATOR, so you can use it to build a parser for the subset of SQL used in project #2. You are not supposed to use an open-source SQL parser.
--jc
How is this different from using an open source parser directly; I mean people can still get some open source grammar and generate the parser with minimum effort. On the contrary, if we are to write our own grammar, that is pretty challenging (and re-inventing the wheel). How exactly will this be evaluated?
DeleteRikson
You cannot simply use an open-source grammar because you are implementing a SUBSET of SQL. So valid SQL queries need to be flagged if they do not belong to the subset.
Delete--jc
1. F(A, B, C) = A * B + 2 * C
ReplyDeleteIn this case, to implement the threshold algorithm, do we make 3 lists (Si), one for A, second for B and third for 2 * c?
OR
we make 2 lists (Si), one for A * B and other for 2 * C?
2. Can scoring function be like (A + B) + 2 * C?
How do we decide how many lists to make for threshold algorithm?
1. Three secondary indexes: for A, B, and C (the sorted lists were used only for presentation), and a primary index.
Delete2. The scoring function can be defined by any arithmetic expression that uses addition and multiplication over attributes and positive constants. The expression can use any subset of attributes of the given relation.
When traversing secondary indexes that are built on integers formatted as strings, the order of traversal will be different than if the integers were treated as integers. For example, as strings, all integers beginning with "1" will appear before all integers beginning with "2", so "10" will appear before "2" when the strings are sorted in "increasing" order. This clearly differs from integers that are sorted in increasing order, in which 2 will appear before 10.
ReplyDeleteHow should we handle this difference in strings vs integers in light of the threshold algorithm?
In the example you can see, we only use actual numbers, not string. So you need to first transfer your data from string to integer.
DeleteThis comment has been removed by the author.
DeleteAssuming that all attribute values are positive integers, we can handle this issue by simply zero-padding all attribute values when converting them to strings prior to insertion into the data file and various indexes. Something like the following could be used:
ReplyDeleteString.format("%012d", integerValue)
Can we make the assumption that all attribute values will be positive so that we can minimize the impact of this integer/string ordering issue in the indexes?
Thank you for your solution.
ReplyDeleteYou can assume that attribute values are non-negative (they can be 0).
--jc
"1-2. WHERE clauses can be disjunctions of conjunctions of arithmetic comparisons (=,>,...)"
ReplyDeleteCan you please explain clearly? or give an example with the list of operators.
Whenever there are multiple records with the same value for a particular column, I find that my top-k queries differ from mySQL's top-k queries.
ReplyDeleteI narrowed this down to the fact that because the order of columns is different (although my sorted list and mySQL's sorted lists have the same values), the threshold is satisfied earlier in some cases and hence my output differs from mySQL's output.
There is however a big possibility that I have implemented the Threshold algorithm incorrectly.
Does anyone else suffer from a similar problem?