Monday, April 16, 2012

Please post project #2 questions as comments here

24 comments:

  1. 1. Do we need to consider data types other than string and interger such as date,char etc?
    2. How would we specify primary key?(Column level or table level)
    3. What would be column length for type integer?
    4. do we need to consider constraints other than primary key (e.g. non-null)?

    ReplyDelete
    Replies
    1. 1. Only integer values need to be considered for the SQL interface.
      2. Column-level is fine.
      3. As in standard SQL.
      4. No.

      Delete
  2. Can we get a distribution of marks for the features? That will make it easier for us to evaluate our own project.

    ReplyDelete
  3. In the requirements page it is given that F(A1; : : : ;An) is a monotone function defined by a (+;*)-arithmetic expression with positive co-efficient over the attributes. What will be the format for this monotone function ? Can we get an example ? Also what will be the format for the component scoring functions ?

    ReplyDelete
    Replies
    1. Assuming R(A B C D):

      A*B + 2*C

      There are no component scoring functions. The function F is defined directly in terms of the attributes (as above).

      --jc

      Delete
  4. 1. Do we need to ensure that the primary key constraint is maintained after each insertion into a table, or can this be assumed?

    2. Can we use "DROP INDEX I ON TABLE T" rather than just "DROP INDEX I"? Most (but not all) versions of DROP INDEX include a reference to the table in the DROP INDEX statement (http://www.w3schools.com/sql/sql_drop.asp).

    3. Are we required to support both the long and short form of the SQL INSERT statement, or is one or the other sufficient?

    Thanks,
    Jay

    ReplyDelete
    Replies
    1. 1. You can assume that (and prepare your test data accordingly).

      2. Just the short form, without a table reference.

      3. The short form, without column names.

      --jc

      Delete
  5. 1. What will be the format of the where clause?
    Ex: (Salary > 10000)


    2. Do we have to consider where clause with multiple conditions (connected by AND/OR)?

    ReplyDelete
    Replies
    1. Do we also have to consider where conditions like:
      Attribute LIKE/IN Expression/List?

      Delete
  6. 1-2. WHERE clauses can be disjunctions of conjunctions of arithmetic comparisons (=,>,...)

    3. No nested subqueries.

    ReplyDelete
  7. Can we use a open source parser (eg: javacc) for the parser part?

    ReplyDelete
    Replies
    1. Yes, in facts this is recommended.
      --jc

      Delete
    2. To clarify:

      JavaCC is a parser GENERATOR, so you can use it to build a parser for the subset of SQL used in project #2. You are not supposed to use an open-source SQL parser.

      --jc

      Delete
    3. How is this different from using an open source parser directly; I mean people can still get some open source grammar and generate the parser with minimum effort. On the contrary, if we are to write our own grammar, that is pretty challenging (and re-inventing the wheel). How exactly will this be evaluated?

      Rikson

      Delete
    4. You cannot simply use an open-source grammar because you are implementing a SUBSET of SQL. So valid SQL queries need to be flagged if they do not belong to the subset.

      --jc

      Delete
  8. 1. F(A, B, C) = A * B + 2 * C

    In this case, to implement the threshold algorithm, do we make 3 lists (Si), one for A, second for B and third for 2 * c?

    OR

    we make 2 lists (Si), one for A * B and other for 2 * C?

    2. Can scoring function be like (A + B) + 2 * C?
    How do we decide how many lists to make for threshold algorithm?

    ReplyDelete
    Replies
    1. 1. Three secondary indexes: for A, B, and C (the sorted lists were used only for presentation), and a primary index.

      2. The scoring function can be defined by any arithmetic expression that uses addition and multiplication over attributes and positive constants. The expression can use any subset of attributes of the given relation.

      Delete
  9. When traversing secondary indexes that are built on integers formatted as strings, the order of traversal will be different than if the integers were treated as integers. For example, as strings, all integers beginning with "1" will appear before all integers beginning with "2", so "10" will appear before "2" when the strings are sorted in "increasing" order. This clearly differs from integers that are sorted in increasing order, in which 2 will appear before 10.

    How should we handle this difference in strings vs integers in light of the threshold algorithm?

    ReplyDelete
    Replies
    1. In the example you can see, we only use actual numbers, not string. So you need to first transfer your data from string to integer.

      Delete
    2. This comment has been removed by the author.

      Delete
  10. Assuming that all attribute values are positive integers, we can handle this issue by simply zero-padding all attribute values when converting them to strings prior to insertion into the data file and various indexes. Something like the following could be used:

    String.format("%012d", integerValue)

    Can we make the assumption that all attribute values will be positive so that we can minimize the impact of this integer/string ordering issue in the indexes?

    ReplyDelete
  11. Thank you for your solution.

    You can assume that attribute values are non-negative (they can be 0).

    --jc

    ReplyDelete
  12. "1-2. WHERE clauses can be disjunctions of conjunctions of arithmetic comparisons (=,>,...)"

    Can you please explain clearly? or give an example with the list of operators.

    ReplyDelete
  13. Whenever there are multiple records with the same value for a particular column, I find that my top-k queries differ from mySQL's top-k queries.
    I narrowed this down to the fact that because the order of columns is different (although my sorted list and mySQL's sorted lists have the same values), the threshold is satisfied earlier in some cases and hence my output differs from mySQL's output.

    There is however a big possibility that I have implemented the Threshold algorithm incorrectly.

    Does anyone else suffer from a similar problem?

    ReplyDelete