[sql] Questions every good Database/SQL developer should be able to answer

I was going through Questions every good .Net developer should be able to answer and was highly impressed with the content and approach of this question and so in the same spirit, I am asking this question for Database/SQL Developer.

What questions do you think should a good Database/SQL programmer be able to respond to?

This question is related to sql database

The answer is


Almost everything is mentioned here. I would like to share one question which I was asked by a senior manager on database. I found the question pretty interesting and if you think about it deeply, it sort of has lot of meaning in it.

Question was - How would you describe database to your 5 year old kid ?


Give an example where denomralization is preferable.

(I like this one because people come rampaging out of college looking to put everything into 3rd normal form)


  • What are the downsides of using adhoc/on-the-fly SQL statements and what would you do instead?

This area can have a huge impact on performance and security of the db. Get this wrong, and you could end up in a world of pain.

  • In what situations would you use adhoc/on-the-fly SQL statements?

Because there is always an exception to the rule :)


  • Explain possible constraints on tables
  • Explain views (and materialized)
  • Explain sequences
  • Explain triggers

What is sql injection and how do you prevent it?

What is a cursor and when would you use it (or not) and why?


I would give a badly written query and ask them how they would go about performance tuning it.

I would ask about set theory. If you don't understand operating in sets, you can't effectively query a relational database.

I would give them some cursor examples and ask how they would rewrite them to make them set-based.

If the job involved imports and exports I would ask questions about SSIS (or other tools involved in doing this used by other datbases). If it involved writing reports, I would want to know that they understand aggregates and grouping (As well as Crystal Reports or SSRS or whatever ereporting tool you use).

I would ask the difference in results between these three queries:

select  a.field1
        , a.field2
        , b.field3
from table1 a
join table2 b
    on a.id = b.id
where a.field5 = 'test'
    and b.field3 = 1

select  a.field1
        , a.field2
        , b.field3
from table1 a
left join table2 b
    on a.id = b.id
where a.field5 = 'test'
    and b.field3 = 1

select  a.field1
        , a.field2
        , b.field3
from table1 a
left join table2 b
    on a.id = b.id and b.field3 = 1
where a.field5 = 'test'

I've placed this answer because Erwin Smout posted a answer that was so wrong it highlighted that there is probably a need to specifically guard against it.

Erwin suggested:

"Why should every SELECT always include DISTINCT ?"

A more appropriate question would be: If someone were to make the claim that: "every SELECT always include DISTINCT"; how would you comment on the claim?

If a candidate is unable to shoot the claim down in flames they either:

  • Don't understand the problem with the claim.
  • Lack in critical thinking skills.
  • Lack in ability to communicate technical issues.

For the record

  1. Suppose your query is correct, and does not return any duplicates, then including DISTINCT simply forces the RDBMS to check your result (zero benefit, and a lot of additional processing).
  2. Suppose your query is incorrect, and does return duplicates, then including DISTINCT simply hides the problem (again with additional processing). It would be better to spot the problem and fix your query... it'll run faster that way.

An interesting question would involve relational division, or how to express a "for all" relationship, which would require nested not exists clauses.

The question comes straigh from this link.

Given the following tables, representing pilots that can fly planes and planes in a hangar:

create table PilotSkills (
  pilot_name char(15) not null,
  plane_name char(15) not null
)

create table Hangar (
  plane_name char(15) not null
)

Select the names of the pilots who can fly every plane in the hangar.

The answer:

select distinct pilot_name
from PilotSkills as ps1 
where not exists (
  select * from hangar
  where not exists (
    select * from PilotSkills as ps2 where 
      ps1.pilot_name = ps2.pilot_name and 
      ps2.plane_name = hangar.plane_name
  )
)

Or ...

Select all stack overflow users that have accepted answers in questions tagged with the 10 most popular programming languages.

The (possible) answer (assuming an Accepted_Answers view and a Target_Language_Tags table with the desired tags):

select distinct u.user_name
from Users as u
join Accepted_Answers as a1 on u.user_id = a1.user_id
where not exists (
  select * from Target_Language_Tags t
  where not exists (
    select * 
      from Accepted_Answers as a2
      join Questions as q on a2.question_id = q.question_id
      join Question_Tags as qt on qt.question_id = q.question_id 
    where 
      qt.tag_name = t.tag_name and
      a1.user_id = a2.user_id
  )
)

What is the difference between a clustered index and a nonclustered index?

Another question I would ask that is not for a specific server would be:

What is a deadlock?


  • Explain the difference between an inner and outer join.
  • What is a Cartesian product?
  • Explain 3rd normal form

At our company, instead of asking a lot of SQL questions that anyone with a good memory can answer, we created a SQL Developers test. The test is designed to have the candidate put together a solid schema with normalization and RI considerations, check constraints etc. And then be able to create some queries to produce results sets we're looking for. They create all this against a brief design specification we give them. They are allowed to do this at home, and take as much time as they need (within reason).


Knowing not to use, and WHY not to use:

SELECT *

Why should we hire you when we have a sophisticated application using a properly-optimized ORM and implementing caching systems such as memcached?

This is a serious question, they should be able to justify their existence. As Jeff Atwood likes to say "Hardware is Cheap, Programmers are Expensive"


What explain plan does and how interpret the results you get from it.


Compare and contrast the differences between a sql/rdbms solution and nosql solution. You can't claim to be an expert in any technology without knowing its strengths and weaknesses as compared to its competitors.


  • How do you detect and resolve concurrency issues in the application layer?
  • What locking paradigms are typically available, and discuss their pros and cons.
  • Discuss NULL values and related issues.
  • What is the largest database system you've worked on in terms of: # tables, # rows, # users.

Also the following platform specific (SQL Server) questions:

  • Discuss IDENTITY columns.
  • What is the timestamp datatype used for?

  • What database types had caused to you, as a developer, more trouble to understand and debug? Expected answer, IMHO, experience with issues using different date/timestamp types and BLOBs.

  • When is convenient to use bitmap indexes?


Here are a few:

  • What is normalization and why is it important?
  • What are some situations where you would de-normalize data?
  • What is a transaction and why is it important?
  • What is referential integrity and why is it important?
  • What steps would to take to investigate reports of slow database performance?

The application is in use 24 hours a day. Your maintenance / update window is 2 hours every month, how do you plan to minimise disruption?


A reprint of my answer here, as general guidelines for topics.

Basics


  1. SELECTing columns from a table
  2. Aggregates Part 1: COUNT, SUM, MAX/MIN
  3. Aggregates Part 2: DISTINCT, GROUP BY, HAVING

Intermediate


  1. JOINs, ANSI-89 and ANSI-92 syntax
  2. UNION vs UNION ALL
  3. NULL handling: COALESCE & Native NULL handling
  4. Subqueries: IN, EXISTS, and inline views
  5. Subqueries: Correlated
  6. WITH syntax: Subquery Factoring/CTE
  7. Views

Advanced Topics


  • Functions, Stored Procedures, Packages
  • Pivoting data: CASE & PIVOT syntax
  • Hierarchical Queries
  • Cursors: Implicit and Explicit
  • Triggers
  • Dynamic SQL
  • Materialized Views
  • Query Optimization: Indexes
  • Query Optimization: Explain Plans
  • Query Optimization: Profiling
  • Data Modelling: Normal Forms, 1 through 3
  • Data Modelling: Primary & Foreign Keys
  • Data Modelling: Table Constraints
  • Data Modelling: Link/Corrollary Tables
  • Full Text Searching
  • XML
  • Isolation Levels
  • Entity Relationship Diagrams (ERDs), Logical and Physical
  • Transactions: COMMIT, ROLLBACK, Error Handling

"Why should every SELECT always include DISTINCT ?"