shopping24 tech blog

s is for shopping

March 04, 2014 / by Sven Kleese / Senior project manager operations / @

Querying a timestamp for a date

I want to show you some ways to do date queries that I was presented over the years as “the” way to do it. Most of them have in common that they are not “the” way to do it. But they look good. That makes them false friends. I will show them in the order I learned about them.

The task

The queries or better: the WHERE clauses I will discuss, all try to solve this problem:

Find all records of a given date in a table using a key of type timestamp.

Environment

I used a PostgreSQL DMBS to do my little examinations. But I guess this applies to other SQL / RDBMS as well.

  • my_table is an example table with 54 million rows
  • my_timestamp is a timestamp consisting of both date and time
  • idx_my_table_my_timestamp is an index on my_timestamp
  • 'YYYY-MM-DD' is the date we are looking for
  • 'YYYY-MM-DD+1' means the date after the date we are looking for

The Queries

Number one: DATE_TRUNC()

This is the first one someone showed me by saying: “Look, what cool way I found to query for the date!”

WHERE DATE_TRUNC(‘DAY’, my_timestamp) = 'YYYY-MM-DD'

Looks good on the first sight. It addresses the problem to map all the different timestamps that can occur during a day to the day it is looked for.

But if you look at the output of EXPLAIN you’ll see that the index ist not used, so the whole query is a bit costly:

Aggregate  (cost=5161952.49..5161952.50 rows=1 width=0)
	-> Seq Scan on my_table  (cost=0.00..5161270.48 rows=272802 width=0)
		Filter: (date_trunc('DAY'::text, my_timestamp) = '2011-06-07 00:00:00'::timestamp)

It doesn’t use the index because it reads every single entry to truncate the timestamp to compare it with the given date.

Number two: DATE()

It was quite easy to develop this from the first one, though it looks quite advanced:

WHERE DATE( my_timestamp) = 'YYYY-MM-DD'

Right. Looks better somehow but works exactly the same:

Aggregate  (cost=5161952.49..5161952.50 rows=1 width=0)
	-> Seq Scan on my_table  (cost=0.00..5161270.48 rows=272802 width=0)
		Filter: (date(my_timestamp) = '2011-06-07'::date)

Number three: LIKE

This one looks very nice, as it anticipates the fact that the timestamp has something after the date. And it doesn’t seem to mangle with the column:

WHERE my_timestamp LIKE 'YYYY-MM-DD%'

But it isn’t any better.

It also needs to read every entry in the table to check the timestamp against the pattern.

Again: no index used.

It is even more expensive than the solutions before as now the simple equality check is replaced by a pattern match:

Aggregate  (cost=5298353.57..5298353.58 rows=1 width=0)
	-> Seq Scan on my_table  (cost=0.00..5297671.56 rows=272802 width=0)
		Filter: ((my_timestamp)::text ~~ '2011-06-07%'::text)

Number four: >= / <

This is the first one I saw that retrieves the wanted data and uses the index:

WHERE my_timestamp >= 'YYYY-MM-DD' AND my_timestamp < 'YYYY-MM-DD+1'

Look how it uses the index:

Aggregate  (cost=10003.94..10003.95 rows=1 width=0)
	-> Index Scan using idx_my_table_my_timestamp on my_table  (cost=0.00..9845.85 rows=63239 width=0)
		Index Cond: ((my_timestamp >= '2011-06-07 00:00:00'::timestamp) AND (my_timestamp < '2011-06-08 00:00:00'::timestamp))

Number five: BETWEEN

This is a simplyfication I actually came up with:

WHERE my_timestamp BETWEEN 'YYYY-MM-DD' AND 'YYYY-MM-DD+1'

As you can see in the output of EXPLAIN it uses the index. But what it also shows: It also includes the first second of the next day. In most cases it gets exactly the same results as the solution before. But in the rare cases where a row for the first second of a day exists it returns a wrong result:

Aggregate  (cost=10003.94..10003.95 rows=1 width=0)
	-> Index Scan using idx_my_table_my_timestamp on my_table  (cost=0.00..9845.85 rows=63239 width=0)
		Index Cond: ((my_timestamp >= '2011-06-07 00:00:00'::timestamp) AND (my_timestamp <= '2011-06-08 00:00:00'::timestamp))

Conclusion

Many people - including me (!) - show you solutions that look good but aren’t.

Always double check important queries for yourself.

Even your own “good” ideas ;-)