shopping24 tech blog

August 17, 2014 / by Torsten Bøgh Köster / CTO / @tboeghk

Introducing a Solr JDBC synonym filter

In our product search, we use the classic Solr synonym filter a lot. But maintaining hundreds of thousands of synonyms, split across multiple text files, is less than ideal. So we built a synonym filter that retrieves synonyms from a JDBC datastore.

We use synonyms in several search scenarios: for a synonym-based stemming approach, to broaden user queries, and for the classic use case, spelling correction.

All of this added up to roughly 19 MB of text files. The operations team pushed updates to GitHub, but rolling out the changes required a deployment. To reduce turnaround time, we decided to store the synonyms in a database and built the JdbcSynonymFilter to load them into Solr.

The solr-jdbc-synonyms project is released under the Apache License 2.0 and is available on GitHub and Maven Central.

Installing the JDBC synonym filter (Apache Tomcat)

We run Solr on Apache Tomcat 7. If you use a different servlet container, your library locations may differ.

  1. Place the solr-jdbc-synonyms-1.0.0-jar-with-dependencies.jar in the /lib directory of your Solr installation.

  2. Place the JAR containing the JDBC driver for your database in the /lib directory of your Tomcat. In our case, it’s the PostgreSQL 9.3 driver.

  3. The synonym filter picks up a JDBC DataSource from the JNDI context. You can configure it either in the GlobalNamingResources or the Context of your Solr application.
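
As an illustration of step 3, here is a minimal sketch of a DataSource definition in the Context of the Solr web application, assuming Tomcat 7 with its default connection pool and a PostgreSQL database. The host, database name, credentials and pool sizes are placeholders and not taken from our setup; only the resource name jdbc/synonyms is the one used in the filter configuration in this post.

```xml
<!-- Sketch only: url, username, password and pool sizes are placeholder assumptions. -->
<Context>
  <!-- The name must match the jndiName parameter of the synonym filter. -->
  <Resource name="jdbc/synonyms"
            auth="Container"
            type="javax.sql.DataSource"
            driverClassName="org.postgresql.Driver"
            url="jdbc:postgresql://localhost:5432/synonyms"
            username="solr"
            password="changeme"
            maxActive="8"
            maxIdle="4" />
</Context>
```

If you would rather define the DataSource once in GlobalNamingResources, the same Resource element should be able to live there, with a ResourceLink element in the Context making it visible to the Solr application.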

Configuring the synonym filter

The JdbcSynonymFilterFactory behaves exactly like the Solr SynonymFilterFactory. It’s meant to be a drop-in replacement. In your existing filter definitions, replace the class name, remove the synonyms parameter, and add the mandatory sql and jndiName parameters:

<filter class="com.s24.search.solr.analysis.jdbc.JdbcSynonymFilterFactory"
   sql="SELECT concat(left, '=>', right) as line FROM synonyms"
   jndiName="jdbc/synonyms" ignoreCase="false" expand="true" />

  • jndiName: The JNDI name of your JDBC DataSource. In the example above, this would be jdbc/synonyms.

  • sql: A SQL statement returning valid Solr synonym lines in the first SQL result column. Valid synonym formats include x=>a, x=>a,b,c, x,y=>a,b,c or x,a,b,c.

    • You might have the left-hand and right-hand sides of your synonym definitions stored in separate columns in your database. Use a SQL concat function to create a valid synonym line.

    • Use a SQL WHERE clause to select specific synonyms by type, or only the ones approved by QA.

    • PostgreSQL is equipped with an ARRAY data type that comes in handy if you have multiple entries on the left-hand or right-hand side of your synonym line. Use the array_to_string function to join the array values.
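
To make the three hints above concrete, here is a sketch of a query-time field type that combines them. The table layout is hypothetical: it assumes a PostgreSQL table synonyms with two text[] columns lhs and rhs, a boolean column approved, and a text column synonym_type. The field type name, the tokenizer and the lowercase filter are illustrative choices too, not part of the project. The column names lhs and rhs are used because LEFT and RIGHT are reserved words in PostgreSQL and would need quoting.

```xml
<!-- Sketch only: field type name, analyzer chain, table and column names are assumptions. -->
<fieldType name="text_synonyms" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- array_to_string joins each text[] column into a comma-separated list,
         concat builds the "x,y=>a,b,c" line, and the WHERE clause keeps only
         QA-approved spelling synonyms. -->
    <filter class="com.s24.search.solr.analysis.jdbc.JdbcSynonymFilterFactory"
            sql="SELECT concat(array_to_string(lhs, ','), '=>', array_to_string(rhs, ',')) AS line FROM synonyms WHERE approved = true AND synonym_type = 'spelling'"
            jndiName="jdbc/synonyms" ignoreCase="true" expand="true" />
  </analyzer>
</fieldType>
```

With such a layout, a row with lhs = {tshirt,tee} and rhs = {t-shirt} would be handed to the filter as the synonym line tshirt,tee=>t-shirt.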

Contributing

Send us your use cases and user stories, file bugs, open GitHub pull requests, or join our engineering team to build leading e-commerce search applications and appliances.