August 17, 2014 / by Torsten Bøgh Köster / CTO / @tboeghk
Introducing a Solr JDBC synonym filter
In our product search, we make heavy use of the classic Solr synonym filter. But maintaining hundreds of thousands of synonyms, split over multiple text files, is less than ideal. So we built a synonym filter that retrieves synonyms from a JDBC datastore.
We use synonyms in various search situations: for a synonym-based stemming approach, to broaden user queries and, as the classic case, for spell correction.
All of these synonyms lived in roughly 19 MB of text files. The operations team pushed updates to GitHub, but rolling out the changes required a deployment. To reduce turnaround time, we decided to store the synonyms in a database and built the JdbcSynonymFilter to load them into Solr.
The solr-jdbc-synonyms project is released under the Apache License 2.0 and is available on GitHub and Maven Central.
Installing the JDBC synonym filter (Apache Tomcat)
We run Solr on Apache Tomcat 7. If you use a different servlet container, your library locations may differ.
- Place the solr-jdbc-synonyms-1.0.0-jar-with-dependencies.jar in the /lib directory of your Solr installation.
- Place the JAR with the JDBC driver of your database in the /lib directory of your Tomcat installation. In our case, it's the Postgres 9.3 driver. Tomcat creates the JNDI DataSource itself, so the driver has to be on Tomcat's class path rather than Solr's.
- The synonym filter picks up a JDBC DataSource from the JNDI context. You can configure it either in the GlobalNamingResources or in the Context of your Solr application.
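A minimal sketch of such a DataSource for the jdbc/synonyms name used in the filter configuration below. It uses the attribute names of Tomcat 7's default DBCP connection pool; the database URL, user name, password and pool sizes are placeholders. It also assumes the filter resolves jndiName relative to the web application's java:comp/env context, which is where Tomcat binds these resources.

```xml
<!-- Option 1: place inside the <Context> of the Solr web application.
     URL, username, password and pool sizes are placeholders. -->
<Resource name="jdbc/synonyms"
          auth="Container"
          type="javax.sql.DataSource"
          driverClassName="org.postgresql.Driver"
          url="jdbc:postgresql://localhost:5432/synonyms"
          username="solr"
          password="changeme"
          maxActive="10"
          maxIdle="2"
          maxWait="10000" />

<!-- Option 2: declare the same <Resource> in <GlobalNamingResources>
     of conf/server.xml, then link it into the Solr <Context>: -->
<ResourceLink name="jdbc/synonyms"
              global="jdbc/synonyms"
              type="javax.sql.DataSource" />
```

With GlobalNamingResources, the pool is defined once at server level, and the ResourceLink makes it visible to Solr under jdbc/synonyms.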
Configuring the synonym filter
The JdbcSynonymFilterFactory behaves exactly like the Solr SynonymFilterFactory and is meant to be a drop-in replacement. In your existing filter definitions, replace the class name, remove the synonyms parameter and add the mandatory sql and jndiName parameters:
<filter class="com.s24.search.solr.analysis.jdbc.JdbcSynonymFilterFactory"
sql="SELECT concat(left, '=>', right) as line FROM synonyms"
jndiName="jdbc/synonyms" ignoreCase="false" expand="true" />
- jndiName: The JNDI name of your JDBC DataSource. In the example above, this would be jdbc/synonyms.
- sql: A SQL statement returning valid Solr synonym lines in the first column of the result. Valid synonym formats include x=>a, x=>a,b,c, x,y=>a,b,c or x,a,b,c.
  - You might have the left and right hand sides of your synonym definitions stored in separate columns in your database. Use the SQL concat function to build a valid synonym line.
  - Use a SQL WHERE clause to select specific synonyms by type, or only the ones approved by QA.
  - PostgreSQL is equipped with an ARRAY data type that comes in handy if you have multiple entries on the left or right hand side of a synonym line. Use the array_to_string function to join the array values.
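The tips above can be combined in a single filter definition. This is a sketch only: it assumes a hypothetical synonyms table with text[] columns lhs and rhs, a text column type and a boolean column approved, and 'spelling' is a made-up type value. The sketch avoids the column names left and right because LEFT and RIGHT are reserved words in PostgreSQL and would have to be double-quoted when used as column names.

```xml
<!-- Hypothetical table: synonyms(lhs text[], rhs text[], type text, approved boolean).
     array_to_string joins each side with commas, concat inserts the "=>",
     and the WHERE clause keeps only approved spelling synonyms. -->
<filter class="com.s24.search.solr.analysis.jdbc.JdbcSynonymFilterFactory"
        sql="SELECT concat(array_to_string(lhs, ','), '=>', array_to_string(rhs, ','))
             FROM synonyms
             WHERE type = 'spelling' AND approved"
        jndiName="jdbc/synonyms" ignoreCase="false" expand="true" />
```

A row with lhs = {x,y} and rhs = {a,b,c} yields the line x,y=>a,b,c, one of the valid formats listed above.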
Contributing
Send us your use cases and user stories, file bugs, open GitHub pull requests, or join our engineering team to build leading e-commerce search applications and appliances.