Documentation

lib.cache is a tiny Scala library I created to cache Squeryl query results in memory in my web applications. It does not depend on Squeryl or anything else, and may be used to cache anything.

Characteristics:

  • Written in pure Scala; no annotations, no aspects.
  • Simple, concise and strictly typed usage syntax.
  • No explicit startup/shutdown required.
  • No data serialization; thus no clustering support, and cached objects must be immutable.
  • No data expiration; there is no maxLifetime option.
  • Supports reconfiguration on-the-fly, invalidation and statistics collection for individual caches or for all caches at once via CacheRegistry.
  • Thread-safe: public methods are synchronized.

How to cache single values

In my projects, I usually have a single-row config table which stores configuration options. Using Squeryl, I map it like this:

					package myapp.model
					import org.squeryl._

					// It's just convenient to have a primary key, even if its value is always 1.
					case class Config (id: Int, ...) extends KeyedEntity[Int] {
						def this() = this(1, ...)
					}

					object T extends Schema {
						val config = table[Config]
					}
				

And then I access it like this:

					package myapp.dal
					import myapp.model._
					import org.squeryl.PrimitiveTypeMode._
					import ru.dimgel.lib.cache._

					object ConfigDAL {
						private val cache = new ValueCache[Config]

						def data = cache {
							// I need inTransaction{} here because configuration is queried
							// by webapp init() method outside request transaction context.
							inTransaction { from(T.config)(t => select(t)).head }
						}

						def data_=(x: Config) {
							// I don't use inTransaction{} anywhere except the above,
							// because my webapp service() method is wrapped in transaction{}.
							require(x.id == 1)
							T.config.update(x)

							// Optimization trick, to avoid excessive SQL query on next data getter call:
							//cache.clear()
							cache.set(x)
						}
					}
				

So, you create an instance of ValueCache[V] and wrap your data query logic in a call to cache.apply(dataProvider: => V): V.

NOTE: If your data query logic throws an exception, it is propagated to the caller, and the cache state does not change.

When you update your data, you have to clear (invalidate) the cache manually by calling cache.clear(), so that the next call to the data getter executes your data query logic. Or, for the sake of optimization, you can force the cache to store the updated data by calling cache.set(v: V); in this case, the next call to the data getter returns that data without the expensive execution of your data query logic.

I also often use ValueCache for caching lists of objects when I'm sure those lists are bounded in size, like this:

					object NewsDAL {
						private val cache = new ValueCache[List[News]]

						def list = cache {
							from(T.news)(n => where(1 === 1) select(n) orderBy(n.whenCreated desc)).page(0, 10).toList
						}
					}
				

See below how to cache multiple objects by their keys using MapCache, and how ValueCache and MapCache may be used together.

How it works

Pretty simple, in fact. (Heh, I remember writing technical documentation for some automated factory control system 15 years ago. When I used the word "simple" for something, my boss said: "It's technical documentation, not literature. It may be simple for you but maybe not for others. State only facts." It was the best boss I ever had.) The very first version of ValueCache looked like this:

					package ru.dimgel.lib.cache

					class ValueCache[V] {
						private var data_? : Option[V] = None

						def apply(valueProvider: => V): V = synchronized {
							if (data_?.isEmpty)
								data_? = Some(valueProvider)
							data_?.get
						}

						def set(v: V) { synchronized {
							data_? = Some(v)
						}}

						def clear() { synchronized {
							data_? = None
						}}
					}
				

I believe there's nothing to explain here. The current version supports configuration (see below; for ValueCache, it's just enabled/disabled), statistics collection, and the global CacheRegistry, but the essence is the same.
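
For readers who want to see that contract in action, here is a small self-contained sketch. SketchValueCache is a hypothetical stand-in that repeats the first version above in current Scala syntax (it is not the class shipped in the library), and ValueCacheDemo with its queries counter is invented purely for illustration:

```scala
// Hypothetical stand-in mirroring the first version shown above;
// NOT the library class.
final class SketchValueCache[V] {
  private var stored: Option[V] = None

  // The by-name provider is evaluated only on a cache miss. If it throws,
  // the assignment never happens, so the cache stays empty.
  def apply(provider: => V): V = synchronized {
    if (stored.isEmpty) stored = Some(provider)
    stored.get
  }

  def set(v: V): Unit = synchronized { stored = Some(v) }

  def clear(): Unit = synchronized { stored = None }
}

object ValueCacheDemo {
  val cache = new SketchValueCache[String]

  // Counts how many times the simulated "SQL query" actually ran.
  var queries = 0

  def load(): String = cache {
    queries += 1
    "config#" + queries
  }
}
```

Calling ValueCacheDemo.load() twice runs the provider once; after cache.set(...) the getter returns the stored value without running it, and after cache.clear() the next call runs it again.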

How to cache multiple values by their keys

Just an example, again. Assume we have a list of countries referenced by a great many other tables. It would be much more efficient to cache countries separately instead of joining them into lots of queries.

Entity mapping:

					package myapp.model
					import org.squeryl._

					case class Country(id: Int, name: String, ...) extends KeyedEntity[Int] {
						def this() = this(1, null, ...)
					}

					object T extends Schema {
						val country = table[Country]
					}
				

DAL:

					package myapp.dal
					import myapp.model._
					import org.squeryl.PrimitiveTypeMode._
					import ru.dimgel.lib.cache._

					object CountryDAL {
						// By default, there's no limit on number of elements stored in cache.
						private val cache = new MapCache[Int, Country]

						def find(id: Int) =
							// Don't cache negative results, to keep the cache from growing infinitely.
							// So if the requested entity does not exist, we throw (None.get throws)
							// and catch that exception outside the cache call.
							try {
								Some(cache(id, id => T.country.lookup(id).get))
							} catch {
								case e: NoSuchElementException => None
							}

						def get(id: Int) =
							find(id).get

						def updateCountry(x: Country) {
							require(x.id != 0)
							T.country.update(x)

							//cache.clear()
							//cache.remove(x.id)
							cache.set(x.id, x)
						}

						def insertCountry(x: Country) = {
							require(x.id == 0)

							// I hate when Squeryl injects id into _immutable_ entity.
							val x2 = x.copy()
							T.country.insert(x2)
							assert(x2.id != 0)

							cache.set(x2.id, x2)

							x2
						}
					}
				

The idea is the same as for ValueCache[V], but MapCache[K,V] has two type parameters (storage key and value; the internal storage is HashMap[K,V]), and its apply() method has a more complex signature: apply(k: K, dataProvider: K => V): V.

But there are some tricks in how it's used. Look at the find() method in the example above. First, negative results are not cached. If you want them cached, you should instantiate MapCache[Int, Option[Country]]. Second, if your data query logic throws an exception, it's propagated to the cache caller and the cache state does not change. These two behaviours are leveraged so that find() has return type Option[Country] and returns None if the requested country is not found, but that None is not stored in the cache.
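
The two behaviours can be observed with a minimal sketch. SketchMapCache is a hypothetical stand-in written for this example (the real MapCache also handles configuration, eviction and statistics), and the in-memory table replaces the Squeryl lookup:

```scala
import scala.collection.mutable

// Hypothetical stand-in for illustration; NOT the library's MapCache.
final class SketchMapCache[K, V] {
  private val storage = mutable.HashMap.empty[K, V]

  // On a miss the provider is called; if it throws, nothing is stored.
  def apply(k: K, provider: K => V): V = synchronized {
    storage.get(k) match {
      case Some(v) => v
      case None =>
        val v = provider(k)
        storage.put(k, v)
        v
    }
  }

  def size: Int = synchronized { storage.size }
}

object CountryLookupDemo {
  // Hypothetical in-memory "table" standing in for T.country.lookup(id).
  private val table = Map(1 -> "Russia", 2 -> "Finland")

  val cache = new SketchMapCache[Int, String]

  // None.get throws NoSuchElementException, which escapes the cache call
  // and is turned into None here, outside the cache.
  def find(id: Int): Option[String] =
    try Some(cache(id, i => table.get(i).get))
    catch { case _: NoSuchElementException => None }
}
```

Looking up an unknown id returns None and leaves the cache size unchanged, so repeated misses cannot make the cache grow.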

See below how to configure the maximum MapCache size and how eviction works.

When a country is updated, you can, again, invalidate the cache completely (which is absolutely stupid in this case), invalidate just a single cache entry, or set/replace it immediately and thus avoid an extra SQL query when that entry is requested. When a new country is inserted, it's possible to call cache.set(K,V) too.

Thoughts on caching separately instead of joining

It seems that pre-caching dictionaries (often used but rarely modified tables like countries, cities, etc.) may give a significant performance advantage and reduce query complexity. But be careful if you have caches for various entities which reference each other.

First trouble: I doubt that Squeryl's relation declarations (ManyToOne, etc.) provide enough immutability semantics to be cached. Currently, I don't use them at all; instead I do this:

					package myapp.model
					case class Country(id: Int, ...) ...
					case class City(id: Int, countryId: Int, ...) ...
				
					package myapp.modelx
					import myapp.model
					case class CityX(city: City, country: Country)
				

Ugly but straightforward. So, if most of your use cases need a city's country along with the city, it could look natural to cache CityX instead of City:

					package myapp.dal
					import ...

					object CityDAL {
						private val cache = new MapCache[Int, CityX]

						def find(id: Int) =
							try {
								Some(cache(id, id => {
									from(T.city, T.country)((ci,co) =>
										where(ci.id === id and co.id === ci.countryId)
										select(CityX(ci, co))
									).head
								}))
							} catch {
								case e: NoSuchElementException => None
							}
					}
				

But here comes the second trouble: if you update some Country, you'll have to invalidate/update not only the appropriate entry of CountryDAL.cache, but also all entries of CityDAL.cache (and all other caches) which reference it, or you'll obviously get cache inconsistency.

Thinking about this problem, I added the methods ValueCache.clearIf(cond: V => Boolean) and MapCache.removeWhere(cond: (K,V) => Boolean) as a potential solution for those who might want to maintain cross-cache consistency. I mean this use case:

					object CountryDAL {
						def updateCountry(x: Country) {
							...
							cache.set(x.id, x)
							CityDAL.countryChanged(x)
						}
					}
					object CityDAL {
						def countryChanged(x: Country) {
							cache.removeWhere((id,cityX) => cityX.city.countryId == x.id)
						}
					}
				

But this idea looks ugly and dangerous:

  • Coupling and complexity. Why the hell must CountryDAL know about CityDAL? Well, that may be solved using the Observer pattern, but the result cannot be called "simple and transparent" anymore in any case. And there may be problems with Scala object instantiation order and circular dependencies.
  • Since all public methods in all caches are synchronized, I'm always afraid of deadlocks.
  • Lots of data duplication among caches.

So for now, instead of accessing cityX.country, I prefer to call CountryDAL.get(city.countryId) everywhere. I believe this is a case where more code results in less complexity. If you disagree, or have other ideas to share on the subject (and of course on everything else =)), I'd be thankful to read them on the lib.cache GoogleGroups page.

Configuration, MapCache eviction policy

Configuration options are provided as by-name class parameters of ValueCache and MapCache classes:

					class ValueCache[V] (enabled: => Boolean = true)
					class MapCache[K,V] (enabled: => Boolean = true, maxElements_? : => Option[Int] = None)
				

Caches are enabled by default but can be disabled. When disabled, their internal storage is cleared, apply() methods always delegate to their dataProviders, and all updater methods (clear(), set(), remove(), etc.) do nothing.

MapCache also has the maxElements_? parameter. The default value None means that the cache may grow without limit. If you specify Some(N), then N must be positive and the size of the cache's internal HashMap storage will never exceed the specified limit. The eviction policy is simple: least recently accessed entries are thrown away. This is done efficiently, in O(1), using an auxiliary doubly linked list of recently accessed entries (without duplicating cached data instances).
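
To illustrate the eviction policy (not the library's actual code), here is a sketch that gets the same "least recently accessed goes first" behaviour from java.util.LinkedHashMap in access-order mode, which maintains its own doubly linked list of entries. The class name LruSketch and its methods are hypothetical:

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Illustration only: swaps in java.util.LinkedHashMap instead of the
// library's own HashMap plus auxiliary linked list.
final class LruSketch[K, V](maxElements: Int) {
  require(maxElements > 0, "maxElements must be positive")

  // Third constructor argument `true` selects access order:
  // every get() moves the entry to the "most recently accessed" end.
  private val storage = new JLinkedHashMap[K, V](16, 0.75f, true) {
    // Consulted by put(); returning true drops the eldest
    // (least recently accessed) entry.
    override protected def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean =
      this.size() > maxElements
  }

  def apply(k: K, provider: K => V): V = synchronized {
    if (storage.containsKey(k)) storage.get(k)
    else {
      val v = provider(k)
      storage.put(k, v)
      v
    }
  }

  // containsKey() does not count as an access, so this does not reorder entries.
  def contains(k: K): Boolean = synchronized { storage.containsKey(k) }

  def currentSize: Int = synchronized { storage.size() }
}
```

With a limit of 2, loading keys 1 and 2, touching 1 again and then loading 3 evicts key 2, because it is the least recently accessed entry at that point.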

Why are cache parameters by-name? They are applied on object instantiation and re-applied when you call the cache's reloadConfig() method. You can even keep cache parameters in the database (in Config entity fields, see the ValueCache usage example at the beginning of this documentation), provide the site admin with an HTML editor form and reapply all cache configurations on its submission. Just define your cache like I do:

					object NotificationDAL {
						private val byUserIdCache = new MapCache[Int, List[NotificationX]] (
							enabled = ConfigDAL.data.cache_notifications_isEnabled,
							maxElements_? = ConfigDAL.data.cache_notifications_maxElements
						)
					}
				

I repeat: class parameters are accessed only in two cases: on cache instantiation and each time you call the cache's reloadConfig() method. Not on every access to the cache. Parameters are evaluated, their values are stored in internal variables (the currently effective configuration) and the cache state is adjusted accordingly. For example, if you switch a cache from enabled to disabled, its internal storage is cleared; if you reduce MapCache's maxElements_? value, excess least recently accessed elements are evicted to fit the new restriction.
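
The mechanism can be sketched in a few lines. ReloadableLimit and ReloadDemo are hypothetical names made up for this example; the sketch only shows how a by-name class parameter is snapshotted at construction and re-evaluated on demand, not how the library adjusts its storage afterwards:

```scala
// Hypothetical sketch, NOT the library class.
final class ReloadableLimit(maxElements: => Option[Int] = None) {
  // The by-name expression is evaluated once here, at construction time.
  private var effective: Option[Int] = maxElements

  // Ordinary accesses read the stored snapshot and never re-evaluate
  // the by-name expression.
  def current: Option[Int] = synchronized { effective }

  // Re-evaluates the by-name expression and stores the new snapshot.
  def reloadConfig(): Unit = synchronized { effective = maxElements }
}

object ReloadDemo {
  // Stands in for a field of the Config entity edited by the site admin.
  var configuredLimit: Option[Int] = Some(100)

  val limit = new ReloadableLimit(configuredLimit)
}
```

Changing ReloadDemo.configuredLimit has no visible effect until reloadConfig() is called; only then does limit.current return the new value.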

CacheRegistry

Both ValueCache and MapCache extend the abstract Cache class, which declares their common API and registers its instances in the global object CacheRegistry, which provides helper methods that affect all registered caches at once:

  • reloadAllConfigs() calls reloadConfig() on all registered caches (this is what I call on config form submission as explained in the previous section);
  • clearAll() calls clear() on all registered caches;
  • clearAllStatistics() calls clearStatistics() on all registered caches;
  • getAllStatistics() calls getStatistics() on all registered caches and returns them in an unsorted list (see below about statistics).

CacheRegistry stores cache instances in a WeakHashMap, so it does not prevent them from being garbage collected.
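
A rough sketch of such a registry, under the assumption that instances register themselves when constructed. Clearable, SketchRegistry and Counter are hypothetical names; only the use of a WeakHashMap is taken from the description above:

```scala
import java.util.{Collections, WeakHashMap}

// Hypothetical names throughout; NOT the library's Cache/CacheRegistry.
trait Clearable {
  def clear(): Unit

  // Assumption: each instance registers itself while being constructed.
  SketchRegistry.register(this)
}

object SketchRegistry {
  // A set view backed by a WeakHashMap: an entry disappears once the
  // registered object is no longer strongly referenced anywhere else.
  private val caches =
    Collections.newSetFromMap(new WeakHashMap[Clearable, java.lang.Boolean]())

  def register(c: Clearable): Unit = synchronized { caches.add(c); () }

  def clearAll(): Unit = synchronized {
    val it = caches.iterator()
    while (it.hasNext) it.next().clear()
  }
}

// Toy "cache" used only to observe the effect of clearAll().
final class Counter extends Clearable {
  var value = 5
  def clear(): Unit = { value = 0 }
}
```

Note that WeakHashMap compares keys with equals/hashCode, so this sketch relies on the registered classes keeping the default identity-based implementations.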

NOTE: I define my DALs as Scala objects (singletons), and they are instantiated lazily. You cannot affect caches which don't exist yet (in my case, because they belong to a DAL which is not yet instantiated).

CacheStatistics

Caches' getStatistics() methods return an instance of the CacheStatistics class, which contains a snapshot of the cache's current configuration and internal statistics counters (see scaladoc for details). I usually display these statistics in an HTML table on a page accessible to the site admin, along with buttons that perform the actions of the CacheRegistry API.

CacheStatistics does not contain a reference to the cache instance it was created by; instead it contains the cache description. By default, the cache description is the cache class name along with its type parameter names (this is the only reason why I pass Manifests as implicit arguments to both ValueCache and MapCache). Anyway, these descriptions look ugly and are not descriptive at all (in particular, the descriptions of all ValueCache[List[T]] instances are the same). So until better ideas arrive, it's recommended to override cache descriptions like this:

					object NotificationDAL {

						private val byUserIdCache = new MapCache[Int, List[NotificationX]](
							enabled = ConfigDAL.data.cache_notifications_isEnabled,
							maxElements_? = ConfigDAL.data.cache_notifications_maxElements
						) {
							override protected val description = "NotificationDAL.byUserIdCache"
						}
					}
				

Note that description is a val, not a def.

Caching both list and by-id map

Things like countries, cities and many others may be both accessed by id and displayed in a list. So it could be useful to consistently cache both the list and the by-id map.

In this snapshot, I introduced the abstract class CachedListAndMap[K,V], which contained two caches, ValueCache[List[V]] and MapCache[K,V], and maintained their consistency. The idea was to reload the whole MapCache using the MapCache.set(Map[K,V]) method at the same moment we load ValueCache[List], to be sure that if the requested object is not found in MapCache then it does not exist. If you want details, please download that snapshot, run mvn site, open the generated site and read what was there in this subsection's place.

But there's a simpler solution than dealing with two caches:

					package ru.dimgel.lib.cache

					abstract class CachedListAndMap[K, V] {

						protected final class Data(val list: List[V], val map: Map[K,V])

						// Abstract because user will need custom-configured instances.
						protected val cache: ValueCache[Data]

						protected def queryList: Iterable[V]
						protected def getKey(v: V): Option[K]


						private def data = cache {
							val list = queryList.toList
							val map = Map() ++ list.map(v => (getKey(v) -> v)).filter(!_._1.isEmpty).map(t2 => (t2._1.get -> t2._2))
							new Data(list, map)
						}

						final def list = data.list

						final def find(k: K) = data.map.get(k)

						final def get(k: K) = data.map(k)

						final def clear() {
							cache.clear()
						}
					}
				

This class is included in the library. I copied its source to this documentation page just to provide another real usage example.