Lotus Cache ETS: Implementing Distributed Caching
Hey everyone! Today, we're diving deep into a crucial aspect of Lotus caching: distributed caching. Specifically, we'll be addressing the current implementation of Lotus.Cache.ETS and exploring how we can enhance it to support distributed environments. Let's get started!
The Challenge: Lotus.Cache.ETS is Not Distributed
Currently, the Lotus.Cache.ETS implementation operates on a local, in-memory basis. This means that cached values are stored only on the node where they were initially cached. For many use cases, this might be perfectly fine. However, in deployments that span multiple nodes or serve a globally distributed user base, this limitation becomes apparent. Let's break down why this matters.
Why Distributed Caching Matters
Distributed caching becomes essential when dealing with expensive queries and globally distributed teams. Imagine a scenario where your application performs complex database queries that take significant time and resources. If each node in your cluster maintains its own separate cache, users hitting different nodes will trigger these expensive queries repeatedly. This leads to increased latency, higher database load, and a potentially poor user experience. For companies with teams spread across different geographical locations, the likelihood of users interacting with various nodes is even higher, exacerbating the problem.
Another critical use case is simple load balancing. In environments where requests are routed to the next available node rather than a specific session-bound node, the lack of a shared cache means that each request might hit a different node with an empty cache. This can significantly impact performance and negate the benefits of caching altogether. Therefore, to truly leverage the power of caching in these scenarios, a distributed approach is necessary. We need a system where cached data is accessible across all nodes in the cluster, ensuring consistency and minimizing redundant computations.
Steps to Reproduce the Issue
To illustrate the issue, let's walk through a simple reproduction scenario. We'll start two separate iex sessions on the same machine, simulating a multi-node environment. Here's how you can set it up:
- Terminal 1:
iex --sname lotus1@localhost -S mix
- Terminal 2:
iex --sname lotus2@localhost -S mix
Next, we'll connect these two nodes to form a cluster:
Terminal 1:
iex(lotus1@localhost)1> Node.connect(:lotus2@localhost)
Now, let's verify that the nodes are indeed connected:
Terminal 1:
iex(lotus1@localhost)1> Node.list()
[:lotus2@localhost]
Terminal 2:
iex(lotus2@localhost)1> Node.list()
[:lotus1@localhost]
With our cluster set up, we can now demonstrate the caching issue. Let's cache a value on the first node:
Terminal 1:
iex(lotus1@localhost)2> Lotus.Cache.put("key", "value", :timer.hours(1))
:ok
iex(lotus1@localhost)3> Lotus.Cache.get("key")
{:ok, "value"}
As expected, we can retrieve the cached value on the same node. However, let's try accessing it from the second node:
Terminal 2:
iex(lotus2@localhost)2> Lotus.Cache.get("key")
:miss
As you can see, the second node returns :miss, indicating that the cached value is not accessible across the cluster. This clearly demonstrates the need for a distributed caching solution.
Expected Behavior: Consistent Caching Across Nodes
Ideally, we would expect that caching a value with Lotus.Cache.put on one node would make it accessible from any other connected node. This consistent caching behavior is crucial for maintaining performance and data integrity in a distributed environment. Imagine the possibilities: reduced database load, faster response times, and a seamless user experience regardless of which node handles the request. To achieve this, we need to explore different approaches for implementing distributed caching in Lotus.
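Concretely, with the same two-node setup as in the reproduction above, the desired outcome would look like this (hypothetical output):

Terminal 1:
iex(lotus1@localhost)2> Lotus.Cache.put("key", "value", :timer.hours(1))
:ok

Terminal 2:
iex(lotus2@localhost)2> Lotus.Cache.get("key")
{:ok, "value"}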
Proposed Solutions: Building a Distributed Cache
Considering that we're primarily dealing with a query results cache, eventual consistency should be acceptable. This means that cached data might not be instantly synchronized across all nodes, but it will eventually converge to the same state. This approach allows us to prioritize performance and scalability while still providing a robust caching mechanism. Furthermore, we can leave node discovery to the library user, as they might have specific requirements for clustering nodes (e.g., using dns_cluster or other methods). This flexibility ensures that Lotus can integrate seamlessly into various deployment environments.
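For instance, a library user deploying on a platform with DNS-based service discovery could wire up clustering entirely on their side with the dns_cluster package, outside of Lotus (a sketch; the query string is a placeholder):

# In the host application's supervision tree, not inside Lotus itself.
# Requires the dns_cluster hex package; "lotus-app.internal" is a placeholder query.
children = [
  {DNSCluster, query: "lotus-app.internal"},
  # ...the rest of the application's children, including Lotus
]

Supervisor.start_link(children, strategy: :one_for_one)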
Here are three potential solutions for implementing distributed caching in Lotus:
Option 1: Leveraging :pg2 for Clustered ETS Caches
Our first option involves using :pg2 (process groups) to create a cluster of ETS caches across all nodes. With this method, we essentially create a process group that spans all nodes in our cluster. Each node within this group maintains its own ETS cache, and we use process messages to synchronize cache updates (puts and deletions) across all members. This approach mirrors the implementation of Phoenix.PubSub.PG2, a proven solution for distributed pub-sub functionality within the Phoenix framework. We can draw inspiration from its implementation:
# Example implementation sketch. Note that :pg2 was removed in OTP 24; on modern
# OTP its successor :pg follows the same pattern. Lotus.Cache.ETS.put/3 is assumed.
defmodule Lotus.Cache.Distributed.PG2 do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  def init(opts) do
    group = Keyword.get(opts, :group, :lotus_cache)
    # Create the process group (idempotent) and join it from this node
    :ok = :pg2.create(group)
    :ok = :pg2.join(group, self())
    {:ok, %{group: group}}
  end

  def put(key, value, ttl, group \\ :lotus_cache) do
    # Broadcast the write to every group member across the cluster
    for member <- :pg2.get_members(group) do
      send(member, {:cache_put, key, value, ttl})
    end

    :ok
  end

  def handle_info({:cache_put, key, value, ttl}, state) do
    # Each node applies the update to its own local ETS cache
    Lotus.Cache.ETS.put(key, value, ttl)
    {:noreply, state}
  end

  # Similar handling for get (served locally) and delete (broadcast)
end
This implementation ensures that whenever a cache entry is added or removed on one node, the change is propagated to all other nodes in the cluster. This keeps the caches synchronized and ensures that all nodes have access to the most up-to-date cached data. This approach offers a good balance between simplicity and performance, making it a viable option for distributed caching.
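To make the sketch concrete, each node in the cluster would start one of these processes, typically from the host application's supervision tree (hypothetical setup):

# Started on every node so each one joins the :lotus_cache process group
children = [
  {Lotus.Cache.Distributed.PG2, group: :lotus_cache}
]

Supervisor.start_link(children, strategy: :one_for_one)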
Option 2: Integrating Phoenix.PubSub for Inter-Node Communication
Our second option builds upon the first by leveraging Phoenix.PubSub as a dependency for inter-node communication. Phoenix.PubSub is a powerful and well-tested library for building pub-sub systems in Elixir, and it doesn't require the full Phoenix framework. This means we can use its distributed messaging capabilities without the overhead of integrating the entire framework. By using Phoenix.PubSub, we can simplify the process of broadcasting cache updates across the cluster.
This approach offers several advantages. Phoenix.PubSub handles the complexities of inter-node communication, including message routing, connection management, and fault tolerance. This allows us to focus on the core caching logic and avoid reinventing the wheel. Furthermore, Phoenix.PubSub provides a flexible and scalable architecture, making it suitable for a wide range of deployment scenarios.
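A minimal sketch of what this could look like, assuming a Phoenix.PubSub instance (here Lotus.PubSub, a hypothetical name) is already running in the host application's supervision tree, and reusing the same local ETS writes as before:

# Sketch of a Phoenix.PubSub-backed cache layer. Lotus.PubSub and
# Lotus.Cache.ETS.put/3 are assumed names for illustration.
defmodule Lotus.Cache.Distributed.PubSub do
  use GenServer

  @topic "lotus:cache"

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  def init(opts) do
    pubsub = Keyword.get(opts, :pubsub, Lotus.PubSub)
    # One of these processes runs on every node and listens for cache updates
    :ok = Phoenix.PubSub.subscribe(pubsub, @topic)
    {:ok, %{pubsub: pubsub}}
  end

  def put(key, value, ttl, pubsub \\ Lotus.PubSub) do
    # Broadcast the write; every subscribed node (including this one) applies it locally
    Phoenix.PubSub.broadcast(pubsub, @topic, {:cache_put, key, value, ttl})
  end

  def handle_info({:cache_put, key, value, ttl}, state) do
    Lotus.Cache.ETS.put(key, value, ttl)
    {:noreply, state}
  end
end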
Option 3: Adopting a Third-Party Cache Library Like Cachex
Our third option involves exploring and adopting a dedicated third-party cache library like Cachex. Cachex is a feature-rich, in-memory cache library for Elixir that provides built-in support for distribution and clustering. By leveraging a library like Cachex, we can offload the complexities of distributed caching to a dedicated solution, allowing us to focus on other aspects of Lotus.
Cachex offers several advanced features, including support for different caching strategies, eviction policies, and persistence options. It also provides robust mechanisms for handling concurrency and ensuring data consistency in a distributed environment. By integrating Cachex into Lotus, we can potentially achieve a more scalable and reliable caching solution with minimal effort. However, this option would introduce an external dependency, which is a trade-off to consider.
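As a rough illustration, a Cachex-backed adapter might simply delegate to Cachex's API. Module and cache names here are assumptions, and the TTL option name varies between Cachex versions; distribution itself is configured when starting the cache, per Cachex's documentation:

# Hypothetical adapter delegating Lotus cache calls to a Cachex cache named :lotus_cache
defmodule Lotus.Cache.CachexAdapter do
  @cache :lotus_cache

  def put(key, value, ttl_ms) do
    # Cachex supports per-entry TTLs; the option is `ttl:` in Cachex 3.x
    {:ok, true} = Cachex.put(@cache, key, value, ttl: ttl_ms)
    :ok
  end

  def get(key) do
    case Cachex.get(@cache, key) do
      {:ok, nil} -> :miss
      {:ok, value} -> {:ok, value}
    end
  end
end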
Alternatives Considered: Documentation and User Responsibility
Before diving into complex implementations, it's worth considering a simpler alternative: providing comprehensive documentation and guidance for users who need distributed caching. This approach aligns with the philosophy of empowering users to make informed decisions and tailor solutions to their specific needs. We can create an article similar to the one provided by Hammer, the Elixir rate limiting library, which details how to implement distributed ETS caching:
- Hammer's Distributed ETS Guide: https://hexdocs.pm/hammer/distributed-ets.html
This article could outline the challenges of distributed caching, explain the limitations of the current Lotus.Cache.ETS implementation, and provide code examples for building a custom distributed caching solution using tools like :pg2 or Phoenix.PubSub. This approach places the responsibility for implementation on the user but provides them with the knowledge and resources they need to succeed.
Addressing the Nuances of Distributed Caching
It's crucial to acknowledge that the options we've discussed don't fully address all the complexities of distributed caching. Issues like network partitions (netsplits), cache warming on newly joining nodes, and the cost of broadcasting large cache values require careful consideration. For instance, in the event of a network partition, where nodes become isolated from each other, a naive implementation might lead to data inconsistencies or even data loss. Similarly, when a new node joins the cluster, it needs to warm its cache somehow, whether by starting empty and repopulating as queries run or by requesting existing entries from a peer node. These trade-offs are worth keeping in mind whichever option we pursue.