1. Background

Mutually exclusive access to shared resources has long been a problem that many business systems need to solve. In distributed systems, a distributed lock is commonly used as the general-purpose solution.

2. From single-machine locking to distributed locking

In a single-machine environment, when a shared resource cannot enforce mutual exclusion by itself, a third party must provide it to prevent the data corruption caused by multiple threads or processes reading and writing the resource at the same time. That third party is usually the kernel or a library. In a distributed environment, we need a distributed service that plays the same role: different machines request a lock from the service, and the machine that obtains the lock can access the shared resource exclusively. Such a service is called a distributed lock service, and the lock it provides is called a distributed lock.

To abstract the concept, a distributed lock is first of all a resource that provides concurrency control and exposes an exclusive ownership state, that is:

Lock = Resource + Concurrency Control + Ownership Display

As an example, take the common single-machine locks:

Spinlock = BOOL + CAS (optimistic lock)

Mutex = BOOL + CAS + notification (pessimistic lock)

Both a spinlock and a mutex are built on a Boolean resource that is modified with an atomic CAS instruction: if the current value is 0, set it to 1. The caller holds the lock if the CAS succeeds and does not hold it if the CAS fails. A construct that does not display ownership is not regarded as a lock. AtomicInteger, for example, is also a resource (an Integer) plus CAS, but it does not explicitly indicate ownership, so it is not considered a lock. Of course, the "ownership display" can be seen more as a wrapper around the form in which the service is provided.
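The "BOOL + CAS" formula can be illustrated with a minimal Python sketch. Python exposes no hardware CAS instruction, so the `AtomicBool` class below is a hypothetical stand-in that emulates atomicity with an internal guard; `SpinLock`, `owner`, and `run_demo` are likewise illustrative names, not part of any library.

```python
import threading
import time


class AtomicBool:
    """Stand-in for a CAS-capable 0/1 word (assumption: the guard emulates atomicity)."""

    def __init__(self):
        self._value = 0
        self._guard = threading.Lock()

    def compare_and_swap(self, expected, new):
        # Store `new` only if the current value equals `expected`.
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False


class SpinLock:
    """Resource (0/1 flag) + concurrency control (CAS) + ownership display (owner)."""

    def __init__(self):
        self._flag = AtomicBool()
        self.owner = None  # thread id of the current holder, if any

    def try_acquire(self):
        # The lock is held only if the CAS from 0 to 1 succeeds.
        if self._flag.compare_and_swap(0, 1):
            self.owner = threading.get_ident()
            return True
        return False

    def acquire(self):
        # Spin: retry the CAS until it succeeds, yielding between attempts.
        while not self.try_acquire():
            time.sleep(0)

    def release(self):
        if self.owner != threading.get_ident():
            raise RuntimeError("only the owner may release the lock")
        self.owner = None
        self._flag.compare_and_swap(1, 0)


def run_demo(workers=4, increments=1000):
    """Increment a shared counter from several threads under the spinlock."""
    lock = SpinLock()
    counter = {"value": 0}

    def work():
        for _ in range(increments):
            lock.acquire()
            counter["value"] += 1  # critical section
            lock.release()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]
```

A mutex would differ mainly in the `acquire` loop: instead of spinning, a waiter would block and be notified when the holder releases the lock.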

In a single-machine environment, the kernel has a "God's-eye view": it knows whether a process is alive and can release the locks a process holds when that process dies. In a distributed environment this becomes a challenge. To cope with machine failures, downtime, and similar faults, a lock needs one more property: availability.

Any service that provides these characteristics can offer distributed locks. The resource can be a file, a KV entry, and so on. Concurrency control comes from an atomic operation such as creating the file or KV entry, ownership is indicated by whether the creation succeeded, and the availability of the lock is ensured by a TTL or a session.
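As a rough illustration of atomic creation plus TTL, the sketch below models the lock resource as a key in a KV store. `InMemoryKV` is a hypothetical in-process stand-in for a real KV service, and `create_if_absent`, `delete_if_equals`, and `TTLLock` are assumed names chosen for this example. The clock is injectable so that expiry can be demonstrated without waiting.

```python
import threading
import time
import uuid


class InMemoryKV:
    """Hypothetical stand-in for a KV service with atomic create and TTL."""

    def __init__(self, clock=time.monotonic):
        self._data = {}  # key -> (value, expires_at)
        self._guard = threading.Lock()
        self._clock = clock

    def _purge(self, key):
        # Drop the entry if its TTL has elapsed.
        entry = self._data.get(key)
        if entry is not None and entry[1] <= self._clock():
            del self._data[key]

    def create_if_absent(self, key, value, ttl):
        # Atomic creation: succeeds only if no live entry exists.
        with self._guard:
            self._purge(key)
            if key in self._data:
                return False
            self._data[key] = (value, self._clock() + ttl)
            return True

    def delete_if_equals(self, key, value):
        # Delete only if the stored value matches, so only the owner can unlock.
        with self._guard:
            self._purge(key)
            entry = self._data.get(key)
            if entry is not None and entry[0] == value:
                del self._data[key]
                return True
            return False


class TTLLock:
    """Lock = KV entry (resource) + atomic create (control) + token (ownership)."""

    def __init__(self, kv, name, ttl):
        self._kv = kv
        self._name = name
        self._ttl = ttl
        self._token = uuid.uuid4().hex  # unique value identifying this owner

    def try_acquire(self):
        return self._kv.create_if_absent(self._name, self._token, self._ttl)

    def release(self):
        return self._kv.delete_if_equals(self._name, self._token)
```

For example, if client A acquires the lock with a TTL of 5 seconds and then crashes, client B's `try_acquire()` returns `False` until the TTL elapses and `True` afterwards. A later `release()` from A returns `False` because the stored token now belongs to B.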

3. System Classification of Distributed Locks

Based on the safety of the lock resource itself, we classify distributed locks into two camps:

- distributed systems based on asynchronous replication, such as MySQL, Tair, and Redis.

- distributed consistent systems based on the Paxos protocol, such as ZooKeeper, etc.

Systems based on asynchronous replication carry a risk of data loss, which here means lock loss, so they are not safe enough. They typically provide fine-grained locks through a TTL mechanism. Such systems are simple to integrate and suit time-sensitive services that set a short validity period, perform short tasks, and can keep the business impact of a lost lock relatively controllable.

Systems based on the Paxos protocol keep multiple replicas of the data consistent through a consensus protocol and therefore offer high data safety. They typically provide coarse-grained locks through a lease (session) mechanism. Such systems have a higher barrier to entry and suit services that are sensitive to safety, want to hold locks for a long time, and cannot tolerate lock loss.
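The lease (session) mechanism can be sketched as follows. This is a single-process, non-thread-safe model meant only to show the idea; it does not implement replication or any consensus protocol. `SessionLockService` and its methods are hypothetical names for this example. A lock stays held as long as its owner keeps sending heartbeats, and it is freed once the owner's session times out.

```python
import time


class SessionLockService:
    """Hypothetical service in which a lock lives as long as its owner's session."""

    def __init__(self, session_timeout, clock=time.monotonic):
        self._timeout = session_timeout
        self._clock = clock
        self._sessions = {}  # session id -> time of last heartbeat
        self._locks = {}     # lock name -> owning session id
        self._next_id = 0

    def _expire_sessions(self):
        # Remove sessions that missed their heartbeat and free their locks.
        now = self._clock()
        dead = {s for s, seen in self._sessions.items()
                if now - seen > self._timeout}
        for s in dead:
            del self._sessions[s]
        self._locks = {n: s for n, s in self._locks.items() if s not in dead}

    def open_session(self):
        self._next_id += 1
        self._sessions[self._next_id] = self._clock()
        return self._next_id

    def heartbeat(self, session_id):
        # Renew the lease; returns False if the session has already expired.
        self._expire_sessions()
        if session_id not in self._sessions:
            return False
        self._sessions[session_id] = self._clock()
        return True

    def try_acquire(self, session_id, name):
        self._expire_sessions()
        if session_id not in self._sessions:
            return False
        # setdefault stores the caller as owner only if the lock is free.
        return self._locks.setdefault(name, session_id) == session_id

    def release(self, session_id, name):
        self._expire_sessions()
        if self._locks.get(name) == session_id:
            del self._locks[name]
            return True
        return False

    def holder(self, name):
        self._expire_sessions()
        return self._locks.get(name)
```

Unlike the TTL approach, the holder here does not need to guess how long its task will take: it keeps the lock for as long as it stays alive and keeps renewing the session, which is why this style suits long-held, coarse-grained locks.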