The client-side daemon (CSD) is responsible for those aspects of client-side operation for which kernel implementation is not necessary or desirable.
The CSD is responsible for tracking the location of each page of data stored on NMS servers, and for noticing when a page has dropped below the desired degree of replication because of a server crash. In this case, a thread performs a pagein/pageout operation to restore the required degree of replication.
The CSD uses the following interfaces with other system components:
The CSD maintains the following principal data structures:
The CSD performs the following initialization actions when it starts:
syslog facility,
used for logging status and debugging messages.
open() call on the control interface.
ioctl() calls on the control interface,
the CSD informs the CSKM about each of the servers.
Initially, the servers are marked by the CSD and CSKM as "down".
After initialization, the CSD enters its main loop, in which the following processing is performed:
The crash recovery mechanisms permit the CSD to shut down at any time without warning. Under normal circumstances, however, the CSD sends a "client shutdown" message over the TCP connection to each server with which it is in contact. Once this message has been sent, the client closes the TCP connection with the server. When the server receives the client shutdown message, it closes its end of the TCP connection and then proceeds as if the client had crashed.
If the CSD receives a "server shutdown" message over the TCP connection to a server, it marks that server as "shutting down" and begins "down server" processing and "restore replication" processing to restore the proper degree of replication of any data stored on that server. The only difference between this procedure and the one that occurs when the TCP connection with the server has been lost is that here the CSD regards the server that is shutting down as a viable candidate for pagein (but not pageout) of the pages to be replicated. Once the degree of replication has been restored for all pages stored on that server, the CSD issues a "server shutdown acknowledge" message and closes its side of the TCP connection.
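The distinction between pagein and pageout candidates can be illustrated with a minimal sketch. The state names follow the text, but the ServerState type, the two helper functions, and the server names are hypothetical and are not taken from the CSD sources.

```python
from enum import Enum


class ServerState(Enum):
    DOWN = "down"
    UP = "up"
    SHUTTING_DOWN = "shutting down"


def pagein_candidates(replicas, states):
    """Servers a page may be read from: up, or shutting down but still reachable."""
    readable = (ServerState.UP, ServerState.SHUTTING_DOWN)
    return [s for s in replicas if states[s] in readable]


def pageout_candidates(states):
    """Servers that may receive new copies: only those that are fully up."""
    return [s for s, state in states.items() if state is ServerState.UP]


# Hypothetical server table for illustration.
states = {
    "srv-a": ServerState.UP,
    "srv-b": ServerState.SHUTTING_DOWN,
    "srv-c": ServerState.DOWN,
    "srv-d": ServerState.UP,
}

# A page held on srv-b and srv-c can still be paged in from srv-b,
# but its new replicas must be written to servers that are up.
print(pagein_candidates(["srv-b", "srv-c"], states))
print(pageout_candidates(states))
```

If the connection had simply been lost, srv-b would be marked "down" instead, and the same page would have no pagein candidate at all.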
The CSD maintains TCP connections over Ethernet with the SSD at each server host with which it communicates. These connections are used to establish a context for communication, and to identify and handle failures or reinitialization of the NMS system on the server hosts.
As part of its main loop, the CSD attempts to contact each server
it knows about, with which it does not currently have a TCP connection,
and to establish such a connection. When such a connection is established,
the SO_KEEPALIVE option is set, so that if the connection
is broken, the CSD will be notified the next time it tries to transmit
over the connection. Periodic heartbeat messages are used to ensure
that this notification occurs reasonably promptly.
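The connection setup described above might look roughly like the following sketch, written in Python for brevity. The function names, the timeout, and the heartbeat payload are assumptions made for illustration; the text does not specify the heartbeat message format.

```python
import socket


def connect_with_keepalive(host, port, timeout=5.0):
    """Open a TCP connection and enable SO_KEEPALIVE on it.

    Returns the connected socket, or None if the server cannot be reached,
    so that the caller can retry on a later pass of its main loop.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return None
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


def send_heartbeat(sock, payload=b"HB\n"):
    """Send a heartbeat (hypothetical payload).

    Returns False if the transmit fails, which is the point at which a
    broken connection is noticed and "down server" processing would begin.
    """
    try:
        sock.sendall(payload)
        return True
    except OSError:
        return False
```

A main loop built on these helpers would call connect_with_keepalive() for every known server that has no connection, and send_heartbeat() periodically on every connection that is open.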
When a TCP connection with a server is broken, "down server" processing is performed. This processing includes the following:
ioctl() to inform the CSKM that
the server is down.
When a new TCP connection with a server is established, "up server" processing is performed. This processing includes the following:
ioctl() is used to inform the kernel that
the indicated server is now up, and to supply the kernel
with the session ID for that server, negotiated in the previous
step.
The kernel uses the session ID as follows when receiving messages
from a remote server:
The CSD maintains a re-replication list, consisting of pages currently known to have an inadequate degree of replication. The degree of replication of a page is decreased when the connection to one of the servers storing the page is broken. When this occurs, the data on that server is regarded as lost, and the page is added to the re-replication list.
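One way to picture the bookkeeping behind the re-replication list is the sketch below. The ReplicaTracker class, its method names, and the use of a queue are illustrative assumptions; the actual CSD data structures are not described in this section.

```python
from collections import deque


class ReplicaTracker:
    """Tracks which servers hold each page and which pages need re-replication."""

    def __init__(self, desired_replicas):
        self.desired = desired_replicas
        self.locations = {}            # page id -> set of server names
        self.rereplication = deque()   # pages with too few replicas, oldest first
        self._queued = set()           # guards against queueing a page twice

    def record_pageout(self, page, servers):
        """Remember the replica group that now stores the page."""
        self.locations[page] = set(servers)

    def server_down(self, server):
        """Treat every copy on the server as lost and queue affected pages."""
        for page, servers in self.locations.items():
            if server in servers:
                servers.discard(server)
                if len(servers) < self.desired and page not in self._queued:
                    self._queued.add(page)
                    self.rereplication.append(page)

    def pop_next(self):
        """Remove and return the next page to re-replicate, or None if empty."""
        if not self.rereplication:
            return None
        page = self.rereplication.popleft()
        self._queued.discard(page)
        return page


tracker = ReplicaTracker(desired_replicas=2)
tracker.record_pageout(1, ["srv-a", "srv-b"])
tracker.record_pageout(2, ["srv-b", "srv-c"])
tracker.record_pageout(3, ["srv-a", "srv-c"])
tracker.server_down("srv-b")
print(list(tracker.rereplication))   # pages 1 and 2 lost a replica; page 3 did not
```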
The re-replication list is processed as a separate task within the CSD.
As each page in the list is treated, an ioctl() is first issued
to request the CSKM to page in the page, and then once the page has
arrived, an ioctl() is issued to request the CSKM to page it
out again. As a result, the page ends up stored by a new replica group
having the full desired degree of replication.
The re-replication task is subject to a throttle, which keeps re-replication processing from saturating the system and preventing useful work from taking place.
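The throttled pagein/pageout cycle can be sketched as follows. The throttling policy shown (a minimum spacing between the starts of successive pages) is an assumption, since the text does not say how the throttle works, and the page_in and page_out callbacks stand in for the ioctl() requests to the CSKM.

```python
import time


def rereplicate(pages, page_in, page_out, max_pages_per_sec,
                clock=time.monotonic, sleep=time.sleep):
    """Page each listed page in and back out, starting at most
    max_pages_per_sec pages per second. Returns the number processed."""
    interval = 1.0 / max_pages_per_sec
    next_start = clock()
    done = 0
    for page in pages:
        delay = next_start - clock()
        if delay > 0:
            sleep(delay)               # throttle: leave time for useful work
        start = clock()
        page_in(page)                  # stand-in for the pagein ioctl()
        page_out(page)                 # stand-in for the pageout ioctl()
        done += 1
        next_start = start + interval
    return done


log = []
count = rereplicate(
    [10, 11, 12],
    page_in=lambda p: log.append(("in", p)),
    page_out=lambda p: log.append(("out", p)),
    max_pages_per_sec=100,
)
print(count, log)
```

The clock and sleep parameters exist only so that the pacing can be exercised without real delays; a production task would use the defaults.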
The CSD maintains a socket on which it will accept TCP connections from system administrators. A simple interactive command language is supported over these connections. This permits system administrators, and perhaps system administration scripts, to contact the CSD in order to change system parameters or get status information.
When a connection is made on the administrative socket, the CSD responds by issuing a single greeting line that identifies it, followed by a prompt. The user then issues a one-line command, after which the CSD issues a response of zero or more lines, followed by a prompt. Commands supported are:
set <variable> <value>
show <variable>
restart
suspend
resume
quit
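The command/response cycle can be illustrated with a small dispatcher. Only the command names come from the list above; the prompt text, the response wording, the variable name used in the example, and the effects given to restart, suspend, and resume (simple flags) are assumptions made for the sketch.

```python
PROMPT = "csd> "   # hypothetical prompt text


class AdminSession:
    """Handles one-line commands and produces zero or more response lines."""

    def __init__(self, variables):
        self.variables = variables
        self.suspended = False
        self.restart_requested = False
        self.closed = False

    def handle(self, line):
        """Return the list of response lines for a single command line."""
        words = line.split()
        if not words:
            return []
        cmd, args = words[0], words[1:]
        if cmd == "set" and len(args) == 2:
            self.variables[args[0]] = args[1]
            return []
        if cmd == "show" and len(args) == 1:
            if args[0] not in self.variables:
                return ["unknown variable: " + args[0]]
            return [args[0] + " = " + self.variables[args[0]]]
        if cmd == "restart" and not args:
            self.restart_requested = True
            return []
        if cmd in ("suspend", "resume") and not args:
            self.suspended = (cmd == "suspend")
            return []
        if cmd == "quit" and not args:
            self.closed = True
            return []
        return ["error: unrecognized command"]

    def reply(self, line):
        """Full text sent back: the response lines, then a prompt unless closed."""
        text = "".join(r + "\n" for r in self.handle(line))
        return text if self.closed else text + PROMPT


session = AdminSession({"throttle": "50"})   # "throttle" is a made-up variable
print(repr(session.reply("set throttle 20")))
print(repr(session.reply("show throttle")))
```

A "set" command here produces a zero-line response, so the reply is the prompt alone, while "show" produces one line followed by the prompt.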