Diff: draft-pre-ch-7.txt - draft-ietf-nfsv4-minorversion1-22.txt
 draft-pre-ch-7.txt   draft-ietf-nfsv4-minorversion1-22.txt 
NFSv4 S. Shepler NFSv4 S. Shepler
Internet-Draft M. Eisler Internet-Draft M. Eisler
Intended status: Standards Track D. Noveck Intended status: Standards Track D. Noveck
Expires: September 14, 2008 Editors Expires: September 19, 2008 Editors
March 13, 2008 March 18, 2008
NFS Version 4 Minor Version 1 NFS Version 4 Minor Version 1
draft-ietf-nfsv4-minorversion1-22.txt draft-ietf-nfsv4-minorversion1-22.txt
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
skipping to change at page 1, line 35 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on September 14, 2008. This Internet-Draft will expire on September 19, 2008.
Copyright Notice Copyright Notice
Copyright (C) The IETF Trust (2008). Copyright (C) The IETF Trust (2008).
Abstract Abstract
This Internet-Draft describes NFS version 4 minor version one, This Internet-Draft describes NFS version 4 minor version one,
including features retained from the base protocol and protocol including features retained from the base protocol and protocol
extensions made subsequently. Major extensions introduced in NFS extensions made subsequently. Major extensions introduced in NFS
skipping to change at page 4, line 30 skipping to change at page 4, line 30
8. State Management . . . . . . . . . . . . . . . . . . . . . . 147 8. State Management . . . . . . . . . . . . . . . . . . . . . . 147
8.1. Client and Session ID . . . . . . . . . . . . . . . . . 148 8.1. Client and Session ID . . . . . . . . . . . . . . . . . 148
8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 148 8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 148
8.2.1. Stateid Types . . . . . . . . . . . . . . . . . . . 149 8.2.1. Stateid Types . . . . . . . . . . . . . . . . . . . 149
8.2.2. Stateid Structure . . . . . . . . . . . . . . . . . 150 8.2.2. Stateid Structure . . . . . . . . . . . . . . . . . 150
8.2.3. Special Stateids . . . . . . . . . . . . . . . . . . 151 8.2.3. Special Stateids . . . . . . . . . . . . . . . . . . 151
8.2.4. Stateid Lifetime and Validation . . . . . . . . . . 152 8.2.4. Stateid Lifetime and Validation . . . . . . . . . . 152
8.2.5. Stateid Use for I/O Operations . . . . . . . . . . . 155 8.2.5. Stateid Use for I/O Operations . . . . . . . . . . . 155
8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 156 8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 156
8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 158 8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 158
8.4.1. Client Failure and Recovery . . . . . . . . . . . . 158 8.4.1. Client Failure and Recovery . . . . . . . . . . . . 159
8.4.2. Server Failure and Recovery . . . . . . . . . . . . 159 8.4.2. Server Failure and Recovery . . . . . . . . . . . . 159
8.4.3. Network Partitions and Recovery . . . . . . . . . . 163 8.4.3. Network Partitions and Recovery . . . . . . . . . . 163
8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 167 8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 168
8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 168 8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 169
8.7. Clocks, Propagation Delay, and Calculating Lease 8.7. Clocks, Propagation Delay, and Calculating Lease
Expiration . . . . . . . . . . . . . . . . . . . . . . . 169 Expiration . . . . . . . . . . . . . . . . . . . . . . . 169
8.8. Obsolete Locking Infrastructure From NFSv4.0 . . . . . . 169 8.8. Obsolete Locking Infrastructure From NFSv4.0 . . . . . . 170
9. File Locking and Share Reservations . . . . . . . . . . . . . 170 9. File Locking and Share Reservations . . . . . . . . . . . . . 171
9.1. Opens and Byte-range Locks . . . . . . . . . . . . . . . 171 9.1. Opens and Byte-range Locks . . . . . . . . . . . . . . . 171
9.1.1. State-owner Definition . . . . . . . . . . . . . . . 171 9.1.1. State-owner Definition . . . . . . . . . . . . . . . 171
9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 171 9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 172
9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 174 9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 175
9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 175 9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 175
9.4. Stateid Seqid Values and Byte-range Locks . . . . . . . 175 9.4. Stateid Seqid Values and Byte-range Locks . . . . . . . 176
9.5. Issues with Multiple Open-owners . . . . . . . . . . . . 175 9.5. Issues with Multiple Open-owners . . . . . . . . . . . . 176
9.6. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 176 9.6. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 176
9.7. Share Reservations . . . . . . . . . . . . . . . . . . . 177 9.7. Share Reservations . . . . . . . . . . . . . . . . . . . 178
9.8. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 178 9.8. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 178
9.9. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 179 9.9. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 179
9.10. Parallel OPENs . . . . . . . . . . . . . . . . . . . . . 179 9.10. Parallel OPENs . . . . . . . . . . . . . . . . . . . . . 180
9.11. Reclaim of Open and Byte-range Locks . . . . . . . . . . 180 9.11. Reclaim of Open and Byte-range Locks . . . . . . . . . . 180
10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 180 10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 181
10.1. Performance Challenges for Client-Side Caching . . . . . 181 10.1. Performance Challenges for Client-Side Caching . . . . . 181
10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 182 10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 182
10.2.1. Delegation Recovery . . . . . . . . . . . . . . . . 184 10.2.1. Delegation Recovery . . . . . . . . . . . . . . . . 185
10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 186 10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 187
10.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . 187 10.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . 187
10.3.2. Data Caching and File Locking . . . . . . . . . . . 188 10.3.2. Data Caching and File Locking . . . . . . . . . . . 188
10.3.3. Data Caching and Mandatory File Locking . . . . . . 189 10.3.3. Data Caching and Mandatory File Locking . . . . . . 190
10.3.4. Data Caching and File Identity . . . . . . . . . . . 190 10.3.4. Data Caching and File Identity . . . . . . . . . . . 190
10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 191 10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 191
10.4.1. Open Delegation and Data Caching . . . . . . . . . . 193 10.4.1. Open Delegation and Data Caching . . . . . . . . . . 194
10.4.2. Open Delegation and File Locks . . . . . . . . . . . 195 10.4.2. Open Delegation and File Locks . . . . . . . . . . . 195
10.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . 195 10.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . 195
10.4.4. Recall of Open Delegation . . . . . . . . . . . . . 198 10.4.4. Recall of Open Delegation . . . . . . . . . . . . . 198
10.4.5. Clients that Fail to Honor Delegation Recalls . . . 200 10.4.5. Clients that Fail to Honor Delegation Recalls . . . 200
10.4.6. Delegation Revocation . . . . . . . . . . . . . . . 200 10.4.6. Delegation Revocation . . . . . . . . . . . . . . . 201
10.4.7. Delegations via WANT_DELEGATION . . . . . . . . . . 201 10.4.7. Delegations via WANT_DELEGATION . . . . . . . . . . 201
10.5. Data Caching and Revocation . . . . . . . . . . . . . . 202 10.5. Data Caching and Revocation . . . . . . . . . . . . . . 202
10.5.1. Revocation Recovery for Write Open Delegation . . . 202 10.5.1. Revocation Recovery for Write Open Delegation . . . 203
10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 203 10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 203
10.7. Data and Metadata Caching and Memory Mapped Files . . . 205 10.7. Data and Metadata Caching and Memory Mapped Files . . . 205
10.8. Name and Directory Caching without Directory 10.8. Name and Directory Caching without Directory
Delegations . . . . . . . . . . . . . . . . . . . . . . 207 Delegations . . . . . . . . . . . . . . . . . . . . . . 208
10.8.1. Name Caching . . . . . . . . . . . . . . . . . . . . 207 10.8.1. Name Caching . . . . . . . . . . . . . . . . . . . . 208
10.8.2. Directory Caching . . . . . . . . . . . . . . . . . 209 10.8.2. Directory Caching . . . . . . . . . . . . . . . . . 209
10.9. Directory Delegations . . . . . . . . . . . . . . . . . 210 10.9. Directory Delegations . . . . . . . . . . . . . . . . . 210
10.9.1. Introduction to Directory Delegations . . . . . . . 210 10.9.1. Introduction to Directory Delegations . . . . . . . 210
10.9.2. Directory Delegation Design . . . . . . . . . . . . 211 10.9.2. Directory Delegation Design . . . . . . . . . . . . 211
10.9.3. Attributes in Support of Directory Notifications . . 212 10.9.3. Attributes in Support of Directory Notifications . . 212
10.9.4. Directory Delegation Recall . . . . . . . . . . . . 212 10.9.4. Directory Delegation Recall . . . . . . . . . . . . 212
10.9.5. Directory Delegation Recovery . . . . . . . . . . . 213 10.9.5. Directory Delegation Recovery . . . . . . . . . . . 213
11. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 213 11. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 213
11.1. Location Attributes . . . . . . . . . . . . . . . . . . 213 11.1. Location Attributes . . . . . . . . . . . . . . . . . . 214
11.2. File System Presence or Absence . . . . . . . . . . . . 214 11.2. File System Presence or Absence . . . . . . . . . . . . 214
11.3. Getting Attributes for an Absent File System . . . . . . 215 11.3. Getting Attributes for an Absent File System . . . . . . 215
11.3.1. GETATTR Within an Absent File System . . . . . . . . 215 11.3.1. GETATTR Within an Absent File System . . . . . . . . 215
11.3.2. READDIR and Absent File Systems . . . . . . . . . . 216 11.3.2. READDIR and Absent File Systems . . . . . . . . . . 217
11.4. Uses of Location Information . . . . . . . . . . . . . . 217 11.4. Uses of Location Information . . . . . . . . . . . . . . 217
11.4.1. File System Replication . . . . . . . . . . . . . . 218 11.4.1. File System Replication . . . . . . . . . . . . . . 218
11.4.2. File System Migration . . . . . . . . . . . . . . . 219 11.4.2. File System Migration . . . . . . . . . . . . . . . 219
11.4.3. Referrals . . . . . . . . . . . . . . . . . . . . . 220 11.4.3. Referrals . . . . . . . . . . . . . . . . . . . . . 220
11.5. Location Entries and Server Identity . . . . . . . . . . 221 11.5. Location Entries and Server Identity . . . . . . . . . . 221
11.6. Additional Client-side Considerations . . . . . . . . . 222 11.6. Additional Client-side Considerations . . . . . . . . . 222
11.7. Effecting File System Transitions . . . . . . . . . . . 223 11.7. Effecting File System Transitions . . . . . . . . . . . 223
11.7.1. File System Transitions and Simultaneous Access . . 224 11.7.1. File System Transitions and Simultaneous Access . . 224
11.7.2. Simultaneous Use and Transparent Transitions . . . . 224 11.7.2. Simultaneous Use and Transparent Transitions . . . . 225
11.7.3. Filehandles and File System Transitions . . . . . . 227 11.7.3. Filehandles and File System Transitions . . . . . . 227
11.7.4. Fileids and File System Transitions . . . . . . . . 227 11.7.4. Fileids and File System Transitions . . . . . . . . 228
11.7.5. Fsids and File System Transitions . . . . . . . . . 229 11.7.5. Fsids and File System Transitions . . . . . . . . . 229
11.7.6. The Change Attribute and File System Transitions . . 229 11.7.6. The Change Attribute and File System Transitions . . 230
11.7.7. Lock State and File System Transitions . . . . . . . 230 11.7.7. Lock State and File System Transitions . . . . . . . 230
11.7.8. Write Verifiers and File System Transitions . . . . 234 11.7.8. Write Verifiers and File System Transitions . . . . 234
11.7.9. Readdir Cookies and Verifiers and File System 11.7.9. Readdir Cookies and Verifiers and File System
Transitions . . . . . . . . . . . . . . . . . . . . 234 Transitions . . . . . . . . . . . . . . . . . . . . 234
11.7.10. File System Data and File System Transitions . . . . 234 11.7.10. File System Data and File System Transitions . . . . 235
11.8. Effecting File System Referrals . . . . . . . . . . . . 236 11.8. Effecting File System Referrals . . . . . . . . . . . . 236
11.8.1. Referral Example (LOOKUP) . . . . . . . . . . . . . 236 11.8.1. Referral Example (LOOKUP) . . . . . . . . . . . . . 236
11.8.2. Referral Example (READDIR) . . . . . . . . . . . . . 240 11.8.2. Referral Example (READDIR) . . . . . . . . . . . . . 240
11.9. The Attribute fs_locations . . . . . . . . . . . . . . . 242 11.9. The Attribute fs_locations . . . . . . . . . . . . . . . 243
11.10. The Attribute fs_locations_info . . . . . . . . . . . . 245 11.10. The Attribute fs_locations_info . . . . . . . . . . . . 245
11.10.1. The fs_locations_server4 Structure . . . . . . . . . 248 11.10.1. The fs_locations_server4 Structure . . . . . . . . . 248
11.10.2. The fs_locations_info4 Structure . . . . . . . . . . 253 11.10.2. The fs_locations_info4 Structure . . . . . . . . . . 254
11.10.3. The fs_locations_item4 Structure . . . . . . . . . . 254 11.10.3. The fs_locations_item4 Structure . . . . . . . . . . 255
11.11. The Attribute fs_status . . . . . . . . . . . . . . . . 256 11.11. The Attribute fs_status . . . . . . . . . . . . . . . . 257
12. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 260 12. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 260
12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 260 12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 260
12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 262 12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 262
12.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 262 12.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 262
12.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 262 12.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 262
12.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 263 12.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 263
12.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 263 12.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 263
12.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 263 12.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 263
12.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 263 12.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 263
12.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 263 12.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 263
skipping to change at page 147, line 6 skipping to change at page 147, line 6
a particular file system, as opposed to all of the data within it, a particular file system, as opposed to all of the data within it,
the server can apply the security policy of a shared resource in the the server can apply the security policy of a shared resource in the
server's namespace to components of the resource's ancestors. For server's namespace to components of the resource's ancestors. For
example: example:
/ (place holder/not exported) / (place holder/not exported)
/a/b (file system 1) /a/b (file system 1)
/a/b/MySecretProject (file system 2) /a/b/MySecretProject (file system 2)
The /a/b/MySecretProject directory is a real file system and is the The /a/b/MySecretProject directory is a real file system and is the
shared resource. Suppose the security policy for /a/b/ shared resource. Suppose the security policy for /a/b/
MySecretProject is Kerberos with integrity and it desired that MySecretProject is Kerberos with integrity and it is desired to limit
knowledge of the existence of this file system to be very limited. knowledge of the existence of this file system. In this case, the
In this case the server should apply the same security policy to server should apply the same security policy to /a/b. This allows
/a/b. This allows for knowledge the existence of a file system to be for knowledge of the existence of a file system to be secured when
secured in cases where this is desirable. desirable.
For the case of the use of multiple, disjoint security mechanisms in For the case of the use of multiple, disjoint security mechanisms in
the server's resources, applying that sort of policy would result in the server's resources, applying that sort of policy would result in
the higher-level file system not being accessible using any security the higher-level file system not being accessible using any security
flavor, which would make the that higher-level file system flavor, which would make the that higher-level file system
inaccessible. Therefore, that sort of configuration is not inaccessible. Therefore, that sort of configuration is not
compatible with hiding the existence (as opposed to the contents) compatible with hiding the existence (as opposed to the contents)
from clients using multiple disjoint sets of security flavors. from clients using multiple disjoint sets of security flavors.
In other circumstances, a desirable policy is for the security of a In other circumstances, a desirable policy is for the security of a
particular object in the server's namespace should include the union particular object in the server's namespace should include the union
of all security mechanisms of all direct descendants. A common and of all security mechanisms of all direct descendants. A common and
convenient practice, unless strong security requirements dictate convenient practice, unless strong security requirements dictate
otherwise, is to make all of the pseudo file system accessible by all otherwise, is to make all of the pseudo file system accessible by all
of the valid security mechanisms. of the valid security mechanisms.
Where there is concern about the security of data on the wire, Where there is concern about the security of data on the network,
clients should use strong security mechanisms to access the pseudo clients should use strong security mechanisms to access the pseudo
file system in order to prevent man-in-the-middle-attacks from file system in order to prevent man-in-the-middle attacks.
directing LOOKUPs within the pseudo file system from compromising the
existence of sensitive data, or getting access to data that the
client is sending by directing the client to send it using weak
security mechanisms.
8. State Management 8. State Management
Integrating locking into the NFS protocol necessarily causes it to be Integrating locking into the NFS protocol necessarily causes it to be
stateful. With the inclusion of such features as share reservations, stateful. With the inclusion of such features as share reservations,
file and directory delegations, recallable layouts, and support for file and directory delegations, recallable layouts, and support for
mandatory record locking the protocol becomes substantially more mandatory record locking, the protocol becomes substantially more
dependent on proper management of state than the traditional dependent on proper management of state than the traditional
combination of NFS and NLM [36]. These features include expanded combination of NFS and NLM [36]. These features include expanded
locking facilities, which provide some measure of interclient locking facilities, which provide some measure of interclient
exclusion, but the state is also valuable to providing other useful exclusion, but the state is also valuable to offering features not
features not readily providable using a stateless model. There are readily providable using a stateless model. There are three
three components to making this state manageable: components to making this state manageable:
o Clear division between client and server o Clear division between client and server
o Ability to reliably detect inconsistency in state between client o Ability to reliably detect inconsistency in state between client
and server and server
o Simple and robust recovery mechanisms o Simple and robust recovery mechanisms
In this model, the server owns the state information. The client In this model, the server owns the state information. The client
requests changes in locks and the server responds with the changes requests changes in locks and the server responds with the changes
made. Non-client-initiated changes in locking state are infrequent made. Non-client-initiated changes in locking state are infrequent
and the client receives prompt notification of them and can adjust and the client receives prompt notification of them and can adjust
its view of the locking state to reflect the server's changes. its view of the locking state to reflect the server's changes.
Individual pieces of state created by the server and passed to the Individual pieces of state created by the server and passed to the
client at its request are represented by 128-bit stateids. These client at its request are represented by 128-bit stateids. These
stateids may represent a particular open file, a set of byte-range stateids may represent a particular open file, a set of byte-range
locks held by a particular owner, or a recallable delegation of locks held by a particular owner, or a recallable delegation of
skipping to change at page 149, line 4 skipping to change at page 148, line 47
and a unitary client. and a unitary client.
8.2. Stateid Definition 8.2. Stateid Definition
When the server grants a lock of any type (including opens, record When the server grants a lock of any type (including opens, record
locks, delegations, and layouts) it responds with a unique stateid, locks, delegations, and layouts) it responds with a unique stateid,
that represents a set of locks (often a single lock) for the same that represents a set of locks (often a single lock) for the same
file, of the same type, and sharing the same ownership file, of the same type, and sharing the same ownership
characteristics. Thus opens of the same file by different open- characteristics. Thus opens of the same file by different open-
owners each have an identifying stateid. Similarly, each set of owners each have an identifying stateid. Similarly, each set of
record locks on a file owned by a specific lock-owner and gotten via record locks on a file owned by a specific lock-owner has its own
an open for a specific open-owner, has its own identifying stateid. identifying stateid. Delegations and layouts also have associated
Delegations and layouts also have associated stateids by which they stateids by which they may be referenced. The stateid is used as a
may be referenced. The stateid is used as a shorthand reference to a shorthand reference to a lock or set of locks and given a stateid the
lock or set of locks and given a stateid the server can determine the server can determine the associated state-owner or state-owners (in
associated state-owner or state-owners (in the case of an open-owner/ the case of an open-owner/lock-owner pair) and the associated
lock-owner pair) and the associated filehandle. When stateids are filehandle. When stateids are used, the current filehandle must be
used, the current filehandle must be the one associated with that the one associated with that stateid.
stateid.
All stateids associated with a given clientid are associated with a All stateids associated with a given client ID are associated with a
common lease which represents the claim of those stateids and the common lease which represents the claim of those stateids and the
objects they represent to be maintained by the server. See objects they represent to be maintained by the server. See
Section 8.3 for a discussion of leases. Section 8.3 for a discussion of leases.
The server may assign stateids independently for different clients. The server may assign stateids independently for different clients.
A stateid with the same bit pattern for one client may designate an A stateid with the same bit pattern for one client may designate an
entirely different set of locks for a different client. The stateid entirely different set of locks for a different client. The stateid
is always interpreted with respect to the client ID associated with is always interpreted with respect to the client ID associated with
the current session. Stateids apply to all sessions associated with the current session. Stateids apply to all sessions associated with
the given client ID and the client may use a stateid obtained from the given client ID and the client may use a stateid obtained from
skipping to change at page 149, line 38 skipping to change at page 149, line 32
With the exception of special stateids, to be discussed later, each With the exception of special stateids, to be discussed later, each
stateid represents locking objects of one of a set of types defined stateid represents locking objects of one of a set of types defined
by the NFSv4.1 protocol. Note that in all these cases, where we by the NFSv4.1 protocol. Note that in all these cases, where we
speak of guarantee, it is understood there are situations such as a speak of guarantee, it is understood there are situations such as a
client restart, or lock revocation, that allow the guarantee to be client restart, or lock revocation, that allow the guarantee to be
voided. voided.
o Stateids may represent opens of files. o Stateids may represent opens of files.
Each stateid in this case represents the open for a given Each stateid in this case represents the open for a given client
clientid/open-owner/filehandle triple. Such stateids are subject ID/open-owner/filehandle triple. Such stateids are subject to
to change (with consequent bumping of the seqid) in response to change (with consequent incrementing of the stateid's seqid) in
OPENs that result in upgrade and OPEN_DOWNGRADE operations. response to OPENs that result in upgrade and OPEN_DOWNGRADE
operations.
o Stateids may represent sets of byte-range locks. o Stateids may represent sets of byte-range locks.
All locks held on a particular file by a particular owner and all All locks held on a particular file by a particular owner and all
gotten under the aegis of a particular open file are associated gotten under the aegis of a particular open file are associated
with a single stateid with the seqid being bumped as LOCK and with a single stateid with the seqid being increment whenever LOCK
LOCKU operation affect that set of locks. and LOCKU operations affect that set of locks.
o Stateids may represent file delegations, which are recallable o Stateids may represent file delegations, which are recallable
guarantees by the server to the client, that other clients will guarantees by the server to the client, that other clients will
not reference, or will not modify a particular file, until the not reference, or will not modify a particular file, until the
delegation is returned. In NFSv4.1, file delegations may be delegation is returned. In NFSv4.1, file delegations may be
obtained on both regular and non-regular files. obtained on both regular and non-regular files.
A stateid represents a single delegation held by a client for a A stateid represents a single delegation held by a client for a
particular filehandle. particular filehandle.
skipping to change at page 150, line 25 skipping to change at page 150, line 20
A stateid represents a single delegation held by a client for a A stateid represents a single delegation held by a client for a
particular directory filehandle. particular directory filehandle.
o Stateids may represent layouts, which are recallable guarantees by o Stateids may represent layouts, which are recallable guarantees by
the server to the client, that particular files may be accessed the server to the client, that particular files may be accessed
via an alternate data access protocol at specific locations. Such via an alternate data access protocol at specific locations. Such
access is limited to particular sets of byte ranges and may access is limited to particular sets of byte ranges and may
proceed until those byte ranges are reduced or the layout is proceed until those byte ranges are reduced or the layout is
returned. returned.
A stateid represents all layouts held by a particular client for a A stateid represents the set of all layouts held by a particular
particular filehandle with a given layout type. The seqid is client for a particular filehandle with a given layout type. The
updated as the contents of that set changes with LAYOUT seqid is updated as the layouts of that set changes with layout
stateid changing operations such as LAYOUTGET and LAYOUTRETURN.
8.2.2. Stateid Structure 8.2.2. Stateid Structure
Stateids are divided into two fields, a 96-bit "other" field Stateids are divided into two fields, a 96-bit "other" field
identifying the specific set of locks and a 32-bit "seqid" sequence identifying the specific set of locks and a 32-bit "seqid" sequence
value. Except in the case of special stateids, to be discussed value. Except in the case of special stateids, to be discussed
below, a particular value of the "other" field denotes a set of locks below, a particular value of the "other" field denotes a set of locks
of the same type (for example byte-range locks, opens, delegations, of the same type (for example byte-range locks, opens, delegations,
or layouts), for a specific file or directory, and sharing the same or layouts), for a specific file or directory, and sharing the same
ownership characteristics. The seqid designates a specific instance ownership characteristics. The seqid designates a specific instance
skipping to change at page 156, line 8 skipping to change at page 156, line 4
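The two-field stateid layout described in Section 8.2.2 (a 96-bit "other" field plus a 32-bit "seqid") can be sketched as follows. This is only an illustration, not text from the draft: the class name `Stateid`, the helper `same_lock_set`, and the choice to encode seqid first as a big-endian 32-bit integer followed by the 12 opaque bytes are assumptions made for the example.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Stateid:
    """Illustrative 128-bit stateid: 32-bit seqid plus 96-bit "other"."""
    seqid: int    # 32-bit sequence value
    other: bytes  # 96 bits (12 bytes) identifying the set of locks

    def __post_init__(self):
        if not 0 <= self.seqid <= 0xFFFFFFFF:
            raise ValueError("seqid must fit in 32 bits")
        if len(self.other) != 12:
            raise ValueError("other must be exactly 12 bytes (96 bits)")

    def pack(self) -> bytes:
        # Assumed wire order for this sketch: seqid first, then "other".
        return struct.pack(">I", self.seqid) + self.other

    @classmethod
    def unpack(cls, data: bytes) -> "Stateid":
        if len(data) != 16:
            raise ValueError("a stateid is 16 bytes (128 bits)")
        (seqid,) = struct.unpack(">I", data[:4])
        return cls(seqid, data[4:])


def same_lock_set(a: Stateid, b: Stateid) -> bool:
    # Two stateids name the same set of locks when "other" matches;
    # the seqid only distinguishes instances of that set.
    return a.other == b.other


example = Stateid(seqid=3, other=b"\x01" * 12)
round_tripped = Stateid.unpack(example.pack())
```

Keeping "other" and "seqid" as separate fields makes it easy to compare which lock set two stateids refer to without regard to how many times that set has been updated.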
8.2.5. Stateid Use for I/O Operations 8.2.5. Stateid Use for I/O Operations
Clients performing I/O operations (and SETATTR's modifying the file Clients performing I/O operations (and SETATTR's modifying the file
size), need to select an appropriate stateid based on the locks size), need to select an appropriate stateid based on the locks
(including opens and delegations) held by the client and the various (including opens and delegations) held by the client and the various
types of state-owners issuing the I/O requests. types of state-owners issuing the I/O requests.
The following rules, applied in order of decreasing priority, govern The following rules, applied in order of decreasing priority, govern
the selection of the appropriate stateid. Note that the rules are the selection of the appropriate stateid. Note that the rules are
slightly different in the case of I/O to data servers when file slightly different in the case of I/O to data servers when file
layouts are being used. (See Section 13.9.1). layouts are being used (see Section 13.9.1).
o If the client holds a delegation for the file in question, the o If the client holds a delegation for the file in question, the
delegation stateid should be used. delegation stateid SHOULD be used.
o Otherwise, if the entity (e.g. process) corresponding to the lock-owner o Otherwise, if the entity (e.g. process) corresponding to the lock-owner
and issuing the I/O has a lock stateid for the associated open file, and issuing the I/O has a lock stateid for the associated open file,
then the lock stateid for that lock-owner and open file should be then the lock stateid for that lock-owner and open file SHOULD be
used. used.
o If there is no lock stateid, then the open stateid for the open o If there is no lock stateid, then the open stateid for the open
file in question is used. file in question SHOULD be used.
o Finally, if none of the above apply, then a special stateid should o Finally, if none of the above apply, then a special stateid SHOULD
be used. be used.
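The priority order above can be sketched as a simple first-match lookup. This is a minimal illustration, not part of the draft: the names Stateid, select_io_stateid, and SPECIAL_STATEID are hypothetical, and the all-zero value is only a stand-in for whichever special stateid applies.

```python
from typing import NamedTuple, Optional


class Stateid(NamedTuple):
    seqid: int    # 32-bit "seqid" sequence value
    other: bytes  # 96-bit "other" field identifying the set of locks


# Assumption: an all-zero value stands in for a special stateid here.
SPECIAL_STATEID = Stateid(seqid=0, other=bytes(12))


def select_io_stateid(delegation: Optional[Stateid],
                      lock: Optional[Stateid],
                      open_: Optional[Stateid]) -> Stateid:
    """Return the first available stateid in decreasing priority order."""
    for candidate in (delegation, lock, open_):
        if candidate is not None:
            return candidate
    # None of the above apply, so fall back to a special stateid.
    return SPECIAL_STATEID
```

A caller passes None for each kind of stateid it does not hold. I/O to data servers under file layouts follows slightly different rules and is not modeled here.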
8.3. Lease Renewal 8.3. Lease Renewal
The purpose of a lease is to provide allow the client to indicate to The purpose of a lease is to allow the client to indicate to the
the server, in a low-overhead way, that it is active, and thus that server, in a low-overhead way, that it is active, and thus that the
the server is to retain its locks. This arrangement allows the server is to retain the client's locks. This arrangement allows the
server to remove stale locking-related objects that are held by a server to remove stale locking-related objects that are held by a
client that has crashed or is otherwise unreachable, once the client that has crashed or is otherwise unreachable, once the
relevant lease expires. This allows other clients to obtain relevant lease expires. This in turn allows other clients to obtain
conflicting locks without being delayed indefinitely by inactive or conflicting locks without being delayed indefinitely by inactive or
unreachable clients. It is not a mechanism for cache consistency and unreachable clients. It is not a mechanism for cache consistency and
lease renewals may not be denied if the lease interval has not lease renewals may not be denied if the lease interval has not
expired. expired.
Since each session is associated with a specific client (identified Since each session is associated with a specific client (identified
by the client's client ID), any operation sent on that session is an by the client's client ID), any operation sent on that session is an
indication that the associated client is reachable. When a request indication that the associated client is reachable. When a request
is sent for a given session, successful execution of a SEQUENCE is sent for a given session, successful execution of a SEQUENCE
operation (or successful retrieval of the result of SEQUENCE from the operation (or successful retrieval of the result of SEQUENCE from the
reply cache) on an unexpired lease will result in the lease being reply cache) on an unexpired lease will result in the lease being
implicitly renewed, for the standard renewal period. implicitly renewed, for the standard renewal period (equal to the
lease_time attribute).
If the client ID's lease has not expired when the server receives a If the client ID's lease has not expired when the server receives a
SEQUENCE operation, then the server MUST renew the lease. If the SEQUENCE operation, then the server MUST renew the lease. If the
client ID's lease has expired when the server receives a SEQUENCE client ID's lease has expired when the server receives a SEQUENCE
operation, the server MAY renew the lease; this depends on whether operation, the server MAY renew the lease; this depends on whether
any state was revoked as a result of the client's failure to renew any state was revoked as a result of the client's failure to renew
the lease before expiration. the lease before expiration.
Absent other activity that would renew the lease, a COMPOUND Absent other activity that would renew the lease, a COMPOUND
consisting of a single SEQUENCE operation will suffice. The client consisting of a single SEQUENCE operation will suffice. The client
should also take communication-related delays into account and take should also take communication-related delays into account and take
steps to ensure that the renewal messages actually reach the server steps to ensure that the renewal messages actually reach the server
in good time. For example: in good time. For example:
o When trunking is in effect, the client should consider issuing o When trunking is in effect, the client should consider issuing
multiple requests on different connections, in order to ensure multiple requests on different connections, in order to ensure
that renewal occurs, even in the event of blockage in the path that renewal occurs, even in the event of blockage in the path
used for one of those connections. used for one of those connections.
o TCP retransmission delays might become so large as to approach or o Transport retransmission delays might become so large as to
exceed the length of the lease period. This may be particularly approach or exceed the length of the lease period. This may be
likely when the server is unresponsive due to a restart; see particularly likely when the server is unresponsive due to a
Section 8.4.2.1 restart; see Section 8.4.2.1. If the client implementation is not
careful, transport retransmission delays can result in the client
failing to detect a server restart before the grace period ends.
The scenario is that the client is using a transport with
exponential back off, such that the maximum retransmission timeout
exceeds both the grace period and the lease_time attribute. A
network partition causes the client's connection's retransmission
interval to back off, and even after the partition heals, the next
transport-level retransmission is sent after the server has
restarted and its grace period ends.
The client MUST either recover from the ensuing NFS4ERR_NOGRACE
errors, or it MUST ensure that despite transport level
retransmission intervals that exceed the lease_time, nonetheless a
SEQUENCE operation is sent that renews the lease before
expiration. The client can achieve this by associating a new
connection with the session, and sending a SEQUENCE operation on
it. However, if the attempt to establish a new connection is
delayed for the same reason (exponential backoff of the connection
establishment packets), the client will have to abort the
connection establishment attempt before the lease expires, and try
again.
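The client-side precautions above can be sketched as a small decision function. This is an illustration under stated assumptions only: the function name, the action strings, and the thresholds (half and three quarters of the lease period, and a connection patience of one tenth of it) are hypothetical choices, not values taken from the draft.

```python
from typing import Optional


def renewal_action(now: float, last_renewal: float, lease_time: float,
                   sequence_sent_at: Optional[float] = None,
                   connect_started_at: Optional[float] = None) -> str:
    """Decide what a client should do next to keep its lease alive."""
    renew_at = last_renewal + 0.5 * lease_time      # assumed threshold
    fallback_at = last_renewal + 0.75 * lease_time  # assumed threshold
    connect_patience = 0.1 * lease_time             # assumed threshold

    if now < renew_at:
        return "wait"
    if sequence_sent_at is None:
        # A COMPOUND holding a single SEQUENCE is enough to renew.
        return "send_sequence"
    if now < fallback_at:
        return "wait"  # still waiting for the reply
    if connect_started_at is None:
        # The transport may be stuck in backoff: use a new connection.
        return "open_new_connection"
    if now - connect_started_at >= connect_patience:
        # Connection establishment is backing off too: abort and retry.
        return "abort_and_retry_connect"
    return "wait"
```

The point of the sketch is that every deadline is derived from lease_time rather than from transport timers, so transport-level backoff cannot silently outlast the lease.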
If the server renews the lease upon receiving a SEQUENCE operation, If the server renews the lease upon receiving a SEQUENCE operation,
the server MUST NOT allow the lease to expire while the rest of the the server MUST NOT allow the lease to expire while the rest of the
operations in the COMPOUND procedure's request are still executing. operations in the COMPOUND procedure's request are still executing.
Once the last operation has finished, and the response to COMPOUND Once the last operation has finished, and the response to COMPOUND
has been sent, the server MUST set the lease to expire no sooner than has been sent, the server MUST set the lease to expire no sooner than
the sum of current time and the value of the lease_time attribute. the sum of current time and the value of the lease_time attribute.
A client ID's lease can expire when it has been at least the lease A client ID's lease can expire when it has been at least the lease
interval (lease_time) since the last lease-renewing SEQUENCE interval (lease_time) since the last lease-renewing SEQUENCE
operation was sent on any of the client ID's sessions and there must operation was sent on any of the client ID's sessions and there are
be no active COMPOUND operations on any such session. no active COMPOUND operations on any such sessions.
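The server-side renewal and expiration rules above can be modeled roughly as follows. This is a sketch, not the draft's algorithm: the class and method names are hypothetical, and for an already expired lease the sketch simply declines to renew, which is only one of the choices the text permits.

```python
from dataclasses import dataclass


@dataclass
class ClientLease:
    lease_time: float          # value of the lease_time attribute, in seconds
    expires_at: float          # earliest time at which the lease may expire
    active_compounds: int = 0  # COMPOUNDs still executing on any session

    def may_expire(self, now: float) -> bool:
        # A lease cannot expire while a COMPOUND is still executing.
        return self.active_compounds == 0 and now >= self.expires_at

    def on_sequence(self, now: float) -> bool:
        """Handle a SEQUENCE; return True if the lease is renewed."""
        if self.may_expire(now):
            # Expired lease: the server MAY renew; this sketch does not.
            return False
        self.active_compounds += 1
        return True

    def on_compound_done(self, now: float) -> None:
        """Call once the COMPOUND response has been sent."""
        self.active_compounds -= 1
        # Expire no sooner than the current time plus lease_time.
        self.expires_at = max(self.expires_at, now + self.lease_time)
```

For example, a COMPOUND that starts at time 10 and finishes at time 200 under a 90-second lease leaves the lease valid until at least time 290.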
Because the SEQUENCE operation is the basic mechanism to renew a Because the SEQUENCE operation is the basic mechanism to renew a
lease, and because it must be done at least once for each lease lease, and because it must be done at least once for each lease
period, it is the natural mechanism whereby the server will inform period, it is the natural mechanism whereby the server will inform
the client of changes in the lease status that the client needs to be the client of changes in the lease status that the client needs to be
informed of. The client should inspect the status flags informed of. The client should inspect the status flags
(sr_status_flags) returned by SEQUENCE and take the appropriate (sr_status_flags) returned by SEQUENCE and take the appropriate
action. (See Section 18.46.3 for details). action (see Section 18.46.3 for details).
o The status bits SEQ4_STATUS_CB_PATH_DOWN and o The status bits SEQ4_STATUS_CB_PATH_DOWN and
SEQ4_STATUS_CB_PATH_DOWN_SESSION indicate problems with the SEQ4_STATUS_CB_PATH_DOWN_SESSION indicate problems with the
backchannel which the client may need to address in order to backchannel which the client may need to address in order to
receive callback requests. receive callback requests.
o The status bits SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING and o The status bits SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING and
SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED indicates actual problems with SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED indicate problems with GSS
GSS contexts for the backchannel which the client may have to contexts for the backchannel which the client may have to address
address to allow callback requests to be sent to it. to allow callback requests to be sent to it.
o The status bits SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED, o The status bits SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED,
SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED, SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED,
SEQ4_STATUS_ADMIN_STATE_REVOKED, and SEQ4_STATUS_ADMIN_STATE_REVOKED, and
SEQ4_STATUS_RECALLABLE_STATE_REVOKED notify the client of lock SEQ4_STATUS_RECALLABLE_STATE_REVOKED notify the client of lock
revocation events. When these bits are set, the client should use revocation events. When these bits are set, the client should use
TEST_STATEID to find what stateids have been revoked and use TEST_STATEID to find what stateids have been revoked and use
FREE_STATEID to acknowledge loss of the associated state. FREE_STATEID to acknowledge loss of the associated state.
o The status bit SEQ4_STATUS_LEASE_MOVE indicates that o The status bit SEQ4_STATUS_LEASE_MOVE indicates that
responsibility for lease renewal has been transferred to one or responsibility for lease renewal has been transferred to one or
more new servers. more new servers.
o The status bit SEQ4_STATUS_RESTART_RECLAIM_NEEDED indicates that o The status bit SEQ4_STATUS_RESTART_RECLAIM_NEEDED indicates that
due to server restart the client must reclaim locking state. due to server restart the client must reclaim locking state.
o The status bit SEQ4_STATUS_BACKCHANNEL_FAULT indicates server has o The status bit SEQ4_STATUS_BACKCHANNEL_FAULT indicates the server
encountered an unrecoverable fault with the backchannel (e.g. it has encountered an unrecoverable fault with the backchannel (e.g.
has lost track of a sequence id for a slot in the backchannel). it has lost track of a sequence id for a slot in the backchannel).
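A client might map the status bits listed above to follow-up actions along these lines. This is a sketch under assumptions: the bit positions are placeholders generated by auto() and are NOT the protocol's wire values, the flag names mirror this section with the SEQ4_STATUS_ prefix dropped, and the action strings are hypothetical labels rather than defined operations.

```python
from enum import IntFlag, auto
from typing import List


class Seq4Status(IntFlag):
    # Placeholder bit positions, not the values defined by the protocol.
    CB_PATH_DOWN = auto()
    CB_PATH_DOWN_SESSION = auto()
    CB_GSS_CONTEXTS_EXPIRING = auto()
    CB_GSS_CONTEXTS_EXPIRED = auto()
    EXPIRED_ALL_STATE_REVOKED = auto()
    EXPIRED_SOME_STATE_REVOKED = auto()
    ADMIN_STATE_REVOKED = auto()
    RECALLABLE_STATE_REVOKED = auto()
    LEASE_MOVE = auto()
    RESTART_RECLAIM_NEEDED = auto()
    BACKCHANNEL_FAULT = auto()


S = Seq4Status
CHECKS = [
    (S.CB_PATH_DOWN | S.CB_PATH_DOWN_SESSION, "repair backchannel path"),
    (S.CB_GSS_CONTEXTS_EXPIRING | S.CB_GSS_CONTEXTS_EXPIRED,
     "refresh backchannel GSS contexts"),
    (S.EXPIRED_ALL_STATE_REVOKED | S.EXPIRED_SOME_STATE_REVOKED
     | S.ADMIN_STATE_REVOKED | S.RECALLABLE_STATE_REVOKED,
     "TEST_STATEID then FREE_STATEID"),
    (S.LEASE_MOVE, "find servers now responsible for the lease"),
    (S.RESTART_RECLAIM_NEEDED, "reclaim locking state"),
    (S.BACKCHANNEL_FAULT, "recover from backchannel fault"),
]


def actions_for(flags: Seq4Status) -> List[str]:
    """Return one action per group of status bits set in flags."""
    return [action for mask, action in CHECKS if flags & mask]
```

Grouping the bits by mask keeps the four revocation flags on a single code path, matching the shared TEST_STATEID and FREE_STATEID handling described above.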
8.4. Crash Recovery 8.4. Crash Recovery
A critical requirement in crash recovery is that both the client and A critical requirement in crash recovery is that both the client and
the server know when the other has failed. Additionally, it is the server know when the other has failed. Additionally, it is
required that a client sees a consistent view of data across server required that a client sees a consistent view of data across server
restarts. All READ and WRITE operations that may have been queued restarts. All READ and WRITE operations that may have been queued
within the client or network buffers must wait until the client has within the client or network buffers must wait until the client has
successfully recovered the locks protecting the READ and WRITE successfully recovered the locks protecting the READ and WRITE
operations. Any that reach the server before the server can safely operations. Any that reach the server before the server can safely
determine that the client has recovered enough locking state to be determine that the client has recovered enough locking state to be
sure that such operations can be safely processed must be rejected. sure that such operations can be safely processed must be rejected.
This will happen because either: This will happen because either:
o The state presented is no longer valid since it is associated with o The state presented is no longer valid since it is associated with
a now invalid clientid. In this case the client will receive a now invalid client ID. In this case the client will receive
either an NFS4ERR_BADSESSION or NFS4ERR_DEADSESSION error, and any either an NFS4ERR_BADSESSION or NFS4ERR_DEADSESSION error, and any
attempt to attach a new session to the existing clientid will attempt to attach a new session to the existing client ID will
encounter an NFS4ERR_STALE_CLIENTID error. result in an NFS4ERR_STALE_CLIENTID error.
o Subsequent recovery of locks may make execution of the operation o Subsequent recovery of locks may make execution of the operation
inappropriate (NFS4ERR_GRACE). inappropriate (NFS4ERR_GRACE).
8.4.1. Client Failure and Recovery 8.4.1. Client Failure and Recovery
In the event that a client fails, the server may release the client's In the event that a client fails, the server may release the client's
locks when the associated lease has expired. Conflicting locks from locks when the associated lease has expired. Conflicting locks from
another client may only be granted after this lease expiration. As another client may only be granted after this lease expiration. As
discussed in Section 8.3, when a client has not failed and re- discussed in Section 8.3, when a client has not failed and re-
establishes its lease before expiration occurs, requests for establishes its lease before expiration occurs, requests for
conflicting locks will not be granted. conflicting locks will not be granted.
To minimize client delay upon restart, lock requests are associated To minimize client delay upon restart, lock requests are associated
with an instance of the client by a client-supplied verifier. This with an instance of the client by a client-supplied verifier. This
verifier is part of the client_owner4 sent in the initial EXCHANGE_ID verifier is part of the client_owner4 sent in the initial EXCHANGE_ID
call made by the client. The server returns a client ID as a result call made by the client. The server returns a client ID as a result
of the EXCHANGE_ID operation. The client then confirms the use of of the EXCHANGE_ID operation. The client then confirms the use of
the client ID by establishing a session associated with that client the client ID by establishing a session associated with that client
ID. See Section 18.36.3 for a description of how this is done. All ID (see Section 18.36.3 for a description of how this is done). All
locks, including opens, record locks, delegations, and layouts locks, including opens, record locks, delegations, and layouts
obtained by sessions using that client ID are associated with that obtained by sessions using that client ID are associated with that
client ID. client ID.
Since the verifier will be changed by the client upon each Since the verifier will be changed by the client upon each
initialization, the server can compare a new verifier to the verifier initialization, the server can compare a new verifier to the verifier
associated with currently held locks and determine that they do not associated with currently held locks and determine that they do not
match. This signifies the client's new instantiation and subsequent match. This signifies the client's new instantiation and subsequent
loss of locking state. As a result, the server is free to release loss (upon confirmation of the new client ID) of locking state. As a
all locks held which are associated with the old client ID which was result, the server is free to release all locks held which are
derived from the old verifier. At this point conflicting locks from associated with the old client ID which was derived from the old
other clients, kept waiting while the lease had not yet expired, can verifier. At this point conflicting locks from other clients, kept
be granted. In addition, all stateids associated with the old waiting while the lease had not yet expired, can be granted. In
clientid can also be freed, as they are no longer reference-able. addition, all stateids associated with the old client ID can also be
freed, as they are no longer reference-able.
Note that the verifier must have the same uniqueness properties as Note that the verifier must have the same uniqueness properties as
the verifier for the COMMIT operation. the verifier for the COMMIT operation.
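The verifier comparison described in this section can be sketched as a toy server-side table. This is an illustration only: the class and method names are hypothetical, create_session here models nothing beyond confirmation of the client ID, and a confirmed client ID cannot be confirmed a second time in this simplified model.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class ClientRecord:
    client_id: int
    verifier: bytes
    locks: Set[str] = field(default_factory=set)


class Server:
    def __init__(self) -> None:
        self.confirmed: Dict[bytes, ClientRecord] = {}  # keyed by owner
        self.unconfirmed: Dict[int, Tuple[bytes, ClientRecord]] = {}
        self._next_id = 1

    def exchange_id(self, owner: bytes, verifier: bytes) -> int:
        current = self.confirmed.get(owner)
        if current is not None and current.verifier == verifier:
            return current.client_id  # same client instance as before
        # A new verifier signals a new instantiation of the client.
        record = ClientRecord(self._next_id, verifier)
        self._next_id += 1
        self.unconfirmed[record.client_id] = (owner, record)
        return record.client_id

    def create_session(self, client_id: int) -> Set[str]:
        """Confirm client_id; return the locks released from the old one."""
        owner, record = self.unconfirmed.pop(client_id)
        old = self.confirmed.get(owner)
        released = old.locks if old is not None else set()
        self.confirmed[owner] = record  # old client ID is now unreachable
        return released
```

Note that the old locks survive the EXCHANGE_ID itself and are released only once the new client ID is confirmed.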
8.4.2. Server Failure and Recovery 8.4.2. Server Failure and Recovery
If the server loses locking state (usually as a result of a restart), If the server loses locking state (usually as a result of a restart),
it must allow clients time to discover this fact and re-establish the it must allow clients time to discover this fact and re-establish the
lost locking state. The client must be able to re-establish the lost locking state. The client must be able to re-establish the
locking state without having the server deny valid requests because locking state without having the server deny valid requests because
skipping to change at page 159, line 50 skipping to change at page 160, line 22
A client can determine that loss of locking state has occurred via A client can determine that loss of locking state has occurred via
several methods. several methods.
1. When a SEQUENCE (most common) or other operation returns 1. When a SEQUENCE (most common) or other operation returns
NFS4ERR_BADSESSION, this may mean the session has been destroyed, NFS4ERR_BADSESSION, this may mean the session has been destroyed,
but the client ID is still valid. The client sends a but the client ID is still valid. The client sends a
CREATE_SESSION request with the client ID to re-establish the CREATE_SESSION request with the client ID to re-establish the
session. If CREATE_SESSION fails with NFS4ERR_STALE_CLIENTID, session. If CREATE_SESSION fails with NFS4ERR_STALE_CLIENTID,
the client must establish a new client ID (see Section 8.1) and the client must establish a new client ID (see Section 8.1) and
re-establish its lock state after the CREATE_SESSION, with the re-establish its lock state with the new client ID, after the
new client ID CREATE_SESSION succeeds, (Section 8.4.2.1). CREATE_SESSION operation succeeds (see Section 8.4.2.1).
2. When a SEQUENCE (most common) or other operation on a persistent 2. When a SEQUENCE (most common) or other operation on a persistent
session returns NFS4ERR_DEADSESSION, this indicates that a session returns NFS4ERR_DEADSESSION, this indicates that a
session is no longer usable for new, i.e. not satisfied from the session is no longer usable for new, i.e. not satisfied from the
reply cache, operations. Once all pending operations are reply cache, operations. Once all pending operations are
determined to be either performed before the retry or not determined to be either performed before the retry or not
performed, the client sends a CREATE_SESSION request with the performed, the client sends a CREATE_SESSION request with the
client ID to re-establish the session. If CREATE_SESSION fails client ID to re-establish the session. If CREATE_SESSION fails
with NFS4ERR_STALE_CLIENTID, the client must establish a new with NFS4ERR_STALE_CLIENTID, the client must establish a new
client ID (see Section 8.1) and re-establish its lock state after client ID (see Section 8.1) and re-establish its lock state after
skipping to change at page 160, line 52 skipping to change at page 161, line 24
reliably determine (through state persistently maintained across reliably determine (through state persistently maintained across
restart instances), that granting any such lock cannot possibly restart instances), that granting any such lock cannot possibly
conflict with a subsequent reclaim. When a request is made to obtain conflict with a subsequent reclaim. When a request is made to obtain
a new lock (i.e. not a reclaim-type request) during the grace period a new lock (i.e. not a reclaim-type request) during the grace period
and such a determination cannot be made, the server must return the and such a determination cannot be made, the server must return the
error NFS4ERR_GRACE. error NFS4ERR_GRACE.
Once a session is established using the new client ID, the client Once a session is established using the new client ID, the client
will use reclaim-type locking requests (e.g. LOCK requests with will use reclaim-type locking requests (e.g. LOCK requests with
reclaim set to TRUE and OPEN operations with a claim type of reclaim set to TRUE and OPEN operations with a claim type of
CLAIM_PREVIOUS. See Section 9.11) to re-establish its locking state. CLAIM_PREVIOUS; see Section 9.11) to re-establish its locking state.
Once this is done, or if there is no such locking state to reclaim, Once this is done, or if there is no such locking state to reclaim,
the client sends a global RECLAIM_COMPLETE operation, i.e. one with the client sends a global RECLAIM_COMPLETE operation, i.e. one with
the rca_one_fs argument set to FALSE, to indicate that it has the rca_one_fs argument set to FALSE, to indicate that it has
reclaimed all of the locking state that it will reclaim. Once a reclaimed all of the locking state that it will reclaim. Once a
client sends such a RECLAIM_COMPLETE operation, it may attempt non- client sends such a RECLAIM_COMPLETE operation, it may attempt non-
reclaim locking operations, although it may get NFS4ERR_GRACE errors reclaim locking operations, although it may get NFS4ERR_GRACE errors
for the operations until the period of special handling is over. See for the operations until the period of special handling is over. See
Section 11.7.7 for a discussion of the analogous handling of lock Section 11.7.7 for a discussion of the analogous handling of lock
reclamation in the case of file systems transitioning from server to reclamation in the case of file systems transitioning from server to
server. server.
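The client's recovery sequence after a server restart can be sketched as follows. This is an illustration, not a client implementation: recover, StaleClientId, and RestartedServer are hypothetical names, the server object is a toy stand-in, and real reclaims are LOCK or OPEN requests rather than a single reclaim call.

```python
from typing import List, Tuple


class StaleClientId(Exception):
    """Stands in for an NFS4ERR_STALE_CLIENTID result."""


class RestartedServer:
    """Toy server that has restarted and forgotten all client IDs."""

    def __init__(self) -> None:
        self.valid_ids = set()
        self.log = []

    def create_session(self, client_id: int) -> None:
        if client_id not in self.valid_ids:
            raise StaleClientId(client_id)
        self.log.append(("CREATE_SESSION", client_id))

    def exchange_id(self) -> int:
        new_id = 100 + len(self.valid_ids)
        self.valid_ids.add(new_id)
        self.log.append(("EXCHANGE_ID", new_id))
        return new_id

    def reclaim(self, client_id: int, lock: str) -> bool:
        self.log.append(("RECLAIM", lock))
        return True

    def reclaim_complete(self, client_id: int, one_fs: bool) -> None:
        self.log.append(("RECLAIM_COMPLETE", one_fs))


def recover(server, client_id: int,
            held_locks: List[str]) -> Tuple[int, List[str]]:
    """Re-establish a session and, if needed, reclaim locking state."""
    try:
        server.create_session(client_id)
        return client_id, []  # client ID still valid, nothing to reclaim
    except StaleClientId:
        pass
    new_id = server.exchange_id()
    server.create_session(new_id)
    reclaimed = [lock for lock in held_locks if server.reclaim(new_id, lock)]
    # Global RECLAIM_COMPLETE: rca_one_fs is FALSE.
    server.reclaim_complete(new_id, one_fs=False)
    return new_id, reclaimed
```

The sketch sends the global RECLAIM_COMPLETE even when held_locks is empty, matching the rule that a client with nothing to reclaim still signals completion.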
skipping to change at page 161, line 26 skipping to change at page 161, line 46
During the grace period, the server must reject READ and WRITE During the grace period, the server must reject READ and WRITE
operations and non-reclaim locking requests (i.e. other LOCK and OPEN operations and non-reclaim locking requests (i.e. other LOCK and OPEN
operations) with an error of NFS4ERR_GRACE, unless it is able to operations) with an error of NFS4ERR_GRACE, unless it is able to
guarantee that these may be done safely, as described below. guarantee that these may be done safely, as described below.
The grace period may last until all clients which are known to The grace period may last until all clients which are known to
possibly have had locks have done a global RECLAIM_COMPLETE possibly have had locks have done a global RECLAIM_COMPLETE
operation, indicating that they have finished reclaiming the locks operation, indicating that they have finished reclaiming the locks
they held before the server restart. This means that a client which they held before the server restart. This means that a client which
has done a RECLAIM_COMPLETE must be prepared to receive an has done a RECLAIM_COMPLETE must be prepared to receive an
NFS4ERR_GRACE when attempting to acquire new locks. The server is NFS4ERR_GRACE when attempting to acquire new locks. In order for the
assumed to maintain in stable storage a list of clients which may server to know that all clients with possible prior lock state have
have such locks. The server may also terminate the grace period done a RECLAIM_COMPLETE, the server must maintain in stable storage a
before all clients have done a global RECLAIM_COMPLETE. The server list of clients which may have such locks. The server may also
SHOULD NOT terminate the grace period before a time equal to the terminate the grace period before all clients have done a global
lease period in order to give clients an opportunity to find out RECLAIM_COMPLETE. The server SHOULD NOT terminate the grace period
about the server restart, as a result of issuing requests on before a time equal to the lease period in order to give clients an
associated sessions with a frequency governed by the lease time. opportunity to find out about the server restart, as a result of
Note that when a client does not issue such requests (or they are issuing requests on associated sessions with a frequency governed by
issued by the client but not received by the server), it is possible the lease time. Note that when a client does not issue such requests
for the grace period to expire before the client finds out that the (or they are issued by the client but not received by the server), it
server restart has occurred. is possible for the grace period to expire before the client finds
out that the server restart has occurred.
Some additional time in order to allow a client to establish a new Some additional time in order to allow a client to establish a new
client ID and session and to effect lock reclaims may be added to the client ID and session and to effect lock reclaims may be added to the
lease time. Note that analogous rules apply to file system-specific lease time. Note that analogous rules apply to file system-specific
grace periods discussed in Section 11.7.7. grace periods discussed in Section 11.7.7.
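One way to read the grace-period rules above is sketched below. This is an assumption-laden illustration: the class and its fields are hypothetical, the policy of ending the grace period as soon as every known client has sent a global RECLAIM_COMPLETE is one permitted choice rather than a requirement, and provably_safe stands for whatever determination the server is able to make reliably.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class GracePeriod:
    started_at: float
    lease_time: float
    known_clients: Set[str]  # clients listed in stable storage
    completed: Set[str] = field(default_factory=set)
    extra: float = 0.0       # optional additional time for reclaims

    def reclaim_complete(self, client: str) -> None:
        """Record a global RECLAIM_COMPLETE from client."""
        self.completed.add(client)

    def in_grace(self, now: float) -> bool:
        if self.known_clients <= self.completed:
            return False  # every client that may hold locks is done
        # Otherwise keep the grace period for at least the lease period.
        return now - self.started_at < self.lease_time + self.extra

    def admit_non_reclaim(self, now: float,
                          provably_safe: bool = False) -> str:
        """Decide on a READ, WRITE, or non-reclaim locking request."""
        if self.in_grace(now) and not provably_safe:
            return "NFS4ERR_GRACE"
        return "OK"
```

A client that has sent its own RECLAIM_COMPLETE still sees NFS4ERR_GRACE in this model until the other known clients finish or the period runs out.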
If the server can reliably determine that granting a non-reclaim If the server can reliably determine that granting a non-reclaim
request will not conflict with reclamation of locks by other clients, request will not conflict with reclamation of locks by other clients,
the NFS4ERR_GRACE error does not have to be returned even within the the NFS4ERR_GRACE error does not have to be returned even within the
grace period, although NFS4ERR_GRACE must always be returned to grace period, although NFS4ERR_GRACE must always be returned to
skipping to change at page 163, line 17 skipping to change at page 163, line 38
established, refetch the lease_time attribute and use it as the basis established, refetch the lease_time attribute and use it as the basis
for lease renewal for the lease associated with that server. for lease renewal for the lease associated with that server.
However, the server must establish, for this restart event, a grace However, the server must establish, for this restart event, a grace
period at least as long as the lease period for the previous server period at least as long as the lease period for the previous server
instantiation. This allows the client state obtained during the instantiation. This allows the client state obtained during the
previous server instance to be reliably re-established. previous server instance to be reliably re-established.
8.4.3. Network Partitions and Recovery 8.4.3. Network Partitions and Recovery
If the duration of a network partition is greater than the lease If the duration of a network partition is greater than the lease
period provided by the server, the server will have not received a period provided by the server, the server will not have received a
lease renewal from the client. If this occurs, the server may free lease renewal from the client. If this occurs, the server may free
all locks held for the client, or it may allow the lock state to all locks held for the client, or it may allow the lock state to
remain for a considerable period, subject to the constraint that if a remain for a considerable period, subject to the constraint that if a
request for a conflicting lock is made, locks associated with an request for a conflicting lock is made, locks associated with an
expired lease do not prevent such a conflicting lock from being expired lease do not prevent such a conflicting lock from being
granted but MUST be revoked as necessary so as not to interfere with granted but MUST be revoked as necessary so as not to interfere with
such conflicting requests. such conflicting requests.
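The constraint above, that locks held under an expired lease must not block a conflicting request, can be sketched for the simplest case. This is a sketch under assumptions: every lock is treated as exclusive, the function and parameter names are hypothetical, and only the single conflicting lock is revoked (a server may instead free all of the client's locks).

```python
from typing import Dict, List, Tuple


def request_lock(locks: Dict[str, str], lease_expiry: Dict[str, float],
                 resource: str, requester: str,
                 now: float) -> Tuple[bool, List[Tuple[str, str]]]:
    """Try to grant an exclusive lock; return (granted, revoked locks)."""
    revoked: List[Tuple[str, str]] = []
    holder = locks.get(resource)
    if holder is not None and holder != requester:
        if now < lease_expiry[holder]:
            return False, revoked  # the holder's lease is still valid
        # Expired lease: revoke rather than block the new request.
        revoked.append((holder, resource))
    locks[resource] = requester
    return True, revoked
```

Until a conflicting request arrives, the expired client's lock simply stays in the table, which corresponds to the server allowing lock state to remain for a considerable period.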
If the server chooses to delay freeing of lock state until there is a If the server chooses to delay freeing of lock state until there is a
conflict, it may either free all of the client's locks once there is a conflict, it may either free all of the client's locks once there is a
skipping to change at page 163, line 42 skipping to change at page 164, line 15
When the server chooses to free all of a client's lock state, either When the server chooses to free all of a client's lock state, either
immediately upon lease expiration, or as a result of the first attempt immediately upon lease expiration, or as a result of the first attempt
to obtain a conflicting lock, the server may report the loss of to obtain a conflicting lock, the server may report the loss of
lock state in a number of ways. lock state in a number of ways.
The server may choose to invalidate the session and the associated The server may choose to invalidate the session and the associated
client ID. In this case, when the client is able to communicate with client ID. In this case, when the client is able to communicate with
the server, it will receive an NFS4ERR_BADSESSION. Upon attempting the server, it will receive an NFS4ERR_BADSESSION. Upon attempting
to create a new session, it would get an NFS4ERR_STALE_CLIENTID. to create a new session, it would get an NFS4ERR_STALE_CLIENTID.
Upon creating the new clientid and new session it would attempt to Upon creating the new client ID and new session it would attempt to
reclaim locks but not be allowed to do so by the server. reclaim locks but not be allowed to do so by the server.
Another possibility is for the server to maintain the session and Another possibility is for the server to maintain the session and
clientid but for all stateids held by the client to become invalid or client ID but for all stateids held by the client to become invalid
stale. Once the client is able to reach the server after such a or stale. Once the client is able to reach the server after such a
network partition, the status returned by the SEQUENCE operation will network partition, the status returned by the SEQUENCE operation will
indicate a loss of locking state. (The flag indicate a loss of locking state. (The flag
SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED will be set in SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED will be set in
sr_status_flags.) In addition, all I/O submitted by the client with sr_status_flags.) In addition, all I/O submitted by the client with
the now invalid stateids will fail with the server returning the the now invalid stateids will fail with the server returning the
error NFS4ERR_EXPIRED. Once the client learns of the loss of locking error NFS4ERR_EXPIRED. Once the client learns of the loss of locking
state, it will suitably notify the applications that held the state, it will suitably notify the applications that held the
invalidated locks. The client should then take action to free invalidated locks. The client should then take action to free
invalidated stateids, either by establishing a new client ID using a invalidated stateids, either by establishing a new client ID using a
new verifier or by doing a FREE_STATEID operation to release each of new verifier or by doing a FREE_STATEID operation to release each of
the invalidated stateids. the invalidated stateids.
When the server adopts a finer-grained approach to revocation of When the server adopts a finer-grained approach to revocation of
locks when leases have expired, only a subset of stateids will locks when leases have expired, only a subset of stateids will
normally become invalid during a network partition. When the client normally become invalid during a network partition. When the client
is able to communicate with the server after such a network is able to communicate with the server after such a network
partition, the status returned by the SEQUENCE operation will partition, the status returned by the SEQUENCE operation will
indicate a partial loss of locking state. In addition, operations, indicate a partial loss of locking state
including I/O submitted by the client with the now invalid stateids (SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED). In addition, operations,
including I/O submitted by the client, with the now invalid stateids
will fail with the server returning the error NFS4ERR_EXPIRED. Once will fail with the server returning the error NFS4ERR_EXPIRED. Once
the client learns of the loss of locking state, it will use the the client learns of the loss of locking state, it will use the
TEST_STATEID operation on all of its stateids to determine which TEST_STATEID operation on all of its stateids to determine which
locks have been lost and then suitably notify the applications that locks have been lost and then suitably notify the applications that
held the invalidated locks. The client can then release the held the invalidated locks. The client can then release the
invalidated locking state and acknowledge the revocation of the invalidated locking state and acknowledge the revocation of the
associated locks by doing a FREE_STATEID operation on each of the associated locks by doing a FREE_STATEID operation on each of the
invalidated stateids. invalidated stateids.
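
The two revocation cases described above can be summarized as a small client-side decision routine. The following Python is a minimal sketch and is not taken from the draft: the function name plan_recovery, the test_stateid callable (standing in for the TEST_STATEID operation), and the returned action tuples are hypothetical, and the flag values are assumed to match the protocol's XDR definition of the sr_status_flags bits.

```python
# Assumed bit values for the sr_status_flags word returned by SEQUENCE.
SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED = 0x00000008
SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED = 0x00000010


def plan_recovery(sr_status_flags, stateids, test_stateid):
    """Return the (action, stateid) steps a client might take.

    test_stateid is a hypothetical callable standing in for the
    TEST_STATEID operation; it returns True if the stateid is still valid.
    """
    if sr_status_flags & SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED:
        # Every stateid is invalid, so there is nothing to probe.
        lost = list(stateids)
    elif sr_status_flags & SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED:
        # Only a subset was revoked: probe each stateid to find the lost ones.
        lost = [s for s in stateids if not test_stateid(s)]
    else:
        return []

    # Notify the applications first, then release the invalidated state.
    steps = [("notify_application", s) for s in lost]
    steps += [("FREE_STATEID", s) for s in lost]
    return steps
```

In the all-state-revoked case the text also allows establishing a new client ID with a new verifier instead of freeing each stateid. The sketch picks the FREE_STATEID path only to keep both branches uniform.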
When a network partition is combined with a server restart, there are When a network partition is combined with a server restart, there are
skipping to change at page 167, line 12 skipping to change at page 167, line 35
Regardless of the level and approach to record keeping, the server Regardless of the level and approach to record keeping, the server
MUST implement one of the following strategies (which apply to MUST implement one of the following strategies (which apply to
reclaims of share reservations, record locks, and delegations): reclaims of share reservations, record locks, and delegations):
1. Reject all reclaims with NFS4ERR_NO_GRACE. This is extremely 1. Reject all reclaims with NFS4ERR_NO_GRACE. This is extremely
unforgiving, but necessary if the server does not record lock unforgiving, but necessary if the server does not record lock
state in stable storage. state in stable storage.
2. Record sufficient state in stable storage such that all known 2. Record sufficient state in stable storage such that all known
edge conditions involving server restart, including the two noted edge conditions involving server restart, including the two noted
in this section, are detected. Erroneously recognizing a edge in this section, are detected. It is acceptable to erroneously
condition and not allowing, when, with sufficient knowledge it recognize an edge condition and not allow a reclaim, when, with
would be grantable, acceptable. Note that at this time, it is sufficient knowledge it would be allowed. Note it is not known
not known if there are other edge conditions. if there are other edge conditions.
In the event that, after a server restart, the server determines In the event that, after a server restart, the server determines
that there is unrecoverable damage or corruption to the that there is unrecoverable damage or corruption to the
information in stable storage, then for all clients and/or locks information in stable storage, then for all clients and/or locks
which may be affected, the server MUST return NFS4ERR_NO_GRACE. which may be affected, the server MUST return NFS4ERR_NO_GRACE.
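
As a rough illustration of the two strategies, a server's reclaim check might reduce to the following Python sketch. It is not from the draft: the function name and its three boolean inputs are hypothetical simplifications of the server's record keeping, and returning NFS4ERR_NO_GRACE when a recorded edge condition is detected is an assumption made here for the example.

```python
NFS4_OK = 0
NFS4ERR_NO_GRACE = 10033


def reclaim_status(has_stable_records, records_damaged, edge_condition_detected):
    """Decide the status for a reclaim request after a server restart.

    All three parameters are hypothetical booleans summarizing what the
    server knows from stable storage.
    """
    if not has_stable_records:
        # Strategy 1: no lock state in stable storage, reject every reclaim.
        return NFS4ERR_NO_GRACE
    if records_damaged:
        # Unrecoverable damage to the stored information.
        return NFS4ERR_NO_GRACE
    if edge_condition_detected:
        # Strategy 2: it is acceptable to be conservative and refuse here
        # (the specific error code is an assumption of this sketch).
        return NFS4ERR_NO_GRACE
    return NFS4_OK
```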
A mandate for the client's handling of the NFS4ERR_NO_GRACE error is A mandate for the client's handling of the NFS4ERR_NO_GRACE error is
outside the scope of this specification, since the strategies for outside the scope of this specification, since the strategies for
such handling are very dependent on the client's operating such handling are very dependent on the client's operating
environment. However, one potential approach is described below. environment. However, one potential approach is described below.
skipping to change at page 169, line 4 skipping to change at page 169, line 26
When determining the time period for the server lease, the usual When determining the time period for the server lease, the usual
lease tradeoffs apply. Short leases are good for fast server lease tradeoffs apply. Short leases are good for fast server
recovery at a cost of increased operations to effect lease renewal recovery at a cost of increased operations to effect lease renewal
(when there are no other operations during the period to effect lease (when there are no other operations during the period to effect lease
renewal as a side-effect). Long leases are certainly kinder and renewal as a side-effect). Long leases are certainly kinder and
gentler to servers trying to handle very large numbers of clients. gentler to servers trying to handle very large numbers of clients.
The number of extra requests to effect lock renewal drops in inverse The number of extra requests to effect lock renewal drops in inverse
proportion to the lease time. The disadvantages of long leases proportion to the lease time. The disadvantages of long leases
include the possibility of slower recovery after certain failures. include the possibility of slower recovery after certain failures.
After server failure, a longer grace period may be required when some After server failure, a longer grace period may be required when some
clients do not promptly reclaim their locks and do a global clients do not promptly reclaim their locks and do a global
RECLAIM_COMPLETE. In the event of client failure, there can be a RECLAIM_COMPLETE. In the event of client failure, there can be a
longer period for leases to expire thus forcing conflicting requests longer period for leases to expire thus forcing conflicting requests
to wait. to wait.
Long leases are usable if the server is able to store lease state in Long leases are practical if the server is able to store lease state
non-volatile memory. Upon recovery, the server can reconstruct the in non-volatile memory. Upon recovery, the server can reconstruct
lease state from its non-volatile memory and continue operation with the lease state from its non-volatile memory and continue operation
its clients and therefore long leases would not be an issue. with its clients and therefore long leases would not be an issue.
8.7. Clocks, Propagation Delay, and Calculating Lease Expiration 8.7. Clocks, Propagation Delay, and Calculating Lease Expiration
To avoid the need for synchronized clocks, lease times are granted by To avoid the need for synchronized clocks, lease times are granted by
the server as a time delta. However, there is a requirement that the the server as a time delta. However, there is a requirement that the
client and server clocks do not drift excessively over the duration client and server clocks do not drift excessively over the duration
of the lease. There is also the issue of propagation delay across of the lease. There is also the issue of propagation delay across
the network which could easily be several hundred milliseconds as the network which could easily be several hundred milliseconds as
well as the possibility that requests will be lost and need to be well as the possibility that requests will be lost and need to be
retransmitted. retransmitted.
To take propagation delay into account, the client should subtract it To take propagation delay into account, the client should subtract it
from lease times (e.g. if the client estimates the one-way from lease times (e.g. if the client estimates the one-way
propagation delay as 200 msec, then it can assume that the lease is propagation delay as 200 milliseconds, then it can assume that the
already 200 msec old when it gets it). In addition, it will take lease is already 200 milliseconds old when it gets it). In addition,
another 200 msec to get a response back to the server. So the client it will take another 200 milliseconds to get a response back to the
must send a lease renewal or write data back to the server 400 msec server. So the client must send a lease renewal or write data back
before the lease would expire. to the server at least 400 milliseconds before the lease would expire.
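
The arithmetic in this paragraph can be written out as a short Python sketch. The function name and parameters are hypothetical, times are integer milliseconds on the client's own clock, and the 90-second lease in the usage note is an illustrative value that does not come from the draft.

```python
def latest_send_time_ms(received_at_ms, lease_ms, one_way_delay_ms):
    """Latest local time at which the client can still send a renewal.

    The lease is already one_way_delay_ms old when the client receives it,
    and the renewal needs another one_way_delay_ms to reach the server, so
    twice the estimated one-way delay is subtracted from the lease time.
    """
    return received_at_ms + lease_ms - 2 * one_way_delay_ms
```

For a lease of 90000 milliseconds received at local time 0 with an estimated one-way delay of 200 milliseconds, latest_send_time_ms(0, 90000, 200) gives 89600, which is 400 milliseconds before the nominal expiration.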
The server's lease period configuration should take into account the The server's lease period configuration should take into account the
network distance of the clients that will be accessing the server's network distance of the clients that will be accessing the server's
resources. It is expected that the lease period will take into resources. It is expected that the lease period will take into
account the network propagation delays and other network delay account the network propagation delays and other network delay
factors for the client population. Since the protocol does not allow factors for the client population. Since the protocol does not allow
for an automatic method to determine an appropriate lease period, the for an automatic method to determine an appropriate lease period, the
server's administrator may have to tune the lease period. server's administrator may have to tune the lease period.
8.8. Obsolete Locking Infrastructure From NFSv4.0 8.8. Obsolete Locking Infrastructure From NFSv4.0
skipping to change at page 170, line 10 skipping to change at page 170, line 32
The following NFSv4.0 operations MUST NOT be implemented in NFSv4.1. The following NFSv4.0 operations MUST NOT be implemented in NFSv4.1.
The server MUST return NFS4ERR_NOTSUPP if these operations are found The server MUST return NFS4ERR_NOTSUPP if these operations are found
in an NFSv4.1 COMPOUND. in an NFSv4.1 COMPOUND.
o SETCLIENTID since its function has been replaced by EXCHANGE_ID. o SETCLIENTID since its function has been replaced by EXCHANGE_ID.
o SETCLIENTID_CONFIRM since client ID confirmation now happens by o SETCLIENTID_CONFIRM since client ID confirmation now happens by
means of CREATE_SESSION. means of CREATE_SESSION.
o OPEN_CONFIRM because OPENs no longer require confirmation to o OPEN_CONFIRM because state-owner-based seqids have been replaced
establish an owner-based sequence value. by the sequence id in the SEQUENCE operation.
o RELEASE_LOCKOWNER because lock-owners with no associated locks do o RELEASE_LOCKOWNER because lock-owners with no associated locks do
not have any sequence-related state and so can be deleted by the not have any sequence-related state and so can be deleted by the
server at will. server at will.
o RENEW because every SEQUENCE operation for a session causes lease o RENEW because every SEQUENCE operation for a session causes lease
renewal, making a separate operation useless. renewal, making a separate operation superfluous.
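
A server-side check for the operations listed above might look like the following Python sketch. It is illustrative only: the function name check_compound and the representation of a COMPOUND as a list of operation names are hypothetical, and the sketch assumes that processing stops at the first unsupported operation.

```python
NFS4_OK = 0
NFS4ERR_NOTSUPP = 10004

# NFSv4.0 operations that MUST NOT be implemented in NFSv4.1,
# as listed in this excerpt.
OBSOLETE_IN_V41 = frozenset({
    "SETCLIENTID",
    "SETCLIENTID_CONFIRM",
    "OPEN_CONFIRM",
    "RELEASE_LOCKOWNER",
    "RENEW",
})


def check_compound(op_names):
    """Return (status, index of the offending operation or None)."""
    for index, name in enumerate(op_names):
        if name in OBSOLETE_IN_V41:
            return NFS4ERR_NOTSUPP, index
    return NFS4_OK, None
```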
Also, there are a number of fields, present in existing operations Also, there are a number of fields, present in existing operations
related to locking that have no use in minor version one. They were related to locking that have no use in minor version one. They were
used in minor version zero to perform functions now provided in a used in minor version zero to perform functions now provided in a
different fashion. different fashion.
o Sequence ids used to sequence requests for a given state-owner and o Sequence ids used to sequence requests for a given state-owner and
to provide retry protection, now provided via sessions. to provide retry protection, now provided via sessions.
o Client IDs used to identify the client associated with a given o Client IDs used to identify the client associated with a given
 End of changes. 66 change blocks. 
140 lines changed or deleted 160 lines changed or added
