Diff: draft-pre-ch-9.txt - draft-ietf-nfsv4-minorversion1-22.txt
 draft-pre-ch-9.txt   draft-ietf-nfsv4-minorversion1-22.txt 
NFSv4 S. Shepler NFSv4 S. Shepler
Internet-Draft M. Eisler Internet-Draft M. Eisler
Intended status: Standards Track D. Noveck Intended status: Standards Track D. Noveck
Expires: September 19, 2008 Editors Expires: September 22, 2008 Editors
March 18, 2008 March 21, 2008
NFS Version 4 Minor Version 1 NFS Version 4 Minor Version 1
draft-ietf-nfsv4-minorversion1-22.txt draft-ietf-nfsv4-minorversion1-22.txt
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
skipping to change at page 1, line 35 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on September 19, 2008. This Internet-Draft will expire on September 22, 2008.
Copyright Notice Copyright Notice
Copyright (C) The IETF Trust (2008). Copyright (C) The IETF Trust (2008).
Abstract Abstract
This Internet-Draft describes NFS version 4 minor version one, This Internet-Draft describes NFS version 4 minor version one,
including features retained from the base protocol and protocol including features retained from the base protocol and protocol
extensions made subsequently. Major extensions introduced in NFS extensions made subsequently. Major extensions introduced in NFS
skipping to change at page 4, line 39 skipping to change at page 4, line 39
8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 158 8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 158
8.4.1. Client Failure and Recovery . . . . . . . . . . . . 159 8.4.1. Client Failure and Recovery . . . . . . . . . . . . 159
8.4.2. Server Failure and Recovery . . . . . . . . . . . . 159 8.4.2. Server Failure and Recovery . . . . . . . . . . . . 159
8.4.3. Network Partitions and Recovery . . . . . . . . . . 163 8.4.3. Network Partitions and Recovery . . . . . . . . . . 163
8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 168 8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 168
8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 169 8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 169
8.7. Clocks, Propagation Delay, and Calculating Lease 8.7. Clocks, Propagation Delay, and Calculating Lease
Expiration . . . . . . . . . . . . . . . . . . . . . . . 169 Expiration . . . . . . . . . . . . . . . . . . . . . . . 169
8.8. Obsolete Locking Infrastructure From NFSv4.0 . . . . . . 170 8.8. Obsolete Locking Infrastructure From NFSv4.0 . . . . . . 170
9. File Locking and Share Reservations . . . . . . . . . . . . . 171 9. File Locking and Share Reservations . . . . . . . . . . . . . 171
9.1. Opens and Byte-range Locks . . . . . . . . . . . . . . . 171 9.1. Opens and Byte-Range Locks . . . . . . . . . . . . . . . 171
9.1.1. State-owner Definition . . . . . . . . . . . . . . . 171 9.1.1. State-owner Definition . . . . . . . . . . . . . . . 171
9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 172 9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 172
9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 175 9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 175
9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 175 9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 175
9.4. Stateid Seqid Values and Byte-range Locks . . . . . . . 176 9.4. Stateid Seqid Values and Byte-Range Locks . . . . . . . 176
9.5. Issues with Multiple Open-owners . . . . . . . . . . . . 176 9.5. Issues with Multiple Open-Owners . . . . . . . . . . . . 176
9.6. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 176 9.6. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 177
9.7. Share Reservations . . . . . . . . . . . . . . . . . . . 178 9.7. Share Reservations . . . . . . . . . . . . . . . . . . . 178
9.8. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 178 9.8. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 178
9.9. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 179 9.9. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 179
9.10. Parallel OPENs . . . . . . . . . . . . . . . . . . . . . 180 9.10. Parallel OPENs . . . . . . . . . . . . . . . . . . . . . 180
9.11. Reclaim of Open and Byte-range Locks . . . . . . . . . . 180 9.11. Reclaim of Open and Byte-Range Locks . . . . . . . . . . 181
10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 181 10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 181
10.1. Performance Challenges for Client-Side Caching . . . . . 181 10.1. Performance Challenges for Client-Side Caching . . . . . 182
10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 182 10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 183
10.2.1. Delegation Recovery . . . . . . . . . . . . . . . . 185 10.2.1. Delegation Recovery . . . . . . . . . . . . . . . . 185
10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 187 10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 187
10.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . 187 10.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . 187
10.3.2. Data Caching and File Locking . . . . . . . . . . . 188 10.3.2. Data Caching and File Locking . . . . . . . . . . . 188
10.3.3. Data Caching and Mandatory File Locking . . . . . . 190 10.3.3. Data Caching and Mandatory File Locking . . . . . . 190
10.3.4. Data Caching and File Identity . . . . . . . . . . . 190 10.3.4. Data Caching and File Identity . . . . . . . . . . . 190
10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 191 10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 192
10.4.1. Open Delegation and Data Caching . . . . . . . . . . 194 10.4.1. Open Delegation and Data Caching . . . . . . . . . . 194
10.4.2. Open Delegation and File Locks . . . . . . . . . . . 195 10.4.2. Open Delegation and File Locks . . . . . . . . . . . 195
10.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . 195 10.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . 196
10.4.4. Recall of Open Delegation . . . . . . . . . . . . . 198 10.4.4. Recall of Open Delegation . . . . . . . . . . . . . 199
10.4.5. Clients that Fail to Honor Delegation Recalls . . . 200 10.4.5. Clients that Fail to Honor Delegation Recalls . . . 201
10.4.6. Delegation Revocation . . . . . . . . . . . . . . . 201 10.4.6. Delegation Revocation . . . . . . . . . . . . . . . 201
10.4.7. Delegations via WANT_DELEGATION . . . . . . . . . . 201 10.4.7. Delegations via WANT_DELEGATION . . . . . . . . . . 202
10.5. Data Caching and Revocation . . . . . . . . . . . . . . 202 10.5. Data Caching and Revocation . . . . . . . . . . . . . . 203
10.5.1. Revocation Recovery for Write Open Delegation . . . 203 10.5.1. Revocation Recovery for Write Open Delegation . . . 203
10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 203 10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 204
10.7. Data and Metadata Caching and Memory Mapped Files . . . 205 10.7. Data and Metadata Caching and Memory Mapped Files . . . 206
10.8. Name and Directory Caching without Directory 10.8. Name and Directory Caching without Directory
Delegations . . . . . . . . . . . . . . . . . . . . . . 208 Delegations . . . . . . . . . . . . . . . . . . . . . . 208
10.8.1. Name Caching . . . . . . . . . . . . . . . . . . . . 208 10.8.1. Name Caching . . . . . . . . . . . . . . . . . . . . 208
10.8.2. Directory Caching . . . . . . . . . . . . . . . . . 209 10.8.2. Directory Caching . . . . . . . . . . . . . . . . . 210
10.9. Directory Delegations . . . . . . . . . . . . . . . . . 210 10.9. Directory Delegations . . . . . . . . . . . . . . . . . 211
10.9.1. Introduction to Directory Delegations . . . . . . . 210 10.9.1. Introduction to Directory Delegations . . . . . . . 211
10.9.2. Directory Delegation Design . . . . . . . . . . . . 211 10.9.2. Directory Delegation Design . . . . . . . . . . . . 212
10.9.3. Attributes in Support of Directory Notifications . . 212 10.9.3. Attributes in Support of Directory Notifications . . 213
10.9.4. Directory Delegation Recall . . . . . . . . . . . . 212 10.9.4. Directory Delegation Recall . . . . . . . . . . . . 213
10.9.5. Directory Delegation Recovery . . . . . . . . . . . 213 10.9.5. Directory Delegation Recovery . . . . . . . . . . . 214
11. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 213 11. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 214
11.1. Location Attributes . . . . . . . . . . . . . . . . . . 214 11.1. Location Attributes . . . . . . . . . . . . . . . . . . 214
11.2. File System Presence or Absence . . . . . . . . . . . . 214 11.2. File System Presence or Absence . . . . . . . . . . . . 215
11.3. Getting Attributes for an Absent File System . . . . . . 215 11.3. Getting Attributes for an Absent File System . . . . . . 216
11.3.1. GETATTR Within an Absent File System . . . . . . . . 215 11.3.1. GETATTR Within an Absent File System . . . . . . . . 216
11.3.2. READDIR and Absent File Systems . . . . . . . . . . 217 11.3.2. READDIR and Absent File Systems . . . . . . . . . . 217
11.4. Uses of Location Information . . . . . . . . . . . . . . 217 11.4. Uses of Location Information . . . . . . . . . . . . . . 218
11.4.1. File System Replication . . . . . . . . . . . . . . 218 11.4.1. File System Replication . . . . . . . . . . . . . . 219
11.4.2. File System Migration . . . . . . . . . . . . . . . 219 11.4.2. File System Migration . . . . . . . . . . . . . . . 219
11.4.3. Referrals . . . . . . . . . . . . . . . . . . . . . 220 11.4.3. Referrals . . . . . . . . . . . . . . . . . . . . . 221
11.5. Location Entries and Server Identity . . . . . . . . . . 221 11.5. Location Entries and Server Identity . . . . . . . . . . 222
11.6. Additional Client-side Considerations . . . . . . . . . 222 11.6. Additional Client-side Considerations . . . . . . . . . 223
11.7. Effecting File System Transitions . . . . . . . . . . . 223 11.7. Effecting File System Transitions . . . . . . . . . . . 223
11.7.1. File System Transitions and Simultaneous Access . . 224 11.7.1. File System Transitions and Simultaneous Access . . 225
11.7.2. Simultaneous Use and Transparent Transitions . . . . 225 11.7.2. Simultaneous Use and Transparent Transitions . . . . 225
11.7.3. Filehandles and File System Transitions . . . . . . 227 11.7.3. Filehandles and File System Transitions . . . . . . 228
11.7.4. Fileids and File System Transitions . . . . . . . . 228 11.7.4. Fileids and File System Transitions . . . . . . . . 228
11.7.5. Fsids and File System Transitions . . . . . . . . . 229 11.7.5. Fsids and File System Transitions . . . . . . . . . 230
11.7.6. The Change Attribute and File System Transitions . . 230 11.7.6. The Change Attribute and File System Transitions . . 230
11.7.7. Lock State and File System Transitions . . . . . . . 230 11.7.7. Lock State and File System Transitions . . . . . . . 231
11.7.8. Write Verifiers and File System Transitions . . . . 234 11.7.8. Write Verifiers and File System Transitions . . . . 235
11.7.9. Readdir Cookies and Verifiers and File System 11.7.9. Readdir Cookies and Verifiers and File System
Transitions . . . . . . . . . . . . . . . . . . . . 234 Transitions . . . . . . . . . . . . . . . . . . . . 235
11.7.10. File System Data and File System Transitions . . . . 235 11.7.10. File System Data and File System Transitions . . . . 235
11.8. Effecting File System Referrals . . . . . . . . . . . . 236 11.8. Effecting File System Referrals . . . . . . . . . . . . 237
11.8.1. Referral Example (LOOKUP) . . . . . . . . . . . . . 236 11.8.1. Referral Example (LOOKUP) . . . . . . . . . . . . . 237
11.8.2. Referral Example (READDIR) . . . . . . . . . . . . . 240 11.8.2. Referral Example (READDIR) . . . . . . . . . . . . . 241
11.9. The Attribute fs_locations . . . . . . . . . . . . . . . 243 11.9. The Attribute fs_locations . . . . . . . . . . . . . . . 243
11.10. The Attribute fs_locations_info . . . . . . . . . . . . 245 11.10. The Attribute fs_locations_info . . . . . . . . . . . . 246
11.10.1. The fs_locations_server4 Structure . . . . . . . . . 248 11.10.1. The fs_locations_server4 Structure . . . . . . . . . 249
11.10.2. The fs_locations_info4 Structure . . . . . . . . . . 254 11.10.2. The fs_locations_info4 Structure . . . . . . . . . . 254
11.10.3. The fs_locations_item4 Structure . . . . . . . . . . 255 11.10.3. The fs_locations_item4 Structure . . . . . . . . . . 255
11.11. The Attribute fs_status . . . . . . . . . . . . . . . . 257 11.11. The Attribute fs_status . . . . . . . . . . . . . . . . 257
12. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 260 12. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 261
12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 260 12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 261
12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 262 12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 263
12.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 262 12.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 263
12.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 262 12.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 263
12.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 263 12.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 264
12.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 263 12.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 264
12.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 263 12.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 264
12.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 263 12.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 264
12.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 263 12.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 264
12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 264 12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 265
12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 264 12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 265
12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 265 12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 266
12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 266 12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 267
12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 267 12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 268
12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 267 12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 268
12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 267 12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 268
12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 269 12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 270
12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 270 12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 271
12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 271 12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 272
12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 274 12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 275
12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 281 12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 282
12.5.7. Metadata Server Write Propagation . . . . . . . . . 281 12.5.7. Metadata Server Write Propagation . . . . . . . . . 282
12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 281 12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 282
12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 283 12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 284
12.7.1. Recovery from Client Restart . . . . . . . . . . . . 283 12.7.1. Recovery from Client Restart . . . . . . . . . . . . 284
12.7.2. Dealing with Lease Expiration on the Client . . . . 284 12.7.2. Dealing with Lease Expiration on the Client . . . . 285
12.7.3. Dealing with Loss of Layout State on the Metadata 12.7.3. Dealing with Loss of Layout State on the Metadata
Server . . . . . . . . . . . . . . . . . . . . . . . 285 Server . . . . . . . . . . . . . . . . . . . . . . . 286
12.7.4. Recovery from Metadata Server Restart . . . . . . . 285 12.7.4. Recovery from Metadata Server Restart . . . . . . . 286
12.7.5. Operations During Metadata Server Grace Period . . . 287 12.7.5. Operations During Metadata Server Grace Period . . . 288
12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 288 12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 289
12.8. Metadata and Storage Device Roles . . . . . . . . . . . 288 12.8. Metadata and Storage Device Roles . . . . . . . . . . . 289
12.9. Security Considerations for pNFS . . . . . . . . . . . . 288 12.9. Security Considerations for pNFS . . . . . . . . . . . . 289
13. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 289 13. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 290
13.1. Client ID and Session Considerations . . . . . . . . . . 290 13.1. Client ID and Session Considerations . . . . . . . . . . 291
13.1.1. Sessions Considerations for Data Servers . . . . . . 292 13.1.1. Sessions Considerations for Data Servers . . . . . . 293
13.2. File Layout Definitions . . . . . . . . . . . . . . . . 292 13.2. File Layout Definitions . . . . . . . . . . . . . . . . 293
13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 293 13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 294
13.4. Interpreting the File Layout . . . . . . . . . . . . . . 297 13.4. Interpreting the File Layout . . . . . . . . . . . . . . 298
13.4.1. Determining the Stripe Unit Number . . . . . . . . . 297 13.4.1. Determining the Stripe Unit Number . . . . . . . . . 298
13.4.2. Interpreting the File Layout Using Sparse Packing . 297 13.4.2. Interpreting the File Layout Using Sparse Packing . 298
13.4.3. Interpreting the File Layout Using Dense Packing . . 300 13.4.3. Interpreting the File Layout Using Dense Packing . . 301
13.4.4. Sparse and Dense Stripe Unit Packing . . . . . . . . 302 13.4.4. Sparse and Dense Stripe Unit Packing . . . . . . . . 303
13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 304 13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 305
13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 305 13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 306
13.7. COMMIT Through Metadata Server . . . . . . . . . . . . . 307 13.7. COMMIT Through Metadata Server . . . . . . . . . . . . . 308
13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 309 13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 310
13.9. Metadata and Data Server State Coordination . . . . . . 309 13.9. Metadata and Data Server State Coordination . . . . . . 310
13.9.1. Global Stateid Requirements . . . . . . . . . . . . 309 13.9.1. Global Stateid Requirements . . . . . . . . . . . . 310
13.9.2. Data Server State Propagation . . . . . . . . . . . 310 13.9.2. Data Server State Propagation . . . . . . . . . . . 311
13.10. Data Server Component File Size . . . . . . . . . . . . 312 13.10. Data Server Component File Size . . . . . . . . . . . . 313
13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 313 13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 314
13.12. Security Considerations for the File Layout Type . . . . 313 13.12. Security Considerations for the File Layout Type . . . . 314
14. Internationalization . . . . . . . . . . . . . . . . . . . . 314 14. Internationalization . . . . . . . . . . . . . . . . . . . . 315
14.1. Stringprep profile for the utf8str_cs type . . . . . . . 315 14.1. Stringprep profile for the utf8str_cs type . . . . . . . 316
14.2. Stringprep profile for the utf8str_cis type . . . . . . 317 14.2. Stringprep profile for the utf8str_cis type . . . . . . 318
14.3. Stringprep profile for the utf8str_mixed type . . . . . 318 14.3. Stringprep profile for the utf8str_mixed type . . . . . 319
14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 320 14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 321
14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 320 14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 321
15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 321 15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 322
15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 321 15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 322
15.1.1. General Errors . . . . . . . . . . . . . . . . . . . 323 15.1.1. General Errors . . . . . . . . . . . . . . . . . . . 324
15.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 325 15.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 326
15.1.3. Compound Structure Errors . . . . . . . . . . . . . 326 15.1.3. Compound Structure Errors . . . . . . . . . . . . . 327
15.1.4. File System Errors . . . . . . . . . . . . . . . . . 328 15.1.4. File System Errors . . . . . . . . . . . . . . . . . 329
15.1.5. State Management Errors . . . . . . . . . . . . . . 330 15.1.5. State Management Errors . . . . . . . . . . . . . . 331
15.1.6. Security Errors . . . . . . . . . . . . . . . . . . 331 15.1.6. Security Errors . . . . . . . . . . . . . . . . . . 332
15.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 331 15.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 332
15.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 332 15.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 333
15.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 333 15.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 334
15.1.10. pNFS Errors . . . . . . . . . . . . . . . . . . . . 334 15.1.10. pNFS Errors . . . . . . . . . . . . . . . . . . . . 335
15.1.11. Session Use Errors . . . . . . . . . . . . . . . . . 335 15.1.11. Session Use Errors . . . . . . . . . . . . . . . . . 336
15.1.12. Session Management Errors . . . . . . . . . . . . . 336 15.1.12. Session Management Errors . . . . . . . . . . . . . 337
15.1.13. Client Management Errors . . . . . . . . . . . . . . 337 15.1.13. Client Management Errors . . . . . . . . . . . . . . 338
15.1.14. Delegation Errors . . . . . . . . . . . . . . . . . 338 15.1.14. Delegation Errors . . . . . . . . . . . . . . . . . 339
15.1.15. Attribute Handling Errors . . . . . . . . . . . . . 338 15.1.15. Attribute Handling Errors . . . . . . . . . . . . . 339
15.1.16. Obsoleted Errors . . . . . . . . . . . . . . . . . . 339 15.1.16. Obsoleted Errors . . . . . . . . . . . . . . . . . . 340
15.2. Operations and their valid errors . . . . . . . . . . . 340 15.2. Operations and their valid errors . . . . . . . . . . . 341
15.3. Callback operations and their valid errors . . . . . . . 356 15.3. Callback operations and their valid errors . . . . . . . 357
15.4. Errors and the operations that use them . . . . . . . . 358 15.4. Errors and the operations that use them . . . . . . . . 359
16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 372 16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 373
16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 372 16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 373
16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 373 16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 374
17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 383 17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 384
18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 386 18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 387
18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 386 18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 387
18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 389 18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 390
18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 390 18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 391
18.4. Operation 6: CREATE - Create a Non-Regular File Object . 393 18.4. Operation 6: CREATE - Create a Non-Regular File Object . 394
18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting 18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting
Recovery . . . . . . . . . . . . . . . . . . . . . . . . 396 Recovery . . . . . . . . . . . . . . . . . . . . . . . . 397
18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 397 18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 398
18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 397 18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 398
18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 399 18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 400
18.9. Operation 11: LINK - Create Link to a File . . . . . . . 400 18.9. Operation 11: LINK - Create Link to a File . . . . . . . 401
18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 402 18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 403
18.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 406 18.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 407
18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 408 18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 409
18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 409 18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 410
18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 411 18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 412
18.15. Operation 17: NVERIFY - Verify Difference in 18.15. Operation 17: NVERIFY - Verify Difference in
Attributes . . . . . . . . . . . . . . . . . . . . . . . 412 Attributes . . . . . . . . . . . . . . . . . . . . . . . 413
18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 413 18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 414
18.17. Operation 19: OPENATTR - Open Named Attribute 18.17. Operation 19: OPENATTR - Open Named Attribute
Directory . . . . . . . . . . . . . . . . . . . . . . . 432 Directory . . . . . . . . . . . . . . . . . . . . . . . 433
18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 433 18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 434
18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 435 18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 436
18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 435 18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 436
18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 437 18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 438
18.22. Operation 25: READ - Read from File . . . . . . . . . . 437 18.22. Operation 25: READ - Read from File . . . . . . . . . . 438
18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 440 18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 441
18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 444 18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 445
18.25. Operation 28: REMOVE - Remove File System Object . . . . 445 18.25. Operation 28: REMOVE - Remove File System Object . . . . 446
18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 447 18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 448
18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 451 18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 452
18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 452 18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 453
18.29. Operation 33: SECINFO - Obtain Available Security . . . 452 18.29. Operation 33: SECINFO - Obtain Available Security . . . 453
18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 455 18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 456
18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 458 18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 459
18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 459 18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 460
18.33. Operation 40: BACKCHANNEL_CTL - Backchannel control . . 464 18.33. Operation 40: BACKCHANNEL_CTL - Backchannel control . . 465
18.34. Operation 41: BIND_CONN_TO_SESSION . . . . . . . . . . . 465 18.34. Operation 41: BIND_CONN_TO_SESSION . . . . . . . . . . . 466
18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 468 18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 469
18.36. Operation 43: CREATE_SESSION - Create New Session and 18.36. Operation 43: CREATE_SESSION - Create New Session and
Confirm Client ID . . . . . . . . . . . . . . . . . . . 484 Confirm Client ID . . . . . . . . . . . . . . . . . . . 485
18.37. Operation 44: DESTROY_SESSION - Destroy existing 18.37. Operation 44: DESTROY_SESSION - Destroy existing
session . . . . . . . . . . . . . . . . . . . . . . . . 494 session . . . . . . . . . . . . . . . . . . . . . . . . 495
18.38. Operation 45: FREE_STATEID - Free stateid with no 18.38. Operation 45: FREE_STATEID - Free stateid with no
locks . . . . . . . . . . . . . . . . . . . . . . . . . 496 locks . . . . . . . . . . . . . . . . . . . . . . . . . 497
18.39. Operation 46: GET_DIR_DELEGATION - Get a directory 18.39. Operation 46: GET_DIR_DELEGATION - Get a directory
delegation . . . . . . . . . . . . . . . . . . . . . . . 497 delegation . . . . . . . . . . . . . . . . . . . . . . . 498
18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 501 18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 502
18.41. Operation 48: GETDEVICELIST - Get All Device Mappings 18.41. Operation 48: GETDEVICELIST - Get All Device Mappings
for a File System . . . . . . . . . . . . . . . . . . . 503 for a File System . . . . . . . . . . . . . . . . . . . 504
18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using 18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using
a layout . . . . . . . . . . . . . . . . . . . . . . . . 505 a layout . . . . . . . . . . . . . . . . . . . . . . . . 506
18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 508 18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 509
18.44. Operation 51: LAYOUTRETURN - Release Layout 18.44. Operation 51: LAYOUTRETURN - Release Layout
Information . . . . . . . . . . . . . . . . . . . . . . 512 Information . . . . . . . . . . . . . . . . . . . . . . 513
18.45. Operation 52: SECINFO_NO_NAME - Get Security on 18.45. Operation 52: SECINFO_NO_NAME - Get Security on
Unnamed Object . . . . . . . . . . . . . . . . . . . . . 517 Unnamed Object . . . . . . . . . . . . . . . . . . . . . 518
18.46. Operation 53: SEQUENCE - Supply per-procedure 18.46. Operation 53: SEQUENCE - Supply per-procedure
sequencing and control . . . . . . . . . . . . . . . . . 518 sequencing and control . . . . . . . . . . . . . . . . . 519
18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 524 18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 525
18.48. Operation 55: TEST_STATEID - Test stateids for 18.48. Operation 55: TEST_STATEID - Test stateids for
validity . . . . . . . . . . . . . . . . . . . . . . . . 526 validity . . . . . . . . . . . . . . . . . . . . . . . . 527
18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 528 18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 529
18.50. Operation 57: DESTROY_CLIENTID - Destroy existing 18.50. Operation 57: DESTROY_CLIENTID - Destroy existing
client ID . . . . . . . . . . . . . . . . . . . . . . . 532 client ID . . . . . . . . . . . . . . . . . . . . . . . 533
18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims 18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims
Finished . . . . . . . . . . . . . . . . . . . . . . . . 532 Finished . . . . . . . . . . . . . . . . . . . . . . . . 533
18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 535 18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 536
19. NFSv4.1 Callback Procedures . . . . . . . . . . . . . . . . . 535 19. NFSv4.1 Callback Procedures . . . . . . . . . . . . . . . . . 536
19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 536 19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 537
19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 536 19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 537
20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 540 20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 541
20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 540 20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 541
20.2. Operation 4: CB_RECALL - Recall an Open Delegation . . . 541 20.2. Operation 4: CB_RECALL - Recall an Open Delegation . . . 542
20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from 20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from
Client . . . . . . . . . . . . . . . . . . . . . . . . . 542 Client . . . . . . . . . . . . . . . . . . . . . . . . . 543
20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 546 20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 547
20.5. Operation 7: CB_PUSH_DELEG - Offer Delegation to 20.5. Operation 7: CB_PUSH_DELEG - Offer Delegation to
Client . . . . . . . . . . . . . . . . . . . . . . . . . 550 Client . . . . . . . . . . . . . . . . . . . . . . . . . 551
20.6. Operation 8: CB_RECALL_ANY - Keep any N delegations . . 551 20.6. Operation 8: CB_RECALL_ANY - Keep any N delegations . . 552
20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal 20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal
Resources for Recallable Objects . . . . . . . . . . . . 553 Resources for Recallable Objects . . . . . . . . . . . . 554
20.8. Operation 10: CB_RECALL_SLOT - change flow control 20.8. Operation 10: CB_RECALL_SLOT - change flow control
limits . . . . . . . . . . . . . . . . . . . . . . . . . 554 limits . . . . . . . . . . . . . . . . . . . . . . . . . 555
20.9. Operation 11: CB_SEQUENCE - Supply backchannel 20.9. Operation 11: CB_SEQUENCE - Supply backchannel
sequencing and control . . . . . . . . . . . . . . . . . 555 sequencing and control . . . . . . . . . . . . . . . . . 556
20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending 20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending
Delegation Wants . . . . . . . . . . . . . . . . . . . . 557 Delegation Wants . . . . . . . . . . . . . . . . . . . . 558
20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible 20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible
lock availability . . . . . . . . . . . . . . . . . . . 558 lock availability . . . . . . . . . . . . . . . . . . . 559
20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify device ID 20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify device ID
changes . . . . . . . . . . . . . . . . . . . . . . . . 560 changes . . . . . . . . . . . . . . . . . . . . . . . . 561
20.13. Operation 10044: CB_ILLEGAL - Illegal Callback 20.13. Operation 10044: CB_ILLEGAL - Illegal Callback
Operation . . . . . . . . . . . . . . . . . . . . . . . 562 Operation . . . . . . . . . . . . . . . . . . . . . . . 563
21. Security Considerations . . . . . . . . . . . . . . . . . . . 562 21. Security Considerations . . . . . . . . . . . . . . . . . . . 563
22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 564 22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 565
22.1. Named Attribute Definitions . . . . . . . . . . . . . . 564 22.1. Named Attribute Definitions . . . . . . . . . . . . . . 565
22.2. ONC RPC Network Identifiers (netids) . . . . . . . . . . 564 22.2. ONC RPC Network Identifiers (netids) . . . . . . . . . . 565
22.3. Defining New Notifications . . . . . . . . . . . . . . . 565 22.3. Defining New Notifications . . . . . . . . . . . . . . . 566
22.4. Defining New Layout Types . . . . . . . . . . . . . . . 565 22.4. Defining New Layout Types . . . . . . . . . . . . . . . 566
22.5. Path Variable Definitions . . . . . . . . . . . . . . . 567 22.5. Path Variable Definitions . . . . . . . . . . . . . . . 568
22.5.1. Path Variable Values . . . . . . . . . . . . . . . . 567 22.5.1. Path Variable Values . . . . . . . . . . . . . . . . 568
22.5.2. Path Variable Names . . . . . . . . . . . . . . . . 567 22.5.2. Path Variable Names . . . . . . . . . . . . . . . . 568
23. References . . . . . . . . . . . . . . . . . . . . . . . . . 567 23. References . . . . . . . . . . . . . . . . . . . . . . . . . 568
23.1. Normative References . . . . . . . . . . . . . . . . . . 567 23.1. Normative References . . . . . . . . . . . . . . . . . . 568
23.2. Informative References . . . . . . . . . . . . . . . . . 569 23.2. Informative References . . . . . . . . . . . . . . . . . 570
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 570 Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 571
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 572 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 573
Intellectual Property and Copyright Statements . . . . . . . . . 574 Intellectual Property and Copyright Statements . . . . . . . . . 575
1. Introduction 1. Introduction
1.1. The NFS Version 4 Minor Version 1 Protocol 1.1. The NFS Version 4 Minor Version 1 Protocol
The NFS version 4 minor version 1 (NFSv4.1) protocol is the second The NFS version 4 minor version 1 (NFSv4.1) protocol is the second
minor version of the NFS version 4 (NFSv4) protocol. The first minor minor version of the NFS version 4 (NFSv4) protocol. The first minor
version, NFSv4.0, is described in [21]. It generally follows the version, NFSv4.0, is described in [21]. It generally follows the
guidelines for the minor versioning model listed in Section 10 of RFC guidelines for the minor versioning model listed in Section 10 of RFC
3530. However, it diverges from guidelines 11 ("a client and server 3530. However, it diverges from guidelines 11 ("a client and server
skipping to change at page 147, line 40 skipping to change at page 147, line 40
8. State Management 8. State Management
Integrating locking into the NFS protocol necessarily causes it to be Integrating locking into the NFS protocol necessarily causes it to be
stateful. With the inclusion of such features as share reservations, stateful. With the inclusion of such features as share reservations,
file and directory delegations, recallable layouts, and support for file and directory delegations, recallable layouts, and support for
mandatory record locking, the protocol becomes substantially more mandatory record locking, the protocol becomes substantially more
dependent on proper management of state than the traditional dependent on proper management of state than the traditional
combination of NFS and NLM [36]. These features include expanded combination of NFS and NLM [36]. These features include expanded
locking facilities, which provide some measure of interclient locking facilities, which provide some measure of interclient
exclusion, but the state is also valuable to offering features not exclusion, but the state also offers features not readily providable
readily providable using a stateless model. There are three using a stateless model. There are three components to making this
components to making this state manageable: state manageable:
o Clear division between client and server o Clear division between client and server
o Ability to reliably detect inconsistency in state between client o Ability to reliably detect inconsistency in state between client
and server and server
o Simple and robust recovery mechanisms o Simple and robust recovery mechanisms
In this model, the server owns the state information. The client In this model, the server owns the state information. The client
requests changes in locks and the server responds with the changes requests changes in locks and the server responds with the changes
made. Non-client-initiated changes in locking state are infrequent made. Non-client-initiated changes in locking state are infrequent.
and the client receives prompt notification of them and can adjust The client receives prompt notification of such changes and can
its view of the locking state to reflect the server's changes. adjust its view of the locking state to reflect the server's changes.
Individual pieces of state created by the server and passed to the Individual pieces of state created by the server and passed to the
client at its request are represented by 128-bit stateids. These client at its request are represented by 128-bit stateids. These
stateids may represent a particular open file, a set of byte-range stateids may represent a particular open file, a set of byte-range
locks held by a particular owner, or a recallable delegation of locks held by a particular owner, or a recallable delegation of
privileges to access a file in particular ways, or at a particular privileges to access a file in particular ways, or at a particular
location. location.
In all cases, there is a transition from the most general information In all cases, there is a transition from the most general information
which represents a client as a whole to the eventual lightweight which represents a client as a whole to the eventual lightweight
skipping to change at page 149, line 32 skipping to change at page 149, line 32
With the exception of special stateids, to be discussed later, each With the exception of special stateids, to be discussed later, each
stateid represents locking objects of one of a set of types defined stateid represents locking objects of one of a set of types defined
by the NFSv4.1 protocol. Note that in all these cases, where we by the NFSv4.1 protocol. Note that in all these cases, where we
speak of guarantee, it is understood there are situations such as a speak of guarantee, it is understood there are situations such as a
client restart, or lock revocation, that allow the guarantee to be client restart, or lock revocation, that allow the guarantee to be
voided. voided.
o Stateids may represent opens of files. o Stateids may represent opens of files.
Each stateid in this case represents the open for a given client Each stateid in this case represents the open state for a given
ID/open-owner/filehandle triple. Such stateids are subject to client ID/open-owner/filehandle triple. Such stateids are subject
change (with consequent incrementing of the stateid's seqid) in to change (with consequent incrementing of the stateid's seqid) in
response to OPENs that result in upgrade and OPEN_DOWNGRADE response to OPENs that result in upgrade and OPEN_DOWNGRADE
operations. operations.
o Stateids may represent sets of byte-range locks. o Stateids may represent sets of byte-range locks.
All locks held on a particular file by a particular owner and all All locks held on a particular file by a particular owner and all
gotten under the aegis of a particular open file are associated gotten under the aegis of a particular open file are associated
with a single stateid with the seqid being increment whenever LOCK with a single stateid with the seqid being incremented whenever
and LOCKU operations affect that set of locks. LOCK and LOCKU operations affect that set of locks.
o Stateids may represent file delegations, which are recallable o Stateids may represent file delegations, which are recallable
guarantees by the server to the client, that other clients will guarantees by the server to the client, that other clients will
not reference, or will not modify a particular file, until the not reference, or will not modify a particular file, until the
delegation is returned. In NFSv4.1, file delegations may be delegation is returned. In NFSv4.1, file delegations may be
obtained on both regular and non-regular files. obtained on both regular and non-regular files.
A stateid represents a single delegation held by a client for a A stateid represents a single delegation held by a client for a
particular filehandle. particular filehandle.
skipping to change at page 157, line 20 skipping to change at page 157, line 20
used for one of those connections. used for one of those connections.
o Transport retransmission delays might become so large as to o Transport retransmission delays might become so large as to
approach or exceed the length of the lease period. This may be approach or exceed the length of the lease period. This may be
particularly likely when the server is unresponsive due to a particularly likely when the server is unresponsive due to a
restart; see Section 8.4.2.1. If the client implementation is not restart; see Section 8.4.2.1. If the client implementation is not
careful, transport retransmission delays can result in the client careful, transport retransmission delays can result in the client
failing to detect a server restart before the grace period ends. failing to detect a server restart before the grace period ends.
The scenario is that the client is using a transport with The scenario is that the client is using a transport with
exponential back off, such that the maximum retransmission timeout exponential back off, such that the maximum retransmission timeout
excees the both the grace period and the lease_time attribute. A exceeds the both the grace period and the lease_time attribute. A
network partition causes the client's connection's retransmission network partition causes the client's connection's retransmission
interval to back off, and even after the partition heals, the next interval to back off, and even after the partition heals, the next
transport-level retransmission is sent after the server has transport-level retransmission is sent after the server has
restarted and its grace period ends. restarted and its grace period ends.
The client MUST either recover from the ensuing NFS4ERR_NOGRACE The client MUST either recover from the ensuing NFS4ERR_NOGRACE
errors, or it MUST ensure that despite transport level errors, or it MUST ensure that despite transport level
retransmission intervals that exceed the lease_time, nonetheless a retransmission intervals that exceed the lease_time, nonetheless a
SEQUENCE operation is sent that renews the lease before SEQUENCE operation is sent that renews the lease before
expiration. The client can achieve this by associating a new expiration. The client can achieve this by associating a new
connection with the session, and sending a SEQUENCE operation on connection with the session, and sending a SEQUENCE operation on
it. However, if the attempt to establish a new connection is it. However, if the attempt to establish a new connection is
delayed for same reason (exponential backoff of the connection delayed for some reason (e.g. exponential backoff of the
establishment packets), the client will have to abort the connection establishment packets), the client will have to abort
connection establishment attempt before the lease expires, and try the connection establishment attempt before the lease expires, and
again. attempt to re-connect.
If the server renews the lease upon receiving a SEQUENCE operation, If the server renews the lease upon receiving a SEQUENCE operation,
the server MUST NOT allow the lease to expire while the rest of the the server MUST NOT allow the lease to expire while the rest of the
operations in the COMPOUND procedure's request are still executing. operations in the COMPOUND procedure's request are still executing.
Once the last operation has finished, and the response to COMPOUND Once the last operation has finished, and the response to COMPOUND
has been sent, the server MUST set the lease to expire no sooner than has been sent, the server MUST set the lease to expire no sooner than
the sum of current time and the value of the lease_time attribute. the sum of current time and the value of the lease_time attribute.
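The lease rule above can be sketched in Python. This is a minimal illustration, not text from either draft: the Lease class, its method names, and the injectable clock are hypothetical, and it assumes end_compound() is called only after the COMPOUND response has been sent.

```python
import time


class Lease:
    """Hypothetical server-side lease record for one client ID."""

    def __init__(self, lease_time, clock=time.monotonic):
        self.lease_time = lease_time      # value of the lease_time attribute, in seconds
        self.clock = clock                # injectable so the sketch can be tested
        self.in_flight = 0                # COMPOUND requests still executing
        self.expires_at = clock() + lease_time

    def begin_compound(self):
        # A lease-renewing SEQUENCE was received; the lease must not
        # expire while the rest of the COMPOUND is still executing.
        self.in_flight += 1

    def end_compound(self):
        # Assumed to be called after the response has been sent: the lease
        # expires no sooner than current time + lease_time.
        self.in_flight -= 1
        self.expires_at = max(self.expires_at, self.clock() + self.lease_time)

    def expired(self):
        return self.in_flight == 0 and self.clock() >= self.expires_at
```

A long-running COMPOUND therefore cannot lose its lease part-way through, and the expiration time is measured from completion rather than from receipt of the request.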
A client ID's lease can expire when it has been at least the lease A client ID's lease can expire when it has been at least the lease
interval (lease_time) since the last lease-renewing SEQUENCE interval (lease_time) since the last lease-renewing SEQUENCE
skipping to change at page 159, line 38 skipping to change at page 159, line 38
the client ID by establishing a session associated with that client the client ID by establishing a session associated with that client
ID (see Section 18.36.3 for a description of how this is done). All ID (see Section 18.36.3 for a description of how this is done). All
locks, including opens, record locks, delegations, and layouts locks, including opens, record locks, delegations, and layouts
obtained by sessions using that client ID are associated with that obtained by sessions using that client ID are associated with that
client ID. client ID.
Since the verifier will be changed by the client upon each Since the verifier will be changed by the client upon each
initialization, the server can compare a new verifier to the verifier initialization, the server can compare a new verifier to the verifier
associated with currently held locks and determine that they do not associated with currently held locks and determine that they do not
match. This signifies the client's new instantiation and subsequent match. This signifies the client's new instantiation and subsequent
loss (upon confirmation of new the client ID) of locking state. As a loss (upon confirmation of the new client ID) of locking state. As a
result, the server is free to release all locks held which are result, the server is free to release all locks held which are
associated with the old client ID which was derived from the old associated with the old client ID which was derived from the old
verifier. At this point conflicting locks from other clients, kept verifier. At this point conflicting locks from other clients, kept
waiting while the lease had not yet expired, can be granted. In waiting while the lease had not yet expired, can be granted. In
addition, all stateids associated with the old client ID can also be addition, all stateids associated with the old client ID can also be
freed, as they are no longer reference-able. freed, as they are no longer reference-able.
Note that the verifier must have the same uniqueness properties as Note that the verifier must have the same uniqueness properties as
the verifier for the COMMIT operation. the verifier for the COMMIT operation.
skipping to change at page 161, line 13 skipping to change at page 161, line 13
are variants of the requests normally used to create locks of that are variants of the requests normally used to create locks of that
type and are referred to as "reclaim-type" requests and the process type and are referred to as "reclaim-type" requests and the process
of re-establishing such locks is referred to as "reclaiming" them. of re-establishing such locks is referred to as "reclaiming" them.
Because each client must have an opportunity to reclaim all of the Because each client must have an opportunity to reclaim all of the
locks that it has without the possibility that some other client will locks that it has without the possibility that some other client will
be granted a conflicting lock, a special period called the "grace be granted a conflicting lock, a special period called the "grace
period" is devoted to the reclaim process. During this period, period" is devoted to the reclaim process. During this period,
requests creating client IDs and sessions are handled normally, but requests creating client IDs and sessions are handled normally, but
locking requests are subject to special restrictions. Only reclaim- locking requests are subject to special restrictions. Only reclaim-
type locking requests are allowed, unless the server is able to type locking requests are allowed, unless the server can reliably
reliably determine (through state persistently maintained across determine (through state persistently maintained across restart
restart instances), that granting any such lock cannot possibly instances), that granting any such lock cannot possibly conflict with
conflict with a subsequent reclaim. When a request is made to obtain a subsequent reclaim. When a request is made to obtain a new lock
a new lock (i.e. not a reclaim-type request) during the grace period (i.e. not a reclaim-type request) during the grace period and such a
and such a determination cannot be made, the server must return the determination cannot be made, the server must return the error
error NFS4ERR_GRACE. NFS4ERR_GRACE.
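The grace-period rule for locking requests can be sketched as a small decision function. This is an illustrative sketch only: the function name, the Status enumeration, and the provably_safe parameter are hypothetical. provably_safe stands for the server's determination, from state persistently maintained across restart instances, that the new lock cannot conflict with a subsequent reclaim. The enumeration members are symbolic names, not wire values.

```python
from enum import Enum


class Status(Enum):
    # Symbolic names only; these are not the protocol's numeric values.
    NFS4_OK = "NFS4_OK"
    NFS4ERR_GRACE = "NFS4ERR_GRACE"


def grace_period_status(is_reclaim, provably_safe=False):
    """Decide how a locking request is treated while the grace period is active.

    NFS4_OK here means only "continue with normal processing".
    """
    if is_reclaim:
        # Reclaim-type locking requests are allowed during the grace period.
        return Status.NFS4_OK
    if provably_safe:
        # The server reliably determined that no later reclaim can conflict.
        return Status.NFS4_OK
    # A new lock is requested and no such determination can be made.
    return Status.NFS4ERR_GRACE
```

The default of provably_safe=False reflects the conservative case: a server that keeps no such persistent state rejects every non-reclaim locking request until the grace period ends.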
Once a session is established using the new client ID, the client Once a session is established using the new client ID, the client
will use reclaim-type locking requests (e.g. LOCK requests with will use reclaim-type locking requests (e.g. LOCK requests with
reclaim set to TRUE and OPEN operations with a claim type of reclaim set to TRUE and OPEN operations with a claim type of
CLAIM_PREVIOUS; see Section 9.11) to re-establish its locking state. CLAIM_PREVIOUS; see Section 9.11) to re-establish its locking state.
Once this is done, or if there is no such locking state to reclaim, Once this is done, or if there is no such locking state to reclaim,
the client sends a global RECLAIM_COMPLETE operation, i.e. one with the client sends a global RECLAIM_COMPLETE operation, i.e. one with
the rca_one_fs argument set to FALSE, to indicate that it has the rca_one_fs argument set to FALSE, to indicate that it has
reclaimed all of the locking state that it will reclaim. Once a reclaimed all of the locking state that it will reclaim. Once a
client sends such a RECLAIM_COMPLETE operation, it may attempt non- client sends such a RECLAIM_COMPLETE operation, it may attempt non-
reclaim locking operations, although it may get NFS4ERR_GRACE errors reclaim locking operations, although it may get NFS4ERR_GRACE errors
from the operations until the period of special handling is over. See from the operations until the period of special handling is over. See
Section 11.7.7 for a discussion of the analogous handling of lock Section 11.7.7 for a discussion of the analogous handling of lock
reclamation in the case of file systems transitioning from server to reclamation in the case of file systems transitioning from server to
server. server.
During the grace period, the server must reject READ and WRITE During the grace period, the server must reject READ and WRITE
operations and non-reclaim locking requests (i.e. other LOCK and OPEN operations and non-reclaim locking requests (i.e. other LOCK and OPEN
operations) with an error of NFS4ERR_GRACE, unless it is able to operations) with an error of NFS4ERR_GRACE, unless it can guarantee
guarantee that these may be done safely, as described below. that these may be done safely, as described below.
The grace period may last until all clients which are known to The grace period may last until all clients which are known to
possibly have had locks have done a global RECLAIM_COMPLETE possibly have had locks have done a global RECLAIM_COMPLETE
operation, indicating that they have finished reclaiming the locks operation, indicating that they have finished reclaiming the locks
they held before the server restart. This means that a client which they held before the server restart. This means that a client which
has done a RECLAIM_COMPLETE must be prepared to receive an has done a RECLAIM_COMPLETE must be prepared to receive an
NFS4ERR_GRACE when attempting to acquire new locks. In order for the NFS4ERR_GRACE when attempting to acquire new locks. In order for the
server to know that all clients with possible prior lock state have server to know that all clients with possible prior lock state have
done a RECLAIM_COMPLETE, the server must maintain in stable storage a done a RECLAIM_COMPLETE, the server must maintain in stable storage a
list of clients which may have such locks. The server may also list of clients which may have such locks. The server may also
skipping to change at page 163, line 18 skipping to change at page 163, line 18
requests to be processed during the grace period, it MUST determine requests to be processed during the grace period, it MUST determine
that no lock subsequently reclaimed will be rejected and that no lock that no lock subsequently reclaimed will be rejected and that no lock
subsequently reclaimed would have prevented any I/O operation subsequently reclaimed would have prevented any I/O operation
processed during the grace period. processed during the grace period.
Clients should be prepared for the return of NFS4ERR_GRACE errors for Clients should be prepared for the return of NFS4ERR_GRACE errors for
non-reclaim lock and I/O requests. In this case the client should non-reclaim lock and I/O requests. In this case the client should
employ a retry mechanism for the request. A delay (on the order of employ a retry mechanism for the request. A delay (on the order of
several seconds) between retries should be used to avoid overwhelming several seconds) between retries should be used to avoid overwhelming
the server. Further discussion of the general issue is included in the server. Further discussion of the general issue is included in
[37]. The client must account for the server that is able to perform [37]. The client must account for the server that can perform I/O
I/O and non-reclaim locking requests within the grace period as well and non-reclaim locking requests within the grace period as well as
as those that can not do so. those that cannot do so.
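The client-side retry behaviour described above can be sketched as follows. This is a minimal sketch under stated assumptions: send_request is a hypothetical callable that returns a (status, result) pair, status values are symbolic strings, and the five-second delay and attempt limit are illustrative choices, not values taken from either draft.

```python
import time

NFS4ERR_GRACE = "NFS4ERR_GRACE"  # symbolic name, not the wire value


def retry_during_grace(send_request, delay_seconds=5.0, max_attempts=20,
                       sleep=time.sleep):
    """Resend a non-reclaim lock or I/O request while the server returns NFS4ERR_GRACE.

    send_request is a hypothetical callable returning (status, result).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, result = send_request()
        if status != NFS4ERR_GRACE:
            # Either success or an unrelated error; the caller handles it.
            return status, result
        if attempt + 1 < max_attempts:
            # Wait several seconds between retries to avoid overwhelming the server.
            sleep(delay_seconds)
    return status, result
```

Because the first attempt is sent without any delay, the same loop serves a server that can process the request during the grace period and one that cannot.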
A reclaim-type locking request outside the server's grace period can A reclaim-type locking request outside the server's grace period can
only succeed if the server can guarantee that no conflicting lock or only succeed if the server can guarantee that no conflicting lock or
I/O request has been granted since restart. I/O request has been granted since restart.
A server may, upon restart, establish a new value for the lease A server may, upon restart, establish a new value for the lease
period. Therefore, clients should, once a new client ID is period. Therefore, clients should, once a new client ID is
established, refetch the lease_time attribute and use it as the basis established, refetch the lease_time attribute and use it as the basis
for lease renewal for the lease associated with that server. for lease renewal for the lease associated with that server.
However, the server must establish, for this restart event, a grace However, the server must establish, for this restart event, a grace
skipping to change at page 164, line 12 skipping to change at page 164, line 12
allow conflicting requests. When it adopts the finer-grained allow conflicting requests. When it adopts the finer-grained
approach, it must revoke all locks associated with a given stateid, approach, it must revoke all locks associated with a given stateid,
even if the conflict is with only a subset of locks. even if the conflict is with only a subset of locks.
When the server chooses to free all of a client's lock state, either When the server chooses to free all of a client's lock state, either
immediately upon lease expiration, or as a result of the first attempt immediately upon lease expiration, or as a result of the first attempt
to obtain a conflicting lock, the server may report the loss of to obtain a conflicting lock, the server may report the loss of
lock state in a number of ways. lock state in a number of ways.
The server may choose to invalidate the session and the associated The server may choose to invalidate the session and the associated
client ID. In this case, when the client is able to communicate with client ID. In this case, once the client can communicate with the
the server, it will receive an NFS4ERR_BADSESSION. Upon attempting server, it will receive an NFS4ERR_BADSESSION error. Upon attempting
to create a new session, it would get an NFS4ERR_STALE_CLIENTID. to create a new session, it would get an NFS4ERR_STALE_CLIENTID.
Upon creating the new client ID and new session it would attempt to Upon creating the new client ID and new session it would attempt to
reclaim locks but not be allowed to do so by the server. reclaim locks but not be allowed to do so by the server.
Another possibility is for the server to maintain the session and Another possibility is for the server to maintain the session and
client ID but for all stateids held by the client to become invalid client ID but for all stateids held by the client to become invalid
or stale. Once the client is able to reach the server after such a or stale. Once the client can reach the server after such a network
network partition, the status returned by the SEQUENCE operation will partition, the status returned by the SEQUENCE operation will
indicate a loss of locking state. (The flag indicate a loss of locking state, i.e. the flag
SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED will be set in SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED will be set in sr_status_flags.
sr_status_flags.) In addition, all I/O submitted by the client with In addition, all I/O submitted by the client with the now invalid
the now invalid stateids will fail with the server returning the stateids will fail with the server returning the error
error NFS4ERR_EXPIRED. Once the client learns of the loss of locking NFS4ERR_EXPIRED. Once the client learns of the loss of locking
state, it will suitably notify the applications that held the state, it will suitably notify the applications that held the
invalidated locks. The client should then take action to free invalidated locks. The client should then take action to free
invalidated stateids, either by establishing a new client ID using a invalidated stateids, either by establishing a new client ID using a
new verifier or by doing a FREE_STATEID operation to release each of new verifier or by doing a FREE_STATEID operation to release each of
the invalidated stateids. the invalidated stateids.
When the server adopts a finer-grained approach to revocation of When the server adopts a finer-grained approach to revocation of
locks when leases have expired, only a subset of stateids will locks when leases have expired, only a subset of stateids will
normally become invalid during a network partition. When the client normally become invalid during a network partition. When the client
is able to communicate with the server after such a network can communicate with the server after such a network partition heals,
partition, the status returned by the SEQUENCE operation will the status returned by the SEQUENCE operation will indicate a partial
indicate a partial loss of locking state loss of locking state (SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED). In
(SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED). In addition, operations, addition, operations, including I/O submitted by the client, with the
including I/O submitted by the client, with the now invalid stateids now invalid stateids will fail with the server returning the error
will fail with the server returning the error NFS4ERR_EXPIRED. Once NFS4ERR_EXPIRED. Once the client learns of the loss of locking
the client learns of the loss of locking state, it will use the state, it will use the TEST_STATEID operation on all of its stateids
TEST_STATEID operation on all of its stateids to determine which to determine which locks have been lost and then suitably notify the
locks have been lost and then suitably notify the applications that applications that held the invalidated locks. The client can then
held the invalidated locks. The client can then release the release the invalidated locking state and acknowledge the revocation
invalidated locking state and acknowledge the revocation of the of the associated locks by doing a FREE_STATEID operation on each of
associated locks by doing a FREE_STATEID operation on each of the the invalidated stateids.
invalidated stateids.
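The client-side recovery steps in the paragraph above (test every stateid, notify the affected applications, then free the revoked stateids) can be sketched as follows. This is only an illustrative outline, not part of either draft: the function name, the callback parameters, and the use of plain strings for stateids and status codes are assumptions made for the example, and any status other than "NFS4_OK" is treated as a lost lock.

```python
from typing import Callable, Iterable, List


def recover_after_partial_revocation(
    stateids: Iterable[str],
    test_stateid: Callable[[str], str],
    free_stateid: Callable[[str], None],
    notify_application: Callable[[str], None],
) -> List[str]:
    """Hypothetical client-side handling of SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED.

    test_stateid and free_stateid stand in for the TEST_STATEID and
    FREE_STATEID operations; both are callbacks supplied by the caller.
    """
    # Step 1: probe every stateid the client holds to find the revoked ones.
    lost = [sid for sid in stateids if test_stateid(sid) != "NFS4_OK"]

    # Step 2: tell the applications that held the invalidated locks.
    for sid in lost:
        notify_application(sid)

    # Step 3: release the invalidated state, which also acknowledges
    # the revocation to the server.
    for sid in lost:
        free_stateid(sid)

    return lost
```

A caller would typically run this once after seeing the partial-revocation flag in sr_status_flags, passing in whatever table of stateids it maintains.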
When a network partition is combined with a server restart, there are When a network partition is combined with a server restart, there are
edge conditions that place requirements on the server in order to edge conditions that place requirements on the server in order to
avoid silent data corruption following the server restart. Two of avoid silent data corruption following the server restart. Two of
these edge conditions are known, and are discussed below. these edge conditions are known, and are discussed below.
The first edge condition arises as a result of scenarios such as The first edge condition arises as a result of scenarios such as
the following: the following:
1. Client A acquires a lock. 1. Client A acquires a lock.
skipping to change at page 167, line 37 skipping to change at page 167, line 37
reclaims of share reservations, record locks, and delegations): reclaims of share reservations, record locks, and delegations):
1. Reject all reclaims with NFS4ERR_NO_GRACE. This is extremely 1. Reject all reclaims with NFS4ERR_NO_GRACE. This is extremely
unforgiving, but necessary if the server does not record lock unforgiving, but necessary if the server does not record lock
state in stable storage. state in stable storage.
2. Record sufficient state in stable storage such that all known 2. Record sufficient state in stable storage such that all known
edge conditions involving server restart, including the two noted edge conditions involving server restart, including the two noted
in this section, are detected. It is acceptable to erroneously in this section, are detected. It is acceptable to erroneously
recognize an edge condition and not allow a reclaim, when, with recognize an edge condition and not allow a reclaim, when, with
sufficient knowledge it would be allowed. Note it is not known sufficient knowledge it would be allowed. The error the server
if there are other edge conditions. would return in this case is NFS4ERR_NO_GRACE. Note it is not
known if there are other edge conditions.
In the event that, after a server restart, the server determines In the event that, after a server restart, the server determines
that there is unrecoverable damage or corruption to the that there is unrecoverable damage or corruption to the
information in stable storage, then for all clients and/or locks information in stable storage, then for all clients and/or locks
which may be affected, the server MUST return NFS4ERR_NO_GRACE. which may be affected, the server MUST return NFS4ERR_NO_GRACE.
A mandate for the client's handling of the NFS4ERR_NO_GRACE error is A mandate for the client's handling of the NFS4ERR_NO_GRACE error is
outside the scope of this specification, since the strategies for outside the scope of this specification, since the strategies for
such handling are very dependent on the client's operating such handling are very dependent on the client's operating
environment. However, one potential approach is described below. environment. However, one potential approach is described below.
skipping to change at page 169, line 32 skipping to change at page 169, line 33
gentler to servers trying to handle very large numbers of clients. gentler to servers trying to handle very large numbers of clients.
The number of extra requests to effect lock renewal drops in inverse The number of extra requests to effect lock renewal drops in inverse
proportion to the lease time. The disadvantages of long leases proportion to the lease time. The disadvantages of long leases
include the possibility of slower recovery after certain failures. include the possibility of slower recovery after certain failures.
After server failure, a longer grace period may be required when some After server failure, a longer grace period may be required when some
clients do not promptly reclaim their locks and do a global clients do not promptly reclaim their locks and do a global
RECLAIM_COMPLETE. In the event of client failure, there can be a RECLAIM_COMPLETE. In the event of client failure, there can be a
longer period for leases to expire thus forcing conflicting requests longer period for leases to expire thus forcing conflicting requests
to wait. to wait.
Long leases are practical if the server is able to store lease state Long leases are practical if the server can store lease state in
in non-volatile memory. Upon recovery, the server can reconstruct non-volatile memory. Upon recovery, the server can reconstruct the
the lease state from its non-volatile memory and continue operation lease state from its non-volatile memory and continue operation with
with its clients and therefore long leases would not be an issue. its clients and therefore long leases would not be an issue.
8.7. Clocks, Propagation Delay, and Calculating Lease Expiration 8.7. Clocks, Propagation Delay, and Calculating Lease Expiration
To avoid the need for synchronized clocks, lease times are granted by To avoid the need for synchronized clocks, lease times are granted by
the server as a time delta. However, there is a requirement that the the server as a time delta. However, there is a requirement that the
client and server clocks do not drift excessively over the duration client and server clocks do not drift excessively over the duration
of the lease. There is also the issue of propagation delay across of the lease. There is also the issue of propagation delay across
the network which could easily be several hundred milliseconds as the network which could easily be several hundred milliseconds as
well as the possibility that requests will be lost and need to be well as the possibility that requests will be lost and need to be
retransmitted. retransmitted.
skipping to change at page 171, line 20 skipping to change at page 171, line 23
DESTROY_CLIENTID) are not ignored. DESTROY_CLIENTID) are not ignored.
9. File Locking and Share Reservations 9. File Locking and Share Reservations
To support Win32 share reservations it is necessary to provide To support Win32 share reservations it is necessary to provide
operations which atomically open or create files. Having a separate operations which atomically open or create files. Having a separate
share/unshare operation would not allow correct implementation of the share/unshare operation would not allow correct implementation of the
Win32 OpenFile API. In order to correctly implement share semantics, Win32 OpenFile API. In order to correctly implement share semantics,
the previous NFS protocol mechanisms used when a file is opened or the previous NFS protocol mechanisms used when a file is opened or
created (LOOKUP, CREATE, ACCESS) need to be replaced. The NFSv4.1 created (LOOKUP, CREATE, ACCESS) need to be replaced. The NFSv4.1
protocol defines an OPEN operation which looks up or creates a file protocol defines an OPEN operation which is capable of atomically
and establishes locking state on the server. looking up, creating, and locking a file on the server.
9.1. Opens and Byte-range Locks 9.1. Opens and Byte-Range Locks
It is assumed that manipulating a byte-range lock is rare when It is assumed that manipulating a byte-range lock is rare when
compared to READ and WRITE operations. It is also assumed that compared to READ and WRITE operations. It is also assumed that
crashes and network partitions are relatively rare. Therefore it is server restarts and network partitions are relatively rare.
important that the READ and WRITE operations have a lightweight Therefore it is important that the READ and WRITE operations have a
mechanism to indicate if they possess a held lock. A byte-range lock lightweight mechanism to indicate if they possess a held lock. A
request contains the heavyweight information required to establish a byte-range lock request contains the heavyweight information required
lock and uniquely define the owner of the lock. to establish a lock and uniquely define the owner of the lock.
9.1.1. State-owner Definition 9.1.1. State-owner Definition
When opening a file or requesting a record lock, the client must When opening a file or requesting a record lock, the client must
specify an identifier which represents the owner of the requested specify an identifier which represents the owner of the requested
lock. This identifier is in the form of a state-owner, represented lock. This identifier is in the form of a state-owner, represented
in the protocol by a state_owner4, a variable-length opaque array in the protocol by a state_owner4, a variable-length opaque array
which, when concatenated with the current client ID uniquely defines which, when concatenated with the current client ID uniquely defines
the owner of a lock managed by the client. This may be a thread id, the owner of a lock managed by the client. This may be a thread id,
process id, or other unique value. process id, or other unique value.
skipping to change at page 172, line 7 skipping to change at page 172, line 10
remain separate even if the same opaque arrays are used to designate remain separate even if the same opaque arrays are used to designate
owners of each. The protocol distinguishes between open-owners owners of each. The protocol distinguishes between open-owners
(represented by open_owner4 structures) and lock-owners (represented (represented by open_owner4 structures) and lock-owners (represented
by lock_owner4 structures). by lock_owner4 structures).
Each open is associated with a specific open-owner while each record Each open is associated with a specific open-owner while each record
lock is associated with a lock-owner and an open-owner, the latter lock is associated with a lock-owner and an open-owner, the latter
being the open-owner associated with the open file under which the being the open-owner associated with the open file under which the
LOCK operation was done. Delegations and layouts, on the other hand, LOCK operation was done. Delegations and layouts, on the other hand,
are not associated with a specific owner but are associated with the are not associated with a specific owner but are associated with the
client as a whole. client as a whole (identified by a client ID).
9.1.2. Use of the Stateid and Locking 9.1.2. Use of the Stateid and Locking
All READ, WRITE and SETATTR operations contain a stateid. For the All READ, WRITE and SETATTR operations contain a stateid. For the
purposes of this section, SETATTR operations which change the size purposes of this section, SETATTR operations which change the size
attribute of a file are treated as if they are writing the area attribute of a file are treated as if they are writing the area
between the old and new size (i.e. the range truncated or added to between the old and new size (i.e. the range truncated or added to
the file by means of the SETATTR), even where SETATTR is not the file by means of the SETATTR), even where SETATTR is not
explicitly mentioned in the text. The stateid passed to these explicitly mentioned in the text. The stateid passed to one of these
operation must be one that represents an open, a set of byte-range operations must be one that represents an open, a set of byte-range
locks, or a delegation, or it may be a special stateid representing locks, or a delegation, or it may be a special stateid representing
anonymous access or the special bypass stateid. anonymous access or the special bypass stateid.
If the state-owner performs a READ or WRITE in a situation in which If the state-owner performs a READ or WRITE in a situation in which
it has established a byte-range lock or share reservation on the it has established a byte-range lock or share reservation on the
server (any OPEN constitutes a share reservation) the stateid server (any OPEN constitutes a share reservation) the stateid
(previously returned by the server) must be used to indicate what (previously returned by the server) must be used to indicate what
locks, including both record locks and share reservations, are held locks, including both record locks and share reservations, are held
by the state-owner. If no state is established by the client, either by the state-owner. If no state is established by the client, either
record lock or share reservation, a special stateid for anonymous record lock or share reservation, a special stateid for anonymous
state (zero as "other" and "seqid") is used. (See Section 8.2.3 for state (zero as "other" and "seqid") is used. (See Section 8.2.3 for
a description of 'special' stateids in general). Regardless of whether a description of 'special' stateids in general.) Regardless of whether
a stateid for anonymous state or a stateid returned by the server is a stateid for anonymous state or a stateid returned by the server is
used, if there is a conflicting share reservation or mandatory record used, if there is a conflicting share reservation or mandatory record
lock held on the file, the server MUST refuse to service the READ or lock held on the file, the server MUST refuse to service the READ or
WRITE operation. WRITE operation.
Share reservations are established by OPEN operations and by their Share reservations are established by OPEN operations and by their
nature are mandatory in that when the OPEN denies READ or WRITE nature are mandatory in that when the OPEN denies READ or WRITE
operations, that denial results in such operations being rejected operations, that denial results in such operations being rejected
with error NFS4ERR_LOCKED. Record locks may be implemented by the with error NFS4ERR_LOCKED. Record locks may be implemented by the
server as either mandatory or advisory, or the choice of mandatory or server as either mandatory or advisory, or the choice of mandatory or
skipping to change at page 173, line 19 skipping to change at page 173, line 21
far as the APIs and requirements on implementation. If the mandatory far as the APIs and requirements on implementation. If the mandatory
lock attribute is set on the file, the server checks to see if the lock attribute is set on the file, the server checks to see if the
lock-owner has an appropriate shared (read) or exclusive (write) lock-owner has an appropriate shared (read) or exclusive (write)
record lock on the region it wishes to read or write to. If there is record lock on the region it wishes to read or write to. If there is
no appropriate lock, the server checks if there is a conflicting lock no appropriate lock, the server checks if there is a conflicting lock
(which can be done by attempting to acquire the conflicting lock on (which can be done by attempting to acquire the conflicting lock on
behalf of the lock-owner, and if successful, release the lock after behalf of the lock-owner, and if successful, release the lock after
the READ or WRITE is done), and if there is, the server returns the READ or WRITE is done), and if there is, the server returns
NFS4ERR_LOCKED. NFS4ERR_LOCKED.
For Windows environments, there are no advisory record locks, so the For Windows environments, record locks are always mandatory, so the
server always checks for record locks during I/O requests. server always checks for record locks during I/O requests.
Thus, the NFSv4.1 LOCK operation does not need to distinguish between Thus, the NFSv4.1 LOCK operation does not need to distinguish between
advisory and mandatory record locks. It is the NFSv4.1 server's advisory and mandatory record locks. It is the NFSv4.1 server's
processing of the READ and WRITE operations that introduces the processing of the READ and WRITE operations that introduces the
distinction. distinction.
Every stateid which is validly passed to READ, WRITE or SETATTR, with Every stateid which is validly passed to READ, WRITE or SETATTR, with
the exception of special stateid values, defines an access mode for the exception of special stateid values, defines an access mode for
the file (i.e. READ, WRITE, or READ-WRITE) the file (i.e. READ, WRITE, or READ-WRITE)
skipping to change at page 173, line 43 skipping to change at page 173, line 45
and as modified by subsequent OPENs and OPEN_DOWNGRADEs for the and as modified by subsequent OPENs and OPEN_DOWNGRADEs for the
same open-owner/file pair. same open-owner/file pair.
o For stateids returned by record lock requests, the appropriate o For stateids returned by record lock requests, the appropriate
mode is the access mode for the open stateid associated with the mode is the access mode for the open stateid associated with the
lock set represented by the stateid. lock set represented by the stateid.
o For delegation stateids the access mode is based on the type of o For delegation stateids the access mode is based on the type of
delegation. delegation.
When a READ, WRITE, or SETATTR which specifies the size attribute, is When a READ, WRITE, or SETATTR (which specifies the size attribute)
done, the operation is subject to checking against the access mode to is done, the operation is subject to checking against the access mode
verify that the operation is appropriate given the stateid with which to verify that the operation is appropriate given the stateid with
the operation is associated. which the operation is associated.
In the case of WRITE-type operations (i.e. WRITEs and SETATTRs which In the case of WRITE-type operations (i.e. WRITEs and SETATTRs which
set size), the server must verify that the access mode allows writing set size), the server MUST verify that the access mode allows writing
and return an NFS4ERR_OPENMODE error if it does not. In the case of and MUST return an NFS4ERR_OPENMODE error if it does not. In the
READ, the server may perform the corresponding check on the access case of READ, the server may perform the corresponding check on the
mode, or it may choose to allow READ on opens for WRITE only, to access mode, or it may choose to allow READ on opens for WRITE only,
accommodate clients whose write implementation may unavoidably do to accommodate clients whose write implementation may unavoidably do
reads (e.g. due to buffer cache constraints). However, even if READs reads (e.g. due to buffer cache constraints). However, even if READs
are allowed in these circumstances, the server MUST still check for are allowed in these circumstances, the server MUST still check for
locks that conflict with the READ (e.g. another open specifying denial locks that conflict with the READ (e.g. another open specifying denial
of READs). Note that a server which does enforce the access mode of READs). Note that a server which does enforce the access mode
check on READs need not explicitly check for conflicting share check on READs need not explicitly check for conflicting share
reservations since the existence of OPEN for read access guarantees reservations since the existence of OPEN for read access guarantees
that no conflicting share reservation can exist. that no conflicting share reservation can exist.
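The access-mode check described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the string operation labels, and the enforce_read_check switch are hypothetical, and returning NFS4ERR_OPENMODE for a rejected READ is an assumption based on the text calling it "the corresponding check". The two bit values match the protocol's OPEN4_SHARE_ACCESS_READ and OPEN4_SHARE_ACCESS_WRITE constants.

```python
OPEN4_SHARE_ACCESS_READ = 0x1
OPEN4_SHARE_ACCESS_WRITE = 0x2


def check_access_mode(op: str, access_mode: int, enforce_read_check: bool = True) -> str:
    """Return a status name for an I/O operation given the stateid's access mode.

    op is "READ", "WRITE", or "SETATTR_SIZE" (a SETATTR that sets size);
    these labels are illustrative, not protocol values.
    """
    if op in ("WRITE", "SETATTR_SIZE"):
        # WRITE-type operations: the server MUST verify write access.
        if access_mode & OPEN4_SHARE_ACCESS_WRITE:
            return "NFS4_OK"
        return "NFS4ERR_OPENMODE"

    if op == "READ":
        # The server may enforce the check, or may allow READ on a
        # write-only open to accommodate clients that must read to write.
        if access_mode & OPEN4_SHARE_ACCESS_READ or not enforce_read_check:
            return "NFS4_OK"
        return "NFS4ERR_OPENMODE"

    raise ValueError(f"unsupported operation: {op}")
```

Note that a server taking the lenient READ path still has to check separately for locks and share reservations that conflict with the READ; that check is outside this sketch.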
The read bypass special stateid (all bits of "other" and "seqid" set The read bypass special stateid (all bits of "other" and "seqid" set
to one) stateid indicates a desire to bypass locking checks. The to one) indicates a desire to bypass locking checks. The server MAY
server MAY allow READ operations to bypass locking checks at the allow READ operations to bypass locking checks at the server, when
server, when this special stateid is used. However, WRITE operations this special stateid is used. However, WRITE operations with this
with this special stateid value MUST NOT bypass locking checks and special stateid value MUST NOT bypass locking checks and are treated
are treated exactly the same as if a special stateid for anonymous exactly the same as if a special stateid for anonymous state were
state were used. used.
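As an illustration of the two special stateid values mentioned in this section, the sketch below classifies a stateid and applies the rule that a WRITE carrying the read bypass stateid is handled as if the anonymous stateid had been used. The function names and returned labels are hypothetical. The 32-bit "seqid" and 12-byte "other" sizes come from the protocol's stateid4 definition rather than from this section.

```python
OTHER_SIZE = 12  # size of the "other" field in stateid4


def classify_stateid(seqid: int, other: bytes) -> str:
    """Label a stateid as "anonymous", "read-bypass", or "regular"."""
    if seqid == 0 and other == bytes(OTHER_SIZE):
        return "anonymous"        # all bits of seqid and other are zero
    if seqid == 0xFFFFFFFF and other == b"\xff" * OTHER_SIZE:
        return "read-bypass"      # all bits of seqid and other are one
    return "regular"


def effective_kind(seqid: int, other: bytes, op: str) -> str:
    """Apply the rule that only READ may benefit from the bypass stateid."""
    kind = classify_stateid(seqid, other)
    if kind == "read-bypass" and op != "READ":
        # WRITE operations MUST NOT bypass locking checks.
        return "anonymous"
    return kind
```

Whether a READ with the bypass stateid actually skips locking checks remains the server's choice (MAY); the sketch only identifies when the option applies.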
A lock may not be granted while a READ or WRITE operation using one A lock may not be granted while a READ or WRITE operation using one
of the special stateids is being performed and the scope of the lock of the special stateids is being performed and the scope of the lock
to be granted would conflict with the READ or WRITE operation. This to be granted would conflict with the READ or WRITE operation. This
can occur when: can occur when:
o A mandatory byte range lock is requested with range that conflicts o A mandatory byte range lock is requested with range that conflicts
with the range of the READ or WRITE operation. For the purposes with the range of the READ or WRITE operation. For the purposes
of this paragraph, a conflict occurs when a shared lock is of this paragraph, a conflict occurs when a shared lock is
requested and a WRITE operation is being performed, or an requested and a WRITE operation is being performed, or an
exclusive lock is requested and either a READ or a WRITE operation exclusive lock is requested and either a READ or a WRITE operation
is being performed. is being performed.
o A share reservation is requested which denies reading and/or o A share reservation is requested which denies reading and/or
writing and the corresponding is being performed. writing and the corresponding operation is being performed.
o A delegation is to be granted and the delegation type would o A delegation is to be granted and the delegation type would
prevent the I/O operation, i.e. READ and WRITE conflict with a prevent the I/O operation, i.e. READ and WRITE conflict with a
write delegation and WRITE conflicts with a read delegation. write delegation and WRITE conflicts with a read delegation.
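The byte-range and delegation cases in the list above can be sketched as simple predicates. This is an illustration only: the function names and the "shared"/"exclusive" and "read"/"write" labels are hypothetical, ranges are given as (offset, length) with a positive finite length, and the protocol's special "to end of file" length is not modeled.

```python
def ranges_overlap(off1: int, len1: int, off2: int, len2: int) -> bool:
    """True when [off1, off1+len1) and [off2, off2+len2) share a byte."""
    return off1 < off2 + len2 and off2 < off1 + len1


def lock_conflicts_with_io(lock_type: str, lock_off: int, lock_len: int,
                           io_op: str, io_off: int, io_len: int) -> bool:
    """Would granting this mandatory byte-range lock conflict with in-flight I/O?"""
    if not ranges_overlap(lock_off, lock_len, io_off, io_len):
        return False
    if lock_type == "shared":
        # A shared lock conflicts only with a WRITE in progress.
        return io_op == "WRITE"
    # An exclusive lock conflicts with either a READ or a WRITE.
    return io_op in ("READ", "WRITE")


def delegation_conflicts_with_io(delegation_type: str, io_op: str) -> bool:
    """A write delegation conflicts with READ and WRITE; a read one with WRITE."""
    if delegation_type == "write":
        return io_op in ("READ", "WRITE")
    return io_op == "WRITE"
```

A server would consult checks of this kind only for I/O issued with one of the special stateids, since that is the case this passage addresses.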
When a client holds a delegation, it is particularly important to When a client holds a delegation, it needs to ensure that the stateid
make sure that the stateid sent conveys the association of operation sent conveys the association of operation with the delegation, to
with the delegation, to avoid the delegation from being avoidably avoid the delegation from being avoidably recalled. When the
recalled. When the delegation stateid, or a stateid open associated delegation stateid, or a stateid open associated with that
with that delegation, or a stateid representing byte-range locks delegation, or a stateid representing byte-range locks derived from
derived from such an open is used, the server knows that the READ, such an open is used, the server knows that the READ, WRITE, or
WRITE, or SETATTR does not conflict with the delegation, but is sent SETATTR does not conflict with the delegation, but is sent under the
under the aegis of the delegation. Even though it is possible for aegis of the delegation. Even though it is possible for the server
the server to determine from the clientid (via the sessionid) that to determine from the client ID (via the sessionid) that the client
the client does in fact have a delegation, the server is not obliged does in fact have a delegation, the server is not obliged to check
to check this, so using a special stateid can result in avoidable this, so using a special stateid can result in avoidable recall of
recall of the delegation. the delegation.
9.2. Lock Ranges 9.2. Lock Ranges
The protocol allows a lock-owner to request a lock with a byte range The protocol allows a lock-owner to request a lock with a byte range
and then either upgrade, downgrade, or unlock a sub-range of the and then either upgrade, downgrade, or unlock a sub-range of the
initial lock, or a range that consists of a range which overlaps, initial lock, or a range that consists of a range which overlaps,
fully or partially, that initial lock or a combination of a set of fully or partially, that initial lock or a combination of a set of
existing locks for the same lock-owner. It is expected that this existing locks for the same lock-owner. It is expected that this
will be an uncommon type of request. In any case, servers or server will be an uncommon type of request. In any case, servers or server
file systems may not be able to support sub-range lock semantics. In file systems may not be able to support sub-range lock semantics. In
skipping to change at page 175, line 26 skipping to change at page 175, line 28
sub-range of current locking state for the lock-owner, the server is sub-range of current locking state for the lock-owner, the server is
allowed to return the error NFS4ERR_LOCK_RANGE to signify that it allowed to return the error NFS4ERR_LOCK_RANGE to signify that it
does not support sub-range lock operations. Therefore, the client does not support sub-range lock operations. Therefore, the client
should be prepared to receive this error and, if appropriate, report should be prepared to receive this error and, if appropriate, report
the error to the requesting application. the error to the requesting application.
The client is discouraged from combining multiple independent locking The client is discouraged from combining multiple independent locking
ranges that happen to be adjacent into a single request since the ranges that happen to be adjacent into a single request since the
server may not support sub-range requests and for reasons related to server may not support sub-range requests and for reasons related to
the recovery of file locking state in the event of server failure. the recovery of file locking state in the event of server failure.
As discussed in Section 8.4.2 below, the server may employ certain As discussed in Section 8.4.2, the server may employ certain
optimizations during recovery that work effectively only when the optimizations during recovery that work effectively only when the
client's behavior during lock recovery is similar to the client's client's behavior during lock recovery is similar to the client's
locking behavior prior to server failure. locking behavior prior to server failure.
9.3. Upgrading and Downgrading Locks 9.3. Upgrading and Downgrading Locks
If a client has a write lock on a record, it can request an atomic If a client has a write lock on a record, it can request an atomic
downgrade of the lock to a read lock via the LOCK request, by setting downgrade of the lock to a read lock via the LOCK request, by setting
the type to READ_LT. If the server supports atomic downgrade, the the type to READ_LT. If the server supports atomic downgrade, the
request will succeed. If not, it will return NFS4ERR_LOCK_NOTSUPP. request will succeed. If not, it will return NFS4ERR_LOCK_NOTSUPP.
skipping to change at page 176, line 5 skipping to change at page 176, line 6
the type to WRITE_LT or WRITEW_LT. If the server does not support the type to WRITE_LT or WRITEW_LT. If the server does not support
atomic upgrade, it will return NFS4ERR_LOCK_NOTSUPP. If the upgrade atomic upgrade, it will return NFS4ERR_LOCK_NOTSUPP. If the upgrade
can be achieved without an existing conflict, the request will can be achieved without an existing conflict, the request will
succeed. Otherwise, the server will return either NFS4ERR_DENIED or succeed. Otherwise, the server will return either NFS4ERR_DENIED or
NFS4ERR_DEADLOCK. The error NFS4ERR_DEADLOCK is returned if the NFS4ERR_DEADLOCK. The error NFS4ERR_DEADLOCK is returned if the
client sent the LOCK request with the type set to WRITEW_LT and the client sent the LOCK request with the type set to WRITEW_LT and the
server has detected a deadlock. The client should be prepared to server has detected a deadlock. The client should be prepared to
receive such errors and if appropriate, report the error to the receive such errors and if appropriate, report the error to the
requesting application. requesting application.
9.4. Stateid Seqid Values and Byte-range Locks 9.4. Stateid Seqid Values and Byte-Range Locks
When a lock or unlock request is done, passing a stateid, the stateid When a lock or unlock request is done, passing a stateid, the stateid
returned has the same "other" value and a "seqid" value that is returned has the same "other" value and a "seqid" value that is
incremented to reflect the occurrence of the lock or unlock request. incremented to reflect the occurrence of the lock or unlock request.
The server MUST increment the value of the "seqid" field whenever The server MUST increment the value of the "seqid" field whenever
there is any change to the locking status of any byte offset as there is any change to the locking status of any byte offset as
described by any of the locks covered by the stateid. A change in described by any of the locks covered by the stateid. A change in
locking status includes a change from locked to unlocked or the locking status includes a change from locked to unlocked or the
reverse or a change from being locked for read to being locked for reverse or a change from being locked for read to being locked for
write or the reverse. write or the reverse.
When there is no such change, as, for example when a range already When there is no such change, as, for example when a range already
locked for write is locked again for write, the server MAY increment locked for write is locked again for write, the server MAY increment
the "seqid" value. the "seqid" value.
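The "seqid" rule above can be illustrated with a toy model that tracks the locking status of each byte and increments "seqid" only when some byte's status actually changes. Everything here is an assumption made for the example: the class and method names, the per-byte dictionary (a real server would use range structures), the starting seqid of 1, and the choice not to increment when nothing changes (the text permits either). Seqid wraparound is not modeled.

```python
class LockStateid:
    """Toy model of a byte-range lock stateid for one lock-owner and file."""

    def __init__(self, other: bytes) -> None:
        self.other = other
        self.seqid = 1
        self.status = {}  # byte offset -> "read" or "write"; absent means unlocked

    def lock(self, offset: int, length: int, mode: str) -> int:
        changed = False
        for b in range(offset, offset + length):
            if self.status.get(b) != mode:
                self.status[b] = mode
                changed = True
        if changed:
            self.seqid += 1  # MUST increment: some byte changed status
        return self.seqid

    def unlock(self, offset: int, length: int) -> int:
        changed = False
        for b in range(offset, offset + length):
            if b in self.status:
                del self.status[b]
                changed = True
        if changed:
            self.seqid += 1  # MUST increment: locked -> unlocked
        return self.seqid
```

For example, locking bytes 0-9 for write twice in a row changes "seqid" only on the first request in this model, while a later downgrade of bytes 0-4 to read changes it again.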
9.5. Issues with Multiple Open-owners 9.5. Issues with Multiple Open-Owners
When the same file is opened by multiple open-owners and there are When the same file is opened by multiple open-owners, a client will
LOCK and LOCKU requests for the same lock-owner issued through the have multiple open stateids for that file, each associated with a
different open files, a situation may arise in which there are different open-owner. In that case, there can be multiple LOCK and
multiple stateids representing byte-range locks for locks on the the LOCKU requests for the same lock-owner issued using the different
same file held by the same lock-owner but each assigned to a open stateids, and so a situation may arise in which there are
multiple stateids, each representing byte-range locks on the same
file and held by the same lock-owner but each associated with a
different open-owner. different open-owner.
In such a situation, the locking status of each byte (i.e. whether it In such a situation, the locking status of each byte (i.e. whether it
is locked, the read or write mode of the lock and the lock-owner is locked, the read or write mode of the lock and the lock-owner
holding the lock) MUST reflect the last LOCK or LOCKU operation done holding the lock) MUST reflect the last LOCK or LOCKU operation done
for the lock-owner in question, independent of the stateid through for the lock-owner in question, independent of the stateid through
which the request was issued. which the request was issued.
When a byte is locked by the lock-owner in question, the open-owner When a byte is locked by the lock-owner in question, the open-owner
to which that lock is assigned SHOULD be that of the open-owner to which that lock is assigned SHOULD be that of the open-owner
skipping to change at page 177, line 4 skipping to change at page 177, line 11
change to the set of locked bytes associated with a different stateid change to the set of locked bytes associated with a different stateid
for the same lock-owner, i.e. associated with a different open-owner, for the same lock-owner, i.e. associated with a different open-owner,
the "seqid" value for that stateid MUST NOT be incremented. the "seqid" value for that stateid MUST NOT be incremented.
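The rules in this section can be illustrated with a small model. The following is a minimal sketch, not part of either draft: the class and field names are hypothetical, bytes are tracked individually for simplicity, and it assumes that a byte is assigned to the stateid used by the most recent LOCK covering it. It also chooses to increment the "seqid" of the stateid used in the request on every operation, which is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class LockStateid:
    """Byte-range lock stateid for one (lock-owner, open-owner) pair."""
    open_owner: str
    seqid: int = 0
    locked: set = field(default_factory=set)  # byte offsets assigned here


class LockOwnerState:
    """All byte-range lock state held by ONE lock-owner on ONE file."""

    def __init__(self):
        self.stateids = {}  # open-owner name -> LockStateid
        self.mode = {}      # byte offset -> "read" or "write"

    def _stateid(self, open_owner):
        if open_owner not in self.stateids:
            self.stateids[open_owner] = LockStateid(open_owner)
        return self.stateids[open_owner]

    def lock(self, open_owner, offset, length, mode):
        used = self._stateid(open_owner)
        rng = set(range(offset, offset + length))
        # Bytes move to the stateid used by this request (assumption).
        # The other stateids lose them, but their seqid is NOT touched.
        for sid in self.stateids.values():
            if sid is not used:
                sid.locked -= rng
        used.locked |= rng
        # Locking status reflects the last operation, whatever the stateid.
        for b in rng:
            self.mode[b] = mode
        used.seqid += 1

    def unlock(self, open_owner, offset, length):
        used = self._stateid(open_owner)
        rng = set(range(offset, offset + length))
        for sid in self.stateids.values():
            sid.locked -= rng
        for b in rng:
            self.mode.pop(b, None)
        used.seqid += 1  # only the stateid named in the request advances


state = LockOwnerState()
state.lock("open-owner-1", 0, 10, "write")
state.lock("open-owner-2", 5, 10, "read")   # takes over bytes 5..9
state.unlock("open-owner-1", 0, 15)         # clears bytes held via either stateid
```

In this model the second LOCK changes the set of bytes associated with the first stateid, yet only the second stateid's "seqid" advances.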
9.6. Blocking Locks 9.6. Blocking Locks
Some clients require the support of blocking locks. While NFSv4.1 Some clients require the support of blocking locks. While NFSv4.1
provides a callback when a previously unavailable lock becomes provides a callback when a previously unavailable lock becomes
available, this is an OPTIONAL feature and clients cannot depend on available, this is an OPTIONAL feature and clients cannot depend on
its presence. Clients need to be prepared to continually poll for its presence. Clients need to be prepared to continually poll for
the lock. This presents a fairness problem. Two new lock types are the lock. This presents a fairness problem. Two of the lock types,
added, READW and WRITEW, and are used to indicate to the server that READW and WRITEW, are used to indicate to the server that the client
the client is requesting a blocking lock. When the callback is not is requesting a blocking lock. When the callback is not used, the
used, the server should maintain an ordered list of pending blocking server should maintain an ordered list of pending blocking locks.
locks. When the conflicting lock is released, the server may wait When the conflicting lock is released, the server may wait for the
the lease period for the first waiting client to re-request the lock. period of time equal to lease_time for the first waiting client to
After the lease period expires the next waiting client request is re-request the lock. After the lease period expires, the next
allowed the lock. Clients are required to poll at an interval waiting client request is allowed the lock. Clients are required to
sufficiently small that it is likely to acquire the lock in a timely poll at an interval sufficiently small that it is likely to acquire
manner. The server is not required to maintain a list of pending the lock in a timely manner. The server is not required to maintain
blocked locks as it is used to increase fairness and not correct a list of pending blocked locks as it is used to increase fairness
operation. Because of the unordered nature of crash recovery, and not correct operation. Because of the unordered nature of crash
storing of lock state to stable storage would be required to recovery, storing of lock state to stable storage would be required
guarantee ordered granting of blocking locks. to guarantee ordered granting of blocking locks.
Servers may also note the lock types and delay returning denial of Servers may also note the lock types and delay returning denial of
the request to allow extra time for a conflicting lock to be the request to allow extra time for a conflicting lock to be
released, allowing a successful return. In this way, clients can released, allowing a successful return. In this way, clients can
avoid the burden of needlessly frequent polling for blocking locks. avoid the burden of needlessly frequent polling for blocking locks.
The server should take care in the length of delay in the event the The server should take care in the length of delay in the event the
client retransmits the request. client retransmits the request.
If a server receives a blocking lock request, denies it, and then If a server receives a blocking lock request, denies it, and then
later receives a nonblocking request for the same lock, which is also later receives a nonblocking request for the same lock, which is also
skipping to change at page 180, line 45 skipping to change at page 181, line 5
possible wraparound of the 32-bit field. possible wraparound of the 32-bit field.
When the possibility exists that the client will send multiple OPENs When the possibility exists that the client will send multiple OPENs
for the same open-owner in parallel, it may be the case that an open for the same open-owner in parallel, it may be the case that an open
upgrade may happen without the client knowing beforehand that this upgrade may happen without the client knowing beforehand that this
could happen. Because of this possibility, CLOSEs and could happen. Because of this possibility, CLOSEs and
OPEN_DOWNGRADEs should generally be sent with a non-zero seqid in OPEN_DOWNGRADEs should generally be sent with a non-zero seqid in
the stateid, to avoid the possibility that the status change the stateid, to avoid the possibility that the status change
associated with an open upgrade is inadvertently lost. associated with an open upgrade is inadvertently lost.
9.11. Reclaim of Open and Byte-range Locks 9.11. Reclaim of Open and Byte-Range Locks
Special forms of the LOCK and OPEN operations are provided when it is Special forms of the LOCK and OPEN operations are provided when it is
necessary to re-establish byte-range locks or opens after a server necessary to re-establish byte-range locks or opens after a server
failure. failure.
o To reclaim existing opens, an OPEN operation is performed using a o To reclaim existing opens, an OPEN operation is performed using a
CLAIM_PREVIOUS. Because the client, in this type of situation, CLAIM_PREVIOUS. Because the client, in this type of situation,
will have already opened the file and have the filehandle of the will have already opened the file and have the filehandle of the
target file, this operation requires that the current filehandle target file, this operation requires that the current filehandle
be the target file, rather than a directory and no file name is be the target file, rather than a directory and no file name is
skipping to change at page 182, line 20 skipping to change at page 182, line 29
In this case, repeated reference to the server to find that no In this case, repeated reference to the server to find that no
conflicts exist is expensive. A better option with regards to conflicts exist is expensive. A better option with regards to
performance is to allow a client that repeatedly opens a file to do performance is to allow a client that repeatedly opens a file to do
so without reference to the server. This is done until potentially so without reference to the server. This is done until potentially
conflicting operations from another client actually occur. conflicting operations from another client actually occur.
A similar situation arises in connection with file locking. Sending A similar situation arises in connection with file locking. Sending
file lock and unlock requests to the server as well as the read and file lock and unlock requests to the server as well as the read and
write requests necessary to make data caching consistent with the write requests necessary to make data caching consistent with the
locking semantics (see Section 10.3.2 can severely limit performance. locking semantics (see Section 10.3.2) can severely limit
When locking is used to provide protection against infrequent performance. When locking is used to provide protection against
conflicts, a large penalty is incurred. This penalty may discourage infrequent conflicts, a large penalty is incurred. This penalty may
the use of file locking by applications. discourage the use of file locking by applications.
The NFSv4.1 protocol provides more aggressive caching strategies with The NFSv4.1 protocol provides more aggressive caching strategies with
the following design goals: the following design goals:
o Compatibility with a large range of server semantics. o Compatibility with a large range of server semantics.
o Providing the same caching benefits as previous versions of the o Providing the same caching benefits as previous versions of the
NFS protocol when unable to support the more aggressive model. NFS protocol when unable to support the more aggressive model.
o Requirements for aggressive caching are organized so that a large o Requirements for aggressive caching are organized so that a large
skipping to change at page 185, line 40 skipping to change at page 186, line 4
To allow for this type of client recovery, the server MAY extend the To allow for this type of client recovery, the server MAY extend the
period for delegation recovery beyond the typical lease expiration period for delegation recovery beyond the typical lease expiration
period. This implies that requests from other clients that conflict period. This implies that requests from other clients that conflict
with these delegations will need to wait. Because the normal recall with these delegations will need to wait. Because the normal recall
process may require significant time for the client to flush changed process may require significant time for the client to flush changed
state to the server, other clients need to be prepared for delays that state to the server, other clients need to be prepared for delays that
occur because of a conflicting delegation. This longer interval occur because of a conflicting delegation. This longer interval
would increase the window for clients to restart and consult stable would increase the window for clients to restart and consult stable
storage so that the delegations can be reclaimed. For open storage so that the delegations can be reclaimed. For open
delegations, such delegations are reclaimed using OPEN with a claim delegations, such delegations are reclaimed using OPEN with a claim
type of CLAIM_DELEGATE_PREV. (See Section 10.5 and Section 18.16 for type of CLAIM_DELEGATE_PREV or CLAIM_DELEG_PREV_FH (See Section 10.5
discussion of open delegation and the details of OPEN respectively). and Section 18.16 for discussion of open delegation and the details
of OPEN respectively).
A server MAY support a claim type of CLAIM_DELEGATE_PREV, and if it A server MAY support claim types of CLAIM_DELEGATE_PREV and
does, it MUST NOT remove delegations upon a CREATE_SESSION that CLAIM_DELEG_PREV_FH, and if it does, it MUST NOT remove delegations
confirms a client ID created by EXCHANGE_ID, and instead MUST, for a upon a CREATE_SESSION that confirms a client ID created by
period of time no less than that of the value of the lease_time EXCHANGE_ID, and instead MUST, for a period of time no less than that
attribute, maintain the client's delegations to allow time for the of the value of the lease_time attribute, maintain the client's
client to send CLAIM_DELEGATE_PREV requests. The server that delegations to allow time for the client to send CLAIM_DELEGATE_PREV
supports CLAIM_DELEGATE_PREV MUST support the DELEGPURGE operation. requests. The server that supports CLAIM_DELEGATE_PREV and/or
CLAIM_DELEG_PREV_FH MUST support the DELEGPURGE operation.
When the server restarts, delegations are reclaimed (using the OPEN When the server restarts, delegations are reclaimed (using the OPEN
operation with CLAIM_PREVIOUS) in a similar fashion to record locks operation with CLAIM_PREVIOUS) in a similar fashion to record locks
and share reservations. However, there is a slight semantic and share reservations. However, there is a slight semantic
difference. In the normal case if the server decides that a difference. In the normal case if the server decides that a
delegation should not be granted, it performs the requested action delegation should not be granted, it performs the requested action
(e.g. OPEN) without granting any delegation. For reclaim, the (e.g. OPEN) without granting any delegation. For reclaim, the
server grants the delegation but a special designation is applied so server grants the delegation but a special designation is applied so
that the client treats the delegation as having been granted but that the client treats the delegation as having been granted but
recalled by the server. Because of this, the client has the duty to recalled by the server. Because of this, the client has the duty to
skipping to change at page 188, line 12 skipping to change at page 188, line 27
client's cache. This validation must be done at least when the client's cache. This validation must be done at least when the
client's OPEN operation includes DENY=WRITE or BOTH thus client's OPEN operation includes DENY=WRITE or BOTH thus
terminating a period in which other clients may have had the terminating a period in which other clients may have had the
opportunity to open the file with WRITE access. Clients may opportunity to open the file with WRITE access. Clients may
choose to do the revalidation more often (i.e. at OPENs specifying choose to do the revalidation more often (i.e. at OPENs specifying
DENY=NONE) to parallel the NFSv3 protocol's practice for the DENY=NONE) to parallel the NFSv3 protocol's practice for the
benefit of users assuming this degree of cache revalidation. benefit of users assuming this degree of cache revalidation.
Since the change attribute is updated for data and metadata Since the change attribute is updated for data and metadata
modifications, some client implementors may be tempted to use the modifications, some client implementors may be tempted to use the
time_modify attribute and not change to validate cached data, so time_modify attribute and not the change attribute to validate
that metadata changes do not spuriously invalidate clean data. cached data, so that metadata changes do not spuriously invalidate
The implementor is cautioned in this approach. The change clean data. The implementor is cautioned in this approach. The
attribute is guaranteed to change for each update to the file, change attribute is guaranteed to change for each update to the
whereas time_modify is guaranteed to change only at the file, whereas time_modify is guaranteed to change only at the
granularity of the time_delta attribute. Use by the client's data granularity of the time_delta attribute. Use by the client's data
cache validation logic of time_modify and not change runs the risk cache validation logic of time_modify and not change runs the risk
of the client incorrectly marking stale data as valid. of the client incorrectly marking stale data as valid.
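The risk described in this rule can be shown with a toy model. The sketch below is illustrative only: the names are hypothetical, the change attribute is modelled as a simple counter (the text only requires that it change on every update), and time_modify is assumed to be rounded down to a multiple of time_delta.

```python
from dataclasses import dataclass


@dataclass
class FileAttrs:
    change: int       # differs after every update to the file
    time_modify: int  # timestamp, quantised to the time_delta granularity


def apply_write(attrs, now, time_delta):
    """Model a server-side update of the file at time `now`."""
    return FileAttrs(
        change=attrs.change + 1,
        time_modify=(now // time_delta) * time_delta,
    )


def cache_valid_by_change(cached, current):
    return cached.change == current.change


def cache_valid_by_time_modify(cached, current):
    return cached.time_modify == current.time_modify


TIME_DELTA = 1000  # hypothetical granularity, e.g. milliseconds

v1 = apply_write(FileAttrs(change=0, time_modify=0), now=5200, time_delta=TIME_DELTA)
cached = v1  # the client caches data and attributes at this point
v2 = apply_write(v1, now=5700, time_delta=TIME_DELTA)  # second write, same tick

stale_but_accepted = cache_valid_by_time_modify(cached, v2)
correctly_rejected = not cache_valid_by_change(cached, v2)
```

Both writes fall within one time_delta interval, so a comparison of time_modify accepts the stale cache while a comparison of change rejects it.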
o Second, modified data must be flushed to the server before closing o Second, modified data must be flushed to the server before closing
a file OPENed for write. This is complementary to the first rule. a file OPENed for write. This is complementary to the first rule.
If the data is not flushed at CLOSE, the revalidation done after If the data is not flushed at CLOSE, the revalidation done after
client OPENs a file is unable to achieve its purpose. The other client OPENs a file is unable to achieve its purpose. The other
aspect to flushing the data before close is that the data must be aspect to flushing the data before close is that the data must be
committed to stable storage, at the server, before the CLOSE committed to stable storage, at the server, before the CLOSE
skipping to change at page 191, line 50 skipping to change at page 192, line 20
the delegation are subject to change. In particular, if the server the delegation are subject to change. In particular, if the server
receives a conflicting OPEN from another client, the server must receives a conflicting OPEN from another client, the server must
recall the delegation before deciding whether the OPEN from the other recall the delegation before deciding whether the OPEN from the other
client may be granted. Making a delegation is up to the server and client may be granted. Making a delegation is up to the server and
clients should not assume that any particular OPEN either will or clients should not assume that any particular OPEN either will or
will not result in an open delegation. The following is a typical will not result in an open delegation. The following is a typical
set of conditions that servers might use in deciding whether OPEN set of conditions that servers might use in deciding whether OPEN
should be delegated: should be delegated:
o The client must be able to respond to the server's callback o The client must be able to respond to the server's callback
requests. The server will use the CB_NULL procedure for a test of requests. If a backchannel has been established, the server will
callback ability. send a CB_COMPOUND request, containing a single operation,
CB_SEQUENCE, for a test of backchannel availability.
o The client must have responded properly to previous recalls. o The client must have responded properly to previous recalls.
o There must be no current open conflicting with the requested o There must be no current open conflicting with the requested
delegation. delegation.
o There should be no current delegation that conflicts with the o There should be no current delegation that conflicts with the
delegation being requested. delegation being requested.
o The probability of future conflicting open requests should be low o The probability of future conflicting open requests should be low
skipping to change at page 192, line 37 skipping to change at page 193, line 7
delegations. delegations.
When a client has a read open delegation, it is assured that neither When a client has a read open delegation, it is assured that neither
the contents, the attributes (with the exception of time_access), nor the contents, the attributes (with the exception of time_access), nor
the names of any links to the file will change without its knowledge, the names of any links to the file will change without its knowledge,
so long as the delegation is held. When a client has a write open so long as the delegation is held. When a client has a write open
delegation, it may modify the file data locally since no other client delegation, it may modify the file data locally since no other client
will be accessing the file's data. The client holding a write will be accessing the file's data. The client holding a write
delegation may only locally affect file attributes which are delegation may only locally affect file attributes which are
intimately connected with the file data: size, change, time_access, intimately connected with the file data: size, change, time_access,
time_metadata, and time_modify. to other attributes must be reflected time_metadata, and time_modify. All other attributes must be
on the server. reflected on the server.
When a client has an open delegation, it does not send OPENs or When a client has an open delegation, it does not need to send OPENs
CLOSEs to the server but updates the appropriate status internally. or CLOSEs to the server. Instead the client may update the
For a read open delegation, opens that cannot be handled locally appropriate status internally. For a read open delegation, opens
(opens for write or that deny read access) must be sent to the that cannot be handled locally (opens for write or that deny read
server. access) must be sent to the server.
When an open delegation is made, the response to the OPEN contains an When an open delegation is made, the reply to the OPEN contains an
open delegation structure which specifies the following: open delegation structure which specifies the following:
o the type of delegation (read or write) o the type of delegation (read or write).
o space limitation information to control flushing of data on close o space limitation information to control flushing of data on close
(write open delegation only, see Section 10.4.1. (write open delegation only, see Section 10.4.1).
o an nfsace4 specifying read and write permissions o an nfsace4 specifying read and write permissions.
o a stateid to represent the delegation for READ and WRITE o a stateid to represent the delegation for READ and WRITE.
The delegation stateid is separate and distinct from the stateid for The delegation stateid is separate and distinct from the stateid for
the OPEN proper. The standard stateid, unlike the delegation the OPEN proper. The standard stateid, unlike the delegation
stateid, is associated with a particular lock-owner and will continue stateid, is associated with a particular lock-owner and will continue
to be valid after the delegation is recalled and the file remains to be valid after the delegation is recalled and the file remains
open. open.
When a request internal to the client is made to open a file and open When a request internal to the client is made to open a file and an
delegation is in effect, it will be accepted or rejected solely on open delegation is in effect, it will be accepted or rejected solely
the basis of the following conditions. Any requirement for other on the basis of the following conditions. Any requirement for other
checks to be made by the delegate should result in open delegation checks to be made by the delegate should result in open delegation
being denied so that the checks can be made by the server itself. being denied so that the checks can be made by the server itself.
o The access and deny bits for the request and the file as described o The access and deny bits for the request and the file as described
in Section 9.7. in Section 9.7.
o The read and write permissions as determined below. o The read and write permissions as determined below.
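A client-side check along these lines could look like the sketch below. It is an illustration under assumptions rather than the specified procedure: the function names are hypothetical, the conflict test follows the usual share reservation rule (a requested access bit must not be denied by an existing open, and a requested deny bit must not cover an existing open's access), and the result of the permission check is passed in as two booleans. The first function reflects the earlier statement that, under a read open delegation, opens for write or opens that deny read access must go to the server.

```python
ACCESS_READ, ACCESS_WRITE = 0x1, 0x2
DENY_NONE, DENY_READ, DENY_WRITE = 0x0, 0x1, 0x2


def must_send_to_server(delegation_type, access, deny):
    """True if the open cannot be handled locally under this delegation."""
    if delegation_type == "read":
        return bool(access & ACCESS_WRITE) or bool(deny & DENY_READ)
    return False  # write open delegation: opens are handled locally


def local_open_allowed(existing_opens, access, deny, may_read, may_write):
    """Accept or reject an open internal to the client.

    existing_opens: list of (access, deny) pairs for opens already held
    may_read / may_write: outcome of the permission check (inputs here)
    """
    for cur_access, cur_deny in existing_opens:
        if (access & cur_deny) or (deny & cur_access):
            return False  # share reservation conflict
    if (access & ACCESS_READ) and not may_read:
        return False
    if (access & ACCESS_WRITE) and not may_write:
        return False
    return True


# A second reader is accepted locally alongside an existing reader.
ok = local_open_allowed([(ACCESS_READ, DENY_NONE)], ACCESS_READ, DENY_NONE,
                        may_read=True, may_write=False)
```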
The nfsace4 passed with delegation can be used to avoid frequent The nfsace4 passed with delegation can be used to avoid frequent
ACCESS calls. The permission check should be as follows: ACCESS calls. The permission check should be as follows:
skipping to change at page 193, line 43 skipping to change at page 194, line 16
ACCESS request must be sent to the server to obtain the definitive ACCESS request must be sent to the server to obtain the definitive
answer. answer.
The server may return an nfsace4 that is more restrictive than the The server may return an nfsace4 that is more restrictive than the
actual ACL of the file. This includes an nfsace4 that specifies actual ACL of the file. This includes an nfsace4 that specifies
denial of all access. Note that some common practices such as denial of all access. Note that some common practices such as
mapping the traditional user "root" to the user "nobody" may make it mapping the traditional user "root" to the user "nobody" may make it
incorrect to return the actual ACL of the file in the delegation incorrect to return the actual ACL of the file in the delegation
response. response.
The use of delegation together with various other forms of caching The use of a delegation together with various other forms of caching
creates the possibility that no server authentication will ever be creates the possibility that no server authentication and
performed for a given user since all of the user's requests might be authorization will ever be performed for a given user since all of
satisfied locally. Where the client is depending on the server for the user's requests might be satisfied locally. Where the client is
authentication, the client should be sure authentication occurs for depending on the server for authentication and authorization, the
client should be sure authentication and authorization occurs for
each user by use of the ACCESS operation. This should be the case each user by use of the ACCESS operation. This should be the case
even if an ACCESS operation would not be required otherwise. As even if an ACCESS operation would not be required otherwise. As
mentioned before, the server may enforce frequent authentication by mentioned before, the server may enforce frequent authentication by
returning an nfsace4 denying all access with every open delegation. returning an nfsace4 denying all access with every open delegation.
10.4.1. Open Delegation and Data Caching 10.4.1. Open Delegation and Data Caching
OPEN delegation allows much of the message overhead associated with An OPEN delegation allows much of the message overhead associated
the opening and closing files to be eliminated. An open when an open with the opening and closing files to be eliminated. An open when an
delegation is in effect does not require that a validation message be open delegation is in effect does not require that a validation
sent to the server. The continued endurance of the "read open message be sent to the server. The continued endurance of the "read
delegation" provides a guarantee that no OPEN for write and thus no open delegation" provides a guarantee that no OPEN for write and thus
write has occurred. Similarly, when closing a file opened for write no write has occurred. Similarly, when closing a file opened for
and if write open delegation is in effect, the data written does not write and if write open delegation is in effect, the data written
have to be flushed to the server until the open delegation is does not have to be written to the server until the open delegation
recalled. The continued endurance of the open delegation provides a is recalled. The continued endurance of the open delegation provides
guarantee that no open and thus no read or write has been done by a guarantee that no open and thus no read or write has been done by
another client. another client.
For the purposes of open delegation, READs and WRITEs done without an For the purposes of open delegation, READs and WRITEs done without an
OPEN are treated as the functional equivalents of a corresponding OPEN are treated as the functional equivalents of a corresponding
type of OPEN. Although a client SHOULD NOT use special stateids when type of OPEN. Although a client SHOULD NOT use special stateids when
an open exists, delegation handling on the server can use the an open exists, delegation handling on the server can use the client
clientid associated with the current session to determine if the ID associated with the current session to determine if the operation
operation has been done by the holder of the delegation, in which has been done by the holder of the delegation, in which case, no
case, no recall is necessary, or by another client, in which case the recall is necessary, or by another client, in which case the
delegation must be recalled and I/O must not proceed until the delegation delegation must be recalled and I/O must not proceed until the delegation
is recalled or revoked. is recalled or revoked.
With delegations, a client is able to avoid writing data to the With delegations, a client is able to avoid writing data to the
server when the CLOSE of a file is serviced. The file close system server when the CLOSE of a file is serviced. The file close system
call is the usual point at which the client is notified of a lack of call is the usual point at which the client is notified of a lack of
stable storage for the modified file data generated by the stable storage for the modified file data generated by the
application. At the close, file data is written to the server and application. At the close, file data is written to the server and
through normal accounting the server is able to determine if the through normal accounting the server is able to determine if the
available file system space for the data has been exceeded (i.e. available file system space for the data has been exceeded (i.e.
server returns NFS4ERR_NOSPC or NFS4ERR_DQUOT). This accounting server returns NFS4ERR_NOSPC or NFS4ERR_DQUOT). This accounting
includes quotas. The introduction of delegations requires that an includes quotas. The introduction of delegations requires that an
alternative method be in place for the same type of communication to alternative method be in place for the same type of communication to
occur between client and server. occur between client and server.
In the delegation response, the server provides either the limit of In the delegation response, the server provides either the limit of
the size of the file or the number of modified blocks and associated the size of the file or the number of modified blocks and associated
block size. The server must ensure that the client will be able to block size. The server must ensure that the client will be able to
flush data to the server of a size equal to that provided in the write modified data to the server of a size equal to that provided in
original delegation. The server must make this assurance for all the original delegation. The server must make this assurance for all
outstanding delegations. Therefore, the server must be careful in outstanding delegations. Therefore, the server must be careful in
its management of available space for new or modified data taking its management of available space for new or modified data taking
into account available file system space and any applicable quotas. into account available file system space and any applicable quotas.
The server can recall delegations as a result of managing the The server can recall delegations as a result of managing the
available file system space. The client should abide by the server's available file system space. The client should abide by the server's
state space limits for delegations. If the client exceeds the stated state space limits for delegations. If the client exceeds the stated
limits for the delegation, the server's behavior is undefined. limits for the delegation, the server's behavior is undefined.
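A client might track the two forms of limit roughly as follows. This is a sketch with hypothetical names: the record holds either a file-size limit or a count of modified blocks with a block size, and the client checks its cached modifications against whichever form the server supplied before accepting more dirty data locally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpaceLimit:
    """Space limitation from the delegation response (field names assumed)."""
    filesize: Optional[int] = None         # limit on the size of the file
    num_blocks: Optional[int] = None       # limit on modified blocks ...
    bytes_per_block: Optional[int] = None  # ... and the associated block size


def modified_blocks(dirty_ranges, block_size):
    """Count distinct blocks touched by (offset, length) dirty ranges."""
    blocks = set()
    for offset, length in dirty_ranges:
        if length > 0:
            first = offset // block_size
            last = (offset + length - 1) // block_size
            blocks.update(range(first, last + 1))
    return len(blocks)


def within_limit(limit, file_size, dirty_ranges):
    """True if the client's cached modifications respect the stated limit."""
    if limit.filesize is not None:
        return file_size <= limit.filesize
    used = modified_blocks(dirty_ranges, limit.bytes_per_block)
    return used <= limit.num_blocks


limit = SpaceLimit(num_blocks=2, bytes_per_block=4096)
# One byte in block 0 plus two bytes straddling blocks 0 and 1.
fits = within_limit(limit, file_size=8192, dirty_ranges=[(0, 1), (4095, 2)])
```

When the check fails, a client following this sketch would write modified data to the server rather than continue caching, since the server's behavior is undefined once the stated limits are exceeded.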
Based on server conditions, quotas or available file system space, Based on server conditions, quotas or available file system space,
the server may grant write open delegations with very restrictive the server may grant write open delegations with very restrictive
skipping to change at page 197, line 44 skipping to change at page 198, line 17
As discussed earlier in this section, the client MAY return the same As discussed earlier in this section, the client MAY return the same
cc value on subsequent CB_GETATTR calls, even if the file was cc value on subsequent CB_GETATTR calls, even if the file was
modified in the client's cache yet again between successive modified in the client's cache yet again between successive
CB_GETATTR calls. Therefore, the server must assume that the file CB_GETATTR calls. Therefore, the server must assume that the file
has been modified yet again, and MUST take care to ensure that the has been modified yet again, and MUST take care to ensure that the
new nsc it constructs and returns is greater than the previous nsc it new nsc it constructs and returns is greater than the previous nsc it
returned. An example implementation's delegation record would returned. An example implementation's delegation record would
satisfy this mandate by including a boolean field (let us call it satisfy this mandate by including a boolean field (let us call it
"modified") that is set to FALSE when the delegation is granted, and "modified") that is set to FALSE when the delegation is granted, and
an sc value set at the time of grant to the change attribute value. an sc value set at the time of grant to the change attribute value.
The modified field would be set to true the first time cc != sc, and The modified field would be set to TRUE the first time cc != sc, and
would stay true until the delegation is returned or revoked. The would stay TRUE until the delegation is returned or revoked. The
processing for constructing nsc, time_modify, and time_metadata would processing for constructing nsc, time_modify, and time_metadata would
use this pseudo code: use this pseudo code:
if (!modified) { if (!modified) {
do CB_GETATTR for change and size; do CB_GETATTR for change and size;
if (cc != sc) if (cc != sc)
modified = TRUE; modified = TRUE;
} else { } else {
do CB_GETATTR for size; do CB_GETATTR for size;
skipping to change at page 199, line 4 skipping to change at page 199, line 22
o Potentially conflicting OPEN request (or READ/WRITE done with o Potentially conflicting OPEN request (or READ/WRITE done with
"special" stateid) "special" stateid)
o SETATTR sent by another client o SETATTR sent by another client
o REMOVE request for the file o REMOVE request for the file
o RENAME request for the file as either source or target of the o RENAME request for the file as either source or target of the
RENAME RENAME
Whether a RENAME of a directory in the path leading to the file Whether a RENAME of a directory in the path leading to the file
results in recall of an open delegation depends on the semantics of results in recall of an open delegation depends on the semantics of
the server file system. If that file system denies such RENAMEs when the server's file system. If that file system denies such RENAMEs
a file is open, the recall must be performed to determine whether the when a file is open, the recall must be performed to determine
file in question is, in fact, open. whether the file in question is, in fact, open.
In addition to the situations above, the server may choose to recall In addition to the situations above, the server may choose to recall
open delegations at any time if resource constraints make it open delegations at any time if resource constraints make it
advisable to do so. Clients should always be prepared for the advisable to do so. Clients should always be prepared for the
possibility of recall. possibility of recall.
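The recall conditions listed above could be checked with a small predicate such as the Python sketch below. The function name, its parameters, and the operation labels for special-stateid I/O are hypothetical. The directory RENAME case is left out because the text says it depends on the semantics of the server's file system.

```python
# Hypothetical labels for the operations named in the list above.
OPEN_LIKE = {"OPEN", "READ_SPECIAL_STATEID", "WRITE_SPECIAL_STATEID"}
ALWAYS = {"REMOVE", "RENAME"}  # RENAME of the file as source or target


def triggers_recall(op, from_holder=False, conflicting=True,
                    resource_pressure=False):
    """Return True if the request should cause an open delegation recall.

    op                -- operation name targeting the delegated file
    from_holder       -- True if sent by the client holding the delegation
    conflicting       -- for OPEN-like requests, whether the access
                         potentially conflicts with the delegation
    resource_pressure -- the server may recall at any time for this reason
    """
    if resource_pressure:
        return True
    if op in OPEN_LIKE:
        return conflicting
    if op == "SETATTR":
        return not from_holder   # only a SETATTR sent by another client
    return op in ALWAYS
```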
When a client receives a recall for an open delegation, it needs to When a client receives a recall for an open delegation, it needs to
update state on the server before returning the delegation. These update state on the server before returning the delegation. These
same updates must be done whenever a client chooses to return a same updates must be done whenever a client chooses to return a
delegation voluntarily. The following items of state need to be delegation voluntarily. The following items of state need to be
skipping to change at page 200, line 48 skipping to change at page 201, line 20
awareness could result in the client finding out long after the awareness could result in the client finding out long after the
failure that its delegation has been revoked, and another client has failure that its delegation has been revoked, and another client has
modified the data for which the client had a delegation. This is modified the data for which the client had a delegation. This is
especially a problem for the client that held a write delegation. especially a problem for the client that held a write delegation.
Status bits returned by SEQUENCE operations help to provide an Status bits returned by SEQUENCE operations help to provide an
alternate way of informing the client of issues regarding the status alternate way of informing the client of issues regarding the status
of the backchannel and of recalled delegations. When the backchannel of the backchannel and of recalled delegations. When the backchannel
is not available, the server returns the status bit is not available, the server returns the status bit
SEQ4_STATUS_CB_PATH_DOWN on SEQUENCE operations. The client can SEQ4_STATUS_CB_PATH_DOWN on SEQUENCE operations. The client can
respond by attempting to re-establish the backchannel and by react by attempting to re-establish the backchannel and by returning
returning recallable objects if a backchannel cannot be successfully recallable objects if a backchannel cannot be successfully re-
re-established. established.
Whether the backchannel is functioning or not, it may be that the Whether the backchannel is functioning or not, it may be that the
recalled delegation is not returned. Note that the client's lease recalled delegation is not returned. Note that the client's lease
might still be renewed, even though the recalled delegation is not might still be renewed, even though the recalled delegation is not
returned. In this situation, servers SHOULD revoke delegations that returned. In this situation, servers SHOULD revoke delegations that
are not returned in a period of time equal to the lease period. This are not returned in a period of time equal to the lease period. This
period of time should allow the client time to note the backchannel- period of time should allow the client time to note the backchannel-
down status and re-establish the backchannel. down status and re-establish the backchannel.
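The client reaction to SEQ4_STATUS_CB_PATH_DOWN and the server's revocation timing might be sketched in Python as below. The function names and the two callables are hypothetical, and the numeric value used for the status bit is an assumption that should be checked against the protocol's XDR definition.

```python
# Assumed bit value; verify against the XDR description before relying on it.
SEQ4_STATUS_CB_PATH_DOWN = 0x00000001


def react_to_sequence_status(status_flags, reestablish_backchannel,
                             return_recallable_objects):
    """Client side: react to the status bits of a SEQUENCE reply.

    reestablish_backchannel   -- callable returning True on success
    return_recallable_objects -- callable that returns delegations and
                                 other recallable objects to the server
    """
    if not status_flags & SEQ4_STATUS_CB_PATH_DOWN:
        return "backchannel-ok"
    if reestablish_backchannel():
        return "backchannel-reestablished"
    # The backchannel could not be restored, so give back recallable state.
    return_recallable_objects()
    return "objects-returned"


def should_revoke(recall_sent_at, now, lease_period):
    """Server side: revoke a recalled delegation that has not been
    returned within a period equal to the lease period."""
    return now - recall_sent_at >= lease_period
```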
When delegations are revoked, the server will return with the When delegations are revoked, the server will return with the
skipping to change at page 201, line 49 skipping to change at page 202, line 20
If no opens exist for the file at the point the delegation is If no opens exist for the file at the point the delegation is
revoked, then notification of the revocation is unnecessary. revoked, then notification of the revocation is unnecessary.
However, if there is modified data present at the client for the However, if there is modified data present at the client for the
file, the user of the application should be notified. Unfortunately, file, the user of the application should be notified. Unfortunately,
it may not be possible to notify the user since active applications it may not be possible to notify the user since active applications
may not be present at the client. See Section 10.5.1 for additional may not be present at the client. See Section 10.5.1 for additional
details. details.
10.4.7. Delegations via WANT_DELEGATION 10.4.7. Delegations via WANT_DELEGATION
In addition to providing delegations as part of the response to OPEN In addition to providing delegations as part of the reply to OPEN
operations, servers MAY provide delegations separate from open, via operations, servers MAY provide delegations separate from open, via
the OPTIONAL WANT_DELEGATION operation. This allows delegations to the OPTIONAL WANT_DELEGATION operation. This allows delegations to
be obtained in advance of an OPEN that might benefit from them, for be obtained in advance of an OPEN that might benefit from them, for
objects which are not a valid target of OPEN, or to deal with cases objects which are not a valid target of OPEN, or to deal with cases
in which a delegation has been recalled and the client wants to make in which a delegation has been recalled and the client wants to make
an attempt to re-establish it if the absence of use by other clients an attempt to re-establish it if the absence of use by other clients
allows that. allows that.
The WANT_DELEGATION operation may be performed on any type of file The WANT_DELEGATION operation may be performed on any type of file
object other than a directory. object other than a directory.
skipping to change at page 203, line 48 skipping to change at page 204, line 22
Saving of such modified data in delegation revocation situations may Saving of such modified data in delegation revocation situations may
be limited to files of a certain size or might be used only when be limited to files of a certain size or might be used only when
sufficient disk space is available within the target file system. sufficient disk space is available within the target file system.
Such saving may also be restricted to situations when the client has Such saving may also be restricted to situations when the client has
sufficient buffering resources to keep the cached copy available sufficient buffering resources to keep the cached copy available
until it is properly stored to the target file system. until it is properly stored to the target file system.
10.6. Attribute Caching 10.6. Attribute Caching
This section pertains to the caching of a file's attributes on a
client when that client does not hold a delegation on the file.
The attributes discussed in this section do not include named The attributes discussed in this section do not include named
attributes. Individual named attributes are analogous to files and attributes. Individual named attributes are analogous to files and
caching of the data for these needs to be handled just as data caching of the data for these needs to be handled just as data
caching is for ordinary files. Similarly, LOOKUP results from an caching is for ordinary files. Similarly, LOOKUP results from an
OPENATTR directory are to be cached on the same basis as any other OPENATTR directory are to be cached on the same basis as any other
pathnames and similarly for directory contents. pathnames and similarly for directory contents.
Clients may cache file attributes obtained from the server and use Clients may cache file attributes obtained from the server and use
them to avoid subsequent GETATTR requests. Such caching is write them to avoid subsequent GETATTR requests. Such caching is write
through in that modification to file attributes is always done by through in that modification to file attributes is always done by
means of requests to the server and should not be done locally and means of requests to the server and should not be done locally and
cached. The exception to this are modifications to attributes that cached. The exception to this are modifications to attributes that
are intimately connected with data caching. Therefore, extending a are intimately connected with data caching. Therefore, extending a
file by writing data to the local data cache is reflected immediately file by writing data to the local data cache is reflected immediately
in the size as seen on the client without this change being in the size as seen on the client without this change being
immediately reflected on the server. Normally such changes are not immediately reflected on the server. Normally such changes are not
propagated directly to the server but when the modified data is propagated directly to the server but when the modified data is
flushed to the server, analogous attribute changes are made on the flushed to the server, analogous attribute changes are made on the
server. When open delegation is in effect, the modified attributes server. When open delegation is in effect, the modified attributes
may be returned to the server in the response to a CB_RECALL call. may be returned to the server in reaction to a CB_RECALL call.
The result of local caching of attributes is that the attribute The result of local caching of attributes is that the attribute
caches maintained on individual clients will not be coherent. caches maintained on individual clients will not be coherent.
Changes made in one order on the server may be seen in a different Changes made in one order on the server may be seen in a different
order on one client and in a third order on a different client. order on one client and in a third order on a different client.
The typical file system application programming interfaces do not The typical file system application programming interfaces do not
provide means to atomically modify or interrogate attributes for provide means to atomically modify or interrogate attributes for
multiple files at the same time. The following rules provide an multiple files at the same time. The following rules provide an
environment where the potential incoherences mentioned above can be environment where the potential incoherences mentioned above can be
skipping to change at page 206, line 37 skipping to change at page 207, line 16
instead is just being read by an application via the memory mapped instead is just being read by an application via the memory mapped
interface, the client will not see an updated time_access interface, the client will not see an updated time_access
attribute. However, in many operating environments, neither will attribute. However, in many operating environments, neither will
any process running on the server. Thus NFS clients are at no any process running on the server. Thus NFS clients are at no
disadvantage with respect to local processes. disadvantage with respect to local processes.
o If there is another client that is memory mapping the file, and if o If there is another client that is memory mapping the file, and if
that client is holding a write delegation, the same set of issues that client is holding a write delegation, the same set of issues
as discussed in the previous two bullet items apply. So, when a as discussed in the previous two bullet items apply. So, when a
server does a CB_GETATTR to a file that the client has modified in server does a CB_GETATTR to a file that the client has modified in
its cache, the response from CB_GETATTR will not necessarily be its cache, the reply from CB_GETATTR will not necessarily be
accurate. As discussed earlier, the client's obligation is to accurate. As discussed earlier, the client's obligation is to
report that the file has been modified since the delegation was report that the file has been modified since the delegation was
granted, not whether it has been modified again between successive granted, not whether it has been modified again between successive
CB_GETATTR calls, and the server MUST assume that any file the CB_GETATTR calls, and the server MUST assume that any file the
client has modified in cache has been modified again between client has modified in cache has been modified again between
successive CB_GETATTR calls. Depending on the nature of the successive CB_GETATTR calls. Depending on the nature of the
client's memory management system, this weak obligation may not be client's memory management system, this weak obligation may not be
possible. A client MAY return stale information in CB_GETATTR possible. A client MAY return stale information in CB_GETATTR
whenever the file is memory mapped. whenever the file is memory mapped.
skipping to change at page 208, line 7 skipping to change at page 208, line 32
o Clients and servers MAY deny a record lock on a file they know is o Clients and servers MAY deny a record lock on a file they know is
memory mapped. memory mapped.
o A client MAY deny memory mapping a file that it knows requires o A client MAY deny memory mapping a file that it knows requires
mandatory locking for I/O. If mandatory locking is enabled after mandatory locking for I/O. If mandatory locking is enabled after
the file is opened and mapped, the client MAY deny the application the file is opened and mapped, the client MAY deny the application
further access to its mapped file. further access to its mapped file.
10.8. Name and Directory Caching without Directory Delegations 10.8. Name and Directory Caching without Directory Delegations
Although NFSv4.1 defines a directory delegation facility, (described The NFSv4.1 directory delegation facility (described in Section 10.9
in Section 10.9 below), servers are allowed not to implement that below) is OPTIONAL for servers to implement. Even where it is
facility and even where it is implemented, it may not always be implemented, it may not always be functional because of resource
functional, because of resource availability issues or other availability issues or other constraints. Thus, it is important to
constraints. Because of that, it is important to understand how name understand how name and directory caching are done in the absence of
and directory caching are done in the absence of directory directory delegations. Those topics are discussed next in
delegations. Those topics are discussed next in
Section 10.8.1. Section 10.8.1.
10.8.1. Name Caching 10.8.1. Name Caching
The results of LOOKUP and READDIR operations may be cached to avoid The results of LOOKUP and READDIR operations may be cached to avoid
the cost of subsequent LOOKUP operations. Just as in the case of the cost of subsequent LOOKUP operations. Just as in the case of
attribute caching, inconsistencies may arise among the various client attribute caching, inconsistencies may arise among the various client
caches. To mitigate the effects of these inconsistencies and given caches. To mitigate the effects of these inconsistencies and given
the context of typical file system APIs, an upper time boundary is the context of typical file system APIs, an upper time boundary is
maintained on how long a client name cache entry can be kept without maintained on how long a client name cache entry can be kept without
skipping to change at page 210, line 50 skipping to change at page 211, line 27
Directory caching for the NFSv4.1 protocol, as previously described, Directory caching for the NFSv4.1 protocol, as previously described,
is similar to file caching in previous versions. Clients typically is similar to file caching in previous versions. Clients typically
cache directory information for a duration determined by the client. cache directory information for a duration determined by the client.
At the end of a predefined timeout, the client will query the server At the end of a predefined timeout, the client will query the server
to see if the directory has been updated. By caching attributes, to see if the directory has been updated. By caching attributes,
clients reduce the number of GETATTR calls made to the server to clients reduce the number of GETATTR calls made to the server to
validate attributes. Furthermore, frequently accessed files and validate attributes. Furthermore, frequently accessed files and
directories, such as the current working directory, have their directories, such as the current working directory, have their
attributes cached on the client so that some NFS operations can be attributes cached on the client so that some NFS operations can be
performed without having to make an RPC call. By caching name and performed without having to make an RPC call. By caching name and
inode information about most recently looked up entries in the inode information about most recently looked up entries in a
Directory Name Lookup Cache (DNLC), clients do not need to send Directory Name Lookup Cache (DNLC), clients do not need to send
LOOKUP calls to the server every time these files are accessed. LOOKUP calls to the server every time these files are accessed.
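A timeout-based name cache of the kind described above could look like the following Python sketch. The class name, the lookup_rpc callable standing in for a LOOKUP request, the 30-second default, and the injectable clock are all assumptions made for illustration. Names that do not exist are deliberately not cached, so every miss still reaches the server.

```python
import time


class NameCache:
    """Hypothetical client-side cache of (directory, name) -> handle."""

    def __init__(self, lookup_rpc, timeout=30.0, clock=time.monotonic):
        self._lookup_rpc = lookup_rpc  # returns a handle, or None if absent
        self._timeout = timeout        # client-chosen lifetime in seconds
        self._clock = clock
        self._entries = {}             # key -> (handle, time cached)

    def lookup(self, directory, name):
        key = (directory, name)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._timeout:
            return entry[0]            # fresh hit: no request is sent
        handle = self._lookup_rpc(directory, name)
        if handle is not None:
            self._entries[key] = (handle, now)
        else:
            self._entries.pop(key, None)  # misses are not remembered
        return handle
```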
This caching approach works reasonably well at reducing network This caching approach works reasonably well at reducing network
traffic in many environments. However, it does not address traffic in many environments. However, it does not address
environments where there are numerous queries for files that do not environments where there are numerous queries for files that do not
exist. In these cases of "misses", the client must make RPC calls to exist. In these cases of "misses", the client sends requests to the
the server in order to provide reasonable application semantics and server in order to provide reasonable application semantics and
promptly detect the creation of new directory entries. Examples of promptly detect the creation of new directory entries. Examples of
high miss activity are compilation in software development high miss activity are compilation in software development
environments. The current behavior of NFS limits its potential environments. The current behavior of NFS limits its potential
scalability and wide-area sharing effectiveness in these types of scalability and wide-area sharing effectiveness in these types of
environments. Other distributed stateful file system architectures environments. Other distributed stateful file system architectures
such as AFS and DFS have proven that adding state around directory such as AFS and DFS have proven that adding state around directory
contents can greatly reduce network traffic in high-miss contents can greatly reduce network traffic in high-miss
environments. environments.
Delegation of directory contents is a RECOMMENDED feature of NFSv4.1. Delegation of directory contents is an OPTIONAL feature of NFSv4.1.
Directory delegations provide similar traffic reduction benefits as Directory delegations provide similar traffic reduction benefits as
with file delegations. By allowing clients to cache directory with file delegations. By allowing clients to cache directory
contents (in a read-only fashion) while being notified of changes, contents (in a read-only fashion) while being notified of changes,
the client can avoid making frequent requests to interrogate the the client can avoid making frequent requests to interrogate the
contents of slowly-changing directories, reducing network traffic and contents of slowly-changing directories, reducing network traffic and
improving client performance. It can also simplify the task of improving client performance. It can also simplify the task of
determining whether other clients are making changes to the directory determining whether other clients are making changes to the directory
when the client itself is making many changes to the directory and when the client itself is making many changes to the directory and
changes are not serialized. changes are not serialized.
skipping to change at page 211, line 51 skipping to change at page 212, line 28
NFSv4.1 introduces the GET_DIR_DELEGATION (Section 18.39) operation NFSv4.1 introduces the GET_DIR_DELEGATION (Section 18.39) operation
to allow the client to ask for a directory delegation. The to allow the client to ask for a directory delegation. The
delegation covers directory attributes and all entries in the delegation covers directory attributes and all entries in the
directory. If either of these change, the delegation will be directory. If either of these change, the delegation will be
recalled synchronously. The operation causing the recall will have recalled synchronously. The operation causing the recall will have
to wait until the recall is complete. Any changes to directory to wait until the recall is complete. Any changes to directory
entry attributes will not cause the delegation to be recalled. entry attributes will not cause the delegation to be recalled.
In addition to asking for delegations, a client can also ask for In addition to asking for delegations, a client can also ask for
notifications for certain events. These events include changes to notifications for certain events. These events include changes to
directory attributes and/or its contents. If a client asks for the directory's attributes and/or its contents. If a client asks for
notification for a certain event, the server will notify the client notification for a certain event, the server will notify the client
when that event occurs. This will not result in the delegation being when that event occurs. This will not result in the delegation being
recalled for that client. The notifications are asynchronous and recalled for that client. The notifications are asynchronous and
provide a way of avoiding recalls in situations where a directory is provide a way of avoiding recalls in situations where a directory is
changing enough that the pure recall model may not be effective while changing enough that the pure recall model may not be effective while
trying to allow the client to get substantial benefit. In the trying to allow the client to get substantial benefit. In the
absence of notifications, once the delegation is recalled the client absence of notifications, once the delegation is recalled the client
has to refresh its directory cache which might not be very efficient has to refresh its directory cache which might not be very efficient
for very large directories. for very large directories.
The delegation is read-only and the client may not make changes to The delegation is read-only and the client may not make changes to
the directory other than by performing NFSv4.1 operations that modify the directory other than by performing NFSv4.1 operations that modify
the directory or the associated file attributes so that the server the directory or the associated file attributes so that the server
has knowledge of these changes. In order to keep the client has knowledge of these changes. In order to keep the client
namespace synchronized with the server, the server will, if the namespace synchronized with the server, the server will, if the
client has requested notifications, notify the client holding the client has requested notifications, notify the client holding the
delegation of the changes made as a result. This is to avoid any delegation of the changes made as a result. This is to avoid any
need for subsequent GETATTR or READDIR calls to the server. If a need for subsequent GETATTR or READDIR calls to the server. If a
single client is holding the delegation and that client makes any single client is holding the delegation and that client makes any
changes to the directory (i.e. the changes are made via operations changes to the directory (i.e. the changes are made via operations
sent through a session associated with the clientid holding the sent through a session associated with the client ID holding the
delegation), the delegation will not be recalled. Multiple clients delegation), the delegation will not be recalled. Multiple clients
may hold a delegation on the same directory, but if any such client may hold a delegation on the same directory, but if any such client
modifies the directory, the server MUST recall the delegation from modifies the directory, the server MUST recall the delegation from
the other clients, unless those clients have made provisions to be the other clients, unless those clients have made provisions to be
notified of that sort of modification. notified of that sort of modification.
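The choice between notifying and recalling described above can be illustrated with a short Python sketch. The function name, the shape of the holders mapping, and the event label are hypothetical. Whether the modifying client itself also receives a notification is not modeled here.

```python
def on_directory_change(holders, modifier, event):
    """Decide what the server does when a directory is modified.

    holders  -- dict mapping client ID -> set of event names for which
                that delegation holder requested notifications
    modifier -- client ID whose operation changed the directory
    event    -- hypothetical label for the kind of change, e.g. "add_entry"

    Returns (to_notify, to_recall) as two lists of client IDs.
    """
    to_notify, to_recall = [], []
    for client_id, wanted in holders.items():
        if client_id == modifier:
            continue                      # the modifier keeps its delegation
        if event in wanted:
            to_notify.append(client_id)   # asynchronous notification
        else:
            to_recall.append(client_id)   # no provision made: recall
    return to_notify, to_recall
```

With a single holder that is also the modifier, both lists come back empty, matching the rule that its delegation is not recalled.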
Delegations can be recalled by the server at any time. Normally, the Delegations can be recalled by the server at any time. Normally, the
server will recall the delegation when the directory changes in a way server will recall the delegation when the directory changes in a way
that is not covered by the notification, or when the directory that is not covered by the notification, or when the directory
changes and notifications have not been requested. If another client changes and notifications have not been requested. If another client
skipping to change at page 213, line 31 skipping to change at page 214, line 9
o For OPEN, see Section 18.16.4. o For OPEN, see Section 18.16.4.
o For REMOVE, see Section 18.25.4. o For REMOVE, see Section 18.25.4.
o For RENAME, see Section 18.26.4. o For RENAME, see Section 18.26.4.
o For SETATTR, see Section 18.30.4. o For SETATTR, see Section 18.30.4.
10.9.5. Directory Delegation Recovery 10.9.5. Directory Delegation Recovery
Crash recovery for state on regular files has two main goals, Recovery from client or server restart for state on regular files has
avoiding the necessity of breaking application guarantees with two main goals, avoiding the necessity of breaking application
respect to locked files and delivery of updates cached at the client. guarantees with respect to locked files and delivery of updates
Neither of these applies to directories protected by read delegations cached at the client. Neither of these goals applies to directories
and notifications. Thus, no provision is made for reclaiming protected by read delegations and notifications. Thus, no provision
directory delegations in the event of client or server failure. The is made for reclaiming directory delegations in the event of client
client can simply establish a directory delegation in the same or server restart. The client can simply establish a directory
fashion as was done initially. delegation in the same fashion as was done initially.
11. Multi-Server Namespace 11. Multi-Server Namespace
NFSv4.1 supports attributes that allow a namespace to extend beyond NFSv4.1 supports attributes that allow a namespace to extend beyond
the boundaries of a single server. It is RECOMMENDED that clients the boundaries of a single server. It is RECOMMENDED that clients
and servers support construction of such multi-server namespaces. and servers support construction of such multi-server namespaces.
Use of such multi-server namespaces is OPTIONAL however, and for many Use of such multi-server namespaces is OPTIONAL however, and for many
purposes, single-server namespaces are perfectly acceptable. Use of purposes, single-server namespaces are perfectly acceptable. Use of
multi-server namespaces can provide many advantages, however, by multi-server namespaces can provide many advantages, however, by
separating a file system's logical position in a namespace from the separating a file system's logical position in a namespace from the
 End of changes. 112 change blocks. 
423 lines changed or deleted 432 lines changed or added

This html diff was produced by rfcdiff 1.33. The latest version is available from http://tools.ietf.org/tools/rfcdiff/