Diff: draft-pre-ch5.txt - draft-ietf-nfsv4-minorversion1-20.txt
 draft-pre-ch5.txt   draft-ietf-nfsv4-minorversion1-20.txt 
NFSv4 S. Shepler NFSv4 S. Shepler
Internet-Draft M. Eisler Internet-Draft M. Eisler
Intended status: Standards Track D. Noveck Intended status: Standards Track D. Noveck
Expires: August 24, 2008 Editors Expires: August 25, 2008 Editors
February 21, 2008 February 22, 2008
NFS Version 4 Minor Version 1 NFS Version 4 Minor Version 1
draft-ietf-nfsv4-minorversion1-20.txt draft-ietf-nfsv4-minorversion1-20.txt
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
skipping to change at page 1, line 35 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on August 24, 2008. This Internet-Draft will expire on August 25, 2008.
Copyright Notice Copyright Notice
Copyright (C) The IETF Trust (2008). Copyright (C) The IETF Trust (2008).
Abstract Abstract
This Internet-Draft describes NFS version 4 minor version one, This Internet-Draft describes NFS version 4 minor version one,
including features retained from the base protocol and protocol including features retained from the base protocol and protocol
extensions made subsequently. Major extensions introduced in NFS extensions made subsequently. Major extensions introduced in NFS
skipping to change at page 3, line 6 skipping to change at page 3, line 6
2.9. Transport Layers . . . . . . . . . . . . . . . . . . . . 37 2.9. Transport Layers . . . . . . . . . . . . . . . . . . . . 37
2.9.1. REQUIRED and RECOMMENDED Properties of Transports . 37 2.9.1. REQUIRED and RECOMMENDED Properties of Transports . 37
2.9.2. Client and Server Transport Behavior . . . . . . . . 37 2.9.2. Client and Server Transport Behavior . . . . . . . . 37
2.9.3. Ports . . . . . . . . . . . . . . . . . . . . . . . 39 2.9.3. Ports . . . . . . . . . . . . . . . . . . . . . . . 39
2.10. Session . . . . . . . . . . . . . . . . . . . . . . . . 39 2.10. Session . . . . . . . . . . . . . . . . . . . . . . . . 39
2.10.1. Motivation and Overview . . . . . . . . . . . . . . 39 2.10.1. Motivation and Overview . . . . . . . . . . . . . . 39
2.10.2. NFSv4 Integration . . . . . . . . . . . . . . . . . 40 2.10.2. NFSv4 Integration . . . . . . . . . . . . . . . . . 40
2.10.3. Channels . . . . . . . . . . . . . . . . . . . . . . 42 2.10.3. Channels . . . . . . . . . . . . . . . . . . . . . . 42
2.10.4. Trunking . . . . . . . . . . . . . . . . . . . . . . 43 2.10.4. Trunking . . . . . . . . . . . . . . . . . . . . . . 43
2.10.5. Exactly Once Semantics . . . . . . . . . . . . . . . 46 2.10.5. Exactly Once Semantics . . . . . . . . . . . . . . . 46
2.10.6. RDMA Considerations . . . . . . . . . . . . . . . . 58 2.10.6. RDMA Considerations . . . . . . . . . . . . . . . . 59
2.10.7. Sessions Security . . . . . . . . . . . . . . . . . 61 2.10.7. Sessions Security . . . . . . . . . . . . . . . . . 61
2.10.8. The SSV GSS Mechanism . . . . . . . . . . . . . . . 66 2.10.8. The SSV GSS Mechanism . . . . . . . . . . . . . . . 67
2.10.9. Session Mechanics - Steady State . . . . . . . . . . 71 2.10.9. Session Mechanics - Steady State . . . . . . . . . . 71
2.10.10. Session Mechanics - Recovery . . . . . . . . . . . . 72 2.10.10. Session Inactivity Timer . . . . . . . . . . . . . . 73
2.10.11. Parallel NFS and Sessions . . . . . . . . . . . . . 76 2.10.11. Session Mechanics - Recovery . . . . . . . . . . . . 73
2.10.12. Parallel NFS and Sessions . . . . . . . . . . . . . 76
3. Protocol Constants and Data Types . . . . . . . . . . . . . . 76 3. Protocol Constants and Data Types . . . . . . . . . . . . . . 76
3.1. Basic Constants . . . . . . . . . . . . . . . . . . . . 76 3.1. Basic Constants . . . . . . . . . . . . . . . . . . . . 77
3.2. Basic Data Types . . . . . . . . . . . . . . . . . . . . 77 3.2. Basic Data Types . . . . . . . . . . . . . . . . . . . . 77
3.3. Structured Data Types . . . . . . . . . . . . . . . . . 79 3.3. Structured Data Types . . . . . . . . . . . . . . . . . 79
4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 88 4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 88 4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 88
4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . 89 4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . 89
4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . 89 4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . 89
4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 89 4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 89
4.2.1. General Properties of a Filehandle . . . . . . . . . 90 4.2.1. General Properties of a Filehandle . . . . . . . . . 90
4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 91 4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 91
4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 91 4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 91
skipping to change at page 6, line 39 skipping to change at page 6, line 40
12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 263 12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 263
12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 263 12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 263
12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 264 12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 264
12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 265 12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 265
12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 266 12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 266
12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 266 12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 266
12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 266 12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 266
12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 268 12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 268
12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 269 12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 269
12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 270 12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 270
12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 272 12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 273
12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 279 12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 280
12.5.7. Metadata Server Write Propagation . . . . . . . . . 279 12.5.7. Metadata Server Write Propagation . . . . . . . . . 280
12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 280 12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 280
12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 281 12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 282
12.7.1. Recovery from Client Restart . . . . . . . . . . . . 282 12.7.1. Recovery from Client Restart . . . . . . . . . . . . 282
12.7.2. Dealing with Lease Expiration on the Client . . . . 282 12.7.2. Dealing with Lease Expiration on the Client . . . . 282
12.7.3. Dealing with Loss of Layout State on the Metadata 12.7.3. Dealing with Loss of Layout State on the Metadata
Server . . . . . . . . . . . . . . . . . . . . . . . 283 Server . . . . . . . . . . . . . . . . . . . . . . . 283
12.7.4. Recovery from Metadata Server Restart . . . . . . . 284 12.7.4. Recovery from Metadata Server Restart . . . . . . . 284
12.7.5. Operations During Metadata Server Grace Period . . . 286 12.7.5. Operations During Metadata Server Grace Period . . . 286
12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 286 12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 286
12.8. Metadata and Storage Device Roles . . . . . . . . . . . 286 12.8. Metadata and Storage Device Roles . . . . . . . . . . . 287
12.9. Security Considerations for pNFS . . . . . . . . . . . . 287 12.9. Security Considerations for pNFS . . . . . . . . . . . . 287
13. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 288 13. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 288
13.1. Client ID and Session Considerations . . . . . . . . . . 288 13.1. Client ID and Session Considerations . . . . . . . . . . 288
13.2. File Layout Definitions . . . . . . . . . . . . . . . . 290 13.1.1. Sessions Considerations for Data Servers . . . . . . 291
13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 291 13.2. File Layout Definitions . . . . . . . . . . . . . . . . 291
13.4. Interpreting the File Layout . . . . . . . . . . . . . . 295 13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 292
13.4.1. Determining the Stripe Unit Number . . . . . . . . . 295 13.4. Interpreting the File Layout . . . . . . . . . . . . . . 296
13.4.2. Interpreting the File Layout Using Sparse Packing . 295 13.4.1. Determining the Stripe Unit Number . . . . . . . . . 296
13.4.3. Interpreting the File Layout Using Dense Packing . . 298 13.4.2. Interpreting the File Layout Using Sparse Packing . 296
13.4.4. Sparse and Dense Stripe Unit Packing . . . . . . . . 300 13.4.3. Interpreting the File Layout Using Dense Packing . . 299
13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 302 13.4.4. Sparse and Dense Stripe Unit Packing . . . . . . . . 301
13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 303 13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 303
13.7. COMMIT Through Metadata Server . . . . . . . . . . . . . 305 13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 304
13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 307 13.7. COMMIT Through Metadata Server . . . . . . . . . . . . . 306
13.9. Metadata and Data Server State Coordination . . . . . . 307 13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 308
13.9.1. Global Stateid Requirements . . . . . . . . . . . . 307 13.9. Metadata and Data Server State Coordination . . . . . . 308
13.9.2. Data Server State Propagation . . . . . . . . . . . 308 13.9.1. Global Stateid Requirements . . . . . . . . . . . . 308
13.10. Data Server Component File Size . . . . . . . . . . . . 310 13.9.2. Data Server State Propagation . . . . . . . . . . . 309
13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 311 13.10. Data Server Component File Size . . . . . . . . . . . . 311
13.12. Security Considerations for the File Layout Type . . . . 311 13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 312
14. Internationalization . . . . . . . . . . . . . . . . . . . . 312 13.12. Security Considerations for the File Layout Type . . . . 312
14.1. Stringprep profile for the utf8str_cs type . . . . . . . 313 14. Internationalization . . . . . . . . . . . . . . . . . . . . 313
14.2. Stringprep profile for the utf8str_cis type . . . . . . 315 14.1. Stringprep profile for the utf8str_cs type . . . . . . . 314
14.3. Stringprep profile for the utf8str_mixed type . . . . . 316 14.2. Stringprep profile for the utf8str_cis type . . . . . . 316
14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 318 14.3. Stringprep profile for the utf8str_mixed type . . . . . 317
14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 318 14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 319
15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 319 14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 319
15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 319 15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 320
15.1.1. General Errors . . . . . . . . . . . . . . . . . . . 321 15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 320
15.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 323 15.1.1. General Errors . . . . . . . . . . . . . . . . . . . 322
15.1.3. Compound Structure Errors . . . . . . . . . . . . . 324 15.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 324
15.1.4. File System Errors . . . . . . . . . . . . . . . . . 326 15.1.3. Compound Structure Errors . . . . . . . . . . . . . 325
15.1.5. State Management Errors . . . . . . . . . . . . . . 328 15.1.4. File System Errors . . . . . . . . . . . . . . . . . 327
15.1.6. Security Errors . . . . . . . . . . . . . . . . . . 329 15.1.5. State Management Errors . . . . . . . . . . . . . . 329
15.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 329 15.1.6. Security Errors . . . . . . . . . . . . . . . . . . 330
15.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 330 15.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 330
15.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 331 15.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 331
15.1.10. pNFS Errors . . . . . . . . . . . . . . . . . . . . 332 15.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 332
15.1.11. Session Use Errors . . . . . . . . . . . . . . . . . 333 15.1.10. pNFS Errors . . . . . . . . . . . . . . . . . . . . 333
15.1.12. Session Management Errors . . . . . . . . . . . . . 334 15.1.11. Session Use Errors . . . . . . . . . . . . . . . . . 334
15.1.13. Client Management Errors . . . . . . . . . . . . . . 335 15.1.12. Session Management Errors . . . . . . . . . . . . . 335
15.1.14. Delegation Errors . . . . . . . . . . . . . . . . . 336 15.1.13. Client Management Errors . . . . . . . . . . . . . . 336
15.1.15. Attribute Handling Errors . . . . . . . . . . . . . 336 15.1.14. Delegation Errors . . . . . . . . . . . . . . . . . 337
15.1.16. Obsoleted Errors . . . . . . . . . . . . . . . . . . 337 15.1.15. Attribute Handling Errors . . . . . . . . . . . . . 337
15.2. Operations and their valid errors . . . . . . . . . . . 338 15.1.16. Obsoleted Errors . . . . . . . . . . . . . . . . . . 338
15.3. Callback operations and their valid errors . . . . . . . 354 15.2. Operations and their valid errors . . . . . . . . . . . 339
15.4. Errors and the operations that use them . . . . . . . . 356 15.3. Callback operations and their valid errors . . . . . . . 355
16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 370 15.4. Errors and the operations that use them . . . . . . . . 357
16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 370 16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 371
16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 371 16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 371
17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 381 16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 372
18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 384 17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 382
18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 384 18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 385
18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 387 18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 385
18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 388 18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 388
18.4. Operation 6: CREATE - Create a Non-Regular File Object . 391 18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 389
18.4. Operation 6: CREATE - Create a Non-Regular File Object . 392
18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting 18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting
Recovery . . . . . . . . . . . . . . . . . . . . . . . . 394 Recovery . . . . . . . . . . . . . . . . . . . . . . . . 395
18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 395 18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 396
18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 395 18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 396
18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 397 18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 398
18.9. Operation 11: LINK - Create Link to a File . . . . . . . 398 18.9. Operation 11: LINK - Create Link to a File . . . . . . . 399
18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 400 18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 401
18.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 404 18.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 405
18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 406 18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 407
18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 407 18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 408
18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 409 18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 410
18.15. Operation 17: NVERIFY - Verify Difference in 18.15. Operation 17: NVERIFY - Verify Difference in
Attributes . . . . . . . . . . . . . . . . . . . . . . . 410 Attributes . . . . . . . . . . . . . . . . . . . . . . . 411
18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 411 18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 412
18.17. Operation 19: OPENATTR - Open Named Attribute 18.17. Operation 19: OPENATTR - Open Named Attribute
Directory . . . . . . . . . . . . . . . . . . . . . . . 430 Directory . . . . . . . . . . . . . . . . . . . . . . . 431
18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 431 18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 432
18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 432 18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 433
18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 433 18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 434
18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 435 18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 436
18.22. Operation 25: READ - Read from File . . . . . . . . . . 435 18.22. Operation 25: READ - Read from File . . . . . . . . . . 436
18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 438 18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 439
18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 441 18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 442
18.25. Operation 28: REMOVE - Remove File System Object . . . . 442 18.25. Operation 28: REMOVE - Remove File System Object . . . . 443
18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 445 18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 446
18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 448 18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 449
18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 449 18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 450
18.29. Operation 33: SECINFO - Obtain Available Security . . . 450 18.29. Operation 33: SECINFO - Obtain Available Security . . . 451
18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 453 18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 454
18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 456 18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 457
18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 457 18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 458
18.33. Operation 40: BACKCHANNEL_CTL - Backchannel control . . 462 18.33. Operation 40: BACKCHANNEL_CTL - Backchannel control . . 463
18.34. Operation 41: BIND_CONN_TO_SESSION . . . . . . . . . . . 463 18.34. Operation 41: BIND_CONN_TO_SESSION . . . . . . . . . . . 464
18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 466 18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 467
18.36. Operation 43: CREATE_SESSION - Create New Session and 18.36. Operation 43: CREATE_SESSION - Create New Session and
Confirm Client ID . . . . . . . . . . . . . . . . . . . 482 Confirm Client ID . . . . . . . . . . . . . . . . . . . 483
18.37. Operation 44: DESTROY_SESSION - Destroy existing 18.37. Operation 44: DESTROY_SESSION - Destroy existing
session . . . . . . . . . . . . . . . . . . . . . . . . 492 session . . . . . . . . . . . . . . . . . . . . . . . . 493
18.38. Operation 45: FREE_STATEID - Free stateid with no 18.38. Operation 45: FREE_STATEID - Free stateid with no
locks . . . . . . . . . . . . . . . . . . . . . . . . . 494 locks . . . . . . . . . . . . . . . . . . . . . . . . . 495
18.39. Operation 46: GET_DIR_DELEGATION - Get a directory 18.39. Operation 46: GET_DIR_DELEGATION - Get a directory
delegation . . . . . . . . . . . . . . . . . . . . . . . 495 delegation . . . . . . . . . . . . . . . . . . . . . . . 496
18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 499 18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 500
18.41. Operation 48: GETDEVICELIST - Get All Device Mappings 18.41. Operation 48: GETDEVICELIST - Get All Device Mappings
for a File System . . . . . . . . . . . . . . . . . . . 501 for a File System . . . . . . . . . . . . . . . . . . . 502
18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using 18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using
a layout . . . . . . . . . . . . . . . . . . . . . . . . 503 a layout . . . . . . . . . . . . . . . . . . . . . . . . 504
18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 506 18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 507
18.44. Operation 51: LAYOUTRETURN - Release Layout 18.44. Operation 51: LAYOUTRETURN - Release Layout
Information . . . . . . . . . . . . . . . . . . . . . . 510 Information . . . . . . . . . . . . . . . . . . . . . . 511
18.45. Operation 52: SECINFO_NO_NAME - Get Security on 18.45. Operation 52: SECINFO_NO_NAME - Get Security on
Unnamed Object . . . . . . . . . . . . . . . . . . . . . 515 Unnamed Object . . . . . . . . . . . . . . . . . . . . . 516
18.46. Operation 53: SEQUENCE - Supply per-procedure 18.46. Operation 53: SEQUENCE - Supply per-procedure
sequencing and control . . . . . . . . . . . . . . . . . 516 sequencing and control . . . . . . . . . . . . . . . . . 517
18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 522 18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 523
18.48. Operation 55: TEST_STATEID - Test stateids for 18.48. Operation 55: TEST_STATEID - Test stateids for
validity . . . . . . . . . . . . . . . . . . . . . . . . 524 validity . . . . . . . . . . . . . . . . . . . . . . . . 525
18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 526 18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 527
18.50. Operation 57: DESTROY_CLIENTID - Destroy existing 18.50. Operation 57: DESTROY_CLIENTID - Destroy existing
client ID . . . . . . . . . . . . . . . . . . . . . . . 529 client ID . . . . . . . . . . . . . . . . . . . . . . . 530
18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims 18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims
Finished . . . . . . . . . . . . . . . . . . . . . . . . 530 Finished . . . . . . . . . . . . . . . . . . . . . . . . 531
18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 532 18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 533
19. NFSv44.1 Callback Procedures . . . . . . . . . . . . . . . . 533 19. NFSv4.1 Callback Procedures . . . . . . . . . . . . . . . . . 534
19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 533 19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 534
19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 533 19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 534
20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 538 20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 539
20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 538 20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 539
20.2. Operation 4: CB_RECALL - Recall an Open Delegation . . . 539 20.2. Operation 4: CB_RECALL - Recall an Open Delegation . . . 540
20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from 20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from
Client . . . . . . . . . . . . . . . . . . . . . . . . . 540 Client . . . . . . . . . . . . . . . . . . . . . . . . . 541
20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 544 20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 545
20.5. Operation 7: CB_PUSH_DELEG - Offer Delegation to 20.5. Operation 7: CB_PUSH_DELEG - Offer Delegation to
Client . . . . . . . . . . . . . . . . . . . . . . . . . 548 Client . . . . . . . . . . . . . . . . . . . . . . . . . 549
20.6. Operation 8: CB_RECALL_ANY - Keep any N delegations . . 549 20.6. Operation 8: CB_RECALL_ANY - Keep any N delegations . . 550
20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal 20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal
Resources for Recallable Objects . . . . . . . . . . . . 551 Resources for Recallable Objects . . . . . . . . . . . . 552
20.8. Operation 10: CB_RECALL_SLOT - change flow control 20.8. Operation 10: CB_RECALL_SLOT - change flow control
limits . . . . . . . . . . . . . . . . . . . . . . . . . 552 limits . . . . . . . . . . . . . . . . . . . . . . . . . 553
20.9. Operation 11: CB_SEQUENCE - Supply backchannel 20.9. Operation 11: CB_SEQUENCE - Supply backchannel
sequencing and control . . . . . . . . . . . . . . . . . 553 sequencing and control . . . . . . . . . . . . . . . . . 554
20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending 20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending
Delegation Wants . . . . . . . . . . . . . . . . . . . . 555 Delegation Wants . . . . . . . . . . . . . . . . . . . . 556
20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible 20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible
lock availability . . . . . . . . . . . . . . . . . . . 556 lock availability . . . . . . . . . . . . . . . . . . . 557
20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify device ID 20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify device ID
changes . . . . . . . . . . . . . . . . . . . . . . . . 558 changes . . . . . . . . . . . . . . . . . . . . . . . . 559
20.13. Operation 10044: CB_ILLEGAL - Illegal Callback 20.13. Operation 10044: CB_ILLEGAL - Illegal Callback
Operation . . . . . . . . . . . . . . . . . . . . . . . 560 Operation . . . . . . . . . . . . . . . . . . . . . . . 561
21. Security Considerations . . . . . . . . . . . . . . . . . . . 560 21. Security Considerations . . . . . . . . . . . . . . . . . . . 561
22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 562 22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 563
22.1. Named Attribute Definitions . . . . . . . . . . . . . . 562 22.1. Named Attribute Definitions . . . . . . . . . . . . . . 563
22.2. ONC RPC Network Identifiers (netids) . . . . . . . . . . 562 22.2. ONC RPC Network Identifiers (netids) . . . . . . . . . . 563
22.3. Defining New Notifications . . . . . . . . . . . . . . . 563 22.3. Defining New Notifications . . . . . . . . . . . . . . . 564
22.4. Defining New Layout Types . . . . . . . . . . . . . . . 563 22.4. Defining New Layout Types . . . . . . . . . . . . . . . 564
22.5. Path Variable Definitions . . . . . . . . . . . . . . . 565 22.5. Path Variable Definitions . . . . . . . . . . . . . . . 566
22.5.1. Path Variable Values . . . . . . . . . . . . . . . . 565 22.5.1. Path Variable Values . . . . . . . . . . . . . . . . 566
22.5.2. Path Variable Names . . . . . . . . . . . . . . . . 565 22.5.2. Path Variable Names . . . . . . . . . . . . . . . . 566
23. References . . . . . . . . . . . . . . . . . . . . . . . . . 565 23. References . . . . . . . . . . . . . . . . . . . . . . . . . 566
23.1. Normative References . . . . . . . . . . . . . . . . . . 565 23.1. Normative References . . . . . . . . . . . . . . . . . . 566
23.2. Informative References . . . . . . . . . . . . . . . . . 567 23.2. Informative References . . . . . . . . . . . . . . . . . 568
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 568 Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 569
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 570 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 571
Intellectual Property and Copyright Statements . . . . . . . . . 572 Intellectual Property and Copyright Statements . . . . . . . . . 573
1. Introduction 1. Introduction
1.1. The NFS Version 4 Minor Version 1 Protocol 1.1. The NFS Version 4 Minor Version 1 Protocol
The NFS version 4 minor version 1 (NFSv4.1) protocol is the second The NFS version 4 minor version 1 (NFSv4.1) protocol is the second
minor version of the NFS version 4 (NFSv4) protocol. The first minor minor version of the NFS version 4 (NFSv4) protocol. The first minor
version, NFSv4.0 is described in [21]. It generally follows the version, NFSv4.0 is described in [21]. It generally follows the
guidelines for minor versioning model listed in Section 10 of RFC guidelines for minor versioning model listed in Section 10 of RFC
3530. However, it diverges from guidelines 11 ("a client and server 3530. However, it diverges from guidelines 11 ("a client and server
skipping to change at page 27, line 45 skipping to change at page 27, line 45
the client ID in order to conserve resources. If the client contacts the client ID in order to conserve resources. If the client contacts
the server after this release, the server must ensure the client the server after this release, the server must ensure the client
receives the appropriate error so that it will use the EXCHANGE_ID/ receives the appropriate error so that it will use the EXCHANGE_ID/
CREATE_SESSION sequence to establish a new client ID. The server CREATE_SESSION sequence to establish a new client ID. The server
ought to be very hesitant to release a client ID since the resulting ought to be very hesitant to release a client ID since the resulting
work on the client to recover from such an event will be the same work on the client to recover from such an event will be the same
burden as if the server had failed and restarted. Typically a server burden as if the server had failed and restarted. Typically a server
would not release a client ID unless there had been no activity from would not release a client ID unless there had been no activity from
that client for many minutes. As long as there are sessions, opens, that client for many minutes. As long as there are sessions, opens,
locks, delegations, layouts, or wants, the server MUST NOT release locks, delegations, layouts, or wants, the server MUST NOT release
the client ID. See Section 2.10.10.1.4 for discussion on releasing the client ID. See Section 2.10.11.1.4 for discussion on releasing
inactive sessions. inactive sessions.
2.4.3. Resolving Client Owner Conflicts 2.4.3. Resolving Client Owner Conflicts
When the server gets an EXCHANGE_ID for a client owner that currently When the server gets an EXCHANGE_ID for a client owner that currently
has no state, or that has state, but the lease has expired, the has no state, or that has state, but the lease has expired, the
server MUST allow the EXCHANGE_ID, and confirm the new client ID if server MUST allow the EXCHANGE_ID, and confirm the new client ID if
followed by the appropriate CREATE_SESSION. followed by the appropriate CREATE_SESSION.
When the server gets an EXCHANGE_ID for a new incarnation of a client When the server gets an EXCHANGE_ID for a new incarnation of a client
skipping to change at page 46, line 43 skipping to change at page 46, line 43
2.10.5. Exactly Once Semantics 2.10.5. Exactly Once Semantics
Via the session, NFSv4.1 offers Exactly Once Semantics (EOS) for Via the session, NFSv4.1 offers Exactly Once Semantics (EOS) for
requests sent over a channel. EOS is supported on both the fore and requests sent over a channel. EOS is supported on both the fore and
back channels. back channels.
Each COMPOUND or CB_COMPOUND request that is sent with a leading Each COMPOUND or CB_COMPOUND request that is sent with a leading
SEQUENCE or CB_SEQUENCE operation MUST be executed by the receiver SEQUENCE or CB_SEQUENCE operation MUST be executed by the receiver
exactly once. This requirement holds regardless of whether the exactly once. This requirement holds regardless of whether the
request is sent with reply caching specified (see request is sent with reply caching specified (see
Section 2.10.5.1.2). The requirement holds even if the requester is Section 2.10.5.1.3). The requirement holds even if the requester is
issuing the request over a session created between a pNFS data client issuing the request over a session created between a pNFS data client
and pNFS data server. To understand the rationale for this and pNFS data server. To understand the rationale for this
requirement, divide the requests into three classifications: requirement, divide the requests into three classifications:
o Nonidempotent requests. o Nonidempotent requests.
o Idempotent modifying requests. o Idempotent modifying requests.
o Idempotent non-modifying requests. o Idempotent non-modifying requests.
skipping to change at page 49, line 40 skipping to change at page 49, line 40
seen in the slot. Note that because the sequence id must seen in the slot. Note that because the sequence id must
wraparound to zero (0) once it reaches 0xFFFFFFFF, a misordered wraparound to zero (0) once it reaches 0xFFFFFFFF, a misordered
new request and a misordered retry cannot be distinguished. Thus, new request and a misordered retry cannot be distinguished. Thus,
the replier MUST return NFS4ERR_SEQ_MISORDERED (as the result from the replier MUST return NFS4ERR_SEQ_MISORDERED (as the result from
SEQUENCE or CB_SEQUENCE). SEQUENCE or CB_SEQUENCE).
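As an illustration only, the per-slot sequence id comparison might be sketched as follows. The function name and the "RETRY"/"NEW" labels are hypothetical and not part of the protocol. The sketch assumes the replier keeps the last sequence id seen in each slot, treats an equal value as a retry, and treats the next value (modulo 2^32) as a new request.

```python
# Hypothetical sketch of a replier's sequence id check for one slot.
SEQ_MOD = 2 ** 32  # sequence ids wrap around to zero after 0xFFFFFFFF


def classify_request(cached_seqid: int, request_seqid: int) -> str:
    """Compare a request's sequence id with the last one seen in the slot."""
    if request_seqid == cached_seqid:
        # Same sequence id as the cached entry: a retransmission.
        return "RETRY"
    if request_seqid == (cached_seqid + 1) % SEQ_MOD:
        # Exactly one higher, allowing for wraparound: a new request.
        return "NEW"
    # Anything else cannot be told apart as a misordered new request
    # or a misordered retry, so it is rejected.
    return "NFS4ERR_SEQ_MISORDERED"
```

Because of the wraparound, a request carrying sequence id 0 is treated as new when the slot last saw 0xFFFFFFFF.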
Unlike the XID, the slot id is always within a specific range; this Unlike the XID, the slot id is always within a specific range; this
has two implications. The first implication is that for a given has two implications. The first implication is that for a given
session, the replier need only cache the results of a limited number session, the replier need only cache the results of a limited number
of COMPOUND requests. The second implication derives from the of COMPOUND requests. The second implication derives from the
first, which is unlike XID-indexed reply caches (also known as first, which is that unlike XID-indexed reply caches (also known as
duplicate request caches - DRCs), the slot id-based reply cache duplicate request caches - DRCs), the slot id-based reply cache
cannot be overflowed. Through use of the sequence id to identify cannot be overflowed. Through use of the sequence id to identify
retransmitted requests, the replier does not need to actually cache retransmitted requests, the replier does not need to actually cache
the request itself, reducing the storage requirements of the reply the request itself, reducing the storage requirements of the reply
cache further. These facilities make it practical to maintain all cache further. These facilities make it practical to maintain all
the required entries for an effective reply cache. the required entries for an effective reply cache.
The slot id, sequence id, and sessionid therefore take over the The slot id, sequence id, and sessionid therefore take over the
traditional role of the XID and source network address in the traditional role of the XID and source network address in the
replier's reply cache implementation. This approach is considerably replier's reply cache implementation. This approach is considerably
skipping to change at page 52, line 23 skipping to change at page 52, line 23
because the request may have been sent from the requester before because the request may have been sent from the requester before
the update was received. Therefore, in the downward adjustment the update was received. Therefore, in the downward adjustment
case, the replier may have to retain a number of reply cache case, the replier may have to retain a number of reply cache
entries at least as large as the old value of maximum requests entries at least as large as the old value of maximum requests
outstanding, until it can infer that the requester has seen a outstanding, until it can infer that the requester has seen a
reply containing the new granted highest_slotid. The replier can reply containing the new granted highest_slotid. The replier can
infer that the requester has seen such a reply when it receives a new infer that the requester has seen such a reply when it receives a new
request with the same slotid as the request replied to and the request with the same slotid as the request replied to and the
next higher sequenceid. next higher sequenceid.
2.10.5.1.1. Errors from SEQUENCE and CB_SEQUENCE 2.10.5.1.1. Caching of SEQUENCE and CB_SEQUENCE Replies
When a SEQUENCE or CB_SEQUENCE operation is successfully executed,
its reply MUST always be cached. Specifically, sessionid,
sequenceid, and slotid MUST be cached in the reply cache. The reply
from SEQUENCE also includes the highest slotid, target highest
slotid, and status flags. The server SHOULD NOT cache these values,
and instead SHOULD re-compute the values from the current state of
the fore channel, session and/or client ID as appropriate.
Similarly, the reply from CB_SEQUENCE includes a highest slotid and
target highest slotid. The client SHOULD NOT cache these values, and
SHOULD re-compute the values from the current state of the session as
appropriate.
2.10.5.1.2. Errors from SEQUENCE and CB_SEQUENCE
Any time SEQUENCE or CB_SEQUENCE returns an error, the sequence id of Any time SEQUENCE or CB_SEQUENCE returns an error, the sequence id of
the slot MUST NOT change. The replier MUST NOT modify the reply the slot MUST NOT change. The replier MUST NOT modify the reply
cache entry for the slot whenever an error is returned from SEQUENCE cache entry for the slot whenever an error is returned from SEQUENCE
or CB_SEQUENCE. or CB_SEQUENCE.
2.10.5.1.2. Optional Reply Caching 2.10.5.1.3. Optional Reply Caching
On a per-request basis the requester can choose to direct the replier On a per-request basis the requester can choose to direct the replier
to cache the reply to all operations after the first operation to cache the reply to all operations after the first operation
(SEQUENCE or CB_SEQUENCE) via the sa_cachethis or csa_cachethis (SEQUENCE or CB_SEQUENCE) via the sa_cachethis or csa_cachethis
fields of the arguments to SEQUENCE or CB_SEQUENCE. The reason it fields of the arguments to SEQUENCE or CB_SEQUENCE. The reason it
would not direct the replier to cache the entire reply is that the would not direct the replier to cache the entire reply is that the
request is composed of all idempotent operations [24]. Caching the request is composed of all idempotent operations [24]. Caching the
reply may offer little benefit. If the reply is too large (see reply may offer little benefit. If the reply is too large (see
Section 2.10.5.4), it may not be cacheable anyway. Even if the reply Section 2.10.5.4), it may not be cacheable anyway. Even if the reply
to an idempotent request is small enough to cache, unnecessarily caching to an idempotent request is small enough to cache, unnecessarily caching
skipping to change at page 53, line 9 skipping to change at page 53, line 23
incremented by one. If a requester does not direct the replier to incremented by one. If a requester does not direct the replier to
cache the reply, the replier MUST do one of the following: cache the reply, the replier MUST do one of the following:
o The replier can cache the entire original reply. Even though o The replier can cache the entire original reply. Even though
sa_cachethis or csa_cachethis are FALSE, the replier is always sa_cachethis or csa_cachethis are FALSE, the replier is always
free to cache. It may choose this approach in order to simplify free to cache. It may choose this approach in order to simplify
implementation. implementation.
o The replier enters into its reply cache a reply consisting of the o The replier enters into its reply cache a reply consisting of the
original results to the SEQUENCE or CB_SEQUENCE operation, and original results to the SEQUENCE or CB_SEQUENCE operation, and
with the next operation in COMPOUND or CB)COMPOUND having the with the next operation in COMPOUND or CB_COMPOUND having the
error NFS4ERR_RETRY_UNCACHED_REP. Thus if the requester later error NFS4ERR_RETRY_UNCACHED_REP. Thus if the requester later
retries the request, it will get NFS4ERR_RETRY_UNCACHED_REP. retries the request, it will get NFS4ERR_RETRY_UNCACHED_REP.
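The two choices above can be sketched as follows. This is a minimal illustration rather than a definitive implementation: OpResult, reply_to_cache, and the replier_caches_anyway switch are hypothetical names, and the sketch assumes SEQUENCE or CB_SEQUENCE itself succeeded (on an error from that operation the cache entry is left unmodified).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OpResult:
    op: str      # operation name, e.g. "SEQUENCE" or "READ"
    status: str  # status returned for that operation


def reply_to_cache(results: List[OpResult], cachethis: bool,
                   replier_caches_anyway: bool = False) -> List[OpResult]:
    """Return what a replier might store in the slot's reply cache."""
    if cachethis or replier_caches_anyway:
        # First choice: keep the entire original reply.
        return list(results)
    # Second choice: keep the original SEQUENCE/CB_SEQUENCE result and
    # mark the next operation, if any, with NFS4ERR_RETRY_UNCACHED_REP.
    cached = [results[0]]
    if len(results) > 1:
        cached.append(OpResult(results[1].op, "NFS4ERR_RETRY_UNCACHED_REP"))
    return cached
```

With the second choice, a retry of a three-operation request is answered from a two-entry cached reply whose second entry carries NFS4ERR_RETRY_UNCACHED_REP.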
2.10.5.2. Retry and Replay of Reply 2.10.5.2. Retry and Replay of Reply
A requester MUST NOT retry a request, unless the connection it used A requester MUST NOT retry a request, unless the connection it used
to send the request disconnects. The requester can then reconnect to send the request disconnects. The requester can then reconnect
and re-send the request, or it can re-send the request over a and re-send the request, or it can re-send the request over a
different connection that is associated with the same session. different connection that is associated with the same session.
skipping to change at page 56, line 11 skipping to change at page 56, line 24
If a reply exceeds ca_maxresponsesize, the reply will have the status If a reply exceeds ca_maxresponsesize, the reply will have the status
NFS4ERR_REP_TOO_BIG. A replier MAY return NFS4ERR_REP_TOO_BIG as the NFS4ERR_REP_TOO_BIG. A replier MAY return NFS4ERR_REP_TOO_BIG as the
status for the first operation (SEQUENCE or CB_SEQUENCE) in the request, status for the first operation (SEQUENCE or CB_SEQUENCE) in the request,
or it MAY choose to return it on a subsequent operation (in the same or it MAY choose to return it on a subsequent operation (in the same
COMPOUND or CB_COMPOUND reply). A replier MAY return COMPOUND or CB_COMPOUND reply). A replier MAY return
NFS4ERR_REP_TOO_BIG in the reply to SEQUENCE or CB_SEQUENCE, even if NFS4ERR_REP_TOO_BIG in the reply to SEQUENCE or CB_SEQUENCE, even if
the response would still exceed ca_maxresponsesize. the response would still exceed ca_maxresponsesize.
If sa_cachethis or csa_cachethis are TRUE, then the replier MUST If sa_cachethis or csa_cachethis are TRUE, then the replier MUST
cache a reply except if an error is returned by the SEQUENCE or cache a reply except if an error is returned by the SEQUENCE or
CB_SEQUENCE operation (see Section 2.10.5.1.1). If the reply exceeds CB_SEQUENCE operation (see Section 2.10.5.1.2). If the reply exceeds
ca_maxresponsesize_cached, (and sa_cachethis or csa_cachethis are ca_maxresponsesize_cached, (and sa_cachethis or csa_cachethis are
TRUE) then the server MUST return NFS4ERR_REP_TOO_BIG_TO_CACHE. Even TRUE) then the server MUST return NFS4ERR_REP_TOO_BIG_TO_CACHE. Even
if NFS4ERR_REP_TOO_BIG_TO_CACHE (or any other error for that matter) if NFS4ERR_REP_TOO_BIG_TO_CACHE (or any other error for that matter)
is returned on an operation other than the first operation (SEQUENCE or is returned on an operation other than the first operation (SEQUENCE or
CB_SEQUENCE), then the reply MUST be cached if sa_cachethis or CB_SEQUENCE), then the reply MUST be cached if sa_cachethis or
csa_cachethis are TRUE. For example, if a COMPOUND has eleven csa_cachethis are TRUE. For example, if a COMPOUND has eleven
operations, including SEQUENCE, the fifth operation is a RENAME, and operations, including SEQUENCE, the fifth operation is a RENAME, and
the tenth operation is a READ for one million bytes, the server may the tenth operation is a READ for one million bytes, the server may
return NFS4ERR_REP_TOO_BIG_TO_CACHE on the tenth operation. Since return NFS4ERR_REP_TOO_BIG_TO_CACHE on the tenth operation. Since
the server executed several operations, especially the non-idempotent the server executed several operations, especially the non-idempotent
skipping to change at page 71, line 18 skipping to change at page 71, line 27
Section 5.2.2 "Context Creation Requests" in [4]). Section 5.2.2 "Context Creation Requests" in [4]).
2.10.9. Session Mechanics - Steady State 2.10.9. Session Mechanics - Steady State
2.10.9.1. Obligations of the Server 2.10.9.1. Obligations of the Server
The server has the primary obligation to monitor the state of The server has the primary obligation to monitor the state of
backchannel resources that the client has created for the server backchannel resources that the client has created for the server
(RPCSEC_GSS contexts and backchannel connections). If these (RPCSEC_GSS contexts and backchannel connections). If these
resources vanish, the server takes action as specified in resources vanish, the server takes action as specified in
Section 2.10.10.2. Section 2.10.11.2.
2.10.9.2. Obligations of the Client 2.10.9.2. Obligations of the Client
The client SHOULD honor the following obligations in order to utilize The client SHOULD honor the following obligations in order to utilize
the session: the session:
o Keep a necessary session from going idle on the server. A client o Keep a necessary session from going idle on the server. A client
that requires a session, but nonetheless is not sending operations that requires a session, but nonetheless is not sending operations
risks having the session be destroyed by the server. This is risks having the session be destroyed by the server. This is
because sessions consume resources, and resource limitations may because sessions consume resources, and resource limitations may
force the server to cull an inactive session. force the server to cull an inactive session. A server MAY
consider a session to be inactive if the client has not used the
session before the session inactivity timer (Section 2.10.10) has
expired.
o Destroy the session when not needed. If a client has multiple o Destroy the session when not needed. If a client has multiple
sessions, one of which has no requests waiting for replies, and sessions, one of which has no requests waiting for replies, and
has been idle for some period of time, it SHOULD destroy the has been idle for some period of time, it SHOULD destroy the
session. session.
o Maintain GSS contexts for the backchannel. If the client requires o Maintain GSS contexts for the backchannel. If the client requires
the server to use the RPCSEC_GSS security flavor for callbacks, the server to use the RPCSEC_GSS security flavor for callbacks,
then it needs to be sure the contexts handed to the server via then it needs to be sure the contexts handed to the server via
BACKCHANNEL_CTL are unexpired. BACKCHANNEL_CTL are unexpired.
skipping to change at page 72, line 47 skipping to change at page 73, line 9
If the client wants to use additional connections for the If the client wants to use additional connections for the
backchannel, then it must call BIND_CONN_TO_SESSION on each backchannel, then it must call BIND_CONN_TO_SESSION on each
connection it wants to use with the session. If the client wants to connection it wants to use with the session. If the client wants to
use additional connections for the fore channel, then it must call use additional connections for the fore channel, then it must call
BIND_CONN_TO_SESSION if it specified SP4_SSV or SP4_MACH_CRED state BIND_CONN_TO_SESSION if it specified SP4_SSV or SP4_MACH_CRED state
protection when the client ID was created. protection when the client ID was created.
At this point the session has reached steady state. At this point the session has reached steady state.
2.10.10. Session Mechanics - Recovery 2.10.10. Session Inactivity Timer
2.10.10.1. Events Requiring Client Action The server MAY maintain a session inactivity timer for each session.
If the session inactivity timer expires, then the server MAY destroy
the session. To avoid losing a session due to inactivity, the client
MUST renew the session inactivity timer. The length of the session
inactivity timer MUST NOT be less than the lease_time attribute
(Section 5.7.1.11). As with lease renewal (Section 8.3), when the
server receives a SEQUENCE operation, it resets the session
inactivity timer, and MUST NOT allow the timer to expire while the
rest of the operations in the COMPOUND procedure's request are still
executing. Once the last operation has finished, the server MUST set
the session inactivity timer to expire no sooner than the sum of the
current time and the value of the lease_time attribute.
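A server-side timer following these rules might look like the sketch below. The class and method names are hypothetical, time is passed in explicitly as seconds to keep the example deterministic, and the in-progress counter is only one assumed way of keeping the timer from expiring while a COMPOUND is still executing.

```python
class SessionInactivityTimer:
    """Hypothetical per-session inactivity timer kept by a server."""

    def __init__(self, lease_time: float, now: float):
        self.lease_time = lease_time
        self.in_progress = 0            # COMPOUNDs still executing
        self.expires_at = now + lease_time

    def on_sequence(self, now: float) -> None:
        # Receiving SEQUENCE resets the timer and marks the session busy.
        self.in_progress += 1
        self.expires_at = now + self.lease_time

    def on_compound_done(self, now: float) -> None:
        # After the last operation finishes, expiry is no sooner than
        # the current time plus lease_time.
        self.in_progress -= 1
        self.expires_at = max(self.expires_at, now + self.lease_time)

    def expired(self, now: float) -> bool:
        # Never report expiry while a request is still executing.
        return self.in_progress == 0 and now >= self.expires_at
```

A server that culls sessions would consult expired() before destroying a session. Since destroying an inactive session is a MAY, a server is equally free never to call it.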
2.10.11. Session Mechanics - Recovery
2.10.11.1. Events Requiring Client Action
The following events require client action to recover. The following events require client action to recover.
2.10.10.1.1. RPCSEC_GSS Context Loss by Callback Path 2.10.11.1.1. RPCSEC_GSS Context Loss by Callback Path
If all RPCSEC_GSS contexts granted by the client to the server for If all RPCSEC_GSS contexts granted by the client to the server for
callback use have expired, the client MUST establish a new context callback use have expired, the client MUST establish a new context
via BACKCHANNEL_CTL. The sr_status_flags field of the SEQUENCE via BACKCHANNEL_CTL. The sr_status_flags field of the SEQUENCE
results indicates when callback contexts are nearly expired, or fully results indicates when callback contexts are nearly expired, or fully
expired (see Section 18.46.3). expired (see Section 18.46.3).
2.10.10.1.2. Connection Loss 2.10.11.1.2. Connection Loss
If the client loses the last connection of the session, and if it wants If the client loses the last connection of the session, and if it wants
to retain the session, then it must create a new connection, and if, to retain the session, then it must create a new connection, and if,
when the client ID was created, BIND_CONN_TO_SESSION was specified in when the client ID was created, BIND_CONN_TO_SESSION was specified in
the spo_must_enforce list, the client MUST use BIND_CONN_TO_SESSION the spo_must_enforce list, the client MUST use BIND_CONN_TO_SESSION
to associate the connection with the session. to associate the connection with the session.
If there was a request outstanding at the time of the connection If there was a request outstanding at the time of the connection
loss, then if the client wants to continue to use the session it MUST loss, then if the client wants to continue to use the session it MUST
retry the request, as described in Section 2.10.5.2. Note that it is retry the request, as described in Section 2.10.5.2. Note that it is
skipping to change at page 73, line 39 skipping to change at page 74, line 16
disconnect. disconnect.
If the connection that was lost was the last one associated with the If the connection that was lost was the last one associated with the
backchannel, and the client wants to retain the backchannel and/or backchannel, and the client wants to retain the backchannel and/or
not put recallable state subject to revocation, the client must not put recallable state subject to revocation, the client must
reconnect, and if it does, it MUST associate the connection to the reconnect, and if it does, it MUST associate the connection to the
session and backchannel via BIND_CONN_TO_SESSION. The server SHOULD session and backchannel via BIND_CONN_TO_SESSION. The server SHOULD
indicate when it has no callback connection via the sr_status_flags indicate when it has no callback connection via the sr_status_flags
result from SEQUENCE. result from SEQUENCE.
2.10.10.1.3. Backchannel GSS Context Loss 2.10.11.1.3. Backchannel GSS Context Loss
Via the sr_status_flags result of the SEQUENCE operation or other Via the sr_status_flags result of the SEQUENCE operation or other
means, the client will learn if some or all of the RPCSEC_GSS means, the client will learn if some or all of the RPCSEC_GSS
contexts it assigned to the backchannel have been lost. If the contexts it assigned to the backchannel have been lost. If the
client wants to retain the backchannel and/or not put recallable client wants to retain the backchannel and/or not put recallable
state subject to revocation, the client must use BACKCHANNEL_CTL state subject to revocation, the client must use BACKCHANNEL_CTL
to assign new contexts. to assign new contexts.
2.10.10.1.4. Loss of Session 2.10.11.1.4. Loss of Session
The replier might lose a record of the session. Causes include: The replier might lose a record of the session. Causes include:
o Replier failure and restart o Replier failure and restart
o A catastrophe that causes the reply cache to be corrupted or lost o A catastrophe that causes the reply cache to be corrupted or lost
on the media it was stored on. This applies even if the replier on the media it was stored on. This applies even if the replier
indicated in the CREATE_SESSION results that it would persist the indicated in the CREATE_SESSION results that it would persist the
cache. cache.
skipping to change at page 75, line 5 skipping to change at page 75, line 27
client ID; loss of client ID however does imply loss of session, client ID; loss of client ID however does imply loss of session,
lock, open, delegation, and layout state. See Section 8.4.2. A lock, open, delegation, and layout state. See Section 8.4.2. A
session can survive a server restart, but lock recovery may still be session can survive a server restart, but lock recovery may still be
needed. needed.
It is possible CREATE_SESSION will fail with NFS4ERR_STALE_CLIENTID It is possible CREATE_SESSION will fail with NFS4ERR_STALE_CLIENTID
(for example the server restarts and does not preserve client ID (for example the server restarts and does not preserve client ID
state). If so, the client needs to call EXCHANGE_ID, followed by state). If so, the client needs to call EXCHANGE_ID, followed by
CREATE_SESSION. CREATE_SESSION.
2.10.10.2. Events Requiring Server Action 2.10.11.2. Events Requiring Server Action
The following events require server action to recover. The following events require server action to recover.
2.10.10.2.1. Client Crash and Restart 2.10.11.2.1. Client Crash and Restart
As described in Section 18.35, a restarted client sends EXCHANGE_ID As described in Section 18.35, a restarted client sends EXCHANGE_ID
in such a way that it causes the server to delete any sessions it had. in such a way that it causes the server to delete any sessions it had.
2.10.10.2.2. Client Crash with No Restart 2.10.11.2.2. Client Crash with No Restart
If a client crashes and never comes back, it will never send If a client crashes and never comes back, it will never send
EXCHANGE_ID with its old client owner. Thus the server has session EXCHANGE_ID with its old client owner. Thus the server has session
state that will never be used again. After an extended period of state that will never be used again. After an extended period of
time and if the server has resource constraints, it MAY destroy the time and if the server has resource constraints, it MAY destroy the
old session as well as locking state. old session as well as locking state.
2.10.10.2.3. Extended Network Partition 2.10.11.2.3. Extended Network Partition
To the server, the extended network partition may be no different To the server, the extended network partition may be no different
from a client crash with no restart (see Section 2.10.10.2.2). from a client crash with no restart (see Section 2.10.11.2.2).
Unless the server can discern that there is a network partition, it Unless the server can discern that there is a network partition, it
is free to treat the situation as if the client has crashed is free to treat the situation as if the client has crashed
permanently. permanently.
2.10.10.2.4. Backchannel Connection Loss 2.10.11.2.4. Backchannel Connection Loss
If there were callback requests outstanding at the time of a If there were callback requests outstanding at the time of a
connection loss, then the server MUST retry the request, as described connection loss, then the server MUST retry the request, as described
in Section 2.10.5.2. Note that it is not necessary to retry requests in Section 2.10.5.2. Note that it is not necessary to retry requests
over a connection with the same source network address or the same over a connection with the same source network address or the same
destination network address as the lost connection. As long as the destination network address as the lost connection. As long as the
sessionid, slot id, and sequence id in the retry match that of the sessionid, slot id, and sequence id in the retry match that of the
original request, the callback target will recognize the request as a original request, the callback target will recognize the request as a
retry even if it did see the request prior to disconnect. retry even if it did see the request prior to disconnect.
If the connection lost is the last one associated with the If the connection lost is the last one associated with the
backchannel, then the server MUST indicate that in the backchannel, then the server MUST indicate that in the
sr_status_flags field of every SEQUENCE reply until the backchannel sr_status_flags field of every SEQUENCE reply until the backchannel
is reestablished. There are two situations each of which use is reestablished. There are two situations each of which use
different status flags: no connectivity for the session's different status flags: no connectivity for the session's
backchannel, and no connectivity for any session backchannel of the backchannel, and no connectivity for any session backchannel of the
client. See Section 18.46 for a description of the appropriate flags client. See Section 18.46 for a description of the appropriate flags
in sr_status_flags. in sr_status_flags.
2.10.10.2.5. GSS Context Loss 2.10.11.2.5. GSS Context Loss
The server SHOULD monitor when the number of RPCSEC_GSS contexts The server SHOULD monitor when the number of RPCSEC_GSS contexts
assigned to the backchannel reaches one, and when that one context is assigned to the backchannel reaches one, and when that one context is
near expiry (i.e. between one and two periods of lease time), near expiry (i.e. between one and two periods of lease time),
indicate so in the sr_status_flags field of all SEQUENCE replies. indicate so in the sr_status_flags field of all SEQUENCE replies.
The server MUST indicate when all of the backchannel's assigned The server MUST indicate when all of the backchannel's assigned
RPCSEC_GSS contexts have expired in the sr_status_flags field of all RPCSEC_GSS contexts have expired in the sr_status_flags field of all
SEQUENCE replies. SEQUENCE replies.
2.10.11. Parallel NFS and Sessions 2.10.12. Parallel NFS and Sessions
A client and server can potentially be a non-pNFS implementation, a A client and server can potentially be a non-pNFS implementation, a
metadata server implementation, a data server implementation, or two metadata server implementation, a data server implementation, or two
or three types of implementations. The EXCHGID4_FLAG_USE_NON_PNFS, or three types of implementations. The EXCHGID4_FLAG_USE_NON_PNFS,
EXCHGID4_FLAG_USE_PNFS_MDS, and EXCHGID4_FLAG_USE_PNFS_DS flags (not EXCHGID4_FLAG_USE_PNFS_MDS, and EXCHGID4_FLAG_USE_PNFS_DS flags (not
mutually exclusive) are passed in the EXCHANGE_ID arguments and mutually exclusive) are passed in the EXCHANGE_ID arguments and
results to allow the client to indicate how it wants to use sessions results to allow the client to indicate how it wants to use sessions
created under the client ID, and to allow the server to indicate how created under the client ID, and to allow the server to indicate how
it will allow the sessions to be used. See Section 13.1 for pNFS it will allow the sessions to be used. See Section 13.1 for pNFS
sessions considerations. sessions considerations.
skipping to change at page 94, line 32 skipping to change at page 94, line 32
server supports and construct requests with only those supported server supports and construct requests with only those supported
attributes (or a subset thereof). attributes (or a subset thereof).
To this end, attributes are divided into three groups: REQUIRED, To this end, attributes are divided into three groups: REQUIRED,
RECOMMENDED, and named. Both REQUIRED and RECOMMENDED attributes are RECOMMENDED, and named. Both REQUIRED and RECOMMENDED attributes are
supported in the NFSv4.1 protocol by a specific and well-defined supported in the NFSv4.1 protocol by a specific and well-defined
encoding and are identified by number. They are requested by setting encoding and are identified by number. They are requested by setting
a bit in the bit vector sent in the GETATTR request; the server a bit in the bit vector sent in the GETATTR request; the server
response includes a bit vector to list what attributes were returned response includes a bit vector to list what attributes were returned
in the response. New REQUIRED or RECOMMENDED attributes may be added in the response. New REQUIRED or RECOMMENDED attributes may be added
to the NFS protocol between major revisions by publishing a to the NFSv4 protocol as part of a new minor version by publishing a
standards-track RFC which allocates a new attribute number value and standards-track RFC which allocates a new attribute number value and
defines the encoding for the attribute. See Section 2.7 for further defines the encoding for the attribute. See Section 2.7 for further
discussion. discussion.
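To illustrate the bit vector, the sketch below packs attribute numbers into an array of 32-bit words and unpacks them again. It assumes the usual NFSv4 bitmap layout, in which attribute n occupies bit (n mod 32) of word (n / 32). The helper names and the attribute numbers in the usage note are chosen for illustration only.

```python
from typing import Iterable, List


def attrs_to_bitmap(attr_numbers: Iterable[int]) -> List[int]:
    """Pack attribute numbers into a list of 32-bit words."""
    words: List[int] = []
    for n in attr_numbers:
        word, bit = divmod(n, 32)
        # Grow the array until it holds the word this attribute needs.
        while len(words) <= word:
            words.append(0)
        words[word] |= 1 << bit
    return words


def bitmap_to_attrs(words: List[int]) -> List[int]:
    """List the attribute numbers whose bits are set, in ascending order."""
    return [w * 32 + b
            for w, word in enumerate(words)
            for b in range(32)
            if (word >> b) & 1]
```

For example, requesting attributes 1, 4, and 33 produces two words: the first has bits 1 and 4 set, and the second has bit 1 set. A client could apply bitmap_to_attrs to the bit vector in the response to see which attributes were actually returned.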
Named attributes are accessed by the new OPENATTR operation, which Named attributes are accessed by the new OPENATTR operation, which
accesses a hidden directory of attributes associated with a file accesses a hidden directory of attributes associated with a file
system object. OPENATTR takes a filehandle for the object and system object. OPENATTR takes a filehandle for the object and
returns the filehandle for the attribute hierarchy. The filehandle returns the filehandle for the attribute hierarchy. The filehandle
for the named attributes is a directory object accessible by LOOKUP for the named attributes is a directory object accessible by LOOKUP
or READDIR and contains files whose names represent the named or READDIR and contains files whose names represent the named
skipping to change at page 95, line 37 skipping to change at page 95, line 37
Note that the hidden directory returned by OPENATTR is a convenience Note that the hidden directory returned by OPENATTR is a convenience
for protocol processing. The client should not make any assumptions for protocol processing. The client should not make any assumptions
about the server's implementation of named attributes and whether the about the server's implementation of named attributes and whether the
underlying file system at the server has a named attribute directory underlying file system at the server has a named attribute directory
or not. Therefore, operations such as SETATTR and GETATTR on the or not. Therefore, operations such as SETATTR and GETATTR on the
named attribute directory are undefined. named attribute directory are undefined.
5.1. REQUIRED Attributes 5.1. REQUIRED Attributes
These MUST be supported by every NFSv4.1 client and server in order These MUST be supported by every NFSv4.1 client and server in order
to ensure a minimum level of interoperability. The server must store to ensure a minimum level of interoperability. The server MUST store
and return these attributes and the client must be able to function and return these attributes and the client MUST be able to function
with an attribute set limited to these attributes. With just the with an attribute set limited to these attributes. With just the
REQUIRED attributes some client functionality may be impaired or REQUIRED attributes some client functionality may be impaired or
limited in some ways. A client may ask for any of these attributes limited in some ways. A client may ask for any of these attributes
to be returned by setting a bit in the GETATTR request and the server to be returned by setting a bit in the GETATTR request and the server
must return their value. must return their value.
5.2. RECOMMENDED Attributes 5.2. RECOMMENDED Attributes
These attributes are understood well enough to warrant support in the These attributes are understood well enough to warrant support in the
NFSv4.1 protocol. However, they may not be supported on all clients NFSv4.1 protocol. However, they may not be supported on all clients
and servers. A client may ask for any of these attributes to be and servers. A client may ask for any of these attributes to be
returned by setting a bit in the GETATTR request but must handle the returned by setting a bit in the GETATTR request but must handle the
case where the server does not return them. A client may ask for the case where the server does not return them. A client may ask for the
set of attributes the server supports and should not request set of attributes the server supports and SHOULD NOT request
attributes the server does not support. A server should be tolerant attributes the server does not support. A server should be tolerant
of requests for unsupported attributes and simply not return them of requests for unsupported attributes and simply not return them
rather than considering the request an error. It is expected that rather than considering the request an error. It is expected that
servers will support all attributes they comfortably can and only servers will support all attributes they comfortably can and only
fail to support attributes which are difficult to support in their fail to support attributes which are difficult to support in their
operating environments. A server should provide attributes whenever operating environments. A server should provide attributes whenever
they don't have to "tell lies" to the client. For example, a file they don't have to "tell lies" to the client. For example, a file
modification time should be either an accurate time or should not be modification time should be either an accurate time or should not be
supported by the server. This will not always be comfortable to supported by the server. This will not always be comfortable to
clients but the client is better positioned to decide whether and how to clients but the client is better positioned to decide whether and how to
skipping to change at page 97, line 5 skipping to change at page 97, line 5
of delegations (in the case of the named attribute directory these of delegations (in the case of the named attribute directory these
will be directory delegations). However, since granting of will be directory delegations). However, since granting of
delegations or not is within the server's discretion, a server need delegations or not is within the server's discretion, a server need
not support delegations on named attributes or the named attribute not support delegations on named attributes or the named attribute
directory. directory.
It is RECOMMENDED that servers support arbitrary named attributes. A It is RECOMMENDED that servers support arbitrary named attributes. A
client should not depend on the ability to store any named attributes client should not depend on the ability to store any named attributes
in the server's file system. If a server does support named in the server's file system. If a server does support named
attributes, a client which is also able to handle them should be able attributes, a client which is also able to handle them should be able
to copy a file's data and meta-data with complete transparency from to copy a file's data and metadata with complete transparency from
one location to another; this would imply that names allowed for one location to another; this would imply that names allowed for
regular directory entries are valid for named attribute names as regular directory entries are valid for named attribute names as
well. well.
In NFSv4.1, the structure of named attribute directories is In NFSv4.1, the structure of named attribute directories is
restricted in a number of ways, in order to prevent the development restricted in a number of ways, in order to prevent the development
of non-interoperable implementations in which some servers support a of non-interoperable implementations in which some servers support a
fully general hierarchical directory structure for named attributes fully general hierarchical directory structure for named attributes
while others support a limited set, but fully adequate to the while others support a limited set, but fully adequate to the
feature's goals. In such an environment, clients or applications feature's goals. In such an environment, clients or applications
might come to depend on non-portable extensions. The restrictions might come to depend on non-portable extensions. The restrictions
are: are:
o CREATE is not allowed in a named attribute directory. Thus, such o CREATE is not allowed in a named attribute directory. Thus, such
objects as symbolic links and special files are not allowed to be objects as symbolic links and special files are not allowed to be
named attributes. Further, directories may not be created in a named attributes. Further, directories may not be created in a
named attribute directory so no hierarchical structure of named named attribute directory so no hierarchical structure of named
attributes for a single object is allowed. attributes for a single object is allowed.
o OPENATTR many not be done on a named attribute directory or on a o OPENATTR MUST NOT be done on a named attribute directory or on a
named attribute. Thus, although these object have attributes, named attribute.
they may not may named attributes.
o Doing a RENAME of a named attribute to a different named attribute o Doing a RENAME of a named attribute to a different named attribute
directory or to an ordinary (i.e. non-named-attribute) directory directory or to an ordinary (i.e. non-named-attribute) directory
is not allowed. is not allowed.
o Creating hard links between names attribute directories or between o Creating hard links between named attribute directories or between
named attribute directories and ordinary directories is not named attribute directories and ordinary directories is not
allowed. allowed.
Names of attributes will not be controlled by this document or other Names of attributes will not be controlled by this document or other
IETF standards track documents. See Section 22.1 for further IETF standards track documents. See Section 22.1 for further
discussion. discussion.
5.4. Classification of Attributes 5.4. Classification of Attributes
Each of the REQUIRED and RECOMMENDED attributes can be classified in Each of the REQUIRED and RECOMMENDED attributes can be classified in
skipping to change at page 103, line 43 skipping to change at page 103, line 43
True, if the server is able to change the times for a file system object True, if the server is able to change the times for a file system object
as specified in a SETATTR operation. as specified in a SETATTR operation.
5.7.2.3. Attribute 16: case_insensitive 5.7.2.3. Attribute 16: case_insensitive
True, if filename comparisons on this file system are case True, if filename comparisons on this file system are case
insensitive. insensitive.
5.7.2.4. Attribute 17: case_preserving 5.7.2.4. Attribute 17: case_preserving
True, if filename case on this file system are preserved. True, if file name case on this file system is preserved.
5.7.2.5. Attribute 60: change_policy 5.7.2.5. Attribute 60: change_policy
A value created by the server that the client can use to determine if A value created by the server that the client can use to determine if
some server policy related to the current file system has been some server policy related to the current file system has been
subject to change. If the value remains the same then the client can subject to change. If the value remains the same then the client can
be sure that the values of the attributes related to fs location and be sure that the values of the attributes related to fs location and
the fss_type field of the fs_status attribute have not changed. On the fss_type field of the fs_status attribute have not changed. On
the other hand, a change in this value does not necessarily imply a the other hand, a change in this value does not necessarily imply a
change in policy. It is up to the client to interrogate the server change in policy. It is up to the client to interrogate the server
skipping to change at page 105, line 49 skipping to change at page 105, line 49
lead to the client either wasting bandwidth or not receiving the best lead to the client either wasting bandwidth or not receiving the best
performance. performance.
5.7.2.22. Attribute 32: mimetype 5.7.2.22. Attribute 32: mimetype
MIME body type/subtype of this object. MIME body type/subtype of this object.
5.7.2.23. Attribute 55: mounted_on_fileid 5.7.2.23. Attribute 55: mounted_on_fileid
Like fileid, but if the target filehandle is the root of a file Like fileid, but if the target filehandle is the root of a file
system return the fileid of the underlying directory. system, this attribute represents the fileid of the underlying
directory.
UNIX-based operating environments connect a file system into the UNIX-based operating environments connect a file system into the
namespace by connecting (mounting) the file system onto the existing namespace by connecting (mounting) the file system onto the existing
file object (the mount point, usually a directory) of an existing file object (the mount point, usually a directory) of an existing
file system. When the mount point's parent directory is read via an file system. When the mount point's parent directory is read via an
API like readdir(), the return results are directory entries, each API like readdir(), the return results are directory entries, each
with a component name and a fileid. The fileid of the mount point's with a component name and a fileid. The fileid of the mount point's
directory entry will be different from the fileid that the stat() directory entry will be different from the fileid that the stat()
system call returns. The stat() system call is returning the fileid system call returns. The stat() system call is returning the fileid
of the root of the mounted file system, whereas readdir() is of the root of the mounted file system, whereas readdir() is
skipping to change at page 107, line 7 skipping to change at page 107, line 7
should obey an invariant that has it returning a value that is equal should obey an invariant that has it returning a value that is equal
to the file object's entry in the object's parent directory, i.e. to the file object's entry in the object's parent directory, i.e.
what readdir() would have returned. Some operating environments what readdir() would have returned. Some operating environments
allow a series of two or more file systems to be mounted onto a allow a series of two or more file systems to be mounted onto a
single mount point. In this case, for the server to obey the single mount point. In this case, for the server to obey the
aforementioned invariant, it will need to find the base mount point, aforementioned invariant, it will need to find the base mount point,
and not the intermediate mount points. and not the intermediate mount points.
5.7.2.24. Attribute 34: no_trunc 5.7.2.24. Attribute 34: no_trunc
True, if a name longer than name_max is used, an error be returned If this attribute is TRUE, then if the client uses a file name longer
and name is not truncated. than name_max, an error will be returned instead of the name being
truncated.
5.7.2.25. Attribute 35: numlinks 5.7.2.25. Attribute 35: numlinks
Number of hard links to this object. Number of hard links to this object.
5.7.2.26. Attribute 36: owner 5.7.2.26. Attribute 36: owner
The string name of the owner of this object. The string name of the owner of this object.
5.7.2.27. Attribute 37: owner_group 5.7.2.27. Attribute 37: owner_group
The string name of the group ownership of this object. The string name of the group ownership of this object.
5.7.2.28. Attribute 38: quota_avail_hard 5.7.2.28. Attribute 38: quota_avail_hard
The value in bytes which represent the amount of additional disk The value in bytes which represents the amount of additional disk
space beyond the current allocation that can be allocated to this space beyond the current allocation that can be allocated to this
file or directory before further allocations will be refused. It is file or directory before further allocations will be refused. It is
understood that this space may be consumed by allocations to other understood that this space may be consumed by allocations to other
files or directories. files or directories.
5.7.2.29. Attribute 39: quota_avail_soft 5.7.2.29. Attribute 39: quota_avail_soft
The value in bytes which represents the amount of additional disk The value in bytes which represents the amount of additional disk
space that can be allocated to this file or directory before the user space that can be allocated to this file or directory before the user
may reasonably be warned. It is understood that this space may be may reasonably be warned. It is understood that this space may be
skipping to change at page 108, line 9 skipping to change at page 108, line 11
files or directories for which a quota_used value is maintained. files or directories for which a quota_used value is maintained.
E.g. "all files with a given owner", "all files with a given group E.g. "all files with a given owner", "all files with a given group
owner", etc. owner", etc.
The server is at liberty to choose any of those sets but should do so The server is at liberty to choose any of those sets but should do so
in a repeatable way. The rule may be configured per file system or in a repeatable way. The rule may be configured per file system or
may be "choose the set with the smallest quota". may be "choose the set with the smallest quota".
5.7.2.31. Attribute 41: rawdev 5.7.2.31. Attribute 41: rawdev
Raw device identifier. UNIX device major/minor node information. If Raw device identifier; the UNIX device major/minor node information.
the value of type is not NF4BLK or NF4CHR, the value return SHOULD If the value of type is not NF4BLK or NF4CHR, the value returned
NOT be considered useful. SHOULD NOT be considered useful.
5.7.2.32. Attribute 42: space_avail 5.7.2.32. Attribute 42: space_avail
Disk space in bytes available to this user on the file system Disk space in bytes available to this user on the file system
containing this object - this should be the smallest relevant limit. containing this object - this should be the smallest relevant limit.
5.7.2.33. Attribute 43: space_free 5.7.2.33. Attribute 43: space_free
Free disk space in bytes on the file system containing this object - Free disk space in bytes on the file system containing this object -
this should be the smallest relevant limit. this should be the smallest relevant limit.
skipping to change at page 108, line 33 skipping to change at page 108, line 35
5.7.2.34. Attribute 44: space_total 5.7.2.34. Attribute 44: space_total
Total disk space in bytes on the file system containing this object. Total disk space in bytes on the file system containing this object.
5.7.2.35. Attribute 45: space_used 5.7.2.35. Attribute 45: space_used
Number of file system bytes allocated to this object. Number of file system bytes allocated to this object.
5.7.2.36. Attribute 46: system 5.7.2.36. Attribute 46: system
True, if this file is a "system" file with respect to the Windows This attribute is TRUE if this file is a "system" file with respect
API. to the Windows operating environment.
5.7.2.37. Attribute 47: time_access 5.7.2.37. Attribute 47: time_access
The time_access attribute represents the time of last access to the The time_access attribute represents the time of last access to the
object by a read that was satisfied by the server. The notion of object by a read that was satisfied by the server. The notion of
what is an "access" depends on server's operating environment and/or what is an "access" depends on server's operating environment and/or
the server's file system semantics. For example, for servers obeying the server's file system semantics. For example, for servers obeying
POSIX semantics, time_access would be updated only by the READLINK, POSIX semantics, time_access would be updated only by the READLINK,
READ, and READDIR operations and not any of the operations that READ, and READDIR operations and not any of the operations that
modify the content of the object. Of course, setting the modify the content of the object. Of course, setting the
skipping to change at page 109, line 29 skipping to change at page 109, line 30
The time of creation of the object. This attribute does not have any The time of creation of the object. This attribute does not have any
relation to the traditional UNIX file attribute "ctime" or "change relation to the traditional UNIX file attribute "ctime" or "change
time". time".
5.7.2.41. Attribute 51: time_delta 5.7.2.41. Attribute 51: time_delta
Smallest useful server time granularity. Smallest useful server time granularity.
5.7.2.42. Attribute 52: time_metadata 5.7.2.42. Attribute 52: time_metadata
The time of last meta-data modification of the object. The time of last metadata modification of the object.
5.7.2.43. Attribute 53: time_modify 5.7.2.43. Attribute 53: time_modify
The time of last modification to the object. The time of last modification to the object.
5.7.2.44. Attribute 54: time_modify_set 5.7.2.44. Attribute 54: time_modify_set
Set the time of last modification to the object. SETATTR use only. Set the time of last modification to the object. SETATTR use only.
5.8. Interpreting owner and owner_group 5.8. Interpreting owner and owner_group
skipping to change at page 110, line 31 skipping to change at page 110, line 32
service may also be used to accomplish the translation. A server may service may also be used to accomplish the translation. A server may
provide a more general service, not limited by any particular provide a more general service, not limited by any particular
translation (which would only translate a limited set of possible translation (which would only translate a limited set of possible
strings) by storing the owner and owner_group attributes in local strings) by storing the owner and owner_group attributes in local
storage without any translation or it may augment a translation storage without any translation or it may augment a translation
method by storing the entire string for attributes for which no method by storing the entire string for attributes for which no
translation is available while using the local representation for translation is available while using the local representation for
those cases in which a translation is available. those cases in which a translation is available.
Servers that do not provide support for all possible values of the Servers that do not provide support for all possible values of the
owner and owner_group attributes, should return an error owner and owner_group attributes, SHOULD return an error
(NFS4ERR_BADOWNER) when a string is presented that has no (NFS4ERR_BADOWNER) when a string is presented that has no
translation, as the value to be set for a SETATTR of the owner, translation, as the value to be set for a SETATTR of the owner,
owner_group, or acl attributes. When a server does accept an owner owner_group, or acl attributes. When a server does accept an owner
or owner_group value as valid on a SETATTR (and similarly for the or owner_group value as valid on a SETATTR (and similarly for the
owner and group strings in an acl), it is promising to return that owner and group strings in an acl), it is promising to return that
same string when a corresponding GETATTR is done. Configuration same string when a corresponding GETATTR is done. Configuration
changes and ill-constructed name translations (those that contain changes (including changes from the mapping of the string to the
aliasing) may make that promise impossible to honor. Servers should local representation) and ill-constructed name translations (those
make appropriate efforts to avoid a situation in which these that contain aliasing) may make that promise impossible to honor.
attributes have their values changed when no real change to ownership Servers should make appropriate efforts to avoid a situation in which
has occurred. these attributes have their values changed when no real change to
ownership has occurred.
The "dns_domain" portion of the owner string is meant to be a DNS The "dns_domain" portion of the owner string is meant to be a DNS
domain name. For example, user@ietf.org. Servers should accept as domain name. For example, user@ietf.org. Servers should accept as
valid a set of users for at least one domain. A server may treat valid a set of users for at least one domain. A server may treat
other domains as having no valid translations. A more general other domains as having no valid translations. A more general
service is provided when a server is capable of accepting users for service is provided when a server is capable of accepting users for
multiple domains, or for all domains, subject to security multiple domains, or for all domains, subject to security
constraints. constraints.
In the case where there is no translation available to the client or In the case where there is no translation available to the client or
server, the attribute value must be constructed without the "@". server, the attribute value must be constructed without the "@".
Therefore, the absence of the @ from the owner or owner_group Therefore, the absence of the @ from the owner or owner_group
attribute signifies that no translation was available at the sender attribute signifies that no translation was available at the sender
and that the receiver of the attribute should not use that string as and that the receiver of the attribute should not use that string as
a basis for translation into its own internal format. Even though a basis for translation into its own internal format. Even though
the attribute value can not be translated, it may still be useful. the attribute value can not be translated, it may still be useful.
In the case of a client, the attribute string may be used for local In the case of a client, the attribute string may be used for local
display of ownership. display of ownership.
To provide a greater degree of compatibility with NFSv3, which To provide a greater degree of compatibility with NFSv3, which
identified users and groups by 32-bit unsigned uid's and gid's, owner identified users and groups by 32-bit unsigned user identifiers and
and group strings that consist of decimal numeric values with no group identifiers, owner and group strings that consist of decimal
leading zeros can be given a special interpretation by clients and numeric values with no leading zeros can be given a special
servers which choose to provide such support. The receiver may treat interpretation by clients and servers which choose to provide such
such a user or group string as representing the same user as would be support. The receiver may treat such a user or group string as
represented by an NFSv3 uid or gid having the corresponding numeric representing the same user as would be represented by an NFSv3 uid or
value. A server is not obligated to accept such a string, but may gid having the corresponding numeric value. A server is not
return an NFS4ERR_BADOWNER instead. To avoid this mechanism being obligated to accept such a string, but may return an NFS4ERR_BADOWNER
used to subvert user and group translation, so that a client might instead. To avoid this mechanism being used to subvert user and
pass all of the owners and groups in numeric form, a server SHOULD group translation, so that a client might pass all of the owners and
return an NFS4ERR_BADOWNER error when there is a valid translation groups in numeric form, a server SHOULD return an NFS4ERR_BADOWNER
for the user or owner designated in this way. In that case, the error when there is a valid translation for the user or owner
client must use the appropriate name@domain string and not the designated in this way. In that case, the client must use the
special form for compatibility. appropriate name@domain string and not the special form for
compatibility.
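The receiver-side interpretation described in the preceding paragraphs can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name and the returned tags are hypothetical, the string is split at the last "@", and the single digit "0" is treated as having no leading zeros. Whether a numeric string is then accepted or rejected with NFS4ERR_BADOWNER remains the server's decision.

```python
import re

# Decimal digits with no leading zeros. Accepting "0" itself is an assumption.
_NUMERIC = re.compile(r"(0|[1-9][0-9]*)")


def classify_owner(value):
    """Classify an owner or owner_group string by its form."""
    if "@" in value:
        # name@dns_domain form; splitting at the last "@" is an assumption.
        name, _, domain = value.rpartition("@")
        return ("name@domain", name, domain)
    if _NUMERIC.fullmatch(value) and int(value) <= 0xFFFFFFFF:
        # Candidate for the NFSv3-style 32-bit unsigned uid/gid interpretation.
        return ("numeric", int(value))
    # No "@": the sender had no translation, so do not translate locally.
    return ("untranslated", value)
```

A client might still display an "untranslated" value locally, as the text above notes, without mapping it to an internal identifier.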
The owner string "nobody" may be used to designate an anonymous user, The owner string "nobody" may be used to designate an anonymous user,
which will be associated with a file created by a security principal which will be associated with a file created by a security principal
that cannot be mapped through normal means to the owner attribute. that cannot be mapped through normal means to the owner attribute.
5.9. Character Case Attributes 5.9. Character Case Attributes
With respect to the case_insensitive and case_preserving attributes, With respect to the case_insensitive and case_preserving attributes,
each UCS-4 character (which UTF-8 encodes) has a "long descriptive each UCS-4 character (which UTF-8 encodes) has a "long descriptive
name" RFC1345 [35] which may or may not included the word "CAPITAL" name" RFC1345 [35] which may or may not include the word "CAPITAL" or
or "SMALL". The presence of SMALL or CAPITAL allows an NFS server to "SMALL". The presence of SMALL or CAPITAL allows an NFS server to
implement unambiguous and efficient table driven mappings for case implement unambiguous and efficient table driven mappings for case
insensitive comparisons, and non-case-preserving storage. For insensitive comparisons, and non-case-preserving storage. For
general character handling and internationalization issues, see general character handling and internationalization issues, see
Section 14. Section 14.
5.10. Directory Notification Attributes 5.10. Directory Notification Attributes
As described in Section 18.39, the client can request a minimum delay As described in Section 18.39, the client can request a minimum delay
for notifications of changes to attributes, but the server is free to for notifications of changes to attributes, but the server is free to
ignore what the client requests. The client can determine in advance ignore what the client requests. The client can determine in advance
skipping to change at page 112, line 24 skipping to change at page 112, line 27
5.10.2. Attribute 57: dirent_notif_delay 5.10.2. Attribute 57: dirent_notif_delay
The dirent_notif_delay attribute is the minimum number of seconds the The dirent_notif_delay attribute is the minimum number of seconds the
server will delay before notifying the client of a change to a file server will delay before notifying the client of a change to a file
object that has an entry in the directory. object that has an entry in the directory.
5.11. pNFS Attribute Definitions 5.11. pNFS Attribute Definitions
5.11.1. Attribute 62: fs_layout_type 5.11.1. Attribute 62: fs_layout_type
The fs_layout_type attribute (data type layouttype4 (Section 3.3.13)) The fs_layout_type attribute (see Section 3.3.13) applies to a file
applies to a file system and indicates what layout types are system and indicates what layout types are supported by the file
supported by the file system. When the client encounters a new fsid, system. When the client encounters a new fsid, the client SHOULD
the client should obtain the value for the fs_layout_type attribute obtain the value for the fs_layout_type attribute associated with the
associated with the new file system. This attribute is used by the new file system. This attribute is used by the client to determine
client to determine if the layout types supported by the server match if the layout types supported by the server match any of the client's
any of the client's supported layout types. supported layout types.
5.11.2. Attribute 66: layout_alignment 5.11.2. Attribute 66: layout_alignment
When a client has layouts for a file system, the layout_alignment When a client has layouts for a file system, the layout_alignment
attribute indicates the preferred alignment for I/O to files on that attribute indicates the preferred alignment for I/O to files on that
file system. Where possible, the client should send READ and WRITE file system. Where possible, the client should send READ and WRITE
operations with offsets that are whole multiples of the operations with offsets that are whole multiples of the
layout_alignment attribute. layout_alignment attribute.
5.11.3. Attribute 65: layout_blksize 5.11.3. Attribute 65: layout_blksize
When a client has layouts for a file system, the layout_blksize When a client has layouts for a file system, the layout_blksize
attribute indicates the preferred block size for I/O to files on that attribute indicates the preferred block size for I/O to files on that
file system. Where possible, the client should send READ operations file system. Where possible, the client should send READ operations
with a count argument that is a whole multiple of layout_blksize, and with a count argument that is a whole multiple of layout_blksize, and
WRITE operations with a data argument of size that is a whole WRITE operations with a data argument of size that is a whole
multiple of layout_blksize. multiple of layout_blksize.
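The preferences expressed by layout_alignment and layout_blksize can be illustrated with a small Python sketch. The function names are hypothetical, both attribute values are assumed to be positive integers in bytes, and widening a READ to the preferred shape is only one possible client strategy, not something the attribute definitions require.

```python
def is_preferred_read(offset, count, layout_alignment, layout_blksize):
    """True if offset and count are whole multiples of the preferred values."""
    return offset % layout_alignment == 0 and count % layout_blksize == 0


def round_down(value, multiple):
    """Largest multiple of `multiple` that is <= value."""
    return value - (value % multiple)


def round_up(value, multiple):
    """Smallest multiple of `multiple` that is >= value."""
    return -(-value // multiple) * multiple


def aligned_read_range(offset, count, layout_alignment, layout_blksize):
    """Widen (offset, count) so the READ covers the original byte range,
    starts on a layout_alignment boundary, and has a count that is a
    whole multiple of layout_blksize."""
    start = round_down(offset, layout_alignment)
    length = round_up(offset + count - start, layout_blksize)
    return start, length
```

With layout_alignment 4096 and layout_blksize 8192, a request for 3000 bytes at offset 5000 would be widened by this sketch to 8192 bytes at offset 4096.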
5.11.4. Attribute 63: layout_hint 5.11.4. Attribute 63: layout_hint
The layout_hint attribute (data type layouthint4 (Section 3.3.19)) The layout_hint attribute (see Section 3.3.19) may be set on newly
may be set on newly created files to influence the metadata server's created files to influence the metadata server's choice for the
choice for the file's layout. If possible, this attribute is one of file's layout. If possible, this attribute is one of those set in
those set in the initial attributes within the OPEN operation. The the initial attributes within the OPEN operation. The metadata
metadata server may choose to ignore this attribute. The layout_hint server may choose to ignore this attribute. The layout_hint
attribute is a sub-set of the layout structure returned by LAYOUTGET. attribute is a sub-set of the layout structure returned by LAYOUTGET.
For example, instead of specifying particular devices, this would be For example, instead of specifying particular devices, this would be
used to suggest the stripe width of a file. The server used to suggest the stripe width of a file. The server
implementation determines which fields within the layout will be implementation determines which fields within the layout will be
used. used.
5.11.5. Attribute 64: layout_type 5.11.5. Attribute 64: layout_type
This attribute lists the layout type(s) available for a file. The This attribute lists the layout type(s) available for a file. The
value returned by the server is for informational purposes only. The value returned by the server is for informational purposes only. The
skipping to change at page 113, line 33 skipping to change at page 113, line 33
needed in order to perform I/O. For example, the specific device needed in order to perform I/O. For example, the specific device
information for the file and its layout. information for the file and its layout.
5.11.6. Attribute 68: mdsthreshold 5.11.6. Attribute 68: mdsthreshold
This attribute is a server provided hint used to communicate to the This attribute is a server provided hint used to communicate to the
client when it is more efficient to send READ and WRITE operations to client when it is more efficient to send READ and WRITE operations to
the metadata server or the data server. The two types of thresholds the metadata server or the data server. The two types of thresholds
described are file size thresholds and I/O size thresholds. If a described are file size thresholds and I/O size thresholds. If a
file's size is smaller than the file size threshold, data accesses file's size is smaller than the file size threshold, data accesses
should be sent to the metadata server. If an I/O is below the I/O SHOULD be sent to the metadata server. If an I/O request has a
size threshold, the I/O should be sent to the metadata server. As length that is below the I/O size threshold, the I/O SHOULD be sent
defined, each threshold type is specified separately for READ and to the metadata server. Each threshold type is specified separately
WRITE. for READ and WRITE.
The server may provide both types of thresholds for a file. If both The server MAY provide both types of thresholds for a file. If both
file size and I/O size are provided, the client should exceed both file size and I/O size are provided, the client SHOULD reach or
thresholds before issuing its READ or WRITE requests to the data exceed both thresholds before issuing its READ or WRITE requests to
server. Alternatively, if only one of the specified thresholds is the data server. Alternatively, if only one of the specified
exceeded, the I/O requests are sent to the metadata server. thresholds is reached or exceeded, the I/O requests are sent to the
metadata server.
For each threshold type, a value of 0 indicates no READ or WRITE For each threshold type, a value of 0 indicates no READ or WRITE
should be sent to the metadata server, while a value of all 1s should be sent to the metadata server, while a value of all 1s
indicates all READS or WRITES should be sent to the metadata server. indicates all READS or WRITES should be sent to the metadata server.
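The threshold rules above can be sketched as a small decision function. This is only one possible reading of the text, not a definitive client implementation. The function and parameter names are hypothetical. The sketch assumes 64-bit threshold values, treats a threshold the server did not provide as None, and assumes that when no threshold is provided the I/O goes to the data server.

```python
from typing import Optional

ALL_ONES = 0xFFFFFFFFFFFFFFFF  # assumed 64-bit "all 1s" value


def below_threshold(value: int, threshold: Optional[int]) -> bool:
    """True if this threshold directs the I/O to the metadata server."""
    if threshold is None:       # threshold not provided by the server
        return False
    if threshold == ALL_ONES:   # all 1s: everything goes to the metadata server
        return True
    return value < threshold    # a threshold of 0 can never be undercut


def use_data_server(file_size: int, io_length: int,
                    size_threshold: Optional[int] = None,
                    io_threshold: Optional[int] = None) -> bool:
    """Send to the data server only if every provided threshold is
    reached or exceeded; otherwise use the metadata server."""
    return not (below_threshold(file_size, size_threshold)
                or below_threshold(io_length, io_threshold))
```

The same function would be called once with the READ thresholds and once with the WRITE thresholds, since each threshold type is specified separately for the two operations.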
The attribute is available on a per filehandle basis. If the current The attribute is available on a per filehandle basis. If the current
filehandle refers to a non-pNFS file or directory, the metadata filehandle refers to a non-pNFS file or directory, the metadata
server should return an attribute that is representative of the server should return an attribute that is representative of the
filehandle's file system. It is suggested that this attribute is filehandle's file system. It is suggested that this attribute is
queried as part of the OPEN operation. Due to dynamic system queried as part of the OPEN operation. Due to dynamic system
skipping to change at page 114, line 24 skipping to change at page 114, line 25
reached. reached.
When retention is enabled, retention MUST extend to the data of the When retention is enabled, retention MUST extend to the data of the
file, and the name of file. The server MAY extend retention to any file, and the name of file. The server MAY extend retention to any
other property of the file, including any subset of REQUIRED, other property of the file, including any subset of REQUIRED,
RECOMMENDED, and named attributes, with the exceptions noted in this RECOMMENDED, and named attributes, with the exceptions noted in this
section. section.
Servers MAY support or not support retention on any file object type. Servers MAY support or not support retention on any file object type.
The five retention attributes are as follows: The five retention attributes are explained in the next subsections.
5.12.1. Attribute 69: retention_get 5.12.1. Attribute 69: retention_get
If retention is enabled for the associated file, this attribute's If retention is enabled for the associated file, this attribute's
value represents the retention begin time of the file object. This value represents the retention begin time of the file object. This
attribute's value is only readable with the GETATTR operation and may attribute's value is only readable with the GETATTR operation and may
not be modified by the SETATTR operation. The value of the attribute not be modified by the SETATTR operation. The value of the attribute
consists of: consists of:
const RET4_DURATION_INFINITE = 0xffffffffffffffff; const RET4_DURATION_INFINITE = 0xffffffffffffffff;
skipping to change at page 115, line 43 skipping to change at page 115, line 48
5.12.4. Attribute 72: retentevt_set 5.12.4. Attribute 72: retentevt_set
Set the event-based retention duration, and optionally enable event- Set the event-based retention duration, and optionally enable event-
based retention on the file object. This attribute corresponds to based retention on the file object. This attribute corresponds to
retentevt_get, is like retention_set, but refers to event-based retentevt_get, is like retention_set, but refers to event-based
retention. When event-based retention is set, the file MUST be retention. When event-based retention is set, the file MUST be
retained even if non-event-based retention has been set, and the retained even if non-event-based retention has been set, and the
duration of non-event-based retention has been reached. Conversely, duration of non-event-based retention has been reached. Conversely,
when non-event-based retention has been set, the file MUST be when non-event-based retention has been set, the file MUST be
retained even the event-based retention has been set, and the retained even if event-based retention has been set, and the duration
duration of event-based retention has been reached. The server MAY of event-based retention has been reached. The server MAY restrict
restrict the enabling of event-based retention or the duration of the enabling of event-based retention or the duration of event-based
event-based retention on the basis of the ACE4_WRITE_RETENTION ACL retention on the basis of the ACE4_WRITE_RETENTION ACL permission.
permission. The enabling of event-based retention does not prevent The enabling of event-based retention does not prevent the enabling
the enabling of non-event-based retention nor the modification of the of non-event-based retention nor the modification of the
retention_hold attribute. retention_hold attribute.
5.12.5. Attribute 73: retention_hold 5.12.5. Attribute 73: retention_hold
Get or set administrative retention holds, one hold per bit position. Get or set administrative retention holds, one hold per bit position.
This attribute allows one to 64 administrative holds, one hold per This attribute allows one to 64 administrative holds, one hold per
bit on the attribute. If retention_hold is not zero, then the file bit on the attribute. If retention_hold is not zero, then the file
MUST NOT be deleted, renamed, or modified, even if the duration on MUST NOT be deleted, renamed, or modified, even if the duration on
enabled event or non-event-based retention has been reached. The enabled event or non-event-based retention has been reached. The
skipping to change at page 160, line 13 skipping to change at page 160, line 13
type locking requests are allowed, unless the server is able to type locking requests are allowed, unless the server is able to
reliably determine (through state persistently maintained across reliably determine (through state persistently maintained across
reboot instances), that granting any such lock cannot possibly reboot instances), that granting any such lock cannot possibly
conflict with a subsequent reclaim. When a request is made to obtain conflict with a subsequent reclaim. When a request is made to obtain
a new lock (i.e. not a reclaim-type request) during the grace period a new lock (i.e. not a reclaim-type request) during the grace period
and such a determination cannot be made, the server must return the and such a determination cannot be made, the server must return the
error NFS4ERR_GRACE. error NFS4ERR_GRACE.
Once a session is established using the new client ID, the client Once a session is established using the new client ID, the client
will use reclaim-type locking requests (e.g. LOCK requests with will use reclaim-type locking requests (e.g. LOCK requests with
reclaim set to true and OPEN operations with a claim type of reclaim set to TRUE and OPEN operations with a claim type of
CLAIM_PREVIOUS. See Section 9.11) to re-establish its locking state. CLAIM_PREVIOUS. See Section 9.11) to re-establish its locking state.
Once this is done, or if there is no such locking state to reclaim, Once this is done, or if there is no such locking state to reclaim,
the client sends a global RECLAIM_COMPLETE operation, i.e. one with the client sends a global RECLAIM_COMPLETE operation, i.e. one with
the one_fs argument set to false, to indicate that it has reclaimed the rca_one_fs argument set to FALSE, to indicate that it has
all of the locking state that it will reclaim. Once a client sends reclaimed all of the locking state that it will reclaim. Once a
such a RECLAIM_COMPLETE operation, it may attempt non-reclaim locking client sends such a RECLAIM_COMPLETE operation, it may attempt non-
operations, although it may get NFS4ERR_GRACE errors from the operations reclaim locking operations, although it may get NFS4ERR_GRACE errors
until the period of special handling is over. See Section 11.7.7 for from the operations until the period of special handling is over. See
a discussion of the analogous handling of lock reclamation in the case Section 11.7.7 for a discussion of the analogous handling of lock
of file systems transitioning from server to server. reclamation in the case of file systems transitioning from server to
server.
During the grace period, the server must reject READ and WRITE During the grace period, the server must reject READ and WRITE
operations and non-reclaim locking requests (i.e. other LOCK and OPEN operations and non-reclaim locking requests (i.e. other LOCK and OPEN
operations) with an error of NFS4ERR_GRACE, unless it is able to operations) with an error of NFS4ERR_GRACE, unless it is able to
guarantee that these may be done safely, as described below. guarantee that these may be done safely, as described below.
The grace period may last until all clients who are known to possibly The grace period may last until all clients who are known to possibly
have had locks have done a global RECLAIM_COMPLETE operation, have had locks have done a global RECLAIM_COMPLETE operation,
indicating that they have finished reclaiming the locks they held indicating that they have finished reclaiming the locks they held
before the server reboot. This means that a client which has done a before the server reboot. This means that a client which has done a
skipping to change at page 196, line 34 skipping to change at page 196, line 34
storage is OPTIONAL. storage is OPTIONAL.
As discussed earlier in this section, the client MAY return the same As discussed earlier in this section, the client MAY return the same
cc value on subsequent CB_GETATTR calls, even if the file was cc value on subsequent CB_GETATTR calls, even if the file was
modified in the client's cache yet again between successive modified in the client's cache yet again between successive
CB_GETATTR calls. Therefore, the server must assume that the file CB_GETATTR calls. Therefore, the server must assume that the file
has been modified yet again, and MUST take care to ensure that the has been modified yet again, and MUST take care to ensure that the
new nsc it constructs and returns is greater than the previous nsc it new nsc it constructs and returns is greater than the previous nsc it
returned. An example implementation's delegation record would returned. An example implementation's delegation record would
satisfy this mandate by including a boolean field (let us call it satisfy this mandate by including a boolean field (let us call it
"modified") that is set to false when the delegation is granted, and "modified") that is set to FALSE when the delegation is granted, and
an sc value set at the time of grant to the change attribute value. an sc value set at the time of grant to the change attribute value.
The modified field would be set to true the first time cc != sc, and The modified field would be set to true the first time cc != sc, and
would stay true until the delegation is returned or revoked. The would stay true until the delegation is returned or revoked. The
processing for constructing nsc, time_modify, and time_metadata would processing for constructing nsc, time_modify, and time_metadata would
use this pseudo code: use this pseudo code:
if (!modified) { if (!modified) {
do CB_GETATTR for change and size; do CB_GETATTR for change and size;
if (cc != sc) if (cc != sc)
skipping to change at page 231, line 15 skipping to change at page 231, line 15
reclaim after server reboot (although in the case of the planned reclaim after server reboot (although in the case of the planned
state transfer associated with migration, these can be avoided by state transfer associated with migration, these can be avoided by
securely recording lock state as part of state migration). Unless securely recording lock state as part of state migration). Unless
the destination server can guarantee that locks will not be the destination server can guarantee that locks will not be
incorrectly granted, the destination server should not allow lock incorrectly granted, the destination server should not allow lock
reclaims and should avoid establishing a grace period. reclaims and should avoid establishing a grace period.
Once all locks have been reclaimed, or there were no locks to Once all locks have been reclaimed, or there were no locks to
reclaim, the client indicates that there are no more reclaims to be reclaim, the client indicates that there are no more reclaims to be
done for the file system in question by issuing a RECLAIM_COMPLETE done for the file system in question by issuing a RECLAIM_COMPLETE
operation with the one_fs parameter set to true. Once this has been operation with the rca_one_fs parameter set to true. Once this has
done, non-reclaim locking operations may be done, and any subsequent been done, non-reclaim locking operations may be done, and any
request to do reclaims will be rejected with the error subsequent request to do reclaims will be rejected with the error
NFS4ERR_NO_GRACE. NFS4ERR_NO_GRACE.
Information about client identity may be propagated between servers Information about client identity may be propagated between servers
in the form of client_owner4 and associated verifiers, under the in the form of client_owner4 and associated verifiers, under the
assumption that the client presents the same values to all the assumption that the client presents the same values to all the
servers with which it deals. servers with which it deals.
Servers are encouraged to provide facilities to allow locks to be Servers are encouraged to provide facilities to allow locks to be
reclaimed on the new server after a file system transition. Often, reclaimed on the new server after a file system transition. Often,
however, in cases in which the two servers do not share a server however, in cases in which the two servers do not share a server
skipping to change at page 268, line 20 skipping to change at page 268, line 20
the server supports and the client is prepared to use. The layout the server supports and the client is prepared to use. The layout
returned to the client may not exactly align with the requested byte returned to the client may not exactly align with the requested byte
range. A field within the LAYOUTGET request, loga_minlength, range. A field within the LAYOUTGET request, loga_minlength,
specifies the minimum length of the layout. The loga_minlength field specifies the minimum length of the layout. The loga_minlength field
should be at least one. As needed a client may make multiple should be at least one. As needed a client may make multiple
LAYOUTGET requests; these will result in multiple overlapping, non- LAYOUTGET requests; these will result in multiple overlapping, non-
conflicting layouts. conflicting layouts.
In order to get a layout, the client must first have opened the file In order to get a layout, the client must first have opened the file
via the OPEN operation. When a client has no layout on a file, it via the OPEN operation. When a client has no layout on a file, it
presents a stateid as returned by OPEN, a delegation stateid, or a MUST present a stateid as returned by OPEN, a delegation stateid, or
byte-range lock stateid in the loga_stateid argument. A successful a byte-range lock stateid in the loga_stateid argument. A successful
LAYOUTGET result includes a layout stateid. The first successful LAYOUTGET result includes a layout stateid. The first successful
LAYOUTGET processed by the server using a non-layout stateid as an LAYOUTGET processed by the server using a non-layout stateid as an
argument MUST have the "seqid" field of the layout stateid in the argument MUST have the "seqid" field of the layout stateid in the
response set to one. Thereafter, the client uses a layout stateid response set to one. Thereafter, the client uses a layout stateid
(see Section 12.5.3) on future invocations of LAYOUTGET on the file, (see Section 12.5.3) on future invocations of LAYOUTGET on the file,
and the "seqid" MUST NOT ever be set to zero. Once the layout has and the "seqid" MUST NOT ever be set to zero. Once the layout has
been retrieved, it can be held across multiple OPEN and CLOSE been retrieved, it can be held across multiple OPEN and CLOSE
sequences. Therefore, a client may hold a layout for a file that is sequences. Therefore, a client may hold a layout for a file that is
not currently open by any user on the client. This allows for the not currently open by any user on the client. This allows for the
caching of layouts beyond CLOSE. caching of layouts beyond CLOSE.
skipping to change at page 270, line 10 skipping to change at page 270, line 10
CB_LAYOUTRECALL request. Simply seeing the result or the CB_LAYOUTRECALL request. Simply seeing the result or the
CB_LAYOUTRECALL request is not sufficient cause to use the seqid. CB_LAYOUTRECALL request is not sufficient cause to use the seqid.
For LAYOUTGET results, if the client is not using the forgetful model For LAYOUTGET results, if the client is not using the forgetful model
(Section 12.5.5.1), it MUST first update its record of what ranges of (Section 12.5.5.1), it MUST first update its record of what ranges of
the file's layout it has before using the seqid. For LAYOUTRETURN the file's layout it has before using the seqid. For LAYOUTRETURN
results, the client MUST delete the range from its record of what results, the client MUST delete the range from its record of what
ranges of the file's layout it had before using the seqid. For ranges of the file's layout it had before using the seqid. For
CB_LAYOUTRECALL arguments, the client MUST send a response to the CB_LAYOUTRECALL arguments, the client MUST send a response to the
recall before using the seqid. recall before using the seqid.
Once a client has no more layouts on a file, the layout stateid is no
longer valid, and MUST NOT be used. Any attempt to use such a layout
stateid will result in NFS4ERR_BAD_STATEID.
12.5.4. Committing a Layout 12.5.4. Committing a Layout
Allowing for varying storage protocols capabilities, the pNFS Allowing for varying storage protocols capabilities, the pNFS
protocol does not require the metadata server and storage devices to protocol does not require the metadata server and storage devices to
have a consistent view of file attributes and data location mappings. have a consistent view of file attributes and data location mappings.
Data location mapping refers to aspects such as which offsets store Data location mapping refers to aspects such as which offsets store
data as opposed to storing holes (see Section 13.4.4 for a data as opposed to storing holes (see Section 13.4.4 for a
discussion). Related issues arise for storage protocols where a discussion). Related issues arise for storage protocols where a
layout may hold provisionally allocated blocks where the allocation layout may hold provisionally allocated blocks where the allocation
of those blocks does not survive a complete restart of both the of those blocks does not survive a complete restart of both the
skipping to change at page 271, line 5 skipping to change at page 271, line 8
The control protocol is free to synchronize the attributes before it The control protocol is free to synchronize the attributes before it
receives a LAYOUTCOMMIT, however upon successful completion of a receives a LAYOUTCOMMIT, however upon successful completion of a
LAYOUTCOMMIT, state that exists on the metadata server that describes LAYOUTCOMMIT, state that exists on the metadata server that describes
the file MUST be in sync with the state existing on the storage the file MUST be in sync with the state existing on the storage
devices that comprise that file as of the issuing client's last devices that comprise that file as of the issuing client's last
operation. Thus, a client that queries the size of a file between a operation. Thus, a client that queries the size of a file between a
WRITE to a storage device and the LAYOUTCOMMIT may observe a size WRITE to a storage device and the LAYOUTCOMMIT may observe a size
that does not reflect the actual data written. that does not reflect the actual data written.
The client MUST have a layout in order to issue LAYOUTCOMMIT.
12.5.4.1. LAYOUTCOMMIT and change/time_modify 12.5.4.1. LAYOUTCOMMIT and change/time_modify
The change and time_modify attributes may be updated by the server The change and time_modify attributes may be updated by the server
when the LAYOUTCOMMIT operation is processed. The reason for this is when the LAYOUTCOMMIT operation is processed. The reason for this is
that some layout types do not support the update of these attributes that some layout types do not support the update of these attributes
when the storage devices process I/O operations. The client is when the storage devices process I/O operations. If the client has a
capable of providing a suggested value to the server for time_modify layout with the LAYOUTIOMODE4_RW iomode on the file, the client MAY
within the arguments to LAYOUTCOMMIT. Based on layout type, the provide a suggested value to the server for time_modify within the
provided value may or may not be used. The server should sanity arguments to LAYOUTCOMMIT. Based on the layout type, the provided
check the client provided values before they are used. For example, value may or may not be used. The server should sanity check the
the server should ensure that time does not flow backwards. The client provided values before they are used. For example, the server
client always has the option to set time_modify through an explicit should ensure that time does not flow backwards. The client always
SETATTR operation. has the option to set time_modify through an explicit SETATTR
operation.
For some layout protocols, the storage device is able to notify the For some layout protocols, the storage device is able to notify the
metadata server of the occurrence of an I/O and as a result the metadata server of the occurrence of an I/O and as a result the
change and time_modify attributes may be updated at the metadata change and time_modify attributes may be updated at the metadata
server. For a metadata server that is capable of monitoring updates server. For a metadata server that is capable of monitoring updates
to the change and time_modify attributes, LAYOUTCOMMIT processing is to the change and time_modify attributes, LAYOUTCOMMIT processing is
not required to update the change attribute; in this case the not required to update the change attribute; in this case the
metadata server must ensure that no further update to the data has metadata server must ensure that no further update to the data has
occurred since the last update of the attributes; file-based occurred since the last update of the attributes; file-based
protocols may have enough information to make this determination or protocols may have enough information to make this determination or
skipping to change at page 271, line 45 skipping to change at page 271, line 51
12.5.4.2. LAYOUTCOMMIT and size 12.5.4.2. LAYOUTCOMMIT and size
The size of a file may be updated when the LAYOUTCOMMIT operation is The size of a file may be updated when the LAYOUTCOMMIT operation is
used by the client. One of the fields in the argument to used by the client. One of the fields in the argument to
LAYOUTCOMMIT is loca_last_write_offset; this field indicates the LAYOUTCOMMIT is loca_last_write_offset; this field indicates the
highest byte offset written but not yet committed with the highest byte offset written but not yet committed with the
LAYOUTCOMMIT operation. The data type of loca_last_write_offset is LAYOUTCOMMIT operation. The data type of loca_last_write_offset is
newoffset4 and is switched on a boolean value, no_newoffset, that newoffset4 and is switched on a boolean value, no_newoffset, that
indicates if a previous write occurred or not. If no_newoffset is indicates if a previous write occurred or not. If no_newoffset is
FALSE, an offset is not given. A loca_last_write_offset value of FALSE, an offset is not given. If the client has a layout with
zero means that one byte was written at offset zero. LAYOUTIOMODE4_RW iomode on the file, with an lo_offset and lo_length
that overlaps loca_last_write_offset, then the client MAY set
no_newoffset to TRUE and provide an offset that will update the file
size. Keep in mind that offset is not the same as length, though
they are related. For example, a loca_last_write_offset value of
zero means that one byte was written at offset zero, and so the
length of the file is at least one byte.
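The relationship between offset and length described above amounts to a one-line calculation. The Python sketch below is illustrative only. The function name is hypothetical, and a missing offset (no_newoffset FALSE) is modelled as None.

```python
from typing import Optional


def implied_min_length(last_write_offset: Optional[int]) -> Optional[int]:
    """Minimum file length implied by the highest byte offset written.

    Returns None when no offset was supplied, since nothing can then be
    inferred about the file length.
    """
    if last_write_offset is None:
        return None
    if last_write_offset < 0:
        raise ValueError("offset must be non-negative")
    # A byte written at offset N means the file holds at least N + 1 bytes.
    return last_write_offset + 1
```

For an offset of zero this yields a minimum length of one byte, matching the example in the text.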
The metadata server may do one of the following: The metadata server may do one of the following:
1. Update the file's size using the last write offset provided by 1. Update the file's size using the last write offset provided by
the client as either the true file size or as a hint of the file the client as either the true file size or as a hint of the file
size. If the metadata server has a method available, any new size. If the metadata server has a method available, any new
value for file size should be sanity checked. For example, the value for file size should be sanity checked. For example, the
file must not be truncated if the client presents a last write file must not be truncated if the client presents a last write
offset less than the file's current size. offset less than the file's current size.
skipping to change at page 281, line 46 skipping to change at page 282, line 11
LAYOUTCOMMIT to commit the modification time and the new size of the LAYOUTCOMMIT to commit the modification time and the new size of the
file (if it believes it extended the file size) to the metadata file (if it believes it extended the file size) to the metadata
server and the modified data to the file system. server and the modified data to the file system.
12.7. Recovery 12.7. Recovery
Recovery is complicated by the distributed nature of the pNFS Recovery is complicated by the distributed nature of the pNFS
protocol. In general, crash recovery for layouts is similar to crash protocol. In general, crash recovery for layouts is similar to crash
recovery for delegations in the base NFSv4.1 protocol. However, the recovery for delegations in the base NFSv4.1 protocol. However, the
client's ability to perform I/O without contacting the metadata client's ability to perform I/O without contacting the metadata
server subtleties that must be handled correctly if the possibility server introduces subtleties that must be handled correctly if the
of file system corruption is to be avoided. [[Comment.4: mre: possibility of file system corruption is to be avoided.
layouts are bound to stateids]]
12.7.1. Recovery from Client Restart 12.7.1. Recovery from Client Restart
Client recovery for layouts is similar to client recovery for other Client recovery for layouts is similar to client recovery for other
lock and delegation state. When a pNFS client restarts, it will lock and delegation state. When a pNFS client restarts, it will
lose all information about the layouts that it previously owned. lose all information about the layouts that it previously owned.
There are two methods by which the server can reclaim these resources There are two methods by which the server can reclaim these resources
and allow otherwise conflicting layouts to be provided to other and allow otherwise conflicting layouts to be provided to other
clients. clients.
skipping to change at page 290, line 39 skipping to change at page 290, line 45
If a server is both a metadata server and a data server, the server If a server is both a metadata server and a data server, the server
might need to distinguish operations on files that are directed to might need to distinguish operations on files that are directed to
the metadata server from those that are directed to the data server. the metadata server from those that are directed to the data server.
It is RECOMMENDED that the values of the filehandles returned by the It is RECOMMENDED that the values of the filehandles returned by the
LAYOUTGET operation be different than the value of the filehandle LAYOUTGET operation be different than the value of the filehandle
returned by the OPEN of the same file. returned by the OPEN of the same file.
Another scenario is for the metadata server and the storage device to Another scenario is for the metadata server and the storage device to
be distinct from one client's point of view, and the roles reversed be distinct from one client's point of view, and the roles reversed
from another client's point of view. For example, in the cluster from another client's point of view. For example, in the cluster
file system model a metadata server to one client, may be a data file system model, a metadata server to one client may be a data
server to another client. If NFSv4.1 is being used as the storage server to another client. If NFSv4.1 is being used as the storage
protocol, then pNFS servers need to encode the values of filehandles protocol, then pNFS servers need to encode the values of filehandles
according to their specific roles. according to their specific roles.
13.1.1. Sessions Considerations for Data Servers
Section 2.10.9.2 states that a client has to keep its lease renewed
in order to prevent a session from being deleted by the server. If
the reply to EXCHANGE_ID has just the EXCHGID4_FLAG_USE_PNFS_DS role
set, then as noted in Section 13.6 the client will not be able to
determine the data server's lease_time attribute, because GETATTR
will not be permitted. Instead, the rule is that any time a client
receives a layout referring it to a data server that returns just the
EXCHGID4_FLAG_USE_PNFS_DS role, the client MAY assume that the
lease_time attribute from the metadata server that returned the
layout applies to the data server. Thus the data server MUST be
aware of the values of all lease_time attributes of all metadata
servers it is providing I/O for, and MUST use the maximum of all such
lease_time values as the lease interval for all client IDs and
sessions established on it.
For example, if one metadata server has a lease_time attribute of 20
seconds, and a second metadata server has a lease_time attribute of
10 seconds, then if both servers return layouts that refer to an
EXCHGID4_FLAG_USE_PNFS_DS-only data server, the data server MUST
renew a client's lease if the interval between two SEQUENCE
operations on different COMPOUND requests is less than 20 seconds.
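The lease rule above reduces to taking a maximum over the metadata servers' lease_time values. A minimal sketch in Python follows; the function names and the list-of-seconds representation are illustrative assumptions and are not defined by the protocol.

```python
def data_server_lease_interval(mds_lease_times):
    """Lease interval (seconds) a DS-only server applies to every client ID
    and session: the maximum lease_time among the metadata servers it serves."""
    if not mds_lease_times:
        raise ValueError("data server is not serving any metadata server")
    return max(mds_lease_times)


def sequence_renews_lease(seconds_since_last_sequence, mds_lease_times):
    """True if a SEQUENCE arriving this long after the previous one still
    falls inside the lease interval, so the lease is renewed."""
    return seconds_since_last_sequence < data_server_lease_interval(mds_lease_times)
```

With lease_time values of 20 and 10 seconds, as in the example above, the interval is 20 seconds, so a gap of 15 seconds between SEQUENCE operations renews the lease even though it exceeds the smaller value.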
13.2. File Layout Definitions 13.2. File Layout Definitions
The following definitions apply to the LAYOUT4_NFSV4_1_FILES layout The following definitions apply to the LAYOUT4_NFSV4_1_FILES layout
type, and may be applicable to other layout types. type, and may be applicable to other layout types.
Unit. A unit is a fixed size quantity of data written to a data Unit. A unit is a fixed size quantity of data written to a data
server. server.
Pattern. A pattern is a method of distributing one or more equal Pattern. A pattern is a method of distributing one or more equal
sized units across a set of data servers. A pattern is iterated sized units across a set of data servers. A pattern is iterated
skipping to change at page 304, line 20 skipping to change at page 305, line 20
personalities, each COMPOUND sent by the client MUST be constructed personalities, each COMPOUND sent by the client MUST be constructed
so that it is appropriate to one of the two personalities, and must so that it is appropriate to one of the two personalities, and must
not contain operations directed to a mix of those personalities. The not contain operations directed to a mix of those personalities. The
server MUST enforce this. To understand the constraints, operations server MUST enforce this. To understand the constraints, operations
within a COMPOUND are divided into the following three classes: within a COMPOUND are divided into the following three classes:
1. An operation which is ambiguous regarding its personality 1. An operation which is ambiguous regarding its personality
assignment. These include all of the data-server housekeeping assignment. These include all of the data-server housekeeping
operations. Additionally, if the server has assigned filehandles operations. Additionally, if the server has assigned filehandles
so that the ones defined by the layout are the same as those used so that the ones defined by the layout are the same as those used
by the meta-data server, all operations in the second class are by the metadata server, all operations in the second class are
within this group unless a stateid used is incompatible with a within this group unless a stateid used is incompatible with a
data-server personality in that it is a special stateid or has a data-server personality in that it is a special stateid or has a
non-zero seqid field. non-zero seqid field.
2. An operation which is referable to the data server personality. 2. An operation which is referable to the data server personality.
These are data-server I/O operations where the filehandle is one These are data-server I/O operations where the filehandle is one
that can only be validly directed to the data-server personality. that can only be validly directed to the data-server personality.
3. An operation which is referable to the non-data-server 3. An operation which is referable to the non-data-server
personality. These include all COMPOUND operations that are personality. These include all COMPOUND operations that are
skipping to change at page 305, line 41 skipping to change at page 306, line 41
has completed (see Section 12.5.4.2). Section 13.10 describes the has completed (see Section 12.5.4.2). Section 13.10 describes the
mechanism by which the client is to handle data server files that do mechanism by which the client is to handle data server files that do
not reflect the metadata server's size. not reflect the metadata server's size.
13.7. COMMIT Through Metadata Server 13.7. COMMIT Through Metadata Server
The file layout provides two alternate means of providing for the The file layout provides two alternate means of providing for the
commit of data written through data servers. The flag commit of data written through data servers. The flag
NFL4_UFLG_COMMIT_THRU_MDS in the field nfl_util of the file layout NFL4_UFLG_COMMIT_THRU_MDS in the field nfl_util of the file layout
(data type nfsv4_1_file_layout4) is an indication from the metadata (data type nfsv4_1_file_layout4) is an indication from the metadata
server to the client of the preferred way of performing COMMIT, server to the client of the REQUIRED way of performing COMMIT, either
either by sending the COMMIT to the data server or the metadata by sending the COMMIT to the data server or the metadata server.
server. These two methods of dealing with the issue correspond to These two methods of dealing with the issue correspond to broad
broad styles of implementation for a pNFS server supporting the files styles of implementation for a pNFS server supporting the files
layout type. layout type.
o When the flag is false, COMMIT operations are to be done to the o When the flag is FALSE, COMMIT operations MUST be sent to the
data server to which the corresponding writes were done. This data server to which the corresponding WRITE operations were sent.
approach is most useful when striping of files is implemented as This approach is most useful when striping of files is implemented
part of pNFS server, with the individual data servers each as part of pNFS server, with the individual data servers each
implementing their own file systems. implementing their own file systems.
o When the flag is true, COMMIT operations are done to the metadata o When the flag is TRUE, COMMIT operations MUST be sent to the
server, rather than to the individual data servers. This approach metadata server, rather than to the individual data servers. This
is most useful when the pNFS server is implemented on top of a approach is most useful when the pNFS server is implemented on top
clustered file system. In such an implementation, sending of a clustered file system. In such an implementation, sending
COMMITs to multiple data servers may result in repeated writes of COMMITs to multiple data servers may result in repeated writes of
metadata blocks as each individual COMMIT is executed, to the metadata blocks as each individual COMMIT is executed, to the
detriment of write performance. Sending a single COMMIT to the detriment of write performance. Sending a single COMMIT to the
metadata server can provide more efficiency when there exists a metadata server can provide more efficiency when there exists a
clustered file system capable of implementing such a co-ordinated clustered file system capable of implementing such a co-ordinated
COMMIT. COMMIT.
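The two cases above amount to a single branch on the flag bit in nfl_util. A minimal sketch in Python follows. The numeric flag value, the "mds" label, and the function name are assumptions made for illustration; this section does not give the flag's value.

```python
# Assumed bit value, for illustration only; not stated in this section.
NFL4_UFLG_COMMIT_THRU_MDS = 0x00000002


def commit_targets(nfl_util, data_servers_written):
    """Return the servers to which the client sends COMMIT.

    data_servers_written lists the data servers that received WRITE
    operations, in the order they were written to.
    """
    if nfl_util & NFL4_UFLG_COMMIT_THRU_MDS:
        # Flag set: one COMMIT goes to the metadata server.
        return ["mds"]
    # Flag clear: COMMIT goes to each data server that received WRITEs
    # (duplicates removed, original order kept).
    return list(dict.fromkeys(data_servers_written))
```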
If nfl_util & NFL4_UFLG_COMMIT_THRU_MDS is TRUE, then in order to If nfl_util & NFL4_UFLG_COMMIT_THRU_MDS is TRUE, then in order to
maintain the current NFSv4.1 commit and recovery model, the data maintain the current NFSv4.1 commit and recovery model, the data
servers MUST return a common writeverf verifier in all WRITE servers MUST return a common writeverf verifier in all WRITE
skipping to change at page 314, line 32 skipping to change at page 315, line 32
Table B.1 Table B.1
Table B.2 is normally not part of the nfs4_cs_prep profile as it is Table B.2 is normally not part of the nfs4_cs_prep profile as it is
primarily for dealing with case-insensitive comparisons. However, if primarily for dealing with case-insensitive comparisons. However, if
the NFSv4.1 file server supports the case_insensitive file system the NFSv4.1 file server supports the case_insensitive file system
attribute, and if case_insensitive is true, the NFSv4.1 server MUST attribute, and if case_insensitive is true, the NFSv4.1 server MUST
use Table B.2 (in addition to Table B.1) when processing utf8str_cs use Table B.2 (in addition to Table B.1) when processing utf8str_cs
strings, and the NFSv4.1 client MUST assume Table B.2 (in addition to strings, and the NFSv4.1 client MUST assume Table B.2 (in addition to
Table B.1) are being used. Table B.1) are being used.
If the case_preserving attribute is present and set to false, then If the case_preserving attribute is present and set to FALSE, then
the NFSv4.1 server MUST use table B.2 to map case when processing the NFSv4.1 server MUST use table B.2 to map case when processing
utf8str_cs strings. Whether the server maps from lower to upper case utf8str_cs strings. Whether the server maps from lower to upper case
or from upper to lower case is an implementation dependency. or from upper to lower case is an implementation dependency.
14.1.4. Normalization used by nfs4_cs_prep 14.1.4. Normalization used by nfs4_cs_prep
The nfs4_cs_prep profile does not specify a normalization form. A The nfs4_cs_prep profile does not specify a normalization form. A
later revision of this specification may specify a particular later revision of this specification may specify a particular
normalization form. Therefore, the server and client can expect that normalization form. Therefore, the server and client can expect that
they may receive unnormalized characters within protocol requests and they may receive unnormalized characters within protocol requests and
skipping to change at page 342, line 35 skipping to change at page 343, line 35
| GETFH | NFS4ERR_FHEXPIRED, NFS4ERR_MOVED, | | GETFH | NFS4ERR_FHEXPIRED, NFS4ERR_MOVED, |
| | NFS4ERR_NOFILEHANDLE, | | | NFS4ERR_NOFILEHANDLE, |
| | NFS4ERR_OP_NOT_IN_SESSION, NFS4ERR_STALE | | | NFS4ERR_OP_NOT_IN_SESSION, NFS4ERR_STALE |
| ILLEGAL | NFS4ERR_BADXDR, NFS4ERR_OP_ILLEGAL | | ILLEGAL | NFS4ERR_BADXDR, NFS4ERR_OP_ILLEGAL |
| LAYOUTCOMMIT | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | | LAYOUTCOMMIT | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, |
| | NFS4ERR_ATTRNOTSUPP, NFS4ERR_BADIOMODE, | | | NFS4ERR_ATTRNOTSUPP, NFS4ERR_BADIOMODE, |
| | NFS4ERR_BADLAYOUT, NFS4ERR_BADXDR, | | | NFS4ERR_BADLAYOUT, NFS4ERR_BADXDR, |
| | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, |
| | NFS4ERR_EXPIRED, NFS4ERR_FBIG, | | | NFS4ERR_EXPIRED, NFS4ERR_FBIG, |
| | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, |
| | NFS4ERR_IO, NFS4ERR_ISDIR, NFS4ERR_MOVED, | | | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_ISDIR, |
| | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTSUPP, | | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, |
| | NFS4ERR_NO_GRACE, | | | NFS4ERR_NOTSUPP, NFS4ERR_NO_GRACE, |
| | NFS4ERR_OP_NOT_IN_SESSION, | | | NFS4ERR_OP_NOT_IN_SESSION, |
| | NFS4ERR_RECLAIM_BAD, | | | NFS4ERR_RECLAIM_BAD, |
| | NFS4ERR_RECLAIM_CONFLICT, | | | NFS4ERR_RECLAIM_CONFLICT, |
| | NFS4ERR_REP_TOO_BIG, | | | NFS4ERR_REP_TOO_BIG, |
| | NFS4ERR_REP_TOO_BIG_TO_CACHE, | | | NFS4ERR_REP_TOO_BIG_TO_CACHE, |
| | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, |
| | NFS4ERR_STALE, NFS4ERR_SYMLINK, | | | NFS4ERR_STALE, NFS4ERR_SYMLINK, |
| | NFS4ERR_TOO_MANY_OPS, | | | NFS4ERR_TOO_MANY_OPS, |
| | NFS4ERR_UNKNOWN_LAYOUTTYPE, | | | NFS4ERR_UNKNOWN_LAYOUTTYPE, |
| | NFS4ERR_WRONG_CRED | | | NFS4ERR_WRONG_CRED |
skipping to change at page 361, line 38 skipping to change at page 362, line 38
| NFS4ERR_INVAL | ACCESS, BACKCHANNEL_CTL, | | NFS4ERR_INVAL | ACCESS, BACKCHANNEL_CTL, |
| | BIND_CONN_TO_SESSION, | | | BIND_CONN_TO_SESSION, |
| | CB_GETATTR, CB_LAYOUTRECALL, | | | CB_GETATTR, CB_LAYOUTRECALL, |
| | CB_NOTIFY, CB_PUSH_DELEG, | | | CB_NOTIFY, CB_PUSH_DELEG, |
| | CB_RECALLABLE_OBJ_AVAIL, | | | CB_RECALLABLE_OBJ_AVAIL, |
| | CB_RECALL_ANY, CREATE, | | | CB_RECALL_ANY, CREATE, |
| | CREATE_SESSION, DELEGRETURN, | | | CREATE_SESSION, DELEGRETURN, |
| | EXCHANGE_ID, GETATTR, | | | EXCHANGE_ID, GETATTR, |
| | GETDEVICEINFO, GETDEVICELIST, | | | GETDEVICEINFO, GETDEVICELIST, |
| | GET_DIR_DELEGATION, | | | GET_DIR_DELEGATION, |
| | LAYOUTGET, LAYOUTRETURN, | | | LAYOUTCOMMIT, LAYOUTGET, |
| | LINK, LOCK, LOCKT, LOCKU, | | | LAYOUTRETURN, LINK, LOCK, |
| | LOOKUP, NVERIFY, OPEN, | | | LOCKT, LOCKU, LOOKUP, |
| | NVERIFY, OPEN, |
| | OPEN_DOWNGRADE, READ, | | | OPEN_DOWNGRADE, READ, |
| | READDIR, READLINK, | | | READDIR, READLINK, |
| | RECLAIM_COMPLETE, REMOVE, | | | RECLAIM_COMPLETE, REMOVE, |
| | RENAME, SECINFO, | | | RENAME, SECINFO, |
| | SECINFO_NO_NAME, SETATTR, | | | SECINFO_NO_NAME, SETATTR, |
| | VERIFY, WANT_DELEGATION, | | | VERIFY, WANT_DELEGATION, |
| | WRITE | | | WRITE |
| NFS4ERR_IO | ACCESS, COMMIT, CREATE, | | NFS4ERR_IO | ACCESS, COMMIT, CREATE, |
| | GETATTR, GETDEVICELIST, | | | GETATTR, GETDEVICELIST, |
| | GET_DIR_DELEGATION, | | | GET_DIR_DELEGATION, |
skipping to change at page 426, line 48 skipping to change at page 427, line 48
In absence of a persistent session, the client invokes exclusive In absence of a persistent session, the client invokes exclusive
create by setting the how parameter to EXCLUSIVE4 or EXCLUSIVE4_1. create by setting the how parameter to EXCLUSIVE4 or EXCLUSIVE4_1.
In these cases, the client provides a verifier that can reasonably be In these cases, the client provides a verifier that can reasonably be
expected to be unique. A combination of a client identifier, perhaps expected to be unique. A combination of a client identifier, perhaps
the client network address, and a unique number generated by the the client network address, and a unique number generated by the
client, perhaps the RPC transaction identifier, may be appropriate. client, perhaps the RPC transaction identifier, may be appropriate.
If the object does not exist, the server creates the object and If the object does not exist, the server creates the object and
stores the verifier in stable storage. For file systems that do not stores the verifier in stable storage. For file systems that do not
provide a mechanism for the storage of arbitrary file attributes, the provide a mechanism for the storage of arbitrary file attributes, the
server may use one or more elements of the object meta-data to store server may use one or more elements of the object metadata to store
the verifier. The verifier must be stored in stable storage to the verifier. The verifier must be stored in stable storage to
prevent erroneous failure on retransmission of the request. It is prevent erroneous failure on retransmission of the request. It is
assumed that an exclusive create is being performed because exclusive assumed that an exclusive create is being performed because exclusive
semantics are critical to the application. Because of the expected semantics are critical to the application. Because of the expected
usage, exclusive CREATE does not rely solely on the server's reply usage, exclusive CREATE does not rely solely on the server's reply
cache for storage of the verifier. A nonpersistent reply cache does cache for storage of the verifier. A nonpersistent reply cache does
not survive a crash and the session and reply cache may be deleted not survive a crash and the session and reply cache may be deleted
after a network partition that exceeds the lease time, thus opening after a network partition that exceeds the lease time, thus opening
failure windows. failure windows.
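One way to build such a verifier is to concatenate the client's network address with the RPC transaction identifier. The sketch below is illustrative only: it assumes an 8-byte verifier and an IPv4 client address, neither of which is stated in this section, and the function name is hypothetical.

```python
import socket
import struct


def make_create_verifier(client_ipv4, xid):
    """Build an 8-byte verifier: 4 bytes of IPv4 address followed by the
    32-bit RPC transaction identifier, both in network byte order."""
    addr = socket.inet_aton(client_ipv4)        # 4 bytes, network order
    return addr + struct.pack(">I", xid & 0xFFFFFFFF)
```

A retransmitted request carries the same address and transaction identifier, so it produces the same verifier and can be matched against the value the server stored.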
skipping to change at page 485, line 31 skipping to change at page 486, line 31
uses (which will be either what the client offered, or what the uses (which will be either what the client offered, or what the
server is insisting on). return the value used to the client. These server is insisting on). return the value used to the client. These
parameters have the following interpretation. parameters have the following interpretation.
csa_flags: csa_flags:
The csa_flags field contains a list of the following flag bits: The csa_flags field contains a list of the following flag bits:
CREATE_SESSION4_FLAG_PERSIST: CREATE_SESSION4_FLAG_PERSIST:
If CREATE_SESSION4_FLAG_PERSIST is set, the client desires If CREATE_SESSION4_FLAG_PERSIST is set, the client wants the
server support for persistent reply cache. For sessions in server to provide a persistent reply cache. For sessions in
which only idempotent operations will be used (e.g. a read-only which only idempotent operations will be used (e.g. a read-only
session), clients SHOULD NOT set CREATE_SESSION4_FLAG_PERSIST. session), clients SHOULD NOT set CREATE_SESSION4_FLAG_PERSIST.
If the server does not or cannot provide a persistent reply If the server does not or cannot provide a persistent reply
cache, the server MUST NOT set CREATE_SESSION4_FLAG_PERSIST in cache, the server MUST NOT set CREATE_SESSION4_FLAG_PERSIST in
the field csr_flags. the field csr_flags.
If the server is a pNFS metadata server, for reasons described If the server is a pNFS metadata server, for reasons described
in Section 12.5.2 it SHOULD support in Section 12.5.2 it SHOULD support
CREATE_SESSION4_FLAG_PERSIST if it supports the layout_hint CREATE_SESSION4_FLAG_PERSIST if it supports the layout_hint
(Section 5.11.4) attribute. (Section 5.11.4) attribute.
skipping to change at page 493, line 20 skipping to change at page 494, line 20
18.37.2. RESULT 18.37.2. RESULT
struct DESTROY_SESSION4res { struct DESTROY_SESSION4res {
nfsstat4 dsr_status; nfsstat4 dsr_status;
}; };
18.37.3. DESCRIPTION 18.37.3. DESCRIPTION
The DESTROY_SESSION operation closes the session and discards the The DESTROY_SESSION operation closes the session and discards the
session's its reply cache, if any. Any remaining connections session's reply cache, if any. Any remaining connections associated
associated with the session are immediately disassociated and, if not with the session are immediately disassociated and, if not associated
associated with other sessions, MAY be closed by the server. Locks, with other sessions, MAY be closed by the server. Locks, delegations,
delegations, layouts, wants, and the lease, which are all tied to the layouts, wants, and the lease, which are all tied to the client ID,
client ID, are not affected by DESTROY_SESSION. are not affected by DESTROY_SESSION.
DESTROY_SESSION MUST be invoked on a connection that is associated DESTROY_SESSION MUST be invoked on a connection that is associated
with the session being destroyed. In addition if SP4_MACH_CRED state with the session being destroyed. In addition if SP4_MACH_CRED state
protection was specified when the client ID was created, the protection was specified when the client ID was created, the
RPCSEC_GSS principal that created the session MUST be the one that RPCSEC_GSS principal that created the session MUST be the one that
destroys the session, using RPCSEC_GSS privacy or integrity. If destroys the session, using RPCSEC_GSS privacy or integrity. If
SP4_SSV state protection was specified when the client ID was SP4_SSV state protection was specified when the client ID was
created, RPCSEC_GSS using the SSV mechanism (Section 2.10.8) MUST be created, RPCSEC_GSS using the SSV mechanism (Section 2.10.8) MUST be
used, with integrity or privacy. used, with integrity or privacy.
If the COMPOUND request starts with SEQUENCE, and if the sessions If the COMPOUND request starts with SEQUENCE, and if the sessions
referred to by SEQUENCE and DESTROY_SESSION are the same, then referred to by SEQUENCE and DESTROY_SESSION are the same, then
o DESTROY_SESSION MUST be the final operation in the COMPOUND o DESTROY_SESSION MUST be the final operation in the COMPOUND
request. request.
o It is advisable to not place DESTROY_SESSION in a COMPOUND request o It is advisable to not place DESTROY_SESSION in a COMPOUND request
with other state-modifying operations, because the DESTROY_SESSION with other state-modifying operations, because the DESTROY_SESSION
will destroy reply cache. will destroy the reply cache.
DESTROY_SESSION MAY be the only operation in a COMPOUND request. DESTROY_SESSION MAY be the only operation in a COMPOUND request.
Because the session is destroyed, a client that retries the request Because the session is destroyed, a client that retries the request
may receive an error in reply to the retry, even though the original may receive an error in reply to the retry, even though the original
request was successful. request was successful.
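The placement rule for DESTROY_SESSION can be checked mechanically. A minimal sketch in Python follows, assuming a COMPOUND is modelled as a list of (operation name, session ID) pairs; that representation and the function name are illustrative assumptions.

```python
def destroy_session_placement_ok(ops):
    """ops: list of (op_name, session_id_or_None) in COMPOUND order.

    When the COMPOUND starts with SEQUENCE, a DESTROY_SESSION naming the
    same session as that SEQUENCE must be the final operation.
    """
    if not ops or ops[0][0] != "SEQUENCE":
        return True  # the rule applies only to SEQUENCE-led COMPOUNDs
    sequence_session = ops[0][1]
    last = len(ops) - 1
    for index, (name, session) in enumerate(ops):
        if name == "DESTROY_SESSION" and session == sequence_session:
            if index != last:
                return False
    return True
```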
If there is a backchannel on the session and the server has If there is a backchannel on the session and the server has
outstanding CB_COMPOUND operations for the session which have not outstanding CB_COMPOUND operations for the session which have not
been replied to, then the server MAY refuse to destroy the session been replied to, then the server MAY refuse to destroy the session
skipping to change at page 504, line 32 skipping to change at page 505, line 32
void; void;
}; };
18.42.3. DESCRIPTION 18.42.3. DESCRIPTION
Commits changes in the layout represented by the current filehandle, Commits changes in the layout represented by the current filehandle,
client ID (derived from the sessionid in the preceding SEQUENCE client ID (derived from the sessionid in the preceding SEQUENCE
operation), byte range, and stateid. Since layouts are sub- operation), byte range, and stateid. Since layouts are sub-
dividable, a smaller portion of a layout, retrieved via LAYOUTGET, dividable, a smaller portion of a layout, retrieved via LAYOUTGET,
may be committed. The region being committed is specified through may be committed. The region being committed is specified through
the byte range (loca_offset and loca_length). the byte range (loca_offset and loca_length). This region MUST
overlap with one or more existing layouts previously granted via
LAYOUTGET (Section 18.43), each with an iomode of LAYOUTIOMODE4_RW.
The LAYOUTCOMMIT operation indicates that the client has completed The LAYOUTCOMMIT operation indicates that the client has completed
writes using a layout obtained by a previous LAYOUTGET. The client writes using a layout obtained by a previous LAYOUTGET. The client
may have only written a subset of the data range it previously may have only written a subset of the data range it previously
requested. LAYOUTCOMMIT allows it to commit or discard provisionally requested. LAYOUTCOMMIT allows it to commit or discard provisionally
allocated space and to update the server with a new end of file. The allocated space and to update the server with a new end of file. The
layout referenced by LAYOUTCOMMIT is still valid after the operation layout referenced by LAYOUTCOMMIT is still valid after the operation
completes and can continue to be referenced by the client ID, completes and can continue to be referenced by the client ID,
filehandle, byte range, layout type, and stateid. filehandle, byte range, layout type, and stateid.
If the loca_reclaim field is set to TRUE, this indicates that the If the loca_reclaim field is set to TRUE, this indicates that the
client is attempting to commit changes to a layout after the reboot client is attempting to commit changes to a layout after the reboot
of the metadata server during the metadata server's recovery grace of the metadata server during the metadata server's recovery grace
period. This type of request may be necessary when the client has period (see Section 12.7.4). This type of request may be necessary
uncommitted writes to provisionally allocated regions of a file which when the client has uncommitted writes to provisionally allocated
were sent to the storage devices before the reboot of the metadata regions of a file which were sent to the storage devices before the
server. In this case the layout provided by the client MUST be a reboot of the metadata server. In this case the layout provided by
subset of a writable layout that the client held immediately before the client MUST be a subset of a writable layout that the client held
the reboot of the metadata server. The metadata server is free to immediately before the reboot of the metadata server. The metadata
accept or reject this request based on its own internal metadata server is free to accept or reject this request based on its own
consistency checks. If the metadata server finds that the layout internal metadata consistency checks. If the metadata server finds
provided by the client does not pass its consistency checks, it MUST that the layout provided by the client does not pass its consistency
reject the request with the status NFS4ERR_RECLAIM_BAD. The checks, it MUST reject the request with the status
successful completion of the LAYOUTCOMMIT request with loca_reclaim NFS4ERR_RECLAIM_BAD. The successful completion of the LAYOUTCOMMIT
set to TRUE does NOT provide the client with a layout for the file. request with loca_reclaim set to TRUE does NOT provide the client
It simply commits the changes to the layout specified in the with a layout for the file. It simply commits the changes to the
loca_layoutupdate field. To obtain a layout for the file the client layout specified in the loca_layoutupdate field. To obtain a layout
must send a LAYOUTGET request to the server after the server's grace for the file the client must send a LAYOUTGET request to the server
period has expired. If the metadata server receives a LAYOUTCOMMIT after the server's grace period has expired. If the metadata server
request with loca_reclaim set to TRUE when the metadata server is not receives a LAYOUTCOMMIT request with loca_reclaim set to TRUE when
in its recovery grace period, it MUST reject the request with the the metadata server is not in its recovery grace period, it MUST
status NFS4ERR_NO_GRACE. reject the request with the status NFS4ERR_NO_GRACE.
Setting the loca_reclaim field to TRUE is required if and only if the Setting the loca_reclaim field to TRUE is required if and only if the
committed layout was acquired before the metadata server reboot. If committed layout was acquired before the metadata server reboot. If
the client is committing a layout that was acquired during the the client is committing a layout that was acquired during the
metadata server's grace period, it MUST set the "reclaim" field to metadata server's grace period, it MUST set the "reclaim" field to
FALSE. FALSE.
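The reclaim-specific status codes described above can be summarized as a small decision function. This is a sketch under stated assumptions: the boolean inputs and the function name are hypothetical, and a return value of None simply means that no reclaim-specific error applies and normal LAYOUTCOMMIT processing continues.

```python
def layoutcommit_reclaim_error(loca_reclaim, in_grace_period, passes_consistency_checks):
    """Return the reclaim-specific error status, or None if none applies."""
    if not loca_reclaim:
        return None
    if not in_grace_period:
        # Reclaim attempted outside the recovery grace period.
        return "NFS4ERR_NO_GRACE"
    if not passes_consistency_checks:
        # The layout supplied by the client failed the server's checks.
        return "NFS4ERR_RECLAIM_BAD"
    return None
```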
The loca_stateid is a layout stateid value as returned by previously The loca_stateid is a layout stateid value as returned by previously
successful layout operations (see Section 12.5.3). successful layout operations (see Section 12.5.3).
The loca_last_write_offset field specifies the offset of the last The loca_last_write_offset field specifies the offset of the last
byte written by the client previous to the LAYOUTCOMMIT. Note that byte written by the client previous to the LAYOUTCOMMIT. Note that
this value is never equal to the file's size (at most it is one byte this value is never equal to the file's size (at most it is one byte
less than the file's size) and MUST be less than or equal to less than the file's size) and MUST be less than or equal to
NFS4_MAXFILEOFF. The metadata server may use this information to NFS4_MAXFILEOFF. Also, loca_last_write_offset MUST overlap the range
determine whether the file's size needs to be updated. If the described by loca_offset and loca_length. The metadata server may
metadata server updates the file's size as the result of the use this information to determine whether the file's size needs to be
LAYOUTCOMMIT operation, it must return the new size updated. If the metadata server updates the file's size as the
result of the LAYOUTCOMMIT operation, it must return the new size
(locr_newsize.ns_size) as part of the results. (locr_newsize.ns_size) as part of the results.
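The constraints on loca_last_write_offset can be sketched as follows. The value of NFS4_MAXFILEOFF and the policy of setting the new size to one byte past the last written offset are assumptions made for illustration; the text leaves the size update to the metadata server's discretion.

```python
# Assumed value (largest unsigned 64-bit offset); not given in this section.
NFS4_MAXFILEOFF = 0xFFFFFFFFFFFFFFFF


def last_write_offset_valid(loca_offset, loca_length, loca_last_write_offset):
    """True if the offset is within bounds and inside the committed range."""
    if loca_last_write_offset > NFS4_MAXFILEOFF:
        return False
    return loca_offset <= loca_last_write_offset < loca_offset + loca_length


def new_size_after_commit(current_size, loca_last_write_offset):
    """Return the size to report in locr_newsize.ns_size, or None if the
    file does not grow. Assumes the server extends the file to one byte
    past the last written offset."""
    candidate = loca_last_write_offset + 1
    return candidate if candidate > current_size else None
```

For example, a last write at offset 4095 on a file of 1000 bytes yields a new size of 4096, which is consistent with the offset being at most one byte less than the file's size.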
The loca_time_modify field allows the client to suggest a The loca_time_modify field allows the client to suggest a
modification time it would like the metadata server to set. The modification time it would like the metadata server to set. The
metadata server may use the suggestion or it may use the time of the metadata server may use the suggestion or it may use the time of the
LAYOUTCOMMIT operation to set the modification time. If the metadata LAYOUTCOMMIT operation to set the modification time. If the metadata
server uses the client provided modification time, it should ensure server uses the client provided modification time, it should ensure
time does not flow backwards. If the client wants to force the time does not flow backwards. If the client wants to force the
metadata server to set an exact time, the client should use a SETATTR metadata server to set an exact time, the client should use a SETATTR
operation in a compound right after LAYOUTCOMMIT. See Section 12.5.4 operation in a compound right after LAYOUTCOMMIT. See Section 12.5.4
skipping to change at page 508, line 11 skipping to change at page 509, line 11
The LAYOUTGET operation returns layout information for the specified The LAYOUTGET operation returns layout information for the specified
byte range: a layout. To get a layout from a specific offset through byte range: a layout. To get a layout from a specific offset through
the end-of-file, regardless of the file's length, a loga_length field the end-of-file, regardless of the file's length, a loga_length field
with all bits set to 1 (one) should be used. If loga_length is zero, with all bits set to 1 (one) should be used. If loga_length is zero,
or if a loga_length which is not all bits set to one is specified, or if a loga_length which is not all bits set to one is specified,
and loga_length when added to loga_offset exceeds the maximum 64-bit and loga_length when added to loga_offset exceeds the maximum 64-bit
unsigned integer value, the error NFS4ERR_INVAL will result. unsigned integer value, the error NFS4ERR_INVAL will result.
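The range check above can be sketched in Python. The sketch reads the rule as two separate failure cases (a zero length; or a length other than all ones whose sum with the offset overflows 64 bits), which is an interpretation of the sentence above. The function name is hypothetical.

```python
UINT64_MAX = 0xFFFFFFFFFFFFFFFF  # loga_length with all bits set to 1


def layoutget_range_valid(loga_offset, loga_length):
    """False where the text above calls for NFS4ERR_INVAL."""
    if loga_length == 0:
        return False
    if loga_length != UINT64_MAX and loga_offset + loga_length > UINT64_MAX:
        return False
    return True
```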
The loga_minlength field specifies the minimum length of layout the The loga_minlength field specifies the minimum length of layout the
server MUST return. If this requirement cannot be met, no layout server MUST return with two exceptions:
must be returned; the error NFS4ERR_BADLAYOUT will be returned.
1. The argument loga_iomode was set to LAYOUTIOMODE4_READ, and
loga_offset plus loga_minlength goes past the end of the file.
2. The range from loga_offset through loga_offset + loga_minlength -
1 overlaps two or more striping patterns. In this case,
logr_layout will contain two or more elements, and the sum of the
lo_length fields of each element MUST be at least loga_minlength
unless the first exception also applies.
If this requirement cannot be met, the server MUST NOT return a
layout and the error NFS4ERR_BADLAYOUT MUST be returned.
The loga_stateid field specifies a valid stateid. If a layout is not The loga_stateid field specifies a valid stateid. If a layout is not
currently held by the client, the loga_stateid field represents a currently held by the client, the loga_stateid field represents a
stateid reflecting the correspondingly valid open, record lock, or stateid reflecting the correspondingly valid open, record lock, or
delegation stateid. Once a layout is held by the client for the delegation stateid. Once a layout is held by the client for the
file, the loga_stateid field is a stateid as returned from a previous file, the loga_stateid field is a stateid as returned from a previous
LAYOUTGET or LAYOUTRETURN operation or provided by a CB_LAYOUTRECALL LAYOUTGET or LAYOUTRETURN operation or provided by a CB_LAYOUTRECALL
operation (see Section 12.5.3). operation (see Section 12.5.3).
The loga_maxcount field specifies the maximum layout size (in bytes) The loga_maxcount field specifies the maximum layout size (in bytes)
skipping to change at page 508, line 39 skipping to change at page 509, line 50
then logr_layout will contain just one entry. Otherwise, if the then logr_layout will contain just one entry. Otherwise, if the
requested range overlaps more than one striping pattern, logr_layout requested range overlaps more than one striping pattern, logr_layout
will contain the required number of entries. The elements of will contain the required number of entries. The elements of
logr_layout MUST be sorted in ascending order of the value of the logr_layout MUST be sorted in ascending order of the value of the
lo_offset field of each element. There MUST be no gaps or overlaps lo_offset field of each element. There MUST be no gaps or overlaps
in the range between two successive elements of logr_layout. The in the range between two successive elements of logr_layout. The
lo_iomode field in each element of logr_layout MUST be the same. lo_iomode field in each element of logr_layout MUST be the same.
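The ordering, contiguity, and iomode rules for logr_layout can be sketched as a small client-side check. This is a minimal illustration, not part of the protocol: the LayoutSegment class and check_logr_layout function are hypothetical names, iomodes are modeled as plain strings, and lo_length is assumed to be a finite byte count.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LayoutSegment:
    """Hypothetical stand-in for one element of logr_layout."""
    lo_offset: int
    lo_length: int
    lo_iomode: str  # "READ" or "RW" in this sketch


def check_logr_layout(segments: List[LayoutSegment]) -> List[str]:
    """Return a list of rule violations; an empty list means the array looks valid."""
    problems = []
    for prev, cur in zip(segments, segments[1:]):
        prev_end = prev.lo_offset + prev.lo_length
        if cur.lo_offset <= prev.lo_offset:
            problems.append("elements not in ascending lo_offset order")
        elif cur.lo_offset < prev_end:
            problems.append("overlap between successive elements")
        elif cur.lo_offset > prev_end:
            problems.append("gap between successive elements")
        if cur.lo_iomode != prev.lo_iomode:
            problems.append("lo_iomode differs between elements")
    return problems
```

A client might run such a check on a LAYOUTGET reply before using it, treating any reported problem as a malformed response.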
The metadata server may adjust the range of the returned layout based The metadata server may adjust the range of the returned layout based
on the usage implied by the loga_iomode. The client MUST be prepared on the usage implied by the loga_iomode. The client MUST be prepared
to get a layout that does not align exactly with its request. The to get a layout that does not align exactly with its request. See
lo_length field in each element of logr_layout SHOULD be at least as
long as loga_minlength or the server SHOULD reject the request. See
Section 12.5.2 for more details. Section 12.5.2 for more details.
The metadata server may also return a layout with an lo_iomode other The metadata server may also return a layout with an lo_iomode other
than that requested by the client. If it does so, it must ensure than that requested by the client. If it does so, it MUST ensure
that the lo_iomode is more permissive than the loga_iomode requested. that the lo_iomode is more permissive than the loga_iomode requested.
For example, this behavior allows an implementation to upgrade read- For example, this behavior allows an implementation to upgrade read-
only requests to read/write requests at its discretion, within the only requests to read/write requests at its discretion, within the
limits of the layout type specific protocol. A lo_iomode of either limits of the layout type specific protocol. A lo_iomode of either
LAYOUTIOMODE4_READ or LAYOUTIOMODE4_RW must be returned. LAYOUTIOMODE4_READ or LAYOUTIOMODE4_RW MUST be returned.
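The iomode rule above can be expressed as a short predicate. The sketch below is illustrative only: the function name and the permissiveness ranking are assumptions, iomodes are modeled as strings rather than XDR enum values, and "more permissive" is read as "at least as permissive" so that a matching iomode is accepted.

```python
# Assumed ranking for this sketch: read/write is more permissive than read-only.
_PERMISSIVENESS = {"LAYOUTIOMODE4_READ": 1, "LAYOUTIOMODE4_RW": 2}


def returned_iomode_acceptable(loga_iomode: str, lo_iomode: str) -> bool:
    """Check a returned lo_iomode against the requested loga_iomode."""
    if loga_iomode not in _PERMISSIVENESS:
        # LAYOUTIOMODE4_ANY or an invalid value is not a legal request here.
        raise ValueError("request should have failed with NFS4ERR_BADIOMODE")
    if lo_iomode not in _PERMISSIVENESS:
        # Only READ or RW may be returned.
        return False
    return _PERMISSIVENESS[lo_iomode] >= _PERMISSIVENESS[loga_iomode]
```

With this reading, a read-only request upgraded to read/write passes, while a read/write request answered with a read-only layout does not.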
The logr_return_on_close result field is a directive to return the The logr_return_on_close result field is a directive to return the
layout before closing the file. When the server sets this return layout before closing the file. When the server sets this return
value to TRUE, it must be prepared to recall the layout in the case value to TRUE, it MUST be prepared to recall the layout in the case
the client fails to return the layout before close. For the server the client fails to return the layout before close. For the server
that knows a layout must be returned before a close of the file, this that knows a layout must be returned before a close of the file, this
return value can be used to communicate the desired behavior to the return value can be used to communicate the desired behavior to the
client and thus remove one extra step from the client's and server's client and thus remove one extra step from the client's and server's
interaction. interaction.
The logr_stateid, as with all stateid processing, is returned to the The logr_stateid, as with all stateid processing, is returned to the
client for use in subsequent layout related operations. See client for use in subsequent layout related operations. See
Section 8.2 for a further discussion. Section 8.2 for a further discussion.
skipping to change at page 509, line 36 skipping to change at page 510, line 44
If layouts are not supported for the requested file or its containing If layouts are not supported for the requested file or its containing
file system the server SHOULD return NFS4ERR_LAYOUTUNAVAILABLE. If file system the server SHOULD return NFS4ERR_LAYOUTUNAVAILABLE. If
the layout type is not supported, the metadata server should return the layout type is not supported, the metadata server should return
NFS4ERR_UNKNOWN_LAYOUTTYPE. If layouts are supported but no layout NFS4ERR_UNKNOWN_LAYOUTTYPE. If layouts are supported but no layout
matches the client provided layout identification, the server should matches the client provided layout identification, the server should
return NFS4ERR_BADLAYOUT. If an invalid loga_iomode is specified, or return NFS4ERR_BADLAYOUT. If an invalid loga_iomode is specified, or
a loga_iomode of LAYOUTIOMODE4_ANY is specified, the server should a loga_iomode of LAYOUTIOMODE4_ANY is specified, the server should
return NFS4ERR_BADIOMODE. return NFS4ERR_BADIOMODE.
If the layout for the file is unavailable due to transient If the layout for the file is unavailable due to transient
conditions, e.g. file sharing prohibits layouts, the server must conditions, e.g. file sharing prohibits layouts, the server MUST
return NFS4ERR_LAYOUTTRYLATER. return NFS4ERR_LAYOUTTRYLATER.
If the layout request is rejected due to an overlapping layout If the layout request is rejected due to an overlapping layout
recall, the server must return NFS4ERR_RECALLCONFLICT. See recall, the server MUST return NFS4ERR_RECALLCONFLICT. See
Section 12.5.5.2 for details. Section 12.5.5.2 for details.
If the layout conflicts with a mandatory byte range lock held on the If the layout conflicts with a mandatory byte range lock held on the
file, and if the storage devices have no method of enforcing file, and if the storage devices have no method of enforcing
mandatory locks, other than through the restriction of layouts, the mandatory locks, other than through the restriction of layouts, the
metadata server should return NFS4ERR_LOCKED. metadata server should return NFS4ERR_LOCKED.
If the client sets loga_signal_layout_avail to TRUE, then it is If the client sets loga_signal_layout_avail to TRUE, then it is
registering with the server a "want" for a layout in the event the registering with the server a "want" for a layout in the event the
layout cannot be obtained due to resource exhaustion. If the server layout cannot be obtained due to resource exhaustion. If the server
skipping to change at page 514, line 22 skipping to change at page 515, line 22
layout. See Section 12.5.5 for more details. layout. See Section 12.5.5 for more details.
If the LAYOUTRETURN request sets the lora_reclaim field to TRUE after If the LAYOUTRETURN request sets the lora_reclaim field to TRUE after
the metadata server's grace period, NFS4ERR_NO_GRACE is returned. the metadata server's grace period, NFS4ERR_NO_GRACE is returned.
If the LAYOUTRETURN request sets the lora_reclaim field to TRUE and If the LAYOUTRETURN request sets the lora_reclaim field to TRUE and
lr_returntype is set to LAYOUTRETURN4_FSID or LAYOUTRETURN4_ALL, lr_returntype is set to LAYOUTRETURN4_FSID or LAYOUTRETURN4_ALL,
NFS4ERR_INVAL is returned. NFS4ERR_INVAL is returned.
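The two lora_reclaim error cases can be sketched as a server-side pre-check. This is an assumption-laden illustration: the function name and the in_grace_period parameter are hypothetical, return types are modeled as strings, and the order in which the two conditions are tested is a choice made for this sketch, since the text does not say which error wins when both apply.

```python
from typing import Optional


def layoutreturn_reclaim_error(lora_reclaim: bool,
                               lr_returntype: str,
                               in_grace_period: bool) -> Optional[str]:
    """Return the error name for an invalid reclaim-style LAYOUTRETURN, else None."""
    if not lora_reclaim:
        return None  # neither rule applies to ordinary returns
    if not in_grace_period:
        # Reclaim attempted after the metadata server's grace period.
        return "NFS4ERR_NO_GRACE"
    if lr_returntype in ("LAYOUTRETURN4_FSID", "LAYOUTRETURN4_ALL"):
        # Reclaim is only meaningful for a single file's layout.
        return "NFS4ERR_INVAL"
    return None
```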
If the operation specified lr_returntype of LAYOUTRETURN4_FILE, then If the operation specified lr_returntype of LAYOUTRETURN4_FILE, then
the lorr_stateid will represent the layout stateid as updated for lrs_stateid will represent the layout stateid as updated for this
this operation's processing; the current stateid will also be updated operation's processing; the current stateid will also be updated to
to match the returned value. If the last byte of any layout for the match the returned value. If the last byte of any layout for the
current file, client ID, and layout type is being returned and there current file, client ID, and layout type is being returned and there
are not remaining pending CB_LAYOUTRECALL operations for which a are no remaining pending CB_LAYOUTRECALL operations for which a
LAYOUTRETURN operation must be done as a completing operation, this LAYOUTRETURN operation must be done as a completing operation,
stateid value may be the special stateid consisting of all zeros. lrs_present MUST be FALSE, and thus no stateid will be returned.
On success, the current filehandle retains its value. On success, the current filehandle retains its value.
The server MAY require that the principal, security flavor, and if The server MAY require that the principal, security flavor, and if
applicable, the GSS mechanism, combination that acquired the layout applicable, the GSS mechanism, combination that acquired the layout
also be the one to send LAYOUTRETURN. This might not be possible if also be the one to send LAYOUTRETURN. This might not be possible if
credentials for the principal are no longer available. The server credentials for the principal are no longer available. The server
MAY allow the machine credential or SSV credential (see MAY allow the machine credential or SSV credential (see
Section 18.35) to send LAYOUTRETURN. Section 18.35) to send LAYOUTRETURN.
skipping to change at page 518, line 28 skipping to change at page 519, line 28
a request outstanding for; it could be equal to sa_slotid. The a request outstanding for; it could be equal to sa_slotid. The
server returns two "highest_slotid" values: sr_highest_slotid, and server returns two "highest_slotid" values: sr_highest_slotid, and
sr_target_highest_slotid. The former is the highest slot id the sr_target_highest_slotid. The former is the highest slot id the
server will accept in a future SEQUENCE operation, and SHOULD NOT be server will accept in a future SEQUENCE operation, and SHOULD NOT be
less than the value of sa_highest_slotid (but see Section 2.10.5.1 less than the value of sa_highest_slotid (but see Section 2.10.5.1
for an exception). The latter is the highest slot id the server for an exception). The latter is the highest slot id the server
would prefer the client use on a future SEQUENCE operation. would prefer the client use on a future SEQUENCE operation.
If sa_cachethis is TRUE, then the client is requesting that the If sa_cachethis is TRUE, then the client is requesting that the
server cache the entire reply in the server's reply cache; therefore server cache the entire reply in the server's reply cache; therefore
the server MUST cache the reply (see Section 2.10.5.1.2). The server the server MUST cache the reply (see Section 2.10.5.1.3). The server
MAY cache the reply if sa_cachethis is FALSE. If the server does not MAY cache the reply if sa_cachethis is FALSE. If the server does not
cache the entire reply, it MUST still record that it executed the cache the entire reply, it MUST still record that it executed the
request at the specified slot and sequence id. request at the specified slot and sequence id.
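The sa_cachethis rule can be sketched as follows. The Slot class, the record_reply function, and the server_prefers_cache parameter are hypothetical names for this illustration; a real reply cache tracks considerably more state per slot.

```python
class Slot:
    """Hypothetical per-slot record in a server's reply cache."""

    def __init__(self):
        self.seqid = 0            # last sequence id executed on this slot
        self.cached_reply = None  # full reply, if the server kept one


def record_reply(slot: Slot, seqid: int, reply: bytes,
                 sa_cachethis: bool, server_prefers_cache: bool = False) -> None:
    """Update a slot after executing a request."""
    # The execution itself is always recorded, cached reply or not.
    slot.seqid = seqid
    if sa_cachethis or server_prefers_cache:
        # MUST cache when sa_cachethis is TRUE; MAY cache otherwise.
        slot.cached_reply = reply
    else:
        slot.cached_reply = None
```

Recording the sequence id unconditionally is what lets the server recognize a retransmission even when it has no reply to replay.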
The response to the SEQUENCE operation contains a word of status The response to the SEQUENCE operation contains a word of status
flags (sr_status_flags) that can provide to the client information flags (sr_status_flags) that can provide to the client information
related to the status of the client's lock state and communications related to the status of the client's lock state and communications
paths. Note that any status bits relating to lock state MAY be reset paths. Note that any status bits relating to lock state MAY be reset
when lock state is lost due to a server reboot (even if the session when lock state is lost due to a server reboot (even if the session
is persistent across reboots; session persistence does not imply lock is persistent across reboots; session persistence does not imply lock
skipping to change at page 520, line 36 skipping to change at page 521, line 36
transferred to one or more new servers. This condition will transferred to one or more new servers. This condition will
continue until the client receives an NFS4ERR_MOVED error and the continue until the client receives an NFS4ERR_MOVED error and the
server receives the subsequent GETATTR for the fs_locations or server receives the subsequent GETATTR for the fs_locations or
fs_locations_info attribute for an access to each file system for fs_locations_info attribute for an access to each file system for
which a lease has been moved to a new server. See which a lease has been moved to a new server. See
Section 11.7.7.1. Section 11.7.7.1.
SEQ4_STATUS_RESTART_RECLAIM_NEEDED SEQ4_STATUS_RESTART_RECLAIM_NEEDED
When set indicates that due to server restart or reboot the client When set indicates that due to server restart or reboot the client
must reclaim locking state. Until the client sends a global must reclaim locking state. Until the client sends a global
RECLAIM_COMPLETE (Section 18.51, every SEQUENCE operation will RECLAIM_COMPLETE (Section 18.51), every SEQUENCE operation will
return SEQ4_STATUS_RESTART_RECLAIM_NEEDED. return SEQ4_STATUS_RESTART_RECLAIM_NEEDED.
SEQ4_STATUS_BACKCHANNEL_FAULT SEQ4_STATUS_BACKCHANNEL_FAULT
The server has encountered an unrecoverable fault with the The server has encountered an unrecoverable fault with the
backchannel (e.g. it has lost track of the sequence id for a slot backchannel (e.g. it has lost track of the sequence id for a slot
in the backchannel). The client MUST stop sending more requests in the backchannel). The client MUST stop sending more requests
on the session's fore channel, wait for all outstanding requests on the session's fore channel, wait for all outstanding requests
to complete on the fore and back channel, and then destroy the to complete on the fore and back channel, and then destroy the
session. session.
skipping to change at page 525, line 48 skipping to change at page 526, line 48
o Special stateids are always considered invalid (they result in the o Special stateids are always considered invalid (they result in the
error code NFS4ERR_BAD_STATEID). error code NFS4ERR_BAD_STATEID).
All stateids are interpreted as being associated with the client for All stateids are interpreted as being associated with the client for
the current session. Any possible association with a previous the current session. Any possible association with a previous
instance of the client (as stale stateids) is not considered. instance of the client (as stale stateids) is not considered.
The errors which are validly returned within the status_code array The errors which are validly returned within the status_code array
are: NFS4ERR_OK, NFS4ERR_BAD_STATEID, NFS4ERR_OLD_STATEID, are: NFS4ERR_OK, NFS4ERR_BAD_STATEID, NFS4ERR_OLD_STATEID,
NFS4ERR_EXPIRED, NFS4ERR_ADMIN_REVOKED, and NFS4ERR_DELEG_REVOKED. NFS4ERR_EXPIRED, NFS4ERR_ADMIN_REVOKED, and NFS4ERR_DELEG_REVOKED.
[[Comment.5: _LAYOUT_REVOKED]]. [[Comment.4: _LAYOUT_REVOKED]].
18.48.4. IMPLEMENTATION 18.48.4. IMPLEMENTATION
See Section 8.2.2 and Section 8.2.4 for a discussion of stateid See Section 8.2.2 and Section 8.2.4 for a discussion of stateid
structure, lifetime, and validation. structure, lifetime, and validation.
18.49. Operation 56: WANT_DELEGATION - Request Delegation 18.49. Operation 56: WANT_DELEGATION - Request Delegation
18.49.1. ARGUMENT 18.49.1. ARGUMENT
skipping to change at page 530, line 44 skipping to change at page 531, line 44
}; };
18.51.3. DESCRIPTION 18.51.3. DESCRIPTION
A RECLAIM_COMPLETE operation must be used to indicate that the client A RECLAIM_COMPLETE operation must be used to indicate that the client
has reclaimed all of the locking state that it will recover, when it has reclaimed all of the locking state that it will recover, when it
is recovering state due to either a server restart or the transfer of is recovering state due to either a server restart or the transfer of
a file system to another server. There are two types of a file system to another server. There are two types of
RECLAIM_COMPLETE operations: RECLAIM_COMPLETE operations:
o When one_fs is false, a global RECLAIM_COMPLETE is being done. o When rca_one_fs is FALSE, a global RECLAIM_COMPLETE is being done.
This indicates that recovery of all locks that the client held on This indicates that recovery of all locks that the client held on
the previous server instance have been completed. the previous server instance have been completed.
o When one_fs is true, a file system-specific RECLAIM_COMPLETE is o When rca_one_fs is TRUE, a file system-specific RECLAIM_COMPLETE
being done. This indicates that recovery of locks for a single fs is being done. This indicates that recovery of locks for a single
(the one designated by the current filehandle) due to a file fs (the one designated by the current filehandle) due to a file
system transition have been completed. Presence of a current system transition have been completed. Presence of a current
filehandle is only required when one_fs is true. filehandle is only required when rca_one_fs is true.
Once a RECLAIM_COMPLETE is done, there can be no further reclaim Once a RECLAIM_COMPLETE is done, there can be no further reclaim
operations for locks whose scope is defined as having completed operations for locks whose scope is defined as having completed
recovery. Once the client sends RECLAIM_COMPLETE, the server will recovery. Once the client sends RECLAIM_COMPLETE, the server will
not allow the client to do subsequent reclaims of locking state for not allow the client to do subsequent reclaims of locking state for
that scope and will return NFS4ERR_NO_GRACE, if these are attempted. that scope and will return NFS4ERR_NO_GRACE, if these are attempted.
Whenever a client establishes a new client ID and before it does the Whenever a client establishes a new client ID and before it does the
first non-reclaim operation that obtains a lock, it MUST do a global first non-reclaim operation that obtains a lock, it MUST do a global
RECLAIM_COMPLETE, even if there are no locks to reclaim. If non- RECLAIM_COMPLETE, even if there are no locks to reclaim. If non-
reclaim locking operations are done before the RECLAIM_COMPLETE, a reclaim locking operations are done before the RECLAIM_COMPLETE, a
NFS4ERR_GRACE will be returned. NFS4ERR_GRACE will be returned.
Similarly, when the client accesses a file system on a new server, Similarly, when the client accesses a file system on a new server,
before it sends the first non-reclaim operation that obtains a lock before it sends the first non-reclaim operation that obtains a lock
on this new server, it must do a RECLAIM_COMPLETE with one_fs true on this new server, it must do a RECLAIM_COMPLETE with rca_one_fs
and current filehandle within that file system, even if there are no true and current filehandle within that file system, even if there
locks to reclaim. If non-reclaim locking operations are done on that are no locks to reclaim. If non-reclaim locking operations are done
file system before the RECLAIM_COMPLETE, a NFS4ERR_GRACE will be on that file system before the RECLAIM_COMPLETE, a NFS4ERR_GRACE will
returned. be returned.
Any locks not reclaimed at the point at which RECLAIM_COMPLETE is Any locks not reclaimed at the point at which RECLAIM_COMPLETE is
done become non-reclaimable. The client MUST NOT attempt to reclaim done become non-reclaimable. The client MUST NOT attempt to reclaim
them, either during the current server instance or in any subsequent them, either during the current server instance or in any subsequent
server instance, or on another server to which responsibility for server instance, or on another server to which responsibility for
that file system is transferred. If the client were to do so, it that file system is transferred. If the client were to do so, it
would be violating the protocol by representing itself as owning would be violating the protocol by representing itself as owning
locks that it does not own, and so has no right to reclaim. See locks that it does not own, and so has no right to reclaim. See
Section 8.4.3 for a discussion of edge conditions related to lock Section 8.4.3 for a discussion of edge conditions related to lock
reclaim. reclaim.
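The interaction between RECLAIM_COMPLETE scope and the two grace-related errors can be sketched as a small state tracker. Everything named here (ReclaimTracker, check_lock_request, the fs_transitioned flag) is hypothetical, and the model is deliberately simplified: it tracks one client and treats a file system identifier as an opaque string.

```python
class ReclaimTracker:
    """Simplified server-side view of one client's reclaim progress."""

    def __init__(self):
        self.global_done = False
        self.fs_done = set()

    def reclaim_complete(self, rca_one_fs: bool, current_fs=None) -> None:
        if rca_one_fs:
            if current_fs is None:
                raise ValueError("current filehandle needed when rca_one_fs is TRUE")
            self.fs_done.add(current_fs)   # file system-specific scope
        else:
            self.global_done = True        # global scope

    def check_lock_request(self, is_reclaim: bool, fs: str,
                           fs_transitioned: bool = False) -> str:
        # Pick the scope that governs this file system.
        done = (fs in self.fs_done) if fs_transitioned else self.global_done
        if is_reclaim and done:
            return "NFS4ERR_NO_GRACE"  # scope already declared complete
        if not is_reclaim and not done:
            return "NFS4ERR_GRACE"     # RECLAIM_COMPLETE not yet sent
        return "NFS4_OK"
```

In this model a client with nothing to reclaim still has to call reclaim_complete before its first ordinary lock request succeeds, which mirrors the "even if there are no locks to reclaim" requirement.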
skipping to change at page 533, line 6 skipping to change at page 534, line 6
18.52.4. IMPLEMENTATION 18.52.4. IMPLEMENTATION
A client will probably not send an operation with code OP_ILLEGAL but A client will probably not send an operation with code OP_ILLEGAL but
if it does, the response will be ILLEGAL4res just as it would be with if it does, the response will be ILLEGAL4res just as it would be with
any other invalid operation code. Note that if the server gets an any other invalid operation code. Note that if the server gets an
illegal operation code that is not OP_ILLEGAL, and if the server illegal operation code that is not OP_ILLEGAL, and if the server
checks for legal operation codes during the XDR decode phase, then checks for legal operation codes during the XDR decode phase, then
the ILLEGAL4res would not be returned. the ILLEGAL4res would not be returned.
19. NFSv44.1 Callback Procedures 19. NFSv4.1 Callback Procedures
The procedures used for callbacks are defined in the following The procedures used for callbacks are defined in the following
sections. In the interest of clarity, the terms "client" and sections. In the interest of clarity, the terms "client" and
"server" refer to NFS clients and servers, despite the fact that for "server" refer to NFS clients and servers, despite the fact that for
an individual callback RPC, the sense of these terms would be an individual callback RPC, the sense of these terms would be
precisely the opposite. precisely the opposite.
19.1. Procedure 0: CB_NULL - No Operation 19.1. Procedure 0: CB_NULL - No Operation
19.1.1. ARGUMENTS 19.1.1. ARGUMENTS
skipping to change at page 549, line 49 skipping to change at page 550, line 49
The server may decide that it cannot hold all of the state for The server may decide that it cannot hold all of the state for
recallable objects, such as delegations and layouts, without running recallable objects, such as delegations and layouts, without running
out of resources. In such a case, it is free to recall individual out of resources. In such a case, it is free to recall individual
objects to reduce the load but this would be far from optimal. objects to reduce the load but this would be far from optimal.
Because the general purpose of such recallable objects as delegations Because the general purpose of such recallable objects as delegations
is to eliminate client interaction with the server, the server cannot is to eliminate client interaction with the server, the server cannot
interpret lack of recent use as indicating that the object is no interpret lack of recent use as indicating that the object is no
longer useful. The absence of visible use may be the result of a longer useful. The absence of visible use may be the result of a
large number of potential operations eliminated. In the case of large number of potential operations eliminated. In the case of
layouts, the layout will be used explicitly but the meta-data server layouts, the layout will be used explicitly but the metadata server
does not have direct knowledge of such use. does not have direct knowledge of such use.
In order to implement an effective reclaim scheme for such objects, In order to implement an effective reclaim scheme for such objects,
the server's knowledge of available resources must be used to the server's knowledge of available resources must be used to
determine when objects must be recalled with the clients selecting determine when objects must be recalled with the clients selecting
the actual objects to be returned. the actual objects to be returned.
Server implementations may differ in their resource allocation Server implementations may differ in their resource allocation
requirements. For example, one server may share resources among all requirements. For example, one server may share resources among all
classes of recallable objects whereas another may use separate classes of recallable objects whereas another may use separate
skipping to change at page 553, line 9 skipping to change at page 554, line 9
slots, and if applicable, transport credits (e.g. RDMA credits for slots, and if applicable, transport credits (e.g. RDMA credits for
connections associated with the operations channel) to the server. connections associated with the operations channel) to the server.
CB_RECALL_SLOT specifies rsa_target_highest_slotid, the target CB_RECALL_SLOT specifies rsa_target_highest_slotid, the target
highest_slot the server wants for the session. The client should highest_slot the server wants for the session. The client should
then work toward reducing the highest_slot to the target. then work toward reducing the highest_slot to the target.
If the session has only non-RDMA connections associated with its If the session has only non-RDMA connections associated with its
operations channel, then the client need only wait for all operations channel, then the client need only wait for all
outstanding requests with a slotid > rsa_target_highest_slotid to outstanding requests with a slotid > rsa_target_highest_slotid to
complete, then send a single COMPOUND consisting of a single SEQUENCE complete, then send a single COMPOUND consisting of a single SEQUENCE
operation, with the sa_highslot field set to operation, with the sa_highestslot field set to
rsa_target_highest_slotid. If there are RDMA-based connections rsa_target_highest_slotid. If there are RDMA-based connections
associated with the operations channel, then the client also needs to send associated with the operations channel, then the client also needs to send
enough zero-length RDMA Sends to take the total RDMA credit count to enough zero-length RDMA Sends to take the total RDMA credit count to
rsa_target_highest_slotid + 1 or below. rsa_target_highest_slotid + 1 or below.
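The client's response to CB_RECALL_SLOT can be sketched with two helper calculations. Both function names are hypothetical, and the RDMA helper assumes that each zero-length Send lowers the credit count by exactly one, which the text implies but does not state.

```python
from typing import Iterable, List


def slots_to_drain(outstanding_slotids: Iterable[int],
                   rsa_target_highest_slotid: int) -> List[int]:
    """Slots whose outstanding requests must complete before the client replies."""
    return sorted(s for s in outstanding_slotids if s > rsa_target_highest_slotid)


def zero_length_sends_needed(current_rdma_credits: int,
                             rsa_target_highest_slotid: int) -> int:
    """Zero-length RDMA Sends needed to reach target + 1 credits or fewer."""
    return max(0, current_rdma_credits - (rsa_target_highest_slotid + 1))
```

Once slots_to_drain returns an empty list, the client would send the single-SEQUENCE COMPOUND described above, with the highest slot field set to rsa_target_highest_slotid.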
20.8.4. IMPLEMENTATION 20.8.4. IMPLEMENTATION
If the client fails to reduce the highest slot it has on the fore channel If the client fails to reduce the highest slot it has on the fore channel
to what the server requests, the server can force the issue by to what the server requests, the server can force the issue by
asserting flow control on the receive side of all connections bound asserting flow control on the receive side of all connections bound
skipping to change at page 554, line 36 skipping to change at page 555, line 36
contents include the session to which this request belongs, slot id contents include the session to which this request belongs, slot id
and sequence id used by the server to implement session request and sequence id used by the server to implement session request
control and exactly once semantics, and exchanged slot maximums which control and exactly once semantics, and exchanged slot maximums which
are used to adjust the size of the reply cache. This operation MUST are used to adjust the size of the reply cache. This operation MUST
appear once as the first operation in each CB_COMPOUND request or a appear once as the first operation in each CB_COMPOUND request or a
protocol error must result. See Section 18.46.3 for a description of protocol error must result. See Section 18.46.3 for a description of
how slots are processed. how slots are processed.
If csa_cachethis is TRUE, then the server is requesting that the If csa_cachethis is TRUE, then the server is requesting that the
client cache the reply in the callback reply cache. The client MUST client cache the reply in the callback reply cache. The client MUST
cache the reply (see Section 2.10.5.1.2). cache the reply (see Section 2.10.5.1.3).
The csa_referring_call_lists array is the list of COMPOUND requests, The csa_referring_call_lists array is the list of COMPOUND requests,
identified by sessionid, slot id and sequence id. These are requests identified by sessionid, slot id and sequence id. These are requests
that the client previously sent to the server. These previous that the client previously sent to the server. These previous
requests created state that some operation(s) in the same requests created state that some operation(s) in the same
CB_COMPOUND as the csa_referring_call_lists is identifying. A CB_COMPOUND as the csa_referring_call_lists is identifying. A
sessionid is included because leased state is tied to a client ID, sessionid is included because leased state is tied to a client ID,
and a client ID can have multiple sessions. See Section 2.10.5.3. and a client ID can have multiple sessions. See Section 2.10.5.3.
The value of csa_sequenceid argument relative to the cached sequence The value of csa_sequenceid argument relative to the cached sequence
 End of changes. 126 change blocks. 
353 lines changed or deleted 441 lines changed or added

This html diff was produced by rfcdiff 1.33. The latest version is available from http://tools.ietf.org/tools/rfcdiff/