draft-pre-ch5.txt | draft-ietf-nfsv4-minorversion1-20.txt | |||
---|---|---|---|---|
NFSv4 S. Shepler | NFSv4 S. Shepler | |||
Internet-Draft M. Eisler | Internet-Draft M. Eisler | |||
Intended status: Standards Track D. Noveck | Intended status: Standards Track D. Noveck | |||
Expires: August 24, 2008 Editors | Expires: August 25, 2008 Editors | |||
February 21, 2008 | February 22, 2008 | |||
NFS Version 4 Minor Version 1 | NFS Version 4 Minor Version 1 | |||
draft-ietf-nfsv4-minorversion1-20.txt | draft-ietf-nfsv4-minorversion1-20.txt | |||
Status of this Memo | Status of this Memo | |||
By submitting this Internet-Draft, each author represents that any | By submitting this Internet-Draft, each author represents that any | |||
applicable patent or other IPR claims of which he or she is aware | applicable patent or other IPR claims of which he or she is aware | |||
have been or will be disclosed, and any of which he or she becomes | have been or will be disclosed, and any of which he or she becomes | |||
aware will be disclosed, in accordance with Section 6 of BCP 79. | aware will be disclosed, in accordance with Section 6 of BCP 79. | |||
skipping to change at page 1, line 35 | skipping to change at page 1, line 35 | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
This Internet-Draft will expire on August 24, 2008. | This Internet-Draft will expire on August 25, 2008. | |||
Copyright Notice | Copyright Notice | |||
Copyright (C) The IETF Trust (2008). | Copyright (C) The IETF Trust (2008). | |||
Abstract | Abstract | |||
This Internet-Draft describes NFS version 4 minor version one, | This Internet-Draft describes NFS version 4 minor version one, | |||
including features retained from the base protocol and protocol | including features retained from the base protocol and protocol | |||
extensions made subsequently. Major extensions introduced in NFS | extensions made subsequently. Major extensions introduced in NFS | |||
skipping to change at page 3, line 6 | skipping to change at page 3, line 6 | |||
2.9. Transport Layers . . . . . . . . . . . . . . . . . . . . 37 | 2.9. Transport Layers . . . . . . . . . . . . . . . . . . . . 37 | |||
2.9.1. REQUIRED and RECOMMENDED Properties of Transports . 37 | 2.9.1. REQUIRED and RECOMMENDED Properties of Transports . 37 | |||
2.9.2. Client and Server Transport Behavior . . . . . . . . 37 | 2.9.2. Client and Server Transport Behavior . . . . . . . . 37 | |||
2.9.3. Ports . . . . . . . . . . . . . . . . . . . . . . . 39 | 2.9.3. Ports . . . . . . . . . . . . . . . . . . . . . . . 39 | |||
2.10. Session . . . . . . . . . . . . . . . . . . . . . . . . 39 | 2.10. Session . . . . . . . . . . . . . . . . . . . . . . . . 39 | |||
2.10.1. Motivation and Overview . . . . . . . . . . . . . . 39 | 2.10.1. Motivation and Overview . . . . . . . . . . . . . . 39 | |||
2.10.2. NFSv4 Integration . . . . . . . . . . . . . . . . . 40 | 2.10.2. NFSv4 Integration . . . . . . . . . . . . . . . . . 40 | |||
2.10.3. Channels . . . . . . . . . . . . . . . . . . . . . . 42 | 2.10.3. Channels . . . . . . . . . . . . . . . . . . . . . . 42 | |||
2.10.4. Trunking . . . . . . . . . . . . . . . . . . . . . . 43 | 2.10.4. Trunking . . . . . . . . . . . . . . . . . . . . . . 43 | |||
2.10.5. Exactly Once Semantics . . . . . . . . . . . . . . . 46 | 2.10.5. Exactly Once Semantics . . . . . . . . . . . . . . . 46 | |||
2.10.6. RDMA Considerations . . . . . . . . . . . . . . . . 58 | 2.10.6. RDMA Considerations . . . . . . . . . . . . . . . . 59 | |||
2.10.7. Sessions Security . . . . . . . . . . . . . . . . . 61 | 2.10.7. Sessions Security . . . . . . . . . . . . . . . . . 61 | |||
2.10.8. The SSV GSS Mechanism . . . . . . . . . . . . . . . 66 | 2.10.8. The SSV GSS Mechanism . . . . . . . . . . . . . . . 67 | |||
2.10.9. Session Mechanics - Steady State . . . . . . . . . . 71 | 2.10.9. Session Mechanics - Steady State . . . . . . . . . . 71 | |||
2.10.10. Session Mechanics - Recovery . . . . . . . . . . . . 72 | 2.10.10. Session Inactivity Timer . . . . . . . . . . . . . . 73 | |||
2.10.11. Parallel NFS and Sessions . . . . . . . . . . . . . 76 | 2.10.11. Session Mechanics - Recovery . . . . . . . . . . . . 73 | |||
2.10.12. Parallel NFS and Sessions . . . . . . . . . . . . . 76 | ||||
3. Protocol Constants and Data Types . . . . . . . . . . . . . . 76 | 3. Protocol Constants and Data Types . . . . . . . . . . . . . . 76 | |||
3.1. Basic Constants . . . . . . . . . . . . . . . . . . . . 76 | 3.1. Basic Constants . . . . . . . . . . . . . . . . . . . . 77 | |||
3.2. Basic Data Types . . . . . . . . . . . . . . . . . . . . 77 | 3.2. Basic Data Types . . . . . . . . . . . . . . . . . . . . 77 | |||
3.3. Structured Data Types . . . . . . . . . . . . . . . . . 79 | 3.3. Structured Data Types . . . . . . . . . . . . . . . . . 79 | |||
4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 88 | 4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 88 | |||
4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 88 | 4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 88 | |||
4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . 89 | 4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . 89 | |||
4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . 89 | 4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . 89 | |||
4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 89 | 4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 89 | |||
4.2.1. General Properties of a Filehandle . . . . . . . . . 90 | 4.2.1. General Properties of a Filehandle . . . . . . . . . 90 | |||
4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 91 | 4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 91 | |||
4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 91 | 4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 91 | |||
skipping to change at page 6, line 39 | skipping to change at page 6, line 40 | |||
12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 263 | 12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 263 | |||
12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 263 | 12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 263 | |||
12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 264 | 12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 264 | |||
12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 265 | 12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 265 | |||
12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 266 | 12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 266 | |||
12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 266 | 12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 266 | |||
12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 266 | 12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 266 | |||
12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 268 | 12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 268 | |||
12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 269 | 12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 269 | |||
12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 270 | 12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 270 | |||
12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 272 | 12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 273 | |||
12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 279 | 12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 280 | |||
12.5.7. Metadata Server Write Propagation . . . . . . . . . 279 | 12.5.7. Metadata Server Write Propagation . . . . . . . . . 280 | |||
12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 280 | 12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 280 | |||
12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 281 | 12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 282 | |||
12.7.1. Recovery from Client Restart . . . . . . . . . . . . 282 | 12.7.1. Recovery from Client Restart . . . . . . . . . . . . 282 | |||
12.7.2. Dealing with Lease Expiration on the Client . . . . 282 | 12.7.2. Dealing with Lease Expiration on the Client . . . . 282 | |||
12.7.3. Dealing with Loss of Layout State on the Metadata | 12.7.3. Dealing with Loss of Layout State on the Metadata | |||
Server . . . . . . . . . . . . . . . . . . . . . . . 283 | Server . . . . . . . . . . . . . . . . . . . . . . . 283 | |||
12.7.4. Recovery from Metadata Server Restart . . . . . . . 284 | 12.7.4. Recovery from Metadata Server Restart . . . . . . . 284 | |||
12.7.5. Operations During Metadata Server Grace Period . . . 286 | 12.7.5. Operations During Metadata Server Grace Period . . . 286 | |||
12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 286 | 12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 286 | |||
12.8. Metadata and Storage Device Roles . . . . . . . . . . . 286 | 12.8. Metadata and Storage Device Roles . . . . . . . . . . . 287 | |||
12.9. Security Considerations for pNFS . . . . . . . . . . . . 287 | 12.9. Security Considerations for pNFS . . . . . . . . . . . . 287 | |||
13. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 288 | 13. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 288 | |||
13.1. Client ID and Session Considerations . . . . . . . . . . 288 | 13.1. Client ID and Session Considerations . . . . . . . . . . 288 | |||
13.2. File Layout Definitions . . . . . . . . . . . . . . . . 290 | 13.1.1. Sessions Considerations for Data Servers . . . . . . 291 | |||
13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 291 | 13.2. File Layout Definitions . . . . . . . . . . . . . . . . 291 | |||
13.4. Interpreting the File Layout . . . . . . . . . . . . . . 295 | 13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 292 | |||
13.4.1. Determining the Stripe Unit Number . . . . . . . . . 295 | 13.4. Interpreting the File Layout . . . . . . . . . . . . . . 296 | |||
13.4.2. Interpreting the File Layout Using Sparse Packing . 295 | 13.4.1. Determining the Stripe Unit Number . . . . . . . . . 296 | |||
13.4.3. Interpreting the File Layout Using Dense Packing . . 298 | 13.4.2. Interpreting the File Layout Using Sparse Packing . 296 | |||
13.4.4. Sparse and Dense Stripe Unit Packing . . . . . . . . 300 | 13.4.3. Interpreting the File Layout Using Dense Packing . . 299 | |||
13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 302 | 13.4.4. Sparse and Dense Stripe Unit Packing . . . . . . . . 301 | |||
13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 303 | 13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 303 | |||
13.7. COMMIT Through Metadata Server . . . . . . . . . . . . . 305 | 13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 304 | |||
13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 307 | 13.7. COMMIT Through Metadata Server . . . . . . . . . . . . . 306 | |||
13.9. Metadata and Data Server State Coordination . . . . . . 307 | 13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 308 | |||
13.9.1. Global Stateid Requirements . . . . . . . . . . . . 307 | 13.9. Metadata and Data Server State Coordination . . . . . . 308 | |||
13.9.2. Data Server State Propagation . . . . . . . . . . . 308 | 13.9.1. Global Stateid Requirements . . . . . . . . . . . . 308 | |||
13.10. Data Server Component File Size . . . . . . . . . . . . 310 | 13.9.2. Data Server State Propagation . . . . . . . . . . . 309 | |||
13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 311 | 13.10. Data Server Component File Size . . . . . . . . . . . . 311 | |||
13.12. Security Considerations for the File Layout Type . . . . 311 | 13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 312 | |||
14. Internationalization . . . . . . . . . . . . . . . . . . . . 312 | 13.12. Security Considerations for the File Layout Type . . . . 312 | |||
14.1. Stringprep profile for the utf8str_cs type . . . . . . . 313 | 14. Internationalization . . . . . . . . . . . . . . . . . . . . 313 | |||
14.2. Stringprep profile for the utf8str_cis type . . . . . . 315 | 14.1. Stringprep profile for the utf8str_cs type . . . . . . . 314 | |||
14.3. Stringprep profile for the utf8str_mixed type . . . . . 316 | 14.2. Stringprep profile for the utf8str_cis type . . . . . . 316 | |||
14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 318 | 14.3. Stringprep profile for the utf8str_mixed type . . . . . 317 | |||
14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 318 | 14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 319 | |||
15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 319 | 14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 319 | |||
15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 319 | 15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 320 | |||
15.1.1. General Errors . . . . . . . . . . . . . . . . . . . 321 | 15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 320 | |||
15.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 323 | 15.1.1. General Errors . . . . . . . . . . . . . . . . . . . 322 | |||
15.1.3. Compound Structure Errors . . . . . . . . . . . . . 324 | 15.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 324 | |||
15.1.4. File System Errors . . . . . . . . . . . . . . . . . 326 | 15.1.3. Compound Structure Errors . . . . . . . . . . . . . 325 | |||
15.1.5. State Management Errors . . . . . . . . . . . . . . 328 | 15.1.4. File System Errors . . . . . . . . . . . . . . . . . 327 | |||
15.1.6. Security Errors . . . . . . . . . . . . . . . . . . 329 | 15.1.5. State Management Errors . . . . . . . . . . . . . . 329 | |||
15.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 329 | 15.1.6. Security Errors . . . . . . . . . . . . . . . . . . 330 | |||
15.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 330 | 15.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 330 | |||
15.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 331 | 15.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 331 | |||
15.1.10. pNFS Errors . . . . . . . . . . . . . . . . . . . . 332 | 15.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 332 | |||
15.1.11. Session Use Errors . . . . . . . . . . . . . . . . . 333 | 15.1.10. pNFS Errors . . . . . . . . . . . . . . . . . . . . 333 | |||
15.1.12. Session Management Errors . . . . . . . . . . . . . 334 | 15.1.11. Session Use Errors . . . . . . . . . . . . . . . . . 334 | |||
15.1.13. Client Management Errors . . . . . . . . . . . . . . 335 | 15.1.12. Session Management Errors . . . . . . . . . . . . . 335 | |||
15.1.14. Delegation Errors . . . . . . . . . . . . . . . . . 336 | 15.1.13. Client Management Errors . . . . . . . . . . . . . . 336 | |||
15.1.15. Attribute Handling Errors . . . . . . . . . . . . . 336 | 15.1.14. Delegation Errors . . . . . . . . . . . . . . . . . 337 | |||
15.1.16. Obsoleted Errors . . . . . . . . . . . . . . . . . . 337 | 15.1.15. Attribute Handling Errors . . . . . . . . . . . . . 337 | |||
15.2. Operations and their valid errors . . . . . . . . . . . 338 | 15.1.16. Obsoleted Errors . . . . . . . . . . . . . . . . . . 338 | |||
15.3. Callback operations and their valid errors . . . . . . . 354 | 15.2. Operations and their valid errors . . . . . . . . . . . 339 | |||
15.4. Errors and the operations that use them . . . . . . . . 356 | 15.3. Callback operations and their valid errors . . . . . . . 355 | |||
16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 370 | 15.4. Errors and the operations that use them . . . . . . . . 357 | |||
16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 370 | 16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 371 | |||
16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 371 | 16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 371 | |||
17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 381 | 16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 372 | |||
18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 384 | 17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 382 | |||
18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 384 | 18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 385 | |||
18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 387 | 18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 385 | |||
18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 388 | 18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 388 | |||
18.4. Operation 6: CREATE - Create a Non-Regular File Object . 391 | 18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 389 | |||
18.4. Operation 6: CREATE - Create a Non-Regular File Object . 392 | ||||
18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting | 18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting | |||
Recovery . . . . . . . . . . . . . . . . . . . . . . . . 394 | Recovery . . . . . . . . . . . . . . . . . . . . . . . . 395 | |||
18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 395 | 18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 396 | |||
18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 395 | 18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 396 | |||
18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 397 | 18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 398 | |||
18.9. Operation 11: LINK - Create Link to a File . . . . . . . 398 | 18.9. Operation 11: LINK - Create Link to a File . . . . . . . 399 | |||
18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 400 | 18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 401 | |||
18.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 404 | 18.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 405 | |||
18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 406 | 18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 407 | |||
18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 407 | 18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 408 | |||
18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 409 | 18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 410 | |||
18.15. Operation 17: NVERIFY - Verify Difference in | 18.15. Operation 17: NVERIFY - Verify Difference in | |||
Attributes . . . . . . . . . . . . . . . . . . . . . . . 410 | Attributes . . . . . . . . . . . . . . . . . . . . . . . 411 | |||
18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 411 | 18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 412 | |||
18.17. Operation 19: OPENATTR - Open Named Attribute | 18.17. Operation 19: OPENATTR - Open Named Attribute | |||
Directory . . . . . . . . . . . . . . . . . . . . . . . 430 | Directory . . . . . . . . . . . . . . . . . . . . . . . 431 | |||
18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 431 | 18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 432 | |||
18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 432 | 18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 433 | |||
18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 433 | 18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 434 | |||
18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 435 | 18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 436 | |||
18.22. Operation 25: READ - Read from File . . . . . . . . . . 435 | 18.22. Operation 25: READ - Read from File . . . . . . . . . . 436 | |||
18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 438 | 18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 439 | |||
18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 441 | 18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 442 | |||
18.25. Operation 28: REMOVE - Remove File System Object . . . . 442 | 18.25. Operation 28: REMOVE - Remove File System Object . . . . 443 | |||
18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 445 | 18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 446 | |||
18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 448 | 18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 449 | |||
18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 449 | 18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 450 | |||
18.29. Operation 33: SECINFO - Obtain Available Security . . . 450 | 18.29. Operation 33: SECINFO - Obtain Available Security . . . 451 | |||
18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 453 | 18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 454 | |||
18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 456 | 18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 457 | |||
18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 457 | 18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 458 | |||
18.33. Operation 40: BACKCHANNEL_CTL - Backchannel control . . 462 | 18.33. Operation 40: BACKCHANNEL_CTL - Backchannel control . . 463 | |||
18.34. Operation 41: BIND_CONN_TO_SESSION . . . . . . . . . . . 463 | 18.34. Operation 41: BIND_CONN_TO_SESSION . . . . . . . . . . . 464 | |||
18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 466 | 18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 467 | |||
18.36. Operation 43: CREATE_SESSION - Create New Session and | 18.36. Operation 43: CREATE_SESSION - Create New Session and | |||
Confirm Client ID . . . . . . . . . . . . . . . . . . . 482 | Confirm Client ID . . . . . . . . . . . . . . . . . . . 483 | |||
18.37. Operation 44: DESTROY_SESSION - Destroy existing | 18.37. Operation 44: DESTROY_SESSION - Destroy existing | |||
session . . . . . . . . . . . . . . . . . . . . . . . . 492 | session . . . . . . . . . . . . . . . . . . . . . . . . 493 | |||
18.38. Operation 45: FREE_STATEID - Free stateid with no | 18.38. Operation 45: FREE_STATEID - Free stateid with no | |||
locks . . . . . . . . . . . . . . . . . . . . . . . . . 494 | locks . . . . . . . . . . . . . . . . . . . . . . . . . 495 | |||
18.39. Operation 46: GET_DIR_DELEGATION - Get a directory | 18.39. Operation 46: GET_DIR_DELEGATION - Get a directory | |||
delegation . . . . . . . . . . . . . . . . . . . . . . . 495 | delegation . . . . . . . . . . . . . . . . . . . . . . . 496 | |||
18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 499 | 18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 500 | |||
18.41. Operation 48: GETDEVICELIST - Get All Device Mappings | 18.41. Operation 48: GETDEVICELIST - Get All Device Mappings | |||
for a File System . . . . . . . . . . . . . . . . . . . 501 | for a File System . . . . . . . . . . . . . . . . . . . 502 | |||
18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using | 18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using | |||
a layout . . . . . . . . . . . . . . . . . . . . . . . . 503 | a layout . . . . . . . . . . . . . . . . . . . . . . . . 504 | |||
18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 506 | 18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 507 | |||
18.44. Operation 51: LAYOUTRETURN - Release Layout | 18.44. Operation 51: LAYOUTRETURN - Release Layout | |||
Information . . . . . . . . . . . . . . . . . . . . . . 510 | Information . . . . . . . . . . . . . . . . . . . . . . 511 | |||
18.45. Operation 52: SECINFO_NO_NAME - Get Security on | 18.45. Operation 52: SECINFO_NO_NAME - Get Security on | |||
Unnamed Object . . . . . . . . . . . . . . . . . . . . . 515 | Unnamed Object . . . . . . . . . . . . . . . . . . . . . 516 | |||
18.46. Operation 53: SEQUENCE - Supply per-procedure | 18.46. Operation 53: SEQUENCE - Supply per-procedure | |||
sequencing and control . . . . . . . . . . . . . . . . . 516 | sequencing and control . . . . . . . . . . . . . . . . . 517 | |||
18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 522 | 18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 523 | |||
18.48. Operation 55: TEST_STATEID - Test stateids for | 18.48. Operation 55: TEST_STATEID - Test stateids for | |||
validity . . . . . . . . . . . . . . . . . . . . . . . . 524 | validity . . . . . . . . . . . . . . . . . . . . . . . . 525 | |||
18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 526 | 18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 527 | |||
18.50. Operation 57: DESTROY_CLIENTID - Destroy existing | 18.50. Operation 57: DESTROY_CLIENTID - Destroy existing | |||
client ID . . . . . . . . . . . . . . . . . . . . . . . 529 | client ID . . . . . . . . . . . . . . . . . . . . . . . 530 | |||
18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims | 18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims | |||
Finished . . . . . . . . . . . . . . . . . . . . . . . . 530 | Finished . . . . . . . . . . . . . . . . . . . . . . . . 531 | |||
18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 532 | 18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 533 | |||
19. NFSv44.1 Callback Procedures . . . . . . . . . . . . . . . . 533 | 19. NFSv4.1 Callback Procedures . . . . . . . . . . . . . . . . . 534 | |||
19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 533 | 19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 534 | |||
19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 533 | 19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 534 | |||
20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 538 | 20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 539 | |||
20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 538 | 20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 539 | |||
20.2. Operation 4: CB_RECALL - Recall an Open Delegation . . . 539 | 20.2. Operation 4: CB_RECALL - Recall an Open Delegation . . . 540 | |||
20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from | 20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from | |||
Client . . . . . . . . . . . . . . . . . . . . . . . . . 540 | Client . . . . . . . . . . . . . . . . . . . . . . . . . 541 | |||
20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 544 | 20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 545 | |||
20.5. Operation 7: CB_PUSH_DELEG - Offer Delegation to | 20.5. Operation 7: CB_PUSH_DELEG - Offer Delegation to | |||
Client . . . . . . . . . . . . . . . . . . . . . . . . . 548 | Client . . . . . . . . . . . . . . . . . . . . . . . . . 549 | |||
20.6. Operation 8: CB_RECALL_ANY - Keep any N delegations . . 549 | 20.6. Operation 8: CB_RECALL_ANY - Keep any N delegations . . 550 | |||
20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal | 20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal | |||
Resources for Recallable Objects . . . . . . . . . . . . 551 | Resources for Recallable Objects . . . . . . . . . . . . 552 | |||
20.8. Operation 10: CB_RECALL_SLOT - change flow control | 20.8. Operation 10: CB_RECALL_SLOT - change flow control | |||
limits . . . . . . . . . . . . . . . . . . . . . . . . . 552 | limits . . . . . . . . . . . . . . . . . . . . . . . . . 553 | |||
20.9. Operation 11: CB_SEQUENCE - Supply backchannel | 20.9. Operation 11: CB_SEQUENCE - Supply backchannel | |||
sequencing and control . . . . . . . . . . . . . . . . . 553 | sequencing and control . . . . . . . . . . . . . . . . . 554 | |||
20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending | 20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending | |||
Delegation Wants . . . . . . . . . . . . . . . . . . . . 555 | Delegation Wants . . . . . . . . . . . . . . . . . . . . 556 | |||
20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible | 20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible | |||
lock availability . . . . . . . . . . . . . . . . . . . 556 | lock availability . . . . . . . . . . . . . . . . . . . 557 | |||
20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify device ID | 20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify device ID | |||
changes . . . . . . . . . . . . . . . . . . . . . . . . 558 | changes . . . . . . . . . . . . . . . . . . . . . . . . 559 | |||
20.13. Operation 10044: CB_ILLEGAL - Illegal Callback | 20.13. Operation 10044: CB_ILLEGAL - Illegal Callback | |||
Operation . . . . . . . . . . . . . . . . . . . . . . . 560 | Operation . . . . . . . . . . . . . . . . . . . . . . . 561 | |||
21. Security Considerations . . . . . . . . . . . . . . . . . . . 560 | 21. Security Considerations . . . . . . . . . . . . . . . . . . . 561 | |||
22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 562 | 22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 563 | |||
22.1. Named Attribute Definitions . . . . . . . . . . . . . . 562 | 22.1. Named Attribute Definitions . . . . . . . . . . . . . . 563 | |||
22.2. ONC RPC Network Identifiers (netids) . . . . . . . . . . 562 | 22.2. ONC RPC Network Identifiers (netids) . . . . . . . . . . 563 | |||
22.3. Defining New Notifications . . . . . . . . . . . . . . . 563 | 22.3. Defining New Notifications . . . . . . . . . . . . . . . 564 | |||
22.4. Defining New Layout Types . . . . . . . . . . . . . . . 563 | 22.4. Defining New Layout Types . . . . . . . . . . . . . . . 564 | |||
22.5. Path Variable Definitions . . . . . . . . . . . . . . . 565 | 22.5. Path Variable Definitions . . . . . . . . . . . . . . . 566 | |||
22.5.1. Path Variable Values . . . . . . . . . . . . . . . . 565 | 22.5.1. Path Variable Values . . . . . . . . . . . . . . . . 566 | |||
22.5.2. Path Variable Names . . . . . . . . . . . . . . . . 565 | 22.5.2. Path Variable Names . . . . . . . . . . . . . . . . 566 | |||
23. References . . . . . . . . . . . . . . . . . . . . . . . . . 565 | 23. References . . . . . . . . . . . . . . . . . . . . . . . . . 566 | |||
23.1. Normative References . . . . . . . . . . . . . . . . . . 565 | 23.1. Normative References . . . . . . . . . . . . . . . . . . 566 | |||
23.2. Informative References . . . . . . . . . . . . . . . . . 567 | 23.2. Informative References . . . . . . . . . . . . . . . . . 568 | |||
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 568 | Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 569 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 570 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 571 | |||
Intellectual Property and Copyright Statements . . . . . . . . . 572 | Intellectual Property and Copyright Statements . . . . . . . . . 573 | |||
1. Introduction | 1. Introduction | |||
1.1. The NFS Version 4 Minor Version 1 Protocol | 1.1. The NFS Version 4 Minor Version 1 Protocol | |||
The NFS version 4 minor version 1 (NFSv4.1) protocol is the second | The NFS version 4 minor version 1 (NFSv4.1) protocol is the second | |||
minor version of the NFS version 4 (NFSv4) protocol. The first minor | minor version of the NFS version 4 (NFSv4) protocol. The first minor | |||
version, NFSv4.0, is described in [21]. It generally follows the | version, NFSv4.0, is described in [21]. It generally follows the | |||
guidelines for the minor versioning model listed in Section 10 of RFC | guidelines for the minor versioning model listed in Section 10 of RFC | |||
3530. However, it diverges from guidelines 11 ("a client and server | 3530. However, it diverges from guidelines 11 ("a client and server | |||
skipping to change at page 27, line 45 | skipping to change at page 27, line 45 | |||
the client ID in order to conserve resources. If the client contacts | the client ID in order to conserve resources. If the client contacts | |||
the server after this release, the server must ensure the client | the server after this release, the server must ensure the client | |||
receives the appropriate error so that it will use the EXCHANGE_ID/ | receives the appropriate error so that it will use the EXCHANGE_ID/ | |||
CREATE_SESSION sequence to establish a new client ID. The server | CREATE_SESSION sequence to establish a new client ID. The server | |||
ought to be very hesitant to release a client ID since the resulting | ought to be very hesitant to release a client ID since the resulting | |||
work on the client to recover from such an event will be the same | work on the client to recover from such an event will be the same | |||
burden as if the server had failed and restarted. Typically a server | burden as if the server had failed and restarted. Typically a server | |||
would not release a client ID unless there had been no activity from | would not release a client ID unless there had been no activity from | |||
that client for many minutes. As long as there are sessions, opens, | that client for many minutes. As long as there are sessions, opens, | |||
locks, delegations, layouts, or wants, the server MUST NOT release | locks, delegations, layouts, or wants, the server MUST NOT release | |||
the client ID. See Section 2.10.10.1.4 for discussion on releasing | the client ID. See Section 2.10.11.1.4 for discussion on releasing | |||
inactive sessions. | inactive sessions. | |||
2.4.3. Resolving Client Owner Conflicts | 2.4.3. Resolving Client Owner Conflicts | |||
When the server gets an EXCHANGE_ID for a client owner that currently | When the server gets an EXCHANGE_ID for a client owner that currently | |||
has no state, or that has state, but the lease has expired, the | has no state, or that has state, but the lease has expired, the | |||
server MUST allow the EXCHANGE_ID, and confirm the new client ID if | server MUST allow the EXCHANGE_ID, and confirm the new client ID if | |||
followed by the appropriate CREATE_SESSION. | followed by the appropriate CREATE_SESSION. | |||
When the server gets an EXCHANGE_ID for a new incarnation of a client | When the server gets an EXCHANGE_ID for a new incarnation of a client | |||
skipping to change at page 46, line 43 | skipping to change at page 46, line 43 | |||
2.10.5. Exactly Once Semantics | 2.10.5. Exactly Once Semantics | |||
Via the session, NFSv4.1 offers Exactly Once Semantics (EOS) for | Via the session, NFSv4.1 offers Exactly Once Semantics (EOS) for | |||
requests sent over a channel. EOS is supported on both the fore and | requests sent over a channel. EOS is supported on both the fore and | |||
back channels. | back channels. | |||
Each COMPOUND or CB_COMPOUND request that is sent with a leading | Each COMPOUND or CB_COMPOUND request that is sent with a leading | |||
SEQUENCE or CB_SEQUENCE operation MUST be executed by the receiver | SEQUENCE or CB_SEQUENCE operation MUST be executed by the receiver | |||
exactly once. This requirement holds regardless of whether the | exactly once. This requirement holds regardless of whether the | |||
request is sent with reply caching specified (see | request is sent with reply caching specified (see | |||
Section 2.10.5.1.2). The requirement holds even if the requester is | Section 2.10.5.1.3). The requirement holds even if the requester is | |||
issuing the request over a session created between a pNFS data client | issuing the request over a session created between a pNFS data client | |||
and pNFS data server. To understand the rationale for this | and pNFS data server. To understand the rationale for this | |||
requirement, divide the requests into three classifications: | requirement, divide the requests into three classifications: | |||
o Nonidempotent requests. | o Nonidempotent requests. | |||
o Idempotent modifying requests. | o Idempotent modifying requests. | |||
o Idempotent non-modifying requests. | o Idempotent non-modifying requests. | |||
skipping to change at page 49, line 40 | skipping to change at page 49, line 40 | |||
seen in the slot. Note that because the sequence id must | seen in the slot. Note that because the sequence id must | |||
wraparound to zero (0) once it reaches 0xFFFFFFFF, a misordered | wraparound to zero (0) once it reaches 0xFFFFFFFF, a misordered | |||
new request and a misordered retry cannot be distinguished. Thus, | new request and a misordered retry cannot be distinguished. Thus, | |||
the replier MUST return NFS4ERR_SEQ_MISORDERED (as the result from | the replier MUST return NFS4ERR_SEQ_MISORDERED (as the result from | |||
SEQUENCE or CB_SEQUENCE). | SEQUENCE or CB_SEQUENCE). | |||
Unlike the XID, the slot id is always within a specific range; this | Unlike the XID, the slot id is always within a specific range; this | |||
has two implications. The first implication is that for a given | has two implications. The first implication is that for a given | |||
session, the replier need only cache the results of a limited number | session, the replier need only cache the results of a limited number | |||
of COMPOUND requests. The second implication derives from the | of COMPOUND requests. The second implication derives from the | |||
first, which is unlike XID-indexed reply caches (also known as | first, which is that unlike XID-indexed reply caches (also known as | |||
duplicate request caches - DRCs), the slot id-based reply cache | duplicate request caches - DRCs), the slot id-based reply cache | |||
cannot be overflowed. Through use of the sequence id to identify | cannot be overflowed. Through use of the sequence id to identify | |||
retransmitted requests, the replier does not need to actually cache | retransmitted requests, the replier does not need to actually cache | |||
the request itself, reducing the storage requirements of the reply | the request itself, reducing the storage requirements of the reply | |||
cache further. These facilities make it practical to maintain all | cache further. These facilities make it practical to maintain all | |||
the required entries for an effective reply cache. | the required entries for an effective reply cache. | |||
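The slot-indexed reply cache described above can be sketched as follows. This is a minimal illustration, not the draft's normative algorithm: the names Slot, SlotReplyCache, and handle are hypothetical, the "equal means retry, one higher means new request" rule and the NFS4ERR_BADSLOT result are taken from the wider specification rather than from the text shown here, and the initial cached reply on an unused slot is an assumption of this sketch.

```python
SEQ_MAX = 0xFFFFFFFF  # sequence ids wrap around to zero after this value


class Slot:
    def __init__(self):
        self.seqid = 0  # last sequence id accepted on this slot
        # Assumption: an unused slot answers a "retry" with a misordered error.
        self.reply = "NFS4ERR_SEQ_MISORDERED"


class SlotReplyCache:
    """Hypothetical per-session reply cache with a fixed number of slots."""

    def __init__(self, num_slots):
        # The slot id range is bounded, so the cache cannot overflow.
        self.slots = [Slot() for _ in range(num_slots)]

    def handle(self, slotid, seqid, execute):
        if not 0 <= slotid < len(self.slots):
            return "NFS4ERR_BADSLOT"
        slot = self.slots[slotid]
        if seqid == slot.seqid:
            # Retransmission: replay the cached reply, do not re-execute.
            return slot.reply
        if seqid == (slot.seqid + 1) & SEQ_MAX:
            # New request: execute once, then cache only the reply.
            reply = execute()
            slot.seqid, slot.reply = seqid, reply
            return reply
        # Anything else is misordered; the slot is left unmodified.
        return "NFS4ERR_SEQ_MISORDERED"
```

Only the reply and the last sequence id are stored per slot; the request itself is never retained, which is the storage saving the paragraph above refers to.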
The slot id, sequence id, and sessionid therefore take over the | The slot id, sequence id, and sessionid therefore take over the | |||
traditional role of the XID and source network address in the | traditional role of the XID and source network address in the | |||
replier's reply cache implementation. This approach is considerably | replier's reply cache implementation. This approach is considerably | |||
skipping to change at page 52, line 23 | skipping to change at page 52, line 23 | |||
because the request may have been sent from the requester before | because the request may have been sent from the requester before | |||
the update was received. Therefore, in the downward adjustment | the update was received. Therefore, in the downward adjustment | |||
case, the replier may have to retain a number of reply cache | case, the replier may have to retain a number of reply cache | |||
entries at least as large as the old value of maximum requests | entries at least as large as the old value of maximum requests | |||
outstanding, until it can infer that the requester has seen a | outstanding, until it can infer that the requester has seen a | |||
reply containing the new granted highest_slotid. The replier can | reply containing the new granted highest_slotid. The replier can | |||
infer that the requester has seen such a reply when it receives a new | infer that the requester has seen such a reply when it receives a new | |||
request with the same slotid as the request replied to and the | request with the same slotid as the request replied to and the | |||
next higher sequenceid. | next higher sequenceid. | |||
2.10.5.1.1. Errors from SEQUENCE and CB_SEQUENCE | 2.10.5.1.1. Caching of SEQUENCE and CB_SEQUENCE Replies | |||
When a SEQUENCE or CB_SEQUENCE operation is successfully executed, | ||||
its reply MUST always be cached. Specifically, sessionid, | ||||
sequenceid, and slotid MUST be cached in the reply cache. The reply | ||||
from SEQUENCE also includes the highest slotid, target highest | ||||
slotid, and status flags. The server SHOULD NOT cache these values, | ||||
and instead SHOULD re-compute the values from the current state of | ||||
the fore channel, session and/or client ID as appropriate. | ||||
Similarly, the reply from CB_SEQUENCE includes a highest slotid and | ||||
target highest slotid. The client SHOULD NOT cache these values, and | ||||
SHOULD re-compute the values from the current state of the session as | ||||
appropriate. | ||||
2.10.5.1.2. Errors from SEQUENCE and CB_SEQUENCE | ||||
Any time SEQUENCE or CB_SEQUENCE return an error, the sequence id of | Any time SEQUENCE or CB_SEQUENCE return an error, the sequence id of | |||
the slot MUST NOT change. The replier MUST NOT modify the reply | the slot MUST NOT change. The replier MUST NOT modify the reply | |||
cache entry for the slot whenever an error is returned from SEQUENCE | cache entry for the slot whenever an error is returned from SEQUENCE | |||
or CB_SEQUENCE. | or CB_SEQUENCE. | |||
2.10.5.1.2. Optional Reply Caching | 2.10.5.1.3. Optional Reply Caching | |||
On a per-request basis the requester can choose to direct the replier | On a per-request basis the requester can choose to direct the replier | |||
to cache the reply to all operations after the first operation | to cache the reply to all operations after the first operation | |||
(SEQUENCE or CB_SEQUENCE) via the sa_cachethis or csa_cachethis | (SEQUENCE or CB_SEQUENCE) via the sa_cachethis or csa_cachethis | |||
fields of the arguments to SEQUENCE or CB_SEQUENCE. The reason it | fields of the arguments to SEQUENCE or CB_SEQUENCE. The reason it | |||
would not direct the replier to cache the entire reply is that the | would not direct the replier to cache the entire reply is that the | |||
request is composed of all idempotent operations [24]. Caching the | request is composed of all idempotent operations [24]. Caching the | |||
reply may offer little benefit. If the reply is too large (see | reply may offer little benefit. If the reply is too large (see | |||
Section 2.10.5.4), it may not be cacheable anyway. Even if the reply | Section 2.10.5.4), it may not be cacheable anyway. Even if the reply | |||
to an idempotent request is small enough to cache, unnecessarily caching | to an idempotent request is small enough to cache, unnecessarily caching | |||
skipping to change at page 53, line 9 | skipping to change at page 53, line 23 | |||
incremented by one. If a requester does not direct the replier to | incremented by one. If a requester does not direct the replier to | |||
cache the reply, the replier MUST do one of the following: | cache the reply, the replier MUST do one of the following: | |||
o The replier can cache the entire original reply. Even though | o The replier can cache the entire original reply. Even though | |||
sa_cachethis or csa_cachethis are FALSE, the replier is always | sa_cachethis or csa_cachethis are FALSE, the replier is always | |||
free to cache. It may choose this approach in order to simplify | free to cache. It may choose this approach in order to simplify | |||
implementation. | implementation. | |||
o The replier enters into its reply cache a reply consisting of the | o The replier enters into its reply cache a reply consisting of the | |||
original results to the SEQUENCE or CB_SEQUENCE operation, and | original results to the SEQUENCE or CB_SEQUENCE operation, and | |||
with the next operation in COMPOUND or CB)COMPOUND having the | with the next operation in COMPOUND or CB_COMPOUND having the | |||
error NFS4ERR_RETRY_UNCACHED_REP. Thus if the requester later | error NFS4ERR_RETRY_UNCACHED_REP. Thus if the requester later | |||
retries the request, it will get NFS4ERR_RETRY_UNCACHED_REP. | retries the request, it will get NFS4ERR_RETRY_UNCACHED_REP. | |||
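The two options in the list above can be sketched as a single function that decides what goes into the reply cache. The function name reply_cache_entry, the cache_anyway parameter, and the representation of a reply as a list of per-operation results are hypothetical. The handling of a request that contains only SEQUENCE (no next operation to carry the error) is an assumption of this sketch.

```python
RETRY_UNCACHED = "NFS4ERR_RETRY_UNCACHED_REP"


def reply_cache_entry(results, cachethis, cache_anyway=False):
    """Return the reply to store in the slot's cache entry.

    results      -- per-operation results; results[0] is SEQUENCE/CB_SEQUENCE
    cachethis    -- value of sa_cachethis or csa_cachethis in the request
    cache_anyway -- replier policy: cache in full even when not asked to
    """
    if cachethis or cache_anyway:
        # Option 1: cache the entire original reply.
        return list(results)
    # Option 2: keep the SEQUENCE result and mark the next operation
    # with NFS4ERR_RETRY_UNCACHED_REP so a retry is told it was not cached.
    entry = [results[0]]
    if len(results) > 1:
        entry.append(RETRY_UNCACHED)
    return entry
```

For example, a three-operation reply with sa_cachethis FALSE is stored as the SEQUENCE result followed by NFS4ERR_RETRY_UNCACHED_REP, so a later retry receives that error instead of re-executing the request.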
2.10.5.2. Retry and Replay of Reply | 2.10.5.2. Retry and Replay of Reply | |||
A requester MUST NOT retry a request, unless the connection it used | A requester MUST NOT retry a request, unless the connection it used | |||
to send the request disconnects. The requester can then reconnect | to send the request disconnects. The requester can then reconnect | |||
and re-send the request, or it can re-send the request over a | and re-send the request, or it can re-send the request over a | |||
different connection that is associated with the same session. | different connection that is associated with the same session. | |||
skipping to change at page 56, line 11 | skipping to change at page 56, line 24 | |||
If a reply exceeds ca_maxresponsesize, the reply will have the status | If a reply exceeds ca_maxresponsesize, the reply will have the status | |||
NFS4ERR_REP_TOO_BIG. A replier MAY return NFS4ERR_REP_TOO_BIG as the | NFS4ERR_REP_TOO_BIG. A replier MAY return NFS4ERR_REP_TOO_BIG as the | |||
status for the first operation (SEQUENCE or CB_SEQUENCE) in the request, | status for the first operation (SEQUENCE or CB_SEQUENCE) in the request, | |||
or it MAY choose to return it on a subsequent operation (in the same | or it MAY choose to return it on a subsequent operation (in the same | |||
COMPOUND or CB_COMPOUND reply). A replier MAY return | COMPOUND or CB_COMPOUND reply). A replier MAY return | |||
NFS4ERR_REP_TOO_BIG in the reply to SEQUENCE or CB_SEQUENCE, even if | NFS4ERR_REP_TOO_BIG in the reply to SEQUENCE or CB_SEQUENCE, even if | |||
the response would still exceed ca_maxresponsesize. | the response would still exceed ca_maxresponsesize. | |||
If sa_cachethis or csa_cachethis are TRUE, then the replier MUST | If sa_cachethis or csa_cachethis are TRUE, then the replier MUST | |||
cache a reply except if an error is returned by the SEQUENCE or | cache a reply except if an error is returned by the SEQUENCE or | |||
CB_SEQUENCE operation (see Section 2.10.5.1.1). If the reply exceeds | CB_SEQUENCE operation (see Section 2.10.5.1.2). If the reply exceeds | |||
ca_maxresponsesize_cached, (and sa_cachethis or csa_cachethis are | ca_maxresponsesize_cached, (and sa_cachethis or csa_cachethis are | |||
TRUE) then the server MUST return NFS4ERR_REP_TOO_BIG_TO_CACHE. Even | TRUE) then the server MUST return NFS4ERR_REP_TOO_BIG_TO_CACHE. Even | |||
if NFS4ERR_REP_TOO_BIG_TO_CACHE (or any other error for that matter) | if NFS4ERR_REP_TOO_BIG_TO_CACHE (or any other error for that matter) | |||
is returned on an operation other than the first operation (SEQUENCE or | is returned on an operation other than the first operation (SEQUENCE or | |||
CB_SEQUENCE), then the reply MUST be cached if sa_cachethis or | CB_SEQUENCE), then the reply MUST be cached if sa_cachethis or | |||
csa_cachethis are TRUE. For example, if a COMPOUND has eleven | csa_cachethis are TRUE. For example, if a COMPOUND has eleven | |||
operations, including SEQUENCE, the fifth operation is a RENAME, and | operations, including SEQUENCE, the fifth operation is a RENAME, and | |||
the tenth operation is a READ for one million bytes, the server may | the tenth operation is a READ for one million bytes, the server may | |||
return NFS4ERR_REP_TOO_BIG_TO_CACHE on the tenth operation. Since | return NFS4ERR_REP_TOO_BIG_TO_CACHE on the tenth operation. Since | |||
the server executed several operations, especially the non-idempotent | the server executed several operations, especially the non-idempotent | |||
skipping to change at page 71, line 18 | skipping to change at page 71, line 27 | |||
Section 5.2.2 "Context Creation Requests" in [4]). | Section 5.2.2 "Context Creation Requests" in [4]). | |||
2.10.9. Session Mechanics - Steady State | 2.10.9. Session Mechanics - Steady State | |||
2.10.9.1. Obligations of the Server | 2.10.9.1. Obligations of the Server | |||
The server has the primary obligation to monitor the state of | The server has the primary obligation to monitor the state of | |||
backchannel resources that the client has created for the server | backchannel resources that the client has created for the server | |||
(RPCSEC_GSS contexts and backchannel connections). If these | (RPCSEC_GSS contexts and backchannel connections). If these | |||
resources vanish, the server takes action as specified in | resources vanish, the server takes action as specified in | |||
Section 2.10.10.2. | Section 2.10.11.2. | |||
2.10.9.2. Obligations of the Client | 2.10.9.2. Obligations of the Client | |||
The client SHOULD honor the following obligations in order to utilize | The client SHOULD honor the following obligations in order to utilize | |||
the session: | the session: | |||
o Keep a necessary session from going idle on the server. A client | o Keep a necessary session from going idle on the server. A client | |||
that requires a session, but nonetheless is not sending operations | that requires a session, but nonetheless is not sending operations | |||
risks having the session be destroyed by the server. This is | risks having the session be destroyed by the server. This is | |||
because sessions consume resources, and resource limitations may | because sessions consume resources, and resource limitations may | |||
force the server to cull an inactive session. | force the server to cull an inactive session. A server MAY | |||
consider a session to be inactive if the client has not used the | ||||
session before the session inactivity timer (Section 2.10.10) has | ||||
expired. | ||||
o Destroy the session when not needed. If a client has multiple | o Destroy the session when not needed. If a client has multiple | |||
sessions, one of which has no requests waiting for replies, and | sessions, one of which has no requests waiting for replies, and | |||
has been idle for some period of time, it SHOULD destroy the | has been idle for some period of time, it SHOULD destroy the | |||
session. | session. | |||
o Maintain GSS contexts for the backchannel. If the client requires | o Maintain GSS contexts for the backchannel. If the client requires | |||
the server to use the RPCSEC_GSS security flavor for callbacks, | the server to use the RPCSEC_GSS security flavor for callbacks, | |||
then it needs to be sure the contexts handed to the server via | then it needs to be sure the contexts handed to the server via | |||
BACKCHANNEL_CTL are unexpired. | BACKCHANNEL_CTL are unexpired. | |||
skipping to change at page 72, line 47 | skipping to change at page 73, line 9 | |||
If the client wants to use additional connections for the | If the client wants to use additional connections for the | |||
backchannel, then it must call BIND_CONN_TO_SESSION on each | backchannel, then it must call BIND_CONN_TO_SESSION on each | |||
connection it wants to use with the session. If the client wants to | connection it wants to use with the session. If the client wants to | |||
use additional connections for the fore channel, then it must call | use additional connections for the fore channel, then it must call | |||
BIND_CONN_TO_SESSION if it specified SP4_SSV or SP4_MACH_CRED state | BIND_CONN_TO_SESSION if it specified SP4_SSV or SP4_MACH_CRED state | |||
protection when the client ID was created. | protection when the client ID was created. | |||
At this point the session has reached steady state. | At this point the session has reached steady state. | |||
2.10.10. Session Mechanics - Recovery | 2.10.10. Session Inactivity Timer | |||
2.10.10.1. Events Requiring Client Action | The server MAY maintain a session inactivity timer for each session. | |||
If the session inactivity timer expires, then the server MAY destroy | ||||
the session. To avoid losing a session due to inactivity, the client | ||||
MUST renew the session inactivity timer. The length of the session | ||||
inactivity timer MUST NOT be less than the lease_time attribute | ||||
(Section 5.7.1.11). As with lease renewal (Section 8.3), when the | ||||
server receives a SEQUENCE operation, it resets the session | ||||
inactivity timer, and MUST NOT allow the timer to expire while the | ||||
rest of the operations in the COMPOUND procedure's request are still | ||||
executing. Once the last operation has finished, the server MUST set | ||||
the session inactivity timer to expire no sooner than the sum of the | ||||
current time and the value of the lease_time attribute. | ||||
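The timer rules in this section can be sketched as below. The class and method names are hypothetical, time is passed in explicitly as a number of seconds so the behavior is deterministic, and the in-flight counter is an assumed way of meeting the requirement that the timer not expire while a COMPOUND is still executing.

```python
class SessionInactivityTimer:
    """Hypothetical per-session inactivity timer kept by the server."""

    def __init__(self, lease_time, length=None, now=0.0):
        length = lease_time if length is None else length
        if length < lease_time:
            # The timer length MUST NOT be less than lease_time.
            raise ValueError("timer length is shorter than lease_time")
        self.length = length
        self.in_flight = 0            # COMPOUNDs currently executing
        self.expires_at = now + length

    def sequence_received(self, now):
        # A SEQUENCE operation resets the timer.
        self.in_flight += 1
        self.expires_at = now + self.length

    def compound_finished(self, now):
        # After the last operation, expire no sooner than now + lease_time.
        self.in_flight -= 1
        self.expires_at = now + self.length

    def expired(self, now):
        # Never report expiry while a request on the session is executing.
        return self.in_flight == 0 and now >= self.expires_at
```

A server that finds expired(now) true MAY destroy the session; nothing in this sketch obliges it to.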
2.10.11. Session Mechanics - Recovery | ||||
2.10.11.1. Events Requiring Client Action | ||||
The following events require client action to recover. | The following events require client action to recover. | |||
2.10.10.1.1. RPCSEC_GSS Context Loss by Callback Path | 2.10.11.1.1. RPCSEC_GSS Context Loss by Callback Path | |||
If all RPCSEC_GSS contexts granted by the client to the server for | If all RPCSEC_GSS contexts granted by the client to the server for | |||
callback use have expired, the client MUST establish a new context | callback use have expired, the client MUST establish a new context | |||
via BACKCHANNEL_CTL. The sr_status_flags field of the SEQUENCE | via BACKCHANNEL_CTL. The sr_status_flags field of the SEQUENCE | |||
results indicates when callback contexts are nearly expired, or fully | results indicates when callback contexts are nearly expired, or fully | |||
expired (see Section 18.46.3). | expired (see Section 18.46.3). | |||
2.10.10.1.2. Connection Loss | 2.10.11.1.2. Connection Loss | |||
If the client loses the last connection of the session, and if it wants | If the client loses the last connection of the session, and if it wants | |||
to retain the session, then it must create a new connection, and if, | to retain the session, then it must create a new connection, and if, | |||
when the client ID was created, BIND_CONN_TO_SESSION was specified in | when the client ID was created, BIND_CONN_TO_SESSION was specified in | |||
the spo_must_enforce list, the client MUST use BIND_CONN_TO_SESSION | the spo_must_enforce list, the client MUST use BIND_CONN_TO_SESSION | |||
to associate the connection with the session. | to associate the connection with the session. | |||
If there was a request outstanding at the time of the connection | If there was a request outstanding at the time of the connection | |||
loss, then if the client wants to continue to use the session it MUST | loss, then if the client wants to continue to use the session it MUST | |||
retry the request, as described in Section 2.10.5.2. Note that it is | retry the request, as described in Section 2.10.5.2. Note that it is | |||
skipping to change at page 73, line 39 | skipping to change at page 74, line 16 | |||
disconnect. | disconnect. | |||
If the connection that was lost was the last one associated with the | If the connection that was lost was the last one associated with the | |||
backchannel, and the client wants to retain the backchannel and/or | backchannel, and the client wants to retain the backchannel and/or | |||
not put recallable state subject to revocation, the client must | not put recallable state subject to revocation, the client must | |||
reconnect, and if it does, it MUST associate the connection to the | reconnect, and if it does, it MUST associate the connection to the | |||
session and backchannel via BIND_CONN_TO_SESSION. The server SHOULD | session and backchannel via BIND_CONN_TO_SESSION. The server SHOULD | |||
indicate when it has no callback connection via the sr_status_flags | indicate when it has no callback connection via the sr_status_flags | |||
result from SEQUENCE. | result from SEQUENCE. | |||
2.10.10.1.3. Backchannel GSS Context Loss | 2.10.11.1.3. Backchannel GSS Context Loss | |||
Via the sr_status_flags result of the SEQUENCE operation or other | Via the sr_status_flags result of the SEQUENCE operation or other | |||
means, the client will learn if some or all of the RPCSEC_GSS | means, the client will learn if some or all of the RPCSEC_GSS | |||
contexts it assigned to the backchannel have been lost. If the | contexts it assigned to the backchannel have been lost. If the | |||
client wants to retain the backchannel and/or not put recallable | client wants to retain the backchannel and/or not put recallable | |||
state subject to revocation, the client must use BACKCHANNEL_CTL | state subject to revocation, the client must use BACKCHANNEL_CTL | |||
to assign new contexts. | to assign new contexts. | |||
2.10.10.1.4. Loss of Session | 2.10.11.1.4. Loss of Session | |||
The replier might lose a record of the session. Causes include: | The replier might lose a record of the session. Causes include: | |||
o Replier failure and restart | o Replier failure and restart | |||
o A catastrophe that causes the reply cache to be corrupted or lost | o A catastrophe that causes the reply cache to be corrupted or lost | |||
on the media it was stored on. This applies even if the replier | on the media it was stored on. This applies even if the replier | |||
indicated in the CREATE_SESSION results that it would persist the | indicated in the CREATE_SESSION results that it would persist the | |||
cache. | cache. | |||
skipping to change at page 75, line 5 | skipping to change at page 75, line 27 | |||
client ID; loss of client ID however does imply loss of session, | client ID; loss of client ID however does imply loss of session, | |||
lock, open, delegation, and layout state. See Section 8.4.2. A | lock, open, delegation, and layout state. See Section 8.4.2. A | |||
session can survive a server restart, but lock recovery may still be | session can survive a server restart, but lock recovery may still be | |||
needed. | needed. | |||
It is possible CREATE_SESSION will fail with NFS4ERR_STALE_CLIENTID | It is possible CREATE_SESSION will fail with NFS4ERR_STALE_CLIENTID | |||
(for example the server restarts and does not preserve client ID | (for example the server restarts and does not preserve client ID | |||
state). If so, the client needs to call EXCHANGE_ID, followed by | state). If so, the client needs to call EXCHANGE_ID, followed by | |||
CREATE_SESSION. | CREATE_SESSION. | |||
2.10.10.2. Events Requiring Server Action | 2.10.11.2. Events Requiring Server Action | |||
The following events require server action to recover. | The following events require server action to recover. | |||
2.10.10.2.1. Client Crash and Restart | 2.10.11.2.1. Client Crash and Restart | |||
As described in Section 18.35, a restarted client sends EXCHANGE_ID | As described in Section 18.35, a restarted client sends EXCHANGE_ID | |||
in such a way it causes the server to delete any sessions it had. | in such a way it causes the server to delete any sessions it had. | |||
2.10.10.2.2. Client Crash with No Restart | 2.10.11.2.2. Client Crash with No Restart | |||
If a client crashes and never comes back, it will never send | If a client crashes and never comes back, it will never send | |||
EXCHANGE_ID with its old client owner. Thus the server has session | EXCHANGE_ID with its old client owner. Thus the server has session | |||
state that will never be used again. After an extended period of | state that will never be used again. After an extended period of | |||
time and if the server has resource constraints, it MAY destroy the | time and if the server has resource constraints, it MAY destroy the | |||
old session as well as locking state. | old session as well as locking state. | |||
2.10.10.2.3. Extended Network Partition | 2.10.11.2.3. Extended Network Partition | |||
To the server, the extended network partition may be no different | To the server, the extended network partition may be no different | |||
from a client crash with no restart (see Section 2.10.10.2.2). | from a client crash with no restart (see Section 2.10.11.2.2). | |||
Unless the server can discern that there is a network partition, it | Unless the server can discern that there is a network partition, it | |||
is free to treat the situation as if the client has crashed | is free to treat the situation as if the client has crashed | |||
permanently. | permanently. | |||
2.10.10.2.4. Backchannel Connection Loss | 2.10.11.2.4. Backchannel Connection Loss | |||
If there were callback requests outstanding at the time of a | If there were callback requests outstanding at the time of a | |||
connection loss, then the server MUST retry the request, as described | connection loss, then the server MUST retry the request, as described | |||
in Section 2.10.5.2. Note that it is not necessary to retry requests | in Section 2.10.5.2. Note that it is not necessary to retry requests | |||
over a connection with the same source network address or the same | over a connection with the same source network address or the same | |||
destination network address as the lost connection. As long as the | destination network address as the lost connection. As long as the | |||
sessionid, slot id, and sequence id in the retry match that of the | sessionid, slot id, and sequence id in the retry match that of the | |||
original request, the callback target will recognize the request as a | original request, the callback target will recognize the request as a | |||
retry even if it did see the request prior to disconnect. | retry even if it did see the request prior to disconnect. | |||
If the connection lost is the last one associated with the | If the connection lost is the last one associated with the | |||
backchannel, then the server MUST indicate that in the | backchannel, then the server MUST indicate that in the | |||
sr_status_flags field of every SEQUENCE reply until the backchannel | sr_status_flags field of every SEQUENCE reply until the backchannel | |||
is reestablished. There are two situations each of which use | is reestablished. There are two situations each of which use | |||
different status flags: no connectivity for the session's | different status flags: no connectivity for the session's | |||
backchannel, and no connectivity for any session backchannel of the | backchannel, and no connectivity for any session backchannel of the | |||
client. See Section 18.46 for a description of the appropriate flags | client. See Section 18.46 for a description of the appropriate flags | |||
in sr_status_flags. | in sr_status_flags. | |||
2.10.10.2.5. GSS Context Loss | 2.10.11.2.5. GSS Context Loss | |||
The server SHOULD monitor when the number of RPCSEC_GSS contexts | The server SHOULD monitor when the number of RPCSEC_GSS contexts | |||
assigned to the backchannel reaches one, and when that one context is | assigned to the backchannel reaches one, and when that one context is | |||
near expiry (i.e. between one and two periods of lease time), | near expiry (i.e. between one and two periods of lease time), | |||
indicate so in the sr_status_flags field of all SEQUENCE replies. | indicate so in the sr_status_flags field of all SEQUENCE replies. | |||
The server MUST indicate when all of the backchannel's assigned | The server MUST indicate when all of the backchannel's assigned | |||
RPCSEC_GSS contexts have expired in the sr_status_flags field of all | RPCSEC_GSS contexts have expired in the sr_status_flags field of all | |||
SEQUENCE replies. | SEQUENCE replies. | |||
2.10.11. Parallel NFS and Sessions | 2.10.12. Parallel NFS and Sessions | |||
A client and server can potentially be a non-pNFS implementation, a | A client and server can potentially be a non-pNFS implementation, a | |||
metadata server implementation, a data server implementation, or two | metadata server implementation, a data server implementation, or two | |||
or three types of implementations. The EXCHGID4_FLAG_USE_NON_PNFS, | or three types of implementations. The EXCHGID4_FLAG_USE_NON_PNFS, | |||
EXCHGID4_FLAG_USE_PNFS_MDS, and EXCHGID4_FLAG_USE_PNFS_DS flags (not | EXCHGID4_FLAG_USE_PNFS_MDS, and EXCHGID4_FLAG_USE_PNFS_DS flags (not | |||
mutually exclusive) are passed in the EXCHANGE_ID arguments and | mutually exclusive) are passed in the EXCHANGE_ID arguments and | |||
results to allow the client to indicate how it wants to use sessions | results to allow the client to indicate how it wants to use sessions | |||
created under the client ID, and to allow the server to indicate how | created under the client ID, and to allow the server to indicate how | |||
it will allow the sessions to be used. See Section 13.1 for pNFS | it will allow the sessions to be used. See Section 13.1 for pNFS | |||
sessions considerations. | sessions considerations. | |||
skipping to change at page 94, line 32 | skipping to change at page 94, line 32 | |||
server supports and construct requests with only those supported | server supports and construct requests with only those supported | |||
attributes (or a subset thereof). | attributes (or a subset thereof). | |||
To this end, attributes are divided into three groups: REQUIRED, | To this end, attributes are divided into three groups: REQUIRED, | |||
RECOMMENDED, and named. Both REQUIRED and RECOMMENDED attributes are | RECOMMENDED, and named. Both REQUIRED and RECOMMENDED attributes are | |||
supported in the NFSv4.1 protocol by a specific and well-defined | supported in the NFSv4.1 protocol by a specific and well-defined | |||
encoding and are identified by number. They are requested by setting | encoding and are identified by number. They are requested by setting | |||
a bit in the bit vector sent in the GETATTR request; the server | a bit in the bit vector sent in the GETATTR request; the server | |||
response includes a bit vector to list what attributes were returned | response includes a bit vector to list what attributes were returned | |||
in the response. New REQUIRED or RECOMMENDED attributes may be added | in the response. New REQUIRED or RECOMMENDED attributes may be added | |||
to the NFS protocol between major revisions by publishing a | to the NFSv4 protocol as part of a new minor version by publishing a | |||
standards-track RFC which allocates a new attribute number value and | standards-track RFC which allocates a new attribute number value and | |||
defines the encoding for the attribute. See Section 2.7 for further | defines the encoding for the attribute. See Section 2.7 for further | |||
discussion. | discussion. | |||
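The bit-vector request described above can be sketched as follows. This is a minimal illustration, not the protocol's definitive encoding: it assumes that attribute number n maps to bit (n mod 32) of 32-bit word (n div 32), a layout this section does not state, and the helper names attrs_to_bitmap, bitmap_to_attrs, and missing_attrs are hypothetical. The attribute numbers in the usage note (16, 17, 55) are taken from the attribute definitions later in this document.

```python
from typing import Iterable, List


def attrs_to_bitmap(attr_numbers: Iterable[int]) -> List[int]:
    """Pack attribute numbers into a list of 32-bit words.

    Assumed layout: attribute n sets bit (n % 32) of word (n // 32).
    """
    words: List[int] = []
    for n in attr_numbers:
        word, bit = divmod(n, 32)
        while len(words) <= word:
            words.append(0)
        words[word] |= 1 << bit
    return words


def bitmap_to_attrs(words: List[int]) -> List[int]:
    """Unpack a list of 32-bit words back into sorted attribute numbers."""
    return [
        index * 32 + bit
        for index, word in enumerate(words)
        for bit in range(32)
        if word & (1 << bit)
    ]


def missing_attrs(requested: List[int], returned: List[int]) -> List[int]:
    """Attributes that were requested but are absent from the reply bitmap."""
    got = set(bitmap_to_attrs(returned))
    return [n for n in bitmap_to_attrs(requested) if n not in got]
```

For example, a client asking for case_insensitive (16), case_preserving (17), and mounted_on_fileid (55) would build attrs_to_bitmap([16, 17, 55]) and could then use missing_attrs() on the reply bitmap to see which RECOMMENDED attributes the server chose not to return.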
Named attributes are accessed by the new OPENATTR operation, which | Named attributes are accessed by the new OPENATTR operation, which | |||
accesses a hidden directory of attributes associated with a file | accesses a hidden directory of attributes associated with a file | |||
system object. OPENATTR takes a filehandle for the object and | system object. OPENATTR takes a filehandle for the object and | |||
returns the filehandle for the attribute hierarchy. The filehandle | returns the filehandle for the attribute hierarchy. The filehandle | |||
for the named attributes is a directory object accessible by LOOKUP | for the named attributes is a directory object accessible by LOOKUP | |||
or READDIR and contains files whose names represent the named | or READDIR and contains files whose names represent the named | |||
skipping to change at page 95, line 37 | skipping to change at page 95, line 37 | |||
Note that the hidden directory returned by OPENATTR is a convenience | Note that the hidden directory returned by OPENATTR is a convenience | |||
for protocol processing. The client should not make any assumptions | for protocol processing. The client should not make any assumptions | |||
about the server's implementation of named attributes and whether the | about the server's implementation of named attributes and whether the | |||
underlying file system at the server has a named attribute directory | underlying file system at the server has a named attribute directory | |||
or not. Therefore, operations such as SETATTR and GETATTR on the | or not. Therefore, operations such as SETATTR and GETATTR on the | |||
named attribute directory are undefined. | named attribute directory are undefined. | |||
5.1. REQUIRED Attributes | 5.1. REQUIRED Attributes | |||
These MUST be supported by every NFSv4.1 client and server in order | These MUST be supported by every NFSv4.1 client and server in order | |||
to ensure a minimum level of interoperability. The server must store | to ensure a minimum level of interoperability. The server MUST store | |||
and return these attributes and the client must be able to function | and return these attributes and the client MUST be able to function | |||
with an attribute set limited to these attributes. With just the | with an attribute set limited to these attributes. With just the | |||
REQUIRED attributes some client functionality may be impaired or | REQUIRED attributes some client functionality may be impaired or | |||
limited in some ways. A client may ask for any of these attributes | limited in some ways. A client may ask for any of these attributes | |||
to be returned by setting a bit in the GETATTR request and the server | to be returned by setting a bit in the GETATTR request and the server | |||
must return their value. | must return their value. | |||
5.2. RECOMMENDED Attributes | 5.2. RECOMMENDED Attributes | |||
These attributes are understood well enough to warrant support in the | These attributes are understood well enough to warrant support in the | |||
NFSv4.1 protocol. However, they may not be supported on all clients | NFSv4.1 protocol. However, they may not be supported on all clients | |||
and servers. A client may ask for any of these attributes to be | and servers. A client may ask for any of these attributes to be | |||
returned by setting a bit in the GETATTR request but must handle the | returned by setting a bit in the GETATTR request but must handle the | |||
case where the server does not return them. A client may ask for the | case where the server does not return them. A client may ask for the | |||
set of attributes the server supports and should not request | set of attributes the server supports and SHOULD NOT request | |||
attributes the server does not support. A server should be tolerant | attributes the server does not support. A server should be tolerant | |||
of requests for unsupported attributes and simply not return them | of requests for unsupported attributes and simply not return them | |||
rather than considering the request an error. It is expected that | rather than considering the request an error. It is expected that | |||
servers will support all attributes they comfortably can and only | servers will support all attributes they comfortably can and only | |||
fail to support attributes which are difficult to support in their | fail to support attributes which are difficult to support in their | |||
operating environments. A server should provide attributes whenever | operating environments. A server should provide attributes whenever | |||
they don't have to "tell lies" to the client. For example, a file | they don't have to "tell lies" to the client. For example, a file | |||
modification time should be either an accurate time or should not be | modification time should be either an accurate time or should not be | |||
supported by the server. This will not always be comfortable to | supported by the server. This will not always be comfortable to | |||
clients but the client is better positioned to decide whether and how to | clients but the client is better positioned to decide whether and how to | |||
skipping to change at page 97, line 5 | skipping to change at page 97, line 5 | |||
of delegations (in the case of the named attribute directory these | of delegations (in the case of the named attribute directory these | |||
will be directory delegations). However, since granting of | will be directory delegations). However, since granting of | |||
delegations or not is within the server's discretion, a server need | delegations or not is within the server's discretion, a server need | |||
not support delegations on named attributes or the named attribute | not support delegations on named attributes or the named attribute | |||
directory. | directory. | |||
It is RECOMMENDED that servers support arbitrary named attributes. A | It is RECOMMENDED that servers support arbitrary named attributes. A | |||
client should not depend on the ability to store any named attributes | client should not depend on the ability to store any named attributes | |||
in the server's file system. If a server does support named | in the server's file system. If a server does support named | |||
attributes, a client which is also able to handle them should be able | attributes, a client which is also able to handle them should be able | |||
to copy a file's data and meta-data with complete transparency from | to copy a file's data and metadata with complete transparency from | |||
one location to another; this would imply that names allowed for | one location to another; this would imply that names allowed for | |||
regular directory entries are valid for named attribute names as | regular directory entries are valid for named attribute names as | |||
well. | well. | |||
In NFSv4.1, the structure of named attribute directories is | In NFSv4.1, the structure of named attribute directories is | |||
restricted in a number of ways, in order to prevent the development | restricted in a number of ways, in order to prevent the development | |||
of non-interoperable implementations in which some servers support a | of non-interoperable implementations in which some servers support a | |||
fully general hierarchical directory structure for named attributes | fully general hierarchical directory structure for named attributes | |||
while others support a limited set, but fully adequate to the | while others support a limited set, but fully adequate to the | |||
feature's goals. In such an environment, clients or applications | feature's goals. In such an environment, clients or applications | |||
might come to depend on non-portable extensions. The restrictions | might come to depend on non-portable extensions. The restrictions | |||
are: | are: | |||
o CREATE is not allowed in a named attribute directory. Thus, such | o CREATE is not allowed in a named attribute directory. Thus, such | |||
objects as symbolic links and special files are not allowed to be | objects as symbolic links and special files are not allowed to be | |||
named attributes. Further, directories may not be created in a | named attributes. Further, directories may not be created in a | |||
named attribute directory so no hierarchical structure of named | named attribute directory so no hierarchical structure of named | |||
attributes for a single object is allowed. | attributes for a single object is allowed. | |||
o OPENATTR many not be done on a named attribute directory or on a | o OPENATTR MUST NOT be done on a named attribute directory or on a | |||
named attribute. Thus, although these object have attributes, | named attribute. | |||
they may not may named attributes. | ||||
o Doing a RENAME of a named attribute to a different named attribute | o Doing a RENAME of a named attribute to a different named attribute | |||
directory or to an ordinary (i.e. non-named-attribute) directory | directory or to an ordinary (i.e. non-named-attribute) directory | |||
is not allowed. | is not allowed. | |||
o Creating hard links between names attribute directories or between | o Creating hard links between named attribute directories or between | |||
named attribute directories and ordinary directories is not | named attribute directories and ordinary directories is not | |||
allowed. | allowed. | |||
Names of attributes will not be controlled by this document or other | Names of attributes will not be controlled by this document or other | |||
IETF standards track documents. See Section 22.1 for further | IETF standards track documents. See Section 22.1 for further | |||
discussion. | discussion. | |||
5.4. Classification of Attributes | 5.4. Classification of Attributes | |||
Each of the REQUIRED and RECOMMENDED attributes can be classified in | Each of the REQUIRED and RECOMMENDED attributes can be classified in | |||
skipping to change at page 103, line 43 | skipping to change at page 103, line 43 | |||
True, if the server is able to change the times for a file system object | True, if the server is able to change the times for a file system object | |||
as specified in a SETATTR operation. | as specified in a SETATTR operation. | |||
5.7.2.3. Attribute 16: case_insensitive | 5.7.2.3. Attribute 16: case_insensitive | |||
True, if filename comparisons on this file system are case | True, if filename comparisons on this file system are case | |||
insensitive. | insensitive. | |||
5.7.2.4. Attribute 17: case_preserving | 5.7.2.4. Attribute 17: case_preserving | |||
True, if filename case on this file system are preserved. | True, if file name case on this file system is preserved. | |||
5.7.2.5. Attribute 60: change_policy | 5.7.2.5. Attribute 60: change_policy | |||
A value created by the server that the client can use to determine if | A value created by the server that the client can use to determine if | |||
some server policy related to the current file system has been | some server policy related to the current file system has been | |||
subject to change. If the value remains the same then the client can | subject to change. If the value remains the same then the client can | |||
be sure that the values of the attributes related to fs location and | be sure that the values of the attributes related to fs location and | |||
the fss_type field of the fs_status attribute have not changed. On | the fss_type field of the fs_status attribute have not changed. On | |||
the other hand, a change in this value does necessarily imply a | the other hand, a change in this value does necessarily imply a | |||
change in policy. It is up to the client to interrogate the server | change in policy. It is up to the client to interrogate the server | |||
skipping to change at page 105, line 49 | skipping to change at page 105, line 49 | |||
lead to the client either wasting bandwidth or not receiving the best | lead to the client either wasting bandwidth or not receiving the best | |||
performance. | performance. | |||
5.7.2.22. Attribute 32: mimetype | 5.7.2.22. Attribute 32: mimetype | |||
MIME body type/subtype of this object. | MIME body type/subtype of this object. | |||
5.7.2.23. Attribute 55: mounted_on_fileid | 5.7.2.23. Attribute 55: mounted_on_fileid | |||
Like fileid, but if the target filehandle is the root of a file | Like fileid, but if the target filehandle is the root of a file | |||
system return the fileid of the underlying directory. | system, this attribute represents the fileid of the underlying | |||
directory. | ||||
UNIX-based operating environments connect a file system into the | UNIX-based operating environments connect a file system into the | |||
namespace by connecting (mounting) the file system onto the existing | namespace by connecting (mounting) the file system onto the existing | |||
file object (the mount point, usually a directory) of an existing | file object (the mount point, usually a directory) of an existing | |||
file system. When the mount point's parent directory is read via an | file system. When the mount point's parent directory is read via an | |||
API like readdir(), the return results are directory entries, each | API like readdir(), the return results are directory entries, each | |||
with a component name and a fileid. The fileid of the mount point's | with a component name and a fileid. The fileid of the mount point's | |||
directory entry will be different from the fileid that the stat() | directory entry will be different from the fileid that the stat() | |||
system call returns. The stat() system call is returning the fileid | system call returns. The stat() system call is returning the fileid | |||
of the root of the mounted file system, whereas readdir() is | of the root of the mounted file system, whereas readdir() is | |||
skipping to change at page 107, line 7 | skipping to change at page 107, line 7 | |||
should obey an invariant that has it returning a value that is equal | should obey an invariant that has it returning a value that is equal | |||
to the file object's entry in the object's parent directory, i.e. | to the file object's entry in the object's parent directory, i.e. | |||
what readdir() would have returned. Some operating environments | what readdir() would have returned. Some operating environments | |||
allow a series of two or more file systems to be mounted onto a | allow a series of two or more file systems to be mounted onto a | |||
single mount point. In this case, for the server to obey the | single mount point. In this case, for the server to obey the | |||
aforementioned invariant, it will need to find the base mount point, | aforementioned invariant, it will need to find the base mount point, | |||
and not the intermediate mount points. | and not the intermediate mount points. | |||
5.7.2.24. Attribute 34: no_trunc | 5.7.2.24. Attribute 34: no_trunc | |||
True, if a name longer than name_max is used, an error be returned | If this attribute is TRUE, then if the client uses a file name longer | |||
and name is not truncated. | than name_max, an error will be returned instead of the name being | |||
truncated. | ||||
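The effect of no_trunc can be sketched as below. This is only an illustration under stated assumptions: name_max is treated as a limit in UTF-8 bytes, the exception class NameTooLong stands in for whatever error the server actually returns, and store_name is a hypothetical helper, none of which is specified in this section.

```python
class NameTooLong(Exception):
    """Hypothetical stand-in for the error returned when no_trunc is TRUE."""


def store_name(name: str, name_max: int, no_trunc: bool) -> str:
    """Return the name the server would store for a requested file name.

    Assumption: name_max counts UTF-8 encoded bytes.
    """
    encoded = name.encode("utf-8")
    if len(encoded) <= name_max:
        return name
    if no_trunc:
        # no_trunc is TRUE: reject the over-long name instead of truncating.
        raise NameTooLong(f"name exceeds name_max ({name_max} bytes)")
    # no_trunc is FALSE: silently truncate, dropping any partial character.
    return encoded[:name_max].decode("utf-8", errors="ignore")
```

A client that sees no_trunc reported as FALSE cannot assume that the name it sent is the name that was stored, which is why checking the attribute before creating long names is useful.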
5.7.2.25. Attribute 35: numlinks | 5.7.2.25. Attribute 35: numlinks | |||
Number of hard links to this object. | Number of hard links to this object. | |||
5.7.2.26. Attribute 36: owner | 5.7.2.26. Attribute 36: owner | |||
The string name of the owner of this object. | The string name of the owner of this object. | |||
5.7.2.27. Attribute 37: owner_group | 5.7.2.27. Attribute 37: owner_group | |||
The string name of the group ownership of this object. | The string name of the group ownership of this object. | |||
5.7.2.28. Attribute 38: quota_avail_hard | 5.7.2.28. Attribute 38: quota_avail_hard | |||
The value in bytes which represent the amount of additional disk | The value in bytes which represents the amount of additional disk | |||
space beyond the current allocation that can be allocated to this | space beyond the current allocation that can be allocated to this | |||
file or directory before further allocations will be refused. It is | file or directory before further allocations will be refused. It is | |||
understood that this space may be consumed by allocations to other | understood that this space may be consumed by allocations to other | |||
files or directories. | files or directories. | |||
5.7.2.29. Attribute 39: quota_avail_soft | 5.7.2.29. Attribute 39: quota_avail_soft | |||
The value in bytes which represents the amount of additional disk | The value in bytes which represents the amount of additional disk | |||
space that can be allocated to this file or directory before the user | space that can be allocated to this file or directory before the user | |||
may reasonably be warned. It is understood that this space may be | may reasonably be warned. It is understood that this space may be | |||
skipping to change at page 108, line 9 | skipping to change at page 108, line 11 | |||
files or directories for which a quota_used value is maintained. | files or directories for which a quota_used value is maintained. | |||
E.g. "all files with a given owner", "all files with a given group | E.g. "all files with a given owner", "all files with a given group | |||
owner", etc. | owner", etc. | |||
The server is at liberty to choose any of those sets but should do so | The server is at liberty to choose any of those sets but should do so | |||
in a repeatable way. The rule may be configured per file system or | in a repeatable way. The rule may be configured per file system or | |||
may be "choose the set with the smallest quota". | may be "choose the set with the smallest quota". | |||
5.7.2.31. Attribute 41: rawdev | 5.7.2.31. Attribute 41: rawdev | |||
Raw device identifier. UNIX device major/minor node information. If | Raw device identifier; the UNIX device major/minor node information. | |||
the value of type is not NF4BLK or NF4CHR, the value return SHOULD | If the value of type is not NF4BLK or NF4CHR, the value returned | |||
NOT be considered useful. | SHOULD NOT be considered useful. | |||
5.7.2.32. Attribute 42: space_avail | 5.7.2.32. Attribute 42: space_avail | |||
Disk space in bytes available to this user on the file system | Disk space in bytes available to this user on the file system | |||
containing this object - this should be the smallest relevant limit. | containing this object - this should be the smallest relevant limit. | |||
5.7.2.33. Attribute 43: space_free | 5.7.2.33. Attribute 43: space_free | |||
Free disk space in bytes on the file system containing this object - | Free disk space in bytes on the file system containing this object - | |||
this should be the smallest relevant limit. | this should be the smallest relevant limit. | |||
skipping to change at page 108, line 33 | skipping to change at page 108, line 35 | |||
5.7.2.34. Attribute 44: space_total | 5.7.2.34. Attribute 44: space_total | |||
Total disk space in bytes on the file system containing this object. | Total disk space in bytes on the file system containing this object. | |||
5.7.2.35. Attribute 45: space_used | 5.7.2.35. Attribute 45: space_used | |||
Number of file system bytes allocated to this object. | Number of file system bytes allocated to this object. | |||
5.7.2.36. Attribute 46: system | 5.7.2.36. Attribute 46: system | |||
True, if this file is a "system" file with respect to the Windows | This attribute is TRUE if this file is a "system" file with respect | |||
API. | to the Windows operating environment. | |||
5.7.2.37. Attribute 47: time_access | 5.7.2.37. Attribute 47: time_access | |||
The time_access attribute represents the time of last access to the | The time_access attribute represents the time of last access to the | |||
object by a read that was satisfied by the server. The notion of | object by a read that was satisfied by the server. The notion of | |||
what is an "access" depends on the server's operating environment and/or | what is an "access" depends on the server's operating environment and/or | |||
the server's file system semantics. For example, for servers obeying | the server's file system semantics. For example, for servers obeying | |||
POSIX semantics, time_access would be updated only by the READLINK, | POSIX semantics, time_access would be updated only by the READLINK, | |||
READ, and READDIR operations and not any of the operations that | READ, and READDIR operations and not any of the operations that | |||
modify the content of the object. Of course, setting the | modify the content of the object. Of course, setting the | |||
skipping to change at page 109, line 29 | skipping to change at page 109, line 30 | |||
The time of creation of the object. This attribute does not have any | The time of creation of the object. This attribute does not have any | |||
relation to the traditional UNIX file attribute "ctime" or "change | relation to the traditional UNIX file attribute "ctime" or "change | |||
time". | time". | |||
5.7.2.41. Attribute 51: time_delta | 5.7.2.41. Attribute 51: time_delta | |||
Smallest useful server time granularity. | Smallest useful server time granularity. | |||
5.7.2.42. Attribute 52: time_metadata | 5.7.2.42. Attribute 52: time_metadata | |||
The time of last meta-data modification of the object. | The time of last metadata modification of the object. | |||
5.7.2.43. Attribute 53: time_modify | 5.7.2.43. Attribute 53: time_modify | |||
The time of last modification to the object. | The time of last modification to the object. | |||
5.7.2.44. Attribute 54: time_modify_set | 5.7.2.44. Attribute 54: time_modify_set | |||
Set the time of last modification to the object. SETATTR use only. | Set the time of last modification to the object. SETATTR use only. | |||
5.8. Interpreting owner and owner_group | 5.8. Interpreting owner and owner_group | |||
skipping to change at page 110, line 31 | skipping to change at page 110, line 32 | |||
service may also be used to accomplish the translation. A server may | service may also be used to accomplish the translation. A server may | |||
provide a more general service, not limited by any particular | provide a more general service, not limited by any particular | |||
translation (which would only translate a limited set of possible | translation (which would only translate a limited set of possible | |||
strings) by storing the owner and owner_group attributes in local | strings) by storing the owner and owner_group attributes in local | |||
storage without any translation or it may augment a translation | storage without any translation or it may augment a translation | |||
method by storing the entire string for attributes for which no | method by storing the entire string for attributes for which no | |||
translation is available while using the local representation for | translation is available while using the local representation for | |||
those cases in which a translation is available. | those cases in which a translation is available. | |||
Servers that do not provide support for all possible values of the | Servers that do not provide support for all possible values of the | |||
owner and owner_group attributes should return an error | owner and owner_group attributes SHOULD return an error | |||
(NFS4ERR_BADOWNER) when a string is presented that has no | (NFS4ERR_BADOWNER) when a string is presented that has no | |||
translation, as the value to be set for a SETATTR of the owner, | translation, as the value to be set for a SETATTR of the owner, | |||
owner_group, or acl attributes. When a server does accept an owner | owner_group, or acl attributes. When a server does accept an owner | |||
or owner_group value as valid on a SETATTR (and similarly for the | or owner_group value as valid on a SETATTR (and similarly for the | |||
owner and group strings in an acl), it is promising to return that | owner and group strings in an acl), it is promising to return that | |||
same string when a corresponding GETATTR is done. Configuration | same string when a corresponding GETATTR is done. Configuration | |||
changes and ill-constructed name translations (those that contain | changes (including changes from the mapping of the string to the | |||
aliasing) may make that promise impossible to honor. Servers should | local representation) and ill-constructed name translations (those | |||
make appropriate efforts to avoid a situation in which these | that contain aliasing) may make that promise impossible to honor. | |||
attributes have their values changed when no real change to ownership | Servers should make appropriate efforts to avoid a situation in which | |||
has occurred. | these attributes have their values changed when no real change to | |||
ownership has occurred. | ||||
The "dns_domain" portion of the owner string is meant to be a DNS | The "dns_domain" portion of the owner string is meant to be a DNS | |||
domain name. For example, user@ietf.org. Servers should accept as | domain name. For example, user@ietf.org. Servers should accept as | |||
valid a set of users for at least one domain. A server may treat | valid a set of users for at least one domain. A server may treat | |||
other domains as having no valid translations. A more general | other domains as having no valid translations. A more general | |||
service is provided when a server is capable of accepting users for | service is provided when a server is capable of accepting users for | |||
multiple domains, or for all domains, subject to security | multiple domains, or for all domains, subject to security | |||
constraints. | constraints. | |||
In the case where there is no translation available to the client or | In the case where there is no translation available to the client or | |||
server, the attribute value must be constructed without the "@". | server, the attribute value must be constructed without the "@". | |||
Therefore, the absence of the @ from the owner or owner_group | Therefore, the absence of the @ from the owner or owner_group | |||
attribute signifies that no translation was available at the sender | attribute signifies that no translation was available at the sender | |||
and that the receiver of the attribute should not use that string as | and that the receiver of the attribute should not use that string as | |||
a basis for translation into its own internal format. Even though | a basis for translation into its own internal format. Even though | |||
the attribute value cannot be translated, it may still be useful. | the attribute value cannot be translated, it may still be useful. | |||
In the case of a client, the attribute string may be used for local | In the case of a client, the attribute string may be used for local | |||
display of ownership. | display of ownership. | |||
To provide a greater degree of compatibility with NFSv3, which | To provide a greater degree of compatibility with NFSv3, which | |||
identified users and groups by 32-bit unsigned uid's and gid's, owner | identified users and groups by 32-bit unsigned user identifiers and | |||
and group strings that consist of decimal numeric values with no | group identifiers, owner and group strings that consist of decimal | |||
leading zeros can be given a special interpretation by clients and | numeric values with no leading zeros can be given a special | |||
servers which choose to provide such support. The receiver may treat | interpretation by clients and servers which choose to provide such | |||
such a user or group string as representing the same user as would be | support. The receiver may treat such a user or group string as | |||
represented by an NFSv3 uid or gid having the corresponding numeric | representing the same user as would be represented by an NFSv3 uid or | |||
value. A server is not obligated to accept such a string, but may | gid having the corresponding numeric value. A server is not | |||
return an NFS4ERR_BADOWNER instead. To avoid this mechanism being | obligated to accept such a string, but may return an NFS4ERR_BADOWNER | |||
used to subvert user and group translation, so that a client might | instead. To avoid this mechanism being used to subvert user and | |||
pass all of the owners and groups in numeric form, a server SHOULD | group translation, so that a client might pass all of the owners and | |||
return an NFS4ERR_BADOWNER error when there is a valid translation | groups in numeric form, a server SHOULD return an NFS4ERR_BADOWNER | |||
for the user or owner designated in this way. In that case, the | error when there is a valid translation for the user or owner | |||
client must use the appropriate name@domain string and not the | designated in this way. In that case, the client must use the | |||
special form for compatibility. | appropriate name@domain string and not the special form for | |||
compatibility. | ||||
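The numeric special form described above can be illustrated with a short sketch. This is only one possible reading, not part of either draft revision: the function name is hypothetical, and treating the single digit "0" as a valid value with no leading zeros is an assumption.

```python
import re
from typing import Optional

# Decimal digits only, with no leading zeros. Accepting the lone string "0"
# is an assumption of this sketch, not something the text states.
_NUMERIC_ID = re.compile(r"0|[1-9][0-9]*")

UINT32_MAX = 0xFFFFFFFF  # NFSv3 uids and gids are 32-bit unsigned values


def parse_numeric_owner(owner: str) -> Optional[int]:
    """Return the NFSv3-style id for a purely numeric owner or group
    string, or None if the string is not in the special numeric form."""
    if _NUMERIC_ID.fullmatch(owner) is None:
        return None
    value = int(owner)
    return value if value <= UINT32_MAX else None
```

A receiver that gets None here would fall back to normal name@domain handling. A server that does recognize the form may still answer NFS4ERR_BADOWNER, as the text allows.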
The owner string "nobody" may be used to designate an anonymous user, | The owner string "nobody" may be used to designate an anonymous user, | |||
which will be associated with a file created by a security principal | which will be associated with a file created by a security principal | |||
that cannot be mapped through normal means to the owner attribute. | that cannot be mapped through normal means to the owner attribute. | |||
5.9. Character Case Attributes | 5.9. Character Case Attributes | |||
With respect to the case_insensitive and case_preserving attributes, | With respect to the case_insensitive and case_preserving attributes, | |||
each UCS-4 character (which UTF-8 encodes) has a "long descriptive | each UCS-4 character (which UTF-8 encodes) has a "long descriptive | |||
name" RFC1345 [35] which may or may not included the word "CAPITAL" | name" RFC1345 [35] which may or may not include the word "CAPITAL" or | |||
or "SMALL". The presence of SMALL or CAPITAL allows an NFS server to | "SMALL". The presence of SMALL or CAPITAL allows an NFS server to | |||
implement unambiguous and efficient table driven mappings for case | implement unambiguous and efficient table driven mappings for case | |||
insensitive comparisons, and non-case-preserving storage. For | insensitive comparisons, and non-case-preserving storage. For | |||
general character handling and internationalization issues, see | general character handling and internationalization issues, see | |||
Section 14. | Section 14. | |||
5.10. Directory Notification Attributes | 5.10. Directory Notification Attributes | |||
As described in Section 18.39, the client can request a minimum delay | As described in Section 18.39, the client can request a minimum delay | |||
for notifications of changes to attributes, but the server is free to | for notifications of changes to attributes, but the server is free to | |||
ignore what the client requests. The client can determine in advance | ignore what the client requests. The client can determine in advance | |||
skipping to change at page 112, line 24 | skipping to change at page 112, line 27 | |||
5.10.2. Attribute 57: dirent_notif_delay | 5.10.2. Attribute 57: dirent_notif_delay | |||
The dirent_notif_delay attribute is the minimum number of seconds the | The dirent_notif_delay attribute is the minimum number of seconds the | |||
server will delay before notifying the client of a change to a file | server will delay before notifying the client of a change to a file | |||
object that has an entry in the directory. | object that has an entry in the directory. | |||
5.11. pNFS Attribute Definitions | 5.11. pNFS Attribute Definitions | |||
5.11.1. Attribute 62: fs_layout_type | 5.11.1. Attribute 62: fs_layout_type | |||
The fs_layout_type attribute (data type layouttype4 (Section 3.3.13)) | The fs_layout_type attribute (see Section 3.3.13) applies to a file | |||
applies to a file system and indicates what layout types are | system and indicates what layout types are supported by the file | |||
supported by the file system. When the client encounters a new fsid, | system. When the client encounters a new fsid, the client SHOULD | |||
the client should obtain the value for the fs_layout_type attribute | obtain the value for the fs_layout_type attribute associated with the | |||
associated with the new file system. This attribute is used by the | new file system. This attribute is used by the client to determine | |||
client to determine if the layout types supported by the server match | if the layout types supported by the server match any of the client's | |||
any of the client's supported layout types. | supported layout types. | |||
5.11.2. Attribute 66: layout_alignment | 5.11.2. Attribute 66: layout_alignment | |||
When a client has layouts for a file system, the layout_alignment | When a client has layouts for a file system, the layout_alignment | |||
attribute indicates the preferred alignment for I/O to files on that | attribute indicates the preferred alignment for I/O to files on that | |||
file system. Where possible, the client should send READ and WRITE | file system. Where possible, the client should send READ and WRITE | |||
operations with offsets that are whole multiples of the | operations with offsets that are whole multiples of the | |||
layout_alignment attribute. | layout_alignment attribute. | |||
5.11.3. Attribute 65: layout_blksize | 5.11.3. Attribute 65: layout_blksize | |||
When a client has layouts for a file system, the layout_blksize | When a client has layouts for a file system, the layout_blksize | |||
attribute indicates the preferred block size for I/O to files on that | attribute indicates the preferred block size for I/O to files on that | |||
file system. Where possible, the client should send READ operations | file system. Where possible, the client should send READ operations | |||
with a count argument that is a whole multiple of layout_blksize, and | with a count argument that is a whole multiple of layout_blksize, and | |||
WRITE operations with a data argument of size that is a whole | WRITE operations with a data argument of size that is a whole | |||
multiple of layout_blksize. | multiple of layout_blksize. | |||
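As an illustration of the two preferences above, a client might test a planned READ or WRITE as follows. This is a minimal sketch; the function name is hypothetical, and treating a value of zero as "no stated preference" is an assumption made only to avoid division by zero.

```python
def is_preferred_io(offset: int, count: int,
                    layout_alignment: int, layout_blksize: int) -> bool:
    """Return True if the I/O starts at a whole multiple of layout_alignment
    and its length is a whole multiple of layout_blksize."""
    # A zero attribute value is treated as "no preference" (assumption).
    aligned = layout_alignment == 0 or offset % layout_alignment == 0
    whole_blocks = layout_blksize == 0 or count % layout_blksize == 0
    return aligned and whole_blocks
```

Because both attributes express preferences ("where possible"), a False result means only that the request is not in the preferred shape, not that it is invalid.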
5.11.4. Attribute 63: layout_hint | 5.11.4. Attribute 63: layout_hint | |||
The layout_hint attribute (data type layouthint4 (Section 3.3.19)) | The layout_hint attribute (see Section 3.3.19) may be set on newly | |||
may be set on newly created files to influence the metadata server's | created files to influence the metadata server's choice for the | |||
choice for the file's layout. If possible, this attribute is one of | file's layout. If possible, this attribute is one of those set in | |||
those set in the initial attributes within the OPEN operation. The | the initial attributes within the OPEN operation. The metadata | |||
metadata server may choose to ignore this attribute. The layout_hint | server may choose to ignore this attribute. The layout_hint | |||
attribute is a sub-set of the layout structure returned by LAYOUTGET. | attribute is a sub-set of the layout structure returned by LAYOUTGET. | |||
For example, instead of specifying particular devices, this would be | For example, instead of specifying particular devices, this would be | |||
used to suggest the stripe width of a file. The server | used to suggest the stripe width of a file. The server | |||
implementation determines which fields within the layout will be | implementation determines which fields within the layout will be | |||
used. | used. | |||
5.11.5. Attribute 64: layout_type | 5.11.5. Attribute 64: layout_type | |||
This attribute lists the layout type(s) available for a file. The | This attribute lists the layout type(s) available for a file. The | |||
value returned by the server is for informational purposes only. The | value returned by the server is for informational purposes only. The | |||
skipping to change at page 113, line 33 | skipping to change at page 113, line 33 | |||
needed in order to perform I/O. For example, the specific device | needed in order to perform I/O. For example, the specific device | |||
information for the file and its layout. | information for the file and its layout. | |||
5.11.6. Attribute 68: mdsthreshold | 5.11.6. Attribute 68: mdsthreshold | |||
This attribute is a server provided hint used to communicate to the | This attribute is a server provided hint used to communicate to the | |||
client when it is more efficient to send READ and WRITE operations to | client when it is more efficient to send READ and WRITE operations to | |||
the metadata server or the data server. The two types of thresholds | the metadata server or the data server. The two types of thresholds | |||
described are file size thresholds and I/O size thresholds. If a | described are file size thresholds and I/O size thresholds. If a | |||
file's size is smaller than the file size threshold, data accesses | file's size is smaller than the file size threshold, data accesses | |||
should be sent to the metadata server. If an I/O is below the I/O | SHOULD be sent to the metadata server. If an I/O request has a | |||
size threshold, the I/O should be sent to the metadata server. As | length that is below the I/O size threshold, the I/O SHOULD be sent | |||
defined, each threshold type is specified separately for READ and | to the metadata server. Each threshold type is specified separately | |||
WRITE. | for READ and WRITE. | |||
The server may provide both types of thresholds for a file. If both | The server MAY provide both types of thresholds for a file. If both | |||
file size and I/O size are provided, the client should exceed both | file size and I/O size are provided, the client SHOULD reach or | |||
thresholds before issuing its READ or WRITE requests to the data | exceed both thresholds before issuing its READ or WRITE requests to | |||
server. Alternatively, if only one of the specified thresholds is | the data server. Alternatively, if only one of the specified | |||
exceeded, the I/O requests are sent to the metadata server. | thresholds is reached or exceeded, the I/O requests are sent to the | |||
metadata server. | ||||
For each threshold type, a value of 0 indicates no READ or WRITE | For each threshold type, a value of 0 indicates no READ or WRITE | |||
should be sent to the metadata server, while a value of all 1s | should be sent to the metadata server, while a value of all 1s | |||
indicates all READS or WRITES should be sent to the metadata server. | indicates all READS or WRITES should be sent to the metadata server. | |||
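The threshold rules above can be read as a small decision function. The sketch below follows the revised wording ("reach or exceed") and is not taken from the draft. The names are hypothetical, the 64-bit width used for "all 1s" is an assumption, and sending I/O to the data server when no threshold is supplied is also an assumption. One call covers a single direction, since READ and WRITE thresholds are specified separately.

```python
from typing import Optional

ALL_ONES = 0xFFFFFFFFFFFFFFFF  # assumed 64-bit encoding of "all 1s"


def _below(value: int, threshold: int) -> bool:
    """True if this threshold steers the I/O to the metadata server."""
    if threshold == 0:          # 0: nothing should go to the metadata server
        return False
    if threshold == ALL_ONES:   # all 1s: everything should go there
        return True
    return value < threshold


def use_metadata_server(file_size: int, io_length: int,
                        size_threshold: Optional[int] = None,
                        io_threshold: Optional[int] = None) -> bool:
    """Decide where one READ or WRITE should be sent.

    The data server is used only when every threshold the server supplied
    has been reached or exceeded; otherwise the metadata server is used.
    """
    checks = []
    if size_threshold is not None:
        checks.append(_below(file_size, size_threshold))
    if io_threshold is not None:
        checks.append(_below(io_length, io_threshold))
    return any(checks)
```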
The attribute is available on a per filehandle basis. If the current | The attribute is available on a per filehandle basis. If the current | |||
filehandle refers to a non-pNFS file or directory, the metadata | filehandle refers to a non-pNFS file or directory, the metadata | |||
server should return an attribute that is representative of the | server should return an attribute that is representative of the | |||
filehandle's file system. It is suggested that this attribute is | filehandle's file system. It is suggested that this attribute is | |||
queried as part of the OPEN operation. Due to dynamic system | queried as part of the OPEN operation. Due to dynamic system | |||
skipping to change at page 114, line 24 | skipping to change at page 114, line 25 | |||
reached. | reached. | |||
When retention is enabled, retention MUST extend to the data of the | When retention is enabled, retention MUST extend to the data of the | |||
file, and the name of the file. The server MAY extend retention to any | file, and the name of the file. The server MAY extend retention to any | |||
other property of the file, including any subset of REQUIRED, | other property of the file, including any subset of REQUIRED, | |||
RECOMMENDED, and named attributes, with the exceptions noted in this | RECOMMENDED, and named attributes, with the exceptions noted in this | |||
section. | section. | |||
Servers MAY support or not support retention on any file object type. | Servers MAY support or not support retention on any file object type. | |||
The five retention attributes are as follows: | The five retention attributes are explained in the next subsections. | |||
5.12.1. Attribute 69: retention_get | 5.12.1. Attribute 69: retention_get | |||
If retention is enabled for the associated file, this attribute's | If retention is enabled for the associated file, this attribute's | |||
value represents the retention begin time of the file object. This | value represents the retention begin time of the file object. This | |||
attribute's value is only readable with the GETATTR operation and may | attribute's value is only readable with the GETATTR operation and may | |||
not be modified by the SETATTR operation. The value of the attribute | not be modified by the SETATTR operation. The value of the attribute | |||
consists of: | consists of: | |||
const RET4_DURATION_INFINITE = 0xffffffffffffffff; | const RET4_DURATION_INFINITE = 0xffffffffffffffff; | |||
skipping to change at page 115, line 43 | skipping to change at page 115, line 48 | |||
5.12.4. Attribute 72: retentevt_set | 5.12.4. Attribute 72: retentevt_set | |||
Set the event-based retention duration, and optionally enable event- | Set the event-based retention duration, and optionally enable event- | |||
based retention on the file object. This attribute corresponds to | based retention on the file object. This attribute corresponds to | |||
retentevt_get, is like retention_set, but refers to event-based | retentevt_get, is like retention_set, but refers to event-based | |||
retention. When event based retention is set, the file MUST be | retention. When event based retention is set, the file MUST be | |||
retained even if non-event-based retention has been set, and the | retained even if non-event-based retention has been set, and the | |||
duration of non-event-based retention has been reached. Conversely, | duration of non-event-based retention has been reached. Conversely, | |||
when non-event-based retention has been set, the file MUST be | when non-event-based retention has been set, the file MUST be | |||
retained even the event-based retention has been set, and the | retained even if event-based retention has been set, and the duration | |||
duration of event-based retention has been reached. The server MAY | of event-based retention has been reached. The server MAY restrict | |||
restrict the enabling of event-based retention or the duration of | the enabling of event-based retention or the duration of event-based | |||
event-based retention on the basis of the ACE4_WRITE_RETENTION ACL | retention on the basis of the ACE4_WRITE_RETENTION ACL permission. | |||
permission. The enabling of event-based retention does not prevent | The enabling of event-based retention does not prevent the enabling | |||
the enabling of non-event-based retention nor the modification of the | of non-event-based retention nor the modification of the | |||
retention_hold attribute. | retention_hold attribute. | |||
5.12.5. Attribute 73: retention_hold | 5.12.5. Attribute 73: retention_hold | |||
Get or set administrative retention holds, one hold per bit position. | Get or set administrative retention holds, one hold per bit position. | |||
This attribute allows one to 64 administrative holds, one hold per | This attribute allows one to 64 administrative holds, one hold per | |||
bit on the attribute. If retention_hold is not zero, then the file | bit on the attribute. If retention_hold is not zero, then the file | |||
MUST NOT be deleted, renamed, or modified, even if the duration on | MUST NOT be deleted, renamed, or modified, even if the duration on | |||
enabled event or non-event-based retention has been reached. The | enabled event or non-event-based retention has been reached. The | |||
skipping to change at page 160, line 13 | skipping to change at page 160, line 13 | |||
type locking requests are allowed, unless the server is able to | type locking requests are allowed, unless the server is able to | |||
reliably determine (through state persistently maintained across | reliably determine (through state persistently maintained across | |||
reboot instances), that granting any such lock cannot possibly | reboot instances), that granting any such lock cannot possibly | |||
conflict with a subsequent reclaim. When a request is made to obtain | conflict with a subsequent reclaim. When a request is made to obtain | |||
a new lock (i.e. not a reclaim-type request) during the grace period | a new lock (i.e. not a reclaim-type request) during the grace period | |||
and such a determination cannot be made, the server must return the | and such a determination cannot be made, the server must return the | |||
error NFS4ERR_GRACE. | error NFS4ERR_GRACE. | |||
Once a session is established using the new client ID, the client | Once a session is established using the new client ID, the client | |||
will use reclaim-type locking requests (e.g. LOCK requests with | will use reclaim-type locking requests (e.g. LOCK requests with | |||
reclaim set to true and OPEN operations with a claim type of | reclaim set to TRUE and OPEN operations with a claim type of | |||
CLAIM_PREVIOUS. See Section 9.11) to re-establish its locking state. | CLAIM_PREVIOUS. See Section 9.11) to re-establish its locking state. | |||
Once this is done, or if there is no such locking state to reclaim, | Once this is done, or if there is no such locking state to reclaim, | |||
the client sends a global RECLAIM_COMPLETE operation, i.e. one with | the client sends a global RECLAIM_COMPLETE operation, i.e. one with | |||
the one_fs argument set to false, to indicate that it has reclaimed | the rca_one_fs argument set to FALSE, to indicate that it has | |||
all of the locking state that it will reclaim. Once a client sends | reclaimed all of the locking state that it will reclaim. Once a | |||
such a RECLAIM_COMPLETE operation, it may attempt non-reclaim locking | client sends such a RECLAIM_COMPLETE operation, it may attempt non- | |||
operations, although it may get NFS4ERR_GRACE errors from the operations | reclaim locking operations, although it may get NFS4ERR_GRACE errors | |||
until the period of special handling is over. See Section 11.7.7 for | from the operations until the period of special handling is over. See | |||
a discussion of the analogous handling of lock reclamation in the case | Section 11.7.7 for a discussion of the analogous handling of lock | |||
of file systems transitioning from server to server. | reclamation in the case of file systems transitioning from server to | |||
server. | ||||
During the grace period, the server must reject READ and WRITE | During the grace period, the server must reject READ and WRITE | |||
operations and non-reclaim locking requests (i.e. other LOCK and OPEN | operations and non-reclaim locking requests (i.e. other LOCK and OPEN | |||
operations) with an error of NFS4ERR_GRACE, unless it is able to | operations) with an error of NFS4ERR_GRACE, unless it is able to | |||
guarantee that these may be done safely, as described below. | guarantee that these may be done safely, as described below. | |||
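The grace-period rule in the paragraph above reduces to a small admission check. The sketch below is illustrative only: the function name and the provably_safe flag are hypothetical, and status codes are represented by their names rather than numeric values.

```python
def admit_during_grace(is_reclaim: bool, provably_safe: bool = False) -> str:
    """Status for a READ, WRITE, LOCK, or OPEN received during the grace period.

    is_reclaim    -- True for reclaim-type requests (LOCK with reclaim set,
                     OPEN with claim type CLAIM_PREVIOUS).
    provably_safe -- True only if the server can guarantee, for example from
                     persistently recorded state, that the request cannot
                     conflict with a later reclaim.
    """
    if is_reclaim or provably_safe:
        return "NFS4_OK"
    return "NFS4ERR_GRACE"
```

A client that receives NFS4ERR_GRACE for a non-reclaim request would normally retry later, since the condition lasts only until the period of special handling ends.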
The grace period may last until all clients who are known to possibly | The grace period may last until all clients who are known to possibly | |||
have had locks have done a global RECLAIM_COMPLETE operation, | have had locks have done a global RECLAIM_COMPLETE operation, | |||
indicating that they have finished reclaiming the locks they held | indicating that they have finished reclaiming the locks they held | |||
before the server reboot. This means that a client which has done a | before the server reboot. This means that a client which has done a | |||
skipping to change at page 196, line 34 | skipping to change at page 196, line 34 | |||
storage is OPTIONAL. | storage is OPTIONAL. | |||
As discussed earlier in this section, the client MAY return the same | As discussed earlier in this section, the client MAY return the same | |||
cc value on subsequent CB_GETATTR calls, even if the file was | cc value on subsequent CB_GETATTR calls, even if the file was | |||
modified in the client's cache yet again between successive | modified in the client's cache yet again between successive | |||
CB_GETATTR calls. Therefore, the server must assume that the file | CB_GETATTR calls. Therefore, the server must assume that the file | |||
has been modified yet again, and MUST take care to ensure that the | has been modified yet again, and MUST take care to ensure that the | |||
new nsc it constructs and returns is greater than the previous nsc it | new nsc it constructs and returns is greater than the previous nsc it | |||
returned. An example implementation's delegation record would | returned. An example implementation's delegation record would | |||
satisfy this mandate by including a boolean field (let us call it | satisfy this mandate by including a boolean field (let us call it | |||
"modified") that is set to false when the delegation is granted, and | "modified") that is set to FALSE when the delegation is granted, and | |||
an sc value set at the time of grant to the change attribute value. | an sc value set at the time of grant to the change attribute value. | |||
The modified field would be set to true the first time cc != sc, and | The modified field would be set to true the first time cc != sc, and | |||
would stay true until the delegation is returned or revoked. The | would stay true until the delegation is returned or revoked. The | |||
processing for constructing nsc, time_modify, and time_metadata would | processing for constructing nsc, time_modify, and time_metadata would | |||
use this pseudo code: | use this pseudo code: | |||
if (!modified) { | if (!modified) { | |||
do CB_GETATTR for change and size; | do CB_GETATTR for change and size; | |||
if (cc != sc) | if (cc != sc) | |||
skipping to change at page 231, line 15 | skipping to change at page 231, line 15 | |||
reclaim after server reboot (although in the case of the planned | reclaim after server reboot (although in the case of the planned | |||
state transfer associated with migration, these can be avoided by | state transfer associated with migration, these can be avoided by | |||
securely recording lock state as part of state migration). Unless | securely recording lock state as part of state migration). Unless | |||
the destination server can guarantee that locks will not be | the destination server can guarantee that locks will not be | |||
incorrectly granted, the destination server should not allow lock | incorrectly granted, the destination server should not allow lock | |||
reclaims and should avoid establishing a grace period. | reclaims and should avoid establishing a grace period. | |||
Once all locks have been reclaimed, or there were no locks to | Once all locks have been reclaimed, or there were no locks to | |||
reclaim, the client indicates that there are no more reclaims to be | reclaim, the client indicates that there are no more reclaims to be | |||
done for the file system in question by issuing a RECLAIM_COMPLETE | done for the file system in question by issuing a RECLAIM_COMPLETE | |||
operation with the one_fs parameter set to true. Once this has been | operation with the rca_one_fs parameter set to true. Once this has | |||
done, non-reclaim locking operations may be done, and any subsequent | been done, non-reclaim locking operations may be done, and any | |||
request to do reclaims will be rejected with the error | subsequent request to do reclaims will be rejected with the error | |||
NFS4ERR_NO_GRACE. | NFS4ERR_NO_GRACE. | |||
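The per-file-system behaviour described above can be sketched as simple bookkeeping on the destination server. The class and method names are hypothetical, and keying the state by a (client ID, fsid) pair is an assumption of this sketch.

```python
class FsReclaimTracker:
    """Records which clients have sent RECLAIM_COMPLETE with rca_one_fs
    set to true for a given file system."""

    def __init__(self) -> None:
        self._complete = set()

    def reclaim_complete_one_fs(self, client_id: str, fsid: str) -> None:
        # The client has declared that it has no more reclaims for this fs.
        self._complete.add((client_id, fsid))

    def check_reclaim(self, client_id: str, fsid: str) -> str:
        # Any later reclaim attempt for that file system is rejected.
        if (client_id, fsid) in self._complete:
            return "NFS4ERR_NO_GRACE"
        return "NFS4_OK"
```

Completion is tracked per file system, so a client that has finished reclaiming on one migrated file system can still reclaim locks on another.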
Information about client identity may be propagated between servers | Information about client identity may be propagated between servers | |||
in the form of client_owner4 and associated verifiers, under the | in the form of client_owner4 and associated verifiers, under the | |||
assumption that the client presents the same values to all the | assumption that the client presents the same values to all the | |||
servers with which it deals. | servers with which it deals. | |||
Servers are encouraged to provide facilities to allow locks to be | Servers are encouraged to provide facilities to allow locks to be | |||
reclaimed on the new server after a file system transition. Often, | reclaimed on the new server after a file system transition. Often, | |||
however, in cases in which the two servers do not share a server | however, in cases in which the two servers do not share a server | |||
skipping to change at page 268, line 20 | skipping to change at page 268, line 20 | |||
the server supports and the client is prepared to use. The layout | the server supports and the client is prepared to use. The layout | |||
returned to the client may not exactly align with the requested byte | returned to the client may not exactly align with the requested byte | |||
range. A field within the LAYOUTGET request, loga_minlength, | range. A field within the LAYOUTGET request, loga_minlength, | |||
specifies the minimum length of the layout. The loga_minlength field | specifies the minimum length of the layout. The loga_minlength field | |||
should be at least one. As needed a client may make multiple | should be at least one. As needed a client may make multiple | |||
LAYOUTGET requests; these will result in multiple overlapping, non- | LAYOUTGET requests; these will result in multiple overlapping, non- | |||
conflicting layouts. | conflicting layouts. | |||
In order to get a layout, the client must first have opened the file | In order to get a layout, the client must first have opened the file | |||
via the OPEN operation. When a client has no layout on a file, it | via the OPEN operation. When a client has no layout on a file, it | |||
presents a stateid as returned by OPEN, a delegation stateid, or a | MUST present a stateid as returned by OPEN, a delegation stateid, or | |||
byte-range lock stateid in the loga_stateid argument. A successful | a byte-range lock stateid in the loga_stateid argument. A successful | |||
LAYOUTGET result includes a layout stateid. The first successful | LAYOUTGET result includes a layout stateid. The first successful | |||
LAYOUTGET processed by the server using a non-layout stateid as an | LAYOUTGET processed by the server using a non-layout stateid as an | |||
argument MUST have the "seqid" field of the layout stateid in the | argument MUST have the "seqid" field of the layout stateid in the | |||
response set to one. Thereafter, the client uses a layout stateid | response set to one. Thereafter, the client uses a layout stateid | |||
(see Section 12.5.3) on future invocations of LAYOUTGET on the file, | (see Section 12.5.3) on future invocations of LAYOUTGET on the file, | |||
and the "seqid" MUST NOT ever be set to zero. Once the layout has | and the "seqid" MUST NOT ever be set to zero. Once the layout has | |||
been retrieved, it can be held across multiple OPEN and CLOSE | been retrieved, it can be held across multiple OPEN and CLOSE | |||
sequences. Therefore, a client may hold a layout for a file that is | sequences. Therefore, a client may hold a layout for a file that is | |||
not currently open by any user on the client. This allows for the | not currently open by any user on the client. This allows for the | |||
caching of layouts beyond CLOSE. | caching of layouts beyond CLOSE. | |||
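The two "seqid" requirements in the paragraph above can be expressed as a check a client might apply to a LAYOUTGET response. This is a sketch with hypothetical names. It covers only the two rules stated here and does not model how the seqid advances on later operations.

```python
def layout_seqid_is_valid(first_with_non_layout_stateid: bool,
                          response_seqid: int) -> bool:
    """Check the seqid of the layout stateid returned by LAYOUTGET.

    first_with_non_layout_stateid -- True if this was the first successful
        LAYOUTGET that the server processed using an open, delegation, or
        byte-range lock stateid as the argument.
    """
    if first_with_non_layout_stateid:
        return response_seqid == 1
    # On later invocations the seqid must never be zero.
    return response_seqid != 0
```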
skipping to change at page 270, line 10 | skipping to change at page 270, line 10 | |||
CB_LAYOUTRECALL request. Simply seeing the result or the | CB_LAYOUTRECALL request. Simply seeing the result or the | |||
CB_LAYOUTRECALL request is not sufficient cause to use the seqid. | CB_LAYOUTRECALL request is not sufficient cause to use the seqid. | |||
For LAYOUTGET results, if the client is not using the forgetful model | For LAYOUTGET results, if the client is not using the forgetful model | |||
(Section 12.5.5.1), it MUST first update its record of what ranges of | (Section 12.5.5.1), it MUST first update its record of what ranges of | |||
the file's layout it has before using the seqid. For LAYOUTRETURN | the file's layout it has before using the seqid. For LAYOUTRETURN | |||
results, the client MUST delete the range from its record of what | results, the client MUST delete the range from its record of what | |||
ranges of the file's layout it had before using the seqid. For | ranges of the file's layout it had before using the seqid. For | |||
CB_LAYOUTRECALL arguments, the client MUST send a response to the | CB_LAYOUTRECALL arguments, the client MUST send a response to the | |||
recall before using the seqid. | recall before using the seqid. | |||
Once a client has no more layouts on a file, the layout stateid is no | ||||
longer valid, and MUST NOT be used. Any attempt to use such a layout | ||||
stateid will result in NFS4ERR_BAD_STATEID. | ||||
12.5.4. Committing a Layout | 12.5.4. Committing a Layout | |||
Allowing for varying storage protocol capabilities, the pNFS | Allowing for varying storage protocol capabilities, the pNFS | |||
protocol does not require the metadata server and storage devices to | protocol does not require the metadata server and storage devices to | |||
have a consistent view of file attributes and data location mappings. | have a consistent view of file attributes and data location mappings. | |||
Data location mapping refers to aspects such as which offsets store | Data location mapping refers to aspects such as which offsets store | |||
data as opposed to storing holes (see Section 13.4.4 for a | data as opposed to storing holes (see Section 13.4.4 for a | |||
discussion). Related issues arise for storage protocols where a | discussion). Related issues arise for storage protocols where a | |||
layout may hold provisionally allocated blocks where the allocation | layout may hold provisionally allocated blocks where the allocation | |||
of those blocks does not survive a complete restart of both the | of those blocks does not survive a complete restart of both the | |||
skipping to change at page 271, line 5 | skipping to change at page 271, line 8 | |||
The control protocol is free to synchronize the attributes before it | The control protocol is free to synchronize the attributes before it | |||
receives a LAYOUTCOMMIT, however upon successful completion of a | receives a LAYOUTCOMMIT, however upon successful completion of a | |||
LAYOUTCOMMIT, state that exists on the metadata server that describes | LAYOUTCOMMIT, state that exists on the metadata server that describes | |||
the file MUST be in sync with the state existing on the storage | the file MUST be in sync with the state existing on the storage | |||
devices that comprise that file as of the issuing client's last | devices that comprise that file as of the issuing client's last | |||
operation. Thus, a client that queries the size of a file between a | operation. Thus, a client that queries the size of a file between a | |||
WRITE to a storage device and the LAYOUTCOMMIT may observe a size | WRITE to a storage device and the LAYOUTCOMMIT may observe a size | |||
that does not reflect the actual data written. | that does not reflect the actual data written. | |||
The client MUST have a layout in order to issue LAYOUTCOMMIT. | ||||
12.5.4.1. LAYOUTCOMMIT and change/time_modify | 12.5.4.1. LAYOUTCOMMIT and change/time_modify | |||
The change and time_modify attributes may be updated by the server | The change and time_modify attributes may be updated by the server | |||
when the LAYOUTCOMMIT operation is processed. The reason for this is | when the LAYOUTCOMMIT operation is processed. The reason for this is | |||
that some layout types do not support the update of these attributes | that some layout types do not support the update of these attributes | |||
when the storage devices process I/O operations. The client is | when the storage devices process I/O operations. If the client has a | |||
capable providing a suggested value to the server for time_modify | layout with the LAYOUTIOMODE4_RW iomode on the file, the client MAY | |||
within the arguments to LAYOUTCOMMIT. Based on layout type, the | provide a suggested value to the server for time_modify within the | |||
provided value may or may not be used. The server should sanity | arguments to LAYOUTCOMMIT. Based on the layout type, the provided | |||
check the client provided values before they are used. For example, | value may or may not be used. The server should sanity check the | |||
the server should ensure that time does not flow backwards. The | client provided values before they are used. For example, the server | |||
client always has the option to set time_modify through an explicit | should ensure that time does not flow backwards. The client always | |||
SETATTR operation. | has the option to set time_modify through an explicit SETATTR | |||
operation. | ||||
For some layout protocols, the storage device is able to notify the | For some layout protocols, the storage device is able to notify the | |||
metadata server of the occurrence of an I/O and as a result the | metadata server of the occurrence of an I/O and as a result the | |||
change and time_modify attributes may be updated at the metadata | change and time_modify attributes may be updated at the metadata | |||
server. For a metadata server that is capable of monitoring updates | server. For a metadata server that is capable of monitoring updates | |||
to the change and time_modify attributes, LAYOUTCOMMIT processing is | to the change and time_modify attributes, LAYOUTCOMMIT processing is | |||
not required to update the change attribute; in this case the | not required to update the change attribute; in this case the | |||
metadata server must ensure that no further update to the data has | metadata server must ensure that no further update to the data has | |||
occurred since the last update of the attributes; file-based | occurred since the last update of the attributes; file-based | |||
protocols may have enough information to make this determination or | protocols may have enough information to make this determination or | |||
skipping to change at page 271, line 45 | skipping to change at page 271, line 51 | |||
12.5.4.2. LAYOUTCOMMIT and size | 12.5.4.2. LAYOUTCOMMIT and size | |||
The size of a file may be updated when the LAYOUTCOMMIT operation is | The size of a file may be updated when the LAYOUTCOMMIT operation is | |||
used by the client. One of the fields in the argument to | used by the client. One of the fields in the argument to | |||
LAYOUTCOMMIT is loca_last_write_offset; this field indicates the | LAYOUTCOMMIT is loca_last_write_offset; this field indicates the | |||
highest byte offset written but not yet committed with the | highest byte offset written but not yet committed with the | |||
LAYOUTCOMMIT operation. The data type of loca_last_write_offset is | LAYOUTCOMMIT operation. The data type of loca_last_write_offset is | |||
newoffset4 and is switched on a boolean value, no_newoffset, that | newoffset4 and is switched on a boolean value, no_newoffset, that | |||
indicates if a previous write occurred or not. If no_newoffset is | indicates if a previous write occurred or not. If no_newoffset is | |||
FALSE, an offset is not given. A loca_last_write_offset value of | FALSE, an offset is not given. If the client has a layout with | |||
zero means that one byte was written at offset zero. | LAYOUTIOMODE4_RW iomode on the file, with an lo_offset and lo_length | |||
that overlaps loca_last_write_offset, then the client MAY set | ||||
no_newoffset to TRUE and provide an offset that will update the file | ||||
size. Keep in mind that offset is not the same as length, though | ||||
they are related. For example, a loca_last_write_offset value of | ||||
zero means that one byte was written at offset zero, and so the | ||||
length of the file is at least one byte. | ||||
The metadata server may do one of the following: | The metadata server may do one of the following: | |||
1. Update the file's size using the last write offset provided by | 1. Update the file's size using the last write offset provided by | |||
the client as either the true file size or as a hint of the file | the client as either the true file size or as a hint of the file | |||
size. If the metadata server has a method available, any new | size. If the metadata server has a method available, any new | |||
value for file size should be sanity checked. For example, the | value for file size should be sanity checked. For example, the | |||
file must not be truncated if the client presents a last write | file must not be truncated if the client presents a last write | |||
offset less than the file's current size. | offset less than the file's current size. | |||
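
The relationship between the last write offset and the file size can be illustrated with a minimal sketch. It assumes the metadata server treats the offset as a hint and applies the no-truncation sanity check described above. The function name and the use of None to stand for "no offset given" are hypothetical and are not taken from the draft.

```python
from typing import Optional


def size_after_layoutcommit(current_size: int,
                            last_write_offset: Optional[int]) -> int:
    """Return the file size a metadata server might record (illustrative only).

    last_write_offset is None when the client supplied no offset.
    """
    if last_write_offset is None:
        # No offset given: the size is left unchanged.
        return current_size
    # An offset is not a length: a write at offset N implies a file of
    # at least N + 1 bytes.
    implied_length = last_write_offset + 1
    # Sanity check: a smaller last write offset must not truncate the file.
    return max(current_size, implied_length)
```

Under these assumptions, a last write offset of zero on an empty file yields a size of one byte, and an offset below the current size leaves the size as it was.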
skipping to change at page 281, line 46 | skipping to change at page 282, line 11 | |||
LAYOUTCOMMIT to commit the modification time and the new size of the | LAYOUTCOMMIT to commit the modification time and the new size of the | |||
file (if it believes it extended the file size) to the metadata | file (if it believes it extended the file size) to the metadata | |||
server and the modified data to the file system. | server and the modified data to the file system. | |||
12.7. Recovery | 12.7. Recovery | |||
Recovery is complicated by the distributed nature of the pNFS | Recovery is complicated by the distributed nature of the pNFS | |||
protocol. In general, crash recovery for layouts is similar to crash | protocol. In general, crash recovery for layouts is similar to crash | |||
recovery for delegations in the base NFSv4.1 protocol. However, the | recovery for delegations in the base NFSv4.1 protocol. However, the | |||
client's ability to perform I/O without contacting the metadata | client's ability to perform I/O without contacting the metadata | |||
server subtleties that must be handled correctly if the possibility | server introduces subtleties that must be handled correctly if the | |||
of file system corruption is to be avoided. [[Comment.4: mre: | possibility of file system corruption is to be avoided. | |||
layouts are bound to stateids]] | ||||
12.7.1. Recovery from Client Restart | 12.7.1. Recovery from Client Restart | |||
Client recovery for layouts is similar to client recovery for other | Client recovery for layouts is similar to client recovery for other | |||
lock and delegation state. When a pNFS client restarts, it will | lock and delegation state. When a pNFS client restarts, it will | |||
lose all information about the layouts that it previously owned. | lose all information about the layouts that it previously owned. | |||
There are two methods by which the server can reclaim these resources | There are two methods by which the server can reclaim these resources | |||
and allow otherwise conflicting layouts to be provided to other | and allow otherwise conflicting layouts to be provided to other | |||
clients. | clients. | |||
skipping to change at page 290, line 39 | skipping to change at page 290, line 45 | |||
If a server is both a metadata server and a data server, the server | If a server is both a metadata server and a data server, the server | |||
might need to distinguish operations on files that are directed to | might need to distinguish operations on files that are directed to | |||
the metadata server from those that are directed to the data server. | the metadata server from those that are directed to the data server. | |||
It is RECOMMENDED that the values of the filehandles returned by the | It is RECOMMENDED that the values of the filehandles returned by the | |||
LAYOUTGET operation to be different than the value of the filehandle | LAYOUTGET operation to be different than the value of the filehandle | |||
returned by the OPEN of the same file. | returned by the OPEN of the same file. | |||
Another scenario is for the metadata server and the storage device to | Another scenario is for the metadata server and the storage device to | |||
be distinct from one client's point of view, and the roles reversed | be distinct from one client's point of view, and the roles reversed | |||
from another client's point of view. For example, in the cluster | from another client's point of view. For example, in the cluster | |||
file system model a metadata server to one client, may be a data | file system model, a metadata server to one client may be a data | |||
server to another client. If NFSv4.1 is being used as the storage | server to another client. If NFSv4.1 is being used as the storage | |||
protocol, then pNFS servers need to encode the values of filehandles | protocol, then pNFS servers need to encode the values of filehandles | |||
according to their specific roles. | according to their specific roles. | |||
13.1.1. Sessions Considerations for Data Servers | ||||
Section 2.10.9.2 states that a client has to keep its lease renewed | ||||
in order to prevent a session from being deleted by the server. If | ||||
the reply to EXCHANGE_ID has just the EXCHGID4_FLAG_USE_PNFS_DS role | ||||
set, then as noted in Section 13.6 the client will not be able to | ||||
determine the data server's lease_time attribute, because GETATTR | ||||
will not be permitted. Instead, the rule is that any time a client | ||||
receives a layout referring it to a data server that returns just the | ||||
EXCHGID4_FLAG_USE_PNFS_DS role, the client MAY assume that the | ||||
lease_time attribute from the metadata server that returned the | ||||
layout applies to the data server. Thus the data server MUST be | ||||
aware of the values of all lease_time attributes of all metadata | ||||
servers it is providing I/O for, and MUST use the maximum of all such | ||||
lease_time values as the lease interval for all client IDs and | ||||
sessions established on it. | ||||
For example, if one metadata server has a lease_time attribute of 20 | ||||
seconds, and a second metadata server has a lease_time attribute of | ||||
10 seconds, then if both servers return layouts that refer to an | ||||
EXCHGID4_FLAG_USE_PNFS_DS-only data server, the data server MUST | ||||
renew a client's lease if the interval between two SEQUENCE | ||||
operations on different COMPOUND requests is less than 20 seconds. | ||||
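
The lease rule for a data-server-only role can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: both function names are hypothetical, and times are plain seconds rather than any protocol data type.

```python
def data_server_lease_interval(mds_lease_times):
    """Lease interval a data server would apply to all client IDs and sessions.

    mds_lease_times: lease_time values (seconds) of every metadata server
    for which the data server provides I/O.
    """
    if not mds_lease_times:
        raise ValueError("at least one metadata server lease_time is required")
    # The data server uses the maximum of all such lease_time values.
    return max(mds_lease_times)


def lease_renewed(prev_sequence_time, next_sequence_time, lease_interval):
    """True if two SEQUENCE operations are close enough to renew the lease."""
    return (next_sequence_time - prev_sequence_time) < lease_interval
```

With lease_time values of 20 and 10 seconds, the interval applied is 20 seconds, so SEQUENCE operations 15 seconds apart renew the lease regardless of which metadata server issued the layout.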
13.2. File Layout Definitions | 13.2. File Layout Definitions | |||
The following definitions apply to the LAYOUT4_NFSV4_1_FILES layout | The following definitions apply to the LAYOUT4_NFSV4_1_FILES layout | |||
type, and may be applicable to other layout types. | type, and may be applicable to other layout types. | |||
Unit. A unit is a fixed size quantity of data written to a data | Unit. A unit is a fixed size quantity of data written to a data | |||
server. | server. | |||
Pattern. A pattern is a method of distributing one or more equal | Pattern. A pattern is a method of distributing one or more equal | |||
sized units across a set of data servers. A pattern is iterated | sized units across a set of data servers. A pattern is iterated | |||
skipping to change at page 304, line 20 | skipping to change at page 305, line 20 | |||
personalities, each COMPOUND sent by the client MUST be constructed | personalities, each COMPOUND sent by the client MUST be constructed | |||
so that it is appropriate to one of the two personalities, and must | so that it is appropriate to one of the two personalities, and must | |||
not contain operations directed to a mix of those personalities. The | not contain operations directed to a mix of those personalities. The | |||
server MUST enforce this. To understand the constraints, operations | server MUST enforce this. To understand the constraints, operations | |||
within a COMPOUND are divided into the following three classes: | within a COMPOUND are divided into the following three classes: | |||
1. An operation which is ambiguous regarding its personality | 1. An operation which is ambiguous regarding its personality | |||
assignment. These include all of the data-server housekeeping | assignment. These include all of the data-server housekeeping | |||
operations. Additionally, if the server has assigned filehandles | operations. Additionally, if the server has assigned filehandles | |||
so that the ones defined by the layout are the same as those used | so that the ones defined by the layout are the same as those used | |||
by the meta-data server, all operations in the second class are | by the metadata server, all operations in the second class are | |||
within this group unless a stateid used is incompatible with a | within this group unless a stateid used is incompatible with a | |||
data-server personality in that it is a special stateid or has a | data-server personality in that it is a special stateid or has a | |||
non-zero seqid field. | non-zero seqid field. | |||
2. An operation which is referable to the data server personality. | 2. An operation which is referable to the data server personality. | |||
These are data-server I/O operations where the filehandle is one | These are data-server I/O operations where the filehandle is one | |||
that can only be validly directed to the data-server personality. | that can only be validly directed to the data-server personality. | |||
3. An operation which is referable to the non-data-server | 3. An operation which is referable to the non-data-server | |||
personality. These include all COMPOUND operations that are | personality. These include all COMPOUND operations that are | |||
skipping to change at page 305, line 41 | skipping to change at page 306, line 41 | |||
has completed (see Section 12.5.4.2). Section 13.10, describes the | has completed (see Section 12.5.4.2). Section 13.10, describes the | |||
mechanism by which the client is to handle data server files that do | mechanism by which the client is to handle data server files that do | |||
not reflect the metadata server's size. | not reflect the metadata server's size. | |||
13.7. COMMIT Through Metadata Server | 13.7. COMMIT Through Metadata Server | |||
The file layout provides two alternate means of providing for the | The file layout provides two alternate means of providing for the | |||
commit of data written through data servers. The flag | commit of data written through data servers. The flag | |||
NFL4_UFLG_COMMIT_THRU_MDS in the field nfl_util of the file layout | NFL4_UFLG_COMMIT_THRU_MDS in the field nfl_util of the file layout | |||
(data type nfsv4_1_file_layout4) is an indication from the metadata | (data type nfsv4_1_file_layout4) is an indication from the metadata | |||
server to the client of the preferred way of performing COMMIT, | server to the client of the REQUIRED way of performing COMMIT, either | |||
either by sending the COMMIT to the data server or the metadata | by sending the COMMIT to the data server or the metadata server. | |||
server. These two methods of dealing with the issue correspond to | These two methods of dealing with the issue correspond to broad | |||
broad styles of implementation for a pNFS server supporting the files | styles of implementation for a pNFS server supporting the files | |||
layout type. | layout type. | |||
o When the flag is false, COMMIT operations are to be done to the | o When the flag is FALSE, COMMIT operations MUST to be sent to the | |||
data server to which the corresponding writes were done. This | data server to which the corresponding WRITE operations were sent. | |||
approach is most useful when striping of files is implemented as | This approach is most useful when striping of files is implemented | |||
part of pNFS server, with the individual data servers each | as part of pNFS server, with the individual data servers each | |||
implementing their own file systems. | implementing their own file systems. | |||
o When the flag is true, COMMIT operations are done to the metadata | o When the flag is TRUE, COMMIT operations MUST be sent to the | |||
server, rather than to the individual data servers. This approach | metadata server, rather than to the individual data servers. This | |||
is most useful when the pNFS server is implemented on top of a | approach is most useful when the pNFS server is implemented on top | |||
clustered file system. In such an implementation, sending | of a clustered file system. In such an implementation, sending | |||
COMMIT's to multiple data servers may result in repeated writes of | COMMIT's to multiple data servers may result in repeated writes of | |||
metadata blocks as each individual COMMIT is executed, to the | metadata blocks as each individual COMMIT is executed, to the | |||
detriment of write performance. Sending a single COMMIT to the | detriment of write performance. Sending a single COMMIT to the | |||
metadata server can provide more efficiency when there exists a | metadata server can provide more efficiency when there exists a | |||
clustered file system capable of implementing such a co-ordinated | clustered file system capable of implementing such a co-ordinated | |||
COMMIT. | COMMIT. | |||
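
A client's choice of COMMIT destination, as described by the two cases above, might look like the following sketch. The numeric value given to NFL4_UFLG_COMMIT_THRU_MDS is an assumption made for illustration and should be checked against the protocol's XDR definitions. The function name and the string placeholders for servers are hypothetical.

```python
# Assumed flag value, for illustration only; consult the XDR definition.
NFL4_UFLG_COMMIT_THRU_MDS = 0x00000002


def commit_target(nfl_util, data_server, metadata_server):
    """Pick where to send COMMIT for data written through a data server."""
    if nfl_util & NFL4_UFLG_COMMIT_THRU_MDS:
        # Flag set: COMMIT goes to the metadata server.
        return metadata_server
    # Flag clear: COMMIT goes to the data server that received the WRITEs.
    return data_server
```

For example, commit_target(0x2, "ds1", "mds") selects "mds", while the same call with nfl_util set to 0 selects "ds1".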
If nfl_util & NFL4_UFLG_COMMIT_THRU_MDS is TRUE, then in order to | If nfl_util & NFL4_UFLG_COMMIT_THRU_MDS is TRUE, then in order to | |||
maintain the current NFSv4.1 commit and recovery model, the data | maintain the current NFSv4.1 commit and recovery model, the data | |||
servers MUST return a common writeverf verifier in all WRITE | servers MUST return a common writeverf verifier in all WRITE | |||
skipping to change at page 314, line 32 | skipping to change at page 315, line 32 | |||
Table B.1 | Table B.1 | |||
Table B.2 is normally not part of the nfs4_cs_prep profile as it is | Table B.2 is normally not part of the nfs4_cs_prep profile as it is | |||
primarily for dealing with case-insensitive comparisons. However, if | primarily for dealing with case-insensitive comparisons. However, if | |||
the NFSv4.1 file server supports the case_insensitive file system | the NFSv4.1 file server supports the case_insensitive file system | |||
attribute, and if case_insensitive is true, the NFSv4.1 server MUST | attribute, and if case_insensitive is true, the NFSv4.1 server MUST | |||
use Table B.2 (in addition to Table B.1) when processing utf8str_cs | use Table B.2 (in addition to Table B.1) when processing utf8str_cs | |||
strings, and the NFSv4.1 client MUST assume Table B.2 (in addition to | strings, and the NFSv4.1 client MUST assume Table B.2 (in addition to | |||
Table B.1) are being used. | Table B.1) are being used. | |||
If the case_preserving attribute is present and set to false, then | If the case_preserving attribute is present and set to FALSE, then | |||
the NFSv4.1 server MUST use Table B.2 to map case when processing | the NFSv4.1 server MUST use Table B.2 to map case when processing | |||
utf8str_cs strings. Whether the server maps from lower to upper case | utf8str_cs strings. Whether the server maps from lower to upper case | |||
or the upper to lower case is an implementation dependency. | or the upper to lower case is an implementation dependency. | |||
14.1.4. Normalization used by nfs4_cs_prep | 14.1.4. Normalization used by nfs4_cs_prep | |||
The nfs4_cs_prep profile does not specify a normalization form. A | The nfs4_cs_prep profile does not specify a normalization form. A | |||
later revision of this specification may specify a particular | later revision of this specification may specify a particular | |||
normalization form. Therefore, the server and client can expect that | normalization form. Therefore, the server and client can expect that | |||
they may receive unnormalized characters within protocol requests and | they may receive unnormalized characters within protocol requests and | |||
skipping to change at page 342, line 35 | skipping to change at page 343, line 35 | |||
| GETFH | NFS4ERR_FHEXPIRED, NFS4ERR_MOVED, | | | GETFH | NFS4ERR_FHEXPIRED, NFS4ERR_MOVED, | | |||
| | NFS4ERR_NOFILEHANDLE, | | | | NFS4ERR_NOFILEHANDLE, | | |||
| | NFS4ERR_OP_NOT_IN_SESSION, NFS4ERR_STALE | | | | NFS4ERR_OP_NOT_IN_SESSION, NFS4ERR_STALE | | |||
| ILLEGAL | NFS4ERR_BADXDR, NFS4ERR_OP_ILLEGAL | | | ILLEGAL | NFS4ERR_BADXDR, NFS4ERR_OP_ILLEGAL | | |||
| LAYOUTCOMMIT | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | | | LAYOUTCOMMIT | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | | |||
| | NFS4ERR_ATTRNOTSUPP, NFS4ERR_BADIOMODE, | | | | NFS4ERR_ATTRNOTSUPP, NFS4ERR_BADIOMODE, | | |||
| | NFS4ERR_BADLAYOUT, NFS4ERR_BADXDR, | | | | NFS4ERR_BADLAYOUT, NFS4ERR_BADXDR, | | |||
| | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | | | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | | |||
| | NFS4ERR_EXPIRED, NFS4ERR_FBIG, | | | | NFS4ERR_EXPIRED, NFS4ERR_FBIG, | | |||
| | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | | | | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | | |||
| | NFS4ERR_IO, NFS4ERR_ISDIR NFS4ERR_MOVED, | | | | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_ISDIR | | |||
| | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTSUPP, | | | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | | |||
| | NFS4ERR_NO_GRACE, | | | | NFS4ERR_NOTSUPP, NFS4ERR_NO_GRACE, | | |||
| | NFS4ERR_OP_NOT_IN_SESSION, | | | | NFS4ERR_OP_NOT_IN_SESSION, | | |||
| | NFS4ERR_RECLAIM_BAD, | | | | NFS4ERR_RECLAIM_BAD, | | |||
| | NFS4ERR_RECLAIM_CONFLICT, | | | | NFS4ERR_RECLAIM_CONFLICT, | | |||
| | NFS4ERR_REP_TOO_BIG, | | | | NFS4ERR_REP_TOO_BIG, | | |||
| | NFS4ERR_REP_TOO_BIG_TO_CACHE, | | | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | | |||
| | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | | | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | | |||
| | NFS4ERR_STALE, NFS4ERR_SYMLINK, | | | | NFS4ERR_STALE, NFS4ERR_SYMLINK, | | |||
| | NFS4ERR_TOO_MANY_OPS, | | | | NFS4ERR_TOO_MANY_OPS, | | |||
| | NFS4ERR_UNKNOWN_LAYOUTTYPE, | | | | NFS4ERR_UNKNOWN_LAYOUTTYPE, | | |||
| | NFS4ERR_WRONG_CRED | | | | NFS4ERR_WRONG_CRED | | |||
skipping to change at page 361, line 38 | skipping to change at page 362, line 38 | |||
| NFS4ERR_INVAL | ACCESS, BACKCHANNEL_CTL, | | | NFS4ERR_INVAL | ACCESS, BACKCHANNEL_CTL, | | |||
| | BIND_CONN_TO_SESSION, | | | | BIND_CONN_TO_SESSION, | | |||
| | CB_GETATTR, CB_LAYOUTRECALL, | | | | CB_GETATTR, CB_LAYOUTRECALL, | | |||
| | CB_NOTIFY, CB_PUSH_DELEG, | | | | CB_NOTIFY, CB_PUSH_DELEG, | | |||
| | CB_RECALLABLE_OBJ_AVAIL, | | | | CB_RECALLABLE_OBJ_AVAIL, | | |||
| | CB_RECALL_ANY, CREATE, | | | | CB_RECALL_ANY, CREATE, | | |||
| | CREATE_SESSION, DELEGRETURN, | | | | CREATE_SESSION, DELEGRETURN, | | |||
| | EXCHANGE_ID, GETATTR, | | | | EXCHANGE_ID, GETATTR, | | |||
| | GETDEVICEINFO, GETDEVICELIST, | | | | GETDEVICEINFO, GETDEVICELIST, | | |||
| | GET_DIR_DELEGATION, | | | | GET_DIR_DELEGATION, | | |||
| | LAYOUTGET, LAYOUTRETURN, | | | | LAYOUTCOMMIT, LAYOUTGET, | | |||
| | LINK, LOCK, LOCKT, LOCKU, | | | | LAYOUTRETURN, LINK, LOCK, | | |||
| | LOOKUP, NVERIFY, OPEN, | | | | LOCKT, LOCKU, LOOKUP, | | |||
| | NVERIFY, OPEN, | | ||||
| | OPEN_DOWNGRADE, READ, | | | | OPEN_DOWNGRADE, READ, | | |||
| | READDIR, READLINK, | | | | READDIR, READLINK, | | |||
| | RECLAIM_COMPLETE, REMOVE, | | | | RECLAIM_COMPLETE, REMOVE, | | |||
| | RENAME, SECINFO, | | | | RENAME, SECINFO, | | |||
| | SECINFO_NO_NAME, SETATTR, | | | | SECINFO_NO_NAME, SETATTR, | | |||
| | VERIFY, WANT_DELEGATION, | | | | VERIFY, WANT_DELEGATION, | | |||
| | WRITE | | | | WRITE | | |||
| NFS4ERR_IO | ACCESS, COMMIT, CREATE, | | | NFS4ERR_IO | ACCESS, COMMIT, CREATE, | | |||
| | GETATTR, GETDEVICELIST, | | | | GETATTR, GETDEVICELIST, | | |||
| | GET_DIR_DELEGATION, | | | | GET_DIR_DELEGATION, | | |||
skipping to change at page 426, line 48 | skipping to change at page 427, line 48 | |||
In absence of a persistent session, the client invokes exclusive | In absence of a persistent session, the client invokes exclusive | |||
create by setting the how parameter to EXCLUSIVE4 or EXCLUSIVE4_1. | create by setting the how parameter to EXCLUSIVE4 or EXCLUSIVE4_1. | |||
In these cases, the client provides a verifier that can reasonably be | In these cases, the client provides a verifier that can reasonably be | |||
expected to be unique. A combination of a client identifier, perhaps | expected to be unique. A combination of a client identifier, perhaps | |||
the client network address, and a unique number generated by the | the client network address, and a unique number generated by the | |||
client, perhaps the RPC transaction identifier, may be appropriate. | client, perhaps the RPC transaction identifier, may be appropriate. | |||
If the object does not exist, the server creates the object and | If the object does not exist, the server creates the object and | |||
stores the verifier in stable storage. For file systems that do not | stores the verifier in stable storage. For file systems that do not | |||
provide a mechanism for the storage of arbitrary file attributes, the | provide a mechanism for the storage of arbitrary file attributes, the | |||
server may use one or more elements of the object meta-data to store | server may use one or more elements of the object metadata to store | |||
the verifier. The verifier must be stored in stable storage to | the verifier. The verifier must be stored in stable storage to | |||
prevent erroneous failure on retransmission of the request. It is | prevent erroneous failure on retransmission of the request. It is | |||
assumed that an exclusive create is being performed because exclusive | assumed that an exclusive create is being performed because exclusive | |||
semantics are critical to the application. Because of the expected | semantics are critical to the application. Because of the expected | |||
usage, exclusive CREATE does not rely solely on the server's reply | usage, exclusive CREATE does not rely solely on the server's reply | |||
cache for storage of the verifier. A nonpersistent reply cache does | cache for storage of the verifier. A nonpersistent reply cache does | |||
not survive a crash and the session and reply cache may be deleted | not survive a crash and the session and reply cache may be deleted | |||
after a network partition that exceeds the lease time, thus opening | after a network partition that exceeds the lease time, thus opening | |||
failure windows. | failure windows. | |||
skipping to change at page 485, line 31 | skipping to change at page 486, line 31 | |||
uses (which will be either what the client offered, or what the | uses (which will be either what the client offered, or what the | |||
server is insisting on). return the value used to the client. These | server is insisting on). return the value used to the client. These | |||
parameters have the following interpretation. | parameters have the following interpretation. | |||
csa_flags: | csa_flags: | |||
The csa_flags field contains a list of the following flag bits: | The csa_flags field contains a list of the following flag bits: | |||
CREATE_SESSION4_FLAG_PERSIST: | CREATE_SESSION4_FLAG_PERSIST: | |||
If CREATE_SESSION4_FLAG_PERSIST is set, the client desires | If CREATE_SESSION4_FLAG_PERSIST is set, the client wants the | |||
server support for persistent reply cache. For sessions in | server to provide a persistent reply cache. For sessions in | |||
which only idempotent operations will be used (e.g. a read-only | which only idempotent operations will be used (e.g. a read-only | |||
session), clients SHOULD NOT set CREATE_SESSION4_FLAG_PERSIST. | session), clients SHOULD NOT set CREATE_SESSION4_FLAG_PERSIST. | |||
If the server does not or cannot provide a persistent reply | If the server does not or cannot provide a persistent reply | |||
cache, the server MUST NOT set CREATE_SESSION4_FLAG_PERSIST in | cache, the server MUST NOT set CREATE_SESSION4_FLAG_PERSIST in | |||
the field csr_flags. | the field csr_flags. | |||
If the server is a pNFS metadata server, for reasons described | If the server is a pNFS metadata server, for reasons described | |||
in Section 12.5.2 it SHOULD support | in Section 12.5.2 it SHOULD support | |||
CREATE_SESSION4_FLAG_PERSIST if it supports the layout_hint | CREATE_SESSION4_FLAG_PERSIST if it supports the layout_hint | |||
(Section 5.11.4) attribute. | (Section 5.11.4) attribute. | |||
skipping to change at page 493, line 20 | skipping to change at page 494, line 20 | |||
18.37.2. RESULT | 18.37.2. RESULT | |||
struct DESTROY_SESSION4res { | struct DESTROY_SESSION4res { | |||
nfsstat4 dsr_status; | nfsstat4 dsr_status; | |||
}; | }; | |||
18.37.3. DESCRIPTION | 18.37.3. DESCRIPTION | |||
The DESTROY_SESSION operation closes the session and discards the | The DESTROY_SESSION operation closes the session and discards the | |||
session's its reply cache, if any. Any remaining connections | session's reply cache, if any. Any remaining connections associated | |||
associated with the session are immediately disassociated and if not | with the session are immediately disassociated and if not associated | |||
associated with other sessions, MAY be closed by the server. Locks, | with other sessions, MAY be closed by the server. Locks, delegations, | |||
delegations, layouts, wants, and the lease, which are all tied to the | layouts, wants, and the lease, which are all tied to the client ID, | |||
client ID, are not affected by DESTROY_SESSION. | are not affected by DESTROY_SESSION. | |||
DESTROY_SESSION MUST be invoked on a connection that is associated | DESTROY_SESSION MUST be invoked on a connection that is associated | |||
with the session being destroyed. In addition if SP4_MACH_CRED state | with the session being destroyed. In addition if SP4_MACH_CRED state | |||
protection was specified when the client ID was created, the | protection was specified when the client ID was created, the | |||
RPCSEC_GSS principal that created the session MUST be the one that | RPCSEC_GSS principal that created the session MUST be the one that | |||
destroys the session, using RPCSEC_GSS privacy or integrity. If | destroys the session, using RPCSEC_GSS privacy or integrity. If | |||
SP4_SSV state protection was specified when the client ID was | SP4_SSV state protection was specified when the client ID was | |||
created, RPCSEC_GSS using the SSV mechanism (Section 2.10.8) MUST be | created, RPCSEC_GSS using the SSV mechanism (Section 2.10.8) MUST be | |||
used, with integrity or privacy. | used, with integrity or privacy. | |||
If the COMPOUND request starts with SEQUENCE, and if the sessions | If the COMPOUND request starts with SEQUENCE, and if the sessions | |||
referred to by SEQUENCE and DESTROY_SESSION are the same, then | referred to by SEQUENCE and DESTROY_SESSION are the same, then | |||
o DESTROY_SESSION MUST be the final operation in the COMPOUND | o DESTROY_SESSION MUST be the final operation in the COMPOUND | |||
request. | request. | |||
o It is advisable to not place DESTROY_SESSION in a COMPOUND request | o It is advisable to not place DESTROY_SESSION in a COMPOUND request | |||
with other state-modifying operations, because the DESTROY_SESSION | with other state-modifying operations, because the DESTROY_SESSION | |||
will destroy reply cache. | will destroy the reply cache. | |||
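
The placement rule in the first bullet can be checked mechanically. The sketch below models a COMPOUND as a list of (operation name, session ID) pairs, which is a hypothetical representation chosen for illustration. The advisory second bullet is not enforced.

```python
def destroy_session_placement_ok(ops):
    """Check the placement rule for DESTROY_SESSION within one COMPOUND.

    ops: list of (opname, sessionid) tuples; sessionid is None for
    operations that carry no session ID.
    """
    if not ops or ops[0][0] != "SEQUENCE":
        # The rule only covers COMPOUNDs that start with SEQUENCE.
        return True
    sequence_session = ops[0][1]
    last_index = len(ops) - 1
    for index, (name, session) in enumerate(ops):
        if name == "DESTROY_SESSION" and session == sequence_session:
            # Destroying the SEQUENCE's own session must be the final op.
            if index != last_index:
                return False
    return True
```

A COMPOUND of SEQUENCE followed by DESTROY_SESSION on the same session passes this check. Appending any further operation after that DESTROY_SESSION makes it fail.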
DESTROY_SESSION MAY be the only operation in a COMPOUND request. | DESTROY_SESSION MAY be the only operation in a COMPOUND request. | |||
Because the session is destroyed, a client that retries the request | Because the session is destroyed, a client that retries the request | |||
may receive an error in reply to the retry, even though the original | may receive an error in reply to the retry, even though the original | |||
request was successful. | request was successful. | |||
If there is a backchannel on the session and the server has | If there is a backchannel on the session and the server has | |||
outstanding CB_COMPOUND operations for the session which have not | outstanding CB_COMPOUND operations for the session which have not | |||
been replied to, then the server MAY refuse to destroy the session | been replied to, then the server MAY refuse to destroy the session | |||
skipping to change at page 504, line 32 | skipping to change at page 505, line 32 | |||
void; | void; | |||
}; | }; | |||
18.42.3. DESCRIPTION | 18.42.3. DESCRIPTION | |||
Commits changes in the layout represented by the current filehandle, | Commits changes in the layout represented by the current filehandle, | |||
client ID (derived from the sessionid in the preceding SEQUENCE | client ID (derived from the sessionid in the preceding SEQUENCE | |||
operation), byte range, and stateid. Since layouts are sub- | operation), byte range, and stateid. Since layouts are sub- | |||
dividable, a smaller portion of a layout, retrieved via LAYOUTGET, | dividable, a smaller portion of a layout, retrieved via LAYOUTGET, | |||
may be committed. The region being committed is specified through | may be committed. The region being committed is specified through | |||
the byte range (loca_offset and loca_length). | the byte range (loca_offset and loca_length). This region MUST | |||
overlap with one or more existing layouts previously granted via | ||||
LAYOUTGET (Section 18.43), each with an iomode of LAYOUTIOMODE4_RW. | ||||
The LAYOUTCOMMIT operation indicates that the client has completed | The LAYOUTCOMMIT operation indicates that the client has completed | |||
writes using a layout obtained by a previous LAYOUTGET. The client | writes using a layout obtained by a previous LAYOUTGET. The client | |||
may have only written a subset of the data range it previously | may have only written a subset of the data range it previously | |||
requested. LAYOUTCOMMIT allows it to commit or discard provisionally | requested. LAYOUTCOMMIT allows it to commit or discard provisionally | |||
allocated space and to update the server with a new end of file. The | allocated space and to update the server with a new end of file. The | |||
layout referenced by LAYOUTCOMMIT is still valid after the operation | layout referenced by LAYOUTCOMMIT is still valid after the operation | |||
completes and can continue to be referenced by the client ID, | completes and can continue to be referenced by the client ID, | |||
filehandle, byte range, layout type, and stateid. | filehandle, byte range, layout type, and stateid. | |||
If the loca_reclaim field is set to TRUE, this indicates that the | If the loca_reclaim field is set to TRUE, this indicates that the | |||
client is attempting to commit changes to a layout after the reboot | client is attempting to commit changes to a layout after the reboot | |||
of the metadata server during the metadata server's recovery grace | of the metadata server during the metadata server's recovery grace | |||
period. This type of request may be necessary when the client has | period (see Section 12.7.4). This type of request may be necessary | |||
uncommitted writes to provisionally allocated regions of a file which | when the client has uncommitted writes to provisionally allocated | |||
were sent to the storage devices before the reboot of the metadata | regions of a file which were sent to the storage devices before the | |||
server. In this case the layout provided by the client MUST be a | reboot of the metadata server. In this case the layout provided by | |||
subset of a writable layout that the client held immediately before | the client MUST be a subset of a writable layout that the client held | |||
the reboot of the metadata server. The metadata server is free to | immediately before the reboot of the metadata server. The metadata | |||
accept or reject this request based on its own internal metadata | server is free to accept or reject this request based on its own | |||
consistency checks. If the metadata server finds that the layout | internal metadata consistency checks. If the metadata server finds | |||
provided by the client does not pass its consistency checks, it MUST | that the layout provided by the client does not pass its consistency | |||
reject the request with the status NFS4ERR_RECLAIM_BAD. The | checks, it MUST reject the request with the status | |||
successful completion of the LAYOUTCOMMIT request with loca_reclaim | NFS4ERR_RECLAIM_BAD. The successful completion of the LAYOUTCOMMIT | |||
set to TRUE does NOT provide the client with a layout for the file. | request with loca_reclaim set to TRUE does NOT provide the client | |||
It simply commits the changes to the layout specified in the | with a layout for the file. It simply commits the changes to the | |||
loca_layoutupdate field. To obtain a layout for the file the client | layout specified in the loca_layoutupdate field. To obtain a layout | |||
must send a LAYOUTGET request to the server after the server's grace | for the file the client must send a LAYOUTGET request to the server | |||
period has expired. If the metadata server receives a LAYOUTCOMMIT | after the server's grace period has expired. If the metadata server | |||
request with loca_reclaim set to TRUE when the metadata server is not | receives a LAYOUTCOMMIT request with loca_reclaim set to TRUE when | |||
in its recovery grace period, it MUST reject the request with the | the metadata server is not in its recovery grace period, it MUST | |||
status NFS4ERR_NO_GRACE. | reject the request with the status NFS4ERR_NO_GRACE. | |||
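The reclaim rules above can be sketched as a small decision function. This is only an illustration: the function name and the boolean inputs `in_grace` and `passes_consistency_checks` are hypothetical stand-ins for server state, and the non-reclaim path is not modelled.

```python
def check_layoutcommit_reclaim(loca_reclaim: bool,
                               in_grace: bool,
                               passes_consistency_checks: bool) -> str:
    """Return the status a metadata server might give for the reclaim checks.

    in_grace and passes_consistency_checks are hypothetical inputs that
    stand in for server-internal state; they are not protocol fields.
    """
    if not loca_reclaim:
        # Ordinary LAYOUTCOMMIT; its other checks are outside this sketch.
        return "NFS4_OK"
    if not in_grace:
        # loca_reclaim TRUE outside the recovery grace period.
        return "NFS4ERR_NO_GRACE"
    if not passes_consistency_checks:
        # The layout supplied by the client failed the server's checks.
        return "NFS4ERR_RECLAIM_BAD"
    # Success commits the changes but does NOT grant a layout; the client
    # still needs a LAYOUTGET after the grace period has expired.
    return "NFS4_OK"
```

Checking the grace period before the consistency checks is a choice made for the sketch; the text does not fix an order between the two tests.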
Setting the loca_reclaim field to TRUE is required if and only if the | Setting the loca_reclaim field to TRUE is required if and only if the | |||
committed layout was acquired before the metadata server reboot. If | committed layout was acquired before the metadata server reboot. If | |||
the client is committing a layout that was acquired during the | the client is committing a layout that was acquired during the | |||
metadata server's grace period, it MUST set the "reclaim" field to | metadata server's grace period, it MUST set the "reclaim" field to | |||
FALSE. | FALSE. | |||
The loca_stateid is a layout stateid value as returned by previously | The loca_stateid is a layout stateid value as returned by previously | |||
successful layout operations (see Section 12.5.3). | successful layout operations (see Section 12.5.3). | |||
The loca_last_write_offset field specifies the offset of the last | The loca_last_write_offset field specifies the offset of the last | |||
byte written by the client previous to the LAYOUTCOMMIT. Note that | byte written by the client previous to the LAYOUTCOMMIT. Note that | |||
this value is never equal to the file's size (at most it is one byte | this value is never equal to the file's size (at most it is one byte | |||
less than the file's size) and MUST be less than or equal to | less than the file's size) and MUST be less than or equal to | |||
NFS4_MAXFILEOFF. The metadata server may use this information to | NFS4_MAXFILEOFF. Also, loca_last_write_offset MUST overlap the range | |||
determine whether the file's size needs to be updated. If the | described by loca_offset and loca_length. The metadata server may | |||
metadata server updates the file's size as the result of the | use this information to determine whether the file's size needs to be | |||
LAYOUTCOMMIT operation, it must return the new size | updated. If the metadata server updates the file's size as the | |||
result of the LAYOUTCOMMIT operation, it must return the new size | ||||
(locr_newsize.ns_size) as part of the results. | (locr_newsize.ns_size) as part of the results. | |||
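One way a metadata server might use loca_last_write_offset to decide on a size update is sketched below. The function name is hypothetical, and the numeric value given for NFS4_MAXFILEOFF is an assumption taken from the protocol's XDR as commonly published, not from this section. The sketch ignores the byte-range fields and every other LAYOUTCOMMIT check.

```python
from typing import Optional

# Assumed value; this section names the constant but does not define it.
NFS4_MAXFILEOFF = 0xFFFFFFFFFFFFFFFE


def layoutcommit_new_size(current_size: int,
                          loca_last_write_offset: int) -> Optional[int]:
    """Return the new file size, or None if the size need not change."""
    if loca_last_write_offset > NFS4_MAXFILEOFF:
        raise ValueError("loca_last_write_offset exceeds NFS4_MAXFILEOFF")
    # The last byte written sits at loca_last_write_offset, so the file
    # must be at least one byte longer than that offset.
    candidate = loca_last_write_offset + 1
    if candidate > current_size:
        return candidate  # would be reported as locr_newsize.ns_size
    return None
```

The "+ 1" reflects the statement that the offset is at most one byte less than the file's size.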
The loca_time_modify field allows the client to suggest a | The loca_time_modify field allows the client to suggest a | |||
modification time it would like the metadata server to set. The | modification time it would like the metadata server to set. The | |||
metadata server may use the suggestion or it may use the time of the | metadata server may use the suggestion or it may use the time of the | |||
LAYOUTCOMMIT operation to set the modification time. If the metadata | LAYOUTCOMMIT operation to set the modification time. If the metadata | |||
server uses the client provided modification time, it should ensure | server uses the client provided modification time, it should ensure | |||
time does not flow backwards. If the client wants to force the | time does not flow backwards. If the client wants to force the | |||
metadata server to set an exact time, the client should use a SETATTR | metadata server to set an exact time, the client should use a SETATTR | |||
operation in a compound right after LAYOUTCOMMIT. See Section 12.5.4 | operation in a compound right after LAYOUTCOMMIT. See Section 12.5.4 | |||
skipping to change at page 508, line 11 | skipping to change at page 509, line 11 | |||
The LAYOUTGET operation returns layout information for the specified | The LAYOUTGET operation returns layout information for the specified | |||
byte range: a layout. To get a layout from a specific offset through | byte range: a layout. To get a layout from a specific offset through | |||
the end-of-file, regardless of the file's length, a loga_length field | the end-of-file, regardless of the file's length, a loga_length field | |||
with all bits set to 1 (one) should be used. If loga_length is zero, | with all bits set to 1 (one) should be used. If loga_length is zero, | |||
or if a loga_length which is not all bits set to one is specified, | or if a loga_length which is not all bits set to one is specified, | |||
and loga_length when added to loga_offset exceeds the maximum 64-bit | and loga_length when added to loga_offset exceeds the maximum 64-bit | |||
unsigned integer value, the error NFS4ERR_INVAL will result. | unsigned integer value, the error NFS4ERR_INVAL will result. | |||
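The range rules in the paragraph above can be written as a short check. This is a minimal sketch; the function name is hypothetical and only the loga_offset and loga_length rules stated here are covered.

```python
UINT64_MAX = 0xFFFFFFFFFFFFFFFF  # a 64-bit value with all bits set to one


def check_layoutget_range(loga_offset: int, loga_length: int) -> str:
    """Return "NFS4_OK" or "NFS4ERR_INVAL" for the requested byte range."""
    if loga_length == 0:
        return "NFS4ERR_INVAL"
    if loga_length == UINT64_MAX:
        # All ones: from loga_offset through end-of-file, whatever the
        # file's length happens to be.
        return "NFS4_OK"
    if loga_offset + loga_length > UINT64_MAX:
        # The sum does not fit in a 64-bit unsigned integer.
        return "NFS4ERR_INVAL"
    return "NFS4_OK"
```

Python integers do not wrap, so the overflow test can compare the sum directly; an implementation using fixed-width integers would need to test for wraparound instead.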
The loga_minlength field specifies the minimum length of layout the | The loga_minlength field specifies the minimum length of layout the | |||
server MUST return. If this requirement cannot be met, no layout | server MUST return with two exceptions: | |||
must be returned; the error NFS4ERR_BADLAYOUT will be returned. | ||||
1. The argument loga_iomode was set to LAYOUTIOMODE4_READ, and | ||||
loga_offset plus loga_minlength goes past the end of the file. | ||||
2. The range from loga_offset through loga_offset + loga_minlength - | ||||
1 overlaps two or more striping patterns. In which case, | ||||
logr_layout will contain two or more elements, and the sum of the | ||||
lo_length fields of each element MUST be at least loga_minlength | ||||
unless the first exception also applies. | ||||
If this requirement cannot be met, the server MUST NOT return a | ||||
layout and the error NFS4ERR_BADLAYOUT MUST be returned. | ||||
The loga_stateid field specifies a valid stateid. If a layout is not | The loga_stateid field specifies a valid stateid. If a layout is not | |||
currently held by the client, the loga_stateid field represents a | currently held by the client, the loga_stateid field represents a | |||
stateid reflecting the correspondingly valid open, record lock, or | stateid reflecting the correspondingly valid open, record lock, or | |||
delegation stateid. Once a layout is held by the client for the | delegation stateid. Once a layout is held by the client for the | |||
file, the loga_stateid field is a stateid as returned from a previous | file, the loga_stateid field is a stateid as returned from a previous | |||
LAYOUTGET or LAYOUTRETURN operation or provided by a CB_LAYOUTRECALL | LAYOUTGET or LAYOUTRETURN operation or provided by a CB_LAYOUTRECALL | |||
operation (see Section 12.5.3). | operation (see Section 12.5.3). | |||
The loga_maxcount field specifies the maximum layout size (in bytes) | The loga_maxcount field specifies the maximum layout size (in bytes) | |||
skipping to change at page 508, line 39 | skipping to change at page 509, line 50 | |||
then logr_layout will contain just one entry. Otherwise, if the | then logr_layout will contain just one entry. Otherwise, if the | |||
requested range overlaps more than one striping pattern, logr_layout | requested range overlaps more than one striping pattern, logr_layout | |||
will contain the required number of entries. The elements of | will contain the required number of entries. The elements of | |||
logr_layout MUST be sorted in ascending order of the value of the | logr_layout MUST be sorted in ascending order of the value of the | |||
lo_offset field of each element. There MUST be no gaps or overlaps | lo_offset field of each element. There MUST be no gaps or overlaps | |||
in the range between two successive elements of logr_layout. The | in the range between two successive elements of logr_layout. The | |||
lo_iomode field in each element of logr_layout MUST be the same. | lo_iomode field in each element of logr_layout MUST be the same. | |||
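A client could sanity-check the ordering rules for logr_layout along these lines. The `Layout` class and the function name are hypothetical; the sketch assumes every element has a finite, non-zero lo_length and does not handle a length of all ones.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layout:
    # Hypothetical container carrying only the fields named in the text.
    lo_offset: int
    lo_length: int
    lo_iomode: str


def logr_layout_is_well_formed(layouts: List[Layout]) -> bool:
    """Check ordering, contiguity, and a uniform iomode across elements."""
    if not layouts:
        return False
    for prev, cur in zip(layouts, layouts[1:]):
        # With positive lengths, "next starts exactly where the previous
        # one ends" covers ascending order, no gaps, and no overlaps.
        if cur.lo_offset != prev.lo_offset + prev.lo_length:
            return False
        if cur.lo_iomode != prev.lo_iomode:
            return False
    return True
```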
The metadata server may adjust the range of the returned layout based | The metadata server may adjust the range of the returned layout based | |||
on the usage implied by the loga_iomode. The client MUST be prepared | on the usage implied by the loga_iomode. The client MUST be prepared | |||
to get a layout that does not align exactly with its request. The | to get a layout that does not align exactly with its request. See | |||
lo_length field in each element of logr_layout SHOULD be at least as | ||||
long as loga_minlength or the server SHOULD reject the request. See | ||||
Section 12.5.2 for more details. | Section 12.5.2 for more details. | |||
The metadata server may also return a layout with an lo_iomode other | The metadata server may also return a layout with an lo_iomode other | |||
than that requested by the client. If it does so, it must ensure | than that requested by the client. If it does so, it MUST ensure | |||
that the lo_iomode is more permissive than the loga_iomode requested. | that the lo_iomode is more permissive than the loga_iomode requested. | |||
For example, this behavior allows an implementation to upgrade read- | For example, this behavior allows an implementation to upgrade read- | |||
only requests to read/write requests at its discretion, within the | only requests to read/write requests at its discretion, within the | |||
limits of the layout type specific protocol. A lo_iomode of either | limits of the layout type specific protocol. A lo_iomode of either | |||
LAYOUTIOMODE4_READ or LAYOUTIOMODE4_RW must be returned. | LAYOUTIOMODE4_READ or LAYOUTIOMODE4_RW MUST be returned. | |||
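The iomode rule can be sketched as a predicate over the requested and returned modes. The function name is hypothetical, and the modes are represented as plain strings purely for illustration.

```python
READ = "LAYOUTIOMODE4_READ"
RW = "LAYOUTIOMODE4_RW"


def returned_iomode_is_valid(loga_iomode: str, lo_iomode: str) -> bool:
    """True if lo_iomode is an acceptable answer to loga_iomode."""
    if lo_iomode not in (READ, RW):
        # Only READ or RW may be returned.
        return False
    if lo_iomode == loga_iomode:
        return True
    # A differing mode must be more permissive: READ may become RW,
    # but RW may not be downgraded to READ.
    return loga_iomode == READ and lo_iomode == RW
```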
The logr_return_on_close result field is a directive to return the | The logr_return_on_close result field is a directive to return the | |||
layout before closing the file. When the server sets this return | layout before closing the file. When the server sets this return | |||
value to TRUE, it must be prepared to recall the layout in the case | value to TRUE, it MUST be prepared to recall the layout in the case | |||
the client fails to return the layout before close. For the server | the client fails to return the layout before close. For the server | |||
that knows a layout must be returned before a close of the file, this | that knows a layout must be returned before a close of the file, this | |||
return value can be used to communicate the desired behavior to the | return value can be used to communicate the desired behavior to the | |||
client and thus remove one extra step from the client's and server's | client and thus remove one extra step from the client's and server's | |||
interaction. | interaction. | |||
The logr_stateid, as with all stateid processing, is returned to the | The logr_stateid, as with all stateid processing, is returned to the | |||
client for use in subsequent layout related operations. See | client for use in subsequent layout related operations. See | |||
Section 8.2 for a further discussion. | Section 8.2 for a further discussion. | |||
skipping to change at page 509, line 36 | skipping to change at page 510, line 44 | |||
If layouts are not supported for the requested file or its containing | If layouts are not supported for the requested file or its containing | |||
file system the server SHOULD return NFS4ERR_LAYOUTUNAVAILABLE. If | file system the server SHOULD return NFS4ERR_LAYOUTUNAVAILABLE. If | |||
the layout type is not supported, the metadata server should return | the layout type is not supported, the metadata server should return | |||
NFS4ERR_UNKNOWN_LAYOUTTYPE. If layouts are supported but no layout | NFS4ERR_UNKNOWN_LAYOUTTYPE. If layouts are supported but no layout | |||
matches the client provided layout identification, the server should | matches the client provided layout identification, the server should | |||
return NFS4ERR_BADLAYOUT. If an invalid loga_iomode is specified, or | return NFS4ERR_BADLAYOUT. If an invalid loga_iomode is specified, or | |||
a loga_iomode of LAYOUTIOMODE4_ANY is specified, the server should | a loga_iomode of LAYOUTIOMODE4_ANY is specified, the server should | |||
return NFS4ERR_BADIOMODE. | return NFS4ERR_BADIOMODE. | |||
If the layout for the file is unavailable due to transient | If the layout for the file is unavailable due to transient | |||
conditions, e.g. file sharing prohibits layouts, the server must | conditions, e.g. file sharing prohibits layouts, the server MUST | |||
return NFS4ERR_LAYOUTTRYLATER. | return NFS4ERR_LAYOUTTRYLATER. | |||
If the layout request is rejected due to an overlapping layout | If the layout request is rejected due to an overlapping layout | |||
recall, the server must return NFS4ERR_RECALLCONFLICT. See | recall, the server MUST return NFS4ERR_RECALLCONFLICT. See | |||
Section 12.5.5.2 for details. | Section 12.5.5.2 for details. | |||
If the layout conflicts with a mandatory byte range lock held on the | If the layout conflicts with a mandatory byte range lock held on the | |||
file, and if the storage devices have no method of enforcing | file, and if the storage devices have no method of enforcing | |||
mandatory locks, other than through the restriction of layouts, the | mandatory locks, other than through the restriction of layouts, the | |||
metadata server should return NFS4ERR_LOCKED. | metadata server should return NFS4ERR_LOCKED. | |||
If the client sets loga_signal_layout_avail to TRUE, then it is | If the client sets loga_signal_layout_avail to TRUE, then it is | |||
registering with the server a "want" for a layout in the event the | registering with the server a "want" for a layout in the event the | |||
layout cannot be obtained due to resource exhaustion. If the server | layout cannot be obtained due to resource exhaustion. If the server | |||
skipping to change at page 514, line 22 | skipping to change at page 515, line 22 | |||
layout. See Section 12.5.5 for more details. | layout. See Section 12.5.5 for more details. | |||
If the LAYOUTRETURN request sets the lora_reclaim field to TRUE after | If the LAYOUTRETURN request sets the lora_reclaim field to TRUE after | |||
the metadata server's grace period, NFS4ERR_NO_GRACE is returned. | the metadata server's grace period, NFS4ERR_NO_GRACE is returned. | |||
If the LAYOUTRETURN request sets the lora_reclaim field to TRUE and | If the LAYOUTRETURN request sets the lora_reclaim field to TRUE and | |||
lr_returntype is set to LAYOUTRETURN4_FSID or LAYOUTRETURN4_ALL, | lr_returntype is set to LAYOUTRETURN4_FSID or LAYOUTRETURN4_ALL, | |||
NFS4ERR_INVAL is returned. | NFS4ERR_INVAL is returned. | |||
If the operation specified lr_returntype of LAYOUTRETURN4_FILE, then | If the operation specified lr_returntype of LAYOUTRETURN4_FILE, then | |||
the lorr_stateid will represent the layout stateid as updated for | lrs_stateid will represent the layout stateid as updated for this | |||
this operation's processing; the current stateid will also be updated | operation's processing; the current stateid will also be updated to | |||
to match the returned value. If the last byte of any layout for the | match the returned value. If the last byte of any layout for the | |||
current file, client ID, and layout type is being returned and there | current file, client ID, and layout type is being returned and there | |||
are not remaining pending CB_LAYOUTRECALL operations for which a | are no remaining pending CB_LAYOUTRECALL operations for which a | |||
LAYOUTRETURN operation must be done as a completing operation, this | LAYOUTRETURN operation must be done as a completing operation, | |||
stateid value may be the special stateid consisting of all zeros. | lrs_present MUST be FALSE, and thus no stateid will be returned. | |||
On success, the current filehandle retains its value. | On success, the current filehandle retains its value. | |||
The server MAY require that the principal, security flavor, and if | The server MAY require that the principal, security flavor, and if | |||
applicable, the GSS mechanism, combination that acquired the layout | applicable, the GSS mechanism, combination that acquired the layout | |||
also be the one to send LAYOUTRETURN. This might not be possible if | also be the one to send LAYOUTRETURN. This might not be possible if | |||
credentials for the principal are no longer available. The server | credentials for the principal are no longer available. The server | |||
MAY allow the machine credential or SSV credential (see | MAY allow the machine credential or SSV credential (see | |||
Section 18.35) to send LAYOUTRETURN. | Section 18.35) to send LAYOUTRETURN. | |||
skipping to change at page 518, line 28 | skipping to change at page 519, line 28 | |||
a request outstanding for; it could be equal to sa_slotid. The | a request outstanding for; it could be equal to sa_slotid. The | |||
server returns two "highest_slotid" values: sr_highest_slotid, and | server returns two "highest_slotid" values: sr_highest_slotid, and | |||
sr_target_highest_slotid. The former is the highest slot id the | sr_target_highest_slotid. The former is the highest slot id the | |||
server will accept in a future SEQUENCE operation, and SHOULD NOT be | server will accept in a future SEQUENCE operation, and SHOULD NOT be | |||
less than the value of sa_highest_slotid (but see Section 2.10.5.1 | less than the value of sa_highest_slotid (but see Section 2.10.5.1 | |||
for an exception). The latter is the highest slot id the server | for an exception). The latter is the highest slot id the server | |||
would prefer the client use on a future SEQUENCE operation. | would prefer the client use on a future SEQUENCE operation. | |||
If sa_cachethis is TRUE, then the client is requesting that the | If sa_cachethis is TRUE, then the client is requesting that the | |||
server cache the entire reply in the server's reply cache; therefore | server cache the entire reply in the server's reply cache; therefore | |||
the server MUST cache the reply (see Section 2.10.5.1.2). The server | the server MUST cache the reply (see Section 2.10.5.1.3). The server | |||
MAY cache the reply if sa_cachethis is FALSE. If the server does not | MAY cache the reply if sa_cachethis is FALSE. If the server does not | |||
cache the entire reply, it MUST still record that it executed the | cache the entire reply, it MUST still record that it executed the | |||
request at the specified slot and sequence id. | request at the specified slot and sequence id. | |||
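The caching rule can be illustrated with a toy slot record. The `Slot` class, the `record_reply` function, and the `cache_anyway` flag are hypothetical; the flag models the server's freedom to cache even when sa_cachethis is FALSE.

```python
class Slot:
    """Hypothetical per-slot record kept by the server's reply cache."""

    def __init__(self) -> None:
        self.sequence_id = 0       # last sequence id executed on this slot
        self.cached_reply = None   # full reply, if the server kept one


def record_reply(slot: Slot, sequence_id: int, sa_cachethis: bool,
                 reply: bytes, cache_anyway: bool = False) -> None:
    # The server always records that the request was executed at this
    # slot and sequence id, whether or not the reply body is kept.
    slot.sequence_id = sequence_id
    if sa_cachethis or cache_anyway:
        slot.cached_reply = reply   # MUST when TRUE, MAY when FALSE
    else:
        slot.cached_reply = None    # drop any reply from an older request
```

Keeping the sequence id even without the reply body is what lets the server recognise a retry of an already executed request.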
The response to the SEQUENCE operation contains a word of status | The response to the SEQUENCE operation contains a word of status | |||
flags (sr_status_flags) that can provide to the client information | flags (sr_status_flags) that can provide to the client information | |||
related to the status of the client's lock state and communications | related to the status of the client's lock state and communications | |||
paths. Note that any status bits relating to lock state MAY be reset | paths. Note that any status bits relating to lock state MAY be reset | |||
when lock state is lost due to a server reboot (even if the session | when lock state is lost due to a server reboot (even if the session | |||
is persistent across reboots; session persistence does not imply lock | is persistent across reboots; session persistence does not imply lock | |||
skipping to change at page 520, line 36 | skipping to change at page 521, line 36 | |||
transferred to one or more new servers. This condition will | transferred to one or more new servers. This condition will | |||
continue until the client receives an NFS4ERR_MOVED error and the | continue until the client receives an NFS4ERR_MOVED error and the | |||
server receives the subsequent GETATTR for the fs_locations or | server receives the subsequent GETATTR for the fs_locations or | |||
fs_locations_info attribute for an access to each file system for | fs_locations_info attribute for an access to each file system for | |||
which a lease has been moved to a new server. See | which a lease has been moved to a new server. See | |||
Section 11.7.7.1. | Section 11.7.7.1. | |||
SEQ4_STATUS_RESTART_RECLAIM_NEEDED | SEQ4_STATUS_RESTART_RECLAIM_NEEDED | |||
When set indicates that due to server restart or reboot the client | When set indicates that due to server restart or reboot the client | |||
must reclaim locking state. Until the client sends a global | must reclaim locking state. Until the client sends a global | |||
RECLAIM_COMPLETE (Section 18.51, every SEQUENCE operation will | RECLAIM_COMPLETE (Section 18.51), every SEQUENCE operation will | |||
return SEQ4_STATUS_RESTART_RECLAIM_NEEDED. | return SEQ4_STATUS_RESTART_RECLAIM_NEEDED. | |||
SEQ4_STATUS_BACKCHANNEL_FAULT | SEQ4_STATUS_BACKCHANNEL_FAULT | |||
The server has encountered an unrecoverable fault with the | The server has encountered an unrecoverable fault with the | |||
backchannel (e.g. it has lost track of the sequence id for a slot | backchannel (e.g. it has lost track of the sequence id for a slot | |||
in the backchannel). The client MUST stop sending more requests | in the backchannel). The client MUST stop sending more requests | |||
on the session's fore channel, wait for all outstanding requests | on the session's fore channel, wait for all outstanding requests | |||
to complete on the fore and back channel, and then destroy the | to complete on the fore and back channel, and then destroy the | |||
session. | session. | |||
skipping to change at page 525, line 48 | skipping to change at page 526, line 48 | |||
o Special stateids are always considered invalid (they result in the | o Special stateids are always considered invalid (they result in the | |||
error code NFS4ERR_BAD_STATEID). | error code NFS4ERR_BAD_STATEID). | |||
All stateids are interpreted as being associated with the client for | All stateids are interpreted as being associated with the client for | |||
the current session. Any possible association with a previous | the current session. Any possible association with a previous | |||
instance of the client (as stale stateids) is not considered. | instance of the client (as stale stateids) is not considered. | |||
The errors which are validly returned within the status_code array | The errors which are validly returned within the status_code array | |||
are: NFS4ERR_OK, NFS4ERR_BAD_STATEID, NFS4ERR_OLD_STATEID, | are: NFS4ERR_OK, NFS4ERR_BAD_STATEID, NFS4ERR_OLD_STATEID, | |||
NFS4ERR_EXPIRED, NFS4ERR_ADMIN_REVOKED, and NFS4ERR_DELEG_REVOKED. | NFS4ERR_EXPIRED, NFS4ERR_ADMIN_REVOKED, and NFS4ERR_DELEG_REVOKED. | |||
[[Comment.5: _LAYOUT_REVOKED]]. | [[Comment.4: _LAYOUT_REVOKED]]. | |||
18.48.4. IMPLEMENTATION | 18.48.4. IMPLEMENTATION | |||
See Section 8.2.2 and Section 8.2.4 for a discussion of stateid | See Section 8.2.2 and Section 8.2.4 for a discussion of stateid | |||
structure, lifetime, and validation. | structure, lifetime, and validation. | |||
18.49. Operation 56: WANT_DELEGATION - Request Delegation | 18.49. Operation 56: WANT_DELEGATION - Request Delegation | |||
18.49.1. ARGUMENT | 18.49.1. ARGUMENT | |||
skipping to change at page 530, line 44 | skipping to change at page 531, line 44 | |||
}; | }; | |||
18.51.3. DESCRIPTION | 18.51.3. DESCRIPTION | |||
A RECLAIM_COMPLETE operation must be used to indicate that the client | A RECLAIM_COMPLETE operation must be used to indicate that the client | |||
has reclaimed all of the locking state that it will recover, when it | has reclaimed all of the locking state that it will recover, when it | |||
is recovering state due to either a server restart or the transfer of | is recovering state due to either a server restart or the transfer of | |||
a file system to another server. There are two types of | a file system to another server. There are two types of | |||
RECLAIM_COMPLETE operations: | RECLAIM_COMPLETE operations: | |||
o When one_fs is false, a global RECLAIM_COMPLETE is being done. | o When rca_one_fs is FALSE, a global RECLAIM_COMPLETE is being done. | |||
This indicates that recovery of all locks that the client held on | This indicates that recovery of all locks that the client held on | |||
the previous server instance have been completed. | the previous server instance have been completed. | |||
o When one_fs is true, a file system-specific RECLAIM_COMPLETE is | o When rca_one_fs is TRUE, a file system-specific RECLAIM_COMPLETE | |||
being done. This indicates that recovery of locks for a single fs | is being done. This indicates that recovery of locks for a single | |||
(the one designated by the current filehandle) due to a file | fs (the one designated by the current filehandle) due to a file | |||
system transition have been completed. Presence of a current | system transition have been completed. Presence of a current | |||
filehandle is only required when one_fs is true. | filehandle is only required when rca_one_fs is true. | |||
Once a RECLAIM_COMPLETE is done, there can be no further reclaim | Once a RECLAIM_COMPLETE is done, there can be no further reclaim | |||
operations for locks whose scope is defined as having completed | operations for locks whose scope is defined as having completed | |||
recovery. Once the client sends RECLAIM_COMPLETE, the server will | recovery. Once the client sends RECLAIM_COMPLETE, the server will | |||
not allow the client to do subsequent reclaims of locking state for | not allow the client to do subsequent reclaims of locking state for | |||
that scope and will return NFS4ERR_NO_GRACE, if these are attempted. | that scope and will return NFS4ERR_NO_GRACE, if these are attempted. | |||
Whenever a client establishes a new client ID and before it does the | Whenever a client establishes a new client ID and before it does the | |||
first non-reclaim operation that obtains a lock, it MUST do a global | first non-reclaim operation that obtains a lock, it MUST do a global | |||
RECLAIM_COMPLETE, even if there are no locks to reclaim. If non- | RECLAIM_COMPLETE, even if there are no locks to reclaim. If non- | |||
reclaim locking operations are done before the RECLAIM_COMPLETE, an | reclaim locking operations are done before the RECLAIM_COMPLETE, an | |||
NFS4ERR_GRACE will be returned. | NFS4ERR_GRACE will be returned. | |||
Similarly, when the client accesses a file system on a new server, | Similarly, when the client accesses a file system on a new server, | |||
before it sends the first non-reclaim operation that obtains a lock | before it sends the first non-reclaim operation that obtains a lock | |||
on this new server, it must do a RECLAIM_COMPLETE with one_fs true | on this new server, it must do a RECLAIM_COMPLETE with rca_one_fs | |||
and current filehandle within that file system, even if there are no | true and current filehandle within that file system, even if there | |||
locks to reclaim. If non-reclaim locking operations are done on that | are no locks to reclaim. If non-reclaim locking operations are done | |||
file system before the RECLAIM_COMPLETE, an NFS4ERR_GRACE will be | on that file system before the RECLAIM_COMPLETE, an NFS4ERR_GRACE will | |||
returned. | be returned. | |||
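The interaction between the two kinds of RECLAIM_COMPLETE and later locking requests can be modelled roughly as follows. This is a simplified sketch under stated assumptions: the `ReclaimTracker` class, its method names, and the use of string file system identifiers are all hypothetical, and treating a transitioned file system as needing both the global and the file system-specific completion is an interpretation rather than something the text spells out.

```python
class ReclaimTracker:
    """Hypothetical per-client view of which reclaim scopes are finished."""

    def __init__(self, migrated_fsids=()):
        self.global_done = False
        # File systems that arrived through a file system transition and
        # still await a RECLAIM_COMPLETE with rca_one_fs TRUE.
        self.pending_fs = set(migrated_fsids)

    def reclaim_complete(self, rca_one_fs: bool, current_fsid=None) -> None:
        if rca_one_fs:
            if current_fsid is None:
                raise ValueError("current filehandle required when rca_one_fs is TRUE")
            self.pending_fs.discard(current_fsid)
        else:
            self.global_done = True

    def lock_request(self, fsid, is_reclaim: bool) -> str:
        scope_done = self.global_done and fsid not in self.pending_fs
        if is_reclaim and scope_done:
            return "NFS4ERR_NO_GRACE"   # reclaim after the scope completed
        if not is_reclaim and not scope_done:
            return "NFS4ERR_GRACE"      # new lock before RECLAIM_COMPLETE
        return "NFS4_OK"
```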
Any locks not reclaimed at the point at which RECLAIM_COMPLETE is | Any locks not reclaimed at the point at which RECLAIM_COMPLETE is | |||
done become non-reclaimable. The client MUST NOT attempt to reclaim | done become non-reclaimable. The client MUST NOT attempt to reclaim | |||
them, either during the current server instance or in any subsequent | them, either during the current server instance or in any subsequent | |||
server instance, or on another server to which responsibility for | server instance, or on another server to which responsibility for | |||
that file system is transferred. If the client were to do so, it | that file system is transferred. If the client were to do so, it | |||
would be violating the protocol by representing itself as owning | would be violating the protocol by representing itself as owning | |||
locks that it does not own, and so has no right to reclaim. See | locks that it does not own, and so has no right to reclaim. See | |||
Section 8.4.3 for a discussion of edge conditions related to lock | Section 8.4.3 for a discussion of edge conditions related to lock | |||
reclaim. | reclaim. | |||
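The RECLAIM_COMPLETE gating described above can be sketched as follows. This is a minimal illustration, not the draft's implementation: the ReclaimTracker class, its method names, and the string identifiers for clients and file systems are all hypothetical, and the sketch ignores grace-period expiry and every other server-side check.

```python
from enum import IntEnum


class Status(IntEnum):
    NFS4_OK = 0
    NFS4ERR_GRACE = 10013


class ReclaimTracker:
    """Hypothetical server-side record of which (client, file system)
    pairs have completed reclaim via RECLAIM_COMPLETE."""

    def __init__(self):
        self._done = set()

    def reclaim_complete(self, client_id, fsid):
        # Models RECLAIM_COMPLETE with rca_one_fs true and the current
        # filehandle inside the file system identified by fsid.
        self._done.add((client_id, fsid))
        return Status.NFS4_OK

    def non_reclaim_lock(self, client_id, fsid):
        # A non-reclaim locking operation is refused until the client has
        # sent RECLAIM_COMPLETE for this file system.
        if (client_id, fsid) not in self._done:
            return Status.NFS4ERR_GRACE
        return Status.NFS4_OK


tracker = ReclaimTracker()
print(tracker.non_reclaim_lock("client-a", "fs-1").name)
tracker.reclaim_complete("client-a", "fs-1")
print(tracker.non_reclaim_lock("client-a", "fs-1").name)
```

The state is keyed per file system, so completing reclaim on one file system does not unblock locking on another, which matches the requirement that RECLAIM_COMPLETE be sent even when there are no locks to reclaim.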
skipping to change at page 533, line 6 | skipping to change at page 534, line 6 | |||
18.52.4. IMPLEMENTATION | 18.52.4. IMPLEMENTATION | |||
A client will probably not send an operation with code OP_ILLEGAL but | A client will probably not send an operation with code OP_ILLEGAL but | |||
if it does, the response will be ILLEGAL4res just as it would be with | if it does, the response will be ILLEGAL4res just as it would be with | |||
any other invalid operation code. Note that if the server gets an | any other invalid operation code. Note that if the server gets an | |||
illegal operation code that is not OP_ILLEGAL, and if the server | illegal operation code that is not OP_ILLEGAL, and if the server | |||
checks for legal operation codes during the XDR decode phase, then | checks for legal operation codes during the XDR decode phase, then | |||
the ILLEGAL4res would not be returned. | the ILLEGAL4res would not be returned. | |||
19. NFSv44.1 Callback Procedures | 19. NFSv4.1 Callback Procedures | |||
The procedures used for callbacks are defined in the following | The procedures used for callbacks are defined in the following | |||
sections. In the interest of clarity, the terms "client" and | sections. In the interest of clarity, the terms "client" and | |||
"server" refer to NFS clients and servers, despite the fact that for | "server" refer to NFS clients and servers, despite the fact that for | |||
an individual callback RPC, the sense of these terms would be | an individual callback RPC, the sense of these terms would be | |||
precisely the opposite. | precisely the opposite. | |||
19.1. Procedure 0: CB_NULL - No Operation | 19.1. Procedure 0: CB_NULL - No Operation | |||
19.1.1. ARGUMENTS | 19.1.1. ARGUMENTS | |||
skipping to change at page 549, line 49 | skipping to change at page 550, line 49 | |||
The server may decide that it cannot hold all of the state for | The server may decide that it cannot hold all of the state for | |||
recallable objects, such as delegations and layouts, without running | recallable objects, such as delegations and layouts, without running | |||
out of resources. In such a case, it is free to recall individual | out of resources. In such a case, it is free to recall individual | |||
objects to reduce the load but this would be far from optimal. | objects to reduce the load but this would be far from optimal. | |||
Because the general purpose of such recallable objects as delegations | Because the general purpose of such recallable objects as delegations | |||
is to eliminate client interaction with the server, the server cannot | is to eliminate client interaction with the server, the server cannot | |||
interpret lack of recent use as indicating that the object is no | interpret lack of recent use as indicating that the object is no | |||
longer useful. The absence of visible use may be the result of a | longer useful. The absence of visible use may be the result of a | |||
large number of potential operations eliminated. In the case of | large number of potential operations eliminated. In the case of | |||
layouts, the layout will be used explicitly but the meta-data server | layouts, the layout will be used explicitly but the metadata server | |||
does not have direct knowledge of such use. | does not have direct knowledge of such use. | |||
In order to implement an effective reclaim scheme for such objects, | In order to implement an effective reclaim scheme for such objects, | |||
the server's knowledge of available resources must be used to | the server's knowledge of available resources must be used to | |||
determine when objects must be recalled with the clients selecting | determine when objects must be recalled with the clients selecting | |||
the actual objects to be returned. | the actual objects to be returned. | |||
Server implementations may differ in their resource allocation | Server implementations may differ in their resource allocation | |||
requirements. For example, one server may share resources among all | requirements. For example, one server may share resources among all | |||
classes of recallable objects whereas another may use separate | classes of recallable objects whereas another may use separate | |||
skipping to change at page 553, line 9 | skipping to change at page 554, line 9 | |||
slots, and if applicable, transport credits (e.g. RDMA credits for | slots, and if applicable, transport credits (e.g. RDMA credits for | |||
connections associated with the operations channel) to the server. | connections associated with the operations channel) to the server. | |||
CB_RECALL_SLOT specifies rsa_target_highest_slotid, the target | CB_RECALL_SLOT specifies rsa_target_highest_slotid, the target | |||
highest_slot the server wants for the session. The client should | highest_slot the server wants for the session. The client should | |||
then work toward reducing the highest_slot to the target. | then work toward reducing the highest_slot to the target. | |||
If the session has only non-RDMA connections associated with its | If the session has only non-RDMA connections associated with its | |||
operations channel, then the client need only wait for all | operations channel, then the client need only wait for all | |||
outstanding requests with a slotid > rsa_target_highest_slotid to | outstanding requests with a slotid > rsa_target_highest_slotid to | |||
complete, then send a single COMPOUND consisting of a single SEQUENCE | complete, then send a single COMPOUND consisting of a single SEQUENCE | |||
operation, with the sa_highslot field set to | operation, with the sa_highestslot field set to | |||
rsa_target_highest_slotid. If there are RDMA-based connections | rsa_target_highest_slotid. If there are RDMA-based connections | |||
associated with the operations channel, then the client also needs to send | associated with the operations channel, then the client also needs to send | |||
enough zero-length RDMA Sends to take the total RDMA credit count to | enough zero-length RDMA Sends to take the total RDMA credit count to | |||
rsa_target_highest_slotid + 1 or below. | rsa_target_highest_slotid + 1 or below. | |||
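The client-side steps above can be sketched as a small planning function. This is only an illustration under stated assumptions: plan_slot_recall and its return keys are hypothetical names, and the sketch assumes each zero-length RDMA Send lowers the credit count by exactly one.

```python
def plan_slot_recall(outstanding_slotids, rsa_target_highest_slotid,
                     rdma_credits=None):
    """Work out what a client must do to honor CB_RECALL_SLOT.

    outstanding_slotids: slot ids with requests still in flight.
    rdma_credits: current total RDMA credit count, or None when the
    operations channel has only non-RDMA connections.
    """
    target = rsa_target_highest_slotid

    # Requests on slots above the target must complete first.
    wait_for = sorted(s for s in outstanding_slotids if s > target)

    # Assumption: one zero-length Send reduces the credit count by one,
    # and the count must end at target + 1 or below.
    zero_length_sends = 0
    if rdma_credits is not None:
        zero_length_sends = max(0, rdma_credits - (target + 1))

    return {
        "wait_for": wait_for,
        "sa_highestslot": target,  # value for the single SEQUENCE
        "zero_length_sends": zero_length_sends,
    }


plan = plan_slot_recall({0, 3, 7, 9}, 5, rdma_credits=12)
print(plan)
```

With a target of 5, the client waits for slots 7 and 9, then sends one COMPOUND holding a single SEQUENCE with sa_highestslot set to 5.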
20.8.4. IMPLEMENTATION | 20.8.4. IMPLEMENTATION | |||
If the client fails to reduce the highest slot it has on the fore channel | If the client fails to reduce the highest slot it has on the fore channel | |||
to what the server requests, the server can force the issue by | to what the server requests, the server can force the issue by | |||
asserting flow control on the receive side of all connections bound | asserting flow control on the receive side of all connections bound | |||
skipping to change at page 554, line 36 | skipping to change at page 555, line 36 | |||
contents include the session to which this request belongs, slot id | contents include the session to which this request belongs, slot id | |||
and sequence id used by the server to implement session request | and sequence id used by the server to implement session request | |||
control and exactly once semantics, and exchanged slot maximums which | control and exactly once semantics, and exchanged slot maximums which | |||
are used to adjust the size of the reply cache. This operation MUST | are used to adjust the size of the reply cache. This operation MUST | |||
appear once as the first operation in each CB_COMPOUND request or a | appear once as the first operation in each CB_COMPOUND request or a | |||
protocol error must result. See Section 18.46.3 for a description of | protocol error must result. See Section 18.46.3 for a description of | |||
how slots are processed. | how slots are processed. | |||
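The placement rule for CB_SEQUENCE can be checked with a few lines of code. This is a minimal sketch: ProtocolError and check_cb_compound are hypothetical names, operations are represented only by their names, and the text above does not say which error a real client returns.

```python
class ProtocolError(Exception):
    """Hypothetical stand-in for the unspecified protocol error."""


def check_cb_compound(op_names):
    """Raise ProtocolError unless CB_SEQUENCE appears exactly once,
    as the first operation of the CB_COMPOUND."""
    if not op_names or op_names[0] != "CB_SEQUENCE":
        raise ProtocolError("CB_SEQUENCE must be the first operation")
    if op_names.count("CB_SEQUENCE") != 1:
        raise ProtocolError("CB_SEQUENCE must appear exactly once")


check_cb_compound(["CB_SEQUENCE", "CB_RECALL"])  # accepted, no exception

try:
    check_cb_compound(["CB_RECALL", "CB_SEQUENCE"])
except ProtocolError as err:
    print("rejected:", err)
```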
If csa_cachethis is TRUE, then the server is requesting that the | If csa_cachethis is TRUE, then the server is requesting that the | |||
client cache the reply in the callback reply cache. The client MUST | client cache the reply in the callback reply cache. The client MUST | |||
cache the reply (see Section 2.10.5.1.2). | cache the reply (see Section 2.10.5.1.3). | |||
The csa_referring_call_lists array is the list of COMPOUND requests, | The csa_referring_call_lists array is the list of COMPOUND requests, | |||
identified by sessionid, slot id and sequence id. These are requests | identified by sessionid, slot id and sequence id. These are requests | |||
that the client previously sent to the server. These previous | that the client previously sent to the server. These previous | |||
requests created state that some operation(s) in the same | requests created state that some operation(s) in the same | |||
CB_COMPOUND as the csa_referring_call_lists is identifying. A | CB_COMPOUND as the csa_referring_call_lists is identifying. A | |||
sessionid is included because leased state is tied to a client ID, | sessionid is included because leased state is tied to a client ID, | |||
and a client ID can have multiple sessions. See Section 2.10.5.3. | and a client ID can have multiple sessions. See Section 2.10.5.3. | |||
The value of the csa_sequenceid argument relative to the cached sequence | The value of the csa_sequenceid argument relative to the cached sequence | |||
End of changes. 126 change blocks. | ||||
353 lines changed or deleted | 441 lines changed or added | |||