Diff: draft-pre-ch-19.txt - draft-ietf-nfsv4-minorversion1-23.txt
 draft-pre-ch-19.txt   draft-ietf-nfsv4-minorversion1-23.txt 
NFSv4 S. Shepler NFSv4 S. Shepler
Internet-Draft M. Eisler Internet-Draft M. Eisler
Intended status: Standards Track D. Noveck Intended status: Standards Track D. Noveck
Expires: November 2, 2008 Editors Expires: November 6, 2008 Editors
May 1, 2008 May 5, 2008
NFS Version 4 Minor Version 1 NFS Version 4 Minor Version 1
draft-ietf-nfsv4-minorversion1-22.txt draft-ietf-nfsv4-minorversion1-23.txt
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 35 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on November 2, 2008. This Internet-Draft will expire on November 6, 2008.
Copyright Notice Copyright Notice
Copyright (C) The IETF Trust (2008). Copyright (C) The IETF Trust (2008).
Abstract Abstract
This Internet-Draft describes NFS version 4 minor version one, This Internet-Draft describes NFS version 4 minor version one,
including features retained from the base protocol and protocol including features retained from the base protocol and protocol
extensions made subsequently. Major extensions introduced in NFS extensions made subsequently. Major extensions introduced in NFS
skipping to change at page 6, line 42 skipping to change at page 6, line 42
12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 269 12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 269
12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 269 12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 269
12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 270 12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 270
12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 271 12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 271
12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 272 12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 272
12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 272 12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 272
12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 272 12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 272
12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 273 12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 273
12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 274 12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 274
12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 275 12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 275
12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 279 12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 278
12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 287 12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 287
12.5.7. Metadata Server Write Propagation . . . . . . . . . 287 12.5.7. Metadata Server Write Propagation . . . . . . . . . 287
12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 287 12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 287
12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 289 12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 289
12.7.1. Recovery from Client Restart . . . . . . . . . . . . 289 12.7.1. Recovery from Client Restart . . . . . . . . . . . . 289
12.7.2. Dealing with Lease Expiration on the Client . . . . 290 12.7.2. Dealing with Lease Expiration on the Client . . . . 289
12.7.3. Dealing with Loss of Layout State on the Metadata 12.7.3. Dealing with Loss of Layout State on the Metadata
Server . . . . . . . . . . . . . . . . . . . . . . . 291 Server . . . . . . . . . . . . . . . . . . . . . . . 290
12.7.4. Recovery from Metadata Server Restart . . . . . . . 291 12.7.4. Recovery from Metadata Server Restart . . . . . . . 291
12.7.5. Operations During Metadata Server Grace Period . . . 293 12.7.5. Operations During Metadata Server Grace Period . . . 293
12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 294 12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 293
12.8. Metadata and Storage Device Roles . . . . . . . . . . . 294 12.8. Metadata and Storage Device Roles . . . . . . . . . . . 294
12.9. Security Considerations for pNFS . . . . . . . . . . . . 294 12.9. Security Considerations for pNFS . . . . . . . . . . . . 294
13. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 295 13. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 295
13.1. Client ID and Session Considerations . . . . . . . . . . 296 13.1. Client ID and Session Considerations . . . . . . . . . . 295
13.1.1. Sessions Considerations for Data Servers . . . . . . 298 13.1.1. Sessions Considerations for Data Servers . . . . . . 298
13.2. File Layout Definitions . . . . . . . . . . . . . . . . 298 13.2. File Layout Definitions . . . . . . . . . . . . . . . . 298
13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 299 13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 299
13.4. Interpreting the File Layout . . . . . . . . . . . . . . 303 13.4. Interpreting the File Layout . . . . . . . . . . . . . . 303
13.4.1. Determining the Stripe Unit Number . . . . . . . . . 303 13.4.1. Determining the Stripe Unit Number . . . . . . . . . 303
13.4.2. Interpreting the File Layout Using Sparse Packing . 303 13.4.2. Interpreting the File Layout Using Sparse Packing . 303
13.4.3. Interpreting the File Layout Using Dense Packing . . 306 13.4.3. Interpreting the File Layout Using Dense Packing . . 306
13.4.4. Sparse and Dense Stripe Unit Packing . . . . . . . . 308 13.4.4. Sparse and Dense Stripe Unit Packing . . . . . . . . 308
13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 310 13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 310
13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 311 13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 311
skipping to change at page 9, line 17 skipping to change at page 9, line 17
locks . . . . . . . . . . . . . . . . . . . . . . . . . 509 locks . . . . . . . . . . . . . . . . . . . . . . . . . 509
18.39. Operation 46: GET_DIR_DELEGATION - Get a directory 18.39. Operation 46: GET_DIR_DELEGATION - Get a directory
delegation . . . . . . . . . . . . . . . . . . . . . . . 510 delegation . . . . . . . . . . . . . . . . . . . . . . . 510
18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 514 18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 514
18.41. Operation 48: GETDEVICELIST - Get All Device Mappings 18.41. Operation 48: GETDEVICELIST - Get All Device Mappings
for a File System . . . . . . . . . . . . . . . . . . . 516 for a File System . . . . . . . . . . . . . . . . . . . 516
18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using 18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using
a layout . . . . . . . . . . . . . . . . . . . . . . . . 518 a layout . . . . . . . . . . . . . . . . . . . . . . . . 518
18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 521 18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 521
18.44. Operation 51: LAYOUTRETURN - Release Layout 18.44. Operation 51: LAYOUTRETURN - Release Layout
Information . . . . . . . . . . . . . . . . . . . . . . 526 Information . . . . . . . . . . . . . . . . . . . . . . 528
18.45. Operation 52: SECINFO_NO_NAME - Get Security on 18.45. Operation 52: SECINFO_NO_NAME - Get Security on
Unnamed Object . . . . . . . . . . . . . . . . . . . . . 530 Unnamed Object . . . . . . . . . . . . . . . . . . . . . 533
18.46. Operation 53: SEQUENCE - Supply per-procedure 18.46. Operation 53: SEQUENCE - Supply per-procedure
sequencing and control . . . . . . . . . . . . . . . . . 531 sequencing and control . . . . . . . . . . . . . . . . . 534
18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 537 18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 540
18.48. Operation 55: TEST_STATEID - Test stateids for 18.48. Operation 55: TEST_STATEID - Test stateids for
validity . . . . . . . . . . . . . . . . . . . . . . . . 539 validity . . . . . . . . . . . . . . . . . . . . . . . . 542
18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 541 18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 544
18.50. Operation 57: DESTROY_CLIENTID - Destroy existing 18.50. Operation 57: DESTROY_CLIENTID - Destroy existing
client ID . . . . . . . . . . . . . . . . . . . . . . . 545 client ID . . . . . . . . . . . . . . . . . . . . . . . 548
18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims 18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims
Finished . . . . . . . . . . . . . . . . . . . . . . . . 545 Finished . . . . . . . . . . . . . . . . . . . . . . . . 548
18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 548 18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 551
19. NFSv4.1 Callback Procedures . . . . . . . . . . . . . . . . . 548 19. NFSv4.1 Callback Procedures . . . . . . . . . . . . . . . . . 551
19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 549 19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 552
19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 549 19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 552
20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 553 20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 556
20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 553 20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 556
20.2. Operation 4: CB_RECALL - Recall an Open Delegation . . . 554 20.2. Operation 4: CB_RECALL - Recall a Delegation . . . . . . 557
20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from 20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from
Client . . . . . . . . . . . . . . . . . . . . . . . . . 555 Client . . . . . . . . . . . . . . . . . . . . . . . . . 558
20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 559 20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 562
20.5. Operation 7: CB_PUSH_DELEG - Offer Delegation to 20.5. Operation 7: CB_PUSH_DELEG - Offer Delegation to
Client . . . . . . . . . . . . . . . . . . . . . . . . . 563 Client . . . . . . . . . . . . . . . . . . . . . . . . . 566
20.6. Operation 8: CB_RECALL_ANY - Keep any N delegations . . 564 20.6. Operation 8: CB_RECALL_ANY - Keep any N delegations . . 567
20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal 20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal
Resources for Recallable Objects . . . . . . . . . . . . 566 Resources for Recallable Objects . . . . . . . . . . . . 570
20.8. Operation 10: CB_RECALL_SLOT - change flow control 20.8. Operation 10: CB_RECALL_SLOT - change flow control
limits . . . . . . . . . . . . . . . . . . . . . . . . . 567 limits . . . . . . . . . . . . . . . . . . . . . . . . . 571
20.9. Operation 11: CB_SEQUENCE - Supply backchannel 20.9. Operation 11: CB_SEQUENCE - Supply backchannel
sequencing and control . . . . . . . . . . . . . . . . . 568 sequencing and control . . . . . . . . . . . . . . . . . 572
20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending 20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending
Delegation Wants . . . . . . . . . . . . . . . . . . . . 570 Delegation Wants . . . . . . . . . . . . . . . . . . . . 574
20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible 20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible
lock availability . . . . . . . . . . . . . . . . . . . 571 lock availability . . . . . . . . . . . . . . . . . . . 575
20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify device ID 20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify device ID
changes . . . . . . . . . . . . . . . . . . . . . . . . 573 changes . . . . . . . . . . . . . . . . . . . . . . . . 577
20.13. Operation 10044: CB_ILLEGAL - Illegal Callback 20.13. Operation 10044: CB_ILLEGAL - Illegal Callback
Operation . . . . . . . . . . . . . . . . . . . . . . . 575 Operation . . . . . . . . . . . . . . . . . . . . . . . 579
21. Security Considerations . . . . . . . . . . . . . . . . . . . 575 21. Security Considerations . . . . . . . . . . . . . . . . . . . 579
22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 577 22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 581
22.1. Named Attribute Definitions . . . . . . . . . . . . . . 577 22.1. Named Attribute Definitions . . . . . . . . . . . . . . 581
22.2. ONC RPC Network Identifiers (netids) . . . . . . . . . . 577 22.2. ONC RPC Network Identifiers (netids) . . . . . . . . . . 581
22.3. Defining New Notifications . . . . . . . . . . . . . . . 578 22.3. Defining New Notifications . . . . . . . . . . . . . . . 582
22.4. Defining New Layout Types . . . . . . . . . . . . . . . 578 22.4. Defining New Layout Types . . . . . . . . . . . . . . . 582
22.5. Path Variable Definitions . . . . . . . . . . . . . . . 580 22.5. Path Variable Definitions . . . . . . . . . . . . . . . 584
22.5.1. Path Variable Values . . . . . . . . . . . . . . . . 580 22.5.1. Path Variable Values . . . . . . . . . . . . . . . . 584
22.5.2. Path Variable Names . . . . . . . . . . . . . . . . 580 22.5.2. Path Variable Names . . . . . . . . . . . . . . . . 584
23. References . . . . . . . . . . . . . . . . . . . . . . . . . 580 23. References . . . . . . . . . . . . . . . . . . . . . . . . . 584
23.1. Normative References . . . . . . . . . . . . . . . . . . 580 23.1. Normative References . . . . . . . . . . . . . . . . . . 584
23.2. Informative References . . . . . . . . . . . . . . . . . 582 23.2. Informative References . . . . . . . . . . . . . . . . . 586
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 584 Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 588
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 586 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 590
Intellectual Property and Copyright Statements . . . . . . . . . 587 Intellectual Property and Copyright Statements . . . . . . . . . 591
1. Introduction 1. Introduction
1.1. The NFS Version 4 Minor Version 1 Protocol 1.1. The NFS Version 4 Minor Version 1 Protocol
The NFS version 4 minor version 1 (NFSv4.1) protocol is the second The NFS version 4 minor version 1 (NFSv4.1) protocol is the second
minor version of the NFS version 4 (NFSv4) protocol. The first minor minor version of the NFS version 4 (NFSv4) protocol. The first minor
version, NFSv4.0 is described in [21]. It generally follows the version, NFSv4.0 is described in [21]. It generally follows the
guidelines for minor versioning model listed in Section 10 of RFC guidelines for minor versioning model listed in Section 10 of RFC
3530. However, it diverges from guidelines 11 ("a client and server 3530. However, it diverges from guidelines 11 ("a client and server
skipping to change at page 46, line 7 skipping to change at page 46, line 7
two different EXCHANGE_ID requests, and the eir_clientid, two different EXCHANGE_ID requests, and the eir_clientid,
eir_server_owner.so_major_id, and eir_server_scope results match eir_server_owner.so_major_id, and eir_server_scope results match
in both EXCHANGE_ID results, but the eir_server_owner.so_minor_id in both EXCHANGE_ID results, but the eir_server_owner.so_minor_id
results do not match then the client is permitted to perform results do not match then the client is permitted to perform
client ID trunking. The client can associate each connection with client ID trunking. The client can associate each connection with
different sessions, where each session is associated with the same different sessions, where each session is associated with the same
server. server.
Of course, even if the eir_server_owner.so_minor_id fields do Of course, even if the eir_server_owner.so_minor_id fields do
match, the client is free to employ client ID trunking instead of match, the client is free to employ client ID trunking instead of
sessiond trunking. session trunking.
The client completes the act of client ID trunking by invoking The client completes the act of client ID trunking by invoking
CREATE_SESSION on each connection, using the same client ID that CREATE_SESSION on each connection, using the same client ID that
was returned in eir_clientid. These invocations create two was returned in eir_clientid. These invocations create two
sessions and also associate each connection with each session. sessions and also associate each connection with each session.
When doing client ID trunking, locking state is shared across When doing client ID trunking, locking state is shared across
sessions associated with the same client ID. This requires the sessions associated with the same client ID. This requires the
server to coordinate state across sessions. server to coordinate state across sessions.
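The trunking rules above can be sketched as a small decision function. This is a minimal Python sketch, not part of either draft. ExchangeIdResult and trunking_options are hypothetical names. The sketch assumes both EXCHANGE_ID requests carried the same client owner argument, because that part of the condition is cut off in this excerpt. The session-trunking case relies on the companion rule in the full text, which requires all four result fields to match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExchangeIdResult:
    """Hypothetical container for the EXCHANGE_ID result fields named above."""
    eir_clientid: int
    so_major_id: bytes       # eir_server_owner.so_major_id
    so_minor_id: int         # eir_server_owner.so_minor_id
    eir_server_scope: bytes


def trunking_options(a: ExchangeIdResult, b: ExchangeIdResult) -> set[str]:
    """Return the trunking forms a client may use across two connections.

    Assumes both EXCHANGE_ID requests used the same client owner argument
    (that condition is elided in this excerpt).
    """
    if (a.eir_clientid != b.eir_clientid
            or a.so_major_id != b.so_major_id
            or a.eir_server_scope != b.eir_server_scope):
        return set()  # not shown to be the same server: no trunking
    if a.so_minor_id == b.so_minor_id:
        # Session trunking is permitted, but the client may still
        # choose client ID trunking instead.
        return {"session", "client_id"}
    return {"client_id"}  # so_minor_id differs: client ID trunking only
```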
skipping to change at page 51, line 37 skipping to change at page 51, line 37
CB_SEQUENCE (e.g. BIND_CONN_TO_SESSION), then the RPC XID is CB_SEQUENCE (e.g. BIND_CONN_TO_SESSION), then the RPC XID is
needed for correct operation to match the reply to the request. needed for correct operation to match the reply to the request.
o The SEQUENCE or CB_SEQUENCE operation may generate an error. If o The SEQUENCE or CB_SEQUENCE operation may generate an error. If
so, the embedded slot id, sequence id, and sessionid (if present) so, the embedded slot id, sequence id, and sessionid (if present)
in the request will not be in the reply, and the requester has in the request will not be in the reply, and the requester has
only the XID to match the reply to the request. only the XID to match the reply to the request.
Given that well formulated XIDs continue to be required, this begs Given that well formulated XIDs continue to be required, this begs
the question why SEQUENCE and CB_SEQUENCE replies have a sessionid, the question why SEQUENCE and CB_SEQUENCE replies have a sessionid,
slot id and sequence id? Having the sessionid in the reply means the slot id and sequence id? Having the session id in the reply means
requester does not have to use the XID to look up the sessionid, which the requester does not have to use the XID to look up the session id,
would be necessary if the connection were associated with multiple which would be necessary if the connection were associated with
sessions. Having the slot id and sequence id in the reply means multiple sessions. Having the slot id and sequence id in the reply
the requester does not have to use the XID to look up the slot id and means the requester does not have to use the XID to look up the slot id
sequence id. Furthermore, since the XID is only 32 bits, it is too and sequence id. Furthermore, since the XID is only 32 bits, it is
small to guarantee the re-association of a reply with its request too small to guarantee the re-association of a reply with its request
([27]); having sessionid, slot id, and sequence id in the reply ([27]); having sessionid, slot id, and sequence id in the reply
allows the client to validate that the reply in fact belongs to the allows the client to validate that the reply in fact belongs to the
matched request. matched request.
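Here is a requester-side sketch of the matching described above. It assumes a hypothetical table of outstanding requests keyed by XID. The XID locates the request. When the reply also carries the session id, slot id, and sequence id (that is, when SEQUENCE or CB_SEQUENCE succeeded), those values are compared against the request as a cross-check. The names SeqTriple, Reply, and match_reply are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SeqTriple:
    sessionid: bytes
    slotid: int
    sequenceid: int


@dataclass
class Reply:
    xid: int
    # Present only when SEQUENCE/CB_SEQUENCE succeeded; an error reply
    # carries no session id, slot id, or sequence id.
    triple: Optional[SeqTriple]


def match_reply(outstanding: dict[int, SeqTriple], reply: Reply) -> Optional[SeqTriple]:
    """Match a reply to its request and return the request's triple, or None.

    The XID is the lookup key. If the reply echoes a triple, it must equal
    the request's triple, because a 32-bit XID alone cannot guarantee the
    association.
    """
    request = outstanding.get(reply.xid)
    if request is None:
        return None                    # unknown XID: discard the reply
    if reply.triple is not None and reply.triple != request:
        return None                    # XID collision: not this request's reply
    del outstanding[reply.xid]         # request is no longer outstanding
    return request
```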
The SEQUENCE (and CB_SEQUENCE) operation also carries a The SEQUENCE (and CB_SEQUENCE) operation also carries a
"highest_slotid" value which carries additional requester slot usage "highest_slotid" value which carries additional requester slot usage
information. The requester must always indicate the slot id information. The requester must always indicate the slot id
representing the outstanding request with the highest-numbered slot representing the outstanding request with the highest-numbered slot
value. The requester should in all cases provide the most value. The requester should in all cases provide the most
conservative value possible, although it can be increased somewhat conservative value possible, although it can be increased somewhat
skipping to change at page 53, line 30 skipping to change at page 53, line 30
entries at least as large as the old value of maximum requests entries at least as large as the old value of maximum requests
outstanding, until it can infer that the requester has seen a outstanding, until it can infer that the requester has seen a
reply containing the new granted highest_slotid. The replier can reply containing the new granted highest_slotid. The replier can
infer that the requester has seen such a reply when it receives a new infer that the requester has seen such a reply when it receives a new
request with the same slotid as the request replied to and the request with the same slotid as the request replied to and the
next higher sequenceid. next higher sequenceid.
2.10.5.1.1. Caching of SEQUENCE and CB_SEQUENCE Replies 2.10.5.1.1. Caching of SEQUENCE and CB_SEQUENCE Replies
When a SEQUENCE or CB_SEQUENCE operation is successfully executed, When a SEQUENCE or CB_SEQUENCE operation is successfully executed,
its reply MUST always be cached. Specifically, sessionid, its reply MUST always be cached. Specifically, session id, sequence
sequenceid, and slotid MUST be cached in the reply cache. The reply id, and slot id MUST be cached in the reply cache. The reply from
from SEQUENCE also includes the highest slotid, target highest SEQUENCE also includes the highest slot id, target highest slot id,
slotid, and status flags. Instead of caching these values, the and status flags. Instead of caching these values, the server MAY
server MAY re-compute the values from the current state of the fore re-compute the values from the current state of the fore channel,
channel, session and/or client ID as appropriate. Similarly, the session and/or client ID as appropriate. Similarly, the reply from
reply from CB_SEQUENCE includes a highest slotid and target highest CB_SEQUENCE includes a highest slot id and target highest slot id.
slotid. The client MAY re-compute the values from the current state The client MAY re-compute the values from the current state of the
of the session as appropriate. session as appropriate.
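The following Python sketch shows one way a replier might apply the caching rule above. Only the session id, sequence id, and slot id come from the cache. The highest slot id, target highest slot id, and status flags are recomputed from current state, which the text permits (MAY) in place of caching them. The dictionary keys are modeled on the SEQUENCE result field names. Treat those keys and the ForeChannelState type as illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedSequenceReply:
    # Fields that MUST be cached for a successful SEQUENCE.
    sessionid: bytes
    sequenceid: int
    slotid: int


@dataclass
class ForeChannelState:
    # Hypothetical snapshot of current server state used for recomputation.
    highest_slotid: int
    target_highest_slotid: int
    status_flags: int


def replay_sequence_reply(cached: CachedSequenceReply, now: ForeChannelState) -> dict:
    """Build the reply to a retried SEQUENCE from cached and current values."""
    return {
        "sr_sessionid": cached.sessionid,
        "sr_sequenceid": cached.sequenceid,
        "sr_slotid": cached.slotid,
        # Recomputed rather than cached:
        "sr_highest_slotid": now.highest_slotid,
        "sr_target_highest_slotid": now.target_highest_slotid,
        "sr_status_flags": now.status_flags,
    }
```

As the next paragraph notes, a requester cannot tell whether the values it receives were recomputed or are a delayed original reply. It therefore should not rely on this choice.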
Regardless of whether a replier is re-computing highest slotid, Regardless of whether a replier is re-computing highest slotid,
target slotid, and status on replies to retries or not, the requester target slot id, and status on replies to retries or not, the
MUST NOT assume the values are being re-computed whenever it receives requester MUST NOT assume the values are being re-computed whenever
a reply after a retry is sent, since it has no way of knowing whether it receives a reply after a retry is sent, since it has no way of
the reply it has received was sent by the server in response to the knowing whether the reply it has received was sent by the server in
retry, or is a delayed response to the original request. Therefore, response to the retry, or is a delayed response to the original
it may be the case that highest slotid, target slotid, or status bits request. Therefore, it may be the case that highest slot id, target
may reflect the state of affairs when the request was first executed. slot id, or status bits may reflect the state of affairs when the
Although acting based on such delayed information is valid, it may request was first executed. Although acting based on such delayed
cause the receiver to do unneeded work. Requesters MAY choose to information is valid, it may cause the receiver to do unneeded work.
send additional requests to get the current state of affairs or use Requesters MAY choose to send additional requests to get the current
the state of affairs reported by subsequent requests, in preference state of affairs or use the state of affairs reported by subsequent
to acting immediately on data which may be out of date. requests, in preference to acting immediately on data which may be
out of date.
2.10.5.1.2. Errors from SEQUENCE and CB_SEQUENCE 2.10.5.1.2. Errors from SEQUENCE and CB_SEQUENCE
Any time SEQUENCE or CB_SEQUENCE returns an error, the sequence id of Any time SEQUENCE or CB_SEQUENCE returns an error, the sequence id of
the slot MUST NOT change. The replier MUST NOT modify the reply the slot MUST NOT change. The replier MUST NOT modify the reply
cache entry for the slot whenever an error is returned from SEQUENCE cache entry for the slot whenever an error is returned from SEQUENCE
or CB_SEQUENCE. or CB_SEQUENCE.
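The rule that errors leave the slot untouched can be sketched as a per-slot check. The sketch assumes the usual slot semantics from elsewhere in the specification: a retry repeats the slot's current sequence id, a new request uses the next one, and 32-bit sequence ids wrap to zero. SeqMisordered is a hypothetical stand-in for the NFS4ERR_SEQ_MISORDERED error.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

SEQID_MOD = 2 ** 32  # sequence ids are 32-bit and wrap to zero


class SeqMisordered(Exception):
    """Hypothetical stand-in for NFS4ERR_SEQ_MISORDERED."""


@dataclass
class Slot:
    sequenceid: int = 0                 # first request on the slot uses 1
    cached_reply: Optional[Any] = None


def process_sequence(slot: Slot, seqid: int, execute: Callable[[], Any]) -> Any:
    if seqid == slot.sequenceid and slot.cached_reply is not None:
        return slot.cached_reply        # retry: replay the cached reply
    if seqid == (slot.sequenceid + 1) % SEQID_MOD:
        reply = execute()               # new request: run it, then cache
        slot.sequenceid = seqid
        slot.cached_reply = reply
        return reply
    # Error path: neither the slot's sequence id nor its cache entry changes.
    raise SeqMisordered(f"slot at {slot.sequenceid}, got {seqid}")
```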
2.10.5.1.3. Optional Reply Caching 2.10.5.1.3. Optional Reply Caching
skipping to change at page 56, line 19 skipping to change at page 56, line 19
client may have been granted a delegation to a file it has opened, client may have been granted a delegation to a file it has opened,
but the reply to the OPEN (informing the client of the granting of but the reply to the OPEN (informing the client of the granting of
the delegation) may be delayed in the network. If a conflicting the delegation) may be delayed in the network. If a conflicting
operation arrives at the server, it will recall the delegation using operation arrives at the server, it will recall the delegation using
the backchannel, which may be on a different transport connection, the backchannel, which may be on a different transport connection,
perhaps even a different network, or even a different session perhaps even a different network, or even a different session
associated with the same client ID. associated with the same client ID.
The presence of a session between client and server alleviates this The presence of a session between client and server alleviates this
issue. When a session is in place, each client request is uniquely issue. When a session is in place, each client request is uniquely
identified by its { sessionid, slot id, sequence id } triple. By the identified by its { session id, slot id, sequence id } triple. By
rules under which slot entries (reply cache entries) are retired, the the rules under which slot entries (reply cache entries) are retired,
server has knowledge whether the client has "seen" each of the the server has knowledge whether the client has "seen" each of the
server's replies. The server can therefore provide sufficient server's replies. The server can therefore provide sufficient
information to the client to allow it to disambiguate between an information to the client to allow it to disambiguate between an
erroneous or conflicting callback race condition. erroneous or conflicting callback race condition.
For each client operation which might result in some sort of server For each client operation which might result in some sort of server
callback, the server SHOULD "remember" the { sessionid, slot id, callback, the server SHOULD "remember" the { sessionid, slot id,
sequence id } triple of the client request until the slot id sequence id } triple of the client request until the slot id
retirement rules allow the server to determine that the client has, retirement rules allow the server to determine that the client has,
in fact, seen the server's reply. Until the time the { sessionid, in fact, seen the server's reply. Until the time the { sessionid,
slot id, sequence id } request triple can be retired, any recalls of slot id, sequence id } request triple can be retired, any recalls of
the associated object MUST carry an array of these referring the associated object MUST carry an array of these referring
identifiers (in the CB_SEQUENCE operation's arguments), for the identifiers (in the CB_SEQUENCE operation's arguments), for the
benefit of the client. After this time, it is not necessary for the benefit of the client. After this time, it is not necessary for the
server to provide this information in related callbacks, since it is server to provide this information in related callbacks, since it is
certain that a race condition can no longer occur. certain that a race condition can no longer occur.
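Here is a minimal sketch of the server-side bookkeeping described above. It assumes a hypothetical per-object map of triples. A triple is retired using the slot rule stated earlier: a new request on the same slot with the next sequence id shows that the client has seen the previous reply. Recalls then carry whatever triples remain for the object. The class and method names are illustrative only.

```python
from collections import defaultdict


class ReferringTriples:
    """Hypothetical server-side record of referring triples per object."""

    def __init__(self):
        # object id -> {(sessionid, slotid, seqid), ...}
        self._by_object = defaultdict(set)

    def remember(self, obj, sessionid, slotid, seqid):
        # Called for a client request that might later cause a callback on obj.
        self._by_object[obj].add((sessionid, slotid, seqid))

    def on_new_request(self, sessionid, slotid, seqid):
        # The next sequence id on a slot shows the client saw the previous
        # reply on that slot, so that triple can be retired everywhere.
        prev = (sessionid, slotid, (seqid - 1) % 2**32)
        for triples in self._by_object.values():
            triples.discard(prev)

    def for_recall(self, obj):
        # Triples to place in CB_SEQUENCE's referring list when recalling obj.
        return sorted(self._by_object.get(obj, ()))
```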
The CB_SEQUENCE operation which begins each server callback carries a The CB_SEQUENCE operation which begins each server callback carries a
list of "referring" { sessionid, slot id, sequence id } triples. If list of "referring" { sessionid, slot id, sequence id } triples. If
the client finds the request corresponding to the referring the client finds the request corresponding to the referring session
sessionid, slot id and sequence id to be currently outstanding (i.e. id, slot id and sequence id to be currently outstanding (i.e. the
the server's reply has not been seen by the client), it can determine server's reply has not been seen by the client), it can determine
that the callback has raced the reply, and act accordingly. If the that the callback has raced the reply, and act accordingly. If the
client does not find the request corresponding to the referring triple client does not find the request corresponding to the referring triple
to be outstanding (including the case of a sessionid referring to a to be outstanding (including the case of a sessionid referring to a
destroyed session), then there is no race with respect to this destroyed session), then there is no race with respect to this
triple. The server SHOULD limit the referring triples to requests triple. The server SHOULD limit the referring triples to requests
that refer to just those that apply to the objects referred to in the that refer to just those that apply to the objects referred to in the
CB_COMPOUND procedure. CB_COMPOUND procedure.
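On the client side, the race check described above reduces to a membership test against the set of requests whose replies the client has not yet seen. The function name below is hypothetical. What the client does after detecting a race (waiting, but not forever) is governed by the surrounding text and is not modeled here.

```python
def callback_raced_reply(referring, outstanding):
    """Return True if the callback may have raced a reply still in flight.

    referring: iterable of (sessionid, slotid, seqid) from CB_SEQUENCE.
    outstanding: set of triples for requests whose replies have not been
    seen. A triple naming a destroyed session is absent from this set, so
    it never indicates a race.
    """
    return any(tuple(t) in outstanding for t in referring)
```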
The client must not simply wait forever for the expected server reply The client must not simply wait forever for the expected server reply
to arrive before responding to the CB_COMPOUND that won the race, to arrive before responding to the CB_COMPOUND that won the race,
skipping to change at page 59, line 37 skipping to change at page 59, line 37
sequence id) MUST be rejected with NFS4ERR_DEADSESSION (returned by sequence id) MUST be rejected with NFS4ERR_DEADSESSION (returned by
SEQUENCE). Such a session is considered dead. A server MAY re- SEQUENCE). Such a session is considered dead. A server MAY re-
animate a session after a server restart so that the session will animate a session after a server restart so that the session will
accept new requests as well as retries. To re-animate a session the accept new requests as well as retries. To re-animate a session the
server needs to persist additional information through server server needs to persist additional information through server
restart: restart:
o The client ID. This is a prerequisite to let the client create o The client ID. This is a prerequisite to let the client create
more sessions associated with the same client ID as the more sessions associated with the same client ID as the
o The client ID's sequenceid that is used for creating sessions (see o The client ID's sequence id that is used for creating sessions
Section 18.35 and Section 18.36). This is a prerequisite to let (see Section 18.35 and Section 18.36). This is a prerequisite to
the client create more sessions. let the client create more sessions.
o The principal that created the client ID. This allows the server o The principal that created the client ID. This allows the server
to authenticate the client when it sends EXCHANGE_ID. to authenticate the client when it sends EXCHANGE_ID.
o The SSV, if SP4_SSV state protection was specified when the client o The SSV, if SP4_SSV state protection was specified when the client
ID was created (see Section 18.35). This lets the client create ID was created (see Section 18.35). This lets the client create
new sessions, and associate connections with the new and existing new sessions, and associate connections with the new and existing
sessions. sessions.
o The properties of the client ID as defined in Section 18.35. o The properties of the client ID as defined in Section 18.35.
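The list above amounts to a small record that the server writes to stable storage. The sketch below models it under stated assumptions: the field names are hypothetical, JSON serialization stands in for whatever persistence format a real server uses, and the client ID "properties" are reduced to a generic string map.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional


@dataclass
class PersistedClientRecord:
    """Hypothetical stable-storage record enabling session re-animation after restart."""
    client_id: int                 # the client ID itself
    create_session_seqid: int      # the client ID's sequence id used by CREATE_SESSION
    principal: str                 # principal that created the client ID
    state_protect: str = "SP4_NONE"
    ssv_hex: Optional[str] = None  # only kept when state_protect == "SP4_SSV"
    properties: Dict[str, str] = field(default_factory=dict)  # illustrative stand-in

    def validate(self) -> None:
        # If SP4_SSV protection was specified, the SSV must survive the restart too.
        if self.state_protect == "SP4_SSV" and self.ssv_hex is None:
            raise ValueError("SP4_SSV client ID persisted without its SSV")

    def dumps(self) -> str:
        self.validate()
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def loads(text: str) -> "PersistedClientRecord":
        record = PersistedClientRecord(**json.loads(text))
        record.validate()
        return record
```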
skipping to change at page 76, line 21 skipping to change at page 76, line 21
o A catastrophe that causes the reply cache to be corrupted or lost o A catastrophe that causes the reply cache to be corrupted or lost
on the media it was stored on. This applies even if the replier on the media it was stored on. This applies even if the replier
indicated in the CREATE_SESSION results that it would persist the indicated in the CREATE_SESSION results that it would persist the
cache. cache.
o The server purges the session of a client that has been inactive o The server purges the session of a client that has been inactive
for a very extended period of time. for a very extended period of time.
Loss of reply cache is equivalent to loss of session. The replier Loss of reply cache is equivalent to loss of session. The replier
indicates loss of session to the requester by returning indicates loss of session to the requester by returning
NFS4ERR_BADSESSION on the next operation that uses the sessionid that NFS4ERR_BADSESSION on the next operation that uses the session id
refers to the lost session. that refers to the lost session.
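A minimal sketch of the rule that loss of reply cache is loss of session, assuming the replier keys one reply cache per session id. Error codes are modeled by symbolic name only (numeric XDR values are deliberately omitted), and the Replier class and its methods are hypothetical.

```python
from enum import Enum
from typing import Dict, Tuple


class Nfsstat(Enum):
    # Symbolic names only; the numeric protocol values are not reproduced here.
    NFS4_OK = "NFS4_OK"
    NFS4ERR_BADSESSION = "NFS4ERR_BADSESSION"


class Replier:
    """Hypothetical replier holding one reply cache per session id."""

    def __init__(self) -> None:
        self.reply_caches: Dict[bytes, Dict[Tuple[int, int], Nfsstat]] = {}

    def create_session(self, sessionid: bytes) -> None:
        self.reply_caches[sessionid] = {}

    def lose_reply_cache(self, sessionid: bytes) -> None:
        # Media corruption, purge of an inactive client, etc.: the cache is gone.
        self.reply_caches.pop(sessionid, None)

    def sequence(self, sessionid: bytes, slot: int, seqid: int) -> Nfsstat:
        cache = self.reply_caches.get(sessionid)
        if cache is None:
            # Loss of reply cache is equivalent to loss of session.
            return Nfsstat.NFS4ERR_BADSESSION
        # Record a placeholder reply for (slot, seqid); a real replier caches the full reply.
        return cache.setdefault((slot, seqid), Nfsstat.NFS4_OK)
```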
After an event like a server restart, the client may have lost its After an event like a server restart, the client may have lost its
connections. The client assumes for the moment that the session has connections. The client assumes for the moment that the session has
not been lost. It reconnects, and if it specified connection not been lost. It reconnects, and if it specified connection
association enforcement when the session was created, it invokes association enforcement when the session was created, it invokes
BIND_CONN_TO_SESSION using the sessionid. Otherwise, it invokes BIND_CONN_TO_SESSION using the sessionid. Otherwise, it invokes
SEQUENCE. If BIND_CONN_TO_SESSION or SEQUENCE returns SEQUENCE. If BIND_CONN_TO_SESSION or SEQUENCE returns
NFS4ERR_BADSESSION, the client knows the session was lost. If the NFS4ERR_BADSESSION, the client knows the session was lost. If the
connection survives session loss, then the next SEQUENCE operation connection survives session loss, then the next SEQUENCE operation
the client sends over the connection will get back the client sends over the connection will get back
skipping to change at page 150, line 28 skipping to change at page 150, line 28
which represents a client as a whole to the eventual lightweight which represents a client as a whole to the eventual lightweight
stateid used for most client and server locking interactions. The stateid used for most client and server locking interactions. The
details of this transition will vary with the type of object but it details of this transition will vary with the type of object but it
always starts with a client ID. always starts with a client ID.
8.1. Client and Session ID 8.1. Client and Session ID
A client must establish a client ID (see Section 2.4) and then one or A client must establish a client ID (see Section 2.4) and then one or
more sessionids (see Section 2.10) before performing any operations more sessionids (see Section 2.10) before performing any operations
to open, lock, delegate, or obtain a layout for a file object. Each to open, lock, delegate, or obtain a layout for a file object. Each
sessionid is associated with a specific client ID, and thus serves as session id is associated with a specific client ID, and thus serves
a shorthand reference to an NFSv4.1 client. as a shorthand reference to an NFSv4.1 client.
For some types of locking interactions, the client will represent For some types of locking interactions, the client will represent
some number of internal locking entities called "owners", which some number of internal locking entities called "owners", which
normally correspond to processes internal to the client. For other normally correspond to processes internal to the client. For other
types of locking-related objects, such as delegations and layouts, no types of locking-related objects, such as delegations and layouts, no
such intermediate entities are provided for, and the locking-related such intermediate entities are provided for, and the locking-related
objects are considered to be transferred directly between the server objects are considered to be transferred directly between the server
and a unitary client. and a unitary client.
8.2. Stateid Definition 8.2. Stateid Definition
skipping to change at page 156, line 26 skipping to change at page 156, line 26
appropriate error returned when necessary. Special and non-special appropriate error returned when necessary. Special and non-special
stateids are handled separately. (See Section 8.2.3 for a discussion stateids are handled separately. (See Section 8.2.3 for a discussion
of special stateids.) of special stateids.)
Note that stateids are implicitly qualified by the current client ID, Note that stateids are implicitly qualified by the current client ID,
as derived from the client ID associated with the current session. as derived from the client ID associated with the current session.
Note, however, that the semantics of the session will prevent stateids Note, however, that the semantics of the session will prevent stateids
associated with a previous client or server instance from being associated with a previous client or server instance from being
analyzed by this procedure. analyzed by this procedure.
If server restart has resulted in an invalid client ID or a sessionid If server restart has resulted in an invalid client ID or a session
which is invalid, SEQUENCE will return an error and the operation id which is invalid, SEQUENCE will return an error and the operation
that takes a stateid as an argument will never be processed. that takes a stateid as an argument will never be processed.
If there has been a server restart where there is a persistent If there has been a server restart where there is a persistent
session, and all leased state has been lost, then the session in session, and all leased state has been lost, then the session in
question will, although valid, be marked as dead, and any operation question will, although valid, be marked as dead, and any operation
not satisfied by means of the reply cache will receive the error not satisfied by means of the reply cache will receive the error
NFS4ERR_DEADSESSION, and thus not be processed as indicated below. NFS4ERR_DEADSESSION, and thus not be processed as indicated below.
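The ordering described above (SEQUENCE is checked before any stateid is examined) can be sketched as below. This assumes a simplified server in which an unknown session id yields NFS4ERR_BADSESSION, a "dead" persistent session answers only from its reply cache, and the stateid check is a caller-supplied function; Session, run_compound and check_stateid are hypothetical names.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, Tuple


class Status(Enum):
    OK = "NFS4_OK"
    BADSESSION = "NFS4ERR_BADSESSION"
    DEADSESSION = "NFS4ERR_DEADSESSION"


@dataclass
class Session:
    client_id: int
    dead: bool = False  # persistent session whose leased state was lost in a restart
    reply_cache: Dict[Tuple[int, int], Status] = field(default_factory=dict)


def run_compound(sessions: Dict[bytes, Session], sessionid: bytes, slot: int,
                 seqid: int, check_stateid: Callable[[int], Status]) -> Status:
    session = sessions.get(sessionid)
    if session is None:
        # SEQUENCE fails, so the stateid-bearing operation is never processed.
        return Status.BADSESSION
    cached = session.reply_cache.get((slot, seqid))
    if cached is not None:
        # Retries can still be satisfied from the reply cache, even on a dead session.
        return cached
    if session.dead:
        return Status.DEADSESSION
    # Only now is the stateid examined, implicitly qualified by the session's client ID.
    status = check_stateid(session.client_id)
    session.reply_cache[(slot, seqid)] = status
    return status
```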
When a stateid is being tested, and the "other" field is all zeros or When a stateid is being tested, and the "other" field is all zeros or
all ones, a check that the "other" and "seqid" fields match a defined all ones, a check that the "other" and "seqid" fields match a defined
skipping to change at page 273, line 39 skipping to change at page 273, line 39
is incapable of providing this check in the presence of mandatory is incapable of providing this check in the presence of mandatory
file locks, the metadata server then MUST NOT grant layouts and file locks, the metadata server then MUST NOT grant layouts and
mandatory file locks simultaneously. mandatory file locks simultaneously.
12.5.2. Getting a Layout 12.5.2. Getting a Layout
A client obtains a layout with the LAYOUTGET operation. The metadata A client obtains a layout with the LAYOUTGET operation. The metadata
server will grant layouts of a particular type (e.g., block/volume, server will grant layouts of a particular type (e.g., block/volume,
object, or file). The client selects an appropriate layout type that object, or file). The client selects an appropriate layout type that
the server supports and the client is prepared to use. The layout the server supports and the client is prepared to use. The layout
returned to the client may not exactly align with the requested byte returned to the client might not exactly match the requested byte
range. A field within the LAYOUTGET request, loga_minlength, range as described in Section 18.43.3. As needed, a client may make
specifies the minimum length of the layout. The loga_minlength field multiple LAYOUTGET requests; these might result in multiple
should be at least one. As needed, a client may make multiple overlapping, non-conflicting layouts (see Section 12.2.8).
LAYOUTGET requests; these will result in multiple overlapping, non-
conflicting layouts.
In order to get a layout, the client must first have opened the file In order to get a layout, the client must first have opened the file
via the OPEN operation. When a client has no layout on a file, it via the OPEN operation. When a client has no layout on a file, it
MUST present a stateid as returned by OPEN, a delegation stateid, or MUST present a stateid as returned by OPEN, a delegation stateid, or
a byte-range lock stateid in the loga_stateid argument. A successful a byte-range lock stateid in the loga_stateid argument. A successful
LAYOUTGET result includes a layout stateid. The first successful LAYOUTGET result includes a layout stateid. The first successful
LAYOUTGET processed by the server using a non-layout stateid as an LAYOUTGET processed by the server using a non-layout stateid as an
argument MUST have the "seqid" field of the layout stateid in the argument MUST have the "seqid" field of the layout stateid in the
response set to one. Thereafter, the client uses a layout stateid response set to one. Thereafter, the client uses a layout stateid
(see Section 12.5.3) on future invocations of LAYOUTGET on the file, (see Section 12.5.3) on future invocations of LAYOUTGET on the file,
skipping to change at page 275, line 24 skipping to change at page 275, line 22
correct "seqid" is defined as the highest "seqid" value from correct "seqid" is defined as the highest "seqid" value from
responses of fully processed LAYOUTGET or LAYOUTRETURN operations or responses of fully processed LAYOUTGET or LAYOUTRETURN operations or
arguments of a fully processed CB_LAYOUTRECALL operation. Since the arguments of a fully processed CB_LAYOUTRECALL operation. Since the
server is incrementing the "seqid" value on each layout operation, server is incrementing the "seqid" value on each layout operation,
the client may determine the order of operation processing by the client may determine the order of operation processing by
inspecting the "seqid" value. In the case of overlapping layout inspecting the "seqid" value. In the case of overlapping layout
ranges, the ordering information will provide the client the ranges, the ordering information will provide the client the
knowledge of which layout ranges are held. Note that overlapping knowledge of which layout ranges are held. Note that overlapping
layout ranges may occur because of the client's specific requests or layout ranges may occur because of the client's specific requests or
because the server is allowed to expand the range of a requested because the server is allowed to expand the range of a requested
layout and notify the client in the LAYOUTRETURN results Additional layout and notify the client in the LAYOUTRETURN results. Additional
layout stateid sequencing requirements are provided in layout stateid sequencing requirements are provided in
Section 12.5.5.2. Section 12.5.5.2.
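Because the server increments the layout stateid "seqid" on each layout operation, a client can recover the server's processing order by sorting on it, and it tracks the "correct" seqid as the highest one from fully processed operations. The sketch below assumes seqid values never wrap around, and LayoutEvent, server_processing_order and LayoutSeqidTracker are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LayoutEvent:
    op: str     # "LAYOUTGET", "LAYOUTRETURN" or "CB_LAYOUTRECALL"
    seqid: int  # layout stateid "seqid" from the results (or the recall arguments)


def server_processing_order(events: List[LayoutEvent]) -> List[LayoutEvent]:
    # Sorting by seqid recovers the server's processing order (wraparound ignored).
    return sorted(events, key=lambda e: e.seqid)


class LayoutSeqidTracker:
    def __init__(self) -> None:
        self.correct_seqid: Optional[int] = None

    def mark_fully_processed(self, event: LayoutEvent) -> None:
        # Call only after the client has updated its record of held layout ranges.
        if self.correct_seqid is None or event.seqid > self.correct_seqid:
            self.correct_seqid = event.seqid
```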
The client's receipt of a "seqid" is not sufficient for subsequent The client's receipt of a "seqid" is not sufficient for subsequent
use. The client must fully process the operations before the "seqid" use. The client must fully process the operations before the "seqid"
can be used. For LAYOUTGET results, if the client is not using the can be used. For LAYOUTGET results, if the client is not using the
forgetful model (Section 12.5.5.1), it MUST first update its record forgetful model (Section 12.5.5.1), it MUST first update its record
of what ranges of the file's layout it has before using the seqid. of what ranges of the file's layout it has before using the seqid.
For LAYOUTRETURN results, the client MUST delete the range from its For LAYOUTRETURN results, the client MUST delete the range from its
record of what ranges of the file's layout it had before using the record of what ranges of the file's layout it had before using the
skipping to change at page 504, line 45 skipping to change at page 504, line 45
records introduced in the description of EXCHANGE_ID is used with the records introduced in the description of EXCHANGE_ID is used with the
following addition: following addition:
clientid_arg: The value of the csa_clientid field of the clientid_arg: The value of the csa_clientid field of the
CREATE_SESSION4args structure of the current request. CREATE_SESSION4args structure of the current request.
Since CREATE_SESSION is a non-idempotent operation, we must consider Since CREATE_SESSION is a non-idempotent operation, we must consider
the possibility that retries may occur as a result of a client the possibility that retries may occur as a result of a client
restart, network partition, malfunctioning router, etc. For each restart, network partition, malfunctioning router, etc. For each
client ID created by EXCHANGE_ID, the server maintains a separate client ID created by EXCHANGE_ID, the server maintains a separate
reply cache similar to the session reply cache used for SEQUENCE reply cache (called the CREATE_SESSION reply cache) similar to the
operations, with two distinctions. session reply cache used for SEQUENCE operations, with two
distinctions.
o First this is a reply cache just for detecting and processing o First this is a reply cache just for detecting and processing
CREATE_SESSION requests for a given client ID. CREATE_SESSION requests for a given client ID.
o Second, the size of the client ID reply cache is one slot (and o Second, the size of the client ID reply cache is one slot (and
as a result, the CREATE_SESSION request does not carry a slot as a result, the CREATE_SESSION request does not carry a slot
number). This means that at most one CREATE_SESSION request for a number). This means that at most one CREATE_SESSION request for a
given client ID can be outstanding. given client ID can be outstanding.
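A sketch of the one-slot CREATE_SESSION reply cache follows. It assumes the usual single-slot rules by analogy with SEQUENCE: a request equal to the slot's sequence id is a retry and is answered from the cache, one greater is a new request, and anything else gets NFS4ERR_SEQ_MISORDERED. Slot initialization follows the EXCHANGE_ID text in this section (eir_sequenceid - 1 with unsigned underflow, plus a contrived cached NFS4ERR_SEQ_MISORDERED). All names are hypothetical, and replies are plain strings.

```python
from dataclasses import dataclass
from typing import Callable

SEQ_MOD = 2 ** 32  # sequence ids are treated as 32-bit unsigned values


@dataclass
class CreateSessionSlot:
    seqid: int
    cached_reply: str


def new_client_slot(eir_sequenceid: int) -> CreateSessionSlot:
    # Slot starts one below eir_sequenceid (wrapping at zero) with a contrived result.
    return CreateSessionSlot((eir_sequenceid - 1) % SEQ_MOD, "NFS4ERR_SEQ_MISORDERED")


def create_session(slot: CreateSessionSlot, csa_sequence: int,
                   execute: Callable[[], str]) -> str:
    if csa_sequence == slot.seqid:
        return slot.cached_reply  # retry: replay the cached result
    if csa_sequence == (slot.seqid + 1) % SEQ_MOD:
        reply = execute()  # new request: run it and cache its result
        slot.seqid = csa_sequence
        slot.cached_reply = reply
        return reply
    return "NFS4ERR_SEQ_MISORDERED"
```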
As previously stated, CREATE_SESSION can be sent with or without a
preceding SEQUENCE operation. Even if SEQUENCE precedes
CREATE_SESSION, the server MUST maintain the CREATE_SESSION reply
cache, which is separate from the reply cache for the session
associated with SEQUENCE. If CREATE_SESSION was originally sent by
itself, the client MAY send a retry of the CREATE_SESSION operation
within a COMPOUND preceded by SEQUENCE. If CREATE_SESSION was
originally sent in a COMPOUND that started with SEQUENCE, then the
client SHOULD send a retry in a COMPOUND that starts with SEQUENCE
that has the same session id as the SEQUENCE of the original request.
However, the client MAY send a retry in a COMPOUND that either has no
preceding SEQUENCE, or has a preceding SEQUENCE that refers to a
different session than the original CREATE_SESSION. This might be
necessary if the client sends a CREATE_SESSION in a COMPOUND preceded
by a SEQUENCE with session id X, and session X no longer exists.
Regardless, any retry of CREATE_SESSION, with or without a preceding
SEQUENCE, MUST use the same value of csa_sequence as the original.
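The client's choice of how to package a CREATE_SESSION retry can be sketched as below. The sketch assumes the client knows which sessions are still alive and may name another live session as a fallback. It prefers the original SEQUENCE session when it still exists (the SHOULD case), otherwise uses another session or no SEQUENCE at all (the MAY cases), and always reuses csa_sequence. The names CreateSessionCompound and plan_retry are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class CreateSessionCompound:
    csa_sequence: int
    sequence_sessionid: Optional[bytes]  # None: CREATE_SESSION sent without SEQUENCE


def plan_retry(original: CreateSessionCompound, live_sessions: Set[bytes],
               other_session: Optional[bytes] = None) -> CreateSessionCompound:
    # Whatever the COMPOUND looks like, csa_sequence MUST match the original.
    if original.sequence_sessionid in live_sessions:
        return CreateSessionCompound(original.csa_sequence, original.sequence_sessionid)
    if other_session in live_sessions:
        # e.g. the original SEQUENCE named session X, which no longer exists.
        return CreateSessionCompound(original.csa_sequence, other_session)
    return CreateSessionCompound(original.csa_sequence, None)
```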
When a client sends a successful EXCHANGE_ID and it is returned an When a client sends a successful EXCHANGE_ID and it is returned an
unconfirmed client ID, the client is also returned eir_sequenceid, unconfirmed client ID, the client is also returned eir_sequenceid,
and the client is expected to set the value of csa_sequenceid in the and the client is expected to set the value of csa_sequenceid in the
client ID-confirming-CREATE_SESSION it sends with that client ID to client ID-confirming-CREATE_SESSION it sends with that client ID to
the value of eir_sequenceid. When EXCHANGE_ID returns a new, the value of eir_sequenceid. When EXCHANGE_ID returns a new,
unconfirmed client ID, the server initializes the client ID slot to unconfirmed client ID, the server initializes the client ID slot to
be equal to eir_sequenceid - 1 (accounting for underflow), and be equal to eir_sequenceid - 1 (accounting for underflow), and
records a contrived CREATE_SESSION result with a "cached" result of records a contrived CREATE_SESSION result with a "cached" result of
NFS4ERR_SEQ_MISORDERED. With the slot thus initialized, the NFS4ERR_SEQ_MISORDERED. With the slot thus initialized, the
processing of the CREATE_SESSION operation is divided into four processing of the CREATE_SESSION operation is divided into four
skipping to change at page 522, line 51 skipping to change at page 522, line 51
the sessionid in the preceding SEQUENCE operation), current the sessionid in the preceding SEQUENCE operation), current
filehandle, layout type (loga_layout_type), and the layout stateid filehandle, layout type (loga_layout_type), and the layout stateid
(loga_stateid). The use of the loga_iomode field depends upon the (loga_stateid). The use of the loga_iomode field depends upon the
layout type, but should reflect the client's data access intent. layout type, but should reflect the client's data access intent.
If the metadata server is in a grace period, and does not persist If the metadata server is in a grace period, and does not persist
layouts and device ID to device address mappings, then it MUST return layouts and device ID to device address mappings, then it MUST return
NFS4ERR_GRACE (see Section 8.4.2.1). NFS4ERR_GRACE (see Section 8.4.2.1).
The LAYOUTGET operation returns layout information for the specified The LAYOUTGET operation returns layout information for the specified
byte range: a layout. To get a layout from a specific offset through byte range: a layout. The client actually specifies two ranges, both
the end-of-file, regardless of the file's length, a loga_length field starting at the offset in the loga_offset field. The first range is
set to NFS4_UINT64_MAX is used. If loga_length is zero, or if a between loga_offset and loga_offset + loga_length - 1 inclusive.
loga_length which is not NFS4_UINT64_MAX is specified, and the sum of This range indicates the desired range the client wants the layout to
loga_length and loga_offset exceeds NFS4_UINT64_MAX, the error cover. The second range is between loga_offset and loga_offset +
NFS4ERR_INVAL will result. loga_minlength - 1 inclusive. This range indicates the required
range the client needs the layout to cover. Thus, loga_minlength
MUST be less than or equal to loga_length.
The loga_minlength field specifies the minimum length of layout the When a length field is set to NFS4_UINT64_MAX, this indicates a
server MUST return with two exceptions: desire (when loga_length is NFS4_UINT64_MAX) or requirement (when
loga_minlength is NFS4_UINT64_MAX) to get a layout from loga_offset
through the end-of-file, regardless of the file's length.
1. The argument loga_iomode was set to LAYOUTIOMODE_READ, and The following rules govern the relationships among, and minima of
loga_offset plus loga_minlength goes past the end of the file. loga_length, loga_minlength, and loga_offset.
2. The range from loga_offset through loga_offset + loga_minlength - o If loga_length is less than loga_minlength, the metadata server
1 overlaps two or more striping patterns. In which case, MUST return NFS4ERR_INVAL.
logr_layout will contain two or more elements, and the sum of the
lo_length fields of each element MUST be at least loga_minlength
unless the first exception also applies.
If this requirement cannot be met, the server MUST NOT return a o If loga_length or loga_minlength is zero, the metadata server MUST
layout and the error NFS4ERR_BADLAYOUT MUST be returned. return NFS4ERR_INVAL.
o If the sum of loga_offset and loga_minlength exceeds
NFS4_UINT64_MAX, and loga_minlength is not NFS4_UINT64_MAX, the
error NFS4ERR_INVAL MUST result.
o If the sum of loga_offset and loga_length exceeds NFS4_UINT64_MAX,
and loga_length is not NFS4_UINT64_MAX, the error NFS4ERR_INVAL
MUST result.
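The two requested ranges and the argument checks above can be sketched as follows. The sketch uses Python's unbounded integers, so "the sum exceeds NFS4_UINT64_MAX" is a plain comparison. A returned last offset of None stands for "through end-of-file". The function names are hypothetical.

```python
from typing import Optional, Tuple

NFS4_UINT64_MAX = 2 ** 64 - 1

# (first byte, last byte inclusive); last is None for "through end-of-file".
Range = Tuple[int, Optional[int]]


def check_layoutget_lengths(offset: int, length: int, minlength: int) -> Optional[str]:
    """Return the error name for bad arguments, or None if the checks pass."""
    if length < minlength:
        return "NFS4ERR_INVAL"
    if length == 0 or minlength == 0:
        return "NFS4ERR_INVAL"
    if minlength != NFS4_UINT64_MAX and offset + minlength > NFS4_UINT64_MAX:
        return "NFS4ERR_INVAL"
    if length != NFS4_UINT64_MAX and offset + length > NFS4_UINT64_MAX:
        return "NFS4ERR_INVAL"
    return None


def requested_ranges(offset: int, length: int, minlength: int) -> Tuple[Range, Range]:
    """Return (desired range, required range), both starting at loga_offset."""
    def last(n: int) -> Optional[int]:
        return None if n == NFS4_UINT64_MAX else offset + n - 1
    return (offset, last(length)), (offset, last(minlength))
```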
After the metadata server has performed the above checks on
loga_offset, loga_minlength, and loga_length, the metadata server
MUST return a layout according to the rules in Table 21.
Acceptable layouts based on loga_minlength. u64m = NFS4_UINT64_MAX;
a_off = loga_offset; a_minlen = loga_minlength.
+-----------+-----------+----------+----------+---------------------+
| Layout | Layout | Layout | Layout | Layout length of |
| iomode of | a_minlen | iomode | offset | reply |
| request | of | of reply | of reply | |
| | request | | | |
+-----------+-----------+----------+----------+---------------------+
| _READ | u64m | MAY be | MUST be | MUST be >= file |
| | | _READ | <= a_off | length - layout |
| | | | | offset |
| _READ | u64m | MAY be | MUST be | MUST be u64m |
| | | _RW | <= a_off | |
| _READ | < u64m | MAY be | MUST be | MUST be >= MIN(file |
| | | _READ | <= a_off | length, a_minlen + |
| | | | | a_off) - layout |
| | | | | offset |
| _READ | < u64m | MAY be | MUST be | MUST be >= a_off - |
| | | _RW | <= a_off | layout offset + |
| | | | | a_minlen |
| _RW | u64m | MUST be | MUST be | MUST be u64m |
| | | _RW | <= a_off | |
| _RW | < u64m | MUST be | MUST be | MUST be >= a_off - |
| | | _RW | <= a_off | layout offset + |
| | | | | a_minlen |
+-----------+-----------+----------+----------+---------------------+
Table 21
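Table 21 can be read as a predicate over the request and a candidate reply. The sketch below encodes its rows, assuming iomodes are given as the strings shown and the file length is known to the server. Because NFS4_UINT64_MAX is the largest representable length, a reply length of u64m satisfies every ">=" row. The function name table21_acceptable is hypothetical. If no candidate passes, the text requires NFS4ERR_BADLAYOUT.

```python
U64M = 2 ** 64 - 1  # NFS4_UINT64_MAX
READ, RW = "LAYOUTIOMODE4_READ", "LAYOUTIOMODE4_RW"


def table21_acceptable(req_iomode: str, a_off: int, a_minlen: int, file_length: int,
                       rep_iomode: str, rep_off: int, rep_len: int) -> bool:
    # Reply iomode: a _READ request MAY get _READ or _RW; a _RW request MUST get _RW.
    allowed = {READ: (READ, RW), RW: (RW,)}
    if rep_iomode not in allowed.get(req_iomode, ()):
        return False
    # Every row: the reply offset MUST be <= a_off.
    if rep_off > a_off:
        return False
    if a_minlen == U64M:
        if req_iomode == READ and rep_iomode == READ:
            return rep_len >= file_length - rep_off
        return rep_len == U64M
    if req_iomode == READ and rep_iomode == READ:
        return rep_len >= min(file_length, a_minlen + a_off) - rep_off
    # _READ upgraded to _RW, or _RW requested: cover through a_off + a_minlen.
    return rep_len >= a_off - rep_off + a_minlen
```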
If the metadata server cannot return a layout according to the rules
in Table 21, then the metadata server MUST return the error
NFS4ERR_BADLAYOUT. Assuming loga_length is greater than
loga_minlength, the metadata server SHOULD return a layout according
to the rules in Table 22.
Desired layouts based on loga_length. The rules of Table 21 MUST be
applied first. u64m = NFS4_UINT64_MAX; a_off = loga_offset; a_len =
loga_length.
+------------+------------+-----------+-----------+-----------------+
| Layout | Layout | Layout | Layout | Layout length |
| iomode of | a_len of | iomode of | offset of | of reply |
| request | request | reply | reply | |
+------------+------------+-----------+-----------+-----------------+
| _READ | u64m | MAY be | MUST be | SHOULD be u64m |
| | | _READ | <= a_off | |
| _READ | u64m | MAY be | MUST be | SHOULD be u64m |
| | | _RW | <= a_off | |
| _READ | < u64m | MAY be | MUST be | SHOULD be >= |
| | | _READ | <= a_off | a_off - layout |
| | | | | offset + a_len |
| _READ | < u64m | MAY be | MUST be | SHOULD be >= |
| | | _RW | <= a_off | a_off - layout |
| | | | | offset + a_len |
| _RW | u64m | MUST be | MUST be | SHOULD be u64m |
| | | _RW | <= a_off | |
| _RW | < u64m | MUST be | MUST be | SHOULD be >= |
| | | _RW | <= a_off | a_off - layout |
| | | | | offset + a_len |
+------------+------------+-----------+-----------+-----------------+
Table 22
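Table 22 adds a SHOULD on top of Table 21. Its iomode columns repeat the constraints of Table 21, so the sketch below checks only offset and length. It assumes the candidate reply has already passed the Table 21 rules, and table22_desired is a hypothetical name.

```python
U64M = 2 ** 64 - 1  # NFS4_UINT64_MAX


def table22_desired(a_off: int, a_len: int, rep_off: int, rep_len: int) -> bool:
    """Whether a reply that already satisfies Table 21 also meets Table 22's SHOULD."""
    if rep_off > a_off:  # the offset rule is still a MUST
        return False
    if a_len == U64M:
        return rep_len == U64M
    return rep_len >= a_off - rep_off + a_len
```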
The loga_stateid field specifies a valid stateid. If a layout is not The loga_stateid field specifies a valid stateid. If a layout is not
currently held by the client, the loga_stateid field represents a currently held by the client, the loga_stateid field represents a
stateid reflecting the correspondingly valid open, byte-range lock, stateid reflecting the correspondingly valid open, byte-range lock,
or delegation stateid. Once a layout is held by the client for the or delegation stateid. Once a layout is held on the file by the
file, the loga_stateid field is a stateid as returned from a previous client, the loga_stateid field MUST be a stateid as returned from a
LAYOUTGET or LAYOUTRETURN operation or provided by a CB_LAYOUTRECALL previous LAYOUTGET or LAYOUTRETURN operation or provided by a
operation (see Section 12.5.3). CB_LAYOUTRECALL operation (see Section 12.5.3).
The loga_maxcount field specifies the maximum layout size (in bytes) The loga_maxcount field specifies the maximum layout size (in bytes)
that the client can handle. If the size of the layout structure that the client can handle. If the size of the layout structure
exceeds the size specified by loga_maxcount, the metadata server will exceeds the size specified by loga_maxcount, the metadata server will
return the NFS4ERR_TOOSMALL error. return the NFS4ERR_TOOSMALL error.
The returned layout is expressed as an array, logr_layout, with each The returned layout is expressed as an array, logr_layout, with each
element of type layout4. If a file has a single striping pattern, element of type layout4. If a file has a single striping pattern,
then logr_layout will contain just one entry. Otherwise, if the then logr_layout will contain just one entry. Otherwise, if the
requested range overlaps more than one striping pattern, logr_layout requested range overlaps more than one striping pattern, logr_layout
will contain the required number of entries. The elements of will contain the required number of entries. The elements of
logr_layout MUST be sorted in ascending order of the value of the logr_layout MUST be sorted in ascending order of the value of the
lo_offset field of each element. There MUST be no gaps or overlaps lo_offset field of each element. There MUST be no gaps or overlaps
in the range between two successive elements of logr_layout. The in the range between two successive elements of logr_layout. The
lo_iomode field in each element of logr_layout MUST be the same. lo_iomode field in each element of logr_layout MUST be the same.
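The constraints on the logr_layout array (ascending lo_offset, no gaps or overlaps between successive elements, a single lo_iomode) can be checked mechanically. The sketch below models only the fields it needs (lo_content is omitted). Treating an empty array as an error is an assumption of this sketch, and Layout4 and check_logr_layout are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layout4:
    # Only the fields needed for the checks; lo_content is omitted.
    lo_offset: int
    lo_length: int
    lo_iomode: str


def check_logr_layout(logr_layout: List[Layout4]) -> None:
    if not logr_layout:
        raise ValueError("logr_layout is empty")  # assumption of this sketch
    for prev, cur in zip(logr_layout, logr_layout[1:]):
        if cur.lo_offset <= prev.lo_offset:
            raise ValueError("elements not in ascending lo_offset order")
        if cur.lo_offset != prev.lo_offset + prev.lo_length:
            raise ValueError("gap or overlap between successive elements")
    if len({e.lo_iomode for e in logr_layout}) != 1:
        raise ValueError("lo_iomode differs between elements")
```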
The metadata server may adjust the range of the returned layout based Table 21 and Table 22 both refer to a returned layout iomode, offset,
on the usage implied by the loga_iomode. The client MUST be prepared and length. Because the returned layout is encoded in the
to get a layout that does not align exactly with its request. See logr_layout array, more description is required.
Section 12.5.2 for more details.
The metadata server may also return a layout with an lo_iomode other iomode
than that requested by the client. If it does so, it MUST ensure
that the lo_iomode is more permissive than the loga_iomode requested. The value of the returned layout iomode listed in Table 21 and
For example, this behavior allows an implementation to upgrade read- Table 22 is equal to the value of the lo_iomode field in each
only requests to read/write requests at its discretion, within the element of logr_layout. As shown in Table 21 and Table 22, the
limits of the layout type specific protocol. A lo_iomode of either metadata server MAY return a layout with an lo_iomode different
LAYOUTIOMODE4_READ or LAYOUTIOMODE4_RW MUST be returned. from the requested iomode (field loga_iomode of the request). If
it does so, it MUST ensure that the lo_iomode is more permissive
than the loga_iomode requested. For example, this behavior allows
an implementation to upgrade read-only requests to read/write
requests at its discretion, within the limits of the layout type
specific protocol. A lo_iomode of either LAYOUTIOMODE4_READ or
LAYOUTIOMODE4_RW MUST be returned.
offset
The value of the returned layout offset listed in Table 21 and
Table 22 is always equal to the lo_offset field of the first
element of logr_layout.
length
When setting the value of the returned layout length, the
situation is complicated by the possibility that the special
layout length value NFS4_UINT64_MAX is involved. For a
logr_layout array of N elements, the lo_length field in the first
N-1 elements MUST NOT be NFS4_UINT64_MAX. The lo_length field of
the last element of logr_layout can be NFS4_UINT64_MAX under some
conditions as described in the following list.
* If an applicable rule of Table 21 states the metadata server
MUST return a layout of length NFS4_UINT64_MAX, then the
lo_length field of the last element of logr_layout MUST be
NFS4_UINT64_MAX.
* If an applicable rule of Table 21 states the metadata server
MUST NOT return a layout of length NFS4_UINT64_MAX, then the
lo_length field of the last element of logr_layout MUST NOT be
NFS4_UINT64_MAX.
* If an applicable rule of Table 22 states the metadata server
SHOULD return a layout of length NFS4_UINT64_MAX, then the
lo_length field of the last element of logr_layout SHOULD be
NFS4_UINT64_MAX.
* When the value of the returned layout length of Table 21 and
Table 22 is not NFS4_UINT64_MAX, then the returned layout
length is equal to the sum of the lo_length fields of each
element of logr_layout.
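The mapping from the logr_layout array to the single iomode, offset, and length used by Table 21 and Table 22 can be sketched as below. Elements are modeled as hypothetical (lo_offset, lo_length, lo_iomode) tuples, and the array is assumed to already satisfy the ordering and single-iomode constraints stated earlier.

```python
from typing import List, Tuple

U64M = 2 ** 64 - 1  # NFS4_UINT64_MAX

# Hypothetical element shape: (lo_offset, lo_length, lo_iomode).
Element = Tuple[int, int, str]


def returned_layout(logr_layout: List[Element]) -> Tuple[str, int, int]:
    """Derive the (iomode, offset, length) referred to by Tables 21 and 22."""
    if not logr_layout:
        raise ValueError("logr_layout is empty")
    # Only the last of N elements may carry lo_length == NFS4_UINT64_MAX.
    if any(length == U64M for _, length, _ in logr_layout[:-1]):
        raise ValueError("only the last element may have lo_length NFS4_UINT64_MAX")
    iomode, offset = logr_layout[0][2], logr_layout[0][0]
    if logr_layout[-1][1] == U64M:
        return iomode, offset, U64M
    # Otherwise the returned length is the sum of the element lengths.
    return iomode, offset, sum(length for _, length, _ in logr_layout)
```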
The logr_return_on_close result field is a directive to return the The logr_return_on_close result field is a directive to return the
layout before closing the file. When the server sets this return layout before closing the file. When the metadata server sets this
value to TRUE, it MUST be prepared to recall the layout in the case return value to TRUE, it MUST be prepared to recall the layout in the
the client fails to return the layout before close. For the server case the client fails to return the layout before close. For the
that knows a layout must be returned before a close of the file, this metadata server that knows a layout must be returned before a close
return value can be used to communicate the desired behavior to the of the file, this return value can be used to communicate the desired
client and thus remove one extra step from the client's and server's behavior to the client and thus remove one extra step from the
interaction. client's and metadata server's interaction.
The logr_stateid stateid is returned to the client for use in The logr_stateid stateid is returned to the client for use in
subsequent layout related operations. See Section 8.2, subsequent layout related operations. See Section 8.2,
Section 12.5.3, and Section 12.5.5.2 for a further discussion and Section 12.5.3, and Section 12.5.5.2 for a further discussion and
requirements. requirements.
The format of the returned layout (lo_content) is specific to the The format of the returned layout (lo_content) is specific to the
layout type. The value of the layout type (lo_content.loc_type) for layout type. The value of the layout type (lo_content.loc_type) for
each of the elements of the array of layouts returned by the server each of the elements of the array of layouts returned by the metadata
(logr_layout) MUST be equal to the loga_layout_type specified by the server (logr_layout) MUST be equal to the loga_layout_type specified
client. If it is not equal, the client SHOULD ignore the response as by the client. If it is not equal, the client SHOULD ignore the
invalid and behave as if the server returned an error, even if the response as invalid and behave as if the metadata server returned an
client does have support for the layout type returned. error, even if the client does have support for the layout type
returned.
If layouts are not supported for the requested file or its containing If layouts are not supported for the requested file or its containing
file system the server SHOULD return NFS4ERR_LAYOUTUNAVAILABLE. If file system the metadata server MUST return
the layout type is not supported, the metadata server should return NFS4ERR_LAYOUTUNAVAILABLE. If the layout type is not supported, the
NFS4ERR_UNKNOWN_LAYOUTTYPE. If layouts are supported but no layout metadata server MUST return NFS4ERR_UNKNOWN_LAYOUTTYPE. If layouts
matches the client provided layout identification, the server should are supported but no layout matches the client provided layout
return NFS4ERR_BADLAYOUT. If an invalid loga_iomode is specified, or identification, the metadata server MUST return NFS4ERR_BADLAYOUT.
a loga_iomode of LAYOUTIOMODE4_ANY is specified, the server should If an invalid loga_iomode is specified, or a loga_iomode of
return NFS4ERR_BADIOMODE. LAYOUTIOMODE4_ANY is specified, the metadata server MUST return
NFS4ERR_BADIOMODE.
If the layout for the file is unavailable due to transient If the layout for the file is unavailable due to transient
conditions, e.g. file sharing prohibits layouts, the server MUST conditions, e.g. file sharing prohibits layouts, the metadata server
return NFS4ERR_LAYOUTTRYLATER. MUST return NFS4ERR_LAYOUTTRYLATER.
If the layout request is rejected due to an overlapping layout If the layout request is rejected due to an overlapping layout
recall, the server MUST return NFS4ERR_RECALLCONFLICT. See recall, the metadata server MUST return NFS4ERR_RECALLCONFLICT. See
Section 12.5.5.2 for details. Section 12.5.5.2 for details.
If the layout conflicts with a mandatory byte range lock held on the If the layout conflicts with a mandatory byte range lock held on the
file, and if the storage devices have no method of enforcing file, and if the storage devices have no method of enforcing
mandatory locks, other than through the restriction of layouts, the mandatory locks, other than through the restriction of layouts, the
metadata server should return NFS4ERR_LOCKED. metadata server SHOULD return NFS4ERR_LOCKED.
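The LAYOUTGET error rules above, as worded in the -23 draft, can be collected into one decision function. This is a hedged sketch. LayoutGetContext and layoutget_status are hypothetical names, and status codes are strings rather than their nfsstat4 values. The order in which the checks are applied is an assumption, because the text does not say which error takes precedence when several conditions hold at once.

```python
from dataclasses import dataclass

VALID_IOMODES = {"LAYOUTIOMODE4_READ", "LAYOUTIOMODE4_RW"}


@dataclass
class LayoutGetContext:
    # Hypothetical summary of what the metadata server knows about one request.
    layouts_supported: bool = True         # file / file system supports layouts
    layout_type_supported: bool = True     # loga_layout_type is known to the server
    layout_matches: bool = True            # some layout matches the client's identification
    iomode: str = "LAYOUTIOMODE4_READ"     # loga_iomode as sent by the client
    transient_unavailable: bool = False    # e.g. file sharing prohibits layouts for now
    recall_conflict: bool = False          # an overlapping layout recall is in progress
    mandatory_lock_conflict: bool = False  # conflicts with a mandatory byte-range lock
    devices_enforce_locks: bool = False    # storage devices can enforce such locks


def layoutget_status(ctx: LayoutGetContext) -> str:
    """Map request conditions to a LAYOUTGET status (check order assumed)."""
    if not ctx.layouts_supported:
        return "NFS4ERR_LAYOUTUNAVAILABLE"
    if not ctx.layout_type_supported:
        return "NFS4ERR_UNKNOWN_LAYOUTTYPE"
    if ctx.iomode not in VALID_IOMODES:    # invalid value or LAYOUTIOMODE4_ANY
        return "NFS4ERR_BADIOMODE"
    if not ctx.layout_matches:
        return "NFS4ERR_BADLAYOUT"
    if ctx.recall_conflict:
        return "NFS4ERR_RECALLCONFLICT"
    if ctx.transient_unavailable:
        return "NFS4ERR_LAYOUTTRYLATER"
    if ctx.mandatory_lock_conflict and not ctx.devices_enforce_locks:
        return "NFS4ERR_LOCKED"            # a SHOULD in the -23 text, not a MUST
    return "NFS4_OK"
```

For instance, layoutget_status(LayoutGetContext(iomode="LAYOUTIOMODE4_ANY")) yields NFS4ERR_BADIOMODE.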
If the client sets loga_signal_layout_avail to TRUE, then it is If the client sets loga_signal_layout_avail to TRUE, then it is
registering with the server a "want" for a layout in the event the registering with the server a "want" for a layout in the event the
layout cannot be obtained due to resource exhaustion. If the server layout cannot be obtained due to resource exhaustion. If the
supports and will honor the "want", the results will have metadata server supports and will honor the "want", the results will
logr_will_signal_layout_avail set to TRUE. If so the client should have logr_will_signal_layout_avail set to TRUE. If so the client
expect a CB_RECALLABLE_OBJ_AVAIL operation to indicate that a layout should expect a CB_RECALLABLE_OBJ_AVAIL operation to indicate that a
is available. layout is available.
On success, the current filehandle retains its value and the current On success, the current filehandle retains its value and the current
stateid is updated to match the value as returned in the results. stateid is updated to match the value as returned in the results.
18.43.4. IMPLEMENTATION 18.43.4. IMPLEMENTATION
Typically, LAYOUTGET will be called as part of a COMPOUND request Typically, LAYOUTGET will be called as part of a COMPOUND request
after an OPEN operation and results in the client having location after an OPEN operation and results in the client having location
information for the file; this requires that loga_stateid be set to information for the file; this requires that loga_stateid be set to
the special stateid that tells the server to use the current stateid, the special stateid that tells the metadata server to use the current
which is set by OPEN (see Section 16.2.3.1.2). A client may also stateid, which is set by OPEN (see Section 16.2.3.1.2). A client
hold a layout across multiple OPENs. The client specifies a layout may also hold a layout across multiple OPENs. The client specifies a
type that limits what kind of layout the server will return. This layout type that limits what kind of layout the metadata server will
prevents servers from issuing layouts that are unusable by the return. This prevents metadata servers from granting layouts that
client. are unusable by the client.
Once the client has obtained a layout referring to a particular Once the client has obtained a layout referring to a particular
device ID, the server MUST NOT delete the device ID until the layout device ID, the metadata server MUST NOT delete the device ID until
is returned or revoked. the layout is returned or revoked.
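One way a metadata server might honor the device ID rule above is to count outstanding layouts per device ID. The DeviceIdTable class is a hypothetical sketch. The draft states only the requirement, not how a server tracks it.

```python
from collections import defaultdict


class DeviceIdTable:
    """Hypothetical bookkeeping: a device ID may be deleted only once no
    outstanding layout refers to it."""

    def __init__(self):
        self._refs = defaultdict(int)  # device ID -> number of layouts using it

    def grant_layout(self, device_ids):
        # Called when LAYOUTGET hands out a layout naming these device IDs.
        for d in device_ids:
            self._refs[d] += 1

    def release_layout(self, device_ids):
        # Called when the layout is returned (LAYOUTRETURN) or revoked.
        for d in device_ids:
            if self._refs.get(d, 0) == 0:
                raise KeyError(f"no outstanding layout references {d!r}")
            self._refs[d] -= 1
            if self._refs[d] == 0:
                del self._refs[d]

    def can_delete(self, device_id) -> bool:
        return self._refs.get(device_id, 0) == 0
```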
CB_NOTIFY_DEVICEID can race with LAYOUTGET. One race scenario is CB_NOTIFY_DEVICEID can race with LAYOUTGET. One race scenario is
that LAYOUTGET returns a device ID the client does not have device that LAYOUTGET returns a device ID the client does not have device
address mappings for, and the server sends a CB_NOTIFY_DEVICEID to address mappings for, and the metadata server sends a
add the device ID to the client's awareness and meanwhile the client CB_NOTIFY_DEVICEID to add the device ID to the client's awareness and
sends GETDEVICEINFO on the device ID. This scenario is discussed in meanwhile the client sends GETDEVICEINFO on the device ID. This
Section 18.40.4. Another scenario is that the CB_NOTIFY_DEVICEID is scenario is discussed in Section 18.40.4. Another scenario is that
processed by the client before it processes the results from the CB_NOTIFY_DEVICEID is processed by the client before it processes
LAYOUTGET. The client will send a GETDEVICEINFO on the device ID. the results from LAYOUTGET. The client will send a GETDEVICEINFO on
If the results from GETDEVICEINFO are received before the client gets the device ID. If the results from GETDEVICEINFO are received before
results from LAYOUTGET, then there is no longer a race. If the the client gets results from LAYOUTGET, then there is no longer a
results from LAYOUTGET are received before the results from race. If the results from LAYOUTGET are received before the results
GETDEVICEINFO, the client can either wait for results of from GETDEVICEINFO, the client can either wait for results of
GETDEVICEINFO, or send another one to get possibly more up to date GETDEVICEINFO, or send another one to get possibly more up to date
device address mappings for the device ID. device address mappings for the device ID.
18.44. Operation 51: LAYOUTRETURN - Release Layout Information 18.44. Operation 51: LAYOUTRETURN - Release Layout Information
18.44.1. ARGUMENT 18.44.1. ARGUMENT
/* Constants used for LAYOUTRETURN and CB_LAYOUTRECALL */ /* Constants used for LAYOUTRETURN and CB_LAYOUTRECALL */
const LAYOUT4_RET_REC_FILE = 1; const LAYOUT4_RET_REC_FILE = 1;
const LAYOUT4_RET_REC_FSID = 2; const LAYOUT4_RET_REC_FSID = 2;
skipping to change at page 537, line 15 skipping to change at page 540, line 15
If SEQUENCE returns an error, then the state of the slot (sequence If SEQUENCE returns an error, then the state of the slot (sequence
id, cached reply) MUST NOT change, and the associated lease MUST NOT id, cached reply) MUST NOT change, and the associated lease MUST NOT
be renewed. be renewed.
If SEQUENCE returns NFS4_OK, then the associated lease MUST be If SEQUENCE returns NFS4_OK, then the associated lease MUST be
renewed (see Section 8.3), except if renewed (see Section 8.3), except if
SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED is returned in sr_status_flags. SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED is returned in sr_status_flags.
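The slot and lease rules for SEQUENCE reduce to a small function of the reply status and sr_status_flags. In this sketch sequence_effects is a hypothetical name, and flags are modeled as a set of names rather than the bitmask used on the wire. The "update_slot" result for NFS4_OK reflects normal slot processing; the text above constrains only the error case.

```python
def sequence_effects(status: str, sr_status_flags: frozenset) -> dict:
    """Whether a SEQUENCE reply may update its slot and renew the lease."""
    if status != "NFS4_OK":
        # Slot state (sequence id, cached reply) unchanged; lease not renewed.
        return {"update_slot": False, "renew_lease": False}
    # Success renews the lease unless all state was revoked on expiry.
    renew = "SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED" not in sr_status_flags
    return {"update_slot": True, "renew_lease": renew}
```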
18.46.4. IMPLEMENTATION 18.46.4. IMPLEMENTATION
The server MUST maintain a mapping of sessionid to client ID in order The server MUST maintain a mapping of session id to client ID in
to validate any operations that follow SEQUENCE that take a stateid order to validate any operations that follow SEQUENCE that take a
as an argument and/or result. stateid as an argument and/or result.
If the client establishes a persistent session, then a SEQUENCE done If the client establishes a persistent session, then a SEQUENCE done
after a server restart may encounter requests performed and recorded after a server restart may encounter requests performed and recorded
in a persistent reply cache before the server restart. In this case, in a persistent reply cache before the server restart. In this case,
SEQUENCE will be processed successfully, while requests which were SEQUENCE will be processed successfully, while requests which were
not processed previously are rejected with NFS4ERR_DEADSESSION. not processed previously are rejected with NFS4ERR_DEADSESSION.
Depending on which of the operations within the COMPOUND were Depending on which of the operations within the COMPOUND were
successfully performed before the server restart, these operations successfully performed before the server restart, these operations
will also have replies sent from the server reply cache. Note that will also have replies sent from the server reply cache. Note that
skipping to change at page 549, line 19 skipping to change at page 552, line 19
19.1.1. ARGUMENTS 19.1.1. ARGUMENTS
void; void;
19.1.2. RESULTS 19.1.2. RESULTS
void; void;
19.1.3. DESCRIPTION 19.1.3. DESCRIPTION
Standard NULL procedure. Void argument, void response. Even though CB_NULL is the standard ONC RPC NULL procedure, with the standard
there is no direct functionality associated with this procedure, the void argument and void response. Even though there is no direct
server will use CB_NULL to confirm the existence of a path for RPCs functionality associated with this procedure, the server will use
from server to client. CB_NULL to confirm the existence of a path for RPCs from the server
to client.
19.1.4. ERRORS 19.1.4. ERRORS
None. None.
19.2. Procedure 1: CB_COMPOUND - Compound Operations 19.2. Procedure 1: CB_COMPOUND - Compound Operations
19.2.1. ARGUMENTS 19.2.1. ARGUMENTS
enum nfs_cb_opnum4 { enum nfs_cb_opnum4 {
skipping to change at page 552, line 17 skipping to change at page 555, line 17
nfs_cb_resop4 resarray<>; nfs_cb_resop4 resarray<>;
}; };
19.2.3. DESCRIPTION 19.2.3. DESCRIPTION
The CB_COMPOUND procedure is used to combine one or more of the The CB_COMPOUND procedure is used to combine one or more of the
callback procedures into a single RPC request. The main callback RPC callback procedures into a single RPC request. The main callback RPC
program has two main procedures: CB_NULL and CB_COMPOUND. All other program has two main procedures: CB_NULL and CB_COMPOUND. All other
operations use the CB_COMPOUND procedure as a wrapper. operations use the CB_COMPOUND procedure as a wrapper.
In the processing of the CB_COMPOUND procedure, the client may find During the processing of the CB_COMPOUND procedure, the client may
that it does not have the available resources to execute any or all find that it does not have the available resources to execute any or
of the operations within the CB_COMPOUND sequence. This is discussed all of the operations within the CB_COMPOUND sequence. Refer to
in Section 2.10.5.4. Section 2.10.5.4 for details.
The minorversion field of the arguments MUST be the same as the The minorversion field of the arguments MUST be the same as the
minorversion of the COMPOUND procedure used to create the client ID minorversion of the COMPOUND procedure used to create the client ID
and session. For NFSv4.1, minorversion MUST be set to 1. and session. For NFSv4.1, minorversion MUST be set to 1.
Contained within the CB_COMPOUND results is a 'status' field. This Contained within the CB_COMPOUND results is a 'status' field. This
status must be equivalent to the status of the last operation that status must be equivalent to the status of the last operation that
was executed within the CB_COMPOUND procedure. Therefore, if an was executed within the CB_COMPOUND procedure. Therefore, if an
operation incurred an error then the 'status' value will be the same operation incurred an error then the 'status' value will be the same
error value as is being returned for the operation that failed. error value as is being returned for the operation that failed.
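The status rule above can be illustrated with a toy CB_COMPOUND evaluator. The sketch assumes, as for the forward-channel COMPOUND, that evaluation stops at the first operation whose status is not NFS4_OK. The name run_cb_compound and the use of callables as operations are hypothetical, and returning NFS4_OK for an empty array is an assumption of the sketch.

```python
from typing import Callable, List, Tuple


def run_cb_compound(ops: List[Callable[[], str]]) -> Tuple[str, List[str]]:
    """Run operations in order; the overall status is the last one executed."""
    status = "NFS4_OK"
    resarray: List[str] = []
    for op in ops:
        status = op()
        resarray.append(status)
        if status != "NFS4_OK":
            break  # later operations are not executed
    return status, resarray
```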
For a description of the "tag" field, see Section 16.2.3 where the The "tag" field is handled the same way as that of COMPOUND procedure
corresponding forward channel procedure is described. (see Section 16.2.3).
Illegal operation codes are handled in the same way as they are Illegal operation codes are handled in the same way as they are
handled for the COMPOUND procedure. handled for the COMPOUND procedure.
19.2.4. IMPLEMENTATION 19.2.4. IMPLEMENTATION
The CB_COMPOUND procedure is used to combine individual operations The CB_COMPOUND procedure is used to combine individual operations
into a single RPC request. The client interprets each of the into a single RPC request. The client interprets each of the
operations in turn. If an operation is executed by the client and operations in turn. If an operation is executed by the client and
the status of that operation is NFS4_OK, then the next operation in the status of that operation is NFS4_OK, then the next operation in
skipping to change at page 553, line 28 skipping to change at page 556, line 28
| NFS4ERR_INVAL | The tag argument is not in UTF-8 | | NFS4ERR_INVAL | The tag argument is not in UTF-8 |
| | encoding. | | | encoding. |
| NFS4ERR_MINOR_VERS_MISMATCH | | | NFS4ERR_MINOR_VERS_MISMATCH | |
| NFS4ERR_SERVERFAULT | | | NFS4ERR_SERVERFAULT | |
| NFS4ERR_TOO_MANY_OPS | | | NFS4ERR_TOO_MANY_OPS | |
| NFS4ERR_REP_TOO_BIG | | | NFS4ERR_REP_TOO_BIG | |
| NFS4ERR_REP_TOO_BIG_TO_CACHE | | | NFS4ERR_REP_TOO_BIG_TO_CACHE | |
| NFS4ERR_REQ_TOO_BIG | | | NFS4ERR_REQ_TOO_BIG | |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
Table 21 Table 23
20. NFSv4.1 Callback Operations 20. NFSv4.1 Callback Operations
20.1. Operation 3: CB_GETATTR - Get Attributes 20.1. Operation 3: CB_GETATTR - Get Attributes
20.1.1. ARGUMENT 20.1.1. ARGUMENT
struct CB_GETATTR4args { struct CB_GETATTR4args {
nfs_fh4 fh; nfs_fh4 fh;
bitmap4 attr_request; bitmap4 attr_request;
skipping to change at page 554, line 27 skipping to change at page 557, line 27
20.1.3. DESCRIPTION 20.1.3. DESCRIPTION
The CB_GETATTR operation is used by the server to obtain the current The CB_GETATTR operation is used by the server to obtain the current
modified state of a file that has been write delegated. The modified state of a file that has been write delegated. The
attributes size and change are the only ones guaranteed to be attributes size and change are the only ones guaranteed to be
serviced by the client. See Section 10.4.3 for a full description of serviced by the client. See Section 10.4.3 for a full description of
how the client and server are to interact with the use of CB_GETATTR. how the client and server are to interact with the use of CB_GETATTR.
If the filehandle specified is not one for which the client holds a If the filehandle specified is not one for which the client holds a
write open delegation, an NFS4ERR_BADHANDLE error is returned. write delegation, an NFS4ERR_BADHANDLE error is returned.
20.1.4. IMPLEMENTATION 20.1.4. IMPLEMENTATION
The client returns attrmask bits and the associated attribute values The client returns attrmask bits and the associated attribute values
only for the change attribute, and attributes that it may change only for the change attribute, and attributes that it may change
(time_modify and size). (time_modify and size).
20.2. Operation 4: CB_RECALL - Recall an Open Delegation 20.2. Operation 4: CB_RECALL - Recall a Delegation
20.2.1. ARGUMENT 20.2.1. ARGUMENT
struct CB_RECALL4args { struct CB_RECALL4args {
stateid4 stateid; stateid4 stateid;
bool truncate; bool truncate;
nfs_fh4 fh; nfs_fh4 fh;
}; };
20.2.2. RESULT 20.2.2. RESULT
struct CB_RECALL4res { struct CB_RECALL4res {
nfsstat4 status; nfsstat4 status;
}; };
20.2.3. DESCRIPTION 20.2.3. DESCRIPTION
The CB_RECALL operation is used to begin the process of recalling an The CB_RECALL operation is used to begin the process of recalling a
open delegation and returning it to the server. delegation and returning it to the server.
The truncate flag is used to optimize recall for a file which is The truncate flag is used to optimize recall for a file object which
about to be truncated to zero. When it is set, the client is freed is a regular file and is about to be truncated to zero. When it is
of obligation to propagate modified data for the file to the server, TRUE, the client is freed of the obligation to propagate modified
since this data is irrelevant. data for the file to the server, since this data is irrelevant.
If the handle specified is not one for which the client holds an open If the handle specified is not one for which the client holds a
delegation, an NFS4ERR_BADHANDLE error is returned. delegation, an NFS4ERR_BADHANDLE error is returned.
If the stateid specified is not one corresponding to an open If the stateid specified is not one corresponding to an open
delegation for the file specified by the filehandle, an delegation for the file specified by the filehandle, an
NFS4ERR_BAD_STATEID is returned. NFS4ERR_BAD_STATEID is returned.
20.2.4. IMPLEMENTATION 20.2.4. IMPLEMENTATION
The client should reply to the callback immediately. Replying does The client SHOULD reply to the callback immediately. Replying does
not complete the recall except when an error was returned. The not complete the recall except when the value of the reply's status
recall is not complete until the delegation is returned using a field is neither NFS4ERR_DELAY nor NFS4_OK. The recall is not
DELEGRETURN. complete until the delegation is returned using a DELEGRETURN
operation.
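Under the -23 wording, whether a CB_RECALL reply by itself completes the recall depends only on the reply's status field. A one-line predicate makes this explicit; the function name is hypothetical.

```python
def recall_completed_by_reply(status: str) -> bool:
    """True if replying with this status alone completes the recall."""
    # NFS4_OK and NFS4ERR_DELAY leave the recall pending until DELEGRETURN.
    return status not in ("NFS4_OK", "NFS4ERR_DELAY")
```

For NFS4_OK and NFS4ERR_DELAY the recall stays pending until the client sends DELEGRETURN.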
20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from Client 20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from Client
20.3.1. ARGUMENT 20.3.1. ARGUMENT
/* /*
* NFSv4.1 callback arguments and results * NFSv4.1 callback arguments and results
*/ */
enum layoutrecall_type4 { enum layoutrecall_type4 {
skipping to change at page 556, line 50 skipping to change at page 559, line 50
20.3.2. RESULT 20.3.2. RESULT
struct CB_LAYOUTRECALL4res { struct CB_LAYOUTRECALL4res {
nfsstat4 clorr_status; nfsstat4 clorr_status;
}; };
20.3.3. DESCRIPTION 20.3.3. DESCRIPTION
The CB_LAYOUTRECALL operation is used by the server to recall layouts The CB_LAYOUTRECALL operation is used by the server to recall layouts
from the client; as a result, the client will begin the process of from the client; as a result, the client will begin the process of
returning layouts with LAYOUTRETURN. The CB_LAYOUTRECALL operation returning layouts via LAYOUTRETURN. The CB_LAYOUTRECALL operation
specifies one of three forms of recall processing with the value of specifies one of three forms of recall processing with the value of
layoutrecall_type4. The recall is either for a specific layout (by layoutrecall_type4. The recall is either for a specific layout (by
file), for an entire file system (FSID), or for all file systems file), for an entire file system (FSID), or for all file systems
(ALL). (ALL).
The behavior of the operation varies based on the value of the The behavior of the operation varies based on the value of the
layoutrecall_type4. The value and behaviors are: layoutrecall_type4. The value and behaviors are:
LAYOUTRECALL4_FILE LAYOUTRECALL4_FILE
For a layout to match the recall request, the following fields For a layout to match the recall request, the values of the
must match in value with the layout: clora_type, clora_iomode, following fields must match those of the layout: clora_type,
lor_fh, and the byte range specified by lor_offset, and clora_iomode, lor_fh, and the byte range specified by lor_offset
lor_length. The clora_iomode field may have a special value of and lor_length. The clora_iomode field may have a special value
LAYOUTIOMODE4_ANY. The LAYOUTIOMODE4_ANY will match any value of LAYOUTIOMODE4_ANY. The special value LAYOUTIOMODE4_ANY will
originally returned in a layout; therefore it acts as a wild card match any iomode originally returned in a layout; therefore it
for iomode. The other special value used is for lor_length. If acts as a wild card. The other special value used is for
lor_length has a value of NFS4_MAXFILELEN, the lor_length field lor_length. If lor_length has a value of NFS4_UINT64_MAX, the
means the maximum possible file size. If a matching layout is lor_length field means the maximum possible file size. If a
found, it MUST be returned using the LAYOUTRETURN operation, see matching layout is found, it MUST be returned using the
Section 18.44. An example of the field's special value use is if LAYOUTRETURN operation (see Section 18.44). An example of the
clora_iomode is LAYOUTIOMODE4_ANY, lor_offset is zero, and field's special value use is if clora_iomode is LAYOUTIOMODE4_ANY,
lor_length is NFS4_MAXFILELEN, then the entire layout is to be lor_offset is zero, and lor_length is NFS4_UINT64_MAX, then the
returned. entire layout is to be returned.
The NFS4ERR_NOMATCHING_LAYOUT error is only returned when the The NFS4ERR_NOMATCHING_LAYOUT error is only returned when the
client does not hold layouts for the file or if the client does client does not hold layouts for the file or if the client does
not have any overlapping layouts for the specification in the not have any overlapping layouts for the specification in the
layout recall. layout recall.
LAYOUTRECALL4_FSID and LAYOUTRECALL4_ALL LAYOUTRECALL4_FSID and LAYOUTRECALL4_ALL
If LAYOUTRECALL4_FSID is specified, the fsid specifies the file If LAYOUTRECALL4_FSID is specified, the fsid specifies the file
system for which any outstanding layouts MUST be returned. If system for which any outstanding layouts MUST be returned. If
skipping to change at page 557, line 51 skipping to change at page 560, line 51
respective LAYOUTRETURN with either LAYOUTRETURN4_FSID or respective LAYOUTRETURN with either LAYOUTRETURN4_FSID or
LAYOUTRETURN4_ALL acknowledges to the server that the client LAYOUTRETURN4_ALL acknowledges to the server that the client
invalidated the said device mappings. See Section 12.5.5.2.1.5 invalidated the said device mappings. See Section 12.5.5.2.1.5
for considerations with "bulk" recall of layouts. for considerations with "bulk" recall of layouts.
The NFS4ERR_NOMATCHING_LAYOUT error is only returned when the The NFS4ERR_NOMATCHING_LAYOUT error is only returned when the
client does not hold layouts and does not have valid deviceid client does not hold layouts and does not have valid deviceid
mappings. mappings.
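The LAYOUTRECALL4_FILE matching rules can be sketched as a predicate over one held layout. HeldLayout and recall_matches are hypothetical names, and iomodes and layout types are plain strings. Treating "match" on the byte range as overlap is an interpretation consistent with the NFS4ERR_NOMATCHING_LAYOUT condition above. A length of NFS4_UINT64_MAX extends to the maximum file size, as in the -23 text.

```python
from dataclasses import dataclass

NFS4_UINT64_MAX = 0xFFFFFFFFFFFFFFFF


@dataclass
class HeldLayout:
    layout_type: str
    iomode: str     # "LAYOUTIOMODE4_READ" or "LAYOUTIOMODE4_RW"
    fh: bytes
    offset: int
    length: int     # NFS4_UINT64_MAX means "to the maximum file size"


def range_end(offset: int, length: int) -> int:
    # Exclusive end of a byte range; NFS4_UINT64_MAX is treated as unbounded.
    return NFS4_UINT64_MAX if length == NFS4_UINT64_MAX else offset + length


def recall_matches(layout: HeldLayout, clora_type: str, clora_iomode: str,
                   lor_fh: bytes, lor_offset: int, lor_length: int) -> bool:
    if layout.layout_type != clora_type or layout.fh != lor_fh:
        return False
    # LAYOUTIOMODE4_ANY acts as a wild card for the iomode.
    if clora_iomode != "LAYOUTIOMODE4_ANY" and clora_iomode != layout.iomode:
        return False
    # The recalled byte range must overlap the held one.
    return (lor_offset < range_end(layout.offset, layout.length)
            and layout.offset < range_end(lor_offset, lor_length))
```

With clora_iomode set to LAYOUTIOMODE4_ANY, lor_offset 0, and lor_length NFS4_UINT64_MAX, every layout for the file and type matches, which is the "entire layout" case described above.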
In processing the layout recall request, the client also varies its In processing the layout recall request, the client also varies its
behavior on the value of the clora_changed field. This field is used behavior based on the value of the clora_changed field. This field
by the server to provide additional context for the reason why the is used by the server to provide additional context for the reason
layout is being recalled. A FALSE value for clora_changed indicates why the layout is being recalled. A FALSE value for clora_changed
that no change in the layout is expected and the client may write indicates that no change in the layout is expected and the client may
modified data to the storage devices involved; this must be done write modified data to the storage devices involved; this must be
prior to returning the layout via LAYOUTRETURN. A TRUE value for done prior to returning the layout via LAYOUTRETURN. A TRUE value
clora_changed indicates that the server is changing the layout. for clora_changed indicates that the server is changing the layout.
Examples of layout changes and reasons for a TRUE indication are: Examples of layout changes and reasons for a TRUE indication are: the
metadata server is restriping the file or a permanent error has metadata server is restriping the file or a permanent error has
occurred on a storage device and the metadata server would like to occurred on a storage device and the metadata server would like to
provide a new layout for the file. Therefore, a clora_changed value provide a new layout for the file. Therefore, a clora_changed value
of TRUE indicates some level of change for the layout and the client of TRUE indicates some level of change for the layout and the client
SHOULD NOT write and commit modified data to the storage devices. In SHOULD NOT write and commit modified data to the storage devices. In
this case, the client writes and commits data through the metadata this case, the client writes and commits data through the metadata
server. server.
See Section 12.5.3 for a description of how the lor_stateid field in See Section 12.5.3 for a description of how the lor_stateid field in
the arguments is to be constructed. Note that the "seqid" field of the arguments is to be constructed. Note that the "seqid" field of
lor_stateid MUST NOT be zero. See Section 8.2, Section 12.5.3, and lor_stateid MUST NOT be zero. See Section 8.2, Section 12.5.3, and
Section 12.5.5.2 for a further discussion and requirements. Section 12.5.5.2 for a further discussion and requirements.
20.3.4. IMPLEMENTATION 20.3.4. IMPLEMENTATION
The client's processing for CB_LAYOUTRECALL is similar to CB_RECALL The client's processing for CB_LAYOUTRECALL is similar to CB_RECALL
(recall of file delegations) in that straightforward processing of (recall of file delegations) in that the client responds to the
the layout recall done and the client responds to the request before request before actually returning layouts via the LAYOUTRETURN
actually returning layouts with the LAYOUTRETURN operation. While operation. While the client responds to the CB_LAYOUTRECALL
the client responds to the CB_LAYOUTRECALL immediately, the operation immediately, the operation is not considered complete (i.e.
is not considered complete (i.e. considered pending) until all considered pending) until all affected layouts are returned to the
affected layouts are returned to the server with the LAYOUTRETURN server via the LAYOUTRETURN operation.
operation.
Before returning the layout to the server with LAYOUTRETURN, the Before returning the layout to the server via LAYOUTRETURN, the
client should wait for the response from in-process or in-flight client should wait for the response from in-process or in-flight
READ, WRITE, or COMMIT operations that use the recalled layout. READ, WRITE, or COMMIT operations that use the recalled layout.
If the client is holding modified data which is effected by a If the client is holding modified data which is affected by a
recalled layout, the client has various options for writing the data recalled layout, the client has various options for writing the data
to the server. As always, the client may write the data through the to the server. As always, the client may write the data through the
metadata server. In fact, the client may not have a choice other metadata server. In fact, the client may not have a choice other
than writing to the metadata server when the clora_changed argument than writing to the metadata server when the clora_changed argument
is TRUE and a new layout is unavailable from the server. However, is TRUE and a new layout is unavailable from the server. However,
the client may be able to write the modified data to the storage the client may be able to write the modified data to the storage
device if the clora_changed argument is FALSE; this needs to be done device if the clora_changed argument is FALSE; this needs to be done
before returning the layout with LAYOUTRETURN. If the client were to before returning the layout via LAYOUTRETURN. If the client were to
obtain a new layout covering the modified data's range, then writing obtain a new layout covering the modified data's range, then writing
to the storage devices is an available alternative. Note that before to the storage devices is an available alternative. Note that before
obtaining a new layout, the client must first return the original obtaining a new layout, the client must first return the original
layout. layout.
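The ordering described in this subsection can be summarized as a step list that a client might build after it has already replied to CB_LAYOUTRECALL. This is a hypothetical sketch: plan_layout_recall and the step strings are illustrative. When clora_changed is FALSE it always chooses the storage devices, even though writing through the metadata server is also allowed. It ignores the option of obtaining a new layout, and it places LAYOUTCOMMIT before LAYOUTRETURN, as the text requires when data was written under the layout.

```python
from typing import List


def plan_layout_recall(clora_changed: bool, inflight_ops: int,
                       has_modified_data: bool) -> List[str]:
    """Ordered steps (illustrative names) before the layout is returned."""
    steps: List[str] = []
    if inflight_ops > 0:
        steps.append("wait for in-flight READ/WRITE/COMMIT")
    if has_modified_data:
        if clora_changed:
            # Layout is changing: write and commit through the metadata server.
            steps.append("write and commit via metadata server")
        else:
            # Layout unchanged: use the storage devices, then LAYOUTCOMMIT,
            # all before LAYOUTRETURN.
            steps.append("write to storage devices")
            steps.append("LAYOUTCOMMIT")
    steps.append("LAYOUTRETURN")
    return steps
```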
In the case of modified data being written while the layout is held, In the case of modified data being written while the layout is held,
the client must use LAYOUTCOMMIT operations at the appropriate time; the client must use LAYOUTCOMMIT operations at the appropriate time;
as required LAYOUTCOMMIT must be done before the LAYOUTRETURN. If a as required LAYOUTCOMMIT must be done before the LAYOUTRETURN. If a
large amount of modified data is outstanding, the client may send large amount of modified data is outstanding, the client may send
LAYOUTRETURNs for portions of the recalled layout; this allows the LAYOUTRETURNs for portions of the recalled layout; this allows the
skipping to change at page 561, line 24 skipping to change at page 564, line 24
to clients about changes to delegated directories. The registration of to clients about changes to delegated directories. The registration of
notifications for the directories occurs when the delegation is notifications for the directories occurs when the delegation is
established using GET_DIR_DELEGATION. These notifications are sent established using GET_DIR_DELEGATION. These notifications are sent
over the backchannel. The notification is sent once the original over the backchannel. The notification is sent once the original
request has been processed on the server. The server will send an request has been processed on the server. The server will send an
array of notifications for changes that might have occurred in the array of notifications for changes that might have occurred in the
directory. The notifications are sent as a list of pairs of bitmaps directory. The notifications are sent as a list of pairs of bitmaps
and values. See Section 3.3.7 for a description of how NFSv4.1 and values. See Section 3.3.7 for a description of how NFSv4.1
bitmaps work. bitmaps work.
If the server has more notifications then can fit in the CB_COMPOUND If the server has more notifications than can fit in the CB_COMPOUND
request, it SHOULD send a sequence of serial CB_COMPOUND requests so request, it SHOULD send a sequence of serial CB_COMPOUND requests so
that the client's view of the directory does not become confused. that the client's view of the directory does not become confused.
E.g., if the server indicates a file named "foo" is added, and that E.g., if the server indicates a file named "foo" is added, and that
the file "foo" is removed, the order it which the client receives the file "foo" is removed, the order in which the client receives
these notifications are processed needs to be the same as the order these notifications needs to be the same as the order in which
in which corresponding operations occurred on the server. corresponding operations occurred on the server.
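Splitting a long run of directory notifications across serial CB_COMPOUND requests only needs to preserve the server's order. A sketch follows. Limiting each batch by a count of notifications is a simplifying assumption; a real server would be bounded by the backchannel's negotiated request size. batch_notifications is a hypothetical name, and each batch would be sent only after the reply to the previous one arrives.

```python
from typing import List, Sequence


def batch_notifications(notifications: Sequence[str],
                        max_per_compound: int) -> List[List[str]]:
    """Split notifications into ordered batches, one per CB_COMPOUND."""
    if max_per_compound < 1:
        raise ValueError("max_per_compound must be at least 1")
    # Slicing in index order keeps the server-side order within and
    # across batches.
    return [list(notifications[i:i + max_per_compound])
            for i in range(0, len(notifications), max_per_compound)]
```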
If the client holding the delegation makes any changes in the If the client holding the delegation makes any changes in the
directory that cause files or subdirectories to be added or removed, directory that cause files or subdirectories to be added or removed,
the server will notify that client of the resulting change(s). If the server will notify that client of the resulting change(s). If
the client holding the delegation is making attribute or cookie the client holding the delegation is making attribute or cookie
verifier changes only, the server does not need to send notifications verifier changes only, the server does not need to send notifications
to that client. The server will send the following information for to that client. The server will send the following information for
each operation: each operation:
NOTIFY4_ADD_ENTRY NOTIFY4_ADD_ENTRY
The server will send information about the new directory entry The server will send information about the new directory entry
being created along with the cookie for that entry. The entry being created along with the cookie for that entry. The entry
information (data type notify_add4) includes the component name of information (data type notify_add4) includes the component name of
the entry and attributes. The server will send this type of entry the entry and attributes. The server will send this type of entry
when a file is actually being created, when an entry is being when a file is actually being created, when an entry is being
added to a directory as a result of a rename across directories added to a directory as a result of a rename across directories
(see below), and when a hard link is being created to an existing (see below), and when a hard link is being created to an existing
file. If this entry is added to the end of the directory, the file. If this entry is added to the end of the directory, the
server will set the nad_last_entry flag to true. If the file is server will set the nad_last_entry flag to TRUE. If the file is
added such that there is at least one entry before it, the server added such that there is at least one entry before it, the server
will also return the previous entry information (nad_prev_entry, a will also return the previous entry information (nad_prev_entry, a
variable length array of up to one element. If the array is of variable length array of up to one element. If the array is of
zero length, there is no previous entry), along with its cookie. zero length, there is no previous entry), along with its cookie.
This is to help clients find the right location in their DNLC or This is to help clients find the right location in their file name
directory caches where this entry should be cached. If the new caches and directory caches where this entry should be cached. If
entry's cookie is available, it will be in nad_new_entry_cookie the new entry's cookie is available, it will be in the
(another variable length array of up to one element). If the nad_new_entry_cookie (another variable length array of up to one
addition of the entry causes another entry to be deleted (which element) field. If the addition of the entry causes another entry
can only happen in the rename case) atomically with the addition, to be deleted (which can only happen in the rename case)
then information on this entry is reported in nad_old_entry. atomically with the addition, then information on this entry is
reported in nad_old_entry.
NOTIFY4_REMOVE_ENTRY NOTIFY4_REMOVE_ENTRY
The server will send information about the directory entry being The server will send information about the directory entry being
deleted. The server will also send the cookie value for the deleted. The server will also send the cookie value for the
deleted entry so that clients can get to the cached information deleted entry so that clients can get to the cached information
for this entry. for this entry.
NOTIFY4_RENAME_ENTRY NOTIFY4_RENAME_ENTRY
The server will send information about both the old entry and the The server will send information about both the old entry and the
new entry. This includes name and attributes for each entry. In new entry. This includes name and attributes for each entry. In
skipping to change at page 563, line 32 skipping to change at page 566, line 32
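The NOTIFY4_ADD_ENTRY placement rule in the preceding hunk works as follows. The new entry goes directly after the entry named in nad_prev_entry, or at the head when that array is empty. The following is a minimal sketch, not the specification's method. It assumes a hypothetical client directory cache kept as a linked list in server order, and it assumes the new entry's cookie is available. The names dir_cache, dir_entry and dir_cache_add are illustrative only.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint64_t nfs_cookie4;            /* cookies are 64-bit in NFSv4 */

/* Hypothetical cached directory entry, kept in server order. */
struct dir_entry {
    nfs_cookie4 cookie;
    char name[256];                      /* simplified component name */
    struct dir_entry *next;
};

struct dir_cache {
    struct dir_entry *head;
};

/* Apply one NOTIFY4_ADD_ENTRY. has_prev is 1 when nad_prev_entry has its
 * single element, 0 when the array is empty (new entry is first).
 * Returns 0 on success, -1 if the previous entry is not in the cache or
 * memory is exhausted; a real client would then refetch the directory. */
static int dir_cache_add(struct dir_cache *dc, const char *name,
                         nfs_cookie4 new_cookie,
                         int has_prev, nfs_cookie4 prev_cookie)
{
    struct dir_entry **link = &dc->head;

    if (has_prev) {
        struct dir_entry *p = dc->head;
        while (p != NULL && p->cookie != prev_cookie)
            p = p->next;
        if (p == NULL)
            return -1;                   /* cache out of sync with server */
        link = &p->next;                 /* insert right after prev entry */
    }

    struct dir_entry *e = calloc(1, sizeof(*e));
    if (e == NULL)
        return -1;
    e->cookie = new_cookie;
    strncpy(e->name, name, sizeof(e->name) - 1);
    e->next = *link;
    *link = e;
    return 0;
}
```

Failing on an unknown previous entry, instead of guessing a position, keeps the cache from silently diverging from the server's ordering.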
20.5.2. RESULT 20.5.2. RESULT
struct CB_PUSH_DELEG4res { struct CB_PUSH_DELEG4res {
nfsstat4 cpdr_status; nfsstat4 cpdr_status;
}; };
20.5.3. DESCRIPTION 20.5.3. DESCRIPTION
CB_PUSH_DELEG is used by the server to both signal to the client that CB_PUSH_DELEG is used by the server to both signal to the client that
the delegation it wants is available and to simultaneously offer the the delegation it wants (previously indicated via a want established
delegation to the client. The client has the choice of accepting the from an OPEN or WANT_DELEGATION operation) is available and to
delegation by returning NFS4_OK to the server, delaying the decision simultaneously offer the delegation to the client. The client has
to accept the offered delegation by returning NFS4ERR_DELAY or the choice of accepting the delegation by returning NFS4_OK to the
permanently rejecting the offer of the delegation by returning server, delaying the decision to accept the offered delegation by
NFS4ERR_REJECT_DELEG. When a delegation is rejected in this fashion, returning NFS4ERR_DELAY or permanently rejecting the offer of the
the want previously established is permanently deleted. delegation by returning NFS4ERR_REJECT_DELEG. When a delegation is
rejected in this fashion, the want previously established is
The server MUST send in cpda_delegation a delegation which satisfies permanently deleted and the delegation is subject to acquisition by
a request made in an OPEN or WANT_DELEGATION operation. another client.
20.5.4. IMPLEMENTATION 20.5.4. IMPLEMENTATION
If the client does return NFS4ERR_DELAY and there is a conflicting If the client does return NFS4ERR_DELAY and there is a conflicting
delegation request, the server MAY process it at the expense of the delegation request, the server MAY process it at the expense of the
client that returned NFS4ERR_DELAY. The client's want will typically client that returned NFS4ERR_DELAY. The client's want will typically
not be cancelled, but MAY be processed behind other delegation requests not be cancelled, but MAY be processed behind other delegation requests
or registered wants. or registered wants.
When a client returns a status other than NFS4_OK, NFS4ERR_DELAY, or When a client returns a status other than NFS4_OK, NFS4ERR_DELAY, or
NFS4ERR_REJECT_DELEG, the want remains pending, although servers may NFS4ERR_REJECT_DELEG, the want remains pending, although servers may
decide to cancel the want by sending a CB_WANTS_CANCELLED. decide to cancel the want by sending a CB_WANTS_CANCELLED.
20.6. Operation 8: CB_RECALL_ANY - Keep any N delegations 20.6. Operation 8: CB_RECALL_ANY - Keep any N delegations
Notify client to return delegation and keep N of them. Notify client to return all but N delegations.
20.6.1. ARGUMENT 20.6.1. ARGUMENT
const RCA4_TYPE_MASK_RDATA_DLG = 0; const RCA4_TYPE_MASK_RDATA_DLG = 0;
const RCA4_TYPE_MASK_WDATA_DLG = 1; const RCA4_TYPE_MASK_WDATA_DLG = 1;
const RCA4_TYPE_MASK_DIR_DLG = 2; const RCA4_TYPE_MASK_DIR_DLG = 2;
const RCA4_TYPE_MASK_FILE_LAYOUT = 3; const RCA4_TYPE_MASK_FILE_LAYOUT = 3;
const RCA4_TYPE_MASK_BLK_LAYOUT_MIN = 4; const RCA4_TYPE_MASK_BLK_LAYOUT = 4;
const RCA4_TYPE_MASK_BLK_LAYOUT_MAX = 7;
const RCA4_TYPE_MASK_OBJ_LAYOUT_MIN = 8; const RCA4_TYPE_MASK_OBJ_LAYOUT_MIN = 8;
const RCA4_TYPE_MASK_OBJ_LAYOUT_MAX = 11; const RCA4_TYPE_MASK_OBJ_LAYOUT_MAX = 9;
const RCA4_TYPE_MASK_OTHER_LAYOUT_MIN = 12; const RCA4_TYPE_MASK_OTHER_LAYOUT_MIN = 12;
const RCA4_TYPE_MASK_OTHER_LAYOUT_MAX = 15; const RCA4_TYPE_MASK_OTHER_LAYOUT_MAX = 15;
struct CB_RECALL_ANY4args { struct CB_RECALL_ANY4args {
uint32_t craa_objects_to_keep; uint32_t craa_objects_to_keep;
bitmap4 craa_type_mask; bitmap4 craa_type_mask;
}; };
20.6.2. RESULT 20.6.2. RESULT
skipping to change at page 565, line 23 skipping to change at page 568, line 23
resource pools for layouts and for delegations, or further separate resource pools for layouts and for delegations, or further separate
resources by types of delegations. resources by types of delegations.
When a given resource pool is over-utilized, the server can send a When a given resource pool is over-utilized, the server can send a
CB_RECALL_ANY to clients holding recallable objects of the types CB_RECALL_ANY to clients holding recallable objects of the types
involved, allowing it to keep a certain number of such objects and involved, allowing it to keep a certain number of such objects and
return any excess. A mask specifies which types of objects are to be return any excess. A mask specifies which types of objects are to be
limited. The client chooses, based on its own knowledge of current limited. The client chooses, based on its own knowledge of current
usefulness, which of the objects in that class should be returned. usefulness, which of the objects in that class should be returned.
For NFSv4.1, a number of bits are defined. For some of these, ranges A number of bits are defined. For some of these, ranges are defined
are defined and it is up to the definition of the storage protocol to and it is up to the definition of the storage protocol to specify how
specify how these are to be used. There are ranges for blocks-based these are to be used. There are ranges reserved for object-based
storage protocols, for object-based storage protocols and a reserved storage protocols and for other experimental storage protocols. An
range for other experimental storage protocols. The RFC defining RFC defining such a storage protocol needs to specify how particular
such a storage protocol needs to specify how particular bits within bits within its range are to be used. For example, it may specify a
its range are to be used. For example, it may specify a mapping mapping between attributes of the layout (read vs. write, size of
between attributes of the layout (read vs. write, size of area) and area) and the bit to be used or it may define a field in the layout
the bit to be used or it may define a field in the layout where the where the associated bit position is made available by the server to
associated bit position is made available by the server to the the client.
client.
When an undefined bit is set in the type mask, NFS4ERR_INVAL should RCA4_TYPE_MASK_RDATA_DLG
be returned. If a client does not support an object of the specified
type, if the bit is defined, NFS4ERR_INVAL should not be returned. The client is to return read delegations on non-directory file
Future minor versions of NFSv4 may expand the set of valid type mask objects.
bits.
RCA4_TYPE_MASK_WDATA_DLG
The client is to return write delegations on regular file objects.
RCA4_TYPE_MASK_DIR_DLG
The client is to return directory delegations.
RCA4_TYPE_MASK_FILE_LAYOUT
The client is to return layouts of type LAYOUT4_NFSV4_1_FILES.
RCA4_TYPE_MASK_BLK_LAYOUT
See [31] for a description.
RCA4_TYPE_MASK_OBJ_LAYOUT_MIN to RCA4_TYPE_MASK_OBJ_LAYOUT_MAX
See [30] for a description.
RCA4_TYPE_MASK_OTHER_LAYOUT_MIN to RCA4_TYPE_MASK_OTHER_LAYOUT_MAX
This range is reserved for telling the client to recall layouts of
experimental or site specific layout types (see Section 3.3.13).
When a bit is set in the type mask that corresponds to an undefined
type of recallable object, NFS4ERR_INVAL MUST be returned. When a
bit is set that corresponds to a defined type of object, but the
client does not support an object of the type, NFS4ERR_INVAL MUST NOT
be returned. Future minor versions of NFSv4 may expand the set of
valid type mask bits.
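The validity rule for craa_type_mask can be checked mechanically against the bit positions in the right-hand (-23) column. The sketch below makes three assumptions: bitmap4 bit n lives in word n/32 at position n%32, only the bits defined above are valid, and no bits beyond word 0 are defined. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit positions from the -23 column of section 20.6.1. */
enum {
    RCA4_TYPE_MASK_RDATA_DLG        = 0,
    RCA4_TYPE_MASK_WDATA_DLG        = 1,
    RCA4_TYPE_MASK_DIR_DLG          = 2,
    RCA4_TYPE_MASK_FILE_LAYOUT      = 3,
    RCA4_TYPE_MASK_BLK_LAYOUT       = 4,
    RCA4_TYPE_MASK_OBJ_LAYOUT_MIN   = 8,
    RCA4_TYPE_MASK_OBJ_LAYOUT_MAX   = 9,
    RCA4_TYPE_MASK_OTHER_LAYOUT_MIN = 12,
    RCA4_TYPE_MASK_OTHER_LAYOUT_MAX = 15
};

/* Set of defined bits in word 0 of the bitmap. */
static uint32_t rca_defined_bits(void)
{
    uint32_t m = 0;
    for (int b = RCA4_TYPE_MASK_RDATA_DLG; b <= RCA4_TYPE_MASK_BLK_LAYOUT; b++)
        m |= 1u << b;
    for (int b = RCA4_TYPE_MASK_OBJ_LAYOUT_MIN; b <= RCA4_TYPE_MASK_OBJ_LAYOUT_MAX; b++)
        m |= 1u << b;
    for (int b = RCA4_TYPE_MASK_OTHER_LAYOUT_MIN; b <= RCA4_TYPE_MASK_OTHER_LAYOUT_MAX; b++)
        m |= 1u << b;
    return m;
}

/* Returns 1 if any undefined bit is set (client MUST return NFS4ERR_INVAL).
 * A defined bit for an object type the client does not support is NOT an
 * error, so it passes this check. */
static int rca_mask_has_undefined_bits(const uint32_t *words, size_t nwords)
{
    if (nwords == 0)
        return 0;
    if (words[0] & ~rca_defined_bits())
        return 1;
    for (size_t i = 1; i < nwords; i++)
        if (words[i] != 0)
            return 1;
    return 0;
}
```

With the -23 values, the defined bits in word 0 form the mask 0xF31F. Bits 5 to 7, 10 to 11, and 16 upward are undefined.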
CB_RECALL_ANY specifies a count of objects that the client may keep CB_RECALL_ANY specifies a count of objects that the client may keep
as opposed to a count that the client must return. This is to avoid as opposed to a count that the client must return. This is to avoid
a potential race between a CB_RECALL_ANY that had a count of objects to a potential race between a CB_RECALL_ANY that had a count of objects to
free with a set of client-originated operations to return layouts or free with a set of client-originated operations to return layouts or
delegations. As a result of the race, the client and server would delegations. As a result of the race, the client and server would
have differing ideas as to how many objects to return. Hence the have differing ideas as to how many objects to return. Hence the
client could mistakenly free too many. client could mistakenly free too many.
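A small worked case shows why a keep count avoids the race. Suppose the server sees 120 delegations and sends CB_RECALL_ANY with craa_objects_to_keep = 100. Meanwhile, the client voluntarily returns 15 delegations, so it holds 105 when it processes the callback. With a keep count, the client returns 5 and ends at 100. A "return 20" count would instead leave it at 85. The helper below is a hypothetical sketch of that computation.

```c
#include <stdint.h>

/* held_now: objects of the masked types held when CB_RECALL_ANY is
 * processed. Returns how many to give back so at most
 * craa_objects_to_keep remain. */
static uint32_t objects_to_return(uint32_t held_now, uint32_t craa_objects_to_keep)
{
    return held_now > craa_objects_to_keep ? held_now - craa_objects_to_keep : 0;
}
```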
If resource demands prompt it, the server may send another If resource demands prompt it, the server may send another
skipping to change at page 567, line 18 skipping to change at page 570, line 46
nfsstat4 croa_status; nfsstat4 croa_status;
}; };
20.7.3. DESCRIPTION 20.7.3. DESCRIPTION
CB_RECALLABLE_OBJ_AVAIL is used by the server to signal the client CB_RECALLABLE_OBJ_AVAIL is used by the server to signal the client
that the server has resources to grant recallable objects that might that the server has resources to grant recallable objects that might
previously have been denied by OPEN, WANT_DELEGATION, GET_DIR_DELEG, previously have been denied by OPEN, WANT_DELEGATION, GET_DIR_DELEG,
or LAYOUTGET. or LAYOUTGET.
The argument, objects_to_keep means the total number of recallable The argument craa_objects_to_keep means the total number of
objects of the types indicated in the argument type_mask that the recallable objects of the types indicated in the argument type_mask
server believes it can allow the client to have, including the number that the server believes it can allow the client to have, including
of such objects the client already has. A client that tries to the number of such objects the client already has. A client that
acquire more recallable objects than the server informs it can have tries to acquire more recallable objects than the server informs it
runs the risk of having objects recalled. can have runs the risk of having objects recalled.
The server is not obligated to reserve the difference between the
number of the objects the client currently has and the value of
craa_objects_to_keep, nor does delaying the reply to
CB_RECALLABLE_OBJ_AVAIL prevent the server from using the resources
of the recallable objects for another purpose. Indeed, if a client
responds slowly to CB_RECALLABLE_OBJ_AVAIL, the server might
interpret the client as having reduced capability to manage
recallable objects, and so cancel or reduce any reservation it is
maintaining on behalf of the client. Thus if the client desires to
acquire more recallable objects, it needs to reply quickly to
CB_RECALLABLE_OBJ_AVAIL, and then send the appropriate operations to
acquire recallable objects.
20.8. Operation 10: CB_RECALL_SLOT - change flow control limits 20.8. Operation 10: CB_RECALL_SLOT - change flow control limits
Change flow control limits Change flow control limits
20.8.1. ARGUMENT 20.8.1. ARGUMENT
struct CB_RECALL_SLOT4args { struct CB_RECALL_SLOT4args {
slotid4 rsa_target_highest_slotid; slotid4 rsa_target_highest_slotid;
}; };
skipping to change at page 567, line 45 skipping to change at page 571, line 40
20.8.2. RESULT 20.8.2. RESULT
struct CB_RECALL_SLOT4res { struct CB_RECALL_SLOT4res {
nfsstat4 rsr_status; nfsstat4 rsr_status;
}; };
20.8.3. DESCRIPTION 20.8.3. DESCRIPTION
The CB_RECALL_SLOT operation requests the client to return session The CB_RECALL_SLOT operation requests the client to return session
slots, and if applicable, transport credits (e.g. RDMA credits for slots, and if applicable, transport credits (e.g. RDMA credits for
connections associated with the operations channel) to the server. connections associated with the operations channel) of the session's
CB_RECALL_SLOT specifies rsa_target_highest_slotid, the target fore channel. CB_RECALL_SLOT specifies rsa_target_highest_slotid,
highest_slot the server wants for the session. The client, should the value of the target highest slot id the server wants for the
then work toward reducing the highest_slot to the target. session. The client MUST then progress toward reducing the session's
highest slot id to the target value.
If the session has only non-RDMA connections associated with its If the session has only non-RDMA connections associated with its
operations channel, then the client need only wait for all operations channel, then the client need only wait for all
outstanding requests with a slotid > rsa_target_highest_slotid to outstanding requests with a slotid > rsa_target_highest_slotid to
complete, then send a single COMPOUND consisting of a single SEQUENCE complete, then send a single COMPOUND consisting of a single SEQUENCE
operation, with the sa_highestslot field set to operation, with the sa_highestslot field set to
rsa_target_highest_slotid. If there are RDMA-based connections rsa_target_highest_slotid. If there are RDMA-based connections
associated with the operations channel, then the client needs to also send associated with the operations channel, then the client needs to also send
enough zero-length RDMA Sends to take the total RDMA credit count to enough zero-length RDMA Sends to take the total RDMA credit count to
rsa_target_highest_slotid + 1 or below. rsa_target_highest_slotid + 1 or below.
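For the non-RDMA case, the client sends its shrinking COMPOUND (a single SEQUENCE with sa_highestslot set to rsa_target_highest_slotid) only after every request on a slot above the target has completed. The sketch assumes a hypothetical slot table with one in-use flag per fore-channel slot, and it does not model RDMA credits. Until the check succeeds, a client would also stop assigning new requests to slots above the target.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t slotid4;

/* Hypothetical fore-channel slot table: in_use[i] is true while the
 * request sent on slot i is awaiting its reply. */
struct slot_table {
    bool   *in_use;
    size_t  nslots;
};

/* True once no request with slotid > target is outstanding. */
static bool ready_to_shrink(const struct slot_table *t, slotid4 target)
{
    for (size_t i = (size_t)target + 1; i < t->nslots; i++)
        if (t->in_use[i])
            return false;
    return true;
}
```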
skipping to change at page 569, line 26 skipping to change at page 573, line 26
case NFS4_OK: case NFS4_OK:
CB_SEQUENCE4resok csr_resok4; CB_SEQUENCE4resok csr_resok4;
default: default:
void; void;
}; };
20.9.3. DESCRIPTION 20.9.3. DESCRIPTION
The CB_SEQUENCE operation is used to manage operational accounting The CB_SEQUENCE operation is used to manage operational accounting
for the backchannel of the session on which a request is sent. The for the backchannel of the session on which a request is sent. The
contents include the session to which this request belongs, slot id contents include the session id to which this request belongs, the
and sequence id used by the server to implement session request slot id and sequence id used by the server to implement session
control and exactly once semantics, and exchanged slot maximums which request control and exactly once semantics, and exchanged slot id
are used to adjust the size of the reply cache. This operation MUST maxima which are used to adjust the size of the reply cache. This
appear once as the first operation in each CB_COMPOUND request or a operation will appear once as the first operation in each CB_COMPOUND
protocol error must result. See Section 18.46.3 for a description of request or a protocol error MUST result. See Section 18.46.3 for a
how slots are processed. description of how slots are processed.
If csa_cachethis is TRUE, then the server is requesting that the If csa_cachethis is TRUE, then the server is requesting that the
client cache the reply in the callback reply cache. The client MUST client cache the reply in the callback reply cache. The client MUST
cache the reply (see Section 2.10.5.1.3). cache the reply (see Section 2.10.5.1.3).
The csa_referring_call_lists array is the list of COMPOUND requests, The csa_referring_call_lists array is the list of COMPOUND requests,
identified by sessionid, slot id and sequence id. These are requests identified by sessionid, slot id and sequence id. These are requests
that the client previously sent to the server. These previous that the client previously sent to the server. These previous
requests created state that some operation(s) in the same CB_COMPOUND requests created state that some operation(s) in the same CB_COMPOUND
as the csa_referring_call_lists is identifying. A sessionid is as the csa_referring_call_lists are identifying. A session id is
included because leased state is tied to a client ID, and a client ID included because leased state is tied to a client ID, and a client ID
can have multiple sessions. See Section 2.10.5.3. can have multiple sessions. See Section 2.10.5.3.
The value of csa_sequenceid argument relative to the cached sequence The value of the csa_sequenceid argument relative to the cached
id on the slot falls into one of three cases. sequence id on the slot falls into one of three cases.
o If the difference between csa_sequenceid and the client's cached o If the difference between csa_sequenceid and the client's cached
sequence id at the slot id is two (2) or more, or if sequence id at the slot id is two (2) or more, or if
csa_sequenceid is less than the cached sequence id (accounting for csa_sequenceid is less than the cached sequence id (accounting for
wraparound of the unsigned sequence id value), then the client wraparound of the unsigned sequence id value), then the client
MUST return NFS4ERR_SEQ_MISORDERED. MUST return NFS4ERR_SEQ_MISORDERED.
o If csa_sequenceid and the cached sequence id are the same, this is o If csa_sequenceid and the cached sequence id are the same, this is
a retry, and the client returns the CB_COMPOUND request's cached a retry, and the client returns the CB_COMPOUND request's cached
reply. reply.
skipping to change at page 570, line 36 skipping to change at page 574, line 36
id, cached reply) MUST NOT change. id, cached reply) MUST NOT change.
The client returns two "highest_slotid" values: csr_highest_slotid, The client returns two "highest_slotid" values: csr_highest_slotid,
and csr_target_highest_slotid. The former is the highest slot id the and csr_target_highest_slotid. The former is the highest slot id the
client will accept in a future CB_SEQUENCE operation, and SHOULD NOT client will accept in a future CB_SEQUENCE operation, and SHOULD NOT
be less than the value of csa_highest_slotid (but see be less than the value of csa_highest_slotid (but see
Section 2.10.5.1 for an exception). The latter is the highest slot Section 2.10.5.1 for an exception). The latter is the highest slot
id the client would prefer the server use on a future CB_SEQUENCE id the client would prefer the server use on a future CB_SEQUENCE
operation. operation.
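The csa_sequenceid check described above can be sketched with unsigned arithmetic. This excerpt shows only two of the three cases because the diff skips the third. In the full specification, a csa_sequenceid exactly one greater than the cached value (with wraparound) is a new request. The enum and function names below are hypothetical.

```c
#include <stdint.h>

typedef uint32_t sequenceid4;

enum seq_verdict {
    SEQ_NEW_REQUEST,   /* one greater than cached: execute, then cache reply */
    SEQ_RETRY,         /* equal to cached: return the cached reply */
    SEQ_MISORDERED     /* anything else: NFS4ERR_SEQ_MISORDERED */
};

static enum seq_verdict cb_sequence_check(sequenceid4 cached, sequenceid4 csa_sequenceid)
{
    /* uint32_t subtraction wraps modulo 2^32: 0 following 0xFFFFFFFF
     * gives 1, and a sequence id behind the cached one gives a huge value. */
    uint32_t diff = csa_sequenceid - cached;
    if (diff == 0)
        return SEQ_RETRY;
    if (diff == 1)
        return SEQ_NEW_REQUEST;
    return SEQ_MISORDERED;
}
```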
20.9.4. IMPLEMENTATION
20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending Delegation 20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending Delegation
Wants Wants
Retracts promise to signal delegation availability. Retracts promise to signal delegation availability.
20.10.1. ARGUMENT 20.10.1. ARGUMENT
struct CB_WANTS_CANCELLED4args { struct CB_WANTS_CANCELLED4args {
bool cwca_contended_wants_cancelled; bool cwca_contended_wants_cancelled;
bool cwca_resourced_wants_cancelled; bool cwca_resourced_wants_cancelled;
skipping to change at page 572, line 13 skipping to change at page 576, line 13
}; };
20.11.2. RESULT 20.11.2. RESULT
struct CB_NOTIFY_LOCK4res { struct CB_NOTIFY_LOCK4res {
nfsstat4 cnlr_status; nfsstat4 cnlr_status;
}; };
20.11.3. DESCRIPTION 20.11.3. DESCRIPTION
The server can use this operation to indicate that a lock for the The server can use this operation to indicate that a byte-range lock
given file and lock-owner, previously requested by the client via an for the given file and lock-owner, previously requested by the client
unsuccessful LOCK request, might be available. via an unsuccessful LOCK request, might be available.
This callback is meant to be used by servers to help reduce the This callback is meant to be used by servers to help reduce the
latency of blocking locks in the case where they recognize that a latency of blocking locks in the case where they recognize that a
client which has been polling for a blocking lock may now be able to client which has been polling for a blocking lock may now be able to
acquire the lock. If the server supports this callback for a given acquire the lock. If the server supports this callback for a given
file, it MUST set the OPEN4_RESULT_MAY_NOTIFY_LOCK flag when file, it MUST set the OPEN4_RESULT_MAY_NOTIFY_LOCK flag when
responding to successful opens for that file. This does not commit responding to successful opens for that file. This does not commit
the server to use of CB_NOTIFY_LOCK, but the client may use this as a the server to the use of CB_NOTIFY_LOCK, but the client may use this
hint to decide how frequently to poll for locks derived from that as a hint to decide how frequently to poll for locks derived from
open. that open.
If an OPEN operation results in an upgrade, in which the stateid If an OPEN operation results in an upgrade, in which the stateid
returned has an "other" value matching that of a stateid already returned has an "other" value matching that of a stateid already
allocated, with a new "seqid" indicating a change in the lock being allocated, with a new "seqid" indicating a change in the lock being
represented, then the value of the OPEN4_RESULT_MAY_NOTIFY_LOCK flag represented, then the value of the OPEN4_RESULT_MAY_NOTIFY_LOCK flag
when responding to that new OPEN controls handling from that point when responding to that new OPEN controls handling from that point
going forward. When parallel OPENs are done on the same file and going forward. When parallel OPENs are done on the same file and
open-owner, the ordering of the "seqid" field of the returned stateid open-owner, the ordering of the "seqid" field of the returned stateid
(subject to wraparound) is to be used to select the controlling (subject to wraparound) is to be used to select the controlling
value of the OPEN4_RESULT_MAY_NOTIFY_LOCK flag. value of the OPEN4_RESULT_MAY_NOTIFY_LOCK flag.
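When replies to parallel OPENs arrive out of order, the client needs to know which reply's OPEN4_RESULT_MAY_NOTIFY_LOCK flag controls. The sketch below picks the flag from the reply whose stateid "seqid" is newest. It assumes that "subject to wraparound" means serial-number style comparison of the 32-bit counter. The struct open_reply type and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of one OPEN reply for the same file and open-owner. */
struct open_reply {
    uint32_t seqid;            /* "seqid" field of the returned stateid */
    bool     may_notify_lock;  /* OPEN4_RESULT_MAY_NOTIFY_LOCK was set */
};

/* True if a is later than b in serial-number order (wraparound-safe). */
static bool seqid_newer(uint32_t a, uint32_t b)
{
    uint32_t d = a - b;
    return d != 0 && d < 0x80000000u;
}

/* n must be at least 1. Returns the flag from the newest reply. */
static bool controlling_may_notify(const struct open_reply *r, int n)
{
    int best = 0;
    for (int i = 1; i < n; i++)
        if (seqid_newer(r[i].seqid, r[best].seqid))
            best = i;
    return r[best].may_notify_lock;
}
```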
20.11.4. IMPLEMENTATION 20.11.4. IMPLEMENTATION
The server must not grant the lock to the client unless and until it The server MUST NOT grant the lock to the client unless and until it
receives an actual lock request from the client. Similarly, the receives an actual LOCK request from the client. Similarly, the
client receiving this callback cannot assume that it now has the client receiving this callback cannot assume that it now has the
lock, or that a subsequent request for the lock will be successful. lock, or that a subsequent LOCK request for the lock will be
successful.
The server is not required to implement this callback, and even if it The server is not required to implement this callback, and even if it
does, it is not required to use it in any particular case. Therefore does, it is not required to use it in any particular case. Therefore
the client must still rely on polling for blocking locks, as the client must still rely on polling for blocking locks, as
described in Section 9.6. described in Section 9.6.
Similarly, the client is not required to implement this callback, and Similarly, the client is not required to implement this callback, and
even if it does, is still free to ignore it. Therefore the server MUST even if it does, is still free to ignore it. Therefore the server MUST
NOT assume that the client will act based on the callback. NOT assume that the client will act based on the callback.
skipping to change at page 573, line 46 skipping to change at page 577, line 47
20.12.2. RESULT 20.12.2. RESULT
struct CB_NOTIFY_DEVICEID4res { struct CB_NOTIFY_DEVICEID4res {
nfsstat4 cndr_status; nfsstat4 cndr_status;
}; };
20.12.3. DESCRIPTION 20.12.3. DESCRIPTION
The CB_NOTIFY_DEVICEID operation is used by the server to send The CB_NOTIFY_DEVICEID operation is used by the server to send
notifications to clients about changes to pNFS device IDs. The notifications to clients about changes to pNFS device IDs. The
registration of device ID notifications occurs when the device registration of device ID notifications is optional and is done via
mapping stateid is established using GETDEVICEINFO or GETDEVICELIST. GETDEVICEINFO. These notifications are sent over the backchannel
These notifications are sent over the backchannel. The notification once the original request has been processed on the server. The
is sent once the original request has been processed on the server. server will send an array of notifications, cnda_changes, as a list
The server will send an array of notifications, cnda_changes, as a of pairs of bitmaps and values. See Section 3.3.7 for a description
list of pairs of bitmaps and values. See Section 3.3.7 for a of how NFSv4.1 bitmaps work.
description of how NFSv4.1 bitmaps work.
As with CB_NOTIFY (Section 20.4.3), it is possible the server has As with CB_NOTIFY (Section 20.4.3), it is possible the server has
more notifications than can fit in a CB_COMPOUND, thus requiring more notifications than can fit in a CB_COMPOUND, thus requiring
multiple CB_COMPOUNDs. Unlike CB_NOTIFY, serialization is not an multiple CB_COMPOUNDs. Unlike CB_NOTIFY, serialization is not an
issue because unlike directory entries, device IDs cannot be re-used issue because unlike directory entries, device IDs cannot be re-used
after being deleted (Section 12.2.10). after being deleted (Section 12.2.10).
All device ID notifications contain a device ID and a layout type. All device ID notifications contain a device ID and a layout type.
The layout type is necessary because two different layout types can The layout type is necessary because two different layout types can
share the same device ID, and the common device ID can have share the same device ID, and the common device ID can have
completely different mappings for each layout type. completely different mappings for each layout type.
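Because the same device ID can map to unrelated addresses under different layout types, a client's device-address cache has to key on the pair, not on the device ID alone. The sketch below assumes deviceid4 is a 16-byte opaque value (NFS4_DEVICEID4_SIZE) and represents layouttype4 as a uint32_t. The struct dev_key type and dev_key_equal function are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NFS4_DEVICEID4_SIZE 16          /* deviceid4 is opaque[16] */

typedef uint32_t layouttype4;

/* Hypothetical cache key: (layout type, device ID). */
struct dev_key {
    layouttype4   type;
    unsigned char id[NFS4_DEVICEID4_SIZE];
};

/* Two notifications refer to the same cached mapping only if both the
 * layout type and all 16 bytes of the device ID match. */
static bool dev_key_equal(const struct dev_key *a, const struct dev_key *b)
{
    return a->type == b->type &&
           memcmp(a->id, b->id, NFS4_DEVICEID4_SIZE) == 0;
}
```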
The server will send the following notifications: The server will send the following notifications:
NOTIFY_DEVICEID4_CHANGE NOTIFY_DEVICEID4_CHANGE
A previously provided device ID to device address mapping has A previously provided device ID to device address mapping has
changed and the client uses GETDEVICEINFO or GETDEVICELIST to changed and the client uses GETDEVICEINFO to obtain the updated
obtain the updated mapping. The notification is encoded in a mapping. The notification is encoded in a value of data type
value of data type notify_deviceid_change4. This data type also notify_deviceid_change4. This data type also contains a boolean
contains a boolean field, ndc_immediate, which if TRUE indicates field, ndc_immediate, which if TRUE indicates that the change will
that the change will be enforced immediately, and so the client be enforced immediately, and so the client might not be able to
might not be able to complete any pending I/O to the device ID. complete any pending I/O to the device ID. If ndc_immediate is
If ndc_immediate is FALSE, then for an indefinite time, the client FALSE, then for an indefinite time, the client can complete
can complete pending I/O. After pending I/O is complete, the pending I/O. After pending I/O is complete, the client SHOULD get
client SHOULD get the new device ID to device address mappings the new device ID to device address mappings before issuing new
before issuing new I/O to the device ID. I/O to the device ID.
NOTIFY4_DEVICEID_DELETE NOTIFY4_DEVICEID_DELETE
Deletes a device ID from the mappings. This notification MUST NOT Deletes a device ID from the mappings. This notification MUST NOT
be sent if the client has a layout that refers to the device ID. be sent if the client has a layout that refers to the device ID.
In other words if the server is sending a delete device ID In other words if the server is sending a delete device ID
notification, one of the following is true for layouts associated notification, one of the following is true for layouts associated
with the layout type: with the layout type:
* The client never had a layout referring to that device ID. * The client never had a layout referring to that device ID.
skipping to change at page 575, line 23 skipping to change at page 579, line 23
/* /*
* CB_ILLEGAL: Response for illegal operation numbers * CB_ILLEGAL: Response for illegal operation numbers
*/ */
struct CB_ILLEGAL4res { struct CB_ILLEGAL4res {
nfsstat4 status; nfsstat4 status;
}; };
20.13.3. DESCRIPTION 20.13.3. DESCRIPTION
This operation is a placeholder for encoding a result to handle the This operation is a placeholder for encoding a result to handle the
case of the client sending an operation code within COMPOUND that is case of the server sending an operation code within CB_COMPOUND that
not defined in the NFSv4.1 specification. See Section 16.2.3 for is not defined in the NFSv4.1 specification. See Section 19.2.3 for
more details. more details.
The status field of CB_ILLEGAL4res MUST be set to NFS4ERR_OP_ILLEGAL. The status field of CB_ILLEGAL4res MUST be set to NFS4ERR_OP_ILLEGAL.
20.13.4. IMPLEMENTATION 20.13.4. IMPLEMENTATION
A server will probably not send an operation with code OP_CB_ILLEGAL A server will probably not send an operation with code OP_CB_ILLEGAL
but if it does, the response will be CB_ILLEGAL4res just as it would but if it does, the response will be CB_ILLEGAL4res just as it would
be with any other invalid operation code. Note that if the client be with any other invalid operation code. Note that if the client
gets an illegal operation code that is not OP_ILLEGAL, and if the gets an illegal operation code that is not OP_ILLEGAL, and if the
client checks for legal operation codes during the XDR decode phase, client checks for legal operation codes during the XDR decode phase,
then the CB_ILLEGAL4res would not be returned. then an instance of data type CB_ILLEGAL4res will not be returned.
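The following sketches a client that decodes first and checks the opcode afterwards, so that any undefined opcode, including OP_CB_ILLEGAL itself, is answered with CB_ILLEGAL4res. The sketch assumes the NFSv4.1 callback operation numbers run from 3 (CB_GETATTR) to 14 (CB_NOTIFY_DEVICEID). This excerpt itself shows only 8, 10 and 12. The enum and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed range of defined NFSv4.1 callback operation numbers. */
enum { CB_OP_FIRST = 3, CB_OP_LAST = 14 };

enum cb_reply_kind {
    REPLY_NORMAL,      /* dispatch to the operation's handler */
    REPLY_CB_ILLEGAL   /* encode CB_ILLEGAL4res with NFS4ERR_OP_ILLEGAL */
};

/* Called after XDR decoding of the opcode, in the dispatcher. */
static enum cb_reply_kind classify_cb_op(uint32_t op)
{
    if (op >= CB_OP_FIRST && op <= CB_OP_LAST)
        return REPLY_NORMAL;
    return REPLY_CB_ILLEGAL;
}
```

A client that rejects unknown opcodes during XDR decoding never reaches this check, which is why the text notes that CB_ILLEGAL4res is not returned in that case.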
21. Security Considerations 21. Security Considerations
NFS has historically used a model where, from an authentication NFS has historically used a model where, from an authentication
perspective, the client was the entire machine, or at least the perspective, the client was the entire machine, or at least the
source network address of the machine. The NFS server relied on the source network address of the machine. The NFS server relied on the
NFS client to make the proper authentication of the end-user. The NFS client to make the proper authentication of the end-user. The
NFS server in turn shared its files only to specific clients, as NFS server in turn shared its files only to specific clients, as
identified by the client's source network address. Given this model, identified by the client's source network address. Given this model,
the AUTH_SYS RPC security flavor simply identified the end-user using the AUTH_SYS RPC security flavor simply identified the end-user using
skipping to change at page 583, line 6 skipping to change at page 587, line 6
[27] Werme, R., "RPC XID Issues", USENIX Conference Proceedings , [27] Werme, R., "RPC XID Issues", USENIX Conference Proceedings ,
February 1996. February 1996.
[28] Nowicki, B., "NFS: Network File System Protocol specification", [28] Nowicki, B., "NFS: Network File System Protocol specification",
RFC 1094, March 1989. RFC 1094, March 1989.
[29] Bhide, A., Elnozahy, E., and S. Morgan, "A Highly Available [29] Bhide, A., Elnozahy, E., and S. Morgan, "A Highly Available
Network Server", USENIX Conference Proceedings , January 1991. Network Server", USENIX Conference Proceedings , January 1991.
[30] Halevy, B., Welch, B., and J. Zelenka, "Object-based pNFS [30] Halevy, B., Welch, B., and J. Zelenka, "Object-based pNFS
Operations", September 2007, <ftp://www.ietf.org/ Operations", April 2008, <ftp://www.ietf.org/internet-drafts/
internet-drafts/draft-nfsv4-pnfs-obj-04.txt>. draft-nfsv4-pnfs-obj-07.txt>.
[31] Black, D., Fridella, S., and J. Glasgow, "pNFS Block/Volume [31] Black, D., Fridella, S., and J. Glasgow, "pNFS Block/Volume
Layout", November 2007, <ftp://www.ietf.org/internet-drafts/ Layout", April 2008, <ftp://www.ietf.org/internet-drafts/
draft-ietf-nfsv4-pnfs-block-05.txt>. draft-ietf-nfsv4-pnfs-block-08.txt>.
[32] Callaghan, B., "WebNFS Client Specification", RFC 2054, [32] Callaghan, B., "WebNFS Client Specification", RFC 2054,
October 1996. October 1996.
[33] Callaghan, B., "WebNFS Server Specification", RFC 2055, [33] Callaghan, B., "WebNFS Server Specification", RFC 2055,
October 1996. October 1996.
[34] Shepler, S., "NFS Version 4 Design Considerations", RFC 2624, [34] Shepler, S., "NFS Version 4 Design Considerations", RFC 2624,
June 1999. June 1999.
 End of changes. 99 change blocks. 
338 lines changed or deleted 517 lines changed or added

This html diff was produced by rfcdiff 1.33. The latest version is available from http://tools.ietf.org/tools/rfcdiff/