draft-ietf-nfsv4-minorversion1-PAv3.txt   draft-ietf-nfsv4-minorversion1-PAv4.txt 
NFSv4 S. Shepler NFSv4 S. Shepler
Internet-Draft M. Eisler Internet-Draft M. Eisler
Intended status: Standards Track D. Noveck Intended status: Standards Track D. Noveck
Expires: October 13, 2009 Editors Expires: December 22, 2009 Editors
April 11, 2009 June 20, 2009
NFS Version 4 Minor Version 1 NFS Version 4 Minor Version 1
draft-ietf-nfsv4-minorversion1-PAv3.txt draft-ietf-nfsv4-minorversion1-PAv4.txt
Status of this Memo Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Drafts.
skipping to change at page 1, line 33 skipping to change at page 1, line 33
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on October 13, 2009. This Internet-Draft will expire on December 22, 2009.
Copyright Notice Copyright Notice
Copyright (c) 2009 IETF Trust and the persons identified as the Copyright (c) 2009 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 4, line 21 skipping to change at page 4, line 21
3.2. Basic Data Types . . . . . . . . . . . . . . . . . . . . 87 3.2. Basic Data Types . . . . . . . . . . . . . . . . . . . . 87
3.3. Structured Data Types . . . . . . . . . . . . . . . . . 89 3.3. Structured Data Types . . . . . . . . . . . . . . . . . 89
4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 97 4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 98 4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 98
4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . 98 4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . 98
4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . 98 4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . 98
4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 99 4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 99
4.2.1. General Properties of a Filehandle . . . . . . . . . 99 4.2.1. General Properties of a Filehandle . . . . . . . . . 99
4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 100 4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 100
4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 100 4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 100
4.3. One Method of Constructing a Volatile Filehandle . . . . 102 4.3. One Method of Constructing a Volatile Filehandle . . . . 101
4.4. Client Recovery from Filehandle Expiration . . . . . . . 102 4.4. Client Recovery from Filehandle Expiration . . . . . . . 102
5. File Attributes . . . . . . . . . . . . . . . . . . . . . . . 103 5. File Attributes . . . . . . . . . . . . . . . . . . . . . . . 103
5.1. REQUIRED Attributes . . . . . . . . . . . . . . . . . . 104 5.1. REQUIRED Attributes . . . . . . . . . . . . . . . . . . 104
5.2. RECOMMENDED Attributes . . . . . . . . . . . . . . . . . 104 5.2. RECOMMENDED Attributes . . . . . . . . . . . . . . . . . 104
5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 105 5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 105
5.4. Classification of Attributes . . . . . . . . . . . . . . 106 5.4. Classification of Attributes . . . . . . . . . . . . . . 106
5.5. Set-Only and Get-Only Attributes . . . . . . . . . . . . 107 5.5. Set-Only and Get-Only Attributes . . . . . . . . . . . . 107
5.6. REQUIRED Attributes - List and Definition References . . 108 5.6. REQUIRED Attributes - List and Definition References . . 107
5.7. RECOMMENDED Attributes - List and Definition 5.7. RECOMMENDED Attributes - List and Definition
References . . . . . . . . . . . . . . . . . . . . . . . 108 References . . . . . . . . . . . . . . . . . . . . . . . 108
5.8. Attribute Definitions . . . . . . . . . . . . . . . . . 110 5.8. Attribute Definitions . . . . . . . . . . . . . . . . . 110
5.8.1. Definitions of REQUIRED Attributes . . . . . . . . . 110 5.8.1. Definitions of REQUIRED Attributes . . . . . . . . . 110
5.8.2. Definitions of Uncategorized RECOMMENDED 5.8.2. Definitions of Uncategorized RECOMMENDED
Attributes . . . . . . . . . . . . . . . . . . . . . 112 Attributes . . . . . . . . . . . . . . . . . . . . . 112
5.9. Interpreting owner and owner_group . . . . . . . . . . . 119 5.9. Interpreting owner and owner_group . . . . . . . . . . . 119
5.10. Character Case Attributes . . . . . . . . . . . . . . . 121 5.10. Character Case Attributes . . . . . . . . . . . . . . . 121
5.11. Directory Notification Attributes . . . . . . . . . . . 121 5.11. Directory Notification Attributes . . . . . . . . . . . 121
5.12. pNFS Attribute Definitions . . . . . . . . . . . . . . . 122 5.12. pNFS Attribute Definitions . . . . . . . . . . . . . . . 121
5.13. Retention Attributes . . . . . . . . . . . . . . . . . . 123 5.13. Retention Attributes . . . . . . . . . . . . . . . . . . 123
6. Access Control Attributes . . . . . . . . . . . . . . . . . . 126 6. Access Control Attributes . . . . . . . . . . . . . . . . . . 126
6.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . 126 6.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . 126
6.2. File Attributes Discussion . . . . . . . . . . . . . . . 127 6.2. File Attributes Discussion . . . . . . . . . . . . . . . 127
6.2.1. Attribute 12: acl . . . . . . . . . . . . . . . . . 127 6.2.1. Attribute 12: acl . . . . . . . . . . . . . . . . . 127
6.2.2. Attribute 58: dacl . . . . . . . . . . . . . . . . . 143 6.2.2. Attribute 58: dacl . . . . . . . . . . . . . . . . . 143
6.2.3. Attribute 59: sacl . . . . . . . . . . . . . . . . . 143 6.2.3. Attribute 59: sacl . . . . . . . . . . . . . . . . . 143
6.2.4. Attribute 33: mode . . . . . . . . . . . . . . . . . 143 6.2.4. Attribute 33: mode . . . . . . . . . . . . . . . . . 143
6.2.5. Attribute 74: mode_set_masked . . . . . . . . . . . 143 6.2.5. Attribute 74: mode_set_masked . . . . . . . . . . . 143
6.3. Common Methods . . . . . . . . . . . . . . . . . . . . . 144 6.3. Common Methods . . . . . . . . . . . . . . . . . . . . . 144
skipping to change at page 8, line 43 skipping to change at page 8, line 43
15.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 341 15.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 341
15.1.3. Compound Structure Errors . . . . . . . . . . . . . 342 15.1.3. Compound Structure Errors . . . . . . . . . . . . . 342
15.1.4. File System Errors . . . . . . . . . . . . . . . . . 344 15.1.4. File System Errors . . . . . . . . . . . . . . . . . 344
15.1.5. State Management Errors . . . . . . . . . . . . . . 346 15.1.5. State Management Errors . . . . . . . . . . . . . . 346
15.1.6. Security Errors . . . . . . . . . . . . . . . . . . 346 15.1.6. Security Errors . . . . . . . . . . . . . . . . . . 346
15.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 347 15.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 347
15.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 348 15.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 348
15.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 349 15.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 349
15.1.10. pNFS Errors . . . . . . . . . . . . . . . . . . . . 350 15.1.10. pNFS Errors . . . . . . . . . . . . . . . . . . . . 350
15.1.11. Session Use Errors . . . . . . . . . . . . . . . . . 351 15.1.11. Session Use Errors . . . . . . . . . . . . . . . . . 351
15.1.12. Session Management Errors . . . . . . . . . . . . . 352 15.1.12. Session Management Errors . . . . . . . . . . . . . 353
15.1.13. Client Management Errors . . . . . . . . . . . . . . 353 15.1.13. Client Management Errors . . . . . . . . . . . . . . 353
15.1.14. Delegation Errors . . . . . . . . . . . . . . . . . 354 15.1.14. Delegation Errors . . . . . . . . . . . . . . . . . 354
15.1.15. Attribute Handling Errors . . . . . . . . . . . . . 354 15.1.15. Attribute Handling Errors . . . . . . . . . . . . . 354
15.1.16. Obsoleted Errors . . . . . . . . . . . . . . . . . . 355 15.1.16. Obsoleted Errors . . . . . . . . . . . . . . . . . . 355
15.2. Operations and their valid errors . . . . . . . . . . . 356 15.2. Operations and their valid errors . . . . . . . . . . . 356
15.3. Callback operations and their valid errors . . . . . . . 372 15.3. Callback operations and their valid errors . . . . . . . 372
15.4. Errors and the operations that use them . . . . . . . . 374 15.4. Errors and the operations that use them . . . . . . . . 374
16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 389 16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 389
16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 389 16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 389
16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 390 16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 390
skipping to change at page 30, line 19 skipping to change at page 30, line 19
server MUST allow the EXCHANGE_ID, and confirm the new client ID if server MUST allow the EXCHANGE_ID, and confirm the new client ID if
followed by the appropriate CREATE_SESSION. followed by the appropriate CREATE_SESSION.
When the server gets an EXCHANGE_ID for a new incarnation of a client When the server gets an EXCHANGE_ID for a new incarnation of a client
owner that currently has an old incarnation with state and an owner that currently has an old incarnation with state and an
unexpired lease, the server is allowed to dispose of the state of the unexpired lease, the server is allowed to dispose of the state of the
previous incarnation of the client owner if one of the following is previous incarnation of the client owner if one of the following is
true: true:
o The principal that created the client ID for the client owner is o The principal that created the client ID for the client owner is
the same as the principal that is issuing the EXCHANGE_ID. Note the same as the principal that is sending the EXCHANGE_ID
that if the client ID was created with SP4_MACH_CRED state operation. Note that if the client ID was created with
protection (Section 18.35), the principal MUST be based on SP4_MACH_CRED state protection (Section 18.35), the principal MUST
RPCSEC_GSS authentication, the RPCSEC_GSS service used MUST be be based on RPCSEC_GSS authentication, the RPCSEC_GSS service used
integrity or privacy, and the same GSS mechanism and principal MUST be integrity or privacy, and the same GSS mechanism and
MUST be used as that used when the client ID was created. principal MUST be used as that used when the client ID was
created.
o The client ID was established with SP4_SSV protection o The client ID was established with SP4_SSV protection
(Section 18.35, Section 2.10.8.3) and the client sends the (Section 18.35, Section 2.10.8.3) and the client sends the
EXCHANGE_ID with the security flavor set to RPCSEC_GSS using the EXCHANGE_ID with the security flavor set to RPCSEC_GSS using the
GSS SSV mechanism (Section 2.10.9). GSS SSV mechanism (Section 2.10.9).
o The client ID was established with SP4_SSV protection, and under o The client ID was established with SP4_SSV protection, and under
the conditions described herein, the EXCHANGE_ID was sent with the conditions described herein, the EXCHANGE_ID was sent with
SP4_MACH_CRED state protection. Because the SSV might not persist SP4_MACH_CRED state protection. Because the SSV might not persist
across client and server restart, and because the first time a across client and server restart, and because the first time a
skipping to change at page 48, line 6 skipping to change at page 48, line 6
compatibility can be quite minimal, and limited to a simple partition compatibility can be quite minimal, and limited to a simple partition
of the ID space. The recognition of common values requires of the ID space. The recognition of common values requires
additional implementation, but this can be tailored to the specific additional implementation, but this can be tailored to the specific
situations in which that recognition is desired. situations in which that recognition is desired.
Clients will have occasion to compare the server scope values of Clients will have occasion to compare the server scope values of
multiple servers under a number of circumstances, each of which will multiple servers under a number of circumstances, each of which will
be discussed under the appropriate functional section. be discussed under the appropriate functional section.
o When server owner values received in response to EXCHANGE_ID o When server owner values received in response to EXCHANGE_ID
operations issued to multiple network addresses are compared for operations sent to multiple network addresses are compared for the
the purpose of determining the validity of various forms of purpose of determining the validity of various forms of trunking,
trunking, as described in Section 2.10.5. as described in Section 2.10.5.
o When network or server reconfiguration causes the same network o When network or server reconfiguration causes the same network
address to possibly be directed to different servers, with the address to possibly be directed to different servers, with the
necessity for the client to determine when lock reclaim should be necessity for the client to determine when lock reclaim should be
attempted, as described in Section 8.4.2.1. attempted, as described in Section 8.4.2.1.
o When file system migration causes the transfer of responsibility o When file system migration causes the transfer of responsibility
for a file system between servers and the client needs to for a file system between servers and the client needs to
determine whether state has been transferred with the file system determine whether state has been transferred with the file system
(as described in Section 11.7.7) or whether the client needs to (as described in Section 11.7.7) or whether the client needs to
skipping to change at page 52, line 10 skipping to change at page 52, line 10
If the client specified SP4_MACH_CRED state protection, the If the client specified SP4_MACH_CRED state protection, the
BIND_CONN_TO_SESSION operation will use RPCSEC_GSS integrity or BIND_CONN_TO_SESSION operation will use RPCSEC_GSS integrity or
privacy, using the same credential that was used when the client privacy, using the same credential that was used when the client
ID was created. Mutual authentication via RPCSEC_GSS assures the ID was created. Mutual authentication via RPCSEC_GSS assures the
client that the connection is associated with the correct session client that the connection is associated with the correct session
of the correct server. of the correct server.
o For client ID trunking, the client has at least two options for o For client ID trunking, the client has at least two options for
verifying that the same client ID obtained from two different verifying that the same client ID obtained from two different
EXCHANGE_ID operations came from the same server. The first EXCHANGE_ID operations came from the same server. The first
option is to use RPCSEC_GSS authentication when issuing each option is to use RPCSEC_GSS authentication when sending each
EXCHANGE_ID. Each time an EXCHANGE_ID is sent with RPCSEC_GSS EXCHANGE_ID operation. Each time an EXCHANGE_ID is sent with
authentication, the client notes the principal name of the GSS RPCSEC_GSS authentication, the client notes the principal name of
target. If the EXCHANGE_ID results indicate client ID trunking is the GSS target. If the EXCHANGE_ID results indicate client ID
possible, and the GSS targets' principal names are the same, the trunking is possible, and the GSS targets' principal names are the
servers are the same and client ID trunking is allowed. same, the servers are the same and client ID trunking is allowed.
The second option for verification is to use SP4_SSV protection. The second option for verification is to use SP4_SSV protection.
When the client sends EXCHANGE_ID it specifies SP4_SSV protection. When the client sends EXCHANGE_ID it specifies SP4_SSV protection.
The first EXCHANGE_ID the client sends always has to be confirmed The first EXCHANGE_ID the client sends always has to be confirmed
by a CREATE_SESSION call. The client then sends SET_SSV. Later by a CREATE_SESSION call. The client then sends SET_SSV. Later
the client sends EXCHANGE_ID to a second destination network the client sends EXCHANGE_ID to a second destination network
address different from the one the first EXCHANGE_ID was sent to. address different from the one the first EXCHANGE_ID was sent to.
The client checks that each EXCHANGE_ID reply has the same The client checks that each EXCHANGE_ID reply has the same
eir_clientid, eir_server_owner.so_major_id, and eir_server_scope. eir_clientid, eir_server_owner.so_major_id, and eir_server_scope.
If so, the client verifies the claim by issuing a CREATE_SESSION If so, the client verifies the claim by sending a CREATE_SESSION
to the second destination address, protected with RPCSEC_GSS operation to the second destination address, protected with
integrity using an RPCSEC_GSS handle returned by the second RPCSEC_GSS integrity using an RPCSEC_GSS handle returned by the
EXCHANGE_ID. If the server accepts the CREATE_SESSION request, second EXCHANGE_ID. If the server accepts the CREATE_SESSION
and if the client verifies the RPCSEC_GSS verifier and integrity request, and if the client verifies the RPCSEC_GSS verifier and
codes, then the client has proof the second server knows the SSV, integrity codes, then the client has proof the second server knows
and thus the two servers are co-operating for the purposes of the SSV, and thus the two servers are co-operating for the
specifying server scope and client ID trunking. purposes of specifying server scope and client ID trunking.
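The comparisons the client makes when checking a client ID trunking claim can be sketched in Python. This is a minimal illustration, not part of either draft. It assumes a hypothetical, already-decoded ExchangeIdResult record holding only the three fields named above, and it assumes GSS target principal names are available as plain strings. The helpers only test whether trunking is claimed. With SP4_SSV the claim still has to be proven by the CREATE_SESSION step described above.

```python
from dataclasses import dataclass


# Hypothetical, simplified view of the EXCHANGE_ID reply fields named in
# the text; a real client would decode these from the XDR-encoded reply.
@dataclass(frozen=True)
class ExchangeIdResult:
    eir_clientid: int
    so_major_id: bytes        # eir_server_owner.so_major_id
    eir_server_scope: bytes


def claims_client_id_trunking(first: ExchangeIdResult,
                              second: ExchangeIdResult) -> bool:
    """True if two EXCHANGE_ID replies *claim* client ID trunking.

    This is only the precondition; under SP4_SSV the client still has to
    send CREATE_SESSION to the second address with RPCSEC_GSS integrity
    (using the second reply's handle) before trusting the claim.
    """
    return (first.eir_clientid == second.eir_clientid
            and first.so_major_id == second.so_major_id
            and first.eir_server_scope == second.eir_server_scope)


def trunking_allowed_via_gss(first: ExchangeIdResult,
                             second: ExchangeIdResult,
                             first_target: str,
                             second_target: str) -> bool:
    """First option: the replies indicate trunking is possible and the GSS
    target principal names noted for each EXCHANGE_ID are the same."""
    return (claims_client_id_trunking(first, second)
            and first_target == second_target)
```

For example, two replies that differ only in eir_server_scope fail the check, so the client cannot treat the two destination addresses as trunked.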
2.10.6. Exactly Once Semantics 2.10.6. Exactly Once Semantics
Via the session, NFSv4.1 offers Exactly Once Semantics (EOS) for Via the session, NFSv4.1 offers Exactly Once Semantics (EOS) for
requests sent over a channel. EOS is supported on both the fore and requests sent over a channel. EOS is supported on both the fore and
back channels. back channels.
Each COMPOUND or CB_COMPOUND request that is sent with a leading Each COMPOUND or CB_COMPOUND request that is sent with a leading
SEQUENCE or CB_SEQUENCE operation MUST be executed by the receiver SEQUENCE or CB_SEQUENCE operation MUST be executed by the receiver
exactly once. This requirement holds regardless of whether the exactly once. This requirement holds regardless of whether the
request is sent with reply caching specified (see request is sent with reply caching specified (see
Section 2.10.6.1.3). The requirement holds even if the requester is Section 2.10.6.1.3). The requirement holds even if the requester is
issuing the request over a session created between a pNFS data client sending the request over a session created between a pNFS data client
and pNFS data server. To understand the rationale for this and pNFS data server. To understand the rationale for this
requirement, divide the requests into three classifications: requirement, divide the requests into three classifications:
o Nonidempotent requests. o Nonidempotent requests.
o Idempotent modifying requests. o Idempotent modifying requests.
o Idempotent non-modifying requests. o Idempotent non-modifying requests.
An example of a non-idempotent request is RENAME. It is obvious that An example of a non-idempotent request is RENAME. It is obvious that
skipping to change at page 58, line 14 skipping to change at page 58, line 14
enforced highest_slotid, the requester is only allowed to send enforced highest_slotid, the requester is only allowed to send
retries on slots that exceed the replier's highest_slotid. If a retries on slots that exceed the replier's highest_slotid. If a
request is received with a slot ID that is higher than the new request is received with a slot ID that is higher than the new
enforced highest_slotid, and the sequence ID is one higher than enforced highest_slotid, and the sequence ID is one higher than
what is in the slot's reply cache, then the server can both retire what is in the slot's reply cache, then the server can both retire
the slot and return NFS4ERR_BADSLOT (however the server MUST NOT the slot and return NFS4ERR_BADSLOT (however the server MUST NOT
do one and not the other). The reason it is safe to retire the do one and not the other). The reason it is safe to retire the
slot is that by using the next sequence ID, the requester slot is that by using the next sequence ID, the requester
is indicating it has received the previous reply for the slot. is indicating it has received the previous reply for the slot.
o The requester SHOULD use the lowest available slot when issuing a o The requester SHOULD use the lowest available slot when sending a
new request. This way, the replier may be able to retire slot new request. This way, the replier may be able to retire slot
entries faster. However, where the replier is actively adjusting entries faster. However, where the replier is actively adjusting
its granted highest_slotid, it will not be able to use only the its granted highest_slotid, it will not be able to use only the
receipt of the slot ID and highest_slotid in the request. Neither receipt of the slot ID and highest_slotid in the request. Neither
the slot ID nor the highest_slotid used in a request may reflect the slot ID nor the highest_slotid used in a request may reflect
the replier's current idea of the requester's session limit, the replier's current idea of the requester's session limit,
because the request may have been sent from the requester before because the request may have been sent from the requester before
the update was received. Therefore, in the downward adjustment the update was received. Therefore, in the downward adjustment
case, the replier may have to retain a number of reply cache case, the replier may have to retain a number of reply cache
entries at least as large as the old value of maximum requests entries at least as large as the old value of maximum requests
skipping to change at page 100, line 7 skipping to change at page 100, line 7
incorrect behavior. Further discussion of filehandle and attribute incorrect behavior. Further discussion of filehandle and attribute
comparison in the context of data caching is presented in the comparison in the context of data caching is presented in the
Section 10.3.4. Section 10.3.4.
As an example, in the case that two different path names when As an example, in the case that two different path names when
traversed at the server terminate at the same file system object, the traversed at the server terminate at the same file system object, the
server SHOULD return the same filehandle for each path. This can server SHOULD return the same filehandle for each path. This can
occur if a hard link (see [6]) is used to create two file names which occur if a hard link (see [6]) is used to create two file names which
refer to the same underlying file object and associated data. For refer to the same underlying file object and associated data. For
example, if paths /a/b/c and /a/d/c refer to the same file, the example, if paths /a/b/c and /a/d/c refer to the same file, the
server SHOULD return the same filehandle for both path names server SHOULD return the same filehandle for both path name
traversals. traversals.
4.2.2. Persistent Filehandle 4.2.2. Persistent Filehandle
A persistent filehandle is defined as having a fixed value for the A persistent filehandle is defined as having a fixed value for the
lifetime of the file system object to which it refers. Once the lifetime of the file system object to which it refers. Once the
server creates the filehandle for a file system object, the server server creates the filehandle for a file system object, the server
MUST accept the same filehandle for the object for the lifetime of MUST accept the same filehandle for the object for the lifetime of
the object. If the server restarts, the NFS server MUST honor the the object. If the server restarts, the NFS server MUST honor the
same filehandle value as it did in the server's previous same filehandle value as it did in the server's previous
skipping to change at page 101, line 32 skipping to change at page 101, line 32
_handle_ class information within the fs_locations_info attribute. _handle_ class information within the fs_locations_info attribute.
When this bit is set, clients without access to fs_locations_info When this bit is set, clients without access to fs_locations_info
information should assume filehandles will expire on file system information should assume filehandles will expire on file system
transitions. transitions.
FH4_VOL_RENAME The filehandle will expire during rename. This FH4_VOL_RENAME The filehandle will expire during rename. This
includes a rename by the requesting client or a rename by any includes a rename by the requesting client or a rename by any
other client. If FH4_VOLATILE_ANY is set, FH4_VOL_RENAME is redundant. other client. If FH4_VOLATILE_ANY is set, FH4_VOL_RENAME is redundant.
Servers which provide volatile filehandles that may expire while open Servers which provide volatile filehandles that may expire while open
(i.e. if FH4_VOL_MIGRATION or FH4_VOL_RENAME is set or if
FH4_VOLATILE_ANY is set and FH4_NOEXPIRE_WITH_OPEN not set), should
deny a RENAME or REMOVE that would affect an OPEN file of any of the
components leading to the OPEN file. In addition, the server should
deny all RENAME or REMOVE requests during the grace period upon
server restart.
Servers which provide volatile filehandles that may expire while open
require special care as regards handling of RENAMEs and REMOVEs. require special care as regards handling of RENAMEs and REMOVEs.
This situation can arise if FH4_VOL_MIGRATION or FH4_VOL_RENAME is This situation can arise if FH4_VOL_MIGRATION or FH4_VOL_RENAME is
set, if FH4_VOLATILE_ANY is set and FH4_NOEXPIRE_WITH_OPEN not set, set, if FH4_VOLATILE_ANY is set and FH4_NOEXPIRE_WITH_OPEN not set,
or if a non-readonly file system has a transition target in a or if a non-readonly file system has a transition target in a
different _handle_ class. In these cases, the server should deny a different _handle_ class. In these cases, the server should deny a
RENAME or REMOVE that would affect an OPEN file of any of the RENAME or REMOVE that would affect an OPEN file of any of the
components leading to the OPEN file. In addition, the server should components leading to the OPEN file. In addition, the server should
deny all RENAME or REMOVE requests during the grace period, in order deny all RENAME or REMOVE requests during the grace period, in order
to make sure that reclaims of files where filehandles may have to make sure that reclaims of files where filehandles may have
expired do not do a reclaim for the wrong file. expired do not do a reclaim for the wrong file.
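The conditions under which a server needs this extra care can be written as a small predicate. The following Python sketch is illustrative only and not part of either draft. It assumes the fh_expire_type bit values from the NFSv4 XDR definitions, which the text above names but does not list. The read_only, target_in_other_handle_class, in_grace_period and affects_open_path inputs are hypothetical values a server would derive from its own configuration, fs_locations_info, and open state.

```python
# fh_expire_type bit values, assumed from the NFSv4 XDR definitions.
FH4_PERSISTENT = 0x00000000
FH4_NOEXPIRE_WITH_OPEN = 0x00000001
FH4_VOLATILE_ANY = 0x00000002
FH4_VOL_MIGRATION = 0x00000004
FH4_VOL_RENAME = 0x00000008


def may_expire_while_open(fh_expire_type: int,
                          read_only: bool,
                          target_in_other_handle_class: bool) -> bool:
    """True if filehandles may expire while a file is open."""
    if fh_expire_type & (FH4_VOL_MIGRATION | FH4_VOL_RENAME):
        return True
    if ((fh_expire_type & FH4_VOLATILE_ANY)
            and not (fh_expire_type & FH4_NOEXPIRE_WITH_OPEN)):
        return True
    # A non-readonly file system whose transition target lies in a
    # different handle class.
    return (not read_only) and target_in_other_handle_class


def should_deny_rename_or_remove(care_needed: bool,
                                 in_grace_period: bool,
                                 affects_open_path: bool) -> bool:
    """Deny RENAME/REMOVE touching an OPEN file or any component leading
    to it, and deny all RENAME/REMOVE during the grace period."""
    return care_needed and (in_grace_period or affects_open_path)
```

Treating the last condition separately keeps the flag checks independent of the file system's migration configuration.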
skipping to change at page 121, line 31 skipping to change at page 121, line 27
With respect to the case_insensitive and case_preserving attributes, With respect to the case_insensitive and case_preserving attributes,
each UCS-4 character (which UTF-8 encodes) can be mapped according to each UCS-4 character (which UTF-8 encodes) can be mapped according to
Appendix B.2 of RFC3454 [19]. For general character handling and Appendix B.2 of RFC3454 [19]. For general character handling and
internationalization issues, see Section 14. internationalization issues, see Section 14.
5.11. Directory Notification Attributes 5.11. Directory Notification Attributes
As described in Section 18.39, the client can request a minimum delay As described in Section 18.39, the client can request a minimum delay
for notifications of changes to attributes, but the server is free to for notifications of changes to attributes, but the server is free to
ignore what the client requests. The client can determine in advance ignore what the client requests. The client can determine in advance
what notification delays the server will accept by issuing a GETATTR what notification delays the server will accept by sending a GETATTR
for either or both of two directory notification attributes. When operation for either or both of two directory notification
the client calls the GET_DIR_DELEGATION operation and asks for attributes. When the client calls the GET_DIR_DELEGATION operation
attribute change notifications, it should request notification delays and asks for attribute change notifications, it should request
that are no less than the values in the server-provided attributes. notification delays that are no less than the values in the server-
provided attributes.
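The rule that requested delays should be no less than the server-provided values amounts to taking a maximum per delay. The following is a minimal Python sketch, not part of either draft. It assumes the client has already fetched dir_notif_delay and dirent_notif_delay with GETATTR and converted them to seconds. The wanted_* parameters are hypothetical client preferences.

```python
def choose_notification_delays(wanted_attr_delay: float,
                               wanted_dirent_delay: float,
                               dir_notif_delay: float,
                               dirent_notif_delay: float) -> tuple:
    """Return (attribute delay, entry delay) to request in
    GET_DIR_DELEGATION, never below the server's advertised minimums."""
    return (max(wanted_attr_delay, dir_notif_delay),
            max(wanted_dirent_delay, dirent_notif_delay))
```

A client that wants 5-second delays from a server advertising minimums of 10 and 2 seconds would request 10 seconds for attribute changes and 5 seconds for entry changes.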
5.11.1. Attribute 56: dir_notif_delay 5.11.1. Attribute 56: dir_notif_delay
The dir_notif_delay attribute is the minimum number of seconds the The dir_notif_delay attribute is the minimum number of seconds the
server will delay before notifying the client of a change to the server will delay before notifying the client of a change to the
directory's attributes. directory's attributes.
5.11.2. Attribute 57: dirent_notif_delay 5.11.2. Attribute 57: dirent_notif_delay
The dirent_notif_delay attribute is the minimum number of seconds the The dirent_notif_delay attribute is the minimum number of seconds the
skipping to change at page 123, line 20 skipping to change at page 123, line 14
the metadata server or the data server. The two types of thresholds the metadata server or the data server. The two types of thresholds
described are file size thresholds and I/O size thresholds. If a described are file size thresholds and I/O size thresholds. If a
file's size is smaller than the file size threshold, data accesses file's size is smaller than the file size threshold, data accesses
SHOULD be sent to the metadata server. If an I/O request has a SHOULD be sent to the metadata server. If an I/O request has a
length that is below the I/O size threshold, the I/O SHOULD be sent length that is below the I/O size threshold, the I/O SHOULD be sent
to the metadata server. Each threshold type is specified separately to the metadata server. Each threshold type is specified separately
for READ and WRITE. for READ and WRITE.
The server MAY provide both types of thresholds for a file. If both The server MAY provide both types of thresholds for a file. If both
file size and I/O size are provided, the client SHOULD reach or file size and I/O size are provided, the client SHOULD reach or
exceed both thresholds before issuing its READ or WRITE requests to exceed both thresholds before sending its READ or WRITE operations to
the data server. Alternatively, if only one of the specified the data server. Alternatively, if only one of the specified
thresholds is reached or exceeded, the I/O requests are sent to the thresholds is reached or exceeded, the I/O requests are sent to the
metadata server. metadata server.
For each threshold type, a value of 0 indicates no READ or WRITE For each threshold type, a value of 0 indicates no READ or WRITE
should be sent to the metadata server, while a value of all 1s should be sent to the metadata server, while a value of all 1s
indicates all READs or WRITEs should be sent to the metadata server. indicates all READs or WRITEs should be sent to the metadata server.
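The threshold rules above can be sketched as a single routing decision. This is an illustration under stated assumptions, not a normative algorithm: the function name is hypothetical, thresholds are assumed to be unsigned 64-bit values (so "all 1s" is 2**64 - 1), and when the server provides no threshold at all the sketch defaults to the data server. The caller passes the READ thresholds for a READ and the WRITE thresholds for a WRITE.

```python
from typing import Optional

ALL_ONES = 2**64 - 1  # assumption: thresholds are unsigned 64-bit values


def send_to_metadata_server(
    file_size: int,
    io_length: int,
    file_size_threshold: Optional[int] = None,
    io_size_threshold: Optional[int] = None,
) -> bool:
    """Return True if a READ or WRITE should go to the metadata server.

    None means the server did not provide that threshold type for the file.
    """
    for value, threshold in (
        (file_size, file_size_threshold),
        (io_length, io_size_threshold),
    ):
        if threshold is None:
            continue
        if threshold == ALL_ONES:
            return True  # all 1s: every READ/WRITE goes to the metadata server
        if value < threshold:
            return True  # a provided threshold was not reached
    # Every provided threshold was reached or exceeded (a value of 0 is
    # always reached), so the I/O goes to the data server.
    return False
```

Because a 0 threshold can never exceed a size or length, it naturally routes everything to the data server without a special case.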
The attribute is available on a per filehandle basis. If the current The attribute is available on a per filehandle basis. If the current
filehandle refers to a non-pNFS file or directory, the metadata filehandle refers to a non-pNFS file or directory, the metadata
skipping to change at page 156, line 41 skipping to change at page 156, line 41
shared resource. Suppose the security policy for /a/b/ shared resource. Suppose the security policy for /a/b/
MySecretProject is Kerberos with integrity and it is desired to limit MySecretProject is Kerberos with integrity and it is desired to limit
knowledge of the existence of this file system. In this case, the knowledge of the existence of this file system. In this case, the
server should apply the same security policy to /a/b. This allows server should apply the same security policy to /a/b. This allows
for knowledge of the existence of a file system to be secured when for knowledge of the existence of a file system to be secured when
desirable. desirable.
For the case of the use of multiple, disjoint security mechanisms in For the case of the use of multiple, disjoint security mechanisms in
the server's resources, applying that sort of policy would result in the server's resources, applying that sort of policy would result in
the higher-level file system not being accessible using any security the higher-level file system not being accessible using any security
flavor, which would make the that higher-level file system flavor, which would make that higher-level file system inaccessible.
inaccessible. Therefore, that sort of configuration is not Therefore, that sort of configuration is not compatible with hiding
compatible with hiding the existence (as opposed to the contents) the existence (as opposed to the contents) from clients using
from clients using multiple disjoint sets of security flavors. multiple disjoint sets of security flavors.
In other circumstances, a desirable policy is for the security of a In other circumstances, a desirable policy is for the security of a
particular object in the server's namespace to include the union particular object in the server's namespace to include the union
of all security mechanisms of all direct descendants. A common and of all security mechanisms of all direct descendants. A common and
convenient practice, unless strong security requirements dictate convenient practice, unless strong security requirements dictate
otherwise, is to make all of the pseudo file system accessible by all otherwise, is to make all of the pseudo file system accessible by all
of the valid security mechanisms. of the valid security mechanisms.
Where there is concern about the security of data on the network, Where there is concern about the security of data on the network,
clients should use strong security mechanisms to access the pseudo clients should use strong security mechanisms to access the pseudo
skipping to change at page 165, line 47 skipping to change at page 165, line 47
any additional information about the type of stateid and any additional information about the type of stateid and
information associated with that particular type of stateid, such information associated with that particular type of stateid, such
as the associated set of locks, including open-owner and lock-owner as the associated set of locks, including open-owner and lock-owner
information, as well as information on the specific locks, such as information, as well as information on the specific locks, such as
open modes and byte ranges. open modes and byte ranges.
8.2.5. Stateid Use for I/O Operations 8.2.5. Stateid Use for I/O Operations
Clients performing I/O operations need to select an appropriate Clients performing I/O operations need to select an appropriate
stateid based on the locks (including opens and delegations) held by stateid based on the locks (including opens and delegations) held by
the client and the various types of state-owners issuing the I/O the client and the various types of state-owners sending the I/O
requests. SETATTR operations which change the file size are treated requests. SETATTR operations which change the file size are treated
like I/O operations in this regard. like I/O operations in this regard.
The following rules, applied in order of decreasing priority, govern The following rules, applied in order of decreasing priority, govern
the selection of the appropriate stateid. In following these rules, the selection of the appropriate stateid. In following these rules,
the client will only consider locks of which it has actually received the client will only consider locks of which it has actually received
notification by an appropriate operation response or callback. Note notification by an appropriate operation response or callback. Note
that the rules are slightly different in the case of I/O to data that the rules are slightly different in the case of I/O to data
servers when file layouts are being used (see Section 13.9.1). servers when file layouts are being used (see Section 13.9.1).
o If the client holds a delegation for the file in question, the o If the client holds a delegation for the file in question, the
delegation stateid SHOULD be used. delegation stateid SHOULD be used.
o Otherwise, if the lock-owner corresponding entity (e.g. process) o Otherwise, if the lock-owner corresponding entity (e.g. process)
issuing the I/O has a lock stateid for the associated open file, sending the I/O has a lock stateid for the associated open file,
then the lock stateid for that lock-owner and open file SHOULD be then the lock stateid for that lock-owner and open file SHOULD be
used. used.
o If there is no lock stateid, then the open stateid for the open o If there is no lock stateid, then the open stateid for the open
file in question SHOULD be used. file in question SHOULD be used.
o Finally, if none of the above apply, then a special stateid SHOULD o Finally, if none of the above apply, then a special stateid SHOULD
be used. be used.
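The priority order above maps directly onto a chain of checks. The sketch below assumes a hypothetical client-side record (KnownStateids) holding only stateids the client has actually been notified of, and uses plain strings for files, open files, lock-owners, and stateids. Which special stateid to use in the last case is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

SPECIAL_STATEID = "special"  # placeholder; the choice of special stateid is not modeled


@dataclass
class KnownStateids:
    """Stateids the client has been told about (hypothetical model)."""
    delegations: Dict[str, str] = field(default_factory=dict)  # file -> delegation stateid
    locks: Dict[Tuple[str, str], str] = field(default_factory=dict)  # (lock-owner, open file) -> lock stateid
    opens: Dict[str, str] = field(default_factory=dict)  # open file -> open stateid


def select_io_stateid(known: KnownStateids, file: str, open_file: str, lock_owner: str) -> str:
    """Apply the selection rules in order of decreasing priority."""
    if file in known.delegations:
        return known.delegations[file]
    lock_stateid = known.locks.get((lock_owner, open_file))
    if lock_stateid is not None:
        return lock_stateid
    if open_file in known.opens:
        return known.opens[open_file]
    return SPECIAL_STATEID
```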
Ignoring these rules may result in situations in which the server Ignoring these rules may result in situations in which the server
skipping to change at page 167, line 41 skipping to change at page 167, line 41
operation, the server MAY renew the lease; this depends on whether operation, the server MAY renew the lease; this depends on whether
any state was revoked as a result of the client's failure to renew any state was revoked as a result of the client's failure to renew
the lease before expiration. the lease before expiration.
Absent other activity that would renew the lease, a COMPOUND Absent other activity that would renew the lease, a COMPOUND
consisting of a single SEQUENCE operation will suffice. The client consisting of a single SEQUENCE operation will suffice. The client
should also take communication-related delays into account and take should also take communication-related delays into account and take
steps to ensure that the renewal messages actually reach the server steps to ensure that the renewal messages actually reach the server
in good time. For example: in good time. For example:
o When trunking is in effect, the client should consider issuing o When trunking is in effect, the client should consider sending
multiple requests on different connections, in order to ensure multiple requests on different connections, in order to ensure
that renewal occurs, even in the event of blockage in the path that renewal occurs, even in the event of blockage in the path
used for one of those connections. used for one of those connections.
o Transport retransmission delays might become so large as to o Transport retransmission delays might become so large as to
approach or exceed the length of the lease period. This may be approach or exceed the length of the lease period. This may be
particularly likely when the server is unresponsive due to a particularly likely when the server is unresponsive due to a
restart; see Section 8.4.2.1. If the client implementation is not restart; see Section 8.4.2.1. If the client implementation is not
careful, transport retransmission delays can result in the client careful, transport retransmission delays can result in the client
failing to detect a server restart before the grace period ends. failing to detect a server restart before the grace period ends.
skipping to change at page 172, line 40 skipping to change at page 172, line 40
they held before the server restart. This means that a client which they held before the server restart. This means that a client which
has done a RECLAIM_COMPLETE must be prepared to receive an has done a RECLAIM_COMPLETE must be prepared to receive an
NFS4ERR_GRACE when attempting to acquire new locks. In order for the NFS4ERR_GRACE when attempting to acquire new locks. In order for the
server to know that all clients with possible prior lock state have server to know that all clients with possible prior lock state have
done a RECLAIM_COMPLETE, the server must maintain in stable storage a done a RECLAIM_COMPLETE, the server must maintain in stable storage a
list of clients which may have such locks. The server may also list of clients which may have such locks. The server may also
terminate the grace period before all clients have done a global terminate the grace period before all clients have done a global
RECLAIM_COMPLETE. The server SHOULD NOT terminate the grace period RECLAIM_COMPLETE. The server SHOULD NOT terminate the grace period
before a time equal to the lease period in order to give clients an before a time equal to the lease period in order to give clients an
opportunity to find out about the server restart, as a result of opportunity to find out about the server restart, as a result of
issuing requests on associated sessions with a frequency governed by sending requests on associated sessions with a frequency governed by
the lease time. Note that when a client does not issue such requests the lease time. Note that when a client does not send such requests
(or they are issued by the client but not received by the server), it (or they are sent by the client but not received by the server), it
is possible for the grace period to expire before the client finds is possible for the grace period to expire before the client finds
out that the server restart has occurred. out that the server restart has occurred.
Some additional time in order to allow a client to establish a new Some additional time in order to allow a client to establish a new
client ID and session and to effect lock reclaims may be added to the client ID and session and to effect lock reclaims may be added to the
lease time. Note that analogous rules apply to file system-specific lease time. Note that analogous rules apply to file system-specific
grace periods discussed in Section 11.7.7. grace periods discussed in Section 11.7.7.
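The constraints above leave the exact end of the grace period to server policy. One possible policy is sketched below, assuming a hypothetical set of clients taken from the stable-storage list that have not yet done a global RECLAIM_COMPLETE, and an extra_time allowance chosen by the server. The function name and parameters are illustrative only.

```python
from typing import Set


def may_end_grace_period(
    elapsed: float,
    lease_time: float,
    extra_time: float,
    pending_clients: Set[str],
) -> bool:
    """One possible policy for ending the grace period after a restart.

    elapsed: time since the server restarted.
    pending_clients: clients that may hold prior locks and have not yet
    done a global RECLAIM_COMPLETE.
    extra_time: allowance for establishing a client ID and session and
    reclaiming locks (a server policy choice).
    """
    if elapsed < lease_time:
        return False  # SHOULD NOT end earlier than one lease period
    if not pending_clients:
        return True  # every client that could reclaim has finished
    return elapsed >= lease_time + extra_time  # stop waiting for stragglers
```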
If the server can reliably determine that granting a non-reclaim If the server can reliably determine that granting a non-reclaim
request will not conflict with reclamation of locks by other clients, request will not conflict with reclamation of locks by other clients,
skipping to change at page 181, line 38 skipping to change at page 181, line 38
gentler to servers trying to handle very large numbers of clients. gentler to servers trying to handle very large numbers of clients.
The number of extra requests to effect lock renewal drops in inverse The number of extra requests to effect lock renewal drops in inverse
proportion to the lease time. The disadvantages of long leases proportion to the lease time. The disadvantages of long leases
include the possibility of slower recovery after certain failures. include the possibility of slower recovery after certain failures.
After server failure, a longer grace period may be required when some After server failure, a longer grace period may be required when some
clients do not promptly reclaim their locks and do a global clients do not promptly reclaim their locks and do a global
RECLAIM_COMPLETE. In the event of client failure, there can be a RECLAIM_COMPLETE. In the event of client failure, there can be a
longer period for leases to expire, thus forcing conflicting requests longer period for leases to expire, thus forcing conflicting requests
to wait. to wait.
Long leases are practical if the server can store lease state in non- Long leases are practical if the server can store lease state in
volatile memory. Upon recovery, the server can reconstruct the lease stable storage. Upon recovery, the server can reconstruct the lease
state from its non-volatile memory and continue operation with its state from its stable storage and continue operation with its
clients and therefore long leases would not be an issue. clients.
8.7. Clocks, Propagation Delay, and Calculating Lease Expiration 8.7. Clocks, Propagation Delay, and Calculating Lease Expiration
To avoid the need for synchronized clocks, lease times are granted by To avoid the need for synchronized clocks, lease times are granted by
the server as a time delta. However, there is a requirement that the the server as a time delta. However, there is a requirement that the
client and server clocks do not drift excessively over the duration client and server clocks do not drift excessively over the duration
of the lease. There is also the issue of propagation delay across of the lease. There is also the issue of propagation delay across
the network which could easily be several hundred milliseconds as the network which could easily be several hundred milliseconds as
well as the possibility that requests will be lost and need to be well as the possibility that requests will be lost and need to be
retransmitted. retransmitted.
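A client can stay safe without synchronized clocks by measuring the lease from its own send time. The sketch below is one conservative calculation, under the assumption that the client has an estimate of the one-way propagation delay; clock drift is not modeled. The server cannot have renewed the lease before the renewing request was sent, so the lease lasts at least until send time plus lease time, and the next renewal (for example, a COMPOUND holding only a SEQUENCE operation) needs roughly one one-way delay to arrive. The function name and the optional margin_ms parameter are hypothetical.

```python
def renewal_deadline_ms(
    request_sent_ms: int,
    lease_time_ms: int,
    one_way_delay_ms: int,
    margin_ms: int = 0,
) -> int:
    """Latest local time (ms) to send the next lease-renewing request.

    request_sent_ms is when the previous renewing request was sent, so time
    spent in transit is never credited to the client. margin_ms is an
    optional extra cushion, e.g. for transport retransmission delays.
    """
    lease_expires_no_earlier_than = request_sent_ms + lease_time_ms
    return lease_expires_no_earlier_than - one_way_delay_ms - margin_ms
```

For a 90-second lease renewed by a request sent at t = 100000 ms with an estimated 200 ms one-way delay, the next renewal should go out no later than t = 189800 ms.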
skipping to change at page 188, line 34 skipping to change at page 188, line 34
When there is no such change, as, for example when a range already When there is no such change, as, for example when a range already
locked for write is locked again for write, the server MAY increment locked for write is locked again for write, the server MAY increment
the "seqid" value. the "seqid" value.
9.5. Issues with Multiple Open-Owners 9.5. Issues with Multiple Open-Owners
When the same file is opened by multiple open-owners, a client will When the same file is opened by multiple open-owners, a client will
have multiple open stateids for that file, each associated with a have multiple open stateids for that file, each associated with a
different open-owner. In that case, there can be multiple LOCK and different open-owner. In that case, there can be multiple LOCK and
LOCKU requests for the same lock-owner issued using the different LOCKU requests for the same lock-owner sent using the different open
open stateids, and so a situation may arise in which there are stateids, and so a situation may arise in which there are multiple
multiple stateids, each representing byte-range locks on the same stateids, each representing byte-range locks on the same file and
file and held by the same lock-owner but each associated with a held by the same lock-owner but each associated with a different
different open-owner. open-owner.
In such a situation, the locking status of each byte (i.e. whether it In such a situation, the locking status of each byte (i.e. whether it
is locked, the read or write mode of the lock and the lock-owner is locked, the read or write mode of the lock and the lock-owner
holding the lock) MUST reflect the last LOCK or LOCKU operation done holding the lock) MUST reflect the last LOCK or LOCKU operation done
for the lock-owner in question, independent of the stateid through for the lock-owner in question, independent of the stateid through
which the request was issued. which the request was sent.
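The "last operation wins" rule above can be illustrated with a toy per-byte map kept for a single lock-owner. Because every LOCK and LOCKU updates the same map, the stateid a request arrives on does not matter to the resulting locking status; the map also records the open-owner whose stateid carried the last LOCK of each byte. The class name and per-byte representation are hypothetical and chosen for clarity, not efficiency.

```python
from typing import Dict, Optional, Tuple


class LockOwnerLocks:
    """Toy per-byte lock map for one lock-owner (hypothetical model)."""

    def __init__(self) -> None:
        self._bytes: Dict[int, Tuple[str, str]] = {}  # offset -> (mode, open-owner)

    def lock(self, offset: int, length: int, mode: str, open_owner: str) -> None:
        # A later LOCK overwrites mode and open-owner for every byte it covers.
        for b in range(offset, offset + length):
            self._bytes[b] = (mode, open_owner)

    def unlock(self, offset: int, length: int) -> None:
        # LOCKU clears the bytes no matter which stateid locked them.
        for b in range(offset, offset + length):
            self._bytes.pop(b, None)

    def status(self, offset: int) -> Optional[Tuple[str, str]]:
        return self._bytes.get(offset)
```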
When a byte is locked by the lock-owner in question, the open-owner When a byte is locked by the lock-owner in question, the open-owner
to which that lock is assigned SHOULD be that of the open-owner to which that lock is assigned SHOULD be that of the open-owner
associated with the stateid through which the last LOCK of that byte associated with the stateid through which the last LOCK of that byte
was done. When there is a change in the open-owner associated with was done. When there is a change in the open-owner associated with
locks for the stateid through which a LOCK or LOCKU was done, the locks for the stateid through which a LOCK or LOCKU was done, the
"seqid" field of the stateid MUST be incremented, even if the "seqid" field of the stateid MUST be incremented, even if the
locking, in terms of lock-owners has not changed. When there is a locking, in terms of lock-owners has not changed. When there is a
change to the set of locked bytes associated with a different stateid change to the set of locked bytes associated with a different stateid
for the same lock-owner, i.e. associated with a different open-owner, for the same lock-owner, i.e. associated with a different open-owner,
skipping to change at page 191, line 22 skipping to change at page 191, line 22
operation so the appropriate share semantics can be applied. For operation so the appropriate share semantics can be applied. For
clients that do not have a deny mode built into their open clients that do not have a deny mode built into their open
programming interfaces, deny equal to NONE should be used. programming interfaces, deny equal to NONE should be used.
The OPEN operation with the CREATE flag also subsumes the CREATE The OPEN operation with the CREATE flag also subsumes the CREATE
operation for regular files as used in previous versions of the NFS operation for regular files as used in previous versions of the NFS
protocol. This allows a create with a share to be done atomically. protocol. This allows a create with a share to be done atomically.
The CLOSE operation removes all share reservations held by the open- The CLOSE operation removes all share reservations held by the open-
owner on that file. If byte-range locks are held, the client SHOULD owner on that file. If byte-range locks are held, the client SHOULD
release all locks before issuing a CLOSE. The server MAY free all release all locks before sending a CLOSE operation. The server MAY
outstanding locks on CLOSE but some servers may not support the CLOSE free all outstanding locks on CLOSE but some servers may not support
of a file that still has byte-range locks held. The server MUST the CLOSE of a file that still has byte-range locks held. The server
return failure, NFS4ERR_LOCKS_HELD, if any locks would exist after MUST return failure, NFS4ERR_LOCKS_HELD, if any locks would exist
the CLOSE. after the CLOSE.
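The server-side CLOSE rules above (free locks or fail, and always drop share reservations on success) are sketched below. The function, its parameters, and the string-valued status enum are hypothetical; only the status names come from the protocol, and their numeric values are deliberately not modeled.

```python
from enum import Enum
from typing import List, Set


class Nfsstat(Enum):
    # Status names from the protocol; numeric values are not modeled.
    NFS4_OK = "NFS4_OK"
    NFS4ERR_LOCKS_HELD = "NFS4ERR_LOCKS_HELD"


def handle_close(
    share_reservations: Set[str],
    byte_range_locks: List[tuple],
    frees_locks_on_close: bool,
) -> Nfsstat:
    """Server-side CLOSE for one open-owner and file (hypothetical model)."""
    if byte_range_locks:
        if not frees_locks_on_close:
            # Locks would still exist after the CLOSE, so it must fail.
            return Nfsstat.NFS4ERR_LOCKS_HELD
        byte_range_locks.clear()  # this server chooses to free them
    share_reservations.clear()  # CLOSE removes all share reservations
    return Nfsstat.NFS4_OK
```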
The LOOKUP operation will return a filehandle without establishing The LOOKUP operation will return a filehandle without establishing
any lock state on the server. Without a valid stateid, the server any lock state on the server. Without a valid stateid, the server
will assume the client has the least access. For example, a file will assume the client has the least access. For example, a file
opened with deny READ/WRITE using a filehandle obtained through opened with deny READ/WRITE using a filehandle obtained through
LOOKUP could only be read using the special read bypass stateid and LOOKUP could only be read using the special read bypass stateid and
could not be written at all because it would not have a valid stateid could not be written at all because it would not have a valid stateid
and the special anonymous stateid would not be allowed access. and the special anonymous stateid would not be allowed access.
9.9. Open Upgrade and Downgrade 9.9. Open Upgrade and Downgrade
skipping to change at page 192, line 6 skipping to change at page 192, line 6
represented by a single stateid whose "other" value matches that of represented by a single stateid whose "other" value matches that of
the original open, and whose "seqid" value is incremented to reflect the original open, and whose "seqid" value is incremented to reflect
the occurrence of the upgrade. The increment is required in cases in the occurrence of the upgrade. The increment is required in cases in
which the "upgrade" results in no change to the open mode (e.g. an which the "upgrade" results in no change to the open mode (e.g. an
OPEN is done for read when the existing open file is opened for read- OPEN is done for read when the existing open file is opened for read-
write). Only a single CLOSE will be done to reset the effects of write). Only a single CLOSE will be done to reset the effects of
both OPENs. The client may use the stateid returned by the OPEN both OPENs. The client may use the stateid returned by the OPEN
effecting the upgrade or a stateid sharing the same "other" effecting the upgrade or a stateid sharing the same "other"
field and a seqid of zero, although care needs to be taken as far as field and a seqid of zero, although care needs to be taken as far as
upgrades which happen while the CLOSE is pending. Note that the upgrades which happen while the CLOSE is pending. Note that the
client, when issuing the OPEN, may not know that the same file is in client, when sending the OPEN operation, may not know that the same
fact being opened. The above only applies if both OPENs result in file is in fact being opened. The above only applies if both OPENs
the OPENed object being designated by the same filehandle. result in the OPENed object being designated by the same filehandle.
When the server chooses to export multiple filehandles corresponding When the server chooses to export multiple filehandles corresponding
to the same file object and returns different filehandles on two to the same file object and returns different filehandles on two
different OPENs of the same file object, the server MUST NOT "OR" different OPENs of the same file object, the server MUST NOT "OR"
together the access and deny bits and coalesce the two open files. together the access and deny bits and coalesce the two open files.
Instead the server must maintain separate OPENs with separate Instead the server must maintain separate OPENs with separate
stateids and will require separate CLOSEs to free them. stateids and will require separate CLOSEs to free them.
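The upgrade behavior above can be sketched as a server table keyed by open-owner and filehandle: a second OPEN through the same filehandle ORs in the access and deny bits, keeps the "other" field, and increments "seqid" even when the mode does not change, while a different filehandle for the same object gets a separate open. The share constants follow the protocol's OPEN4_SHARE_* values; the table itself, the 12-byte counter used for "other", and the initial seqid of 1 are assumptions of this sketch.

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple

OPEN4_SHARE_ACCESS_READ = 1
OPEN4_SHARE_ACCESS_WRITE = 2
OPEN4_SHARE_DENY_NONE = 0


@dataclass(frozen=True)
class OpenStateid:
    other: bytes  # 12 bytes, fixed for the life of the open
    seqid: int
    share_access: int
    share_deny: int


class OpenTable:
    """Hypothetical server open table keyed by (open-owner, filehandle)."""

    def __init__(self) -> None:
        self._opens: Dict[Tuple[str, bytes], OpenStateid] = {}
        self._counter = 0

    def open(self, open_owner: str, fh: bytes, access: int, deny: int) -> OpenStateid:
        key = (open_owner, fh)
        current = self._opens.get(key)
        if current is None:
            self._counter += 1
            current = OpenStateid(self._counter.to_bytes(12, "big"), 1, access, deny)
        else:
            # Upgrade: same "other", OR the bits, always bump "seqid".
            current = replace(
                current,
                seqid=current.seqid + 1,
                share_access=current.share_access | access,
                share_deny=current.share_deny | deny,
            )
        self._opens[key] = current
        return current
```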
When multiple open files on the client are merged into a single open When multiple open files on the client are merged into a single open
file object on the server, the close of one of the open files (on the file object on the server, the close of one of the open files (on the
skipping to change at page 235, line 5 skipping to change at page 235, line 5
attribute, the conditions of use specified in this attribute (e.g. attribute, the conditions of use specified in this attribute (e.g.
priorities, specification of simultaneous use) may limit the client's priorities, specification of simultaneous use) may limit the client's
use of these alternate locations. use of these alternate locations.
If a single location entry designates multiple server IP addresses, If a single location entry designates multiple server IP addresses,
the client cannot assume that these addresses are multiple paths to the client cannot assume that these addresses are multiple paths to
the same server. In most cases they will be, but the client MUST the same server. In most cases they will be, but the client MUST
verify that before acting on that assumption. When two server verify that before acting on that assumption. When two server
addresses are designated by a single location entry and they addresses are designated by a single location entry and they
correspond to different servers, this normally indicates some sort of correspond to different servers, this normally indicates some sort of
misconfiguration, and so the client should avoid use such location misconfiguration, and so the client should avoid using such location
entries when alternatives are available. When they are not, clients entries when alternatives are available. When they are not, clients
should pick one of the IP addresses and use it, without using others that should pick one of the IP addresses and use it, without using others that
are not directed to the same server. are not directed to the same server.
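The advice above (pick one address, then use only those verified to reach the same server) can be sketched as a filter. The probe_server_owner callable is hypothetical: it stands for whatever check the client uses to identify the server behind an address, and how that check works is outside this sketch.

```python
from typing import Callable, Hashable, List


def addresses_for_same_server(
    addresses: List[str],
    probe_server_owner: Callable[[str], Hashable],
) -> List[str]:
    """Pick the first address, then keep only the addresses verified to
    reach the same server as that one."""
    if not addresses:
        return []
    chosen_owner = probe_server_owner(addresses[0])
    return [a for a in addresses if probe_server_owner(a) == chosen_owner]
```

With addresses 10.0.0.1 and 10.0.0.2 reaching server A and 10.0.0.3 reaching server B, the client would use only the first two.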
11.6. Additional Client-side Considerations 11.6. Additional Client-side Considerations
When clients make use of servers that implement referrals, When clients make use of servers that implement referrals,
replication, and migration, care should be taken so that a user who replication, and migration, care should be taken so that a user who
mounts a given file system that includes a referral or a relocated mounts a given file system that includes a referral or a relocated
file system continues to see a coherent picture of that user-side file system continues to see a coherent picture of that user-side
file system despite the fact that it contains a number of server-side file system despite the fact that it contains a number of server-side
file systems which may be on different servers. file systems which may be on different servers.
One important issue is upward navigation from the root of a server- One important issue is upward navigation from the root of a server-
side file system to its parent (specified as ".." in UNIX), in the side file system to its parent (specified as ".." in UNIX), in the
case in which it transitions to that file system as a result of case in which it transitions to that file system as a result of
referral, migration, or a transition as a result of replication. referral, migration, or a transition as a result of replication.
When the client is at such a point, and it needs to ascend to the When the client is at such a point, and it needs to ascend to the
parent, it must go back to the parent as seen within the multi-server parent, it must go back to the parent as seen within the multi-server
namespace rather issuing a LOOKUPP call to the server, which would namespace rather than sending a LOOKUPP operation to the server,
result in the parent within that server's single-server namespace. which would result in the parent within that server's single-server
In order to do this, the client needs to remember the filehandles namespace. In order to do this, the client needs to remember the
that represent such file system roots, and use these instead of filehandles that represent such file system roots, and use these
issuing a LOOKUPP to the current server. This will allow the client instead of sending a LOOKUPP operation to the current server. This
to present to applications a consistent namespace, where upward will allow the client to present to applications a consistent
navigation and downward navigation are consistent. namespace, where upward navigation and downward navigation are
consistent.
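The upward-navigation technique above amounts to a client-side map from each server-side file system root to its parent in the multi-server namespace, consulted before falling back to LOOKUPP. The class, its method names, and the lookupp callable (standing in for sending a LOOKUPP operation) are hypothetical.

```python
from typing import Callable, Dict


class ClientNamespace:
    """Client-side parent resolution across file system transitions."""

    def __init__(self, lookupp: Callable[[bytes], bytes]) -> None:
        self._lookupp = lookupp  # stands in for sending LOOKUPP to the server
        self._fs_root_parent: Dict[bytes, bytes] = {}

    def record_fs_root(self, parent_fh: bytes, fs_root_fh: bytes) -> None:
        # Called when a referral, migration, or replica transition is
        # crossed: remember where the client came from.
        self._fs_root_parent[fs_root_fh] = parent_fh

    def parent_of(self, fh: bytes) -> bytes:
        # At a remembered root, use the multi-server namespace parent;
        # otherwise the server's own parent is the right answer.
        if fh in self._fs_root_parent:
            return self._fs_root_parent[fh]
        return self._lookupp(fh)
```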
Another issue concerns refresh of referral locations. When referrals Another issue concerns refresh of referral locations. When referrals
are used extensively, they may change as server configurations are used extensively, they may change as server configurations
change. It is expected that clients will cache information related change. It is expected that clients will cache information related
to traversing referrals so that future client side requests are to traversing referrals so that future client side requests are
resolved locally without server communication. This is usually resolved locally without server communication. This is usually
rooted in client-side name lookup caching. Clients should rooted in client-side name lookup caching. Clients should
periodically purge this data for referral points in order to detect periodically purge this data for referral points in order to detect
changes in location information. When the change_policy attribute changes in location information. When the change_policy attribute
changes for directories that hold referral entries or for the changes for directories that hold referral entries or for the
skipping to change at page 245, line 7 skipping to change at page 245, line 7
system transition, edge conditions can arise similar to those for system transition, edge conditions can arise similar to those for
reclaim after server restart (although in the case of the planned reclaim after server restart (although in the case of the planned
state transfer associated with migration, these can be avoided by state transfer associated with migration, these can be avoided by
securely recording lock state as part of state migration). Unless securely recording lock state as part of state migration). Unless
the destination server can guarantee that locks will not be the destination server can guarantee that locks will not be
incorrectly granted, the destination server should not allow lock incorrectly granted, the destination server should not allow lock
reclaims and should avoid establishing a grace period. reclaims and should avoid establishing a grace period.
Once all locks have been reclaimed, or there were no locks to Once all locks have been reclaimed, or there were no locks to
reclaim, the client indicates that there are no more reclaims to be reclaim, the client indicates that there are no more reclaims to be
done for the file system in question by issuing a RECLAIM_COMPLETE done for the file system in question by sending a RECLAIM_COMPLETE
operation with the rca_one_fs parameter set to true. Once this has operation with the rca_one_fs parameter set to true. Once this has
been done, non-reclaim locking operations may be done, and any been done, non-reclaim locking operations may be done, and any
subsequent request to do reclaims will be rejected with the error subsequent request to do reclaims will be rejected with the error
NFS4ERR_NO_GRACE. NFS4ERR_NO_GRACE.
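The per-file-system reclaim sequence above can be sketched as server-side bookkeeping: once RECLAIM_COMPLETE arrives with rca_one_fs set to true for a file system, later reclaims for that file system are rejected with NFS4ERR_NO_GRACE. The ReclaimTracker class and its string results are hypothetical, and a global RECLAIM_COMPLETE (rca_one_fs false) is not modeled.

```python
from typing import Set


class ReclaimTracker:
    """Per-file-system reclaim bookkeeping after a transition (hypothetical)."""

    def __init__(self) -> None:
        self._done: Set[str] = set()  # file systems with RECLAIM_COMPLETE(rca_one_fs=true)

    def reclaim_complete(self, fsid: str, rca_one_fs: bool) -> str:
        if rca_one_fs:
            self._done.add(fsid)
        # A global RECLAIM_COMPLETE (rca_one_fs false) is not modeled here.
        return "NFS4_OK"

    def reclaim(self, fsid: str) -> str:
        return "NFS4ERR_NO_GRACE" if fsid in self._done else "NFS4_OK"
```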
Information about client identity may be propagated between servers Information about client identity may be propagated between servers
in the form of client_owner4 and associated verifiers, under the in the form of client_owner4 and associated verifiers, under the
assumption that the client presents the same values to all the assumption that the client presents the same values to all the
servers with which it deals. servers with which it deals.
skipping to change at page 286, line 21 skipping to change at page 286, line 21
For block/volume-based layouts, LAYOUTCOMMIT may require updating the For block/volume-based layouts, LAYOUTCOMMIT may require updating the
block list that comprises the file and committing this layout to block list that comprises the file and committing this layout to
stable storage. For file layouts, synchronization of attributes stable storage. For file layouts, synchronization of attributes
between the metadata and storage devices, primarily the size attribute, between the metadata and storage devices, primarily the size attribute,
is required. is required.
The control protocol is free to synchronize the attributes before it The control protocol is free to synchronize the attributes before it
receives a LAYOUTCOMMIT; however, upon successful completion of a receives a LAYOUTCOMMIT; however, upon successful completion of a
LAYOUTCOMMIT, state that exists on the metadata server that describes LAYOUTCOMMIT, state that exists on the metadata server that describes
the file MUST be in sync with the state existing on the storage the file MUST be synchronized with the state existing on the storage
devices that comprise that file as of the issuing client's last devices that comprise that file as of the time of the client's last
operation. Thus, a client that queries the size of a file between a sent operation. Thus, a client that queries the size of a file
WRITE to a storage device and the LAYOUTCOMMIT may observe a size between a WRITE to a storage device and the LAYOUTCOMMIT might
that does not reflect the actual data written. observe a size that does not reflect the actual data written.
The client MUST have a layout in order to issue LAYOUTCOMMIT. The client MUST have a layout in order to send a LAYOUTCOMMIT
operation.
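The size-visibility rule can be shown with a toy model in which the metadata server learns about data-server writes only through LAYOUTCOMMIT. This is a sketch under stated assumptions. The names mds_file, ds_write() and layoutcommit() are hypothetical, and the client is assumed to report the offset of the last byte it wrote. The change and time_modify attributes are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of one file as seen by the metadata server (MDS). */
struct mds_file {
    uint64_t size; /* size attribute returned by GETATTR */
};

/* A WRITE sent directly to a storage device; the MDS is not told about it.
 * Returns the offset of the last byte written. */
static uint64_t ds_write(uint64_t offset, uint64_t count)
{
    assert(count > 0);
    return offset + count - 1;
}

/* LAYOUTCOMMIT: bring the MDS size in line with the client's writes. */
static void layoutcommit(struct mds_file *f, uint64_t last_write_offset)
{
    assert(f != 0);
    if (last_write_offset + 1 > f->size)
        f->size = last_write_offset + 1;
}
```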
12.5.4.1. LAYOUTCOMMIT and change/time_modify 12.5.4.1. LAYOUTCOMMIT and change/time_modify
The change and time_modify attributes may be updated by the server The change and time_modify attributes may be updated by the server
when the LAYOUTCOMMIT operation is processed. The reason for this is when the LAYOUTCOMMIT operation is processed. The reason for this is
that some layout types do not support the update of these attributes that some layout types do not support the update of these attributes
when the storage devices process I/O operations. If the client has a when the storage devices process I/O operations. If the client has a
layout with the LAYOUTIOMODE4_RW iomode on the file, the client MAY layout with the LAYOUTIOMODE4_RW iomode on the file, the client MAY
provide a suggested value to the server for time_modify within the provide a suggested value to the server for time_modify within the
arguments to LAYOUTCOMMIT. Based on the layout type, the provided arguments to LAYOUTCOMMIT. Based on the layout type, the provided
skipping to change at page 290, line 26 skipping to change at page 290, line 26
o If conflicts that require callbacks are very rare, and a server o If conflicts that require callbacks are very rare, and a server
can use a multi-file callback to recover per-client resources can use a multi-file callback to recover per-client resources
(e.g., via an FSID recall, or a multi-file recall within a single (e.g., via an FSID recall, or a multi-file recall within a single
compound), the result may be significantly less client-server pNFS compound), the result may be significantly less client-server pNFS
traffic. traffic.
o It may be useful for servers to maintain information about what o It may be useful for servers to maintain information about what
ranges are held by a client on a coarse-grained basis, leading to ranges are held by a client on a coarse-grained basis, leading to
the server's layout ranges being beyond those actually held by the the server's layout ranges being beyond those actually held by the
client. In the extreme, a server could manage conflicts on a per- client. In the extreme, a server could manage conflicts on a per-
file basis, only issuing whole-file callbacks even though clients file basis, only sending whole-file callbacks even though clients
may request and be granted sub-file ranges. may request and be granted sub-file ranges.
o It may be useful for clients to "forget" details about what o It may be useful for clients to "forget" details about what
layouts and ranges the client actually has, leading to the layouts and ranges the client actually has, leading to the
server's layout ranges being beyond what the client "thinks" server's layout ranges being beyond what the client "thinks"
it has. As long as the client does not assume it has layouts that it has. As long as the client does not assume it has layouts that
are beyond what the server has granted, this is a safe practice. are beyond what the server has granted, this is a safe practice.
When a client forgets what ranges and layouts it has, and it When a client forgets what ranges and layouts it has, and it
receives a CB_LAYOUTRECALL operation, the client MUST follow up receives a CB_LAYOUTRECALL operation, the client MUST follow up
with a LAYOUTRETURN for what the server recalled, or alternatively with a LAYOUTRETURN for what the server recalled, or alternatively
return the NFS4ERR_NOMATCHING_LAYOUT error if it has no layout to return the NFS4ERR_NOMATCHING_LAYOUT error if it has no layout to
return in the recalled range. return in the recalled range.
o In order to avoid errors, it is vital that a client not assign o In order to avoid errors, it is vital that a client not assign
itself layout permissions beyond what the server has granted and itself layout permissions beyond what the server has granted and
that the server not forget layout permissions that have been that the server not forget layout permissions that have been
granted. On the other hand, if a server believes that a client granted. On the other hand, if a server believes that a client
holds a layout that the client does not know about, it is useful holds a layout that the client does not know about, it is useful
for the client to cleanly indicate completion of the requested for the client to cleanly indicate completion of the requested
recall either by issuing a LAYOUTRETURN for the entire requested recall either by sending a LAYOUTRETURN operation for the entire
range or by returning an NFS4ERR_NOMATCHING_LAYOUT error to the requested range or by returning an NFS4ERR_NOMATCHING_LAYOUT error
CB_LAYOUTRECALL. to the CB_LAYOUTRECALL.
Thus, in light of the above, it is useful for a server to be able to Thus, in light of the above, it is useful for a server to be able to
send callbacks for layout ranges it has not granted to a client, and send callbacks for layout ranges it has not granted to a client, and
for a client to return ranges it does not hold. A pNFS client MUST for a client to return ranges it does not hold. A pNFS client MUST
always return layouts that comprise the full range specified by the always return layouts that comprise the full range specified by the
recall. Note that the full recalled layout range need not be returned as recall. Note that the full recalled layout range need not be returned as
part of a single operation, but may be returned in portions. This part of a single operation, but may be returned in portions. This
allows the client to stage the flushing of dirty data, layout allows the client to stage the flushing of dirty data, layout
commits, and returns. Also, it indicates to the metadata server that commits, and returns. Also, it indicates to the metadata server that
the client is making progress. the client is making progress.
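Because the recalled range may be returned across several LAYOUTRETURN operations, a server has to decide when the union of the returned ranges covers the recall. This is a minimal sketch, assuming the server simply keeps the ranges returned so far. The names range and recall_satisfied() are hypothetical. The special "whole file" length value and arithmetic overflow are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical byte range returned by one LAYOUTRETURN. */
struct range {
    uint64_t offset;
    uint64_t length;
};

static int cmp_offset(const void *a, const void *b)
{
    const struct range *ra = a, *rb = b;
    if (ra->offset < rb->offset) return -1;
    if (ra->offset > rb->offset) return 1;
    return 0;
}

/*
 * Returns true once the returned ranges, taken together, cover the
 * recalled range [recall.offset, recall.offset + recall.length).
 * Sorts 'ret' in place. Assumes offset + length does not overflow.
 */
static bool recall_satisfied(struct range recall, struct range *ret, size_t n)
{
    uint64_t covered = recall.offset;             /* first byte not yet returned */
    uint64_t end = recall.offset + recall.length; /* one past the recalled range */

    assert(ret != NULL || n == 0);
    if (n > 0)
        qsort(ret, n, sizeof ret[0], cmp_offset);
    for (size_t i = 0; i < n && covered < end; i++) {
        if (ret[i].offset > covered)
            break;                                /* gap: part of the range still held */
        uint64_t seg_end = ret[i].offset + ret[i].length;
        if (seg_end > covered)
            covered = seg_end;
    }
    return covered >= end;
}
```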
skipping to change at page 292, line 29 skipping to change at page 292, line 29
processing such a CB_LAYOUTRECALL until it processes all replies for processing such a CB_LAYOUTRECALL until it processes all replies for
outstanding LAYOUTGET and LAYOUTRETURN operations for the outstanding LAYOUTGET and LAYOUTRETURN operations for the
corresponding file with seqid less than the seqid given by corresponding file with seqid less than the seqid given by
CB_LAYOUTRECALL (lor_stateid; see Section 20.3). CB_LAYOUTRECALL (lor_stateid; see Section 20.3).
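With one simplifying assumption, the seqid rule reduces to a single comparison. The assumption, which the paragraph above does not state, is that the server increments the layout stateid's seqid by exactly one for each LAYOUTGET, LAYOUTRETURN and CB_LAYOUTRECALL on the file. Under that assumption, any gap between the seqid the client has processed and the seqid in lor_stateid means that some replies are still outstanding. The helper recall_must_wait() is hypothetical, and seqid wraparound is ignored. The same comparison is one way a client could tell apart the two LAYOUTGET/CB_LAYOUTRECALL race cases discussed in Section 12.5.5.2.1.2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * 'current_seqid': seqid of the layout stateid after the last LAYOUTGET or
 * LAYOUTRETURN reply the client has processed for the file.
 * 'recall_seqid': seqid carried in lor_stateid of the CB_LAYOUTRECALL.
 */
static bool recall_must_wait(uint32_t current_seqid, uint32_t recall_seqid)
{
    assert(current_seqid < UINT32_MAX); /* wraparound is not modeled */
    /* Seqids strictly between the two belong to replies not yet processed. */
    return recall_seqid > current_seqid + 1;
}
```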
In addition to the seqid-based mechanism, Section 2.10.6.3 describes In addition to the seqid-based mechanism, Section 2.10.6.3 describes
the sessions mechanism for allowing the client to detect callback the sessions mechanism for allowing the client to detect callback
race conditions and delay processing such a CB_LAYOUTRECALL. The race conditions and delay processing such a CB_LAYOUTRECALL. The
server MAY reference conflicting operations in the CB_SEQUENCE that server MAY reference conflicting operations in the CB_SEQUENCE that
precedes the CB_LAYOUTRECALL. Because the server has already sent precedes the CB_LAYOUTRECALL. Because the server has already sent
replies for these operations before issuing the callback, the replies replies for these operations before sending the callback, the replies
may race with the CB_LAYOUTRECALL. The client MUST wait for all the may race with the CB_LAYOUTRECALL. The client MUST wait for all the
referenced calls to complete and update its view of the layout state referenced calls to complete and update its view of the layout state
before processing the CB_LAYOUTRECALL. before processing the CB_LAYOUTRECALL.
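The referring-call check can be expressed as a comparison against the client's fore channel slot table. This is a minimal sketch under assumptions. The names client_slots, referring_call and referring_calls_done() are hypothetical. Only one session with at most 16 slots is modeled, and sequence ID wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SLOTS 16 /* hypothetical slot table size */

/* Hypothetical client record of its fore channel slots. */
struct client_slots {
    uint32_t completed_seqid[MAX_SLOTS]; /* last sequence ID whose reply was processed */
};

/* One (slot ID, sequence ID) pair referenced in CB_SEQUENCE. */
struct referring_call {
    uint32_t slotid;
    uint32_t seqid;
};

/* True when every referenced call has completed, so the recall may proceed. */
static bool referring_calls_done(const struct client_slots *cs,
                                 const struct referring_call *rc, size_t n)
{
    assert(cs != NULL && (rc != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        assert(rc[i].slotid < MAX_SLOTS);
        if (cs->completed_seqid[rc[i].slotid] < rc[i].seqid)
            return false; /* reply not yet processed: defer CB_LAYOUTRECALL */
    }
    return true;
}
```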
12.5.5.2.1.1. Get/Return Sequencing 12.5.5.2.1.1. Get/Return Sequencing
The protocol allows the client to send concurrent LAYOUTGET and The protocol allows the client to send concurrent LAYOUTGET and
LAYOUTRETURN operations to the server. The protocol does not provide LAYOUTRETURN operations to the server. The protocol does not provide
any means for the server to process the requests in the same order in any means for the server to process the requests in the same order in
which they were created. However, through the use of the "seqid" which they were created. However, through the use of the "seqid"
skipping to change at page 293, line 27 skipping to change at page 293, line 27
outstanding at the same time for that same file. outstanding at the same time for that same file.
12.5.5.2.1.2. Client Considerations 12.5.5.2.1.2. Client Considerations
Consider a pNFS client that has sent a LAYOUTGET and before it Consider a pNFS client that has sent a LAYOUTGET and before it
receives the reply to LAYOUTGET, it receives a CB_LAYOUTRECALL for receives the reply to LAYOUTGET, it receives a CB_LAYOUTRECALL for
the same file with an overlapping range. There are two the same file with an overlapping range. There are two
possibilities, which the client can distinguish via the layout possibilities, which the client can distinguish via the layout
stateid in the recall. stateid in the recall.
1. The server processed the LAYOUTGET before issuing the recall, so 1. The server processed the LAYOUTGET before sending the recall, so
the LAYOUTGET must be waited for because it may be carrying the LAYOUTGET must be waited for because it may be carrying
layout information that will need to be returned to deal with the layout information that will need to be returned to deal with the
CB_LAYOUTRECALL. CB_LAYOUTRECALL.
2. The server sent the callback before receiving the LAYOUTGET. The 2. The server sent the callback before receiving the LAYOUTGET. The
server will not respond to the LAYOUTGET until the server will not respond to the LAYOUTGET until the
CB_LAYOUTRECALL is processed. CB_LAYOUTRECALL is processed.
If these possibilities cannot be distinguished, a deadlock could If these possibilities cannot be distinguished, a deadlock could
result, as the client must wait for the LAYOUTGET response before result, as the client must wait for the LAYOUTGET response before
skipping to change at page 309, line 42 skipping to change at page 309, line 42
NFLH4_CARE_COMMIT_THRU_MDS NFLH4_CARE_COMMIT_THRU_MDS
= NFL4_UFLG_COMMIT_THRU_MDS, = NFL4_UFLG_COMMIT_THRU_MDS,
NFLH4_CARE_STRIPE_UNIT_SIZE NFLH4_CARE_STRIPE_UNIT_SIZE
= 0x00000040, = 0x00000040,
NFLH4_CARE_STRIPE_COUNT = 0x00000080 NFLH4_CARE_STRIPE_COUNT = 0x00000080
}; };
/* Encoded in the loh_body field of type layouthint4: */ /* Encoded in the loh_body field of data type layouthint4: */
struct nfsv4_1_file_layouthint4 { struct nfsv4_1_file_layouthint4 {
uint32_t nflh_care; uint32_t nflh_care;
nfl_util4 nflh_util; nfl_util4 nflh_util;
count4 nflh_stripe_count; count4 nflh_stripe_count;
}; };
The generic layout hint structure is described in Section 3.3.19. The generic layout hint structure is described in Section 3.3.19.
The client uses the layout hint in the layout_hint (Section 5.12.4) The client uses the layout hint in the layout_hint (Section 5.12.4)
attribute to indicate the preferred type of layout to be used for a attribute to indicate the preferred type of layout to be used for a
skipping to change at page 310, line 37 skipping to change at page 310, line 37
nfsv4_1_file_layout4. Among other content, nfsv4_1_file_layout4 has nfsv4_1_file_layout4. Among other content, nfsv4_1_file_layout4 has
a storage device ID (field nfl_deviceid) of data type deviceid4. The a storage device ID (field nfl_deviceid) of data type deviceid4. The
GETDEVICEINFO operation maps a device ID to a storage device address GETDEVICEINFO operation maps a device ID to a storage device address
(type device_addr4). When GETDEVICEINFO returns a device address (type device_addr4). When GETDEVICEINFO returns a device address
with a layout type of LAYOUT4_NFSV4_1_FILES (the da_layout_type with a layout type of LAYOUT4_NFSV4_1_FILES (the da_layout_type
field), the da_addr_body field contains a value of data type field), the da_addr_body field contains a value of data type
nfsv4_1_file_layout_ds_addr4. nfsv4_1_file_layout_ds_addr4.
typedef netaddr4 multipath_list4<>; typedef netaddr4 multipath_list4<>;
/* Encoded in the da_addr_body field of type device_addr4: */ /*
* Encoded in the da_addr_body field of
* data type device_addr4:
*/
struct nfsv4_1_file_layout_ds_addr4 { struct nfsv4_1_file_layout_ds_addr4 {
uint32_t nflda_stripe_indices<>; uint32_t nflda_stripe_indices<>;
multipath_list4 nflda_multipath_ds_list<>; multipath_list4 nflda_multipath_ds_list<>;
}; };
The nfsv4_1_file_layout_ds_addr4 data type represents the device The nfsv4_1_file_layout_ds_addr4 data type represents the device
address. It is composed of two fields: address. It is composed of two fields:
1. nflda_multipath_ds_list: An array of lists of data servers, where 1. nflda_multipath_ds_list: An array of lists of data servers, where
each list contains one or more elements, and each element each list contains one or more elements, and each element
skipping to change at page 311, line 13 skipping to change at page 311, line 16
array might be different from the stripe count. array might be different from the stripe count.
2. nflda_stripe_indices: An array of indices used to index into 2. nflda_stripe_indices: An array of indices used to index into
nflda_multipath_ds_list. The value of each element of nflda_multipath_ds_list. The value of each element of
nflda_stripe_indices MUST be less than the number of elements in nflda_stripe_indices MUST be less than the number of elements in
nflda_multipath_ds_list. Each element of nflda_multipath_ds_list nflda_multipath_ds_list. Each element of nflda_multipath_ds_list
SHOULD be referred to by one or more elements of SHOULD be referred to by one or more elements of
nflda_stripe_indices. The number of elements in nflda_stripe_indices. The number of elements in
nflda_stripe_indices is always equal to the stripe count. nflda_stripe_indices is always equal to the stripe count.
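The constraints on nflda_stripe_indices lend themselves to a simple validation pass, which a client might run on a device address returned by GETDEVICEINFO. This sketch assumes that both arrays have already been decoded from XDR. The helper check_stripe_indices() is hypothetical and reports the MUST and SHOULD conditions separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * 'indices'/'n_indices' stand for nflda_stripe_indices (its length is the
 * stripe count); 'n_ds_lists' is the number of elements in
 * nflda_multipath_ds_list. Returns false if an index violates the MUST;
 * '*all_used' reports whether the SHOULD (every list referenced) holds.
 */
static bool check_stripe_indices(const uint32_t *indices, size_t n_indices,
                                 size_t n_ds_lists, bool *all_used)
{
    assert(all_used != NULL && (indices != NULL || n_indices == 0));
    *all_used = false;
    bool *used = calloc(n_ds_lists > 0 ? n_ds_lists : 1, sizeof *used);
    if (used == NULL)
        return false;                   /* allocation failure */
    bool ok = true;
    for (size_t i = 0; i < n_indices; i++) {
        if (indices[i] >= n_ds_lists) { /* MUST be less than the list count */
            ok = false;
            break;
        }
        used[indices[i]] = true;
    }
    if (ok) {
        *all_used = true;
        for (size_t j = 0; j < n_ds_lists; j++)
            if (!used[j])
                *all_used = false;      /* SHOULD: each list is referenced */
    }
    free(used);
    return ok;
}
```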
/* Encoded in the loc_body field of type layout_content4: */ /*
* Encoded in the loc_body field of
* data type layout_content4:
*/
struct nfsv4_1_file_layout4 { struct nfsv4_1_file_layout4 {
deviceid4 nfl_deviceid; deviceid4 nfl_deviceid;
nfl_util4 nfl_util; nfl_util4 nfl_util;
uint32_t nfl_first_stripe_index; uint32_t nfl_first_stripe_index;
offset4 nfl_pattern_offset; offset4 nfl_pattern_offset;
nfs_fh4 nfl_fh_list<>; nfs_fh4 nfl_fh_list<>;
}; };
The nfsv4_1_file_layout4 data type represents the layout. It is The nfsv4_1_file_layout4 data type represents the layout. It is
composed of the following fields: composed of the following fields:
skipping to change at page 312, line 26 skipping to change at page 312, line 32
nfl_fh_list MUST be one of three values: nfl_fh_list MUST be one of three values:
+ Zero. This means that filehandles used for each data + Zero. This means that filehandles used for each data
server are the same as the filehandle returned by the OPEN server are the same as the filehandle returned by the OPEN
operation from the metadata server. operation from the metadata server.
+ One. This means that every data server uses the same + One. This means that every data server uses the same
filehandle: what is specified in nfl_fh_list[0]. filehandle: what is specified in nfl_fh_list[0].
+ The same number of elements in nflda_multipath_ds_list. + The same number of elements in nflda_multipath_ds_list.
Thus, in this case, when issuing an I/O to any data server Thus, in this case, when sending an I/O operation to any
in nflda_multipath_ds_list[X], the filehandle in data server in nflda_multipath_ds_list[X], the filehandle
nfl_fh_list[X] MUST be used. in nfl_fh_list[X] MUST be used.
See the discussion on sparse packing in Section 13.4.4. See the discussion on sparse packing in Section 13.4.4.
* If dense packing is being used, the number of elements in * If dense packing is being used, the number of elements in
nfl_fh_list MUST be the same as the number of elements in nfl_fh_list MUST be the same as the number of elements in
nflda_stripe_indices. Thus when issuing I/O to any data nflda_stripe_indices. Thus when sending an I/O operation to
server in nflda_multipath_ds_list[nflda_stripe_indices[Y]], any data server in
the filehandle in nfl_fh_list[Y] MUST be used. In addition, nflda_multipath_ds_list[nflda_stripe_indices[Y]], the
any time there exist i and j (i != j) such that the filehandle in nfl_fh_list[Y] MUST be used. In addition, any
time there exist i and j (i != j) such that the
intersection of intersection of
nflda_multipath_ds_list[nflda_stripe_indices[i]] and nflda_multipath_ds_list[nflda_stripe_indices[i]] and
nflda_multipath_ds_list[nflda_stripe_indices[j]] is not empty, nflda_multipath_ds_list[nflda_stripe_indices[j]] is not empty,
then nfl_fh_list[i] MUST NOT equal nfl_fh_list[j]. In other then nfl_fh_list[i] MUST NOT equal nfl_fh_list[j]. In other
words, when dense packing is being used, if a data server words, when dense packing is being used, if a data server
appears in two or more units of a striping pattern, each appears in two or more units of a striping pattern, each
reference to the data server MUST use a different filehandle. reference to the data server MUST use a different filehandle.
Indeed, if there are multiple striping patterns, as indicated Indeed, if there are multiple striping patterns, as indicated
by the presence of multiple objects of data type layout4 by the presence of multiple objects of data type layout4
skipping to change at page 319, line 10 skipping to change at page 319, line 10
The flag NFL4_UFLG_DENSE of the nfl_util4 data type (field nflh_util The flag NFL4_UFLG_DENSE of the nfl_util4 data type (field nflh_util
of the data type nfsv4_1_file_layouthint4 and field nfl_util of data of the data type nfsv4_1_file_layouthint4 and field nfl_util of data
type nfsv4_1_file_layout4) specifies how the data is packed type nfsv4_1_file_layout4) specifies how the data is packed
within the data file on a data server. It allows for two different within the data file on a data server. It allows for two different
data packings: sparse and dense. The packing type determines the data packings: sparse and dense. The packing type determines the
calculation that will be made to map the client-visible file offset calculation that will be made to map the client-visible file offset
to the offset within the data file located on the data server. to the offset within the data file located on the data server.
If nfl_util & NFL4_UFLG_DENSE is zero, this means that sparse packing If nfl_util & NFL4_UFLG_DENSE is zero, this means that sparse packing
is being used. Hence the logical offsets of the file as viewed by a is being used. Hence the logical offsets of the file as viewed by a
client issuing READs and WRITEs directly to the metadata server are client sending READs and WRITEs directly to the metadata server are
the same offsets each data server uses when storing a stripe unit. the same offsets each data server uses when storing a stripe unit.
The effect then, for striping patterns consisting of at least two The effect then, for striping patterns consisting of at least two
stripe units, is for each data server file to be sparse or holey. So stripe units, is for each data server file to be sparse or holey. So
for example, suppose there is a pattern with three stripe units, the for example, suppose there is a pattern with three stripe units, the
stripe unit size is 4096 bytes, and there are three data servers in stripe unit size is 4096 bytes, and there are three data servers in
the pattern, then the file in data server 1 will have stripe units 0, the pattern, then the file in data server 1 will have stripe units 0,
3, 6, 9, ... filled, data server 2's file will have stripe units 1, 3, 6, 9, ... filled, data server 2's file will have stripe units 1,
4, 7, 10, ... filled, and data server 3's file will have stripe units 4, 7, 10, ... filled, and data server 3's file will have stripe units
2, 5, 8, 11, ... filled. The unfilled stripe units of each file will 2, 5, 8, 11, ... filled. The unfilled stripe units of each file will
be holes, hence the files in each data server are sparse. be holes, hence the files in each data server are sparse.
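The placement in the example follows from integer arithmetic on the stripe unit number. The sketch below illustrates that arithmetic under assumptions and is not normative text. It assumes that the stripe unit of a client-visible offset is counted from nfl_pattern_offset and rotated by nfl_first_stripe_index. Its dense-packing branch assumes that each data server stores its own stripe units back to back. The names map_offset() and ds_target are hypothetical. With nflda_stripe_indices = {0, 1, 2}, list index 0 corresponds to "data server 1" in the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Result of mapping a client-visible file offset onto the striping pattern. */
struct ds_target {
    uint32_t ds_list_index; /* index into nflda_multipath_ds_list */
    uint64_t ds_offset;     /* offset within that data server's file */
};

/* 'indices' is nflda_stripe_indices and 'stripe_count' its length;
 * 'first_stripe_index' and 'pattern_offset' mirror nfl_first_stripe_index
 * and nfl_pattern_offset. */
static struct ds_target map_offset(uint64_t file_offset, uint64_t unit_size,
                                   const uint32_t *indices, uint32_t stripe_count,
                                   uint32_t first_stripe_index,
                                   uint64_t pattern_offset, bool dense)
{
    assert(unit_size > 0 && stripe_count > 0 && file_offset >= pattern_offset);
    uint64_t rel = file_offset - pattern_offset;
    uint64_t su = rel / unit_size; /* stripe unit number */
    uint32_t j = (uint32_t)((su + first_stripe_index) % stripe_count);
    struct ds_target t;
    t.ds_list_index = indices[j];
    if (dense) /* assumption: units packed back to back on each data server */
        t.ds_offset = (rel / (unit_size * stripe_count)) * unit_size
                      + rel % unit_size;
    else       /* sparse: same offset as in the client-visible file */
        t.ds_offset = file_offset;
    return t;
}
```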
skipping to change at page 324, line 46 skipping to change at page 324, line 46
Note that if the layout specified dense packing, then the offset used Note that if the layout specified dense packing, then the offset used
in a COMMIT to the MDS may differ from that of an offset used in a in a COMMIT to the MDS may differ from that of an offset used in a
COMMIT to the data server. COMMIT to the data server.
The single COMMIT to the metadata server will return a verifier and The single COMMIT to the metadata server will return a verifier and
the client should compare it to all the verifiers from the WRITEs and the client should compare it to all the verifiers from the WRITEs and
fail the COMMIT if there are any mismatched verifiers. If the COMMIT to fail the COMMIT if there are any mismatched verifiers. If the COMMIT to
the metadata server fails, the client should re-send WRITEs for all the metadata server fails, the client should re-send WRITEs for all
the modified data in the file. The client should treat modified data the modified data in the file. The client should treat modified data
with a mismatched verifier as a WRITE failure and try to recover by with a mismatched verifier as a WRITE failure and try to recover by
reissuing the WRITEs to the original data server or using another resending the WRITEs to the original data server or using another
path to that data if the layout has not been recalled. Another path to that data if the layout has not been recalled. Another
option the client has is to get a new layout or simply to rewrite the option the client has is to get a new layout or simply to rewrite the
data through the metadata server. If nfl_util & data through the metadata server. If nfl_util &
NFL4_UFLG_COMMIT_THRU_MDS is FALSE, sending a COMMIT to the metadata NFL4_UFLG_COMMIT_THRU_MDS is FALSE, sending a COMMIT to the metadata
server might have no effect. If nfl_util & NFL4_UFLG_COMMIT_THRU_MDS server might have no effect. If nfl_util & NFL4_UFLG_COMMIT_THRU_MDS
is FALSE, a COMMIT sent to the metadata server should be used only to is FALSE, a COMMIT sent to the metadata server should be used only to
commit data that was written to the metadata server. See commit data that was written to the metadata server. See
Section 12.7.6 for recovery options. Section 12.7.6 for recovery options.
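After the single COMMIT to the metadata server, the client performs a verifier check that can be sketched as follows. The names verf_t and first_mismatch() are hypothetical. The only protocol fact the sketch relies on is that an NFSv4 verifier is an opaque 8-byte value (NFS4_VERIFIER_SIZE). The caller decides how to recover from a mismatch: by resending, by using another path, or by writing through the metadata server.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NFS4_VERIFIER_SIZE 8

/* Hypothetical wrapper around the 8-byte NFSv4 write verifier. */
typedef struct {
    unsigned char v[NFS4_VERIFIER_SIZE];
} verf_t;

/*
 * Compare the verifier from the COMMIT to the metadata server with the
 * verifier returned by each data-server WRITE. Returns the index of the
 * first mismatched WRITE (whose data must be written again), or -1.
 */
static long first_mismatch(const verf_t *commit_verf,
                           const verf_t *write_verfs, size_t n)
{
    assert(commit_verf != NULL && (write_verfs != NULL || n == 0));
    for (size_t i = 0; i < n; i++)
        if (memcmp(commit_verf->v, write_verfs[i].v, NFS4_VERIFIER_SIZE) != 0)
            return (long)i;
    return -1;
}
```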
13.8. The Layout Iomode 13.8. The Layout Iomode
skipping to change at page 325, line 51 skipping to change at page 325, line 51
has the implication that stateids are globally valid on both the has the implication that stateids are globally valid on both the
metadata and data servers. This requires the metadata server to metadata and data servers. This requires the metadata server to
propagate changes in lock and open state to the data servers, so that propagate changes in lock and open state to the data servers, so that
the data servers can validate I/O accesses. This is discussed the data servers can validate I/O accesses. This is discussed
further in Section 13.9.2. Depending on when stateids are further in Section 13.9.2. Depending on when stateids are
propagated, the existence of a valid stateid on the data server may propagated, the existence of a valid stateid on the data server may
act as proof of a valid layout. act as proof of a valid layout.
Clients performing I/O operations need to select an appropriate Clients performing I/O operations need to select an appropriate
stateid based on the locks (including opens and delegations) held by stateid based on the locks (including opens and delegations) held by
the client and the various types of state-owners issuing the I/O the client and the various types of state-owners sending the I/O
requests. The rules for doing so when referencing data servers are requests. The rules for doing so when referencing data servers are
somewhat different from those discussed in Section 8.2.5, which apply somewhat different from those discussed in Section 8.2.5, which apply
when accessing metadata servers. when accessing metadata servers.
The following rules, applied in order of decreasing priority, govern The following rules, applied in order of decreasing priority, govern
the selection of the appropriate stateid: the selection of the appropriate stateid:
o If the client holds a delegation for the file in question, the o If the client holds a delegation for the file in question, the
delegation stateid should be used. delegation stateid should be used.
skipping to change at page 329, line 22 skipping to change at page 329, line 22
on a precise client lease timer and without requiring data servers to on a precise client lease timer and without requiring data servers to
maintain lease timers. However, while a LAYOUT4_NFSV4_1_FILES pNFS maintain lease timers. However, while a LAYOUT4_NFSV4_1_FILES pNFS
server is free to deny the client all access to the data servers, server is free to deny the client all access to the data servers,
because it supports revocation of layouts, it is also free to perform because it supports revocation of layouts, it is also free to perform
a denial on a per-file basis only when revoking a layout. a denial on a per-file basis only when revoking a layout.
In addition to lease expiration, the reasons a layout can be revoked In addition to lease expiration, the reasons a layout can be revoked
include: failure of the client to respond to a CB_LAYOUTRECALL, a include: failure of the client to respond to a CB_LAYOUTRECALL, a
restart of the metadata server, or administrative intervention. Regardless of the restart of the metadata server, or administrative intervention. Regardless of the
reason, once a client's layout has been revoked, the pNFS server MUST reason, once a client's layout has been revoked, the pNFS server MUST
prevent the client from issuing I/O for the affected file to and from prevent the client from sending I/O for the affected file to and from
all data servers; in other words, it MUST fence the client from the all data servers; in other words, it MUST fence the client from the
affected file on the data servers. affected file on the data servers.
Fencing works as follows. As described in Section 13.1, in COMPOUND Fencing works as follows. As described in Section 13.1, in COMPOUND
procedure requests to the data server, the data filehandle provided procedure requests to the data server, the data filehandle provided
by the PUTFH operation and the stateid in the READ or WRITE operation by the PUTFH operation and the stateid in the READ or WRITE operation
are used to validate that the client has a valid layout for the I/O are used to validate that the client has a valid layout for the I/O
being performed; if it does not, the I/O is rejected with being performed; if it does not, the I/O is rejected with
NFS4ERR_PNFS_NO_LAYOUT. The server can simply check the stateid, and NFS4ERR_PNFS_NO_LAYOUT. The server can simply check the stateid, and
additionally, make the data filehandle stale if the layout specified additionally, make the data filehandle stale if the layout specified
skipping to change at page 350, line 17 skipping to change at page 350, line 17
A reclaim of client state was attempted in circumstances in which the A reclaim of client state was attempted in circumstances in which the
server cannot guarantee that conflicting state has not been provided server cannot guarantee that conflicting state has not been provided
to another client. This can occur because the reclaim has been done to another client. This can occur because the reclaim has been done
outside of the grace period of the server, after the client has done outside of the grace period of the server, after the client has done
a RECLAIM_COMPLETE operation, or because previous operations have a RECLAIM_COMPLETE operation, or because previous operations have
created a situation in which the server is not able to determine that created a situation in which the server is not able to determine that
a reclaim-interfering edge condition does not exist. a reclaim-interfering edge condition does not exist.
15.1.9.4. NFS4ERR_RECLAIM_BAD (Error Code 10034) 15.1.9.4. NFS4ERR_RECLAIM_BAD (Error Code 10034)
A reclaim attempted by the client does not match the server's state The server has determined that a reclaim attempted by the client is
consistency checks and has therefore been rejected as invalid. not valid, i.e., the lock specified as being reclaimed could not
possibly have existed before the server restart. A server is not
obliged to make this determination and will typically rely on the
client to only reclaim locks that the client was granted prior to
restart. However, when a server does have reliable information to
enable it to make this determination, this error indicates that the
reclaim has been rejected as invalid. This is as opposed to the
error NFS4ERR_RECLAIM_CONFLICT (see Section 15.1.9.5) where the
server can only determine that there has been an invalid reclaim, but
cannot determine which request is invalid.
15.1.9.5. NFS4ERR_RECLAIM_CONFLICT (Error Code 10035) 15.1.9.5. NFS4ERR_RECLAIM_CONFLICT (Error Code 10035)
The reclaim attempted by the client has encountered a conflict and The reclaim attempted by the client has encountered a conflict and
cannot be satisfied. This potentially indicates a misbehaving client, cannot be satisfied. This potentially indicates a misbehaving client,
although not necessarily the one receiving the error. The although not necessarily the one receiving the error. The
misbehavior might be on the part of the client that established the misbehavior might be on the part of the client that established the
lock with which this client conflicted. lock with which this client conflicted. See also Section 15.1.9.4
for the related error, NFS4ERR_RECLAIM_BAD.
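The difference between the two reclaim errors can be sketched for byte-range locks. This is a simplified model, not a server algorithm taken from this document. The names lock_rec and check_reclaim() are hypothetical, and every lock is treated as exclusive. The sketch represents "reliable information" as an optional list of locks that are known to have been granted before the restart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum {
    NFS4_OK                  = 0,
    NFS4ERR_RECLAIM_BAD      = 10034,
    NFS4ERR_RECLAIM_CONFLICT = 10035
};

/* Hypothetical byte-range lock held by some client. */
struct lock_rec {
    uint64_t clientid;
    uint64_t offset, length;
};

/* Overlap test for two ranges; ignores overflow of offset + length. */
static bool overlaps(const struct lock_rec *a, const struct lock_rec *b)
{
    return a->offset < b->offset + b->length && b->offset < a->offset + a->length;
}

/*
 * 'pre_restart' lists locks reliably known to have been granted before the
 * restart, or is NULL when the server has no such knowledge; 'reclaimed'
 * holds locks already reclaimed by clients during this grace period.
 */
static int check_reclaim(const struct lock_rec *req,
                         const struct lock_rec *pre_restart, size_t n_pre,
                         const struct lock_rec *reclaimed, size_t n_rec)
{
    assert(req != NULL);
    if (pre_restart != NULL) {
        bool found = false;
        for (size_t i = 0; i < n_pre && !found; i++)
            found = pre_restart[i].clientid == req->clientid &&
                    pre_restart[i].offset == req->offset &&
                    pre_restart[i].length == req->length;
        if (!found)
            return NFS4ERR_RECLAIM_BAD;      /* lock could not have existed */
    }
    for (size_t i = 0; i < n_rec; i++)
        if (reclaimed[i].clientid != req->clientid && overlaps(req, &reclaimed[i]))
            return NFS4ERR_RECLAIM_CONFLICT; /* cannot tell which reclaim is invalid */
    return NFS4_OK;
}
```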
15.1.10. pNFS Errors 15.1.10. pNFS Errors
This section deals with pNFS-related errors including those that are This section deals with pNFS-related errors including those that are
associated with using NFSv4.1 to communicate with a data server. associated with using NFSv4.1 to communicate with a data server.
15.1.10.1. NFS4ERR_BADIOMODE (Error Code 10049) 15.1.10.1. NFS4ERR_BADIOMODE (Error Code 10049)
An invalid or inappropriate layout iomode was specified. An invalid or inappropriate layout iomode was specified.
skipping to change at page 351, line 42 skipping to change at page 351, line 49
Section 12.5.5.2.1.3. Section 12.5.5.2.1.3.
15.1.10.9. NFS4ERR_UNKNOWN_LAYOUTTYPE (Error Code 10062) 15.1.10.9. NFS4ERR_UNKNOWN_LAYOUTTYPE (Error Code 10062)
The client has specified a layout type which is not supported by the The client has specified a layout type which is not supported by the
server. server.
15.1.11. Session Use Errors 15.1.11. Session Use Errors
This section deals with errors encountered in using sessions, that This section deals with errors encountered in using sessions, that
is, in issuing requests over them using the Sequence (i.e., either is, in sending requests over sessions using Sequence (i.e., either
SEQUENCE or CB_SEQUENCE) operations. SEQUENCE or CB_SEQUENCE) operations.
15.1.11.1. NFS4ERR_BADSESSION (Error Code 10052) 15.1.11.1. NFS4ERR_BADSESSION (Error Code 10052)
The specified session ID is unknown to the server to which the The specified session ID is unknown to the server to which the
operation is addressed. operation is addressed.
15.1.11.2. NFS4ERR_BADSLOT (Error Code 10053) 15.1.11.2. NFS4ERR_BADSLOT (Error Code 10053)
The requester sent a Sequence operation that attempted to use a slot The requester sent a Sequence operation that attempted to use a slot
skipping to change at page 352, line 41 skipping to change at page 352, line 46
A Sequence operation was sent on a connection that has not been A Sequence operation was sent on a connection that has not been
associated with the specified session, where the client specified associated with the specified session, where the client specified
that connection association was to be enforced with SP4_MACH_CRED or that connection association was to be enforced with SP4_MACH_CRED or
SP4_SSV state protection. SP4_SSV state protection.
15.1.11.7. NFS4ERR_SEQ_FALSE_RETRY (Error Code 10076) 15.1.11.7. NFS4ERR_SEQ_FALSE_RETRY (Error Code 10076)
The requester sent a Sequence operation with a slot ID and sequence The requester sent a Sequence operation with a slot ID and sequence
ID that are in the reply cache, but the replier has detected that the ID that are in the reply cache, but the replier has detected that the
retried request is not the same as the original request. retried request is not the same as the original request, including a
retry that has different operations or different arguments in the
operations from the original, and a retry that uses a different
principal in the RPC request's credential field.
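The PAv4 wording of NFS4ERR_SEQ_FALSE_RETRY names two ways a retry can differ from the original request: different operations or arguments, and a different principal in the RPC credential. Below is a minimal Python sketch of how a replier might classify an incoming Sequence operation against one slot's cached state. SlotEntry, classify_sequence, the practice of storing a request digest rather than the full request, and the 32-bit wraparound of the sequence ID are all illustrative assumptions, not definitions from the draft.

```python
from dataclasses import dataclass
from typing import Optional

SEQID_MODULUS = 2**32  # assumption: sequence IDs are unsigned 32-bit and wrap


@dataclass
class SlotEntry:
    """Hypothetical reply-cache entry for one slot; field names are illustrative."""
    seqid: int              # sequence ID of the last request seen on this slot
    request_digest: bytes   # digest of that request's operations and arguments
    principal: str          # principal from that request's RPC credential
    reply: Optional[bytes]  # cached reply, if the replier kept one


def classify_sequence(entry: SlotEntry, seqid: int,
                      request_digest: bytes, principal: str) -> str:
    """Classify an incoming SEQUENCE/CB_SEQUENCE against a slot's cached state."""
    if seqid == entry.seqid:
        # Same slot ID and sequence ID: the requester presents this as a retry.
        if request_digest != entry.request_digest or principal != entry.principal:
            # Different operations/arguments, or a different principal.
            return "NFS4ERR_SEQ_FALSE_RETRY"
        return "REPLAY_FROM_CACHE"
    if seqid == (entry.seqid + 1) % SEQID_MODULUS:
        return "NEW_REQUEST"
    return "NFS4ERR_SEQ_MISORDERED"
```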
15.1.11.8. NFS4ERR_SEQ_MISORDERED (Error Code 10063) 15.1.11.8. NFS4ERR_SEQ_MISORDERED (Error Code 10063)
The requester sent a Sequence operation with an invalid sequence ID. The requester sent a Sequence operation with an invalid sequence ID.
15.1.12. Session Management Errors 15.1.12. Session Management Errors
This section deals with errors associated with requests used in This section deals with errors associated with requests used in
session management. session management.
skipping to change at page 353, line 28 skipping to change at page 353, line 37
15.1.13.1. NFS4ERR_CLIENTID_BUSY (Error Code 10074) 15.1.13.1. NFS4ERR_CLIENTID_BUSY (Error Code 10074)
The DESTROY_CLIENTID operation has found there are sessions and/or The DESTROY_CLIENTID operation has found there are sessions and/or
unexpired state associated with the client ID to be destroyed. unexpired state associated with the client ID to be destroyed.
15.1.13.2. NFS4ERR_CLID_INUSE (Error Code 10017) 15.1.13.2. NFS4ERR_CLID_INUSE (Error Code 10017)
While processing an EXCHANGE_ID operation, the server was presented While processing an EXCHANGE_ID operation, the server was presented
with a co_ownerid field matches an existing client with valid leased with a co_ownerid field matches an existing client with valid leased
state but the principal issuing the EXCHANGE_ID is different than state but the principal sending the EXCHANGE_ID operation is
that establishing the existing client. This indicates a (most likely different than that establishing the existing client. This indicates
due to chance) collision between clients. The client should recover a (most likely due to chance) collision between clients. The client
by changing the co_ownerid and re-sending EXCHANGE_ID (but not with should recover by changing the co_ownerid and re-sending EXCHANGE_ID
the same slot ID and sequence ID; one or both MUST be different on (but not with the same slot ID and sequence ID; one or both MUST be
the re-send). different on the re-send).
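The recovery rule for NFS4ERR_CLID_INUSE has two parts: pick a new co_ownerid, and re-send EXCHANGE_ID with a slot ID and sequence ID pair that differs from the original. Here is a minimal Python sketch of the client side. The function name and the random-suffix scheme for building a new co_ownerid are hypothetical; the draft only requires that the owner ID change.

```python
import os
from typing import Tuple

SEQID_MODULUS = 2**32  # assumption: 32-bit sequence IDs


def recover_from_clid_inuse(co_ownerid: bytes, slot_id: int,
                            seqid: int) -> Tuple[bytes, int, int]:
    """Return (new co_ownerid, slot ID, sequence ID) for re-sending EXCHANGE_ID."""
    # Hypothetical scheme: append random bytes so the owner ID no longer collides.
    new_ownerid = co_ownerid + b"-" + os.urandom(4).hex().encode("ascii")
    # Keep the slot but advance the sequence ID, so the (slot ID, sequence ID)
    # pair differs from the original send.
    return new_ownerid, slot_id, (seqid + 1) % SEQID_MODULUS
```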
15.1.13.3. NFS4ERR_ENCR_ALG_UNSUPP (Error Code 10079) 15.1.13.3. NFS4ERR_ENCR_ALG_UNSUPP (Error Code 10079)
An EXCHANGE_ID was sent which specified state protection via SSV, and An EXCHANGE_ID was sent which specified state protection via SSV, and
where the set of encryption algorithms presented by the client did where the set of encryption algorithms presented by the client did
not include any supported by the server. not include any supported by the server.
15.1.13.4. NFS4ERR_HASH_ALG_UNSUPP (Error Code 10072) 15.1.13.4. NFS4ERR_HASH_ALG_UNSUPP (Error Code 10072)
An EXCHANGE_ID was sent which specified state protection via SSV, and An EXCHANGE_ID was sent which specified state protection via SSV, and
skipping to change at page 368, line 35 skipping to change at page 368, line 35
| | NFS4ERR_REP_TOO_BIG_TO_CACHE, | | | NFS4ERR_REP_TOO_BIG_TO_CACHE, |
| | NFS4ERR_REQ_TOO_BIG, NFS4ERR_ROFS, | | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_ROFS, |
| | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, |
| | NFS4ERR_TOO_MANY_OPS | | | NFS4ERR_TOO_MANY_OPS |
| RENAME | NFS4ERR_ACCESS, NFS4ERR_BADCHAR, | | RENAME | NFS4ERR_ACCESS, NFS4ERR_BADCHAR, |
| | NFS4ERR_BADNAME, NFS4ERR_BADXDR, | | | NFS4ERR_BADNAME, NFS4ERR_BADXDR, |
| | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, |
| | NFS4ERR_DQUOT, NFS4ERR_EXIST, | | | NFS4ERR_DQUOT, NFS4ERR_EXIST, |
| | NFS4ERR_FHEXPIRED, NFS4ERR_FILE_OPEN, | | | NFS4ERR_FHEXPIRED, NFS4ERR_FILE_OPEN, |
| | NFS4ERR_GRACE, NFS4ERR_INVAL, NFS4ERR_IO, | | | NFS4ERR_GRACE, NFS4ERR_INVAL, NFS4ERR_IO, |
| | NFS4ERR_MOVED, NFS4ERR_NAMETOOLONG, | | | NFS4ERR_MLINK, NFS4ERR_MOVED, |
| | NFS4ERR_NOENT, NFS4ERR_NOFILEHANDLE, | | | NFS4ERR_NAMETOOLONG, NFS4ERR_NOENT, |
| | NFS4ERR_NOSPC, NFS4ERR_NOTDIR, | | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOSPC, |
| | NFS4ERR_NOTEMPTY, | | | NFS4ERR_NOTDIR, NFS4ERR_NOTEMPTY, |
| | NFS4ERR_OP_NOT_IN_SESSION, | | | NFS4ERR_OP_NOT_IN_SESSION, |
| | NFS4ERR_REP_TOO_BIG, | | | NFS4ERR_REP_TOO_BIG, |
| | NFS4ERR_REP_TOO_BIG_TO_CACHE, | | | NFS4ERR_REP_TOO_BIG_TO_CACHE, |
| | NFS4ERR_REQ_TOO_BIG, NFS4ERR_ROFS, | | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_ROFS, |
| | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, |
| | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONGSEC, | | | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONGSEC, |
| | NFS4ERR_XDEV | | | NFS4ERR_XDEV |
| RENEW | NFS4ERR_NOTSUPP | | RENEW | NFS4ERR_NOTSUPP |
| RESTOREFH | NFS4ERR_DEADSESSION, NFS4ERR_FHEXPIRED, | | RESTOREFH | NFS4ERR_DEADSESSION, NFS4ERR_FHEXPIRED, |
| | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | | | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, |
skipping to change at page 380, line 47 skipping to change at page 380, line 47
| NFS4ERR_ISDIR | COMMIT, LAYOUTCOMMIT, | | NFS4ERR_ISDIR | COMMIT, LAYOUTCOMMIT, |
| | LAYOUTRETURN, LINK, LOCK, | | | LAYOUTRETURN, LINK, LOCK, |
| | LOCKT, OPEN, READ, WRITE | | | LOCKT, OPEN, READ, WRITE |
| NFS4ERR_LAYOUTTRYLATER | LAYOUTGET | | NFS4ERR_LAYOUTTRYLATER | LAYOUTGET |
| NFS4ERR_LAYOUTUNAVAILABLE | LAYOUTGET | | NFS4ERR_LAYOUTUNAVAILABLE | LAYOUTGET |
| NFS4ERR_LOCKED | LAYOUTGET, READ, SETATTR, | | NFS4ERR_LOCKED | LAYOUTGET, READ, SETATTR, |
| | WRITE | | | WRITE |
| NFS4ERR_LOCKS_HELD | CLOSE, FREE_STATEID | | NFS4ERR_LOCKS_HELD | CLOSE, FREE_STATEID |
| NFS4ERR_LOCK_NOTSUPP | LOCK | | NFS4ERR_LOCK_NOTSUPP | LOCK |
| NFS4ERR_LOCK_RANGE | LOCK, LOCKT, LOCKU | | NFS4ERR_LOCK_RANGE | LOCK, LOCKT, LOCKU |
| NFS4ERR_MLINK | CREATE, LINK | | NFS4ERR_MLINK | CREATE, LINK, RENAME |
| NFS4ERR_MOVED | ACCESS, CLOSE, COMMIT, | | NFS4ERR_MOVED | ACCESS, CLOSE, COMMIT, |
| | CREATE, DELEGRETURN, GETATTR, | | | CREATE, DELEGRETURN, GETATTR, |
| | GETFH, GET_DIR_DELEGATION, | | | GETFH, GET_DIR_DELEGATION, |
| | LAYOUTCOMMIT, LAYOUTGET, | | | LAYOUTCOMMIT, LAYOUTGET, |
| | LAYOUTRETURN, LINK, LOCK, | | | LAYOUTRETURN, LINK, LOCK, |
| | LOCKT, LOCKU, LOOKUP, | | | LOCKT, LOCKU, LOOKUP, |
| | LOOKUPP, NVERIFY, OPEN, | | | LOOKUPP, NVERIFY, OPEN, |
| | OPENATTR, OPEN_DOWNGRADE, | | | OPENATTR, OPEN_DOWNGRADE, |
| | PUTFH, READ, READDIR, | | | PUTFH, READ, READDIR, |
| | READLINK, RECLAIM_COMPLETE, | | | READLINK, RECLAIM_COMPLETE, |
skipping to change at page 410, line 48 skipping to change at page 410, line 48
18.2.3. DESCRIPTION 18.2.3. DESCRIPTION
The CLOSE operation releases share reservations for the regular or The CLOSE operation releases share reservations for the regular or
named attribute file as specified by the current filehandle. The named attribute file as specified by the current filehandle. The
share reservations and other state information released at the server share reservations and other state information released at the server
as a result of this CLOSE is only that associated with the supplied as a result of this CLOSE is only that associated with the supplied
stateid. State associated with other OPENs is not affected. stateid. State associated with other OPENs is not affected.
If byte-range locks are held, the client SHOULD release all locks If byte-range locks are held, the client SHOULD release all locks
before issuing a CLOSE. The server MAY free all outstanding locks on before sending a CLOSE. The server MAY free all outstanding locks on
CLOSE but some servers may not support the CLOSE of a file that still CLOSE but some servers may not support the CLOSE of a file that still
has byte-range locks held. The server MUST return failure if any has byte-range locks held. The server MUST return failure if any
locks would exist after the CLOSE. locks would exist after the CLOSE.
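The CLOSE rule allows a server either to free outstanding byte-range locks or to refuse the CLOSE. In both cases no locks may remain afterward. The error table elsewhere in this diff lists NFS4ERR_LOCKS_HELD for CLOSE. A minimal Python sketch of the server decision follows. OpenState, server_close, and the free_locks_on_close policy switch are hypothetical, and the sketch simplifies lock state to a set of lock stateids attached to the open.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class OpenState:
    """Hypothetical server record for one open stateid."""
    lock_stateids: Set[str] = field(default_factory=set)  # lock state tied to this open


def server_close(opens: Dict[str, OpenState], stateid: str,
                 free_locks_on_close: bool) -> str:
    """Sketch of CLOSE handling when byte-range locks may still be held."""
    state = opens.get(stateid)
    if state is None:
        return "NFS4ERR_BAD_STATEID"
    if state.lock_stateids:
        if not free_locks_on_close:
            # Locks would survive the CLOSE, so the server fails it.
            return "NFS4ERR_LOCKS_HELD"
        # The server MAY instead free all outstanding locks on CLOSE.
        state.lock_stateids.clear()
    # Release only the state associated with the supplied stateid.
    del opens[stateid]
    return "NFS4_OK"
```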
The argument seqid MAY have any value and the server MUST ignore The argument seqid MAY have any value and the server MUST ignore
seqid. seqid.
On success, the current filehandle retains its value. On success, the current filehandle retains its value.
The server MAY require that the principal, security flavor, and The server MAY require that the principal, security flavor, and
skipping to change at page 452, line 36 skipping to change at page 452, line 36
sent for an open file have the same credentials as the OPEN itself, sent for an open file have the same credentials as the OPEN itself,
and the server is REQUIRED to perform access checking on the READs and the server is REQUIRED to perform access checking on the READs
and WRITEs themselves. Otherwise, if the reply to EXCHANGE_ID did and WRITEs themselves. Otherwise, if the reply to EXCHANGE_ID did
have EXCHGID4_FLAG_BIND_PRINC_STATEID set, then with one exception, have EXCHGID4_FLAG_BIND_PRINC_STATEID set, then with one exception,
the credentials used in the OPEN request MUST match those used in the the credentials used in the OPEN request MUST match those used in the
READs and WRITEs, and the stateids in the READs and WRITEs MUST READs and WRITEs, and the stateids in the READs and WRITEs MUST
match, or be derived from the stateid from the reply to OPEN. The match, or be derived from the stateid from the reply to OPEN. The
exception is if SP4_SSV or SP4_MACH_CRED state protection is used, exception is if SP4_SSV or SP4_MACH_CRED state protection is used,
and the spo_must_allow result of EXCHANGE_ID includes the READ and/or and the spo_must_allow result of EXCHANGE_ID includes the READ and/or
WRITE operations. In that case, the machine or SSV credential will WRITE operations. In that case, the machine or SSV credential will
be allowed to issue READ and/or WRITE. See Section 18.35. be allowed to send READ and/or WRITE. See Section 18.35.
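The EXCHGID4_FLAG_BIND_PRINC_STATEID paragraph can be read as a small decision procedure for READ and WRITE. A minimal Python sketch for the case where the flag was set follows. The function and parameter names are hypothetical. The sketch interprets the SP4_MACH_CRED/SP4_SSV exception as relaxing only the credential match, not the stateid requirement, and that is one possible interpretation of the text. When the flag is clear, the text instead has the server perform ordinary access checks on each READ and WRITE.

```python
from typing import FrozenSet


def io_allowed_under_bind_princ(open_principal: str, io_principal: str,
                                stateid_from_open: bool, state_protection: str,
                                must_allow: FrozenSet[str], op: str,
                                io_uses_mach_or_ssv_cred: bool) -> bool:
    """Assumes EXCHGID4_FLAG_BIND_PRINC_STATEID was set in the EXCHANGE_ID reply."""
    # The stateid must match, or be derived from, the stateid returned by OPEN.
    if not stateid_from_open:
        return False
    # Exception: machine or SSV credential for ops in the spo_must_allow result.
    if (state_protection in ("SP4_MACH_CRED", "SP4_SSV")
            and op in must_allow and io_uses_mach_or_ssv_cred):
        return True
    # Otherwise the credential must match the one used for OPEN.
    return io_principal == open_principal
```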
If the component provided to OPEN is a symbolic link, the error If the component provided to OPEN is a symbolic link, the error
NFS4ERR_SYMLINK will be returned to the client, while if it is a NFS4ERR_SYMLINK will be returned to the client, while if it is a
directory the error NFS4ERR_ISDIR. If the component is neither of directory the error NFS4ERR_ISDIR. If the component is neither of
those but not an ordinary file, the error NFS4ERR_WRONG_TYPE is those but not an ordinary file, the error NFS4ERR_WRONG_TYPE is
returned. If the current filehandle is not a directory, the error returned. If the current filehandle is not a directory, the error
NFS4ERR_NOTDIR will be returned. NFS4ERR_NOTDIR will be returned.
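The OPEN type checks above map directly to a small function. The Python sketch below uses nfs_ftype4 names (NF4REG, NF4DIR, NF4LNK, NF4ATTRDIR, NF4NAMEDATTR). The ordering of the checks, and treating a named attribute directory and named attribute file like a directory and regular file, are assumptions the paragraph does not state.

```python
DIRECTORY_TYPES = {"NF4DIR", "NF4ATTRDIR"}     # valid current filehandle types
OPENABLE_TYPES = {"NF4REG", "NF4NAMEDATTR"}    # regular or named attribute files


def open_type_check(cfh_type: str, component_type: str) -> str:
    """Return the status OPEN would report for these object types (sketch)."""
    if cfh_type not in DIRECTORY_TYPES:
        return "NFS4ERR_NOTDIR"
    if component_type == "NF4LNK":
        return "NFS4ERR_SYMLINK"
    if component_type == "NF4DIR":
        return "NFS4ERR_ISDIR"
    if component_type not in OPENABLE_TYPES:
        return "NFS4ERR_WRONG_TYPE"   # e.g. NF4BLK, NF4CHR, NF4SOCK, NF4FIFO
    return "NFS4_OK"
```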
The use of the OPEN4_RESULT_PRESERVE_UNLINKED result flag allows a The use of the OPEN4_RESULT_PRESERVE_UNLINKED result flag allows a
client avoid the common implementation practice of renaming an open client avoid the common implementation practice of renaming an open
skipping to change at page 468, line 11 skipping to change at page 468, line 11
NFSv3 required a different operator RMDIR for directory removal and NFSv3 required a different operator RMDIR for directory removal and
REMOVE for non-directory removal. This allowed clients to skip REMOVE for non-directory removal. This allowed clients to skip
checking the file type when being passed a non-directory delete checking the file type when being passed a non-directory delete
system call (e.g. unlink() [27] in POSIX) to remove a directory, as system call (e.g. unlink() [27] in POSIX) to remove a directory, as
well as the converse (e.g. a rmdir() on a non-directory) because they well as the converse (e.g. a rmdir() on a non-directory) because they
knew the server would check the file type. NFSv4.1 REMOVE can be knew the server would check the file type. NFSv4.1 REMOVE can be
used to delete any directory entry independent of its file type. The used to delete any directory entry independent of its file type. The
implementor of an NFSv4.1 client's entry points from the unlink() and implementor of an NFSv4.1 client's entry points from the unlink() and
rmdir() system calls should first check the file type against the rmdir() system calls should first check the file type against the
types the system call is allowed to remove before issuing a REMOVE. types the system call is allowed to remove before sending a REMOVE
Alternatively, the implementor can produce a COMPOUND call that operation. Alternatively, the implementor can produce a COMPOUND
includes a LOOKUP/VERIFY sequence to verify the file type before a call that includes a LOOKUP/VERIFY sequence of operations to verify
REMOVE operation in the same COMPOUND call. the file type before a REMOVE operation in the same COMPOUND call.
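Both client strategies described above can be sketched in Python: a local type check before sending REMOVE, and a COMPOUND that lets the server verify the type. The function names and the tuple encoding of operations are hypothetical. The SAVEFH/RESTOREFH steps are this sketch's own construction, needed because VERIFY checks the looked-up object while REMOVE operates on the directory. If the type differs, VERIFY fails with NFS4ERR_NOT_SAME and the COMPOUND stops before REMOVE.

```python
import errno
from typing import Any, List, Tuple


def check_before_remove(syscall: str, ftype: str) -> int:
    """Return 0 if the system call may proceed to REMOVE, else an errno value."""
    if syscall == "unlink":
        # Linux reports EISDIR here; POSIX also permits EPERM.
        return errno.EISDIR if ftype == "NF4DIR" else 0
    if syscall == "rmdir":
        return 0 if ftype == "NF4DIR" else errno.ENOTDIR
    raise ValueError(f"unexpected system call: {syscall}")


def remove_with_verify(dir_fh: bytes, name: str,
                       expected_type: str) -> List[Tuple[Any, ...]]:
    """Hypothetical COMPOUND that checks the file type server-side before REMOVE."""
    return [
        ("SEQUENCE",),
        ("PUTFH", dir_fh),                     # current FH: parent directory
        ("SAVEFH",),                           # remember the directory
        ("LOOKUP", name),                      # current FH: entry to be removed
        ("VERIFY", {"type": expected_type}),   # NFS4ERR_NOT_SAME on mismatch
        ("RESTOREFH",),                        # current FH: directory again
        ("REMOVE", name),                      # removes the entry from the directory
    ]
```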
The concept of last reference is server specific. However, if the The concept of last reference is server specific. However, if the
numlinks field in the previous attributes of the object had the value numlinks field in the previous attributes of the object had the value
1, the client should not rely on referring to the object via a 1, the client should not rely on referring to the object via a
filehandle. Likewise, the client should not rely on the resources filehandle. Likewise, the client should not rely on the resources
(disk space, directory entry, and so on) formerly associated with the (disk space, directory entry, and so on) formerly associated with the
object becoming immediately available. Thus, if a client needs to be object becoming immediately available. Thus, if a client needs to be
able to continue to access a file after using REMOVE to remove it, able to continue to access a file after using REMOVE to remove it,
the client should take steps to make sure that the file will still be the client should take steps to make sure that the file will still be
accessible. While the traditional mechanism used is to RENAME the accessible. While the traditional mechanism used is to RENAME the
skipping to change at page 501, line 29 skipping to change at page 501, line 29
DELEGPURGE, DESTROY_SESSION, BIND_CONN_TO_SESSION, DESTROY_CLIENTID DELEGPURGE, DESTROY_SESSION, BIND_CONN_TO_SESSION, DESTROY_CLIENTID
}, then the result spo_must_enforce MUST include the operations the }, then the result spo_must_enforce MUST include the operations the
client requested from that set. client requested from that set.
If spo_must_enforce in the results has BIND_CONN_TO_SESSION set, then If spo_must_enforce in the results has BIND_CONN_TO_SESSION set, then
connection binding enforcement is enabled, and the client MUST use connection binding enforcement is enabled, and the client MUST use
the machine (if SP4_MACH_CRED protection is used) or SSV (if SP4_SSV the machine (if SP4_MACH_CRED protection is used) or SSV (if SP4_SSV
protection is used) credential on calls to BIND_CONN_TO_SESSION. protection is used) credential on calls to BIND_CONN_TO_SESSION.
The second list is spo_must_allow and consists of those operations The second list is spo_must_allow and consists of those operations
the client wants to have the option of issuing with the machine the client wants to have the option of sending with the machine
credential or the SSV-based credential, even if the object the credential or the SSV-based credential, even if the object the
operations are performed on is not owned by the machine or SSV operations are performed on is not owned by the machine or SSV
credential. credential.
The corresponding result, also called spo_must_allow, consists of the The corresponding result, also called spo_must_allow, consists of the
operations the server will allow the client to use SP4_SSV or operations the server will allow the client to use SP4_SSV or
SP4_MACH_CRED credentials with. Normally the server's result equals SP4_MACH_CRED credentials with. Normally the server's result equals
the client's argument, but the result MAY be different. the client's argument, but the result MAY be different.
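The spo_must_enforce and spo_must_allow results can be sketched in Python as set operations. The beginning of the operation set is elided from this hunk, so the constant below lists only the four operations that are visible and is knowingly incomplete. The server_willing parameter is a hypothetical stand-in for local server policy.

```python
from typing import FrozenSet

# Only the set members visible in this hunk; the start of the set is elided above.
VISIBLE_ENFORCE_IF_REQUESTED: FrozenSet[str] = frozenset({
    "DELEGPURGE", "DESTROY_SESSION", "BIND_CONN_TO_SESSION", "DESTROY_CLIENTID",
})


def result_must_enforce(requested: FrozenSet[str],
                        server_willing: FrozenSet[str]) -> FrozenSet[str]:
    # Requested ops from the set MUST appear in the result; other requested ops
    # appear only if server policy agrees (assumption).
    return requested & (VISIBLE_ENFORCE_IF_REQUESTED | server_willing)


def result_must_allow(requested: FrozenSet[str],
                      server_willing: FrozenSet[str]) -> FrozenSet[str]:
    # Normally equal to the request, but MAY differ; here the server drops
    # ops it will not allow with the machine or SSV credential.
    return requested & server_willing


def connection_binding_enforced(must_enforce_result: FrozenSet[str]) -> bool:
    return "BIND_CONN_TO_SESSION" in must_enforce_result
```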
The purpose of spo_must_allow is to allow clients to solve the The purpose of spo_must_allow is to allow clients to solve the
skipping to change at page 528, line 6 skipping to change at page 528, line 6
receiving; the server must support device ID notifications for the receiving; the server must support device ID notifications for the
notification request to have affect. The notification mask is notification request to have affect. The notification mask is
composed in the same manner as the bitmap for file attributes composed in the same manner as the bitmap for file attributes
(Section 3.3.7). The numbers of bit positions are listed in the (Section 3.3.7). The numbers of bit positions are listed in the
notify_device_type4 enumeration type (Section 20.12). Only two notify_device_type4 enumeration type (Section 20.12). Only two
enumerated values of notify_device_type4 currently apply to enumerated values of notify_device_type4 currently apply to
GETDEVICEINFO: NOTIFY_DEVICEID4_CHANGE and NOTIFY_DEVICEID4_DELETE GETDEVICEINFO: NOTIFY_DEVICEID4_CHANGE and NOTIFY_DEVICEID4_DELETE
(see Section 20.12). (see Section 20.12).
The notification bitmap applies only to the specified device ID. If The notification bitmap applies only to the specified device ID. If
a client issues GETDEVICEINFO on a deviceID multiple times, the last a client sends a GETDEVICEINFO operation on a deviceID multiple
notification bitmap is used by the server for subsequent times, the last notification bitmap is used by the server for
notifications. If the bitmap is zero or empty, then the device ID's subsequent notifications. If the bitmap is zero or empty, then the
notifications are turned off. device ID's notifications are turned off.
If the client wants to just update or turn off notifications, it MAY If the client wants to just update or turn off notifications, it MAY
issue GETDEVICEINFO with gdia_maxcount set to zero. In that event, send a GETDEVICEINFO operation with gdia_maxcount set to zero. In
if the device ID is valid, the reply's da_addr_body field of the that event, if the device ID is valid, the reply's da_addr_body field
gdir_device_addr field will be of zero length. of the gdir_device_addr field will be of zero length.
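The notification mask uses the same layout as the file attribute bitmap: an array of 32-bit words in which bit n is bit (n mod 32) of word (n div 32). A minimal Python sketch of building and reading such a mask follows. The numeric values 1 and 2 for the two notification types are assumptions taken from the XDR published in RFC 5661 and should be checked against Section 20.12 of this draft. A zero or empty mask turns notifications off, as stated above.

```python
from typing import Iterable, List

NOTIFY_DEVICEID4_CHANGE = 1   # assumed value; verify against Section 20.12
NOTIFY_DEVICEID4_DELETE = 2   # assumed value; verify against Section 20.12


def encode_bitmap4(bits: Iterable[int]) -> List[int]:
    """Set each bit position in a list of 32-bit words."""
    words: List[int] = []
    for b in bits:
        idx, off = divmod(b, 32)
        while len(words) <= idx:
            words.append(0)
        words[idx] |= 1 << off
    return words


def bitmap_has(words: List[int], bit: int) -> bool:
    idx, off = divmod(bit, 32)
    return idx < len(words) and bool((words[idx] >> off) & 1)


def notifications_off(words: List[int]) -> bool:
    """A zero or empty bitmap turns notifications off for the device ID."""
    return all(w == 0 for w in words)
```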
If an unknown device ID is given in gdia_device_id, the server If an unknown device ID is given in gdia_device_id, the server
returns NFS4ERR_NOENT. Otherwise, the device address information is returns NFS4ERR_NOENT. Otherwise, the device address information is
returned in gdir_device_addr. Finally, if the server supports returned in gdir_device_addr. Finally, if the server supports
notifications for device ID mappings, the gdir_notification result notifications for device ID mappings, the gdir_notification result
will contain a bitmap of which notifications it will actually send to will contain a bitmap of which notifications it will actually send to
the client (via CB_NOTIFY_DEVICEID, see Section 20.12). the client (via CB_NOTIFY_DEVICEID, see Section 20.12).
If NFS4ERR_TOOSMALL is returned, the results also contain If NFS4ERR_TOOSMALL is returned, the results also contain
gdir_mincount. The value of gdir_mincount represents the minimum gdir_mincount. The value of gdir_mincount represents the minimum
skipping to change at page 529, line 6 skipping to change at page 529, line 6
o CB_NOTIFY_DEVICEID deletes a device ID. If the client believes it o CB_NOTIFY_DEVICEID deletes a device ID. If the client believes it
has layouts that refer to the device ID, then it is possible the has layouts that refer to the device ID, then it is possible the
layouts have been revoked. The client should send a TEST_STATEID layouts have been revoked. The client should send a TEST_STATEID
request using the stateid for each layout that might have been request using the stateid for each layout that might have been
revoked. If TEST_STATEID indicates any layouts have been revoked, revoked. If TEST_STATEID indicates any layouts have been revoked,
the client must recover from layout revocation as described in the client must recover from layout revocation as described in
Section 12.5.6. If TEST_STATEID indicates at least one layout has Section 12.5.6. If TEST_STATEID indicates at least one layout has
not been revoked, the client should send a GETDEVICEINFO on the not been revoked, the client should send a GETDEVICEINFO on the
device ID to verify that the device ID has been deleted. If device ID to verify that the device ID has been deleted. If
GETDEVICEINFO indicates the device ID does not exist, the client GETDEVICEINFO indicates the device ID does not exist, the client
then assumes the server is faulty, and recovers issuing by then assumes the server is faulty, and recovers by sending an
EXCHANGE_ID. If the client does not have layouts that refer to EXCHANGE_ID operation. If the client does not have layouts that
the device ID, no harm is done. The client should mark the device refer to the device ID, no harm is done. The client should mark
ID as deleted, and when the GETDEVICEINFO or GETDEVICELIST results the device ID as deleted, and when the GETDEVICEINFO or
are finally received for the device ID, delete the device ID from GETDEVICELIST results are finally received for the device ID,
client's cache. delete the device ID from client's cache.
o CB_NOTIFY_DEVICEID indicates a device ID's device addressing o CB_NOTIFY_DEVICEID indicates a device ID's device addressing
mappings have changed. The client should assume that the results mappings have changed. The client should assume that the results
from the in progress GETDEVICEINFO will be stale for the device ID from the in progress GETDEVICEINFO will be stale for the device ID
once received, and so it should send another GETDEVICEINFO on the once received, and so it should send another GETDEVICEINFO on the
device ID. device ID.
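The two race-handling cases above can be sketched in Python as a client-side decision procedure. All names are hypothetical, and the action strings stand in for real client work. test_stateid and getdeviceinfo are injected callables. The real TEST_STATEID takes an array of stateids and returns a status for each one, while this sketch checks stateids one at a time for simplicity.

```python
from typing import Callable, Dict, List, Set


def handle_deviceid_delete(device_id: str, layout_stateids: List[str],
                           test_stateid: Callable[[str], bool],   # True if revoked
                           getdeviceinfo: Callable[[str], str],   # status name
                           deleted_ids: Set[str]) -> List[str]:
    actions: List[str] = []
    if not layout_stateids:
        # No layouts refer to the device ID: no harm done. Mark it deleted so
        # late GETDEVICEINFO/GETDEVICELIST results can be discarded.
        deleted_ids.add(device_id)
        actions.append("MARK_DELETED")
        return actions
    revoked = [s for s in layout_stateids if test_stateid(s)]
    if revoked:
        actions.append("RECOVER_REVOKED_LAYOUTS")  # see Section 12.5.6
    if len(revoked) < len(layout_stateids):
        # A layout survives, so the device ID should still exist.
        if getdeviceinfo(device_id) == "NFS4ERR_NOENT":
            actions.append("SEND_EXCHANGE_ID")    # server assumed faulty
    return actions


def handle_deviceid_change(getdeviceinfo_in_progress: bool) -> List[str]:
    # In-flight results will be stale once received, so ask again.
    return ["RESEND_GETDEVICEINFO"] if getdeviceinfo_in_progress else []


def on_device_result(device_id: str, deleted_ids: Set[str],
                     cache: Dict[str, bytes], addr: bytes) -> bool:
    """Accept a late GETDEVICEINFO/GETDEVICELIST result unless marked deleted."""
    if device_id in deleted_ids:
        deleted_ids.discard(device_id)
        cache.pop(device_id, None)
        return False
    cache[device_id] = addr
    return True
```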
18.41. Operation 48: GETDEVICELIST - Get All Device Mappings for a File 18.41. Operation 48: GETDEVICELIST - Get All Device Mappings for a File
System System
skipping to change at page 543, line 25 skipping to change at page 543, line 25
(which equals 13 * 4096). Because the value of (which equals 13 * 4096). Because the value of
threshold4_read_iosize is equal to 4096, it is practical and threshold4_read_iosize is equal to 4096, it is practical and
reasonable for the client to use several LAYOUTGETs to complete reasonable for the client to use several LAYOUTGETs to complete
the series of READs. The client sends a LAYOUTGET request with the series of READs. The client sends a LAYOUTGET request with
loga_offset set to 8192, loga_minlength set to 4096, and loga_offset set to 8192, loga_minlength set to 4096, and
loga_length set to 53248 or higher. The server will grant a loga_length set to 53248 or higher. The server will grant a
layout possibly with an initial offset of 0, with an end offset of layout possibly with an initial offset of 0, with an end offset of
at least 8192 + 4096 - 1 = 12287, but preferably a layout with an at least 8192 + 4096 - 1 = 12287, but preferably a layout with an
offset aligned on the stripe width and a length that is a multiple offset aligned on the stripe width and a length that is a multiple
of the stripe width. This will allow the client to make forward of the stripe width. This will allow the client to make forward
progress, possibly having to issue more LAYOUTGET requests for the progress, possibly having to send more LAYOUTGET operations for
remainder of the range. the remainder of the range.
o An NFS client detects a sequential read pattern, and so issues a o An NFS client detects a sequential read pattern, and so sends a
LAYOUTGET that goes well beyond any current or pending read LAYOUTGET operation that goes well beyond any current or pending
requests to the server. The server might likewise detect this read requests to the server. The server might likewise detect
pattern, and grant the LAYOUTGET request. The client continues to this pattern, and grant the LAYOUTGET request. The client
send LAYOUTGET requests once it has read from an offset of the continues to send LAYOUTGET requests once it has read from an
file that represents 50% of the way through the last layout it offset of the file that represents 50% of the way through the
received. range of the last layout it received.
o As above but the client fails to detect the pattern, but the o As above but the client fails to detect the pattern, but the
server does. The next time the metadata server gets a LAYOUTGET, server does. The next time the metadata server gets a LAYOUTGET,
it returns a layout with a length that is well beyond it returns a layout with a length that is well beyond
loga_minlength. loga_minlength.
o A client is using buffered I/O, and has a long queue of write o A client is using buffered I/O, and has a long queue of write
behinds to process and also detects a sequential write pattern. behinds to process and also detects a sequential write pattern.
It issues a LAYOUTGET for a layout that spans the range of the It sends a LAYOUTGET operation for a layout that spans the range
queued write behinds and well beyond, including ranges beyond the of the queued write behinds and well beyond, including ranges
filer's current length. The client continues to issue LAYOUTGETs beyond the filer's current length. The client continues to send
once the write behind queue reaches 50% of the maximum queue LAYOUTGET operations once the write behind queue reaches 50% of
length. the maximum queue length.
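The arithmetic in the first example, and the 50% triggers in the later ones, can be checked with a short Python sketch. The function names are hypothetical. The 64 KiB stripe width in the usage below is an assumed value, because the excerpt does not give one.

```python
from typing import Tuple


def min_layout_end(offset: int, minlength: int) -> int:
    """Last byte (inclusive) that the granted layout must cover."""
    return offset + minlength - 1


def stripe_aligned_range(offset: int, length: int,
                         stripe_width: int) -> Tuple[int, int]:
    """Widen [offset, offset + length) to whole stripes: (start, length)."""
    start = (offset // stripe_width) * stripe_width
    end = offset + length                              # exclusive
    aligned_end = -(-end // stripe_width) * stripe_width  # round up
    return start, aligned_end - start


def read_wants_next_layout(read_offset: int, layout_offset: int,
                           layout_length: int) -> bool:
    # True once reads reach 50% of the way through the last layout's range.
    return 2 * (read_offset - layout_offset) >= layout_length


def write_behind_wants_layout(queue_len: int, max_queue_len: int) -> bool:
    # True once the write-behind queue reaches 50% of its maximum length.
    return 2 * queue_len >= max_queue_len
```

With loga_offset 8192, loga_minlength 4096, and loga_length 53248 (13 * 4096), the layout must reach at least byte 12287. With the assumed 64 KiB stripe width, the stripe-aligned grant would be offset 0 and length 65536.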
Once the client has obtained a layout referring to a particular Once the client has obtained a layout referring to a particular
device ID, the metadata server MUST NOT delete the device ID until device ID, the metadata server MUST NOT delete the device ID until
the layout is returned or revoked. the layout is returned or revoked.
CB_NOTIFY_DEVICEID can race with LAYOUTGET. One race scenario is CB_NOTIFY_DEVICEID can race with LAYOUTGET. One race scenario is
that LAYOUTGET returns a device ID the client does not have device that LAYOUTGET returns a device ID the client does not have device
address mappings for, and the metadata server sends a address mappings for, and the metadata server sends a
CB_NOTIFY_DEVICEID to add the device ID to the client's awareness and CB_NOTIFY_DEVICEID to add the device ID to the client's awareness and
meanwhile the client sends GETDEVICEINFO on the device ID. This meanwhile the client sends GETDEVICEINFO on the device ID. This
skipping to change at page 557, line 31 skipping to change at page 557, line 31
and ssct_ssv_seq fields of the SSV GSS mechanism tokens (see and ssct_ssv_seq fields of the SSV GSS mechanism tokens (see
Section 2.10.9). Section 2.10.9).
18.47.4. IMPLEMENTATION 18.47.4. IMPLEMENTATION
When the server receives ssa_digest, it MUST verify the digest by When the server receives ssa_digest, it MUST verify the digest by
computing the digest the same way the client did and comparing it computing the digest the same way the client did and comparing it
with ssa_digest. If the server gets a different result, this is an with ssa_digest. If the server gets a different result, this is an
error, NFS4ERR_BAD_SESSION_DIGEST. This error might be the result of error, NFS4ERR_BAD_SESSION_DIGEST. This error might be the result of
another SET_SSV from the same client ID changing the SSV. If so, the another SET_SSV from the same client ID changing the SSV. If so, the
client recovers by issuing SET_SSV again with a recomputed digest client recovers by sending a SET_SSV operation again with a
based on the subkey of the new SSV. If the transport connection is recomputed digest based on the subkey of the new SSV. If the
dropped after the SET_SSV request is sent, but before the SET_SSV transport connection is dropped after the SET_SSV request is sent,
reply is received, then there are special considerations for recovery but before the SET_SSV reply is received, then there are special
if the client has no more connections associated with sessions considerations for recovery if the client has no more connections
associated with the client ID of the SSV. See Section 18.34.4. associated with sessions associated with the client ID of the SSV.
See Section 18.34.4.
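The check "compute the digest the same way the client did and compare" can be sketched in Python with the standard hmac module. The exact digest input and subkey derivation are defined elsewhere (see Section 2.10.9), not in this hunk. This sketch therefore assumes HMAC-SHA-256 over already-encoded input bytes, with a subkey supplied by the caller, and both assumptions are placeholders. Using hmac.compare_digest for a constant-time comparison is a design choice the draft does not state.

```python
import hashlib
import hmac


def compute_digest(subkey: bytes, digest_input: bytes) -> bytes:
    # Assumption: HMAC-SHA-256; the real algorithm is the one tied to the SSV.
    return hmac.new(subkey, digest_input, hashlib.sha256).digest()


def verify_set_ssv(subkey: bytes, digest_input: bytes, ssa_digest: bytes) -> str:
    expected = compute_digest(subkey, digest_input)
    if not hmac.compare_digest(expected, ssa_digest):
        # Possibly another SET_SSV changed the SSV; the client recomputes the
        # digest with the subkey of the new SSV and re-sends.
        return "NFS4ERR_BAD_SESSION_DIGEST"
    return "NFS4_OK"
```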
Clients SHOULD NOT send an ssa_ssv that is equal to a previous Clients SHOULD NOT send an ssa_ssv that is equal to a previous
ssa_ssv, nor equal to a previous or current SSV (including an ssa_ssv ssa_ssv, nor equal to a previous or current SSV (including an ssa_ssv
equal to zero since the SSV is initialized to zero when the client ID equal to zero since the SSV is initialized to zero when the client ID
is created). is created).
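A client following the SHOULD NOT above needs to keep a record of values it has already used. Below is a minimal Python sketch. The function name, the 32-byte length, and the caller-maintained used set (holding previous ssa_ssv values and previous or current SSVs) are assumptions. The explicit zero check mirrors the rule, because the SSV starts at zero when the client ID is created.

```python
import os
from typing import Set


def choose_new_ssv(used: Set[bytes], length: int = 32) -> bytes:
    """Pick an ssa_ssv that is nonzero and not equal to any earlier value."""
    zero = bytes(length)
    while True:
        candidate = os.urandom(length)
        if candidate != zero and candidate not in used:
            used.add(candidate)   # remember it so it is never reused
            return candidate
```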
Clients SHOULD send SET_SSV with RPCSEC_GSS privacy. Servers MUST Clients SHOULD send SET_SSV with RPCSEC_GSS privacy. Servers MUST
support RPCSEC_GSS with privacy for any COMPOUND that has { SEQUENCE, support RPCSEC_GSS with privacy for any COMPOUND that has { SEQUENCE,
SET_SSV }. SET_SSV }.
skipping to change at page 585, line 5 skipping to change at page 585, line 5
CB_RECALL_ANY by sending another recall with a higher count. When a CB_RECALL_ANY by sending another recall with a higher count. When a
CB_RECALL_ANY is received and the count is already within the limit CB_RECALL_ANY is received and the count is already within the limit
set or is above a limit that the client is working to get down to, set or is above a limit that the client is working to get down to,
that callback has no effect. that callback has no effect.
Servers are generally free not to give out recallable objects when Servers are generally free not to give out recallable objects when
insufficient resources are available. Note that the effect of such a insufficient resources are available. Note that the effect of such a
policy is implicitly to give precedence to existing objects relative policy is implicitly to give precedence to existing objects relative
to requested ones, with the result that resources might not be to requested ones, with the result that resources might not be
optimally used. To prevent this, servers are well advised to make optimally used. To prevent this, servers are well advised to make
the point at which they start issuing CB_RECALL_ANY callbacks the point at which they start sending CB_RECALL_ANY callbacks
somewhat below that at which they cease to give out new delegations somewhat below that at which they cease to give out new delegations
and layouts. This allows the client to purge its less-used objects and layouts. This allows the client to purge its less-used objects
whenever appropriate and so continue to have its subsequent requests whenever appropriate and so continue to have its subsequent requests
given new resources freed up by object returns. given new resources freed up by object returns.
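The advice above amounts to two watermarks on the server's count of outstanding recallable objects. The server starts sending CB_RECALL_ANY at a lower threshold, and it stops granting new delegations and layouts at a higher limit.

This sketch assumes a single combined count and fixed thresholds. The names and numbers are hypothetical, and a real server may track resources per type or per client.

```python
from dataclasses import dataclass


@dataclass
class RecallPolicy:
    grant_limit: int       # stop giving out new delegations/layouts at this level
    recall_threshold: int  # start sending CB_RECALL_ANY at this lower level

    def __post_init__(self) -> None:
        # Recalls must begin strictly below the point where grants stop.
        if not 0 < self.recall_threshold < self.grant_limit:
            raise ValueError("recall_threshold must be positive and below grant_limit")

    def may_grant(self, in_use: int) -> bool:
        """Whether a new delegation or layout can still be given out."""
        return in_use < self.grant_limit

    def should_send_recall_any(self, in_use: int) -> bool:
        """Whether clients should be asked to return less-used objects."""
        return in_use >= self.recall_threshold
```

Between the two levels the server keeps granting new objects while clients return less-used ones. This is the gap that lets new requests be served from resources freed by object returns.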
20.6.4. IMPLEMENTATION 20.6.4. IMPLEMENTATION
The client can choose to return any type of object specified by the The client can choose to return any type of object specified by the
mask. If a server wishes to limit use of objects of a specific type, mask. If a server wishes to limit use of objects of a specific type,
it should only specify that type in the mask sent. The client may it should only specify that type in the mask sent. The client may
skipping to change at page 593, line 30 skipping to change at page 593, line 30
NOTIFY_DEVICEID4_CHANGE NOTIFY_DEVICEID4_CHANGE
A previously provided device ID to device address mapping has A previously provided device ID to device address mapping has
changed and the client uses GETDEVICEINFO to obtain the updated changed and the client uses GETDEVICEINFO to obtain the updated
mapping. The notification is encoded in a value of data type mapping. The notification is encoded in a value of data type
notify_deviceid_change4. This data type also contains a boolean notify_deviceid_change4. This data type also contains a boolean
field, ndc_immediate, which if TRUE indicates that the change will field, ndc_immediate, which if TRUE indicates that the change will
be enforced immediately, and so the client might not be able to be enforced immediately, and so the client might not be able to
complete any pending I/O to the device ID. If ndc_immediate is complete any pending I/O to the device ID. If ndc_immediate is
FALSE, then for an indefinite time, the client can complete FALSE, then for an indefinite time, the client can complete
pending I/O. After pending I/O is complete, the client SHOULD get pending I/O. After pending I/O is complete, the client SHOULD get
the new device ID to device address mappings before issuing new the new device ID to device address mappings before sending new
I/O to the device ID. I/O requests to the device ID.
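Client handling of NOTIFY_DEVICEID4_CHANGE can be sketched as a device-mapping cache. A notification marks the cached mapping stale. Once pending I/O has drained, the client fetches a fresh mapping before sending new I/O.

The following are assumptions, not protocol definitions:
- Device IDs and addresses are modeled as strings (a real deviceid4 is opaque).
- getdeviceinfo is a hypothetical callback standing in for the GETDEVICEINFO operation.
- The choice to defer new I/O while pending I/O remains is a design choice.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class DeviceEntry:
    address: str          # cached device address (string form is an assumption)
    stale: bool = False   # set when NOTIFY_DEVICEID4_CHANGE arrives
    pending_io: int = 0   # I/O requests sent but not yet completed


@dataclass
class DeviceCache:
    getdeviceinfo: Callable[[str], str]  # hypothetical stand-in for GETDEVICEINFO
    entries: Dict[str, DeviceEntry] = field(default_factory=dict)

    def on_change(self, device_id: str, ndc_immediate: bool) -> bool:
        """Handle NOTIFY_DEVICEID4_CHANGE; return True if pending I/O may fail."""
        entry = self.entries.get(device_id)
        if entry is None:
            return False
        entry.stale = True
        # With ndc_immediate TRUE, outstanding I/O might not complete.
        return ndc_immediate and entry.pending_io > 0

    def address_for_new_io(self, device_id: str) -> Optional[str]:
        """Return the address to use for new I/O, or None to defer it."""
        entry = self.entries[device_id]
        if entry.stale:
            if entry.pending_io > 0:
                return None  # let pending I/O complete first
            # Pending I/O is done: refresh the mapping before new I/O.
            entry.address = self.getdeviceinfo(device_id)
            entry.stale = False
        return entry.address
```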
NOTIFY_DEVICEID4_DELETE NOTIFY_DEVICEID4_DELETE
Deletes a device ID from the mappings. This notification MUST NOT Deletes a device ID from the mappings. This notification MUST NOT
be sent if the client has a layout that refers to the device ID. be sent if the client has a layout that refers to the device ID.
In other words, if the server is sending a delete device ID In other words, if the server is sending a delete device ID
notification, one of the following is true for layouts associated notification, one of the following is true for layouts associated
with the layout type: with the layout type:
* The client never had a layout referring to that device ID. * The client never had a layout referring to that device ID.
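The MUST NOT rule for the device ID delete notification can be sketched as a server-side guard. The guard is checked against the layouts a client currently holds.

This sketch rests on assumptions. The server's view is assumed to be a hypothetical table mapping each client ID to the set of device IDs referenced by each of its layouts of the relevant type. IDs are modeled as strings.

```python
from typing import Dict, List, Set

# Hypothetical server-side view: client ID -> one set of device IDs per layout
# the client currently holds for the layout type in question.
LayoutTable = Dict[str, List[Set[str]]]


def may_send_deviceid_delete(layouts: LayoutTable, client_id: str,
                             device_id: str) -> bool:
    """The delete notification MUST NOT be sent while a layout refers to device_id."""
    return all(device_id not in devices for devices in layouts.get(client_id, []))
```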
 End of changes. 65 change blocks. 
170 lines changed or deleted 187 lines changed or added
