Index: Makefile =================================================================== RCS file: /cvs/stdk/NFSv41/Makefile,v retrieving revision 1.114 diff -u -r1.114 Makefile --- Makefile 1 May 2008 18:18:31 -0000 1.114 +++ Makefile 6 May 2008 04:50:50 -0000 @@ -6,8 +6,8 @@ YEAR=`date +%Y` MONTH=`date +%b` -PREVVERS=21 -VERS=22 +PREVVERS=22 +VERS=23 VPATH = dotx.d autogen/%.xml : %.x Index: nfsv41_back_acks.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_back_acks.xml,v retrieving revision 1.19 diff -u -r1.19 nfsv41_back_acks.xml --- nfsv41_back_acks.xml 31 Jan 2008 01:30:48 -0000 1.19 +++ nfsv41_back_acks.xml 6 May 2008 04:50:50 -0000 @@ -30,16 +30,22 @@ Sam Falkner and Lisa Week. + The pNFS work was inspired by the NASD and OSD + work done by Garth Gibson. Gary Grider has also + been a champion of high-performance parallel I/O. + Garth Gibson and Peter Corbett started the pNFS + effort with a problem statement document for IETF + that formed the basis for the pNFS work in NFSv4.1. + + + The initial drafts for the parallel NFS support were edited by Brent Welch and Garth Goodson. Additional authors for those documents were Benny Halevy, David Black, and Andy Adamson. Additional input came from the informal group which contributed to the construction of the initial pNFS drafts; specific acknowledgement goes to Gary Grider, Peter Corbett, Dave Noveck, - Peter Honeyman, and Stephen Fridella. The pNFS work was inspired - by the NASD and - OSD work done by Garth Gibson. Gary Grider of the national labs - (LANL) has also been a champion of high-performance parallel I/O. + Peter Honeyman, and Stephen Fridella. Fredric Isaman found several errors in draft versions of the @@ -245,7 +251,7 @@ Others who provided comments include: - Mahesh Siddheshwar. + Jason Goldschmidt and Mahesh Siddheshwar. 
Index: nfsv41_back_references.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_back_references.xml,v retrieving revision 1.18 diff -u -r1.18 nfsv41_back_references.xml --- nfsv41_back_references.xml 20 Apr 2008 16:28:33 -0000 1.18 +++ nfsv41_back_references.xml 6 May 2008 04:50:50 -0000 @@ -762,7 +762,7 @@ + target='ftp://www.ietf.org/internet-drafts/draft-ietf-nfsv4-pnfs-block-08.txt'> pNFS Block/Volume Layout @@ -774,7 +774,7 @@ EMC Corporation - + @@ -1119,7 +1119,7 @@ + target='ftp://www.ietf.org/internet-drafts/draft-nfsv4-pnfs-obj-07.txt'> Object-based pNFS Operations @@ -1131,7 +1131,7 @@ "Panasas Inc." - + Index: nfsv41_middle_cbproc_compound.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_cbproc_compound.xml,v retrieving revision 1.14 diff -u -r1.14 nfsv41_middle_cbproc_compound.xml --- nfsv41_middle_cbproc_compound.xml 15 Dec 2007 05:44:52 -0000 1.14 +++ nfsv41_middle_cbproc_compound.xml 6 May 2008 04:50:50 -0000 @@ -23,10 +23,10 @@ operations use the CB_COMPOUND procedure as a wrapper. - In the processing of the CB_COMPOUND procedure, the client may find + During the processing of the CB_COMPOUND procedure, the client may find that it does not have the available resources to execute any or all of - the operations within the CB_COMPOUND sequence. This is - discussed in . + the operations within the CB_COMPOUND sequence. + Refer to for details. The minorversion field of the arguments MUST be the same as the @@ -41,9 +41,8 @@ as is being returned for the operation that failed. - For a description of the "tag" field, see - where the corresponding - forward channel procedure is described. + The "tag" field is handled the same way as that of the COMPOUND + procedure (see ).
Illegal operation codes are handled in the same way as they are Index: nfsv41_middle_cbproc_null.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_cbproc_null.xml,v retrieving revision 1.4 diff -u -r1.4 nfsv41_middle_cbproc_null.xml --- nfsv41_middle_cbproc_null.xml 8 Nov 2007 04:15:21 -0000 1.4 +++ nfsv41_middle_cbproc_null.xml 6 May 2008 04:50:50 -0000 @@ -21,10 +21,10 @@
- Standard NULL procedure. Void argument, void response. Even though + CB_NULL is the standard ONC RPC NULL procedure, with the standard void argument and void response. Even though there is no direct functionality associated with this procedure, the server will use CB_NULL to confirm the existence of a path for RPCs - from server to client. + from the server to the client.
Index: nfsv41_middle_core_infrastructure.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_core_infrastructure.xml,v retrieving revision 1.90 diff -u -r1.90 nfsv41_middle_core_infrastructure.xml --- nfsv41_middle_core_infrastructure.xml 20 Apr 2008 16:28:33 -0000 1.90 +++ nfsv41_middle_core_infrastructure.xml 6 May 2008 04:50:51 -0000 @@ -1789,7 +1789,7 @@ Of course, even if the eir_server_owner.so_minor_id fields do match, the client is free to employ client - ID trunking instead of sessiond trunking. + ID trunking instead of session trunking. @@ -2134,7 +2134,7 @@ - The slot id, sequence id, and sessionid therefore take over the traditional role + The slot id, sequence id, and session id therefore take over the traditional role of the XID and source network address in the replier's reply cache implementation. This approach is considerably @@ -2168,7 +2168,7 @@ The SEQUENCE or CB_SEQUENCE operation may generate an error. - If so, the embedded slot id, sequence id, and sessionid (if + If so, the embedded slot id, sequence id, and session id (if present) in the request will not be in the reply, and the requester has only the XID to match the reply to the request. @@ -2177,16 +2177,16 @@ Given that well formulated XIDs continue to be required, this begs the question why SEQUENCE and CB_SEQUENCE replies - have a sessionid, slot id and sequence id? Having the sessionid + have a session id, slot id and sequence id? Having the session id in the reply means the requester does not have to use the XID to lookup - the sessionid, which would be necessary if the connection were + the session id, which would be necessary if the connection were associated with multiple sessions. Having the slot id and sequence id in the reply means requester does not have to use the XID to lookup the slot id and sequence id. 
Furhermore, since the XID is only 32 bits, it is too small to guarantee the re-association of a reply with its request (); having - sessionid, slot id, and sequence id in the reply allows the + session id, slot id, and sequence id in the reply allows the client to validate that the reply in fact belongs to the matched request. @@ -2260,7 +2260,7 @@ the server can both retire the slot and return NFS4ERR_BADSLOT (however the server MUST NOT do one and not the other). (The reason it is safe to retire the slot - is because that by using the next sequenceid, the client + is that, by using the next sequence id, the client is indicating it has received the previous reply for the slot.) Once the replier has forcibly lowered the enforced @@ -2288,8 +2288,8 @@ has seen a reply containing the new granted highest_slotid. The replier can infer that requester as seen such a reply when it receives a new request with the same - slotid as the request replied to and the next higher - sequenceid. + slot id as the request replied to and the next higher + sequence id. @@ -2299,16 +2299,16 @@ When a SEQUENCE or CB_SEQUENCE operation is successfully executed, its reply MUST always be - cached. Specifically, sessionid, sequenceid, - and slotid MUST be cached in the reply cache. + cached. Specifically, session id, sequence id, + and slot id MUST be cached in the reply cache. The reply from SEQUENCE also includes the highest - slotid, target highest slotid, and status flags. Instead + slot id, target highest slot id, and status flags. Instead of caching these values, the server MAY re-compute the values from the current state of the fore channel, session and/or client ID as appropriate. Similarly, the reply from - CB_SEQUENCE includes a highest slotid and target - highest slotid. The client + CB_SEQUENCE includes a highest slot id and target + highest slot id. The client MAY re-compute the values from the current state of the session as appropriate.
@@ -2316,14 +2316,14 @@ - Regardless of whether a replier is re-computing highest slotid, - target slotid, and status on replies to retries or not, the requester + Regardless of whether a replier is re-computing highest slot id, + target slot id, and status on replies to retries or not, the requester MUST NOT assume the values are being re-computed whenever it receives a reply after a retry is sent, since it has no way of knowing whether the reply it has received was sent by the server in response to the retry, or is a delayed response to the original request. Therefore, it may be the case that - highest slotid, target slotid, or status bits may reflect + highest slot id, target slot id, or status bits may reflect the state of affairs when the request was first executed. Although acting based on such delayed information is valid, it may cause the receiver to do unneeded work. Requesters @@ -2485,7 +2485,7 @@ The presence of a session between client and server alleviates this issue. When a session is in place, each client request is uniquely identified by its { - sessionid, slot id, sequence id } triple. By the rules under which + session id, slot id, sequence id } triple. By the rules under which slot entries (reply cache entries) are retired, the server has knowledge whether the client has "seen" each of the server's replies. The server @@ -2497,10 +2497,10 @@ For each client operation which might result in some sort of server callback, the server SHOULD "remember" - the { sessionid, slot id, sequence id } triple of the client request + the { session id, slot id, sequence id } triple of the client request until the slot id retirement rules allow the server to determine that the client has, in fact, seen the - server's reply. Until the time the { sessionid, slot id, + server's reply. 
Until the time the { session id, slot id, sequence id } request triple can be retired, any recalls of the associated object MUST carry an array of these referring identifiers (in the CB_SEQUENCE operation's @@ -2511,15 +2511,15 @@ The CB_SEQUENCE operation which begins each server - callback carries a list of "referring" { sessionid, slot id, + callback carries a list of "referring" { session id, slot id, sequence id } triples. If the client finds the request - corresponding to the referring sessionid, slot id and sequence id + corresponding to the referring session id, slot id and sequence id to be currently outstanding (i.e. the server's reply has not been seen by the client), it can determine that the callback has raced the reply, and act accordingly. If the client does not find the request corresponding the referring triple to be outstanding (including - the case of a sessionid referring to a destroyed session), + the case of a session id referring to a destroyed session), then there is no race with respect to this triple. The server SHOULD limit the referring triples to requests that refer to just those that apply to the objects @@ -2673,7 +2673,7 @@ - The sessionid. + The session id. The slot table including the sequence id and cached reply for @@ -2689,7 +2689,7 @@ the server will never see any NFSv4.1-level protocol manifestation of a client restart. If the replier is a server, with just the - slot table and sessionid persisting, + slot table and session id persisting, any requests the client retries after the server restart will return the results that are cached in reply cache. and any new requests (i.e. the sequence id is one (1) greater than the @@ -2707,7 +2707,7 @@ as the - The client ID's sequenceid that is used for creating + The client ID's sequence id that is used for creating sessions (see and . This is a prerequisite to let the client create more sessions. 
@@ -2996,7 +2996,7 @@ As described to this point in the specification, the state model of NFSv4.1 is vulnerable to an attacker that sends a - SEQUENCE operation with a forged sessionid and with a slot id that + SEQUENCE operation with a forged session id and with a slot id that it expects the legitimate client to use next. When the legitimate client uses the slot id with the same sequence number, the server returns the attacker's result from the reply cache which @@ -3743,7 +3743,7 @@ is not necessary to retry requests over a connection with the same source network address or the same destination network address as the lost connection. As - long as the sessionid, slot id, and sequence id in the + long as the session id, slot id, and sequence id in the retry match that of the original request, the server will recognize the request as a retry if it executed the request prior to disconnect. @@ -3793,7 +3793,7 @@ Loss of reply cache is equivalent to loss of session. The replier indicates loss of session to the requester by returning NFS4ERR_BADSESSION on the next operation - that uses the sessionid that refers to the lost + that uses the session id that refers to the lost session. @@ -3802,7 +3802,7 @@ that the session has not been lost. It reconnects, and if it specified connection association enforcement when the session was created, it - invokes BIND_CONN_TO_SESSION using the sessionid. Otherwise, + invokes BIND_CONN_TO_SESSION using the session id. Otherwise, it invokes SEQUENCE. If BIND_CONN_TO_SESSION or SEQUENCE returns NFS4ERR_BADSESSION, the client knows the session was lost. If the connection survives @@ -3879,7 +3879,7 @@ is not necessary to retry requests over a connection with the same source network address or the same destination network address as the lost connection. 
As long as - the sessionid, slot id, and sequence id in the retry + the session id, slot id, and sequence id in the retry match that of the original request, the callback target will recognize the request as a retry even if it did see the request prior to disconnect. Index: nfsv41_middle_errors.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_errors.xml,v retrieving revision 1.64 diff -u -r1.64 nfsv41_middle_errors.xml --- nfsv41_middle_errors.xml 20 Apr 2008 16:28:33 -0000 1.64 +++ nfsv41_middle_errors.xml 6 May 2008 04:50:52 -0000 @@ -1243,7 +1243,7 @@
- A sessionid was specified which does not exist. + A session id was specified which does not exist.
In NFSv4.1, the - sessionid in the SEQUENCE operation implies the + session id in the SEQUENCE operation implies the client ID, which in turn might be used by the server to map the stateid to the right client/server pair. However, when a data server is presented with a READ or WRITE operation with a stateid, because the stateid is associated with - client ID on a metadata server, and because the sessionid in + client ID on a metadata server, and because the session id in the preceding SEQUENCE operation is tied to the client ID of the data server, the data server has no obvious way to determine the metadata server from the Index: nfsv41_middle_filelocking.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_filelocking.xml,v retrieving revision 1.47 diff -u -r1.47 nfsv41_middle_filelocking.xml --- nfsv41_middle_filelocking.xml 31 Mar 2008 23:18:57 -0000 1.47 +++ nfsv41_middle_filelocking.xml 6 May 2008 04:50:52 -0000 @@ -217,7 +217,7 @@ does not conflict with the delegation, but is sent under the aegis of the delegation. Even though it is possible for the server to determine from the client ID (via - the sessionid) that the client does in fact have a + the session id) that the client does in fact have a delegation, the server is not obliged to check this, so using a special stateid can result in avoidable recall of the delegation. Index: nfsv41_middle_iana_considerations.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_iana_considerations.xml,v retrieving revision 1.18 diff -u -r1.18 nfsv41_middle_iana_considerations.xml --- nfsv41_middle_iana_considerations.xml 16 Feb 2008 14:30:25 -0000 1.18 +++ nfsv41_middle_iana_considerations.xml 6 May 2008 04:50:52 -0000 @@ -4,8 +4,8 @@
- The NFSv4.1 protocol provides for the association of named - attributes to files. The name space identifiers for these attributes + The NFSv4.1 protocol supports the association of a file with zero or + more named attributes. The name space identifiers for these attributes are defined as string names. The protocol does not define the specific assignment of the name space for these file attributes. Even though the name space is not specifically controlled to prevent @@ -31,7 +31,7 @@ ) discussed the r_netid field and the corresponding r_addr field within a netaddr4 structure. The NFSv4 protocol depends on the syntax and semantics of these - fields to effectively communicate callback information between client + fields to effectively communicate callback and other information between client and server. Therefore, an IANA registry has been created to include the values defined in this document and to allow for future expansion based on transport usage/availability. Additions to this ONC RPC @@ -99,7 +99,8 @@ New layout type numbers will be requested from IANA. IANA will only provide layout type numbers for Standards Track RFCs approved by the IESG, in accordance with Standards Action policy - defined in RFC2434. + defined in . All layout types + assigned by IANA MUST be in the range 0x00000001 to 0x7FFFFFFF. The author of a new pNFS layout specification must follow these @@ -158,6 +159,10 @@ A list of any new notification values for CB_NOTIFY_DEVICEID. + + A list of any new recallable object types for + CB_RECALL_ANY. + Include an IANA considerations section. 
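The layout-type range rule added in the IANA hunk above can be checked mechanically. The following is a minimal C sketch, not taken from the specification: it assumes layout type numbers are handled as 32-bit unsigned integers, and the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bounds of the range that, per the IANA text above, every
 * IANA-assigned pNFS layout type MUST fall within. */
#define IANA_LAYOUT_TYPE_MIN 0x00000001u
#define IANA_LAYOUT_TYPE_MAX 0x7FFFFFFFu

/* Hypothetical helper: returns true when a layout type number lies in
 * the IANA-assigned range. Zero and any value with the high bit set
 * (0x80000000 and above) fall outside it. */
static bool layout_type_in_iana_range(uint32_t layout_type)
{
    return layout_type >= IANA_LAYOUT_TYPE_MIN &&
           layout_type <= IANA_LAYOUT_TYPE_MAX;
}
```

Treating the value as unsigned keeps the comparison well defined across the whole 32-bit space; this hunk does not say how values outside the range are used.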
Index: nfsv41_middle_op_bind_conn_to_session.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_op_bind_conn_to_session.xml,v retrieving revision 1.27 diff -u -r1.27 nfsv41_middle_op_bind_conn_to_session.xml --- nfsv41_middle_op_bind_conn_to_session.xml 1 May 2008 22:44:00 -0000 1.27 +++ nfsv41_middle_op_bind_conn_to_session.xml 6 May 2008 04:50:52 -0000 @@ -81,7 +81,7 @@ the client might need to use BIND_CONN_TO_SESSION to associate a new connection. If the server restarted and does not keep the reply cache in stable - storage, the server will not recognize the sessionid. + storage, the server will not recognize the session id. The client will ultimately have to invoke EXCHANGE_ID to create a new client ID and session. Index: nfsv41_middle_op_cb_getattr.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_op_cb_getattr.xml,v retrieving revision 1.6 diff -u -r1.6 nfsv41_middle_op_cb_getattr.xml --- nfsv41_middle_op_cb_getattr.xml 20 Nov 2007 21:30:04 -0000 1.6 +++ nfsv41_middle_op_cb_getattr.xml 6 May 2008 04:50:52 -0000 @@ -22,7 +22,7 @@ If the filehandle specified is not one for which the client holds a - write open delegation, an NFS4ERR_BADHANDLE error is returned. + write delegation, an NFS4ERR_BADHANDLE error is returned.
Index: nfsv41_middle_op_cb_illegal.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_op_cb_illegal.xml,v retrieving revision 1.6 diff -u -r1.6 nfsv41_middle_op_cb_illegal.xml --- nfsv41_middle_op_cb_illegal.xml 1 May 2008 22:44:00 -0000 1.6 +++ nfsv41_middle_op_cb_illegal.xml 6 May 2008 04:50:52 -0000 @@ -16,10 +16,10 @@
This operation is a placeholder for encoding a - result to handle the case of the client sending - an operation code within COMPOUND that is not + result to handle the case of the server sending + an operation code within CB_COMPOUND that is not defined in the NFSv4.1 specification. See for more details. + target="OP_CB_COMPOUND_DESCRIPTION"/> for more details. @@ -34,8 +34,8 @@ just as it would be with any other invalid operation code. Note that if the client gets an illegal operation code that is not OP_ILLEGAL, and if the client checks for legal operation codes - during the XDR decode phase, then the CB_ILLEGAL4res would not be - returned. + during the XDR decode phase, then an instance of + data type CB_ILLEGAL4res will not be returned.
Index: nfsv41_middle_op_cb_layoutrecall.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_op_cb_layoutrecall.xml,v retrieving revision 1.24 diff -u -r1.24 nfsv41_middle_op_cb_layoutrecall.xml --- nfsv41_middle_op_cb_layoutrecall.xml 27 Mar 2008 21:05:40 -0000 1.24 +++ nfsv41_middle_op_cb_layoutrecall.xml 6 May 2008 04:50:53 -0000 @@ -14,7 +14,7 @@ The CB_LAYOUTRECALL operation is used by the server to recall layouts from the client; as a result, the client will begin the - process of returning layouts with LAYOUTRETURN. The + process of returning layouts via LAYOUTRETURN. The CB_LAYOUTRECALL operation specifies one of three forms of recall processing with the value of layoutrecall_type4. The recall is either for a specific layout (by file), for an entire file system @@ -26,20 +26,20 @@ - For a layout to match the recall request, the following fields - must match in value with the layout: clora_type, clora_iomode, - lor_fh, and the byte range specified by lor_offset, and + For a layout to match the recall request, the values of the following fields + must match those of the layout: clora_type, clora_iomode, + lor_fh, and the byte range specified by lor_offset and lor_length. The clora_iomode field may have a special value - of LAYOUTIOMODE4_ANY. The LAYOUTIOMODE4_ANY will match any - value originally returned in a layout; therefore it acts as a - wild card for iomode. The other special value used is for - lor_length. If lor_length has a value of NFS4_MAXFILELEN, the + of LAYOUTIOMODE4_ANY. The special value LAYOUTIOMODE4_ANY will match any + iomode originally returned in a layout; therefore it acts as a + wild card. The other special value used is for + lor_length. If lor_length has a value of NFS4_UINT64_MAX, the lor_length field means the maximum possible file size. If a matching layout is found, it MUST be returned using the - LAYOUTRETURN operation, see . + LAYOUTRETURN operation (see ). 
An example of the field's special value use is if clora_iomode is LAYOUTIOMODE4_ANY, lor_offset is zero, and lor_length is - NFS4_MAXFILELEN, then the entire layout is to be returned. + NFS4_UINT64_MAX, then the entire layout is to be returned. The NFS4ERR_NOMATCHING_LAYOUT error is only returned when the @@ -70,7 +70,7 @@ In processing the layout recall request, the client also varies - its behavior on the value of the clora_changed field. This + its behavior based on the value of the clora_changed field. This field is used by the server to provide additional context for the reason why the layout is being recalled. A FALSE value for clora_changed indicates that no change in the layout is expected @@ -78,7 +78,7 @@ involved; this must be done prior to returning the layout via LAYOUTRETURN. A TRUE value for clora_changed indicates that the server is changing the layout. Examples of layout changes and - reasons for a TRUE indication are: metadata server is restriping + reasons for a TRUE indication are: the metadata server is restriping the file or a permanent error has occurred on a storage device and the metadata server would like to provide a new layout for the file. Therefore, a clora_changed value of TRUE indicates @@ -99,21 +99,21 @@
The client's processing for CB_LAYOUTRECALL is similar to - CB_RECALL (recall of file delegations) in that straightforward - processing of the layout recall done and the client responds to - the request before actually returning layouts with the + CB_RECALL (recall of file delegations) in that + the client responds to + the request before actually returning layouts via the LAYOUTRETURN operation. While the client responds to the CB_LAYOUTRECALL immediately, the operation is not considered complete (i.e. considered pending) until all affected layouts are returned to the server - with the LAYOUTRETURN operation. + via the LAYOUTRETURN operation. - Before returning the layout to the server with LAYOUTRETURN, the + Before returning the layout to the server via LAYOUTRETURN, the client should wait for the response from in-process or in-flight READ, WRITE, or COMMIT operations that use the recalled layout. - If the client is holding modified data which is effected by a + If the client is holding modified data which is affected by a recalled layout, the client has various options for writing the data to the server. As always, the client may write the data through the metadata server. In fact, the client may not have a @@ -122,7 +122,7 @@ from the server. However, the client may be able to write the modified data to the storage device if the clora_changed argument is FALSE; this needs to be done before returning the - layout with LAYOUTRETURN. If the client were to obtain a new + layout via LAYOUTRETURN. If the client were to obtain a new layout covering the modified data's range, then writing to the storage devices is an available alternative. 
Note that before obtaining a new layout, the client must first return the Index: nfsv41_middle_op_cb_notify.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_op_cb_notify.xml,v retrieving revision 1.20 diff -u -r1.20 nfsv41_middle_op_cb_notify.xml --- nfsv41_middle_op_cb_notify.xml 19 Dec 2007 15:27:13 -0000 1.20 +++ nfsv41_middle_op_cb_notify.xml 6 May 2008 04:50:53 -0000 @@ -31,13 +31,13 @@ - If the server has more notifications then can fit in + If the server has more notifications than can fit in the CB_COMPOUND request, it SHOULD send a sequence of serial CB_COMPOUND requests so that the client's view of the directory does not become confused. E.g. If the server indicates a file named "foo" is added, and that the - file "foo" is removed, the order it which the client receives - these notifications are processed needs to be the same as the + file "foo" is removed, the order in which the client receives + these notifications needs to be the same as the order in which corresponding operations occurred on the server. @@ -66,16 +66,16 @@ (see below), and when a hard link is being created to an existing file. If this entry is added to the end of the directory, the server will set the nad_last_entry flag to - true. If the file is added such that there is at least one + TRUE. If the file is added such that there is at least one entry before it, the server will also return the previous entry information (nad_prev_entry, a variable length array of up to one element. If the array is of zero length, there is no previous entry), along with its cookie. This is to - help clients find the right location in their DNLC or + help clients find the right location in their file name caches and directory caches where this entry should be cached. If the new entry's cookie is available, it will be in - nad_new_entry_cookie (another variable length array of up to - one element). 
If the addition of the entry causes another + the nad_new_entry_cookie (another variable length array of up to + one element) field. If the addition of the entry causes another entry to be deleted (which can only happen in the rename case) atomically with the addition, then information on this entry is reported in nad_old_entry. Index: nfsv41_middle_op_cb_notify_deviceid.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_op_cb_notify_deviceid.xml,v retrieving revision 1.6 diff -u -r1.6 nfsv41_middle_op_cb_notify_deviceid.xml --- nfsv41_middle_op_cb_notify_deviceid.xml 29 Jan 2008 08:19:32 -0000 1.6 +++ nfsv41_middle_op_cb_notify_deviceid.xml 6 May 2008 04:50:53 -0000 @@ -17,10 +17,9 @@ The CB_NOTIFY_DEVICEID operation is used by the server to send notifications to clients about changes to pNFS device IDs. The registration of - device ID notifications occurs when the device - mapping stateid is established using GETDEVICEINFO - or GETDEVICELIST. These notifications are sent - over the backchannel. The notification is sent + device ID notifications is optional and is done via + GETDEVICEINFO. These notifications are sent + over the backchannel once the original request has been processed on the server. The server will send an array of notifications, cnda_changes, as a list of pairs of @@ -58,7 +57,7 @@ A previously provided device ID to device address mapping has changed and the client uses - GETDEVICEINFO or GETDEVICELIST to obtain the + GETDEVICEINFO to obtain the updated mapping. 
The notification is encoded in a value of data Index: nfsv41_middle_op_cb_notify_lock.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_op_cb_notify_lock.xml,v retrieving revision 1.6 diff -u -r1.6 nfsv41_middle_op_cb_notify_lock.xml --- nfsv41_middle_op_cb_notify_lock.xml 21 Feb 2008 12:58:17 -0000 1.6 +++ nfsv41_middle_op_cb_notify_lock.xml 6 May 2008 04:50:53 -0000 @@ -13,7 +13,7 @@
- The server can use this operation to indicate that a lock for the given + The server can use this operation to indicate that a byte-range lock for the given file and lock-owner, previously requested by the client via an unsuccessful LOCK request, might be available. @@ -23,7 +23,7 @@ been polling for a blocking lock may now be able to acquire the lock. If the server supports this callback for a given file, it MUST set the OPEN4_RESULT_MAY_NOTIFY_LOCK flag when responding to successful opens - for that file. This does not commit the server to use of CB_NOTIFY_LOCK, + for that file. This does not commit the server to the use of CB_NOTIFY_LOCK, but the client may use this as a hint to decide how frequently to poll for locks derived from that open. @@ -40,10 +40,10 @@
- The server must not grant the lock to the client unless and until it - receives an actual lock request from the client. Similarly, the client + The server MUST NOT grant the lock to the client unless and until it + receives an actual LOCK request from the client. Similarly, the client receiving this callback cannot assume that it now has the lock, or that a - subsequent request for the lock will be successful. + subsequent LOCK request for the lock will be successful. The server is not required to implement this callback, and even if it Index: nfsv41_middle_op_cb_push_deleg.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_op_cb_push_deleg.xml,v retrieving revision 1.13 diff -u -r1.13 nfsv41_middle_op_cb_push_deleg.xml --- nfsv41_middle_op_cb_push_deleg.xml 17 Dec 2007 14:44:27 -0000 1.13 +++ nfsv41_middle_op_cb_push_deleg.xml 6 May 2008 04:50:53 -0000 @@ -18,7 +18,9 @@
CB_PUSH_DELEG is used by the server to both signal to the - client that the delegation it wants is available and to + client that the delegation it wants (previously indicated + via a want established from an + OPEN or WANT_DELEGATION operation) is available and to simultaneously offer the delegation to the client. The client has the choice of accepting the delegation by returning NFS4_OK to the server, delaying the decision to accept the @@ -26,12 +28,8 @@ or permanently rejecting the offer of the delegation by returning NFS4ERR_REJECT_DELEG. When a delegation is rejected in this fashion, the want - previously established is permanently deleted. - - - The server MUST send in cpda_delegation a delegation - which satisfies a request made in an - OPEN or WANT_DELEGATION operation. + previously established is permanently deleted and the delegation + is subject to acquisition by another client.
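The three client answers to CB_PUSH_DELEG described in this hunk (accept, delay the decision, or reject) can be modeled as a small state transition on the server's record of the want. This is a minimal C sketch under stated assumptions. The enum, struct, and function names are hypothetical. The enum values are local stand-ins, not the protocol's numeric status codes. The hunk boundary hides the status used to defer the decision, so the sketch models only the decision itself. Treating an accepted offer as satisfying the want is also an assumption.

```c
#include <stdbool.h>

/* Hypothetical model of a client's answer to CB_PUSH_DELEG. */
typedef enum {
    PUSH_ACCEPT, /* client returns NFS4_OK */
    PUSH_DEFER,  /* client delays its decision */
    PUSH_REJECT  /* client returns NFS4ERR_REJECT_DELEG */
} push_reply;

/* Hypothetical server-side bookkeeping for one outstanding want. */
typedef struct {
    bool want_active;         /* want from OPEN or WANT_DELEGATION still recorded */
    bool delegated_to_client; /* client now holds the pushed delegation */
    bool offerable_to_others; /* delegation may be acquired by another client */
} want_state;

/* Apply the client's reply to the server's view of the want. */
static want_state apply_push_reply(want_state s, push_reply r)
{
    switch (r) {
    case PUSH_ACCEPT:
        s.delegated_to_client = true;
        s.want_active = false; /* assumption: an accepted offer satisfies the want */
        s.offerable_to_others = false;
        break;
    case PUSH_DEFER:
        /* Decision postponed: nothing changes yet. */
        break;
    case PUSH_REJECT:
        /* Rejection permanently deletes the want and frees the delegation. */
        s.want_active = false;
        s.delegated_to_client = false;
        s.offerable_to_others = true;
        break;
    }
    return s;
}
```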
Index: nfsv41_middle_op_cb_recall.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_op_cb_recall.xml,v retrieving revision 1.5 diff -u -r1.5 nfsv41_middle_op_cb_recall.xml --- nfsv41_middle_op_cb_recall.xml 8 Nov 2007 04:15:21 -0000 1.5 +++ nfsv41_middle_op_cb_recall.xml 6 May 2008 04:50:53 -0000 @@ -1,7 +1,7 @@ -
+
@@ -12,17 +12,18 @@
The CB_RECALL operation is used to begin the process of recalling - an open delegation and returning it to the server. + a delegation and returning it to the server. - The truncate flag is used to optimize recall for a file which is - about to be truncated to zero. When it is set, the client is freed - of obligation to propagate modified data for the file to the + The truncate flag is used to optimize recall for a file object which + is a regular file and is + about to be truncated to zero. When it is TRUE, the client is freed + of the obligation to propagate modified data for the file to the server, since this data is irrelevant. - If the handle specified is not one for which the client holds an - open delegation, an NFS4ERR_BADHANDLE error is returned. + If the handle specified is not one for which the client holds a + delegation, an NFS4ERR_BADHANDLE error is returned. If the stateid specified is not one corresponding to an open @@ -32,10 +33,13 @@
- The client should reply to the callback immediately. Replying does - not complete the recall except when an error was returned. The - recall is not complete until the delegation is returned using a - DELEGRETURN. + The client SHOULD reply to the callback immediately. + Replying does not complete the recall except when + the value of the reply's status field is neither + NFS4ERR_DELAY nor NFS4_OK. The recall is not complete + until the delegation is returned using a DELEGRETURN + operation. +
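The revised CB_RECALL text makes the effect of the client's reply depend on its status code. The following is a minimal sketch, not part of the specification, of how a client might track an outstanding recall under that rule. The `RecallState` enum and the function name are hypothetical, and status codes are modeled as plain strings.

```python
from enum import Enum


class RecallState(Enum):
    """Hypothetical client-side view of an outstanding CB_RECALL."""
    PENDING = "pending"    # recall still open; DELEGRETURN is still required
    COMPLETE = "complete"  # the recall is finished


def state_after_cb_recall_reply(status: str) -> RecallState:
    """Recall state right after the client replies to CB_RECALL.

    Replying completes the recall only when the reply's status is
    neither NFS4ERR_DELAY nor NFS4_OK. For those two statuses the
    recall stays open until the delegation is returned with DELEGRETURN.
    """
    if status in ("NFS4_OK", "NFS4ERR_DELAY"):
        return RecallState.PENDING
    return RecallState.COMPLETE
```

For example, a reply of NFS4ERR_BADHANDLE (the handle is not one for which the client holds a delegation) ends the recall immediately. A reply of NFS4_OK only acknowledges the callback.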
Index: nfsv41_middle_op_cb_recall_any.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_op_cb_recall_any.xml,v retrieving revision 1.10 diff -u -r1.10 nfsv41_middle_op_cb_recall_any.xml --- nfsv41_middle_op_cb_recall_any.xml 25 Feb 2008 20:51:37 -0000 1.10 +++ nfsv41_middle_op_cb_recall_any.xml 6 May 2008 04:50:53 -0000 @@ -1,9 +1,9 @@ -
+
- Notify client to return delegation and keep N of them. + Notify client to return all but N recallable objects.
@@ -53,24 +53,83 @@ in that class should be returned. - For NFSv4.1, a number of bits are defined. For some of these, ranges + A number of bits are defined. For some of these, ranges are defined and it is up to the definition of the storage protocol to specify how these are to be used. There are ranges - for blocks-based storage protocols, for object-based storage - protocols and a reserved range for other experimental storage - protocols. The RFC defining such a storage protocol needs to + reserved for object-based storage + protocols and for other experimental storage + protocols. An RFC defining such a storage protocol needs to specify how particular bits within its range are to be used. For example, it may specify a mapping between attributes of the layout (read vs. write, size of area) and the bit to be used or it may define a field in the layout where the associated bit position is made available by the server to the client. + + + + + + The client is to return read delegations on + non-directory file objects. + + + + + + + The client is to return write delegations on + regular file objects. + + + + + + + The client is to return directory delegations. + + + + + + + The client is to return layouts of type LAYOUT4_NFSV4_1_FILES. + + + + + + See for a description. + + + + + + + See for a description. + + + + + + This range is reserved for telling the client to recall + layouts of experimental + or site specific layout types (see ). + + + + - When an undefined bit is set in the type mask, NFS4ERR_INVAL - should be returned. If a client does not support - an object of the specified type, if the bit is defined, - NFS4ERR_INVAL should not be returned. Future minor versions - of NFSv4 may expand the set of valid type mask bits. + When a bit is set in the type mask that corresponds + to an undefined type of recallable object, + NFS4ERR_INVAL MUST be returned. 
When a bit is set + that corresponds to a defined type of object, but + the client does not support an object of the type, + NFS4ERR_INVAL MUST NOT be returned. Future minor + versions of NFSv4 may expand the set of valid type + mask bits. + CB_RECALL_ANY specifies a count of objects that the client may Index: nfsv41_middle_op_cb_recall_credit.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_op_cb_recall_credit.xml,v retrieving revision 1.13 diff -u -r1.13 nfsv41_middle_op_cb_recall_credit.xml --- nfsv41_middle_op_cb_recall_credit.xml 25 Feb 2008 20:51:37 -0000 1.13 +++ nfsv41_middle_op_cb_recall_credit.xml 6 May 2008 04:50:53 -0000 @@ -17,16 +17,16 @@ The CB_RECALL_SLOT operation requests the client to return session slots, and if applicable, transport credits (e.g. RDMA credits for connections associated with - the operations channel) to the server. + the operations channel) of the session's fore channel. CB_RECALL_SLOT specifies - rsa_target_highest_slotid, the target highest_slot the server wants - for the session. The client, should then work toward reducing - the highest_slot to the target. + rsa_target_highest_slotid, the value of the target highest slot id the server wants + for the session. The client MUST then progress toward reducing + the session's highest slot id to the target value. If the session has only non-RDMA connections associated with its operations channel, then the client need only wait - for all outstanding requests with a slotid > + for all outstanding requests with a slot id > rsa_target_highest_slotid to complete, then send a single COMPOUND consisting of a single SEQUENCE operation, with the sa_highestslot field set to rsa_target_highest_slotid. 
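The CB_RECALL_ANY type-mask rule separates undefined bits, which are an error, from defined bits for object types the client does not support, which are not. Below is a minimal sketch of a client-side check. It makes several simplifying assumptions: the mask is modeled as a single Python integer rather than the XDR bitmap, and the sets of defined and supported bit positions are passed in as parameters rather than hard-coded. The function name is hypothetical.

```python
from typing import List, Set, Tuple


def check_recall_any_mask(type_mask: int,
                          defined_bits: Set[int],
                          supported_bits: Set[int]) -> Tuple[str, List[int]]:
    """Validate a CB_RECALL_ANY type mask (hypothetical client helper).

    Returns (status, bits_to_act_on). Any set bit that does not name a
    defined type of recallable object makes the whole request invalid.
    A defined but unsupported type is not an error; the client simply
    holds no objects of that type, so there is nothing to return.
    """
    set_bits = [b for b in range(type_mask.bit_length()) if (type_mask >> b) & 1]
    if any(b not in defined_bits for b in set_bits):
        return "NFS4ERR_INVAL", []
    return "NFS4_OK", [b for b in set_bits if b in supported_bits]
```

Failing the whole request on any undefined bit mirrors the MUST in the text. Silently skipping unsupported-but-defined bits follows from the MUST NOT.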
Index: nfsv41_middle_op_cb_recallable_obj_avail.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_op_cb_recallable_obj_avail.xml,v retrieving revision 1.10 diff -u -r1.10 nfsv41_middle_op_cb_recallable_obj_avail.xml --- nfsv41_middle_op_cb_recallable_obj_avail.xml 17 Dec 2007 14:44:27 -0000 1.10 +++ nfsv41_middle_op_cb_recallable_obj_avail.xml 6 May 2008 04:50:53 -0000 @@ -21,7 +21,7 @@ WANT_DELEGATION, GET_DIR_DELEG, or LAYOUTGET. - The argument, objects_to_keep means the total number of + The argument craa_objects_to_keep means the total number of recallable objects of the types indicated in the argument type_mask that the server believes it can allow the client to have, including the number of such objects the client already @@ -29,5 +29,25 @@ than the server informs it can have runs the risk of having objects recalled. + + The server is not obligated to reserve the + difference between the number of the objects + the client currently has and the value of + craa_objects_to_keep, nor does delaying the reply + to CB_RECALLABLE_OBJ_AVAIL prevent the server + from using the resources of the recallable objects + for another purpose. Indeed, if a client responds + slowly to CB_RECALLABLE_OBJ_AVAIL, the server might + interpret the client as having reduced capability + to manage recallable objects, and so cancel + or reduce any reservation it is maintaining on behalf + of the client. + Thus if the client desires to acquire more + recallable objects, it needs to reply quickly + to CB_RECALLABLE_OBJ_AVAIL, and then send the + appropriate operations to acquire recallable + objects. + +
Index: nfsv41_middle_op_cb_sequence.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_op_cb_sequence.xml,v retrieving revision 1.23 diff -u -r1.23 nfsv41_middle_op_cb_sequence.xml --- nfsv41_middle_op_cb_sequence.xml 19 Mar 2008 05:04:46 -0000 1.23 +++ nfsv41_middle_op_cb_sequence.xml 6 May 2008 04:50:53 -0000 @@ -16,13 +16,13 @@ The CB_SEQUENCE operation is used to manage operational accounting for the backchannel of the session on which a request is - sent. The contents include the session to which this - request belongs, slot id and sequence id used by the server to + sent. The contents include the session id to which this + request belongs, the slot id and sequence id used by the server to implement session request control and exactly once - semantics, and exchanged slot maximums which are used to adjust the - size of the reply cache. This operation MUST appear once as the first operation in + semantics, and exchanged slot id maxima which are used to adjust the + size of the reply cache. This operation will appear once as the first operation in each CB_COMPOUND request - or a protocol error must result. See + or a protocol error MUST result. See for a description of how slots are processed. @@ -33,18 +33,18 @@ The csa_referring_call_lists array is the list of COMPOUND - requests, identified by sessionid, slot id and sequencid. These + requests, identified by session id, slot id, and sequence id. These are requests that the client previously sent to the server. These previous requests created state that some operation(s) - in the same CB_COMPOUND as the csa_referring_call_lists is + in the same CB_COMPOUND as the csa_referring_call_lists are identifying. - A sessionid is included because + A session id is included because leased state is tied to a client ID, and a client ID can have multiple sessions. See .
- The value of csa_sequenceid argument relative to + The value of the csa_sequenceid argument relative to the cached sequence id on the slot falls into one of three cases. @@ -100,6 +100,4 @@
-
-
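The three cases for csa_sequenceid are not reproduced in this hunk. The sketch below assumes they follow the usual NFSv4.1 slot rules: one greater than the cached value (with wraparound) is a new request, equal is a retry, and anything else is misordered. Sequence ids are assumed to be 32-bit unsigned values. The function name is hypothetical.

```python
SEQID_MODULUS = 2 ** 32  # assumption: sequence ids are 32-bit unsigned and wrap


def classify_cb_sequence(csa_sequenceid: int, cached_sequenceid: int) -> str:
    """Classify a CB_SEQUENCE request against the slot's cached sequence id.

    Assumed cases:
      * one greater than the cached value (mod 2**32): a new request
      * equal to the cached value: a retry, answered from the reply cache
      * anything else: NFS4ERR_SEQ_MISORDERED
    """
    if csa_sequenceid == (cached_sequenceid + 1) % SEQID_MODULUS:
        return "new request"
    if csa_sequenceid == cached_sequenceid:
        return "retry"
    return "NFS4ERR_SEQ_MISORDERED"
```

The modulo comparison treats 0 as the successor of 0xFFFFFFFF, so a long-lived slot keeps working after its counter wraps.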
Index: nfsv41_middle_op_create_session.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_op_create_session.xml,v retrieving revision 1.47 diff -u -r1.47 nfsv41_middle_op_create_session.xml --- nfsv41_middle_op_create_session.xml 1 May 2008 22:44:00 -0000 1.47 +++ nfsv41_middle_op_create_session.xml 6 May 2008 04:50:53 -0000 @@ -52,7 +52,7 @@ This is the client ID the new session will be associated - with. The corresponding result is csr_sessionid, the sessionid + with. The corresponding result is csr_sessionid, the session id of the new session. @@ -359,7 +359,8 @@ as a result of a client restart, network partition, malfunctioning router, etc. For each client ID created by EXCHANGE_ID, the server maintains a - separate reply cache similar to the session reply + separate reply cache (called the CREATE_SESSION reply cache) + similar to the session reply cache used for SEQUENCE operations, with two distinctions. @@ -377,6 +378,34 @@ + As previously stated, CREATE_SESSION can be sent with + or without a preceding SEQUENCE operation. Even if + SEQUENCE precedes CREATE_SESSION, the server MUST + maintain the CREATE_SESSION reply cache, which + is separate from the reply cache for the session + associated with SEQUENCE. If CREATE_SESSION was + originally sent by itself, the client MAY send + a retry of the CREATE_SESSION operation within a + COMPOUND preceded by SEQUENCE. If CREATE_SESSION + was originally sent in a COMPOUND that started with + SEQUENCE, then the client SHOULD send a retry in + a COMPOUND that starts with SEQUENCE that has the + same session id as the SEQUENCE of the original + request. However, the client MAY send a retry in a + COMPOUND that either has no preceding SEQUENCE, or + has a preceding SEQUENCE that refers to a different + session than the original CREATE_SESSION. 
This might + be necessary if the client sends a CREATE_SESSION + in a COMPOUND preceded by a SEQUENCE with session + id X, and session X no longer exists. Regardless, any + retry of CREATE_SESSION, with or without a preceding + SEQUENCE, MUST use the same value of csa_sequence + as the original. + + + + + When a client sends a successful EXCHANGE_ID and it is returned an unconfirmed client ID, the client is also returned eir_sequenceid, and the client is Index: nfsv41_middle_op_destroy_clientid.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_op_destroy_clientid.xml,v retrieving revision 1.9 diff -u -r1.9 nfsv41_middle_op_destroy_clientid.xml --- nfsv41_middle_op_destroy_clientid.xml 1 May 2008 22:44:00 -0000 1.9 +++ nfsv41_middle_op_destroy_clientid.xml 6 May 2008 04:50:53 -0000 @@ -22,7 +22,7 @@ ID, the server MUST return NFS4ERR_CLIENTID_BUSY. DESTROY_CLIENTID MAY be preceded with a SEQUENCE operation as long as the client ID derived from the - sessionid of SEQUENCE is not the same as the client + session id of SEQUENCE is not the same as the client ID to be destroyed. If the client IDs are the same, then the server MUST return NFS4ERR_CLIENTID_BUSY. Index: nfsv41_middle_op_layoutcommit.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_op_layoutcommit.xml,v retrieving revision 1.23 diff -u -r1.23 nfsv41_middle_op_layoutcommit.xml --- nfsv41_middle_op_layoutcommit.xml 1 May 2008 22:44:00 -0000 1.23 +++ nfsv41_middle_op_layoutcommit.xml 6 May 2008 04:50:53 -0000 @@ -12,7 +12,7 @@
Commits changes in the layout represented by the current - filehandle, client ID (derived from the sessionid in the + filehandle, client ID (derived from the session id in the preceding SEQUENCE operation), byte range, and stateid. Since layouts are sub-dividable, a smaller portion of a layout, retrieved via LAYOUTGET, can be committed. The region being Index: nfsv41_middle_op_layoutget.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_op_layoutget.xml,v retrieving revision 1.27 diff -u -r1.27 nfsv41_middle_op_layoutget.xml --- nfsv41_middle_op_layoutget.xml 1 May 2008 22:44:00 -0000 1.27 +++ nfsv41_middle_op_layoutget.xml 6 May 2008 04:50:53 -0000 @@ -14,7 +14,7 @@ Requests a layout from the metadata server for reading or writing the file given by the filehandle at the byte range specified by offset and length. Layouts are - identified by the client ID (derived from the sessionid in the + identified by the client ID (derived from the session id in the preceding SEQUENCE operation), current filehandle, layout type (loga_layout_type), and the layout stateid (loga_stateid). The use of the loga_iomode field depends upon the layout type, but should @@ -23,54 +23,180 @@ If the metadata server is in a grace period, and does not persist layouts and device ID to device address mappings, then - it MUST return NFS4ERR_GRACE (see ). + it MUST return NFS4ERR_GRACE (see ). The LAYOUTGET operation returns layout information - for the specified byte range: a layout. To get - a layout from a specific offset through the - end-of-file, regardless of the file's length, a - loga_length field set to NFS4_UINT64_MAX is used. - If loga_length is zero, or if a loga_length which is - not NFS4_UINT64_MAX is specified, and the sum of loga_length - and loga_offset exceeds NFS4_UINT64_MAX, - the error NFS4ERR_INVAL will result. 
- - - The loga_minlength field specifies the minimum length of - layout the server MUST return with two exceptions: - - - + for the specified byte range: a layout. + The client actually specifies two ranges, both starting + at the offset in the loga_offset field. The first + range is between loga_offset and loga_offset + loga_length - 1 + inclusive. This range indicates the desired range the client + wants the layout to cover. The second range is between + loga_offset and loga_offset + loga_minlength - 1 inclusive. This + range indicates the required range the client needs the layout + to cover. Thus, loga_minlength MUST be less than or equal to + loga_length. + + + + When a length field is set to NFS4_UINT64_MAX, + this indicates a desire (when loga_length is NFS4_UINT64_MAX) + or requirement (when loga_minlength is NFS4_UINT64_MAX) + to get a layout from loga_offset through the + end-of-file, regardless of the file's length. + + + The following rules govern the relationships among, + and minima of + loga_length, loga_minlength, and loga_offset. + + - - The argument loga_iomode was set to - LAYOUTIOMODE_READ, and loga_offset plus - loga_minlength goes past the end of the file. + If loga_length is less than loga_minlength, the metadata server + MUST return NFS4ERR_INVAL. - + If loga_length or loga_minlength is zero, the metadata server + MUST return NFS4ERR_INVAL. - The range from loga_offset through loga_offset + - loga_minlength - 1 overlaps two or more striping - patterns. In which case, logr_layout will - contain two or more elements, and the sum of - the lo_length fields of each element MUST be at - least loga_minlength unless the first exception - also applies. + + + If the sum of loga_offset and loga_minlength exceeds + NFS4_UINT64_MAX, and loga_minlength is not NFS4_UINT64_MAX, + the error NFS4ERR_INVAL MUST result.
+ + If the sum of loga_offset and loga_length exceeds + NFS4_UINT64_MAX, and loga_length is not NFS4_UINT64_MAX, + the error NFS4ERR_INVAL MUST result. + - If this requirement cannot be met, the server MUST - NOT return a layout and the error NFS4ERR_BADLAYOUT - MUST be returned. - - + After the metadata server has performed the above checks on loga_offset, + loga_minlength, and loga_length, the metadata server MUST return a + layout according to the rules in . + + + + + + Acceptable layouts based on loga_minlength. + u64m = NFS4_UINT64_MAX; a_off = loga_offset; + a_minlen = loga_minlength. + + + + Layout iomode of request + Layout a_minlen of request + Layout iomode of reply + Layout offset of reply + Layout length of reply + + _READ + u64m + MAY be _READ + MUST be <= a_off + MUST be >= file length - layout offset + + _READ + u64m + MAY be _RW + MUST be <= a_off + MUST be u64m + + _READ + < u64m + MAY be _READ + MUST be <= a_off + MUST be >= MIN(file length, a_minlen + a_off) - layout offset + + _READ + < u64m + MAY be _RW + MUST be <= a_off + MUST be >= a_off - layout offset + a_minlen + + _RW + u64m + MUST be _RW + MUST be <= a_off + MUST be u64m + + _RW + < u64m + MUST be _RW + MUST be <= a_off + MUST be >= a_off - layout offset + a_minlen + + + + If the metadata server cannot return a layout according + to the rules in , + then the metadata server MUST return the error + NFS4ERR_BADLAYOUT. Assuming loga_length is greater + than loga_minlength, the metadata server SHOULD + return a layout according to the rules in . + + + + + + + Desired layouts based on loga_length. + The rules of MUST be applied first. + u64m = NFS4_UINT64_MAX; a_off = loga_offset; + a_len = loga_length.
+ + + + Layout iomode of request + Layout a_len of request + Layout iomode of reply + Layout offset of reply + Layout length of reply + + _READ + u64m + MAY be _READ + MUST be <= a_off + SHOULD be u64m + + _READ + u64m + MAY be _RW + MUST be <= a_off + SHOULD be u64m + + _READ + < u64m + MAY be _READ + MUST be <= a_off + SHOULD be >= a_off - layout offset + a_len + + _READ + < u64m + MAY be _RW + MUST be <= a_off + SHOULD be >= a_off - layout offset + a_len + + _RW + u64m + MUST be _RW + MUST be <= a_off + SHOULD be u64m + + _RW + < u64m + MUST be _RW + MUST be <= a_off + SHOULD be >= a_off - layout offset + a_len + The loga_stateid field specifies a valid stateid. @@ -78,11 +204,11 @@ the loga_stateid field represents a stateid reflecting the correspondingly valid open, byte-range lock, or delegation stateid. Once a - layout is held by the client for the file, the - loga_stateid field is a stateid as returned from + layout is held on the file by the client, the + loga_stateid field MUST be a stateid as returned from a previous LAYOUTGET or LAYOUTRETURN operation or provided by a CB_LAYOUTRECALL operation (see ). + target="layout_stateid"/>). @@ -91,6 +217,7 @@ exceeds the size specified by maxcount, the metadata server will return the NFS4ERR_TOOSMALL error. + The returned layout is expressed as an array, logr_layout, with each element of type layout4. If a @@ -104,45 +231,117 @@ in the range between two successive elements of logr_layout. The lo_iomode field in each element of logr_layout MUST be the same. - - - The metadata server may adjust - the range of the returned layout - based on the usage implied by - the loga_iomode. The client MUST - be prepared to get a layout that - does not align exactly with its - request. - See - for more details. - - The metadata server may also return a layout with an lo_iomode - other than that requested by the client. If it does so, it MUST - ensure that the lo_iomode is more permissive than the - loga_iomode requested. 
For example, this behavior allows an - implementation to upgrade read-only requests to read/write - requests at its discretion, within the limits of the layout type - specific protocol. A lo_iomode of either LAYOUTIOMODE4_READ or - LAYOUTIOMODE4_RW MUST be returned. + + and + + + both refer to a returned layout iomode, offset, and length. + Because the returned layout is encoded in the logr_layout array, + more description is required. + + + + + The value of the returned layout iomode listed in + + and + + is equal to the value of the lo_iomode field in each + element of logr_layout. + + As shown in + and , + the metadata server MAY return a layout with an lo_iomode + different from the requested iomode (field loga_iomode of the request). + If it does so, it MUST + ensure that the lo_iomode is more permissive than the + loga_iomode requested. For example, this behavior allows an + implementation to upgrade read-only requests to read/write + requests at its discretion, within the limits of the layout type + specific protocol. An lo_iomode of either LAYOUTIOMODE4_READ or + LAYOUTIOMODE4_RW MUST be returned. + + + + + + The value of the returned layout offset listed in + + and + + is always equal to the lo_offset field of the first + element of logr_layout. + + + + + + When setting the value of the returned layout + length, the situation is complicated by the + possibility that the special layout length value + NFS4_UINT64_MAX is involved. For a logr_layout + array of N elements, the lo_length field in the + first N-1 elements MUST NOT be NFS4_UINT64_MAX. The + lo_length field of the last element of logr_layout + can be NFS4_UINT64_MAX under some conditions as + described in the following list. + + + + + If an applicable rule of + states the metadata server MUST return a layout of length + NFS4_UINT64_MAX, then the lo_length field of the last + element of logr_layout MUST be NFS4_UINT64_MAX.
+ + + + If an applicable rule of + states the metadata server MUST NOT return a layout of length + NFS4_UINT64_MAX, then the lo_length field of the last + element of logr_layout MUST NOT be NFS4_UINT64_MAX. + + + + If an applicable rule of + states the metadata server SHOULD return a layout of length + NFS4_UINT64_MAX, then the lo_length field of the last + element of logr_layout SHOULD be NFS4_UINT64_MAX. + + + + When the value of the returned layout length of + + and + is not NFS4_UINT64_MAX, then + the returned layout length is equal to the sum of the + lo_length fields of each element of logr_layout. + + + + + + + + The logr_return_on_close result field is a directive to return - the layout before closing the file. When the server sets this + the layout before closing the file. When the metadata server sets this return value to TRUE, it MUST be prepared to recall the layout in the case the client fails to return the layout before close. - For the server that knows a layout must be returned before a + For the metadata server that knows a layout must be returned before a close of the file, this return value can be used to communicate the desired behavior to the client and thus remove one extra - step from the client's and server's interaction. + step from the client's and metadata server's interaction. The logr_stateid stateid is returned to the client for use in subsequent layout related operations. See - , , and + , , and for a further discussion and requirements. @@ -150,45 +349,45 @@ The format of the returned layout (lo_content) is specific to the layout type. The value of the layout type (lo_content.loc_type) for each of - the elements of the array of layouts returned by the server + the elements of the array of layouts returned by the metadata server (logr_layout) MUST be equal to the loga_layout_type specified by the client.
If it is not equal, the client SHOULD ignore - the response as invalid and behave as if the server returned + the response as invalid and behave as if the metadata server returned an error, even if the client does have support for the layout type returned. If layouts are not supported for the requested file or its - containing file system the server SHOULD return + containing file system the metadata server MUST return NFS4ERR_LAYOUTUNAVAILABLE. If the layout type is not supported, - the metadata server should return NFS4ERR_UNKNOWN_LAYOUTTYPE. + the metadata server MUST return NFS4ERR_UNKNOWN_LAYOUTTYPE. If layouts are supported but no layout matches the client - provided layout identification, the server should return + provided layout identification, the metadata server MUST return NFS4ERR_BADLAYOUT. If an invalid loga_iomode is specified, or a - loga_iomode of LAYOUTIOMODE4_ANY is specified, the server should + loga_iomode of LAYOUTIOMODE4_ANY is specified, the metadata server MUST return NFS4ERR_BADIOMODE. If the layout for the file is unavailable due to transient - conditions, e.g. file sharing prohibits layouts, the server MUST + conditions, e.g. file sharing prohibits layouts, the metadata server MUST return NFS4ERR_LAYOUTTRYLATER. If the layout request is rejected due to an overlapping layout - recall, the server MUST return NFS4ERR_RECALLCONFLICT. See for details. + recall, the metadata server MUST return NFS4ERR_RECALLCONFLICT. See for details. If the layout conflicts with a mandatory byte range lock held on the file, and if the storage devices have no method of enforcing mandatory locks, other than through the restriction of layouts, the - metadata server should return NFS4ERR_LOCKED. + metadata server SHOULD return NFS4ERR_LOCKED. If client sets loga_signal_layout_avail to TRUE, then it is registering with the client a "want" for a layout in the event the layout cannot be obtained due to resource exhaustion. 
- If the server supports and will honor the "want", + If the metadata server supports and will honor the "want", the results will have logr_will_signal_layout_avail set to TRUE. If so the client should expect a CB_RECALLABLE_OBJ_AVAIL @@ -206,28 +405,28 @@ COMPOUND request after an OPEN operation and results in the client having location information for the file; this requires that loga_stateid be set to the - special stateid that tells the server to use the + special stateid that tells the metadata server to use the current stateid, which is set by OPEN (see ) . A client may also hold a layout across multiple OPENs. The client specifies a layout type that limits what kind of layout the - server will return. This prevents servers from - issuing layouts that are unusable by the client. + metadata server will return. This prevents metadata servers from + granting layouts that are unusable by the client. Once the client has obtained a layout referring to a particular - device ID, the server MUST NOT delete the device ID until the + device ID, the metadata server MUST NOT delete the device ID until the layout is returned or revoked. CB_NOTIFY_DEVICEID can race with LAYOUTGET. One race scenario is that LAYOUTGET returns a device ID the client does not have device - address mappings for, and the server sends a CB_NOTIFY_DEVICEID + address mappings for, and the metadata server sends a CB_NOTIFY_DEVICEID to add the device ID to the client's awareness and meanwhile the client sends GETDEVICEINFO on the device ID. This scenario is discussed in - . + . Another scenario is that the CB_NOTIFY_DEVICEID is processed by the client before it processes the results from LAYOUTGET. The client will send a GETDEVICEINFO on the device ID. 
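The LAYOUTGET changes add two sets of mechanical rules: the NFS4ERR_INVAL checks on loga_offset, loga_length, and loga_minlength, and how the returned layout length is derived from the lo_length fields of logr_layout. The sketch below encodes both. The function names are hypothetical, lengths are modeled as Python integers, and status codes are modeled as strings. It also assumes, as an inference from the MUST rule for the last element, that a final lo_length of NFS4_UINT64_MAX makes the overall returned length NFS4_UINT64_MAX.

```python
from typing import List

NFS4_UINT64_MAX = 2 ** 64 - 1


def check_layoutget_lengths(loga_offset: int, loga_length: int,
                            loga_minlength: int) -> str:
    """Apply the NFS4ERR_INVAL rules for the LAYOUTGET length arguments."""
    if loga_length < loga_minlength:
        return "NFS4ERR_INVAL"
    if loga_length == 0 or loga_minlength == 0:
        return "NFS4ERR_INVAL"
    # NFS4_UINT64_MAX means "through end-of-file", so it is exempt from overflow checks.
    if loga_minlength != NFS4_UINT64_MAX and loga_offset + loga_minlength > NFS4_UINT64_MAX:
        return "NFS4ERR_INVAL"
    if loga_length != NFS4_UINT64_MAX and loga_offset + loga_length > NFS4_UINT64_MAX:
        return "NFS4ERR_INVAL"
    return "NFS4_OK"


def returned_layout_length(lo_lengths: List[int]) -> int:
    """Derive the returned layout length from the lo_length of each logr_layout element."""
    if not lo_lengths:
        raise ValueError("logr_layout must contain at least one element")
    if any(n == NFS4_UINT64_MAX for n in lo_lengths[:-1]):
        raise ValueError("only the last element may use NFS4_UINT64_MAX")
    if lo_lengths[-1] == NFS4_UINT64_MAX:
        return NFS4_UINT64_MAX  # layout extends through end-of-file
    return sum(lo_lengths)
```

A metadata server could run the first check before consulting the acceptable-layout rules. A client could use the second to compare what it received against the required range from loga_offset through loga_offset + loga_minlength - 1.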
Index: nfsv41_middle_op_layoutreturn.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_op_layoutreturn.xml,v retrieving revision 1.29 diff -u -r1.29 nfsv41_middle_op_layoutreturn.xml --- nfsv41_middle_op_layoutreturn.xml 1 May 2008 22:44:00 -0000 1.29 +++ nfsv41_middle_op_layoutreturn.xml 6 May 2008 04:50:54 -0000 @@ -15,7 +15,7 @@ This operation returns from the client to the server one or more layouts represented by the client ID - (derived from the sessionid in the preceding SEQUENCE + (derived from the session id in the preceding SEQUENCE operation), lora_layout_type, and lora_iomode. When lr_returntype is LAYOUTRETURN4_FILE, the returned layout is further identified by the current Index: nfsv41_middle_op_lock.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_op_lock.xml,v retrieving revision 1.17 diff -u -r1.17 nfsv41_middle_op_lock.xml --- nfsv41_middle_op_lock.xml 20 Apr 2008 16:28:33 -0000 1.17 +++ nfsv41_middle_op_lock.xml 6 May 2008 04:50:54 -0000 @@ -82,7 +82,7 @@ (locker.open_owner.lock_owner.clientid). The reason the server MUST ignore the clientid field is that the server MUST derive the client ID from - the sessionid from the SEQUENCE operation of the + the session id from the SEQUENCE operation of the COMPOUND request. Index: nfsv41_middle_op_lockt.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_op_lockt.xml,v retrieving revision 1.12 diff -u -r1.12 nfsv41_middle_op_lockt.xml --- nfsv41_middle_op_lockt.xml 20 Apr 2008 16:28:33 -0000 1.12 +++ nfsv41_middle_op_lockt.xml 6 May 2008 04:50:54 -0000 @@ -35,7 +35,7 @@ any value by the client and MUST be ignored by the server. 
The reason the server MUST ignore the clientid field is that the server MUST derive the - client ID from the sessionid from the SEQUENCE + client ID from the session id from the SEQUENCE operation of the COMPOUND request. Index: nfsv41_middle_op_open.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_op_open.xml,v retrieving revision 1.34 diff -u -r1.34 nfsv41_middle_op_open.xml --- nfsv41_middle_op_open.xml 20 Apr 2008 16:28:33 -0000 1.34 +++ nfsv41_middle_op_open.xml 6 May 2008 04:50:54 -0000 @@ -216,7 +216,7 @@ field called clientid and a field called owner. The client can set the clientid field to any value and the server MUST ignore it. Instead the server MUST - derive the client ID from the sessionid of the + derive the client ID from the session id of the SEQUENCE operation of the COMPOUND request. Index: nfsv41_middle_op_sequence.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_op_sequence.xml,v retrieving revision 1.40 diff -u -r1.40 nfsv41_middle_op_sequence.xml --- nfsv41_middle_op_sequence.xml 1 May 2008 22:44:00 -0000 1.40 +++ nfsv41_middle_op_sequence.xml 6 May 2008 04:50:54 -0000 @@ -285,7 +285,7 @@
- The server MUST maintain a mapping of sessionid to client ID + The server MUST maintain a mapping of session id to client ID in order to validate any operations that follow SEQUENCE that take a stateid as an argument and/or result. Index: nfsv41_middle_pnfs.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_pnfs.xml,v retrieving revision 1.79 diff -u -r1.79 nfsv41_middle_pnfs.xml --- nfsv41_middle_pnfs.xml 1 May 2008 22:44:00 -0000 1.79 +++ nfsv41_middle_pnfs.xml 6 May 2008 04:50:54 -0000 @@ -218,7 +218,7 @@
-
+
A layout defines how a file's data is organized on one or more storage devices. There are many potential layout types; each of the @@ -509,15 +509,12 @@ The client selects an appropriate layout type that the server supports and the client is prepared to use. The layout returned to - the client may not exactly align with the - requested byte range. A field within the - LAYOUTGET request, loga_minlength, specifies - the minimum length of the layout. - The loga_minlength - field should be at least one. As needed a - client may make multiple LAYOUTGET requests; - these will result in multiple overlapping, - non-conflicting layouts. + the client might not exactly match the + requested byte range as described in . As needed a client + may make multiple LAYOUTGET requests; these might result + in multiple overlapping, non-conflicting layouts (see + ). @@ -610,7 +607,7 @@ knowledge of which layout ranges are held. Note that overlapping layout ranges may occur because of the client's specific requests or because the server is allowed to expand the range of a requested - layout and notify the client in the LAYOUTRETURN results Additional + layout and notify the client in the LAYOUTRETURN results. Additional layout stateid sequencing requirements are provided in . Index: nfsv41_middle_security_considerations.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_security_considerations.xml,v retrieving revision 1.10 diff -u -r1.10 nfsv41_middle_security_considerations.xml --- nfsv41_middle_security_considerations.xml 29 Jan 2008 08:19:32 -0000 1.10 +++ nfsv41_middle_security_considerations.xml 6 May 2008 04:50:54 -0000 @@ -3,11 +3,10 @@
- NFS has historically used a model where, from an - authentication perspective, the client was the entire - machine, or at least the source network address of - the machine. The NFS server relied on the NFS client - to make the proper authentication of the end-user. + Historically, the authentication model of NFS + treated the entire machine as the NFS client, with the + NFS server trusting the NFS client + to authenticate the end-user. The NFS server in turn shared its files only to specific clients, as identified by the client's source network address. Given this model, the AUTH_SYS @@ -17,26 +16,23 @@ came from the same network address and port number that the request was sent to. While such a model is easy to implement and simple to deploy and use, it is - certainly not a safe model. Thus, NFSv4.1 + unsafe. Thus, NFSv4.1 implementations are REQUIRED to support a security model that uses end to end authentication, where an end-user on a client mutually authenticates (via cryptographic schemes that do not expose passwords or keys in the clear on the network) to a principal on an NFS server. Consideration - should also be given to the integrity and privacy of + is also given to the integrity and privacy of NFS requests and responses. The issues of end to end mutual authentication, integrity, and privacy are discussed . - Note that while NFSv4.1 mandates an end to end - mutual authentication model, the "classic" model of - machine authentication via network address checking - and AUTH_SYS identification can still be supported - with the caveat that the AUTH_SYS flavor is neither - REQUIRED nor RECOMMENDED by this specification, and - so interoperability via AUTH_SYS is not assured. + Note that being REQUIRED to implement does not mean REQUIRED to + use; AUTH_SYS can be used by NFSv4.1 clients and servers. + However, AUTH_SYS is merely an OPTIONAL security flavor in NFSv4.1, + and so interoperability via AUTH_SYS is not assured.
@@ -52,31 +48,33 @@ the option to use weaker security mechanisms, there are three operations in particular that warrant the implementation overriding user choices. + - The first two such operations are SECINFO - SECINFO_NO_NAME. It is RECOMMENDED that the client - send the either operation such that it is protected - with a security flavor that has integrity protection, - such as RPCSEC_GSS with - either the rpc_gss_svc_integrity or rpc_gss_svc_privacy - service. Without integrity protection encapsulating - SECINFO and SECINFO_NO_NAME and their results, an attacker in the - middle could modify results such that the client might - select a weaker algorithm in the set allowed by server, - making the client and/or server vulnerable to further - attacks. + SECINFO_NO_NAME. It is RECOMMENDED that the client send + either operation such that it is protected with a + security flavor that has integrity protection, such + as RPCSEC_GSS with either the rpc_gss_svc_integrity + or rpc_gss_svc_privacy service. Without integrity + protection encapsulating SECINFO and SECINFO_NO_NAME + and their results, an attacker in the middle could + modify results such that the client might select a + weaker algorithm in the set allowed by the server, making + the client and/or server vulnerable to further attacks. - The second operation that should definitely use integrity protection - is any GETATTR for the fs_locations attribute. The attack has two + The third operation that should definitely use integrity protection + is any GETATTR for the fs_locations and fs_locations_info attributes. The attack has two steps. First the attacker modifies the unprotected results of some operation to return NFS4ERR_MOVED.
Second, when the client follows up - with a GETATTR for the fs_locations attribute, the attacker modifies + with a GETATTR for the fs_locations or fs_locations_info attributes, the attacker modifies the results to cause the client migrate its traffic to a server controlled by the attacker. + + + Relative to previous NFS versions, Index: nfsv41_middle_state_mgmt.xml =================================================================== RCS file: /cvs/stdk/NFSv41/nfsv41_middle_state_mgmt.xml,v retrieving revision 1.34 diff -u -r1.34 nfsv41_middle_state_mgmt.xml --- nfsv41_middle_state_mgmt.xml 20 Apr 2008 16:28:33 -0000 1.34 +++ nfsv41_middle_state_mgmt.xml 6 May 2008 04:50:55 -0000 @@ -55,7 +55,7 @@ and then one or more sessionids (see ) before performing any operations to open, lock, delegate, or obtain a layout for a file object. - Each sessionid is associated with a specific client ID, and thus + Each session id is associated with a specific client ID, and thus serves as a shorthand reference to an NFSv4.1 client. @@ -436,7 +436,7 @@ If server restart has resulted in an invalid - client ID or a sessionid which is invalid, SEQUENCE will return + client ID or a session id which is invalid, SEQUENCE will return an error and the operation that takes a stateid as an argument will never be processed. @@ -601,7 +601,7 @@
- Because each operation is associated with a sessionid and from that + Because each operation is associated with a session id and from that the clientid can be determined, operations do not need to include a stateid for the server to be able to determine whether the they should cause a delegation to be recalled or are to be Index: dotx-id.d/Makefile =================================================================== RCS file: /cvs/stdk/NFSv41/dotx-id.d/Makefile,v retrieving revision 1.16 diff -u -r1.16 Makefile --- dotx-id.d/Makefile 1 May 2008 18:18:57 -0000 1.16 +++ dotx-id.d/Makefile 6 May 2008 04:50:55 -0000 @@ -5,8 +5,8 @@ YEAR=`date +%Y` MONTH=`date +%b` -PREVVERS=03 -VERS=04 +PREVVERS=04 +VERS=05 VPATH = ../../dotx.d all: txt filelist Index: dotx.d/cb_recall_any_args.x =================================================================== RCS file: /cvs/stdk/NFSv41/dotx.d/cb_recall_any_args.x,v retrieving revision 1.3 diff -u -r1.3 cb_recall_any_args.x --- dotx.d/cb_recall_any_args.x 3 Mar 2007 21:52:42 -0000 1.3 +++ dotx.d/cb_recall_any_args.x 6 May 2008 04:50:55 -0000 @@ -2,10 +2,9 @@ const RCA4_TYPE_MASK_WDATA_DLG = 1; const RCA4_TYPE_MASK_DIR_DLG = 2; const RCA4_TYPE_MASK_FILE_LAYOUT = 3; -const RCA4_TYPE_MASK_BLK_LAYOUT_MIN = 4; -const RCA4_TYPE_MASK_BLK_LAYOUT_MAX = 7; +const RCA4_TYPE_MASK_BLK_LAYOUT = 4; const RCA4_TYPE_MASK_OBJ_LAYOUT_MIN = 8; -const RCA4_TYPE_MASK_OBJ_LAYOUT_MAX = 11; +const RCA4_TYPE_MASK_OBJ_LAYOUT_MAX = 9; const RCA4_TYPE_MASK_OTHER_LAYOUT_MIN = 12; const RCA4_TYPE_MASK_OTHER_LAYOUT_MAX = 15; Index: dotx.d/spit_types.sh =================================================================== RCS file: /cvs/stdk/NFSv41/dotx.d/spit_types.sh,v retrieving revision 1.47 diff -u -r1.47 spit_types.sh --- dotx.d/spit_types.sh 21 Feb 2008 17:11:47 -0000 1.47 +++ dotx.d/spit_types.sh 6 May 2008 04:50:55 -0000 @@ -831,7 +831,7 @@ type_nfs_cb_resop4.x ) cat << EOF > $i -union nfs_cb_resop4 switch (unsigned resop){ +union nfs_cb_resop4 switch 
(unsigned resop) { case OP_CB_GETATTR: CB_GETATTR4res opcbgetattr; case OP_CB_RECALL: CB_RECALL4res opcbrecall; @@ -1069,7 +1069,7 @@ type_nfs_resop4.x ) cat << EOF > $i -union nfs_resop4 switch (nfs_opnum4 resop){ +union nfs_resop4 switch (nfs_opnum4 resop) { case OP_ACCESS: ACCESS4res opaccess; case OP_CLOSE: CLOSE4res opclose; case OP_COMMIT: COMMIT4res opcommit;
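The cb_recall_any_args.x hunk above makes two changes. It collapses the block layout range into the single bit position RCA4_TYPE_MASK_BLK_LAYOUT = 4. It also narrows the object layout range to bits 8 and 9. The C sketch below shows how such bit positions might be set and tested in a CB_RECALL_ANY type mask. It assumes the usual NFSv4 bitmap4 convention, in which bit n lives in word n / 32 at position n % 32. The helpers mask_set, mask_set_range, and mask_test are hypothetical illustrations and are not part of the XDR.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions copied from cb_recall_any_args.x as revised in this patch. */
enum {
    RCA4_TYPE_MASK_WDATA_DLG        = 1,
    RCA4_TYPE_MASK_DIR_DLG          = 2,
    RCA4_TYPE_MASK_FILE_LAYOUT      = 3,
    RCA4_TYPE_MASK_BLK_LAYOUT       = 4,
    RCA4_TYPE_MASK_OBJ_LAYOUT_MIN   = 8,
    RCA4_TYPE_MASK_OBJ_LAYOUT_MAX   = 9,
    RCA4_TYPE_MASK_OTHER_LAYOUT_MIN = 12,
    RCA4_TYPE_MASK_OTHER_LAYOUT_MAX = 15
};

/* Hypothetical helper: set bit n, assuming the bitmap4 convention that
 * bit n lives in word n / 32 at position n % 32. */
static void mask_set(uint32_t *words, unsigned n)
{
    words[n / 32] |= (uint32_t)1 << (n % 32);
}

/* Hypothetical helper: set every bit in an inclusive MIN..MAX range,
 * as used by the *_LAYOUT_MIN / *_LAYOUT_MAX constant pairs. */
static void mask_set_range(uint32_t *words, unsigned lo, unsigned hi)
{
    assert(lo <= hi);
    for (unsigned n = lo; n <= hi; n++)
        mask_set(words, n);
}

/* Hypothetical helper: returns nonzero if bit n is set. */
static int mask_test(const uint32_t *words, unsigned n)
{
    return (int)((words[n / 32] >> (n % 32)) & 1u);
}
```

With the revised values, a mask that requests recall of write delegations and block layouts sets bits 1 and 4, so word 0 equals 0x12. Because the block layout type now occupies a single bit, a one-bit set is enough. A paired MIN/MAX constant, such as the object layout range, calls for a range set instead.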