draft-ietf-nfsv4-minorversion1-25.txt   draft-ietf-nfsv4-minorversion1-26.txt 
NFSv4 S. Shepler NFSv4 S. Shepler
Internet-Draft M. Eisler Internet-Draft M. Eisler
Intended status: Standards Track D. Noveck Intended status: Standards Track D. Noveck
Expires: February 20, 2009 Editors Expires: March 7, 2009 Editors
August 19, 2008 September 03, 2008
NFS Version 4 Minor Version 1 NFS Version 4 Minor Version 1
draft-ietf-nfsv4-minorversion1-25.txt draft-ietf-nfsv4-minorversion1-26.txt
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 35 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on February 20, 2009. This Internet-Draft will expire on March 7, 2009.
Abstract Abstract
This Internet-Draft describes NFS version 4 minor version one, This Internet-Draft describes NFS version 4 minor version one,
including features retained from the base protocol and protocol including features retained from the base protocol and protocol
extensions made subsequently. Major extensions introduced in NFS extensions made subsequently. Major extensions introduced in NFS
version 4 minor version one include: Sessions, Directory Delegations, version 4 minor version one include: Sessions, Directory Delegations,
and parallel NFS (pNFS). and parallel NFS (pNFS).
Requirements Language Requirements Language
skipping to change at page 2, line 46 skipping to change at page 2, line 46
2.8.2. Auditing . . . . . . . . . . . . . . . . . . . . . . 38 2.8.2. Auditing . . . . . . . . . . . . . . . . . . . . . . 38
2.8.3. Intrusion Detection . . . . . . . . . . . . . . . . 38 2.8.3. Intrusion Detection . . . . . . . . . . . . . . . . 38
2.9. Transport Layers . . . . . . . . . . . . . . . . . . . . 39 2.9. Transport Layers . . . . . . . . . . . . . . . . . . . . 39
2.9.1. REQUIRED and RECOMMENDED Properties of Transports . 39 2.9.1. REQUIRED and RECOMMENDED Properties of Transports . 39
2.9.2. Client and Server Transport Behavior . . . . . . . . 39 2.9.2. Client and Server Transport Behavior . . . . . . . . 39
2.9.3. Ports . . . . . . . . . . . . . . . . . . . . . . . 41 2.9.3. Ports . . . . . . . . . . . . . . . . . . . . . . . 41
2.10. Session . . . . . . . . . . . . . . . . . . . . . . . . 41 2.10. Session . . . . . . . . . . . . . . . . . . . . . . . . 41
2.10.1. Motivation and Overview . . . . . . . . . . . . . . 41 2.10.1. Motivation and Overview . . . . . . . . . . . . . . 41
2.10.2. NFSv4 Integration . . . . . . . . . . . . . . . . . 42 2.10.2. NFSv4 Integration . . . . . . . . . . . . . . . . . 42
2.10.3. Channels . . . . . . . . . . . . . . . . . . . . . . 44 2.10.3. Channels . . . . . . . . . . . . . . . . . . . . . . 44
2.10.4. Trunking . . . . . . . . . . . . . . . . . . . . . . 45 2.10.4. Server Scope . . . . . . . . . . . . . . . . . . . . 45
2.10.5. Exactly Once Semantics . . . . . . . . . . . . . . . 48 2.10.5. Trunking . . . . . . . . . . . . . . . . . . . . . . 48
2.10.6. RDMA Considerations . . . . . . . . . . . . . . . . 61 2.10.6. Exactly Once Semantics . . . . . . . . . . . . . . . 51
2.10.7. Sessions Security . . . . . . . . . . . . . . . . . 64 2.10.7. RDMA Considerations . . . . . . . . . . . . . . . . 64
2.10.8. The SSV GSS Mechanism . . . . . . . . . . . . . . . 69 2.10.8. Sessions Security . . . . . . . . . . . . . . . . . 67
2.10.9. Session Mechanics - Steady State . . . . . . . . . . 73 2.10.9. The SSV GSS Mechanism . . . . . . . . . . . . . . . 72
2.10.10. Session Inactivity Timer . . . . . . . . . . . . . . 75 2.10.10. Session Mechanics - Steady State . . . . . . . . . . 76
2.10.11. Session Mechanics - Recovery . . . . . . . . . . . . 75 2.10.11. Session Inactivity Timer . . . . . . . . . . . . . . 78
2.10.12. Parallel NFS and Sessions . . . . . . . . . . . . . 79 2.10.12. Session Mechanics - Recovery . . . . . . . . . . . . 78
3. Protocol Constants and Data Types . . . . . . . . . . . . . . 79 2.10.13. Parallel NFS and Sessions . . . . . . . . . . . . . 83
3.1. Basic Constants . . . . . . . . . . . . . . . . . . . . 79 3. Protocol Constants and Data Types . . . . . . . . . . . . . . 84
3.2. Basic Data Types . . . . . . . . . . . . . . . . . . . . 80 3.1. Basic Constants . . . . . . . . . . . . . . . . . . . . 84
3.3. Structured Data Types . . . . . . . . . . . . . . . . . 82 3.2. Basic Data Types . . . . . . . . . . . . . . . . . . . . 85
4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 90 3.3. Structured Data Types . . . . . . . . . . . . . . . . . 86
4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 90 4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . 91 4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 95
4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . 91 4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . 95
4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 91 4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . 96
4.2.1. General Properties of a Filehandle . . . . . . . . . 92 4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 96
4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 93 4.2.1. General Properties of a Filehandle . . . . . . . . . 97
4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 93 4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 97
4.3. One Method of Constructing a Volatile Filehandle . . . . 94 4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 98
4.4. Client Recovery from Filehandle Expiration . . . . . . . 95 4.3. One Method of Constructing a Volatile Filehandle . . . . 99
5. File Attributes . . . . . . . . . . . . . . . . . . . . . . . 96 4.4. Client Recovery from Filehandle Expiration . . . . . . . 99
5.1. REQUIRED Attributes . . . . . . . . . . . . . . . . . . 97 5. File Attributes . . . . . . . . . . . . . . . . . . . . . . . 100
5.2. RECOMMENDED Attributes . . . . . . . . . . . . . . . . . 97 5.1. REQUIRED Attributes . . . . . . . . . . . . . . . . . . 102
5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 98 5.2. RECOMMENDED Attributes . . . . . . . . . . . . . . . . . 102
5.4. Classification of Attributes . . . . . . . . . . . . . . 99 5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 102
5.5. Set-Only and Get-Only Attributes . . . . . . . . . . . . 100 5.4. Classification of Attributes . . . . . . . . . . . . . . 104
5.6. REQUIRED Attributes - List and Definition References . . 100 5.5. Set-Only and Get-Only Attributes . . . . . . . . . . . . 105
5.6. REQUIRED Attributes - List and Definition References . . 105
5.7. RECOMMENDED Attributes - List and Definition 5.7. RECOMMENDED Attributes - List and Definition
References . . . . . . . . . . . . . . . . . . . . . . . 101 References . . . . . . . . . . . . . . . . . . . . . . . 106
5.8. Attribute Definitions . . . . . . . . . . . . . . . . . 103 5.8. Attribute Definitions . . . . . . . . . . . . . . . . . 108
5.8.1. Definitions of REQUIRED Attributes . . . . . . . . . 103 5.8.1. Definitions of REQUIRED Attributes . . . . . . . . . 108
5.8.2. Definitions of Uncategorized RECOMMENDED 5.8.2. Definitions of Uncategorized RECOMMENDED
Attributes . . . . . . . . . . . . . . . . . . . . . 105 Attributes . . . . . . . . . . . . . . . . . . . . . 110
5.9. Interpreting owner and owner_group . . . . . . . . . . . 112 5.9. Interpreting owner and owner_group . . . . . . . . . . . 116
5.10. Character Case Attributes . . . . . . . . . . . . . . . 114 5.10. Character Case Attributes . . . . . . . . . . . . . . . 118
5.11. Directory Notification Attributes . . . . . . . . . . . 114 5.11. Directory Notification Attributes . . . . . . . . . . . 119
5.12. pNFS Attribute Definitions . . . . . . . . . . . . . . . 114 5.12. pNFS Attribute Definitions . . . . . . . . . . . . . . . 119
5.13. Retention Attributes . . . . . . . . . . . . . . . . . . 116 5.13. Retention Attributes . . . . . . . . . . . . . . . . . . 121
6. Access Control Attributes . . . . . . . . . . . . . . . . . . 119 6. Access Control Attributes . . . . . . . . . . . . . . . . . . 124
6.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . 119 6.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . 124
6.2. File Attributes Discussion . . . . . . . . . . . . . . . 120 6.2. File Attributes Discussion . . . . . . . . . . . . . . . 125
6.2.1. Attribute 12: acl . . . . . . . . . . . . . . . . . 120 6.2.1. Attribute 12: acl . . . . . . . . . . . . . . . . . 125
6.2.2. Attribute 58: dacl . . . . . . . . . . . . . . . . . 135 6.2.2. Attribute 58: dacl . . . . . . . . . . . . . . . . . 140
6.2.3. Attribute 59: sacl . . . . . . . . . . . . . . . . . 135 6.2.3. Attribute 59: sacl . . . . . . . . . . . . . . . . . 140
6.2.4. Attribute 33: mode . . . . . . . . . . . . . . . . . 135 6.2.4. Attribute 33: mode . . . . . . . . . . . . . . . . . 140
6.2.5. Attribute 74: mode_set_masked . . . . . . . . . . . 136 6.2.5. Attribute 74: mode_set_masked . . . . . . . . . . . 141
6.3. Common Methods . . . . . . . . . . . . . . . . . . . . . 137 6.3. Common Methods . . . . . . . . . . . . . . . . . . . . . 142
6.3.1. Interpreting an ACL . . . . . . . . . . . . . . . . 137 6.3.1. Interpreting an ACL . . . . . . . . . . . . . . . . 142
6.3.2. Computing a Mode Attribute from an ACL . . . . . . . 138 6.3.2. Computing a Mode Attribute from an ACL . . . . . . . 143
6.4. Requirements . . . . . . . . . . . . . . . . . . . . . . 139 6.4. Requirements . . . . . . . . . . . . . . . . . . . . . . 144
6.4.1. Setting the mode and/or ACL Attributes . . . . . . . 139 6.4.1. Setting the mode and/or ACL Attributes . . . . . . . 144
6.4.2. Retrieving the mode and/or ACL Attributes . . . . . 141 6.4.2. Retrieving the mode and/or ACL Attributes . . . . . 146
6.4.3. Creating New Objects . . . . . . . . . . . . . . . . 141 6.4.3. Creating New Objects . . . . . . . . . . . . . . . . 146
7. Single-server Namespace . . . . . . . . . . . . . . . . . . . 145 7. Single-server Namespace . . . . . . . . . . . . . . . . . . . 150
7.1. Server Exports . . . . . . . . . . . . . . . . . . . . . 145 7.1. Server Exports . . . . . . . . . . . . . . . . . . . . . 150
7.2. Browsing Exports . . . . . . . . . . . . . . . . . . . . 146 7.2. Browsing Exports . . . . . . . . . . . . . . . . . . . . 151
7.3. Server Pseudo File System . . . . . . . . . . . . . . . 146 7.3. Server Pseudo File System . . . . . . . . . . . . . . . 151
7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 147 7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 152
7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . 147 7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . 152
7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . 147 7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . 152
7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 148 7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 153
7.8. Security Policy and Namespace Presentation . . . . . . . 148 7.8. Security Policy and Namespace Presentation . . . . . . . 153
8. State Management . . . . . . . . . . . . . . . . . . . . . . 149 8. State Management . . . . . . . . . . . . . . . . . . . . . . 154
8.1. Client and Session ID . . . . . . . . . . . . . . . . . 150 8.1. Client and Session ID . . . . . . . . . . . . . . . . . 155
8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 150 8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 155
8.2.1. Stateid Types . . . . . . . . . . . . . . . . . . . 151 8.2.1. Stateid Types . . . . . . . . . . . . . . . . . . . 156
8.2.2. Stateid Structure . . . . . . . . . . . . . . . . . 152 8.2.2. Stateid Structure . . . . . . . . . . . . . . . . . 157
8.2.3. Special Stateids . . . . . . . . . . . . . . . . . . 154 8.2.3. Special Stateids . . . . . . . . . . . . . . . . . . 159
8.2.4. Stateid Lifetime and Validation . . . . . . . . . . 155 8.2.4. Stateid Lifetime and Validation . . . . . . . . . . 160
8.2.5. Stateid Use for I/O Operations . . . . . . . . . . . 158 8.2.5. Stateid Use for I/O Operations . . . . . . . . . . . 163
8.2.6. Stateid Use for SETATTR Operations . . . . . . . . . 159 8.2.6. Stateid Use for SETATTR Operations . . . . . . . . . 164
8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 159 8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 164
8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 161 8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 166
8.4.1. Client Failure and Recovery . . . . . . . . . . . . 162 8.4.1. Client Failure and Recovery . . . . . . . . . . . . 167
8.4.2. Server Failure and Recovery . . . . . . . . . . . . 163 8.4.2. Server Failure and Recovery . . . . . . . . . . . . 168
8.4.3. Network Partitions and Recovery . . . . . . . . . . 166 8.4.3. Network Partitions and Recovery . . . . . . . . . . 172
8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 171 8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 176
8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 172 8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 177
8.7. Clocks, Propagation Delay, and Calculating Lease 8.7. Clocks, Propagation Delay, and Calculating Lease
Expiration . . . . . . . . . . . . . . . . . . . . . . . 172 Expiration . . . . . . . . . . . . . . . . . . . . . . . 178
8.8. Obsolete Locking Infrastructure From NFSv4.0 . . . . . . 173 8.8. Obsolete Locking Infrastructure From NFSv4.0 . . . . . . 179
9. File Locking and Share Reservations . . . . . . . . . . . . . 174 9. File Locking and Share Reservations . . . . . . . . . . . . . 180
9.1. Opens and Byte-Range Locks . . . . . . . . . . . . . . . 174 9.1. Opens and Byte-Range Locks . . . . . . . . . . . . . . . 180
9.1.1. State-owner Definition . . . . . . . . . . . . . . . 174 9.1.1. State-owner Definition . . . . . . . . . . . . . . . 180
9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 175 9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 180
9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 178 9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 183
9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 178 9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 184
9.4. Stateid Seqid Values and Byte-Range Locks . . . . . . . 179 9.4. Stateid Seqid Values and Byte-Range Locks . . . . . . . 184
9.5. Issues with Multiple Open-Owners . . . . . . . . . . . . 179 9.5. Issues with Multiple Open-Owners . . . . . . . . . . . . 185
9.6. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 180 9.6. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 185
9.7. Share Reservations . . . . . . . . . . . . . . . . . . . 181 9.7. Share Reservations . . . . . . . . . . . . . . . . . . . 186
9.8. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 182 9.8. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 187
9.9. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 182 9.9. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 188
9.10. Parallel OPENs . . . . . . . . . . . . . . . . . . . . . 183 9.10. Parallel OPENs . . . . . . . . . . . . . . . . . . . . . 189
9.11. Reclaim of Open and Byte-Range Locks . . . . . . . . . . 184 9.11. Reclaim of Open and Byte-Range Locks . . . . . . . . . . 189
10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 184 10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 190
10.1. Performance Challenges for Client-Side Caching . . . . . 185 10.1. Performance Challenges for Client-Side Caching . . . . . 190
10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 186 10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 191
10.2.1. Delegation Recovery . . . . . . . . . . . . . . . . 188 10.2.1. Delegation Recovery . . . . . . . . . . . . . . . . 193
10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 190 10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 196
10.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . 190 10.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . 196
10.3.2. Data Caching and File Locking . . . . . . . . . . . 191 10.3.2. Data Caching and File Locking . . . . . . . . . . . 197
10.3.3. Data Caching and Mandatory File Locking . . . . . . 193 10.3.3. Data Caching and Mandatory File Locking . . . . . . 199
10.3.4. Data Caching and File Identity . . . . . . . . . . . 193 10.3.4. Data Caching and File Identity . . . . . . . . . . . 199
10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 195 10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 200
10.4.1. Open Delegation and Data Caching . . . . . . . . . . 197 10.4.1. Open Delegation and Data Caching . . . . . . . . . . 203
10.4.2. Open Delegation and File Locks . . . . . . . . . . . 198 10.4.2. Open Delegation and File Locks . . . . . . . . . . . 204
10.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . 199 10.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . 204
10.4.4. Recall of Open Delegation . . . . . . . . . . . . . 202 10.4.4. Recall of Open Delegation . . . . . . . . . . . . . 207
10.4.5. Clients that Fail to Honor Delegation Recalls . . . 204 10.4.5. Clients that Fail to Honor Delegation Recalls . . . 209
10.4.6. Delegation Revocation . . . . . . . . . . . . . . . 204 10.4.6. Delegation Revocation . . . . . . . . . . . . . . . 210
10.4.7. Delegations via WANT_DELEGATION . . . . . . . . . . 205 10.4.7. Delegations via WANT_DELEGATION . . . . . . . . . . 210
10.5. Data Caching and Revocation . . . . . . . . . . . . . . 206 10.5. Data Caching and Revocation . . . . . . . . . . . . . . 211
10.5.1. Revocation Recovery for Write Open Delegation . . . 206 10.5.1. Revocation Recovery for Write Open Delegation . . . 212
10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 207 10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 212
10.7. Data and Metadata Caching and Memory Mapped Files . . . 209 10.7. Data and Metadata Caching and Memory Mapped Files . . . 214
10.8. Name and Directory Caching without Directory 10.8. Name and Directory Caching without Directory
Delegations . . . . . . . . . . . . . . . . . . . . . . 211 Delegations . . . . . . . . . . . . . . . . . . . . . . 217
10.8.1. Name Caching . . . . . . . . . . . . . . . . . . . . 211 10.8.1. Name Caching . . . . . . . . . . . . . . . . . . . . 217
10.8.2. Directory Caching . . . . . . . . . . . . . . . . . 213 10.8.2. Directory Caching . . . . . . . . . . . . . . . . . 218
10.9. Directory Delegations . . . . . . . . . . . . . . . . . 214 10.9. Directory Delegations . . . . . . . . . . . . . . . . . 219
10.9.1. Introduction to Directory Delegations . . . . . . . 214 10.9.1. Introduction to Directory Delegations . . . . . . . 219
10.9.2. Directory Delegation Design . . . . . . . . . . . . 215 10.9.2. Directory Delegation Design . . . . . . . . . . . . 220
10.9.3. Attributes in Support of Directory Notifications . . 216 10.9.3. Attributes in Support of Directory Notifications . . 221
10.9.4. Directory Delegation Recall . . . . . . . . . . . . 216 10.9.4. Directory Delegation Recall . . . . . . . . . . . . 221
10.9.5. Directory Delegation Recovery . . . . . . . . . . . 217 10.9.5. Directory Delegation Recovery . . . . . . . . . . . 222
11. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 217 11. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 222
11.1. Location Attributes . . . . . . . . . . . . . . . . . . 217 11.1. Location Attributes . . . . . . . . . . . . . . . . . . 223
11.2. File System Presence or Absence . . . . . . . . . . . . 218 11.2. File System Presence or Absence . . . . . . . . . . . . 223
11.3. Getting Attributes for an Absent File System . . . . . . 219 11.3. Getting Attributes for an Absent File System . . . . . . 224
11.3.1. GETATTR Within an Absent File System . . . . . . . . 219 11.3.1. GETATTR Within an Absent File System . . . . . . . . 225
11.3.2. READDIR and Absent File Systems . . . . . . . . . . 220 11.3.2. READDIR and Absent File Systems . . . . . . . . . . 226
11.4. Uses of Location Information . . . . . . . . . . . . . . 221 11.4. Uses of Location Information . . . . . . . . . . . . . . 226
11.4.1. File System Replication . . . . . . . . . . . . . . 222 11.4.1. File System Replication . . . . . . . . . . . . . . 227
11.4.2. File System Migration . . . . . . . . . . . . . . . 222 11.4.2. File System Migration . . . . . . . . . . . . . . . 228
11.4.3. Referrals . . . . . . . . . . . . . . . . . . . . . 224 11.4.3. Referrals . . . . . . . . . . . . . . . . . . . . . 229
11.5. Location Entries and Server Identity . . . . . . . . . . 225 11.5. Location Entries and Server Identity . . . . . . . . . . 231
11.6. Additional Client-side Considerations . . . . . . . . . 226 11.6. Additional Client-side Considerations . . . . . . . . . 231
11.7. Effecting File System Transitions . . . . . . . . . . . 226 11.7. Effecting File System Transitions . . . . . . . . . . . 232
11.7.1. File System Transitions and Simultaneous Access . . 228 11.7.1. File System Transitions and Simultaneous Access . . 233
11.7.2. Simultaneous Use and Transparent Transitions . . . . 228 11.7.2. Simultaneous Use and Transparent Transitions . . . . 234
11.7.3. Filehandles and File System Transitions . . . . . . 231 11.7.3. Filehandles and File System Transitions . . . . . . 237
11.7.4. Fileids and File System Transitions . . . . . . . . 231 11.7.4. Fileids and File System Transitions . . . . . . . . 237
11.7.5. Fsids and File System Transitions . . . . . . . . . 233 11.7.5. Fsids and File System Transitions . . . . . . . . . 238
11.7.6. The Change Attribute and File System Transitions . . 233 11.7.6. The Change Attribute and File System Transitions . . 239
11.7.7. Lock State and File System Transitions . . . . . . . 234 11.7.7. Lock State and File System Transitions . . . . . . . 239
11.7.8. Write Verifiers and File System Transitions . . . . 238 11.7.8. Write Verifiers and File System Transitions . . . . 244
11.7.9. Readdir Cookies and Verifiers and File System 11.7.9. Readdir Cookies and Verifiers and File System
Transitions . . . . . . . . . . . . . . . . . . . . 238 Transitions . . . . . . . . . . . . . . . . . . . . 244
11.7.10. File System Data and File System Transitions . . . . 238 11.7.10. File System Data and File System Transitions . . . . 244
11.8. Effecting File System Referrals . . . . . . . . . . . . 240 11.8. Effecting File System Referrals . . . . . . . . . . . . 246
11.8.1. Referral Example (LOOKUP) . . . . . . . . . . . . . 240 11.8.1. Referral Example (LOOKUP) . . . . . . . . . . . . . 246
11.8.2. Referral Example (READDIR) . . . . . . . . . . . . . 244 11.8.2. Referral Example (READDIR) . . . . . . . . . . . . . 250
11.9. The Attribute fs_locations . . . . . . . . . . . . . . . 246 11.9. The Attribute fs_locations . . . . . . . . . . . . . . . 252
11.10. The Attribute fs_locations_info . . . . . . . . . . . . 249 11.10. The Attribute fs_locations_info . . . . . . . . . . . . 255
11.10.1. The fs_locations_server4 Structure . . . . . . . . . 253 11.10.1. The fs_locations_server4 Structure . . . . . . . . . 259
11.10.2. The fs_locations_info4 Structure . . . . . . . . . . 258 11.10.2. The fs_locations_info4 Structure . . . . . . . . . . 264
11.10.3. The fs_locations_item4 Structure . . . . . . . . . . 259 11.10.3. The fs_locations_item4 Structure . . . . . . . . . . 265
11.11. The Attribute fs_status . . . . . . . . . . . . . . . . 261 11.11. The Attribute fs_status . . . . . . . . . . . . . . . . 267
12. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 265 12. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 271
12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 265 12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 271
12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 266 12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 272
12.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 267 12.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 273
12.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 267 12.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 273
12.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 267 12.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 273
12.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 267 12.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 273
12.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 268 12.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 274
12.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 268 12.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 274
12.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 268 12.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 274
12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 269 12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 275
12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 269 12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 275
12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 270 12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 276
12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 271 12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 277
12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 272 12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 278
12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 272 12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 278
12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 272 12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 278
12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 273 12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 279
12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 274 12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 280
12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 276 12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 282
12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 279 12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 285
12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 287 12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 293
12.5.7. Metadata Server Write Propagation . . . . . . . . . 287 12.5.7. Metadata Server Write Propagation . . . . . . . . . 293
12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 287 12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 293
12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 289 12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 295
12.7.1. Recovery from Client Restart . . . . . . . . . . . . 289 12.7.1. Recovery from Client Restart . . . . . . . . . . . . 295
12.7.2. Dealing with Lease Expiration on the Client . . . . 290 12.7.2. Dealing with Lease Expiration on the Client . . . . 296
12.7.3. Dealing with Loss of Layout State on the Metadata 12.7.3. Dealing with Loss of Layout State on the Metadata
Server . . . . . . . . . . . . . . . . . . . . . . . 291 Server . . . . . . . . . . . . . . . . . . . . . . . 297
12.7.4. Recovery from Metadata Server Restart . . . . . . . 291 12.7.4. Recovery from Metadata Server Restart . . . . . . . 297
12.7.5. Operations During Metadata Server Grace Period . . . 293 12.7.5. Operations During Metadata Server Grace Period . . . 299
12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 294 12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 300
12.8. Metadata and Storage Device Roles . . . . . . . . . . . 294 12.8. Metadata and Storage Device Roles . . . . . . . . . . . 300
12.9. Security Considerations for pNFS . . . . . . . . . . . . 294 12.9. Security Considerations for pNFS . . . . . . . . . . . . 300
13. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 295 13. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 301
13.1. Client ID and Session Considerations . . . . . . . . . . 296 13.1. Client ID and Session Considerations . . . . . . . . . . 302
13.1.1. Sessions Considerations for Data Servers . . . . . . 298 13.1.1. Sessions Considerations for Data Servers . . . . . . 304
13.2. File Layout Definitions . . . . . . . . . . . . . . . . 298 13.2. File Layout Definitions . . . . . . . . . . . . . . . . 304
13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 299 13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 305
13.4. Interpreting the File Layout . . . . . . . . . . . . . . 303 13.4. Interpreting the File Layout . . . . . . . . . . . . . . 309
13.4.1. Determining the Stripe Unit Number . . . . . . . . . 303 13.4.1. Determining the Stripe Unit Number . . . . . . . . . 309
13.4.2. Interpreting the File Layout Using Sparse Packing . 303 13.4.2. Interpreting the File Layout Using Sparse Packing . 309
13.4.3. Interpreting the File Layout Using Dense Packing . . 306 13.4.3. Interpreting the File Layout Using Dense Packing . . 312
13.4.4. Sparse and Dense Stripe Unit Packing . . . . . . . . 308 13.4.4. Sparse and Dense Stripe Unit Packing . . . . . . . . 314
13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 310 13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 316
13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 311 13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 317
13.7. COMMIT Through Metadata Server . . . . . . . . . . . . . 313 13.7. COMMIT Through Metadata Server . . . . . . . . . . . . . 319
13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 315 13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 321
13.9. Metadata and Data Server State Coordination . . . . . . 315 13.9. Metadata and Data Server State Coordination . . . . . . 321
13.9.1. Global Stateid Requirements . . . . . . . . . . . . 315 13.9.1. Global Stateid Requirements . . . . . . . . . . . . 321
13.9.2. Data Server State Propagation . . . . . . . . . . . 316 13.9.2. Data Server State Propagation . . . . . . . . . . . 322
13.10. Data Server Component File Size . . . . . . . . . . . . 318 13.10. Data Server Component File Size . . . . . . . . . . . . 324
13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 319 13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 325
13.12. Security Considerations for the File Layout Type . . . . 319 13.12. Security Considerations for the File Layout Type . . . . 325
14. Internationalization . . . . . . . . . . . . . . . . . . . . 320 14. Internationalization . . . . . . . . . . . . . . . . . . . . 326
14.1. Stringprep profile for the utf8str_cs type . . . . . . . 321 14.1. Stringprep profile for the utf8str_cs type . . . . . . . 327
14.2. Stringprep profile for the utf8str_cis type . . . . . . 323 14.2. Stringprep profile for the utf8str_cis type . . . . . . 329
14.3. Stringprep profile for the utf8str_mixed type . . . . . 324 14.3. Stringprep profile for the utf8str_mixed type . . . . . 330
14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 326 14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 332
14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 326 14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 332
15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 327 15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 333
15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 327 15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 333
15.1.1. General Errors . . . . . . . . . . . . . . . . . . . 329 15.1.1. General Errors . . . . . . . . . . . . . . . . . . . 335
15.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 331 15.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 337
15.1.3. Compound Structure Errors . . . . . . . . . . . . . 332 15.1.3. Compound Structure Errors . . . . . . . . . . . . . 338
15.1.4. File System Errors . . . . . . . . . . . . . . . . . 334 15.1.4. File System Errors . . . . . . . . . . . . . . . . . 340
15.1.5. State Management Errors . . . . . . . . . . . . . . 336 15.1.5. State Management Errors . . . . . . . . . . . . . . 342
15.1.6. Security Errors . . . . . . . . . . . . . . . . . . 337 15.1.6. Security Errors . . . . . . . . . . . . . . . . . . 343
15.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 337 15.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 343
15.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 338 15.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 344
15.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 339 15.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 345
15.1.10. pNFS Errors . . . . . . . . . . . . . . . . . . . . 340 15.1.10. pNFS Errors . . . . . . . . . . . . . . . . . . . . 346
15.1.11. Session Use Errors . . . . . . . . . . . . . . . . . 341 15.1.11. Session Use Errors . . . . . . . . . . . . . . . . . 347
15.1.12. Session Management Errors . . . . . . . . . . . . . 343 15.1.12. Session Management Errors . . . . . . . . . . . . . 349
15.1.13. Client Management Errors . . . . . . . . . . . . . . 343 15.1.13. Client Management Errors . . . . . . . . . . . . . . 349
15.1.14. Delegation Errors . . . . . . . . . . . . . . . . . 344 15.1.14. Delegation Errors . . . . . . . . . . . . . . . . . 350
15.1.15. Attribute Handling Errors . . . . . . . . . . . . . 344 15.1.15. Attribute Handling Errors . . . . . . . . . . . . . 350
15.1.16. Obsoleted Errors . . . . . . . . . . . . . . . . . . 345 15.1.16. Obsoleted Errors . . . . . . . . . . . . . . . . . . 351
15.2. Operations and their valid errors . . . . . . . . . . . 346 15.2. Operations and their valid errors . . . . . . . . . . . 352
15.3. Callback operations and their valid errors . . . . . . . 362 15.3. Callback operations and their valid errors . . . . . . . 368
15.4. Errors and the operations that use them . . . . . . . . 364 15.4. Errors and the operations that use them . . . . . . . . 370
16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 378 16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 384
16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 378 16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 384
16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 379 16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 385
17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 390 17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 396
18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 393 18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 399
18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 393 18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 399
18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 399 18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 405
18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 400 18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 406
18.4. Operation 6: CREATE - Create a Non-Regular File Object . 403 18.4. Operation 6: CREATE - Create a Non-Regular File Object . 409
18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting 18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting
Recovery . . . . . . . . . . . . . . . . . . . . . . . . 406 Recovery . . . . . . . . . . . . . . . . . . . . . . . . 412
18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 407 18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 413
18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 407 18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 413
18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 409 18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 415
18.9. Operation 11: LINK - Create Link to a File . . . . . . . 410 18.9. Operation 11: LINK - Create Link to a File . . . . . . . 416
18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 413 18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 419
18.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 417 18.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 423
18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 418 18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 424
18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 420 18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 426
18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 421 18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 427
18.15. Operation 17: NVERIFY - Verify Difference in 18.15. Operation 17: NVERIFY - Verify Difference in
Attributes . . . . . . . . . . . . . . . . . . . . . . . 423 Attributes . . . . . . . . . . . . . . . . . . . . . . . 429
18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 424 18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 430
18.17. Operation 19: OPENATTR - Open Named Attribute 18.17. Operation 19: OPENATTR - Open Named Attribute
Directory . . . . . . . . . . . . . . . . . . . . . . . 443 Directory . . . . . . . . . . . . . . . . . . . . . . . 449
18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 444 18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 450
18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 446 18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 452
18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 446 18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 452
18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 448 18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 454
18.22. Operation 25: READ - Read from File . . . . . . . . . . 449 18.22. Operation 25: READ - Read from File . . . . . . . . . . 455
18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 451 18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 457
18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 455 18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 461
18.25. Operation 28: REMOVE - Remove File System Object . . . . 456 18.25. Operation 28: REMOVE - Remove File System Object . . . . 462
18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 458 18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 464
18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 462 18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 468
18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 463 18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 469
18.29. Operation 33: SECINFO - Obtain Available Security . . . 464 18.29. Operation 33: SECINFO - Obtain Available Security . . . 470
18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 468 18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 474
18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 471 18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 477
18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 472 18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 478
18.33. Operation 40: BACKCHANNEL_CTL - Backchannel control . . 476 18.33. Operation 40: BACKCHANNEL_CTL - Backchannel control . . 482
18.34. Operation 41: BIND_CONN_TO_SESSION . . . . . . . . . . . 478 18.34. Operation 41: BIND_CONN_TO_SESSION . . . . . . . . . . . 484
18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 481 18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 487
18.36. Operation 43: CREATE_SESSION - Create New Session and 18.36. Operation 43: CREATE_SESSION - Create New Session and
Confirm Client ID . . . . . . . . . . . . . . . . . . . 498 Confirm Client ID . . . . . . . . . . . . . . . . . . . 504
18.37. Operation 44: DESTROY_SESSION - Destroy existing 18.37. Operation 44: DESTROY_SESSION - Destroy existing
session . . . . . . . . . . . . . . . . . . . . . . . . 508 session . . . . . . . . . . . . . . . . . . . . . . . . 514
18.38. Operation 45: FREE_STATEID - Free stateid with no 18.38. Operation 45: FREE_STATEID - Free stateid with no
locks . . . . . . . . . . . . . . . . . . . . . . . . . 509 locks . . . . . . . . . . . . . . . . . . . . . . . . . 515
18.39. Operation 46: GET_DIR_DELEGATION - Get a directory 18.39. Operation 46: GET_DIR_DELEGATION - Get a directory
delegation . . . . . . . . . . . . . . . . . . . . . . . 510 delegation . . . . . . . . . . . . . . . . . . . . . . . 516
18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 514 18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 520
18.41. Operation 48: GETDEVICELIST - Get All Device Mappings 18.41. Operation 48: GETDEVICELIST - Get All Device Mappings
for a File System . . . . . . . . . . . . . . . . . . . 516 for a File System . . . . . . . . . . . . . . . . . . . 522
18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using 18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using
a layout . . . . . . . . . . . . . . . . . . . . . . . . 518 a layout . . . . . . . . . . . . . . . . . . . . . . . . 524
18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 521 18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 527
18.44. Operation 51: LAYOUTRETURN - Release Layout 18.44. Operation 51: LAYOUTRETURN - Release Layout
Information . . . . . . . . . . . . . . . . . . . . . . 531 Information . . . . . . . . . . . . . . . . . . . . . . 537
18.45. Operation 52: SECINFO_NO_NAME - Get Security on 18.45. Operation 52: SECINFO_NO_NAME - Get Security on
Unnamed Object . . . . . . . . . . . . . . . . . . . . . 535 Unnamed Object . . . . . . . . . . . . . . . . . . . . . 541
18.46. Operation 53: SEQUENCE - Supply per-procedure 18.46. Operation 53: SEQUENCE - Supply per-procedure
sequencing and control . . . . . . . . . . . . . . . . . 537 sequencing and control . . . . . . . . . . . . . . . . . 543
18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 542 18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 548
18.48. Operation 55: TEST_STATEID - Test stateids for 18.48. Operation 55: TEST_STATEID - Test stateids for
validity . . . . . . . . . . . . . . . . . . . . . . . . 544 validity . . . . . . . . . . . . . . . . . . . . . . . . 550
18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 546 18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 552
18.50. Operation 57: DESTROY_CLIENTID - Destroy existing 18.50. Operation 57: DESTROY_CLIENTID - Destroy existing
client ID . . . . . . . . . . . . . . . . . . . . . . . 550 client ID . . . . . . . . . . . . . . . . . . . . . . . 556
18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims 18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims
Finished . . . . . . . . . . . . . . . . . . . . . . . . 550 Finished . . . . . . . . . . . . . . . . . . . . . . . . 556
18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 553 18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 559
19. NFSv4.1 Callback Procedures . . . . . . . . . . . . . . . . . 553 19. NFSv4.1 Callback Procedures . . . . . . . . . . . . . . . . . 559
19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 554 19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 560
19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 554 19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 560
20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 558 20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 564
20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 558 20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 564
20.2. Operation 4: CB_RECALL - Recall a Delegation . . . . . . 559 20.2. Operation 4: CB_RECALL - Recall a Delegation . . . . . . 565
20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from 20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from
Client . . . . . . . . . . . . . . . . . . . . . . . . . 560 Client . . . . . . . . . . . . . . . . . . . . . . . . . 566
20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 564 20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 570
20.5. Operation 7: CB_PUSH_DELEG - Offer Delegation to 20.5. Operation 7: CB_PUSH_DELEG - Offer Delegation to
Client . . . . . . . . . . . . . . . . . . . . . . . . . 568 Client . . . . . . . . . . . . . . . . . . . . . . . . . 574
20.6. Operation 8: CB_RECALL_ANY - Keep any N recallable 20.6. Operation 8: CB_RECALL_ANY - Keep any N recallable
objects . . . . . . . . . . . . . . . . . . . . . . . . 569 objects . . . . . . . . . . . . . . . . . . . . . . . . 575
20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal 20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal
Resources for Recallable Objects . . . . . . . . . . . . 572 Resources for Recallable Objects . . . . . . . . . . . . 578
20.8. Operation 10: CB_RECALL_SLOT - change flow control 20.8. Operation 10: CB_RECALL_SLOT - change flow control
limits . . . . . . . . . . . . . . . . . . . . . . . . . 573 limits . . . . . . . . . . . . . . . . . . . . . . . . . 579
20.9. Operation 11: CB_SEQUENCE - Supply backchannel 20.9. Operation 11: CB_SEQUENCE - Supply backchannel
sequencing and control . . . . . . . . . . . . . . . . . 574 sequencing and control . . . . . . . . . . . . . . . . . 580
20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending 20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending
Delegation Wants . . . . . . . . . . . . . . . . . . . . 576 Delegation Wants . . . . . . . . . . . . . . . . . . . . 582
20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible 20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible
lock availability . . . . . . . . . . . . . . . . . . . 577 lock availability . . . . . . . . . . . . . . . . . . . 583
20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify device ID 20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify device ID
changes . . . . . . . . . . . . . . . . . . . . . . . . 579 changes . . . . . . . . . . . . . . . . . . . . . . . . 585
20.13. Operation 10044: CB_ILLEGAL - Illegal Callback 20.13. Operation 10044: CB_ILLEGAL - Illegal Callback
Operation . . . . . . . . . . . . . . . . . . . . . . . 581 Operation . . . . . . . . . . . . . . . . . . . . . . . 587
21. Security Considerations . . . . . . . . . . . . . . . . . . . 581 21. Security Considerations . . . . . . . . . . . . . . . . . . . 587
22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 583 22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 589
22.1. Named Attribute Definitions . . . . . . . . . . . . . . 583 22.1. Named Attribute Definitions . . . . . . . . . . . . . . 589
22.1.1. Initial Registry . . . . . . . . . . . . . . . . . . 584 22.1.1. Initial Registry . . . . . . . . . . . . . . . . . . 590
22.1.2. Updating Registrations . . . . . . . . . . . . . . . 584 22.1.2. Updating Registrations . . . . . . . . . . . . . . . 590
22.2. Device ID Notifications . . . . . . . . . . . . . . . . 584 22.2. Device ID Notifications . . . . . . . . . . . . . . . . 590
22.2.1. Initial Registry . . . . . . . . . . . . . . . . . . 585 22.2.1. Initial Registry . . . . . . . . . . . . . . . . . . 591
22.2.2. Updating Registrations . . . . . . . . . . . . . . . 585 22.2.2. Updating Registrations . . . . . . . . . . . . . . . 591
22.3. Object Recall Types . . . . . . . . . . . . . . . . . . 585 22.3. Object Recall Types . . . . . . . . . . . . . . . . . . 591
22.3.1. Initial Registry . . . . . . . . . . . . . . . . . . 587 22.3.1. Initial Registry . . . . . . . . . . . . . . . . . . 593
22.3.2. Updating Registrations . . . . . . . . . . . . . . . 587 22.3.2. Updating Registrations . . . . . . . . . . . . . . . 593
22.4. Layout Types . . . . . . . . . . . . . . . . . . . . . . 587 22.4. Layout Types . . . . . . . . . . . . . . . . . . . . . . 593
22.4.1. Initial Registry . . . . . . . . . . . . . . . . . . 588 22.4.1. Initial Registry . . . . . . . . . . . . . . . . . . 594
22.4.2. Updating Registrations . . . . . . . . . . . . . . . 588 22.4.2. Updating Registrations . . . . . . . . . . . . . . . 594
22.4.3. Guidelines for Writing Layout Type Specifications . 588 22.4.3. Guidelines for Writing Layout Type Specifications . 594
22.5. Path Variable Definitions . . . . . . . . . . . . . . . 590 22.5. Path Variable Definitions . . . . . . . . . . . . . . . 596
22.5.1. Path Variables Registry . . . . . . . . . . . . . . 590 22.5.1. Path Variables Registry . . . . . . . . . . . . . . 596
22.5.2. Values for the ${ietf.org:CPU_ARCH} Variable . . . . 592 22.5.2. Values for the ${ietf.org:CPU_ARCH} Variable . . . . 598
22.5.3. Values for the ${ietf.org:OS_TYPE} Variable . . . . 592 22.5.3. Values for the ${ietf.org:OS_TYPE} Variable . . . . 598
23. References . . . . . . . . . . . . . . . . . . . . . . . . . 593 23. References . . . . . . . . . . . . . . . . . . . . . . . . . 599
23.1. Normative References . . . . . . . . . . . . . . . . . . 593 23.1. Normative References . . . . . . . . . . . . . . . . . . 599
23.2. Informative References . . . . . . . . . . . . . . . . . 595 23.2. Informative References . . . . . . . . . . . . . . . . . 601
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 596 Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 602
Appendix B. RFC Editor Notes . . . . . . . . . . . . . . . . . . 598 Appendix B. RFC Editor Notes . . . . . . . . . . . . . . . . . . 604
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 599 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 605
Intellectual Property and Copyright Statements . . . . . . . . . 600 Intellectual Property and Copyright Statements . . . . . . . . . 606
1. Introduction 1. Introduction
1.1. The NFS Version 4 Minor Version 1 Protocol 1.1. The NFS Version 4 Minor Version 1 Protocol
The NFS version 4 minor version 1 (NFSv4.1) protocol is the second The NFS version 4 minor version 1 (NFSv4.1) protocol is the second
minor version of the NFS version 4 (NFSv4) protocol. The first minor minor version of the NFS version 4 (NFSv4) protocol. The first minor
version, NFSv4.0, is described in [20]. It generally follows the version, NFSv4.0, is described in [20]. It generally follows the
guidelines for the minor versioning model listed in Section 10 of RFC guidelines for the minor versioning model listed in Section 10 of RFC
3530. However, it diverges from guidelines 11 ("a client and server 3530. However, it diverges from guidelines 11 ("a client and server
skipping to change at page 26, line 4 skipping to change at page 26, line 4
(e.g. restarts) of the same client cause the client to present the (e.g. restarts) of the same client cause the client to present the
same string. The implementor is cautioned against an approach that same string. The implementor is cautioned against an approach that
requires the string to be recorded in a local file because this requires the string to be recorded in a local file because this
precludes the use of the implementation in an environment where precludes the use of the implementation in an environment where
there is no local disk and all file access is from an NFSv4.1 there is no local disk and all file access is from an NFSv4.1
server. server.
o The string should be the same for each server network address that o The string should be the same for each server network address that
the client accesses. This way, if a server has multiple the client accesses. This way, if a server has multiple
interfaces, the client can trunk traffic over multiple network interfaces, the client can trunk traffic over multiple network
paths as described in Section 2.10.4. (Note: the precise opposite paths as described in Section 2.10.5. (Note: the precise opposite
was advised in the NFSv4.0 specification [20].) was advised in the NFSv4.0 specification [20].)
o The algorithm for generating the string should not assume that the o The algorithm for generating the string should not assume that the
client's network address will not change, unless the client client's network address will not change, unless the client
implementation knows it is using statically assigned network implementation knows it is using statically assigned network
addresses. This includes changes between client incarnations and addresses. This includes changes between client incarnations and
even changes while the client is still running in its current even changes while the client is still running in its current
incarnation. Thus with dynamic address assignment, if the client incarnation. Thus with dynamic address assignment, if the client
includes just the client's network address in the co_ownerid includes just the client's network address in the co_ownerid
string, there is a real risk that after the client gives up the string, there is a real risk that after the client gives up the
skipping to change at page 27, line 9 skipping to change at page 27, line 9
The client ID is assigned by the server (the eir_clientid result from The client ID is assigned by the server (the eir_clientid result from
EXCHANGE_ID) and should be chosen so that it will not conflict with a EXCHANGE_ID) and should be chosen so that it will not conflict with a
client ID previously assigned by the server. This applies across client ID previously assigned by the server. This applies across
server restarts. server restarts.
In the event of a server restart, a client may find out that its In the event of a server restart, a client may find out that its
current client ID is no longer valid when it receives an current client ID is no longer valid when it receives an
NFS4ERR_STALE_CLIENTID error. The precise circumstances depend on NFS4ERR_STALE_CLIENTID error. The precise circumstances depend on
the characteristics of the sessions involved, specifically whether the characteristics of the sessions involved, specifically whether
the session is persistent (see Section 2.10.5.5), but in each case the session is persistent (see Section 2.10.6.5), but in each case
the client will receive this error when it attempts to establish a the client will receive this error when it attempts to establish a
new session with the existing client ID and receives the error new session with the existing client ID and receives the error
NFS4ERR_STALE_CLIENTID, indicating that a new client ID must be NFS4ERR_STALE_CLIENTID, indicating that a new client ID must be
obtained via EXCHANGE_ID and the new session established with that obtained via EXCHANGE_ID and the new session established with that
client ID. client ID.
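The recovery path described above can be sketched as a small client-side dispatch. This is an illustrative sketch only: the error values are those assigned to these errors in the NFSv4.1 XDR, while the recovery_step enumeration and next_recovery_step() are hypothetical names introduced for this example and are not part of the protocol.

```c
/* Error values as assigned in the NFSv4.1 XDR description. */
enum {
    NFS4_OK                = 0,
    NFS4ERR_STALE_CLIENTID = 10022,
    NFS4ERR_BADSESSION     = 10052
};

/* Hypothetical names for the next action a client takes. */
enum recovery_step {
    RECOVER_NONE,            /* no session or client ID recovery needed */
    RECOVER_CREATE_SESSION,  /* retry CREATE_SESSION with the existing client ID */
    RECOVER_EXCHANGE_ID      /* obtain a new client ID, then CREATE_SESSION */
};

/*
 * Map a status returned by the server to the next recovery step.
 * NFS4ERR_BADSESSION means the session was lost, so the client first
 * tries CREATE_SESSION with its existing client ID.  If that attempt
 * (or any other use of the client ID) yields NFS4ERR_STALE_CLIENTID,
 * the client must start over with EXCHANGE_ID.
 */
enum recovery_step next_recovery_step(int status)
{
    switch (status) {
    case NFS4ERR_BADSESSION:
        return RECOVER_CREATE_SESSION;
    case NFS4ERR_STALE_CLIENTID:
        return RECOVER_EXCHANGE_ID;
    default:
        return RECOVER_NONE;
    }
}
```

A client would call next_recovery_step() on each failed request and repeat until it returns RECOVER_NONE; state reclaim after a new client ID is obtained is outside the scope of this sketch.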
When a session is not persistent, the client will find out that it When a session is not persistent, the client will find out that it
needs to create a new session as a result of getting an needs to create a new session as a result of getting an
NFS4ERR_BADSESSION, since the session in question was lost as part of NFS4ERR_BADSESSION, since the session in question was lost as part of
a server restart. When the existing client ID is presented to a a server restart. When the existing client ID is presented to a
skipping to change at page 28, line 39 skipping to change at page 28, line 39
the client ID in order to conserve resources. If the client contacts the client ID in order to conserve resources. If the client contacts
the server after this release, the server must ensure the client the server after this release, the server must ensure the client
receives the appropriate error so that it will use the EXCHANGE_ID/ receives the appropriate error so that it will use the EXCHANGE_ID/
CREATE_SESSION sequence to establish a new client ID. The server CREATE_SESSION sequence to establish a new client ID. The server
ought to be very hesitant to release a client ID since the resulting ought to be very hesitant to release a client ID since the resulting
work on the client to recover from such an event will be the same work on the client to recover from such an event will be the same
burden as if the server had failed and restarted. Typically a server burden as if the server had failed and restarted. Typically a server
would not release a client ID unless there had been no activity from would not release a client ID unless there had been no activity from
that client for many minutes. As long as there are sessions, opens, that client for many minutes. As long as there are sessions, opens,
locks, delegations, layouts, or wants, the server MUST NOT release locks, delegations, layouts, or wants, the server MUST NOT release
the client ID. See Section 2.10.11.1.4 for discussion on releasing the client ID. See Section 2.10.12.1.4 for discussion on releasing
inactive sessions. inactive sessions.
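As an illustration of the MUST NOT above, a server might guard client ID release with a check like the following. The structure client_state_counts and the function may_release_client_id() are hypothetical names for this sketch; a real server would track this state in its own data structures and would additionally apply an inactivity timeout, which is not modeled here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-client-ID bookkeeping; field names are assumptions. */
struct client_state_counts {
    size_t sessions;
    size_t opens;
    size_t locks;
    size_t delegations;
    size_t layouts;
    size_t wants;
};

/*
 * Returns true only when the client ID holds none of the state types
 * that forbid release.  A true result means release is permitted, not
 * that it is advisable: the text recommends waiting for a long period
 * of inactivity before releasing.
 */
bool may_release_client_id(const struct client_state_counts *c)
{
    return c->sessions == 0 &&
           c->opens == 0 &&
           c->locks == 0 &&
           c->delegations == 0 &&
           c->layouts == 0 &&
           c->wants == 0;
}
```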
2.4.3. Resolving Client Owner Conflicts 2.4.3. Resolving Client Owner Conflicts
When the server gets an EXCHANGE_ID for a client owner that currently When the server gets an EXCHANGE_ID for a client owner that currently
has no state, or that has state, but the lease has expired, the has no state, or that has state, but the lease has expired, the
server MUST allow the EXCHANGE_ID, and confirm the new client ID if server MUST allow the EXCHANGE_ID, and confirm the new client ID if
followed by the appropriate CREATE_SESSION. followed by the appropriate CREATE_SESSION.
When the server gets an EXCHANGE_ID for a new incarnation of a client When the server gets an EXCHANGE_ID for a new incarnation of a client
skipping to change at page 29, line 15 skipping to change at page 29, line 15
o The principal that created the client ID for the client owner is o The principal that created the client ID for the client owner is
the same as the principal that is issuing the EXCHANGE_ID. Note the same as the principal that is issuing the EXCHANGE_ID. Note
that if the client ID was created with SP4_MACH_CRED state that if the client ID was created with SP4_MACH_CRED state
protection (Section 18.35), the principal MUST be based on protection (Section 18.35), the principal MUST be based on
RPCSEC_GSS authentication, the RPCSEC_GSS service used MUST be RPCSEC_GSS authentication, the RPCSEC_GSS service used MUST be
integrity or privacy, and the same GSS mechanism and principal integrity or privacy, and the same GSS mechanism and principal
must be used as that used when the client ID was created. must be used as that used when the client ID was created.
o The client ID was established with SP4_SSV protection o The client ID was established with SP4_SSV protection
(Section 18.35, Section 2.10.7.3) and the client sends the (Section 18.35, Section 2.10.8.3) and the client sends the
EXCHANGE_ID with the security flavor set to RPCSEC_GSS using the EXCHANGE_ID with the security flavor set to RPCSEC_GSS using the
GSS SSV mechanism (Section 2.10.8). GSS SSV mechanism (Section 2.10.9).
o The client ID was established with SP4_SSV protection, and under o The client ID was established with SP4_SSV protection, and under
the conditions described herein, the EXCHANGE_ID was sent with the conditions described herein, the EXCHANGE_ID was sent with
SP4_MACH_CRED state protection. Because the SSV might not persist SP4_MACH_CRED state protection. Because the SSV might not persist
across client and server restart, and because the first time a across client and server restart, and because the first time a
client sends EXCHANGE_ID to a server it does not have an SSV, the client sends EXCHANGE_ID to a server it does not have an SSV, the
client MAY send the subsequent EXCHANGE_ID without an SSV client MAY send the subsequent EXCHANGE_ID without an SSV
RPCSEC_GSS handle. Instead, as with SP4_MACH_CRED protection, the RPCSEC_GSS handle. Instead, as with SP4_MACH_CRED protection, the
principal MUST be based on RPCSEC_GSS authentication, the principal MUST be based on RPCSEC_GSS authentication, the
RPCSEC_GSS service used MUST be integrity or privacy, and the same RPCSEC_GSS service used MUST be integrity or privacy, and the same
skipping to change at page 29, line 41 skipping to change at page 29, line 41
If none of the above situations apply, the server MUST return If none of the above situations apply, the server MUST return
NFS4ERR_CLID_INUSE. NFS4ERR_CLID_INUSE.
If the server accepts the principal and co_ownerid as matching that If the server accepts the principal and co_ownerid as matching that
which created the client ID, and the co_verifier in the EXCHANGE_ID which created the client ID, and the co_verifier in the EXCHANGE_ID
differs from the co_verifier used when the client ID was created, differs from the co_verifier used when the client ID was created,
then after the server receives a CREATE_SESSION that confirms the then after the server receives a CREATE_SESSION that confirms the
client ID, the server deletes state. If the co_verifier values are client ID, the server deletes state. If the co_verifier values are
the same (e.g., the client is either updating properties of the the same (e.g., the client is either updating properties of the
client ID (Section 18.35), or the client is attempting trunking client ID (Section 18.35), or the client is attempting trunking
(Section 2.10.4)), the server MUST NOT delete state. (Section 2.10.5)), the server MUST NOT delete state.
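The outcome of an EXCHANGE_ID sent for a client owner that already has unexpired state can be summarized in code. The sketch below is a simplification under stated assumptions: principal_matches stands in for "at least one of the principal conditions listed above holds", and the enumeration and function names are hypothetical. NFS4_VERIFIER_SIZE is the verifier length defined by the protocol.

```c
#include <stdbool.h>
#include <string.h>

#define NFS4_VERIFIER_SIZE 8

/* Hypothetical labels for the three outcomes described in the text. */
enum exchange_id_outcome {
    EID_CLID_INUSE,        /* no principal condition applies: NFS4ERR_CLID_INUSE */
    EID_KEEP_STATE,        /* same co_verifier: update or trunking, state is kept */
    EID_DELETE_ON_CONFIRM  /* new co_verifier: state is deleted once a
                              CREATE_SESSION confirms the client ID */
};

enum exchange_id_outcome
classify_exchange_id(bool principal_matches,
                     const unsigned char old_verf[NFS4_VERIFIER_SIZE],
                     const unsigned char new_verf[NFS4_VERIFIER_SIZE])
{
    if (!principal_matches)
        return EID_CLID_INUSE;

    /* Equal verifiers indicate the same client incarnation. */
    if (memcmp(old_verf, new_verf, NFS4_VERIFIER_SIZE) == 0)
        return EID_KEEP_STATE;

    /* A different verifier indicates a new incarnation of the client. */
    return EID_DELETE_ON_CONFIRM;
}
```

Note that in the EID_DELETE_ON_CONFIRM case the deletion is deferred: the old state remains until the confirming CREATE_SESSION arrives.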
2.5. Server Owners 2.5. Server Owners
The Server Owner is similar to a Client Owner (Section 2.4), but The Server Owner is similar to a Client Owner (Section 2.4), but
unlike the Client Owner, there is no shorthand server ID. The Server unlike the Client Owner, there is no shorthand server ID. The Server
Owner is defined in the following data type: Owner is defined in the following data type:
struct server_owner4 { struct server_owner4 {
uint64_t so_minor_id; uint64_t so_minor_id;
opaque so_major_id<NFS4_OPAQUE_LIMIT>; opaque so_major_id<NFS4_OPAQUE_LIMIT>;
}; };
The Server Owner is returned from EXCHANGE_ID. When the so_major_id The Server Owner is returned from EXCHANGE_ID. When the so_major_id
fields are the same in two EXCHANGE_ID results, the connections each fields are the same in two EXCHANGE_ID results, the connections each
EXCHANGE_ID was sent over can be assumed to address the same Server EXCHANGE_ID was sent over can be assumed to address the same Server
(as defined in Section 1.5). If the so_minor_id fields are also the (as defined in Section 1.5). If the so_minor_id fields are also the
same, then not only do both connections connect to the same server, same, then not only do both connections connect to the same server,
but the session can be shared across both connections. The reader is but the session can be shared across both connections. The reader is
cautioned that multiple servers may deliberately or accidentally cautioned that multiple servers may deliberately or accidentally
claim to have the same so_major_id or so_major_id/so_minor_id; the claim to have the same so_major_id or so_major_id/so_minor_id; the
reader should examine Section 2.10.4 and Section 18.35 in order to reader should examine Section 2.10.5 and Section 18.35 in order to
avoid acting on falsely matching Server Owner values. avoid acting on falsely matching Server Owner values.
The considerations for generating a so_major_id are similar to those The considerations for generating a so_major_id are similar to those
for generating a co_ownerid string (see Section 2.4). The for generating a co_ownerid string (see Section 2.4). The
consequences of two servers generating conflicting so_major_id values consequences of two servers generating conflicting so_major_id values
are less dire than they are for co_ownerid conflicts because the are less dire than they are for co_ownerid conflicts because the
client can use RPCSEC_GSS to compare the authenticity of each server client can use RPCSEC_GSS to compare the authenticity of each server
(see Section 2.10.4). (see Section 2.10.5).
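As a rough sketch of the comparison described in this section, the following Python fragment models server_owner4 and the two tests a client might apply to a pair of EXCHANGE_ID results. The class and function names are illustrative assumptions. A true result is only a claim made by the servers; the text above cautions that such a claim has to be verified before the client acts on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerOwner:
    """Python stand-in for the XDR server_owner4 structure."""
    so_minor_id: int
    so_major_id: bytes


def claims_same_server(a: ServerOwner, b: ServerOwner) -> bool:
    # Equal so_major_id: both connections claim to reach the same server.
    return a.so_major_id == b.so_major_id


def claims_shareable_session(a: ServerOwner, b: ServerOwner) -> bool:
    # Equal so_major_id and so_minor_id: a session could be shared
    # across both connections, subject to verification.
    return claims_same_server(a, b) and a.so_minor_id == b.so_minor_id
```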
2.6. Security Service Negotiation 2.6. Security Service Negotiation
With the NFSv4.1 server potentially offering multiple security With the NFSv4.1 server potentially offering multiple security
mechanisms, the client needs a method to determine or negotiate which mechanisms, the client needs a method to determine or negotiate which
mechanism is to be used for its communication with the server. The mechanism is to be used for its communication with the server. The
NFS server may have multiple points within its file system namespace NFS server may have multiple points within its file system namespace
that are available for use by NFS clients. These points can be that are available for use by NFS clients. These points can be
considered security policy boundaries, and in some NFS considered security policy boundaries, and in some NFS
implementations are tied to NFS export points. In turn the NFS implementations are tied to NFS export points. In turn the NFS
skipping to change at page 40, line 22 skipping to change at page 40, line 22
In order to reduce congestion, if a connection-oriented transport is In order to reduce congestion, if a connection-oriented transport is
used, and the request is not the NULL procedure, used, and the request is not the NULL procedure,
o A requester MUST NOT retry a request unless the connection the o A requester MUST NOT retry a request unless the connection the
request was sent over was lost before the reply was received. request was sent over was lost before the reply was received.
o A replier MUST NOT silently drop a request, even if the request is o A replier MUST NOT silently drop a request, even if the request is
a retry. (The silent drop behavior of RPCSEC_GSS [4] does not a retry. (The silent drop behavior of RPCSEC_GSS [4] does not
apply because this behavior happens at the RPCSEC_GSS layer, a apply because this behavior happens at the RPCSEC_GSS layer, a
lower layer in the request processing). Instead, the replier lower layer in the request processing). Instead, the replier
SHOULD return an appropriate error (see Section 2.10.5.1) or it SHOULD return an appropriate error (see Section 2.10.6.1) or it
MAY disconnect the connection. MAY disconnect the connection.
When sending a reply, the replier MUST send the reply to the same When sending a reply, the replier MUST send the reply to the same
full network address (e.g. if using an IP-based transport, the source full network address (e.g. if using an IP-based transport, the source
port of the requester is part of the full network address) that the port of the requester is part of the full network address) that the
requester sent the request from. If using a connection-oriented requester sent the request from. If using a connection-oriented
transport, replies MUST be sent on the same connection the request transport, replies MUST be sent on the same connection the request
was received from. was received from.
If a connection is dropped after the replier receives the request but If a connection is dropped after the replier receives the request but
skipping to change at page 41, line 15 skipping to change at page 41, line 15
o RDMA credits present a new issue to the reply cache in NFSv4.1. o RDMA credits present a new issue to the reply cache in NFSv4.1.
The reply cache may be used when a connection within a session is The reply cache may be used when a connection within a session is
lost, such as after the client reconnects. Credit information is lost, such as after the client reconnects. Credit information is
a dynamic property of the RDMA connection, and stale values must a dynamic property of the RDMA connection, and stale values must
not be replayed from the cache. This implies that the reply cache not be replayed from the cache. This implies that the reply cache
contents must not be blindly used when replies are sent from it, contents must not be blindly used when replies are sent from it,
and credit information appropriate to the channel must be and credit information appropriate to the channel must be
refreshed by the RPC layer. refreshed by the RPC layer.
In addition, as described in Section 2.10.5.2, while a session is In addition, as described in Section 2.10.6.2, while a session is
active, the NFSv4.1 requester MUST NOT stop waiting for a reply. active, the NFSv4.1 requester MUST NOT stop waiting for a reply.
2.9.3. Ports 2.9.3. Ports
Historically, NFSv3 servers have listened over TCP port 2049. The Historically, NFSv3 servers have listened over TCP port 2049. The
registered port 2049 [24] for the NFS protocol should be the default registered port 2049 [24] for the NFS protocol should be the default
configuration. NFSv4.1 clients SHOULD NOT use the RPC binding configuration. NFSv4.1 clients SHOULD NOT use the RPC binding
protocols as described in [25]. protocols as described in [25].
2.10. Session 2.10. Session
skipping to change at page 41, line 51 skipping to change at page 41, line 51
o Requiring machine credentials for fully secure operation. o Requiring machine credentials for fully secure operation.
Through the introduction of a session, NFSv4.1 addresses the above Through the introduction of a session, NFSv4.1 addresses the above
shortfalls with practical solutions: shortfalls with practical solutions:
o EOS is enabled by a reply cache with a bounded size, making it o EOS is enabled by a reply cache with a bounded size, making it
feasible to keep the cache in persistent storage and enable EOS feasible to keep the cache in persistent storage and enable EOS
through server failure and recovery. One reason that previous through server failure and recovery. One reason that previous
revisions of NFS did not support EOS was because some EOS revisions of NFS did not support EOS was because some EOS
approaches often limited parallelism. As will be explained in approaches often limited parallelism. As will be explained in
Section 2.10.5, NFSv4.1 supports both EOS and unlimited Section 2.10.6, NFSv4.1 supports both EOS and unlimited
parallelism. parallelism.
o The NFSv4.1 client (defined in Section 1.5, Paragraph 2) creates o The NFSv4.1 client (defined in Section 1.5, Paragraph 2) creates
transport connections and provides them to the server to use for transport connections and provides them to the server to use for
sending callback requests, thus solving the firewall issue sending callback requests, thus solving the firewall issue
(Section 18.34). Races between responses from client requests, (Section 18.34). Races between responses from client requests,
and callbacks caused by the requests are detected via the and callbacks caused by the requests are detected via the
session's sequencing properties which are a consequence of EOS session's sequencing properties which are a consequence of EOS
(Section 2.10.5.3). (Section 2.10.6.3).
o The NFSv4.1 client can add an arbitrary number of connections to o The NFSv4.1 client can add an arbitrary number of connections to
the session, and thus provide trunking (Section 2.10.4). the session, and thus provide trunking (Section 2.10.5).
o The NFSv4.1 client and server produce a session key independent o The NFSv4.1 client and server produce a session key independent
of client and server machine credentials which can be used to of client and server machine credentials which can be used to
compute a digest for protecting critical session management compute a digest for protecting critical session management
operations (Section 2.10.7.3). operations (Section 2.10.8.3).
o The NFSv4.1 client can also create secure RPCSEC_GSS contexts for o The NFSv4.1 client can also create secure RPCSEC_GSS contexts for
use by the session's backchannel that do not require the server to use by the session's backchannel that do not require the server to
authenticate to a client machine principal (Section 2.10.7.2). authenticate to a client machine principal (Section 2.10.8.2).
A session is a dynamically created, long-lived server object created A session is a dynamically created, long-lived server object created
by a client, used over time from one or more transport connections. by a client, used over time from one or more transport connections.
Its function is to maintain the server's state relative to the Its function is to maintain the server's state relative to the
connection(s) belonging to a client instance. This state is entirely connection(s) belonging to a client instance. This state is entirely
independent of the connection itself, and indeed the state exists independent of the connection itself, and indeed the state exists
whether the connection exists or not. A client may have one or more whether the connection exists or not. A client may have one or more
sessions associated with it so that client-associated state may be sessions associated with it so that client-associated state may be
accessed using any of the sessions associated with that client's accessed using any of the sessions associated with that client's
client ID, when connections are associated with those sessions. When client ID, when connections are associated with those sessions. When
skipping to change at page 43, line 20 skipping to change at page 43, line 20
established session, with the exception of some session established session, with the exception of some session
administration operations, such as DESTROY_SESSION (Section 18.37). administration operations, such as DESTROY_SESSION (Section 18.37).
2.10.2.1. SEQUENCE and CB_SEQUENCE 2.10.2.1. SEQUENCE and CB_SEQUENCE
In NFSv4.1, when the SEQUENCE operation is present, it MUST be the In NFSv4.1, when the SEQUENCE operation is present, it MUST be the
first operation in the COMPOUND procedure. The primary purpose of first operation in the COMPOUND procedure. The primary purpose of
SEQUENCE is to carry the session identifier. The session identifier SEQUENCE is to carry the session identifier. The session identifier
associates all other operations in the COMPOUND procedure with a associates all other operations in the COMPOUND procedure with a
particular session. SEQUENCE also contains required information for particular session. SEQUENCE also contains required information for
maintaining EOS (see Section 2.10.5). Session-enabled NFSv4.1 maintaining EOS (see Section 2.10.6). Session-enabled NFSv4.1
COMPOUND requests thus have the form: COMPOUND requests thus have the form:
+-----+--------------+-----------+------------+-----------+---- +-----+--------------+-----------+------------+-----------+----
| tag | minorversion | numops |SEQUENCE op | op + args | ... | tag | minorversion | numops |SEQUENCE op | op + args | ...
| | (== 1) | (limited) | + args | | | | (== 1) | (limited) | + args | |
+-----+--------------+-----------+------------+-----------+---- +-----+--------------+-----------+------------+-----------+----
and the replys have the form: and the replies have the form:
+------------+-----+--------+-------------------------------+--// +------------+-----+--------+-------------------------------+--//
|last status | tag | numres |status + SEQUENCE op + results | // |last status | tag | numres |status + SEQUENCE op + results | //
+------------+-----+--------+-------------------------------+--// +------------+-----+--------+-------------------------------+--//
//-----------------------+---- //-----------------------+----
// status + op + results | ... // status + op + results | ...
//-----------------------+---- //-----------------------+----
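The placement rule for SEQUENCE can be checked with a one-line test over the list of operation names in a COMPOUND. The sketch below is in Python and uses a hypothetical function name; it models operations as plain strings rather than decoded XDR, and it checks only the ordering rule stated above, not which operations may appear without SEQUENCE.

```python
from typing import Sequence


def sequence_placement_ok(op_names: Sequence[str]) -> bool:
    """Return True if SEQUENCE, when present, is the first operation.

    op_names is an assumed representation: the operation names of one
    COMPOUND request, in order.
    """
    # SEQUENCE must not appear anywhere after the first position.
    return "SEQUENCE" not in list(op_names)[1:]
```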
A CB_COMPOUND procedure request and reply has a similar form to A CB_COMPOUND procedure request and reply has a similar form to
COMPOUND, but instead of a SEQUENCE operation, there is a CB_SEQUENCE COMPOUND, but instead of a SEQUENCE operation, there is a CB_SEQUENCE
operation. CB_COMPOUND also has an additional field called operation. CB_COMPOUND also has an additional field called
"callback_ident", which is superfluous in NFSv4.1 and MUST be ignored "callback_ident", which is superfluous in NFSv4.1 and MUST be ignored
by the client. CB_SEQUENCE has the same information as SEQUENCE, and by the client. CB_SEQUENCE has the same information as SEQUENCE, and
also includes other information needed to resolve callback races also includes other information needed to resolve callback races
(Section 2.10.5.3). (Section 2.10.6.3).
2.10.2.2. Client ID and Session Association 2.10.2.2. Client ID and Session Association
Each client ID (Section 2.4) can have zero or more active sessions. Each client ID (Section 2.4) can have zero or more active sessions.
A client ID and associated session are required to perform file A client ID and associated session are required to perform file
access in NFSv4.1. Each time a session is used (whether by a client access in NFSv4.1. Each time a session is used (whether by a client
sending a request to the server, or the client replying to a callback sending a request to the server, or the client replying to a callback
request from the server), the state leased to its associated client request from the server), the state leased to its associated client
ID is automatically renewed. ID is automatically renewed.
State such as share reservations, locks, delegations, and layouts State such as share reservations, locks, delegations, and layouts
(Section 1.6.4) is tied to the client ID. Client state is not tied (Section 1.6.4) is tied to the client ID. Client state is not tied
to any individual session. Successive state changing operations from to any individual session. Successive state changing operations from
a given state owner MAY go over different sessions, provided the a given state owner MAY go over different sessions, provided the
session is associated with the same client ID. A callback MAY arrive session is associated with the same client ID. A callback MAY arrive
over a different session than the session that originally over a different session than the session that originally
acquired the state pertaining to the callback. For example, if acquired the state pertaining to the callback. For example, if
session A is used to acquire a delegation, a request to recall the session A is used to acquire a delegation, a request to recall the
delegation MAY arrive over session B if both sessions are associated delegation MAY arrive over session B if both sessions are associated
with the same client ID. Section 2.10.7.1 and Section 2.10.7.2 with the same client ID. Section 2.10.8.1 and Section 2.10.8.2
discuss the security considerations around callbacks. discuss the security considerations around callbacks.
2.10.3. Channels 2.10.3. Channels
A channel is not a connection. A channel represents the direction A channel is not a connection. A channel represents the direction
ONC RPC requests are sent. ONC RPC requests are sent.
Each session has one or two channels: the fore channel and the Each session has one or two channels: the fore channel and the
backchannel. Because there are at most two channels per session, and backchannel. Because there are at most two channels per session, and
because each channel has a distinct purpose, channels are not because each channel has a distinct purpose, channels are not
skipping to change at page 44, line 39 skipping to change at page 44, line 39
server, and carries COMPOUND requests and responses. A session server, and carries COMPOUND requests and responses. A session
always has a fore channel. always has a fore channel.
The backchannel is used for callback requests from server to client, and The backchannel is used for callback requests from server to client, and
carries CB_COMPOUND requests and responses. Whether there is a carries CB_COMPOUND requests and responses. Whether there is a
backchannel or not is a decision by the client, however many features backchannel or not is a decision by the client, however many features
of NFSv4.1 require a backchannel. NFSv4.1 servers MUST support of NFSv4.1 require a backchannel. NFSv4.1 servers MUST support
backchannels. backchannels.
Each session has resources for each channel, including separate reply Each session has resources for each channel, including separate reply
caches (see Section 2.10.5.1). Note that even the backchannel caches (see Section 2.10.6.1). Note that even the backchannel
requires a reply cache because some callback operations are requires a reply cache because some callback operations are
nonidempotent. nonidempotent.
2.10.3.1. Association of Connections, Channels, and Sessions 2.10.3.1. Association of Connections, Channels, and Sessions
Each channel is associated with zero or more transport connections Each channel is associated with zero or more transport connections
(whether of the same transport protocol or different transport (whether of the same transport protocol or different transport
protocols). A connection can be associated with one channel or both protocols). A connection can be associated with one channel or both
channels of a session; the client and server negotiate whether a channels of a session; the client and server negotiate whether a
connection will carry traffic for one channel or both channels via connection will carry traffic for one channel or both channels via
skipping to change at page 45, line 22 skipping to change at page 45, line 22
A connection's association with a session is not exclusive. A A connection's association with a session is not exclusive. A
connection associated with the channel(s) of one session may be connection associated with the channel(s) of one session may be
simultaneously associated with the channel(s) of other sessions simultaneously associated with the channel(s) of other sessions
including sessions associated with other client IDs. including sessions associated with other client IDs.
It is permissible for connections of multiple transport types to be It is permissible for connections of multiple transport types to be
associated with the same channel. For example both a TCP and RDMA associated with the same channel. For example both a TCP and RDMA
connection can be associated with the fore channel. In the event an connection can be associated with the fore channel. In the event an
RDMA and non-RDMA connection are associated with the same channel, RDMA and non-RDMA connection are associated with the same channel,
the maximum number of slots SHOULD be at least one more than the the maximum number of slots SHOULD be at least one more than the
total number of RDMA credits (Section 2.10.5.1). This way if all RDMA total number of RDMA credits (Section 2.10.6.1). This way if all RDMA
credits are used, the non-RDMA connection can have at least one credits are used, the non-RDMA connection can have at least one
outstanding request. If a server supports multiple transport types, outstanding request. If a server supports multiple transport types,
it MUST allow a client to associate connections from each transport it MUST allow a client to associate connections from each transport
to a channel. to a channel.
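The slot recommendation for a channel that mixes RDMA and non-RDMA connections is simple arithmetic. The following Python sketch uses hypothetical helper names to show the recommended minimum and a check against it.

```python
def min_recommended_slots(total_rdma_credits: int) -> int:
    """Smallest slot count that leaves one slot for a non-RDMA request."""
    return total_rdma_credits + 1


def leaves_room_for_non_rdma(max_slots: int, total_rdma_credits: int) -> bool:
    # True when the non-RDMA connection can still have at least one
    # outstanding request after every RDMA credit is in use.
    return max_slots >= min_recommended_slots(total_rdma_credits)
```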
It is permissible for a connection of one type of transport to be It is permissible for a connection of one type of transport to be
associated with the fore channel, and a connection of a different associated with the fore channel, and a connection of a different
type to be associated with the backchannel. type to be associated with the backchannel.
2.10.4. Trunking 2.10.4. Server Scope
Servers each specify a server scope value in the form of an opaque
string eir_server_scope returned as part of the results of an
EXCHANGE_ID operation. The purpose of the server scope is to allow
groups of servers to indicate to clients that a set of servers
sharing the same server scope value have arranged to use compatible
values of otherwise opaque identifiers so that the identifiers
generated by one server of that set may be presented to another of
that same scope.
The use of such compatible values does not imply that a value
generated by one server will always be accepted by another. In most
cases, it will not. However, a server will not accept a value
generated by another inadvertently. When it does accept it, it will
be because it is recognized as valid and carrying the same meaning as
on another server of the same scope.
When servers are of the same server scope, this compatibility of
values applies to the following identifiers:
o Filehandle values. A filehandle value accepted by two servers of
the same server scope denotes the same object. A write done to
one server is reflected immediately in a read done to the other
and locks obtained on one server conflict with those requested on
the other.
o Session ID values. A session ID value accepted by two servers of
the same server scope denotes the same session.
o Client ID values. A client ID value accepted as valid by two
servers of the same server scope is associated with two clients
with the same client owner and verifier.
o State ID values when the corresponding client ID is recognized as
valid. If the same stateid value is accepted as valid on two
servers of the same scope and the client IDs on the two servers
represent the same client owner and verifier, then the two state
ID values designate the same set of locks and are for the same
file.
o Server owner values. When the server scope values are the same,
server owner values may be validly compared. In cases where the
server scopes are different, server owner values are treated as
different even if they contain all identical bytes.
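The last item in the list above, on comparing server owner values, can be sketched as follows. This is a minimal Python illustration with assumed names; a server owner is modeled as a (so_minor_id, so_major_id) tuple and a server scope as a byte string.

```python
from typing import Tuple

ServerOwnerValue = Tuple[int, bytes]  # (so_minor_id, so_major_id), assumed model


def server_owners_match(scope_a: bytes, owner_a: ServerOwnerValue,
                        scope_b: bytes, owner_b: ServerOwnerValue) -> bool:
    """Compare server owner values only within a common server scope."""
    if scope_a != scope_b:
        # Different scopes: treated as different even if every byte of
        # the server owner values is identical.
        return False
    return owner_a == owner_b
```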
The co-ordination among servers required to provide such
compatibility can be quite minimal, and limited to a simple partition
of the ID space. The recognition of common values requires
additional implementation, but this can be tailored to the specific
situations in which that recognition is desired.
Clients will have occasion to compare the server scope values of
multiple servers under a number of circumstances, each of which will
be discussed under the appropriate functional section.
o When server owner values received in response to EXCHANGE_ID
operations issued to multiple network addresses are compared for
the purpose of determining the validity of various forms of
trunking, as described in Section 2.10.5.
o When network or server reconfiguration causes the same network
address to possibly be directed to different servers, with the
necessity for the client to determine when lock reclaim should be
attempted, as described in Section 8.4.2.1.
o When file system migration causes the transfer of responsibility
for a file system between servers and the client needs to
determine whether state has been transferred with the file system
and whether a client may reclaim it on a similar basis as in the
case of server reboot.
When two replies from EXCHANGE_ID each from two different server
network addresses have the same server scope, there are a number of
ways a client can validate that the common server scope is the result
of benign intent.
o If both EXCHANGE_ID requests were sent with RPCSEC_GSS
authentication and the server principal is the same for both
targets, the equality of server scope is validated.
o A second option for verification is to use SP4_SSV protection in a
fashion similar to the verification of client ID trunking. When
the client sends EXCHANGE_ID it specifies SP4_SSV protection. The
first EXCHANGE_ID the client sends always has to be confirmed by a
CREATE_SESSION call. The client then sends SET_SSV. Later the
client sends EXCHANGE_ID to a second destination network address
different from the one the first EXCHANGE_ID was sent to. The
client checks that each EXCHANGE_ID reply has the same
eir_server_scope. If so, the client verifies the claim by issuing
a CREATE_SESSION to the second destination address, protected with
RPCSEC_GSS integrity using an RPCSEC_GSS handle returned by the
second EXCHANGE_ID. If the server accepts the CREATE_SESSION
request, and if the client verifies the RPCSEC_GSS verifier and
integrity codes, then the client has proof the second server knows
the SSV, and thus the two servers are co-operating for the
purposes of maintaining compatible ID spaces as indicated by a
common server scope.
o If neither of the two methods provides verification, the client
may accept the appearance of the second server in fs_locations or
fs_locations_info attribute for a relevant file system. For
example, if there is a migration event for a particular
file system or there are locks to be reclaimed on a particular
file system, the attributes for that particular file system may be
used. The client sends the GETATTR request to the first server
for the fs_locations or fs_locations_info attribute with
RPCSEC_GSS authentication. It may need to do this in advance of
the need to verify the common server scope. If the client
successfully authenticates the reply to GETATTR, and the GETATTR
request and reply containing the fs_locations or fs_locations_info
attribute refers to the second server, then the equality of server
scope is supported. A client may choose to limit the use of this
form of support to information relevant to the specific file
system involved (e.g. a file system being migrated).
2.10.5. Trunking
Trunking is the use of multiple connections between a client and Trunking is the use of multiple connections between a client and
server in order to increase the speed of data transfer. NFSv4.1 server in order to increase the speed of data transfer. NFSv4.1
supports two types of trunking: session trunking and client ID supports two types of trunking: session trunking and client ID
trunking. NFSv4.1 repliers and requesters MUST support session trunking.
trunking. NFSv4.1 servers MAY support client ID trunking. NFSv4.1
clients MUST support client ID trunking. NFSv4.1 servers MUST support both forms of trunking within the
context of a single server network address and MUST support both
forms within the context of the set of network addresses used to
access a single server. NFSv4.1 servers in a clustered configuration
MAY allow network addresses for different servers to use client ID
trunking.
Clients may use either form of trunking as long as they do not, when
trunking between different server network addresses, violate the
servers' mandates as to the kinds of trunking to be allowed (see
below). With regard to callback channels, the client MUST allow the
server to choose among all callback channels valid for a given client
ID and MUST support trunking when the connections supporting the
backchannel allow session or client ID trunking to be used for
callbacks.
Session trunking is essentially the association of multiple Session trunking is essentially the association of multiple
connections, each with potentially different target and/or source connections, each with potentially different target and/or source
network addresses, to the same session. network addresses, to the same session. When the target network
addresses (server addresses) of the two connections are the same, the
server MUST support such session trunking. When the target network
addresses are different, the server MAY indicate such support using
the data returned by the EXCHANGE_ID operation (see below).
Client ID trunking is the association of multiple sessions to the Client ID trunking is the association of multiple sessions to the
same client ID, major server owner ID (Section 2.5), and server scope same client ID. Servers MUST support client ID trunking for two
(Section 11.7.7). When two servers return the same major server target network addresses whenever they allow session trunking for
owner and server scope it means the two servers are cooperating on those same two network addresses. In addition, a server MAY, by
presenting the same major server owner ID (Section 2.5), and server
scope (Section 11.7.7) allow an additional case of client ID
trunking. When two servers return the same major server owner and
server scope, it means that the two servers are cooperating on
locking state management which is a prerequisite for client ID locking state management which is a prerequisite for client ID
trunking. trunking.
Understanding and distinguishing session and client ID trunking Understanding and distinguishing when the client is allowed to use
requires understanding how the results of the EXCHANGE_ID session and client ID trunking requires understanding how the results
(Section 18.35) operation identify a server. Suppose a client sends of the EXCHANGE_ID (Section 18.35) operation identify a server.
EXCHANGE_ID over two different connections each with a possibly Suppose a client sends EXCHANGE_ID over two different connections
different target network address but each EXCHANGE_ID with the same each with a possibly different target network address but each
value in the eia_clientowner field. If the same NFSv4.1 server is EXCHANGE_ID operation has the same value in the eia_clientowner
listening over each connection, then each EXCHANGE_ID result MUST field. If the same NFSv4.1 server is listening over each connection,
return the same values of eir_clientid, eir_server_owner.so_major_id then each EXCHANGE_ID result MUST return the same values of
and eir_server_scope. The client can then treat each connection as eir_clientid, eir_server_owner.so_major_id and eir_server_scope. The
referring to the same server (subject to verification, see client can then treat each connection as referring to the same server
Paragraph 5 later in this section), and it can use each connection to (subject to verification, see Paragraph 8 later in this section), and
trunk requests and replies. The question is whether session trunking it can use each connection to trunk requests and replies. The
and/or client ID trunking applies. client's choice is whether session trunking or client ID trunking
applies.
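The matching rules that the definitions following this paragraph spell out can be summarized in a short classification function. The Python sketch below is an illustration only: the type and function names are assumptions, the result strings are hypothetical, and verification of the servers' claims is out of scope. A result of "session" means session trunking is permitted; the client may still choose to create a separate session instead, which yields client ID trunking.

```python
from typing import NamedTuple


class ExchangeIdResult(NamedTuple):
    """Subset of EXCHANGE_ID results used for the trunking decision."""
    eir_clientid: int
    so_major_id: bytes
    so_minor_id: int
    eir_server_scope: bytes


def permitted_trunking(clientowner_1: bytes, res_1: ExchangeIdResult,
                       clientowner_2: bytes, res_2: ExchangeIdResult) -> str:
    """Classify two EXCHANGE_ID exchanges as 'session', 'client_id' or 'none'."""
    if clientowner_1 != clientowner_2:
        return "none"  # eia_clientowner must be the same in both requests
    common_1 = (res_1.eir_clientid, res_1.so_major_id, res_1.eir_server_scope)
    common_2 = (res_2.eir_clientid, res_2.so_major_id, res_2.eir_server_scope)
    if common_1 != common_2:
        return "none"
    if res_1.so_minor_id == res_2.so_minor_id:
        return "session"    # all four result fields match
    return "client_id"      # only so_minor_id differs
```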
Session Trunking If the eia_clientowner argument is the same in two Session Trunking If the eia_clientowner argument is the same in two
different EXCHANGE_ID requests, and the eir_clientid, different EXCHANGE_ID requests, and the eir_clientid,
eir_server_owner.so_major_id, eir_server_owner.so_minor_id, and eir_server_owner.so_major_id, eir_server_owner.so_minor_id, and
eir_server_scope results match in both EXCHANGE_ID results, then eir_server_scope results match in both EXCHANGE_ID results, then
the client is permitted to perform session trunking. If the the client is permitted to perform session trunking. If the
client has no session mapping to the tuple of eir_clientid, client has no session mapping to the tuple of eir_clientid,
eir_server_owner.so_major_id, eir_server_scope, eir_server_owner.so_major_id, eir_server_scope,
eir_server_owner.so_minor_id, then it creates the session via a eir_server_owner.so_minor_id, then it creates the session via a
CREATE_SESSION operation over one of the connections, which CREATE_SESSION operation over one of the connections, which
associates the connection to the session. If there is a session associates the connection to the session. If there is a session
for the tuple, the client can send BIND_CONN_TO_SESSION to for the tuple, the client can send BIND_CONN_TO_SESSION to
associate the connection to the session. (Of course, if the associate the connection to the session.
client does not want to use session trunking, it can invoke
CREATE_SESSION on the connection. This will result in client ID Of course, if the client does not desire to use session trunking,
trunking as described below.) it is not required to do so. It can invoke CREATE_SESSION on the
connection. This will result in client ID trunking as described
below. It can also decide to drop the connection if it does not
choose to use trunking.
Client ID Trunking If the eia_clientowner argument is the same in Client ID Trunking If the eia_clientowner argument is the same in
two different EXCHANGE_ID requests, and the eir_clientid, two different EXCHANGE_ID requests, and the eir_clientid,
eir_server_owner.so_major_id, and eir_server_scope results match eir_server_owner.so_major_id, and eir_server_scope results match
in both EXCHANGE_ID results, but the eir_server_owner.so_minor_id in both EXCHANGE_ID results, but the eir_server_owner.so_minor_id
results do not match then the client is permitted to perform results do not match then the client is permitted to perform
client ID trunking. The client can associate each connection with client ID trunking. The client can associate each connection with
different sessions, where each session is associated with the same different sessions, where each session is associated with the same
server. server.
Of course, even if the eir_server_owner.so_minor_id fields do Of course, even if the eir_server_owner.so_minor_id fields do
match, the client is free to employ client ID trunking instead of match, the client is free to employ client ID trunking instead of
session trunking. session trunking.
The client completes the act of client ID trunking by invoking The client completes the act of client ID trunking by invoking
CREATE_SESSION on each connection, using the same client ID that CREATE_SESSION on each connection, using the same client ID that
was returned in eir_clientid. These invocations create two was returned in eir_clientid. These invocations create two
sessions and also associate each connection with each session. sessions and also associate each connection with its respective
session. The client is free to choose not to use client ID
trunking by simply dropping the connection at this point.
When doing client ID trunking, locking state is shared across When doing client ID trunking, locking state is shared across
sessions associated with the same client ID. This requires the sessions associated with that same client ID. This requires the
server to coordinate state across sessions. server to coordinate state across sessions.
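The trunking rules above reduce to a comparison of two EXCHANGE_ID results. The following is a minimal sketch of that comparison, not a definitive implementation. It assumes both requests carried the same eia_clientowner. The names ExchangeIdResult, Trunking, and permitted_trunking are hypothetical, as are the Python types chosen for each field.

```python
from dataclasses import dataclass
from enum import Enum


class Trunking(Enum):
    NONE = "none"
    CLIENT_ID = "client ID trunking"
    SESSION = "session trunking"


@dataclass(frozen=True)
class ExchangeIdResult:
    # Field names follow the EXCHANGE_ID results named in the text;
    # the Python types are assumptions made for illustration.
    eir_clientid: int
    so_major_id: bytes
    so_minor_id: int
    eir_server_scope: bytes


def permitted_trunking(a, b):
    """Return the strongest form of trunking the two results permit.

    Assumes both EXCHANGE_ID requests used the same eia_clientowner.
    """
    same_server = (
        a.eir_clientid == b.eir_clientid
        and a.so_major_id == b.so_major_id
        and a.eir_server_scope == b.eir_server_scope
    )
    if not same_server:
        return Trunking.NONE
    if a.so_minor_id == b.so_minor_id:
        # All four values match: session trunking is permitted.
        return Trunking.SESSION
    # Only so_minor_id differs: client ID trunking is permitted.
    return Trunking.CLIENT_ID
```

A result of Trunking.SESSION only states what is permitted. The client remains free to use client ID trunking instead, or to drop the connection, and the comparison does not replace verification of the servers' claims.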
The client should be prepared for the possibility that
eir_server_owner values may be different on subsequent EXCHANGE_ID
requests made to the same network address, as a result of various
sorts of reconfiguration events. When this happens and the changes
result in the invalidation of previously valid forms of trunking, the
client should cease to use those forms, either by dropping
connections or by adding sessions. For a discussion of lock reclaim
as it relates to such reconfiguration events, see Section 8.4.2.1.
When two servers over two connections claim matching or partially When two servers over two connections claim matching or partially
matching eir_server_owner, eir_server_scope, and eir_clientid values, matching eir_server_owner, eir_server_scope, and eir_clientid values,
the client does not have to trust the servers' claims. The client the client does not have to trust the servers' claims. The client
may verify these claims before trunking traffic in the following may verify these claims before trunking traffic in the following
ways: ways:
o For session trunking, clients SHOULD reliably verify if o For session trunking, clients SHOULD reliably verify if
connections between different network paths are in fact associated connections between different network paths are in fact associated
with the same NFSv4.1 server and usable on the same session, and with the same NFSv4.1 server and usable on the same session, and
servers MUST allow clients to perform reliable verification. When servers MUST allow clients to perform reliable verification. When
skipping to change at page 47, line 35 skipping to change at page 50, line 41
SP4_SSV, reliable verification depends on a shared secret (the SP4_SSV, reliable verification depends on a shared secret (the
SSV) that is established via the SET_SSV (Section 18.47) SSV) that is established via the SET_SSV (Section 18.47)
operation. operation.
When a new connection is associated with the session (via the When a new connection is associated with the session (via the
BIND_CONN_TO_SESSION operation, see Section 18.34), if the client BIND_CONN_TO_SESSION operation, see Section 18.34), if the client
specified SP4_SSV state protection for the BIND_CONN_TO_SESSION specified SP4_SSV state protection for the BIND_CONN_TO_SESSION
operation, the client MUST send the BIND_CONN_TO_SESSION with operation, the client MUST send the BIND_CONN_TO_SESSION with
RPCSEC_GSS protection, using integrity or privacy, and an RPCSEC_GSS protection, using integrity or privacy, and an
RPCSEC_GSS handle created with the GSS SSV mechanism RPCSEC_GSS handle created with the GSS SSV mechanism
(Section 2.10.8). (Section 2.10.9).
If the client mistakenly tries to associate a connection to a If the client mistakenly tries to associate a connection to a
session of a wrong server, the server will either reject the session of a wrong server, the server will either reject the
attempt because it is not aware of the session identifier of the attempt because it is not aware of the session identifier of the
BIND_CONN_TO_SESSION arguments, or it will reject the attempt BIND_CONN_TO_SESSION arguments, or it will reject the attempt
because the RPCSEC_GSS authentication fails. Even if the server because the RPCSEC_GSS authentication fails. Even if the server
mistakenly or maliciously accepts the connection association mistakenly or maliciously accepts the connection association
attempt, the RPCSEC_GSS verifier it computes in the response will attempt, the RPCSEC_GSS verifier it computes in the response will
not be verified by the client, so the client will know it cannot not be verified by the client, so the client will know it cannot
use the connection for trunking the specified session. use the connection for trunking the specified session.
skipping to change at page 48, line 22 skipping to change at page 51, line 27
authentication, the client notes the principal name of the GSS authentication, the client notes the principal name of the GSS
target. If the EXCHANGE_ID results indicate client ID trunking is target. If the EXCHANGE_ID results indicate client ID trunking is
possible, and the GSS targets' principal names are the same, the possible, and the GSS targets' principal names are the same, the
servers are the same and client ID trunking is allowed. servers are the same and client ID trunking is allowed.
The second option for verification is to use SP4_SSV protection. The second option for verification is to use SP4_SSV protection.
When the client sends EXCHANGE_ID it specifies SP4_SSV protection. When the client sends EXCHANGE_ID it specifies SP4_SSV protection.
The first EXCHANGE_ID the client sends always has to be confirmed The first EXCHANGE_ID the client sends always has to be confirmed
by a CREATE_SESSION call. The client then sends SET_SSV. Later by a CREATE_SESSION call. The client then sends SET_SSV. Later
the client sends EXCHANGE_ID to a second destination network the client sends EXCHANGE_ID to a second destination network
address than the first EXCHANGE_ID was sent with. The client address different from the one the first EXCHANGE_ID was sent to.
checks that each EXCHANGE_ID reply has the same eir_clientid, The client checks that each EXCHANGE_ID reply has the same
eir_server_owner.so_major_id, and eir_server_scope. If so, the eir_clientid, eir_server_owner.so_major_id, and eir_server_scope.
client verifies the claim by issuing a CREATE_SESSION to the If so, the client verifies the claim by issuing a CREATE_SESSION
second destination address, protected with RPCSEC_GSS integrity to the second destination address, protected with RPCSEC_GSS
using an RPCSEC_GSS handle returned by the second EXCHANGE_ID. If integrity using an RPCSEC_GSS handle returned by the second
the server accepts the CREATE_SESSION request, and if the client EXCHANGE_ID. If the server accepts the CREATE_SESSION request,
verifies the RPCSEC_GSS verifier and integrity codes, then the and if the client verifies the RPCSEC_GSS verifier and integrity
client has proof the second server knows the SSV, and thus the two codes, then the client has proof the second server knows the SSV,
servers are the same for the purposes of client ID trunking. and thus the two servers are co-operating for the purposes of
specifying server scope and client ID trunking.
2.10.5. Exactly Once Semantics 2.10.6. Exactly Once Semantics
Via the session, NFSv4.1 offers Exactly Once Semantics (EOS) for Via the session, NFSv4.1 offers Exactly Once Semantics (EOS) for
requests sent over a channel. EOS is supported on both the fore and requests sent over a channel. EOS is supported on both the fore and
back channels. back channels.
Each COMPOUND or CB_COMPOUND request that is sent with a leading Each COMPOUND or CB_COMPOUND request that is sent with a leading
SEQUENCE or CB_SEQUENCE operation MUST be executed by the receiver SEQUENCE or CB_SEQUENCE operation MUST be executed by the receiver
exactly once. This requirement holds regardless of whether the exactly once. This requirement holds regardless of whether the
request is sent with reply caching specified (see request is sent with reply caching specified (see
Section 2.10.5.1.3). The requirement holds even if the requester is Section 2.10.6.1.3). The requirement holds even if the requester is
issuing the request over a session created between a pNFS data client issuing the request over a session created between a pNFS data client
and pNFS data server. To understand the rationale for this and pNFS data server. To understand the rationale for this
requirement, divide the requests into three classifications: requirement, divide the requests into three classifications:
o Nonidempotent requests. o Nonidempotent requests.
o Idempotent modifying requests. o Idempotent modifying requests.
o Idempotent non-modifying requests. o Idempotent non-modifying requests.
skipping to change at page 49, line 46 skipping to change at page 52, line 49
execution of such a request will not cause data corruption, or execution of such a request will not cause data corruption, or
produce an incorrect result. Nonetheless, to keep the implementation produce an incorrect result. Nonetheless, to keep the implementation
simple, the replier MUST enforce EOS for all requests whether simple, the replier MUST enforce EOS for all requests whether
idempotent and non-modifying or not. idempotent and non-modifying or not.
Note that true and complete EOS is not possible unless the server Note that true and complete EOS is not possible unless the server
persists the reply cache in stable storage, unless the server is persists the reply cache in stable storage, unless the server is
somehow implemented to never require a restart (indeed if such a somehow implemented to never require a restart (indeed if such a
server exists, the distinction between a reply cache kept in stable server exists, the distinction between a reply cache kept in stable
storage versus one that is not is one without meaning). See storage versus one that is not is one without meaning). See
Section 2.10.5.5 for a discussion of persistence in the reply cache. Section 2.10.6.5 for a discussion of persistence in the reply cache.
Regardless, even if the server does not persist the reply cache, EOS Regardless, even if the server does not persist the reply cache, EOS
improves robustness and correctness over previous versions of NFS improves robustness and correctness over previous versions of NFS
because the legacy duplicate request/reply caches were based on the because the legacy duplicate request/reply caches were based on the
ONC RPC transaction identifier (XID). Section 2.10.5.1 explains the ONC RPC transaction identifier (XID). Section 2.10.6.1 explains the
shortcomings of the XID as a basis for a reply cache and describes shortcomings of the XID as a basis for a reply cache and describes
how NFSv4.1 sessions improve upon the XID. how NFSv4.1 sessions improve upon the XID.
2.10.5.1. Slot Identifiers and Reply Cache 2.10.6.1. Slot Identifiers and Reply Cache
The RPC layer provides a transaction ID (XID), which, while required The RPC layer provides a transaction ID (XID), which, while required
to be unique, is not convenient for tracking requests for two to be unique, is not convenient for tracking requests for two
reasons. First, the XID is only meaningful to the requester; it reasons. First, the XID is only meaningful to the requester; it
cannot be interpreted by the replier except to test for equality with cannot be interpreted by the replier except to test for equality with
previously sent requests. When consulting an RPC-based duplicate previously sent requests. When consulting an RPC-based duplicate
request cache, the opaqueness of the XID requires a computationally request cache, the opaqueness of the XID requires a computationally
expensive lookup (often via a hash that includes XID and source expensive lookup (often via a hash that includes XID and source
address). NFSv4.1 requests use a non-opaque slot ID which is an address). NFSv4.1 requests use a non-opaque slot ID which is an
index into a slot table, which is far more efficient. Second, index into a slot table, which is far more efficient. Second,
skipping to change at page 51, line 20 skipping to change at page 54, line 26
request is: request is:
o A new request, in which the sequence ID is one greater than that o A new request, in which the sequence ID is one greater than that
previously seen in the slot (accounting for sequence wraparound). previously seen in the slot (accounting for sequence wraparound).
The replier proceeds to execute the new request, and the replier The replier proceeds to execute the new request, and the replier
MUST increase the slot's sequence ID by one. MUST increase the slot's sequence ID by one.
o A retransmitted request, in which the sequence ID is equal to that o A retransmitted request, in which the sequence ID is equal to that
currently recorded in the slot. If the original request has currently recorded in the slot. If the original request has
executed to completion, the replier returns the cached reply. See executed to completion, the replier returns the cached reply. See
Section 2.10.5.2 for direction on how the replier deals with Section 2.10.6.2 for direction on how the replier deals with
retries of requests that are still in progress. retries of requests that are still in progress.
o A misordered retry, in which the sequence ID is less than o A misordered retry, in which the sequence ID is less than
(accounting for sequence wraparound) that previously seen in the (accounting for sequence wraparound) that previously seen in the
slot. The replier MUST return NFS4ERR_SEQ_MISORDERED (as the slot. The replier MUST return NFS4ERR_SEQ_MISORDERED (as the
result from SEQUENCE or CB_SEQUENCE). result from SEQUENCE or CB_SEQUENCE).
o A misordered new request, in which the sequence ID is two or more o A misordered new request, in which the sequence ID is two or more
than (accounting for sequence wraparound) that previously than (accounting for sequence wraparound) that previously
seen in the slot. Note that because the sequence ID must seen in the slot. Note that because the sequence ID must
skipping to change at page 54, line 23 skipping to change at page 57, line 27
because the request may have been sent from the requester before because the request may have been sent from the requester before
the update was received. Therefore, in the downward adjustment the update was received. Therefore, in the downward adjustment
case, the replier may have to retain a number of reply cache case, the replier may have to retain a number of reply cache
entries at least as large as the old value of maximum requests entries at least as large as the old value of maximum requests
outstanding, until it can infer that the requester has seen a outstanding, until it can infer that the requester has seen a
reply containing the new granted highest_slotid. The replier can reply containing the new granted highest_slotid. The replier can
infer that the requester has seen such a reply when it receives a new infer that the requester has seen such a reply when it receives a new
request with the same slot ID as the request replied to and the request with the same slot ID as the request replied to and the
next higher sequence ID. next higher sequence ID.
2.10.5.1.1. Caching of SEQUENCE and CB_SEQUENCE Replies 2.10.6.1.1. Caching of SEQUENCE and CB_SEQUENCE Replies
When a SEQUENCE or CB_SEQUENCE operation is successfully executed, When a SEQUENCE or CB_SEQUENCE operation is successfully executed,
its reply MUST always be cached. Specifically, session ID, sequence its reply MUST always be cached. Specifically, session ID, sequence
ID, and slot ID MUST be cached in the reply cache. The reply from ID, and slot ID MUST be cached in the reply cache. The reply from
SEQUENCE also includes the highest slot ID, target highest slot ID, SEQUENCE also includes the highest slot ID, target highest slot ID,
and status flags. Instead of caching these values, the server MAY and status flags. Instead of caching these values, the server MAY
re-compute the values from the current state of the fore channel, re-compute the values from the current state of the fore channel,
session and/or client ID as appropriate. Similarly, the reply from session and/or client ID as appropriate. Similarly, the reply from
CB_SEQUENCE includes a highest slot ID and target highest slot ID. CB_SEQUENCE includes a highest slot ID and target highest slot ID.
The client MAY re-compute the values from the current state of the The client MAY re-compute the values from the current state of the
skipping to change at page 55, line 5 skipping to change at page 58, line 8
response to the retry, or is a delayed response to the original response to the retry, or is a delayed response to the original
request. Therefore, it may be the case that highest slot ID, target request. Therefore, it may be the case that highest slot ID, target
slot ID, or status bits may reflect the state of affairs when the slot ID, or status bits may reflect the state of affairs when the
request was first executed. Although acting based on such delayed request was first executed. Although acting based on such delayed
information is valid, it may cause the receiver to do unneeded work. information is valid, it may cause the receiver to do unneeded work.
Requesters MAY choose to send additional requests to get the current Requesters MAY choose to send additional requests to get the current
state of affairs or use the state of affairs reported by subsequent state of affairs or use the state of affairs reported by subsequent
requests, in preference to acting immediately on data which may be requests, in preference to acting immediately on data which may be
out of date. out of date.
2.10.5.1.2. Errors from SEQUENCE and CB_SEQUENCE 2.10.6.1.2. Errors from SEQUENCE and CB_SEQUENCE
Any time SEQUENCE or CB_SEQUENCE returns an error, the sequence ID of Any time SEQUENCE or CB_SEQUENCE returns an error, the sequence ID of
the slot MUST NOT change. The replier MUST NOT modify the reply the slot MUST NOT change. The replier MUST NOT modify the reply
cache entry for the slot whenever an error is returned from SEQUENCE cache entry for the slot whenever an error is returned from SEQUENCE
or CB_SEQUENCE. or CB_SEQUENCE.
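The slot checks in Section 2.10.6.1 and the rule that an error leaves the slot untouched can be sketched together as follows. This is a hedged illustration: it assumes 32-bit sequence IDs that wrap to zero, and it assumes both misordered cases are reported with the same error. The function names and the "REPLAY_CACHED_REPLY" marker are hypothetical; the marker is not a protocol status.

```python
SEQ_MOD = 2 ** 32  # assumption: sequence IDs are 32-bit and wrap to zero


def classify(slot_seq, request_seq):
    """Classify a request's sequence ID against the slot's current value."""
    delta = (request_seq - slot_seq) % SEQ_MOD
    if delta == 1:
        return "new"
    if delta == 0:
        return "retransmission"
    # Anything else is behind the slot or too far ahead of it.
    return "misordered"


def apply_sequence(slot_seq, request_seq):
    """Return (outcome, slot sequence ID after processing)."""
    kind = classify(slot_seq, request_seq)
    if kind == "new":
        # Only a successful new request advances the slot.
        return "NFS4_OK", request_seq
    if kind == "retransmission":
        # Hypothetical marker: the replier returns the cached reply.
        return "REPLAY_CACHED_REPLY", slot_seq
    # An error from SEQUENCE or CB_SEQUENCE never changes the slot.
    return "NFS4ERR_SEQ_MISORDERED", slot_seq
```

Working modulo 2**32 makes the wraparound case fall out of the same arithmetic: a slot at 0xFFFFFFFF accepts 0 as the next new request.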
2.10.5.1.3. Optional Reply Caching 2.10.6.1.3. Optional Reply Caching
On a per-request basis the requester can choose to direct the replier On a per-request basis the requester can choose to direct the replier
to cache the reply to all operations after the first operation to cache the reply to all operations after the first operation
(SEQUENCE or CB_SEQUENCE) via the sa_cachethis or csa_cachethis (SEQUENCE or CB_SEQUENCE) via the sa_cachethis or csa_cachethis
fields of the arguments to SEQUENCE or CB_SEQUENCE. The reason it fields of the arguments to SEQUENCE or CB_SEQUENCE. The reason it
would not direct the replier to cache the entire reply is that the would not direct the replier to cache the entire reply is that the
request is composed of all idempotent operations [23]. Caching the request is composed of all idempotent operations [23]. Caching the
reply may offer little benefit. If the reply is too large (see reply may offer little benefit. If the reply is too large (see
Section 2.10.5.4), it may not be cacheable anyway. Even if the reply Section 2.10.6.4), it may not be cacheable anyway. Even if the reply
to an idempotent request is small enough to cache, unnecessarily caching to an idempotent request is small enough to cache, unnecessarily caching
the reply slows down the server and increases RPC latency. the reply slows down the server and increases RPC latency.
Whether the requester requests the reply to be cached or not has no Whether the requester requests the reply to be cached or not has no
effect on the slot processing. If the results of SEQUENCE or effect on the slot processing. If the results of SEQUENCE or
CB_SEQUENCE are NFS4_OK, then the slot's sequence ID MUST be CB_SEQUENCE are NFS4_OK, then the slot's sequence ID MUST be
incremented by one. If a requester does not direct the replier to incremented by one. If a requester does not direct the replier to
cache the reply, the replier MUST do one of the following: cache the reply, the replier MUST do one of the following:
o The replier can cache the entire original reply. Even though o The replier can cache the entire original reply. Even though
sa_cachethis or csa_cachethis are FALSE, the replier is always sa_cachethis or csa_cachethis are FALSE, the replier is always
free to cache. It may choose this approach in order to simplify free to cache. It may choose this approach in order to simplify
implementation. implementation.
o The replier enters into its reply cache a reply consisting of the o The replier enters into its reply cache a reply consisting of the
original results to the SEQUENCE or CB_SEQUENCE operation, and original results to the SEQUENCE or CB_SEQUENCE operation, and
with the next operation in COMPOUND or CB_COMPOUND having the with the next operation in COMPOUND or CB_COMPOUND having the
error NFS4ERR_RETRY_UNCACHED_REP. Thus if the requester later error NFS4ERR_RETRY_UNCACHED_REP. Thus if the requester later
retries the request, it will get NFS4ERR_RETRY_UNCACHED_REP. retries the request, it will get NFS4ERR_RETRY_UNCACHED_REP.
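The two choices above determine what the replier stores in the slot. The sketch below shows one way to express them, under stated assumptions: per-operation results are modeled as plain strings, the names reply_to_cache and cache_anyway are hypothetical, and the handling of a request that contains only SEQUENCE is an assumption not taken from the text.

```python
def reply_to_cache(sequence_result, later_results, cachethis, cache_anyway=False):
    """Return the list of per-operation results to store in the slot.

    sequence_result: result of the leading SEQUENCE or CB_SEQUENCE.
    later_results:   results of the operations that followed it.
    cachethis:       value of sa_cachethis or csa_cachethis.
    cache_anyway:    hypothetical replier policy to always cache in full.
    """
    later_results = list(later_results)
    if cachethis or cache_anyway:
        # First choice: store the entire original reply.
        return [sequence_result] + later_results
    if not later_results:
        # Assumption: with no following operation there is nothing to replace.
        return [sequence_result]
    # Second choice: keep the SEQUENCE result and mark the next operation.
    return [sequence_result, "NFS4ERR_RETRY_UNCACHED_REP"]
```

In either case the slot's sequence ID still advances when SEQUENCE or CB_SEQUENCE succeeds; only the stored reply differs.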
2.10.5.2. Retry and Replay of Reply 2.10.6.2. Retry and Replay of Reply
A requester MUST NOT retry a request, unless the connection it used A requester MUST NOT retry a request, unless the connection it used
to send the request disconnects. The requester can then reconnect to send the request disconnects. The requester can then reconnect
and re-send the request, or it can re-send the request over a and re-send the request, or it can re-send the request over a
different connection that is associated with the same session. different connection that is associated with the same session.
If the requester is a server wanting to re-send a callback operation If the requester is a server wanting to re-send a callback operation
over the backchannel of a session, the requester of course cannot over the backchannel of a session, the requester of course cannot
reconnect because only the client can associate connections with the reconnect because only the client can associate connections with the
backchannel. The server can re-send the request over another backchannel. The server can re-send the request over another
skipping to change at page 56, line 46 skipping to change at page 60, line 5
A retry might be sent while the original request is still in progress A retry might be sent while the original request is still in progress
on the replier. The replier SHOULD deal with the issue by returning on the replier. The replier SHOULD deal with the issue by returning
NFS4ERR_DELAY as the reply to SEQUENCE or CB_SEQUENCE operation, but NFS4ERR_DELAY as the reply to SEQUENCE or CB_SEQUENCE operation, but
implementations MAY return NFS4ERR_SEQ_MISORDERED. Since errors from implementations MAY return NFS4ERR_SEQ_MISORDERED. Since errors from
SEQUENCE and CB_SEQUENCE are never recorded in the reply cache, this SEQUENCE and CB_SEQUENCE are never recorded in the reply cache, this
approach allows the results of the execution of the original request approach allows the results of the execution of the original request
to be properly recorded in the reply cache (assuming the requester to be properly recorded in the reply cache (assuming the requester
specified the reply to be cached). specified the reply to be cached).
2.10.5.3. Resolving Server Callback Races 2.10.6.3. Resolving Server Callback Races
It is possible for server callbacks to arrive at the client before It is possible for server callbacks to arrive at the client before
the reply from related fore channel operations. For example, a the reply from related fore channel operations. For example, a
client may have been granted a delegation to a file it has opened, client may have been granted a delegation to a file it has opened,
but the reply to the OPEN (informing the client of the granting of but the reply to the OPEN (informing the client of the granting of
the delegation) may be delayed in the network. If a conflicting the delegation) may be delayed in the network. If a conflicting
operation arrives at the server, it will recall the delegation using operation arrives at the server, it will recall the delegation using
the backchannel, which may be on a different transport connection, the backchannel, which may be on a different transport connection,
perhaps even a different network, or even a different session perhaps even a different network, or even a different session
associated with the same client ID associated with the same client ID
skipping to change at page 58, line 8 skipping to change at page 61, line 13
to arrive before responding to the CB_COMPOUND that won the race, to arrive before responding to the CB_COMPOUND that won the race,
because it is possible that it will be delayed indefinitely. The because it is possible that it will be delayed indefinitely. The
client should assume the likely case that the reply will arrive client should assume the likely case that the reply will arrive
within the average round trip time for COMPOUND requests to the within the average round trip time for COMPOUND requests to the
server, and wait that period of time. If that period of time expires server, and wait that period of time. If that period of time expires
it can respond to the CB_COMPOUND with NFS4ERR_DELAY. it can respond to the CB_COMPOUND with NFS4ERR_DELAY.
There are other scenarios under which callbacks may race replies. There are other scenarios under which callbacks may race replies.
Among them are pNFS layout recalls as described in Section 12.5.5.2. Among them are pNFS layout recalls as described in Section 12.5.5.2.
2.10.5.4. COMPOUND and CB_COMPOUND Construction Issues 2.10.6.4. COMPOUND and CB_COMPOUND Construction Issues
Very large requests and replies may pose both buffer management Very large requests and replies may pose both buffer management
issues (especially with RDMA) and reply cache issues. When the issues (especially with RDMA) and reply cache issues. When the
session is created (Section 18.36), for each channel (fore and session is created (Section 18.36), for each channel (fore and
back), the client and server negotiate the maximum sized request they back), the client and server negotiate the maximum sized request they
will send or process (ca_maxrequestsize), the maximum sized reply will send or process (ca_maxrequestsize), the maximum sized reply
they will return or process (ca_maxresponsesize), and the maximum they will return or process (ca_maxresponsesize), and the maximum
sized reply they will store in the reply cache sized reply they will store in the reply cache
(ca_maxresponsesize_cached). (ca_maxresponsesize_cached).
skipping to change at page 58, line 40 skipping to change at page 61, line 45
If a reply exceeds ca_maxresponsesize, the reply will have the status If a reply exceeds ca_maxresponsesize, the reply will have the status
NFS4ERR_REP_TOO_BIG. A replier MAY return NFS4ERR_REP_TOO_BIG as the NFS4ERR_REP_TOO_BIG. A replier MAY return NFS4ERR_REP_TOO_BIG as the
status for the first operation (SEQUENCE or CB_SEQUENCE) in the request, status for the first operation (SEQUENCE or CB_SEQUENCE) in the request,
or it MAY opt to return it on a subsequent operation (in the same or it MAY opt to return it on a subsequent operation (in the same
COMPOUND or CB_COMPOUND reply). A replier MAY return COMPOUND or CB_COMPOUND reply). A replier MAY return
NFS4ERR_REP_TOO_BIG in the reply to SEQUENCE or CB_SEQUENCE, even if NFS4ERR_REP_TOO_BIG in the reply to SEQUENCE or CB_SEQUENCE, even if
the response would still exceed ca_maxresponsesize. the response would still exceed ca_maxresponsesize.
If sa_cachethis or csa_cachethis are TRUE, then the replier MUST If sa_cachethis or csa_cachethis are TRUE, then the replier MUST
cache a reply except if an error is returned by the SEQUENCE or cache a reply except if an error is returned by the SEQUENCE or
CB_SEQUENCE operation (see Section 2.10.5.1.2). If the reply exceeds CB_SEQUENCE operation (see Section 2.10.6.1.2). If the reply exceeds
ca_maxresponsesize_cached (and sa_cachethis or csa_cachethis are ca_maxresponsesize_cached (and sa_cachethis or csa_cachethis are
TRUE) then the server MUST return NFS4ERR_REP_TOO_BIG_TO_CACHE. Even TRUE) then the server MUST return NFS4ERR_REP_TOO_BIG_TO_CACHE. Even
if NFS4ERR_REP_TOO_BIG_TO_CACHE (or any other error for that matter) if NFS4ERR_REP_TOO_BIG_TO_CACHE (or any other error for that matter)
is returned on an operation other than the first operation (SEQUENCE or is returned on an operation other than the first operation (SEQUENCE or
CB_SEQUENCE), then the reply MUST be cached if sa_cachethis or CB_SEQUENCE), then the reply MUST be cached if sa_cachethis or
csa_cachethis are TRUE. For example, if a COMPOUND has eleven csa_cachethis are TRUE. For example, if a COMPOUND has eleven
operations, including SEQUENCE, the fifth operation is a RENAME, and operations, including SEQUENCE, the fifth operation is a RENAME, and
the tenth operation is a READ for one million bytes, the server may the tenth operation is a READ for one million bytes, the server may
return NFS4ERR_REP_TOO_BIG_TO_CACHE on the tenth operation. Since return NFS4ERR_REP_TOO_BIG_TO_CACHE on the tenth operation. Since
the server executed several operations, especially the non-idempotent the server executed several operations, especially the non-idempotent
skipping to change at page 59, line 47 skipping to change at page 63, line 5
too large on the next operation, especially if the operation is too large on the next operation, especially if the operation is
OPEN. OPEN.
o A server MAY return NFS4ERR_UNSAFE_COMPOUND to a non-idempotent o A server MAY return NFS4ERR_UNSAFE_COMPOUND to a non-idempotent
current filehandle changing operation, if it looks at the next current filehandle changing operation, if it looks at the next
operation (in the same COMPOUND procedure) and finds it is not operation (in the same COMPOUND procedure) and finds it is not
GETFH. The server SHOULD do this if it is unable to determine in GETFH. The server SHOULD do this if it is unable to determine in
advance whether the total response size would exceed advance whether the total response size would exceed
ca_maxresponsesize_cached or ca_maxresponsesize. ca_maxresponsesize_cached or ca_maxresponsesize.
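The look-ahead described in the bullet above can be sketched as follows. This is an illustrative sketch only, not part of the specification: the operation names are plain strings, the set NON_IDEMPOTENT_FH_CHANGING is an assumption made for the example (the text here names only OPEN), and the function name is hypothetical. The sketch also assumes that an operation with no following operation is not flagged.

```python
# Assumed, illustrative set of non-idempotent operations that change the
# current filehandle. The surrounding text names only OPEN explicitly.
NON_IDEMPOTENT_FH_CHANGING = {"OPEN", "CREATE"}


def unsafe_compound_index(ops):
    """Return the index of the first operation that a server MAY reject
    with NFS4ERR_UNSAFE_COMPOUND, or None if no operation qualifies.

    An operation qualifies when it is in the assumed set above and the
    next operation in the same COMPOUND exists and is not GETFH.
    """
    for i, op in enumerate(ops):
        if op not in NON_IDEMPOTENT_FH_CHANGING:
            continue
        if i + 1 < len(ops) and ops[i + 1] != "GETFH":
            return i
    return None


safe = unsafe_compound_index(["SEQUENCE", "PUTFH", "OPEN", "GETFH"])
unsafe = unsafe_compound_index(["SEQUENCE", "PUTFH", "OPEN", "READ"])
```

In the second request the OPEN at index 2 is followed by READ rather than GETFH, so a server using this kind of check could return NFS4ERR_UNSAFE_COMPOUND on the OPEN instead of executing it.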
2.10.5.5. Persistence 2.10.6.5. Persistence
Since the reply cache is bounded, it is practical for the reply cache Since the reply cache is bounded, it is practical for the reply cache
to persist across server restarts. The replier MUST persist the to persist across server restarts. The replier MUST persist the
following information if it agreed to persist the session (when the following information if it agreed to persist the session (when the
session was created; see Section 18.36): session was created; see Section 18.36):
o The session ID. o The session ID.
o The slot table including the sequence ID and cached reply for each o The slot table including the sequence ID and cached reply for each
slot. slot.
skipping to change at page 61, line 24 skipping to change at page 64, line 29
failure before the transaction is committed, then the server rolls failure before the transaction is committed, then the server rolls
back the transaction. If the server itself fails, then when it restarts, back the transaction. If the server itself fails, then when it restarts,
its recovery logic could roll back the transaction before starting its recovery logic could roll back the transaction before starting
the NFSv4.1 server. the NFSv4.1 server.
While the description of the implementation for atomic execution of While the description of the implementation for atomic execution of
the request and caching of the reply is beyond the scope of this the request and caching of the reply is beyond the scope of this
document, an example implementation for NFSv2 [27] is described in document, an example implementation for NFSv2 [27] is described in
[28]. [28].
2.10.6. RDMA Considerations 2.10.7. RDMA Considerations
A complete discussion of the operation of RPC-based protocols over A complete discussion of the operation of RPC-based protocols over
RDMA transports is in [8]. A discussion of the operation of NFSv4, RDMA transports is in [8]. A discussion of the operation of NFSv4,
including NFSv4.1, over RDMA is in [9]. Where RDMA is considered, including NFSv4.1, over RDMA is in [9]. Where RDMA is considered,
this specification assumes the use of such a layering; it addresses this specification assumes the use of such a layering; it addresses
only the upper layer issues relevant to making best use of RPC/RDMA. only the upper layer issues relevant to making best use of RPC/RDMA.
2.10.6.1. RDMA Connection Resources 2.10.7.1. RDMA Connection Resources
RDMA requires its consumers to register memory and post buffers of a RDMA requires its consumers to register memory and post buffers of a
specific size and number for receive operations. specific size and number for receive operations.
Registration of memory can be a relatively high-overhead operation, Registration of memory can be a relatively high-overhead operation,
since it requires pinning of buffers, assignment of attributes (e.g. since it requires pinning of buffers, assignment of attributes (e.g.
readable/writable), and initialization of hardware translation. readable/writable), and initialization of hardware translation.
Preregistration is desirable to reduce overhead. These registrations Preregistration is desirable to reduce overhead. These registrations
are specific to hardware interfaces and even to RDMA connection are specific to hardware interfaces and even to RDMA connection
endpoints, therefore negotiation of their limits is desirable to endpoints, therefore negotiation of their limits is desirable to
skipping to change at page 62, line 13 skipping to change at page 65, line 18
NFSv4.1 manages slots as resources on a per session basis (see NFSv4.1 manages slots as resources on a per session basis (see
Section 2.10), while RDMA connections manage credits on a per Section 2.10), while RDMA connections manage credits on a per
connection basis. This means that in order for a peer to send data connection basis. This means that in order for a peer to send data
over RDMA to a remote buffer, it has to have both an NFSv4.1 slot, over RDMA to a remote buffer, it has to have both an NFSv4.1 slot,
and an RDMA credit. If multiple RDMA connections are associated with and an RDMA credit. If multiple RDMA connections are associated with
a session, then if the total number of credits across all RDMA a session, then if the total number of credits across all RDMA
connections associated with the session is X, and the number of slots in connections associated with the session is X, and the number of slots in
the session is Y, then the maximum number of outstanding requests is the session is Y, then the maximum number of outstanding requests is
the lesser of X and Y. the lesser of X and Y.
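As a small worked example of the bound above, assuming a hypothetical helper name and made-up credit and slot counts:

```python
def max_outstanding_requests(credits_per_connection, session_slots):
    """Upper bound on requests in flight over RDMA for one session.

    credits_per_connection: RDMA credits of each connection associated
    with the session (their sum is X in the text).
    session_slots: number of slots in the session (Y in the text).
    """
    total_credits = sum(credits_per_connection)  # X
    return min(total_credits, session_slots)     # lesser of X and Y


# Three connections with 8, 8, and 16 credits (X = 32) and 24 slots (Y = 24).
limit = max_outstanding_requests([8, 8, 16], session_slots=24)
```

Here the slot table is the limiting resource. With only two connections of 4 credits each, the credits (X = 8) would become the limit instead.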
2.10.6.2. Flow Control 2.10.7.2. Flow Control
Previous versions of NFS do not provide flow control; instead they Previous versions of NFS do not provide flow control; instead they
rely on the windowing provided by transports like TCP to throttle rely on the windowing provided by transports like TCP to throttle
requests. This does not work with RDMA, which provides no operation requests. This does not work with RDMA, which provides no operation
flow control and will terminate a connection in error when limits are flow control and will terminate a connection in error when limits are
exceeded. Limits such as maximum number of requests outstanding are exceeded. Limits such as maximum number of requests outstanding are
therefore negotiated when a session is created (see the therefore negotiated when a session is created (see the
ca_maxrequests field in Section 18.36). These limits then provide ca_maxrequests field in Section 18.36). These limits then provide
the maxima which each connection associated with the session's the maxima which each connection associated with the session's
channel(s) must remain within. RDMA connections are managed within channel(s) must remain within. RDMA connections are managed within
skipping to change at page 62, line 42 skipping to change at page 65, line 47
associated with the replier's channel does exceed the channel's associated with the replier's channel does exceed the channel's
maximum number of outstanding requests. maximum number of outstanding requests.
The limits may also be modified dynamically at the replier's choosing The limits may also be modified dynamically at the replier's choosing
by manipulating certain parameters present in each NFSv4.1 reply. In by manipulating certain parameters present in each NFSv4.1 reply. In
addition, the CB_RECALL_SLOT callback operation (see Section 20.8) addition, the CB_RECALL_SLOT callback operation (see Section 20.8)
can be sent by a server to a client to return RDMA credits to the can be sent by a server to a client to return RDMA credits to the
server, thereby lowering the maximum number of requests a client can server, thereby lowering the maximum number of requests a client can
have outstanding to the server. have outstanding to the server.
2.10.6.3. Padding 2.10.7.3. Padding
Header padding is requested by each peer at session initiation (see Header padding is requested by each peer at session initiation (see
the ca_headerpadsize argument to CREATE_SESSION in Section 18.36), the ca_headerpadsize argument to CREATE_SESSION in Section 18.36),
and subsequently used by the RPC RDMA layer, as described in [8]. and subsequently used by the RPC RDMA layer, as described in [8].
Zero padding is permitted. Zero padding is permitted.
Padding leverages the useful property that RDMA preserves alignment of Padding leverages the useful property that RDMA preserves alignment of
data, even when they are placed into anonymous (untagged) buffers. data, even when they are placed into anonymous (untagged) buffers.
If requested, client inline writes will insert appropriate pad bytes If requested, client inline writes will insert appropriate pad bytes
within the request header to align the data payload on the specified within the request header to align the data payload on the specified
boundary. The client is encouraged to add sufficient padding (up to boundary. The client is encouraged to add sufficient padding (up to
the negotiated size) so that the "data" field of the NFSv4.1 WRITE the negotiated size) so that the "data" field of the NFSv4.1 WRITE
operation is aligned. Most servers can make good use of such operation is aligned. Most servers can make good use of such
padding, which allows them to chain receive buffers in such a way padding, which allows them to chain receive buffers in such a way
skipping to change at page 63, line 47 skipping to change at page 67, line 5
In the above case, the server may recycle unused buffers to the next In the above case, the server may recycle unused buffers to the next
posted receive if unused by the actual received request, or may pass posted receive if unused by the actual received request, or may pass
the now-complete buffers by reference for normal write processing. the now-complete buffers by reference for normal write processing.
For a server which can make use of it, this removes any need for data For a server which can make use of it, this removes any need for data
copies of incoming data, without resorting to complicated end-to-end copies of incoming data, without resorting to complicated end-to-end
buffer advertisement and management. This includes most kernel-based buffer advertisement and management. This includes most kernel-based
and integrated server designs, among many others. The client may and integrated server designs, among many others. The client may
perform similar optimizations, if desired. perform similar optimizations, if desired.
2.10.6.4. Dual RDMA and Non-RDMA Transports 2.10.7.4. Dual RDMA and Non-RDMA Transports
Some RDMA transports (for example [10]), permit a "streaming" (non- Some RDMA transports (for example [10]), permit a "streaming" (non-
RDMA) phase, where ordinary traffic might flow before "stepping up" RDMA) phase, where ordinary traffic might flow before "stepping up"
to RDMA mode, commencing RDMA traffic. Some RDMA transports start to RDMA mode, commencing RDMA traffic. Some RDMA transports start
connections always in RDMA mode. NFSv4.1 allows, but does not connections always in RDMA mode. NFSv4.1 allows, but does not
assume, a streaming phase before RDMA mode. When a connection is assume, a streaming phase before RDMA mode. When a connection is
associated with a session, the client and server negotiate whether associated with a session, the client and server negotiate whether
the connection is used in RDMA or non-RDMA mode (see Section 18.36 the connection is used in RDMA or non-RDMA mode (see Section 18.36
and Section 18.34). and Section 18.34).
2.10.7. Sessions Security 2.10.8. Sessions Security
2.10.7.1. Session Callback Security 2.10.8.1. Session Callback Security
Via session / connection association, NFSv4.1 improves security over Via session / connection association, NFSv4.1 improves security over
that provided by NFSv4.0 for the backchannel. The connection is that provided by NFSv4.0 for the backchannel. The connection is
client-initiated (see Section 18.34), and subject to the same client-initiated (see Section 18.34), and subject to the same
firewall and routing checks as the fore channel. The connection firewall and routing checks as the fore channel. The connection
cannot be hijacked by an attacker who connects to the client port cannot be hijacked by an attacker who connects to the client port
prior to the intended server as is possible with NFSv4.0. At the prior to the intended server as is possible with NFSv4.0. At the
client's option (see Section 18.35), connection association is fully client's option (see Section 18.35), connection association is fully
authenticated before being activated (see Section 18.34). Traffic authenticated before being activated (see Section 18.34). Traffic
from the server over the backchannel is authenticated exactly as the from the server over the backchannel is authenticated exactly as the
client specifies (see Section 2.10.7.2). client specifies (see Section 2.10.8.2).
2.10.7.2. Backchannel RPC Security 2.10.8.2. Backchannel RPC Security
When the NFSv4.1 client establishes the backchannel, it informs the When the NFSv4.1 client establishes the backchannel, it informs the
server of the security flavors and principals to use when sending server of the security flavors and principals to use when sending
requests. If the security flavor is RPCSEC_GSS, the client expresses requests. If the security flavor is RPCSEC_GSS, the client expresses
the principal in the form of an established RPCSEC_GSS context. The the principal in the form of an established RPCSEC_GSS context. The
server is free to use any of the flavor/principal combinations the server is free to use any of the flavor/principal combinations the
client offers, but it MUST NOT use unoffered combinations. This way, client offers, but it MUST NOT use unoffered combinations. This way,
the client need not provide a target GSS principal for the the client need not provide a target GSS principal for the
backchannel as it did with NFSv4.0, nor does the server have to implement backchannel as it did with NFSv4.0, nor does the server have to implement
an RPCSEC_GSS initiator as it did with NFSv4.0 [20]. an RPCSEC_GSS initiator as it did with NFSv4.0 [20].
The CREATE_SESSION (Section 18.36) and BACKCHANNEL_CTL The CREATE_SESSION (Section 18.36) and BACKCHANNEL_CTL
(Section 18.33) operations allow the client to specify flavor/ (Section 18.33) operations allow the client to specify flavor/
principal combinations. principal combinations.
Also note that the SP4_SSV state protection mode (see Section 18.35 Also note that the SP4_SSV state protection mode (see Section 18.35
and Section 2.10.7.3) has the side benefit of providing SSV-derived and Section 2.10.8.3) has the side benefit of providing SSV-derived
RPCSEC_GSS contexts (Section 2.10.8). RPCSEC_GSS contexts (Section 2.10.9).
2.10.7.3. Protection from Unauthorized State Changes 2.10.8.3. Protection from Unauthorized State Changes
As described to this point in the specification, the state model of As described to this point in the specification, the state model of
NFSv4.1 is vulnerable to an attacker that sends a SEQUENCE operation NFSv4.1 is vulnerable to an attacker that sends a SEQUENCE operation
with a forged session ID and with a slot ID that it expects the with a forged session ID and with a slot ID that it expects the
legitimate client to use next. When the legitimate client uses the legitimate client to use next. When the legitimate client uses the
slot ID with the same sequence number, the server returns the slot ID with the same sequence number, the server returns the
attacker's result from the reply cache which disrupts the legitimate attacker's result from the reply cache which disrupts the legitimate
client and thus denies service to it. Similarly an attacker could client and thus denies service to it. Similarly an attacker could
send a CREATE_SESSION with a forged client ID to create a new session send a CREATE_SESSION with a forged client ID to create a new session
associated with the client ID. The attacker could send requests associated with the client ID. The attacker could send requests
skipping to change at page 66, line 37 skipping to change at page 69, line 44
3. The physical client has multiple users, but the client 3. The physical client has multiple users, but the client
implementation has a unique client ID for each user. This is implementation has a unique client ID for each user. This is
effectively the same as the second scenario, but a disadvantage effectively the same as the second scenario, but a disadvantage
is that each user must be allocated at least one session each, so is that each user must be allocated at least one session each, so
the approach suffers from lack of economy. the approach suffers from lack of economy.
The SP4_SSV protection option uses a Secret State Verifier (SSV) The SP4_SSV protection option uses a Secret State Verifier (SSV)
which is shared between a client and server. The SSV serves as the which is shared between a client and server. The SSV serves as the
secret key for an internal (that is, internal to NFSv4.1) GSS secret key for an internal (that is, internal to NFSv4.1) GSS
mechanism that uses the secret key for Message Integrity Code (MIC) mechanism that uses the secret key for Message Integrity Code (MIC)
and Wrap tokens (Section 2.10.8). The SP4_SSV protection option is and Wrap tokens (Section 2.10.9). The SP4_SSV protection option is
intended for the client that has multiple users, and the system intended for the client that has multiple users, and the system
administrator does not wish to configure a permanent machine administrator does not wish to configure a permanent machine
credential for each client. The SSV is established on the server via credential for each client. The SSV is established on the server via
SET_SSV (see Section 18.47). To prevent eavesdropping, a client SET_SSV (see Section 18.47). To prevent eavesdropping, a client
SHOULD send SET_SSV via RPCSEC_GSS with the privacy service. Several SHOULD send SET_SSV via RPCSEC_GSS with the privacy service. Several
aspects of the SSV make it intractable for an attacker to guess the aspects of the SSV make it intractable for an attacker to guess the
SSV, and thus associate rogue connections with a session, and rogue SSV, and thus associate rogue connections with a session, and rogue
sessions with a client ID: sessions with a client ID:
o The arguments to and results of SET_SSV include digests of the old o The arguments to and results of SET_SSV include digests of the old
and new SSV, respectively. and new SSV, respectively.
o Because the initial value of the SSV is zero, therefore known, the o Because the initial value of the SSV is zero, therefore known, the
client that opts for SP4_SSV protection and opts to apply SP4_SSV client that opts for SP4_SSV protection and opts to apply SP4_SSV
protection to BIND_CONN_TO_SESSION and CREATE_SESSION MUST send at protection to BIND_CONN_TO_SESSION and CREATE_SESSION MUST send at
least one SET_SSV operation before the first BIND_CONN_TO_SESSION least one SET_SSV operation before the first BIND_CONN_TO_SESSION
operation or before the second CREATE_SESSION operation on a operation or before the second CREATE_SESSION operation on a
client ID. If it does not, the SSV mechanism will not generate client ID. If it does not, the SSV mechanism will not generate
tokens (Section 2.10.8). A client SHOULD send SET_SSV as soon as tokens (Section 2.10.9). A client SHOULD send SET_SSV as soon as
a session is created. a session is created.
o A SET_SSV does not replace the SSV with the argument to SET_SSV. o A SET_SSV does not replace the SSV with the argument to SET_SSV.
Instead, the current SSV on the server is logically exclusive ORed Instead, the current SSV on the server is logically exclusive ORed
(XORed) with the argument to SET_SSV. Each time a new principal (XORed) with the argument to SET_SSV. Each time a new principal
uses a client ID for the first time, the client SHOULD send a uses a client ID for the first time, the client SHOULD send a
SET_SSV with that principal's RPCSEC_GSS credentials, with SET_SSV with that principal's RPCSEC_GSS credentials, with
RPCSEC_GSS service set to RPC_GSS_SVC_PRIVACY. RPCSEC_GSS service set to RPC_GSS_SVC_PRIVACY.
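The XOR update rule in the last bullet can be illustrated with a minimal sketch. The function name is hypothetical, and the sketch assumes the SSV and the SET_SSV argument are byte strings of equal length; it omits the digests carried in the SET_SSV arguments and results.

```python
def apply_set_ssv(current_ssv: bytes, set_ssv_arg: bytes) -> bytes:
    """Return the new SSV: the current SSV XORed with the SET_SSV argument.

    Assumes both values have the same length (an assumption of this sketch).
    """
    if len(current_ssv) != len(set_ssv_arg):
        raise ValueError("SSV and SET_SSV argument must have equal length")
    return bytes(a ^ b for a, b in zip(current_ssv, set_ssv_arg))


# The initial SSV is zero, so the first SET_SSV yields its own argument.
ssv = bytes(4)
ssv = apply_set_ssv(ssv, bytes([0x12, 0x34, 0x56, 0x78]))
# A second SET_SSV, for example from another principal, folds in new material.
ssv = apply_set_ssv(ssv, bytes([0xFF, 0x00, 0xFF, 0x00]))
```

Because each update is combined with the existing value rather than replacing it, an observer who learns one SET_SSV argument does not thereby learn the SSV unless it also knows the value that was in place before.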
Here are the types of attacks that can be attempted by an attacker Here are the types of attacks that can be attempted by an attacker
skipping to change at page 69, line 27 skipping to change at page 72, line 33
is to prevent connection hijacking, the use of IPsec is RECOMMENDED. is to prevent connection hijacking, the use of IPsec is RECOMMENDED.
If a connection hijack occurs, the hijacker could in theory change If a connection hijack occurs, the hijacker could in theory change
locking state and negatively impact the service to legitimate locking state and negatively impact the service to legitimate
clients. However if the server is configured to require the use of clients. However if the server is configured to require the use of
RPCSEC_GSS with integrity or privacy on the affected file objects, RPCSEC_GSS with integrity or privacy on the affected file objects,
and if EXCHGID4_FLAG_BIND_PRINC_STATEID capability (Section 18.35), and if EXCHGID4_FLAG_BIND_PRINC_STATEID capability (Section 18.35),
is in force, this will thwart unauthorized attempts to change locking is in force, this will thwart unauthorized attempts to change locking
state. state.
2.10.8. The SSV GSS Mechanism 2.10.9. The SSV GSS Mechanism
The SSV provides the secret key for a mechanism that NFSv4.1 uses for The SSV provides the secret key for a mechanism that NFSv4.1 uses for
state protection. Contexts for this mechanism are not established state protection. Contexts for this mechanism are not established
via the RPCSEC_GSS protocol. Instead, the contexts are automatically via the RPCSEC_GSS protocol. Instead, the contexts are automatically
created when EXCHANGE_ID specifies SP4_SSV protection. The only created when EXCHANGE_ID specifies SP4_SSV protection. The only
tokens defined are the PerMsgToken (emitted by GSS_GetMIC) and the tokens defined are the PerMsgToken (emitted by GSS_GetMIC) and the
SealedMessage token (emitted by GSS_Wrap). SealedMessage token (emitted by GSS_Wrap).
The mechanism OID for the SSV mechanism is: The mechanism OID for the SSV mechanism is:
iso.org.dod.internet.private.enterprise.Michael Eisler.nfs.ssv_mech iso.org.dod.internet.private.enterprise.Michael Eisler.nfs.ssv_mech
skipping to change at page 73, line 36 skipping to change at page 76, line 43
The client MUST establish an SSV via SET_SSV before the SSV GSS The client MUST establish an SSV via SET_SSV before the SSV GSS
context can be used to emit tokens from GSS_Wrap() and GSS_GetMIC(). context can be used to emit tokens from GSS_Wrap() and GSS_GetMIC().
If SET_SSV has not been successfully called, attempts to emit tokens If SET_SSV has not been successfully called, attempts to emit tokens
MUST fail. MUST fail.
The SSV mechanism does not support replay detection and sequencing in The SSV mechanism does not support replay detection and sequencing in
its tokens because RPCSEC_GSS does not use those features (See its tokens because RPCSEC_GSS does not use those features (See
Section 5.2.2 "Context Creation Requests" in [4]). Section 5.2.2 "Context Creation Requests" in [4]).
2.10.9. Session Mechanics - Steady State 2.10.10. Session Mechanics - Steady State
2.10.9.1. Obligations of the Server 2.10.10.1. Obligations of the Server
The server has the primary obligation to monitor the state of The server has the primary obligation to monitor the state of
backchannel resources that the client has created for the server backchannel resources that the client has created for the server
(RPCSEC_GSS contexts and backchannel connections). If these (RPCSEC_GSS contexts and backchannel connections). If these
resources vanish, the server takes action as specified in resources vanish, the server takes action as specified in
Section 2.10.11.2. Section 2.10.12.2.
2.10.9.2. Obligations of the Client 2.10.10.2. Obligations of the Client
The client SHOULD honor the following obligations in order to utilize The client SHOULD honor the following obligations in order to utilize
the session: the session:
o Keep a necessary session from going idle on the server. A client o Keep a necessary session from going idle on the server. A client
that requires a session, but nonetheless is not sending operations that requires a session, but nonetheless is not sending operations
risks having the session be destroyed by the server. This is risks having the session be destroyed by the server. This is
because sessions consume resources, and resource limitations may because sessions consume resources, and resource limitations may
force the server to cull an inactive session. A server MAY force the server to cull an inactive session. A server MAY
consider a session to be inactive if the client has not used the consider a session to be inactive if the client has not used the
session before the session inactivity timer (Section 2.10.10) has session before the session inactivity timer (Section 2.10.11) has
expired. expired.
o Destroy the session when not needed. If a client has multiple o Destroy the session when not needed. If a client has multiple
sessions, one of which has no requests waiting for replies, and sessions, one of which has no requests waiting for replies, and
has been idle for some period of time, it SHOULD destroy the has been idle for some period of time, it SHOULD destroy the
session. session.
o Maintain GSS contexts for the backchannel. If the client requires o Maintain GSS contexts for the backchannel. If the client requires
the server to use the RPCSEC_GSS security flavor for callbacks, the server to use the RPCSEC_GSS security flavor for callbacks,
then it needs to be sure the contexts handed to the server via then it needs to be sure the contexts handed to the server via
skipping to change at page 74, line 35 skipping to change at page 77, line 40
backchannel in order to gracefully recall recallable state, or backchannel in order to gracefully recall recallable state, or
notify the client of certain events. Note that if the connection notify the client of certain events. Note that if the connection
is not being used for the fore channel, there is no way for the is not being used for the fore channel, there is no way for the
client to tell if the connection is still alive (e.g., the server client to tell if the connection is still alive (e.g., the server
restarted without sending a disconnect). The onus is on the restarted without sending a disconnect). The onus is on the
server, not the client, to determine if the backchannel's server, not the client, to determine if the backchannel's
connection is alive, and to indicate in the response to a SEQUENCE connection is alive, and to indicate in the response to a SEQUENCE
operation when the last connection associated with a session's operation when the last connection associated with a session's
backchannel has disconnected. backchannel has disconnected.
2.10.9.3. Steps the Client Takes To Establish a Session 2.10.10.3. Steps the Client Takes To Establish a Session
If the client does not have a client ID, the client sends EXCHANGE_ID If the client does not have a client ID, the client sends EXCHANGE_ID
to establish a client ID. If it opts for SP4_MACH_CRED or SP4_SSV to establish a client ID. If it opts for SP4_MACH_CRED or SP4_SSV
protection, in the spo_must_enforce list of operations, it SHOULD at protection, in the spo_must_enforce list of operations, it SHOULD at
minimum specify: CREATE_SESSION, DESTROY_SESSION, minimum specify: CREATE_SESSION, DESTROY_SESSION,
BIND_CONN_TO_SESSION, BACKCHANNEL_CTL, and DESTROY_CLIENTID. If it opts BIND_CONN_TO_SESSION, BACKCHANNEL_CTL, and DESTROY_CLIENTID. If it opts
for SP4_SSV protection, the client needs to ask for SSV-based for SP4_SSV protection, the client needs to ask for SSV-based
RPCSEC_GSS handles. RPCSEC_GSS handles.
The client uses the client ID to send a CREATE_SESSION on a The client uses the client ID to send a CREATE_SESSION on a
skipping to change at page 75, line 28 skipping to change at page 78, line 33
If the client wants to use additional connections for the If the client wants to use additional connections for the
backchannel, then it must call BIND_CONN_TO_SESSION on each backchannel, then it must call BIND_CONN_TO_SESSION on each
connection it wants to use with the session. If the client wants to connection it wants to use with the session. If the client wants to
use additional connections for the fore channel, then it must call use additional connections for the fore channel, then it must call
BIND_CONN_TO_SESSION if it specified SP4_SSV or SP4_MACH_CRED state BIND_CONN_TO_SESSION if it specified SP4_SSV or SP4_MACH_CRED state
protection when the client ID was created. protection when the client ID was created.
At this point the session has reached steady state. At this point the session has reached steady state.
2.10.10. Session Inactivity Timer 2.10.11. Session Inactivity Timer
The server MAY maintain a session inactivity timer for each session. The server MAY maintain a session inactivity timer for each session.
If the session inactivity timer expires, then the server MAY destroy If the session inactivity timer expires, then the server MAY destroy
the session. To avoid losing a session due to inactivity, the client the session. To avoid losing a session due to inactivity, the client
MUST renew the session inactivity timer. The length of the session MUST renew the session inactivity timer. The length of the session
inactivity timer MUST NOT be less than the lease_time attribute inactivity timer MUST NOT be less than the lease_time attribute
(Section 5.8.1.11). As with lease renewal (Section 8.3), when the (Section 5.8.1.11). As with lease renewal (Section 8.3), when the
server receives a SEQUENCE operation, it resets the session server receives a SEQUENCE operation, it resets the session
inactivity timer, and MUST NOT allow the timer to expire while the inactivity timer, and MUST NOT allow the timer to expire while the
rest of the operations in the COMPOUND procedure's request are still rest of the operations in the COMPOUND procedure's request are still
executing. Once the last operation has finished, the server MUST set executing. Once the last operation has finished, the server MUST set
the session inactivity timer to expire no sooner than the sum of the the session inactivity timer to expire no sooner than the sum of the
current time and the value of the lease_time attribute. current time and the value of the lease_time attribute.
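The timer rules above can be sketched as follows. This is one possible reading, not a prescribed implementation: the class and method names are hypothetical, time is passed in explicitly as a number of seconds, and the timer length is assumed to equal lease_time (the minimum the text allows).

```python
class SessionInactivityTimer:
    """Sketch of a per-session inactivity timer on the server."""

    def __init__(self, lease_time, now):
        self.lease_time = lease_time
        self.expires_at = now + lease_time
        self.compounds_in_progress = 0

    def on_sequence(self, now):
        # A SEQUENCE operation arrived: the timer must not expire while
        # the rest of the COMPOUND is still executing.
        self.compounds_in_progress += 1

    def on_compound_done(self, now):
        # Last operation finished: expire no sooner than now + lease_time.
        self.compounds_in_progress -= 1
        self.expires_at = max(self.expires_at, now + self.lease_time)

    def may_destroy(self, now):
        # The server MAY destroy the session only once the timer has
        # expired and no COMPOUND on the session is still running.
        return self.compounds_in_progress == 0 and now >= self.expires_at


timer = SessionInactivityTimer(lease_time=90, now=0)
timer.on_sequence(now=80)
still_busy = timer.may_destroy(now=100)   # COMPOUND still executing
timer.on_compound_done(now=120)
```

After the COMPOUND completes at time 120, the earliest permitted expiry in this sketch is time 210.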
2.10.11. Session Mechanics - Recovery 2.10.12. Session Mechanics - Recovery
2.10.11.1. Events Requiring Client Action 2.10.12.1. Events Requiring Client Action
The following events require client action to recover. The following events require client action to recover.
2.10.11.1.1. RPCSEC_GSS Context Loss by Callback Path 2.10.12.1.1. RPCSEC_GSS Context Loss by Callback Path
If all RPCSEC_GSS contexts granted by the client to the server for If all RPCSEC_GSS contexts granted by the client to the server for
callback use have expired, the client MUST establish a new context callback use have expired, the client MUST establish a new context
via BACKCHANNEL_CTL. The sr_status_flags field of the SEQUENCE via BACKCHANNEL_CTL. The sr_status_flags field of the SEQUENCE
results indicates when callback contexts are nearly expired, or fully results indicates when callback contexts are nearly expired, or fully
expired (see Section 18.46.3). expired (see Section 18.46.3).
2.10.11.1.2. Connection Loss 2.10.12.1.2. Connection Loss
If the client loses the last connection of the session, and if it wants If the client loses the last connection of the session, and if it wants
to retain the session, then it must create a new connection, and if, to retain the session, then it must create a new connection, and if,
when the client ID was created, BIND_CONN_TO_SESSION was specified in when the client ID was created, BIND_CONN_TO_SESSION was specified in
the spo_must_enforce list, the client MUST use BIND_CONN_TO_SESSION the spo_must_enforce list, the client MUST use BIND_CONN_TO_SESSION
to associate the connection with the session. to associate the connection with the session.
If there was a request outstanding at the time of the connection If there was a request outstanding at the time of the connection
loss, then if the client wants to continue to use the session it MUST loss, then if the client wants to continue to use the session it MUST
retry the request, as described in Section 2.10.5.2. Note that it is retry the request, as described in Section 2.10.6.2. Note that it is
not necessary to retry requests over a connection with the same not necessary to retry requests over a connection with the same
source network address or the same destination network address as the source network address or the same destination network address as the
lost connection. As long as the session ID, slot ID, and sequence ID lost connection. As long as the session ID, slot ID, and sequence ID
in the retry match that of the original request, the server will in the retry match that of the original request, the server will
recognize the request as a retry if it executed the request prior to recognize the request as a retry if it executed the request prior to
disconnect. disconnect.
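A minimal sketch of why the connection does not matter for retry
detection: the lookup key below contains only the session ID and slot
ID, and the sequence ID decides whether the request is a retry. The
ReplyCache class and its handle method are hypothetical, and the
sketch deliberately omits the misordered-sequence checks a real
replier would perform.

```python
class ReplyCache:
    """Hypothetical per-replier cache keyed by session and slot."""

    def __init__(self):
        # (session_id, slot_id) -> (sequence_id, cached_reply)
        self.slots = {}

    def handle(self, session_id, slot_id, sequence_id, execute):
        """Return (reply, was_retry); the connection is not consulted."""
        key = (session_id, slot_id)
        cached = self.slots.get(key)
        if cached is not None and cached[0] == sequence_id:
            # Same session ID, slot ID, and sequence ID: replay the
            # cached reply instead of executing the request again.
            return cached[1], True
        reply = execute()
        self.slots[key] = (sequence_id, reply)
        return reply, False
```

A retry sent over a new connection therefore reaches the same cache
entry as the original request.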
If the connection that was lost was the last one associated with the If the connection that was lost was the last one associated with the
backchannel, and the client wants to retain the backchannel and/or backchannel, and the client wants to retain the backchannel and/or
not put recallable state subject to revocation, the client must not put recallable state subject to revocation, the client must
reconnect, and if it does, it MUST associate the connection to the reconnect, and if it does, it MUST associate the connection to the
session and backchannel via BIND_CONN_TO_SESSION. The server SHOULD session and backchannel via BIND_CONN_TO_SESSION. The server SHOULD
indicate when it has no callback connection via the sr_status_flags indicate when it has no callback connection via the sr_status_flags
result from SEQUENCE. result from SEQUENCE.
2.10.11.1.3. Backchannel GSS Context Loss 2.10.12.1.3. Backchannel GSS Context Loss
Via the sr_status_flags result of the SEQUENCE operation or other Via the sr_status_flags result of the SEQUENCE operation or other
means, the client will learn if some or all of the RPCSEC_GSS means, the client will learn if some or all of the RPCSEC_GSS
contexts it assigned to the backchannel have been lost. If the contexts it assigned to the backchannel have been lost. If the
client wants to retain the backchannel and/or not put recallable client wants to retain the backchannel and/or not put recallable
state subject to revocation, the client must use BACKCHANNEL_CTL state subject to revocation, the client must use BACKCHANNEL_CTL
to assign new contexts. to assign new contexts.
2.10.11.1.4. Loss of Session 2.10.12.1.4. Loss of Session
The replier might lose a record of the session. Causes include: The replier might lose a record of the session. Causes include:
o Replier failure and restart o Replier failure and restart
o A catastrophe that causes the reply cache to be corrupted or lost o A catastrophe that causes the reply cache to be corrupted or lost
on the media it was stored on. This applies even if the replier on the media it was stored on. This applies even if the replier
indicated in the CREATE_SESSION results that it would persist the indicated in the CREATE_SESSION results that it would persist the
cache. cache.
o The server purges the session of a client that has been inactive o The server purges the session of a client that has been inactive
for a very extended period of time. for a very extended period of time.
o As a result of configuration changes among a set of clustered
servers, a network address previously connected to one server
becomes connected to a different server which has no knowledge of
the session in question. Such a configuration change will
generally only happen when the original server ceases to function
for a time.
Loss of reply cache is equivalent to loss of session. The replier Loss of reply cache is equivalent to loss of session. The replier
indicates loss of session to the requester by returning indicates loss of session to the requester by returning
NFS4ERR_BADSESSION on the next operation that uses the session ID NFS4ERR_BADSESSION on the next operation that uses the session ID
that refers to the lost session. that refers to the lost session.
After an event like a server restart, the client may have lost its After an event like a server restart, the client may have lost its
connections. The client assumes for the moment that the session has connections. The client assumes for the moment that the session has
not been lost. It reconnects, and if it specified connection not been lost. It reconnects, and if it specified connection
association enforcement when the session was created, it invokes association enforcement when the session was created, it invokes
BIND_CONN_TO_SESSION using the session ID. Otherwise, it invokes BIND_CONN_TO_SESSION using the session ID. Otherwise, it invokes
SEQUENCE. If BIND_CONN_TO_SESSION or SEQUENCE returns SEQUENCE. If BIND_CONN_TO_SESSION or SEQUENCE returns
NFS4ERR_BADSESSION, the client knows the session was lost. If the NFS4ERR_BADSESSION, the client knows the session is not available to
connection survives session loss, then the next SEQUENCE operation it when communicating with that network address. If the connection
the client sends over the connection will get back survives session loss, then the next SEQUENCE operation the client
NFS4ERR_BADSESSION. The client again knows the session was lost. sends over the connection will get back NFS4ERR_BADSESSION. The
client again knows the session was lost.
Here is one suggested algorithm for the client when it gets
NFS4ERR_BADSESSION. It is not obligatory in that, if a client does
not want to take advantage of such features as trunking, it may omit
parts of it. However, it is a useful example which draws attention
to various possible recovery issues:
1. If the client has other connections to other server network
addresses associated with the same session, attempt a COMPOUND
with a single operation, SEQUENCE, on each of the other
connections.
2. If the attempts succeed, the session is still alive, and this is
a strong indicator the server's network address has moved. The
client might send an EXCHANGE_ID on the connection that returned
NFS4ERR_BADSESSION to see if there are opportunities for client
ID trunking (i.e. the same client ID and so_major_id are returned).
The client might use DNS to see if the moved network address was
replaced with another, so that the performance and availability
benefits of session trunking can continue.
3. If the SEQUENCE requests fail with NFS4ERR_BADSESSION then the
session no longer exists on any of the server network addresses
the client has connections associated with that session ID. It
is possible the session is still alive and available on other
network addresses. The client sends an EXCHANGE_ID on all the
connections to see if the server owner is still listening on
those network addresses. If the same server owner is returned,
but a new client ID is returned, this is a strong indicator of a
server restart. If both the same server owner and same client ID
are returned, then this is a strong indication that the server
did delete the session, and the client will need to send a
CREATE_SESSION if it has no other sessions for that client ID.
If a different server owner is returned, the client can use DNS
to find other network addresses. If it does not, or if DNS does
not find any other addresses for the server, then the client will
be unable to provide NFSv4.1 service, and fatal errors should be
returned to processes that were using the server. If the client
is using a "mount" paradigm, unmounting the server is advised.
4. If the client knows of no other connections associated with the
session ID, and of no server network addresses that are, or have
been, associated with the session ID, then the client can use DNS to
find other network addresses. If it does not, or if DNS does not
find any other addresses for the server, then the client will be
unable to provide NFSv4.1 service, and fatal errors should be
returned to processes that were using the server. If the client
is using a "mount" paradigm, unmounting the server is advised.
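The decision structure of the suggested algorithm can be sketched as
a small pure function. The Step names, the function name, and the
tuple layout are assumptions made for this illustration. The sketch
also assumes that a single successful SEQUENCE on another connection
is enough to conclude that the session is still alive.

```python
from enum import Enum


class Step(Enum):
    SESSION_ALIVE_ADDRESS_MOVED = 1  # step 2
    SERVER_RESTARTED = 2             # step 3: same owner, new client ID
    CREATE_SESSION = 3               # step 3: same owner, same client ID
    LOOK_UP_OTHER_ADDRESSES = 4      # steps 3 and 4: fall back to DNS


def after_badsession(other_sequence_ok, known, probed):
    """Pick the next recovery step after NFS4ERR_BADSESSION.

    other_sequence_ok: one bool per other connection associated with
        the session (True if a lone SEQUENCE succeeded there).
    known:  (server_owner, client_id) recorded before the failure.
    probed: (server_owner, client_id) from a fresh EXCHANGE_ID.
    """
    if not other_sequence_ok:
        # Step 4: no other connections are known for this session.
        return Step.LOOK_UP_OTHER_ADDRESSES
    if any(other_sequence_ok):
        # Step 2: the session survives elsewhere.
        return Step.SESSION_ALIVE_ADDRESS_MOVED
    # Step 3: every SEQUENCE failed; compare EXCHANGE_ID results.
    if probed[0] != known[0]:
        return Step.LOOK_UP_OTHER_ADDRESSES
    if probed[1] != known[1]:
        return Step.SERVER_RESTARTED
    return Step.CREATE_SESSION
```

If the DNS lookup finds no usable address, the text above advises
returning fatal errors to the affected processes.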
If there is a reconfiguration event which results in the same network
address being assigned to servers where the server_scope value is
different, it cannot be guaranteed that a session ID generated by the
first will be recognized as invalid by the second. Therefore, in managing server
reconfigurations among servers with different server scope values, it
is necessary to make sure that all clients have disconnected from the
first server before effecting the reconfiguration. Nonetheless,
clients should not assume that this requirement will always be
adhered to in effecting server reconfigurations to deal with
unexpected events. Even where a session ID is inappropriately
recognized as valid, it is likely that either the connection will not
be recognized as valid, or that a sequence value for a slot will not
be correct. Therefore, when a client receives results indicating
such unexpected errors, the use of EXCHANGE_ID to determine the
current server configuration and present the client to the server is
recommended.
A variation on the above is that after a server's network address
moves, there is no NFSv4.1 server listening. For example, there is
no listener on port 2049, the NFSv4 server returns
NFS4ERR_MINOR_VERS_MISMATCH, the NFS server returns a PROG_MISMATCH
error, the RPC listener on 2049 returns PROG_MISMATCH, or attempts to
re-connect to the network address time out. These should be treated
as equivalent to SEQUENCE returning NFS4ERR_BADSESSION for these
purposes.
When the client detects session loss, it must call CREATE_SESSION to When the client detects session loss, it must call CREATE_SESSION to
recover. Any non-idempotent operations that were in progress may recover. Any non-idempotent operations that were in progress may
have been performed on the server at the time of session loss. The have been performed on the server at the time of session loss. The
client has no general way to recover from this. client has no general way to recover from this.
Note that loss of session does not imply loss of lock, open, Note that loss of session does not imply loss of lock, open,
delegation, or layout state because locks, opens, delegations, and delegation, or layout state because locks, opens, delegations, and
layouts are tied to the client ID and depend on the client ID, not layouts are tied to the client ID and depend on the client ID, not
the session. Nor does loss of lock, open, delegation, or layout the session. Nor does loss of lock, open, delegation, or layout
skipping to change at page 78, line 5 skipping to change at page 82, line 38
client ID; loss of client ID however does imply loss of session, client ID; loss of client ID however does imply loss of session,
lock, open, delegation, and layout state. See Section 8.4.2. A lock, open, delegation, and layout state. See Section 8.4.2. A
session can survive a server restart, but lock recovery may still be session can survive a server restart, but lock recovery may still be
needed. needed.
It is possible CREATE_SESSION will fail with NFS4ERR_STALE_CLIENTID It is possible CREATE_SESSION will fail with NFS4ERR_STALE_CLIENTID
(for example the server restarts and does not preserve client ID (for example the server restarts and does not preserve client ID
state). If so, the client needs to call EXCHANGE_ID, followed by state). If so, the client needs to call EXCHANGE_ID, followed by
CREATE_SESSION. CREATE_SESSION.
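The fallback just described might look like the following on the
client side. The exception class and the two callables are
hypothetical stand-ins for whatever RPC layer the client uses; they
are not part of the protocol.

```python
class StaleClientID(Exception):
    """Stands in for an NFS4ERR_STALE_CLIENTID result."""


def recover_session(client_id, exchange_id, create_session):
    """Return (client_id, session) after session loss.

    exchange_id():      hypothetical callable returning a new client ID.
    create_session(id): hypothetical callable returning a session, or
                        raising StaleClientID.
    """
    try:
        return client_id, create_session(client_id)
    except StaleClientID:
        # The server no longer knows this client ID: obtain a new one
        # with EXCHANGE_ID, then create the session under it.
        new_client_id = exchange_id()
        return new_client_id, create_session(new_client_id)
```

A new client ID generally means that lock recovery is also needed,
which this sketch leaves to the caller.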
2.10.11.2. Events Requiring Server Action 2.10.12.2. Events Requiring Server Action
The following events require server action to recover. The following events require server action to recover.
2.10.11.2.1. Client Crash and Restart 2.10.12.2.1. Client Crash and Restart
As described in Section 18.35, a restarted client sends EXCHANGE_ID As described in Section 18.35, a restarted client sends EXCHANGE_ID
in such a way that it causes the server to delete any sessions it had. in such a way that it causes the server to delete any sessions it had.
2.10.11.2.2. Client Crash with No Restart 2.10.12.2.2. Client Crash with No Restart
If a client crashes and never comes back, it will never send If a client crashes and never comes back, it will never send
EXCHANGE_ID with its old client owner. Thus the server has session EXCHANGE_ID with its old client owner. Thus the server has session
state that will never be used again. After an extended period of state that will never be used again. After an extended period of
time and if the server has resource constraints, it MAY destroy the time and if the server has resource constraints, it MAY destroy the
old session as well as locking state. old session as well as locking state.
2.10.11.2.3. Extended Network Partition 2.10.12.2.3. Extended Network Partition
To the server, the extended network partition may be no different To the server, the extended network partition may be no different
from a client crash with no restart (see Section 2.10.11.2.2). from a client crash with no restart (see Section 2.10.12.2.2).
Unless the server can discern that there is a network partition, it Unless the server can discern that there is a network partition, it
is free to treat the situation as if the client has crashed is free to treat the situation as if the client has crashed
permanently. permanently.
2.10.11.2.4. Backchannel Connection Loss 2.10.12.2.4. Backchannel Connection Loss
If there were callback requests outstanding at the time of a If there were callback requests outstanding at the time of a
connection loss, then the server MUST retry the request, as described connection loss, then the server MUST retry the request, as described
in Section 2.10.5.2. Note that it is not necessary to retry requests in Section 2.10.6.2. Note that it is not necessary to retry requests
over a connection with the same source network address or the same over a connection with the same source network address or the same
destination network address as the lost connection. As long as the destination network address as the lost connection. As long as the
session ID, slot ID, and sequence ID in the retry match that of the session ID, slot ID, and sequence ID in the retry match that of the
original request, the callback target will recognize the request as a original request, the callback target will recognize the request as a
retry even if it did see the request prior to disconnect. retry even if it did see the request prior to disconnect.
If the connection lost is the last one associated with the If the connection lost is the last one associated with the
backchannel, then the server MUST indicate that in the backchannel, then the server MUST indicate that in the
sr_status_flags field of every SEQUENCE reply until the backchannel sr_status_flags field of every SEQUENCE reply until the backchannel
is reestablished. There are two situations each of which use is reestablished. There are two situations each of which use
different status flags: no connectivity for the session's different status flags: no connectivity for the session's
backchannel, and no connectivity for any session backchannel of the backchannel, and no connectivity for any session backchannel of the
client. See Section 18.46 for a description of the appropriate flags client. See Section 18.46 for a description of the appropriate flags
in sr_status_flags. in sr_status_flags.
2.10.11.2.5. GSS Context Loss 2.10.12.2.5. GSS Context Loss
The server SHOULD monitor when the number of RPCSEC_GSS contexts The server SHOULD monitor when the number of RPCSEC_GSS contexts
assigned to the backchannel reaches one, and when that one context is assigned to the backchannel reaches one, and when that one context is
near expiry (i.e. between one and two periods of lease time), near expiry (i.e. between one and two periods of lease time),
indicate so in the sr_status_flags field of all SEQUENCE replies. indicate so in the sr_status_flags field of all SEQUENCE replies.
The server MUST indicate when all of the backchannel's assigned The server MUST indicate when all of the backchannel's assigned
RPCSEC_GSS contexts have expired in the sr_status_flags field of all RPCSEC_GSS contexts have expired in the sr_status_flags field of all
SEQUENCE replies. SEQUENCE replies.
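One way a server might derive these two indications is sketched
below. The flag names and their numeric values are placeholders, not
the wire values of sr_status_flags (see Section 18.46 for those). The
sketch also reads "near expiry" as a remaining lifetime of at most
two lease periods, which is an assumption about the intended
threshold.

```python
from enum import Flag, auto


class CbGssStatus(Flag):
    """Placeholder flags; not the sr_status_flags wire values."""
    NONE = 0
    EXPIRING = auto()
    EXPIRED = auto()


def gss_status(expiry_times, now, lease_time):
    """Classify the backchannel's assigned RPCSEC_GSS contexts."""
    if not expiry_times:
        # No contexts were ever assigned, so there is nothing to report.
        return CbGssStatus.NONE
    live = [t for t in expiry_times if t > now]
    if not live:
        # Every assigned context has expired (the MUST case).
        return CbGssStatus.EXPIRED
    if len(live) == 1 and live[0] - now <= 2 * lease_time:
        # One context left, and it is near expiry (the SHOULD case).
        return CbGssStatus.EXPIRING
    return CbGssStatus.NONE
```

The result would be folded into the sr_status_flags field of every
SEQUENCE reply until the client assigns new contexts.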
2.10.12. Parallel NFS and Sessions 2.10.13. Parallel NFS and Sessions
A client and server can potentially be a non-pNFS implementation, a A client and server can potentially be a non-pNFS implementation, a
metadata server implementation, a data server implementation, or two metadata server implementation, a data server implementation, or two
or three types of implementations. The EXCHGID4_FLAG_USE_NON_PNFS, or three types of implementations. The EXCHGID4_FLAG_USE_NON_PNFS,
EXCHGID4_FLAG_USE_PNFS_MDS, and EXCHGID4_FLAG_USE_PNFS_DS flags (not EXCHGID4_FLAG_USE_PNFS_MDS, and EXCHGID4_FLAG_USE_PNFS_DS flags (not
mutually exclusive) are passed in the EXCHANGE_ID arguments and mutually exclusive) are passed in the EXCHANGE_ID arguments and
results to allow the client to indicate how it wants to use sessions results to allow the client to indicate how it wants to use sessions
created under the client ID, and to allow the server to indicate how created under the client ID, and to allow the server to indicate how
it will allow the sessions to be used. See Section 13.1 for pNFS it will allow the sessions to be used. See Section 13.1 for pNFS
sessions considerations. sessions considerations.
skipping to change at page 84, line 48 skipping to change at page 89, line 33
3.3.9. netaddr4 3.3.9. netaddr4
struct netaddr4 { struct netaddr4 {
/* see struct rpcb in RFC 1833 */ /* see struct rpcb in RFC 1833 */
string na_r_netid<>; /* network id */ string na_r_netid<>; /* network id */
string na_r_addr<>; /* universal address */ string na_r_addr<>; /* universal address */
}; };
The netaddr4 data type is used to identify network transport The netaddr4 data type is used to identify network transport
endpoints. The r_netid and r_addr fields respectively contain a endpoints. The r_netid and r_addr fields respectively contain a
netid and uaddr. The netid and uaddr concepts are defined in in netid and uaddr. The netid and uaddr concepts are defined in [13].
[13]. The netid and uaddr formats for TCP over IPv4 and TCP over The netid and uaddr formats for TCP over IPv4 and TCP over IPv6 are
IPv6 are defined in [13], specifically Tables 2 and 3 and Sections defined in [13], specifically Tables 2 and 3 and Sections 3.2.3.3 and
3.2.3.3 and 3.2.3.4. 3.2.3.4.
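As an illustration of the uaddr form for TCP over IPv4, the sketch
below assumes the conventional "h1.h2.h3.h4.p1.p2" layout, in which
p1 and p2 are the high and low octets of the port. The helper names
are hypothetical; the normative format is the one given in [13].

```python
def ipv4_uaddr(host, port):
    """Build a universal address from a dotted quad and a port."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port out of range")
    return "%s.%d.%d" % (host, port >> 8, port & 0xFF)


def parse_ipv4_uaddr(uaddr):
    """Split a universal address back into (host, port)."""
    parts = uaddr.split(".")
    if len(parts) != 6:
        raise ValueError("expected h1.h2.h3.h4.p1.p2")
    host = ".".join(parts[:4])
    port = (int(parts[4]) << 8) | int(parts[5])
    return host, port


# A netaddr4 value for the NFS port on a documentation address.
example = {"na_r_netid": "tcp", "na_r_addr": ipv4_uaddr("192.0.2.1", 2049)}
```

Here port 2049 becomes the trailing ".8.1", since 2049 is 8 * 256 + 1.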
3.3.10. state_owner4 3.3.10. state_owner4
struct state_owner4 { struct state_owner4 {
clientid4 clientid; clientid4 clientid;
opaque owner<NFS4_OPAQUE_LIMIT>; opaque owner<NFS4_OPAQUE_LIMIT>;
}; };
typedef state_owner4 open_owner4; typedef state_owner4 open_owner4;
typedef state_owner4 lock_owner4; typedef state_owner4 lock_owner4;
skipping to change at page 160, line 31 skipping to change at page 165, line 31
careful, transport retransmission delays can result in the client careful, transport retransmission delays can result in the client
failing to detect a server restart before the grace period ends. failing to detect a server restart before the grace period ends.
The scenario is that the client is using a transport with The scenario is that the client is using a transport with
exponential back off, such that the maximum retransmission timeout exponential back off, such that the maximum retransmission timeout
exceeds both the grace period and the lease_time attribute. A exceeds both the grace period and the lease_time attribute. A
network partition causes the client's connection's retransmission network partition causes the client's connection's retransmission
interval to back off, and even after the partition heals, the next interval to back off, and even after the partition heals, the next
transport-level retransmission is sent after the server has transport-level retransmission is sent after the server has
restarted and its grace period ends. restarted and its grace period ends.
The client MUST either recover from the ensuing NFS4ERR_NOGRACE The client MUST either recover from the ensuing NFS4ERR_NO_GRACE
errors, or it MUST ensure that despite transport level errors, or it MUST ensure that despite transport level
retransmission intervals that exceed the lease_time, nonetheless a retransmission intervals that exceed the lease_time, nonetheless a
SEQUENCE operation is sent that renews the lease before SEQUENCE operation is sent that renews the lease before
expiration. The client can achieve this by associating a new expiration. The client can achieve this by associating a new
connection with the session, and sending a SEQUENCE operation on connection with the session, and sending a SEQUENCE operation on
it. However, if the attempt to establish a new connection is it. However, if the attempt to establish a new connection is
delayed for some reason (e.g. exponential backoff of the delayed for some reason (e.g. exponential backoff of the
connection establishment packets), the client will have to abort connection establishment packets), the client will have to abort
the connection establishment attempt before the lease expires, and the connection establishment attempt before the lease expires, and
attempt to re-connect. attempt to re-connect.
skipping to change at page 162, line 12 skipping to change at page 167, line 12
within the client or network buffers must wait until the client has within the client or network buffers must wait until the client has
successfully recovered the locks protecting the READ and WRITE successfully recovered the locks protecting the READ and WRITE
operations. Any that reach the server before the server can safely operations. Any that reach the server before the server can safely
determine that the client has recovered enough locking state to be determine that the client has recovered enough locking state to be
sure that such operations can be safely processed must be rejected. sure that such operations can be safely processed must be rejected.
This will happen because either: This will happen because either:
o The state presented is no longer valid since it is associated with o The state presented is no longer valid since it is associated with
a now invalid client ID. In this case the client will receive a now invalid client ID. In this case the client will receive
either an NFS4ERR_BADSESSION or NFS4ERR_DEADSESSION error, and any either an NFS4ERR_BADSESSION or NFS4ERR_DEADSESSION error, and any
attempt to attach a new session to the existing client ID will attempt to attach a new session to that invalid client ID will
result in an NFS4ERR_STALE_CLIENTID error. result in an NFS4ERR_STALE_CLIENTID error.
o Subsequent recovery of locks may make execution of the operation o Subsequent recovery of locks may make execution of the operation
inappropriate (NFS4ERR_GRACE). inappropriate (NFS4ERR_GRACE).
8.4.1. Client Failure and Recovery 8.4.1. Client Failure and Recovery
In the event that a client fails, the server may release the client's In the event that a client fails, the server may release the client's
locks when the associated lease has expired. Conflicting locks from locks when the associated lease has expired. Conflicting locks from
another client may only be granted after this lease expiration. As another client may only be granted after this lease expiration. As
skipping to change at page 166, line 39 skipping to change at page 171, line 39
A server may, upon restart, establish a new value for the lease A server may, upon restart, establish a new value for the lease
period. Therefore, clients should, once a new client ID is period. Therefore, clients should, once a new client ID is
established, refetch the lease_time attribute and use it as the basis established, refetch the lease_time attribute and use it as the basis
for lease renewal for the lease associated with that server. for lease renewal for the lease associated with that server.
However, the server must establish, for this restart event, a grace However, the server must establish, for this restart event, a grace
period at least as long as the lease period for the previous server period at least as long as the lease period for the previous server
instantiation. This allows the client state obtained during the instantiation. This allows the client state obtained during the
previous server instance to be reliably re-established. previous server instance to be reliably re-established.
The possibility exists that, because of server configuration events,
the client will be communicating with a server different than the one
on which the locks were obtained, as shown by the combination of
eir_server_scope and eir_server_owner. This leads to the issue of if
and when the client should attempt to reclaim locks previously
obtained on what is being reported as a different server. The rules
to resolve this question are as follows:
o If the server scope is different the client should not attempt to
reclaim locks. In this situation no lock reclaim is possible.
Any attempt to re-obtain the locks with non-reclaim operations is
problematic since there is no guarantee that the existing
filehandles will be recognized by the new server, or that if
recognized, they denote the same objects. It is best to treat the
locks as having been revoked by the reconfiguration event.
o If the server scope is the same, the client should attempt to
reclaim locks, even if the eir_server_owner value is different.
In this situation, it is the responsibility of the server to
return NFS4ERR_NO_GRACE if it cannot provide correct support for
lock reclaim operations, including the prevention of edge
conditions.
The eir_server_owner field is not used in making this determination.
Its function is to specify trunking possibilities for the client (see
Section 2.10.5) and not to control lock reclaim.
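The rules above reduce to a comparison of server scope values only.
In this sketch the function name and the returned strings are
hypothetical; the server owner arguments are accepted purely to show
that they play no part in the decision.

```python
def reclaim_decision(old_scope, new_scope, old_owner=None, new_owner=None):
    """Decide how to treat locks held before a reconfiguration.

    old_owner and new_owner (eir_server_owner values) are ignored on
    purpose: only the server scope controls lock reclaim.
    """
    if old_scope != new_scope:
        # No reclaim is possible; treat the locks as revoked.
        return "treat-as-revoked"
    # Same scope: attempt reclaim. The server answers with
    # NFS4ERR_NO_GRACE if it cannot support the reclaim correctly.
    return "attempt-reclaim"
```

A client using this would still need to handle NFS4ERR_NO_GRACE from
the individual reclaim operations.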
8.4.3. Network Partitions and Recovery 8.4.3. Network Partitions and Recovery
If the duration of a network partition is greater than the lease If the duration of a network partition is greater than the lease
period provided by the server, the server will not have received a period provided by the server, the server will not have received a
lease renewal from the client. If this occurs, the server may free lease renewal from the client. If this occurs, the server may free
all locks held for the client, or it may allow the lock state to all locks held for the client, or it may allow the lock state to
remain for a considerable period, subject to the constraint that if a remain for a considerable period, subject to the constraint that if a
request for a conflicting lock is made, locks associated with an request for a conflicting lock is made, locks associated with an
expired lease do not prevent such a conflicting lock from being expired lease do not prevent such a conflicting lock from being
granted but MUST be revoked as necessary so as not to interfere with granted but MUST be revoked as necessary so as not to interfere with
skipping to change at page 167, line 38 skipping to change at page 173, line 17
In addition, all I/O submitted by the client with the now invalid In addition, all I/O submitted by the client with the now invalid
stateids will fail with the server returning the error stateids will fail with the server returning the error
NFS4ERR_EXPIRED. Once the client learns of the loss of locking NFS4ERR_EXPIRED. Once the client learns of the loss of locking
state, it will suitably notify the applications that held the state, it will suitably notify the applications that held the
invalidated locks. The client should then take action to free invalidated locks. The client should then take action to free
invalidated stateids, either by establishing a new client ID using a invalidated stateids, either by establishing a new client ID using a
new verifier or by doing a FREE_STATEID operation to release each of new verifier or by doing a FREE_STATEID operation to release each of
the invalidated stateids. the invalidated stateids.
When the server adopts a finer-grained approach to revocation of When the server adopts a finer-grained approach to revocation of
locks when lease have expired, only a subset of stateids will locks when leases have expired, only a subset of stateids will
normally become invalid during a network partition. When the client normally become invalid during a network partition. When the client
can communicate with the server after such a network partition heals, can communicate with the server after such a network partition heals,
the status returned by the SEQUENCE operation will indicate a partial the status returned by the SEQUENCE operation will indicate a partial
loss of locking state (SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED). In loss of locking state (SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED). In
addition, operations, including I/O submitted by the client, with the addition, operations, including I/O submitted by the client, with the
now invalid stateids will fail with the server returning the error now invalid stateids will fail with the server returning the error
NFS4ERR_EXPIRED. Once the client learns of the loss of locking NFS4ERR_EXPIRED. Once the client learns of the loss of locking
state, it will use the TEST_STATEID operation on all of its stateids state, it will use the TEST_STATEID operation on all of its stateids
to determine which locks have been lost and then suitably notify the to determine which locks have been lost and then suitably notify the
applications that held the invalidated locks. The client can then applications that held the invalidated locks. The client can then
skipping to change at page 225, line 32 skipping to change at page 231, line 14
11.5. Location Entries and Server Identity 11.5. Location Entries and Server Identity
As mentioned above, a single location entry may have a server address As mentioned above, a single location entry may have a server address
target in the form of a DNS name which may represent multiple IP target in the form of a DNS name which may represent multiple IP
addresses, while multiple location entries may have their own server addresses, while multiple location entries may have their own server
address targets, that reference the same server. Whether two IP address targets, that reference the same server. Whether two IP
addresses designate the same server is indicated by the existence of addresses designate the same server is indicated by the existence of
a common so_major_id field within the eir_server_owner field returned a common so_major_id field within the eir_server_owner field returned
by EXCHANGE_ID (see Section 18.35.3), subject to further by EXCHANGE_ID (see Section 18.35.3), subject to further
verification, for details of which see Section 2.10.4. verification, for details of which see Section 2.10.5.
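A client might collect candidate same-server addresses by grouping
EXCHANGE_ID results on so_major_id, as sketched below. The function
name and the input shape are assumptions for the example. Addresses
that land in the same group are only candidates until the further
verification mentioned above has been performed.

```python
from collections import defaultdict


def group_by_major_id(exchange_id_results):
    """Group server addresses by the so_major_id each one returned.

    exchange_id_results: iterable of (address, so_major_id) pairs.
    """
    groups = defaultdict(list)
    for address, so_major_id in exchange_id_results:
        groups[so_major_id].append(address)
    return dict(groups)
```

Each resulting list holds addresses that may designate one server.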
When multiple addresses for the same server exist, the client may When multiple addresses for the same server exist, the client may
assume that for each file system in the namespace of a given server assume that for each file system in the namespace of a given server
network address, there exist file systems at corresponding namespace network address, there exist file systems at corresponding namespace
locations for each of the other server network addresses. It may do locations for each of the other server network addresses. It may do
this even in the absence of explicit listing in fs_locations and this even in the absence of explicit listing in fs_locations and
fs_locations_info. Such corresponding file system locations can be fs_locations_info. Such corresponding file system locations can be
used as alternate locations, just as those explicitly specified via used as alternate locations, just as those explicitly specified via
the fs_locations and fs_locations_info attributes. Where these the fs_locations and fs_locations_info attributes. Where these
specific addresses are explicitly designated in the fs_locations_info specific addresses are explicitly designated in the fs_locations_info
skipping to change at page 229, line 43 skipping to change at page 235, line 26
When the conditions in Section 11.7.2 hold, in either of the When the conditions in Section 11.7.2 hold, in either of the
following two cases, the client may use the two file system instances following two cases, the client may use the two file system instances
simultaneously. simultaneously.
o The fs_locations_info attribute does not contain separate per- o The fs_locations_info attribute does not contain separate per-
network-address entries for file system instances at the distinct network-address entries for file system instances at the distinct
network addresses. This includes the case in which the network addresses. This includes the case in which the
fs_locations_info attribute is unavailable. In this case, the fs_locations_info attribute is unavailable. In this case, the
fact that the two server addresses connect to the same server (as fact that the two server addresses connect to the same server (as
indicated by the two addresses sharing the same so_major_id indicated by the two addresses sharing the same so_major_id
value and subsequently confirmed as described in Section 2.10.4) value and subsequently confirmed as described in Section 2.10.5)
justifies simultaneous use and there is no fs_locations_info justifies simultaneous use and there is no fs_locations_info
attribute information contradicting that. attribute information contradicting that.
o The fs_locations_info attribute indicates that two file system o The fs_locations_info attribute indicates that two file system
instances belong to the same _simultaneous-use_ class. instances belong to the same _simultaneous-use_ class.
In this case, the client may use both file system instances In this case, the client may use both file system instances
simultaneously, as representations of the same file system, whether simultaneously, as representations of the same file system, whether
that happens because the two network addresses connect to the same that happens because the two network addresses connect to the same
physical server or because different servers connect to clustered physical server or because different servers connect to clustered
skipping to change at page 234, line 26 skipping to change at page 240, line 10
which they have not. Cooperation by two servers in state management which they have not. Cooperation by two servers in state management
requires coordination of client IDs. Before the client attempts to requires coordination of client IDs. Before the client attempts to
use a client ID associated with one server in a request to the server use a client ID associated with one server in a request to the server
of the other file system, it must eliminate the possibility that two of the other file system, it must eliminate the possibility that two
non-cooperating servers have assigned the same client ID by accident. non-cooperating servers have assigned the same client ID by accident.
The client needs to compare the eir_server_scope values returned by The client needs to compare the eir_server_scope values returned by
each server. If the scope values do not match, then the servers have each server. If the scope values do not match, then the servers have
not cooperated in state management. If the scope values match, then not cooperated in state management. If the scope values match, then
this indicates the servers have cooperated in assigning client IDs to this indicates the servers have cooperated in assigning client IDs to
the point that they will reject client IDs that refer to state they the point that they will reject client IDs that refer to state they
do not know about. do not know about. See Section 2.10.4 for more information about the
use of server scope.
In the case of migration, the servers involved in the migration of a In the case of migration, the servers involved in the migration of a
file system SHOULD transfer all server state from the original to the file system SHOULD transfer all server state from the original to the
new server. When this is done, it must be done in a way that is new server. When this is done, it must be done in a way that is
transparent to the client. With replication, such a degree of common transparent to the client. With replication, such a degree of common
state is typically not the case. Clients, however, should use the state is typically not the case. Clients, however, should use the
information provided by the eir_server_scope returned by EXCHANGE_ID information provided by the eir_server_scope returned by EXCHANGE_ID
to determine whether such sharing may be in effect, rather than (as modified by the validation procedures described in
making assumptions based on the reason for the transition. Section 2.10.4) to determine whether such sharing may be in effect,
rather than making assumptions based on the reason for the
transition.
This state transfer will reduce disruption to the client when a file This state transfer will reduce disruption to the client when a file
system transition occurs. If the servers are successful in system transition occurs. If the servers are successful in
transferring all state, the client can attempt to establish sessions transferring all state, the client can attempt to establish sessions
associated with the client ID used for the source file system associated with the client ID used for the source file system
instance. If the server accepts that as a valid client ID, then the instance. If the server accepts that as a valid client ID, then the
client may use the existing stateids associated with that client ID client may use the existing stateids associated with that client ID
for the old file system instance in connection with that same client for the old file system instance in connection with that same client
ID in connection with the transitioned file system instance. ID in connection with the transitioned file system instance. If the
client in question already had a client ID on the target system, it
may interrogate the state ID values from the source system under that
new client ID, with the assurance that if they are accepted as valid,
then they represent validly transferred lock state for the source
file system, transferred to the target server.
When the two servers belong to the same server scope, it does not When the two servers belong to the same server scope, it does not
mean that when dealing with the transition, the client will not have mean that when dealing with the transition, the client will not have
to reclaim state. However, it does mean that the client may proceed to reclaim state. However, it does mean that the client may proceed
using its current client ID when establishing communication with the using its current client ID when establishing communication with the
new server and the new server will either recognize the client ID as new server and the new server will either recognize the client ID as
valid, or reject it, in which case locks must be reclaimed by the valid, or reject it, in which case locks must be reclaimed by the
client. client.
File systems co-operating in state management may actually share File systems co-operating in state management may actually share
skipping to change at page 235, line 18 skipping to change at page 241, line 10
reject as stale) each other's stateids and client IDs. Servers which reject as stale) each other's stateids and client IDs. Servers which
do share state may not do so under all conditions or at all times. do share state may not do so under all conditions or at all times.
The requirement for the server is that if it cannot be sure in The requirement for the server is that if it cannot be sure in
accepting a client ID that it reflects the locks the client was accepting a client ID that it reflects the locks the client was
given, it must treat all associated state as stale and report it as given, it must treat all associated state as stale and report it as
such to the client. such to the client.
When the two file system instances are on servers that do not share a When the two file system instances are on servers that do not share a
server scope value, the client must establish a new client ID on the server scope value, the client must establish a new client ID on the
destination, if it does not have one already, and reclaim locks if destination, if it does not have one already, and reclaim locks if
possible. In this case, old stateids and client IDs should not be allowed by the server. In this case, old stateids and client IDs
presented to the new server since there is no assurance that they should not be presented to the new server since there is no assurance
will not conflict with IDs valid on that server. that they will not conflict with IDs valid on that server. Note that
in this case lock reclaim may be attempted even when the servers
involved in the transfer have different server scope values (see
Section 8.4.2.1 for the contrary case of reclaim after server reboot).
Servers with different server scope values may co-operate to allow
reclaim for locks associated with the transfer of a filesystem even
if they do not co-operate sufficiently to share a server scope.
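
The two cases above (matching and non-matching server scope) can be summarized in a short sketch. It is an assumption-laden illustration, not text from either draft: TransitionAction and choose_transition_action are hypothetical names, and the validation of eir_server_scope values described in Section 2.10.4 is assumed to have been done already.

```python
from enum import Enum


class TransitionAction(Enum):
    # Hypothetical labels for the two client behaviors described above.
    TRY_EXISTING_CLIENT_ID = "present current client ID; reclaim if rejected"
    ESTABLISH_NEW_CLIENT_ID = "establish new client ID; reclaim if allowed"


def choose_transition_action(source_scope: bytes,
                             destination_scope: bytes) -> TransitionAction:
    """Pick the client's first step after a file system transition.

    Inputs stand for the eir_server_scope values returned by each
    server's EXCHANGE_ID, assumed to be already validated.
    """
    if source_scope == destination_scope:
        # Same scope: the destination either recognizes the client ID
        # or rejects it, in which case locks must be reclaimed.
        return TransitionAction.TRY_EXISTING_CLIENT_ID
    # Different scope: old client IDs and stateids should not be
    # presented, since they could conflict with IDs valid there.
    return TransitionAction.ESTABLISH_NEW_CLIENT_ID
```

A matching scope does not guarantee that state was transferred; it only makes it safe to present the existing client ID and let the destination server accept or reject it.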
In either case, when actual locks are not known to be maintained, the In either case, when actual locks are not known to be maintained, the
destination server may establish a grace period specific to the given destination server may establish a grace period specific to the given
file system, with non-reclaim locks being rejected for that file file system, with non-reclaim locks being rejected for that file
system, even though normal locks are being granted for other file system, even though normal locks are being granted for other file
systems. Clients should not infer the absence of a grace period for systems. Clients should not infer the absence of a grace period for
file systems being transitioned to a server from responses to file systems being transitioned to a server from responses to
requests for other file systems. requests for other file systems.
In the case of lock reclamation for a given file system after a file In the case of lock reclamation for a given file system after a file
skipping to change at page 282, line 41 skipping to change at page 288, line 41
layout stateid. If the "seqid" is not one higher than what the layout stateid. If the "seqid" is not one higher than what the
client currently has recorded, and the client has at least one client currently has recorded, and the client has at least one
LAYOUTGET and/or LAYOUTRETURN operation outstanding, the client knows LAYOUTGET and/or LAYOUTRETURN operation outstanding, the client knows
the server sent the CB_LAYOUTRECALL after sending a response to an the server sent the CB_LAYOUTRECALL after sending a response to an
outstanding LAYOUTGET or LAYOUTRETURN. The client MUST wait before outstanding LAYOUTGET or LAYOUTRETURN. The client MUST wait before
processing such a CB_LAYOUTRECALL until it processes all replies for processing such a CB_LAYOUTRECALL until it processes all replies for
outstanding LAYOUTGET and LAYOUTRETURN operations for the outstanding LAYOUTGET and LAYOUTRETURN operations for the
corresponding file with seqid less than the seqid given by corresponding file with seqid less than the seqid given by
CB_LAYOUTRECALL (lor_stateid, see Section 20.3.) CB_LAYOUTRECALL (lor_stateid, see Section 20.3.)
In addition to the seqid-based mechanism, Section 2.10.5.3 describes In addition to the seqid-based mechanism, Section 2.10.6.3 describes
the sessions mechanism for allowing the client to detect callback the sessions mechanism for allowing the client to detect callback
race conditions and delay processing such a CB_LAYOUTRECALL. The race conditions and delay processing such a CB_LAYOUTRECALL. The
server MAY reference conflicting operations in the CB_SEQUENCE that server MAY reference conflicting operations in the CB_SEQUENCE that
precedes the CB_LAYOUTRECALL. Because the server has already sent precedes the CB_LAYOUTRECALL. Because the server has already sent
replies for these operations before issuing the callback, the replies replies for these operations before issuing the callback, the replies
may race with the CB_LAYOUTRECALL. The client MUST wait for all the may race with the CB_LAYOUTRECALL. The client MUST wait for all the
referenced calls to complete and update its view of the layout state referenced calls to complete and update its view of the layout state
before processing the CB_LAYOUTRECALL. before processing the CB_LAYOUTRECALL.
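
The seqid-based check for a racing CB_LAYOUTRECALL can be sketched as below. This is a simplified illustration under stated assumptions: must_delay_recall and its parameters are hypothetical, and seqid wraparound (covered in Section 12.5.5.2.1.4) is ignored for brevity.

```python
def must_delay_recall(recorded_seqid: int,
                      recall_seqid: int,
                      outstanding_layout_ops: int) -> bool:
    """Decide whether processing of a CB_LAYOUTRECALL has to wait.

    recorded_seqid: seqid the client currently has for the layout stateid.
    recall_seqid: seqid carried in the recall's stateid.
    outstanding_layout_ops: number of LAYOUTGET/LAYOUTRETURN operations
        still awaiting replies.

    Wraparound of the seqid is not handled in this sketch.
    """
    is_next_in_sequence = recall_seqid == recorded_seqid + 1
    return (not is_next_in_sequence) and outstanding_layout_ops > 0
```

When the function returns True, the client would hold the recall until the replies for the outstanding operations with lower seqid values have been processed.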
12.5.5.2.1.1. Get/Return Sequencing 12.5.5.2.1.1. Get/Return Sequencing
skipping to change at page 285, line 24 skipping to change at page 291, line 24
12.5.5.2.1.4. Wraparound and Validation of Seqid 12.5.5.2.1.4. Wraparound and Validation of Seqid
The rules for layout stateid processing differ from other stateids in The rules for layout stateid processing differ from other stateids in
the protocol because the "seqid" value cannot be zero and the the protocol because the "seqid" value cannot be zero and the
stateid's "seqid" value changes in a CB_LAYOUTRECALL operation. The stateid's "seqid" value changes in a CB_LAYOUTRECALL operation. The
non-zero requirement combined with the inherent parallelism of layout non-zero requirement combined with the inherent parallelism of layout
operations means that a set of LAYOUTGET and LAYOUTRETURN operations operations means that a set of LAYOUTGET and LAYOUTRETURN operations
may contain the same value for "seqid". The server uses a slightly may contain the same value for "seqid". The server uses a slightly
modified version of the modulo arithmetic as described in modified version of the modulo arithmetic as described in
Section 2.10.5.1 when incrementing the layout stateid's "seqid". The Section 2.10.6.1 when incrementing the layout stateid's "seqid". The
modification to that modulo arithmetic description is to not use modification to that modulo arithmetic description is to not use
zero. The modulo arithmetic is also used for the comparisons of zero. The modulo arithmetic is also used for the comparisons of
"seqid" values in the processing of CB_LAYOUTRECALL events as "seqid" values in the processing of CB_LAYOUTRECALL events as
described above in Section 12.5.5.2.1.3. described above in Section 12.5.5.2.1.3.
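
The modified modulo arithmetic (never producing zero) can be illustrated as follows. This is a minimal sketch: next_layout_seqid is a hypothetical helper name, and the 32-bit width reflects the unsigned 32-bit "seqid" field of a stateid.

```python
SEQID_MODULUS = 2 ** 32  # "seqid" is an unsigned 32-bit value


def next_layout_seqid(seqid: int) -> int:
    """Increment a layout stateid seqid, skipping the value zero."""
    candidate = (seqid + 1) % SEQID_MODULUS
    # Zero is not usable for layout stateids, so wrap to one instead.
    return 1 if candidate == 0 else candidate
```

With this rule, the successor of 0xFFFFFFFF is 1 rather than 0.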
Just as the server validates the "seqid" in the event of Just as the server validates the "seqid" in the event of
CB_LAYOUTRECALL usage, as described in Section 12.5.5.2.1.3, the CB_LAYOUTRECALL usage, as described in Section 12.5.5.2.1.3, the
server also validates the "seqid" value to ensure that it is within server also validates the "seqid" value to ensure that it is within
an appropriate range. This range represents the degree of an appropriate range. This range represents the degree of
parallelism the server supports for layout stateids. If the client parallelism the server supports for layout stateids. If the client
skipping to change at page 290, line 27 skipping to change at page 296, line 27
the lease expiration. First, for all modified but uncommitted data, the lease expiration. First, for all modified but uncommitted data,
write it to the metadata server using the FILE_SYNC4 flag for the write it to the metadata server using the FILE_SYNC4 flag for the
WRITEs or WRITE and COMMIT. Second, the client reestablishes a WRITEs or WRITE and COMMIT. Second, the client reestablishes a
client ID and session with the server and obtains new layouts and client ID and session with the server and obtains new layouts and
device ID to device address mappings for the modified data ranges and device ID to device address mappings for the modified data ranges and
then writes the data to the storage devices with the newly obtained then writes the data to the storage devices with the newly obtained
layouts. layouts.
If sr_status_flags from the metadata server has If sr_status_flags from the metadata server has
SEQ4_STATUS_RESTART_RECLAIM_NEEDED set (or SEQUENCE returns SEQ4_STATUS_RESTART_RECLAIM_NEEDED set (or SEQUENCE returns
NFS4ERR_STALE_CLIENTID, or SEQUENCE returns NFS4ERR_BADSESSION and NFS4ERR_BADSESSION and CREATE_SESSION returns
CREATE_SESSION returns NFS4ERR_STALE_CLIENTID) then the metadata NFS4ERR_STALE_CLIENTID) then the metadata server has restarted, and
server has restarted, and the client SHOULD recover using the methods the client SHOULD recover using the methods described in
described in Section 12.7.4. Section 12.7.4.
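
The restart test in the newer (right-hand) wording can be sketched as a single predicate. This is only an illustration: metadata_server_restarted and its string-valued status arguments are hypothetical, and the session error is spelled as it is defined in Section 15.1.11.1.

```python
from typing import Optional


def metadata_server_restarted(restart_reclaim_needed: bool,
                              sequence_status: str,
                              create_session_status: Optional[str] = None) -> bool:
    """Return True when the client should treat the MDS as restarted.

    restart_reclaim_needed: whether SEQ4_STATUS_RESTART_RECLAIM_NEEDED
        was set in sr_status_flags.
    sequence_status: status name returned by SEQUENCE.
    create_session_status: status name returned by a follow-up
        CREATE_SESSION, if one was attempted.
    """
    if restart_reclaim_needed:
        return True
    return (sequence_status == "NFS4ERR_BADSESSION"
            and create_session_status == "NFS4ERR_STALE_CLIENTID")
```

A True result corresponds to the case in which the client SHOULD recover using the methods of Section 12.7.4.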
If sr_status_flags from the metadata server has If sr_status_flags from the metadata server has
SEQ4_STATUS_LEASE_MOVED set, then the client recovers by following SEQ4_STATUS_LEASE_MOVED set, then the client recovers by following
the procedure described in Section 11.7.7.1. After that, the client the procedure described in Section 11.7.7.1. After that, the client
may get an indication that the layout state was not moved with the may get an indication that the layout state was not moved with the
file system. The client recovers as in the other applicable file system. The client recovers as in the other applicable
situations discussed in Paragraph 1 or Paragraph 2 of this section. situations discussed in Paragraph 1 or Paragraph 2 of this section.
If sr_status_flags reports no loss of state, then the lease for the If sr_status_flags reports no loss of state, then the lease for the
layouts the client has are valid and renewed, and the client can once layouts the client has are valid and renewed, and the client can once
skipping to change at page 298, line 22 skipping to change at page 304, line 22
Another scenario is for the metadata server and the storage device to Another scenario is for the metadata server and the storage device to
be distinct from one client's point of view, and the roles reversed be distinct from one client's point of view, and the roles reversed
from another client's point of view. For example, in the cluster from another client's point of view. For example, in the cluster
file system model, a metadata server to one client may be a data file system model, a metadata server to one client may be a data
server to another client. If NFSv4.1 is being used as the storage server to another client. If NFSv4.1 is being used as the storage
protocol, then pNFS servers need to encode the values of filehandles protocol, then pNFS servers need to encode the values of filehandles
according to their specific roles. according to their specific roles.
13.1.1. Sessions Considerations for Data Servers 13.1.1. Sessions Considerations for Data Servers
Section 2.10.9.2 states that a client has to keep its lease renewed Section 2.10.10.2 states that a client has to keep its lease renewed
in order to prevent a session from being deleted by the server. If in order to prevent a session from being deleted by the server. If
the reply to EXCHANGE_ID has just the EXCHGID4_FLAG_USE_PNFS_DS role the reply to EXCHANGE_ID has just the EXCHGID4_FLAG_USE_PNFS_DS role
set, then as noted in Section 13.6 the client will not be able to set, then as noted in Section 13.6 the client will not be able to
determine the data server's lease_time attribute, because GETATTR determine the data server's lease_time attribute, because GETATTR
will not be permitted. Instead, the rule is that any time a client will not be permitted. Instead, the rule is that any time a client
receives a layout referring it to a data server that returns just the receives a layout referring it to a data server that returns just the
EXCHGID4_FLAG_USE_PNFS_DS role, the client MAY assume that the EXCHGID4_FLAG_USE_PNFS_DS role, the client MAY assume that the
lease_time attribute from the metadata server that returned the lease_time attribute from the metadata server that returned the
layout applies to the data server. Thus the data server MUST be layout applies to the data server. Thus the data server MUST be
aware of the values of all lease_time attributes of all metadata aware of the values of all lease_time attributes of all metadata
skipping to change at page 310, line 30 skipping to change at page 316, line 30
data server 2. Unless data server 2 has two filehandles (each data server 2. Unless data server 2 has two filehandles (each
referring to a different data file), then, for example, a write to referring to a different data file), then, for example, a write to
logical stripe unit 1 overwrites the write to logical stripe unit 2, logical stripe unit 1 overwrites the write to logical stripe unit 2,
because both logical stripe units are located in the same stripe unit because both logical stripe units are located in the same stripe unit
(0) of data server 2. (0) of data server 2.
13.5. Data Server Multipathing 13.5. Data Server Multipathing
The NFSv4.1 file layout supports multipathing to multiple data server The NFSv4.1 file layout supports multipathing to multiple data server
addresses. Data server-level multipathing is used for bandwidth addresses. Data server-level multipathing is used for bandwidth
scaling via trunking (Section 2.10.4) and for higher availability of scaling via trunking (Section 2.10.5) and for higher availability of
use in the case of a data server failure. Multipathing allows the use in the case of a data server failure. Multipathing allows the
client to switch to another data server address which may be that of client to switch to another data server address which may be that of
another data server that is exporting the same data stripe unit, another data server that is exporting the same data stripe unit,
without having to contact the metadata server for a new layout. without having to contact the metadata server for a new layout.
To support data server multipathing, each element of the To support data server multipathing, each element of the
nflda_multipath_ds_list contains an array of one or more data server nflda_multipath_ds_list contains an array of one or more data server
network addresses. This array (data type multipath_list4) represents network addresses. This array (data type multipath_list4) represents
a list of data servers (each identified by a network address), with a list of data servers (each identified by a network address), with
it being possible that some data servers will appear in the list it being possible that some data servers will appear in the list
skipping to change at page 311, line 18 skipping to change at page 317, line 18
the device ID to device address mappings to the available data the device ID to device address mappings to the available data
servers. If the device ID itself must be replaced, the MDS SHOULD servers. If the device ID itself must be replaced, the MDS SHOULD
recall all layouts with the device ID, and thus force the client to recall all layouts with the device ID, and thus force the client to
get new layouts and device ID mappings via LAYOUTGET and get new layouts and device ID mappings via LAYOUTGET and
GETDEVICEINFO. GETDEVICEINFO.
Generally if two network addresses appear in an element of Generally if two network addresses appear in an element of
nflda_multipath_ds_list they will designate the same data server and nflda_multipath_ds_list they will designate the same data server and
the two data server addresses will support the implementation of client the two data server addresses will support the implementation of client
ID or session trunking (the latter is RECOMMENDED) as defined in ID or session trunking (the latter is RECOMMENDED) as defined in
Section 2.10.4, and the two data server addresses will share the same Section 2.10.5, and the two data server addresses will share the same
server owner, or major ID of the server owner. It is not always server owner, or major ID of the server owner. It is not always
necessary for the two data server addresses to designate the same necessary for the two data server addresses to designate the same
server with trunking being used. For example the data could be read- server with trunking being used. For example the data could be read-
only, and the data consist of exact replicas. only, and the data consist of exact replicas.
13.6. Operations Sent to NFSv4.1 Data Servers 13.6. Operations Sent to NFSv4.1 Data Servers
Clients accessing data on an NFSv4.1 data server MUST send only the Clients accessing data on an NFSv4.1 data server MUST send only the
NULL procedure and COMPOUND procedures whose operations are taken NULL procedure and COMPOUND procedures whose operations are taken
only from two restricted subsets of the operations defined as valid only from two restricted subsets of the operations defined as valid
skipping to change at page 336, line 35 skipping to change at page 342, line 35
due to administrative interaction, possibly while the lease is valid. due to administrative interaction, possibly while the lease is valid.
15.1.5.2. NFS4ERR_BAD_STATEID (Error Code 10026) 15.1.5.2. NFS4ERR_BAD_STATEID (Error Code 10026)
A stateid does not properly designate any valid state. See A stateid does not properly designate any valid state. See
Section 8.2.4 and Section 8.2.3 for a discussion of how stateids are Section 8.2.4 and Section 8.2.3 for a discussion of how stateids are
validated. validated.
15.1.5.3. NFS4ERR_DELEG_REVOKED (Error Code 10087) 15.1.5.3. NFS4ERR_DELEG_REVOKED (Error Code 10087)
A stateid designates recallable locking state of any type that has A stateid designates recallable locking state of any type (delegation
been revoked due to the failure of the client to return the lock, or layout) that has been revoked due to the failure of the client to
when it was recalled. return the lock, when it was recalled.
15.1.5.4. NFS4ERR_EXPIRED (Error Code 10011) 15.1.5.4. NFS4ERR_EXPIRED (Error Code 10011)
A stateid designates locking state of any type that has been revoked A stateid designates locking state of any type that has been revoked
due to expiration of the client's lease, either immediately upon due to expiration of the client's lease, either immediately upon
lease expiration, or following a later request for a conflicting lease expiration, or following a later request for a conflicting
lock. lock.
15.1.5.5. NFS4ERR_OLD_STATEID (Error Code 10024) 15.1.5.5. NFS4ERR_OLD_STATEID (Error Code 10024)
skipping to change at page 342, line 7 skipping to change at page 348, line 7
server. server.
15.1.11. Session Use Errors 15.1.11. Session Use Errors
This section deals with errors encountered in using sessions, that This section deals with errors encountered in using sessions, that
is, in issuing requests over them using the Sequence (i.e., either is, in issuing requests over them using the Sequence (i.e., either
SEQUENCE or CB_SEQUENCE) operations. SEQUENCE or CB_SEQUENCE) operations.
15.1.11.1. NFS4ERR_BADSESSION (Error Code 10052) 15.1.11.1. NFS4ERR_BADSESSION (Error Code 10052)
A session ID was specified which does not exist. A session ID that was specified is not known to the server to which
the operation is addressed.
15.1.11.2. NFS4ERR_BADSLOT (Error Code 10053) 15.1.11.2. NFS4ERR_BADSLOT (Error Code 10053)
The requester sent a Sequence operation that attempted to use a slot The requester sent a Sequence operation that attempted to use a slot
the replier does not have in its slot table. It is possible the slot the replier does not have in its slot table. It is possible the slot
may have been retired. may have been retired.
15.1.11.3. NFS4ERR_BAD_HIGH_SLOT (Error Code 10077) 15.1.11.3. NFS4ERR_BAD_HIGH_SLOT (Error Code 10077)
The highest_slot argument in a Sequence operation exceeds the The highest_slot argument in a Sequence operation exceeds the
skipping to change at page 351, line 8 skipping to change at page 357, line 8
| | NFS4ERR_TOO_MANY_OPS, | | | NFS4ERR_TOO_MANY_OPS, |
| | NFS4ERR_UNKNOWN_LAYOUTTYPE | | | NFS4ERR_UNKNOWN_LAYOUTTYPE |
| GETFH | NFS4ERR_FHEXPIRED, NFS4ERR_MOVED, | | GETFH | NFS4ERR_FHEXPIRED, NFS4ERR_MOVED, |
| | NFS4ERR_NOFILEHANDLE, | | | NFS4ERR_NOFILEHANDLE, |
| | NFS4ERR_OP_NOT_IN_SESSION, NFS4ERR_STALE | | | NFS4ERR_OP_NOT_IN_SESSION, NFS4ERR_STALE |
| ILLEGAL | NFS4ERR_BADXDR, NFS4ERR_OP_ILLEGAL | | ILLEGAL | NFS4ERR_BADXDR, NFS4ERR_OP_ILLEGAL |
| LAYOUTCOMMIT | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, | | LAYOUTCOMMIT | NFS4ERR_ACCESS, NFS4ERR_ADMIN_REVOKED, |
| | NFS4ERR_ATTRNOTSUPP, NFS4ERR_BADIOMODE, | | | NFS4ERR_ATTRNOTSUPP, NFS4ERR_BADIOMODE, |
| | NFS4ERR_BADLAYOUT, NFS4ERR_BADXDR, | | | NFS4ERR_BADLAYOUT, NFS4ERR_BADXDR, |
| | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, |
| | NFS4ERR_EXPIRED, NFS4ERR_FBIG, | | | NFS4ERR_DELEG_REVOKED, NFS4ERR_EXPIRED, |
| | NFS4ERR_FHEXPIRED, NFS4ERR_GRACE, | | | NFS4ERR_FBIG, NFS4ERR_FHEXPIRED, |
| | NFS4ERR_INVAL, NFS4ERR_IO, NFS4ERR_ISDIR, | | | NFS4ERR_GRACE, NFS4ERR_INVAL, NFS4ERR_IO, |
| | NFS4ERR_MOVED, NFS4ERR_NOFILEHANDLE, | | | NFS4ERR_ISDIR, NFS4ERR_MOVED, |
| | NFS4ERR_NOTSUPP, NFS4ERR_NO_GRACE, | | | NFS4ERR_NOFILEHANDLE, NFS4ERR_NOTSUPP, |
| | NFS4ERR_NO_GRACE, |
| | NFS4ERR_OP_NOT_IN_SESSION, | | | NFS4ERR_OP_NOT_IN_SESSION, |
| | NFS4ERR_RECLAIM_BAD, | | | NFS4ERR_RECLAIM_BAD, |
| | NFS4ERR_RECLAIM_CONFLICT, | | | NFS4ERR_RECLAIM_CONFLICT, |
| | NFS4ERR_REP_TOO_BIG, | | | NFS4ERR_REP_TOO_BIG, |
| | NFS4ERR_REP_TOO_BIG_TO_CACHE, | | | NFS4ERR_REP_TOO_BIG_TO_CACHE, |
| | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, |
| | NFS4ERR_STALE, NFS4ERR_SYMLINK, | | | NFS4ERR_STALE, NFS4ERR_SYMLINK, |
| | NFS4ERR_TOO_MANY_OPS, | | | NFS4ERR_TOO_MANY_OPS, |
| | NFS4ERR_UNKNOWN_LAYOUTTYPE, | | | NFS4ERR_UNKNOWN_LAYOUTTYPE, |
| | NFS4ERR_WRONG_CRED | | | NFS4ERR_WRONG_CRED |
skipping to change at page 368, line 35 skipping to change at page 374, line 35
| | OPENATTR, OPEN_DOWNGRADE, | | | OPENATTR, OPEN_DOWNGRADE, |
| | PUTFH, PUTPUBFH, PUTROOTFH, | | | PUTFH, PUTPUBFH, PUTROOTFH, |
| | READ, READDIR, READLINK, | | | READ, READDIR, READLINK, |
| | RECLAIM_COMPLETE, REMOVE, | | | RECLAIM_COMPLETE, REMOVE, |
| | RENAME, SECINFO, | | | RENAME, SECINFO, |
| | SECINFO_NO_NAME, SEQUENCE, | | | SECINFO_NO_NAME, SEQUENCE, |
| | SETATTR, SET_SSV, | | | SETATTR, SET_SSV, |
| | TEST_STATEID, VERIFY, | | | TEST_STATEID, VERIFY, |
| | WANT_DELEGATION, WRITE | | | WANT_DELEGATION, WRITE |
| NFS4ERR_DELEG_ALREADY_WANTED | OPEN, WANT_DELEGATION | | NFS4ERR_DELEG_ALREADY_WANTED | OPEN, WANT_DELEGATION |
| NFS4ERR_DELEG_REVOKED | DELEGRETURN, LAYOUTGET, | | NFS4ERR_DELEG_REVOKED | DELEGRETURN, LAYOUTCOMMIT, |
| | LAYOUTRETURN, OPEN, READ, | | | LAYOUTGET, LAYOUTRETURN, |
| | SETATTR, WRITE | | | OPEN, READ, SETATTR, WRITE |
| NFS4ERR_DENIED | LOCK, LOCKT | | NFS4ERR_DENIED | LOCK, LOCKT |
| NFS4ERR_DIRDELEG_UNAVAIL | GET_DIR_DELEGATION | | NFS4ERR_DIRDELEG_UNAVAIL | GET_DIR_DELEGATION |
| NFS4ERR_DQUOT | CREATE, LAYOUTGET, LINK, | | NFS4ERR_DQUOT | CREATE, LAYOUTGET, LINK, |
| | OPEN, OPENATTR, RENAME, | | | OPEN, OPENATTR, RENAME, |
| | SETATTR, WRITE | | | SETATTR, WRITE |
| NFS4ERR_ENCR_ALG_UNSUPP | EXCHANGE_ID | | NFS4ERR_ENCR_ALG_UNSUPP | EXCHANGE_ID |
| NFS4ERR_EXIST | CREATE, LINK, OPEN, RENAME | | NFS4ERR_EXIST | CREATE, LINK, OPEN, RENAME |
| NFS4ERR_EXPIRED | CLOSE, DELEGRETURN, | | NFS4ERR_EXPIRED | CLOSE, DELEGRETURN, |
| | LAYOUTCOMMIT, LAYOUTRETURN, | | | LAYOUTCOMMIT, LAYOUTRETURN, |
| | LOCK, LOCKU, OPEN, | | | LOCK, LOCKU, OPEN, |
skipping to change at page 385, line 39 skipping to change at page 391, line 39
The COMPOUND procedure is used to combine individual operations into The COMPOUND procedure is used to combine individual operations into
a single RPC request. The server interprets each of the operations a single RPC request. The server interprets each of the operations
in turn. If an operation is executed by the server and the status of in turn. If an operation is executed by the server and the status of
that operation is NFS4_OK, then the next operation in the COMPOUND that operation is NFS4_OK, then the next operation in the COMPOUND
procedure is executed. The server continues this process until there procedure is executed. The server continues this process until there
are no more operations to be executed or one of the operations has a are no more operations to be executed or one of the operations has a
status value other than NFS4_OK. status value other than NFS4_OK.
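The evaluation rule described above can be sketched as a short loop. This is only an illustration, not the draft's reference behavior: the `Operation` type and the `run_compound` helper are hypothetical names, and each operation is modeled as a callable that returns a (status, result) pair.

```python
from typing import Callable, List, Tuple

NFS4_OK = 0  # success status, as named in the text

# Hypothetical modeling choice: an operation is a callable returning (status, result).
Operation = Callable[[], Tuple[int, object]]


def run_compound(ops: List[Operation]) -> Tuple[int, List[Tuple[int, object]]]:
    """Run operations in order; stop after the first status other than NFS4_OK."""
    results: List[Tuple[int, object]] = []
    status = NFS4_OK
    for op in ops:
        status, value = op()
        results.append((status, value))
        if status != NFS4_OK:
            break  # later operations are not executed
    return status, results
```

The returned list holds one entry per operation that was actually executed, so its last entry carries the status that ended processing.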
In the processing of the COMPOUND procedure, the server may find that In the processing of the COMPOUND procedure, the server may find that
it does not have the available resources to execute any or all of the it does not have the available resources to execute any or all of the
operations within the COMPOUND sequence. See Section 2.10.5.4 for a operations within the COMPOUND sequence. See Section 2.10.6.4 for a
more detailed discussion. more detailed discussion.
The server will generally choose between two methods of decoding the The server will generally choose between two methods of decoding the
client's request. The first would be the traditional one pass XDR client's request. The first would be the traditional one pass XDR
decode. If there is an XDR decoding error in this case, the RPC XDR decode. If there is an XDR decoding error in this case, the RPC XDR
decode error would be returned. The second method would be to make decode error would be returned. The second method would be to make
an initial pass to decode the basic COMPOUND request and then to XDR an initial pass to decode the basic COMPOUND request and then to XDR
decode the individual operations; the most interesting is the decode decode the individual operations; the most interesting is the decode
of attributes. In this case, the server may encounter an XDR decode of attributes. In this case, the server may encounter an XDR decode
error during the second pass. In this case, the server would return error during the second pass. In this case, the server would return
skipping to change at page 410, line 5 skipping to change at page 416, line 5
default: default:
void; void;
}; };
18.8.3. DESCRIPTION 18.8.3. DESCRIPTION
This operation returns the current filehandle value. This operation returns the current filehandle value.
On success, the current filehandle retains its value. On success, the current filehandle retains its value.
As described in Section 2.10.5.4, GETFH is REQUIRED or RECOMMENDED to As described in Section 2.10.6.4, GETFH is REQUIRED or RECOMMENDED to
immediately follow certain operations, and servers are free to reject immediately follow certain operations, and servers are free to reject
such operations if the client fails to insert GETFH in the request as such operations if the client fails to insert GETFH in the request as
REQUIRED or RECOMMENDED. Section 18.16.4.1 provides additional REQUIRED or RECOMMENDED. Section 18.16.4.1 provides additional
justification for why GETFH MUST follow OPEN. justification for why GETFH MUST follow OPEN.
18.8.4. IMPLEMENTATION 18.8.4. IMPLEMENTATION
Operations that change the current filehandle like LOOKUP or CREATE Operations that change the current filehandle like LOOKUP or CREATE
do not automatically return the new filehandle as a result. For do not automatically return the new filehandle as a result. For
instance, if a client needs to look up a directory entry and obtain instance, if a client needs to look up a directory entry and obtain
skipping to change at page 443, line 10 skipping to change at page 449, line 10
to determine which file to close. Therefore the client MUST follow to determine which file to close. Therefore the client MUST follow
every OPEN operation with a GETFH operation in the same COMPOUND every OPEN operation with a GETFH operation in the same COMPOUND
procedure. This will supply the client with the filehandle such that procedure. This will supply the client with the filehandle such that
CLOSE can be used appropriately. CLOSE can be used appropriately.
Simply waiting for the lease on the file to expire is insufficient Simply waiting for the lease on the file to expire is insufficient
because the server may maintain the state indefinitely as long as because the server may maintain the state indefinitely as long as
another client does not attempt to make a conflicting access to the another client does not attempt to make a conflicting access to the
same file. same file.
See also Section 2.10.5.4. See also Section 2.10.6.4.
18.17. Operation 19: OPENATTR - Open Named Attribute Directory 18.17. Operation 19: OPENATTR - Open Named Attribute Directory
18.17.1. ARGUMENTS 18.17.1. ARGUMENTS
struct OPENATTR4args { struct OPENATTR4args {
/* CURRENT_FH: object */ /* CURRENT_FH: object */
bool createdir; bool createdir;
}; };
skipping to change at page 479, line 15 skipping to change at page 485, line 15
18.34.3. DESCRIPTION 18.34.3. DESCRIPTION
BIND_CONN_TO_SESSION is used to associate additional connections with BIND_CONN_TO_SESSION is used to associate additional connections with
a session. It MUST be used on the connection being associated with a session. It MUST be used on the connection being associated with
the session. It MUST be the only operation in the COMPOUND the session. It MUST be the only operation in the COMPOUND
procedure. If SP4_NONE (Section 18.35) state protection is used, any procedure. If SP4_NONE (Section 18.35) state protection is used, any
principal, security flavor, or RPCSEC_GSS context MAY be used to principal, security flavor, or RPCSEC_GSS context MAY be used to
invoke the operation. If SP4_MACH_CRED is used, RPCSEC_GSS MUST be invoke the operation. If SP4_MACH_CRED is used, RPCSEC_GSS MUST be
used with the integrity or privacy services, using the principal that used with the integrity or privacy services, using the principal that
created the client ID. If SP4_SSV is used, RPCSEC_GSS with the SSV created the client ID. If SP4_SSV is used, RPCSEC_GSS with the SSV
GSS mechanism (Section 2.10.8) and integrity or privacy MUST be used. GSS mechanism (Section 2.10.9) and integrity or privacy MUST be used.
If, when the client ID was created, the client opted for SP4_NONE If, when the client ID was created, the client opted for SP4_NONE
state protection, the client is not required to use state protection, the client is not required to use
BIND_CONN_TO_SESSION to associate the connection with the session, BIND_CONN_TO_SESSION to associate the connection with the session,
unless the client wishes to associate the connection with the unless the client wishes to associate the connection with the
backchannel. When SP4_NONE protection is used, simply sending a backchannel. When SP4_NONE protection is used, simply sending a
COMPOUND request with a SEQUENCE operation is sufficient to associate COMPOUND request with a SEQUENCE operation is sufficient to associate
the connection with the session specified in SEQUENCE. the connection with the session specified in SEQUENCE.
The field bctsa_dir indicates whether the client wants to associate The field bctsa_dir indicates whether the client wants to associate
skipping to change at page 484, line 29 skipping to change at page 490, line 29
EXCHANGE_ID sent with the current incarnation and co_ownerid will EXCHANGE_ID sent with the current incarnation and co_ownerid will
result in an error or an update of the client ID's properties, result in an error or an update of the client ID's properties,
depending on the arguments to EXCHANGE_ID. depending on the arguments to EXCHANGE_ID.
A server MUST NOT use the same client ID for two different A server MUST NOT use the same client ID for two different
incarnations of an eir_clientowner. incarnations of an eir_clientowner.
In addition to the client ID and sequence ID, the server returns a In addition to the client ID and sequence ID, the server returns a
server owner (eir_server_owner) and server scope (eir_server_scope). server owner (eir_server_owner) and server scope (eir_server_scope).
The former field is used for network trunking as described in The former field is used for network trunking as described in
Section 2.10.4. The latter field is used to allow clients to Section 2.10.5. The latter field is used to allow clients to
determine when client IDs sent by one server may be recognized by determine when client IDs sent by one server may be recognized by
another in the event of file system migration (see Section 11.7.7). another in the event of file system migration (see Section 11.7.7).
The client ID returned by EXCHANGE_ID is only unique relative to the The client ID returned by EXCHANGE_ID is only unique relative to the
combination of eir_server_owner.so_major_id and eir_server_scope. combination of eir_server_owner.so_major_id and eir_server_scope.
Thus if two servers return the same client ID, the onus is on the Thus if two servers return the same client ID, the onus is on the
client to distinguish the client IDs on the basis of client to distinguish the client IDs on the basis of
eir_server_owner.so_major_id and eir_server_scope. In the event two eir_server_owner.so_major_id and eir_server_scope. In the event two
different servers claim matching server_owner.so_major_id and different servers claim matching server_owner.so_major_id and
eir_server_scope, the client can use the verification techniques eir_server_scope, the client can use the verification techniques
discussed in Section 2.10.4 to determine if the servers are distinct. discussed in Section 2.10.5 to determine if the servers are distinct.
If they are distinct, then the client will need to note the If they are distinct, then the client will need to note the
destination network addresses of the connections used with each destination network addresses of the connections used with each
server, and use the network address as the final discriminator. server, and use the network address as the final discriminator.
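One way a client might apply the disambiguation rule above is to key its table of client IDs on more than the client ID value alone. The sketch below is an assumption-laden illustration: `client_id_key` and its `distinct_server_addr` parameter are hypothetical names, and the address is supplied only after the client has verified that two servers claiming the same major ID and scope are in fact distinct.

```python
from typing import Optional, Tuple


def client_id_key(clientid: int,
                  so_major_id: bytes,
                  server_scope: bytes,
                  distinct_server_addr: Optional[str] = None) -> Tuple:
    """Build a lookup key under which a client ID is unambiguous."""
    # A client ID is unique only relative to (so_major_id, server scope).
    key: Tuple = (so_major_id, server_scope, clientid)
    # Final discriminator, used only when distinct servers claim the same
    # so_major_id and scope (hypothetical parameter for this sketch).
    if distinct_server_addr is not None:
        key += (distinct_server_addr,)
    return key
```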
The server, as defined by the unique identity expressed in the The server, as defined by the unique identity expressed in the
so_major_id of the server owner and the server scope, needs to track so_major_id of the server owner and the server scope, needs to track
several properties of each client ID it hands out. The properties several properties of each client ID it hands out. The properties
apply to the client ID and all sessions associated with the client apply to the client ID and all sessions associated with the client
ID. The properties are derived from the arguments and results of ID. The properties are derived from the arguments and results of
EXCHANGE_ID. The client ID properties include: EXCHANGE_ID. The client ID properties include:
skipping to change at page 486, line 13 skipping to change at page 492, line 13
this property cannot be updated by subsequent EXCHANGE_ID this property cannot be updated by subsequent EXCHANGE_ID
requests. requests.
* The length of the SSV. This property is represented by the * The length of the SSV. This property is represented by the
spi_ssv_len field in the EXCHANGE_ID results. Once the client spi_ssv_len field in the EXCHANGE_ID results. Once the client
ID is confirmed, this property cannot be updated by subsequent ID is confirmed, this property cannot be updated by subsequent
EXCHANGE_ID requests. The length of SSV MUST be equal to the EXCHANGE_ID requests. The length of SSV MUST be equal to the
length of the key used by the negotiated encryption algorithm. length of the key used by the negotiated encryption algorithm.
* Number of concurrent versions of the SSV the client and server * Number of concurrent versions of the SSV the client and server
will support (Section 2.10.8). This property is represented by will support (Section 2.10.9). This property is represented by
spi_window, in the EXCHANGE_ID results. The property may be spi_window, in the EXCHANGE_ID results. The property may be
updated by subsequent EXCHANGE_ID requests. updated by subsequent EXCHANGE_ID requests.
o The client's implementation ID as represented by the o The client's implementation ID as represented by the
eia_client_impl_id field of the arguments. The property may be eia_client_impl_id field of the arguments. The property may be
updated by subsequent EXCHANGE_ID requests. updated by subsequent EXCHANGE_ID requests.
o The server's implementation ID as represented by the o The server's implementation ID as represented by the
eir_server_impl_id field of the reply. The property may be eir_server_impl_id field of the reply. The property may be
updated by replies to subsequent EXCHANGE_ID requests. updated by replies to subsequent EXCHANGE_ID requests.
skipping to change at page 487, line 13 skipping to change at page 493, line 13
principal and security flavor it uses when sending the EXCHANGE_ID principal and security flavor it uses when sending the EXCHANGE_ID
request. The situations described in Sub-Paragraph 6, Sub- request. The situations described in Sub-Paragraph 6, Sub-
Paragraph 7, Sub-Paragraph 8, or Sub-Paragraph 9, of Paragraph 6 in Paragraph 7, Sub-Paragraph 8, or Sub-Paragraph 9, of Paragraph 6 in
Section 18.35.4 will apply. Note that if the operation succeeds and Section 18.35.4 will apply. Note that if the operation succeeds and
returns a client ID that is already confirmed, the server MUST set returns a client ID that is already confirmed, the server MUST set
the EXCHGID4_FLAG_CONFIRMED_R bit in eir_flags. the EXCHGID4_FLAG_CONFIRMED_R bit in eir_flags.
If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set in eia_flags, this If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set in eia_flags, this
means the client is trying to establish a new client ID; it is means the client is trying to establish a new client ID; it is
attempting to trunk data communication to the server attempting to trunk data communication to the server
(Section 2.10.4); or it is attempting to update properties of an (Section 2.10.5); or it is attempting to update properties of an
unconfirmed client ID. The situations described in Sub-Paragraph 1, unconfirmed client ID. The situations described in Sub-Paragraph 1,
Sub-Paragraph 2, Sub-Paragraph 3, Sub-Paragraph 4, or Sub-Paragraph 5 Sub-Paragraph 2, Sub-Paragraph 3, Sub-Paragraph 4, or Sub-Paragraph 5
of Paragraph 6 in Section 18.35.4 will apply. Note that if the of Paragraph 6 in Section 18.35.4 will apply. Note that if the
operation succeeds and returns a client ID that was previously operation succeeds and returns a client ID that was previously
confirmed, the server MUST set the EXCHGID4_FLAG_CONFIRMED_R bit in confirmed, the server MUST set the EXCHGID4_FLAG_CONFIRMED_R bit in
eir_flags. eir_flags.
When the EXCHGID4_FLAG_SUPP_MOVED_REFER flag bit is set, the client When the EXCHGID4_FLAG_SUPP_MOVED_REFER flag bit is set, the client
indicates that it is capable of dealing with an NFS4ERR_MOVED error indicates that it is capable of dealing with an NFS4ERR_MOVED error
as part of a referral sequence. When this bit is not set, it is as part of a referral sequence. When this bit is not set, it is
skipping to change at page 488, line 29 skipping to change at page 494, line 29
Multiple roles can be associated with the same client ID or with Multiple roles can be associated with the same client ID or with
different client IDs. Thus, if a client sends EXCHANGE_ID from the different client IDs. Thus, if a client sends EXCHANGE_ID from the
same client owner to the same server owner multiple times, but same client owner to the same server owner multiple times, but
specifies different pNFS roles each time, the server might return specifies different pNFS roles each time, the server might return
different client IDs. Given that different pNFS roles might have different client IDs. Given that different pNFS roles might have
different client IDs, the client may ask for different properties for different client IDs, the client may ask for different properties for
each role/client ID. each role/client ID.
The spa_how field of the eia_state_protect field specifies how the The spa_how field of the eia_state_protect field specifies how the
client wants to protect its client, locking and session state from client wants to protect its client, locking and session state from
unauthorized changes (Section 2.10.7.3): unauthorized changes (Section 2.10.8.3):
o SP4_NONE. The client does not request the NFSv4.1 server to o SP4_NONE. The client does not request the NFSv4.1 server to
enforce state protection. The NFSv4.1 server MUST NOT enforce enforce state protection. The NFSv4.1 server MUST NOT enforce
state protection for the returned client ID. state protection for the returned client ID.
o SP4_MACH_CRED. This choice is only valid if the client sent the o SP4_MACH_CRED. This choice is only valid if the client sent the
request with RPCSEC_GSS as the security flavor, and with a service request with RPCSEC_GSS as the security flavor, and with a service
of RPC_GSS_SVC_INTEGRITY or RPC_GSS_SVC_PRIVACY. The client wants of RPC_GSS_SVC_INTEGRITY or RPC_GSS_SVC_PRIVACY. The client wants
to use an RPCSEC_GSS-based machine credential to protect its to use an RPCSEC_GSS-based machine credential to protect its
state. The server MUST note the principal the EXCHANGE_ID state. The server MUST note the principal the EXCHANGE_ID
skipping to change at page 491, line 20 skipping to change at page 497, line 20
return NFS4ERR_INVAL. The server responds with spi_window, which return NFS4ERR_INVAL. The server responds with spi_window, which
MUST NOT exceed ssp_window, and MUST be at least one (1). Any MUST NOT exceed ssp_window, and MUST be at least one (1). Any
requests on the backchannel or fore channel that are using a requests on the backchannel or fore channel that are using a
version of the SSV that is outside the window will fail with an version of the SSV that is outside the window will fail with an
ONC RPC authentication error, and the requester will have to retry ONC RPC authentication error, and the requester will have to retry
them with the same slot ID and sequence ID. them with the same slot ID and sequence ID.
ssp_num_gss_handles: ssp_num_gss_handles:
This is the number of RPCSEC_GSS handles the server should create This is the number of RPCSEC_GSS handles the server should create
that are based on the GSS SSV mechanism (Section 2.10.8). It is that are based on the GSS SSV mechanism (Section 2.10.9). It is
not the total number of RPCSEC_GSS handles for the client ID. not the total number of RPCSEC_GSS handles for the client ID.
Indeed, subsequent calls to EXCHANGE_ID will add RPCSEC_GSS Indeed, subsequent calls to EXCHANGE_ID will add RPCSEC_GSS
handles. The server responds with a list of handles in handles. The server responds with a list of handles in
spi_handles. If the client asks for at least one handle and the spi_handles. If the client asks for at least one handle and the
server cannot create it, the server MUST return an error. The server cannot create it, the server MUST return an error. The
handles in spi_handles are not available for use until the client handles in spi_handles are not available for use until the client
ID is confirmed, which could be immediately if EXCHANGE_ID returns ID is confirmed, which could be immediately if EXCHANGE_ID returns
EXCHGID4_FLAG_CONFIRMED_R, or upon successful confirmation from EXCHGID4_FLAG_CONFIRMED_R, or upon successful confirmation from
CREATE_SESSION. While a client ID can span all the connections CREATE_SESSION. While a client ID can span all the connections
that are connected to a server sharing the same that are connected to a server sharing the same
skipping to change at page 502, line 5 skipping to change at page 508, line 5
The maximum size of a COMPOUND or CB_COMPOUND request that will The maximum size of a COMPOUND or CB_COMPOUND request that will
be sent. This size represents the XDR encoded size of the be sent. This size represents the XDR encoded size of the
request, including the RPC headers (including security flavor request, including the RPC headers (including security flavor
credentials and verifiers) but excludes any RPC transport credentials and verifiers) but excludes any RPC transport
framing headers. Imagine a request coming over a non-RDMA framing headers. Imagine a request coming over a non-RDMA
TCP/IP connection, and that it has a single Record Marking TCP/IP connection, and that it has a single Record Marking
header preceding it. The maximum allowable count encoded in header preceding it. The maximum allowable count encoded in
the header will be ca_maxrequestsize. If a requester sends a the header will be ca_maxrequestsize. If a requester sends a
request that exceeds ca_maxrequestsize, the error request that exceeds ca_maxrequestsize, the error
NFS4ERR_REQ_TOO_BIG will be returned per the description in NFS4ERR_REQ_TOO_BIG will be returned per the description in
Section 2.10.5.4. Section 2.10.6.4.
ca_maxresponsesize: ca_maxresponsesize:
The maximum size of a COMPOUND or CB_COMPOUND reply that the The maximum size of a COMPOUND or CB_COMPOUND reply that the
requester will accept from the replier including RPC headers requester will accept from the replier including RPC headers
(see the ca_maxrequestsize definition). The NFSv4.1 server (see the ca_maxrequestsize definition). The NFSv4.1 server
MUST NOT increase the value of this parameter in the MUST NOT increase the value of this parameter in the
CREATE_SESSION results. However, if the client selects a value CREATE_SESSION results. However, if the client selects a value
for ca_maxresponsesize such that a replier on a channel could for ca_maxresponsesize such that a replier on a channel could
never send a response, the server SHOULD return never send a response, the server SHOULD return
NFS4ERR_TOOSMALL in the CREATE_SESSION reply. If a requester NFS4ERR_TOOSMALL in the CREATE_SESSION reply. If a requester
sends a request for which the size of the reply would exceed sends a request for which the size of the reply would exceed
this value, the replier will return NFS4ERR_REP_TOO_BIG, per this value, the replier will return NFS4ERR_REP_TOO_BIG, per
the description in Section 2.10.5.4. the description in Section 2.10.6.4.
ca_maxresponsesize_cached: ca_maxresponsesize_cached:
Like ca_maxresponsesize, but the maximum size of a reply that Like ca_maxresponsesize, but the maximum size of a reply that
will be stored in the reply cache (Section 2.10.5.1). If the will be stored in the reply cache (Section 2.10.6.1). If the
reply to CREATE_SESSION has ca_maxresponsesize_cached less than reply to CREATE_SESSION has ca_maxresponsesize_cached less than
ca_maxresponsesize, then this is an indication to the requester ca_maxresponsesize, then this is an indication to the requester
on the channel that it needs to be selective about which on the channel that it needs to be selective about which
replies it directs the replier to cache; for example large replies it directs the replier to cache; for example large
replies from nonidempotent operations (e.g. COMPOUND requests replies from nonidempotent operations (e.g. COMPOUND requests
with a READ operation), should not be cached. The requester with a READ operation), should not be cached. The requester
decides which replies to cache via an argument to the SEQUENCE decides which replies to cache via an argument to the SEQUENCE
(the sa_cachethis field, see Section 18.46) or CB_SEQUENCE (the (the sa_cachethis field, see Section 18.46) or CB_SEQUENCE (the
csa_cachethis field, see Section 20.9) operations. If a csa_cachethis field, see Section 20.9) operations. If a
requester sends a request for which the size of the reply would requester sends a request for which the size of the reply would
exceed this value, the replier will return exceed this value, the replier will return
NFS4ERR_REP_TOO_BIG_TO_CACHE, per the description in NFS4ERR_REP_TOO_BIG_TO_CACHE, per the description in
Section 2.10.5.4. Section 2.10.6.4.
ca_maxoperations: ca_maxoperations:
The maximum number of operations the replier will accept in a The maximum number of operations the replier will accept in a
COMPOUND or CB_COMPOUND. The server MUST NOT increase COMPOUND or CB_COMPOUND. The server MUST NOT increase
ca_maxoperations in the reply to CREATE_SESSION. If the ca_maxoperations in the reply to CREATE_SESSION. If the
requester sends a COMPOUND or CB_COMPOUND with more operations requester sends a COMPOUND or CB_COMPOUND with more operations
than ca_maxoperations, the replier MUST return than ca_maxoperations, the replier MUST return
NFS4ERR_TOO_MANY_OPS. NFS4ERR_TOO_MANY_OPS.
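The limits described for these channel attributes can be summarized in a small check. This is a simplified sketch: the `ChannelAttrs` class and `check_limits` function are hypothetical, the order of the checks is an assumption, and a real replier learns the reply size only while processing the request. The `cache_reply` flag stands in for the sa_cachethis or csa_cachethis argument.

```python
from dataclasses import dataclass


@dataclass
class ChannelAttrs:
    ca_maxrequestsize: int
    ca_maxresponsesize: int
    ca_maxresponsesize_cached: int
    ca_maxoperations: int


def check_limits(attrs: ChannelAttrs, request_size: int, num_ops: int,
                 reply_size: int, cache_reply: bool) -> str:
    """Return the name of the error a replier would report, or NFS4_OK."""
    if request_size > attrs.ca_maxrequestsize:
        return "NFS4ERR_REQ_TOO_BIG"
    if num_ops > attrs.ca_maxoperations:
        return "NFS4ERR_TOO_MANY_OPS"
    if reply_size > attrs.ca_maxresponsesize:
        return "NFS4ERR_REP_TOO_BIG"
    # Assumption: the cached-size limit matters only when caching is requested.
    if cache_reply and reply_size > attrs.ca_maxresponsesize_cached:
        return "NFS4ERR_REP_TOO_BIG_TO_CACHE"
    return "NFS4_OK"
```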
skipping to change at page 509, line 13 skipping to change at page 515, line 13
has no remaining associated sessions, the connection MAY be closed by has no remaining associated sessions, the connection MAY be closed by
the server. Locks, delegations, layouts, wants, and the lease, which the server. Locks, delegations, layouts, wants, and the lease, which
are all tied to the client ID, are not affected by DESTROY_SESSION. are all tied to the client ID, are not affected by DESTROY_SESSION.
DESTROY_SESSION MUST be invoked on a connection that is associated DESTROY_SESSION MUST be invoked on a connection that is associated
with the session being destroyed. In addition if SP4_MACH_CRED state with the session being destroyed. In addition if SP4_MACH_CRED state
protection was specified when the client ID was created, the protection was specified when the client ID was created, the
RPCSEC_GSS principal that created the session MUST be the one that RPCSEC_GSS principal that created the session MUST be the one that
destroys the session, using RPCSEC_GSS privacy or integrity. If destroys the session, using RPCSEC_GSS privacy or integrity. If
SP4_SSV state protection was specified when the client ID was SP4_SSV state protection was specified when the client ID was
created, RPCSEC_GSS using the SSV mechanism (Section 2.10.8) MUST be created, RPCSEC_GSS using the SSV mechanism (Section 2.10.9) MUST be
used, with integrity or privacy. used, with integrity or privacy.
If the COMPOUND request starts with SEQUENCE, and if the sessionids If the COMPOUND request starts with SEQUENCE, and if the sessionids
specified in SEQUENCE and DESTROY_SESSION are the same, then specified in SEQUENCE and DESTROY_SESSION are the same, then
o DESTROY_SESSION MUST be the final operation in the COMPOUND o DESTROY_SESSION MUST be the final operation in the COMPOUND
request. request.
o It is advisable to not place DESTROY_SESSION in a COMPOUND request o It is advisable to not place DESTROY_SESSION in a COMPOUND request
with other state-modifying operations, because the DESTROY_SESSION with other state-modifying operations, because the DESTROY_SESSION
skipping to change at page 538, line 37 skipping to change at page 544, line 37
The sa_slotid argument is the index in the reply cache for the The sa_slotid argument is the index in the reply cache for the
request. The sa_sequenceid field is the sequence number of the request. The sa_sequenceid field is the sequence number of the
request for the reply cache entry (slot). The sr_slotid result MUST request for the reply cache entry (slot). The sr_slotid result MUST
equal sa_slotid. The sr_sequenceid result MUST equal sa_sequenceid. equal sa_slotid. The sr_sequenceid result MUST equal sa_sequenceid.
The sa_highest_slotid argument is the highest slot ID the client has The sa_highest_slotid argument is the highest slot ID the client has
a request outstanding for; it could be equal to sa_slotid. The a request outstanding for; it could be equal to sa_slotid. The
server returns two "highest_slotid" values: sr_highest_slotid, and server returns two "highest_slotid" values: sr_highest_slotid, and
sr_target_highest_slotid. The former is the highest slot ID the sr_target_highest_slotid. The former is the highest slot ID the
server will accept in a future SEQUENCE operation, and SHOULD NOT be server will accept in a future SEQUENCE operation, and SHOULD NOT be
less than the value of sa_highest_slotid (but see Section 2.10.5.1 less than the value of sa_highest_slotid (but see Section 2.10.6.1
for an exception). The latter is the highest slot ID the server for an exception). The latter is the highest slot ID the server
would prefer the client use on a future SEQUENCE operation. would prefer the client use on a future SEQUENCE operation.
If sa_cachethis is TRUE, then the client is requesting that the If sa_cachethis is TRUE, then the client is requesting that the
server cache the entire reply in the server's reply cache; therefore server cache the entire reply in the server's reply cache; therefore
the server MUST cache the reply (see Section 2.10.5.1.3). The server the server MUST cache the reply (see Section 2.10.6.1.3). The server
MAY cache the reply if sa_cachethis is FALSE. If the server does not MAY cache the reply if sa_cachethis is FALSE. If the server does not
cache the entire reply, it MUST still record that it executed the cache the entire reply, it MUST still record that it executed the
request at the specified slot and sequence ID. request at the specified slot and sequence ID.
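The bookkeeping described above might look like the following sketch. The `Slot` and `ReplyCache` classes are hypothetical, retry and misordering handling are deliberately left out, and this version never caches when sa_cachethis is FALSE, although the text permits a server to do so.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Slot:
    sequenceid: int = 0                   # last sequence ID executed on this slot
    cached_reply: Optional[bytes] = None  # full reply, if it was cached


class ReplyCache:
    def __init__(self, num_slots: int) -> None:
        self.slots: List[Slot] = [Slot() for _ in range(num_slots)]

    def record(self, sa_slotid: int, sa_sequenceid: int,
               sa_cachethis: bool, reply: bytes) -> Dict[str, int]:
        slot = self.slots[sa_slotid]
        # The execution is always recorded, even when the reply is not cached.
        slot.sequenceid = sa_sequenceid
        slot.cached_reply = reply if sa_cachethis else None
        # sr_slotid and sr_sequenceid echo the corresponding arguments.
        return {"sr_slotid": sa_slotid, "sr_sequenceid": sa_sequenceid}
```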
The response to the SEQUENCE operation contains a word of status The response to the SEQUENCE operation contains a word of status
flags (sr_status_flags) that can provide to the client information flags (sr_status_flags) that can provide to the client information
related to the status of the client's lock state and communications related to the status of the client's lock state and communications
paths. Note that any status bits relating to lock state MAY be reset paths. Note that any status bits relating to lock state MAY be reset
when lock state is lost due to a server restart (even if the session when lock state is lost due to a server restart (even if the session
is persistent across restarts; session persistence does not imply is persistent across restarts; session persistence does not imply
skipping to change at page 543, line 26 skipping to change at page 549, line 26
case NFS4_OK: case NFS4_OK:
SET_SSV4resok ssr_resok4; SET_SSV4resok ssr_resok4;
default: default:
void; void;
}; };
18.47.3. DESCRIPTION 18.47.3. DESCRIPTION
This operation is used to update the SSV for a client ID. Before This operation is used to update the SSV for a client ID. Before
SET_SSV is called the first time on a client ID, the SSV is zero (0). SET_SSV is called the first time on a client ID, the SSV is zero (0).
The SSV is the key used for the SSV GSS mechanism (Section 2.10.8). The SSV is the key used for the SSV GSS mechanism (Section 2.10.9).
SET_SSV MUST be preceded by a SEQUENCE operation in the same SET_SSV MUST be preceded by a SEQUENCE operation in the same
COMPOUND. It MUST NOT be used if the client did not opt for SP4_SSV COMPOUND. It MUST NOT be used if the client did not opt for SP4_SSV
state protection when the client ID was created (see Section 18.35); state protection when the client ID was created (see Section 18.35);
the server returns NFS4ERR_INVAL in that case. the server returns NFS4ERR_INVAL in that case.
The field ssa_digest is computed as the output of the HMAC RFC2104 The field ssa_digest is computed as the output of the HMAC RFC2104
[11] using the subkey derived from the SSV4_SUBKEY_MIC_I2T and [11] using the subkey derived from the SSV4_SUBKEY_MIC_I2T and
current SSV as the key (see Section 2.10.8 for a description of current SSV as the key (see Section 2.10.9 for a description of
subkeys), and an XDR encoded value of data type ssa_digest_input4. subkeys), and an XDR encoded value of data type ssa_digest_input4.
The field sdi_seqargs is equal to the arguments of the SEQUENCE The field sdi_seqargs is equal to the arguments of the SEQUENCE
operation for the COMPOUND procedure that SET_SSV is within. operation for the COMPOUND procedure that SET_SSV is within.
The argument ssa_ssv is XORed with the current SSV to produce the new The argument ssa_ssv is XORed with the current SSV to produce the new
SSV. The argument ssa_ssv SHOULD be generated randomly. SSV. The argument ssa_ssv SHOULD be generated randomly.
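The digest and XOR steps described above can be sketched as follows. This is an illustration under stated assumptions, not the draft's normative procedure: SHA-256 stands in for the negotiated hash, derive_subkey is a hypothetical placeholder for the subkey derivation that the SSV GSS mechanism section defines, the byte string passed as the digest input stands in for a real XDR encoding of ssa_digest_input4, and the equal-length check is an assumption of the sketch.

```python
import hashlib
import hmac
import os

# Assumption: SHA-256 stands in for whatever hash client and server negotiated.
HASH = hashlib.sha256
SSV_LEN = HASH().digest_size


def derive_subkey(ssv: bytes, label: bytes) -> bytes:
    """Hypothetical placeholder for the subkey derivation defined elsewhere in the draft."""
    return hmac.new(ssv, label, HASH).digest()


def compute_ssa_digest(current_ssv: bytes, encoded_input: bytes) -> bytes:
    """HMAC (RFC 2104) over the encoded ssa_digest_input4, keyed by the I2T MIC subkey."""
    key = derive_subkey(current_ssv, b"SSV4_SUBKEY_MIC_I2T")
    return hmac.new(key, encoded_input, HASH).digest()


def next_ssv(current_ssv: bytes, ssa_ssv: bytes) -> bytes:
    """XOR ssa_ssv into the current SSV to produce the new SSV."""
    if len(current_ssv) != len(ssa_ssv):
        # Assumption of this sketch: both values have the same length.
        raise ValueError("ssa_ssv must be the same length as the current SSV")
    return bytes(a ^ b for a, b in zip(current_ssv, ssa_ssv))


current = bytes(SSV_LEN)        # the SSV is zero before the first SET_SSV
ssa_ssv = os.urandom(SSV_LEN)   # SHOULD be generated randomly
digest = compute_ssa_digest(current, b"<XDR-encoded ssa_digest_input4>")
new = next_ssv(current, ssa_ssv)
```

A receiver would recompute the digest the same way and compare the two values with a constant-time comparison such as hmac.compare_digest before accepting ssa_ssv.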
In the response, ssr_digest is the output of the HMAC using the In the response, ssr_digest is the output of the HMAC using the
subkey derived from SSV4_SUBKEY_MIC_T2I and new SSV as the key, and subkey derived from SSV4_SUBKEY_MIC_T2I and new SSV as the key, and
an XDR encoded value of data type ssr_digest_input4. The field an XDR encoded value of data type ssr_digest_input4. The field
skipping to change at page 544, line 9 skipping to change at page 550, line 9
COMPOUND procedure that SET_SSV is within. COMPOUND procedure that SET_SSV is within.
As noted in Section 18.35, the client and server can maintain As noted in Section 18.35, the client and server can maintain
multiple concurrent versions of the SSV. The client and server each multiple concurrent versions of the SSV. The client and server each
MUST maintain an internal SSV version number, which is set to one (1) MUST maintain an internal SSV version number, which is set to one (1)
the first time SET_SSV executes on the server and the client receives the first time SET_SSV executes on the server and the client receives
the first SET_SSV reply. Each subsequent SET_SSV increases the the first SET_SSV reply. Each subsequent SET_SSV increases the
internal SSV version number by one (1). The value of this version internal SSV version number by one (1). The value of this version
number corresponds to the smpt_ssv_seq, smt_ssv_seq, sspt_ssv_seq, number corresponds to the smpt_ssv_seq, smt_ssv_seq, sspt_ssv_seq,
and ssct_ssv_seq fields of the SSV GSS mechanism tokens (see and ssct_ssv_seq fields of the SSV GSS mechanism tokens (see
Section 2.10.8). Section 2.10.9).
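One possible way to track the internal SSV version number alongside several concurrent SSV versions is sketched below. The class name SsvHistory, the max_versions limit, and the eviction of the oldest version are hypothetical choices made for illustration; the text above only requires that the version number start at one on the first SET_SSV and increase by one on each subsequent SET_SSV.

```python
from collections import OrderedDict


class SsvHistory:
    """Hypothetical bookkeeping for SSV versions on one client ID."""

    def __init__(self, ssv_len: int, max_versions: int = 4):
        self.current = bytes(ssv_len)   # the SSV is zero before the first SET_SSV
        self.version = 0                # assumption: 0 means no SET_SSV has completed
        self.versions = OrderedDict()   # version number -> SSV value
        self.max_versions = max_versions

    def apply_set_ssv(self, ssa_ssv: bytes) -> int:
        """XOR ssa_ssv into the SSV and advance the version number by one."""
        if len(ssa_ssv) != len(self.current):
            raise ValueError("ssa_ssv length does not match the SSV length")
        self.current = bytes(a ^ b for a, b in zip(self.current, ssa_ssv))
        self.version += 1
        self.versions[self.version] = self.current
        while len(self.versions) > self.max_versions:
            self.versions.popitem(last=False)   # drop the oldest retained version
        return self.version

    def lookup(self, ssv_seq: int):
        """Return the SSV matching a token's *_ssv_seq field, or None if not retained."""
        return self.versions.get(ssv_seq)
```

Looking up the SSV by the sequence value carried in a token lets a receiver verify tokens produced just before a concurrent SET_SSV changed the current SSV.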
18.47.4. IMPLEMENTATION 18.47.4. IMPLEMENTATION
When the server receives ssa_digest, it MUST verify the digest by When the server receives ssa_digest, it MUST verify the digest by
computing the digest the same way the client did and comparing it computing the digest the same way the client did and comparing it
with ssa_digest. If the server gets a different result, this is an with ssa_digest. If the server gets a different result, this is an
error, NFS4ERR_BAD_SESSION_DIGEST. This error might be the result of error, NFS4ERR_BAD_SESSION_DIGEST. This error might be the result of
another SET_SSV from the same client ID changing the SSV. If so, the another SET_SSV from the same client ID changing the SSV. If so, the
client recovers by issuing SET_SSV again with a recomputed digest client recovers by issuing SET_SSV again with a recomputed digest
based on the subkey of the new SSV. If the transport connection is based on the subkey of the new SSV. If the transport connection is
skipping to change at page 544, line 38 skipping to change at page 550, line 38
is created). is created).
Clients SHOULD send SET_SSV with RPCSEC_GSS privacy. Servers MUST Clients SHOULD send SET_SSV with RPCSEC_GSS privacy. Servers MUST
support RPCSEC_GSS with privacy for any COMPOUND that has { SEQUENCE, support RPCSEC_GSS with privacy for any COMPOUND that has { SEQUENCE,
SET_SSV }. SET_SSV }.
A client SHOULD NOT send SET_SSV with the SSV GSS mechanism's A client SHOULD NOT send SET_SSV with the SSV GSS mechanism's
credential because the purpose of SET_SSV is to seed the SSV from credential because the purpose of SET_SSV is to seed the SSV from
non-SSV credentials. Instead SET_SSV SHOULD be sent with the non-SSV credentials. Instead SET_SSV SHOULD be sent with the
credential of a user that is accessing the client ID for the first credential of a user that is accessing the client ID for the first
time (Section 2.10.7.3). However if the client does send SET_SSV time (Section 2.10.8.3). However if the client does send SET_SSV
with SSV credentials, the digest protecting the arguments uses the with SSV credentials, the digest protecting the arguments uses the
value of the SSV before ssa_ssv is XORed in, and the digest value of the SSV before ssa_ssv is XORed in, and the digest
protecting the results uses the value of the SSV after the ssa_ssv is protecting the results uses the value of the SSV after the ssa_ssv is
XORed in. XORed in.
18.48. Operation 55: TEST_STATEID - Test stateids for validity 18.48. Operation 55: TEST_STATEID - Test stateids for validity
Test a series of stateids for validity. Test a series of stateids for validity.
18.48.1. ARGUMENT 18.48.1. ARGUMENT
skipping to change at page 557, line 20 skipping to change at page 563, line 20
19.2.3. DESCRIPTION 19.2.3. DESCRIPTION
The CB_COMPOUND procedure is used to combine one or more of the The CB_COMPOUND procedure is used to combine one or more of the
callback procedures into a single RPC request. The main callback RPC callback procedures into a single RPC request. The main callback RPC
program has two main procedures: CB_NULL and CB_COMPOUND. All other program has two main procedures: CB_NULL and CB_COMPOUND. All other
operations use the CB_COMPOUND procedure as a wrapper. operations use the CB_COMPOUND procedure as a wrapper.
During the processing of the CB_COMPOUND procedure, the client may During the processing of the CB_COMPOUND procedure, the client may
find that it does not have the available resources to execute any or find that it does not have the available resources to execute any or
all of the operations within the CB_COMPOUND sequence. Refer to all of the operations within the CB_COMPOUND sequence. Refer to
Section 2.10.5.4 for details. Section 2.10.6.4 for details.
The minorversion field of the arguments MUST be the same as the The minorversion field of the arguments MUST be the same as the
minorversion of the COMPOUND procedure used to create the client ID minorversion of the COMPOUND procedure used to create the client ID
and session. For NFSv4.1, minorversion MUST be set to 1. and session. For NFSv4.1, minorversion MUST be set to 1.
Contained within the CB_COMPOUND results is a 'status' field. This Contained within the CB_COMPOUND results is a 'status' field. This
status must be equivalent to the status of the last operation that status must be equivalent to the status of the last operation that
was executed within the CB_COMPOUND procedure. Therefore, if an was executed within the CB_COMPOUND procedure. Therefore, if an
operation incurred an error then the 'status' value will be the same operation incurred an error then the 'status' value will be the same
error value as is being returned for the operation that failed. error value as is being returned for the operation that failed.
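The rule for the overall 'status' field can be illustrated with a short sketch. It assumes that each operation is modeled as a callable returning a numeric status and that evaluation stops at the first failing operation; the function name run_cb_compound and the OK status for an empty operation list are hypothetical choices for this example.

```python
NFS4_OK = 0
NFS4ERR_DELAY = 10008   # used here only as an example error value


def run_cb_compound(operations):
    """Run operations in order; return (overall status, per-operation statuses)."""
    results = []
    status = NFS4_OK          # assumption: an empty operation list yields NFS4_OK
    for op in operations:
        status = op()
        results.append(status)
        if status != NFS4_OK:
            break             # later operations are not executed after a failure
    # The overall status equals the status of the last operation executed.
    return status, results


status, results = run_cb_compound([
    lambda: NFS4_OK,
    lambda: NFS4ERR_DELAY,
    lambda: NFS4_OK,          # never runs, because the previous operation failed
])
```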
skipping to change at page 575, line 36 skipping to change at page 581, line 36
contents include the session ID to which this request belongs, the contents include the session ID to which this request belongs, the
slot ID and sequence ID used by the server to implement session slot ID and sequence ID used by the server to implement session
request control and exactly once semantics, and exchanged slot ID request control and exactly once semantics, and exchanged slot ID
maxima which are used to adjust the size of the reply cache. This maxima which are used to adjust the size of the reply cache. This
operation will appear once as the first operation in each CB_COMPOUND operation will appear once as the first operation in each CB_COMPOUND
request or a protocol error MUST result. See Section 18.46.3 for a request or a protocol error MUST result. See Section 18.46.3 for a
description of how slots are processed. description of how slots are processed.
If csa_cachethis is TRUE, then the server is requesting that the If csa_cachethis is TRUE, then the server is requesting that the
client cache the reply in the callback reply cache. The client MUST client cache the reply in the callback reply cache. The client MUST
cache the reply (see Section 2.10.5.1.3). cache the reply (see Section 2.10.6.1.3).
The csa_referring_call_lists array is the list of COMPOUND requests, The csa_referring_call_lists array is the list of COMPOUND requests,
identified by session ID, slot ID and sequence ID. These are identified by session ID, slot ID and sequence ID. These are
requests that the client previously sent to the server. These requests that the client previously sent to the server. These
previous requests created state that some operation(s) in the same previous requests created state that some operation(s) in the same
CB_COMPOUND as the csa_referring_call_lists are identifying. A CB_COMPOUND as the csa_referring_call_lists are identifying. A
session ID is included because leased state is tied to a client ID, session ID is included because leased state is tied to a client ID,
and a client ID can have multiple sessions. See Section 2.10.5.3. and a client ID can have multiple sessions. See Section 2.10.6.3.
The value of the csa_sequenceid argument relative to the cached The value of the csa_sequenceid argument relative to the cached
sequence ID on the slot falls into one of three cases. sequence ID on the slot falls into one of three cases.
o If the difference between csa_sequenceid and the client's cached o If the difference between csa_sequenceid and the client's cached
sequence ID at the slot ID is two (2) or more, or if sequence ID at the slot ID is two (2) or more, or if
csa_sequenceid is less than the cached sequence ID (accounting for csa_sequenceid is less than the cached sequence ID (accounting for
wraparound of the unsigned sequence ID value), then the client wraparound of the unsigned sequence ID value), then the client
MUST return NFS4ERR_SEQ_MISORDERED. MUST return NFS4ERR_SEQ_MISORDERED.
skipping to change at page 576, line 32 skipping to change at page 582, line 32
of what it has already executed. The client MAY however detect the of what it has already executed. The client MAY however detect the
server's illegal reuse and return NFS4ERR_SEQ_FALSE_RETRY. server's illegal reuse and return NFS4ERR_SEQ_FALSE_RETRY.
If CB_SEQUENCE returns an error, then the state of the slot (sequence If CB_SEQUENCE returns an error, then the state of the slot (sequence
ID, cached reply) MUST NOT change. ID, cached reply) MUST NOT change.
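The comparison of csa_sequenceid against a slot's cached sequence ID can be sketched with modular arithmetic on 32-bit unsigned values. Only the misordered case appears in the text shown in this diff; the "retry" (equal) and "new" (one greater) outcomes reflect the two cases in the elided portion of the section and are included here on that understanding. The Slot type, its default values, and the classify function are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Optional

UINT32_MOD = 1 << 32   # sequence IDs are unsigned 32-bit values that wrap around


@dataclass
class Slot:
    sequence_id: int = 0                  # illustrative starting value
    cached_reply: Optional[bytes] = None


def classify(slot: Slot, csa_sequenceid: int) -> str:
    """Classify a CB_SEQUENCE request against the slot's cached sequence ID."""
    delta = (csa_sequenceid - slot.sequence_id) % UINT32_MOD
    if delta == 0:
        return "retry"        # same sequence ID: answer from the cached reply
    if delta == 1:
        return "new"          # one greater, accounting for wraparound
    return "misordered"       # two or more ahead, or behind: NFS4ERR_SEQ_MISORDERED
```

Because the subtraction is reduced modulo 2^32, a csa_sequenceid that is numerically smaller than the cached value shows up as a very large delta and lands in the misordered case, except for the single wraparound step from 0xFFFFFFFF to 0.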
The client returns two "highest_slotid" values: csr_highest_slotid, The client returns two "highest_slotid" values: csr_highest_slotid,
and csr_target_highest_slotid. The former is the highest slot ID the and csr_target_highest_slotid. The former is the highest slot ID the
client will accept in a future CB_SEQUENCE operation, and SHOULD NOT client will accept in a future CB_SEQUENCE operation, and SHOULD NOT
be less than the value of csa_highest_slotid (but see be less than the value of csa_highest_slotid (but see
Section 2.10.5.1 for an exception). The latter is the highest slot Section 2.10.6.1 for an exception). The latter is the highest slot
ID the client would prefer the server use on a future CB_SEQUENCE ID the client would prefer the server use on a future CB_SEQUENCE
operation. operation.
20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending Delegation 20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending Delegation
Wants Wants
Retracts promise to signal delegation availability. Retracts promise to signal delegation availability.
20.10.1. ARGUMENT 20.10.1. ARGUMENT
skipping to change at page 582, line 49 skipping to change at page 588, line 49
protection is any GETATTR for the fs_locations and protection is any GETATTR for the fs_locations and
fs_locations_info attributes. The attack has two steps. First fs_locations_info attributes. The attack has two steps. First
the attacker modifies the unprotected results of some operation to the attacker modifies the unprotected results of some operation to
return NFS4ERR_MOVED. Second, when the client follows up with a return NFS4ERR_MOVED. Second, when the client follows up with a
GETATTR for the fs_locations or fs_locations_info attributes, the GETATTR for the fs_locations or fs_locations_info attributes, the
attacker modifies the results to cause the client to migrate its attacker modifies the results to cause the client to migrate its
traffic to a server controlled by the attacker. traffic to a server controlled by the attacker.
Relative to previous NFS versions, NFSv4.1 has additional security Relative to previous NFS versions, NFSv4.1 has additional security
considerations for pNFS (see Section 12.9 and Section 13.12), locking considerations for pNFS (see Section 12.9 and Section 13.12), locking
and session state (see Section 2.10.7.3). and session state (see Section 2.10.8.3).
22. IANA Considerations 22. IANA Considerations
This section uses terms that are defined in [43]. This section uses terms that are defined in [43].
22.1. Named Attribute Definitions 22.1. Named Attribute Definitions
IANA will create a registry called the "NFSv4 Named Attribute IANA will create a registry called the "NFSv4 Named Attribute
Definitions Registry". Definitions Registry".
 End of changes. 169 change blocks. 
520 lines changed or deleted 800 lines changed or added
