Diff: draft-ietf-nfsv4-minorversion1-19.txt - draft-ietf-nfsv4-minorversion1-20.txt
 draft-ietf-nfsv4-minorversion1-19.txt   draft-ietf-nfsv4-minorversion1-20.txt 
NFSv4 S. Shepler NFSv4 S. Shepler
Internet-Draft M. Eisler Internet-Draft M. Eisler
Intended status: Standards Track D. Noveck Intended status: Standards Track D. Noveck
Expires: August 1, 2008 Editors Expires: August 20, 2008 Editors
January 29, 2008 February 17, 2008
NFS Version 4 Minor Version 1 NFS Version 4 Minor Version 1
draft-ietf-nfsv4-minorversion1-19.txt draft-ietf-nfsv4-minorversion1-20.txt
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 35 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on August 1, 2008. This Internet-Draft will expire on August 20, 2008.
Copyright Notice Copyright Notice
Copyright (C) The IETF Trust (2008). Copyright (C) The IETF Trust (2008).
Abstract Abstract
This Internet-Draft describes NFS version 4 minor version one, This Internet-Draft describes NFS version 4 minor version one,
including features retained from the base protocol and protocol including features retained from the base protocol and protocol
extensions made subsequently. Major extensions introduced in NFS extensions made subsequently. Major extensions introduced in NFS
skipping to change at page 2, line 37 skipping to change at page 2, line 37
2.2. RPC and XDR . . . . . . . . . . . . . . . . . . . . . . 19 2.2. RPC and XDR . . . . . . . . . . . . . . . . . . . . . . 19
2.2.1. RPC-based Security . . . . . . . . . . . . . . . . . 19 2.2.1. RPC-based Security . . . . . . . . . . . . . . . . . 19
2.3. COMPOUND and CB_COMPOUND . . . . . . . . . . . . . . . . 22 2.3. COMPOUND and CB_COMPOUND . . . . . . . . . . . . . . . . 22
2.4. Client Identifiers and Client Owners . . . . . . . . . . 23 2.4. Client Identifiers and Client Owners . . . . . . . . . . 23
2.4.1. Upgrade from NFSv4.0 to NFSv4.1 . . . . . . . . . . 26 2.4.1. Upgrade from NFSv4.0 to NFSv4.1 . . . . . . . . . . 26
2.4.2. Server Release of Client ID . . . . . . . . . . . . 27 2.4.2. Server Release of Client ID . . . . . . . . . . . . 27
2.4.3. Resolving Client Owner Conflicts . . . . . . . . . . 27 2.4.3. Resolving Client Owner Conflicts . . . . . . . . . . 27
2.5. Server Owners . . . . . . . . . . . . . . . . . . . . . 28 2.5. Server Owners . . . . . . . . . . . . . . . . . . . . . 28
2.6. Security Service Negotiation . . . . . . . . . . . . . . 29 2.6. Security Service Negotiation . . . . . . . . . . . . . . 29
2.6.1. NFSv4.1 Security Tuples . . . . . . . . . . . . . . 29 2.6.1. NFSv4.1 Security Tuples . . . . . . . . . . . . . . 29
2.6.2. SECINFO and SECINFO_NO_NAME . . . . . . . . . . . . 29 2.6.2. SECINFO and SECINFO_NO_NAME . . . . . . . . . . . . 30
2.6.3. Security Error . . . . . . . . . . . . . . . . . . . 30 2.6.3. Security Error . . . . . . . . . . . . . . . . . . . 30
2.7. Minor Versioning . . . . . . . . . . . . . . . . . . . . 33 2.7. Minor Versioning . . . . . . . . . . . . . . . . . . . . 33
2.8. Non-RPC-based Security Services . . . . . . . . . . . . 36 2.8. Non-RPC-based Security Services . . . . . . . . . . . . 36
2.8.1. Authorization . . . . . . . . . . . . . . . . . . . 36 2.8.1. Authorization . . . . . . . . . . . . . . . . . . . 36
2.8.2. Auditing . . . . . . . . . . . . . . . . . . . . . . 36 2.8.2. Auditing . . . . . . . . . . . . . . . . . . . . . . 36
2.8.3. Intrusion Detection . . . . . . . . . . . . . . . . 36 2.8.3. Intrusion Detection . . . . . . . . . . . . . . . . 36
2.9. Transport Layers . . . . . . . . . . . . . . . . . . . . 36 2.9. Transport Layers . . . . . . . . . . . . . . . . . . . . 37
2.9.1. REQUIRED and RECOMMENDED Properties of Transports . 37 2.9.1. REQUIRED and RECOMMENDED Properties of Transports . 37
2.9.2. Client and Server Transport Behavior . . . . . . . . 37 2.9.2. Client and Server Transport Behavior . . . . . . . . 37
2.9.3. Ports . . . . . . . . . . . . . . . . . . . . . . . 39 2.9.3. Ports . . . . . . . . . . . . . . . . . . . . . . . 39
2.10. Session . . . . . . . . . . . . . . . . . . . . . . . . 39 2.10. Session . . . . . . . . . . . . . . . . . . . . . . . . 39
2.10.1. Motivation and Overview . . . . . . . . . . . . . . 39 2.10.1. Motivation and Overview . . . . . . . . . . . . . . 39
2.10.2. NFSv4 Integration . . . . . . . . . . . . . . . . . 40 2.10.2. NFSv4 Integration . . . . . . . . . . . . . . . . . 40
2.10.3. Channels . . . . . . . . . . . . . . . . . . . . . . 42 2.10.3. Channels . . . . . . . . . . . . . . . . . . . . . . 42
2.10.4. Trunking . . . . . . . . . . . . . . . . . . . . . . 43 2.10.4. Trunking . . . . . . . . . . . . . . . . . . . . . . 43
2.10.5. Exactly Once Semantics . . . . . . . . . . . . . . . 46 2.10.5. Exactly Once Semantics . . . . . . . . . . . . . . . 46
2.10.6. RDMA Considerations . . . . . . . . . . . . . . . . 58 2.10.6. RDMA Considerations . . . . . . . . . . . . . . . . 58
2.10.7. Sessions Security . . . . . . . . . . . . . . . . . 61 2.10.7. Sessions Security . . . . . . . . . . . . . . . . . 61
2.10.8. The SSV GSS Mechanism . . . . . . . . . . . . . . . 66 2.10.8. The SSV GSS Mechanism . . . . . . . . . . . . . . . 66
2.10.9. Session Mechanics - Steady State . . . . . . . . . . 70 2.10.9. Session Mechanics - Steady State . . . . . . . . . . 70
2.10.10. Session Mechanics - Recovery . . . . . . . . . . . . 71 2.10.10. Session Mechanics - Recovery . . . . . . . . . . . . 72
2.10.11. Parallel NFS and Sessions . . . . . . . . . . . . . 75 2.10.11. Parallel NFS and Sessions . . . . . . . . . . . . . 75
3. Protocol Constants and Data Types . . . . . . . . . . . . . . 75 3. Protocol Constants and Data Types . . . . . . . . . . . . . . 76
3.1. Basic Constants . . . . . . . . . . . . . . . . . . . . 75 3.1. Basic Constants . . . . . . . . . . . . . . . . . . . . 76
3.2. Basic Data Types . . . . . . . . . . . . . . . . . . . . 76 3.2. Basic Data Types . . . . . . . . . . . . . . . . . . . . 77
3.3. Structured Data Types . . . . . . . . . . . . . . . . . 78 3.3. Structured Data Types . . . . . . . . . . . . . . . . . 78
4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 87 4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 87 4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 88
4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . 87 4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . 88
4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . 88 4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . 89
4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 88 4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 89
4.2.1. General Properties of a Filehandle . . . . . . . . . 89 4.2.1. General Properties of a Filehandle . . . . . . . . . 89
4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 89 4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 90
4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 90 4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 90
4.3. One Method of Constructing a Volatile Filehandle . . . . 91 4.3. One Method of Constructing a Volatile Filehandle . . . . 92
4.4. Client Recovery from Filehandle Expiration . . . . . . . 91 4.4. Client Recovery from Filehandle Expiration . . . . . . . 92
5. File Attributes . . . . . . . . . . . . . . . . . . . . . . . 92 5. File Attributes . . . . . . . . . . . . . . . . . . . . . . . 93
5.1. REQUIRED Attributes . . . . . . . . . . . . . . . . . . 94 5.1. REQUIRED Attributes . . . . . . . . . . . . . . . . . . 94
5.2. RECOMMENDED Attributes . . . . . . . . . . . . . . . . . 94 5.2. RECOMMENDED Attributes . . . . . . . . . . . . . . . . . 95
5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 94 5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 95
5.4. Classification of Attributes . . . . . . . . . . . . . . 96 5.4. Classification of Attributes . . . . . . . . . . . . . . 97
5.5. REQUIRED Attributes - List and Definition References . . 97 5.5. REQUIRED Attributes - List and Definition References . . 98
5.6. RECOMMENDED Attributes - List and Definition 5.6. RECOMMENDED Attributes - List and Definition
References . . . . . . . . . . . . . . . . . . . . . . . 97 References . . . . . . . . . . . . . . . . . . . . . . . 98
5.7. Attribute Definitions . . . . . . . . . . . . . . . . . 99 5.7. Attribute Definitions . . . . . . . . . . . . . . . . . 100
5.7.1. Definitions of REQUIRED Attributes . . . . . . . . . 99 5.7.1. Definitions of REQUIRED Attributes . . . . . . . . . 100
5.7.2. Definitions of Uncategorized RECOMMENDED 5.7.2. Definitions of Uncategorized RECOMMENDED
Attributes . . . . . . . . . . . . . . . . . . . . . 101 Attributes . . . . . . . . . . . . . . . . . . . . . 102
5.8. Interpreting owner and owner_group . . . . . . . . . . . 107 5.8. Interpreting owner and owner_group . . . . . . . . . . . 108
5.9. Character Case Attributes . . . . . . . . . . . . . . . 109 5.9. Character Case Attributes . . . . . . . . . . . . . . . 110
5.10. Directory Notification Attributes . . . . . . . . . . . 109 5.10. Directory Notification Attributes . . . . . . . . . . . 110
5.11. pNFS Attribute Definitions . . . . . . . . . . . . . . . 110 5.11. pNFS Attribute Definitions . . . . . . . . . . . . . . . 111
5.12. Retention Attributes . . . . . . . . . . . . . . . . . . 112 5.12. Retention Attributes . . . . . . . . . . . . . . . . . . 113
6. Security Related Attributes . . . . . . . . . . . . . . . . . 114 6. Security Related Attributes . . . . . . . . . . . . . . . . . 115
6.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . 114 6.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.2. File Attributes Discussion . . . . . . . . . . . . . . . 115 6.2. File Attributes Discussion . . . . . . . . . . . . . . . 116
6.2.1. Attribute 12: acl . . . . . . . . . . . . . . . . . 115 6.2.1. Attribute 12: acl . . . . . . . . . . . . . . . . . 116
6.2.2. Attribute 58: dacl . . . . . . . . . . . . . . . . . 130 6.2.2. Attribute 58: dacl . . . . . . . . . . . . . . . . . 131
6.2.3. Attribute 59: sacl . . . . . . . . . . . . . . . . . 130 6.2.3. Attribute 59: sacl . . . . . . . . . . . . . . . . . 131
6.2.4. Attribute 33: mode . . . . . . . . . . . . . . . . . 131 6.2.4. Attribute 33: mode . . . . . . . . . . . . . . . . . 131
6.2.5. Attribute 74: mode_set_masked . . . . . . . . . . . 131 6.2.5. Attribute 74: mode_set_masked . . . . . . . . . . . 132
6.3. Common Methods . . . . . . . . . . . . . . . . . . . . . 132 6.3. Common Methods . . . . . . . . . . . . . . . . . . . . . 133
6.3.1. Interpreting an ACL . . . . . . . . . . . . . . . . 132 6.3.1. Interpreting an ACL . . . . . . . . . . . . . . . . 133
6.3.2. Computing a Mode Attribute from an ACL . . . . . . . 133 6.3.2. Computing a Mode Attribute from an ACL . . . . . . . 134
6.4. Requirements . . . . . . . . . . . . . . . . . . . . . . 134 6.4. Requirements . . . . . . . . . . . . . . . . . . . . . . 135
6.4.1. Setting the mode and/or ACL Attributes . . . . . . . 134 6.4.1. Setting the mode and/or ACL Attributes . . . . . . . 135
6.4.2. Retrieving the mode and/or ACL Attributes . . . . . 136 6.4.2. Retrieving the mode and/or ACL Attributes . . . . . 137
6.4.3. Creating New Objects . . . . . . . . . . . . . . . . 136 6.4.3. Creating New Objects . . . . . . . . . . . . . . . . 137
7. Single-server Namespace . . . . . . . . . . . . . . . . . . . 140 7. Single-server Namespace . . . . . . . . . . . . . . . . . . . 141
7.1. Server Exports . . . . . . . . . . . . . . . . . . . . . 140 7.1. Server Exports . . . . . . . . . . . . . . . . . . . . . 141
7.2. Browsing Exports . . . . . . . . . . . . . . . . . . . . 141 7.2. Browsing Exports . . . . . . . . . . . . . . . . . . . . 142
7.3. Server Pseudo File System . . . . . . . . . . . . . . . 141 7.3. Server Pseudo File System . . . . . . . . . . . . . . . 142
7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 142 7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 143
7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . 142 7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . 143
7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . 142 7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . 143
7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 143 7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 144
7.8. Security Policy and Namespace Presentation . . . . . . . 143 7.8. Security Policy and Namespace Presentation . . . . . . . 144
8. State Management . . . . . . . . . . . . . . . . . . . . . . 144 8. State Management . . . . . . . . . . . . . . . . . . . . . . 145
8.1. Client and Session ID . . . . . . . . . . . . . . . . . 145 8.1. Client and Session ID . . . . . . . . . . . . . . . . . 146
8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 145 8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 146
8.2.1. Stateid Types . . . . . . . . . . . . . . . . . . . 146 8.2.1. Stateid Types . . . . . . . . . . . . . . . . . . . 147
8.2.2. Stateid Structure . . . . . . . . . . . . . . . . . 147 8.2.2. Stateid Structure . . . . . . . . . . . . . . . . . 148
8.2.3. Special Stateids . . . . . . . . . . . . . . . . . . 148 8.2.3. Special Stateids . . . . . . . . . . . . . . . . . . 149
8.2.4. Stateid Lifetime and Validation . . . . . . . . . . 150 8.2.4. Stateid Lifetime and Validation . . . . . . . . . . 151
8.2.5. Stateid Use for I/O Operations . . . . . . . . . . . 153 8.2.5. Stateid Use for I/O Operations . . . . . . . . . . . 154
8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 153 8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 154
8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 155 8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 156
8.4.1. Client Failure and Recovery . . . . . . . . . . . . 155 8.4.1. Client Failure and Recovery . . . . . . . . . . . . 157
8.4.2. Server Failure and Recovery . . . . . . . . . . . . 156 8.4.2. Server Failure and Recovery . . . . . . . . . . . . 157
8.4.3. Network Partitions and Recovery . . . . . . . . . . 159 8.4.3. Network Partitions and Recovery . . . . . . . . . . 161
8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 164 8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 165
8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 165 8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 166
8.7. Clocks, Propagation Delay, and Calculating Lease 8.7. Clocks, Propagation Delay, and Calculating Lease
Expiration . . . . . . . . . . . . . . . . . . . . . . . 165 Expiration . . . . . . . . . . . . . . . . . . . . . . . 167
8.8. Vestigial Locking Infrastructure From V4.0 . . . . . . . 166 8.8. Vestigial Locking Infrastructure From V4.0 . . . . . . . 167
9. File Locking and Share Reservations . . . . . . . . . . . . . 167 9. File Locking and Share Reservations . . . . . . . . . . . . . 168
9.1. Opens and Byte-range Locks . . . . . . . . . . . . . . . 167 9.1. Opens and Byte-range Locks . . . . . . . . . . . . . . . 169
9.1.1. State-owner Definition . . . . . . . . . . . . . . . 167 9.1.1. State-owner Definition . . . . . . . . . . . . . . . 169
9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 168 9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 169
9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 171 9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 172
9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 171 9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 173
9.4. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 172 9.4. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 173
9.5. Share Reservations . . . . . . . . . . . . . . . . . . . 173 9.5. Share Reservations . . . . . . . . . . . . . . . . . . . 174
9.6. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 174 9.6. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 175
9.7. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 174 9.7. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 176
9.8. Parallel OPENs . . . . . . . . . . . . . . . . . . . . . 175 9.8. Parallel OPENs . . . . . . . . . . . . . . . . . . . . . 176
9.9. Reclaim of Open and Byte-range Locks . . . . . . . . . . 176 9.9. Reclaim of Open and Byte-range Locks . . . . . . . . . . 177
10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 176 10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 177
10.1. Performance Challenges for Client-Side Caching . . . . . 177 10.1. Performance Challenges for Client-Side Caching . . . . . 178
10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 178 10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 179
10.2.1. Delegation Recovery . . . . . . . . . . . . . . . . 180 10.2.1. Delegation Recovery . . . . . . . . . . . . . . . . 181
10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 182 10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 183
10.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . 182 10.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . 184
10.3.2. Data Caching and File Locking . . . . . . . . . . . 183 10.3.2. Data Caching and File Locking . . . . . . . . . . . 185
10.3.3. Data Caching and Mandatory File Locking . . . . . . 185 10.3.3. Data Caching and Mandatory File Locking . . . . . . 186
10.3.4. Data Caching and File Identity . . . . . . . . . . . 185 10.3.4. Data Caching and File Identity . . . . . . . . . . . 187
10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 187 10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 188
10.4.1. Open Delegation and Data Caching . . . . . . . . . . 189 10.4.1. Open Delegation and Data Caching . . . . . . . . . . 190
10.4.2. Open Delegation and File Locks . . . . . . . . . . . 190 10.4.2. Open Delegation and File Locks . . . . . . . . . . . 192
10.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . 191 10.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . 192
10.4.4. Recall of Open Delegation . . . . . . . . . . . . . 194 10.4.4. Recall of Open Delegation . . . . . . . . . . . . . 195
10.4.5. Clients that Fail to Honor Delegation Recalls . . . 195 10.4.5. Clients that Fail to Honor Delegation Recalls . . . 197
10.4.6. Delegation Revocation . . . . . . . . . . . . . . . 196 10.4.6. Delegation Revocation . . . . . . . . . . . . . . . 197
10.4.7. Delegations via WANT_DELEGATION . . . . . . . . . . 197 10.4.7. Delegations via WANT_DELEGATION . . . . . . . . . . 198
10.5. Data Caching and Revocation . . . . . . . . . . . . . . 197 10.5. Data Caching and Revocation . . . . . . . . . . . . . . 199
10.5.1. Revocation Recovery for Write Open Delegation . . . 198 10.5.1. Revocation Recovery for Write Open Delegation . . . 199
10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 199 10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 200
10.7. Data and Metadata Caching and Memory Mapped Files . . . 201 10.7. Data and Metadata Caching and Memory Mapped Files . . . 202
10.8. Name and Directory Caching without Directory 10.8. Name and Directory Caching without Directory
Delegations . . . . . . . . . . . . . . . . . . . . . . 203 Delegations . . . . . . . . . . . . . . . . . . . . . . 204
10.8.1. Name Caching . . . . . . . . . . . . . . . . . . . . 203 10.8.1. Name Caching . . . . . . . . . . . . . . . . . . . . 204
10.8.2. Directory Caching . . . . . . . . . . . . . . . . . 205 10.8.2. Directory Caching . . . . . . . . . . . . . . . . . 206
10.9. Directory Delegations . . . . . . . . . . . . . . . . . 205 10.9. Directory Delegations . . . . . . . . . . . . . . . . . 207
10.9.1. Introduction to Directory Delegations . . . . . . . 206 10.9.1. Introduction to Directory Delegations . . . . . . . 207
10.9.2. Directory Delegation Design . . . . . . . . . . . . 207 10.9.2. Directory Delegation Design . . . . . . . . . . . . 208
10.9.3. Attributes in Support of Directory Notifications . . 208 10.9.3. Attributes in Support of Directory Notifications . . 209
10.9.4. Directory Delegation Recall . . . . . . . . . . . . 208 10.9.4. Directory Delegation Recall . . . . . . . . . . . . 209
10.9.5. Directory Delegation Recovery . . . . . . . . . . . 208 10.9.5. Directory Delegation Recovery . . . . . . . . . . . 210
11. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 209 11. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 210
11.1. Location Attributes . . . . . . . . . . . . . . . . . . 209 11.1. Location Attributes . . . . . . . . . . . . . . . . . . 210
11.2. File System Presence or Absence . . . . . . . . . . . . 209 11.2. File System Presence or Absence . . . . . . . . . . . . 211
11.3. Getting Attributes for an Absent File System . . . . . . 211 11.3. Getting Attributes for an Absent File System . . . . . . 212
11.3.1. GETATTR Within an Absent File System . . . . . . . . 211 11.3.1. GETATTR Within an Absent File System . . . . . . . . 212
11.3.2. READDIR and Absent File Systems . . . . . . . . . . 212 11.3.2. READDIR and Absent File Systems . . . . . . . . . . 213
11.4. Uses of Location Information . . . . . . . . . . . . . . 213 11.4. Uses of Location Information . . . . . . . . . . . . . . 214
11.4.1. File System Replication . . . . . . . . . . . . . . 213 11.4.1. File System Replication . . . . . . . . . . . . . . 215
11.4.2. File System Migration . . . . . . . . . . . . . . . 214 11.4.2. File System Migration . . . . . . . . . . . . . . . 216
11.4.3. Referrals . . . . . . . . . . . . . . . . . . . . . 215 11.4.3. Referrals . . . . . . . . . . . . . . . . . . . . . 217
11.5. Location Entries and Server Identity . . . . . . . . . . 217 11.5. Location Entries and Server Identity . . . . . . . . . . 218
11.6. Additional Client-side Considerations . . . . . . . . . 217 11.6. Additional Client-side Considerations . . . . . . . . . 219
11.7. Effecting File System Transitions . . . . . . . . . . . 218 11.7. Effecting File System Transitions . . . . . . . . . . . 220
11.7.1. File System Transitions and Simultaneous Access . . 219 11.7.1. File System Transitions and Simultaneous Access . . 221
11.7.2. Simultaneous Use and Transparent Transitions . . . . 220 11.7.2. Simultaneous Use and Transparent Transitions . . . . 221
11.7.3. Filehandles and File System Transitions . . . . . . 223 11.7.3. Filehandles and File System Transitions . . . . . . 224
11.7.4. Fileids and File System Transitions . . . . . . . . 223 11.7.4. Fileids and File System Transitions . . . . . . . . 224
11.7.5. Fsids and File System Transitions . . . . . . . . . 224 11.7.5. Fsids and File System Transitions . . . . . . . . . 226
11.7.6. The Change Attribute and File System Transitions . . 225 11.7.6. The Change Attribute and File System Transitions . . 226
11.7.7. Lock State and File System Transitions . . . . . . . 226 11.7.7. Lock State and File System Transitions . . . . . . . 227
11.7.8. Write Verifiers and File System Transitions . . . . 229 11.7.8. Write Verifiers and File System Transitions . . . . 231
11.7.9. Readdir Cookies and Verifiers and File System 11.7.9. Readdir Cookies and Verifiers and File System
Transitions . . . . . . . . . . . . . . . . . . . . 230 Transitions . . . . . . . . . . . . . . . . . . . . 231
11.7.10. File System Data and File System Transitions . . . . 230 11.7.10. File System Data and File System Transitions . . . . 231
11.8. Effecting File System Referrals . . . . . . . . . . . . 231 11.8. Effecting File System Referrals . . . . . . . . . . . . 233
11.8.1. Referral Example (LOOKUP) . . . . . . . . . . . . . 232 11.8.1. Referral Example (LOOKUP) . . . . . . . . . . . . . 233
11.8.2. Referral Example (READDIR) . . . . . . . . . . . . . 236 11.8.2. Referral Example (READDIR) . . . . . . . . . . . . . 237
11.9. The Attribute fs_locations . . . . . . . . . . . . . . . 238 11.9. The Attribute fs_locations . . . . . . . . . . . . . . . 239
11.10. The Attribute fs_locations_info . . . . . . . . . . . . 240 11.10. The Attribute fs_locations_info . . . . . . . . . . . . 241
11.10.1. The fs_locations_server4 Structure . . . . . . . . . 244 11.10.1. The fs_locations_server4 Structure . . . . . . . . . 245
11.10.2. The fs_locations_info4 Structure . . . . . . . . . . 249 11.10.2. The fs_locations_info4 Structure . . . . . . . . . . 250
11.10.3. The fs_locations_item4 Structure . . . . . . . . . . 250 11.10.3. The fs_locations_item4 Structure . . . . . . . . . . 251
11.11. The Attribute fs_status . . . . . . . . . . . . . . . . 252 11.11. The Attribute fs_status . . . . . . . . . . . . . . . . 253
12. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 256 12. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 257
12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 256 12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 257
12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 257 12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 259
12.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 258 12.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 259
12.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 258 12.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 259
12.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 258 12.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 260
12.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 258 12.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 260
12.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 258 12.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 260
12.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 258 12.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 260
12.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 259 12.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 260
12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 259 12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 261
12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 260 12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 261
12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 260 12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 262
12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 262 12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 263
12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 263 12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 264
12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 263 12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 264
12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 263 12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 264
12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 264 12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 266
12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 265 12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 267
12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 266 12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 268
12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 269 12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 270
12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 276 12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 277
12.5.7. Metadata Server Write Propagation . . . . . . . . . 276 12.5.7. Metadata Server Write Propagation . . . . . . . . . 277
12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 276 12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 278
12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 278 12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 279
12.7.1. Recovery from Client Restart . . . . . . . . . . . . 278 12.7.1. Recovery from Client Restart . . . . . . . . . . . . 280
12.7.2. Dealing with Lease Expiration on the Client . . . . 279 12.7.2. Dealing with Lease Expiration on the Client . . . . 280
12.7.3. Dealing with Loss of Layout State on the Metadata 12.7.3. Dealing with Loss of Layout State on the Metadata
Server . . . . . . . . . . . . . . . . . . . . . . . 280 Server . . . . . . . . . . . . . . . . . . . . . . . 281
12.7.4. Recovery from Metadata Server Restart . . . . . . . 280 12.7.4. Recovery from Metadata Server Restart . . . . . . . 282
12.7.5. Operations During Metadata Server Grace Period . . . 282 12.7.5. Operations During Metadata Server Grace Period . . . 284
12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 283 12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 284
12.8. Metadata and Storage Device Roles . . . . . . . . . . . 283 12.8. Metadata and Storage Device Roles . . . . . . . . . . . 284
12.9. Security Considerations for pNFS . . . . . . . . . . . . 284 12.9. Security Considerations for pNFS . . . . . . . . . . . . 285
13. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 285 13. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 286
13.1. Client ID and Session Considerations . . . . . . . . . . 285 13.1. Client ID and Session Considerations . . . . . . . . . . 286
13.2. File Layout Definitions . . . . . . . . . . . . . . . . 287 13.2. File Layout Definitions . . . . . . . . . . . . . . . . 288
13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 288 13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 289
13.4. Interpreting the File Layout . . . . . . . . . . . . . . 292 13.4. Interpreting the File Layout . . . . . . . . . . . . . . 293
13.4.1. Determining the Stripe Unit Number . . . . . . . . . 292 13.4.1. Determining the Stripe Unit Number . . . . . . . . . 293
13.4.2. Interpreting the File Layout Using Sparse Packing . 292 13.4.2. Interpreting the File Layout Using Sparse Packing . 293
13.4.3. Interpreting the File Layout Using Dense Packing . . 294 13.4.3. Interpreting the File Layout Using Dense Packing . . 296
13.4.4. Sparse and Dense Stripe Unit Packing . . . . . . . . 297 13.4.4. Sparse and Dense Stripe Unit Packing . . . . . . . . 298
13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 298 13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 300
13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 299 13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 301
13.7. COMMIT Through Metadata Server . . . . . . . . . . . . . 302 13.7. COMMIT Through Metadata Server . . . . . . . . . . . . . 303
13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 303 13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 305
13.9. Metadata and Data Server State Coordination . . . . . . 303 13.9. Metadata and Data Server State Coordination . . . . . . 305
13.9.1. Global Stateid Requirements . . . . . . . . . . . . 303 13.9.1. Global Stateid Requirements . . . . . . . . . . . . 305
13.9.2. Data Server State Propagation . . . . . . . . . . . 304 13.9.2. Data Server State Propagation . . . . . . . . . . . 306
13.10. Data Server Component File Size . . . . . . . . . . . . 306 13.10. Data Server Component File Size . . . . . . . . . . . . 308
13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 307 13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 309
13.12. Security Considerations for the File Layout Type . . . . 308 13.12. Security Considerations for the File Layout Type . . . . 309
14. Internationalization . . . . . . . . . . . . . . . . . . . . 309 14. Internationalization . . . . . . . . . . . . . . . . . . . . 310
14.1. Stringprep profile for the utf8str_cs type . . . . . . . 310 14.1. Stringprep profile for the utf8str_cs type . . . . . . . 311
14.2. Stringprep profile for the utf8str_cis type . . . . . . 311 14.2. Stringprep profile for the utf8str_cis type . . . . . . 313
14.3. Stringprep profile for the utf8str_mixed type . . . . . 313 14.3. Stringprep profile for the utf8str_mixed type . . . . . 314
14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 314 14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 316
14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 314 14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 316
15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 315 15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 317
15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 315 15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 317
15.1.1. General Errors . . . . . . . . . . . . . . . . . . . 317 15.1.1. General Errors . . . . . . . . . . . . . . . . . . . 319
15.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 319 15.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 321
15.1.3. Compound Structure Errors . . . . . . . . . . . . . 320 15.1.3. Compound Structure Errors . . . . . . . . . . . . . 322
15.1.4. File System Errors . . . . . . . . . . . . . . . . . 322 15.1.4. File System Errors . . . . . . . . . . . . . . . . . 324
15.1.5. State Management Errors . . . . . . . . . . . . . . 324 15.1.5. State Management Errors . . . . . . . . . . . . . . 326
15.1.6. Security Errors . . . . . . . . . . . . . . . . . . 325 15.1.6. Security Errors . . . . . . . . . . . . . . . . . . 327
15.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 326 15.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 327
15.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 326 15.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 328
15.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 328 15.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 329
15.1.10. pNFS Errors . . . . . . . . . . . . . . . . . . . . 328 15.1.10. pNFS Errors . . . . . . . . . . . . . . . . . . . . 330
15.1.11. Session Use Errors . . . . . . . . . . . . . . . . . 330 15.1.11. Session Use Errors . . . . . . . . . . . . . . . . . 331
15.1.12. Session Management Errors . . . . . . . . . . . . . 331 15.1.12. Session Management Errors . . . . . . . . . . . . . 332
15.1.13. Client Management Errors . . . . . . . . . . . . . . 331 15.1.13. Client Management Errors . . . . . . . . . . . . . . 333
15.1.14. Delegation Errors . . . . . . . . . . . . . . . . . 332 15.1.14. Delegation Errors . . . . . . . . . . . . . . . . . 334
15.1.15. Attribute Handling Errors . . . . . . . . . . . . . 333 15.1.15. Attribute Handling Errors . . . . . . . . . . . . . 334
15.1.16. Obsoleted Errors . . . . . . . . . . . . . . . . . . 333 15.1.16. Obsoleted Errors . . . . . . . . . . . . . . . . . . 335
15.2. Operations and their valid errors . . . . . . . . . . . 334 15.2. Operations and their valid errors . . . . . . . . . . . 336
15.3. Callback operations and their valid errors . . . . . . . 350 15.3. Callback operations and their valid errors . . . . . . . 352
15.4. Errors and the operations that use them . . . . . . . . 352 15.4. Errors and the operations that use them . . . . . . . . 354
16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 366 16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 368
16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 366 16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 368
16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 367 16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 369
17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 377 17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 379
18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 380 18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 382
18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 380 18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 382
18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 383 18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 385
18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 384 18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 386
18.4. Operation 6: CREATE - Create a Non-Regular File Object . 387 18.4. Operation 6: CREATE - Create a Non-Regular File Object . 389
18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting 18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting
Recovery . . . . . . . . . . . . . . . . . . . . . . . . 390 Recovery . . . . . . . . . . . . . . . . . . . . . . . . 392
18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 391 18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 393
18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 391 18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 393
18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 393 18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 395
18.9. Operation 11: LINK - Create Link to a File . . . . . . . 394 18.9. Operation 11: LINK - Create Link to a File . . . . . . . 396
18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 396 18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 398
18.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 400 18.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 402
18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 402 18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 404
18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 403 18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 405
18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 405 18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 407
18.15. Operation 17: NVERIFY - Verify Difference in 18.15. Operation 17: NVERIFY - Verify Difference in
Attributes . . . . . . . . . . . . . . . . . . . . . . . 406 Attributes . . . . . . . . . . . . . . . . . . . . . . . 408
18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 407 18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 409
18.17. Operation 19: OPENATTR - Open Named Attribute 18.17. Operation 19: OPENATTR - Open Named Attribute
Directory . . . . . . . . . . . . . . . . . . . . . . . 426 Directory . . . . . . . . . . . . . . . . . . . . . . . 428
18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 427 18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 429
18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 428 18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 430
18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 429 18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 431
18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 431 18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 433
18.22. Operation 25: READ - Read from File . . . . . . . . . . 431 18.22. Operation 25: READ - Read from File . . . . . . . . . . 433
18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 434 18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 436
18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 437 18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 439
18.25. Operation 28: REMOVE - Remove File System Object . . . . 438 18.25. Operation 28: REMOVE - Remove File System Object . . . . 440
18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 441 18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 443
18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 444 18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 446
18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 445 18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 447
18.29. Operation 33: SECINFO - Obtain Available Security . . . 446 18.29. Operation 33: SECINFO - Obtain Available Security . . . 448
18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 449 18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 451
18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 452 18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 454
18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 453 18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 455
18.33. Operation 40: BACKCHANNEL_CTL - Backchannel control . . 458 18.33. Operation 40: BACKCHANNEL_CTL - Backchannel control . . 460
18.34. Operation 41: BIND_CONN_TO_SESSION . . . . . . . . . . . 459 18.34. Operation 41: BIND_CONN_TO_SESSION . . . . . . . . . . . 461
18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 462 18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 464
18.36. Operation 43: CREATE_SESSION - Create New Session and 18.36. Operation 43: CREATE_SESSION - Create New Session and
Confirm Client ID . . . . . . . . . . . . . . . . . . . 478 Confirm Client ID . . . . . . . . . . . . . . . . . . . 480
18.37. Operation 44: DESTROY_SESSION - Destroy existing 18.37. Operation 44: DESTROY_SESSION - Destroy existing
session . . . . . . . . . . . . . . . . . . . . . . . . 487 session . . . . . . . . . . . . . . . . . . . . . . . . 490
18.38. Operation 45: FREE_STATEID - Free stateid with no 18.38. Operation 45: FREE_STATEID - Free stateid with no
locks . . . . . . . . . . . . . . . . . . . . . . . . . 489 locks . . . . . . . . . . . . . . . . . . . . . . . . . 492
18.39. Operation 46: GET_DIR_DELEGATION - Get a directory 18.39. Operation 46: GET_DIR_DELEGATION - Get a directory
delegation . . . . . . . . . . . . . . . . . . . . . . . 490 delegation . . . . . . . . . . . . . . . . . . . . . . . 493
18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 494 18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 497
18.41. Operation 48: GETDEVICELIST - Get All Device Mappings 18.41. Operation 48: GETDEVICELIST - Get All Device Mappings
for a File System . . . . . . . . . . . . . . . . . . . 496 for a File System . . . . . . . . . . . . . . . . . . . 499
18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using 18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using
a layout . . . . . . . . . . . . . . . . . . . . . . . . 498 a layout . . . . . . . . . . . . . . . . . . . . . . . . 501
18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 501 18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 504
18.44. Operation 51: LAYOUTRETURN - Release Layout 18.44. Operation 51: LAYOUTRETURN - Release Layout
Information . . . . . . . . . . . . . . . . . . . . . . 505 Information . . . . . . . . . . . . . . . . . . . . . . 508
18.45. Operation 52: SECINFO_NO_NAME - Get Security on 18.45. Operation 52: SECINFO_NO_NAME - Get Security on
Unnamed Object . . . . . . . . . . . . . . . . . . . . . 510 Unnamed Object . . . . . . . . . . . . . . . . . . . . . 513
18.46. Operation 53: SEQUENCE - Supply per-procedure 18.46. Operation 53: SEQUENCE - Supply per-procedure
sequencing and control . . . . . . . . . . . . . . . . . 511 sequencing and control . . . . . . . . . . . . . . . . . 514
18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 517 18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 520
18.48. Operation 55: TEST_STATEID - Test stateids for 18.48. Operation 55: TEST_STATEID - Test stateids for
validity . . . . . . . . . . . . . . . . . . . . . . . . 519 validity . . . . . . . . . . . . . . . . . . . . . . . . 522
18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 521 18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 524
18.50. Operation 57: DESTROY_CLIENTID - Destroy existing 18.50. Operation 57: DESTROY_CLIENTID - Destroy existing
client ID . . . . . . . . . . . . . . . . . . . . . . . 525 client ID . . . . . . . . . . . . . . . . . . . . . . . 527
18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims 18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims
Finished . . . . . . . . . . . . . . . . . . . . . . . . 525 Finished . . . . . . . . . . . . . . . . . . . . . . . . 528
18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 528 18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 530
19. NFSv44.1 Callback Procedures . . . . . . . . . . . . . . . . 528 19. NFSv44.1 Callback Procedures . . . . . . . . . . . . . . . . 531
19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 529 19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 531
19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 529 19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 531
20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 533 20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 536
20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 533 20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 536
20.2. Operation 4: CB_RECALL - Recall an Open Delegation . . . 534 20.2. Operation 4: CB_RECALL - Recall an Open Delegation . . . 537
20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from 20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from
Client . . . . . . . . . . . . . . . . . . . . . . . . . 535 Client . . . . . . . . . . . . . . . . . . . . . . . . . 538
20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 539 20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 542
20.5. Operation 7: CB_PUSH_DELEG - Offer Delegation to 20.5. Operation 7: CB_PUSH_DELEG - Offer Delegation to
Client . . . . . . . . . . . . . . . . . . . . . . . . . 543 Client . . . . . . . . . . . . . . . . . . . . . . . . . 546
20.6. Operation 8: CB_RECALL_ANY - Keep any N delegations . . 544 20.6. Operation 8: CB_RECALL_ANY - Keep any N delegations . . 547
20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal 20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal
Resources for Recallable Objects . . . . . . . . . . . . 546 Resources for Recallable Objects . . . . . . . . . . . . 549
20.8. Operation 10: CB_RECALL_SLOT - change flow control 20.8. Operation 10: CB_RECALL_SLOT - change flow control
limits . . . . . . . . . . . . . . . . . . . . . . . . . 547 limits . . . . . . . . . . . . . . . . . . . . . . . . . 550
20.9. Operation 11: CB_SEQUENCE - Supply backchannel 20.9. Operation 11: CB_SEQUENCE - Supply backchannel
sequencing and control . . . . . . . . . . . . . . . . . 548 sequencing and control . . . . . . . . . . . . . . . . . 551
20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending 20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending
Delegation Wants . . . . . . . . . . . . . . . . . . . . 550 Delegation Wants . . . . . . . . . . . . . . . . . . . . 553
20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible 20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible
lock availability . . . . . . . . . . . . . . . . . . . 551 lock availability . . . . . . . . . . . . . . . . . . . 554
20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify device ID 20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify device ID
changes . . . . . . . . . . . . . . . . . . . . . . . . 553 changes . . . . . . . . . . . . . . . . . . . . . . . . 556
20.13. Operation 10044: CB_ILLEGAL - Illegal Callback 20.13. Operation 10044: CB_ILLEGAL - Illegal Callback
Operation . . . . . . . . . . . . . . . . . . . . . . . 555 Operation . . . . . . . . . . . . . . . . . . . . . . . 558
21. Security Considerations . . . . . . . . . . . . . . . . . . . 555 21. Security Considerations . . . . . . . . . . . . . . . . . . . 558
22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 557 22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 560
22.1. Named Attribute Definitions . . . . . . . . . . . . . . 557 22.1. Named Attribute Definitions . . . . . . . . . . . . . . 560
22.2. ONC RPC Network Identifiers (netids) . . . . . . . . . . 557 22.2. ONC RPC Network Identifiers (netids) . . . . . . . . . . 560
22.3. Defining New Notifications . . . . . . . . . . . . . . . 558 22.3. Defining New Notifications . . . . . . . . . . . . . . . 561
22.4. Defining New Layout Types . . . . . . . . . . . . . . . 559 22.4. Defining New Layout Types . . . . . . . . . . . . . . . 561
22.5. Path Variable Definitions . . . . . . . . . . . . . . . 560 22.5. Path Variable Definitions . . . . . . . . . . . . . . . 563
22.5.1. Path Variable Values . . . . . . . . . . . . . . . . 560 22.5.1. Path Variable Values . . . . . . . . . . . . . . . . 563
22.5.2. Path Variable Names . . . . . . . . . . . . . . . . 561 22.5.2. Path Variable Names . . . . . . . . . . . . . . . . 563
23. References . . . . . . . . . . . . . . . . . . . . . . . . . 561 23. References . . . . . . . . . . . . . . . . . . . . . . . . . 563
23.1. Normative References . . . . . . . . . . . . . . . . . . 561 23.1. Normative References . . . . . . . . . . . . . . . . . . 563
23.2. Informative References . . . . . . . . . . . . . . . . . 562 23.2. Informative References . . . . . . . . . . . . . . . . . 565
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 564 Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 566
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 566 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 568
Intellectual Property and Copyright Statements . . . . . . . . . 567 Intellectual Property and Copyright Statements . . . . . . . . . 570
1. Introduction 1. Introduction
1.1. The NFS Version 4 Minor Version 1 Protocol 1.1. The NFS Version 4 Minor Version 1 Protocol
The NFS version 4 minor version 1 (NFSv4.1) protocol is the second The NFS version 4 minor version 1 (NFSv4.1) protocol is the second
minor version of the NFS version 4 (NFSv4) protocol. The first minor minor version of the NFS version 4 (NFSv4) protocol. The first minor
version, NFSv4.0 is described in [21]. It generally follows the version, NFSv4.0 is described in [21]. It generally follows the
guidelines for minor versioning model listed in Section 10 of RFC guidelines for minor versioning model listed in Section 10 of RFC
3530. However, it diverges from guidelines 11 ("a client and server 3530. However, it diverges from guidelines 11 ("a client and server
skipping to change at page 16, line 39 skipping to change at page 16, line 39
systems are reachable from a special per-server global root systems are reachable from a special per-server global root
filehandle. This allows LOOKUP operations to be used to perform filehandle. This allows LOOKUP operations to be used to perform
functions previously provided by the MOUNT protocol. The server functions previously provided by the MOUNT protocol. The server
provides any necessary pseudo file systems to bridge any gaps that provides any necessary pseudo file systems to bridge any gaps that
arise due to unexported gaps between exported file systems. arise due to unexported gaps between exported file systems.
1.6.3.1. Filehandles 1.6.3.1. Filehandles
As in previous versions of the NFS protocol, opaque filehandles are As in previous versions of the NFS protocol, opaque filehandles are
used to identify individual files and directories. Lookup-type and used to identify individual files and directories. Lookup-type and
create operations are used to go from file and directory names to the create operations translate file and directory names to filehandles
filehandle which is then used to identify the object to subsequent which are then used to identify objects in subsequent operations.
operations.
The NFSv4.1 protocol provides support for persistent filehandles, The NFSv4.1 protocol provides support for persistent filehandles,
guaranteed to be valid for the lifetime of the file system object guaranteed to be valid for the lifetime of the file system object
designated. In addition it provides support to servers to provide designated. In addition it provides support to servers to provide
filehandles with more limited validity guarantees, called volatile filehandles with more limited validity guarantees, called volatile
filehandles. filehandles.
1.6.3.2. File Attributes 1.6.3.2. File Attributes
The NFSv4.1 protocol has a rich and extensible attribute structure. The NFSv4.1 protocol has a rich and extensible attribute structure,
Only a small set of the defined attributes are REQUIRED to be which is divided into REQUIRED, RECOMMENDED, and named attributes.
provided by all server implementations. The other attributes are
known as RECOMMENDED attributes.
The acl, sacl, and dacl attributes are a significant set of file The acl, sacl, and dacl attributes compose a set of RECOMMENDED file
attributes that make up the Access Control List (ACL) of a file. attributes that make up the Access Control List (ACL) of a file
These attributes provide for directory and file access control beyond (Section 6). These attributes provide for directory and file access
the model used in NFSv3. The ACL definition allows for specification control beyond the model used in NFSv3. The ACL definition allows
of specific sets of permissions for individual users and groups. In for specification of specific sets of permissions for individual
addition, ACL inheritance allows propagation of access permissions users and groups. In addition, ACL inheritance allows propagation of
and restriction down a directory tree as file system objects are access permissions and restriction down a directory tree as file
created. system objects are created.
One other type of attribute is the named attribute. A named A named attribute is an opaque byte stream that is associated with a
attribute is an opaque byte stream that is associated with a
directory or file and referred to by a string name. Named attributes directory or file and referred to by a string name. Named attributes
are meant to be used by client applications as a method to associate are meant to be used by client applications as a method to associate
application-specific data with a regular file or directory. NFSv4.1 application-specific data with a regular file or directory. NFSv4.1
modifies named attributes relative to NFSv4.0 by tightening the modifies named attributes relative to NFSv4.0 by tightening the
allowed operations in order to prevent the development of non- allowed operations in order to prevent the development of non-
interoperable implementation. See Section 5.3 for details. interoperable implementation. See Section 5.3 for details.
1.6.3.3. Multi-server Namespace 1.6.3.3. Multi-server Namespace
NFSv4.1 contains a number of features to allow implementation of NFSv4.1 contains a number of features to allow implementation of
skipping to change at page 18, line 26 skipping to change at page 18, line 26
The types of locks are: The types of locks are:
o Share reservations as established by OPEN operations. o Share reservations as established by OPEN operations.
o Byte-range locks. o Byte-range locks.
o File delegations, which are recallable locks that assure the o File delegations, which are recallable locks that assure the
holder that inconsistent opens and file changes cannot occur so holder that inconsistent opens and file changes cannot occur so
long as the delegation is held. long as the delegation is held.
o Directory delegations, which are recallable delegations that o Directory delegations, which are recallable locks that assure the
assure the holder that inconsistent directory modifications cannot holder that inconsistent directory modifications cannot occur so
occur so long as the delegation is held. long as the delegation is held.
o Layouts, which are recallable objects that assure the holder that o Layouts, which are recallable objects that assure the holder that
direct access to the file data may be performed directly by the direct access to the file data may be performed directly by the
client and that no change to the data's location inconsistent with client and that no change to the data's location inconsistent with
that access may be made so long as the layout is held. that access may be made so long as the layout is held.
All locks for a given client are tied together under a single client- All locks for a given client are tied together under a single client-
wide lease. All requests made on sessions associated with the client wide lease. All requests made on sessions associated with the client
renew that lease. When leases are not promptly renewed locks are renew that lease. When leases are not promptly renewed locks are
subject to revocation. In the event of server reboot, clients have subject to revocation. In the event of server reboot, clients have
the opportunity to safely reclaim their locks within a special grace the opportunity to safely reclaim their locks within a special grace
period. period.
1.7. Differences from NFSv4.0 1.7. Differences from NFSv4.0
The following summarizes the differences between minor version one The following summarizes the major differences between minor version
and the base protocol: one and the base protocol:
o Implementation of the sessions model. o Implementation of the sessions model.
o Support for parallel access to data. o Support for parallel access to data.
o Addition of the RECLAIM_COMPLETE operation to better structure the o Addition of the RECLAIM_COMPLETE operation to better structure the
lock reclamation process. lock reclamation process.
o Support for delegations on directories and other file types in o Support for delegations on directories and other file types in
addition to regular files. addition to regular files.
skipping to change at page 19, line 30 skipping to change at page 19, line 30
2.2. RPC and XDR 2.2. RPC and XDR
The NFSv4.1 protocol is a Remote Procedure Call (RPC) application The NFSv4.1 protocol is a Remote Procedure Call (RPC) application
that uses RPC version 2 and the corresponding eXternal Data that uses RPC version 2 and the corresponding eXternal Data
Representation (XDR) as defined in [3] and [2]. Representation (XDR) as defined in [3] and [2].
2.2.1. RPC-based Security 2.2.1. RPC-based Security
Previous NFS versions have been thought of as having a host-based Previous NFS versions have been thought of as having a host-based
authentication model, where the NFS server authenticates the NFS authentication model, where the NFS server authenticates the NFS
client, and trust the client to authenticate all users. Actually, client, and trusts the client to authenticate all users. Actually,
NFS has always depended on RPC for authentication. The first form of NFS has always depended on RPC for authentication. One of the first
RPC authentication which required a host-based authentication forms of RPC authentication, AUTH_SYS, had no strong authentication,
approach. NFSv4.1 also depends on RPC for basic security services, and required a host-based authentication approach. NFSv4.1 also
and mandates RPC support for a user-based authentication model. The depends on RPC for basic security services, and mandates RPC support
user-based authentication model has user principals authenticated by for a user-based authentication model. The user-based authentication
a server, and in turn the server authenticated by user principals. model has user principals authenticated by a server, and in turn the
RPC provides some basic security services which are used by NFSv4.1. server authenticated by user principals. RPC provides some basic
security services which are used by NFSv4.1.
2.2.1.1. RPC Security Flavors 2.2.1.1. RPC Security Flavors
As described in section 7.2 "Authentication" of [3], RPC security is As described in section 7.2 "Authentication" of [3], RPC security is
encapsulated in the RPC header, via a security or authentication encapsulated in the RPC header, via a security or authentication
flavor, and information specific to the specification of the security flavor, and information specific to the specified security flavor.
flavor. Every RPC header conveys information used to identify and Every RPC header conveys information used to identify and
authenticate a client and server. As discussed in Section 2.2.1.1.1, authenticate a client and server. As discussed in Section 2.2.1.1.1,
some security flavors provide additional security services. some security flavors provide additional security services.
NFSv4.1 clients and servers MUST implement RPCSEC_GSS. (This NFSv4.1 clients and servers MUST implement RPCSEC_GSS. (This
requirement to implement is not a requirement to use.) Other requirement to implement is not a requirement to use.) Other
flavors, such as AUTH_NONE, and AUTH_SYS, MAY be implemented as well. flavors, such as AUTH_NONE, and AUTH_SYS, MAY be implemented as well.
2.2.1.1.1. RPCSEC_GSS and Security Services 2.2.1.1.1. RPCSEC_GSS and Security Services
RPCSEC_GSS ([4]) uses the functionality of GSS-API [7]. This allows RPCSEC_GSS ([4]) uses the functionality of GSS-API [7]. This allows
skipping to change at page 24, line 14 skipping to change at page 24, line 14
recovery see Section 12.7.1. recovery see Section 12.7.1.
Releasing such state requires that the server be able to determine Releasing such state requires that the server be able to determine
that one client instance is the successor of another. Where this that one client instance is the successor of another. Where this
cannot be done, for any of a number of reasons, the locking state cannot be done, for any of a number of reasons, the locking state
will remain for a time subject to lease expiration (see Section 8.3) will remain for a time subject to lease expiration (see Section 8.3)
and the new client will need to wait for such state to be removed, if and the new client will need to wait for such state to be removed, if
it makes conflicting lock requests. it makes conflicting lock requests.
Client identification is encapsulated in the following Client Owner Client identification is encapsulated in the following Client Owner
structure: data type:
struct client_owner4 { struct client_owner4 {
verifier4 co_verifier; verifier4 co_verifier;
opaque co_ownerid<NFS4_OPAQUE_LIMIT>; opaque co_ownerid<NFS4_OPAQUE_LIMIT>;
}; };
The first field, co_verifier, is a client incarnation verifier. The The first field, co_verifier, is a client incarnation verifier. The
server will start the process of canceling the client's leased state server will start the process of canceling the client's leased state
if co_verifier is different from what the server has previously if co_verifier is different from what the server has previously
recorded for the identified client (as specified in the co_ownerid recorded for the identified client (as specified in the co_ownerid
skipping to change at page 25, line 39 skipping to change at page 25, line 39
* A MAC address (again, a one way function should be performed). * A MAC address (again, a one way function should be performed).
* The timestamp of when the NFSv4.1 software was first installed * The timestamp of when the NFSv4.1 software was first installed
on the client (though this is subject to the previously on the client (though this is subject to the previously
mentioned caution about using information that is stored in a mentioned caution about using information that is stored in a
file, because the file might only be accessible over NFSv4.1). file, because the file might only be accessible over NFSv4.1).
* A true random number. However, since this number ought to be * A true random number. However, since this number ought to be
the same between client incarnations, this shares the same the same between client incarnations, this shares the same
problem as that of the using the timestamp of the software problem as that of using the timestamp of the software
installation. installation.
o For a user level NFSv4.1 client, it should contain additional o For a user level NFSv4.1 client, it should contain additional
information to distinguish the client from other user level information to distinguish the client from other user level
clients running on the same host, such as a process identifier or clients running on the same host, such as a process identifier or
other unique sequence. other unique sequence.
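The following C sketch illustrates the guidance above. It builds a co_ownerid string from a hashed MAC address plus an optional discriminator for user level clients. The function names, the "nfs41:" prefix, and the string layout are all hypothetical. FNV-1a appears only as a compact stand-in: it is not a cryptographic one-way function, and a real client would substitute a cryptographic hash.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* FNV-1a (64-bit). Placeholder only: this is NOT a cryptographic one-way
 * function; a real client would use a cryptographic hash instead. */
static uint64_t fnv1a64(const unsigned char *data, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;        /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;                 /* FNV prime */
    }
    return h;
}

/* Hypothetical helper producing "nfs41:<hashed MAC>:<tag>". The tag separates
 * user level clients on one host; "kernel" is used if none is given.
 * Returns the length written, or -1 if buf is too small (no truncation). */
static int build_co_ownerid(char *buf, size_t buflen,
                            const unsigned char mac[6], const char *tag)
{
    int n = snprintf(buf, buflen, "nfs41:%016llx:%s",
                     (unsigned long long)fnv1a64(mac, 6),
                     tag != NULL ? tag : "kernel");
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return n;
}
```

For the reasons given above, any tag used this way must itself stay the same across client incarnations.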
The client ID is assigned by the server (the eir_clientid result from The client ID is assigned by the server (the eir_clientid result from
EXCHANGE_ID) and should be chosen so that it will not conflict with a EXCHANGE_ID) and should be chosen so that it will not conflict with a
client ID previously assigned by the server. This applies across client ID previously assigned by the server. This applies across
skipping to change at page 26, line 43 skipping to change at page 26, line 43
an attempt is made to establish this new session with the existing an attempt is made to establish this new session with the existing
client ID, the server will reject the request with client ID, the server will reject the request with
NFS4ERR_STALE_CLIENTID. NFS4ERR_STALE_CLIENTID.
When NFS4ERR_STALE_CLIENTID is received in either of these When NFS4ERR_STALE_CLIENTID is received in either of these
situations, the client must obtain a new client ID by use of the situations, the client must obtain a new client ID by use of the
EXCHANGE_ID operation, then use that client ID as the basis of a new EXCHANGE_ID operation, then use that client ID as the basis of a new
session, and then proceed to any other necessary recovery for the session, and then proceed to any other necessary recovery for the
server restart case (see Section 8.4.2). server restart case (see Section 8.4.2).
See the detailed descriptions of EXCHANGE_ID (Section 18.35 and See the descriptions of EXCHANGE_ID (Section 18.35) and
CREATE_SESSION (Section 18.36) for a complete specification of these CREATE_SESSION (Section 18.36) for a complete specification of these
operations. operations.
2.4.1. Upgrade from NFSv4.0 to NFSv4.1 2.4.1. Upgrade from NFSv4.0 to NFSv4.1
To facilitate upgrade from NFSv4.0 to NFSv4.1, a server may compare a To facilitate upgrade from NFSv4.0 to NFSv4.1, a server may compare a
client_owner4 in an EXCHANGE_ID with an nfs_client_id4 established client_owner4 in an EXCHANGE_ID with an nfs_client_id4 established
using SETCLIENTID using NFSv4.0, so that an NFSv4.1 client is not using the SETCLIENTID operation of NFSv4.0. A server that does so
forced to delay until lease expiration for locking state established will allow an upgraded client to avoid waiting until the lease (i.e.
by the earlier client using minor version 0. This requires the
client_owner4 be constructed the same way as the nfs_client_id4. If the lease established by the NFSv4.0 instance of the client) expires. This
the latter's contents included the server's network address, and the requires that the client_owner4 be constructed the same way as the
NFSv4.1 client does not wish to use a client ID that prevents nfs_client_id4. If the latter's contents included the server's
trunking, it should send two EXCHANGE_ID operations. The first network address (per the recommendations of the NFSv4.0 specification
EXCHANGE_ID will have a client_owner4 equal to the nfs_client_id4. [21]), and the NFSv4.1 client does not wish to use a client ID that
This will clear the state created by the NFSv4.0 client. The second prevents trunking, it should send two EXCHANGE_ID operations. The
EXCHANGE_ID will not have the server's network address. The state first EXCHANGE_ID will have a client_owner4 equal to the
created for the second EXCHANGE_ID will not have to wait for lease nfs_client_id4. This will clear the state created by the NFSv4.0
expiration, because there will be no state to expire. client. The second EXCHANGE_ID will not have the server's network
address. The state created for the second EXCHANGE_ID will not have
to wait for lease expiration, because there will be no state to
expire.
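The upgrade procedure described above can be sketched in C. For illustration only, the sketch assumes that when the NFSv4.0 nfs_client_id4 id string embedded the server's address, it was formed as "<base>@<server address>". The structure and function names are hypothetical, and verifier handling is omitted.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical plan: which co_ownerid strings to send, in order. */
struct exchange_id_plan {
    int  count;              /* 1 or 2 EXCHANGE_ID operations */
    char owner[2][128];      /* co_ownerid for each, in sending order */
};

/* Format "<base>" or "<base>@<addr>"; fail rather than truncate. */
static int fill_owner(char *dst, size_t len, const char *base, const char *addr)
{
    int n = addr ? snprintf(dst, len, "%s@%s", base, addr)
                 : snprintf(dst, len, "%s", base);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* base: address-independent part of the NFSv4.0 id (assumed layout).
 * v40_server_addr: address embedded in the NFSv4.0 id, or NULL if none. */
static int plan_upgrade(const char *base, const char *v40_server_addr,
                        struct exchange_id_plan *plan)
{
    if (base == NULL || strlen(base) == 0)
        return -1;
    /* First EXCHANGE_ID: reproduce the NFSv4.0 id exactly so that the
     * server can clear the state left by the NFSv4.0 instance. */
    if (fill_owner(plan->owner[0], sizeof plan->owner[0], base,
                   v40_server_addr) != 0)
        return -1;
    plan->count = 1;
    if (v40_server_addr == NULL)
        return 0;            /* already address-free: trunking not blocked */
    /* Second EXCHANGE_ID: omit the server address so that the new
     * client ID does not prevent trunking. */
    if (fill_owner(plan->owner[1], sizeof plan->owner[1], base, NULL) != 0)
        return -1;
    plan->count = 2;
    return 0;
}
```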
2.4.2. Server Release of Client ID 2.4.2. Server Release of Client ID
NFSv4.1 introduces a new operation called DESTROY_CLIENTID NFSv4.1 introduces a new operation called DESTROY_CLIENTID
(Section 18.50) which the client SHOULD use to destroy a client ID it (Section 18.50) which the client SHOULD use to destroy a client ID it
no longer needs. This permits graceful, bilateral release of a no longer needs. This permits graceful, bilateral release of a
client ID. The operation cannot be used if there are sessions client ID. The operation cannot be used if there are sessions
associated with the client ID, or state with an unexpired lease. associated with the client ID, or state with an unexpired lease.
If the server determines that the client holds no associated state If the server determines that the client holds no associated state
for its client ID (including sessions, opens, locks, delegations, for its client ID (including sessions, opens, locks, delegations,
layouts, and wants), the server may choose to unilaterally release layouts, and wants), the server may choose to unilaterally release
the client ID. The server may make this choice for an inactive the client ID in order to conserve resources. If the client contacts
client so that resources are not consumed by those intermittently the server after this release, the server must ensure the client
active clients. If the client contacts the server after this receives the appropriate error so that it will use the EXCHANGE_ID/
release, the server must ensure the client receives the appropriate CREATE_SESSION sequence to establish a new client ID. The server
error so that it will use the EXCHANGE_ID/CREATE_SESSION sequence to ought to be very hesitant to release a client ID since the resulting
establish a new identity. It should be clear that the server must be work on the client to recover from such an event will be the same
very hesitant to release a client ID since the resulting work on the burden as if the server had failed and restarted. Typically a server
client to recover from such an event will be the same burden as if would not release a client ID unless there had been no activity from
the server had failed and restarted. Typically a server would not that client for many minutes. As long as there are sessions, opens,
release a client ID unless there had been no activity from that
client for many minutes. As long as there are sessions, opens,
locks, delegations, layouts, or wants, the server MUST NOT release locks, delegations, layouts, or wants, the server MUST NOT release
the client ID. See Section 2.10.10.1.4 for discussion on releasing the client ID. See Section 2.10.10.1.4 for discussion on releasing
inactive sessions. inactive sessions.
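A server's release decision, as described above, might be sketched as follows. The structure, the field names, and the 30-minute idle threshold are assumptions; the text only says "many minutes". The fixed rule is that a client ID still holding sessions, opens, locks, delegations, layouts, or wants is never released.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-client-ID bookkeeping on the server. */
struct clientid_state {
    unsigned sessions, opens, locks, delegations, layouts, wants;
    time_t   last_activity;          /* time of the client's last request */
};

/* Assumed policy value; the text only asks for "many minutes". */
#define RELEASE_IDLE_SECONDS (30.0 * 60.0)

static bool holds_state(const struct clientid_state *c)
{
    return c->sessions || c->opens || c->locks ||
           c->delegations || c->layouts || c->wants;
}

/* True if the server may unilaterally release this client ID at "now". */
static bool may_release_clientid(const struct clientid_state *c, time_t now)
{
    if (holds_state(c))
        return false;    /* MUST NOT release while any such state exists */
    /* Recovery costs the client as much as a server restart, so release
     * only after a long idle period. */
    return difftime(now, c->last_activity) >= RELEASE_IDLE_SECONDS;
}
```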
2.4.3. Resolving Client Owner Conflicts 2.4.3. Resolving Client Owner Conflicts
When the server gets an EXCHANGE_ID for a client owner that currently When the server gets an EXCHANGE_ID for a client owner that currently
has no state, or that has state, but the lease has expired, the has no state, or that has state, but the lease has expired, the
server MUST allow the EXCHANGE_ID, and confirm the new client ID if server MUST allow the EXCHANGE_ID, and confirm the new client ID if
followed by the appropriate CREATE_SESSION. followed by the appropriate CREATE_SESSION.
When the server gets an EXCHANGE_ID for a new incarnation of a client When the server gets an EXCHANGE_ID for a new incarnation of a client
owner that currently has an old incarnation with state and an owner that currently has an old incarnation with state and an
unexpired lease, the server is allowed to dispose of the state of the unexpired lease, the server is allowed to dispose of the state of the
previous incarnation of the client owner if one of the following is previous incarnation of the client owner if one of the following is
true: true:
o The principal that created the client ID for the client owner is o The principal that created the client ID for the client owner is
the same as the principal that is issuing the EXCHANGE_ID. Note the same as the principal that is issuing the EXCHANGE_ID. Note
that if the client ID was created with SP4_MACH_CRED protection that if the client ID was created with SP4_MACH_CRED state
(Section 18.35), the principal MUST be based on RPCSEC_GSS protection (Section 18.35), the principal MUST be based on
authentication, the RPCSEC_GSS service used MUST be integrity or RPCSEC_GSS authentication, the RPCSEC_GSS service used MUST be
privacy, and the same GSS mechanism and principal must be used as integrity or privacy, and the same GSS mechanism and principal
that used when the client ID was created. must be used as that used when the client ID was created.
o The client ID was established with SP4_SSV protection o The client ID was established with SP4_SSV protection
(Section 18.35, Section 2.10.7.3) and the client sends the (Section 18.35, Section 2.10.7.3) and the client sends the
EXCHANGE_ID with the security flavor set to RPCSEC_GSS using the EXCHANGE_ID with the security flavor set to RPCSEC_GSS using the
GSS SSV mechanism (Section 2.10.8). GSS SSV mechanism (Section 2.10.8).
o The client ID was established with SP4_SSV protection. Because o The client ID was established with SP4_SSV protection, and under
the SSV might not persist across client and server restart, and the conditions described herein, the EXCHANGE_ID was sent with
because the first time a client sends EXCHANGE_ID to a server it SP4_MACH_CRED state protection. Because the SSV might not persist
does not have an SSV, the client MAY send the subsequent across client and server restart, and because the first time a
EXCHANGE_ID without an SSV RPCSEC_GSS handle. Instead, as with client sends EXCHANGE_ID to a server it does not have an SSV, the
SP4_MACH_CRED protection, the principal MUST be based on client MAY send the subsequent EXCHANGE_ID without an SSV
RPCSEC_GSS authentication, the RPCSEC_GSS service used MUST be RPCSEC_GSS handle. Instead, as with SP4_MACH_CRED protection, the
integrity or privacy, and the same GSS mechanism and principal principal MUST be based on RPCSEC_GSS authentication, the
must be used as that used when the client ID was created. RPCSEC_GSS service used MUST be integrity or privacy, and the same
GSS mechanism and principal MUST be used as that used when the
client ID was created.
If none of the above situations apply, the server MUST return If none of the above situations apply, the server MUST return
NFS4ERR_CLID_INUSE. NFS4ERR_CLID_INUSE.
If the server accepts the principal and co_ownerid as matching that If the server accepts the principal and co_ownerid as matching that
which created the client ID, it deletes state (once CREATE_SESSION which created the client ID, and the co_verifier in the EXCHANGE_ID
confirms the client ID) if the co_verifier in the EXCHANGE_ID differs differs from the co_verifier used when the client ID was created,
from the co_verifier used when the client ID was created. If the then after the server receives a CREATE_SESSION that confirms the
co_verifier values are the same, then the client is either updating client ID, the server deletes state. If the co_verifier values are
properties of the client ID (Section 18.35), or possibly attempting the same (e.g., the client is either updating properties of the
trunking (Section 2.10.4) and the server MUST NOT delete state. client ID (Section 18.35) or attempting trunking
(Section 2.10.4)), the server MUST NOT delete state.
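The server's decision when an EXCHANGE_ID names an existing client owner can be summarized in C. This is a simplified sketch. The request and owner structures are hypothetical, and each boolean stands in for a check that a real server performs on the RPCSEC_GSS context. The SP4_* values are local stand-ins for the state protection choices named in Section 18.35, not a reproduction of its XDR. The SP4_SSV branch reflects one reading of the second and third bullets.

```c
#include <stdbool.h>

/* Local stand-ins for the state protection choices of Section 18.35. */
enum sp4_how { SP4_NONE, SP4_MACH_CRED, SP4_SSV };

/* What the server recorded for the existing client owner (hypothetical). */
struct existing_owner {
    bool         has_state;
    bool         lease_expired;
    enum sp4_how protection;     /* how the client ID was created */
};

/* Facts about the incoming EXCHANGE_ID (hypothetical). */
struct exid_request {
    bool same_principal;     /* same principal as created the client ID */
    bool gss_integ_or_priv;  /* RPCSEC_GSS with integrity or privacy */
    bool same_gss_mech;      /* same GSS mechanism as at creation */
    bool uses_ssv_mech;      /* sent using the GSS SSV mechanism */
    bool verifier_differs;   /* co_verifier differs from the recorded one */
};

enum exid_outcome {
    EXID_ALLOW,               /* no state, or the lease has expired */
    EXID_ALLOW_DELETE_STATE,  /* delete old state once CREATE_SESSION confirms */
    EXID_ALLOW_KEEP_STATE,    /* same verifier: property update or trunking */
    EXID_CLID_INUSE           /* return NFS4ERR_CLID_INUSE */
};

static bool strict_gss_match(const struct exid_request *r)
{
    return r->same_principal && r->gss_integ_or_priv && r->same_gss_mech;
}

static enum exid_outcome resolve_exchange_id(const struct existing_owner *o,
                                             const struct exid_request *r)
{
    bool matched;

    if (!o->has_state || o->lease_expired)
        return EXID_ALLOW;

    switch (o->protection) {
    case SP4_MACH_CRED:      /* first bullet, with its SP4_MACH_CRED note */
        matched = strict_gss_match(r);
        break;
    case SP4_SSV:            /* second bullet, or third bullet without SSV */
        matched = r->uses_ssv_mech || strict_gss_match(r);
        break;
    default:                 /* SP4_NONE: first bullet */
        matched = r->same_principal;
        break;
    }
    if (!matched)
        return EXID_CLID_INUSE;
    return r->verifier_differs ? EXID_ALLOW_DELETE_STATE
                               : EXID_ALLOW_KEEP_STATE;
}
```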
2.5. Server Owners 2.5. Server Owners
The Server Owner is similar to a Client Owner (Section 2.4), but The Server Owner is similar to a Client Owner (Section 2.4), but
unlike the Client Owner, there is no shorthand serverid. The Server unlike the Client Owner, there is no shorthand server ID. The Server
Owner is defined in the following structure: Owner is defined in the following data type:
struct server_owner4 { struct server_owner4 {
uint64_t so_minor_id; uint64_t so_minor_id;
opaque so_major_id<NFS4_OPAQUE_LIMIT>; opaque so_major_id<NFS4_OPAQUE_LIMIT>;
}; };
The Server Owner is returned from EXCHANGE_ID. When the so_major_id The Server Owner is returned from EXCHANGE_ID. When the so_major_id
fields are the same in two EXCHANGE_ID results, the connections each fields are the same in two EXCHANGE_ID results, the connections each
EXCHANGE_ID are sent over can be assumed to address the same Server EXCHANGE_ID were sent over can be assumed to address the same Server
(as defined in Section 1.5). If the so_minor_id fields are also the (as defined in Section 1.5). If the so_minor_id fields are also the
same, then not only do both connections connect to the same server, same, then not only do both connections connect to the same server,
but the session and other state can be shared across both but the session can be shared across both connections. The reader is
connections. The reader is cautioned that multiple servers may cautioned that multiple servers may deliberately or accidentally
deliberately or accidentally claim to have the same so_major_id or claim to have the same so_major_id or so_major_id/so_minor_id; the
so_major_id/so_minor_id; the reader should examine Section 2.10.4 and reader should examine Section 2.10.4 and Section 18.35 in order to
Section 18.35. avoid acting on falsely matching Server Owner values.
The considerations for generating a so_major_id are similar to those The considerations for generating a so_major_id are similar to those
for generating a co_ownerid string (see Section 2.4). The for generating a co_ownerid string (see Section 2.4). The
consequences of two servers generating conflicting so_major_id values consequences of two servers generating conflicting so_major_id values
are less dire than they are for co_ownerid conflicts because the are less dire than they are for co_ownerid conflicts because the
client can use RPCSEC_GSS to compare the authenticity of each server client can use RPCSEC_GSS to compare the authenticity of each server
(see Section 2.10.4). (see Section 2.10.4).
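The following C sketch shows how a client might compare two Server Owner values returned by EXCHANGE_ID. The struct is a local stand-in for server_owner4 that holds the variable-length opaque as a length plus a fixed buffer. The enum and function names are hypothetical. As cautioned above, a match is only a presumption, and Section 2.10.4 and Section 18.35 describe the checks a client should make before acting on it.

```c
#include <stdint.h>
#include <string.h>

#define NFS4_OPAQUE_LIMIT 1024      /* as in the NFSv4 XDR */

/* Local stand-in for server_owner4. */
struct server_owner {
    uint64_t      so_minor_id;
    uint32_t      so_major_id_len;
    unsigned char so_major_id[NFS4_OPAQUE_LIMIT];
};

enum owner_match {
    OWNER_DIFFERENT,         /* treat as different servers */
    OWNER_SAME_SERVER,       /* same so_major_id: same Server (Section 1.5) */
    OWNER_SAME_SESSION_OK    /* same major and minor: session may be shared */
};

static enum owner_match compare_server_owner(const struct server_owner *a,
                                             const struct server_owner *b)
{
    if (a->so_major_id_len > NFS4_OPAQUE_LIMIT ||
        a->so_major_id_len != b->so_major_id_len ||
        memcmp(a->so_major_id, b->so_major_id, a->so_major_id_len) != 0)
        return OWNER_DIFFERENT;
    if (a->so_minor_id != b->so_minor_id)
        return OWNER_SAME_SERVER;
    return OWNER_SAME_SESSION_OK;
}
```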
2.6. Security Service Negotiation 2.6. Security Service Negotiation
skipping to change at page 30, line 49 skipping to change at page 31, line 9
2.6.3.1.1. Put Filehandle Operation + SAVEFH 2.6.3.1.1. Put Filehandle Operation + SAVEFH
The client is saving a filehandle for a future RESTOREFH. The server The client is saving a filehandle for a future RESTOREFH. The server
MUST NOT return NFS4ERR_WRONGSEC to either the put filehandle MUST NOT return NFS4ERR_WRONGSEC to either the put filehandle
operation or SAVEFH. operation or SAVEFH.
2.6.3.1.2. Two or More Put Filehandle Operations 2.6.3.1.2. Two or More Put Filehandle Operations
For a series of N put filehandle operations, the server MUST NOT For a series of N put filehandle operations, the server MUST NOT
return NFS4ERR_WRONGSEC to the first N-1 put filehandle operations. return NFS4ERR_WRONGSEC to the first N-1 put filehandle operations.
The Nth put filehandle operation is handled as if it is the first in The N'th put filehandle operation is handled as if it is the first in
a series of operations, and the second in the series of operations is a subseries of operations. For example, if the server received PUTFH,
not a put filehandle operation. For example, if the server received PUTROOTFH, LOOKUP, then the PUTFH is ignored for NFS4ERR_WRONGSEC
PUTFH, PUTROOTFH, LOOKUP, then the PUTFH is ignored for purposes, and the PUTROOTFH, LOOKUP subseries is processed
NFS4ERR_WRONGSEC purposes, and the PUTROOTFH, LOOKUP subseries is according to Section 2.6.3.1.3.
processed according to Section 2.6.3.1.3.
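The rule for a series of put filehandle operations can be expressed as a small helper. Given the index of a put filehandle operation in a COMPOUND, it finds the last operation of the consecutive run; only that operation is then evaluated as the following subsections describe. The operation enum is a hypothetical subset with local values, not the protocol's opcode numbers. The sketch assumes the put filehandle operations are PUTFH, PUTPUBFH, PUTROOTFH, and RESTOREFH.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of operations; values are local, not opcodes. */
enum op { OP_SEQUENCE, OP_PUTFH, OP_PUTPUBFH, OP_PUTROOTFH, OP_RESTOREFH,
          OP_LOOKUP, OP_READ };

static bool is_put_fh(enum op o)
{
    return o == OP_PUTFH || o == OP_PUTPUBFH ||
           o == OP_PUTROOTFH || o == OP_RESTOREFH;
}

/* ops[i] must be a put filehandle operation. Returns the index of the last
 * put filehandle operation in the run starting at i. The earlier N-1
 * operations of the run MUST NOT return NFS4ERR_WRONGSEC. */
static size_t wrongsec_candidate(const enum op *ops, size_t n, size_t i)
{
    while (i + 1 < n && is_put_fh(ops[i + 1]))
        i++;
    return i;
}
```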
2.6.3.1.3. Put Filehandle Operation + LOOKUP (or OPEN by Name) 2.6.3.1.3. Put Filehandle Operation + LOOKUP (or OPEN by Name)
This situation also applies to a put filehandle operation followed by This situation also applies to a put filehandle operation followed by
a LOOKUP or an OPEN operation that specifies a component name. a LOOKUP or an OPEN operation that specifies a component name.
In this situation, the client is potentially crossing a security In this situation, the client is potentially crossing a security
policy boundary, and the set of security tuples the parent directory policy boundary, and the set of security tuples the parent directory
supports may differ from those of the child. The server supports may differ from those of the child. The server
implementation may decide whether to impose any restrictions on implementation may decide whether to impose any restrictions on
skipping to change at page 33, line 25 skipping to change at page 33, line 31
A COMPOUND containing the series put filehandle operation + A COMPOUND containing the series put filehandle operation +
SECINFO_NO_NAME (style SECINFO_STYLE4_CURRENT_FH) is an efficient way SECINFO_NO_NAME (style SECINFO_STYLE4_CURRENT_FH) is an efficient way
for the client to recover from NFS4ERR_WRONGSEC. for the client to recover from NFS4ERR_WRONGSEC.
The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC to any operation The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC to any operation
other than a put filehandle operation, LOOKUP, LOOKUPP, and OPEN (by other than a put filehandle operation, LOOKUP, LOOKUPP, and OPEN (by
component name). component name).
2.6.3.1.8. Operations after SECINFO and SECINFO_NO_NAME 2.6.3.1.8. Operations after SECINFO and SECINFO_NO_NAME
Placing an operation that uses the current filehandle after SECINFO Suppose a client sends a COMPOUND procedure containing the series
or SECINFO_NO_NAME seemingly introduces an issue with what error to SEQUENCE, PUTFH, SECINFO_NO_NAME, READ, and suppose the security tuple
return when the security tuple of the request is not allowed for the used does not match that required for the target file. By rule (see
operation that uses the current filehandle. For example, suppose a
client sends a COMPOUND procedure containing the series SEQUENCE,
PUTFH, SECINFO_NO_NAME, READ, and suppose the security tuple used does
not match that required for the target file. By rule (see
Section 2.6.3.1.5), neither PUTFH nor SECINFO_NO_NAME can return Section 2.6.3.1.5), neither PUTFH nor SECINFO_NO_NAME can return
NFS4ERR_WRONGSEC. By rule (see Section 2.6.3.1.7), READ cannot NFS4ERR_WRONGSEC. By rule (see Section 2.6.3.1.7), READ cannot
return NFS4ERR_WRONGSEC. The issue is resolved by the fact that return NFS4ERR_WRONGSEC. The issue is resolved by the fact that
SECINFO and SECINFO_NO_NAME consume the current filehandle. This SECINFO and SECINFO_NO_NAME consume the current filehandle (note that
leaves no current filehandle for READ to use, and READ returns this is a change from NFSv4.0). This leaves no current filehandle
NFS4ERR_NOFILEHANDLE. for READ to use, and READ returns NFS4ERR_NOFILEHANDLE.
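A toy COMPOUND evaluator shows the effect of SECINFO_NO_NAME consuming the current filehandle. The evaluator is hypothetical and ignores the security tuple entirely. It models only the current filehandle, and it stops at the first failing operation as COMPOUND processing does. The status values are those that NFSv4 assigns to NFS4_OK and NFS4ERR_NOFILEHANDLE.

```c
#include <stdbool.h>
#include <stddef.h>

#define NFS4_OK               0
#define NFS4ERR_NOFILEHANDLE  10020

/* Hypothetical subset of operations; values are local, not opcodes. */
enum op { OP_SEQUENCE, OP_PUTFH, OP_SECINFO_NO_NAME, OP_READ };

static int do_op(bool *have_cfh, enum op o)
{
    switch (o) {
    case OP_SEQUENCE:
        return NFS4_OK;
    case OP_PUTFH:
        *have_cfh = true;
        return NFS4_OK;
    case OP_SECINFO_NO_NAME:
        if (!*have_cfh)
            return NFS4ERR_NOFILEHANDLE;
        *have_cfh = false;       /* SECINFO_NO_NAME consumes the current FH */
        return NFS4_OK;
    case OP_READ:
        return *have_cfh ? NFS4_OK : NFS4ERR_NOFILEHANDLE;
    }
    return NFS4_OK;
}

/* Runs ops in order and stops at the first error. Returns the status and
 * stores the failing index (or n on success) in *failed_at. */
static int run_compound(const enum op *ops, size_t n, size_t *failed_at)
{
    bool have_cfh = false;
    for (size_t i = 0; i < n; i++) {
        int status = do_op(&have_cfh, ops[i]);
        if (status != NFS4_OK) {
            *failed_at = i;
            return status;
        }
    }
    *failed_at = n;
    return NFS4_OK;
}
```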
2.7. Minor Versioning 2.7. Minor Versioning
To address the requirement of an NFS protocol that can evolve as the To address the requirement of an NFS protocol that can evolve as the
need arises, the NFSv4.1 protocol contains the rules and framework to need arises, the NFSv4.1 protocol contains the rules and framework to
allow for future minor changes or versioning. allow for future minor changes or versioning.
The base assumption with respect to minor versioning is that any The base assumption with respect to minor versioning is that any
future accepted minor version must follow the IETF process and be future accepted minor version must follow the IETF process and be
documented in a standards track RFC. Therefore, each minor version documented in a standards track RFC. Therefore, each minor version
skipping to change at page 35, line 9 skipping to change at page 35, line 11
* adding bits to flag fields such as new attributes to * adding bits to flag fields such as new attributes to
GETATTR's bitmap4 data type and providing corresponding GETATTR's bitmap4 data type and providing corresponding
variants of opaque arrays, such as a notify4 used together variants of opaque arrays, such as a notify4 used together
with such bitmaps. with such bitmaps.
* adding bits to existing attributes like ACLs that have flag * adding bits to existing attributes like ACLs that have flag
words words
* extending enumerated types (including NFS4ERR_*) with new * extending enumerated types (including NFS4ERR_*) with new
values and values
* adding cases to a switched union
4. Minor versions may not modify the structure of existing 4. Minor versions may not modify the structure of existing
attributes. attributes.
5. Minor versions may not delete operations. 5. Minor versions may not delete operations.
This prevents the potential reuse of a particular operation This prevents the potential reuse of a particular operation
"slot" in a future minor version. "slot" in a future minor version.
6. Minor versions may not delete attributes. 6. Minor versions may not delete attributes.
skipping to change at page 36, line 18 skipping to change at page 36, line 24
13. A client MUST NOT attempt to use a stateid, filehandle, or 13. A client MUST NOT attempt to use a stateid, filehandle, or
similar returned object from the COMPOUND procedure with minor similar returned object from the COMPOUND procedure with minor
version X for another COMPOUND procedure with minor version Y, version X for another COMPOUND procedure with minor version Y,
where X != Y. where X != Y.
2.8. Non-RPC-based Security Services 2.8. Non-RPC-based Security Services
As described in Section 2.2.1.1.1.1, NFSv4.1 relies on RPC for As described in Section 2.2.1.1.1.1, NFSv4.1 relies on RPC for
identification, authentication, integrity, and privacy. NFSv4.1 identification, authentication, integrity, and privacy. NFSv4.1
itself provides additional security services as described in the next itself provides or enables additional security services as described
several subsections. in the next several subsections.
2.8.1. Authorization 2.8.1. Authorization
Authorization to access a file object via an NFSv4.1 operation is Authorization to access a file object via an NFSv4.1 operation is
ultimately determined by the NFSv4.1 server. A client can ultimately determined by the NFSv4.1 server. A client can
predetermine its access to a file object via the OPEN (Section 18.16) predetermine its access to a file object via the OPEN (Section 18.16)
and the ACCESS (Section 18.1) operations. and the ACCESS (Section 18.1) operations.
Principals with appropriate access rights can modify the Principals with appropriate access rights can modify the
authorization on a file object via the SETATTR (Section 18.30) authorization on a file object via the SETATTR (Section 18.30)
skipping to change at page 37, line 32 skipping to change at page 37, line 34
time this document was written, the only two transports that had the time this document was written, the only two transports that had the
above attributes were TCP and SCTP. To enhance the possibilities for above attributes were TCP and SCTP. To enhance the possibilities for
interoperability, an NFSv4.1 implementation MUST support operation interoperability, an NFSv4.1 implementation MUST support operation
over the TCP transport protocol. over the TCP transport protocol.
Even if NFSv4.1 is used over a non-IP network protocol, it is Even if NFSv4.1 is used over a non-IP network protocol, it is
RECOMMENDED that the transport support congestion control. RECOMMENDED that the transport support congestion control.
It is permissible for a connectionless transport to be used under It is permissible for a connectionless transport to be used under
NFSv4.1; however, reliable and in-order delivery of data by the NFSv4.1; however, reliable and in-order delivery of data by the
connectionless transport is still required. NFSv4.1 assumes that a connectionless transport is REQUIRED. NFSv4.1 assumes that a client
client transport address and server transport address used to send transport address and server transport address used to send data over
data over a transport together constitute a connection, even if the a transport together constitute a connection, even if the underlying
underlying transport eschews the concept of a connection. transport eschews the concept of a connection.
2.9.2. Client and Server Transport Behavior 2.9.2. Client and Server Transport Behavior
If a connection-oriented transport (e.g. TCP) is used the client and If a connection-oriented transport (e.g. TCP) is used, the client
server SHOULD use long lived connections for at least three reasons: and server SHOULD use long lived connections for at least three
reasons:
1. This will prevent the weakening of the transport's congestion 1. This will prevent the weakening of the transport's congestion
control mechanisms via short lived connections. control mechanisms via short lived connections.
2. This will improve performance for the WAN environment by 2. This will improve performance for the WAN environment by
eliminating the need for connection setup handshakes. eliminating the need for connection setup handshakes.
3. The NFSv4.1 callback model differs from NFSv4.0, and requires the 3. The NFSv4.1 callback model differs from NFSv4.0, and requires the
client and server to maintain a client-created backchannel (see client and server to maintain a client-created backchannel (see
Section 2.10.3.1) for the server to use. Section 2.10.3.1) for the server to use.
skipping to change at page 39, line 7 skipping to change at page 39, line 10
o RDMA credits present a new issue to the reply cache in NFSv4.1. o RDMA credits present a new issue to the reply cache in NFSv4.1.
The reply cache may be used when a connection within a session is The reply cache may be used when a connection within a session is
lost, such as after the client reconnects. Credit information is lost, such as after the client reconnects. Credit information is
a dynamic property of the RDMA connection, and stale values must a dynamic property of the RDMA connection, and stale values must
not be replayed from the cache. This implies that the reply cache not be replayed from the cache. This implies that the reply cache
contents must not be blindly used when replies are sent from it, contents must not be blindly used when replies are sent from it,
and credit information appropriate to the channel must be and credit information appropriate to the channel must be
refreshed by the RPC layer. refreshed by the RPC layer.
In addition, the NFSv4.1 requester is not allowed to stop waiting for In addition, as described in Section 2.10.5.2, while a session is
a reply, as described in Section 2.10.5.2. active, the NFSv4.1 requester MUST NOT stop waiting for a reply.
2.9.3. Ports 2.9.3. Ports
Historically, NFSv3 servers have listened over TCP port 2049. The Historically, NFSv3 servers have listened over TCP port 2049. The
registered port 2049 [25] for the NFS protocol should be the default registered port 2049 [25] for the NFS protocol should be the default
configuration. NFSv4.1 clients SHOULD NOT use the RPC binding configuration. NFSv4.1 clients SHOULD NOT use the RPC binding
protocols as described in [26]. protocols as described in [26].
2.10. Session 2.10. Session
2.10.1. Motivation and Overview 2.10.1. Motivation and Overview
Previous versions and minor versions of NFS have suffered from the Previous versions and minor versions of NFS have suffered from the
following: following:
o Lack of support for exactly once semantics (EOS). This includes o Lack of support for Exactly Once Semantics (EOS). This includes
lack of support for EOS through server failure and recovery. lack of support for EOS through server failure and recovery.
o Limited callback support, including no support for sending o Limited callback support, including no support for sending
callbacks through firewalls, and races between responses from callbacks through firewalls, and races between replies to normal
normal requests, and callbacks. requests and callbacks.
o Limited trunking over multiple network paths. o Limited trunking over multiple network paths.
o Requiring machine credentials for fully secure operation. o Requiring machine credentials for fully secure operation.
Through the introduction of a session, NFSv4.1 addresses the above Through the introduction of a session, NFSv4.1 addresses the above
shortfalls with practical solutions: shortfalls with practical solutions:
o EOS is enabled by a reply cache with a bounded size, making it o EOS is enabled by a reply cache with a bounded size, making it
feasible to keep the cache in persistent storage and enable EOS feasible to keep the cache in persistent storage and enable EOS
skipping to change at page 43, line 12 skipping to change at page 43, line 16
A connection's association with a session is not exclusive. A A connection's association with a session is not exclusive. A
connection associated with the channel(s) of one session may be connection associated with the channel(s) of one session may be
simultaneously associated with the channel(s) of other sessions simultaneously associated with the channel(s) of other sessions
including sessions associated with other client IDs. including sessions associated with other client IDs.
It is permissible for connections of multiple transport types to be It is permissible for connections of multiple transport types to be
associated with the same channel. For example, both a TCP and RDMA associated with the same channel. For example, both a TCP and RDMA
connection can be associated with the fore channel. In the event an connection can be associated with the fore channel. In the event an
RDMA and non-RDMA connection are associated with the same channel, RDMA and non-RDMA connection are associated with the same channel,
the maximum number of slots SHOULD be at least one more than the the maximum number of slots SHOULD be at least one more than the
total number of credits (Section 2.10.5.1). This way, if all RDMA total number of RDMA credits (Section 2.10.5.1). This way, if all RDMA
credits are used, the non-RDMA connection can have at least one credits are used, the non-RDMA connection can have at least one
outstanding request. If a server supports multiple transport types, outstanding request. If a server supports multiple transport types,
it MUST allow a client to associate connections from each transport it MUST allow a client to associate connections from each transport
to a channel. to a channel.
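The slot guideline for channels that mix RDMA and non-RDMA connections reduces to a one-line check. The function name and parameters are hypothetical. The arithmetic is done in 64 bits so that adding one cannot wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if max_slots meets the SHOULD above. When RDMA and non-RDMA
 * connections are not mixed on the channel, this guideline imposes no
 * constraint. */
static bool slot_count_ok(uint32_t max_slots, uint32_t rdma_credits,
                          bool mixed_rdma_and_non_rdma)
{
    if (!mixed_rdma_and_non_rdma)
        return true;
    /* Keep one slot free for the non-RDMA connection even when every RDMA
     * credit is in use. */
    return (uint64_t)max_slots >= (uint64_t)rdma_credits + 1;
}
```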
It is permissible for a connection of one type of transport to be It is permissible for a connection of one type of transport to be
associated with the fore channel, and a connection of a different associated with the fore channel, and a connection of a different
type to be associated with the backchannel. type to be associated with the backchannel.
2.10.4. Trunking 2.10.4. Trunking
Trunking is the use of multiple connections between a client and Trunking is the use of multiple connections between a client and
server in order to increase the speed of data transfer. NFSv4.1 server in order to increase the speed of data transfer. NFSv4.1
supports two types of trunking: session trunking and client ID supports two types of trunking: session trunking and client ID
trunking. NFSv4.1 servers MUST support trunking. trunking. NFSv4.1 servers MUST support trunking.
Session trunking is essentially the association of multiple Session trunking is essentially the association of multiple
connections, each with a potentially different target network connections, each with potentially different target and/or source
address, to the same session. network addresses, to the same session.
Client ID trunking is the association of multiple sessions to the Client ID trunking is the association of multiple sessions to the
same client ID, major server owner ID (Section 2.5), and server scope same client ID, major server owner ID (Section 2.5), and server scope
(Section 11.7.7). When two servers return the same major server (Section 11.7.7). When two servers return the same major server
owner and server scope it means the two servers are cooperating on owner and server scope it means the two servers are cooperating on
locking state management which is a prerequisite for client ID locking state management which is a prerequisite for client ID
trunking. trunking.
Understanding and distinguishing session and client ID trunking Understanding and distinguishing session and client ID trunking
requires understanding how the results of the EXCHANGE_ID requires understanding how the results of the EXCHANGE_ID
skipping to change at page 44, line 17 skipping to change at page 44, line 21
different EXCHANGE_ID requests, and the eir_clientid, different EXCHANGE_ID requests, and the eir_clientid,
eir_server_owner.so_major_id, eir_server_owner.so_minor_id, and eir_server_owner.so_major_id, eir_server_owner.so_minor_id, and
eir_server_scope results match in both EXCHANGE_ID results, then eir_server_scope results match in both EXCHANGE_ID results, then
the client is permitted to perform session trunking. If the the client is permitted to perform session trunking. If the
client has no session mapping to the tuple of eir_clientid, client has no session mapping to the tuple of eir_clientid,
eir_server_owner.so_major_id, eir_server_scope, eir_server_owner.so_major_id, eir_server_scope,
eir_server_owner.so_minor_id, then it creates the session via a eir_server_owner.so_minor_id, then it creates the session via a
CREATE_SESSION operation over one of the connections, which CREATE_SESSION operation over one of the connections, which
associates the connection to the session. If there is a session associates the connection to the session. If there is a session
for the tuple, the client can send BIND_CONN_TO_SESSION to for the tuple, the client can send BIND_CONN_TO_SESSION to
associate the connection to the session. Or if the client does associate the connection to the session. (Of course, if the
not want to use session trunking, it can invoke CREATE_SESSION on client does not want to use session trunking, it can invoke
the connection. CREATE_SESSION on the connection. This will result in client ID
trunking as described below.)
Client ID Trunking If the eia_clientowner argument is the same in Client ID Trunking If the eia_clientowner argument is the same in
two different EXCHANGE_ID requests, and the eir_clientid, two different EXCHANGE_ID requests, and the eir_clientid,
eir_server_owner.so_major_id, and eir_server_scope results match eir_server_owner.so_major_id, and eir_server_scope results match
in both EXCHANGE_ID results, but the eir_server_owner.so_minor_id in both EXCHANGE_ID results, but the eir_server_owner.so_minor_id
results do not match then the client is permitted to perform results do not match then the client is permitted to perform
client ID trunking. The client can associate each connection with client ID trunking. The client can associate each connection with
different sessions, where each session is associated with the same different sessions, where each session is associated with the same
server. Of course, even if the eir_server_owner.so_minor_id server.
fields do match, the client is free to employ client ID trunking
instead of session trunking. The client completes the act of Of course, even if the eir_server_owner.so_minor_id fields do
client ID trunking by invoking CREATE_SESSION on each connection, match, the client is free to employ client ID trunking instead of
using the same client ID that was returned in eir_clientid. These session trunking.
invocations create two sessions and also associate each connection
with each session. The client completes the act of client ID trunking by invoking
CREATE_SESSION on each connection, using the same client ID that
was returned in eir_clientid. These invocations create two
sessions and also associate each connection with each session.
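The two trunking rules above reduce to a comparison of EXCHANGE_ID results. The following is a minimal Python sketch, assuming both EXCHANGE_ID requests carried the same eia_clientowner and that the servers' claims have already been verified as described later in this section; the ExchangeIdResult class and the trunking_permitted function are hypothetical illustrations, not protocol-defined names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExchangeIdResult:
    """Hypothetical holder for the EXCHANGE_ID result fields named in the text."""
    clientid: int
    so_major_id: bytes
    so_minor_id: int
    server_scope: bytes


def trunking_permitted(first: ExchangeIdResult, second: ExchangeIdResult) -> str:
    """Classify two EXCHANGE_ID results obtained with the same eia_clientowner."""
    if (first.clientid != second.clientid
            or first.so_major_id != second.so_major_id
            or first.server_scope != second.server_scope):
        return "none"        # neither kind of trunking is permitted
    if first.so_minor_id == second.so_minor_id:
        return "session"     # session trunking (client ID trunking also allowed)
    return "client_id"       # only client ID trunking
```

A "session" result means the client may either BIND_CONN_TO_SESSION to an existing session or, if it prefers, fall back to client ID trunking with a new CREATE_SESSION.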
When doing client ID trunking, locking state is shared across When doing client ID trunking, locking state is shared across
sessions associated with the same client ID. This requires the sessions associated with the same client ID. This requires the
server to coordinate state across sessions. server to coordinate state across sessions.
When two servers over two connections claim matching or partially When two servers over two connections claim matching or partially
matching eir_server_owner, eir_server_scope, and eir_clientid values, matching eir_server_owner, eir_server_scope, and eir_clientid values,
the client does not have to trust the servers' claims. The client the client does not have to trust the servers' claims. The client
may verify these claims before trunking traffic in the following may verify these claims before trunking traffic in the following
ways: ways:
skipping to change at page 45, line 15 skipping to change at page 45, line 26
BIND_CONN_TO_SESSION is to be verified according to the SP4_SSV or BIND_CONN_TO_SESSION is to be verified according to the SP4_SSV or
SP4_MACH_CRED (Section 18.35) state protection options. For SP4_MACH_CRED (Section 18.35) state protection options. For
SP4_SSV, reliable verification depends on a shared secret (the SP4_SSV, reliable verification depends on a shared secret (the
SSV) that is established via the SET_SSV (Section 18.47) SSV) that is established via the SET_SSV (Section 18.47)
operation. operation.
When a new connection is associated with the session (via the When a new connection is associated with the session (via the
BIND_CONN_TO_SESSION operation, see Section 18.34), if the client BIND_CONN_TO_SESSION operation, see Section 18.34), if the client
specified SP4_SSV state protection for the BIND_CONN_TO_SESSION specified SP4_SSV state protection for the BIND_CONN_TO_SESSION
operation, the client MUST send the BIND_CONN_TO_SESSION with operation, the client MUST send the BIND_CONN_TO_SESSION with
RPCSEC_GSS protection, using integrity or privacy, and a RPCSEC_GSS protection, using integrity or privacy, and an
RPCSEC_GSS using the GSS SSV mechanism (Section 2.10.8). The RPCSEC_GSS handle created with the GSS SSV mechanism
RPCSEC_GSS handle is created by CREATE_SESSION (Section 18.36). (Section 2.10.8).
If the client mistakenly tries to associate a connection to a If the client mistakenly tries to associate a connection to a
session of a wrong server, the server will either reject the session of a wrong server, the server will either reject the
attempt because it is not aware of the session identifier of the attempt because it is not aware of the session identifier of the
BIND_CONN_TO_SESSION arguments, or it will reject the attempt BIND_CONN_TO_SESSION arguments, or it will reject the attempt
because the RPCSEC_GSS authentication fails. Even if the server because the RPCSEC_GSS authentication fails. Even if the server
mistakenly or maliciously accepts the connection association mistakenly or maliciously accepts the connection association
attempt, the RPCSEC_GSS verifier it computes in the response will attempt, the RPCSEC_GSS verifier it computes in the response will
not be verified by the client, so the client will know it cannot not be verified by the client, so the client will know it cannot
use the connection for trunking the specified session. use the connection for trunking the specified session.
skipping to change at page 46, line 17 skipping to change at page 46, line 27
client verifies the claim by issuing a CREATE_SESSION to the client verifies the claim by issuing a CREATE_SESSION to the
second destination address, protected with RPCSEC_GSS integrity second destination address, protected with RPCSEC_GSS integrity
using an RPCSEC_GSS handle returned by the second EXCHANGE_ID. If using an RPCSEC_GSS handle returned by the second EXCHANGE_ID. If
the server accepts the CREATE_SESSION request, and if the client the server accepts the CREATE_SESSION request, and if the client
verifies the RPCSEC_GSS verifier and integrity codes, then the verifies the RPCSEC_GSS verifier and integrity codes, then the
client has proof the second server knows the SSV, and thus the two client has proof the second server knows the SSV, and thus the two
servers are the same for the purposes of client ID trunking. servers are the same for the purposes of client ID trunking.
2.10.5. Exactly Once Semantics 2.10.5. Exactly Once Semantics
Via the session, NFSv4.1 offers exactly once semantics (EOS) for Via the session, NFSv4.1 offers Exactly Once Semantics (EOS) for
requests sent over a channel. EOS is supported on both the fore and requests sent over a channel. EOS is supported on both the fore and
back channels. back channels.
Each COMPOUND or CB_COMPOUND request that is sent with a leading Each COMPOUND or CB_COMPOUND request that is sent with a leading
SEQUENCE or CB_SEQUENCE operation MUST be executed by the receiver SEQUENCE or CB_SEQUENCE operation MUST be executed by the receiver
exactly once. This requirement holds regardless of whether the exactly once. This requirement holds regardless of whether the
request is sent with reply caching specified (see request is sent with reply caching specified (see
Section 2.10.5.1.2). The requirement holds even if the requester is Section 2.10.5.1.2). The requirement holds even if the requester is
issuing the request over a session created between a pNFS data client issuing the request over a session created between a pNFS data client
and pNFS data server. To understand the rationale for this and pNFS data server. To understand the rationale for this
skipping to change at page 46, line 44 skipping to change at page 47, line 7
o Idempotent non-modifying requests. o Idempotent non-modifying requests.
An example of a non-idempotent request is RENAME. It is obvious that An example of a non-idempotent request is RENAME. It is obvious that
if a replier executes the same RENAME request twice, and the first if a replier executes the same RENAME request twice, and the first
execution succeeds, the re-execution will fail. If the replier execution succeeds, the re-execution will fail. If the replier
returns the result from the re-execution, this result is incorrect. returns the result from the re-execution, this result is incorrect.
Therefore, EOS is required for nonidempotent requests. Therefore, EOS is required for nonidempotent requests.
An example of an idempotent modifying request is a COMPOUND request An example of an idempotent modifying request is a COMPOUND request
containing a WRITE operation. Repeated execution of the same WRITE containing a WRITE operation. Repeated execution of the same WRITE
has the same effect as execution of that write once. Nevertheless, has the same effect as execution of that write a single time.
enforcing EOS for WRITEs and other idempotent modifying requests is Nevertheless, enforcing EOS for WRITEs and other idempotent modifying
necessary to avoid data corruption. requests is necessary to avoid data corruption.
Suppose a client sends WRITEs A and B to a noncompliant server that Suppose a client sends WRITE A to a noncompliant server that does not
does not enforce EOS, and receives no response, perhaps due to a enforce EOS, and receives no response, perhaps due to a network
network partition. The client reconnects to the server and re-sends partition. The client reconnects to the server and re-sends WRITE A.
both WRITEs. Now, the server has outstanding two instances of each Now, the server has outstanding two instances of A. The server can be
of A and B. The server can be in a situation in which it executes and in a situation in which it executes and replies to the retry of A,
replies to the retries of A and B, while the first A and B are still while the first A is still waiting in the server's internal I/O
waiting in the server's I/O system for some resource. Upon receiving system for some resource. Upon receiving the reply to the second
the replies to the second attempts of WRITEs A and B, the client attempt of WRITE A, the client believes its write is done so it is
believes its writes are done so it is free to send WRITE D which free to send WRITE B which overlaps the range of A. When the original
overlaps the range of one or both of A and B. If A or B are A is dispatched from the server's I/O system, and executed (thus the
subsequently executed for the second time, then what has been written second time A will have been written), then what has been written by
by D can be overwritten and thus corrupted. B can be overwritten and thus corrupted.
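The WRITE scenario above can be replayed in a few lines. This is a toy Python simulation, not server code: the file is a bytearray, and apply_write, the offsets, and the payloads are hypothetical. It only shows how the delayed original WRITE A, executed after the client's overlapping WRITE B, overwrites part of B.

```python
def apply_write(contents: bytearray, offset: int, data: bytes) -> None:
    """Overwrite len(data) bytes of the file starting at offset."""
    contents[offset:offset + len(data)] = data


def replay_without_eos() -> bytes:
    contents = bytearray(8)
    write_a = (0, b"AAAA")
    write_b = (2, b"BBBB")              # overlaps the last two bytes of A
    delayed = [write_a]                 # original A waits in the server's I/O system
    apply_write(contents, *write_a)     # server executes and replies to the retry of A
    apply_write(contents, *write_b)     # client, believing A is done, sends B
    for offset, data in delayed:        # the original A is finally dispatched
        apply_write(contents, offset, data)
    return bytes(contents)


print(replay_without_eos())  # b'AAAABB\x00\x00': file offsets 2-3 written by B are lost
```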
An example of an idempotent non-modifying request is a COMPOUND An example of an idempotent non-modifying request is a COMPOUND
containing SEQUENCE, PUTFH, READLINK and nothing else. The re- containing SEQUENCE, PUTFH, READLINK and nothing else. The re-
execution of such a request will not cause data corruption or execution of such a request will not cause data corruption or
produce an incorrect result. Nonetheless, to keep the implementation produce an incorrect result. Nonetheless, to keep the implementation
simple, the replier MUST enforce EOS for all requests whether simple, the replier MUST enforce EOS for all requests whether
idempotent and non-modifying or not. idempotent and non-modifying or not.
Note that true and complete EOS is not possible unless the server Note that true and complete EOS is not possible unless the server
persists the reply cache in stable storage, unless the server is persists the reply cache in stable storage, unless the server is
skipping to change at page 48, line 21 skipping to change at page 48, line 31
which the request is to be sent. The value of N starts out as equal which the request is to be sent. The value of N starts out as equal
to ca_maxrequests - 1 (Section 18.36), but can be adjusted by the to ca_maxrequests - 1 (Section 18.36), but can be adjusted by the
response to SEQUENCE or CB_SEQUENCE as described later in this response to SEQUENCE or CB_SEQUENCE as described later in this
section. The slot id must be unused by any of the requests that the section. The slot id must be unused by any of the requests that the
requester already has active on the session. "Unused" here means the requester already has active on the session. "Unused" here means the
requester has no outstanding request for that slot id. requester has no outstanding request for that slot id.
A slot contains a sequence id and the cached reply corresponding to A slot contains a sequence id and the cached reply corresponding to
the request sent with that sequence id. The sequence id is a 32 bit the request sent with that sequence id. The sequence id is a 32 bit
unsigned value, and is therefore in the range 0..0xFFFFFFFF (2^32 - unsigned value, and is therefore in the range 0..0xFFFFFFFF (2^32 -
1). The first time a slot is used, the requester must specify a 1). The first time a slot is used, the requester MUST specify a
sequence id of one (1) (Section 18.36). Each time a slot is reused, sequence id of one (1) (Section 18.36). Each time a slot is reused,
the request MUST specify a sequence id that is one greater than that the request MUST specify a sequence id that is one greater than that
of the previous request on the slot. If the previous sequence id was of the previous request on the slot. If the previous sequence id was
0xFFFFFFFF, then the next request for the slot MUST have the sequence 0xFFFFFFFF, then the next request for the slot MUST have the sequence
id set to zero (i.e. (2^32 - 1) + 1 mod 2^32). id set to zero (i.e. (2^32 - 1) + 1 mod 2^32).
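The sequence id rule above amounts to an increment modulo 2^32. A minimal Python sketch (the name next_seqid is hypothetical):

```python
SEQID_MODULUS = 2 ** 32   # sequence ids are 32-bit unsigned values
FIRST_SEQID = 1           # sequence id required on a slot's first use


def next_seqid(previous: int) -> int:
    """Sequence id for the next request on a slot, wrapping 0xFFFFFFFF to 0."""
    return (previous + 1) % SEQID_MODULUS
```

For example, next_seqid(0xFFFFFFFF) returns 0, matching the wraparound case in the text.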
The sequence id accompanies the slot id in each request. It is for The sequence id accompanies the slot id in each request. It is for
the critical check at the server: it is used to efficiently determine the critical check at the server: it is used to efficiently determine
whether a request using a certain slot id is a retransmit or a new, whether a request using a certain slot id is a retransmit or a new,
never-before-seen request. It is not feasible for the client to never-before-seen request. It is not feasible for the client to
skipping to change at page 49, line 30 skipping to change at page 49, line 41
session, the replier need only cache the results of a limited number session, the replier need only cache the results of a limited number
of COMPOUND requests. The second implication derives from the of COMPOUND requests. The second implication derives from the
first: unlike XID-indexed reply caches (also known as first: unlike XID-indexed reply caches (also known as
duplicate request caches - DRCs), the slot id-based reply cache duplicate request caches - DRCs), the slot id-based reply cache
cannot be overflowed. Through use of the sequence id to identify cannot be overflowed. Through use of the sequence id to identify
retransmitted requests, the replier does not need to actually cache retransmitted requests, the replier does not need to actually cache
the request itself, reducing the storage requirements of the reply the request itself, reducing the storage requirements of the reply
cache further. These facilities make it practical to maintain all cache further. These facilities make it practical to maintain all
the required entries for an effective reply cache. the required entries for an effective reply cache.
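The per-slot check a replier might perform can be sketched in Python as follows. A request whose sequence id equals the one cached in the slot is treated as a retransmission and answered from the cache; one exactly one greater is treated as new and executed once; any other value is rejected with NFS4ERR_SEQ_MISORDERED. ReplierSlot, classify, and handle are hypothetical names. The sketch assumes a slot starts with a cached sequence id of zero, so that the first request (sequence id 1) is "one greater", and it ignores requests sent without reply caching.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

SEQID_MODULUS = 2 ** 32


@dataclass
class ReplierSlot:
    seqid: int = 0                       # assumed initial value; first request uses 1
    cached_reply: Optional[bytes] = None


def classify(slot: ReplierSlot, request_seqid: int) -> str:
    if request_seqid == slot.seqid:
        return "retransmit"
    if request_seqid == (slot.seqid + 1) % SEQID_MODULUS:
        return "new"
    return "NFS4ERR_SEQ_MISORDERED"


def handle(slot: ReplierSlot, request_seqid: int,
           execute: Callable[[], bytes]) -> Tuple[str, Optional[bytes]]:
    kind = classify(slot, request_seqid)
    if kind == "retransmit":
        return kind, slot.cached_reply   # replay from the cache; do not re-execute
    if kind == "new":
        reply = execute()                # execute exactly once
        slot.seqid = request_seqid
        slot.cached_reply = reply
        return kind, reply
    return kind, None                    # SEQUENCE error; nothing is cached
```

Note that the cache key is only the slot (within a session); the request body itself is never stored.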
The slot id and sequence id therefore take over the traditional role The slot id, sequence id, and sessionid therefore take over the
of the XID and source network address in the replier's reply cache traditional role of the XID and source network address in the
implementation. This approach is considerably more portable and replier's reply cache implementation. This approach is considerably
completely robust - it is not subject to the reassignment of ports as more portable and completely robust - it is not subject to the
clients reconnect over IP networks. In addition, the RPC XID is not reassignment of ports as clients reconnect over IP networks. In
used in the reply cache, enhancing robustness of the cache in the addition, the RPC XID is not used in the reply cache, enhancing
face of any rapid reuse of XIDs by the requester. While the replier robustness of the cache in the face of any rapid reuse of XIDs by the
does not care about the XID for the purposes of reply cache requester. While the replier does not care about the XID for the
management (but the replier MUST return the same XID that was in the purposes of reply cache management (but the replier MUST return the
request), nonetheless there are considerations for the XID in NFSv4.1 same XID that was in the request), nonetheless there are
that are the same as all other previous versions of NFS. The RPC XID considerations for the XID in NFSv4.1 that are the same as all other
remains in each message and must be formulated in NFSv4.1 requests as previous versions of NFS. The RPC XID remains in each message and
it any other ONC RPC request. The reasons include: must be formulated in NFSv4.1 requests as in any other ONC RPC
request. The reasons include:
o The RPC layer retains its existing semantics and implementation. o The RPC layer retains its existing semantics and implementation.
o The requester and replier must be able to interoperate at the RPC o The requester and replier must be able to interoperate at the RPC
layer, prior to the NFSv4.1 decoding of the SEQUENCE or layer, prior to the NFSv4.1 decoding of the SEQUENCE or
CB_SEQUENCE operation CB_SEQUENCE operation.
o If an operation is being used that does not start with SEQUENCE or o If an operation is being used that does not start with SEQUENCE or
CB_SEQUENCE (e.g. BIND_CONN_TO_SESSION), then the RPC XID is CB_SEQUENCE (e.g. BIND_CONN_TO_SESSION), then the RPC XID is
needed for correct operation to match the reply to the request. needed for correct operation to match the reply to the request.
o The SEQUENCE or CB_SEQUENCE operation may generate an error. If o The SEQUENCE or CB_SEQUENCE operation may generate an error. If
so, the embedded slot id, sequence id, and sessionid (if present) so, the embedded slot id, sequence id, and sessionid (if present)
in the request will not be in the reply, and the requester has in the request will not be in the reply, and the requester has
only the XID to match the reply to the request. only the XID to match the reply to the request.
Givem that well formulated XIDs continue to be required, this begs Given that well formulated XIDs continue to be required, this begs
the question why SEQUENCE and CB_SEQUENCE replies have a sessionid, the question why SEQUENCE and CB_SEQUENCE replies have a sessionid,
slot id and sequence id? Having the sessionid in the reply means the slot id and sequence id? Having the sessionid in the reply means the
requester does not have to use the XID to look up the sessionid, which requester does not have to use the XID to look up the sessionid, which
would be necessary if the connection were associated with multiple would be necessary if the connection were associated with multiple
sessions. Having the slot id and sequence id in the reply means the sessions. Having the slot id and sequence id in the reply means the
requester does not have to use the XID to look up the slot id and requester does not have to use the XID to look up the slot id and
sequence id. Furthermore, since the XID is only 32 bits, it is too sequence id. Furthermore, since the XID is only 32 bits, it is too
small to guarantee the re-association of a reply with its request small to guarantee the re-association of a reply with its request
([27]); having sessionid, slot id, and sequence id in the reply ([27]); having sessionid, slot id, and sequence id in the reply
allows the client to validate that the reply in fact belongs to the allows the client to validate that the reply in fact belongs to the
matched request. matched request.
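A requester-side sketch of that validation in Python follows. It assumes a hypothetical table of outstanding requests keyed by XID and accepts a reply only when the sessionid, slot id, and sequence id it carries match the request found under that XID. Replies to SEQUENCE or CB_SEQUENCE errors, which lack these fields, are outside the sketch.

```python
from typing import Dict, NamedTuple, Optional


class SeqArgs(NamedTuple):
    sessionid: bytes
    slotid: int
    seqid: int


def match_reply(outstanding: Dict[int, SeqArgs], xid: int,
                reply: SeqArgs) -> Optional[SeqArgs]:
    """Return the matched request, or None if the reply does not belong to it."""
    request = outstanding.get(xid)
    if request is None or request != reply:
        return None      # unknown XID, or an XID reused for a different request
    del outstanding[xid]
    return request
```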
The SEQUENCE (and CB_SEQUENCE) operation also carries a The SEQUENCE (and CB_SEQUENCE) operation also carries a
"highest_slotid" value which carries additional requester slot usage "highest_slotid" value which carries additional requester slot usage
information. The requester must always provide a slot id information. The requester must always indicate the slot id
representing the outstanding request with the highest-numbered slot representing the outstanding request with the highest-numbered slot
value. The requester should in all cases provide the most value. The requester should in all cases provide the most
conservative value possible, although it can be increased somewhat conservative value possible, although it can be increased somewhat
above the actual instantaneous usage to maintain some minimum or above the actual instantaneous usage to maintain some minimum or
optimal level. This provides a way for the requester to yield unused optimal level. This provides a way for the requester to yield unused
request slots back to the replier, which in turn can use the request slots back to the replier, which in turn can use the
information to reallocate resources. information to reallocate resources.
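As a minimal sketch (hypothetical function, Python), a requester could compute the highest_slotid it sends as the highest slot id among its outstanding requests, including the one being sent, optionally raised to a floor it wants to keep in reserve. The floor parameter is an assumption and is presumed to stay within what the replier has granted.

```python
from typing import Iterable


def highest_slotid(outstanding_slotids: Iterable[int], floor: int = 0) -> int:
    """Highest slot id in use, raised to an optional floor the requester keeps."""
    return max(max(outstanding_slotids, default=0), floor)
```

Reporting the most conservative value (floor=0) is what lets the replier reclaim unused slots.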
The replier responds with both a new target highest_slotid, and an The replier responds with both a new target highest_slotid, and an
enforced highest_slotid, described as follows: enforced highest_slotid, described as follows:
skipping to change at page 51, line 19 skipping to change at page 51, line 31
even though the replier knows there are no outstanding requests at even though the replier knows there are no outstanding requests at
higher slot ids, it MAY take more forceful action. When faced higher slot ids, it MAY take more forceful action. When faced
with intransigence, the replier MAY reply with a new enforced with intransigence, the replier MAY reply with a new enforced
highest_slotid that is less than its previous enforced highest_slotid that is less than its previous enforced
highest_slotid. Thereafter, if the requester continues to send highest_slotid. Thereafter, if the requester continues to send
requests with a highest_slotid that is greater than the replier's requests with a highest_slotid that is greater than the replier's
new enforced highest_slotid the server MAY return new enforced highest_slotid the server MAY return
NFS4ERR_BAD_HIGHSLOT, unless the slot id in the request is greater NFS4ERR_BAD_HIGHSLOT, unless the slot id in the request is greater
than the new enforced highest_slotid, and the request is a retry. than the new enforced highest_slotid, and the request is a retry.
The replier SHOULD keep slots it wants to retire around until the The replier SHOULD retain the slots it wants to retire until the
requester sends a request with a highest_slotid less than or equal requester sends a request with a highest_slotid less than or equal
to the replier's new enforced highest_slotid. Also a request with to the replier's new enforced highest_slotid. Also if a request
a slot that is higher than the new enforced highest_slotid can be is received with a slot that is higher than the new enforced
retired if the requester specifies a sequence id that is not equal highest_slotid, and the sequence id is one higher than what is in
what is in the slot's reply cache. In other words, once the the slot's reply cache, then the server can both retire the slot
replier has forcibly lowered the enforced highest_slotid, the and return NFS4ERR_BADSLOT (however the server MUST NOT do one and
not the other). (The reason it is safe to retire the slot is
because by using the next sequence id, the client is
indicating it has received the previous reply for the slot.) Once
the replier has forcibly lowered the enforced highest_slotid, the
requester is only allowed to send retries to the to-be-retired requester is only allowed to send retries to the to-be-retired
slots. slots.
o The requester SHOULD use the lowest available slot when issuing a o The requester SHOULD use the lowest available slot when issuing a
new request. This way, the replier may be able to retire slot new request. This way, the replier may be able to retire slot
entries faster. However, where the replier is actively adjusting entries faster. However, where the replier is actively adjusting
its granted highest_slotid, it will not be able to use only its granted highest_slotid, it will not be able to use only
the receipt of the slot id and highest_slotid in the request. the receipt of the slot id and highest_slotid in the request.
Neither the slot id nor the highest_slotid used in a request may Neither the slot id nor the highest_slotid used in a request may
reflect the replier's current idea of the requester's session reflect the replier's current idea of the requester's session
limit, because the request may have been sent from the requester limit, because the request may have been sent from the requester
before the update was received. Therefore, in the downward before the update was received. Therefore, in the downward
adjustment case, the replier may have to retain a number of reply adjustment case, the replier may have to retain a number of reply
cache entries at least as large as the old value of maximum cache entries at least as large as the old value of maximum
requests outstanding, until it can infer that the requester has requests outstanding, until it can infer that the requester has
seen a reply containing the new granted highest_slotid. The seen a reply containing the new granted highest_slotid. The
replier can infer that the requester has seen such a reply when it replier can infer that the requester has seen such a reply when it
receives a new request with the same slotid as the request replied receives a new request with the same slotid as the request replied
skipping to change at page 53, line 14 skipping to change at page 53, line 31
destroy the session). destroy the session).
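The slot-retirement rule from the enforced highest_slotid discussion earlier can be sketched in Python as below. Here slots is a hypothetical map from each live slot id to the sequence id cached in that slot. Retries to a to-be-retired slot are replayed. A next-in-sequence request on such a slot retires it and returns NFS4ERR_BADSLOT in the same step, never one without the other. The handling of already-retired slots and of other sequence ids is an assumption of the sketch.

```python
from typing import Dict

SEQID_MODULUS = 2 ** 32


def check_retiring_slot(slots: Dict[int, int], slotid: int, seqid: int,
                        enforced_highest: int) -> str:
    """slots maps each live slot id to the sequence id cached in that slot."""
    if slotid <= enforced_highest:
        return "normal"                  # ordinary slot processing applies
    if slotid not in slots:
        return "NFS4ERR_BADSLOT"         # assumption: slot already retired
    cached = slots[slotid]
    if seqid == cached:
        return "replay"                  # retries to to-be-retired slots are allowed
    if seqid == (cached + 1) % SEQID_MODULUS:
        # The requester has seen the previous reply, so the slot can go;
        # retiring it and returning NFS4ERR_BADSLOT happen together.
        del slots[slotid]
        return "NFS4ERR_BADSLOT"
    return "NFS4ERR_SEQ_MISORDERED"      # assumption: treated as any misordered request
```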
Note that it is not fatal for a client to retry without a disconnect Note that it is not fatal for a client to retry without a disconnect
between the request and retry. However the retry does consume between the request and retry. However the retry does consume
resources, especially with RDMA, where each request, retry or not, resources, especially with RDMA, where each request, retry or not,
consumes a credit. Retries for no reason, especially retries sent consumes a credit. Retries for no reason, especially retries sent
shortly after the previous attempt, are a poor use of network shortly after the previous attempt, are a poor use of network
bandwidth and defeat the purpose of a transport's inherent congestion bandwidth and defeat the purpose of a transport's inherent congestion
control system. control system.
A client MUST wait for a reply to a request before using the slot for A requester MUST wait for a reply to a request before using the slot
another request. If it does not wait for a reply, then the client for another request. If it does not wait for a reply, then the
does not know what sequence id to use for the slot on its next requester does not know what sequence id to use for the slot on its
request. For example, suppose a client sends a request with sequence next request. For example, suppose a requester sends a request with
id 1, and does not wait for the response. The next time it uses the sequence id 1, and does not wait for the response. The next time it
slot, it sends the new request with sequence id 2. If the server has uses the slot, it sends the new request with sequence id 2. If the
not seen the request with sequence id 1, then the server is not replier has not seen the request with sequence id 1, then the replier
expecting sequence id 2, and rejects the client's new request with is not expecting sequence id 2, and rejects the requester's new
NFS4ERR_SEQ_MISORDERED (as the result from SEQUENCE or CB_SEQUENCE). request with NFS4ERR_SEQ_MISORDERED (as the result from SEQUENCE or
CB_SEQUENCE).
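A requester-side slot table that obeys this rule, and that prefers the lowest available slot as recommended earlier, might look like the following Python sketch. RequesterSlotTable and its methods are hypothetical; max_slotid stands for N, and retries of an outstanding request (which reuse its slot and sequence id) are not modeled.

```python
from typing import Dict, Optional, Set, Tuple

SEQID_MODULUS = 2 ** 32


class RequesterSlotTable:
    def __init__(self, max_slotid: int) -> None:
        self.max_slotid = max_slotid          # N, initially ca_maxrequests - 1
        self.last_seqid: Dict[int, int] = {}  # slot id -> sequence id of last reply
        self.busy: Set[int] = set()           # slots with a request outstanding

    def acquire(self) -> Optional[Tuple[int, int]]:
        """Pick the lowest free slot and the sequence id to send on it."""
        for slotid in range(self.max_slotid + 1):
            if slotid not in self.busy:
                self.busy.add(slotid)
                seqid = (self.last_seqid.get(slotid, 0) + 1) % SEQID_MODULUS
                return slotid, seqid          # first use of a slot yields seqid 1
        return None                           # every slot busy: wait for a reply

    def on_reply(self, slotid: int, seqid: int) -> None:
        """Record the reply; only now may the slot carry another request."""
        self.last_seqid[slotid] = seqid
        self.busy.discard(slotid)
```

Because acquire never hands out a busy slot, the requester cannot send sequence id 2 on a slot whose sequence id 1 request is still unanswered.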
RDMA fabrics do not guarantee that the memory handles (Steering Tags) RDMA fabrics do not guarantee that the memory handles (Steering Tags)
within each RPC/RDMA "chunk" ([8]) are valid on a scope outside that within each RPC/RDMA "chunk" ([8]) are valid on a scope outside that
of a single connection. Therefore, handles used by the direct of a single connection. Therefore, handles used by the direct
operations become invalid after connection loss. The server must operations become invalid after connection loss. The server must
ensure that any RDMA operations which must be replayed from the reply ensure that any RDMA operations which must be replayed from the reply
cache use the newly provided handle(s) from the most recent request. cache use the newly provided handle(s) from the most recent request.
A retry might be sent while the original request is still in progress A retry might be sent while the original request is still in progress
on the replier. The replier SHOULD deal with the issue by by on the replier. The replier SHOULD deal with the issue by returning
returning NFS4ERR_DELAY as the reply to SEQUENCE or CB_SEQUENCE NFS4ERR_DELAY as the reply to SEQUENCE or CB_SEQUENCE operation, but
operation, but implementations MAY return NFS4ERR_SEQ_MISORDERED. Since implementations MAY return NFS4ERR_SEQ_MISORDERED. Since errors from
errors from SEQUENCE and CB_SEQUENCE are never recorded in the reply SEQUENCE and CB_SEQUENCE are never recorded in the reply cache, this
cache, this approach allows the results of the execution of the approach allows the results of the execution of the original request
original request to be properly recorded in the reply cache (assuming to be properly recorded in the reply cache (assuming the requester
the requester specified the reply to be cached). specified the reply to be cached).
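The following minimal sketch shows how a replier might handle a retry that arrives while the original request is still executing. All names are hypothetical. The sketch assumes every reply is cached and ignores sequence id wraparound. Because NFS4ERR_DELAY is a SEQUENCE result and is never stored, the original request's reply can still be recorded when it finishes.

```python
import threading

NFS4ERR_DELAY = "NFS4ERR_DELAY"
NFS4ERR_SEQ_MISORDERED = "NFS4ERR_SEQ_MISORDERED"


class Slot:
    """Hypothetical replier slot that remembers whether its request is still running."""

    def __init__(self):
        self.seqid = 0
        self.in_progress = False
        self.cached_reply = None
        self.lock = threading.Lock()


def handle_sequence(slot: Slot, seqid: int, execute):
    """Process SEQUENCE for one slot; execute() runs the rest of the COMPOUND."""
    with slot.lock:
        if seqid == slot.seqid:
            if slot.in_progress:
                # Retry of a request that is still executing. This error is a
                # SEQUENCE result and is never stored in the reply cache, so
                # the original request can still record its own reply.
                return NFS4ERR_DELAY
            return slot.cached_reply
        if seqid != slot.seqid + 1:  # wraparound ignored for brevity
            return NFS4ERR_SEQ_MISORDERED
        slot.seqid = seqid
        slot.in_progress = True
    reply = None
    try:
        reply = execute()  # run outside the lock so retries can be answered meanwhile
    finally:
        with slot.lock:
            slot.cached_reply = reply  # assumes the requester asked for caching
            slot.in_progress = False
    return reply
```

A retry that reaches `handle_sequence` while `execute()` is running gets NFS4ERR_DELAY. A retry that arrives after the original completes gets the cached reply.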
2.10.5.3. Resolving Server Callback Races 2.10.5.3. Resolving Server Callback Races
It is possible for server callbacks to arrive at the client before It is possible for server callbacks to arrive at the client before
the reply from related fore channel operations. For example, a the reply from related fore channel operations. For example, a
client may have been granted a delegation to a file it has opened, client may have been granted a delegation to a file it has opened,
but the reply to the OPEN (informing the client of the granting of but the reply to the OPEN (informing the client of the granting of
the delegation) may be delayed in the network. If a conflicting the delegation) may be delayed in the network. If a conflicting
operation arrives at the server, it will recall the delegation using operation arrives at the server, it will recall the delegation using
the backchannel, which may be on a different transport connection, the backchannel, which may be on a different transport connection,
skipping to change at page 54, line 46 skipping to change at page 55, line 15
CB_COMPOUND procedure. CB_COMPOUND procedure.
The client must not simply wait forever for the expected server reply The client must not simply wait forever for the expected server reply
to arrive before responding to the CB_COMPOUND that won the race, to arrive before responding to the CB_COMPOUND that won the race,
because it is possible that it will be delayed indefinitely. The because it is possible that it will be delayed indefinitely. The
client should assume the likely case that the reply will arrive client should assume the likely case that the reply will arrive
within the average round trip time for COMPOUND requests to the within the average round trip time for COMPOUND requests to the
server, and wait that period of time. If that period of time expires server, and wait that period of time. If that period of time expires
it can respond to the CB_COMPOUND with NFS4ERR_DELAY. it can respond to the CB_COMPOUND with NFS4ERR_DELAY.
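The client's bounded wait for a callback that won the race could be sketched as follows. The function name and the `pending_reply` event are hypothetical. The sketch assumes the fore-channel reply handler sets the event when the related reply (for example, to OPEN) arrives. Returning NFS4_OK here only means "go on to process the callback normally".

```python
import threading

NFS4_OK = "NFS4_OK"
NFS4ERR_DELAY = "NFS4ERR_DELAY"


def answer_cb_compound(pending_reply: threading.Event, avg_rtt_seconds: float) -> str:
    """Decide how to answer a CB_COMPOUND that may have raced a fore-channel reply."""
    # Wait about one average COMPOUND round trip for the reply, never forever.
    if pending_reply.wait(timeout=avg_rtt_seconds):
        return NFS4_OK        # the reply arrived; process the callback normally
    return NFS4ERR_DELAY      # still missing; the server will retry the callback
```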
There are other scenarios under which callbacks may race replies, There are other scenarios under which callbacks may race replies.
among them pNFS layout recalls, described in Section 12.5.5.2. Among them are pNFS layout recalls as described in Section 12.5.5.2.
2.10.5.4. COMPOUND and CB_COMPOUND Construction Issues 2.10.5.4. COMPOUND and CB_COMPOUND Construction Issues
Very large requests and replies may pose both buffer management Very large requests and replies may pose both buffer management
issues (especially with RDMA) and reply cache issues. When the issues (especially with RDMA) and reply cache issues. When the
session is created (Section 18.36), for each channel (fore and session is created (Section 18.36), for each channel (fore and
back), the client and server negotiate the maximum sized request they back), the client and server negotiate the maximum sized request they
will send or process (ca_maxrequestsize), the maximum sized reply will send or process (ca_maxrequestsize), the maximum sized reply
they will return or process (ca_maxresponsesize), and the maximum they will return or process (ca_maxresponsesize), and the maximum
sized reply they will store in the reply cache sized reply they will store in the reply cache
skipping to change at page 57, line 19 skipping to change at page 57, line 35
any requests that were sent and executed before the server restarted. any requests that were sent and executed before the server restarted.
If the replier is a client then there is no need for it to persist If the replier is a client then there is no need for it to persist
any more information, unless the client will be persisting all other any more information, unless the client will be persisting all other
state across client restart, in which case the server will never state across client restart, in which case the server will never
see any NFSv4.1-level protocol manifestation of a client restart. If see any NFSv4.1-level protocol manifestation of a client restart. If
the replier is a server, with just the slot table and sessionid the replier is a server, with just the slot table and sessionid
persisting, any requests the client retries after the server restart persisting, any requests the client retries after the server restart
will return the results that are cached in the reply cache, and any new will return the results that are cached in the reply cache, and any new
requests (i.e. the sequence id is one (1) greater than the slot's requests (i.e. the sequence id is one (1) greater than the slot's
sequence id) MUST be rejected with NFS4ERR_DEADSESSION (returned by sequence id) MUST be rejected with NFS4ERR_DEADSESSION (returned by
SEQUENCE). Such a session is considered: dead. A server MAY re- SEQUENCE). Such a session is considered dead. A server MAY re-
animate a session after a server restart so that the session will animate a session after a server restart so that the session will
accept new requests as well as retries. To re-animate a session the accept new requests as well as retries. To re-animate a session the
server needs to persist additional information through server server needs to persist additional information through server
restart: restart:
o The client ID. This is a prerequisite to let the client create o The client ID. This is a prerequisite to let the client create
more sessions associated with the same client ID as the more sessions associated with the same client ID as the
o The client ID's sequenceid that is used for creating sessions (see o The client ID's sequenceid that is used for creating sessions (see
Section 18.35 and Section 18.36). This is a prerequisite to let Section 18.35 and Section 18.36). This is a prerequisite to let
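The following minimal sketch applies the restart rule above to a server that persisted only the slot table and sessionid. The function name, its parameters, and the "EXECUTE" placeholder are hypothetical. Retries are answered from the persisted reply cache, and a new request (sequence id one greater) is rejected with NFS4ERR_DEADSESSION unless the server has re-animated the session.

```python
NFS4ERR_DEADSESSION = "NFS4ERR_DEADSESSION"
NFS4ERR_SEQ_MISORDERED = "NFS4ERR_SEQ_MISORDERED"


def sequence_after_restart(slot_seqid: int, cached_reply, request_seqid: int,
                           reanimated: bool = False):
    """Hypothetical SEQUENCE check on a slot whose seqid and reply survived a restart."""
    if request_seqid == slot_seqid:
        return cached_reply              # retry: answer from the persisted reply cache
    if request_seqid == (slot_seqid + 1) % 2 ** 32:
        if not reanimated:
            return NFS4ERR_DEADSESSION   # new request on a session that is now dead
        return "EXECUTE"                 # placeholder: a re-animated session runs it
    return NFS4ERR_SEQ_MISORDERED
```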
skipping to change at page 63, line 12 skipping to change at page 63, line 28
Assuming a proper safeguard, using the per-machine credential for Assuming a proper safeguard, using the per-machine credential for
operations like CREATE_SESSION, BIND_CONN_TO_SESSION, operations like CREATE_SESSION, BIND_CONN_TO_SESSION,
DESTROY_SESSION, and DESTROY_CLIENTID will prevent an attacker from DESTROY_SESSION, and DESTROY_CLIENTID will prevent an attacker from
associating a rogue connection with a session, or associating a rogue associating a rogue connection with a session, or associating a rogue
session with a client ID. session with a client ID.
There are at least three scenarios for the SP4_MACH_CRED option: There are at least three scenarios for the SP4_MACH_CRED option:
1. That the system administrator configures a unique, permanent per- 1. That the system administrator configures a unique, permanent per-
machine credential for one of the mandated GSS mechanisms (for machine credential for one of the mandated GSS mechanisms (for
example, if Kerberos V5 is used, a "keytab" for principal named example, if Kerberos V5 is used, a "keytab" containing a
after client host name could be used). principal named after client host name could be used).
2. The client is used by a single user, and so the client ID and its 2. The client is used by a single user, and so the client ID and its
sessions are used by just that user. If the user's credential sessions are used by just that user. If the user's credential
expires, then session and client ID maintenance cannot occur, but expires, then session and client ID maintenance cannot occur, but
since the client has a single user, only that user is since the client has a single user, only that user is
inconvenienced. inconvenienced.
3. The physical client has multiple users, but the client 3. The physical client has multiple users, but the client
implementation has a unique client ID for each user. This is implementation has a unique client ID for each user. This is
effectively the same as the second scenario, but a disadvantage effectively the same as the second scenario, but a disadvantage
skipping to change at page 64, line 51 skipping to change at page 65, line 19
because the client SHOULD have modified the SSV due to Eve using because the client SHOULD have modified the SSV due to Eve using
the new session, Bob cannot get revenge on Eve by associating a the new session, Bob cannot get revenge on Eve by associating a
rogue connection with the session. rogue connection with the session.
The question is how did the legitimate client detect that Eve has The question is how did the legitimate client detect that Eve has
hijacked the old session? When the client detects that a new hijacked the old session? When the client detects that a new
principal, Bob, wants to use the session, it SHOULD have sent a principal, Bob, wants to use the session, it SHOULD have sent a
SET_SSV, which leads to the following sub-scenarios: SET_SSV, which leads to the following sub-scenarios:
* Let us suppose that from the rogue connection, Eve sent a * Let us suppose that from the rogue connection, Eve sent a
SET_SSV with the same slot id and sequence that the legitimate SET_SSV with the same slot id and sequence id that the
client later uses. The server will assume this is a retry, and legitimate client later uses. The server will assume the
return to the legitimate client the reply it sent Eve. However, SET_SSV sent with Bob's credentials is a retry, and return to
unless Eve can correctly guess the SSV the legitimate client the legitimate client the reply it sent Eve. However, unless
will use, the digest verification checks in the SET_SSV Eve can correctly guess the SSV the legitimate client will use,
response will fail. That is an indication to the client that the digest verification checks in the SET_SSV response will
the session has apparently been hijacked. fail. That is an indication to the client that the session has
apparently been hijacked.
* Alternatively, Eve sent a SET_SSV with a different slot id than * Alternatively, Eve sent a SET_SSV with a different slot id than
the legitimate client uses for its SET_SSV. Then the digest the legitimate client uses for its SET_SSV. Then the digest
verification on the server fails, and it is again apparent to verification of the SET_SSV sent with Bob's credentials fails
the client that the session has been hijacked. on the server, and the error returned to the client makes
it apparent that the session has been hijacked.
* Alternatively, Eve sent an operation other than SET_SSV, but * Alternatively, Eve sent an operation other than SET_SSV, but
with the same slot id and sequence id that the legitimate client with the same slot id and sequence id that the legitimate client
uses for its SET_SSV. The server returns to the legitimate uses for its SET_SSV. The server returns to the legitimate
client the response it sent Eve. The client sees that the client the response it sent Eve. The client sees that the
response is not at all what it expects. The client assumes response is not at all what it expects. The client assumes
either session hijacking or a server bug, and either way either session hijacking or a server bug, and either way
destroys the old session. destroys the old session.
o Eve associates a rogue connection with the session as above, and o Eve associates a rogue connection with the session as above, and
then destroys the session. Again, Bob goes to use the server from then destroys the session. Again, Bob goes to use the server from
the legitimate client by issuing a SET_SSV. The client receives the legitimate client, which sends a SET_SSV using Bob's
an error that indicates the session does not exist. When the credentials. The client receives an error that indicates the
client tries to create a new session, this will fail because the session does not exist. When the client tries to create a new
SSV it has does not that the server has, and now the client knows session, this will fail because the SSV it has does not match that
the session was hijacked. The legitimate client establishes a new the server has, and now the client knows the session was hijacked.
client ID as before. The legitimate client establishes a new client ID as before.
o If Eve creates a connection before the legitimate client o If Eve creates a connection before the legitimate client
establishes an SSV, because the initial value of the SSV is zero establishes an SSV, because the initial value of the SSV is zero
and therefore known, Eve can send a SET_SSV that will pass the and therefore known, Eve can send a SET_SSV that will pass the
digest verification check. However, because the new connection has digest verification check. However, because the new connection has
not been associated with the session, the SET_SSV is rejected for not been associated with the session, the SET_SSV is rejected for
that reason. that reason.
In summary an attacker's disruption of state when SP4_SSV protection In summary, an attacker's disruption of state when SP4_SSV protection
is in use is limited to the formative period of a client ID, its is in use is limited to the formative period of a client ID, its
first session, and the establishment of the SSV. Once a non- first session, and the establishment of the SSV. Once a non-
malicious user uses the client ID, the client quickly detects any malicious user uses the client ID, the client quickly detects any
hijack and rectifies the situation. Once a non-malicious user hijack and rectifies the situation. Once a non-malicious user
successfully modifies the SSV, the attacker cannot use NFSv4.1 successfully modifies the SSV, the attacker cannot use NFSv4.1
operations to disrupt the non-malicious user. operations to disrupt the non-malicious user.
Note that neither the SP4_MACH_CRED nor SP4_SSV protection approaches Note that neither the SP4_MACH_CRED nor SP4_SSV protection approaches
prevent hijacking of a transport connection that has previously been prevent hijacking of a transport connection that has previously been
associated with a session. If the goal of a counter threat strategy associated with a session. If the goal of a counter threat strategy
is to prevent connection hijacking, the use of IPsec is RECOMMENDED. is to prevent connection hijacking, the use of IPsec is RECOMMENDED.
If the goal of a counter threat strategy is to prevent a connection If a connection hijack occurs, the hijacker could in theory change
hijacker from making unauthorized state changes, then the locking state and negatively impact the service to legitimate
SP4_MACH_CRED protection approach can be used with a client ID per clients. However, if the server is configured to require the use of
user (i.e. the aforementioned third scenario for machine credential RPCSEC_GSS with integrity or privacy on the affected file objects,
state protection). For each unique user, the client invokes and if the EXCHGID4_FLAG_BIND_PRINC_STATEID capability (Section 18.35)
EXCHANGE_ID with the user's credential, specifying SP4_MACH_CRED is in force, this will thwart unauthorized attempts to change locking
protections, and specifying that all operations MUST be protected state.
with the machine credential. The server will then reject any
subsequent operations on the client ID or its sessions that do not
use RPCSEC_GSS with privacy or integrity and do not use the same
credential that created the client ID.
2.10.8. The SSV GSS Mechanism 2.10.8. The SSV GSS Mechanism
The SSV provides the secret key for a mechanism that NFSv4.1 uses for The SSV provides the secret key for a mechanism that NFSv4.1 uses for
state protection. Contexts for this mechanism are not established state protection. Contexts for this mechanism are not established
via the RPCSEC_GSS protocol. Instead, the contexts are automatically via the RPCSEC_GSS protocol. Instead, the contexts are automatically
created when EXCHANGE_ID specifies SP4_SSV protection. The only created when EXCHANGE_ID specifies SP4_SSV protection. The only
tokens defined are the PerMsgToken (emitted by GSS_GetMIC) and the tokens defined are the PerMsgToken (emitted by GSS_GetMIC) and the
SealedMessage (emitted by GSS_Wrap). SealedMessage token (emitted by GSS_Wrap).
The mechanism OID for the SSV mechanism is: The mechanism OID for the SSV mechanism is:
iso.org.dod.internet.private.enterprise.Michael Eisler.nfs.ssv_mech iso.org.dod.internet.private.enterprise.Michael Eisler.nfs.ssv_mech
(1.3.6.1.4.1.28882.1.1). While the SSV mechanisms does not define (1.3.6.1.4.1.28882.1.1). While the SSV mechanism does not define any
any initial context tokens, the OID can be used to let servers initial context tokens, the OID can be used to let servers indicate
indicate that the SSV mechanism is acceptable whenever the client that the SSV mechanism is acceptable whenever the client sends a
sends a SECINFO or SECINFO_NO_NAME operation (see Section 2.6). SECINFO or SECINFO_NO_NAME operation (see Section 2.6).
The SSV mechanism defines four subkeys derived from the SSV value. The SSV mechanism defines four subkeys derived from the SSV value.
Each time SET_SSV is invoked the subkeys are recalculated by the Each time SET_SSV is invoked the subkeys are recalculated by the
client and server. The four subkeys are calculated by from each of client and server. The calculation of each of the four subkeys
the valid ssv_subkey4 enumerated values. The calculation uses the depends on each of the four respective ssv_subkey4 enumerated values.
HMAC ([11]), algorithm, using the current SSV as the key, the one way The calculation uses the HMAC [11] algorithm, using the current SSV
hash algorithm as negotiated by EXCHANGE_ID, and the input text as as the key, the one way hash algorithm as negotiated by EXCHANGE_ID,
represented by the XDR encoded enumeration of type ssv_subkey4. and the input text as represented by the XDR encoded enumeration of
type ssv_subkey4.
/* Input for computing subkeys */ /* Input for computing subkeys */
enum ssv_subkey4 { enum ssv_subkey4 {
SSV4_SUBKEY_MIC_I2T = 1, SSV4_SUBKEY_MIC_I2T = 1,
SSV4_SUBKEY_MIC_T2I = 2, SSV4_SUBKEY_MIC_T2I = 2,
SSV4_SUBKEY_SEAL_I2T = 3, SSV4_SUBKEY_SEAL_I2T = 3,
SSV4_SUBKEY_SEAL_T2I = 4 SSV4_SUBKEY_SEAL_T2I = 4
}; };
The subkey derived from SSV4_SUBKEY_MIC_I2T is used for calculating The subkey derived from SSV4_SUBKEY_MIC_I2T is used for calculating
message integrity codes (MICs) that originate from the NFSv4.1 message integrity codes (MICs) that originate from the NFSv4.1
client, whether as part of a request over the fore channel, or a client, whether as part of a request over the fore channel, or a
response over the backchannel. The subkey derived from response over the backchannel. The subkey derived from
SSV4_SUBKEY_MIC_T2I is used for MICs originating from the NFSv4.1 server. The SSV4_SUBKEY_MIC_T2I is used for MICs originating from the NFSv4.1 server. The
subkey derived from SSV4_SUBKEY_SEAL_I2T is used for encryption text subkey derived from SSV4_SUBKEY_SEAL_I2T is used for encryption text
originating from the NFSv4.1 client and the subkey derived from originating from the NFSv4.1 client and the subkey derived from
SSV4_SUBKEY_SEAL_T2I is used for encryption text originating from the SSV4_SUBKEY_SEAL_T2I is used for encryption text originating from the
NFSv4.1 server. NFSv4.1 server.
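The subkey calculation described above can be sketched with Python's standard `hmac` module. The sketch assumes SHA-256 as a stand-in for whatever one-way hash EXCHANGE_ID negotiated, and the helper names are hypothetical. Each subkey is the HMAC of the 4-byte XDR encoding of its ssv_subkey4 value, keyed with the current SSV. The MIC subkey is chosen by which peer originates the message, not by channel.

```python
import hmac
import struct

# Values of enum ssv_subkey4, copied from the XDR definition above.
SSV4_SUBKEY_MIC_I2T = 1
SSV4_SUBKEY_MIC_T2I = 2
SSV4_SUBKEY_SEAL_I2T = 3
SSV4_SUBKEY_SEAL_T2I = 4


def xdr_enum(value: int) -> bytes:
    """XDR encodes an enum as a 4-byte big-endian signed integer."""
    return struct.pack(">i", value)


def derive_subkeys(ssv: bytes, hash_name: str = "sha256") -> dict:
    """Recompute all four subkeys; both peers redo this after each SET_SSV.

    hash_name stands in for the one-way hash negotiated by EXCHANGE_ID;
    "sha256" is only an assumed example.
    """
    return {
        which: hmac.new(ssv, xdr_enum(which), hash_name).digest()
        for which in (SSV4_SUBKEY_MIC_I2T, SSV4_SUBKEY_MIC_T2I,
                      SSV4_SUBKEY_SEAL_I2T, SSV4_SUBKEY_SEAL_T2I)
    }


def mic_subkey(subkeys: dict, sender_is_client: bool) -> bytes:
    """Pick the MIC subkey by who originates the message, not by channel."""
    return subkeys[SSV4_SUBKEY_MIC_I2T if sender_is_client else SSV4_SUBKEY_MIC_T2I]
```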
The field smt_hmac is an HMAC calculated by using the subkey derived
from SSV4_SUBKEY_MIC_I2T or SSV4_SUBKEY_MIC_T2I as the key, the one
way hash algorithm as negotiated by EXCHANGE_ID, and the input text
as represented by data of type ssv_mic_plain_tkn4. The field
smpt_ssv_seq is the same as smt_ssv_seq. The field smt_orig_plain is
the input text as passed into GSS_GetMIC().
The PerMsgToken description is based on an XDR definition: The PerMsgToken description is based on an XDR definition:
/* Input for computing smt_hmac */ /* Input for computing smt_hmac */
struct ssv_mic_plain_tkn4 { struct ssv_mic_plain_tkn4 {
uint32_t smpt_ssv_seq; uint32_t smpt_ssv_seq;
opaque smpt_orig_plain<>; opaque smpt_orig_plain<>;
}; };
/* SSV GSS PerMsgToken token */ /* SSV GSS PerMsgToken token */
struct ssv_mic_tkn4 { struct ssv_mic_tkn4 {
uint32_t smt_ssv_seq; uint32_t smt_ssv_seq;
opaque smt_hmac<>; opaque smt_hmac<>;
}; };
The field smt_hmac is an HMAC calculated by using the subkey derived
from SSV4_SUBKEY_MIC_I2T or SSV4_SUBKEY_MIC_T2I as the key, the one
way hash algorithm as negotiated by EXCHANGE_ID, and the input text
as represented by data of type ssv_mic_plain_tkn4. The field
smpt_ssv_seq is the same as smt_ssv_seq. The field smpt_orig_plain
is the "message" input passed to GSS_GetMIC() (see Section 2.3.1 of
[7]). The caller of GSS_GetMIC() provides a pointer to a buffer
containing the plain text. The SSV mechanism's entry point for
GSS_GetMIC() encodes this into an opaque array, and the encoding will
include an initial four byte length, plus any necessary padding.
Prepended to this will be the XDR encoded value of smpt_ssv_seq, thus
making up an XDR encoding of a value of data type ssv_mic_plain_tkn4,
which in turn is the input into the HMAC.
The token emitted by GSS_GetMIC() is XDR encoded and of XDR data type The token emitted by GSS_GetMIC() is XDR encoded and of XDR data type
ssv_mic_tkn4. The field smt_ssv_seq comes from the SSV sequence ssv_mic_tkn4. The field smt_ssv_seq comes from the SSV sequence
number which is equal to 1 after SET_SSV (Section 18.47) is called number which is equal to 1 after SET_SSV (Section 18.47) is called
the first time on a client ID. Thereafter, it is incremented on each the first time on a client ID. Thereafter, it is incremented on each
SET_SSV. Thus smt_ssv_seq represents the version of the SSV at the SET_SSV. Thus smt_ssv_seq represents the version of the SSV at the
time GSS_GetMIC() was called. As noted in Section 18.35, the client time GSS_GetMIC() was called. As noted in Section 18.35, the client
and server can maintain multiple concurrent versions of the SSV. and server can maintain multiple concurrent versions of the SSV.
This allows the SSV to be changed without serializing all RPC calls This allows the SSV to be changed without serializing all RPC calls
that use the SSV mechanism with SET_SSV operations. that use the SSV mechanism with SET_SSV operations. Once the HMAC is
calculated, it is XDR encoded into smt_hmac, which will include an
initial four byte length, and any necessary padding. Prepended to
this will be the XDR encoded value of smt_ssv_seq.
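The PerMsgToken construction can be sketched in Python with hand-rolled XDR helpers. The function names, the SHA-256 default, and the convention that an ssv_seq of 0 means "no SET_SSV yet" are assumptions. The input to the HMAC is the XDR encoding of ssv_mic_plain_tkn4, and the emitted token is the XDR encoding of ssv_mic_tkn4. The verifier uses whichever SSV version the token names, since several versions can be live at once.

```python
import hmac
import struct


def xdr_uint32(n: int) -> bytes:
    return struct.pack(">I", n)


def xdr_opaque(data: bytes) -> bytes:
    """Variable-length opaque: 4-byte length, the data, zero padding to 4 bytes."""
    return xdr_uint32(len(data)) + data + b"\x00" * ((4 - len(data) % 4) % 4)


def get_mic(mic_subkey: bytes, ssv_seq: int, message: bytes,
            hash_name: str = "sha256") -> bytes:
    """Sketch of the SSV mechanism's GSS_GetMIC(); returns an XDR ssv_mic_tkn4."""
    if ssv_seq < 1:  # assumption: 0 means SET_SSV has not yet succeeded
        raise ValueError("no SSV established: token emission must fail")
    plain_tkn = xdr_uint32(ssv_seq) + xdr_opaque(message)    # ssv_mic_plain_tkn4
    mac = hmac.new(mic_subkey, plain_tkn, hash_name).digest()
    return xdr_uint32(ssv_seq) + xdr_opaque(mac)             # ssv_mic_tkn4


def verify_mic(subkeys_by_seq: dict, message: bytes, token: bytes,
               hash_name: str = "sha256") -> bool:
    """Check a token against the SSV version (smt_ssv_seq) it names."""
    if len(token) < 8:
        return False
    ssv_seq, mac_len = struct.unpack(">II", token[:8])
    mac = token[8:8 + mac_len]
    key = subkeys_by_seq.get(ssv_seq)  # several SSV versions may be live at once
    if key is None:
        return False
    plain_tkn = xdr_uint32(ssv_seq) + xdr_opaque(message)
    return hmac.compare_digest(mac, hmac.new(key, plain_tkn, hash_name).digest())
```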
The SealedMessage description is based on an XDR definition: The SealedMessage description is based on an XDR definition:
/* Input for computing ssct_encr_data and ssct_hmac */ /* Input for computing ssct_encr_data and ssct_hmac */
struct ssv_seal_plain_tkn4 { struct ssv_seal_plain_tkn4 {
opaque sspt_confounder<>; opaque sspt_confounder<>;
uint32_t sspt_ssv_seq; uint32_t sspt_ssv_seq;
opaque sspt_orig_plain<>; opaque sspt_orig_plain<>;
opaque sspt_pad<>; opaque sspt_pad<>;
}; };
skipping to change at page 69, line 7 skipping to change at page 69, line 32
The ssct_hmac field is the result of computing an HMAC using the value of The ssct_hmac field is the result of computing an HMAC using the value of
the XDR encoded data type ssv_seal_plain_tkn4 as the input text. The the XDR encoded data type ssv_seal_plain_tkn4 as the input text. The
key is the subkey derived from SSV4_SUBKEY_MIC_I2T or key is the subkey derived from SSV4_SUBKEY_MIC_I2T or
SSV4_SUBKEY_MIC_T2I, and the one way hash algorithm is that SSV4_SUBKEY_MIC_T2I, and the one way hash algorithm is that
negotiated by EXCHANGE_ID. negotiated by EXCHANGE_ID.
The sspt_confounder field is a random value. The sspt_confounder field is a random value.
The sspt_ssv_seq field is the same as ssct_ssv_seq. The sspt_ssv_seq field is the same as ssct_ssv_seq.
The sspt_orig_plain field is the original plaintext as passed to The sspt_orig_plain field is the original plaintext and is the
GSS_Wrap(). "input_message" input passed to GSS_Wrap() (see Section 2.3.3 of
[7]). As with the handling of the plaintext by the SSV mechanism's
GSS_GetMIC() entry point, the entry point for GSS_Wrap() expects a
pointer to the plaintext, and will XDR encode an opaque array into
sspt_orig_plain representing the plain text, along with the other
fields of an instance of data type ssv_seal_plain_tkn4.
The sspt_pad field is present to support encryption algorithms that The sspt_pad field is present to support encryption algorithms that
require inputs to be in fixed sized blocks. The content of sspt_pad require inputs to be in fixed sized blocks. The content of sspt_pad
is zero filled except for the length. Beware that the XDR encoding is zero filled except for the length. Beware that the XDR encoding
of ssv_seal_plain_tkn4 contains three variable length arrays, and so of ssv_seal_plain_tkn4 contains three variable length arrays, and so
each array consumes four bytes for an array length, and each array each array consumes four bytes for an array length, and each array
that follows the length is always padded to a multiple of four bytes that follows the length is always padded to a multiple of four bytes
per the XDR standard. per the XDR standard.
For example, suppose the encryption algorithm uses 16 byte blocks, and For example, suppose the encryption algorithm uses 16 byte blocks, and
skipping to change at page 69, line 36 skipping to change at page 70, line 18
or a total encoding of 16 bytes. The total number of XDR encoded or a total encoding of 16 bytes. The total number of XDR encoded
bytes is thus 8 + 4 + 20 + 16 = 48. bytes is thus 8 + 4 + 20 + 16 = 48.
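The padding arithmetic can be checked with a short Python sketch. The helper names and the input lengths (a 4-byte confounder and 16 bytes of plaintext) are hypothetical. They were picked only because they reproduce the 48-byte total shown above, and the elided part of the example may use other values. The sketch also assumes the sspt_pad length is kept to a multiple of four so that the pad array itself needs no XDR padding.

```python
def xdr_opaque_len(n: int) -> int:
    """Bytes used by an XDR opaque<> of n bytes: length word plus padded data."""
    return 4 + n + (4 - n % 4) % 4


def sspt_pad_len(confounder_len: int, plain_len: int, block_size: int) -> int:
    """Length to give sspt_pad so XDR(ssv_seal_plain_tkn4) fills whole blocks.

    Lengths are stepped in multiples of 4 so sspt_pad itself needs no XDR
    padding; that choice is an assumption, not a requirement of the text.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    fixed = xdr_opaque_len(confounder_len) + 4 + xdr_opaque_len(plain_len)  # 4: sspt_ssv_seq
    pad = 0
    while (fixed + xdr_opaque_len(pad)) % block_size != 0:
        pad += 4
    return pad


pad = sspt_pad_len(confounder_len=4, plain_len=16, block_size=16)
total = xdr_opaque_len(4) + 4 + xdr_opaque_len(16) + xdr_opaque_len(pad)
print(pad, total)  # 12 48
```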
GSS_Wrap() emits a token that is an XDR encoding of a value of data GSS_Wrap() emits a token that is an XDR encoding of a value of data
type ssv_seal_cipher_tkn4. Note that regardless of whether the caller type ssv_seal_cipher_tkn4. Note that regardless of whether the caller
of GSS_Wrap() requests confidentiality or not, the token always has of GSS_Wrap() requests confidentiality or not, the token always has
confidentiality. This is because the SSV mechanism is for confidentiality. This is because the SSV mechanism is for
RPCSEC_GSS, and RPCSEC_GSS never produces GSS_Wrap() tokens without RPCSEC_GSS, and RPCSEC_GSS never produces GSS_Wrap() tokens without
confidentiality. confidentiality.
Effectively there is a single GSS context for a single client ID. There is one SSV per client ID. Effectively there is a single GSS
All RPCSEC_GSS handles share the same GSS context. SSV GSS contexts context for a client ID / SSV pair. All SSV mechanism RPCSEC_GSS
do not expire except when the SSV is destroyed (causes would include handles of a client ID / SSV pair share the same GSS context. SSV
the client ID being destroyed or a server restart). Since one GSS contexts do not expire except when the SSV is destroyed (causes
purpose of context expiration is to replace keys that have been in would include the client ID being destroyed or a server restart).
use for "too long" hence vulnerable to compromise by brute force or Since one purpose of context expiration is to replace keys that have
accident, the client can send periodic SET_SSV operations, by cycling been in use for "too long" hence vulnerable to compromise by brute
through different users' RPCSEC_GSS credentials. This way the SSV is force or accident, the client can replace the SSV key by sending
replaced without destroying the SSV's GSS contexts. periodic SET_SSV operations, by cycling through different users'
RPCSEC_GSS credentials. This way the SSV is replaced without
destroying the SSV's GSS contexts.
SSV RPCSEC_GSS handles can be expired or deleted by the server at any SSV RPCSEC_GSS handles can be expired or deleted by the server at any
time and the EXCHANGE_ID operation can be used to create more SSV time and the EXCHANGE_ID operation can be used to create more SSV
RPCSEC_GSS handles. RPCSEC_GSS handles. Expiration of SSV RPCSEC_GSS handles does not
imply that the SSV or its GSS context has expired.
The client MUST establish an SSV via SET_SSV before the SSV GSS The client MUST establish an SSV via SET_SSV before the SSV GSS
context can be used to emit tokens from GSS_Wrap() and GSS_GetMIC(). context can be used to emit tokens from GSS_Wrap() and GSS_GetMIC().
If SET_SSV has not been successfully called, attempts to emit tokens If SET_SSV has not been successfully called, attempts to emit tokens
MUST fail. MUST fail.
The SSV mechanism does not support replay detection and sequencing in The SSV mechanism does not support replay detection and sequencing in
its tokens because RPCSEC_GSS does not use those features (see its tokens because RPCSEC_GSS does not use those features (see
Section 5.2.2 "Context Creation Requests" in [4]). Section 5.2.2 "Context Creation Requests" in [4]).
skipping to change at page 70, line 32 skipping to change at page 71, line 18
The client SHOULD honor the following obligations in order to utilize The client SHOULD honor the following obligations in order to utilize
the session: the session:
o Keep a necessary session from going idle on the server. A client o Keep a necessary session from going idle on the server. A client
that requires a session, but nonetheless is not sending operations that requires a session, but nonetheless is not sending operations
risks having the session be destroyed by the server. This is risks having the session be destroyed by the server. This is
because sessions consume resources, and resource limitations may because sessions consume resources, and resource limitations may
force the server to cull an inactive session. force the server to cull an inactive session.
o Destroy the session when not needed. If a client has multiple o Destroy the session when not needed. If a client has multiple
sessions and one of them has no requests waiting for replies, and sessions, one of which has no requests waiting for replies, and
has been idle for some period of time, it SHOULD destroy the has been idle for some period of time, it SHOULD destroy the
session. session.
o Maintain GSS contexts for the backchannel. If the client requires o Maintain GSS contexts for the backchannel. If the client requires
the server to use the RPCSEC_GSS security flavor for callbacks, the server to use the RPCSEC_GSS security flavor for callbacks,
then it needs to be sure the contexts handed to the server via then it needs to be sure the contexts handed to the server via
BACKCHANNEL_CTL are unexpired. BACKCHANNEL_CTL are unexpired.
o Preserve a connection for a backchannel. The server requires a o Preserve a connection for a backchannel. The server requires a
backchannel in order to gracefully recall recallable state, or backchannel in order to gracefully recall recallable state, or
notify the client of certain events. Note that if the connection notify the client of certain events. Note that if the connection
is not being used for the fore channel, there is no way the client is not being used for the fore channel, there is no way for the
tell if the connection is still alive (e.g., the server restarted client to tell if the connection is still alive (e.g., the server
without sending a disconnect). The onus is on the server, not the restarted without sending a disconnect). The onus is on the
client, to determine if the backchannel's connection is alive, and server, not the client, to determine if the backchannel's
to indicate in the response to a SEQUENCE operation when the last connection is alive, and to indicate in the response to a SEQUENCE
connection associated with a session's backchannel has operation when the last connection associated with a session's
disconnected. backchannel has disconnected.
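The session housekeeping duties listed above can be sketched as a periodic client-side check. This is a minimal sketch, not a prescribed algorithm. The ClientSession record, the needed flag, and the idle_limit and keepalive_after thresholds are hypothetical names chosen for illustration, and the protocol does not define specific idle periods. GSS context and backchannel upkeep are not modeled.

```python
from dataclasses import dataclass

@dataclass
class ClientSession:
    # Hypothetical client-side bookkeeping for one NFSv4.1 session.
    session_id: bytes
    outstanding: int        # requests sent but not yet replied to
    last_activity: float    # seconds, client clock
    needed: bool            # does the client still require this session?

def plan_housekeeping(sessions, now, idle_limit, keepalive_after):
    """Return (to_destroy, to_ping) lists of session ids.

    to_destroy: unneeded sessions with no outstanding requests that have been
                idle at least idle_limit, considered only when the client
                holds more than one session.
    to_ping:    needed sessions quiet for at least keepalive_after; a
                SEQUENCE-only COMPOUND keeps them from going idle.
    """
    to_destroy, to_ping = [], []
    multiple = len(sessions) > 1
    for s in sessions:
        idle = now - s.last_activity
        if multiple and not s.needed and s.outstanding == 0 and idle >= idle_limit:
            to_destroy.append(s.session_id)   # candidate for DESTROY_SESSION
        elif s.needed and idle >= keepalive_after:
            to_ping.append(s.session_id)
    return to_destroy, to_ping
```

Keeping a session with outstanding requests, even when it is otherwise unneeded, avoids discarding replies the client is still waiting for.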
2.10.9.3. Steps the Client Takes To Establish a Session 2.10.9.3. Steps the Client Takes To Establish a Session
If the client does not have a client ID, the client sends EXCHANGE_ID If the client does not have a client ID, the client sends EXCHANGE_ID
to establish a client ID. If it opts for SP4_MACH_CRED or SP4_SSV to establish a client ID. If it opts for SP4_MACH_CRED or SP4_SSV
protection, in the spo_must_enforce list of operations, it SHOULD at protection, in the spo_must_enforce list of operations, it SHOULD at
minimum specify: CREATE_SESSION, DESTROY_SESSION, minimum specify: CREATE_SESSION, DESTROY_SESSION,
BIND_CONN_TO_SESSION, BACKCHANNEL_CTL, and DESTROY_CLIENTID. If it opts BIND_CONN_TO_SESSION, BACKCHANNEL_CTL, and DESTROY_CLIENTID. If it opts
for SP4_SSV protection, the client needs to ask for SSV-based for SP4_SSV protection, the client needs to ask for SSV-based
RPCSEC_GSS handles. RPCSEC_GSS handles.
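The spo_must_enforce list is carried as a bitmap4 of operation numbers. The sketch below builds the minimum recommended set. Two things in it are assumptions for this draft revision. First, the operation numbers are the ones assigned in the published NFSv4.1 specification (RFC 5661). Second, the sketch assumes the usual bitmap4 convention: bit n lives in word n / 32 at position n % 32. The helper name is hypothetical.

```python
# Operation numbers as assigned in RFC 5661 (assumed for this draft).
OP_BACKCHANNEL_CTL = 40
OP_BIND_CONN_TO_SESSION = 41
OP_CREATE_SESSION = 43
OP_DESTROY_SESSION = 44
OP_DESTROY_CLIENTID = 57

def ops_to_bitmap4(ops):
    """Encode a set of operation numbers as a list of uint32 bitmap4 words."""
    if not ops:
        return []
    words = [0] * (max(ops) // 32 + 1)
    for op in ops:
        words[op // 32] |= 1 << (op % 32)   # bit n -> word n//32, bit n%32
    return words

MIN_MUST_ENFORCE = ops_to_bitmap4({
    OP_CREATE_SESSION, OP_DESTROY_SESSION, OP_BIND_CONN_TO_SESSION,
    OP_BACKCHANNEL_CTL, OP_DESTROY_CLIENTID,
})
```

All five operations fall in the second word, so MIN_MUST_ENFORCE evaluates to [0, 0x02001B00].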
skipping to change at page 72, line 18 skipping to change at page 72, line 51
callback use have expired, the client MUST establish a new context callback use have expired, the client MUST establish a new context
via BACKCHANNEL_CTL. The sr_status_flags field of the SEQUENCE via BACKCHANNEL_CTL. The sr_status_flags field of the SEQUENCE
results indicates when callback contexts are nearly expired, or fully results indicates when callback contexts are nearly expired, or fully
expired (see Section 18.46.3). expired (see Section 18.46.3).
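A client that requires RPCSEC_GSS callbacks might react to these sr_status_flags bits as sketched below. Two things here are assumptions. The flag values are taken from the published NFSv4.1 specification (RFC 5661) and are assumed for this draft. The function name and its "now"/"soon" results are hypothetical labels for "send BACKCHANNEL_CTL immediately" and "refresh before the last context lapses".

```python
SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING = 0x00000002   # value per RFC 5661
SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED = 0x00000004    # value per RFC 5661

def cb_context_action(sr_status_flags, wants_gss_callbacks):
    """Decide how urgently to hand the server a new callback GSS context."""
    if not wants_gss_callbacks:
        return None
    if sr_status_flags & SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED:
        return "now"    # all callback contexts expired: BACKCHANNEL_CTL is required
    if sr_status_flags & SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING:
        return "soon"   # last context nearly expired: refresh proactively
    return None
```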
2.10.10.1.2. Connection Loss 2.10.10.1.2. Connection Loss
If the client loses the last connection of the session, and if it wants If the client loses the last connection of the session, and if it wants
to retain the session, then it must create a new connection, and if, to retain the session, then it must create a new connection, and if,
when the client ID was created, BIND_CONN_TO_SESSION was specified in when the client ID was created, BIND_CONN_TO_SESSION was specified in
the spo_must_enforce list, the client MUST use BIND_CONNN_TO_SESSION the spo_must_enforce list, the client MUST use BIND_CONN_TO_SESSION
to associate the connection with the session. to associate the connection with the session.
If there was a request outstanding at the time of connection If there was a request outstanding at the time of connection
loss, then if the client wants to continue to use the session it MUST loss, then if the client wants to continue to use the session it MUST
retry the request, as described in Section 2.10.5.2. Note that it is retry the request, as described in Section 2.10.5.2. Note that it is
not necessary to retry requests over a connection with the same not necessary to retry requests over a connection with the same
source network address or the same destination network address as the source network address or the same destination network address as the
lost connection. As long as the sessionid, slot id, and sequence id lost connection. As long as the sessionid, slot id, and sequence id
in the retry match that of the original request, the server will in the retry match that of the original request, the server will
recognize the request as a retry if it executed the request prior to recognize the request as a retry if it executed the request prior to
skipping to change at page 75, line 8 skipping to change at page 75, line 36
sr_status_flags field of every SEQUENCE reply until the backchannel sr_status_flags field of every SEQUENCE reply until the backchannel
is reestablished. There are two situations, each of which uses is reestablished. There are two situations, each of which uses
different status flags: no connectivity for the session's different status flags: no connectivity for the session's
backchannel, and no connectivity for any session backchannel of the backchannel, and no connectivity for any session backchannel of the
client. See Section 18.46 for a description of the appropriate flags client. See Section 18.46 for a description of the appropriate flags
in sr_status_flags. in sr_status_flags.
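The two backchannel situations can be sketched as a server-side flag computation. Two things are assumptions. The flag values and meanings follow the published NFSv4.1 specification (RFC 5661), where CB_PATH_DOWN_SESSION means this session has no working backchannel and CB_PATH_DOWN means no session of the client ID has one. The backchannel_conns mapping is hypothetical server bookkeeping.

```python
SEQ4_STATUS_CB_PATH_DOWN = 0x00000001           # value per RFC 5661
SEQ4_STATUS_CB_PATH_DOWN_SESSION = 0x00000200   # value per RFC 5661

def backchannel_flags(this_session_id, backchannel_conns):
    """backchannel_conns maps each session id of the client ID to the number
    of live connections bound to that session's backchannel."""
    flags = 0
    if backchannel_conns.get(this_session_id, 0) == 0:
        flags |= SEQ4_STATUS_CB_PATH_DOWN_SESSION   # this session has no backchannel
    if all(n == 0 for n in backchannel_conns.values()):
        flags |= SEQ4_STATUS_CB_PATH_DOWN           # no session of the client has one
    return flags
```

The server would OR this value into sr_status_flags on every SEQUENCE reply until connectivity returns.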
2.10.10.2.5. GSS Context Loss 2.10.10.2.5. GSS Context Loss
The server SHOULD monitor when the number of RPCSEC_GSS contexts The server SHOULD monitor when the number of RPCSEC_GSS contexts
assigned to the backchannel reaches one, and that one context is near assigned to the backchannel reaches one, and when that one context is
expiry (i.e. between one and two periods of lease time), and indicate near expiry (i.e. between one and two periods of lease time),
so in the sr_status_flags field of all SEQUENCE replies. The server indicate so in the sr_status_flags field of all SEQUENCE replies.
MUST indicate when all of the backchannel's assigned RPCSEC_GSS The server MUST indicate when all of the backchannel's assigned
contexts have expired in the sr_status_flags field of all SEQUENCE RPCSEC_GSS contexts have expired in the sr_status_flags field of all
replies. SEQUENCE replies.
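The monitoring described above can be sketched as follows. Several points are assumptions or interpretations:
- The flag values are taken from RFC 5661.
- The expiry_times representation is hypothetical.
- "Number of contexts" is read as the number of unexpired contexts.
- "Near expiry" is read as at most two lease periods of remaining lifetime. A context already inside its final lease period is therefore flagged as well.

```python
SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING = 0x00000002   # value per RFC 5661
SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED = 0x00000004    # value per RFC 5661

def gss_context_flags(expiry_times, now, lease_time):
    """expiry_times: absolute expiry time of each RPCSEC_GSS context
    assigned to the backchannel (empty if none were assigned)."""
    live = [t for t in expiry_times if t > now]
    if expiry_times and not live:
        return SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED    # MUST be reported
    if len(live) == 1 and live[0] - now <= 2 * lease_time:
        return SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING   # SHOULD be reported
    return 0
```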
2.10.11. Parallel NFS and Sessions 2.10.11. Parallel NFS and Sessions
A client and server can potentially be a non-pNFS implementation, a A client and server can potentially be a non-pNFS implementation, a
metadata server implementation, a data server implementation, or two metadata server implementation, a data server implementation, or two
or three types of implementations. The EXCHGID4_FLAG_USE_NON_PNFS, or three types of implementations. The EXCHGID4_FLAG_USE_NON_PNFS,
EXCHGID4_FLAG_USE_PNFS_MDS, and EXCHGID4_FLAG_USE_PNFS_DS flags (not EXCHGID4_FLAG_USE_PNFS_MDS, and EXCHGID4_FLAG_USE_PNFS_DS flags (not
mutually exclusive) are passed in the EXCHANGE_ID arguments and mutually exclusive) are passed in the EXCHANGE_ID arguments and
results to allow the client to indicate how it wants to use sessions results to allow the client to indicate how it wants to use sessions
created under the client ID, and to allow the server to indicate how created under the client ID, and to allow the server to indicate how
skipping to change at page 76, line 32 skipping to change at page 77, line 13
integer. integer.
o NFS4_MAXFILELEN is the maximum length of a regular file. o NFS4_MAXFILELEN is the maximum length of a regular file.
o NFS4_MAXFILEOFF is the maximum offset into a regular file. o NFS4_MAXFILEOFF is the maximum offset into a regular file.
3.2. Basic Data Types 3.2. Basic Data Types
These are the base NFSv4.1 data types. These are the base NFSv4.1 data types.
+----------------------+--------------------------------------------+ +---------------+---------------------------------------------------+
| Data Type | Definition | | Data Type | Definition |
+----------------------+--------------------------------------------+ +---------------+---------------------------------------------------+
| int32_t | typedef int int32_t; | | int32_t | typedef int int32_t; |
| uint32_t | typedef unsigned int uint32_t; | | uint32_t | typedef unsigned int uint32_t; |
| int64_t | typedef hyper int64_t; | | int64_t | typedef hyper int64_t; |
| uint64_t | typedef unsigned hyper uint64_t; | | uint64_t | typedef unsigned hyper uint64_t; |
| attrlist4<> | typedef opaque attrlist4<>; | | attrlist4 | typedef opaque attrlist4<>; |
| | Used for file/directory attributes | | | Used for file/directory attributes |
| bitmap4<> | typedef uint32_t bitmap4<>; | | bitmap4 | typedef uint32_t bitmap4<>; |
| | Used in attribute array encoding. | | | Used in attribute array encoding. |
| changeid4 | typedef uint64_t changeid4; | | changeid4 | typedef uint64_t changeid4; |
| | Used in definition of change_info | | | Used in definition of change_info |
| clientid4 | typedef uint64_t clientid4; | | clientid4 | typedef uint64_t clientid4; |
| | Shorthand reference to client | | | Shorthand reference to client identification |
| | identification |
| count4 | typedef uint32_t count4; | | count4 | typedef uint32_t count4; |
| | Various count parameters (READ, WRITE, | | | Various count parameters (READ, WRITE, COMMIT) |
| | COMMIT) |
| length4 | typedef uint64_t length4; | | length4 | typedef uint64_t length4; |
| | Describes LOCK lengths | | | Describes LOCK lengths |
| mode4 | typedef uint32_t mode4; | | mode4 | typedef uint32_t mode4; |
| | Mode attribute data type | | | Mode attribute data type |
| nfs_cookie4 | typedef uint64_t nfs_cookie4; | | nfs_cookie4 | typedef uint64_t nfs_cookie4; |
| | Opaque cookie value for READDIR | | | Opaque cookie value for READDIR |
| nfs_fh4<NFS4_FHSIZE> | typedef opaque nfs_fh4<NFS4_FHSIZE>; | | nfs_fh4 | typedef opaque nfs_fh4<NFS4_FHSIZE>; |
| | Filehandle definition | | | Filehandle definition |
| nfs_ftype4 | enum nfs_ftype4; | | nfs_ftype4 | enum nfs_ftype4; |
| | Various defined file types | | | Various defined file types |
| nfsstat4 | enum nfsstat4; | | nfsstat4 | enum nfsstat4; |
| | Return value for operations | | | Return value for operations |
| offset4 | typedef uint64_t offset4; | | offset4 | typedef uint64_t offset4; |
| | Various offset designations (READ, WRITE, | | | Various offset designations (READ, WRITE, LOCK, |
| | LOCK, COMMIT) | | | COMMIT) |
| qop4 | typedef uint32_t qop4; | | qop4 | typedef uint32_t qop4; |
| | Quality of protection designation in | | | Quality of protection designation in SECINFO |
| | SECINFO | | sec_oid4 | typedef opaque sec_oid4<>; |
| sec_oid4<> | typedef opaque sec_oid4<>; | | | Security Object Identifier. The sec_oid4 data |
| | Security Object Identifier The sec_oid4 | | | type is not really opaque. Instead it contains an |
| | data type is not really opaque. Instead it | | | ASN.1 OBJECT IDENTIFIER as used by GSS-API in the |
| | contains an ASN.1 OBJECT IDENTIFIER as | | | mech_type argument to GSS_Init_sec_context. See |
| | used by GSS-API in the mech_type argument | | | [7] for details. |
| | to GSS_Init_sec_context. See [7] for |
| | details. |
| sequenceid4 | typedef uint32_t sequenceid4; | | sequenceid4 | typedef uint32_t sequenceid4; |
| | sequence number used for various session | | | sequence number used for various session |
| | operations (EXCHANGE_ID, CREATE_SESSION, | | | operations (EXCHANGE_ID, CREATE_SESSION, |
| | SEQUENCE, CB_SEQUENCE). | | | SEQUENCE, CB_SEQUENCE). |
| seqid4 | typedef uint32_t seqid4; | | seqid4 | typedef uint32_t seqid4; |
| | Sequence identifier used for file locking | | | Sequence identifier used for file locking |
| sessionid4 | typedef opaque | | sessionid4 | typedef opaque sessionid4[NFS4_SESSIONID_SIZE]; |
| | sessionid4[NFS4_SESSIONID_SIZE]; |
| | Session identifier | | | Session identifier |
| slotid4 | typedef uint32_t slotid4; | | slotid4 | typedef uint32_t slotid4; |
| | sequencing artifact for various session | | | sequencing artifact for various session |
| | operations (SEQUENCE, CB_SEQUENCE). | | | operations (SEQUENCE, CB_SEQUENCE). |
| utf8string<> | typedef opaque utf8string<>; | | utf8string | typedef opaque utf8string<>; |
| | UTF-8 encoding for strings | | | UTF-8 encoding for strings |
| utf8str_cis | typedef utf8string utf8str_cis; | | utf8str_cis | typedef utf8string utf8str_cis; |
| | Case-insensitive UTF-8 string | | | Case-insensitive UTF-8 string |
| utf8str_cs | typedef utf8string utf8str_cs; | | utf8str_cs | typedef utf8string utf8str_cs; |
| | Case-sensitive UTF-8 string | | | Case-sensitive UTF-8 string |
| utf8str_mixed | typedef utf8string utf8str_mixed; | | utf8str_mixed | typedef utf8string utf8str_mixed; |
| | UTF-8 strings with a case sensitive prefix | | | UTF-8 strings with a case sensitive prefix and a |
| | and a case insensitive suffix. | | | case insensitive suffix. |
| component4 | typedef utf8str_cs component4; | | component4 | typedef utf8str_cs component4; |
| | Represents path name components | | | Represents path name components |
| linktext4 | typedef utf8str_cs linktext4; | | linktext4 | typedef utf8str_cs linktext4; |
| | Symbolic link contents | | | Symbolic link contents |
| pathname4<> | typedef component4 pathname4<>; | | pathname4 | typedef component4 pathname4<>; |
| | Represents path name for fs_locations | | | Represents path name for fs_locations |
| verifier4 | typedef opaque | | verifier4 | typedef opaque verifier4[NFS4_VERIFIER_SIZE]; |
| | verifier4[NFS4_VERIFIER_SIZE]; | | | Verifier used for various operations (COMMIT, |
| | Verifier used for various operations | | | CREATE, EXCHANGE_ID, OPEN, READDIR, WRITE) |
| | (COMMIT, CREATE, EXCHANGE_ID, OPEN, | | | NFS4_VERIFIER_SIZE is defined as 8. |
| | READDIR, WRITE) NFS4_VERIFIER_SIZE is | +---------------+---------------------------------------------------+
| | defined as 8. |
+----------------------+--------------------------------------------+
End of Base Data Types End of Base Data Types
Table 1 Table 1
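On the wire, the base types in Table 1 are encoded with XDR (RFC 4506). The sketch below uses only the standard library and shows the encodings most of them reduce to:
- int and unsigned int as 4 big-endian bytes.
- hyper as 8 big-endian bytes.
- Variable-length opaque<> as a 4-byte length followed by the data, zero-padded to a multiple of 4.
- Fixed-length opaque[N] as the padded data with no length word.
- uint32_t arrays such as bitmap4<> as a count followed by the words.

The helper names are hypothetical.

```python
import struct

def xdr_uint32(v):            # uint32_t, count4, seqid4, slotid4, ...
    return struct.pack(">I", v)

def xdr_uint64(v):            # unsigned hyper: changeid4, clientid4, offset4, ...
    return struct.pack(">Q", v)

def _pad(data):               # zero-fill to the next 4-byte boundary
    return data + b"\0" * (-len(data) % 4)

def xdr_opaque_var(data):     # opaque<>: attrlist4, utf8string, sec_oid4, ...
    return xdr_uint32(len(data)) + _pad(data)

def xdr_opaque_fixed(data, size):   # opaque[N]: verifier4, sessionid4
    if len(data) != size:
        raise ValueError("fixed-length opaque must be exactly %d bytes" % size)
    return _pad(data)

def xdr_bitmap4(words):       # uint32_t bitmap4<>: element count, then words
    return xdr_uint32(len(words)) + b"".join(xdr_uint32(w) for w in words)
```

For example, xdr_opaque_var(b"abc") yields a length word of 3, the three bytes, and one byte of padding.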
3.3. Structured Data Types 3.3. Structured Data Types
3.3.1. nfstime4 3.3.1. nfstime4
struct nfstime4 { struct nfstime4 {
skipping to change at page 80, line 50 skipping to change at page 81, line 30
/* see struct rpcb in RFC 1833 */ /* see struct rpcb in RFC 1833 */
string na_r_netid<>; /* network id */ string na_r_netid<>; /* network id */
string na_r_addr<>; /* universal address */ string na_r_addr<>; /* universal address */
}; };
The netaddr4 structure is used to identify TCP/IP based endpoints. The netaddr4 structure is used to identify TCP/IP based endpoints.
The r_netid and r_addr fields are specified in RFC1833 [26], but they The r_netid and r_addr fields are specified in RFC1833 [26], but they
are underspecified in RFC1833 [26] as far as what they should look are underspecified in RFC1833 [26] as far as what they should look
like for specific protocols. like for specific protocols.
3.3.9.1. Format of netaddr4 for TCP and UDP over IPv4
For TCP over IPv4 and for UDP over IPv4, the format of r_addr is the For TCP over IPv4 and for UDP over IPv4, the format of r_addr is the
US-ASCII string: US-ASCII string:
h1.h2.h3.h4.p1.p2 h1.h2.h3.h4.p1.p2
The prefix, "h1.h2.h3.h4", is the standard textual form for The prefix, "h1.h2.h3.h4", is the standard textual form for
representing an IPv4 address, which is always four bytes long. representing an IPv4 address, which is always four bytes long.
Assuming big-endian ordering, h1, h2, h3, and h4, are respectively, Assuming big-endian ordering, h1, h2, h3, and h4, are respectively,
the first through fourth bytes each converted to ASCII-decimal. the first through fourth bytes each converted to ASCII-decimal.
Assuming big-endian ordering, p1 and p2 are, respectively, the first Assuming big-endian ordering, p1 and p2 are, respectively, the first
skipping to change at page 81, line 23 skipping to change at page 82, line 5
host, in big-endian order, has an address of 0x0A010307 and there is host, in big-endian order, has an address of 0x0A010307 and there is
a service listening on, in big endian order, port 0x020F (decimal a service listening on, in big endian order, port 0x020F (decimal
527), then the complete universal address is "10.1.3.7.2.15". 527), then the complete universal address is "10.1.3.7.2.15".
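The IPv4 universal address rules can be sketched with Python's standard ipaddress module. The host part is the dotted-decimal address. The port contributes p1 = port >> 8 and p2 = port & 0xFF. The function names are hypothetical.

```python
import ipaddress

def uaddr_v4(host, port):
    """Build an r_addr string "h1.h2.h3.h4.p1.p2" for TCP/UDP over IPv4."""
    addr = ipaddress.IPv4Address(host)   # accepts a string or a 32-bit integer
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port out of range")
    return "%s.%d.%d" % (addr, port >> 8, port & 0xFF)

def parse_uaddr_v4(uaddr):
    """Split an IPv4 universal address back into (host, port)."""
    parts = uaddr.split(".")
    if len(parts) != 6:
        raise ValueError("expected six dot-separated fields")
    host = str(ipaddress.IPv4Address(".".join(parts[:4])))
    p1, p2 = int(parts[4]), int(parts[5])
    if not (0 <= p1 <= 255 and 0 <= p2 <= 255):
        raise ValueError("port bytes out of range")
    return host, (p1 << 8) | p2
```

With the example above, uaddr_v4(0x0A010307, 0x020F) produces "10.1.3.7.2.15", and parsing that string returns ("10.1.3.7", 527).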
For TCP over IPv4 the value of r_netid is the string "tcp". For UDP For TCP over IPv4 the value of r_netid is the string "tcp". For UDP
over IPv4 the value of r_netid is the string "udp". That this over IPv4 the value of r_netid is the string "udp". That this
document specifies the universal address and netid for UDP/IPv4 does document specifies the universal address and netid for UDP/IPv4 does
not imply that UDP/IPv4 is a legal transport for NFSv4.1 (see not imply that UDP/IPv4 is a legal transport for NFSv4.1 (see
Section 2.9). Section 2.9).
3.3.9.2. Format of netaddr4 for TCP and UDP over IPv6
For TCP over IPv6 and for UDP over IPv6, the format of r_addr is the For TCP over IPv6 and for UDP over IPv6, the format of r_addr is the
US-ASCII string: US-ASCII string:
x1:x2:x3:x4:x5:x6:x7:x8.p1.p2 x1:x2:x3:x4:x5:x6:x7:x8.p1.p2
The suffix "p1.p2" is the service port, and is computed the same way The suffix "p1.p2" is the service port, and is computed the same way
as with universal addresses for TCP and UDP over IPv4. The prefix, as with universal addresses for TCP and UDP over IPv4. The prefix,
"x1:x2:x3:x4:x5:x6:x7:x8", is the standard textual form for "x1:x2:x3:x4:x5:x6:x7:x8", is the preferred textual form for
representing an IPv6 address as defined in Section 2.2 of RFC2373 representing an IPv6 address as defined in Section 2.2 of RFC3513
[13]. Additionally, the two alternative forms specified in Section [13]. Additionally, the two alternative forms specified in Section
2.2 of RFC2373 [13] are also acceptable. 2.2 of RFC3513 are also acceptable.
For TCP over IPv6 the value of r_netid is the string "tcp6". For UDP For TCP over IPv6 the value of r_netid is the string "tcp6". For UDP
over IPv6 the value of r_netid is the string "udp6". That this over IPv6 the value of r_netid is the string "udp6". That this
document specifies the universal address and netid for UDP/IPv6 does document specifies the universal address and netid for UDP/IPv6 does
not imply that UDP/IPv6 is a legal transport for NFSv4.1 (see not imply that UDP/IPv6 is a legal transport for NFSv4.1 (see
Section 2.9). Section 2.9).
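An IPv6 universal address can be handled the same way as the IPv4 sketch. The port suffix is computed identically. When parsing, splitting on the last two dots keeps an embedded IPv4 tail (one of the alternative textual forms) inside the address. str() of an IPv6Address gives the compressed "::" form, which relies on the alternative forms being acceptable. The function names are hypothetical.

```python
import ipaddress

def uaddr_v6(host, port):
    """Build an r_addr string for TCP/UDP over IPv6: address, then ".p1.p2"."""
    addr = ipaddress.IPv6Address(host)
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port out of range")
    return "%s.%d.%d" % (addr, port >> 8, port & 0xFF)

def parse_uaddr_v6(uaddr):
    """Recover (IPv6Address, port) from an IPv6 universal address."""
    host_text, p1, p2 = uaddr.rsplit(".", 2)   # only the last two dots are port
    p1, p2 = int(p1), int(p2)
    if not (0 <= p1 <= 255 and 0 <= p2 <= 255):
        raise ValueError("port bytes out of range")
    return ipaddress.IPv6Address(host_text), (p1 << 8) | p2
```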
3.3.10. state_owner4 3.3.10. state_owner4
struct state_owner4 { struct state_owner4 {
skipping to change at page 101, line 51 skipping to change at page 102, line 51
5.7.2.4. Attribute 17: case_preserving 5.7.2.4. Attribute 17: case_preserving
True, if filename case on this file system is preserved. True, if filename case on this file system is preserved.
5.7.2.5. Attribute 60: change_policy 5.7.2.5. Attribute 60: change_policy
A value created by the server that the client can use to determine if A value created by the server that the client can use to determine if
some server policy related to the current file system has been some server policy related to the current file system has been
subject to change. If the value remains the same then the client can subject to change. If the value remains the same then the client can
be sure that the values of the attributes related to fs location and be sure that the values of the attributes related to fs location and
the fsstat_type field of the fs_status attribute have not changed. the fss_type field of the fs_status attribute have not changed. On
On the other hand, a change in this value does not necessarily imply a the other hand, a change in this value does not necessarily imply a
change in policy. It is up to the client to interrogate the server change in policy. It is up to the client to interrogate the server
to determine if some policy relevant to it has changed. See to determine if some policy relevant to it has changed. See
Section 3.3.6 for details. Section 3.3.6 for details.
This attribute MUST change when the value returned by the This attribute MUST change when the value returned by the
fs_locations or fs_locations_info attribute changes, when a file fs_locations or fs_locations_info attribute changes, when a file
system goes from read-only to writable or vice versa, or when the system goes from read-only to writable or vice versa, or when the
allowable set of security flavors for the file system or any part allowable set of security flavors for the file system or any part
thereof is changed. thereof is changed.
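A client might use change_policy as a cheap validator for its cached policy-related attributes, as sketched below. Two things are hypothetical: the FsPolicyCache class and the fetch callback, which stands in for whatever GETATTR the client issues. The sketch relies on the guarantee that an unchanged value means the cached attributes are still valid. A changed value only triggers re-interrogation; it is not treated as proof that policy changed.

```python
class FsPolicyCache:
    """Hypothetical client cache of policy-related attributes for one file system."""

    def __init__(self):
        self.change_policy = None
        self.cached = None

    def lookup(self, current_change_policy, fetch):
        """Return cached attributes, calling fetch() only when change_policy moved."""
        if self.cached is None or current_change_policy != self.change_policy:
            self.cached = fetch()          # re-interrogate the server
            self.change_policy = current_change_policy
        return self.cached
```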
skipping to change at page 130, line 39 skipping to change at page 131, line 39
6.2.1.5.1. Discussion of EVERYONE@ 6.2.1.5.1. Discussion of EVERYONE@
It is important to note that "EVERYONE@" is not equivalent to the It is important to note that "EVERYONE@" is not equivalent to the
UNIX "other" entity. This is because, by definition, UNIX "other" UNIX "other" entity. This is because, by definition, UNIX "other"
does not include the owner or owning group of a file. "EVERYONE@" does not include the owner or owning group of a file. "EVERYONE@"
means literally everyone, including the owner or owning group. means literally everyone, including the owner or owning group.
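The difference can be illustrated with a small sketch. It contrasts the set of special ACE "who" values that match a requester with the single UNIX permission class that applies to the same requester. It is simplified and hypothetical: named users and groups are ignored, and gids is the requester's group set.

```python
def matching_special_whos(uid, gids, owner_uid, owner_gid):
    """Special ACE who values that match the requester."""
    whos = {"EVERYONE@"}          # always matches, owner and owning group included
    if uid == owner_uid:
        whos.add("OWNER@")
    if owner_gid in gids:
        whos.add("GROUP@")
    return whos

def unix_class(uid, gids, owner_uid, owner_gid):
    """The single UNIX permission class applied to the requester."""
    if uid == owner_uid:
        return "user"
    if owner_gid in gids:
        return "group"
    return "other"               # excludes the owner and owning group by definition
```

For the file's owner, EVERYONE@ still matches, while the UNIX "other" class does not apply.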
6.2.2. Attribute 58: dacl 6.2.2. Attribute 58: dacl
The dacl, and sacl, attributes are like the acl attribute, but dacl The dacl attribute is like the acl attribute, but dacl allows just
and sacl each allow only certain types of ACEs. The dacl attribute ALLOW and DENY ACEs. The dacl attribute supports automatic
allows just ALLOW and DENY ACEs. The dacl and sacl attributes also inheritance (see Section 6.4.3.2).
support automatic inheritance (see Section 6.4.3.2).
6.2.3. Attribute 59: sacl 6.2.3. Attribute 59: sacl
The sacl, and dacl, attributes are like the acl attribute, but dacl The sacl attribute is like the acl attribute, but sacl allows just
and sacl each allow only certain types of ACEs. The sacl attribute AUDIT and ALARM ACEs. The sacl attribute supports automatic
allows just AUDIT and ALARM ACEs. The dacl and sacl attributes also inheritance (see Section 6.4.3.2).
support automatic inheritance (see Section 6.4.3.2).
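The ACE-type restrictions on dacl and sacl can be sketched as a partition of a full ACL. Two things are assumptions. The ACE type constants are the NFSv4 values (ALLOWED 0, DENIED 1, AUDIT 2, ALARM 3). The (type, flag, mask, who) tuple form is hypothetical. Order is preserved because ACL evaluation is order-sensitive. The automatic-inheritance support mentioned above is not modeled.

```python
ACE4_ACCESS_ALLOWED_ACE_TYPE = 0
ACE4_ACCESS_DENIED_ACE_TYPE = 1
ACE4_SYSTEM_AUDIT_ACE_TYPE = 2
ACE4_SYSTEM_ALARM_ACE_TYPE = 3

DACL_TYPES = {ACE4_ACCESS_ALLOWED_ACE_TYPE, ACE4_ACCESS_DENIED_ACE_TYPE}
SACL_TYPES = {ACE4_SYSTEM_AUDIT_ACE_TYPE, ACE4_SYSTEM_ALARM_ACE_TYPE}

def split_acl(aces):
    """Partition (type, flag, mask, who) ACEs into dacl and sacl views, in order."""
    dacl = [a for a in aces if a[0] in DACL_TYPES]
    sacl = [a for a in aces if a[0] in SACL_TYPES]
    return dacl, sacl

def acceptable(attr, aces):
    """True if every ACE has a type the named attribute ("dacl"/"sacl") allows."""
    allowed = DACL_TYPES if attr == "dacl" else SACL_TYPES
    return all(a[0] in allowed for a in aces)
```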
6.2.4. Attribute 33: mode 6.2.4. Attribute 33: mode
The NFSv4.1 mode attribute is based on the UNIX mode bits. The The NFSv4.1 mode attribute is based on the UNIX mode bits. The
following bits are defined: following bits are defined:
const MODE4_SUID = 0x800; /* set user id on execution */ const MODE4_SUID = 0x800; /* set user id on execution */
const MODE4_SGID = 0x400; /* set group id on execution */ const MODE4_SGID = 0x400; /* set group id on execution */
const MODE4_SVTX = 0x200; /* save text even after use */ const MODE4_SVTX = 0x200; /* save text even after use */
const MODE4_RUSR = 0x100; /* read permission: owner */ const MODE4_RUSR = 0x100; /* read permission: owner */
skipping to change at page 144, line 47 skipping to change at page 145, line 42
client is sending by directing the client to send it using weak client is sending by directing the client to send it using weak
security mechanisms. security mechanisms.
8. State Management 8. State Management
Integrating locking into the NFS protocol necessarily causes it to be Integrating locking into the NFS protocol necessarily causes it to be
stateful. With the inclusion of such features as share reservations, stateful. With the inclusion of such features as share reservations,
file and directory delegations, recallable layouts, and support for file and directory delegations, recallable layouts, and support for
mandatory record locking, the protocol becomes substantially more mandatory record locking, the protocol becomes substantially more
dependent on proper management of state than the traditional dependent on proper management of state than the traditional
combination of NFS and NLM [XNFS]. These features include expanded combination of NFS and NLM [36]. These features include expanded
locking facilities, which provide some measure of interclient locking facilities, which provide some measure of interclient
exclusion, but the state is also valuable in providing other useful exclusion, but the state is also valuable in providing other useful
features not readily providable using a stateless model. There are features not readily providable using a stateless model. There are
three components to making this state manageable: three components to making this state manageable:
o Clear division between client and server o Clear division between client and server
o Ability to reliably detect inconsistency in state between client o Ability to reliably detect inconsistency in state between client
and server and server
o Simple and robust recovery mechanisms o Simple and robust recovery mechanisms
In this model, the server owns the state information. The client In this model, the server owns the state information. The client
requests changes in locks and the server responds with the changes requests changes in locks and the server responds with the changes
made. Non-client-initiated changes in locking state are infrequent made. Non-client-initiated changes in locking state are infrequent
and the client receives prompt notification of them and can adjust and the client receives prompt notification of them and can adjust
its view of the locking state to reflect the server's changes. its view of the locking state to reflect the server's changes.
skipping to change at page 151, line 18 skipping to change at page 152, line 18
o An indication of the current status of the locks associated with o An indication of the current status of the locks associated with
this stateid. In particular, whether these have been revoked and this stateid. In particular, whether these have been revoked and
if so, for what reason. if so, for what reason.
With this information, an incoming stateid can be validated and With this information, an incoming stateid can be validated and
the appropriate error returned when necessary. Special and non- the appropriate error returned when necessary. Special and non-
special stateids are handled separately. (See Section 8.2.3 for a special stateids are handled separately. (See Section 8.2.3 for a
discussion of special stateids). discussion of special stateids).
Note that stateids are implicitly qualified by the current client ID, Note that stateids are implicitly qualified by the current client ID,
as derived the the client ID associated with the current session. as derived from the client ID associated with the current session.
Note however, that the semantics of the session will prevent stateids Note however, that the semantics of the session will prevent stateids
associated with a previous client or server instance from being associated with a previous client or server instance from being
analyzed by this procedure. analyzed by this procedure.
If server restart has resulted in an invalid client ID or a sessionid If server restart has resulted in an invalid client ID or a sessionid
which is invalid, SEQUENCE will return an error and the operation which is invalid, SEQUENCE will return an error and the operation
that takes a stateid as an argument will never be processed. that takes a stateid as an argument will never be processed.
If there has been a server restart where there is a persistent If there has been a server restart where there is a persistent
session, and all leased state has been lost, then the session in session, and all leased state has been lost, then the session in
skipping to change at page 151, line 51 skipping to change at page 152, line 51
NFS4ERR_BAD_STATEID is returned. NFS4ERR_BAD_STATEID is returned.
o If the special stateid is one designating the current stateid, and o If the special stateid is one designating the current stateid, and
there is a current stateid, then the current stateid is there is a current stateid, then the current stateid is
substituted for the special stateid and the checks appropriate to substituted for the special stateid and the checks appropriate to
non-special stateids are performed. non-special stateids are performed.
o If the combination is valid in general but is not appropriate to o If the combination is valid in general but is not appropriate to
the context in which the stateid is used (e.g. an all-zero stateid the context in which the stateid is used (e.g. an all-zero stateid
is used when an open stateid is required in a LOCK operation), the is used when an open stateid is required in a LOCK operation), the
the error NFS4ERR_BAD_STATEID is also returned. error NFS4ERR_BAD_STATEID is also returned.
o Otherwise, the check is completed and the special stateid is o Otherwise, the check is completed and the special stateid is
accepted as valid. accepted as valid.
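The special-stateid checks above can be sketched as follows. The four special combinations are assumptions for this draft revision. They are taken from the published NFSv4.1 specification (RFC 5661, Section 8.2.3):
- anonymous: "other" all zeros, seqid 0.
- READ bypass: all ones in both fields.
- current: "other" all zeros, seqid 1.
- invalid: "other" all zeros, seqid NFS4_UINT32_MAX.

The Stateid4 class, the status strings, and the helper names are hypothetical. The context check (for example, an all-zero stateid where LOCK needs an open stateid) is not modeled.

```python
from dataclasses import dataclass

NFS4_UINT32_MAX = 0xFFFFFFFF
ALL_ZEROS = bytes(12)
ALL_ONES = b"\xff" * 12

@dataclass(frozen=True)
class Stateid4:
    seqid: int     # uint32
    other: bytes   # 12 opaque bytes

def classify_special(sid):
    """Name the special stateid, "undefined" for an unlisted all-zero/all-one
    combination, or None when "other" is neither all zeros nor all ones."""
    if sid.other == ALL_ZEROS:
        return {0: "anonymous", 1: "current",
                NFS4_UINT32_MAX: "invalid"}.get(sid.seqid, "undefined")
    if sid.other == ALL_ONES:
        return "read-bypass" if sid.seqid == NFS4_UINT32_MAX else "undefined"
    return None

def resolve_special(sid, current):
    """Return (status, stateid to check next)."""
    kind = classify_special(sid)
    if kind in ("undefined", "invalid"):
        return "NFS4ERR_BAD_STATEID", None
    if kind == "current":
        if current is None:
            return "NFS4ERR_BAD_STATEID", None
        return "NFS4_OK", current   # substituted, then checked as non-special
    return "NFS4_OK", sid
```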
When a stateid is being tested, and the "other" field is neither all When a stateid is being tested, and the "other" field is neither all
zeros nor all ones, the following procedure could be used to validate zeros nor all ones, the following procedure could be used to validate
an incoming stateid and return an appropriate error, when necessary, an incoming stateid and return an appropriate error, when necessary,
assuming that the "other" field would be divided into a table index assuming that the "other" field would be divided into a table index
and an entry generation. and an entry generation.
skipping to change at page 154, line 12 skipping to change at page 155, line 12
reply cache) on an unexpired lease will result in the lease being reply cache) on an unexpired lease will result in the lease being
implicitly renewed, for the standard renewal period. implicitly renewed, for the standard renewal period.
If the client ID's lease has not expired when the server receives a If the client ID's lease has not expired when the server receives a
SEQUENCE operation, then the server MUST renew the lease. If the SEQUENCE operation, then the server MUST renew the lease. If the
client ID's lease has expired when the server receives a SEQUENCE client ID's lease has expired when the server receives a SEQUENCE
operation, the server MAY renew the lease; this depends on whether operation, the server MAY renew the lease; this depends on whether
any state was revoked as a result of the client's failure to renew any state was revoked as a result of the client's failure to renew
the lease before expiration. the lease before expiration.
Absent other activity that would renew the lease, a COMPOUND
consisting of a single SEQUENCE operation will suffice. The client
should also take communication-related delays into account and take
steps to ensure that the renewal messages actually reach the server
in good time. For example:
o When trunking is in effect, the client should consider issuing
multiple requests on different connections, in order to ensure
that renewal occurs, even in the event of blockage in the path
used for one of those connections.
o TCP retransmission delays might become so large as to approach or
exceed the length of the lease period. This may be particularly
likely when the server is unresponsive due to a reboot; see
Section 8.4.2.1.
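A client's decision to send a SEQUENCE-only COMPOUND can be sketched as a deadline check. The check reserves a budget for communication delays such as TCP retransmission. This is a hypothetical policy, not something the protocol specifies. The function name, delay_allowance, and the default safety_fraction of one half are illustrative choices. last_renew_sent is the client-clock send time of the most recent SEQUENCE that renewed the lease.

```python
def renewal_due(now, last_renew_sent, lease_time, delay_allowance,
                safety_fraction=0.5):
    """True if a SEQUENCE-only COMPOUND should be sent now to renew the lease."""
    deadline = last_renew_sent + lease_time
    # Renew partway through the lease, and never later than the deadline
    # minus the budget reserved for network and retransmission delays.
    target = min(last_renew_sent + safety_fraction * lease_time,
                 deadline - delay_allowance)
    return now >= target
```

When trunking is in effect, the same decision could trigger renewals on more than one connection.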
If the server renews the lease upon receiving a SEQUENCE operation, If the server renews the lease upon receiving a SEQUENCE operation,
the server MUST NOT allow the lease to expire while the rest of the the server MUST NOT allow the lease to expire while the rest of the
operations in the COMPOUND procedure's request are still executing. operations in the COMPOUND procedure's request are still executing.
Once the last operation has finished, and the response to COMPOUND Once the last operation has finished, and the response to COMPOUND
has been sent, the server MUST set the lease to expire no sooner than has been sent, the server MUST set the lease to expire no sooner than
the sum of current time and the value of the lease_time attribute. the sum of current time and the value of the lease_time attribute.
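The server-side rules in this paragraph can be sketched as a small lease record:
- Renew on SEQUENCE.
- Keep the lease from expiring while the COMPOUND executes.
- After the reply is sent, set expiry no sooner than the current time plus lease_time.

The ClientLease class is hypothetical. The MAY-renew case for an already-expired lease is left to the caller, because it depends on whether state was revoked.

```python
class ClientLease:
    """Hypothetical server-side lease record for one client ID."""

    def __init__(self, lease_time, now):
        self.lease_time = lease_time
        self.expires_at = now + lease_time
        self.in_flight = 0   # COMPOUNDs whose SEQUENCE renewed the lease

    def expired(self, now):
        # A lease pinned by an executing COMPOUND cannot expire.
        return self.in_flight == 0 and now >= self.expires_at

    def on_sequence(self, now):
        """Renew for an unexpired lease; False means it had already expired."""
        if self.expired(now):
            return False
        self.in_flight += 1
        return True

    def on_compound_replied(self, now):
        """Reply sent: expire no sooner than now + lease_time."""
        self.in_flight -= 1
        self.expires_at = max(self.expires_at, now + self.lease_time)
```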
A client ID's lease can expire when it has been at least the A client ID's lease can expire when it has been at least the
lease interval (lease_time) since the last lease-renewing SEQUENCE lease interval (lease_time) since the last lease-renewing SEQUENCE
operation was sent on any of the client ID's sessions and there must operation was sent on any of the client ID's sessions and there must
skipping to change at page 154, line 34 skipping to change at page 155, line 50
Because the SEQUENCE operation is the basic mechanism to renew a Because the SEQUENCE operation is the basic mechanism to renew a
lease, and because it must be done at least once for each lease lease, and because it must be done at least once for each lease
period, it is the natural mechanism whereby the server will inform period, it is the natural mechanism whereby the server will inform
the client of changes in the lease status that the client needs to be the client of changes in the lease status that the client needs to be
informed of. The client should inspect the status flags informed of. The client should inspect the status flags
(sr_status_flags) returned by SEQUENCE and take the appropriate (sr_status_flags) returned by SEQUENCE and take the appropriate
action. (See Section 18.46.3 for details). action. (See Section 18.46.3 for details).
o The status bits SEQ4_STATUS_CB_PATH_DOWN and o The status bits SEQ4_STATUS_CB_PATH_DOWN and
SEQ4_STATUS_CB_PATH_DOWN_SESSION indicate problems with the SEQ4_STATUS_CB_PATH_DOWN_SESSION indicate problems with the
backchannel which the the client may need to address in order to backchannel which the client may need to address in order to
receive callback requests. receive callback requests.
o The status bits SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING and o The status bits SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING and
SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED indicate actual problems with SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED indicate actual problems with
GSS contexts for the backchannel which the client may have to GSS contexts for the backchannel which the client may have to
address to allow callback requests to be sent to it. address to allow callback requests to be sent to it.
o The status bits SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED, o The status bits SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED,
SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED, SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED,
SEQ4_STATUS_ADMIN_STATE_REVOKED, and SEQ4_STATUS_ADMIN_STATE_REVOKED, and
skipping to change at page 155, line 24 skipping to change at page 156, line 41
A critical requirement in crash recovery is that both the client and A critical requirement in crash recovery is that both the client and
the server know when the other has failed. Additionally, it is the server know when the other has failed. Additionally, it is
required that a client sees a consistent view of data across server required that a client sees a consistent view of data across server
restarts or reboots. All READ and WRITE operations that may have restarts or reboots. All READ and WRITE operations that may have
been queued within the client or network buffers must wait until the been queued within the client or network buffers must wait until the
client has successfully recovered the locks protecting the READ and client has successfully recovered the locks protecting the READ and
WRITE operations. Any that reach the server before the server can WRITE operations. Any that reach the server before the server can
safely determine that the client has recovered enough locking state safely determine that the client has recovered enough locking state
to be sure that such operations can be safely processed must be to be sure that such operations can be safely processed must be
rejected, either because the state presented is no longer valid rejected. This will happen because either:
(NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID) or because
subsequent recovery of locks may make execution of the operation o The state presented is no longer valid since it is associated with
a now invalid clientid. In this case the client will receive
either an NFS4ERR_BADSESSION or NFS4ERR_DEADSESSION error, and any
attempt to attach a new session to the existing clientid will
encounter an NFS4ERR_STALE_CLIENTID error.
o Subsequent recovery of locks may make execution of the operation
inappropriate (NFS4ERR_GRACE). inappropriate (NFS4ERR_GRACE).
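The client-side consequences described in the newer text can be summarized as a mapping from error to recovery step. This is a minimal sketch. The enum members are local stand-ins for the protocol errors named above, with the real numeric values coming from the protocol XDR, and the action names are hypothetical.

```c
#include <assert.h>

/* Local stand-ins for the error codes named in the text. */
enum nfs_err {
    ERR_OK,
    ERR_BADSESSION,
    ERR_DEADSESSION,
    ERR_STALE_CLIENTID,
    ERR_GRACE,
    ERR_OTHER
};

/* Hypothetical recovery steps a client might take. */
enum recovery_action {
    ACT_NONE,
    ACT_NEW_SESSION,   /* try to attach a new session to the existing client ID */
    ACT_NEW_CLIENTID,  /* establish a new client ID, then reclaim locks */
    ACT_RETRY_LATER,   /* server is in its grace period */
    ACT_REPORT         /* not a restart-related error */
};

enum recovery_action classify_error(enum nfs_err err)
{
    switch (err) {
    case ERR_OK:
        return ACT_NONE;
    case ERR_BADSESSION:
    case ERR_DEADSESSION:
        return ACT_NEW_SESSION;
    case ERR_STALE_CLIENTID:
        return ACT_NEW_CLIENTID;
    case ERR_GRACE:
        return ACT_RETRY_LATER;
    default:
        return ACT_REPORT;
    }
}
```

In this sketch, a session error first leads to a new session on the same client ID. Only when that attempt reports NFS4ERR_STALE_CLIENTID does the client start over with a new client ID and reclaim its locks.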
8.4.1. Client Failure and Recovery 8.4.1. Client Failure and Recovery
In the event that a client fails, the server may release the client's In the event that a client fails, the server may release the client's
locks when the associated lease has expired. Conflicting locks from locks when the associated lease has expired. Conflicting locks from
another client may only be granted after this lease expiration. As another client may only be granted after this lease expiration. As
discussed in Section 8.3, when a client has not failed and re- discussed in Section 8.3, when a client has not failed and re-
establishes his lease before expiration occurs, requests for establishes his lease before expiration occurs, requests for
conflicting locks will not be granted. conflicting locks will not be granted.
skipping to change at page 157, line 16 skipping to change at page 158, line 38
example, CREATE_SESSION, DESTROY_SESSION) returns example, CREATE_SESSION, DESTROY_SESSION) returns
NFS4ERR_STALE_CLIENTID. The client MUST establish a new client NFS4ERR_STALE_CLIENTID. The client MUST establish a new client
ID (Section 8.1) and re-establish its lock state ID (Section 8.1) and re-establish its lock state
(Section 8.4.2.1). (Section 8.4.2.1).
8.4.2.1. State Reclaim 8.4.2.1. State Reclaim
When state information and the associated locks are lost as a result When state information and the associated locks are lost as a result
of a server reboot, the protocol must provide a way to cause that of a server reboot, the protocol must provide a way to cause that
state to be re-established. The approach used is to define, for most state to be re-established. The approach used is to define, for most
type of locking state (layouts are an exception), a request whose types of locking state (layouts are an exception), a request whose
function is to allow the client to re-establish on the server a lock function is to allow the client to re-establish on the server a lock
first obtained from a previous instance. Generally these requests first obtained from a previous instance. Generally these requests
are variants of the requests normally used to create locks of that are variants of the requests normally used to create locks of that
type and are referred to as "reclaim-type" requests and the process type and are referred to as "reclaim-type" requests and the process
of re-establishing such locks is referred to as "reclaiming" them. of re-establishing such locks is referred to as "reclaiming" them.
Because each client must have an opportunity to reclaim all of the Because each client must have an opportunity to reclaim all of the
locks that it has without the possibility that some other client will locks that it has without the possibility that some other client will
be granted a conflicting lock, a special period called the "grace be granted a conflicting lock, a special period called the "grace
period" is devoted to the reclaim process. During this period, period" is devoted to the reclaim process. During this period,
requests creating client IDs and sessions are handled normally, but requests creating client IDs and sessions are handled normally, but
locking requests are subject to special restrictions. Only reclaim- locking requests are subject to special restrictions. Only reclaim-
type locking requests are allowed, unless the server is able to type locking requests are allowed, unless the server is able to
reliably determine (through state persistently maintained across reliably determine (through state persistently maintained across
reboot instances), that granting any such lock cannot possibly reboot instances), that granting any such lock cannot possibly
conflict with a subsequent reclaim. When a request is made to obtain conflict with a subsequent reclaim. When a request is made to obtain
a new lock (i.e. not a reclaim-type request) during the grace period a new lock (i.e. not a reclaim-type request) during the grace period
and such a determination cannot be made, the server must return the and such a determination cannot be made, the server must return the
error NFS4ERR_GRACE. error NFS4ERR_GRACE.
Once a session is established using the new client ID, the client Once a session is established using the new client ID, the client
will use reclaim-type locking requests (e.g. LOCK requests with will use reclaim-type locking requests (e.g. LOCK requests with
reclaim set to true and OPEN operations with a claim type of reclaim set to true and OPEN operations with a claim type of
skipping to change at page 158, line 17 skipping to change at page 159, line 40
The grace period may last until all clients who are known to possibly The grace period may last until all clients who are known to possibly
have had locks have done a global RECLAIM_COMPLETE operation, have had locks have done a global RECLAIM_COMPLETE operation,
indicating that they have finished reclaiming the locks they held indicating that they have finished reclaiming the locks they held
before the server reboot. This means that a client which has done a before the server reboot. This means that a client which has done a
RECLAIM_COMPLETE must be prepared to receive an NFS4ERR_GRACE when RECLAIM_COMPLETE must be prepared to receive an NFS4ERR_GRACE when
attempting to acquire new locks. The server is assumed to maintain attempting to acquire new locks. The server is assumed to maintain
in stable storage a list of clients who may have such locks. The in stable storage a list of clients who may have such locks. The
server may also terminate the grace period before all clients have server may also terminate the grace period before all clients have
done a global RECLAIM_COMPLETE. The server SHOULD NOT terminate the done a global RECLAIM_COMPLETE. The server SHOULD NOT terminate the
grace period before a time equal to the lease period in order to give grace period before a time equal to the lease period in order to give
clients an opportunity to find out about the server reboot. Some clients an opportunity to find out about the server reboot, as a
additional time in order to allow time to establish a new client ID result of issuing requests on associated sessions with a frequency
and session and to effect lock reclaims may be added. Note that governed by the lease time. Note that when a client does not issue
analogous rules apply to file system-specific grace periods discussed such requests (or they are issued by the client but not received by
in Section 11.7.7. the server), it is possible for the grace period to expire before the
client finds out that the server reboot has occurred.
Some additional time in order to allow time to establish a new client
ID and session and to effect lock reclaims may be added to the lease
time. Note that analogous rules apply to file system-specific grace
periods discussed in Section 11.7.7.
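One possible server policy consistent with these rules is sketched below. It is an illustration under stated assumptions, not a requirement of the protocol. Times are seconds since restart. `all_reclaimed` means every client in the stable-storage list has done a global RECLAIM_COMPLETE, and `extra` is the optional time added to the lease time. Both are hypothetical parameter names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

bool grace_may_end(uint64_t since_restart, uint64_t lease_time,
                   uint64_t extra, bool all_reclaimed)
{
    assert(lease_time > 0);
    /* SHOULD NOT end before one lease period, so that clients renewing
     * at lease frequency have a chance to learn of the reboot. */
    if (since_restart < lease_time)
        return false;
    /* Afterwards: end as soon as every known client has finished
     * reclaiming, or once the additional allowance has also elapsed. */
    return all_reclaimed || since_restart >= lease_time + extra;
}
```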
If the server can reliably determine that granting a non-reclaim If the server can reliably determine that granting a non-reclaim
request will not conflict with reclamation of locks by other clients, request will not conflict with reclamation of locks by other clients,
the NFS4ERR_GRACE error does not have to be returned even within the the NFS4ERR_GRACE error does not have to be returned even within the
grace period, although NFS4ERR_GRACE must always be returned to grace period, although NFS4ERR_GRACE must always be returned to
clients attempting a non-reclaim lock request before doing their own clients attempting a non-reclaim lock request before doing their own
global RECLAIM_COMPLETE. For the server to be able to service READ global RECLAIM_COMPLETE. For the server to be able to service READ
and WRITE operations during the grace period, it must again be able and WRITE operations during the grace period, it must again be able
to guarantee that no possible conflict could arise between a to guarantee that no possible conflict could arise between a
potential reclaim locking request and the READ or WRITE operation. potential reclaim locking request and the READ or WRITE operation.
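The server-side decision for a request that arrives during the grace period can be sketched as follows. The request kinds and verdict names are hypothetical. `no_conflict_provable` stands for whatever persistent state lets an implementation guarantee that no later reclaim can conflict, and how that is determined is implementation-specific.

```c
#include <assert.h>
#include <stdbool.h>

enum req_kind { REQ_RECLAIM_LOCK, REQ_NEW_LOCK, REQ_IO };
enum grace_verdict { V_PROCEED, V_ERR_GRACE };

enum grace_verdict check_in_grace(enum req_kind kind,
                                  bool client_reclaim_done,
                                  bool no_conflict_provable)
{
    switch (kind) {
    case REQ_RECLAIM_LOCK:
        return V_PROCEED;             /* reclaim-type requests are allowed */
    case REQ_NEW_LOCK:
        /* Always NFS4ERR_GRACE before the client's own global
         * RECLAIM_COMPLETE, whatever the server can prove. */
        if (!client_reclaim_done)
            return V_ERR_GRACE;
        return no_conflict_provable ? V_PROCEED : V_ERR_GRACE;
    case REQ_IO:
        /* READ/WRITE only when no reclaim could conflict with it. */
        return no_conflict_provable ? V_PROCEED : V_ERR_GRACE;
    }
    assert(!"unknown request kind");
    return V_ERR_GRACE;
}
```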
skipping to change at page 159, line 24 skipping to change at page 161, line 5
requests to be processed during the grace period, it MUST determine requests to be processed during the grace period, it MUST determine
that no lock subsequently reclaimed will be rejected and that no lock that no lock subsequently reclaimed will be rejected and that no lock
subsequently reclaimed would have prevented any I/O operation subsequently reclaimed would have prevented any I/O operation
processed during the grace period. processed during the grace period.
Clients should be prepared for the return of NFS4ERR_GRACE errors for Clients should be prepared for the return of NFS4ERR_GRACE errors for
non-reclaim lock and I/O requests. In this case the client should non-reclaim lock and I/O requests. In this case the client should
employ a retry mechanism for the request. A delay (on the order of employ a retry mechanism for the request. A delay (on the order of
several seconds) between retries should be used to avoid overwhelming several seconds) between retries should be used to avoid overwhelming
the server. Further discussion of the general issue is included in the server. Further discussion of the general issue is included in
[Floyd]. The client must account for the server that is able to [37]. The client must account for the server that is able to perform
perform I/O and non-reclaim locking requests within the grace period I/O and non-reclaim locking requests within the grace period as well
as well as those that cannot do so. as those that cannot do so.
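A client retry policy along these lines might look like the following sketch. The struct and its fields are hypothetical. In particular, `max_attempts` is a local give-up limit and is not something the protocol defines. A delay of several seconds is used, as the text recommends.

```c
#include <assert.h>
#include <stddef.h>

struct grace_retry {
    unsigned delay_s;       /* pause between retries, e.g. 5 seconds */
    unsigned max_attempts;  /* local limit, not set by the protocol */
    unsigned attempts;      /* attempts made so far */
};

/* Record one attempt that failed with NFS4ERR_GRACE; return the number of
 * seconds to wait before retrying, or -1 once the limit is reached. */
int grace_next_delay(struct grace_retry *r)
{
    assert(r != NULL);
    if (++r->attempts >= r->max_attempts)
        return -1;
    return (int)r->delay_s;
}
```

A caller would wrap its request in a loop: on NFS4ERR_GRACE it calls `grace_next_delay()` and sleeps for the returned time, or gives up on -1. Against a server that does allow I/O and non-reclaim locking within the grace period, the loop simply never retries.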
A reclaim-type locking request outside the server's grace period can A reclaim-type locking request outside the server's grace period can
only succeed if the server can guarantee that no conflicting lock or only succeed if the server can guarantee that no conflicting lock or
I/O request has been granted since reboot or restart. I/O request has been granted since reboot or restart.
A server may, upon restart, establish a new value for the lease A server may, upon restart, establish a new value for the lease
period. Therefore, clients should, once a new client ID is period. Therefore, clients should, once a new client ID is
established, refetch the lease_time attribute and use it as the basis established, refetch the lease_time attribute and use it as the basis
for lease renewal for the lease associated with that server. for lease renewal for the lease associated with that server.
However, the server must establish, for this restart event, a grace However, the server must establish, for this restart event, a grace
skipping to change at page 164, line 31 skipping to change at page 166, line 11
At any point, the server can revoke locks held by a client and the At any point, the server can revoke locks held by a client and the
client must be prepared for this event. When the client detects that client must be prepared for this event. When the client detects that
its locks have been or may have been revoked, the client is its locks have been or may have been revoked, the client is
responsible for validating the state information between itself and responsible for validating the state information between itself and
the server. Validating locking state for the client means that it the server. Validating locking state for the client means that it
must verify or reclaim state for each lock currently held. must verify or reclaim state for each lock currently held.
The first occasion of lock revocation is upon server reboot or The first occasion of lock revocation is upon server reboot or
restart. Note that this includes situations in which sessions are restart. Note that this includes situations in which sessions are
persistent and locking state is lost. In this class of instances, persistent and locking state is lost. In this class of instances,
the client will receive an error (NFS4ERR_STALE_STATEID on an the client will receive an error (NFS4ERR_STALE_CLIENTID on an
operation that takes a stateid as an argument or operation that takes client ID, usually as part of recovery in
NFS4ERR_STALE_CLIENTID on an operation that takes a sessionid or response to a problem with the current session) and the client will
client ID) and the client will proceed with normal crash recovery as proceed with normal crash recovery as described in the
described in the Section 8.4.2.1. Section 8.4.2.1.
The second occasion of lock revocation is the inability to renew the The second occasion of lock revocation is the inability to renew the
lease before expiration, as discussed in Section 8.4.3. While this lease before expiration, as discussed in Section 8.4.3. While this
is considered a rare or unusual event, the client must be prepared to is considered a rare or unusual event, the client must be prepared to
recover. The server is responsible for determining the precise recover. The server is responsible for determining the precise
consequences of the lease expiration, informing the client of the consequences of the lease expiration, informing the client of the
scope of the lock revocation decided upon. The client then uses the scope of the lock revocation decided upon. The client then uses the
status information provided by the server in the SEQUENCE results status information provided by the server in the SEQUENCE results
(field sr_status_flags, see Section 18.46.3) to synchronize its (field sr_status_flags, see Section 18.46.3) to synchronize its
locking state with that of the server, in order to recover. locking state with that of the server, in order to recover.
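A client might turn sr_status_flags into work items as sketched below. The bit values are believed to match the published NFSv4.1 XDR (RFC 5661), but treat them as assumptions to be checked against the XDR of the draft in use. The `DO_*` action names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SEQ4_STATUS_CB_PATH_DOWN               0x00000001u
#define SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING   0x00000002u
#define SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED    0x00000004u
#define SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED  0x00000008u
#define SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED 0x00000010u
#define SEQ4_STATUS_ADMIN_STATE_REVOKED        0x00000020u
#define SEQ4_STATUS_CB_PATH_DOWN_SESSION       0x00000200u

/* Hypothetical client work items. */
enum {
    DO_FIX_BACKCHANNEL = 1 << 0,  /* restore a path for callbacks */
    DO_REFRESH_CB_GSS  = 1 << 1,  /* provide new GSS contexts for callbacks */
    DO_VALIDATE_STATE  = 1 << 2   /* find out which locks were revoked */
};

unsigned status_flags_to_actions(uint32_t flags)
{
    unsigned actions = 0;

    if (flags & (SEQ4_STATUS_CB_PATH_DOWN | SEQ4_STATUS_CB_PATH_DOWN_SESSION))
        actions |= DO_FIX_BACKCHANNEL;
    if (flags & (SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING |
                 SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED))
        actions |= DO_REFRESH_CB_GSS;
    if (flags & (SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED |
                 SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED |
                 SEQ4_STATUS_ADMIN_STATE_REVOKED))
        actions |= DO_VALIDATE_STATE;
    return actions;
}
```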
skipping to change at page 175, line 36 skipping to change at page 177, line 13
openowner are inherently serialized because of the owner-based seqid, openowner are inherently serialized because of the owner-based seqid,
multiple OPENs for the same openowner may be done in parallel. When multiple OPENs for the same openowner may be done in parallel. When
clients do this, they may encounter situations in which, because of clients do this, they may encounter situations in which, because of
the existence of hard links, two OPEN operations may turn out to open the existence of hard links, two OPEN operations may turn out to open
the same file, with a later OPEN performed being an upgrade of the the same file, with a later OPEN performed being an upgrade of the
first, with this fact only visible to the client once the operations first, with this fact only visible to the client once the operations
complete. complete.
In this situation, clients may determine the order in which the OPENs In this situation, clients may determine the order in which the OPENs
were performed by examining the stateids returned by the OPENs. were performed by examining the stateids returned by the OPENs.
Stateids that share a common value of the the "other" field can be Stateids that share a common value of the "other" field can be
recognized as having opened the same file, with the order of the recognized as having opened the same file, with the order of the
operations determinable from the order of the "seqid" fields, mod any operations determinable from the order of the "seqid" fields, mod any
possible wraparound of the 32-bit field. possible wraparound of the 32-bit field.
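The ordering check can be sketched with the stateid layout, which is a 32-bit seqid plus a 12-byte "other" field. The comparison uses serial-number arithmetic modulo 2^32 to handle wraparound. It assumes the two seqids being compared are less than 2^31 apart. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct stateid4 {
    uint32_t seqid;
    unsigned char other[12];
};

/* Stateids sharing the same "other" value refer to the same open file. */
bool same_open(const struct stateid4 *a, const struct stateid4 *b)
{
    assert(a != NULL && b != NULL);
    return memcmp(a->other, b->other, sizeof a->other) == 0;
}

/* True if seqid a was issued after seqid b, allowing for wraparound. */
bool seqid_after(uint32_t a, uint32_t b)
{
    uint32_t diff = a - b;               /* computed modulo 2^32 */
    return diff != 0 && diff < 0x80000000u;
}
```

Given two OPEN results `x` and `y` for which `same_open(&x, &y)` holds, `seqid_after(y.seqid, x.seqid)` identifies `y` as the later OPEN, which is the upgrade.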
When the possibility exists that the client will send multiple OPENs When the possibility exists that the client will send multiple OPENs
for the same openowner in parallel, it may be the case that an open for the same openowner in parallel, it may be the case that an open
upgrade may happen without the client knowing beforehand that this upgrade may happen without the client knowing beforehand that this
could happen. Because of this possibility, CLOSEs and could happen. Because of this possibility, CLOSEs and
OPEN_DOWNGRADEs should generally be sent with a non-zero seqid in OPEN_DOWNGRADEs should generally be sent with a non-zero seqid in
the stateid, to avoid the possibility that the status change the stateid, to avoid the possibility that the status change
skipping to change at page 187, line 48 skipping to change at page 189, line 13
There are two types of open delegations, read and write. A read open There are two types of open delegations, read and write. A read open
delegation allows a client to handle, on its own, requests to open a delegation allows a client to handle, on its own, requests to open a
file for reading that do not deny read access to others. Multiple file for reading that do not deny read access to others. Multiple
read open delegations may be outstanding simultaneously and do not read open delegations may be outstanding simultaneously and do not
conflict. A write open delegation allows the client to handle, on conflict. A write open delegation allows the client to handle, on
its own, all opens. Only one write open delegation may exist for a its own, all opens. Only one write open delegation may exist for a
given file at a given time and it is inconsistent with any read open given file at a given time and it is inconsistent with any read open
delegations. delegations.
When a client has a read open delegation, it is assured that neither When a client has a read open delegation, it is assured that neither
the contents, the attributes, nor the names of any links to the file the contents, the attributes (with the exception of time_access), nor
will change without its knowledge, so long as the delegation is held. the names of any links to the file will change without its knowledge,
When a client has a write open delegation, it may modify the file so long as the delegation is held. When a client has a write open
data locally since no other client will be accessing the file's data. delegation, it may modify the file data locally since no other client
The client holding a write delegation may only locally affect file will be accessing the file's data. The client holding a write
attributes which are intimately connected with the file data: size, delegation may only locally affect file attributes which are
time_modify, change. Changes to other attributes must be reflected intimately connected with the file data: size, change, time_access,
time_metadata, and time_modify. Changes to other attributes must be reflected
on the server. on the server.
When a client has an open delegation, it does not send OPENs or When a client has an open delegation, it does not send OPENs or
CLOSEs to the server but updates the appropriate status internally. CLOSEs to the server but updates the appropriate status internally.
For a read open delegation, opens that cannot be handled locally For a read open delegation, opens that cannot be handled locally
(opens for write or that deny read access) must be sent to the (opens for write or that deny read access) must be sent to the
server. server.
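The rule for which OPENs a delegation holder can satisfy locally is sketched below. The share bits follow the NFSv4 OPEN4_SHARE_ACCESS_READ/WRITE and OPEN4_SHARE_DENY_READ values, under shortened local names. The `enum deleg` type and the function name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SHARE_ACCESS_READ  0x1u
#define SHARE_ACCESS_WRITE 0x2u
#define SHARE_DENY_READ    0x1u

enum deleg { DELEG_NONE, DELEG_READ, DELEG_WRITE };

/* Can this OPEN be handled locally instead of being sent to the server? */
bool open_handled_locally(enum deleg d, uint32_t access, uint32_t deny)
{
    switch (d) {
    case DELEG_WRITE:
        return true;   /* a write delegation covers all opens */
    case DELEG_READ:
        /* Opens for write, or that deny read access, go to the server. */
        return !(access & SHARE_ACCESS_WRITE) && !(deny & SHARE_DENY_READ);
    default:
        return false;
    }
}
```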
When an open delegation is made, the response to the OPEN contains an When an open delegation is made, the response to the OPEN contains an
open delegation structure which specifies the following: open delegation structure which specifies the following:
skipping to change at page 216, line 8 skipping to change at page 217, line 24
Referrals provide a way of placing a file system in a location within Referrals provide a way of placing a file system in a location within
the namespace essentially without respect to its physical location on the namespace essentially without respect to its physical location on
a given server. This allows a single server or a set of servers to a given server. This allows a single server or a set of servers to
present a multi-server namespace that encompasses file systems present a multi-server namespace that encompasses file systems
located on multiple servers. Some likely uses of this include located on multiple servers. Some likely uses of this include
establishment of site-wide or organization-wide namespaces, or even establishment of site-wide or organization-wide namespaces, or even
knitting such together into a truly global namespace. knitting such together into a truly global namespace.
Referrals occur when a client determines, upon first referencing a Referrals occur when a client determines, upon first referencing a
position in the current namespace, that it is part of a new file position in the current namespace, that it is part of a new file
system and that that file system is absent. When this occurs, system and that the file system is absent. When this occurs,
typically by receiving the error NFS4ERR_MOVED, the actual location typically by receiving the error NFS4ERR_MOVED, the actual location
or locations of the file system can be determined by fetching the or locations of the file system can be determined by fetching the
fs_locations or fs_locations_info attribute. fs_locations or fs_locations_info attribute.
The locations-related attribute may designate a single file system The locations-related attribute may designate a single file system
location or multiple file system locations, to be selected based on location or multiple file system locations, to be selected based on
the needs of the client. The server, in the fs_locations_info the needs of the client. The server, in the fs_locations_info
attribute may specify priorities to be associated with various file attribute may specify priorities to be associated with various file
system location choices. The server may assign different priorities system location choices. The server may assign different priorities
to different locations as reported to individual clients, in order to to different locations as reported to individual clients, in order to
skipping to change at page 252, line 41 skipping to change at page 254, line 10
In an environment in which multiple copies of the same basic set of In an environment in which multiple copies of the same basic set of
data are available, information regarding the particular source of data are available, information regarding the particular source of
such data and the relationships among different copies can be very such data and the relationships among different copies can be very
helpful in providing consistent data to applications. helpful in providing consistent data to applications.
enum fs4_status_type { enum fs4_status_type {
STATUS4_FIXED = 1, STATUS4_FIXED = 1,
STATUS4_UPDATED = 2, STATUS4_UPDATED = 2,
STATUS4_VERSIONED = 3, STATUS4_VERSIONED = 3,
STATUS4_WRITABLE = 4, STATUS4_WRITABLE = 4,
STATUS4_ABSENT = 5 STATUS4_REFERRAL = 5
}; };
struct fs4_status { struct fs4_status {
bool fss_absent;
fs4_status_type fss_type; fs4_status_type fss_type;
utf8str_cs fss_source; utf8str_cs fss_source;
utf8str_cs fss_current; utf8str_cs fss_current;
int32_t fss_age; int32_t fss_age;
nfstime4 fss_version; nfstime4 fss_version;
}; };
The boolean fsstat_absent indicates whether the file system is
currently absent. This value will be set if the file system was The boolean fss_absent indicates whether the file system is currently
previously present and becomes absent, or if the file system has absent. This value will be set if the file system was previously
never been present and the type is STATUS4_REFERRAL. When this present and becomes absent, or if the file system has never been
boolean is set and the type is not STATUS4_REFERRAL, the remaining present and the type is STATUS4_REFERRAL. When this boolean is set
information in the fs4_status reflects that last valid when the file and the type is not STATUS4_REFERRAL, the remaining information in
system was present. the fs4_status reflects that last valid when the file system was
present.
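A C rendering of the newer (right-hand) XDR, with a helper for the fss_absent rule, might look like this. For illustration only, utf8str_cs is shown as a plain C string. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum fs4_status_type {
    STATUS4_FIXED = 1,
    STATUS4_UPDATED = 2,
    STATUS4_VERSIONED = 3,
    STATUS4_WRITABLE = 4,
    STATUS4_REFERRAL = 5
};

struct nfstime4 { int64_t seconds; uint32_t nseconds; };

struct fs4_status {
    bool                 fss_absent;
    enum fs4_status_type fss_type;
    const char          *fss_source;
    const char          *fss_current;
    int32_t              fss_age;
    struct nfstime4      fss_version;
};

/* True when the remaining fields describe the file system as it was when
 * last present: it is absent now and the type is not a referral. */
bool fs4_status_is_historical(const struct fs4_status *s)
{
    assert(s != NULL);
    return s->fss_absent && s->fss_type != STATUS4_REFERRAL;
}
```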
The type value indicates the kind of file system image represented. The type value indicates the kind of file system image represented.
This is of particular importance when using the version values to This is of particular importance when using the version values to
determine appropriate succession of file system images. When determine appropriate succession of file system images. When
fsstat_absent is set, and the file system was previously present, the fss_absent is set, and the file system was previously present, the
type reflected is that when the file was last present. Five types type reflected is that when the file was last present. Five types
are distinguished: are distinguished:
o STATUS4_FIXED which indicates a read-only image in the sense that o STATUS4_FIXED which indicates a read-only image in the sense that
it will never change. The possibility is allowed that, as a it will never change. The possibility is allowed that, as a
result of migration or switch to a different image, changed data result of migration or switch to a different image, changed data
can be accessed, but within the confines of this instance, no can be accessed, but within the confines of this instance, no
change is allowed. The client can use this fact to cache change is allowed. The client can use this fact to cache
aggressively. aggressively.
skipping to change at page 257, line 29 skipping to change at page 259, line 6
The NFSv4.1 pNFS feature has been structured to allow for a variety The NFSv4.1 pNFS feature has been structured to allow for a variety
of storage protocols to be defined and used. As noted in the diagram of storage protocols to be defined and used. As noted in the diagram
above, the storage protocol is the method used by the client to store above, the storage protocol is the method used by the client to store
and retrieve data directly from the storage devices. The NFSv4.1 and retrieve data directly from the storage devices. The NFSv4.1
protocol directly defines one storage protocol, the NFSv4.1 storage protocol directly defines one storage protocol, the NFSv4.1 storage
type, and its use. type, and its use.
Examples of other storage protocols that could be used with NFSv4.1's Examples of other storage protocols that could be used with NFSv4.1's
pNFS are: pNFS are:
o Block/volume protocols such as iSCSI ([36]), and FCP ([37]). The o Block/volume protocols such as iSCSI ([38]), and FCP ([39]). The
block/volume protocol support can be independent of the addressing block/volume protocol support can be independent of the addressing
structure of the block/volume protocol used, allowing more than structure of the block/volume protocol used, allowing more than
one protocol to access the same file data and enabling one protocol to access the same file data and enabling
extensibility to other block/volume protocols. extensibility to other block/volume protocols.
o Object protocols such as OSD over iSCSI or Fibre Channel [38]. o Object protocols such as OSD over iSCSI or Fibre Channel [40].
o Other storage protocols, including PVFS and other file systems o Other storage protocols, including PVFS and other file systems
that are in use in HPC environments. that are in use in HPC environments.
It is possible that various storage protocols are available to both It is possible that various storage protocols are available to both
client and server and it may be possible that a client and server do client and server and it may be possible that a client and server do
not have a matching storage protocol available to them. Because of not have a matching storage protocol available to them. Because of
this, the pNFS server MUST support normal NFSv4.1 access to any file this, the pNFS server MUST support normal NFSv4.1 access to any file
accessible by the pNFS feature; this will allow for continued accessible by the pNFS feature; this will allow for continued
interoperability between a NFSv4.1 client and server. interoperability between a NFSv4.1 client and server.
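The fallback requirement means that a client can always choose a path for I/O, as in the sketch below. Layout types are treated as opaque 32-bit values. The list of types the server offers is assumed to be already known, and how the client obtains it is outside this sketch. The function name and `enum io_path` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum io_path { IO_PNFS, IO_NFS };

/* Use pNFS only if the client supports a layout type the server offers;
 * otherwise use ordinary NFSv4.1 READ/WRITE through the server. */
enum io_path choose_io_path(const uint32_t *server_types, size_t n_server,
                            const uint32_t *client_types, size_t n_client,
                            uint32_t *chosen)
{
    assert(chosen != NULL);
    for (size_t i = 0; i < n_server; i++)
        for (size_t j = 0; j < n_client; j++)
            if (server_types[i] == client_types[j]) {
                *chosen = server_types[i];
                return IO_PNFS;
            }
    return IO_NFS;
}
```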
skipping to change at page 259, line 26 skipping to change at page 261, line 5
devices that hold the data. A layout is said to belong to a specific devices that hold the data. A layout is said to belong to a specific
layout type (data type layouttype4, see Section 3.3.13). The layout layout type (data type layouttype4, see Section 3.3.13). The layout
type allows for variants to handle different storage protocols, such type allows for variants to handle different storage protocols, such
as those associated with block/volume [31], object [30], and file as those associated with block/volume [31], object [30], and file
(Section 13) layout types. A metadata server, along with its control (Section 13) layout types. A metadata server, along with its control
protocol, MUST support at least one layout type. A private sub-range protocol, MUST support at least one layout type. A private sub-range
of the layout type name space is also defined. Values from the of the layout type name space is also defined. Values from the
private layout type range MAY be used for internal testing or private layout type range MAY be used for internal testing or
experimentation. experimentation.
As an example, a file layout type could be an array of tuples (e.g., As an example, a layout of the file layout type could be an array of
deviceID, file_handle), along with a definition of how the data is tuples (e.g., deviceID, file_handle), along with a definition of how
stored across the devices (e.g., striping). A block/volume layout the data is stored across the devices (e.g., striping). A block/
might be an array of tuples that store <deviceID, block_number, block volume layout might be an array of tuples that store <deviceID,
count> along with information about block size and the associated block_number, block count> along with information about block size
file offset of the block number. An object layout might be an array and the associated file offset of the block number. An object layout
of tuples <deviceID, objectID> and an additional structure (i.e., the might be an array of tuples <deviceID, objectID> and an additional
aggregation map) that defines how the logical byte sequence of the structure (i.e., the aggregation map) that defines how the logical
file data is serialized into the different objects. Note that the byte sequence of the file data is serialized into the different
actual layouts are typically more complex than these simple objects. Note that the actual layouts are typically more complex
expository examples. than these simple expository examples.
Requests for pNFS-related operations will often specify a layout
type. Examples of such operations are GETDEVICEINFO and LAYOUTGET.
The response for these operations will include structures such as
device_addr4 or a layout4, each of which includes a layout type
within it. The layout type sent by the server MUST always be the
same one requested by the client. When a server sends a response
that includes a different layout type, the client SHOULD ignore the
response and behave as if the server had returned an error response.
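A client-side check for the rule above might look like the following C sketch. It is a minimal illustration under stated assumptions: the pnfs_reply structure and the function name accept_layout_reply are hypothetical, and the choice of NFS4ERR_SERVERFAULT as the substitute error is an assumption, since the text only says the client should behave as if some error had been returned. LAYOUT4_NFSV4_1_FILES (1) is the registered value for the file layout type.

```c
#include <stdint.h>

#define NFS4_OK                0
#define NFS4ERR_SERVERFAULT    10006  /* code taken from the error table */
#define LAYOUT4_NFSV4_1_FILES  1      /* layouttype4 value for files */

/* Hypothetical, simplified view of a GETDEVICEINFO or LAYOUTGET reply:
 * only the status and the layout type embedded in the device_addr4 or
 * layout4 are modeled. */
typedef struct {
    int32_t  status;
    uint32_t layout_type;
} pnfs_reply;

/* Returns the status the client should act on.  A successful reply whose
 * layout type differs from the one requested is treated as a failure;
 * NFS4ERR_SERVERFAULT is an assumed stand-in for "an error response". */
int32_t accept_layout_reply(uint32_t requested_type, const pnfs_reply *reply)
{
    if (reply->status != NFS4_OK)
        return reply->status;
    if (reply->layout_type != requested_type)
        return NFS4ERR_SERVERFAULT;
    return NFS4_OK;
}
```

A client would call accept_layout_reply with the layout type it put in its request and proceed only on NFS4_OK.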
12.2.8. Layout 12.2.8. Layout
A layout defines how a file's data is organized on one or more A layout defines how a file's data is organized on one or more
storage devices. There are many potential layout types; each of the storage devices. There are many potential layout types; each of the
layout types is differentiated by the storage protocol used to layout types is differentiated by the storage protocol used to
access data and in the aggregation scheme that lays out the file data access data and in the aggregation scheme that lays out the file data
on the underlying storage devices. A layout is precisely identified on the underlying storage devices. A layout is precisely identified
by the following tuple: <client ID, filehandle, layout type, iomode, by the following tuple: <client ID, filehandle, layout type, iomode,
range>; where filehandle refers to the filehandle of the file on the range>; where filehandle refers to the filehandle of the file on the
skipping to change at page 260, line 16 skipping to change at page 261, line 52
(i.e., the storage device/file mapping parameters differ). Note that (i.e., the storage device/file mapping parameters differ). Note that
differing iomodes do not lead to conflicting layouts. It is differing iomodes do not lead to conflicting layouts. It is
permissible for layouts with different iomodes, pertaining to the permissible for layouts with different iomodes, pertaining to the
same byte range, to be held by the same client. An example of this same byte range, to be held by the same client. An example of this
would be copy-on-write functionality for a block/volume layout type. would be copy-on-write functionality for a block/volume layout type.
12.2.9. Layout Iomode 12.2.9. Layout Iomode
The layout iomode (data type layoutiomode4, see Section 3.3.20) The layout iomode (data type layoutiomode4, see Section 3.3.20)
indicates to the metadata server the client's intent to perform indicates to the metadata server the client's intent to perform
either just READ operations (Section 18.22) or a mixture of I/O either just read operations or a mixture of I/O possibly containing
possibly containing WRITE (Section 18.32) and READ operations. For read and write operations. For certain layout types, it is useful
certain layout types, it is useful for a client to specify this for a client to specify this intent at LAYOUTGET (Section 18.43)
intent at LAYOUTGET (Section 18.43) time. For example, for block/volume time. For example, for block/volume based protocols, block allocation
based protocols, block allocation could occur when a READ/WRITE could occur when a READ/WRITE iomode is specified. A special
iomode is specified. A special LAYOUTIOMODE4_ANY iomode is defined LAYOUTIOMODE4_ANY iomode is defined and can only be used for
and can only be used for LAYOUTRETURN and CB_LAYOUTRECALL, not for LAYOUTRETURN and CB_LAYOUTRECALL, not for LAYOUTGET. It specifies
LAYOUTGET. It specifies that layouts pertaining to both READ and that layouts pertaining to both READ and READ/WRITE iomodes are being
READ/WRITE iomodes are being returned or recalled, respectively. returned or recalled, respectively.
A storage device may validate I/O with regard to the iomode; this is A storage device may validate I/O with regard to the iomode; this is
dependent upon storage device implementation and layout type. Thus, dependent upon storage device implementation and layout type. Thus,
if the client's layout iomode is inconsistent with the I/O being if the client's layout iomode is inconsistent with the I/O being
performed, the storage device may reject the client's I/O with an performed, the storage device may reject the client's I/O with an
error indicating a new layout with the correct I/O mode should be error indicating a new layout with the correct I/O mode should be
fetched. For example, if a client gets a layout with a READ iomode fetched. For example, if a client gets a layout with a READ iomode
and performs a WRITE to a storage device, the storage device is and performs a WRITE to a storage device, the storage device is
allowed to reject that WRITE. allowed to reject that WRITE.
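The iomode rules above can be sketched in C as follows. The layoutiomode4 values (READ = 1, RW = 2, ANY = 3) follow the NFSv4.1 XDR; the layout_op and io_kind tags and both function names are hypothetical. device_accepts_io shows only one possible policy, since whether a storage device validates I/O against the iomode depends on the device implementation and layout type.

```c
#include <stdbool.h>

/* layoutiomode4 values (Section 3.3.20). */
typedef enum {
    LAYOUTIOMODE4_READ = 1,
    LAYOUTIOMODE4_RW   = 2,
    LAYOUTIOMODE4_ANY  = 3
} layoutiomode4;

/* Hypothetical tags for the operations that carry an iomode. */
typedef enum { OP_LAYOUTGET, OP_LAYOUTRETURN, OP_CB_LAYOUTRECALL } layout_op;

/* Hypothetical tags for the kind of I/O sent to a storage device. */
typedef enum { IO_READ, IO_WRITE } io_kind;

/* LAYOUTIOMODE4_ANY is accepted only by LAYOUTRETURN and CB_LAYOUTRECALL. */
bool iomode_allowed(layout_op op, layoutiomode4 mode)
{
    switch (mode) {
    case LAYOUTIOMODE4_READ:
    case LAYOUTIOMODE4_RW:
        return true;
    case LAYOUTIOMODE4_ANY:
        return op != OP_LAYOUTGET;
    }
    return false;
}

/* One possible policy for a storage device that chooses to validate I/O
 * against the iomode: a READ layout permits reads only, while a READ/WRITE
 * layout permits both reads and writes. */
bool device_accepts_io(layoutiomode4 held, io_kind io)
{
    if (io == IO_WRITE)
        return held == LAYOUTIOMODE4_RW;
    return held == LAYOUTIOMODE4_READ || held == LAYOUTIOMODE4_RW;
}
```

With this policy, a WRITE sent under a READ layout is rejected, and the client would then fetch a layout with the READ/WRITE iomode.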
skipping to change at page 263, line 43 skipping to change at page 265, line 29
which a layout is held, does not necessarily conflict with the which a layout is held, does not necessarily conflict with the
holding of the layout that describes the file being modified. holding of the layout that describes the file being modified.
Therefore, it is the requirement of the storage protocol or layout Therefore, it is the requirement of the storage protocol or layout
type that determines the necessary behavior. For example, block/ type that determines the necessary behavior. For example, block/
volume layout types require that the layout's iomode agree with the volume layout types require that the layout's iomode agree with the
type of I/O being performed. type of I/O being performed.
Depending upon the layout type and storage protocol in use, storage Depending upon the layout type and storage protocol in use, storage
device access permissions may be granted by LAYOUTGET and may be device access permissions may be granted by LAYOUTGET and may be
encoded within the type-specific layout. For an example of storage encoded within the type-specific layout. For an example of storage
device access permissions see an object based protocol such as [38]. device access permissions see an object based protocol such as [40].
If access permissions are encoded within the layout, the metadata If access permissions are encoded within the layout, the metadata
server SHOULD recall the layout when those permissions become invalid server SHOULD recall the layout when those permissions become invalid
for any reason; for example when a file becomes unwritable or for any reason; for example when a file becomes unwritable or
inaccessible to a client. Note, clients are still required to inaccessible to a client. Note, clients are still required to
perform the appropriate access operations with open, lock and access perform the appropriate access operations with open, lock and access
as described above. The degree to which it is possible for the as described above. The degree to which it is possible for the
client to circumvent these access operations and the consequences of client to circumvent these access operations and the consequences of
doing so must be clearly specified by the individual layout type doing so must be clearly specified by the individual layout type
specifications. In addition, these specifications must be clear specifications. In addition, these specifications must be clear
about the requirements and non-requirements for the checking about the requirements and non-requirements for the checking
skipping to change at page 283, line 30 skipping to change at page 285, line 7
does this, there is no need to wait for the original storage device. does this, there is no need to wait for the original storage device.
12.8. Metadata and Storage Device Roles 12.8. Metadata and Storage Device Roles
If the same physical hardware is used to implement both a metadata If the same physical hardware is used to implement both a metadata
server and storage device, then the same hardware entity is to be server and storage device, then the same hardware entity is to be
understood to be implementing two distinct roles and it is important understood to be implementing two distinct roles and it is important
that it be clearly understood on behalf of which role the hardware is that it be clearly understood on behalf of which role the hardware is
executing at any given time. executing at any given time.
Various sub-cases can be distinguished. Two sub-cases can be distinguished.
1. The storage device uses NFSv4.1 as the storage protocol. The 1. The storage device uses NFSv4.1 as the storage protocol, i.e., the
same physical hardware is used to implement both a metadata and same physical hardware is used to implement both a metadata and
data server. If an EXCHANGE_ID operation sent to the metadata data server. See Section 13.1 for a description of how multiple
server has EXCHGID4_FLAG_USE_PNFS_MDS set and roles are handled.
EXCHGID4_FLAG_USE_PNFS_DS not set, the role of all sessions
derived from the client ID is metadata server-only. If an
EXCHANGE_ID operation sent to the data server has
EXCHGID4_FLAG_USE_PNFS_DS set and EXCHGID4_FLAG_USE_PNFS_MDS not
set, the role of all sessions derived from the client ID is data
server only. These assertions are true regardless of whether the
network addresses of the metadata server and data server are the
same or not.
The client will use the same client owner for both the metadata
server EXCHANGE_ID and the data server EXCHANGE_ID. Since the
client sends one with EXCHGID4_FLAG_USE_PNFS_MDS set, and the
other with EXCHGID4_FLAG_USE_PNFS_DS set, the server will need to
return unique client IDs, as well as server_owners, which will
eliminate ambiguity about dual roles the same physical entity
serves.
2. The metadata and data server each return EXCHANGE_ID results with
EXCHGID4_FLAG_USE_PNFS_DS and EXCHGID4_FLAG_USE_PNFS_MDS both
set, the server_owner and server_scope results are the same, and
the client IDs are the same, and if RPCSEC_GSS is used, the
server principals are the same. As noted in Section 2.10.4 the
two servers are the same, whether they have the same network
address or not. If the pNFS server is ambiguous in its
EXCHANGE_ID results as to what role a client ID may be used for,
yet still requires the NFSv4.1 request be directed in a manner
specific to a role (e.g. a READ request for a particular offset
directed to the metadata server role might use a different offset
if the READ was intended for the data server role, if the file is
using STRIPE4_DENSE packing, see Section 13.4.4), the pNFS server
may mark the metadata filehandle differently from the data
filehandle so that operations addressed to the metadata server
can be distinguished from those directed to the data servers.
Marking the metadata and data server filehandles differently (and
this is RECOMMENDED) is possible because the former are derived
from OPEN operations, and the latter are derived from LAYOUTGET
operations.
Note, that it may be the case that while the metadata server and
the storage device are distinct from one client's point of view,
the roles may be reversed according to another client's point of
view. For example, in the cluster file system model, a metadata
server to one client may be a data server to another client. If
NFSv4.1 is being used as the storage protocol, then pNFS servers
need to mark filehandles according to their specific roles.
3. The storage device does not use NFSv4.1 as the storage protocol, 2. The storage device does not use NFSv4.1 as the storage protocol,
and the same physical hardware is used to implement both a and the same physical hardware is used to implement both a
metadata and storage device. Whether distinct network addresses metadata and storage device. Whether distinct network addresses
are used to access metadata server and storage device is are used to access metadata server and storage device is
immaterial, because it is always clear to the pNFS client and immaterial, because it is always clear to the pNFS client and
server, from the upper layer protocol being used (NFSv4.1 or non- server, from the upper layer protocol being used (NFSv4.1 or non-
NFSv4.1) what role the request to the common server network NFSv4.1) what role the request to the common server network
address is directed to. address is directed to.
12.9. Security Considerations for pNFS 12.9. Security Considerations for pNFS
skipping to change at page 286, line 41 skipping to change at page 287, line 23
+--------------------------------------------------------+ +--------------------------------------------------------+
As the above table implies, a server can have one or two roles. A As the above table implies, a server can have one or two roles. A
server can be both a metadata server and a data server or it can be server can be both a metadata server and a data server or it can be
both a data server and non-metadata server. In addition to returning both a data server and non-metadata server. In addition to returning
two roles in EXCHANGE_ID's results, and thus serving both roles via a two roles in EXCHANGE_ID's results, and thus serving both roles via a
common client ID, a server can serve two roles by returning a unique common client ID, a server can serve two roles by returning a unique
client ID and server owner for each role in each of two EXCHANGE_ID client ID and server owner for each role in each of two EXCHANGE_ID
results, with each result indicating each role. results, with each result indicating each role.
In the case of a server with concurrent pNFS roles that are served by
a common client ID, if the EXCHANGE_ID request from the client has
zero or a combination of the bits set in eia_flags, the server result
should set bits which represent the higher of the acceptable
combination of the server roles, with a preference to match the roles
requested by the client. Thus if a client request has
(EXCHGID4_FLAG_USE_NON_PNFS | EXCHGID4_FLAG_USE_PNFS_MDS |
EXCHGID4_FLAG_USE_PNFS_DS) flags set, and the server is both a
metadata server and a data server, serving both the roles by a common
client ID, the server SHOULD return with (EXCHGID4_FLAG_USE_PNFS_MDS
| EXCHGID4_FLAG_USE_PNFS_DS) set.
In the case of a server that has multiple concurrent pNFS roles, each
role served by a unique client ID, if the client specifies zero or a
combination of roles in the request, the server results SHOULD return
only one of the roles from the combination specified by the client
request. If the role specified by the server result does not match
the intended use by the client, the client should send another
EXCHANGE_ID specifying just the pNFS role it is interested in.
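The two server policies above can be sketched as flag arithmetic in C. The EXCHGID4_FLAG_* values are those of the NFSv4.1 XDR; the function names are hypothetical. Interpreting "the higher of the acceptable combination" as the overlap between the requested and served roles, falling back to all served roles, is an assumption, as is the MDS, then DS, then non-pNFS preference order in the unique-client-ID case.

```c
#include <stdint.h>

#define EXCHGID4_FLAG_USE_NON_PNFS 0x00010000u
#define EXCHGID4_FLAG_USE_PNFS_MDS 0x00020000u
#define EXCHGID4_FLAG_USE_PNFS_DS  0x00040000u
#define EXCHGID4_FLAG_MASK_PNFS    0x00070000u

/* Roles served through one common client ID: return the overlap between
 * the requested roles and the server's roles, or all of the server's roles
 * when the client asked for none of them (including eia_flags == 0). */
uint32_t pick_roles_common(uint32_t eia_flags, uint32_t server_roles)
{
    uint32_t overlap = eia_flags & server_roles & EXCHGID4_FLAG_MASK_PNFS;
    return overlap != 0 ? overlap : (server_roles & EXCHGID4_FLAG_MASK_PNFS);
}

/* Each role served through its own client ID: return exactly one role.
 * The MDS > DS > non-pNFS preference order is an assumption. */
uint32_t pick_role_unique(uint32_t eia_flags, uint32_t server_roles)
{
    static const uint32_t order[] = {
        EXCHGID4_FLAG_USE_PNFS_MDS,
        EXCHGID4_FLAG_USE_PNFS_DS,
        EXCHGID4_FLAG_USE_NON_PNFS
    };
    uint32_t candidates = eia_flags & server_roles & EXCHGID4_FLAG_MASK_PNFS;
    if (candidates == 0)
        candidates = server_roles & EXCHGID4_FLAG_MASK_PNFS;
    for (unsigned i = 0; i < sizeof order / sizeof order[0]; i++)
        if (candidates & order[i])
            return order[i];
    return 0;
}
```

For the example in the text, pick_roles_common(EXCHGID4_FLAG_USE_NON_PNFS | EXCHGID4_FLAG_USE_PNFS_MDS | EXCHGID4_FLAG_USE_PNFS_DS, EXCHGID4_FLAG_USE_PNFS_MDS | EXCHGID4_FLAG_USE_PNFS_DS) yields the MDS and DS bits together.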
If a pNFS metadata client gets a layout that refers it to an NFSv4.1 If a pNFS metadata client gets a layout that refers it to an NFSv4.1
data server, it needs a client ID on that data server. If it does data server, it needs a client ID on that data server. If it does
not yet have a client ID from the server that had the not yet have a client ID from the server that had the
EXCHGID4_FLAG_USE_PNFS_DS flag set in the EXCHANGE_ID results, then EXCHGID4_FLAG_USE_PNFS_DS flag set in the EXCHANGE_ID results, then
the client must send an EXCHANGE_ID to the data server, using the the client must send an EXCHANGE_ID to the data server, using the
same co_ownerid as it sent to the metadata server, with the same co_ownerid as it sent to the metadata server, with the
EXCHGID4_FLAG_USE_PNFS_DS flag set in the arguments. If the server's EXCHGID4_FLAG_USE_PNFS_DS flag set in the arguments. If the server's
EXCHANGE_ID results have EXCHGID4_FLAG_USE_PNFS_DS set, then the EXCHANGE_ID results have EXCHGID4_FLAG_USE_PNFS_DS set, then the
client may use the client ID to create sessions that will exchange client may use the client ID to create sessions that will exchange
pNFS data operations. The client ID returned by the data server has pNFS data operations. The client ID returned by the data server has
skipping to change at page 287, line 27 skipping to change at page 288, line 29
If metadata server routing and/or identity information is encoded in If metadata server routing and/or identity information is encoded in
data server filehandles, when the metadata server identity or data server filehandles, when the metadata server identity or
location changes, the data server filehandles it gave out must become location changes, the data server filehandles it gave out must become
invalid (stale), and so the metadata server must first recall invalid (stale), and so the metadata server must first recall
the layouts. Invalidating a data server filehandle does not render the layouts. Invalidating a data server filehandle does not render
the NFS client's data cache invalid. The client's cache should map a the NFS client's data cache invalid. The client's cache should map a
data server filehandle to a metadata server filehandle, and a data server filehandle to a metadata server filehandle, and a
metadata server filehandle to cached data. metadata server filehandle to cached data.
If a server is both a metadata server and a data server, the server
might need to distinguish operations on files that are directed to
the metadata server from those that are directed to the data server.
It is RECOMMENDED that the values of the filehandles returned by the
LAYOUTGET operation be different from the value of the filehandle
returned by the OPEN of the same file.
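One way a server might make LAYOUTGET filehandles differ from OPEN filehandles is to reserve a tag byte that records the role. The C sketch below is purely illustrative: the tag values, the layout of the opaque bytes, and the names fh_encode and fh_role are hypothetical, since filehandle contents are private to the server. Only NFS4_FHSIZE (128) comes from the protocol.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NFS4_FHSIZE 128   /* maximum filehandle size in NFSv4 */

/* Simplified nfs_fh4: an opaque byte string of at most NFS4_FHSIZE bytes. */
typedef struct {
    uint32_t len;
    uint8_t  data[NFS4_FHSIZE];
} nfs_fh4;

/* Hypothetical role tags stored in the first byte of the filehandle. */
enum { FH_ROLE_MDS = 'M', FH_ROLE_DS = 'D' };

/* Encode the role tag followed by a server-private key for the file. */
bool fh_encode(nfs_fh4 *fh, uint8_t role, const uint8_t *key, uint32_t keylen)
{
    if (keylen > NFS4_FHSIZE - 1)
        return false;
    fh->data[0] = role;
    memcpy(fh->data + 1, key, keylen);
    fh->len = keylen + 1;
    return true;
}

/* Recover the role a request was addressed to, or -1 for an empty handle. */
int fh_role(const nfs_fh4 *fh)
{
    return fh->len > 0 ? fh->data[0] : -1;
}
```

Under this scheme the OPEN reply would carry a handle encoded with FH_ROLE_MDS and the layout would carry one encoded with FH_ROLE_DS for the same key, so a server acting in both roles can dispatch each request by inspecting the tag.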
Another scenario is for the metadata server and the storage device
to be distinct from one client's point of view, and the roles
reversed from another client's point of view. For example, in the
cluster file system model, a metadata server to one client may be a
data server to another client. If NFSv4.1 is being used as the
storage protocol, then pNFS servers need to encode the values of
filehandles according to their specific roles.
13.2. File Layout Definitions 13.2. File Layout Definitions
The following definitions apply to the LAYOUT4_NFSV4_1_FILES layout The following definitions apply to the LAYOUT4_NFSV4_1_FILES layout
type, and may be applicable to other layout types. type, and may be applicable to other layout types.
Unit. A unit is a fixed size quantity of data written to a data Unit. A unit is a fixed size quantity of data written to a data
server. server.
Pattern. A pattern is a method of distributing one or more equal Pattern. A pattern is a method of distributing one or more equal
sized units across a set of data servers. A pattern is iterated sized units across a set of data servers. A pattern is iterated
skipping to change at page 291, line 5 skipping to change at page 292, line 28
converting the client's logical I/O offset (e.g. the current converting the client's logical I/O offset (e.g. the current
offset in a POSIX file descriptor before the read() or write() offset in a POSIX file descriptor before the read() or write()
system call is sent) into the stripe unit number (see system call is sent) into the stripe unit number (see
Section 13.4.1). Section 13.4.1).
If dense packing is used, then nfl_pattern_offset is also needed If dense packing is used, then nfl_pattern_offset is also needed
to convert the client's logical I/O offset to an offset on the to convert the client's logical I/O offset to an offset on the
file on the data server corresponding to the stripe unit number file on the data server corresponding to the stripe unit number
(see Section 13.4.4). (see Section 13.4.4).
Note that nfl_pattern_offset is not always the same as lo_offset.
For example, via the LAYOUTGET operation, a client might request
a layout starting at offset 1000 of a file that has its striping
pattern start at offset 0.
5. nfl_fh_list: An array of data server filehandles for each list of 5. nfl_fh_list: An array of data server filehandles for each list of
data servers in each element of the nflda_multipath_ds_list data servers in each element of the nflda_multipath_ds_list
array. The number of elements in nfl_fh_list depends on whether array. The number of elements in nfl_fh_list depends on whether
sparse or dense packing is being used. sparse or dense packing is being used.
* If sparse packing is being used, the number of elements in * If sparse packing is being used, the number of elements in
nfl_fh_list MUST be one of three values: nfl_fh_list MUST be one of three values:
+ Zero. This means that filehandles used for each data + Zero. This means that filehandles used for each data
server are the same as the filehandle returned by the OPEN server are the same as the filehandle returned by the OPEN
skipping to change at page 298, line 22 skipping to change at page 299, line 46
sparse packing example, the corresponding dense packing would have sparse packing example, the corresponding dense packing would have
all stripe units of all data files filled. Logical stripe units 0, all stripe units of all data files filled. Logical stripe units 0,
3, 6, ... of the file would live on stripe units 0, 1, 2, ... of the 3, 6, ... of the file would live on stripe units 0, 1, 2, ... of the
file of data server 1, logical stripe units 1, 4, 7, ... of the file file of data server 1, logical stripe units 1, 4, 7, ... of the file
would live on stripe units 0, 1, 2, ... of the file of data server 2, would live on stripe units 0, 1, 2, ... of the file of data server 2,
and logical stripe units 2, 5, 8, ... of the file would live on and logical stripe units 2, 5, 8, ... of the file would live on
stripe units 0, 1, 2, ... of the file of data server 3. stripe units 0, 1, 2, ... of the file of data server 3.
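The dense packing example above can be expressed as a small C sketch that maps a logical stripe unit to a data server position and a stripe unit within that server's data file. It assumes, as the three-server example does, that the stripe visits the data servers in order; a real client would instead consult nflda_stripe_indices. The dense_location type and dense_locate function are hypothetical names.

```c
#include <stdint.h>

/* Location of a logical stripe unit under dense packing. */
typedef struct {
    uint32_t ds_index;  /* 0-based position in the stripe (0 = data server 1) */
    uint64_t ds_unit;   /* stripe unit number within that server's data file */
} dense_location;

/* Assumes the stripe visits the n data servers in order, as in the
 * three-server example; n must be non-zero. */
dense_location dense_locate(uint64_t logical_unit, uint32_t n)
{
    dense_location loc;
    loc.ds_index = (uint32_t)(logical_unit % n);
    loc.ds_unit  = logical_unit / n;
    return loc;
}
```

For example, dense_locate(7, 3) gives ds_index 1 (data server 2) and ds_unit 2, consistent with logical stripe units 1, 4, 7 living on stripe units 0, 1, 2 of data server 2.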
Because dense packing does not leave holes on the data servers, the Because dense packing does not leave holes on the data servers, the
pNFS client is allowed to write to any offset of any data file of any pNFS client is allowed to write to any offset of any data file of any
data server in the stripe. Thus the the data servers need not know data server in the stripe. Thus the data servers need not know the
the file's striping pattern. file's striping pattern.
The calculation to determine the byte offset within the data file for The calculation to determine the byte offset within the data file for
dense data server layouts is: dense data server layouts is:
stripe_width = stripe_unit_size * N; stripe_width = stripe_unit_size * N;
where N = number of elements in nflda_stripe_indices. where N = number of elements in nflda_stripe_indices.
relative_offset = file_offset - nfl_pattern_offset; relative_offset = file_offset - nfl_pattern_offset;
data_file_offset = floor(relative_offset / stripe_width) data_file_offset = floor(relative_offset / stripe_width)
skipping to change at page 316, line 9 skipping to change at page 317, line 43
| NFS4ERR_BADSESSION | 10052 | Section 15.1.11.1 | | NFS4ERR_BADSESSION | 10052 | Section 15.1.11.1 |
| NFS4ERR_BADSLOT | 10053 | Section 15.1.11.2 | | NFS4ERR_BADSLOT | 10053 | Section 15.1.11.2 |
| NFS4ERR_BADTYPE | 10007 | Section 15.1.4.1 | | NFS4ERR_BADTYPE | 10007 | Section 15.1.4.1 |
| NFS4ERR_BADXDR | 10036 | Section 15.1.1.1 | | NFS4ERR_BADXDR | 10036 | Section 15.1.1.1 |
| NFS4ERR_BAD_COOKIE | 10003 | Section 15.1.1.2 | | NFS4ERR_BAD_COOKIE | 10003 | Section 15.1.1.2 |
| NFS4ERR_BAD_HIGH_SLOT | 10077 | Section 15.1.11.3 | | NFS4ERR_BAD_HIGH_SLOT | 10077 | Section 15.1.11.3 |
| NFS4ERR_BAD_RANGE | 10042 | Section 15.1.8.1 | | NFS4ERR_BAD_RANGE | 10042 | Section 15.1.8.1 |
| NFS4ERR_BAD_SEQID | 10026 | Section 15.1.16.1 | | NFS4ERR_BAD_SEQID | 10026 | Section 15.1.16.1 |
| NFS4ERR_BAD_SESSION_DIGEST | 10051 | Section 15.1.12.2 | | NFS4ERR_BAD_SESSION_DIGEST | 10051 | Section 15.1.12.2 |
| NFS4ERR_BAD_STATEID | 10025 | Section 15.1.5.2 | | NFS4ERR_BAD_STATEID | 10025 | Section 15.1.5.2 |
| NFS4ERR_CB_PATH_DOWN | 10048 | Section 15.1.16.2 | | NFS4ERR_CB_PATH_DOWN | 10048 | Section 15.1.11.4 |
| NFS4ERR_CLID_INUSE | 10017 | Section 15.1.13.2 | | NFS4ERR_CLID_INUSE | 10017 | Section 15.1.13.2 |
| NFS4ERR_CLIENTID_BUSY | 10074 | Section 15.1.13.1 | | NFS4ERR_CLIENTID_BUSY | 10074 | Section 15.1.13.1 |
| NFS4ERR_COMPLETE_ALREADY | 10054 | Section 15.1.9.1 | | NFS4ERR_COMPLETE_ALREADY | 10054 | Section 15.1.9.1 |
| NFS4ERR_CONN_BINDING_NOT_ENFORCED | 10073 | Section 15.1.12.3 | | NFS4ERR_CONN_BINDING_NOT_ENFORCED | 10073 | Section 15.1.12.3 |
| NFS4ERR_CONN_NOT_BOUND_TO_SESSION | 10055 | Section 15.1.11.5 | | NFS4ERR_CONN_NOT_BOUND_TO_SESSION | 10055 | Section 15.1.11.6 |
| NFS4ERR_DEADLOCK | 10045 | Section 15.1.8.2 | | NFS4ERR_DEADLOCK | 10045 | Section 15.1.8.2 |
| NFS4ERR_DEADSESSION | 10078 | Section 15.1.11.4 | | NFS4ERR_DEADSESSION | 10078 | Section 15.1.11.5 |
| NFS4ERR_DELAY | 10008 | Section 15.1.1.3 | | NFS4ERR_DELAY | 10008 | Section 15.1.1.3 |
| NFS4ERR_DELEG_ALREADY_WANTED | 10056 | Section 15.1.14.1 | | NFS4ERR_DELEG_ALREADY_WANTED | 10056 | Section 15.1.14.1 |
| NFS4ERR_DENIED | 10010 | Section 15.1.8.3 | | NFS4ERR_DENIED | 10010 | Section 15.1.8.3 |
| NFS4ERR_DIRDELEG_UNAVAIL | 10084 | Section 15.1.14.2 | | NFS4ERR_DIRDELEG_UNAVAIL | 10084 | Section 15.1.14.2 |
| NFS4ERR_DQUOT | 69 | Section 15.1.4.2 | | NFS4ERR_DQUOT | 69 | Section 15.1.4.2 |
| NFS4ERR_ENCR_ALG_UNSUPP | 10079 | Section 15.1.13.3 | | NFS4ERR_ENCR_ALG_UNSUPP | 10079 | Section 15.1.13.3 |
| NFS4ERR_EXIST | 17 | Section 15.1.4.3 | | NFS4ERR_EXIST | 17 | Section 15.1.4.3 |
| NFS4ERR_EXPIRED | 10011 | Section 15.1.5.4 | | NFS4ERR_EXPIRED | 10011 | Section 15.1.5.4 |
| NFS4ERR_FBIG | 27 | Section 15.1.4.4 | | NFS4ERR_FBIG | 27 | Section 15.1.4.4 |
| NFS4ERR_FHEXPIRED | 10014 | Section 15.1.2.2 | | NFS4ERR_FHEXPIRED | 10014 | Section 15.1.2.2 |
| NFS4ERR_FILE_OPEN | 10046 | Section 15.1.4.5 | | NFS4ERR_FILE_OPEN | 10046 | Section 15.1.4.5 |
| NFS4ERR_GRACE | 10013 | Section 15.1.9.2 | | NFS4ERR_GRACE | 10013 | Section 15.1.9.2 |
| NFS4ERR_HASH_ALG_UNSUPP | 10072 | Section 15.1.13.4 | | NFS4ERR_HASH_ALG_UNSUPP | 10072 | Section 15.1.13.4 |
| NFS4ERR_INVAL | 22 | Section 15.1.1.4 | | NFS4ERR_INVAL | 22 | Section 15.1.1.4 |
| NFS4ERR_IO | 5 | Section 15.1.4.6 | | NFS4ERR_IO | 5 | Section 15.1.4.6 |
| NFS4ERR_ISDIR | 21 | Section 15.1.2.3 | | NFS4ERR_ISDIR | 21 | Section 15.1.2.3 |
| NFS4ERR_LAYOUTTRYLATER | 10058 | Section 15.1.10.3 | | NFS4ERR_LAYOUTTRYLATER | 10058 | Section 15.1.10.3 |
| NFS4ERR_LAYOUTUNAVAILABLE | 10059 | Section 15.1.10.4 | | NFS4ERR_LAYOUTUNAVAILABLE | 10059 | Section 15.1.10.4 |
| NFS4ERR_LEASE_MOVED | 10031 | Section 15.1.16.3 | | NFS4ERR_LEASE_MOVED | 10031 | Section 15.1.16.2 |
| NFS4ERR_LOCKED | 10012 | Section 15.1.8.4 | | NFS4ERR_LOCKED | 10012 | Section 15.1.8.4 |
| NFS4ERR_LOCKS_HELD | 10037 | Section 15.1.8.5 | | NFS4ERR_LOCKS_HELD | 10037 | Section 15.1.8.5 |
| NFS4ERR_LOCK_NOTSUPP | 10043 | Section 15.1.8.6 | | NFS4ERR_LOCK_NOTSUPP | 10043 | Section 15.1.8.6 |
| NFS4ERR_LOCK_RANGE | 10028 | Section 15.1.8.7 | | NFS4ERR_LOCK_RANGE | 10028 | Section 15.1.8.7 |
| NFS4ERR_MINOR_VERS_MISMATCH | 10021 | Section 15.1.3.2 | | NFS4ERR_MINOR_VERS_MISMATCH | 10021 | Section 15.1.3.2 |
| NFS4ERR_MLINK | 31 | Section 15.1.4.7 | | NFS4ERR_MLINK | 31 | Section 15.1.4.7 |
| NFS4ERR_MOVED | 10019 | Section 15.1.2.4 | | NFS4ERR_MOVED | 10019 | Section 15.1.2.4 |
| NFS4ERR_NAMETOOLONG | 63 | Section 15.1.7.3 | | NFS4ERR_NAMETOOLONG | 63 | Section 15.1.7.3 |
| NFS4ERR_NOENT | 2 | Section 15.1.4.8 | | NFS4ERR_NOENT | 2 | Section 15.1.4.8 |
| NFS4ERR_NOFILEHANDLE | 10020 | Section 15.1.2.5 | | NFS4ERR_NOFILEHANDLE | 10020 | Section 15.1.2.5 |
| NFS4ERR_NOMATCHING_LAYOUT | 10060 | Section 15.1.10.5 | | NFS4ERR_NOMATCHING_LAYOUT | 10060 | Section 15.1.10.5 |
| NFS4ERR_NOSPC | 28 | Section 15.1.4.9 | | NFS4ERR_NOSPC | 28 | Section 15.1.4.9 |
| NFS4ERR_NOTDIR | 20 | Section 15.1.2.6 | | NFS4ERR_NOTDIR | 20 | Section 15.1.2.6 |
| NFS4ERR_NOTEMPTY | 66 | Section 15.1.4.10 | | NFS4ERR_NOTEMPTY | 66 | Section 15.1.4.10 |
| NFS4ERR_NOTSUPP | 10004 | Section 15.1.1.5 | | NFS4ERR_NOTSUPP | 10004 | Section 15.1.1.5 |
| NFS4ERR_NOT_ONLY_OP | 10081 | Section 15.1.3.3 | | NFS4ERR_NOT_ONLY_OP | 10081 | Section 15.1.3.3 |
| NFS4ERR_NOT_SAME | 10027 | Section 15.1.15.3 | | NFS4ERR_NOT_SAME | 10027 | Section 15.1.15.3 |
| NFS4ERR_NO_GRACE | 10033 | Section 15.1.9.3 | | NFS4ERR_NO_GRACE | 10033 | Section 15.1.9.3 |
| NFS4ERR_NXIO | 6 | Section 15.1.16.4 | | NFS4ERR_NXIO | 6 | Section 15.1.16.3 |
| NFS4ERR_OLD_STATEID | 10024 | Section 15.1.5.5 | | NFS4ERR_OLD_STATEID | 10024 | Section 15.1.5.5 |
| NFS4ERR_OPENMODE | 10038 | Section 15.1.8.8 | | NFS4ERR_OPENMODE | 10038 | Section 15.1.8.8 |
| NFS4ERR_OP_ILLEGAL | 10044 | Section 15.1.3.4 | | NFS4ERR_OP_ILLEGAL | 10044 | Section 15.1.3.4 |
| NFS4ERR_OP_NOT_IN_SESSION | 10070 | Section 15.1.3.5 | | NFS4ERR_OP_NOT_IN_SESSION | 10071 | Section 15.1.3.5 |
| NFS4ERR_PERM | 1 | Section 15.1.6.2 | | NFS4ERR_PERM | 1 | Section 15.1.6.2 |
| NFS4ERR_PNFS_IO_HOLE | 10075 | Section 15.1.10.6 | | NFS4ERR_PNFS_IO_HOLE | 10075 | Section 15.1.10.6 |
| NFS4ERR_PNFS_NO_LAYOUT | 10080 | Section 15.1.10.7 | | NFS4ERR_PNFS_NO_LAYOUT | 10080 | Section 15.1.10.7 |
| NFS4ERR_RECALLCONFLICT | 10061 | Section 15.1.14.3 | | NFS4ERR_RECALLCONFLICT | 10061 | Section 15.1.14.3 |
| NFS4ERR_RECLAIM_BAD | 10034 | Section 15.1.9.4 | | NFS4ERR_RECLAIM_BAD | 10034 | Section 15.1.9.4 |
| NFS4ERR_RECLAIM_CONFLICT | 10035 | Section 15.1.9.5 | | NFS4ERR_RECLAIM_CONFLICT | 10035 | Section 15.1.9.5 |
| NFS4ERR_REJECT_DELEG | 10085 | Section 15.1.14.4 | | NFS4ERR_REJECT_DELEG | 10085 | Section 15.1.14.4 |
| NFS4ERR_REP_TOO_BIG | 10066 | Section 15.1.3.6 | | NFS4ERR_REP_TOO_BIG | 10066 | Section 15.1.3.6 |
| NFS4ERR_REP_TOO_BIG_TO_CACHE | 10067 | Section 15.1.3.7 | | NFS4ERR_REP_TOO_BIG_TO_CACHE | 10067 | Section 15.1.3.7 |
| NFS4ERR_REQ_TOO_BIG | 10065 | Section 15.1.3.8 | | NFS4ERR_REQ_TOO_BIG | 10065 | Section 15.1.3.8 |
| NFS4ERR_RESTOREFH | 10030 | Section 15.1.16.5 | | NFS4ERR_RESTOREFH | 10030 | Section 15.1.16.4 |
| NFS4ERR_RETRY_UNCACHED_REP | 10068 | Section 15.1.3.9 | | NFS4ERR_RETRY_UNCACHED_REP | 10068 | Section 15.1.3.9 |
| NFS4ERR_RETURNCONFLICT | 10086 | Section 15.1.10.8 | | NFS4ERR_RETURNCONFLICT | 10086 | Section 15.1.10.8 |
| NFS4ERR_ROFS | 30 | Section 15.1.4.11 | | NFS4ERR_ROFS | 30 | Section 15.1.4.11 |
| NFS4ERR_SAME | 10009 | Section 15.1.15.4 | | NFS4ERR_SAME | 10009 | Section 15.1.15.4 |
| NFS4ERR_SHARE_DENIED | 10015 | Section 15.1.8.9 | | NFS4ERR_SHARE_DENIED | 10015 | Section 15.1.8.9 |
| NFS4ERR_SEQUENCE_POS | 10064 | Section 15.1.3.10 | | NFS4ERR_SEQUENCE_POS | 10064 | Section 15.1.3.10 |
| NFS4ERR_SEQ_FALSE_RETRY | 10076 | Section 15.1.11.6 | | NFS4ERR_SEQ_FALSE_RETRY | 10076 | Section 15.1.11.7 |
| NFS4ERR_SEQ_MISORDERED | 10063 | Section 15.1.11.7 | | NFS4ERR_SEQ_MISORDERED | 10063 | Section 15.1.11.8 |
| NFS4ERR_SERVERFAULT | 10006 | Section 15.1.1.6 | | NFS4ERR_SERVERFAULT | 10006 | Section 15.1.1.6 |
| NFS4ERR_STALE | 70 | Section 15.1.2.7 | | NFS4ERR_STALE | 70 | Section 15.1.2.7 |
| NFS4ERR_STALE_CLIENTID | 10022 | Section 15.1.13.5 | | NFS4ERR_STALE_CLIENTID | 10022 | Section 15.1.13.5 |
| NFS4ERR_STALE_STATEID | 10023 | Section 15.1.16.6 | | NFS4ERR_STALE_STATEID | 10023 | Section 15.1.16.5 |
| NFS4ERR_SYMLINK | 10029 | Section 15.1.2.8 | | NFS4ERR_SYMLINK | 10029 | Section 15.1.2.8 |
| NFS4ERR_TOOSMALL | 10005 | Section 15.1.1.7 | | NFS4ERR_TOOSMALL | 10005 | Section 15.1.1.7 |
| NFS4ERR_TOO_MANY_OPS | 10070 | Section 15.1.3.11 | | NFS4ERR_TOO_MANY_OPS | 10070 | Section 15.1.3.11 |
| NFS4ERR_UNKNOWN_LAYOUTTYPE | 10062 | Section 15.1.10.9 | | NFS4ERR_UNKNOWN_LAYOUTTYPE | 10062 | Section 15.1.10.9 |
| NFS4ERR_UNSAFE_COMPOUND | 10069 | Section 15.1.3.12 | | NFS4ERR_UNSAFE_COMPOUND | 10069 | Section 15.1.3.12 |
| NFS4ERR_WRONGSEC | 10016 | Section 15.1.6.3 | | NFS4ERR_WRONGSEC | 10016 | Section 15.1.6.3 |
| NFS4ERR_WRONG_CRED | 10082 | Section 15.1.6.4 | | NFS4ERR_WRONG_CRED | 10082 | Section 15.1.6.4 |
| NFS4ERR_WRONG_TYPE | 10083 | Section 15.1.2.9 | | NFS4ERR_WRONG_TYPE | 10083 | Section 15.1.2.9 |
| NFS4ERR_XDEV | 18 | Section 15.1.4.12 | | NFS4ERR_XDEV | 18 | Section 15.1.4.12 |
+-----------------------------------+--------+-------------------+ +-----------------------------------+--------+-------------------+
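The -19 column assigns 10070 to both NFS4ERR_OP_NOT_IN_SESSION and NFS4ERR_TOO_MANY_OPS. The -20 column renumbers the former to 10071. A collision of this kind can be caught mechanically. The following is a minimal C sketch that assumes the table has already been loaded into an array of (name, code) pairs. The struct and function names are hypothetical and do not come from the draft.

```c
#include <stddef.h>

/* One row of the error-code table: symbolic name and numeric value.
 * Hypothetical representation, not part of the draft's XDR. */
struct nfs4_err_entry {
    const char *name;
    int code;
};

/* Counts pairs (i, j) with i < j whose numeric codes collide.
 * Quadratic, which is acceptable for a table of a few hundred rows. */
size_t count_code_collisions(const struct nfs4_err_entry *tab, size_t n)
{
    size_t collisions = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (tab[i].code == tab[j].code)
                collisions++;
    return collisions;
}
```

When fed the rows for NFS4ERR_OP_NOT_IN_SESSION and NFS4ERR_TOO_MANY_OPS, this check reports one collision for the -19 values and none for the -20 values.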
skipping to change at page 321, line 38 skipping to change at page 323, line 28
15.1.3.4. NFS4ERR_OP_ILLEGAL (Error Code 10044) 15.1.3.4. NFS4ERR_OP_ILLEGAL (Error Code 10044)
The operation code is not a valid one for the current Compound The operation code is not a valid one for the current Compound
procedure. The opcode in the result stream matched with this error procedure. The opcode in the result stream matched with this error
is the ILLEGAL value, although the value that appears in the request is the ILLEGAL value, although the value that appears in the request
stream may be different. Where an illegal value appears and the stream may be different. Where an illegal value appears and the
replier pre-parses all ops for a Compound procedure before doing any replier pre-parses all ops for a Compound procedure before doing any
operation execution, an RPC-level XDR error may be returned in this operation execution, an RPC-level XDR error may be returned in this
case. case.
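The key point about NFS4ERR_OP_ILLEGAL is that the result entry carries the ILLEGAL opcode and not the value the requester sent. The following is a minimal C sketch of that rule. It assumes the valid opcodes form the contiguous range 3 through 58 and that the ILLEGAL opcode is 10044, as in the final NFSv4.1 XDR. Both assumptions should be checked against this draft's nfs_opnum4. The struct and function names are hypothetical.

```c
#include <stdint.h>

#define NFS4ERR_OP_ILLEGAL 10044  /* Section 15.1.3.4 */
#define OP_ILLEGAL         10044  /* assumed value of the ILLEGAL opcode */
#define OP_FIRST_VALID     3      /* assumed: OP_ACCESS */
#define OP_LAST_VALID      58     /* assumed: OP_RECLAIM_COMPLETE */

/* Hypothetical per-operation result header. */
struct op_result {
    uint32_t resop;   /* opcode placed in the result stream */
    int      status;
};

/* Returns 1 if the opcode may be executed. For an out-of-range opcode it
 * returns 0 and fills the result with OP_ILLEGAL / NFS4ERR_OP_ILLEGAL,
 * regardless of what value appeared in the request. */
int check_opcode(uint32_t opcode, struct op_result *res)
{
    if (opcode < OP_FIRST_VALID || opcode > OP_LAST_VALID) {
        res->resop = OP_ILLEGAL;
        res->status = NFS4ERR_OP_ILLEGAL;
        return 0;
    }
    res->resop = opcode;
    res->status = 0;
    return 1;
}
```

A replier that pre-parses the whole Compound could instead reject it at the RPC level with an XDR error, as the text allows.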
15.1.3.5. NFS4ERR_OP_NOT_IN_SESSION (Error Code 10070) 15.1.3.5. NFS4ERR_OP_NOT_IN_SESSION (Error Code 10071)
Most forward operations and all callback operations are only valid Most forward operations and all callback operations are only valid
within the context of a session, so that the Compound request in within the context of a session, so that the Compound request in
question must begin with a Sequence operation. If an attempt is made question must begin with a Sequence operation. If an attempt is made
to execute these operations outside the context of a session, this to execute these operations outside the context of a session, this
error results. error results.
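The check behind NFS4ERR_OP_NOT_IN_SESSION can be sketched as follows. If the Compound starts with SEQUENCE, session context exists. Otherwise, every operation must belong to the set that may run outside a session. This is a minimal C sketch that assumes the caller supplies that exempt set, because the text only says "most" forward operations need a session. The SEQUENCE opcode value 53 is assumed from the final NFSv4.1 XDR.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NFS4ERR_OP_NOT_IN_SESSION 10071  /* value per the -20 column */
#define OP_SEQUENCE 53                   /* assumed opcode value */

/* Returns 0 if the Compound may proceed, else NFS4ERR_OP_NOT_IN_SESSION.
 * exempt[] lists opcodes the server allows without a session (hypothetical,
 * supplied by the caller). */
int check_session_context(const uint32_t *ops, size_t nops,
                          const uint32_t *exempt, size_t nexempt)
{
    if (nops > 0 && ops[0] == OP_SEQUENCE)
        return 0;                        /* session context established */
    for (size_t i = 0; i < nops; i++) {
        bool ok = false;
        for (size_t j = 0; j < nexempt; j++)
            if (ops[i] == exempt[j]) { ok = true; break; }
        if (!ok)
            return NFS4ERR_OP_NOT_IN_SESSION;
    }
    return 0;
}
```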
15.1.3.6. NFS4ERR_REP_TOO_BIG (Error Code 10066) 15.1.3.6. NFS4ERR_REP_TOO_BIG (Error Code 10066)
The reply to a Compound would exceed the channel's negotiated maximum The reply to a Compound would exceed the channel's negotiated maximum
skipping to change at page 329, line 35 skipping to change at page 331, line 18
or the particular specified file. or the particular specified file.
15.1.10.5. NFS4ERR_NOMATCHING_LAYOUT (Error Code 10060) 15.1.10.5. NFS4ERR_NOMATCHING_LAYOUT (Error Code 10060)
Returned when layouts are recalled and the client has no layouts Returned when layouts are recalled and the client has no layouts
matching the specification of the layouts being recalled. matching the specification of the layouts being recalled.
15.1.10.6. NFS4ERR_PNFS_IO_HOLE (Error Code 10075) 15.1.10.6. NFS4ERR_PNFS_IO_HOLE (Error Code 10075)
The pNFS client has attempted to read from or write to an illegal The pNFS client has attempted to read from or write to an illegal
hole of a file of a data server that is using the STRIPE4_SPARSE hole of a file of a data server that is using sparse packing. See
stripe type. See Section 13.4.4. Section 13.4.4.
15.1.10.7. NFS4ERR_PNFS_NO_LAYOUT (Error Code 10080) 15.1.10.7. NFS4ERR_PNFS_NO_LAYOUT (Error Code 10080)
The pNFS client has attempted to read from or write to a file (using The pNFS client has attempted to read from or write to a file (using
a request to a data server) without holding a valid layout. This a request to a data server) without holding a valid layout. This
includes the case where the client had a layout, but the iomode does includes the case where the client had a layout, but the iomode does
not allow a WRITE. not allow a WRITE.
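A data server returns NFS4ERR_PNFS_NO_LAYOUT in two cases: no layout is held, or the held layout's iomode does not permit a WRITE. The following minimal C sketch covers both. The iomode values (READ = 1, RW = 2) are assumed from the final NFSv4.1 XDR. The per-client layout state structure is hypothetical and stands in for whatever the data server actually tracks.

```c
#include <stdbool.h>

#define NFS4ERR_PNFS_NO_LAYOUT 10080  /* Section 15.1.10.7 */

enum layoutiomode { IOMODE_READ = 1, IOMODE_RW = 2 };  /* assumed values */

/* Hypothetical data-server view of the layout a client holds for a file. */
struct ds_layout_state {
    bool held;
    enum layoutiomode iomode;
};

/* Returns 0 if the I/O may proceed, else NFS4ERR_PNFS_NO_LAYOUT. */
int ds_check_layout(const struct ds_layout_state *st, bool is_write)
{
    if (!st->held)
        return NFS4ERR_PNFS_NO_LAYOUT;   /* no layout at all */
    if (is_write && st->iomode != IOMODE_RW)
        return NFS4ERR_PNFS_NO_LAYOUT;   /* layout held, but read-only */
    return 0;
}
```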
15.1.10.8. NFS4ERR_RETURNCONFLICT (Error Code 10086) 15.1.10.8. NFS4ERR_RETURNCONFLICT (Error Code 10086)
skipping to change at page 330, line 31 skipping to change at page 332, line 11
The requester sent a Sequence operation that attempted to use a slot The requester sent a Sequence operation that attempted to use a slot
the replier does not have in its slot table. It is possible the slot the replier does not have in its slot table. It is possible the slot
may have been retired. may have been retired.
15.1.11.3. NFS4ERR_BAD_HIGH_SLOT (Error Code 10077) 15.1.11.3. NFS4ERR_BAD_HIGH_SLOT (Error Code 10077)
The highest_slot argument in a Sequence operation exceeds the The highest_slot argument in a Sequence operation exceeds the
replier's enforced highest_slotid. replier's enforced highest_slotid.
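The two slot-related checks above can be sketched as a single validation step on the replier. The code for NFS4ERR_BADSLOT (10053) does not appear in this excerpt and is assumed from the final NFSv4.1 XDR. The order in which the two checks run is also an assumption. The slot-table structure is hypothetical.

```c
#include <stdint.h>

#define NFS4ERR_BADSLOT       10053  /* assumed; heading not shown here */
#define NFS4ERR_BAD_HIGH_SLOT 10077  /* Section 15.1.11.3 */

/* Hypothetical summary of the replier's slot table. */
struct slot_table {
    uint32_t nslots;            /* slots 0..nslots-1 exist */
    uint32_t enforced_highest;  /* replier's enforced highest_slotid */
};

/* Validates the slot id and highest_slot arguments of a Sequence op. */
int check_slot_args(const struct slot_table *t, uint32_t slotid,
                    uint32_t highest_slot)
{
    if (slotid >= t->nslots)
        return NFS4ERR_BADSLOT;          /* slot not in table (maybe retired) */
    if (highest_slot > t->enforced_highest)
        return NFS4ERR_BAD_HIGH_SLOT;
    return 0;
}
```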
15.1.11.4. NFS4ERR_DEADSESSION (Error Code 10078) 15.1.11.4. NFS4ERR_CB_PATH_DOWN (Error Code 10048)
There is a problem contacting the client via the callback path. The
function of this error has been mostly superseded by the use of
status flags in the reply to the SEQUENCE operation (see
Section 18.46).
15.1.11.5. NFS4ERR_DEADSESSION (Error Code 10078)
The specified session is a persistent session which is dead and does The specified session is a persistent session which is dead and does
not accept new requests or perform new operations on existing not accept new requests or perform new operations on existing
requests (in the case in which a request was partially executed requests (in the case in which a request was partially executed
before server restart). before server restart).
15.1.11.5. NFS4ERR_CONN_NOT_BOUND_TO_SESSION (Error Code 10055) 15.1.11.6. NFS4ERR_CONN_NOT_BOUND_TO_SESSION (Error Code 10055)
A Sequence operation was sent on a connection that has not been A Sequence operation was sent on a connection that has not been
associated with the specified session, in an environment where the associated with the specified session, in an environment where the
associated client ID specified that connection binding be enforced. associated client ID specified that connection binding be enforced.
15.1.11.6. NFS4ERR_SEQ_FALSE_RETRY (Error Code 10076) 15.1.11.7. NFS4ERR_SEQ_FALSE_RETRY (Error Code 10076)
The requester sent a Sequence operation with a slot id and sequence The requester sent a Sequence operation with a slot id and sequence
id that are in the reply cache, but the replier has detected that the id that are in the reply cache, but the replier has detected that the
retried request is not the same as the original request. retried request is not the same as the original request.
15.1.11.7. NFS4ERR_SEQ_MISORDERED (Error Code 10063) 15.1.11.8. NFS4ERR_SEQ_MISORDERED (Error Code 10063)
The requester sent a Sequence operation with an invalid sequence id. The requester sent a Sequence operation with an invalid sequence id.
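The classification that leads to NFS4ERR_SEQ_FALSE_RETRY or NFS4ERR_SEQ_MISORDERED can be sketched per slot. A sequence id one greater than the cached one (with 32-bit wraparound) is a new request. An equal id is a retry, and it is replayed from the cache only if it matches the original. Any other id is misordered. This is a minimal C sketch. The text does not say how a replier detects that a retry differs from the original, so the req_digest field below is a hypothetical stand-in for that comparison.

```c
#include <stdint.h>

#define NFS4ERR_SEQ_MISORDERED  10063
#define NFS4ERR_SEQ_FALSE_RETRY 10076

enum seq_action { SEQ_EXECUTE, SEQ_REPLAY, SEQ_ERROR };

/* Hypothetical per-slot reply-cache entry. */
struct slot {
    uint32_t seqid;       /* sequence id of the last request on this slot */
    uint64_t req_digest;  /* hypothetical fingerprint of that request */
};

enum seq_action classify_sequence(const struct slot *s, uint32_t seqid,
                                  uint64_t req_digest, int *err)
{
    *err = 0;
    if (seqid == (uint32_t)(s->seqid + 1))
        return SEQ_EXECUTE;                   /* new request */
    if (seqid == s->seqid) {
        if (req_digest != s->req_digest) {
            *err = NFS4ERR_SEQ_FALSE_RETRY;   /* same ids, different request */
            return SEQ_ERROR;
        }
        return SEQ_REPLAY;                    /* genuine retry: send cached reply */
    }
    *err = NFS4ERR_SEQ_MISORDERED;
    return SEQ_ERROR;
}
```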
15.1.12. Session Management Errors 15.1.12. Session Management Errors
This section deals with errors associated with requests used in This section deals with errors associated with requests used in
session management. session management.
15.1.12.1. NFS4ERR_BACK_CHAN_BUSY (Error Code 10057) 15.1.12.1. NFS4ERR_BACK_CHAN_BUSY (Error Code 10057)
skipping to change at page 334, line 5 skipping to change at page 335, line 37
o There has been a restructuring of some errors for NFSv4.1 which o There has been a restructuring of some errors for NFSv4.1 which
resulted in the elimination of certain of the errors. resulted in the elimination of certain of the errors.
15.1.16.1. NFS4ERR_BAD_SEQID (Error Code 10026) 15.1.16.1. NFS4ERR_BAD_SEQID (Error Code 10026)
The sequence number in a locking request is neither the next expected The sequence number in a locking request is neither the next expected
number nor the last number processed. These sequence ids are ignored number nor the last number processed. These sequence ids are ignored
in NFSv4.1. in NFSv4.1.
15.1.16.2. NFS4ERR_CB_PATH_DOWN (Error Code 10048) 15.1.16.2. NFS4ERR_LEASE_MOVED (Error Code 10031)
There is a problem contacting the client via the callback path
15.1.16.3. NFS4ERR_LEASE_MOVED (Error Code 10031)
A lease being renewed is associated with a file system that has been A lease being renewed is associated with a file system that has been
migrated to a new server. migrated to a new server.
15.1.16.4. NFS4ERR_NXIO (Error Code 5) 15.1.16.3. NFS4ERR_NXIO (Error Code 5)
I/O error. No such device or address. I/O error. No such device or address.
15.1.16.5. NFS4ERR_RESTOREFH (Error Code 10030) 15.1.16.4. NFS4ERR_RESTOREFH (Error Code 10030)
The RESTOREFH operation does not have a saved filehandle (identified The RESTOREFH operation does not have a saved filehandle (identified
by SAVEFH) to operate upon. by SAVEFH) to operate upon.
15.1.16.6. NFS4ERR_STALE_STATEID (Error Code 10023) 15.1.16.5. NFS4ERR_STALE_STATEID (Error Code 10023)
A stateid generated by an earlier server instance was used. A stateid generated by an earlier server instance was used.
15.2. Operations and their valid errors 15.2. Operations and their valid errors
This section contains a table which gives the valid error returns for This section contains a table which gives the valid error returns for
each protocol operation. The error code NFS4_OK (indicating no each protocol operation. The error code NFS4_OK (indicating no
error) is not listed but should be understood to be returnable by all error) is not listed but should be understood to be returnable by all
operations with two important exceptions: operations with two important exceptions:
skipping to change at page 337, line 26 skipping to change at page 338, line 43
| | NFS4ERR_WRONG_CRED | | | NFS4ERR_WRONG_CRED |
| DESTROY_CLIENTID | NFS4ERR_BADXDR, NFS4ERR_CLIENTID_BUSY, | | DESTROY_CLIENTID | NFS4ERR_BADXDR, NFS4ERR_CLIENTID_BUSY, |
| | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, |
| | NFS4ERR_NOT_ONLY_OP, NFS4ERR_REP_TOO_BIG, | | | NFS4ERR_NOT_ONLY_OP, NFS4ERR_REP_TOO_BIG, |
| | NFS4ERR_REP_TOO_BIG_TO_CACHE, | | | NFS4ERR_REP_TOO_BIG_TO_CACHE, |
| | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, |
| | NFS4ERR_STALE_CLIENTID, | | | NFS4ERR_STALE_CLIENTID, |
| | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_CRED | | | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_CRED |
| DESTROY_SESSION | NFS4ERR_BACK_CHAN_BUSY, | | DESTROY_SESSION | NFS4ERR_BACK_CHAN_BUSY, |
| | NFS4ERR_BADSESSION, NFS4ERR_BADXDR, | | | NFS4ERR_BADSESSION, NFS4ERR_BADXDR, |
| | NFS4ERR_CB_PATH_DOWN, |
| | NFS4ERR_CONN_NOT_BOUND_TO_SESSION, | | | NFS4ERR_CONN_NOT_BOUND_TO_SESSION, |
| | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, |
| | NFS4ERR_NOT_ONLY_OP, NFS4ERR_REP_TOO_BIG, | | | NFS4ERR_NOT_ONLY_OP, NFS4ERR_REP_TOO_BIG, |
| | NFS4ERR_REP_TOO_BIG_TO_CACHE, | | | NFS4ERR_REP_TOO_BIG_TO_CACHE, |
| | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, | | | NFS4ERR_REQ_TOO_BIG, NFS4ERR_SERVERFAULT, |
| | NFS4ERR_STALE_CLIENTID, | | | NFS4ERR_STALE_CLIENTID, |
| | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_CRED | | | NFS4ERR_TOO_MANY_OPS, NFS4ERR_WRONG_CRED |
| EXCHANGE_ID | NFS4ERR_BADCHAR, NFS4ERR_BADXDR, | | EXCHANGE_ID | NFS4ERR_BADCHAR, NFS4ERR_BADXDR, |
| | NFS4ERR_CLID_INUSE, NFS4ERR_DEADSESSION, | | | NFS4ERR_CLID_INUSE, NFS4ERR_DEADSESSION, |
| | NFS4ERR_DELAY, NFS4ERR_ENCR_ALG_UNSUPP, | | | NFS4ERR_DELAY, NFS4ERR_ENCR_ALG_UNSUPP, |
skipping to change at page 354, line 45 skipping to change at page 356, line 45
| | SEQUENCE | | | SEQUENCE |
| NFS4ERR_BAD_RANGE | LOCK, LOCKT, LOCKU | | NFS4ERR_BAD_RANGE | LOCK, LOCKT, LOCKU |
| NFS4ERR_BAD_SESSION_DIGEST | BIND_CONN_TO_SESSION, SET_SSV | | NFS4ERR_BAD_SESSION_DIGEST | BIND_CONN_TO_SESSION, SET_SSV |
| NFS4ERR_BAD_STATEID | CB_LAYOUTRECALL, CB_NOTIFY, | | NFS4ERR_BAD_STATEID | CB_LAYOUTRECALL, CB_NOTIFY, |
| | CB_NOTIFY_LOCK, CB_RECALL, | | | CB_NOTIFY_LOCK, CB_RECALL, |
| | CLOSE, DELEGRETURN, | | | CLOSE, DELEGRETURN, |
| | FREE_STATEID, LAYOUTGET, | | | FREE_STATEID, LAYOUTGET, |
| | LAYOUTRETURN, LOCK, LOCKU, | | | LAYOUTRETURN, LOCK, LOCKU, |
| | OPEN, OPEN_DOWNGRADE, READ, | | | OPEN, OPEN_DOWNGRADE, READ, |
| | SETATTR, WRITE | | | SETATTR, WRITE |
| NFS4ERR_CB_PATH_DOWN | DESTROY_SESSION |
| NFS4ERR_CLID_INUSE | EXCHANGE_ID | | NFS4ERR_CLID_INUSE | EXCHANGE_ID |
| NFS4ERR_CLIENTID_BUSY | DESTROY_CLIENTID | | NFS4ERR_CLIENTID_BUSY | DESTROY_CLIENTID |
| NFS4ERR_COMPLETE_ALREADY | RECLAIM_COMPLETE | | NFS4ERR_COMPLETE_ALREADY | RECLAIM_COMPLETE |
| NFS4ERR_CONN_BINDING_NOT_ENFORCED | BIND_CONN_TO_SESSION, SET_SSV | | NFS4ERR_CONN_BINDING_NOT_ENFORCED | BIND_CONN_TO_SESSION, SET_SSV |
| NFS4ERR_CONN_NOT_BOUND_TO_SESSION | CB_SEQUENCE, DESTROY_SESSION, | | NFS4ERR_CONN_NOT_BOUND_TO_SESSION | CB_SEQUENCE, DESTROY_SESSION, |
| | SEQUENCE | | | SEQUENCE |
| NFS4ERR_DEADLOCK | LOCK | | NFS4ERR_DEADLOCK | LOCK |
| NFS4ERR_DEADSESSION | ACCESS, BACKCHANNEL_CTL, | | NFS4ERR_DEADSESSION | ACCESS, BACKCHANNEL_CTL, |
| | BIND_CONN_TO_SESSION, CLOSE, | | | BIND_CONN_TO_SESSION, CLOSE, |
| | COMMIT, CREATE, | | | COMMIT, CREATE, |
skipping to change at page 372, line 51 skipping to change at page 374, line 51
case OP_SEQUENCE: SEQUENCE4res opsequence; case OP_SEQUENCE: SEQUENCE4res opsequence;
case OP_SET_SSV: SET_SSV4res opset_ssv; case OP_SET_SSV: SET_SSV4res opset_ssv;
case OP_TEST_STATEID: TEST_STATEID4res optest_stateid; case OP_TEST_STATEID: TEST_STATEID4res optest_stateid;
case OP_WANT_DELEGATION: case OP_WANT_DELEGATION:
WANT_DELEGATION4res WANT_DELEGATION4res
opwant_delegation; opwant_delegation;
case OP_DESTROY_CLIENTID: case OP_DESTROY_CLIENTID:
DESTROY_CLIENTID4res DESTROY_CLIENTID4res
opwant_destroy_clientid; opdestroy_clientid;
case OP_RECLAIM_COMPLETE: case OP_RECLAIM_COMPLETE:
RECLAIM_COMPLETE4res RECLAIM_COMPLETE4res
opreclaim_complete; opreclaim_complete;
/* Operations not new to NFSv4.1 */ /* Operations not new to NFSv4.1 */
case OP_ILLEGAL: ILLEGAL4res opillegal; case OP_ILLEGAL: ILLEGAL4res opillegal;
}; };
struct COMPOUND4res { struct COMPOUND4res {
nfsstat4 status; nfsstat4 status;
skipping to change at page 430, line 6 skipping to change at page 432, line 6
}; };
18.20.3. DESCRIPTION 18.20.3. DESCRIPTION
Replaces the current filehandle with the filehandle that represents Replaces the current filehandle with the filehandle that represents
the public filehandle of the server's name space. This filehandle the public filehandle of the server's name space. This filehandle
may be different from the "root" filehandle which may be associated may be different from the "root" filehandle which may be associated
with some other directory on the server. with some other directory on the server.
The public filehandle represents the concepts embodied in RFC2054 The public filehandle represents the concepts embodied in RFC2054
[32], RFC2055 [33], RFC2224 [39]. The intent for NFSv4.1 is that the [32], RFC2055 [33], RFC2224 [41]. The intent for NFSv4.1 is that the
public filehandle (represented by the PUTPUBFH operation) be used as public filehandle (represented by the PUTPUBFH operation) be used as
a method of providing WebNFS server compatibility with NFSv3. a method of providing WebNFS server compatibility with NFSv3.
The public filehandle and the root filehandle (represented by the The public filehandle and the root filehandle (represented by the
PUTROOTFH operation) should be equivalent. If the public and root PUTROOTFH operation) should be equivalent. If the public and root
filehandles are not equivalent, then the public filehandle MUST be a filehandles are not equivalent, then the public filehandle MUST be a
descendant of the root filehandle. descendant of the root filehandle.
18.20.4. IMPLEMENTATION 18.20.4. IMPLEMENTATION
Used as the first operator in an NFS request to set the context for Used as the first operator in an NFS request to set the context for
following operations. following operations.
With the NFSv3 public filehandle, the client is able to specify With the NFSv3 public filehandle, the client is able to specify
whether the path name provided in the LOOKUP should be evaluated as whether the path name provided in the LOOKUP should be evaluated as
either an absolute path relative to the server's root or relative to either an absolute path relative to the server's root or relative to
the public filehandle. RFC2224 [39] contains further discussion of the public filehandle. RFC2224 [41] contains further discussion of
the functionality. With NFSv4.1, that type of specification is not the functionality. With NFSv4.1, that type of specification is not
directly available in the LOOKUP operation. The reason for this is directly available in the LOOKUP operation. The reason for this is
that the component separators needed to specify absolute vs. that the component separators needed to specify absolute vs.
relative are not allowed in NFSv4. Therefore, the client is relative are not allowed in NFSv4. Therefore, the client is
responsible for constructing its request such that the use of either responsible for constructing its request such that the use of either
PUTROOTFH or PUTPUBFH signifies absolute or relative PUTROOTFH or PUTPUBFH signifies absolute or relative
evaluation of an NFS URL respectively. evaluation of an NFS URL respectively.
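Because LOOKUP cannot carry component separators, a client has to turn an NFS URL path into a PUTROOTFH or PUTPUBFH followed by one LOOKUP per component. This is a minimal C sketch of that planning step. It assumes the URL has already been parsed and that a path with a leading '/' denotes absolute evaluation. That convention is an assumption of this sketch. RFC2224 defines the actual URL syntax. The enum and struct names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

enum fh_op { FH_PUTROOTFH, FH_PUTPUBFH, FH_LOOKUP };

/* One planned operation; comp/len are meaningful only for FH_LOOKUP. */
struct fh_step {
    enum fh_op op;
    const char *comp;  /* start of the component inside the path string */
    size_t len;        /* component length */
};

/* Writes PUTROOTFH (absolute) or PUTPUBFH (relative), then one LOOKUP per
 * non-empty component. Returns the number of steps, or 0 if out is too small. */
size_t plan_url_lookup(const char *path, struct fh_step *out, size_t max)
{
    size_t n = 0;
    if (max == 0)
        return 0;
    out[n++] = (struct fh_step){ path[0] == '/' ? FH_PUTROOTFH : FH_PUTPUBFH, NULL, 0 };
    const char *p = path;
    while (*p) {
        while (*p == '/')
            p++;                        /* skip separators */
        if (!*p)
            break;
        size_t len = strcspn(p, "/");   /* length of this component */
        if (n == max)
            return 0;
        out[n++] = (struct fh_step){ FH_LOOKUP, p, len };
        p += len;
    }
    return n;
}
```

For example, "/export/home/a" yields PUTROOTFH followed by three LOOKUPs, and "docs/x" yields PUTPUBFH followed by two.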
Note that there are warnings mentioned in RFC2224 [39] with respect Note that there are warnings mentioned in RFC2224 [41] with respect
to the use of absolute evaluation and the restrictions the server may to the use of absolute evaluation and the restrictions the server may
place on that evaluation with respect to how much of its namespace place on that evaluation with respect to how much of its namespace
has been made available. These same warnings apply to NFSv4. It is has been made available. These same warnings apply to NFSv4. It is
likely, therefore, that because of server implementation details, an likely, therefore, that because of server implementation details, an
NFSv3 absolute public filehandle lookup may behave differently than NFSv3 absolute public filehandle lookup may behave differently than
an NFSv4.1 absolute resolution. an NFSv4.1 absolute resolution.
There is a form of security negotiation as described in RFC2755 [40] There is a form of security negotiation as described in RFC2755 [42]
that uses the public filehandle as a method of employing SNEGO. This that uses the public filehandle as a method of employing SNEGO. This
method is not available with NFSv4.1 as filehandles are not method is not available with NFSv4.1 as filehandles are not
overloaded with special meaning and therefore do not provide the same overloaded with special meaning and therefore do not provide the same
framework as NFSv3. Clients should therefore use the security framework as NFSv3. Clients should therefore use the security
negotiation mechanisms described in this RFC. negotiation mechanisms described in this RFC.
18.20.5. ERRORS 18.20.5. ERRORS
18.21. Operation 24: PUTROOTFH - Set Root Filehandle 18.21. Operation 24: PUTROOTFH - Set Root Filehandle
skipping to change at page 469, line 11 skipping to change at page 471, line 11
applies only to new stateids. Existing stateids (and all stateids applies only to new stateids. Existing stateids (and all stateids
with the same "other" field) that were created with stateid to with the same "other" field) that were created with stateid to
principal binding in force will continue to have binding in force. principal binding in force will continue to have binding in force.
Existing stateids (and all stateids with the same "other" field) that Existing stateids (and all stateids with the same "other" field) that
were created with stateid to principal binding not in force will continue to were created with stateid to principal binding not in force will continue to
have binding not in force. have binding not in force.
The EXCHGID4_FLAG_USE_NON_PNFS, EXCHGID4_FLAG_USE_PNFS_MDS, and The EXCHGID4_FLAG_USE_NON_PNFS, EXCHGID4_FLAG_USE_PNFS_MDS, and
EXCHGID4_FLAG_USE_PNFS_DS bits are described in Section 13.1 and EXCHGID4_FLAG_USE_PNFS_DS bits are described in Section 13.1 and
convey roles the client ID is to be used for in a pNFS environment convey roles the client ID is to be used for in a pNFS environment.
Note that the same client owner/server owner pair can have multiple The server MUST set one of the acceptable combinations of these bits
roles. Multiple roles can be associated with the same client ID or (roles) in eir_flags, as specified in Section 13.1. Note that the
with different client IDs. Thus, if a client sends EXCHANGE_ID from same client owner/server owner pair can have multiple roles.
the same client owner to the same server owner multiple times, but Multiple roles can be associated with the same client ID or with
different client IDs. Thus, if a client sends EXCHANGE_ID from the
same client owner to the same server owner multiple times, but
specifies different pNFS roles each time, the server might return specifies different pNFS roles each time, the server might return
different client IDs. Given that different pNFS roles might have different client IDs. Given that different pNFS roles might have
different client IDs, the client may ask for different properties for different client IDs, the client may ask for different properties for
each role/client ID. each role/client ID.
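The -20 text adds a requirement: the server MUST set one of the acceptable role combinations in eir_flags. The list of combinations lives in Section 13.1, which is not part of this excerpt. The following minimal C sketch assumes the flag values and the five acceptable combinations from the final NFSv4.1 specification: non-pNFS alone, MDS alone, DS alone, MDS plus DS, and non-pNFS plus DS. Verify both against Section 13.1 before relying on it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag values, taken from the final NFSv4.1 XDR. */
#define EXCHGID4_FLAG_USE_NON_PNFS  0x00010000u
#define EXCHGID4_FLAG_USE_PNFS_MDS  0x00020000u
#define EXCHGID4_FLAG_USE_PNFS_DS   0x00040000u
#define EXCHGID4_FLAG_MASK_PNFS     0x00070000u

/* Returns true if the pNFS role bits in eir_flags form an acceptable
 * combination (assumed list); other flag bits are ignored. */
bool eir_roles_acceptable(uint32_t eir_flags)
{
    switch (eir_flags & EXCHGID4_FLAG_MASK_PNFS) {
    case EXCHGID4_FLAG_USE_NON_PNFS:
    case EXCHGID4_FLAG_USE_PNFS_MDS:
    case EXCHGID4_FLAG_USE_PNFS_DS:
    case EXCHGID4_FLAG_USE_PNFS_MDS | EXCHGID4_FLAG_USE_PNFS_DS:
    case EXCHGID4_FLAG_USE_NON_PNFS | EXCHGID4_FLAG_USE_PNFS_DS:
        return true;
    default:
        return false;  /* no role set, or a combination not in the list */
    }
}
```

A client could apply the same check to eir_flags on receipt as a sanity test, since a reply with no role bits set would violate the MUST.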
The spa_how field of the eia_state_protect field specifies how the The spa_how field of the eia_state_protect field specifies how the
client wants to protect its client, locking and session state from client wants to protect its client, locking and session state from
unauthorized changes (Section 2.10.7.3): unauthorized changes (Section 2.10.7.3):
o SP4_NONE. The client does not request the NFSv4.1 server to o SP4_NONE. The client does not request the NFSv4.1 server to
skipping to change at page 484, line 10 skipping to change at page 487, line 10
Creation Response - Successful Acceptance" of [4]. Creation Response - Successful Acceptance" of [4].
The first RPCSEC_GSS handle, gcbp_handle_from_server, is the fore The first RPCSEC_GSS handle, gcbp_handle_from_server, is the fore
handle the server returned to the client (in the handle field of handle the server returned to the client (in the handle field of
data type rpc_gss_init_res) when the RPCSEC_GSS context was data type rpc_gss_init_res) when the RPCSEC_GSS context was
created on the server. The second handle, created on the server. The second handle,
gcbp_handle_from_client, is the back handle the client will map gcbp_handle_from_client, is the back handle the client will map
the RPCSEC_GSS context to. The server can immediately use the the RPCSEC_GSS context to. The server can immediately use the
value of gcbp_handle_from_client in the RPCSEC_GSS credential in value of gcbp_handle_from_client in the RPCSEC_GSS credential in
callback RPCs. I.e., the value in gcbp_handle_from_client can be callback RPCs. I.e., the value in gcbp_handle_from_client can be
used as the value of the the field "handle" in data type used as the value of the field "handle" in data type
rpc_gss_cred_t (see Section 5, "Elements of the RPCSEC_GSS rpc_gss_cred_t (see Section 5, "Elements of the RPCSEC_GSS
Security Protocol" of [4]) in callback RPCs. The server must use Security Protocol" of [4]) in callback RPCs. The server must use
the RPCSEC_GSS security service specified in gcbp_service, i.e. it the RPCSEC_GSS security service specified in gcbp_service, i.e. it
must set the the "service" field of the rpc_gss_cred_t data type must set the "service" field of the rpc_gss_cred_t data type in
in RPCSEC_GSS credential to the value of gcbp_service (see Section RPCSEC_GSS credential to the value of gcbp_service (see Section
5.3.1, "RPC Request Header", of [4]). 5.3.1, "RPC Request Header", of [4]).
If the RPCSEC_GSS handle identified by gcbp_handle_from_server If the RPCSEC_GSS handle identified by gcbp_handle_from_server
does not exist on the server, the server will return does not exist on the server, the server will return
NFS4ERR_NOENT. NFS4ERR_NOENT.
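The rules above for building the callback credential can be sketched as follows. The service comes from gcbp_service, the handle is gcbp_handle_from_client (the back handle), and the sequence number comes from the back context's own counter, which is independent of the fore context. This is a minimal C sketch. The structures are simplified, hypothetical stand-ins for the real XDR types (rpc_gss_cred_t has more fields), and MAX_GSS_HANDLE is an arbitrary bound chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_GSS_HANDLE 64  /* hypothetical bound for this sketch */

struct gss_handle {
    size_t len;
    unsigned char data[MAX_GSS_HANDLE];
};

/* Subset of the callback security parameters named in the text. */
struct gss_cb_params {
    uint32_t gcbp_service;
    struct gss_handle gcbp_handle_from_server;  /* fore handle */
    struct gss_handle gcbp_handle_from_client;  /* back handle */
};

/* Simplified stand-in for the rpc_gss_cred_t fields used here. */
struct cb_gss_cred {
    uint32_t service;
    uint32_t seq_num;
    struct gss_handle handle;
};

/* Fills the credential for a callback RPC on the back context. */
void fill_cb_cred(const struct gss_cb_params *p, uint32_t back_seq,
                  struct cb_gss_cred *cred)
{
    cred->service = p->gcbp_service;            /* MUST match gcbp_service */
    cred->handle = p->gcbp_handle_from_client;  /* back handle, not the fore one */
    cred->seq_num = back_seq;                   /* back context's own counter */
}
```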
Note that while the GSS context state is shared between the fore Note that while the GSS context state is shared between the fore
and back RPCSEC_GSS contexts, the fore and back RPCSEC_GSS context and back RPCSEC_GSS contexts, the fore and back RPCSEC_GSS context
state are independent of each other as far as the RPCSEC_GSS state are independent of each other as far as the RPCSEC_GSS
sequence number (see the seq_num field in the rpc_gss_cred_t data sequence number (see the seq_num field in the rpc_gss_cred_t data
skipping to change at page 489, line 6 skipping to change at page 492, line 6
with other state-modifying operations, because the DESTROY_SESSION with other state-modifying operations, because the DESTROY_SESSION
will destroy the reply cache. will destroy the reply cache.
DESTROY_SESSION MAY be the only operation in a COMPOUND request. DESTROY_SESSION MAY be the only operation in a COMPOUND request.
Because the session is destroyed, a client that retries the request Because the session is destroyed, a client that retries the request
may receive an error in reply to the retry, even though the original may receive an error in reply to the retry, even though the original
request was successful. request was successful.
If there is a backchannel on the session and the server has If there is a backchannel on the session and the server has
outstanding CB_SEQUENCE operations, then the server MAY refuse to outstanding CB_COMPOUND operations for the session which have not
destroy the session and return NFS4ERR_BACK_CHAN_BUSY. In the event been replied to, then the server MAY refuse to destroy the session
the backchannel is down, the server should instead return and return an error. In the event the backchannel is down, the
NFS4ERR_CB_PATH_DOWN to inform the client that the backchannel needs server SHOULD return NFS4ERR_CB_PATH_DOWN to inform the client that
to be repaired before the server will allow the session to be destroyed. the backchannel needs to be repaired before the server will allow the
The client SHOULD reply to all outstanding CB_COMPOUNDs before re- session to be destroyed. Otherwise, the error NFS4ERR_BACK_CHAN_BUSY
issuing DESTROY_SESSION. SHOULD be returned to indicate that there are CB_COMPOUNDs that need
to be replied to. The client SHOULD reply to all outstanding
CB_COMPOUNDs before re-sending DESTROY_SESSION.
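The -20 server-side decision can be sketched in C. Refusal is optional (MAY) and applies only when unreplied CB_COMPOUNDs exist. If the server does refuse, a down backchannel yields NFS4ERR_CB_PATH_DOWN, and otherwise it yields NFS4ERR_BACK_CHAN_BUSY. The backchannel-state structure and the refuse_when_busy policy flag are hypothetical.

```c
#include <stdbool.h>

#define NFS4ERR_CB_PATH_DOWN    10048
#define NFS4ERR_BACK_CHAN_BUSY  10057

/* Hypothetical per-session backchannel summary kept by the server. */
struct backchannel_state {
    bool exists;
    bool path_up;
    unsigned unreplied_cb_compounds;
};

/* Returns 0 to proceed with DESTROY_SESSION, or the error to return.
 * refuse_when_busy models the server's MAY-refuse policy choice. */
int destroy_session_check(const struct backchannel_state *bc,
                          bool refuse_when_busy)
{
    if (!bc->exists || bc->unreplied_cb_compounds == 0 || !refuse_when_busy)
        return 0;
    return bc->path_up ? NFS4ERR_BACK_CHAN_BUSY : NFS4ERR_CB_PATH_DOWN;
}
```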
18.38. Operation 45: FREE_STATEID - Free stateid with no locks 18.38. Operation 45: FREE_STATEID - Free stateid with no locks
Free a single stateid. Free a single stateid.
18.38.1. ARGUMENT 18.38.1. ARGUMENT
struct FREE_STATEID4args { struct FREE_STATEID4args {
stateid4 fsa_stateid; stateid4 fsa_stateid;
}; };
skipping to change at page 495, line 6 skipping to change at page 498, line 6
client identifies the device information to be returned by providing client identifies the device information to be returned by providing
the gdia_device_id and gdia_layout_type that uniquely identify the the gdia_device_id and gdia_layout_type that uniquely identify the
device address. The client provides gdia_maxcount to limit the device address. The client provides gdia_maxcount to limit the
number of bytes for the result. This maximum size represents all of number of bytes for the result. This maximum size represents all of
the data being returned within the GETDEVICEINFO4resok structure and the data being returned within the GETDEVICEINFO4resok structure and
includes the XDR overhead. The server may return less data. If the includes the XDR overhead. The server may return less data. If the
server is unable to return the information within the gdia_maxcount server is unable to return the information within the gdia_maxcount
limit, the error NFS4ERR_TOOSMALL will be returned. However, if limit, the error NFS4ERR_TOOSMALL will be returned. However, if
gdia_maxcount is zero, NFS4ERR_TOOSMALL MUST NOT be returned. gdia_maxcount is zero, NFS4ERR_TOOSMALL MUST NOT be returned.
The da_layout_type field of the gdir_device_addr returned by the
server MUST be equal to the gdia_layout_type specified by the client.
If it is not equal, the client SHOULD ignore the response as invalid
and behave as if the server returned an error, even if the client
does have support for the layout type returned.
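The gdia_maxcount rules and the new -20 layout-type check split across the server and the client. The following minimal C sketch covers both. On the server, an oversized result produces NFS4ERR_TOOSMALL along with gdir_mincount, but a zero gdia_maxcount never produces TOOSMALL. On the client, a reply whose da_layout_type differs from the requested type is treated as invalid. The function names are hypothetical, and computing encoded_size (the full XDR size of GETDEVICEINFO4resok) is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

#define NFS4ERR_TOOSMALL 10005  /* Section 15.1.1.7 */

/* Server side. Returns 0 or NFS4ERR_TOOSMALL; on TOOSMALL, reports the
 * size needed in *gdir_mincount. */
int gdi_size_check(uint32_t gdia_maxcount, uint32_t encoded_size,
                   uint32_t *gdir_mincount)
{
    if (gdia_maxcount == 0)
        return 0;  /* TOOSMALL MUST NOT be returned; da_addr_body is empty */
    if (encoded_size > gdia_maxcount) {
        *gdir_mincount = encoded_size;
        return NFS4ERR_TOOSMALL;
    }
    return 0;
}

/* Client side. A mismatched layout type means the reply is to be treated
 * as if the server had returned an error. */
bool gdi_reply_valid(uint32_t gdia_layout_type, uint32_t da_layout_type)
{
    return gdia_layout_type == da_layout_type;
}
```

A client that receives NFS4ERR_TOOSMALL can reissue GETDEVICEINFO with gdia_maxcount set to at least the returned gdir_mincount.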
The client also provides a notification bitmap, gdia_notify_types, for The client also provides a notification bitmap, gdia_notify_types, for
the device ID mapping notifications it is interested in the device ID mapping notifications it is interested in
receiving; the server must support device ID notifications for the receiving; the server must support device ID notifications for the
notification request to have effect. The notification mask is notification request to have effect. The notification mask is
composed in the same manner as the bitmap for file attributes composed in the same manner as the bitmap for file attributes
(Section 3.3.7). The numbers of bit positions are listed in the (Section 3.3.7). The numbers of bit positions are listed in the
notify_device_type4 enumeration type (Section 20.12). Only two notify_device_type4 enumeration type (Section 20.12). Only two
enumerated values of notify_device_type4 currently apply to enumerated values of notify_device_type4 currently apply to
GETDEVICEINFO: NOTIFY_DEVICEID4_CHANGE and NOTIFY_DEVICEID4_DELETE GETDEVICEINFO: NOTIFY_DEVICEID4_CHANGE and NOTIFY_DEVICEID4_DELETE
(see Section 20.12). (see Section 20.12).
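Because the notification mask follows the attribute-bitmap layout, bit n lives in 32-bit word n/32 at position n%32. This minimal C sketch builds and tests such a mask. The enum values NOTIFY_DEVICEID4_CHANGE = 1 and NOTIFY_DEVICEID4_DELETE = 2 are assumed from the final NFSv4.1 XDR. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NOTIFY_DEVICEID4_CHANGE 1  /* assumed enum values */
#define NOTIFY_DEVICEID4_DELETE 2

/* Sets bit number 'bit' in an array of 32-bit words (word bit/32,
 * position bit%32). The caller ensures the array is large enough. */
void bitmap_set(uint32_t *words, unsigned bit)
{
    words[bit / 32] |= (uint32_t)1 << (bit % 32);
}

/* Tests a bit; bits beyond the supplied words read as clear. */
bool bitmap_isset(const uint32_t *words, size_t nwords, unsigned bit)
{
    if (bit / 32 >= nwords)
        return false;
    return (words[bit / 32] >> (bit % 32)) & 1u;
}
```

A client would set both bits in gdia_notify_types and then use bitmap_isset on gdir_notification to learn which of them the server actually agreed to send, since it may grant only a subset.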
skipping to change at page 495, line 33 skipping to change at page 498, line 39
If the client wants to just update or turn off notifications, it MAY If the client wants to just update or turn off notifications, it MAY
issue GETDEVICEINFO with gdia_maxcount set to zero. In that event, issue GETDEVICEINFO with gdia_maxcount set to zero. In that event,
if the device ID is valid, the da_addr_body field of the if the device ID is valid, the da_addr_body field of the
gdir_device_addr field will be of zero length. gdir_device_addr field will be of zero length.
If an unknown device ID is given in gdia_device_id, the server If an unknown device ID is given in gdia_device_id, the server
returns NFS4ERR_NOENT. Otherwise, the device address information is returns NFS4ERR_NOENT. Otherwise, the device address information is
returned in gdir_device_addr. Finally, if the server supports returned in gdir_device_addr. Finally, if the server supports
notifications for device ID mappings, the gdir_notification result notifications for device ID mappings, the gdir_notification result
will contain a bitmap of which notifications it will actually send to will contain a bitmap of which notifications it will actually send to
the server (via CB_NOTIFY_DEVICEID, see Section 20.12). the client (via CB_NOTIFY_DEVICEID, see Section 20.12).
If NFS4ERR_TOOSMALL is returned, the results also contain If NFS4ERR_TOOSMALL is returned, the results also contain
gdir_mincount. The value of gdir_mincount represents the minimum gdir_mincount. The value of gdir_mincount represents the minimum
size necessary to obtain the device information. size necessary to obtain the device information.
18.40.4. IMPLEMENTATION 18.40.4. IMPLEMENTATION
Aside from updating or turning off notifications, another use case Aside from updating or turning off notifications, another use case
for gdia_maxcount being set to zero is to validate a device ID. for gdia_maxcount being set to zero is to validate a device ID.
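The two uses of gdia_maxcount described above can be sketched from the client side. One is a zero value, used to validate a device ID or to update notifications. The other is a retry sized by gdir_mincount after NFS4ERR_TOOSMALL. This is a minimal Python sketch, not protocol text. DeviceInfoReply and the send callable are hypothetical stand-ins for a decoded GETDEVICEINFO result and an RPC transport. Status codes appear as strings rather than numeric values.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DeviceInfoReply:
    """Hypothetical decoded GETDEVICEINFO result (not the XDR type)."""
    status: str                          # "NFS4_OK", "NFS4ERR_TOOSMALL", "NFS4ERR_NOENT", ...
    da_addr_body: bytes = b""            # opaque device address body on success
    gdir_mincount: Optional[int] = None  # only meaningful with NFS4ERR_TOOSMALL


# Hypothetical transport: send(gdia_device_id, gdia_maxcount) -> DeviceInfoReply
SendFn = Callable[[bytes, int], DeviceInfoReply]


def device_id_is_valid(send: SendFn, device_id: bytes) -> bool:
    """Validate a device ID by asking for no address data (gdia_maxcount = 0)."""
    reply = send(device_id, 0)
    # A valid ID yields NFS4_OK with a zero-length da_addr_body;
    # an unknown ID yields NFS4ERR_NOENT.
    return reply.status == "NFS4_OK"


def get_device_info(send: SendFn, device_id: bytes, maxcount: int) -> DeviceInfoReply:
    """Fetch device information, retrying once if the buffer was too small."""
    reply = send(device_id, maxcount)
    if reply.status == "NFS4ERR_TOOSMALL" and reply.gdir_mincount is not None:
        # gdir_mincount is the minimum size needed to obtain the information.
        reply = send(device_id, reply.gdir_mincount)
    return reply
```

A single retry is enough in this sketch because gdir_mincount already states the required size. A real client might still bound the retry against its own buffer limits.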
skipping to change at page 503, line 27 skipping to change at page 506, line 27
delegation stateid. Once a layout is held by the client for the delegation stateid. Once a layout is held by the client for the
file, the loga_stateid field is a stateid as returned from a previous file, the loga_stateid field is a stateid as returned from a previous
LAYOUTGET or LAYOUTRETURN operation or provided by a CB_LAYOUTRECALL LAYOUTGET or LAYOUTRETURN operation or provided by a CB_LAYOUTRECALL
operation (see Section 12.5.3). operation (see Section 12.5.3).
The loga_maxcount field specifies the maximum layout size (in bytes) The loga_maxcount field specifies the maximum layout size (in bytes)
that the client can handle. If the size of the layout structure that the client can handle. If the size of the layout structure
exceeds the size specified by loga_maxcount, the metadata server will exceeds the size specified by loga_maxcount, the metadata server will
return the NFS4ERR_TOOSMALL error. return the NFS4ERR_TOOSMALL error.
The returned layout is expressed an array, logr_layout, of type The returned layout is expressed as an array, logr_layout, with each
layout4. If a file has a single striping pattern, then logr_layout element of type layout4. If a file has a single striping pattern,
will contain just one entry. Otherwise, if the requested range then logr_layout will contain just one entry. Otherwise, if the
overlaps more than one striping pattern, logr_layout will contain the requested range overlaps more than one striping pattern, logr_layout
required number of entries. Each element of logr_layout MUST have will contain the required number of entries. The elements of
the same iomode. The elements of logr_layout MUST be sorted in logr_layout MUST be sorted in ascending order of the value of the
ascending order of the value of lo_offset field of each element. lo_offset field of each element. There MUST be no gaps or overlaps
There MUST be no gaps in the range between two successive elements of in the range between two successive elements of logr_layout. The
logr_layout. The lo_iomode field in each element of logr_layout MUST lo_iomode field in each element of logr_layout MUST be the same.
be the same.
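The constraints on logr_layout are sorting by lo_offset, no gaps or overlaps, and identical lo_iomode. Together they allow a simple client-side sanity check. The following Python sketch assumes a simplified Layout4 record that holds only the fields it inspects. It also assumes that a length of NFS4_UINT64_MAX denotes a range extending to end of file. Neither the record nor the function is defined by this document.

```python
from dataclasses import dataclass
from typing import List

# Assumption: an all-ones length means "through end of file".
NFS4_UINT64_MAX = 0xFFFFFFFFFFFFFFFF


@dataclass
class Layout4:
    """Hypothetical subset of a layout4 element."""
    lo_offset: int
    lo_length: int
    lo_iomode: str  # "LAYOUTIOMODE4_READ" or "LAYOUTIOMODE4_RW"


def check_logr_layout(layouts: List[Layout4]) -> None:
    """Raise ValueError if logr_layout violates the ordering rules."""
    for prev, cur in zip(layouts, layouts[1:]):
        if cur.lo_iomode != prev.lo_iomode:
            raise ValueError("lo_iomode differs between elements")
        if cur.lo_offset < prev.lo_offset:
            raise ValueError("elements not sorted by lo_offset")
        if prev.lo_length == NFS4_UINT64_MAX:
            raise ValueError("an element reaching end of file must be last")
        end = prev.lo_offset + prev.lo_length
        if cur.lo_offset > end:
            raise ValueError(f"gap between {end} and {cur.lo_offset}")
        if cur.lo_offset < end:
            raise ValueError(f"overlap before offset {end}")
```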
The metadata server may adjust the range of the returned layout based The metadata server may adjust the range of the returned layout based
on the usage implied by the loga_iomode. The client must be prepared on the usage implied by the loga_iomode. The client MUST be prepared
to get a layout that does not align exactly with its request. The to get a layout that does not align exactly with its request. The
lo_length field in each element of logr_layout SHOULD be at least as lo_length field in each element of logr_layout SHOULD be at least as
long as loga_minlength or the server SHOULD reject the request. See long as loga_minlength or the server SHOULD reject the request. See
Section 12.5.2 for more details. Section 12.5.2 for more details.
The metadata server may also return a layout with an lo_iomode other The metadata server may also return a layout with an lo_iomode other
than that requested by the client. If it does so, it must ensure than that requested by the client. If it does so, it must ensure
that the lo_iomode is more permissive than the loga_iomode requested. that the lo_iomode is more permissive than the loga_iomode requested.
For example, this behavior allows an implementation to upgrade read- For example, this behavior allows an implementation to upgrade read-
only requests to read/write requests at its discretion, within the only requests to read/write requests at its discretion, within the
limits of the layout type specific protocol. A lo_iomode of either limits of the layout type specific protocol. A lo_iomode of either
LAYOUTIOMODE4_READ or LAYOUTIOMODE4_RW must be returned. LAYOUTIOMODE4_READ or LAYOUTIOMODE4_RW must be returned.
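A client receiving a layout can check the returned lo_iomode against its loga_iomode. The following minimal Python sketch assumes a simple ranking in which LAYOUTIOMODE4_RW is more permissive than LAYOUTIOMODE4_READ. Returning the requested mode itself is treated as acceptable. The ranking table and function name are illustrative only.

```python
READ = "LAYOUTIOMODE4_READ"
RW = "LAYOUTIOMODE4_RW"
ANY = "LAYOUTIOMODE4_ANY"

# Assumed ranking: RW grants everything READ does, and more.
_RANK = {READ: 1, RW: 2}


def returned_iomode_ok(requested: str, returned: str) -> bool:
    """True if the server's lo_iomode is acceptable for the requested loga_iomode."""
    if returned not in _RANK:
        # Only LAYOUTIOMODE4_READ or LAYOUTIOMODE4_RW may be returned.
        return False
    if requested not in _RANK:
        # LAYOUTIOMODE4_ANY (or an invalid value) is not a valid request.
        return False
    return _RANK[returned] >= _RANK[requested]
```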
The logr_return_on_close result field is a directive to return the The logr_return_on_close result field is a directive to return the
layout before closing the file. When the server sets this return layout before closing the file. When the server sets this return
value to TRUE, it must be prepared to recall the layout in the case value to TRUE, it must be prepared to recall the layout in the case
the client fails to return the layout before close. For the server the client fails to return the layout before close. For the server
that knows a layout must be returned before a close of the file, this that knows a layout must be returned before a close of the file, this
return value can be used to communicate the desired behavior to the return value can be used to communicate the desired behavior to the
client and thus removing one extra step from the client's and client and thus remove one extra step from the client's and server's
server's interaction. interaction.
The logr_stateid, as with all stateid processing, is returned to the The logr_stateid, as with all stateid processing, is returned to the
client for use in subsequent layout related operations. See client for use in subsequent layout related operations. See
Section 8.2 for a further discussion. Section 8.2 for a further discussion.
The format of the returned layout (lo_content) is specific to the The format of the returned layout (lo_content) is specific to the
layout type. layout type. The value of the layout type (lo_content.loc_type) for
each of the elements of the array of layouts returned by the server
(logr_layout) MUST be equal to the loga_layout_type specified by the
client. If it is not equal, the client SHOULD ignore the response as
invalid and behave as if the server returned an error, even if the
client does have support for the layout type returned.
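The layout type rule amounts to one comparison per returned element. In the following Python sketch, layout types are plain integers and the function name is hypothetical. A False result means the client treats the reply as if the server had returned an error.

```python
from typing import Sequence


def layout_types_match(loga_layout_type: int, loc_types: Sequence[int]) -> bool:
    """True if every element's lo_content.loc_type equals the requested type.

    On False, the client should ignore the LAYOUTGET reply as invalid,
    even if it happens to support the layout type that was returned.
    """
    return all(t == loga_layout_type for t in loc_types)
```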
If layouts are not supported for the requested file or its containing If layouts are not supported for the requested file or its containing
file system the server SHOULD return NFS4ERR_LAYOUTUNAVAILABLE. If file system the server SHOULD return NFS4ERR_LAYOUTUNAVAILABLE. If
the layout type is not supported, the metadata server should return the layout type is not supported, the metadata server should return
NFS4ERR_UNKNOWN_LAYOUTTYPE. If layouts are supported but no layout NFS4ERR_UNKNOWN_LAYOUTTYPE. If layouts are supported but no layout
matches the client provided layout identification, the server should matches the client provided layout identification, the server should
return NFS4ERR_BADLAYOUT. If an invalid loga_iomode is specified, or return NFS4ERR_BADLAYOUT. If an invalid loga_iomode is specified, or
a loga_iomode of LAYOUTIOMODE4_ANY is specified, the server should a loga_iomode of LAYOUTIOMODE4_ANY is specified, the server should
return NFS4ERR_BADIOMODE. return NFS4ERR_BADIOMODE.
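On the metadata server, the error cases above can be sketched as a single selection function. The order in which the checks run is an assumption of this sketch, since the text does not state a precedence between them. The parameters are hypothetical summaries of server state.

```python
from typing import Optional, Set

VALID_REQUEST_IOMODES = {"LAYOUTIOMODE4_READ", "LAYOUTIOMODE4_RW"}


def layoutget_error(fs_supports_layouts: bool,
                    supported_types: Set[int],
                    loga_layout_type: int,
                    loga_iomode: str,
                    layout_matches: bool) -> Optional[str]:
    """Return the error a metadata server might choose for LAYOUTGET, or None."""
    if not fs_supports_layouts:
        return "NFS4ERR_LAYOUTUNAVAILABLE"
    if loga_layout_type not in supported_types:
        return "NFS4ERR_UNKNOWN_LAYOUTTYPE"
    if loga_iomode not in VALID_REQUEST_IOMODES:
        # Covers both unknown values and LAYOUTIOMODE4_ANY.
        return "NFS4ERR_BADIOMODE"
    if not layout_matches:
        return "NFS4ERR_BADLAYOUT"
    return None
```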
skipping to change at page 512, line 20 skipping to change at page 515, line 20
const SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED = 0x00000008; const SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED = 0x00000008;
const SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED = 0x00000010; const SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED = 0x00000010;
const SEQ4_STATUS_ADMIN_STATE_REVOKED = 0x00000020; const SEQ4_STATUS_ADMIN_STATE_REVOKED = 0x00000020;
const SEQ4_STATUS_RECALLABLE_STATE_REVOKED = 0x00000040; const SEQ4_STATUS_RECALLABLE_STATE_REVOKED = 0x00000040;
const SEQ4_STATUS_LEASE_MOVED = 0x00000080; const SEQ4_STATUS_LEASE_MOVED = 0x00000080;
const SEQ4_STATUS_RESTART_RECLAIM_NEEDED = 0x00000100; const SEQ4_STATUS_RESTART_RECLAIM_NEEDED = 0x00000100;
const SEQ4_STATUS_CB_PATH_DOWN_SESSION = 0x00000200; const SEQ4_STATUS_CB_PATH_DOWN_SESSION = 0x00000200;
const SEQ4_STATUS_BACKCHANNEL_FAULT = 0x00000400; const SEQ4_STATUS_BACKCHANNEL_FAULT = 0x00000400;
const SEQ4_STATUS_DEVID_CHANGED = 0x00000800; const SEQ4_STATUS_DEVID_CHANGED = 0x00000800;
const SEQ4_STATUS_DEVID_DELETED = 0x00001000; const SEQ4_STATUS_DEVID_DELETED = 0x00001000;
const SEQ4_STATUS_DEVID_DELETED_ALL = 0x00002000;
struct SEQUENCE4resok { struct SEQUENCE4resok {
sessionid4 sr_sessionid; sessionid4 sr_sessionid;
sequenceid4 sr_sequenceid; sequenceid4 sr_sequenceid;
slotid4 sr_slotid; slotid4 sr_slotid;
slotid4 sr_highest_slotid; slotid4 sr_highest_slotid;
slotid4 sr_target_highest_slotid; slotid4 sr_target_highest_slotid;
uint32_t sr_status_flags; uint32_t sr_status_flags;
}; };
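sr_status_flags is a bit mask, so a client decodes it by testing each defined bit. The following Python sketch copies only the flag values visible in the definitions above. Lower-order bits are defined in parts of the specification outside this excerpt and are deliberately left out. Any such bit is reported as unrecognized rather than guessed.

```python
from typing import List, Tuple

# Values copied from the SEQ4_STATUS_* constants above (partial list).
SEQ4_STATUS_FLAGS = {
    0x00000008: "SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED",
    0x00000010: "SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED",
    0x00000020: "SEQ4_STATUS_ADMIN_STATE_REVOKED",
    0x00000040: "SEQ4_STATUS_RECALLABLE_STATE_REVOKED",
    0x00000080: "SEQ4_STATUS_LEASE_MOVED",
    0x00000100: "SEQ4_STATUS_RESTART_RECLAIM_NEEDED",
    0x00000200: "SEQ4_STATUS_CB_PATH_DOWN_SESSION",
    0x00000400: "SEQ4_STATUS_BACKCHANNEL_FAULT",
    0x00000800: "SEQ4_STATUS_DEVID_CHANGED",
    0x00001000: "SEQ4_STATUS_DEVID_DELETED",
    0x00002000: "SEQ4_STATUS_DEVID_DELETED_ALL",
}

_KNOWN_MASK = 0
for _bit in SEQ4_STATUS_FLAGS:
    _KNOWN_MASK |= _bit


def decode_status_flags(sr_status_flags: int) -> Tuple[List[str], int]:
    """Return (names of set flags in ascending bit order, mask of unrecognized bits)."""
    names = [name for bit, name in SEQ4_STATUS_FLAGS.items() if sr_status_flags & bit]
    return names, sr_status_flags & ~_KNOWN_MASK
```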
skipping to change at page 513, line 24 skipping to change at page 516, line 22
The sa_slotid argument is the index in the reply cache for the The sa_slotid argument is the index in the reply cache for the
request. The sa_sequenceid field is the sequence number of the request. The sa_sequenceid field is the sequence number of the
request for the reply cache entry (slot). The sr_slotid result MUST request for the reply cache entry (slot). The sr_slotid result MUST
equal sa_slotid. The sr_sequenceid result MUST equal sa_sequenceid. equal sa_slotid. The sr_sequenceid result MUST equal sa_sequenceid.
The sa_highest_slotid argument is the highest slot id the client has The sa_highest_slotid argument is the highest slot id the client has
a request outstanding for; it could be equal to sa_slotid. The a request outstanding for; it could be equal to sa_slotid. The
server returns two "highest_slotid" values: sr_highest_slotid, and server returns two "highest_slotid" values: sr_highest_slotid, and
sr_target_highest_slotid. The former is the highest slot id the sr_target_highest_slotid. The former is the highest slot id the
server will accept in a future SEQUENCE operation, and SHOULD NOT be server will accept in a future SEQUENCE operation, and SHOULD NOT be
less than the the value of sa_highest_slotid. (but see less than the value of sa_highest_slotid. (but see Section 2.10.5.1
Section 2.10.5.1 for an exception). The latter is the highest slot for an exception). The latter is the highest slot id the server
id the server would prefer the client use on a future SEQUENCE would prefer the client use on a future SEQUENCE operation.
operation.
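The echo and slot-limit rules give a client a few cheap consistency checks on each SEQUENCE reply. In the following Python sketch, SequenceArgs and SequenceRes are hypothetical records carrying only the relevant fields. A lowered sr_highest_slotid is reported as a warning rather than an error, because of the exception referenced above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SequenceArgs:
    sa_slotid: int
    sa_sequenceid: int
    sa_highest_slotid: int


@dataclass
class SequenceRes:
    sr_slotid: int
    sr_sequenceid: int
    sr_highest_slotid: int
    sr_target_highest_slotid: int


def check_sequence_reply(args: SequenceArgs, res: SequenceRes) -> List[str]:
    """Return a list of problems found in a SEQUENCE reply (empty if none)."""
    problems = []
    if res.sr_slotid != args.sa_slotid:
        problems.append("sr_slotid MUST equal sa_slotid")
    if res.sr_sequenceid != args.sa_sequenceid:
        problems.append("sr_sequenceid MUST equal sa_sequenceid")
    if res.sr_highest_slotid < args.sa_highest_slotid:
        # SHOULD NOT happen, but Section 2.10.5.1 describes an exception.
        problems.append("warning: sr_highest_slotid below sa_highest_slotid")
    return problems
```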
If sa_cachethis is TRUE, then the client is requesting that the If sa_cachethis is TRUE, then the client is requesting that the
server cache the entire reply in the server's reply cache; therefore server cache the entire reply in the server's reply cache; therefore
the server MUST cache the reply (see Section 2.10.5.1.2). The server the server MUST cache the reply (see Section 2.10.5.1.2). The server
MAY cache the reply if sa_cachethis is FALSE. If the server does not MAY cache the reply if sa_cachethis is FALSE. If the server does not
cache the entire reply, it MUST still record that it executed the cache the entire reply, it MUST still record that it executed the
request at the specified slot and sequence id. request at the specified slot and sequence id.
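The caching rule separates recording that a request ran from keeping its reply. The following minimal Python sketch models one server slot. The Slot class and its cache_anyway parameter are hypothetical. cache_anyway stands for the server's freedom to cache when sa_cachethis is FALSE.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    """Hypothetical server-side reply cache entry for one slot."""
    seqid: int = 0
    reply: Optional[bytes] = None  # None: request executed, reply not cached

    def record(self, sa_sequenceid: int, reply: bytes,
               sa_cachethis: bool, cache_anyway: bool = False) -> None:
        # The server MUST always record which sequence ID executed on this slot.
        self.seqid = sa_sequenceid
        # The reply MUST be cached if sa_cachethis is TRUE, and MAY be otherwise.
        self.reply = reply if (sa_cachethis or cache_anyway) else None
```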
The response to the SEQUENCE operation contains a word of status The response to the SEQUENCE operation contains a word of status
flags (sr_status_flags) that can provide to the client information flags (sr_status_flags) that can provide to the client information
skipping to change at page 515, line 51 skipping to change at page 518, line 51
backchannel (e.g. it has lost track of the sequence id for a slot backchannel (e.g. it has lost track of the sequence id for a slot
in the backchannel). The client MUST stop sending more requests in the backchannel). The client MUST stop sending more requests
on the session's fore channel, wait for all outstanding requests on the session's fore channel, wait for all outstanding requests
to complete on the fore and back channel, and then destroy the to complete on the fore and back channel, and then destroy the
session. session.
SEQ4_STATUS_DEVID_CHANGED SEQ4_STATUS_DEVID_CHANGED
The client is using device ID notifications and the server has The client is using device ID notifications and the server has
changed a device ID mapping held by the client. This flag will changed a device ID mapping held by the client. This flag will
stay present until the client has obtained the new mapping with stay present until the client has obtained the new mapping with
GETDEVICELIST or GETDEVICEINFO. GETDEVICEINFO.
SEQ4_STATUS_DEVID_DELETED SEQ4_STATUS_DEVID_DELETED
The server has removed a device ID mapping as held by the client. The client is using device ID notifications and the server has
This flag will stay in effect until the client either sends a deleted a device ID mapping held by the client. This flag will
DELEGRETURN or uses GETDEVICELIST to refresh all mappings. stay in effect until the client sends GETDEVICEINFO with a null
value in the argument gdia_notify_types.
SEQ4_STATUS_DEVID_DELETED_ALL
The server has deleted all device ID mappings; this flag will stay
present until the client sends the appropriate DELEGRETURN.
The value of the sa_sequenceid argument relative to the cached The value of the sa_sequenceid argument relative to the cached
sequence id on the slot falls into one of three cases. sequence id on the slot falls into one of three cases.
o If the difference between sa_sequenceid and the server's cached o If the difference between sa_sequenceid and the server's cached
sequence id at the slot id is two (2) or more, or if sa_sequenceid sequence id at the slot id is two (2) or more, or if sa_sequenceid
is less than the cached sequence id (accounting for wraparound of is less than the cached sequence id (accounting for wraparound of
the unsigned sequence id value), then the server MUST return the unsigned sequence id value), then the server MUST return
NFS4ERR_SEQ_MISORDERED. NFS4ERR_SEQ_MISORDERED.
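Because sequence IDs are unsigned 32-bit values, the comparison is easiest with modular arithmetic. Under that arithmetic, a value "less than" the cached one appears as a very large forward distance. Only the misordered case is shown in this excerpt. The following Python sketch assumes the other two cases: an equal value is a retry, and a value one greater is a new request.

```python
SEQID_MOD = 1 << 32  # sequenceid4 is an unsigned 32-bit value


def classify_sequenceid(sa_sequenceid: int, cached_seqid: int) -> str:
    """Classify an incoming sa_sequenceid against the slot's cached sequence ID."""
    # Forward distance modulo 2**32; Python's % keeps the result non-negative.
    diff = (sa_sequenceid - cached_seqid) % SEQID_MOD
    if diff == 0:
        return "retry"  # assumed: same sequence ID means a retransmission
    if diff == 1:
        return "new"    # assumed: next sequence ID means a new request
    # Two or more ahead, or behind (which wraps around to a large diff).
    return "NFS4ERR_SEQ_MISORDERED"
```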
skipping to change at page 558, line 17 skipping to change at page 561, line 17
"udp" - UDP over IP version 4 "udp" - UDP over IP version 4
"tcp6" - TCP over IP version 6 "tcp6" - TCP over IP version 6
"udp6" - UDP over IP version 6 "udp6" - UDP over IP version 6
Note: the '"' marks are used for delimiting the strings for this Note: the '"' marks are used for delimiting the strings for this
document and are not part of the Network Identifier string. document and are not part of the Network Identifier string.
For the "tcp" and "udp" Network Identifiers the Universal Address or For the "tcp" and "udp" Network Identifiers the Universal Address or
r_addr (for IPv4) is a US-ASCII string and is of the form: r_addr (for IPv4) is a US-ASCII string and is of the form described
in Section 3.3.9.1.
h1.h2.h3.h4.p1.p2
The prefix, "h1.h2.h3.h4", is the standard textual form for
representing an IPv4 address, which is always four octets long.
Assuming big-endian ordering, h1, h2, h3, and h4, are respectively,
the first through fourth octets each converted to ASCII-decimal.
Assuming big-endian ordering, p1 and p2 are, respectively, the first
and second octets each converted to ASCII-decimal. For example, if a
host, in big-endian order, has an address of 0x0A010307 and there is
a service listening on, in big endian order, port 0x020F (decimal
527), then complete universal address is "10.1.3.7.2.15".
For the "tcp6" and "udp6" Network Identifiers the Universal Address
or r_addr (for IPv6) is a US-ASCII string and is of the form:
x1:x2:x3:x4:x5:x6:x7:x8.p1.p2
The suffix "p1.p2" is the service port, and is computed the same way For the "tcp6" and "udp6" Network Identifiers the Universal Address or
as with universal addresses for "tcp" and "udp". The prefix, "x1:x2: r_addr (for IPv6) is a US-ASCII string and is of the form described
x3:x4:x5:x6:x7:x8", is the standard textual form for representing an in Section 3.3.9.2.
IPv6 address as defined in Section 2.2 of RFC2373 [13].
Additionally, the two alternative forms specified in Section 2.2 of
RFC2373 [13] are also acceptable.
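The universal address format removed above is a textual address followed by two port octets. The following Python sketch encodes and decodes that format with the standard ipaddress module. It reproduces the worked example: address 0x0A010307 with port 527 gives "10.1.3.7.2.15". The sketch assumes the compressed IPv6 textual form, which is one of the acceptable alternative forms. The function names are illustrative.

```python
import ipaddress
from typing import Tuple


def to_uaddr(host: str, port: int) -> str:
    """Encode an IPv4 or IPv6 address and a port as a universal address."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port must fit in 16 bits")
    ip = ipaddress.ip_address(host)  # raises ValueError if host is malformed
    # p1 is the high-order octet of the port, p2 the low-order octet.
    return f"{ip.compressed}.{port >> 8}.{port & 0xFF}"


def from_uaddr(uaddr: str) -> Tuple[str, int]:
    """Split a universal address into (host, port)."""
    host, p1, p2 = uaddr.rsplit(".", 2)
    hi, lo = int(p1), int(p2)
    if not (0 <= hi <= 255 and 0 <= lo <= 255):
        raise ValueError("port octets must be in the range 0-255")
    ipaddress.ip_address(host)  # validate the address part
    return host, (hi << 8) | lo
```

Splitting from the right on the last two dots keeps the mixed IPv6 form (for example "::ffff:10.1.3.7") intact.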
As mentioned, the registration of new Network Identifiers will As mentioned, the registration of new Network Identifiers will
require the publication of an Informational RFC with similar detail require the publication of an RFC with similar detail as listed above
as listed above for the Network Identifier itself and corresponding for the Network Identifier itself and corresponding Universal
Universal Address. Address.
22.3. Defining New Notifications 22.3. Defining New Notifications
New notification types may be added to the CB_NOTIFY_DEVICEID New notification types may be added to the CB_NOTIFY_DEVICEID
operation (Section 20.12). This can be done via changes to the operation (Section 20.12). This can be done via changes to the
operations that register notifications, or by adding new operations operations that register notifications, or by adding new operations
to NFSv4. This requires a new minor version of NFSv4, and requires a to NFSv4. This requires a new minor version of NFSv4, and requires a
standards track document from IETF. Another way to add a standards track document from IETF. Another way to add a
notification is to specify a new layout type. Notifications for new notification is to specify a new layout type. Notifications for new
layout types would be requested via GETDEVICELIST (Section 18.41) and layout types would be requested via GETDEVICELIST (Section 18.41) and
skipping to change at page 560, line 26 skipping to change at page 563, line 7
Standards" (STD 1). The new layout specification will be Standards" (STD 1). The new layout specification will be
submitted for eventual publication as a standards track RFC. submitted for eventual publication as a standards track RFC.
5. The layout specification progresses through the IETF standards 5. The layout specification progresses through the IETF standards
process; the new option will be reviewed by the NFSv4 Working process; the new option will be reviewed by the NFSv4 Working
Group (if that group still exists), or as an Internet Draft not Group (if that group still exists), or as an Internet Draft not
submitted by an IETF working group. submitted by an IETF working group.
22.5. Path Variable Definitions 22.5. Path Variable Definitions
This section deals with the IANA considerations associated the the This section deals with the IANA considerations associated with the
variable substitution feature for location names as described in variable substitution feature for location names as described in
Section 11.10.3. As described there, variables subject to Section 11.10.3. As described there, variables subject to
substitution consist of a domain name and a specific name within that substitution consist of a domain name and a specific name within that
domain, with the two separated by a colon. domain, with the two separated by a colon.
22.5.1. Path Variable Values 22.5.1. Path Variable Values
For names with the domain "ietf.org" only three specific names are For names with the domain "ietf.org" only three specific names are
currently defined and additional names will only be created via currently defined and additional names will only be created via
standards-track RFCs. standards-track RFCs.
skipping to change at page 561, line 43 skipping to change at page 564, line 22
5 Generic Security Service Application Program Interface (GSS- 5 Generic Security Service Application Program Interface (GSS-
API) Mechanism Version 2", RFC 4121, July 2005. API) Mechanism Version 2", RFC 4121, July 2005.
[6] Eisler, M., "LIPKEY - A Low Infrastructure Public Key Mechanism [6] Eisler, M., "LIPKEY - A Low Infrastructure Public Key Mechanism
Using SPKM", RFC 2847, June 2000. Using SPKM", RFC 2847, June 2000.
[7] Linn, J., "Generic Security Service Application Program [7] Linn, J., "Generic Security Service Application Program
Interface Version 2, Update 1", RFC 2743, January 2000. Interface Version 2, Update 1", RFC 2743, January 2000.
[8] Talpey, T. and B. Callaghan, "RDMA Transport for ONC RPC - A [8] Talpey, T. and B. Callaghan, "RDMA Transport for ONC RPC - A
Work in Progress", Internet Draft draft-ietf-nfsv4-rpcrdma-05, Work in Progress", Internet Draft draft-ietf-nfsv4-rpcrdma-06,
May 2007. May 2007.
[9] Talpey, T. and B. Callaghan, "NFS Direct Data Placement - A [9] Talpey, T. and B. Callaghan, "NFS Direct Data Placement - A
Work in Progress", Internet Work in Progress", Internet
Draft draft-ietf-nfsv4-nfsdirect-05, May 2007. Draft draft-ietf-nfsv4-nfsdirect-06, May 2007.
[10] Recio, P., Metzler, B., Culley, P., Hilland, J., and D. Garcia, [10] Recio, P., Metzler, B., Culley, P., Hilland, J., and D. Garcia,
"A Remote Direct Memory Access Protocol Specification", "A Remote Direct Memory Access Protocol Specification",
RFC 5040, October 2007. RFC 5040, October 2007.
[11] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-Hashing [11] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-Hashing
for Message Authentication", RFC 2104, February 1997. for Message Authentication", RFC 2104, February 1997.
[12] Shepler, S., Eisler, M., and D. Noveck, "NFSv4 Minor Version 1 [12] Shepler, S., Eisler, M., and D. Noveck, "NFSv4 Minor Version 1
XDR Description A Work in Progress", Internet XDR Description A Work in Progress", Internet
Draft draft-ietf-nfsv4-minorversion1-dot-x-02.txt, Draft draft-ietf-nfsv4-minorversion1-dot-x-04.txt,
December 2007. December 2007.
[13] Hinden, R. and S. Deering, "IP Version 6 Addressing [13] Hinden, R. and S. Deering, "IP Version 6 Addressing
Architecture", RFC 2373, July 1998. Architecture", RFC 3513, April 2003.
[14] International Organization for Standardization, "Information [14] International Organization for Standardization, "Information
Technology - Universal Multiple-octet coded Character Set (UCS) Technology - Universal Multiple-octet coded Character Set (UCS)
- Part 1: Architecture and Basic Multilingual Plane", - Part 1: Architecture and Basic Multilingual Plane",
ISO Standard 10646-1, May 1993. ISO Standard 10646-1, May 1993.
[15] Alvestrand, H., "IETF Policy on Character Sets and Languages", [15] Alvestrand, H., "IETF Policy on Character Sets and Languages",
BCP 18, RFC 2277, January 1998. BCP 18, RFC 2277, January 1998.
[16] Hoffman, P. and M. Blanchet, "Preparation of Internationalized [16] Hoffman, P. and M. Blanchet, "Preparation of Internationalized
skipping to change at page 563, line 47 skipping to change at page 566, line 25
[33] Callaghan, B., "WebNFS Server Specification", RFC 2055, [33] Callaghan, B., "WebNFS Server Specification", RFC 2055,
October 1996. October 1996.
[34] Shepler, S., "NFS Version 4 Design Considerations", RFC 2624, [34] Shepler, S., "NFS Version 4 Design Considerations", RFC 2624,
June 1999. June 1999.
[35] Simonsen, K., "Character Mnemonics and Character Sets", [35] Simonsen, K., "Character Mnemonics and Character Sets",
RFC 1345, June 1992. RFC 1345, June 1992.
[36] Satran, J., Meth, K., Sapuntzakis, C., Chadalapaka, M., and E. [36] The Open Group, "Protocols for Interworking: XNFS, Version 3W,
ISBN 1-85912-184-5", February 1998.
[37] Floyd, S. and V. Jacobson, "The Synchronization of Periodic
Routing Messages", IEEE/ACM Transactions on Networking 2(2),
pp. 122-136, April 1994.
[38] Satran, J., Meth, K., Sapuntzakis, C., Chadalapaka, M., and E.
Zeidner, "Internet Small Computer Systems Interface (iSCSI)", Zeidner, "Internet Small Computer Systems Interface (iSCSI)",
RFC 3720, April 2004. RFC 3720, April 2004.
[37] Snively, R., "Fibre Channel Protocol for SCSI, 2nd Version [39] Snively, R., "Fibre Channel Protocol for SCSI, 2nd Version
(FCP-2)", ANSI/INCITS 350-2003, Oct 2003. (FCP-2)", ANSI/INCITS 350-2003, Oct 2003.
[38] Weber, R., "Object-Based Storage Device Commands (OSD)", ANSI/ [40] Weber, R., "Object-Based Storage Device Commands (OSD)", ANSI/
INCITS 400-2004, July 2004, INCITS 400-2004, July 2004,
<http://www.t10.org/ftp/t10/drafts/osd/osd-r10.pdf>. <http://www.t10.org/ftp/t10/drafts/osd/osd-r10.pdf>.
[39] Callaghan, B., "NFS URL Scheme", RFC 2224, October 1997. [41] Callaghan, B., "NFS URL Scheme", RFC 2224, October 1997.
[40] Chiu, A., Eisler, M., and B. Callaghan, "Security Negotiation [42] Chiu, A., Eisler, M., and B. Callaghan, "Security Negotiation
for WebNFS", RFC 2755, January 2000. for WebNFS", RFC 2755, January 2000.
Appendix A. Acknowledgments Appendix A. Acknowledgments
The initial drafts for the SECINFO extensions were edited by Mike The initial drafts for the SECINFO extensions were edited by Mike
Eisler with contributions from Peng Dai, Sergey Klyushin, and Carl Eisler with contributions from Peng Dai, Sergey Klyushin, and Carl
Burnett. Burnett.
The initial drafts for the SESSIONS extensions were edited by Tom The initial drafts for the SESSIONS extensions were edited by Tom
Talpey, Spencer Shepler, Jon Bauman with contributions from Charles Talpey, Spencer Shepler, Jon Bauman with contributions from Charles
skipping to change at page 564, line 43 skipping to change at page 567, line 29
Carl Burnett, Ted Anderson and Tom Talpey. Carl Burnett, Ted Anderson and Tom Talpey.
The initial drafts for the ACL explanations were contributed by Sam The initial drafts for the ACL explanations were contributed by Sam
Falkner and Lisa Week. Falkner and Lisa Week.
The initial drafts for the parallel NFS support were edited by Brent The initial drafts for the parallel NFS support were edited by Brent
Welch and Garth Goodson. Additional authors for those documents were Welch and Garth Goodson. Additional authors for those documents were
Benny Halevy, David Black, and Andy Adamson. Additional input came Benny Halevy, David Black, and Andy Adamson. Additional input came
from the informal group which contributed to the construction of the from the informal group which contributed to the construction of the
initial pNFS drafts; specific acknowledgement goes to Gary Grider, initial pNFS drafts; specific acknowledgement goes to Gary Grider,
Peter Corbett, Dave Noveck, Peter Honeyman, and Stephen Fridella The Peter Corbett, Dave Noveck, Peter Honeyman, and Stephen Fridella.
pNFS work was inspired by the NASD and OSD work done by Garth Gibson. The pNFS work was inspired by the NASD and OSD work done by Garth
Gary Grider of the national labs (LANL) has also been a champion of Gibson. Gary Grider of the national labs (LANL) has also been a
high-performance parallel I/O. champion of high-performance parallel I/O.
Fredric Isaman found several errors in draft versions of the ONC RPC Fredric Isaman found several errors in draft versions of the ONC RPC
XDR description of the NFSv4.1 protocol. XDR description of the NFSv4.1 protocol.
Audrey Van Bellingham provided, in numerous ways, essential co- Audrey Van Bellingham provided, in numerous ways, essential co-
ordination and management of the process of editing the specification ordination and management of the process of editing the specification
drafts. drafts.
Richard Jernigan gave feedback on the file layout's striping pattern Richard Jernigan gave feedback on the file layout's striping pattern
design. design.
skipping to change at page 566, line 10 skipping to change at page 568, line 45
Iyer, Suchit Kaura, Trond Myklebust, Anatoly Pinchuk, Spencer Iyer, Suchit Kaura, Trond Myklebust, Anatoly Pinchuk, Spencer
Shepler, Renu Tewari, Lisa Week, and Brent Welch. Shepler, Renu Tewari, Lisa Week, and Brent Welch.
A review team worked together to generate the tables of assignments A review team worked together to generate the tables of assignments
of error sets to operations and make sure that each such assignment of error sets to operations and make sure that each such assignment
had two or more people validating it. Participating in the process had two or more people validating it. Participating in the process
were: Andy Adamson, Mike Eisler, Sam Falkner, Garth Goodson, Robert were: Andy Adamson, Mike Eisler, Sam Falkner, Garth Goodson, Robert
Gordon, Trond Myklebust, Dave Noveck, Spencer Shepler, Tom Talpey, Amy Gordon, Trond Myklebust, Dave Noveck, Spencer Shepler, Tom Talpey, Amy
Weaver, and Lisa Week. Weaver, and Lisa Week.
Others who provided comments include: Mahesh Siddheshwar.
Authors' Addresses Authors' Addresses
Spencer Shepler Spencer Shepler
Sun Microsystems, Inc. Sun Microsystems, Inc.
7808 Moonflower Drive 7808 Moonflower Drive
Austin, TX 78750 Austin, TX 78750
USA USA
Phone: +1-512-401-1080 Phone: +1-512-401-1080
Email: spencer.shepler@sun.com Email: spencer.shepler@sun.com
 End of changes. 219 change blocks. 
854 lines changed or deleted 914 lines changed or added
