draft-pre-ch-9.txt | draft-ietf-nfsv4-minorversion1-22.txt | |||
---|---|---|---|---|
NFSv4 S. Shepler | NFSv4 S. Shepler | |||
Internet-Draft M. Eisler | Internet-Draft M. Eisler | |||
Intended status: Standards Track D. Noveck | Intended status: Standards Track D. Noveck | |||
Expires: September 19, 2008 Editors | Expires: September 22, 2008 Editors | |||
March 18, 2008 | March 21, 2008 | |||
NFS Version 4 Minor Version 1 | NFS Version 4 Minor Version 1 | |||
draft-ietf-nfsv4-minorversion1-22.txt | draft-ietf-nfsv4-minorversion1-22.txt | |||
Status of this Memo | Status of this Memo | |||
By submitting this Internet-Draft, each author represents that any | By submitting this Internet-Draft, each author represents that any | |||
applicable patent or other IPR claims of which he or she is aware | applicable patent or other IPR claims of which he or she is aware | |||
have been or will be disclosed, and any of which he or she becomes | have been or will be disclosed, and any of which he or she becomes | |||
aware will be disclosed, in accordance with Section 6 of BCP 79. | aware will be disclosed, in accordance with Section 6 of BCP 79. | |||
skipping to change at page 1, line 35 | skipping to change at page 1, line 35 | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
This Internet-Draft will expire on September 19, 2008. | This Internet-Draft will expire on September 22, 2008. | |||
Copyright Notice | Copyright Notice | |||
Copyright (C) The IETF Trust (2008). | Copyright (C) The IETF Trust (2008). | |||
Abstract | Abstract | |||
This Internet-Draft describes NFS version 4 minor version one, | This Internet-Draft describes NFS version 4 minor version one, | |||
including features retained from the base protocol and protocol | including features retained from the base protocol and protocol | |||
extensions made subsequently. Major extensions introduced in NFS | extensions made subsequently. Major extensions introduced in NFS | |||
skipping to change at page 4, line 39 | skipping to change at page 4, line 39 | |||
8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 158 | 8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 158 | |||
8.4.1. Client Failure and Recovery . . . . . . . . . . . . 159 | 8.4.1. Client Failure and Recovery . . . . . . . . . . . . 159 | |||
8.4.2. Server Failure and Recovery . . . . . . . . . . . . 159 | 8.4.2. Server Failure and Recovery . . . . . . . . . . . . 159 | |||
8.4.3. Network Partitions and Recovery . . . . . . . . . . 163 | 8.4.3. Network Partitions and Recovery . . . . . . . . . . 163 | |||
8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 168 | 8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 168 | |||
8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 169 | 8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 169 | |||
8.7. Clocks, Propagation Delay, and Calculating Lease | 8.7. Clocks, Propagation Delay, and Calculating Lease | |||
Expiration . . . . . . . . . . . . . . . . . . . . . . . 169 | Expiration . . . . . . . . . . . . . . . . . . . . . . . 169 | |||
8.8. Obsolete Locking Infrastructure From NFSv4.0 . . . . . . 170 | 8.8. Obsolete Locking Infrastructure From NFSv4.0 . . . . . . 170 | |||
9. File Locking and Share Reservations . . . . . . . . . . . . . 171 | 9. File Locking and Share Reservations . . . . . . . . . . . . . 171 | |||
9.1. Opens and Byte-range Locks . . . . . . . . . . . . . . . 171 | 9.1. Opens and Byte-Range Locks . . . . . . . . . . . . . . . 171 | |||
9.1.1. State-owner Definition . . . . . . . . . . . . . . . 171 | 9.1.1. State-owner Definition . . . . . . . . . . . . . . . 171 | |||
9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 172 | 9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 172 | |||
9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 175 | 9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 175 | |||
9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 175 | 9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 175 | |||
9.4. Stateid Seqid Values and Byte-range Locks . . . . . . . 176 | 9.4. Stateid Seqid Values and Byte-Range Locks . . . . . . . 176 | |||
9.5. Issues with Multiple Open-owners . . . . . . . . . . . . 176 | 9.5. Issues with Multiple Open-Owners . . . . . . . . . . . . 176 | |||
9.6. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 176 | 9.6. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 177 | |||
9.7. Share Reservations . . . . . . . . . . . . . . . . . . . 178 | 9.7. Share Reservations . . . . . . . . . . . . . . . . . . . 178 | |||
9.8. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 178 | 9.8. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 178 | |||
9.9. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 179 | 9.9. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 179 | |||
9.10. Parallel OPENs . . . . . . . . . . . . . . . . . . . . . 180 | 9.10. Parallel OPENs . . . . . . . . . . . . . . . . . . . . . 180 | |||
9.11. Reclaim of Open and Byte-range Locks . . . . . . . . . . 180 | 9.11. Reclaim of Open and Byte-Range Locks . . . . . . . . . . 181 | |||
10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 181 | 10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 181 | |||
10.1. Performance Challenges for Client-Side Caching . . . . . 181 | 10.1. Performance Challenges for Client-Side Caching . . . . . 182 | |||
10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 182 | 10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 183 | |||
10.2.1. Delegation Recovery . . . . . . . . . . . . . . . . 185 | 10.2.1. Delegation Recovery . . . . . . . . . . . . . . . . 185 | |||
10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 187 | 10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 187 | |||
10.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . 187 | 10.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . 187 | |||
10.3.2. Data Caching and File Locking . . . . . . . . . . . 188 | 10.3.2. Data Caching and File Locking . . . . . . . . . . . 188 | |||
10.3.3. Data Caching and Mandatory File Locking . . . . . . 190 | 10.3.3. Data Caching and Mandatory File Locking . . . . . . 190 | |||
10.3.4. Data Caching and File Identity . . . . . . . . . . . 190 | 10.3.4. Data Caching and File Identity . . . . . . . . . . . 190 | |||
10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 191 | 10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 192 | |||
10.4.1. Open Delegation and Data Caching . . . . . . . . . . 194 | 10.4.1. Open Delegation and Data Caching . . . . . . . . . . 194 | |||
10.4.2. Open Delegation and File Locks . . . . . . . . . . . 195 | 10.4.2. Open Delegation and File Locks . . . . . . . . . . . 195 | |||
10.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . 195 | 10.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . 196 | |||
10.4.4. Recall of Open Delegation . . . . . . . . . . . . . 198 | 10.4.4. Recall of Open Delegation . . . . . . . . . . . . . 199 | |||
10.4.5. Clients that Fail to Honor Delegation Recalls . . . 200 | 10.4.5. Clients that Fail to Honor Delegation Recalls . . . 201 | |||
10.4.6. Delegation Revocation . . . . . . . . . . . . . . . 201 | 10.4.6. Delegation Revocation . . . . . . . . . . . . . . . 201 | |||
10.4.7. Delegations via WANT_DELEGATION . . . . . . . . . . 201 | 10.4.7. Delegations via WANT_DELEGATION . . . . . . . . . . 202 | |||
10.5. Data Caching and Revocation . . . . . . . . . . . . . . 202 | 10.5. Data Caching and Revocation . . . . . . . . . . . . . . 203 | |||
10.5.1. Revocation Recovery for Write Open Delegation . . . 203 | 10.5.1. Revocation Recovery for Write Open Delegation . . . 203 | |||
10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 203 | 10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 204 | |||
10.7. Data and Metadata Caching and Memory Mapped Files . . . 205 | 10.7. Data and Metadata Caching and Memory Mapped Files . . . 206 | |||
10.8. Name and Directory Caching without Directory | 10.8. Name and Directory Caching without Directory | |||
Delegations . . . . . . . . . . . . . . . . . . . . . . 208 | Delegations . . . . . . . . . . . . . . . . . . . . . . 208 | |||
10.8.1. Name Caching . . . . . . . . . . . . . . . . . . . . 208 | 10.8.1. Name Caching . . . . . . . . . . . . . . . . . . . . 208 | |||
10.8.2. Directory Caching . . . . . . . . . . . . . . . . . 209 | 10.8.2. Directory Caching . . . . . . . . . . . . . . . . . 210 | |||
10.9. Directory Delegations . . . . . . . . . . . . . . . . . 210 | 10.9. Directory Delegations . . . . . . . . . . . . . . . . . 211 | |||
10.9.1. Introduction to Directory Delegations . . . . . . . 210 | 10.9.1. Introduction to Directory Delegations . . . . . . . 211 | |||
10.9.2. Directory Delegation Design . . . . . . . . . . . . 211 | 10.9.2. Directory Delegation Design . . . . . . . . . . . . 212 | |||
10.9.3. Attributes in Support of Directory Notifications . . 212 | 10.9.3. Attributes in Support of Directory Notifications . . 213 | |||
10.9.4. Directory Delegation Recall . . . . . . . . . . . . 212 | 10.9.4. Directory Delegation Recall . . . . . . . . . . . . 213 | |||
10.9.5. Directory Delegation Recovery . . . . . . . . . . . 213 | 10.9.5. Directory Delegation Recovery . . . . . . . . . . . 214 | |||
11. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 213 | 11. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 214 | |||
11.1. Location Attributes . . . . . . . . . . . . . . . . . . 214 | 11.1. Location Attributes . . . . . . . . . . . . . . . . . . 214 | |||
11.2. File System Presence or Absence . . . . . . . . . . . . 214 | 11.2. File System Presence or Absence . . . . . . . . . . . . 215 | |||
11.3. Getting Attributes for an Absent File System . . . . . . 215 | 11.3. Getting Attributes for an Absent File System . . . . . . 216 | |||
11.3.1. GETATTR Within an Absent File System . . . . . . . . 215 | 11.3.1. GETATTR Within an Absent File System . . . . . . . . 216 | |||
11.3.2. READDIR and Absent File Systems . . . . . . . . . . 217 | 11.3.2. READDIR and Absent File Systems . . . . . . . . . . 217 | |||
11.4. Uses of Location Information . . . . . . . . . . . . . . 217 | 11.4. Uses of Location Information . . . . . . . . . . . . . . 218 | |||
11.4.1. File System Replication . . . . . . . . . . . . . . 218 | 11.4.1. File System Replication . . . . . . . . . . . . . . 219 | |||
11.4.2. File System Migration . . . . . . . . . . . . . . . 219 | 11.4.2. File System Migration . . . . . . . . . . . . . . . 219 | |||
11.4.3. Referrals . . . . . . . . . . . . . . . . . . . . . 220 | 11.4.3. Referrals . . . . . . . . . . . . . . . . . . . . . 221 | |||
11.5. Location Entries and Server Identity . . . . . . . . . . 221 | 11.5. Location Entries and Server Identity . . . . . . . . . . 222 | |||
11.6. Additional Client-side Considerations . . . . . . . . . 222 | 11.6. Additional Client-side Considerations . . . . . . . . . 223 | |||
11.7. Effecting File System Transitions . . . . . . . . . . . 223 | 11.7. Effecting File System Transitions . . . . . . . . . . . 223 | |||
11.7.1. File System Transitions and Simultaneous Access . . 224 | 11.7.1. File System Transitions and Simultaneous Access . . 225 | |||
11.7.2. Simultaneous Use and Transparent Transitions . . . . 225 | 11.7.2. Simultaneous Use and Transparent Transitions . . . . 225 | |||
11.7.3. Filehandles and File System Transitions . . . . . . 227 | 11.7.3. Filehandles and File System Transitions . . . . . . 228 | |||
11.7.4. Fileids and File System Transitions . . . . . . . . 228 | 11.7.4. Fileids and File System Transitions . . . . . . . . 228 | |||
11.7.5. Fsids and File System Transitions . . . . . . . . . 229 | 11.7.5. Fsids and File System Transitions . . . . . . . . . 230 | |||
11.7.6. The Change Attribute and File System Transitions . . 230 | 11.7.6. The Change Attribute and File System Transitions . . 230 | |||
11.7.7. Lock State and File System Transitions . . . . . . . 230 | 11.7.7. Lock State and File System Transitions . . . . . . . 231 | |||
11.7.8. Write Verifiers and File System Transitions . . . . 234 | 11.7.8. Write Verifiers and File System Transitions . . . . 235 | |||
11.7.9. Readdir Cookies and Verifiers and File System | 11.7.9. Readdir Cookies and Verifiers and File System | |||
Transitions . . . . . . . . . . . . . . . . . . . . 234 | Transitions . . . . . . . . . . . . . . . . . . . . 235 | |||
11.7.10. File System Data and File System Transitions . . . . 235 | 11.7.10. File System Data and File System Transitions . . . . 235 | |||
11.8. Effecting File System Referrals . . . . . . . . . . . . 236 | 11.8. Effecting File System Referrals . . . . . . . . . . . . 237 | |||
11.8.1. Referral Example (LOOKUP) . . . . . . . . . . . . . 236 | 11.8.1. Referral Example (LOOKUP) . . . . . . . . . . . . . 237 | |||
11.8.2. Referral Example (READDIR) . . . . . . . . . . . . . 240 | 11.8.2. Referral Example (READDIR) . . . . . . . . . . . . . 241 | |||
11.9. The Attribute fs_locations . . . . . . . . . . . . . . . 243 | 11.9. The Attribute fs_locations . . . . . . . . . . . . . . . 243 | |||
11.10. The Attribute fs_locations_info . . . . . . . . . . . . 245 | 11.10. The Attribute fs_locations_info . . . . . . . . . . . . 246 | |||
11.10.1. The fs_locations_server4 Structure . . . . . . . . . 248 | 11.10.1. The fs_locations_server4 Structure . . . . . . . . . 249 | |||
11.10.2. The fs_locations_info4 Structure . . . . . . . . . . 254 | 11.10.2. The fs_locations_info4 Structure . . . . . . . . . . 254 | |||
11.10.3. The fs_locations_item4 Structure . . . . . . . . . . 255 | 11.10.3. The fs_locations_item4 Structure . . . . . . . . . . 255 | |||
11.11. The Attribute fs_status . . . . . . . . . . . . . . . . 257 | 11.11. The Attribute fs_status . . . . . . . . . . . . . . . . 257 | |||
12. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 260 | 12. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 261 | |||
12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 260 | 12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 261 | |||
12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 262 | 12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 263 | |||
12.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 262 | 12.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 263 | |||
12.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 262 | 12.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 263 | |||
12.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 263 | 12.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 264 | |||
12.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 263 | 12.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 264 | |||
12.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 263 | 12.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 264 | |||
12.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 263 | 12.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 264 | |||
12.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 263 | 12.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 264 | |||
12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 264 | 12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 265 | |||
12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 264 | 12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 265 | |||
12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 265 | 12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 266 | |||
12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 266 | 12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 267 | |||
12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 267 | 12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 268 | |||
12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 267 | 12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 268 | |||
12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 267 | 12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 268 | |||
12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 269 | 12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 270 | |||
12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 270 | 12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 271 | |||
12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 271 | 12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 272 | |||
12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 274 | 12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 275 | |||
12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 281 | 12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 282 | |||
12.5.7. Metadata Server Write Propagation . . . . . . . . . 281 | 12.5.7. Metadata Server Write Propagation . . . . . . . . . 282 | |||
12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 281 | 12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 282 | |||
12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 283 | 12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 284 | |||
12.7.1. Recovery from Client Restart . . . . . . . . . . . . 283 | 12.7.1. Recovery from Client Restart . . . . . . . . . . . . 284 | |||
12.7.2. Dealing with Lease Expiration on the Client . . . . 284 | 12.7.2. Dealing with Lease Expiration on the Client . . . . 285 | |||
12.7.3. Dealing with Loss of Layout State on the Metadata | 12.7.3. Dealing with Loss of Layout State on the Metadata | |||
Server . . . . . . . . . . . . . . . . . . . . . . . 285 | Server . . . . . . . . . . . . . . . . . . . . . . . 286 | |||
12.7.4. Recovery from Metadata Server Restart . . . . . . . 285 | 12.7.4. Recovery from Metadata Server Restart . . . . . . . 286 | |||
12.7.5. Operations During Metadata Server Grace Period . . . 287 | 12.7.5. Operations During Metadata Server Grace Period . . . 288 | |||
12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 288 | 12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 289 | |||
12.8. Metadata and Storage Device Roles . . . . . . . . . . . 288 | 12.8. Metadata and Storage Device Roles . . . . . . . . . . . 289 | |||
12.9. Security Considerations for pNFS . . . . . . . . . . . . 288 | 12.9. Security Considerations for pNFS . . . . . . . . . . . . 289 | |||
13. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 289 | 13. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 290 | |||
13.1. Client ID and Session Considerations . . . . . . . . . . 290 | 13.1. Client ID and Session Considerations . . . . . . . . . . 291 | |||
13.1.1. Sessions Considerations for Data Servers . . . . . . 292 | 13.1.1. Sessions Considerations for Data Servers . . . . . . 293 | |||
13.2. File Layout Definitions . . . . . . . . . . . . . . . . 292 | 13.2. File Layout Definitions . . . . . . . . . . . . . . . . 293 | |||
13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 293 | 13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 294 | |||
13.4. Interpreting the File Layout . . . . . . . . . . . . . . 297 | 13.4. Interpreting the File Layout . . . . . . . . . . . . . . 298 | |||
13.4.1. Determining the Stripe Unit Number . . . . . . . . . 297 | 13.4.1. Determining the Stripe Unit Number . . . . . . . . . 298 | |||
13.4.2. Interpreting the File Layout Using Sparse Packing . 297 | 13.4.2. Interpreting the File Layout Using Sparse Packing . 298 | |||
13.4.3. Interpreting the File Layout Using Dense Packing . . 300 | 13.4.3. Interpreting the File Layout Using Dense Packing . . 301 | |||
13.4.4. Sparse and Dense Stripe Unit Packing . . . . . . . . 302 | 13.4.4. Sparse and Dense Stripe Unit Packing . . . . . . . . 303 | |||
13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 304 | 13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 305 | |||
13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 305 | 13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 306 | |||
13.7. COMMIT Through Metadata Server . . . . . . . . . . . . . 307 | 13.7. COMMIT Through Metadata Server . . . . . . . . . . . . . 308 | |||
13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 309 | 13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 310 | |||
13.9. Metadata and Data Server State Coordination . . . . . . 309 | 13.9. Metadata and Data Server State Coordination . . . . . . 310 | |||
13.9.1. Global Stateid Requirements . . . . . . . . . . . . 309 | 13.9.1. Global Stateid Requirements . . . . . . . . . . . . 310 | |||
13.9.2. Data Server State Propagation . . . . . . . . . . . 310 | 13.9.2. Data Server State Propagation . . . . . . . . . . . 311 | |||
13.10. Data Server Component File Size . . . . . . . . . . . . 312 | 13.10. Data Server Component File Size . . . . . . . . . . . . 313 | |||
13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 313 | 13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 314 | |||
13.12. Security Considerations for the File Layout Type . . . . 313 | 13.12. Security Considerations for the File Layout Type . . . . 314 | |||
14. Internationalization . . . . . . . . . . . . . . . . . . . . 314 | 14. Internationalization . . . . . . . . . . . . . . . . . . . . 315 | |||
14.1. Stringprep profile for the utf8str_cs type . . . . . . . 315 | 14.1. Stringprep profile for the utf8str_cs type . . . . . . . 316 | |||
14.2. Stringprep profile for the utf8str_cis type . . . . . . 317 | 14.2. Stringprep profile for the utf8str_cis type . . . . . . 318 | |||
14.3. Stringprep profile for the utf8str_mixed type . . . . . 318 | 14.3. Stringprep profile for the utf8str_mixed type . . . . . 319 | |||
14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 320 | 14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 321 | |||
14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 320 | 14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 321 | |||
15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 321 | 15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 322 | |||
15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 321 | 15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 322 | |||
15.1.1. General Errors . . . . . . . . . . . . . . . . . . . 323 | 15.1.1. General Errors . . . . . . . . . . . . . . . . . . . 324 | |||
15.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 325 | 15.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 326 | |||
15.1.3. Compound Structure Errors . . . . . . . . . . . . . 326 | 15.1.3. Compound Structure Errors . . . . . . . . . . . . . 327 | |||
15.1.4. File System Errors . . . . . . . . . . . . . . . . . 328 | 15.1.4. File System Errors . . . . . . . . . . . . . . . . . 329 | |||
15.1.5. State Management Errors . . . . . . . . . . . . . . 330 | 15.1.5. State Management Errors . . . . . . . . . . . . . . 331 | |||
15.1.6. Security Errors . . . . . . . . . . . . . . . . . . 331 | 15.1.6. Security Errors . . . . . . . . . . . . . . . . . . 332 | |||
15.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 331 | 15.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 332 | |||
15.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 332 | 15.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 333 | |||
15.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 333 | 15.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 334 | |||
15.1.10. pNFS Errors . . . . . . . . . . . . . . . . . . . . 334 | 15.1.10. pNFS Errors . . . . . . . . . . . . . . . . . . . . 335 | |||
15.1.11. Session Use Errors . . . . . . . . . . . . . . . . . 335 | 15.1.11. Session Use Errors . . . . . . . . . . . . . . . . . 336 | |||
15.1.12. Session Management Errors . . . . . . . . . . . . . 336 | 15.1.12. Session Management Errors . . . . . . . . . . . . . 337 | |||
15.1.13. Client Management Errors . . . . . . . . . . . . . . 337 | 15.1.13. Client Management Errors . . . . . . . . . . . . . . 338 | |||
15.1.14. Delegation Errors . . . . . . . . . . . . . . . . . 338 | 15.1.14. Delegation Errors . . . . . . . . . . . . . . . . . 339 | |||
15.1.15. Attribute Handling Errors . . . . . . . . . . . . . 338 | 15.1.15. Attribute Handling Errors . . . . . . . . . . . . . 339 | |||
15.1.16. Obsoleted Errors . . . . . . . . . . . . . . . . . . 339 | 15.1.16. Obsoleted Errors . . . . . . . . . . . . . . . . . . 340 | |||
15.2. Operations and their valid errors . . . . . . . . . . . 340 | 15.2. Operations and their valid errors . . . . . . . . . . . 341 | |||
15.3. Callback operations and their valid errors . . . . . . . 356 | 15.3. Callback operations and their valid errors . . . . . . . 357 | |||
15.4. Errors and the operations that use them . . . . . . . . 358 | 15.4. Errors and the operations that use them . . . . . . . . 359 | |||
16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 372 | 16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 373 | |||
16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 372 | 16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 373 | |||
16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 373 | 16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 374 | |||
17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 383 | 17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 384 | |||
18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 386 | 18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 387 | |||
18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 386 | 18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 387 | |||
18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 389 | 18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 390 | |||
18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 390 | 18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 391 | |||
18.4. Operation 6: CREATE - Create a Non-Regular File Object . 393 | 18.4. Operation 6: CREATE - Create a Non-Regular File Object . 394 | |||
18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting | 18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting | |||
Recovery . . . . . . . . . . . . . . . . . . . . . . . . 396 | Recovery . . . . . . . . . . . . . . . . . . . . . . . . 397 | |||
18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 397 | 18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 398 | |||
18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 397 | 18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 398 | |||
18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 399 | 18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 400 | |||
18.9. Operation 11: LINK - Create Link to a File . . . . . . . 400 | 18.9. Operation 11: LINK - Create Link to a File . . . . . . . 401 | |||
18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 402 | 18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 403 | |||
18.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 406 | 18.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 407 | |||
18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 408 | 18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 409 | |||
18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 409 | 18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 410 | |||
18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 411 | 18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 412 | |||
18.15. Operation 17: NVERIFY - Verify Difference in | 18.15. Operation 17: NVERIFY - Verify Difference in | |||
Attributes . . . . . . . . . . . . . . . . . . . . . . . 412 | Attributes . . . . . . . . . . . . . . . . . . . . . . . 413 | |||
18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 413 | 18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 414 | |||
18.17. Operation 19: OPENATTR - Open Named Attribute | 18.17. Operation 19: OPENATTR - Open Named Attribute | |||
Directory . . . . . . . . . . . . . . . . . . . . . . . 432 | Directory . . . . . . . . . . . . . . . . . . . . . . . 433 | |||
18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 433 | 18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 434 | |||
18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 435 | 18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 436 | |||
18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 435 | 18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 436 | |||
18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 437 | 18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 438 | |||
18.22. Operation 25: READ - Read from File . . . . . . . . . . 437 | 18.22. Operation 25: READ - Read from File . . . . . . . . . . 438 | |||
18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 440 | 18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 441 | |||
18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 444 | 18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 445 | |||
18.25. Operation 28: REMOVE - Remove File System Object . . . . 445 | 18.25. Operation 28: REMOVE - Remove File System Object . . . . 446 | |||
18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 447 | 18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 448 | |||
18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 451 | 18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 452 | |||
18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 452 | 18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 453 | |||
18.29. Operation 33: SECINFO - Obtain Available Security . . . 452 | 18.29. Operation 33: SECINFO - Obtain Available Security . . . 453 | |||
18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 455 | 18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 456 | |||
18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 458 | 18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 459 | |||
18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 459 | 18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 460 | |||
18.33. Operation 40: BACKCHANNEL_CTL - Backchannel control . . 464 | 18.33. Operation 40: BACKCHANNEL_CTL - Backchannel control . . 465 | |||
18.34. Operation 41: BIND_CONN_TO_SESSION . . . . . . . . . . . 465 | 18.34. Operation 41: BIND_CONN_TO_SESSION . . . . . . . . . . . 466 | |||
18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 468 | 18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 469 | |||
18.36. Operation 43: CREATE_SESSION - Create New Session and | 18.36. Operation 43: CREATE_SESSION - Create New Session and | |||
Confirm Client ID . . . . . . . . . . . . . . . . . . . 484 | Confirm Client ID . . . . . . . . . . . . . . . . . . . 485 | |||
18.37. Operation 44: DESTROY_SESSION - Destroy existing | 18.37. Operation 44: DESTROY_SESSION - Destroy existing | |||
session . . . . . . . . . . . . . . . . . . . . . . . . 494 | session . . . . . . . . . . . . . . . . . . . . . . . . 495 | |||
18.38. Operation 45: FREE_STATEID - Free stateid with no | 18.38. Operation 45: FREE_STATEID - Free stateid with no | |||
locks . . . . . . . . . . . . . . . . . . . . . . . . . 496 | locks . . . . . . . . . . . . . . . . . . . . . . . . . 497 | |||
18.39. Operation 46: GET_DIR_DELEGATION - Get a directory | 18.39. Operation 46: GET_DIR_DELEGATION - Get a directory | |||
delegation . . . . . . . . . . . . . . . . . . . . . . . 497 | delegation . . . . . . . . . . . . . . . . . . . . . . . 498 | |||
18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 501 | 18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 502 | |||
18.41. Operation 48: GETDEVICELIST - Get All Device Mappings | 18.41. Operation 48: GETDEVICELIST - Get All Device Mappings | |||
for a File System . . . . . . . . . . . . . . . . . . . 503 | for a File System . . . . . . . . . . . . . . . . . . . 504 | |||
18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using | 18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using | |||
a layout . . . . . . . . . . . . . . . . . . . . . . . . 505 | a layout . . . . . . . . . . . . . . . . . . . . . . . . 506 | |||
18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 508 | 18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 509 | |||
18.44. Operation 51: LAYOUTRETURN - Release Layout | 18.44. Operation 51: LAYOUTRETURN - Release Layout | |||
Information . . . . . . . . . . . . . . . . . . . . . . 512 | Information . . . . . . . . . . . . . . . . . . . . . . 513 | |||
18.45. Operation 52: SECINFO_NO_NAME - Get Security on | 18.45. Operation 52: SECINFO_NO_NAME - Get Security on | |||
Unnamed Object . . . . . . . . . . . . . . . . . . . . . 517 | Unnamed Object . . . . . . . . . . . . . . . . . . . . . 518 | |||
18.46. Operation 53: SEQUENCE - Supply per-procedure | 18.46. Operation 53: SEQUENCE - Supply per-procedure | |||
sequencing and control . . . . . . . . . . . . . . . . . 518 | sequencing and control . . . . . . . . . . . . . . . . . 519 | |||
18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 524 | 18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 525 | |||
18.48. Operation 55: TEST_STATEID - Test stateids for | 18.48. Operation 55: TEST_STATEID - Test stateids for | |||
validity . . . . . . . . . . . . . . . . . . . . . . . . 526 | validity . . . . . . . . . . . . . . . . . . . . . . . . 527 | |||
18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 528 | 18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 529 | |||
18.50. Operation 57: DESTROY_CLIENTID - Destroy existing | 18.50. Operation 57: DESTROY_CLIENTID - Destroy existing | |||
client ID . . . . . . . . . . . . . . . . . . . . . . . 532 | client ID . . . . . . . . . . . . . . . . . . . . . . . 533 | |||
18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims | 18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims | |||
Finished . . . . . . . . . . . . . . . . . . . . . . . . 532 | Finished . . . . . . . . . . . . . . . . . . . . . . . . 533 | |||
18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 535 | 18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 536 | |||
19. NFSv4.1 Callback Procedures . . . . . . . . . . . . . . . . . 535 | 19. NFSv4.1 Callback Procedures . . . . . . . . . . . . . . . . . 536 | |||
19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 536 | 19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 537 | |||
19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 536 | 19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 537 | |||
20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 540 | 20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 541 | |||
20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 540 | 20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 541 | |||
20.2. Operation 4: CB_RECALL - Recall an Open Delegation . . . 541 | 20.2. Operation 4: CB_RECALL - Recall an Open Delegation . . . 542 | |||
20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from | 20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from | |||
Client . . . . . . . . . . . . . . . . . . . . . . . . . 542 | Client . . . . . . . . . . . . . . . . . . . . . . . . . 543 | |||
20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 546 | 20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 547 | |||
20.5. Operation 7: CB_PUSH_DELEG - Offer Delegation to | 20.5. Operation 7: CB_PUSH_DELEG - Offer Delegation to | |||
Client . . . . . . . . . . . . . . . . . . . . . . . . . 550 | Client . . . . . . . . . . . . . . . . . . . . . . . . . 551 | |||
20.6. Operation 8: CB_RECALL_ANY - Keep any N delegations . . 551 | 20.6. Operation 8: CB_RECALL_ANY - Keep any N delegations . . 552 | |||
20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal | 20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal | |||
Resources for Recallable Objects . . . . . . . . . . . . 553 | Resources for Recallable Objects . . . . . . . . . . . . 554 | |||
20.8. Operation 10: CB_RECALL_SLOT - change flow control | 20.8. Operation 10: CB_RECALL_SLOT - change flow control | |||
limits . . . . . . . . . . . . . . . . . . . . . . . . . 554 | limits . . . . . . . . . . . . . . . . . . . . . . . . . 555 | |||
20.9. Operation 11: CB_SEQUENCE - Supply backchannel | 20.9. Operation 11: CB_SEQUENCE - Supply backchannel | |||
sequencing and control . . . . . . . . . . . . . . . . . 555 | sequencing and control . . . . . . . . . . . . . . . . . 556 | |||
20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending | 20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending | |||
Delegation Wants . . . . . . . . . . . . . . . . . . . . 557 | Delegation Wants . . . . . . . . . . . . . . . . . . . . 558 | |||
20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible | 20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible | |||
lock availability . . . . . . . . . . . . . . . . . . . 558 | lock availability . . . . . . . . . . . . . . . . . . . 559 | |||
20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify device ID | 20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify device ID | |||
changes . . . . . . . . . . . . . . . . . . . . . . . . 560 | changes . . . . . . . . . . . . . . . . . . . . . . . . 561 | |||
20.13. Operation 10044: CB_ILLEGAL - Illegal Callback | 20.13. Operation 10044: CB_ILLEGAL - Illegal Callback | |||
Operation . . . . . . . . . . . . . . . . . . . . . . . 562 | Operation . . . . . . . . . . . . . . . . . . . . . . . 563 | |||
21. Security Considerations . . . . . . . . . . . . . . . . . . . 562 | 21. Security Considerations . . . . . . . . . . . . . . . . . . . 563 | |||
22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 564 | 22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 565 | |||
22.1. Named Attribute Definitions . . . . . . . . . . . . . . 564 | 22.1. Named Attribute Definitions . . . . . . . . . . . . . . 565 | |||
22.2. ONC RPC Network Identifiers (netids) . . . . . . . . . . 564 | 22.2. ONC RPC Network Identifiers (netids) . . . . . . . . . . 565 | |||
22.3. Defining New Notifications . . . . . . . . . . . . . . . 565 | 22.3. Defining New Notifications . . . . . . . . . . . . . . . 566 | |||
22.4. Defining New Layout Types . . . . . . . . . . . . . . . 565 | 22.4. Defining New Layout Types . . . . . . . . . . . . . . . 566 | |||
22.5. Path Variable Definitions . . . . . . . . . . . . . . . 567 | 22.5. Path Variable Definitions . . . . . . . . . . . . . . . 568 | |||
22.5.1. Path Variable Values . . . . . . . . . . . . . . . . 567 | 22.5.1. Path Variable Values . . . . . . . . . . . . . . . . 568 | |||
22.5.2. Path Variable Names . . . . . . . . . . . . . . . . 567 | 22.5.2. Path Variable Names . . . . . . . . . . . . . . . . 568 | |||
23. References . . . . . . . . . . . . . . . . . . . . . . . . . 567 | 23. References . . . . . . . . . . . . . . . . . . . . . . . . . 568 | |||
23.1. Normative References . . . . . . . . . . . . . . . . . . 567 | 23.1. Normative References . . . . . . . . . . . . . . . . . . 568 | |||
23.2. Informative References . . . . . . . . . . . . . . . . . 569 | 23.2. Informative References . . . . . . . . . . . . . . . . . 570 | |||
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 570 | Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 571 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 572 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 573 | |||
Intellectual Property and Copyright Statements . . . . . . . . . 574 | Intellectual Property and Copyright Statements . . . . . . . . . 575 | |||
1. Introduction | 1. Introduction | |||
1.1. The NFS Version 4 Minor Version 1 Protocol | 1.1. The NFS Version 4 Minor Version 1 Protocol | |||
The NFS version 4 minor version 1 (NFSv4.1) protocol is the second | The NFS version 4 minor version 1 (NFSv4.1) protocol is the second | |||
minor version of the NFS version 4 (NFSv4) protocol. The first minor | minor version of the NFS version 4 (NFSv4) protocol. The first minor | |||
version, NFSv4.0, is described in [21]. It generally follows the | version, NFSv4.0, is described in [21]. It generally follows the | |||
guidelines for minor versioning model listed in Section 10 of RFC | guidelines for minor versioning model listed in Section 10 of RFC | |||
3530. However, it diverges from guidelines 11 ("a client and server | 3530. However, it diverges from guidelines 11 ("a client and server | |||
skipping to change at page 147, line 40 | skipping to change at page 147, line 40 | |||
8. State Management | 8. State Management | |||
Integrating locking into the NFS protocol necessarily causes it to be | Integrating locking into the NFS protocol necessarily causes it to be | |||
stateful. With the inclusion of such features as share reservations, | stateful. With the inclusion of such features as share reservations, | |||
file and directory delegations, recallable layouts, and support for | file and directory delegations, recallable layouts, and support for | |||
mandatory record locking, the protocol becomes substantially more | mandatory record locking, the protocol becomes substantially more | |||
dependent on proper management of state than the traditional | dependent on proper management of state than the traditional | |||
combination of NFS and NLM [36]. These features include expanded | combination of NFS and NLM [36]. These features include expanded | |||
locking facilities, which provide some measure of interclient | locking facilities, which provide some measure of interclient | |||
exclusion, but the state is also valuable to offering features not | exclusion, but the state also offers features not readily providable | |||
readily providable using a stateless model. There are three | using a stateless model. There are three components to making this | |||
components to making this state manageable: | state manageable: | |||
o Clear division between client and server | o Clear division between client and server | |||
o Ability to reliably detect inconsistency in state between client | o Ability to reliably detect inconsistency in state between client | |||
and server | and server | |||
o Simple and robust recovery mechanisms | o Simple and robust recovery mechanisms | |||
In this model, the server owns the state information. The client | In this model, the server owns the state information. The client | |||
requests changes in locks and the server responds with the changes | requests changes in locks and the server responds with the changes | |||
made. Non-client-initiated changes in locking state are infrequent | made. Non-client-initiated changes in locking state are infrequent. | |||
and the client receives prompt notification of them and can adjust | The client receives prompt notification of such changes and can | |||
its view of the locking state to reflect the server's changes. | adjust its view of the locking state to reflect the server's changes. | |||
Individual pieces of state created by the server and passed to the | Individual pieces of state created by the server and passed to the | |||
client at its request are represented by 128-bit stateids. These | client at its request are represented by 128-bit stateids. These | |||
stateids may represent a particular open file, a set of byte-range | stateids may represent a particular open file, a set of byte-range | |||
locks held by a particular owner, or a recallable delegation of | locks held by a particular owner, or a recallable delegation of | |||
privileges to access a file in particular ways, or at a particular | privileges to access a file in particular ways, or at a particular | |||
location. | location. | |||
In all cases, there is a transition from the most general information | In all cases, there is a transition from the most general information | |||
which represents a client as a whole to the eventual lightweight | which represents a client as a whole to the eventual lightweight | |||
skipping to change at page 149, line 32 | skipping to change at page 149, line 32 | |||
With the exception of special stateids, to be discussed later, each | With the exception of special stateids, to be discussed later, each | |||
stateid represents locking objects of one of a set of types defined | stateid represents locking objects of one of a set of types defined | |||
by the NFSv4.1 protocol. Note that in all these cases, where we | by the NFSv4.1 protocol. Note that in all these cases, where we | |||
speak of guarantee, it is understood there are situations such as a | speak of guarantee, it is understood there are situations such as a | |||
client restart, or lock revocation, that allow the guarantee to be | client restart, or lock revocation, that allow the guarantee to be | |||
voided. | voided. | |||
o Stateids may represent opens of files. | o Stateids may represent opens of files. | |||
Each stateid in this case represents the open for a given client | Each stateid in this case represents the open state for a given | |||
ID/open-owner/filehandle triple. Such stateids are subject to | client ID/open-owner/filehandle triple. Such stateids are subject | |||
change (with consequent incrementing of the stateid's seqid) in | to change (with consequent incrementing of the stateid's seqid) in | |||
response to OPENs that result in upgrade and OPEN_DOWNGRADE | response to OPENs that result in upgrade and OPEN_DOWNGRADE | |||
operations. | operations. | |||
o Stateids may represent sets of byte-range locks. | o Stateids may represent sets of byte-range locks. | |||
All locks held on a particular file by a particular owner and all | All locks held on a particular file by a particular owner and all | |||
gotten under the aegis of a particular open file are associated | gotten under the aegis of a particular open file are associated | |||
with a single stateid with the seqid being increment whenever LOCK | with a single stateid with the seqid being incremented whenever | |||
and LOCKU operations affect that set of locks. | LOCK and LOCKU operations affect that set of locks. | |||
o Stateids may represent file delegations, which are recallable | o Stateids may represent file delegations, which are recallable | |||
guarantees by the server to the client, that other clients will | guarantees by the server to the client, that other clients will | |||
not reference, or will not modify a particular file, until the | not reference, or will not modify a particular file, until the | |||
delegation is returned. In NFSv4.1, file delegations may be | delegation is returned. In NFSv4.1, file delegations may be | |||
obtained on both regular and non-regular files. | obtained on both regular and non-regular files. | |||
A stateid represents a single delegation held by a client for a | A stateid represents a single delegation held by a client for a | |||
particular filehandle. | particular filehandle. | |||
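The relationship between a stateid and its seqid can be sketched as follows. This is only an illustration, not text from the draft: the excerpt states that a stateid is 128 bits and carries a seqid that is incremented when the represented state changes. The split into a 32-bit seqid followed by 96 opaque bits, the wrap from the maximum value back to one, and the names Stateid, bumped, and to_bytes are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stateid:
    """Hypothetical model of a 128-bit stateid (layout is an assumption)."""
    seqid: int    # assumed 32-bit sequence number
    other: bytes  # assumed 12 opaque bytes chosen by the server

    def bumped(self) -> "Stateid":
        """Return the stateid after a change to the state it represents,
        e.g. an OPEN upgrade, OPEN_DOWNGRADE, LOCK, or LOCKU."""
        nxt = (self.seqid + 1) & 0xFFFFFFFF
        if nxt == 0:
            nxt = 1  # assumption: zero is skipped when the seqid wraps
        return Stateid(nxt, self.other)

    def to_bytes(self) -> bytes:
        """Serialize as 4 bytes of seqid followed by the opaque part."""
        return self.seqid.to_bytes(4, "big") + self.other
```

The "other" part stays fixed for the life of the open or lock set, so a client can tell that two stateids name the same state even when their seqids differ.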
skipping to change at page 157, line 20 | skipping to change at page 157, line 20 | |||
used for one of those connections. | used for one of those connections. | |||
o Transport retransmission delays might become so large as to | o Transport retransmission delays might become so large as to | |||
approach or exceed the length of the lease period. This may be | approach or exceed the length of the lease period. This may be | |||
particularly likely when the server is unresponsive due to a | particularly likely when the server is unresponsive due to a | |||
restart; see Section 8.4.2.1. If the client implementation is not | restart; see Section 8.4.2.1. If the client implementation is not | |||
careful, transport retransmission delays can result in the client | careful, transport retransmission delays can result in the client | |||
failing to detect a server restart before the grace period ends. | failing to detect a server restart before the grace period ends. | |||
The scenario is that the client is using a transport with | The scenario is that the client is using a transport with | |||
exponential back off, such that the maximum retransmission timeout | exponential back off, such that the maximum retransmission timeout | |||
excees the both the grace period and the lease_time attribute. A | exceeds both the grace period and the lease_time attribute. A | |||
network partition causes the client's connection's retransmission | network partition causes the client's connection's retransmission | |||
interval to back off, and even after the partition heals, the next | interval to back off, and even after the partition heals, the next | |||
transport-level retransmission is sent after the server has | transport-level retransmission is sent after the server has | |||
restarted and its grace period ends. | restarted and its grace period ends. | |||
The client MUST either recover from the ensuing NFS4ERR_NO_GRACE | The client MUST either recover from the ensuing NFS4ERR_NO_GRACE | |||
errors, or it MUST ensure that despite transport level | errors, or it MUST ensure that despite transport level | |||
retransmission intervals that exceed the lease_time, nonetheless a | retransmission intervals that exceed the lease_time, nonetheless a | |||
SEQUENCE operation is sent that renews the lease before | SEQUENCE operation is sent that renews the lease before | |||
expiration. The client can achieve this by associating a new | expiration. The client can achieve this by associating a new | |||
connection with the session, and sending a SEQUENCE operation on | connection with the session, and sending a SEQUENCE operation on | |||
it. However, if the attempt to establish a new connection is | it. However, if the attempt to establish a new connection is | |||
delayed for same reason (exponential backoff of the connection | delayed for some reason (e.g. exponential backoff of the | |||
establishment packets), the client will have to abort the | connection establishment packets), the client will have to abort | |||
connection establishment attempt before the lease expires, and try | the connection establishment attempt before the lease expires, and | |||
again. | attempt to re-connect. | |||
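One way a client might decide when to open a new connection, and when to give up on a stalled connection attempt, is sketched below. The function name renewal_action, the returned action strings, the choice to start renewing at half the lease, and the default connect timeout of a quarter of the lease are all hypothetical. The draft only requires that a SEQUENCE operation renew the lease before it expires.

```python
def renewal_action(now, last_renewal, lease_time,
                   connect_started=None, connect_timeout=None):
    """Pick the client's next step for keeping its lease alive.

    All times are in seconds. connect_started is the time at which a
    pending connection attempt began, or None if no attempt is pending.
    """
    if connect_timeout is None:
        connect_timeout = lease_time / 4  # hypothetical default
    if now >= last_renewal + lease_time:
        return "lease_expired"
    if connect_started is not None:
        # Do not let connection-level backoff run past the lease:
        # abort the stalled attempt and start a fresh one.
        if now - connect_started >= connect_timeout:
            return "abort_and_reconnect"
        return "wait_for_connect"
    if now >= last_renewal + lease_time / 2:
        # Hypothetical policy: renew on a new connection at half lease.
        return "open_new_connection"
    return "idle"
```

With a 90 second lease, an attempt started at 50 seconds is abandoned once it has been pending for 22.5 seconds, which leaves time for another attempt before the lease expires.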
If the server renews the lease upon receiving a SEQUENCE operation, | If the server renews the lease upon receiving a SEQUENCE operation, | |||
the server MUST NOT allow the lease to expire while the rest of the | the server MUST NOT allow the lease to expire while the rest of the | |||
operations in the COMPOUND procedure's request are still executing. | operations in the COMPOUND procedure's request are still executing. | |||
Once the last operation has finished, and the response to COMPOUND | Once the last operation has finished, and the response to COMPOUND | |||
has been sent, the server MUST set the lease to expire no sooner than | has been sent, the server MUST set the lease to expire no sooner than | |||
the sum of current time and the value of the lease_time attribute. | the sum of current time and the value of the lease_time attribute. | |||
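A minimal sketch of the server-side rule above, assuming a per-client-ID lease object. The class Lease and its method names are hypothetical, and a real server would also need locking around the counters.

```python
class Lease:
    """Hypothetical lease bookkeeping for one client ID."""

    def __init__(self, lease_time):
        self.lease_time = lease_time  # value of the lease_time attribute
        self.expires_at = 0.0         # earliest time the lease may expire
        self.in_flight = 0            # COMPOUNDs still executing

    def begin_compound(self):
        """Called when a COMPOUND starting with SEQUENCE begins."""
        self.in_flight += 1

    def end_compound(self, now):
        """Called once the COMPOUND response has been sent."""
        self.in_flight -= 1
        # Expire no sooner than current time plus lease_time.
        self.expires_at = max(self.expires_at, now + self.lease_time)

    def is_expired(self, now):
        # The lease never expires while a request is still executing.
        return self.in_flight == 0 and now >= self.expires_at
```

Taking the maximum in end_compound keeps an earlier, later-finishing request from shortening a lease that a more recent request has already extended.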
A client ID's lease can expire when it has been at least the lease | A client ID's lease can expire when it has been at least the lease | |||
interval (lease_time) since the last lease-renewing SEQUENCE | interval (lease_time) since the last lease-renewing SEQUENCE | |||
skipping to change at page 159, line 38 | skipping to change at page 159, line 38 | |||
the client ID by establishing a session associated with that client | the client ID by establishing a session associated with that client | |||
ID (see Section 18.36.3 for a description of how this is done). All | ID (see Section 18.36.3 for a description of how this is done). All | |||
locks, including opens, record locks, delegations, and layouts | locks, including opens, record locks, delegations, and layouts | |||
obtained by sessions using that client ID are associated with that | obtained by sessions using that client ID are associated with that | |||
client ID. | client ID. | |||
Since the verifier will be changed by the client upon each | Since the verifier will be changed by the client upon each | |||
initialization, the server can compare a new verifier to the verifier | initialization, the server can compare a new verifier to the verifier | |||
associated with currently held locks and determine that they do not | associated with currently held locks and determine that they do not | |||
match. This signifies the client's new instantiation and subsequent | match. This signifies the client's new instantiation and subsequent | |||
loss (upon confirmation of new the client ID) of locking state. As a | loss (upon confirmation of the new client ID) of locking state. As a | |||
result, the server is free to release all locks held which are | result, the server is free to release all locks held which are | |||
associated with the old client ID which was derived from the old | associated with the old client ID which was derived from the old | |||
verifier. At this point conflicting locks from other clients, kept | verifier. At this point conflicting locks from other clients, kept | |||
waiting while the lease had not yet expired, can be granted. In | waiting while the lease had not yet expired, can be granted. In | |||
addition, all stateids associated with the old client ID can also be | addition, all stateids associated with the old client ID can also be | |||
freed, as they are no longer reference-able. | freed, as they are no longer reference-able. | |||
Note that the verifier must have the same uniqueness properties as | Note that the verifier must have the same uniqueness properties as | |||
the verifier for the COMMIT operation. | the verifier for the COMMIT operation. | |||
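The verifier comparison can be illustrated with a small sketch. The dictionary layout and the function name confirm_client are hypothetical, and the example collapses the separate steps of creating and confirming a client ID into one call.

```python
def confirm_client(records, owner, verifier):
    """Confirm a client ID for owner and return any locks released.

    records maps a client owner string to a dict holding the verifier
    the client last presented and the locks held under that client ID.
    """
    old = records.get(owner)
    released = []
    if old is not None and old["verifier"] != verifier:
        # A different verifier means the client has reinitialized, so
        # state held by the previous instance can be released.
        released = old["locks"]
        old = None
    if old is None:
        records[owner] = {"verifier": verifier, "locks": []}
    return released
```

Locks returned by this function are the ones that conflicting requests from other clients were waiting on, so the caller can grant those requests next.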
skipping to change at page 161, line 13 | skipping to change at page 161, line 13 | |||
are variants of the requests normally used to create locks of that | are variants of the requests normally used to create locks of that | |||
type and are referred to as "reclaim-type" requests and the process | type and are referred to as "reclaim-type" requests and the process | |||
of re-establishing such locks is referred to as "reclaiming" them. | of re-establishing such locks is referred to as "reclaiming" them. | |||
Because each client must have an opportunity to reclaim all of the | Because each client must have an opportunity to reclaim all of the | |||
locks that it has without the possibility that some other client will | locks that it has without the possibility that some other client will | |||
be granted a conflicting lock, a special period called the "grace | be granted a conflicting lock, a special period called the "grace | |||
period" is devoted to the reclaim process. During this period, | period" is devoted to the reclaim process. During this period, | |||
requests creating client IDs and sessions are handled normally, but | requests creating client IDs and sessions are handled normally, but | |||
locking requests are subject to special restrictions. Only reclaim- | locking requests are subject to special restrictions. Only reclaim- | |||
type locking requests are allowed, unless the server is able to | type locking requests are allowed, unless the server can reliably | |||
reliably determine (through state persistently maintained across | determine (through state persistently maintained across restart | |||
restart instances), that granting any such lock cannot possibly | instances), that granting any such lock cannot possibly conflict with | |||
conflict with a subsequent reclaim. When a request is made to obtain | a subsequent reclaim. When a request is made to obtain a new lock | |||
a new lock (i.e. not a reclaim-type request) during the grace period | (i.e. not a reclaim-type request) during the grace period and such a | |||
and such a determination cannot be made, the server must return the | determination cannot be made, the server must return the error | |||
error NFS4ERR_GRACE. | NFS4ERR_GRACE. | |||
Once a session is established using the new client ID, the client | Once a session is established using the new client ID, the client | |||
will use reclaim-type locking requests (e.g. LOCK requests with | will use reclaim-type locking requests (e.g. LOCK requests with | |||
reclaim set to TRUE and OPEN operations with a claim type of | reclaim set to TRUE and OPEN operations with a claim type of | |||
CLAIM_PREVIOUS; see Section 9.11) to re-establish its locking state. | CLAIM_PREVIOUS; see Section 9.11) to re-establish its locking state. | |||
Once this is done, or if there is no such locking state to reclaim, | Once this is done, or if there is no such locking state to reclaim, | |||
the client sends a global RECLAIM_COMPLETE operation, i.e. one with | the client sends a global RECLAIM_COMPLETE operation, i.e. one with | |||
the rca_one_fs argument set to FALSE, to indicate that it has | the rca_one_fs argument set to FALSE, to indicate that it has | |||
reclaimed all of the locking state that it will reclaim. Once a | reclaimed all of the locking state that it will reclaim. Once a | |||
client sends such a RECLAIM_COMPLETE operation, it may attempt non- | client sends such a RECLAIM_COMPLETE operation, it may attempt non- | |||
reclaim locking operations, although it may get NFS4ERR_GRACE errors | reclaim locking operations, although it may get NFS4ERR_GRACE errors | |||
from the operations until the period of special handling is over. See | from the operations until the period of special handling is over. See | |||
Section 11.7.7 for a discussion of the analogous handling of lock | Section 11.7.7 for a discussion of the analogous handling of lock | |||
reclamation in the case of file systems transitioning from server to | reclamation in the case of file systems transitioning from server to | |||
server. | server. | |||
During the grace period, the server must reject READ and WRITE | During the grace period, the server must reject READ and WRITE | |||
operations and non-reclaim locking requests (i.e. other LOCK and OPEN | operations and non-reclaim locking requests (i.e. other LOCK and OPEN | |||
operations) with an error of NFS4ERR_GRACE, unless it is able to | operations) with an error of NFS4ERR_GRACE, unless it can guarantee | |||
guarantee that these may be done safely, as described below. | that these may be done safely, as described below. | |||
The grace period may last until all clients which are known to | The grace period may last until all clients which are known to | |||
possibly have had locks have done a global RECLAIM_COMPLETE | possibly have had locks have done a global RECLAIM_COMPLETE | |||
operation, indicating that they have finished reclaiming the locks | operation, indicating that they have finished reclaiming the locks | |||
they held before the server restart. This means that a client which | they held before the server restart. This means that a client which | |||
has done a RECLAIM_COMPLETE must be prepared to receive an | has done a RECLAIM_COMPLETE must be prepared to receive an | |||
NFS4ERR_GRACE when attempting to acquire new locks. In order for the | NFS4ERR_GRACE when attempting to acquire new locks. In order for the | |||
server to know that all clients with possible prior lock state have | server to know that all clients with possible prior lock state have | |||
done a RECLAIM_COMPLETE, the server must maintain in stable storage a | done a RECLAIM_COMPLETE, the server must maintain in stable storage a | |||
list of clients which may have such locks. The server may also | list of clients which may have such locks. The server may also | |||
skipping to change at page 163, line 18 | skipping to change at page 163, line 18 | |||
requests to be processed during the grace period, it MUST determine | requests to be processed during the grace period, it MUST determine | |||
that no lock subsequently reclaimed will be rejected and that no lock | that no lock subsequently reclaimed will be rejected and that no lock | |||
subsequently reclaimed would have prevented any I/O operation | subsequently reclaimed would have prevented any I/O operation | |||
processed during the grace period. | processed during the grace period. | |||
Clients should be prepared for the return of NFS4ERR_GRACE errors for | Clients should be prepared for the return of NFS4ERR_GRACE errors for | |||
non-reclaim lock and I/O requests. In this case the client should | non-reclaim lock and I/O requests. In this case the client should | |||
employ a retry mechanism for the request. A delay (on the order of | employ a retry mechanism for the request. A delay (on the order of | |||
several seconds) between retries should be used to avoid overwhelming | several seconds) between retries should be used to avoid overwhelming | |||
the server. Further discussion of the general issue is included in | the server. Further discussion of the general issue is included in | |||
[37]. The client must account for servers that are able to perform | [37]. The client must account for servers that can perform I/O | |||
I/O and non-reclaim locking requests within the grace period as well | and non-reclaim locking requests within the grace period as well as | |||
as those that can not do so. | those that cannot do so. | |||
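A client-side retry loop along these lines is sketched below. The function name retry_on_grace, the 5 second default delay, and the attempt limit are hypothetical. The draft says only that the delay should be on the order of several seconds.

```python
import time


def retry_on_grace(send, delay=5.0, max_attempts=10, sleep=time.sleep):
    """Call send() until it returns something other than NFS4ERR_GRACE.

    send is a callable that issues the request and returns its status
    as a string. sleep is injectable so the loop can be tested without
    actually waiting.
    """
    for attempt in range(max_attempts):
        status = send()
        if status != "NFS4ERR_GRACE":
            return status
        if attempt + 1 < max_attempts:
            sleep(delay)  # pause so the server is not overwhelmed
    return "NFS4ERR_GRACE"
```

Because the first attempt is sent immediately, a server that can process non-reclaim requests during the grace period answers without any added delay.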
A reclaim-type locking request outside the server's grace period can | A reclaim-type locking request outside the server's grace period can | |||
only succeed if the server can guarantee that no conflicting lock or | only succeed if the server can guarantee that no conflicting lock or | |||
I/O request has been granted since restart. | I/O request has been granted since restart. | |||
A server may, upon restart, establish a new value for the lease | A server may, upon restart, establish a new value for the lease | |||
period. Therefore, clients should, once a new client ID is | period. Therefore, clients should, once a new client ID is | |||
established, refetch the lease_time attribute and use it as the basis | established, refetch the lease_time attribute and use it as the basis | |||
for lease renewal for the lease associated with that server. | for lease renewal for the lease associated with that server. | |||
However, the server must establish, for this restart event, a grace | However, the server must establish, for this restart event, a grace | |||
skipping to change at page 164, line 12 | skipping to change at page 164, line 12 | |||
allow conflicting requests. When it adopts the finer-grained | allow conflicting requests. When it adopts the finer-grained | |||
approach, it must revoke all locks associated with a given stateid, | approach, it must revoke all locks associated with a given stateid, | |||
even if the conflict is with only a subset of locks. | even if the conflict is with only a subset of locks. | |||
When the server chooses to free all of a client's lock state, either | When the server chooses to free all of a client's lock state, either | |||
immediately upon lease expiration, or as a result of the first attempt | immediately upon lease expiration, or as a result of the first attempt | |||
to obtain a conflicting lock, the server may report the loss of | to obtain a conflicting lock, the server may report the loss of | |||
lock state in a number of ways. | lock state in a number of ways. | |||
The server may choose to invalidate the session and the associated | The server may choose to invalidate the session and the associated | |||
client ID. In this case, when the client is able to communicate with | client ID. In this case, once the client can communicate with the | |||
the server, it will receive an NFS4ERR_BADSESSION. Upon attempting | server, it will receive an NFS4ERR_BADSESSION error. Upon attempting | |||
to create a new session, it would get an NFS4ERR_STALE_CLIENTID. | to create a new session, it would get an NFS4ERR_STALE_CLIENTID. | |||
Upon creating the new client ID and new session it would attempt to | Upon creating the new client ID and new session it would attempt to | |||
reclaim locks but not be allowed to do so by the server. | reclaim locks but not be allowed to do so by the server. | |||
Another possibility is for the server to maintain the session and | Another possibility is for the server to maintain the session and | |||
client ID but for all stateids held by the client to become invalid | client ID but for all stateids held by the client to become invalid | |||
or stale. Once the client is able to reach the server after such a | or stale. Once the client can reach the server after such a network | |||
network partition, the status returned by the SEQUENCE operation will | partition, the status returned by the SEQUENCE operation will | |||
indicate a loss of locking state. (The flag | indicate a loss of locking state, i.e. the flag | |||
SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED will be set in | SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED will be set in sr_status_flags. | |||
sr_status_flags.) In addition, all I/O submitted by the client with | In addition, all I/O submitted by the client with the now invalid | |||
the now invalid stateids will fail with the server returning the | stateids will fail with the server returning the error | |||
error NFS4ERR_EXPIRED. Once the client learns of the loss of locking | NFS4ERR_EXPIRED. Once the client learns of the loss of locking | |||
state, it will suitably notify the applications that held the | state, it will suitably notify the applications that held the | |||
invalidated locks. The client should then take action to free | invalidated locks. The client should then take action to free | |||
invalidated stateids, either by establishing a new client ID using a | invalidated stateids, either by establishing a new client ID using a | |||
new verifier or by doing a FREE_STATEID operation to release each of | new verifier or by doing a FREE_STATEID operation to release each of | |||
the invalidated stateids. | the invalidated stateids. | |||
When the server adopts a finer-grained approach to revocation of | When the server adopts a finer-grained approach to revocation of | |||
locks when leases have expired, only a subset of stateids will | locks when leases have expired, only a subset of stateids will | |||
normally become invalid during a network partition. When the client | normally become invalid during a network partition. When the client | |||
is able to communicate with the server after such a network | can communicate with the server after such a network partition heals, | |||
partition, the status returned by the SEQUENCE operation will | the status returned by the SEQUENCE operation will indicate a partial | |||
indicate a partial loss of locking state | loss of locking state (SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED). In | |||
(SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED). In addition, operations, | addition, operations, including I/O submitted by the client, with the | |||
including I/O submitted by the client, with the now invalid stateids | now invalid stateids will fail with the server returning the error | |||
will fail with the server returning the error NFS4ERR_EXPIRED. Once | NFS4ERR_EXPIRED. Once the client learns of the loss of locking | |||
the client learns of the loss of locking state, it will use the | state, it will use the TEST_STATEID operation on all of its stateids | |||
TEST_STATEID operation on all of its stateids to determine which | to determine which locks have been lost and then suitably notify the | |||
locks have been lost and then suitably notify the applications that | applications that held the invalidated locks. The client can then | |||
held the invalidated locks. The client can then release the | release the invalidated locking state and acknowledge the revocation | |||
invalidated locking state and acknowledge the revocation of the | of the associated locks by doing a FREE_STATEID operation on each of | |||
associated locks by doing a FREE_STATEID operation on each of the | the invalidated stateids. | |||
invalidated stateids. | ||||
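The client-side handling of the two SEQUENCE status flags can be sketched as below. This is an illustration under stated assumptions: the flag bit values are placeholders, and test_stateid, free_stateid, and notify are hypothetical callables standing in for the TEST_STATEID operation, the FREE_STATEID operation, and application notification. For the all-state-revoked case the sketch takes the FREE_STATEID path; establishing a new client ID with a new verifier is the alternative the text allows.

```python
# Placeholder bit values, used here only to make the example runnable.
SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED = 0x08
SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED = 0x10

NFS4_OK = "NFS4_OK"


def recover_lock_state(sr_status_flags, held_stateids, test_stateid,
                       free_stateid, notify):
    """Return the stateids found to be revoked, after notifying and freeing."""
    if sr_status_flags & SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED:
        # Every stateid is invalid; no need to probe them one by one.
        lost = list(held_stateids)
    elif sr_status_flags & SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED:
        # Only a subset is invalid; TEST_STATEID identifies which ones.
        lost = [s for s in held_stateids if test_stateid(s) != NFS4_OK]
    else:
        lost = []
    for stateid in lost:
        notify(stateid)        # tell the application its lock is gone
        free_stateid(stateid)  # acknowledge the revocation to the server
    return lost
```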
When a network partition is combined with a server restart, there are | When a network partition is combined with a server restart, there are | |||
edge conditions that place requirements on the server in order to | edge conditions that place requirements on the server in order to | |||
avoid silent data corruption following the server restart. Two of | avoid silent data corruption following the server restart. Two of | |||
these edge conditions are known, and are discussed below. | these edge conditions are known, and are discussed below. | |||
The first edge condition arises as a result of scenarios such as | The first edge condition arises as a result of scenarios such as | |||
the following: | the following: | |||
1. Client A acquires a lock. | 1. Client A acquires a lock. | |||
skipping to change at page 167, line 37 | skipping to change at page 167, line 37 | |||
reclaims of share reservations, record locks, and delegations): | reclaims of share reservations, record locks, and delegations): | |||
1. Reject all reclaims with NFS4ERR_NO_GRACE. This is extremely | 1. Reject all reclaims with NFS4ERR_NO_GRACE. This is extremely | |||
unforgiving, but necessary if the server does not record lock | unforgiving, but necessary if the server does not record lock | |||
state in stable storage. | state in stable storage. | |||
2. Record sufficient state in stable storage such that all known | 2. Record sufficient state in stable storage such that all known | |||
edge conditions involving server restart, including the two noted | edge conditions involving server restart, including the two noted | |||
in this section, are detected. It is acceptable to erroneously | in this section, are detected. It is acceptable to erroneously | |||
recognize an edge condition and not allow a reclaim, when, with | recognize an edge condition and not allow a reclaim, when, with | |||
sufficient knowledge it would be allowed. Note it is not known | sufficient knowledge it would be allowed. The error the server | |||
if there are other edge conditions. | would return in this case is NFS4ERR_NO_GRACE. Note it is not | |||
known if there are other edge conditions. | ||||
In the event that, after a server restart, the server determines | In the event that, after a server restart, the server determines | |||
that there is unrecoverable damage or corruption to the | that there is unrecoverable damage or corruption to the | |||
information in stable storage, then for all clients and/or locks | information in stable storage, then for all clients and/or locks | |||
which may be affected, the server MUST return NFS4ERR_NO_GRACE. | which may be affected, the server MUST return NFS4ERR_NO_GRACE. | |||
A mandate for the client's handling of the NFS4ERR_NO_GRACE error is | A mandate for the client's handling of the NFS4ERR_NO_GRACE error is | |||
outside the scope of this specification, since the strategies for | outside the scope of this specification, since the strategies for | |||
such handling are very dependent on the client's operating | such handling are very dependent on the client's operating | |||
environment. However, one potential approach is described below. | environment. However, one potential approach is described below. | |||
skipping to change at page 169, line 32 | skipping to change at page 169, line 33 | |||
gentler to servers trying to handle very large numbers of clients. | gentler to servers trying to handle very large numbers of clients. | |||
The number of extra requests to effect lock renewal drops in inverse | The number of extra requests to effect lock renewal drops in inverse | |||
proportion to the lease time. The disadvantages of long leases | proportion to the lease time. The disadvantages of long leases | |||
include the possibility of slower recovery after certain failures. | include the possibility of slower recovery after certain failures. | |||
After server failure, a longer grace period may be required when some | After server failure, a longer grace period may be required when some | |||
clients do not promptly reclaim their locks and do a global | clients do not promptly reclaim their locks and do a global | |||
RECLAIM_COMPLETE. In the event of client failure, there can be a | RECLAIM_COMPLETE. In the event of client failure, there can be a | |||
longer period for leases to expire thus forcing conflicting requests | longer period for leases to expire thus forcing conflicting requests | |||
to wait. | to wait. | |||
Long leases are practical if the server is able to store lease state | Long leases are practical if the server can store lease state in | |||
in non-volatile memory. Upon recovery, the server can reconstruct | non-volatile memory. Upon recovery, the server can reconstruct the | |||
the lease state from its non-volatile memory and continue operation | lease state from its non-volatile memory and continue operation with | |||
with its clients and therefore long leases would not be an issue. | its clients and therefore long leases would not be an issue. | |||
8.7. Clocks, Propagation Delay, and Calculating Lease Expiration | 8.7. Clocks, Propagation Delay, and Calculating Lease Expiration | |||
To avoid the need for synchronized clocks, lease times are granted by | To avoid the need for synchronized clocks, lease times are granted by | |||
the server as a time delta. However, there is a requirement that the | the server as a time delta. However, there is a requirement that the | |||
client and server clocks do not drift excessively over the duration | client and server clocks do not drift excessively over the duration | |||
of the lease. There is also the issue of propagation delay across | of the lease. There is also the issue of propagation delay across | |||
the network which could easily be several hundred milliseconds as | the network which could easily be several hundred milliseconds as | |||
well as the possibility that requests will be lost and need to be | well as the possibility that requests will be lost and need to be | |||
retransmitted. | retransmitted. | |||
skipping to change at page 171, line 20 | skipping to change at page 171, line 23 | |||
DESTROY_CLIENTID) are not ignored. | DESTROY_CLIENTID) are not ignored. | |||
9. File Locking and Share Reservations | 9. File Locking and Share Reservations | |||
To support Win32 share reservations it is necessary to provide | To support Win32 share reservations it is necessary to provide | |||
operations which atomically open or create files. Having a separate | operations which atomically open or create files. Having a separate | |||
share/unshare operation would not allow correct implementation of the | share/unshare operation would not allow correct implementation of the | |||
Win32 OpenFile API. In order to correctly implement share semantics, | Win32 OpenFile API. In order to correctly implement share semantics, | |||
the previous NFS protocol mechanisms used when a file is opened or | the previous NFS protocol mechanisms used when a file is opened or | |||
created (LOOKUP, CREATE, ACCESS) need to be replaced. The NFSv4.1 | created (LOOKUP, CREATE, ACCESS) need to be replaced. The NFSv4.1 | |||
protocol defines an OPEN operation which looks up or creates a file | protocol defines an OPEN operation which is capable of atomically | |||
and establishes locking state on the server. | looking up, creating, and locking a file on the server. | |||
9.1. Opens and Byte-range Locks | 9.1. Opens and Byte-Range Locks | |||
It is assumed that manipulating a byte-range lock is rare when | It is assumed that manipulating a byte-range lock is rare when | |||
compared to READ and WRITE operations. It is also assumed that | compared to READ and WRITE operations. It is also assumed that | |||
crashes and network partitions are relatively rare. Therefore it is | server restarts and network partitions are relatively rare. | |||
important that the READ and WRITE operations have a lightweight | Therefore it is important that the READ and WRITE operations have a | |||
mechanism to indicate if they possess a held lock. A byte-range lock | lightweight mechanism to indicate if they possess a held lock. A | |||
request contains the heavyweight information required to establish a | byte-range lock request contains the heavyweight information required | |||
lock and uniquely define the owner of the lock. | to establish a lock and uniquely define the owner of the lock. | |||
9.1.1. State-owner Definition | 9.1.1. State-owner Definition | |||
When opening a file or requesting a record lock, the client must | When opening a file or requesting a record lock, the client must | |||
specify an identifier which represents the owner of the requested | specify an identifier which represents the owner of the requested | |||
lock. This identifier is in the form of a state-owner, represented | lock. This identifier is in the form of a state-owner, represented | |||
in the protocol by a state_owner4, a variable-length opaque array | in the protocol by a state_owner4, a variable-length opaque array | |||
which, when concatenated with the current client ID, uniquely defines | which, when concatenated with the current client ID, uniquely defines | |||
the owner of a lock managed by the client. This may be a thread id, | the owner of a lock managed by the client. This may be a thread id, | |||
process id, or other unique value. | process id, or other unique value. | |||
skipping to change at page 172, line 7 | skipping to change at page 172, line 10 | |||
remain separate even if the same opaque arrays are used to designate | remain separate even if the same opaque arrays are used to designate | |||
owners of each. The protocol distinguishes between open-owners | owners of each. The protocol distinguishes between open-owners | |||
(represented by open_owner4 structures) and lock-owners (represented | (represented by open_owner4 structures) and lock-owners (represented | |||
by lock_owner4 structures). | by lock_owner4 structures). | |||
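One way a server might key its owner tables so that these rules hold is sketched below. The type and field names are hypothetical; the only points taken from the text are that the opaque owner is qualified by the client ID and that open-owners and lock-owners stay separate even when their opaque arrays are identical.

```python
from typing import NamedTuple


class OwnerKey(NamedTuple):
    """Hypothetical lookup key for a server-side state-owner table."""
    kind: str        # "open" for open-owners, "lock" for lock-owners
    client_id: int   # the client ID the owner belongs to
    owner: bytes     # the variable-length opaque array from the client


def owner_key(kind, client_id, owner):
    if kind not in ("open", "lock"):
        raise ValueError("kind must be 'open' or 'lock'")
    return OwnerKey(kind, client_id, bytes(owner))
```

Because the key includes both the kind and the client ID, two clients that happen to choose the same opaque value (for example, the same process id) never collide, and neither do an open-owner and a lock-owner of one client.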
Each open is associated with a specific open-owner while each record | Each open is associated with a specific open-owner while each record | |||
lock is associated with a lock-owner and an open-owner, the latter | lock is associated with a lock-owner and an open-owner, the latter | |||
being the open-owner associated with the open file under which the | being the open-owner associated with the open file under which the | |||
LOCK operation was done. Delegations and layouts, on the other hand, | LOCK operation was done. Delegations and layouts, on the other hand, | |||
are not associated with a specific owner but are associated with the | are not associated with a specific owner but are associated with the | |||
client as a whole. | client as a whole (identified by a client ID). | |||
9.1.2. Use of the Stateid and Locking | 9.1.2. Use of the Stateid and Locking | |||
All READ, WRITE and SETATTR operations contain a stateid. For the | All READ, WRITE and SETATTR operations contain a stateid. For the | |||
purposes of this section, SETATTR operations which change the size | purposes of this section, SETATTR operations which change the size | |||
attribute of a file are treated as if they are writing the area | attribute of a file are treated as if they are writing the area | |||
between the old and new size (i.e. the range truncated or added to | between the old and new size (i.e. the range truncated or added to | |||
the file by means of the SETATTR), even where SETATTR is not | the file by means of the SETATTR), even where SETATTR is not | |||
explicitly mentioned in the text. The stateid passed to these | explicitly mentioned in the text. The stateid passed to one of these | |||
operation must be one that represents an open, a set of byte-range | operations must be one that represents an open, a set of byte-range | |||
locks, or a delegation, or it may be a special stateid representing | locks, or a delegation, or it may be a special stateid representing | |||
anonymous access or the special bypass stateid. | anonymous access or the special bypass stateid. | |||
If the state-owner performs a READ or WRITE in a situation in which | If the state-owner performs a READ or WRITE in a situation in which | |||
it has established a byte-range lock or share reservation on the | it has established a byte-range lock or share reservation on the | |||
server (any OPEN constitutes a share reservation) the stateid | server (any OPEN constitutes a share reservation) the stateid | |||
(previously returned by the server) must be used to indicate what | (previously returned by the server) must be used to indicate what | |||
locks, including both record locks and share reservations, are held | locks, including both record locks and share reservations, are held | |||
by the state-owner. If no state is established by the client, either | by the state-owner. If no state is established by the client, either | |||
record lock or share reservation, a special stateid for anonymous | record lock or share reservation, a special stateid for anonymous | |||
state (zero as "other" and "seqid") is used. (See Section 8.2.3 for | state (zero as "other" and "seqid") is used. (See Section 8.2.3 for | |||
a description of 'special' stateids in general). Regardless of whether | a description of 'special' stateids in general.) Regardless of whether | |||
a stateid for anonymous state or a stateid returned by the server is | a stateid for anonymous state or a stateid returned by the server is | |||
used, if there is a conflicting share reservation or mandatory record | used, if there is a conflicting share reservation or mandatory record | |||
lock held on the file, the server MUST refuse to service the READ or | lock held on the file, the server MUST refuse to service the READ or | |||
WRITE operation. | WRITE operation. | |||
Share reservations are established by OPEN operations and by their | Share reservations are established by OPEN operations and by their | |||
nature are mandatory in that when the OPEN denies READ or WRITE | nature are mandatory in that when the OPEN denies READ or WRITE | |||
operations, that denial results in such operations being rejected | operations, that denial results in such operations being rejected | |||
with error NFS4ERR_LOCKED. Record locks may be implemented by the | with error NFS4ERR_LOCKED. Record locks may be implemented by the | |||
server as either mandatory or advisory, or the choice of mandatory or | server as either mandatory or advisory, or the choice of mandatory or | |||
skipping to change at page 173, line 19 | skipping to change at page 173, line 21 | |||
far as the APIs and requirements on implementation. If the mandatory | far as the APIs and requirements on implementation. If the mandatory | |||
lock attribute is set on the file, the server checks to see if the | lock attribute is set on the file, the server checks to see if the | |||
lock-owner has an appropriate shared (read) or exclusive (write) | lock-owner has an appropriate shared (read) or exclusive (write) | |||
record lock on the region it wishes to read or write to. If there is | record lock on the region it wishes to read or write to. If there is | |||
no appropriate lock, the server checks if there is a conflicting lock | no appropriate lock, the server checks if there is a conflicting lock | |||
(which can be done by attempting to acquire the conflicting lock on | (which can be done by attempting to acquire the conflicting lock on | |||
behalf of the lock-owner, and if successful, release the lock after | behalf of the lock-owner, and if successful, release the lock after | |||
the READ or WRITE is done), and if there is, the server returns | the READ or WRITE is done), and if there is, the server returns | |||
NFS4ERR_LOCKED. | NFS4ERR_LOCKED. | |||
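The two-step check for a file with mandatory locking can be sketched as follows. This is a simplified model, not the server algorithm itself: locks are plain tuples, a range counts as covered only when a single lock of the same owner spans it, and all names are assumptions of the example.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_LOCKED = "NFS4ERR_LOCKED"


def check_mandatory_io(op, owner, offset, length, locks):
    """Check a READ or WRITE against mandatory record locks.

    locks is a list of (owner, lock_type, offset, length) tuples, where
    lock_type is "read" (shared) or "write" (exclusive).
    """
    end = offset + length
    # Step 1: does the requesting lock-owner hold an appropriate lock?
    for lk_owner, lk_type, lk_off, lk_len in locks:
        covers = lk_off <= offset and end <= lk_off + lk_len
        suitable = op == "READ" or lk_type == "write"
        if lk_owner == owner and covers and suitable:
            return NFS4_OK
    # Step 2: otherwise, look for a conflicting lock held by another owner.
    for lk_owner, lk_type, lk_off, lk_len in locks:
        overlaps = lk_off < end and offset < lk_off + lk_len
        conflicts = op == "WRITE" or lk_type == "write"
        if lk_owner != owner and overlaps and conflicts:
            return NFS4ERR_LOCKED
    return NFS4_OK
```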
For Windows environments, there are no advisory record locks, so the | For Windows environments, record locks are always mandatory, so the | |||
server always checks for record locks during I/O requests. | server always checks for record locks during I/O requests. | |||
Thus, the NFSv4.1 LOCK operation does not need to distinguish between | Thus, the NFSv4.1 LOCK operation does not need to distinguish between | |||
advisory and mandatory record locks. It is the NFSv4.1 server's | advisory and mandatory record locks. It is the NFSv4.1 server's | |||
processing of the READ and WRITE operations that introduces the | processing of the READ and WRITE operations that introduces the | |||
distinction. | distinction. | |||
Every stateid which is validly passed to READ, WRITE or SETATTR, with | Every stateid which is validly passed to READ, WRITE or SETATTR, with | |||
the exception of special stateid values, defines an access mode for | the exception of special stateid values, defines an access mode for | |||
the file (i.e. READ, WRITE, or READ-WRITE) | the file (i.e. READ, WRITE, or READ-WRITE) | |||
skipping to change at page 173, line 43 | skipping to change at page 173, line 45 | |||
and as modified by subsequent OPENs and OPEN_DOWNGRADEs for the | and as modified by subsequent OPENs and OPEN_DOWNGRADEs for the | |||
same open-owner/file pair. | same open-owner/file pair. | |||
o For stateids returned by record lock requests, the appropriate | o For stateids returned by record lock requests, the appropriate | |||
mode is the access mode for the open stateid associated with the | mode is the access mode for the open stateid associated with the | |||
lock set represented by the stateid. | lock set represented by the stateid. | |||
o For delegation stateids the access mode is based on the type of | o For delegation stateids the access mode is based on the type of | |||
delegation. | delegation. | |||
When a READ, WRITE, or SETATTR which specifies the size attribute, is | When a READ, WRITE, or SETATTR (which specifies the size attribute) | |||
done, the operation is subject to checking against the access mode to | is done, the operation is subject to checking against the access mode | |||
verify that the operation is appropriate given the stateid with which | to verify that the operation is appropriate given the stateid with | |||
the operation is associated. | which the operation is associated. | |||
In the case of WRITE-type operations (i.e. WRITEs and SETATTRs which | In the case of WRITE-type operations (i.e. WRITEs and SETATTRs which | |||
set size), the server must verify that the access mode allows writing | set size), the server MUST verify that the access mode allows writing | |||
and return an NFS4ERR_OPENMODE error if it does not. In the case of | and MUST return an NFS4ERR_OPENMODE error if it does not. In the | |||
READ, the server may perform the corresponding check on the access | case of READ, the server may perform the corresponding check on the | |||
mode, or it may choose to allow READ on opens for WRITE only, to | access mode, or it may choose to allow READ on opens for WRITE only, | |||
accommodate clients whose write implementation may unavoidably do | to accommodate clients whose write implementation may unavoidably do | |||
reads (e.g. due to buffer cache constraints). However, even if READs | reads (e.g. due to buffer cache constraints). However, even if READs | |||
are allowed in these circumstances, the server MUST still check for | are allowed in these circumstances, the server MUST still check for | |||
locks that conflict with the READ (e.g. another open specifying denial | locks that conflict with the READ (e.g. another open specifying denial | |||
of READs). Note that a server which does enforce the access mode | of READs). Note that a server which does enforce the access mode | |||
check on READs need not explicitly check for conflicting share | check on READs need not explicitly check for conflicting share | |||
reservations since the existence of OPEN for read access guarantees | reservations since the existence of OPEN for read access guarantees | |||
that no conflicting share reservation can exist. | that no conflicting share reservation can exist. | |||
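The access mode check can be sketched as below. The function and its enforce_on_read switch are hypothetical, and the access bits are modelled as READ = 1 and WRITE = 2 for illustration. Returning NFS4ERR_OPENMODE for a rejected READ is this sketch's reading of "the corresponding check", not a statement taken from the text.

```python
OPEN_ACCESS_READ = 0x1   # illustrative access-mode bits
OPEN_ACCESS_WRITE = 0x2

NFS4_OK = "NFS4_OK"
NFS4ERR_OPENMODE = "NFS4ERR_OPENMODE"


def check_access_mode(op, access, enforce_on_read=True):
    """Validate an operation against the access mode its stateid defines.

    op is "READ", "WRITE", or "SETATTR_SIZE" (a SETATTR that sets size,
    which is treated as a WRITE-type operation).
    """
    if op in ("WRITE", "SETATTR_SIZE"):
        # The write check is mandatory for the server.
        return NFS4_OK if access & OPEN_ACCESS_WRITE else NFS4ERR_OPENMODE
    if op == "READ":
        # The read check is optional; a server may allow READ on a
        # WRITE-only open to accommodate read-modify-write clients.
        if enforce_on_read and not access & OPEN_ACCESS_READ:
            return NFS4ERR_OPENMODE
        return NFS4_OK
    raise ValueError("unsupported operation: " + op)
```

A server running with enforce_on_read=False still has to check separately for locks and share reservations that conflict with the READ; that step is outside this sketch.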
The read bypass special stateid (all bits of "other" and "seqid" set | The read bypass special stateid (all bits of "other" and "seqid" set | |||
to one) stateid indicates a desire to bypass locking checks. The | to one) indicates a desire to bypass locking checks. The server MAY | |||
server MAY allow READ operations to bypass locking checks at the | allow READ operations to bypass locking checks at the server, when | |||
server, when this special stateid is used. However, WRITE operations | this special stateid is used. However, WRITE operations with this | |||
with this special stateid value MUST NOT bypass locking checks and | special stateid value MUST NOT bypass locking checks and are treated | |||
are treated exactly the same as if a special stateid for anonymous | exactly the same as if a special stateid for anonymous state were | |||
state were used. | used. | |||
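The two special stateids discussed in this section can be recognised as sketched below. The 12-byte length of the "other" field and the 32-bit seqid are assumptions of this example, as are the type and function names; only the all-zeros and all-ones patterns come from the text.

```python
from typing import NamedTuple

OTHER_SIZE = 12  # assumed length of the "other" field, for illustration


class Stateid(NamedTuple):
    seqid: int    # modelled as an unsigned 32-bit value
    other: bytes  # opaque, OTHER_SIZE bytes


ANONYMOUS = Stateid(0, bytes(OTHER_SIZE))
READ_BYPASS = Stateid(0xFFFFFFFF, b"\xff" * OTHER_SIZE)


def classify(stateid):
    """Label a stateid as 'anonymous', 'read_bypass', or 'other'."""
    if stateid == ANONYMOUS:
        return "anonymous"
    if stateid == READ_BYPASS:
        return "read_bypass"
    return "other"


def may_bypass_locking(op, stateid, server_allows_bypass):
    """Only a READ with the bypass stateid may skip locking checks, and
    only if the server chooses to allow it. A WRITE with the bypass
    stateid is handled exactly as if the anonymous stateid were used."""
    return (op == "READ"
            and classify(stateid) == "read_bypass"
            and server_allows_bypass)
```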
A lock may not be granted while a READ or WRITE operation using one | A lock may not be granted while a READ or WRITE operation using one | |||
of the special stateids is being performed and the scope of the lock | of the special stateids is being performed and the scope of the lock | |||
to be granted would conflict with the READ or WRITE operation. This | to be granted would conflict with the READ or WRITE operation. This | |||
can occur when: | can occur when: | |||
o A mandatory byte-range lock is requested with a range that conflicts | o A mandatory byte-range lock is requested with a range that conflicts | |||
with the range of the READ or WRITE operation. For the purposes | with the range of the READ or WRITE operation. For the purposes | |||
of this paragraph, a conflict occurs when a shared lock is | of this paragraph, a conflict occurs when a shared lock is | |||
requested and a WRITE operation is being performed, or an | requested and a WRITE operation is being performed, or an | |||
exclusive lock is requested and either a READ or a WRITE operation | exclusive lock is requested and either a READ or a WRITE operation | |||
is being performed. | is being performed. | |||
o A share reservation is requested which denies reading and/or | o A share reservation is requested which denies reading and/or | |||
writing and the corresponding is being performed. | writing and the corresponding operation is being performed. | |||
o A delegation is to be granted and the delegation type would | o A delegation is to be granted and the delegation type would | |||
prevent the I/O operation, i.e. READ and WRITE conflict with a | prevent the I/O operation, i.e. READ and WRITE conflict with a | |||
write delegation and WRITE conflicts with a read delegation. | write delegation and WRITE conflicts with a read delegation. | |||
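The byte-range and delegation cases in the list above can be expressed as small predicates. This is a sketch with hypothetical names; ranges are given as (offset, length) with positive lengths, and the share reservation case is omitted for brevity.

```python
def ranges_overlap(off_a, len_a, off_b, len_b):
    """True when two byte ranges with positive lengths share a byte."""
    return off_a < off_b + len_b and off_b < off_a + len_a


def lock_conflicts_with_io(lock_type, lock_off, lock_len, io_op, io_off, io_len):
    """Would granting this mandatory byte-range lock conflict with an
    in-progress READ or WRITE that used a special stateid?

    lock_type is "shared" or "exclusive"; io_op is "READ" or "WRITE".
    """
    if not ranges_overlap(lock_off, lock_len, io_off, io_len):
        return False
    if lock_type == "shared":
        return io_op == "WRITE"
    return io_op in ("READ", "WRITE")


def delegation_conflicts_with_io(delegation_type, io_op):
    """A write delegation conflicts with READ and WRITE; a read
    delegation conflicts only with WRITE."""
    if delegation_type == "write":
        return io_op in ("READ", "WRITE")
    return io_op == "WRITE"
```

While either predicate is true for an operation still in progress, the lock or delegation may not be granted.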
When a client holds a delegation, it is particularly important to | When a client holds a delegation, it needs to ensure that the stateid | |||
make sure that the stateid sent conveys the association of operation | sent conveys the association of operation with the delegation, to | |||
with the delegation, to avoid the delegation from being avoidably | avoid the delegation from being avoidably recalled. When the | |||
recalled. When the delegation stateid, or a stateid for an open associated | delegation stateid, or a stateid for an open associated with that | |||
with that delegation, or a stateid representing byte-range locks | delegation, or a stateid representing byte-range locks derived from | |||
derived from such an open is used, the server knows that the READ, | such an open is used, the server knows that the READ, WRITE, or | |||
WRITE, or SETATTR does not conflict with the delegation, but is sent | SETATTR does not conflict with the delegation, but is sent under the | |||
under the aegis of the delegation. Even though it is possible for | aegis of the delegation. Even though it is possible for the server | |||
the server to determine from the clientid (via the sessionid) that | to determine from the client ID (via the sessionid) that the client | |||
the client does in fact have a delegation, the server is not obliged | does in fact have a delegation, the server is not obliged to check | |||
to check this, so using a special stateid can result in avoidable | this, so using a special stateid can result in avoidable recall of | |||
recall of the delegation. | the delegation. | |||
9.2. Lock Ranges | 9.2. Lock Ranges | |||
The protocol allows a lock-owner to request a lock with a byte range | The protocol allows a lock-owner to request a lock with a byte range | |||
and then either upgrade, downgrade, or unlock a sub-range of the | and then either upgrade, downgrade, or unlock a sub-range of the | |||
initial lock, or a range that consists of a range which overlaps, | initial lock, or a range that consists of a range which overlaps, | |||
fully or partially, that initial lock or a combination of a set of | fully or partially, that initial lock or a combination of a set of | |||
existing locks for the same lock-owner. It is expected that this | existing locks for the same lock-owner. It is expected that this | |||
will be an uncommon type of request. In any case, servers or server | will be an uncommon type of request. In any case, servers or server | |||
file systems may not be able to support sub-range lock semantics. In | file systems may not be able to support sub-range lock semantics. In | |||
skipping to change at page 175, line 26 | skipping to change at page 175, line 28 | |||
sub-range of current locking state for the lock-owner, the server is | sub-range of current locking state for the lock-owner, the server is | |||
allowed to return the error NFS4ERR_LOCK_RANGE to signify that it | allowed to return the error NFS4ERR_LOCK_RANGE to signify that it | |||
does not support sub-range lock operations. Therefore, the client | does not support sub-range lock operations. Therefore, the client | |||
should be prepared to receive this error and, if appropriate, report | should be prepared to receive this error and, if appropriate, report | |||
the error to the requesting application. | the error to the requesting application. | |||
The client is discouraged from combining multiple independent locking | The client is discouraged from combining multiple independent locking | |||
ranges that happen to be adjacent into a single request since the | ranges that happen to be adjacent into a single request since the | |||
server may not support sub-range requests and for reasons related to | server may not support sub-range requests and for reasons related to | |||
the recovery of file locking state in the event of server failure. | the recovery of file locking state in the event of server failure. | |||
As discussed in Section 8.4.2 below, the server may employ certain | As discussed in Section 8.4.2, the server may employ certain | |||
optimizations during recovery that work effectively only when the | optimizations during recovery that work effectively only when the | |||
client's behavior during lock recovery is similar to the client's | client's behavior during lock recovery is similar to the client's | |||
locking behavior prior to server failure. | locking behavior prior to server failure. | |||
9.3. Upgrading and Downgrading Locks | 9.3. Upgrading and Downgrading Locks | |||
If a client has a write lock on a record, it can request an atomic | If a client has a write lock on a record, it can request an atomic | |||
downgrade of the lock to a read lock via the LOCK request, by setting | downgrade of the lock to a read lock via the LOCK request, by setting | |||
the type to READ_LT. If the server supports atomic downgrade, the | the type to READ_LT. If the server supports atomic downgrade, the | |||
request will succeed. If not, it will return NFS4ERR_LOCK_NOTSUPP. | request will succeed. If not, it will return NFS4ERR_LOCK_NOTSUPP. | |||
skipping to change at page 176, line 5 | skipping to change at page 176, line 6 | |||
the type to WRITE_LT or WRITEW_LT. If the server does not support | the type to WRITE_LT or WRITEW_LT. If the server does not support | |||
atomic upgrade, it will return NFS4ERR_LOCK_NOTSUPP. If the upgrade | atomic upgrade, it will return NFS4ERR_LOCK_NOTSUPP. If the upgrade | |||
can be achieved without an existing conflict, the request will | can be achieved without an existing conflict, the request will | |||
succeed. Otherwise, the server will return either NFS4ERR_DENIED or | succeed. Otherwise, the server will return either NFS4ERR_DENIED or | |||
NFS4ERR_DEADLOCK. The error NFS4ERR_DEADLOCK is returned if the | NFS4ERR_DEADLOCK. The error NFS4ERR_DEADLOCK is returned if the | |||
client sent the LOCK request with the type set to WRITEW_LT and the | client sent the LOCK request with the type set to WRITEW_LT and the | |||
server has detected a deadlock. The client should be prepared to | server has detected a deadlock. The client should be prepared to | |||
receive such errors and, if appropriate, report the error to the | receive such errors and, if appropriate, report the error to the | |||
requesting application. | requesting application. | |||
9.4. Stateid Seqid Values and Byte-range Locks | 9.4. Stateid Seqid Values and Byte-Range Locks | |||
When a lock or unlock request is done, passing a stateid, the stateid | When a lock or unlock request is done, passing a stateid, the stateid | |||
returned has the same "other" value and a "seqid" value that is | returned has the same "other" value and a "seqid" value that is | |||
incremented to reflect the occurrence of the lock or unlock request. | incremented to reflect the occurrence of the lock or unlock request. | |||
The server MUST increment the value of the "seqid" field whenever | The server MUST increment the value of the "seqid" field whenever | |||
there is any change to the locking status of any byte offset as | there is any change to the locking status of any byte offset as | |||
described by any of the locks covered by the stateid. A change in | described by any of the locks covered by the stateid. A change in | |||
locking status includes a change from locked to unlocked or the | locking status includes a change from locked to unlocked or the | |||
reverse or a change from being locked for read to being locked for | reverse or a change from being locked for read to being locked for | |||
write or the reverse. | write or the reverse. | |||
When there is no such change, as, for example, when a range already | When there is no such change, as, for example, when a range already | |||
locked for write is locked again for write, the server MAY increment | locked for write is locked again for write, the server MAY increment | |||
the "seqid" value. | the "seqid" value. | |||
9.5. Issues with Multiple Open-owners | 9.5. Issues with Multiple Open-Owners | |||
When the same file is opened by multiple open-owners and there are | When the same file is opened by multiple open-owners, a client will | |||
LOCK and LOCKU requests for the same lock-owner issued through the | have multiple open stateids for that file, each associated with a | |||
different open files, a situation may arise in which there are | different open-owner. In that case, there can be multiple LOCK and | |||
multiple stateids representing byte-range locks for locks on the the | LOCKU requests for the same lock-owner issued using the different | |||
same file held by the same lock-owner but each assigned to a | open stateids, and so a situation may arise in which there are | |||
multiple stateids, each representing byte-range locks on the same | ||||
file and held by the same lock-owner but each associated with a | ||||
different open-owner. | different open-owner. | |||
In such a situation, the locking status of each byte (i.e. whether it | In such a situation, the locking status of each byte (i.e. whether it | |||
is locked, the read or write mode of the lock and the lock-owner | is locked, the read or write mode of the lock and the lock-owner | |||
holding the lock) MUST reflect the last LOCK or LOCKU operation done | holding the lock) MUST reflect the last LOCK or LOCKU operation done | |||
for the lock-owner in question, independent of the stateid through | for the lock-owner in question, independent of the stateid through | |||
which the request was issued. | which the request was issued. | |||
When a byte is locked by the lock-owner in question, the open-owner | When a byte is locked by the lock-owner in question, the open-owner | |||
to which that lock is assigned SHOULD be that of the open-owner | to which that lock is assigned SHOULD be that of the open-owner | |||
skipping to change at page 177, line 4 | skipping to change at page 177, line 11 | |||
change to the set of locked bytes associated with a different stateid | change to the set of locked bytes associated with a different stateid | |||
for the same lock-owner, i.e. associated with a different open-owner, | for the same lock-owner, i.e. associated with a different open-owner, | |||
the "seqid" value for that stateid MUST NOT be incremented. | the "seqid" value for that stateid MUST NOT be incremented. | |||
9.6. Blocking Locks | 9.6. Blocking Locks | |||
Some clients require the support of blocking locks. While NFSv4.1 | Some clients require the support of blocking locks. While NFSv4.1 | |||
provides a callback when a previously unavailable lock becomes | provides a callback when a previously unavailable lock becomes | |||
available, this is an OPTIONAL feature and clients cannot depend on | available, this is an OPTIONAL feature and clients cannot depend on | |||
its presence. Clients need to be prepared to continually poll for | its presence. Clients need to be prepared to continually poll for | |||
the lock. This presents a fairness problem. Two new lock types are | the lock. This presents a fairness problem. Two of the lock types, | |||
added, READW and WRITEW, and are used to indicate to the server that | READW and WRITEW, are used to indicate to the server that the client | |||
the client is requesting a blocking lock. When the callback is not | is requesting a blocking lock. When the callback is not used, the | |||
used, the server should maintain an ordered list of pending blocking | server should maintain an ordered list of pending blocking locks. | |||
locks. When the conflicting lock is released, the server may wait | When the conflicting lock is released, the server may wait for the | |||
the lease period for the first waiting client to re-request the lock. | period of time equal to lease_time for the first waiting client to | |||
After the lease period expires the next waiting client request is | re-request the lock. After the lease period expires, the next | |||
allowed the lock. Clients are required to poll at an interval | waiting client request is allowed the lock. Clients are required to | |||
sufficiently small that it is likely to acquire the lock in a timely | poll at an interval sufficiently small that it is likely to acquire | |||
manner. The server is not required to maintain a list of pending | the lock in a timely manner. The server is not required to maintain | |||
blocked locks as it is used to increase fairness and not correct | a list of pending blocked locks as it is used to increase fairness | |||
operation. Because of the unordered nature of crash recovery, | and not correct operation. Because of the unordered nature of crash | |||
storing of lock state to stable storage would be required to | recovery, storing of lock state to stable storage would be required | |||
guarantee ordered granting of blocking locks. | to guarantee ordered granting of blocking locks. | |||
Servers may also note the lock types and delay returning denial of | Servers may also note the lock types and delay returning denial of | |||
the request to allow extra time for a conflicting lock to be | the request to allow extra time for a conflicting lock to be | |||
released, allowing a successful return. In this way, clients can | released, allowing a successful return. In this way, clients can | |||
avoid the burden of needlessly frequent polling for blocking locks. | avoid the burden of needlessly frequent polling for blocking locks. | |||
The server should take care in the length of delay in the event the | The server should take care in the length of delay in the event the | |||
client retransmits the request. | client retransmits the request. | |||
If a server receives a blocking lock request, denies it, and then | If a server receives a blocking lock request, denies it, and then | |||
later receives a nonblocking request for the same lock, which is also | later receives a nonblocking request for the same lock, which is also | |||
skipping to change at page 180, line 45 | skipping to change at page 181, line 5 | |||
possible wraparound of the 32-bit field. | possible wraparound of the 32-bit field. | |||
When the possibility exists that the client will send multiple OPENs | When the possibility exists that the client will send multiple OPENs | |||
for the same open-owner in parallel, it may be the case that an open | for the same open-owner in parallel, it may be the case that an open | |||
upgrade may happen without the client knowing beforehand that this | upgrade may happen without the client knowing beforehand that this | |||
could happen. Because of this possibility, CLOSEs and | could happen. Because of this possibility, CLOSEs and | |||
OPEN_DOWNGRADEs should generally be sent with a non-zero seqid in | OPEN_DOWNGRADEs should generally be sent with a non-zero seqid in | |||
the stateid, to avoid the possibility that the status change | the stateid, to avoid the possibility that the status change | |||
associated with an open upgrade is inadvertently lost. | associated with an open upgrade is inadvertently lost. | |||
9.11. Reclaim of Open and Byte-range Locks | 9.11. Reclaim of Open and Byte-Range Locks | |||
Special forms of the LOCK and OPEN operations are provided when it is | Special forms of the LOCK and OPEN operations are provided when it is | |||
necessary to re-establish byte-range locks or opens after a server | necessary to re-establish byte-range locks or opens after a server | |||
failure. | failure. | |||
o To reclaim existing opens, an OPEN operation is performed using a | o To reclaim existing opens, an OPEN operation is performed using a | |||
CLAIM_PREVIOUS. Because the client, in this type of situation, | CLAIM_PREVIOUS. Because the client, in this type of situation, | |||
will have already opened the file and have the filehandle of the | will have already opened the file and have the filehandle of the | |||
target file, this operation requires that the current filehandle | target file, this operation requires that the current filehandle | |||
be the target file, rather than a directory and no file name is | be the target file, rather than a directory and no file name is | |||
skipping to change at page 182, line 20 | skipping to change at page 182, line 29 | |||
In this case, repeated reference to the server to find that no | In this case, repeated reference to the server to find that no | |||
conflicts exist is expensive. A better option with regard to | conflicts exist is expensive. A better option with regard to | |||
performance is to allow a client that repeatedly opens a file to do | performance is to allow a client that repeatedly opens a file to do | |||
so without reference to the server. This is done until potentially | so without reference to the server. This is done until potentially | |||
conflicting operations from another client actually occur. | conflicting operations from another client actually occur. | |||
A similar situation arises in connection with file locking. Sending | A similar situation arises in connection with file locking. Sending | |||
file lock and unlock requests to the server as well as the read and | file lock and unlock requests to the server as well as the read and | |||
write requests necessary to make data caching consistent with the | write requests necessary to make data caching consistent with the | |||
locking semantics (see Section 10.3.2 can severely limit performance. | locking semantics (see Section 10.3.2) can severely limit | |||
When locking is used to provide protection against infrequent | performance. When locking is used to provide protection against | |||
conflicts, a large penalty is incurred. This penalty may discourage | infrequent conflicts, a large penalty is incurred. This penalty may | |||
the use of file locking by applications. | discourage the use of file locking by applications. | |||
The NFSv4.1 protocol provides more aggressive caching strategies with | The NFSv4.1 protocol provides more aggressive caching strategies with | |||
the following design goals: | the following design goals: | |||
o Compatibility with a large range of server semantics. | o Compatibility with a large range of server semantics. | |||
o Providing the same caching benefits as previous versions of the | o Providing the same caching benefits as previous versions of the | |||
NFS protocol when unable to support the more aggressive model. | NFS protocol when unable to support the more aggressive model. | |||
o Requirements for aggressive caching are organized so that a large | o Requirements for aggressive caching are organized so that a large | |||
skipping to change at page 185, line 40 | skipping to change at page 186, line 4 | |||
To allow for this type of client recovery, the server MAY extend the | To allow for this type of client recovery, the server MAY extend the | |||
period for delegation recovery beyond the typical lease expiration | period for delegation recovery beyond the typical lease expiration | |||
period. This implies that requests from other clients that conflict | period. This implies that requests from other clients that conflict | |||
with these delegations will need to wait. Because the normal recall | with these delegations will need to wait. Because the normal recall | |||
process may require significant time for the client to flush changed | process may require significant time for the client to flush changed | |||
state to the server, other clients need to be prepared for delays that | state to the server, other clients need to be prepared for delays that | |||
occur because of a conflicting delegation. This longer interval | occur because of a conflicting delegation. This longer interval | |||
would increase the window for clients to restart and consult stable | would increase the window for clients to restart and consult stable | |||
storage so that the delegations can be reclaimed. For open | storage so that the delegations can be reclaimed. For open | |||
delegations, such delegations are reclaimed using OPEN with a claim | delegations, such delegations are reclaimed using OPEN with a claim | |||
type of CLAIM_DELEGATE_PREV. (See Section 10.5 and Section 18.16 for | type of CLAIM_DELEGATE_PREV or CLAIM_DELEG_PREV_FH (See Section 10.5 | |||
discussion of open delegation and the details of OPEN respectively). | and Section 18.16 for discussion of open delegation and the details | |||
of OPEN respectively). | ||||
A server MAY support a claim type of CLAIM_DELEGATE_PREV, and if it | A server MAY support claim types of CLAIM_DELEGATE_PREV and | |||
does, it MUST NOT remove delegations upon a CREATE_SESSION that | CLAIM_DELEG_PREV_FH, and if it does, it MUST NOT remove delegations | |||
confirms a client ID created by EXCHANGE_ID, and instead MUST, for a | upon a CREATE_SESSION that confirms a client ID created by | |||
period of time no less than that of the value of the lease_time | EXCHANGE_ID, and instead MUST, for a period of time no less than that | |||
attribute, maintain the client's delegations to allow time for the | of the value of the lease_time attribute, maintain the client's | |||
client to send CLAIM_DELEGATE_PREV requests. The server that | delegations to allow time for the client to send CLAIM_DELEGATE_PREV | |||
supports CLAIM_DELEGATE_PREV MUST support the DELEGPURGE operation. | requests. The server that supports CLAIM_DELEGATE_PREV and/or | |||
CLAIM_DELEG_PREV_FH MUST support the DELEGPURGE operation. | ||||
When the server restarts, delegations are reclaimed (using the OPEN | When the server restarts, delegations are reclaimed (using the OPEN | |||
operation with CLAIM_PREVIOUS) in a similar fashion to record locks | operation with CLAIM_PREVIOUS) in a similar fashion to record locks | |||
and share reservations. However, there is a slight semantic | and share reservations. However, there is a slight semantic | |||
difference. In the normal case if the server decides that a | difference. In the normal case if the server decides that a | |||
delegation should not be granted, it performs the requested action | delegation should not be granted, it performs the requested action | |||
(e.g. OPEN) without granting any delegation. For reclaim, the | (e.g. OPEN) without granting any delegation. For reclaim, the | |||
server grants the delegation but a special designation is applied so | server grants the delegation but a special designation is applied so | |||
that the client treats the delegation as having been granted but | that the client treats the delegation as having been granted but | |||
recalled by the server. Because of this, the client has the duty to | recalled by the server. Because of this, the client has the duty to | |||
skipping to change at page 188, line 12 | skipping to change at page 188, line 27 | |||
client's cache. This validation must be done at least when the | client's cache. This validation must be done at least when the | |||
client's OPEN operation includes DENY=WRITE or BOTH thus | client's OPEN operation includes DENY=WRITE or BOTH thus | |||
terminating a period in which other clients may have had the | terminating a period in which other clients may have had the | |||
opportunity to open the file with WRITE access. Clients may | opportunity to open the file with WRITE access. Clients may | |||
choose to do the revalidation more often (i.e. at OPENs specifying | choose to do the revalidation more often (i.e. at OPENs specifying | |||
DENY=NONE) to parallel the NFSv3 protocol's practice for the | DENY=NONE) to parallel the NFSv3 protocol's practice for the | |||
benefit of users assuming this degree of cache revalidation. | benefit of users assuming this degree of cache revalidation. | |||
Since the change attribute is updated for data and metadata | Since the change attribute is updated for data and metadata | |||
modifications, some client implementors may be tempted to use the | modifications, some client implementors may be tempted to use the | |||
time_modify attribute and not change to validate cached data, so | time_modify attribute and not the change attribute to validate | |||
that metadata changes do not spuriously invalidate clean data. | cached data, so that metadata changes do not spuriously invalidate | |||
The implementor is cautioned in this approach. The change | clean data. The implementor is cautioned in this approach. The | |||
attribute is guaranteed to change for each update to the file, | change attribute is guaranteed to change for each update to the | |||
whereas time_modify is guaranteed to change only at the | file, whereas time_modify is guaranteed to change only at the | |||
granularity of the time_delta attribute. Use by the client's data | granularity of the time_delta attribute. Use by the client's data | |||
cache validation logic of time_modify and not change runs the risk | cache validation logic of time_modify and not change runs the risk | |||
of the client incorrectly marking stale data as valid. | of the client incorrectly marking stale data as valid. | |||
o Second, modified data must be flushed to the server before closing | o Second, modified data must be flushed to the server before closing | |||
a file OPENed for write. This is complementary to the first rule. | a file OPENed for write. This is complementary to the first rule. | |||
If the data is not flushed at CLOSE, the revalidation done after | If the data is not flushed at CLOSE, the revalidation done after | |||
the client OPENs a file is unable to achieve its purpose. The other | the client OPENs a file is unable to achieve its purpose. The other | |||
aspect to flushing the data before close is that the data must be | aspect to flushing the data before close is that the data must be | |||
committed to stable storage, at the server, before the CLOSE | committed to stable storage, at the server, before the CLOSE | |||
skipping to change at page 191, line 50 | skipping to change at page 192, line 20 | |||
the delegation are subject to change. In particular, if the server | the delegation are subject to change. In particular, if the server | |||
receives a conflicting OPEN from another client, the server must | receives a conflicting OPEN from another client, the server must | |||
recall the delegation before deciding whether the OPEN from the other | recall the delegation before deciding whether the OPEN from the other | |||
client may be granted. Making a delegation is up to the server and | client may be granted. Making a delegation is up to the server and | |||
clients should not assume that any particular OPEN either will or | clients should not assume that any particular OPEN either will or | |||
will not result in an open delegation. The following is a typical | will not result in an open delegation. The following is a typical | |||
set of conditions that servers might use in deciding whether OPEN | set of conditions that servers might use in deciding whether OPEN | |||
should be delegated: | should be delegated: | |||
o The client must be able to respond to the server's callback | o The client must be able to respond to the server's callback | |||
requests. The server will use the CB_NULL procedure for a test of | requests. If a backchannel has been established, the server will | |||
callback ability. | send a CB_COMPOUND request, containing a single operation, | |||
CB_SEQUENCE, for a test of backchannel availability. | ||||
o The client must have responded properly to previous recalls. | o The client must have responded properly to previous recalls. | |||
o There must be no current open conflicting with the requested | o There must be no current open conflicting with the requested | |||
delegation. | delegation. | |||
o There should be no current delegation that conflicts with the | o There should be no current delegation that conflicts with the | |||
delegation being requested. | delegation being requested. | |||
o The probability of future conflicting open requests should be low | o The probability of future conflicting open requests should be low | |||
skipping to change at page 192, line 37 | skipping to change at page 193, line 7 | |||
delegations. | delegations. | |||
When a client has a read open delegation, it is assured that neither | When a client has a read open delegation, it is assured that neither | |||
the contents, the attributes (with the exception of time_access), nor | the contents, the attributes (with the exception of time_access), nor | |||
the names of any links to the file will change without its knowledge, | the names of any links to the file will change without its knowledge, | |||
so long as the delegation is held. When a client has a write open | so long as the delegation is held. When a client has a write open | |||
delegation, it may modify the file data locally since no other client | delegation, it may modify the file data locally since no other client | |||
will be accessing the file's data. The client holding a write | will be accessing the file's data. The client holding a write | |||
delegation may only locally affect file attributes which are | delegation may only locally affect file attributes which are | |||
intimately connected with the file data: size, change, time_access, | intimately connected with the file data: size, change, time_access, | |||
time_metadata, and time_modify. to other attributes must be reflected | time_metadata, and time_modify. All other attributes must be | |||
on the server. | reflected on the server. | |||
When a client has an open delegation, it does not send OPENs or | When a client has an open delegation, it does not need to send OPENs | |||
CLOSEs to the server but updates the appropriate status internally. | or CLOSEs to the server. Instead the client may update the | |||
For a read open delegation, opens that cannot be handled locally | appropriate status internally. For a read open delegation, opens | |||
(opens for write or that deny read access) must be sent to the | that cannot be handled locally (opens for write or that deny read | |||
server. | access) must be sent to the server. | |||
When an open delegation is made, the response to the OPEN contains an | When an open delegation is made, the reply to the OPEN contains an | |||
open delegation structure which specifies the following: | open delegation structure which specifies the following: | |||
o the type of delegation (read or write) | o the type of delegation (read or write). | |||
o space limitation information to control flushing of data on close | o space limitation information to control flushing of data on close | |||
(write open delegation only, see Section 10.4.1. | (write open delegation only, see Section 10.4.1). | |||
o an nfsace4 specifying read and write permissions | o an nfsace4 specifying read and write permissions. | |||
o a stateid to represent the delegation for READ and WRITE | o a stateid to represent the delegation for READ and WRITE. | |||
The delegation stateid is separate and distinct from the stateid for | The delegation stateid is separate and distinct from the stateid for | |||
the OPEN proper. The standard stateid, unlike the delegation | the OPEN proper. The standard stateid, unlike the delegation | |||
stateid, is associated with a particular lock-owner and will continue | stateid, is associated with a particular lock-owner and will continue | |||
to be valid after the delegation is recalled and the file remains | to be valid after the delegation is recalled and the file remains | |||
open. | open. | |||
When a request internal to the client is made to open a file and open | When a request internal to the client is made to open a file and an | |||
delegation is in effect, it will be accepted or rejected solely on | open delegation is in effect, it will be accepted or rejected solely | |||
the basis of the following conditions. Any requirement for other | on the basis of the following conditions. Any requirement for other | |||
checks to be made by the delegate should result in open delegation | checks to be made by the delegate should result in open delegation | |||
being denied so that the checks can be made by the server itself. | being denied so that the checks can be made by the server itself. | |||
o The access and deny bits for the request and the file as described | o The access and deny bits for the request and the file as described | |||
in Section 9.7. | in Section 9.7. | |||
o The read and write permissions as determined below. | o The read and write permissions as determined below. | |||
The nfsace4 passed with delegation can be used to avoid frequent | The nfsace4 passed with delegation can be used to avoid frequent | |||
ACCESS calls. The permission check should be as follows: | ACCESS calls. The permission check should be as follows: | |||
skipping to change at page 193, line 43 | skipping to change at page 194, line 16 | |||
ACCESS request must be sent to the server to obtain the definitive | ACCESS request must be sent to the server to obtain the definitive | |||
answer. | answer. | |||
The server may return an nfsace4 that is more restrictive than the | The server may return an nfsace4 that is more restrictive than the | |||
actual ACL of the file. This includes an nfsace4 that specifies | actual ACL of the file. This includes an nfsace4 that specifies | |||
denial of all access. Note that some common practices such as | denial of all access. Note that some common practices such as | |||
mapping the traditional user "root" to the user "nobody" may make it | mapping the traditional user "root" to the user "nobody" may make it | |||
incorrect to return the actual ACL of the file in the delegation | incorrect to return the actual ACL of the file in the delegation | |||
response. | response. | |||
The use of delegation together with various other forms of caching | The use of a delegation together with various other forms of caching | |||
creates the possibility that no server authentication will ever be | creates the possibility that no server authentication and | |||
performed for a given user since all of the user's requests might be | authorization will ever be performed for a given user since all of | |||
satisfied locally. Where the client is depending on the server for | the user's requests might be satisfied locally. Where the client is | |||
authentication, the client should be sure authentication occurs for | depending on the server for authentication and authorization, the | |||
client should be sure authentication and authorization occurs for | ||||
each user by use of the ACCESS operation. This should be the case | each user by use of the ACCESS operation. This should be the case | |||
even if an ACCESS operation would not be required otherwise. As | even if an ACCESS operation would not be required otherwise. As | |||
mentioned before, the server may enforce frequent authentication by | mentioned before, the server may enforce frequent authentication by | |||
returning an nfsace4 denying all access with every open delegation. | returning an nfsace4 denying all access with every open delegation. | |||
10.4.1. Open Delegation and Data Caching | 10.4.1. Open Delegation and Data Caching | |||
OPEN delegation allows much of the message overhead associated with | An OPEN delegation allows much of the message overhead associated | |||
the opening and closing of files to be eliminated. An open when an open | with the opening and closing of files to be eliminated. An open when an | |||
delegation is in effect does not require that a validation message be | open delegation is in effect does not require that a validation | |||
sent to the server. The continued endurance of the "read open | message be sent to the server. The continued endurance of the "read | |||
delegation" provides a guarantee that no OPEN for write and thus no | open delegation" provides a guarantee that no OPEN for write and thus | |||
write has occurred. Similarly, when closing a file opened for write | no write has occurred. Similarly, when closing a file opened for | |||
and if write open delegation is in effect, the data written does not | write and if write open delegation is in effect, the data written | |||
have to be flushed to the server until the open delegation is | does not have to be written to the server until the open delegation | |||
recalled. The continued endurance of the open delegation provides a | is recalled. The continued endurance of the open delegation provides | |||
guarantee that no open and thus no read or write has been done by | a guarantee that no open and thus no read or write has been done by | |||
another client. | another client. | |||
For the purposes of open delegation, READs and WRITEs done without an | For the purposes of open delegation, READs and WRITEs done without an | |||
OPEN are treated as the functional equivalents of a corresponding | OPEN are treated as the functional equivalents of a corresponding | |||
type of OPEN. Although a client SHOULD NOT use special stateids when | type of OPEN. Although a client SHOULD NOT use special stateids when | |||
an open exists, delegation handling on the server can use the | an open exists, delegation handling on the server can use the client | |||
clientid associated with the current session to determine if the | ID associated with the current session to determine if the operation | |||
operation has been done by the holder of the delegation, in which | has been done by the holder of the delegation, in which case, no | |||
case, no recall is necessary, or by another client, in which case the | recall is necessary, or by another client, in which case the | |||
delegation must be recalled and I/O not proceed until the delegation | delegation must be recalled and I/O not proceed until the delegation | |||
is recalled or revoked. | is recalled or revoked. | |||
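The recall decision described above can be sketched as follows. This is an illustrative sketch only, not part of either draft revision: the names `Delegation` and `needs_recall` and the string-valued client IDs are hypothetical, and the sketch assumes that, as with OPEN, a READ by another client does not conflict with a read open delegation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Delegation:
    """Hypothetical server-side record of an open delegation."""
    holder_client_id: str   # client ID that holds the delegation
    is_write: bool          # True for a write open delegation


def needs_recall(deleg: Optional[Delegation], requester_client_id: str,
                 op_is_write: bool) -> bool:
    """Decide whether a READ or WRITE done without an OPEN forces a recall.

    The operation is treated like the corresponding type of OPEN.
    """
    if deleg is None:
        return False          # no delegation outstanding, nothing to recall
    if deleg.holder_client_id == requester_client_id:
        return False          # I/O by the delegation holder: no recall
    if deleg.is_write:
        return True           # any access by another client conflicts
    return op_is_write        # read delegation: only a write conflicts
```

When `needs_recall` returns True, the I/O would be held until the delegation is returned or revoked.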
With delegations, a client is able to avoid writing data to the | With delegations, a client is able to avoid writing data to the | |||
server when the CLOSE of a file is serviced. The file close system | server when the CLOSE of a file is serviced. The file close system | |||
call is the usual point at which the client is notified of a lack of | call is the usual point at which the client is notified of a lack of | |||
stable storage for the modified file data generated by the | stable storage for the modified file data generated by the | |||
application. At the close, file data is written to the server and | application. At the close, file data is written to the server and | |||
through normal accounting the server is able to determine if the | through normal accounting the server is able to determine if the | |||
available file system space for the data has been exceeded (i.e. | available file system space for the data has been exceeded (i.e. | |||
server returns NFS4ERR_NOSPC or NFS4ERR_DQUOT). This accounting | server returns NFS4ERR_NOSPC or NFS4ERR_DQUOT). This accounting | |||
includes quotas. The introduction of delegations requires that an | includes quotas. The introduction of delegations requires that an | |||
alternative method be in place for the same type of communication to | alternative method be in place for the same type of communication to | |||
occur between client and server. | occur between client and server. | |||
In the delegation response, the server provides either the limit of | In the delegation response, the server provides either the limit of | |||
the size of the file or the number of modified blocks and associated | the size of the file or the number of modified blocks and associated | |||
block size. The server must ensure that the client will be able to | block size. The server must ensure that the client will be able to | |||
flush data to the server of a size equal to that provided in the | write modified data to the server of a size equal to that provided in | |||
original delegation. The server must make this assurance for all | the original delegation. The server must make this assurance for all | |||
outstanding delegations. Therefore, the server must be careful in | outstanding delegations. Therefore, the server must be careful in | |||
its management of available space for new or modified data taking | its management of available space for new or modified data taking | |||
into account available file system space and any applicable quotas. | into account available file system space and any applicable quotas. | |||
The server can recall delegations as a result of managing the | The server can recall delegations as a result of managing the | |||
available file system space. The client should abide by the server's | available file system space. The client should abide by the server's | |||
state space limits for delegations. If the client exceeds the stated | state space limits for delegations. If the client exceeds the stated | |||
limits for the delegation, the server's behavior is undefined. | limits for the delegation, the server's behavior is undefined. | |||
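A client-side check against the limit from the delegation response might look like the sketch below. All names here (`SpaceLimit`, `may_cache_write`, and the field names) are hypothetical. Treating the block form as a simple byte budget is a simplifying assumption, not something the draft specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpaceLimit:
    """Hypothetical form of the limit carried in the delegation response.

    Exactly one of the two forms is expected to be set.
    """
    file_size: Optional[int] = None        # limit on the size of the file, in bytes
    num_blocks: Optional[int] = None       # limit on the number of modified blocks
    bytes_per_block: Optional[int] = None  # block size that goes with num_blocks


def may_cache_write(limit: SpaceLimit, new_file_size: int,
                    modified_bytes: int) -> bool:
    """Return True if a write can stay in the local cache under the limit."""
    if limit.file_size is not None:
        return new_file_size <= limit.file_size
    if limit.num_blocks is None or limit.bytes_per_block is None:
        raise ValueError("space limit carries neither form")
    # Simplification: compare total modified bytes to the block budget.
    return modified_bytes <= limit.num_blocks * limit.bytes_per_block
```

When the check fails, a cautious client would write the data to the server instead of caching it, since exceeding the stated limits leaves the server's behavior undefined.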
Based on server conditions, quotas or available file system space, | Based on server conditions, quotas or available file system space, | |||
the server may grant write open delegations with very restrictive | the server may grant write open delegations with very restrictive | |||
skipping to change at page 197, line 44 | skipping to change at page 198, line 17 | |||
As discussed earlier in this section, the client MAY return the same | As discussed earlier in this section, the client MAY return the same | |||
cc value on subsequent CB_GETATTR calls, even if the file was | cc value on subsequent CB_GETATTR calls, even if the file was | |||
modified in the client's cache yet again between successive | modified in the client's cache yet again between successive | |||
CB_GETATTR calls. Therefore, the server must assume that the file | CB_GETATTR calls. Therefore, the server must assume that the file | |||
has been modified yet again, and MUST take care to ensure that the | has been modified yet again, and MUST take care to ensure that the | |||
new nsc it constructs and returns is greater than the previous nsc it | new nsc it constructs and returns is greater than the previous nsc it | |||
returned. An example implementation's delegation record would | returned. An example implementation's delegation record would | |||
satisfy this mandate by including a boolean field (let us call it | satisfy this mandate by including a boolean field (let us call it | |||
"modified") that is set to FALSE when the delegation is granted, and | "modified") that is set to FALSE when the delegation is granted, and | |||
an sc value set at the time of grant to the change attribute value. | an sc value set at the time of grant to the change attribute value. | |||
The modified field would be set to true the first time cc != sc, and | The modified field would be set to TRUE the first time cc != sc, and | |||
would stay true until the delegation is returned or revoked. The | would stay TRUE until the delegation is returned or revoked. The | |||
processing for constructing nsc, time_modify, and time_metadata would | processing for constructing nsc, time_modify, and time_metadata would | |||
use this pseudo code: | use this pseudo code: | |||
if (!modified) { | if (!modified) { | |||
do CB_GETATTR for change and size; | do CB_GETATTR for change and size; | |||
if (cc != sc) | if (cc != sc) | |||
modified = TRUE; | modified = TRUE; | |||
} else { | } else { | |||
do CB_GETATTR for size; | do CB_GETATTR for size; | |||
skipping to change at page 199, line 4 | skipping to change at page 199, line 22 | |||
o Potentially conflicting OPEN request (or READ/WRITE done with | o Potentially conflicting OPEN request (or READ/WRITE done with | |||
"special" stateid) | "special" stateid) | |||
o SETATTR sent by another client | o SETATTR sent by another client | |||
o REMOVE request for the file | o REMOVE request for the file | |||
o RENAME request for the file as either source or target of the | o RENAME request for the file as either source or target of the | |||
RENAME | RENAME | |||
Whether a RENAME of a directory in the path leading to the file | Whether a RENAME of a directory in the path leading to the file | |||
results in recall of an open delegation depends on the semantics of | results in recall of an open delegation depends on the semantics of | |||
the server file system. If that file system denies such RENAMEs when | the server's file system. If that file system denies such RENAMEs | |||
a file is open, the recall must be performed to determine whether the | when a file is open, the recall must be performed to determine | |||
file in question is, in fact, open. | whether the file in question is, in fact, open. | |||
In addition to the situations above, the server may choose to recall | In addition to the situations above, the server may choose to recall | |||
open delegations at any time if resource constraints make it | open delegations at any time if resource constraints make it | |||
advisable to do so. Clients should always be prepared for the | advisable to do so. Clients should always be prepared for the | |||
possibility of recall. | possibility of recall. | |||
When a client receives a recall for an open delegation, it needs to | When a client receives a recall for an open delegation, it needs to | |||
update state on the server before returning the delegation. These | update state on the server before returning the delegation. These | |||
same updates must be done whenever a client chooses to return a | same updates must be done whenever a client chooses to return a | |||
delegation voluntarily. The following items of state need to be | delegation voluntarily. The following items of state need to be | |||
skipping to change at page 200, line 48 | skipping to change at page 201, line 20 | |||
awareness could result in the client finding out long after the | awareness could result in the client finding out long after the | |||
failure that its delegation has been revoked, and another client has | failure that its delegation has been revoked, and another client has | |||
modified the data for which the client had a delegation. This is | modified the data for which the client had a delegation. This is | |||
especially a problem for the client that held a write delegation. | especially a problem for the client that held a write delegation. | |||
Status bits returned by SEQUENCE operations help to provide an | Status bits returned by SEQUENCE operations help to provide an | |||
alternate way of informing the client of issues regarding the status | alternate way of informing the client of issues regarding the status | |||
of the backchannel and of recalled delegations. When the backchannel | of the backchannel and of recalled delegations. When the backchannel | |||
is not available, the server returns the status bit | is not available, the server returns the status bit | |||
SEQ4_STATUS_CB_PATH_DOWN on SEQUENCE operations. The client can | SEQ4_STATUS_CB_PATH_DOWN on SEQUENCE operations. The client can | |||
respond by attempting to re-establish the backchannel and by | react by attempting to re-establish the backchannel and by returning | |||
returning recallable objects if a backchannel cannot be successfully | recallable objects if a backchannel cannot be successfully re- | |||
re-established. | established. | |||
Whether the backchannel is functioning or not, it may be that the | Whether the backchannel is functioning or not, it may be that the | |||
recalled delegation is not returned. Note that the client's lease | recalled delegation is not returned. Note that the client's lease | |||
might still be renewed, even though the recalled delegation is not | might still be renewed, even though the recalled delegation is not | |||
returned. In this situation, servers SHOULD revoke delegations that | returned. In this situation, servers SHOULD revoke delegations that | |||
are not returned in a period of time equal to the lease period. This | are not returned in a period of time equal to the lease period. This | |||
period of time should allow the client time to note the backchannel- | period of time should allow the client time to note the backchannel- | |||
down status and re-establish the backchannel. | down status and re-establish the backchannel. | |||
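The two behaviors above, the client's reaction to the status bit and the server's revocation timing, can be sketched as follows. The function names and return strings are hypothetical, and the numeric value given for SEQ4_STATUS_CB_PATH_DOWN is an assumption made for illustration; consult the protocol's XDR definition for the actual value.

```python
from typing import Callable

SEQ4_STATUS_CB_PATH_DOWN = 0x00000001  # assumed bit value, illustration only


def client_reaction(status_flags: int,
                    try_reestablish: Callable[[], bool]) -> str:
    """Pick the client's action from the SEQUENCE status bits."""
    if not status_flags & SEQ4_STATUS_CB_PATH_DOWN:
        return "none"                          # backchannel is reported usable
    if try_reestablish():
        return "backchannel re-established"
    return "return recallable objects"         # cannot be recalled otherwise


def should_revoke(recall_sent_at: float, now: float,
                  lease_period: float) -> bool:
    """Server side: revoke once a full lease period has passed since recall."""
    return now - recall_sent_at >= lease_period
```

Waiting a full lease period before revoking gives the client time to notice the status bit on a SEQUENCE reply and repair the backchannel.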
When delegations are revoked, the server will return with the | When delegations are revoked, the server will return with the | |||
skipping to change at page 201, line 49 | skipping to change at page 202, line 20 | |||
If no opens exist for the file at the point the delegation is | If no opens exist for the file at the point the delegation is | |||
revoked, then notification of the revocation is unnecessary. | revoked, then notification of the revocation is unnecessary. | |||
However, if there is modified data present at the client for the | However, if there is modified data present at the client for the | |||
file, the user of the application should be notified. Unfortunately, | file, the user of the application should be notified. Unfortunately, | |||
it may not be possible to notify the user since active applications | it may not be possible to notify the user since active applications | |||
may not be present at the client. See Section 10.5.1 for additional | may not be present at the client. See Section 10.5.1 for additional | |||
details. | details. | |||
10.4.7. Delegations via WANT_DELEGATION | 10.4.7. Delegations via WANT_DELEGATION | |||
In addition to providing delegations as part of the response to OPEN | In addition to providing delegations as part of the reply to OPEN | |||
operations, servers MAY provide delegations separate from open, via | operations, servers MAY provide delegations separate from open, via | |||
the OPTIONAL WANT_DELEGATION operation. This allows delegations to | the OPTIONAL WANT_DELEGATION operation. This allows delegations to | |||
be obtained in advance of an OPEN that might benefit from them, for | be obtained in advance of an OPEN that might benefit from them, for | |||
objects which are not a valid target of OPEN, or to deal with cases | objects which are not a valid target of OPEN, or to deal with cases | |||
in which a delegation has been recalled and the client wants to make | in which a delegation has been recalled and the client wants to make | |||
an attempt to re-establish it if the absence of use by other clients | an attempt to re-establish it if the absence of use by other clients | |||
allows that. | allows that. | |||
The WANT_DELEGATION operation may be performed on any type of file | The WANT_DELEGATION operation may be performed on any type of file | |||
object other than a directory. | object other than a directory. | |||
skipping to change at page 203, line 48 | skipping to change at page 204, line 22 | |||
Saving of such modified data in delegation revocation situations may | Saving of such modified data in delegation revocation situations may | |||
be limited to files of a certain size or might be used only when | be limited to files of a certain size or might be used only when | |||
sufficient disk space is available within the target file system. | sufficient disk space is available within the target file system. | |||
Such saving may also be restricted to situations when the client has | Such saving may also be restricted to situations when the client has | |||
sufficient buffering resources to keep the cached copy available | sufficient buffering resources to keep the cached copy available | |||
until it is properly stored to the target file system. | until it is properly stored to the target file system. | |||
10.6. Attribute Caching | 10.6. Attribute Caching | |||
This section pertains to the caching of a file's attributes on a | ||||
client when that client does not hold a delegation on the file. | ||||
The attributes discussed in this section do not include named | The attributes discussed in this section do not include named | |||
attributes. Individual named attributes are analogous to files and | attributes. Individual named attributes are analogous to files and | |||
caching of the data for these needs to be handled just as data | caching of the data for these needs to be handled just as data | |||
caching is for ordinary files. Similarly, LOOKUP results from an | caching is for ordinary files. Similarly, LOOKUP results from an | |||
OPENATTR directory are to be cached on the same basis as any other | OPENATTR directory are to be cached on the same basis as any other | |||
pathnames and similarly for directory contents. | pathnames and similarly for directory contents. | |||
Clients may cache file attributes obtained from the server and use | Clients may cache file attributes obtained from the server and use | |||
them to avoid subsequent GETATTR requests. Such caching is write | them to avoid subsequent GETATTR requests. Such caching is write | |||
through in that modification to file attributes is always done by | through in that modification to file attributes is always done by | |||
means of requests to the server and should not be done locally and | means of requests to the server and should not be done locally and | |||
cached. The exceptions to this are modifications to attributes that | cached. The exceptions to this are modifications to attributes that | |||
are intimately connected with data caching. Therefore, extending a | are intimately connected with data caching. Therefore, extending a | |||
file by writing data to the local data cache is reflected immediately | file by writing data to the local data cache is reflected immediately | |||
in the size as seen on the client without this change being | in the size as seen on the client without this change being | |||
immediately reflected on the server. Normally such changes are not | immediately reflected on the server. Normally such changes are not | |||
propagated directly to the server but when the modified data is | propagated directly to the server but when the modified data is | |||
flushed to the server, analogous attribute changes are made on the | flushed to the server, analogous attribute changes are made on the | |||
server. When open delegation is in effect, the modified attributes | server. When open delegation is in effect, the modified attributes | |||
may be returned to the server in the response to a CB_RECALL call. | may be returned to the server in reaction to a CB_RECALL call. | |||
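The write-through rule and its size exception can be illustrated with a small sketch. The `AttrCache` class, its method names, and the `send_setattr` callback are hypothetical stand-ins for a client implementation and are not defined by the protocol.

```python
class AttrCache:
    """Hypothetical client-side attribute cache for a single file."""

    def __init__(self, server_attrs: dict):
        self.attrs = dict(server_attrs)  # last values obtained from the server
        self.local_size = None           # size implied by cached, unflushed writes

    def set_attr(self, name: str, value, send_setattr) -> None:
        """Write-through: the change goes to the server, then to the cache."""
        send_setattr(name, value)
        self.attrs[name] = value

    def cached_write(self, offset: int, length: int) -> None:
        """Extending the file in the data cache updates the locally seen size."""
        end = offset + length
        if end > self.size():
            self.local_size = end        # no request is sent to the server here

    def size(self) -> int:
        """Size as seen on the client, including cached extensions."""
        if self.local_size is not None:
            return self.local_size
        return self.attrs["size"]
```

In this sketch the server-derived `attrs["size"]` stays unchanged until the modified data is flushed, which mirrors the way the analogous attribute change reaches the server only with the data.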
The result of local caching of attributes is that the attribute | The result of local caching of attributes is that the attribute | |||
caches maintained on individual clients will not be coherent. | caches maintained on individual clients will not be coherent. | |||
Changes made in one order on the server may be seen in a different | Changes made in one order on the server may be seen in a different | |||
order on one client and in a third order on a different client. | order on one client and in a third order on a different client. | |||
The typical file system application programming interfaces do not | The typical file system application programming interfaces do not | |||
provide means to atomically modify or interrogate attributes for | provide means to atomically modify or interrogate attributes for | |||
multiple files at the same time. The following rules provide an | multiple files at the same time. The following rules provide an | |||
environment where the potential incoherences mentioned above can be | environment where the potential incoherences mentioned above can be | |||
skipping to change at page 206, line 37 | skipping to change at page 207, line 16 | |||
instead is just being read by an application via the memory mapped | instead is just being read by an application via the memory mapped | |||
interface, the client will not see an updated time_access | interface, the client will not see an updated time_access | |||
attribute. However, in many operating environments, neither will | attribute. However, in many operating environments, neither will | |||
any process running on the server. Thus NFS clients are at no | any process running on the server. Thus NFS clients are at no | |||
disadvantage with respect to local processes. | disadvantage with respect to local processes. | |||
o If there is another client that is memory mapping the file, and if | o If there is another client that is memory mapping the file, and if | |||
that client is holding a write delegation, the same set of issues | that client is holding a write delegation, the same set of issues | |||
as discussed in the previous two bullet items apply. So, when a | as discussed in the previous two bullet items apply. So, when a | |||
server does a CB_GETATTR to a file that the client has modified in | server does a CB_GETATTR to a file that the client has modified in | |||
its cache, the response from CB_GETATTR will not necessarily be | its cache, the reply from CB_GETATTR will not necessarily be | |||
accurate. As discussed earlier, the client's obligation is to | accurate. As discussed earlier, the client's obligation is to | |||
report that the file has been modified since the delegation was | report that the file has been modified since the delegation was | |||
granted, not whether it has been modified again between successive | granted, not whether it has been modified again between successive | |||
CB_GETATTR calls, and the server MUST assume that any file the | CB_GETATTR calls, and the server MUST assume that any file the | |||
client has modified in cache has been modified again between | client has modified in cache has been modified again between | |||
successive CB_GETATTR calls. Depending on the nature of the | successive CB_GETATTR calls. Depending on the nature of the | |||
client's memory management system, this weak obligation may not be | client's memory management system, this weak obligation may not be | |||
possible. A client MAY return stale information in CB_GETATTR | possible. A client MAY return stale information in CB_GETATTR | |||
whenever the file is memory mapped. | whenever the file is memory mapped. | |||
skipping to change at page 208, line 7 | skipping to change at page 208, line 32 | |||
o Clients and servers MAY deny a record lock on a file they know is | o Clients and servers MAY deny a record lock on a file they know is | |||
memory mapped. | memory mapped. | |||
o A client MAY deny memory mapping a file that it knows requires | o A client MAY deny memory mapping a file that it knows requires | |||
mandatory locking for I/O. If mandatory locking is enabled after | mandatory locking for I/O. If mandatory locking is enabled after | |||
the file is opened and mapped, the client MAY deny the application | the file is opened and mapped, the client MAY deny the application | |||
further access to its mapped file. | further access to its mapped file. | |||
10.8. Name and Directory Caching without Directory Delegations | 10.8. Name and Directory Caching without Directory Delegations | |||
Although NFSv4.1 defines a directory delegation facility, (described | The NFSv4.1 directory delegation facility (described in Section 10.9 | |||
in Section 10.9 below), servers are allowed not to implement that | below) is OPTIONAL for servers to implement. Even where it is | |||
facility and even where it is implemented, it may not always be | implemented, it may not always be functional because of resource | |||
functional, because of resource availability issues or other | availability issues or other constraints. Thus, it is important to | |||
constraints. Because of that, it is important to understand how name | understand how name and directory caching are done in the absence of | |||
and directory caching are done in the absence of directory | directory delegations. Those topics are discussed next in | |||
delegations. Those topics are discussed next in | ||||
Section 10.8.1. | Section 10.8.1. | |||
10.8.1. Name Caching | 10.8.1. Name Caching | |||
The results of LOOKUP and READDIR operations may be cached to avoid | The results of LOOKUP and READDIR operations may be cached to avoid | |||
the cost of subsequent LOOKUP operations. Just as in the case of | the cost of subsequent LOOKUP operations. Just as in the case of | |||
attribute caching, inconsistencies may arise among the various client | attribute caching, inconsistencies may arise among the various client | |||
caches. To mitigate the effects of these inconsistencies and given | caches. To mitigate the effects of these inconsistencies and given | |||
the context of typical file system APIs, an upper time boundary is | the context of typical file system APIs, an upper time boundary is | |||
maintained on how long a client name cache entry can be kept without | maintained on how long a client name cache entry can be kept without | |||
skipping to change at page 210, line 50 | skipping to change at page 211, line 27 | |||
Directory caching for the NFSv4.1 protocol, as previously described, | Directory caching for the NFSv4.1 protocol, as previously described, | |||
is similar to file caching in previous versions. Clients typically | is similar to file caching in previous versions. Clients typically | |||
cache directory information for a duration determined by the client. | cache directory information for a duration determined by the client. | |||
At the end of a predefined timeout, the client will query the server | At the end of a predefined timeout, the client will query the server | |||
to see if the directory has been updated. By caching attributes, | to see if the directory has been updated. By caching attributes, | |||
clients reduce the number of GETATTR calls made to the server to | clients reduce the number of GETATTR calls made to the server to | |||
validate attributes. Furthermore, frequently accessed files and | validate attributes. Furthermore, frequently accessed files and | |||
directories, such as the current working directory, have their | directories, such as the current working directory, have their | |||
attributes cached on the client so that some NFS operations can be | attributes cached on the client so that some NFS operations can be | |||
performed without having to make an RPC call. By caching name and | performed without having to make an RPC call. By caching name and | |||
inode information about most recently looked up entries in the | inode information about most recently looked up entries in a | |||
Directory Name Lookup Cache (DNLC), clients do not need to send | Directory Name Lookup Cache (DNLC), clients do not need to send | |||
LOOKUP calls to the server every time these files are accessed. | LOOKUP calls to the server every time these files are accessed. | |||
This caching approach works reasonably well at reducing network | This caching approach works reasonably well at reducing network | |||
traffic in many environments. However, it does not address | traffic in many environments. However, it does not address | |||
environments where there are numerous queries for files that do not | environments where there are numerous queries for files that do not | |||
exist. In these cases of "misses", the client must make RPC calls to | exist. In these cases of "misses", the client sends requests to the | |||
the server in order to provide reasonable application semantics and | server in order to provide reasonable application semantics and | |||
promptly detect the creation of new directory entries. Examples of | promptly detect the creation of new directory entries. Examples of | |||
high miss activity are compilation in software development | high miss activity are compilation in software development | |||
environments. The current behavior of NFS limits its potential | environments. The current behavior of NFS limits its potential | |||
scalability and wide-area sharing effectiveness in these types of | scalability and wide-area sharing effectiveness in these types of | |||
environments. Other distributed stateful file system architectures | environments. Other distributed stateful file system architectures | |||
such as AFS and DFS have proven that adding state around directory | such as AFS and DFS have proven that adding state around directory | |||
contents can greatly reduce network traffic in high-miss | contents can greatly reduce network traffic in high-miss | |||
environments. | environments. | |||
Delegation of directory contents is a RECOMMENDED feature of NFSv4.1. | Delegation of directory contents is an OPTIONAL feature of NFSv4.1. | |||
Directory delegations provide similar traffic reduction benefits as | Directory delegations provide similar traffic reduction benefits as | |||
with file delegations. By allowing clients to cache directory | with file delegations. By allowing clients to cache directory | |||
contents (in a read-only fashion) while being notified of changes, | contents (in a read-only fashion) while being notified of changes, | |||
the client can avoid making frequent requests to interrogate the | the client can avoid making frequent requests to interrogate the | |||
contents of slowly-changing directories, reducing network traffic and | contents of slowly-changing directories, reducing network traffic and | |||
improving client performance. It can also simplify the task of | improving client performance. It can also simplify the task of | |||
determining whether other clients are making changes to the directory | determining whether other clients are making changes to the directory | |||
when the client itself is making many changes to the directory and | when the client itself is making many changes to the directory and | |||
changes are not serialized. | changes are not serialized. | |||
skipping to change at page 211, line 51 | skipping to change at page 212, line 28 | |||
NFSv4.1 introduces the GET_DIR_DELEGATION (Section 18.39) operation | NFSv4.1 introduces the GET_DIR_DELEGATION (Section 18.39) operation | |||
to allow the client to ask for a directory delegation. The | to allow the client to ask for a directory delegation. The | |||
delegation covers directory attributes and all entries in the | delegation covers directory attributes and all entries in the | |||
directory. If either of these change, the delegation will be | directory. If either of these change, the delegation will be | |||
recalled synchronously. The operation causing the recall will have | recalled synchronously. The operation causing the recall will have | |||
to wait until the recall is complete. Any changes to directory | to wait until the recall is complete. Any changes to directory | |||
entry attributes will not cause the delegation to be recalled. | entry attributes will not cause the delegation to be recalled. | |||
In addition to asking for delegations, a client can also ask for | In addition to asking for delegations, a client can also ask for | |||
notifications for certain events. These events include changes to | notifications for certain events. These events include changes to | |||
directory attributes and/or its contents. If a client asks for | the directory's attributes and/or its contents. If a client asks for | |||
notification for a certain event, the server will notify the client | notification for a certain event, the server will notify the client | |||
when that event occurs. This will not result in the delegation being | when that event occurs. This will not result in the delegation being | |||
recalled for that client. The notifications are asynchronous and | recalled for that client. The notifications are asynchronous and | |||
provide a way of avoiding recalls in situations where a directory is | provide a way of avoiding recalls in situations where a directory is | |||
changing enough that the pure recall model may not be effective while | changing enough that the pure recall model may not be effective while | |||
trying to allow the client to get substantial benefit. In the | trying to allow the client to get substantial benefit. In the | |||
absence of notifications, once the delegation is recalled the client | absence of notifications, once the delegation is recalled the client | |||
has to refresh its directory cache which might not be very efficient | has to refresh its directory cache which might not be very efficient | |||
for very large directories. | for very large directories. | |||
The delegation is read-only and the client may not make changes to | The delegation is read-only and the client may not make changes to | |||
the directory other than by performing NFSv4.1 operations that modify | the directory other than by performing NFSv4.1 operations that modify | |||
the directory or the associated file attributes so that the server | the directory or the associated file attributes so that the server | |||
has knowledge of these changes. In order to keep the client | has knowledge of these changes. In order to keep the client | |||
namespace synchronized with the server, the server will, if the | namespace synchronized with the server, the server will, if the | |||
client has requested notifications, notify the client holding the | client has requested notifications, notify the client holding the | |||
delegation of the changes made as a result. This is to avoid any | delegation of the changes made as a result. This is to avoid any | |||
need for subsequent GETATTR or READDIR calls to the server. If a | need for subsequent GETATTR or READDIR calls to the server. If a | |||
single client is holding the delegation and that client makes any | single client is holding the delegation and that client makes any | |||
changes to the directory (i.e. the changes are made via operations | changes to the directory (i.e. the changes are made via operations | |||
sent through a session associated with the clientid holding the | sent through a session associated with the client ID holding the | |||
delegation), the delegation will not be recalled. Multiple clients | delegation), the delegation will not be recalled. Multiple clients | |||
may hold a delegation on the same directory, but if any such client | may hold a delegation on the same directory, but if any such client | |||
modifies the directory, the server MUST recall the delegation from | modifies the directory, the server MUST recall the delegation from | |||
the other clients, unless those clients have made provisions to be | the other clients, unless those clients have made provisions to be | |||
notified of that sort of modification. | notified of that sort of modification. | |||
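The recall-versus-notify rule described above can be sketched as follows. This is only an illustrative model, not part of the protocol: the names DirDelegation, Action, actions_for_change, and the change-kind label "add_entry" are hypothetical, and the sketch assumes that a holder's requested notifications can be represented as a simple set of change kinds.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Action(Enum):
    NONE = auto()    # nothing is sent; the holder keeps its delegation
    NOTIFY = auto()  # asynchronous notification; the delegation is retained
    RECALL = auto()  # the delegation is recalled from this holder


@dataclass
class DirDelegation:
    client_id: str
    # Hypothetical labels for the change kinds this holder asked to hear about.
    notify_on: set = field(default_factory=set)


def actions_for_change(delegations, modifying_client, change_kind):
    """Return {client_id: Action} for one modification of a directory."""
    result = {}
    for d in delegations:
        wants_notice = change_kind in d.notify_on
        if d.client_id == modifying_client:
            # A holder's own changes never cause a recall of its delegation;
            # it is notified only if it requested that kind of notification.
            result[d.client_id] = Action.NOTIFY if wants_notice else Action.NONE
        elif wants_notice:
            # Another holder that arranged to be notified keeps its delegation.
            result[d.client_id] = Action.NOTIFY
        else:
            # Any other holder must have its delegation recalled.
            result[d.client_id] = Action.RECALL
    return result


holders = [
    DirDelegation("A"),
    DirDelegation("B", notify_on={"add_entry"}),
    DirDelegation("C"),
]
outcome = actions_for_change(holders, modifying_client="A", change_kind="add_entry")
```

In this sketch client A modifies the directory and keeps its delegation, client B is notified because it asked for that kind of change, and client C, which requested no notifications, has its delegation recalled.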
Delegations can be recalled by the server at any time. Normally, the | Delegations can be recalled by the server at any time. Normally, the | |||
server will recall the delegation when the directory changes in a way | server will recall the delegation when the directory changes in a way | |||
that is not covered by the notification, or when the directory | that is not covered by the notification, or when the directory | |||
changes and notifications have not been requested. If another client | changes and notifications have not been requested. If another client | |||
skipping to change at page 213, line 31 | skipping to change at page 214, line 9 | |||
o For OPEN, see Section 18.16.4. | o For OPEN, see Section 18.16.4. | |||
o For REMOVE, see Section 18.25.4. | o For REMOVE, see Section 18.25.4. | |||
o For RENAME, see Section 18.26.4. | o For RENAME, see Section 18.26.4. | |||
o For SETATTR, see Section 18.30.4. | o For SETATTR, see Section 18.30.4. | |||
10.9.5. Directory Delegation Recovery | 10.9.5. Directory Delegation Recovery | |||
Crash recovery for state on regular files has two main goals, | Recovery from client or server restart for state on regular files has | |||
avoiding the necessity of breaking application guarantees with | two main goals, avoiding the necessity of breaking application | |||
respect to locked files and delivery of updates cached at the client. | guarantees with respect to locked files and delivery of updates | |||
Neither of these applies to directories protected by read delegations | cached at the client. Neither of these goals applies to directories | |||
and notifications. Thus, no provision is made for reclaiming | protected by read delegations and notifications. Thus, no provision | |||
directory delegations in the event of client or server failure. The | is made for reclaiming directory delegations in the event of client | |||
client can simply establish a directory delegation in the same | or server restart. The client can simply establish a directory | |||
fashion as was done initially. | delegation in the same fashion as was done initially. | |||
11. Multi-Server Namespace | 11. Multi-Server Namespace | |||
NFSv4.1 supports attributes that allow a namespace to extend beyond | NFSv4.1 supports attributes that allow a namespace to extend beyond | |||
the boundaries of a single server. It is RECOMMENDED that clients | the boundaries of a single server. It is RECOMMENDED that clients | |||
and servers support construction of such multi-server namespaces. | and servers support construction of such multi-server namespaces. | |||
Use of such multi-server namespaces is OPTIONAL however, and for many | Use of such multi-server namespaces is OPTIONAL however, and for many | |||
purposes, single-server namespaces are perfectly acceptable. Use of | purposes, single-server namespaces are perfectly acceptable. Use of | |||
multi-server namespaces can provide many advantages, however, by | multi-server namespaces can provide many advantages, however, by | |||
separating a file system's logical position in a namespace from the | separating a file system's logical position in a namespace from the | |||
End of changes. 112 change blocks. | ||||
423 lines changed or deleted | 432 lines changed or added | |||