draft-pre-ch-7.txt | draft-ietf-nfsv4-minorversion1-22.txt | |||
---|---|---|---|---|
NFSv4 S. Shepler | NFSv4 S. Shepler | |||
Internet-Draft M. Eisler | Internet-Draft M. Eisler | |||
Intended status: Standards Track D. Noveck | Intended status: Standards Track D. Noveck | |||
Expires: September 14, 2008 Editors | Expires: September 19, 2008 Editors | |||
March 13, 2008 | March 18, 2008 | |||
NFS Version 4 Minor Version 1 | NFS Version 4 Minor Version 1 | |||
draft-ietf-nfsv4-minorversion1-22.txt | draft-ietf-nfsv4-minorversion1-22.txt | |||
Status of this Memo | Status of this Memo | |||
By submitting this Internet-Draft, each author represents that any | By submitting this Internet-Draft, each author represents that any | |||
applicable patent or other IPR claims of which he or she is aware | applicable patent or other IPR claims of which he or she is aware | |||
have been or will be disclosed, and any of which he or she becomes | have been or will be disclosed, and any of which he or she becomes | |||
aware will be disclosed, in accordance with Section 6 of BCP 79. | aware will be disclosed, in accordance with Section 6 of BCP 79. | |||
skipping to change at page 1, line 35 | skipping to change at page 1, line 35 | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
This Internet-Draft will expire on September 14, 2008. | This Internet-Draft will expire on September 19, 2008. | |||
Copyright Notice | Copyright Notice | |||
Copyright (C) The IETF Trust (2008). | Copyright (C) The IETF Trust (2008). | |||
Abstract | Abstract | |||
This Internet-Draft describes NFS version 4 minor version one, | This Internet-Draft describes NFS version 4 minor version one, | |||
including features retained from the base protocol and protocol | including features retained from the base protocol and protocol | |||
extensions made subsequently. Major extensions introduced in NFS | extensions made subsequently. Major extensions introduced in NFS | |||
skipping to change at page 4, line 30 | skipping to change at page 4, line 30 | |||
8. State Management . . . . . . . . . . . . . . . . . . . . . . 147 | 8. State Management . . . . . . . . . . . . . . . . . . . . . . 147 | |||
8.1. Client and Session ID . . . . . . . . . . . . . . . . . 148 | 8.1. Client and Session ID . . . . . . . . . . . . . . . . . 148 | |||
8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 148 | 8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 148 | |||
8.2.1. Stateid Types . . . . . . . . . . . . . . . . . . . 149 | 8.2.1. Stateid Types . . . . . . . . . . . . . . . . . . . 149 | |||
8.2.2. Stateid Structure . . . . . . . . . . . . . . . . . 150 | 8.2.2. Stateid Structure . . . . . . . . . . . . . . . . . 150 | |||
8.2.3. Special Stateids . . . . . . . . . . . . . . . . . . 151 | 8.2.3. Special Stateids . . . . . . . . . . . . . . . . . . 151 | |||
8.2.4. Stateid Lifetime and Validation . . . . . . . . . . 152 | 8.2.4. Stateid Lifetime and Validation . . . . . . . . . . 152 | |||
8.2.5. Stateid Use for I/O Operations . . . . . . . . . . . 155 | 8.2.5. Stateid Use for I/O Operations . . . . . . . . . . . 155 | |||
8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 156 | 8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 156 | |||
8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 158 | 8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 158 | |||
8.4.1. Client Failure and Recovery . . . . . . . . . . . . 158 | 8.4.1. Client Failure and Recovery . . . . . . . . . . . . 159 | |||
8.4.2. Server Failure and Recovery . . . . . . . . . . . . 159 | 8.4.2. Server Failure and Recovery . . . . . . . . . . . . 159 | |||
8.4.3. Network Partitions and Recovery . . . . . . . . . . 163 | 8.4.3. Network Partitions and Recovery . . . . . . . . . . 163 | |||
8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 167 | 8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 168 | |||
8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 168 | 8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 169 | |||
8.7. Clocks, Propagation Delay, and Calculating Lease | 8.7. Clocks, Propagation Delay, and Calculating Lease | |||
Expiration . . . . . . . . . . . . . . . . . . . . . . . 169 | Expiration . . . . . . . . . . . . . . . . . . . . . . . 169 | |||
8.8. Obsolete Locking Infrastructure From NFSv4.0 . . . . . . 169 | 8.8. Obsolete Locking Infrastructure From NFSv4.0 . . . . . . 170 | |||
9. File Locking and Share Reservations . . . . . . . . . . . . . 170 | 9. File Locking and Share Reservations . . . . . . . . . . . . . 171 | |||
9.1. Opens and Byte-range Locks . . . . . . . . . . . . . . . 171 | 9.1. Opens and Byte-range Locks . . . . . . . . . . . . . . . 171 | |||
9.1.1. State-owner Definition . . . . . . . . . . . . . . . 171 | 9.1.1. State-owner Definition . . . . . . . . . . . . . . . 171 | |||
9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 171 | 9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 172 | |||
9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 174 | 9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 175 | |||
9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 175 | 9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 175 | |||
9.4. Stateid Seqid Values and Byte-range Locks . . . . . . . 175 | 9.4. Stateid Seqid Values and Byte-range Locks . . . . . . . 176 | |||
9.5. Issues with Multiple Open-owners . . . . . . . . . . . . 175 | 9.5. Issues with Multiple Open-owners . . . . . . . . . . . . 176 | |||
9.6. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 176 | 9.6. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 176 | |||
9.7. Share Reservations . . . . . . . . . . . . . . . . . . . 177 | 9.7. Share Reservations . . . . . . . . . . . . . . . . . . . 178 | |||
9.8. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 178 | 9.8. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 178 | |||
9.9. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 179 | 9.9. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 179 | |||
9.10. Parallel OPENs . . . . . . . . . . . . . . . . . . . . . 179 | 9.10. Parallel OPENs . . . . . . . . . . . . . . . . . . . . . 180 | |||
9.11. Reclaim of Open and Byte-range Locks . . . . . . . . . . 180 | 9.11. Reclaim of Open and Byte-range Locks . . . . . . . . . . 180 | |||
10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 180 | 10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 181 | |||
10.1. Performance Challenges for Client-Side Caching . . . . . 181 | 10.1. Performance Challenges for Client-Side Caching . . . . . 181 | |||
10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 182 | 10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 182 | |||
10.2.1. Delegation Recovery . . . . . . . . . . . . . . . . 184 | 10.2.1. Delegation Recovery . . . . . . . . . . . . . . . . 185 | |||
10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 186 | 10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 187 | |||
10.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . 187 | 10.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . 187 | |||
10.3.2. Data Caching and File Locking . . . . . . . . . . . 188 | 10.3.2. Data Caching and File Locking . . . . . . . . . . . 188 | |||
10.3.3. Data Caching and Mandatory File Locking . . . . . . 189 | 10.3.3. Data Caching and Mandatory File Locking . . . . . . 190 | |||
10.3.4. Data Caching and File Identity . . . . . . . . . . . 190 | 10.3.4. Data Caching and File Identity . . . . . . . . . . . 190 | |||
10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 191 | 10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 191 | |||
10.4.1. Open Delegation and Data Caching . . . . . . . . . . 193 | 10.4.1. Open Delegation and Data Caching . . . . . . . . . . 194 | |||
10.4.2. Open Delegation and File Locks . . . . . . . . . . . 195 | 10.4.2. Open Delegation and File Locks . . . . . . . . . . . 195 | |||
10.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . 195 | 10.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . 195 | |||
10.4.4. Recall of Open Delegation . . . . . . . . . . . . . 198 | 10.4.4. Recall of Open Delegation . . . . . . . . . . . . . 198 | |||
10.4.5. Clients that Fail to Honor Delegation Recalls . . . 200 | 10.4.5. Clients that Fail to Honor Delegation Recalls . . . 200 | |||
10.4.6. Delegation Revocation . . . . . . . . . . . . . . . 200 | 10.4.6. Delegation Revocation . . . . . . . . . . . . . . . 201 | |||
10.4.7. Delegations via WANT_DELEGATION . . . . . . . . . . 201 | 10.4.7. Delegations via WANT_DELEGATION . . . . . . . . . . 201 | |||
10.5. Data Caching and Revocation . . . . . . . . . . . . . . 202 | 10.5. Data Caching and Revocation . . . . . . . . . . . . . . 202 | |||
10.5.1. Revocation Recovery for Write Open Delegation . . . 202 | 10.5.1. Revocation Recovery for Write Open Delegation . . . 203 | |||
10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 203 | 10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 203 | |||
10.7. Data and Metadata Caching and Memory Mapped Files . . . 205 | 10.7. Data and Metadata Caching and Memory Mapped Files . . . 205 | |||
10.8. Name and Directory Caching without Directory | 10.8. Name and Directory Caching without Directory | |||
Delegations . . . . . . . . . . . . . . . . . . . . . . 207 | Delegations . . . . . . . . . . . . . . . . . . . . . . 208 | |||
10.8.1. Name Caching . . . . . . . . . . . . . . . . . . . . 207 | 10.8.1. Name Caching . . . . . . . . . . . . . . . . . . . . 208 | |||
10.8.2. Directory Caching . . . . . . . . . . . . . . . . . 209 | 10.8.2. Directory Caching . . . . . . . . . . . . . . . . . 209 | |||
10.9. Directory Delegations . . . . . . . . . . . . . . . . . 210 | 10.9. Directory Delegations . . . . . . . . . . . . . . . . . 210 | |||
10.9.1. Introduction to Directory Delegations . . . . . . . 210 | 10.9.1. Introduction to Directory Delegations . . . . . . . 210 | |||
10.9.2. Directory Delegation Design . . . . . . . . . . . . 211 | 10.9.2. Directory Delegation Design . . . . . . . . . . . . 211 | |||
10.9.3. Attributes in Support of Directory Notifications . . 212 | 10.9.3. Attributes in Support of Directory Notifications . . 212 | |||
10.9.4. Directory Delegation Recall . . . . . . . . . . . . 212 | 10.9.4. Directory Delegation Recall . . . . . . . . . . . . 212 | |||
10.9.5. Directory Delegation Recovery . . . . . . . . . . . 213 | 10.9.5. Directory Delegation Recovery . . . . . . . . . . . 213 | |||
11. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 213 | 11. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 213 | |||
11.1. Location Attributes . . . . . . . . . . . . . . . . . . 213 | 11.1. Location Attributes . . . . . . . . . . . . . . . . . . 214 | |||
11.2. File System Presence or Absence . . . . . . . . . . . . 214 | 11.2. File System Presence or Absence . . . . . . . . . . . . 214 | |||
11.3. Getting Attributes for an Absent File System . . . . . . 215 | 11.3. Getting Attributes for an Absent File System . . . . . . 215 | |||
11.3.1. GETATTR Within an Absent File System . . . . . . . . 215 | 11.3.1. GETATTR Within an Absent File System . . . . . . . . 215 | |||
11.3.2. READDIR and Absent File Systems . . . . . . . . . . 216 | 11.3.2. READDIR and Absent File Systems . . . . . . . . . . 217 | |||
11.4. Uses of Location Information . . . . . . . . . . . . . . 217 | 11.4. Uses of Location Information . . . . . . . . . . . . . . 217 | |||
11.4.1. File System Replication . . . . . . . . . . . . . . 218 | 11.4.1. File System Replication . . . . . . . . . . . . . . 218 | |||
11.4.2. File System Migration . . . . . . . . . . . . . . . 219 | 11.4.2. File System Migration . . . . . . . . . . . . . . . 219 | |||
11.4.3. Referrals . . . . . . . . . . . . . . . . . . . . . 220 | 11.4.3. Referrals . . . . . . . . . . . . . . . . . . . . . 220 | |||
11.5. Location Entries and Server Identity . . . . . . . . . . 221 | 11.5. Location Entries and Server Identity . . . . . . . . . . 221 | |||
11.6. Additional Client-side Considerations . . . . . . . . . 222 | 11.6. Additional Client-side Considerations . . . . . . . . . 222 | |||
11.7. Effecting File System Transitions . . . . . . . . . . . 223 | 11.7. Effecting File System Transitions . . . . . . . . . . . 223 | |||
11.7.1. File System Transitions and Simultaneous Access . . 224 | 11.7.1. File System Transitions and Simultaneous Access . . 224 | |||
11.7.2. Simultaneous Use and Transparent Transitions . . . . 224 | 11.7.2. Simultaneous Use and Transparent Transitions . . . . 225 | |||
11.7.3. Filehandles and File System Transitions . . . . . . 227 | 11.7.3. Filehandles and File System Transitions . . . . . . 227 | |||
11.7.4. Fileids and File System Transitions . . . . . . . . 227 | 11.7.4. Fileids and File System Transitions . . . . . . . . 228 | |||
11.7.5. Fsids and File System Transitions . . . . . . . . . 229 | 11.7.5. Fsids and File System Transitions . . . . . . . . . 229 | |||
11.7.6. The Change Attribute and File System Transitions . . 229 | 11.7.6. The Change Attribute and File System Transitions . . 230 | |||
11.7.7. Lock State and File System Transitions . . . . . . . 230 | 11.7.7. Lock State and File System Transitions . . . . . . . 230 | |||
11.7.8. Write Verifiers and File System Transitions . . . . 234 | 11.7.8. Write Verifiers and File System Transitions . . . . 234 | |||
11.7.9. Readdir Cookies and Verifiers and File System | 11.7.9. Readdir Cookies and Verifiers and File System | |||
Transitions . . . . . . . . . . . . . . . . . . . . 234 | Transitions . . . . . . . . . . . . . . . . . . . . 234 | |||
11.7.10. File System Data and File System Transitions . . . . 234 | 11.7.10. File System Data and File System Transitions . . . . 235 | |||
11.8. Effecting File System Referrals . . . . . . . . . . . . 236 | 11.8. Effecting File System Referrals . . . . . . . . . . . . 236 | |||
11.8.1. Referral Example (LOOKUP) . . . . . . . . . . . . . 236 | 11.8.1. Referral Example (LOOKUP) . . . . . . . . . . . . . 236 | |||
11.8.2. Referral Example (READDIR) . . . . . . . . . . . . . 240 | 11.8.2. Referral Example (READDIR) . . . . . . . . . . . . . 240 | |||
11.9. The Attribute fs_locations . . . . . . . . . . . . . . . 242 | 11.9. The Attribute fs_locations . . . . . . . . . . . . . . . 243 | |||
11.10. The Attribute fs_locations_info . . . . . . . . . . . . 245 | 11.10. The Attribute fs_locations_info . . . . . . . . . . . . 245 | |||
11.10.1. The fs_locations_server4 Structure . . . . . . . . . 248 | 11.10.1. The fs_locations_server4 Structure . . . . . . . . . 248 | |||
11.10.2. The fs_locations_info4 Structure . . . . . . . . . . 253 | 11.10.2. The fs_locations_info4 Structure . . . . . . . . . . 254 | |||
11.10.3. The fs_locations_item4 Structure . . . . . . . . . . 254 | 11.10.3. The fs_locations_item4 Structure . . . . . . . . . . 255 | |||
11.11. The Attribute fs_status . . . . . . . . . . . . . . . . 256 | 11.11. The Attribute fs_status . . . . . . . . . . . . . . . . 257 | |||
12. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 260 | 12. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 260 | |||
12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 260 | 12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 260 | |||
12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 262 | 12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 262 | |||
12.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 262 | 12.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 262 | |||
12.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 262 | 12.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 262 | |||
12.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 263 | 12.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 263 | |||
12.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 263 | 12.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 263 | |||
12.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 263 | 12.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 263 | |||
12.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 263 | 12.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 263 | |||
12.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 263 | 12.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 263 | |||
skipping to change at page 147, line 6 | skipping to change at page 147, line 6 | |||
a particular file system, as opposed to all of the data within it, | a particular file system, as opposed to all of the data within it, | |||
the server can apply the security policy of a shared resource in the | the server can apply the security policy of a shared resource in the | |||
server's namespace to components of the resource's ancestors. For | server's namespace to components of the resource's ancestors. For | |||
example: | example: | |||
/ (place holder/not exported) | / (place holder/not exported) | |||
/a/b (file system 1) | /a/b (file system 1) | |||
/a/b/MySecretProject (file system 2) | /a/b/MySecretProject (file system 2) | |||
The /a/b/MySecretProject directory is a real file system and is the | The /a/b/MySecretProject directory is a real file system and is the | |||
shared resource. Suppose the security policy for /a/b/ | shared resource. Suppose the security policy for /a/b/ | |||
MySecretProject is Kerberos with integrity and it desired that | MySecretProject is Kerberos with integrity and it is desired to limit | |||
knowledge of the existence of this file system to be very limited. | knowledge of the existence of this file system. In this case, the | |||
In this case the server should apply the same security policy to | server should apply the same security policy to /a/b. This allows | |||
/a/b. This allows for knowledge the existence of a file system to be | for knowledge of the existence of a file system to be secured when | |||
secured in cases where this is desirable. | desirable. | |||
For the case of the use of multiple, disjoint security mechanisms in | For the case of the use of multiple, disjoint security mechanisms in | |||
the server's resources, applying that sort of policy would result in | the server's resources, applying that sort of policy would result in | |||
the higher-level file system not being accessible using any security | the higher-level file system not being accessible using any security | |||
flavor, which would make that higher-level file system | flavor, which would make that higher-level file system | |||
inaccessible. Therefore, that sort of configuration is not | inaccessible. Therefore, that sort of configuration is not | |||
compatible with hiding the existence (as opposed to the contents) | compatible with hiding the existence (as opposed to the contents) | |||
from clients using multiple disjoint sets of security flavors. | from clients using multiple disjoint sets of security flavors. | |||
In other circumstances, a desirable policy is that the security of a | In other circumstances, a desirable policy is that the security of a | |||
particular object in the server's namespace should include the union | particular object in the server's namespace should include the union | |||
of all security mechanisms of all direct descendants. A common and | of all security mechanisms of all direct descendants. A common and | |||
convenient practice, unless strong security requirements dictate | convenient practice, unless strong security requirements dictate | |||
otherwise, is to make all of the pseudo file system accessible by all | otherwise, is to make all of the pseudo file system accessible by all | |||
of the valid security mechanisms. | of the valid security mechanisms. | |||
Where there is concern about the security of data on the wire, | Where there is concern about the security of data on the network, | |||
clients should use strong security mechanisms to access the pseudo | clients should use strong security mechanisms to access the pseudo | |||
file system in order to prevent man-in-the-middle-attacks from | file system in order to prevent man-in-the-middle attacks. | |||
directing LOOKUPs within the pseudo file system from compromising the | ||||
existence of sensitive data, or getting access to data that the | ||||
client is sending by directing the client to send it using weak | ||||
security mechanisms. | ||||
8. State Management | 8. State Management | |||
Integrating locking into the NFS protocol necessarily causes it to be | Integrating locking into the NFS protocol necessarily causes it to be | |||
stateful. With the inclusion of such features as share reservations, | stateful. With the inclusion of such features as share reservations, | |||
file and directory delegations, recallable layouts, and support for | file and directory delegations, recallable layouts, and support for | |||
mandatory record locking the protocol becomes substantially more | mandatory record locking, the protocol becomes substantially more | |||
dependent on proper management of state than the traditional | dependent on proper management of state than the traditional | |||
combination of NFS and NLM [36]. These features include expanded | combination of NFS and NLM [36]. These features include expanded | |||
locking facilities, which provide some measure of interclient | locking facilities, which provide some measure of interclient | |||
exclusion, but the state is also valuable to providing other useful | exclusion, but the state is also valuable to offering features not | |||
features not readily providable using a stateless model. There are | readily providable using a stateless model. There are three | |||
three components to making this state manageable: | components to making this state manageable: | |||
o Clear division between client and server | o Clear division between client and server | |||
o Ability to reliably detect inconsistency in state between client | o Ability to reliably detect inconsistency in state between client | |||
and server | and server | |||
o Simple and robust recovery mechanisms | o Simple and robust recovery mechanisms | |||
In this model, the server owns the state information. The client | In this model, the server owns the state information. The client | |||
requests changes in locks and the server responds with the changes | requests changes in locks and the server responds with the changes | |||
made. Non-client-initiated changes in locking state are infrequent | made. Non-client-initiated changes in locking state are infrequent | |||
and the client receives prompt notification of them and can adjust | and the client receives prompt notification of them and can adjust | |||
its view of the locking state to reflect the server's changes. | its view of the locking state to reflect the server's changes. | |||
Individual pieces of state created by the server and passed to the | Individual pieces of state created by the server and passed to the | |||
client at its request are represented by 128-bit stateids. These | client at its request are represented by 128-bit stateids. These | |||
stateids may represent a particular open file, a set of byte-range | stateids may represent a particular open file, a set of byte-range | |||
locks held by a particular owner, or a recallable delegation of | locks held by a particular owner, or a recallable delegation of | |||
skipping to change at page 149, line 4 | skipping to change at page 148, line 47 | |||
and a unitary client. | and a unitary client. | |||
8.2. Stateid Definition | 8.2. Stateid Definition | |||
When the server grants a lock of any type (including opens, record | When the server grants a lock of any type (including opens, record | |||
locks, delegations, and layouts) it responds with a unique stateid, | locks, delegations, and layouts) it responds with a unique stateid, | |||
that represents a set of locks (often a single lock) for the same | that represents a set of locks (often a single lock) for the same | |||
file, of the same type, and sharing the same ownership | file, of the same type, and sharing the same ownership | |||
characteristics. Thus opens of the same file by different open- | characteristics. Thus opens of the same file by different open- | |||
owners each have an identifying stateid. Similarly, each set of | owners each have an identifying stateid. Similarly, each set of | |||
record locks on a file owned by a specific lock-owner and gotten via | record locks on a file owned by a specific lock-owner has its own | |||
an open for a specific open-owner, has its own identifying stateid. | identifying stateid. Delegations and layouts also have associated | |||
Delegations and layouts also have associated stateids by which they | stateids by which they may be referenced. The stateid is used as a | |||
may be referenced. The stateid is used as a shorthand reference to a | shorthand reference to a lock or set of locks and given a stateid the | |||
lock or set of locks and given a stateid the server can determine the | server can determine the associated state-owner or state-owners (in | |||
associated state-owner or state-owners (in the case of an open-owner/ | the case of an open-owner/lock-owner pair) and the associated | |||
lock-owner pair) and the associated filehandle. When stateids are | filehandle. When stateids are used, the current filehandle must be | |||
used, the current filehandle must be the one associated with that | the one associated with that stateid. | |||
stateid. | ||||
All stateids associated with a given clientid are associated with a | All stateids associated with a given client ID are associated with a | |||
common lease which represents the claim of those stateids and the | common lease which represents the claim of those stateids and the | |||
objects they represent to be maintained by the server. See | objects they represent to be maintained by the server. See | |||
Section 8.3 for a discussion of leases. | Section 8.3 for a discussion of leases. | |||
The server may assign stateids independently for different clients. | The server may assign stateids independently for different clients. | |||
A stateid with the same bit pattern for one client may designate an | A stateid with the same bit pattern for one client may designate an | |||
entirely different set of locks for a different client. The stateid | entirely different set of locks for a different client. The stateid | |||
is always interpreted with respect to the client ID associated with | is always interpreted with respect to the client ID associated with | |||
the current session. Stateids apply to all sessions associated with | the current session. Stateids apply to all sessions associated with | |||
the given client ID and the client may use a stateid obtained from | the given client ID and the client may use a stateid obtained from | |||
skipping to change at page 149, line 38 | skipping to change at page 149, line 32 | |||
With the exception of special stateids, to be discussed later, each | With the exception of special stateids, to be discussed later, each | |||
stateid represents locking objects of one of a set of types defined | stateid represents locking objects of one of a set of types defined | |||
by the NFSv4.1 protocol. Note that in all these cases, where we | by the NFSv4.1 protocol. Note that in all these cases, where we | |||
speak of guarantee, it is understood there are situations such as a | speak of guarantee, it is understood there are situations such as a | |||
client restart, or lock revocation, that allow the guarantee to be | client restart, or lock revocation, that allow the guarantee to be | |||
voided. | voided. | |||
o Stateids may represent opens of files. | o Stateids may represent opens of files. | |||
Each stateid in this case represents the open for a given | Each stateid in this case represents the open for a given client | |||
clientid/open-owner/filehandle triple. Such stateids are subject | ID/open-owner/filehandle triple. Such stateids are subject to | |||
to change (with consequent bumping of the seqid) in response to | change (with consequent incrementing of the stateid's seqid) in | |||
OPENs that result in upgrade and OPEN_DOWNGRADE operations. | response to OPENs that result in upgrade and OPEN_DOWNGRADE | |||
operations. | ||||
o Stateids may represent sets of byte-range locks. | o Stateids may represent sets of byte-range locks. | |||
All locks held on a particular file by a particular owner and all | All locks held on a particular file by a particular owner and all | |||
gotten under the aegis of a particular open file are associated | gotten under the aegis of a particular open file are associated | |||
with a single stateid with the seqid being bumped as LOCK and | with a single stateid with the seqid being incremented whenever LOCK | |||
LOCKU operation affect that set of locks. | and LOCKU operations affect that set of locks. | |||
o Stateids may represent file delegations, which are recallable | o Stateids may represent file delegations, which are recallable | |||
guarantees by the server to the client, that other clients will | guarantees by the server to the client, that other clients will | |||
not reference, or will not modify a particular file, until the | not reference, or will not modify a particular file, until the | |||
delegation is returned. In NFSv4.1, file delegations may be | delegation is returned. In NFSv4.1, file delegations may be | |||
obtained on both regular and non-regular files. | obtained on both regular and non-regular files. | |||
A stateid represents a single delegation held by a client for a | A stateid represents a single delegation held by a client for a | |||
particular filehandle. | particular filehandle. | |||
skipping to change at page 150, line 25 | skipping to change at page 150, line 20 | |||
A stateid represents a single delegation held by a client for a | A stateid represents a single delegation held by a client for a | |||
particular directory filehandle. | particular directory filehandle. | |||
o Stateids may represent layouts, which are recallable guarantees by | o Stateids may represent layouts, which are recallable guarantees by | |||
the server to the client, that particular files may be accessed | the server to the client, that particular files may be accessed | |||
via an alternate data access protocol at specific locations. Such | via an alternate data access protocol at specific locations. Such | |||
access is limited to particular sets of byte ranges and may | access is limited to particular sets of byte ranges and may | |||
proceed until those byte ranges are reduced or the layout is | proceed until those byte ranges are reduced or the layout is | |||
returned. | returned. | |||
A stateid represents all layouts held by a particular client for a | A stateid represents the set of all layouts held by a particular | |||
particular filehandle with a given layout type. The seqid is | client for a particular filehandle with a given layout type. The | |||
updated as the contents of that set changes with LAYOUT | seqid is updated as the layouts of that set change with layout | |||
stateid changing operations such as LAYOUTGET and LAYOUTRETURN. | ||||
8.2.2. Stateid Structure | 8.2.2. Stateid Structure | |||
Stateids are divided into two fields, a 96-bit "other" field | Stateids are divided into two fields, a 96-bit "other" field | |||
identifying the specific set of locks and a 32-bit "seqid" sequence | identifying the specific set of locks and a 32-bit "seqid" sequence | |||
value. Except in the case of special stateids, to be discussed | value. Except in the case of special stateids, to be discussed | |||
below, a particular value of the "other" field denotes a set of locks | below, a particular value of the "other" field denotes a set of locks | |||
of the same type (for example byte-range locks, opens, delegations, | of the same type (for example byte-range locks, opens, delegations, | |||
or layouts), for a specific file or directory, and sharing the same | or layouts), for a specific file or directory, and sharing the same | |||
ownership characteristics. The seqid designates a specific instance | ownership characteristics. The seqid designates a specific instance | |||
skipping to change at page 156, line 8 | skipping to change at page 156, line 4 | |||
8.2.5. Stateid Use for I/O Operations | 8.2.5. Stateid Use for I/O Operations | |||
Clients performing I/O operations (and SETATTRs modifying the file | Clients performing I/O operations (and SETATTRs modifying the file | |||
size), need to select an appropriate stateid based on the locks | size), need to select an appropriate stateid based on the locks | |||
(including opens and delegations) held by the client and the various | (including opens and delegations) held by the client and the various | |||
types of state-owners issuing the I/O requests. | types of state-owners issuing the I/O requests. | |||
The following rules, applied in order of decreasing priority, govern | The following rules, applied in order of decreasing priority, govern | |||
the selection of the appropriate stateid. Note that the rules are | the selection of the appropriate stateid. Note that the rules are | |||
slightly different in the case of I/O to data servers when file | slightly different in the case of I/O to data servers when file | |||
layouts are being used. (See Section 13.9.1). | layouts are being used (see Section 13.9.1). | |||
o If the client holds a delegation for the file in question, the | o If the client holds a delegation for the file in question, the | |||
delegation stateid should be used. | delegation stateid SHOULD be used. | |||
o Otherwise, if the entity (e.g., a process) that corresponds to a | o Otherwise, if the entity (e.g., a process) that corresponds to a | |||
lock-owner and issues the I/O has a lock stateid for the associated open file, | lock-owner and issues the I/O has a lock stateid for the associated open file, | |||
then the lock stateid for that lock-owner and open file should be | then the lock stateid for that lock-owner and open file SHOULD be | |||
used. | used. | |||
o If there is no lock stateid, then the open stateid for the open | o If there is no lock stateid, then the open stateid for the open | |||
file in question is used. | file in question SHOULD be used. | |||
o Finally, if none of the above apply, then a special stateid should | o Finally, if none of the above apply, then a special stateid SHOULD | |||
be used. | be used. | |||
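The priority rules above can be sketched as a small selection function. This is an illustrative sketch only: FileState, select_io_stateid, and SPECIAL_STATEID are hypothetical names, stateids are modeled as plain strings, and the different rules for I/O to data servers under file layouts (Section 13.9.1) are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical placeholder; the actual special stateid values are defined
# elsewhere in the specification and are not reproduced here.
SPECIAL_STATEID = "special-stateid"


@dataclass
class FileState:
    """Hypothetical per-file client state; stateids are plain strings here."""
    delegation_stateid: Optional[str] = None
    open_stateid: Optional[str] = None
    # Lock stateids for this open file, keyed by lock-owner.
    lock_stateids: Dict[str, str] = field(default_factory=dict)


def select_io_stateid(fs: FileState, lock_owner: str) -> str:
    """Pick the stateid for READ, WRITE, or a size-changing SETATTR."""
    # 1. A delegation for the file takes precedence.
    if fs.delegation_stateid is not None:
        return fs.delegation_stateid
    # 2. Otherwise, the lock stateid for this lock-owner and open file.
    if lock_owner in fs.lock_stateids:
        return fs.lock_stateids[lock_owner]
    # 3. Otherwise, the open stateid.
    if fs.open_stateid is not None:
        return fs.open_stateid
    # 4. Finally, a special stateid.
    return SPECIAL_STATEID
```

For a file opened with open stateid "open-1" on which lock-owner "pid-42" holds lock stateid "lock-7", I/O by "pid-42" would use "lock-7", while I/O by any other lock-owner falls back to "open-1".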
8.3. Lease Renewal | 8.3. Lease Renewal | |||
The purpose of a lease is to provide allow the client to indicate to | The purpose of a lease is to allow the client to indicate to the | |||
the server, in a low-overhead way, that it is active, and thus that | server, in a low-overhead way, that it is active, and thus that the | |||
the server is to retain its locks. This arrangement allows the | server is to retain the client's locks. This arrangement allows the | |||
server to remove stale locking-related objects that are held by a | server to remove stale locking-related objects that are held by a | |||
client that has crashed or is otherwise unreachable, once the | client that has crashed or is otherwise unreachable, once the | |||
relevant lease expires. This allows other clients to obtain | relevant lease expires. This in turn allows other clients to obtain | |||
conflicting locks without being delayed indefinitely by inactive or | conflicting locks without being delayed indefinitely by inactive or | |||
unreachable clients. It is not a mechanism for cache consistency and | unreachable clients. It is not a mechanism for cache consistency and | |||
lease renewals may not be denied if the lease interval has not | lease renewals may not be denied if the lease interval has not | |||
expired. | expired. | |||
Since each session is associated with a specific client (identified | Since each session is associated with a specific client (identified | |||
by the client's client ID), any operation sent on that session is an | by the client's client ID), any operation sent on that session is an | |||
indication that the associated client is reachable. When a request | indication that the associated client is reachable. When a request | |||
is sent for a given session, successful execution of a SEQUENCE | is sent for a given session, successful execution of a SEQUENCE | |||
operation (or successful retrieval of the result of SEQUENCE from the | operation (or successful retrieval of the result of SEQUENCE from the | |||
reply cache) on an unexpired lease will result in the lease being | reply cache) on an unexpired lease will result in the lease being | |||
implicitly renewed, for the standard renewal period. | implicitly renewed, for the standard renewal period (equal to the | |||
lease_time attribute). | ||||
If the client ID's lease has not expired when the server receives a | If the client ID's lease has not expired when the server receives a | |||
SEQUENCE operation, then the server MUST renew the lease. If the | SEQUENCE operation, then the server MUST renew the lease. If the | |||
client ID's lease has expired when the server receives a SEQUENCE | client ID's lease has expired when the server receives a SEQUENCE | |||
operation, the server MAY renew the lease; this depends on whether | operation, the server MAY renew the lease; this depends on whether | |||
any state was revoked as a result of the client's failure to renew | any state was revoked as a result of the client's failure to renew | |||
the lease before expiration. | the lease before expiration. | |||
Absent other activity that would renew the lease, a COMPOUND | Absent other activity that would renew the lease, a COMPOUND | |||
consisting of a single SEQUENCE operation will suffice. The client | consisting of a single SEQUENCE operation will suffice. The client | |||
should also take communication-related delays into account and take | should also take communication-related delays into account and take | |||
steps to ensure that the renewal messages actually reach the server | steps to ensure that the renewal messages actually reach the server | |||
in good time. For example: | in good time. For example: | |||
o When trunking is in effect, the client should consider issuing | o When trunking is in effect, the client should consider issuing | |||
multiple requests on different connections, in order to ensure | multiple requests on different connections, in order to ensure | |||
that renewal occurs, even in the event of blockage in the path | that renewal occurs, even in the event of blockage in the path | |||
used for one of those connections. | used for one of those connections. | |||
o TCP retransmission delays might become so large as to approach or | o Transport retransmission delays might become so large as to | |||
exceed the length of the lease period. This may be particularly | approach or exceed the length of the lease period. This may be | |||
likely when the server is unresponsive due to a restart; see | particularly likely when the server is unresponsive due to a | |||
Section 8.4.2.1 | restart; see Section 8.4.2.1. If the client implementation is not | |||
careful, transport retransmission delays can result in the client | ||||
failing to detect a server restart before the grace period ends. | ||||
The scenario is that the client is using a transport with | ||||
exponential backoff, such that the maximum retransmission timeout | ||||
exceeds both the grace period and the lease_time attribute. A | ||||
network partition causes the client's connection's retransmission | ||||
interval to back off, and even after the partition heals, the next | ||||
transport-level retransmission is sent after the server has | ||||
restarted and its grace period ends. | ||||
The client MUST either recover from the ensuing NFS4ERR_NOGRACE | ||||
errors, or it MUST ensure that, despite transport-level | ||||
retransmission intervals that exceed the lease_time, a SEQUENCE | ||||
operation is sent that renews the lease before expiration. The | ||||
client can achieve this by associating a new connection with the | ||||
session and sending a SEQUENCE operation on it. However, if the | ||||
attempt to establish a new connection is delayed for the same | ||||
reason (exponential backoff of the connection establishment | ||||
packets), the client will have to abort the connection | ||||
establishment attempt before the lease expires, and try again. | ||||
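The retransmission scenario above can be made concrete with a little arithmetic. The sketch assumes a retransmission timer that doubles up to a cap, measures the lease from the time the last lease-renewing SEQUENCE was sent, and applies a safety margin chosen by the client. backoff_schedule, renewal_deadline, and should_open_new_connection are hypothetical helpers, not part of the protocol.

```python
from typing import List


def backoff_schedule(first_send: float, initial_rto: float,
                     max_rto: float, count: int) -> List[float]:
    """Send times of successive retransmissions under exponential backoff."""
    times, t, rto = [], first_send, initial_rto
    for _ in range(count):
        t += rto
        times.append(t)
        rto = min(rto * 2, max_rto)  # double the timeout, up to the cap
    return times


def renewal_deadline(last_renewal_sent: float, lease_time: float,
                     margin: float) -> float:
    """Latest time a renewing SEQUENCE should go out (hypothetical margin)."""
    return last_renewal_sent + lease_time - margin


def should_open_new_connection(last_renewal_sent: float, lease_time: float,
                               next_retransmit_at: float,
                               margin: float) -> bool:
    """True if waiting for the transport's next retransmission would miss the
    renewal deadline, so the client should associate a new connection with
    the session and send SEQUENCE on it instead."""
    return next_retransmit_at > renewal_deadline(last_renewal_sent,
                                                 lease_time, margin)
```

For example, backoff_schedule(0, 1, 120, 5) yields [1, 3, 7, 15, 31]. With a lease_time of 90 seconds and a 10-second margin the deadline is 80, so a retransmission not due until t = 100 calls for a new connection. The same deadline bounds how long a new connection attempt may run before it is aborted and retried.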
If the server renews the lease upon receiving a SEQUENCE operation, | If the server renews the lease upon receiving a SEQUENCE operation, | |||
the server MUST NOT allow the lease to expire while the rest of the | the server MUST NOT allow the lease to expire while the rest of the | |||
operations in the COMPOUND procedure's request are still executing. | operations in the COMPOUND procedure's request are still executing. | |||
Once the last operation has finished, and the response to COMPOUND | Once the last operation has finished, and the response to COMPOUND | |||
has been sent, the server MUST set the lease to expire no sooner than | has been sent, the server MUST set the lease to expire no sooner than | |||
the sum of current time and the value of the lease_time attribute. | the sum of current time and the value of the lease_time attribute. | |||
A client ID's lease can expire when it has been at least the lease | A client ID's lease can expire when it has been at least the lease | |||
interval (lease_time) since the last lease-renewing SEQUENCE | interval (lease_time) since the last lease-renewing SEQUENCE | |||
operation was sent on any of the client ID's sessions and there must | operation was sent on any of the client ID's sessions and there are | |||
be no active COMPOUND operations on any such session. | no active COMPOUND operations on any such sessions. | |||
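The server-side rules above (renew on SEQUENCE, never let the lease expire while a COMPOUND on one of the client ID's sessions is executing, and set expiry no sooner than reply time plus lease_time) can be modeled with a small class. ServerLease and its methods are hypothetical. The sketch takes the permitted choice of not renewing a lease that has already expired, and represents time as plain numbers.

```python
class ServerLease:
    """Hypothetical model of one client ID's lease on the server."""

    def __init__(self, lease_time: float, now: float):
        self.lease_time = lease_time
        self.expires_at = now + lease_time
        self.in_flight = 0  # lease-renewing COMPOUNDs still executing

    def expired(self, now: float) -> bool:
        # The lease cannot expire while a COMPOUND is still executing.
        return self.in_flight == 0 and now >= self.expires_at

    def begin_compound(self, now: float) -> bool:
        """SEQUENCE at the start of a COMPOUND; True if the lease was renewed."""
        if self.expired(now):
            # The server MAY renew an expired lease if no state was revoked;
            # this sketch simply declines.
            return False
        self.in_flight += 1
        self.expires_at = max(self.expires_at, now + self.lease_time)
        return True

    def end_compound(self, now: float) -> None:
        """Call after the reply is sent, only for a COMPOUND that renewed."""
        self.in_flight -= 1
        # Expire no sooner than the current time plus lease_time.
        self.expires_at = max(self.expires_at, now + self.lease_time)
```

Because expired() checks in_flight, a long-running COMPOUND that began before expiry keeps the lease alive until its reply is sent, after which a fresh lease_time starts.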
Because the SEQUENCE operation is the basic mechanism to renew a | Because the SEQUENCE operation is the basic mechanism to renew a | |||
lease, and because it must be done at least once for each lease | lease, and because it must be done at least once for each lease | |||
period, it is the natural mechanism whereby the server will inform | period, it is the natural mechanism whereby the server will inform | |||
the client of changes in the lease status that the client needs to be | the client of changes in the lease status that the client needs to be | |||
informed of. The client should inspect the status flags | informed of. The client should inspect the status flags | |||
(sr_status_flags) returned by SEQUENCE and take the appropriate | (sr_status_flags) returned by SEQUENCE and take the appropriate | |||
action. (See Section 18.46.3 for details). | action (see Section 18.46.3 for details). | |||
o The status bits SEQ4_STATUS_CB_PATH_DOWN and | o The status bits SEQ4_STATUS_CB_PATH_DOWN and | |||
SEQ4_STATUS_CB_PATH_DOWN_SESSION indicate problems with the | SEQ4_STATUS_CB_PATH_DOWN_SESSION indicate problems with the | |||
backchannel which the client may need to address in order to | backchannel which the client may need to address in order to | |||
receive callback requests. | receive callback requests. | |||
o The status bits SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING and | o The status bits SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING and | |||
SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED indicates actual problems with | SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED indicate problems with GSS | |||
GSS contexts for the backchannel which the client may have to | contexts for the backchannel which the client may have to address | |||
address to allow callback requests to be sent to it. | to allow callback requests to be sent to it. | |||
o The status bits SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED, | o The status bits SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED, | |||
SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED, | SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED, | |||
SEQ4_STATUS_ADMIN_STATE_REVOKED, and | SEQ4_STATUS_ADMIN_STATE_REVOKED, and | |||
SEQ4_STATUS_RECALLABLE_STATE_REVOKED notify the client of lock | SEQ4_STATUS_RECALLABLE_STATE_REVOKED notify the client of lock | |||
revocation events. When these bits are set, the client should use | revocation events. When these bits are set, the client should use | |||
TEST_STATEID to find what stateids have been revoked and use | TEST_STATEID to find what stateids have been revoked and use | |||
FREE_STATEID to acknowledge loss of the associated state. | FREE_STATEID to acknowledge loss of the associated state. | |||
o The status bit SEQ4_STATUS_LEASE_MOVE indicates that | o The status bit SEQ4_STATUS_LEASE_MOVE indicates that | |||
responsibility for lease renewal has been transferred to one or | responsibility for lease renewal has been transferred to one or | |||
more new servers. | more new servers. | |||
o The status bit SEQ4_STATUS_RESTART_RECLAIM_NEEDED indicates that | o The status bit SEQ4_STATUS_RESTART_RECLAIM_NEEDED indicates that | |||
due to a server restart, the client must reclaim locking state. | due to a server restart, the client must reclaim locking state. | |||
o The status bit SEQ4_STATUS_BACKCHANNEL_FAULT indicates server has | o The status bit SEQ4_STATUS_BACKCHANNEL_FAULT indicates the server | |||
encountered an unrecoverable fault with the backchannel (e.g. it | has encountered an unrecoverable fault with the backchannel (e.g. | |||
has lost track of a sequence id for a slot in the backchannel). | it has lost track of a sequence id for a slot in the backchannel). | |||
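A client's handling of sr_status_flags can be sketched as a dispatch over the bits listed above. The bit values here come from Python's enum.auto() purely for illustration and are not the protocol's wire values, which are defined with the SEQUENCE operation. SeqStatus, actions_for, and the action strings are hypothetical.

```python
from enum import Flag, auto
from typing import List


class SeqStatus(Flag):
    # Illustrative bit assignments only; NOT the protocol's wire values.
    CB_PATH_DOWN = auto()
    CB_PATH_DOWN_SESSION = auto()
    CB_GSS_CONTEXTS_EXPIRING = auto()
    CB_GSS_CONTEXTS_EXPIRED = auto()
    EXPIRED_ALL_STATE_REVOKED = auto()
    EXPIRED_SOME_STATE_REVOKED = auto()
    ADMIN_STATE_REVOKED = auto()
    RECALLABLE_STATE_REVOKED = auto()
    LEASE_MOVE = auto()
    RESTART_RECLAIM_NEEDED = auto()
    BACKCHANNEL_FAULT = auto()


CB_PATH = SeqStatus.CB_PATH_DOWN | SeqStatus.CB_PATH_DOWN_SESSION
CB_GSS = SeqStatus.CB_GSS_CONTEXTS_EXPIRING | SeqStatus.CB_GSS_CONTEXTS_EXPIRED
REVOKED = (SeqStatus.EXPIRED_ALL_STATE_REVOKED
           | SeqStatus.EXPIRED_SOME_STATE_REVOKED
           | SeqStatus.ADMIN_STATE_REVOKED
           | SeqStatus.RECALLABLE_STATE_REVOKED)


def actions_for(flags: SeqStatus) -> List[str]:
    """Map SEQUENCE status bits to hypothetical client actions."""
    actions = []
    if flags & CB_PATH:
        actions.append("repair backchannel")
    if flags & CB_GSS:
        actions.append("refresh backchannel GSS contexts")
    if flags & REVOKED:
        actions.append("TEST_STATEID, then FREE_STATEID for revoked stateids")
    if flags & SeqStatus.LEASE_MOVE:
        actions.append("locate new server(s) responsible for the lease")
    if flags & SeqStatus.RESTART_RECLAIM_NEEDED:
        actions.append("reclaim locking state")
    if flags & SeqStatus.BACKCHANNEL_FAULT:
        actions.append("handle unrecoverable backchannel fault")
    return actions
```

Grouping the four revocation bits into one mask reflects the text above: whichever of them is set, the client's response is the same TEST_STATEID/FREE_STATEID sequence.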
8.4. Crash Recovery | 8.4. Crash Recovery | |||
A critical requirement in crash recovery is that both the client and | A critical requirement in crash recovery is that both the client and | |||
the server know when the other has failed. Additionally, it is | the server know when the other has failed. Additionally, it is | |||
required that a client see a consistent view of data across server | required that a client see a consistent view of data across server | |||
restarts. All READ and WRITE operations that may have been queued | restarts. All READ and WRITE operations that may have been queued | |||
within the client or network buffers must wait until the client has | within the client or network buffers must wait until the client has | |||
successfully recovered the locks protecting the READ and WRITE | successfully recovered the locks protecting the READ and WRITE | |||
operations. Any that reach the server before the server can safely | operations. Any that reach the server before the server can safely | |||
determine that the client has recovered enough locking state to be | determine that the client has recovered enough locking state to be | |||
sure that such operations can be safely processed must be rejected. | sure that such operations can be safely processed must be rejected. | |||
This will happen because either: | This will happen because either: | |||
o The state presented is no longer valid since it is associated with | o The state presented is no longer valid since it is associated with | |||
a now invalid clientid. In this case the client will receive | a now invalid client ID. In this case the client will receive | |||
either an NFS4ERR_BADSESSION or NFS4ERR_DEADSESSION error, and any | either an NFS4ERR_BADSESSION or NFS4ERR_DEADSESSION error, and any | |||
attempt to attach a new session to the existing clientid will | attempt to attach a new session to the existing client ID will | |||
encounter an NFS4ERR_STALE_CLIENTID error. | result in an NFS4ERR_STALE_CLIENTID error. | |||
o Subsequent recovery of locks may make execution of the operation | o Subsequent recovery of locks may make execution of the operation | |||
inappropriate (NFS4ERR_GRACE). | inappropriate (NFS4ERR_GRACE). | |||
8.4.1. Client Failure and Recovery | 8.4.1. Client Failure and Recovery | |||
In the event that a client fails, the server may release the client's | In the event that a client fails, the server may release the client's | |||
locks when the associated lease has expired. Conflicting locks from | locks when the associated lease has expired. Conflicting locks from | |||
another client may only be granted after this lease expiration. As | another client may only be granted after this lease expiration. As | |||
discussed in Section 8.3, when a client has not failed and re- | discussed in Section 8.3, when a client has not failed and re- | |||
establishes its lease before expiration occurs, requests for | establishes its lease before expiration occurs, requests for | |||
conflicting locks will not be granted. | conflicting locks will not be granted. | |||
To minimize client delay upon restart, lock requests are associated | To minimize client delay upon restart, lock requests are associated | |||
with an instance of the client by a client-supplied verifier. This | with an instance of the client by a client-supplied verifier. This | |||
verifier is part of the client_owner4 sent in the initial EXCHANGE_ID | verifier is part of the client_owner4 sent in the initial EXCHANGE_ID | |||
call made by the client. The server returns a client ID as a result | call made by the client. The server returns a client ID as a result | |||
of the EXCHANGE_ID operation. The client then confirms the use of | of the EXCHANGE_ID operation. The client then confirms the use of | |||
the client ID by establishing a session associated with that client | the client ID by establishing a session associated with that client | |||
ID. See Section 18.36.3 for a description of how this is done. All | ID (see Section 18.36.3 for a description of how this is done). All | |||
locks, including opens, record locks, delegations, and layouts | locks, including opens, record locks, delegations, and layouts | |||
obtained by sessions using that client ID are associated with that | obtained by sessions using that client ID are associated with that | |||
client ID. | client ID. | |||
Since the verifier will be changed by the client upon each | Since the verifier will be changed by the client upon each | |||
initialization, the server can compare a new verifier to the verifier | initialization, the server can compare a new verifier to the verifier | |||
associated with currently held locks and determine that they do not | associated with currently held locks and determine that they do not | |||
match. This signifies the client's new instantiation and subsequent | match. This signifies the client's new instantiation and subsequent | |||
loss of locking state. As a result, the server is free to release | loss (upon confirmation of the new client ID) of locking state. As a | |||
all locks held which are associated with the old client ID which was | result, the server is free to release all locks held which are | |||
derived from the old verifier. At this point conflicting locks from | associated with the old client ID which was derived from the old | |||
other clients, kept waiting while the lease had not yet expired, can | verifier. At this point conflicting locks from other clients, kept | |||
be granted. In addition, all stateids associated with the old | waiting while the lease had not yet expired, can be granted. In | |||
clientid can also be freed, as they are no longer reference-able. | addition, all stateids associated with the old client ID can also be | |||
freed, as they are no longer reference-able. | ||||
Note that the verifier must have the same uniqueness properties as | Note that the verifier must have the same uniqueness properties as | |||
the verifier for the COMMIT operation. | the verifier for the COMMIT operation. | |||
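The verifier comparison described in this section can be sketched as a server-side table keyed by the client's owner identity. This is a simplified model: ClientTable, exchange_id, and confirm are hypothetical names, confirm stands in for the first successful CREATE_SESSION on the new client ID, and the many other cases EXCHANGE_ID must distinguish are omitted.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ClientRecord:
    verifier: bytes
    client_id: int


class ClientTable:
    """Hypothetical server table mapping client owners to client IDs."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.current: Dict[bytes, ClientRecord] = {}  # instance holding locks
        self.pending: Dict[bytes, ClientRecord] = {}  # new, unconfirmed instance

    def exchange_id(self, owner: bytes, verifier: bytes) -> int:
        for table in (self.current, self.pending):
            rec = table.get(owner)
            if rec is not None and rec.verifier == verifier:
                return rec.client_id  # same client instance, same client ID
        rec = ClientRecord(verifier, next(self._ids))
        if owner in self.current:
            # Verifier changed: the client restarted. The old locks are
            # kept until the new client ID is confirmed.
            self.pending[owner] = rec
        else:
            self.current[owner] = rec
        return rec.client_id

    def confirm(self, owner: bytes, client_id: int) -> Optional[int]:
        """Confirm a new client ID; return the old client ID whose locks and
        stateids may now be released, or None."""
        rec = self.pending.get(owner)
        if rec is None or rec.client_id != client_id:
            return None
        old = self.current[owner]
        self.current[owner] = self.pending.pop(owner)
        return old.client_id
```

Holding the new instance in a separate pending table mirrors the text's point that the old locking state is treated as lost only once the new client ID is confirmed.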
8.4.2. Server Failure and Recovery | 8.4.2. Server Failure and Recovery | |||
If the server loses locking state (usually as a result of a restart), | If the server loses locking state (usually as a result of a restart), | |||
it must allow clients time to discover this fact and re-establish the | it must allow clients time to discover this fact and re-establish the | |||
lost locking state. The client must be able to re-establish the | lost locking state. The client must be able to re-establish the | |||
locking state without having the server deny valid requests because | locking state without having the server deny valid requests because | |||
skipping to change at page 159, line 50 | skipping to change at page 160, line 22 | |||
A client can determine that loss of locking state has occurred via | A client can determine that loss of locking state has occurred via | |||
several methods. | several methods. | |||
1. When a SEQUENCE (most common) or other operation returns | 1. When a SEQUENCE (most common) or other operation returns | |||
NFS4ERR_BADSESSION, this may mean the session has been destroyed, | NFS4ERR_BADSESSION, this may mean the session has been destroyed, | |||
but the client ID is still valid. The client sends a | but the client ID is still valid. The client sends a | |||
CREATE_SESSION request with the client ID to re-establish the | CREATE_SESSION request with the client ID to re-establish the | |||
session. If CREATE_SESSION fails with NFS4ERR_STALE_CLIENTID, | session. If CREATE_SESSION fails with NFS4ERR_STALE_CLIENTID, | |||
the client must establish a new client ID (see Section 8.1) and | the client must establish a new client ID (see Section 8.1) and | |||
re-establish its lock state after the CREATE_SESSION, with the | re-establish its lock state with the new client ID, after the | |||
new client ID CREATE_SESSION succeeds, (Section 8.4.2.1). | CREATE_SESSION operation succeeds (see Section 8.4.2.1). | |||
2. When a SEQUENCE (most common) or other operation on a persistent | 2. When a SEQUENCE (most common) or other operation on a persistent | |||
session returns NFS4ERR_DEADSESSION, this indicates that a | session returns NFS4ERR_DEADSESSION, this indicates that a | |||
session is no longer usable for new operations (i.e., those not | session is no longer usable for new operations (i.e., those not | |||
satisfied from the reply cache). Once all pending operations are | satisfied from the reply cache). Once all pending operations are | |||
determined to be either performed before the retry or not | determined to be either performed before the retry or not | |||
performed, the client sends a CREATE_SESSION request with the | performed, the client sends a CREATE_SESSION request with the | |||
client ID to re-establish the session. If CREATE_SESSION fails | client ID to re-establish the session. If CREATE_SESSION fails | |||
with NFS4ERR_STALE_CLIENTID, the client must establish a new | with NFS4ERR_STALE_CLIENTID, the client must establish a new | |||
client ID (see Section 8.1) and re-establish its lock state after | client ID (see Section 8.1) and re-establish its lock state after | |||
skipping to change at page 160, line 52 | skipping to change at page 161, line 24 | |||
reliably determine (through state persistently maintained across | reliably determine (through state persistently maintained across | |||
restart instances), that granting any such lock cannot possibly | restart instances), that granting any such lock cannot possibly | |||
conflict with a subsequent reclaim. When a request is made to obtain | conflict with a subsequent reclaim. When a request is made to obtain | |||
a new lock (i.e. not a reclaim-type request) during the grace period | a new lock (i.e. not a reclaim-type request) during the grace period | |||
and such a determination cannot be made, the server must return the | and such a determination cannot be made, the server must return the | |||
error NFS4ERR_GRACE. | error NFS4ERR_GRACE. | |||
Once a session is established using the new client ID, the client | Once a session is established using the new client ID, the client | |||
will use reclaim-type locking requests (e.g. LOCK requests with | will use reclaim-type locking requests (e.g. LOCK requests with | |||
reclaim set to TRUE and OPEN operations with a claim type of | reclaim set to TRUE and OPEN operations with a claim type of | |||
CLAIM_PREVIOUS. See Section 9.11) to re-establish its locking state. | CLAIM_PREVIOUS; see Section 9.11) to re-establish its locking state. | |||
Once this is done, or if there is no such locking state to reclaim, | Once this is done, or if there is no such locking state to reclaim, | |||
the client sends a global RECLAIM_COMPLETE operation, i.e. one with | the client sends a global RECLAIM_COMPLETE operation, i.e. one with | |||
the rca_one_fs argument set to FALSE, to indicate that it has | the rca_one_fs argument set to FALSE, to indicate that it has | |||
reclaimed all of the locking state that it will reclaim. Once a | reclaimed all of the locking state that it will reclaim. Once a | |||
client sends such a RECLAIM_COMPLETE operation, it may attempt non- | client sends such a RECLAIM_COMPLETE operation, it may attempt non- | |||
reclaim locking operations, although it may get NFS4ERR_GRACE errors | reclaim locking operations, although it may get NFS4ERR_GRACE errors | |||
for those operations until the period of special handling is over. See | for those operations until the period of special handling is over. See | |||
Section 11.7.7 for a discussion of the analogous handling of lock | Section 11.7.7 for a discussion of the analogous handling of lock | |||
reclamation in the case of file systems transitioning from server to | reclamation in the case of file systems transitioning from server to | |||
server. | server. | |||
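The client-side recovery sequence described in this section can be summarized as a planning function that lists the operations a client would send after a session error. All names here are hypothetical: plan_recovery, the string encoding of operations, the RESOLVE_PENDING marker, and the stale_client_id input are illustrative. In practice the client learns that its client ID is stale only when CREATE_SESSION returns NFS4ERR_STALE_CLIENTID.

```python
from typing import List, Tuple


def plan_recovery(error: str, stale_client_id: bool,
                  held: List[Tuple[str, str]]) -> List[str]:
    """Ordered operations a client would send to recover (hypothetical).

    error: NFS4ERR_BADSESSION or NFS4ERR_DEADSESSION returned by SEQUENCE.
    stale_client_id: whether CREATE_SESSION with the old client ID fails
        with NFS4ERR_STALE_CLIENTID.
    held: (kind, file) pairs, kind "open" or "lock", held before the restart.
    """
    if error not in ("NFS4ERR_BADSESSION", "NFS4ERR_DEADSESSION"):
        raise ValueError("not a session error: " + error)
    plan = []
    if error == "NFS4ERR_DEADSESSION":
        # Hypothetical marker: first determine whether each pending
        # request was performed or not.
        plan.append("RESOLVE_PENDING")
    plan.append("CREATE_SESSION(old client ID)")
    if not stale_client_id:
        return plan  # session re-established; locking state intact
    plan += ["EXCHANGE_ID", "CREATE_SESSION(new client ID)"]
    # Reclaim opens before the byte-range locks that depend on them.
    plan += [f"OPEN {f} claim=CLAIM_PREVIOUS" for k, f in held if k == "open"]
    plan += [f"LOCK {f} reclaim=TRUE" for k, f in held if k == "lock"]
    # Global RECLAIM_COMPLETE: all state the client will reclaim is done.
    plan.append("RECLAIM_COMPLETE(rca_one_fs=FALSE)")
    return plan
```

Only after the final RECLAIM_COMPLETE may the client attempt non-reclaim locking operations, and it must still be prepared for NFS4ERR_GRACE until the server's grace period ends.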
skipping to change at page 161, line 26 | skipping to change at page 161, line 46 | |||
During the grace period, the server must reject READ and WRITE | During the grace period, the server must reject READ and WRITE | |||
operations and non-reclaim locking requests (i.e. other LOCK and OPEN | operations and non-reclaim locking requests (i.e. other LOCK and OPEN | |||
operations) with an error of NFS4ERR_GRACE, unless it is able to | operations) with an error of NFS4ERR_GRACE, unless it is able to | |||
guarantee that these may be done safely, as described below. | guarantee that these may be done safely, as described below. | |||
The grace period may last until all clients which are known to | The grace period may last until all clients which are known to | |||
possibly have had locks have done a global RECLAIM_COMPLETE | possibly have had locks have done a global RECLAIM_COMPLETE | |||
operation, indicating that they have finished reclaiming the locks | operation, indicating that they have finished reclaiming the locks | |||
they held before the server restart. This means that a client which | they held before the server restart. This means that a client which | |||
has done a RECLAIM_COMPLETE must be prepared to receive an | has done a RECLAIM_COMPLETE must be prepared to receive an | |||
NFS4ERR_GRACE when attempting to acquire new locks. The server is | NFS4ERR_GRACE when attempting to acquire new locks. In order for the | |||
assumed to maintain in stable storage a list of clients which may | server to know that all clients with possible prior lock state have | |||
have such locks. The server may also terminate the grace period | done a RECLAIM_COMPLETE, the server must maintain in stable storage a | |||
before all clients have done a global RECLAIM_COMPLETE. The server | list of clients which may have such locks. The server may also | |||
SHOULD NOT terminate the grace period before a time equal to the | terminate the grace period before all clients have done a global | |||
lease period in order to give clients an opportunity to find out | RECLAIM_COMPLETE. The server SHOULD NOT terminate the grace period | |||
about the server restart, as a result of issuing requests on | before a time equal to the lease period in order to give clients an | |||
associated sessions with a frequency governed by the lease time. | opportunity to find out about the server restart, as a result of | |||
Note that when a client does not issue such requests (or they are | issuing requests on associated sessions with a frequency governed by | |||
issued by the client but not received by the server), it is possible | the lease time. Note that when a client does not issue such requests | |||
for the grace period to expire before the client finds out that the | (or they are issued by the client but not received by the server), it | |||
server restart has occurred. | is possible for the grace period to expire before the client finds | |||
out that the server restart has occurred. | ||||
Some additional time in order to allow a client to establish a new | Some additional time in order to allow a client to establish a new | |||
client ID and session and to effect lock reclaims may be added to the | client ID and session and to effect lock reclaims may be added to the | |||
lease time. Note that analogous rules apply to file system-specific | lease time. Note that analogous rules apply to file system-specific | |||
grace periods discussed in Section 11.7.7. | grace periods discussed in Section 11.7.7. | |||
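The server-side grace-period rules above can be sketched as follows. GracePeriod and its fields are hypothetical: expected stands for the stable-storage list of clients that may hold locks, extra for the optional additional time mentioned above, and provably_no_conflict for the server's ability to reliably determine that a non-reclaim request cannot conflict with a reclaim.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class GracePeriod:
    """Hypothetical model of a server's post-restart grace period."""
    start: float
    prev_lease_time: float  # lease period of the previous server instance
    extra: float = 0.0      # optional additional time for reclaims
    expected: Set[str] = field(default_factory=set)   # stable-storage list
    completed: Set[str] = field(default_factory=set)  # global RECLAIM_COMPLETE seen

    def reclaim_complete(self, client: str) -> None:
        self.completed.add(client)

    def in_grace(self, now: float) -> bool:
        elapsed = now - self.start
        if elapsed < self.prev_lease_time:
            return True   # do not end before one full lease period
        if self.expected <= self.completed:
            return False  # every client that may hold locks has finished
        return elapsed < self.prev_lease_time + self.extra

    def check_non_reclaim(self, now: float,
                          provably_no_conflict: bool = False) -> str:
        """Result for a non-reclaim LOCK/OPEN, READ, or WRITE request."""
        if self.in_grace(now) and not provably_no_conflict:
            return "NFS4ERR_GRACE"
        return "NFS4_OK"
```

In this sketch the grace period always lasts at least one lease period, may end as soon as every listed client has sent a global RECLAIM_COMPLETE, and otherwise ends after the lease period plus the extra allowance even if some clients never reclaim.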
If the server can reliably determine that granting a non-reclaim | If the server can reliably determine that granting a non-reclaim | |||
request will not conflict with reclamation of locks by other clients, | request will not conflict with reclamation of locks by other clients, | |||
the NFS4ERR_GRACE error does not have to be returned even within the | the NFS4ERR_GRACE error does not have to be returned even within the | |||
grace period, although NFS4ERR_GRACE must always be returned to | grace period, although NFS4ERR_GRACE must always be returned to | |||
skipping to change at page 163, line 17 | skipping to change at page 163, line 38 | |||
established, refetch the lease_time attribute and use it as the basis | established, refetch the lease_time attribute and use it as the basis | |||
for lease renewal for the lease associated with that server. | for lease renewal for the lease associated with that server. | |||
However, the server must establish, for this restart event, a grace | However, the server must establish, for this restart event, a grace | |||
period at least as long as the lease period for the previous server | period at least as long as the lease period for the previous server | |||
instantiation. This allows the client state obtained during the | instantiation. This allows the client state obtained during the | |||
previous server instance to be reliably re-established. | previous server instance to be reliably re-established. | |||
8.4.3. Network Partitions and Recovery | 8.4.3. Network Partitions and Recovery | |||
If the duration of a network partition is greater than the lease | If the duration of a network partition is greater than the lease | |||
period provided by the server, the server will have not received a | period provided by the server, the server will not have received a | |||
lease renewal from the client. If this occurs, the server may free | lease renewal from the client. If this occurs, the server may free | |||
all locks held for the client, or it may allow the lock state to | all locks held for the client, or it may allow the lock state to | |||
remain for a considerable period, subject to the constraint that if a | remain for a considerable period, subject to the constraint that if a | |||
request for a conflicting lock is made, locks associated with an | request for a conflicting lock is made, locks associated with an | |||
expired lease do not prevent such a conflicting lock from being | expired lease do not prevent such a conflicting lock from being | |||
granted but MUST be revoked as necessary so as not to interfere with | granted but MUST be revoked as necessary so as not to interfere with | |||
such conflicting requests. | such conflicting requests. | |||
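The conflict rule above can be illustrated with a small model of a server that keeps state past lease expiry and revokes it only when a conflicting lock is requested. This is a simplified, hypothetical sketch: byte-range locks are keyed by client, each client has one lease, and times are in seconds. LockTable, Lock, and try_lock are illustrative names, not part of the protocol.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Lock:
    client_id: int
    start: int       # first byte covered
    end: int         # one past the last byte covered
    exclusive: bool  # write lock if True, read lock otherwise


@dataclass
class LockTable:
    lease_time: float
    last_renewal: Dict[int, float] = field(default_factory=dict)  # client ID -> last renewal
    locks: List[Lock] = field(default_factory=list)
    revoked: List[Lock] = field(default_factory=list)

    def _expired(self, client_id: int, now: float) -> bool:
        return now - self.last_renewal[client_id] > self.lease_time

    @staticmethod
    def _conflicts(held: Lock, req: Lock) -> bool:
        overlaps = held.start < req.end and req.start < held.end
        # Simplification: locks held by the same client never conflict here.
        return (overlaps and held.client_id != req.client_id
                and (held.exclusive or req.exclusive))

    def try_lock(self, req: Lock, now: float) -> bool:
        self.last_renewal[req.client_id] = now  # the request itself renews the lease
        conflicting = [h for h in self.locks if self._conflicts(h, req)]
        if any(not self._expired(h.client_id, now) for h in conflicting):
            return False  # a holder with a live lease still blocks the request
        for h in conflicting:  # every remaining conflict has an expired lease
            self.locks.remove(h)
            self.revoked.append(h)
        self.locks.append(req)
        return True
```

With lease_time=90, a write lock taken by client 1 at t=0 blocks client 2 at t=50, but is revoked in favor of client 2's request at t=200. The sketch checks for any conflicting holder with a live lease before revoking anything, so no expired lock is revoked on behalf of a request that would be denied anyway.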
If the server chooses to delay freeing of lock state until there is a | If the server chooses to delay freeing of lock state until there is a | |||
conflict, it may either free all of the client's locks once there is a | conflict, it may either free all of the client's locks once there is a | |||
skipping to change at page 163, line 42 | skipping to change at page 164, line 15 | |||
When the server chooses to free all of a client's lock state, either | When the server chooses to free all of a client's lock state, either | |||
immediately upon lease expiration, or as a result of the first attempt | immediately upon lease expiration, or as a result of the first attempt | |||
to obtain a conflicting lock, the server may report the loss of | to obtain a conflicting lock, the server may report the loss of | |||
lock state in a number of ways. | lock state in a number of ways. | |||
The server may choose to invalidate the session and the associated | The server may choose to invalidate the session and the associated | |||
client ID. In this case, when the client is able to communicate with | client ID. In this case, when the client is able to communicate with | |||
the server, it will receive an NFS4ERR_BADSESSION. Upon attempting | the server, it will receive an NFS4ERR_BADSESSION. Upon attempting | |||
to create a new session, it would get an NFS4ERR_STALE_CLIENTID. | to create a new session, it would get an NFS4ERR_STALE_CLIENTID. | |||
Upon creating the new clientid and new session it would attempt to | Upon creating the new client ID and new session it would attempt to | |||
reclaim locks but would not be allowed to do so by the server. | reclaim locks but would not be allowed to do so by the server. | |||
Another possibility is for the server to maintain the session and | Another possibility is for the server to maintain the session and | |||
clientid but for all stateids held by the client to become invalid or | client ID but for all stateids held by the client to become invalid | |||
stale. Once the client is able to reach the server after such a | or stale. Once the client is able to reach the server after such a | |||
network partition, the status returned by the SEQUENCE operation will | network partition, the status returned by the SEQUENCE operation will | |||
indicate a loss of locking state. (The flag | indicate a loss of locking state. (The flag | |||
SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED will be set in | SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED will be set in | |||
sr_status_flags.) In addition, all I/O submitted by the client with | sr_status_flags.) In addition, all I/O submitted by the client with | |||
the now invalid stateids will fail with the server returning the | the now invalid stateids will fail with the server returning the | |||
error NFS4ERR_EXPIRED. Once the client learns of the loss of locking | error NFS4ERR_EXPIRED. Once the client learns of the loss of locking | |||
state, it will suitably notify the applications that held the | state, it will suitably notify the applications that held the | |||
invalidated locks. The client should then take action to free | invalidated locks. The client should then take action to free | |||
invalidated stateids, either by establishing a new client ID using a | invalidated stateids, either by establishing a new client ID using a | |||
new verifier or by doing a FREE_STATEID operation to release each of | new verifier or by doing a FREE_STATEID operation to release each of | |||
the invalidated stateids. | the invalidated stateids. | |||
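When all state has expired, client-side recovery might look like the following sketch. The flag constant's value is an assumption, and notify_app and free_stateid are hypothetical hooks: the latter stands for one FREE_STATEID operation. The alternative of establishing a new client ID with a new verifier is not shown.

```python
from typing import Callable, Sequence

# Flag value assumed for this sketch; take the authoritative constant
# from the protocol's XDR description.
SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED = 0x00000008


def all_state_revoked(sr_status_flags: int) -> bool:
    """True if SEQUENCE reported that all of the client's state expired."""
    return bool(sr_status_flags & SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED)


def recover_all_revoked(
    stateids: Sequence[bytes],
    notify_app: Callable[[bytes], None],
    free_stateid: Callable[[bytes], None],
) -> int:
    """Notify lock holders, then release every invalidated stateid.

    Both callables are hypothetical client hooks. Returns the number of
    stateids released.
    """
    for sid in stateids:
        notify_app(sid)    # the application's lock is gone
    for sid in stateids:
        free_stateid(sid)  # acknowledge the revocation to the server
    return len(stateids)
```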
When the server adopts a finer-grained approach to revocation of | When the server adopts a finer-grained approach to revocation of | |||
locks when leases have expired, only a subset of stateids will | locks when leases have expired, only a subset of stateids will | |||
normally become invalid during a network partition. When the client | normally become invalid during a network partition. When the client | |||
is able to communicate with the server after such a network | is able to communicate with the server after such a network | |||
partition, the status returned by the SEQUENCE operation will | partition, the status returned by the SEQUENCE operation will | |||
indicate a partial loss of locking state. In addition, operations, | indicate a partial loss of locking state | |||
including I/O submitted by the client with the now invalid stateids | (SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED). In addition, operations, | |||
including I/O submitted by the client, with the now invalid stateids | ||||
will fail with the server returning the error NFS4ERR_EXPIRED. Once | will fail with the server returning the error NFS4ERR_EXPIRED. Once | |||
the client learns of the loss of locking state, it will use the | the client learns of the loss of locking state, it will use the | |||
TEST_STATEID operation on all of its stateids to determine which | TEST_STATEID operation on all of its stateids to determine which | |||
locks have been lost and then suitably notify the applications that | locks have been lost and then suitably notify the applications that | |||
held the invalidated locks. The client can then release the | held the invalidated locks. The client can then release the | |||
invalidated locking state and acknowledge the revocation of the | invalidated locking state and acknowledge the revocation of the | |||
associated locks by doing a FREE_STATEID operation on each of the | associated locks by doing a FREE_STATEID operation on each of the | |||
invalidated stateids. | invalidated stateids. | |||
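The finer-grained case adds a TEST_STATEID step before the release. In this hypothetical sketch, test_stateid stands in for a single TEST_STATEID operation that returns one status per stateid, in order. Status codes are represented by name, and the flag value is assumed.

```python
from typing import Callable, List, Sequence

# Flag value assumed for this sketch; consult the protocol's XDR.
SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED = 0x00000010


def some_state_revoked(sr_status_flags: int) -> bool:
    return bool(sr_status_flags & SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED)


def recover_some_revoked(
    stateids: Sequence[bytes],
    test_stateid: Callable[[List[bytes]], List[str]],
    notify_app: Callable[[bytes], None],
    free_stateid: Callable[[bytes], None],
) -> List[bytes]:
    """Find which stateids were revoked, notify their holders, then free them.

    All three callables are hypothetical client hooks.
    """
    statuses = test_stateid(list(stateids))
    lost = [sid for sid, status in zip(stateids, statuses) if status != "NFS4_OK"]
    for sid in lost:
        notify_app(sid)
    for sid in lost:
        free_stateid(sid)  # one FREE_STATEID per invalidated stateid
    return lost
```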
When a network partition is combined with a server restart, there are | When a network partition is combined with a server restart, there are | |||
skipping to change at page 167, line 12 | skipping to change at page 167, line 35 | |||
Regardless of the level and approach to record keeping, the server | Regardless of the level and approach to record keeping, the server | |||
MUST implement one of the following strategies (which apply to | MUST implement one of the following strategies (which apply to | |||
reclaims of share reservations, record locks, and delegations): | reclaims of share reservations, record locks, and delegations): | |||
1. Reject all reclaims with NFS4ERR_NO_GRACE. This is extremely | 1. Reject all reclaims with NFS4ERR_NO_GRACE. This is extremely | |||
unforgiving, but necessary if the server does not record lock | unforgiving, but necessary if the server does not record lock | |||
state in stable storage. | state in stable storage. | |||
2. Record sufficient state in stable storage such that all known | 2. Record sufficient state in stable storage such that all known | |||
edge conditions involving server restart, including the two noted | edge conditions involving server restart, including the two noted | |||
in this section, are detected. Erroneously recognizing a edge | in this section, are detected. It is acceptable to erroneously | |||
condition and not allowing, when, with sufficient knowledge it | recognize an edge condition and not allow a reclaim, when, with | |||
would be grantable, acceptable. Note that at this time, it is | sufficient knowledge, it would be allowed. Note that it is not known | |||
not known if there are other edge conditions. | if there are other edge conditions. | |||
In the event that, after a server restart, the server determines | In the event that, after a server restart, the server determines | |||
that there is unrecoverable damage or corruption to the | that there is unrecoverable damage or corruption to the | |||
information in stable storage, then for all clients and/or locks | information in stable storage, then for all clients and/or locks | |||
which may be affected, the server MUST return NFS4ERR_NO_GRACE. | which may be affected, the server MUST return NFS4ERR_NO_GRACE. | |||
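A server-side check that combines the two strategies with the damage rule might look like the sketch below. StableRecord and check_reclaim are hypothetical. Returning NFS4ERR_NO_GRACE for every refusal is a simplification. Treating any client absent from the record as a possible edge condition is one conservative reading of strategy 2, where a false refusal is acceptable.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class StableRecord:
    """Hypothetical stable-storage contents left by the previous instance."""
    clients_with_state: Set[str] = field(default_factory=set)
    corrupted: bool = False


def check_reclaim(client_owner: str, record: Optional[StableRecord]) -> str:
    """Decide whether a reclaim may proceed; returns a status name."""
    if record is None:
        # Strategy 1: no lock state is kept in stable storage.
        return "NFS4ERR_NO_GRACE"
    if record.corrupted:
        # Unrecoverable damage: every client may be affected.
        return "NFS4ERR_NO_GRACE"
    if client_owner not in record.clients_with_state:
        # Strategy 2: cannot show this client held state before the
        # restart, so refuse conservatively.
        return "NFS4ERR_NO_GRACE"
    return "NFS4_OK"
```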
A mandate for the client's handling of the NFS4ERR_NO_GRACE error is | A mandate for the client's handling of the NFS4ERR_NO_GRACE error is | |||
outside the scope of this specification, since the strategies for | outside the scope of this specification, since the strategies for | |||
such handling are very dependent on the client's operating | such handling are very dependent on the client's operating | |||
environment. However, one potential approach is described below. | environment. However, one potential approach is described below. | |||
skipping to change at page 169, line 4 | skipping to change at page 169, line 26 | |||
When determining the time period for the server lease, the usual | When determining the time period for the server lease, the usual | |||
lease tradeoffs apply. Short leases are good for fast server | lease tradeoffs apply. Short leases are good for fast server | |||
recovery at a cost of increased operations to effect lease renewal | recovery at a cost of increased operations to effect lease renewal | |||
(when there are no other operations during the period to effect lease | (when there are no other operations during the period to effect lease | |||
renewal as a side-effect). Long leases are certainly kinder and | renewal as a side-effect). Long leases are certainly kinder and | |||
gentler to servers trying to handle very large numbers of clients. | gentler to servers trying to handle very large numbers of clients. | |||
The number of extra requests to effect lock renewal drops in inverse | The number of extra requests to effect lock renewal drops in inverse | |||
proportion to the lease time. The disadvantages of long leases | proportion to the lease time. The disadvantages of long leases | |||
include the possibility of slower recovery after certain failures. | include the possibility of slower recovery after certain failures. | |||
After server failure, a longer grace period may be required when some | After server failure, a longer grace period may be required when some | |||
clients do not promptly reclaim their locks and do a global | clients do not promptly reclaim their locks and do a global | |||
RECLAIM_COMPLETE. In the event of client failure, there can be a | RECLAIM_COMPLETE. In the event of client failure, there can be a | |||
longer period for leases to expire, thus forcing conflicting requests | longer period for leases to expire, thus forcing conflicting requests | |||
to wait. | to wait. | |||
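A little arithmetic makes the inverse proportion concrete. This sketch assumes an otherwise idle client that renews once per lease period, less an optional safety margin. renewals_per_hour is a hypothetical helper.

```python
def renewals_per_hour(lease_time_s: float, safety_margin_s: float = 0.0) -> float:
    """Renewals an otherwise idle client must send per hour."""
    interval = lease_time_s - safety_margin_s
    if interval <= 0:
        raise ValueError("safety margin must be shorter than the lease")
    return 3600.0 / interval


print(renewals_per_hour(30))  # 120.0
print(renewals_per_hour(90))  # 40.0
```

A 30-second lease costs 120 renewals per hour. Tripling it to 90 seconds cuts that to 40.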
Long leases are usable if the server is able to store lease state in | Long leases are practical if the server is able to store lease state | |||
non-volatile memory. Upon recovery, the server can reconstruct the | in non-volatile memory. Upon recovery, the server can reconstruct | |||
lease state from its non-volatile memory and continue operation with | the lease state from its non-volatile memory and continue operation | |||
its clients and therefore long leases would not be an issue. | with its clients and therefore long leases would not be an issue. | |||
8.7. Clocks, Propagation Delay, and Calculating Lease Expiration | 8.7. Clocks, Propagation Delay, and Calculating Lease Expiration | |||
To avoid the need for synchronized clocks, lease times are granted by | To avoid the need for synchronized clocks, lease times are granted by | |||
the server as a time delta. However, there is a requirement that the | the server as a time delta. However, there is a requirement that the | |||
client and server clocks do not drift excessively over the duration | client and server clocks do not drift excessively over the duration | |||
of the lease. There is also the issue of propagation delay across | of the lease. There is also the issue of propagation delay across | |||
the network which could easily be several hundred milliseconds as | the network which could easily be several hundred milliseconds as | |||
well as the possibility that requests will be lost and need to be | well as the possibility that requests will be lost and need to be | |||
retransmitted. | retransmitted. | |||
To take propagation delay into account, the client should subtract it | To take propagation delay into account, the client should subtract it | |||
from lease times (e.g. if the client estimates the one-way | from lease times (e.g. if the client estimates the one-way | |||
propagation delay as 200 msec, then it can assume that the lease is | propagation delay as 200 milliseconds, then it can assume that the | |||
already 200 msec old when it gets it). In addition, it will take | lease is already 200 milliseconds old when it gets it). In addition, | |||
another 200 msec to get a response back to the server. So the client | it will take another 200 milliseconds to get a response back to the | |||
must send a lease renewal or write data back to the server 400 msec | server. So the client must send a lease renewal or write data back | |||
before the lease would expire. | to the server at least 400 milliseconds before the lease would expire. | |||
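The timing above amounts to budgeting two one-way delays out of the granted lease. The sketch below uses a hypothetical function name, with integer milliseconds measured on the client's own clock. It computes the latest local time at which the client should send a renewal.

```python
def renewal_deadline_ms(received_at_ms: int, lease_time_ms: int, one_way_delay_ms: int) -> int:
    """Latest local time (client clock) to send a lease renewal.

    The lease is treated as already one_way_delay_ms old when it arrives,
    and the renewal needs another one_way_delay_ms to reach the server.
    """
    return received_at_ms + lease_time_ms - 2 * one_way_delay_ms


# 90-second lease, 200 ms one-way delay, lease received at local time 1,000,000 ms:
print(renewal_deadline_ms(1_000_000, 90_000, 200))  # 1089600
```

Because the lease is a delta and every input comes from the client's clock, the client and server clocks need not be synchronized. The sketch adds no allowance for clock drift over the lease duration.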
The server's lease period configuration should take into account the | The server's lease period configuration should take into account the | |||
network distance of the clients that will be accessing the server's | network distance of the clients that will be accessing the server's | |||
resources. It is expected that the lease period will take into | resources. It is expected that the lease period will take into | |||
account the network propagation delays and other network delay | account the network propagation delays and other network delay | |||
factors for the client population. Since the protocol does not allow | factors for the client population. Since the protocol does not allow | |||
for an automatic method to determine an appropriate lease period, the | for an automatic method to determine an appropriate lease period, the | |||
server's administrator may have to tune the lease period. | server's administrator may have to tune the lease period. | |||
8.8. Obsolete Locking Infrastructure From NFSv4.0 | 8.8. Obsolete Locking Infrastructure From NFSv4.0 | |||
skipping to change at page 170, line 10 | skipping to change at page 170, line 32 | |||
The following NFSv4.0 operations MUST NOT be implemented in NFSv4.1. | The following NFSv4.0 operations MUST NOT be implemented in NFSv4.1. | |||
The server MUST return NFS4ERR_NOTSUPP if these operations are found | The server MUST return NFS4ERR_NOTSUPP if these operations are found | |||
in an NFSv4.1 COMPOUND. | in an NFSv4.1 COMPOUND. | |||
o SETCLIENTID since its function has been replaced by EXCHANGE_ID. | o SETCLIENTID since its function has been replaced by EXCHANGE_ID. | |||
o SETCLIENTID_CONFIRM since client ID confirmation now happens by | o SETCLIENTID_CONFIRM since client ID confirmation now happens by | |||
means of CREATE_SESSION. | means of CREATE_SESSION. | |||
o OPEN_CONFIRM because OPENs no longer require confirmation to | o OPEN_CONFIRM because state-owner-based seqids have been replaced | |||
establish an owner-based sequence value. | by the sequence id in the SEQUENCE operation. | |||
o RELEASE_LOCKOWNER because lock-owners with no associated locks do | o RELEASE_LOCKOWNER because lock-owners with no associated locks do | |||
not have any sequence-related state and so can be deleted by the | not have any sequence-related state and so can be deleted by the | |||
server at will. | server at will. | |||
o RENEW because every SEQUENCE operation for a session causes lease | o RENEW because every SEQUENCE operation for a session causes lease | |||
renewal, making a separate operation useless. | renewal, making a separate operation superfluous. | |||
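A server might enforce this list with a membership check during COMPOUND processing, as in the hypothetical sketch below. Operations are identified by name rather than by number, and check_op only tests whether an operation is permitted. It does not execute anything. COMPOUND processing stops at the first operation that fails.

```python
from typing import List, Sequence, Tuple

# NFSv4.0 operations that must not be implemented in NFSv4.1.
OBSOLETE_IN_V41 = frozenset({
    "SETCLIENTID",
    "SETCLIENTID_CONFIRM",
    "OPEN_CONFIRM",
    "RELEASE_LOCKOWNER",
    "RENEW",
})


def check_op(minorversion: int, op_name: str) -> str:
    """Return NFS4ERR_NOTSUPP for operations removed in minor version 1."""
    if minorversion == 1 and op_name in OBSOLETE_IN_V41:
        return "NFS4ERR_NOTSUPP"
    return "NFS4_OK"


def process_compound(minorversion: int, ops: Sequence[str]) -> List[Tuple[str, str]]:
    results = []
    for op in ops:
        status = check_op(minorversion, op)
        results.append((op, status))
        if status != "NFS4_OK":
            break  # COMPOUND evaluation stops at the first failing operation
    return results
```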
Also, there are a number of fields, present in existing operations | Also, there are a number of fields, present in existing operations | |||
related to locking, that have no use in minor version one. They were | related to locking, that have no use in minor version one. They were | |||
used in minor version zero to perform functions now provided in a | used in minor version zero to perform functions now provided in a | |||
different fashion. | different fashion. | |||
o Sequence ids used to sequence requests for a given state-owner and | o Sequence ids used to sequence requests for a given state-owner and | |||
to provide retry protection, now provided via sessions. | to provide retry protection, now provided via sessions. | |||
o Client IDs used to identify the client associated with a given | o Client IDs used to identify the client associated with a given | |||
End of changes. 66 change blocks. | ||||
140 lines changed or deleted | 160 lines changed or added | |||
This html diff was produced by rfcdiff 1.33. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |