draft-ietf-nfsv4-minorversion1-PAv8.txt | 2009-12-17-2-TO-rfc5661.txt | |||
---|---|---|---|---|
NFSv4 S. Shepler | Network Working Group S. Shepler, Ed. | |||
Internet-Draft M. Eisler | Request for Comments: 5661 Storspeed, Inc. | |||
Intended status: Standards Track D. Noveck | Category: Standards Track M. Eisler, Ed. | |||
Expires: June 9, 2010 Editors | D. Noveck, Ed. | |||
December 06, 2009 | NetApp | |||
| | October 2009 | |||
NFS Version 4 Minor Version 1 | ||||
draft-ietf-nfsv4-minorversion1-PAv8.txt | ||||
Status of this Memo | ||||
This Internet-Draft is submitted to IETF in full conformance with the | ||||
provisions of BCP 78 and BCP 79. | ||||
Internet-Drafts are working documents of the Internet Engineering | Network File System (NFS) Version 4 Minor Version 1 Protocol | |||
Task Force (IETF), its areas, and its working groups. Note that | ||||
other groups may also distribute working documents as Internet- | ||||
Drafts. | ||||
Internet-Drafts are draft documents valid for a maximum of six months | Abstract | |||
and may be updated, replaced, or obsoleted by other documents at any | ||||
time. It is inappropriate to use Internet-Drafts as reference | ||||
material or to cite them other than as "work in progress." | ||||
The list of current Internet-Drafts can be accessed at | This document describes the Network File System (NFS) version 4 minor | |||
http://www.ietf.org/ietf/1id-abstracts.txt. | version 1, including features retained from the base protocol (NFS | |||
| | version 4 minor version 0, which is specified in RFC 3530) and | |||
| | protocol extensions made subsequently. Major extensions introduced | |||
| | in NFS version 4 minor version 1 include Sessions, Directory | |||
| | Delegations, and parallel NFS (pNFS). NFS version 4 minor version 1 | |||
| | has no dependencies on NFS version 4 minor version 0, and it is | |||
| | considered a separate protocol. Thus, this document neither updates | |||
| | nor obsoletes RFC 3530. NFS minor version 1 is deemed superior to | |||
| | NFS minor version 0 with no loss of functionality, and its use is | |||
| | preferred over version 0. Both NFS minor versions 0 and 1 can be | |||
| | used simultaneously on the same network, between the same client and | |||
| | server. | |||
The list of Internet-Draft Shadow Directories can be accessed at | Status of This Memo | |||
http://www.ietf.org/shadow.html. | ||||
This Internet-Draft will expire on June 9, 2010. | This document specifies an Internet standards track protocol for the | |||
| | Internet community, and requests discussion and suggestions for | |||
| | improvements. Please refer to the current edition of the "Internet | |||
| | Official Protocol Standards" (STD 1) for the standardization state | |||
| | and status of this protocol. Distribution of this memo is unlimited. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2009 IETF Trust and the persons identified as the | Copyright (c) 2009 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. | to this document. Code Components extracted from this document must | |||
| | include Simplified BSD License text as described in Section 4.e of | |||
Abstract | the Trust Legal Provisions and are provided without warranty as | |||
| | described in the BSD License. | |||
This document describes NFS version 4 minor version one, including | ||||
features retained from the base protocol (NFS version 4 minor version | ||||
zero which is specified in RFC3530) and protocol extensions made | ||||
subsequently. Major extensions introduced in NFS version 4 minor | ||||
version one include: Sessions, Directory Delegations, and parallel | ||||
NFS (pNFS). NFS version 4 minor version one has no dependencies on | ||||
NFS version 4 minor version zero, and is considered a separate | ||||
protocol. Thus this document neither updates nor obsoletes RFC3530. | ||||
NFS minor version one is deemed superior to NFS minor version zero | ||||
with no loss of functionality, and its use is preferred over version | ||||
zero. Both NFS minor version zero and one can be used simultaneously | ||||
on the same network, between the same client and server. | ||||
Requirements Language | ||||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | ||||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | ||||
document are to be interpreted as described in RFC 2119 [1]. | ||||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 12 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
1.1. The NFS Version 4 Minor Version 1 Protocol . . . . . . . 12 | 1.1. The NFS Version 4 Minor Version 1 Protocol . . . . . . . 11 | |||
1.2. Scope of this Document . . . . . . . . . . . . . . . . . 12 | 1.2. Requirements Language . . . . . . . . . . . . . . . . . 11 | |||
1.3. NFSv4 Goals . . . . . . . . . . . . . . . . . . . . . . 12 | 1.3. Scope of This Document . . . . . . . . . . . . . . . . . 11 | |||
1.4. NFSv4.1 Goals . . . . . . . . . . . . . . . . . . . . . 13 | 1.4. NFSv4 Goals . . . . . . . . . . . . . . . . . . . . . . 11 | |||
1.5. General Definitions . . . . . . . . . . . . . . . . . . 13 | 1.5. NFSv4.1 Goals . . . . . . . . . . . . . . . . . . . . . 12 | |||
1.6. Overview of NFSv4.1 Features . . . . . . . . . . . . . . 16 | 1.6. General Definitions . . . . . . . . . . . . . . . . . . 13 | |||
1.6.1. RPC and Security . . . . . . . . . . . . . . . . . . 16 | 1.7. Overview of NFSv4.1 Features . . . . . . . . . . . . . . 15 | |||
1.6.2. Protocol Structure . . . . . . . . . . . . . . . . . 17 | 1.7.1. RPC and Security . . . . . . . . . . . . . . . . . . 15 | |||
1.6.3. File System Model . . . . . . . . . . . . . . . . . 17 | 1.7.2. Protocol Structure . . . . . . . . . . . . . . . . . 16 | |||
1.6.4. Locking Facilities . . . . . . . . . . . . . . . . . 19 | 1.7.3. File System Model . . . . . . . . . . . . . . . . . 16 | |||
1.7. Differences from NFSv4.0 . . . . . . . . . . . . . . . . 20 | 1.7.4. Locking Facilities . . . . . . . . . . . . . . . . . 18 | |||
2. Core Infrastructure . . . . . . . . . . . . . . . . . . . . . 21 | 1.8. Differences from NFSv4.0 . . . . . . . . . . . . . . . . 19 | |||
2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 21 | 2. Core Infrastructure . . . . . . . . . . . . . . . . . . . . . 20 | |||
2.2. RPC and XDR . . . . . . . . . . . . . . . . . . . . . . 21 | 2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 20 | |||
2.2.1. RPC-based Security . . . . . . . . . . . . . . . . . 21 | 2.2. RPC and XDR . . . . . . . . . . . . . . . . . . . . . . 20 | |||
2.3. COMPOUND and CB_COMPOUND . . . . . . . . . . . . . . . . 24 | 2.2.1. RPC-Based Security . . . . . . . . . . . . . . . . . 20 | |||
2.4. Client Identifiers and Client Owners . . . . . . . . . . 25 | 2.3. COMPOUND and CB_COMPOUND . . . . . . . . . . . . . . . . 23 | |||
2.4.1. Upgrade from NFSv4.0 to NFSv4.1 . . . . . . . . . . 29 | 2.4. Client Identifiers and Client Owners . . . . . . . . . . 24 | |||
2.4.2. Server Release of Client ID . . . . . . . . . . . . 29 | 2.4.1. Upgrade from NFSv4.0 to NFSv4.1 . . . . . . . . . . 28 | |||
2.4.3. Resolving Client Owner Conflicts . . . . . . . . . . 30 | 2.4.2. Server Release of Client ID . . . . . . . . . . . . 28 | |||
2.5. Server Owners . . . . . . . . . . . . . . . . . . . . . 31 | 2.4.3. Resolving Client Owner Conflicts . . . . . . . . . . 29 | |||
2.6. Security Service Negotiation . . . . . . . . . . . . . . 31 | 2.5. Server Owners . . . . . . . . . . . . . . . . . . . . . 30 | |||
2.6.1. NFSv4.1 Security Tuples . . . . . . . . . . . . . . 32 | 2.6. Security Service Negotiation . . . . . . . . . . . . . . 30 | |||
2.6.2. SECINFO and SECINFO_NO_NAME . . . . . . . . . . . . 32 | 2.6.1. NFSv4.1 Security Tuples . . . . . . . . . . . . . . 31 | |||
2.6.3. Security Error . . . . . . . . . . . . . . . . . . . 32 | 2.6.2. SECINFO and SECINFO_NO_NAME . . . . . . . . . . . . 31 | |||
2.7. Minor Versioning . . . . . . . . . . . . . . . . . . . . 37 | 2.6.3. Security Error . . . . . . . . . . . . . . . . . . . 31 | |||
2.8. Non-RPC-based Security Services . . . . . . . . . . . . 39 | 2.7. Minor Versioning . . . . . . . . . . . . . . . . . . . . 36 | |||
2.8.1. Authorization . . . . . . . . . . . . . . . . . . . 39 | 2.8. Non-RPC-Based Security Services . . . . . . . . . . . . 38 | |||
2.8.2. Auditing . . . . . . . . . . . . . . . . . . . . . . 39 | 2.8.1. Authorization . . . . . . . . . . . . . . . . . . . 38 | |||
2.8.3. Intrusion Detection . . . . . . . . . . . . . . . . 40 | 2.8.2. Auditing . . . . . . . . . . . . . . . . . . . . . . 38 | |||
2.9. Transport Layers . . . . . . . . . . . . . . . . . . . . 40 | 2.8.3. Intrusion Detection . . . . . . . . . . . . . . . . 39 | |||
2.9.1. REQUIRED and RECOMMENDED Properties of Transports . 40 | 2.9. Transport Layers . . . . . . . . . . . . . . . . . . . . 39 | |||
2.9.2. Client and Server Transport Behavior . . . . . . . . 41 | 2.9.1. REQUIRED and RECOMMENDED Properties of Transports . 39 | |||
2.9.3. Ports . . . . . . . . . . . . . . . . . . . . . . . 42 | 2.9.2. Client and Server Transport Behavior . . . . . . . . 40 | |||
2.10. Session . . . . . . . . . . . . . . . . . . . . . . . . 42 | 2.9.3. Ports . . . . . . . . . . . . . . . . . . . . . . . 41 | |||
2.10.1. Motivation and Overview . . . . . . . . . . . . . . 42 | 2.10. Session . . . . . . . . . . . . . . . . . . . . . . . . 41 | |||
2.10.2. NFSv4 Integration . . . . . . . . . . . . . . . . . 44 | 2.10.1. Motivation and Overview . . . . . . . . . . . . . . 41 | |||
2.10.3. Channels . . . . . . . . . . . . . . . . . . . . . . 45 | 2.10.2. NFSv4 Integration . . . . . . . . . . . . . . . . . 43 | |||
2.10.4. Server Scope . . . . . . . . . . . . . . . . . . . . 46 | 2.10.3. Channels . . . . . . . . . . . . . . . . . . . . . . 44 | |||
2.10.5. Trunking . . . . . . . . . . . . . . . . . . . . . . 49 | 2.10.4. Server Scope . . . . . . . . . . . . . . . . . . . . 45 | |||
2.10.6. Exactly Once Semantics . . . . . . . . . . . . . . . 52 | 2.10.5. Trunking . . . . . . . . . . . . . . . . . . . . . . 48 | |||
2.10.7. RDMA Considerations . . . . . . . . . . . . . . . . 67 | 2.10.6. Exactly Once Semantics . . . . . . . . . . . . . . . 51 | |||
2.10.8. Sessions Security . . . . . . . . . . . . . . . . . 70 | 2.10.7. RDMA Considerations . . . . . . . . . . . . . . . . 66 | |||
2.10.9. The Secret State Verifier (SSV) GSS Mechanism . . . 75 | 2.10.8. Session Security . . . . . . . . . . . . . . . . . . 69 | |||
2.10.10. Security Considerations for RPCSEC_GSS when using | 2.10.9. The Secret State Verifier (SSV) GSS Mechanism . . . 74 | |||
the SSV Mechanism . . . . . . . . . . . . . . . . . 79 | 2.10.10. Security Considerations for RPCSEC_GSS When Using | |||
2.10.11. Session Mechanics - Steady State . . . . . . . . . . 81 | the SSV Mechanism . . . . . . . . . . . . . . . . . 78 | |||
2.10.12. Session Inactivity Timer . . . . . . . . . . . . . . 82 | 2.10.11. Session Mechanics - Steady State . . . . . . . . . . 80 | |||
2.10.13. Session Mechanics - Recovery . . . . . . . . . . . . 83 | 2.10.12. Session Inactivity Timer . . . . . . . . . . . . . . 81 | |||
2.10.14. Parallel NFS and Sessions . . . . . . . . . . . . . 88 | 2.10.13. Session Mechanics - Recovery . . . . . . . . . . . . 82 | |||
3. Protocol Constants and Data Types . . . . . . . . . . . . . . 88 | 2.10.14. Parallel NFS and Sessions . . . . . . . . . . . . . 87 | |||
3.1. Basic Constants . . . . . . . . . . . . . . . . . . . . 88 | 3. Protocol Constants and Data Types . . . . . . . . . . . . . . 87 | |||
3.2. Basic Data Types . . . . . . . . . . . . . . . . . . . . 89 | 3.1. Basic Constants . . . . . . . . . . . . . . . . . . . . 87 | |||
3.3. Structured Data Types . . . . . . . . . . . . . . . . . 91 | 3.2. Basic Data Types . . . . . . . . . . . . . . . . . . . . 88 | |||
4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 99 | 3.3. Structured Data Types . . . . . . . . . . . . . . . . . 90 | |||
| | 4. Filehandles . . . . . . . . . . . . . . . . . . . . . . . . . 98 | |||
4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 99 | 4.1. Obtaining the First Filehandle . . . . . . . . . . . . . 99 | |||
4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . 99 | 4.1.1. Root Filehandle . . . . . . . . . . . . . . . . . . 99 | |||
4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . 100 | 4.1.2. Public Filehandle . . . . . . . . . . . . . . . . . 99 | |||
4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 100 | 4.2. Filehandle Types . . . . . . . . . . . . . . . . . . . . 100 | |||
4.2.1. General Properties of a Filehandle . . . . . . . . . 101 | 4.2.1. General Properties of a Filehandle . . . . . . . . . 100 | |||
4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 101 | 4.2.2. Persistent Filehandle . . . . . . . . . . . . . . . 101 | |||
4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 102 | 4.2.3. Volatile Filehandle . . . . . . . . . . . . . . . . 101 | |||
4.3. One Method of Constructing a Volatile Filehandle . . . . 103 | 4.3. One Method of Constructing a Volatile Filehandle . . . . 102 | |||
4.4. Client Recovery from Filehandle Expiration . . . . . . . 103 | 4.4. Client Recovery from Filehandle Expiration . . . . . . . 103 | |||
5. File Attributes . . . . . . . . . . . . . . . . . . . . . . . 104 | 5. File Attributes . . . . . . . . . . . . . . . . . . . . . . . 104 | |||
5.1. REQUIRED Attributes . . . . . . . . . . . . . . . . . . 105 | 5.1. REQUIRED Attributes . . . . . . . . . . . . . . . . . . 105 | |||
5.2. RECOMMENDED Attributes . . . . . . . . . . . . . . . . . 106 | 5.2. RECOMMENDED Attributes . . . . . . . . . . . . . . . . . 105 | |||
5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 106 | 5.3. Named Attributes . . . . . . . . . . . . . . . . . . . . 106 | |||
5.4. Classification of Attributes . . . . . . . . . . . . . . 108 | 5.4. Classification of Attributes . . . . . . . . . . . . . . 107 | |||
5.5. Set-Only and Get-Only Attributes . . . . . . . . . . . . 109 | 5.5. Set-Only and Get-Only Attributes . . . . . . . . . . . . 108 | |||
5.6. REQUIRED Attributes - List and Definition References . . 109 | 5.6. REQUIRED Attributes - List and Definition References . . 108 | |||
5.7. RECOMMENDED Attributes - List and Definition | 5.7. RECOMMENDED Attributes - List and Definition | |||
References . . . . . . . . . . . . . . . . . . . . . . . 110 | References . . . . . . . . . . . . . . . . . . . . . . . 109 | |||
5.8. Attribute Definitions . . . . . . . . . . . . . . . . . 112 | 5.8. Attribute Definitions . . . . . . . . . . . . . . . . . 111 | |||
5.8.1. Definitions of REQUIRED Attributes . . . . . . . . . 112 | 5.8.1. Definitions of REQUIRED Attributes . . . . . . . . . 111 | |||
5.8.2. Definitions of Uncategorized RECOMMENDED | 5.8.2. Definitions of Uncategorized RECOMMENDED | |||
Attributes . . . . . . . . . . . . . . . . . . . . . 114 | Attributes . . . . . . . . . . . . . . . . . . . . . 113 | |||
5.9. Interpreting owner and owner_group . . . . . . . . . . . 120 | 5.9. Interpreting owner and owner_group . . . . . . . . . . . 120 | |||
5.10. Character Case Attributes . . . . . . . . . . . . . . . 122 | 5.10. Character Case Attributes . . . . . . . . . . . . . . . 122 | |||
5.11. Directory Notification Attributes . . . . . . . . . . . 123 | 5.11. Directory Notification Attributes . . . . . . . . . . . 122 | |||
5.12. pNFS Attribute Definitions . . . . . . . . . . . . . . . 123 | 5.12. pNFS Attribute Definitions . . . . . . . . . . . . . . . 122 | |||
5.13. Retention Attributes . . . . . . . . . . . . . . . . . . 125 | 5.13. Retention Attributes . . . . . . . . . . . . . . . . . . 124 | |||
6. Access Control Attributes . . . . . . . . . . . . . . . . . . 128 | 6. Access Control Attributes . . . . . . . . . . . . . . . . . . 127 | |||
6.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . 128 | 6.1. Goals . . . . . . . . . . . . . . . . . . . . . . . . . 127 | |||
6.2. File Attributes Discussion . . . . . . . . . . . . . . . 129 | 6.2. File Attributes Discussion . . . . . . . . . . . . . . . 128 | |||
6.2.1. Attribute 12: acl . . . . . . . . . . . . . . . . . 129 | 6.2.1. Attribute 12: acl . . . . . . . . . . . . . . . . . 128 | |||
6.2.2. Attribute 58: dacl . . . . . . . . . . . . . . . . . 144 | 6.2.2. Attribute 58: dacl . . . . . . . . . . . . . . . . . 144 | |||
6.2.3. Attribute 59: sacl . . . . . . . . . . . . . . . . . 144 | 6.2.3. Attribute 59: sacl . . . . . . . . . . . . . . . . . 144 | |||
6.2.4. Attribute 33: mode . . . . . . . . . . . . . . . . . 144 | 6.2.4. Attribute 33: mode . . . . . . . . . . . . . . . . . 144 | |||
6.2.5. Attribute 74: mode_set_masked . . . . . . . . . . . 145 | 6.2.5. Attribute 74: mode_set_masked . . . . . . . . . . . 144 | |||
6.3. Common Methods . . . . . . . . . . . . . . . . . . . . . 146 | 6.3. Common Methods . . . . . . . . . . . . . . . . . . . . . 145 | |||
6.3.1. Interpreting an ACL . . . . . . . . . . . . . . . . 146 | 6.3.1. Interpreting an ACL . . . . . . . . . . . . . . . . 145 | |||
6.3.2. Computing a Mode Attribute from an ACL . . . . . . . 147 | 6.3.2. Computing a Mode Attribute from an ACL . . . . . . . 146 | |||
6.4. Requirements . . . . . . . . . . . . . . . . . . . . . . 148 | 6.4. Requirements . . . . . . . . . . . . . . . . . . . . . . 147 | |||
6.4.1. Setting the mode and/or ACL Attributes . . . . . . . 148 | 6.4.1. Setting the Mode and/or ACL Attributes . . . . . . . 148 | |||
6.4.2. Retrieving the mode and/or ACL Attributes . . . . . 150 | 6.4.2. Retrieving the Mode and/or ACL Attributes . . . . . 149 | |||
6.4.3. Creating New Objects . . . . . . . . . . . . . . . . 150 | 6.4.3. Creating New Objects . . . . . . . . . . . . . . . . 150 | |||
7. Single-server Namespace . . . . . . . . . . . . . . . . . . . 154 | 7. Single-Server Namespace . . . . . . . . . . . . . . . . . . . 154 | |||
7.1. Server Exports . . . . . . . . . . . . . . . . . . . . . 155 | 7.1. Server Exports . . . . . . . . . . . . . . . . . . . . . 154 | |||
7.2. Browsing Exports . . . . . . . . . . . . . . . . . . . . 155 | 7.2. Browsing Exports . . . . . . . . . . . . . . . . . . . . 154 | |||
7.3. Server Pseudo File System . . . . . . . . . . . . . . . 155 | 7.3. Server Pseudo File System . . . . . . . . . . . . . . . 155 | |||
7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 156 | 7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 155 | |||
7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . 156 | 7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . 155 | |||
7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . 157 | 7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . 156 | |||
7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 157 | 7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 156 | |||
7.8. Security Policy and Namespace Presentation . . . . . . . 157 | 7.8. Security Policy and Namespace Presentation . . . . . . . 157 | |||
8. State Management . . . . . . . . . . . . . . . . . . . . . . 158 | 8. State Management . . . . . . . . . . . . . . . . . . . . . . 158 | |||
8.1. Client and Session ID . . . . . . . . . . . . . . . . . 159 | 8.1. Client and Session ID . . . . . . . . . . . . . . . . . 158 | |||
8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 160 | 8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 159 | |||
8.2.1. Stateid Types . . . . . . . . . . . . . . . . . . . 160 | 8.2.1. Stateid Types . . . . . . . . . . . . . . . . . . . 159 | |||
8.2.2. Stateid Structure . . . . . . . . . . . . . . . . . 161 | 8.2.2. Stateid Structure . . . . . . . . . . . . . . . . . 161 | |||
8.2.3. Special Stateids . . . . . . . . . . . . . . . . . . 163 | 8.2.3. Special Stateids . . . . . . . . . . . . . . . . . . 162 | |||
8.2.4. Stateid Lifetime and Validation . . . . . . . . . . 164 | 8.2.4. Stateid Lifetime and Validation . . . . . . . . . . 163 | |||
8.2.5. Stateid Use for I/O Operations . . . . . . . . . . . 167 | 8.2.5. Stateid Use for I/O Operations . . . . . . . . . . . 166 | |||
8.2.6. Stateid Use for SETATTR Operations . . . . . . . . . 168 | 8.2.6. Stateid Use for SETATTR Operations . . . . . . . . . 167 | |||
8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 168 | 8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 167 | |||
8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 171 | 8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 170 | |||
8.4.1. Client Failure and Recovery . . . . . . . . . . . . 171 | 8.4.1. Client Failure and Recovery . . . . . . . . . . . . 170 | |||
8.4.2. Server Failure and Recovery . . . . . . . . . . . . 172 | 8.4.2. Server Failure and Recovery . . . . . . . . . . . . 171 | |||
8.4.3. Network Partitions and Recovery . . . . . . . . . . 177 | 8.4.3. Network Partitions and Recovery . . . . . . . . . . 176 | |||
8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 182 | 8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 181 | |||
8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 183 | 8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 182 | |||
8.7. Clocks, Propagation Delay, and Calculating Lease | 8.7. Clocks, Propagation Delay, and Calculating Lease | |||
Expiration . . . . . . . . . . . . . . . . . . . . . . . 183 | Expiration . . . . . . . . . . . . . . . . . . . . . . . 182 | |||
8.8. Obsolete Locking Infrastructure From NFSv4.0 . . . . . . 184 | 8.8. Obsolete Locking Infrastructure from NFSv4.0 . . . . . . 183 | |||
9. File Locking and Share Reservations . . . . . . . . . . . . . 185 | 9. File Locking and Share Reservations . . . . . . . . . . . . . 184 | |||
9.1. Opens and Byte-Range Locks . . . . . . . . . . . . . . . 185 | 9.1. Opens and Byte-Range Locks . . . . . . . . . . . . . . . 184 | |||
9.1.1. State-owner Definition . . . . . . . . . . . . . . . 185 | 9.1.1. State-Owner Definition . . . . . . . . . . . . . . . 184 | |||
9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 185 | 9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 185 | |||
9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 188 | 9.2. Lock Ranges . . . . . . . . . . . . . . . . . . . . . . 188 | |||
9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 189 | 9.3. Upgrading and Downgrading Locks . . . . . . . . . . . . 188 | |||
9.4. Stateid Seqid Values and Byte-Range Locks . . . . . . . 189 | 9.4. Stateid Seqid Values and Byte-Range Locks . . . . . . . 189 | |||
9.5. Issues with Multiple Open-Owners . . . . . . . . . . . . 190 | 9.5. Issues with Multiple Open-Owners . . . . . . . . . . . . 189 | |||
9.6. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 190 | 9.6. Blocking Locks . . . . . . . . . . . . . . . . . . . . . 190 | |||
9.7. Share Reservations . . . . . . . . . . . . . . . . . . . 191 | 9.7. Share Reservations . . . . . . . . . . . . . . . . . . . 191 | |||
9.8. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 192 | 9.8. OPEN/CLOSE Operations . . . . . . . . . . . . . . . . . 192 | |||
9.9. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 193 | 9.9. Open Upgrade and Downgrade . . . . . . . . . . . . . . . 192 | |||
9.10. Parallel OPENs . . . . . . . . . . . . . . . . . . . . . 194 | 9.10. Parallel OPENs . . . . . . . . . . . . . . . . . . . . . 193 | |||
9.11. Reclaim of Open and Byte-Range Locks . . . . . . . . . . 194 | 9.11. Reclaim of Open and Byte-Range Locks . . . . . . . . . . 194 | |||
10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 195 | 10. Client-Side Caching . . . . . . . . . . . . . . . . . . . . . 194 | |||
10.1. Performance Challenges for Client-Side Caching . . . . . 195 | 10.1. Performance Challenges for Client-Side Caching . . . . . 195 | |||
10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 196 | 10.2. Delegation and Callbacks . . . . . . . . . . . . . . . . 196 | |||
10.2.1. Delegation Recovery . . . . . . . . . . . . . . . . 198 | 10.2.1. Delegation Recovery . . . . . . . . . . . . . . . . 198 | |||
10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 201 | 10.3. Data Caching . . . . . . . . . . . . . . . . . . . . . . 200 | |||
10.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . 201 | 10.3.1. Data Caching and OPENs . . . . . . . . . . . . . . . 201 | |||
10.3.2. Data Caching and File Locking . . . . . . . . . . . 202 | 10.3.2. Data Caching and File Locking . . . . . . . . . . . 202 | |||
10.3.3. Data Caching and Mandatory File Locking . . . . . . 204 | 10.3.3. Data Caching and Mandatory File Locking . . . . . . 203 | |||
10.3.4. Data Caching and File Identity . . . . . . . . . . . 204 | 10.3.4. Data Caching and File Identity . . . . . . . . . . . 204 | |||
10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 205 | 10.4. Open Delegation . . . . . . . . . . . . . . . . . . . . 205 | |||
10.4.1. Open Delegation and Data Caching . . . . . . . . . . 208 | 10.4.1. Open Delegation and Data Caching . . . . . . . . . . 207 | |||
10.4.2. Open Delegation and File Locks . . . . . . . . . . . 209 | 10.4.2. Open Delegation and File Locks . . . . . . . . . . . 209 | |||
10.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . 209 | 10.4.3. Handling of CB_GETATTR . . . . . . . . . . . . . . . 209 | |||
10.4.4. Recall of Open Delegation . . . . . . . . . . . . . 212 | 10.4.4. Recall of Open Delegation . . . . . . . . . . . . . 212 | |||
10.4.5. Clients that Fail to Honor Delegation Recalls . . . 214 | 10.4.5. Clients That Fail to Honor Delegation Recalls . . . 214 | |||
10.4.6. Delegation Revocation . . . . . . . . . . . . . . . 215 | 10.4.6. Delegation Revocation . . . . . . . . . . . . . . . 215 | |||
10.4.7. Delegations via WANT_DELEGATION . . . . . . . . . . 215 | 10.4.7. Delegations via WANT_DELEGATION . . . . . . . . . . 216 | |||
10.5. Data Caching and Revocation . . . . . . . . . . . . . . 216 | 10.5. Data Caching and Revocation . . . . . . . . . . . . . . 216 | |||
10.5.1. Revocation Recovery for Write Open Delegation . . . 217 | 10.5.1. Revocation Recovery for Write Open Delegation . . . 217 | |||
10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 217 | 10.6. Attribute Caching . . . . . . . . . . . . . . . . . . . 218 | |||
10.7. Data and Metadata Caching and Memory Mapped Files . . . 219 | 10.7. Data and Metadata Caching and Memory Mapped Files . . . 220 | |||
10.8. Name and Directory Caching without Directory | 10.8. Name and Directory Caching without Directory | |||
Delegations . . . . . . . . . . . . . . . . . . . . . . 222 | Delegations . . . . . . . . . . . . . . . . . . . . . . 222 | |||
10.8.1. Name Caching . . . . . . . . . . . . . . . . . . . . 222 | 10.8.1. Name Caching . . . . . . . . . . . . . . . . . . . . 222 | |||
10.8.2. Directory Caching . . . . . . . . . . . . . . . . . 223 | 10.8.2. Directory Caching . . . . . . . . . . . . . . . . . 224 | |||
10.9. Directory Delegations . . . . . . . . . . . . . . . . . 224 | 10.9. Directory Delegations . . . . . . . . . . . . . . . . . 224 | |||
10.9.1. Introduction to Directory Delegations . . . . . . . 224 | 10.9.1. Introduction to Directory Delegations . . . . . . . 225 | |||
10.9.2. Directory Delegation Design . . . . . . . . . . . . 225 | 10.9.2. Directory Delegation Design . . . . . . . . . . . . 226 | |||
10.9.3. Attributes in Support of Directory Notifications . . 226 | 10.9.3. Attributes in Support of Directory Notifications . . 227 | |||
10.9.4. Directory Delegation Recall . . . . . . . . . . . . 226 | 10.9.4. Directory Delegation Recall . . . . . . . . . . . . 227 | |||
10.9.5. Directory Delegation Recovery . . . . . . . . . . . 227 | 10.9.5. Directory Delegation Recovery . . . . . . . . . . . 227 | |||
11. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 227 | 11. Multi-Server Namespace . . . . . . . . . . . . . . . . . . . 228 | |||
11.1. Location Attributes . . . . . . . . . . . . . . . . . . 228 | 11.1. Location Attributes . . . . . . . . . . . . . . . . . . 228 | |||
11.2. File System Presence or Absence . . . . . . . . . . . . 228 | 11.2. File System Presence or Absence . . . . . . . . . . . . 228 | |||
11.3. Getting Attributes for an Absent File System . . . . . . 229 | 11.3. Getting Attributes for an Absent File System . . . . . . 230 | |||
11.3.1. GETATTR Within an Absent File System . . . . . . . . 230 | 11.3.1. GETATTR within an Absent File System . . . . . . . . 230 | |||
11.3.2. READDIR and Absent File Systems . . . . . . . . . . 231 | 11.3.2. READDIR and Absent File Systems . . . . . . . . . . 231 | |||
11.4. Uses of Location Information . . . . . . . . . . . . . . 231 | 11.4. Uses of Location Information . . . . . . . . . . . . . . 232 | |||
11.4.1. File System Replication . . . . . . . . . . . . . . 232 | 11.4.1. File System Replication . . . . . . . . . . . . . . 232 | |||
11.4.2. File System Migration . . . . . . . . . . . . . . . 233 | 11.4.2. File System Migration . . . . . . . . . . . . . . . 233 | |||
11.4.3. Referrals . . . . . . . . . . . . . . . . . . . . . 234 | 11.4.3. Referrals . . . . . . . . . . . . . . . . . . . . . 234 | |||
11.5. Location Entries and Server Identity . . . . . . . . . . 236 | 11.5. Location Entries and Server Identity . . . . . . . . . . 236 | |||
11.6. Additional Client-side Considerations . . . . . . . . . 236 | 11.6. Additional Client-Side Considerations . . . . . . . . . 236 | |||
11.7. Effecting File System Transitions . . . . . . . . . . . 237 | 11.7. Effecting File System Transitions . . . . . . . . . . . 237 | |||
11.7.1. File System Transitions and Simultaneous Access . . 238 | 11.7.1. File System Transitions and Simultaneous Access . . 238 | |||
11.7.2. Simultaneous Use and Transparent Transitions . . . . 239 | 11.7.2. Simultaneous Use and Transparent Transitions . . . . 239 | |||
11.7.3. Filehandles and File System Transitions . . . . . . 242 | 11.7.3. Filehandles and File System Transitions . . . . . . 242 | |||
11.7.4. Fileids and File System Transitions . . . . . . . . 242 | 11.7.4. Fileids and File System Transitions . . . . . . . . 242 | |||
11.7.5. Fsids and File System Transitions . . . . . . . . . 243 | 11.7.5. Fsids and File System Transitions . . . . . . . . . 243 | |||
11.7.6. The Change Attribute and File System Transitions . . 244 | 11.7.6. The Change Attribute and File System Transitions . . 244 | |||
11.7.7. Lock State and File System Transitions . . . . . . . 244 | 11.7.7. Lock State and File System Transitions . . . . . . . 245 | |||
11.7.8. Write Verifiers and File System Transitions . . . . 249 | 11.7.8. Write Verifiers and File System Transitions . . . . 249 | |||
11.7.9. Readdir Cookies and Verifiers and File System | 11.7.9. Readdir Cookies and Verifiers and File System | |||
Transitions . . . . . . . . . . . . . . . . . . . . 249 | Transitions . . . . . . . . . . . . . . . . . . . . 249 | |||
11.7.10. File System Data and File System Transitions . . . . 249 | 11.7.10. File System Data and File System Transitions . . . . 249 | |||
11.8. Effecting File System Referrals . . . . . . . . . . . . 251 | 11.8. Effecting File System Referrals . . . . . . . . . . . . 251 | |||
11.8.1. Referral Example (LOOKUP) . . . . . . . . . . . . . 251 | 11.8.1. Referral Example (LOOKUP) . . . . . . . . . . . . . 251 | |||
11.8.2. Referral Example (READDIR) . . . . . . . . . . . . . 255 | 11.8.2. Referral Example (READDIR) . . . . . . . . . . . . . 255 | |||
11.9. The Attribute fs_locations . . . . . . . . . . . . . . . 257 | 11.9. The Attribute fs_locations . . . . . . . . . . . . . . . 257 | |||
11.10. The Attribute fs_locations_info . . . . . . . . . . . . 260 | 11.10. The Attribute fs_locations_info . . . . . . . . . . . . 260 | |||
11.10.1. The fs_locations_server4 Structure . . . . . . . . . 264 | 11.10.1. The fs_locations_server4 Structure . . . . . . . . . 264 | |||
skipping to change at page 7, line 30 | skipping to change at page 6, line 31 | |||
12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 277 | 12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 277 | |||
12.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 278 | 12.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 278 | |||
12.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 278 | 12.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 278 | |||
12.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 278 | 12.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 278 | |||
12.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 278 | 12.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 278 | |||
12.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 279 | 12.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 279 | |||
12.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 279 | 12.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 279 | |||
12.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 280 | 12.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 280 | |||
12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 280 | 12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 280 | |||
12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 281 | 12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 281 | |||
12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 281 | 12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 282 | |||
12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 283 | 12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 283 | |||
12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 284 | 12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 284 | |||
12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 284 | 12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 284 | |||
12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 284 | 12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 284 | |||
12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 285 | 12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 285 | |||
12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 286 | 12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 286 | |||
12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 287 | 12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 287 | |||
12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 290 | 12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 290 | |||
12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 299 | 12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 299 | |||
12.5.7. Metadata Server Write Propagation . . . . . . . . . 299 | 12.5.7. Metadata Server Write Propagation . . . . . . . . . 299 | |||
12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 299 | 12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 299 | |||
12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 301 | 12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 301 | |||
12.7.1. Recovery from Client Restart . . . . . . . . . . . . 301 | 12.7.1. Recovery from Client Restart . . . . . . . . . . . . 301 | |||
12.7.2. Dealing with Lease Expiration on the Client . . . . 301 | 12.7.2. Dealing with Lease Expiration on the Client . . . . 301 | |||
12.7.3. Dealing with Loss of Layout State on the Metadata | 12.7.3. Dealing with Loss of Layout State on the Metadata | |||
Server . . . . . . . . . . . . . . . . . . . . . . . 302 | Server . . . . . . . . . . . . . . . . . . . . . . . 302 | |||
12.7.4. Recovery from Metadata Server Restart . . . . . . . 303 | 12.7.4. Recovery from Metadata Server Restart . . . . . . . 303 | |||
12.7.5. Operations During Metadata Server Grace Period . . . 305 | 12.7.5. Operations during Metadata Server Grace Period . . . 305 | |||
12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 305 | 12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 305 | |||
12.8. Metadata and Storage Device Roles . . . . . . . . . . . 306 | 12.8. Metadata and Storage Device Roles . . . . . . . . . . . 306 | |||
12.9. Security Considerations for pNFS . . . . . . . . . . . . 306 | 12.9. Security Considerations for pNFS . . . . . . . . . . . . 306 | |||
13. NFSv4.1 as a Storage Protocol in pNFS: the File Layout Type . 307 | 13. NFSv4.1 as a Storage Protocol in pNFS: the File Layout Type . 307 | |||
13.1. Client ID and Session Considerations . . . . . . . . . . 307 | 13.1. Client ID and Session Considerations . . . . . . . . . . 308 | |||
13.1.1. Sessions Considerations for Data Servers . . . . . . 310 | 13.1.1. Sessions Considerations for Data Servers . . . . . . 310 | |||
13.2. File Layout Definitions . . . . . . . . . . . . . . . . 310 | 13.2. File Layout Definitions . . . . . . . . . . . . . . . . 310 | |||
13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 311 | 13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 311 | |||
13.4. Interpreting the File Layout . . . . . . . . . . . . . . 315 | 13.4. Interpreting the File Layout . . . . . . . . . . . . . . 315 | |||
13.4.1. Determining the Stripe Unit Number . . . . . . . . . 315 | 13.4.1. Determining the Stripe Unit Number . . . . . . . . . 315 | |||
13.4.2. Interpreting the File Layout Using Sparse Packing . 316 | 13.4.2. Interpreting the File Layout Using Sparse Packing . 316 | |||
13.4.3. Interpreting the File Layout Using Dense Packing . . 318 | 13.4.3. Interpreting the File Layout Using Dense Packing . . 318 | |||
13.4.4. Sparse and Dense Stripe Unit Packing . . . . . . . . 320 | 13.4.4. Sparse and Dense Stripe Unit Packing . . . . . . . . 320 | |||
13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 322 | 13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 322 | |||
13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 323 | 13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 323 | |||
13.7. COMMIT Through Metadata Server . . . . . . . . . . . . . 325 | 13.7. COMMIT through Metadata Server . . . . . . . . . . . . . 325 | |||
13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 327 | 13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 327 | |||
13.9. Metadata and Data Server State Coordination . . . . . . 327 | 13.9. Metadata and Data Server State Coordination . . . . . . 327 | |||
13.9.1. Global Stateid Requirements . . . . . . . . . . . . 327 | 13.9.1. Global Stateid Requirements . . . . . . . . . . . . 327 | |||
13.9.2. Data Server State Propagation . . . . . . . . . . . 328 | 13.9.2. Data Server State Propagation . . . . . . . . . . . 328 | |||
13.10. Data Server Component File Size . . . . . . . . . . . . 330 | 13.10. Data Server Component File Size . . . . . . . . . . . . 330 | |||
13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 331 | 13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 331 | |||
13.12. Security Considerations for the File Layout Type . . . . 331 | 13.12. Security Considerations for the File Layout Type . . . . 332 | |||
14. Internationalization . . . . . . . . . . . . . . . . . . . . 332 | 14. Internationalization . . . . . . . . . . . . . . . . . . . . 333 | |||
14.1. Stringprep profile for the utf8str_cs type . . . . . . . 333 | 14.1. Stringprep Profile for the utf8str_cs Type . . . . . . . 334 | |||
14.2. Stringprep profile for the utf8str_cis type . . . . . . 335 | 14.2. Stringprep Profile for the utf8str_cis Type . . . . . . 335 | |||
14.3. Stringprep profile for the utf8str_mixed type . . . . . 336 | 14.3. Stringprep Profile for the utf8str_mixed Type . . . . . 336 | |||
14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 337 | 14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 338 | |||
14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 338 | 14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 338 | |||
15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 338 | 15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 339 | |||
15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 339 | 15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 339 | |||
15.1.1. General Errors . . . . . . . . . . . . . . . . . . . 341 | 15.1.1. General Errors . . . . . . . . . . . . . . . . . . . 341 | |||
15.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 343 | 15.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 343 | |||
15.1.3. Compound Structure Errors . . . . . . . . . . . . . 344 | 15.1.3. Compound Structure Errors . . . . . . . . . . . . . 344 | |||
15.1.4. File System Errors . . . . . . . . . . . . . . . . . 346 | 15.1.4. File System Errors . . . . . . . . . . . . . . . . . 346 | |||
15.1.5. State Management Errors . . . . . . . . . . . . . . 348 | 15.1.5. State Management Errors . . . . . . . . . . . . . . 348 | |||
15.1.6. Security Errors . . . . . . . . . . . . . . . . . . 348 | 15.1.6. Security Errors . . . . . . . . . . . . . . . . . . 349 | |||
15.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 349 | 15.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 350 | |||
15.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 350 | 15.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 350 | |||
15.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 351 | 15.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 352 | |||
15.1.10. pNFS Errors . . . . . . . . . . . . . . . . . . . . 352 | 15.1.10. pNFS Errors . . . . . . . . . . . . . . . . . . . . 353 | |||
15.1.11. Session Use Errors . . . . . . . . . . . . . . . . . 353 | 15.1.11. Session Use Errors . . . . . . . . . . . . . . . . . 354 | |||
15.1.12. Session Management Errors . . . . . . . . . . . . . 355 | 15.1.12. Session Management Errors . . . . . . . . . . . . . 355 | |||
15.1.13. Client Management Errors . . . . . . . . . . . . . . 355 | 15.1.13. Client Management Errors . . . . . . . . . . . . . . 355 | |||
15.1.14. Delegation Errors . . . . . . . . . . . . . . . . . 356 | 15.1.14. Delegation Errors . . . . . . . . . . . . . . . . . 356 | |||
15.1.15. Attribute Handling Errors . . . . . . . . . . . . . 356 | 15.1.15. Attribute Handling Errors . . . . . . . . . . . . . 357 | |||
15.1.16. Obsoleted Errors . . . . . . . . . . . . . . . . . . 357 | 15.1.16. Obsoleted Errors . . . . . . . . . . . . . . . . . . 357 | |||
15.2. Operations and their valid errors . . . . . . . . . . . 358 | 15.2. Operations and Their Valid Errors . . . . . . . . . . . 358 | |||
15.3. Callback operations and their valid errors . . . . . . . 375 | 15.3. Callback Operations and Their Valid Errors . . . . . . . 375 | |||
15.4. Errors and the operations that use them . . . . . . . . 378 | 15.4. Errors and the Operations That Use Them . . . . . . . . 378 | |||
16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 394 | 16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 394 | |||
16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 394 | 16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 394 | |||
16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 395 | 16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 395 | |||
17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 406 | 17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 406 | |||
18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 409 | 18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 409 | |||
18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 409 | 18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 409 | |||
18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 415 | 18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 415 | |||
18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 416 | 18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 416 | |||
18.4. Operation 6: CREATE - Create a Non-Regular File Object . 419 | 18.4. Operation 6: CREATE - Create a Non-Regular File Object . 419 | |||
18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting | 18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting | |||
Recovery . . . . . . . . . . . . . . . . . . . . . . . . 422 | Recovery . . . . . . . . . . . . . . . . . . . . . . . . 422 | |||
18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 423 | 18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 423 | |||
18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 423 | 18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 423 | |||
18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 425 | 18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 425 | |||
18.9. Operation 11: LINK - Create Link to a File . . . . . . . 426 | 18.9. Operation 11: LINK - Create Link to a File . . . . . . . 426 | |||
18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 429 | 18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 429 | |||
18.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 433 | 18.11. Operation 13: LOCKT - Test for Lock . . . . . . . . . . 434 | |||
18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 434 | 18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 435 | |||
18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 436 | 18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 437 | |||
18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 437 | 18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 438 | |||
18.15. Operation 17: NVERIFY - Verify Difference in | 18.15. Operation 17: NVERIFY - Verify Difference in | |||
Attributes . . . . . . . . . . . . . . . . . . . . . . . 439 | Attributes . . . . . . . . . . . . . . . . . . . . . . . 440 | |||
18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 440 | 18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 441 | |||
18.17. Operation 19: OPENATTR - Open Named Attribute | 18.17. Operation 19: OPENATTR - Open Named Attribute | |||
Directory . . . . . . . . . . . . . . . . . . . . . . . 459 | Directory . . . . . . . . . . . . . . . . . . . . . . . 460 | |||
18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 460 | 18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 462 | |||
18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 462 | 18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 463 | |||
18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 462 | 18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 464 | |||
18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 464 | 18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 466 | |||
18.22. Operation 25: READ - Read from File . . . . . . . . . . 465 | 18.22. Operation 25: READ - Read from File . . . . . . . . . . 466 | |||
18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 467 | 18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 469 | |||
18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 471 | 18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 472 | |||
18.25. Operation 28: REMOVE - Remove File System Object . . . . 472 | 18.25. Operation 28: REMOVE - Remove File System Object . . . . 473 | |||
18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 474 | 18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 476 | |||
18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 478 | 18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 479 | |||
18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 479 | 18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 480 | |||
18.29. Operation 33: SECINFO - Obtain Available Security . . . 480 | 18.29. Operation 33: SECINFO - Obtain Available Security . . . 481 | |||
18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 484 | 18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 485 | |||
18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 487 | 18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 488 | |||
18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 488 | 18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 489 | |||
18.33. Operation 40: BACKCHANNEL_CTL - Backchannel Control . . 492 | 18.33. Operation 40: BACKCHANNEL_CTL - Backchannel Control . . 494 | |||
18.34. Operation 41: BIND_CONN_TO_SESSION - Associate | 18.34. Operation 41: BIND_CONN_TO_SESSION - Associate | |||
Connection with Session . . . . . . . . . . . . . . . . 494 | Connection with Session . . . . . . . . . . . . . . . . 495 | |||
18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 497 | 18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 498 | |||
18.36. Operation 43: CREATE_SESSION - Create New Session and | 18.36. Operation 43: CREATE_SESSION - Create New Session and | |||
Confirm Client ID . . . . . . . . . . . . . . . . . . . 515 | Confirm Client ID . . . . . . . . . . . . . . . . . . . 516 | |||
18.37. Operation 44: DESTROY_SESSION - Destroy a Session . . . 525 | 18.37. Operation 44: DESTROY_SESSION - Destroy a Session . . . 526 | |||
18.38. Operation 45: FREE_STATEID - Free Stateid with No | 18.38. Operation 45: FREE_STATEID - Free Stateid with No | |||
Locks . . . . . . . . . . . . . . . . . . . . . . . . . 527 | Locks . . . . . . . . . . . . . . . . . . . . . . . . . 528 | |||
18.39. Operation 46: GET_DIR_DELEGATION - Get a directory | 18.39. Operation 46: GET_DIR_DELEGATION - Get a Directory | |||
delegation . . . . . . . . . . . . . . . . . . . . . . . 528 | Delegation . . . . . . . . . . . . . . . . . . . . . . . 529 | |||
18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 532 | 18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 533 | |||
18.41. Operation 48: GETDEVICELIST - Get All Device Mappings | 18.41. Operation 48: GETDEVICELIST - Get All Device Mappings | |||
for a File System . . . . . . . . . . . . . . . . . . . 535 | for a File System . . . . . . . . . . . . . . . . . . . 536 | |||
18.42. Operation 49: LAYOUTCOMMIT - Commit Writes Made Using | 18.42. Operation 49: LAYOUTCOMMIT - Commit Writes Made Using | |||
a Layout . . . . . . . . . . . . . . . . . . . . . . . . 536 | a Layout . . . . . . . . . . . . . . . . . . . . . . . . 537 | |||
18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 540 | 18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 541 | |||
18.44. Operation 51: LAYOUTRETURN - Release Layout | 18.44. Operation 51: LAYOUTRETURN - Release Layout | |||
Information . . . . . . . . . . . . . . . . . . . . . . 549 | Information . . . . . . . . . . . . . . . . . . . . . . 551 | |||
18.45. Operation 52: SECINFO_NO_NAME - Get Security on | 18.45. Operation 52: SECINFO_NO_NAME - Get Security on | |||
Unnamed Object . . . . . . . . . . . . . . . . . . . . . 553 | Unnamed Object . . . . . . . . . . . . . . . . . . . . . 556 | |||
18.46. Operation 53: SEQUENCE - Supply Per-Procedure | 18.46. Operation 53: SEQUENCE - Supply Per-Procedure | |||
Sequencing and Control . . . . . . . . . . . . . . . . . 554 | Sequencing and Control . . . . . . . . . . . . . . . . . 557 | |||
18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 560 | 18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 563 | |||
18.48. Operation 55: TEST_STATEID - Test Stateids for | 18.48. Operation 55: TEST_STATEID - Test Stateids for | |||
Validity . . . . . . . . . . . . . . . . . . . . . . . . 563 | Validity . . . . . . . . . . . . . . . . . . . . . . . . 565 | |||
18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 564 | 18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 567 | |||
18.50. Operation 57: DESTROY_CLIENTID - Destroy a Client ID . . 568 | 18.50. Operation 57: DESTROY_CLIENTID - Destroy a Client ID . . 571 | |||
18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims | 18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims | |||
Finished . . . . . . . . . . . . . . . . . . . . . . . . 568 | Finished . . . . . . . . . . . . . . . . . . . . . . . . 571 | |||
18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 571 | 18.52. Operation 10044: ILLEGAL - Illegal Operation . . . . . . 574 | |||
19. NFSv4.1 Callback Procedures . . . . . . . . . . . . . . . . . 571 | 19. NFSv4.1 Callback Procedures . . . . . . . . . . . . . . . . . 574 | |||
19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 572 | 19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 575 | |||
19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 572 | 19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 575 | |||
20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 576 | 20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 579 | |||
20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 576 | 20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 579 | |||
20.2. Operation 4: CB_RECALL - Recall a Delegation . . . . . . 577 | 20.2. Operation 4: CB_RECALL - Recall a Delegation . . . . . . 580 | |||
20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from | 20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from | |||
Client . . . . . . . . . . . . . . . . . . . . . . . . . 578 | Client . . . . . . . . . . . . . . . . . . . . . . . . . 581 | |||
20.4. Operation 6: CB_NOTIFY - Notify Client of Directory | 20.4. Operation 6: CB_NOTIFY - Notify Client of Directory | |||
Changes . . . . . . . . . . . . . . . . . . . . . . . . 582 | Changes . . . . . . . . . . . . . . . . . . . . . . . . 585 | |||
20.5. Operation 7: CB_PUSH_DELEG - Offer Previously | 20.5. Operation 7: CB_PUSH_DELEG - Offer Previously | |||
Requested Delegation to Client . . . . . . . . . . . . . 586 | Requested Delegation to Client . . . . . . . . . . . . . 589 | |||
20.6. Operation 8: CB_RECALL_ANY - Keep Any N Recallable | 20.6. Operation 8: CB_RECALL_ANY - Keep Any N Recallable | |||
Objects . . . . . . . . . . . . . . . . . . . . . . . . 587 | Objects . . . . . . . . . . . . . . . . . . . . . . . . 590 | |||
20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal | 20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal | |||
Resources for Recallable Objects . . . . . . . . . . . . 590 | Resources for Recallable Objects . . . . . . . . . . . . 593 | |||
20.8. Operation 10: CB_RECALL_SLOT - Change Flow Control | 20.8. Operation 10: CB_RECALL_SLOT - Change Flow Control | |||
Limits . . . . . . . . . . . . . . . . . . . . . . . . . 591 | Limits . . . . . . . . . . . . . . . . . . . . . . . . . 594 | |||
20.9. Operation 11: CB_SEQUENCE - Supply Backchannel | 20.9. Operation 11: CB_SEQUENCE - Supply Backchannel | |||
Sequencing and Control . . . . . . . . . . . . . . . . . 592 | Sequencing and Control . . . . . . . . . . . . . . . . . 595 | |||
20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending | 20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending | |||
Delegation Wants . . . . . . . . . . . . . . . . . . . . 594 | Delegation Wants . . . . . . . . . . . . . . . . . . . . 597 | |||
20.11. Operation 13: CB_NOTIFY_LOCK - Notify Client of | 20.11. Operation 13: CB_NOTIFY_LOCK - Notify Client of | |||
Possible Lock Availability . . . . . . . . . . . . . . . 595 | Possible Lock Availability . . . . . . . . . . . . . . . 598 | |||
20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify Client of | 20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify Client of | |||
Device ID Changes . . . . . . . . . . . . . . . . . . . 597 | Device ID Changes . . . . . . . . . . . . . . . . . . . 600 | |||
20.13. Operation 10044: CB_ILLEGAL - Illegal Callback | 20.13. Operation 10044: CB_ILLEGAL - Illegal Callback | |||
Operation . . . . . . . . . . . . . . . . . . . . . . . 599 | Operation . . . . . . . . . . . . . . . . . . . . . . . 602 | |||
21. Security Considerations . . . . . . . . . . . . . . . . . . . 599 | 21. Security Considerations . . . . . . . . . . . . . . . . . . . 602 | |||
22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 601 | 22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 604 | |||
22.1. Named Attribute Definitions . . . . . . . . . . . . . . 601 | 22.1. Named Attribute Definitions . . . . . . . . . . . . . . 604 | |||
22.1.1. Initial Registry . . . . . . . . . . . . . . . . . . 602 | 22.1.1. Initial Registry . . . . . . . . . . . . . . . . . . 605 | |||
22.1.2. Updating Registrations . . . . . . . . . . . . . . . 602 | 22.1.2. Updating Registrations . . . . . . . . . . . . . . . 605 | |||
22.2. Device ID Notifications . . . . . . . . . . . . . . . . 602 | 22.2. Device ID Notifications . . . . . . . . . . . . . . . . 605 | |||
22.2.1. Initial Registry . . . . . . . . . . . . . . . . . . 603 | 22.2.1. Initial Registry . . . . . . . . . . . . . . . . . . 606 | |||
22.2.2. Updating Registrations . . . . . . . . . . . . . . . 604 | 22.2.2. Updating Registrations . . . . . . . . . . . . . . . 607 | |||
22.3. Object Recall Types . . . . . . . . . . . . . . . . . . 604 | 22.3. Object Recall Types . . . . . . . . . . . . . . . . . . 607 | |||
22.3.1. Initial Registry . . . . . . . . . . . . . . . . . . 605 | 22.3.1. Initial Registry . . . . . . . . . . . . . . . . . . 608 | |||
22.3.2. Updating Registrations . . . . . . . . . . . . . . . 605 | 22.3.2. Updating Registrations . . . . . . . . . . . . . . . 608 | |||
22.4. Layout Types . . . . . . . . . . . . . . . . . . . . . . 605 | 22.4. Layout Types . . . . . . . . . . . . . . . . . . . . . . 608 | |||
22.4.1. Initial Registry . . . . . . . . . . . . . . . . . . 606 | 22.4.1. Initial Registry . . . . . . . . . . . . . . . . . . 609 | |||
22.4.2. Updating Registrations . . . . . . . . . . . . . . . 607 | 22.4.2. Updating Registrations . . . . . . . . . . . . . . . 610 | |||
22.4.3. Guidelines for Writing Layout Type Specifications . 607 | 22.4.3. Guidelines for Writing Layout Type Specifications . 610 | |||
22.5. Path Variable Definitions . . . . . . . . . . . . . . . 608 | 22.5. Path Variable Definitions . . . . . . . . . . . . . . . 611 | |||
22.5.1. Path Variables Registry . . . . . . . . . . . . . . 608 | 22.5.1. Path Variables Registry . . . . . . . . . . . . . . 612 | |||
22.5.2. Values for the ${ietf.org:CPU_ARCH} Variable . . . . 610 | 22.5.2. Values for the ${ietf.org:CPU_ARCH} Variable . . . . 613 | |||
22.5.3. Values for the ${ietf.org:OS_TYPE} Variable . . . . 611 | 22.5.3. Values for the ${ietf.org:OS_TYPE} Variable . . . . 614 | |||
23. References . . . . . . . . . . . . . . . . . . . . . . . . . 612 | 23. References . . . . . . . . . . . . . . . . . . . . . . . . . 615 | |||
23.1. Normative References . . . . . . . . . . . . . . . . . . 612 | 23.1. Normative References . . . . . . . . . . . . . . . . . . 615 | |||
23.2. Informative References . . . . . . . . . . . . . . . . . 614 | 23.2. Informative References . . . . . . . . . . . . . . . . . 617 | |||
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 616 | Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 619 | |||
Appendix B. RFC Editor Notes . . . . . . . . . . . . . . . . . . 618 | ||||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 619 | ||||
1. Introduction | 1. Introduction | |||
1.1. The NFS Version 4 Minor Version 1 Protocol | 1.1. The NFS Version 4 Minor Version 1 Protocol | |||
The NFS version 4 minor version 1 (NFSv4.1) protocol is the second | The NFS version 4 minor version 1 (NFSv4.1) protocol is the second | |||
minor version of the NFS version 4 (NFSv4) protocol. The first minor | minor version of the NFS version 4 (NFSv4) protocol. The first minor | |||
version, NFSv4.0 is described in [30]. It generally follows the | version, NFSv4.0, is described in [30]. It generally follows the | |||
guidelines for minor versioning model listed in Section 10 of RFC | guidelines for minor versioning that are listed in Section 10 of RFC | |||
3530. However, it diverges from guidelines 11 ("a client and server | 3530. However, it diverges from guidelines 11 ("a client and server | |||
that supports minor version X must support minor versions 0 through | that support minor version X must support minor versions 0 through | |||
X-1"), and 12 ("no features may be introduced as mandatory in a minor | X-1") and 12 ("no new features may be introduced as mandatory in a | |||
version"). These divergences are due to the introduction of the | minor version"). These divergences are due to the introduction of | |||
sessions model for managing non-idempotent operations and the | the sessions model for managing non-idempotent operations and the | |||
RECLAIM_COMPLETE operation. These two new features are | RECLAIM_COMPLETE operation. These two new features are | |||
infrastructural in nature and simplify implementation of existing and | infrastructural in nature and simplify implementation of existing and | |||
other new features. Making them anything but REQUIRED would add | other new features. Making them anything but REQUIRED would add | |||
undue complexity to protocol definition and implementation. NFSv4.1 | undue complexity to protocol definition and implementation. NFSv4.1 | |||
accordingly updates the Minor Versioning guidelines (Section 2.7). | accordingly updates the minor versioning guidelines (Section 2.7). | |||
As a minor version, NFSv4.1 is consistent with the overall goals for | As a minor version, NFSv4.1 is consistent with the overall goals for | |||
NFSv4, but extends the protocol so as to better meet those goals, | NFSv4, but extends the protocol so as to better meet those goals, | |||
based on experiences with NFSv4.0. In addition, NFSv4.1 has adopted | based on experiences with NFSv4.0. In addition, NFSv4.1 has adopted | |||
some additional goals, which motivate some of the major extensions in | some additional goals, which motivate some of the major extensions in | |||
NFSv4.1. | NFSv4.1. | |||
1.2. Scope of this Document | 1.2. Requirements Language | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | ||||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | ||||
document are to be interpreted as described in RFC 2119 [1]. | ||||
1.3. Scope of This Document | ||||
This document describes the NFSv4.1 protocol. With respect to | This document describes the NFSv4.1 protocol. With respect to | |||
NFSv4.0, this document does not: | NFSv4.0, this document does not: | |||
o describe the NFSv4.0 protocol, except where needed to contrast | o describe the NFSv4.0 protocol, except where needed to contrast | |||
with NFSv4.1. | with NFSv4.1. | |||
o modify the specification of the NFSv4.0 protocol. | o modify the specification of the NFSv4.0 protocol. | |||
o clarify the NFSv4.0 protocol. | o clarify the NFSv4.0 protocol. | |||
1.3. NFSv4 Goals | 1.4. NFSv4 Goals | |||
The NFSv4 protocol is a further revision of the NFS protocol defined | The NFSv4 protocol is a further revision of the NFS protocol defined | |||
already by NFSv3 [31]. It retains the essential characteristics of | already by NFSv3 [31]. It retains the essential characteristics of | |||
previous versions: easy recovery; independence of transport | previous versions: easy recovery; independence of transport | |||
protocols, operating systems and file systems; simplicity; and good | protocols, operating systems, and file systems; simplicity; and good | |||
performance. NFSv4 has the following goals: | performance. NFSv4 has the following goals: | |||
o Improved access and good performance on the Internet. | o Improved access and good performance on the Internet | |||
The protocol is designed to transit firewalls easily, perform well | The protocol is designed to transit firewalls easily, perform well | |||
where latency is high and bandwidth is low, and scale to very | where latency is high and bandwidth is low, and scale to very | |||
large numbers of clients per server. | large numbers of clients per server. | |||
o Strong security with negotiation built into the protocol. | o Strong security with negotiation built into the protocol | |||
The protocol builds on the work of the ONCRPC working group in | The protocol builds on the work of the ONCRPC working group in | |||
supporting the RPCSEC_GSS protocol. Additionally, the NFSv4.1 | supporting the RPCSEC_GSS protocol. Additionally, the NFSv4.1 | |||
protocol provides a mechanism to allow clients and servers the | protocol provides a mechanism to allow clients and servers the | |||
ability to negotiate security and require clients and servers to | ability to negotiate security and require clients and servers to | |||
support a minimal set of security schemes. | support a minimal set of security schemes. | |||
o Good cross-platform interoperability. | o Good cross-platform interoperability | |||
The protocol features a file system model that provides a useful, | The protocol features a file system model that provides a useful, | |||
common set of features that does not unduly favor one file system | common set of features that does not unduly favor one file system | |||
or operating system over another. | or operating system over another. | |||
o Designed for protocol extensions. | o Designed for protocol extensions | |||
The protocol is designed to accept standard extensions within a | The protocol is designed to accept standard extensions within a | |||
framework that enable and encourages backward compatibility. | framework that enables and encourages backward compatibility. | |||
1.4. NFSv4.1 Goals | 1.5. NFSv4.1 Goals | |||
NFSv4.1 has the following goals, within the framework established by | NFSv4.1 has the following goals, within the framework established by | |||
the overall NFSv4 goals. | the overall NFSv4 goals. | |||
o To correct significant structural weaknesses and oversights | o To correct significant structural weaknesses and oversights | |||
discovered in the base protocol. | discovered in the base protocol. | |||
o To add clarity and specificity to areas left unaddressed or not | o To add clarity and specificity to areas left unaddressed or not | |||
addressed in sufficient detail in the base protocol. However, as | addressed in sufficient detail in the base protocol. However, as | |||
stated in Section 1.2, it is not a goal to clarify the NFSv4.0 | stated in Section 1.3, it is not a goal to clarify the NFSv4.0 | |||
protocol in the NFSv4.1 specification. | protocol in the NFSv4.1 specification. | |||
o To add specific features based on experience with the existing | o To add specific features based on experience with the existing | |||
protocol and recent industry developments. | protocol and recent industry developments. | |||
o To provide protocol support to take advantage of clustered server | o To provide protocol support to take advantage of clustered server | |||
deployments including the ability to provide scalable parallel | deployments including the ability to provide scalable parallel | |||
access to files distributed among multiple servers. | access to files distributed among multiple servers. | |||
1.5. General Definitions | 1.6. General Definitions | |||
The following definitions are provided for the purpose of providing | The following definitions provide an appropriate context for the | |||
an appropriate context for the reader. | reader. | |||
Byte This document defines a byte as an octet, i.e. a datum exactly | Byte: In this document, a byte is an octet, i.e., a datum exactly 8 | |||
8 bits in length. | bits in length. | |||
Client The "client" is the entity that accesses the NFS server's | Client: The client is the entity that accesses the NFS server's | |||
resources. The client may be an application which contains the | resources. The client may be an application that contains the | |||
logic to access the NFS server directly. The client may also be | logic to access the NFS server directly. The client may also be | |||
the traditional operating system client that provides remote file | the traditional operating system client that provides remote file | |||
system services for a set of applications. | system services for a set of applications. | |||
A client is uniquely identified by a Client Owner. | A client is uniquely identified by a client owner. | |||
With reference to file locking, the client is also the entity that | With reference to byte-range locking, the client is also the | |||
maintains a set of locks on behalf of one or more applications. | entity that maintains a set of locks on behalf of one or more | |||
This client is responsible for crash or failure recovery for those | applications. This client is responsible for crash or failure | |||
locks it manages. | recovery for those locks it manages. | |||
Note that multiple clients may share the same transport and | Note that multiple clients may share the same transport and | |||
connection and multiple clients may exist on the same network | connection and multiple clients may exist on the same network | |||
node. | node. | |||
Client ID A 64-bit quantity used as a unique, short-hand reference | Client ID: The client ID is a 64-bit quantity used as a unique, | |||
to a client supplied Verifier and client owner. The server is | short-hand reference to a client-supplied verifier and client | |||
responsible for supplying the client ID. | owner. The server is responsible for supplying the client ID. | |||
Client Owner The client owner is a unique string, opaque to the | Client Owner: The client owner is a unique string, opaque to the | |||
server, which identifies a client. Multiple network connections | server, that identifies a client. Multiple network connections | |||
and source network addresses originating from those connections | and source network addresses originating from those connections | |||
may share a client owner. The server is expected to treat | may share a client owner. The server is expected to treat | |||
requests from connections with the same client owner as coming | requests from connections with the same client owner as coming | |||
from the same client. | from the same client. | |||
File System The collection of objects on a server (as identified by | File System: The file system is the collection of objects on a | |||
the major identifier of a Server Owner, which is defined later in | server (as identified by the major identifier of a server owner, | |||
this section), that share the same fsid attribute (see | which is defined later in this section) that share the same fsid | |||
Section 5.8.1.9). | attribute (see Section 5.8.1.9). | |||
Lease An interval of time defined by the server for which the client | Lease: A lease is an interval of time defined by the server for | |||
is irrevocably granted a lock. At the end of a lease period the | which the client is irrevocably granted locks. At the end of a | |||
lock may be revoked if the lease has not been extended. The lock | lease period, locks may be revoked if the lease has not been | |||
must be revoked if a conflicting lock has been granted after the | extended. A lock must be revoked if a conflicting lock has been | |||
lease interval. | granted after the lease interval. | |||
All leases granted by a server have the same fixed interval. Note | A server grants a client a single lease for all state. | |||
that the fixed interval was chosen to alleviate the expense a | ||||
server would have in maintaining state about variable length | ||||
leases across server failures. | ||||
Lock The term "lock" is used to refer to byte-range (in UNIX | Lock: The term "lock" is used to refer to byte-range (in UNIX | |||
environments, also known as record) locks, share reservations, | environments, also known as record) locks, share reservations, | |||
delegations, or layouts unless specifically stated otherwise. | delegations, or layouts unless specifically stated otherwise. | |||
Secret State Verifier (SSV) The SSV is a unique secret key shared | Secret State Verifier (SSV): The SSV is a unique secret key shared | |||
between a client and server. The SSV serves as the secret key for | between a client and server. The SSV serves as the secret key for | |||
an internal (that is, internal to NFSv4.1) GSS mechanism (the SSV | an internal (that is, internal to NFSv4.1) Generic Security | |||
GSS mechanism, see Section 2.10.9). The SSV GSS mechanism uses | Services (GSS) mechanism (the SSV GSS mechanism; see | |||
the SSV to compute Message Integrity Code (MIC) and Wrap tokens. | Section 2.10.9). The SSV GSS mechanism uses the SSV to compute | |||
See Section 2.10.8.3 for more details on how NFSv4.1 uses the SSV | message integrity code (MIC) and Wrap tokens. See | |||
and the SSV GSS mechanism. | Section 2.10.8.3 for more details on how NFSv4.1 uses the SSV and | |||
the SSV GSS mechanism. | ||||
Server The "Server" is the entity responsible for coordinating | Server: The Server is the entity responsible for coordinating client | |||
client access to a set of file systems and is identified by a | access to a set of file systems and is identified by a server | |||
Server owner. A server can span multiple network addresses. | owner. A server can span multiple network addresses. | |||
Server Owner The "Server Owner" identifies the server to the client. | Server Owner: The server owner identifies the server to the client. | |||
The server owner consists of a major and minor identifier. When | The server owner consists of a major identifier and a minor | |||
the client has two connections each to a peer with the same major | identifier. When the client has two connections each to a peer | |||
identifier, the client assumes both peers are the same server (the | with the same major identifier, the client assumes that both peers | |||
server namespace is the same via each connection), and assumes and | are the same server (the server namespace is the same via each | |||
lock state is sharable across both connections. When each peer | connection) and that lock state is sharable across both | |||
has both the same major and minor identifier, the client assumes | connections. When each peer has both the same major and minor | |||
each connection might be associable with the same session. | identifiers, the client assumes that each connection might be | |||
associable with the same session. | ||||
Stable Storage Stable storage is storage from which data stored by | Stable Storage: Stable storage is storage from which data stored by | |||
an NFSv4.1 server can be recovered without data loss from multiple | an NFSv4.1 server can be recovered without data loss from multiple | |||
power failures (including cascading power failures, that is, | power failures (including cascading power failures, that is, | |||
several power failures in quick succession), operating system | several power failures in quick succession), operating system | |||
failures, and/or hardware failure of components other than the | failures, and/or hardware failure of components other than the | |||
storage medium itself (such as disk, nonvolatile RAM, flash | storage medium itself (such as disk, nonvolatile RAM, flash | |||
memory, etc.). | memory, etc.). | |||
Some examples of stable storage that are allowable for an NFS | Some examples of stable storage that are allowable for an NFS | |||
server include: | server include: | |||
1. Media commit of data, that is, the modified data has been | 1. Media commit of data; that is, the modified data has been | |||
successfully written to the disk media, for example, the disk | successfully written to the disk media, for example, the disk | |||
platter. | platter. | |||
2. An immediate reply disk drive with battery-backed on- drive | 2. An immediate reply disk drive with battery-backed, on-drive | |||
intermediate storage or uninterruptible power system (UPS). | intermediate storage or uninterruptible power system (UPS). | |||
3. Server commit of data with battery-backed intermediate storage | 3. Server commit of data with battery-backed intermediate storage | |||
and recovery software. | and recovery software. | |||
4. Cache commit with uninterruptible power system (UPS) and | 4. Cache commit with uninterruptible power system (UPS) and | |||
recovery software. | recovery software. | |||
Stateid A 128-bit quantity returned by a server that uniquely | Stateid: A stateid is a 128-bit quantity returned by a server that | |||
defines the open and locking state provided by the server for a | uniquely defines the open and locking states provided by the | |||
specific open-owner or lock-owner/open-owner pair for a specific | server for a specific open-owner or lock-owner/open-owner pair for | |||
file and type of lock. | a specific file and type of lock. | |||
Verifier A 64-bit quantity generated by the client that the server | Verifier: A verifier is a 64-bit quantity generated by the client | |||
can use to determine if the client has restarted and lost all | that the server can use to determine if the client has restarted | |||
previous lock state. | and lost all previous lock state. | |||
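The Server Owner definition above allows a client to draw two inferences when it compares the peers on two connections. The following Python is a minimal sketch of that comparison and is not taken from either compared document. The names ServerOwner and classify_peers, the result strings, and the field types (an opaque byte string for the major identifier, an integer for the minor identifier) are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerOwner:
    """Hypothetical stand-in for a server owner: a major and a minor identifier."""
    major_id: bytes  # assumption: treated as an opaque byte string
    minor_id: int    # assumption: treated as an integer


def classify_peers(a: ServerOwner, b: ServerOwner) -> str:
    """Return what a client may assume about the peers on two connections."""
    if a.major_id != b.major_id:
        # The definition grants no assumption when major identifiers differ.
        return "no-assumption"
    if a.minor_id == b.minor_id:
        # Same major and minor identifiers: each connection might be
        # associable with the same session.
        return "same-server-maybe-same-session"
    # Same major identifier only: the peers are the same server and
    # lock state is sharable across both connections.
    return "same-server-shared-lock-state"
```

The sketch deliberately returns "no-assumption" rather than "different servers" when the major identifiers differ, because the definition only states what matching identifiers imply.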
1.6. Overview of NFSv4.1 Features | 1.7. Overview of NFSv4.1 Features | |||
To provide a reasonable context for the reader, the major features of | The major features of the NFSv4.1 protocol will be reviewed in brief. | |||
the NFSv4.1 protocol will be reviewed in brief. This will be done to | This will be done to provide an appropriate context for both the | |||
provide an appropriate context for both the reader who is familiar | reader who is familiar with the previous versions of the NFS protocol | |||
with the previous versions of the NFS protocol and the reader that is | and the reader who is new to the NFS protocols. For the reader new | |||
new to the NFS protocols. For the reader new to the NFS protocols, | to the NFS protocols, there is still a set of fundamental knowledge | |||
there is still a set of fundamental knowledge that is expected. The | that is expected. The reader should be familiar with the External | |||
reader should be familiar with the XDR and RPC protocols as described | Data Representation (XDR) and Remote Procedure Call (RPC) protocols | |||
in [2] and [3]. A basic knowledge of file systems and distributed | as described in [2] and [3]. A basic knowledge of file systems and | |||
file systems is expected as well. | distributed file systems is expected as well. | |||
In general this specification of NFSv4.1 will not distinguish those | In general, this specification of NFSv4.1 will not distinguish those | |||
features added in minor version one from those present in the base | features added in minor version 1 from those present in the base | |||
protocol but will treat NFSv4.1 as a unified whole. See Section 1.7 | protocol but will treat NFSv4.1 as a unified whole. See Section 1.8 | |||
for a summary of the differences between NFSv4.0 and NFSv4.1. | for a summary of the differences between NFSv4.0 and NFSv4.1. | |||
1.6.1. RPC and Security | 1.7.1. RPC and Security | |||
As with previous versions of NFS, the External Data Representation | As with previous versions of NFS, the External Data Representation | |||
(XDR) and Remote Procedure Call (RPC) mechanisms used for the NFSv4.1 | (XDR) and Remote Procedure Call (RPC) mechanisms used for the NFSv4.1 | |||
protocol are those defined in [2] and [3]. To meet end-to-end | protocol are those defined in [2] and [3]. To meet end-to-end | |||
security requirements, the RPCSEC_GSS framework [4] is used to extend | security requirements, the RPCSEC_GSS framework [4] is used to extend | |||
the basic RPC security. With the use of RPCSEC_GSS, various | the basic RPC security. With the use of RPCSEC_GSS, various | |||
mechanisms can be provided to offer authentication, integrity, and | mechanisms can be provided to offer authentication, integrity, and | |||
privacy to the NFSv4 protocol. Kerberos V5 is used as described in | privacy to the NFSv4 protocol. Kerberos V5 is used as described in | |||
[5] to provide one security framework. With the use of RPCSEC_GSS, | [5] to provide one security framework. With the use of RPCSEC_GSS, | |||
other mechanisms may also be specified and used for NFSv4.1 security. | other mechanisms may also be specified and used for NFSv4.1 security. | |||
To enable in-band security negotiation, the NFSv4.1 protocol has | To enable in-band security negotiation, the NFSv4.1 protocol has | |||
operations which provide the client a method of querying the server | operations that provide the client a method of querying the server | |||
about its policies regarding which security mechanisms must be used | about its policies regarding which security mechanisms must be used | |||
for access to the server's file system resources. With this, the | for access to the server's file system resources. With this, the | |||
client can securely match the security mechanism that meets the | client can securely match the security mechanism that meets the | |||
policies specified at both the client and server. | policies specified at both the client and server. | |||
NFSv4.1 introduces parallel access (see Section 1.6.2.2), which is | NFSv4.1 introduces parallel access (see Section 1.7.2.2), which is | |||
called pNFS. The security framework described in this section is | called pNFS. The security framework described in this section is | |||
significantly modified by the introduction of pNFS (see | significantly modified by the introduction of pNFS (see | |||
Section 12.9), because data access is sometimes not over RPC. The | Section 12.9), because data access is sometimes not over RPC. The | |||
level of significance varies with the Storage Protocol (see | level of significance varies with the storage protocol (see | |||
Section 12.2.5) and can be as low as zero impact (see Section 13.12). | Section 12.2.5) and can be as low as zero impact (see Section 13.12). | |||
1.6.2. Protocol Structure | 1.7.2. Protocol Structure | |||
1.6.2.1. Core Protocol | 1.7.2.1. Core Protocol | |||
Unlike NFSv3, which used a series of ancillary protocols (e.g. NLM, | Unlike NFSv3, which used a series of ancillary protocols (e.g., NLM, | |||
NSM, MOUNT), within all minor versions of NFSv4 a single RPC protocol | NSM (Network Status Monitor), MOUNT), within all minor versions of | |||
is used to make requests to the server. Facilities that had been | NFSv4 a single RPC protocol is used to make requests to the server. | |||
separate protocols, such as locking, are now integrated within a | Facilities that had been separate protocols, such as locking, are now | |||
single unified protocol. | integrated within a single unified protocol. | |||
1.6.2.2. Parallel Access | 1.7.2.2. Parallel Access | |||
Minor version one supports high-performance data access to a | Minor version 1 supports high-performance data access to a clustered | |||
clustered server implementation by enabling a separation of metadata | server implementation by enabling a separation of metadata access and | |||
access and data access, with the latter done to multiple servers in | data access, with the latter done to multiple servers in parallel. | |||
parallel. | ||||
Such parallel data access is controlled by recallable objects known | Such parallel data access is controlled by recallable objects known | |||
as "layouts", which are integrated into the protocol locking model. | as "layouts", which are integrated into the protocol locking model. | |||
Clients direct requests for data access to a set of data servers | Clients direct requests for data access to a set of data servers | |||
specified by the layout via a data storage protocol which may be | specified by the layout via a data storage protocol which may be | |||
NFSv4.1 or may be another protocol. | NFSv4.1 or may be another protocol. | |||
Because the protocols used for parallel data access are not | Because the protocols used for parallel data access are not | |||
necessarily RPC-based, the RPC-based security model (Section 1.6.1) | necessarily RPC-based, the RPC-based security model (Section 1.7.1) | |||
is obviously impacted (see Section 12.9). The degree of impact | is obviously impacted (see Section 12.9). The degree of impact | |||
varies with the Storage Protocol (see Section 12.2.5) used for data | varies with the storage protocol (see Section 12.2.5) used for data | |||
access, and can be as low as zero (see Section 13.12). | access, and can be as low as zero (see Section 13.12). | |||
1.6.3. File System Model | 1.7.3. File System Model | |||
The general file system model used for the NFSv4.1 protocol is the | The general file system model used for the NFSv4.1 protocol is the | |||
same as previous versions. The server file system is hierarchical | same as previous versions. The server file system is hierarchical | |||
with the regular files contained within being treated as opaque byte | with the regular files contained within being treated as opaque byte | |||
streams. In a slight departure, file and directory names are encoded | streams. In a slight departure, file and directory names are encoded | |||
with UTF-8 to deal with the basics of internationalization. | with UTF-8 to deal with the basics of internationalization. | |||
The NFSv4.1 protocol does not require a separate protocol to provide | The NFSv4.1 protocol does not require a separate protocol to provide | |||
for the initial mapping between path name and filehandle. All file | for the initial mapping between path name and filehandle. All file | |||
systems exported by a server are presented as a tree so that all file | systems exported by a server are presented as a tree so that all file | |||
systems are reachable from a special per-server global root | systems are reachable from a special per-server global root | |||
filehandle. This allows LOOKUP operations to be used to perform | filehandle. This allows LOOKUP operations to be used to perform | |||
functions previously provided by the MOUNT protocol. The server | functions previously provided by the MOUNT protocol. The server | |||
provides any necessary pseudo file systems to bridge any gaps that | provides any necessary pseudo file systems to bridge any gaps that | |||
arise due to unexported gaps between exported file systems. | arise due to unexported gaps between exported file systems. | |||
1.6.3.1. Filehandles | 1.7.3.1. Filehandles | |||
As in previous versions of the NFS protocol, opaque filehandles are | As in previous versions of the NFS protocol, opaque filehandles are | |||
used to identify individual files and directories. Lookup-type and | used to identify individual files and directories. Lookup-type and | |||
create operations translate file and directory names to filehandles | create operations translate file and directory names to filehandles, | |||
which are then used to identify objects in subsequent operations. | which are then used to identify objects in subsequent operations. | |||
The NFSv4.1 protocol provides support for persistent filehandles, | The NFSv4.1 protocol provides support for persistent filehandles, | |||
guaranteed to be valid for the lifetime of the file system object | guaranteed to be valid for the lifetime of the file system object | |||
designated. In addition it provides support to servers to provide | designated. In addition, it provides support to servers to provide | |||
filehandles with more limited validity guarantees, called volatile | filehandles with more limited validity guarantees, called volatile | |||
filehandles. | filehandles. | |||
1.6.3.2. File Attributes | 1.7.3.2. File Attributes | |||
The NFSv4.1 protocol has a rich and extensible file object attribute | The NFSv4.1 protocol has a rich and extensible file object attribute | |||
structure, which is divided into REQUIRED, RECOMMENDED, and named | structure, which is divided into REQUIRED, RECOMMENDED, and named | |||
attributes (see Section 5). | attributes (see Section 5). | |||
Several (but not all) of the REQUIRED attributes are derived from the | Several (but not all) of the REQUIRED attributes are derived from the | |||
attributes of NFSv3 (see the definition of the fattr3 data type in | attributes of NFSv3 (see the definition of the fattr3 data type in | |||
[31]). An example of a REQUIRED attribute is the file object's type | [31]). An example of a REQUIRED attribute is the file object's type | |||
(Section 5.8.1.2) so that regular files can be distinguished from | (Section 5.8.1.2) so that regular files can be distinguished from | |||
directories (also known as folders in some operating environments) | directories (also known as folders in some operating environments) | |||
and other types of objects. REQUIRED attributes are discussed in | and other types of objects. REQUIRED attributes are discussed in | |||
Section 5.1. | Section 5.1. | |||
An example of three RECOMMENDED attributes are acl, sacl, and dacl. | An example of three RECOMMENDED attributes are acl, sacl, and dacl. | |||
These attributes define an Access Control List (ACL) on a file object | These attributes define an Access Control List (ACL) on a file object | |||
(Section 6). An ACL provides directory and file access control | (Section 6). An ACL provides directory and file access control | |||
beyond the model used in NFSv3. The ACL definition allows for | beyond the model used in NFSv3. The ACL definition allows for | |||
specification of specific sets of permissions for individual users | specification of specific sets of permissions for individual users | |||
and groups. In addition, ACL inheritance allows propagation of | and groups. In addition, ACL inheritance allows propagation of | |||
access permissions and restriction down a directory tree as file | access permissions and restrictions down a directory tree as file | |||
system objects are created. RECOMMENDED attributes are discussed in | system objects are created. RECOMMENDED attributes are discussed in | |||
Section 5.2. | Section 5.2. | |||
A named attribute is an opaque byte stream that is associated with a | A named attribute is an opaque byte stream that is associated with a | |||
directory or file and referred to by a string name. Named attributes | directory or file and referred to by a string name. Named attributes | |||
are meant to be used by client applications as a method to associate | are meant to be used by client applications as a method to associate | |||
application-specific data with a regular file or directory. NFSv4.1 | application-specific data with a regular file or directory. NFSv4.1 | |||
modifies named attributes relative to NFSv4.0 by tightening the | modifies named attributes relative to NFSv4.0 by tightening the | |||
allowed operations in order to prevent the development of non- | allowed operations in order to prevent the development of non- | |||
interoperable implementations. Named attributes are discussed in | interoperable implementations. Named attributes are discussed in | |||
Section 5.3. | Section 5.3. | |||
1.6.3.3. Multi-server Namespace | 1.7.3.3. Multi-Server Namespace | |||
NFSv4.1 contains a number of features to allow implementation of | NFSv4.1 contains a number of features to allow implementation of | |||
namespaces that cross server boundaries and that allow and facilitate | namespaces that cross server boundaries and that allow and facilitate | |||
a non-disruptive transfer of support for individual file systems | a non-disruptive transfer of support for individual file systems | |||
between servers. They are all based upon attributes that allow one | between servers. They are all based upon attributes that allow one | |||
file system to specify alternate or new locations for that file | file system to specify alternate or new locations for that file | |||
system. | system. | |||
These attributes may be used together with the concept of absent file | These attributes may be used together with the concept of absent file | |||
systems, which provide specifications for additional locations but no | systems, which provide specifications for additional locations but no | |||
skipping to change at page 19, line 33 | skipping to change at page 18, line 34 | |||
o Location attributes may be provided for present file systems to | o Location attributes may be provided for present file systems to | |||
provide the locations of alternate file system instances or | provide the locations of alternate file system instances or | |||
replicas to be used in the event that the current file system | replicas to be used in the event that the current file system | |||
instance becomes unavailable. | instance becomes unavailable. | |||
o Location attributes may be provided when a previously present file | o Location attributes may be provided when a previously present file | |||
system becomes absent. This allows non-disruptive migration of | system becomes absent. This allows non-disruptive migration of | |||
file systems to alternate servers. | file systems to alternate servers. | |||
1.6.4. Locking Facilities | 1.7.4. Locking Facilities | |||
As mentioned previously, NFS v4.1 is a single protocol which includes | As mentioned previously, NFSv4.1 is a single protocol that includes | |||
locking facilities. These locking facilities include support for | locking facilities. These locking facilities include support for | |||
many types of locks including a number of sorts of recallable locks. | many types of locks including a number of sorts of recallable locks. | |||
Recallable locks such as delegations allow the client to be assured | Recallable locks such as delegations allow the client to be assured | |||
that certain events will not occur so long as that lock is held. | that certain events will not occur so long as that lock is held. | |||
When circumstances change, the lock is recalled via a callback | When circumstances change, the lock is recalled via a callback | |||
request. The assurances provided by delegations allow more extensive | request. The assurances provided by delegations allow more extensive | |||
caching to be done safely when circumstances allow it. | caching to be done safely when circumstances allow it. | |||
The types of locks are: | The types of locks are: | |||
skipping to change at page 20, line 11 | skipping to change at page 19, line 15 | |||
o File delegations, which are recallable locks that assure the | o File delegations, which are recallable locks that assure the | |||
holder that inconsistent opens and file changes cannot occur so | holder that inconsistent opens and file changes cannot occur so | |||
long as the delegation is held. | long as the delegation is held. | |||
o Directory delegations, which are recallable locks that assure the | o Directory delegations, which are recallable locks that assure the | |||
holder that inconsistent directory modifications cannot occur so | holder that inconsistent directory modifications cannot occur so | |||
long as the delegation is held. | long as the delegation is held. | |||
o Layouts, which are recallable objects that assure the holder that | o Layouts, which are recallable objects that assure the holder that | |||
direct access to the file data may be performed directly by the | direct access to the file data may be performed directly by the | |||
client and that no change to the data's location inconsistent with | client and that no change to the data's location that is | |||
that access may be made so long as the layout is held. | inconsistent with that access may be made so long as the layout is | |||
held. | ||||
All locks for a given client are tied together under a single client- | All locks for a given client are tied together under a single client- | |||
wide lease. All requests made on sessions associated with the client | wide lease. All requests made on sessions associated with the client | |||
renew that lease. When leases are not promptly renewed locks are | renew that lease. When the client's lease is not promptly renewed, | |||
subject to revocation. In the event of server restart, clients have | the client's locks are subject to revocation. In the event of server | |||
the opportunity to safely reclaim their locks within a special grace | restart, clients have the opportunity to safely reclaim their locks | |||
period. | within a special grace period. | |||
1.7. Differences from NFSv4.0 | 1.8. Differences from NFSv4.0 | |||
The following summarizes the major differences between minor version | The following summarizes the major differences between minor version | |||
one and the base protocol: | 1 and the base protocol: | |||
o Implementation of the sessions model (Section 2.10). | o Implementation of the sessions model (Section 2.10). | |||
o Parallel access to data (Section 12). | o Parallel access to data (Section 12). | |||
o Addition of the RECLAIM_COMPLETE operation to better structure the | o Addition of the RECLAIM_COMPLETE operation to better structure the | |||
lock reclamation process (Section 18.51). | lock reclamation process (Section 18.51). | |||
o Enhanced delegation support as follows. | o Enhanced delegation support as follows. | |||
* Delegations on directories and other file types in addition to | * Delegations on directories and other file types in addition to | |||
regular files (Section 18.39, Section 18.49). | regular files (Section 18.39, Section 18.49). | |||
* Operations to optimize acquisition of recalled or denied | * Operations to optimize acquisition of recalled or denied | |||
delegations (Section 18.49, Section 20.5, Section 20.7). | delegations (Section 18.49, Section 20.5, Section 20.7). | |||
* Notifications of changes to files and directories | * Notifications of changes to files and directories | |||
(Section 18.39, Section 20.4). | (Section 18.39, Section 20.4). | |||
* A method to allow a server to indicate it is recalling one or | * A method to allow a server to indicate that it is recalling one | |||
more delegations for resource management reasons, and thus a | or more delegations for resource management reasons, and thus a | |||
method to allow the client to pick which delegations to return | method to allow the client to pick which delegations to return | |||
(Section 20.6). | (Section 20.6). | |||
o Attributes can be set atomically during exclusive file create via | o Attributes can be set atomically during exclusive file create via | |||
the OPEN operation (see the new EXCLUSIVE4_1 creation method in | the OPEN operation (see the new EXCLUSIVE4_1 creation method in | |||
Section 18.16). | Section 18.16). | |||
o Open files can be preserved if removed and the hard link count | o Open files can be preserved if removed and the hard link count | |||
("hard link" is defined in an Open Group [6] standard) goes to | ("hard link" is defined in an Open Group [6] standard) goes to | |||
zero thus obviating the need for clients to rename deleted files | zero, thus obviating the need for clients to rename deleted files | |||
to partially hidden names -- colloquially called "silly rename" | to partially hidden names -- colloquially called "silly rename" | |||
(see the new OPEN4_RESULT_PRESERVE_UNLINKED reply flag in | (see the new OPEN4_RESULT_PRESERVE_UNLINKED reply flag in | |||
Section 18.16). | Section 18.16). | |||
o Improved compatibility with Microsoft Windows for Access Control | o Improved compatibility with Microsoft Windows for Access Control | |||
Lists (Section 6.2.3, Section 6.2.2, Section 6.4.3.2). | Lists (Section 6.2.3, Section 6.2.2, Section 6.4.3.2). | |||
o Data retention (Section 5.13). | o Data retention (Section 5.13). | |||
o Identification of the implementation of the NFS client and server | o Identification of the implementation of the NFS client and server | |||
skipping to change at page 21, line 41 | skipping to change at page 20, line 45 | |||
NFSv4.1 relies on core infrastructure common to nearly every | NFSv4.1 relies on core infrastructure common to nearly every | |||
operation. This core infrastructure is described in the remainder of | operation. This core infrastructure is described in the remainder of | |||
this section. | this section. | |||
2.2. RPC and XDR | 2.2. RPC and XDR | |||
The NFSv4.1 protocol is a Remote Procedure Call (RPC) application | The NFSv4.1 protocol is a Remote Procedure Call (RPC) application | |||
that uses RPC version 2 and the corresponding eXternal Data | that uses RPC version 2 and the corresponding eXternal Data | |||
Representation (XDR) as defined in [3] and [2]. | Representation (XDR) as defined in [3] and [2]. | |||
2.2.1. RPC-based Security | 2.2.1. RPC-Based Security | |||
Previous NFS versions have been thought of as having a host-based | Previous NFS versions have been thought of as having a host-based | |||
authentication model, where the NFS server authenticates the NFS | authentication model, where the NFS server authenticates the NFS | |||
client, and trusts the client to authenticate all users. Actually, | client, and trusts the client to authenticate all users. Actually, | |||
NFS has always depended on RPC for authentication. One of the first | NFS has always depended on RPC for authentication. One of the first | |||
forms of RPC authentication, AUTH_SYS, had no strong authentication, | forms of RPC authentication, AUTH_SYS, had no strong authentication | |||
and required a host-based authentication approach. NFSv4.1 also | and required a host-based authentication approach. NFSv4.1 also | |||
depends on RPC for basic security services, and mandates RPC support | depends on RPC for basic security services and mandates RPC support | |||
for a user-based authentication model. The user-based authentication | for a user-based authentication model. The user-based authentication | |||
model has user principals authenticated by a server, and in turn the | model has user principals authenticated by a server, and in turn the | |||
server authenticated by user principals. RPC provides some basic | server authenticated by user principals. RPC provides some basic | |||
security services which are used by NFSv4.1. | security services that are used by NFSv4.1. | |||
2.2.1.1. RPC Security Flavors | 2.2.1.1. RPC Security Flavors | |||
As described in section 7.2 "Authentication" of [3], RPC security is | As described in Section 7.2 ("Authentication") of [3], RPC security | |||
encapsulated in the RPC header, via a security or authentication | is encapsulated in the RPC header, via a security or authentication | |||
flavor, and information specific to the specified security flavor. | flavor, and information specific to the specified security flavor. | |||
Every RPC header conveys information used to identify and | Every RPC header conveys information used to identify and | |||
authenticate a client and server. As discussed in Section 2.2.1.1.1, | authenticate a client and server. As discussed in Section 2.2.1.1.1, | |||
some security flavors provide additional security services. | some security flavors provide additional security services. | |||
NFSv4.1 clients and servers MUST implement RPCSEC_GSS. (This | NFSv4.1 clients and servers MUST implement RPCSEC_GSS. (This | |||
requirement to implement is not a requirement to use.) Other | requirement to implement is not a requirement to use.) Other | |||
flavors, such as AUTH_NONE, and AUTH_SYS, MAY be implemented as well. | flavors, such as AUTH_NONE and AUTH_SYS, MAY be implemented as well. | |||
2.2.1.1.1. RPCSEC_GSS and Security Services | 2.2.1.1.1. RPCSEC_GSS and Security Services | |||
RPCSEC_GSS ([4]) uses the functionality of GSS-API [7]. This allows | RPCSEC_GSS [4] uses the functionality of GSS-API [7]. This allows | |||
for the use of various security mechanisms by the RPC layer without | for the use of various security mechanisms by the RPC layer without | |||
the additional implementation overhead of adding RPC security | the additional implementation overhead of adding RPC security | |||
flavors. | flavors. | |||
2.2.1.1.1.1. Identification, Authentication, Integrity, Privacy | 2.2.1.1.1.1. Identification, Authentication, Integrity, Privacy | |||
Via the GSS-API, RPCSEC_GSS can be used to identify and authenticate | Via the GSS-API, RPCSEC_GSS can be used to identify and authenticate | |||
users on clients to servers, and servers to users. It can also | users on clients to servers, and servers to users. It can also | |||
perform integrity checking on the entire RPC message, including the | perform integrity checking on the entire RPC message, including the | |||
RPC header, and the arguments or results. Finally, privacy, usually | RPC header, and on the arguments or results. Finally, privacy, | |||
via encryption, is a service available with RPCSEC_GSS. Privacy is | usually via encryption, is a service available with RPCSEC_GSS. | |||
performed on the arguments and results. Note that if privacy is | Privacy is performed on the arguments and results. Note that if | |||
selected, integrity, authentication, and identification are enabled. | privacy is selected, integrity, authentication, and identification | |||
If privacy is not selected, but integrity is selected, authentication | are enabled. If privacy is not selected, but integrity is selected, | |||
and identification are enabled. If integrity and privacy are not | authentication and identification are enabled. If integrity and | |||
selected, but authentication is enabled, identification is enabled. | privacy are not selected, but authentication is enabled, | |||
RPCSEC_GSS does not provide identification as a separate service. | identification is enabled. RPCSEC_GSS does not provide | |||
identification as a separate service. | ||||
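The implication chain described above (privacy implies integrity, which implies authentication, which implies identification) can be sketched as a small lookup. This is illustrative only: the table name `IMPLIED` and the function `enabled_services` are hypothetical and do not belong to RPCSEC_GSS or to any library.

```python
# Hypothetical names; this only restates the implication rules in the text.
IMPLIED = {
    "privacy": frozenset(
        {"privacy", "integrity", "authentication", "identification"}),
    "integrity": frozenset(
        {"integrity", "authentication", "identification"}),
    "authentication": frozenset({"authentication", "identification"}),
}


def enabled_services(selected):
    """Return the services in effect when `selected` is chosen.

    Identification is never selectable on its own, so it is not a key.
    """
    if selected not in IMPLIED:
        raise ValueError(f"not a selectable service: {selected!r}")
    return IMPLIED[selected]
```

Requesting "identification" by itself raises an error here, mirroring the statement that RPCSEC_GSS does not provide identification as a separate service.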
Although GSS-API has an authentication service distinct from its | Although GSS-API has an authentication service distinct from its | |||
privacy and integrity services, GSS-API's authentication service is | privacy and integrity services, GSS-API's authentication service is | |||
not used for RPCSEC_GSS's authentication service. Instead, each RPC | not used for RPCSEC_GSS's authentication service. Instead, each RPC | |||
request and response header is integrity protected with the GSS-API | request and response header is integrity protected with the GSS-API | |||
integrity service, and this allows RPCSEC_GSS to offer per-RPC | integrity service, and this allows RPCSEC_GSS to offer per-RPC | |||
authentication and identity. See [4] for more information. | authentication and identity. See [4] for more information. | |||
NFSv4.1 clients and servers MUST support RPCSEC_GSS's integrity and | NFSv4.1 clients and servers MUST support RPCSEC_GSS's integrity and | |||
authentication service. NFSv4.1 servers MUST support RPCSEC_GSS's | authentication service. NFSv4.1 servers MUST support RPCSEC_GSS's | |||
privacy service. NFSv4.1 clients SHOULD support RPCSEC_GSS's privacy | privacy service. NFSv4.1 clients SHOULD support RPCSEC_GSS's privacy | |||
service. | service. | |||
2.2.1.1.1.2. Security mechanisms for NFSv4.1 | 2.2.1.1.1.2. Security Mechanisms for NFSv4.1 | |||
RPCSEC_GSS, via GSS-API, normalizes access to mechanisms that provide | RPCSEC_GSS, via GSS-API, normalizes access to mechanisms that provide | |||
security services. Therefore NFSv4.1 clients and servers MUST | security services. Therefore, NFSv4.1 clients and servers MUST | |||
support the Kerberos V5 security mechanism. | support the Kerberos V5 security mechanism. | |||
The use of RPCSEC_GSS requires selection of: mechanism, quality of | The use of RPCSEC_GSS requires selection of mechanism, quality of | |||
protection (QOP), and service (authentication, integrity, privacy). | protection (QOP), and service (authentication, integrity, privacy). | |||
For the mandated security mechanisms, NFSv4.1 specifies that a QOP of | For the mandated security mechanisms, NFSv4.1 specifies that a QOP of | |||
zero (0) is used, leaving it up to the mechanism or the mechanism's | zero is used, leaving it up to the mechanism or the mechanism's | |||
configuration to map QOP zero to an appropriate level of protection. | configuration to map QOP zero to an appropriate level of protection. | |||
Each mandated mechanism specifies minimum set of cryptographic | Each mandated mechanism specifies a minimum set of cryptographic | |||
algorithms for implementing integrity and privacy. NFSv4.1 clients | algorithms for implementing integrity and privacy. NFSv4.1 clients | |||
and servers MUST be implemented on operating environments that comply | and servers MUST be implemented on operating environments that comply | |||
with the REQUIRED cryptographic algorithms of each REQUIRED | with the REQUIRED cryptographic algorithms of each REQUIRED | |||
mechanism. | mechanism. | |||
2.2.1.1.1.2.1. Kerberos V5 | 2.2.1.1.1.2.1. Kerberos V5 | |||
The Kerberos V5 GSS-API mechanism as described in [5] MUST be | The Kerberos V5 GSS-API mechanism as described in [5] MUST be | |||
implemented with the RPCSEC_GSS services as specified in the | implemented with the RPCSEC_GSS services as specified in the | |||
following table: | following table: | |||
skipping to change at page 23, line 42 | skipping to change at page 22, line 47 | |||
4 == RPCSEC_GSS service | 4 == RPCSEC_GSS service | |||
5 == NFSv4.1 clients MUST support | 5 == NFSv4.1 clients MUST support | |||
6 == NFSv4.1 servers MUST support | 6 == NFSv4.1 servers MUST support | |||
1 2 3 4 5 6 | 1 2 3 4 5 6 | |||
------------------------------------------------------------------ | ------------------------------------------------------------------ | |||
390003 krb5 1.2.840.113554.1.2.2 rpc_gss_svc_none yes yes | 390003 krb5 1.2.840.113554.1.2.2 rpc_gss_svc_none yes yes | |||
390004 krb5i 1.2.840.113554.1.2.2 rpc_gss_svc_integrity yes yes | 390004 krb5i 1.2.840.113554.1.2.2 rpc_gss_svc_integrity yes yes | |||
390005 krb5p 1.2.840.113554.1.2.2 rpc_gss_svc_privacy no yes | 390005 krb5p 1.2.840.113554.1.2.2 rpc_gss_svc_privacy no yes | |||
Note that the number and name of the pseudo flavor is presented here | Note that the number and name of the pseudo flavor are presented here | |||
as a mapping aid to the implementor. Because the NFSv4.1 protocol | as a mapping aid to the implementor. Because the NFSv4.1 protocol | |||
includes a method to negotiate security and it understands the GSS- | includes a method to negotiate security and it understands the GSS- | |||
API mechanism, the pseudo flavor is not needed. The pseudo flavor is | API mechanism, the pseudo flavor is not needed. The pseudo flavor is | |||
needed for NFSv3 since the security negotiation is done via the | needed for NFSv3 since the security negotiation is done via the | |||
MOUNT protocol as described in [33]. | MOUNT protocol as described in [33]. | |||
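As a mapping aid, the Kerberos V5 table above can be held as a small lookup keyed by pseudo-flavor number. This is a sketch, not part of the protocol: the type `Krb5Flavor`, the dictionary `PSEUDO_FLAVORS`, and the helper `names_required_for` are hypothetical, and the first three columns are assumed to be pseudo-flavor number, pseudo-flavor name, and mechanism OID.

```python
from typing import NamedTuple


class Krb5Flavor(NamedTuple):
    """One row of the Kerberos V5 table (hypothetical representation)."""
    name: str          # pseudo-flavor name
    oid: str           # GSS-API mechanism OID
    service: str       # RPCSEC_GSS service
    client_must: bool  # NFSv4.1 clients MUST support
    server_must: bool  # NFSv4.1 servers MUST support


KRB5_OID = "1.2.840.113554.1.2.2"

PSEUDO_FLAVORS = {
    390003: Krb5Flavor("krb5", KRB5_OID, "rpc_gss_svc_none", True, True),
    390004: Krb5Flavor("krb5i", KRB5_OID, "rpc_gss_svc_integrity", True, True),
    390005: Krb5Flavor("krb5p", KRB5_OID, "rpc_gss_svc_privacy", False, True),
}


def names_required_for(role):
    """List the pseudo-flavor names a 'client' or 'server' MUST support."""
    if role not in ("client", "server"):
        raise ValueError(f"unknown role: {role!r}")
    field = "client_must" if role == "client" else "server_must"
    return sorted(f.name for f in PSEUDO_FLAVORS.values() if getattr(f, field))
```

The "no" in the client column for krb5p matches the statement that clients SHOULD, rather than MUST, support the privacy service.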
At the time NFSv4.1 was specified, AES with HMAC-SHA1 was a REQUIRED | At the time NFSv4.1 was specified, the Advanced Encryption Standard | |||
algorithm set for Kerberos V5. In contrast, when NFSv4.0 was | (AES) with HMAC-SHA1 was a REQUIRED algorithm set for Kerberos V5. | |||
specified, weaker algorithm sets were REQUIRED for Kerberos V5, and | In contrast, when NFSv4.0 was specified, weaker algorithm sets were | |||
were REQUIRED in the NFSv4.0 specification, because the Kerberos V5 | REQUIRED for Kerberos V5, and were REQUIRED in the NFSv4.0 | |||
specification at the time did not specify stronger algorithms. The | specification, because the Kerberos V5 specification at the time did | |||
NFSv4.1 specification does not specify REQUIRED algorithms for | not specify stronger algorithms. The NFSv4.1 specification does not | |||
Kerberos V5, and instead, the implementor is expected to track the | specify REQUIRED algorithms for Kerberos V5, and instead, the | |||
evolution of the Kerberos V5 standard if and when stronger algorithms | implementor is expected to track the evolution of the Kerberos V5 | |||
are specified. | standard if and when stronger algorithms are specified. | |||
2.2.1.1.1.2.1.1. Security Considerations for Cryptographic Algorithms | 2.2.1.1.1.2.1.1. Security Considerations for Cryptographic Algorithms | |||
in Kerberos V5 | in Kerberos V5 | |||
When deploying NFSv4.1, the strength of the security achieved depends | When deploying NFSv4.1, the strength of the security achieved depends | |||
on the existing Kerberos V5 infrastructure. The algorithms of | on the existing Kerberos V5 infrastructure. The algorithms of | |||
Kerberos V5 are not directly exposed to or selectable by the client | Kerberos V5 are not directly exposed to or selectable by the client | |||
or server, so there is some due diligence required by the user of | or server, so there is some due diligence required by the user of | |||
NFSv4.1 to ensure that security is acceptable where where needed. | NFSv4.1 to ensure that security is acceptable where needed. | |||
2.2.1.1.1.3. GSS Server Principal | 2.2.1.1.1.3. GSS Server Principal | |||
Regardless of what security mechanism under RPCSEC_GSS is being used, | Regardless of what security mechanism under RPCSEC_GSS is being used, | |||
the NFS server, MUST identify itself in GSS-API via a | the NFS server MUST identify itself in GSS-API via a | |||
GSS_C_NT_HOSTBASED_SERVICE name type. GSS_C_NT_HOSTBASED_SERVICE | GSS_C_NT_HOSTBASED_SERVICE name type. GSS_C_NT_HOSTBASED_SERVICE | |||
names are of the form: | names are of the form: | |||
service@hostname | service@hostname | |||
For NFS, the "service" element is | For NFS, the "service" element is | |||
nfs | nfs | |||
Implementations of security mechanisms will convert nfs@hostname to | Implementations of security mechanisms will convert nfs@hostname to | |||
various different forms. For Kerberos V5 the following form is | various different forms. For Kerberos V5, the following form is | |||
RECOMMENDED: | RECOMMENDED: | |||
nfs/hostname | nfs/hostname | |||
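The conversion from the host-based service form to the RECOMMENDED Kerberos V5 form amounts to a simple string rewrite, sketched below. The function name `hostbased_to_krb5` is hypothetical, and realm handling, which a real Kerberos implementation would add, is deliberately left out.

```python
def hostbased_to_krb5(name):
    """Rewrite 'service@hostname' as 'service/hostname'.

    Sketch only: a Kerberos library would normally also canonicalize the
    hostname and append a realm; neither is attempted here.
    """
    service, sep, hostname = name.partition("@")
    if not sep or not service or not hostname:
        raise ValueError(f"not of the form service@hostname: {name!r}")
    return f"{service}/{hostname}"
```

For example, `hostbased_to_krb5("nfs@server.example.com")` yields `nfs/server.example.com`.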
2.3. COMPOUND and CB_COMPOUND | 2.3. COMPOUND and CB_COMPOUND | |||
A significant departure from the versions of the NFS protocol before | A significant departure from the versions of the NFS protocol before | |||
NFSv4 is the introduction of the COMPOUND procedure. For the NFSv4 | NFSv4 is the introduction of the COMPOUND procedure. For the NFSv4 | |||
protocol, in all minor versions, there are exactly two RPC | protocol, in all minor versions, there are exactly two RPC | |||
procedures, NULL and COMPOUND. The COMPOUND procedure is defined as | procedures, NULL and COMPOUND. The COMPOUND procedure is defined as | |||
skipping to change at page 25, line 12 | skipping to change at page 24, line 17 | |||
of facilities exist to pass results from one operation to another. | of facilities exist to pass results from one operation to another. | |||
Once an operation returns a failing result, the evaluation ends and | Once an operation returns a failing result, the evaluation ends and | |||
the results of all evaluated operations are returned to the client. | the results of all evaluated operations are returned to the client. | |||
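The stop-at-first-failure rule can be sketched as a loop over operations. This is a minimal model, not a server implementation: `evaluate_compound` is a hypothetical name, and each operation is assumed to be a callable returning a (status, payload) pair.

```python
NFS4_OK = 0        # success status
NFS4ERR_NOENT = 2  # used only as an example failing status


def evaluate_compound(operations):
    """Evaluate operations in order, stopping after the first failure.

    Returns the results of every operation that was evaluated, including
    the failing one; later operations are never run.
    """
    results = []
    for op in operations:
        status, payload = op()
        results.append((status, payload))
        if status != NFS4_OK:
            break
    return results
```

With three operations where the second fails, two results come back and the third operation is not evaluated.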
With the use of the COMPOUND procedure, the client is able to build | With the use of the COMPOUND procedure, the client is able to build | |||
simple or complex requests. These COMPOUND requests allow for a | simple or complex requests. These COMPOUND requests allow for a | |||
reduction in the number of RPCs needed for logical file system | reduction in the number of RPCs needed for logical file system | |||
operations. For example, multi-component lookup requests can be | operations. For example, multi-component lookup requests can be | |||
constructed by combining multiple LOOKUP operations. Those can be | constructed by combining multiple LOOKUP operations. Those can be | |||
further combined with operations such as GETATTR, READDIR, or OPEN | further combined with operations such as GETATTR, READDIR, or OPEN | |||
plus READ to do more complicated sets of operations without incurring | plus READ to do more complicated sets of operations without incurring | |||
additional latency. | additional latency. | |||
NFSv4.1 also contains a considerable set of callback operations in | NFSv4.1 also contains a considerable set of callback operations in | |||
which the server makes an RPC directed at the client. Callback RPCs | which the server makes an RPC directed at the client. Callback RPCs | |||
have a similar structure to that of the normal server requests. In | have a similar structure to that of the normal server requests. In | |||
all minor versions of the NFSv4 protocol there are two callback RPC | all minor versions of the NFSv4 protocol, there are two callback RPC | |||
procedures, CB_NULL and CB_COMPOUND. The CB_COMPOUND procedure is | procedures: CB_NULL and CB_COMPOUND. The CB_COMPOUND procedure is | |||
defined in an analogous fashion to that of COMPOUND with its own set | defined in an analogous fashion to that of COMPOUND with its own set | |||
of callback operations. | of callback operations. | |||
The addition of new server and callback operations within the | The addition of new server and callback operations within the | |||
COMPOUND and CB_COMPOUND request framework provides a means of | COMPOUND and CB_COMPOUND request framework provides a means of | |||
extending the protocol in subsequent minor versions. | extending the protocol in subsequent minor versions. | |||
Except for a small number of operations needed for session creation, | Except for a small number of operations needed for session creation, | |||
server requests and callback requests are performed within the | server requests and callback requests are performed within the | |||
context of a session. Sessions provide a client context for every | context of a session. Sessions provide a client context for every | |||
skipping to change at page 26, line 11 | skipping to change at page 25, line 14 | |||
Unlike NFSv4.0, the only NFSv4.1 operations possible before a client | Unlike NFSv4.0, the only NFSv4.1 operations possible before a client | |||
ID is established are those needed to establish the client ID. | ID is established are those needed to establish the client ID. | |||
A sequence of an EXCHANGE_ID operation followed by a CREATE_SESSION | A sequence of an EXCHANGE_ID operation followed by a CREATE_SESSION | |||
operation using that client ID (eir_clientid as returned from | operation using that client ID (eir_clientid as returned from | |||
EXCHANGE_ID) is required to establish and confirm the client ID on | EXCHANGE_ID) is required to establish and confirm the client ID on | |||
the server. Establishment of identification by a new incarnation of | the server. Establishment of identification by a new incarnation of | |||
the client also has the effect of immediately releasing any locking | the client also has the effect of immediately releasing any locking | |||
state that a previous incarnation of that same client might have had | state that a previous incarnation of that same client might have had | |||
on the server. Such released state would include all lock, share | on the server. Such released state would include all byte-range | |||
reservation, layout state, and where the server is not supporting the | lock, share reservation, layout state, and -- where the server | |||
CLAIM_DELEGATE_PREV claim type, all delegation state associated with | supports neither the CLAIM_DELEGATE_PREV nor CLAIM_DELEG_CUR_FH claim | |||
the same client with the same identity. For discussion of delegation | types -- all delegation state associated with the same client with | |||
state recovery, see Section 10.2.1. For discussion of layout state | the same identity. For discussion of delegation state recovery, see | |||
recovery see Section 12.7.1. | Section 10.2.1. For discussion of layout state recovery, see | |||
Section 12.7.1. | ||||
Releasing such state requires that the server be able to determine | Releasing such state requires that the server be able to determine | |||
that one client instance is the successor of another. Where this | that one client instance is the successor of another. Where this | |||
cannot be done, for any of a number of reasons, the locking state | cannot be done, for any of a number of reasons, the locking state | |||
will remain for a time subject to lease expiration (see Section 8.3) | will remain for a time subject to lease expiration (see Section 8.3) | |||
and the new client will need to wait for such state to be removed, if | and the new client will need to wait for such state to be removed, if | |||
it makes conflicting lock requests. | it makes conflicting lock requests. | |||
Client identification is encapsulated in the following Client Owner | Client identification is encapsulated in the following client owner | |||
data type: | data type: | |||
struct client_owner4 { | struct client_owner4 { | |||
verifier4 co_verifier; | verifier4 co_verifier; | |||
opaque co_ownerid<NFS4_OPAQUE_LIMIT>; | opaque co_ownerid<NFS4_OPAQUE_LIMIT>; | |||
}; | }; | |||
The first field, co_verifier, is a client incarnation verifier. The | The first field, co_verifier, is a client incarnation verifier. The | |||
server will start the process of canceling the client's leased state | server will start the process of canceling the client's leased state | |||
if co_verifier is different than what the server has previously | if co_verifier is different than what the server has previously | |||
recorded for the identified client (as specified in the co_ownerid | recorded for the identified client (as specified in the co_ownerid | |||
field). | field). | |||
The second field, co_ownerid is a variable length string that | The second field, co_ownerid, is a variable length string that | |||
uniquely defines the client so that subsequent instances of the same | uniquely defines the client so that subsequent instances of the same | |||
client bear the same co_ownerid with a different verifier. | client bear the same co_ownerid with a different verifier. | |||
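The roles of the two fields can be sketched as a comparison against what the server last recorded. This is a simplified model under stated assumptions: `observe_client_owner` and the `records` dictionary are hypothetical, and confirmation, principal checks, and the actual release of state are omitted.

```python
def observe_client_owner(records, co_ownerid, co_verifier):
    """Classify a client_owner4 against previously recorded verifiers.

    `records` maps co_ownerid (bytes) to the last co_verifier (bytes) seen.
    A changed verifier for a known co_ownerid indicates a new incarnation,
    which is when the server would begin canceling leased state.
    """
    previous = records.get(co_ownerid)
    records[co_ownerid] = co_verifier
    if previous is None:
        return "new client"
    if previous == co_verifier:
        return "same incarnation"
    return "new incarnation"
```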
There are several considerations for how the client generates the | There are several considerations for how the client generates the | |||
co_ownerid string: | co_ownerid string: | |||
o The string should be unique so that multiple clients do not | o The string should be unique so that multiple clients do not | |||
present the same string. The consequences of two clients | present the same string. The consequences of two clients | |||
presenting the same string range from one client getting an error | presenting the same string range from one client getting an error | |||
to one client having its leased state abruptly and unexpectedly | to one client having its leased state abruptly and unexpectedly | |||
cancelled. | cancelled. | |||
o The string should be selected so that subsequent incarnations | o The string should be selected so that subsequent incarnations | |||
(e.g. restarts) of the same client cause the client to present the | (e.g., restarts) of the same client cause the client to present | |||
same string. The implementor is cautioned against an approach that | the same string. The implementor is cautioned against an approach | |||
requires the string to be recorded in a local file because this | that requires the string to be recorded in a local file because | |||
precludes the use of the implementation in an environment where | this precludes the use of the implementation in an environment | |||
there is no local disk and all file access is from an NFSv4.1 | where there is no local disk and all file access is from an | |||
server. | NFSv4.1 server. | |||
o The string should be the same for each server network address that | o The string should be the same for each server network address that | |||
the client accesses. This way, if a server has multiple | the client accesses. This way, if a server has multiple | |||
interfaces, the client can trunk traffic over multiple network | interfaces, the client can trunk traffic over multiple network | |||
paths as described in Section 2.10.5. (Note: the precise opposite | paths as described in Section 2.10.5. (Note: the precise opposite | |||
was advised in the NFSv4.0 specification [30].) | was advised in the NFSv4.0 specification [30].) | |||
o The algorithm for generating the string should not assume that the | o The algorithm for generating the string should not assume that the | |||
client's network address will not change, unless the client | client's network address will not change, unless the client | |||
implementation knows it is using statically assigned network | implementation knows it is using statically assigned network | |||
addresses. This includes changes between client incarnations and | addresses. This includes changes between client incarnations and | |||
even changes while the client is still running in its current | even changes while the client is still running in its current | |||
incarnation. Thus with dynamic address assignment, if the client | incarnation. Thus, with dynamic address assignment, if the client | |||
includes just the client's network address in the co_ownerid | includes just the client's network address in the co_ownerid | |||
string, there is a real risk that after the client gives up the | string, there is a real risk that after the client gives up the | |||
network address, another client, using a similar algorithm for | network address, another client, using a similar algorithm for | |||
generating the co_ownerid string, would generate a conflicting | generating the co_ownerid string, would generate a conflicting | |||
co_ownerid string. | co_ownerid string. | |||
Given the above considerations, an example of a well generated | Given the above considerations, an example of a well-generated | |||
co_ownerid string is one that includes: | co_ownerid string is one that includes: | |||
o If applicable, the client's statically assigned network address. | o If applicable, the client's statically assigned network address. | |||
o Additional information that tends to be unique, such as one or | o Additional information that tends to be unique, such as one or | |||
more of: | more of: | |||
* The client machine's serial number (for privacy reasons, it is | * The client machine's serial number (for privacy reasons, it is | |||
best to perform some one way function on the serial number). | best to perform some one-way function on the serial number). | |||
* A MAC address (again, a one way function should be performed). | * A Media Access Control (MAC) address (again, a one-way function | |||
should be performed). | ||||
* The timestamp of when the NFSv4.1 software was first installed | * The timestamp of when the NFSv4.1 software was first installed | |||
on the client (though this is subject to the previously | on the client (though this is subject to the previously | |||
mentioned caution about using information that is stored in a | mentioned caution about using information that is stored in a | |||
file, because the file might only be accessible over NFSv4.1). | file, because the file might only be accessible over NFSv4.1). | |||
* A true random number. However since this number ought to be | * A true random number. However, since this number ought to be | |||
the same between client incarnations, this shares the same | the same between client incarnations, this shares the same | |||
problem as that of using the timestamp of the software | problem as that of using the timestamp of the software | |||
installation. | installation. | |||
o For a user level NFSv4.1 client, it should contain additional | o For a user-level NFSv4.1 client, it should contain additional | |||
information to distinguish the client from other user level | information to distinguish the client from other user-level | |||
clients running on the same host, such as a process identifier or | clients running on the same host, such as a process identifier or | |||
other unique sequence. | other unique sequence. | |||
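The ingredients listed above can be combined as in the following Python sketch. It is an illustration only, not an algorithm defined by this document: the function names, the choice of SHA-256 as the one-way function, and the "/" separator are all assumptions made for the example.

```python
import hashlib


def one_way(value: str) -> str:
    """Apply a one-way function to an identifying value.

    SHA-256 is an assumed choice; the text only asks for "some one-way
    function".
    """
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def make_co_ownerid(static_addr, serial, mac, pid=None) -> bytes:
    """Build a co_ownerid string from stable, mostly unique inputs.

    static_addr: statically assigned network address, or None if not
                 applicable.
    serial, mac: hashed before use, for privacy.
    pid:         extra discriminator for a user-level client (hypothetical
                 parameter name).
    """
    parts = []
    if static_addr:
        parts.append(static_addr)
    parts.append(one_way(serial))
    parts.append(one_way(mac))
    if pid is not None:
        parts.append(str(pid))
    return "/".join(parts).encode("ascii")
```

Because every input is stable across reboots, the same client produces the same string for each incarnation, while the raw serial number and MAC address never appear in it.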
The client ID is assigned by the server (the eir_clientid result from | The client ID is assigned by the server (the eir_clientid result from | |||
EXCHANGE_ID) and should be chosen so that it will not conflict with a | EXCHANGE_ID) and should be chosen so that it will not conflict with a | |||
client ID previously assigned by the server. This applies across | client ID previously assigned by the server. This applies across | |||
server restarts. | server restarts. | |||
In the event of a server restart, a client may find out that its | In the event of a server restart, a client may find out that its | |||
current client ID is no longer valid when it receives an | current client ID is no longer valid when it receives an | |||
skipping to change at page 28, line 39 | skipping to change at page 27, line 44 | |||
a server restart. When the existing client ID is presented to a | a server restart. When the existing client ID is presented to a | |||
server as part of creating a session and that client ID is not | server as part of creating a session and that client ID is not | |||
recognized, as would happen after a server restart, the server will | recognized, as would happen after a server restart, the server will | |||
reject the request with the error NFS4ERR_STALE_CLIENTID. | reject the request with the error NFS4ERR_STALE_CLIENTID. | |||
In the case of the session being persistent, the client will re- | In the case of the session being persistent, the client will re- | |||
establish communication using the existing session after the restart. | establish communication using the existing session after the restart. | |||
This session will be associated with the existing client ID but may | This session will be associated with the existing client ID but may | |||
only be used to retransmit operations that the client previously | only be used to retransmit operations that the client previously | |||
transmitted and did not see replies to. Replies to operations that | transmitted and did not see replies to. Replies to operations that | |||
the server previously performed will come from the reply cache, | the server previously performed will come from the reply cache; | |||
otherwise NFS4ERR_DEADSESSION will be returned. Hence, such a | otherwise, NFS4ERR_DEADSESSION will be returned. Hence, such a | |||
session is referred to as "dead". In this situation, in order to | session is referred to as "dead". In this situation, in order to | |||
perform new operations, the client needs to establish a new session. | perform new operations, the client needs to establish a new session. | |||
If an attempt is made to establish this new session with the existing | If an attempt is made to establish this new session with the existing | |||
client ID, the server will reject the request with | client ID, the server will reject the request with | |||
NFS4ERR_STALE_CLIENTID. | NFS4ERR_STALE_CLIENTID. | |||
When NFS4ERR_STALE_CLIENTID is received in either of these | When NFS4ERR_STALE_CLIENTID is received in either of these | |||
situations, the client needs to obtain a new client ID by use of the | situations, the client needs to obtain a new client ID by use of the | |||
EXCHANGE_ID operation, then use that client ID as the basis of a new | EXCHANGE_ID operation, then use that client ID as the basis of a new | |||
session, and then proceed to any other necessary recovery for the | session, and then proceed to any other necessary recovery for the | |||
server restart case (See Section 8.4.2). | server restart case (see Section 8.4.2). | |||
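The recovery sequence just described can be sketched as follows. This is a simplified model under stated assumptions: FakeServer is a hypothetical in-memory stand-in, not a real NFSv4.1 implementation, and its method names merely echo the operation names.

```python
class StaleClientId(Exception):
    """Models the NFS4ERR_STALE_CLIENTID error."""


class FakeServer:
    """Hypothetical in-memory server used only for this illustration."""

    def __init__(self):
        self.next_id = 1      # assumed never to repeat, even across restarts
        self.known = set()    # client IDs the current server instance knows

    def restart(self):
        self.known.clear()    # a restarted server recognizes no old client ID

    def exchange_id(self, co_ownerid):
        clientid = self.next_id
        self.next_id += 1
        self.known.add(clientid)
        return clientid

    def create_session(self, clientid):
        if clientid not in self.known:
            raise StaleClientId(clientid)
        return ("session", clientid)


def establish_session(server, co_ownerid, clientid=None):
    """Try the existing client ID first; on a stale ID, start over."""
    if clientid is not None:
        try:
            return clientid, server.create_session(clientid)
        except StaleClientId:
            pass  # fall through and obtain a new client ID
    clientid = server.exchange_id(co_ownerid)
    session = server.create_session(clientid)
    # Any further recovery for the server restart case would follow here.
    return clientid, session
```

After a restart, calling establish_session with the old client ID ends up with a new client ID and a new session, which mirrors the order of steps in the text.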
See the descriptions of EXCHANGE_ID (Section 18.35) and | See the descriptions of EXCHANGE_ID (Section 18.35) and | |||
CREATE_SESSION (Section 18.36) for a complete specification of these | CREATE_SESSION (Section 18.36) for a complete specification of these | |||
operations. | operations. | |||
2.4.1. Upgrade from NFSv4.0 to NFSv4.1 | 2.4.1. Upgrade from NFSv4.0 to NFSv4.1 | |||
To facilitate upgrade from NFSv4.0 to NFSv4.1, a server may compare a | To facilitate upgrade from NFSv4.0 to NFSv4.1, a server may compare a | |||
client_owner4 in an EXCHANGE_ID with an nfs_client_id4 established | value of data type client_owner4 in an EXCHANGE_ID with a value of | |||
using the SETCLIENTID operation of NFSv4.0. A server that does so | data type nfs_client_id4 that was established using the SETCLIENTID | |||
will allow an upgraded client to avoid waiting until the lease (i.e. | operation of NFSv4.0. A server that does so will allow an upgraded | |||
the lease established by the NFSv4.0 instance client) expires. This | client to avoid waiting until the lease (i.e., the lease established | |||
requires the client_owner4 be constructed the same way as the | by the NFSv4.0 instance client) expires. This requires that the | |||
nfs_client_id4. If the latter's contents included the server's | value of data type client_owner4 be constructed the same way as the | |||
network address (per the recommendations of the NFSv4.0 specification | value of data type nfs_client_id4. If the latter's contents included | |||
[30]), and the NFSv4.1 client does not wish to use a client ID that | the server's network address (per the recommendations of the NFSv4.0 | |||
prevents trunking, it should send two EXCHANGE_ID operations. The | specification [30]), and the NFSv4.1 client does not wish to use a | |||
first EXCHANGE_ID will have a client_owner4 equal to the | client ID that prevents trunking, it should send two EXCHANGE_ID | |||
nfs_client_id4. This will clear the state created by the NFSv4.0 | operations. The first EXCHANGE_ID will have a client_owner4 equal to | |||
the nfs_client_id4. This will clear the state created by the NFSv4.0 | ||||
client. The second EXCHANGE_ID will not have the server's network | client. The second EXCHANGE_ID will not have the server's network | |||
address. The state created for the second EXCHANGE_ID will not have | address. The state created for the second EXCHANGE_ID will not have | |||
to wait for lease expiration, because there will be no state to | to wait for lease expiration, because there will be no state to | |||
expire. | expire. | |||
2.4.2. Server Release of Client ID | 2.4.2. Server Release of Client ID | |||
NFSv4.1 introduces a new operation called DESTROY_CLIENTID | NFSv4.1 introduces a new operation called DESTROY_CLIENTID | |||
(Section 18.50) which the client SHOULD use to destroy a client ID it | (Section 18.50), which the client SHOULD use to destroy a client ID | |||
no longer needs. This permits graceful, bilateral release of a | it no longer needs. This permits graceful, bilateral release of a | |||
client ID. The operation cannot be used if there are sessions | client ID. The operation cannot be used if there are sessions | |||
associated with the client ID, or state with an unexpired lease. | associated with the client ID, or state with an unexpired lease. | |||
If the server determines that the client holds no associated state | If the server determines that the client holds no associated state | |||
for its client ID (associated state includes unrevoked sessions, | for its client ID (associated state includes unrevoked sessions, | |||
opens, locks, delegations, layouts, and wants), the server MAY choose | opens, locks, delegations, layouts, and wants), the server MAY choose | |||
to unilaterally release the client ID in order to conserve resources. | to unilaterally release the client ID in order to conserve resources. | |||
If the client contacts the server after this release, the server MUST | If the client contacts the server after this release, the server MUST | |||
ensure the client receives the appropriate error so that it will use | ensure that the client receives the appropriate error so that it will | |||
the EXCHANGE_ID/CREATE_SESSION sequence to establish a new client ID. | use the EXCHANGE_ID/CREATE_SESSION sequence to establish a new client | |||
The server ought to be very hesitant to release a client ID since the | ID. The server ought to be very hesitant to release a client ID | |||
resulting work on the client to recover from such an event will be | since the resulting work on the client to recover from such an event | |||
the same burden as if the server had failed and restarted. Typically | will be the same burden as if the server had failed and restarted. | |||
a server would not release a client ID unless there had been no | Typically, a server would not release a client ID unless there had | |||
activity from that client for many minutes. As long as there are | been no activity from that client for many minutes. As long as there | |||
sessions, opens, locks, delegations, layouts, or wants, the server | are sessions, opens, locks, delegations, layouts, or wants, the | |||
MUST NOT release the client ID. See Section 2.10.13.1.4 for a | server MUST NOT release the client ID. See Section 2.10.13.1.4 for | |||
discussion on releasing inactive sessions. | discussion on releasing inactive sessions. | |||
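A server-side check consistent with the rules above might look like the following sketch. The function name, the dictionary layout, and the 600-second default are assumptions; the text says only "many minutes" and gives no number.

```python
STATE_KINDS = ("sessions", "opens", "locks", "delegations", "layouts", "wants")


def may_release_client_id(state_counts, idle_seconds, idle_threshold=600.0):
    """Return True if unilateral release of the client ID is acceptable.

    state_counts maps each kind of associated state to how many items the
    client currently holds (missing keys count as zero).
    """
    # MUST NOT release while any associated state remains.
    if any(state_counts.get(kind, 0) > 0 for kind in STATE_KINDS):
        return False
    # Be hesitant: release only after a long period of inactivity.
    return idle_seconds >= idle_threshold
```

The first test encodes the MUST NOT requirement; the second reflects the advice that release be reserved for clients that have long been idle, since recovery costs the client as much as a server restart.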
2.4.3. Resolving Client Owner Conflicts | 2.4.3. Resolving Client Owner Conflicts | |||
When the server gets an EXCHANGE_ID for a client owner that currently | When the server gets an EXCHANGE_ID for a client owner that currently | |||
has no state, or that has state, but the lease has expired, the | has no state, or that has state but the lease has expired, the server | |||
server MUST allow the EXCHANGE_ID, and confirm the new client ID if | MUST allow the EXCHANGE_ID and confirm the new client ID if followed | |||
followed by the appropriate CREATE_SESSION. | by the appropriate CREATE_SESSION. | |||
When the server gets an EXCHANGE_ID for a new incarnation of a client | When the server gets an EXCHANGE_ID for a new incarnation of a client | |||
owner that currently has an old incarnation with state and an | owner that currently has an old incarnation with state and an | |||
unexpired lease, the server is allowed to dispose of the state of the | unexpired lease, the server is allowed to dispose of the state of the | |||
previous incarnation of the client owner if one of the following are | previous incarnation of the client owner if one of the following is | |||
true: | true: | |||
o The principal that created the client ID for the client owner is | o The principal that created the client ID for the client owner is | |||
the same as the principal that is sending the EXCHANGE_ID | the same as the principal that is sending the EXCHANGE_ID | |||
operation. Note that if the client ID was created with | operation. Note that if the client ID was created with | |||
SP4_MACH_CRED state protection (Section 18.35), the principal MUST | SP4_MACH_CRED state protection (Section 18.35), the principal MUST | |||
be based on RPCSEC_GSS authentication, the RPCSEC_GSS service used | be based on RPCSEC_GSS authentication, the RPCSEC_GSS service used | |||
MUST be integrity or privacy, and the same GSS mechanism and | MUST be integrity or privacy, and the same GSS mechanism and | |||
principal MUST be used as that used when the client ID was | principal MUST be used as that used when the client ID was | |||
created. | created. | |||
skipping to change at page 30, line 52 | skipping to change at page 30, line 8 | |||
client ID was created. | client ID was created. | |||
If none of the above situations apply, the server MUST return | If none of the above situations apply, the server MUST return | |||
NFS4ERR_CLID_INUSE. | NFS4ERR_CLID_INUSE. | |||
If the server accepts the principal and co_ownerid as matching that | If the server accepts the principal and co_ownerid as matching that | |||
which created the client ID, and the co_verifier in the EXCHANGE_ID | which created the client ID, and the co_verifier in the EXCHANGE_ID | |||
differs from the co_verifier used when the client ID was created, | differs from the co_verifier used when the client ID was created, | |||
then after the server receives a CREATE_SESSION that confirms the | then after the server receives a CREATE_SESSION that confirms the | |||
client ID, the server deletes state. If the co_verifier values are | client ID, the server deletes state. If the co_verifier values are | |||
the same, (e.g. the client is either updating properties of the | the same (e.g., the client either is updating properties of the | |||
client ID (Section 18.35), or the client is attempting trunking | client ID (Section 18.35) or is attempting trunking (Section 2.10.5)), | |||
(Section 2.10.5)) the server MUST NOT delete state. | the server MUST NOT delete state. | |||
2.5. Server Owners | 2.5. Server Owners | |||
The Server Owner is similar to a Client Owner (Section 2.4), but | The server owner is similar to a client owner (Section 2.4), but | |||
unlike the Client Owner, there is no shorthand server ID. The Server | unlike the client owner, there is no shorthand server ID. The server | |||
Owner is defined in the following data type: | owner is defined in the following data type: | |||
struct server_owner4 { | struct server_owner4 { | |||
uint64_t so_minor_id; | uint64_t so_minor_id; | |||
opaque so_major_id<NFS4_OPAQUE_LIMIT>; | opaque so_major_id<NFS4_OPAQUE_LIMIT>; | |||
}; | }; | |||
The Server Owner is returned from EXCHANGE_ID. When the so_major_id | The server owner is returned from EXCHANGE_ID. When the so_major_id | |||
fields are the same in two EXCHANGE_ID results, the connections each | fields are the same in two EXCHANGE_ID results, the connections that | |||
EXCHANGE_ID was sent over can be assumed to address the same Server | each EXCHANGE_ID was sent over can be assumed to address the same | |||
(as defined in Section 1.5). If the so_minor_id fields are also the | server (as defined in Section 1.6). If the so_minor_id fields are | |||
same, then not only do both connections connect to the same server, | also the same, then not only do both connections connect to the same | |||
but the session can be shared across both connections. The reader is | server, but the session can be shared across both connections. The | |||
cautioned that multiple servers may deliberately or accidentally | reader is cautioned that multiple servers may deliberately or | |||
claim to have the same so_major_id or so_major_id/so_minor_id; the | accidentally claim to have the same so_major_id or so_major_id/ | |||
reader should examine Section 2.10.5 and Section 18.35 in order to | so_minor_id; the reader should examine Sections 2.10.5 and 18.35 in | |||
avoid acting on falsely matching Server Owner values. | order to avoid acting on falsely matching server owner values. | |||
The considerations for generating a so_major_id are similar to those | The considerations for generating a so_major_id are similar to those | |||
for generating a co_ownerid string (see Section 2.4). The | for generating a co_ownerid string (see Section 2.4). The | |||
consequences of two servers generating conflicting so_major_id values | consequences of two servers generating conflicting so_major_id values | |||
are less dire than they are for co_ownerid conflicts because the | are less dire than they are for co_ownerid conflicts because the | |||
client can use RPCSEC_GSS to compare the authenticity of each server | client can use RPCSEC_GSS to compare the authenticity of each server | |||
(see Section 2.10.5). | (see Section 2.10.5). | |||
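The comparison of two server owners described in this section can be sketched as below. The class mirrors the fields of server_owner4; the function name and the returned labels are hypothetical. A match here is only a hint: as cautioned above, the client still needs the verification described in Sections 2.10.5 and 18.35 before acting on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerOwner4:
    so_minor_id: int
    so_major_id: bytes


def compare_server_owners(a: ServerOwner4, b: ServerOwner4) -> str:
    """Classify two EXCHANGE_ID results by their server owner values."""
    if a.so_major_id != b.so_major_id:
        return "different-server"
    if a.so_minor_id != b.so_minor_id:
        # Same server, but a session cannot be assumed shareable.
        return "same-server"
    # Same server, and the session can be shared across both connections.
    return "same-server-session-shareable"
```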
2.6. Security Service Negotiation | 2.6. Security Service Negotiation | |||
With the NFSv4.1 server potentially offering multiple security | With the NFSv4.1 server potentially offering multiple security | |||
mechanisms, the client needs a method to determine or negotiate which | mechanisms, the client needs a method to determine or negotiate which | |||
mechanism is to be used for its communication with the server. The | mechanism is to be used for its communication with the server. The | |||
NFS server may have multiple points within its file system namespace | NFS server may have multiple points within its file system namespace | |||
that are available for use by NFS clients. These points can be | that are available for use by NFS clients. These points can be | |||
considered security policy boundaries, and in some NFS | considered security policy boundaries, and, in some NFS | |||
implementations are tied to NFS export points. In turn the NFS | implementations, are tied to NFS export points. In turn, the NFS | |||
server may be configured such that each of these security policy | server may be configured such that each of these security policy | |||
boundaries may have different or multiple security mechanisms in use. | boundaries may have different or multiple security mechanisms in use. | |||
The security negotiation between client and server SHOULD be done | The security negotiation between client and server SHOULD be done | |||
with a secure channel to eliminate the possibility of a third party | with a secure channel to eliminate the possibility of a third party | |||
intercepting the negotiation sequence and forcing the client and | intercepting the negotiation sequence and forcing the client and | |||
server to choose a lower level of security than required or desired. | server to choose a lower level of security than required or desired. | |||
See Section 21 for further discussion. | See Section 21 for further discussion. | |||
2.6.1. NFSv4.1 Security Tuples | 2.6.1. NFSv4.1 Security Tuples | |||
An NFS server can assign one or more "security tuples" to each | An NFS server can assign one or more "security tuples" to each | |||
security policy boundary in its namespace. Each security tuple | security policy boundary in its namespace. Each security tuple | |||
consists of a security flavor (see Section 2.2.1.1), and if the | consists of a security flavor (see Section 2.2.1.1) and, if the | |||
flavor is RPCSEC_GSS, a GSS-API mechanism OID, a GSS-API quality of | flavor is RPCSEC_GSS, a GSS-API mechanism Object Identifier (OID), a | |||
protection, and an RPCSEC_GSS service. | GSS-API quality of protection, and an RPCSEC_GSS service. | |||
2.6.2. SECINFO and SECINFO_NO_NAME | 2.6.2. SECINFO and SECINFO_NO_NAME | |||
The SECINFO and SECINFO_NO_NAME operations allow the client to | The SECINFO and SECINFO_NO_NAME operations allow the client to | |||
determine, on a per filehandle basis, what security tuple is to be | determine, on a per-filehandle basis, what security tuple is to be | |||
used for server access. In general, the client will not have to use | used for server access. In general, the client will not have to use | |||
either operation except during initial communication with the server | either operation except during initial communication with the server | |||
or when the client crosses security policy boundaries at the server. | or when the client crosses security policy boundaries at the server. | |||
However, the server's policies may also change at any time and force | However, the server's policies may also change at any time and force | |||
the client to negotiate a new security tuple. | the client to negotiate a new security tuple. | |||
Where the use of different security tuples would affect the type of | Where the use of different security tuples would affect the type of | |||
access that would be allowed if a request was sent over the same | access that would be allowed if a request was sent over the same | |||
connection used for the SECINFO or SECINFO_NO_NAME operation (e.g. | connection used for the SECINFO or SECINFO_NO_NAME operation (e.g., | |||
read-only vs. read-write), security tuples that allow greater | read-only vs. read-write), security tuples that allow greater | |||
access should be presented first. Where the general level of access | access should be presented first. Where the general level of access | |||
is the same and different security flavors limit the range of | is the same and different security flavors limit the range of | |||
principals whose privileges are recognized (e.g. allowing or | principals whose privileges are recognized (e.g., allowing or | |||
disallowing root access), flavors supporting the greatest range of | disallowing root access), flavors supporting the greatest range of | |||
principals should be listed first. | principals should be listed first. | |||
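The ordering preference above can be expressed as a sort. In this sketch the SecTuple fields follow the tuple definition in Section 2.6.1, while access_rank and principal_range are hypothetical numeric scores that a server would have to derive from its own policy; they are not protocol fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SecTuple:
    flavor: str
    oid: Optional[str] = None       # GSS-API mechanism OID (RPCSEC_GSS only)
    qop: Optional[int] = None       # GSS-API quality of protection
    service: Optional[str] = None   # RPCSEC_GSS service
    access_rank: int = 0            # assumed score: higher = greater access
    principal_range: int = 0        # assumed score: higher = more principals


def order_for_secinfo(tuples):
    """Greater access first; among equal access, wider principal range first."""
    return sorted(tuples, key=lambda t: (-t.access_rank, -t.principal_range))
```

Python's sort is stable, so tuples that tie on both scores keep the order in which the administrator listed them.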
2.6.3. Security Error | 2.6.3. Security Error | |||
Based on the assumption that each NFSv4.1 client and server MUST | Based on the assumption that each NFSv4.1 client and server MUST | |||
support a minimum set of security (i.e., Kerberos V5 under | support a minimum set of security (i.e., Kerberos V5 under | |||
RPCSEC_GSS), the NFS client will initiate file access to the server | RPCSEC_GSS), the NFS client will initiate file access to the server | |||
with one of the minimal security tuples. During communication with | with one of the minimal security tuples. During communication with | |||
the server, the client may receive an NFS error of NFS4ERR_WRONGSEC. | the server, the client may receive an NFS error of NFS4ERR_WRONGSEC. | |||
This error allows the server to notify the client that the security | This error allows the server to notify the client that the security | |||
tuple currently being used contravenes the server's security policy. | tuple currently being used contravenes the server's security policy. | |||
The client is then responsible for determining (see Section 2.6.3.1) | The client is then responsible for determining (see Section 2.6.3.1) | |||
what security tuples are available at the server and choosing one | what security tuples are available at the server and choosing one | |||
which is appropriate for the client. | that is appropriate for the client. | |||
2.6.3.1. Using NFS4ERR_WRONGSEC, SECINFO, and SECINFO_NO_NAME | 2.6.3.1. Using NFS4ERR_WRONGSEC, SECINFO, and SECINFO_NO_NAME | |||
This section explains of the mechanics of NFSv4.1 security | This section explains the mechanics of NFSv4.1 security negotiation. | |||
negotiation. | ||||
2.6.3.1.1. Put Filehandle Operations | 2.6.3.1.1. Put Filehandle Operations | |||
The term "put filehandle operation" refers to PUTROOTFH, PUTPUBFH, | The term "put filehandle operation" refers to PUTROOTFH, PUTPUBFH, | |||
PUTFH, and RESTOREFH. Each of the subsections herein describes how | PUTFH, and RESTOREFH. Each of the subsections herein describes how | |||
the server handles a subseries of operations that starts with a put | the server handles a subseries of operations that starts with a put | |||
filehandle operation. | filehandle operation. | |||
2.6.3.1.1.1. Put Filehandle Operation + SAVEFH | 2.6.3.1.1.1. Put Filehandle Operation + SAVEFH | |||
The client is saving a filehandle for a future RESTOREFH, LINK, or | The client is saving a filehandle for a future RESTOREFH, LINK, or | |||
RENAME. SAVEFH MUST NOT return NFS4ERR_WRONGSEC. To determine | RENAME. SAVEFH MUST NOT return NFS4ERR_WRONGSEC. To determine | |||
whether the put filehandle operation returns NFS4ERR_WRONGSEC or not, | whether or not the put filehandle operation returns NFS4ERR_WRONGSEC, | |||
the server implementation pretends SAVEFH is not in the series of | the server implementation pretends SAVEFH is not in the series of | |||
operations and examines which of the situations described in the | operations and examines which of the situations described in the | |||
other subsections of Section 2.6.3.1.1 apply. | other subsections of Section 2.6.3.1.1 apply. | |||
2.6.3.1.1.2. Two or More Put Filehandle Operations | 2.6.3.1.1.2. Two or More Put Filehandle Operations | |||
For a series of N put filehandle operations, the server MUST NOT | For a series of N put filehandle operations, the server MUST NOT | |||
return NFS4ERR_WRONGSEC to the first N-1 put filehandle operations. | return NFS4ERR_WRONGSEC to the first N-1 put filehandle operations. | |||
The N'th put filehandle operation is handled as if it is the first in | The Nth put filehandle operation is handled as if it is the first in | |||
a subseries of operations. For example if the server received PUTFH, | a subseries of operations. For example, if the server received a | |||
PUTROOTFH, LOOKUP, then the PUTFH is ignored for NFS4ERR_WRONGSEC | COMPOUND request with this series of operations -- PUTFH, PUTROOTFH, | |||
LOOKUP -- then the PUTFH operation is ignored for NFS4ERR_WRONGSEC | ||||
purposes, and the PUTROOTFH, LOOKUP subseries is processed | purposes, and the PUTROOTFH, LOOKUP subseries is processed | |||
according to Section 2.6.3.1.1.3. | according to Section 2.6.3.1.1.3. | |||
2.6.3.1.1.3. Put Filehandle Operation + LOOKUP (or OPEN of an Existing | 2.6.3.1.1.3. Put Filehandle Operation + LOOKUP (or OPEN of an Existing | |||
Name) | Name) | |||
This situation also applies to a put filehandle operation followed by | This situation also applies to a put filehandle operation followed by | |||
a LOOKUP or an OPEN operation that specifies an existing component | a LOOKUP or an OPEN operation that specifies an existing component | |||
name. | name. | |||
In this situation, the client is potentially crossing a security | In this situation, the client is potentially crossing a security | |||
policy boundary, and the set of security tuples the parent directory | policy boundary, and the set of security tuples the parent directory | |||
supports may differ from those of the child. The server | supports may differ from those of the child. The server | |||
implementation may decide whether to impose any restrictions on | implementation may decide whether to impose any restrictions on | |||
security policy administration. There are at least three approaches | security policy administration. There are at least three approaches | |||
(sec_policy_child is the tuple set of the child export, | (sec_policy_child is the tuple set of the child export, | |||
sec_policy_parent is that of the parent). | sec_policy_parent is that of the parent). | |||
a) sec_policy_child <= sec_policy_parent (<= for subset). This | (a) sec_policy_child <= sec_policy_parent (<= for subset). This | |||
means that the set of security tuples specified on the security | means that the set of security tuples specified on the security | |||
policy of a child directory is always a subset of that of its | policy of a child directory is always a subset of its parent | |||
parent directory. | directory. | |||
b) sec_policy_child ^ sec_policy_parent != {} (^ for intersection, | (b) sec_policy_child ^ sec_policy_parent != {} (^ for intersection, | |||
{} for the empty set). This means that the security tuples | {} for the empty set). This means that the set of security | |||
specified on the security policy of a child directory always has a | tuples specified on the security policy of a child directory | |||
non empty intersection with that of the parent. | always has a non-empty intersection with that of the parent. | |||
c) sec_policy_child ^ sec_policy_parent == {}. This means that | (c) sec_policy_child ^ sec_policy_parent == {}. This means that the | |||
the set of tuples specified on the security policy of a child | set of security tuples specified on the security policy of a | |||
directory may not intersect with that of the parent. In other | child directory may not intersect with that of the parent. In | |||
words, there are no restrictions on how the system administrator | other words, there are no restrictions on how the system | |||
may set up these tuples. | administrator may set up these tuples. | |||
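The three relationships above map directly onto set operations. The sketch below reports the strictest approach that a given parent/child pair happens to satisfy; the function name and the string labels standing in for security tuples are hypothetical.

```python
def strictest_approach(child: frozenset, parent: frozenset) -> str:
    """Return "a", "b", or "c" for a pair of security tuple sets."""
    if child <= parent:      # (a) child is a subset of parent
        return "a"
    if child & parent:       # (b) intersection is non-empty
        return "b"
    return "c"               # (c) no restriction; the sets may be disjoint
```

For example, a child offering {"krb5", "sys"} under a parent offering {"krb5", "krb5i"} satisfies (b) but not (a), because "sys" is not among the parent's tuples.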
In order for a server to support approaches (b) (for the case when a | In order for a server to support approaches (b) (for the case when a | |||
client chooses a flavor that is not a member of sec_policy_parent) | client chooses a flavor that is not a member of sec_policy_parent) | |||
and (c), the put filehandle operation cannot return NFS4ERR_WRONGSEC | and (c), the put filehandle operation cannot return NFS4ERR_WRONGSEC | |||
when there is a security tuple mismatch. Instead, it should be | when there is a security tuple mismatch. Instead, it should be | |||
returned from the LOOKUP (or OPEN by existing component name) that | returned from the LOOKUP (or OPEN by existing component name) that | |||
follows. | follows. | |||
Since the above guideline does not contradict approach (a), it should | Since the above guideline does not contradict approach (a), it should | |||
be followed in general. Even if approach (a) is implemented, it is | be followed in general. Even if approach (a) is implemented, it is | |||
skipping to change at page 35, line 7 | skipping to change at page 34, line 11 | |||
the client's only recourse is to send the put filehandle operation, | the client's only recourse is to send the put filehandle operation, | |||
LOOKUPP, GETFH sequence of operations with every security tuple it | LOOKUPP, GETFH sequence of operations with every security tuple it | |||
supports. | supports. | |||
Regardless of whether SECINFO_NO_NAME is supported, an NFSv4.1 server | Regardless of whether SECINFO_NO_NAME is supported, an NFSv4.1 server | |||
MUST NOT return NFS4ERR_WRONGSEC in response to a put filehandle | MUST NOT return NFS4ERR_WRONGSEC in response to a put filehandle | |||
operation if the operation is immediately followed by a LOOKUPP. | operation if the operation is immediately followed by a LOOKUPP. | |||
2.6.3.1.1.5. Put Filehandle Operation + SECINFO/SECINFO_NO_NAME | 2.6.3.1.1.5. Put Filehandle Operation + SECINFO/SECINFO_NO_NAME | |||
A security sensitive client is allowed to choose a strong security | A security-sensitive client is allowed to choose a strong security | |||
tuple when querying a server to determine a file object's permitted | tuple when querying a server to determine a file object's permitted | |||
security tuples. The security tuple chosen by the client does not | security tuples. The security tuple chosen by the client does not | |||
have to be included in the tuple list of the security policy of the | have to be included in the tuple list of the security policy of | |||
either parent directory indicated in the put filehandle operation, or | either the parent directory indicated in the put filehandle operation | |||
the child file object indicated in SECINFO (or any parent directory | or the child file object indicated in SECINFO (or any parent | |||
indicated in SECINFO_NO_NAME). Of course the server has to be | directory indicated in SECINFO_NO_NAME). Of course, the server has | |||
configured for whatever security tuple the client selects, otherwise | to be configured for whatever security tuple the client selects; | |||
the request will fail at RPC layer with an appropriate authentication | otherwise, the request will fail at the RPC layer with an appropriate | |||
error. | authentication error. | |||
In theory, there is no connection between the security flavor used by | In theory, there is no connection between the security flavor used by | |||
SECINFO or SECINFO_NO_NAME and those supported by the security | SECINFO or SECINFO_NO_NAME and those supported by the security | |||
policy. But in practice, the client may start looking for strong | policy. But in practice, the client may start looking for strong | |||
flavors from those supported by the security policy, followed by | flavors from those supported by the security policy, followed by | |||
those in the REQUIRED set. | those in the REQUIRED set. | |||
The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC to a put | The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC to a put | |||
filehandle operation that is immediately followed by SECINFO or | filehandle operation that is immediately followed by SECINFO or | |||
SECINFO_NO_NAME. The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC | SECINFO_NO_NAME. The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC | |||
skipping to change at page 35, line 38 | skipping to change at page 34, line 42 | |||
2.6.3.1.1.6. Put Filehandle Operation + Nothing | 2.6.3.1.1.6. Put Filehandle Operation + Nothing | |||
The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC. | The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC. | |||
2.6.3.1.1.7. Put Filehandle Operation + Anything Else | 2.6.3.1.1.7. Put Filehandle Operation + Anything Else | |||
"Anything Else" includes OPEN by filehandle. | "Anything Else" includes OPEN by filehandle. | |||
The security policy enforcement applies to the filehandle specified | The security policy enforcement applies to the filehandle specified | |||
in the put filehandle operation. Therefore the put filehandle | in the put filehandle operation. Therefore, the put filehandle | |||
operation MUST return NFS4ERR_WRONGSEC when there is a security tuple | operation MUST return NFS4ERR_WRONGSEC when there is a security tuple | |||
mismatch. This avoids the complexity adding NFS4ERR_WRONGSEC as an | mismatch. This avoids the complexity of adding NFS4ERR_WRONGSEC as | |||
allowable error to every other operation. | an allowable error to every other operation. | |||
A COMPOUND containing the series put filehandle operation + | A COMPOUND containing the series put filehandle operation + | |||
SECINFO_NO_NAME (style SECINFO_STYLE4_CURRENT_FH) is an efficient way | SECINFO_NO_NAME (style SECINFO_STYLE4_CURRENT_FH) is an efficient way | |||
for the client to recover from NFS4ERR_WRONGSEC. | for the client to recover from NFS4ERR_WRONGSEC. | |||
The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC to any operation | The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC to any operation | |||
other than a put filehandle operation, LOOKUP, LOOKUPP, and OPEN (by | other than a put filehandle operation, LOOKUP, LOOKUPP, and OPEN (by | |||
component name). | component name). | |||
2.6.3.1.1.8. Operations after SECINFO and SECINFO_NO_NAME | 2.6.3.1.1.8. Operations after SECINFO and SECINFO_NO_NAME | |||
skipping to change at page 36, line 23 | skipping to change at page 35, line 24 | |||
SECINFO and SECINFO_NO_NAME consume the current filehandle (note that | SECINFO and SECINFO_NO_NAME consume the current filehandle (note that | |||
this is a change from NFSv4.0). This leaves no current filehandle | this is a change from NFSv4.0). This leaves no current filehandle | |||
for READ to use, and READ returns NFS4ERR_NOFILEHANDLE. | for READ to use, and READ returns NFS4ERR_NOFILEHANDLE. | |||
2.6.3.1.2. LINK and RENAME | 2.6.3.1.2. LINK and RENAME | |||
The LINK and RENAME operations use both the current and saved | The LINK and RENAME operations use both the current and saved | |||
filehandles. Technically, the server MAY return NFS4ERR_WRONGSEC | filehandles. Technically, the server MAY return NFS4ERR_WRONGSEC | |||
from LINK or RENAME if the security policy of the saved filehandle | from LINK or RENAME if the security policy of the saved filehandle | |||
rejects the security flavor used in the COMPOUND request's | rejects the security flavor used in the COMPOUND request's | |||
credentials. However, if the server does so, and if there is no | credentials. If the server does so, then if there is no intersection | |||
intersection between the security policies of saved and current | between the security policies of saved and current filehandles, this | |||
filehandles, this means it will be impossible for the client to | means that it will be impossible for the client to perform the | |||
perform the intended LINK or RENAME operation. | intended LINK or RENAME operation. | |||
For example, suppose the client sends this COMPOUND request: | For example, suppose the client sends this COMPOUND request: | |||
SEQUENCE, PUTFH bFH, SAVEFH, PUTFH aFH, RENAME "c" "d", where | SEQUENCE, PUTFH bFH, SAVEFH, PUTFH aFH, RENAME "c" "d", where | |||
filehandles bFH and aFH refer to different directories. Suppose no | filehandles bFH and aFH refer to different directories. Suppose no | |||
common security tuple exists between the security policies of aFH and | common security tuple exists between the security policies of aFH and | |||
bFH. If the client sends the request using credentials acceptable to | bFH. If the client sends the request using credentials acceptable to | |||
bFH's security policy but not aFH's policy, then the PUTFH aFH | bFH's security policy but not aFH's policy, then the PUTFH aFH | |||
operation will fail with NFS4ERR_WRONGSEC. After a SECINFO_NO_NAME | operation will fail with NFS4ERR_WRONGSEC. After a SECINFO_NO_NAME | |||
request, the client sends SEQUENCE, PUTFH bFH, SAVEFH, PUTFH aFH, | request, the client sends SEQUENCE, PUTFH bFH, SAVEFH, PUTFH aFH, | |||
RENAME "c" "d", using credentials acceptable to aFH's security | RENAME "c" "d", using credentials acceptable to aFH's security policy | |||
policy, but not bFH's policy. The server returns NFS4ERR_WRONGSEC on | but not bFH's policy. The server returns NFS4ERR_WRONGSEC on the | |||
the RENAME operation. | RENAME operation. | |||
To prevent a client from starting endless cycle of a request | To prevent a client from an endless sequence of a request containing | |||
containing LINK or RENAME, followed by a request containing | LINK or RENAME, followed by a request containing SECINFO_NO_NAME or | |||
SECINFO_NO_NAME or SECINFO, the server MUST detect when the security | SECINFO, the server MUST detect when the security policies of the | |||
policies of the current and saved filehandles have no mutually | current and saved filehandles have no mutually acceptable security | |||
acceptable security tuple, and MUST NOT return NFS4ERR_WRONGSEC from | tuple, and MUST NOT return NFS4ERR_WRONGSEC from LINK or RENAME in | |||
LINK or RENAME in that situation. Instead the server MUST do one of | that situation. Instead the server MUST do one of two things: | |||
two things: | ||||
o The server can return NFS4ERR_XDEV. | o The server can return NFS4ERR_XDEV. | |||
o The server can allow the security policy of the current filehandle | o The server can allow the security policy of the current filehandle | |||
to override that of the saved filehandle, and so return NFS4_OK. | to override that of the saved filehandle, and so return NFS4_OK. | |||
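The LINK/RENAME rule above can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name, the `prefer_xdev` switch, and the use of plain strings as stand-ins for security tuples are hypothetical and not part of the protocol. It also assumes that the put filehandle operation has already checked the request's tuple against the current filehandle's policy, and it chooses to return NFS4ERR_WRONGSEC in the case where the text merely says the server MAY do so.

```python
def link_rename_status(request_tuple, current_policy, saved_policy,
                       prefer_xdev=True):
    """Sketch of the status a server might return for LINK or RENAME.

    request_tuple  -- security tuple of the COMPOUND request's credentials
    current_policy -- set of tuples accepted for the current filehandle
    saved_policy   -- set of tuples accepted for the saved filehandle
    prefer_xdev    -- hypothetical switch between the two permitted outcomes
    """
    # The saved filehandle's policy accepts the request: nothing to report.
    if request_tuple in saved_policy:
        return "NFS4_OK"

    # A mutually acceptable tuple exists, so the client can recover by
    # retrying with it; the server MAY report NFS4ERR_WRONGSEC here.
    if current_policy & saved_policy:
        return "NFS4ERR_WRONGSEC"

    # No common tuple: NFS4ERR_WRONGSEC MUST NOT be returned. The server
    # either returns NFS4ERR_XDEV or lets the current filehandle's policy
    # override the saved filehandle's policy.
    return "NFS4ERR_XDEV" if prefer_xdev else "NFS4_OK"
```

With disjoint policies the function never reports NFS4ERR_WRONGSEC, which is what breaks the cycle of LINK or RENAME followed by SECINFO or SECINFO_NO_NAME described above.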
2.7. Minor Versioning | 2.7. Minor Versioning | |||
To address the requirement of an NFS protocol that can evolve as the | To address the requirement of an NFS protocol that can evolve as the | |||
need arises, the NFSv4.1 protocol contains the rules and framework to | need arises, the NFSv4.1 protocol contains the rules and framework to | |||
allow for future minor changes or versioning. | allow for future minor changes or versioning. | |||
The base assumption with respect to minor versioning is that any | The base assumption with respect to minor versioning is that any | |||
future accepted minor version will be documented in one or more | future accepted minor version will be documented in one or more | |||
standards track RFCs. Minor version zero of the NFSv4 protocol is | Standards Track RFCs. Minor version 0 of the NFSv4 protocol is | |||
represented by [30], and minor version one is represented by this | represented by [30], and minor version 1 is represented by this RFC. | |||
document [[Comment.1: RFC Editor: change "document" to "RFC" when we | The COMPOUND and CB_COMPOUND procedures support the encoding of the | |||
publish]]. The COMPOUND and CB_COMPOUND procedures support the | minor version being requested by the client. | |||
encoding of the minor version being requested by the client. | ||||
The following items represent the basic rules for the development of | The following items represent the basic rules for the development of | |||
minor versions. Note that a future minor version may modify or add | minor versions. Note that a future minor version may modify or add | |||
to the following rules as part of the minor version definition. | to the following rules as part of the minor version definition. | |||
1. Procedures are not added or deleted | 1. Procedures are not added or deleted. | |||
To maintain the general RPC model, NFSv4 minor versions will not | To maintain the general RPC model, NFSv4 minor versions will not | |||
add to or delete procedures from the NFS program. | add to or delete procedures from the NFS program. | |||
2. Minor versions may add operations to the COMPOUND and | 2. Minor versions may add operations to the COMPOUND and | |||
CB_COMPOUND procedures. | CB_COMPOUND procedures. | |||
The addition of operations to the COMPOUND and CB_COMPOUND | The addition of operations to the COMPOUND and CB_COMPOUND | |||
procedures does not affect the RPC model. | procedures does not affect the RPC model. | |||
* Minor versions may append attributes to the bitmap4 that | * Minor versions may append attributes to the bitmap4 that | |||
represents sets of attributes and the fattr4 that represents | represents sets of attributes and to the fattr4 that | |||
sets of attribute values. | represents sets of attribute values. | |||
This allows for the expansion of the attribute model to allow | This allows for the expansion of the attribute model to allow | |||
for future growth or adaptation. | for future growth or adaptation. | |||
* Minor version X must append any new attributes after the last | * Minor version X must append any new attributes after the last | |||
documented attribute. | documented attribute. | |||
Since attribute results are specified as an opaque array of | Since attribute results are specified as an opaque array of | |||
per-attribute XDR encoded results, the complexity of adding | per-attribute, XDR-encoded results, the complexity of adding | |||
new attributes in the midst of the current definitions would | new attributes in the midst of the current definitions would | |||
be too burdensome. | be too burdensome. | |||
3. Minor versions must not modify the structure of an existing | 3. Minor versions must not modify the structure of an existing | |||
operation's arguments or results. | operation's arguments or results. | |||
Again the complexity of handling multiple structure definitions | Again, the complexity of handling multiple structure definitions | |||
for a single operation is too burdensome. New operations should | for a single operation is too burdensome. New operations should | |||
be added instead of modifying existing structures for a minor | be added instead of modifying existing structures for a minor | |||
version. | version. | |||
This rule does not preclude the following adaptations in a minor | This rule does not preclude the following adaptations in a minor | |||
version. | version: | |||
* adding bits to flag fields such as new attributes to | * adding bits to flag fields, such as new attributes to | |||
GETATTR's bitmap4 data type and providing corresponding | GETATTR's bitmap4 data type, and providing corresponding | |||
variants of opaque arrays, such as a notify4 used together | variants of opaque arrays, such as a notify4 used together | |||
with such bitmaps. | with such bitmaps | |||
* adding bits to existing attributes like ACLs that have flag | * adding bits to existing attributes like ACLs that have flag | |||
words | words | |||
* extending enumerated types (including NFS4ERR_*) with new | * extending enumerated types (including NFS4ERR_*) with new | |||
values | values | |||
* adding cases to a switched union | * adding cases to a switched union | |||
4. Minor versions must not modify the structure of existing | 4. Minor versions must not modify the structure of existing | |||
skipping to change at page 38, line 38 | skipping to change at page 37, line 37 | |||
This prevents the potential reuse of a particular operation | This prevents the potential reuse of a particular operation | |||
"slot" in a future minor version. | "slot" in a future minor version. | |||
6. Minor versions must not delete attributes. | 6. Minor versions must not delete attributes. | |||
7. Minor versions must not delete flag bits or enumeration values. | 7. Minor versions must not delete flag bits or enumeration values. | |||
8. Minor versions may declare an operation MUST NOT be implemented. | 8. Minor versions may declare an operation MUST NOT be implemented. | |||
Specifying an operation MUST NOT be implemented is equivalent to | Specifying that an operation MUST NOT be implemented is | |||
obsoleting an operation. For the client, it means that the | equivalent to obsoleting an operation. For the client, it means | |||
operation MUST NOT be sent to the server. For the server, an | that the operation MUST NOT be sent to the server. For the | |||
NFS error can be returned as opposed to "dropping" the request | server, an NFS error can be returned as opposed to "dropping" | |||
as an XDR decode error. This approach allows for the | the request as an XDR decode error. This approach allows for | |||
obsolescence of an operation while maintaining its structure so | the obsolescence of an operation while maintaining its structure | |||
that a future minor version can reintroduce the operation. | so that a future minor version can reintroduce the operation. | |||
1. Minor versions may declare an attribute MUST NOT be | 1. Minor versions may declare that an attribute MUST NOT be | |||
implemented. | implemented. | |||
2. Minor versions may declare a flag bit or enumeration value | 2. Minor versions may declare that a flag bit or enumeration | |||
MUST NOT be implemented. | value MUST NOT be implemented. | |||
9. Minor versions may downgrade features from REQUIRED to | 9. Minor versions may downgrade features from REQUIRED to | |||
RECOMMENDED, or RECOMMENDED to OPTIONAL. | RECOMMENDED, or RECOMMENDED to OPTIONAL. | |||
10. Minor versions may upgrade features from OPTIONAL to RECOMMENDED | 10. Minor versions may upgrade features from OPTIONAL to | |||
or RECOMMENDED to REQUIRED. | RECOMMENDED, or RECOMMENDED to REQUIRED. | |||
11. A client and server that supports minor version X SHOULD support | 11. A client and server that support minor version X SHOULD support | |||
minor versions 0 (zero) through X-1 as well. | minor versions zero through X-1 as well. | |||
12. Except for infrastructural changes, a minor version must not | 12. Except for infrastructural changes, a minor version must not | |||
introduce REQUIRED new features. | introduce REQUIRED new features. | |||
This rule allows for the introduction of new functionality and | This rule allows for the introduction of new functionality and | |||
forces the use of implementation experience before designating a | forces the use of implementation experience before designating a | |||
feature as REQUIRED. On the other hand, some classes of | feature as REQUIRED. On the other hand, some classes of | |||
features are infrastructural and have broad effects. Allowing | features are infrastructural and have broad effects. Allowing | |||
infrastructural features to be RECOMMENDED or OPTIONAL | infrastructural features to be RECOMMENDED or OPTIONAL | |||
complicates implementation of the minor version. | complicates implementation of the minor version. | |||
13. A client MUST NOT attempt to use a stateid, filehandle, or | 13. A client MUST NOT attempt to use a stateid, filehandle, or | |||
similar returned object from the COMPOUND procedure with minor | similar returned object from the COMPOUND procedure with minor | |||
version X for another COMPOUND procedure with minor version Y, | version X for another COMPOUND procedure with minor version Y, | |||
where X != Y. | where X != Y. | |||
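Rule 13 could be enforced on the client side by tagging each returned object with the minor version of the COMPOUND that produced it. The sketch below is one possible approach, not something the specification prescribes; `TaggedObject` and `value_for_compound` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedObject:
    """A stateid, filehandle, or similar object plus its originating minor version."""
    value: bytes
    minor_version: int


def value_for_compound(obj, compound_minor_version):
    """Return the raw value only if it came from the same minor version (X == Y)."""
    if obj.minor_version != compound_minor_version:
        raise ValueError(
            "object obtained with minor version %d cannot be used in a "
            "COMPOUND with minor version %d"
            % (obj.minor_version, compound_minor_version)
        )
    return obj.value
```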
2.8. Non-RPC-based Security Services | 2.8. Non-RPC-Based Security Services | |||
As described in Section 2.2.1.1.1.1, NFSv4.1 relies on RPC for | As described in Section 2.2.1.1.1.1, NFSv4.1 relies on RPC for | |||
identification, authentication, integrity, and privacy. NFSv4.1 | identification, authentication, integrity, and privacy. NFSv4.1 | |||
itself provides or enables additional security services as described | itself provides or enables additional security services as described | |||
in the next several subsections. | in the next several subsections. | |||
2.8.1. Authorization | 2.8.1. Authorization | |||
Authorization to access a file object via an NFSv4.1 operation is | Authorization to access a file object via an NFSv4.1 operation is | |||
ultimately determined by the NFSv4.1 server. A client can | ultimately determined by the NFSv4.1 server. A client can | |||
predetermine its access to a file object via the OPEN (Section 18.16) | predetermine its access to a file object via the OPEN (Section 18.16) | |||
and the ACCESS (Section 18.1) operations. | and the ACCESS (Section 18.1) operations. | |||
Principals with appropriate access rights can modify the | Principals with appropriate access rights can modify the | |||
authorization on a file object via the SETATTR (Section 18.30) | authorization on a file object via the SETATTR (Section 18.30) | |||
operation. Attributes that affect access rights include: mode, | operation. Attributes that affect access rights include mode, owner, | |||
owner, owner_group, acl, dacl, and sacl. See Section 5. | owner_group, acl, dacl, and sacl. See Section 5. | |||
2.8.2. Auditing | 2.8.2. Auditing | |||
NFSv4.1 provides auditing on a per file object basis, via the acl and | NFSv4.1 provides auditing on a per-file object basis, via the acl and | |||
sacl attributes as described in Section 6. It is outside the scope | sacl attributes as described in Section 6. It is outside the scope | |||
of this specification to specify audit log formats or management | of this specification to specify audit log formats or management | |||
policies. | policies. | |||
2.8.3. Intrusion Detection | 2.8.3. Intrusion Detection | |||
NFSv4.1 provides alarm control on a per file object basis, via the | NFSv4.1 provides alarm control on a per-file object basis, via the | |||
acl and sacl attributes as described in Section 6. Alarms may serve | acl and sacl attributes as described in Section 6. Alarms may serve | |||
as the basis for intrusion detection. It is outside the scope of | as the basis for intrusion detection. It is outside the scope of | |||
this specification to specify heuristics for detecting intrusion via | this specification to specify heuristics for detecting intrusion via | |||
alarms. | alarms. | |||
2.9. Transport Layers | 2.9. Transport Layers | |||
2.9.1. REQUIRED and RECOMMENDED Properties of Transports | 2.9.1. REQUIRED and RECOMMENDED Properties of Transports | |||
NFSv4.1 works over RDMA and non-RDMA-based transports with the | NFSv4.1 works over Remote Direct Memory Access (RDMA) and non-RDMA- | |||
following attributes: | based transports with the following attributes: | |||
o The transport supports reliable delivery of data, which NFSv4.1 | o The transport supports reliable delivery of data, which NFSv4.1 | |||
requires but neither NFSv4.1 nor RPC has facilities for ensuring. | requires but neither NFSv4.1 nor RPC has facilities for ensuring | |||
[34] | [34]. | |||
o The transport delivers data in the order it was sent. Ordered | o The transport delivers data in the order it was sent. Ordered | |||
delivery simplifies detection of transmit errors, and simplifies | delivery simplifies detection of transmit errors, and simplifies | |||
the sending of arbitrary sized requests and responses, via the | the sending of arbitrary sized requests and responses via the | |||
record marking protocol [3]. | record marking protocol [3]. | |||
Where an NFSv4.1 implementation supports operation over the IP | Where an NFSv4.1 implementation supports operation over the IP | |||
network protocol, any transport used between NFS and IP MUST be among | network protocol, any transport used between NFS and IP MUST be among | |||
the IETF-approved congestion control transport protocols. At the | the IETF-approved congestion control transport protocols. At the | |||
time this document was written, the only two transports that had the | time this document was written, the only two transports that had the | |||
above attributes were TCP and SCTP. To enhance the possibilities for | above attributes were TCP and the Stream Control Transmission | |||
interoperability, an NFSv4.1 implementation MUST support operation | Protocol (SCTP). To enhance the possibilities for interoperability, | |||
over the TCP transport protocol. | an NFSv4.1 implementation MUST support operation over the TCP | |||
transport protocol. | ||||
Even if NFSv4.1 is used over a non-IP network protocol, it is | Even if NFSv4.1 is used over a non-IP network protocol, it is | |||
RECOMMENDED that the transport support congestion control. | RECOMMENDED that the transport support congestion control. | |||
It is permissible for a connectionless transport to be used under | It is permissible for a connectionless transport to be used under | |||
NFSv4.1, however reliable and in-order delivery of data combined with | NFSv4.1; however, reliable and in-order delivery of data combined | |||
congestion control by the connectionless transport is REQUIRED; as a | with congestion control by the connectionless transport is REQUIRED. | |||
consequence UDP by itself MUST NOT be used as an NFSv4.1 transport. | As a consequence, UDP by itself MUST NOT be used as an NFSv4.1 | |||
NFSv4.1 assumes that a client transport address and server transport | transport. NFSv4.1 assumes that a client transport address and | |||
address used to send data over a transport together constitute a | server transport address used to send data over a transport together | |||
connection, even if the underlying transport eschews the concept of a | constitute a connection, even if the underlying transport eschews the | |||
connection. | concept of a connection. | |||
2.9.2. Client and Server Transport Behavior | 2.9.2. Client and Server Transport Behavior | |||
If a connection-oriented transport (e.g. TCP) is used, the client | If a connection-oriented transport (e.g., TCP) is used, the client | |||
and server SHOULD use long lived connections for at least three | and server SHOULD use long-lived connections for at least three | |||
reasons: | reasons: | |||
1. This will prevent the weakening of the transport's congestion | 1. This will prevent the weakening of the transport's congestion | |||
control mechanisms via short lived connections. | control mechanisms via short-lived connections. | |||
2. This will improve performance for the WAN environment by | 2. This will improve performance for the WAN environment by | |||
eliminating the need for connection setup handshakes. | eliminating the need for connection setup handshakes. | |||
3. The NFSv4.1 callback model differs from NFSv4.0, and requires the | 3. The NFSv4.1 callback model differs from NFSv4.0, and requires the | |||
client and server to maintain a client-created backchannel (see | client and server to maintain a client-created backchannel (see | |||
Section 2.10.3.1) for the server to use. | Section 2.10.3.1) for the server to use. | |||
In order to reduce congestion, if a connection-oriented transport is | In order to reduce congestion, if a connection-oriented transport is | |||
used, and the request is not the NULL procedure, | used, and the request is not the NULL procedure: | |||
o A requester MUST NOT retry a request unless the connection the | o A requester MUST NOT retry a request unless the connection the | |||
request was sent over was lost before the reply was received. | request was sent over was lost before the reply was received. | |||
o A replier MUST NOT silently drop a request, even if the request is | o A replier MUST NOT silently drop a request, even if the request is | |||
a retry. (The silent drop behavior of RPCSEC_GSS [4] does not | a retry. (The silent drop behavior of RPCSEC_GSS [4] does not | |||
apply because this behavior happens at the RPCSEC_GSS layer, a | apply because this behavior happens at the RPCSEC_GSS layer, a | |||
lower layer in the request processing). Instead, the replier | lower layer in the request processing.) Instead, the replier | |||
SHOULD return an appropriate error (see Section 2.10.6.1) or it | SHOULD return an appropriate error (see Section 2.10.6.1), or it | |||
MAY disconnect the connection. | MAY disconnect the connection. | |||
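The requester-side retry rule above reduces to a single predicate. The following is a minimal sketch; the function and parameter names are assumptions made for illustration. It covers only the prohibition stated in this passage and says nothing about transports or procedures outside its scope.

```python
def retry_prohibited(connection_oriented, is_null_procedure,
                     connection_lost_before_reply):
    """Return True when the rule above forbids the requester from retrying.

    The rule applies only to connection-oriented transports and to
    requests other than the NULL procedure. Within that scope, a retry
    is allowed only if the connection the request was sent over was
    lost before the reply was received.
    """
    rule_applies = connection_oriented and not is_null_procedure
    return rule_applies and not connection_lost_before_reply
```

A requester would consult such a check before resending; when it returns True, the request is still outstanding on a live connection and the requester is expected to keep waiting for the reply.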
When sending a reply, the replier MUST send the reply to the same | When sending a reply, the replier MUST send the reply to the same | |||
full network address (e.g. if using an IP-based transport, the source | full network address (e.g., if using an IP-based transport, the | |||
port of the requester is part of the full network address) that the | source port of the requester is part of the full network address) | |||
requester sent the request from. If using a connection-oriented | from which the requester sent the request. If using a connection- | |||
transport, replies MUST be sent on the same connection the request | oriented transport, replies MUST be sent on the same connection from | |||
was received from. | which the request was received. | |||
If a connection is dropped after the replier receives the request but | If a connection is dropped after the replier receives the request but | |||
before the replier sends the reply, the replier might have an pending | before the replier sends the reply, the replier might have a pending | |||
reply. If a connection is established with the same source and | reply. If a connection is established with the same source and | |||
destination full network address as the dropped connection, then the | destination full network address as the dropped connection, then the | |||
replier MUST NOT send the reply until the requester retries the | replier MUST NOT send the reply until the requester retries the | |||
request. The reason for this prohibition is that the requester MAY | request. The reason for this prohibition is that the requester MAY | |||
retry a request over a different connection that is associated with | retry a request over a different connection (provided that connection | |||
the session. | is associated with the original request's session). | |||
When using RDMA transports there are other reasons for not tolerating | When using RDMA transports, there are other reasons for not | |||
retries over the same connection: | tolerating retries over the same connection: | |||
o RDMA transports use "credits" to enforce flow control, where a | o RDMA transports use "credits" to enforce flow control, where a | |||
credit is a right to a peer to transmit a message. If one peer | credit is a right to a peer to transmit a message. If one peer | |||
were to retransmit a request (or reply), it would consume an | were to retransmit a request (or reply), it would consume an | |||
additional credit. If the replier retransmitted a reply, it would | additional credit. If the replier retransmitted a reply, it would | |||
certainly result in an RDMA connection loss, since the requester | certainly result in an RDMA connection loss, since the requester | |||
would typically only post a single receive buffer for each | would typically only post a single receive buffer for each | |||
request. If the requester retransmitted a request, the additional | request. If the requester retransmitted a request, the additional | |||
credit consumed on the server might lead to RDMA connection | credit consumed on the server might lead to RDMA connection | |||
failure unless the client accounted for it and decreased its | failure unless the client accounted for it and decreased its | |||
skipping to change at page 43, line 20 | skipping to change at page 42, line 20 | |||
shortfalls with practical solutions: | shortfalls with practical solutions: | |||
o EOS is enabled by a reply cache with a bounded size, making it | o EOS is enabled by a reply cache with a bounded size, making it | |||
feasible to keep the cache in persistent storage and enable EOS | feasible to keep the cache in persistent storage and enable EOS | |||
through server failure and recovery. One reason that previous | through server failure and recovery. One reason that previous | |||
revisions of NFS did not support EOS was because some EOS | revisions of NFS did not support EOS was because some EOS | |||
approaches often limited parallelism. As will be explained in | approaches often limited parallelism. As will be explained in | |||
Section 2.10.6, NFSv4.1 supports both EOS and unlimited | Section 2.10.6, NFSv4.1 supports both EOS and unlimited | |||
parallelism. | parallelism. | |||
o The NFSv4.1 client (defined in Section 1.5, Paragraph 2) creates | o The NFSv4.1 client (defined in Section 1.6, Paragraph 2) creates | |||
transport connections and provides them to the server to use for | transport connections and provides them to the server to use for | |||
sending callback requests, thus solving the firewall issue | sending callback requests, thus solving the firewall issue | |||
(Section 18.34). Races between responses from client requests, | (Section 18.34). Races between responses from client requests and | |||
and callbacks caused by the requests are detected via the | callbacks caused by the requests are detected via the session's | |||
session's sequencing properties which are a consequence of EOS | sequencing properties that are a consequence of EOS | |||
(Section 2.10.6.3). | (Section 2.10.6.3). | |||
o The NFSv4.1 client can associate an arbitrary number of | o The NFSv4.1 client can associate an arbitrary number of | |||
connections with the session, and thus provide trunking | connections with the session, and thus provide trunking | |||
(Section 2.10.5). | (Section 2.10.5). | |||
o The NFSv4.1 client and server produce a session key independent | o The NFSv4.1 client and server produce a session key independent | |||
of client and server machine credentials which can be used to | of client and server machine credentials which can be used to | |||
compute a digest for protecting critical session management | compute a digest for protecting critical session management | |||
operations (Section 2.10.8.3). | operations (Section 2.10.8.3). | |||
o The NFSv4.1 client can also create secure RPCSEC_GSS contexts for | o The NFSv4.1 client can also create secure RPCSEC_GSS contexts for | |||
use by the session's backchannel that do not require the server to | use by the session's backchannel that do not require the server to | |||
authenticate to a client machine principal (Section 2.10.8.2). | authenticate to a client machine principal (Section 2.10.8.2). | |||
A session is a dynamically created, long-lived server object created | A session is a dynamically created, long-lived server object created | |||
by a client, used over time from one or more transport connections. | by a client and used over time from one or more transport | |||
Its function is to maintain the server's state relative to the | connections. Its function is to maintain the server's state relative | |||
connection(s) belonging to a client instance. This state is entirely | to the connection(s) belonging to a client instance. This state is | |||
independent of the connection itself, and indeed the state exists | entirely independent of the connection itself, and indeed the state | |||
whether the connection exists or not. A client may have one or more | exists whether or not the connection exists. A client may have one | |||
sessions associated with it so that client-associated state may be | or more sessions associated with it so that client-associated state | |||
accessed using any of the sessions associated with that client's | may be accessed using any of the sessions associated with that | |||
client ID, when connections are associated with those sessions. When | client's client ID, when connections are associated with those | |||
no connections are associated with any of a client ID's sessions for | sessions. When no connections are associated with any of a client | |||
an extended time, such objects as locks, opens, delegations, layouts, | ID's sessions for an extended time, such objects as locks, opens, | |||
etc. are subject to expiration. The session serves as an object | delegations, layouts, etc. are subject to expiration. The session | |||
representing a means of access by a client to the associated client | serves as an object representing a means of access by a client to the | |||
state on the server, independent of the physical means of access to | associated client state on the server, independent of the physical | |||
that state. | means of access to that state. | |||
A single client may create multiple sessions. A single session MUST | A single client may create multiple sessions. A single session MUST | |||
NOT serve multiple clients. | NOT serve multiple clients. | |||
2.10.2. NFSv4 Integration | 2.10.2. NFSv4 Integration | |||
Sessions are part of NFSv4.1 and not NFSv4.0. Normally, a major | Sessions are part of NFSv4.1 and not NFSv4.0. Normally, a major | |||
infrastructure change such as sessions would require a new major | infrastructure change such as sessions would require a new major | |||
version number to an ONC RPC program like NFS. However, because | version number to an Open Network Computing (ONC) RPC program like | |||
NFSv4 encapsulates its functionality in a single procedure, COMPOUND, | NFS. However, because NFSv4 encapsulates its functionality in a | |||
and because COMPOUND can support an arbitrary number of operations, | single procedure, COMPOUND, and because COMPOUND can support an | |||
sessions have been added to NFSv4.1 with little difficulty. COMPOUND | arbitrary number of operations, sessions have been added to NFSv4.1 | |||
includes a minor version number field, and for NFSv4.1 this minor | with little difficulty. COMPOUND includes a minor version number | |||
version is set to 1. When the NFSv4 server processes a COMPOUND with | field, and for NFSv4.1 this minor version is set to 1. When the | |||
the minor version set to 1, it expects a different set of operations | NFSv4 server processes a COMPOUND with the minor version set to 1, it | |||
than it does for NFSv4.0. NFSv4.1 defines the SEQUENCE operation, | expects a different set of operations than it does for NFSv4.0. | |||
which is required for every COMPOUND that operates over an | NFSv4.1 defines the SEQUENCE operation, which is required for every | |||
established session, with the exception of some session | COMPOUND that operates over an established session, with the | |||
administration operations, such as DESTROY_SESSION (Section 18.37). | exception of some session administration operations, such as | |||
DESTROY_SESSION (Section 18.37). | ||||
2.10.2.1. SEQUENCE and CB_SEQUENCE | 2.10.2.1. SEQUENCE and CB_SEQUENCE | |||
In NFSv4.1, when the SEQUENCE operation is present, it MUST be the | In NFSv4.1, when the SEQUENCE operation is present, it MUST be the | |||
first operation in the COMPOUND procedure. The primary purpose of | first operation in the COMPOUND procedure. The primary purpose of | |||
SEQUENCE is to carry the session identifier. The session identifier | SEQUENCE is to carry the session identifier. The session identifier | |||
associates all other operations in the COMPOUND procedure with a | associates all other operations in the COMPOUND procedure with a | |||
particular session. SEQUENCE also contains required information for | particular session. SEQUENCE also contains required information for | |||
maintaining EOS (see Section 2.10.6). Session-enabled NFSv4.1 | maintaining EOS (see Section 2.10.6). Session-enabled NFSv4.1 | |||
COMPOUND requests thus have the form: | COMPOUND requests thus have the form: | |||
skipping to change at page 45, line 16 | skipping to change at page 44, line 17 | |||
"callback_ident", which is superfluous in NFSv4.1 and MUST be ignored | "callback_ident", which is superfluous in NFSv4.1 and MUST be ignored | |||
by the client. CB_SEQUENCE has the same information as SEQUENCE, and | by the client. CB_SEQUENCE has the same information as SEQUENCE, and | |||
also includes other information needed to resolve callback races | also includes other information needed to resolve callback races | |||
(Section 2.10.6.3). | (Section 2.10.6.3). | |||
2.10.2.2. Client ID and Session Association | 2.10.2.2. Client ID and Session Association | |||
Each client ID (Section 2.4) can have zero or more active sessions. | Each client ID (Section 2.4) can have zero or more active sessions. | |||
A client ID and associated session are required to perform file | A client ID and associated session are required to perform file | |||
access in NFSv4.1. Each time a session is used (whether by a client | access in NFSv4.1. Each time a session is used (whether by a client | |||
sending a request to the server, or the client replying to a callback | sending a request to the server or the client replying to a callback | |||
request from the server), the state leased to its associated client | request from the server), the state leased to its associated client | |||
ID is automatically renewed. | ID is automatically renewed. | |||
State such as share reservations, locks, delegations, and layouts | State (which can consist of share reservations, locks, delegations, | |||
(Section 1.6.4) is tied to the client ID. Client state is not tied | and layouts (Section 1.7.4)) is tied to the client ID. Client state | |||
to any individual session. Successive state changing operations from | is not tied to any individual session. Successive state changing | |||
a given state owner MAY go over different sessions, provided the | operations from a given state owner MAY go over different sessions, | |||
session is associated with the same client ID. A callback MAY arrive | provided the session is associated with the same client ID. A | |||
over a different session than from the session that originally | callback MAY arrive over a different session than that of the request | |||
acquired the state pertaining to the callback. For example, if | that originally acquired the state pertaining to the callback. For | |||
session A is used to acquire a delegation, a request to recall the | example, if session A is used to acquire a delegation, a request to | |||
delegation MAY arrive over session B if both sessions are associated | recall the delegation MAY arrive over session B if both sessions are | |||
with the same client ID. Section 2.10.8.1 and Section 2.10.8.2 | associated with the same client ID. Sections 2.10.8.1 and 2.10.8.2 | |||
discuss the security considerations around callbacks. | discuss the security considerations around callbacks. | |||
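The relationship described in Section 2.10.2.2 can be sketched as a toy data model. This is only an illustration, not server code: the names ToyServer, ClientRecord, and use_session are hypothetical, state is reduced to a set of strings, and time is passed in explicitly. It shows state hanging off the client ID, sessions merely pointing at that client ID, and any use of any session renewing the lease.

```python
from dataclasses import dataclass, field


@dataclass
class ClientRecord:
    """Hypothetical record: state is tied to the client ID, not to a session."""
    client_id: int
    lease_seconds: float
    lease_expiry: float = 0.0
    state: set = field(default_factory=set)     # locks, delegations, layouts, ...
    sessions: set = field(default_factory=set)  # zero or more session IDs


class ToyServer:
    def __init__(self):
        self.clients = {}        # client_id -> ClientRecord
        self.session_owner = {}  # session_id -> client_id

    def add_client(self, client_id, lease_seconds, now):
        self.clients[client_id] = ClientRecord(
            client_id, lease_seconds, now + lease_seconds)

    def create_session(self, client_id, session_id):
        # A session ID maps to exactly one client ID.
        if session_id in self.session_owner:
            raise ValueError("a session MUST NOT serve multiple clients")
        self.session_owner[session_id] = client_id
        self.clients[client_id].sessions.add(session_id)

    def use_session(self, session_id, now):
        # Any use of a session renews the lease of the owning client ID.
        client = self.clients[self.session_owner[session_id]]
        client.lease_expiry = now + client.lease_seconds
        return client
```

In this sketch, a delegation recorded while using session A is visible when the same client later uses session B, because both sessions resolve to the same ClientRecord.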
2.10.3. Channels | 2.10.3. Channels | |||
A channel is not a connection. A channel represents the direction | A channel is not a connection. A channel represents the direction | |||
in which ONC RPC requests are sent. | in which ONC RPC requests are sent. | |||
Each session has one or two channels: the fore channel and the | Each session has one or two channels: the fore channel and the | |||
backchannel. Because there are at most two channels per session, and | backchannel. Because there are at most two channels per session, and | |||
because each channel has a distinct purpose, channels are not | because each channel has a distinct purpose, channels are not | |||
assigned identifiers. | assigned identifiers. | |||
The fore channel is used for ordinary requests from the client to the | The fore channel is used for ordinary requests from the client to the | |||
server, and carries COMPOUND requests and responses. A session | server, and carries COMPOUND requests and responses. A session | |||
always has a fore channel. | always has a fore channel. | |||
The backchannel used for callback requests from server to client, and | The backchannel is used for callback requests from server to client, | |||
carries CB_COMPOUND requests and responses. Whether there is a | and carries CB_COMPOUND requests and responses. Whether or not there | |||
backchannel or not is a decision by the client, however many features | is a backchannel is a decision made by the client; however, many | |||
of NFSv4.1 require a backchannel. NFSv4.1 servers MUST support | features of NFSv4.1 require a backchannel. NFSv4.1 servers MUST | |||
backchannels. | support backchannels. | |||
Each session has resources for each channel, including separate reply | Each session has resources for each channel, including separate reply | |||
caches (see Section 2.10.6.1). Note that even the backchannel | caches (see Section 2.10.6.1). Note that even the backchannel | |||
requires a reply cache (or at least, a slot table in order to detect | requires a reply cache (or, at least, a slot table in order to detect | |||
retries) because some callback operations are nonidempotent. | retries) because some callback operations are nonidempotent. | |||
2.10.3.1. Association of Connections, Channels, and Sessions | 2.10.3.1. Association of Connections, Channels, and Sessions | |||
Each channel is associated with zero or more transport connections | Each channel is associated with zero or more transport connections | |||
(whether of the same transport protocol or different transport | (whether of the same transport protocol or different transport | |||
protocols). A connection can be associated with one channel or both | protocols). A connection can be associated with one channel or both | |||
channels of a session; the client and server negotiate whether a | channels of a session; the client and server negotiate whether a | |||
connection will carry traffic for one channel or both channels via | connection will carry traffic for one channel or both channels via | |||
the CREATE_SESSION (Section 18.36) and the BIND_CONN_TO_SESSION | the CREATE_SESSION (Section 18.36) and the BIND_CONN_TO_SESSION | |||
skipping to change at page 46, line 33 | skipping to change at page 45, line 33 | |||
SEQUENCE is transmitted on a different connection, the connection is | SEQUENCE is transmitted on a different connection, the connection is | |||
automatically associated with the fore channel of the session | automatically associated with the fore channel of the session | |||
specified in the SEQUENCE operation. | specified in the SEQUENCE operation. | |||
A connection's association with a session is not exclusive. A | A connection's association with a session is not exclusive. A | |||
connection associated with the channel(s) of one session may be | connection associated with the channel(s) of one session may be | |||
simultaneously associated with the channel(s) of other sessions | simultaneously associated with the channel(s) of other sessions | |||
including sessions associated with other client IDs. | including sessions associated with other client IDs. | |||
It is permissible for connections of multiple transport types to be | It is permissible for connections of multiple transport types to be | |||
associated with the same channel. For example both a TCP and RDMA | associated with the same channel. For example, both TCP and RDMA | |||
connection can be associated with the fore channel. In the event an | connections can be associated with the fore channel. In the event an | |||
RDMA and non-RDMA connection are associated with the same channel, | RDMA and non-RDMA connection are associated with the same channel, | |||
the maximum number of slots SHOULD be at least one more than the | the maximum number of slots SHOULD be at least one more than the | |||
total number of RDMA credits (Section 2.10.6.1. This way if all RDMA | total number of RDMA credits (Section 2.10.6.1). This way, if all | |||
credits are used, the non-RDMA connection can have at least one | RDMA credits are used, the non-RDMA connection can have at least one | |||
outstanding request. If a server supports multiple transport types, | outstanding request. If a server supports multiple transport types, | |||
it MUST allow a client to associate connections from each transport | it MUST allow a client to associate connections from each transport | |||
to a channel. | to a channel. | |||
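The slot-count recommendation above amounts to simple arithmetic. A minimal sketch, assuming the total RDMA credit count is already known; the helper name min_slots_for_mixed_transports is hypothetical and does not appear in the protocol:

```python
def min_slots_for_mixed_transports(rdma_credits: int) -> int:
    """Smallest maximum slot count that satisfies the SHOULD above.

    With one slot more than the total RDMA credits, a non-RDMA
    connection can still have one outstanding request even when
    every RDMA credit is in use.
    """
    if rdma_credits < 0:
        raise ValueError("credit count cannot be negative")
    return rdma_credits + 1
```

For example, a channel whose RDMA connections hold 32 credits in total would want a slot table of at least 33 entries.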
It is permissible for a connection of one type of transport to be | It is permissible for a connection of one type of transport to be | |||
associated with the fore channel, and a connection of a different | associated with the fore channel, and a connection of a different | |||
type to be associated with the backchannel. | type to be associated with the backchannel. | |||
2.10.4. Server Scope | 2.10.4. Server Scope | |||
Servers each specify a server scope value in the form of an opaque | Servers each specify a server scope value in the form of an opaque | |||
string eir_server_scope returned as part of the results of an | string eir_server_scope returned as part of the results of an | |||
EXCHANGE_ID operation. The purpose of the server scope is to allow a | EXCHANGE_ID operation. The purpose of the server scope is to allow a | |||
group of servers to indicate to clients that a set of servers sharing | group of servers to indicate to clients that a set of servers sharing | |||
the same server scope value have arranged to use compatible values of | the same server scope value has arranged to use compatible values of | |||
otherwise opaque identifiers. Thus the identifiers generated by one | otherwise opaque identifiers. Thus, the identifiers generated by one | |||
server of that set may be presented to another of that same scope. | server of that set may be presented to another of that same scope. | |||
The use of such compatible values does not imply that a value | The use of such compatible values does not imply that a value | |||
generated by one server will always be accepted by another. In most | generated by one server will always be accepted by another. In most | |||
cases, it will not. However, a server will not accept a value | cases, it will not. However, a server will not accept a value | |||
generated by another inadvertently. When it does accept it, it will | generated by another inadvertently. When it does accept it, it will | |||
be because it is recognized as valid and carrying the same meaning as | be because it is recognized as valid and carrying the same meaning as | |||
on another server of the same scope. | on another server of the same scope. | |||
When servers are of the same server scope, this compatibility of | When servers are of the same server scope, this compatibility of | |||
values applies to the following identifiers: | values applies to the following identifiers: | |||
o Filehandle values. A filehandle value accepted by two servers of | o Filehandle values. A filehandle value accepted by two servers of | |||
the same server scope denotes the same object. A write done to | the same server scope denotes the same object. A WRITE operation | |||
one server is reflected immediately in a read done to the other | sent to one server is reflected immediately in a READ sent to the | |||
and locks obtained on one server conflict with those requested on | other, and locks obtained on one server conflict with those | |||
the other. | requested on the other. | |||
o Session ID values. A session ID value accepted by two servers of | o Session ID values. A session ID value accepted by two servers of | |||
the same server scope denotes the same session. | the same server scope denotes the same session. | |||
o Client ID values. A client ID value accepted as valid by two | o Client ID values. A client ID value accepted as valid by two | |||
servers of the same server scope is associated with two clients | servers of the same server scope is associated with two clients | |||
with the same client owner and verifier. | with the same client owner and verifier. | |||
o State ID values when the corresponding client ID is recognized as | o State ID values. A state ID value is recognized as valid when the | |||
valid. If the same stateid value is accepted as valid on two | corresponding client ID is recognized as valid. If the same | |||
servers of the same scope and the client IDs on the two servers | stateid value is accepted as valid on two servers of the same | |||
represent the same client owner and verifier, then the two stateid | scope and the client IDs on the two servers represent the same | |||
values designate the same set of locks and are for the same file | client owner and verifier, then the two stateid values designate | |||
the same set of locks and are for the same file. | ||||
o Server owner values. When the server scope values are the same, | o Server owner values. When the server scope values are the same, | |||
server owner values may be validly compared. In cases where the | server owner values may be validly compared. In cases where the | |||
server scope are different, server owner values are treated as | server scope values are different, server owner values are treated | |||
different even if they contain all identical bytes. | as different even if they contain all identical bytes. | |||
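The last rule in the list above (server owner values are comparable only within a single server scope) can be sketched as follows. The ServerIdentity type and its field names are assumptions made for illustration; they loosely mirror eir_server_scope and the major and minor parts of eir_server_owner.

```python
from typing import NamedTuple


class ServerIdentity(NamedTuple):
    scope: bytes     # assumed to hold eir_server_scope
    major_id: bytes  # assumed to hold eir_server_owner.so_major_id
    minor_id: int    # assumed to hold eir_server_owner.so_minor_id


def server_owners_match(a: ServerIdentity, b: ServerIdentity) -> bool:
    """Compare server owners, honoring the server scope rule."""
    if a.scope != b.scope:
        # Different scopes: treated as different owners even if the
        # owner bytes are identical.
        return False
    return (a.major_id, a.minor_id) == (b.major_id, b.minor_id)
```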
The co-ordination among servers required to provide such | The coordination among servers required to provide such compatibility | |||
compatibility can be quite minimal, and limited to a simple partition | can be quite minimal, and limited to a simple partition of the ID | |||
of the ID space. The recognition of common values requires | space. The recognition of common values requires additional | |||
additional implementation, but this can be tailored to the specific | implementation, but this can be tailored to the specific situations | |||
situations in which that recognition is desired. | in which that recognition is desired. | |||
Clients will have occasion to compare the server scope values of | Clients will have occasion to compare the server scope values of | |||
multiple servers under a number of circumstances, each of which will | multiple servers under a number of circumstances, each of which will | |||
be discussed under the appropriate functional section. | be discussed under the appropriate functional section: | |||
o When server owner values received in response to EXCHANGE_ID | o When server owner values received in response to EXCHANGE_ID | |||
operations sent to multiple network addresses are compared for the | operations sent to multiple network addresses are compared for the | |||
purpose of determining the validity of various forms of trunking, | purpose of determining the validity of various forms of trunking, | |||
as described in Section 2.10.5. | as described in Section 2.10.5. | |||
o When network or server reconfiguration causes the same network | o When network or server reconfiguration causes the same network | |||
address to possibly be directed to different servers, with the | address to possibly be directed to different servers, with the | |||
necessity for the client to determine when lock reclaim should be | necessity for the client to determine when lock reclaim should be | |||
attempted, as described in Section 8.4.2.1 | attempted, as described in Section 8.4.2.1. | |||
o When file system migration causes the transfer of responsibility | o When file system migration causes the transfer of responsibility | |||
for a file system between servers and the client needs to | for a file system between servers and the client needs to | |||
determine whether state has been transferred with the file system | determine whether state has been transferred with the file system | |||
(as described in Section 11.7.7) or whether the client needs to | (as described in Section 11.7.7) or whether the client needs to | |||
reclaim state on a similar basis as in the case of server restart, | reclaim state on a similar basis as in the case of server restart, | |||
as described in Section 8.4.2. | as described in Section 8.4.2. | |||
When two replies from EXCHANGE_ID each from a different server | When two replies from EXCHANGE_ID, each from a different server | |||
network address have the same server scope, there are a number of | network address, have the same server scope, there are a number of | |||
ways a client can validate that the common server scope is due to two | ways a client can validate that the common server scope is due to two | |||
servers cooperating in a group. | servers cooperating in a group. | |||
o If both EXCHANGE_ID requests were sent with RPCSEC_GSS | o If both EXCHANGE_ID requests were sent with RPCSEC_GSS | |||
authentication and the server principal is the same for both | authentication and the server principal is the same for both | |||
targets, the equality of server scope is validated. It is | targets, the equality of server scope is validated. It is | |||
RECOMMENDED that two servers intending to share the same server | RECOMMENDED that two servers intending to share the same server | |||
scope also share the same principal name. | scope also share the same principal name. | |||
o The client may accept the appearance of the second server in | o The client may accept the appearance of the second server in the | |||
fs_locations or fs_locations_info attribute for a relevant file | fs_locations or fs_locations_info attribute for a relevant file | |||
system. For example, if there is a migration event for a | system. For example, if there is a migration event for a | |||
particular file system or there are locks to be reclaimed on a | particular file system or there are locks to be reclaimed on a | |||
particular file system, the attributes for that particular file | particular file system, the attributes for that particular file | |||
system may be used. The client sends the GETATTR request to the | system may be used. The client sends the GETATTR request to the | |||
first server for the fs_locations or fs_locations_info attribute | first server for the fs_locations or fs_locations_info attribute | |||
with RPCSEC_GSS authentication. It may need to do this in advance | with RPCSEC_GSS authentication. It may need to do this in advance | |||
of the need to verify the common server scope. If the client | of the need to verify the common server scope. If the client | |||
successfully authenticates the reply to GETATTR, and the GETATTR | successfully authenticates the reply to GETATTR, and the GETATTR | |||
request and reply containing the fs_locations or fs_locations_info | request and reply containing the fs_locations or fs_locations_info | |||
skipping to change at page 49, line 26 | skipping to change at page 48, line 26 | |||
MAY allow network addresses for different servers to use client ID | MAY allow network addresses for different servers to use client ID | |||
trunking. | trunking. | |||
Clients may use either form of trunking as long as they do not, when | Clients may use either form of trunking as long as they do not, when | |||
trunking between different server network addresses, violate the | trunking between different server network addresses, violate the | |||
servers' mandates as to the kinds of trunking to be allowed (see | servers' mandates as to the kinds of trunking to be allowed (see | |||
below). With regard to callback channels, the client MUST allow the | below). With regard to callback channels, the client MUST allow the | |||
server to choose among all callback channels valid for a given client | server to choose among all callback channels valid for a given client | |||
ID and MUST support trunking when the connections supporting the | ID and MUST support trunking when the connections supporting the | |||
backchannel allow session or client ID trunking to be used for | backchannel allow session or client ID trunking to be used for | |||
callbacks | callbacks. | |||
Session trunking is essentially the association of multiple | Session trunking is essentially the association of multiple | |||
connections, each with potentially different target and/or source | connections, each with potentially different target and/or source | |||
network addresses, to the same session. When the target network | network addresses, to the same session. When the target network | |||
addresses (server addresses) of the two connections are the same, the | addresses (server addresses) of the two connections are the same, the | |||
server MUST support such session trunking. When the target network | server MUST support such session trunking. When the target network | |||
addresses are different, the server MAY indicate such support using | addresses are different, the server MAY indicate such support using | |||
the data returned by the EXCHANGE_ID operation (see below). | the data returned by the EXCHANGE_ID operation (see below). | |||
Client ID trunking is the association of multiple sessions to the | Client ID trunking is the association of multiple sessions to the | |||
same client ID. Servers MUST support client ID trunking for two | same client ID. Servers MUST support client ID trunking for two | |||
target network addresses whenever they allow session trunking for | target network addresses whenever they allow session trunking for | |||
those same two network addresses. In addition, a server MAY, by | those same two network addresses. In addition, a server MAY, by | |||
presenting the same major server owner ID (Section 2.5), and server | presenting the same major server owner ID (Section 2.5) and server | |||
scope (Section 2.10.4) allow an additional case of client ID | scope (Section 2.10.4), allow an additional case of client ID | |||
trunking. When two servers return the same major server owner and | trunking. When two servers return the same major server owner and | |||
server scope, it means that the two servers are cooperating on | server scope, it means that the two servers are cooperating on | |||
locking state management which is a prerequisite for client ID | locking state management, which is a prerequisite for client ID | |||
trunking. | trunking. | |||
Understanding and distinguishing when the client is allowed to use | Distinguishing when the client is allowed to use session and client | |||
session and client ID trunking requires understanding how the results | ID trunking requires understanding how the results of the EXCHANGE_ID | |||
of the EXCHANGE_ID (Section 18.35) operation identify a server. | (Section 18.35) operation identify a server. Suppose a client sends | |||
Suppose a client sends EXCHANGE_ID over two different connections | EXCHANGE_IDs over two different connections, each with a possibly | |||
each with a possibly different target network address but each | different target network address, but each EXCHANGE_ID operation has | |||
EXCHANGE_ID operation has the same value in the eia_clientowner | the same value in the eia_clientowner field. If the same NFSv4.1 | |||
field. If the same NFSv4.1 server is listening over each connection, | server is listening over each connection, then each EXCHANGE_ID | |||
then each EXCHANGE_ID result MUST return the same values of | result MUST return the same values of eir_clientid, | |||
eir_clientid, eir_server_owner.so_major_id and eir_server_scope. The | eir_server_owner.so_major_id, and eir_server_scope. The client can | |||
client can then treat each connection as referring to the same server | then treat each connection as referring to the same server (subject | |||
(subject to verification, see Paragraph 8 later in this section), and | to verification; see Section 2.10.5.1 later in this section), and it | |||
it can use each connection to trunk requests and replies. The | can use each connection to trunk requests and replies. The client's | |||
client's choice is whether session trunking or client ID trunking | choice is whether session trunking or client ID trunking applies. | |||
applies. | ||||
Session Trunking. If the eia_clientowner argument is the same in two | Session Trunking. If the eia_clientowner argument is the same in two | |||
different EXCHANGE_ID requests, and the eir_clientid, | different EXCHANGE_ID requests, and the eir_clientid, | |||
eir_server_owner.so_major_id, eir_server_owner.so_minor_id, and | eir_server_owner.so_major_id, eir_server_owner.so_minor_id, and | |||
eir_server_scope results match in both EXCHANGE_ID results, then | eir_server_scope results match in both EXCHANGE_ID results, then | |||
the client is permitted to perform session trunking. If the | the client is permitted to perform session trunking. If the | |||
client has no session mapping to the tuple of eir_clientid, | client has no session mapping to the tuple of eir_clientid, | |||
eir_server_owner.so_major_id, eir_server_scope, | eir_server_owner.so_major_id, eir_server_scope, and | |||
eir_server_owner.so_minor_id, then it creates the session via a | eir_server_owner.so_minor_id, then it creates the session via a | |||
CREATE_SESSION operation over one of the connections, which | CREATE_SESSION operation over one of the connections, which | |||
associates the connection to the session. If there is a session | associates the connection to the session. If there is a session | |||
for the tuple, the client can send BIND_CONN_TO_SESSION to | for the tuple, the client can send BIND_CONN_TO_SESSION to | |||
associate the connection to the session. | associate the connection to the session. | |||
Of course, if the client does not desire to use session trunking, | Of course, if the client does not desire to use session trunking, | |||
it is not required to do so. It can invoke CREATE_SESSION on the | it is not required to do so. It can invoke CREATE_SESSION on the | |||
connection. This will result in client ID trunking as described | connection. This will result in client ID trunking as described | |||
below. It can also decide to drop the connection if it does not | below. It can also decide to drop the connection if it does not | |||
choose to use trunking. | choose to use trunking. | |||
Client ID Trunking. If the eia_clientowner argument is the same in | Client ID Trunking. If the eia_clientowner argument is the same in | |||
two different EXCHANGE_ID requests, and the eir_clientid, | two different EXCHANGE_ID requests, and the eir_clientid, | |||
eir_server_owner.so_major_id, and eir_server_scope results match | eir_server_owner.so_major_id, and eir_server_scope results match | |||
in both EXCHANGE_ID results, then the client is permitted to | in both EXCHANGE_ID results, then the client is permitted to | |||
perform client ID trunking (regardless whether the | perform client ID trunking (regardless of whether the | |||
eir_server_owner.so_minor_id results match). The client can | eir_server_owner.so_minor_id results match). The client can | |||
associate each connection with different sessions, where each | associate each connection with different sessions, where each | |||
session is associated with the same server. | session is associated with the same server. | |||
The client completes the act of client ID trunking by invoking | The client completes the act of client ID trunking by invoking | |||
CREATE_SESSION on each connection, using the same client ID that | CREATE_SESSION on each connection, using the same client ID that | |||
was returned in eir_clientid. These invocations create two | was returned in eir_clientid. These invocations create two | |||
sessions and also associate each connection with its respective | sessions and also associate each connection with its respective | |||
session. The client is free to choose not to use client ID | session. The client is free to decline to use client ID trunking | |||
trunking by simply dropping the connection at this point. | by simply dropping the connection at this point. | |||
When doing client ID trunking, locking state is shared across | When doing client ID trunking, locking state is shared across | |||
sessions associated with that same client ID. This requires the | sessions associated with that same client ID. This requires the | |||
server to coordinate state across sessions. | server to coordinate state across sessions. | |||
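
The client ID trunking condition described above can be sketched as a simple comparison. This is a minimal illustration, not a definitive implementation: the ExchangeIdResult class and the function name are hypothetical, and eia_clientowner is simplified to an opaque byte string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExchangeIdResult:
    """Hypothetical subset of EXCHANGE_ID results relevant to trunking."""
    clientid: int        # eir_clientid
    so_major_id: bytes   # eir_server_owner.so_major_id
    so_minor_id: int     # eir_server_owner.so_minor_id
    server_scope: bytes  # eir_server_scope


def client_id_trunking_permitted(owner_a: bytes, res_a: ExchangeIdResult,
                                 owner_b: bytes, res_b: ExchangeIdResult) -> bool:
    """Return True when client ID trunking is permitted.

    Both requests must carry the same eia_clientowner (simplified here to
    bytes), and both replies must agree on eir_clientid, so_major_id, and
    eir_server_scope. so_minor_id is deliberately not compared.
    """
    return (owner_a == owner_b
            and res_a.clientid == res_b.clientid
            and res_a.so_major_id == res_b.so_major_id
            and res_a.server_scope == res_b.server_scope)
```

A True result only means that the servers claim matching identity. The client may still verify those claims before trunking traffic.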
The client should be prepared for the possibility that | The client should be prepared for the possibility that | |||
eir_server_owner values may be different on subsequent EXCHANGE_ID | eir_server_owner values may be different on subsequent EXCHANGE_ID | |||
requests made to the same network address, as a result of various | requests made to the same network address, as a result of various | |||
sorts of reconfiguration events. When this happens and the changes | sorts of reconfiguration events. When this happens and the changes | |||
result in the invalidation of previously valid forms of trunking, the | result in the invalidation of previously valid forms of trunking, the | |||
client should cease to use those forms, either by dropping | client should cease to use those forms, either by dropping | |||
connections or by adding sessions. For a discussion of lock reclaim | connections or by adding sessions. For a discussion of lock reclaim | |||
as it relates to such reconfiguration events, see Section 8.4.2.1. | as it relates to such reconfiguration events, see Section 8.4.2.1. | |||
2.10.5.1. Verifying Claims of Matching Server Identity | ||||
When two servers over two connections claim matching or partially | When two servers over two connections claim matching or partially | |||
matching eir_server_owner, eir_server_scope, and eir_clientid values, | matching eir_server_owner, eir_server_scope, and eir_clientid values, | |||
the client does not have to trust the servers' claims. The client | the client does not have to trust the servers' claims. The client | |||
may verify these claims before trunking traffic in the following | may verify these claims before trunking traffic in the following | |||
ways: | ways: | |||
o For session trunking, clients SHOULD reliably verify if | o For session trunking, clients SHOULD reliably verify if | |||
connections between different network paths are in fact associated | connections between different network paths are in fact associated | |||
with the same NFSv4.1 server and usable on the same session, and | with the same NFSv4.1 server and usable on the same session, and | |||
servers MUST allow clients to perform reliable verification. When | servers MUST allow clients to perform reliable verification. When | |||
skipping to change at page 52, line 13 | skipping to change at page 51, line 15 | |||
ID was created. Mutual authentication via RPCSEC_GSS assures the | ID was created. Mutual authentication via RPCSEC_GSS assures the | |||
client that the connection is associated with the correct session | client that the connection is associated with the correct session | |||
of the correct server. | of the correct server. | |||
o For client ID trunking, the client has at least two options for | o For client ID trunking, the client has at least two options for | |||
verifying that the same client ID obtained from two different | verifying that the same client ID obtained from two different | |||
EXCHANGE_ID operations came from the same server. The first | EXCHANGE_ID operations came from the same server. The first | |||
option is to use RPCSEC_GSS authentication when sending each | option is to use RPCSEC_GSS authentication when sending each | |||
EXCHANGE_ID operation. Each time an EXCHANGE_ID is sent with | EXCHANGE_ID operation. Each time an EXCHANGE_ID is sent with | |||
RPCSEC_GSS authentication, the client notes the principal name of | RPCSEC_GSS authentication, the client notes the principal name of | |||
the GSS target. If the EXCHANGE_ID results indicate client ID | the GSS target. If the EXCHANGE_ID results indicate that client | |||
trunking is possible, and the GSS targets' principal names are the | ID trunking is possible, and the GSS targets' principal names are | |||
same, the servers are the same and client ID trunking is allowed. | the same, the servers are the same and client ID trunking is | |||
allowed. | ||||
The second option for verification is to use SP4_SSV protection. | The second option for verification is to use SP4_SSV protection. | |||
When the client sends EXCHANGE_ID it specifies SP4_SSV protection. | When the client sends EXCHANGE_ID, it specifies SP4_SSV | |||
The first EXCHANGE_ID the client sends always has to be confirmed | protection. The first EXCHANGE_ID the client sends always has to | |||
by a CREATE_SESSION call. The client then sends SET_SSV. Later | be confirmed by a CREATE_SESSION call. The client then sends | |||
the client sends EXCHANGE_ID to a second destination network | SET_SSV. Later, the client sends EXCHANGE_ID to a second | |||
address different from the one the first EXCHANGE_ID was sent to. | destination network address different from the one the first | |||
The client checks that each EXCHANGE_ID reply has the same | EXCHANGE_ID was sent to. The client checks that each EXCHANGE_ID | |||
eir_clientid, eir_server_owner.so_major_id, and eir_server_scope. | reply has the same eir_clientid, eir_server_owner.so_major_id, and | |||
If so, the client verifies the claim by sending a CREATE_SESSION | eir_server_scope. If so, the client verifies the claim by sending | |||
operation to the second destination address, protected with | a CREATE_SESSION operation to the second destination address, | |||
RPCSEC_GSS integrity using an RPCSEC_GSS handle returned by the | protected with RPCSEC_GSS integrity using an RPCSEC_GSS handle | |||
second EXCHANGE_ID. If the server accepts the CREATE_SESSION | returned by the second EXCHANGE_ID. If the server accepts the | |||
request, and if the client verifies the RPCSEC_GSS verifier and | CREATE_SESSION request, and if the client verifies the RPCSEC_GSS | |||
integrity codes, then the client has proof the second server knows | verifier and integrity codes, then the client has proof the second | |||
the SSV, and thus the two servers are co-operating for the | server knows the SSV, and thus the two servers are cooperating for | |||
purposes of specifying server scope and client ID trunking. | the purposes of specifying server scope and client ID trunking. | |||
2.10.6. Exactly Once Semantics | 2.10.6. Exactly Once Semantics | |||
Via the session, NFSv4.1 offers Exactly Once Semantics (EOS) for | Via the session, NFSv4.1 offers exactly once semantics (EOS) for | |||
requests sent over a channel. EOS is supported on both the fore and | requests sent over a channel. EOS is supported on both the fore | |||
back channels. | channel and backchannel. | |||
Each COMPOUND or CB_COMPOUND request that is sent with a leading | Each COMPOUND or CB_COMPOUND request that is sent with a leading | |||
SEQUENCE or CB_SEQUENCE operation MUST be executed by the receiver | SEQUENCE or CB_SEQUENCE operation MUST be executed by the receiver | |||
exactly once. This requirement holds regardless of whether the | exactly once. This requirement holds regardless of whether the | |||
request is sent with reply caching specified (see | request is sent with reply caching specified (see | |||
Section 2.10.6.1.3). The requirement holds even if the requester is | Section 2.10.6.1.3). The requirement holds even if the requester is | |||
sending the request over a session created between a pNFS data client | sending the request over a session created between a pNFS data client | |||
and pNFS data server. To understand the rationale for this | and pNFS data server. To understand the rationale for this | |||
requirement, divide the requests into three classifications: | requirement, divide the requests into three classifications: | |||
o Nonidempotent requests. | o Non-idempotent requests. | |||
o Idempotent modifying requests. | o Idempotent modifying requests. | |||
o Idempotent non-modifying requests. | o Idempotent non-modifying requests. | |||
An example of a non-idempotent request is RENAME. Obviously if a | An example of a non-idempotent request is RENAME. Obviously, if a | |||
replier executes the same RENAME request twice, and the first | replier executes the same RENAME request twice, and the first | |||
execution succeeds, the re-execution will fail. If the replier | execution succeeds, the re-execution will fail. If the replier | |||
returns the result from the re-execution, this result is incorrect. | returns the result from the re-execution, this result is incorrect. | |||
Therefore, EOS is required for nonidempotent requests. | Therefore, EOS is required for non-idempotent requests. | |||
An example of an idempotent modifying request is a COMPOUND request | An example of an idempotent modifying request is a COMPOUND request | |||
containing a WRITE operation. Repeated execution of the same WRITE | containing a WRITE operation. Repeated execution of the same WRITE | |||
has the same effect as execution of that write a single time. | has the same effect as execution of that WRITE a single time. | |||
Nevertheless, enforcing EOS for WRITEs and other idempotent modifying | Nevertheless, enforcing EOS for WRITEs and other idempotent modifying | |||
requests is necessary to avoid data corruption. | requests is necessary to avoid data corruption. | |||
Suppose a client sends WRITE A to a noncompliant server that does not | Suppose a client sends WRITE A to a noncompliant server that does not | |||
enforce EOS, and receives no response, perhaps due to a network | enforce EOS, and receives no response, perhaps due to a network | |||
partition. The client reconnects to the server and re-sends WRITE A. | partition. The client reconnects to the server and re-sends WRITE A. | |||
Now, the server has outstanding two instances of A. The server can be | Now, the server has outstanding two instances of A. The server can be | |||
in a situation in which it executes and replies to the retry of A, | in a situation in which it executes and replies to the retry of A, | |||
while the first A is still waiting in the server's internal I/O | while the first A is still waiting in the server's internal I/O | |||
system for some resource. Upon receiving the reply to the second | system for some resource. Upon receiving the reply to the second | |||
attempt of WRITE A, the client believes its write is done so it is | attempt of WRITE A, the client believes its WRITE is done so it is | |||
free to send WRITE B which overlaps the range of A. When the original | free to send WRITE B, which overlaps the byte-range of A. When the | |||
A is dispatched from the server's I/O system, and executed (thus the | original A is dispatched from the server's I/O system and executed | |||
second time A will have been written), then what has been written by | (thus the second time A will have been written), then what has been | |||
B can be overwritten and thus corrupted. | written by B can be overwritten and thus corrupted. | |||
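
The corruption scenario above can be reproduced with a toy model. This sketch assumes an 8-byte file and hypothetical offsets and payloads for WRITE A and WRITE B. It only demonstrates the ordering problem and does not model a real server's I/O system.

```python
def apply_write(storage: bytearray, offset: int, data: bytes) -> None:
    """Overwrite len(data) bytes of storage starting at offset."""
    storage[offset:offset + len(data)] = data


def replay(order) -> bytes:
    """Apply (offset, data) writes in the given order to an empty 8-byte file."""
    storage = bytearray(b"." * 8)
    for offset, data in order:
        apply_write(storage, offset, data)
    return bytes(storage)


write_a = (0, b"AAAA")  # hypothetical WRITE A
write_b = (2, b"BBBB")  # hypothetical WRITE B, overlapping A's byte-range

# With EOS, A executes exactly once and is followed by B.
expected = replay([write_a, write_b])            # b"AABBBB.."

# Without EOS: the retry of A executes, then B, then the stale original A
# is finally dispatched and overwrites part of what B wrote.
corrupted = replay([write_a, write_b, write_a])  # b"AAAABB.."
```

The two results differ in bytes 2 and 3, which is the part of B that the late execution of the original A overwrote.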
An example of an idempotent non-modifying request is a COMPOUND | An example of an idempotent non-modifying request is a COMPOUND | |||
containing SEQUENCE, PUTFH, READLINK and nothing else. The re- | containing SEQUENCE, PUTFH, READLINK, and nothing else. The re- | |||
execution of a such a request will not cause data corruption, or | execution of such a request will not cause data corruption or produce | |||
produce an incorrect result. Nonetheless, to keep the implementation | an incorrect result. Nonetheless, to keep the implementation simple, | |||
simple, the replier MUST enforce EOS for all requests whether | the replier MUST enforce EOS for all requests, whether or not | |||
idempotent and non-modifying or not. | idempotent and non-modifying. | |||
Note that true and complete EOS is not possible unless the server | Note that true and complete EOS is not possible unless the server | |||
persists the reply cache in stable storage, unless the server is | persists the reply cache in stable storage, and unless the server is | |||
somehow implemented to never require a restart (indeed if such a | somehow implemented to never require a restart (indeed, if such a | |||
server exists, the distinction between a reply cache kept in stable | server exists, the distinction between a reply cache kept in stable | |||
storage versus one that is not is one without meaning). See | storage versus one that is not is one without meaning). See | |||
Section 2.10.6.5 for a discussion of persistence in the reply cache. | Section 2.10.6.5 for a discussion of persistence in the reply cache. | |||
Regardless, even if the server does not persist the reply cache, EOS | Regardless, even if the server does not persist the reply cache, EOS | |||
improves robustness and correctness over previous versions of NFS | improves robustness and correctness over previous versions of NFS | |||
because the legacy duplicate request/reply caches were based on the | because the legacy duplicate request/reply caches were based on the | |||
ONC RPC transaction identifier (XID). Section 2.10.6.1 explains the | ONC RPC transaction identifier (XID). Section 2.10.6.1 explains the | |||
shortcomings of the XID as a basis for a reply cache and describes | shortcomings of the XID as a basis for a reply cache and describes | |||
how NFSv4.1 sessions improve upon the XID. | how NFSv4.1 sessions improve upon the XID. | |||
2.10.6.1. Slot Identifiers and Reply Cache | 2.10.6.1. Slot Identifiers and Reply Cache | |||
The RPC layer provides a transaction ID (XID), which, while required | The RPC layer provides a transaction ID (XID), which, while required | |||
to be unique, is not convenient for tracking requests for two | to be unique, is not convenient for tracking requests for two | |||
reasons. First, the XID is only meaningful to the requester; it | reasons. First, the XID is only meaningful to the requester; it | |||
cannot be interpreted by the replier except to test for equality with | cannot be interpreted by the replier except to test for equality with | |||
previously sent requests. When consulting an RPC-based duplicate | previously sent requests. When consulting an RPC-based duplicate | |||
request cache, the opaqueness of the XID requires a computationally | request cache, the opaqueness of the XID requires a computationally | |||
expensive lookup (often via a hash that includes XID and source | expensive lookup (often via a hash that includes XID and source | |||
address). NFSv4.1 requests use a non-opaque slot ID which is an | address). NFSv4.1 requests use a non-opaque slot ID, which is an | |||
index into a slot table, which is far more efficient. Second, | index into a slot table, which is far more efficient. Second, | |||
because RPC requests can be executed by the replier in any order, | because RPC requests can be executed by the replier in any order, | |||
there is no bound on the number of requests that may be outstanding | there is no bound on the number of requests that may be outstanding | |||
at any time. To achieve perfect EOS using ONC RPC would require | at any time. To achieve perfect EOS, using ONC RPC would require | |||
storing all replies in the reply cache. XIDs are 32 bits; storing | storing all replies in the reply cache. XIDs are 32 bits; storing | |||
over four billion (2^32) replies in the reply cache is not practical. | over four billion (2^32) replies in the reply cache is not practical. | |||
In practice, previous versions of NFS have chosen to store a fixed | In practice, previous versions of NFS have chosen to store a fixed | |||
number of replies in the cache, and use a least recently used (LRU) | number of replies in the cache, and to use a least recently used | |||
approach to replacing cache entries with new entries when the cache | (LRU) approach to replacing cache entries with new entries when the | |||
is full. In NFSv4.1, the number of outstanding requests is bounded | cache is full. In NFSv4.1, the number of outstanding requests is | |||
by the size of the slot table, and a sequence ID per slot is used to | bounded by the size of the slot table, and a sequence ID per slot is | |||
tell the replier when it is safe to delete a cached reply. | used to tell the replier when it is safe to delete a cached reply. | |||
In the NFSv4.1 reply cache, when the requester sends a new request, | In the NFSv4.1 reply cache, when the requester sends a new request, | |||
it selects a slot ID in the range 0..N, where N is the replier's | it selects a slot ID in the range 0..N, where N is the replier's | |||
current maximum slot ID granted to the requester on the session over | current maximum slot ID granted to the requester on the session over | |||
which the request is to be sent. The value of N starts out as equal | which the request is to be sent. The value of N starts out as equal | |||
to ca_maxrequests - 1 (Section 18.36), but can be adjusted by the | to ca_maxrequests - 1 (Section 18.36), but can be adjusted by the | |||
response to SEQUENCE or CB_SEQUENCE as described later in this | response to SEQUENCE or CB_SEQUENCE as described later in this | |||
section. The slot ID must be unused by any of the requests which the | section. The slot ID must be unused by any of the requests that the | |||
requester has already active on the session. "Unused" here means the | requester has already active on the session. "Unused" here means the | |||
requester has no outstanding request for that slot ID. | requester has no outstanding request for that slot ID. | |||
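
A requester-side view of slot selection might look like the following sketch. The class and method names are hypothetical. Choosing the lowest unused slot ID is an assumption made here to keep slot usage compact, and is not something the text requires.

```python
class RequesterSlotTable:
    """Hypothetical requester-side bookkeeping for slot IDs 0..N."""

    def __init__(self, ca_maxrequests: int) -> None:
        # N starts out as ca_maxrequests - 1.
        self.highest_slot = ca_maxrequests - 1
        self.in_use = set()

    def acquire(self):
        """Return an unused slot ID in 0..N, or None if all are outstanding."""
        for slot_id in range(self.highest_slot + 1):
            if slot_id not in self.in_use:
                self.in_use.add(slot_id)
                return slot_id
        return None

    def release(self, slot_id: int) -> None:
        """Mark the slot unused once its request is no longer outstanding."""
        self.in_use.discard(slot_id)
```

When acquire returns None, the requester would have to wait for an outstanding request to complete before sending a new one on this session.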
A slot contains a sequence ID and the cached reply corresponding to | A slot contains a sequence ID and the cached reply corresponding to | |||
the request sent with that sequence ID. The sequence ID is a 32 bit | the request sent with that sequence ID. The sequence ID is a 32-bit | |||
unsigned value, and is therefore in the range 0..0xFFFFFFFF (2^32 - | unsigned value, and is therefore in the range 0..0xFFFFFFFF (2^32 - | |||
1). The first time a slot is used, the requester MUST specify a | 1). The first time a slot is used, the requester MUST specify a | |||
sequence ID of one (1) (Section 18.36). Each time a slot is reused, | sequence ID of one (Section 18.36). Each time a slot is reused, the | |||
the request MUST specify a sequence ID that is one greater than that | request MUST specify a sequence ID that is one greater than that of | |||
of the previous request on the slot. If the previous sequence ID was | the previous request on the slot. If the previous sequence ID was | |||
0xFFFFFFFF, then the next request for the slot MUST have the sequence | 0xFFFFFFFF, then the next request for the slot MUST have the sequence | |||
ID set to zero (i.e. (2^32 - 1) + 1 mod 2^32). | ID set to zero (i.e., (2^32 - 1) + 1 mod 2^32). | |||
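
The wraparound rule amounts to arithmetic modulo 2^32. A minimal sketch, with a hypothetical function name:

```python
SEQID_MODULUS = 2 ** 32  # sequence IDs are 32-bit unsigned values


def next_sequence_id(previous: int) -> int:
    """Sequence ID for the next request on a slot, wrapping to zero."""
    return (previous + 1) % SEQID_MODULUS
```

For example, next_sequence_id(1) yields 2, and next_sequence_id(0xFFFFFFFF) yields 0.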
The sequence ID accompanies the slot ID in each request. It is for | The sequence ID accompanies the slot ID in each request. It is for | |||
the critical check at the replier: it is used to efficiently determine | the critical check at the replier: it is used to efficiently determine | |||
whether a request using a certain slot ID is a retransmit or a new, | whether a request using a certain slot ID is a retransmit or a new, | |||
never-before-seen request. It is not feasible for the requester to | never-before-seen request. It is not feasible for the requester to | |||
assert that it is retransmitting to implement this, because for any | assert that it is retransmitting to implement this, because for any | |||
given request the requester cannot know whether the replier has seen | given request the requester cannot know whether the replier has seen | |||
it unless the replier actually replies. Of course, if the requester | it unless the replier actually replies. Of course, if the requester | |||
has seen the reply, the requester would not retransmit. | has seen the reply, the requester would not retransmit. | |||
skipping to change at page 55, line 29 | skipping to change at page 54, line 32 | |||
executed to completion, the replier returns the cached reply. See | executed to completion, the replier returns the cached reply. See | |||
Section 2.10.6.2 for direction on how the replier deals with | Section 2.10.6.2 for direction on how the replier deals with | |||
retries of requests that are still in progress. | retries of requests that are still in progress. | |||
o A misordered retry, in which the sequence ID is less than | o A misordered retry, in which the sequence ID is less than | |||
(accounting for sequence wraparound) that previously seen in the | (accounting for sequence wraparound) that previously seen in the | |||
slot. The replier MUST return NFS4ERR_SEQ_MISORDERED (as the | slot. The replier MUST return NFS4ERR_SEQ_MISORDERED (as the | |||
result from SEQUENCE or CB_SEQUENCE). | result from SEQUENCE or CB_SEQUENCE). | |||
o A misordered new request, in which the sequence ID is two or more | o A misordered new request, in which the sequence ID is two or more | |||
than (accounting for sequence wraparound) than that previously | than (accounting for sequence wraparound) that previously seen in | |||
seen in the slot. Note that because the sequence ID MUST | the slot. Note that because the sequence ID MUST wrap around to | |||
wraparound to zero (0) once it reaches 0xFFFFFFFF, a misordered | zero once it reaches 0xFFFFFFFF, a misordered new request and a | |||
new request and a misordered retry cannot be distinguished. Thus, | misordered retry cannot be distinguished. Thus, the replier MUST | |||
the replier MUST return NFS4ERR_SEQ_MISORDERED (as the result from | return NFS4ERR_SEQ_MISORDERED (as the result from SEQUENCE or | |||
SEQUENCE or CB_SEQUENCE). | CB_SEQUENCE). | |||
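
The replier's per-slot check can be sketched as follows. The names are hypothetical. The sketch assumes that the elided part of the list defines a new request as one whose sequence ID is one greater than the slot's (modulo 2^32) and a retry as one whose sequence ID is equal to it. Because of wraparound, every other value is treated as misordered.

```python
from enum import Enum

SEQID_MODULUS = 2 ** 32


class SlotCheck(Enum):
    NEW_REQUEST = "new request"
    RETRY = "retry"
    MISORDERED = "NFS4ERR_SEQ_MISORDERED"


def classify(slot_seqid: int, request_seqid: int) -> SlotCheck:
    """Compare a request's sequence ID with the one last seen in the slot."""
    if request_seqid == (slot_seqid + 1) % SEQID_MODULUS:
        return SlotCheck.NEW_REQUEST
    if request_seqid == slot_seqid:
        return SlotCheck.RETRY
    # A misordered retry and a misordered new request cannot be told
    # apart, so both map to the same error.
    return SlotCheck.MISORDERED
```

Only the slot's last sequence ID and its cached reply are needed for this check. The request itself does not have to be cached.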
Unlike the XID, the slot ID is always within a specific range; this | Unlike the XID, the slot ID is always within a specific range; this | |||
has two implications. The first implication is that for a given | has two implications. The first implication is that for a given | |||
session, the replier need only cache the results of a limited number | session, the replier need only cache the results of a limited number | |||
of COMPOUND requests . The second implication derives from the | of COMPOUND requests. The second implication derives from the first, | |||
first, which is that unlike XID-indexed reply caches (also known as | which is that unlike XID-indexed reply caches (also known as | |||
duplicate request caches - DRCs), the slot ID-based reply cache | duplicate request caches - DRCs), the slot ID-based reply cache | |||
cannot be overflowed. Through use of the sequence ID to identify | cannot be overflowed. Through use of the sequence ID to identify | |||
retransmitted requests, the replier does not need to actually cache | retransmitted requests, the replier does not need to actually cache | |||
the request itself, reducing the storage requirements of the reply | the request itself, reducing the storage requirements of the reply | |||
cache further. These facilities make it practical to maintain all | cache further. These facilities make it practical to maintain all | |||
the required entries for an effective reply cache. | the required entries for an effective reply cache. | |||
The slot ID, sequence ID, and session ID therefore take over the | The slot ID, sequence ID, and session ID therefore take over the | |||
traditional role of the XID and source network address in the | traditional role of the XID and source network address in the | |||
replier's reply cache implementation. This approach is considerably | replier's reply cache implementation. This approach is considerably | |||
more portable and completely robust - it is not subject to the | more portable and completely robust -- it is not subject to the | |||
reassignment of ports as clients reconnect over IP networks. In | reassignment of ports as clients reconnect over IP networks. In | |||
addition, the RPC XID is not used in the reply cache, enhancing | addition, the RPC XID is not used in the reply cache, enhancing | |||
robustness of the cache in the face of any rapid reuse of XIDs by the | robustness of the cache in the face of any rapid reuse of XIDs by the | |||
requester. While the replier does not care about the XID for the | requester. While the replier does not care about the XID for the | |||
purposes of reply cache management (but the replier MUST return the | purposes of reply cache management (but the replier MUST return the | |||
same XID that was in the request), nonetheless there are | same XID that was in the request), nonetheless there are | |||
considerations for the XID in NFSv4.1 that are the same as all other | considerations for the XID in NFSv4.1 that are the same as all other | |||
previous versions of NFS. The RPC XID remains in each message and | previous versions of NFS. The RPC XID remains in each message and | |||
needs to be formulated in NFSv4.1 requests as in any other ONC RPC | needs to be formulated in NFSv4.1 requests as in any other ONC RPC | |||
request. The reasons include: | request. The reasons include: | |||
o The RPC layer retains its existing semantics and implementation. | o The RPC layer retains its existing semantics and implementation. | |||
o The requester and replier must be able to interoperate at the RPC | o The requester and replier must be able to interoperate at the RPC | |||
layer, prior to the NFSv4.1 decoding of the SEQUENCE or | layer, prior to the NFSv4.1 decoding of the SEQUENCE or | |||
CB_SEQUENCE operation. | CB_SEQUENCE operation. | |||
o If an operation is being used that does not start with SEQUENCE or | o If an operation is being used that does not start with SEQUENCE or | |||
CB_SEQUENCE (e.g. BIND_CONN_TO_SESSION), then the RPC XID is | CB_SEQUENCE (e.g., BIND_CONN_TO_SESSION), then the RPC XID is | |||
needed for correct operation to match the reply to the request. | needed for correct operation to match the reply to the request. | |||
o The SEQUENCE or CB_SEQUENCE operation may generate an error. If | o The SEQUENCE or CB_SEQUENCE operation may generate an error. If | |||
so, the embedded slot ID, sequence ID, and session ID (if present) | so, the embedded slot ID, sequence ID, and session ID (if present) | |||
in the request will not be in the reply, and the requester has | in the request will not be in the reply, and the requester has | |||
only the XID to match the reply to the request. | only the XID to match the reply to the request. | |||
Given that well formulated XIDs continue to be required, this begs | Given that well-formulated XIDs continue to be required, this begs | |||
the question why SEQUENCE and CB_SEQUENCE replies have a session ID, | the question: why do SEQUENCE and CB_SEQUENCE replies have a session | |||
slot ID and sequence ID? Having the session ID in the reply means | ID, slot ID, and sequence ID? Having the session ID in the reply | |||
the requester does not have to use the XID to lookup the session ID, | means that the requester does not have to use the XID to look up the | |||
which would be necessary if the connection were associated with | session ID, which would be necessary if the connection were | |||
multiple sessions. Having the slot ID and sequence ID in the reply | associated with multiple sessions. Having the slot ID and sequence | |||
means the requester does not have to use the XID to lookup the slot | ID in the reply means that the requester does not have to use the XID | |||
ID and sequence ID. Furthermore, since the XID is only 32 bits, it | to look up the slot ID and sequence ID. Furthermore, since the XID | |||
is too small to guarantee the re-association of a reply with its | is only 32 bits, it is too small to guarantee the re-association of a | |||
request ([37]); having session ID, slot ID, and sequence ID in the | reply with its request [37]; having session ID, slot ID, and sequence | |||
reply allows the client to validate that the reply in fact belongs to | ID in the reply allows the client to validate that the reply in fact | |||
the matched request. | belongs to the matched request. | |||
The SEQUENCE (and CB_SEQUENCE) operation also carries a | The SEQUENCE (and CB_SEQUENCE) operation also carries a | |||
"highest_slotid" value which carries additional requester slot usage | "highest_slotid" value, which carries additional requester slot usage | |||
information. The requester MUST always indicate the slot ID | information. The requester MUST always indicate the slot ID | |||
representing the outstanding request with the highest-numbered slot | representing the outstanding request with the highest-numbered slot | |||
value. The requester should in all cases provide the most | value. The requester should in all cases provide the most | |||
conservative value possible, although it can be increased somewhat | conservative value possible, although it can be increased somewhat | |||
above the actual instantaneous usage to maintain some minimum or | above the actual instantaneous usage to maintain some minimum or | |||
optimal level. This provides a way for the requester to yield unused | optimal level. This provides a way for the requester to yield unused | |||
request slots back to the replier, which in turn can use the | request slots back to the replier, which in turn can use the | |||
information to reallocate resources. | information to reallocate resources. | |||
The replier responds with both a new target highest_slotid, and an | The replier responds with both a new target highest_slotid and an | |||
enforced highest_slotid, described as follows: | enforced highest_slotid, described as follows: | |||
o The target highest_slotid is an indication to the requester of the | o The target highest_slotid is an indication to the requester of the | |||
highest_slotid the replier wishes the requester to be using. This | highest_slotid the replier wishes the requester to be using. This | |||
permits the replier to withdraw (or add) resources from a | permits the replier to withdraw (or add) resources from a | |||
requester that has been found to not be using them, in order to | requester that has been found to not be using them, in order to | |||
more fairly share resources among a varying level of demand from | more fairly share resources among a varying level of demand from | |||
other requesters. The requester must always comply with the | other requesters. The requester must always comply with the | |||
replier's value updates, since they indicate newly established | replier's value updates, since they indicate newly established | |||
hard limits on the requester's access to session resources. | hard limits on the requester's access to session resources. | |||
However, because of request pipelining, the requester may have | However, because of request pipelining, the requester may have | |||
active requests in flight reflecting prior values, therefore the | active requests in flight reflecting prior values; therefore, the | |||
replier must not immediately require the requester to comply. | replier must not immediately require the requester to comply. | |||
o The enforced highest_slotid indicates the highest slot ID the | o The enforced highest_slotid indicates the highest slot ID the | |||
requester is permitted to use on a subsequent SEQUENCE or | requester is permitted to use on a subsequent SEQUENCE or | |||
CB_SEQUENCE operation. The replier's enforced highest_slotid | CB_SEQUENCE operation. The replier's enforced highest_slotid | |||
SHOULD be no less than the highest_slotid the requester indicated | SHOULD be no less than the highest_slotid the requester indicated | |||
in the SEQUENCE or CB_SEQUENCE arguments. | in the SEQUENCE or CB_SEQUENCE arguments. | |||
A requester can be intransigent with respect to lowering its | A requester can be intransigent with respect to lowering its | |||
highest_slotid argument to a Sequence operation, i.e. the | highest_slotid argument to a Sequence operation, i.e. the | |||
skipping to change at page 57, line 41 | skipping to change at page 56, line 44 | |||
highest_slotid argument to be higher than the target | highest_slotid argument to be higher than the target | |||
highest_slotid. This can be considered particularly egregious | highest_slotid. This can be considered particularly egregious | |||
behavior when the replier knows there are no outstanding requests | behavior when the replier knows there are no outstanding requests | |||
with slot IDs higher than its target highest_slotid. When faced | with slot IDs higher than its target highest_slotid. When faced | |||
with such intransigence, the replier is free to take more forceful | with such intransigence, the replier is free to take more forceful | |||
action, and MAY reply with a new enforced highest_slotid that is | action, and MAY reply with a new enforced highest_slotid that is | |||
less than its previous enforced highest_slotid. Thereafter, if | less than its previous enforced highest_slotid. Thereafter, if | |||
the requester continues to send requests with a highest_slotid | the requester continues to send requests with a highest_slotid | |||
that is greater than the replier's new enforced highest_slotid, | that is greater than the replier's new enforced highest_slotid, | |||
the server MAY return NFS4ERR_BAD_HIGH_SLOT, unless the slot ID in | the server MAY return NFS4ERR_BAD_HIGH_SLOT, unless the slot ID in | |||
the request is greater than the new enforced highest_slotid, and | the request is greater than the new enforced highest_slotid and | |||
the request is a retry. | the request is a retry. | |||
The replier SHOULD retain the slots it wants to retire until the | The replier SHOULD retain the slots it wants to retire until the | |||
requester sends a request with a highest_slotid less than or equal | requester sends a request with a highest_slotid less than or equal | |||
to the replier's new enforced highest_slotid. | to the replier's new enforced highest_slotid. | |||
The requester can also be intransigent with respect to sending | The requester can also be intransigent with respect to sending | |||
non-retry requests that have a slot ID that exceeds the replier's | non-retry requests that have a slot ID that exceeds the replier's | |||
highest_slotid. Once the replier has forcibly lowered the | highest_slotid. Once the replier has forcibly lowered the | |||
enforced highest_slotid, the requester is only allowed to send | enforced highest_slotid, the requester is only allowed to send | |||
retries on slots that exceed the replier's highest_slotid. If a | retries on slots that exceed the replier's highest_slotid. If a | |||
request is received with a slot ID that is higher than the new | request is received with a slot ID that is higher than the new | |||
enforced highest_slotid, and the sequence ID is one higher than | enforced highest_slotid, and the sequence ID is one higher than | |||
what is in the slot's reply cache, then the server can both retire | what is in the slot's reply cache, then the server can both retire | |||
the slot and return NFS4ERR_BADSLOT (however the server MUST NOT | the slot and return NFS4ERR_BADSLOT (however, the server MUST NOT | |||
do one and not the other). The reason it is safe to retire the | do one and not the other). The reason it is safe to retire the | |||
slot is because that by using the next sequence ID, the requester | slot is because by using the next sequence ID, the requester is | |||
is indicating it has received the previous reply for the slot. | indicating it has received the previous reply for the slot. | |||
o The requester SHOULD use the lowest available slot when sending a | o The requester SHOULD use the lowest available slot when sending a | |||
new request. This way, the replier may be able to retire slot | new request. This way, the replier may be able to retire slot | |||
entries faster. However, where the replier is actively adjusting | entries faster. However, where the replier is actively adjusting | |||
its granted highest_slotid, it will not be able to use only the | its granted highest_slotid, it will not be able to use only the | |||
receipt of the slot ID and highest_slotid in the request. Neither | receipt of the slot ID and highest_slotid in the request. Neither | |||
the slot ID nor the highest_slotid used in a request may reflect | the slot ID nor the highest_slotid used in a request may reflect | |||
the replier's current idea of the requester's session limit, | the replier's current idea of the requester's session limit, | |||
because the request may have been sent from the requester before | because the request may have been sent from the requester before | |||
the update was received. Therefore, in the downward adjustment | the update was received. Therefore, in the downward adjustment | |||
case, the replier may have to retain a number of reply cache | case, the replier may have to retain a number of reply cache | |||
entries at least as large as the old value of maximum requests | entries at least as large as the old value of maximum requests | |||
outstanding, until it can infer that the requester has seen a | outstanding, until it can infer that the requester has seen a | |||
reply containing the new granted highest_slotid. The replier can | reply containing the new granted highest_slotid. The replier can | |||
infer that requester as seen such a reply when it receives a new | infer that the requester has seen such a reply when it receives a | |||
request with the same slot ID as the request replied to and the | new request with the same slot ID as the request replied to and | |||
next higher sequence ID. | the next higher sequence ID. | |||
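
The handling of a request whose slot ID exceeds a lowered enforced highest_slotid, as described in the items above, can be sketched roughly as follows. This is an illustration only, not text from the specification: `Slot`, `check_high_slot`, and the string status values are hypothetical names, the reply cache is reduced to one sequence ID per slot, and the handling of an unknown slot or an unexpected sequence ID is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    cached_seqid: int      # sequence ID of the reply held in the slot's cache
    retired: bool = False


def check_high_slot(slots, enforced_highest, slot_id, seqid):
    """Classify a request against a lowered enforced highest_slotid."""
    if slot_id <= enforced_highest:
        return "PROCESS_NORMALLY"
    slot = slots.get(slot_id)
    if slot is None or slot.retired:
        # Assumption: a slot that no longer exists is simply a bad slot.
        return "NFS4ERR_BADSLOT"
    if seqid == slot.cached_seqid:
        # A retry is still permitted on a slot above the new limit.
        return "REPLAY_CACHED_REPLY"
    if seqid == slot.cached_seqid + 1:
        # The next sequence ID shows the requester received the previous
        # reply, so the slot is retired and the error returned together.
        slot.retired = True
        return "NFS4ERR_BADSLOT"
    # Assumption: any other sequence ID is treated as misordered.
    return "NFS4ERR_SEQ_MISORDERED"
```

Note that the slot is only retired in the same branch that returns NFS4ERR_BADSLOT, which mirrors the rule that the server must not do one without the other.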
2.10.6.1.1. Caching of SEQUENCE and CB_SEQUENCE Replies | 2.10.6.1.1. Caching of SEQUENCE and CB_SEQUENCE Replies | |||
When a SEQUENCE or CB_SEQUENCE operation is successfully executed, | When a SEQUENCE or CB_SEQUENCE operation is successfully executed, | |||
its reply MUST always be cached. Specifically, session ID, sequence | its reply MUST always be cached. Specifically, session ID, sequence | |||
ID, and slot ID MUST be cached in the reply cache. The reply from | ID, and slot ID MUST be cached in the reply cache. The reply from | |||
SEQUENCE also includes the highest slot ID, target highest slot ID, | SEQUENCE also includes the highest slot ID, target highest slot ID, | |||
and status flags. Instead of caching these values, the server MAY | and status flags. Instead of caching these values, the server MAY | |||
re-compute the values from the current state of the fore channel, | re-compute the values from the current state of the fore channel, | |||
session and/or client ID as appropriate. Similarly, the reply from | session, and/or client ID as appropriate. Similarly, the reply from | |||
CB_SEQUENCE includes a highest slot ID and target highest slot ID. | CB_SEQUENCE includes a highest slot ID and target highest slot ID. | |||
The client MAY re-compute the values from the current state of the | The client MAY re-compute the values from the current state of the | |||
session as appropriate. | session as appropriate. | |||
Regardless of whether a replier is re-computing highest slot ID, | Regardless of whether or not a replier is re-computing highest slot | |||
target slot ID, and status on replies to retries or not, the | ID, target slot ID, and status on replies to retries, the requester | |||
requester MUST NOT assume the values are being re-computed whenever | MUST NOT assume that the values are being re-computed whenever it | |||
it receives a reply after a retry is sent, since it has no way of | receives a reply after a retry is sent, since it has no way of | |||
knowing whether the reply it has received was sent by the server in | knowing whether the reply it has received was sent by the replier in | |||
response to the retry, or is a delayed response to the original | response to the retry or is a delayed response to the original | |||
request. Therefore, it may be the case that highest slot ID, target | request. Therefore, it may be the case that highest slot ID, target | |||
slot ID, or status bits may reflect the state of affairs when the | slot ID, or status bits may reflect the state of affairs when the | |||
request was first executed. Although acting based on such delayed | request was first executed. Although acting based on such delayed | |||
information is valid, it may cause the receiver to do unneeded work. | information is valid, it may cause the receiver of the reply to do | |||
Requesters MAY choose to send additional requests to get the current | unneeded work. Requesters MAY choose to send additional requests to | |||
state of affairs or use the state of affairs reported by subsequent | get the current state of affairs or use the state of affairs reported | |||
requests, in preference to acting immediately on data which may be | by subsequent requests, in preference to acting immediately on data | |||
out of date. | that might be out of date. | |||
2.10.6.1.2. Errors from SEQUENCE and CB_SEQUENCE | 2.10.6.1.2. Errors from SEQUENCE and CB_SEQUENCE | |||
Any time SEQUENCE or CB_SEQUENCE return an error, the sequence ID of | Any time SEQUENCE or CB_SEQUENCE returns an error, the sequence ID of | |||
the slot MUST NOT change. The replier MUST NOT modify the reply | the slot MUST NOT change. The replier MUST NOT modify the reply | |||
cache entry for the slot whenever an error is returned from SEQUENCE | cache entry for the slot whenever an error is returned from SEQUENCE | |||
or CB_SEQUENCE. | or CB_SEQUENCE. | |||
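
A minimal sketch of this rule, assuming a hypothetical `SlotEntry` record and string status values (neither is defined by the specification): the slot's sequence ID and cache entry change only when SEQUENCE or CB_SEQUENCE succeeds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SlotEntry:
    seqid: int = 0
    cached_reply: Optional[bytes] = None


def record_sequence_result(slot: SlotEntry, status: str, reply: bytes) -> None:
    """Update a slot after SEQUENCE/CB_SEQUENCE has been processed."""
    if status != "NFS4_OK":
        # Error: the sequence ID and the cache entry stay untouched.
        return
    slot.seqid += 1
    slot.cached_reply = reply
```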
2.10.6.1.3. Optional Reply Caching | 2.10.6.1.3. Optional Reply Caching | |||
On a per-request basis the requester can choose to direct the replier | On a per-request basis, the requester can choose to direct the | |||
to cache the reply to all operations after the first operation | replier to cache the reply to all operations after the first | |||
(SEQUENCE or CB_SEQUENCE) via the sa_cachethis or csa_cachethis | operation (SEQUENCE or CB_SEQUENCE) via the sa_cachethis or | |||
fields of the arguments to SEQUENCE or CB_SEQUENCE. The reason it | csa_cachethis fields of the arguments to SEQUENCE or CB_SEQUENCE. | |||
would not direct the replier to cache the entire reply is that the | The reason it would not direct the replier to cache the entire reply | |||
request is composed of all idempotent operations [34]. Caching the | is that the request is composed of all idempotent operations [34]. | |||
reply may offer little benefit. If the reply is too large (see | Caching the reply may offer little benefit. If the reply is too | |||
Section 2.10.6.4), it may not be cacheable anyway. Even if the reply | large (see Section 2.10.6.4), it may not be cacheable anyway. Even | |||
to an idempotent request is small enough to cache, unnecessarily caching | if the reply to an idempotent request is small enough to cache, | |||
the reply slows down the server and increases RPC latency. | unnecessarily caching the reply slows down the server and increases | |||
RPC latency. | ||||
Whether the requester requests the reply to be cached or not has no | Whether or not the requester requests the reply to be cached has no | |||
effect on the slot processing. If the results of SEQUENCE or | effect on the slot processing. If the results of SEQUENCE or | |||
CB_SEQUENCE are NFS4_OK, then the slot's sequence ID MUST be | CB_SEQUENCE are NFS4_OK, then the slot's sequence ID MUST be | |||
incremented by one. If a requester does not direct the replier to | incremented by one. If a requester does not direct the replier to | |||
cache the reply, the replier MUST do one of the following: | cache the reply, the replier MUST do one of the following: | |||
o The replier can cache the entire original reply. Even though | o The replier can cache the entire original reply. Even though | |||
sa_cachethis or csa_cachethis are FALSE, the replier is always | sa_cachethis or csa_cachethis is FALSE, the replier is always free | |||
free to cache. It may choose this approach in order to simplify | to cache. It may choose this approach in order to simplify | |||
implementation. | implementation. | |||
o The replier enters into its reply cache a reply consisting of the | o The replier enters into its reply cache a reply consisting of the | |||
original results to the SEQUENCE or CB_SEQUENCE operation, and | original results to the SEQUENCE or CB_SEQUENCE operation, and | |||
with the next operation in COMPOUND or CB_COMPOUND having the | with the next operation in COMPOUND or CB_COMPOUND having the | |||
error NFS4ERR_RETRY_UNCACHED_REP. Thus if the requester later | error NFS4ERR_RETRY_UNCACHED_REP. Thus, if the requester later | |||
retries the request, it will get NFS4ERR_RETRY_UNCACHED_REP. If a | retries the request, it will get NFS4ERR_RETRY_UNCACHED_REP. If a | |||
replier receives a retried Sequence operation where the reply to | replier receives a retried Sequence operation where the reply to | |||
the COMPOUND or CB_COMPOUND was not cached, then the replier, | the COMPOUND or CB_COMPOUND was not cached, then the replier, | |||
* MAY return NFS4ERR_RETRY_UNCACHED_REP in reply to a Sequence | * MAY return NFS4ERR_RETRY_UNCACHED_REP in reply to a Sequence | |||
operation if the Sequence operation is not the first operation | operation if the Sequence operation is not the first operation | |||
(granted, a requester that does so is in violation of the | (granted, a requester that does so is in violation of the | |||
NFSv4.1 protocol). | NFSv4.1 protocol). | |||
* MUST NOT return NFS4ERR_RETRY_UNCACHED_REP in reply to a | * MUST NOT return NFS4ERR_RETRY_UNCACHED_REP in reply to a | |||
Sequence operation if the Sequence operation is the first | Sequence operation if the Sequence operation is the first | |||
operation. | operation. | |||
o If the second operation is an illegal operation, or an operation | o If the second operation is an illegal operation, or an operation | |||
that was legal in a previous minor version of NFSv4 and MUST NOT | that was legal in a previous minor version of NFSv4 and MUST NOT | |||
be supported in current minor version (e.g. SETCLIENTID), the | be supported in the current minor version (e.g., SETCLIENTID), the | |||
replier MUST NOT ever return NFS4ERR_RETRY_UNCACHED_REP. Instead | replier MUST NOT ever return NFS4ERR_RETRY_UNCACHED_REP. Instead | |||
the replier MUST return NFS4ERR_OP_ILLEGAL, or NFS4ERR_BADXDR, or | the replier MUST return NFS4ERR_OP_ILLEGAL or NFS4ERR_BADXDR or | |||
NFS4ERR_NOTSUPP as appropriate. | NFS4ERR_NOTSUPP as appropriate. | |||
o If the second operation can result in another error status, the | o If the second operation can result in another error status, the | |||
replier MAY return a status other than NFS4ERR_RETRY_UNCACHED_REP, | replier MAY return a status other than NFS4ERR_RETRY_UNCACHED_REP, | |||
provided the operation is not executed in such a way that the | provided the operation is not executed in such a way that the | |||
state of the replier is changed. Examples of such an error status | state of the replier is changed. Examples of such an error status | |||
include: NFS4ERR_NOTSUPP returned for an operation that is legal | include: NFS4ERR_NOTSUPP returned for an operation that is legal | |||
but not REQUIRED in the current minor versions, and thus not | but not REQUIRED in the current minor versions, and thus not | |||
supported by the replier; NFS4ERR_SEQUENCE_POS; and | supported by the replier; NFS4ERR_SEQUENCE_POS; and | |||
NFS4ERR_REQ_TOO_BIG. | NFS4ERR_REQ_TOO_BIG. | |||
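
The two caching choices listed above can be sketched as a function that builds the reply cache entry. This is an illustration under stated assumptions: `build_cache_entry`, the `cache_anyway` flag, and the (operation, status) tuples are hypothetical, and the special cases for illegal operations and other error statuses are left out.

```python
from typing import List, Tuple

Result = Tuple[str, str]  # (operation name, status)


def build_cache_entry(results: List[Result], cachethis: bool,
                      cache_anyway: bool = False) -> List[Result]:
    """Return what a replier might store in the slot's reply cache.

    results[0] is the SEQUENCE or CB_SEQUENCE result.
    """
    if cachethis or cache_anyway:
        # The replier is always free to cache the entire reply.
        return list(results)
    entry = [results[0]]
    if len(results) > 1:
        # Keep the SEQUENCE result and mark the next operation so that a
        # later retry receives NFS4ERR_RETRY_UNCACHED_REP.
        entry.append((results[1][0], "NFS4ERR_RETRY_UNCACHED_REP"))
    return entry
```

For a COMPOUND of SEQUENCE, PUTFH, GETATTR sent with sa_cachethis FALSE, the sketch stores only the SEQUENCE result followed by PUTFH with NFS4ERR_RETRY_UNCACHED_REP.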
The discussion above assumes that the retried request matches the | The discussion above assumes that the retried request matches the | |||
original one. Section 2.10.6.1.3.1 discusses what the replier might | original one. Section 2.10.6.1.3.1 discusses what the replier might | |||
do, and MUST do when original and retried requests do not match. | do, and MUST do when original and retried requests do not match. | |||
Since the replier may only cache a small amount of the information | Since the replier may only cache a small amount of the information | |||
that would be required to determine whether this is a case of a false | that would be required to determine whether this is a case of a false | |||
retry, the replier may send to the client, any of the following | retry, the replier may send to the client any of the following | |||
responses: | responses: | |||
o The cached reply to the original request (if the replier has | o The cached reply to the original request (if the replier has | |||
cached it in its entirety, and the users of the original request | cached it in its entirety and the users of the original request | |||
and retry match). | and retry match). | |||
o A reply that consists only of the Sequence operation with the | o A reply that consists only of the Sequence operation with the | |||
error NFS4ERR_FALSE_RETRY. | error NFS4ERR_FALSE_RETRY. | |||
o A reply consisting of the response to Sequence with the status | o A reply consisting of the response to Sequence with the status | |||
NFS4_OK, together with the second operation as it appeared in the | NFS4_OK, together with the second operation as it appeared in the | |||
retried request with an error of NFS4ERR_RETRY_UNCACHED_REP or | retried request with an error of NFS4ERR_RETRY_UNCACHED_REP or | |||
other error as described above. | other error as described above. | |||
o A reply that consists of the response to Sequence with the status | o A reply that consists of the response to Sequence with the status | |||
NFS4_OK, together with the second operation as it appeared in the | NFS4_OK, together with the second operation as it appeared in the | |||
original request with an error of NFS4ERR_RETRY_UNCACHED_REP or | original request with an error of NFS4ERR_RETRY_UNCACHED_REP or | |||
other error as described above. | other error as described above. | |||
2.10.6.1.3.1. False Retry | 2.10.6.1.3.1. False Retry | |||
If a requester sent a Sequence operation with a slot ID and sequence | If a requester sent a Sequence operation with a slot ID and sequence | |||
ID that are in the reply cache, but the replier detected that the | ID that are in the reply cache but the replier detected that the | |||
retried request is not the same as the original request, including a | retried request is not the same as the original request, including a | |||
retry that has different operations or different arguments in the | retry that has different operations or different arguments in the | |||
operations from the original, and a retry that uses a different | operations from the original and a retry that uses a different | |||
principal in the RPC request's credential field that translates to a | principal in the RPC request's credential field that translates to a | |||
different user, then this is a false retry. When the replier detects | different user, then this is a false retry. When the replier detects | |||
a false retry, it is permitted to (but not always obligated to) | a false retry, it is permitted (but not always obligated) to return | |||
return NFS4ERR_FALSE_RETRY in response to the Sequence operation when | NFS4ERR_FALSE_RETRY in response to the Sequence operation when it | |||
it detects a false retry. | detects a false retry. | |||
Translations of particularly privileged user values to other users | Translations of particularly privileged user values to other users | |||
due to the lack of appropriately secure credentials, as configured on | due to the lack of appropriately secure credentials, as configured on | |||
the replier, should be applied before determining whether the users | the replier, should be applied before determining whether the users | |||
are the same or different. If the replier determines the users are | are the same or different. If the replier determines the users are | |||
different between the original request and a retry, then the replier | different between the original request and a retry, then the replier | |||
MUST return NFS4ERR_FALSE_RETRY. | MUST return NFS4ERR_FALSE_RETRY. | |||
If an operation of the retry is an illegal operation, or an operation | If an operation of the retry is an illegal operation, or an operation | |||
that was legal in a previous minor version of NFSv4 and MUST NOT be | that was legal in a previous minor version of NFSv4 and MUST NOT be | |||
supported in current minor version (e.g. SETCLIENTID), the replier | supported in the current minor version (e.g., SETCLIENTID), the | |||
MAY return NFS4ERR_FALSE_RETRY (and MUST do so if the users of the | replier MAY return NFS4ERR_FALSE_RETRY (and MUST do so if the users | |||
original request and retry differ). Otherwise, the replier MAY | of the original request and retry differ). Otherwise, the replier | |||
NFS4ERR_OP_ILLEGAL, or NFS4ERR_BADXDR, or NFS4ERR_NOTSUPP as | MAY return NFS4ERR_OP_ILLEGAL or NFS4ERR_BADXDR or NFS4ERR_NOTSUPP as | |||
appropriate. Note that the handling is in contrast to how replier | appropriate. Note that the handling is in contrast to how the | |||
deals with retried requests with no cached reply. The difference is | replier deals with retried requests with no cached reply. The | |||
due to NFS4ERR_FALSE_RETRY being a valid error for only Sequence | difference is due to NFS4ERR_FALSE_RETRY being a valid error for only | |||
operations, whereas NFS4ERR_RETRY_UNCACHED_REP is a valid error for | Sequence operations, whereas NFS4ERR_RETRY_UNCACHED_REP is a valid | |||
all operations except illegal operations and operations that MUST NOT | error for all operations except illegal operations and operations | |||
be supported in the current minor version of NFSv4. | that MUST NOT be supported in the current minor version of NFSv4. | |||
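
The false retry rules in this section can be summarized in a small sketch. It assumes the replier has kept enough of the original request to compare against (which, as noted earlier, it may not have); `Request`, `false_retry_status`, and the flattening of operations into strings are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Request:
    user: str               # user after any replier-configured translation
    ops: Tuple[str, ...]    # operations and arguments, flattened to strings


def false_retry_status(original: Request,
                       retry: Request) -> Tuple[Optional[str], bool]:
    """Return (status, mandatory) for a retry on a cached slot/sequence ID."""
    if retry.user != original.user:
        # Different users: NFS4ERR_FALSE_RETRY MUST be returned.
        return "NFS4ERR_FALSE_RETRY", True
    if retry.ops != original.ops:
        # Same user, different request: the error is permitted, not required.
        return "NFS4ERR_FALSE_RETRY", False
    return None, False      # a genuine retry
```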
2.10.6.2. Retry and Replay of Reply | 2.10.6.2. Retry and Replay of Reply | |||
A requester MUST NOT retry a request, unless the connection it used | A requester MUST NOT retry a request, unless the connection it used | |||
to send the request disconnects. The requester can then reconnect | to send the request disconnects. The requester can then reconnect | |||
and re-send the request, or it can re-send the request over a | and re-send the request, or it can re-send the request over a | |||
different connection that is associated with the same session. | different connection that is associated with the same session. | |||
If the requester is a server wanting to re-send a callback operation | If the requester is a server wanting to re-send a callback operation | |||
over the backchannel of a session, the requester of course cannot | over the backchannel of a session, the requester of course cannot | |||
reconnect because only the client can associate connections with the | reconnect because only the client can associate connections with the | |||
backchannel. The server can re-send the request over another | backchannel. The server can re-send the request over another | |||
connection that is bound to the same session's backchannel. If there | connection that is bound to the same session's backchannel. If there | |||
is no such connection, the server MUST indicate that the session has | is no such connection, the server MUST indicate that the session has | |||
no backchannel by setting the SEQ4_STATUS_CB_PATH_DOWN_SESSION flag | no backchannel by setting the SEQ4_STATUS_CB_PATH_DOWN_SESSION flag | |||
bit in the response to the next SEQUENCE operation from the client. | bit in the response to the next SEQUENCE operation from the client. | |||
The client MUST then associate a connection with the session (or | The client MUST then associate a connection with the session (or | |||
destroy the session). | destroy the session). | |||
Note that it is not fatal for a requester to retry without a | Note that it is not fatal for a requester to retry without a | |||
disconnect between the request and retry. However the retry does | disconnect between the request and retry. However, the retry does | |||
consume resources, especially with RDMA, where each request, retry or | consume resources, especially with RDMA, where each request, retry or | |||
not, consumes a credit. Retries for no reason, especially retries | not, consumes a credit. Retries for no reason, especially retries | |||
sent shortly after the previous attempt, are a poor use of network | sent shortly after the previous attempt, are a poor use of network | |||
bandwidth and defeat the purpose of a transport's inherent congestion | bandwidth and defeat the purpose of a transport's inherent congestion | |||
control system. | control system. | |||
A requester MUST wait for a reply to a request before using the slot | A requester MUST wait for a reply to a request before using the slot | |||
for another request. If it does not wait for a reply, then the | for another request. If it does not wait for a reply, then the | |||
requester does not know what sequence ID to use for the slot on its | requester does not know what sequence ID to use for the slot on its | |||
next request. For example, suppose a requester sends a request with | next request. For example, suppose a requester sends a request with | |||
sequence ID 1, and does not wait for the response. The next time it | sequence ID 1, and does not wait for the response. The next time it | |||
uses the slot, it sends the new request with sequence ID 2. If the | uses the slot, it sends the new request with sequence ID 2. If the | |||
replier has not seen the request with sequence ID 1, then the replier | replier has not seen the request with sequence ID 1, then the replier | |||
is not expecting sequence ID 2, and rejects the requester's new | is not expecting sequence ID 2, and rejects the requester's new | |||
request with NFS4ERR_SEQ_MISORDERED (as the result from SEQUENCE or | request with NFS4ERR_SEQ_MISORDERED (as the result from SEQUENCE or | |||
CB_SEQUENCE). | CB_SEQUENCE). | |||
RDMA fabrics do not guarantee that the memory handles (Steering Tags) | RDMA fabrics do not guarantee that the memory handles (Steering Tags) | |||
within each RPC/RDMA "chunk" ([8]) are valid on a scope outside that | within each RPC/RDMA "chunk" [8] are valid on a scope outside that of | |||
of a single connection. Therefore, handles used by the direct | a single connection. Therefore, handles used by the direct | |||
operations become invalid after connection loss. The server must | operations become invalid after connection loss. The server must | |||
ensure that any RDMA operations which must be replayed from the reply | ensure that any RDMA operations that must be replayed from the reply | |||
cache use the newly provided handle(s) from the most recent request. | cache use the newly provided handle(s) from the most recent request. | |||
A retry might be sent while the original request is still in progress | A retry might be sent while the original request is still in progress | |||
on the replier. The replier SHOULD deal with the issue by returning | on the replier. The replier SHOULD deal with the issue by returning | |||
NFS4ERR_DELAY as the reply to SEQUENCE or CB_SEQUENCE operation, but | NFS4ERR_DELAY as the reply to SEQUENCE or CB_SEQUENCE operation, but | |||
implementations MAY return NFS4ERR_SEQ_MISORDERED. Since errors from | implementations MAY return NFS4ERR_SEQ_MISORDERED. Since errors from | |||
SEQUENCE and CB_SEQUENCE are never recorded in the reply cache, this | SEQUENCE and CB_SEQUENCE are never recorded in the reply cache, this | |||
approach allows the results of the execution of the original request | approach allows the results of the execution of the original request | |||
to be properly recorded in the reply cache (assuming the requester | to be properly recorded in the reply cache (assuming that the | |||
specified the reply to be cached). | requester specified the reply to be cached). | |||
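
The sequence ID example and the in-progress retry case from this section can be sketched together as a single check. The function name, the `in_progress` flag, and the string results are assumptions made for illustration; treating a next-sequence-ID request that arrives while the previous one is still executing as misordered is also an assumption, since the text above does not spell that case out.

```python
def classify_request(slot_seqid: int, in_progress: bool, req_seqid: int) -> str:
    """Classify a request against one slot.

    slot_seqid  -- sequence ID of the last request accepted on the slot
    in_progress -- True while that request is still executing
    """
    if req_seqid == slot_seqid:
        # Retry: answer from the cache, or ask the requester to wait if the
        # original has not finished yet.
        return "NFS4ERR_DELAY" if in_progress else "REPLAY_CACHED_REPLY"
    if req_seqid == slot_seqid + 1 and not in_progress:
        return "NEW_REQUEST"
    return "NFS4ERR_SEQ_MISORDERED"
```

With `slot_seqid` at 0 (the request with sequence ID 1 was never seen), a request carrying sequence ID 2 is classified as NFS4ERR_SEQ_MISORDERED, matching the example in the text.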
2.10.6.3. Resolving Server Callback Races | 2.10.6.3. Resolving Server Callback Races | |||
It is possible for server callbacks to arrive at the client before | It is possible for server callbacks to arrive at the client before | |||
the reply from related fore channel operations. For example, a | the reply from related fore channel operations. For example, a | |||
client may have been granted a delegation to a file it has opened, | client may have been granted a delegation to a file it has opened, | |||
but the reply to the OPEN (informing the client of the granting of | but the reply to the OPEN (informing the client of the granting of | |||
the delegation) may be delayed in the network. If a conflicting | the delegation) may be delayed in the network. If a conflicting | |||
operation arrives at the server, it will recall the delegation using | operation arrives at the server, it will recall the delegation using | |||
the backchannel, which may be on a different transport connection, | the backchannel, which may be on a different transport connection, | |||
perhaps even a different network, or even a different session | perhaps even a different network, or even a different session | |||
associated with the same client ID | associated with the same client ID. | |||
The presence of a session between the client and server alleviates | The presence of a session between the client and server alleviates | |||
this issue. When a session is in place, each client request is | this issue. When a session is in place, each client request is | |||
uniquely identified by its { session ID, slot ID, sequence ID } | uniquely identified by its { session ID, slot ID, sequence ID } | |||
triple. By the rules under which slot entries (reply cache entries) | triple. By the rules under which slot entries (reply cache entries) | |||
are retired, the server has knowledge whether the client has "seen" | are retired, the server has knowledge whether the client has "seen" | |||
each of the server's replies. The server can therefore provide | each of the server's replies. The server can therefore provide | |||
sufficient information to the client to allow it to disambiguate | sufficient information to the client to allow it to disambiguate | |||
between an erroneous or conflicting callback race condition. | between an erroneous or conflicting callback race condition. | |||
For each client operation which might result in some sort of server | For each client operation that might result in some sort of server | |||
callback, the server SHOULD "remember" the { session ID, slot ID, | callback, the server SHOULD "remember" the { session ID, slot ID, | |||
sequence ID } triple of the client request until the slot ID | sequence ID } triple of the client request until the slot ID | |||
retirement rules allow the server to determine that the client has, | retirement rules allow the server to determine that the client has, | |||
in fact, seen the server's reply. Until the time the { session ID, | in fact, seen the server's reply. Until the time the { session ID, | |||
slot ID, sequence ID } request triple can be retired, any recalls of | slot ID, sequence ID } request triple can be retired, any recalls of | |||
the associated object MUST carry an array of these referring | the associated object MUST carry an array of these referring | |||
identifiers (in the CB_SEQUENCE operation's arguments), for the | identifiers (in the CB_SEQUENCE operation's arguments), for the | |||
benefit of the client. After this time, it is not necessary for the | benefit of the client. After this time, it is not necessary for the | |||
server to provide this information in related callbacks, since it is | server to provide this information in related callbacks, since it is | |||
certain that a race condition can no longer occur. | certain that a race condition can no longer occur. | |||
The CB_SEQUENCE operation which begins each server callback carries a | The CB_SEQUENCE operation that begins each server callback carries a | |||
list of "referring" { session ID, slot ID, sequence ID } triples. If | list of "referring" { session ID, slot ID, sequence ID } triples. If | |||
the client finds the request corresponding to the referring session | the client finds the request corresponding to the referring session | |||
ID, slot ID and sequence ID to be currently outstanding (i.e. the | ID, slot ID, and sequence ID to be currently outstanding (i.e., the | |||
server's reply has not been seen by the client), it can determine | server's reply has not been seen by the client), it can determine | |||
that the callback has raced the reply, and act accordingly. If the | that the callback has raced the reply, and act accordingly. If the | |||
client does not find the request corresponding the referring triple | client does not find the request corresponding to the referring | |||
to be outstanding (including the case of a session ID referring to a | triple to be outstanding (including the case of a session ID | |||
destroyed session), then there is no race with respect to this | referring to a destroyed session), then there is no race with respect | |||
triple. The server SHOULD limit the referring triples to requests | to this triple. The server SHOULD limit the referring triples to | |||
that refer to just those that apply to the objects referred to in the | requests that refer to just those that apply to the objects referred | |||
CB_COMPOUND procedure. | to in the CB_COMPOUND procedure. | |||
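The client-side check described above can be sketched as follows. This is a minimal illustration, not part of the protocol definition: the names ReferringTriple, callback_raced_reply, and the "outstanding" set are hypothetical, and a real client would consult its own slot table rather than a plain set.

```python
from typing import Iterable, NamedTuple, Set


class ReferringTriple(NamedTuple):
    """Hypothetical in-memory form of a { session ID, slot ID, sequence ID } triple."""
    session_id: bytes
    slot_id: int
    sequence_id: int


def callback_raced_reply(referring: Iterable[ReferringTriple],
                         outstanding: Set[ReferringTriple]) -> bool:
    """Return True if any referring triple names a request whose reply
    the client has not yet seen, i.e. the callback has raced the reply."""
    return any(triple in outstanding for triple in referring)


# One request is still awaiting its reply on slot 3 of session "sess-A".
outstanding = {ReferringTriple(b"sess-A", 3, 17)}

# The callback refers to that outstanding request: a race is detected.
raced = callback_raced_reply([ReferringTriple(b"sess-A", 3, 17)], outstanding)

# The callback refers to a destroyed session: no race for this triple.
no_race = callback_raced_reply([ReferringTriple(b"gone", 0, 1)], outstanding)
```

A triple that is not found, including one naming a destroyed session, is simply treated as "no race", which matches the rule stated above.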
The client must not simply wait forever for the expected server reply | The client must not simply wait forever for the expected server reply | |||
to arrive before responding to the CB_COMPOUND that won the race, | to arrive before responding to the CB_COMPOUND that won the race, | |||
because it is possible that it will be delayed indefinitely. The | because it is possible that it will be delayed indefinitely. The | |||
client should assume the likely case that the reply will arrive | client should assume the likely case that the reply will arrive | |||
within the average round trip time for COMPOUND requests to the | within the average round-trip time for COMPOUND requests to the | |||
server, and wait that period of time. If that period of time expires | server, and wait that period of time. If that period of time | |||
it can respond to the CB_COMPOUND with NFS4ERR_DELAY. | expires, it can respond to the CB_COMPOUND with NFS4ERR_DELAY. | |||
There are other scenarios under which callbacks may race replies. | There are other scenarios under which callbacks may race replies. | |||
Among them are pNFS layout recalls as described in Section 12.5.5.2. | Among them are pNFS layout recalls as described in Section 12.5.5.2. | |||
2.10.6.4. COMPOUND and CB_COMPOUND Construction Issues | 2.10.6.4. COMPOUND and CB_COMPOUND Construction Issues | |||
Very large requests and replies may pose both buffer management | Very large requests and replies may pose both buffer management | |||
issues (especially with RDMA) and reply cache issues. When the | issues (especially with RDMA) and reply cache issues. When the | |||
session is created, (Section 18.36), for each channel (fore and | session is created (Section 18.36), for each channel (fore and back), | |||
back), the client and server negotiate the maximum sized request they | the client and server negotiate the maximum-sized request they will | |||
will send or process (ca_maxrequestsize), the maximum sized reply | send or process (ca_maxrequestsize), the maximum-sized reply they | |||
they will return or process (ca_maxresponsesize), and the maximum | will return or process (ca_maxresponsesize), and the maximum-sized | |||
sized reply they will store in the reply cache | reply they will store in the reply cache (ca_maxresponsesize_cached). | |||
(ca_maxresponsesize_cached). | ||||
If a request exceeds ca_maxrequestsize, the reply will have the | If a request exceeds ca_maxrequestsize, the reply will have the | |||
status NFS4ERR_REQ_TOO_BIG. A replier MAY return NFS4ERR_REQ_TOO_BIG | status NFS4ERR_REQ_TOO_BIG. A replier MAY return NFS4ERR_REQ_TOO_BIG | |||
as the status for first operation (SEQUENCE or CB_SEQUENCE) in the | as the status for the first operation (SEQUENCE or CB_SEQUENCE) in | |||
request (which means no operations in the request executed, and the | the request (which means that no operations in the request executed | |||
state of the slot in the reply cache is unchanged), or it MAY opt to | and that the state of the slot in the reply cache is unchanged), or | |||
return it on a subsequent operation in the same COMPOUND or | it MAY opt to return it on a subsequent operation in the same | |||
CB_COMPOUND request (which means at least one operation did execute | COMPOUND or CB_COMPOUND request (which means that at least one | |||
and the state of the slot in reply cache does change). The replier | operation did execute and that the state of the slot in the reply | |||
SHOULD set NFS4ERR_REQ_TOO_BIG on the operation that exceeds | cache does change). The replier SHOULD set NFS4ERR_REQ_TOO_BIG on | |||
ca_maxrequestsize. | the operation that exceeds ca_maxrequestsize. | |||
If a reply exceeds ca_maxresponsesize, the reply will have the status | If a reply exceeds ca_maxresponsesize, the reply will have the status | |||
NFS4ERR_REP_TOO_BIG. A replier MAY return NFS4ERR_REP_TOO_BIG as the | NFS4ERR_REP_TOO_BIG. A replier MAY return NFS4ERR_REP_TOO_BIG as the | |||
status for first operation (SEQUENCE or CB_SEQUENCE) in the request, | status for the first operation (SEQUENCE or CB_SEQUENCE) in the | |||
or it MAY opt to return it on a subsequent operation (in the same | request, or it MAY opt to return it on a subsequent operation (in the | |||
COMPOUND or CB_COMPOUND reply). A replier MAY return | same COMPOUND or CB_COMPOUND reply). A replier MAY return | |||
NFS4ERR_REP_TOO_BIG in the reply to SEQUENCE or CB_SEQUENCE, even if | NFS4ERR_REP_TOO_BIG in the reply to SEQUENCE or CB_SEQUENCE, even if | |||
the response would still exceed ca_maxresponsesize. | the response would still exceed ca_maxresponsesize. | |||
If sa_cachethis or csa_cachethis are TRUE, then the replier MUST | If sa_cachethis or csa_cachethis is TRUE, then the replier MUST cache | |||
cache a reply except if an error is returned by the SEQUENCE or | a reply except if an error is returned by the SEQUENCE or CB_SEQUENCE | |||
CB_SEQUENCE operation (see Section 2.10.6.1.2). If the reply exceeds | operation (see Section 2.10.6.1.2). If the reply exceeds | |||
ca_maxresponsesize_cached, (and sa_cachethis or csa_cachethis are | ca_maxresponsesize_cached (and sa_cachethis or csa_cachethis is | |||
TRUE) then the server MUST return NFS4ERR_REP_TOO_BIG_TO_CACHE. Even | TRUE), then the server MUST return NFS4ERR_REP_TOO_BIG_TO_CACHE. | |||
if NFS4ERR_REP_TOO_BIG_TO_CACHE (or any other error for that matter) | Even if NFS4ERR_REP_TOO_BIG_TO_CACHE (or any other error for that | |||
is returned on a operation other than first operation (SEQUENCE or | matter) is returned on an operation other than the first operation | |||
CB_SEQUENCE), then the reply MUST be cached if sa_cachethis or | (SEQUENCE or CB_SEQUENCE), then the reply MUST be cached if | |||
csa_cachethis are TRUE. For example, if a COMPOUND has eleven | sa_cachethis or csa_cachethis is TRUE. For example, if a COMPOUND | |||
operations, including SEQUENCE, the fifth operation is a RENAME, and | has eleven operations, including SEQUENCE, the fifth operation is a | |||
the tenth operation is a READ for one million bytes, the server may | RENAME, and the tenth operation is a READ for one million bytes, the | |||
return NFS4ERR_REP_TOO_BIG_TO_CACHE on the tenth operation. Since | server may return NFS4ERR_REP_TOO_BIG_TO_CACHE on the tenth | |||
the server executed several operations, especially the non-idempotent | operation. Since the server executed several operations, especially | |||
RENAME, the client's request to cache the reply needs to be honored | the non-idempotent RENAME, the client's request to cache the reply | |||
in order for correct operation of exactly once semantics. If the | needs to be honored in order for the correct operation of exactly | |||
client retries the request, the server will have cached a reply that | once semantics. If the client retries the request, the server will | |||
contains results for ten of the eleven requested operations, with the | have cached a reply that contains results for ten of the eleven | |||
tenth operation having a status of NFS4ERR_REP_TOO_BIG_TO_CACHE. | requested operations, with the tenth operation having a status of | |||
NFS4ERR_REP_TOO_BIG_TO_CACHE. | ||||
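The three negotiated limits and their associated errors can be summarized in a short sketch. The ChannelAttrs class and the size_status function are hypothetical names introduced for illustration, and the order in which the checks are made is an assumption; the text above does not prescribe one. The sketch also ignores which operation in the COMPOUND carries the error.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChannelAttrs:
    """Hypothetical holder for the per-channel limits negotiated at session creation."""
    ca_maxrequestsize: int
    ca_maxresponsesize: int
    ca_maxresponsesize_cached: int


def size_status(attrs: ChannelAttrs, request_size: int, reply_size: int,
                cachethis: bool) -> Optional[str]:
    """Return the size-related error name for a request/reply pair, or None.

    The check order (request, then reply, then cached reply) is an
    assumption of this sketch.
    """
    if request_size > attrs.ca_maxrequestsize:
        return "NFS4ERR_REQ_TOO_BIG"
    if reply_size > attrs.ca_maxresponsesize:
        return "NFS4ERR_REP_TOO_BIG"
    if cachethis and reply_size > attrs.ca_maxresponsesize_cached:
        return "NFS4ERR_REP_TOO_BIG_TO_CACHE"
    return None


# Illustrative limits only; real values come from CREATE_SESSION.
attrs = ChannelAttrs(ca_maxrequestsize=8192,
                     ca_maxresponsesize=2_097_152,
                     ca_maxresponsesize_cached=65_536)

# A reply of about one million bytes fits the channel but not the reply cache.
cached_case = size_status(attrs, 1024, 1_000_200, cachethis=True)
uncached_case = size_status(attrs, 1024, 1_000_200, cachethis=False)
oversized_request = size_status(attrs, 9000, 100, cachethis=False)
```

With these illustrative limits, the one-million-byte READ from the example above is acceptable when caching is not requested, but yields NFS4ERR_REP_TOO_BIG_TO_CACHE when sa_cachethis is TRUE.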
A client needs to take care that when sending operations that change | A client needs to take care that when sending operations that change | |||
the current filehandle (except for PUTFH, PUTPUBFH, PUTROOTFH and | the current filehandle (except for PUTFH, PUTPUBFH, PUTROOTFH, and | |||
RESTOREFH) that it not exceed the maximum reply buffer before the | RESTOREFH), it not exceed the maximum reply buffer before the GETFH | |||
GETFH operation. Otherwise the client will have to retry the | operation. Otherwise, the client will have to retry the operation | |||
operation that changed the current filehandle, in order to obtain the | that changed the current filehandle, in order to obtain the desired | |||
desired filehandle. For the OPEN operation (see Section 18.16), | filehandle. For the OPEN operation (see Section 18.16), retry is not | |||
retry is not always available as an option. The following guidelines | always available as an option. The following guidelines for the | |||
for the handling of filehandle changing operations are advised: | handling of filehandle-changing operations are advised: | |||
o Within the same COMPOUND procedure, a client SHOULD send GETFH | o Within the same COMPOUND procedure, a client SHOULD send GETFH | |||
immediately after a current filehandle changing operation. A | immediately after a current filehandle-changing operation. A | |||
client MUST send GETFH after a current filehandle changing | client MUST send GETFH after a current filehandle-changing | |||
operation that is also non-idempotent (e.g., the OPEN operation), | operation that is also non-idempotent (e.g., the OPEN operation), | |||
unless the operation is RESTOREFH. RESTOREFH is an exception, | unless the operation is RESTOREFH. RESTOREFH is an exception, | |||
because even though it is non-idempotent, the filehandle RESTOREFH | because even though it is non-idempotent, the filehandle RESTOREFH | |||
produced originated from an operation that is either idempotent | produced originated from an operation that is either idempotent | |||
(e.g. PUTFH, LOOKUP), or non-idempotent (e.g. OPEN, CREATE). If | (e.g., PUTFH, LOOKUP), or non-idempotent (e.g., OPEN, CREATE). If | |||
the origin is non-idempotent, then because the client MUST send | the origin is non-idempotent, then because the client MUST send | |||
GETFH after the origin operation, the client can recover if | GETFH after the origin operation, the client can recover if | |||
RESTOREFH returns an error. | RESTOREFH returns an error. | |||
o A server MAY return NFS4ERR_REP_TOO_BIG or | o A server MAY return NFS4ERR_REP_TOO_BIG or | |||
NFS4ERR_REP_TOO_BIG_TO_CACHE (if sa_cachethis is TRUE) on a | NFS4ERR_REP_TOO_BIG_TO_CACHE (if sa_cachethis is TRUE) on a | |||
filehandle changing operation if the reply would be too large on | filehandle-changing operation if the reply would be too large on | |||
the next operation. | the next operation. | |||
o A server SHOULD return NFS4ERR_REP_TOO_BIG or | o A server SHOULD return NFS4ERR_REP_TOO_BIG or | |||
NFS4ERR_REP_TOO_BIG_TO_CACHE (if sa_cachethis is TRUE) on a | NFS4ERR_REP_TOO_BIG_TO_CACHE (if sa_cachethis is TRUE) on a | |||
filehandle changing non-idempotent operation if the reply would be | filehandle-changing, non-idempotent operation if the reply would | |||
too large on the next operation, especially if the operation is | be too large on the next operation, especially if the operation is | |||
OPEN. | OPEN. | |||
o A server MAY return NFS4ERR_UNSAFE_COMPOUND to a non-idempotent | o A server MAY return NFS4ERR_UNSAFE_COMPOUND to a non-idempotent | |||
current filehandle changing operation, if it looks at the next | current filehandle-changing operation, if it looks at the next | |||
operation (in the same COMPOUND procedure) and finds it is not | operation (in the same COMPOUND procedure) and finds it is not | |||
GETFH. The server SHOULD do this if it is unable to determine in | GETFH. The server SHOULD do this if it is unable to determine in | |||
advance whether the total response size would exceed | advance whether the total response size would exceed | |||
ca_maxresponsesize_cached or ca_maxresponsesize. | ca_maxresponsesize_cached or ca_maxresponsesize. | |||
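The look-ahead that a server might perform before returning NFS4ERR_UNSAFE_COMPOUND can be sketched as follows. The function name first_unsafe_index is hypothetical, operations are represented simply by their names, and the set of non-idempotent filehandle-changing operations contains only the two examples named above (OPEN and CREATE); it is not claimed to be exhaustive.

```python
from typing import List, Optional

# Only the examples given in the text above; an assumption, not a complete list.
# RESTOREFH is deliberately absent because it is the stated exception.
NON_IDEMPOTENT_FH_CHANGING = frozenset({"OPEN", "CREATE"})


def first_unsafe_index(ops: List[str]) -> Optional[int]:
    """Return the index of the first non-idempotent, current filehandle-changing
    operation that is not immediately followed by GETFH, or None if there is none."""
    for i, op in enumerate(ops):
        if op in NON_IDEMPOTENT_FH_CHANGING:
            next_op = ops[i + 1] if i + 1 < len(ops) else None
            if next_op != "GETFH":
                return i
    return None


safe = first_unsafe_index(["SEQUENCE", "PUTFH", "OPEN", "GETFH", "GETATTR"])
unsafe = first_unsafe_index(["SEQUENCE", "PUTFH", "OPEN", "GETATTR"])
```

In the second request, the OPEN at index 2 is the operation on which a server MAY return NFS4ERR_UNSAFE_COMPOUND.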
2.10.6.5. Persistence | 2.10.6.5. Persistence | |||
Since the reply cache is bounded, it is practical for the reply cache | Since the reply cache is bounded, it is practical for the reply cache | |||
to persist across server restarts. The replier MUST persist the | to persist across server restarts. The replier MUST persist the | |||
following information if it agreed to persist the session (when the | following information if it agreed to persist the session (when the | |||
session was created; see Section 18.36): | session was created; see Section 18.36): | |||
o The session ID. | o The session ID. | |||
o The slot table including the sequence ID and cached reply for each | o The slot table including the sequence ID and cached reply for each | |||
slot. | slot. | |||
The above are sufficient for a replier to provide EOS semantics for | The above are sufficient for a replier to provide EOS semantics for | |||
any requests that were sent and executed before the server restarted. | any requests that were sent and executed before the server restarted. | |||
If the replier is a client then there is no need for it to persist | If the replier is a client, then there is no need for it to persist | |||
any more information, unless the client will be persisting all other | any more information, unless the client will be persisting all other | |||
state across client restart. In which case, the server will never | state across client restart, in which case, the server will never see | |||
see any NFSv4.1-level protocol manifestation of a client restart. If | any NFSv4.1-level protocol manifestation of a client restart. If the | |||
the replier is a server, with just the slot table and session ID | replier is a server, with just the slot table and session ID | |||
persisting, any requests the client retries after the server restart | persisting, any requests the client retries after the server restart | |||
will return the results that are cached in reply cache. and any new | will return the results that are cached in the reply cache, and any | |||
requests (i.e. the sequence ID is one (1) greater than the slot's | new requests (i.e., the sequence ID is one greater than the slot's | |||
sequence ID) MUST be rejected with NFS4ERR_DEADSESSION (returned by | sequence ID) MUST be rejected with NFS4ERR_DEADSESSION (returned by | |||
SEQUENCE). Such a session is considered dead. A server MAY re- | SEQUENCE). Such a session is considered dead. A server MAY re- | |||
animate a session after a server restart so that the session will | animate a session after a server restart so that the session will | |||
accept new requests as well as retries. To re-animate a session the | accept new requests as well as retries. To re-animate a session, the | |||
server needs to persist additional information through server | server needs to persist additional information through server | |||
restart: | restart: | |||
o The client ID. This is a prerequisite to let the client to create | o The client ID. This is a prerequisite to let the client create | |||
more sessions associated with the same client ID as the | more sessions associated with the same client ID as the re- | |||
animated session. | ||||
o The client ID's sequence ID that is used for creating sessions | o The client ID's sequence ID that is used for creating sessions | |||
(see Section 18.35 and Section 18.36). This is a prerequisite to | (see Sections 18.35 and 18.36). This is a prerequisite to let the | |||
let the client create more sessions. | client create more sessions. | |||
o The principal that created the client ID. This allows the server | o The principal that created the client ID. This allows the server | |||
to authenticate the client when it sends EXCHANGE_ID. | to authenticate the client when it sends EXCHANGE_ID. | |||
o The SSV, if SP4_SSV state protection was specified when the client | o The SSV, if SP4_SSV state protection was specified when the client | |||
ID was created (see Section 18.35). This lets the client create | ID was created (see Section 18.35). This lets the client create | |||
new sessions, and associate connections with the new and existing | new sessions, and associate connections with the new and existing | |||
sessions. | sessions. | |||
o The properties of the client ID as defined in Section 18.35. | o The properties of the client ID as defined in Section 18.35. | |||
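For a server that persists only the session ID and the slot table (that is, without the additional re-animation state listed above), the post-restart handling of a SEQUENCE operation can be sketched as below. PersistedSlot and handle_after_restart are hypothetical names, the "REPLAY" tag is an invention of this sketch, and sequence IDs other than a retry or a new request are deliberately left out of scope.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class PersistedSlot:
    """Hypothetical persisted slot entry: last sequence ID and its cached reply."""
    sequence_id: int
    cached_reply: bytes


def handle_after_restart(slots: Dict[int, PersistedSlot], slot_id: int,
                         sequence_id: int) -> Tuple[str, Optional[bytes]]:
    """Classify a request arriving on a dead (non-re-animated) session."""
    slot = slots[slot_id]
    if sequence_id == slot.sequence_id:
        # A retry: return the reply that was cached before the restart.
        return ("REPLAY", slot.cached_reply)
    if sequence_id == slot.sequence_id + 1:
        # A new request: the session is dead, so it is rejected.
        return ("NFS4ERR_DEADSESSION", None)
    # Other sequence IDs are outside what this sketch models.
    raise ValueError("sequence ID not covered by this sketch")


slots = {0: PersistedSlot(sequence_id=41, cached_reply=b"cached-reply-41")}
retry_result = handle_after_restart(slots, 0, 41)
new_result = handle_after_restart(slots, 0, 42)
```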
A persistent reply cache places certain demands on the server. The | A persistent reply cache places certain demands on the server. The | |||
execution of the sequence of operations (starting with SEQUENCE) and | execution of the sequence of operations (starting with SEQUENCE) and | |||
placement of its results in the persistent cache MUST be atomic. If | placement of its results in the persistent cache MUST be atomic. If | |||
a client retries an sequence of operations that was previously | a client retries a sequence of operations that was previously | |||
executed on the server the only acceptable outcomes are either the | executed on the server, the only acceptable outcomes are either the | |||
original cached reply or an indication that client ID or session has | original cached reply or an indication that the client ID or session | |||
been lost (indicating a catastrophic loss of the reply cache or a | has been lost (indicating a catastrophic loss of the reply cache or a | |||
session that has been deleted because the client failed to use the | session that has been deleted because the client failed to use the | |||
session for an extended period of time). | session for an extended period of time). | |||
A server could fail and restart in the middle of a COMPOUND procedure | A server could fail and restart in the middle of a COMPOUND procedure | |||
that contains one or more non-idempotent or idempotent-but-modifying | that contains one or more non-idempotent or idempotent-but-modifying | |||
operations. This creates an even greater challenge for atomic | operations. This creates an even greater challenge for atomic | |||
execution and placement of results in the reply cache. One way to | execution and placement of results in the reply cache. One way to | |||
view the problem is as a single transaction consisting of each | view the problem is as a single transaction consisting of each | |||
operation in the COMPOUND followed by storing the result in | operation in the COMPOUND followed by storing the result in | |||
persistent storage, then finally a transaction commit. If there is a | persistent storage, then finally a transaction commit. If there is a | |||
failure before the transaction is committed, then the server rolls | failure before the transaction is committed, then the server rolls | |||
back the transaction. If server itself fails, then when it restarts, | back the transaction. If the server itself fails, then when it | |||
its recovery logic could roll back the transaction before starting | restarts, its recovery logic could roll back the transaction before | |||
the NFSv4.1 server. | starting the NFSv4.1 server. | |||
While the description of the implementation for atomic execution of | While the description of the implementation for atomic execution of | |||
the request and caching of the reply is beyond the scope of this | the request and caching of the reply is beyond the scope of this | |||
document, an example implementation for NFSv2 [38] is described in | document, an example implementation for NFSv2 [38] is described in | |||
[39]. | [39]. | |||
2.10.7. RDMA Considerations | 2.10.7. RDMA Considerations | |||
A complete discussion of the operation of RPC-based protocols over | A complete discussion of the operation of RPC-based protocols over | |||
RDMA transports is in [8]. A discussion of the operation of NFSv4, | RDMA transports is in [8]. A discussion of the operation of NFSv4, | |||
including NFSv4.1, over RDMA is in [9]. Where RDMA is considered, | including NFSv4.1, over RDMA is in [9]. Where RDMA is considered, | |||
this specification assumes the use of such a layering; it addresses | this specification assumes the use of such a layering; it addresses | |||
only the upper layer issues relevant to making best use of RPC/RDMA. | only the upper-layer issues relevant to making best use of RPC/RDMA. | |||
2.10.7.1. RDMA Connection Resources | 2.10.7.1. RDMA Connection Resources | |||
RDMA requires its consumers to register memory and post buffers of a | RDMA requires its consumers to register memory and post buffers of a | |||
specific size and number for receive operations. | specific size and number for receive operations. | |||
Registration of memory can be a relatively high-overhead operation, | Registration of memory can be a relatively high-overhead operation, | |||
since it requires pinning of buffers, assignment of attributes (e.g. | since it requires pinning of buffers, assignment of attributes (e.g., | |||
readable/writable), and initialization of hardware translation. | readable/writable), and initialization of hardware translation. | |||
Preregistration is desirable to reduce overhead. These registrations | Preregistration is desirable to reduce overhead. These registrations | |||
are specific to hardware interfaces and even to RDMA connection | are specific to hardware interfaces and even to RDMA connection | |||
endpoints, therefore negotiation of their limits is desirable to | endpoints; therefore, negotiation of their limits is desirable to | |||
manage resources effectively. | manage resources effectively. | |||
Following basic registration, these buffers must be posted by the RPC | Following basic registration, these buffers must be posted by the RPC | |||
layer to handle receives. These buffers remain in use by the RPC/ | layer to handle receives. These buffers remain in use by the RPC/ | |||
NFSv4.1 implementation; the size and number of them must be known to | NFSv4.1 implementation; the size and number of them must be known to | |||
the remote peer in order to avoid RDMA errors which would cause a | the remote peer in order to avoid RDMA errors that would cause a | |||
fatal error on the RDMA connection. | fatal error on the RDMA connection. | |||
NFSv4.1 manages slots as resources on a per session basis (see | NFSv4.1 manages slots as resources on a per-session basis (see | |||
Section 2.10), while RDMA connections manage credits on a per | Section 2.10), while RDMA connections manage credits on a per- | |||
connection basis. This means that in order for a peer to send data | connection basis. This means that in order for a peer to send data | |||
over RDMA to a remote buffer, it has to have both an NFSv4.1 slot, | over RDMA to a remote buffer, it has to have both an NFSv4.1 slot and | |||
and an RDMA credit. If multiple RDMA connections are associated with | an RDMA credit. If multiple RDMA connections are associated with a | |||
a session, then if the total number of credits across all RDMA | session, then if the total number of credits across all RDMA | |||
connections associated with the session is X, and the number slots in | connections associated with the session is X, and the number of slots | |||
the session is Y, then the maximum number of outstanding requests is | in the session is Y, then the maximum number of outstanding requests | |||
lesser of X and Y. | is the lesser of X and Y. | |||
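The bound stated above amounts to a single min() over the two resources. A minimal sketch, in which the function name and the per-connection credit list are hypothetical:

```python
from typing import Sequence


def max_outstanding_requests(credits_per_connection: Sequence[int],
                             session_slots: int) -> int:
    """Return the lesser of X (total RDMA credits across all connections
    associated with the session) and Y (the number of slots in the session)."""
    total_credits = sum(credits_per_connection)
    return min(total_credits, session_slots)


# Two connections with 8 and 4 credits (X = 12) and a 16-slot session (Y = 16).
credit_limited = max_outstanding_requests([8, 4], 16)

# The same connections with an 8-slot session: the slots are now the limit.
slot_limited = max_outstanding_requests([8, 4], 8)
```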
2.10.7.2. Flow Control | 2.10.7.2. Flow Control | |||
Previous versions of NFS do not provide flow control; instead they | Previous versions of NFS do not provide flow control; instead, they | |||
rely on the windowing provided by transports like TCP to throttle | rely on the windowing provided by transports like TCP to throttle | |||
requests. This does not work with RDMA, which provides no operation | requests. This does not work with RDMA, which provides no operation | |||
flow control and will terminate a connection in error when limits are | flow control and will terminate a connection in error when limits are | |||
exceeded. Limits such as maximum number of requests outstanding are | exceeded. Limits such as maximum number of requests outstanding are | |||
therefore negotiated when a session is created (see the | therefore negotiated when a session is created (see the | |||
ca_maxrequests field in Section 18.36). These limits then provide | ca_maxrequests field in Section 18.36). These limits then provide | |||
the maxima which each connection associated with the session's | the maxima within which each connection associated with the session's | |||
channel(s) must remain within. RDMA connections are managed within | channel(s) must remain. RDMA connections are managed within these | |||
these limits as described in section 3.3 ("Flow Control"[[Comment.2: | limits as described in Section 3.3 of [8]; if there are multiple RDMA | |||
RFC Editor: please verify section and title of the RPCRDMA document | connections, then the maximum number of requests for a channel will | |||
which is currently at | be divided among the RDMA connections. Put a different way, the onus | |||
http://tools.ietf.org/html/draft-ietf-nfsv4-rpcrdma-08#section-3.3]]) | is on the replier to ensure that the total number of RDMA credits | |||
of [8]; if there are multiple RDMA connections, then the maximum | across all connections associated with the replier's channel does | |||
number of requests for a channel will be divided among the RDMA | not exceed the channel's maximum number of outstanding requests. | |||
connections. Put a different way, the onus is on the replier to | ||||
ensure that total number of RDMA credits across all connections | ||||
associated with the replier's channel does not exceed the channel's | ||||
maximum number of outstanding requests. | ||||
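One way a replier might divide a channel's maximum number of requests among several RDMA connections is an even split with the remainder spread over the first few connections. This is only one possible policy, assumed here for illustration; the function name divide_credits is hypothetical.

```python
from typing import List


def divide_credits(ca_maxrequests: int, connection_count: int) -> List[int]:
    """Split ca_maxrequests into per-connection RDMA credit grants whose
    sum equals ca_maxrequests (an even-split policy, assumed for illustration)."""
    if connection_count <= 0:
        raise ValueError("at least one connection is required")
    base, extra = divmod(ca_maxrequests, connection_count)
    # The first `extra` connections each receive one additional credit.
    return [base + 1 if i < extra else base for i in range(connection_count)]


grants = divide_credits(16, 3)
```

Because the grants sum to ca_maxrequests, the credits across all connections stay within the channel's limit regardless of how many connections are associated with the channel.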
The limits may also be modified dynamically at the replier's choosing | The limits may also be modified dynamically at the replier's choosing | |||
by manipulating certain parameters present in each NFSv4.1 reply. In | by manipulating certain parameters present in each NFSv4.1 reply. In | |||
addition, the CB_RECALL_SLOT callback operation (see Section 20.8) | addition, the CB_RECALL_SLOT callback operation (see Section 20.8) | |||
can be sent by a server to a client to return RDMA credits to the | can be sent by a server to a client to return RDMA credits to the | |||
server, thereby lowering the maximum number of requests a client can | server, thereby lowering the maximum number of requests a client can | |||
have outstanding to the server. | have outstanding to the server. | |||
2.10.7.3. Padding | 2.10.7.3. Padding | |||
Header padding is requested by each peer at session initiation (see | Header padding is requested by each peer at session initiation (see | |||
the ca_headerpadsize argument to CREATE_SESSION in Section 18.36), | the ca_headerpadsize argument to CREATE_SESSION in Section 18.36), | |||
and subsequently used by the RPC RDMA layer, as described in [8]. | and subsequently used by the RPC RDMA layer, as described in [8]. | |||
Zero padding is permitted. | Zero padding is permitted. | |||
Padding leverages the useful property that RDMA preserves alignment of | Padding leverages the useful property that RDMA preserves alignment of | |||
data, even when they are placed into anonymous (untagged) buffers. | data, even when they are placed into anonymous (untagged) buffers. | |||
If requested, client inline writes will insert appropriate pad bytes | If requested, client inline writes will insert appropriate pad bytes | |||
within the request header to align the data payload on the specified | within the request header to align the data payload on the specified | |||
boundary. The client is encouraged to add sufficient padding (up to | boundary. The client is encouraged to add sufficient padding (up to | |||
the negotiated size) so that the "data" field of the NFSv4.1 WRITE | the negotiated size) so that the "data" field of the WRITE operation | |||
operation is aligned. Most servers can make good use of such | is aligned. Most servers can make good use of such padding, which | |||
padding, which allows them to chain receive buffers in such a way | allows them to chain receive buffers in such a way that any data | |||
that any data carried by client requests will be placed into | carried by client requests will be placed into appropriate buffers at | |||
appropriate buffers at the server, ready for file system processing. | the server, ready for file system processing. The receiver's RPC | |||
The receiver's RPC layer encounters no overhead from skipping over | layer encounters no overhead from skipping over pad bytes, and the | |||
pad bytes, and the RDMA layer's high performance makes the insertion | RDMA layer's high performance makes the insertion and transmission of | |||
and transmission of padding on the sender a significant optimization. | padding on the sender a significant optimization. In this way, the | |||
In this way, the need for servers to perform RDMA Read to satisfy all | need for servers to perform RDMA Read to satisfy all but the largest | |||
but the largest client writes is obviated. An added benefit is the | client writes is obviated. An added benefit is the reduction of | |||
reduction of message round trips on the network - a potentially good | message round trips on the network -- a potentially good trade, where | |||
trade, where latency is present. | latency is present. | |||
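The amount of padding a sender inserts can be computed from the header length and the desired alignment boundary. In the sketch below, the function name, the explicit alignment parameter, and the choice to send no padding when the required amount exceeds the negotiated ca_headerpadsize are all assumptions made for illustration.

```python
def pad_bytes_for_alignment(header_len: int, alignment: int,
                            ca_headerpadsize: int) -> int:
    """Return the number of pad bytes that places the data payload on the
    next multiple of `alignment`, or 0 if that would exceed the negotiated
    maximum (a fallback policy assumed by this sketch)."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    needed = (-header_len) % alignment
    return needed if needed <= ca_headerpadsize else 0


# A 140-byte header with a 64-byte boundary and up to 128 bytes of padding.
pad = pad_bytes_for_alignment(140, 64, 128)
payload_offset = 140 + pad
```

Here the payload begins at offset 192, a multiple of 64, so the receiver can place the data directly into aligned buffers.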
The value to choose for padding is subject to a number of criteria. | The value to choose for padding is subject to a number of criteria. | |||
A primary source of variable-length data in the RPC header is the | A primary source of variable-length data in the RPC header is the | |||
authentication information, the form of which is client-determined, | authentication information, the form of which is client-determined, | |||
possibly in response to server specification. The contents of | possibly in response to server specification. The contents of | |||
COMPOUNDs, sizes of strings such as those passed to RENAME, etc. all | COMPOUNDs, sizes of strings such as those passed to RENAME, etc. all | |||
go into the determination of a maximal NFSv4.1 request size and | go into the determination of a maximal NFSv4.1 request size and | |||
therefore minimal buffer size. The client must select its offered | therefore minimal buffer size. The client must select its offered | |||
value carefully, so as not to overburden the server, and vice-versa. | value carefully, so as to avoid overburdening the server, and vice | |||
The benefit of an appropriate padding value is higher performance. | versa. The benefit of an appropriate padding value is higher | |||
[[Comment.3: RFC editor please keep this diagram on one page.]] | performance. | |||
Sender gather: | Sender gather: | |||
|RPC Request|Pad bytes|Length| -> |User data...| | |RPC Request|Pad bytes|Length| -> |User data...| | |||
\------+----------------------/ \ | \------+----------------------/ \ | |||
\ \ | \ \ | |||
\ Receiver scatter: \-----------+- ... | \ Receiver scatter: \-----------+- ... | |||
/-----+----------------\ \ \ | /-----+----------------\ \ \ | |||
|RPC Request|Pad|Length| -> |FS buffer|->|FS buffer|->... | |RPC Request|Pad|Length| -> |FS buffer|->|FS buffer|->... | |||
In the above case, the server may recycle unused buffers to the next | In the above case, the server may recycle unused buffers to the next | |||
posted receive if unused by the actual received request, or may pass | posted receive if unused by the actual received request, or may pass | |||
the now-complete buffers by reference for normal write processing. | the now-complete buffers by reference for normal write processing. | |||
For a server which can make use of it, this removes any need for data | For a server that can make use of it, this removes any need for data | |||
copies of incoming data, without resorting to complicated end-to-end | copies of incoming data, without resorting to complicated end-to-end | |||
buffer advertisement and management. This includes most kernel-based | buffer advertisement and management. This includes most kernel-based | |||
and integrated server designs, among many others. The client may | and integrated server designs, among many others. The client may | |||
perform similar optimizations, if desired. | perform similar optimizations, if desired. | |||
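As a rough illustration of the sender-side gather shown in the
diagram, the number of pad bytes can be pictured as whatever brings
the header portion (RPC request plus the data length word) up to a
receive-buffer boundary, so that the user data starts in a fresh
buffer. The following is a minimal sketch under that assumption; the
function name, the 140-byte header, and the 512-byte buffer size are
hypothetical and are not taken from the protocol.

```python
def pad_length(header_len: int, alignment: int) -> int:
    """Return the pad bytes needed so header_len + pad is a multiple of alignment.

    header_len: bytes occupied by the RPC request and the data length word
                (illustrative; the real value depends on credentials,
                COMPOUND contents, string sizes, and so on).
    alignment:  size of the receiver's posted buffers (an assumption here).
    """
    if header_len < 0:
        raise ValueError("header_len must be non-negative")
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    # (-header_len) mod alignment is the distance to the next boundary,
    # and is 0 when the header already ends on a boundary.
    return (-header_len) % alignment


# Hypothetical example: a 140-byte header and 512-byte receive buffers.
example_pad = pad_length(140, 512)
```

With padding chosen this way, the header fills exactly one buffer and
the write payload begins at the start of the next one.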
2.10.7.4. Dual RDMA and Non-RDMA Transports | 2.10.7.4. Dual RDMA and Non-RDMA Transports | |||
Some RDMA transports (e.g., RFC5040 [10]), permit a "streaming" (non- | Some RDMA transports (e.g., RFC 5040 [10]) permit a "streaming" (non- | |||
RDMA) phase, where ordinary traffic might flow before "stepping up" | RDMA) phase, where ordinary traffic might flow before "stepping up" | |||
to RDMA mode, commencing RDMA traffic. Some RDMA transports start | to RDMA mode, commencing RDMA traffic. Some RDMA transports start | |||
connections always in RDMA mode. NFSv4.1 allows, but does not | connections always in RDMA mode. NFSv4.1 allows, but does not | |||
assume, a streaming phase before RDMA mode. When a connection is | assume, a streaming phase before RDMA mode. When a connection is | |||
associated with a session, the client and server negotiate whether | associated with a session, the client and server negotiate whether | |||
the connection is used in RDMA or non-RDMA mode (see Section 18.36 | the connection is used in RDMA or non-RDMA mode (see Sections 18.36 | |||
and Section 18.34). | and 18.34). | |||
2.10.8. Sessions Security | 2.10.8. Session Security | |||
2.10.8.1. Session Callback Security | 2.10.8.1. Session Callback Security | |||
Via session / connection association, NFSv4.1 improves security over | Via session / connection association, NFSv4.1 improves security over | |||
that provided by NFSv4.0 for the backchannel. The connection is | that provided by NFSv4.0 for the backchannel. The connection is | |||
client-initiated (see Section 18.34), and subject to the same | client-initiated (see Section 18.34) and subject to the same firewall | |||
firewall and routing checks as the fore channel. At the client's | and routing checks as the fore channel. At the client's option (see | |||
option (see Section 18.35), connection association is fully | Section 18.35), connection association is fully authenticated before | |||
authenticated before being activated (see Section 18.34). Traffic | being activated (see Section 18.34). Traffic from the server over | |||
from the server over the backchannel is authenticated exactly as the | the backchannel is authenticated exactly as the client specifies (see | |||
client specifies (see Section 2.10.8.2). | Section 2.10.8.2). | |||
2.10.8.2. Backchannel RPC Security | 2.10.8.2. Backchannel RPC Security | |||
When the NFSv4.1 client establishes the backchannel, it informs the | When the NFSv4.1 client establishes the backchannel, it informs the | |||
server of the security flavors and principals to use when sending | server of the security flavors and principals to use when sending | |||
requests. If the security flavor is RPCSEC_GSS, the client expresses | requests. If the security flavor is RPCSEC_GSS, the client expresses | |||
the principal in the form of an established RPCSEC_GSS context. The | the principal in the form of an established RPCSEC_GSS context. The | |||
server is free to use any of the flavor/principal combinations the | server is free to use any of the flavor/principal combinations the | |||
client offers, but it MUST NOT use unoffered combinations. This way, | client offers, but it MUST NOT use unoffered combinations. This way, | |||
the client need not provide a target GSS principal for the | the client need not provide a target GSS principal for the | |||
backchannel as it did with NFSv4.0, nor the server have to implement | backchannel as it did with NFSv4.0, nor does the server have to | |||
an RPCSEC_GSS initiator as it did with NFSv4.0 [30]. | implement an RPCSEC_GSS initiator as it did with NFSv4.0 [30]. | |||
The CREATE_SESSION (Section 18.36) and BACKCHANNEL_CTL | The CREATE_SESSION (Section 18.36) and BACKCHANNEL_CTL | |||
(Section 18.33) operations allow the client to specify flavor/ | (Section 18.33) operations allow the client to specify flavor/ | |||
principal combinations. | principal combinations. | |||
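The rule that the server may pick any offered flavor/principal
combination, but MUST NOT use an unoffered one, can be sketched as
follows. This is an illustrative model only: the CallbackSec type, the
string flavor names, and the notion of a server preference list are
assumptions made for the example and are not protocol data structures.

```python
from typing import NamedTuple, Optional, Sequence


class CallbackSec(NamedTuple):
    flavor: str     # e.g. "AUTH_SYS" or "RPCSEC_GSS" (illustrative strings)
    principal: str  # for RPCSEC_GSS, stands in for an established context


def choose_callback_sec(
    offered: Sequence[CallbackSec],
    server_preference: Sequence[str],
) -> Optional[CallbackSec]:
    """Pick the first offered combination whose flavor the server prefers.

    Only entries of `offered` are ever returned, so an unoffered
    combination can never be selected.
    """
    for flavor in server_preference:
        for combo in offered:
            if combo.flavor == flavor:
                return combo
    return None  # nothing acceptable was offered


# Hypothetical offer made by the client when establishing the backchannel.
offered = [
    CallbackSec("AUTH_SYS", "uid=0"),
    CallbackSec("RPCSEC_GSS", "gss-context-17"),
]
chosen = choose_callback_sec(offered, ["RPCSEC_GSS", "AUTH_SYS"])
```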
Also note that the SP4_SSV state protection mode (see Section 18.35 | Also note that the SP4_SSV state protection mode (see Sections 18.35 | |||
and Section 2.10.8.3) has the side benefit of providing SSV-derived | and 2.10.8.3) has the side benefit of providing SSV-derived | |||
RPCSEC_GSS contexts (Section 2.10.9). | RPCSEC_GSS contexts (Section 2.10.9). | |||
2.10.8.3. Protection from Unauthorized State Changes | 2.10.8.3. Protection from Unauthorized State Changes | |||
As described to this point in the specification, the state model of | As described to this point in the specification, the state model of | |||
NFSv4.1 is vulnerable to an attacker that sends a SEQUENCE operation | NFSv4.1 is vulnerable to an attacker that sends a SEQUENCE operation | |||
with a forged session ID and with a slot ID that it expects the | with a forged session ID and with a slot ID that it expects the | |||
legitimate client to use next. When the legitimate client uses the | legitimate client to use next. When the legitimate client uses the | |||
slot ID with the same sequence number, the server returns the | slot ID with the same sequence number, the server returns the | |||
attacker's result from the reply cache which disrupts the legitimate | attacker's result from the reply cache, which disrupts the legitimate | |||
client and thus denies service to it. Similarly an attacker could | client and thus denies service to it. Similarly, an attacker could | |||
send a CREATE_SESSION with a forged client ID to create a new session | send a CREATE_SESSION with a forged client ID to create a new session | |||
associated with the client ID. The attacker could send requests | associated with the client ID. The attacker could send requests | |||
using the new session that change locking state, such as LOCKU | using the new session that change locking state, such as LOCKU | |||
operations to release locks the legitimate client has acquired. | operations to release locks the legitimate client has acquired. | |||
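The forged-SEQUENCE attack can be pictured with a toy model of a
session's slot table and reply cache. The sketch below is not a server
implementation: the class, the string replies, and the absence of any
authentication are simplifying assumptions, chosen to show why an
unprotected session hands the attacker's cached reply to the
legitimate client.

```python
class ToySession:
    """Toy model of one session's slots and reply cache (no authentication)."""

    def __init__(self, session_id: bytes, nslots: int = 1):
        self.session_id = session_id
        # Per slot: [last sequence ID seen, cached reply for that sequence ID]
        self.slots = [[0, None] for _ in range(nslots)]

    def sequence(self, session_id: bytes, slot_id: int, seqid: int,
                 request: str) -> str:
        if session_id != self.session_id:
            return "NFS4ERR_BADSESSION"
        if not 0 <= slot_id < len(self.slots):
            return "NFS4ERR_BADSLOT"
        slot = self.slots[slot_id]
        if seqid == slot[0] + 1:
            # New request: execute it and cache the reply in the slot.
            slot[0] = seqid
            slot[1] = f"executed: {request}"
            return slot[1]
        if seqid == slot[0] and slot[1] is not None:
            # Same slot and sequence ID: treated as a retry, so the cached
            # reply is returned without looking at the request.
            return slot[1]
        return "NFS4ERR_SEQ_MISORDERED"


server = ToySession(b"session-1")
# The attacker forges the session ID and guesses the next slot/sequence pair.
attacker_reply = server.sequence(b"session-1", 0, 1, "attacker request")
# The legitimate client then uses the same slot ID and sequence ID.
client_reply = server.sequence(b"session-1", 0, 1, "client request")
```

In this model the client's request is never executed; it receives the
reply cached for the attacker's request, which is the disruption the
text describes.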
Setting a security policy on the file that requires RPCSEC_GSS | ||||
Setting a security policy on the file which requires RPCSEC_GSS | ||||
credentials when manipulating the file's state is one potential work | credentials when manipulating the file's state is one potential work | |||
around, but has the disadvantage of preventing a legitimate client | around, but has the disadvantage of preventing a legitimate client | |||
from releasing state when RPCSEC_GSS is required to do so, but a GSS | from releasing state when RPCSEC_GSS is required to do so, but a GSS | |||
context cannot be obtained (possibly because the user has logged off | context cannot be obtained (possibly because the user has logged off | |||
the client). | the client). | |||
NFSv4.1 provides three options to a client for state protection which | NFSv4.1 provides three options to a client for state protection, | |||
are specified when a client creates a client ID via EXCHANGE_ID | which are specified when a client creates a client ID via EXCHANGE_ID | |||
(Section 18.35). | (Section 18.35). | |||
The first (SP4_NONE) is to simply waive state protection. | The first (SP4_NONE) is to simply waive state protection. | |||
The other two options (SP4_MACH_CRED and SP4_SSV) share several | The other two options (SP4_MACH_CRED and SP4_SSV) share several | |||
traits: | traits: | |||
o An RPCSEC_GSS-based credential is used to authenticate client ID | o An RPCSEC_GSS-based credential is used to authenticate client ID | |||
and session maintenance operations, including creating and | and session maintenance operations, including creating and | |||
destroying a session, associating a connection with the session, | destroying a session, associating a connection with the session, | |||
skipping to change at page 71, line 47 | skipping to change at page 70, line 47 | |||
might have to be the same as the one that acquired the state). | might have to be the same as the one that acquired the state). | |||
However, the client might not have an RPCSEC_GSS context for such | However, the client might not have an RPCSEC_GSS context for such | |||
a principal, and might not be able to create such a context | a principal, and might not be able to create such a context | |||
(perhaps because the user has logged off). When the client | (perhaps because the user has logged off). When the client | |||
establishes SP4_MACH_CRED or SP4_SSV protection, it can specify a | establishes SP4_MACH_CRED or SP4_SSV protection, it can specify a | |||
list of operations that the server MUST allow using the machine | list of operations that the server MUST allow using the machine | |||
credential (if SP4_MACH_CRED is used) or the SSV credential (if | credential (if SP4_MACH_CRED is used) or the SSV credential (if | |||
SP4_SSV is used). | SP4_SSV is used). | |||
The SP4_MACH_CRED state protection option uses a machine credential | The SP4_MACH_CRED state protection option uses a machine credential | |||
where the principal that creates the client ID, MUST also be the | where the principal that creates the client ID MUST also be the | |||
principal that performs client ID and session maintenance operations. | principal that performs client ID and session maintenance operations. | |||
The security of the machine credential state protection approach | The security of the machine credential state protection approach | |||
depends entirely on safeguarding the per-machine credential. | depends entirely on safeguarding the per-machine credential. | |||
Assuming a proper safe guard, using the per-machine credential for | Assuming a proper safeguard using the per-machine credential for | |||
operations like CREATE_SESSION, BIND_CONN_TO_SESSION, | operations like CREATE_SESSION, BIND_CONN_TO_SESSION, | |||
DESTROY_SESSION, and DESTROY_CLIENTID will prevent an attacker from | DESTROY_SESSION, and DESTROY_CLIENTID will prevent an attacker from | |||
associating a rogue connection with a session, or associating a rogue | associating a rogue connection with a session, or associating a rogue | |||
session with a client ID. | session with a client ID. | |||
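The SP4_MACH_CRED rule amounts to a principal comparison on the
maintenance operations named above. The following is a minimal sketch
of that check, assuming principals can be compared as strings; the
function name and the handling of other operations are illustrative
and do not model the rest of the server's access checks.

```python
# Operations named in the text as client ID and session maintenance.
MAINTENANCE_OPS = frozenset({
    "CREATE_SESSION",
    "BIND_CONN_TO_SESSION",
    "DESTROY_SESSION",
    "DESTROY_CLIENTID",
})


def mach_cred_allows(op: str, request_principal: str,
                     clientid_principal: str) -> bool:
    """Return False if a maintenance op comes from the wrong principal.

    clientid_principal is the principal that created the client ID.
    Operations outside MAINTENANCE_OPS are not judged by this sketch,
    so it simply returns True for them.
    """
    if op in MAINTENANCE_OPS:
        return request_principal == clientid_principal
    return True


# Hypothetical Kerberos-style principal names.
owner = "host/client.example.com@EXAMPLE.COM"
rogue = "eve@EXAMPLE.COM"
```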
There are at least three scenarios for the SP4_MACH_CRED option: | There are at least three scenarios for the SP4_MACH_CRED option: | |||
1. That the system administrator configures a unique, permanent per- | 1. The system administrator configures a unique, permanent per- | |||
machine credential for one of the mandated GSS mechanisms (e.g., | machine credential for one of the mandated GSS mechanisms (e.g., | |||
if Kerberos V5 is used, a "keytab" containing a principal derived | if Kerberos V5 is used, a "keytab" containing a principal derived | |||
from a client host name could be used). | from a client host name could be used). | |||
2. The client is used by a single user, and so the client ID and its | 2. The client is used by a single user, and so the client ID and its | |||
sessions are used by just that user. If the user's credential | sessions are used by just that user. If the user's credential | |||
expires, then session and client ID maintenance cannot occur, but | expires, then session and client ID maintenance cannot occur, but | |||
since the client has a single user, only that user is | since the client has a single user, only that user is | |||
inconvenienced. | inconvenienced. | |||
3. The physical client has multiple users, but the client | 3. The physical client has multiple users, but the client | |||
implementation has a unique client ID for each user. This is | implementation has a unique client ID for each user. This is | |||
effectively the same as the second scenario, but a disadvantage | effectively the same as the second scenario, but a disadvantage | |||
is that each user needs to be allocated at least one session | is that each user needs to be allocated at least one session | |||
each, so the approach suffers from lack of economy. | each, so the approach suffers from lack of economy. | |||
The SP4_SSV protection option uses the SSV (Section 1.5), via | The SP4_SSV protection option uses the SSV (Section 1.6), via | |||
RPCSEC_GSS and the SSV GSS mechanism (Section 2.10.9) to protect | RPCSEC_GSS and the SSV GSS mechanism (Section 2.10.9), to protect | |||
state from attack. The SP4_SSV protection option is intended for the | state from attack. The SP4_SSV protection option is intended for the | |||
situation comprised of a client that has multiple active users, and a | situation comprised of a client that has multiple active users and a | |||
system administrator who wants to avoid the burden of installing a | system administrator who wants to avoid the burden of installing a | |||
permanent machine credential on each client. The SSV is established | permanent machine credential on each client. The SSV is established | |||
and updated on the server via SET_SSV (see Section 18.47). To | and updated on the server via SET_SSV (see Section 18.47). To | |||
prevent eavesdropping, a client SHOULD send SET_SSV via RPCSEC_GSS | prevent eavesdropping, a client SHOULD send SET_SSV via RPCSEC_GSS | |||
with the privacy service. Several aspects of the SSV make it | with the privacy service. Several aspects of the SSV make it | |||
intractable for an attacker to guess the SSV, and thus associate | intractable for an attacker to guess the SSV, and thus associate | |||
rogue connections with a session, and rogue sessions with a client | rogue connections with a session, and rogue sessions with a client | |||
ID: | ID: | |||
o The arguments to and results of SET_SSV include digests of the old | o The arguments to and results of SET_SSV include digests of the old | |||
skipping to change at page 73, line 10 | skipping to change at page 72, line 10 | |||
operation or before the second CREATE_SESSION operation on a | operation or before the second CREATE_SESSION operation on a | |||
client ID. If it does not, the SSV mechanism will not generate | client ID. If it does not, the SSV mechanism will not generate | |||
tokens (Section 2.10.9). A client SHOULD send SET_SSV as soon as | tokens (Section 2.10.9). A client SHOULD send SET_SSV as soon as | |||
a session is created. | a session is created. | |||
o A SET_SSV request does not replace the SSV with the argument to | o A SET_SSV request does not replace the SSV with the argument to | |||
SET_SSV. Instead, the current SSV on the server is logically | SET_SSV. Instead, the current SSV on the server is logically | |||
exclusive ORed (XORed) with the argument to SET_SSV. Each time a | exclusive ORed (XORed) with the argument to SET_SSV. Each time a | |||
new principal uses a client ID for the first time, the client | new principal uses a client ID for the first time, the client | |||
SHOULD send a SET_SSV with that principal's RPCSEC_GSS | SHOULD send a SET_SSV with that principal's RPCSEC_GSS | |||
credentials, with the RPCSEC_GSS service set to | credentials, with RPCSEC_GSS service set to RPC_GSS_SVC_PRIVACY. | |||
RPC_GSS_SVC_PRIVACY. | ||||
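The XOR update described in the last item can be sketched in a few
lines. This is an illustration only: the function name is
hypothetical, the 4-byte values are far shorter than a real SSV, and
the sketch assumes the argument has the same length as the current
SSV.

```python
def apply_set_ssv(current_ssv: bytes, argument: bytes) -> bytes:
    """Return the new SSV: the current SSV XORed with the SET_SSV argument."""
    if len(current_ssv) != len(argument):
        raise ValueError("argument must be the same length as the SSV")
    return bytes(a ^ b for a, b in zip(current_ssv, argument))


# The SSV starts out as all zeros, so the first update yields the argument.
ssv = bytes(4)
ssv = apply_set_ssv(ssv, b"\x0f\xf0\xaa\x55")
first_ssv = ssv
# A later update from another principal is folded in rather than replacing it.
ssv = apply_set_ssv(ssv, b"\xff\xff\x00\x00")
```

Because each update is combined with the previous value, knowing only
one argument to SET_SSV is not enough to know the resulting SSV once
another principal has contributed.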
Here are the types of attacks that can be attempted by an attacker | Here are the types of attacks that can be attempted by an attacker | |||
named Eve on a victim named Bob, and how SP4_SSV protection foils | named Eve on a victim named Bob, and how SP4_SSV protection foils | |||
each attack: | each attack: | |||
o Suppose Eve is the first user to log into a legitimate client. | o Suppose Eve is the first user to log into a legitimate client. | |||
Eve's use of an NFSv4.1 file system will cause the legitimate | Eve's use of an NFSv4.1 file system will cause the legitimate | |||
client to create a client ID with SP4_SSV protection, specifying | client to create a client ID with SP4_SSV protection, specifying | |||
that the BIND_CONN_TO_SESSION operation MUST use the SSV | that the BIND_CONN_TO_SESSION operation MUST use the SSV | |||
credential. Eve's use of the file system also causes an SSV to be | credential. Eve's use of the file system also causes an SSV to be | |||
created. The SET_SSV operation that creates the SSV will be | created. The SET_SSV operation that creates the SSV will be | |||
protected by the RPCSEC_GSS context created by the legitimate | protected by the RPCSEC_GSS context created by the legitimate | |||
client which uses Eve's GSS principal and credentials. Eve can | client, which uses Eve's GSS principal and credentials. Eve can | |||
eavesdrop on the network while her RPCSEC_GSS context is created, | eavesdrop on the network while her RPCSEC_GSS context is created | |||
and the SET_SSV using her context is sent. Even if the legitimate | and the SET_SSV using her context is sent. Even if the legitimate | |||
client sends the SET_SSV with RPC_GSS_SVC_PRIVACY, because Eve | client sends the SET_SSV with RPC_GSS_SVC_PRIVACY, because Eve | |||
knows her own credentials, she can decrypt the SSV. Eve can | knows her own credentials, she can decrypt the SSV. Eve can | |||
compute an RPCSEC_GSS credential that BIND_CONN_TO_SESSION will | compute an RPCSEC_GSS credential that BIND_CONN_TO_SESSION will | |||
accept, and so associate a new connection with the legitimate | accept, and so associate a new connection with the legitimate | |||
session. Eve can change the slot ID and sequence state of a | session. Eve can change the slot ID and sequence state of a | |||
legitimate session, and/or the SSV state, in such a way that when | legitimate session, and/or the SSV state, in such a way that when | |||
Bob accesses the server via the same legitimate client, the | Bob accesses the server via the same legitimate client, the | |||
legitimate client will be unable to use the session. | legitimate client will be unable to use the session. | |||
skipping to change at page 73, line 51 | skipping to change at page 72, line 50 | |||
Once the legitimate client establishes an SSV over the new session | Once the legitimate client establishes an SSV over the new session | |||
using Bob's RPCSEC_GSS context, Eve can use the new session via | using Bob's RPCSEC_GSS context, Eve can use the new session via | |||
the legitimate client, but she cannot disrupt Bob. Moreover, | the legitimate client, but she cannot disrupt Bob. Moreover, | |||
because the client SHOULD have modified the SSV due to Eve using | because the client SHOULD have modified the SSV due to Eve using | |||
the new session, Bob cannot get revenge on Eve by associating a | the new session, Bob cannot get revenge on Eve by associating a | |||
rogue connection with the session. | rogue connection with the session. | |||
The question is how did the legitimate client detect that Eve has | The question is how did the legitimate client detect that Eve has | |||
hijacked the old session? When the client detects that a new | hijacked the old session? When the client detects that a new | |||
principal, Bob, wants to use the session, it SHOULD have sent a | principal, Bob, wants to use the session, it SHOULD have sent a | |||
SET_SSV, which leads to following sub-scenarios: | SET_SSV, which leads to the following sub-scenarios: | |||
* Let us suppose that from the rogue connection, Eve sent a | * Let us suppose that from the rogue connection, Eve sent a | |||
SET_SSV with the same slot ID and sequence ID that the | SET_SSV with the same slot ID and sequence ID that the | |||
legitimate client later uses. The server will assume the | legitimate client later uses. The server will assume the | |||
SET_SSV sent with Bob's credentials is a retry, and return to | SET_SSV sent with Bob's credentials is a retry, and return to | |||
the legitimate client the reply it sent Eve. However, unless | the legitimate client the reply it sent Eve. However, unless | |||
Eve can correctly guess the SSV the legitimate client will use, | Eve can correctly guess the SSV the legitimate client will use, | |||
the digest verification checks in the SET_SSV response will | the digest verification checks in the SET_SSV response will | |||
fail. That is an indication to the client that the session has | fail. That is an indication to the client that the session has | |||
apparently been hijacked. | apparently been hijacked. | |||
skipping to change at page 74, line 32 | skipping to change at page 73, line 32 | |||
with the same slot ID and sequence that the legitimate client | with the same slot ID and sequence that the legitimate client | |||
uses for its SET_SSV. The server returns to the legitimate | uses for its SET_SSV. The server returns to the legitimate | |||
client the response it sent Eve. The client sees that the | client the response it sent Eve. The client sees that the | |||
response is not at all what it expects. The client assumes | response is not at all what it expects. The client assumes | |||
either session hijacking or a server bug, and either way | either session hijacking or a server bug, and either way | |||
destroys the old session. | destroys the old session. | |||
o Eve associates a rogue connection with the session as above, and | o Eve associates a rogue connection with the session as above, and | |||
then destroys the session. Again, Bob goes to use the server from | then destroys the session. Again, Bob goes to use the server from | |||
the legitimate client, which sends a SET_SSV using Bob's | the legitimate client, which sends a SET_SSV using Bob's | |||
credentials. The client receives an error that indicates the | credentials. The client receives an error that indicates that the | |||
session does not exist. When the client tries to create a new | session does not exist. When the client tries to create a new | |||
session, this will fail because the SSV it has does not match that | session, this will fail because the SSV it has does not match that | |||
the server has, and now the client knows the session was hijacked. | which the server has, and now the client knows the session was | |||
The legitimate client establishes a new client ID. | hijacked. The legitimate client establishes a new client ID. | |||
o If Eve creates a connection before the legitimate client | o If Eve creates a connection before the legitimate client | |||
establishes an SSV, because the initial value of the SSV is zero | establishes an SSV, because the initial value of the SSV is zero | |||
and therefore known, Eve can send a SET_SSV that will pass the | and therefore known, Eve can send a SET_SSV that will pass the | |||
digest verification check. However because the new connection has | digest verification check. However, because the new connection | |||
not been associated with the session, the SET_SSV is rejected for | has not been associated with the session, the SET_SSV is rejected | |||
that reason. | for that reason. | |||
In summary, an attacker's disruption of state when SP4_SSV protection | In summary, an attacker's disruption of state when SP4_SSV protection | |||
is in use is limited to the formative period of a client ID, its | is in use is limited to the formative period of a client ID, its | |||
first session, and the establishment of the SSV. Once a non- | first session, and the establishment of the SSV. Once a non- | |||
malicious user uses the client ID, the client quickly detects any | malicious user uses the client ID, the client quickly detects any | |||
hijack and rectifies the situation. Once a non-malicious user | hijack and rectifies the situation. Once a non-malicious user | |||
successfully modifies the SSV, the attacker cannot use NFSv4.1 | successfully modifies the SSV, the attacker cannot use NFSv4.1 | |||
operations to disrupt the non-malicious user. | operations to disrupt the non-malicious user. | |||
Note that neither the SP4_MACH_CRED nor SP4_SSV protection approaches | Note that neither the SP4_MACH_CRED nor SP4_SSV protection approaches | |||
prevent hijacking of a transport connection that has previously been | prevent hijacking of a transport connection that has previously been | |||
associated with a session. If the goal of a counter threat strategy | associated with a session. If the goal of a counter-threat strategy | |||
is to prevent connection hijacking, the use of IPsec is RECOMMENDED. | is to prevent connection hijacking, the use of IPsec is RECOMMENDED. | |||
If a connection hijack occurs, the hijacker could in theory change | If a connection hijack occurs, the hijacker could in theory change | |||
locking state and negatively impact the service to legitimate | locking state and negatively impact the service to legitimate | |||
clients. However if the server is configured to require the use of | clients. However, if the server is configured to require the use of | |||
RPCSEC_GSS with integrity or privacy on the affected file objects, | RPCSEC_GSS with integrity or privacy on the affected file objects, | |||
and if EXCHGID4_FLAG_BIND_PRINC_STATEID capability (Section 18.35), | and if EXCHGID4_FLAG_BIND_PRINC_STATEID capability (Section 18.35) is | |||
is in force, this will thwart unauthorized attempts to change locking | in force, this will thwart unauthorized attempts to change locking | |||
state. | state. | |||
2.10.9. The Secret State Verifier (SSV) GSS Mechanism | 2.10.9. The Secret State Verifier (SSV) GSS Mechanism | |||
The SSV provides the secret key for a GSS mechanism internal to | The SSV provides the secret key for a GSS mechanism internal to | |||
NFSv4.1 that NFSv4.1 uses for state protection. Contexts for this | NFSv4.1 that NFSv4.1 uses for state protection. Contexts for this | |||
mechanism are not established via the RPCSEC_GSS protocol. Instead, | mechanism are not established via the RPCSEC_GSS protocol. Instead, | |||
the contexts are automatically created when EXCHANGE_ID specifies | the contexts are automatically created when EXCHANGE_ID specifies | |||
SP4_SSV protection. The only tokens defined are the PerMsgToken | SP4_SSV protection. The only tokens defined are the PerMsgToken | |||
(emitted by GSS_GetMIC) and the SealedMessage token (emitted by | (emitted by GSS_GetMIC) and the SealedMessage token (emitted by | |||
GSS_Wrap). | GSS_Wrap). | |||
The mechanism OID for the SSV mechanism is: | The mechanism OID for the SSV mechanism is | |||
iso.org.dod.internet.private.enterprise.Michael Eisler.nfs.ssv_mech | iso.org.dod.internet.private.enterprise.Michael Eisler.nfs.ssv_mech | |||
(1.3.6.1.4.1.28882.1.1). While the SSV mechanism does not define any | (1.3.6.1.4.1.28882.1.1). While the SSV mechanism does not define any | |||
initial context tokens, the OID can be used to let servers indicate | initial context tokens, the OID can be used to let servers indicate | |||
that the SSV mechanism is acceptable whenever the client sends a | that the SSV mechanism is acceptable whenever the client sends a | |||
SECINFO or SECINFO_NO_NAME operation (see Section 2.6). | SECINFO or SECINFO_NO_NAME operation (see Section 2.6). | |||
The SSV mechanism defines four subkeys derived from the SSV value. | The SSV mechanism defines four subkeys derived from the SSV value. | |||
Each time SET_SSV is invoked the subkeys are recalculated by the | Each time SET_SSV is invoked, the subkeys are recalculated by the | |||
client and server. The calculation of each of the four subkeys | client and server. The calculation of each of the four subkeys | |||
depends on each of the four respective ssv_subkey4 enumerated values. | depends on each of the four respective ssv_subkey4 enumerated values. | |||
The calculation uses the HMAC [11], algorithm, using the current SSV | The calculation uses the HMAC [11] algorithm, using the current SSV | |||
as the key, the one way hash algorithm as negotiated by EXCHANGE_ID, | as the key, the one-way hash algorithm as negotiated by EXCHANGE_ID, | |||
and the input text as represented by the XDR-encoded enumeration | and the input text as represented by the XDR-encoded enumeration | |||
value for that subkey of data type ssv_subkey4. If the length of the | value for that subkey of data type ssv_subkey4. If the length of the | |||
output of the HMAC algorithm exceeds the length of key of encryption | output of the HMAC algorithm exceeds the length of key of the | |||
algorithm (which is also negotiated by EXCHANGE_ID), then the subkey | encryption algorithm (which is also negotiated by EXCHANGE_ID), then | |||
MUST be truncated from the HMAC output, i.e. if the subkey is N | the subkey MUST be truncated from the HMAC output, i.e., if the | |||
bytes long, then the first N bytes of the HMAC output MUST be used | subkey is N bytes long, then the first N bytes of the HMAC output | |||
for the subkey. The specification of EXCHANGE_ID states that the | MUST be used for the subkey. The specification of EXCHANGE_ID states | |||
length of the output of the HMAC algorithm MUST NOT be less than | that the length of the output of the HMAC algorithm MUST NOT be less | |||
length of subkey needed for the encryption algorithm (see | than the length of subkey needed for the encryption algorithm (see | |||
Section 18.35). | Section 18.35). | |||
/* Input for computing subkeys */ | /* Input for computing subkeys */ | |||
enum ssv_subkey4 { | enum ssv_subkey4 { | |||
SSV4_SUBKEY_MIC_I2T = 1, | SSV4_SUBKEY_MIC_I2T = 1, | |||
SSV4_SUBKEY_MIC_T2I = 2, | SSV4_SUBKEY_MIC_T2I = 2, | |||
SSV4_SUBKEY_SEAL_I2T = 3, | SSV4_SUBKEY_SEAL_I2T = 3, | |||
SSV4_SUBKEY_SEAL_T2I = 4 | SSV4_SUBKEY_SEAL_T2I = 4 | |||
}; | }; | |||
The subkey derived from SSV4_SUBKEY_MIC_I2T is used for calculating | The subkey derived from SSV4_SUBKEY_MIC_I2T is used for calculating | |||
message integrity codes (MICs) that originate from the NFSv4.1 | message integrity codes (MICs) that originate from the NFSv4.1 | |||
client, whether as part of a request over the fore channel, or a | client, whether as part of a request over the fore channel or a | |||
response over the backchannel. The subkey derived from | response over the backchannel. The subkey derived from | |||
SSV4_SUBKEY_MIC_T2I is used for MICs originating from the NFSv4.1 | SSV4_SUBKEY_MIC_T2I is used for MICs originating from the NFSv4.1 | |||
server. The subkey derived from SSV4_SUBKEY_SEAL_I2T is used for | server. The subkey derived from SSV4_SUBKEY_SEAL_I2T is used for | |||
encryption text originating from the NFSv4.1 client and the subkey | encryption text originating from the NFSv4.1 client, and the subkey | |||
derived from SSV4_SUBKEY_SEAL_T2I is used for encryption text | derived from SSV4_SUBKEY_SEAL_T2I is used for encryption text | |||
originating from the NFSv4.1 server. | originating from the NFSv4.1 server. | |||
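
The subkey derivation described above can be sketched as follows. This is
an illustrative sketch only, not a normative implementation: the function
name derive_subkey, the default hash (SHA-256), and the key lengths used
in the usage note are assumptions made for the example. In practice the
hash and encryption algorithms are whatever EXCHANGE_ID negotiated.

```python
import hashlib
import hmac
import struct

# Values of enum ssv_subkey4, taken from the XDR definition above.
SSV4_SUBKEY_MIC_I2T = 1
SSV4_SUBKEY_MIC_T2I = 2
SSV4_SUBKEY_SEAL_I2T = 3
SSV4_SUBKEY_SEAL_T2I = 4


def derive_subkey(ssv, subkey_id, key_len, hash_name="sha256"):
    """Derive one subkey: HMAC keyed by the SSV over the XDR-encoded
    ssv_subkey4 value, truncated to the first key_len bytes.

    hash_name stands in for the negotiated one-way hash (assumption).
    """
    # XDR encodes an enum as a four-byte big-endian signed integer.
    input_text = struct.pack(">i", subkey_id)
    digest = hmac.new(ssv, input_text, hash_name).digest()
    # The HMAC output must not be shorter than the subkey that is needed.
    if len(digest) < key_len:
        raise ValueError("HMAC output is shorter than the required subkey")
    return digest[:key_len]
```

For instance, derive_subkey(ssv, SSV4_SUBKEY_SEAL_I2T, 16) returns the
first 16 bytes of the value that derive_subkey(ssv, SSV4_SUBKEY_SEAL_I2T,
32) would return, which is the truncation rule stated in the text.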
The PerMsgToken description is based on an XDR definition: | The PerMsgToken description is based on an XDR definition: | |||
/* Input for computing smt_hmac */ | /* Input for computing smt_hmac */ | |||
struct ssv_mic_plain_tkn4 { | struct ssv_mic_plain_tkn4 { | |||
uint32_t smpt_ssv_seq; | uint32_t smpt_ssv_seq; | |||
opaque smpt_orig_plain<>; | opaque smpt_orig_plain<>; | |||
}; | }; | |||
/* SSV GSS PerMsgToken token */ | /* SSV GSS PerMsgToken token */ | |||
struct ssv_mic_tkn4 { | struct ssv_mic_tkn4 { | |||
uint32_t smt_ssv_seq; | uint32_t smt_ssv_seq; | |||
opaque smt_hmac<>; | opaque smt_hmac<>; | |||
}; | }; | |||
The field smt_hmac is an HMAC calculated by using the subkey derived | The field smt_hmac is an HMAC calculated by using the subkey derived | |||
from SSV4_SUBKEY_MIC_I2T or SSV4_SUBKEY_MIC_T2I as the key, the one | from SSV4_SUBKEY_MIC_I2T or SSV4_SUBKEY_MIC_T2I as the key, the one- | |||
way hash algorithm as negotiated by EXCHANGE_ID, and the input text | way hash algorithm as negotiated by EXCHANGE_ID, and the input text | |||
as represented by data of type ssv_mic_plain_tkn4. The field | as represented by data of type ssv_mic_plain_tkn4. The field | |||
smpt_ssv_seq is the same as smt_ssv_seq. The field smpt_orig_plain | smpt_ssv_seq is the same as smt_ssv_seq. The field smpt_orig_plain | |||
is the "message" input passed to GSS_GetMIC() (see Section 2.3.1 of | is the "message" input passed to GSS_GetMIC() (see Section 2.3.1 of | |||
[7]). The caller of GSS_GetMIC() provides a pointer to a buffer | [7]). The caller of GSS_GetMIC() provides a pointer to a buffer | |||
containing the plain text. The SSV mechanism's entry point for | containing the plain text. The SSV mechanism's entry point for | |||
GSS_GetMIC() encodes this into an opaque array, and the encoding will | GSS_GetMIC() encodes this into an opaque array, and the encoding will | |||
include an initial four byte length, plus any necessary padding. | include an initial four-byte length, plus any necessary padding. | |||
Prepended to this will be the XDR encoded value of smpt_ssv_seq thus | Prepended to this will be the XDR encoded value of smpt_ssv_seq, thus | |||
making up an XDR encoding of a value of data type ssv_mic_plain_tkn4, | making up an XDR encoding of a value of data type ssv_mic_plain_tkn4, | |||
which in turn is the input into the HMAC. | which in turn is the input into the HMAC. | |||
The token emitted by GSS_GetMIC() is XDR encoded and of XDR data type | The token emitted by GSS_GetMIC() is XDR encoded and of XDR data type | |||
ssv_mic_tkn4. The field smt_ssv_seq comes from the SSV sequence | ssv_mic_tkn4. The field smt_ssv_seq comes from the SSV sequence | |||
number which is equal to 1 after SET_SSV (Section 18.47) is called | number, which is equal to one after SET_SSV (Section 18.47) is called | |||
the first time on a client ID. Thereafter, the SSV sequence number | the first time on a client ID. Thereafter, the SSV sequence number | |||
is incremented on each SET_SSV. Thus smt_ssv_seq represents the | is incremented on each SET_SSV. Thus, smt_ssv_seq represents the | |||
version of the SSV at the time GSS_GetMIC() was called. As noted in | version of the SSV at the time GSS_GetMIC() was called. As noted in | |||
Section 18.35, the client and server can maintain multiple concurrent | Section 18.35, the client and server can maintain multiple concurrent | |||
versions of the SSV. This allows the SSV to be changed without | versions of the SSV. This allows the SSV to be changed without | |||
serializing all RPC calls that use the SSV mechanism with SET_SSV | serializing all RPC calls that use the SSV mechanism with SET_SSV | |||
operations. Once the HMAC is calculated, it is XDR encoded into | operations. Once the HMAC is calculated, it is XDR encoded into | |||
smt_hmac, which will include an initial four byte length, and any | smt_hmac, which will include an initial four-byte length, and any | |||
necessary padding. Prepended to this will be the XDR encoded value | necessary padding. Prepended to this will be the XDR encoded value | |||
of smt_ssv_seq. | of smt_ssv_seq. | |||
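
The construction of the GSS_GetMIC() token described above can be sketched
as follows. This is a hedged illustration under stated assumptions: the
helper names xdr_opaque and ssv_get_mic are hypothetical, SHA-256 stands
in for the negotiated hash, and the MIC subkey is assumed to have been
derived already.

```python
import hashlib
import hmac
import struct


def xdr_opaque(data):
    """XDR variable-length opaque: four-byte length, data, zero padding
    up to a multiple of four bytes."""
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad


def ssv_get_mic(mic_subkey, ssv_seq, message, hash_name="sha256"):
    """Return the XDR encoding of an ssv_mic_tkn4 for message.

    mic_subkey is the subkey derived from SSV4_SUBKEY_MIC_I2T or
    SSV4_SUBKEY_MIC_T2I; hash_name is an assumed negotiated hash.
    """
    # ssv_mic_plain_tkn4: smpt_ssv_seq followed by smpt_orig_plain.
    plain = struct.pack(">I", ssv_seq) + xdr_opaque(message)
    smt_hmac = hmac.new(mic_subkey, plain, hash_name).digest()
    # ssv_mic_tkn4: smt_ssv_seq followed by smt_hmac.
    return struct.pack(">I", ssv_seq) + xdr_opaque(smt_hmac)
```

Because smt_ssv_seq travels in the clear at the front of the token, the
receiver can select the matching version of the SSV before recomputing
the HMAC.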
The SealedMessage description is based on an XDR definition: | The SealedMessage description is based on an XDR definition: | |||
/* Input for computing ssct_encr_data and ssct_hmac */ | /* Input for computing ssct_encr_data and ssct_hmac */ | |||
struct ssv_seal_plain_tkn4 { | struct ssv_seal_plain_tkn4 { | |||
opaque sspt_confounder<>; | opaque sspt_confounder<>; | |||
uint32_t sspt_ssv_seq; | uint32_t sspt_ssv_seq; | |||
opaque sspt_orig_plain<>; | opaque sspt_orig_plain<>; | |||
skipping to change at page 78, line 14 | skipping to change at page 77, line 14 | |||
The ssct_ssv_seq field has the same meaning as smt_ssv_seq. | The ssct_ssv_seq field has the same meaning as smt_ssv_seq. | |||
The ssct_encr_data field is the result of encrypting a value of the | The ssct_encr_data field is the result of encrypting a value of the | |||
XDR encoded data type ssv_seal_plain_tkn4. The encryption key is the | XDR encoded data type ssv_seal_plain_tkn4. The encryption key is the | |||
subkey derived from SSV4_SUBKEY_SEAL_I2T or SSV4_SUBKEY_SEAL_T2I, and | subkey derived from SSV4_SUBKEY_SEAL_I2T or SSV4_SUBKEY_SEAL_T2I, and | |||
the encryption algorithm is that negotiated by EXCHANGE_ID. | the encryption algorithm is that negotiated by EXCHANGE_ID. | |||
The ssct_iv field is the initialization vector (IV) for the | The ssct_iv field is the initialization vector (IV) for the | |||
encryption algorithm (if applicable) and is sent in clear text. The | encryption algorithm (if applicable) and is sent in clear text. The | |||
content and size of the IV MUST comply with specification of the | content and size of the IV MUST comply with the specification of the | |||
encryption algorithm. For example, the id-aes256-CBC algorithm MUST | encryption algorithm. For example, the id-aes256-CBC algorithm MUST | |||
use a 16 byte initialization vector (IV) which MUST be unpredictable | use a 16-byte initialization vector (IV), which MUST be unpredictable | |||
for each instance of a value of type ssv_seal_plain_tkn4 that is | for each instance of a value of data type ssv_seal_plain_tkn4 that is | |||
encrypted with a particular SSV key. | encrypted with a particular SSV key. | |||
The ssct_hmac field is the result of computing an HMAC using value of | The ssct_hmac field is the result of computing an HMAC using the | |||
the XDR encoded data type ssv_seal_plain_tkn4 as the input text. The | value of the XDR encoded data type ssv_seal_plain_tkn4 as the input | |||
key is the subkey derived from SSV4_SUBKEY_MIC_I2T or | text. The key is the subkey derived from SSV4_SUBKEY_MIC_I2T or | |||
SSV4_SUBKEY_MIC_T2I, and the one way hash algorithm is that | SSV4_SUBKEY_MIC_T2I, and the one-way hash algorithm is that | |||
negotiated by EXCHANGE_ID. | negotiated by EXCHANGE_ID. | |||
The sspt_confounder field is a random value. | The sspt_confounder field is a random value. | |||
The sspt_ssv_seq field is the same as ssct_ssv_seq. | The sspt_ssv_seq field is the same as ssct_ssv_seq. | |||
The sspt_orig_plain field is the original plaintext and is the | The sspt_orig_plain field is the original plaintext and is the | |||
"input_message" input passed to GSS_Wrap() (see Section 2.3.3 of | "input_message" input passed to GSS_Wrap() (see Section 2.3.3 of | |||
[7]). As with the handling of the plaintext by the SSV mechanism's | [7]). As with the handling of the plaintext by the SSV mechanism's | |||
GSS_GetMIC() entry point, the entry point for GSS_Wrap() expects a | GSS_GetMIC() entry point, the entry point for GSS_Wrap() expects a | |||
pointer to the plaintext, and will XDR encode an opaque array into | pointer to the plaintext, and will XDR encode an opaque array into | |||
sspt_orig_plain representing the plain text, along with the other | sspt_orig_plain representing the plain text, along with the other | |||
fields of an instance of data type ssv_seal_plain_tkn4. | fields of an instance of data type ssv_seal_plain_tkn4. | |||
The sspt_pad field is present to support encryption algorithms that | The sspt_pad field is present to support encryption algorithms that | |||
require inputs to be in fixed sized blocks. The content of sspt_pad | require inputs to be in fixed-sized blocks. The content of sspt_pad | |||
is zero filled except for the length. Beware that the XDR encoding | is zero filled except for the length. Beware that the XDR encoding | |||
of ssv_seal_plain_tkn4 contains three variable length arrays, and so | of ssv_seal_plain_tkn4 contains three variable-length arrays, and so | |||
each array consumes four bytes for an array length, and each array | each array consumes four bytes for an array length, and each array | |||
that follows the length is always padded to a multiple of four bytes | that follows the length is always padded to a multiple of four bytes | |||
per the XDR standard. | per the XDR standard. | |||
For example suppose the encryption algorithm uses 16 byte blocks, and | For example, suppose the encryption algorithm uses 16-byte blocks, | |||
the sspt_confounder is three bytes long, and the sspt_orig_plain | and the sspt_confounder is three bytes long, and the sspt_orig_plain | |||
field is 15 bytes long. The XDR encoding of sspt_confounder uses | field is 15 bytes long. The XDR encoding of sspt_confounder uses | |||
eight bytes (4 + 3 + 1 byte pad), the XDR encoding of sspt_ssv_seq | eight bytes (4 + 3 + 1 byte pad), the XDR encoding of sspt_ssv_seq | |||
uses four bytes, the XDR encoding of sspt_orig_plain uses 20 bytes (4 | uses four bytes, the XDR encoding of sspt_orig_plain uses 20 bytes (4 | |||
+ 15 + 1 byte pad), and the smallest XDR encoding of the sspt_pad | + 15 + 1 byte pad), and the smallest XDR encoding of the sspt_pad | |||
field is four bytes. This totals 36 bytes. The next multiple of 16 | field is four bytes. This totals 36 bytes. The next multiple of 16 | |||
is 48, thus the length field of sspt_pad needs to be set to 12 bytes, | is 48; thus, the length field of sspt_pad needs to be set to 12 | |||
or a total encoding of 16 bytes. The total number of XDR encoded | bytes, or a total encoding of 16 bytes. The total number of XDR | |||
bytes is thus 8 + 4 + 20 + 16 = 48. | encoded bytes is thus 8 + 4 + 20 + 16 = 48. | |||
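
The arithmetic in the example above can be checked with a short sketch.
The function names are hypothetical, and the sketch assumes a block size
that is a multiple of four bytes so that the pad length is itself a
multiple of four.

```python
def xdr_opaque_len(n):
    """Encoded size of an n-byte XDR variable-length opaque:
    four-byte length word, n data bytes, padding to a multiple of four."""
    return 4 + n + (-n % 4)


def sspt_pad_len(confounder_len, plain_len, block_size):
    """Return (pad_len, total_len) for an ssv_seal_plain_tkn4 whose XDR
    encoding must be a multiple of block_size (assumed multiple of 4)."""
    # sspt_confounder + sspt_ssv_seq (uint32_t) + sspt_orig_plain
    fixed = xdr_opaque_len(confounder_len) + 4 + xdr_opaque_len(plain_len)
    # An empty sspt_pad still costs four bytes for its length word.
    pad_len = -(fixed + 4) % block_size
    total = fixed + xdr_opaque_len(pad_len)
    return pad_len, total
```

With a 3-byte confounder, a 15-byte plaintext, and 16-byte blocks,
sspt_pad_len(3, 15, 16) yields a pad length of 12 and a total of 48
bytes, matching the figures in the text.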
GSS_Wrap() emits a token that is an XDR encoding of a value of data | GSS_Wrap() emits a token that is an XDR encoding of a value of data | |||
type ssv_seal_cipher_tkn4. Note that regardless whether the caller | type ssv_seal_cipher_tkn4. Note that regardless of whether or not | |||
of GSS_Wrap() requests confidentiality or not, the token always has | the caller of GSS_Wrap() requests confidentiality, the token always | |||
confidentiality. This is because the SSV mechanism is for | has confidentiality. This is because the SSV mechanism is for | |||
RPCSEC_GSS, and RPCSEC_GSS never produces GSS_Wrap() tokens without | RPCSEC_GSS, and RPCSEC_GSS never produces GSS_Wrap() tokens without | |||
confidentiality. | confidentiality. | |||
There is one SSV per client ID. There is a single GSS context for a | There is one SSV per client ID. There is a single GSS context for a | |||
client ID / SSV pair. All SSV mechanism RPCSEC_GSS handles of a | client ID / SSV pair. All SSV mechanism RPCSEC_GSS handles of a | |||
client ID / SSV pair share the same GSS context. SSV GSS contexts do | client ID / SSV pair share the same GSS context. SSV GSS contexts do | |||
not expire except when the SSV is destroyed (causes would include the | not expire except when the SSV is destroyed (causes would include the | |||
client ID being destroyed or a server restart). Since one purpose of | client ID being destroyed or a server restart). Since one purpose of | |||
context expiration is to replace keys that have been in use for "too | context expiration is to replace keys that have been in use for "too | |||
long" hence vulnerable to compromise by brute force or accident, the | long", hence vulnerable to compromise by brute force or accident, the | |||
client can replace the SSV key by sending periodic SET_SSV | client can replace the SSV key by sending periodic SET_SSV | |||
operations, by cycling through different users' RPCSEC_GSS | operations, which is done by cycling through different users' | |||
credentials. This way the SSV is replaced without destroying the | RPCSEC_GSS credentials. This way, the SSV is replaced without | |||
SSV's GSS contexts. | destroying the SSV's GSS contexts. | |||
SSV RPCSEC_GSS handles can be expired or deleted by the server at any | SSV RPCSEC_GSS handles can be expired or deleted by the server at any | |||
time and the EXCHANGE_ID operation can be used to create more SSV | time, and the EXCHANGE_ID operation can be used to create more SSV | |||
RPCSEC_GSS handles. Expiration of SSV RPCSEC_GSS handles does not | RPCSEC_GSS handles. Expiration of SSV RPCSEC_GSS handles does not | |||
imply that the SSV or its GSS context have expired. | imply that the SSV or its GSS context has expired. | |||
The client MUST establish an SSV via SET_SSV before the SSV GSS | The client MUST establish an SSV via SET_SSV before the SSV GSS | |||
context can be used to emit tokens from GSS_Wrap() and GSS_GetMIC(). | context can be used to emit tokens from GSS_Wrap() and GSS_GetMIC(). | |||
If SET_SSV has not been successfully called, attempts to emit tokens | If SET_SSV has not been successfully called, attempts to emit tokens | |||
MUST fail. | MUST fail. | |||
The SSV mechanism does not support replay detection and sequencing in | The SSV mechanism does not support replay detection and sequencing in | |||
its tokens because RPCSEC_GSS does not use those features (see | its tokens because RPCSEC_GSS does not use those features (see | |||
Section 5.2.2 "Context Creation Requests" in [4]). However, | Section 5.2.2, "Context Creation Requests", in [4]). However, | |||
Section 2.10.10 discusses special considerations for the SSV | Section 2.10.10 discusses special considerations for the SSV | |||
mechanism when used with RPCSEC_GSS. | mechanism when used with RPCSEC_GSS. | |||
2.10.10. Security Considerations for RPCSEC_GSS when using the SSV | 2.10.10. Security Considerations for RPCSEC_GSS When Using the SSV | |||
Mechanism | Mechanism | |||
When a client ID is created with SP4_SSV state protection (see | When a client ID is created with SP4_SSV state protection (see | |||
Section 18.35), the client is permitted to associate multiple | Section 18.35), the client is permitted to associate multiple | |||
RPCSEC_GSS handles with the single SSV GSS context (see | RPCSEC_GSS handles with the single SSV GSS context (see | |||
Section 2.10.9). Because of the way RPCSEC_GSS (both version 1 and | Section 2.10.9). Because of the way RPCSEC_GSS (both version 1 and | |||
version 2, see [4] and [12]) calculate the verifier of the reply, | version 2, see [4] and [12]) calculate the verifier of the reply, | |||
special care must be taken by the implementation of the NFSv4.1 | special care must be taken by the implementation of the NFSv4.1 | |||
client to prevent attacks by a man-in-the-middle. The verifier of an | client to prevent attacks by a man-in-the-middle. The verifier of an | |||
RPCSEC_GSS reply is the output of GSS_GetMIC() applied to the input | RPCSEC_GSS reply is the output of GSS_GetMIC() applied to the input | |||
skipping to change at page 81, line 21 | skipping to change at page 80, line 21 | |||
(RPCSEC_GSS contexts and backchannel connections). If these | (RPCSEC_GSS contexts and backchannel connections). If these | |||
resources vanish, the server takes action as specified in | resources vanish, the server takes action as specified in | |||
Section 2.10.13.2. | Section 2.10.13.2. | |||
2.10.11.2. Obligations of the Client | 2.10.11.2. Obligations of the Client | |||
The client SHOULD honor the following obligations in order to utilize | The client SHOULD honor the following obligations in order to utilize | |||
the session: | the session: | |||
o Keep a necessary session from going idle on the server. A client | o Keep a necessary session from going idle on the server. A client | |||
that requires a session, but nonetheless is not sending operations | that requires a session but nonetheless is not sending operations | |||
risks having the server destroy the session. This is because | risks having the session be destroyed by the server. This is | |||
sessions consume resources, and resource limitations may force the | because sessions consume resources, and resource limitations may | |||
server to cull an inactive session. A server MAY consider a | force the server to cull an inactive session. A server MAY | |||
session to be inactive if the client has not used the session | consider a session to be inactive if the client has not used the | |||
before the session inactivity timer (Section 2.10.12) has expired. | session before the session inactivity timer (Section 2.10.12) has | |||
expired. | ||||
o Destroy the session when not needed. If a client has multiple | o Destroy the session when not needed. If a client has multiple | |||
sessions, one of which has no requests waiting for replies, and | sessions, one of which has no requests waiting for replies, and | |||
has been idle for some period of time, it SHOULD destroy the | has been idle for some period of time, it SHOULD destroy the | |||
session. | session. | |||
o Maintain GSS contexts and RPCSEC_GSS handles for the backchannel. | o Maintain GSS contexts and RPCSEC_GSS handles for the backchannel. | |||
If the client requires the server to use the RPCSEC_GSS security | If the client requires the server to use the RPCSEC_GSS security | |||
flavor for callbacks, then it needs to be sure the RPCSEC_GSS | flavor for callbacks, then it needs to be sure the RPCSEC_GSS | |||
handles and/or their GSS contexts that are handed to the server | handles and/or their GSS contexts that are handed to the server | |||
via BACKCHANNEL_CTL or CREATE_SESSION are unexpired. | via BACKCHANNEL_CTL or CREATE_SESSION are unexpired. | |||
o Preserve a connection for a backchannel. The server requires a | o Preserve a connection for a backchannel. The server requires a | |||
backchannel in order to gracefully recall recallable state, or | backchannel in order to gracefully recall recallable state or | |||
notify the client of certain events. Note that if the connection | notify the client of certain events. Note that if the connection | |||
is not being used for the fore channel, there is no way for the | is not being used for the fore channel, there is no way for the | |||
client tell if the connection is still alive (e.g., the server | client to tell if the connection is still alive (e.g., the server | |||
restarted without sending a disconnect). The onus is on the | restarted without sending a disconnect). The onus is on the | |||
server, not the client, to determine if the backchannel's | server, not the client, to determine if the backchannel's | |||
connection is alive, and to indicate in the response to a SEQUENCE | connection is alive, and to indicate in the response to a SEQUENCE | |||
operation when the last connection associated with a session's | operation when the last connection associated with a session's | |||
backchannel has disconnected. | backchannel has disconnected. | |||
2.10.11.3. Steps the Client Takes To Establish a Session | 2.10.11.3. Steps the Client Takes to Establish a Session | |||
If the client does not have a client ID, the client sends EXCHANGE_ID | If the client does not have a client ID, the client sends EXCHANGE_ID | |||
to establish a client ID. If it opts for SP4_MACH_CRED or SP4_SSV | to establish a client ID. If it opts for SP4_MACH_CRED or SP4_SSV | |||
protection, in the spo_must_enforce list of operations, it SHOULD at | protection, in the spo_must_enforce list of operations, it SHOULD at | |||
minimum specify: CREATE_SESSION, DESTROY_SESSION, | minimum specify CREATE_SESSION, DESTROY_SESSION, | |||
BIND_CONN_TO_SESSION, BACKCHANNEL_CTL, and DESTROY_CLIENTID. If opts | BIND_CONN_TO_SESSION, BACKCHANNEL_CTL, and DESTROY_CLIENTID. If it | |||
for SP4_SSV protection, the client needs to ask for SSV-based | opts for SP4_SSV protection, the client needs to ask for SSV-based | |||
RPCSEC_GSS handles. | RPCSEC_GSS handles. | |||
The client uses the client ID to send a CREATE_SESSION on a | The client uses the client ID to send a CREATE_SESSION on a | |||
connection to the server. The results of CREATE_SESSION indicate | connection to the server. The results of CREATE_SESSION indicate | |||
whether the server will persist the session reply cache through a | whether or not the server will persist the session reply cache | |||
server restart or not, and the client notes this for future | through a server that has restarted, and the client notes this for | |||
reference. | future reference. | |||
If the client specified SP4_SSV state protection when the client ID | If the client specified SP4_SSV state protection when the client ID | |||
was created, then it SHOULD send SET_SSV in the first COMPOUND after | was created, then it SHOULD send SET_SSV in the first COMPOUND after | |||
the session is created. Each time a new principal goes to use the | the session is created. Each time a new principal goes to use the | |||
client ID, it SHOULD send a SET_SSV again. | client ID, it SHOULD send a SET_SSV again. | |||
If the client wants to use delegations, layouts, directory | If the client wants to use delegations, layouts, directory | |||
notifications, or any other state that requires a backchannel, then | notifications, or any other state that requires a backchannel, then | |||
it needs to add a connection to the backchannel if CREATE_SESSION did | it needs to add a connection to the backchannel if CREATE_SESSION did | |||
not already do so. The client creates a connection, and calls | not already do so. The client creates a connection, and calls | |||
skipping to change at page 82, line 45 | skipping to change at page 81, line 45 | |||
protection when it called EXCHANGE_ID, then the client SHOULD specify | protection when it called EXCHANGE_ID, then the client SHOULD specify | |||
that the backchannel use RPCSEC_GSS contexts for security. | that the backchannel use RPCSEC_GSS contexts for security. | |||
If the client wants to use additional connections for the | If the client wants to use additional connections for the | |||
backchannel, then it needs to call BIND_CONN_TO_SESSION on each | backchannel, then it needs to call BIND_CONN_TO_SESSION on each | |||
connection it wants to use with the session. If the client wants to | connection it wants to use with the session. If the client wants to | |||
use additional connections for the fore channel, then it needs to | use additional connections for the fore channel, then it needs to | |||
call BIND_CONN_TO_SESSION if it specified SP4_SSV or SP4_MACH_CRED | call BIND_CONN_TO_SESSION if it specified SP4_SSV or SP4_MACH_CRED | |||
state protection when the client ID was created. | state protection when the client ID was created. | |||
At this point the session has reached steady state. | At this point, the session has reached steady state. | |||
2.10.12. Session Inactivity Timer | 2.10.12. Session Inactivity Timer | |||
The server MAY maintain a session inactivity timer for each session. | The server MAY maintain a session inactivity timer for each session. | |||
If the session inactivity timer expires, then the server MAY destroy | If the session inactivity timer expires, then the server MAY destroy | |||
the session. To avoid losing a session due to inactivity, the client | the session. To avoid losing a session due to inactivity, the client | |||
MUST renew the session inactivity timer. The length of the session | MUST renew the session inactivity timer. The length of the session | |||
inactivity timer MUST NOT be less than the lease_time attribute | inactivity timer MUST NOT be less than the lease_time attribute | |||
(Section 5.8.1.11). As with lease renewal (Section 8.3), when the | (Section 5.8.1.11). As with lease renewal (Section 8.3), when the | |||
server receives a SEQUENCE operation, it resets the session | server receives a SEQUENCE operation, it resets the session | |||
inactivity timer, and MUST NOT allow the timer to expire while the | inactivity timer, and MUST NOT allow the timer to expire while the | |||
rest of the operations in the COMPOUND procedure's request are still | rest of the operations in the COMPOUND procedure's request are still | |||
executing. Once the last operation has finished, the server MUST set | executing. Once the last operation has finished, the server MUST set | |||
the session inactivity timer to expire no sooner that the sum of the | the session inactivity timer to expire no sooner than the sum of the | |||
current time and the value of the lease_time attribute. | current time and the value of the lease_time attribute. | |||
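
One way a server might track the timer rules above is sketched below.
This is an assumption-laden illustration, not a prescribed design: the
class and method names are hypothetical, times are plain numbers of
seconds supplied by the caller, and the text does not require a server
to maintain such a timer at all.

```python
class SessionInactivityTimer:
    """Hypothetical per-session inactivity timer."""

    def __init__(self, lease_time, created_at, timeout=None):
        timeout = lease_time if timeout is None else timeout
        # The timer length must not be less than the lease_time attribute.
        if timeout < lease_time:
            raise ValueError("timeout must not be less than lease_time")
        self.timeout = timeout
        self.in_progress = 0  # COMPOUNDs still executing on this session
        self.expires_at = created_at + timeout

    def on_sequence(self):
        """A SEQUENCE arrived: the timer cannot expire while the rest
        of the COMPOUND is still executing."""
        self.in_progress += 1

    def on_compound_done(self, now):
        """The last operation finished: expire no sooner than
        now + lease_time (timeout >= lease_time guarantees this)."""
        self.in_progress -= 1
        self.expires_at = max(self.expires_at, now + self.timeout)

    def is_expired(self, now):
        return self.in_progress == 0 and now >= self.expires_at
```

Counting in-progress requests is one simple way to keep the timer from
expiring during a long-running COMPOUND; other designs would satisfy the
same requirement.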
2.10.13. Session Mechanics - Recovery | 2.10.13. Session Mechanics - Recovery | |||
2.10.13.1. Events Requiring Client Action | 2.10.13.1. Events Requiring Client Action | |||
The following events require client action to recover. | The following events require client action to recover. | |||
2.10.13.1.1. RPCSEC_GSS Context Loss by Callback Path | 2.10.13.1.1. RPCSEC_GSS Context Loss by Callback Path | |||
If all RPCSEC_GSS handles granted by the client to the server for | If all RPCSEC_GSS handles granted by the client to the server for | |||
callback use have expired, the client MUST establish a new handle via | callback use have expired, the client MUST establish a new handle via | |||
BACKCHANNEL_CTL. The sr_status_flags field of the SEQUENCE results | BACKCHANNEL_CTL. The sr_status_flags field of the SEQUENCE results | |||
indicates when callback handles are nearly expired, or fully expired | indicates when callback handles are nearly expired, or fully expired | |||
(see Section 18.46.3). | (see Section 18.46.3). | |||
2.10.13.1.2. Connection Loss | 2.10.13.1.2. Connection Loss | |||
If the client loses the last connection of the session, and if wants | If the client loses the last connection of the session and wants to | |||
to retain the session, then it needs to create a new connection, and | retain the session, then it needs to create a new connection, and if, | |||
if, when the client ID was created, BIND_CONN_TO_SESSION was | when the client ID was created, BIND_CONN_TO_SESSION was specified in | |||
specified in the spo_must_enforce list, the client MUST use | the spo_must_enforce list, the client MUST use BIND_CONN_TO_SESSION | |||
BIND_CONN_TO_SESSION to associate the connection with the session. | to associate the connection with the session. | |||
If there was a request outstanding at the time the of connection | If there was a request outstanding at the time of connection loss, | |||
loss, then if client wants to continue to use the session it MUST | then if the client wants to continue to use the session, it MUST | |||
retry the request, as described in Section 2.10.6.2. Note that it is | retry the request, as described in Section 2.10.6.2. Note that it is | |||
not necessary to retry requests over a connection with the same | not necessary to retry requests over a connection with the same | |||
source network address or the same destination network address as the | source network address or the same destination network address as the | |||
lost connection. As long as the session ID, slot ID, and sequence ID | lost connection. As long as the session ID, slot ID, and sequence ID | |||
in the retry match that of the original request, the server will | in the retry match that of the original request, the server will | |||
recognize the request as a retry if it executed the request prior to | recognize the request as a retry if it executed the request prior to | |||
disconnect. | disconnect. | |||
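
The point that a retry is recognized by its session ID, slot ID, and
sequence ID rather than by the connection it arrives on can be sketched
as follows. The class ReplyCache and its handle method are hypothetical
names, and the sketch deliberately omits the other checks a real reply
cache performs (for example, rejecting misordered sequence IDs).

```python
class ReplyCache:
    """Hypothetical, simplified reply cache keyed by session and slot."""

    def __init__(self):
        # (session_id, slot_id) -> (sequence_id, cached reply)
        self._slots = {}

    def handle(self, session_id, slot_id, sequence_id, connection, execute):
        """Return (reply, was_retry).

        connection is accepted but never consulted: a retry may arrive
        over a connection with different network addresses.
        """
        key = (session_id, slot_id)
        cached = self._slots.get(key)
        if cached is not None and cached[0] == sequence_id:
            # Same session ID, slot ID, and sequence ID: a retry.
            return cached[1], True
        reply = execute()
        self._slots[key] = (sequence_id, reply)
        return reply, False
```

In this sketch, a request first sent on one connection and retried on
another returns the cached reply without being executed a second time.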
If the connection that was lost was the last one associated with the | If the connection that was lost was the last one associated with the | |||
backchannel, and the client wants to retain the backchannel and/or | backchannel, and the client wants to retain the backchannel and/or | |||
skipping to change at page 84, line 10 | skipping to change at page 83, line 10 | |||
reconnect, and if it does, it MUST associate the connection to the | reconnect, and if it does, it MUST associate the connection to the | |||
session and backchannel via BIND_CONN_TO_SESSION. The server SHOULD | session and backchannel via BIND_CONN_TO_SESSION. The server SHOULD | |||
indicate when it has no callback connection via the sr_status_flags | indicate when it has no callback connection via the sr_status_flags | |||
result from SEQUENCE. | result from SEQUENCE. | |||
2.10.13.1.3. Backchannel GSS Context Loss | 2.10.13.1.3. Backchannel GSS Context Loss | |||
Via the sr_status_flags result of the SEQUENCE operation or other | Via the sr_status_flags result of the SEQUENCE operation or other | |||
means, the client will learn if some or all of the RPCSEC_GSS | means, the client will learn if some or all of the RPCSEC_GSS | |||
contexts it assigned to the backchannel have been lost. If the | contexts it assigned to the backchannel have been lost. If the | |||
client wants to the retain the backchannel and/or not put recallable | client wants to retain the backchannel and/or not put recallable | |||
state subjection to revocation, the client needs to use | state subject to revocation, the client needs to use BACKCHANNEL_CTL | |||
BACKCHANNEL_CTL to assign new contexts. | to assign new contexts. | |||
2.10.13.1.4. Loss of Session | 2.10.13.1.4. Loss of Session | |||
The replier might lose a record of the session. Causes include: | The replier might lose a record of the session. Causes include: | |||
o Replier failure and restart | o Replier failure and restart. | |||
o A catastrophe that causes the reply cache to be corrupted or lost | o A catastrophe that causes the reply cache to be corrupted or lost | |||
on the media it was stored on. This applies even if the replier | on the media on which it was stored. This applies even if the | |||
indicated in the CREATE_SESSION results that it would persist the | replier indicated in the CREATE_SESSION results that it would | |||
cache. | persist the cache. | |||
o The server purges the session of a client that has been inactive | o The server purges the session of a client that has been inactive | |||
for a very extended period of time. | for a very extended period of time. | |||
o As a result of configuration changes among a set of clustered | o As a result of configuration changes among a set of clustered | |||
servers, a network address previously connected to one server | servers, a network address previously connected to one server | |||
becomes connected to a different server which has no knowledge of | becomes connected to a different server that has no knowledge of | |||
the session in question. Such a configuration change will | the session in question. Such a configuration change will | |||
generally only happen when the original server ceases to function | generally only happen when the original server ceases to function | |||
for a time. | for a time. | |||
Loss of reply cache is equivalent to loss of session. The replier | Loss of reply cache is equivalent to loss of session. The replier | |||
indicates loss of session to the requester by returning | indicates loss of session to the requester by returning | |||
NFS4ERR_BADSESSION on the next operation that uses the session ID | NFS4ERR_BADSESSION on the next operation that uses the session ID | |||
that refers to the lost session. | that refers to the lost session. | |||
After an event like a server restart, the client may have lost its | After an event like a server restart, the client may have lost its | |||
skipping to change at page 85, line 6 | skipping to change at page 84, line 6 | |||
SEQUENCE. If BIND_CONN_TO_SESSION or SEQUENCE returns | SEQUENCE. If BIND_CONN_TO_SESSION or SEQUENCE returns | |||
NFS4ERR_BADSESSION, the client knows the session is not available to | NFS4ERR_BADSESSION, the client knows the session is not available to | |||
it when communicating with that network address. If the connection | it when communicating with that network address. If the connection | |||
survives session loss, then the next SEQUENCE operation the client | survives session loss, then the next SEQUENCE operation the client | |||
sends over the connection will get back NFS4ERR_BADSESSION. The | sends over the connection will get back NFS4ERR_BADSESSION. The | |||
client again knows the session was lost. | client again knows the session was lost. | |||
Here is one suggested algorithm for the client when it gets | Here is one suggested algorithm for the client when it gets | |||
NFS4ERR_BADSESSION. It is not obligatory in that, if a client does | NFS4ERR_BADSESSION. It is not obligatory in that, if a client does | |||
not want to take advantage of such features as trunking, it may omit | not want to take advantage of such features as trunking, it may omit | |||
parts of it. However, it is a useful example which draws attention | parts of it. However, it is a useful example that draws attention to | |||
to various possible recovery issues: | various possible recovery issues: | |||
1. If the client has other connections to other server network | 1. If the client has other connections to other server network | |||
addresses associated with the same session, attempt a COMPOUND | addresses associated with the same session, attempt a COMPOUND | |||
with a single operation, SEQUENCE, on each of the other | with a single operation, SEQUENCE, on each of the other | |||
connections. | connections. | |||
2. If the attempts succeed, the session is still alive, and this is | 2. If the attempts succeed, the session is still alive, and this is | |||
a strong indicator the server's network address has moved. The | a strong indicator that the server's network address has moved. | |||
client might send an EXCHANGE_ID on the connection that returned | The client might send an EXCHANGE_ID on the connection that | |||
NFS4ERR_BADSESSION to see if there are opportunities for client | returned NFS4ERR_BADSESSION to see if there are opportunities for | |||
ID trunking (i.e. the same client ID and so_major are returned). | client ID trunking (i.e., the same client ID and so_major are | |||
The client might use DNS to see if the moved network address was | returned). The client might use DNS to see if the moved network | |||
replaced with another, so that the performance and availability | address was replaced with another, so that the performance and | |||
benefits of session trunking can continue. | availability benefits of session trunking can continue. | |||
3. If the SEQUENCE requests fail with NFS4ERR_BADSESSION then the | 3. If the SEQUENCE requests fail with NFS4ERR_BADSESSION, then the | |||
session no longer exists on any of the server network addresses | session no longer exists on any of the server network addresses | |||
the client has connections associated with that session ID. It | for which the client has connections associated with that session | |||
is possible the session is still alive and available on other | ID. It is possible the session is still alive and available on | |||
network addresses. The client sends an EXCHANGE_ID on all the | other network addresses. The client sends an EXCHANGE_ID on all | |||
connections to see if the server owner is still listening on | the connections to see if the server owner is still listening on | |||
those network addresses. If the same server owner is returned, | those network addresses. If the same server owner is returned | |||
but a new client ID is returned, this is a strong indicator of a | but a new client ID is returned, this is a strong indicator of a | |||
server restart. If both the same server owner and same client ID | server restart. If both the same server owner and same client ID | |||
are returned, then this is a strong indication that the server | are returned, then this is a strong indication that the server | |||
did delete the session, and the client will need to send a | did delete the session, and the client will need to send a | |||
CREATE_SESSION if it has no other sessions for that client ID. | CREATE_SESSION if it has no other sessions for that client ID. | |||
If a different server owner is returned, the client can use DNS | If a different server owner is returned, the client can use DNS | |||
to find other network addresses. If it does not, or if DNS does | to find other network addresses. If it does not, or if DNS does | |||
not find any other addresses for the server, then the client will | not find any other addresses for the server, then the client will | |||
be unable to provide NFSv4.1 service, and fatal errors should be | be unable to provide NFSv4.1 service, and fatal errors should be | |||
returned to processes that were using the server. If the client | returned to processes that were using the server. If the client | |||
is using a "mount" paradigm, unmounting the server is advised. | is using a "mount" paradigm, unmounting the server is advised. | |||
4. If the client knows of no other connections associated with the | 4. If the client knows of no other connections associated with the | |||
session ID, and server network addresses that are, or have been | session ID and server network addresses that are, or have been, | |||
associated with the session ID, then the client can use DNS to | associated with the session ID, then the client can use DNS to | |||
find other network addresses. If it does not, or if DNS does not | find other network addresses. If it does not, or if DNS does not | |||
find any other addresses for the server, then the client will be | find any other addresses for the server, then the client will be | |||
unable to provide NFSv4.1 service, and fatal errors should be | unable to provide NFSv4.1 service, and fatal errors should be | |||
returned to processes that were using the server. If the client | returned to processes that were using the server. If the client | |||
is using a "mount" paradigm, unmounting the server is advised. | is using a "mount" paradigm, unmounting the server is advised. | |||
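The decision in step 3 of the suggested algorithm can be sketched as follows. This is a minimal illustration only, not part of the protocol definition: the function name, the enum, and the representation of the server owner and client ID as plain comparable values are all assumptions made for the example.

```python
from enum import Enum


class Diagnosis(Enum):
    # Hypothetical labels for the three outcomes described in step 3.
    SERVER_RESTART = "same server owner, new client ID: server restart likely"
    SESSION_DELETED = "same server owner and client ID: send CREATE_SESSION"
    DIFFERENT_SERVER = "different server owner: look for other addresses"


def diagnose_exchange_id(prev_owner, prev_clientid, new_owner, new_clientid):
    """Classify an EXCHANGE_ID reply received after NFS4ERR_BADSESSION.

    prev_* are the values the client remembered before the error;
    new_* are the values returned by the fresh EXCHANGE_ID.
    """
    if new_owner != prev_owner:
        # A different server now answers at this network address.
        return Diagnosis.DIFFERENT_SERVER
    if new_clientid != prev_clientid:
        # Same server owner but a new client ID: strong indicator of restart.
        return Diagnosis.SERVER_RESTART
    # Same server owner and same client ID: the server deleted the session.
    return Diagnosis.SESSION_DELETED
```

A real client would act on the result (for example, sending CREATE_SESSION or consulting DNS); the sketch only shows the classification.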
If there is a reconfiguration event which results in the same network | If there is a reconfiguration event that results in the same network | |||
address being assigned to servers where the eir_server_scope value is | address being assigned to servers where the eir_server_scope value is | |||
different, it cannot be guaranteed that a session ID generated by the | different, it cannot be guaranteed that a session ID generated by the | |||
first will be recognized as invalid by the second. Therefore, in | first will be recognized as invalid by the second. Therefore, in | |||
managing server reconfigurations among servers with different server | managing server reconfigurations among servers with different server | |||
scope values, it is necessary to make sure that all clients have | scope values, it is necessary to make sure that all clients have | |||
disconnected from the first server before effecting the | disconnected from the first server before effecting the | |||
reconfiguration. Nonetheless, clients cannot assume that servers | reconfiguration. Nonetheless, clients should not assume that servers | |||
will always adhere to this requirement; clients MUST be prepared to | will always adhere to this requirement; clients MUST be prepared to | |||
deal with unexpected effects of server reconfigurations. Even where | deal with unexpected effects of server reconfigurations. Even where | |||
a session ID is inappropriately recognized as valid, it is likely | a session ID is inappropriately recognized as valid, it is likely | |||
that either the connection will not be recognized as valid, or that a | either that the connection will not be recognized as valid or that a | |||
sequence value for a slot will not be correct. Therefore, when a | sequence value for a slot will not be correct. Therefore, when a | |||
client receives results indicating such unexpected errors, the use of | client receives results indicating such unexpected errors, the use of | |||
EXCHANGE_ID to determine the current server configuration is | EXCHANGE_ID to determine the current server configuration is | |||
RECOMMENDED. | RECOMMENDED. | |||
A variation on the above is that after a server's network address | A variation on the above is that after a server's network address | |||
moves, there is no NFSv4.1 server listening. E.g. no listener on | moves, there is no NFSv4.1 server listening, e.g., no listener on | |||
port 2049, the NFSv4 server returns NFS4ERR_MINOR_VERS_MISMATCH, the | port 2049. In this example, one of the following occurs: the NFSv4 | |||
NFS server returns a PROG_MISMATCH error, the RPC listener on 2049 | server returns NFS4ERR_MINOR_VERS_MISMATCH, the NFS server returns a | |||
returns PROG_MISMATCH, or attempts to re-connect to the network | PROG_MISMATCH error, the RPC listener on 2049 returns PROG_UNAVAIL, or | |||
address timeout. These SHOULD be treated as equivalent to SEQUENCE | attempts to reconnect to the network address time out. These SHOULD | |||
returning NFS4ERR_BADSESSION for these purposes. | be treated as equivalent to SEQUENCE returning NFS4ERR_BADSESSION for | |||
these purposes. | ||||
When the client detects session loss, it needs to call CREATE_SESSION | When the client detects session loss, it needs to call CREATE_SESSION | |||
to recover. Any non-idempotent operations that were in progress | to recover. Any non-idempotent operations that were in progress | |||
might have been performed on the server at the time of session loss. | might have been performed on the server at the time of session loss. | |||
The client has no general way to recover from this. | The client has no general way to recover from this. | |||
Note that loss of session does not imply loss of lock, open, | Note that loss of session does not imply loss of byte-range lock, | |||
delegation, or layout state because locks, opens, delegations, and | open, delegation, or layout state because locks, opens, delegations, | |||
layouts are tied to the client ID and depend on the client ID, not | and layouts are tied to the client ID and depend on the client ID, | |||
the session. Nor does loss of lock, open, delegation, or layout | not the session. Nor does loss of byte-range lock, open, delegation, | |||
state imply loss of session state, because the session depends on the | or layout state imply loss of session state, because the session | |||
client ID; loss of client ID however does imply loss of session, | depends on the client ID; loss of client ID however does imply loss | |||
lock, open, delegation, and layout state. See Section 8.4.2. A | of session, byte-range lock, open, delegation, and layout state. See | |||
session can survive a server restart, but lock recovery may still be | Section 8.4.2. A session can survive a server restart, but lock | |||
needed. | recovery may still be needed. | |||
It is possible CREATE_SESSION will fail with NFS4ERR_STALE_CLIENTID | It is possible that CREATE_SESSION will fail with | |||
(e.g. the server restarts and does not preserve client ID state). If | NFS4ERR_STALE_CLIENTID (e.g., the server restarts and does not | |||
so, the client needs to call EXCHANGE_ID, followed by CREATE_SESSION. | preserve client ID state). If so, the client needs to call | |||
EXCHANGE_ID, followed by CREATE_SESSION. | ||||
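The recovery order described above (CREATE_SESSION first, falling back to EXCHANGE_ID followed by CREATE_SESSION on NFS4ERR_STALE_CLIENTID) might be sketched as below. The two callables and the use of strings as status codes are stand-ins assumed for the example; a real client would issue the corresponding COMPOUND requests and compare nfsstat4 values.

```python
# Status values are represented as strings purely for illustration.
NFS4_OK = "NFS4_OK"
NFS4ERR_STALE_CLIENTID = "NFS4ERR_STALE_CLIENTID"


def recover_session(send_create_session, send_exchange_id):
    """Attempt session recovery after session loss has been detected.

    send_create_session: hypothetical callable returning a status string.
    send_exchange_id: hypothetical callable that obtains a new client ID.
    """
    status = send_create_session()
    if status == NFS4ERR_STALE_CLIENTID:
        # The server no longer knows the client ID (e.g., it restarted
        # without preserving client ID state), so establish a new one
        # and then create the session under it.
        send_exchange_id()
        status = send_create_session()
    return status
```

Note that the sketch does not cover lock, open, delegation, or layout recovery, which may still be needed once a new client ID is in place.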
2.10.13.2. Events Requiring Server Action | 2.10.13.2. Events Requiring Server Action | |||
The following events require server action to recover. | The following events require server action to recover. | |||
2.10.13.2.1. Client Crash and Restart | 2.10.13.2.1. Client Crash and Restart | |||
As described in Section 18.35, a restarted client sends EXCHANGE_ID | As described in Section 18.35, a restarted client sends EXCHANGE_ID | |||
in such a way it causes the server to delete any sessions it had. | in such a way that it causes the server to delete any sessions it | |||
had. | ||||
2.10.13.2.2. Client Crash with No Restart | 2.10.13.2.2. Client Crash with No Restart | |||
If a client crashes and never comes back, it will never send | If a client crashes and never comes back, it will never send | |||
EXCHANGE_ID with its old client owner. Thus the server has session | EXCHANGE_ID with its old client owner. Thus, the server has session | |||
state that will never be used again. After an extended period of | state that will never be used again. After an extended period of | |||
time and if the server has resource constraints, it MAY destroy the | time, and if the server has resource constraints, it MAY destroy the | |||
old session as well as locking state. | old session as well as locking state. | |||
2.10.13.2.3. Extended Network Partition | 2.10.13.2.3. Extended Network Partition | |||
To the server, the extended network partition may be no different | To the server, the extended network partition may be no different | |||
from a client crash with no restart (see Section 2.10.13.2.2). | from a client crash with no restart (see Section 2.10.13.2.2). | |||
Unless the server can discern that there is a network partition, it | Unless the server can discern that there is a network partition, it | |||
is free to treat the situation as if the client has crashed | is free to treat the situation as if the client has crashed | |||
permanently. | permanently. | |||
skipping to change at page 87, line 40 | skipping to change at page 86, line 45 | |||
in Section 2.10.6.2. Note that it is not necessary to retry requests | in Section 2.10.6.2. Note that it is not necessary to retry requests | |||
over a connection with the same source network address or the same | over a connection with the same source network address or the same | |||
destination network address as the lost connection. As long as the | destination network address as the lost connection. As long as the | |||
session ID, slot ID, and sequence ID in the retry match that of the | session ID, slot ID, and sequence ID in the retry match that of the | |||
original request, the callback target will recognize the request as a | original request, the callback target will recognize the request as a | |||
retry even if it did see the request prior to disconnect. | retry even if it did see the request prior to disconnect. | |||
If the connection lost is the last one associated with the | If the connection lost is the last one associated with the | |||
backchannel, then the server MUST indicate that in the | backchannel, then the server MUST indicate that in the | |||
sr_status_flags field of every SEQUENCE reply until the backchannel | sr_status_flags field of every SEQUENCE reply until the backchannel | |||
is reestablished. There are two situations each of which use | is re-established. There are two situations, each of which uses | |||
different status flags: no connectivity for the session's | different status flags: no connectivity for the session's backchannel | |||
backchannel, and no connectivity for any session backchannel of the | and no connectivity for any session backchannel of the client. See | |||
client. See Section 18.46 for a description of the appropriate flags | Section 18.46 for a description of the appropriate flags in | |||
in sr_status_flags. | sr_status_flags. | |||
2.10.13.2.5. GSS Context Loss | 2.10.13.2.5. GSS Context Loss | |||
The server SHOULD monitor when the number of RPCSEC_GSS contexts | The server SHOULD monitor when the number of RPCSEC_GSS handles assigned | |||
assigned to the backchannel reaches one, and when that one context is | to the backchannel reaches one, and when that one handle is near | |||
near expiry (i.e. between one and two periods of lease time), | expiry (i.e., between one and two periods of lease time), and | |||
indicate so in the sr_status_flags field of all SEQUENCE replies. | indicate so in the sr_status_flags field of all SEQUENCE replies. | |||
The server MUST indicate when all of the backchannel's assigned | The server MUST indicate when all of the backchannel's assigned | |||
RPCSEC_GSS handles have expired via the sr_status_flags field of all | RPCSEC_GSS handles have expired via the sr_status_flags field of all | |||
SEQUENCE replies. | SEQUENCE replies. | |||
2.10.14. Parallel NFS and Sessions | 2.10.14. Parallel NFS and Sessions | |||
A client and server can potentially be a non-pNFS implementation, a | A client and server can potentially be a non-pNFS implementation, a | |||
metadata server implementation, a data server implementation, or two | metadata server implementation, a data server implementation, or two | |||
or three types of implementations. The EXCHGID4_FLAG_USE_NON_PNFS, | or three types of implementations. The EXCHGID4_FLAG_USE_NON_PNFS, | |||
EXCHGID4_FLAG_USE_PNFS_MDS, and EXCHGID4_FLAG_USE_PNFS_DS flags (not | EXCHGID4_FLAG_USE_PNFS_MDS, and EXCHGID4_FLAG_USE_PNFS_DS flags (not | |||
mutually exclusive) are passed in the EXCHANGE_ID arguments and | mutually exclusive) are passed in the EXCHANGE_ID arguments and | |||
results to allow the client to indicate how it wants to use sessions | results to allow the client to indicate how it wants to use sessions | |||
created under the client ID, and to allow the server to indicate how | created under the client ID, and to allow the server to indicate how | |||
it will allow the sessions to be used. See Section 13.1 for pNFS | it will allow the sessions to be used. See Section 13.1 for pNFS | |||
sessions considerations. | sessions considerations. | |||
3. Protocol Constants and Data Types | 3. Protocol Constants and Data Types | |||
The syntax and semantics to describe the data types of the NFSv4.1 | The syntax and semantics to describe the data types of the NFSv4.1 | |||
protocol are defined in the XDR RFC4506 [2] and RPC RFC1831 [3] | protocol are defined in the XDR RFC 4506 [2] and RPC RFC 5531 [3] | |||
documents. The next sections build upon the XDR data types to define | documents. The next sections build upon the XDR data types to define | |||
constants, types and structures specific to this protocol. The full | constants, types, and structures specific to this protocol. The full | |||
list of XDR data types is in [13]. | list of XDR data types is in [13]. | |||
3.1. Basic Constants | 3.1. Basic Constants | |||
const NFS4_FHSIZE = 128; | const NFS4_FHSIZE = 128; | |||
const NFS4_VERIFIER_SIZE = 8; | const NFS4_VERIFIER_SIZE = 8; | |||
const NFS4_OPAQUE_LIMIT = 1024; | const NFS4_OPAQUE_LIMIT = 1024; | |||
const NFS4_SESSIONID_SIZE = 16; | const NFS4_SESSIONID_SIZE = 16; | |||
const NFS4_INT64_MAX = 0x7fffffffffffffff; | const NFS4_INT64_MAX = 0x7fffffffffffffff; | |||
skipping to change at page 89, line 7 | skipping to change at page 88, line 14 | |||
o NFS4_FHSIZE is the maximum size of a filehandle. | o NFS4_FHSIZE is the maximum size of a filehandle. | |||
o NFS4_VERIFIER_SIZE is the fixed size of a verifier. | o NFS4_VERIFIER_SIZE is the fixed size of a verifier. | |||
o NFS4_OPAQUE_LIMIT is the maximum size of certain opaque | o NFS4_OPAQUE_LIMIT is the maximum size of certain opaque | |||
information. | information. | |||
o NFS4_SESSIONID_SIZE is the fixed size of a session identifier. | o NFS4_SESSIONID_SIZE is the fixed size of a session identifier. | |||
o NFS4_INT64_MAX is the maximum value of a signed 64 bit integer. | o NFS4_INT64_MAX is the maximum value of a signed 64-bit integer. | |||
o NFS4_UINT64_MAX is the maximum value of an unsigned 64 bit | o NFS4_UINT64_MAX is the maximum value of an unsigned 64-bit | |||
integer. | integer. | |||
o NFS4_INT32_MAX is the maximum value of a signed 32 bit integer. | o NFS4_INT32_MAX is the maximum value of a signed 32-bit integer. | |||
o NFS4_UINT32_MAX is the maximum value of an unsigned 32 bit | o NFS4_UINT32_MAX is the maximum value of an unsigned 32-bit | |||
integer. | integer. | |||
o NFS4_MAXFILELEN is the maximum length of a regular file. | o NFS4_MAXFILELEN is the maximum length of a regular file. | |||
o NFS4_MAXFILEOFF is the maximum offset into a regular file. | o NFS4_MAXFILEOFF is the maximum offset into a regular file. | |||
3.2. Basic Data Types | 3.2. Basic Data Types | |||
These are the base NFSv4.1 data types. | These are the base NFSv4.1 data types. | |||
skipping to change at page 89, line 43 | skipping to change at page 88, line 50 | |||
| | Used for file/directory attributes. | | | | Used for file/directory attributes. | | |||
| bitmap4 | typedef uint32_t bitmap4<>; | | | bitmap4 | typedef uint32_t bitmap4<>; | | |||
| | Used in attribute array encoding. | | | | Used in attribute array encoding. | | |||
| changeid4 | typedef uint64_t changeid4; | | | changeid4 | typedef uint64_t changeid4; | | |||
| | Used in the definition of change_info4. | | | | Used in the definition of change_info4. | | |||
| clientid4 | typedef uint64_t clientid4; | | | clientid4 | typedef uint64_t clientid4; | | |||
| | Shorthand reference to client identification. | | | | Shorthand reference to client identification. | | |||
| count4 | typedef uint32_t count4; | | | count4 | typedef uint32_t count4; | | |||
| | Various count parameters (READ, WRITE, COMMIT). | | | | Various count parameters (READ, WRITE, COMMIT). | | |||
| length4 | typedef uint64_t length4; | | | length4 | typedef uint64_t length4; | | |||
| | The length of a byte range within a file. | | | | The length of a byte-range within a file. | | |||
| mode4 | typedef uint32_t mode4; | | | mode4 | typedef uint32_t mode4; | | |||
| | Mode attribute data type. | | | | Mode attribute data type. | | |||
| nfs_cookie4 | typedef uint64_t nfs_cookie4; | | | nfs_cookie4 | typedef uint64_t nfs_cookie4; | | |||
| | Opaque cookie value for READDIR. | | | | Opaque cookie value for READDIR. | | |||
| nfs_fh4 | typedef opaque nfs_fh4<NFS4_FHSIZE>; | | | nfs_fh4 | typedef opaque nfs_fh4<NFS4_FHSIZE>; | | |||
| | Filehandle definition. | | | | Filehandle definition. | | |||
| nfs_ftype4 | enum nfs_ftype4; | | | nfs_ftype4 | enum nfs_ftype4; | | |||
| | Various defined file types. | | | | Various defined file types. | | |||
| nfsstat4 | enum nfsstat4; | | | nfsstat4 | enum nfsstat4; | | |||
| | Return value for operations. | | | | Return value for operations. | | |||
| offset4 | typedef uint64_t offset4; | | | offset4 | typedef uint64_t offset4; | | |||
| | Various offset designations (READ, WRITE, LOCK, | | | | Various offset designations (READ, WRITE, LOCK, | | |||
| | COMMIT). | | | | COMMIT). | | |||
| qop4 | typedef uint32_t qop4; | | | qop4 | typedef uint32_t qop4; | | |||
| | Quality of protection designation in SECINFO. | | | | Quality of protection designation in SECINFO. | | |||
| sec_oid4 | typedef opaque sec_oid4<>; | | | sec_oid4 | typedef opaque sec_oid4<>; | | |||
| | Security Object Identifier. The sec_oid4 data | | | | Security Object Identifier. The sec_oid4 data | | |||
| | type is not really opaque. Instead it contains an | | | | type is not really opaque. Instead, it contains | | |||
| | ASN.1 OBJECT IDENTIFIER as used by GSS-API in the | | | | an ASN.1 OBJECT IDENTIFIER as used by GSS-API in | | |||
| | mech_type argument to GSS_Init_sec_context. See | | | | the mech_type argument to GSS_Init_sec_context. | | |||
| | [7] for details. | | | | See [7] for details. | | |||
| sequenceid4 | typedef uint32_t sequenceid4; | | | sequenceid4 | typedef uint32_t sequenceid4; | | |||
| | Sequence number used for various session | | | | Sequence number used for various session | | |||
| | operations (EXCHANGE_ID, CREATE_SESSION, | | | | operations (EXCHANGE_ID, CREATE_SESSION, | | |||
| | SEQUENCE, CB_SEQUENCE). | | | | SEQUENCE, CB_SEQUENCE). | | |||
| seqid4 | typedef uint32_t seqid4; | | | seqid4 | typedef uint32_t seqid4; | | |||
| | Sequence identifier used for file locking. | | | | Sequence identifier used for locking. | | |||
| sessionid4 | typedef opaque sessionid4[NFS4_SESSIONID_SIZE]; | | | sessionid4 | typedef opaque sessionid4[NFS4_SESSIONID_SIZE]; | | |||
| | Session identifier. | | | | Session identifier. | | |||
| slotid4 | typedef uint32_t slotid4; | | | slotid4 | typedef uint32_t slotid4; | | |||
| | Sequencing artifact for various session | | | | Sequencing artifact for various session | | |||
| | operations (SEQUENCE, CB_SEQUENCE). | | | | operations (SEQUENCE, CB_SEQUENCE). | | |||
| utf8string | typedef opaque utf8string<>; | | | utf8string | typedef opaque utf8string<>; | | |||
| | UTF-8 encoding for strings. | | | | UTF-8 encoding for strings. | | |||
| utf8str_cis | typedef utf8string utf8str_cis; | | | utf8str_cis | typedef utf8string utf8str_cis; | | |||
| | Case-insensitive UTF-8 string. | | | | Case-insensitive UTF-8 string. | | |||
| utf8str_cs | typedef utf8string utf8str_cs; | | | utf8str_cs | typedef utf8string utf8str_cs; | | |||
| | Case-sensitive UTF-8 string. | | | | Case-sensitive UTF-8 string. | | |||
| utf8str_mixed | typedef utf8string utf8str_mixed; | | | utf8str_mixed | typedef utf8string utf8str_mixed; | | |||
| | UTF-8 strings with a case sensitive prefix and a | | | | UTF-8 strings with a case-sensitive prefix and a | | |||
| | case insensitive suffix. | | | | case-insensitive suffix. | | |||
| component4 | typedef utf8str_cs component4; | | | component4 | typedef utf8str_cs component4; | | |||
| | Represents path name components. | | | | Represents path name components. | | |||
| linktext4 | typedef utf8str_cs linktext4; | | | linktext4 | typedef utf8str_cs linktext4; | | |||
| | Symbolic link contents ("symbolic link" is | | | | Symbolic link contents ("symbolic link" is | | |||
| | defined in an Open Group [14] standard). | | | | defined in an Open Group [14] standard). | | |||
| pathname4 | typedef component4 pathname4<>; | | | pathname4 | typedef component4 pathname4<>; | | |||
| | Represents path name for fs_locations. | | | | Represents path name for fs_locations. | | |||
| verifier4 | typedef opaque verifier4[NFS4_VERIFIER_SIZE]; | | | verifier4 | typedef opaque verifier4[NFS4_VERIFIER_SIZE]; | | |||
| | Verifier used for various operations (COMMIT, | | | | Verifier used for various operations (COMMIT, | | |||
| | CREATE, EXCHANGE_ID, OPEN, READDIR, WRITE) | | | | CREATE, EXCHANGE_ID, OPEN, READDIR, WRITE) | | |||
skipping to change at page 91, line 15 | skipping to change at page 90, line 23 | |||
3.3. Structured Data Types | 3.3. Structured Data Types | |||
3.3.1. nfstime4 | 3.3.1. nfstime4 | |||
struct nfstime4 { | struct nfstime4 { | |||
int64_t seconds; | int64_t seconds; | |||
uint32_t nseconds; | uint32_t nseconds; | |||
}; | }; | |||
The nfstime4 data type gives the number of seconds and nanoseconds | The nfstime4 data type gives the number of seconds and nanoseconds | |||
since midnight or 0 hour January 1, 1970 Coordinated Universal Time | since midnight or zero hour January 1, 1970 Coordinated Universal | |||
(UTC). Values greater than zero for the seconds field denote dates | Time (UTC). Values greater than zero for the seconds field denote | |||
after the 0 hour January 1, 1970. Values less than zero for the | dates after the zero hour January 1, 1970. Values less than zero for | |||
seconds field denote dates before the 0 hour January 1, 1970. In | the seconds field denote dates before the zero hour January 1, 1970. | |||
both cases, the nseconds field is to be added to the seconds field | In both cases, the nseconds field is to be added to the seconds field | |||
for the final time representation. For example, if the time to be | for the final time representation. For example, if the time to be | |||
represented is one-half second before 0 hour January 1, 1970, the | represented is one-half second before zero hour January 1, 1970, the | |||
seconds field would have a value of negative one (-1) and the | seconds field would have a value of negative one (-1) and the | |||
nseconds fields would have a value of one-half second (500000000). | nseconds field would have a value of one-half second (500000000). | |||
Values greater than 999,999,999 for nseconds are invalid. | Values greater than 999,999,999 for nseconds are invalid. | |||
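As an illustration of the representation described above, the following sketch converts between a signed nanosecond offset from the epoch and a (seconds, nseconds) pair. The function names are hypothetical; the only rules taken from the text are that nseconds is added to seconds and that nseconds must not exceed 999,999,999.

```python
from typing import Tuple

NSEC_PER_SEC = 1_000_000_000


def to_nfstime4(epoch_nanoseconds: int) -> Tuple[int, int]:
    """Split a signed nanosecond offset from the epoch into (seconds, nseconds).

    Floor division keeps nseconds in 0..999,999,999, so times before the
    epoch get a negative seconds value plus a non-negative nseconds value.
    """
    seconds, nseconds = divmod(epoch_nanoseconds, NSEC_PER_SEC)
    return seconds, nseconds


def from_nfstime4(seconds: int, nseconds: int) -> int:
    """Recombine an nfstime4 pair into a signed nanosecond offset."""
    if not 0 <= nseconds < NSEC_PER_SEC:
        raise ValueError("nseconds greater than 999,999,999 is invalid")
    return seconds * NSEC_PER_SEC + nseconds
```

For one-half second before the epoch, to_nfstime4(-500_000_000) yields (-1, 500000000), matching the example in the text.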
This data type is used to pass time and date information. A server | This data type is used to pass time and date information. A server | |||
converts to and from its local representation of time when processing | converts to and from its local representation of time when processing | |||
time values, preserving as much accuracy as possible. If the | time values, preserving as much accuracy as possible. If the | |||
precision of timestamps stored for a file system object is less than | precision of timestamps stored for a file system object is less than | |||
defined, loss of precision can occur. An adjunct time maintenance | defined, loss of precision can occur. An adjunct time maintenance | |||
protocol is RECOMMENDED to reduce client and server time skew. | protocol is RECOMMENDED to reduce client and server time skew. | |||
3.3.2. time_how4 | 3.3.2. time_how4 | |||
skipping to change at page 92, line 47 | skipping to change at page 92, line 15 | |||
3.3.7. fattr4 | 3.3.7. fattr4 | |||
struct fattr4 { | struct fattr4 { | |||
bitmap4 attrmask; | bitmap4 attrmask; | |||
attrlist4 attr_vals; | attrlist4 attr_vals; | |||
}; | }; | |||
The fattr4 data type is used to represent file and directory | The fattr4 data type is used to represent file and directory | |||
attributes. | attributes. | |||
The bitmap is a counted array of 32 bit integers used to contain bit | The bitmap is a counted array of 32-bit integers used to contain bit | |||
values. The position of the integer in the array that contains bit n | values. The position of the integer in the array that contains bit n | |||
can be computed from the expression (n / 32) and its bit within that | can be computed from the expression (n / 32), and its bit within that | |||
integer is (n mod 32). | integer is (n mod 32). | |||
0 1 | 0 1 | |||
+-----------+-----------+-----------+-- | +-----------+-----------+-----------+-- | |||
| count | 31 .. 0 | 63 .. 32 | | | count | 31 .. 0 | 63 .. 32 | | |||
+-----------+-----------+-----------+-- | +-----------+-----------+-----------+-- | |||
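The word and bit arithmetic described above can be sketched as follows, modeling bitmap4 as a Python list of 32-bit words (the list length plays the role of the XDR count). The helper names are assumptions made for this example.

```python
from typing import List, Tuple


def bit_location(n: int) -> Tuple[int, int]:
    """Return (word index, bit within word) for attribute bit n."""
    return n // 32, n % 32


def bitmap_set(bitmap: List[int], n: int) -> None:
    """Set bit n, growing the word array with zero words if needed."""
    word, bit = bit_location(n)
    while len(bitmap) <= word:
        bitmap.append(0)
    bitmap[word] |= 1 << bit


def bitmap_test(bitmap: List[int], n: int) -> bool:
    """Report whether bit n is set; bits beyond the array are clear."""
    word, bit = bit_location(n)
    return word < len(bitmap) and bool(bitmap[word] & (1 << bit))
```

For instance, bit 33 lives in word 1 at bit position 1, so setting it on an empty bitmap produces a two-word array whose second word has the value 2.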
3.3.8. change_info4 | 3.3.8. change_info4 | |||
struct change_info4 { | struct change_info4 { | |||
skipping to change at page 93, line 35 | skipping to change at page 92, line 50 | |||
struct netaddr4 { | struct netaddr4 { | |||
/* see struct rpcb in RFC 1833 */ | /* see struct rpcb in RFC 1833 */ | |||
string na_r_netid<>; /* network id */ | string na_r_netid<>; /* network id */ | |||
string na_r_addr<>; /* universal address */ | string na_r_addr<>; /* universal address */ | |||
}; | }; | |||
The netaddr4 data type is used to identify network transport | The netaddr4 data type is used to identify network transport | |||
endpoints. The r_netid and r_addr fields respectively contain a | endpoints. The r_netid and r_addr fields respectively contain a | |||
netid and uaddr. The netid and uaddr concepts are defined in [15]. | netid and uaddr. The netid and uaddr concepts are defined in [15]. | |||
The netid and uaddr formats for TCP over IPv4 and TCP over IPv6 are | The netid and uaddr formats for TCP over IPv4 and TCP over IPv6 are | |||
defined in [15], specifically Tables 2 and 3 and Sections 4.2.3.3 and | defined in [15], specifically Tables 2 and 3 and Sections 5.2.3.3 and | |||
4.2.3.4. | 5.2.3.4. | |||
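As an illustration of the uaddr concept, the sketch below splits a TCP-over-IPv4 universal address of the form h1.h2.h3.h4.p1.p2 into a host and a port, where the port is p1 * 256 + p2. It assumes the netid is "tcp" and handles IPv4 only; the function name parse_ipv4_uaddr is hypothetical and the normative format remains the one defined in [15].

```python
def parse_ipv4_uaddr(uaddr):
    """Split an IPv4 universal address 'h1.h2.h3.h4.p1.p2' into (host, port)."""
    parts = uaddr.split(".")
    if len(parts) != 6:
        raise ValueError("expected six dot-separated decimal fields")
    octets = [int(p) for p in parts]
    if any(not 0 <= o <= 255 for o in octets):
        raise ValueError("each field must be in the range 0..255")
    host = ".".join(parts[:4])
    # The last two fields are the high and low bytes of the port number.
    port = octets[4] * 256 + octets[5]
    return host, port
```

With this encoding, the address 192.0.2.1 on port 2049 is written as "192.0.2.1.8.1", since 8 * 256 + 1 = 2049.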
3.3.10. state_owner4 | 3.3.10. state_owner4 | |||
struct state_owner4 { | struct state_owner4 { | |||
clientid4 clientid; | clientid4 clientid; | |||
opaque owner<NFS4_OPAQUE_LIMIT>; | opaque owner<NFS4_OPAQUE_LIMIT>; | |||
}; | }; | |||
typedef state_owner4 open_owner4; | typedef state_owner4 open_owner4; | |||
typedef state_owner4 lock_owner4; | typedef state_owner4 lock_owner4; | |||
The state_owner4 data type is the base type for the open_owner4 | The state_owner4 data type is the base type for the open_owner4 | |||
Section 3.3.10.1 and lock_owner4 Section 3.3.10.2. | (Section 3.3.10.1) and lock_owner4 (Section 3.3.10.2). | |||
3.3.10.1. open_owner4 | 3.3.10.1. open_owner4 | |||
This data type is used to identify the owner of open state. | This data type is used to identify the owner of OPEN state. | |||
3.3.10.2. lock_owner4 | 3.3.10.2. lock_owner4 | |||
This structure is used to identify the owner of byte-range locking | This structure is used to identify the owner of byte-range locking | |||
state. | state. | |||
3.3.11. open_to_lock_owner4 | 3.3.11. open_to_lock_owner4 | |||
struct open_to_lock_owner4 { | struct open_to_lock_owner4 { | |||
seqid4 open_seqid; | seqid4 open_seqid; | |||
stateid4 open_stateid; | stateid4 open_stateid; | |||
seqid4 lock_seqid; | seqid4 lock_seqid; | |||
lock_owner4 lock_owner; | lock_owner4 lock_owner; | |||
}; | }; | |||
This data type is used for the first LOCK operation done for an | This data type is used for the first LOCK operation done for an | |||
open_owner4. It provides both the open_stateid and lock_owner such | open_owner4. It provides both the open_stateid and lock_owner, such | |||
that the transition is made from a valid open_stateid sequence to | that the transition is made from a valid open_stateid sequence to | |||
that of the new lock_stateid sequence. Using this mechanism avoids | that of the new lock_stateid sequence. Using this mechanism avoids | |||
the confirmation of the lock_owner/lock_seqid pair since it is tied | the confirmation of the lock_owner/lock_seqid pair since it is tied | |||
to established state in the form of the open_stateid/open_seqid. | to established state in the form of the open_stateid/open_seqid. | |||
3.3.12. stateid4 | 3.3.12. stateid4 | |||
struct stateid4 { | struct stateid4 { | |||
uint32_t seqid; | uint32_t seqid; | |||
opaque other[12]; | opaque other[12]; | |||
}; | }; | |||
This data type is used for the various state sharing mechanisms | This data type is used for the various state sharing mechanisms | |||
between the client and server. The client never modifies a value of | between the client and server. The client never modifies a value of | |||
data type stateid. The starting value of the seqid field is | data type stateid. The starting value of the "seqid" field is | |||
undefined. The server is required to increment the seqid field by | undefined. The server is required to increment the "seqid" field by | |||
one (1) at each transition of the stateid. This is important since | one at each transition of the stateid. This is important since the | |||
the client will inspect the seqid in OPEN stateids to determine the | client will inspect the seqid in OPEN stateids to determine the order | |||
order of OPEN processing done by the server. | of OPEN processing done by the server. | |||
3.3.13. layouttype4 | 3.3.13. layouttype4 | |||
enum layouttype4 { | enum layouttype4 { | |||
LAYOUT4_NFSV4_1_FILES = 0x1, | LAYOUT4_NFSV4_1_FILES = 0x1, | |||
LAYOUT4_OSD2_OBJECTS = 0x2, | LAYOUT4_OSD2_OBJECTS = 0x2, | |||
LAYOUT4_BLOCK_VOLUME = 0x3 | LAYOUT4_BLOCK_VOLUME = 0x3 | |||
}; | }; | |||
This data type indicates what type of layout is being used. The file | This data type indicates what type of layout is being used. The file | |||
server advertises the layout types it supports through the | server advertises the layout types it supports through the | |||
fs_layout_type file system attribute (Section 5.12.1). A client asks | fs_layout_type file system attribute (Section 5.12.1). A client asks | |||
for layouts of a particular type in LAYOUTGET, and processes those | for layouts of a particular type in LAYOUTGET, and processes those | |||
layouts in its layout-type-specific logic. | layouts in its layout-type-specific logic. | |||
The layouttype4 data type is 32 bits in length. The range | The layouttype4 data type is 32 bits in length. The range | |||
represented by the layout type is split into three parts. Type 0x0 | represented by the layout type is split into three parts. Type 0x0 | |||
is reserved. Types within the range 0x00000001-0x7FFFFFFF are | is reserved. Types within the range 0x00000001-0x7FFFFFFF are | |||
globally unique and are assigned according to the description in | globally unique and are assigned according to the description in | |||
skipping to change at page 96, line 6 | skipping to change at page 95, line 22 | |||
The device address is used to set up a communication channel with the | The device address is used to set up a communication channel with the | |||
storage device. Different layout types will require different data | storage device. Different layout types will require different data | |||
types to define how they communicate with storage devices. The | types to define how they communicate with storage devices. The | |||
opaque da_addr_body field is interpreted based on the specified | opaque da_addr_body field is interpreted based on the specified | |||
da_layout_type field. | da_layout_type field. | |||
This document defines the device address for the NFSv4.1 file layout | This document defines the device address for the NFSv4.1 file layout | |||
(see Section 13.3), which identifies a storage device by network IP | (see Section 13.3), which identifies a storage device by network IP | |||
address and port number. This is sufficient for the clients to | address and port number. This is sufficient for the clients to | |||
communicate with the NFSv4.1 storage devices, and may be sufficient | communicate with the NFSv4.1 storage devices, and may be sufficient | |||
for other layout types as well. Device types for object storage | for other layout types as well. Device types for object-based | |||
devices and block storage devices (e.g., SCSI volume labels) are | storage devices and block storage devices (e.g., Small Computer | |||
defined by their respective layout specifications. | System Interface (SCSI) volume labels) are defined by their | |||
respective layout specifications. | ||||
3.3.16. layout_content4 | 3.3.16. layout_content4 | |||
struct layout_content4 { | struct layout_content4 { | |||
layouttype4 loc_type; | layouttype4 loc_type; | |||
opaque loc_body<>; | opaque loc_body<>; | |||
}; | }; | |||
The loc_body field is interpreted based on the layout type | The loc_body field is interpreted based on the layout type | |||
(loc_type). This document defines the loc_body for the NFSv4.1 file | (loc_type). This document defines the loc_body for the NFSv4.1 file | |||
layout type is defined; see Section 13.3 for its definition. | layout type; see Section 13.3 for its definition. | |||
3.3.17. layout4 | 3.3.17. layout4 | |||
struct layout4 { | struct layout4 { | |||
offset4 lo_offset; | offset4 lo_offset; | |||
length4 lo_length; | length4 lo_length; | |||
layoutiomode4 lo_iomode; | layoutiomode4 lo_iomode; | |||
layout_content4 lo_content; | layout_content4 lo_content; | |||
}; | }; | |||
The layout4 data type defines a layout for a file. The | The layout4 data type defines a layout for a file. The | |||
layout-type-specific data is opaque within lo_content. Since layouts | layout-type-specific data is opaque within lo_content. Since layouts | |||
are subdividable, the offset and length together with the file's | are subdividable, the offset and length together with the file's | |||
filehandle, the client ID, iomode, and layout type, identify the layout. | filehandle, the client ID, iomode, and layout type identify the layout. | |||
3.3.18. layoutupdate4 | 3.3.18. layoutupdate4 | |||
struct layoutupdate4 { | struct layoutupdate4 { | |||
layouttype4 lou_type; | layouttype4 lou_type; | |||
opaque lou_body<>; | opaque lou_body<>; | |||
}; | }; | |||
The layoutupdate4 data type is used by the client to return updated | The layoutupdate4 data type is used by the client to return updated | |||
layout information to the metadata server via the LAYOUTCOMMIT | layout information to the metadata server via the LAYOUTCOMMIT | |||
(Section 18.42) operation. This data type provides a channel to pass | (Section 18.42) operation. This data type provides a channel to pass | |||
layout type specific information (in field lou_body) back to the | layout type specific information (in field lou_body) back to the | |||
metadata server. E.g., for the block/volume layout type this could | metadata server. For example, for the block/volume layout type, this | |||
include the list of reserved blocks that were written. The contents | could include the list of reserved blocks that were written. The | |||
of the opaque lou_body argument are determined by the layout type. | contents of the opaque lou_body argument are determined by the layout | |||
The NFSv4.1 file-based layout does not use this data type; if | type. The NFSv4.1 file-based layout does not use this data type; if | |||
lou_type is LAYOUT4_NFSV4_1_FILES, the lou_body field MUST have a | lou_type is LAYOUT4_NFSV4_1_FILES, the lou_body field MUST have a | |||
zero length. | zero length. | |||
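The zero-length requirement for the file-based layout could be checked along these lines. This is a sketch only: the constant value is taken from the layouttype4 enumeration, while the function name layoutupdate_is_valid and the use of a Python bytes object for lou_body are assumptions made for illustration. The contents of lou_body for other layout types are left to their own specifications and are not inspected here.

```python
LAYOUT4_NFSV4_1_FILES = 0x1  # value from the layouttype4 enumeration


def layoutupdate_is_valid(lou_type, lou_body):
    """Apply the one rule stated in the text: a file-layout update has an empty body."""
    if lou_type == LAYOUT4_NFSV4_1_FILES:
        return len(lou_body) == 0
    # Other layout types define their own lou_body contents; not checked here.
    return True
```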
3.3.19. layouthint4 | 3.3.19. layouthint4 | |||
struct layouthint4 { | struct layouthint4 { | |||
layouttype4 loh_type; | layouttype4 loh_type; | |||
opaque loh_body<>; | opaque loh_body<>; | |||
}; | }; | |||
The layouthint4 data type is used by the client to pass in a hint | The layouthint4 data type is used by the client to pass in a hint | |||
about the type of layout it would like created for a particular file. | about the type of layout it would like created for a particular file. | |||
It is the data type specified by the layout_hint attribute described | It is the data type specified by the layout_hint attribute described | |||
in Section 5.12.4. The metadata server may ignore the hint, or may | in Section 5.12.4. The metadata server may ignore the hint or may | |||
selectively ignore fields within the hint. This hint should be | selectively ignore fields within the hint. This hint should be | |||
provided at create time as part of the initial attributes within | provided at create time as part of the initial attributes within | |||
OPEN. The loh_body field is specific to the type of layout | OPEN. The loh_body field is specific to the type of layout | |||
(loh_type). The NFSv4.1 file-based layout uses the | (loh_type). The NFSv4.1 file-based layout uses the | |||
nfsv4_1_file_layouthint4 data type as defined in Section 13.3. | nfsv4_1_file_layouthint4 data type as defined in Section 13.3. | |||
3.3.20. layoutiomode4 | 3.3.20. layoutiomode4 | |||
enum layoutiomode4 { | enum layoutiomode4 { | |||
LAYOUTIOMODE4_READ = 1, | LAYOUTIOMODE4_READ = 1, | |||
skipping to change at page 97, line 50 | skipping to change at page 97, line 19 | |||
3.3.21. nfs_impl_id4 | 3.3.21. nfs_impl_id4 | |||
struct nfs_impl_id4 { | struct nfs_impl_id4 { | |||
utf8str_cis nii_domain; | utf8str_cis nii_domain; | |||
utf8str_cs nii_name; | utf8str_cs nii_name; | |||
nfstime4 nii_date; | nfstime4 nii_date; | |||
}; | }; | |||
This data type is used to identify client and server implementation | This data type is used to identify client and server implementation | |||
details. The nii_domain field is the DNS domain name that the | details. The nii_domain field is the DNS domain name with which the | |||
implementer is associated with. The nii_name field is the product | implementor is associated. The nii_name field is the product name of | |||
name of the implementation and is completely free form. It is | the implementation and is completely free form. It is RECOMMENDED | |||
RECOMMENDED that the nii_name be used to distinguish machine | that the nii_name be used to distinguish machine architecture, | |||
architecture, machine platforms, revisions, versions, and patch | machine platforms, revisions, versions, and patch levels. The | |||
levels. The nii_date field is the timestamp of when the software | nii_date field is the timestamp of when the software instance was | |||
instance was published or built. | published or built. | |||
3.3.22. threshold_item4 | 3.3.22. threshold_item4 | |||
struct threshold_item4 { | struct threshold_item4 { | |||
layouttype4 thi_layout_type; | layouttype4 thi_layout_type; | |||
bitmap4 thi_hintset; | bitmap4 thi_hintset; | |||
opaque thi_hintlist<>; | opaque thi_hintlist<>; | |||
}; | }; | |||
This data type contains a list of hints specific to a layout type for | This data type contains a list of hints specific to a layout type for | |||
helping the client determine when it should send I/O directly through | helping the client determine when it should send I/O directly through | |||
the metadata server versus the storage devices. The data type | the metadata server versus the storage devices. The data type | |||
consists of the layout type (thi_layout_type), a bitmap (thi_hintset) | consists of the layout type (thi_layout_type), a bitmap (thi_hintset) | |||
describing the set of hints supported by the server (they may differ | describing the set of hints supported by the server (they may differ | |||
based on the layout type), and a list of hints (thi_hintlist), whose | based on the layout type), and a list of hints (thi_hintlist) whose | |||
content is determined by the hintset bitmap. See the mdsthreshold | content is determined by the hintset bitmap. See the mdsthreshold | |||
attribute for more details. | attribute for more details. | |||
The thi_hintset field is a bitmap of the following values: | The thi_hintset field is a bitmap of the following values: | |||
+-------------------------+---+---------+---------------------------+ | +-------------------------+---+---------+---------------------------+ | |||
| name | # | Data | Description | | | name | # | Data | Description | | |||
| | | Type | | | | | | Type | | | |||
+-------------------------+---+---------+---------------------------+ | +-------------------------+---+---------+---------------------------+ | |||
| threshold4_read_size | 0 | length4 | The file size below which | | | threshold4_read_size | 0 | length4 | If a file's length is | | |||
| | | | it is RECOMMENDED to read | | | | | | less than the value of | | |||
| | | | data through the MDS. | | | | | | threshold4_read_size, | | |||
| threshold4_write_size | 1 | length4 | The file size below which | | | | | | then it is RECOMMENDED | | |||
| | | | it is RECOMMENDED to | | | | | | that the client read from | | |||
| | | | write data through the | | | | | | the file via the MDS and | | |||
| | | | MDS. | | | | | | not a storage device. | | |||
| threshold4_write_size | 1 | length4 | If a file's length is | | ||||
| | | | less than the value of | | ||||
| | | | threshold4_write_size, | | ||||
| | | | then it is RECOMMENDED | | ||||
| | | | that the client write to | | ||||
| | | | the file via the MDS and | | ||||
| | | | not a storage device. | | ||||
| threshold4_read_iosize | 2 | length4 | For read I/O sizes below | | | threshold4_read_iosize | 2 | length4 | For read I/O sizes below | | |||
| | | | this threshold it is | | | | | | this threshold, it is | | |||
| | | | RECOMMENDED to read data | | | | | | RECOMMENDED to read data | | |||
| | | | through the MDS | | | | | | through the MDS. | | |||
| threshold4_write_iosize | 3 | length4 | For write I/O sizes below | | | threshold4_write_iosize | 3 | length4 | For write I/O sizes below | | |||
| | | | this threshold it is | | | | | | this threshold, it is | | |||
| | | | RECOMMENDED to write data | | | | | | RECOMMENDED to write data | | |||
| | | | through the MDS | | | | | | through the MDS. | | |||
+-------------------------+---+---------+---------------------------+ | +-------------------------+---+---------+---------------------------+ | |||
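A client-side use of these hints might look like the sketch below. It assumes the hints have already been decoded from thi_hintlist into a dictionary keyed by bit number, and it assumes that any single applicable hint is enough to prefer the MDS; the table above does not state how multiple hints combine, so that rule is an illustrative choice. The constant and function names are hypothetical.

```python
# Bit numbers from the thi_hintset table.
TH4_READ_SIZE = 0
TH4_WRITE_SIZE = 1
TH4_READ_IOSIZE = 2
TH4_WRITE_IOSIZE = 3


def prefer_mds(hints, file_size, io_size, is_write):
    """Return True if a present hint recommends sending this I/O through the MDS.

    hints maps a bit number to its decoded length4 value; a hint that is
    absent from the mapping is treated as not supplied by the server.
    """
    size_bit = TH4_WRITE_SIZE if is_write else TH4_READ_SIZE
    iosize_bit = TH4_WRITE_IOSIZE if is_write else TH4_READ_IOSIZE
    if size_bit in hints and file_size < hints[size_bit]:
        return True
    if iosize_bit in hints and io_size < hints[iosize_bit]:
        return True
    return False
```

Because the hints are RECOMMENDED rather than required, a client remains free to ignore the result.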
3.3.23. mdsthreshold4 | 3.3.23. mdsthreshold4 | |||
struct mdsthreshold4 { | struct mdsthreshold4 { | |||
threshold_item4 mth_hints<>; | threshold_item4 mth_hints<>; | |||
}; | }; | |||
This data type holds an array of elements of data type | This data type holds an array of elements of data type | |||
threshold_item4, each of which is valid for a particular layout type. | threshold_item4, each of which is valid for a particular layout type. | |||
An array is necessary because a server can support multiple layout | An array is necessary because a server can support multiple layout | |||
types for a single file. | types for a single file. | |||
4. Filehandles | 4. Filehandles | |||
The filehandle in the NFS protocol is a per server unique identifier | The filehandle in the NFS protocol is a per-server unique identifier | |||
for a file system object. The contents of the filehandle are opaque | for a file system object. The contents of the filehandle are opaque | |||
to the client. Therefore, the server is responsible for translating | to the client. Therefore, the server is responsible for translating | |||
the filehandle to an internal representation of the file system | the filehandle to an internal representation of the file system | |||
object. | object. | |||
4.1. Obtaining the First Filehandle | 4.1. Obtaining the First Filehandle | |||
The operations of the NFS protocol are defined in terms of one or | The operations of the NFS protocol are defined in terms of one or | |||
more filehandles. Therefore, the client needs a filehandle to | more filehandles. Therefore, the client needs a filehandle to | |||
initiate communication with the server. With the NFSv3 protocol | initiate communication with the server. With the NFSv3 protocol (RFC | |||
(RFC1813 [31]), there exists an ancillary protocol to obtain this | 1813 [31]), there exists an ancillary protocol to obtain this first | |||
first filehandle. The MOUNT protocol, RPC program number 100005, | filehandle. The MOUNT protocol, RPC program number 100005, provides | |||
provides the mechanism of translating a string based file system path | the mechanism of translating a string-based file system pathname to a | |||
name to a filehandle which can then be used by the NFS protocols. | filehandle, which can then be used by the NFS protocols. | |||
The MOUNT protocol has deficiencies in the area of security and use | The MOUNT protocol has deficiencies in the area of security and use | |||
via firewalls. This is one reason that the use of the public | via firewalls. This is one reason that the use of the public | |||
filehandle was introduced in RFC2054 [42] and RFC2055 [43]. With the | filehandle was introduced in RFC 2054 [42] and RFC 2055 [43]. With | |||
use of the public filehandle in combination with the LOOKUP operation | the use of the public filehandle in combination with the LOOKUP | |||
in the NFSv3 protocol, it has been demonstrated that the MOUNT | operation in the NFSv3 protocol, it has been demonstrated that the | |||
protocol is unnecessary for viable interaction between NFS client and | MOUNT protocol is unnecessary for viable interaction between NFS | |||
server. | client and server. | |||
Therefore, the NFSv4.1 protocol will not use an ancillary protocol | Therefore, the NFSv4.1 protocol will not use an ancillary protocol | |||
for translation from string based path names to a filehandle. Two | for translation from string-based pathnames to a filehandle. Two | |||
special filehandles will be used as starting points for the NFS | special filehandles will be used as starting points for the NFS | |||
client. | client. | |||
4.1.1. Root Filehandle | 4.1.1. Root Filehandle | |||
The first of the special filehandles is the ROOT filehandle. The | The first of the special filehandles is the ROOT filehandle. The | |||
ROOT filehandle is the "conceptual" root of the file system name | ROOT filehandle is the "conceptual" root of the file system namespace | |||
space at the NFS server. The client uses or starts with the ROOT | at the NFS server. The client uses or starts with the ROOT | |||
filehandle by employing the PUTROOTFH operation. The PUTROOTFH | filehandle by employing the PUTROOTFH operation. The PUTROOTFH | |||
operation instructs the server to set the "current" filehandle to the | operation instructs the server to set the "current" filehandle to the | |||
ROOT of the server's file tree. Once this PUTROOTFH operation is | ROOT of the server's file tree. Once this PUTROOTFH operation is | |||
used, the client can then traverse the entirety of the server's file | used, the client can then traverse the entirety of the server's file | |||
tree with the LOOKUP operation. A complete discussion of the server | tree with the LOOKUP operation. A complete discussion of the server | |||
name space is in the Section 7. | namespace is in Section 7. | |||
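The traversal described above can be pictured as an ordered list of operations: one PUTROOTFH followed by one LOOKUP per path component. The sketch below only builds that list as Python tuples; it does not encode or send a COMPOUND request, and the function name ops_for_path and the tuple representation are assumptions made for illustration.

```python
def ops_for_path(path):
    """Return the operation sequence that walks from the ROOT filehandle to path."""
    ops = [("PUTROOTFH",)]
    for component in path.split("/"):
        if component:  # skip empty strings produced by leading or doubled slashes
            ops.append(("LOOKUP", component))
    return ops
```

For the path /a/b/c this yields PUTROOTFH followed by LOOKUP "a", LOOKUP "b", and LOOKUP "c"; for the path / it yields PUTROOTFH alone.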
4.1.2. Public Filehandle | 4.1.2. Public Filehandle | |||
The second special filehandle is the PUBLIC filehandle. Unlike the | The second special filehandle is the PUBLIC filehandle. Unlike the | |||
ROOT filehandle, the PUBLIC filehandle may be bound to or represent an | ROOT filehandle, the PUBLIC filehandle may be bound to or represent an | |||
arbitrary file system object at the server. The server is | arbitrary file system object at the server. The server is | |||
responsible for this binding. It may be that the PUBLIC filehandle | responsible for this binding. It may be that the PUBLIC filehandle | |||
and the ROOT filehandle refer to the same file system object. | and the ROOT filehandle refer to the same file system object. | |||
However, it is up to the administrative software at the server and | However, it is up to the administrative software at the server and | |||
the policies of the server administrator to define the binding of the | the policies of the server administrator to define the binding of the | |||
skipping to change at page 100, line 34 | skipping to change at page 100, line 15 | |||
4.2. Filehandle Types | 4.2. Filehandle Types | |||
In the NFSv3 protocol, there was one type of filehandle with a single | In the NFSv3 protocol, there was one type of filehandle with a single | |||
set of semantics. This type of filehandle is termed "persistent" in | set of semantics. This type of filehandle is termed "persistent" in | |||
NFSv4.1. The semantics of a persistent filehandle remain the same as | NFSv4.1. The semantics of a persistent filehandle remain the same as | |||
before. A new type of filehandle introduced in NFSv4.1 is the | before. A new type of filehandle introduced in NFSv4.1 is the | |||
"volatile" filehandle, which attempts to accommodate certain server | "volatile" filehandle, which attempts to accommodate certain server | |||
environments. | environments. | |||
The volatile filehandle type was introduced to address server | The volatile filehandle type was introduced to address server | |||
functionality or implementation issues which make correct | functionality or implementation issues that make correct | |||
implementation of a persistent filehandle infeasible. Some server | implementation of a persistent filehandle infeasible. Some server | |||
environments do not provide a file system level invariant that can be | environments do not provide a file-system-level invariant that can be | |||
used to construct a persistent filehandle. The underlying server | used to construct a persistent filehandle. The underlying server | |||
file system may not provide the invariant or the server's file system | file system may not provide the invariant or the server's file system | |||
programming interfaces may not provide access to the needed | programming interfaces may not provide access to the needed | |||
invariant. Volatile filehandles may ease the implementation of | invariant. Volatile filehandles may ease the implementation of | |||
server functionality such as hierarchical storage management or file | server functionality such as hierarchical storage management or file | |||
system reorganization or migration. However, the volatile filehandle | system reorganization or migration. However, the volatile filehandle | |||
increases the implementation burden for the client. | increases the implementation burden for the client. | |||
Since the client will need to handle persistent and volatile | Since the client will need to handle persistent and volatile | |||
filehandles differently, a file attribute is defined which may be | filehandles differently, a file attribute is defined that may be used | |||
used by the client to determine the filehandle types being returned | by the client to determine the filehandle types being returned by the | |||
by the server. | server. | |||
4.2.1. General Properties of a Filehandle | 4.2.1. General Properties of a Filehandle | |||
The filehandle contains all the information the server needs to | The filehandle contains all the information the server needs to | |||
distinguish an individual file. To the client, the filehandle is | distinguish an individual file. To the client, the filehandle is | |||
opaque. The client stores filehandles for use in a later request and | opaque. The client stores filehandles for use in a later request and | |||
can compare two filehandles from the same server for equality by | can compare two filehandles from the same server for equality by | |||
doing a byte-by-byte comparison. However, the client MUST NOT | doing a byte-by-byte comparison. However, the client MUST NOT | |||
otherwise interpret the contents of filehandles. If two filehandles | otherwise interpret the contents of filehandles. If two filehandles | |||
from the same server are equal, they MUST refer to the same file. | from the same server are equal, they MUST refer to the same file. | |||
Servers SHOULD try to maintain a one-to-one correspondence between | Servers SHOULD try to maintain a one-to-one correspondence between | |||
filehandles and files but this is not required. Clients MUST use | filehandles and files, but this is not required. Clients MUST use | |||
filehandle comparisons only to improve performance, not for correct | filehandle comparisons only to improve performance, not for correct | |||
behavior. All clients need to be prepared for situations in which it | behavior. All clients need to be prepared for situations in which it | |||
cannot be determined whether two filehandles denote the same object | cannot be determined whether two filehandles denote the same object | |||
and in such cases, avoid making invalid assumptions which might cause | and in such cases, avoid making invalid assumptions that might cause | |||
incorrect behavior. Further discussion of filehandle and attribute | incorrect behavior. Further discussion of filehandle and attribute | |||
comparison in the context of data caching is presented in the | comparison in the context of data caching is presented in | |||
Section 10.3.4. | Section 10.3.4. | |||
As an example, in the case that two different path names when | As an example, in the case that two different path names when | |||
traversed at the server terminate at the same file system object, the | traversed at the server terminate at the same file system object, the | |||
server SHOULD return the same filehandle for each path. This can | server SHOULD return the same filehandle for each path. This can | |||
occur if a hard link (see [6]) is used to create two file names which | occur if a hard link (see [6]) is used to create two file names that | |||
refer to the same underlying file object and associated data. For | refer to the same underlying file object and associated data. For | |||
example, if paths /a/b/c and /a/d/c refer to the same file, the | example, if paths /a/b/c and /a/d/c refer to the same file, the | |||
server SHOULD return the same filehandle for both path name | server SHOULD return the same filehandle for both pathnames' | |||
traversals. | traversals. | |||
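The comparison rule in this section is asymmetric: byte-for-byte equality of two filehandles from the same server proves they name the same file, while inequality proves nothing. A minimal sketch of that rule follows, using True for "known to be the same" and None for "cannot be determined". The function name and the representation of a server as a comparable identifier are hypothetical.

```python
def same_object(server_a, fh_a, server_b, fh_b):
    """Return True if the filehandles are known to denote the same object, else None.

    fh_a and fh_b are opaque bytes objects; they are compared but never interpreted.
    """
    if server_a == server_b and fh_a == fh_b:
        return True
    # Differing filehandles may still refer to the same object, so the
    # answer is "unknown" rather than False.
    return None
```

A caller that receives None should fall back to behavior that is correct whether or not the objects are the same, using a True result only as a performance shortcut.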
4.2.2. Persistent Filehandle | 4.2.2. Persistent Filehandle | |||
A persistent filehandle is defined as having a fixed value for the | A persistent filehandle is defined as having a fixed value for the | |||
lifetime of the file system object to which it refers. Once the | lifetime of the file system object to which it refers. Once the | |||
server creates the filehandle for a file system object, the server | server creates the filehandle for a file system object, the server | |||
MUST accept the same filehandle for the object for the lifetime of | MUST accept the same filehandle for the object for the lifetime of | |||
the object. If the server restarts, the NFS server MUST honor the | the object. If the server restarts, the NFS server MUST honor the | |||
same filehandle value as it did in the server's previous | same filehandle value as it did in the server's previous | |||
skipping to change at page 101, line 52 | skipping to change at page 101, line 29 | |||
NFS server MUST honor the same filehandle as the old NFS server. | NFS server MUST honor the same filehandle as the old NFS server. | |||
The persistent filehandle will become stale or invalid when the | The persistent filehandle will become stale or invalid when the | |||
file system object is removed. When the server is presented with a | file system object is removed. When the server is presented with a | |||
persistent filehandle that refers to a deleted object, it MUST return | persistent filehandle that refers to a deleted object, it MUST return | |||
an error of NFS4ERR_STALE. A filehandle may become stale when the | an error of NFS4ERR_STALE. A filehandle may become stale when the | |||
file system containing the object is no longer available. The file | file system containing the object is no longer available. The file | |||
system may become unavailable if it exists on removable media and the | system may become unavailable if it exists on removable media and the | |||
media is no longer available at the server or the file system in | media is no longer available at the server or the file system in | |||
whole has been destroyed or the file system has simply been removed | whole has been destroyed or the file system has simply been removed | |||
from the server's name space (i.e. unmounted in a UNIX environment). | from the server's namespace (i.e., unmounted in a UNIX environment). | |||
4.2.3. Volatile Filehandle | 4.2.3. Volatile Filehandle | |||
A volatile filehandle does not share the same longevity | A volatile filehandle does not share the same longevity | |||
characteristics as a persistent filehandle. The server may determine | characteristics as a persistent filehandle. The server may determine | |||
that a volatile filehandle is no longer valid at many different | that a volatile filehandle is no longer valid at many different | |||
points in time. If the server can definitively determine that a | points in time. If the server can definitively determine that a | |||
volatile filehandle refers to an object that has been removed, the | volatile filehandle refers to an object that has been removed, the | |||
server should return NFS4ERR_STALE to the client (as is the case for | server should return NFS4ERR_STALE to the client (as is the case for | |||
persistent filehandles). In all other cases where the server | persistent filehandles). In all other cases where the server | |||
skipping to change at page 102, line 29 | skipping to change at page 102, line 12 | |||
particular file system. This attribute is a bitmask with the | particular file system. This attribute is a bitmask with the | |||
following values: | following values: | |||
FH4_PERSISTENT The value of FH4_PERSISTENT is used to indicate a | FH4_PERSISTENT The value of FH4_PERSISTENT is used to indicate a | |||
persistent filehandle, which is valid until the object is removed | persistent filehandle, which is valid until the object is removed | |||
from the file system. The server will not return | from the file system. The server will not return | |||
NFS4ERR_FHEXPIRED for this filehandle. FH4_PERSISTENT is defined | NFS4ERR_FHEXPIRED for this filehandle. FH4_PERSISTENT is defined | |||
as a value in which none of the bits specified below are set. | as a value in which none of the bits specified below are set. | |||
FH4_VOLATILE_ANY The filehandle may expire at any time, except as | FH4_VOLATILE_ANY The filehandle may expire at any time, except as | |||
specifically excluded (i.e. FH4_NOEXPIRE_WITH_OPEN). | specifically excluded (i.e., FH4_NOEXPIRE_WITH_OPEN). | |||
FH4_NOEXPIRE_WITH_OPEN May only be set when FH4_VOLATILE_ANY is set. | FH4_NOEXPIRE_WITH_OPEN May only be set when FH4_VOLATILE_ANY is set. | |||
If this bit is set, then the meaning of FH4_VOLATILE_ANY is | If this bit is set, then the meaning of FH4_VOLATILE_ANY is | |||
qualified to exclude any expiration of the filehandle when it is | qualified to exclude any expiration of the filehandle when it is | |||
open. | open. | |||
FH4_VOL_MIGRATION The filehandle will expire as a result of a file | FH4_VOL_MIGRATION The filehandle will expire as a result of a file | |||
system transition (migration or replication), in those case in | system transition (migration or replication), in those cases in | |||
which the continuity of filehandle use is not specified by | which the continuity of filehandle use is not specified by handle | |||
_handle_ class information within the fs_locations_info attribute. | class information within the fs_locations_info attribute. When | |||
When this bit is set, clients without access to fs_locations_info | this bit is set, clients without access to fs_locations_info | |||
information should assume filehandles will expire on file system | information should assume that filehandles will expire on file | |||
transitions. | system transitions. | |||
FH4_VOL_RENAME The filehandle will expire during rename. This | FH4_VOL_RENAME The filehandle will expire during rename. This | |||
includes a rename by the requesting client or a rename by any | includes a rename by the requesting client or a rename by any | |||
other client. If FH4_VOLATILE_ANY is set, FH4_VOL_RENAME is | other client. If FH4_VOLATILE_ANY is set, FH4_VOL_RENAME is redundant. | |||
redundant. | ||||
Servers which provide volatile filehandles that may expire while open | Servers that provide volatile filehandles that can expire while open | |||
require special care as regards handling of RENAMEs and REMOVEs. | require special care as regards handling of RENAMEs and REMOVEs. | |||
This situation can arise if FH4_VOL_MIGRATION or FH4_VOL_RENAME is | This situation can arise if FH4_VOL_MIGRATION or FH4_VOL_RENAME is | |||
set, if FH4_VOLATILE_ANY is set and FH4_NOEXPIRE_WITH_OPEN not set, | set, if FH4_VOLATILE_ANY is set and FH4_NOEXPIRE_WITH_OPEN is not | |||
or if a non-readonly file system has a transition target in a | set, or if a non-read-only file system has a transition target in a | |||
different _handle _ class. In these cases, the server should deny a | different handle class. In these cases, the server should deny a | |||
RENAME or REMOVE that would affect an OPEN file of any of the | RENAME or REMOVE that would affect an OPEN file of any of the | |||
components leading to the OPEN file. In addition, the server should | components leading to the OPEN file. In addition, the server should | |||
deny all RENAME or REMOVE requests during the grace period, in order | deny all RENAME or REMOVE requests during the grace period, in order | |||
to make sure that reclaims of files where filehandles may have | to make sure that reclaims of files where filehandles may have | |||
expired do not do a reclaim for the wrong file. | expired do not do a reclaim for the wrong file. | |||
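
As an illustration of how the fh_expire_type bitmask described above might be interpreted, the following Python sketch tests the individual bits. The numeric constants are those given for these names in the protocol's XDR description; the helper function names are hypothetical and are not part of the protocol. The check for a transition target in a different handle class is omitted because it depends on fs_locations_info rather than on this bitmask.

```python
# Bit values for fh_expire_type, as given in the protocol's XDR description.
FH4_PERSISTENT = 0x00000000
FH4_NOEXPIRE_WITH_OPEN = 0x00000001
FH4_VOLATILE_ANY = 0x00000002
FH4_VOL_MIGRATION = 0x00000004
FH4_VOL_RENAME = 0x00000008


def is_persistent(fh_expire_type: int) -> bool:
    """A persistent filehandle has none of the volatility bits set."""
    return fh_expire_type == FH4_PERSISTENT


def is_valid_mask(fh_expire_type: int) -> bool:
    """FH4_NOEXPIRE_WITH_OPEN may only be set when FH4_VOLATILE_ANY is set."""
    if fh_expire_type & FH4_NOEXPIRE_WITH_OPEN:
        return bool(fh_expire_type & FH4_VOLATILE_ANY)
    return True


def may_expire_while_open(fh_expire_type: int) -> bool:
    """Hypothetical helper: can a filehandle expire while the file is open?

    Covers only the bitmask conditions; the handle-class condition for
    file system transitions is not modeled here.
    """
    if fh_expire_type & (FH4_VOL_MIGRATION | FH4_VOL_RENAME):
        return True
    if fh_expire_type & FH4_VOLATILE_ANY:
        return not (fh_expire_type & FH4_NOEXPIRE_WITH_OPEN)
    return False
```

A server for which may_expire_while_open() holds is one that needs the special care for RENAME and REMOVE described above.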
Volatile filehandles are especially suitable for implementation of | Volatile filehandles are especially suitable for implementation of | |||
the pseudo file systems used to bridge exports. See Section 7.5 for | the pseudo file systems used to bridge exports. See Section 7.5 for | |||
a discussion of this. | a discussion of this. | |||
4.3. One Method of Constructing a Volatile Filehandle | 4.3. One Method of Constructing a Volatile Filehandle | |||
A volatile filehandle, while opaque to the client could contain: | A volatile filehandle, while opaque to the client, could contain: | |||
[volatile bit = 1 | server boot time | slot | generation number] | [volatile bit = 1 | server boot time | slot | generation number] | |||
o slot is an index in the server volatile filehandle table | o slot is an index in the server volatile filehandle table | |||
o generation number is the generation number for the table entry/ | o generation number is the generation number for the table entry/ | |||
slot | slot | |||
When the client presents a volatile filehandle, the server makes the | When the client presents a volatile filehandle, the server makes the | |||
following checks, which assume that the check for the volatile bit | following checks, which assume that the check for the volatile bit | |||
has passed. If the server boot time is less than the current server | has passed. If the server boot time is less than the current server | |||
boot time, return NFS4ERR_FHEXPIRED. If slot is out of range, return | boot time, return NFS4ERR_FHEXPIRED. If slot is out of range, return | |||
NFS4ERR_BADHANDLE. If the generation number does not match, return | NFS4ERR_BADHANDLE. If the generation number does not match, return | |||
NFS4ERR_FHEXPIRED. | NFS4ERR_FHEXPIRED. | |||
When the server restarts, the table is gone (it is volatile). | When the server restarts, the table is gone (it is volatile). | |||
If volatile bit is 0, then it is a persistent filehandle with a | If the volatile bit is 0, then it is a persistent filehandle with a | |||
different structure following it. | different structure following it. | |||
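
The checks above can be sketched in Python as follows. This is only one possible realization: the field widths, the byte order, the VolatileFhTable class, and the use of strings for status codes are all assumptions made for illustration, since the filehandle is opaque to the client and its layout is a server implementation choice.

```python
import struct

# Hypothetical layout: 1-byte volatile flag, 8-byte server boot time,
# 4-byte slot index, 4-byte generation number, all big-endian.
_FH_FORMAT = ">BQII"


class VolatileFhTable:
    """In-memory table of volatile filehandles; lost when the server restarts."""

    def __init__(self, boot_time: int, size: int):
        self.boot_time = boot_time
        self.generations = [0] * size
        self.objects = [None] * size

    def issue(self, slot: int, obj) -> bytes:
        """Bind a slot to an object and return a new filehandle for it."""
        self.generations[slot] += 1  # invalidates handles issued earlier for this slot
        self.objects[slot] = obj
        return struct.pack(_FH_FORMAT, 1, self.boot_time, slot,
                           self.generations[slot])

    def check(self, fh: bytes) -> str:
        """Apply the checks in the order given in the text."""
        volatile, boot_time, slot, generation = struct.unpack(_FH_FORMAT, fh)
        if volatile != 1:
            # A persistent filehandle has a different structure (not modeled).
            raise ValueError("not a volatile filehandle")
        if boot_time < self.boot_time:
            return "NFS4ERR_FHEXPIRED"
        if slot >= len(self.generations):
            return "NFS4ERR_BADHANDLE"
        if generation != self.generations[slot]:
            return "NFS4ERR_FHEXPIRED"
        return "NFS4_OK"
```

In this sketch, reusing a slot bumps its generation number, so a handle issued before the reuse is reported as expired rather than silently resolving to a different object.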
4.4. Client Recovery from Filehandle Expiration | 4.4. Client Recovery from Filehandle Expiration | |||
If possible, the client SHOULD recover from the receipt of an | If possible, the client SHOULD recover from the receipt of an | |||
NFS4ERR_FHEXPIRED error. The client must take on additional | NFS4ERR_FHEXPIRED error. The client must take on additional | |||
responsibility so that it may prepare itself to recover from the | responsibility so that it may prepare itself to recover from the | |||
expiration of a volatile filehandle. If the server returns | expiration of a volatile filehandle. If the server returns | |||
persistent filehandles, the client does not need these additional | persistent filehandles, the client does not need these additional | |||
steps. | steps. | |||
For volatile filehandles, most commonly the client will need to store | For volatile filehandles, most commonly the client will need to store | |||
the component names leading up to and including the file system | the component names leading up to and including the file system | |||
object in question. With these names, the client should be able to | object in question. With these names, the client should be able to | |||
recover by finding a filehandle in the name space that is still | recover by finding a filehandle in the name space that is still | |||
available or by starting at the root of the server's file system name | available or by starting at the root of the server's file system | |||
space. | namespace. | |||
If the expired filehandle refers to an object that has been removed | If the expired filehandle refers to an object that has been removed | |||
from the file system, obviously the client will not be able to | from the file system, obviously the client will not be able to | |||
recover from the expired filehandle. | recover from the expired filehandle. | |||
It is also possible that the expired filehandle refers to a file that | It is also possible that the expired filehandle refers to a file that | |||
has been renamed. If the file was renamed by another client, again | has been renamed. If the file was renamed by another client, again | |||
it is possible that the original client will not be able to recover. | it is possible that the original client will not be able to recover. | |||
However, in the case that the client itself is renaming the file and | However, in the case that the client itself is renaming the file and | |||
the file is open, it is possible that the client may be able to | the file is open, it is possible that the client may be able to | |||
recover. The client can determine the new path name based on the | recover. The client can determine the new path name based on the | |||
processing of the rename request. The client can then regenerate the | processing of the rename request. The client can then regenerate the | |||
new filehandle based on the new path name. The client could also use | new filehandle based on the new path name. The client could also use | |||
the compound operation mechanism to construct a set of operations | the COMPOUND procedure to construct a series of operations like: | |||
like: | ||||
RENAME A B | RENAME A B | |||
LOOKUP B | LOOKUP B | |||
GETFH | GETFH | |||
Note that the COMPOUND procedure does not provide atomicity. This | Note that the COMPOUND procedure does not provide atomicity. This | |||
example only reduces the overhead of recovering from an expired | example only reduces the overhead of recovering from an expired | |||
filehandle. | filehandle. | |||
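
A client that has stored the component names for an object could rebuild its filehandle along the following lines. The Python sketch below only constructs the list of operations for a COMPOUND request; the function names and the tuple representation of operations are hypothetical, and the rename helper assumes a rename within a single directory. Actual transmission, error handling, and the filehandle arguments to RENAME are left out.

```python
def build_recovery_compound(components):
    """Return the operations that walk from the server's root to an object.

    `components` holds the stored component names leading up to and
    including the file system object in question.
    """
    ops = [("PUTROOTFH",)]  # start at the root of the server's namespace
    for name in components:
        ops.append(("LOOKUP", name))
    ops.append(("GETFH",))  # fetch the fresh filehandle for the last component
    return ops


def path_after_rename(components, new_name):
    """Stored path for an object the client itself renamed in place."""
    return list(components[:-1]) + [new_name]
```

For example, after the client renames A to B in the same directory, it might call build_recovery_compound(path_after_rename(path, "B")) to obtain the operations needed to regenerate the filehandle.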
5. File Attributes | 5. File Attributes | |||
To meet the requirements of extensibility and increased | To meet the requirements of extensibility and increased | |||
interoperability with non-UNIX platforms, attributes need to be | interoperability with non-UNIX platforms, attributes need to be | |||
handled in a flexible manner. The NFSv3 fattr3 structure contains a | handled in a flexible manner. The NFSv3 fattr3 structure contains a | |||
fixed list of attributes that not all clients and servers are able to | fixed list of attributes that not all clients and servers are able to | |||
support or care about. The fattr3 structure cannot be extended as | support or care about. The fattr3 structure cannot be extended as | |||
new needs arise and it provides no way to indicate non-support. With | new needs arise and it provides no way to indicate non-support. With | |||
the NFSv4.1 protocol, the client is able query what attributes the | the NFSv4.1 protocol, the client is able to query what attributes the | |||
server supports and construct requests with only those supported | server supports and construct requests with only those supported | |||
attributes (or a subset thereof). | attributes (or a subset thereof). | |||
To this end, attributes are divided into three groups: REQUIRED, | To this end, attributes are divided into three groups: REQUIRED, | |||
RECOMMENDED, and named. Both REQUIRED and RECOMMENDED attributes are | RECOMMENDED, and named. Both REQUIRED and RECOMMENDED attributes are | |||
supported in the NFSv4.1 protocol by a specific and well-defined | supported in the NFSv4.1 protocol by a specific and well-defined | |||
encoding and are identified by number. They are requested by setting | encoding and are identified by number. They are requested by setting | |||
a bit in the bit vector sent in the GETATTR request; the server | a bit in the bit vector sent in the GETATTR request; the server | |||
response includes a bit vector to list what attributes were returned | response includes a bit vector to list what attributes were returned | |||
in the response. New REQUIRED or RECOMMENDED attributes may be added | in the response. New REQUIRED or RECOMMENDED attributes may be added | |||
to the NFSv4 protocol as part of a new minor version by publishing a | to the NFSv4 protocol as part of a new minor version by publishing a | |||
standards-track RFC which allocates a new attribute number value and | Standards Track RFC that allocates a new attribute number value and | |||
defines the encoding for the attribute. See Section 2.7 for further | defines the encoding for the attribute. See Section 2.7 for further | |||
discussion. | discussion. | |||
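
The bit vector mentioned above can be illustrated with a short Python sketch. It assumes the usual bitmap encoding, in which attribute number n is represented by bit (n mod 32) of 32-bit word (n div 32); the function names are hypothetical. The attribute numbers in the usage note (type = 1, size = 4, mode = 33) are those assigned in the protocol's attribute definitions.

```python
def encode_bitmap(attr_numbers):
    """Build a list of 32-bit words with one bit set per attribute number."""
    words = []
    for n in attr_numbers:
        word, bit = divmod(n, 32)
        while len(words) <= word:
            words.append(0)  # grow the bitmap to reach the needed word
        words[word] |= 1 << bit
    return words


def decode_bitmap(words):
    """Return the attribute numbers whose bits are set, in ascending order."""
    return [w * 32 + b
            for w, word in enumerate(words)
            for b in range(32)
            if word & (1 << b)]


def restrict_to_supported(requested, supported):
    """Keep only the requested bits that the server reports as supported."""
    return [r & s for r, s in zip(requested, supported)]
```

A client might call encode_bitmap([1, 4, 33]) to request type, size, and mode, and use restrict_to_supported() with the value of supported_attrs so that it does not request attributes the server lacks.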
Named attributes are accessed by the new OPENATTR operation, which | Named attributes are accessed by the new OPENATTR operation, which | |||
accesses a hidden directory of attributes associated with a file | accesses a hidden directory of attributes associated with a file | |||
system object. OPENATTR takes a filehandle for the object and | system object. OPENATTR takes a filehandle for the object and | |||
returns the filehandle for the attribute hierarchy. The filehandle | returns the filehandle for the attribute hierarchy. The filehandle | |||
for the named attributes is a directory object accessible by LOOKUP | for the named attributes is a directory object accessible by LOOKUP | |||
or READDIR and contains files whose names represent the named | or READDIR and contains files whose names represent the named | |||
attributes and whose data bytes are the value of the attribute. For | attributes and whose data bytes are the value of the attribute. For | |||
skipping to change at page 105, line 27 | skipping to change at page 105, line 16 | |||
| LOOKUP | "foo" | ; look up file | | | LOOKUP | "foo" | ; look up file | | |||
| GETATTR | attrbits | | | | GETATTR | attrbits | | | |||
| OPENATTR | | ; access foo's named attributes | | | OPENATTR | | ; access foo's named attributes | | |||
| LOOKUP | "x11icon" | ; look up specific attribute | | | LOOKUP | "x11icon" | ; look up specific attribute | | |||
| READ | 0,4096 | ; read stream of bytes | | | READ | 0,4096 | ; read stream of bytes | | |||
+----------+-----------+---------------------------------+ | +----------+-----------+---------------------------------+ | |||
Named attributes are intended for data needed by applications rather | Named attributes are intended for data needed by applications rather | |||
than by an NFS client implementation. NFS implementors are strongly | than by an NFS client implementation. NFS implementors are strongly | |||
encouraged to define their new attributes as RECOMMENDED attributes | encouraged to define their new attributes as RECOMMENDED attributes | |||
by bringing them to the IETF standards-track process. | by bringing them to the IETF Standards Track process. | |||
The set of attributes which are classified as REQUIRED is | The set of attributes that are classified as REQUIRED is deliberately | |||
deliberately small since servers need to do whatever it takes to | small since servers need to do whatever it takes to support them. A | |||
support them. A server should support as many of the RECOMMENDED | server should support as many of the RECOMMENDED attributes as | |||
attributes as possible but by their definition, the server is not | possible but, by their definition, the server is not required to | |||
required to support all of them. Attributes are deemed REQUIRED if | support all of them. Attributes are deemed REQUIRED if the data is | |||
the data is both needed by a large number of clients and is not | both needed by a large number of clients and is not otherwise | |||
otherwise reasonably computable by the client when support is not | reasonably computable by the client when support is not provided on | |||
provided on the server. | the server. | |||
Note that the hidden directory returned by OPENATTR is a convenience | Note that the hidden directory returned by OPENATTR is a convenience | |||
for protocol processing. The client should not make any assumptions | for protocol processing. The client should not make any assumptions | |||
about the server's implementation of named attributes and whether the | about the server's implementation of named attributes and whether or | |||
underlying file system at the server has a named attribute directory | not the underlying file system at the server has a named attribute | |||
or not. Therefore, operations such as SETATTR and GETATTR on the | directory. Therefore, operations such as SETATTR and GETATTR on the | |||
named attribute directory are undefined. | named attribute directory are undefined. | |||
5.1. REQUIRED Attributes | 5.1. REQUIRED Attributes | |||
These MUST be supported by every NFSv4.1 client and server in order | These MUST be supported by every NFSv4.1 client and server in order | |||
to ensure a minimum level of interoperability. The server MUST store | to ensure a minimum level of interoperability. The server MUST store | |||
and return these attributes and the client MUST be able to function | and return these attributes, and the client MUST be able to function | |||
with an attribute set limited to these attributes. With just the | with an attribute set limited to these attributes. With just the | |||
REQUIRED attributes some client functionality may be impaired or | REQUIRED attributes some client functionality may be impaired or | |||
limited in some ways. A client may ask for any of these attributes | limited in some ways. A client may ask for any of these attributes | |||
to be returned by setting a bit in the GETATTR request and the server | to be returned by setting a bit in the GETATTR request, and the | |||
must return their value. | server MUST return their value. | |||
5.2. RECOMMENDED Attributes | 5.2. RECOMMENDED Attributes | |||
These attributes are understood well enough to warrant support in the | These attributes are understood well enough to warrant support in the | |||
NFSv4.1 protocol. However, they may not be supported on all clients | NFSv4.1 protocol. However, they may not be supported on all clients | |||
and servers. A client may ask for any of these attributes to be | and servers. A client may ask for any of these attributes to be | |||
returned by setting a bit in the GETATTR request but must handle the | returned by setting a bit in the GETATTR request but must handle the | |||
case where the server does not return them. A client MAY ask for the | case where the server does not return them. A client MAY ask for the | |||
set of attributes the server supports and SHOULD NOT request | set of attributes the server supports and SHOULD NOT request | |||
attributes the server does not support. A server should be tolerant | attributes the server does not support. A server should be tolerant | |||
of requests for unsupported attributes and simply not return them | of requests for unsupported attributes and simply not return them | |||
rather than considering the request an error. It is expected that | rather than considering the request an error. It is expected that | |||
servers will support all attributes they comfortably can and only | servers will support all attributes they comfortably can and only | |||
fail to support attributes which are difficult to support in their | fail to support attributes that are difficult to support in their | |||
operating environments. A server should provide attributes whenever | operating environments. A server should provide attributes whenever | |||
they don't have to "tell lies" to the client. For example, a file | they don't have to "tell lies" to the client. For example, a file | |||
modification time should be either an accurate time or should not be | modification time should be either an accurate time or should not be | |||
supported by the server. This will not always be comfortable to | supported by the server. At times this will be difficult for | |||
clients but the client is better positioned decide whether and how to | clients, but a client is better positioned to decide whether and how | |||
fabricate or construct an attribute or whether to do without the | to fabricate or construct an attribute or whether to do without the | |||
attribute. | attribute. | |||
5.3. Named Attributes | 5.3. Named Attributes | |||
These attributes are not supported by direct encoding in the NFSv4 | These attributes are not supported by direct encoding in the NFSv4 | |||
protocol but are accessed by string names rather than numbers and | protocol but are accessed by string names rather than numbers and | |||
correspond to an uninterpreted stream of bytes which are stored with | correspond to an uninterpreted stream of bytes that are stored with | |||
the file system object. The name space for these attributes may be | the file system object. The name space for these attributes may be | |||
accessed by using the OPENATTR operation. The OPENATTR operation | accessed by using the OPENATTR operation. The OPENATTR operation | |||
returns a filehandle for a virtual "named attribute directory" and | returns a filehandle for a virtual "named attribute directory", and | |||
further perusal and modification of the name space may be done using | further perusal and modification of the name space may be done using | |||
operations that work on more typical directories. In particular, | operations that work on more typical directories. In particular, | |||
READDIR may be used to get a list of such named attributes and LOOKUP | READDIR may be used to get a list of such named attributes, and | |||
and OPEN may select a particular attribute. Creation of a new named | LOOKUP and OPEN may select a particular attribute. Creation of a new | |||
attribute may be the result of an OPEN specifying file creation. | named attribute may be the result of an OPEN specifying file | |||
creation. | ||||
Once an OPEN is done, named attributes may be examined and changed by | Once an OPEN is done, named attributes may be examined and changed by | |||
normal READ and WRITE operations using the filehandles and stateids | normal READ and WRITE operations using the filehandles and stateids | |||
returned by OPEN. | returned by OPEN. | |||
Named attributes and the named attribute directory may have their own | Named attributes and the named attribute directory may have their own | |||
(non-named) attributes. Each of these objects MUST have all of the | (non-named) attributes. Each of these objects MUST have all of the | |||
REQUIRED attributes and may have additional RECOMMENDED attributes. | REQUIRED attributes and may have additional RECOMMENDED attributes. | |||
However, the set of attributes for named attributes and the named | However, the set of attributes for named attributes and the named | |||
attribute directory need not be as large as, and typically will not | attribute directory need not be, and typically will not be, as large | |||
be as large as that for other objects in that file system. | as that for other objects in that file system. | |||
Named attributes and the named attribute directory may be the target | Named attributes and the named attribute directory might be the | |||
of delegations (in the case of the named attribute directory these | target of delegations (in the case of the named attribute directory, | |||
will be directory delegations). However, since granting of | these will be directory delegations). However, since granting | |||
delegations or not is within the server's discretion, a server need | delegations is at the server's discretion, a server need not support | |||
not support delegations on named attributes or the named attribute | delegations on named attributes or the named attribute directory. | |||
directory. | ||||
It is RECOMMENDED that servers support arbitrary named attributes. A | It is RECOMMENDED that servers support arbitrary named attributes. A | |||
client should not depend on the ability to store any named attributes | client should not depend on the ability to store any named attributes | |||
in the server's file system. If a server does support named | in the server's file system. If a server does support named | |||
attributes, a client which is also able to handle them should be able | attributes, a client that is also able to handle them should be able | |||
to copy a file's data and metadata with complete transparency from | to copy a file's data and metadata with complete transparency from | |||
one location to another; this would imply that names allowed for | one location to another; this would imply that names allowed for | |||
regular directory entries are valid for named attribute names as | regular directory entries are valid for named attribute names as | |||
well. | well. | |||
In NFSv4.1, the structure of named attribute directories is | In NFSv4.1, the structure of named attribute directories is | |||
restricted in a number of ways, in order to prevent the development | restricted in a number of ways, in order to prevent the development | |||
of non-interoperable implementations in which some servers support a | of non-interoperable implementations in which some servers support a | |||
fully general hierarchical directory structure for named attributes | fully general hierarchical directory structure for named attributes | |||
while others support a limited set, but fully adequate to the | while others support a limited but adequate structure for named | |||
feature's goals. In such an environment, clients or applications | attributes. In such an environment, clients or applications might | |||
might come to depend on non-portable extensions. The restrictions | come to depend on non-portable extensions. The restrictions are: | |||
are: | ||||
o CREATE is not allowed in a named attribute directory. Thus, such | o CREATE is not allowed in a named attribute directory. Thus, such | |||
objects as symbolic links and special files are not allowed to be | objects as symbolic links and special files are not allowed to be | |||
named attributes. Further, directories may not be created in a | named attributes. Further, directories may not be created in a | |||
named attribute directory so no hierarchical structure of named | named attribute directory, so no hierarchical structure of named | |||
attributes for a single object is allowed. | attributes for a single object is allowed. | |||
o If OPENATTR is done on a named attribute directory or on a named | o If OPENATTR is done on a named attribute directory or on a named | |||
attribute, the server MUST return NFS4ERR_WRONG_TYPE. | attribute, the server MUST return NFS4ERR_WRONG_TYPE. | |||
o Doing a RENAME of a named attribute to a different named attribute | o Doing a RENAME of a named attribute to a different named attribute | |||
directory or to an ordinary (i.e. non-named-attribute) directory | directory or to an ordinary (i.e., non-named-attribute) directory | |||
is not allowed. | is not allowed. | |||
o Creating hard links between named attribute directories or between | o Creating hard links between named attribute directories or between | |||
named attribute directories and ordinary directories is not | named attribute directories and ordinary directories is not | |||
allowed. | allowed. | |||
Names of attributes will not be controlled by this document or other | Names of attributes will not be controlled by this document or other | |||
IETF standards track documents. See Section 22.1 for further | IETF Standards Track documents. See Section 22.1 for further | |||
discussion. | discussion. | |||
5.4. Classification of Attributes | 5.4. Classification of Attributes | |||
Each of the REQUIRED and RECOMMENDED attributes can be classified in | Each of the REQUIRED and RECOMMENDED attributes can be classified in | |||
one of three categories: per server (i.e. the value of the attribute | one of three categories: per server (i.e., the value of the attribute | |||
will be the same for all file objects that share the same server | will be the same for all file objects that share the same server | |||
owner; see Section 2.5 for a definition of server owner), per file | owner; see Section 2.5 for a definition of server owner), per file | |||
system (i.e. the value of the attribute will be the same for some or | system (i.e., the value of the attribute will be the same for some or | |||
all file objects that share the same fsid attribute (Section 5.8.1.9) | all file objects that share the same fsid attribute (Section 5.8.1.9) | |||
and Server Owner), or per file system object. Note that it is | and server owner), or per file system object. Note that it is | |||
possible that some per file system attributes may vary within the | possible that some per file system attributes may vary within the | |||
file system, depending on the value of the "homogeneous" | file system, depending on the value of the "homogeneous" | |||
(Section 5.8.2.16) attribute. Note that the attributes | (Section 5.8.2.16) attribute. Note that the attributes | |||
time_access_set and time_modify_set are not listed in this section | time_access_set and time_modify_set are not listed in this section | |||
because they are write-only attributes corresponding to time_access | because they are write-only attributes corresponding to time_access | |||
and time_modify, and are used in a special instance of SETATTR. | and time_modify, and are used in a special instance of SETATTR. | |||
o The per server attribute is: | o The per-server attribute is: | |||
lease_time | lease_time | |||
o The per file system attributes are: | o The per-file system attributes are: | |||
supported_attrs, suppattr_exclcreat, fh_expire_type, | supported_attrs, suppattr_exclcreat, fh_expire_type, | |||
link_support, symlink_support, unique_handles, aclsupport, | link_support, symlink_support, unique_handles, aclsupport, | |||
cansettime, case_insensitive, case_preserving, | cansettime, case_insensitive, case_preserving, | |||
chown_restricted, files_avail, files_free, files_total, | chown_restricted, files_avail, files_free, files_total, | |||
fs_locations, homogeneous, maxfilesize, maxname, maxread, | fs_locations, homogeneous, maxfilesize, maxname, maxread, | |||
maxwrite, no_trunc, space_avail, space_free, space_total, | maxwrite, no_trunc, space_avail, space_free, space_total, | |||
time_delta, change_policy, fs_status, fs_layout_type, | time_delta, change_policy, fs_status, fs_layout_type, | |||
fs_locations_info, fs_charset_cap | fs_locations_info, fs_charset_cap | |||
o The per file system object attributes are: | o The per-file system object attributes are: | |||
type, change, size, named_attr, fsid, rdattr_error, filehandle, | type, change, size, named_attr, fsid, rdattr_error, filehandle, | |||
acl, archive, fileid, hidden, maxlink, mimetype, mode, | acl, archive, fileid, hidden, maxlink, mimetype, mode, | |||
numlinks, owner, owner_group, rawdev, space_used, system, | numlinks, owner, owner_group, rawdev, space_used, system, | |||
time_access, time_backup, time_create, time_metadata, | time_access, time_backup, time_create, time_metadata, | |||
time_modify, mounted_on_fileid, dir_notif_delay, | time_modify, mounted_on_fileid, dir_notif_delay, | |||
dirent_notif_delay, dacl, sacl, layout_type, layout_hint, | dirent_notif_delay, dacl, sacl, layout_type, layout_hint, | |||
layout_blksize, layout_alignment, mdsthreshold, retention_get, | layout_blksize, layout_alignment, mdsthreshold, retention_get, | |||
retention_set, retentevt_get, retentevt_set, retention_hold, | retention_set, retentevt_get, retentevt_set, retention_hold, | |||
mode_set_masked | mode_set_masked | |||
For quota_avail_hard, quota_avail_soft, and quota_used see their | For quota_avail_hard, quota_avail_soft, and quota_used, see their | |||
definitions below for the appropriate classification. | definitions below for the appropriate classification. | |||
5.5. Set-Only and Get-Only Attributes | 5.5. Set-Only and Get-Only Attributes | |||
Some REQUIRED and RECOMMENDED attributes are set-only, i.e. they can | Some REQUIRED and RECOMMENDED attributes are set-only; i.e., they can | |||
be set via SETATTR but not retrieved via GETATTR. Similarly, some | be set via SETATTR but not retrieved via GETATTR. Similarly, some | |||
REQUIRED and RECOMMENDED attributes are get-only, i.e. they can be | REQUIRED and RECOMMENDED attributes are get-only; i.e., they can be | |||
retrieved GETATTR but not set via SETATTR. If a client attempts to | retrieved via GETATTR but not set via SETATTR. If a client attempts | |||
set a get-only attribute or get a set-only attribute, the server | to set a get-only attribute or get a set-only attribute, the server | |||
MUST return NFS4ERR_INVAL. | MUST return NFS4ERR_INVAL. | |||
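The access rule above can be illustrated with a minimal sketch. It assumes a lookup table keyed by attribute name whose values are the Acc entries ("R", "W", "R W") taken from the table rows shown in this section; the names ATTR_ACCESS and check_attr_access are hypothetical and are not part of the protocol.

```python
# Acc values copied from the attribute table rows visible in this section.
# The dictionary and function names are illustrative only.
ATTR_ACCESS = {
    "supported_attrs": "R",
    "type": "R",
    "change": "R",
    "size": "R W",
    "time_access_set": "W",
    "time_backup": "R W",
    "time_modify": "R",
    "time_modify_set": "W",
}


def check_attr_access(op, attr_name):
    """Return the status a server would give for one attribute in op."""
    acc = ATTR_ACCESS[attr_name].split()
    if op == "GETATTR":
        allowed = "R" in acc      # set-only attributes cannot be retrieved
    elif op == "SETATTR":
        allowed = "W" in acc      # get-only attributes cannot be set
    else:
        raise ValueError("unsupported operation: " + op)
    return "NFS4_OK" if allowed else "NFS4ERR_INVAL"
```

In this sketch, a GETATTR of time_access_set and a SETATTR of change both yield NFS4ERR_INVAL, while size is accepted by either operation.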
5.6. REQUIRED Attributes - List and Definition References | 5.6. REQUIRED Attributes - List and Definition References | |||
The list of REQUIRED attributes appears in Table 2. The meanings of | The list of REQUIRED attributes appears in Table 2. The meanings of | |||
the columns of the table are: | the columns of the table are: | |||
o Name: the name of attribute | o Name: The name of the attribute. | |||
o Id: the number assigned to the attribute. In the event of | o Id: The number assigned to the attribute. In the event of | |||
conflicts between the assigned number and [13], the latter is | conflicts between the assigned number and [13], the latter is | |||
likely authoritative, but should be resolved with Errata to this | likely authoritative, but should be resolved with Errata to this | |||
document and/or [13]. See [44] for the Errata process. | document and/or [13]. See [44] for the Errata process. | |||
o Data Type: The XDR data type of the attribute. | o Data Type: The XDR data type of the attribute. | |||
o Acc: Access allowed to the attribute. R means read-only (GETATTR | o Acc: Access allowed to the attribute. R means read-only (GETATTR | |||
may retrieve, SETATTR may not set). W means write-only (SETATTR | may retrieve, SETATTR may not set). W means write-only (SETATTR | |||
may set, GETATTR may not retrieve). R W means read/write (GETATTR | may set, GETATTR may not retrieve). R W means read/write (GETATTR | |||
may retrieve, SETATTR may set). | may retrieve, SETATTR may set). | |||
o Defined in: the section of this specification that describes the | o Defined in: The section of this specification that describes the | |||
attribute. | attribute. | |||
+--------------------+----+------------+-----+------------------+ | +--------------------+----+------------+-----+------------------+ | |||
| Name | Id | Data Type | Acc | Defined in: | | | Name | Id | Data Type | Acc | Defined in: | | |||
+--------------------+----+------------+-----+------------------+ | +--------------------+----+------------+-----+------------------+ | |||
| supported_attrs | 0 | bitmap4 | R | Section 5.8.1.1 | | | supported_attrs | 0 | bitmap4 | R | Section 5.8.1.1 | | |||
| type | 1 | nfs_ftype4 | R | Section 5.8.1.2 | | | type | 1 | nfs_ftype4 | R | Section 5.8.1.2 | | |||
| fh_expire_type | 2 | uint32_t | R | Section 5.8.1.3 | | | fh_expire_type | 2 | uint32_t | R | Section 5.8.1.3 | | |||
| change | 3 | uint64_t | R | Section 5.8.1.4 | | | change | 3 | uint64_t | R | Section 5.8.1.4 | | |||
| size | 4 | uint64_t | R W | Section 5.8.1.5 | | | size | 4 | uint64_t | R W | Section 5.8.1.5 | | |||
skipping to change at page 112, line 4 | skipping to change at page 111, line 21 | |||
| time_access_set | 48 | settime4 | W | Section 5.8.2.38 | | | time_access_set | 48 | settime4 | W | Section 5.8.2.38 | | |||
| time_backup | 49 | nfstime4 | R W | Section 5.8.2.39 | | | time_backup | 49 | nfstime4 | R W | Section 5.8.2.39 | | |||
| time_create | 50 | nfstime4 | R W | Section 5.8.2.40 | | | time_create | 50 | nfstime4 | R W | Section 5.8.2.40 | | |||
| time_delta | 51 | nfstime4 | R | Section 5.8.2.41 | | | time_delta | 51 | nfstime4 | R | Section 5.8.2.41 | | |||
| time_metadata | 52 | nfstime4 | R | Section 5.8.2.42 | | | time_metadata | 52 | nfstime4 | R | Section 5.8.2.42 | | |||
| time_modify | 53 | nfstime4 | R | Section 5.8.2.43 | | | time_modify | 53 | nfstime4 | R | Section 5.8.2.43 | | |||
| time_modify_set | 54 | settime4 | W | Section 5.8.2.44 | | | time_modify_set | 54 | settime4 | W | Section 5.8.2.44 | | |||
+--------------------+----+----------------+-----+------------------+ | +--------------------+----+----------------+-----+------------------+ | |||
Table 3 | Table 3 | |||
* fs_locations_info4 | * fs_locations_info4 | |||
5.8. Attribute Definitions | 5.8. Attribute Definitions | |||
5.8.1. Definitions of REQUIRED Attributes | 5.8.1. Definitions of REQUIRED Attributes | |||
5.8.1.1. Attribute 0: supported_attrs | 5.8.1.1. Attribute 0: supported_attrs | |||
The bit vector which would retrieve all REQUIRED and RECOMMENDED | The bit vector that would retrieve all REQUIRED and RECOMMENDED | |||
attributes that are supported for this object. The scope of this | attributes that are supported for this object. The scope of this | |||
attribute applies to all objects with a matching fsid. | attribute applies to all objects with a matching fsid. | |||
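As an illustration of how a client might test the supported_attrs bit vector, the following sketch encodes and queries a bitmap held as a list of 32-bit words. The bit layout (attribute number n lives in word n // 32, bit n % 32) is an assumption based on the bitmap4 description elsewhere in the specification and should be checked against it; the function names are hypothetical.

```python
def encode_bitmap(attr_ids):
    """Build a list of 32-bit words with one bit set per attribute number."""
    words = []
    for n in attr_ids:
        word, bit = divmod(n, 32)   # assumed layout: word n // 32, bit n % 32
        while len(words) <= word:
            words.append(0)
        words[word] |= 1 << bit
    return words


def is_supported(bitmap, attr_id):
    """True if the bit for attr_id is set; absent words count as zero."""
    word, bit = divmod(attr_id, 32)
    return word < len(bitmap) and bool(bitmap[word] & (1 << bit))
```

For example, a bitmap covering supported_attrs (0), type (1), and suppattr_exclcreat (75) needs three words, and a query for filehandle (19) against it is false.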
5.8.1.2. Attribute 1: type | 5.8.1.2. Attribute 1: type | |||
Designates the type of an object in terms of one of a number of | Designates the type of an object in terms of one of a number of | |||
special constants: | special constants: | |||
o NF4REG designates a regular file. | o NF4REG designates a regular file. | |||
skipping to change at page 112, line 42 | skipping to change at page 112, line 14 | |||
o NF4FIFO designates a fifo special file. | o NF4FIFO designates a fifo special file. | |||
o NF4ATTRDIR designates a named attribute directory. | o NF4ATTRDIR designates a named attribute directory. | |||
o NF4NAMEDATTR designates a named attribute. | o NF4NAMEDATTR designates a named attribute. | |||
Within the explanatory text and operation descriptions, the following | Within the explanatory text and operation descriptions, the following | |||
phrases will be used with the meanings given below: | phrases will be used with the meanings given below: | |||
o The phrase "is a directory" means that the object is of type | o The phrase "is a directory" means that the object's type attribute | |||
NF4DIR or of type NF4ATTRDIR. | is NF4DIR or NF4ATTRDIR. | |||
o The phrase "is a special file" means that the object is of one of | o The phrase "is a special file" means that the object's type | |||
the types NF4BLK, NF4CHR, NF4SOCK, or NF4FIFO. | attribute is NF4BLK, NF4CHR, NF4SOCK, or NF4FIFO. | |||
o The phrase "is an ordinary file" means that the object is of type | o The phrases "is an ordinary file" and "is a regular file" mean | |||
NF4REG or of type NF4NAMEDATTR. | that the object's type attribute is NF4REG or NF4NAMEDATTR. | |||
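The three phrases defined above map directly onto set-membership tests. A minimal sketch follows, using the constant names as strings rather than their wire values; the function names are hypothetical.

```python
DIRECTORY_TYPES = {"NF4DIR", "NF4ATTRDIR"}
SPECIAL_FILE_TYPES = {"NF4BLK", "NF4CHR", "NF4SOCK", "NF4FIFO"}
ORDINARY_FILE_TYPES = {"NF4REG", "NF4NAMEDATTR"}


def is_directory(obj_type):
    """'is a directory': the type attribute is NF4DIR or NF4ATTRDIR."""
    return obj_type in DIRECTORY_TYPES


def is_special_file(obj_type):
    """'is a special file': NF4BLK, NF4CHR, NF4SOCK, or NF4FIFO."""
    return obj_type in SPECIAL_FILE_TYPES


def is_ordinary_file(obj_type):
    """'is an ordinary file': the type attribute is NF4REG or NF4NAMEDATTR."""
    return obj_type in ORDINARY_FILE_TYPES
```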
5.8.1.3. Attribute 2: fh_expire_type | 5.8.1.3. Attribute 2: fh_expire_type | |||
Server uses this to specify filehandle expiration behavior to the | Server uses this to specify filehandle expiration behavior to the | |||
client. See Section 4 for additional description. | client. See Section 4 for additional description. | |||
5.8.1.4. Attribute 3: change | 5.8.1.4. Attribute 3: change | |||
A value created by the server that the client can use to determine if | A value created by the server that the client can use to determine if | |||
file data, directory contents or attributes of the object have been | file data, directory contents, or attributes of the object have been | |||
modified. The server may return the object's time_metadata attribute | modified. The server may return the object's time_metadata attribute | |||
for this attribute's value but only if the file system object can not | for this attribute's value, but only if the file system object cannot | |||
be updated more frequently than the resolution of time_metadata. | be updated more frequently than the resolution of time_metadata. | |||
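One way a client might use the change attribute is to revalidate cached data by comparing the stored value with the value the server currently reports. The sketch below assumes the value is compared only for equality; CachedObject, revalidate, and the fetch callback are hypothetical names introduced for illustration.

```python
class CachedObject:
    """Cached data together with the change value it was fetched under."""

    def __init__(self, change, data):
        self.change = change
        self.data = data


def revalidate(cached, server_change, fetch):
    """Reuse the cache if change is unchanged, otherwise refetch."""
    if cached is not None and cached.change == server_change:
        return cached                      # nothing was modified
    return CachedObject(server_change, fetch())
```

Here fetch stands in for whatever READ or GETATTR traffic the client would issue after detecting a modification.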
5.8.1.5. Attribute 4: size | 5.8.1.5. Attribute 4: size | |||
The size of the object in bytes. | The size of the object in bytes. | |||
5.8.1.6. Attribute 5: link_support | 5.8.1.6. Attribute 5: link_support | |||
True, if the object's file system supports hard links. | TRUE, if the object's file system supports hard links. | |||
5.8.1.7. Attribute 6: symlink_support | 5.8.1.7. Attribute 6: symlink_support | |||
True, if the object's file system supports symbolic links. | TRUE, if the object's file system supports symbolic links. | |||
5.8.1.8. Attribute 7: named_attr | 5.8.1.8. Attribute 7: named_attr | |||
True, if this object has named attributes. In other words, the | TRUE, if this object has named attributes. In other words, the | |||
object has a non-empty named attribute directory. | object has a non-empty named attribute directory. | |||
5.8.1.9. Attribute 8: fsid | 5.8.1.9. Attribute 8: fsid | |||
Unique file system identifier for the file system holding this | Unique file system identifier for the file system holding this | |||
object. fsid contains major and minor components each of which is of | object. The fsid attribute has major and minor components, each of | |||
data type uint64_t. | which is of data type uint64_t. | |||
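A minimal sketch of the fsid value as a pair of 64-bit unsigned components follows. The major and minor field names come from the text; the Fsid class, make_fsid, and same_file_system are hypothetical helpers, and treating two objects as being in the same file system exactly when both components match is the assumption being illustrated.

```python
from typing import NamedTuple

UINT64_MAX = 2**64 - 1


class Fsid(NamedTuple):
    major: int
    minor: int


def make_fsid(major, minor):
    """Build an Fsid, rejecting values outside the uint64_t range."""
    for value in (major, minor):
        if not 0 <= value <= UINT64_MAX:
            raise ValueError("fsid component out of uint64_t range")
    return Fsid(major, minor)


def same_file_system(a, b):
    """Objects share a file system when both fsid components match."""
    return a == b
```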
5.8.1.10. Attribute 9: unique_handles | 5.8.1.10. Attribute 9: unique_handles | |||
True, if two distinct filehandles guaranteed to refer to two | TRUE, if two distinct filehandles are guaranteed to refer to two | |||
different file system objects. | different file system objects. | |||
5.8.1.11. Attribute 10: lease_time | 5.8.1.11. Attribute 10: lease_time | |||
Duration of leases at server in seconds. | Duration of the lease at server in seconds. | |||
5.8.1.12. Attribute 11: rdattr_error | 5.8.1.12. Attribute 11: rdattr_error | |||
Error returned from an attempt to retrieve attributes during a | Error returned from an attempt to retrieve attributes during a | |||
READDIR operation. | READDIR operation. | |||
5.8.1.13. Attribute 19: filehandle | 5.8.1.13. Attribute 19: filehandle | |||
The filehandle of this object (primarily for READDIR requests). | The filehandle of this object (primarily for READDIR requests). | |||
5.8.1.14. Attribute 75: suppattr_exclcreat | 5.8.1.14. Attribute 75: suppattr_exclcreat | |||
The bit vector which would set all REQUIRED and RECOMMENDED | The bit vector that would set all REQUIRED and RECOMMENDED attributes | |||
attributes that are supported by the EXCLUSIVE4_1 method of file | that are supported by the EXCLUSIVE4_1 method of file creation via | |||
creation via the OPEN operation. The scope of this attribute applies | the OPEN operation. The scope of this attribute applies to all | |||
to all objects with a matching fsid. | objects with a matching fsid. | |||
5.8.2. Definitions of Uncategorized RECOMMENDED Attributes | 5.8.2. Definitions of Uncategorized RECOMMENDED Attributes | |||
The definitions of most of the RECOMMENDED attributes follow. | The definitions of most of the RECOMMENDED attributes follow. | |||
Collections that share a common category are defined in other | Collections that share a common category are defined in other | |||
sections. | sections. | |||
5.8.2.1. Attribute 14: archive | 5.8.2.1. Attribute 14: archive | |||
True, if this file has been archived since the time of last | TRUE, if this file has been archived since the time of last | |||
modification (deprecated in favor of time_backup). | modification (deprecated in favor of time_backup). | |||
5.8.2.2. Attribute 15: cansettime | 5.8.2.2. Attribute 15: cansettime | |||
True, if the server able to change the times for a file system object | TRUE, if the server is able to change the times for a file system | |||
as specified in a SETATTR operation. | object as specified in a SETATTR operation. | |||
5.8.2.3. Attribute 16: case_insensitive | 5.8.2.3. Attribute 16: case_insensitive | |||
True, if file name comparisons on this file system are case | TRUE, if file name comparisons on this file system are case | |||
insensitive. | insensitive. | |||
5.8.2.4. Attribute 17: case_preserving | 5.8.2.4. Attribute 17: case_preserving | |||
True, if file name case on this file system is preserved. | TRUE, if file name case on this file system is preserved. | |||
5.8.2.5. Attribute 60: change_policy | 5.8.2.5. Attribute 60: change_policy | |||
A value created by the server that the client can use to determine if | A value created by the server that the client can use to determine if | |||
some server policy related to the current file system has been | some server policy related to the current file system has been | |||
subject to change. If the value remains the same then the client can | subject to change. If the value remains the same, then the client | |||
be sure that the values of the attributes related to fs location and | can be sure that the values of the attributes related to fs location | |||
the fss_type field of the fs_status attribute have not changed. On | and the fss_type field of the fs_status attribute have not changed. | |||
the other hand, a change in this value does not necessarily imply a | On the other hand, a change in this value does not necessarily imply a | |||
change in policy. It is up to the client to interrogate the server | change in policy. It is up to the client to interrogate the server | |||
to determine if some policy relevant to it has changed. See | to determine if some policy relevant to it has changed. See | |||
Section 3.3.6 for details. | Section 3.3.6 for details. | |||
This attribute MUST change when the value returned by the | This attribute MUST change when the value returned by the | |||
fs_locations or fs_locations_info attribute changes, when a file | fs_locations or fs_locations_info attribute changes, when a file | |||
system goes from read-only to writable or vice versa, or when the | system goes from read-only to writable or vice versa, or when the | |||
allowable set of security flavors for the file system or any part | allowable set of security flavors for the file system or any part | |||
thereof is changed. | thereof is changed. | |||
5.8.2.6. Attribute 18: chown_restricted | 5.8.2.6. Attribute 18: chown_restricted | |||
If TRUE, the server will reject any request to change either the | If TRUE, the server will reject any request to change either the | |||
owner or the group associated with a file if the caller is not a | owner or the group associated with a file if the caller is not a | |||
privileged user (for example, "root" in UNIX operating environments | privileged user (for example, "root" in UNIX operating environments | |||
or in Windows 2000 the "Take Ownership" privilege). | or, in Windows 2000, the "Take Ownership" privilege). | |||
5.8.2.7. Attribute 20: fileid | 5.8.2.7. Attribute 20: fileid | |||
A number uniquely identifying the file within the file system. | A number uniquely identifying the file within the file system. | |||
5.8.2.8. Attribute 21: files_avail | 5.8.2.8. Attribute 21: files_avail | |||
File slots available to this user on the file system containing this | File slots available to this user on the file system containing this | |||
object - this should be the smallest relevant limit. | object -- this should be the smallest relevant limit. | |||
5.8.2.9. Attribute 22: files_free | 5.8.2.9. Attribute 22: files_free | |||
Free file slots on the file system containing this object - this | Free file slots on the file system containing this object -- this | |||
should be the smallest relevant limit. | should be the smallest relevant limit. | |||
5.8.2.10. Attribute 23: files_total | 5.8.2.10. Attribute 23: files_total | |||
Total file slots on the file system containing this object. | Total file slots on the file system containing this object. | |||
5.8.2.11. Attribute 76: fs_charset_cap | 5.8.2.11. Attribute 76: fs_charset_cap | |||
Character set capabilities for this file system. See Section 14.4. | Character set capabilities for this file system. See Section 14.4. | |||
skipping to change at page 116, line 17 | skipping to change at page 115, line 31 | |||
Full function file system location. See Section 11.10 for more | Full function file system location. See Section 11.10 for more | |||
details. | details. | |||
5.8.2.14. Attribute 61: fs_status | 5.8.2.14. Attribute 61: fs_status | |||
Generic file system type information. See Section 11.11 for more | Generic file system type information. See Section 11.11 for more | |||
details. | details. | |||
5.8.2.15. Attribute 25: hidden | 5.8.2.15. Attribute 25: hidden | |||
True, if the file is considered hidden with respect to the Windows | TRUE, if the file is considered hidden with respect to the Windows | |||
API. | API. | |||
5.8.2.16. Attribute 26: homogeneous | 5.8.2.16. Attribute 26: homogeneous | |||
True, if this object's file system is homogeneous, i.e. are per file | TRUE, if this object's file system is homogeneous; i.e., all objects | |||
system attributes the same for all file system's objects. | in the file system (all objects on the server with the same fsid) | |||
have common values for all per-file-system attributes. | ||||
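A client could use the homogeneous attribute to decide how widely a cached per-file-system attribute value may be shared. The sketch below is one possible policy, not a requirement of the protocol; the function name and the key layout are hypothetical.

```python
def per_fs_attr_cache_key(fsid, filehandle, homogeneous):
    """Pick a cache key for a per-file-system attribute value.

    If the file system is homogeneous, one cached value can serve every
    object with the same fsid; otherwise the value is kept per object.
    """
    if homogeneous:
        return ("fs", fsid)
    return ("obj", fsid, filehandle)
```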
5.8.2.17. Attribute 27: maxfilesize | 5.8.2.17. Attribute 27: maxfilesize | |||
Maximum supported file size for the file system of this object. | Maximum supported file size for the file system of this object. | |||
5.8.2.18. Attribute 28: maxlink | 5.8.2.18. Attribute 28: maxlink | |||
Maximum number of links for this object. | Maximum number of links for this object. | |||
5.8.2.19. Attribute 29: maxname | 5.8.2.19. Attribute 29: maxname | |||
Maximum file name size supported for this object. | Maximum file name size supported for this object. | |||
5.8.2.20. Attribute 30: maxread | 5.8.2.20. Attribute 30: maxread | |||
Maximum read size supported for this object. | Maximum amount of data the READ operation will return for this | |||
object. | ||||
5.8.2.21. Attribute 31: maxwrite | 5.8.2.21. Attribute 31: maxwrite | |||
Maximum write size supported for this object. This attribute SHOULD | Maximum amount of data the WRITE operation will accept for this | |||
be supported if the file is writable. Lack of this attribute can | object. This attribute SHOULD be supported if the file is writable. | |||
lead to the client either wasting bandwidth or not receiving the best | Lack of this attribute can lead to the client either wasting | |||
performance. | bandwidth or not receiving the best performance. | |||
5.8.2.22. Attribute 32: mimetype | 5.8.2.22. Attribute 32: mimetype | |||
MIME body type/subtype of this object. | MIME body type/subtype of this object. | |||
5.8.2.23. Attribute 55: mounted_on_fileid | 5.8.2.23. Attribute 55: mounted_on_fileid | |||
Like fileid, but if the target filehandle is the root of a file | Like fileid, but if the target filehandle is the root of a file | |||
system, this attribute represents the fileid of the underlying | system, this attribute represents the fileid of the underlying | |||
directory. | directory. | |||
UNIX-based operating environments connect a file system into the | UNIX-based operating environments connect a file system into the | |||
namespace by connecting (mounting) the file system onto the existing | namespace by connecting (mounting) the file system onto the existing | |||
file object (the mount point, usually a directory) of an existing | file object (the mount point, usually a directory) of an existing | |||
file system. When the mount point's parent directory is read via an | file system. When the mount point's parent directory is read via an | |||
API like readdir(), the return results are directory entries, each | API like readdir(), the return results are directory entries, each | |||
with a component name and a fileid. The fileid of the mount point's | with a component name and a fileid. The fileid of the mount point's | |||
directory entry will be different from the fileid that the stat() | directory entry will be different from the fileid that the stat() | |||
system call returns. The stat() system call is returning the fileid | system call returns. The stat() system call is returning the fileid | |||
of the root of the mounted file system, whereas readdir() is | of the root of the mounted file system, whereas readdir() is | |||
returning the fileid stat() would have returned before any file | returning the fileid that stat() would have returned before any file | |||
systems were mounted on the mount point. | systems were mounted on the mount point. | |||
Unlike NFSv3, NFSv4.1 allows a client's LOOKUP request to cross other | Unlike NFSv3, NFSv4.1 allows a client's LOOKUP request to cross other | |||
file systems. The client detects the file system crossing whenever | file systems. The client detects the file system crossing whenever | |||
the filehandle argument of LOOKUP has an fsid attribute different | the filehandle argument of LOOKUP has an fsid attribute different | |||
from that of the filehandle returned by LOOKUP. A UNIX-based client | from that of the filehandle returned by LOOKUP. A UNIX-based client | |||
will consider this a "mount point crossing". UNIX has a legacy | will consider this a "mount point crossing". UNIX has a legacy | |||
scheme for allowing a process to determine its current working | scheme for allowing a process to determine its current working | |||
directory. This relies on readdir() of a mount point's parent and | directory. This relies on readdir() of a mount point's parent and | |||
stat() of the mount point returning fileids as previously described. | stat() of the mount point returning fileids as previously described. | |||
skipping to change at page 118, line 5 | skipping to change at page 117, line 21 | |||
the same as that of the fileid attribute. | the same as that of the fileid attribute. | |||
The mounted_on_fileid attribute is RECOMMENDED, so the server SHOULD | The mounted_on_fileid attribute is RECOMMENDED, so the server SHOULD | |||
provide it if possible, and for a UNIX-based server, this is | provide it if possible, and for a UNIX-based server, this is | |||
straightforward. Usually, mounted_on_fileid will be requested during | straightforward. Usually, mounted_on_fileid will be requested during | |||
a READDIR operation, in which case it is trivial (at least for UNIX- | a READDIR operation, in which case it is trivial (at least for UNIX- | |||
based servers) to return mounted_on_fileid since it is equal to the | based servers) to return mounted_on_fileid since it is equal to the | |||
fileid of a directory entry returned by readdir(). If | fileid of a directory entry returned by readdir(). If | |||
mounted_on_fileid is requested in a GETATTR operation, the server | mounted_on_fileid is requested in a GETATTR operation, the server | |||
should obey an invariant that has it returning a value that is equal | should obey an invariant that has it returning a value that is equal | |||
to the file object's entry in the object's parent directory, i.e. | to the file object's entry in the object's parent directory, i.e., | |||
what readdir() would have returned. Some operating environments | what readdir() would have returned. Some operating environments | |||
allow a series of two or more file systems to be mounted onto a | allow a series of two or more file systems to be mounted onto a | |||
single mount point. In this case, for the server to obey the | single mount point. In this case, for the server to obey the | |||
aforementioned invariant, it will need to find the base mount point, | aforementioned invariant, it will need to find the base mount point, | |||
and not the intermediate mount points. | and not the intermediate mount points. | |||
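The invariant for stacked mounts can be sketched with a toy model of a mount point. The MountPoint class and its fields are hypothetical, and the sketch assumes that stat() reports the root of the topmost mounted file system while readdir() of the parent reports the underlying directory.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MountPoint:
    # fileid of the directory in the parent file system (the base mount point)
    base_fileid: int
    # root fileids of the file systems mounted here, listed bottom to top
    stacked_root_fileids: List[int] = field(default_factory=list)


def fileid(mp):
    """What stat() would report: the topmost mounted root, if any."""
    if mp.stacked_root_fileids:
        return mp.stacked_root_fileids[-1]
    return mp.base_fileid


def mounted_on_fileid(mp):
    """What readdir() of the parent would report: the base mount point,
    never the root of an intermediate mounted file system."""
    return mp.base_fileid
```

With nothing mounted, both functions return the same value, matching the statement that mounted_on_fileid equals fileid when the object is not the root of a file system.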
5.8.2.24. Attribute 34: no_trunc | 5.8.2.24. Attribute 34: no_trunc | |||
If this attribute is TRUE, then if the client uses a file name longer | If this attribute is TRUE, then if the client uses a file name longer | |||
than name_max, an error will be returned instead of the name being | than name_max, an error will be returned instead of the name being | |||
skipping to change at page 118, line 32 | skipping to change at page 117, line 48 | |||
5.8.2.26. Attribute 36: owner | 5.8.2.26. Attribute 36: owner | |||
The string name of the owner of this object. | The string name of the owner of this object. | |||
5.8.2.27. Attribute 37: owner_group | 5.8.2.27. Attribute 37: owner_group | |||
The string name of the group ownership of this object. | The string name of the group ownership of this object. | |||
5.8.2.28. Attribute 38: quota_avail_hard | 5.8.2.28. Attribute 38: quota_avail_hard | |||
The value in bytes which represents the amount of additional disk | The value in bytes that represents the amount of additional disk | |||
space beyond the current allocation that can be allocated to this | space beyond the current allocation that can be allocated to this | |||
file or directory before further allocations will be refused. It is | file or directory before further allocations will be refused. It is | |||
understood that this space may be consumed by allocations to other | understood that this space may be consumed by allocations to other | |||
files or directories. | files or directories. | |||
5.8.2.29. Attribute 39: quota_avail_soft | 5.8.2.29. Attribute 39: quota_avail_soft | |||
The value in bytes which represents the amount of additional disk | The value in bytes that represents the amount of additional disk | |||
space that can be allocated to this file or directory before the user | space that can be allocated to this file or directory before the user | |||
may reasonably be warned. It is understood that this space may be | may reasonably be warned. It is understood that this space may be | |||
consumed by allocations to other files or directories though there is | consumed by allocations to other files or directories though there is | |||
a rule as to which other files or directories. | a rule as to which other files or directories. | |||
5.8.2.30. Attribute 40: quota_used | 5.8.2.30. Attribute 40: quota_used | |||
The value in bytes which represent the amount of disc space used by | The value in bytes that represents the amount of disk space used by | |||
this file or directory and possibly a number of other similar files | this file or directory and possibly a number of other similar files | |||
or directories, where the set of "similar" meets at least the | or directories, where the set of "similar" meets at least the | |||
criterion that allocating space to any file or directory in the set | criterion that allocating space to any file or directory in the set | |||
will reduce the "quota_avail_hard" of every other file or directory | will reduce the "quota_avail_hard" of every other file or directory | |||
in the set. | in the set. | |||
Note that there may be a number of distinct but overlapping sets of | Note that there may be a number of distinct but overlapping sets of | |||
files or directories for which a quota_used value is maintained. | files or directories for which a quota_used value is maintained, | |||
E.g. "all files with a given owner", "all files with a given group | e.g., "all files with a given owner", "all files with a given group | |||
owner". etc. The server is at liberty to choose any of those sets | owner", etc. The server is at liberty to choose any of those sets | |||
when providing the content of the quota_used attribute, but should do | when providing the content of the quota_used attribute, but should do | |||
so in a repeatable way. The rule may be configured per file system | so in a repeatable way. The rule may be configured per file system | |||
or may be "choose the set with the smallest quota". | or may be "choose the set with the smallest quota". | |||
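The "choose the set with the smallest quota" rule mentioned above could be implemented as follows. This is a sketch under stated assumptions: each candidate set is described by a hypothetical QuotaSet record holding a name, a quota limit in bytes, and the bytes used, and ties are broken by name so that the choice is repeatable regardless of input order.

```python
from typing import NamedTuple


class QuotaSet(NamedTuple):
    name: str    # e.g. "owner:alice" or "group:staff" (illustrative labels)
    limit: int   # quota for the set, in bytes
    used: int    # bytes used by all files and directories in the set


def quota_used(candidate_sets):
    """Return the used bytes of the set with the smallest quota.

    Sorting on (limit, name) keeps the choice deterministic.
    """
    chosen = min(candidate_sets, key=lambda s: (s.limit, s.name))
    return chosen.used
```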
5.8.2.31. Attribute 41: rawdev | 5.8.2.31. Attribute 41: rawdev | |||
Raw device identifier; the UNIX device major/minor node information. | Raw device number of file of type NF4BLK or NF4CHR. The device | |||
If the value of type is not NF4BLK or NF4CHR, the value returned | number is split into major and minor numbers. If the file's type | |||
SHOULD NOT be considered useful. | attribute is not NF4BLK or NF4CHR, the value returned SHOULD NOT be | |||
considered useful. | ||||
5.8.2.32. Attribute 42: space_avail | 5.8.2.32. Attribute 42: space_avail | |||
Disk space in bytes available to this user on the file system | Disk space in bytes available to this user on the file system | |||
containing this object - this should be the smallest relevant limit. | containing this object -- this should be the smallest relevant limit. | |||
5.8.2.33. Attribute 43: space_free | 5.8.2.33. Attribute 43: space_free | |||
Free disk space in bytes on the file system containing this object - | Free disk space in bytes on the file system containing this object -- | |||
this should be the smallest relevant limit. | this should be the smallest relevant limit. | |||
5.8.2.34. Attribute 44: space_total | 5.8.2.34. Attribute 44: space_total | |||
Total disk space in bytes on the file system containing this object. | Total disk space in bytes on the file system containing this object. | |||
5.8.2.35. Attribute 45: space_used | 5.8.2.35. Attribute 45: space_used | |||
Number of file system bytes allocated to this object. | Number of file system bytes allocated to this object. | |||
5.8.2.36. Attribute 46: system | 5.8.2.36. Attribute 46: system | |||
This attribute is TRUE if this file is a "system" file with respect | This attribute is TRUE if this file is a "system" file with respect | |||
to the Windows operating environment. | to the Windows operating environment. | |||
5.8.2.37. Attribute 47: time_access | 5.8.2.37. Attribute 47: time_access | |||
The time_access attribute represents the time of last access to the | The time_access attribute represents the time of last access to the | |||
object by a read that was satisfied by the server. The notion of | object by a READ operation sent to the server. The notion of what is | |||
what is an "access" depends on server's operating environment and/or | an "access" depends on the server's operating environment and/or the | |||
the server's file system semantics. For example, for servers obeying | server's file system semantics. For example, for servers obeying | |||
POSIX semantics, time_access would be updated only by the READ and | Portable Operating System Interface (POSIX) semantics, time_access | |||
READDIR operations and not any of the operations that modify the | would be updated only by the READ and READDIR operations and not any | |||
content of the object [16], [17], [18]. Of course, setting the | of the operations that modify the content of the object [16], [17], | |||
corresponding time_access_set attribute is another way to modify the | [18]. Of course, setting the corresponding time_access_set attribute | |||
time_access attribute. | is another way to modify the time_access attribute. | |||
Whenever the file object resides on a writable file system, the | Whenever the file object resides on a writable file system, the | |||
server should make best efforts to record time_access into stable | server should make its best efforts to record time_access into stable | |||
storage. However, to mitigate the performance effects of doing so, | storage. However, to mitigate the performance effects of doing so, | |||
and most especially whenever the server is satisfying the read of the | and most especially whenever the server is satisfying the read of the | |||
object's content from its cache, the server MAY cache access time | object's content from its cache, the server MAY cache access time | |||
updates and lazily write them to stable storage. It is also | updates and lazily write them to stable storage. It is also | |||
acceptable to give administrators of the server the option to disable | acceptable to give administrators of the server the option to disable | |||
time_access updates. | time_access updates. | |||
5.8.2.38. Attribute 48: time_access_set | 5.8.2.38. Attribute 48: time_access_set | |||
Set the time of last access to the object. SETATTR use only. | Sets the time of last access to the object. SETATTR use only. | |||
5.8.2.39. Attribute 49: time_backup | 5.8.2.39. Attribute 49: time_backup | |||
The time of last backup of the object. | The time of last backup of the object. | |||
5.8.2.40. Attribute 50: time_create | 5.8.2.40. Attribute 50: time_create | |||
The time of creation of the object. This attribute does not have any | The time of creation of the object. This attribute does not have any | |||
relation to the traditional UNIX file attribute "ctime" or "change | relation to the traditional UNIX file attribute "ctime" or "change | |||
time". | time". | |||
skipping to change at page 120, line 45 | skipping to change at page 120, line 15 | |||
5.8.2.42. Attribute 52: time_metadata | 5.8.2.42. Attribute 52: time_metadata | |||
The time of last metadata modification of the object. | The time of last metadata modification of the object. | |||
5.8.2.43. Attribute 53: time_modify | 5.8.2.43. Attribute 53: time_modify | |||
The time of last modification to the object. | The time of last modification to the object. | |||
5.8.2.44. Attribute 54: time_modify_set | 5.8.2.44. Attribute 54: time_modify_set | |||
Set the time of last modification to the object. SETATTR use only. | Sets the time of last modification to the object. SETATTR use only. | |||
5.9. Interpreting owner and owner_group | 5.9. Interpreting owner and owner_group | |||
The RECOMMENDED attributes "owner" and "owner_group" (and also users | The RECOMMENDED attributes "owner" and "owner_group" (and also users | |||
and groups within the "acl" attribute) are represented in terms of a | and groups within the "acl" attribute) are represented in terms of a | |||
UTF-8 string. To avoid a representation that is tied to a particular | UTF-8 string. To avoid a representation that is tied to a particular | |||
underlying implementation at the client or server, the use of the | underlying implementation at the client or server, the use of the | |||
UTF-8 string has been chosen. Note that section 6.1 of RFC2624 [45] | UTF-8 string has been chosen. Note that Section 6.1 of RFC 2624 [45] | |||
provides additional rationale. It is expected that the client and | provides additional rationale. It is expected that the client and | |||
server will have their own local representation of owner and | server will have their own local representation of owner and | |||
owner_group that is used for local storage or presentation to the end | owner_group that is used for local storage or presentation to the end | |||
user. Therefore, it is expected that when these attributes are | user. Therefore, it is expected that when these attributes are | |||
transferred between the client and server that the local | transferred between the client and server, the local representation | |||
representation is translated to a syntax of the form "user@ | is translated to a syntax of the form "user@dns_domain". This will | |||
dns_domain". This will allow for a client and server that do not use | allow for a client and server that do not use the same local | |||
the same local representation the ability to translate to a common | representation the ability to translate to a common syntax that can | |||
syntax that can be interpreted by both. | be interpreted by both. | |||
Similarly, security principals may be represented in different ways | Similarly, security principals may be represented in different ways | |||
by different security mechanisms. Servers normally translate these | by different security mechanisms. Servers normally translate these | |||
representations into a common format, generally that used by local | representations into a common format, generally that used by local | |||
storage, to serve as a means of identifying the users corresponding | storage, to serve as a means of identifying the users corresponding | |||
to these security principals. When these local identifiers are | to these security principals. When these local identifiers are | |||
translated to the form of the owner attribute, associated with files | translated to the form of the owner attribute, associated with files | |||
created by such principals they identify, in a common format, the | created by such principals, they identify, in a common format, the | |||
users associated with each corresponding set of security principals. | users associated with each corresponding set of security principals. | |||
The translation used to interpret owner and group strings is not | The translation used to interpret owner and group strings is not | |||
specified as part of the protocol. This allows various solutions to | specified as part of the protocol. This allows various solutions to | |||
be employed. For example, a local translation table may be consulted | be employed. For example, a local translation table may be consulted | |||
that maps between a numeric identifier to the user@dns_domain syntax. | that maps a numeric identifier to the user@dns_domain syntax. A name | |||
A name service may also be used to accomplish the translation. A | service may also be used to accomplish the translation. A server may | |||
server may provide a more general service, not limited by any | provide a more general service, not limited by any particular | |||
particular translation (which would only translate a limited set of | translation (which would only translate a limited set of possible | |||
possible strings) by storing the owner and owner_group attributes in | strings) by storing the owner and owner_group attributes in local | |||
local storage without any translation or it may augment a translation | storage without any translation or it may augment a translation | |||
method by storing the entire string for attributes for which no | method by storing the entire string for attributes for which no | |||
translation is available while using the local representation for | translation is available while using the local representation for | |||
those cases in which a translation is available. | those cases in which a translation is available. | |||
Servers that do not provide support for all possible values of the | Servers that do not provide support for all possible values of the | |||
owner and owner_group attributes, SHOULD return an error | owner and owner_group attributes SHOULD return an error | |||
(NFS4ERR_BADOWNER) when a string is presented that has no | (NFS4ERR_BADOWNER) when a string is presented that has no | |||
translation, as the value to be set for a SETATTR of the owner, | translation, as the value to be set for a SETATTR of the owner, | |||
owner_group, or acl attributes. When a server does accept an owner | owner_group, or acl attributes. When a server does accept an owner | |||
or owner_group value as valid on a SETATTR (and similarly for the | or owner_group value as valid on a SETATTR (and similarly for the | |||
owner and group strings in an acl), it is promising to return that | owner and group strings in an acl), it is promising to return that | |||
same string when a corresponding GETATTR is done. Configuration | same string when a corresponding GETATTR is done. Configuration | |||
changes (including changes from the mapping of the string to the | changes (including changes from the mapping of the string to the | |||
local representation) and ill-constructed name translations (those | local representation) and ill-constructed name translations (those | |||
that contain aliasing) may make that promise impossible to honor. | that contain aliasing) may make that promise impossible to honor. | |||
Servers should make appropriate efforts to avoid a situation in which | Servers should make appropriate efforts to avoid a situation in which | |||
these attributes have their values changed when no real change to | these attributes have their values changed when no real change to | |||
ownership has occurred. | ownership has occurred. | |||
The "dns_domain" portion of the owner string is meant to be a DNS | The "dns_domain" portion of the owner string is meant to be a DNS | |||
domain name. For example, user@example.org. Servers should accept | domain name, for example, user@example.org. Servers should accept as | |||
as valid a set of users for at least one domain. A server may treat | valid a set of users for at least one domain. A server may treat | |||
other domains as having no valid translations. A more general | other domains as having no valid translations. A more general | |||
service is provided when a server is capable of accepting users for | service is provided when a server is capable of accepting users for | |||
multiple domains, or for all domains, subject to security | multiple domains, or for all domains, subject to security | |||
constraints. | constraints. | |||
In the case where there is no translation available to the client or | In the case where there is no translation available to the client or | |||
server, the attribute value will be constructed without the "@". | server, the attribute value will be constructed without the "@". | |||
Therefore, the absence of the @ from the owner or owner_group | Therefore, the absence of the @ from the owner or owner_group | |||
attribute signifies that no translation was available at the sender | attribute signifies that no translation was available at the sender | |||
and that the receiver of the attribute should not use that string as | and that the receiver of the attribute should not use that string as | |||
a basis for translation into its own internal format. Even though | a basis for translation into its own internal format. Even though | |||
the attribute value can not be translated, it may still be useful. | the attribute value cannot be translated, it may still be useful. In | |||
In the case of a client, the attribute string may be used for local | the case of a client, the attribute string may be used for local | |||
display of ownership. | display of ownership. | |||
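The "@" convention described above can be illustrated with a minimal sketch. The names OwnerString and parse_owner are hypothetical and not part of the protocol. Splitting at the last "@" is an assumption made for this example, since the text only says that the syntax has the form user@dns_domain.

```python
from typing import NamedTuple, Optional


class OwnerString(NamedTuple):
    """Hypothetical holder for a decoded owner or owner_group value."""
    user: str
    dns_domain: Optional[str]  # None when the string carries no "@"


def parse_owner(value: str) -> OwnerString:
    """Split an owner string of the form user@dns_domain.

    Assumption: the split happens at the last "@". A string without "@"
    means the sender had no translation, so the whole string is kept
    for display only and no domain is reported.
    """
    user, sep, domain = value.rpartition("@")
    if not sep:
        # No "@": the receiver should not try to translate this value.
        return OwnerString(user=value, dns_domain=None)
    return OwnerString(user=user, dns_domain=domain)


def is_translatable(value: str) -> bool:
    """True if the receiver may use the string as a basis for translation."""
    return parse_owner(value).dns_domain is not None
```

A receiver built this way would still show a value such as "nobody" to the end user, but would not map it to a local identifier.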
To provide a greater degree of compatibility with NFSv3, which | To provide a greater degree of compatibility with NFSv3, which | |||
identified users and groups by 32-bit unsigned user identifiers and | identified users and groups by 32-bit unsigned user identifiers and | |||
group identifiers, owner and group strings that consist of decimal | group identifiers, owner and group strings that consist of decimal | |||
numeric values with no leading zeros can be given a special | numeric values with no leading zeros can be given a special | |||
interpretation by clients and servers which choose to provide such | interpretation by clients and servers that choose to provide such | |||
support. The receiver may treat such a user or group string as | support. The receiver may treat such a user or group string as | |||
representing the same user as would be represented by an NFSv3 uid or | representing the same user as would be represented by an NFSv3 uid or | |||
gid having the corresponding numeric value. A server is not | gid having the corresponding numeric value. A server is not | |||
obligated to accept such a string, but may return an NFS4ERR_BADOWNER | obligated to accept such a string, but may return an NFS4ERR_BADOWNER | |||
instead. To avoid this mechanism being used to subvert user and | instead. To avoid this mechanism being used to subvert user and | |||
group translation, so that a client might pass all of the owners and | group translation, so that a client might pass all of the owners and | |||
groups in numeric form, a server SHOULD return an NFS4ERR_BADOWNER | groups in numeric form, a server SHOULD return an NFS4ERR_BADOWNER | |||
error when there is a valid translation for the user or owner | error when there is a valid translation for the user or owner | |||
designated in this way. In that case, the client must use the | designated in this way. In that case, the client must use the | |||
appropriate name@domain string and not the special form for | appropriate name@domain string and not the special form for | |||
skipping to change at page 124, line 16 | skipping to change at page 123, line 34 | |||
layout_blksize, and WRITE operations with a data argument of size | layout_blksize, and WRITE operations with a data argument of size | |||
that is a whole multiple of layout_blksize. | that is a whole multiple of layout_blksize. | |||
5.12.4. Attribute 63: layout_hint | 5.12.4. Attribute 63: layout_hint | |||
The layout_hint attribute (see Section 3.3.19) may be set on newly | The layout_hint attribute (see Section 3.3.19) may be set on newly | |||
created files to influence the metadata server's choice for the | created files to influence the metadata server's choice for the | |||
file's layout. If possible, this attribute is one of those set in | file's layout. If possible, this attribute is one of those set in | |||
the initial attributes within the OPEN operation. The metadata | the initial attributes within the OPEN operation. The metadata | |||
server may choose to ignore this attribute. The layout_hint | server may choose to ignore this attribute. The layout_hint | |||
attribute is a sub-set of the layout structure returned by LAYOUTGET. | attribute is a subset of the layout structure returned by LAYOUTGET. | |||
For example, instead of specifying particular devices, this would be | For example, instead of specifying particular devices, this would be | |||
used to suggest the stripe width of a file. The server | used to suggest the stripe width of a file. The server | |||
implementation determines which fields within the layout will be | implementation determines which fields within the layout will be | |||
used. | used. | |||
5.12.5. Attribute 64: layout_type | 5.12.5. Attribute 64: layout_type | |||
This attribute lists the layout type(s) available for a file. The | This attribute lists the layout type(s) available for a file. The | |||
value returned by the server is for informational purposes only. The | value returned by the server is for informational purposes only. The | |||
client will use the LAYOUTGET operation to obtain the information | client will use the LAYOUTGET operation to obtain the information | |||
needed in order to perform I/O. For example, the specific device | needed in order to perform I/O, for example, the specific device | |||
information for the file and its layout. | information for the file and its layout. | |||
5.12.6. Attribute 68: mdsthreshold | 5.12.6. Attribute 68: mdsthreshold | |||
This attribute is a server provided hint used to communicate to the | This attribute is a server-provided hint used to communicate to the | |||
client when it is more efficient to send READ and WRITE operations to | client when it is more efficient to send READ and WRITE operations to | |||
the metadata server or the data server. The two types of thresholds | the metadata server or the data server. The two types of thresholds | |||
described are file size thresholds and I/O size thresholds. If a | described are file size thresholds and I/O size thresholds. If a | |||
file's size is smaller than the file size threshold, data accesses | file's size is smaller than the file size threshold, data accesses | |||
SHOULD be sent to the metadata server. If an I/O request has a | SHOULD be sent to the metadata server. If an I/O request has a | |||
length that is below the I/O size threshold, the I/O SHOULD be sent | length that is below the I/O size threshold, the I/O SHOULD be sent | |||
to the metadata server. Each threshold type is specified separately | to the metadata server. Each threshold type is specified separately | |||
for READ and WRITE. | for read and write. | |||
The server MAY provide both types of thresholds for a file. If both | The server MAY provide both types of thresholds for a file. If both | |||
file size and I/O size are provided, the client SHOULD reach or | file size and I/O size are provided, the client SHOULD reach or | |||
exceed both thresholds before sending its READ or WRITE operations to | exceed both thresholds before sending its read or write requests to | |||
the data server. Alternatively, if only one of the specified | the data server. Alternatively, if only one of the specified | |||
thresholds are reached or exceeded, the I/O requests are sent to the | thresholds is reached or exceeded, the I/O requests are sent to the | |||
metadata server. | metadata server. | |||
For each threshold type, a value of 0 indicates no READ or WRITE | For each threshold type, a value of zero indicates no READ or WRITE | |||
should be sent to the metadata server, while a value of all 1s | should be sent to the metadata server, while a value of all ones | |||
indicates all READS or WRITES should be sent to the metadata server. | indicates that all READs or WRITEs should be sent to the metadata | |||
server. | ||||
The attribute is available on a per filehandle basis. If the current | The attribute is available on a per-filehandle basis. If the current | |||
filehandle refers to a non-pNFS file or directory, the metadata | filehandle refers to a non-pNFS file or directory, the metadata | |||
server should return an attribute that is representative of the | server should return an attribute that is representative of the | |||
filehandle's file system. It is suggested that this attribute is | filehandle's file system. It is suggested that this attribute is | |||
queried as part of the OPEN operation. Due to dynamic system | queried as part of the OPEN operation. Due to dynamic system | |||
changes, the client should not assume that the attribute will remain | changes, the client should not assume that the attribute will remain | |||
constant for any specific time period, thus it should be periodically | constant for any specific time period; thus, it should be | |||
refreshed. | periodically refreshed. | |||
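The threshold rules above can be sketched as a small client-side decision function. This is an illustration only: the function names are hypothetical, and the 64-bit width used for the "all ones" value is an assumption not stated in this section. The caller is expected to pass the read thresholds for a read and the write thresholds for a write, since each threshold type is specified separately for the two.

```python
from typing import Optional

ALL_ONES = 2**64 - 1  # assumption: thresholds are 64-bit unsigned values


def below_threshold(value: int, threshold: Optional[int]) -> bool:
    """True if this one threshold directs the I/O to the metadata server."""
    if threshold is None:
        # The server did not provide this threshold type.
        return False
    if threshold == 0:
        # Zero: no READ or WRITE should go to the metadata server.
        return False
    if threshold == ALL_ONES:
        # All ones: every READ or WRITE should go to the metadata server.
        return True
    return value < threshold


def use_metadata_server(file_size: int, io_len: int,
                        size_threshold: Optional[int] = None,
                        iosize_threshold: Optional[int] = None) -> bool:
    """Decide where to send one I/O request.

    The data server is chosen only when every provided threshold is
    reached or exceeded; otherwise the metadata server is used.
    """
    return (below_threshold(file_size, size_threshold)
            or below_threshold(io_len, iosize_threshold))
```

Because the attribute may change over time, a client following this sketch would re-read the thresholds periodically rather than caching the decision.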
5.13. Retention Attributes | 5.13. Retention Attributes | |||
Retention is a concept whereby a file object can be placed in an | Retention is a concept whereby a file object can be placed in an | |||
immutable, undeletable, unrenamable state for a fixed or infinite | immutable, undeletable, unrenamable state for a fixed or infinite | |||
duration of time. Once in this "retained" state, the file cannot be | duration of time. Once in this "retained" state, the file cannot be | |||
moved out of the state until the duration of retention has been | moved out of the state until the duration of retention has been | |||
reached. | reached. | |||
When retention is enabled, retention MUST extend to the data of the | When retention is enabled, retention MUST extend to the data of the | |||
skipping to change at page 126, line 31 | skipping to change at page 126, line 4 | |||
If the client sets rs_enable to TRUE, then it is enabling retention | If the client sets rs_enable to TRUE, then it is enabling retention | |||
on the file object with the begin time of retention starting from the | on the file object with the begin time of retention starting from the | |||
server's current time and date. The duration of the retention can | server's current time and date. The duration of the retention can | |||
also be provided if the rs_duration array is of length one. The | also be provided if the rs_duration array is of length one. The | |||
duration is the time in seconds from the begin time of retention, and | duration is the time in seconds from the begin time of retention, and | |||
if set to RET4_DURATION_INFINITE, the file is to be retained forever. | if set to RET4_DURATION_INFINITE, the file is to be retained forever. | |||
If retention is enabled, with no duration specified in either this | If retention is enabled, with no duration specified in either this | |||
SETATTR or a previous SETATTR, the duration defaults to zero seconds. | SETATTR or a previous SETATTR, the duration defaults to zero seconds. | |||
The server MAY restrict the enabling of retention or the duration of | The server MAY restrict the enabling of retention or the duration of | |||
retention on the basis of the ACE4_WRITE_RETENTION ACL permission. | retention on the basis of the ACE4_WRITE_RETENTION ACL permission. | |||
The enabling of retention MUST NOT prevent the enabling of event- | The enabling of retention MUST NOT prevent the enabling of event- | |||
based retention nor the modification of the retention_hold attribute. | based retention or the modification of the retention_hold attribute. | |||
The following rules apply to both the retention_set and retentevt_set | The following rules apply to both the retention_set and retentevt_set | |||
attributes. | attributes. | |||
o As long as retention is not enabled, the client is permitted to | o As long as retention is not enabled, the client is permitted to | |||
decrease the duration. | decrease the duration. | |||
o The duration can always be set to an equal or higher value, even | o The duration can always be set to an equal or higher value, even | |||
if retention is enabled. Note that once retention is enabled, the | if retention is enabled. Note that once retention is enabled, the | |||
actual duration (as returned by the retention_get or retentevt_get | actual duration (as returned by the retention_get or retentevt_get | |||
attributes, see Section 5.13.1 or Section 5.13.3), is constantly | attributes; see Section 5.13.1 or Section 5.13.3) is constantly | |||
counting down to zero (one unit per second), unless the duration | counting down to zero (one unit per second), unless the duration | |||
was set to RET4_DURATION_INFINITE. Thus it will not be possible | was set to RET4_DURATION_INFINITE. Thus, it will not be possible | |||
for the client to precisely extend the duration on a file that has | for the client to precisely extend the duration on a file that has | |||
retention enabled. | retention enabled. | |||
o While retention is enabled, attempts to disable retention or | o While retention is enabled, attempts to disable retention or | |||
decrease the retention's duration MUST fail with the error | decrease the retention's duration MUST fail with the error | |||
NFS4ERR_INVAL. | NFS4ERR_INVAL. | |||
o If the principal attempting to change retention_set or | o If the principal attempting to change retention_set or | |||
retentevt_set does not have ACE4_WRITE_RETENTION permissions, the | retentevt_set does not have ACE4_WRITE_RETENTION permissions, the | |||
attempt MUST fail with NFS4ERR_ACCESS. | attempt MUST fail with NFS4ERR_ACCESS. | |||
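The rules in the list above can be sketched as a server-side check. This is a hedged illustration, not the specification's algorithm: RetentionState and check_retention_set are hypothetical names, errors are represented as plain strings, the numeric value given for RET4_DURATION_INFINITE is assumed, and the order in which the permission and validity checks run is an assumption. The sketch also treats any request with the enable flag cleared as an attempt to disable retention.

```python
from dataclasses import dataclass
from typing import Optional

RET4_DURATION_INFINITE = 0xFFFFFFFFFFFFFFFF  # assumed value of the constant

NFS4_OK = "NFS4_OK"
NFS4ERR_ACCESS = "NFS4ERR_ACCESS"
NFS4ERR_INVAL = "NFS4ERR_INVAL"


@dataclass
class RetentionState:
    """Hypothetical per-file state for one kind of retention."""
    enabled: bool = False
    duration: int = 0  # remaining seconds, as the *_get attribute reports


def check_retention_set(state: RetentionState, enable: bool,
                        duration: Optional[int],
                        has_write_retention: bool) -> str:
    """Validate a change to retention_set or retentevt_set."""
    if not has_write_retention:
        # Principal lacks ACE4_WRITE_RETENTION.
        return NFS4ERR_ACCESS
    if state.enabled:
        if not enable:
            # Retention cannot be disabled once enabled.
            return NFS4ERR_INVAL
        if duration is not None and duration < state.duration:
            # The duration cannot be decreased while retention is enabled.
            return NFS4ERR_INVAL
    # Not enabled: any duration is fine. Enabled: equal or higher is fine.
    return NFS4_OK
```

Since the remaining duration counts down every second, a client that re-sends the value it just read may find that the request now counts as an increase, which is why a precise extension is not possible.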
5.13.3. Attribute 71: retentevt_get | 5.13.3. Attribute 71: retentevt_get | |||
Get the event-based retention duration, and if enabled, the event- | Gets the event-based retention duration, and if enabled, the event- | |||
based retention begin time of the file object. This attribute is | based retention begin time of the file object. This attribute is | |||
like retention_get but refers to event-based retention. The event | like retention_get, but refers to event-based retention. The event | |||
that triggers event-based retention is not defined by the NFSv4.1 | that triggers event-based retention is not defined by the NFSv4.1 | |||
specification. | specification. | |||
5.13.4. Attribute 72: retentevt_set | 5.13.4. Attribute 72: retentevt_set | |||
Set the event-based retention duration, and optionally enable event- | Sets the event-based retention duration, and optionally enables | |||
based retention on the file object. This attribute corresponds to | event-based retention on the file object. This attribute corresponds | |||
retentevt_get, is like retention_set, but refers to event-based | to retentevt_get and is like retention_set, but refers to event-based | |||
retention. When event based retention is set, the file MUST be | retention. When event-based retention is set, the file MUST be | |||
retained even if non-event-based retention has been set, and the | retained even if non-event-based retention has been set, and the | |||
duration of non-event-based retention has been reached. Conversely, | duration of non-event-based retention has been reached. Conversely, | |||
when non-event-based retention has been set, the file MUST be | when non-event-based retention has been set, the file MUST be | |||
retained even if event-based retention has been set, and the duration | retained even if event-based retention has been set, and the duration | |||
of event-based retention has been reached. The server MAY restrict | of event-based retention has been reached. The server MAY restrict | |||
the enabling of event-based retention or the duration of event-based | the enabling of event-based retention or the duration of event-based | |||
retention on the basis of the ACE4_WRITE_RETENTION ACL permission. | retention on the basis of the ACE4_WRITE_RETENTION ACL permission. | |||
The enabling of event-based retention MUST NOT prevent the enabling | The enabling of event-based retention MUST NOT prevent the enabling | |||
of non-event-based retention nor the modification of the | of non-event-based retention or the modification of the | |||
retention_hold attribute. | retention_hold attribute. | |||
5.13.5. Attribute 73: retention_hold | 5.13.5. Attribute 73: retention_hold | |||
Get or set administrative retention holds, one hold per bit position. | Gets or sets administrative retention holds, one hold per bit | |||
position. | ||||
This attribute allows one to 64 administrative holds, one hold per | This attribute allows one to 64 administrative holds, one hold per | |||
bit on the attribute. If retention_hold is not zero, then the file | bit on the attribute. If retention_hold is not zero, then the file | |||
MUST NOT be deleted, renamed, or modified, even if the duration on | MUST NOT be deleted, renamed, or modified, even if the duration on | |||
enabled event or non-event-based retention has been reached. The | enabled event or non-event-based retention has been reached. The | |||
server MAY restrict the modification of retention_hold on the basis | server MAY restrict the modification of retention_hold on the basis | |||
of the ACE4_WRITE_RETENTION_HOLD ACL permission. The enabling of | of the ACE4_WRITE_RETENTION_HOLD ACL permission. The enabling of | |||
administration retention holds does not prevent the enabling of | administration retention holds does not prevent the enabling of | |||
event-based or non-event-based retention. | event-based or non-event-based retention. | |||
If the principal attempting to change retention_hold does not have | If the principal attempting to change retention_hold does not have | |||
ACE4_WRITE_RETENTION_HOLD permissions, the attempt MUST fail with | ACE4_WRITE_RETENTION_HOLD permissions, the attempt MUST fail with | |||
NFS4ERR_ACCESS. | NFS4ERR_ACCESS. | |||
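The interaction between administrative holds and the two kinds of retention can be sketched as follows. The helper names are hypothetical, and the sketch assumes that "duration reached" corresponds to a remaining duration of zero. A file stays protected if any hold bit is set, or if either event-based or non-event-based retention is enabled and has not yet run out.

```python
MAX_HOLDS = 64  # one hold per bit of retention_hold


def set_hold(retention_hold: int, bit: int) -> int:
    """Return retention_hold with the given hold bit (0..63) set."""
    if not 0 <= bit < MAX_HOLDS:
        raise ValueError("hold bit must be in the range 0..63")
    return retention_hold | (1 << bit)


def clear_hold(retention_hold: int, bit: int) -> int:
    """Return retention_hold with the given hold bit (0..63) cleared."""
    if not 0 <= bit < MAX_HOLDS:
        raise ValueError("hold bit must be in the range 0..63")
    return retention_hold & ~(1 << bit)


def is_retained(retention_hold: int,
                ret_enabled: bool, ret_remaining: int,
                evt_enabled: bool, evt_remaining: int) -> bool:
    """True if the file must not be deleted, renamed, or modified."""
    if retention_hold != 0:
        # Any administrative hold keeps the file retained on its own.
        return True
    if ret_enabled and ret_remaining > 0:
        return True
    if evt_enabled and evt_remaining > 0:
        return True
    return False
```

In this sketch, clearing the last hold releases the file only when both retention durations have also been reached.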
6. Access Control Attributes | 6. Access Control Attributes | |||
Access Control Lists (ACLs) are file attributes that specify fine | Access Control Lists (ACLs) are file attributes that specify fine- | |||
grained access control. This chapter covers the "acl", "dacl", | grained access control. This section covers the "acl", "dacl", | |||
"sacl", "aclsupport", "mode", "mode_set_masked" file attributes, and | "sacl", "aclsupport", "mode", and "mode_set_masked" file attributes | |||
their interactions. Note that file attributes may apply to any file | and their interactions. Note that file attributes may apply to any | |||
system object. | file system object. | |||
6.1. Goals | 6.1. Goals | |||
ACLs and modes represent two well established models for specifying | ACLs and modes represent two well-established models for specifying | |||
permissions. This chapter specifies requirements that attempt to | permissions. This section specifies requirements that attempt to | |||
meet the following goals: | meet the following goals: | |||
o If a server supports the mode attribute, it should provide | o If a server supports the mode attribute, it should provide | |||
reasonable semantics to clients that only set and retrieve the | reasonable semantics to clients that only set and retrieve the | |||
mode attribute. | mode attribute. | |||
o If a server supports ACL attributes, it should provide reasonable | o If a server supports ACL attributes, it should provide reasonable | |||
semantics to clients that only set and retrieve those attributes. | semantics to clients that only set and retrieve those attributes. | |||
o On servers that support the mode attribute, if ACL attributes have | o On servers that support the mode attribute, if ACL attributes have | |||
skipping to change at page 128, line 40 | skipping to change at page 128, line 12 | |||
o On servers that support the mode attribute, if the ACL attributes | o On servers that support the mode attribute, if the ACL attributes | |||
have been previously set on an object, either explicitly or via | have been previously set on an object, either explicitly or via | |||
inheritance: | inheritance: | |||
* Setting only the mode attribute should effectively control the | * Setting only the mode attribute should effectively control the | |||
traditional UNIX-like permissions of read, write, and execute | traditional UNIX-like permissions of read, write, and execute | |||
on owner, owner_group, and other. | on owner, owner_group, and other. | |||
* Setting only the mode attribute should provide reasonable | * Setting only the mode attribute should provide reasonable | |||
security. For example, setting a mode of 000 should be enough | security. For example, setting a mode of 000 should be enough | |||
to ensure that future opens for read or write by any principal | to ensure that future OPEN operations for | |||
fail, regardless of a previously existing or inherited ACL. | OPEN4_SHARE_ACCESS_READ or OPEN4_SHARE_ACCESS_WRITE by any | |||
principal fail, regardless of a previously existing or | ||||
inherited ACL. | ||||
o NFSv4.1 may introduce different semantics relating to the mode and | o NFSv4.1 may introduce different semantics relating to the mode and | |||
ACL attributes, but it does not render invalid any previously | ACL attributes, but it does not render invalid any previously | |||
existing implementations. Additionally, this chapter provides | existing implementations. Additionally, this section provides | |||
clarifications based on previous implementations and discussions | clarifications based on previous implementations and discussions | |||
around them. | around them. | |||
o On servers that support both the mode and the acl or dacl | o On servers that support both the mode and the acl or dacl | |||
attributes, the server must keep the two consistent with each | attributes, the server must keep the two consistent with each | |||
other. The value of the mode attribute (with the exception of the | other. The value of the mode attribute (with the exception of the | |||
three high order bits described in Section 6.2.4), must be | three high-order bits described in Section 6.2.4) must be | |||
determined entirely by the value of the ACL, so that use of the | determined entirely by the value of the ACL, so that use of the | |||
mode is never required for anything other than setting the three | mode is never required for anything other than setting the three | |||
high order bits. See Section 6.4.1 for exact requirements. | high-order bits. See Section 6.4.1 for exact requirements. | |||
o When a mode attribute is set on an object, the ACL attributes may | o When a mode attribute is set on an object, the ACL attributes may | |||
need to be modified so as to not conflict with the new mode. In | need to be modified in order to not conflict with the new mode. | |||
such cases, it is desirable that the ACL keep as much information | In such cases, it is desirable that the ACL keep as much | |||
as possible. This includes information about inheritance, AUDIT | information as possible. This includes information about | |||
and ALARM ACEs, and permissions granted and denied that do not | inheritance, AUDIT and ALARM ACEs, and permissions granted and | |||
conflict with the new mode. | denied that do not conflict with the new mode. | |||
6.2. File Attributes Discussion | 6.2. File Attributes Discussion | |||
6.2.1. Attribute 12: acl | 6.2.1. Attribute 12: acl | |||
The NFSv4.1 ACL attribute contains an array of access control entries | The NFSv4.1 ACL attribute contains an array of Access Control Entries | |||
(ACEs) that are associated with the file system object. Although the | (ACEs) that are associated with the file system object. Although the | |||
client can read and write the acl attribute, the server is | client can set and get the acl attribute, the server is responsible | |||
responsible for using the ACL to perform access control. The client | for using the ACL to perform access control. The client can use the | |||
can use the OPEN or ACCESS operations to check access without | OPEN or ACCESS operations to check access without modifying or | |||
modifying or reading data or metadata. | reading data or metadata. | |||
The NFS ACE structure is defined as follows: | The NFS ACE structure is defined as follows: | |||
typedef uint32_t acetype4; | typedef uint32_t acetype4; | |||
typedef uint32_t aceflag4; | typedef uint32_t aceflag4; | |||
typedef uint32_t acemask4; | typedef uint32_t acemask4; | |||
struct nfsace4 { | struct nfsace4 { | |||
acetype4 type; | acetype4 type; | |||
aceflag4 flag; | aceflag4 flag; | |||
acemask4 access_mask; | acemask4 access_mask; | |||
utf8str_mixed who; | utf8str_mixed who; | |||
}; | }; | |||
To determine if a request succeeds, the server processes each nfsace4 | To determine if a request succeeds, the server processes each nfsace4 | |||
entry in order. Only ACEs which have a "who" that matches the | entry in order. Only ACEs that have a "who" that matches the | |||
requester are considered. Each ACE is processed until all of the | requester are considered. Each ACE is processed until all of the | |||
bits of the requester's access have been ALLOWED. Once a bit (see | bits of the requester's access have been ALLOWED. Once a bit (see | |||
below) has been ALLOWED by an ACCESS_ALLOWED_ACE, it is no longer | below) has been ALLOWED by an ACCESS_ALLOWED_ACE, it is no longer | |||
considered in the processing of later ACEs. If an ACCESS_DENIED_ACE | considered in the processing of later ACEs. If an ACCESS_DENIED_ACE | |||
is encountered where the requester's access still has unALLOWED bits | is encountered where the requester's access still has unALLOWED bits | |||
in common with the "access_mask" of the ACE, the request is denied. | in common with the "access_mask" of the ACE, the request is denied. | |||
When the ACL is fully processed, if there are bits in the requester's | When the ACL is fully processed, if there are bits in the requester's | |||
mask that have not been ALLOWED or DENIED, access is denied. | mask that have not been ALLOWED or DENIED, access is denied. | |||
Unlike the ALLOW and DENY ACE types, the ALARM and AUDIT ACE types do | Unlike the ALLOW and DENY ACE types, the ALARM and AUDIT ACE types do | |||
not affect a requester's access, and instead are for triggering | not affect a requester's access, and instead are for triggering | |||
events as a result of a requester's access attempt. Therefore, AUDIT | events as a result of a requester's access attempt. Therefore, AUDIT | |||
and ALARM ACEs are processed only after processing ALLOW and DENY | and ALARM ACEs are processed only after processing ALLOW and DENY | |||
ACEs. | ACEs. | |||
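The ordered evaluation described above can be sketched as follows. This
is a minimal illustration, not the server algorithm of any particular
implementation. The names Ace, check_access, and the "principals" set
are hypothetical; matching "who" by simple set membership is a
simplifying assumption, and the sketch ignores ACE flags (for example,
inherit-only ACEs) and the AUDIT/ALARM processing that follows.

```python
from dataclasses import dataclass

# ACE type constants from Section 6.2.1.1.
ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000
ACE4_ACCESS_DENIED_ACE_TYPE = 0x00000001

# Access mask bits from Section 6.2.1.3.
ACE4_READ_DATA = 0x00000001
ACE4_WRITE_DATA = 0x00000002


@dataclass
class Ace:
    """Hypothetical in-memory mirror of struct nfsace4."""
    type: int
    flag: int
    access_mask: int
    who: str


def check_access(acl, principals, requested_mask):
    """Return True only if every requested bit is ALLOWED before a
    matching DENY ACE names it.

    'principals' is an assumed set of "who" strings that match the
    requester; real matching is more involved.
    """
    remaining = requested_mask  # bits not yet ALLOWED
    for ace in acl:
        if remaining == 0:
            break  # all requested bits have been ALLOWED
        if ace.who not in principals:
            continue  # only ACEs whose "who" matches are considered
        if ace.type == ACE4_ACCESS_ALLOWED_ACE_TYPE:
            # ALLOWED bits drop out of later processing.
            remaining &= ~ace.access_mask
        elif ace.type == ACE4_ACCESS_DENIED_ACE_TYPE:
            if remaining & ace.access_mask:
                return False  # a still-unALLOWED bit is denied
        # AUDIT and ALARM ACEs do not affect the access decision.
    # Bits neither ALLOWED nor DENIED result in denial.
    return remaining == 0
```

Because ACEs are processed in order, placing a DENY ACE before an ALLOW
ACE for the same principal and bit denies the request, while the
reverse order allows it.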
The NFSv4.1 ACL model is quite rich. Some server platforms may | The NFSv4.1 ACL model is quite rich. Some server platforms may | |||
provide access control functionality that goes beyond the UNIX-style | provide access-control functionality that goes beyond the UNIX-style | |||
mode attribute, but which is not as rich as the NFS ACL model. So | mode attribute, but that is not as rich as the NFS ACL model. So | |||
that users can take advantage of this more limited functionality, the | that users can take advantage of this more limited functionality, the | |||
server may support the acl attributes by mapping between its ACL | server may support the acl attributes by mapping between its ACL | |||
model and the NFSv4.1 ACL model. Servers must ensure that the ACL | model and the NFSv4.1 ACL model. Servers must ensure that the ACL | |||
they actually store or enforce is at least as strict as the NFSv4 ACL | they actually store or enforce is at least as strict as the NFSv4 ACL | |||
that was set. It is tempting to accomplish this by rejecting any ACL | that was set. It is tempting to accomplish this by rejecting any ACL | |||
that falls outside the small set that can be represented accurately. | that falls outside the small set that can be represented accurately. | |||
However, such an approach can render ACLs unusable without special | However, such an approach can render ACLs unusable without special | |||
client-side knowledge of the server's mapping, which defeats the | client-side knowledge of the server's mapping, which defeats the | |||
purpose of having a common NFSv4 ACL protocol. Therefore servers | purpose of having a common NFSv4 ACL protocol. Therefore, servers | |||
should accept every ACL that they can without compromising security. | should accept every ACL that they can without compromising security. | |||
To help accomplish this, servers may make a special exception, in the | To help accomplish this, servers may make a special exception, in the | |||
case of unsupported permission bits, to the rule that bits not | case of unsupported permission bits, to the rule that bits not | |||
ALLOWED or DENIED by an ACL must be denied. For example, a UNIX- | ALLOWED or DENIED by an ACL must be denied. For example, a UNIX- | |||
style server might choose to silently allow read attribute | style server might choose to silently allow read attribute | |||
permissions even though an ACL does not explicitly allow those | permissions even though an ACL does not explicitly allow those | |||
permissions. (An ACL that explicitly denies permission to read | permissions. (An ACL that explicitly denies permission to read | |||
attributes should still be rejected.) | attributes should still be rejected.) | |||
The situation is complicated by the fact that a server may have | The situation is complicated by the fact that a server may have | |||
multiple modules that enforce ACLs. For example, the enforcement for | multiple modules that enforce ACLs. For example, the enforcement for | |||
NFSv4.1 access may be different from, but not weaker than, the | NFSv4.1 access may be different from, but not weaker than, the | |||
enforcement for local access, and both may be different from the | enforcement for local access, and both may be different from the | |||
enforcement for access through other protocols such as SMB. So it | enforcement for access through other protocols such as SMB (Server | |||
may be useful for a server to accept an ACL even if not all of its | Message Block). So it may be useful for a server to accept an ACL | |||
modules are able to support it. | even if not all of its modules are able to support it. | |||
The guiding principle with regard to NFSv4 access is that the server | The guiding principle with regard to NFSv4 access is that the server | |||
must not accept ACLs that appear to make access to the file more | must not accept ACLs that appear to make access to the file more | |||
restrictive than it really is. | restrictive than it really is. | |||
6.2.1.1. ACE Type | 6.2.1.1. ACE Type | |||
The constants used for the type field (acetype4) are as follows: | The constants used for the type field (acetype4) are as follows: | |||
const ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000; | const ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000; | |||
const ACE4_ACCESS_DENIED_ACE_TYPE = 0x00000001; | const ACE4_ACCESS_DENIED_ACE_TYPE = 0x00000001; | |||
const ACE4_SYSTEM_AUDIT_ACE_TYPE = 0x00000002; | const ACE4_SYSTEM_AUDIT_ACE_TYPE = 0x00000002; | |||
const ACE4_SYSTEM_ALARM_ACE_TYPE = 0x00000003; | const ACE4_SYSTEM_ALARM_ACE_TYPE = 0x00000003; | |||
Only the ALLOWED and DENIED bits types may be used in the dacl | Only the ALLOWED and DENIED bits may be used in the dacl attribute, | |||
attribute, and only the AUDIT and ALARM bits may be used in the sacl | and only the AUDIT and ALARM bits may be used in the sacl attribute. | |||
attribute. All four are permitted in the acl attribute. | All four are permitted in the acl attribute. | |||
+------------------------------+--------------+---------------------+ | +------------------------------+--------------+---------------------+ | |||
| Value | Abbreviation | Description | | | Value | Abbreviation | Description | | |||
+------------------------------+--------------+---------------------+ | +------------------------------+--------------+---------------------+ | |||
| ACE4_ACCESS_ALLOWED_ACE_TYPE | ALLOW | Explicitly grants | | | ACE4_ACCESS_ALLOWED_ACE_TYPE | ALLOW | Explicitly grants | | |||
| | | the access defined | | | | | the access defined | | |||
| | | in acemask4 to the | | | | | in acemask4 to the | | |||
| | | file or directory. | | | | | file or directory. | | |||
| ACE4_ACCESS_DENIED_ACE_TYPE | DENY | Explicitly denies | | | ACE4_ACCESS_DENIED_ACE_TYPE | DENY | Explicitly denies | | |||
| | | the access defined | | | | | the access defined | | |||
| | | in acemask4 to the | | | | | in acemask4 to the | | |||
| | | file or directory. | | | | | file or directory. | | |||
| ACE4_SYSTEM_AUDIT_ACE_TYPE | AUDIT | LOG (in a system | | | ACE4_SYSTEM_AUDIT_ACE_TYPE | AUDIT | Log (in a | | |||
| | | dependent way) any | | | | | system-dependent | | |||
| | | access attempt to a | | | | | way) any access | | |||
| | | file or directory | | | | | attempt to a file | | |||
| | | which uses any of | | | | | or directory that | | |||
| | | the access methods | | | | | uses any of the | | |||
| | | access methods | | ||||
| | | specified in | | | | | specified in | | |||
| | | acemask4. | | | | | acemask4. | | |||
| ACE4_SYSTEM_ALARM_ACE_TYPE | ALARM | Generate a system | | | ACE4_SYSTEM_ALARM_ACE_TYPE | ALARM | Generate an alarm | | |||
| | | ALARM (system | | | | | (in a | | |||
| | | dependent) when any | | | | | system-dependent | | |||
| | | way) when any | | ||||
| | | access attempt is | | | | | access attempt is | | |||
| | | made to a file or | | | | | made to a file or | | |||
| | | directory for the | | | | | directory for the | | |||
| | | access methods | | | | | access methods | | |||
| | | specified in | | | | | specified in | | |||
| | | acemask4. | | | | | acemask4. | | |||
+------------------------------+--------------+---------------------+ | +------------------------------+--------------+---------------------+ | |||
The "Abbreviation" column denotes how the types will be referred to | The "Abbreviation" column denotes how the types will be referred to | |||
throughout the rest of this chapter. | throughout the rest of this section. | |||
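The rule above about which ACE types may appear in the acl, dacl, and
sacl attributes could be checked along these lines. This is a sketch
under stated assumptions: the PERMITTED_TYPES table and the function
types_valid_for_attribute are hypothetical helpers, and the attribute
names are used as plain strings purely for illustration.

```python
# ACE type constants from Section 6.2.1.1.
ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000
ACE4_ACCESS_DENIED_ACE_TYPE = 0x00000001
ACE4_SYSTEM_AUDIT_ACE_TYPE = 0x00000002
ACE4_SYSTEM_ALARM_ACE_TYPE = 0x00000003

# Hypothetical lookup table: which ACE types each attribute may carry.
PERMITTED_TYPES = {
    "acl": {
        ACE4_ACCESS_ALLOWED_ACE_TYPE,
        ACE4_ACCESS_DENIED_ACE_TYPE,
        ACE4_SYSTEM_AUDIT_ACE_TYPE,
        ACE4_SYSTEM_ALARM_ACE_TYPE,
    },
    "dacl": {ACE4_ACCESS_ALLOWED_ACE_TYPE, ACE4_ACCESS_DENIED_ACE_TYPE},
    "sacl": {ACE4_SYSTEM_AUDIT_ACE_TYPE, ACE4_SYSTEM_ALARM_ACE_TYPE},
}


def types_valid_for_attribute(attr_name, ace_types):
    """Return True if every ACE type in 'ace_types' may appear in the
    named attribute ("acl", "dacl", or "sacl")."""
    permitted = PERMITTED_TYPES[attr_name]
    return all(t in permitted for t in ace_types)
```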
6.2.1.2. Attribute 13: aclsupport | 6.2.1.2. Attribute 13: aclsupport | |||
A server need not support all of the above ACE types. This attribute | A server need not support all of the above ACE types. This attribute | |||
indicates which ACE types are supported for the current file system. | indicates which ACE types are supported for the current file system. | |||
The bitmask constants used to represent the above definitions within | The bitmask constants used to represent the above definitions within | |||
the aclsupport attribute are as follows: | the aclsupport attribute are as follows: | |||
const ACL4_SUPPORT_ALLOW_ACL = 0x00000001; | const ACL4_SUPPORT_ALLOW_ACL = 0x00000001; | |||
const ACL4_SUPPORT_DENY_ACL = 0x00000002; | const ACL4_SUPPORT_DENY_ACL = 0x00000002; | |||
const ACL4_SUPPORT_AUDIT_ACL = 0x00000004; | const ACL4_SUPPORT_AUDIT_ACL = 0x00000004; | |||
const ACL4_SUPPORT_ALARM_ACL = 0x00000008; | const ACL4_SUPPORT_ALARM_ACL = 0x00000008; | |||
Servers which support either the ALLOW or DENY ACE type SHOULD | Servers that support either the ALLOW or DENY ACE type SHOULD support | |||
support both ALLOW and DENY ACE types. | both ALLOW and DENY ACE types. | |||
Clients should not attempt to set an ACE unless the server claims | Clients should not attempt to set an ACE unless the server claims | |||
support for that ACE type. If the server receives a request to set | support for that ACE type. If the server receives a request to set | |||
an ACE that it cannot store, it MUST reject the request with | an ACE that it cannot store, it MUST reject the request with | |||
NFS4ERR_ATTRNOTSUPP. If the server receives a request to set an ACE | NFS4ERR_ATTRNOTSUPP. If the server receives a request to set an ACE | |||
that it can store but cannot enforce, the server SHOULD reject the | that it can store but cannot enforce, the server SHOULD reject the | |||
request with NFS4ERR_ATTRNOTSUPP. | request with NFS4ERR_ATTRNOTSUPP. | |||
Support for any of the ACL attributes is optional (albeit, | Support for any of the ACL attributes is optional (albeit | |||
RECOMMENDED). However, a server that supports either of the new ACL | RECOMMENDED). However, a server that supports either of the new ACL | |||
attributes (dacl or sacl) MUST allow use of the new ACL attributes to | attributes (dacl or sacl) MUST allow use of the new ACL attributes to | |||
access all of the ACE types which it supports. In other words, if | access all of the ACE types that it supports. In other words, if | |||
such a server supports ALLOW or DENY ACEs, then it MUST support the | such a server supports ALLOW or DENY ACEs, then it MUST support the | |||
dacl attribute, and if it supports AUDIT or ALARM ACEs, then it MUST | dacl attribute, and if it supports AUDIT or ALARM ACEs, then it MUST | |||
support the sacl attribute. | support the sacl attribute. | |||
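A client that wants to follow the advice above, and avoid setting ACEs
whose type the server does not claim to support, might test the
aclsupport bitmask before issuing SETATTR. The following is a sketch
only; SUPPORT_BIT and unsupported_types are hypothetical names, and how
the client obtains the aclsupport value (for example, via GETATTR) is
outside the sketch.

```python
# ACE type constants from Section 6.2.1.1.
ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000
ACE4_ACCESS_DENIED_ACE_TYPE = 0x00000001
ACE4_SYSTEM_AUDIT_ACE_TYPE = 0x00000002
ACE4_SYSTEM_ALARM_ACE_TYPE = 0x00000003

# aclsupport bitmask constants from this section.
ACL4_SUPPORT_ALLOW_ACL = 0x00000001
ACL4_SUPPORT_DENY_ACL = 0x00000002
ACL4_SUPPORT_AUDIT_ACL = 0x00000004
ACL4_SUPPORT_ALARM_ACL = 0x00000008

# Hypothetical mapping from each ACE type to its aclsupport bit.
SUPPORT_BIT = {
    ACE4_ACCESS_ALLOWED_ACE_TYPE: ACL4_SUPPORT_ALLOW_ACL,
    ACE4_ACCESS_DENIED_ACE_TYPE: ACL4_SUPPORT_DENY_ACL,
    ACE4_SYSTEM_AUDIT_ACE_TYPE: ACL4_SUPPORT_AUDIT_ACL,
    ACE4_SYSTEM_ALARM_ACE_TYPE: ACL4_SUPPORT_ALARM_ACL,
}


def unsupported_types(aclsupport, ace_types):
    """Return the ACE types in 'ace_types' whose support bit is not set
    in the server's aclsupport value, preserving input order."""
    return [t for t in ace_types if not (aclsupport & SUPPORT_BIT[t])]
```

If the returned list is non-empty, the client can drop or rework those
ACEs rather than send a request that the server would reject with
NFS4ERR_ATTRNOTSUPP.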
6.2.1.3. ACE Access Mask | 6.2.1.3. ACE Access Mask | |||
The bitmask constants used for the access mask field are as follows: | The bitmask constants used for the access mask field are as follows: | |||
const ACE4_READ_DATA = 0x00000001; | const ACE4_READ_DATA = 0x00000001; | |||
const ACE4_LIST_DIRECTORY = 0x00000001; | const ACE4_LIST_DIRECTORY = 0x00000001; | |||
skipping to change at page 135, line 28 | skipping to change at page 134, line 47 | |||
ACE4_READ_NAMED_ATTRS | ACE4_READ_NAMED_ATTRS | |||
Operation(s) affected: | Operation(s) affected: | |||
OPENATTR | OPENATTR | |||
Discussion: | Discussion: | |||
Permission to read the named attributes of a file or to lookup | Permission to read the named attributes of a file or to lookup | |||
the named attributes directory. OPENATTR is affected when it | the named attribute directory. OPENATTR is affected when it is | |||
is not used to create a named attribute directory. This is | not used to create a named attribute directory. This is when | |||
when 1.) createdir is TRUE, but a named attribute directory | 1) createdir is TRUE, but a named attribute directory already | |||
already exists, or 2.) createdir is FALSE. | exists, or 2) createdir is FALSE. | |||
ACE4_WRITE_NAMED_ATTRS | ACE4_WRITE_NAMED_ATTRS | |||
Operation(s) affected: | Operation(s) affected: | |||
OPENATTR | OPENATTR | |||
Discussion: | Discussion: | |||
Permission to write the named attributes of a file or to create | Permission to write the named attributes of a file or to create | |||
a named attribute directory. OPENATTR is affected when it is | a named attribute directory. OPENATTR is affected when it is | |||
used to create a named attribute directory. This is when | used to create a named attribute directory. This is when | |||
createdir is TRUE and no named attribute directory exists. The | createdir is TRUE and no named attribute directory exists. The | |||
ability to check whether or not a named attribute directory | ability to check whether or not a named attribute directory | |||
exists depends on the ability to look it up, therefore, users | exists depends on the ability to look it up; therefore, users | |||
also need the ACE4_READ_NAMED_ATTRS permission in order to | also need the ACE4_READ_NAMED_ATTRS permission in order to | |||
create a named attribute directory. | create a named attribute directory. | |||
ACE4_EXECUTE | ACE4_EXECUTE | |||
Operation(s) affected: | Operation(s) affected: | |||
READ | READ | |||
OPEN | OPEN | |||
skipping to change at page 137, line 33 | skipping to change at page 137, line 4 | |||
Permission to delete a file or directory within a directory. | Permission to delete a file or directory within a directory. | |||
See Section 6.2.1.3.2 for information on how ACE4_DELETE and | See Section 6.2.1.3.2 for information on how ACE4_DELETE and | |||
ACE4_DELETE_CHILD interact. | ACE4_DELETE_CHILD interact. | |||
ACE4_READ_ATTRIBUTES | ACE4_READ_ATTRIBUTES | |||
Operation(s) affected: | Operation(s) affected: | |||
GETATTR of file system object attributes | GETATTR of file system object attributes | |||
VERIFY | VERIFY | |||
NVERIFY | NVERIFY | |||
READDIR | READDIR | |||
Discussion: | Discussion: | |||
The ability to read basic attributes (non-ACLs) of a file. On | The ability to read basic attributes (non-ACLs) of a file. On | |||
a UNIX system, basic attributes can be thought of as the stat | a UNIX system, basic attributes can be thought of as the stat- | |||
level attributes. Allowing this access mask bit would mean the | level attributes. Allowing this access mask bit would mean | |||
entity can execute "ls -l" and stat. If a READDIR operation | that the entity can execute "ls -l" and stat. If a READDIR | |||
requests attributes, this mask must be allowed for the READDIR | operation requests attributes, this mask must be allowed for | |||
to succeed. | the READDIR to succeed. | |||
ACE4_WRITE_ATTRIBUTES | ACE4_WRITE_ATTRIBUTES | |||
Operation(s) affected: | Operation(s) affected: | |||
SETATTR of time_access_set, time_backup, | SETATTR of time_access_set, time_backup, | |||
time_create, time_modify_set, mimetype, hidden, system | time_create, time_modify_set, mimetype, hidden, system | |||
Discussion: | Discussion: | |||
Permission to change the times associated with a file or | Permission to change the times associated with a file or | |||
directory to an arbitrary value. Also permission to change the | directory to an arbitrary value. Also permission to change the | |||
mimetype, hidden and system attributes. A user having | mimetype, hidden, and system attributes. A user having | |||
ACE4_WRITE_DATA or ACE4_WRITE_ATTRIBUTES will be allowed to set | ACE4_WRITE_DATA or ACE4_WRITE_ATTRIBUTES will be allowed to set | |||
the times associated with a file to the current server time. | the times associated with a file to the current server time. | |||
ACE4_WRITE_RETENTION | ACE4_WRITE_RETENTION | |||
Operation(s) affected: | Operation(s) affected: | |||
SETATTR of retention_set, retentevt_set. | SETATTR of retention_set, retentevt_set. | |||
Discussion: | Discussion: | |||
skipping to change at page 140, line 19 | skipping to change at page 139, line 34 | |||
NONE | NONE | |||
Discussion: | Discussion: | |||
Permission to use the file object as a synchronization | Permission to use the file object as a synchronization | |||
primitive for interprocess communication. This permission is | primitive for interprocess communication. This permission is | |||
not enforced or interpreted by the NFSv4.1 server on behalf of | not enforced or interpreted by the NFSv4.1 server on behalf of | |||
the client. | the client. | |||
Typically, the ACE4_SYNCHRONIZE permission is only meaningful | Typically, the ACE4_SYNCHRONIZE permission is only meaningful | |||
on local file systems, i.e. file systems not accessed via | on local file systems, i.e., file systems not accessed via | |||
NFSv4.1. The reason that the permission bit exists is that | NFSv4.1. The reason that the permission bit exists is that | |||
some operating environments, such as Windows, use | some operating environments, such as Windows, use | |||
ACE4_SYNCHRONIZE. | ACE4_SYNCHRONIZE. | |||
For example, if a client copies a file that has | For example, if a client copies a file that has | |||
ACE4_SYNCHRONIZE set from a local file system to an NFSv4.1 | ACE4_SYNCHRONIZE set from a local file system to an NFSv4.1 | |||
server, and then later copies the file from the NFSv4.1 server | server, and then later copies the file from the NFSv4.1 server | |||
to a local file system, it is likely that if ACE4_SYNCHRONIZE | to a local file system, it is likely that if ACE4_SYNCHRONIZE | |||
was set in the original file, the client will want it set in | was set in the original file, the client will want it set in | |||
the second copy. The first copy will not have the permission | the second copy. The first copy will not have the permission | |||
skipping to change at page 141, line 8 | skipping to change at page 140, line 23 | |||
except in the previously discussed cases of execute and read. For | except in the previously discussed cases of execute and read. For | |||
example, suppose a server cannot distinguish overwriting data from | example, suppose a server cannot distinguish overwriting data from | |||
appending new data, as described in the previous paragraph. If a | appending new data, as described in the previous paragraph. If a | |||
client submits an ALLOW ACE where ACE4_APPEND_DATA is set but | client submits an ALLOW ACE where ACE4_APPEND_DATA is set but | |||
ACE4_WRITE_DATA is not (or vice versa), the server should either turn | ACE4_WRITE_DATA is not (or vice versa), the server should either turn | |||
off ACE4_APPEND_DATA or reject the request with NFS4ERR_ATTRNOTSUPP. | off ACE4_APPEND_DATA or reject the request with NFS4ERR_ATTRNOTSUPP. | |||
6.2.1.3.2. ACE4_DELETE vs. ACE4_DELETE_CHILD | 6.2.1.3.2. ACE4_DELETE vs. ACE4_DELETE_CHILD | |||
Two access mask bits govern the ability to delete a directory entry: | Two access mask bits govern the ability to delete a directory entry: | |||
ACE4_DELETE on the object itself (the "target"), and | ACE4_DELETE on the object itself (the "target") and ACE4_DELETE_CHILD | |||
ACE4_DELETE_CHILD on the containing directory (the "parent"). | on the containing directory (the "parent"). | |||
Many systems also take the "sticky bit" (MODE4_SVTX) on a directory | Many systems also take the "sticky bit" (MODE4_SVTX) on a directory | |||
to allow unlink only to a user that owns either the target or the | to allow unlink only to a user that owns either the target or the | |||
parent; on some such systems the decision also depends on whether the | parent; on some such systems the decision also depends on whether the | |||
target is writable. | target is writable. | |||
Servers SHOULD allow unlink if either ACE4_DELETE is permitted on the | Servers SHOULD allow unlink if either ACE4_DELETE is permitted on the | |||
target, or ACE4_DELETE_CHILD is permitted on the parent. (Note that | target, or ACE4_DELETE_CHILD is permitted on the parent. (Note that | |||
this is true even if the parent or target explicitly denies one of | this is true even if the parent or target explicitly denies one of | |||
these permissions.) | these permissions.) | |||
skipping to change at page 142, line 23 | skipping to change at page 141, line 42 | |||
Any non-directory file in any sub-directory will get this ACE | Any non-directory file in any sub-directory will get this ACE | |||
inherited. | inherited. | |||
ACE4_DIRECTORY_INHERIT_ACE | ACE4_DIRECTORY_INHERIT_ACE | |||
Can be placed on a directory and indicates that this ACE should be | Can be placed on a directory and indicates that this ACE should be | |||
added to each new directory created. | added to each new directory created. | |||
If this flag is set in an ACE in an ACL attribute to be set on a | If this flag is set in an ACE in an ACL attribute to be set on a | |||
non-directory file system object, the operation attempting to set | non-directory file system object, the operation attempting to set | |||
the ACL SHOULD fail with NFS4ERR_ATTRNOTSUPP. | the ACL SHOULD fail with NFS4ERR_ATTRNOTSUPP. | |||
ACE4_NO_PROPAGATE_INHERIT_ACE | ||||
Can be placed on a directory. This flag tells the server that | ||||
inheritance of this ACE should stop at newly created child | ||||
directories. | ||||
ACE4_INHERIT_ONLY_ACE | ACE4_INHERIT_ONLY_ACE | |||
Can be placed on a directory but does not apply to the directory; | Can be placed on a directory but does not apply to the directory; | |||
ALLOW and DENY ACEs with this bit set do not affect access to the | ALLOW and DENY ACEs with this bit set do not affect access to the | |||
directory, and AUDIT and ALARM ACEs with this bit set do not | directory, and AUDIT and ALARM ACEs with this bit set do not | |||
trigger log or alarm events. Such ACEs only take effect once they | trigger log or alarm events. Such ACEs only take effect once they | |||
are applied (with this bit cleared) to newly created files and | are applied (with this bit cleared) to newly created files and | |||
directories as specified by the above two flags. | directories as specified by the ACE4_FILE_INHERIT_ACE and | |||
ACE4_DIRECTORY_INHERIT_ACE flags. | ||||
If this flag is present on an ACE, but neither | If this flag is present on an ACE, but neither | |||
ACE4_DIRECTORY_INHERIT_ACE nor ACE4_FILE_INHERIT_ACE is present, | ACE4_DIRECTORY_INHERIT_ACE nor ACE4_FILE_INHERIT_ACE is present, | |||
then an operation attempting to set such an attribute SHOULD fail | then an operation attempting to set such an attribute SHOULD fail | |||
with NFS4ERR_ATTRNOTSUPP. | with NFS4ERR_ATTRNOTSUPP. | |||
ACE4_NO_PROPAGATE_INHERIT_ACE | ||||
Can be placed on a directory. This flag tells the server that | ||||
inheritance of this ACE should stop at newly created child | ||||
directories. | ||||
ACE4_INHERITED_ACE | ||||
Indicates that this ACE is inherited from a parent directory. A | ||||
server that supports automatic inheritance will place this flag on | ||||
any ACEs inherited from the parent directory when creating a new | ||||
object. Client applications will use this to perform automatic | ||||
inheritance. Clients and servers MUST clear this bit in the acl | ||||
attribute; it may only be used in the dacl and sacl attributes. | ||||
ACE4_SUCCESSFUL_ACCESS_ACE_FLAG | ACE4_SUCCESSFUL_ACCESS_ACE_FLAG | |||
ACE4_FAILED_ACCESS_ACE_FLAG | ACE4_FAILED_ACCESS_ACE_FLAG | |||
The ACE4_SUCCESSFUL_ACCESS_ACE_FLAG (SUCCESS) and | The ACE4_SUCCESSFUL_ACCESS_ACE_FLAG (SUCCESS) and | |||
ACE4_FAILED_ACCESS_ACE_FLAG (FAILED) flag bits may be set only on | ACE4_FAILED_ACCESS_ACE_FLAG (FAILED) flag bits may be set only on | |||
ACE4_SYSTEM_AUDIT_ACE_TYPE (AUDIT) and ACE4_SYSTEM_ALARM_ACE_TYPE | ACE4_SYSTEM_AUDIT_ACE_TYPE (AUDIT) and ACE4_SYSTEM_ALARM_ACE_TYPE | |||
(ALARM) ACE types. If during the processing of the file's ACL, | (ALARM) ACE types. If during the processing of the file's ACL, | |||
the server encounters an AUDIT or ALARM ACE that matches the | the server encounters an AUDIT or ALARM ACE that matches the | |||
principal attempting the OPEN, the server notes that fact, and the | principal attempting the OPEN, the server notes that fact, and the | |||
presence, if any, of the SUCCESS and FAILED flags encountered in | presence, if any, of the SUCCESS and FAILED flags encountered in | |||
the AUDIT or ALARM ACE. Once the server completes the ACL | the AUDIT or ALARM ACE. Once the server completes the ACL | |||
processing, it then notes if the operation succeeded or failed. | processing, it then notes if the operation succeeded or failed. | |||
skipping to change at page 143, line 34 | skipping to change at page 142, line 44 | |||
ALARM, we consider an ACCESS operation to be a "failure" if it | ALARM, we consider an ACCESS operation to be a "failure" if it | |||
fails to return a bit that was requested and supported. | fails to return a bit that was requested and supported. | |||
ACE4_IDENTIFIER_GROUP | ACE4_IDENTIFIER_GROUP | |||
Indicates that the "who" refers to a GROUP as defined under UNIX | Indicates that the "who" refers to a GROUP as defined under UNIX | |||
or a GROUP ACCOUNT as defined under Windows. Clients and servers | or a GROUP ACCOUNT as defined under Windows. Clients and servers | |||
MUST ignore the ACE4_IDENTIFIER_GROUP flag on ACEs with a who | MUST ignore the ACE4_IDENTIFIER_GROUP flag on ACEs with a who | |||
value equal to one of the special identifiers outlined in | value equal to one of the special identifiers outlined in | |||
Section 6.2.1.5. | Section 6.2.1.5. | |||
ACE4_INHERITED_ACE | ||||
Indicates that this ACE is inherited from a parent directory. A | ||||
server that supports automatic inheritance will place this flag on | ||||
any ACEs inherited from the parent directory when creating a new | ||||
object. Client applications will use this to perform automatic | ||||
inheritance. Clients and servers MUST clear this bit in the acl | ||||
attribute; it may only be used in the dacl and sacl attributes. | ||||
6.2.1.5. ACE Who | 6.2.1.5. ACE Who | |||
The "who" field of an ACE is an identifier that specifies the | The "who" field of an ACE is an identifier that specifies the | |||
principal or principals to whom the ACE applies. It may refer to a | principal or principals to whom the ACE applies. It may refer to a | |||
user or a group, with the flag bit ACE4_IDENTIFIER_GROUP specifying | user or a group, with the flag bit ACE4_IDENTIFIER_GROUP specifying | |||
which. | which. | |||
There are several special identifiers which need to be understood | There are several special identifiers that need to be understood | |||
universally, rather than in the context of a particular DNS domain. | universally, rather than in the context of a particular DNS domain. | |||
Some of these identifiers cannot be understood when an NFS client | Some of these identifiers cannot be understood when an NFS client | |||
accesses the server, but have meaning when a local process accesses | accesses the server, but have meaning when a local process accesses | |||
the file. The ability to display and modify these permissions is | the file. The ability to display and modify these permissions is | |||
permitted over NFS, even if none of the access methods on the server | permitted over NFS, even if none of the access methods on the server | |||
understands the identifiers. | understands the identifiers. | |||
+---------------+--------------------------------------------------+ | +---------------+--------------------------------------------------+ | |||
| Who | Description | | | Who | Description | | |||
+---------------+--------------------------------------------------+ | +---------------+--------------------------------------------------+ | |||
| OWNER | The owner of the file | | | OWNER | The owner of the file. | | |||
| GROUP | The group associated with the file. | | | GROUP | The group associated with the file. | | |||
| EVERYONE | The world, including the owner and owning group. | | | EVERYONE | The world, including the owner and owning group. | | |||
| INTERACTIVE | Accessed from an interactive terminal. | | | INTERACTIVE | Accessed from an interactive terminal. | | |||
| NETWORK | Accessed via the network. | | | NETWORK | Accessed via the network. | | |||
| DIALUP | Accessed as a dialup user to the server. | | | DIALUP | Accessed as a dialup user to the server. | | |||
| BATCH | Accessed from a batch job. | | | BATCH | Accessed from a batch job. | | |||
| ANONYMOUS | Accessed without any authentication. | | | ANONYMOUS | Accessed without any authentication. | | |||
| AUTHENTICATED | Any authenticated user (opposite of ANONYMOUS) | | | AUTHENTICATED | Any authenticated user (opposite of ANONYMOUS). | | |||
| SERVICE | Access from a system service. | | | SERVICE | Access from a system service. | | |||
+---------------+--------------------------------------------------+ | +---------------+--------------------------------------------------+ | |||
Table 4 | Table 4 | |||
To avoid conflict, these special identifiers are distinguished by an | To avoid conflict, these special identifiers are distinguished by an | |||
appended "@" and should appear in the form "xxxx@" (with no domain | appended "@" and should appear in the form "xxxx@" (with no domain | |||
name after the "@"). For example: ANONYMOUS@. | name after the "@"), for example, ANONYMOUS@. | |||
The ACE4_IDENTIFIER_GROUP flag MUST be ignored on entries with these | The ACE4_IDENTIFIER_GROUP flag MUST be ignored on entries with these | |||
special identifiers. When encoding entries with these special | special identifiers. When encoding entries with these special | |||
identifiers, the ACE4_IDENTIFIER_GROUP flag SHOULD be set to zero. | identifiers, the ACE4_IDENTIFIER_GROUP flag SHOULD be set to zero. | |||
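The handling of special identifiers described above can be sketched as follows. This is a minimal illustration, not part of either document: the helper names are hypothetical, and the numeric value of ACE4_IDENTIFIER_GROUP is taken from the protocol's flag definitions, which lie outside this excerpt.

```python
# Flag value as defined in the protocol's XDR (not shown in this excerpt).
ACE4_IDENTIFIER_GROUP = 0x00000040

# The special identifiers of Table 4, in their on-the-wire "xxxx@" form.
SPECIAL_WHO = frozenset({
    "OWNER@", "GROUP@", "EVERYONE@", "INTERACTIVE@", "NETWORK@",
    "DIALUP@", "BATCH@", "ANONYMOUS@", "AUTHENTICATED@", "SERVICE@",
})


def is_special_who(who):
    """Return True if 'who' is a special identifier (no domain after '@')."""
    return who in SPECIAL_WHO


def flags_for_encoding(who, flags):
    """Hypothetical helper: clear ACE4_IDENTIFIER_GROUP for special identifiers.

    The flag is meaningless on these entries, so an encoder sets it to zero;
    flags on ordinary user@domain or group@domain entries pass through.
    """
    if is_special_who(who):
        return flags & ~ACE4_IDENTIFIER_GROUP
    return flags
```

A decoder would apply the same test in the other direction, ignoring the flag whenever is_special_who() holds.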
6.2.1.5.1. Discussion of EVERYONE@ | 6.2.1.5.1. Discussion of EVERYONE@ | |||
It is important to note that "EVERYONE@" is not equivalent to the | It is important to note that "EVERYONE@" is not equivalent to the | |||
UNIX "other" entity. This is because, by definition, UNIX "other" | UNIX "other" entity. This is because, by definition, UNIX "other" | |||
does not include the owner or owning group of a file. "EVERYONE@" | does not include the owner or owning group of a file. "EVERYONE@" | |||
skipping to change at page 145, line 22 | skipping to change at page 144, line 39 | |||
const MODE4_WGRP = 0x010; /* write permission: group */ | const MODE4_WGRP = 0x010; /* write permission: group */ | |||
const MODE4_XGRP = 0x008; /* execute permission: group */ | const MODE4_XGRP = 0x008; /* execute permission: group */ | |||
const MODE4_ROTH = 0x004; /* read permission: other */ | const MODE4_ROTH = 0x004; /* read permission: other */ | |||
const MODE4_WOTH = 0x002; /* write permission: other */ | const MODE4_WOTH = 0x002; /* write permission: other */ | |||
const MODE4_XOTH = 0x001; /* execute permission: other */ | const MODE4_XOTH = 0x001; /* execute permission: other */ | |||
Bits MODE4_RUSR, MODE4_WUSR, and MODE4_XUSR apply to the principal | Bits MODE4_RUSR, MODE4_WUSR, and MODE4_XUSR apply to the principal | |||
identified in the owner attribute. Bits MODE4_RGRP, MODE4_WGRP, and | identified in the owner attribute. Bits MODE4_RGRP, MODE4_WGRP, and | |||
MODE4_XGRP apply to principals identified in the owner_group | MODE4_XGRP apply to principals identified in the owner_group | |||
attribute but who are not identified in the owner attribute. Bits | attribute but who are not identified in the owner attribute. Bits | |||
MODE4_ROTH, MODE4_WOTH, MODE4_XOTH apply to any principal that does | MODE4_ROTH, MODE4_WOTH, and MODE4_XOTH apply to any principal that | |||
not match that in the owner attribute, and does not have a group | does not match that in the owner attribute and does not have a group | |||
matching that of the owner_group attribute. | matching that of the owner_group attribute. | |||
Bits within the mode other than those specified above are not defined | Bits within a mode other than those specified above are not defined | |||
by this protocol. A server MUST NOT return bits other than those | by this protocol. A server MUST NOT return bits other than those | |||
defined above in a GETATTR or READDIR operation, and it MUST return | defined above in a GETATTR or READDIR operation, and it MUST return | |||
NFS4ERR_INVAL if bits other than those defined above are set in a | NFS4ERR_INVAL if bits other than those defined above are set in a | |||
SETATTR, CREATE, OPEN, VERIFY or NVERIFY operation. | SETATTR, CREATE, OPEN, VERIFY, or NVERIFY operation. | |||
6.2.5. Attribute 74: mode_set_masked | 6.2.5. Attribute 74: mode_set_masked | |||
The mode_set_masked attribute is a write-only attribute that allows | The mode_set_masked attribute is a write-only attribute that allows | |||
individual bits in the mode attribute to be set or reset, without | individual bits in the mode attribute to be set or reset, without | |||
changing others. It allows, for example, the bits MODE4_SUID, | changing others. It allows, for example, the bits MODE4_SUID, | |||
MODE4_SGID, and MODE4_SVTX to be modified while leaving unmodified | MODE4_SGID, and MODE4_SVTX to be modified while leaving unmodified | |||
any of the nine low-order mode bits devoted to permissions. | any of the nine low-order mode bits devoted to permissions. | |||
In such instances that the nine low-order bits are left unmodified, | In such instances that the nine low-order bits are left unmodified, | |||
then neither the acl nor the dacl attribute should be automatically | then neither the acl nor the dacl attribute should be automatically | |||
modified as discussed in Section 6.4.1. | modified as discussed in Section 6.4.1. | |||
The mode_set_masked attribute consists of two words each in the form | The mode_set_masked attribute consists of two words, each in the form | |||
of a mode4. The first consists of the value to be applied to the | of a mode4. The first consists of the value to be applied to the | |||
current mode value and the second is a mask. Only bits set to one in | current mode value and the second is a mask. Only bits set to one in | |||
the mask word are changed (set or reset) in the file's mode. All | the mask word are changed (set or reset) in the file's mode. All | |||
other bits in the mode remain unchanged. Bits in the first word that | other bits in the mode remain unchanged. Bits in the first word that | |||
correspond to bits which are zero in the mask are ignored, except | correspond to bits that are zero in the mask are ignored, except that | |||
that undefined bits are checked for validity and can result in | undefined bits are checked for validity and can result in | |||
NFS4ERR_INVAL as described below. | NFS4ERR_INVAL as described below. | |||
The mode_set_masked attribute is only valid in a SETATTR operation. | The mode_set_masked attribute is only valid in a SETATTR operation. | |||
If it is used in a CREATE or OPEN operation, the server MUST return | If it is used in a CREATE or OPEN operation, the server MUST return | |||
NFS4ERR_INVAL. | NFS4ERR_INVAL. | |||
Bits not defined as valid in the mode attribute are not valid in | Bits not defined as valid in the mode attribute are not valid in | |||
either word of the mode_set_masked attribute. The server MUST return | either word of the mode_set_masked attribute. The server MUST return | |||
NFS4ERR_INVAL if any of those are on in a SETATTR. If the mode and | NFS4ERR_INVAL if any such bits are set to one in a SETATTR. If the | |||
mode_set_masked attributes are both specified in the same SETATTR, | mode and mode_set_masked attributes are both specified in the same | |||
the server MUST also return NFS4ERR_INVAL. | SETATTR, the server MUST also return NFS4ERR_INVAL. | |||
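The mode_set_masked update rule can be illustrated with a short sketch. The function and exception names are hypothetical, and the sketch assumes that the defined mode bits are exactly the twelve bits from MODE4_SUID (0x800) down to MODE4_XOTH (0x001).

```python
# Assumption: the twelve defined mode bits, MODE4_SUID (0x800) .. MODE4_XOTH (0x001).
MODE4_DEFINED = 0xFFF


class Nfs4ErrInval(Exception):
    """Stand-in for an NFS4ERR_INVAL reply (hypothetical name)."""


def apply_mode_set_masked(current, value, mask):
    """Return the new mode after a SETATTR of mode_set_masked.

    Undefined bits in either word are rejected, even where the mask
    would otherwise cause the value bit to be ignored.
    """
    if (value | mask) & ~MODE4_DEFINED:
        raise Nfs4ErrInval("undefined mode bit in mode_set_masked")
    # Bits selected by the mask come from 'value'; all others are kept.
    return (current & ~mask) | (value & mask)
```

For example, with a current mode of 0644, a value word of 04000 and a mask word of 07000 sets MODE4_SUID and leaves the nine permission bits untouched, giving 04644.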
6.3. Common Methods | 6.3. Common Methods | |||
The requirements in this section will be referred to in future | The requirements in this section will be referred to in future | |||
sections, especially Section 6.4. | sections, especially Section 6.4. | |||
6.3.1. Interpreting an ACL | 6.3.1. Interpreting an ACL | |||
6.3.1.1. Server Considerations | 6.3.1.1. Server Considerations | |||
The server uses the algorithm described in Section 6.2.1 to determine | The server uses the algorithm described in Section 6.2.1 to determine | |||
whether an ACL allows access to an object. However, the ACL might | whether an ACL allows access to an object. However, the ACL might | |||
not be the sole determiner of access. For example: | not be the sole determiner of access. For example: | |||
o In the case of a file system exported as read-only, the server may | o In the case of a file system exported as read-only, the server may | |||
deny write permissions even though an object's ACL grants it. | deny write access even though an object's ACL grants it. | |||
o Server implementations MAY grant ACE4_WRITE_ACL and ACE4_READ_ACL | o Server implementations MAY grant ACE4_WRITE_ACL and ACE4_READ_ACL | |||
permissions to prevent a situation from arising in which there is | permissions to prevent a situation from arising in which there is | |||
no valid way to ever modify the ACL. | no valid way to ever modify the ACL. | |||
o All servers will allow a user the ability to read the data of the | o All servers will allow a user the ability to read the data of the | |||
file when only the execute permission is granted (i.e. If the ACL | file when only the execute permission is granted (i.e., if the ACL | |||
denies the user the ACE4_READ_DATA access and allows the user | denies the user the ACE4_READ_DATA access and allows the user | |||
ACE4_EXECUTE, the server will allow the user to read the data of | ACE4_EXECUTE, the server will allow the user to read the data of | |||
the file). | the file). | |||
o Many servers have the notion of owner-override in which the owner | o Many servers have the notion of owner-override in which the owner | |||
of the object is allowed to override accesses that are denied by | of the object is allowed to override accesses that are denied by | |||
the ACL. This may be helpful, for example, to allow users | the ACL. This may be helpful, for example, to allow users | |||
continued access to open files on which the permissions have | continued access to open files on which the permissions have | |||
changed. | changed. | |||
skipping to change at page 147, line 11 | skipping to change at page 146, line 28 | |||
beyond an ordinary user. The superuser may be able to read or | beyond an ordinary user. The superuser may be able to read or | |||
write data or metadata in ways that would not be permitted by the | write data or metadata in ways that would not be permitted by the | |||
ACL. | ACL. | |||
o A retention attribute might also block access otherwise allowed by | o A retention attribute might also block access otherwise allowed by | |||
ACLs (see Section 5.13). | ACLs (see Section 5.13). | |||
6.3.1.2. Client Considerations | 6.3.1.2. Client Considerations | |||
Clients SHOULD NOT do their own access checks based on their | Clients SHOULD NOT do their own access checks based on their | |||
interpretation the ACL, but rather use the OPEN and ACCESS operations | interpretation of the ACL, but rather use the OPEN and ACCESS | |||
to do access checks. This allows the client to act on the results of | operations to do access checks. This allows the client to act on the | |||
having the server determine whether or not access should be granted | results of having the server determine whether or not access should | |||
based on its interpretation of the ACL. | be granted based on its interpretation of the ACL. | |||
Clients must be aware of situations in which an object's ACL will | Clients must be aware of situations in which an object's ACL will | |||
define a certain access even though the server will not enforce it. | define a certain access even though the server will not enforce it. | |||
In general, but especially in these situations, the client needs to | In general, but especially in these situations, the client needs to | |||
do its part in the enforcement of access as defined by the ACL. To | do its part in the enforcement of access as defined by the ACL. To | |||
do this, the client MAY send the appropriate ACCESS operation prior | do this, the client MAY send the appropriate ACCESS operation prior | |||
to servicing the request of the user or application in order to | to servicing the request of the user or application in order to | |||
determine whether the user or application should be granted the | determine whether the user or application should be granted the | |||
access requested. For examples in which the ACL may define accesses | access requested. For examples in which the ACL may define accesses | |||
that the server doesn't enforce see Section 6.3.1.1. | that the server doesn't enforce, see Section 6.3.1.1. | |||
6.3.2. Computing a Mode Attribute from an ACL | 6.3.2. Computing a Mode Attribute from an ACL | |||
The following method can be used to calculate the MODE4_R*, MODE4_W* | The following method can be used to calculate the MODE4_R*, MODE4_W*, | |||
and MODE4_X* bits of a mode attribute, based upon an ACL. | and MODE4_X* bits of a mode attribute, based upon an ACL. | |||
First, for each of the special identifiers OWNER@, GROUP@, and | First, for each of the special identifiers OWNER@, GROUP@, and | |||
EVERYONE@, evaluate the ACL in order, considering only ALLOW and DENY | EVERYONE@, evaluate the ACL in order, considering only ALLOW and DENY | |||
ACEs for the identifier EVERYONE@ and for the identifier under | ACEs for the identifier EVERYONE@ and for the identifier under | |||
consideration. The result of the evaluation will be an NFSv4 ACL | consideration. The result of the evaluation will be an NFSv4 ACL | |||
mask showing exactly which bits are permitted to that identifier. | mask showing exactly which bits are permitted to that identifier. | |||
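The first step, evaluating the ACL for one special identifier, might look like the sketch below. It assumes the per-bit, first-match processing of Section 6.2.1 (an access bit is settled by the first ALLOW or DENY ACE that mentions it), represents each ACE as a hypothetical (type, flags, access_mask, who) tuple, and skips inherit-only ACEs because they do not affect the object's own permissions. The numeric constants come from the protocol's definitions outside this excerpt.

```python
# ACE types and the inherit-only flag, as defined in the protocol's XDR.
ACE4_ACCESS_ALLOWED_ACE_TYPE = 0
ACE4_ACCESS_DENIED_ACE_TYPE = 1
ACE4_INHERIT_ONLY_ACE = 0x00000008


def permitted_mask(acl, who):
    """Return the access mask bits the ACL permits to a special identifier.

    'acl' is an ordered list of (type, flags, access_mask, who) tuples;
    'who' is "OWNER@", "GROUP@", or "EVERYONE@".
    """
    allowed = 0
    decided = 0  # bits already settled by an earlier ALLOW or DENY ACE
    for ace_type, flags, mask, ace_who in acl:
        if ace_type not in (ACE4_ACCESS_ALLOWED_ACE_TYPE,
                            ACE4_ACCESS_DENIED_ACE_TYPE):
            continue  # AUDIT and ALARM ACEs are not considered
        if flags & ACE4_INHERIT_ONLY_ACE:
            continue  # does not apply to this object itself
        if ace_who not in (who, "EVERYONE@"):
            continue
        undecided = mask & ~decided
        if ace_type == ACE4_ACCESS_ALLOWED_ACE_TYPE:
            allowed |= undecided
        decided |= undecided
    return allowed
```

Calling permitted_mask() once each for OWNER@, GROUP@, and EVERYONE@ yields the three masks that are then mapped to mode bits.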
Then translate the calculated mask for OWNER@, GROUP@, and EVERYONE@ | Then translate the calculated mask for OWNER@, GROUP@, and EVERYONE@ | |||
into mode bits for, respectively, the user, group, and other, as | into mode bits for, respectively, the user, group, and other, as | |||
skipping to change at page 148, line 26 | skipping to change at page 147, line 40 | |||
The same user confusion seen when fetching the mode also results if | The same user confusion seen when fetching the mode also results if | |||
setting the mode does not effectively control permissions for the | setting the mode does not effectively control permissions for the | |||
owner, group, and other users; this motivates some of the | owner, group, and other users; this motivates some of the | |||
requirements that follow. | requirements that follow. | |||
6.4. Requirements | 6.4. Requirements | |||
The server that supports both mode and ACL must take care to | The server that supports both mode and ACL must take care to | |||
synchronize the MODE4_*USR, MODE4_*GRP, and MODE4_*OTH bits with the | synchronize the MODE4_*USR, MODE4_*GRP, and MODE4_*OTH bits with the | |||
ACEs which have respective who fields of "OWNER@", "GROUP@", and | ACEs that have respective who fields of "OWNER@", "GROUP@", and | |||
"EVERYONE@" so that the client can see semantically equivalent access | "EVERYONE@". This way, the client can see if semantically equivalent | |||
permissions exist whether the client asks for owner, owner_group and | access permissions exist whether the client asks for the owner, | |||
mode attributes, or for just the ACL. | owner_group, and mode attributes or for just the ACL. | |||
In this section, much is made of the methods in Section 6.3.2. Many | In this section, much is made of the methods in Section 6.3.2. Many | |||
requirements refer to this section. But note that the methods have | requirements refer to this section. But note that the methods have | |||
behaviors specified with "SHOULD". This is intentional, to avoid | behaviors specified with "SHOULD". This is intentional, to avoid | |||
invalidating existing implementations that compute the mode according | invalidating existing implementations that compute the mode according | |||
to the withdrawn POSIX ACL draft (1003.1e draft 17), rather than by | to the withdrawn POSIX ACL draft (1003.1e draft 17), rather than by | |||
actual permissions on owner, group, and other. | actual permissions on owner, group, and other. | |||
6.4.1. Setting the mode and/or ACL Attributes | 6.4.1. Setting the Mode and/or ACL Attributes | |||
In the case where a server supports the sacl or dacl attribute, in | In the case where a server supports the sacl or dacl attribute, in | |||
addition to the acl attribute, the server MUST fail a request to set | addition to the acl attribute, the server MUST fail a request to set | |||
the acl attribute simultaneously with a dacl or sacl attribute. The | the acl attribute simultaneously with a dacl or sacl attribute. The | |||
error to be given is NFS4ERR_ATTRNOTSUPP. | error to be given is NFS4ERR_ATTRNOTSUPP. | |||
6.4.1.1. Setting mode and not ACL | 6.4.1.1. Setting Mode and not ACL | |||
When any of the nine low-order mode bits are subject to change, | When any of the nine low-order mode bits are subject to change, | |||
either because the mode attribute was set or because the | either because the mode attribute was set or because the | |||
mode_set_masked attribute was set and the mask included one or more | mode_set_masked attribute was set and the mask included one or more | |||
bits from the nine low-order mode bits, and no ACL attribute is | bits from the nine low-order mode bits, and no ACL attribute is | |||
explicitly set, the acl and dacl attributes must be modified in | explicitly set, the acl and dacl attributes must be modified in | |||
accordance with the updated value of those bits. This must happen | accordance with the updated value of those bits. This must happen | |||
even if the value of the low-order bits is the same after the mode is | even if the value of the low-order bits is the same after the mode is | |||
set as before. | set as before. | |||
skipping to change at page 149, line 29 | skipping to change at page 148, line 45 | |||
ACE4_READ_DATA. | ACE4_READ_DATA. | |||
2. If MODE4_WGRP is not set, entities explicitly listed in the ACL | 2. If MODE4_WGRP is not set, entities explicitly listed in the ACL | |||
other than OWNER@ and EVERYONE@ SHOULD NOT be granted | other than OWNER@ and EVERYONE@ SHOULD NOT be granted | |||
ACE4_WRITE_DATA or ACE4_APPEND_DATA. | ACE4_WRITE_DATA or ACE4_APPEND_DATA. | |||
3. If MODE4_XGRP is not set, entities explicitly listed in the ACL | 3. If MODE4_XGRP is not set, entities explicitly listed in the ACL | |||
other than OWNER@ and EVERYONE@ SHOULD NOT be granted | other than OWNER@ and EVERYONE@ SHOULD NOT be granted | |||
ACE4_EXECUTE. | ACE4_EXECUTE. | |||
Access mask bits other those listed above, appearing in ALLOW ACEs, | Access mask bits other than those listed above, appearing in ALLOW | |||
MAY also be disabled. | ACEs, MAY also be disabled. | |||
Note that ACEs with the flag ACE4_INHERIT_ONLY_ACE set do not affect | Note that ACEs with the flag ACE4_INHERIT_ONLY_ACE set do not affect | |||
the permissions of the ACL itself, nor do ACEs of the type AUDIT and | the permissions of the ACL itself, nor do ACEs of the type AUDIT and | |||
ALARM. As such, it is desirable to leave these ACEs unmodified when | ALARM. As such, it is desirable to leave these ACEs unmodified when | |||
modifying the ACL attributes. | modifying the ACL attributes. | |||
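One way a server could meet the write and execute rules (items 2 and 3 above) is to clear the corresponding bits from ALLOW ACEs for every entity other than OWNER@ and EVERYONE@. The sketch below is only one possible approach: the function name and the (type, flags, access_mask, who) tuple layout are hypothetical, the access mask values come from the protocol's definitions outside this excerpt, and inherit-only, AUDIT, and ALARM ACEs are left unmodified as suggested above.

```python
MODE4_WGRP = 0x010
MODE4_XGRP = 0x008

# Access mask bits and constants from the protocol's XDR definitions.
ACE4_WRITE_DATA = 0x00000002
ACE4_APPEND_DATA = 0x00000004
ACE4_EXECUTE = 0x00000020
ACE4_ACCESS_ALLOWED_ACE_TYPE = 0
ACE4_INHERIT_ONLY_ACE = 0x00000008


def restrict_acl_to_group_mode(acl, mode):
    """Return a copy of 'acl' in which no listed entity other than OWNER@
    and EVERYONE@ is allowed access that the group mode bits withhold."""
    strip = 0
    if not mode & MODE4_WGRP:
        strip |= ACE4_WRITE_DATA | ACE4_APPEND_DATA
    if not mode & MODE4_XGRP:
        strip |= ACE4_EXECUTE
    result = []
    for ace_type, flags, mask, who in acl:
        effective_allow = (ace_type == ACE4_ACCESS_ALLOWED_ACE_TYPE
                           and not flags & ACE4_INHERIT_ONLY_ACE)
        if effective_allow and who not in ("OWNER@", "EVERYONE@"):
            mask &= ~strip
        result.append((ace_type, flags, mask, who))
    return result
```

A server that prefers to preserve the original ACEs could instead insert DENY ACEs ahead of them; the text above does not mandate either technique.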
Also note that the requirement may be met by discarding the acl and | Also note that the requirement may be met by discarding the acl and | |||
dacl, in favor of an ACL that represents the mode and only the mode. | dacl, in favor of an ACL that represents the mode and only the mode. | |||
This is permitted, but it is preferable for a server to preserve as | This is permitted, but it is preferable for a server to preserve as | |||
much of the ACL as possible without violating the above requirements. | much of the ACL as possible without violating the above requirements. | |||
Discarding the ACL makes it effectively impossible for a file created | Discarding the ACL makes it effectively impossible for a file created | |||
with a mode attribute to inherit an ACL (see Section 6.4.3). | with a mode attribute to inherit an ACL (see Section 6.4.3). | |||
6.4.1.2. Setting ACL and not mode | 6.4.1.2. Setting ACL and Not Mode | |||
When setting the acl or dacl and not setting the mode or | When setting the acl or dacl and not setting the mode or | |||
mode_set_masked attributes, the permission bits of the mode need to | mode_set_masked attributes, the permission bits of the mode need to | |||
be derived from the ACL. In this case, the ACL attribute SHOULD be | be derived from the ACL. In this case, the ACL attribute SHOULD be | |||
set as given. The nine low-order bits of the mode attribute | set as given. The nine low-order bits of the mode attribute | |||
(MODE4_R*, MODE4_W*, MODE4_X*) MUST be modified to match the result | (MODE4_R*, MODE4_W*, MODE4_X*) MUST be modified to match the result | |||
of the method Section 6.3.2. The three high-order bits of the mode | of the method in Section 6.3.2. The three high-order bits of the | |||
(MODE4_SUID, MODE4_SGID, MODE4_SVTX) SHOULD remain unchanged. | mode (MODE4_SUID, MODE4_SGID, MODE4_SVTX) SHOULD remain unchanged. | |||
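The resulting mode can be expressed as a simple bit merge. In this sketch the function name is hypothetical, and derived_mode stands for the output of the method in Section 6.3.2.

```python
MODE4_HIGH_BITS = 0xE00  # MODE4_SUID | MODE4_SGID | MODE4_SVTX
MODE4_PERM_BITS = 0x1FF  # the nine low-order MODE4_R*/W*/X* bits


def mode_after_acl_set(old_mode, derived_mode):
    """Keep the three high-order bits of the old mode and take the nine
    permission bits from the mode derived from the newly set ACL."""
    return (old_mode & MODE4_HIGH_BITS) | (derived_mode & MODE4_PERM_BITS)
```

For instance, if the old mode is 04755 and the new ACL yields permission bits 0640, the stored mode becomes 04640.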
6.4.1.3. Setting both ACL and mode | 6.4.1.3. Setting Both ACL and Mode | |||
When setting both the mode (includes use of either the mode attribute | When setting both the mode (includes use of either the mode attribute | |||
or the mode_set_masked attribute) and the acl or dacl attributes in | or the mode_set_masked attribute) and the acl or dacl attributes in | |||
the same operation, the attributes MUST be applied in this order: | the same operation, the attributes MUST be applied in this order: | |||
mode (or mode_set_masked), then ACL. The mode-related attribute is | mode (or mode_set_masked), then ACL. The mode-related attribute is | |||
set as given, then the ACL attribute is set as given, possibly | set as given, then the ACL attribute is set as given, possibly | |||
changing the final mode, as described above in Section 6.4.1.2. | changing the final mode, as described above in Section 6.4.1.2. | |||
6.4.2. Retrieving the mode and/or ACL Attributes | 6.4.2. Retrieving the Mode and/or ACL Attributes | |||
This section applies only to servers that support both the mode and | This section applies only to servers that support both the mode and | |||
ACL attributes. | ACL attributes. | |||
Some server implementations may have a concept of "objects without | Some server implementations may have a concept of "objects without | |||
ACLs", meaning that all permissions are granted and denied according | ACLs", meaning that all permissions are granted and denied according | |||
to the mode attribute, and that no ACL attribute is stored for that | to the mode attribute and that no ACL attribute is stored for that | |||
object. If an ACL attribute is requested of such a server, the | object. If an ACL attribute is requested of such a server, the | |||
server SHOULD return an ACL that does not conflict with the mode; | server SHOULD return an ACL that does not conflict with the mode; | |||
that is to say, the ACL returned SHOULD represent the nine low-order | that is to say, the ACL returned SHOULD represent the nine low-order | |||
bits of the mode attribute (MODE4_R*, MODE4_W*, MODE4_X*) as | bits of the mode attribute (MODE4_R*, MODE4_W*, MODE4_X*) as | |||
described in Section 6.3.2. | described in Section 6.3.2. | |||
For other server implementations, the ACL attribute is always present | For other server implementations, the ACL attribute is always present | |||
for every object. Such servers SHOULD store at least the three high- | for every object. Such servers SHOULD store at least the three high- | |||
order bits of the mode attribute (MODE4_SUID, MODE4_SGID, | order bits of the mode attribute (MODE4_SUID, MODE4_SGID, | |||
MODE4_SVTX). The server SHOULD return a mode attribute if one is | MODE4_SVTX). The server SHOULD return a mode attribute if one is | |||
skipping to change at page 150, line 47 | skipping to change at page 150, line 15 | |||
6.4.3. Creating New Objects | 6.4.3. Creating New Objects | |||
If a server supports any ACL attributes, it may use the ACL | If a server supports any ACL attributes, it may use the ACL | |||
attributes on the parent directory to compute an initial ACL | attributes on the parent directory to compute an initial ACL | |||
attribute for a newly created object. This will be referred to as | attribute for a newly created object. This will be referred to as | |||
the inherited ACL within this section. The act of adding one or more | the inherited ACL within this section. The act of adding one or more | |||
ACEs to the inherited ACL that are based upon ACEs in the parent | ACEs to the inherited ACL that are based upon ACEs in the parent | |||
directory's ACL will be referred to as inheriting an ACE within this | directory's ACL will be referred to as inheriting an ACE within this | |||
section. | section. | |||
Implementors should standardize on what the behavior of CREATE and | Implementors should standardize what the behavior of CREATE and OPEN | |||
OPEN must be depending on the presence or absence of the mode and ACL | must be depending on the presence or absence of the mode and ACL | |||
attributes. | attributes. | |||
1. If just the mode is given in the call: | 1. If just the mode is given in the call: | |||
In this case, inheritance SHOULD take place, but the mode MUST be | In this case, inheritance SHOULD take place, but the mode MUST be | |||
applied to the inherited ACL as described in Section 6.4.1.1, | applied to the inherited ACL as described in Section 6.4.1.1, | |||
thereby modifying the ACL. | thereby modifying the ACL. | |||
2. If just the ACL is given in the call: | 2. If just the ACL is given in the call: | |||
In this case, inheritance SHOULD NOT take place, and the ACL as | In this case, inheritance SHOULD NOT take place, and the ACL as | |||
defined in the CREATE or OPEN will be set without modification, | defined in the CREATE or OPEN will be set without modification, | |||
and the mode modified as in Section 6.4.1.2 | and the mode modified as in Section 6.4.1.2. | |||
3. If both mode and ACL are given in the call: | 3. If both mode and ACL are given in the call: | |||
In this case, inheritance SHOULD NOT take place, and both | In this case, inheritance SHOULD NOT take place, and both | |||
attributes will be set as described in Section 6.4.1.3. | attributes will be set as described in Section 6.4.1.3. | |||
4. If neither mode nor ACL are given in the call: | 4. If neither mode nor ACL is given in the call: | |||
In the case where an object is being created without any initial | In the case where an object is being created without any initial | |||
attributes at all, e.g. an OPEN operation with an opentype4 of | attributes at all, e.g., an OPEN operation with an opentype4 of | |||
OPEN4_CREATE and a createmode4 of EXCLUSIVE4, inheritance SHOULD | OPEN4_CREATE and a createmode4 of EXCLUSIVE4, inheritance SHOULD | |||
NOT take place (note that EXCLUSIVE4_1 is a better choice of | NOT take place (note that EXCLUSIVE4_1 is a better choice of | |||
createmode4, since it does permit initial attributes). Instead, | createmode4, since it does permit initial attributes). Instead, | |||
the server SHOULD set permissions to deny all access to the newly | the server SHOULD set permissions to deny all access to the newly | |||
created object. It is expected that the appropriate client will | created object. It is expected that the appropriate client will | |||
set the desired attributes in a subsequent SETATTR operation, and | set the desired attributes in a subsequent SETATTR operation, and | |||
the server SHOULD allow that operation to succeed, regardless of | the server SHOULD allow that operation to succeed, regardless of | |||
what permissions the object is created with. For example, an | what permissions the object is created with. For example, an | |||
empty ACL denies all permissions, but the server should allow the | empty ACL denies all permissions, but the server should allow the | |||
owner's SETATTR to succeed even though WRITE_ACL is implicitly | owner's SETATTR to succeed even though WRITE_ACL is implicitly | |||
denied. | denied. | |||
In other cases, inheritance SHOULD take place, and no | In other cases, inheritance SHOULD take place, and no | |||
modifications to the ACL will happen. The mode attribute, if | modifications to the ACL will happen. The mode attribute, if | |||
supported, MUST be as computed in Section 6.3.2, with the | supported, MUST be as computed in Section 6.3.2, with the | |||
MODE4_SUID, MODE4_SGID and MODE4_SVTX bits clear. If no | MODE4_SUID, MODE4_SGID, and MODE4_SVTX bits clear. If no | |||
inheritable ACEs exist on the parent directory, the rules for | inheritable ACEs exist on the parent directory, the rules for | |||
creating acl, dacl or sacl attributes are implementation defined. | creating acl, dacl, or sacl attributes are implementation | |||
If either the dacl or sacl attribute is supported, then the | defined. If either the dacl or sacl attribute is supported, then | |||
ACL4_DEFAULTED flag SHOULD be set on the newly created | the ACL4_DEFAULTED flag SHOULD be set on the newly created | |||
attributes. | attributes. | |||
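The four cases above reduce to a small decision table. The sketch below only restates them; the function name, the tuple it returns, and the any_initial_attrs parameter (false for a create such as EXCLUSIVE4 that carries no initial attributes at all) are illustrative assumptions.

```python
def creation_policy(mode_given, acl_given, any_initial_attrs=True):
    """Return (inherit, action) for a CREATE or OPEN that creates an object.

    'inherit' says whether ACL inheritance from the parent takes place;
    'action' summarizes how the mode and ACL are then established.
    """
    if mode_given and not acl_given:
        return (True, "apply mode to inherited ACL (Section 6.4.1.1)")
    if acl_given and not mode_given:
        return (False, "set ACL as given, derive mode (Section 6.4.1.2)")
    if mode_given and acl_given:
        return (False, "set mode, then ACL (Section 6.4.1.3)")
    if not any_initial_attrs:
        return (False, "deny all access until a later SETATTR")
    return (True, "keep inherited ACL, compute mode (Section 6.3.2)")
```

In the last case the computed mode has MODE4_SUID, MODE4_SGID, and MODE4_SVTX clear, and what happens when the parent has no inheritable ACEs remains implementation defined.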
6.4.3.1. The Inherited ACL | 6.4.3.1. The Inherited ACL | |||
If the object being created is not a directory, the inherited ACL | If the object being created is not a directory, the inherited ACL | |||
SHOULD NOT inherit ACEs from the parent directory ACL unless the | SHOULD NOT inherit ACEs from the parent directory ACL unless the | |||
ACE4_FILE_INHERIT_ACE is set. | ACE4_FILE_INHERIT_ACE is set. | |||
If the object being created is a directory, the inherited ACL should | If the object being created is a directory, the inherited ACL should | |||
inherit all inheritable ACEs from the parent directory, those that | inherit all inheritable ACEs from the parent directory, that is, | |||
have ACE4_FILE_INHERIT_ACE or ACE4_DIRECTORY_INHERIT_ACE flag set. | those that have the ACE4_FILE_INHERIT_ACE or | |||
If the inheritable ACE has ACE4_FILE_INHERIT_ACE set, but | ACE4_DIRECTORY_INHERIT_ACE flag set. If the inheritable ACE has | |||
ACE4_DIRECTORY_INHERIT_ACE is clear, the inherited ACE on the newly | ACE4_FILE_INHERIT_ACE set but ACE4_DIRECTORY_INHERIT_ACE is clear, | |||
created directory MUST have the ACE4_INHERIT_ONLY_ACE flag set to | the inherited ACE on the newly created directory MUST have the | |||
prevent the directory from being affected by ACEs meant for non- | ACE4_INHERIT_ONLY_ACE flag set to prevent the directory from being | |||
directories. | affected by ACEs meant for non-directories. | |||
When a new directory is created, the server MAY split any inherited | When a new directory is created, the server MAY split any inherited | |||
ACE which is both inheritable and effective (in other words, which | ACE that is both inheritable and effective (in other words, that has | |||
has neither ACE4_INHERIT_ONLY_ACE nor ACE4_NO_PROPAGATE_INHERIT_ACE | neither ACE4_INHERIT_ONLY_ACE nor ACE4_NO_PROPAGATE_INHERIT_ACE set), | |||
set), into two ACEs, one with no inheritance flags, and one with | into two ACEs, one with no inheritance flags and one with | |||
ACE4_INHERIT_ONLY_ACE set. (In the case of a dacl or sacl attribute, | ACE4_INHERIT_ONLY_ACE set. (In the case of a dacl or sacl attribute, | |||
both of those ACEs SHOULD also have the ACE4_INHERITED_ACE flag set.) | both of those ACEs SHOULD also have the ACE4_INHERITED_ACE flag set.) | |||
This makes it simpler to modify the effective permissions on the | This makes it simpler to modify the effective permissions on the | |||
directory without modifying the ACE which is to be inherited to the | directory without modifying the ACE that is to be inherited to the | |||
new directory's children. | new directory's children. | |||
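The inheritance rules of Section 6.4.3.1 can be sketched in Python as follows. This is an illustrative sketch only, not a normative algorithm. The Ace class, the inherit_aces function, and its split and mark_inherited parameters are hypothetical names. The flag values are those the specification defines for aceflag4. Two simplifications are assumptions of the sketch: inheritance flags are cleared on ACEs given to non-directories, and the treatment of ACE4_NO_PROPAGATE_INHERIT_ACE or ACE4_INHERIT_ONLY_ACE already present on a parent ACE (covered by the flag definitions elsewhere in the specification) is not modeled.

```python
from dataclasses import dataclass, replace

# aceflag4 values as defined by the NFSv4.1 specification.
ACE4_FILE_INHERIT_ACE = 0x01
ACE4_DIRECTORY_INHERIT_ACE = 0x02
ACE4_NO_PROPAGATE_INHERIT_ACE = 0x04
ACE4_INHERIT_ONLY_ACE = 0x08
ACE4_INHERITED_ACE = 0x80

INHERIT_FLAGS = (ACE4_FILE_INHERIT_ACE | ACE4_DIRECTORY_INHERIT_ACE |
                 ACE4_NO_PROPAGATE_INHERIT_ACE | ACE4_INHERIT_ONLY_ACE)


@dataclass(frozen=True)
class Ace:
    """Hypothetical, simplified ACE: only the fields this sketch needs."""
    who: str
    access_mask: int
    flag: int


def inherit_aces(parent_aces, is_directory, split=False, mark_inherited=False):
    """Return the ACEs a newly created object inherits from its parent.

    split          -- apply the optional (MAY) split for new directories
    mark_inherited -- set ACE4_INHERITED_ACE (dacl/sacl case)
    """
    extra = ACE4_INHERITED_ACE if mark_inherited else 0
    result = []
    for ace in parent_aces:
        if not is_directory:
            # Non-directories inherit only ACEs with ACE4_FILE_INHERIT_ACE.
            if ace.flag & ACE4_FILE_INHERIT_ACE:
                # Assumption: inheritance flags are dropped on a non-directory.
                result.append(replace(ace, flag=(ace.flag & ~INHERIT_FLAGS) | extra))
            continue
        # Directories inherit every inheritable ACE.
        if not ace.flag & (ACE4_FILE_INHERIT_ACE | ACE4_DIRECTORY_INHERIT_ACE):
            continue
        flag = ace.flag | extra
        if not flag & ACE4_DIRECTORY_INHERIT_ACE:
            # File-only ACE: keep it for the children, but do not let it
            # affect the new directory itself.
            flag |= ACE4_INHERIT_ONLY_ACE
        if split and not flag & (ACE4_INHERIT_ONLY_ACE | ACE4_NO_PROPAGATE_INHERIT_ACE):
            # Inheritable and effective: one effective copy, one inherit-only copy.
            result.append(replace(ace, flag=flag & ~INHERIT_FLAGS))
            result.append(replace(ace, flag=flag | ACE4_INHERIT_ONLY_ACE))
        else:
            result.append(replace(ace, flag=flag))
    return result
```

With the split enabled, an administrator can later edit the effective copy on the directory without touching the inherit-only copy that its children will receive.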
6.4.3.2. Automatic Inheritance | 6.4.3.2. Automatic Inheritance | |||
The acl attribute consists only of an array of ACEs, but the sacl | The acl attribute consists only of an array of ACEs, but the sacl | |||
(Section 6.2.3) and dacl (Section 6.2.2) attributes also include an | (Section 6.2.3) and dacl (Section 6.2.2) attributes also include an | |||
additional flag field. | additional flag field. | |||
struct nfsacl41 { | struct nfsacl41 { | |||
aclflag4 na41_flag; | aclflag4 na41_flag; | |||
skipping to change at page 153, line 11 | skipping to change at page 152, line 21 | |||
cleared in the acl). | cleared in the acl). | |||
Together these features allow a server to support automatic | Together these features allow a server to support automatic | |||
inheritance, which we now explain in more detail. | inheritance, which we now explain in more detail. | |||
Inheritable ACEs are normally inherited by child objects only at the | Inheritable ACEs are normally inherited by child objects only at the | |||
time that the child objects are created; later modifications to | time that the child objects are created; later modifications to | |||
inheritable ACEs do not result in modifications to inherited ACEs on | inheritable ACEs do not result in modifications to inherited ACEs on | |||
descendants. | descendants. | |||
However, the dacl and sacl provide an OPTIONAL mechanism which allows | However, the dacl and sacl provide an OPTIONAL mechanism that allows | |||
a client application to propagate changes to inheritable ACEs to an | a client application to propagate changes to inheritable ACEs to an | |||
entire directory hierarchy. | entire directory hierarchy. | |||
A server that supports this performs inheritance at object creation | A server that supports this performs inheritance at object creation | |||
time in the normal way, and SHOULD set the ACE4_INHERITED_ACE flag on | time in the normal way, and SHOULD set the ACE4_INHERITED_ACE flag on | |||
any inherited ACEs as they are added to the new object. | any inherited ACEs as they are added to the new object. | |||
A client application such as an ACL editor may then propagate changes | A client application such as an ACL editor may then propagate changes | |||
to inheritable ACEs on a directory by recursively traversing that | to inheritable ACEs on a directory by recursively traversing that | |||
directory's descendants and modifying each ACL encountered to remove | directory's descendants and modifying each ACL encountered to remove | |||
any ACEs with the ACE4_INHERITED_ACE flag and to replace them by the | any ACEs with the ACE4_INHERITED_ACE flag and to replace them by the | |||
new inheritable ACEs (also with the ACE4_INHERITED_ACE flag set). It | new inheritable ACEs (also with the ACE4_INHERITED_ACE flag set). It | |||
uses the existing ACE inheritance flags in the obvious way to decide | uses the existing ACE inheritance flags in the obvious way to decide | |||
which ACEs to propagate. (Note that it may encounter further | which ACEs to propagate. (Note that it may encounter further | |||
inheritable ACEs when descending the directory hierarchy, and that | inheritable ACEs when descending the directory hierarchy and that | |||
those will also need to be taken into account when propagating | those will also need to be taken into account when propagating | |||
inheritable ACEs to further descendants.) | inheritable ACEs to further descendants.) | |||
The reach of this propagation may be limited in two ways: first, | The reach of this propagation may be limited in two ways: first, | |||
automatic inheritance is not performed from any directory ACL that | automatic inheritance is not performed from any directory ACL that | |||
has the ACL4_AUTO_INHERIT flag cleared; and second, automatic | has the ACL4_AUTO_INHERIT flag cleared; and second, automatic | |||
inheritance stops wherever an ACL with the ACL4_PROTECTED flag is | inheritance stops wherever an ACL with the ACL4_PROTECTED flag is | |||
set, preventing modification of that ACL and also (if the ACL is set | set, preventing modification of that ACL and also (if the ACL is set | |||
on a directory) of the ACL on any of the object's descendants. | on a directory) of the ACL on any of the object's descendants. | |||
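The propagation procedure and its two limits can be sketched as follows. This is a rough illustration over an in-memory tree, not a client implementation. The Node class and the propagate and inherited_from helpers are hypothetical, ACEs are reduced to (who, access_mask, flag) tuples, and a single ACL per object stands in for either the dacl or the sacl. Placing explicit ACEs ahead of inherited ones and clearing inheritance flags on non-directories are assumptions of the sketch. The flag values are those defined by the specification.

```python
from dataclasses import dataclass, field

# aceflag4 and aclflag4 values as defined by the NFSv4.1 specification.
ACE4_FILE_INHERIT_ACE = 0x01
ACE4_DIRECTORY_INHERIT_ACE = 0x02
ACE4_NO_PROPAGATE_INHERIT_ACE = 0x04
ACE4_INHERIT_ONLY_ACE = 0x08
ACE4_INHERITED_ACE = 0x80
ACL4_AUTO_INHERIT = 0x1
ACL4_PROTECTED = 0x2

INHERIT_FLAGS = (ACE4_FILE_INHERIT_ACE | ACE4_DIRECTORY_INHERIT_ACE |
                 ACE4_NO_PROPAGATE_INHERIT_ACE | ACE4_INHERIT_ONLY_ACE)


@dataclass
class Node:
    """Hypothetical stand-in for a file or directory and one of its ACLs."""
    is_dir: bool
    acl_flag: int = ACL4_AUTO_INHERIT
    aces: list = field(default_factory=list)      # (who, access_mask, flag)
    children: list = field(default_factory=list)


def inherited_from(parent_aces, child_is_dir):
    """Build the inherited ACEs for a child, each marked ACE4_INHERITED_ACE."""
    out = []
    for who, mask, flag in parent_aces:
        if child_is_dir:
            if not flag & (ACE4_FILE_INHERIT_ACE | ACE4_DIRECTORY_INHERIT_ACE):
                continue
            if not flag & ACE4_DIRECTORY_INHERIT_ACE:
                flag |= ACE4_INHERIT_ONLY_ACE
        else:
            if not flag & ACE4_FILE_INHERIT_ACE:
                continue
            flag &= ~INHERIT_FLAGS  # assumption: no inheritance flags on files
        out.append((who, mask, flag | ACE4_INHERITED_ACE))
    return out


def propagate(directory):
    """Recursively replace inherited ACEs below `directory`."""
    if not directory.acl_flag & ACL4_AUTO_INHERIT:
        return  # first limit: nothing is propagated from this directory
    for child in directory.children:
        if child.acl_flag & ACL4_PROTECTED:
            continue  # second limit: this ACL and its descendants are left alone
        explicit = [ace for ace in child.aces if not ace[2] & ACE4_INHERITED_ACE]
        child.aces = explicit + inherited_from(directory.aces, child.is_dir)
        if child.is_dir:
            # The child's own inheritable ACEs, explicit or just inherited,
            # are taken into account one level further down.
            propagate(child)
```

Because the recursion reads each directory's updated ACL, inheritable ACEs encountered partway down the hierarchy are carried to further descendants along with those from the starting directory.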
This propagation is performed independently for the sacl and the dacl | This propagation is performed independently for the sacl and the dacl | |||
attributes; thus the ACL4_AUTO_INHERIT and ACL4_PROTECTED flags may | attributes; thus, the ACL4_AUTO_INHERIT and ACL4_PROTECTED flags may | |||
be independently set for the sacl and the dacl, and propagation of | be independently set for the sacl and the dacl, and propagation of | |||
one type of acl may continue down a hierarchy even where propagation | one type of acl may continue down a hierarchy even where propagation | |||
of the other acl has stopped. | of the other acl has stopped. | |||
New objects should be created with a dacl and a sacl that both have | New objects should be created with a dacl and a sacl that both have | |||
the ACL4_PROTECTED flag cleared and the ACL4_AUTO_INHERIT flag set to | the ACL4_PROTECTED flag cleared and the ACL4_AUTO_INHERIT flag set to | |||
the same value as that on, respectively, the sacl or dacl of the | the same value as that on, respectively, the sacl or dacl of the | |||
parent object. | parent object. | |||
Both the dacl and sacl attributes are RECOMMENDED, and a server may | Both the dacl and sacl attributes are RECOMMENDED, and a server may | |||
support one without supporting the other. | support one without supporting the other. | |||
A server that supports both the old acl attribute and one or both of | A server that supports both the old acl attribute and one or both of | |||
the new dacl or sacl attributes must do so in such a way as to keep | the new dacl or sacl attributes must do so in such a way as to keep | |||
all three attributes consistent with each other. Thus the ACEs | all three attributes consistent with each other. Thus, the ACEs | |||
reported in the acl attribute should be the union of the ACEs | reported in the acl attribute should be the union of the ACEs | |||
reported in the dacl and sacl attributes, except that the | reported in the dacl and sacl attributes, except that the | |||
ACE4_INHERITED_ACE flag must be cleared from the ACEs in the acl. | ACE4_INHERITED_ACE flag must be cleared from the ACEs in the acl. | |||
And of course a client that queries only the acl will be unable to | And of course a client that queries only the acl will be unable to | |||
determine the values of the sacl or dacl flag fields. | determine the values of the sacl or dacl flag fields. | |||
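As a small illustration of the consistency rule, the acl view might be derived from the dacl and sacl as sketched below. The acl_view function is a hypothetical name, ACEs are reduced to (type, flag, access_mask, who) tuples following the nfsace4 field order, and listing the dacl ACEs before the sacl ACEs is an assumption; the text only requires the union.

```python
ACE4_INHERITED_ACE = 0x80  # aceflag4 value defined by the specification


def acl_view(dacl_aces, sacl_aces):
    """Return the ACEs reported in the acl attribute.

    The result is the union of the dacl and sacl ACEs with
    ACE4_INHERITED_ACE cleared. The dacl/sacl flag fields have no
    counterpart in the acl attribute and are therefore not represented.
    """
    return [(ace_type, flag & ~ACE4_INHERITED_ACE, mask, who)
            for ace_type, flag, mask, who in [*dacl_aces, *sacl_aces]]
```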
When a client performs a SETATTR for the acl attribute, the server | When a client performs a SETATTR for the acl attribute, the server | |||
SHOULD set the ACL4_PROTECTED flag to true on both the sacl and the | SHOULD set the ACL4_PROTECTED flag to true on both the sacl and the | |||
dacl. By using the acl attribute, as opposed to the dacl or sacl | dacl. By using the acl attribute, as opposed to the dacl or sacl | |||
attributes, the client signals that it may not understand automatic | attributes, the client signals that it may not understand automatic | |||
skipping to change at page 154, line 42 | skipping to change at page 154, line 5 | |||
Finally, in the case where the request that creates a new file or | Finally, in the case where the request that creates a new file or | |||
directory does not also set permissions for that file or directory, | directory does not also set permissions for that file or directory, | |||
and there are also no ACEs to inherit from the parent's directory, | and there are also no ACEs to inherit from the parent's directory, | |||
then the server's choice of ACL for the new object is implementation- | then the server's choice of ACL for the new object is implementation- | |||
dependent. In this case, the server SHOULD set the ACL4_DEFAULTED | dependent. In this case, the server SHOULD set the ACL4_DEFAULTED | |||
flag on the ACL it chooses for the new object. An application | flag on the ACL it chooses for the new object. An application | |||
performing automatic inheritance takes the ACL4_DEFAULTED flag as a | performing automatic inheritance takes the ACL4_DEFAULTED flag as a | |||
sign that the ACL should be completely replaced by one generated | sign that the ACL should be completely replaced by one generated | |||
using the automatic inheritance rules. | using the automatic inheritance rules. | |||
7. Single-server Namespace | 7. Single-Server Namespace | |||
This chapter describes the NFSv4 single-server namespace. Single- | This section describes the NFSv4 single-server namespace. Single- | |||
server namespaces may be presented directly to clients, or they may | server namespaces may be presented directly to clients, or they may | |||
be used as a basis to form larger multi-server namespaces (e.g. site- | be used as a basis to form larger multi-server namespaces (e.g., | |||
wide or organization-wide) to be presented to clients, as described | site-wide or organization-wide) to be presented to clients, as | |||
in Section 11. | described in Section 11. | |||
7.1. Server Exports | 7.1. Server Exports | |||
On a UNIX server, the namespace describes all the files reachable by | On a UNIX server, the namespace describes all the files reachable by | |||
pathnames under the root directory or "/". On a Windows server the | pathnames under the root directory or "/". On a Windows server, the | |||
namespace constitutes all the files on disks named by mapped disk | namespace constitutes all the files on disks named by mapped disk | |||
letters. NFS server administrators rarely make the entire server's | letters. NFS server administrators rarely make the entire server's | |||
file system namespace available to NFS clients. More often portions | file system namespace available to NFS clients. More often, portions | |||
of the namespace are made available via an "export" feature. In | of the namespace are made available via an "export" feature. In | |||
previous versions of the NFS protocol, the root filehandle for each | previous versions of the NFS protocol, the root filehandle for each | |||
export is obtained through the MOUNT protocol; the client sent a | export is obtained through the MOUNT protocol; the client sent a | |||
string that identified the export name within the namespace and the | string that identified the export name within the namespace and the | |||
server returned the root filehandle for that export. The MOUNT | server returned the root filehandle for that export. The MOUNT | |||
protocol also provided an EXPORTS procedure that enumerated server's | protocol also provided an EXPORTS procedure that enumerated the | |||
exports. | server's exports. | |||
7.2. Browsing Exports | 7.2. Browsing Exports | |||
The NFSv4.1 protocol provides a root filehandle that clients can use | The NFSv4.1 protocol provides a root filehandle that clients can use | |||
to obtain filehandles for the exports of a particular server, via a | to obtain filehandles for the exports of a particular server, via a | |||
series of LOOKUP operations within a COMPOUND, to traverse a path. A | series of LOOKUP operations within a COMPOUND, to traverse a path. A | |||
common user experience is to use a graphical user interface (perhaps | common user experience is to use a graphical user interface (perhaps | |||
a file "Open" dialog window) to find a file via progressive browsing | a file "Open" dialog window) to find a file via progressive browsing | |||
through a directory tree. The client must be able to move from one | through a directory tree. The client must be able to move from one | |||
export to another export via single-component, progressive LOOKUP | export to another export via single-component, progressive LOOKUP | |||
skipping to change at page 155, line 46 | skipping to change at page 155, line 5 | |||
In the case of NFSv3, an automounter on the client can obtain a | In the case of NFSv3, an automounter on the client can obtain a | |||
snapshot of the server's namespace using the EXPORTS procedure of the | snapshot of the server's namespace using the EXPORTS procedure of the | |||
MOUNT protocol. If it understands the server's pathname syntax, it | MOUNT protocol. If it understands the server's pathname syntax, it | |||
can create an image of the server's namespace on the client. The | can create an image of the server's namespace on the client. The | |||
parts of the namespace that are not exported by the server are filled | parts of the namespace that are not exported by the server are filled | |||
in with directories that might be constructed similarly to an NFSv4.1 | in with directories that might be constructed similarly to an NFSv4.1 | |||
"pseudo file system" (see Section 7.3) that allows the user to browse | "pseudo file system" (see Section 7.3) that allows the user to browse | |||
from one mounted file system to another. There is a drawback to this | from one mounted file system to another. There is a drawback to this | |||
representation of the server's namespace on the client: it is static. | representation of the server's namespace on the client: it is static. | |||
If the server administrator adds a new export the client will be | If the server administrator adds a new export, the client will be | |||
unaware of it. | unaware of it. | |||
7.3. Server Pseudo File System | 7.3. Server Pseudo File System | |||
NFSv4.1 servers avoid this namespace inconsistency by presenting all | NFSv4.1 servers avoid this namespace inconsistency by presenting all | |||
the exports for a given server within the framework of a single | the exports for a given server within the framework of a single | |||
namespace, for that server. An NFSv4.1 client uses LOOKUP and | namespace for that server. An NFSv4.1 client uses LOOKUP and READDIR | |||
READDIR operations to browse seamlessly from one export to another. | operations to browse seamlessly from one export to another. | |||
Where there are portions of the server namespace that are not | Where there are portions of the server namespace that are not | |||
exported, clients require some way of traversing those portions to | exported, clients require some way of traversing those portions to | |||
reach actual exported file systems. A technique that servers may use | reach actual exported file systems. A technique that servers may use | |||
to provide for this is to bridge unexported portion of the namespace | to provide for this is to bridge the unexported portion of the | |||
via a "pseudo file system" that provides a view of exported | namespace via a "pseudo file system" that provides a view of exported | |||
directories only. A pseudo file system has a unique fsid and behaves | directories only. A pseudo file system has a unique fsid and behaves | |||
like a normal, read-only file system. | like a normal, read-only file system. | |||
Based on the construction of the server's namespace, it is possible | Based on the construction of the server's namespace, it is possible | |||
that multiple pseudo file systems may exist. For example, | that multiple pseudo file systems may exist. For example, | |||
/a pseudo file system | /a pseudo file system | |||
/a/b real file system | /a/b real file system | |||
/a/b/c pseudo file system | /a/b/c pseudo file system | |||
/a/b/c/d real file system | /a/b/c/d real file system | |||
Each of the pseudo file systems is considered a separate entity and | Each of the pseudo file systems is considered a separate entity and | |||
therefore MUST have its own fsid, unique among all the fsids for that | therefore MUST have its own fsid, unique among all the fsids for that | |||
server. | server. | |||
7.4. Multiple Roots | 7.4. Multiple Roots | |||
Certain operating environments are sometimes described as having | Certain operating environments are sometimes described as having | |||
"multiple roots". In such environments individual file systems are | "multiple roots". In such environments, individual file systems are | |||
commonly represented by disk or volume names. NFSv4 servers for | commonly represented by disk or volume names. NFSv4 servers for | |||
these platforms can construct a pseudo file system above these root | these platforms can construct a pseudo file system above these root | |||
names so that disk letters or volume names are simply directory names | names so that disk letters or volume names are simply directory names | |||
in the pseudo root. | in the pseudo root. | |||
7.5. Filehandle Volatility | 7.5. Filehandle Volatility | |||
The nature of the server's pseudo file system is that it is a logical | The nature of the server's pseudo file system is that it is a logical | |||
representation of file system(s) available from the server. | representation of file system(s) available from the server. | |||
Therefore, the pseudo file system is most likely constructed | Therefore, the pseudo file system is most likely constructed | |||
dynamically when the server is first instantiated. It is expected | dynamically when the server is first instantiated. It is expected | |||
that the pseudo file system may not have an on disk counterpart from | that the pseudo file system may not have an on-disk counterpart from | |||
which persistent filehandles could be constructed. Even though it is | which persistent filehandles could be constructed. Even though it is | |||
preferable that the server provide persistent filehandles for the | preferable that the server provide persistent filehandles for the | |||
pseudo file system, the NFS client should expect that pseudo file | pseudo file system, the NFS client should expect that pseudo file | |||
system filehandles are volatile. This can be confirmed by checking | system filehandles are volatile. This can be confirmed by checking | |||
the associated "fh_expire_type" attribute for those filehandles in | the associated "fh_expire_type" attribute for those filehandles in | |||
question. If the filehandles are volatile, the NFS client must be | question. If the filehandles are volatile, the NFS client must be | |||
prepared to recover a filehandle value (e.g. with a series of LOOKUP | prepared to recover a filehandle value (e.g., with a series of LOOKUP | |||
operations) when receiving an error of NFS4ERR_FHEXPIRED. | operations) when receiving an error of NFS4ERR_FHEXPIRED. | |||
Because it is quite likely that servers will implement pseudo file | Because it is quite likely that servers will implement pseudo file | |||
systems using volatile filehandles, clients need to be prepared for | systems using volatile filehandles, clients need to be prepared for | |||
them, rather than assuming that all filehandles will be persistent. | them, rather than assuming that all filehandles will be persistent. | |||
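A client-side recovery step of the kind described here might look like the following sketch. It is not taken from any real client. NfsError, with_fh_recovery, and the lookup_path and operation callables are hypothetical stand-ins for the client's RPC layer. The numeric value of NFS4ERR_FHEXPIRED is the one assigned by the protocol.

```python
NFS4ERR_FHEXPIRED = 10014  # nfsstat4 value assigned by the protocol


class NfsError(Exception):
    """Hypothetical exception carrying an nfsstat4 status code."""

    def __init__(self, status):
        super().__init__(f"NFS error {status}")
        self.status = status


def with_fh_recovery(path, cached_fh, lookup_path, operation):
    """Run operation(filehandle), recovering once from an expired filehandle.

    lookup_path(path) is assumed to re-walk the path with a series of
    LOOKUP operations and return a fresh filehandle.
    Returns (result, filehandle_to_cache).
    """
    try:
        return operation(cached_fh), cached_fh
    except NfsError as err:
        if err.status != NFS4ERR_FHEXPIRED:
            raise  # any other error is not handled here
    fresh_fh = lookup_path(path)
    return operation(fresh_fh), fresh_fh
```

The caller replaces its cached filehandle with the returned one, so later operations on the same path start from the recovered value.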
7.6. Exported Root | 7.6. Exported Root | |||
If the server's root file system is exported, one might conclude that | If the server's root file system is exported, one might conclude that | |||
a pseudo file system is unneeded. This not necessarily so. Assume | a pseudo file system is unneeded. This is not necessarily so. | |||
the following file systems on a server: | Assume the following file systems on a server: | |||
/ fs1 (exported) | / fs1 (exported) | |||
/a fs2 (not exported) | /a fs2 (not exported) | |||
/a/b fs3 (exported) | /a/b fs3 (exported) | |||
Because fs2 is not exported, fs3 cannot be reached with simple | Because fs2 is not exported, fs3 cannot be reached with simple | |||
LOOKUPs. The server must bridge the gap with a pseudo file system. | LOOKUPs. The server must bridge the gap with a pseudo file system. | |||
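One way to work out where a pseudo file system is needed is sketched below. The pseudo_dirs function and its input format (a mapping from mount point to an "exported" flag) are hypothetical. The sketch only identifies the directories that must be bridged and does not construct filehandles or fsids.

```python
from pathlib import PurePosixPath


def pseudo_dirs(file_systems):
    """Return the directories a pseudo file system has to provide.

    file_systems maps each mount point (string) to True if that file
    system is exported. A directory on the way to an exported file system
    needs a pseudo entry when the file system containing it is not
    exported, or when no listed file system contains it.
    """
    mounts = {PurePosixPath(p): exported for p, exported in file_systems.items()}

    def owning_fs(path):
        # The nearest mount point at or above `path`, if any.
        for candidate in [path, *path.parents]:
            if candidate in mounts:
                return candidate
        return None

    pseudo = set()
    for mount, exported in mounts.items():
        if not exported:
            continue
        for ancestor in mount.parents:
            fs = owning_fs(ancestor)
            if fs is None or not mounts[fs]:
                pseudo.add(str(ancestor))
    return sorted(pseudo)
```

For the example above, pseudo_dirs({"/": True, "/a": False, "/a/b": True}) yields ["/a"]: the exported root does not remove the need to bridge the unexported fs2.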
7.7. Mount Point Crossing | 7.7. Mount Point Crossing | |||
The server file system environment may be constructed in such a way | The server file system environment may be constructed in such a way | |||
that one file system contains a directory which is 'covered' or | that one file system contains a directory that is 'covered' or | |||
mounted upon by a second file system. For example: | mounted upon by a second file system. For example: | |||
/a/b (file system 1) | /a/b (file system 1) | |||
/a/b/c/d (file system 2) | /a/b/c/d (file system 2) | |||
The pseudo file system for this server may be constructed to look | The pseudo file system for this server may be constructed to look | |||
like: | like: | |||
/ (place holder/not exported) | / (place holder/not exported) | |||
/a/b (file system 1) | /a/b (file system 1) | |||
/a/b/c/d (file system 2) | /a/b/c/d (file system 2) | |||
It is the server's responsibility to present the pseudo file system | It is the server's responsibility to present the pseudo file system | |||
that is complete to the client. If the client sends a lookup request | that is complete to the client. If the client sends a LOOKUP request | |||
for the path "/a/b/c/d", the server's response is the filehandle of | for the path /a/b/c/d, the server's response is the filehandle of the | |||
the root of the file system "/a/b/c/d". In previous versions of the | root of the file system /a/b/c/d. In previous versions of the NFS | |||
NFS protocol, the server would respond with the filehandle of | protocol, the server would respond with the filehandle of directory | |||
directory "/a/b/c/d" within the file system "/a/b". | /a/b/c/d within the file system /a/b. | |||
The NFS client will be able to determine if it crosses a server mount | The NFS client will be able to determine if it crosses a server mount | |||
point by a change in the value of the "fsid" attribute. | point by a change in the value of the "fsid" attribute. | |||
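A client could detect such crossings roughly as sketched below. The mount_crossings function and its input format are hypothetical. Each fsid is represented as a (major, minor) pair, and the values used in the example are invented.

```python
def mount_crossings(path_fsids):
    """Return the path components at which the fsid attribute changes.

    path_fsids is a list of (component_name, fsid) pairs in traversal
    order, where fsid is a (major, minor) tuple obtained, for example,
    by a GETATTR after each LOOKUP.
    """
    crossings = []
    previous = None
    for name, fsid in path_fsids:
        if previous is not None and fsid != previous:
            crossings.append(name)  # entering a different file system
        previous = fsid
    return crossings
```

For the layout above, a walk such as [("/", (0, 1)), ("a", (0, 1)), ("b", (1, 0)), ("c", (1, 0)), ("d", (2, 0))] reports crossings at "b" and "d".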
7.8. Security Policy and Namespace Presentation | 7.8. Security Policy and Namespace Presentation | |||
Because NFSv4 clients possess the ability to change the security | Because NFSv4 clients possess the ability to change the security | |||
mechanisms used, after determining what is allowed, by using SECINFO | mechanisms used, after determining what is allowed, by using SECINFO | |||
and SECINFO_NONAME, the server SHOULD NOT present a different view of | and SECINFO_NONAME, the server SHOULD NOT present a different view of | |||
the namespace based on the security mechanism being used by a client. | the namespace based on the security mechanism being used by a client. | |||
skipping to change at page 158, line 30 | skipping to change at page 157, line 39 | |||
shared resource. Suppose the security policy for /a/b/ | shared resource. Suppose the security policy for /a/b/ | |||
MySecretProject is Kerberos with integrity and it is desired to limit | MySecretProject is Kerberos with integrity and it is desired to limit | |||
knowledge of the existence of this file system. In this case, the | knowledge of the existence of this file system. In this case, the | |||
server should apply the same security policy to /a/b. This allows | server should apply the same security policy to /a/b. This allows | |||
for knowledge of the existence of a file system to be secured when | for knowledge of the existence of a file system to be secured when | |||
desirable. | desirable. | |||
For the case of the use of multiple, disjoint security mechanisms in | For the case of the use of multiple, disjoint security mechanisms in | |||
the server's resources, applying that sort of policy would result in | the server's resources, applying that sort of policy would result in | |||
the higher-level file system not being accessible using any security | the higher-level file system not being accessible using any security | |||
flavor, which would make that higher-level file system inaccessible. | flavor. Therefore, that sort of configuration is not compatible with | |||
Therefore, that sort of configuration is not compatible with hiding | hiding the existence (as opposed to the contents) from clients using | |||
the existence (as opposed to the contents) from clients using | ||||
multiple disjoint sets of security flavors. | multiple disjoint sets of security flavors. | |||
In other circumstances, a desirable policy is for the security of a | In other circumstances, a desirable policy is for the security of a | |||
particular object in the server's namespace should include the union | particular object in the server's namespace to include the union of | |||
of all security mechanisms of all direct descendants. A common and | all security mechanisms of all direct descendants. A common and | |||
convenient practice, unless strong security requirements dictate | convenient practice, unless strong security requirements dictate | |||
otherwise, is to make all of the pseudo file system accessible by all | otherwise, is to make the entire pseudo file system accessible by | |||
of the valid security mechanisms. | all of the valid security mechanisms. | |||
Where there is concern about the security of data on the network, | Where there is concern about the security of data on the network, | |||
clients should use strong security mechanisms to access the pseudo | clients should use strong security mechanisms to access the pseudo | |||
file system in order to prevent man-in-the-middle attacks. | file system in order to prevent man-in-the-middle attacks. | |||
8. State Management | 8. State Management | |||
Integrating locking into the NFS protocol necessarily causes it to be | Integrating locking into the NFS protocol necessarily causes it to be | |||
stateful. With the inclusion of such features as share reservations, | stateful. With the inclusion of such features as share reservations, | |||
file and directory delegations, recallable layouts, and support for | file and directory delegations, recallable layouts, and support for | |||
mandatory byte-range locking, the protocol becomes substantially more | mandatory byte-range locking, the protocol becomes substantially more | |||
dependent on proper management of state than the traditional | dependent on proper management of state than the traditional | |||
combination of NFS and NLM [46]. These features include expanded | combination of NFS and NLM (Network Lock Manager) [46]. These | |||
locking facilities, which provide some measure of interclient | features include expanded locking facilities, which provide some | |||
exclusion, but the state also offers features not readily providable | measure of inter-client exclusion, but the state also offers features | |||
using a stateless model. There are three components to making this | not readily providable using a stateless model. There are three | |||
state manageable: | components to making this state manageable: | |||
o Clear division between client and server | o clear division between client and server | |||
o Ability to reliably detect inconsistency in state between client | o ability to reliably detect inconsistency in state between client | |||
and server | and server | |||
o Simple and robust recovery mechanisms | o simple and robust recovery mechanisms | |||
In this model, the server owns the state information. The client | In this model, the server owns the state information. The client | |||
requests changes in locks and the server responds with the changes | requests changes in locks and the server responds with the changes | |||
made. Non-client-initiated changes in locking state are infrequent. | made. Non-client-initiated changes in locking state are infrequent. | |||
The client receives prompt notification of such changes and can | The client receives prompt notification of such changes and can | |||
adjust its view of the locking state to reflect the server's changes. | adjust its view of the locking state to reflect the server's changes. | |||
Individual pieces of state created by the server and passed to the | Individual pieces of state created by the server and passed to the | |||
client at its request are represented by 128-bit stateids. These | client at its request are represented by 128-bit stateids. These | |||
stateids may represent a particular open file, a set of byte-range | stateids may represent a particular open file, a set of byte-range | |||
locks held by a particular owner, or a recallable delegation of | locks held by a particular owner, or a recallable delegation of | |||
privileges to access a file in particular ways, or at a particular | privileges to access a file in particular ways or at a particular | |||
location. | location. | |||
In all cases, there is a transition from the most general information | In all cases, there is a transition from the most general information | |||
which represents a client as a whole to the eventual lightweight | that represents a client as a whole to the eventual lightweight | |||
stateid used for most client and server locking interactions. The | stateid used for most client and server locking interactions. The | |||
details of this transition will vary with the type of object but it | details of this transition will vary with the type of object but it | |||
always starts with a client ID. | always starts with a client ID. | |||
8.1. Client and Session ID | 8.1. Client and Session ID | |||
A client must establish a client ID (see Section 2.4) and then one or | A client must establish a client ID (see Section 2.4) and then one or | |||
more sessionids (see Section 2.10) before performing any operations | more sessionids (see Section 2.10) before performing any operations | |||
to open, lock, delegate, or obtain a layout for a file object. Each | to open, byte-range lock, delegate, or obtain a layout for a file | |||
session ID is associated with a specific client ID, and thus serves | object. Each session ID is associated with a specific client ID, and | |||
as a shorthand reference to an NFSv4.1 client. | thus serves as a shorthand reference to an NFSv4.1 client. | |||
For some types of locking interactions, the client will represent | For some types of locking interactions, the client will represent | |||
some number of internal locking entities called "owners", which | some number of internal locking entities called "owners", which | |||
normally correspond to processes internal to the client. For other | normally correspond to processes internal to the client. For other | |||
types of locking-related objects, such as delegations and layouts, no | types of locking-related objects, such as delegations and layouts, no | |||
such intermediate entities are provided for, and the locking-related | such intermediate entities are provided for, and the locking-related | |||
objects are considered to be transferred directly between the server | objects are considered to be transferred directly between the server | |||
and a unitary client. | and a unitary client. | |||
8.2. Stateid Definition | 8.2. Stateid Definition | |||
When the server grants a lock of any type (including opens, byte- | When the server grants a lock of any type (including opens, byte- | |||
range locks, delegations, and layouts) it responds with a unique | range locks, delegations, and layouts), it responds with a unique | |||
stateid, that represents a set of locks (often a single lock) for the | stateid that represents a set of locks (often a single lock) for the | |||
same file, of the same type, and sharing the same ownership | same file, of the same type, and sharing the same ownership | |||
characteristics. Thus opens of the same file by different open- | characteristics. Thus, opens of the same file by different open- | |||
owners each have an identifying stateid. Similarly, each set of | owners each have an identifying stateid. Similarly, each set of | |||
byte-range locks on a file owned by a specific lock-owner has its own | byte-range locks on a file owned by a specific lock-owner has its own | |||
identifying stateid. Delegations and layouts also have associated | identifying stateid. Delegations and layouts also have associated | |||
stateids by which they may be referenced. The stateid is used as a | stateids by which they may be referenced. The stateid is used as a | |||
shorthand reference to a lock or set of locks and given a stateid the | shorthand reference to a lock or set of locks, and given a stateid, | |||
server can determine the associated state-owner or state-owners (in | the server can determine the associated state-owner or state-owners | |||
the case of an open-owner/lock-owner pair) and the associated | (in the case of an open-owner/lock-owner pair) and the associated | |||
filehandle. When stateids are used, the current filehandle must be | filehandle. When stateids are used, the current filehandle must be | |||
the one associated with that stateid. | the one associated with that stateid. | |||
All stateids associated with a given client ID are associated with a | All stateids associated with a given client ID are associated with a | |||
common lease which represents the claim of those stateids and the | common lease that represents the claim of those stateids and the | |||
objects they represent to be maintained by the server. See | objects they represent to be maintained by the server. See | |||
Section 8.3 for a discussion of leases. | Section 8.3 for a discussion of the lease. | |||
The server may assign stateids independently for different clients. | The server may assign stateids independently for different clients. | |||
A stateid with the same bit pattern for one client may designate an | A stateid with the same bit pattern for one client may designate an | |||
entirely different set of locks for a different client. The stateid | entirely different set of locks for a different client. The stateid | |||
is always interpreted with respect to the client ID associated with | is always interpreted with respect to the client ID associated with | |||
the current session. Stateids apply to all sessions associated with | the current session. Stateids apply to all sessions associated with | |||
the given client ID and the client may use a stateid obtained from | the given client ID, and the client may use a stateid obtained from | |||
one session on another session associated with the same client ID. | one session on another session associated with the same client ID. | |||
8.2.1. Stateid Types | 8.2.1. Stateid Types | |||
With the exception of special stateids (see Section 8.2.3), each | With the exception of special stateids (see Section 8.2.3), each | |||
stateid represents locking objects of one of a set of types defined | stateid represents locking objects of one of a set of types defined | |||
by the NFSv4.1 protocol. Note that in all these cases, where we | by the NFSv4.1 protocol. Note that in all these cases, where we | |||
speak of guarantee, it is understood there are situations such as a | speak of guarantee, it is understood there are situations such as a | |||
client restart, or lock revocation, that allow the guarantee to be | client restart, or lock revocation, that allow the guarantee to be | |||
voided. | voided. | |||
o Stateids may represent opens of files. | o Stateids may represent opens of files. | |||
Each stateid in this case represents the open state for a given | Each stateid in this case represents the OPEN state for a given | |||
client ID/open-owner/filehandle triple. Such stateids are subject | client ID/open-owner/filehandle triple. Such stateids are subject | |||
to change (with consequent incrementing of the stateid's seqid) in | to change (with consequent incrementing of the stateid's seqid) in | |||
response to OPENs that result in upgrade and OPEN_DOWNGRADE | response to OPENs that result in upgrade and OPEN_DOWNGRADE | |||
operations. | operations. | |||
o Stateids may represent sets of byte-range locks. | o Stateids may represent sets of byte-range locks. | |||
All locks held on a particular file by a particular owner and all | All locks held on a particular file by a particular owner and | |||
gotten under the aegis of a particular open file are associated | gotten under the aegis of a particular open file are associated | |||
with a single stateid with the seqid being incremented whenever | with a single stateid with the seqid being incremented whenever | |||
LOCK and LOCKU operations affect that set of locks. | LOCK and LOCKU operations affect that set of locks. | |||
o Stateids may represent file delegations, which are recallable | o Stateids may represent file delegations, which are recallable | |||
guarantees by the server to the client, that other clients will | guarantees by the server to the client that other clients will not | |||
not reference, or will not modify a particular file, until the | reference or modify a particular file, until the delegation is | |||
delegation is returned. In NFSv4.1, file delegations may be | returned. In NFSv4.1, file delegations may be obtained on both | |||
obtained on both regular and non-regular files. | regular and non-regular files. | |||
A stateid represents a single delegation held by a client for a | A stateid represents a single delegation held by a client for a | |||
particular filehandle. | particular filehandle. | |||
o Stateids may represent directory delegations, which are recallable | o Stateids may represent directory delegations, which are recallable | |||
guarantees by the server to the client, that other clients will | guarantees by the server to the client that other clients will not | |||
not modify the directory, until the delegation is returned. | modify the directory, until the delegation is returned. | |||
A stateid represents a single delegation held by a client for a | A stateid represents a single delegation held by a client for a | |||
particular directory filehandle. | particular directory filehandle. | |||
o Stateids may represent layouts, which are recallable guarantees by | o Stateids may represent layouts, which are recallable guarantees by | |||
the server to the client, that particular files may be accessed | the server to the client that particular files may be accessed via | |||
via an alternate data access protocol at specific locations. Such | an alternate data access protocol at specific locations. Such | |||
access is limited to particular sets of byte ranges and may | access is limited to particular sets of byte-ranges and may | |||
proceed until those byte ranges are reduced or the layout is | proceed until those byte-ranges are reduced or the layout is | |||
returned. | returned. | |||
A stateid represents the set of all layouts held by a particular | A stateid represents the set of all layouts held by a particular | |||
client for a particular filehandle with a given layout type. The | client for a particular filehandle with a given layout type. The | |||
seqid is updated as the layouts of that set changes with layout | seqid is updated as the layouts of that set of byte-ranges change, | |||
stateid changing operations such as LAYOUTGET and LAYOUTRETURN. | via layout stateid changing operations such as LAYOUTGET and | |||
LAYOUTRETURN. | ||||
8.2.2. Stateid Structure | 8.2.2. Stateid Structure | |||
Stateids are divided into two fields, a 96-bit "other" field | Stateids are divided into two fields, a 96-bit "other" field | |||
identifying the specific set of locks and a 32-bit "seqid" sequence | identifying the specific set of locks and a 32-bit "seqid" sequence | |||
value. Except in the case of special stateids (see Section 8.2.3), a | value. Except in the case of special stateids (see Section 8.2.3), a | |||
particular value of the "other" field denotes a set of locks of the | particular value of the "other" field denotes a set of locks of the | |||
same type (for example byte-range locks, opens, delegations, or | same type (for example, byte-range locks, opens, delegations, or | |||
layouts), for a specific file or directory, and sharing the same | layouts), for a specific file or directory, and sharing the same | |||
ownership characteristics. The seqid designates a specific instance | ownership characteristics. The seqid designates a specific instance | |||
of such a set of locks, and is incremented to indicate changes in | of such a set of locks, and is incremented to indicate changes in | |||
such a set of locks, either by the addition or deletion of locks from | such a set of locks, either by the addition or deletion of locks from | |||
the set, a change in the byte-range they apply to, or an upgrade or | the set, a change in the byte-range they apply to, or an upgrade or | |||
downgrade in the type of one or more locks. | downgrade in the type of one or more locks. | |||
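The two-field layout described above can be sketched as follows. This is a minimal illustration, not a definitive encoding: the class name Stateid, the constant OTHER_SIZE, and the encode/decode helpers are hypothetical, and the wire order (the 32-bit seqid first, then the 12 opaque bytes of "other") is an assumption that should be checked against the protocol's XDR description.

```python
import struct
from dataclasses import dataclass

OTHER_SIZE = 12  # the "other" field is 96 bits


@dataclass(frozen=True)
class Stateid:
    """Hypothetical in-memory form of a 128-bit stateid."""
    seqid: int    # 32-bit sequence value
    other: bytes  # 96-bit identifier of the set of locks

    def __post_init__(self):
        if not 0 <= self.seqid <= 0xFFFFFFFF:
            raise ValueError("seqid must fit in 32 bits")
        if len(self.other) != OTHER_SIZE:
            raise ValueError("other must be exactly 12 bytes")

    def encode(self) -> bytes:
        # Assumed order: big-endian seqid, then the opaque bytes (16 bytes total).
        return struct.pack(">I12s", self.seqid, self.other)

    @classmethod
    def decode(cls, data: bytes) -> "Stateid":
        seqid, other = struct.unpack(">I12s", data)
        return cls(seqid, other)
```

Two stateids with the same "other" value and different seqids then designate different instances of the same set of locks.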
When such a set of locks is first created the server returns a | When such a set of locks is first created, the server returns a | |||
stateid with seqid value of one. On subsequent operations which | stateid with seqid value of one. On subsequent operations that | |||
modify the set of locks the server is required to increment the seqid | modify the set of locks, the server is required to increment the | |||
field by one (1) whenever it returns a stateid for the same state- | "seqid" field by one whenever it returns a stateid for the same | |||
owner/file/type combination and there is some change in the set of | state-owner/file/type combination and there is some change in the set | |||
locks actually designated. In this case the server will return a | of locks actually designated. In this case, the server will return a | |||
stateid with an other field the same as previously used for that | stateid with an "other" field the same as previously used for that | |||
state-owner/file/type combination, with an incremented seqid field. | state-owner/file/type combination, with an incremented "seqid" field. | |||
This pattern continues until the seqid is incremented past | This pattern continues until the seqid is incremented past | |||
NFS4_UINT32_MAX, and one (not zero) is the next seqid value. | NFS4_UINT32_MAX, and one (not zero) is the next seqid value. | |||
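The increment rule above, including the wrap from NFS4_UINT32_MAX back to one, might be sketched as shown here. The function name next_seqid is hypothetical; only the constant name comes from the text.

```python
NFS4_UINT32_MAX = 0xFFFFFFFF


def next_seqid(seqid: int) -> int:
    """Return the seqid that follows `seqid` for an unchanged "other" field."""
    if not 1 <= seqid <= NFS4_UINT32_MAX:
        raise ValueError("a server-assigned seqid lies in 1..NFS4_UINT32_MAX")
    # Zero is skipped on wraparound, so the value after the maximum is one.
    return 1 if seqid == NFS4_UINT32_MAX else seqid + 1
```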
The purpose of the incrementing of the seqid is to allow the server | The purpose of the incrementing of the seqid is to allow the server | |||
to communicate to the client the order in which operations that | to communicate to the client the order in which operations that | |||
modified locking state associated with a stateid have been processed | modified locking state associated with a stateid have been processed | |||
and to make it possible for the client to send requests that are | and to make it possible for the client to send requests that are | |||
conditional on the set of locks not having changed since the stateid | conditional on the set of locks not having changed since the stateid | |||
in question was returned. | in question was returned. | |||
Except for layout stateids (Section 12.5.3) when a client sends a | Except for layout stateids (Section 12.5.3), when a client sends a | |||
stateid to the server, it has two choices with regard to the seqid | stateid to the server, it has two choices with regard to the seqid | |||
sent. It may set the seqid to zero to indicate to the server that it | sent. It may set the seqid to zero to indicate to the server that it | |||
wishes the most up-to-date seqid for that stateid's "other" field to | wishes the most up-to-date seqid for that stateid's "other" field to | |||
be used. This would be the common choice in the case of a stateid | be used. This would be the common choice in the case of a stateid | |||
sent with a READ or WRITE operation. It also may set a non-zero | sent with a READ or WRITE operation. It also may set a non-zero | |||
value in which case the server checks if that seqid is the correct | value, in which case the server checks if that seqid is the correct | |||
one. In that case the server is required to return | one. In that case, the server is required to return | |||
NFS4ERR_OLD_STATEID if the seqid is lower than the most current value | NFS4ERR_OLD_STATEID if the seqid is lower than the most current value | |||
and NFS4ERR_BAD_STATEID if the seqid is greater than the most current | and NFS4ERR_BAD_STATEID if the seqid is greater than the most current | |||
value. This would be the common choice in the case of stateids sent | value. This would be the common choice in the case of stateids sent | |||
with a CLOSE or OPEN_DOWNGRADE. Because OPENs may be sent in | with a CLOSE or OPEN_DOWNGRADE. Because OPENs may be sent in | |||
parallel for the same owner, a client might close a file without | parallel for the same owner, a client might close a file without | |||
knowing that an OPEN upgrade had been done by the server, changing | knowing that an OPEN upgrade had been done by the server, changing | |||
the lock in question. If CLOSE were sent with a zero seqid, the OPEN | the lock in question. If CLOSE were sent with a zero seqid, the OPEN | |||
upgrade would be cancelled before the client even received an | upgrade would be cancelled before the client even received an | |||
indication that an upgrade had happened. | indication that an upgrade had happened. | |||
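The server-side check of a client-supplied seqid for a non-layout stateid could look roughly like the following. This is a simplified sketch under stated assumptions: check_seqid is a hypothetical name, the error codes are represented as strings rather than protocol constants, and the comparison is plain numeric comparison, so seqid wraparound past NFS4_UINT32_MAX is deliberately not handled.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_OLD_STATEID = "NFS4ERR_OLD_STATEID"
NFS4ERR_BAD_STATEID = "NFS4ERR_BAD_STATEID"


def check_seqid(sent_seqid: int, current_seqid: int) -> str:
    """Check the seqid sent by a client against the server's current value."""
    if sent_seqid == 0:
        # The client asked for the most up-to-date seqid to be used.
        return NFS4_OK
    if sent_seqid < current_seqid:
        return NFS4ERR_OLD_STATEID
    if sent_seqid > current_seqid:
        return NFS4ERR_BAD_STATEID
    return NFS4_OK
```

In the CLOSE scenario described above, a client that sends its last known non-zero seqid after a parallel OPEN upgrade gets NFS4ERR_OLD_STATEID instead of silently cancelling the upgrade.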
When a stateid is sent by the server to client as part of a callback | When a stateid is sent by the server to the client as part of a | |||
operation, it is not subject to checking for a current seqid and | callback operation, it is not subject to checking for a current seqid | |||
returning NFS4ERR_OLD_STATEID. This is because the client is not in | and returning NFS4ERR_OLD_STATEID. This is because the client is not | |||
a position to know the most up-to-date seqid and thus cannot verify | in a position to know the most up-to-date seqid and thus cannot | |||
it. Unless specially noted, the seqid value for a stateid sent by | verify it. Unless specially noted, the seqid value for a stateid | |||
the server to the client as part of a callback is required to be zero | sent by the server to the client as part of a callback is required to | |||
with NFS4ERR_BAD_STATEID returned if it is not. | be zero with NFS4ERR_BAD_STATEID returned if it is not. | |||
In making comparisons between seqids, both by the client in | In making comparisons between seqids, both by the client in | |||
determining the order of operations and by the server in determining | determining the order of operations and by the server in determining | |||
whether the NFS4ERR_OLD_STATEID is to be returned, the possibility of | whether the NFS4ERR_OLD_STATEID is to be returned, the possibility of | |||
the seqid being swapped around past the NFS4_UINT32_MAX value needs | the seqid being swapped around past the NFS4_UINT32_MAX value needs | |||
to be taken into account. When two seqid values are being compared, | to be taken into account. When two seqid values are being compared, | |||
the total count of slots for all sessions associated with the current | the total count of slots for all sessions associated with the current | |||
client is used to do this. When one seqid value is less that this | client is used to do this. When one seqid value is less than this | |||
total slot count and another seqid value is greater than | total slot count and another seqid value is greater than | |||
NFS4_UINT32_MAX minus the total slot count, the former is to be | NFS4_UINT32_MAX minus the total slot count, the former is to be | |||
treated as lower than the later, despite the fact that it is | treated as lower than the latter, despite the fact that it is | |||
numerically greater. | numerically greater. | |||
8.2.3. Special Stateids | 8.2.3. Special Stateids | |||
Stateid values whose "other" field is either all zeros or all ones | Stateid values whose "other" field is either all zeros or all ones | |||
are reserved. They may not be assigned by the server but have | are reserved. They may not be assigned by the server but have | |||
special meanings defined by the protocol. The particular meaning | special meanings defined by the protocol. The particular meaning | |||
depends on whether the "other" field is all zeros or all ones and the | depends on whether the "other" field is all zeros or all ones and the | |||
specific value of the "seqid" field. | specific value of the "seqid" field. | |||
The following combinations of "other" and "seqid" are defined in | The following combinations of "other" and "seqid" are defined in | |||
NFSv4.1: | NFSv4.1: | |||
o When "other" and "seqid" are both zero, the stateid is treated as | o When "other" and "seqid" are both zero, the stateid is treated as | |||
a special anonymous stateid, which can be used in READ, WRITE, and | a special anonymous stateid, which can be used in READ, WRITE, and | |||
SETATTR requests to indicate the absence of any open state | SETATTR requests to indicate the absence of any OPEN state | |||
associated with the request. When an anonymous stateid value is | associated with the request. When an anonymous stateid value is | |||
used, and an existing open denies the form of access requested, | used and an existing open denies the form of access requested, | |||
then access will be denied to the request. This stateid MUST NOT | then access will be denied to the request. This stateid MUST NOT | |||
be used on operations to data servers (Section 13.6). | be used on operations to data servers (Section 13.6). | |||
o When "other" and "seqid" are both all ones, the stateid is a | o When "other" and "seqid" are both all ones, the stateid is a | |||
special read bypass stateid. When this value is used in WRITE or | special READ bypass stateid. When this value is used in WRITE or | |||
SETATTR, it is treated like the anonymous value. When used in | SETATTR, it is treated like the anonymous value. When used in | |||
READ, the server MAY grant access, even if access would normally | READ, the server MAY grant access, even if access would normally | |||
be denied to READ requests. This stateid MUST NOT be used on | be denied to READ operations. This stateid MUST NOT be used on | |||
operations to data servers. | operations to data servers. | |||
o When "other" is zero and "seqid" is one, the stateid represents | o When "other" is zero and "seqid" is one, the stateid represents | |||
the current stateid, which is whatever value is the last stateid | the current stateid, which is whatever value is the last stateid | |||
returned by an operation within the COMPOUND. In the case of an | returned by an operation within the COMPOUND. In the case of an | |||
OPEN, the stateid returned for the open file, and not the | OPEN, the stateid returned for the open file and not the | |||
delegation is used. The stateid passed to the operation in place | delegation is used. The stateid passed to the operation in place | |||
of the special value has its "seqid" value set to zero, except | of the special value has its "seqid" value set to zero, except | |||
when the current stateid is used by the operation CLOSE or | when the current stateid is used by the operation CLOSE or | |||
OPEN_DOWNGRADE. If there is no operation in the COMPOUND which | OPEN_DOWNGRADE. If there is no operation in the COMPOUND that has | |||
has returned a stateid value, the server MUST return the error | returned a stateid value, the server MUST return the error | |||
NFS4ERR_BAD_STATEID. As illustrated in Figure 6, if the value of | NFS4ERR_BAD_STATEID. As illustrated in Figure 6, if the value of | |||
a current stateid is a special stateid, and the stateid of an | a current stateid is a special stateid and the stateid of an | |||
operation's arguments has "other" set to zero, and "seqid" set to | operation's arguments has "other" set to zero and "seqid" set to | |||
one, then the server MUST return the error NFS4ERR_BAD_STATEID. | one, then the server MUST return the error NFS4ERR_BAD_STATEID. | |||
o When "other" is zero and "seqid" is NFS4_UINT32_MAX, the stateid | o When "other" is zero and "seqid" is NFS4_UINT32_MAX, the stateid | |||
represents a reserved stateid value defined to be invalid. When | represents a reserved stateid value defined to be invalid. When | |||
this stateid is used, the server MUST return the error | this stateid is used, the server MUST return the error | |||
NFS4ERR_BAD_STATEID. | NFS4ERR_BAD_STATEID. | |||
If a stateid value is used which has all zero or all ones in the | If a stateid value is used that has all zeros or all ones in the | |||
"other" field, but does not match one of the cases above, the server | "other" field but does not match one of the cases above, the server | |||
MUST return the error NFS4ERR_BAD_STATEID. | MUST return the error NFS4ERR_BAD_STATEID. | |||
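The combinations listed above can be summarized as a small classifier. This is an illustrative sketch only: the function classify_special and the string labels it returns are hypothetical. Both the "invalid" and "undefined" results correspond to cases where the server returns NFS4ERR_BAD_STATEID.

```python
from typing import Optional

NFS4_UINT32_MAX = 0xFFFFFFFF
ALL_ZEROS = bytes(12)     # 96-bit "other" field, all zeros
ALL_ONES = b"\xff" * 12   # 96-bit "other" field, all ones


def classify_special(other: bytes, seqid: int) -> Optional[str]:
    """Label a special stateid, or return None if "other" is not reserved."""
    if len(other) != 12:
        raise ValueError("other must be exactly 12 bytes")
    if other not in (ALL_ZEROS, ALL_ONES):
        return None  # an ordinary, server-assigned stateid
    if other == ALL_ZEROS:
        if seqid == 0:
            return "anonymous"
        if seqid == 1:
            return "current"
        if seqid == NFS4_UINT32_MAX:
            return "invalid"      # reserved value defined to be invalid
    elif seqid == NFS4_UINT32_MAX:
        return "read_bypass"      # "other" and "seqid" both all ones
    return "undefined"            # reserved "other" with no defined meaning
```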
Special stateids, unlike other stateids, are not associated with | Special stateids, unlike other stateids, are not associated with | |||
individual client IDs or filehandles and can be used with all valid | individual client IDs or filehandles and can be used with all valid | |||
client IDs and filehandles. In the case of a special stateid | client IDs and filehandles. In the case of a special stateid | |||
designating the current stateid, the current stateid value | designating the current stateid, the current stateid value | |||
substituted for the special stateid is associated with a particular | substituted for the special stateid is associated with a particular | |||
client ID and filehandle, and so, if it is used where current | client ID and filehandle, and so, if it is used where the current | |||
filehandle does not match that associated with the current stateid, | filehandle does not match that associated with the current stateid, | |||
the operation to which the stateid is passed will return | the operation to which the stateid is passed will return | |||
NFS4ERR_BAD_STATEID. | NFS4ERR_BAD_STATEID. | |||
8.2.4. Stateid Lifetime and Validation | 8.2.4. Stateid Lifetime and Validation | |||
Stateids must remain valid until either a client restart or a server | Stateids must remain valid until either a client restart or a server | |||
restart or until the client returns all of the locks associated with | restart or until the client returns all of the locks associated with | |||
the stateid by means of an operation such as CLOSE or DELEGRETURN. | the stateid by means of an operation such as CLOSE or DELEGRETURN. | |||
If the locks are lost due to revocation, as long as the client ID is | If the locks are lost due to revocation, as long as the client ID is | |||
valid, the stateid remains a valid designation of that revoked state | valid, the stateid remains a valid designation of that revoked state | |||
until the client frees it by using FREE_STATEID. Stateids associated | until the client frees it by using FREE_STATEID. Stateids associated | |||
with byte-range locks are an exception. They remain valid even if a | with byte-range locks are an exception. They remain valid even if a | |||
LOCKU frees all remaining locks, so long as the open file with which | LOCKU frees all remaining locks, so long as the open file with which | |||
they are associated remains open, unless the client does a | they are associated remains open, unless the client frees the | |||
FREE_STATEID to cause the stateid to be freed. | stateids via the FREE_STATEID operation. | |||
It should be noted that there are situations in which the client's | It should be noted that there are situations in which the client's | |||
locks become invalid, without the client requesting they be returned. | locks become invalid, without the client requesting they be returned. | |||
These include lease expiration and a number of forms of lock | These include lease expiration and a number of forms of lock | |||
revocation within the lease period. It is important to note that in | revocation within the lease period. It is important to note that in | |||
these situations, the stateid remains valid and the client can use it | these situations, the stateid remains valid and the client can use it | |||
to determine the disposition of the associated lost locks. | to determine the disposition of the associated lost locks. | |||
An "other" value must never be reused for a different purpose (i.e. | An "other" value must never be reused for a different purpose (i.e., | |||
different filehandle, owner, or type of locks) within the context of | different filehandle, owner, or type of locks) within the context of | |||
a single client ID. A server may retain the "other" value for the | a single client ID. A server may retain the "other" value for the | |||
same purpose beyond the point where it may otherwise be freed but if | same purpose beyond the point where it may otherwise be freed, but if | |||
it does so, it must maintain "seqid" continuity with previous values. | it does so, it must maintain "seqid" continuity with previous values. | |||
One mechanism that may be used to satisfy the requirement that the | One mechanism that may be used to satisfy the requirement that the | |||
server recognize invalid and out-of-date stateids is for the server | server recognize invalid and out-of-date stateids is for the server | |||
to divide the "other" field of the stateid into two fields. | to divide the "other" field of the stateid into two fields. | |||
o An index into a table of locking-state structures. | o an index into a table of locking-state structures. | |||
o A generation number which is incremented on each allocation of a | o a generation number that is incremented on each allocation of a | |||
table entry for a particular use. | table entry for a particular use. | |||
And then store in each table entry, | And then store in each table entry, | |||
o The client ID with which the stateid is associated. | o the client ID with which the stateid is associated. | |||
o The current generation number for the (at most one) valid stateid | o the current generation number for the (at most one) valid stateid | |||
sharing this index value. | sharing this index value. | |||
o The filehandle of the file on which the locks are taken. | o the filehandle of the file on which the locks are taken. | |||
o An indication of the type of stateid (open, byte-range lock, file | o an indication of the type of stateid (open, byte-range lock, file | |||
delegation, directory delegation, layout). | delegation, directory delegation, layout). | |||
o The last "seqid" value returned corresponding to the current | o the last "seqid" value returned corresponding to the current | |||
"other" value. | "other" value. | |||
o An indication of the current status of the locks associated with | o an indication of the current status of the locks associated with | |||
this stateid. In particular, whether these have been revoked and | this stateid, in particular, whether these have been revoked and | |||
if so, for what reason. | if so, for what reason. | |||
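One possible shape for the mechanism just described is sketched below. All names (StateidType, LockStateEntry, pack_other, unpack_other) are hypothetical. The text does not say how the 96 bits of "other" are divided, so the split into a 32-bit index and a 64-bit generation is purely an assumption of this sketch.

```python
import struct
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class StateidType(Enum):
    OPEN = "open"
    BYTE_RANGE_LOCK = "byte-range lock"
    FILE_DELEGATION = "file delegation"
    DIRECTORY_DELEGATION = "directory delegation"
    LAYOUT = "layout"


@dataclass
class LockStateEntry:
    """One entry in the table of locking-state structures."""
    client_id: int                        # client ID the stateid belongs to
    generation: int                       # generation of the one valid stateid
    filehandle: bytes                     # file on which the locks are taken
    stateid_type: StateidType
    last_seqid: int                       # last "seqid" returned for this "other"
    revoked_reason: Optional[str] = None  # None means the locks are not revoked


# Assumed split: 32-bit table index followed by 64-bit generation (12 bytes).
_OTHER = struct.Struct(">IQ")


def pack_other(index: int, generation: int) -> bytes:
    # Starting generations at 1 keeps "other" away from the reserved
    # all-zeros pattern; a real server must also avoid all ones.
    return _OTHER.pack(index, generation)


def unpack_other(other: bytes) -> Tuple[int, int]:
    index, generation = _OTHER.unpack(other)
    return index, generation
```

A server following this sketch would look up the entry at the unpacked index and compare its stored generation, client ID, and filehandle with those of the request before trusting the stateid.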
With this information, an incoming stateid can be validated and the | With this information, an incoming stateid can be validated and the | |||
appropriate error returned when necessary. Special and non-special | appropriate error returned when necessary. Special and non-special | |||
stateids are handled separately. (See Section 8.2.3 for a discussion | stateids are handled separately. (See Section 8.2.3 for a discussion | |||
of special stateids.) | of special stateids.) | |||
Note that stateids are implicitly qualified by the current client ID, | Note that stateids are implicitly qualified by the current client ID, | |||
as derived from the client ID associated with the current session. | as derived from the client ID associated with the current session. | |||
Note however, that the semantics of the session will prevent stateids | Note, however, that the semantics of the session will prevent | |||
associated with a previous client or server instance from being | stateids associated with a previous client or server instance from | |||
analyzed by this procedure. | being analyzed by this procedure. | |||
If server restart has resulted in an invalid client ID or a session | If server restart has resulted in an invalid client ID or a session | |||
ID which is invalid, SEQUENCE will return an error and the operation | ID that is invalid, SEQUENCE will return an error and the operation | |||
that takes a stateid as an argument will never be processed. | that takes a stateid as an argument will never be processed. | |||
If there has been a server restart where there is a persistent | If there has been a server restart where there is a persistent | |||
session, and all leased state has been lost, then the session in | session and all leased state has been lost, then the session in | |||
question will, although valid, be marked as dead, and any operation | question will, although valid, be marked as dead, and any operation | |||
not satisfied by means of the reply cache will receive the error | not satisfied by means of the reply cache will receive the error | |||
NFS4ERR_DEADSESSION, and thus not be processed as indicated below. | NFS4ERR_DEADSESSION, and thus not be processed as indicated below. | |||
When a stateid is being tested, and the "other" field is all zeros or | When a stateid is being tested and the "other" field is all zeros or | |||
all ones, a check that the "other" and "seqid" fields match a defined | all ones, a check that the "other" and "seqid" fields match a defined | |||
combination for a special stateid is done and the results determined | combination for a special stateid is done and the results determined | |||
as follows: | as follows: | |||
o If the "other" and "seqid" fields do not match a defined | o If the "other" and "seqid" fields do not match a defined | |||
combination associated with a special stateid, the error | combination associated with a special stateid, the error | |||
NFS4ERR_BAD_STATEID is returned. | NFS4ERR_BAD_STATEID is returned. | |||
o If the special stateid is one designating the current stateid, and | o If the special stateid is one designating the current stateid and | |||
there is a current stateid, then the current stateid is | there is a current stateid, then the current stateid is | |||
substituted for the special stateid and the checks appropriate to | substituted for the special stateid and the checks appropriate to | |||
non-special stateids in performed. | non-special stateids are performed. | |||
o If the combination is valid in general but is not appropriate to | o If the combination is valid in general but is not appropriate to | |||
the context in which the stateid is used (e.g. an all-zero stateid | the context in which the stateid is used (e.g., an all-zero | |||
is used when an open stateid is required in a LOCK operation), the | stateid is used when an OPEN stateid is required in a LOCK | |||
error NFS4ERR_BAD_STATEID is also returned. | operation), the error NFS4ERR_BAD_STATEID is also returned. | |||
o Otherwise, the check is completed and the special stateid is | o Otherwise, the check is completed and the special stateid is | |||
accepted as valid. | accepted as valid. | |||
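The special-stateid check above can be sketched as follows. This is a
minimal illustration, not the specification's algorithm: the table of
defined combinations is assumed to be the one given in Section 8.2.3
(anonymous, READ bypass, current stateid), the 12-byte width of "other"
is assumed from the stateid4 definition, and the function name, the
"allowed" parameter, and the tuple results are hypothetical. The text
does not say what happens when the current-stateid designator is used
and no current stateid exists; the sketch assumes NFS4ERR_BAD_STATEID.

```python
from enum import Enum

NFS4_UINT32_MAX = 0xFFFFFFFF
ALL_ZEROS = bytes(12)          # "other" assumed to be 12 opaque bytes
ALL_ONES = b"\xff" * 12


class Special(Enum):
    ANONYMOUS = "anonymous"
    READ_BYPASS = "read-bypass"
    CURRENT = "current"


# Defined ("other", "seqid") combinations, assumed from Section 8.2.3.
DEFINED = {
    (ALL_ZEROS, 0): Special.ANONYMOUS,
    (ALL_ONES, NFS4_UINT32_MAX): Special.READ_BYPASS,
    (ALL_ZEROS, 1): Special.CURRENT,
}

BAD = ("NFS4ERR_BAD_STATEID", None)


def check_special_stateid(other, seqid, current_stateid=None,
                          allowed=frozenset({Special.ANONYMOUS,
                                             Special.READ_BYPASS})):
    """Classify a stateid whose "other" field is all zeros or all ones.

    `allowed` is the (hypothetical) set of special stateids acceptable
    in the calling operation's context.
    """
    kind = DEFINED.get((other, seqid))
    if kind is None:
        return BAD                       # not a defined combination
    if kind is Special.CURRENT:
        if current_stateid is None:
            return BAD                   # assumption: nothing to substitute
        # Caller continues with the checks for non-special stateids.
        return ("SUBSTITUTE", current_stateid)
    if kind not in allowed:
        return BAD                       # valid in general, wrong context
    return ("OK", kind)
```

For example, a LOCK operation that requires an open stateid would pass
an empty "allowed" set, so an all-zero stateid is rejected there.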
When a stateid is being tested, and the "other" field is neither all | When a stateid is being tested, and the "other" field is neither all | |||
zeros or all ones, the following procedure could be used to validate | zeros nor all ones, the following procedure could be used to validate | |||
an incoming stateid and return an appropriate error, when necessary, | an incoming stateid and return an appropriate error, when necessary, | |||
assuming that the "other" field would be divided into a table index | assuming that the "other" field would be divided into a table index | |||
and an entry generation. | and an entry generation. | |||
o If the table index field is outside the range of the associated | o If the table index field is outside the range of the associated | |||
table, return NFS4ERR_BAD_STATEID. | table, return NFS4ERR_BAD_STATEID. | |||
o If the selected table entry is of a different generation than that | o If the selected table entry is of a different generation than that | |||
specified in the incoming stateid, return NFS4ERR_BAD_STATEID. | specified in the incoming stateid, return NFS4ERR_BAD_STATEID. | |||
skipping to change at page 167, line 4 | skipping to change at page 166, line 16 | |||
associated with the current session, return NFS4ERR_BAD_STATEID. | associated with the current session, return NFS4ERR_BAD_STATEID. | |||
o If the stateid represents revoked state, then return | o If the stateid represents revoked state, then return | |||
NFS4ERR_EXPIRED, NFS4ERR_ADMIN_REVOKED, or NFS4ERR_DELEG_REVOKED, | NFS4ERR_EXPIRED, NFS4ERR_ADMIN_REVOKED, or NFS4ERR_DELEG_REVOKED, | |||
as appropriate. | as appropriate. | |||
o If the stateid type is not valid for the context in which the | o If the stateid type is not valid for the context in which the | |||
stateid appears, return NFS4ERR_BAD_STATEID. Note that a stateid | stateid appears, return NFS4ERR_BAD_STATEID. Note that a stateid | |||
may be valid in general, as would be reported by the TEST_STATEID | may be valid in general, as would be reported by the TEST_STATEID | |||
operation, but be invalid for a particular operation, as, for | operation, but be invalid for a particular operation, as, for | |||
example, when a stateid which doesn't represent byte-range locks | example, when a stateid that doesn't represent byte-range locks is | |||
is passed to the non-from_open case of LOCK or to LOCKU, or when a | passed to the non-from_open case of LOCK or to LOCKU, or when a | |||
stateid which does not represent an open is passed to CLOSE or | stateid that does not represent an open is passed to CLOSE or | |||
OPEN_DOWNGRADE. In such cases, the server MUST return | OPEN_DOWNGRADE. In such cases, the server MUST return | |||
NFS4ERR_BAD_STATEID. | NFS4ERR_BAD_STATEID. | |||
o If the "seqid" field is not zero, and it is greater than the | o If the "seqid" field is not zero and it is greater than the | |||
current sequence value corresponding the current "other" field, | current sequence value corresponding to the current "other" field, | |||
return NFS4ERR_BAD_STATEID. | return NFS4ERR_BAD_STATEID. | |||
o If the "seqid" field is not zero, and it is less than the current | o If the "seqid" field is not zero and it is less than the current | |||
sequence value corresponding the current "other" field, return | sequence value corresponding to the current "other" field, return | |||
NFS4ERR_OLD_STATEID. | NFS4ERR_OLD_STATEID. | |||
o Otherwise, the stateid is valid and the table entry should contain | o Otherwise, the stateid is valid and the table entry should contain | |||
any additional information about the type of stateid and | any additional information about the type of stateid and | |||
information associated with that particular type of stateid, such | information associated with that particular type of stateid, such | |||
as the associated set of locks, such as open-owner and lock-owner | as the associated set of locks, e.g., open-owner and lock-owner | |||
information, as well as information on the specific locks, such as | information, as well as information on the specific locks, e.g., | |||
open modes and byte ranges. | open modes and byte-ranges. | |||
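The procedure above might be sketched as shown below, under the same
assumption the text makes, namely that "other" decodes into a table
index and an entry generation. The Entry fields, the function name, and
the string error codes are hypothetical. The diff elides part of the
list, so the client check is only a guess at the step whose tail is
visible ("associated with the current session"), and any elided step is
not modeled. Wraparound of "seqid" is ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    generation: int
    client_id: int
    kind: str                            # e.g. "open", "lock", "deleg"
    seqid: int                           # current sequence value
    revoked_error: Optional[str] = None  # e.g. "NFS4ERR_ADMIN_REVOKED"


def validate_stateid(table, index, generation, seqid,
                     session_client_id, allowed_kinds):
    """Return an error name, or "NFS4_OK" if the stateid is valid."""
    if not 0 <= index < len(table):
        return "NFS4ERR_BAD_STATEID"     # index outside the table
    entry = table[index]
    if entry is None or entry.generation != generation:
        return "NFS4ERR_BAD_STATEID"     # slot reused or empty
    if entry.client_id != session_client_id:
        return "NFS4ERR_BAD_STATEID"     # assumed reading of elided step
    if entry.revoked_error is not None:
        return entry.revoked_error       # EXPIRED / ADMIN_ / DELEG_REVOKED
    if entry.kind not in allowed_kinds:
        return "NFS4ERR_BAD_STATEID"     # wrong type for this operation
    if seqid != 0 and seqid > entry.seqid:
        return "NFS4ERR_BAD_STATEID"     # seqid from the future
    if seqid != 0 and seqid < entry.seqid:
        return "NFS4ERR_OLD_STATEID"     # stale seqid
    return "NFS4_OK"
```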
8.2.5. Stateid Use for I/O Operations | 8.2.5. Stateid Use for I/O Operations | |||
Clients performing I/O operations need to select an appropriate | Clients performing I/O operations need to select an appropriate | |||
stateid based on the locks (including opens and delegations) held by | stateid based on the locks (including opens and delegations) held by | |||
the client and the various types of state-owners sending the I/O | the client and the various types of state-owners sending the I/O | |||
requests. SETATTR operations which change the file size are treated | requests. SETATTR operations that change the file size are treated | |||
like I/O operations in this regard. | like I/O operations in this regard. | |||
The following rules, applied in order of decreasing priority, govern | The following rules, applied in order of decreasing priority, govern | |||
the selection of the appropriate stateid. In following these rules, | the selection of the appropriate stateid. In following these rules, | |||
the client will only consider locks of which it has actually received | the client will only consider locks of which it has actually received | |||
notification by an appropriate operation response or callback. Note | notification by an appropriate operation response or callback. Note | |||
that the rules are slightly different in the case of I/O to data | that the rules are slightly different in the case of I/O to data | |||
servers when file layouts are being used (see Section 13.9.1). | servers when file layouts are being used (see Section 13.9.1). | |||
o If the client holds a delegation for the file in question, the | o If the client holds a delegation for the file in question, the | |||
delegation stateid SHOULD be used. | delegation stateid SHOULD be used. | |||
o Otherwise, if the lock-owner corresponding entity (e.g. process) | o Otherwise, if the entity corresponding to the lock-owner (e.g., a | |||
sending the I/O has a lock stateid for the associated open file, | process) sending the I/O has a byte-range lock stateid for the | |||
then the lock stateid for that lock-owner and open file SHOULD be | associated open file, then the byte-range lock stateid for that | |||
used. | lock-owner and open file SHOULD be used. | |||
o If there is no lock stateid, then the open stateid for the open | o If there is no byte-range lock stateid, then the OPEN stateid for | |||
file in question SHOULD be used. | the open file in question SHOULD be used. | |||
o Finally, if none of the above apply, then a special stateid SHOULD | o Finally, if none of the above apply, then a special stateid SHOULD | |||
be used. | be used. | |||
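The priority order above reduces to a first-match selection on the
client side. The following is a small sketch; the function and
parameter names are hypothetical, and each argument stands for a
stateid the client has actually been notified of (None if it holds no
such stateid).

```python
def select_io_stateid(delegation=None, lock=None, open_=None,
                      special="anonymous"):
    """Pick the stateid for READ/WRITE (or a size-changing SETATTR).

    Candidates are tried in order of decreasing priority: delegation,
    byte-range lock for the sending lock-owner, open, then a special
    stateid as the last resort.
    """
    for candidate in (delegation, lock, open_):
        if candidate is not None:
            return candidate
    return special
```

A client holding only an open and a byte-range lock would thus send the
lock stateid, which lets the server identify the lock-owner when
mandatory byte-range locks are in effect.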
Ignoring these rules may result in situations in which the server | Ignoring these rules may result in situations in which the server | |||
does not have information necessary to properly process the request. | does not have information necessary to properly process the request. | |||
For example, when mandatory byte-range locks are in effect, if the | For example, when mandatory byte-range locks are in effect, if the | |||
stateid does not indicate the proper lock-owner, via a lock stateid, | stateid does not indicate the proper lock-owner, via a lock stateid, | |||
a request might be avoidably rejected. | a request might be avoidably rejected. | |||
The server however should not try to enforce these ordering rules and | The server however should not try to enforce these ordering rules and | |||
should use whatever information is available to proper process I/O | should use whatever information is available to properly process I/O | |||
requests. In particular, when a client has a delegation for a given | requests. In particular, when a client has a delegation for a given | |||
file, it SHOULD take note of this fact in processing a request, even | file, it SHOULD take note of this fact in processing a request, even | |||
if it is sent with a special stateid. | if it is sent with a special stateid. | |||
8.2.6. Stateid Use for SETATTR Operations | 8.2.6. Stateid Use for SETATTR Operations | |||
Because each operation is associated with a session ID and from that | Because each operation is associated with a session ID and from that | |||
the clientid can be determined, operations do not need to include a | the clientid can be determined, operations do not need to include a | |||
stateid for the server to be able to determine whether they should | stateid for the server to be able to determine whether they should | |||
cause a delegation to be recalled or are to be treated as done within | cause a delegation to be recalled or are to be treated as done within | |||
the scope of the delegation. | the scope of the delegation. | |||
In the case of SETATTR operations, a stateid is present. In cases | In the case of SETATTR operations, a stateid is present. In cases | |||
other than those which set the file size, the client may send either | other than those that set the file size, the client may send either a | |||
a special stateid or, when a delegation is held for the file in | special stateid or, when a delegation is held for the file in | |||
question, a delegation stateid. While the server SHOULD validate the | question, a delegation stateid. While the server SHOULD validate the | |||
stateid and may use the stateid to optimize the determination as to | stateid and may use the stateid to optimize the determination as to | |||
whether a delegation is held, it SHOULD note the presence of a | whether a delegation is held, it SHOULD note the presence of a | |||
delegation even when a special stateid is sent, and MUST accept a | delegation even when a special stateid is sent, and MUST accept a | |||
valid delegation stateid when sent. | valid delegation stateid when sent. | |||
8.3. Lease Renewal | 8.3. Lease Renewal | |||
Each client/server pair, as represented by a client ID, has a single | Each client/server pair, as represented by a client ID, has a single | |||
lease. The purpose of the lease is to allow the client to indicate | lease. The purpose of the lease is to allow the client to indicate | |||
skipping to change at page 169, line 37 | skipping to change at page 168, line 48 | |||
used for one of those connections. | used for one of those connections. | |||
o Transport retransmission delays might become so large as to | o Transport retransmission delays might become so large as to | |||
approach or exceed the length of the lease period. This may be | approach or exceed the length of the lease period. This may be | |||
particularly likely when the server is unresponsive due to a | particularly likely when the server is unresponsive due to a | |||
restart; see Section 8.4.2.1. If the client implementation is not | restart; see Section 8.4.2.1. If the client implementation is not | |||
careful, transport retransmission delays can result in the client | careful, transport retransmission delays can result in the client | |||
failing to detect a server restart before the grace period ends. | failing to detect a server restart before the grace period ends. | |||
The scenario is that the client is using a transport with | The scenario is that the client is using a transport with | |||
exponential back off, such that the maximum retransmission timeout | exponential back off, such that the maximum retransmission timeout | |||
exceeds the both the grace period and the lease_time attribute. A | exceeds both the grace period and the lease_time attribute. A | |||
network partition causes the client's connection's retransmission | network partition causes the client's connection's retransmission | |||
interval to back off, and even after the partition heals, the next | interval to back off, and even after the partition heals, the next | |||
transport-level retransmission is sent after the server has | transport-level retransmission is sent after the server has | |||
restarted and its grace period ends. | restarted and its grace period ends. | |||
The client MUST either recover from the ensuing NFS4ERR_NO_GRACE | The client MUST either recover from the ensuing NFS4ERR_NO_GRACE | |||
errors, or it MUST ensure that despite transport level | errors or it MUST ensure that, despite transport-level | |||
retransmission intervals that exceed the lease_time, nonetheless a | retransmission intervals that exceed the lease_time, a SEQUENCE | |||
SEQUENCE operation is sent that renews the lease before | operation is sent that renews the lease before expiration. The | |||
expiration. The client can achieve this by associating a new | client can achieve this by associating a new connection with the | |||
connection with the session, and sending a SEQUENCE operation on | session, and sending a SEQUENCE operation on it. However, if the | |||
it. However, if the attempt to establish a new connection is | attempt to establish a new connection is delayed for some reason | |||
delayed for some reason (e.g. exponential backoff of the | (e.g., exponential backoff of the connection establishment | |||
connection establishment packets), the client will have to abort | packets), the client will have to abort the connection | |||
the connection establishment attempt before the lease expires, and | establishment attempt before the lease expires, and attempt to | |||
attempt to re-connect. | reconnect. | |||
If the server renews the lease upon receiving a SEQUENCE operation, | If the server renews the lease upon receiving a SEQUENCE operation, | |||
the server MUST NOT allow the lease to expire while the rest of the | the server MUST NOT allow the lease to expire while the rest of the | |||
operations in the COMPOUND procedure's request are still executing. | operations in the COMPOUND procedure's request are still executing. | |||
Once the last operation has finished, and the response to COMPOUND | Once the last operation has finished, and the response to COMPOUND | |||
has been sent, the server MUST set the lease to expire no sooner than | has been sent, the server MUST set the lease to expire no sooner than | |||
the sum of current time and the value of the lease_time attribute. | the sum of current time and the value of the lease_time attribute. | |||
A client ID's lease can expire when it has been at least the lease | A client ID's lease can expire when it has been at least the lease | |||
interval (lease_time) since the last lease-renewing SEQUENCE | interval (lease_time) since the last lease-renewing SEQUENCE | |||
operation was sent on any of the client ID's sessions and there are | operation was sent on any of the client ID's sessions and there are | |||
no active COMPOUND operations on any such sessions. | no active COMPOUND operations on any such sessions. | |||
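The server-side rules in the two preceding paragraphs can be
illustrated with a small bookkeeping class. This is a sketch under
stated assumptions: the class and method names are hypothetical, time
is a plain number of seconds supplied by the caller, every COMPOUND is
assumed to begin with a lease-renewing SEQUENCE, and the lease is
assumed to start running when the object is created.

```python
class Lease:
    """Tracks one client ID's lease across all of its sessions."""

    def __init__(self, lease_time, now):
        self.lease_time = lease_time
        self.expires_at = now + lease_time   # assumed initial expiry
        self.active_compounds = 0

    def compound_started(self):
        # While any COMPOUND is executing, the lease must not expire.
        self.active_compounds += 1

    def compound_finished(self, now):
        # Called once the COMPOUND response has been sent: expire no
        # sooner than the current time plus lease_time.
        self.active_compounds -= 1
        self.expires_at = max(self.expires_at, now + self.lease_time)

    def may_expire(self, now):
        return self.active_compounds == 0 and now >= self.expires_at
```

With a 90-second lease created at time 0, a COMPOUND that starts at 10
and completes at 120 keeps the lease alive throughout and moves the
earliest expiry to 210.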
Because the SEQUENCE operation is the basic mechanism to renew a | Because the SEQUENCE operation is the basic mechanism to renew a | |||
lease, and because if must be done at least once for each lease | lease, and because it must be done at least once for each lease | |||
period, it is the natural mechanism whereby the server will inform | period, it is the natural mechanism whereby the server will inform | |||
the client of changes in the lease status that the client needs to be | the client of changes in the lease status that the client needs to be | |||
informed of. The client should inspect the status flags | informed of. The client should inspect the status flags | |||
(sr_status_flags) returned by sequence and take the appropriate | (sr_status_flags) returned by sequence and take the appropriate | |||
action (see Section 18.46.3 for details). | action (see Section 18.46.3 for details). | |||
o The status bits SEQ4_STATUS_CB_PATH_DOWN and | o The status bits SEQ4_STATUS_CB_PATH_DOWN and | |||
SEQ4_STATUS_CB_PATH_DOWN_SESSION indicate problems with the | SEQ4_STATUS_CB_PATH_DOWN_SESSION indicate problems with the | |||
backchannel which the client may need to address in order to | backchannel that the client may need to address in order to | |||
receive callback requests. | receive callback requests. | |||
o The status bits SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING and | o The status bits SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING and | |||
SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED indicate problems with GSS | SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED indicate problems with GSS | |||
contexts or RPCSEC_GSS handles for the backchannel which the | contexts or RPCSEC_GSS handles for the backchannel that the client | |||
client may have to address to allow callback requests to be sent | might have to address in order to allow callback requests to be | |||
to it. | sent. | |||
o The status bits SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED, | o The status bits SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED, | |||
SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED, | SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED, | |||
SEQ4_STATUS_ADMIN_STATE_REVOKED, and | SEQ4_STATUS_ADMIN_STATE_REVOKED, and | |||
SEQ4_STATUS_RECALLABLE_STATE_REVOKED notify the client of lock | SEQ4_STATUS_RECALLABLE_STATE_REVOKED notify the client of lock | |||
revocation events. When these bits are set, the client should use | revocation events. When these bits are set, the client should use | |||
TEST_STATEID to find what stateids have been revoked and use | TEST_STATEID to find what stateids have been revoked and use | |||
FREE_STATEID to acknowledge loss of the associated state. | FREE_STATEID to acknowledge loss of the associated state. | |||
o The status bit SEQ4_STATUS_LEASE_MOVE indicates that | o The status bit SEQ4_STATUS_LEASE_MOVE indicates that | |||
responsibility for lease renewal has been transferred to one or | responsibility for lease renewal has been transferred to one or | |||
more new servers. | more new servers. | |||
o The status bit SEQ4_STATUS_RESTART_RECLAIM_NEEDED indicates that | o The status bit SEQ4_STATUS_RESTART_RECLAIM_NEEDED indicates that | |||
due to server restart the client must reclaim locking state. | due to server restart the client must reclaim locking state. | |||
o The status bit SEQ4_STATUS_BACKCHANNEL_FAULT indicates the server | o The status bit SEQ4_STATUS_BACKCHANNEL_FAULT indicates that the | |||
has encountered an unrecoverable fault with the backchannel (e.g. | server has encountered an unrecoverable fault with the backchannel | |||
it has lost track of a sequence ID for a slot in the backchannel). | (e.g., it has lost track of a sequence ID for a slot in the | |||
backchannel). | ||||
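A client might turn the status bits listed above into a list of
follow-up actions along these lines. The bit values are intended to
match the SEQ4_STATUS_* constants in the protocol's XDR description and
should be checked against it before use; the action strings and the
function name are hypothetical. The lease-movement bit is spelled
SEQ4_STATUS_LEASE_MOVE in the text above and, to the best of our
reading, SEQ4_STATUS_LEASE_MOVED in the XDR.

```python
from enum import IntFlag


class Seq4Status(IntFlag):
    CB_PATH_DOWN = 0x001
    CB_GSS_CONTEXTS_EXPIRING = 0x002
    CB_GSS_CONTEXTS_EXPIRED = 0x004
    EXPIRED_ALL_STATE_REVOKED = 0x008
    EXPIRED_SOME_STATE_REVOKED = 0x010
    ADMIN_STATE_REVOKED = 0x020
    RECALLABLE_STATE_REVOKED = 0x040
    LEASE_MOVED = 0x080
    RESTART_RECLAIM_NEEDED = 0x100
    CB_PATH_DOWN_SESSION = 0x200
    BACKCHANNEL_FAULT = 0x400


REVOCATION = (Seq4Status.EXPIRED_ALL_STATE_REVOKED
              | Seq4Status.EXPIRED_SOME_STATE_REVOKED
              | Seq4Status.ADMIN_STATE_REVOKED
              | Seq4Status.RECALLABLE_STATE_REVOKED)


def actions_for(sr_status_flags):
    """Map sr_status_flags (an int) to hypothetical client actions."""
    f = int(sr_status_flags)
    actions = []
    if f & (Seq4Status.CB_PATH_DOWN | Seq4Status.CB_PATH_DOWN_SESSION):
        actions.append("repair backchannel path")
    if f & (Seq4Status.CB_GSS_CONTEXTS_EXPIRING
            | Seq4Status.CB_GSS_CONTEXTS_EXPIRED):
        actions.append("refresh backchannel GSS contexts")
    if f & REVOCATION:
        actions.append("TEST_STATEID then FREE_STATEID")
    if f & Seq4Status.LEASE_MOVED:
        actions.append("renew lease with new server(s)")
    if f & Seq4Status.RESTART_RECLAIM_NEEDED:
        actions.append("reclaim locking state")
    if f & Seq4Status.BACKCHANNEL_FAULT:
        actions.append("recover from backchannel fault")
    return actions
```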
8.4. Crash Recovery | 8.4. Crash Recovery | |||
A critical requirement in crash recovery is that both the client and | A critical requirement in crash recovery is that both the client and | |||
the server know when the other has failed. Additionally, it is | the server know when the other has failed. Additionally, it is | |||
required that a client sees a consistent view of data across server | required that a client sees a consistent view of data across server | |||
restarts. All READ and WRITE operations that may have been queued | restarts. All READ and WRITE operations that may have been queued | |||
within the client or network buffers must wait until the client has | within the client or network buffers must wait until the client has | |||
successfully recovered the locks protecting the READ and WRITE | successfully recovered the locks protecting the READ and WRITE | |||
operations. Any that reach the server before the server can safely | operations. Any that reach the server before the server can safely | |||
determine that the client has recovered enough locking state to be | determine that the client has recovered enough locking state to be | |||
sure that such operations can be safely processed must be rejected. | sure that such operations can be safely processed must be rejected. | |||
This will happen because either: | This will happen because either: | |||
o The state presented is no longer valid since it is associated with | o The state presented is no longer valid since it is associated with | |||
a now invalid client ID. In this case the client will receive | a now invalid client ID. In this case, the client will receive | |||
either an NFS4ERR_BADSESSION or NFS4ERR_DEADSESSION error, and any | either an NFS4ERR_BADSESSION or NFS4ERR_DEADSESSION error, and any | |||
attempt to attach a new session to that invalid client ID will | attempt to attach a new session to that invalid client ID will | |||
result in an NFS4ERR_STALE_CLIENTID error. | result in an NFS4ERR_STALE_CLIENTID error. | |||
o Subsequent recovery of locks may make execution of the operation | o Subsequent recovery of locks may make execution of the operation | |||
inappropriate (NFS4ERR_GRACE). | inappropriate (NFS4ERR_GRACE). | |||
8.4.1. Client Failure and Recovery | 8.4.1. Client Failure and Recovery | |||
In the event that a client fails, the server may release the client's | In the event that a client fails, the server may release the client's | |||
skipping to change at page 171, line 46 | skipping to change at page 171, line 11 | |||
discussed in Section 8.3, when a client has not failed and re- | discussed in Section 8.3, when a client has not failed and re- | |||
establishes its lease before expiration occurs, requests for | establishes its lease before expiration occurs, requests for | |||
conflicting locks will not be granted. | conflicting locks will not be granted. | |||
To minimize client delay upon restart, lock requests are associated | To minimize client delay upon restart, lock requests are associated | |||
with an instance of the client by a client-supplied verifier. This | with an instance of the client by a client-supplied verifier. This | |||
verifier is part of the client_owner4 sent in the initial EXCHANGE_ID | verifier is part of the client_owner4 sent in the initial EXCHANGE_ID | |||
call made by the client. The server returns a client ID as a result | call made by the client. The server returns a client ID as a result | |||
of the EXCHANGE_ID operation. The client then confirms the use of | of the EXCHANGE_ID operation. The client then confirms the use of | |||
the client ID by establishing a session associated with that client | the client ID by establishing a session associated with that client | |||
ID (see Section 18.36.3 for a description how this is done). All | ID (see Section 18.36.3 for a description of how this is done). All | |||
locks, including opens, byte-range locks, delegations, and layouts | locks, including opens, byte-range locks, delegations, and layouts | |||
obtained by sessions using that client ID are associated with that | obtained by sessions using that client ID, are associated with that | |||
client ID. | client ID. | |||
Since the verifier will be changed by the client upon each | Since the verifier will be changed by the client upon each | |||
initialization, the server can compare a new verifier to the verifier | initialization, the server can compare a new verifier to the verifier | |||
associated with currently held locks and determine that they do not | associated with currently held locks and determine that they do not | |||
match. This signifies the client's new instantiation and subsequent | match. This signifies the client's new instantiation and subsequent | |||
loss (upon confirmation of the new client ID) of locking state. As a | loss (upon confirmation of the new client ID) of locking state. As a | |||
result, the server is free to release all locks held which are | result, the server is free to release all locks held that are | |||
associated with the old client ID which was derived from the old | associated with the old client ID that was derived from the old | |||
verifier. At this point conflicting locks from other clients, kept | verifier. At this point, conflicting locks from other clients, kept | |||
waiting while the lease had not yet expired, can be granted. In | waiting while the lease had not yet expired, can be granted. In | |||
addition, all stateids associated with the old client ID can also be | addition, all stateids associated with the old client ID can also be | |||
freed, as they are no longer reference-able. | freed, as they are no longer reference-able. | |||
Note that the verifier must have the same uniqueness properties as | Note that the verifier must have the same uniqueness properties as | |||
the verifier for the COMMIT operation. | the verifier for the COMMIT operation. | |||
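The verifier comparison described above can be modeled in a few lines.
This is a deliberately simplified sketch, not the EXCHANGE_ID
processing rules of the protocol: it ignores principals, retries, and
update cases, and the class, method, and field names are all
hypothetical. It shows only that a changed verifier yields a new client
ID and that the old client ID's locks are released once the new client
ID is confirmed by session creation.

```python
class ToyServer:
    def __init__(self):
        self.confirmed = {}   # owner id -> (verifier, client_id)
        self.pending = {}     # client_id -> (owner id, verifier)
        self.locks = {}       # client_id -> set of lock names
        self._next_id = 1

    def exchange_id(self, owner, verifier):
        rec = self.confirmed.get(owner)
        if rec is not None and rec[0] == verifier:
            return rec[1]                 # same client instance
        client_id = self._next_id         # new instance: new client ID
        self._next_id += 1
        self.pending[client_id] = (owner, verifier)
        return client_id

    def create_session(self, client_id):
        if client_id in self.pending:
            owner, verifier = self.pending.pop(client_id)
            old = self.confirmed.get(owner)
            if old is not None:
                # Confirmation of the new client ID: state held under
                # the old client ID can now be released.
                self.locks.pop(old[1], None)
            self.confirmed[owner] = (verifier, client_id)
            self.locks[client_id] = set()
            return "NFS4_OK"
        if client_id in self.locks:
            return "NFS4_OK"              # additional session
        return "NFS4ERR_STALE_CLIENTID"
```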
8.4.2. Server Failure and Recovery | 8.4.2. Server Failure and Recovery | |||
If the server loses locking state (usually as a result of a restart), | If the server loses locking state (usually as a result of a restart), | |||
it must allow clients time to discover this fact and re-establish the | it must allow clients time to discover this fact and re-establish the | |||
lost locking state. The client must be able to re-establish the | lost locking state. The client must be able to re-establish the | |||
locking state without having the server deny valid requests because | locking state without having the server deny valid requests because | |||
the server has granted conflicting access to another client. | the server has granted conflicting access to another client. | |||
Likewise, if there is a possibility that clients have not yet re- | Likewise, if there is a possibility that clients have not yet re- | |||
established their locking state for a file, and that such locking | established their locking state for a file and that such locking | |||
state might make it invalid to perform READ or WRITE operations, for | state might make it invalid to perform READ or WRITE operations. For | |||
example through the establishment of mandatory locks, the server must | example, if mandatory locks are a possibility, the server must | |||
disallow READ and WRITE operations for that file. | disallow READ and WRITE operations for that file. | |||
A client can determine that loss of locking state has occurred via | A client can determine that loss of locking state has occurred via | |||
several methods. | several methods. | |||
1. When a SEQUENCE (most common) or other operation returns | 1. When a SEQUENCE (most common) or other operation returns | |||
NFS4ERR_BADSESSION, this may mean the session has been destroyed, | NFS4ERR_BADSESSION, this may mean that the session has been | |||
but the client ID is still valid. The client sends a | destroyed but the client ID is still valid. The client sends a | |||
CREATE_SESSION request with the client ID to re-establish the | CREATE_SESSION request with the client ID to re-establish the | |||
session. If CREATE_SESSION fails with NFS4ERR_STALE_CLIENTID, | session. If CREATE_SESSION fails with NFS4ERR_STALE_CLIENTID, | |||
the client must establish a new client ID (see Section 8.1) and | the client must establish a new client ID (see Section 8.1) and | |||
re-establish its lock state with the new client ID, after the | re-establish its lock state with the new client ID, after the | |||
CREATE_SESSION operation succeeds (see Section 8.4.2.1). | CREATE_SESSION operation succeeds (see Section 8.4.2.1). | |||
2. When a SEQUENCE (most common) or other operation on a persistent | 2. When a SEQUENCE (most common) or other operation on a persistent | |||
session returns NFS4ERR_DEADSESSION, this indicates that a | session returns NFS4ERR_DEADSESSION, this indicates that a | |||
session is no longer usable for new, i.e. not satisfied from the | session is no longer usable for new, i.e., not satisfied from the | |||
reply cache, operations. Once all pending operations are | reply cache, operations. Once all pending operations are | |||
determined to be either performed before the retry or not | determined to be either performed before the retry or not | |||
performed, the client sends a CREATE_SESSION request with the | performed, the client sends a CREATE_SESSION request with the | |||
client ID to re-establish the session. If CREATE_SESSION fails | client ID to re-establish the session. If CREATE_SESSION fails | |||
with NFS4ERR_STALE_CLIENTID, the client must establish a new | with NFS4ERR_STALE_CLIENTID, the client must establish a new | |||
client ID (see Section 8.1) and re-establish its lock state after | client ID (see Section 8.1) and re-establish its lock state after | |||
the CREATE_SESSION, with the new client ID, succeeds, | the CREATE_SESSION, with the new client ID, succeeds | |||
(Section 8.4.2.1). | (Section 8.4.2.1). | |||
3. When a operation, neither SEQUENCE nor preceded by SEQUENCE (for | 3. When an operation, neither SEQUENCE nor preceded by SEQUENCE (for | |||
example, CREATE_SESSION, DESTROY_SESSION) returns | example, CREATE_SESSION, DESTROY_SESSION), returns | |||
NFS4ERR_STALE_CLIENTID. The client MUST establish a new client | NFS4ERR_STALE_CLIENTID, the client MUST establish a new client ID | |||
ID (Section 8.1) and re-establish its lock state | (Section 8.1) and re-establish its lock state (Section 8.4.2.1). | |||
(Section 8.4.2.1). | ||||
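The three detection cases above lead to the following client-side
decision sketch. The function name, parameters, and step strings are
hypothetical; the second parameter stands for the result of the
CREATE_SESSION attempted in cases 1 and 2.

```python
NEW_CLIENT_ID = [
    "establish new client ID (Section 8.1)",
    "CREATE_SESSION with new client ID",
    "reclaim lock state (Section 8.4.2.1)",
]


def recovery_steps(error, create_session_error=None):
    """List the recovery steps implied by an error returned to the client."""
    steps = []
    if error in ("NFS4ERR_BADSESSION", "NFS4ERR_DEADSESSION"):
        if error == "NFS4ERR_DEADSESSION":
            # Persistent session: first settle which pending operations
            # were performed before the retry and which were not.
            steps.append("resolve pending operations via reply cache")
        steps.append("CREATE_SESSION with existing client ID")
        if create_session_error == "NFS4ERR_STALE_CLIENTID":
            steps.extend(NEW_CLIENT_ID)
    elif error == "NFS4ERR_STALE_CLIENTID":
        # Returned by an operation not preceded by SEQUENCE.
        steps.extend(NEW_CLIENT_ID)
    return steps
```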
8.4.2.1. State Reclaim | 8.4.2.1. State Reclaim | |||
When state information and the associated locks are lost as a result | When state information and the associated locks are lost as a result | |||
of a server restart, the protocol must provide a way to cause that | of a server restart, the protocol must provide a way to cause that | |||
state to be re-established. The approach used is to define, for most | state to be re-established. The approach used is to define, for most | |||
types of locking state (layouts are an exception), a request whose | types of locking state (layouts are an exception), a request whose | |||
function is to allow the client to re-establish on the server a lock | function is to allow the client to re-establish on the server a lock | |||
first obtained from a previous instance. Generally these requests | first obtained from a previous instance. Generally, these requests | |||
are variants of the requests normally used to create locks of that | are variants of the requests normally used to create locks of that | |||
type and are referred to as "reclaim-type" requests and the process | type and are referred to as "reclaim-type" requests, and the process | |||
of re-establishing such locks is referred to as "reclaiming" them. | of re-establishing such locks is referred to as "reclaiming" them. | |||
Because each client must have an opportunity to reclaim all of the | Because each client must have an opportunity to reclaim all of the | |||
locks that it has without the possibility that some other client will | locks that it has without the possibility that some other client will | |||
be granted a conflicting lock, a special period called the "grace | be granted a conflicting lock, a "grace period" is devoted to the | |||
period" is devoted to the reclaim process. During this period, | reclaim process. During this period, requests creating client IDs | |||
requests creating client IDs and sessions are handled normally, but | and sessions are handled normally, but locking requests are subject | |||
locking requests are subject to special restrictions. Only reclaim- | to special restrictions. Only reclaim-type locking requests are | |||
type locking requests are allowed, unless the server can reliably | allowed, unless the server can reliably determine (through state | |||
determine (through state persistently maintained across restart | persistently maintained across restart instances) that granting any | |||
instances), that granting any such lock cannot possibly conflict with | such lock cannot possibly conflict with a subsequent reclaim. When a | |||
a subsequent reclaim. When a request is made to obtain a new lock | request is made to obtain a new lock (i.e., not a reclaim-type | |||
(i.e. not a reclaim-type request) during the grace period and such a | request) during the grace period and such a determination cannot be | |||
determination cannot be made, the server must return the error | made, the server must return the error NFS4ERR_GRACE. | |||
NFS4ERR_GRACE. | ||||
Once a session is established using the new client ID, the client | Once a session is established using the new client ID, the client | |||
will use reclaim-type locking requests (e.g. LOCK requests with | will use reclaim-type locking requests (e.g., LOCK operations with | |||
reclaim set to TRUE and OPEN operations with a claim type of | reclaim set to TRUE and OPEN operations with a claim type of | |||
CLAIM_PREVIOUS; see Section 9.11) to re-establish its locking state. | CLAIM_PREVIOUS; see Section 9.11) to re-establish its locking state. | |||
Once this is done, or if there is no such locking state to reclaim, | Once this is done, or if there is no such locking state to reclaim, | |||
the client sends a global RECLAIM_COMPLETE operation, i.e. one with | the client sends a global RECLAIM_COMPLETE operation, i.e., one with | |||
the rca_one_fs argument set to FALSE, to indicate that it has | the rca_one_fs argument set to FALSE, to indicate that it has | |||
reclaimed all of the locking state that it will reclaim. Once a | reclaimed all of the locking state that it will reclaim. Once a | |||
client sends such a RECLAIM_COMPLETE operation, it may attempt non- | client sends such a RECLAIM_COMPLETE operation, it may attempt non- | |||
reclaim locking operations, although it may get NFS4ERR_GRACE errors | reclaim locking operations, although it might get an NFS4ERR_GRACE | |||
the operations until the period of special handling is over. See | status result from each such operation until the period of special | |||
Section 11.7.7 for a discussion of the analogous handling of lock | handling is over. See Section 11.7.7 for a discussion of the | |||
reclamation in the case of file systems transitioning from server to | analogous handling of lock reclamation in the case of file systems | |||
server. | transitioning from server to server. | |||
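The client-side sequence described above can be sketched as follows. This is a minimal illustration, not protocol text: the function name, the record shapes, and the tuple encoding of operations are assumptions made for the example. Only the operation names, the CLAIM_PREVIOUS claim type, the reclaim flag, and rca_one_fs come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class ReclaimPlan:
    """Ordered operations a client would send once a new session exists."""
    ops: list = field(default_factory=list)


def build_reclaim_plan(held_opens, held_byte_range_locks):
    """Build the reclaim sequence for state held before the server restart.

    held_opens: hypothetical list of filehandles the client had open.
    held_byte_range_locks: hypothetical list of (filehandle, offset, length).
    """
    plan = ReclaimPlan()
    # Reclaim opens first, using OPEN with a claim type of CLAIM_PREVIOUS.
    for fh in held_opens:
        plan.ops.append(("OPEN", {"filehandle": fh, "claim": "CLAIM_PREVIOUS"}))
    # Then reclaim byte-range locks, using LOCK with reclaim set to TRUE.
    for fh, offset, length in held_byte_range_locks:
        plan.ops.append(
            ("LOCK", {"filehandle": fh, "offset": offset,
                      "length": length, "reclaim": True})
        )
    # A global RECLAIM_COMPLETE is sent even if there was nothing to reclaim.
    plan.ops.append(("RECLAIM_COMPLETE", {"rca_one_fs": False}))
    return plan
```

Only after the final RECLAIM_COMPLETE would the client attempt non-reclaim locking operations, and it would still need to tolerate NFS4ERR_GRACE on them.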
During the grace period, the server must reject READ and WRITE | During the grace period, the server must reject READ and WRITE | |||
operations and non-reclaim locking requests (i.e. other LOCK and OPEN | operations and non-reclaim locking requests (i.e., other LOCK and | |||
operations) with an error of NFS4ERR_GRACE, unless it can guarantee | OPEN operations) with an error of NFS4ERR_GRACE, unless it can | |||
that these may be done safely, as described below. | guarantee that these may be done safely, as described below. | |||
The grace period may last until all clients which are known to | The grace period may last until all clients that are known to | |||
possibly have had locks have done a global RECLAIM_COMPLETE | possibly have had locks have done a global RECLAIM_COMPLETE | |||
operation, indicating that they have finished reclaiming the locks | operation, indicating that they have finished reclaiming the locks | |||
they held before the server restart. This means that a client which | they held before the server restart. This means that a client that | |||
has done a RECLAIM_COMPLETE must be prepared to receive an | has done a RECLAIM_COMPLETE must be prepared to receive an | |||
NFS4ERR_GRACE when attempting to acquire new locks. In order for the | NFS4ERR_GRACE when attempting to acquire new locks. In order for the | |||
server to know that all clients with possible prior lock state have | server to know that all clients with possible prior lock state have | |||
done a RECLAIM_COMPLETE, the server must maintain in stable storage a | done a RECLAIM_COMPLETE, the server must maintain in stable storage a | |||
list of clients which may have such locks. The server may also | list of clients that may have such locks. The server may also terminate | |||
terminate the grace period before all clients have done a global | the grace period before all clients have done a global | |||
RECLAIM_COMPLETE. The server SHOULD NOT terminate the grace period | RECLAIM_COMPLETE. The server SHOULD NOT terminate the grace period | |||
before a time equal to the lease period in order to give clients an | before a time equal to the lease period in order to give clients an | |||
opportunity to find out about the server restart, as a result of | opportunity to find out about the server restart, as a result of | |||
sending requests on associated sessions with a frequency governed by | sending requests on associated sessions with a frequency governed by | |||
the lease time. Note that when a client does not send such requests | the lease time. Note that when a client does not send such requests | |||
(or they are sent by the client but not received by the server), it | (or they are sent by the client but not received by the server), it | |||
is possible for the grace period to expire before the client finds | is possible for the grace period to expire before the client finds | |||
out that the server restart has occurred. | out that the server restart has occurred. | |||
Some additional time in order to allow a client to establish a new | Some additional time in order to allow a client to establish a new | |||
skipping to change at page 174, line 51 | skipping to change at page 174, line 14 | |||
to guarantee that no possible conflict could arise between a | to guarantee that no possible conflict could arise between a | |||
potential reclaim locking request and the READ or WRITE operation. | potential reclaim locking request and the READ or WRITE operation. | |||
If the server is unable to offer that guarantee, the NFS4ERR_GRACE | If the server is unable to offer that guarantee, the NFS4ERR_GRACE | |||
error must be returned to the client. | error must be returned to the client. | |||
For a server to provide simple, valid handling during the grace | For a server to provide simple, valid handling during the grace | |||
period, the easiest method is to simply reject all non-reclaim | period, the easiest method is to simply reject all non-reclaim | |||
locking requests and READ and WRITE operations by returning the | locking requests and READ and WRITE operations by returning the | |||
NFS4ERR_GRACE error. However, a server may keep information about | NFS4ERR_GRACE error. However, a server may keep information about | |||
granted locks in stable storage. With this information, the server | granted locks in stable storage. With this information, the server | |||
could determine if a regular lock or READ or WRITE operation can be | could determine if a locking, READ or WRITE operation can be safely | |||
safely processed. | processed. | |||
For example, if the server maintained on stable storage summary | For example, if the server maintained on stable storage summary | |||
information on whether mandatory locks exist, either mandatory byte- | information on whether mandatory locks exist, either mandatory byte- | |||
range locks, or share reservations specifying deny modes, many | range locks, or share reservations specifying deny modes, many | |||
requests could be allowed during the grace period. If it is known | requests could be allowed during the grace period. If it is known | |||
that no such share reservations exist, OPEN requests that do not | that no such share reservations exist, OPEN requests that do not | |||
specify deny modes may be safely granted. If, in addition, it is | specify deny modes may be safely granted. If, in addition, it is | |||
known that no mandatory byte-range locks exist, either through | known that no mandatory byte-range locks exist, either through | |||
information stored on stable storage or simply because the server | information stored on stable storage or simply because the server | |||
does not support such locks, READ and WRITE requests may be safely | does not support such locks, READ and WRITE operations may be safely | |||
processed during the grace period. Another important case is where | processed during the grace period. Another important case is where | |||
it is known that no mandatory byte-range locks exist, either because | it is known that no mandatory byte-range locks exist, either because | |||
the server does not provide support for them, or because their | the server does not provide support for them or because their absence | |||
absence is known from persistently recorded data. In this case, READ | is known from persistently recorded data. In this case, READ and | |||
and WRITE operations specifying stateids derived from reclaim-type | WRITE operations specifying stateids derived from reclaim-type | |||
operation may be validly processed during the grace period because | operations may be validly processed during the grace period because | |||
the fact of the valid reclaim ensures that no lock subsequently | of the fact that the valid reclaim ensures that no lock subsequently | |||
granted can prevent the I/O. | granted can prevent the I/O. | |||
To reiterate, for a server that allows non-reclaim lock and I/O | To reiterate, for a server that allows non-reclaim lock and I/O | |||
requests to be processed during the grace period, it MUST determine | requests to be processed during the grace period, it MUST determine | |||
that no lock subsequently reclaimed will be rejected and that no lock | that no lock subsequently reclaimed will be rejected and that no lock | |||
subsequently reclaimed would have prevented any I/O operation | subsequently reclaimed would have prevented any I/O operation | |||
processed during the grace period. | processed during the grace period. | |||
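The decision rules in the preceding paragraphs can be sketched as a single server-side check. This is one conservative reading under stated assumptions: StableSummary and grace_status are hypothetical names, a value of None stands for "not known from stable storage", and "PROCESS" only means the request goes on to normal handling (access checks still apply). Non-reclaim LOCK requests are always rejected here because the text gives no specific rule for admitting them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StableSummary:
    """Hypothetical summary kept in stable storage; None means unknown."""
    deny_mode_shares_exist: Optional[bool] = None
    mandatory_locks_exist: Optional[bool] = None


def grace_status(op, summary, is_reclaim=False, deny_mode=False,
                 stateid_from_reclaim=False):
    """Return "PROCESS" or "NFS4ERR_GRACE" for a request during grace."""
    if is_reclaim:
        # Reclaim-type locking requests are what the grace period is for.
        return "PROCESS"
    no_deny_shares = summary.deny_mode_shares_exist is False
    no_mandatory_locks = summary.mandatory_locks_exist is False
    if op == "OPEN":
        # Safe only without deny modes and with no deny-mode shares recorded.
        if not deny_mode and no_deny_shares:
            return "PROCESS"
        return "NFS4ERR_GRACE"
    if op in ("READ", "WRITE"):
        # Safe if no mandatory locks exist and either no deny-mode shares
        # exist or the stateid comes from a successful reclaim.
        if no_mandatory_locks and (no_deny_shares or stateid_from_reclaim):
            return "PROCESS"
        return "NFS4ERR_GRACE"
    # Anything else (for example, a non-reclaim LOCK) is rejected.
    return "NFS4ERR_GRACE"
```

A server with no persistent summary would pass StableSummary() and so reject every non-reclaim request, which matches the "easiest method" described earlier.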
Clients should be prepared for the return of NFS4ERR_GRACE errors for | Clients should be prepared for the return of NFS4ERR_GRACE errors for | |||
non-reclaim lock and I/O requests. In this case the client should | non-reclaim lock and I/O requests. In this case, the client should | |||
employ a retry mechanism for the request. A delay (on the order of | employ a retry mechanism for the request. A delay (on the order of | |||
several seconds) between retries should be used to avoid overwhelming | several seconds) between retries should be used to avoid overwhelming | |||
the server. Further discussion of the general issue is included in | the server. Further discussion of the general issue is included in | |||
[47]. The client must account for the server that can perform I/O | [47]. The client must account for the server that can perform I/O | |||
and non-reclaim locking requests within the grace period as well as | and non-reclaim locking requests within the grace period as well as | |||
those that cannot do so. | those that cannot do so. | |||
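A client retry loop of the kind recommended above might look like the following sketch. The function name, the 5-second default delay, and the attempt limit are illustrative assumptions. The text only asks for a delay on the order of several seconds between retries.

```python
import time


def send_with_grace_retry(send, delay_seconds=5.0, max_attempts=10,
                          sleep=time.sleep):
    """Call send() until it returns something other than NFS4ERR_GRACE.

    send: hypothetical callable that issues the request and returns a
          status string such as "NFS4_OK" or "NFS4ERR_GRACE".
    sleep: injectable for testing; defaults to time.sleep.
    """
    for attempt in range(max_attempts):
        status = send()
        if status != "NFS4ERR_GRACE":
            # Success, or an error unrelated to the grace period.
            return status
        if attempt + 1 < max_attempts:
            # Wait between retries to avoid overwhelming the server.
            sleep(delay_seconds)
    return "NFS4ERR_GRACE"
```

The same loop works against servers that process non-reclaim requests during grace, since they simply never return NFS4ERR_GRACE.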
A reclaim-type locking request outside the server's grace period can | A reclaim-type locking request outside the server's grace period can | |||
only succeed if the server can guarantee that no conflicting lock or | only succeed if the server can guarantee that no conflicting lock or | |||
I/O request has been granted since restart. | I/O request has been granted since restart. | |||
A server may, upon restart, establish a new value for the lease | A server may, upon restart, establish a new value for the lease | |||
period. Therefore, clients should, once a new client ID is | period. Therefore, clients should, once a new client ID is | |||
established, refetch the lease_time attribute and use it as the basis | established, refetch the lease_time attribute and use it as the basis | |||
for lease renewal for the lease associated with that server. | for lease renewal for the lease associated with that server. | |||
However, the server must establish, for this restart event, a grace | However, the server must establish, for this restart event, a grace | |||
period at least as long as the lease period for the previous server | period at least as long as the lease period for the previous server | |||
instantiation. This allows the client state obtained during the | instantiation. This allows the client state obtained during the | |||
previous server instance to be reliably re-established. | previous server instance to be reliably re-established. | |||
The possibility exists, that because of server configuration events, | The possibility exists that, because of server configuration events, | |||
the client will be communicating with a server different than the one | the client will be communicating with a server different than the one | |||
on which the locks were obtained, as shown by the combination of | on which the locks were obtained, as shown by the combination of | |||
eir_server_scope and eir_server_owner. This leads to the issue of if | eir_server_scope and eir_server_owner. This leads to the issue of if | |||
and when the client should attempt to reclaim locks previously | and when the client should attempt to reclaim locks previously | |||
obtained on what is being reported as a different server. The rules | obtained on what is being reported as a different server. The rules | |||
to resolve this question are as follows: | to resolve this question are as follows: | |||
o If the server scope is different the client should not attempt to | o If the server scope is different, the client should not attempt to | |||
reclaim locks. In this situation no lock reclaim is possible. | reclaim locks. In this situation, no lock reclaim is possible. | |||
Any attempt to re-obtain the locks with non-reclaim operations is | Any attempt to re-obtain the locks with non-reclaim operations is | |||
problematic since there is no guarantee that the existing | problematic since there is no guarantee that the existing | |||
filehandles will be recognized by the new server, or that if | filehandles will be recognized by the new server, or that if | |||
recognized, they denote the same objects. It is best to treat the | recognized, they denote the same objects. It is best to treat the | |||
locks as having been revoked by the reconfiguration event. | locks as having been revoked by the reconfiguration event. | |||
o If the server scope is the same, the client should attempt to | o If the server scope is the same, the client should attempt to | |||
reclaim locks, even if the eir_server_owner value is different. | reclaim locks, even if the eir_server_owner value is different. | |||
In this situation, it is the responsibility of the server to | In this situation, it is the responsibility of the server to | |||
return NFS4ERR_NO_GRACE if it cannot provide correct support for | return NFS4ERR_NO_GRACE if it cannot provide correct support for | |||
lock reclaim operations, including the prevention of edge | lock reclaim operations, including the prevention of edge | |||
conditions. | conditions. | |||
The eir_server_owner field is not used in making this determination. | The eir_server_owner field is not used in making this determination. | |||
Its function is to specify trunking possibilities for the client (see | Its function is to specify trunking possibilities for the client (see | |||
Section 2.10.5) and not to control lock reclaim. | Section 2.10.5) and not to control lock reclaim. | |||
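The two rules above reduce to a comparison of server scope values. The sketch below assumes the client has saved the eir_server_scope returned by the earlier EXCHANGE_ID. The function name and the returned strings are hypothetical.

```python
def reclaim_decision(old_scope: bytes, new_scope: bytes,
                     old_owner: bytes = b"", new_owner: bytes = b"") -> str:
    """Decide whether to attempt lock reclaim after reconnecting.

    old_owner/new_owner stand in for eir_server_owner; they are accepted
    only to show that they play no part in the decision.
    """
    if old_scope != new_scope:
        # Different scope: no reclaim is possible, and filehandles may not
        # be recognized, so treat the locks as revoked.
        return "TREAT_LOCKS_AS_REVOKED"
    # Same scope: attempt reclaim; the server returns NFS4ERR_NO_GRACE if it
    # cannot support the reclaim correctly.
    return "ATTEMPT_RECLAIM"
```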
8.4.2.1.1. Security Considerations for State Reclaim | 8.4.2.1.1. Security Considerations for State Reclaim | |||
During the grace period, a client can reclaim state it believes or | During the grace period, a client can reclaim state that it believes | |||
asserts it had before the server restarted. Unless the server | or asserts it had before the server restarted. Unless the server | |||
maintained a complete record of all the state the client had, the | maintained a complete record of all the state the client had, the | |||
server has little choice but to trust the client. (Of course if the | server has little choice but to trust the client. (Of course, if the | |||
server maintained a complete record, then it would not have to force | server maintained a complete record, then it would not have to force | |||
the client to reclaim state after server restart.) While the server | the client to reclaim state after server restart.) While the server | |||
has to trust the client to tell the truth, such trust does not have | has to trust the client to tell the truth, such trust does not have | |||
any negative consequences for security. The fundamental rule for the | any negative consequences for security. The fundamental rule for the | |||
server when processing reclaim requests is that it MUST NOT grant the | server when processing reclaim requests is that it MUST NOT grant the | |||
reclaim if an equivalent non-reclaim request would not be granted | reclaim if an equivalent non-reclaim request would not be granted | |||
during steady-state due to access control or access conflict issues. | during steady state due to access control or access conflict issues. | |||
For example an OPEN request during a reclaim will be refused with | For example, an OPEN request during a reclaim will be refused with | |||
NFS4ERR_ACCESS if the principal making the request does not have | NFS4ERR_ACCESS if the principal making the request does not have | |||
access to open the file according to the discretionary ACL | access to open the file according to the discretionary ACL | |||
(Section 6.2.2) on the file. | (Section 6.2.2) on the file. | |||
Nonetheless, it is possible that client operating in error or | Nonetheless, it is possible that a client operating in error or | |||
maliciously could, during reclaim, prevent another client from | maliciously could, during reclaim, prevent another client from | |||
reclaiming access to state. For example, an attacker could send an | reclaiming access to state. For example, an attacker could send an | |||
OPEN reclaim operation with a deny mode that prevents another client | OPEN reclaim operation with a deny mode that prevents another client | |||
from reclaiming the open state it had before the server restarted. | from reclaiming the OPEN state it had before the server restarted. | |||
The attacker could perform the same denial of service during steady | The attacker could perform the same denial of service during steady | |||
state prior to server restart, as long as the the attacker had | state prior to server restart, as long as the attacker had | |||
permissions. Given that the attack vectors are equivalent, the grace | permissions. Given that the attack vectors are equivalent, the grace | |||
period does not offer any additional opportunity for denial of | period does not offer any additional opportunity for denial of | |||
service, and any concerns about this attack vector, whether during | service, and any concerns about this attack vector, whether during | |||
grace or steady state are addressed the same way: use RPCSEC_GSS for | grace or steady state, are addressed the same way: use RPCSEC_GSS for | |||
authentication, and limit access to the file only to principals the | authentication and limit access to the file only to principals that | |||
owner of the file trusts. | the owner of the file trusts. | |||
Note that if prior to restart the server had client IDs with the | Note that if prior to restart the server had client IDs with the | |||
EXCHGID4_FLAG_BIND_PRINC_STATEID (Section 18.35) capability set, then | EXCHGID4_FLAG_BIND_PRINC_STATEID (Section 18.35) capability set, then | |||
the server SHOULD record in stable storage the client owner and the | the server SHOULD record in stable storage the client owner and the | |||
principal that established the client ID via EXCHANGE_ID. If the | principal that established the client ID via EXCHANGE_ID. If the | |||
server does not, then there is a risk a client will be unable to | server does not, then there is a risk a client will be unable to | |||
reclaim state if it does not have a credential for a principal that | reclaim state if it does not have a credential for a principal that | |||
was originally authorized to establish the state. | was originally authorized to establish the state. | |||
8.4.3. Network Partitions and Recovery | 8.4.3. Network Partitions and Recovery | |||
If the duration of a network partition is greater than the lease | If the duration of a network partition is greater than the lease | |||
period provided by the server, the server will not have received a | period provided by the server, the server will not have received a | |||
lease renewal from the client. If this occurs, the server may free | lease renewal from the client. If this occurs, the server may free | |||
all locks held for the client, or it may allow the lock state to | all locks held for the client or it may allow the lock state to | |||
remain for a considerable period, subject to the constraint that if a | remain for a considerable period, subject to the constraint that if a | |||
request for a conflicting lock is made, locks associated with an | request for a conflicting lock is made, locks associated with an | |||
expired lease do not prevent such a conflicting lock from being | expired lease do not prevent such a conflicting lock from being | |||
granted but MUST be revoked as necessary so as not to interfere with | granted but MUST be revoked as necessary so as to avoid interfering | |||
such conflicting requests. | with such conflicting requests. | |||
If the server chooses to delay freeing of lock state until there is a | If the server chooses to delay freeing of lock state until there is a | |||
conflict, it may either free all of the clients locks once there is a | conflict, it may either free all of the client's locks once there is | |||
conflict, or it may only revoke the minimum set of locks necessary to | a conflict or it may only revoke the minimum set of locks necessary | |||
allow conflicting requests. When it adopts the finer-grained | to allow conflicting requests. When it adopts the finer-grained | |||
approach, it must revoke all locks associated with a given stateid, | approach, it must revoke all locks associated with a given stateid, | |||
even if the conflict is with only a subset of locks. | even if the conflict is with only a subset of locks. | |||
When the server chooses to free all of a client's lock state, either | When the server chooses to free all of a client's lock state, either | |||
immediately upon lease expiration, or a result of the first attempt | immediately upon lease expiration or as a result of the first attempt | |||
to obtain a conflicting lock, the server may report the loss of | to obtain a conflicting lock, the server may report the loss of | |||
lock state in a number of ways. | lock state in a number of ways. | |||
The server may choose to invalidate the session and the associated | The server may choose to invalidate the session and the associated | |||
client ID. In this case, once the client can communicate with the | client ID. In this case, once the client can communicate with the | |||
server, it will receive an NFS4ERR_BADSESSION error. Upon attempting | server, it will receive an NFS4ERR_BADSESSION error. Upon attempting | |||
to create a new session, it would get an NFS4ERR_STALE_CLIENTID. | to create a new session, it would get an NFS4ERR_STALE_CLIENTID. | |||
Upon creating the new client ID and new session the client will | Upon creating the new client ID and new session, the client will | |||
attempt to reclaim locks. Normally, the server will not allow the | attempt to reclaim locks. Normally, the server will not allow the | |||
client to reclaim locks, because the server will not be in its | client to reclaim locks, because the server will not be in its | |||
recovery grace period. | recovery grace period. | |||
Another possibility is for the server to maintain the session and | Another possibility is for the server to maintain the session and | |||
client ID but for all stateids held by the client to become invalid | client ID but for all stateids held by the client to become invalid | |||
or stale. Once the client can reach the server after such a network | or stale. Once the client can reach the server after such a network | |||
partition, the status returned by the SEQUENCE operation will | partition, the status returned by the SEQUENCE operation will | |||
indicate a loss of locking state, i.e. the flag | indicate a loss of locking state; i.e., the flag | |||
SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED will be set in sr_status_flags. | SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED will be set in sr_status_flags. | |||
In addition, all I/O submitted by the client with the now invalid | In addition, all I/O submitted by the client with the now invalid | |||
stateids will fail with the server returning the error | stateids will fail with the server returning the error | |||
NFS4ERR_EXPIRED. Once the client learns of the loss of locking | NFS4ERR_EXPIRED. Once the client learns of the loss of locking | |||
state, it will suitably notify the applications that held the | state, it will suitably notify the applications that held the | |||
invalidated locks. The client should then take action to free | invalidated locks. The client should then take action to free | |||
invalidated stateids, either by establishing a new client ID using a | invalidated stateids, either by establishing a new client ID using a | |||
new verifier or by doing a FREE_STATEID operation to release each of | new verifier or by doing a FREE_STATEID operation to release each of | |||
the invalidated stateids. | the invalidated stateids. | |||
skipping to change at page 179, line 8 | skipping to change at page 178, line 18 | |||
the following: | the following: | |||
1. Client A acquires a lock. | 1. Client A acquires a lock. | |||
2. Client A and server experience mutual network partition, such | 2. Client A and server experience mutual network partition, such | |||
that client A is unable to renew its lease. | that client A is unable to renew its lease. | |||
3. Client A's lease expires, and the server releases the lock. | 3. Client A's lease expires, and the server releases the lock. | |||
4. Client B acquires a lock that would have conflicted with that of | 4. Client B acquires a lock that would have conflicted with that of | |||
Client A. | client A. | |||
5. Client B releases its lock. | 5. Client B releases its lock. | |||
6. Server restarts. | 6. Server restarts. | |||
7. Network partition between client A and server heals. | 7. Network partition between client A and server heals. | |||
8. Client A connects to new server instance and finds out about | 8. Client A connects to a new server instance and finds out about | |||
server restart. | server restart. | |||
9. Client A reclaims its lock within the server's grace period. | 9. Client A reclaims its lock within the server's grace period. | |||
Thus, at the final step, the server has erroneously granted client | Thus, at the final step, the server has erroneously granted client | |||
A's lock reclaim. If client B modified the object the lock was | A's lock reclaim. If client B modified the object the lock was | |||
protecting, client A will experience object corruption. | protecting, client A will experience object corruption. | |||
The second known edge condition arises in situations such as the | The second known edge condition arises in situations such as the | |||
following: | following: | |||
skipping to change at page 180, line 9 | skipping to change at page 179, line 20 | |||
9. Client A connects to new server instance and finds out about | 9. Client A connects to new server instance and finds out about | |||
server restart. | server restart. | |||
10. Client A reclaims its lock within the server's grace period. | 10. Client A reclaims its lock within the server's grace period. | |||
As with the first edge condition, the final step of the scenario of | As with the first edge condition, the final step of the scenario of | |||
the second edge condition has the server erroneously granting client | the second edge condition has the server erroneously granting client | |||
A's lock reclaim. | A's lock reclaim. | |||
Solving the first and second edge conditions requires that the server | Solving the first and second edge conditions requires either that the | |||
either always assumes after it restarts that some edge condition | server always assumes after it restarts that some edge condition | |||
occurs, and thus return NFS4ERR_NO_GRACE for all reclaim attempts, or | occurs, and thus returns NFS4ERR_NO_GRACE for all reclaim attempts, | |||
that the server record some information in stable storage. The | or that the server record some information in stable storage. The | |||
amount of information the server records in stable storage is in | amount of information the server records in stable storage is in | |||
inverse proportion to how harsh the server intends to be whenever | inverse proportion to how harsh the server intends to be whenever | |||
edge conditions arise. The server that is completely tolerant of all | edge conditions arise. The server that is completely tolerant of all | |||
edge conditions will record in stable storage every lock that is | edge conditions will record in stable storage every lock that is | |||
acquired, removing the lock record from stable storage only when the | acquired, removing the lock record from stable storage only when the | |||
lock is released. For the two edge conditions discussed above, the | lock is released. For the two edge conditions discussed above, the | |||
harshest a server can be, and still support a grace period for | harshest a server can be, and still support a grace period for | |||
reclaims, requires that the server record in stable storage some | reclaims, requires that the server record in stable storage some | |||
minimal information. For example, a server implementation could, for | minimal information. For example, a server implementation could, for | |||
each client, save in stable storage a record containing: | each client, save in stable storage a record containing: | |||
o the co_ownerid field from the client_owner4 presented in the | o the co_ownerid field from the client_owner4 presented in the | |||
EXCHANGE_ID operation. | EXCHANGE_ID operation. | |||
o a boolean that indicates if the client's lease expired or if there | o a boolean that indicates if the client's lease expired or if there | |||
was administrative intervention (see Section 8.5) to revoke a | was administrative intervention (see Section 8.5) to revoke a | |||
byte-range lock, share reservation, or delegation and there has | byte-range lock, share reservation, or delegation and there has | |||
been no acknowledgement, via FREE_STATEID, of such revocation. | been no acknowledgment, via FREE_STATEID, of such revocation. | |||
o a boolean that indicates whether the client may have locks that it | o a boolean that indicates whether the client may have locks that it | |||
believes to be reclaimable in situations which the grace period | believes to be reclaimable in situations in which the grace period | |||
was terminated, making the server's view of lock reclaimability | was terminated, making the server's view of lock reclaimability | |||
suspect. The server will set this for any client record in stable | suspect. The server will set this for any client record in stable | |||
storage where the client has not done a suitable RECLAIM_COMPLETE | storage where the client has not done a suitable RECLAIM_COMPLETE | |||
(global or file system-specific depending on the target of the | (global or file system-specific depending on the target of the | |||
lock request) before it grants any new (i.e. not reclaimed) lock | lock request) before it grants any new (i.e., not reclaimed) lock | |||
to any client. | to any client. | |||
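The per-client record described above can be sketched as a small data structure with a reclaim check. This is only an illustration: the class name, the field names other than co_ownerid, and the function reclaim_status are hypothetical and are not defined by the protocol.

```python
from dataclasses import dataclass


@dataclass
class ClientRecoveryRecord:
    """Hypothetical per-client record kept in stable storage."""
    co_ownerid: bytes                      # from client_owner4 in EXCHANGE_ID
    revoked_unacknowledged: bool = False   # lease expired or admin revocation,
                                           # not yet acknowledged via FREE_STATEID
    reclaim_suspect: bool = False          # no suitable RECLAIM_COMPLETE before
                                           # a new (non-reclaim) lock was granted


def reclaim_status(record: ClientRecoveryRecord) -> str:
    """Return the status for a reclaim attempted after a server restart."""
    if record.revoked_unacknowledged or record.reclaim_suspect:
        # Either flag means a conflicting lock may have been granted
        # to another client, so the reclaim cannot be trusted.
        return "NFS4ERR_NO_GRACE"
    return "NFS4_OK"
```

With this sketch, a record for client A that has revoked_unacknowledged set corresponds to the first edge condition, and one that has reclaim_suspect set corresponds to the second; both lead to NFS4ERR_NO_GRACE.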
Assuming the above record keeping, for the first edge condition, | Assuming the above record keeping, for the first edge condition, | |||
after the server restarts, the record that client A's lease expired | after the server restarts, the record that client A's lease expired | |||
means that another client could have acquired a conflicting byte- | means that another client could have acquired a conflicting byte- | |||
range lock, share reservation, or delegation. Hence the server must | range lock, share reservation, or delegation. Hence, the server must | |||
reject a reclaim from client A with the error NFS4ERR_NO_GRACE. | reject a reclaim from client A with the error NFS4ERR_NO_GRACE. | |||
For the second edge condition, after the server restarts for a second | For the second edge condition, after the server restarts for a second | |||
time, the indication that the client had not completed its reclaims | time, the indication that the client had not completed its reclaims | |||
at the time at which the grace period ended means that the server | at the time at which the grace period ended means that the server | |||
must reject a reclaim from client A with the error NFS4ERR_NO_GRACE. | must reject a reclaim from client A with the error NFS4ERR_NO_GRACE. | |||
When either edge condition occurs, the client's attempt to reclaim | When either edge condition occurs, the client's attempt to reclaim | |||
locks will result in the error NFS4ERR_NO_GRACE. When this is | locks will result in the error NFS4ERR_NO_GRACE. When this is | |||
received, or after the client restarts with no lock state, the client | received, or after the client restarts with no lock state, the client | |||
skipping to change at page 181, line 24 | skipping to change at page 180, line 35 | |||
reclaims of share reservations, byte-range locks, and delegations): | reclaims of share reservations, byte-range locks, and delegations): | |||
1. Reject all reclaims with NFS4ERR_NO_GRACE. This is extremely | 1. Reject all reclaims with NFS4ERR_NO_GRACE. This is extremely | |||
unforgiving, but necessary if the server does not record lock | unforgiving, but necessary if the server does not record lock | |||
state in stable storage. | state in stable storage. | |||
2. Record sufficient state in stable storage such that all known | 2. Record sufficient state in stable storage such that all known | |||
edge conditions involving server restart, including the two noted | edge conditions involving server restart, including the two noted | |||
in this section, are detected. It is acceptable to erroneously | in this section, are detected. It is acceptable to erroneously | |||
recognize an edge condition and not allow a reclaim, when, with | recognize an edge condition and not allow a reclaim, when, with | |||
sufficient knowledge it would be allowed. The error the server | sufficient knowledge, it would be allowed. The error the server | |||
would return in this case is NFS4ERR_NO_GRACE. Note it is not | would return in this case is NFS4ERR_NO_GRACE. Note that it is | |||
known if there are other edge conditions. | not known if there are other edge conditions. | |||
In the event that, after a server restart, the server determines | In the event that, after a server restart, the server determines | |||
that there is unrecoverable damage or corruption to the | there is unrecoverable damage or corruption to the information in | |||
information in stable storage, then for all clients and/or locks | stable storage, then for all clients and/or locks that may be | |||
which may be affected, the server MUST return NFS4ERR_NO_GRACE. | affected, the server MUST return NFS4ERR_NO_GRACE. | |||
A mandate for the client's handling of the NFS4ERR_NO_GRACE error is | A mandate for the client's handling of the NFS4ERR_NO_GRACE error is | |||
outside the scope of this specification, since the strategies for | outside the scope of this specification, since the strategies for | |||
such handling are very dependent on the client's operating | such handling are very dependent on the client's operating | |||
environment. However, one potential approach is described below. | environment. However, one potential approach is described below. | |||
When the client receives NFS4ERR_NO_GRACE, it could examine the | When the client receives NFS4ERR_NO_GRACE, it could examine the | |||
change attribute of the objects the client is trying to reclaim state | change attribute of the objects for which the client is trying to | |||
for, and use that to determine whether to re-establish the state via | reclaim state, and use that to determine whether to re-establish the | |||
normal OPEN or LOCK requests. This is acceptable provided the | state via normal OPEN or LOCK operations. This is acceptable | |||
client's operating environment allows it. In other words, the client | provided that the client's operating environment allows it. In other | |||
implementor is advised to document for his users the behavior. The | words, the client implementor is advised to document for his users | |||
client could also inform the application that its byte-range lock or | the behavior. The client could also inform the application that its | |||
share reservations (whether they were delegated or not) have been | byte-range lock or share reservations (whether or not they were | |||
lost, such as via a UNIX signal, a GUI pop-up window, etc. See | delegated) have been lost, such as via a UNIX signal, a Graphical | |||
Section 10.5 for a discussion of what the client should do for | User Interface (GUI) pop-up window, etc. See Section 10.5 for a | |||
dealing with unreclaimed delegations on client state. | discussion of what the client should do for dealing with unreclaimed | |||
delegations on client state. | ||||
For further discussion of revocation of locks see Section 8.5. | For further discussion of revocation of locks, see Section 8.5. | |||
8.5. Server Revocation of Locks | 8.5. Server Revocation of Locks | |||
At any point, the server can revoke locks held by a client and the | At any point, the server can revoke locks held by a client, and the | |||
client must be prepared for this event. When the client detects that | client must be prepared for this event. When the client detects that | |||
its locks have been or may have been revoked, the client is | its locks have been or may have been revoked, the client is | |||
responsible for validating the state information between itself and | responsible for validating the state information between itself and | |||
the server. Validating locking state for the client means that it | the server. Validating locking state for the client means that it | |||
must verify or reclaim state for each lock currently held. | must verify or reclaim state for each lock currently held. | |||
The first occasion of lock revocation is upon server restart. Note | The first occasion of lock revocation is upon server restart. Note | |||
that this includes situations in which sessions are persistent and | that this includes situations in which sessions are persistent and | |||
locking state is lost. In this class of instances, the client will | locking state is lost. In this class of instances, the client will | |||
receive an error (NFS4ERR_STALE_CLIENTID) on an operation that takes | receive an error (NFS4ERR_STALE_CLIENTID) on an operation that takes | |||
a client ID (usually as part of recovery in response to a problem with | a client ID (usually as part of recovery in response to a problem with | |||
the current session) and the client will proceed with normal crash | the current session), and the client will proceed with normal crash | |||
recovery as described in Section 8.4.2.1. | recovery as described in Section 8.4.2.1. | |||
The second occasion of lock revocation is the inability to renew the | The second occasion of lock revocation is the inability to renew the | |||
lease before expiration, as discussed in Section 8.4.3. While this | lease before expiration, as discussed in Section 8.4.3. While this | |||
is considered a rare or unusual event, the client must be prepared to | is considered a rare or unusual event, the client must be prepared to | |||
recover. The server is responsible for determining the precise | recover. The server is responsible for determining the precise | |||
consequences of the lease expiration, informing the client of the | consequences of the lease expiration, informing the client of the | |||
scope of the lock revocation decided upon. The client then uses the | scope of the lock revocation decided upon. The client then uses the | |||
status information provided by the server in the SEQUENCE results | status information provided by the server in the SEQUENCE results | |||
(field sr_status_flags, see Section 18.46.3) to synchronize its | (field sr_status_flags, see Section 18.46.3) to synchronize its | |||
locking state with that of the server, in order to recover. | locking state with that of the server, in order to recover. | |||
The third occasion of lock revocation can occur as a result of | The third occasion of lock revocation can occur as a result of | |||
revocation of locks within the lease period, either because of | revocation of locks within the lease period, either because of | |||
administrative intervention, or because a recallable lock (a | administrative intervention or because a recallable lock (a | |||
delegation or layout) was not returned within the lease period after | delegation or layout) was not returned within the lease period after | |||
having been recalled. While these are considered rare events, they | having been recalled. While these are considered rare events, they | |||
are possible and the client must be prepared to deal with them. When | are possible, and the client must be prepared to deal with them. | |||
either of these events occur, the client finds out about the | When either of these events occurs, the client finds out about the | |||
situation through the status returned by the SEQUENCE operation. Any | situation through the status returned by the SEQUENCE operation. Any | |||
use of stateids associated with locks revoked during the lease period | use of stateids associated with locks revoked during the lease period | |||
will receive the error NFS4ERR_ADMIN_REVOKED or | will receive the error NFS4ERR_ADMIN_REVOKED or | |||
NFS4ERR_DELEG_REVOKED, as appropriate. | NFS4ERR_DELEG_REVOKED, as appropriate. | |||
In all situations in which a subset of locking state may have been | In all situations in which a subset of locking state may have been | |||
revoked, which include all cases in which locking state is revoked | revoked, which include all cases in which locking state is revoked | |||
within the lease period, it is up to the client to determine which | within the lease period, it is up to the client to determine which | |||
locks have been revoked and which have not. It does this by using | locks have been revoked and which have not. It does this by using | |||
the TEST_STATEID operation on the appropriate set of stateids. Once | the TEST_STATEID operation on the appropriate set of stateids. Once | |||
the set of revoked locks has been determined, the applications can be | the set of revoked locks has been determined, the applications can be | |||
notified, and the invalidated stateids can be freed and lock | notified, and the invalidated stateids can be freed and lock | |||
revocation acknowledged by using FREE_STATEID. | revocation acknowledged by using FREE_STATEID. | |||
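The client-side procedure in the preceding paragraph might look roughly like the following. The function and its callback parameters are hypothetical stand-ins for a client's TEST_STATEID and FREE_STATEID machinery, and treating only the two error codes named in this section as "revoked" is an assumption of the sketch; a real client may need to handle other per-stateid statuses.

```python
# Per-stateid statuses treated as "revoked" in this sketch (an assumption;
# only the two codes named in the text above are included).
REVOKED_STATUSES = {"NFS4ERR_ADMIN_REVOKED", "NFS4ERR_DELEG_REVOKED"}


def handle_possible_revocation(stateids, test_stateid, free_stateid,
                               notify_application):
    """Find revoked locks, notify applications, and acknowledge revocation.

    test_stateid(stateid) returns a status string for one stateid,
    free_stateid(stateid) acknowledges the revocation to the server, and
    notify_application(stateid) tells the owner that the lock was lost.
    All three are caller-supplied, hypothetical callbacks.
    """
    revoked = [s for s in stateids if test_stateid(s) in REVOKED_STATUSES]
    for stateid in revoked:
        notify_application(stateid)
        free_stateid(stateid)
    return revoked
```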
8.6. Short and Long Leases | 8.6. Short and Long Leases | |||
When determining the time period for the server lease, the usual | When determining the time period for the server lease, the usual | |||
lease tradeoffs apply. Short leases are good for fast server | lease tradeoffs apply. A short lease is good for fast server | |||
recovery at a cost of increased operations to effect lease renewal | recovery at a cost of increased operations to effect lease renewal | |||
(when there are no other operations during the period to effect lease | (when there are no other operations during the period to effect lease | |||
renewal as a side-effect). Long leases are certainly kinder and | renewal as a side effect). A long lease is certainly kinder and | |||
gentler to servers trying to handle very large numbers of clients. | gentler to servers trying to handle very large numbers of clients. | |||
The number of extra requests to effect lock renewal drops in inverse | The number of extra requests to effect lock renewal drops in inverse | |||
proportion to the lease time. The disadvantages of long leases | proportion to the lease time. The disadvantages of a long lease | |||
include the possibility of slower recovery after certain failures. | include the possibility of slower recovery after certain failures. | |||
After server failure, a longer grace period may be required when some | After server failure, a longer grace period may be required when some | |||
clients do not promptly reclaim their locks and do a global | clients do not promptly reclaim their locks and do a global | |||
RECLAIM_COMPLETE. In the event of client failure, there can be a | RECLAIM_COMPLETE. In the event of client failure, the longer period | |||
longer period for leases to expire thus forcing conflicting requests | for a lease to expire will force conflicting requests to wait longer. | |||
to wait. | ||||
Long leases are practical if the server can store lease state in | A long lease is practical if the server can store lease state in | |||
stable storage. Upon recovery, the server can reconstruct the lease | stable storage. Upon recovery, the server can reconstruct the lease | |||
state from its stable storage and continue operation with its | state from its stable storage and continue operation with its | |||
clients. | clients. | |||
8.7. Clocks, Propagation Delay, and Calculating Lease Expiration | 8.7. Clocks, Propagation Delay, and Calculating Lease Expiration | |||
To avoid the need for synchronized clocks, lease times are granted by | To avoid the need for synchronized clocks, lease times are granted by | |||
the server as a time delta. However, there is a requirement that the | the server as a time delta. However, there is a requirement that the | |||
client and server clocks do not drift excessively over the duration | client and server clocks do not drift excessively over the duration | |||
of the lease. There is also the issue of propagation delay across | of the lease. There is also the issue of propagation delay across | |||
the network which could easily be several hundred milliseconds as | the network, which could easily be several hundred milliseconds, as | |||
well as the possibility that requests will be lost and need to be | well as the possibility that requests will be lost and need to be | |||
retransmitted. | retransmitted. | |||
To take propagation delay into account, the client should subtract it | To take propagation delay into account, the client should subtract it | |||
from lease times (e.g. if the client estimates the one-way | from lease times (e.g., if the client estimates the one-way | |||
propagation delay as 200 milliseconds, then it can assume that the | propagation delay as 200 milliseconds, then it can assume that the | |||
lease is already 200 milliseconds old when it gets it). In addition, | lease is already 200 milliseconds old when it gets it). In addition, | |||
it will take another 200 milliseconds to get a response back to the | it will take another 200 milliseconds to get a response back to the | |||
server. So the client must send a lease renewal or write data back | server. So the client must send a lease renewal or write data back | |||
to the server at least 400 milliseconds before the lease would | to the server at least 400 milliseconds before the lease would | |||
expire. If the propagation delay varies over the life of the lease | expire. If the propagation delay varies over the life of the lease | |||
(e.g. the client is on a mobile host), the client will need to | (e.g., the client is on a mobile host), the client will need to | |||
continuously subtract the increase in propagation delay from the | continuously subtract the increase in propagation delay from the | |||
lease times. | lease times. | |||
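The arithmetic in the paragraph above can be written out as a short calculation. The function name and the 90-second lease used in the example are illustrative assumptions; only the 200-millisecond delay comes from the text.

```python
def latest_renewal_send_time(lease_ms: int, one_way_delay_ms: int) -> int:
    """Milliseconds after receiving the lease by which a renewal must be sent.

    The lease is already one_way_delay_ms old when it arrives, and the
    renewal needs another one_way_delay_ms to reach the server, so the
    client gives up twice the one-way delay.
    """
    return lease_ms - 2 * one_way_delay_ms


# Hypothetical 90-second lease with the 200 ms one-way delay from the text:
print(latest_renewal_send_time(90_000, 200))  # prints 89600
```

If the estimated delay grows during the lease, calling the function again with the larger estimate yields the earlier deadline the text calls for.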
The server's lease period configuration should take into account the | The server's lease period configuration should take into account the | |||
network distance of the clients that will be accessing the server's | network distance of the clients that will be accessing the server's | |||
resources. It is expected that the lease period will take into | resources. It is expected that the lease period will take into | |||
account the network propagation delays and other network delay | account the network propagation delays and other network delay | |||
factors for the client population. Since the protocol does not allow | factors for the client population. Since the protocol does not allow | |||
for an automatic method to determine an appropriate lease period, the | for an automatic method to determine an appropriate lease period, the | |||
server's administrator may have to tune the lease period. | server's administrator may have to tune the lease period. | |||
8.8. Obsolete Locking Infrastructure From NFSv4.0 | 8.8. Obsolete Locking Infrastructure from NFSv4.0 | |||
There are a number of operations and fields within existing | There are a number of operations and fields within existing | |||
operations that no longer have a function in NFSv4.1. In one way or | operations that no longer have a function in NFSv4.1. In one way or | |||
another, these changes are all due to the implementation of sessions | another, these changes are all due to the implementation of sessions | |||
which provides client context and exactly once semantics as a base | that provide client context and exactly once semantics as a base | |||
feature of the protocol, separate from locking itself. | feature of the protocol, separate from locking itself. | |||
The following NFSv4.0 operations MUST NOT be implemented in NFSv4.1. | The following NFSv4.0 operations MUST NOT be implemented in NFSv4.1. | |||
The server MUST return NFS4ERR_NOTSUPP if these operations are found | The server MUST return NFS4ERR_NOTSUPP if these operations are found | |||
in an NFSv4.1 COMPOUND. | in an NFSv4.1 COMPOUND. | |||
o SETCLIENTID since its function has been replaced by EXCHANGE_ID. | o SETCLIENTID since its function has been replaced by EXCHANGE_ID. | |||
o SETCLIENTID_CONFIRM since client ID confirmation now happens by | o SETCLIENTID_CONFIRM since client ID confirmation now happens by | |||
means of CREATE_SESSION. | means of CREATE_SESSION. | |||
skipping to change at page 184, line 35 | skipping to change at page 183, line 46 | |||
o OPEN_CONFIRM because state-owner-based seqids have been replaced | o OPEN_CONFIRM because state-owner-based seqids have been replaced | |||
by the sequence ID in the SEQUENCE operation. | by the sequence ID in the SEQUENCE operation. | |||
o RELEASE_LOCKOWNER because lock-owners with no associated locks do | o RELEASE_LOCKOWNER because lock-owners with no associated locks do | |||
not have any sequence-related state and so can be deleted by the | not have any sequence-related state and so can be deleted by the | |||
server at will. | server at will. | |||
o RENEW because every SEQUENCE operation for a session causes lease | o RENEW because every SEQUENCE operation for a session causes lease | |||
renewal, making a separate operation superfluous. | renewal, making a separate operation superfluous. | |||
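As a minimal sketch of the rule above, a server's COMPOUND dispatcher could screen operation names against the operations listed in this excerpt. The set name and the function are hypothetical, and a real server would dispatch on numeric operation codes rather than strings.

```python
# NFSv4.0 operations listed above that MUST NOT be implemented in NFSv4.1.
OBSOLETE_IN_V41 = frozenset({
    "SETCLIENTID",
    "SETCLIENTID_CONFIRM",
    "OPEN_CONFIRM",
    "RELEASE_LOCKOWNER",
    "RENEW",
})


def screen_operation(opname: str) -> str:
    """Return NFS4ERR_NOTSUPP for an obsolete operation, else NFS4_OK."""
    if opname in OBSOLETE_IN_V41:
        return "NFS4ERR_NOTSUPP"
    return "NFS4_OK"
```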
Also, there are a number of fields, present in existing operations | Also, there are a number of fields, present in existing operations, | |||
related to locking that have no use in minor version one. They were | related to locking that have no use in minor version 1. They were | |||
used in minor version zero to perform functions now provided in a | used in minor version 0 to perform functions now provided in a | |||
different fashion. | different fashion. | |||
o Sequence IDs used to sequence requests for a given state-owner and | o Sequence IDs used to sequence requests for a given state-owner and | |||
to provide retry protection, now provided via sessions. | to provide retry protection, now provided via sessions. | |||
o Client IDs used to identify the client associated with a given | o Client IDs used to identify the client associated with a given | |||
request. Client identification is now available using the client | request. Client identification is now available using the client | |||
ID associated with the current session, without needing an | ID associated with the current session, without needing an | |||
explicit client ID field. | explicit client ID field. | |||
Such vestigial fields in existing operations have no function in | Such vestigial fields in existing operations have no function in | |||
NFSv4.1 and are ignored by the server. Note that client IDs in | NFSv4.1 and are ignored by the server. Note that client IDs in | |||
operations new to NFSv4.1 (such as CREATE_SESSION and | operations new to NFSv4.1 (such as CREATE_SESSION and | |||
DESTROY_CLIENTID) are not ignored. | DESTROY_CLIENTID) are not ignored. | |||
9. File Locking and Share Reservations | 9. File Locking and Share Reservations | |||
To support Win32 share reservations it is necessary to provide | To support Win32 share reservations, it is necessary to provide | |||
operations which atomically open or create files. Having a separate | operations that atomically open or create files. Having a separate | |||
share/unshare operation would not allow correct implementation of the | share/unshare operation would not allow correct implementation of the | |||
Win32 OpenFile API. In order to correctly implement share semantics, | Win32 OpenFile API. In order to correctly implement share semantics, | |||
the previous NFS protocol mechanisms used when a file is opened or | the previous NFS protocol mechanisms used when a file is opened or | |||
created (LOOKUP, CREATE, ACCESS) need to be replaced. The NFSv4.1 | created (LOOKUP, CREATE, ACCESS) need to be replaced. The NFSv4.1 | |||
protocol defines an OPEN operation which is capable of atomically | protocol defines an OPEN operation that is capable of atomically | |||
looking up, creating, and locking a file on the server. | looking up, creating, and locking a file on the server. | |||
9.1. Opens and Byte-Range Locks | 9.1. Opens and Byte-Range Locks | |||
It is assumed that manipulating a byte-range lock is rare when | It is assumed that manipulating a byte-range lock is rare when | |||
compared to READ and WRITE operations. It is also assumed that | compared to READ and WRITE operations. It is also assumed that | |||
server restarts and network partitions are relatively rare. | server restarts and network partitions are relatively rare. | |||
Therefore it is important that the READ and WRITE operations have a | Therefore, it is important that the READ and WRITE operations have a | |||
lightweight mechanism to indicate if they possess a held lock. A | lightweight mechanism to indicate if they possess a held lock. A | |||
byte-range lock request contains the heavyweight information required | LOCK operation contains the heavyweight information required to | |||
to establish a lock and uniquely define the owner of the lock. | establish a byte-range lock and uniquely define the owner of the | |||
lock. | ||||
9.1.1. State-owner Definition | 9.1.1. State-Owner Definition | |||
When opening a file or requesting a byte-range lock, the client must | When opening a file or requesting a byte-range lock, the client must | |||
specify an identifier which represents the owner of the requested | specify an identifier that represents the owner of the requested | |||
lock. This identifier is in the form of a state-owner, represented | lock. This identifier is in the form of a state-owner, represented | |||
in the protocol by a state_owner4, a variable-length opaque array | in the protocol by a state_owner4, a variable-length opaque array | |||
which, when concatenated with the current client ID uniquely defines | that, when concatenated with the current client ID, uniquely defines | |||
the owner of lock managed by the client. This may be a thread ID, | the owner of a lock managed by the client. This may be a thread ID, | |||
process ID, or other unique value. | process ID, or other unique value. | |||
Owners of opens and owners of byte-range locks are separate entities | Owners of opens and owners of byte-range locks are separate entities | |||
and remain separate even if the same opaque arrays are used to | and remain separate even if the same opaque arrays are used to | |||
designate owners of each. The protocol distinguishes between open- | designate owners of each. The protocol distinguishes between open- | |||
owners (represented by open_owner4 structures) and lock-owners | owners (represented by open_owner4 structures) and lock-owners | |||
(represented by lock_owner4 structures). | (represented by lock_owner4 structures). | |||
Each open is associated with a specific open-owner while each byte- | Each open is associated with a specific open-owner while each byte- | |||
range lock is associated with a lock-owner and an open-owner, the | range lock is associated with a lock-owner and an open-owner, the | |||
latter being the open-owner associated with the open file under which | latter being the open-owner associated with the open file under which | |||
the LOCK operation was done. Delegations and layouts, on the other | the LOCK operation was done. Delegations and layouts, on the other | |||
hand, are not associated with a specific owner but are associated | hand, are not associated with a specific owner but are associated | |||
with the client as a whole (identified by a client ID). | with the client as a whole (identified by a client ID). | |||
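One way to picture the owner rules above is as a lookup key. In this sketch (the OwnerKey type and its field names are hypothetical), an owner is identified by the client ID together with the opaque array, and the kind of owner is part of the key, so an open-owner and a lock-owner stay distinct even when they use the same bytes.

```python
from typing import NamedTuple


class OwnerKey(NamedTuple):
    """Hypothetical key a server might use to index state-owners."""
    kind: str       # "open" for open_owner4, "lock" for lock_owner4
    clientid: int   # client ID of the current session
    owner: bytes    # opaque array, e.g. derived from a thread or process ID


open_owner = OwnerKey("open", 0x1234, b"pid-42")
lock_owner = OwnerKey("lock", 0x1234, b"pid-42")

# Same opaque bytes and same client, yet two separate owners.
print(open_owner == lock_owner)  # prints False
```

Delegations and layouts would not use such a key at all, since they are associated with the client ID alone.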
9.1.2. Use of the Stateid and Locking | 9.1.2. Use of the Stateid and Locking | |||
All READ, WRITE and SETATTR operations contain a stateid. For the | All READ, WRITE, and SETATTR operations contain a stateid. For the | |||
purposes of this section, SETATTR operations which change the size | purposes of this section, SETATTR operations that change the size | |||
attribute of a file are treated as if they are writing the area | attribute of a file are treated as if they are writing the area | |||
between the old and new size (i.e. the range truncated or added to | between the old and new sizes (i.e., the byte-range truncated or | |||
the file by means of the SETATTR), even where SETATTR is not | added to the file by means of the SETATTR), even where SETATTR is not | |||
explicitly mentioned in the text. The stateid passed to one of these | explicitly mentioned in the text. The stateid passed to one of these | |||
operations must be one that represents an open, a set of byte-range | operations must be one that represents an open, a set of byte-range | |||
locks, or a delegation, or it may be a special stateid representing | locks, or a delegation, or it may be a special stateid representing | |||
anonymous access or the special bypass stateid. | anonymous access or the special bypass stateid. | |||
If the state-owner performs a READ or WRITE in a situation in which | If the state-owner performs a READ or WRITE operation in a situation | |||
it has established a byte-range lock or share reservation on the | in which it has established a byte-range lock or share reservation on | |||
server (any OPEN constitutes a share reservation) the stateid | the server (any OPEN constitutes a share reservation), the stateid | |||
(previously returned by the server) must be used to indicate what | (previously returned by the server) must be used to indicate what | |||
locks, including both byte-range locks and share reservations, are | locks, including both byte-range locks and share reservations, are | |||
held by the state-owner. If no state is established by the client, | held by the state-owner. If no state is established by the client, | |||
either byte-range lock or share reservation, a special stateid for | either a byte-range lock or a share reservation, a special stateid | |||
anonymous state (zero as "other" and "seqid") is used. (See | for anonymous state (zero as the value for "other" and "seqid") is | |||
Section 8.2.3 for a description of 'special' stateids in general.) | used. (See Section 8.2.3 for a description of 'special' stateids in | |||
Regardless whether a stateid for anonymous state or a stateid | general.) Regardless of whether a stateid for anonymous state or a | |||
returned by the server is used, if there is a conflicting share | stateid returned by the server is used, if there is a conflicting | |||
reservation or mandatory byte-range lock held on the file, the server | share reservation or mandatory byte-range lock held on the file, the | |||
MUST refuse to service the READ or WRITE operation. | server MUST refuse to service the READ or WRITE operation. | |||
Share reservations are established by OPEN operations and by their | Share reservations are established by OPEN operations and by their | |||
nature are mandatory in that when the OPEN denies READ or WRITE | nature are mandatory in that when the OPEN denies READ or WRITE | |||
operations, that denial results in such operations being rejected | operations, that denial results in such operations being rejected | |||
with error NFS4ERR_LOCKED. Byte-range locks may be implemented by | with error NFS4ERR_LOCKED. Byte-range locks may be implemented by | |||
the server as either mandatory or advisory, or the choice of | the server as either mandatory or advisory, or the choice of | |||
mandatory or advisory behavior may be determined by the server on the | mandatory or advisory behavior may be determined by the server on the | |||
basis of the file being accessed (for example, some UNIX-based | basis of the file being accessed (for example, some UNIX-based | |||
servers support a "mandatory lock bit" on the mode attribute such | servers support a "mandatory lock bit" on the mode attribute such | |||
that if set, byte-range locks are required on the file before I/O is | that if set, byte-range locks are required on the file before I/O is | |||
possible). When byte-range locks are advisory, they only prevent the | possible). When byte-range locks are advisory, they only prevent the | |||
granting of conflicting lock requests and have no effect on READs or | granting of conflicting lock requests and have no effect on READs or | |||
WRITEs. Mandatory byte-range locks, however, prevent conflicting I/O | WRITEs. Mandatory byte-range locks, however, prevent conflicting I/O | |||
operations. When they are attempted, they are rejected with | operations. When they are attempted, they are rejected with | |||
NFS4ERR_LOCKED. When the client gets NFS4ERR_LOCKED on a file it | NFS4ERR_LOCKED. When the client gets NFS4ERR_LOCKED on a file for | |||
knows it has the proper share reservation for, it will need to send a | which it knows it has the proper share reservation, it will need to | |||
LOCK request on the region of the file that includes the region the | send a LOCK operation on the byte-range of the file that includes the | |||
I/O was to be performed on, with an appropriate locktype (i.e. | byte-range the I/O was to be performed on, with an appropriate | |||
READ*_LT for a READ operation, WRITE*_LT for a WRITE operation). | locktype field of the LOCK operation's arguments (i.e., READ*_LT for | |||
a READ operation, WRITE*_LT for a WRITE operation). | ||||
Note that for UNIX environments that support mandatory file locking, | ||||
the distinction between advisory and mandatory locking is subtle. In | ||||
fact, advisory and mandatory byte-range locks are exactly the same in | ||||
so far as the APIs and requirements on implementation. If the | ||||
mandatory lock attribute is set on the file, the server checks to see | ||||
if the lock-owner has an appropriate shared (read) or exclusive | ||||
(write) byte-range lock on the region it wishes to read or write to. | ||||
If there is no appropriate lock, the server checks if there is a | Note that for UNIX environments that support mandatory byte-range | |||
conflicting lock (which can be done by attempting to acquire the | locking, the distinction between advisory and mandatory locking is | |||
conflicting lock on behalf of the lock-owner, and if successful, | subtle. In fact, advisory and mandatory byte-range locks are exactly | |||
release the lock after the READ or WRITE is done), and if there is, | the same as far as the APIs and requirements on implementation. If | |||
the server returns NFS4ERR_LOCKED. | the mandatory lock attribute is set on the file, the server checks to | |||
see if the lock-owner has an appropriate shared (READ_LT) or | ||||
exclusive (WRITE_LT) byte-range lock on the byte-range it wishes to | ||||
READ from or WRITE to. If there is no appropriate lock, the server | ||||
checks if there is a conflicting lock (which can be done by | ||||
attempting to acquire the conflicting lock on behalf of the lock- | ||||
owner, and if successful, release the lock after the READ or WRITE | ||||
operation is done), and if there is, the server returns | ||||
NFS4ERR_LOCKED. | ||||
For Windows environments, byte-range locks are always mandatory, so | For Windows environments, byte-range locks are always mandatory, so | |||
the server always checks for byte-range locks during I/O requests. | the server always checks for byte-range locks during I/O requests. | |||
Thus, the NFSv4.1 LOCK operation does not need to distinguish between | Thus, the LOCK operation does not need to distinguish between | |||
advisory and mandatory byte-range locks. It is the NFSv4.1 server's | advisory and mandatory byte-range locks. It is the server's | |||
processing of the READ and WRITE operations that introduces the | processing of the READ and WRITE operations that introduces the | |||
distinction. | distinction. | |||
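The READ/WRITE-time check described above can be sketched roughly as follows. This is a simplified illustration, not the protocol's definition: the `ByteRangeLock` record, the `check_io` helper, and the use of plain strings for status codes are all assumptions made for the example, and special length values (such as "to end of file") are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ByteRangeLock:
    owner: str       # lock-owner, modeled here as a plain string (assumption)
    offset: int
    length: int
    exclusive: bool  # True for a WRITE_LT lock, False for a READ_LT lock


def overlaps(lock, offset, length):
    """True if the lock's byte-range intersects [offset, offset + length)."""
    return lock.offset < offset + length and offset < lock.offset + lock.length


def check_io(locks, mandatory, owner, offset, length, is_write):
    """Return "NFS4_OK" or "NFS4ERR_LOCKED" for a READ or WRITE request."""
    if not mandatory:
        # Advisory locks only affect other lock requests, never I/O.
        return "NFS4_OK"
    for lock in locks:
        if lock.owner == owner or not overlaps(lock, offset, length):
            continue
        # Another owner's exclusive lock conflicts with any I/O;
        # another owner's shared lock conflicts only with a WRITE.
        if lock.exclusive or is_write:
            return "NFS4ERR_LOCKED"
    return "NFS4_OK"
```

Note that the LOCK path is the same in both modes; only the `mandatory` flag consulted at I/O time differs, which matches the observation that the distinction arises in READ and WRITE processing.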
Every stateid which is validly passed to READ, WRITE or SETATTR, with | Every stateid that is validly passed to READ, WRITE, or SETATTR, with | |||
the exception of special stateid values, defines an access mode for | the exception of special stateid values, defines an access mode for | |||
the file (i.e. READ, WRITE, or READ-WRITE) | the file (i.e., OPEN4_SHARE_ACCESS_READ, OPEN4_SHARE_ACCESS_WRITE, or | |||
OPEN4_SHARE_ACCESS_BOTH). | ||||
o For stateids associated with opens, this is the mode defined by | o For stateids associated with opens, this is the mode defined by | |||
the original OPEN which caused the allocation of the open stateid | the original OPEN that caused the allocation of the OPEN stateid | |||
and as modified by subsequent OPENs and OPEN_DOWNGRADEs for the | and as modified by subsequent OPENs and OPEN_DOWNGRADEs for the | |||
same open-owner/file pair. | same open-owner/file pair. | |||
o For stateids returned by byte-range lock requests, the appropriate | o For stateids returned by byte-range LOCK operations, the | |||
mode is the access mode for the open stateid associated with the | appropriate mode is the access mode for the OPEN stateid | |||
lock set represented by the stateid. | associated with the lock set represented by the stateid. | |||
o For delegation stateids the access mode is based on the type of | o For delegation stateids, the access mode is based on the type of | |||
delegation. | delegation. | |||
When a READ, WRITE, or SETATTR (which specifies the size attribute) | When a READ, WRITE, or SETATTR (that specifies the size attribute) | |||
is done, the operation is subject to checking against the access mode | operation is done, the operation is subject to checking against the | |||
to verify that the operation is appropriate given the stateid with | access mode to verify that the operation is appropriate given the | |||
which the operation is associated. | stateid with which the operation is associated. | |||
In the case of WRITE-type operations (i.e. WRITEs and SETATTRs which | In the case of WRITE-type operations (i.e., WRITEs and SETATTRs that | |||
set size), the server MUST verify that the access mode allows writing | set size), the server MUST verify that the access mode allows writing | |||
and MUST return an NFS4ERR_OPENMODE error if it does not. In the | and MUST return an NFS4ERR_OPENMODE error if it does not. In the | |||
case, of READ, the server may perform the corresponding check on the | case of READ, the server may perform the corresponding check on the | |||
access mode, or it may choose to allow READ on opens for WRITE only, | access mode, or it may choose to allow READ on OPENs for | |||
to accommodate clients whose write implementation may unavoidably do | OPEN4_SHARE_ACCESS_WRITE, to accommodate clients whose WRITE | |||
reads (e.g. due to buffer cache constraints). However, even if READs | implementation may unavoidably do reads (e.g., due to buffer cache | |||
are allowed in these circumstances, the server MUST still check for | constraints). However, even if READs are allowed in these | |||
locks that conflict with the READ (e.g. another open specify denial | circumstances, the server MUST still check for locks that conflict | |||
of READs). Note that a server which does enforce the access mode | with the READ (e.g., another OPEN specified OPEN4_SHARE_DENY_READ or | |||
check on READs need not explicitly check for conflicting share | OPEN4_SHARE_DENY_BOTH). Note that a server that does enforce the | |||
reservations since the existence of OPEN for read access guarantees | access mode check on READs need not explicitly check for conflicting | |||
that no conflicting share reservation can exist. | share reservations since the existence of OPEN for | |||
OPEN4_SHARE_ACCESS_READ guarantees that no conflicting share | ||||
reservation can exist. | ||||
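A minimal sketch of the access-mode check described above, assuming the stateid has already been resolved to its access bits. The function name, the operation labels, and the `allow_read_on_write_only` switch are hypothetical; the error returned for a rejected READ is assumed to mirror the WRITE case.

```python
OPEN4_SHARE_ACCESS_READ = 0x00000001
OPEN4_SHARE_ACCESS_WRITE = 0x00000002
OPEN4_SHARE_ACCESS_BOTH = 0x00000003


def check_access_mode(op, stateid_access, allow_read_on_write_only=False):
    """Check an operation against the access mode implied by its stateid.

    op is "READ", "WRITE", or "SETATTR_SIZE" (a SETATTR that sets size).
    """
    if op in ("WRITE", "SETATTR_SIZE"):
        # WRITE-type operations MUST be rejected without write access.
        if not stateid_access & OPEN4_SHARE_ACCESS_WRITE:
            return "NFS4ERR_OPENMODE"
        return "NFS4_OK"
    if op == "READ":
        # The READ check is optional: a server may allow READ on a
        # write-only open to accommodate read-modify-write clients.
        if stateid_access & OPEN4_SHARE_ACCESS_READ or allow_read_on_write_only:
            return "NFS4_OK"
        return "NFS4ERR_OPENMODE"
    raise ValueError("unexpected operation: " + op)
```

A server that takes the permissive READ path would still need a separate check for conflicting deny modes held by other opens, which this sketch does not model.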
The read bypass special stateid (all bits of "other" and "seqid" set | The READ bypass special stateid (all bits of "other" and "seqid" set | |||
to one) indicates a desire to bypass locking checks. The server MAY | to one) indicates a desire to bypass locking checks. The server MAY | |||
allow READ operations to bypass locking checks at the server, when | allow READ operations to bypass locking checks at the server, when | |||
this special stateid is used. However, WRITE operations with this | this special stateid is used. However, WRITE operations with this | |||
special stateid value MUST NOT bypass locking checks and are treated | special stateid value MUST NOT bypass locking checks and are treated | |||
exactly the same as if a special stateid for anonymous state were | exactly the same as if a special stateid for anonymous state were | |||
used. | used. | |||
A lock may not be granted while a READ or WRITE operation using one | A lock may not be granted while a READ or WRITE operation using one | |||
of the special stateids is being performed and the scope of the lock | of the special stateids is being performed and the scope of the lock | |||
to be granted would conflict with the READ or WRITE operation. This | to be granted would conflict with the READ or WRITE operation. This | |||
can occur when: | can occur when: | |||
o A mandatory byte range lock is requested with range that conflicts | o A mandatory byte-range lock is requested with a byte-range that | |||
with the range of the READ or WRITE operation. For the purposes | conflicts with the byte-range of the READ or WRITE operation. For | |||
of this paragraph, a conflict occurs when a shared lock is | the purposes of this paragraph, a conflict occurs when a shared | |||
requested and a WRITE operation is being performed, or an | lock is requested and a WRITE operation is being performed, or an | |||
exclusive lock is requested and either a READ or a WRITE operation | exclusive lock is requested and either a READ or a WRITE operation | |||
is being performed. | is being performed. | |||
o A share reservation is requested which denies reading and or | o A share reservation is requested that denies reading and/or | |||
writing and the corresponding operation is being performed. | writing and the corresponding operation is being performed. | |||
o A delegation is to be granted and the delegation type would | o A delegation is to be granted and the delegation type would | |||
prevent the I/O operation, i.e. READ and WRITE conflict with a | prevent the I/O operation, i.e., READ and WRITE conflict with an | |||
write delegation and WRITE conflicts with a read delegation. | OPEN_DELEGATE_WRITE delegation and WRITE conflicts with an | |||
OPEN_DELEGATE_READ delegation. | ||||
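The three cases listed above amount to a small conflict table. The sketch below encodes it under assumed, hypothetical labels for the kind of state being granted; it only answers whether a grant has to wait for an in-progress special-stateid READ or WRITE.

```python
# Which in-progress operations conflict with each kind of grant.
# The grant labels are illustrative names, not protocol identifiers.
CONFLICTS = {
    "mandatory_shared_lock": {"WRITE"},
    "mandatory_exclusive_lock": {"READ", "WRITE"},
    "deny_read": {"READ"},
    "deny_write": {"WRITE"},
    "deny_both": {"READ", "WRITE"},
    "read_delegation": {"WRITE"},
    "write_delegation": {"READ", "WRITE"},
}


def must_delay_grant(grant, io_op, ranges_overlap=True):
    """True if the grant conflicts with an in-progress READ or WRITE."""
    # Byte-range locks only conflict when the byte-ranges intersect;
    # share reservations and delegations apply to the whole file.
    if grant.startswith("mandatory_") and not ranges_overlap:
        return False
    return io_op in CONFLICTS[grant]
```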
When a client holds a delegation, it needs to ensure that the stateid | When a client holds a delegation, it needs to ensure that the stateid | |||
sent conveys the association of operation with the delegation, to | sent conveys the association of operation with the delegation, to | |||
avoid the delegation from being avoidably recalled. When the | avoid the delegation from being avoidably recalled. When the | |||
delegation stateid, or a stateid open associated with that | delegation stateid, a stateid open associated with that delegation, | |||
delegation, or a stateid representing byte-range locks derived form | or a stateid representing byte-range locks derived from such an open | |||
such an open is used, the server knows that the READ, WRITE, or | is used, the server knows that the READ, WRITE, or SETATTR does not | |||
SETATTR does not conflict with the delegation, but is sent under the | conflict with the delegation but is sent under the aegis of the | |||
aegis of the delegation. Even though it is possible for the server | delegation. Even though it is possible for the server to determine | |||
to determine from the client ID (via the session ID) that the client | from the client ID (via the session ID) that the client does in fact | |||
does in fact have a delegation, the server is not obliged to check | have a delegation, the server is not obliged to check this, so using | |||
this, so using a special stateid can result in avoidable recall of | a special stateid can result in avoidable recall of the delegation. | |||
the delegation. | ||||
9.2. Lock Ranges | 9.2. Lock Ranges | |||
The protocol allows a lock-owner to request a lock with a byte range | The protocol allows a lock-owner to request a lock with a byte-range | |||
and then either upgrade, downgrade, or unlock a sub-range of the | and then either upgrade, downgrade, or unlock a sub-range of the | |||
initial lock, or a range that consists of a range which overlaps, | initial lock, or a byte-range that overlaps -- fully or partially -- | |||
fully or partially, that initial lock or a combination of a set of | either with that initial lock or a combination of a set of existing | |||
existing locks for the same lock-owner. It is expected that this | locks for the same lock-owner. It is expected that this will be an | |||
will be an uncommon type of request. In any case, servers or server | uncommon type of request. In any case, servers or server file | |||
file systems may not be able to support sub-range lock semantics. In | systems may not be able to support sub-range lock semantics. In the | |||
the event that a server receives a locking request that represents a | event that a server receives a locking request that represents a sub- | |||
sub-range of current locking state for the lock-owner, the server is | range of current locking state for the lock-owner, the server is | |||
allowed to return the error NFS4ERR_LOCK_RANGE to signify that it | allowed to return the error NFS4ERR_LOCK_RANGE to signify that it | |||
does not support sub-range lock operations. Therefore, the client | does not support sub-range lock operations. Therefore, the client | |||
should be prepared to receive this error and, if appropriate, report | should be prepared to receive this error and, if appropriate, report | |||
the error to the requesting application. | the error to the requesting application. | |||
The client is discouraged from combining multiple independent locking | The client is discouraged from combining multiple independent locking | |||
ranges that happen to be adjacent into a single request since the | ranges that happen to be adjacent into a single request since the | |||
server may not support sub-range requests and for reasons related to | server may not support sub-range requests for reasons related to the | |||
the recovery of file locking state in the event of server failure. | recovery of byte-range locking state in the event of server failure. | |||
As discussed in Section 8.4.2, the server may employ certain | As discussed in Section 8.4.2, the server may employ certain | |||
optimizations during recovery that work effectively only when the | optimizations during recovery that work effectively only when the | |||
client's behavior during lock recovery is similar to the client's | client's behavior during lock recovery is similar to the client's | |||
locking behavior prior to server failure. | locking behavior prior to server failure. | |||
9.3. Upgrading and Downgrading Locks | 9.3. Upgrading and Downgrading Locks | |||
If a client has a write lock on a byte-range, it can request an | If a client has a WRITE_LT lock on a byte-range, it can request an | |||
atomic downgrade of the lock to a read lock via the LOCK request, by | atomic downgrade of the lock to a READ_LT lock via the LOCK | |||
setting the type to READ_LT. If the server supports atomic | operation, by setting the type to READ_LT. If the server supports | |||
downgrade, the request will succeed. If not, it will return | atomic downgrade, the request will succeed. If not, it will return | |||
NFS4ERR_LOCK_NOTSUPP. The client should be prepared to receive this | NFS4ERR_LOCK_NOTSUPP. The client should be prepared to receive this | |||
error, and if appropriate, report the error to the requesting | error and, if appropriate, report the error to the requesting | |||
application. | application. | |||
If a client has a read lock on a byte-range, it can request an atomic | If a client has a READ_LT lock on a byte-range, it can request an | |||
upgrade of the lock to a write lock via the LOCK request by setting | atomic upgrade of the lock to a WRITE_LT lock via the LOCK operation | |||
the type to WRITE_LT or WRITEW_LT. If the server does not support | by setting the type to WRITE_LT or WRITEW_LT. If the server does not | |||
atomic upgrade, it will return NFS4ERR_LOCK_NOTSUPP. If the upgrade | support atomic upgrade, it will return NFS4ERR_LOCK_NOTSUPP. If the | |||
can be achieved without an existing conflict, the request will | upgrade can be achieved without an existing conflict, the request | |||
succeed. Otherwise, the server will return either NFS4ERR_DENIED or | will succeed. Otherwise, the server will return either | |||
NFS4ERR_DEADLOCK. The error NFS4ERR_DEADLOCK is returned if the | NFS4ERR_DENIED or NFS4ERR_DEADLOCK. The error NFS4ERR_DEADLOCK is | |||
client sent the LOCK request with the type set to WRITEW_LT and the | returned if the client sent the LOCK operation with the type set to | |||
server has detected a deadlock. The client should be prepared to | WRITEW_LT and the server has detected a deadlock. The client should | |||
receive such errors and if appropriate, report the error to the | be prepared to receive such errors and, if appropriate, report the | |||
requesting application. | error to the requesting application. | |||
9.4. Stateid Seqid Values and Byte-Range Locks | 9.4. Stateid Seqid Values and Byte-Range Locks | |||
When a lock or unlock request is done, passing a stateid, the stateid | When a LOCK or LOCKU operation is performed, the stateid returned has | |||
returned has the same "other" value and a "seqid" value that is | the same "other" value as the argument's stateid, and a "seqid" value | |||
incremented to reflect the occurrence of the lock or unlock request. | that is incremented (relative to the argument's stateid) to reflect | |||
The server MUST increment the value of the "seqid" field whenever | the occurrence of the LOCK or LOCKU operation. The server MUST | |||
there is any change to the locking status of any byte offset as | increment the value of the "seqid" field whenever there is any change | |||
described by any of locks covered by the stateid. A change in | to the locking status of any byte offset as described by any of the | |||
locking status includes a change from locked to unlocked or the | locks covered by the stateid. A change in locking status includes a | |||
reverse or a change from being locked for read to being locked for | change from locked to unlocked or the reverse or a change from being | |||
write or the reverse. | locked for READ_LT to being locked for WRITE_LT or the reverse. | |||
When there is no such change, as, for example when a range already | When there is no such change, as, for example, when a range already | |||
locked for write is locked again for write, the server MAY increment | locked for WRITE_LT is locked again for WRITE_LT, the server MAY | |||
the "seqid" value. | increment the "seqid" value. | |||
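To illustrate the "seqid" rule, the sketch below models the locking status as a mapping from byte offset to lock type and bumps the seqid only when that mapping changes. The per-byte dictionary and the `apply_lock_op` helper are assumptions chosen for clarity; a real server would track ranges, and it MAY also increment when nothing changes, which this sketch chooses not to do.

```python
def apply_lock_op(status, seqid, offset, length, new_type):
    """Apply a LOCK (new_type "READ_LT"/"WRITE_LT") or LOCKU (new_type None).

    status maps byte offset -> lock type; an absent key means unlocked.
    Returns the updated status and the seqid of the returned stateid.
    """
    updated = dict(status)
    for byte in range(offset, offset + length):
        if new_type is None:
            updated.pop(byte, None)
        else:
            updated[byte] = new_type
    if updated != status:
        # Any change in the locking status of any byte MUST bump seqid.
        seqid += 1
    return updated, seqid
```

For example, locking bytes 0-3 for WRITE_LT twice in a row changes the seqid only on the first call, while a later downgrade of bytes 2-3 to READ_LT changes it again.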
9.5. Issues with Multiple Open-Owners | 9.5. Issues with Multiple Open-Owners | |||
When the same file is opened by multiple open-owners, a client will | When the same file is opened by multiple open-owners, a client will | |||
have multiple open stateids for that file, each associated with a | have multiple OPEN stateids for that file, each associated with a | |||
different open-owner. In that case, there can be multiple LOCK and | different open-owner. In that case, there can be multiple LOCK and | |||
LOCKU requests for the same lock-owner sent using the different open | LOCKU requests for the same lock-owner sent using the different OPEN | |||
stateids, and so a situation may arise in which there are multiple | stateids, and so a situation may arise in which there are multiple | |||
stateids, each representing byte-range locks on the same file and | stateids, each representing byte-range locks on the same file and | |||
held by the same lock-owner but each associated with a different | held by the same lock-owner but each associated with a different | |||
open-owner. | open-owner. | |||
In such a situation, the locking status of each byte (i.e. whether it | In such a situation, the locking status of each byte (i.e., whether | |||
is locked, the read or write mode of the lock and the lock-owner | it is locked, the READ_LT or WRITE_LT type of the lock, and the lock- | |||
holding the lock) MUST reflect the last LOCK or LOCKU operation done | owner holding the lock) MUST reflect the last LOCK or LOCKU operation | |||
for the lock-owner in question, independent of the stateid through | done for the lock-owner in question, independent of the stateid | |||
which the request was sent. | through which the request was sent. | |||
When a byte is locked by the lock-owner in question, the open-owner | When a byte is locked by the lock-owner in question, the open-owner | |||
to which that lock is assigned SHOULD be that of the open-owner | to which that byte-range lock is assigned SHOULD be that of the open- | |||
associated with the stateid through which the last LOCK of that byte | owner associated with the stateid through which the last LOCK of that | |||
was done. When there is a change in the open-owner associated with | byte was done. When there is a change in the open-owner associated | |||
locks for the stateid through which a LOCK or LOCKU was done, the | with locks for the stateid through which a LOCK or LOCKU was done, | |||
"seqid" field of the stateid MUST be incremented, even if the | the "seqid" field of the stateid MUST be incremented, even if the | |||
locking, in terms of lock-owners has not changed. When there is a | locking, in terms of lock-owners has not changed. When there is a | |||
change to the set of locked bytes associated with a different stateid | change to the set of locked bytes associated with a different stateid | |||
for the same lock-owner, i.e. associated with a different open-owner, | for the same lock-owner, i.e., associated with a different open- | |||
the "seqid" value for that stateid MUST NOT be incremented. | owner, the "seqid" value for that stateid MUST NOT be incremented. | |||
9.6. Blocking Locks | 9.6. Blocking Locks | |||
Some clients require the support of blocking locks. While NFSv4.1 | Some clients require the support of blocking locks. While NFSv4.1 | |||
provides a callback when a previously unavailable lock becomes | provides a callback when a previously unavailable lock becomes | |||
available, this is an OPTIONAL feature and clients cannot depend on | available, this is an OPTIONAL feature and clients cannot depend on | |||
its presence. Clients need to be prepared to continually poll for | its presence. Clients need to be prepared to continually poll for | |||
the lock. This presents a fairness problem. Two of the lock types, | the lock. This presents a fairness problem. Two of the lock types, | |||
READW and WRITEW, are used to indicate to the server that the client | READW_LT and WRITEW_LT, are used to indicate to the server that the | |||
is requesting a blocking lock. When the callback is not used, the | client is requesting a blocking lock. When the callback is not used, | |||
server should maintain an ordered list of pending blocking locks. | the server should maintain an ordered list of pending blocking locks. | |||
When the conflicting lock is released, the server may wait for the | When the conflicting lock is released, the server may wait for the | |||
period of time equal to lease_time for the first waiting client to | period of time equal to lease_time for the first waiting client to | |||
re-request the lock. After the lease period expires, the next | re-request the lock. After the lease period expires, the next | |||
waiting client request is allowed the lock. Clients are required to | waiting client request is allowed the lock. Clients are required to | |||
poll at an interval sufficiently small that it is likely to acquire | poll at an interval sufficiently small that it is likely to acquire | |||
the lock in a timely manner. The server is not required to maintain | the lock in a timely manner. The server is not required to maintain | |||
a list of pending blocked locks as it is used to increase fairness | a list of pending blocked locks as it is used to increase fairness | |||
and not correct operation. Because of the unordered nature of crash | and not correct operation. Because of the unordered nature of crash | |||
recovery, storing of lock state to stable storage would be required | recovery, storing of lock state to stable storage would be required | |||
to guarantee ordered granting of blocking locks. | to guarantee ordered granting of blocking locks. | |||
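One possible shape for the ordered list of pending blocking locks is sketched below for a single contested lock. The class and method names are hypothetical, time is passed in explicitly as a number of seconds, and the reading that each successive waiter gets its own lease_time window is an interpretation of the text rather than something it states outright.

```python
from collections import deque


class PendingBlockingLocks:
    """Fairness queue for one lock that is currently held by someone else."""

    def __init__(self, lease_time):
        self.lease_time = lease_time
        self.waiters = deque()   # clients in order of first denied request
        self.released_at = None  # None while the conflicting lock is held

    def denied(self, client, blocking):
        if blocking:
            # READW_LT / WRITEW_LT: remember the client's place in line.
            if client not in self.waiters:
                self.waiters.append(client)
        elif client in self.waiters:
            # A denied nonblocking retry signals the client's final poll.
            self.waiters.remove(client)

    def lock_released(self, now):
        self.released_at = now

    def may_grant(self, client, now):
        if self.released_at is None:
            return False
        # Skip waiters that failed to re-request within their window.
        while self.waiters and now >= self.released_at + self.lease_time:
            self.waiters.popleft()
            self.released_at += self.lease_time
        if self.waiters and self.waiters[0] != client:
            return False
        if client in self.waiters:
            self.waiters.remove(client)
        self.released_at = None  # the lock is held again
        return True
```

Because the queue lives only in memory, it is lost on a server restart, which is consistent with the list serving fairness rather than correctness.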
Servers may also note the lock types and delay returning denial of | Servers may also note the lock types and delay returning denial of | |||
the request to allow extra time for a conflicting lock to be | the request to allow extra time for a conflicting lock to be | |||
released, allowing a successful return. In this way, clients can | released, allowing a successful return. In this way, clients can | |||
avoid the burden of needlessly frequent polling for blocking locks. | avoid the burden of needless frequent polling for blocking locks. | |||
The server should take care in the length of delay in the event the | The server should take care in the length of delay in the event the | |||
client retransmits the request. | client retransmits the request. | |||
If a server receives a blocking lock request, denies it, and then | If a server receives a blocking LOCK operation, denies it, and then | |||
later receives a nonblocking request for the same lock, which is also | later receives a nonblocking request for the same lock, which is also | |||
denied, then it should remove the lock in question from its list of | denied, then it should remove the lock in question from its list of | |||
pending blocking locks. Clients should use such a nonblocking | pending blocking locks. Clients should use such a nonblocking | |||
request to indicate to the server that this is the last time they | request to indicate to the server that this is the last time they | |||
intend to poll for the lock, as may happen when the process | intend to poll for the lock, as may happen when the process | |||
requesting the lock is interrupted. This is a courtesy to the | requesting the lock is interrupted. This is a courtesy to the | |||
server, to prevent it from unnecessarily waiting a lease period | server, to prevent it from unnecessarily waiting a lease period | |||
before granting other lock requests. However, clients are not | before granting other LOCK operations. However, clients are not | |||
required to perform this courtesy, and servers must not depend on | required to perform this courtesy, and servers must not depend on | |||
them doing so. Also, clients must be prepared for the possibility | them doing so. Also, clients must be prepared for the possibility | |||
that this final locking request will be accepted. | that this final locking request will be accepted. | |||
When server indicates, via the flag OPEN4_RESULT_MAY_NOTIFY_LOCK, | When a server indicates, via the flag OPEN4_RESULT_MAY_NOTIFY_LOCK, | |||
that CB_NOTIFY_LOCK callbacks will be done for the current open file, | that CB_NOTIFY_LOCK callbacks might be done for the current open | |||
the client should take notice of this, but, since this is a hint, | file, the client should take notice of this, but, since this is a | |||
cannot rely on a CB_NOTIFY_LOCK always being done. A client may | hint, cannot rely on a CB_NOTIFY_LOCK always being done. A client | |||
reasonably reduce the frequency with which it polls for a denied | may reasonably reduce the frequency with which it polls for a denied | |||
lock, since the greater latency that might occur is likely to be | lock, since the greater latency that might occur is likely to be | |||
eliminated given a prompt callback, but it still needs to poll. When | eliminated given a prompt callback, but it still needs to poll. When | |||
it receives a CB_NOTIFY_LOCK it should promptly try to obtain the | it receives a CB_NOTIFY_LOCK, it should promptly try to obtain the | |||
lock, but it should be aware that other clients may polling and the | lock, but it should be aware that other clients may be polling and | |||
server is under no obligation to reserve the lock for that particular | that the server is under no obligation to reserve the lock for that | |||
client. | particular client. | |||
9.7. Share Reservations | 9.7. Share Reservations | |||
A share reservation is a mechanism to control access to a file. It | A share reservation is a mechanism to control access to a file. It | |||
is a separate and independent mechanism from byte-range locking. | is a separate and independent mechanism from byte-range locking. | |||
When a client opens a file, it sends an OPEN operation to the server | When a client opens a file, it sends an OPEN operation to the server | |||
specifying the type of access required (READ, WRITE, or BOTH) and the | specifying the type of access required (READ, WRITE, or BOTH) and the | |||
type of access to deny others (deny NONE, READ, WRITE, or BOTH). If | type of access to deny others (OPEN4_SHARE_DENY_NONE, | |||
the OPEN fails the client will fail the application's open request. | OPEN4_SHARE_DENY_READ, OPEN4_SHARE_DENY_WRITE, or | |||
OPEN4_SHARE_DENY_BOTH). If the OPEN fails, the client will fail the | ||||
application's open request. | ||||
Pseudo-code definition of the semantics: | Pseudo-code definition of the semantics: | |||
if (request.access == 0) { | if (request.access == 0) { | |||
return (NFS4ERR_INVAL) | return (NFS4ERR_INVAL) | |||
} else { | } else { | |||
if ((request.access & file_state.deny) || | if ((request.access & file_state.deny) || | |||
(request.deny & file_state.access)) { | (request.deny & file_state.access)) { | |||
return (NFS4ERR_SHARE_DENIED) | return (NFS4ERR_SHARE_DENIED) | |||
} | } | |||
skipping to change at page 192, line 38 | skipping to change at page 192, line 19 | |||
const OPEN4_SHARE_DENY_NONE = 0x00000000; | const OPEN4_SHARE_DENY_NONE = 0x00000000; | |||
const OPEN4_SHARE_DENY_READ = 0x00000001; | const OPEN4_SHARE_DENY_READ = 0x00000001; | |||
const OPEN4_SHARE_DENY_WRITE = 0x00000002; | const OPEN4_SHARE_DENY_WRITE = 0x00000002; | |||
const OPEN4_SHARE_DENY_BOTH = 0x00000003; | const OPEN4_SHARE_DENY_BOTH = 0x00000003; | |||
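The share reservation test from the pseudo-code in Section 9.7 can be tried out with a runnable sketch such as the one below. It assumes that `file_access` and `file_deny` are the bitwise union of the access and deny bits of all existing opens of the file, uses the OPEN4_SHARE_ACCESS_* values from the protocol definition, and returns status names as strings purely for illustration.

```python
OPEN4_SHARE_ACCESS_READ = 0x00000001
OPEN4_SHARE_ACCESS_WRITE = 0x00000002
OPEN4_SHARE_ACCESS_BOTH = 0x00000003

OPEN4_SHARE_DENY_NONE = 0x00000000
OPEN4_SHARE_DENY_READ = 0x00000001
OPEN4_SHARE_DENY_WRITE = 0x00000002
OPEN4_SHARE_DENY_BOTH = 0x00000003


def check_share(request_access, request_deny, file_access, file_deny):
    """Decide whether a new OPEN is compatible with existing reservations."""
    if request_access == 0:
        # An OPEN has to ask for at least one kind of access.
        return "NFS4ERR_INVAL"
    # Reject if the request wants access that an existing open denies,
    # or wants to deny access that an existing open already holds.
    if (request_access & file_deny) or (request_deny & file_access):
        return "NFS4ERR_SHARE_DENIED"
    return "NFS4_OK"
```

For instance, if an existing open holds read access and denies writes, a second open for read with no deny bits is accepted, while a second open for write is refused.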
9.8. OPEN/CLOSE Operations | 9.8. OPEN/CLOSE Operations | |||
To provide correct share semantics, a client MUST use the OPEN | To provide correct share semantics, a client MUST use the OPEN | |||
operation to obtain the initial filehandle and indicate the desired | operation to obtain the initial filehandle and indicate the desired | |||
access and what access, if any, to deny. Even if the client intends | access and what access, if any, to deny. Even if the client intends | |||
to use a special stateid for anonymous state or read bypass, it must | to use a special stateid for anonymous state or READ bypass, it must | |||
still obtain the filehandle for the regular file with the OPEN | still obtain the filehandle for the regular file with the OPEN | |||
operation so the appropriate share semantics can be applied. For | operation so the appropriate share semantics can be applied. Clients | |||
clients that do not have a deny mode built into their open | that do not have a deny mode built into their programming interfaces | |||
programming interfaces, deny equal to NONE should be used. | for opening a file should request a deny mode of | |||
OPEN4_SHARE_DENY_NONE. | ||||
The OPEN operation with the CREATE flag, also subsumes the CREATE | The OPEN operation with the CREATE flag also subsumes the CREATE | |||
operation for regular files as used in previous versions of the NFS | operation for regular files as used in previous versions of the NFS | |||
protocol. This allows a create with a share to be done atomically. | protocol. This allows a create with a share to be done atomically. | |||
The CLOSE operation removes all share reservations held by the open- | The CLOSE operation removes all share reservations held by the open- | |||
owner on that file. If byte-range locks are held, the client SHOULD | owner on that file. If byte-range locks are held, the client SHOULD | |||
release all locks before sending a CLOSE operation. The server MAY | release all locks before sending a CLOSE operation. The server MAY | |||
free all outstanding locks on CLOSE but some servers may not support | free all outstanding locks on CLOSE, but some servers may not support | |||
the CLOSE of a file that still has byte-range locks held. The server | the CLOSE of a file that still has byte-range locks held. The server | |||
MUST return failure, NFS4ERR_LOCKS_HELD, if any locks would exist | MUST return failure, NFS4ERR_LOCKS_HELD, if any locks would exist | |||
after the CLOSE. | after the CLOSE. | |||
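The CLOSE rule above can be sketched in Python as follows. This is a minimal illustration, not a server implementation: the OpenFile record, the close_file helper, and the frees_locks_on_close flag are hypothetical names, and status codes are given by name rather than by wire value.

```python
from dataclasses import dataclass, field


@dataclass
class OpenFile:
    """Hypothetical server-side record for one open-owner on one file."""
    share_reserved: bool = True
    byte_range_locks: list = field(default_factory=list)


def close_file(state: OpenFile, frees_locks_on_close: bool) -> str:
    """Return the status name a server might give for a CLOSE."""
    if state.byte_range_locks:
        if not frees_locks_on_close:
            # Locks would still exist after the CLOSE, so it must fail.
            return "NFS4ERR_LOCKS_HELD"
        # The server MAY free outstanding locks as part of the CLOSE.
        state.byte_range_locks.clear()
    # CLOSE removes the share reservations held by the open-owner.
    state.share_reserved = False
    return "NFS4_OK"
```

A client that releases its byte-range locks before sending CLOSE never depends on which of the two server behaviors it meets.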
The LOOKUP operation will return a filehandle without establishing | The LOOKUP operation will return a filehandle without establishing | |||
any lock state on the server. Without a valid stateid, the server | any lock state on the server. Without a valid stateid, the server | |||
will assume the client has the least access. For example, a file | will assume that the client has the least access. For example, if | |||
opened with deny READ/WRITE using a filehandle obtained through | one client opened a file with OPEN4_SHARE_DENY_BOTH and another | |||
LOOKUP could only be read using the special read bypass stateid and | client accesses the file via a filehandle obtained through LOOKUP, | |||
could not be written at all because it would not have a valid stateid | the second client could only read the file using the special read | |||
and the special anonymous stateid would not be allowed access. | bypass stateid. The second client could not WRITE the file at all | |||
because it would not have a valid stateid from OPEN and the special | ||||
anonymous stateid would not be allowed access. | ||||
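The LOOKUP example above can be modeled with a small Python check. This is a simplified sketch under stated assumptions: it considers only the deny bits of existing opens (no byte-range locks or delegations), and the function name and the string tags for the two special stateids are hypothetical.

```python
# Deny bit values used by OPEN share reservations.
OPEN4_SHARE_DENY_NONE = 0
OPEN4_SHARE_DENY_READ = 1
OPEN4_SHARE_DENY_WRITE = 2
OPEN4_SHARE_DENY_BOTH = 3

# Hypothetical tags for the two special stateids discussed in the text.
ANONYMOUS = "anonymous"
READ_BYPASS = "read_bypass"


def special_stateid_io_allowed(op, stateid_kind, deny_bits):
    """Decide whether I/O with a special stateid passes the share check.

    deny_bits is the union of deny bits held by other clients' opens.
    """
    if op == "READ":
        if stateid_kind == READ_BYPASS:
            # The read bypass stateid lets READ skip the deny check.
            return True
        return not (deny_bits & OPEN4_SHARE_DENY_READ)
    if op == "WRITE":
        # For WRITE, neither special stateid overrides a deny mode.
        return not (deny_bits & OPEN4_SHARE_DENY_WRITE)
    raise ValueError("op must be 'READ' or 'WRITE'")
```

With another client holding OPEN4_SHARE_DENY_BOTH, only the READ with the read bypass stateid is allowed, which matches the example in the text.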
9.9. Open Upgrade and Downgrade | 9.9. Open Upgrade and Downgrade | |||
When an OPEN is done for a file and the open-owner for which the open | When an OPEN is done for a file and the open-owner for which the OPEN | |||
is being done already has the file open, the result is to upgrade the | is being done already has the file open, the result is to upgrade the | |||
open file status maintained on the server to include the access and | open file status maintained on the server to include the access and | |||
deny bits specified by the new OPEN as well as those for the existing | deny bits specified by the new OPEN as well as those for the existing | |||
OPEN. The result is that there is one open file, as far as the | OPEN. The result is that there is one open file, as far as the | |||
protocol is concerned, and it includes the union of the access and | protocol is concerned, and it includes the union of the access and | |||
deny bits for all of the OPEN requests completed. The open is | deny bits for all of the OPEN requests completed. The OPEN is | |||
represented by a single stateid whose "other" values matches that of | represented by a single stateid whose "other" value matches that of | |||
the original open, and whose "seqid" value is incremented to reflect | the original open, and whose "seqid" value is incremented to reflect | |||
the occurrence of the upgrade. The increment is required in cases in | the occurrence of the upgrade. The increment is required in cases in | |||
which the "upgrade" results in no change to the open mode (e.g. an | which the "upgrade" results in no change to the open mode (e.g., an | |||
OPEN is done for read when the existing open file is opened for read- | OPEN is done for read when the existing open file is opened for | |||
write). Only a single CLOSE will be done to reset the effects of | OPEN4_SHARE_ACCESS_BOTH). Only a single CLOSE will be done to reset | |||
both OPENs. The client may use the stateid returned by the OPEN | the effects of both OPENs. The client may use the stateid returned | |||
effecting the upgrade or a stateid sharing the same "other" | by the OPEN effecting the upgrade or a stateid sharing the same | |||
field and a seqid of zero, although care needs to be taken as far as | "other" field and a seqid of zero, although care needs to be taken as | |||
upgrades which happen while the CLOSE is pending. Note that the | far as upgrades that happen while the CLOSE is pending. Note that | |||
client, when sending the OPEN operation, may not know that the same | the client, when sending the OPEN, may not know that the same file is | |||
file is in fact being opened. The above only applies if both OPENs | in fact being opened. The above only applies if both OPENs result in | |||
result in the OPENed object being designated by the same filehandle. | the OPENed object being designated by the same filehandle. | |||
When the server chooses to export multiple filehandles corresponding | When the server chooses to export multiple filehandles corresponding | |||
to the same file object and returns different filehandles on two | to the same file object and returns different filehandles on two | |||
different OPENs of the same file object, the server MUST NOT "OR" | different OPENs of the same file object, the server MUST NOT "OR" | |||
together the access and deny bits and coalesce the two open files. | together the access and deny bits and coalesce the two open files. | |||
Instead the server must maintain separate OPENs with separate | Instead, the server must maintain separate OPENs with separate | |||
stateids and will require separate CLOSEs to free them. | stateids and will require separate CLOSEs to free them. | |||
When multiple open files on the client are merged into a single open | When multiple open files on the client are merged into a single OPEN | |||
file object on the server, the close of one of the open files (on the | file object on the server, the close of one of the open files (on the | |||
client) may necessitate change of the access and deny status of the | client) may necessitate change of the access and deny status of the | |||
open file on the server. This is because the union of the access and | open file on the server. This is because the union of the access and | |||
deny bits for the remaining opens may be smaller (i.e. a proper | deny bits for the remaining opens may be smaller (i.e., a proper | |||
subset) than previously. The OPEN_DOWNGRADE operation is used to | subset) than previously. The OPEN_DOWNGRADE operation is used to | |||
make the necessary change and the client should use it to update the | make the necessary change and the client should use it to update the | |||
server so that share reservation requests by other clients are | server so that share reservation requests by other clients are | |||
handled properly. The stateid returned has the same "other" field as | handled properly. The stateid returned has the same "other" field as | |||
that passed to the server. The "seqid" value in the returned stateid | that passed to the server. The "seqid" value in the returned stateid | |||
MUST be incremented, even is situation in which there is no change | MUST be incremented, even in situations in which there is no change | |||
the access and deny bits for the file. | to the access and deny bits for the file. | |||
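The upgrade and downgrade rules of this section can be sketched in Python as follows. The OpenState record and the helper names are hypothetical. The handling of seqid wraparound (skipping zero, because a seqid of zero has a special meaning) is an assumption of this sketch and is not taken from the text above.

```python
from dataclasses import dataclass

# Access and deny bit values used by OPEN.
OPEN4_SHARE_ACCESS_READ = 1
OPEN4_SHARE_ACCESS_WRITE = 2
OPEN4_SHARE_ACCESS_BOTH = 3
OPEN4_SHARE_DENY_NONE = 0
OPEN4_SHARE_DENY_WRITE = 2


@dataclass
class OpenState:
    """Hypothetical record for one open-owner and one filehandle."""
    other: bytes
    seqid: int
    access: int
    deny: int


def bump_seqid(seqid):
    # Assumption: 32-bit counter that skips zero when it wraps.
    nxt = (seqid + 1) & 0xFFFFFFFF
    return nxt if nxt != 0 else 1


def open_upgrade(state, access, deny):
    """A second OPEN by the same open-owner ORs in the new bits."""
    state.access |= access
    state.deny |= deny
    # The seqid is incremented even if the open mode did not change.
    state.seqid = bump_seqid(state.seqid)
    return (state.other, state.seqid)


def open_downgrade(state, access, deny):
    """OPEN_DOWNGRADE may only reduce the access and deny bits."""
    if access & ~state.access or deny & ~state.deny:
        raise ValueError("downgrade may not add access or deny bits")
    state.access, state.deny = access, deny
    # Incremented even when the bits are unchanged.
    state.seqid = bump_seqid(state.seqid)
    return (state.other, state.seqid)
```

In both helpers the returned stateid keeps the same "other" value, so a single CLOSE still ends the merged open.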
9.10. Parallel OPENs | 9.10. Parallel OPENs | |||
Unlike the case of NFSv4.0, in which OPEN operations for the same | Unlike the case of NFSv4.0, in which OPEN operations for the same | |||
open-owner are inherently serialized because of the owner-based | open-owner are inherently serialized because of the owner-based | |||
seqid, multiple OPENs for the same open-owner may be done in | seqid, multiple OPENs for the same open-owner may be done in | |||
parallel. When clients do this, they may encounter situations in | parallel. When clients do this, they may encounter situations in | |||
which, because of the existence of hard links, two OPEN operations | which, because of the existence of hard links, two OPEN operations | |||
may turn out to open the same file, with a later OPEN performed being | may turn out to open the same file, with a later OPEN performed being | |||
an upgrade of the first, with this fact only visible to the client | an upgrade of the first, with this fact only visible to the client | |||
skipping to change at page 194, line 32 | skipping to change at page 194, line 16 | |||
were performed by examining the stateids returned by the OPENs. | were performed by examining the stateids returned by the OPENs. | |||
Stateids that share a common value of the "other" field can be | Stateids that share a common value of the "other" field can be | |||
recognized as having opened the same file, with the order of the | recognized as having opened the same file, with the order of the | |||
operations determinable from the order of the "seqid" fields, mod any | operations determinable from the order of the "seqid" fields, mod any | |||
possible wraparound of the 32-bit field. | possible wraparound of the 32-bit field. | |||
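One way a client might compare stateids returned by parallel OPENs is sketched below in Python. Stateids are modeled as hypothetical (other, seqid) tuples. The half-range comparison is borrowed from serial number arithmetic and is an assumption of this sketch; the text above only requires that wraparound of the 32-bit field be taken into account.

```python
def same_open(stateid_a, stateid_b):
    """Stateids with equal "other" fields refer to the same open file."""
    return stateid_a[0] == stateid_b[0]


def seqid_is_later(earlier, later):
    """True if 'later' follows 'earlier', allowing for 32-bit wraparound.

    Assumes the two values are less than 2**31 increments apart.
    """
    diff = (later - earlier) & 0xFFFFFFFF
    return 0 < diff < 0x80000000
```

For example, a seqid of 3 is treated as later than 0xFFFFFFFE because the counter has wrapped in between.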
When the possibility exists that the client will send multiple OPENs | When the possibility exists that the client will send multiple OPENs | |||
for the same open-owner in parallel, it may be the case that an open | for the same open-owner in parallel, it may be the case that an open | |||
upgrade may happen without the client knowing beforehand that this | upgrade may happen without the client knowing beforehand that this | |||
could happen. Because of this possibility, CLOSEs and | could happen. Because of this possibility, CLOSEs and | |||
OPEN_DOWNGRADEs, should generally be sent with a non-zero seqid in | OPEN_DOWNGRADEs should generally be sent with a non-zero seqid in the | |||
the stateid, to avoid the possibility that the status change | stateid, to avoid the possibility that the status change associated | |||
associated with an open upgrade is inadvertently lost. | with an open upgrade is inadvertently lost. | |||
9.11. Reclaim of Open and Byte-Range Locks | 9.11. Reclaim of Open and Byte-Range Locks | |||
Special forms of the LOCK and OPEN operations are provided when it is | Special forms of the LOCK and OPEN operations are provided when it is | |||
necessary to re-establish byte-range locks or opens after a server | necessary to re-establish byte-range locks or opens after a server | |||
failure. | failure. | |||
o To reclaim existing opens, an OPEN operation is performed using a | o To reclaim existing opens, an OPEN operation is performed using a | |||
CLAIM_PREVIOUS. Because the client, in this type of situation, | CLAIM_PREVIOUS. Because the client, in this type of situation, | |||
will have already opened the file and have the filehandle of the | will have already opened the file and have the filehandle of the | |||
target file, this operation requires that the current filehandle | target file, this operation requires that the current filehandle | |||
be the target file, rather than a directory and no file name is | be the target file, rather than a directory, and no file name is | |||
specified. | specified. | |||
o To reclaim byte-range locks, a LOCK operation with the reclaim | o To reclaim byte-range locks, a LOCK operation with the reclaim | |||
parameter set to true is used. | parameter set to true is used. | |||
Reclaims of opens associated with delegations are discussed in | Reclaims of opens associated with delegations are discussed in | |||
Section 10.2.1. | Section 10.2.1. | |||
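The two reclaim forms can be illustrated with a short Python sketch that builds request descriptions for a client's saved state. The dictionary layout and the build_reclaim_requests helper are hypothetical. A real client would encode these as OPEN and LOCK operations within COMPOUND requests.

```python
def build_reclaim_requests(opens, locks):
    """Describe the reclaim operations for saved opens and locks.

    opens: list of dicts with keys "fh", "access", "deny"
    locks: list of dicts with keys "fh", "offset", "length"
    """
    requests = []
    for o in opens:
        requests.append({
            "op": "OPEN",
            "claim": "CLAIM_PREVIOUS",
            # The current filehandle is the target file itself,
            # not its directory, and no file name is supplied.
            "current_fh": o["fh"],
            "name": None,
            "access": o["access"],
            "deny": o["deny"],
        })
    for lk in locks:
        requests.append({
            "op": "LOCK",
            "reclaim": True,
            "current_fh": lk["fh"],
            "offset": lk["offset"],
            "length": lk["length"],
        })
    return requests
```

Opens are listed before locks here because a byte-range lock is held under an open of the same file.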
10. Client-Side Caching | 10. Client-Side Caching | |||
Client-side caching of data, of file attributes, and of file names is | Client-side caching of data, of file attributes, and of file names is | |||
essential to providing good performance with the NFS protocol. | essential to providing good performance with the NFS protocol. | |||
Providing distributed cache coherence is a difficult problem and | Providing distributed cache coherence is a difficult problem, and | |||
previous versions of the NFS protocol have not attempted it. | previous versions of the NFS protocol have not attempted it. | |||
Instead, several NFS client implementation techniques have been used | Instead, several NFS client implementation techniques have been used | |||
to reduce the problems that a lack of coherence poses for users. | to reduce the problems that a lack of coherence poses for users. | |||
These techniques have not been clearly defined by earlier protocol | These techniques have not been clearly defined by earlier protocol | |||
specifications and it is often unclear what is valid or invalid | specifications, and it is often unclear what is valid or invalid | |||
client behavior. | client behavior. | |||
The NFSv4.1 protocol uses many techniques similar to those that have | The NFSv4.1 protocol uses many techniques similar to those that have | |||
been used in previous protocol versions. The NFSv4.1 protocol does | been used in previous protocol versions. The NFSv4.1 protocol does | |||
not provide distributed cache coherence. However, it defines a more | not provide distributed cache coherence. However, it defines a more | |||
limited set of caching guarantees to allow locks and share | limited set of caching guarantees to allow locks and share | |||
reservations to be used without destructive interference from client | reservations to be used without destructive interference from client- | |||
side caching. | side caching. | |||
In addition, the NFSv4.1 protocol introduces a delegation mechanism | In addition, the NFSv4.1 protocol introduces a delegation mechanism, | |||
which allows many decisions normally made by the server to be made | which allows many decisions normally made by the server to be made | |||
locally by clients. This mechanism provides efficient support of the | locally by clients. This mechanism provides efficient support of the | |||
common cases where sharing is infrequent or where sharing is read- | common cases where sharing is infrequent or where sharing is read- | |||
only. | only. | |||
10.1. Performance Challenges for Client-Side Caching | 10.1. Performance Challenges for Client-Side Caching | |||
Caching techniques used in previous versions of the NFS protocol have | Caching techniques used in previous versions of the NFS protocol have | |||
been successful in providing good performance. However, several | been successful in providing good performance. However, several | |||
scalability challenges can arise when those techniques are used with | scalability challenges can arise when those techniques are used with | |||
very large numbers of clients. This is particularly true when | very large numbers of clients. This is particularly true when | |||
clients are geographically distributed which classically increases | clients are geographically distributed, which classically increases | |||
the latency for cache revalidation requests. | the latency for cache revalidation requests. | |||
The previous versions of the NFS protocol repeat their file data | The previous versions of the NFS protocol repeat their file data | |||
cache validation requests at the time the file is opened. This | cache validation requests at the time the file is opened. This | |||
behavior can have serious performance drawbacks. A common case is | behavior can have serious performance drawbacks. A common case is | |||
one in which a file is only accessed by a single client. Therefore, | one in which a file is only accessed by a single client. Therefore, | |||
sharing is infrequent. | sharing is infrequent. | |||
In this case, repeated reference to the server to find that no | In this case, repeated references to the server to find that no | |||
conflicts exist is expensive. A better option with regard to | conflicts exist are expensive. A better option with regard to | |||
performance is to allow a client that repeatedly opens a file to do | performance is to allow a client that repeatedly opens a file to do | |||
so without reference to the server. This is done until potentially | so without reference to the server. This is done until potentially | |||
conflicting operations from another client actually occur. | conflicting operations from another client actually occur. | |||
A similar situation arises in connection with file locking. Sending | A similar situation arises in connection with byte-range locking. | |||
file lock and unlock requests to the server as well as the read and | Sending LOCK and LOCKU operations as well as the READ and WRITE | |||
write requests necessary to make data caching consistent with the | operations necessary to make data caching consistent with the locking | |||
locking semantics (see Section 10.3.2) can severely limit | semantics (see Section 10.3.2) can severely limit performance. When | |||
performance. When locking is used to provide protection against | locking is used to provide protection against infrequent conflicts, a | |||
infrequent conflicts, a large penalty is incurred. This penalty may | large penalty is incurred. This penalty may discourage the use of | |||
discourage the use of file locking by applications. | byte-range locking by applications. | |||
The NFSv4.1 protocol provides more aggressive caching strategies with | The NFSv4.1 protocol provides more aggressive caching strategies with | |||
the following design goals: | the following design goals: | |||
o Compatibility with a large range of server semantics. | o Compatibility with a large range of server semantics. | |||
o Providing the same caching benefits as previous versions of the | o Providing the same caching benefits as previous versions of the | |||
NFS protocol when unable to support the more aggressive model. | NFS protocol when unable to support the more aggressive model. | |||
o Requirements for aggressive caching are organized so that a large | o Requirements for aggressive caching are organized so that a large | |||
skipping to change at page 196, line 41 | skipping to change at page 196, line 24 | |||
Recallable delegation of server responsibilities for a file to a | Recallable delegation of server responsibilities for a file to a | |||
client improves performance by avoiding repeated requests to the | client improves performance by avoiding repeated requests to the | |||
server in the absence of inter-client conflict. With the use of a | server in the absence of inter-client conflict. With the use of a | |||
"callback" RPC from server to client, a server recalls delegated | "callback" RPC from server to client, a server recalls delegated | |||
responsibilities when another client engages in sharing of a | responsibilities when another client engages in sharing of a | |||
delegated file. | delegated file. | |||
A delegation is passed from the server to the client, specifying the | A delegation is passed from the server to the client, specifying the | |||
object of the delegation and the type of delegation. There are | object of the delegation and the type of delegation. There are | |||
different types of delegations but each type contains a stateid to be | different types of delegations, but each type contains a stateid to | |||
used to represent the delegation when performing operations that | be used to represent the delegation when performing operations that | |||
depend on the delegation. This stateid is similar to those | depend on the delegation. This stateid is similar to those | |||
associated with locks and share reservations but differs in that the | associated with locks and share reservations but differs in that the | |||
stateid for a delegation is associated with a client ID and may be | stateid for a delegation is associated with a client ID and may be | |||
used on behalf of all the open-owners for the given client. A | used on behalf of all the open-owners for the given client. A | |||
delegation is made to the client as a whole and not to any specific | delegation is made to the client as a whole and not to any specific | |||
process or thread of control within it. | process or thread of control within it. | |||
The backchannel is established by CREATE_SESSION and | The backchannel is established by CREATE_SESSION and | |||
BIND_CONN_TO_SESSION, and the client is required to maintain it. | BIND_CONN_TO_SESSION, and the client is required to maintain it. | |||
Because the backchannel may be down, even temporarily, correct | Because the backchannel may be down, even temporarily, correct | |||
protocol operation does not depend on it. Preliminary testing of | protocol operation does not depend on it. Preliminary testing of | |||
backchannel functionality by means of a CB_COMPOUND procedure with a | backchannel functionality by means of a CB_COMPOUND procedure with a | |||
single operation, CB_SEQUENCE, can be used to check the continuity of | single operation, CB_SEQUENCE, can be used to check the continuity of | |||
the backchannel. A server avoids delegating responsibilities until | the backchannel. A server avoids delegating responsibilities until | |||
it has determined that the backchannel exists. Because the granting | it has determined that the backchannel exists. Because the granting | |||
of a delegation is always conditional upon the absence of conflicting | of a delegation is always conditional upon the absence of conflicting | |||
access, clients MUST NOT assume that a delegation will be granted and | access, clients MUST NOT assume that a delegation will be granted and | |||
they MUST always be prepared for OPENs, WANT_DELEGATIONs, and | they MUST always be prepared for OPENs, WANT_DELEGATIONs, and | |||
GET_DIR_DELEGATIONs to be processed without any delegations being | GET_DIR_DELEGATIONs to be processed without any delegations being | |||
granted. | granted. | |||
Once granted, a delegation behaves in many ways like a lock. There | ||||
is an associated lease that is subject to renewal together with all | ||||
of the other leases held by that client. | ||||
Unlike locks, an operation by a second client to a delegated file | Unlike locks, an operation by a second client to a delegated file | |||
will cause the server to recall a delegation through a callback. For | will cause the server to recall a delegation through a callback. For | |||
individual operations, we will describe, under IMPLEMENTATION, when | individual operations, we will describe, under IMPLEMENTATION, when | |||
such operations are required to effect a recall. A number of points | such operations are required to effect a recall. A number of points | |||
should be noted, however. | should be noted, however. | |||
o The server is free to recall a delegation whenever it feels it is | o The server is free to recall a delegation whenever it feels it is | |||
desirable and may do so even if no operations requiring recall are | desirable and may do so even if no operations requiring recall are | |||
being done. | being done. | |||
o Operations done outside the NFSv4 protocol, due to, for example, | o Operations done outside the NFSv4.1 protocol, due to, for example, | |||
access by other protocols, or by local access, also need to result | access by other protocols, or by local access, also need to result | |||
in delegation recall when they make analogous changes to file | in delegation recall when they make analogous changes to file | |||
system data. What is crucial is if the change would invalidate | system data. What is crucial is if the change would invalidate | |||
the guarantees provided by the delegation. When this is possible, | the guarantees provided by the delegation. When this is possible, | |||
the delegation needs to be recalled and MUST be returned or | the delegation needs to be recalled and MUST be returned or | |||
revoked before allowing the operation to proceed. | revoked before allowing the operation to proceed. | |||
o The semantics of the file system are crucial in defining when | o The semantics of the file system are crucial in defining when | |||
delegation recall is required. If a particular change within a | delegation recall is required. If a particular change within a | |||
specific implementation causes change to a file attribute, then | specific implementation causes change to a file attribute, then | |||
skipping to change at page 198, line 40 | skipping to change at page 198, line 17 | |||
the server should allow sufficient time for the delegation to be | the server should allow sufficient time for the delegation to be | |||
returned since it may involve numerous RPCs to the server. If the | returned since it may involve numerous RPCs to the server. If the | |||
server is able to determine that the client is diligently flushing | server is able to determine that the client is diligently flushing | |||
state to the server as a result of the recall, the server may extend | state to the server as a result of the recall, the server may extend | |||
the usual time allowed for a recall. However, the time allowed for | the usual time allowed for a recall. However, the time allowed for | |||
recall completion should not be unbounded. | recall completion should not be unbounded. | |||
An example of this is when responsibility to mediate opens on a given | An example of this is when responsibility to mediate opens on a given | |||
file is delegated to a client (see Section 10.4). The server will | file is delegated to a client (see Section 10.4). The server will | |||
not know what opens are in effect on the client. Without this | not know what opens are in effect on the client. Without this | |||
knowledge the server will be unable to determine if the access and | knowledge, the server will be unable to determine if the access and | |||
deny state for the file allows any particular open until the | deny states for the file allow any particular open until the | |||
delegation for the file has been returned. | delegation for the file has been returned. | |||
A client failure or a network partition can result in failure to | A client failure or a network partition can result in failure to | |||
respond to a recall callback. In this case, the server will revoke | respond to a recall callback. In this case, the server will revoke | |||
the delegation which in turn will render useless any modified state | the delegation, which in turn will render useless any modified state | |||
still on the client. | still on the client. | |||
10.2.1. Delegation Recovery | 10.2.1. Delegation Recovery | |||
There are three situations that delegation recovery needs to deal | There are three situations that delegation recovery needs to deal | |||
with: | with: | |||
o Client restart | o client restart | |||
o Server restart | o server restart | |||
o Network partition (full or backchannel-only) | o network partition (full or backchannel-only) | |||
In the event the client restarts, the failure to renew the lease will | In the event the client restarts, the failure to renew the lease will | |||
result in the revocation of byte-range locks and share reservations. | result in the revocation of byte-range locks and share reservations. | |||
Delegations, however, may be treated a bit differently. | Delegations, however, may be treated a bit differently. | |||
There will be situations in which delegations will need to be | There will be situations in which delegations will need to be re- | |||
reestablished after a client restarts. The reason for this is the | established after a client restarts. The reason for this is that the | |||
client may have file data stored locally and this data was associated | client may have file data stored locally and this data was associated | |||
with the previously held delegations. The client will need to | with the previously held delegations. The client will need to re- | |||
reestablish the appropriate file state on the server. | establish the appropriate file state on the server. | |||
To allow for this type of client recovery, the server MAY extend the | To allow for this type of client recovery, the server MAY extend the | |||
period for delegation recovery beyond the typical lease expiration | period for delegation recovery beyond the typical lease expiration | |||
period. This implies that requests from other clients that conflict | period. This implies that requests from other clients that conflict | |||
with these delegations will need to wait. Because the normal recall | with these delegations will need to wait. Because the normal recall | |||
process may require significant time for the client to flush changed | process may require significant time for the client to flush changed | |||
state to the server, other clients need to be prepared for delays that | state to the server, other clients need to be prepared for delays that | |||
occur because of a conflicting delegation. This longer interval | occur because of a conflicting delegation. This longer interval | |||
would increase the window for clients to restart and consult stable | would increase the window for clients to restart and consult stable | |||
storage so that the delegations can be reclaimed. For open | storage so that the delegations can be reclaimed. For OPEN | |||
delegations, such delegations are reclaimed using OPEN with a claim | delegations, such delegations are reclaimed using OPEN with a claim | |||
type of CLAIM_DELEGATE_PREV or CLAIM_DELEG_PREV_FH (See Section 10.5 | type of CLAIM_DELEGATE_PREV or CLAIM_DELEG_PREV_FH (see Sections 10.5 | |||
and Section 18.16 for discussion of open delegation and the details | and 18.16 for discussion of OPEN delegation and the details of OPEN, | |||
of OPEN respectively). | respectively). | |||
A server MAY support claim types of CLAIM_DELEGATE_PREV and | A server MAY support claim types of CLAIM_DELEGATE_PREV and | |||
CLAIM_DELEG_PREV_FH, and if it does, it MUST NOT remove delegations | CLAIM_DELEG_PREV_FH, and if it does, it MUST NOT remove delegations | |||
upon a CREATE_SESSION that confirms a client ID created by | upon a CREATE_SESSION that confirms a client ID created by | |||
EXCHANGE_ID, and instead MUST, for a period of time no less than that | EXCHANGE_ID. Instead, the server MUST, for a period of time no less | |||
of the value of the lease_time attribute, maintain the client's | than that of the value of the lease_time attribute, maintain the | |||
delegations to allow time for the client to send CLAIM_DELEGATE_PREV | client's delegations to allow time for the client to send | |||
requests. The server that supports CLAIM_DELEGATE_PREV and/or | CLAIM_DELEGATE_PREV and/or CLAIM_DELEG_PREV_FH requests. The server | |||
CLAIM_DELEG_PREV_FH MUST support the DELEGPURGE operation. | that supports CLAIM_DELEGATE_PREV and/or CLAIM_DELEG_PREV_FH MUST | |||
support the DELEGPURGE operation. | ||||
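The retention requirement above amounts to a simple time check, sketched here in Python. The function and parameter names are hypothetical, and times are plain seconds on whatever clock the server uses.

```python
def may_discard_delegations(now, session_confirmed_at, lease_time):
    """True once the server has kept a restarted client's delegations
    for at least lease_time seconds after the confirming CREATE_SESSION.
    """
    return (now - session_confirmed_at) >= lease_time
```

A server is free to wait longer than this minimum. The check only marks the earliest point at which unreclaimed delegations could be dropped.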
When the server restarts, delegations are reclaimed (using the OPEN | When the server restarts, delegations are reclaimed (using the OPEN | |||
operation with CLAIM_PREVIOUS) in a similar fashion to byte-range | operation with CLAIM_PREVIOUS) in a similar fashion to byte-range | |||
locks and share reservations. However, there is a slight semantic | locks and share reservations. However, there is a slight semantic | |||
difference. In the normal case if the server decides that a | difference. In the normal case, if the server decides that a | |||
delegation should not be granted, it performs the requested action | delegation should not be granted, it performs the requested action | |||
(e.g. OPEN) without granting any delegation. For reclaim, the | (e.g., OPEN) without granting any delegation. For reclaim, the | |||
server grants the delegation but a special designation is applied so | server grants the delegation but a special designation is applied so | |||
that the client treats the delegation as having been granted but | that the client treats the delegation as having been granted but | |||
recalled by the server. Because of this, the client has the duty to | recalled by the server. Because of this, the client has the duty to | |||
write all modified state to the server and then return the | write all modified state to the server and then return the | |||
delegation. This process of handling delegation reclaim reconciles | delegation. This process of handling delegation reclaim reconciles | |||
three principles of the NFSv4.1 protocol: | three principles of the NFSv4.1 protocol: | |||
o Upon reclaim, a client reporting resources assigned to it by an | o Upon reclaim, a client reporting resources assigned to it by an | |||
earlier server instance must be granted those resources. | earlier server instance must be granted those resources. | |||
o The server has unquestionable authority to determine whether | o The server has unquestionable authority to determine whether | |||
delegations are to be granted and, once granted, whether they are | delegations are to be granted and, once granted, whether they are | |||
to be continued. | to be continued. | |||
o The use of callbacks is not to be depended upon until the client | o The use of callbacks should not be depended upon until the client | |||
has proven its ability to receive them. | has proven its ability to receive them. | |||
When a client needs to reclaim a delegation and there is no | When a client needs to reclaim a delegation and there is no | |||
associated open, the client may use the CLAIM_PREVIOUS variant of the | associated open, the client may use the CLAIM_PREVIOUS variant of the | |||
WANT_DELEGATION operation. However, since the server is not required | WANT_DELEGATION operation. However, since the server is not required | |||
to support this operation, an alternative is to reclaim via a dummy | to support this operation, an alternative is to reclaim via a dummy | |||
open together with the delegation using an OPEN of type | OPEN together with the delegation using an OPEN of type | |||
CLAIM_PREVIOUS. The dummy open file can be released using a CLOSE to | CLAIM_PREVIOUS. The dummy open file can be released using a CLOSE to | |||
re-establish the original state to be reclaimed, a delegation without | re-establish the original state to be reclaimed, a delegation without | |||
an associated open. | an associated open. | |||
When a client has more than a single open associated with a | When a client has more than a single open associated with a | |||
delegation, state for those additional opens can be established using | delegation, state for those additional opens can be established using | |||
OPEN operations of type CLAIM_DELEGATE_CUR. When these are used to | OPEN operations of type CLAIM_DELEGATE_CUR. When these are used to | |||
establish opens associated with reclaimed delegations, the server | establish opens associated with reclaimed delegations, the server | |||
MUST allow them when made within the grace period. | MUST allow them when made within the grace period. | |||
When a network partition occurs, delegations are subject to freeing | When a network partition occurs, delegations are subject to freeing | |||
by the server when the lease renewal period expires. This is similar | by the server when the lease renewal period expires. This is similar | |||
to the behavior for locks and share reservations. For delegations, | to the behavior for locks and share reservations. For delegations, | |||
however, the server may extend the period in which conflicting | however, the server may extend the period in which conflicting | |||
requests are held off. Eventually the occurrence of a conflicting | requests are held off. Eventually, the occurrence of a conflicting | |||
request from another client will cause revocation of the delegation. | request from another client will cause revocation of the delegation. | |||
A loss of the backchannel (e.g. by later network configuration | A loss of the backchannel (e.g., by later network configuration | |||
change) will have the same effect. A recall request will fail and | change) will have the same effect. A recall request will fail and | |||
revocation of the delegation will result. | revocation of the delegation will result. | |||
A client normally finds out about revocation of a delegation when it | A client normally finds out about revocation of a delegation when it | |||
uses a stateid associated with a delegation and receives one of the | uses a stateid associated with a delegation and receives one of the | |||
errors NFS4ERR_EXPIRED, NFS4ERR_ADMIN_REVOKED, or | errors NFS4ERR_EXPIRED, NFS4ERR_ADMIN_REVOKED, or | |||
NFS4ERR_DELEG_REVOKED. It also may find out about delegation | NFS4ERR_DELEG_REVOKED. It also may find out about delegation | |||
revocation after a client restart when it attempts to reclaim a | revocation after a client restart when it attempts to reclaim a | |||
delegation and receives that same error. Note that in the case of a | delegation and receives that same error. Note that in the case of a | |||
revoked write open delegation, there are issues because data may have | revoked OPEN_DELEGATE_WRITE delegation, there are issues because data | |||
been modified by the client whose delegation is revoked and | may have been modified by the client whose delegation is revoked and | |||
separately by other clients. See Section 10.5.1 for a discussion of | separately by other clients. See Section 10.5.1 for a discussion of | |||
such issues. Note also that when delegations are revoked, | such issues. Note also that when delegations are revoked, | |||
information about the revoked delegation will be written by the | information about the revoked delegation will be written by the | |||
server to stable storage (as described in Section 8.4.3). This is | server to stable storage (as described in Section 8.4.3). This is | |||
done to deal with the case in which a server restarts after revoking | done to deal with the case in which a server restarts after revoking | |||
a delegation but before the client holding the revoked delegation is | a delegation but before the client holding the revoked delegation is | |||
notified about the revocation. | notified about the revocation. | |||
10.3. Data Caching | 10.3. Data Caching | |||
When applications share access to a set of files, they need to be | When applications share access to a set of files, they need to be | |||
implemented so as to take account of the possibility of conflicting | implemented so as to take account of the possibility of conflicting | |||
access by another application. This is true whether the applications | access by another application. This is true whether the applications | |||
in question execute on different clients or reside on the same | in question execute on different clients or reside on the same | |||
client. | client. | |||
Share reservations and byte-range locks are the facilities the | Share reservations and byte-range locks are the facilities the | |||
NFSv4.1 protocol provides to allow applications to coordinate access | NFSv4.1 protocol provides to allow applications to coordinate access | |||
by using mutual exclusion facilities. The NFSv4.1 protocol's data | by using mutual exclusion facilities. The NFSv4.1 protocol's data | |||
caching must be implemented such that it does not invalidate the | caching must be implemented such that it does not invalidate the | |||
assumptions that those using these facilities depend upon. | assumptions on which those using these facilities depend. | |||
10.3.1. Data Caching and OPENs | 10.3.1. Data Caching and OPENs | |||
In order to avoid invalidating the sharing assumptions that | In order to avoid invalidating the sharing assumptions on which | |||
applications rely on, NFSv4.1 clients should not provide cached data | applications rely, NFSv4.1 clients should not provide cached data to | |||
to applications or modify it on behalf of an application when it | applications or modify it on behalf of an application when it would | |||
would not be valid to obtain or modify that same data via a READ or | not be valid to obtain or modify that same data via a READ or WRITE | |||
WRITE operation. | operation. | |||
Furthermore, in the absence of open delegation (see Section 10.4), | Furthermore, in the absence of an OPEN delegation (see Section 10.4), | |||
two additional rules apply. Note that these rules are obeyed in | two additional rules apply. Note that these rules are obeyed in | |||
practice by many NFSv3 clients. | practice by many NFSv3 clients. | |||
o First, cached data present on a client must be revalidated after | o First, cached data present on a client must be revalidated after | |||
doing an OPEN. Revalidating means that the client fetches the | doing an OPEN. Revalidating means that the client fetches the | |||
change attribute from the server, compares it with the cached | change attribute from the server, compares it with the cached | |||
change attribute, and if different, declares the cached data (as | change attribute, and if different, declares the cached data (as | |||
well as the cached attributes) as invalid. This is to ensure that | well as the cached attributes) as invalid. This is to ensure that | |||
the data for the OPENed file is still correctly reflected in the | the data for the OPENed file is still correctly reflected in the | |||
client's cache. This validation must be done at least when the | client's cache. This validation must be done at least when the | |||
client's OPEN operation includes DENY=WRITE or BOTH thus | client's OPEN operation includes a deny of OPEN4_SHARE_DENY_WRITE | |||
terminating a period in which other clients may have had the | or OPEN4_SHARE_DENY_BOTH, thus terminating a period in which other | |||
opportunity to open the file with WRITE access. Clients may | clients may have had the opportunity to open the file with | |||
choose to do the revalidation more often (i.e. at OPENs specifying | OPEN4_SHARE_ACCESS_WRITE/OPEN4_SHARE_ACCESS_BOTH access. Clients | |||
DENY=NONE) to parallel the NFSv3 protocol's practice for the | may choose to do the revalidation more often (i.e., at OPENs | |||
benefit of users assuming this degree of cache revalidation. | specifying a deny mode of OPEN4_SHARE_DENY_NONE) to parallel the | |||
NFSv3 protocol's practice for the benefit of users assuming this | ||||
degree of cache revalidation. | ||||
Since the change attribute is updated for data and metadata | Since the change attribute is updated for data and metadata | |||
modifications, some client implementors may be tempted to use the | modifications, some client implementors may be tempted to use the | |||
time_modify attribute and not the change attribute to validate | time_modify attribute and not the change attribute to validate | |||
cached data, so that metadata changes do not spuriously invalidate | cached data, so that metadata changes do not spuriously invalidate | |||
clean data. The implementor is cautioned in this approach. The | clean data. The implementor is cautioned in this approach. The | |||
change attribute is guaranteed to change for each update to the | change attribute is guaranteed to change for each update to the | |||
file, whereas time_modify is guaranteed to change only at the | file, whereas time_modify is guaranteed to change only at the | |||
granularity of the time_delta attribute. Use by the client's data | granularity of the time_delta attribute. Use by the client's data | |||
cache validation logic of time_modify and not change runs the risk | cache validation logic of time_modify and not change runs the risk | |||
of the client incorrectly marking stale data as valid. Thus any | of the client incorrectly marking stale data as valid. Thus, any | |||
cache validation approach by the client MUST include the use of | cache validation approach by the client MUST include the use of | |||
the change attribute. | the change attribute. | |||
o Second, modified data must be flushed to the server before closing | o Second, modified data must be flushed to the server before closing | |||
a file OPENed for write. This is complementary to the first rule. | a file OPENed for OPEN4_SHARE_ACCESS_WRITE. This is complementary | |||
If the data is not flushed at CLOSE, the revalidation done after | to the first rule. If the data is not flushed at CLOSE, the | |||
client OPENs as file is unable to achieve its purpose. The other | revalidation done after the client OPENs a file is unable to | |||
aspect to flushing the data before close is that the data must be | achieve its purpose. The other aspect to flushing the data before | |||
committed to stable storage, at the server, before the CLOSE | close is that the data must be committed to stable storage, at the | |||
operation is requested by the client. In the case of a server | server, before the CLOSE operation is requested by the client. In | |||
restart and a CLOSEd file, it may not be possible to retransmit | the case of a server restart and a CLOSEd file, it may not be | |||
the data to be written to the file. Hence, this requirement. | possible to retransmit the data to be written to the file, hence, | |||
this requirement. | ||||
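The two rules above can be illustrated with a minimal sketch. It is not part of either compared text. The CachedFile class, its fields, and the write and commit callbacks are hypothetical names introduced only for illustration; a real client would issue GETATTR, WRITE, and COMMIT operations in their place.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class CachedFile:
    """Hypothetical client-side cache entry for one file (illustrative only)."""
    change: Optional[int] = None                         # change attribute seen when data was cached
    data: Dict[int, bytes] = field(default_factory=dict)   # offset -> clean cached bytes
    dirty: Dict[int, bytes] = field(default_factory=dict)  # offset -> modified bytes not yet on the server


def revalidate_after_open(entry: CachedFile, server_change: int) -> bool:
    """First rule: compare the server's change attribute with the cached one.

    Returns True if the cached data was kept, False if it was invalidated.
    The comparison uses the change attribute, not time_modify.
    """
    if entry.change is not None and entry.change == server_change:
        return True
    entry.data.clear()             # cached data is declared invalid
    entry.change = server_change   # remember the value fetched from the server
    return False


def flush_before_close(entry: CachedFile,
                       write: Callable[[int, bytes], None],
                       commit: Callable[[], None]) -> None:
    """Second rule: send all modified data and make it stable before CLOSE.

    `write` and `commit` are assumed callbacks standing in for the WRITE
    and COMMIT operations; this sketch does not model their replies.
    """
    had_dirty = bool(entry.dirty)
    for offset in sorted(entry.dirty):
        write(offset, entry.dirty[offset])
    if had_dirty:
        commit()
    entry.dirty.clear()
```

In this sketch the caller would invoke revalidate_after_open with the change value fetched after the OPEN, and would send CLOSE only after flush_before_close returns.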
10.3.2. Data Caching and File Locking | 10.3.2. Data Caching and File Locking | |||
For those applications that choose to use file locking instead of | For those applications that choose to use byte-range locking instead | |||
share reservations to exclude inconsistent file access, there is an | of share reservations to exclude inconsistent file access, there is | |||
analogous set of constraints that apply to client side data caching. | an analogous set of constraints that apply to client-side data | |||
These rules are effective only if the file locking is used in a way | caching. These rules are effective only if the byte-range locking is | |||
that matches in an equivalent way the actual READ and WRITE | used in a way that matches in an equivalent way the actual READ and | |||
operations executed. This is as opposed to file locking that is | WRITE operations executed. This is as opposed to byte-range locking | |||
based on pure convention. For example, it is possible to manipulate | that is based on pure convention. For example, it is possible to | |||
a two-megabyte file by dividing the file into two one-megabyte | manipulate a two-megabyte file by dividing the file into two one- | |||
regions and protecting access to the two regions by file locks on | megabyte ranges and protecting access to the two byte-ranges by byte- | |||
bytes zero and one. A lock for write on byte zero of the file would | range locks on bytes zero and one. A WRITE_LT lock on byte zero of | |||
represent the right to do READ and WRITE operations on the first | the file would represent the right to perform READ and WRITE | |||
region. A lock for write on byte one of the file would represent the | operations on the first byte-range. A WRITE_LT lock on byte one of | |||
right to do READ and WRITE operations on the second region. As long | the file would represent the right to perform READ and WRITE | |||
as all applications manipulating the file obey this convention, they | operations on the second byte-range. As long as all applications | |||
will work on a local file system. However, they may not work with | manipulating the file obey this convention, they will work on a local | |||
the NFSv4.1 protocol unless clients refrain from data caching. | file system. However, they may not work with the NFSv4.1 protocol | |||
unless clients refrain from data caching. | ||||
The rules for data caching in the file locking environment are: | The rules for data caching in the byte-range locking environment are: | |||
o First, when a client obtains a file lock for a particular region, | o First, when a client obtains a byte-range lock for a particular | |||
the data cache corresponding to that region (if any cache data | byte-range, the data cache corresponding to that byte-range (if | |||
exists) must be revalidated. If the change attribute indicates | any cache data exists) must be revalidated. If the change | |||
that the file may have been updated since the cached data was | attribute indicates that the file may have been updated since the | |||
obtained, the client must flush or invalidate the cached data for | cached data was obtained, the client must flush or invalidate the | |||
the newly locked region. A client might choose to invalidate all | cached data for the newly locked byte-range. A client might | |||
of non-modified cached data that it has for the file but the only | choose to invalidate all of the non-modified cached data that it | |||
requirement for correct operation is to invalidate all of the data | has for the file, but the only requirement for correct operation | |||
in the newly locked region. | is to invalidate all of the data in the newly locked byte-range. | |||
o Second, before releasing a write lock for a region, all modified | o Second, before releasing a WRITE_LT lock for a byte-range, all | |||
data for that region must be flushed to the server. The modified | modified data for that byte-range must be flushed to the server. | |||
data must also be written to stable storage. | The modified data must also be written to stable storage. | |||
Note that flushing data to the server and the invalidation of cached | Note that flushing data to the server and the invalidation of cached | |||
data must reflect the actual byte ranges locked or unlocked. | data must reflect the actual byte-ranges locked or unlocked. | |||
Rounding these up or down to reflect client cache block boundaries | Rounding these up or down to reflect client cache block boundaries | |||
will cause problems if not carefully done. For example, writing a | will cause problems if not carefully done. For example, writing a | |||
modified block when only half of that block is within an area being | modified block when only half of that block is within an area being | |||
unlocked may cause invalid modification to the region outside the | unlocked may cause invalid modification to the byte-range outside the | |||
unlocked area. This, in turn, may be part of a region locked by | unlocked area. This, in turn, may be part of a byte-range locked by | |||
another client. Clients can avoid this situation by synchronously | another client. Clients can avoid this situation by synchronously | |||
performing portions of write operations that overlap that portion | performing portions of WRITE operations that overlap that portion | |||
(initial or final) that is not a full block. Similarly, invalidating | (initial or final) that is not a full block. Similarly, invalidating | |||
a locked area which is not an integral number of full buffer blocks | a locked area that is not an integral number of full buffer blocks | |||
would require the client to read one or two partial blocks from the | would require the client to read one or two partial blocks from the | |||
server if the revalidation procedure shows that the data which the | server if the revalidation procedure shows that the data that the | |||
client possesses may not be valid. | client possesses may not be valid. | |||
The data that is written to the server as a prerequisite to the | The data that is written to the server as a prerequisite to the | |||
unlocking of a region must be written, at the server, to stable | unlocking of a byte-range must be written, at the server, to stable | |||
storage. The client may accomplish this either with synchronous | storage. The client may accomplish this either with synchronous | |||
writes or by following asynchronous writes with a COMMIT operation. | writes or by following asynchronous writes with a COMMIT operation. | |||
This is required because retransmission of the modified data after a | This is required because retransmission of the modified data after a | |||
server restart might conflict with a lock held by another client. | server restart might conflict with a lock held by another client. | |||
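The requirement that flushing reflect the actual byte-ranges unlocked, rather than cache block boundaries, can be sketched as an interval intersection. This is an illustrative sketch only: the function name and the half-open (start, end) representation of modified ranges are assumptions, not anything defined by the protocol.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open byte interval [start, end)


def ranges_to_flush(dirty: List[Range], unlock_start: int, unlock_end: int) -> List[Range]:
    """Return the modified byte-ranges that fall inside the range being unlocked.

    Each result is clipped to [unlock_start, unlock_end), so no byte outside
    the unlocked range is written, even when the modified data shares a
    cache block with bytes outside that range.
    """
    clipped = []
    for start, end in sorted(dirty):
        lo = max(start, unlock_start)
        hi = min(end, unlock_end)
        if lo < hi:                 # keep only non-empty intersections
            clipped.append((lo, hi))
    return clipped
```

The ranges returned would then be written with synchronous writes, or with asynchronous writes followed by a COMMIT, before the unlock is sent.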
A client implementation may choose to accommodate applications which | A client implementation may choose to accommodate applications that | |||
use byte-range locking in non-standard ways (e.g. using a byte-range | use byte-range locking in non-standard ways (e.g., using a byte-range | |||
lock as a global semaphore) by flushing to the server more data upon | lock as a global semaphore) by flushing to the server more data upon | |||
an LOCKU than is covered by the locked range. This may include | a LOCKU than is covered by the locked range. This may include | |||
modified data within files other than the one for which the unlocks | modified data within files other than the one for which the unlocks | |||
are being done. In such cases, the client must not interfere with | are being done. In such cases, the client must not interfere with | |||
applications whose READs and WRITEs are being done only within the | applications whose READs and WRITEs are being done only within the | |||
bounds of byte-range locks which the application holds. For example, | bounds of byte-range locks that the application holds. For example, | |||
an application locks a single byte of a file and proceeds to write | an application locks a single byte of a file and proceeds to write | |||
that single byte. A client that chose to handle a LOCKU by flushing | that single byte. A client that chose to handle a LOCKU by flushing | |||
all modified data to the server could validly write that single byte | all modified data to the server could validly write that single byte | |||
in response to an unrelated unlock. However, it would not be valid | in response to an unrelated LOCKU operation. However, it would not | |||
to write the entire block in which that single written byte was | be valid to write the entire block in which that single written byte | |||
located since it includes an area that is not locked and might be | was located since it includes an area that is not locked and might be | |||
locked by another client. Client implementations can avoid this | locked by another client. Client implementations can avoid this | |||
problem by dividing files with modified data into those for which all | problem by dividing files with modified data into those for which all | |||
modifications are done to areas covered by an appropriate byte-range | modifications are done to areas covered by an appropriate byte-range | |||
lock and those for which there are modifications not covered by a | lock and those for which there are modifications not covered by a | |||
byte-range lock. Any writes done for the former class of files must | byte-range lock. Any writes done for the former class of files must | |||
not include areas not locked and thus not modified on the client. | not include areas not locked and thus not modified on the client. | |||
10.3.3. Data Caching and Mandatory File Locking | 10.3.3. Data Caching and Mandatory File Locking | |||
Client side data caching needs to respect mandatory file locking when | Client-side data caching needs to respect mandatory byte-range | |||
it is in effect. The presence of mandatory file locking for a given | locking when it is in effect. The presence of mandatory byte-range | |||
file is indicated when the client gets back NFS4ERR_LOCKED from a | locking for a given file is indicated when the client gets back | |||
READ or WRITE on a file it has an appropriate share reservation for. | NFS4ERR_LOCKED from a READ or WRITE operation on a file for which it | |||
When mandatory locking is in effect for a file, the client must check | has an appropriate share reservation. When mandatory locking is in | |||
for an appropriate file lock for data being read or written. If a | effect for a file, the client must check for an appropriate byte- | |||
lock exists for the range being read or written, the client may | range lock for data being read or written. If a byte-range lock | |||
satisfy the request using the client's validated cache. If an | exists for the range being read or written, the client may satisfy | |||
appropriate file lock is not held for the range of the read or write, | the request using the client's validated cache. If an appropriate | |||
the read or write request must not be satisfied by the client's cache | byte-range lock is not held for the range of the read or write, the | |||
and the request must be sent to the server for processing. When a | read or write request must not be satisfied by the client's cache and | |||
read or write request partially overlaps a locked region, the request | the request must be sent to the server for processing. When a read | |||
should be subdivided into multiple pieces with each region (locked or | or write request partially overlaps a locked byte-range, the request | |||
not) treated appropriately. | should be subdivided into multiple pieces with each byte-range | |||
(locked or not) treated appropriately. | ||||
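The subdivision of a partially overlapping request described above might look like the following sketch. The function name and the half-open (start, end) representation are assumptions made for illustration; the locks passed in are taken to be only those appropriate for the request being made.

```python
from typing import List, Tuple


def split_by_locks(req_start: int, req_end: int,
                   locks: List[Tuple[int, int]]) -> List[Tuple[int, int, bool]]:
    """Split the request [req_start, req_end) into (start, end, locked) pieces.

    Pieces with locked=True may be satisfied from the validated cache;
    pieces with locked=False must be sent to the server.
    """
    pieces = []
    pos = req_start
    for lock_start, lock_end in sorted(locks):
        lo = max(lock_start, pos)
        hi = min(lock_end, req_end)
        if lo >= hi:
            continue                          # lock does not cover any remaining part
        if pos < lo:
            pieces.append((pos, lo, False))   # gap before this lock
        pieces.append((lo, hi, True))
        pos = hi
    if pos < req_end:
        pieces.append((pos, req_end, False))  # tail not covered by any lock
    return pieces
```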
10.3.4. Data Caching and File Identity | 10.3.4. Data Caching and File Identity | |||
When clients cache data, the file data needs to be organized | When clients cache data, the file data needs to be organized | |||
according to the file system object to which the data belongs. For | according to the file system object to which the data belongs. For | |||
NFSv3 clients, the typical practice has been to assume for the | NFSv3 clients, the typical practice has been to assume for the | |||
purpose of caching that distinct filehandles represent distinct file | purpose of caching that distinct filehandles represent distinct file | |||
system objects. The client then has the choice to organize and | system objects. The client then has the choice to organize and | |||
maintain the data cache on this basis. | maintain the data cache on this basis. | |||
In the NFSv4.1 protocol, there is now the possibility to have | In the NFSv4.1 protocol, there is now the possibility to have | |||
significant deviations from a "one filehandle per object" model | significant deviations from a "one filehandle per object" model | |||
because a filehandle may be constructed on the basis of the object's | because a filehandle may be constructed on the basis of the object's | |||
pathname. Therefore, clients need a reliable method to determine if | pathname. Therefore, clients need a reliable method to determine if | |||
two filehandles designate the same file system object. If clients | two filehandles designate the same file system object. If clients | |||
were simply to assume that all distinct filehandles denote distinct | were simply to assume that all distinct filehandles denote distinct | |||
objects and proceed to do data caching on this basis, caching | objects and proceed to do data caching on this basis, caching | |||
inconsistencies would arise between the distinct client side objects | inconsistencies would arise between the distinct client-side objects | |||
which mapped to the same server side object. | that mapped to the same server-side object. | |||
By providing a method to differentiate filehandles, the NFSv4.1 | By providing a method to differentiate filehandles, the NFSv4.1 | |||
protocol alleviates a potential functional regression in comparison | protocol alleviates a potential functional regression in comparison | |||
with the NFSv3 protocol. Without this method, caching | with the NFSv3 protocol. Without this method, caching | |||
inconsistencies within the same client could occur and this has not | inconsistencies within the same client could occur, and this has not | |||
been present in previous versions of the NFS protocol. Note that it | been present in previous versions of the NFS protocol. Note that it | |||
is possible to have such inconsistencies with applications executing | is possible to have such inconsistencies with applications executing | |||
on multiple clients but that is not the issue being addressed here. | on multiple clients, but that is not the issue being addressed here. | |||
For the purposes of data caching, the following steps allow an | For the purposes of data caching, the following steps allow an | |||
NFSv4.1 client to determine whether two distinct filehandles denote | NFSv4.1 client to determine whether two distinct filehandles denote | |||
the same server side object: | the same server-side object: | |||
o If GETATTR directed to two filehandles returns different values of | o If GETATTR directed to two filehandles returns different values of | |||
the fsid attribute, then the filehandles represent distinct | the fsid attribute, then the filehandles represent distinct | |||
objects. | objects. | |||
o If GETATTR for any file with an fsid that matches the fsid of the | o If GETATTR for any file with an fsid that matches the fsid of the | |||
two filehandles in question returns a unique_handles attribute | two filehandles in question returns a unique_handles attribute | |||
with a value of TRUE, then the two objects are distinct. | with a value of TRUE, then the two objects are distinct. | |||
o If GETATTR directed to the two filehandles does not return the | o If GETATTR directed to the two filehandles does not return the | |||
fileid attribute for both of the handles, then it cannot be | fileid attribute for both of the handles, then it cannot be | |||
determined whether the two objects are the same. Therefore, | determined whether the two objects are the same. Therefore, | |||
operations which depend on that knowledge (e.g. client side data | operations that depend on that knowledge (e.g., client-side data | |||
caching) cannot be done reliably. Note that if GETATTR does not | caching) cannot be done reliably. Note that if GETATTR does not | |||
return the fileid attribute for both filehandles, it will return | return the fileid attribute for both filehandles, it will return | |||
it for neither of the filehandles, since the fsid for both | it for neither of the filehandles, since the fsid for both | |||
filehandles is the same. | filehandles is the same. | |||
o If GETATTR directed to the two filehandles returns different | o If GETATTR directed to the two filehandles returns different | |||
values for the fileid attribute, then they are distinct objects. | values for the fileid attribute, then they are distinct objects. | |||
o Otherwise they are the same object. | o Otherwise, they are the same object. | |||
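The steps above can be sketched as a single decision function. This is a minimal illustration, assuming the GETATTR results for the two distinct filehandles are already available as dictionaries; the dictionary keys and the unique_handles parameter are hypothetical names chosen for the example.

```python
from typing import Optional


def same_object(attrs_a: dict, attrs_b: dict, unique_handles: bool) -> Optional[bool]:
    """Decide whether two distinct filehandles denote the same server-side object.

    attrs_a and attrs_b hold the GETATTR results for each filehandle: always
    an "fsid" key and, if the server returned it, a "fileid" key.
    unique_handles is the unique_handles attribute for the shared fsid.

    Returns False for distinct objects, True for the same object, and None
    when this cannot be determined (so caching cannot be done reliably).
    """
    if attrs_a["fsid"] != attrs_b["fsid"]:
        return False                      # different file systems
    if unique_handles:
        return False                      # distinct handles imply distinct objects
    if "fileid" not in attrs_a or "fileid" not in attrs_b:
        return None                       # fileid unavailable: undetermined
    if attrs_a["fileid"] != attrs_b["fileid"]:
        return False
    return True
```

A three-valued result is used here so that a caller can tell "cannot be determined" apart from "distinct"; treating the two alike would hide the case in which client-side data caching is unreliable.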
10.4. Open Delegation | 10.4. Open Delegation | |||
When a file is being OPENed, the server may delegate further handling | When a file is being OPENed, the server may delegate further handling | |||
of opens and closes for that file to the opening client. Any such | of opens and closes for that file to the opening client. Any such | |||
delegation is recallable, since the circumstances that allowed for | delegation is recallable since the circumstances that allowed for the | |||
the delegation are subject to change. In particular, the server may | delegation are subject to change. In particular, if the server | |||
receive a conflicting OPEN from another client, the server must | receives a conflicting OPEN from another client, the server must | |||
recall the delegation before deciding whether the OPEN from the other | recall the delegation before deciding whether the OPEN from the other | |||
client may be granted. Making a delegation is up to the server and | client may be granted. Making a delegation is up to the server, and | |||
clients should not assume that any particular OPEN either will or | clients should not assume that any particular OPEN either will or | |||
will not result in an open delegation. The following is a typical | will not result in an OPEN delegation. The following is a typical | |||
set of conditions that servers might use in deciding whether OPEN | set of conditions that servers might use in deciding whether an OPEN | |||
should be delegated: | should be delegated: | |||
o The client must be able to respond to the server's callback | o The client must be able to respond to the server's callback | |||
requests. If a backchannel has been established, the server will | requests. If a backchannel has been established, the server will | |||
send a CB_COMPOUND request, containing a single operation, | send a CB_COMPOUND request, containing a single operation, | |||
CB_SEQUENCE, for a test of backchannel availability. | CB_SEQUENCE, for a test of backchannel availability. | |||
o The client must have responded properly to previous recalls. | o The client must have responded properly to previous recalls. | |||
o There must be no current open conflicting with the requested | o There must be no current OPEN conflicting with the requested | |||
delegation. | delegation. | |||
o There should be no current delegation that conflicts with the | o There should be no current delegation that conflicts with the | |||
delegation being requested. | delegation being requested. | |||
o The probability of future conflicting open requests should be low | o The probability of future conflicting open requests should be low | |||
based on the recent history of the file. | based on the recent history of the file. | |||
o The existence of any server-specific semantics of OPEN/CLOSE that | o The existence of any server-specific semantics of OPEN/CLOSE that | |||
would make the required handling incompatible with the prescribed | would make the required handling incompatible with the prescribed | |||
handling that the delegated client would apply (see below). | handling that the delegated client would apply (see below). | |||
There are two types of open delegations, read and write. A read open | There are two types of OPEN delegations: OPEN_DELEGATE_READ and | |||
delegation allows a client to handle, on its own, requests to open a | OPEN_DELEGATE_WRITE. An OPEN_DELEGATE_READ delegation allows a | |||
file for reading that do not deny read access to others. Multiple | client to handle, on its own, requests to open a file for reading | |||
read open delegations may be outstanding simultaneously and do not | that do not deny OPEN4_SHARE_ACCESS_READ access to others. Multiple | |||
conflict. A write open delegation allows the client to handle, on | OPEN_DELEGATE_READ delegations may be outstanding simultaneously and | |||
its own, all opens. Only one write open delegation may exist for a | do not conflict. An OPEN_DELEGATE_WRITE delegation allows the client | |||
given file at a given time and it is inconsistent with any read open | to handle, on its own, all opens. Only one OPEN_DELEGATE_WRITE | |||
delegations. | delegation may exist for a given file at a given time, and it is | |||
inconsistent with any OPEN_DELEGATE_READ delegations. | ||||
When a client has a read open delegation, it is assured that neither | When a client has an OPEN_DELEGATE_READ delegation, it is assured | |||
the contents, the attributes (with the exception of time_access), nor | that neither the contents, the attributes (with the exception of | |||
the names of any links to the file will change without its knowledge, | time_access), nor the names of any links to the file will change | |||
so long as the delegation is held. When a client has a write open | without its knowledge, so long as the delegation is held. When a | |||
delegation, it may modify the file data locally since no other client | client has an OPEN_DELEGATE_WRITE delegation, it may modify the file | |||
will be accessing the file's data. The client holding a write | data locally since no other client will be accessing the file's data. | |||
delegation may only locally affect file attributes which are | The client holding an OPEN_DELEGATE_WRITE delegation may only locally | |||
intimately connected with the file data: size, change, time_access, | affect file attributes that are intimately connected with the file | |||
time_metadata, and time_modify. All other attributes must be | data: size, change, time_access, time_metadata, and time_modify. All | |||
reflected on the server. | other attributes must be reflected on the server. | |||
When a client has an open delegation, it does not need to send OPENs | When a client has an OPEN delegation, it does not need to send OPENs | |||
or CLOSEs to the server. Instead the client may update the | or CLOSEs to the server. Instead, the client may update the | |||
appropriate status internally. For a read open delegation, opens | appropriate status internally. For an OPEN_DELEGATE_READ delegation, | |||
that cannot be handled locally (opens for write or that deny read | opens that cannot be handled locally (opens that are for | |||
access) must be sent to the server. | OPEN4_SHARE_ACCESS_WRITE/OPEN4_SHARE_ACCESS_BOTH or that deny | |||
OPEN4_SHARE_ACCESS_READ access) must be sent to the server. | ||||
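The rule above for which opens a delegation holder may handle on its own can be sketched as follows. This is a minimal illustration in Python, not part of the protocol text: the function name can_handle_open_locally is hypothetical, and the numeric constant values are stated here as assumptions taken from the protocol's XDR description. It covers only the delegation-type test; the access/deny bit checks and the nfsace4 permission check still apply.

```python
# Numeric values are assumptions based on the NFSv4.1 XDR description.
OPEN4_SHARE_ACCESS_READ = 0x1
OPEN4_SHARE_ACCESS_WRITE = 0x2
OPEN4_SHARE_ACCESS_BOTH = 0x3

OPEN4_SHARE_DENY_NONE = 0x0
OPEN4_SHARE_DENY_READ = 0x1
OPEN4_SHARE_DENY_WRITE = 0x2
OPEN4_SHARE_DENY_BOTH = 0x3

OPEN_DELEGATE_NONE = 0
OPEN_DELEGATE_READ = 1
OPEN_DELEGATE_WRITE = 2


def can_handle_open_locally(delegation_type, share_access, share_deny):
    """Hypothetical helper: True if the open need not be sent to the server."""
    if delegation_type == OPEN_DELEGATE_WRITE:
        # A write delegation lets the client handle all opens itself.
        return True
    if delegation_type == OPEN_DELEGATE_READ:
        # Opens for write (WRITE or BOTH) or opens that deny read access
        # to others must go to the server.
        wants_write = bool(share_access & OPEN4_SHARE_ACCESS_WRITE)
        denies_read = bool(share_deny & OPEN4_SHARE_DENY_READ)
        return not wants_write and not denies_read
    # No delegation held: every open goes to the server.
    return False
```

For example, with an OPEN_DELEGATE_READ delegation, an open for OPEN4_SHARE_ACCESS_READ with OPEN4_SHARE_DENY_NONE is handled locally, while an open for OPEN4_SHARE_ACCESS_BOTH is sent to the server.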
When an open delegation is made, the reply to the OPEN contains an | When an OPEN delegation is made, the reply to the OPEN contains an | |||
open delegation structure which specifies the following: | OPEN delegation structure that specifies the following: | |||
o the type of delegation (read or write). | o the type of delegation (OPEN_DELEGATE_READ or | |||
OPEN_DELEGATE_WRITE). | ||||
o space limitation information to control flushing of data on close | o space limitation information to control flushing of data on close | |||
(write open delegation only, see Section 10.4.1). | (OPEN_DELEGATE_WRITE delegation only; see Section 10.4.1) | |||
o an nfsace4 specifying read and write permissions. | o an nfsace4 specifying read and write permissions | |||
o a stateid to represent the delegation for READ and WRITE. | o a stateid to represent the delegation | |||
The delegation stateid is separate and distinct from the stateid for | The delegation stateid is separate and distinct from the stateid for | |||
the OPEN proper. The standard stateid, unlike the delegation | the OPEN proper. The standard stateid, unlike the delegation | |||
stateid, is associated with a particular lock-owner and will continue | stateid, is associated with a particular lock-owner and will continue | |||
to be valid after the delegation is recalled and the file remains | to be valid after the delegation is recalled and the file remains | |||
open. | open. | |||
When a request internal to the client is made to open a file and an | When a request internal to the client is made to open a file and an | |||
open delegation is in effect, it will be accepted or rejected solely | OPEN delegation is in effect, it will be accepted or rejected solely | |||
on the basis of the following conditions. Any requirement for other | on the basis of the following conditions. Any requirement for other | |||
checks to be made by the delegate should result in open delegation | checks to be made by the delegate should result in the OPEN | |||
being denied so that the checks can be made by the server itself. | delegation being denied so that the checks can be made by the server | |||
itself. | ||||
o The access and deny bits for the request and the file as described | o The access and deny bits for the request and the file as described | |||
in Section 9.7. | in Section 9.7. | |||
o The read and write permissions as determined below. | o The read and write permissions as determined below. | |||
The nfsace4 passed with delegation can be used to avoid frequent | The nfsace4 passed with delegation can be used to avoid frequent | |||
ACCESS calls. The permission check should be as follows: | ACCESS calls. The permission check should be as follows: | |||
o If the nfsace4 indicates that the open may be done, then it should | o If the nfsace4 indicates that the open may be done, then it should | |||
skipping to change at page 207, line 52 | skipping to change at page 207, line 37 | |||
The use of a delegation together with various other forms of caching | The use of a delegation together with various other forms of caching | |||
creates the possibility that no server authentication and | creates the possibility that no server authentication and | |||
authorization will ever be performed for a given user since all of | authorization will ever be performed for a given user since all of | |||
the user's requests might be satisfied locally. Where the client is | the user's requests might be satisfied locally. Where the client is | |||
depending on the server for authentication and authorization, the | depending on the server for authentication and authorization, the | |||
client should be sure authentication and authorization occur for | client should be sure authentication and authorization occur for | |||
each user by use of the ACCESS operation. This should be the case | each user by use of the ACCESS operation. This should be the case | |||
even if an ACCESS operation would not be required otherwise. As | even if an ACCESS operation would not be required otherwise. As | |||
mentioned before, the server may enforce frequent authentication by | mentioned before, the server may enforce frequent authentication by | |||
returning an nfsace4 denying all access with every open delegation. | returning an nfsace4 denying all access with every OPEN delegation. | |||
10.4.1. Open Delegation and Data Caching | 10.4.1. Open Delegation and Data Caching | |||
An OPEN delegation allows much of the message overhead associated | An OPEN delegation allows much of the message overhead associated | |||
with the opening and closing of files to be eliminated. An open when an | with the opening and closing of files to be eliminated. An open when an | |||
open delegation is in effect does not require that a validation | OPEN delegation is in effect does not require that a validation | |||
message be sent to the server. The continued endurance of the "read | message be sent to the server. The continued endurance of the | |||
open delegation" provides a guarantee that no OPEN for write and thus | "OPEN_DELEGATE_READ delegation" provides a guarantee that no OPEN for | |||
no write has occurred. Similarly, when closing a file opened for | OPEN4_SHARE_ACCESS_WRITE/OPEN4_SHARE_ACCESS_BOTH, and thus no write, | |||
write and if write open delegation is in effect, the data written | has occurred. Similarly, when closing a file opened for | |||
does not have to be written to the server until the open delegation | OPEN4_SHARE_ACCESS_WRITE/OPEN4_SHARE_ACCESS_BOTH and if an | |||
is recalled. The continued endurance of the open delegation provides | OPEN_DELEGATE_WRITE delegation is in effect, the data written does | |||
a guarantee that no open and thus no read or write has been done by | not have to be written to the server until the OPEN delegation is | |||
recalled. The continued endurance of the OPEN delegation provides a | ||||
guarantee that no open, and thus no READ or WRITE, has been done by | ||||
another client. | another client. | |||
For the purposes of open delegation, READs and WRITEs done without an | For the purposes of OPEN delegation, READs and WRITEs done without an | |||
OPEN are treated as the functional equivalents of a corresponding | OPEN are treated as the functional equivalents of a corresponding | |||
type of OPEN. Although client SHOULD NOT use special stateids when | type of OPEN. Although a client SHOULD NOT use special stateids when | |||
an open exists, delegation handling on the server can use the client | an open exists, delegation handling on the server can use the client | |||
ID associated with the current session to determine if the operation | ID associated with the current session to determine if the operation | |||
has been done by the holder of the delegation, in which case, no | has been done by the holder of the delegation (in which case, no | |||
recall is necessary, or by another client, in which case the | recall is necessary) or by another client (in which case, the | |||
delegation must be recalled and I/O not proceed until the delegation | delegation must be recalled and I/O not proceed until the delegation | |||
is recalled or revoked. | is recalled or revoked). | |||
With delegations, a client is able to avoid writing data to the | With delegations, a client is able to avoid writing data to the | |||
server when the CLOSE of a file is serviced. The file close system | server when the CLOSE of a file is serviced. The file close system | |||
call is the usual point at which the client is notified of a lack of | call is the usual point at which the client is notified of a lack of | |||
stable storage for the modified file data generated by the | stable storage for the modified file data generated by the | |||
application. At the close, file data is written to the server and | application. At the close, file data is written to the server and, | |||
through normal accounting the server is able to determine if the | through normal accounting, the server is able to determine if the | |||
available file system space for the data has been exceeded (i.e. | available file system space for the data has been exceeded (i.e., the | |||
server returns NFS4ERR_NOSPC or NFS4ERR_DQUOT). This accounting | server returns NFS4ERR_NOSPC or NFS4ERR_DQUOT). This accounting | |||
includes quotas. The introduction of delegations requires that a | includes quotas. The introduction of delegations requires that an | |||
alternative method be in place for the same type of communication to | alternative method be in place for the same type of communication to | |||
occur between client and server. | occur between client and server. | |||
In the delegation response, the server provides either the limit of | In the delegation response, the server provides either the limit of | |||
the size of the file or the number of modified blocks and associated | the size of the file or the number of modified blocks and associated | |||
block size. The server must ensure that the client will be able to | block size. The server must ensure that the client will be able to | |||
write modified data to the server of a size equal to that provided in | write modified data to the server of a size equal to that provided in | |||
the original delegation. The server must make this assurance for all | the original delegation. The server must make this assurance for all | |||
outstanding delegations. Therefore, the server must be careful in | outstanding delegations. Therefore, the server must be careful in | |||
its management of available space for new or modified data taking | its management of available space for new or modified data, taking | |||
into account available file system space and any applicable quotas. | into account available file system space and any applicable quotas. | |||
The server can recall delegations as a result of managing the | The server can recall delegations as a result of managing the | |||
available file system space. The client should abide by the server's | available file system space. The client should abide by the server's | |||
state space limits for delegations. If the client exceeds the stated | state space limits for delegations. If the client exceeds the stated | |||
limits for the delegation, the server's behavior is undefined. | limits for the delegation, the server's behavior is undefined. | |||
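A client-side check against the two forms of space limit described above (a file size limit, or a number of modified blocks with an associated block size) might look like the following sketch. The class and function names are hypothetical and chosen for illustration only; the text does not prescribe how a client tracks its modified blocks, so the modified_blocks counter is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SizeLimit:
    """Limit expressed as a maximum file size, in bytes."""
    filesize: int


@dataclass
class BlockLimit:
    """Limit expressed as a count of modified blocks of a given size."""
    num_blocks: int
    bytes_per_block: int


def may_defer_flush(limit, file_size, modified_blocks):
    """Hypothetical helper: True while cached writes stay within the limit.

    file_size is the client's current view of the file size in bytes;
    modified_blocks is the number of blocks (of limit.bytes_per_block
    bytes each) that the client holds modified in its cache.
    """
    if isinstance(limit, SizeLimit):
        return file_size <= limit.filesize
    if isinstance(limit, BlockLimit):
        return modified_blocks <= limit.num_blocks
    raise TypeError("unknown space limit form")
```

A client that finds may_defer_flush() returning False would write data back to the server rather than exceed the stated limit, since the server's behavior beyond the limit is undefined.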
Based on server conditions, quotas or available file system space, | Based on server conditions, quotas, or available file system space, | |||
the server may grant write open delegations with very restrictive | the server may grant OPEN_DELEGATE_WRITE delegations with very | |||
space limitations. The limitations may be defined in a way that will | restrictive space limitations. The limitations may be defined in a | |||
always force modified data to be flushed to the server on close. | way that will always force modified data to be flushed to the server | |||
on close. | ||||
With respect to authentication, flushing modified data to the server | With respect to authentication, flushing modified data to the server | |||
after a CLOSE has occurred may be problematic. For example, the user | after a CLOSE has occurred may be problematic. For example, the user | |||
of the application may have logged off the client and unexpired | of the application may have logged off the client, and unexpired | |||
authentication credentials may not be present. In this case, the | authentication credentials may not be present. In this case, the | |||
client may need to take special care to ensure that local unexpired | client may need to take special care to ensure that local unexpired | |||
credentials will in fact be available. This may be accomplished by | credentials will in fact be available. This may be accomplished by | |||
tracking the expiration time of credentials and flushing data well in | tracking the expiration time of credentials and flushing data well in | |||
advance of their expiration or by making private copies of | advance of their expiration or by making private copies of | |||
credentials to assure their availability when needed. | credentials to assure their availability when needed. | |||
10.4.2. Open Delegation and File Locks | 10.4.2. Open Delegation and File Locks | |||
When a client holds a write open delegation, lock operations are | When a client holds an OPEN_DELEGATE_WRITE delegation, lock | |||
performed locally. This includes those required for mandatory file | operations are performed locally. This includes those required for | |||
locking. This can be done since the delegation implies that there | mandatory byte-range locking. This can be done since the delegation | |||
can be no conflicting locks. Similarly, all of the revalidations | implies that there can be no conflicting locks. Similarly, all of | |||
that would normally be associated with obtaining locks and the | the revalidations that would normally be associated with obtaining | |||
flushing of data associated with the releasing of locks need not be | locks and the flushing of data associated with the releasing of locks | |||
done. | need not be done. | |||
When a client holds a read open delegation, lock operations are not | When a client holds an OPEN_DELEGATE_READ delegation, lock operations | |||
performed locally. All lock operations, including those requesting | are not performed locally. All lock operations, including those | |||
non-exclusive locks, are sent to the server for resolution. | requesting non-exclusive locks, are sent to the server for | |||
resolution. | ||||
10.4.3. Handling of CB_GETATTR | 10.4.3. Handling of CB_GETATTR | |||
The server needs to employ special handling for a GETATTR where the | The server needs to employ special handling for a GETATTR where the | |||
target is a file that has a write open delegation in effect. The | target is a file that has an OPEN_DELEGATE_WRITE delegation in | |||
reason for this is that the client holding the write delegation may | effect. The reason for this is that the client holding the | |||
have modified the data and the server needs to reflect this change to | OPEN_DELEGATE_WRITE delegation may have modified the data, and the | |||
the second client that submitted the GETATTR. Therefore, the client | server needs to reflect this change to the second client that | |||
holding the write delegation needs to be interrogated. The server | submitted the GETATTR. Therefore, the client holding the | |||
OPEN_DELEGATE_WRITE delegation needs to be interrogated. The server | ||||
will use the CB_GETATTR operation. The only attributes that the | will use the CB_GETATTR operation. The only attributes that the | |||
server can reliably query via CB_GETATTR are size and change. | server can reliably query via CB_GETATTR are size and change. | |||
Since CB_GETATTR is being used to satisfy another client's GETATTR | Since CB_GETATTR is being used to satisfy another client's GETATTR | |||
request, the server only needs to know if the client holding the | request, the server only needs to know if the client holding the | |||
delegation has a modified version of the file. If the client's copy | delegation has a modified version of the file. If the client's copy | |||
of the delegated file is not modified (data or size), the server can | of the delegated file is not modified (data or size), the server can | |||
satisfy the second client's GETATTR request from the attributes | satisfy the second client's GETATTR request from the attributes | |||
stored locally at the server. If the file is modified, the server | stored locally at the server. If the file is modified, the server | |||
only needs to know about this modified state. If the server | only needs to know about this modified state. If the server | |||
determines that the file is currently modified, it will respond to | determines that the file is currently modified, it will respond to | |||
the second client's GETATTR as if the file had been modified locally | the second client's GETATTR as if the file had been modified locally | |||
at the server. | at the server. | |||
Since the form of the change attribute is determined by the server | Since the form of the change attribute is determined by the server | |||
and is opaque to the client, the client and server need to agree on a | and is opaque to the client, the client and server need to agree on a | |||
method of communicating the modified state of the file. For the size | method of communicating the modified state of the file. For the size | |||
attribute, the client will report its current view of the file size. | attribute, the client will report its current view of the file size. | |||
For the change attribute, the handling is more involved. | For the change attribute, the handling is more involved. | |||
For the client, the following steps will be taken when receiving a | For the client, the following steps will be taken when receiving an | |||
write delegation: | OPEN_DELEGATE_WRITE delegation: | |||
o The value of the change attribute will be obtained from the server | o The value of the change attribute will be obtained from the server | |||
and cached. Let this value be represented by c. | and cached. Let this value be represented by c. | |||
o The client will create a value greater than c that will be used | o The client will create a value greater than c that will be used | |||
for communicating modified data is held at the client. Let this | for communicating that modified data is held at the client. Let | |||
value be represented by d. | this value be represented by d. | |||
o When the client is queried via CB_GETATTR for the change | o When the client is queried via CB_GETATTR for the change | |||
attribute, it checks to see if it holds modified data. If the | attribute, it checks to see if it holds modified data. If the | |||
file is modified, the value d is returned for the change attribute | file is modified, the value d is returned for the change attribute | |||
value. If this file is not currently modified, the client returns | value. If this file is not currently modified, the client returns | |||
the value c for the change attribute. | the value c for the change attribute. | |||
For simplicity of implementation, the client MAY for each CB_GETATTR | For simplicity of implementation, the client MAY for each CB_GETATTR | |||
return the same value d. This is true even if, between successive | return the same value d. This is true even if, between successive | |||
CB_GETATTR operations, the client again modifies in the file's data | CB_GETATTR operations, the client again modifies the file's data or | |||
or metadata in its cache. The client can return the same value | metadata in its cache. The client can return the same value because | |||
because the only requirement is that the client be able to indicate | the only requirement is that the client be able to indicate to the | |||
to the server that the client holds modified data. Therefore, the | server that the client holds modified data. Therefore, the value of | |||
value of d may always be c + 1. | d may always be c + 1. | |||
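The client-side steps above can be sketched as follows. This is an illustrative Python model, not a definitive implementation: the class name DelegatedChange and its method names are hypothetical, and the sketch uses the simplification the text permits (d = c + 1, returned unchanged on every query). It does not address the case where c is already the largest representable value.

```python
class DelegatedChange:
    """Hypothetical client-side record for a write delegation."""

    def __init__(self, c):
        self.c = c          # change value obtained from the server and cached
        self.d = c + 1      # any value greater than c signals "modified"
        self.modified = False

    def mark_modified(self):
        """Record that the client holds modified data for the file."""
        self.modified = True

    def cb_getattr_change(self):
        """Value to return for the change attribute in a CB_GETATTR reply."""
        return self.d if self.modified else self.c
```

Before any local modification, cb_getattr_change() returns c; after mark_modified() it returns d, and it keeps returning the same d however many further modifications are made.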
While the change attribute is opaque to the client in the sense that | While the change attribute is opaque to the client in the sense that | |||
it has no idea what units of time, if any, the server is counting | it has no idea what units of time, if any, the server is counting | |||
change with, it is not opaque in that the client has to treat it as | change with, it is not opaque in that the client has to treat it as | |||
an unsigned integer, and the server has to be able to see the results | an unsigned integer, and the server has to be able to see the results | |||
of the client's changes to that integer. Therefore, the server MUST | of the client's changes to that integer. Therefore, the server MUST | |||
encode the change attribute in network order when sending it to the | encode the change attribute in network order when sending it to the | |||
client. The client MUST decode it from network order to its native | client. The client MUST decode it from network order to its native | |||
order when receiving it and the client MUST encode it network order | order when receiving it, and the client MUST encode it in network | |||
when sending it to the server. For this reason, change is defined as | order when sending it to the server. For this reason, change is | |||
an unsigned integer rather than an opaque array of bytes. | defined as an unsigned integer rather than an opaque array of bytes. | |||
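As an illustration of the byte-order requirement, the following Python sketch encodes and decodes the change attribute as a 64-bit unsigned integer in network (big-endian) order. The 64-bit width and the function names are assumptions made for this example; only the network-order requirement comes from the text above.

```python
import struct

# ">Q" = big-endian (network order), unsigned 64-bit integer.
_CHANGE_FORMAT = ">Q"


def encode_change(value):
    """Encode a change value in network order for transmission."""
    return struct.pack(_CHANGE_FORMAT, value)


def decode_change(data):
    """Decode a network-order change value into a native integer."""
    (value,) = struct.unpack(_CHANGE_FORMAT, data)
    return value
```

Because both sides agree on the encoding, a client can decode the server's value c, compute c + 1 as a native integer, and encode the result so that the server sees a value it can compare numerically.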
For the server, the following steps will be taken when providing a | For the server, the following steps will be taken when providing an | |||
write delegation: | OPEN_DELEGATE_WRITE delegation: | |||
o Upon providing a write delegation, the server will cache a copy of | o Upon providing an OPEN_DELEGATE_WRITE delegation, the server will | |||
the change attribute in the data structure it uses to record the | cache a copy of the change attribute in the data structure it uses | |||
delegation. Let this value be represented by sc. | to record the delegation. Let this value be represented by sc. | |||
o When a second client sends a GETATTR operation on the same file to | o When a second client sends a GETATTR operation on the same file to | |||
the server, the server obtains the change attribute from the first | the server, the server obtains the change attribute from the first | |||
client. Let this value be cc. | client. Let this value be cc. | |||
o If the value cc is equal to sc, the file is not modified and the | o If the value cc is equal to sc, the file is not modified and the | |||
server returns the current values for change, time_metadata, and | server returns the current values for change, time_metadata, and | |||
time_modify (for example) to the second client. | time_modify (for example) to the second client. | |||
o If the value cc is NOT equal to sc, the file is currently modified | o If the value cc is NOT equal to sc, the file is currently modified | |||
skipping to change at page 212, line 35 | skipping to change at page 212, line 35 | |||
In the case that the file attribute size is different than the | In the case that the file attribute size is different than the | |||
server's current value, the server treats this as a modification | server's current value, the server treats this as a modification | |||
regardless of the value of the change attribute retrieved via | regardless of the value of the change attribute retrieved via | |||
CB_GETATTR and responds to the second client as in the last step. | CB_GETATTR and responds to the second client as in the last step. | |||
This methodology resolves issues of clock differences between client | This methodology resolves issues of clock differences between client | |||
and server and other scenarios where the use of CB_GETATTR breaks | and server and other scenarios where the use of CB_GETATTR breaks | |||
down. | down. | |||
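The server-side decision described above reduces to two comparisons, sketched here in Python. The function name and parameters are hypothetical; sc and cc are the values named in the steps above, and the size arguments stand for the server's current size and the size reported by the delegation holder via CB_GETATTR. What the server then returns to the second client in the modified case is outside the scope of this sketch.

```python
def file_modified_at_client(sc, cc, server_size, client_size):
    """Hypothetical helper: True if the delegation holder has modified the file.

    sc: change value the server cached when it granted the delegation.
    cc: change value the client returned via CB_GETATTR.
    server_size / client_size: the server's current size attribute and
    the size the client reported via CB_GETATTR.
    """
    # A size difference counts as a modification regardless of cc.
    return cc != sc or client_size != server_size
```

When this returns False, the server can answer the second client's GETATTR from the attributes it stores locally.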
It should be noted that the server is under no obligation to use | It should be noted that the server is under no obligation to use | |||
CB_GETATTR and therefore the server MAY simply recall the delegation | CB_GETATTR, and therefore the server MAY simply recall the delegation | |||
to avoid its use. | to avoid its use. | |||
10.4.4. Recall of Open Delegation | 10.4.4. Recall of Open Delegation | |||
The following events necessitate recall of an open delegation: | The following events necessitate recall of an OPEN delegation: | |||
o Potentially conflicting OPEN request (or READ/WRITE done with | o potentially conflicting OPEN request (or a READ or WRITE operation | |||
"special" stateid) | done with a special stateid) | |||
o SETATTR sent by another client | o SETATTR sent by another client | |||
o REMOVE request for the file | o REMOVE request for the file | |||
o RENAME request for the file as either source or target of the | o RENAME request for the file as either the source or target of the | |||
RENAME | RENAME | |||
Whether a RENAME of a directory in the path leading to the file | Whether a RENAME of a directory in the path leading to the file | |||
results in recall of an open delegation depends on the semantics of | results in recall of an OPEN delegation depends on the semantics of | |||
the server's file system. If that file system denies such RENAMEs | the server's file system. If that file system denies such RENAMEs | |||
when a file is open, the recall must be performed to determine | when a file is open, the recall must be performed to determine | |||
whether the file in question is, in fact, open. | whether the file in question is, in fact, open. | |||
In addition to the situations above, the server may choose to recall | In addition to the situations above, the server may choose to recall | |||
open delegations at any time if resource constraints make it | OPEN delegations at any time if resource constraints make it | |||
advisable to do so. Clients should always be prepared for the | advisable to do so. Clients should always be prepared for the | |||
possibility of recall. | possibility of recall. | |||
When a client receives a recall for an open delegation, it needs to | When a client receives a recall for an OPEN delegation, it needs to | |||
update state on the server before returning the delegation. These | update state on the server before returning the delegation. These | |||
same updates must be done whenever a client chooses to return a | same updates must be done whenever a client chooses to return a | |||
delegation voluntarily. The following items of state need to be | delegation voluntarily. The following items of state need to be | |||
dealt with: | dealt with: | |||
o If the file associated with the delegation is no longer open and | o If the file associated with the delegation is no longer open and | |||
no previous CLOSE operation has been sent to the server, a CLOSE | no previous CLOSE operation has been sent to the server, a CLOSE | |||
operation must be sent to the server. | operation must be sent to the server. | |||
o If a file has other open references at the client, then OPEN | o If a file has other open references at the client, then OPEN | |||
operations must be sent to the server. The appropriate stateids | operations must be sent to the server. The appropriate stateids | |||
will be provided by the server for subsequent use by the client | will be provided by the server for subsequent use by the client | |||
since the delegation stateid will no longer be valid. These OPEN | since the delegation stateid will no longer be valid. These OPEN | |||
requests are done with the claim type of CLAIM_DELEGATE_CUR. This | requests are done with the claim type of CLAIM_DELEGATE_CUR. This | |||
will allow the presentation of the delegation stateid so that the | will allow the presentation of the delegation stateid so that the | |||
client can establish the appropriate rights to perform the OPEN. | client can establish the appropriate rights to perform the OPEN. | |||
(see the Section 18.16 which describes the OPEN" operation for | (see Section 18.16, which describes the OPEN operation, for | |||
details.) | details.) | |||
o If there are granted file locks, the corresponding LOCK operations | o If there are granted byte-range locks, the corresponding LOCK | |||
need to be performed. This applies to the write open delegation | operations need to be performed. This applies to the | |||
case only. | OPEN_DELEGATE_WRITE delegation case only. | |||
o For a write open delegation, if at the time of recall the file is | o For an OPEN_DELEGATE_WRITE delegation, if at the time of recall | |||
not open for write, all modified data for the file must be flushed | the file is not open for OPEN4_SHARE_ACCESS_WRITE/ | |||
to the server. If the delegation had not existed, the client | OPEN4_SHARE_ACCESS_BOTH, all modified data for the file must be | |||
would have done this data flush before the CLOSE operation. | flushed to the server. If the delegation had not existed, the | |||
client would have done this data flush before the CLOSE operation. | ||||
o For a write open delegation when a file is still open at the time | o For an OPEN_DELEGATE_WRITE delegation when a file is still open at | |||
of recall, any modified data for the file needs to be flushed to | the time of recall, any modified data for the file needs to be | |||
the server. | flushed to the server. | |||
o With the write open delegation in place, it is possible that the | o With the OPEN_DELEGATE_WRITE delegation in place, it is possible | |||
file was truncated during the duration of the delegation. For | that the file was truncated during the duration of the delegation. | |||
example, the truncation could have occurred as a result of an OPEN | For example, the truncation could have occurred as a result of an | |||
UNCHECKED with a size attribute value of zero. Therefore, if a | OPEN UNCHECKED with a size attribute value of zero. Therefore, if | |||
truncation of the file has occurred and this operation has not | a truncation of the file has occurred and this operation has not | |||
been propagated to the server, the truncation must occur before | been propagated to the server, the truncation must occur before | |||
any modified data is written to the server. | any modified data is written to the server. | |||
In the case of write open delegation, file locking imposes some | In the case of OPEN_DELEGATE_WRITE delegation, byte-range locking | |||
additional requirements. To precisely maintain the associated | imposes some additional requirements. To precisely maintain the | |||
invariant, it is required to flush any modified data in any region | associated invariant, it is required to flush any modified data in | |||
for which a write lock was released while the write delegation was in | any byte-range for which a WRITE_LT lock was released while the | |||
effect. However, because the write open delegation implies no other | OPEN_DELEGATE_WRITE delegation was in effect. However, because the | |||
locking by other clients, a simpler implementation is to flush all | OPEN_DELEGATE_WRITE delegation implies no other locking by other | |||
modified data for the file (as described just above) if any write | clients, a simpler implementation is to flush all modified data for | |||
lock has been released while the write open delegation was in effect. | the file (as described just above) if any WRITE_LT lock has been | |||
released while the OPEN_DELEGATE_WRITE delegation was in effect. | ||||
An implementation need not wait until delegation recall (or deciding | An implementation need not wait until delegation recall (or the | |||
to voluntarily return a delegation) to perform any of the above | decision to voluntarily return a delegation) to perform any of the | |||
actions, if implementation considerations (e.g. resource availability | above actions, if implementation considerations (e.g., resource | |||
constraints) make that desirable. Generally, however, the fact that | availability constraints) make that desirable. Generally, however, | |||
the actual open state of the file may continue to change makes it not | the fact that the actual OPEN state of the file may continue to | |||
worthwhile to send information about opens and closes to the server, | change makes it not worthwhile to send information about opens and | |||
except as part of delegation return. Only in the case of closing the | closes to the server, except as part of delegation return. An | |||
open that resulted in obtaining the delegation would clients be | exception is when the client has no more internal opens of the file. | |||
likely to do this early, since, in that case, the close once done | In this case, sending a CLOSE is useful because it reduces resource | |||
will not be undone. Regardless of the client's choices on scheduling | utilization on the client and server. Regardless of the client's | |||
these actions, all must be performed before the delegation is | choices on scheduling these actions, all must be performed before the | |||
returned, including (when applicable) the close that corresponds to | delegation is returned, including (when applicable) the close that | |||
the open that resulted in the delegation. These actions can be | corresponds to the OPEN that resulted in the delegation. These | |||
performed either in previous requests or in previous operations in | actions can be performed either in previous requests or in previous | |||
the same COMPOUND request. | operations in the same COMPOUND request. | |||
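The state updates listed above can be sketched as a small planning routine. This is an illustration only, not part of either document: the DelegatedFile record, its field names, and the operation labels are hypothetical. The text requires only that truncation precede any write of modified data and that every action precede the return of the delegation. Propagating the truncation with a SETATTR of size, and sending OPENs before LOCKs, are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DelegatedFile:
    """Hypothetical client-side record for one delegated file."""
    write_delegation: bool    # True for an OPEN_DELEGATE_WRITE delegation
    local_opens: int          # open references the server does not know about
    close_pending: bool       # file no longer open and no CLOSE sent yet
    local_locks: int          # byte-range locks granted locally
    truncated_locally: bool   # truncation not yet propagated to the server
    has_dirty_data: bool      # modified data held only in the client cache


def plan_delegation_return(f: DelegatedFile) -> List[str]:
    """Return the operations to send, in order, ending with DELEGRETURN."""
    ops: List[str] = []
    if f.close_pending and f.local_opens == 0:
        ops.append("CLOSE")
    # Re-establish each remaining open with the delegation stateid.
    ops += ["OPEN(CLAIM_DELEGATE_CUR)"] * f.local_opens
    if f.write_delegation:
        # Locks and modified data apply to the write delegation case only.
        ops += ["LOCK"] * f.local_locks
        if f.truncated_locally:
            # Assumption: truncation is propagated via SETATTR of size.
            ops.append("SETATTR(size)")
        if f.has_dirty_data:
            ops.append("WRITE")
    ops.append("DELEGRETURN")
    return ops
```

A client could send the resulting operations in earlier requests or in the same COMPOUND as the delegation return, as the paragraph above allows.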
10.4.5. Clients that Fail to Honor Delegation Recalls | 10.4.5. Clients That Fail to Honor Delegation Recalls | |||
A client may fail to respond to a recall for various reasons, such as | A client may fail to respond to a recall for various reasons, such as | |||
a failure of the backchannel from server to the client. The client | a failure of the backchannel from server to the client. The client | |||
may be unaware of a failure in the backchannel. This lack of | may be unaware of a failure in the backchannel. This lack of | |||
awareness could result in the client finding out long after the | awareness could result in the client finding out long after the | |||
failure that its delegation has been revoked, and another client has | failure that its delegation has been revoked, and another client has | |||
modified the data for which the client had a delegation. This is | modified the data for which the client had a delegation. This is | |||
especially a problem for the client that held a write delegation. | especially a problem for the client that held an OPEN_DELEGATE_WRITE | |||
delegation. | ||||
Status bits returned by SEQUENCE operations help to provide an | Status bits returned by SEQUENCE operations help to provide an | |||
alternate way of informing the client of issues regarding the status | alternate way of informing the client of issues regarding the status | |||
of the backchannel and of recalled delegations. When the backchannel | of the backchannel and of recalled delegations. When the backchannel | |||
is not available, the server returns the status bit | is not available, the server returns the status bit | |||
SEQ4_STATUS_CB_PATH_DOWN on SEQUENCE operations. The client can | SEQ4_STATUS_CB_PATH_DOWN on SEQUENCE operations. The client can | |||
react by attempting to re-establish the backchannel and by returning | react by attempting to re-establish the backchannel and by returning | |||
recallable objects if a backchannel cannot be successfully re- | recallable objects if a backchannel cannot be successfully re- | |||
established. | established. | |||
skipping to change at page 215, line 21 | skipping to change at page 215, line 23 | |||
down status and re-establish the backchannel. | down status and re-establish the backchannel. | |||
When delegations are revoked, the server will return with the | When delegations are revoked, the server will return with the | |||
SEQ4_STATUS_RECALLABLE_STATE_REVOKED status bit set on subsequent | SEQ4_STATUS_RECALLABLE_STATE_REVOKED status bit set on subsequent | |||
SEQUENCE operations. The client should note this and then use | SEQUENCE operations. The client should note this and then use | |||
TEST_STATEID to find which delegations have been revoked. | TEST_STATEID to find which delegations have been revoked. | |||
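A minimal sketch of how a client might act on these two status bits follows. The bit values are those assigned in the RFC 5661 XDR definitions. The function name and the action labels are hypothetical and stand in for whatever recovery routines a real client provides.

```python
from typing import List

# Bit values as assigned in the RFC 5661 XDR description.
SEQ4_STATUS_CB_PATH_DOWN = 0x00000001
SEQ4_STATUS_RECALLABLE_STATE_REVOKED = 0x00000040


def actions_for_sequence_status(sr_status_flags: int) -> List[str]:
    """Map SEQUENCE status bits to hypothetical client recovery actions."""
    actions: List[str] = []
    if sr_status_flags & SEQ4_STATUS_CB_PATH_DOWN:
        # Backchannel unavailable: try to re-establish it.
        actions.append("reestablish_backchannel")
    if sr_status_flags & SEQ4_STATUS_RECALLABLE_STATE_REVOKED:
        # Some delegation was revoked: probe stateids to find which.
        actions.append("test_stateid_on_delegations")
    return actions
```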
10.4.6. Delegation Revocation | 10.4.6. Delegation Revocation | |||
At the point a delegation is revoked, if there are associated opens | At the point a delegation is revoked, if there are associated opens | |||
on the client, these opens may or may not be revoked. If no lock or | on the client, these opens may or may not be revoked. If no byte- | |||
open is granted that is inconsistent with the existing open, the | range lock or open is granted that is inconsistent with the existing | |||
stateid for the open may remain valid, and be disconnected from the | open, the stateid for the open may remain valid and be disconnected | |||
revoked delegation, just as would be the case if the delegation were | from the revoked delegation, just as would be the case if the | |||
returned. | delegation were returned. | |||
For example, if an OPEN for read-write with DENY=NONE is associated | For example, if an OPEN for OPEN4_SHARE_ACCESS_BOTH with a deny of | |||
with the delegation, granting of another such OPEN to a different | OPEN4_SHARE_DENY_NONE is associated with the delegation, granting of | |||
client will revoke the delegation but need not revoke the OPEN, since | another such OPEN to a different client will revoke the delegation | |||
no lock inconsistent with that OPEN has been granted. On the other | but need not revoke the OPEN, since the two OPENs are consistent with | |||
hand, if an OPEN denying write is granted, then the existing open | each other. On the other hand, if an OPEN denying write access is | |||
must be revoked. | granted, then the existing OPEN must be revoked. | |||
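The consistency test in the example above can be sketched as a share reservation check. The constant values are those defined for OPEN in the protocol's XDR. The function name is hypothetical, and the sketch covers only share access and deny modes, not byte-range locks.

```python
# Share access and deny values as defined in the protocol's XDR.
OPEN4_SHARE_ACCESS_READ = 1
OPEN4_SHARE_ACCESS_WRITE = 2
OPEN4_SHARE_ACCESS_BOTH = 3
OPEN4_SHARE_DENY_NONE = 0
OPEN4_SHARE_DENY_READ = 1
OPEN4_SHARE_DENY_WRITE = 2
OPEN4_SHARE_DENY_BOTH = 3


def opens_conflict(access_a: int, deny_a: int,
                   access_b: int, deny_b: int) -> bool:
    """Return True if two opens are inconsistent with each other.

    Two opens conflict when either one denies an access mode
    that the other one holds.
    """
    return bool((deny_a & access_b) or (deny_b & access_a))
```

Under this check, an existing open survives revocation of its delegation when opens_conflict returns False for the newly granted open, and must be revoked when it returns True.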
When opens and/or locks are revoked, the applications holding these | When opens and/or locks are revoked, the applications holding these | |||
opens or locks need to be notified. This notification usually occurs | opens or locks need to be notified. This notification usually occurs | |||
by returning errors for READ/WRITE operations or when a close is | by returning errors for READ/WRITE operations or when a close is | |||
attempted for the open file. | attempted for the open file. | |||
If no opens exist for the file at the point the delegation is | If no opens exist for the file at the point the delegation is | |||
revoked, then notification of the revocation is unnecessary. | revoked, then notification of the revocation is unnecessary. | |||
However, if there is modified data present at the client for the | However, if there is modified data present at the client for the | |||
file, the user of the application should be notified. Unfortunately, | file, the user of the application should be notified. Unfortunately, | |||
it may not be possible to notify the user since active applications | it may not be possible to notify the user since active applications | |||
may not be present at the client. See Section 10.5.1 for additional | may not be present at the client. See Section 10.5.1 for additional | |||
details. | details. | |||
10.4.7. Delegations via WANT_DELEGATION | 10.4.7. Delegations via WANT_DELEGATION | |||
In addition to providing delegations as part of the reply to OPEN | In addition to providing delegations as part of the reply to OPEN | |||
operations, servers MAY provide delegations separate from open, via | operations, servers MAY provide delegations separate from open, via | |||
the OPTIONAL WANT_DELEGATION operation. This allows delegations to | the OPTIONAL WANT_DELEGATION operation. This allows delegations to | |||
be obtained in advance of an OPEN that might benefit from them, for | be obtained in advance of an OPEN that might benefit from them, for | |||
objects which are not a valid target of OPEN, or to deal with cases | objects that are not a valid target of OPEN, or to deal with cases in | |||
in which a delegation has been recalled and the client wants to make | which a delegation has been recalled and the client wants to make an | |||
an attempt to re-establish it if the absence of use by other clients | attempt to re-establish it if the absence of use by other clients | |||
allows that. | allows that. | |||
The WANT_DELEGATION operation may be performed on any type of file | The WANT_DELEGATION operation may be performed on any type of file | |||
object other than a directory. | object other than a directory. | |||
When a delegation is obtained using WANT_DELEGATION, any open files | When a delegation is obtained using WANT_DELEGATION, any open files | |||
for the same filehandle held by that client are to be treated as | for the same filehandle held by that client are to be treated as | |||
subordinate to the delegation, just as if they had been created using | subordinate to the delegation, just as if they had been created using | |||
an OPEN of type CLAIM_DELEGATE_CUR. They are otherwise unchanged as | an OPEN of type CLAIM_DELEGATE_CUR. They are otherwise unchanged as | |||
to seqid, access and deny modes, and the relationship with byte-range | to seqid, access and deny modes, and the relationship with byte-range | |||
locks. Similarly, existing byte-range locks subordinate to an open | locks. Similarly, because existing byte-range locks are subordinate | |||
which becomes subordinate to a delegation, become indirectly | to an open, those byte-range locks also become indirectly subordinate | |||
subordinate to that new delegation. | to that new delegation. | |||
The WANT_DELEGATION operation provides for delivery of delegations | The WANT_DELEGATION operation provides for delivery of delegations | |||
via callbacks, when the delegations are not immediately available. | via callbacks, when the delegations are not immediately available. | |||
When a requested delegation is available, it is delivered to the | When a requested delegation is available, it is delivered to the | |||
client via a CB_PUSH_DELEG operation. When this happens, open files | client via a CB_PUSH_DELEG operation. When this happens, open files | |||
for the same filehandle become subordinate to the new delegation at | for the same filehandle become subordinate to the new delegation at | |||
the point at which the delegation is delivered, just as if they had | the point at which the delegation is delivered, just as if they had | |||
been created using an OPEN of type CLAIM_DELEGATE_CUR. Similarly, | been created using an OPEN of type CLAIM_DELEGATE_CUR. Similarly, | |||
for existing byte-range locks subordinate to an open. | this occurs for existing byte-range locks subordinate to an open. | |||
10.5. Data Caching and Revocation | 10.5. Data Caching and Revocation | |||
When locks and delegations are revoked, the assumptions upon which | When locks and delegations are revoked, the assumptions upon which | |||
successful caching depend are no longer guaranteed. For any locks or | successful caching depends are no longer guaranteed. For any locks | |||
share reservations that have been revoked, the corresponding state- | or share reservations that have been revoked, the corresponding | |||
owner needs to be notified. This notification includes applications | state-owner needs to be notified. This notification includes | |||
with a file open that has a corresponding delegation which has been | applications with a file open that has a corresponding delegation | |||
revoked. Cached data associated with the revocation must be removed | that has been revoked. Cached data associated with the revocation | |||
from the client. In the case of modified data existing in the | must be removed from the client. In the case of modified data | |||
client's cache, that data must be removed from the client without it | existing in the client's cache, that data must be removed from the | |||
being written to the server. As mentioned, the assumptions made by | client without being written to the server. As mentioned, the | |||
the client are no longer valid at the point when a lock or delegation | assumptions made by the client are no longer valid at the point when | |||
has been revoked. For example, another client may have been granted | a lock or delegation has been revoked. For example, another client | |||
a conflicting lock after the revocation of the lock at the first | may have been granted a conflicting byte-range lock after the | |||
client. Therefore, the data within the lock range may have been | revocation of the byte-range lock at the first client. Therefore, | |||
modified by the other client. Obviously, the first client is unable | the data within the lock range may have been modified by the other | |||
to guarantee to the application what has occurred to the file in the | client. Obviously, the first client is unable to guarantee to the | |||
case of revocation. | application what has occurred to the file in the case of revocation. | |||
Notification to a state-owner will in many cases consist of simply | Notification to a state-owner will in many cases consist of simply | |||
returning an error on the next and all subsequent READs/WRITEs to the | returning an error on the next and all subsequent READs/WRITEs to the | |||
open file or on the close. Where the methods available to a client | open file or on the close. Where the methods available to a client | |||
make such notification impossible because errors for certain | make such notification impossible because errors for certain | |||
operations may not be returned, more drastic action such as signals | operations may not be returned, more drastic action such as signals | |||
or process termination may be appropriate. The justification for | or process termination may be appropriate. The justification here is | |||
this is that an invariant for which an application depends on may be | that an invariant on which an application depends may be violated. | |||
violated. Depending on how errors are typically treated for the | Depending on how errors are typically treated for the client- | |||
client operating environment, further levels of notification | operating environment, further levels of notification including | |||
including logging, console messages, and GUI pop-ups may be | logging, console messages, and GUI pop-ups may be appropriate. | |||
appropriate. | ||||
10.5.1. Revocation Recovery for Write Open Delegation | 10.5.1. Revocation Recovery for Write Open Delegation | |||
Revocation recovery for a write open delegation poses the special | Revocation recovery for an OPEN_DELEGATE_WRITE delegation poses the | |||
issue of modified data in the client cache while the file is not | special issue of modified data in the client cache while the file is | |||
open. In this situation, any client which does not flush modified | not open. In this situation, any client that does not flush modified | |||
data to the server on each close must ensure that the user receives | data to the server on each close must ensure that the user receives | |||
appropriate notification of the failure as a result of the | appropriate notification of the failure as a result of the | |||
revocation. Since such situations may require human action to | revocation. Since such situations may require human action to | |||
correct problems, notification schemes in which the appropriate user | correct problems, notification schemes in which the appropriate user | |||
or administrator is notified may be necessary. Logging and console | or administrator is notified may be necessary. Logging and console | |||
messages are typical examples. | messages are typical examples. | |||
If there is modified data on the client, it must not be flushed | If there is modified data on the client, it must not be flushed | |||
normally to the server. A client may attempt to provide a copy of | normally to the server. A client may attempt to provide a copy of | |||
the file data as modified during the delegation under a different | the file data as modified during the delegation under a different | |||
name in the file system name space to ease recovery. Note that when | name in the file system name space to ease recovery. Note that when | |||
the client can determine that the file has not been modified by any | the client can determine that the file has not been modified by any | |||
other client, or when the client has a complete cached copy of file | other client, or when the client has a complete cached copy of the | |||
in question, such a saved copy of the client's view of the file may | file in question, such a saved copy of the client's view of the file | |||
be of particular value for recovery. In other case, recovery using a | may be of particular value for recovery. In another case, recovery | |||
copy of the file based partially on the client's cached data and | using a copy of the file based partially on the client's cached data | |||
partially on the server copy as modified by other clients, will be | and partially on the server's copy as modified by other clients will | |||
anything but straightforward, so clients may avoid saving file | be anything but straightforward, so clients may avoid saving file | |||
contents in these situations or mark the results specially to warn | contents in these situations or specially mark the results to warn | |||
users of possible problems. | users of possible problems. | |||
Saving of such modified data in delegation revocation situations may | Saving of such modified data in delegation revocation situations may | |||
be limited to files of a certain size or might be used only when | be limited to files of a certain size or might be used only when | |||
sufficient disk space is available within the target file system. | sufficient disk space is available within the target file system. | |||
Such saving may also be restricted to situations when the client has | Such saving may also be restricted to situations when the client has | |||
sufficient buffering resources to keep the cached copy available | sufficient buffering resources to keep the cached copy available | |||
until it is properly stored to the target file system. | until it is properly stored to the target file system. | |||
10.6. Attribute Caching | 10.6. Attribute Caching | |||
This section pertains to the caching of a file's attributes on a | This section pertains to the caching of a file's attributes on a | |||
client when that client does not hold a delegation on the file. | client when that client does not hold a delegation on the file. | |||
The attributes discussed in this section do not include named | The attributes discussed in this section do not include named | |||
attributes. Individual named attributes are analogous to files and | attributes. Individual named attributes are analogous to files, and | |||
caching of the data for these needs to be handled just as data | caching of the data for these needs to be handled just as data | |||
caching is for ordinary files. Similarly, LOOKUP results from an | caching is for ordinary files. Similarly, LOOKUP results from an | |||
OPENATTR directory are to be cached on the same basis as any other | OPENATTR directory (as well as the directory's contents) are to be | |||
pathnames and similarly for directory contents. | cached on the same basis as any other pathnames. | |||
Clients may cache file attributes obtained from the server and use | Clients may cache file attributes obtained from the server and use | |||
them to avoid subsequent GETATTR requests. Such caching is write | them to avoid subsequent GETATTR requests. Such caching is write | |||
through in that modification to file attributes is always done by | through in that modification to file attributes is always done by | |||
means of requests to the server and should not be done locally and | means of requests to the server and should not be done locally and | |||
cached. The exception to this are modifications to attributes that | should not be cached. The exception to this are modifications to | |||
are intimately connected with data caching. Therefore, extending a | attributes that are intimately connected with data caching. | |||
file by writing data to the local data cache is reflected immediately | Therefore, extending a file by writing data to the local data cache | |||
in the size as seen on the client without this change being | is reflected immediately in the size as seen on the client without | |||
immediately reflected on the server. Normally such changes are not | this change being immediately reflected on the server. Normally, | |||
propagated directly to the server but when the modified data is | such changes are not propagated directly to the server, but when the | |||
flushed to the server, analogous attribute changes are made on the | modified data is flushed to the server, analogous attribute changes | |||
server. When open delegation is in effect, the modified attributes | are made on the server. When OPEN delegation is in effect, the | |||
may be returned to the server in reaction to a CB_RECALL call. | modified attributes may be returned to the server in reaction to a | |||
CB_RECALL call. | ||||
The result of local caching of attributes is that the attribute | The result of local caching of attributes is that the attribute | |||
caches maintained on individual clients will not be coherent. | caches maintained on individual clients will not be coherent. | |||
Changes made in one order on the server may be seen in a different | Changes made in one order on the server may be seen in a different | |||
order on one client and in a third order on a different client. | order on one client and in a third order on another client. | |||
The typical file system application programming interfaces do not | The typical file system application programming interfaces do not | |||
provide means to atomically modify or interrogate attributes for | provide means to atomically modify or interrogate attributes for | |||
multiple files at the same time. The following rules provide an | multiple files at the same time. The following rules provide an | |||
environment where the potential incoherences mentioned above can be | environment where the potential incoherencies mentioned above can be | |||
reasonably managed. These rules are derived from the practice of | reasonably managed. These rules are derived from the practice of | |||
previous NFS protocols. | previous NFS protocols. | |||
o All attributes for a given file (per-fsid attributes excepted) are | o All attributes for a given file (per-fsid attributes excepted) are | |||
cached as a unit at the client so that no non-serializability can | cached as a unit at the client so that no non-serializability can | |||
arise within the context of a single file. | arise within the context of a single file. | |||
o An upper time boundary is maintained on how long a client cache | o An upper time boundary is maintained on how long a client cache | |||
entry can be kept without being refreshed from the server. | entry can be kept without being refreshed from the server. | |||
skipping to change at page 219, line 7 | skipping to change at page 219, line 14 | |||
containing RPC. This includes directory operations that update | containing RPC. This includes directory operations that update | |||
attributes indirectly. This is accomplished by following the | attributes indirectly. This is accomplished by following the | |||
modifying operation with a GETATTR operation and then using the | modifying operation with a GETATTR operation and then using the | |||
results of the GETATTR to update the client's cached attributes. | results of the GETATTR to update the client's cached attributes. | |||
Note that if the full set of attributes to be cached is requested by | Note that if the full set of attributes to be cached is requested by | |||
READDIR, the results can be cached by the client on the same basis as | READDIR, the results can be cached by the client on the same basis as | |||
attributes obtained via GETATTR. | attributes obtained via GETATTR. | |||
A client may validate its cached version of attributes for a file by | A client may validate its cached version of attributes for a file by | |||
fetching just both the change and time_access attributes and assuming | fetching both the change and time_access attributes and assuming that | |||
that if the change attribute has the same value as it did when the | if the change attribute has the same value as it did when the | |||
attributes were cached, then no attributes other than time_access | attributes were cached, then no attributes other than time_access | |||
have changed. The reason why time_access is also fetched is because | have changed. The reason why time_access is also fetched is because | |||
many servers operate in environments where the operation that updates | many servers operate in environments where the operation that updates | |||
change does not update time_access. For example, POSIX file | change does not update time_access. For example, POSIX file | |||
semantics do not update access time when a file is modified by the | semantics do not update access time when a file is modified by the | |||
write system call [18]. Therefore, the client that wants a current | write system call [18]. Therefore, the client that wants a current | |||
time_access value should fetch it with change during the attribute | time_access value should fetch it with change during the attribute | |||
cache validation processing and update its cached time_access. | cache validation processing and update its cached time_access. | |||
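The validation step described above might look like the following sketch. The CachedAttrs record and the revalidate function are hypothetical names. The fetched values are assumed to come from a GETATTR that requested only the change and time_access attributes.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CachedAttrs:
    """Hypothetical per-file attribute cache entry."""
    change: int
    time_access: int
    other: Dict[str, int] = field(default_factory=dict)


def revalidate(cache: CachedAttrs, fetched_change: int,
               fetched_time_access: int) -> bool:
    """Return True if the cached attributes may still be used.

    When the change attribute is unchanged, only time_access is
    assumed to have moved, so it alone is refreshed. When change
    differs, the caller needs to refetch the full attribute set.
    """
    if fetched_change == cache.change:
        cache.time_access = fetched_time_access
        return True
    return False
```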
The client may maintain a cache of modified attributes for those | The client may maintain a cache of modified attributes for those | |||
skipping to change at page 219, line 36 | skipping to change at page 219, line 43 | |||
file object. If an NFS client is caching the content of a file | file object. If an NFS client is caching the content of a file | |||
object, whether it is a regular file, directory, or symbolic link, | object, whether it is a regular file, directory, or symbolic link, | |||
the client SHOULD NOT update the time_access attribute (via SETATTR | the client SHOULD NOT update the time_access attribute (via SETATTR | |||
or a small READ or READDIR request) on the server with each read that | or a small READ or READDIR request) on the server with each read that | |||
is satisfied from cache. The reason is that this can defeat the | is satisfied from cache. The reason is that this can defeat the | |||
performance benefits of caching content, especially since an explicit | performance benefits of caching content, especially since an explicit | |||
SETATTR of time_access may alter the change attribute on the server. | SETATTR of time_access may alter the change attribute on the server. | |||
If the change attribute changes, clients that are caching the content | If the change attribute changes, clients that are caching the content | |||
will think the content has changed, and will re-read unmodified data | will think the content has changed, and will re-read unmodified data | |||
from the server. Nor is the client encouraged to maintain a modified | from the server. Nor is the client encouraged to maintain a modified | |||
version of time_access in its cache, since this would mean that the | version of time_access in its cache, since the client either would | |||
client will either eventually have to write the access time to the | eventually have to write the access time to the server with bad | |||
server with bad performance effects, or it would never update the | performance effects or never update the server's time_access, thereby | |||
server's time_access, thereby resulting in a situation where an | resulting in a situation where an application that caches access time | |||
application that caches access time between a close and open of the | between a close and open of the same file observes the access time | |||
same file observes the access time oscillating between the past and | oscillating between the past and present. The time_access attribute | |||
present. The time_access attribute always means the time of last | always means the time of last access to a file by a read that was | |||
access to a file by a read that was satisfied by the server. This | satisfied by the server. This way clients will tend to see only | |||
way clients will tend to see only time_access changes that go forward | time_access changes that go forward in time. | |||
in time. | ||||
10.7. Data and Metadata Caching and Memory Mapped Files | 10.7. Data and Metadata Caching and Memory Mapped Files | |||
Some operating environments include the capability for an application | Some operating environments include the capability for an application | |||
to map a file's content into the application's address space. Each | to map a file's content into the application's address space. Each | |||
time the application accesses a memory location that corresponds to a | time the application accesses a memory location that corresponds to a | |||
block that has not been loaded into the address space, a page fault | block that has not been loaded into the address space, a page fault | |||
occurs and the file is read (or if the block does not exist in the | occurs and the file is read (or if the block does not exist in the | |||
file, the block is allocated and then instantiated in the | file, the block is allocated and then instantiated in the | |||
application's address space). | application's address space). | |||
As long as each memory mapped access to the file requires a page | As long as each memory-mapped access to the file requires a page | |||
fault, the relevant attributes of the file that are used to detect | fault, the relevant attributes of the file that are used to detect | |||
access and modification (time_access, time_metadata, time_modify, and | access and modification (time_access, time_metadata, time_modify, and | |||
change) will be updated. However, in many operating environments, | change) will be updated. However, in many operating environments, | |||
when page faults are not required these attributes will not be | when page faults are not required, these attributes will not be | |||
updated on reads or updates to the file via memory access (regardless | updated on reads or updates to the file via memory access (regardless | |||
whether the file is local file or is being access remotely). A | of whether the file is local or is accessed remotely). A client or | |||
client or server MAY fail to update attributes of a file that is | server MAY fail to update attributes of a file that is being accessed | |||
being accessed via memory mapped I/O. This has several implications: | via memory-mapped I/O. This has several implications: | |||
o If there is an application on the server that has memory mapped a | o If there is an application on the server that has memory mapped a | |||
file that a client is also accessing, the client may not be able | file that a client is also accessing, the client may not be able | |||
to get a consistent value of the change attribute to determine | to get a consistent value of the change attribute to determine | |||
whether its cache is stale or not. A server that knows that the | whether or not its cache is stale. A server that knows that the | |||
file is memory mapped could always pessimistically return updated | file is memory-mapped could always pessimistically return updated | |||
values for change so as to force the application to always get the | values for change so as to force the application to always get the | |||
most up to date data and metadata for the file. However, due to | most up-to-date data and metadata for the file. However, due to | |||
the negative performance implications of this, such behavior is | the negative performance implications of this, such behavior is | |||
OPTIONAL. | OPTIONAL. | |||
o If the memory mapped file is not being modified on the server, and | o If the memory-mapped file is not being modified on the server, and | |||
instead is just being read by an application via the memory mapped | instead is just being read by an application via the memory-mapped | |||
interface, the client will not see an updated time_access | interface, the client will not see an updated time_access | |||
attribute. However, in many operating environments, neither will | attribute. However, in many operating environments, neither will | |||
any process running on the server. Thus NFS clients are at no | any process running on the server. Thus, NFS clients are at no | |||
disadvantage with respect to local processes. | disadvantage with respect to local processes. | |||
o If there is another client that is memory mapping the file, and if | o If there is another client that is memory mapping the file, and if | |||
that client is holding a write delegation, the same set of issues | that client is holding an OPEN_DELEGATE_WRITE delegation, the same | |||
as discussed in the previous two bullet items apply. So, when a | set of issues as discussed in the previous two bullet points | |||
server does a CB_GETATTR to a file that the client has modified in | apply. So, when a server does a CB_GETATTR to a file that the | |||
its cache, the reply from CB_GETATTR will not necessarily be | client has modified in its cache, the reply from CB_GETATTR will | |||
accurate. As discussed earlier, the client's obligation is to | not necessarily be accurate. As discussed earlier, the client's | |||
report that the file has been modified since the delegation was | obligation is to report that the file has been modified since the | |||
granted, not whether it has been modified again between successive | delegation was granted, not whether it has been modified again | |||
CB_GETATTR calls, and the server MUST assume that any file the | between successive CB_GETATTR calls, and the server MUST assume | |||
client has modified in cache has been modified again between | that any file the client has modified in cache has been modified | |||
successive CB_GETATTR calls. Depending on the nature of the | again between successive CB_GETATTR calls. Depending on the | |||
client's memory management system, this weak obligation may not be | nature of the client's memory management system, this weak | |||
possible. A client MAY return stale information in CB_GETATTR | obligation may not be possible. A client MAY return stale | |||
whenever the file is memory mapped. | information in CB_GETATTR whenever the file is memory-mapped. | |||
o The mixture of memory mapping and file locking on the same file is | o The mixture of memory mapping and byte-range locking on the same | |||
problematic. Consider the following scenario, where a page size | file is problematic. Consider the following scenario, where a | |||
on each client is 8192 bytes. | page size on each client is 8192 bytes. | |||
* Client A memory maps first page (8192 bytes) of file X | * Client A memory maps the first page (8192 bytes) of file X. | |||
* Client B memory maps first page (8192 bytes) of file X | * Client B memory maps the first page (8192 bytes) of file X. | |||
* Client A write locks first 4096 bytes | * Client A WRITE_LT locks the first 4096 bytes. | |||
* Client B write locks second 4096 bytes | * Client B WRITE_LT locks the second 4096 bytes. | |||
* Client A, via a STORE instruction modifies part of its locked | * Client A, via a STORE instruction, modifies part of its locked | |||
region. | byte-range. | |||
* Simultaneous to client A, client B executes a STORE on part of | * Simultaneous to client A, client B executes a STORE on part of | |||
its locked region. | its locked byte-range. | |||
Here the challenge is for each client to resynchronize to get a | Here the challenge is for each client to resynchronize to get a | |||
correct view of the first page. In many operating environments, the | correct view of the first page. In many operating environments, the | |||
virtual memory management systems on each client only know a page is | virtual memory management systems on each client only know a page is | |||
modified, not that a subset of the page corresponding to the | modified, not that a subset of the page corresponding to the | |||
respective lock regions has been modified. So it is not possible for | respective lock byte-ranges has been modified. So it is not possible | |||
each client to do the right thing, which is to only write to the | for each client to do the right thing, which is to write to the | |||
server that portion of the page that is locked. For example, if | server only that portion of the page that is locked. For example, if | |||
client A simply writes out the page, and then client B writes out the | client A simply writes out the page, and then client B writes out the | |||
page, client A's data is lost. | page, client A's data is lost. | |||
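
The lost update in this scenario can be reproduced in a few lines. The following is a minimal sketch, not protocol text: it assumes the first page of file X is modeled as an in-memory byte array, and the helper names (whole_page_writeback, locked_range_writeback) are hypothetical. It contrasts page-granular writeback with writeback limited to each client's locked byte-range.

```python
PAGE_SIZE = 8192   # page size on each client, as in the scenario
HALF = 4096        # each client locks one half of the first page


def _modified_pages():
    """Return (server, page_a, page_b) after both clients have done their STOREs."""
    server = bytearray(PAGE_SIZE)      # first page of file X, initially zeros
    page_a = bytearray(server)         # client A's mapped copy
    page_b = bytearray(server)         # client B's mapped copy
    page_a[0:4] = b"AAAA"              # A stores into its locked range [0, 4096)
    page_b[HALF:HALF + 4] = b"BBBB"    # B stores into its locked range [4096, 8192)
    return server, page_a, page_b


def whole_page_writeback():
    """Each client writes out its whole dirty page; the later write wins."""
    server, page_a, page_b = _modified_pages()
    server[:] = page_a
    server[:] = page_b                 # B's stale first half overwrites A's data
    return bytes(server)


def locked_range_writeback():
    """Each client writes only the byte-range it holds a lock on."""
    server, page_a, page_b = _modified_pages()
    server[0:HALF] = page_a[0:HALF]
    server[HALF:PAGE_SIZE] = page_b[HALF:PAGE_SIZE]
    return bytes(server)
```

With whole-page writeback the first four bytes on the server end up as zeros again, while the byte-range variant keeps both clients' stores. The second variant presumes the client can tell which part of the page was modified, which is exactly what many virtual memory systems cannot do.
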
Moreover, if mandatory locking is enabled on the file, then we have a | Moreover, if mandatory locking is enabled on the file, then we have a | |||
different problem. When clients A and B execute the STORE | different problem. When clients A and B execute the STORE | |||
instructions, the resulting page faults require a byte-range lock on | instructions, the resulting page faults require a byte-range lock on | |||
the entire page. Each client then tries to extend their locked range | the entire page. Each client then tries to extend their locked range | |||
to the entire page, which results in a deadlock. Communicating the | to the entire page, which results in a deadlock. Communicating the | |||
NFS4ERR_DEADLOCK error to a STORE instruction is difficult at best. | NFS4ERR_DEADLOCK error to a STORE instruction is difficult at best. | |||
If a client is locking the entire memory mapped file, there is no | If a client is locking the entire memory-mapped file, there is no | |||
problem with advisory or mandatory byte-range locking, at least until | problem with advisory or mandatory byte-range locking, at least until | |||
the client unlocks a region in the middle of the file. | the client unlocks a byte-range in the middle of the file. | |||
Given the above issues the following are permitted: | Given the above issues, the following are permitted: | |||
o Clients and servers MAY deny memory mapping a file they know there | o Clients and servers MAY deny memory mapping a file for which they | |||
are byte-range locks for. | know there are byte-range locks. | |||
o Clients and servers MAY deny a byte-range lock on a file they know | o Clients and servers MAY deny a byte-range lock on a file they know | |||
is memory mapped. | is memory-mapped. | |||
o A client MAY deny memory mapping a file that it knows requires | o A client MAY deny memory mapping a file that it knows requires | |||
mandatory locking for I/O. If mandatory locking is enabled after | mandatory locking for I/O. If mandatory locking is enabled after | |||
the file is opened and mapped, the client MAY deny the application | the file is opened and mapped, the client MAY deny the application | |||
further access to its mapped file. | further access to its mapped file. | |||
10.8. Name and Directory Caching without Directory Delegations | 10.8. Name and Directory Caching without Directory Delegations | |||
The NFSv4.1 directory delegation facility (described in Section 10.9 | The NFSv4.1 directory delegation facility (described in Section 10.9 | |||
below) is OPTIONAL for servers to implement. Even where it is | below) is OPTIONAL for servers to implement. Even where it is | |||
implemented, it may not be always be functional because of resource | implemented, it may not always be functional because of resource | |||
availability issues or other constraints. Thus, it is important to | availability issues or other constraints. Thus, it is important to | |||
understand how name and directory caching are done in the absence of | understand how name and directory caching are done in the absence of | |||
directory delegations. Those topics are discussed in the next in | directory delegations. These topics are discussed in the next two | |||
Section 10.8.1. | subsections. | |||
10.8.1. Name Caching | 10.8.1. Name Caching | |||
The results of LOOKUP and READDIR operations may be cached to avoid | The results of LOOKUP and READDIR operations may be cached to avoid | |||
the cost of subsequent LOOKUP operations. Just as in the case of | the cost of subsequent LOOKUP operations. Just as in the case of | |||
attribute caching, inconsistencies may arise among the various client | attribute caching, inconsistencies may arise among the various client | |||
caches. To mitigate the effects of these inconsistencies and given | caches. To mitigate the effects of these inconsistencies and given | |||
the context of typical file system APIs, an upper time boundary is | the context of typical file system APIs, an upper time boundary is | |||
maintained on how long a client name cache entry can be kept without | maintained for how long a client name cache entry can be kept without | |||
verifying that the entry has not been made invalid by a directory | verifying that the entry has not been made invalid by a directory | |||
change operation performed by another client. | change operation performed by another client. | |||
When a client is not making changes to a directory for which there | When a client is not making changes to a directory for which there | |||
exist name cache entries, the client needs to periodically fetch | exist name cache entries, the client needs to periodically fetch | |||
attributes for that directory to ensure that it is not being | attributes for that directory to ensure that it is not being | |||
modified. After determining that no modification has occurred, the | modified. After determining that no modification has occurred, the | |||
expiration time for the associated name cache entries may be updated | expiration time for the associated name cache entries may be updated | |||
to be the current time plus the name cache staleness bound. | to be the current time plus the name cache staleness bound. | |||
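
The periodic check described above might look roughly like the following. This is a sketch under stated assumptions: DirNameCache and revalidate are hypothetical names, the change attribute is treated as a plain integer, and the caller is assumed to have already fetched the directory's current change attribute from the server. Dropping the cached names on a mismatch is one possible client policy, not something this paragraph mandates.

```python
from dataclasses import dataclass, field


@dataclass
class DirNameCache:
    change: int                    # directory change attribute when names were cached
    expires: float                 # time after which the names must be revalidated
    names: dict = field(default_factory=dict)   # name -> opaque filehandle


def revalidate(cache, server_change, now, staleness_bound):
    """Compare the fetched change attribute with the cached one.

    Returns True and extends the expiration time if the directory is
    unmodified; otherwise drops the cached names and returns False.
    """
    if server_change == cache.change:
        cache.expires = now + staleness_bound   # current time plus the bound
        return True
    cache.names.clear()                         # directory was modified
    cache.change = server_change
    return False
```
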
When a client is making changes to a given directory, it needs to | When a client is making changes to a given directory, it needs to | |||
determine whether there have been changes made to the directory by | determine whether there have been changes made to the directory by | |||
other clients. It does this by using the change attribute as | other clients. It does this by using the change attribute as | |||
reported before and after the directory operation in the associated | reported before and after the directory operation in the associated | |||
change_info4 value returned for the operation. The server is able to | change_info4 value returned for the operation. The server is able to | |||
communicate to the client whether the change_info4 data is provided | communicate to the client whether the change_info4 data is provided | |||
atomically with respect to the directory operation. If the change | atomically with respect to the directory operation. If the change | |||
values are provided atomically, the client has a basis for | values are provided atomically, the client has a basis for | |||
determining, given proper care, whether other clients are modifying | determining, given proper care, whether other clients are modifying | |||
the directory is question. | the directory in question. | |||
The simplest way to enable the client to make this determination is | The simplest way to enable the client to make this determination is | |||
for the client to serialize all changes made to a specific directory. | for the client to serialize all changes made to a specific directory. | |||
When this is done, and the server provides before and after values of | When this is done, and the server provides before and after values of | |||
the change attribute atomically, the client can simply compare the | the change attribute atomically, the client can simply compare the | |||
after value of the change attribute from one operation on a directory | after value of the change attribute from one operation on a directory | |||
with the before value on the next subsequent operation modifying that | with the before value on the subsequent operation modifying that | |||
directory. When these are equal, the client is assured that no other | directory. When these are equal, the client is assured that no other | |||
client is modifying the directory in question. | client is modifying the directory in question. | |||
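
For the serialized case, the comparison reduces to a single equality test. A minimal sketch, assuming change values are integers; ChangeInfo mirrors the atomic, before, and after fields of change_info4, and sole_modifier is a hypothetical helper name.

```python
from typing import NamedTuple, Optional


class ChangeInfo(NamedTuple):
    """Mirrors the three fields of change_info4."""
    atomic: bool
    before: int
    after: int


def sole_modifier(last_after: Optional[int], cinfo: ChangeInfo) -> bool:
    """Return True if this reply is consistent with no other client having
    changed the directory since the client's previous operation on it.

    last_after is the 'after' value saved from that previous operation
    (None if the client has no saved value).
    """
    if not cinfo.atomic:
        return False               # non-atomic values prove nothing
    return last_after is not None and cinfo.before == last_after
```

After each successful comparison the client would save cinfo.after as the next last_after.
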
When such serialization is not used, and there may be multiple | When such serialization is not used, and there may be multiple | |||
simultaneous outstanding operations modifying a single directory sent | simultaneous outstanding operations modifying a single directory sent | |||
from a single client, making this sort of determination can be more | from a single client, making this sort of determination can be more | |||
complicated, since two such operations which are recognized as | complicated. If two such operations complete in a different order | |||
complete in a different order than they were actually performed, | than they were actually performed, that might give an appearance | |||
might give an appearance consistent with modification being made by | consistent with modification being made by another client. Where | |||
another client. Where this appears to happen, the client needs to | this appears to happen, the client needs to await the completion of | |||
await the completion of all such modifications that were started | all such modifications that were started previously, to see if the | |||
previously, to see if the outstanding before and after change numbers | outstanding before and after change numbers can be sorted into a | |||
can be sorted into a chain such that the before value of one change | chain such that the before value of one change number matches the | |||
number matches the after value of a previous one, in a chain | after value of a previous one, in a chain consistent with this client | |||
consistent with this client being the only one modifying the | being the only one modifying the directory. | |||
directory. | ||||
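
The chain test for unserialized operations can be sketched as follows. The function name chain_end is hypothetical, and the sketch assumes that every directory modification produces a distinct change value, so no two replies share the same before value. It is meant to be run once all modifications that were outstanding have completed.

```python
def chain_end(start, results):
    """Try to order (before, after) pairs into one chain beginning at start.

    start:   the last change value this client knows to be its own.
    results: iterable of (before, after) pairs from completed operations,
             in whatever order the replies arrived.
    Returns the final 'after' value if every pair fits into a single chain,
    or None if the replies are not consistent with this client being the
    only one modifying the directory.
    """
    by_before = {}
    for before, after in results:
        if before in by_before:
            return None            # two operations claim the same predecessor
        by_before[before] = after

    current = start
    while by_before:
        if current not in by_before:
            return None            # gap: some change is not accounted for
        current = by_before.pop(current)
    return current
```

For example, replies (11, 12), (10, 11), and (12, 13) arriving in that order still chain from 10 to 13, whereas (10, 11) and (12, 13) alone leave the step from 11 to 12 unexplained.
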
In either of these cases, the client is able to determine whether the | In either of these cases, the client is able to determine whether the | |||
directory is being modified by another client. If the comparison | directory is being modified by another client. If the comparison | |||
indicates that the directory was updated by another client, the name | indicates that the directory was updated by another client, the name | |||
cache associated with the modified directory is purged from the | cache associated with the modified directory is purged from the | |||
client. If the comparison indicates no modification, the name cache | client. If the comparison indicates no modification, the name cache | |||
can be updated on the client to reflect the directory operation and | can be updated on the client to reflect the directory operation and | |||
the associated timeout extended. The post-operation change value | the associated timeout can be extended. The post-operation change | |||
needs to be saved as the basis for future change_info4 comparisons. | value needs to be saved as the basis for future change_info4 | |||
comparisons. | ||||
As demonstrated by the scenario above, name caching requires that the | As demonstrated by the scenario above, name caching requires that the | |||
client revalidate name cache data by inspecting the change attribute | client revalidate name cache data by inspecting the change attribute | |||
of a directory at the point when the name cache item was cached. | of a directory at the point when the name cache item was cached. | |||
This requires that the server update the change attribute for | This requires that the server update the change attribute for | |||
directories when the contents of the corresponding directory is | directories when the contents of the corresponding directory is | |||
modified. For a client to use the change_info4 information | modified. For a client to use the change_info4 information | |||
appropriately and correctly, the server must report the pre and post | appropriately and correctly, the server must report the pre- and | |||
operation change attribute values atomically. When the server is | post-operation change attribute values atomically. When the server | |||
unable to report the before and after values atomically with respect | is unable to report the before and after values atomically with | |||
to the directory operation, the server must indicate that fact in the | respect to the directory operation, the server must indicate that | |||
change_info4 return value. When the information is not atomically | fact in the change_info4 return value. When the information is not | |||
reported, the client should not assume that other clients have not | atomically reported, the client should not assume that other clients | |||
changed the directory. | have not changed the directory. | |||
10.8.2. Directory Caching | 10.8.2. Directory Caching | |||
The results of READDIR operations may be used to avoid subsequent | The results of READDIR operations may be used to avoid subsequent | |||
READDIR operations. Just as in the cases of attribute and name | READDIR operations. Just as in the cases of attribute and name | |||
caching, inconsistencies may arise among the various client caches. | caching, inconsistencies may arise among the various client caches. | |||
To mitigate the effects of these inconsistencies, and given the | To mitigate the effects of these inconsistencies, and given the | |||
context of typical file system APIs, the following rules should be | context of typical file system APIs, the following rules should be | |||
followed: | followed: | |||
o Cached READDIR information for a directory which is not obtained | o Cached READDIR information for a directory that is not obtained in | |||
in a single READDIR operation must always be a consistent snapshot | a single READDIR operation must always be a consistent snapshot of | |||
of directory contents. This is determined by using a GETATTR | directory contents. This is determined by using a GETATTR before | |||
before the first READDIR and after the last of READDIR that | the first READDIR and after the last READDIR that contributes to | |||
contributes to the cache. | the cache. | |||
o An upper time boundary is maintained to indicate the length of | o An upper time boundary is maintained to indicate the length of | |||
time a directory cache entry is considered valid before the client | time a directory cache entry is considered valid before the client | |||
must revalidate the cached information. | must revalidate the cached information. | |||
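
The first rule can be sketched as a GETATTR bracket around the READDIR sequence. In this sketch, get_change and read_pages are hypothetical callables standing in for the GETATTR and READDIR operations, and the bounded retry is an assumption of the example rather than something the rule requires.

```python
def read_directory_snapshot(get_change, read_pages, max_attempts=3):
    """Return (change, entries) for a consistent snapshot, or None.

    get_change: callable returning the directory's change attribute
                (stands in for GETATTR).
    read_pages: callable returning an iterable of lists of entry names
                (stands in for the sequence of READDIR operations).
    """
    for _ in range(max_attempts):
        before = get_change()
        entries = [name for page in read_pages() for name in page]
        after = get_change()
        if before == after:
            return after, entries  # nothing changed while reading
    return None                    # directory kept changing; do not cache
```
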
The revalidation technique parallels that discussed in the case of | The revalidation technique parallels that discussed in the case of | |||
name caching. When the client is not changing the directory in | name caching. When the client is not changing the directory in | |||
question, checking the change attribute of the directory with GETATTR | question, checking the change attribute of the directory with GETATTR | |||
is adequate. The lifetime of the cache entry can be extended at | is adequate. The lifetime of the cache entry can be extended at | |||
these checkpoints. When a client is modifying the directory, the | these checkpoints. When a client is modifying the directory, the | |||
skipping to change at page 224, line 33 | skipping to change at page 224, line 40 | |||
are other clients modifying the directory. If it is determined that | are other clients modifying the directory. If it is determined that | |||
no other client modifications are occurring, the client may update | no other client modifications are occurring, the client may update | |||
its directory cache to reflect its own changes. | its directory cache to reflect its own changes. | |||
As demonstrated previously, directory caching requires that the | As demonstrated previously, directory caching requires that the | |||
client revalidate directory cache data by inspecting the change | client revalidate directory cache data by inspecting the change | |||
attribute of a directory at the point when the directory was cached. | attribute of a directory at the point when the directory was cached. | |||
This requires that the server update the change attribute for | This requires that the server update the change attribute for | |||
directories when the contents of the corresponding directory is | directories when the contents of the corresponding directory is | |||
modified. For a client to use the change_info4 information | modified. For a client to use the change_info4 information | |||
appropriately and correctly, the server must report the pre and post | appropriately and correctly, the server must report the pre- and | |||
operation change attribute values atomically. When the server is | post-operation change attribute values atomically. When the server | |||
unable to report the before and after values atomically with respect | is unable to report the before and after values atomically with | |||
to the directory operation, the server must indicate that fact in the | respect to the directory operation, the server must indicate that | |||
change_info4 return value. When the information is not atomically | fact in the change_info4 return value. When the information is not | |||
reported, the client should not assume that other clients have not | atomically reported, the client should not assume that other clients | |||
changed the directory. | have not changed the directory. | |||
10.9. Directory Delegations | 10.9. Directory Delegations | |||
10.9.1. Introduction to Directory Delegations | 10.9.1. Introduction to Directory Delegations | |||
Directory caching for the NFSv4.1 protocol, as previously described, | Directory caching for the NFSv4.1 protocol, as previously described, | |||
is similar to file caching in previous versions. Clients typically | is similar to file caching in previous versions. Clients typically | |||
cache directory information for a duration determined by the client. | cache directory information for a duration determined by the client. | |||
At the end of a predefined timeout, the client will query the server | At the end of a predefined timeout, the client will query the server | |||
to see if the directory has been updated. By caching attributes, | to see if the directory has been updated. By caching attributes, | |||
skipping to change at page 226, line 16 | skipping to change at page 226, line 25 | |||
In addition to asking for delegations, a client can also ask for | In addition to asking for delegations, a client can also ask for | |||
notifications for certain events. These events include changes to | notifications for certain events. These events include changes to | |||
the directory's attributes and/or its contents. If a client asks for | the directory's attributes and/or its contents. If a client asks for | |||
notification for a certain event, the server will notify the client | notification for a certain event, the server will notify the client | |||
when that event occurs. This will not result in the delegation being | when that event occurs. This will not result in the delegation being | |||
recalled for that client. The notifications are asynchronous and | recalled for that client. The notifications are asynchronous and | |||
provide a way of avoiding recalls in situations where a directory is | provide a way of avoiding recalls in situations where a directory is | |||
changing enough that the pure recall model may not be effective while | changing enough that the pure recall model may not be effective while | |||
trying to allow the client to get substantial benefit. In the | trying to allow the client to get substantial benefit. In the | |||
absence of notifications, once the delegation is recalled the client | absence of notifications, once the delegation is recalled the client | |||
has to refresh its directory cache which might not be very efficient | has to refresh its directory cache; this might not be very efficient | |||
for very large directories. | for very large directories. | |||
The delegation is read-only and the client may not make changes to | The delegation is read-only and the client may not make changes to | |||
the directory other than by performing NFSv4.1 operations that modify | the directory other than by performing NFSv4.1 operations that modify | |||
the directory or the associated file attributes so that the server | the directory or the associated file attributes so that the server | |||
has knowledge of these changes. In order to keep the client | has knowledge of these changes. In order to keep the client's | |||
namespace synchronized with the server, the server will, if the | namespace synchronized with the server, the server will notify the | |||
client has requested notifications, notify the client holding the | delegation-holding client (assuming it has requested notifications) | |||
delegation of the changes made as a result. This is to avoid any | of the changes made as a result of that client's directory-modifying | |||
need for subsequent GETATTR or READDIR calls to the server. If a | operations. This is to avoid any need for that client to send | |||
single client is holding the delegation and that client makes any | subsequent GETATTR or READDIR operations to the server. If a single | |||
changes to the directory (i.e. the changes are made via operations | client is holding the delegation and that client makes any changes to | |||
sent though a session associated with the client ID holding the | the directory (i.e., the changes are made via operations sent on a | |||
delegation), the delegation will not be recalled. Multiple clients | session associated with the client ID holding the delegation), the | |||
may hold a delegation on the same directory, but if any such client | delegation will not be recalled. Multiple clients may hold a | |||
modifies the directory, the server MUST recall the delegation from | delegation on the same directory, but if any such client modifies the | |||
the other clients, unless those clients have made provisions to be | directory, the server MUST recall the delegation from the other | |||
notified of that sort of modification. | clients, unless those clients have made provisions to be notified of | |||
that sort of modification. | ||||
Delegations can be recalled by the server at any time. Normally, the | Delegations can be recalled by the server at any time. Normally, the | |||
server will recall the delegation when the directory changes in a way | server will recall the delegation when the directory changes in a way | |||
that is not covered by the notification, or when the directory | that is not covered by the notification, or when the directory | |||
changes and notifications have not been requested. If another client | changes and notifications have not been requested. If another client | |||
removes the directory for which a delegation has been granted, the | removes the directory for which a delegation has been granted, the | |||
server will recall the delegation. | server will recall the delegation. | |||
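The recall rule for a directory shared by several delegation holders can be sketched as follows. This is only an illustrative model, not server code: the DirDelegation type, the delegations_to_recall function, and the change-type strings are hypothetical names introduced here, and the sketch assumes that a change type is "covered" exactly when the holder asked to be notified of it.

```python
from dataclasses import dataclass


@dataclass
class DirDelegation:
    """Hypothetical record of one client's delegation on a directory."""
    client_id: str
    # Change types this client asked to be notified about (assumed labels).
    notify_types: frozenset = frozenset()


def delegations_to_recall(delegations, modifier_client_id, change_type):
    """Return the client IDs whose delegation must be recalled after a change.

    The modifying client keeps its own delegation; other holders keep theirs
    only if they arranged to be notified of this kind of change.
    """
    recall = []
    for d in delegations:
        if d.client_id == modifier_client_id:
            continue  # changes made by the holder itself do not cause a recall
        if change_type in d.notify_types:
            continue  # this holder is notified instead of recalled
        recall.append(d.client_id)
    return recall
```

In this model a lone holder that modifies its own directory yields an empty recall list, which matches the single-client case described above.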
10.9.3. Attributes in Support of Directory Notifications | 10.9.3. Attributes in Support of Directory Notifications | |||
See Section 5.11 for a description of the attributes associated with | See Section 5.11 for a description of the attributes associated with | |||
directory notifications. | directory notifications. | |||
10.9.4. Directory Delegation Recall | 10.9.4. Directory Delegation Recall | |||
The server will recall the directory delegation by sending a callback | The server will recall the directory delegation by sending a callback | |||
to the client. It will use the same callback procedure as used for | to the client. It will use the same callback procedure as used for | |||
recalling file delegations. The server will recall the delegation | recalling file delegations. The server will recall the delegation | |||
when the directory changes in a way that is not covered by the | when the directory changes in a way that is not covered by the | |||
notification. However the server need not recall the delegation if | notification. However, the server need not recall the delegation if | |||
attributes of an entry within the directory change. | attributes of an entry within the directory change. | |||
If the server notices that handing out a delegation for a directory | If the server notices that handing out a delegation for a directory | |||
is causing too many notifications to be sent out, it may decide not | is causing too many notifications to be sent out, it may decide to | |||
to hand out delegations for that directory, or recall those already | not hand out delegations for that directory and/or recall those | |||
granted. If a client tries to remove the directory for which a | already granted. If a client tries to remove the directory for which | |||
delegation has been granted, the server will recall all associated | a delegation has been granted, the server will recall all associated | |||
delegations. | delegations. | |||
The implementation sections for a number of operations describe | The implementation sections for a number of operations describe | |||
situations in which notification or delegation recall would be | situations in which notification or delegation recall would be | |||
required under some common circumstances. In this regard, a similar | required under some common circumstances. In this regard, a similar | |||
set of caveats to those listed in Section 10.2 apply. | set of caveats to those listed in Section 10.2 apply. | |||
o For CREATE, see Section 18.4.4. | o For CREATE, see Section 18.4.4. | |||
o For LINK, see Section 18.9.4. | o For LINK, see Section 18.9.4. | |||
skipping to change at page 227, line 36 | skipping to change at page 227, line 46 | |||
o For REMOVE, see Section 18.25.4. | o For REMOVE, see Section 18.25.4. | |||
o For RENAME, see Section 18.26.4. | o For RENAME, see Section 18.26.4. | |||
o For SETATTR, see Section 18.30.4. | o For SETATTR, see Section 18.30.4. | |||
10.9.5. Directory Delegation Recovery | 10.9.5. Directory Delegation Recovery | |||
Recovery from client or server restart for state on regular files has | Recovery from client or server restart for state on regular files has | |||
two main goals, avoiding the necessity of breaking application | two main goals: avoiding the necessity of breaking application | |||
guarantees with respect to locked files and delivery of updates | guarantees with respect to locked files and delivery of updates | |||
cached at the client. Neither of these goals applies to directories | cached at the client. Neither of these goals applies to directories | |||
protected by read delegations and notifications. Thus, no provision | protected by OPEN_DELEGATE_READ delegations and notifications. Thus, | |||
is made for reclaiming directory delegations in the event of client | no provision is made for reclaiming directory delegations in the | |||
or server restart. The client can simply establish a directory | event of client or server restart. The client can simply establish a | |||
delegation in the same fashion as was done initially. | directory delegation in the same fashion as was done initially. | |||
11. Multi-Server Namespace | 11. Multi-Server Namespace | |||
NFSv4.1 supports attributes that allow a namespace to extend beyond | NFSv4.1 supports attributes that allow a namespace to extend beyond | |||
the boundaries of a single server. It is RECOMMENDED that clients | the boundaries of a single server. It is RECOMMENDED that clients | |||
and servers support construction of such multi-server namespaces. | and servers support construction of such multi-server namespaces. | |||
Use of such multi-server namespaces is OPTIONAL however, and for many | Use of such multi-server namespaces is OPTIONAL, however, and for | |||
purposes, single-server namespace are perfectly acceptable. Use of | many purposes, single-server namespaces are perfectly acceptable. | |||
multi-server namespaces can provide many advantages, however, by | Use of multi-server namespaces can provide many advantages, however, | |||
separating a file system's logical position in a namespace from the | by separating a file system's logical position in a namespace from | |||
(possibly changing) logistical and administrative considerations that | the (possibly changing) logistical and administrative considerations | |||
result in particular file systems being located on particular | that result in particular file systems being located on particular | |||
servers. | servers. | |||
11.1. Location Attributes | 11.1. Location Attributes | |||
NFSv4.1 contains RECOMMENDED attributes that allow file systems on | NFSv4.1 contains RECOMMENDED attributes that allow file systems on | |||
one server to be associated with one or more instances of that file | one server to be associated with one or more instances of that file | |||
system on other servers. These attributes specify such file system | system on other servers. These attributes specify such file system | |||
instances by specifying a server address target (either as a DNS name | instances by specifying a server address target (either as a DNS name | |||
representing one or more IP addresses or as a literal IP address) | representing one or more IP addresses or as a literal IP address) | |||
together with the path of that file system within the associated | together with the path of that file system within the associated | |||
single-server namespace. | single-server namespace. | |||
The fs_locations_info RECOMMENDED attribute allows specification of | The fs_locations_info RECOMMENDED attribute allows specification of | |||
one or more file system instance locations where the data | one or more file system instance locations where the data | |||
corresponding to a given file system may be found. This attribute | corresponding to a given file system may be found. This attribute | |||
provides to the client, in addition to information about file system | provides to the client, in addition to information about file system | |||
instance locations, significant information about the various file | instance locations, significant information about the various file | |||
system instance choices (e.g. priority for use, writability, | system instance choices (e.g., priority for use, writability, | |||
currency, etc.). It also includes information to help the client | currency, etc.). It also includes information to help the client | |||
efficiently effect as seamless a transition as possible among | efficiently effect as seamless a transition as possible among | |||
multiple file system instances, when and if that should be necessary. | multiple file system instances, when and if that should be necessary. | |||
The fs_locations RECOMMENDED attribute is inherited from NFSv4.0 and | The fs_locations RECOMMENDED attribute is inherited from NFSv4.0 and | |||
only allows specification of the file system locations where the data | only allows specification of the file system locations where the data | |||
corresponding to a given file system may be found. Servers SHOULD | corresponding to a given file system may be found. Servers SHOULD | |||
make this attribute available whenever fs_locations_info is | make this attribute available whenever fs_locations_info is | |||
supported, but client use of fs_locations_info is to be preferred. | supported, but client use of fs_locations_info is to be preferred. | |||
11.2. File System Presence or Absence | 11.2. File System Presence or Absence | |||
A given location in an NFSv4.1 namespace (typically but not | A given location in an NFSv4.1 namespace (typically but not | |||
necessarily a multi-server namespace) can have a number of file | necessarily a multi-server namespace) can have a number of file | |||
system instance locations associated with it (via the fs_locations or | system instance locations associated with it (via the fs_locations or | |||
fs_locations_info attribute). There may also be an actual current | fs_locations_info attribute). There may also be an actual current | |||
file system at that location, accessible via normal namespace | file system at that location, accessible via normal namespace | |||
operations (e.g. LOOKUP). In this case, the file system is said to | operations (e.g., LOOKUP). In this case, the file system is said to | |||
be "present" at that position in the namespace and clients will | be "present" at that position in the namespace, and clients will | |||
typically use it, reserving use of additional locations specified via | typically use it, reserving use of additional locations specified via | |||
the location-related attributes to situations in which the principal | the location-related attributes to situations in which the principal | |||
location is no longer available. | location is no longer available. | |||
When there is no actual file system at the namespace location in | When there is no actual file system at the namespace location in | |||
question, the file system is said to be "absent". An absent file | question, the file system is said to be "absent". An absent file | |||
system contains no files or directories other than the root. Any | system contains no files or directories other than the root. Any | |||
reference to it, except to access a small set of attributes useful in | reference to it, except to access a small set of attributes useful in | |||
determining alternate locations, will result in an error, | determining alternate locations, will result in an error, | |||
NFS4ERR_MOVED. Note that if the server ever returns the error | NFS4ERR_MOVED. Note that if the server ever returns the error | |||
NFS4ERR_MOVED, it MUST support the fs_locations attribute and SHOULD | NFS4ERR_MOVED, it MUST support the fs_locations attribute and SHOULD | |||
support the fs_locations_info and fs_status attributes. | support the fs_locations_info and fs_status attributes. | |||
While the error name suggests that we have a case of a file system | While the error name suggests that we have a case of a file system | |||
which once was present, and has only become absent later, this is | that once was present, and has only become absent later, this is only | |||
only one possibility. A position in the namespace may be permanently | one possibility. A position in the namespace may be permanently | |||
absent with the set of file system(s) designated by the location | absent with the set of file system(s) designated by the location | |||
attributes being the only realization. The name NFS4ERR_MOVED | attributes being the only realization. The name NFS4ERR_MOVED | |||
reflects an earlier, more limited conception of its function, but | reflects an earlier, more limited conception of its function, but | |||
this error will be returned whenever the referenced file system is | this error will be returned whenever the referenced file system is | |||
absent, whether it has moved or not. | absent, whether it has moved or not. | |||
Except in the case of GETATTR-type operations (to be discussed | Except in the case of GETATTR-type operations (to be discussed | |||
later), when the current filehandle at the start of an operation is | later), when the current filehandle at the start of an operation is | |||
within an absent file system, that operation is not performed and the | within an absent file system, that operation is not performed and the | |||
error NFS4ERR_MOVED returned, to indicate that the file system is | error NFS4ERR_MOVED is returned, to indicate that the file system is | |||
absent on the current server. | absent on the current server. | |||
Because a GETFH cannot succeed if the current filehandle is within an | Because a GETFH cannot succeed if the current filehandle is within an | |||
absent file system, filehandles within an absent file system cannot | absent file system, filehandles within an absent file system cannot | |||
be transferred to the client. When a client does have filehandles | be transferred to the client. When a client does have filehandles | |||
within an absent file system, it is the result of obtaining them when | within an absent file system, it is the result of obtaining them when | |||
the file system was present, and having the file system become absent | the file system was present, and having the file system become absent | |||
subsequently. | subsequently. | |||
It should be noted that because the check for the current filehandle | It should be noted that because the check for the current filehandle | |||
skipping to change at page 229, line 48 | skipping to change at page 230, line 10 | |||
information, as discussed below. | information, as discussed below. | |||
The RECOMMENDED file system attribute fs_status can be used to | The RECOMMENDED file system attribute fs_status can be used to | |||
interrogate the present/absent status of a given file system. | interrogate the present/absent status of a given file system. | |||
11.3. Getting Attributes for an Absent File System | 11.3. Getting Attributes for an Absent File System | |||
When a file system is absent, most attributes are not available, but | When a file system is absent, most attributes are not available, but | |||
it is necessary to allow the client access to the small set of | it is necessary to allow the client access to the small set of | |||
attributes that are available, and most particularly those that give | attributes that are available, and most particularly those that give | |||
information about the correct current locations for this file system, | information about the correct current locations for this file system: | |||
fs_locations and fs_locations_info. | fs_locations and fs_locations_info. | |||
11.3.1. GETATTR Within an Absent File System | 11.3.1. GETATTR within an Absent File System | |||
As mentioned above, an exception is made for GETATTR in that | As mentioned above, an exception is made for GETATTR in that | |||
attributes may be obtained for a filehandle within an absent file | attributes may be obtained for a filehandle within an absent file | |||
system. This exception only applies if the attribute mask contains | system. This exception only applies if the attribute mask contains | |||
at least one attribute bit that indicates the client is interested in | at least one attribute bit that indicates the client is interested in | |||
a result regarding an absent file system: fs_locations, | a result regarding an absent file system: fs_locations, | |||
fs_locations_info, or fs_status. If none of these attributes is | fs_locations_info, or fs_status. If none of these attributes is | |||
requested, GETATTR will result in an NFS4ERR_MOVED error. | requested, GETATTR will result in an NFS4ERR_MOVED error. | |||
When a GETATTR is done on an absent file system, the set of supported | When a GETATTR is done on an absent file system, the set of supported | |||
attributes is very limited. Many attributes, including those that | attributes is very limited. Many attributes, including those that | |||
are normally REQUIRED, will not be available on an absent file | are normally REQUIRED, will not be available on an absent file | |||
system. In addition to the attributes mentioned above (fs_locations, | system. In addition to the attributes mentioned above (fs_locations, | |||
fs_locations_info, fs_status), the following attributes SHOULD be | fs_locations_info, fs_status), the following attributes SHOULD be | |||
available on absent file systems, in the case of RECOMMENDED | available on absent file systems. In the case of RECOMMENDED | |||
attributes at least to the same degree that they are available on | attributes, they should be available at least to the same degree that | |||
present file systems. | they are available on present file systems. | |||
change_policy: This attribute is useful for absent file systems and | change_policy: This attribute is useful for absent file systems and | |||
can be helpful in summarizing to the client when any of the | can be helpful in summarizing to the client when any of the | |||
location-related attributes changes. | location-related attributes change. | |||
fsid: This attribute should be provided so that the client can | fsid: This attribute should be provided so that the client can | |||
determine file system boundaries, including, in particular, the | determine file system boundaries, including, in particular, the | |||
boundary between present and absent file systems. This value must | boundary between present and absent file systems. This value must | |||
be different from any other fsid on the current server and need | be different from any other fsid on the current server and need | |||
have no particular relationship to fsids on any particular | have no particular relationship to fsids on any particular | |||
destination to which the client might be directed. | destination to which the client might be directed. | |||
mounted_on_fileid: For objects at the top of an absent file system | mounted_on_fileid: For objects at the top of an absent file system, | |||
this attribute needs to be available. Since the fileid is one | this attribute needs to be available. Since the fileid is within | |||
which is within the present parent file system, there should be no | the present parent file system, there should be no need to | |||
need to reference the absent file system to provide this | reference the absent file system to provide this information. | |||
information. | ||||
Other attributes SHOULD NOT be made available for absent file | Other attributes SHOULD NOT be made available for absent file | |||
systems, even when it is possible to provide them. The server should | systems, even when it is possible to provide them. The server should | |||
not assume that more information is always better and should avoid | not assume that more information is always better and should avoid | |||
gratuitously providing additional information. | gratuitously providing additional information. | |||
When a GETATTR operation includes a bit mask for one of the | When a GETATTR operation includes a bit mask for one of the | |||
attributes fs_locations, fs_locations_info, or fs_status, but where | attributes fs_locations, fs_locations_info, or fs_status, but where | |||
the bit mask includes attributes which are not supported, GETATTR | the bit mask includes attributes that are not supported, GETATTR will | |||
will not return an error, but will return the mask of the actual | not return an error, but will return the mask of the actual | |||
attributes supported with the results. | attributes supported with the results. | |||
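The GETATTR behavior for an absent file system can be summarized in a short sketch. Attribute names are modeled as plain strings rather than bitmap positions, and getattr_absent_fs is a hypothetical helper, not part of any NFS implementation. The sketch also assumes the server supports every attribute listed as available on absent file systems; a real server may support fewer of the RECOMMENDED ones.

```python
# Attributes whose presence in the mask enables the GETATTR exception.
LOCATION_ATTRS = frozenset({"fs_locations", "fs_locations_info", "fs_status"})

# Attributes that SHOULD be available on an absent file system
# (assumed here to all be supported by the server).
ABSENT_FS_ATTRS = LOCATION_ATTRS | {"change_policy", "fsid", "mounted_on_fileid"}


def getattr_absent_fs(requested):
    """Model GETATTR on a filehandle within an absent file system.

    Returns (status, returned_attribute_names).
    """
    requested = set(requested)
    if not (requested & LOCATION_ATTRS):
        # No location-related attribute requested: the exception does not apply.
        return ("NFS4ERR_MOVED", frozenset())
    # Unsupported attributes are silently dropped from the returned mask.
    return ("NFS4_OK", frozenset(requested & ABSENT_FS_ATTRS))
```

For example, a request for fs_locations together with size succeeds and returns only fs_locations, while a request for size alone fails with NFS4ERR_MOVED.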
Handling of VERIFY/NVERIFY is similar to GETATTR in that if the | Handling of VERIFY/NVERIFY is similar to GETATTR in that if the | |||
attribute mask does not include fs_locations, fs_locations_info, or | attribute mask does not include fs_locations, fs_locations_info, or | |||
fs_status, the error NFS4ERR_MOVED will result. It differs in that | fs_status, the error NFS4ERR_MOVED will result. It differs in that | |||
any appearance in the attribute mask of an attribute not supported | any appearance in the attribute mask of an attribute not supported | |||
for an absent file system (and note that this will include some | for an absent file system (and note that this will include some | |||
normally REQUIRED attributes), will also cause an NFS4ERR_MOVED | normally REQUIRED attributes) will also cause an NFS4ERR_MOVED | |||
result. | result. | |||
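The difference between VERIFY/NVERIFY and GETATTR on an absent file system is sketched below. The helper verify_absent_fs_precheck is hypothetical and covers only the absent-file-system check, not the attribute comparison itself. As in any such model, the set of attributes treated as supported on an absent file system is an assumption.

```python
LOCATION_ATTRS = frozenset({"fs_locations", "fs_locations_info", "fs_status"})

# Assumed set of attributes supported on an absent file system.
ABSENT_FS_ATTRS = LOCATION_ATTRS | {"change_policy", "fsid", "mounted_on_fileid"}


def verify_absent_fs_precheck(requested):
    """Model the absent-file-system check for VERIFY/NVERIFY.

    Returns "NFS4ERR_MOVED" if the operation must fail, or None if the
    normal attribute comparison may proceed.
    """
    requested = set(requested)
    if not (requested & LOCATION_ATTRS):
        return "NFS4ERR_MOVED"  # same rule as GETATTR
    if requested - ABSENT_FS_ATTRS:
        # Unlike GETATTR, any unsupported attribute is an error here.
        return "NFS4ERR_MOVED"
    return None
```

Where GETATTR quietly narrows the returned mask, this check rejects the whole operation, so a mask of fs_locations plus size fails here even though GETATTR would accept it.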
11.3.2. READDIR and Absent File Systems | 11.3.2. READDIR and Absent File Systems | |||
A READDIR performed when the current filehandle is within an absent | A READDIR performed when the current filehandle is within an absent | |||
file system will result in an NFS4ERR_MOVED error, since, unlike the | file system will result in an NFS4ERR_MOVED error, since, unlike the | |||
case of GETATTR, no such exception is made for READDIR. | case of GETATTR, no such exception is made for READDIR. | |||
Attributes for an absent file system may be fetched via a READDIR for | Attributes for an absent file system may be fetched via a READDIR for | |||
a directory in a present file system, when that directory contains | a directory in a present file system, when that directory contains | |||
skipping to change at page 231, line 30 | skipping to change at page 231, line 38 | |||
case, the handling is as follows: | case, the handling is as follows: | |||
o If the attribute set requested includes one of the attributes | o If the attribute set requested includes one of the attributes | |||
fs_locations, fs_locations_info, or fs_status, then fetching of | fs_locations, fs_locations_info, or fs_status, then fetching of | |||
attributes proceeds normally and no NFS4ERR_MOVED indication is | attributes proceeds normally and no NFS4ERR_MOVED indication is | |||
returned, even when the rdattr_error attribute is requested. | returned, even when the rdattr_error attribute is requested. | |||
o If the attribute set requested does not include one of the | o If the attribute set requested does not include one of the | |||
attributes fs_locations, fs_locations_info, or fs_status, then if | attributes fs_locations, fs_locations_info, or fs_status, then if | |||
the rdattr_error attribute is requested, each directory entry for | the rdattr_error attribute is requested, each directory entry for | |||
the root of an absent file system, will report NFS4ERR_MOVED as | the root of an absent file system will report NFS4ERR_MOVED as the | |||
the value of the rdattr_error attribute. | value of the rdattr_error attribute. | |||
o If the attribute set requested does not include any of the | o If the attribute set requested does not include any of the | |||
attributes fs_locations, fs_locations_info, fs_status, or | attributes fs_locations, fs_locations_info, fs_status, or | |||
rdattr_error then the occurrence of the root of an absent file | rdattr_error, then the occurrence of the root of an absent file | |||
system within the directory will result in the READDIR failing | system within the directory will result in the READDIR failing | |||
with an NFS4ERR_MOVED error. | with an NFS4ERR_MOVED error. | |||
o The unavailability of an attribute because of a file system's | o The unavailability of an attribute because of a file system's | |||
absence, even one that is ordinarily REQUIRED, does not result in | absence, even one that is ordinarily REQUIRED, does not result in | |||
any error indication. The set of attributes returned for the root | any error indication. The set of attributes returned for the root | |||
directory of the absent file system in that case is simply | directory of the absent file system in that case is simply | |||
restricted to those actually available. | restricted to those actually available. | |||
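The READDIR rules listed above can be expressed as a decision function for a single directory entry that is the root of an absent file system. The function name and the return shape are hypothetical. The sketch assumes that, in the second case, only rdattr_error is returned for the entry; the text does not say which other attributes, if any, accompany it.

```python
LOCATION_ATTRS = frozenset({"fs_locations", "fs_locations_info", "fs_status"})

# Assumed set of attributes available for the root of an absent file system.
ABSENT_FS_ATTRS = LOCATION_ATTRS | {"change_policy", "fsid", "mounted_on_fileid"}


def readdir_absent_root(requested):
    """Model READDIR handling of an entry that roots an absent file system.

    Returns (readdir_status, attrs_returned_for_entry, moved_in_rdattr_error).
    """
    requested = set(requested)
    if requested & LOCATION_ATTRS:
        # Fetching proceeds normally; no NFS4ERR_MOVED indication, even if
        # rdattr_error was requested. Unavailable attributes are omitted.
        returned = requested & (ABSENT_FS_ATTRS | {"rdattr_error"})
        return ("NFS4_OK", frozenset(returned), False)
    if "rdattr_error" in requested:
        # The entry reports NFS4ERR_MOVED as the value of rdattr_error.
        return ("NFS4_OK", frozenset({"rdattr_error"}), True)
    # Neither a location attribute nor rdattr_error: the READDIR fails.
    return ("NFS4ERR_MOVED", frozenset(), False)
```

A client that wants to enumerate a directory containing referrals without failing the whole READDIR would therefore include rdattr_error or one of the location attributes in its request.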
11.4. Uses of Location Information | 11.4. Uses of Location Information | |||
The location-bearing attributes (fs_locations and fs_locations_info), | The location-bearing attributes (fs_locations and fs_locations_info), | |||
provide, together with the possibility of absent file systems, a | together with the possibility of absent file systems, provide a | |||
number of important facilities in providing reliable, manageable, and | number of important facilities in providing reliable, manageable, and | |||
scalable data access. | scalable data access. | |||
When a file system is present, these attributes can provide | When a file system is present, these attributes can provide | |||
alternative locations, to be used to access the same data, in the | alternative locations, to be used to access the same data, in the | |||
event of server failures, communications problems, or other | event of server failures, communications problems, or other | |||
difficulties that make continued access to the current file system | difficulties that make continued access to the current file system | |||
impossible or otherwise impractical. Under some circumstances | impossible or otherwise impractical. Under some circumstances, | |||
multiple alternative locations may be used simultaneously to provide | multiple alternative locations may be used simultaneously to provide | |||
higher performance access to the file system in question. Provision | higher-performance access to the file system in question. Provision | |||
of such alternate locations is referred to as "replication" although | of such alternate locations is referred to as "replication" although | |||
there are cases in which replicated sets of data are not in fact | there are cases in which replicated sets of data are not in fact | |||
present, and the replicas are instead different paths to the same | present, and the replicas are instead different paths to the same | |||
data. | data. | |||
When a file system is present and becomes absent, clients can be | When a file system is present and becomes absent, clients can be | |||
given the opportunity to have continued access to their data, at an | given the opportunity to have continued access to their data, at an | |||
alternate location. In this case, a continued attempt to use the | alternate location. In this case, a continued attempt to use the | |||
data in the now-absent file system will result in an NFS4ERR_MOVED | data in the now-absent file system will result in an NFS4ERR_MOVED | |||
error and at that point the successor locations (typically only one | error and, at that point, the successor locations (typically only one | |||
but multiple choices are possible) can be fetched and used to | although multiple choices are possible) can be fetched and used to | |||
continue access. Transfer of the file system contents to the new | continue access. Transfer of the file system contents to the new | |||
location is referred to as "migration", but it should be kept in mind | location is referred to as "migration", but it should be kept in mind | |||
that there are cases in which this term can be used, like | that there are cases in which this term can be used, like | |||
"replication", when there is no actual data migration per se. | "replication", when there is no actual data migration per se. | |||
Where a file system was not previously present, specification of file | Where a file system was not previously present, specification of file | |||
system location provides a means by which file systems located on one | system location provides a means by which file systems located on one | |||
server can be associated with a namespace defined by another server, | server can be associated with a namespace defined by another server, | |||
thus allowing a general multi-server namespace facility. A | thus allowing a general multi-server namespace facility. A | |||
designation of such a location, in place of an absent file system, is | designation of such a location, in place of an absent file system, is | |||
skipping to change at page 233, line 20 | skipping to change at page 233, line 27 | |||
The alternate locations may be physical replicas of the (typically | The alternate locations may be physical replicas of the (typically | |||
read-only) file system data, or they may reflect alternate paths to | read-only) file system data, or they may reflect alternate paths to | |||
the same server or provide for the use of various forms of server | the same server or provide for the use of various forms of server | |||
clustering in which multiple servers provide alternate ways of | clustering in which multiple servers provide alternate ways of | |||
accessing the same physical file system. How these different modes | accessing the same physical file system. How these different modes | |||
of file system transition are represented within the fs_locations and | of file system transition are represented within the fs_locations and | |||
fs_locations_info attributes and how the client deals with file | fs_locations_info attributes and how the client deals with file | |||
system transition issues will be discussed in detail below. | system transition issues will be discussed in detail below. | |||
Multiple server addresses, whether they are derived from a single | Multiple server addresses, whether they are derived from a single | |||
entry with a DNS name representing a set of IP addresses, or from | entry with a DNS name representing a set of IP addresses or from | |||
multiple entries each with its own server address may correspond to | multiple entries each with its own server address, may correspond to | |||
the same actual server. The fact that two addresses correspond to | the same actual server. The fact that two addresses correspond to | |||
the same server is shown by a common so_major_id field within the | the same server is shown by a common so_major_id field within the | |||
eir_server_owner field returned by EXCHANGE_ID (see Section 18.35.3). | eir_server_owner field returned by EXCHANGE_ID (see Section 18.35.3). | |||
For a detailed discussion of how server address targets interact with | For a detailed discussion of how server address targets interact with | |||
the determination of server identity specified by the server owner | the determination of server identity specified by the server owner | |||
field, see Section 11.5. | field, see Section 11.5. | |||
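As an illustration of the same-server test, a client could group the addresses it has contacted by the so_major_id value each one returned. The sketch below is hypothetical: group_addresses_by_server is an invented helper, so_major_id is modeled as an opaque byte string, and the addresses are documentation examples. It considers only so_major_id and ignores the further considerations covered in Section 11.5.

```python
from collections import defaultdict


def group_addresses_by_server(major_id_by_address):
    """Group server addresses that reported the same so_major_id.

    major_id_by_address maps an address string to the opaque so_major_id
    bytes taken from the eir_server_owner returned by EXCHANGE_ID.
    """
    groups = defaultdict(list)
    for address, major_id in major_id_by_address.items():
        groups[major_id].append(address)
    # Sort each group so the result does not depend on insertion order.
    return {major_id: sorted(addrs) for major_id, addrs in groups.items()}


observed = {
    "192.0.2.10": b"server-A",
    "192.0.2.11": b"server-A",
    "198.51.100.5": b"server-B",
}
same_server = group_addresses_by_server(observed)
```

Addresses that land in the same group are alternate paths to one server rather than distinct replicas.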
11.4.2. File System Migration | 11.4.2. File System Migration | |||
When a file system is present and becomes absent, clients can be | When a file system is present and becomes absent, clients can be | |||
given the opportunity to have continued access to their data, at an | given the opportunity to have continued access to their data, at an | |||
alternate location, as specified by the fs_locations or | alternate location, as specified by the fs_locations or | |||
fs_locations_info attribute. Typically, a client will be accessing | fs_locations_info attribute. Typically, a client will be accessing | |||
the file system in question, get an NFS4ERR_MOVED error, and then use | the file system in question, get an NFS4ERR_MOVED error, and then use | |||
the fs_locations or fs_locations_info attribute to determine the new | the fs_locations or fs_locations_info attribute to determine the new | |||
location of the data. When fs_locations_info is used, additional | location of the data. When fs_locations_info is used, additional | |||
information will be available which will define the nature of the | information will be available that will define the nature of the | |||
client's handling of the transition to a new server. | client's handling of the transition to a new server. | |||
Such migration can be helpful in providing load balancing or general | Such migration can be helpful in providing load balancing or general | |||
resource reallocation. The protocol does not specify how the file | resource reallocation. The protocol does not specify how the file | |||
system will be moved between servers. It is anticipated that a | system will be moved between servers. It is anticipated that a | |||
number of different server-to-server transfer mechanisms might be | number of different server-to-server transfer mechanisms might be | |||
used with the choice left to the server implementer. The NFSv4.1 | used with the choice left to the server implementor. The NFSv4.1 | |||
protocol specifies the method used to communicate the migration event | protocol specifies the method used to communicate the migration event | |||
between client and server. | between client and server. | |||
The new location may be an alternate communication path to the same | The new location may be an alternate communication path to the same | |||
server, or, in the case of various forms of server clustering, | server or, in the case of various forms of server clustering, another | |||
another server providing access to the same physical file system. | server providing access to the same physical file system. The | |||
client's responsibilities in dealing with this transition depend on | ||||
The client's responsibilities in dealing with this transition depend | the specific nature of the new access path as well as how and whether | |||
on the specific nature of the new access path and how and whether | ||||
data was in fact migrated. These issues will be discussed in detail | data was in fact migrated. These issues will be discussed in detail | |||
below. | below. | |||
When multiple server addresses correspond to the same actual server, | When multiple server addresses correspond to the same actual server, | |||
as shown by a common value for the so_major_id field of the | as shown by a common value for the so_major_id field of the | |||
eir_server_owner field returned by EXCHANGE_ID, the location or | eir_server_owner field returned by EXCHANGE_ID, the location or | |||
locations may designate alternate server addresses in the form of | locations may designate alternate server addresses in the form of | |||
specific server network addresses. These can be used to access the | specific server network addresses. These can be used to access the | |||
file system in question at those addresses and when it is no longer | file system in question at those addresses and when it is no longer | |||
accessible at the original address. | accessible at the original address. | |||
Although a single successor location is typical, multiple locations | Although a single successor location is typical, multiple locations | |||
may be provided, together with information that allows priority among | may be provided, together with information that allows priority among | |||
the choices to be indicated, via information in the fs_locations_info | the choices to be indicated, via information in the fs_locations_info | |||
attribute. Where suitable clustering mechanisms make it possible to | attribute. Where suitable, clustering mechanisms make it possible to | |||
provide multiple identical file systems or paths to them, this allows | provide multiple identical file systems or paths to them; this allows | |||
the client the opportunity to deal with any resource or | the client the opportunity to deal with any resource or | |||
communications issues that might limit data availability. | communications issues that might limit data availability. | |||
When an alternate location is designated as the target for migration, | When an alternate location is designated as the target for migration, | |||
it must designate the same data (with metadata being the same to the | it must designate the same data (with metadata being the same to the | |||
degree indicated by the fs_locations_info attribute). Where file | degree indicated by the fs_locations_info attribute). Where file | |||
systems are writable, a change made on the original file system must | systems are writable, a change made on the original file system must | |||
be visible on all migration targets. Where a file system is not | be visible on all migration targets. Where a file system is not | |||
writable but represents a read-only copy (possibly periodically | writable but represents a read-only copy (possibly periodically | |||
updated) of a writable file system, similar requirements apply to the | updated) of a writable file system, similar requirements apply to the | |||
propagation of updates. Any change visible in the original file | propagation of updates. Any change visible in the original file | |||
system must already be effected on all migration targets, to avoid | system must already be effected on all migration targets, to avoid | |||
any possibility, that a client in effecting a transition to the | any possibility that a client, in effecting a transition to the | |||
migration target will see any reversion in file system state. | migration target, will see any reversion in file system state. | |||
11.4.3. Referrals | 11.4.3. Referrals | |||
Referrals provide a way of placing a file system in a location within | Referrals provide a way of placing a file system in a location within | |||
the namespace essentially without respect to its physical location on | the namespace essentially without respect to its physical location on | |||
a given server. This allows a single server or a set of servers to | a given server. This allows a single server or a set of servers to | |||
present a multi-server namespace that encompasses file systems | present a multi-server namespace that encompasses file systems | |||
located on multiple servers. Some likely uses of this include | located on multiple servers. Some likely uses of this include | |||
establishment of site-wide or organization-wide namespaces, or even | establishment of site-wide or organization-wide namespaces, or even | |||
knitting such together into a truly global namespace. | knitting such together into a truly global namespace. | |||
Referrals occur when a client determines, upon first referencing a | Referrals occur when a client determines, upon first referencing a | |||
position in the current namespace, that it is part of a new file | position in the current namespace, that it is part of a new file | |||
system and that the file system is absent. When this occurs, | system and that the file system is absent. When this occurs, | |||
typically by receiving the error NFS4ERR_MOVED, the actual location | typically by receiving the error NFS4ERR_MOVED, the actual location | |||
or locations of the file system can be determined by fetching the | or locations of the file system can be determined by fetching the | |||
fs_locations or fs_locations_info attribute. | fs_locations or fs_locations_info attribute. | |||
The locations-related attribute may designate a single file system | The locations-related attribute may designate a single file system | |||
location or multiple file system locations, to be selected based on | location or multiple file system locations, to be selected based on | |||
the needs of the client. The server, in the fs_locations_info | the needs of the client. The server, in the fs_locations_info | |||
attribute may specify priorities to be associated with various file | attribute, may specify priorities to be associated with various file | |||
system location choices. The server may assign different priorities | system location choices. The server may assign different priorities | |||
to different locations as reported to individual clients, in order to | to different locations as reported to individual clients, in order to | |||
adapt to client physical location or to effect load balancing. When | adapt to client physical location or to effect load balancing. When | |||
both read-only and read-write file systems are present, some of the | both read-only and read-write file systems are present, some of the | |||
read-only locations may not be absolutely up-to-date (as they would | read-only locations might not be absolutely up-to-date (as they would | |||
have to be in the case of replication and migration). Servers may | have to be in the case of replication and migration). Servers may | |||
also specify file system locations that include client-substituted | also specify file system locations that include client-substituted | |||
variables so that different clients are referred to different file | variables so that different clients are referred to different file | |||
systems (with different data contents) based on client attributes | systems (with different data contents) based on client attributes | |||
such as CPU architecture. | such as CPU architecture. | |||
When the fs_locations_info attribute indicates that there are | When the fs_locations_info attribute indicates that there are | |||
multiple possible targets listed, the relationships among them may be | multiple possible targets listed, the relationships among them may be | |||
important to the client in selecting the one to use. The same rules | important to the client in selecting which one to use. The same | |||
specified in Section 11.4.1 defining the appropriate standards for | rules specified in Section 11.4.1 defining the appropriate standards | |||
the data propagation, apply to these multiple replicas as well. For | for the data propagation apply to these multiple replicas as well. | |||
example, the client might prefer a writable target on a server that | For example, the client might prefer a writable target on a server | |||
has additional writable replicas to which it subsequently might | that has additional writable replicas to which it subsequently might | |||
switch. Note that, as distinguished from the case of replication, | switch. Note that, as distinguished from the case of replication, | |||
there is no need to deal with the case of propagation of updates made | there is no need to deal with the case of propagation of updates made | |||
by the current client, since the current client has not accessed the | by the current client, since the current client has not accessed the | |||
file system in question. | file system in question. | |||
Use of multi-server namespaces is enabled by NFSv4.1 but is not | Use of multi-server namespaces is enabled by NFSv4.1 but is not | |||
required. The use of multi-server namespaces and their scope will | required. The use of multi-server namespaces and their scope will | |||
depend on the applications used, and system administration | depend on the applications used and system administration | |||
preferences. | preferences. | |||
Multi-server namespaces can be established by a single server | Multi-server namespaces can be established by a single server | |||
providing a large set of referrals to all of the included file | providing a large set of referrals to all of the included file | |||
systems. Alternatively, a single multi-server namespace may be | systems. Alternatively, a single multi-server namespace may be | |||
administratively segmented with separate referral file systems (on | administratively segmented with separate referral file systems (on | |||
separate servers) for each separately-administered portion of the | separate servers) for each separately administered portion of the | |||
namespace. Any segment or the top-level referral file system may use | namespace. The top-level referral file system or any segment may use | |||
replicated referral file systems for higher availability. | replicated referral file systems for higher availability. | |||
Generally, multi-server namespaces are for the most part uniform, in | Generally, multi-server namespaces are for the most part uniform, in | |||
that the same data made available to one client at a given location | that the same data made available to one client at a given location | |||
in the namespace is made available to all clients at that location. | in the namespace is made available to all clients at that location. | |||
There are however facilities provided which allow different clients | However, there are facilities provided that allow different clients | |||
to be directed to different sets of data, so as to adapt to such | to be directed to different sets of data, so as to adapt to such | |||
client characteristics as CPU architecture. | client characteristics as CPU architecture. | |||
11.5. Location Entries and Server Identity | 11.5. Location Entries and Server Identity | |||
As mentioned above, a single location entry may have a server address | As mentioned above, a single location entry may have a server address | |||
target in the form of a DNS name which may represent multiple IP | target in the form of a DNS name that may represent multiple IP | |||
addresses, while multiple location entries may have their own server | addresses, while multiple location entries may have their own server | |||
address targets, that reference the same server. Whether two IP | address targets that reference the same server. Whether two IP | |||
addresses designate the same server is indicated by the existence of | addresses designate the same server is indicated by the existence of | |||
a common so_major_id field within the eir_server_owner field returned | a common so_major_id field within the eir_server_owner field returned | |||
by EXCHANGE_ID (see Section 18.35.3), subject to further | by EXCHANGE_ID (see Section 18.35.3), subject to further verification | |||
verification, for details of which see Section 2.10.5. | (for details see Section 2.10.5). | |||
When multiple addresses for the same server exist, the client may | When multiple addresses for the same server exist, the client may | |||
assume that for each file system in the namespace of a given server | assume that for each file system in the namespace of a given server | |||
network address, there exist file systems at corresponding namespace | network address, there exist file systems at corresponding namespace | |||
locations for each of the other server network addresses. It may do | locations for each of the other server network addresses. It may do | |||
this even in the absence of explicit listing in fs_locations and | this even in the absence of explicit listing in fs_locations and | |||
fs_locations_info. Such corresponding file system locations can be | fs_locations_info. Such corresponding file system locations can be | |||
used as alternate locations, just as those explicitly specified via | used as alternate locations, just as those explicitly specified via | |||
the fs_locations and fs_locations_info attributes. Where these | the fs_locations and fs_locations_info attributes. Where these | |||
specific addresses are explicitly designated in the fs_locations_info | specific addresses are explicitly designated in the fs_locations_info | |||
attribute, the conditions of use specified in this attribute (e.g. | attribute, the conditions of use specified in this attribute (e.g., | |||
priorities, specification of simultaneous use) may limit the client's | priorities, specification of simultaneous use) may limit the client's | |||
use of these alternate locations. | use of these alternate locations. | |||
If a single location entry designates multiple server IP addresses, | If a single location entry designates multiple server IP addresses, | |||
the client cannot assume that these addresses are multiple paths to | the client cannot assume that these addresses are multiple paths to | |||
the same server. In most case they will be, but the client MUST | the same server. In most cases, they will be, but the client MUST | |||
verify that before acting on that assumption. When two server | verify that before acting on that assumption. When two server | |||
addresses are designated by a single location entry and they | addresses are designated by a single location entry and they | |||
correspond to different servers, this normally indicates some sort of | correspond to different servers, this normally indicates some sort of | |||
misconfiguration, and so the client should avoid using such location | misconfiguration, and so the client should avoid using such location | |||
entries when alternatives are available. When they are not, clients | entries when alternatives are available. When they are not, clients | |||
should pick one of the IP addresses and use it, without using others that | should pick one of the IP addresses and use it, without using others that | |||
are not directed to the same server. | are not directed to the same server. | |||
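The same-server check described in this section can be sketched as follows. This is a minimal illustration in Python and not part of the protocol: the ServerOwner type, the exchange_id callable, and the function names are hypothetical, and the further verification required by Section 2.10.5 is not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class ServerOwner:
    """Hypothetical stand-in for eir_server_owner as returned by EXCHANGE_ID."""
    so_minor_id: int
    so_major_id: bytes


def group_by_server(
    addresses: Iterable[str],
    exchange_id: Callable[[str], ServerOwner],
) -> Dict[bytes, List[str]]:
    """Group addresses that report a common so_major_id.

    exchange_id is an assumed callable that performs EXCHANGE_ID against
    one address and returns its server owner.  A common so_major_id is
    only a candidate match; Section 2.10.5 verification is not modeled.
    """
    groups: Dict[bytes, List[str]] = defaultdict(list)
    for addr in addresses:
        groups[exchange_id(addr).so_major_id].append(addr)
    return dict(groups)


def pick_addresses(
    addresses: Iterable[str],
    exchange_id: Callable[[str], ServerOwner],
) -> List[str]:
    """For one location entry that resolves to several IP addresses:
    pick the first address and keep only the others that are directed
    to the same server, dropping the rest."""
    addresses = list(addresses)
    if not addresses:
        return []
    groups = group_by_server(addresses, exchange_id)
    chosen_major = exchange_id(addresses[0]).so_major_id
    return groups[chosen_major]
```

A client would call pick_addresses only when no better-configured location entry is available, since addresses from one entry that map to different servers normally indicate misconfiguration.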
11.6. Additional Client-side Considerations | 11.6. Additional Client-Side Considerations | |||
When clients make use of servers that implement referrals, | When clients make use of servers that implement referrals, | |||
replication, and migration, care should be taken so that a user who | replication, and migration, care should be taken that a user who | |||
mounts a given file system that includes a referral or a relocated | mounts a given file system that includes a referral or a relocated | |||
file system continues to see a coherent picture of that user-side | file system continues to see a coherent picture of that user-side | |||
file system despite the fact that it contains a number of server-side | file system despite the fact that it contains a number of server-side | |||
file systems which may be on different servers. | file systems that may be on different servers. | |||
One important issue is upward navigation from the root of a server- | One important issue is upward navigation from the root of a server- | |||
side file system to its parent (specified as ".." in UNIX), in the | side file system to its parent (specified as ".." in UNIX), in the | |||
case in which it transitions to that file system as a result of | case in which it transitions to that file system as a result of | |||
referral, migration, or a transition as a result of replication. | referral, migration, or a transition as a result of replication. | |||
When the client is at such a point, and it needs to ascend to the | When the client is at such a point, and it needs to ascend to the | |||
parent, it must go back to the parent as seen within the multi-server | parent, it must go back to the parent as seen within the multi-server | |||
namespace rather than sending a LOOKUPP operation to the server, | namespace rather than sending a LOOKUPP operation to the server, | |||
which would result in the parent within that server's single-server | which would result in the parent within that server's single-server | |||
namespace. In order to do this, the client needs to remember the | namespace. In order to do this, the client needs to remember the | |||
filehandles that represent such file system roots, and use these | filehandles that represent such file system roots and use these | |||
instead of sending a LOOKUPP operation to the current server. This | instead of sending a LOOKUPP operation to the current server. This | |||
will allow the client to present to applications a consistent | will allow the client to present to applications a consistent | |||
namespace, where upward navigation and downward navigation are | namespace, where upward navigation and downward navigation are | |||
consistent. | consistent. | |||
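The upward-navigation rule above can be sketched as a small client-side table. This is an illustrative Python sketch under stated assumptions: the RootTracker class, the Handle tuple, and the lookupp callable (a wrapper that would send LOOKUPP to the current server) are hypothetical names, not protocol elements.

```python
from typing import Callable, Dict, Tuple

# In this sketch an object is identified by (server address, filehandle).
Handle = Tuple[str, bytes]


class RootTracker:
    """Remembers filehandles of server-side file system roots reached via
    referral, migration, or replication, together with the parent
    directory as seen within the multi-server namespace."""

    def __init__(self, lookupp: Callable[[Handle], Handle]) -> None:
        self._lookupp = lookupp  # assumed wrapper that sends LOOKUPP
        self._parents: Dict[Handle, Handle] = {}

    def record_crossing(self, parent: Handle, fs_root: Handle) -> None:
        """Call when a lookup in `parent` lands on the root `fs_root`."""
        self._parents[fs_root] = parent

    def parent_of(self, handle: Handle) -> Handle:
        """Resolve ".." without leaving the multi-server namespace."""
        if handle in self._parents:
            # Remembered root: answer locally, no LOOKUPP is sent.
            return self._parents[handle]
        return self._lookupp(handle)
```

Only remembered roots are answered locally; every other directory still uses LOOKUPP, so upward and downward navigation stay consistent.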
Another issue concerns refresh of referral locations. When referrals | Another issue concerns refresh of referral locations. When referrals | |||
are used extensively, they may change as server configurations | are used extensively, they may change as server configurations | |||
change. It is expected that clients will cache information related | change. It is expected that clients will cache information related | |||
to traversing referrals so that future client side requests are | to traversing referrals so that future client-side requests are | |||
resolved locally without server communication. This is usually | resolved locally without server communication. This is usually | |||
rooted in client-side name lookup caching. Clients should | rooted in client-side name lookup caching. Clients should | |||
periodically purge this data for referral points in order to detect | periodically purge this data for referral points in order to detect | |||
changes in location information. When the change_policy attribute | changes in location information. When the change_policy attribute | |||
changes for directories that hold referral entries or for the | changes for directories that hold referral entries or for the | |||
referral entries themselves, clients should consider any associated | referral entries themselves, clients should consider any associated | |||
cached referral information to be out of date. | cached referral information to be out of date. | |||
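The two refresh triggers described above (periodic purging and a changed change_policy value) can be sketched as a small cache. This is a hedged Python illustration: the ReferralCache class, its max_age parameter, and the treatment of change_policy as an opaque comparable value are assumptions made for the example, not requirements of the protocol.

```python
import time
from typing import Callable, Dict, Hashable, List, Optional, Tuple


class ReferralCache:
    """Caches referral locations per namespace path (hypothetical helper)."""

    def __init__(
        self,
        max_age: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_age = max_age  # assumed purge interval, in seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Hashable, List[str]]] = {}

    def store(self, path: str, change_policy: Hashable,
              locations: List[str]) -> None:
        """Remember the locations and the change_policy seen with them."""
        self._entries[path] = (self._clock(), change_policy, list(locations))

    def lookup(self, path: str,
               current_change_policy: Optional[Hashable] = None
               ) -> Optional[List[str]]:
        """Return cached locations, or None if the entry is missing,
        too old, or was stored under a different change_policy value."""
        entry = self._entries.get(path)
        if entry is None:
            return None
        stored_at, policy, locations = entry
        expired = self._clock() - stored_at > self._max_age
        changed = (current_change_policy is not None
                   and current_change_policy != policy)
        if expired or changed:
            del self._entries[path]  # force a fresh fetch from the server
            return None
        return locations
```

A None result tells the caller to fetch fs_locations or fs_locations_info again and store the new answer.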
11.7. Effecting File System Transitions | 11.7. Effecting File System Transitions | |||
Transitions between file system instances, whether due to switching | Transitions between file system instances, whether due to switching | |||
between replicas upon server unavailability, or in response to | between replicas upon server unavailability or to server-initiated | |||
server-initiated migration events are best dealt with together. This | migration events, are best dealt with together. This is so even | |||
is so even though for the server, pragmatic considerations will | though, for the server, pragmatic considerations will normally force | |||
normally force different implementation strategies for planned and | different implementation strategies for planned and unplanned | |||
unplanned transitions. Even though the prototypical use cases of | transitions. Even though the prototypical use cases of replication | |||
replication and migration contain distinctive sets of features, when | and migration contain distinctive sets of features, when all | |||
all possibilities for these operations are considered, there is an | possibilities for these operations are considered, there is an | |||
underlying unity of these operations, from the client's point of | underlying unity of these operations, from the client's point of | |||
view, that makes treating them together desirable. | view, that makes treating them together desirable. | |||
A number of methods are possible for servers to replicate data and to | A number of methods are possible for servers to replicate data and to | |||
track client state in order to allow clients to transition between | track client state in order to allow clients to transition between | |||
file system instances with a minimum of disruption. Such methods | file system instances with a minimum of disruption. Such methods | |||
vary between those that use inter-server clustering techniques to | vary between those that use inter-server clustering techniques to | |||
limit the changes seen by the client, to those that are less | limit the changes seen by the client, to those that are less | |||
aggressive, use more standard methods of replicating data, and impose | aggressive, use more standard methods of replicating data, and impose | |||
a greater burden on the client to adapt to the transition. | a greater burden on the client to adapt to the transition. | |||
The NFSv4.1 protocol does not impose choices on clients and servers | The NFSv4.1 protocol does not impose choices on clients and servers | |||
with regard to that spectrum of transition methods. In fact, there | with regard to that spectrum of transition methods. In fact, there | |||
are many valid choices, depending on client and application | are many valid choices, depending on client and application | |||
requirements and their interaction with server implementation | requirements and their interaction with server implementation | |||
choices. The NFSv4.1 protocol does define the specific choices that | choices. The NFSv4.1 protocol does define the specific choices that | |||
can be made, how these choices are communicated to the client and how | can be made, how these choices are communicated to the client, and | |||
the client is to deal with any discontinuities. | how the client is to deal with any discontinuities. | |||
In the sections below, references will be made to various possible | In the sections below, references will be made to various possible | |||
server implementation choices as a way of illustrating the transition | server implementation choices as a way of illustrating the transition | |||
scenarios that clients may deal with. The intent here is not to | scenarios that clients may deal with. The intent here is not to | |||
define or limit server implementations but rather to illustrate the | define or limit server implementations but rather to illustrate the | |||
range of issues that clients may face. | range of issues that clients may face. | |||
In the discussion below, references will be made to a file system | In the discussion below, references will be made to a file system | |||
having a particular property or of two file systems (typically the | having a particular property or to two file systems (typically the | |||
source and destination) belonging to a common class of any of several | source and destination) belonging to a common class of any of several | |||
types. Two file systems that belong to such a class share some | types. Two file systems that belong to such a class share some | |||
important aspect of file system behavior that clients may depend upon | important aspects of file system behavior that clients may depend | |||
when present, to easily effect a seamless transition between file | upon when present, to easily effect a seamless transition between | |||
system instances. Conversely, where the file systems do not belong | file system instances. Conversely, where the file systems do not | |||
to such a common class, the client has to deal with various sorts of | belong to such a common class, the client has to deal with various | |||
implementation discontinuities which may cause performance or other | sorts of implementation discontinuities that may cause performance or | |||
issues in effecting a transition. | other issues in effecting a transition. | |||
Where the fs_locations_info attribute is available, such file system | Where the fs_locations_info attribute is available, such file system | |||
classification data will be made directly available to the client | classification data will be made directly available to the client | |||
(see Section 11.10 for details). When only fs_locations is | (see Section 11.10 for details). When only fs_locations is | |||
available, default assumptions with regard to such classifications | available, default assumptions with regard to such classifications | |||
have to be inferred (see Section 11.9 for details). | have to be inferred (see Section 11.9 for details). | |||
In cases in which one server is expected to accept opaque values from | In cases in which one server is expected to accept opaque values from | |||
the client that originated from another server, the servers SHOULD | the client that originated from another server, the servers SHOULD | |||
encode the "opaque" values in big endian byte order. If this is | encode the "opaque" values in big-endian byte order. If this is | |||
done, servers acting as replicas or immigrating file systems will be | done, servers acting as replicas or immigrating file systems will be | |||
able to parse values like stateids, directory cookies, filehandles, | able to parse values like stateids, directory cookies, filehandles, | |||
etc. even if their native byte order is different from that of other | etc., even if their native byte order is different from that of other | |||
servers cooperating in the replication and migration of the file | servers cooperating in the replication and migration of the file | |||
system. | system. | |||
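The byte-order recommendation above can be illustrated with a short Python sketch. The field layout used here (a 32-bit generation number followed by two 64-bit identifiers) is purely hypothetical; the protocol does not define the contents of opaque values such as filehandles.

```python
import struct

# Hypothetical layout: 32-bit generation, 64-bit file system id,
# 64-bit object id.  ">" forces big-endian with no padding, regardless
# of the byte order of the CPU doing the encoding.
_HANDLE_FORMAT = ">IQQ"


def encode_handle(generation: int, fsid: int, objid: int) -> bytes:
    """Build the opaque value handed to the client."""
    return struct.pack(_HANDLE_FORMAT, generation, fsid, objid)


def decode_handle(raw: bytes) -> tuple:
    """Parse an opaque value that may have been issued by another server."""
    return struct.unpack(_HANDLE_FORMAT, raw)
```

Because the format string fixes the byte order, a replica or migration target with a different native byte order decodes the same three integers that the issuing server encoded.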
11.7.1. File System Transitions and Simultaneous Access | 11.7.1. File System Transitions and Simultaneous Access | |||
When a single file system may be accessed at multiple locations, | When a single file system may be accessed at multiple locations, | |||
whether this is because of an indication of file system identity as | either because of an indication of file system identity as reported | |||
reported by the fs_locations or fs_locations_info attributes or | by the fs_locations or fs_locations_info attributes or because two | |||
because two file system instances have corresponding locations on | file system instances have corresponding locations on server | |||
server addresses which connect to the same server (as indicated by a | addresses that connect to the same server (as indicated by a common | |||
common so_major_id field in the eir_server_owner field returned by | so_major_id field in the eir_server_owner field returned by | |||
EXCHANGE_ID), the client will, depending on specific circumstances as | EXCHANGE_ID), the client will, depending on specific circumstances as | |||
discussed below, either: | discussed below, either: | |||
o The client accesses multiple instances simultaneously, as | o Access multiple instances simultaneously, each of which represents | |||
representing alternate paths to the same data and metadata. | an alternate path to the same data and metadata. | |||
o The client accesses one instance (or set of instances) and then | o Access one instance (or set of instances) and then transition to | |||
transitions to an alternative instance (or set of instances) as a | an alternative instance (or set of instances) as a result of | |||
result of network issues, server unresponsiveness, or server- | network issues, server unresponsiveness, or server-directed | |||
directed migration. The transition may involve changes in | migration. The transition may involve changes in filehandles, | |||
filehandles, fileids, the change attribute, and/or locking state, | fileids, the change attribute, and/or locking state, depending on | |||
depending on the attributes of the source and destination file | the attributes of the source and destination file system | |||
system instances, as specified in the fs_locations_info attribute. | instances, as specified in the fs_locations_info attribute. | |||
Which of these choices is possible, and how a transition is effected, | Which of these choices is possible, and how a transition is effected, | |||
is governed by equivalence classes of file system instances as | is governed by equivalence classes of file system instances as | |||
reported by the fs_locations_info attribute, and, for file system | reported by the fs_locations_info attribute, and for file system | |||
instances in the same location within a multiple single-server | instances in the same location within a multi-homed single-server | |||
namespace as indicated by the so_major_id field in the | namespace, as indicated by the value of the so_major_id field of the | |||
eir_server_owner field returned by EXCHANGE_ID. | eir_server_owner field returned by EXCHANGE_ID. | |||
11.7.2. Simultaneous Use and Transparent Transitions | 11.7.2. Simultaneous Use and Transparent Transitions | |||
When two file system instances have the same location within their | When two file system instances have the same location within their | |||
respective single-server namespaces and those two server network | respective single-server namespaces and those two server network | |||
addresses designate the same server (as indicated by the same | addresses designate the same server (as indicated by the same value | |||
so_major_id value in the eir_server_owner value returned in response | of the so_major_id field of the eir_server_owner field returned in | |||
to EXCHANGE_ID), those file systems instances can be treated as the | response to EXCHANGE_ID), those file system instances can be treated | |||
same, and either used together simultaneously or serially with no | as the same, and either used together simultaneously or serially with | |||
transition activity required on the part of the client. In this case | no transition activity required on the part of the client. In this | |||
we refer to the transition as "transparent" and the client in | case, we refer to the transition as "transparent", and the client in | |||
transferring access from to the other is acting as it would in the | transferring access from one to the other is acting as it would in | |||
event that communication is interrupted, with a new connection and | the event that communication is interrupted, with a new connection | |||
possibly a new session being established to continue access to the | and possibly a new session being established to continue access to | |||
same file system. | the same file system. | |||
Whether simultaneous use of the two file system instances is valid is | Whether simultaneous use of the two file system instances is valid is | |||
controlled by whether the fs_locations_info attribute shows the two | controlled by whether the fs_locations_info attribute shows the two | |||
instances as having the same _simultaneous-use_ class. See | instances as having the same simultaneous-use class. See | |||
Section 11.10.1 for information about the definition of the various | Section 11.10.1 for information about the definition of the various | |||
use classes, including the _simultaneous-use_ class. | use classes, including the simultaneous-use class. | |||
Note that for two such file systems, any information within the | Note that for two such file systems, any information within the | |||
fs_locations_info attribute that indicates the need for special | fs_locations_info attribute that indicates the need for special | |||
transition activity, i.e. the appearance of the two file system | transition activity, i.e., the appearance of the two file system | |||
instances with different _handle_, _fileid_, _write-verifier_, | instances with different handle, fileid, write-verifier, change, and | |||
_change_, _readdir_ classes, indicates a serious problem and the | readdir classes, indicates a serious problem. The client, if it | |||
client, if it allows transition to the file system instance at all, | allows transition to the file system instance at all, must not treat | |||
must not treat this as a transparent transition. The server SHOULD | this as a transparent transition. The server SHOULD NOT indicate | |||
NOT indicate that these instances belong to different _handle_, | that these instances belong to different handle, fileid, write- | |||
_fileid_, _write-verifier_, _change_, _readdir_ classes, whether the | verifier, change, and readdir classes, whether or not the two | |||
two instances are shown belonging to the same _simultaneous-use_ | instances are shown belonging to the same simultaneous-use class. | |||
class or not. | ||||
Where these conditions do not apply, a non-transparent file system | Where these conditions do not apply, a non-transparent file system | |||
instance transition is required with the details depending on the | instance transition is required with the details depending on the | |||
respective _handle_, _fileid_, _write-verifier_, _change_, _readdir_ | respective handle, fileid, write-verifier, change, and readdir | |||
classes of the two file system instances and whether the two servers | classes of the two file system instances, and whether the two | |||
address in question have the same eir_server_scope value as reported | servers' addresses in question have the same eir_server_scope value | |||
by EXCHANGE_ID. | as reported by EXCHANGE_ID. | |||
11.7.2.1. Simultaneous Use of File System Instances | 11.7.2.1. Simultaneous Use of File System Instances | |||
When the conditions in Section 11.7.2 hold, in either of the | When the conditions in Section 11.7.2 hold, in either of the | |||
following two cases, the client may use the two file system instances | following two cases, the client may use the two file system instances | |||
simultaneously. | simultaneously. | |||
o The fs_locations_info attribute does not contain separate per- | o The fs_locations_info attribute does not contain separate per- | |||
network-address entries for file systems instances at the distinct | network-address entries for file system instances at the distinct | |||
network addresses. This includes the case in which the | network addresses. This includes the case in which the | |||
fs_locations_info attribute is unavailable. In this case, the | fs_locations_info attribute is unavailable. In this case, the | |||
fact that the two server addresses connect to the same server (as | fact that the two server addresses connect to the same server (as | |||
indicated by the two addresses sharing the same so_major_id | indicated by the two addresses sharing the same so_major_id | |||
value and subsequently confirmed as described in Section 2.10.5) | value and subsequently confirmed as described in Section 2.10.5) | |||
justifies simultaneous use and there is no fs_locations_info | justifies simultaneous use, and there is no fs_locations_info | |||
attribute information contradicting that. | attribute information contradicting that. | |||
o The fs_locations_info attribute indicates that two file system | o The fs_locations_info attribute indicates that two file system | |||
instances belong to the same _simultaneous-use_ class. | instances belong to the same simultaneous-use class. | |||
In this case, the client may use both file system instances | In this case, the client may use both file system instances | |||
simultaneously, as representations of the same file system, whether | simultaneously, as representations of the same file system, whether | |||
that happens because the two network addresses connect to the same | that happens because the two network addresses connect to the same | |||
physical server or because different servers connect to clustered | physical server or because different servers connect to clustered | |||
file systems and export their data in common. When simultaneous use | file systems and export their data in common. When simultaneous use | |||
is in effect, any change made to one file system instance must be | is in effect, any change made to one file system instance must be | |||
immediately reflected in the other file system instance(s). Locks | immediately reflected in the other file system instance(s). Locks | |||
are treated as part of a common lease, associated with a common | are treated as part of a common lease, associated with a common | |||
client ID. Depending on the details of the eir_server_owner returned | client ID. Depending on the details of the eir_server_owner returned | |||
by EXCHANGE_ID, the two server instances may be accessed by different | by EXCHANGE_ID, the two server instances may be accessed by different | |||
sessions or a single session in common. | sessions or a single session in common. | |||
11.7.2.2. Transparent File System Transitions | 11.7.2.2. Transparent File System Transitions | |||
When the conditions in Section 11.7.2.1 hold and the | When the conditions in Section 11.7.2.1 hold and the | |||
fs_locations_info attribute explicitly shows the file system | fs_locations_info attribute explicitly shows the file system | |||
instances for these distinct network addresses as belonging to | instances for these distinct network addresses as belonging to | |||
different _simultaneous-use_ classes, the file system instances | different simultaneous-use classes, the file system instances should | |||
should not be used by the client simultaneously, but rather serially | not be used by the client simultaneously. Rather, they should be | |||
with one being used unless and until communication difficulties, lack | used serially with one being used unless and until communication | |||
of responsiveness, or an explicit migration event causes another file | difficulties, lack of responsiveness, or an explicit migration event | |||
system instance (or set of file system instances sharing a common | causes another file system instance (or set of file system instances | |||
_simultaneous-use_ class) to be used. | sharing a common simultaneous-use class) to be used. | |||
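The choice between simultaneous and serial use described in Sections 11.7.2.1 and 11.7.2.2 can be sketched as a small decision function. This is an illustrative sketch only, not part of the protocol. The Instance type, its field names, and the returned strings are hypothetical. It assumes that both instances occupy the same location in their respective single-server namespaces, and that any so_major_id match has already been confirmed as described in Section 2.10.5.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instance:
    """Hypothetical client-side record for one file system instance."""
    so_major_id: bytes  # taken from eir_server_owner (EXCHANGE_ID)
    # None models "no separate per-network-address entry", which includes
    # the case in which fs_locations_info is unavailable.
    simultaneous_use_class: Optional[int] = None


def use_mode(a: Instance, b: Instance) -> str:
    # Assumption: a differing so_major_id is outside the scope of this sketch.
    if a.so_major_id != b.so_major_id:
        return "not-covered"
    # First case: nothing in fs_locations_info contradicts simultaneous use.
    if a.simultaneous_use_class is None or b.simultaneous_use_class is None:
        return "simultaneous"
    # Second case: the attribute places both in one simultaneous-use class.
    if a.simultaneous_use_class == b.simultaneous_use_class:
        return "simultaneous"
    # Explicitly different classes: use one instance at a time.
    return "serial"
```

A client following this sketch would switch instances in the "serial" case only after communication difficulties, lack of responsiveness, or an explicit migration event.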
When a change of file system instance is to be done, the client will | When a change of file system instance is to be done, the client will | |||
use the same client ID already in effect. If it already has | use the same client ID already in effect. If the client already has | |||
connections to the new server address, these will be used. Otherwise | connections to the new server address, these will be used. | |||
new connections to existing sessions or new sessions associated with | Otherwise, new connections to existing sessions or new sessions | |||
the existing client ID are established as indicated by the | associated with the existing client ID are established as indicated | |||
eir_server_owner returned by EXCHANGE_ID. | by the eir_server_owner returned by EXCHANGE_ID. | |||
In all such transparent transition cases, the following apply: | In all such transparent transition cases, the following apply: | |||
o If filehandles are persistent they stay the same. If filehandles | o If filehandles are persistent, they stay the same. If filehandles | |||
are volatile, they either stay the same, or if they expire, the | are volatile, they either stay the same or expire, but the reason | |||
reason for expiration is not due to the file system transition. | for expiration is not due to the file system transition. | |||
o Fileid values do not change across the transition. | o Fileid values do not change across the transition. | |||
o The file system will have the same fsid in both the old and new | o The file system will have the same fsid in both the old and new | |||
locations. | locations. | |||
o Change attribute values are consistent across the transition and | o Change attribute values are consistent across the transition and | |||
do not have to be refetched. When change attributes indicate that | do not have to be refetched. When change attributes indicate that | |||
a cached object is still valid, it can remain cached. | a cached object is still valid, it can remain cached. | |||
o Client and state identifiers retain their validity across the | o Client and state identifiers retain their validity across the | |||
transition, except where their staleness is recognized and | transition, except where their staleness is recognized and | |||
reported by the new server. Except where such staleness requires | reported by the new server. Except where such staleness requires | |||
it, no lock reclamation is needed. Any such staleness is an | it, no lock reclamation is needed. Any such staleness is an | |||
indication that the server should be considered to have restarted | indication that the server should be considered to have restarted | |||
and is reported as discussed in Section 8.4.2. | and is reported as discussed in Section 8.4.2. | |||
o Write verifiers are presumed to retain their validity and can be | o Write verifiers are presumed to retain their validity and can be | |||
used to compare with verifiers returned by COMMIT on the new | used to compare with verifiers returned by COMMIT on the new | |||
server, with the expectation that if COMMIT on the new server | server. If COMMIT on the new server returns an identical | |||
returns an identical verifier, then that server has all of the | verifier, then it is expected that the new server has all of the | |||
data unstably written to the original server and has committed it | data that was written unstably to the original server and has | |||
to stable storage as requested. | committed that data to stable storage as requested. | |||
o Readdir cookies are presumed to retain their validity and can be | o Readdir cookies are presumed to retain their validity and can be | |||
presented to subsequent READDIR requests together with the readdir | presented to subsequent READDIR requests together with the readdir | |||
verifier with which they are associated. When the verifier is | verifier with which they are associated. When the verifier is | |||
accepted as valid, the cookie will continue the READDIR operation | accepted as valid, the cookie will continue the READDIR operation | |||
so that the entire directory can be obtained by the client. | so that the entire directory can be obtained by the client. | |||
11.7.3. Filehandles and File System Transitions | 11.7.3. Filehandles and File System Transitions | |||
There are a number of ways in which filehandles can be handled across | There are a number of ways in which filehandles can be handled across | |||
a file system transition. These can be divided into two broad | a file system transition. These can be divided into two broad | |||
classes depending upon whether the two file systems across which the | classes depending upon whether the two file systems across which the | |||
transition happens share sufficient state to effect some sort of | transition happens share sufficient state to effect some sort of | |||
continuity of file system handling. | continuity of file system handling. | |||
When there is no such co-operation in filehandle assignment, the two | When there is no such cooperation in filehandle assignment, the two | |||
file systems are reported as being in different _handle_ classes. In | file systems are reported as being in different handle classes. In | |||
this case, all filehandles are assumed to expire as part of the file | this case, all filehandles are assumed to expire as part of the file | |||
system transition. Note that this behavior does not depend on | system transition. Note that this behavior does not depend on the | |||
fh_expire_type attribute and supersedes the specification of | fh_expire_type attribute and supersedes the specification of the | |||
FH4_VOL_MIGRATION bit, which only affects behavior when | FH4_VOL_MIGRATION bit, which only affects behavior when | |||
fs_locations_info is not available. | fs_locations_info is not available. | |||
When there is co-operation in filehandle assignment, the two file | When there is cooperation in filehandle assignment, the two file | |||
systems are reported as being in the same _handle_ class. In this | systems are reported as being in the same handle class. In this | |||
case, persistent filehandles remain valid after the file system | case, persistent filehandles remain valid after the file system | |||
transition, while volatile filehandles (excluding those that are only | transition, while volatile filehandles (excluding those that are only | |||
volatile due to the FH4_VOL_MIGRATION bit) are subject to expiration | volatile due to the FH4_VOL_MIGRATION bit) are subject to expiration | |||
on the target server. | on the target server. | |||
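The two filehandle cases in Section 11.7.3 can be summarized in a short sketch. The function name, parameter names, and result strings are hypothetical. Treating filehandles that are volatile only because of the FH4_VOL_MIGRATION bit like persistent ones is this sketch's reading of the parenthetical above, not a statement taken from the text.

```python
def filehandle_after_transition(same_handle_class: bool,
                                persistent: bool,
                                volatile_only_for_migration: bool = False) -> str:
    """Return the expected fate of a filehandle across a transition."""
    # Different handle classes: every filehandle is assumed to expire,
    # regardless of fh_expire_type.
    if not same_handle_class:
        return "expired"
    # Same handle class: persistent filehandles remain valid.
    # Assumption: handles volatile only due to FH4_VOL_MIGRATION are
    # treated the same way here.
    if persistent or volatile_only_for_migration:
        return "valid"
    # Other volatile filehandles may still expire on the target server.
    return "may-expire"
```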
11.7.4. Fileids and File System Transitions | 11.7.4. Fileids and File System Transitions | |||
In NFSv4.0, the issue of continuity of fileids in the event of a file | In NFSv4.0, the issue of continuity of fileids in the event of a file | |||
system transition was not addressed. The general expectation had | system transition was not addressed. The general expectation had | |||
been that in situations in which the two file system instances are | been that in situations in which the two file system instances are | |||
created by a single vendor using some sort of file system image copy, | created by a single vendor using some sort of file system image copy, | |||
fileids will be consistent across the transition while in the | fileids will be consistent across the transition, while in the | |||
analogous multi-vendor transitions they will not. This poses | analogous multi-vendor transitions they will not. This poses | |||
difficulties, especially for the client without special knowledge of | difficulties, especially for the client without special knowledge of | |||
the transition mechanisms adopted by the server. Note that although | the transition mechanisms adopted by the server. Note that although | |||
fileid is not a REQUIRED attribute, many servers support fileids and | fileid is not a REQUIRED attribute, many servers support fileids and | |||
many clients provide API's that depend on fileids. | many clients provide APIs that depend on fileids. | |||
It is important to note that while clients themselves may have no | It is important to note that while clients themselves may have no | |||
trouble with a fileid changing as a result of a file system | trouble with a fileid changing as a result of a file system | |||
transition event, applications do typically have access to the fileid | transition event, applications do typically have access to the fileid | |||
(e.g. via stat), and the result of this is that an application may | (e.g., via stat). The result is that an application may work | |||
work perfectly well if there is no file system instance transition or | perfectly well if there is no file system instance transition or if | |||
if any such transition is among instances created by a single vendor, | any such transition is among instances created by a single vendor, | |||
yet be unable to deal with the situation in which a multi-vendor | yet be unable to deal with the situation in which a multi-vendor | |||
transition occurs, at the wrong time. | transition occurs at the wrong time. | |||
Providing the same fileids in a multi-vendor (multiple server | Providing the same fileids in a multi-vendor (multiple server | |||
vendors) environment has generally been held to be quite difficult. | vendors) environment has generally been held to be quite difficult. | |||
While there is work to be done, it needs to be pointed out that this | While there is work to be done, it needs to be pointed out that this | |||
difficulty is partly self-imposed. Servers have typically identified | difficulty is partly self-imposed. Servers have typically identified | |||
fileid with inode number, i.e. with a quantity used to find the file | fileid with inode number, i.e. with a quantity used to find the file | |||
in question. This identification poses special difficulties for | in question. This identification poses special difficulties for | |||
migration of a file system between vendors where assigning the same | migration of a file system between vendors where assigning the same | |||
index to a given file may not be possible. Note here that a fileid | index to a given file may not be possible. Note here that a fileid | |||
is not required to be useful to find the file in question, only that | is not required to be useful to find the file in question, only that | |||
skipping to change at page 243, line 21 | skipping to change at page 243, line 27 | |||
accept a fileid as a single piece of metadata and store it apart from | accept a fileid as a single piece of metadata and store it apart from | |||
the value used to index the file information can relatively easily | the value used to index the file information can relatively easily | |||
maintain a fileid value across a migration event, allowing a truly | maintain a fileid value across a migration event, allowing a truly | |||
transparent migration event. | transparent migration event. | |||
In any case, where servers can provide continuity of fileids, they | In any case, where servers can provide continuity of fileids, they | |||
should, and the client should be able to find out that such | should, and the client should be able to find out that such | |||
continuity is available and take appropriate action. Information | continuity is available and take appropriate action. Information | |||
about the continuity (or lack thereof) of fileids across a file | about the continuity (or lack thereof) of fileids across a file | |||
system transition is represented by specifying whether the file | system transition is represented by specifying whether the file | |||
systems in question are of the same _fileid_ class. | systems in question are of the same fileid class. | |||
Note that when consistent fileids do not exist across a transition | Note that when consistent fileids do not exist across a transition | |||
(either because there is no continuity of fileids or because fileid | (either because there is no continuity of fileids or because fileid | |||
is not a supported attribute on one of the instances involved), and there | is not a supported attribute on one of the instances involved), and there | |||
are no reliable filehandles across a transition event (either because | are no reliable filehandles across a transition event (either because | |||
there is no filehandle continuity or because the filehandles are | there is no filehandle continuity or because the filehandles are | |||
volatile), the client is in a position where it cannot verify that | volatile), the client is in a position where it cannot verify that | |||
files it was accessing before the transition are the same objects. | files it was accessing before the transition are the same objects. | |||
It is forced to assume that no object has been renamed, and, unless | It is forced to assume that no object has been renamed, and, unless | |||
there are guarantees that provide this (e.g. the file system is read- | there are guarantees that provide this (e.g., the file system is | |||
only), problems for applications may occur. Therefore, use of such | read-only), problems for applications may occur. Therefore, use of | |||
configurations should be limited to situations where the problems | such configurations should be limited to situations where the | |||
that this may cause can be tolerated. | problems that this may cause can be tolerated. | |||
11.7.5. Fsids and File System Transitions | 11.7.5. Fsids and File System Transitions | |||
Since fsids are generally only unique on a per-server basis, it | Since fsids are generally only unique on a per-server basis, it | |||
is likely that they will change during a file system transition. One | is likely that they will change during a file system transition. One | |||
exception is the case of transparent transitions, but in that case we | exception is the case of transparent transitions, but in that case we | |||
have multiple network addresses that are defined as the same server | have multiple network addresses that are defined as the same server | |||
(as specified by a common value of the so_major_id field of | (as specified by a common value of the so_major_id field of | |||
eir_server_owner). Clients should not make the fsids received from | eir_server_owner). Clients should not make the fsids received from | |||
the server visible to applications since they may not be globally | the server visible to applications since they may not be globally | |||
unique, and because they may change during a file system transition | unique, and because they may change during a file system transition | |||
event. Applications are best served if they are isolated from such | event. Applications are best served if they are isolated from such | |||
transitions to the extent possible. | transitions to the extent possible. | |||
Although normally, a single source file system will transition to a | Although normally a single source file system will transition to a | |||
single target file system, there is a provision for splitting a | single target file system, there is a provision for splitting a | |||
single source file system into multiple target file systems, by | single source file system into multiple target file systems, by | |||
specifying the FSLI4F_MULTI_FS flag. | specifying the FSLI4F_MULTI_FS flag. | |||
11.7.5.1. File System Splitting | 11.7.5.1. File System Splitting | |||
When a file system transition is made and the fs_locations_info | When a file system transition is made and the fs_locations_info | |||
indicates that the file system in question may be split into multiple | indicates that the file system in question may be split into multiple | |||
file systems (via the FSLI4F_MULTI_FS flag), the client SHOULD do | file systems (via the FSLI4F_MULTI_FS flag), the client SHOULD do | |||
GETATTRs to determine the fsid attribute on all known objects within | GETATTRs to determine the fsid attribute on all known objects within | |||
the file system undergoing transition to determine the new file | the file system undergoing transition to determine the new file | |||
system boundaries. | system boundaries. | |||
Clients may maintain the fsids passed to existing applications by | Clients may maintain the fsids passed to existing applications by | |||
mapping all of the fsids for the descendant file systems to the | mapping all of the fsids for the descendant file systems to the | |||
common fsid used for the original file system. | common fsid used for the original file system. | |||
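One possible way for a client to keep application-visible fsids stable after a split is sketched below. The FsidMapper class and its method names are hypothetical. Fsids are modeled as (major, minor) tuples, and the set of descendant fsids is assumed to have been collected by the GETATTRs described above.

```python
from typing import Iterable, Set, Tuple

Fsid = Tuple[int, int]  # (major, minor)


class FsidMapper:
    """Maps fsids of descendant file systems back to the original fsid."""

    def __init__(self, original: Fsid) -> None:
        self.original = original
        self.descendants: Set[Fsid] = set()

    def record_split(self, new_fsids: Iterable[Fsid]) -> None:
        # Remember every fsid seen on objects of the transitioned file system.
        self.descendants.update(new_fsids)

    def app_visible(self, server_fsid: Fsid) -> Fsid:
        # Descendants are reported to applications under the original fsid;
        # unrelated fsids pass through unchanged.
        if server_fsid in self.descendants:
            return self.original
        return server_fsid
```

With such a mapping, an application that obtained the original fsid before the transition continues to see that value for objects in any of the target file systems.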
Splitting a file system may be done on a transition between file | Splitting a file system may be done on a transition between file | |||
systems of the same _fileid_ class, since the fact that fileids are | systems of the same fileid class, since the fact that fileids are | |||
unique within the source file system ensures they will be unique in | unique within the source file system ensures they will be unique in | |||
each of the target file systems. | each of the target file systems. | |||
11.7.6. The Change Attribute and File System Transitions | 11.7.6. The Change Attribute and File System Transitions | |||
Since the change attribute is defined as a server-specific one, | Since the change attribute is defined as a server-specific one, | |||
change attributes fetched from one server are normally presumed to be | change attributes fetched from one server are normally presumed to be | |||
invalid on another server. Such a presumption is troublesome since | invalid on another server. Such a presumption is troublesome since | |||
it would invalidate all cached change attributes, requiring | it would invalidate all cached change attributes, requiring | |||
refetching. Even more disruptive, the absence of any assured | refetching. Even more disruptive, the absence of any assured | |||
continuity for the change attribute means that even if the same value | continuity for the change attribute means that even if the same value | |||
is retrieved on refetch no conclusions can drawn as to whether the | is retrieved on refetch, no conclusions can be drawn as to whether | |||
object in question has changed. The identical change attribute could | the object in question has changed. The identical change attribute | |||
be merely an artifact of a modified file with a different change | could be merely an artifact of a modified file with a different | |||
attribute construction algorithm, with that new algorithm just | change attribute construction algorithm, with that new algorithm just | |||
happening to result in an identical change value. | happening to result in an identical change value. | |||
When the two file systems have consistent change attribute formats, | When the two file systems have consistent change attribute formats, | |||
and this fact is communicated to the client by reporting as in the | and this fact is communicated to the client by reporting in the same | |||
same _change_ class, the client may assume a continuity of change | change class, the client may assume a continuity of change attribute | |||
attribute construction and handle this situation just as it would be | construction and handle this situation just as it would be handled | |||
handled without any file system transition. | without any file system transition. | |||
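The effect of the change class on cache validation can be illustrated with a minimal sketch. The function and parameter names are hypothetical, and change attribute values are modeled as plain integers.

```python
def cached_object_still_valid(same_change_class: bool,
                              cached_change: int,
                              refetched_change: int) -> bool:
    """Decide whether a cached object can be kept after a transition."""
    # Without a common change class, an equal value proves nothing: it may
    # come from a different change attribute construction algorithm.
    if not same_change_class:
        return False
    # With a common change class, compare as if no transition had happened.
    return cached_change == refetched_change
```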
11.7.7. Lock State and File System Transitions | 11.7.7. Lock State and File System Transitions | |||
In a file system transition, the client needs to handle cases in | In a file system transition, the client needs to handle cases in | |||
which the two servers have cooperated in state management and in | which the two servers have cooperated in state management and in | |||
which they have not. Cooperation by two servers in state management | which they have not. Cooperation by two servers in state management | |||
requires coordination of client IDs. Before the client attempts to | requires coordination of client IDs. Before the client attempts to | |||
use a client ID associated with one server in a request to the server | use a client ID associated with one server in a request to the server | |||
of the other file system, it must eliminate the possibility that two | of the other file system, it must eliminate the possibility that two | |||
non-cooperating servers have assigned the same client ID by accident. | non-cooperating servers have assigned the same client ID by accident. | |||
skipping to change at page 245, line 17 | skipping to change at page 245, line 26 | |||
not cooperated in state management. If the scope values match, then | not cooperated in state management. If the scope values match, then | |||
this indicates the servers have cooperated in assigning client IDs to | this indicates the servers have cooperated in assigning client IDs to | |||
the point that they will reject client IDs that refer to state they | the point that they will reject client IDs that refer to state they | |||
do not know about. See Section 2.10.4 for more information about the | do not know about. See Section 2.10.4 for more information about the | |||
use of server scope. | use of server scope. | |||
In the case of migration, the servers involved in the migration of a | In the case of migration, the servers involved in the migration of a | |||
file system SHOULD transfer all server state from the original to the | file system SHOULD transfer all server state from the original to the | |||
new server. When this is done, it must be done in a way that is | new server. When this is done, it must be done in a way that is | |||
transparent to the client. With replication, such a degree of common | transparent to the client. With replication, such a degree of common | |||
state is typically not the case. Clients, however should use the | state is typically not the case. Clients, however, should use the | |||
information provided by the eir_server_scope returned by EXCHANGE_ID | information provided by the eir_server_scope returned by EXCHANGE_ID | |||
(as modified by the validation procedures described in | (as modified by the validation procedures described in | |||
Section 2.10.4) to determine whether such sharing may be in effect, | Section 2.10.4) to determine whether such sharing may be in effect, | |||
rather than making assumptions based on the reason for the | rather than making assumptions based on the reason for the | |||
transition. | transition. | |||
This state transfer will reduce disruption to the client when a file | This state transfer will reduce disruption to the client when a file | |||
system transition occurs. If the servers are successful in | system transition occurs. If the servers are successful in | |||
transferring all state, the client can attempt to establish sessions | transferring all state, the client can attempt to establish sessions | |||
associated with the client ID used for the source file system | associated with the client ID used for the source file system | |||
instance. If the server accepts that as a valid client ID, then the | instance. If the server accepts that as a valid client ID, then the | |||
client may use the existing stateids associated with that client ID | client may use the existing stateids associated with that client ID | |||
for the old file system instance in connection with that same client | for the old file system instance in connection with that same client | |||
ID in connection with the transitioned file system instance. If the | ID in connection with the transitioned file system instance. If the | |||
client in question already had a client ID on the target system, it | client in question already had a client ID on the target system, it | |||
may interrogate the stateid values from the source system under that | may interrogate the stateid values from the source system under that | |||
new client ID, with the assurance that if they are accepted as valid, | new client ID, with the assurance that if they are accepted as valid, | |||
then they represent validly transferred lock state for the source | then they represent validly transferred lock state for the source | |||
file system, transferred to the target server. | file system, which has been transferred to the target server. | |||
When the two servers belong to the same server scope, it does not | When the two servers belong to the same server scope, it does not | |||
mean that when dealing with the transition, the client will not have | mean that when dealing with the transition, the client will not have | |||
to reclaim state. However it does mean that the client may proceed | to reclaim state. However, it does mean that the client may proceed | |||
using its current client ID when establishing communication with the | using its current client ID when establishing communication with the | |||
new server and the new server will either recognize the client ID as | new server, and the new server will either recognize the client ID as | |||
valid, or reject it, in which case locks must be reclaimed by the | valid or reject it, in which case locks must be reclaimed by the | |||
client. | client. | |||
File systems co-operating in state management may actually share | File systems cooperating in state management may actually share state | |||
state or simply divide the identifier space so as to recognize (and | or simply divide the identifier space so as to recognize (and reject | |||
reject as stale) each other's stateids and client IDs. Servers which | as stale) each other's stateids and client IDs. Servers that do | |||
do share state may not do so under all conditions or at all times. | share state may not do so under all conditions or at all times. If | |||
The requirement for the server is that if it cannot be sure in | the server cannot be sure when accepting a client ID that it reflects | |||
accepting a client ID that it reflects the locks the client was | the locks the client was given, the server must treat all associated | |||
given, it must treat all associated state as stale and report it as | state as stale and report it as such to the client. | |||
such to the client. | ||||
When the two file system instances are on servers that do not share a | When the two file system instances are on servers that do not share a | |||
server scope value, the client must establish a new client ID on the | server scope value, the client must establish a new client ID on the | |||
destination, if it does not have one already, and reclaim locks if | destination, if it does not have one already, and reclaim locks if | |||
allowed by the server. In this case, old stateids and client IDs | allowed by the server. In this case, old stateids and client IDs | |||
should not be presented to the new server since there is no assurance | should not be presented to the new server since there is no assurance | |||
that they will not conflict with IDs valid on that server. Note that | that they will not conflict with IDs valid on that server. Note that | |||
in this case lock reclaim may be attempted even when the servers | in this case, lock reclaim may be attempted even when the servers | |||
involved in the transfer have different server scope values (see | involved in the transfer have different server scope values (see | |||
Section 8.4.2.1 for the contrary case of reclaim after server reboot. | Section 8.4.2.1 for the contrary case of reclaim after server | |||
Servers with different server scope values may co-operate to allow | reboot). Servers with different server scope values may cooperate to | |||
reclaim for locks associated with the transfer of a filesystem even | allow reclaim for locks associated with the transfer of a file system | |||
if they do not co-operate sufficiently to share a server scope. | even if they do not cooperate sufficiently to share a server scope. | |||
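The client-side choice described in the preceding paragraphs can be sketched as follows. This is a minimal illustration under stated assumptions, not text from either compared document: the function name, its boolean parameter, and the returned strings are hypothetical, and only eir_server_scope is taken from the protocol.

```python
def plan_state_recovery(source_scope: bytes, target_scope: bytes,
                        client_id_accepted: bool) -> str:
    """Pick a recovery path after a file system transition (illustrative).

    source_scope / target_scope: eir_server_scope values returned by
    EXCHANGE_ID for the old and new servers.
    client_id_accepted: whether the new server accepted the existing
    client ID; ignored when the scopes differ.
    """
    if source_scope != target_scope:
        # Different scopes: old client IDs and stateids should not be
        # presented. Establish a client ID on the destination (if none
        # exists yet) and reclaim locks if the server allows it.
        return "new-client-id-then-reclaim"
    if client_id_accepted:
        # Same scope and the client ID was recognized as valid:
        # existing stateids may be used on the new server.
        return "reuse-state"
    # Same scope, but the client ID was rejected: locks must be reclaimed.
    return "reclaim-locks"
```

The scope comparison comes first because a differing scope rules out presenting the old client ID at all, so the third argument only matters when the scopes match.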
In either case, when actual locks are not known to be maintained, the | In either case, when actual locks are not known to be maintained, the | |||
destination server may establish a grace period specific to the given | destination server may establish a grace period specific to the given | |||
file system, with non-reclaim locks being rejected for that file | file system, with non-reclaim locks being rejected for that file | |||
system, even though normal locks are being granted for other file | system, even though normal locks are being granted for other file | |||
systems. Clients should not infer the absence of a grace period for | systems. Clients should not infer the absence of a grace period for | |||
file systems being transitioned to a server from responses to | file systems being transitioned to a server from responses to | |||
requests for other file systems. | requests for other file systems. | |||
In the case of lock reclamation for a given file system after a file | In the case of lock reclamation for a given file system after a file | |||
system transition, edge conditions can arise similar to those for | system transition, edge conditions can arise similar to those for | |||
reclaim after server restart (although in the case of the planned | reclaim after server restart (although in the case of the planned | |||
state transfer associated with migration, these can be avoided by | state transfer associated with migration, these can be avoided by | |||
securely recording lock state as part of state migration). Unless | securely recording lock state as part of state migration). Unless | |||
the destination server can guarantee that locks will not be | the destination server can guarantee that locks will not be | |||
incorrectly granted, the destination server should not allow lock | incorrectly granted, the destination server should not allow lock | |||
reclaims and avoid establishing a grace period. | reclaims and should avoid establishing a grace period. | |||
Once all locks have been reclaimed, or there were no locks to | Once all locks have been reclaimed, or there were no locks to | |||
reclaim, the client indicates that there are no more reclaims to be | reclaim, the client indicates that there are no more reclaims to be | |||
done for the file system in question by sending a RECLAIM_COMPLETE | done for the file system in question by sending a RECLAIM_COMPLETE | |||
operation with the rca_one_fs parameter set to true. Once this has | operation with the rca_one_fs parameter set to true. Once this has | |||
been done, non-reclaim locking operations may be done, and any | been done, non-reclaim locking operations may be done, and any | |||
subsequent request to do reclaims will be rejected with the error | subsequent request to do reclaims will be rejected with the error | |||
NFS4ERR_NO_GRACE. | NFS4ERR_NO_GRACE. | |||
Information about client identity may be propagated between servers | Information about client identity may be propagated between servers | |||
in the form of client_owner4 and associated verifiers, under the | in the form of client_owner4 and associated verifiers, under the | |||
assumption that the client presents the same values to all the | assumption that the client presents the same values to all the | |||
servers with which it deals. | servers with which it deals. | |||
Servers are encouraged to provide facilities to allow locks to be | Servers are encouraged to provide facilities to allow locks to be | |||
reclaimed on the new server after a file system transition. Often, | reclaimed on the new server after a file system transition. Often, | |||
however, in cases in which the two servers do not share a server | however, in cases in which the two servers do not share a server | |||
scope value, such facilities may not be available and client should | scope value, such facilities may not be available and the client | |||
be prepared to re-obtain locks, even though it is possible that the | should be prepared to re-obtain locks, even though it is possible | |||
client may have its LOCK or OPEN request denied due to a conflicting | that the client may have its LOCK or OPEN request denied due to a | |||
lock. | conflicting lock. | |||
The consequences of having no facilities available to reclaim locks | The consequences of having no facilities available to reclaim locks | |||
on the new server will depend on the type of environment. In some | on the new server will depend on the type of environment. In some | |||
environments, such as the transition between read-only file systems, | environments, such as the transition between read-only file systems, | |||
such denial of locks should not pose large difficulties in practice. | such denial of locks should not pose large difficulties in practice. | |||
When an attempt to re-establish a lock on a new server is denied, the | When an attempt to re-establish a lock on a new server is denied, the | |||
client should treat the situation as if its original lock had been | client should treat the situation as if its original lock had been | |||
revoked. Note that when the lock is granted, the client cannot | revoked. Note that when the lock is granted, the client cannot | |||
assume that no conflicting lock could have been granted in the | assume that no conflicting lock could have been granted in the | |||
interim. Where change attribute continuity is present, the client | interim. Where change attribute continuity is present, the client | |||
skipping to change at page 247, line 33 | skipping to change at page 247, line 41 | |||
11.7.7.1. Leases and File System Transitions | 11.7.7.1. Leases and File System Transitions | |||
In the case of lease renewal, the client may not be submitting | In the case of lease renewal, the client may not be submitting | |||
requests for a file system that has been transferred to another | requests for a file system that has been transferred to another | |||
server. This can occur because of the lease renewal mechanism. The | server. This can occur because of the lease renewal mechanism. The | |||
client renews the lease associated with all file systems when | client renews the lease associated with all file systems when | |||
submitting a request on an associated session, regardless of the | submitting a request on an associated session, regardless of the | |||
specific file system being referenced. | specific file system being referenced. | |||
In order for the client to schedule renewal of leases where there is | In order for the client to schedule renewal of its lease where there | |||
locking state that may have been relocated to the new server, the | is locking state that may have been relocated to the new server, the | |||
client must find out about lease relocation before those leases | client must find out about lease relocation before that lease expires. | |||
expire. To accomplish this, the SEQUENCE operation will return the | To accomplish this, the SEQUENCE operation will return the status bit | |||
status bit SEQ4_STATUS_LEASE_MOVED, if responsibility for any of the | SEQ4_STATUS_LEASE_MOVED if responsibility for any of the renewed | |||
locking state renewed has been transferred to a new server. This | locking state has been transferred to a new server. This will | |||
will continue until the client receives an NFS4ERR_MOVED error for | continue until the client receives an NFS4ERR_MOVED error for each of | |||
each of the file systems for which there has been locking state | the file systems for which there has been locking state relocation. | |||
relocation. | ||||
When a client receives an SEQ4_STATUS_LEASE_MOVED indication, it | When a client receives an SEQ4_STATUS_LEASE_MOVED indication from a | |||
should perform an operation on each file system associated with the | server, for each file system of the server for which the client has | |||
server where there is locking state for the current client associated | locking state, the client should perform an operation. For | |||
with the file system in question. The client may choose to reference | simplicity, the client may choose to reference all file systems, but | |||
all file systems in the interests of simplicity but what is important | what is important is that it must reference all file systems for | |||
is that it must reference all file systems for which there was | which there was locking state where that state has moved. Once the | |||
locking state where that state moved. Once the client receives an | client receives an NFS4ERR_MOVED error for each such file system, the | |||
NFS4ERR_MOVED error for each file system, the SEQ4_STATUS_LEASE_MOVED | server will clear the SEQ4_STATUS_LEASE_MOVED indication. The client | |||
indication is cleared. The client can terminate the process of | can terminate the process of checking file systems once this | |||
checking file systems once this indication is cleared (but only if | indication is cleared (but only if the client has received a reply | |||
the client has received a reply for all outstanding SEQUENCE requests | for all outstanding SEQUENCE requests on all sessions it has with the | |||
on all sessions it has with the server), since there are no others | server), since there are no others for which locking state has moved. | |||
for which locking state has moved. | ||||
A client may use GETATTR of the fs_status (or fs_locations_info) | A client may use GETATTR of the fs_status (or fs_locations_info) | |||
attribute on all of the file systems to get absence indications in a | attribute on all of the file systems to get absence indications in a | |||
single (or a few) request(s), since absent file systems will not | single (or a few) request(s), since absent file systems will not | |||
cause an error in this context. However, it still must do an | cause an error in this context. However, it still must do an | |||
operation which receives NFS4ERR_MOVED on each file system, in order | operation that receives NFS4ERR_MOVED on each file system, in order | |||
to clear the SEQ4_STATUS_LEASE_MOVED indication is cleared. | to clear the SEQ4_STATUS_LEASE_MOVED indication. | |||
Once the set of file systems with transferred locking state has been | Once the set of file systems with transferred locking state has been | |||
determined, the client can follow the normal process to obtain the | determined, the client can follow the normal process to obtain the | |||
new server information (through the fs_locations and | new server information (through the fs_locations and | |||
fs_locations_info attributes) and perform renewal of those leases on | fs_locations_info attributes) and perform renewal of that lease on | |||
the new server, unless information in fs_locations_info attribute | the new server, unless information in the fs_locations_info attribute | |||
shows that no state could have been transferred. If the server has | shows that no state could have been transferred. If the server has | |||
not had state transferred to it transparently, the client will | not had state transferred to it transparently, the client will | |||
receive NFS4ERR_STALE_CLIENTID from the new server, as described | receive NFS4ERR_STALE_CLIENTID from the new server, as described | |||
above, and the client can then reclaim locks as is done in the event | above, and the client can then reclaim locks as is done in the event | |||
of server failure. | of server failure. | |||
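The probing procedure for SEQ4_STATUS_LEASE_MOVED can be sketched as a simple loop. This is a minimal sketch, assuming a hypothetical probe callback that performs one operation on a file system and reports both the resulting status name and whether the SEQUENCE reply still carries the lease-moved indication. It also assumes no other SEQUENCE requests are outstanding on any session with the server.

```python
from typing import Callable, Iterable, List, Tuple


def find_moved_file_systems(
    file_systems: Iterable[str],
    probe: Callable[[str], Tuple[str, bool]],
) -> List[str]:
    """Probe file systems until the lease-moved indication clears.

    probe(fs) is a hypothetical callback returning (status, lease_moved):
    status is an error name such as "NFS4ERR_MOVED", and lease_moved
    mirrors SEQ4_STATUS_LEASE_MOVED in the SEQUENCE reply.
    """
    moved: List[str] = []
    for fs in file_systems:
        status, lease_moved = probe(fs)
        if status == "NFS4ERR_MOVED":
            # Locking state for this file system has been relocated.
            moved.append(fs)
        if not lease_moved:
            # Indication cleared: no remaining file systems have moved
            # state, so the client can stop checking.
            break
    return moved
```

The returned list is the set of file systems for which the client would then fetch fs_locations or fs_locations_info and renew its lease on the new server.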
11.7.7.2. Transitions and the Lease_time Attribute | 11.7.7.2. Transitions and the Lease_time Attribute | |||
In order that the client may appropriately manage its leases in the | In order that the client may appropriately manage its lease in the | |||
case of a file system transition, the destination server must | case of a file system transition, the destination server must | |||
establish proper values for the lease_time attribute. | establish proper values for the lease_time attribute. | |||
When state is transferred transparently, that state should include | When state is transferred transparently, that state should include | |||
the correct value of the lease_time attribute. The lease_time | the correct value of the lease_time attribute. The lease_time | |||
attribute on the destination server must never be less than that on | attribute on the destination server must never be less than that on | |||
the source since this would result in premature expiration of leases | the source, since this would result in premature expiration of a | |||
granted by the source server. Upon transitions in which state is | lease granted by the source server. Upon transitions in which state | |||
transferred transparently, the client is under no obligation to re- | is transferred transparently, the client is under no obligation to | |||
fetch the lease_time attribute and may continue to use the value | refetch the lease_time attribute and may continue to use the value | |||
previously fetched (on the source server). | previously fetched (on the source server). | |||
If state has not been transferred transparently, either because the | If state has not been transferred transparently, either because the | |||
associated servers are shown as having different eir_server_scope | associated servers are shown as having different eir_server_scope | |||
strings or because the client ID is rejected when presented to the | strings or because the client ID is rejected when presented to the | |||
new server, the client should fetch the value of lease_time on the | new server, the client should fetch the value of lease_time on the | |||
new (i.e. destination) server, and use it for subsequent locking | new (i.e., destination) server, and use it for subsequent locking | |||
requests. However the server must respect a grace period at least as | requests. However, the server must respect a grace period of at | |||
long as the lease_time on the source server, in order to ensure that | least as long as the lease_time on the source server, in order to | |||
clients have ample time to reclaim their lock before potentially | ensure that clients have ample time to reclaim their lock before | |||
conflicting non-reclaimed locks are granted. | potentially conflicting non-reclaimed locks are granted. | |||
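The lease_time rules above reduce to two comparisons on the server side and one choice on the client side. The following is a minimal sketch; all function and parameter names are hypothetical, and times are assumed to be in seconds.

```python
from typing import Callable


def lease_time_ok(source_lease_time: int, dest_lease_time: int) -> bool:
    """The destination lease_time must never be less than the source's."""
    return dest_lease_time >= source_lease_time


def grace_period_ok(source_lease_time: int, dest_grace_period: int) -> bool:
    """The grace period must be at least as long as the source lease_time."""
    return dest_grace_period >= source_lease_time


def client_lease_time(transparent: bool, cached_lease_time: int,
                      fetch_lease_time: Callable[[], int]) -> int:
    """Lease time the client uses after a transition.

    With a transparent state transfer the previously fetched value may be
    kept; otherwise the value is fetched from the destination server via
    the hypothetical fetch_lease_time callback.
    """
    if transparent:
        return cached_lease_time
    return fetch_lease_time()
```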
11.7.8. Write Verifiers and File System Transitions | 11.7.8. Write Verifiers and File System Transitions | |||
In a file system transition, the two file systems may be clustered in | In a file system transition, the two file systems may be clustered in | |||
the handling of unstably written data. When this is the case, and | the handling of unstably written data. When this is the case, and | |||
the two file systems belong to the same _write-verifier_ class, write | the two file systems belong to the same write-verifier class, write | |||
verifiers returned from one system may be compared to those returned | verifiers returned from one system may be compared to those returned | |||
by the other and superfluous writes avoided. | by the other and superfluous writes avoided. | |||
When two file systems belong to different _write-verifier_ classes, | When two file systems belong to different write-verifier classes, any | |||
any verifier generated by one must not be compared to one provided by | verifier generated by one must not be compared to one provided by the | |||
the other. Instead, it should be treated as not equal even when the | other. Instead, it should be treated as not equal even when the | |||
values are identical. | values are identical. | |||
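The comparison rule for write verifiers can be sketched as follows. This is an illustrative helper only: the function name is hypothetical, and representing the write-verifier class as an integer is an assumption made for the example.

```python
def write_verifiers_match(class_a: int, verifier_a: bytes,
                          class_b: int, verifier_b: bytes) -> bool:
    """Compare write verifiers from two file system instances.

    Verifiers from different write-verifier classes are treated as not
    equal, even when their byte values are identical.
    """
    if class_a != class_b:
        return False
    return verifier_a == verifier_b
```

A False result means the client cannot assume that unstably written data survived, so it resends the affected writes.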
11.7.9. Readdir Cookies and Verifiers and File System Transitions | 11.7.9. Readdir Cookies and Verifiers and File System Transitions | |||
In a file system transition, the two file systems may be consistent | In a file system transition, the two file systems may be consistent | |||
in their handling of READDIR cookies and verifiers. When this is the | in their handling of READDIR cookies and verifiers. When this is the | |||
case, and the two file systems belong to the same _readdir_ class, | case, and the two file systems belong to the same readdir class, | |||
READDIR cookies and verifiers from one system may be recognized by | READDIR cookies and verifiers from one system may be recognized by | |||
the other and READDIR operations started on one server may be validly | the other and READDIR operations started on one server may be validly | |||
continued on the other, simply by presenting the cookie and verifier | continued on the other, simply by presenting the cookie and verifier | |||
returned by a READDIR operation done on the first file system to the | returned by a READDIR operation done on the first file system to the | |||
second. | second. | |||
When two file systems belong to different _readdir_ classes, any | When two file systems belong to different readdir classes, any | |||
READDIR cookie and verifier generated by one is not valid on the | READDIR cookie and verifier generated by one is not valid on the | |||
second, and must not be presented to that server by the client. The | second, and must not be presented to that server by the client. The | |||
client should act as if the verifier was rejected. | client should act as if the verifier was rejected. | |||
11.7.10. File System Data and File System Transitions | 11.7.10. File System Data and File System Transitions | |||
When multiple replicas exist and are used simultaneously or in | When multiple replicas exist and are used simultaneously or in | |||
succession by a client, applications using them will normally expect | succession by a client, applications using them will normally expect | |||
that they contain data the same data or data which is consistent with | that they contain either the same data or data that is consistent | |||
the normal sorts of changes that are made by other clients updating | with the normal sorts of changes that are made by other clients | |||
the data of the file system. (with metadata being the same to the | updating the data of the file system (with metadata being the same to | |||
degree indicated by the fs_locations_info attribute). However, when | the degree indicated by the fs_locations_info attribute). However, | |||
multiple file systems are presented as replicas of one another, the | when multiple file systems are presented as replicas of one another, | |||
precise relationship between the data of one and the data of another | the precise relationship between the data of one and the data of | |||
is not, as a general matter, specified by the NFSv4.1 protocol. It | another is not, as a general matter, specified by the NFSv4.1 | |||
is quite possible to present as replicas file systems where the data | protocol. It is quite possible to present as replicas file systems | |||
of those file systems is sufficiently different that some | where the data of those file systems is sufficiently different that | |||
applications have problems dealing with the transition between | some applications have problems dealing with the transition between | |||
replicas. The namespace will typically be constructed so that | replicas. The namespace will typically be constructed so that | |||
applications can choose an appropriate level of support, so that in | applications can choose an appropriate level of support, so that in | |||
one position in the namespace a varied set of replicas will be listed | one position in the namespace a varied set of replicas will be | |||
while in another only those that are up-to-date may be considered | listed, while in another only those that are up-to-date may be | |||
replicas. The protocol does define three special cases of the | considered replicas. The protocol does define four special cases of | |||
relationship among replicas to be specified by the server and relied | the relationship among replicas to be specified by the server and | |||
upon by clients: | relied upon by clients: | |||
o When multiple server addresses correspond to the same actual | o When multiple server addresses correspond to the same actual | |||
server, as indicated by a common so_major_id field within the | server, as indicated by a common so_major_id field within the | |||
eir_server_owner field returned by EXCHANGE_ID, the client may | eir_server_owner field returned by EXCHANGE_ID, the client may | |||
depend on the fact that changes to data, metadata, or locks made | depend on the fact that changes to data, metadata, or locks made | |||
on one file system are immediately reflected on others. | on one file system are immediately reflected on others. | |||
o When multiple replicas exist and are used simultaneously by a | o When multiple replicas exist and are used simultaneously by a | |||
client (see the FSLIB4_CLSIMUL definition within | client (see the FSLIB4_CLSIMUL definition within | |||
fs_locations_info), they must designate the same data. Where file | fs_locations_info), they must designate the same data. Where file | |||
systems are writable, a change made on one instance must be | systems are writable, a change made on one instance must be | |||
visible on all instances, immediately upon the earlier of the | visible on all instances, immediately upon the earlier of the | |||
return of the modifying requester or the visibility of that change | return of the modifying requester or the visibility of that change | |||
on any of the associated replicas. This allows a client to use | on any of the associated replicas. This allows a client to use | |||
these replicas simultaneously without any special adaptation to | these replicas simultaneously without any special adaptation to | |||
the fact that there are multiple replicas. In this case, locks, | the fact that there are multiple replicas. In this case, locks | |||
whether shared or byte-range, and delegations obtained one replica | (whether share reservations or byte-range locks) and delegations | |||
are immediately reflected on all replicas, even though these locks | obtained on one replica are immediately reflected on all replicas, | |||
will be managed under a set of client IDs. | even though these locks will be managed under a set of client IDs. | |||
o When one replica is designated as the successor instance to | o When one replica is designated as the successor instance to | |||
another existing instance after return of NFS4ERR_MOVED (i.e. the | another existing instance after return of NFS4ERR_MOVED (i.e., the | |||
case of migration), the client may depend on the fact that all | case of migration), the client may depend on the fact that all | |||
changes securely made to data (uncommitted writes are dealt with | changes written to stable storage on the original instance are | |||
in Section 11.7.8) on the original instance are made to the | written to stable storage of the successor (uncommitted writes are | |||
successor image. | dealt with in Section 11.7.8). | |||
o Where a file system is not writable but represents a read-only | o Where a file system is not writable but represents a read-only | |||
copy (possibly periodically updated) of a writable file system, | copy (possibly periodically updated) of a writable file system, | |||
clients have similar requirements with regard to the propagation | clients have similar requirements with regard to the propagation | |||
of updates. They may need a guarantee that any change visible on | of updates. They may need a guarantee that any change visible on | |||
the original file system instance must be immediately visible on | the original file system instance must be immediately visible on | |||
any replica before the client transitions access to that replica, | any replica before the client transitions access to that replica, | |||
in order to avoid any possibility that a client, in effecting a | in order to avoid any possibility that a client, in effecting a | |||
transition to a replica, will see any reversion in file system | transition to a replica, will see any reversion in file system | |||
state. The specific means by which this will be prevented varies | state. The specific means of this guarantee varies based on the | |||
based on fs4_status_type reported as part of the fs_status | value of the fss_type field that is reported as part of the | |||
attribute (see Section 11.11). Since these file systems are | fs_status attribute (see Section 11.11). Since these file systems | |||
presumed not to be suitable for simultaneous use, there is no | are presumed to be unsuitable for simultaneous use, there is no | |||
specification of how locking is handled and it generally will be | specification of how locking is handled; in general, locks | |||
the case that locks obtained one file system will be separate from | obtained on one file system will be separate from those on others. | |||
those on others. Since these are going to be read-only file | ||||
systems, this is not expected to pose an issue for clients or | Since these are going to be read-only file systems, this is not | |||
applications. | expected to pose an issue for clients or applications. | |||
11.8. Effecting File System Referrals | 11.8. Effecting File System Referrals | |||
Referrals are effected when an absent file system is encountered, and | Referrals are effected when an absent file system is encountered and | |||
one or more alternate locations are made available by the | one or more alternate locations are made available by the | |||
fs_locations or fs_locations_info attributes. The client will | fs_locations or fs_locations_info attributes. The client will | |||
typically get an NFS4ERR_MOVED error, fetch the appropriate location | typically get an NFS4ERR_MOVED error, fetch the appropriate location | |||
information and proceed to access the file system on a different | information, and proceed to access the file system on a different | |||
server, even though it retains its logical position within the | server, even though it retains its logical position within the | |||
original namespace. Referrals differ from migration events in that | original namespace. Referrals differ from migration events in that | |||
they happen only when the client has not previously referenced the | they happen only when the client has not previously referenced the | |||
file system in question (so there is nothing to transition). | file system in question (so there is nothing to transition). | |||
Referrals can only come into effect when an absent file system is | Referrals can only come into effect when an absent file system is | |||
encountered at its root. | encountered at its root. | |||
The examples given in the sections below are somewhat artificial in | The examples given in the sections below are somewhat artificial in | |||
that an actual client will not typically do a multi-component lookup, | that an actual client will not typically do a multi-component look | |||
but will have cached information regarding the upper levels of the | up, but will have cached information regarding the upper levels of | |||
name hierarchy. However, these examples are chosen to make the | the name hierarchy. However, these examples are chosen to make the | |||
required behavior clear and easy to put within the scope of a small | required behavior clear and easy to put within the scope of a small | |||
number of requests, without getting unduly into details of how | number of requests, without getting unduly into details of how | |||
specific clients might choose to cache things. | specific clients might choose to cache things. | |||
11.8.1. Referral Example (LOOKUP) | 11.8.1. Referral Example (LOOKUP) | |||
Let us suppose that the following COMPOUND is sent in an environment | Let us suppose that the following COMPOUND is sent in an environment | |||
in which /this/is/the/path is absent from the target server. This | in which /this/is/the/path is absent from the target server. This | |||
may be for a number of reasons. It may be the case that the file | may be for a number of reasons. It may be that the file system has | |||
system has moved, or, it may be the case that the target server is | moved, or it may be that the target server is functioning mainly, or | |||
functioning mainly, or solely, to refer clients to the servers on | solely, to refer clients to the servers on which various file systems | |||
which various file systems are located. | are located. | |||
o PUTROOTFH | o PUTROOTFH | |||
o LOOKUP "this" | o LOOKUP "this" | |||
o LOOKUP "is" | o LOOKUP "is" | |||
o LOOKUP "the" | o LOOKUP "the" | |||
o LOOKUP "path" | o LOOKUP "path" | |||
o GETFH | o GETFH | |||
o GETATTR fsid,fileid,size,time_modify | o GETATTR (fsid, fileid, size, time_modify) | |||
Under the given circumstances, the following will be the result. | Under the given circumstances, the following will be the result. | |||
o PUTROOTFH --> NFS_OK. The current fh is now the root of the | o PUTROOTFH --> NFS_OK. The current fh is now the root of the | |||
pseudo-fs. | pseudo-fs. | |||
o LOOKUP "this" --> NFS_OK. The current fh is for /this and is | o LOOKUP "this" --> NFS_OK. The current fh is for /this and is | |||
within the pseudo-fs. | within the pseudo-fs. | |||
o LOOKUP "is" --> NFS_OK. The current fh is for /this/is and is | o LOOKUP "is" --> NFS_OK. The current fh is for /this/is and is | |||
within the pseudo-fs. | within the pseudo-fs. | |||
o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and | o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and | |||
is within the pseudo-fs. | is within the pseudo-fs. | |||
o LOOKUP "path" --> NFS_OK. The current fh is for /this/is/the/path | o LOOKUP "path" --> NFS_OK. The current fh is for /this/is/the/path | |||
and is within a new, absent file system, but ... the client will | and is within a new, absent file system, but ... the client will | |||
never see the value of that fh. | never see the value of that fh. | |||
o GETFH --> NFS4ERR_MOVED. Fails because current fh is in an absent | o GETFH --> NFS4ERR_MOVED. Fails because current fh is in an absent | |||
file system at the start of the operation and the spec makes no | file system at the start of the operation, and the specification | |||
exception for GETFH. | makes no exception for GETFH. | |||
o GETATTR fsid,fileid,size,time_modify. Not executed because the | o GETATTR (fsid, fileid, size, time_modify). Not executed because | |||
failure of the GETFH stops processing of the COMPOUND. | the failure of the GETFH stops processing of the COMPOUND. | |||
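The step-by-step result above can be modeled with a small simulation. This is only an illustrative sketch: run_compound, the (op, arg) tuple encoding, and absent_root are hypothetical names, and the handling of operations inside an absent file system is simplified to the behavior described in this section (GETFH fails; a GETATTR that requests a location attribute succeeds).

```python
NFS_OK = "NFS_OK"
NFS4ERR_MOVED = "NFS4ERR_MOVED"
LOCATION_ATTRS = {"fs_locations", "fs_locations_info"}

def run_compound(ops, absent_root):
    """Run (op, arg) pairs in order; stop at the first non-NFS_OK status.

    absent_root is the list of path components naming the root of the
    absent file system (a hypothetical, simplified server model).
    """
    current = []   # path components designated by the current fh
    results = []
    for op, arg in ops:
        # Is the current fh inside the absent file system at the start
        # of this operation?
        in_absent = current[:len(absent_root)] == absent_root
        status = NFS_OK
        if op == "PUTROOTFH":
            current = []
        elif in_absent:
            # Simplification: only a GETATTR asking for a location
            # attribute succeeds within an absent file system.
            if not (op == "GETATTR" and LOCATION_ATTRS & set(arg)):
                status = NFS4ERR_MOVED
        elif op == "LOOKUP":
            current = current + [arg]
        results.append((op, status))
        if status != NFS_OK:
            break  # later operations in the COMPOUND are not executed
    return results

ops = [("PUTROOTFH", None), ("LOOKUP", "this"), ("LOOKUP", "is"),
       ("LOOKUP", "the"), ("LOOKUP", "path"), ("GETFH", None),
       ("GETATTR", ("fsid", "fileid", "size", "time_modify"))]
results = run_compound(ops, ["this", "is", "the", "path"])
```

In this model the final GETATTR never appears in results, mirroring the "Not executed" entry in the list above.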
Given the failure of the GETFH, the client has the job of determining | Given the failure of the GETFH, the client has the job of determining | |||
the root of the absent file system and where to find that file | the root of the absent file system and where to find that file | |||
system, i.e. the server and path relative to that server's root fh. | system, i.e., the server and path relative to that server's root fh. | |||
Note here that in this example, the client did not obtain filehandles | Note that in this example, the client did not obtain filehandles and | |||
and attribute information (e.g. fsid) for the intermediate | attribute information (e.g., fsid) for the intermediate directories, | |||
directories, so that it would not be sure where the absent file | so that it would not be sure where the absent file system starts. It | |||
system starts. It could be the case, for example, that /this/is/the | could be the case, for example, that /this/is/the is the root of the | |||
is the root of the moved file system and that the reason that the | moved file system and that the reason that the look up of "path" | |||
lookup of "path" succeeded is that the file system was not absent on | succeeded is that the file system was not absent on that operation | |||
that operation but was moved between the last LOOKUP and the GETFH | but was moved between the last LOOKUP and the GETFH (since COMPOUND | |||
(since COMPOUND is not atomic). Even if we had the fsids for all of | is not atomic). Even if we had the fsids for all of the intermediate | |||
the intermediate directories, we could have no way of knowing that | directories, we could have no way of knowing that /this/is/the/path | |||
/this/is/the/path was the root of a new file system, since we don't | was the root of a new file system, since we don't yet have its fsid. | |||
yet have its fsid. | ||||
In order to get the necessary information, let us re-send the chain | In order to get the necessary information, let us re-send the chain | |||
of LOOKUPs with GETFHs and GETATTRs to at least get the fsids so we | of LOOKUPs with GETFHs and GETATTRs to at least get the fsids so we | |||
can be sure where the appropriate file system boundaries are. The | can be sure where the appropriate file system boundaries are. The | |||
client could choose to get fs_locations_info at the same time but in | client could choose to get fs_locations_info at the same time but in | |||
most cases the client will have a good guess as to where file system | most cases the client will have a good guess as to where file system | |||
boundaries are (because of where and where not NFS4ERR_MOVED was | boundaries are (because of where NFS4ERR_MOVED was, and was not, | |||
received) making fetching of fs_locations_info unnecessary. | received) making fetching of fs_locations_info unnecessary. | |||
OP01: PUTROOTFH --> NFS_OK | OP01: PUTROOTFH --> NFS_OK | |||
- Current fh is root of pseudo-fs. | - Current fh is root of pseudo-fs. | |||
OP02: GETATTR(fsid) --> NFS_OK | OP02: GETATTR(fsid) --> NFS_OK | |||
- Just for completeness. Normally, clients will know the fsid of | - Just for completeness. Normally, clients will know the fsid of | |||
the pseudo-fs as soon as they establish communication with a | the pseudo-fs as soon as they establish communication with a | |||
skipping to change at page 254, line 14 | skipping to change at page 254, line 14 | |||
OP11: GETFH --> NFS_OK | OP11: GETFH --> NFS_OK | |||
- Current fh is for /this/is/the and is within pseudo-fs. | - Current fh is for /this/is/the and is within pseudo-fs. | |||
OP12: LOOKUP "path" --> NFS_OK | OP12: LOOKUP "path" --> NFS_OK | |||
- Current fh is for /this/is/the/path and is within a new, absent | - Current fh is for /this/is/the/path and is within a new, absent | |||
file system, but ... | file system, but ... | |||
- The client will never see the value of that fh | - The client will never see the value of that fh. | |||
OP13: GETATTR(fsid, fs_locations_info) --> NFS_OK | OP13: GETATTR(fsid, fs_locations_info) --> NFS_OK | |||
- We are getting the fsid to know where the file system boundaries | - We are getting the fsid to know where the file system boundaries | |||
are. In this operation the fsid will be different than that of | are. In this operation, the fsid will be different than that of | |||
the parent directory (which in turn was retrieved in OP10). Note | the parent directory (which in turn was retrieved in OP10). Note | |||
that the fsid we are given will not necessarily be preserved at | that the fsid we are given will not necessarily be preserved at | |||
the new location. That fsid might be different and in fact the | the new location. That fsid might be different, and in fact the | |||
fsid we have for this file system might be a valid fsid of a | fsid we have for this file system might be a valid fsid of a | |||
different file system on that new server. | different file system on that new server. | |||
- In this particular case, we are pretty sure anyway that what has | - In this particular case, we are pretty sure anyway that what has | |||
moved is /this/is/the/path rather than /this/is/the since we have | moved is /this/is/the/path rather than /this/is/the since we have | |||
the fsid of the latter and it is that of the pseudo-fs, which | the fsid of the latter and it is that of the pseudo-fs, which | |||
presumably cannot move. However, in other examples, we might not | presumably cannot move. However, in other examples, we might not | |||
have this kind of information to rely on (e.g. /this/is/the might | have this kind of information to rely on (e.g., /this/is/the might | |||
be a non-pseudo file system separate from /this/is/the/path), so | be a non-pseudo file system separate from /this/is/the/path), so | |||
we need to have another reliable source information on the | we need to have other reliable source information on the boundary | |||
boundary of the file system which is moved. If, for example, the | of the file system that is moved. If, for example, the file | |||
file system "/this/is" had moved we would have a case of migration | system /this/is had moved, we would have a case of migration | |||
rather than referral and once the boundaries of the migrated file | rather than referral, and once the boundaries of the migrated file | |||
system were clear we could fetch fs_locations_info. | system were clear we could fetch fs_locations_info. | |||
- We are fetching fs_locations_info because the fact that we got an | - We are fetching fs_locations_info because the fact that we got an | |||
NFS4ERR_MOVED at this point means that it most likely that this is | NFS4ERR_MOVED at this point means that it is most likely that this | |||
a referral and we need the destination. Even if it is the case | is a referral and we need the destination. Even if it is the case | |||
that "/this/is/the" is a file system which has migrated, we will | that /this/is/the is a file system that has migrated, we will | |||
still need the location information for that file system. | still need the location information for that file system. | |||
OP14: GETFH --> NFS4ERR_MOVED | OP14: GETFH --> NFS4ERR_MOVED | |||
- Fails because current fh is in an absent file system at the start | - Fails because current fh is in an absent file system at the start | |||
of the operation and the spec makes no exception for GETFH. Note | of the operation, and the specification makes no exception for | |||
that this means the server will never send the client a filehandle | GETFH. Note that this means the server will never send the client | |||
from within an absent file system. | a filehandle from within an absent file system. | |||
Given the above, the client knows where the root of the absent file | Given the above, the client knows where the root of the absent file | |||
system is (/this/is/the/path), by noting where the change of fsid | system is (/this/is/the/path) by noting where the change of fsid | |||
occurred (between "the" and "path"). The fs_locations_info attribute | occurred (between "the" and "path"). The fs_locations_info attribute | |||
also gives the client the actual location of the absent file system, | also gives the client the actual location of the absent file system, | |||
so that the referral can proceed. The server gives the client the | so that the referral can proceed. The server gives the client the | |||
bare minimum of information about the absent file system so that | bare minimum of information about the absent file system so that | |||
there will be very little scope for problems of conflict between | there will be very little scope for problems of conflict between | |||
information sent by the referring server and information of the file | information sent by the referring server and information of the file | |||
system's home. No filehandles and very few attributes are present on | system's home. No filehandles and very few attributes are present on | |||
the referring server and the client can treat those it receives as | the referring server, and the client can treat those it receives as | |||
basically transient information with the function of enabling the | transient information with the function of enabling the referral. | |||
referral. | ||||
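The boundary detection described above (noting where the fsid changes along the path) can be sketched as follows. The function name absent_fs_root, the (path, fsid) pair encoding, and the fsid values are assumptions made for illustration; a real client would take the fsids from the GETATTR results of the re-sent COMPOUND.

```python
def absent_fs_root(walk):
    """Return the deepest path at which the fsid differs from its parent's.

    walk is a list of (path, fsid) pairs in lookup order, starting at the
    server root. Returns None if no fsid change was observed.
    """
    root = None
    for (_, parent_fsid), (path, fsid) in zip(walk, walk[1:]):
        if fsid != parent_fsid:
            root = path  # deepest file system boundary seen so far
    return root

# Hypothetical fsid values: (0, 0) stands for the pseudo-fs.
walk = [("/", (0, 0)),
        ("/this", (0, 0)),
        ("/this/is", (0, 0)),
        ("/this/is/the", (0, 0)),
        ("/this/is/the/path", (17, 4))]
boundary = absent_fs_root(walk)
```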
11.8.2. Referral Example (READDIR) | 11.8.2. Referral Example (READDIR) | |||
Another context in which a client may encounter referrals is when it | Another context in which a client may encounter referrals is when it | |||
does a READDIR on directory in which some of the sub-directories are | does a READDIR on a directory in which some of the sub-directories | |||
the roots of absent file systems. | are the roots of absent file systems. | |||
Suppose such a directory is read as follows: | Suppose such a directory is read as follows: | |||
o PUTROOTFH | o PUTROOTFH | |||
o LOOKUP "this" | o LOOKUP "this" | |||
o LOOKUP "is" | o LOOKUP "is" | |||
o LOOKUP "the" | o LOOKUP "the" | |||
o READDIR (fsid, size, time_modify, mounted_on_fileid) | o READDIR (fsid, size, time_modify, mounted_on_fileid) | |||
In this case, because rdattr_error is not requested, | In this case, because rdattr_error is not requested, | |||
fs_locations_info is not requested, and some of attributes cannot be | fs_locations_info is not requested, and some of the attributes cannot | |||
provided, the result will be an NFS4ERR_MOVED error on the READDIR, | be provided, the result will be an NFS4ERR_MOVED error on the | |||
with the detailed results as follows: | READDIR, with the detailed results as follows: | |||
o PUTROOTFH --> NFS_OK. The current fh is at the root of the | o PUTROOTFH --> NFS_OK. The current fh is at the root of the | |||
pseudo-fs. | pseudo-fs. | |||
o LOOKUP "this" --> NFS_OK. The current fh is for /this and is | o LOOKUP "this" --> NFS_OK. The current fh is for /this and is | |||
within the pseudo-fs. | within the pseudo-fs. | |||
o LOOKUP "is" --> NFS_OK. The current fh is for /this/is and is | o LOOKUP "is" --> NFS_OK. The current fh is for /this/is and is | |||
within the pseudo-fs. | within the pseudo-fs. | |||
o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and | o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and | |||
is within the pseudo-fs. | is within the pseudo-fs. | |||
o READDIR (fsid, size, time_modify, mounted_on_fileid) --> | o READDIR (fsid, size, time_modify, mounted_on_fileid) --> | |||
NFS4ERR_MOVED. Note that the same error would have been returned | NFS4ERR_MOVED. Note that the same error would have been returned | |||
if /this/is/the had migrated, when in fact it is because the | if /this/is/the had migrated, but it is returned because the | |||
directory contains the root of an absent file system. | directory contains the root of an absent file system. | |||
So now suppose that we re-send with rdattr_error: | So now suppose that we re-send with rdattr_error: | |||
o PUTROOTFH | o PUTROOTFH | |||
o LOOKUP "this" | o LOOKUP "this" | |||
o LOOKUP "is" | o LOOKUP "is" | |||
skipping to change at page 257, line 23 | skipping to change at page 257, line 23 | |||
within the pseudo-fs. | within the pseudo-fs. | |||
o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and | o LOOKUP "the" --> NFS_OK. The current fh is for /this/is/the and | |||
is within the pseudo-fs. | is within the pseudo-fs. | |||
o READDIR (rdattr_error, fs_locations_info, mounted_on_fileid, fsid, | o READDIR (rdattr_error, fs_locations_info, mounted_on_fileid, fsid, | |||
size, time_modify) --> NFS_OK. The attributes will be as shown | size, time_modify) --> NFS_OK. The attributes will be as shown | |||
below. | below. | |||
The attributes for the directory entry with the component named | The attributes for the directory entry with the component named | |||
"path" will only contain | "path" will only contain: | |||
o rdattr_error (value: NFS_OK) | o rdattr_error (value: NFS_OK) | |||
o fs_locations_info | o fs_locations_info | |||
o mounted_on_fileid (value: unique fileid within referring file | o mounted_on_fileid (value: unique fileid within referring file | |||
system) | system) | |||
o fsid (value: unique value within referring server) | o fsid (value: unique value within referring server) | |||
skipping to change at page 258, line 12 | skipping to change at page 258, line 12 | |||
pathname4 fs_root; | pathname4 fs_root; | |||
fs_location4 locations<>; | fs_location4 locations<>; | |||
}; | }; | |||
The fs_location4 data type is used to represent the location of a | The fs_location4 data type is used to represent the location of a | |||
file system by providing a server name and the path to the root of | file system by providing a server name and the path to the root of | |||
the file system within that server's namespace. When a set of | the file system within that server's namespace. When a set of | |||
servers have corresponding file systems at the same path within their | servers have corresponding file systems at the same path within their | |||
namespaces, an array of server names may be provided. An entry in | namespaces, an array of server names may be provided. An entry in | |||
the server array is a UTF-8 string and represents one of a | the server array is a UTF-8 string and represents one of a | |||
traditional DNS host name, IPv4 address, or IPv6 address, or a zero- | traditional DNS host name, IPv4 address, IPv6 address, or a zero- | |||
length string. An IPv4 or IPv6 address is represented as a universal | length string. An IPv4 or IPv6 address is represented as a universal | |||
address (see Section 3.3.9 and [15]), minus the netid, and either | address (see Section 3.3.9 and [15]), minus the netid, and either | |||
with or without the trailing ".p1.p2" suffix that represents the port | with or without the trailing ".p1.p2" suffix that represents the port | |||
number. If the suffix is omitted, then the default port, 2049, | number. If the suffix is omitted, then the default port, 2049, | |||
SHOULD be assumed. A zero-length string SHOULD be used to indicate | SHOULD be assumed. A zero-length string SHOULD be used to indicate | |||
the current address being used for the RPC call. It is not a | the current address being used for the RPC call. It is not a | |||
requirement that all servers that share the same rootpath be listed | requirement that all servers that share the same rootpath be listed | |||
in one fs_location4 instance. The array of server names is provided | in one fs_location4 instance. The array of server names is provided | |||
for convenience. Servers that share the same rootpath may also be | for convenience. Servers that share the same rootpath may also be | |||
listed in separate fs_location4 entries in the fs_locations | listed in separate fs_location4 entries in the fs_locations | |||
attribute. | attribute. | |||
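As a rough illustration of the server entry forms described above, the following sketch classifies one entry and derives a port. The function name parse_server_entry and the returned (kind, host, port) tuple are hypothetical. Applying port 2049 to DNS host names is an assumption of this sketch, since the text states the default only for addresses without the ".p1.p2" suffix. The port is computed as p1 * 256 + p2, following the universal address convention.

```python
import ipaddress

DEFAULT_NFS_PORT = 2049

def _port_from(p1, p2):
    hi, lo = int(p1), int(p2)
    if not (0 <= hi <= 255 and 0 <= lo <= 255):
        raise ValueError("port octets out of range")
    return hi * 256 + lo

def parse_server_entry(entry):
    """Classify one server entry as (kind, host, port)."""
    if entry == "":
        # Zero-length string: the address currently used for the RPC call.
        return ("current", None, None)
    if ":" in entry:
        try:
            # IPv6 address without a port suffix.
            addr = ipaddress.IPv6Address(entry)
            return ("ipv6", str(addr), DEFAULT_NFS_PORT)
        except ValueError:
            pass
        parts = entry.rsplit(".", 2)
        if len(parts) == 3 and parts[1].isdigit() and parts[2].isdigit():
            addr = ipaddress.IPv6Address(parts[0])  # ValueError if invalid
            return ("ipv6", str(addr), _port_from(parts[1], parts[2]))
        raise ValueError("unrecognized server entry: %r" % entry)
    parts = entry.split(".")
    if len(parts) in (4, 6) and all(p.isdigit() for p in parts):
        addr = ipaddress.IPv4Address(".".join(parts[:4]))
        if len(parts) == 6:
            port = _port_from(parts[4], parts[5])
        else:
            port = DEFAULT_NFS_PORT
        return ("ipv4", str(addr), port)
    # Anything else is treated as a DNS host name (assumed default port).
    return ("dns", entry, DEFAULT_NFS_PORT)
```

For example, "192.0.2.10.8.1" and "192.0.2.10" both resolve to port 2049 in this sketch, the first through the explicit suffix (8 * 256 + 1) and the second through the default.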
The fs_locations4 data type and fs_locations attribute contain an | The fs_locations4 data type and fs_locations attribute contain an | |||
array of such locations. Since the namespace of each server may be | array of such locations. Since the namespace of each server may be | |||
constructed differently, the "fs_root" field is provided. The path | constructed differently, the "fs_root" field is provided. The path | |||
represented by fs_root represents the location of the file system in | represented by fs_root represents the location of the file system in | |||
the current server's namespace, i.e. that of the server from which | the current server's namespace, i.e., that of the server from which | |||
the fs_locations attribute was obtained. The fs_root path is meant | the fs_locations attribute was obtained. The fs_root path is meant | |||
to aid the client by clearly referencing the root of the file system | to aid the client by clearly referencing the root of the file system | |||
whose locations are being reported, no matter what object within the | whose locations are being reported, no matter what object within the | |||
current file system the current filehandle designates. The fs_root | current file system the current filehandle designates. The fs_root | |||
is simply the pathname the client used to reach the object on the | is simply the pathname the client used to reach the object on the | |||
current server, the object being that the fs_locations attribute | current server (i.e., the object to which the fs_locations attribute | |||
applies to. | applies). | |||
When the fs_locations attribute is interrogated and there are no | When the fs_locations attribute is interrogated and there are no | |||
alternate file system locations, the server SHOULD return a zero- | alternate file system locations, the server SHOULD return a zero- | |||
length array of fs_location4 structures, together with a valid | length array of fs_location4 structures, together with a valid | |||
fs_root. | fs_root. | |||
As an example, suppose there is a replicated file system located at | As an example, suppose there is a replicated file system located at | |||
two servers (servA and servB). At servA, the file system is located | two servers (servA and servB). At servA, the file system is located | |||
at path "/a/b/c". At servB, the file system is located at path | at path /a/b/c. At servB, the file system is located at path /x/y/z. | |||
"/x/y/z". If the client were to obtain the fs_locations value for | If the client were to obtain the fs_locations value for the directory | |||
the directory at "/a/b/c/d", it might not necessarily know that the | at /a/b/c/d, it might not necessarily know that the file system's | |||
file system's root is located in servA's namespace at "/a/b/c". When | root is located in servA's namespace at /a/b/c. When the client | |||
the client switches to servB, it will need to determine that the | switches to servB, it will need to determine that the directory it | |||
directory it first referenced at servA is now represented by the path | first referenced at servA is now represented by the path /x/y/z/d on | |||
"/x/y/z/d" on servB. To facilitate this, the fs_locations attribute | servB. To facilitate this, the fs_locations attribute provided by | |||
provided by servA would have a fs_root value of "/a/b/c" and two | servA would have an fs_root value of /a/b/c and two entries in | |||
entries in fs_locations. One entry in fs_locations will be for | fs_locations. One entry in fs_locations will be for itself (servA) | |||
itself (servA) and the other will be for servB with a path of | and the other will be for servB with a path of /x/y/z. With this | |||
"/x/y/z". With this information, the client is able to substitute | information, the client is able to substitute /x/y/z for the /a/b/c | |||
"/x/y/z" for the "/a/b/c" at the beginning of its access path and | at the beginning of its access path and construct /x/y/z/d to use for | |||
construct "/x/y/z/d" to use for the new server. | the new server. | |||
Note that: there is no requirement that the number of components in | Note that there is no requirement that the number of components in | |||
each rootpath be the same; there is no relation between the number of | each rootpath be the same; there is no relation between the number of | |||
components in rootpath or fs_root; and the none of the components in | components in rootpath or fs_root, and none of the components in a | |||
each rootpath and fs_root have to be the same. In the above example, | rootpath and fs_root have to be the same. In the above example, we | |||
we could have had a third element in the locations array, with server | could have had a third element in the locations array, with server | |||
equal to "servC", and rootpath equal to "/I/II", and a fourth element | equal to "servC" and rootpath equal to "/I/II", and a fourth element | |||
in locations with server equal to "servD", and rootpath equal to | in locations with server equal to "servD" and rootpath equal to | |||
"/aleph/beth/gimel/daleth/he". | "/aleph/beth/gimel/daleth/he". | |||
The relationship between fs_root and a rootpath is that the client | The relationship between fs_root and a rootpath is that the client | |||
replaces the pathname indicated in fs_root for the current server with | replaces the pathname indicated in fs_root for the current server with | |||
the substitute indicated in rootpath for the new server. | the substitute indicated in rootpath for the new server. | |||
For an example for a referred or migrated file system, suppose there | For an example of a referred or migrated file system, suppose there | |||
is a file system located at serv1. At serv1, the file system is | is a file system located at serv1. At serv1, the file system is | |||
located at "/az/buky/vedi/glagoli". The client finds that object at | located at /az/buky/vedi/glagoli. The client finds that object at | |||
"glagoli" has migrated (or is a referral). The client gets the | glagoli has migrated (or is a referral). The client gets the | |||
fs_locations attribute, which contains an fs_root of "/az/buky/vedi/ | fs_locations attribute, which contains an fs_root of /az/buky/vedi/ | |||
glagoli", and one element in the locations array, with server equal | glagoli, and one element in the locations array, with server equal to | |||
to "serv2", and rootpath equal to "/izhitsa/fita". The client | serv2, and rootpath equal to /izhitsa/fita. The client replaces /az/ | |||
replaces "/az/buky/vedi/glagoli" with "/izhitsa/fita", and uses the | buky/vedi/glagoli with /izhitsa/fita, and uses the latter pathname on | |||
latter pathname on "serv2". | serv2. | |||
Thus, the server MUST return an fs_root that is equal to the path the | Thus, the server MUST return an fs_root that is equal to the path the | |||
client used to reach the object the fs_locations attribute applies | client used to reach the object to which the fs_locations attribute | |||
to. Otherwise the client cannot determine the new path to use on the | applies. Otherwise, the client cannot determine the new path to use | |||
new server. | on the new server. | |||
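The fs_root to rootpath substitution used in both examples above can be sketched as below. The helper names split_path and substitute_root are hypothetical, and pathnames are written as slash-separated strings purely for readability (on the wire, a pathname4 is an array of components).

```python
def split_path(path):
    """Split a slash-separated path into its non-empty components."""
    return [c for c in path.split("/") if c]

def substitute_root(access_path, fs_root, rootpath):
    """Replace the leading fs_root of access_path with rootpath."""
    access = split_path(access_path)
    old_root = split_path(fs_root)
    new_root = split_path(rootpath)
    if access[:len(old_root)] != old_root:
        # Without a matching fs_root the client cannot derive the new path.
        raise ValueError("access path does not start with fs_root")
    return "/" + "/".join(new_root + access[len(old_root):])

replica_path = substitute_root("/a/b/c/d", "/a/b/c", "/x/y/z")
referral_path = substitute_root("/az/buky/vedi/glagoli",
                                "/az/buky/vedi/glagoli",
                                "/izhitsa/fita")
```

The ValueError branch corresponds to the case the MUST above is meant to rule out: if fs_root does not match the path the client used, no substitution is possible.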
Since the fs_locations attribute lacks information defining various | Since the fs_locations attribute lacks information defining various | |||
attributes of the various file system choices presented, it SHOULD | attributes of the various file system choices presented, it SHOULD | |||
only be interrogated and used when fs_locations_info is not | only be interrogated and used when fs_locations_info is not | |||
available. When fs_locations is used, information about the specific | available. When fs_locations is used, information about the specific | |||
locations should be assumed based on the following rules. | locations should be assumed based on the following rules. | |||
The following rules are general and apply irrespective of the | The following rules are general and apply irrespective of the | |||
context. | context. | |||
o All listed file system instances should be considered as of the | o All listed file system instances should be considered as of the | |||
same _handle_ class, if and only if, the current fh_expire_type | same handle class, if and only if, the current fh_expire_type | |||
attribute does not include the FH4_VOL_MIGRATION bit. Note that | attribute does not include the FH4_VOL_MIGRATION bit. Note that | |||
in the case of referral, filehandle issues do not apply since | in the case of referral, filehandle issues do not apply since | |||
there can be no filehandles known within the current file system | there can be no filehandles known within the current file system, | |||
nor is there any access to the fh_expire_type attribute on the | nor is there any access to the fh_expire_type attribute on the | |||
referring (absent) file system. | referring (absent) file system. | |||
o All listed file system instances should be considered as of the | o All listed file system instances should be considered as of the | |||
same _fileid_ class, if and only if, the fh_expire_type attribute | same fileid class if and only if the fh_expire_type attribute | |||
indicates persistent filehandles and does not include the | indicates persistent filehandles and does not include the | |||
FH4_VOL_MIGRATION bit. Note that in the case of referral, fileid | FH4_VOL_MIGRATION bit. Note that in the case of referral, fileid | |||
issues do not apply since there can be no fileids known within the | issues do not apply since there can be no fileids known within the | |||
referring (absent) file system nor is there any access to the | referring (absent) file system, nor is there any access to the | |||
fh_expire_type attribute. | fh_expire_type attribute. | |||
o All file system instances should be considered as of | o All file system instances should be considered as of | |||
different _change_ classes. | different change classes. | |||
For other class assignments, handling of file system transitions | For other class assignments, handling of file system transitions | |||
depends on the reasons for the transition: | depends on the reasons for the transition: | |||
o When the transition is due to migration, that is the client was | o When the transition is due to migration, that is, the client was | |||
directed to new file system after receiving an NFS4ERR_MOVED | directed to a new file system after receiving an NFS4ERR_MOVED | |||
error, the target should be treated as being of the same _write- | error, the target should be treated as being of the same write- | |||
verifier_ class as the source. | verifier class as the source. | |||
o When the transition is due to failover to another replica, that | o When the transition is due to failover to another replica, that | |||
is, the client selected another replica without receiving and | is, the client selected another replica without receiving an | |||
NFS4ERR_MOVED error, the target should be treated as being of a | NFS4ERR_MOVED error, the target should be treated as being of a | |||
different _write-verifier_ class from the source. | different write-verifier class from the source. | |||
The specific choices reflect typical implementation patterns for | The specific choices reflect typical implementation patterns for | |||
failover and controlled migration respectively. Since other choices | failover and controlled migration, respectively. Since other choices | |||
are possible and useful, this information is better obtained by using | are possible and useful, this information is better obtained by using | |||
fs_locations_info. When a server implementation needs to communicate | fs_locations_info. When a server implementation needs to communicate | |||
other choices, it MUST support the fs_locations_info attribute. | other choices, it MUST support the fs_locations_info attribute. | |||
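The default assumptions listed above for clients that only have fs_locations can be summarized in a small sketch. The function name assumed_classes and its result keys are hypothetical. How a client decides that filehandles are persistent is left to the caller here (the persistent_handles parameter), since the rules above do not spell out that test. FH4_VOL_MIGRATION uses the value of the fh_expire_type bit defined by the protocol.

```python
FH4_VOL_MIGRATION = 0x00000004

def assumed_classes(fh_expire_type, persistent_handles, got_moved_error):
    """Default class assumptions when only fs_locations is available.

    got_moved_error is True for migration (the client was directed to the
    new location by NFS4ERR_MOVED) and False for failover to a replica.
    """
    no_vol_migration = (fh_expire_type & FH4_VOL_MIGRATION) == 0
    return {
        "same_handle_class": no_vol_migration,
        "same_fileid_class": persistent_handles and no_vol_migration,
        "same_change_class": False,  # always treated as different
        "same_write_verifier_class": got_moved_error,
    }

migration = assumed_classes(0, persistent_handles=True, got_moved_error=True)
failover = assumed_classes(FH4_VOL_MIGRATION, persistent_handles=False,
                           got_moved_error=False)
```

In the referral case, the handle and fileid entries are moot, since no filehandles or fileids are known within the absent file system.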
See Section 21 for a discussion on the recommendations for the | See Section 21 for a discussion on the recommendations for the | |||
security flavor to be used by any GETATTR operation that requests the | security flavor to be used by any GETATTR operation that requests the | |||
"fs_locations" attribute. | "fs_locations" attribute. | |||
11.10. The Attribute fs_locations_info | 11.10. The Attribute fs_locations_info | |||
The fs_locations_info attribute is intended as a more functional | The fs_locations_info attribute is intended as a more functional | |||
replacement for fs_locations which will continue to exist and be | replacement for fs_locations that will continue to exist and be | |||
supported. Clients can use it to get a more complete set of | supported. Clients can use it to get a more complete set of | |||
information about alternative file system locations. When the server | information about alternative file system locations. When the server | |||
does not support fs_locations_info, fs_locations can be used to get a | does not support fs_locations_info, fs_locations can be used to get a | |||
subset of the information. A server which supports fs_locations_info | subset of the information. A server that supports fs_locations_info | |||
MUST support fs_locations as well. | MUST support fs_locations as well. | |||
There is additional information present in fs_locations_info, that is | There is additional information present in fs_locations_info, that is | |||
not available in fs_locations: | not available in fs_locations: | |||
o Attribute continuity information to allow a client to select a | o Attribute continuity information. This information will allow a | |||
location which meets the transparency requirements of the | client to select a location that meets the transparency | |||
applications accessing the data and to take advantage of | requirements of the applications accessing the data and to | |||
optimizations that server guarantees as to attribute continuity | leverage optimizations due to the server guarantees of attribute | |||
may provide (e.g. change attribute). | continuity (e.g., if between multiple server locations the change | |||
attribute of a file of the file system is continuous, the client | ||||
does not have to invalidate the file's cache if the change | ||||
attribute is the same among all locations). | ||||
o File System identity information which indicates when multiple | o File system identity information that indicates when multiple | |||
replicas, from the client's point of view, correspond to the same | replicas, from the client's point of view, correspond to the same | |||
target file system, allowing them to be used interchangeably, | target file system, allowing them to be used interchangeably, | |||
without disruption, as multiple paths to the same thing. | without disruption, as multiple paths to the same thing. | |||
o Information which will bear on the suitability of various | o Information that will bear on the suitability of various replicas, | |||
replicas, depending on the use that the client intends. For | depending on the use that the client intends. For example, many | |||
example, many applications need an absolutely up-to-date copy | applications need an absolutely up-to-date copy (e.g., those that | |||
(e.g. those that write), while others may only need access to the | write), while others may only need access to the most up-to-date | |||
most up-to-date copy reasonably available. | copy reasonably available. | |||
o Server-derived preference information for replicas, which can be | o Server-derived preference information for replicas, which can be | |||
used to implement load-balancing while giving the client the | used to implement load-balancing while giving the client the | |||
entire file system list to be used in case the primary fails. | entire file system list to be used in case the primary fails. | |||
The fs_locations_info attribute is structured similarly to the | The fs_locations_info attribute is structured similarly to the | |||
fs_locations attribute. A top-level structure (fs_locations_info4) | fs_locations attribute. A top-level structure (fs_locations_info4) | |||
contains the entire attribute including the root pathname of the file | contains the entire attribute including the root pathname of the file | |||
system and an array of lower-level structures that define replicas | system and an array of lower-level structures that define replicas | |||
that share a common root path on their respective servers. The | that share a common rootpath on their respective servers. The lower- | |||
lower-level structure in turn (fs_locations_item4) contains a | level structure in turn (fs_locations_item4) contains a specific | |||
specific pathname and information on one or more individual server | pathname and information on one or more individual server replicas. | |||
replicas. For that last lowest-level fs_locations_info has a | For that last lowest-level, fs_locations_info has an | |||
fs_locations_server4 structure that contains per-server-replica | fs_locations_server4 structure that contains per-server-replica | |||
information in addition to the server name. This per-server-replica | information in addition to the server name. This per-server-replica | |||
information includes a nominally opaque array, fls_info, in which | information includes a nominally opaque array, fls_info, in which | |||
specific pieces of information are located at the specific indices | specific pieces of information are located at the specific indices | |||
listed below. | listed below. | |||
The attribute will always contains at least a single | The attribute will always contain at least a single | |||
fs_locations_server4 entry. Typically, this will be an entry with the | fs_locations_server4 entry. Typically, this will be an entry with the | |||
FSLI4GF_CUR_REQ flag set, although in the case of a referral there | FSLI4GF_CUR_REQ flag set, although in the case of a referral there | |||
will be no entry with that flag set. | will be no entry with that flag set. | |||
It should be noted that fs_locations_info attributes returned by | It should be noted that fs_locations_info attributes returned by | |||
servers for various replicas may differ for various reasons. One | servers for various replicas may differ for various reasons. One | |||
server may know about a set of replicas that are not know to other | server may know about a set of replicas that are not known to other | |||
servers. Further, compatibility attributes may differ. Filehandles | servers. Further, compatibility attributes may differ. Filehandles | |||
might be of the same class going from replica A to replica B but not | might be of the same class going from replica A to replica B but not | |||
going in the reverse direction. This might happen because the | going in the reverse direction. This might happen because the | |||
filehandles are the same but replica B's server implementation might | filehandles are the same, but replica B's server implementation might | |||
not have provision to note and report that equivalence. | not have provision to note and report that equivalence. | |||
The fs_locations_info attribute consists of a root pathname | The fs_locations_info attribute consists of a root pathname | |||
(fli_fs_root, just like fs_root in the fs_locations attribute), | (fli_fs_root, just like fs_root in the fs_locations attribute), | |||
together with an array of fs_locations_item4 structures. The | together with an array of fs_locations_item4 structures. The | |||
fs_locations_item4 structures in turn consist of a root pathname | fs_locations_item4 structures in turn consist of a root pathname | |||
(fli_rootpath) together with an array (fli_entries) of elements of | (fli_rootpath) together with an array (fli_entries) of elements of | |||
data type fs_locations_server4, all defined as follows. | data type fs_locations_server4, all defined as follows. | |||
/* | /* | |||
skipping to change at page 263, line 40 | skipping to change at page 263, line 44 | |||
/* | /* | |||
* Flag bits in fli_flags. | * Flag bits in fli_flags. | |||
*/ | */ | |||
const FSLI4IF_VAR_SUB = 0x00000001; | const FSLI4IF_VAR_SUB = 0x00000001; | |||
typedef fs_locations_info4 fattr4_fs_locations_info; | typedef fs_locations_info4 fattr4_fs_locations_info; | |||
As noted above, the fs_locations_info attribute, when supported, may | As noted above, the fs_locations_info attribute, when supported, may | |||
be requested of absent file systems without causing NFS4ERR_MOVED to | be requested of absent file systems without causing NFS4ERR_MOVED to | |||
be returned and it is generally expected that it will be available | be returned. It is generally expected that it will be available for | |||
for both present and absent file systems even if only a single | both present and absent file systems even if only a single | |||
fs_locations_server4 entry is present, designating the current | fs_locations_server4 entry is present, designating the current | |||
(present) file system, or two fs_locations_server4 entries | (present) file system, or two fs_locations_server4 entries | |||
designating the previous location of an absent file system (the one | designating the previous location of an absent file system (the one | |||
just referenced) and its successor location. Servers are strongly | just referenced) and its successor location. Servers are strongly | |||
urged to support this attribute on all file systems if they support | urged to support this attribute on all file systems if they support | |||
it on any file system. | it on any file system. | |||
The data presented in the fs_locations_info attribute may be obtained | The data presented in the fs_locations_info attribute may be obtained | |||
by the server in any number of ways, including specification by the | by the server in any number of ways, including specification by the | |||
administrator or by current protocols for transferring data among | administrator or by current protocols for transferring data among | |||
replicas and protocols not yet developed. NFSv4.1 only defines how | replicas and protocols not yet developed. NFSv4.1 only defines how | |||
this information is presented by the server to the client. | this information is presented by the server to the client. | |||
11.10.1. The fs_locations_server4 Structure | 11.10.1. The fs_locations_server4 Structure | |||
The fs_locations_server4 structure consists of the following items: | The fs_locations_server4 structure consists of the following items: | |||
o An indication of file system up-to-date-ness (fls_currency) in | o An indication of how up-to-date the file system is (fls_currency) | |||
seconds. This value is relative to the master copy. A negative | in seconds. This value is relative to the master copy. A | |||
value indicates that the server is unable to give any reasonably | negative value indicates that the server is unable to give any | |||
useful value here. A zero indicates that file system is the | reasonably useful value here. A value of zero indicates that the | |||
actual writable data or a reliably coherent and fully up-to-date | file system is the actual writable data or a reliably coherent and | |||
copy. Positive values indicate how out-of-date this copy can | fully up-to-date copy. Positive values indicate how out-of-date | |||
normally be before it is considered for update. Such a value is | this copy can normally be before it is considered for update. | |||
not a guarantee that such updates will always be performed on the | Such a value is not a guarantee that such updates will always be | |||
required schedule but instead serves as a hint about how far the | performed on the required schedule but instead serves as a hint | |||
copy of the data would be expected to be behind the most up-to- | about how far the copy of the data would be expected to be behind | |||
date copy. | the most up-to-date copy. | |||
o A counted array of one-byte values (fls_info) containing | o A counted array of one-byte values (fls_info) containing | |||
information about the particular file system instance. This data | information about the particular file system instance. This data | |||
includes general flags, transport capability flags, file system | includes general flags, transport capability flags, file system | |||
equivalence class information, and selection priority information. | equivalence class information, and selection priority information. | |||
The encoding will be discussed below. | The encoding will be discussed below. | |||
o The server string (fls_server). For the case of the replica | o The server string (fls_server). For the case of the replica | |||
currently being accessed (via GETATTR), a zero-length string MAY | currently being accessed (via GETATTR), a zero-length string MAY | |||
be used to indicate the current address being used for the RPC | be used to indicate the current address being used for the RPC | |||
skipping to change at page 264, line 43 | skipping to change at page 264, line 47 | |||
formatted the same way as an IPv4 or IPv6 address in the "server" | formatted the same way as an IPv4 or IPv6 address in the "server" | |||
field of the fs_location4 data type (see Section 11.9). | field of the fs_location4 data type (see Section 11.9). | |||
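The three ranges of fls_currency described above (negative, zero,
positive) can be sketched as a small classifier. This is an
illustration only; the function name and the returned labels are
hypothetical and are not defined by either document.

```python
def describe_currency(fls_currency: int) -> str:
    """Classify an fls_currency value (seconds relative to the master copy).

    The labels returned here are illustrative, not protocol constants.
    """
    if fls_currency < 0:
        # The server cannot give any reasonably useful value.
        return "unknown"
    if fls_currency == 0:
        # The actual writable data, or a coherent and fully up-to-date copy.
        return "current"
    # A hint, not a guarantee: how far behind this copy may normally be.
    return "may-lag"
```

A client that needs to write would typically look for an instance
classified as "current" here, while a read-mostly client might accept
a positive value.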
Data within the fls_info array is in the form of 8-bit data items | Data within the fls_info array is in the form of 8-bit data items | |||
with constants giving the offsets within the array of various values | with constants giving the offsets within the array of various values | |||
describing this particular file system instance. This style of | describing this particular file system instance. This style of | |||
definition was chosen, in preference to explicit XDR structure | definition was chosen, in preference to explicit XDR structure | |||
definitions for these values, for a number of reasons. | definitions for these values, for a number of reasons. | |||
o The kinds of data in the fls_info array, representing flags, file | o The kinds of data in the fls_info array, representing flags, file | |||
system classes and priorities among set of file systems | system classes, and priorities among sets of file systems | |||
representing the same data, are such that eight bits provides a | representing the same data, are such that 8 bits provide a quite | |||
quite acceptable range of values. Even where there might be more | acceptable range of values. Even where there might be more than | |||
than 256 such file system instances, having more than 256 distinct | 256 such file system instances, having more than 256 distinct | |||
classes or priorities is unlikely. | classes or priorities is unlikely. | |||
o Explicit definition of the various specific data items within XDR | o Explicit definition of the various specific data items within XDR | |||
would limit expandability in that any extension within a | would limit expandability in that any extension within a | |||
subsequent minor version would require yet another attribute, | subsequent minor version would require yet another attribute, | |||
leading to specification and implementation clumsiness. | leading to specification and implementation clumsiness. | |||
o Such explicit definitions would also make it impossible to propose | o Such explicit definitions would also make it impossible to propose | |||
standards-track extensions apart from a full minor version. | Standards Track extensions apart from a full minor version. | |||
This encoding scheme can be adapted to the specification of multi- | This encoding scheme can be adapted to the specification of multi- | |||
byte numeric values, even though none are currently defined. If | byte numeric values, even though none are currently defined. If | |||
extensions are made via standards-track RFC's, multi-byte quantities | extensions are made via Standards Track RFCs, multi-byte quantities | |||
will be encoded as a range of bytes with a range of indices with the | will be encoded as a range of bytes with a range of indices, with the | |||
byte interpreted in big endian byte order. Further any such index | byte interpreted in big-endian byte order. Further, any such index | |||
assignments are constrained so that the relevant quantities will not | assignments are constrained so that the relevant quantities will not | |||
cross XDR word boundaries. | cross XDR word boundaries. | |||
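As a sketch of the multi-byte rule above, the following reads a
big-endian quantity from a range of fls_info indices and rejects a
range that would cross a 4-byte XDR word boundary. No multi-byte
values are currently defined, so the function name, the index, and
the width used here are hypothetical.

```python
XDR_WORD = 4  # XDR encodes data in 4-byte units


def read_multibyte(fls_info: bytes, first_index: int, width: int) -> int:
    """Decode a hypothetical multi-byte fls_info value in big-endian order."""
    if width < 1 or first_index < 0:
        raise ValueError("invalid index range")
    last_index = first_index + width - 1
    # Index assignments must keep the quantity inside one XDR word.
    if first_index // XDR_WORD != last_index // XDR_WORD:
        raise ValueError("range would cross an XDR word boundary")
    if last_index >= len(fls_info):
        raise ValueError("fls_info array is too short")
    return int.from_bytes(fls_info[first_index:last_index + 1], "big")
```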
The set of fls_info data is subject to expansion in a future minor | The set of fls_info data is subject to expansion in a future minor | |||
version, or in a standard-track RFC, within the context of a single | version, or in a Standards Track RFC, within the context of a single | |||
minor version. The server SHOULD NOT send and the client MUST NOT | minor version. The server SHOULD NOT send and the client MUST NOT | |||
use indices within the fls_info array that are not defined in | use indices within the fls_info array that are not defined in | |||
standards-track RFC's. | Standards Track RFCs. | |||
The fls_info array contains within it: | The fls_info array contains: | |||
o Two 8-bit flag fields, one devoted to general file-system | o Two 8-bit flag fields, one devoted to general file-system | |||
characteristics and a second reserved for transport-related | characteristics and a second reserved for transport-related | |||
capabilities. | capabilities. | |||
o Six 8-bit class values which define various file system | o Six 8-bit class values that define various file system equivalence | |||
equivalence classes as explained below. | classes as explained below. | |||
o Four 8-bit priority values which govern file system selection as | o Four 8-bit priority values that govern file system selection as | |||
explained below. | explained below. | |||
The general file system characteristics flag (at byte index | The general file system characteristics flag (at byte index | |||
FSLI4BX_GFLAGS) has the following bits defined within it: | FSLI4BX_GFLAGS) has the following bits defined within it: | |||
o FSLI4GF_WRITABLE indicates that this file system target is | o FSLI4GF_WRITABLE indicates that this file system target is | |||
writable, allowing it to be selected by clients which may need to | writable, allowing it to be selected by clients that may need to | |||
write on this file system. When the current file system instance | write on this file system. When the current file system instance | |||
is writable, and is defined as of the same simultaneous use class | is writable and is defined as of the same simultaneous use class | |||
(as specified by the value at index FSLI4BX_CLSIMUL) to which the | (as specified by the value at index FSLI4BX_CLSIMUL) to which the | |||
client was previously writing, then it must incorporate within its | client was previously writing, then it must incorporate within its | |||
data any committed write made on the source file system instance. | data any committed write made on the source file system instance. | |||
See Section 11.7.8 which discusses the write-verifier class. | See Section 11.7.8, which discusses the write-verifier class. | |||
While there is no harm in not setting this flag for a file system | While there is no harm in not setting this flag for a file system | |||
that turns out to be writable, turning the flag on for read-only | that turns out to be writable, turning the flag on for a read-only | |||
file system can cause problems for clients which select a | file system can cause problems for clients that select a migration | |||
migration or replication target based on it and then find | or replication target based on the flag and then find themselves | |||
themselves unable to write. | unable to write. | |||
o FSLI4GF_CUR_REQ indicates that this replica is the one on which | o FSLI4GF_CUR_REQ indicates that this replica is the one on which | |||
the request is being made. Only a single server entry may have | the request is being made. Only a single server entry may have | |||
this flag set and in the case of a referral, no entry will have | this flag set and, in the case of a referral, no entry will have | |||
it. | it. | |||
o FSLI4GF_ABSENT indicates that this entry corresponds an absent | o FSLI4GF_ABSENT indicates that this entry corresponds to an absent | |||
file system replica. It can only be set if FSLI4GF_CUR_REQ is | file system replica. It can only be set if FSLI4GF_CUR_REQ is | |||
set. When both such bits are set it indicates that a file system | set. When both such bits are set, it indicates that a file system | |||
instance is not usable but that the information in the entry can | instance is not usable but that the information in the entry can | |||
be used to determine the sorts of continuity available when | be used to determine the sorts of continuity available when | |||
switching from this replica to other possible replicas. Since | switching from this replica to other possible replicas. Since | |||
this bit can only be true if FSLI4GF_CUR_REQ is true, the value | this bit can only be true if FSLI4GF_CUR_REQ is true, the value | |||
could be determined using the fs_status attribute but the | could be determined using the fs_status attribute, but the | |||
information is also made available here for the convenience of the | information is also made available here for the convenience of the | |||
client. An entry with this bit, since it represents a true file | client. An entry with this bit, since it represents a true file | |||
system (albeit absent), does not appear in the event of a | system (albeit absent), does not appear in the event of a | |||
referral, but only where a file system has been accessed at this | referral, but only when a file system has been accessed at this | |||
location and has subsequently been migrated. | location and has subsequently been migrated. | |||
o FSLI4GF_GOING indicates that a replica, while still available, | o FSLI4GF_GOING indicates that a replica, while still available, | |||
should not be used further. The client, if using it, should make | should not be used further. The client, if using it, should make | |||
an orderly transfer to another file system instance as | an orderly transfer to another file system instance as | |||
expeditiously as possible. It is expected that file systems going | expeditiously as possible. It is expected that file systems going | |||
out of service will be announced as FSLI4GF_GOING some time before | out of service will be announced as FSLI4GF_GOING some time before | |||
the actual loss of service and that the valid_for value will be | the actual loss of service. It is also expected that the | |||
sufficiently small to allow clients to detect and act on scheduled | fli_valid_for value will be sufficiently small to allow clients to | |||
events while large enough that the cost of the requests to fetch | detect and act on scheduled events, while large enough that the | |||
the fs_locations_info values will not be excessive. Values on the | cost of the requests to fetch the fs_locations_info values will | |||
order of ten minutes seem reasonable. | not be excessive. Values on the order of ten minutes seem | |||
reasonable. | ||||
When this flag is seen as part of a transition into a new file | When this flag is seen as part of a transition into a new file | |||
system, a client might choose to transfer immediately to another | system, a client might choose to transfer immediately to another | |||
replica, or it may reference the current file system and only | replica, or it may reference the current file system and only | |||
transition when a migration event occurs. Similarly, when this | transition when a migration event occurs. Similarly, when this | |||
flag appears as a replica in the referral, clients would likely to | flag appears as a replica in the referral, clients would likely | |||
avoid being referred to this instance whenever there is another | avoid being referred to this instance whenever there is another | |||
choice. | choice. | |||
o FSLI4GF_SPLIT indicates that when a transition occurs from the | o FSLI4GF_SPLIT indicates that when a transition occurs from the | |||
current file system instance to this one, the replacement may | current file system instance to this one, the replacement may | |||
consist of multiple file systems. In this case, the client has to | consist of multiple file systems. In this case, the client has to | |||
be prepared for the possibility that objects on the same file | be prepared for the possibility that objects on the same file | |||
system before migration will be on different ones after. Note | system before migration will be on different ones after. Note | |||
that FSLI4GF_SPLIT is not incompatible with the file systems | that FSLI4GF_SPLIT is not incompatible with the file systems | |||
belonging to the same _fileid_ class since, if one has a set of | belonging to the same fileid class since, if one has a set of | |||
fileids that are unique within a file system, each subset assigned | fileids that are unique within a file system, each subset assigned | |||
to a smaller file system after migration would not have any | to a smaller file system after migration would not have any | |||
conflicts internal to that file system. | conflicts internal to that file system. | |||
A client, in the case of a split file system, will interrogate | A client, in the case of a split file system, will interrogate | |||
existing files with which it has continuing connection (it is free | existing files with which it has continuing connection (it is free | |||
simply forget cached filehandles). If the client remembers the | to simply forget cached filehandles). If the client remembers the | |||
directory filehandle associated with each open file, it may | directory filehandle associated with each open file, it may | |||
proceed upward using LOOKUPP to find the new file system | proceed upward using LOOKUPP to find the new file system | |||
boundaries. Note that in the event of a referral, there will not | boundaries. Note that in the event of a referral, there will not | |||
be any such files and so these action will not be performed. | be any such files and so these actions will not be performed. | |||
Instead, a reference to a portion of the original file system now | Instead, a reference to a portion of the original file system now | |||
split off into other file systems will encounter an fsid change | split off into other file systems will encounter an fsid change | |||
and possibly a further referral. | and possibly a further referral. | |||
Once the client recognizes that one file system has been split | Once the client recognizes that one file system has been split | |||
into two, it can prevent the disruption of running applications by | into two, it can prevent the disruption of running applications by | |||
presenting the two file systems as a single one until a convenient | presenting the two file systems as a single one until a convenient | |||
point to recognize the transition, such as a restart. This would | point to recognize the transition, such as a restart. This would | |||
require a mapping from the server's fsids to fsids as seen by the | require a mapping from the server's fsids to fsids as seen by the | |||
client but this is already necessary for other reasons. As noted | client, but this is already necessary for other reasons. As noted | |||
above, existing fileids within the two descendant file systems | above, existing fileids within the two descendant file systems | |||
will not conflict. Providing non-conflicting fileids for newly- | will not conflict. Providing non-conflicting fileids for newly | |||
created files on the split file systems is the responsibility of | created files on the split file systems is the responsibility of | |||
the server (or servers working in concert). The server can encode | the server (or servers working in concert). The server can encode | |||
filehandles such that filehandles generated before the split event | filehandles such that filehandles generated before the split event | |||
can be discerned from those generated after the split, allowing | can be discerned from those generated after the split, allowing | |||
the server to determine when the need for emulating two file | the server to determine when the need for emulating two file | |||
systems as one is over. | systems as one is over. | |||
Although it is possible for this flag to be present in the event | Although it is possible for this flag to be present in the event | |||
of referral, it would generally be of little interest to the | of referral, it would generally be of little interest to the | |||
client, since the client is not expected to have information | client, since the client is not expected to have information | |||
skipping to change at page 267, line 50 | skipping to change at page 268, line 6 | |||
o FSLI4TF_RDMA indicates that this file system provides NFSv4.1 file | o FSLI4TF_RDMA indicates that this file system provides NFSv4.1 file | |||
system access using an RDMA-capable transport. | system access using an RDMA-capable transport. | |||
Attribute continuity and file system identity information are | Attribute continuity and file system identity information are | |||
expressed by defining equivalence relations on the sets of file | expressed by defining equivalence relations on the sets of file | |||
systems presented to the client. Each such relation is expressed as | systems presented to the client. Each such relation is expressed as | |||
a set of file system equivalence classes. For each relation, a file | a set of file system equivalence classes. For each relation, a file | |||
system has an 8-bit class number. Two file systems belong to the | system has an 8-bit class number. Two file systems belong to the | |||
same class if both have identical non-zero class numbers. Zero is | same class if both have identical non-zero class numbers. Zero is | |||
treated as non-matching. Most often, the relevant question for the | treated as non-matching. Most often, the relevant question for the | |||
client will be whether a given replica is identical-to/ | client will be whether a given replica is identical to / continuous | |||
continuous-with the current one in a given respect but the | with the current one in a given respect, but the information should | |||
information should be available also as to whether two other replicas | be available also as to whether two other replicas match in that | |||
match in that respect as well. | respect as well. | |||
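The matching rule above (identical non-zero class numbers match, zero
never matches) can be sketched as follows. The function name is
hypothetical, and the caller supplies the byte index (for example,
the value of FSLI4BX_CLSIMUL). Treating an index that lies beyond the
end of the array as zero is an assumption of this sketch, not
something the text specifies.

```python
def in_same_class(info_a: bytes, info_b: bytes, index: int) -> bool:
    """Return True if two fls_info arrays share a class at the given index."""
    # Assumption: a missing byte is treated like zero (non-matching).
    a = info_a[index] if index < len(info_a) else 0
    b = info_b[index] if index < len(info_b) else 0
    # Zero is treated as non-matching, even when both values are zero.
    return a != 0 and a == b
```

The same test applies to any pair of replicas, not only to the
current replica and a candidate.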
The following fields specify the file system's class numbers for the | The following fields specify the file system's class numbers for the | |||
equivalence relations used in determining the nature of file system | equivalence relations used in determining the nature of file system | |||
transitions. See Section 11.7 and its various subsections for | transitions. See Section 11.7 and its various subsections for | |||
details about how this information is to be used. Servers may assign | details about how this information is to be used. Servers may assign | |||
these values as they wish, so long as file system instances that | these values as they wish, so long as file system instances that | |||
share the same value have the specified relationship to one another, | share the same value have the specified relationship to one another; | |||
conversely file systems which have the specified relationship to one | conversely, file systems that have the specified relationship to one | |||
another share a common class value. As each instance entry is added, | another share a common class value. As each instance entry is added, | |||
the relationships of this instance to previously entered instances | the relationships of this instance to previously entered instances | |||
can be consulted and if one is found that bears the specified | can be consulted, and if one is found that bears the specified | |||
relationship, that entry's class value can be copied to the new | relationship, that entry's class value can be copied to the new | |||
entry. When no such previous entry exists, a new value for that byte | entry. When no such previous entry exists, a new value for that byte | |||
index, not previously used can be selected, most likely by | index (not previously used) can be selected, most likely by | |||
incrementing the value of the last class value assigned for that | incrementing the value of the last class value assigned for that | |||
index. | index. | |||
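The assignment procedure described above can be sketched for a single
byte index as follows. This is one possible server-side approach, not
a required one; the function name and the `related` predicate (which
stands for "bears the specified relationship") are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def assign_classes(instances: Sequence[T],
                   related: Callable[[T, T], bool]) -> List[int]:
    """Assign an 8-bit class number to each instance for one byte index."""
    classes: List[int] = []
    last_assigned = 0  # zero is never assigned, since it means "no match"
    for i, inst in enumerate(instances):
        for j in range(i):
            if related(instances[j], inst):
                # Copy the class value of the earlier, related entry.
                classes.append(classes[j])
                break
        else:
            # No related earlier entry: take the next unused value.
            last_assigned += 1
            if last_assigned > 255:
                raise OverflowError("more than 255 classes for one index")
            classes.append(last_assigned)
    return classes
```

Because the relation is an equivalence relation, comparing against
the first related earlier entry is sufficient.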
o The field with byte index FSLI4BX_CLSIMUL defines the | o The field with byte index FSLI4BX_CLSIMUL defines the | |||
simultaneous-use class for the file system. | simultaneous-use class for the file system. | |||
o The field with byte index FSLI4BX_CLHANDLE defines the handle | o The field with byte index FSLI4BX_CLHANDLE defines the handle | |||
class for the file system. | class for the file system. | |||
o The field with byte index FSLI4BX_CLFILEID defines the fileid | o The field with byte index FSLI4BX_CLFILEID defines the fileid | |||
skipping to change at page 268, line 48 | skipping to change at page 269, line 6 | |||
class for the file system. | class for the file system. | |||
Server-specified preference information is also provided via 8-bit | Server-specified preference information is also provided via 8-bit | |||
values within the fls_info array. The values provide a rank and an | values within the fls_info array. The values provide a rank and an | |||
order (see below) to be used with separate values specifiable for the | order (see below) to be used with separate values specifiable for the | |||
cases of read-only and writable file systems. These values are | cases of read-only and writable file systems. These values are | |||
compared for different file systems to establish the server-specified | compared for different file systems to establish the server-specified | |||
preference, with lower values indicating "more preferred". | preference, with lower values indicating "more preferred". | |||
Rank is used to express a strict server-imposed ordering on clients, | Rank is used to express a strict server-imposed ordering on clients, | |||
with lower values indicating "more preferred." Clients should | with lower values indicating "more preferred". Clients should | |||
attempt to use all replicas with a given rank before they use one | attempt to use all replicas with a given rank before they use one | |||
with a higher rank. Only if all of those file systems are | with a higher rank. Only if all of those file systems are | |||
unavailable should the client proceed to those of a higher rank. | unavailable should the client proceed to those of a higher rank. | |||
Because specifying a rank will override client preferences, servers | Because specifying a rank will override client preferences, servers | |||
should be conservative about using this mechanism, particularly when | should be conservative about using this mechanism, particularly when | |||
the environment is one in client communication characteristics are | the environment is one in which client communication characteristics | |||
not tightly controlled and visible to the server. | are neither tightly controlled nor visible to the server. | |||
Within a rank, the order value is used to specify the server's | Within a rank, the order value is used to specify the server's | |||
preference to guide the client's selection when the client's own | preference to guide the client's selection when the client's own | |||
preferences are not controlling, with lower values of order | preferences are not controlling, with lower values of order | |||
indicating "more preferred." If replicas are approximately equal in | indicating "more preferred". If replicas are approximately equal in | |||
all respects, clients should defer to the order specified by the | all respects, clients should defer to the order specified by the | |||
server. When clients look at server latency as part of their | server. When clients look at server latency as part of their | |||
selection, they are free to use this criterion, but it is suggested | selection, they are free to use this criterion, but it is suggested | |||
that when latency differences are not significant, the server- | that when latency differences are not significant, the server- | |||
specified order should guide selection. | specified order should guide selection. | |||
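
As an illustration of how rank and order combine, the sketch below picks a replica for a client that has no preferences of its own. The Replica tuple and its field names are hypothetical; a real client would take the values from the appropriate rank and order bytes of fls_info and would also weigh its own criteria, such as latency.

```python
from collections import namedtuple

# Hypothetical client-side view of one replica.
Replica = namedtuple("Replica", "name rank order available")

def pick_replica(replicas):
    """Return the most preferred available replica, or None.

    Lower rank always wins; within a rank, lower order wins.
    A higher rank is reached only when every replica of the
    lower ranks is unavailable.
    """
    candidates = [r for r in replicas if r.available]
    if not candidates:
        return None
    return min(candidates, key=lambda r: (r.rank, r.order))
```

Sorting on the pair (rank, order) keeps rank strict while letting order break ties only among replicas of equal rank.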
o The field at byte index FSLI4BX_READRANK gives the rank value to | o The field at byte index FSLI4BX_READRANK gives the rank value to | |||
be used for read-only access. | be used for read-only access. | |||
o The field at byte index FSLI4BX_READORDER gives the order value to | o The field at byte index FSLI4BX_READORDER gives the order value to | |||
skipping to change at page 269, line 43 | skipping to change at page 269, line 48 | |||
one of the pairs of rank and order values is used. The read rank and | one of the pairs of rank and order values is used. The read rank and | |||
order should only be used if the client knows that only reading will | order should only be used if the client knows that only reading will | |||
ever be done or if it is prepared to switch to a different replica in | ever be done or if it is prepared to switch to a different replica in | |||
the event that any write access capability is required in the future. | the event that any write access capability is required in the future. | |||
11.10.2. The fs_locations_info4 Structure | 11.10.2. The fs_locations_info4 Structure | |||
The fs_locations_info4 structure, encoding the fs_locations_info | The fs_locations_info4 structure, encoding the fs_locations_info | |||
attribute, contains the following: | attribute, contains the following: | |||
o The fli_flags field which contains general flags that affect the | o The fli_flags field, which contains general flags that affect the | |||
interpretation of this fs_locations_info4 structure and all | interpretation of this fs_locations_info4 structure and all | |||
fs_locations_item4 structures within it. The only flag currently | fs_locations_item4 structures within it. The only flag currently | |||
defined is FSLI4IF_VAR_SUB. All bits in the fli_flags field which | defined is FSLI4IF_VAR_SUB. All bits in the fli_flags field that | |||
are not defined should always be returned as zero. | are not defined should always be returned as zero. | |||
o The fli_fs_root field which contains the pathname of the root of | o The fli_fs_root field, which contains the pathname of the root of | |||
the current file system on the current server, just as it does in | the current file system on the current server, just as it does in | |||
the fs_locations4 structure. | the fs_locations4 structure. | |||
o An array called fli_items of fs_locations_item4 structures, which | o An array called fli_items of fs_locations_item4 structures, which | |||
contain information about replicas of the current file system. | contain information about replicas of the current file system. | |||
Where the current file system is actually present, or has been | Where the current file system is actually present, or has been | |||
present, i.e. this is not a referral situation, one of the | present, i.e., this is not a referral situation, one of the | |||
fs_locations_item4 structures will contain an fs_locations_server4 | fs_locations_item4 structures will contain an fs_locations_server4 | |||
for the current server. This structure will have FSLI4GF_ABSENT | for the current server. This structure will have FSLI4GF_ABSENT | |||
set if the current file system is absent, i.e. normal access to it | set if the current file system is absent, i.e., normal access to | |||
will return NFS4ERR_MOVED. | it will return NFS4ERR_MOVED. | |||
o The fli_valid_for field specifies a time in seconds for which it | o The fli_valid_for field specifies a time in seconds for which it | |||
is reasonable for a client to use the fs_locations_info attribute | is reasonable for a client to use the fs_locations_info attribute | |||
without refetch. The fli_valid_for value does not provide a | without refetch. The fli_valid_for value does not provide a | |||
guarantee of validity since servers can unexpectedly go out of | guarantee of validity since servers can unexpectedly go out of | |||
service or become inaccessible for any number of reasons. Clients | service or become inaccessible for any number of reasons. Clients | |||
are well-advised to refetch this information for actively accessed | are well-advised to refetch this information for an actively | |||
file system at every fli_valid_for seconds. This is particularly | accessed file system at every fli_valid_for seconds. This is | |||
important when file system replicas may go out of service in a | particularly important when file system replicas may go out of | |||
controlled way using the FSLI4GF_GOING flag to communicate an | service in a controlled way using the FSLI4GF_GOING flag to | |||
ongoing change. The server should set fli_valid_for to a value | communicate an ongoing change. The server should set | |||
which allows well-behaved clients to notice the FSLI4GF_GOING flag | fli_valid_for to a value that allows well-behaved clients to | |||
and make an orderly switch before the loss of service becomes | notice the FSLI4GF_GOING flag and make an orderly switch before | |||
effective. If this value is zero, then no refetch interval is | the loss of service becomes effective. If this value is zero, | |||
appropriate and the client need not refetch this data on any | then no refetch interval is appropriate and the client need not | |||
particular schedule. In the event of a transition to a new file | refetch this data on any particular schedule. In the event of a | |||
system instance, a new value of the fs_locations_info attribute | transition to a new file system instance, a new value of the | |||
will be fetched at the destination and it is to be expected that | fs_locations_info attribute will be fetched at the destination. | |||
this may have a different valid_for value, which the client should | It is to be expected that this may have a different fli_valid_for | |||
then use, in the same fashion as the previous value. | value, which the client should then use in the same fashion as the | |||
previous value. | ||||
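
The handling of fli_valid_for can be sketched as below. The function name and the use of plain numeric timestamps are assumptions made for illustration; the only behavior taken from the text is that a value of zero implies no refetch schedule.

```python
def next_refetch_time(fetched_at, fli_valid_for):
    """Return the time at which fs_locations_info should be refetched.

    fetched_at: time (in seconds) at which the attribute was last fetched.
    fli_valid_for: value returned by the server, in seconds.
    Returns None when fli_valid_for is zero, meaning the client need
    not refetch on any particular schedule.
    """
    if fli_valid_for == 0:
        return None
    return fetched_at + fli_valid_for
```

After a transition to a new file system instance, the client would call this again with the fli_valid_for value fetched at the destination.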
The FSLI4IF_VAR_SUB flag within fli_flags controls whether variable | The FSLI4IF_VAR_SUB flag within fli_flags controls whether variable | |||
substitution is to be enabled. See Section 11.10.3 for an | substitution is to be enabled. See Section 11.10.3 for an | |||
explanation of variable substitution. | explanation of variable substitution. | |||
11.10.3. The fs_locations_item4 Structure | 11.10.3. The fs_locations_item4 Structure | |||
The fs_locations_item4 structure contains a pathname (in the field | The fs_locations_item4 structure contains a pathname (in the field | |||
fli_rootpath) which encodes the path of the target file system | fli_rootpath) that encodes the path of the target file system | |||
replicas on the set of servers designated by the included | replicas on the set of servers designated by the included | |||
fs_locations_server4 entries. The precise manner in which this | fs_locations_server4 entries. The precise manner in which this | |||
target location is specified depends on the value of the | target location is specified depends on the value of the | |||
FSLI4IF_VAR_SUB flag within the associated fs_locations_info4 | FSLI4IF_VAR_SUB flag within the associated fs_locations_info4 | |||
structure. | structure. | |||
If this flag is not set, then fli_rootpath simply designates the | If this flag is not set, then fli_rootpath simply designates the | |||
location of the target file system within each server's single-server | location of the target file system within each server's single-server | |||
namespace just as it does for the rootpath within the fs_location4 | namespace just as it does for the rootpath within the fs_location4 | |||
structure. When this bit is set, however, component entries of a | structure. When this bit is set, however, component entries of a | |||
certain form are subject to client-specific variable substitution so | certain form are subject to client-specific variable substitution so | |||
as to allow a degree of namespace non-uniformity in order to | as to allow a degree of namespace non-uniformity in order to | |||
accommodate the selection of client-specific file system targets to | accommodate the selection of client-specific file system targets to | |||
adapt to different client architectures or other characteristics. | adapt to different client architectures or other characteristics. | |||
When such substitution is in effect a variable beginning with the | When such substitution is in effect, a variable beginning with the | |||
string "${" and ending with the string "}" and containing a colon is | string "${" and ending with the string "}" and containing a colon is | |||
to be replaced by the client-specific value associated with that | to be replaced by the client-specific value associated with that | |||
variable. The string "unknown" should be used by the client when it | variable. The string "unknown" should be used by the client when it | |||
has no value for such a variable. The pathname resulting from such | has no value for such a variable. The pathname resulting from such | |||
substitutions is used to designate the target file system, so that | substitutions is used to designate the target file system, so that | |||
different clients may have different file systems, corresponding to | different clients may have different file systems, corresponding to | |||
that location in the multi-server namespace. | that location in the multi-server namespace. | |||
As mentioned above, such substituted pathname variables contain a | As mentioned above, such substituted pathname variables contain a | |||
colon. The part before the colon is to be a DNS domain name with the | colon. The part before the colon is to be a DNS domain name, and the | |||
part after being a case-insensitive alphanumeric string. | part after is to be a case-insensitive alphanumeric string. | |||
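
A minimal sketch of client-side variable substitution follows. It is not a definitive implementation: the function names, the dictionary of client values, and the exact pattern are assumptions. In particular, the pattern accepts underscores in the variable name so that names such as CPU_ARCH match, and both parts are compared case-insensitively by lowercasing them.

```python
import re

# Matches ${domain:name}; the character classes are an assumption.
_VAR = re.compile(r"\$\{([^:{}]+):([A-Za-z0-9_]+)\}")

def substitute_component(component, values):
    """Replace each variable in one pathname component.

    values: dict keyed by (domain, name), both lowercase.
    Variables with no known value become the string "unknown".
    """
    def repl(match):
        key = (match.group(1).lower(), match.group(2).lower())
        return values.get(key, "unknown")
    return _VAR.sub(repl, component)

def substitute_rootpath(components, var_sub, values):
    """Apply substitution to fli_rootpath only when FSLI4IF_VAR_SUB is set."""
    if not var_sub:
        return list(components)
    return [substitute_component(c, values) for c in components]
```

For example, a client whose values map ("ietf.org", "cpu_arch") to "x86_64" would turn the components export, ${ietf.org:CPU_ARCH}, bin into export, x86_64, bin.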
Where the domain is "ietf.org", only variable names defined in this | Where the domain is "ietf.org", only variable names defined in this | |||
document or subsequent standards-track RFC's are subject to such | document or subsequent Standards Track RFCs are subject to such | |||
substitution. Organizations are free to use their domain names to | substitution. Organizations are free to use their domain names to | |||
create their own sets of client-specific variables, to be subject to | create their own sets of client-specific variables, to be subject to | |||
such substitution. In case where such variables are intended to be | such substitution. In cases where such variables are intended to be | |||
used more broadly than a single organization, publication of an | used more broadly than a single organization, publication of an | |||
informational RFC defining such variables is RECOMMENDED. | Informational RFC defining such variables is RECOMMENDED. | |||
The variable ${ietf.org:CPU_ARCH} is used to denote the CPU | The variable ${ietf.org:CPU_ARCH} is used to denote the CPU | |||
architecture for which object files are compiled. This specification does not | architecture for which object files are compiled. This specification does not | |||
limit the acceptable values (except that they must be valid UTF-8 | limit the acceptable values (except that they must be valid UTF-8 | |||
strings) but such values as "x86", "x86_64" and "sparc" would be | strings), but such values as "x86", "x86_64", and "sparc" would be | |||
expected to be used in line with industry practice. | expected to be used in line with industry practice. | |||
The variable ${ietf.org:OS_TYPE} is used to denote the operating | The variable ${ietf.org:OS_TYPE} is used to denote the operating | |||
system and thus the kernel and library API's for which code might be | system, and thus the kernel and library APIs, for which code might be | |||
compiled. This specification does not limit the acceptable values | compiled. This specification does not limit the acceptable values | |||
(except that they must be valid UTF-8 strings) but such values as | (except that they must be valid UTF-8 strings), but such values as | |||
"linux" and "freebsd" would be expected to be used in line with | "linux" and "freebsd" would be expected to be used in line with | |||
industry practice. | industry practice. | |||
The variable ${ietf.org:OS_VERSION} is used to denote the operating | The variable ${ietf.org:OS_VERSION} is used to denote the operating | |||
system version and thus the specific details of versioned interfaces | system version, and thus the specific details of versioned | |||
for which code might be compiled. This specification does not limit | interfaces, for which code might be compiled. This specification | |||
the acceptable values (except that they must be valid UTF-8 strings). | does not limit the acceptable values (except that they must be valid | |||
However, combinations of numbers and letters with interspersed dots | UTF-8 strings). However, combinations of numbers and letters with | |||
would be expected to be used in line with industry practice, with the | interspersed dots would be expected to be used in line with industry | |||
details of the version format depending on the specific value of the | practice, with the details of the version format depending on the | |||
variable ${ietf.org:OS_TYPE} with which it is used. | specific value of the variable ${ietf.org:OS_TYPE} with which it is | |||
used. | ||||
Use of these variable could result in direction of different clients | Use of these variables could result in the direction of different | |||
to different file systems on the same server, as appropriate to | clients to different file systems on the same server, as appropriate | |||
particular clients. In cases in which the target file systems are | to particular clients. In cases in which the target file systems are | |||
located on different servers, a single server could serve as a | located on different servers, a single server could serve as a | |||
referral point so that each valid combination of variable values | referral point so that each valid combination of variable values | |||
would designate a referral hosted on a single server, with the | would designate a referral hosted on a single server, with the | |||
targets of those referrals on a number of different servers. | targets of those referrals on a number of different servers. | |||
Because namespace administration is affected by the values selected | Because namespace administration is affected by the values selected | |||
to substitute for various variables, clients should provide | to substitute for various variables, clients should provide | |||
convenient means of determining what variable substitutions a client | convenient means of determining what variable substitutions a client | |||
will implement, as well as, where appropriate, providing means to | will implement, as well as, where appropriate, providing means to | |||
control the substitutions to be used. The exact means by which this | control the substitutions to be used. The exact means by which this | |||
will be done is outside the scope of this specification. | will be done is outside the scope of this specification. | |||
Although variable substitution is most suitable for use in the | Although variable substitution is most suitable for use in the | |||
context of referrals, if may be used in the context of replication | context of referrals, it may be used in the context of replication | |||
and migration. If it is used in these contexts, the server must | and migration. If it is used in these contexts, the server must | |||
ensure that no matter what values the client presents for the | ensure that no matter what values the client presents for the | |||
substituted variables, the result is always a valid successor file | substituted variables, the result is always a valid successor file | |||
system instance to that from which a transition is occurring, i.e. | system instance to that from which a transition is occurring, i.e., | |||
that the data is identical or represents a later image of a writable | that the data is identical or represents a later image of a writable | |||
file system. | file system. | |||
Note that when fli_rootpath is a null pathname (that is, one with | Note that when fli_rootpath is a null pathname (that is, one with | |||
zero components), the file system designated is at the root of the | zero components), the file system designated is at the root of the | |||
specified server, whether the FSLI4IF_VAR_SUB flag within the | specified server, whether or not the FSLI4IF_VAR_SUB flag within the | |||
associated fs_locations_info4 structure is set or not. | associated fs_locations_info4 structure is set. | |||
11.11. The Attribute fs_status | 11.11. The Attribute fs_status | |||
In an environment in which multiple copies of the same basic set of | In an environment in which multiple copies of the same basic set of | |||
data are available, information regarding the particular source of | data are available, information regarding the particular source of | |||
such data and the relationships among different copies can be very | such data and the relationships among different copies can be very | |||
helpful in providing consistent data to applications. | helpful in providing consistent data to applications. | |||
enum fs4_status_type { | enum fs4_status_type { | |||
STATUS4_FIXED = 1, | STATUS4_FIXED = 1, | |||
skipping to change at page 273, line 37 | skipping to change at page 273, line 37 | |||
the fs4_status reflects that last valid when the file system was | the fs4_status reflects that last valid when the file system was | |||
present. | present. | |||
The fss_type field indicates the kind of file system image | The fss_type field indicates the kind of file system image | |||
represented. This is of particular importance when using the version | represented. This is of particular importance when using the version | |||
values to determine appropriate succession of file system images. | values to determine appropriate succession of file system images. | |||
When fss_absent is set, and the file system was previously present, | When fss_absent is set, and the file system was previously present, | |||
the value of fss_type reflected is that when the file system was last | the value of fss_type reflected is that when the file system was last | |||
present. Five values are distinguished: | present. Five values are distinguished: | |||
o STATUS4_FIXED which indicates a read-only image in the sense that | o STATUS4_FIXED, which indicates a read-only image in the sense that | |||
it will never change. The possibility is allowed that, as a | it will never change. The possibility is allowed that, as a | |||
result of migration or switch to a different image, changed data | result of migration or switch to a different image, changed data | |||
can be accessed, but within the confines of this instance, no | can be accessed, but within the confines of this instance, no | |||
change is allowed. The client can use this fact to cache | change is allowed. The client can use this fact to cache | |||
aggressively. | aggressively. | |||
o STATUS4_VERSIONED which indicates that the image, like the | o STATUS4_VERSIONED, which indicates that the image, like the | |||
STATUS4_UPDATED case, is updated externally, but it provides a | STATUS4_UPDATED case, is updated externally, but it provides a | |||
guarantee that the server will carefully update an associated | guarantee that the server will carefully update an associated | |||
version value so that the client can protect itself from a | version value so that the client can protect itself from a | |||
situation in which it reads data from one version of the file | situation in which it reads data from one version of the file | |||
system, and then later reads data from an earlier version of the | system and then later reads data from an earlier version of the | |||
same file system. See below for a discussion of how this can be | same file system. See below for a discussion of how this can be | |||
done. | done. | |||
o STATUS4_UPDATED which indicates an image that cannot be updated by | o STATUS4_UPDATED, which indicates an image that cannot be updated | |||
the user writing to it but may be changed externally, typically | by the user writing to it but that may be changed externally, | |||
because it is a periodically updated copy of another writable file | typically because it is a periodically updated copy of another | |||
system somewhere else. In this case, version information is not | writable file system somewhere else. In this case, version | |||
provided and the client does not have the responsibility of making | information is not provided, and the client does not have the | |||
sure that this version only advances upon a file system instance | responsibility of making sure that this version only advances upon | |||
transition. In this case, it is the responsibility of the server | a file system instance transition. In this case, it is the | |||
to make sure that the data presented after a file system instance | responsibility of the server to make sure that the data presented | |||
transition is a proper successor image and includes all changes | after a file system instance transition is a proper successor | |||
seen by the client and any change made before all such changes. | image and includes all changes seen by the client and any change | |||
made before all such changes. | ||||
o STATUS4_WRITABLE which indicates that the file system is an actual | o STATUS4_WRITABLE, which indicates that the file system is an | |||
writable one. The client need not, of course, actually write to | actual writable one. The client need not, of course, actually | |||
the file system, but once it does, it should not accept a | write to the file system, but once it does, it should not accept a | |||
transition to anything other than a writable instance of that same | transition to anything other than a writable instance of that same | |||
file system. | file system. | |||
o STATUS4_REFERRAL which indicates that the file system is question | o STATUS4_REFERRAL, which indicates that the file system in question | |||
is absent and has never been present on this server. | is absent and has never been present on this server. | |||
Note that in the STATUS4_UPDATED and STATUS4_VERSIONED cases, the | Note that in the STATUS4_UPDATED and STATUS4_VERSIONED cases, the | |||
server is responsible for the appropriate handling of locks that are | server is responsible for the appropriate handling of locks that are | |||
inconsistent with external changes to delegations. If a server gives | inconsistent with external changes to delegations. If a server gives | |||
out delegations, they SHOULD be recalled before an inconsistent | out delegations, they SHOULD be recalled before an inconsistent | |||
change made to data, and MUST be revoked if this is not possible. | change is made to the data, and MUST be revoked if this is not | |||
Similarly, if an open is inconsistent with data that is changed (the | possible. Similarly, if an OPEN is inconsistent with data that is | |||
open denies WRITE and the data is changed), that lock SHOULD be | changed (the OPEN has OPEN4_SHARE_DENY_WRITE/OPEN4_SHARE_DENY_BOTH | |||
considered administratively revoked. | and the data is changed), that OPEN SHOULD be considered | |||
administratively revoked. | ||||
The opaque strings fss_source and fss_current provide a way of | The opaque strings fss_source and fss_current provide a way of | |||
presenting information about the source of the file system image | presenting information about the source of the file system image | |||
being presented. It is not intended that client do anything with this | being presented. It is not intended that the client do anything with | |||
information other than make it available to administrative tools. It | this information other than make it available to administrative | |||
is intended that this information be helpful when researching | tools. It is intended that this information be helpful when | |||
possible problems with a file system image that might arise when it | researching possible problems with a file system image that might | |||
is unclear if the correct image is being accessed and if not, how | arise when it is unclear if the correct image is being accessed and, | |||
that image came to be made. This kind of diagnostic information will | if not, how that image came to be made. This kind of diagnostic | |||
be helpful, if, as seems likely, copies of file systems are made in | information will be helpful, if, as seems likely, copies of file | |||
many different ways (e.g. simple user-level copies, file system-level | systems are made in many different ways (e.g., simple user-level | |||
point-in-time copies, clones of the underlying storage), under a | copies, file-system-level point-in-time copies, clones of the | |||
variety of administrative arrangements. In such environments, | underlying storage), under a variety of administrative arrangements. | |||
determining how a given set of data was constructed can be very | In such environments, determining how a given set of data was | |||
helpful in resolving problems. | constructed can be very helpful in resolving problems. | |||
The opaque string fss_source is used to indicate the source of a | The opaque string fss_source is used to indicate the source of a | |||
given file system with the expectation that tools capable of creating | given file system with the expectation that tools capable of creating | |||
a file system image propagate this information, when that is | a file system image propagate this information, when possible. It is | |||
possible. It is understood that this may not always be possible | understood that this may not always be possible since a user-level | |||
since a user-level copy may be thought of as creating a new data set | copy may be thought of as creating a new data set and the tools used | |||
and the tools used may have no mechanism to propagate this data. | may have no mechanism to propagate this data. When a file system is | |||
When a file system is initially created, it is desirable to associate | initially created, it is desirable to associate with it data | |||
with it data regarding how the file system was created, where it was | regarding how the file system was created, where it was created, who | |||
created, by whom, etc. Making this information available in this | created it, etc. Making this information available in this attribute | |||
attribute in a human-readable string form will be helpful for | in a human-readable string will be helpful for applications and | |||
applications and system administrators and also serves to make it | system administrators and will also serve to make it available when | |||
available when the original file system is used to make subsequent | the original file system is used to make subsequent copies. | |||
copies. | ||||
The opaque string fss_current should provide whatever information is | The opaque string fss_current should provide whatever information is | |||
available about the source of the current copy. Such information as | available about the source of the current copy. Such information | |||
the tool creating it, any relevant parameters to that tool, the time | includes the tool creating it, any relevant parameters to that tool, | |||
at which the copy was done, the user making the change, the server on | the time at which the copy was done, the user making the change, the | |||
which the change was made, etc. All information should be in a | server on which the change was made, etc. All information should be | |||
human-readable string form. | in a human-readable string. | |||
The field fss_age provides an indication of how out-of-date the file | The field fss_age provides an indication of how out-of-date the file | |||
system currently is with respect to its ultimate data source (in case | system currently is with respect to its ultimate data source (in case | |||
of cascading data updates). This complements the fls_currency field | of cascading data updates). This complements the fls_currency field | |||
of fs_locations_server4 (see Section 11.10) in the following way: the | of fs_locations_server4 (see Section 11.10) in the following way: the | |||
information in fls_currency gives a bound for how out of date the | information in fls_currency gives a bound for how out of date the | |||
data in a file system might typically get, while the value in fss_age | data in a file system might typically get, while the value in fss_age | |||
gives a bound on how out of date that data actually is. Negative | gives a bound on how out-of-date that data actually is. Negative | |||
values imply that no information is available. A zero means that | values imply that no information is available. A zero means that | |||
this data is known to be current. A positive value means that this | this data is known to be current. A positive value means that this | |||
data is known to be no older than that number of seconds with respect | data is known to be no older than that number of seconds with respect | |||
to the ultimate data source. Using this value, the client may be | to the ultimate data source. Using this value, the client may be | |||
able to decide that a data copy is too old, so that it may search for | able to decide that a data copy is too old, so that it may search for | |||
a newer version to use. | a newer version to use. | |||
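
The three cases for fss_age described above can be sketched in Python as follows. This is an illustrative sketch only: the function name copy_is_too_old, the max_age_seconds parameter, and the policy of treating an unknown age as too old are assumptions, not part of the protocol.

```python
def copy_is_too_old(fss_age: int, max_age_seconds: int) -> bool:
    """Decide whether a client might want to look for a newer copy.

    fss_age < 0  : no information is available
    fss_age == 0 : the data is known to be current
    fss_age > 0  : the data is no older than fss_age seconds
    """
    if fss_age < 0:
        # Assumed policy: with no age information, be conservative.
        return True
    # fss_age is an upper bound on staleness, so the copy is accepted
    # only when that bound fits within the client's tolerance.
    return fss_age > max_age_seconds
```

A client with a looser policy could instead accept copies of unknown age; the protocol leaves that decision to the client.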
The fss_version field provides a version identification, in the form | The fss_version field provides a version identification, in the form | |||
of a time value, such that successive versions always have later time | of a time value, such that successive versions always have later time | |||
values. When the fss_type is anything other than STATUS4_VERSIONED, | values. When the fss_type is anything other than STATUS4_VERSIONED, | |||
the server may provide such a value but there is no guarantee as to | the server may provide such a value, but there is no guarantee as to | |||
its validity and clients will not use it except to provide additional | its validity and clients will not use it except to provide additional | |||
information to add to fss_source and fss_current. | information to add to fss_source and fss_current. | |||
When fss_type is STATUS4_VERSIONED, servers SHOULD provide a value of | When fss_type is STATUS4_VERSIONED, servers SHOULD provide a value of | |||
version which progresses monotonically whenever any new version of | fss_version that progresses monotonically whenever any new version of | |||
the data is established. This allows the client, if reliable image | the data is established. This allows the client, if reliable image | |||
progression is important to it, to fetch this attribute as part of | progression is important to it, to fetch this attribute as part of | |||
each COMPOUND where data or metadata from the file system is used. | each COMPOUND where data or metadata from the file system is used. | |||
When it is important to the client to make sure that only valid | When it is important to the client to make sure that only valid | |||
successor images are accepted, it must make sure that it does not | successor images are accepted, it must make sure that it does not | |||
read data or metadata from the file system without updating its sense | read data or metadata from the file system without updating its sense | |||
of the current state of the image, to avoid the possibility that the | of the current state of the image. This is to avoid the possibility | |||
fs_status which the client holds will be one for an earlier image, | that the fs_status that the client holds will be one for an earlier | |||
and so accept a new file system instance which is later than that but | image, which would cause the client to accept a new file system | |||
still earlier than updated data read by the client. | instance that is later than that but still earlier than the updated | |||
data read by the client. | ||||
In order to do this reliably, it must do a GETATTR of the fs_status | In order to accept valid images reliably, the client must do a | |||
attribute that follows any interrogation of data or metadata within | GETATTR of the fs_status attribute that follows any interrogation of | |||
the file system in question. Often this is most conveniently done by | data or metadata within the file system in question. Often this is | |||
appending such a GETATTR after all other operations that reference a | most conveniently done by appending such a GETATTR after all other | |||
given file system. When errors occur between reading file system | operations that reference a given file system. When errors occur | |||
data and performing such a GETATTR, care must be exercised to make | between reading file system data and performing such a GETATTR, care | |||
sure that the data in question is not used before obtaining the | must be exercised to make sure that the data in question is not used | |||
proper fs_status value. In this connection, when an OPEN is done | before obtaining the proper fs_status value. In this connection, | |||
within such a versioned file system and the associated GETATTR of | when an OPEN is done within such a versioned file system and the | |||
fs_status is not successfully completed, the open file in question | associated GETATTR of fs_status is not successfully completed, the | |||
must not be accessed until that fs_status is fetched. | open file in question must not be accessed until that fs_status is | |||
fetched. | ||||
The procedure above will ensure that before using any data from the | The procedure above will ensure that before using any data from the | |||
file system the client has in hand a newly-fetched current version of | file system the client has in hand a newly-fetched current version of | |||
the file system image. Multiple values for multiple requests in | the file system image. Multiple values for multiple requests in | |||
flight can be resolved by assembling them into the required partial | flight can be resolved by assembling them into the required partial | |||
order (and the elements should form a total order within it) and | order (and the elements should form a total order within the partial | |||
using the last. The client may then, when switching among file | order) and using the last. The client may then, when switching among | |||
system instances, decline to use an instance which does not have an | file system instances, decline to use an instance that does not have | |||
fss_type of STATUS4_VERSIONED or whose fss_version field is earlier | an fss_type of STATUS4_VERSIONED or whose fss_version field is | |||
than the last one obtained from the predecessor file system instance. | earlier than the last one obtained from the predecessor file system | |||
instance. | ||||
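
The selection rule in the preceding paragraph can be sketched as follows. This is a minimal sketch under stated assumptions: fss_version is modeled as a (seconds, nseconds) tuple, the helper names latest_version and may_switch_to are hypothetical, and the numeric value of STATUS4_VERSIONED is assumed to match the fs4_status_type definition and should be checked against it.

```python
from typing import Iterable, Tuple

# Assumed to match the fs4_status_type enumeration; verify before use.
STATUS4_VERSIONED = 3

# A time value modeled as (seconds, nseconds); tuples compare in that order.
Version = Tuple[int, int]


def latest_version(observed: Iterable[Version]) -> Version:
    """Resolve values from several requests in flight by using the last."""
    return max(observed)


def may_switch_to(candidate_type: int,
                  candidate_version: Version,
                  last_seen: Version) -> bool:
    """Return True if a successor file system instance is acceptable."""
    if candidate_type != STATUS4_VERSIONED:
        # No reliable version information: decline the instance.
        return False
    # Decline only when the candidate is earlier than what was last seen.
    return candidate_version >= last_seen
```

An instance whose version equals the last one obtained is accepted here, since the text only excludes earlier versions.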
12. Parallel NFS (pNFS) | 12. Parallel NFS (pNFS) | |||
12.1. Introduction | 12.1. Introduction | |||
pNFS is an OPTIONAL feature within NFSv4.1; the pNFS feature set | pNFS is an OPTIONAL feature within NFSv4.1; the pNFS feature set | |||
allows direct client access to the storage devices containing file | allows direct client access to the storage devices containing file | |||
data. When file data for a single NFSv4 server is stored on multiple | data. When file data for a single NFSv4 server is stored on multiple | |||
and/or higher throughput storage devices (by comparison to the | and/or higher-throughput storage devices (by comparison to the | |||
server's throughput capability), the result can be significantly | server's throughput capability), the result can be significantly | |||
better file access performance. The relationship among multiple | better file access performance. The relationship among multiple | |||
clients, a single server, and multiple storage devices for pNFS | clients, a single server, and multiple storage devices for pNFS | |||
(server and clients have access to all storage devices) is shown in | (server and clients have access to all storage devices) is shown in | |||
Figure 1. | Figure 1. | |||
+-----------+ | +-----------+ | |||
|+-----------+ +-----------+ | |+-----------+ +-----------+ | |||
||+-----------+ | | | ||+-----------+ | | | |||
||| | NFSv4.1 + pNFS | | | ||| | NFSv4.1 + pNFS | | | |||
skipping to change at page 277, line 27 | skipping to change at page 277, line 27 | |||
||+----------------||+-----------+ Control | | ||+----------------||+-----------+ Control | | |||
|+-----------------||| | Protocol| | |+-----------------||| | Protocol| | |||
+------------------+|| Storage |------------+ | +------------------+|| Storage |------------+ | |||
+| Devices | | +| Devices | | |||
+-----------+ | +-----------+ | |||
Figure 1 | Figure 1 | |||
In this model, the clients, server, and storage devices are | In this model, the clients, server, and storage devices are | |||
responsible for managing file access. This is in contrast to NFSv4 | responsible for managing file access. This is in contrast to NFSv4 | |||
without pNFS where it is primarily the server's responsibility; some | without pNFS, where it is primarily the server's responsibility; some | |||
of this responsibility may be delegated to the client under strictly | of this responsibility may be delegated to the client under strictly | |||
specified conditions. See Section 12.2.6 for a discussion of the | specified conditions. See Section 12.2.5 for a discussion of the | |||
Control Protocol. See Section 12.2.5 for a discussion of the Storage | Storage Protocol. See Section 12.2.6 for a discussion of the Control | |||
Protocol. | Protocol. | |||
pNFS takes the form of OPTIONAL operations that manage protocol | pNFS takes the form of OPTIONAL operations that manage protocol | |||
objects called 'layouts' (Section 12.2.7) which contain a byte-range | objects called 'layouts' (Section 12.2.7) that contain a byte-range | |||
and storage location information. The layout is managed in a fashion | and storage location information. The layout is managed in a fashion | |||
similar to NFSv4.1 data delegations. For example, the layout is | similar to NFSv4.1 data delegations. For example, the layout is | |||
leased, recallable and revocable. However, layouts are distinct | leased, recallable, and revocable. However, layouts are distinct | |||
abstractions and are manipulated with new operations. When a client | abstractions and are manipulated with new operations. When a client | |||
holds a layout, it is granted the ability to directly access the | holds a layout, it is granted the ability to directly access the | |||
byte-range at the storage location specified in the layout. | byte-range at the storage location specified in the layout. | |||
There are interactions between layouts and other NFSv4.1 abstractions | There are interactions between layouts and other NFSv4.1 abstractions | |||
such as data delegations and byte-range locking. Delegation issues | such as data delegations and byte-range locking. Delegation issues | |||
are discussed in Section 12.5.5. Byte range locking issues are | are discussed in Section 12.5.5. Byte-range locking issues are | |||
discussed in Section 12.2.9 and Section 12.5.1. | discussed in Sections 12.2.9 and 12.5.1. | |||
12.2. pNFS Definitions | 12.2. pNFS Definitions | |||
NFSv4.1's pNFS feature provides parallel data access to a file system | NFSv4.1's pNFS feature provides parallel data access to a file system | |||
that stripes its content across multiple storage servers. The first | that stripes its content across multiple storage servers. The first | |||
instantiation of pNFS, as part of NFSv4.1, separates the file system | instantiation of pNFS, as part of NFSv4.1, separates the file system | |||
protocol processing into two parts: metadata processing and data | protocol processing into two parts: metadata processing and data | |||
processing. Data consist of the contents of regular files which are | processing. Data consist of the contents of regular files that are | |||
striped across storage servers. Data striping occurs in at least two | striped across storage servers. Data striping occurs in at least two | |||
ways: on a file-by-file basis, and within sufficiently large files, | ways: on a file-by-file basis and, within sufficiently large files, | |||
on a block-by-block basis. In contrast, striped access to metadata | on a block-by-block basis. In contrast, striped access to metadata | |||
by pNFS clients is not provided in NFSv4.1, even though the file | by pNFS clients is not provided in NFSv4.1, even though the file | |||
system back end of a pNFS server might stripe metadata. Metadata | system back end of a pNFS server might stripe metadata. Metadata | |||
consist of everything else, including the contents of non-regular | consist of everything else, including the contents of non-regular | |||
files (e.g. directories); see Section 12.2.1. The metadata | files (e.g., directories); see Section 12.2.1. The metadata | |||
functionality is implemented by an NFSv4.1 server that supports pNFS | functionality is implemented by an NFSv4.1 server that supports pNFS | |||
and the operations described in (Section 18); such a server is called | and the operations described in Section 18; such a server is called a | |||
a metadata server (Section 12.2.2). | metadata server (Section 12.2.2). | |||
The data functionality is implemented by one or more storage devices, | The data functionality is implemented by one or more storage devices, | |||
each of which is accessed by the client via a storage protocol. A | each of which is accessed by the client via a storage protocol. A | |||
subset (defined in Section 13.6) of NFSv4.1 is one such storage | subset (defined in Section 13.6) of NFSv4.1 is one such storage | |||
protocol. New terms are introduced to the NFSv4.1 nomenclature and | protocol. New terms are introduced to the NFSv4.1 nomenclature and | |||
existing terms are clarified to allow for the description of the pNFS | existing terms are clarified to allow for the description of the pNFS | |||
feature. | feature. | |||
12.2.1. Metadata | 12.2.1. Metadata | |||
Information about a file system object, such as its name, location | Information about a file system object, such as its name, location | |||
within the namespace, owner, ACL and other attributes. Metadata may | within the namespace, owner, ACL, and other attributes. Metadata may | |||
also include storage location information and this will vary based on | also include storage location information, and this will vary based | |||
the underlying storage mechanism that is used. | on the underlying storage mechanism that is used. | |||
12.2.2. Metadata Server | 12.2.2. Metadata Server | |||
An NFSv4.1 server which supports the pNFS feature. A variety of | An NFSv4.1 server that supports the pNFS feature. A variety of | |||
architectural choices exists for the metadata server and its use of | architectural choices exist for the metadata server and its use of | |||
file system information held at the server. Some servers may contain | file system information held at the server. Some servers may contain | |||
metadata only for file objects residing at the metadata server while | metadata only for file objects residing at the metadata server, while | |||
the file data resides on associated storage devices. Other metadata | the file data resides on associated storage devices. Other metadata | |||
servers may hold both metadata and a varying degree of file data. | servers may hold both metadata and a varying degree of file data. | |||
12.2.3. pNFS Client | 12.2.3. pNFS Client | |||
An NFSv4.1 client that supports pNFS operations and supports at least | An NFSv4.1 client that supports pNFS operations and supports at least | |||
one storage protocol for performing I/O to storage devices. | one storage protocol for performing I/O to storage devices. | |||
12.2.4. Storage Device | 12.2.4. Storage Device | |||
A storage device stores a regular file's data, but leaves metadata | A storage device stores a regular file's data, but leaves metadata | |||
management to the metadata server. A storage device could be another | management to the metadata server. A storage device could be another | |||
NFSv4.1 server, an object storage device (OSD), a block device | NFSv4.1 server, an object-based storage device (OSD), a block device | |||
accessed over a SAN (e.g., either Fibre Channel or iSCSI SAN), or some | accessed over a Storage Area Network (SAN, e.g., either Fibre Channel | |||
other entity. | or iSCSI SAN), or some other entity. | |||
12.2.5. Storage Protocol | 12.2.5. Storage Protocol | |||
As noted in the Figure 1, the storage protocol is the method used by | As noted in Figure 1, the storage protocol is the method used by the | |||
the client to store and retrieve data directly from the storage | client to store and retrieve data directly from the storage devices. | |||
devices. | ||||
The NFSv4.1 pNFS feature has been structured to allow for a variety | The NFSv4.1 pNFS feature has been structured to allow for a variety | |||
of storage protocols to be defined and used. One example storage | of storage protocols to be defined and used. One example storage | |||
protocol is NFSv4.1 itself (as documented in Section 13). Other | protocol is NFSv4.1 itself (as documented in Section 13). Other | |||
options for the storage protocol are described elsewhere and include: | options for the storage protocol are described elsewhere and include: | |||
o Block/volume protocols such as iSCSI ([48]), and FCP ([49]). The | o Block/volume protocols such as Internet SCSI (iSCSI) [48] and FCP | |||
block/volume protocol support can be independent of the addressing | [49]. The block/volume protocol support can be independent of the | |||
structure of the block/volume protocol used, allowing more than | addressing structure of the block/volume protocol used, allowing | |||
one protocol to access the same file data and enabling | more than one protocol to access the same file data and enabling | |||
extensibility to other block/volume protocols. See [41] for a | extensibility to other block/volume protocols. See [41] for a | |||
layout specification that allows pNFS to use block/volume storage | layout specification that allows pNFS to use block/volume storage | |||
protocols. | protocols. | |||
o Object protocols such as OSD over iSCSI or Fibre Channel [50]. | o Object protocols such as OSD over iSCSI or Fibre Channel [50]. | |||
See [40] for a layout specification that allows pNFS to use object | See [40] for a layout specification that allows pNFS to use object | |||
storage protocols. | storage protocols. | |||
It is possible that various storage protocols are available to both | It is possible that various storage protocols are available to both | |||
client and server and it may be possible that a client and server do | client and server and it may be possible that a client and server do | |||
not have a matching storage protocol available to them. Because of | not have a matching storage protocol available to them. Because of | |||
this, the pNFS server MUST support normal NFSv4.1 access to any file | this, the pNFS server MUST support normal NFSv4.1 access to any file | |||
accessible by the pNFS feature; this will allow for continued | accessible by the pNFS feature; this will allow for continued | |||
interoperability between an NFSv4.1 client and server. | interoperability between an NFSv4.1 client and server. | |||
12.2.6. Control Protocol | 12.2.6. Control Protocol | |||
As noted in the Figure 1, the control protocol is used by the | As noted in Figure 1, the control protocol is used by the exported | |||
exported file system between the metadata server and storage devices. | file system between the metadata server and storage devices. | |||
Specification of such protocols is outside the scope of the NFSv4.1 | Specification of such protocols is outside the scope of the NFSv4.1 | |||
protocol. Such control protocols would be used to control activities | protocol. Such control protocols would be used to control activities | |||
such as the allocation and deallocation of storage, the management of | such as the allocation and deallocation of storage, the management of | |||
state required by the storage devices to perform client access | state required by the storage devices to perform client access | |||
control, and, depending on the storage protocol, the enforcement of | control, and, depending on the storage protocol, the enforcement of | |||
authentication and authorization so that restrictions that would be | authentication and authorization so that restrictions that would be | |||
enforced by the metadata server are also enforced by the storage | enforced by the metadata server are also enforced by the storage | |||
device. | device. | |||
A particular control protocol is not REQUIRED by NFSv4.1 but | A particular control protocol is not REQUIRED by NFSv4.1 but | |||
requirements are placed on the control protocol for maintaining | requirements are placed on the control protocol for maintaining | |||
attributes like modify time, the change attribute, and the end-of- | attributes like modify time, the change attribute, and the end-of- | |||
file (EOF) position. Note that if pNFS is layered over a clustered, | file (EOF) position. Note that if pNFS is layered over a clustered, | |||
parallel file system (e.g. PVFS [51]), the mechanisms that enable | parallel file system (e.g., PVFS [51]), the mechanisms that enable | |||
clustering and parallelism in that file system can be considered the | clustering and parallelism in that file system can be considered the | |||
control protocol. | control protocol. | |||
12.2.7. Layout Types | 12.2.7. Layout Types | |||
A layout describes the mapping of a file's data to the storage | A layout describes the mapping of a file's data to the storage | |||
devices that hold the data. A layout is said to belong to a specific | devices that hold the data. A layout is said to belong to a specific | |||
layout type (data type layouttype4, see Section 3.3.13). The layout | layout type (data type layouttype4, see Section 3.3.13). The layout | |||
type allows for variants to handle different storage protocols, such | type allows for variants to handle different storage protocols, such | |||
as those associated with block/volume [41], object [40], and file | as those associated with block/volume [41], object [40], and file | |||
(Section 13) layout types. A metadata server, along with its control | (Section 13) layout types. A metadata server, along with its control | |||
protocol, MUST support at least one layout type. A private sub-range | protocol, MUST support at least one layout type. A private sub-range | |||
of the layout type name space is also defined. Values from the | of the layout type name space is also defined. Values from the | |||
private layout type range MAY be used for internal testing or | private layout type range MAY be used for internal testing or | |||
experimentation (see Section 3.3.13). | experimentation (see Section 3.3.13). | |||
As an example, the organization of the file layout type could be an | As an example, the organization of the file layout type could be an | |||
array of tuples (e.g., device ID, filehandle), along with a | array of tuples (e.g., device ID, filehandle), along with a | |||
definition of how the data is stored across the devices (e.g., | definition of how the data is stored across the devices (e.g., | |||
striping). A block/volume layout might be an array of tuples that | striping). A block/volume layout might be an array of tuples that | |||
store <device ID, block_number, block count> along with information | store <device ID, block number, block count> along with information | |||
about block size and the associated file offset of the block number. | about block size and the associated file offset of the block number. | |||
An object layout might be an array of tuples <device ID, object ID> | An object layout might be an array of tuples <device ID, object ID> | |||
and an additional structure (i.e., the aggregation map) that defines | and an additional structure (i.e., the aggregation map) that defines | |||
how the logical byte sequence of the file data is serialized into the | how the logical byte sequence of the file data is serialized into the | |||
different objects. Note that the actual layouts are typically more | different objects. Note that the actual layouts are typically more | |||
complex than these simple expository examples. | complex than these simple expository examples. | |||
Requests for pNFS-related operations will often specify a layout | Requests for pNFS-related operations will often specify a layout | |||
type. Examples of such operations are GETDEVICEINFO and LAYOUTGET. | type. Examples of such operations are GETDEVICEINFO and LAYOUTGET. | |||
The response for these operations will include structures such a | The response for these operations will include structures such as a | |||
device_addr4 or a layout4, each of which includes a layout type | device_addr4 or a layout4, each of which includes a layout type | |||
within it. The layout type sent by the server MUST always be the | within it. The layout type sent by the server MUST always be the | |||
same one requested by the client. When a server sends a response | same one requested by the client. When a server sends a response | |||
that includes a different layout type, the client SHOULD ignore the | that includes a different layout type, the client SHOULD ignore the | |||
response and behave as if the server had returned an error response. | response and behave as if the server had returned an error response. | |||
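
The client-side check described above can be sketched as follows. The exception class LayoutTypeMismatch and the function check_layout_type are hypothetical names introduced for illustration; how a real client surfaces the error is implementation specific.

```python
class LayoutTypeMismatch(Exception):
    """Raised when a response carries a layout type that was not requested."""


def check_layout_type(requested_type: int, returned_type: int) -> None:
    """Treat a mismatched layout type as if the server had returned an error."""
    if returned_type != requested_type:
        # The response is ignored; the caller handles this like an error reply.
        raise LayoutTypeMismatch(
            f"requested layout type {requested_type}, got {returned_type}"
        )
```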
12.2.8. Layout | 12.2.8. Layout | |||
A layout defines how a file's data is organized on one or more | A layout defines how a file's data is organized on one or more | |||
storage devices. There are many potential layout types; each of the | storage devices. There are many potential layout types; each of the | |||
layout types is differentiated by the storage protocol used to | layout types is differentiated by the storage protocol used to | |||
access data and in the aggregation scheme that lays out the file data | access data and by the aggregation scheme that lays out the file data | |||
on the underlying storage devices. A layout is precisely identified | on the underlying storage devices. A layout is precisely identified | |||
by the following tuple: <client ID, filehandle, layout type, iomode, | by the tuple <client ID, filehandle, layout type, iomode, range>, | |||
range>; where filehandle refers to the filehandle of the file on the | where filehandle refers to the filehandle of the file on the metadata | |||
metadata server. | server. | |||
It is important to define when layouts overlap and/or conflict with | It is important to define when layouts overlap and/or conflict with | |||
each other. For two layouts with overlapping byte ranges to actually | each other. For two layouts with overlapping byte-ranges to actually | |||
overlap each other, both layouts must be of the same layout type, | overlap each other, both layouts must be of the same layout type, | |||
correspond to the same filehandle, and have the same iomode. Layouts | correspond to the same filehandle, and have the same iomode. Layouts | |||
conflict when they overlap and differ in the content of the layout | conflict when they overlap and differ in the content of the layout | |||
(i.e., the storage device/file mapping parameters differ). Note that | (i.e., the storage device/file mapping parameters differ). Note that | |||
differing iomodes do not lead to conflicting layouts. It is | differing iomodes do not lead to conflicting layouts. It is | |||
permissible for layouts with different iomodes, pertaining to the | permissible for layouts with different iomodes, pertaining to the | |||
same byte range, to be held by the same client. An example of this | same byte-range, to be held by the same client. An example of this | |||
would be copy-on-write functionality for a block/volume layout type. | would be copy-on-write functionality for a block/volume layout type. | |||
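
The overlap and conflict rules above can be sketched as follows. This is an illustrative model, not the wire format: the Layout class, its field names, the half-open [offset, offset + length) range, and the opaque content field standing in for the storage device/file mapping parameters are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    client_id: int
    filehandle: bytes
    layout_type: int
    iomode: int
    offset: int
    length: int
    content: bytes  # stand-in for the device/file mapping parameters


def ranges_intersect(a: Layout, b: Layout) -> bool:
    """True when the two half-open byte-ranges share at least one byte."""
    return a.offset < b.offset + b.length and b.offset < a.offset + a.length


def layouts_overlap(a: Layout, b: Layout) -> bool:
    """Overlap needs intersecting ranges plus equal type, filehandle, iomode."""
    return (ranges_intersect(a, b)
            and a.layout_type == b.layout_type
            and a.filehandle == b.filehandle
            and a.iomode == b.iomode)


def layouts_conflict(a: Layout, b: Layout) -> bool:
    """Layouts conflict when they overlap and their content differs."""
    return layouts_overlap(a, b) and a.content != b.content
```

Because overlap requires equal iomodes, two layouts over the same byte-range with different iomodes never conflict under this model, which matches the copy-on-write example.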
12.2.9. Layout Iomode | 12.2.9. Layout Iomode | |||
The layout iomode (data type layoutiomode4, see Section 3.3.20) | The layout iomode (data type layoutiomode4, see Section 3.3.20) | |||
indicates to the metadata server the client's intent to perform | indicates to the metadata server the client's intent to perform | |||
either just read operations or a mixture of I/O possibly containing | either just READ operations or a mixture containing READ and WRITE | |||
read and write operations. For certain layout types, it is useful | operations. For certain layout types, it is useful for a client to | |||
for a client to specify this intent at the time it sends LAYOUTGET | specify this intent at the time it sends LAYOUTGET (Section 18.43). | |||
(Section 18.43). For example, block/volume based protocols, block | For example, for block/volume-based protocols, block allocation could | |||
allocation could occur when a READ/WRITE iomode is specified. A | occur when a LAYOUTIOMODE4_RW iomode is specified. A special | |||
special LAYOUTIOMODE4_ANY iomode is defined and can only be used for | LAYOUTIOMODE4_ANY iomode is defined and can only be used for | |||
LAYOUTRETURN and CB_LAYOUTRECALL, not for LAYOUTGET. It specifies | LAYOUTRETURN and CB_LAYOUTRECALL, not for LAYOUTGET. It specifies | |||
that layouts pertaining to both READ and READ/WRITE iomodes are being | that layouts pertaining to both LAYOUTIOMODE4_READ and | |||
returned or recalled, respectively. | LAYOUTIOMODE4_RW iomodes are being returned or recalled, | |||
respectively. | ||||
A storage device may validate I/O with regard to the iomode; this is | A storage device may validate I/O with regard to the iomode; this is | |||
dependent upon storage device implementation and layout type. Thus, | dependent upon storage device implementation and layout type. Thus, | |||
if the client's layout iomode is inconsistent with the I/O being | if the client's layout iomode is inconsistent with the I/O being | |||
performed, the storage device may reject the client's I/O with an | performed, the storage device may reject the client's I/O with an | |||
error indicating a new layout with the correct iomode should be | error indicating that a new layout with the correct iomode should be | |||
obtained via LAYOUTGET. For example, if a client gets a layout with | obtained via LAYOUTGET. For example, if a client gets a layout with | |||
a READ iomode and performs a WRITE to a storage device, the storage | a LAYOUTIOMODE4_READ iomode and performs a WRITE to a storage device, | |||
device is allowed to reject that WRITE. | the storage device is allowed to reject that WRITE. | |||
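
The iomode rules in this section can be sketched as follows. The function names are hypothetical, the numeric constants are assumed to match the layoutiomode4 definition (Section 3.3.20) and should be checked against it, and the I/O check models only the optional validation a storage device may choose to perform.

```python
# Assumed to match the layoutiomode4 enumeration; verify before use.
LAYOUTIOMODE4_READ = 1
LAYOUTIOMODE4_RW = 2
LAYOUTIOMODE4_ANY = 3


def iomode_allowed_for(operation: str, iomode: int) -> bool:
    """LAYOUTIOMODE4_ANY is usable only for returning or recalling layouts."""
    if iomode == LAYOUTIOMODE4_ANY:
        return operation in ("LAYOUTRETURN", "CB_LAYOUTRECALL")
    return iomode in (LAYOUTIOMODE4_READ, LAYOUTIOMODE4_RW)


def io_consistent_with_layout(layout_iomode: int, is_write: bool) -> bool:
    """Optional storage device check: a WRITE needs a read/write layout."""
    if is_write:
        return layout_iomode == LAYOUTIOMODE4_RW
    return layout_iomode in (LAYOUTIOMODE4_READ, LAYOUTIOMODE4_RW)
```

A storage device that performs this check would reject a WRITE issued under a read-only layout, prompting the client to obtain a new layout via LAYOUTGET.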
The use of the layout iomode does not conflict with OPEN share modes | The use of the layout iomode does not conflict with OPEN share modes | |||
or byte-range lock requests; open mode and lock conflicts are | or byte-range LOCK operations; open share mode and byte-range lock | |||
enforced as they are without the use of pNFS, and are logically | conflicts are enforced as they are without the use of pNFS and are | |||
separate from the pNFS layout level. Open modes and locks are the | logically separate from the pNFS layout level. Open share modes and | |||
preferred method for restricting user access to data files. For | byte-range locks are the preferred method for restricting user access | |||
example, an OPEN of read, deny-write does not conflict with a | to data files. For example, an OPEN of OPEN4_SHARE_ACCESS_WRITE does | |||
LAYOUTGET containing an iomode of READ/WRITE performed by another | not conflict with a LAYOUTGET containing an iomode of | |||
client. Applications that depend on writing into the same file | LAYOUTIOMODE4_RW performed by another client. Applications that | |||
concurrently may use byte-range locking to serialize their accesses. | depend on writing into the same file concurrently may use byte-range | |||
locking to serialize their accesses. | ||||
12.2.10. Device IDs | 12.2.10. Device IDs | |||
The device ID (data type deviceid4, see Section 3.3.14) identifies a | The device ID (data type deviceid4, see Section 3.3.14) identifies a | |||
group of storage devices. The scope of a device ID is the pair | group of storage devices. The scope of a device ID is the pair | |||
<client ID, layout type>. In practice, a significant amount of | <client ID, layout type>. In practice, a significant amount of | |||
information may be required to fully address a storage device. | information may be required to fully address a storage device. | |||
Rather than embedding all such information in a layout, layouts embed | Rather than embedding all such information in a layout, layouts embed | |||
device IDs. The NFSv4.1 operation GETDEVICEINFO (Section 18.40) is | device IDs. The NFSv4.1 operation GETDEVICEINFO (Section 18.40) is | |||
used to retrieve the complete address information (including all | used to retrieve the complete address information (including all | |||
device addresses for the device ID) regarding the storage device | device addresses for the device ID) regarding the storage device | |||
according to its layout type and device ID. For example, the address | according to its layout type and device ID. For example, the address | |||
of an NFSv4.1 data server or of an object storage device could be an | of an NFSv4.1 data server or of an object-based storage device could | |||
IP address and port. The address of a block storage device could be | be an IP address and port. The address of a block storage device | |||
a volume label. | could be a volume label. | |||
Clients cannot expect the mapping between a device ID and its storage | Clients cannot expect the mapping between a device ID and its storage | |||
device address(es) to persist across metadata server restart. See | device address(es) to persist across metadata server restart. See | |||
Section 12.7.4 for a description of how recovery works in that | Section 12.7.4 for a description of how recovery works in that | |||
situation. | situation. | |||
A device ID lives as long as there is a layout referring to the | A device ID lives as long as there is a layout referring to the | |||
device ID. If there are no layouts referring to the device ID, the | device ID. If there are no layouts referring to the device ID, the | |||
server is free to delete the device ID any time. Once a device ID is | server is free to delete the device ID any time. Once a device ID is | |||
deleted by the server, the server MUST NOT reuse the device ID for | deleted by the server, the server MUST NOT reuse the device ID for | |||
the same layout type and client ID again. This requirement is | the same layout type and client ID again. This requirement is | |||
feasible because the device ID is 16 bytes long, leaving sufficient | feasible because the device ID is 16 bytes long, leaving sufficient | |||
room to store a generation number if server's implementation requires | room to store a generation number if the server's implementation | |||
most of the rest of the device ID's content to be reused. This | requires most of the rest of the device ID's content to be reused. | |||
requirement is necessary because otherwise the race conditions | This requirement is necessary because otherwise the race conditions | |||
between asynchronous notification of device ID addition and deletion | between asynchronous notification of device ID addition and deletion | |||
would be too difficult to sort out. | would be too difficult to sort out. | |||
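One way a server might satisfy the no-reuse requirement is to reserve part of the 16-byte device ID for a generation number. The sketch below assumes an 8-byte slot index followed by an 8-byte generation; that split, the class name, and the use of one allocator per <client ID, layout type> scope are illustrative assumptions, not something the protocol prescribes.

```python
import struct

NFS4_DEVICEID4_SIZE = 16  # length of a deviceid4 in bytes


class DeviceIdAllocator:
    """Hypothetical allocator for one <client ID, layout type> scope."""

    def __init__(self):
        # slot index -> generation of the most recent ID issued for that slot
        self._generation = {}

    def allocate(self, slot):
        """Return a device ID for `slot` that was never issued before."""
        gen = self._generation.get(slot, 0) + 1
        self._generation[slot] = gen
        # Assumed layout: 8-byte slot, then 8-byte generation, big-endian.
        return struct.pack(">QQ", slot, gen)
```

Reusing a slot after its device ID has been deleted yields a different 16-byte value, so a late deletion notification for the old ID cannot be confused with the new one.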
Device ID to device address mappings are not leased, and can be | Device ID to device address mappings are not leased, and can be | |||
changed at any time. (Note that while device ID to device address | changed at any time. (Note that while device ID to device address | |||
mappings are likely to change after the metadata server restarts, the | mappings are likely to change after the metadata server restarts, the | |||
server is not required to change the mappings.) A server has two | server is not required to change the mappings.) A server has two | |||
choices for changing mappings. It can recall all layouts referring | choices for changing mappings. It can recall all layouts referring | |||
to the device ID or it can use a notification mechanism. | to the device ID or it can use a notification mechanism. | |||
The NFSv4.1 protocol has no optimal way to recall all layouts that | The NFSv4.1 protocol has no optimal way to recall all layouts that | |||
referred to a particular device ID (unless the server associates a | referred to a particular device ID (unless the server associates a | |||
single device ID with a single fsid or a single client ID; in which | single device ID with a single fsid or a single client ID; in which | |||
case, CB_LAYOUTRECALL has options for recalling all layouts | case, CB_LAYOUTRECALL has options for recalling all layouts | |||
associated with the fsid, client ID pair or just the client ID). | associated with the fsid, client ID pair, or just the client ID). | |||
Via a notification mechanism (see Section 20.12), device ID to device | Via a notification mechanism (see Section 20.12), device ID to device | |||
address mappings can change over the duration of server operation | address mappings can change over the duration of server operation | |||
without recalling or revoking the layouts that refer to the device ID. | without recalling or revoking the layouts that refer to the device ID. | |||
The notification mechanism can also delete a device ID, but only if | The notification mechanism can also delete a device ID, but only if | |||
the client has no layouts referring to the device ID. A notification | the client has no layouts referring to the device ID. A notification | |||
of a change to a device ID to device address mapping will immediately | of a change to a device ID to device address mapping will immediately | |||
or eventually invalidate some or all of the device ID's mappings. | or eventually invalidate some or all of the device ID's mappings. | |||
The server MUST support notifications and the client must request | The server MUST support notifications and the client must request | |||
them before they can be used. For further information about the | them before they can be used. For further information about the | |||
notification types, see Section 20.12. | notification types, see Section 20.12. | |||
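A client-side cache of device ID to address mappings might react to these notifications roughly as follows. The class, its fields, and the "stale" marking are assumptions made for illustration; only the two notification type values come from the NFSv4.1 XDR, and the `immediate` flag stands in for the immediately-or-eventually distinction described above.

```python
# notify_deviceid_type4 values from the NFSv4.1 XDR definition.
NOTIFY_DEVICEID4_CHANGE = 1
NOTIFY_DEVICEID4_DELETE = 2


class DeviceCache:
    """Hypothetical client cache of device ID -> address information."""

    def __init__(self):
        self.addresses = {}  # device ID -> cached address info
        self.stale = set()   # device IDs whose mapping will change later

    def lookup(self, device_id):
        """Return cached address info, or None if GETDEVICEINFO is needed."""
        return self.addresses.get(device_id)

    def on_notify(self, kind, device_id, immediate=False):
        if kind == NOTIFY_DEVICEID4_DELETE:
            # Only valid when the client holds no layouts using this ID.
            self.addresses.pop(device_id, None)
            self.stale.discard(device_id)
        elif kind == NOTIFY_DEVICEID4_CHANGE:
            if immediate:
                # Drop the mapping so the next use refetches it.
                self.addresses.pop(device_id, None)
                self.stale.discard(device_id)
            elif device_id in self.addresses:
                # Mapping is still usable for now; remember to refresh it.
                self.stale.add(device_id)
```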
12.3. pNFS Operations | 12.3. pNFS Operations | |||
NFSv4.1 has several operations that are needed for pNFS servers, | NFSv4.1 has several operations that are needed for pNFS servers, | |||
regardless of layout type or storage protocol. These operations are | regardless of layout type or storage protocol. These operations are | |||
all sent to a metadata server and summarized here. While pNFS is an | all sent to a metadata server and summarized here. While pNFS is an | |||
OPTIONAL feature, if pNFS is implemented, some operations are | OPTIONAL feature, if pNFS is implemented, some operations are | |||
REQUIRED in order to comply with pNFS. See Section 17. | REQUIRED in order to comply with pNFS. See Section 17. | |||
These are the fore channel pNFS operations: | These are the fore channel pNFS operations: | |||
GETDEVICEINFO. As noted previously (Section 12.2.10), GETDEVICEINFO | GETDEVICEINFO (Section 18.40), as noted previously | |||
(Section 18.40) returns the mapping of device ID to storage device | (Section 12.2.10), returns the mapping of device ID to storage | |||
address. | device address. | |||
GETDEVICELIST (Section 18.41), allows clients to fetch all device | GETDEVICELIST (Section 18.41) allows clients to fetch all device IDs | |||
IDs for a specific file system. | for a specific file system. | |||
LAYOUTGET (Section 18.43) is used by a client to get a layout for a | LAYOUTGET (Section 18.43) is used by a client to get a layout for a | |||
file. | file. | |||
LAYOUTCOMMIT (Section 18.42) is used to inform the metadata server | LAYOUTCOMMIT (Section 18.42) is used to inform the metadata server | |||
of the client's intent to commit data which has been written to | of the client's intent to commit data that has been written to the | |||
the storage device; the storage device as originally indicated in | storage device (the storage device as originally indicated in the | |||
the return value of LAYOUTGET. | return value of LAYOUTGET). | |||
LAYOUTRETURN (Section 18.44) is used to return layouts for a file, | LAYOUTRETURN (Section 18.44) is used to return layouts for a file, a | |||
an FSID and for client ID. | file system ID (FSID), or a client ID. | |||
These are the backchannel pNFS operations: | These are the backchannel pNFS operations: | |||
CB_LAYOUTRECALL (Section 20.3) recalls a layout or all layouts | CB_LAYOUTRECALL (Section 20.3) recalls a layout, all layouts | |||
belonging to a file system, or all layouts belonging to a client | belonging to a file system, or all layouts belonging to a client | |||
ID. | ID. | |||
CB_RECALL_ANY (Section 20.6), tells a client that it needs to return | CB_RECALL_ANY (Section 20.6) tells a client that it needs to return | |||
some number of recallable objects, including layouts, to the | some number of recallable objects, including layouts, to the | |||
metadata server. | metadata server. | |||
CB_RECALLABLE_OBJ_AVAIL (Section 20.7) tells a client that a | CB_RECALLABLE_OBJ_AVAIL (Section 20.7) tells a client that a | |||
recallable object that it was denied (in case of pNFS, a layout, | recallable object that it was denied (in case of pNFS, a layout | |||
denied by LAYOUTGET) due to resource exhaustion, is now available. | denied by LAYOUTGET) due to resource exhaustion is now available. | |||
CB_NOTIFY_DEVICEID (Section 20.12) Notifies the client of changes to | CB_NOTIFY_DEVICEID (Section 20.12) notifies the client of changes to | |||
device IDs. | device IDs. | |||
12.4. pNFS Attributes | 12.4. pNFS Attributes | |||
A number of attributes specific to pNFS are listed and described in | A number of attributes specific to pNFS are listed and described in | |||
Section 5.12 | Section 5.12. | |||
12.5. Layout Semantics | 12.5. Layout Semantics | |||
12.5.1. Guarantees Provided by Layouts | 12.5.1. Guarantees Provided by Layouts | |||
Layouts grant to the client the ability to access data located at a | Layouts grant to the client the ability to access data located at a | |||
storage device with the appropriate storage protocol. The client is | storage device with the appropriate storage protocol. The client is | |||
guaranteed the layout will be recalled when one of two things occurs; | guaranteed the layout will be recalled when one of two things occurs: | |||
either a conflicting layout is requested or the state encapsulated by | either a conflicting layout is requested or the state encapsulated by | |||
the layout becomes invalid and this can happen when an event directly | the layout becomes invalid (this can happen when an event directly or | |||
or indirectly modifies the layout. When a layout is recalled and | indirectly modifies the layout). When a layout is recalled and | |||
returned by the client, the client continues with the ability to | returned by the client, the client continues with the ability to | |||
access file data with normal NFSv4.1 operations through the metadata | access file data with normal NFSv4.1 operations through the metadata | |||
server. Only the ability to access the storage devices is affected. | server. Only the ability to access the storage devices is affected. | |||
The requirement of NFSv4.1, that all user access rights MUST be | The requirement of NFSv4.1 that all user access rights MUST be | |||
obtained through the appropriate open, lock, and access operations, | obtained through the appropriate OPEN, LOCK, and ACCESS operations is | |||
is not modified with the existence of layouts. Layouts are provided | not modified with the existence of layouts. Layouts are provided to | |||
to NFSv4.1 clients and user access still follows the rules of the | NFSv4.1 clients, and user access still follows the rules of the | |||
protocol as if they did not exist. It is a requirement that for a | protocol as if they did not exist. It is a requirement that for a | |||
client to access a storage device, a layout must be held by the | client to access a storage device, a layout must be held by the | |||
client. If a storage device receives an I/O for a byte range for | client. If a storage device receives an I/O request for a byte-range | |||
which the client does not hold a layout, the storage device SHOULD | for which the client does not hold a layout, the storage device | |||
reject that I/O request. Note that the act of modifying a file for | SHOULD reject that I/O request. Note that the act of modifying a | |||
which a layout is held, does not necessarily conflict with the | file for which a layout is held does not necessarily conflict with | |||
holding of the layout that describes the file being modified. | the holding of the layout that describes the file being modified. | |||
Therefore, it is the requirement of the storage protocol or layout | Therefore, it is the requirement of the storage protocol or layout | |||
type that determines the necessary behavior. For example, block/ | type that determines the necessary behavior. For example, block/ | |||
volume layout types require that the layout's iomode agree with the | volume layout types require that the layout's iomode agree with the | |||
type of I/O being performed. | type of I/O being performed. | |||
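The check that an I/O request lies within byte ranges for which the client holds layouts can be sketched as an interval-coverage test. The function name and the (offset, length) representation are assumptions for illustration; how a storage device actually learns which layouts a client holds depends on the layout type and control protocol.

```python
def covered_by_layouts(layouts, offset, length):
    """Return True if [offset, offset + length) lies inside the held layouts.

    `layouts` is a list of (offset, length) byte ranges held by the client.
    Hypothetical helper; it ignores iomode and looks only at coverage.
    """
    end = offset + length
    pos = offset  # first byte not yet known to be covered
    for lo_off, lo_len in sorted(layouts):
        if lo_off > pos:
            break  # gap before this layout starts
        pos = max(pos, lo_off + lo_len)
        if pos >= end:
            return True
    return pos >= end
```

A storage device following the SHOULD above would reject the request whenever this returns False.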
Depending upon the layout type and storage protocol in use, storage | Depending upon the layout type and storage protocol in use, storage | |||
device access permissions may be granted by LAYOUTGET and may be | device access permissions may be granted by LAYOUTGET and may be | |||
encoded within the type-specific layout. For an example of storage | encoded within the type-specific layout. For an example of storage | |||
device access permissions, see an object based protocol such as [50]. | device access permissions, see an object-based protocol such as [50]. | |||
If access permissions are encoded within the layout, the metadata | If access permissions are encoded within the layout, the metadata | |||
server SHOULD recall the layout when those permissions become invalid | server SHOULD recall the layout when those permissions become invalid | |||
for any reason; for example when a file becomes unwritable or | for any reason -- for example, when a file becomes unwritable or | |||
inaccessible to a client. Note, clients are still required to | inaccessible to a client. Note, clients are still required to | |||
perform the appropriate access operations with open, lock and access | perform the appropriate OPEN, LOCK, and ACCESS operations as | |||
as described above. The degree to which it is possible for the | described above. The degree to which it is possible for the client | |||
client to circumvent these access operations and the consequences of | to circumvent these operations and the consequences of doing so must | |||
doing so must be clearly specified by the individual layout type | be clearly specified by the individual layout type specifications. | |||
specifications. In addition, these specifications must be clear | In addition, these specifications must be clear about the | |||
about the requirements and non-requirements for the checking | requirements and non-requirements for the checking performed by the | |||
performed by the server. | server. | |||
In the presence of pNFS functionality, mandatory file locks MUST | In the presence of pNFS functionality, mandatory byte-range locks | |||
behave as they would without pNFS. Therefore, if mandatory file | MUST behave as they would without pNFS. Therefore, if mandatory file | |||
locks and layouts are provided simultaneously, the storage device | locks and layouts are provided simultaneously, the storage device | |||
MUST be able to enforce the mandatory file locks. For example, if | MUST be able to enforce the mandatory byte-range locks. For example, | |||
one client obtains a mandatory lock and a second client accesses the | if one client obtains a mandatory byte-range lock and a second client | |||
storage device, the storage device MUST appropriately restrict I/O | accesses the storage device, the storage device MUST appropriately | |||
for the byte range of the mandatory file lock. If the storage device | restrict I/O for the range of the mandatory byte-range lock. If the | |||
is incapable of providing this check in the presence of mandatory | storage device is incapable of providing this check in the presence | |||
file locks, the metadata server then MUST NOT grant layouts and | of mandatory byte-range locks, then the metadata server MUST NOT | |||
mandatory file locks simultaneously. | grant layouts and mandatory byte-range locks simultaneously. | |||
12.5.2. Getting a Layout | 12.5.2. Getting a Layout | |||
A client obtains a layout with the LAYOUTGET operation. The metadata | A client obtains a layout with the LAYOUTGET operation. The metadata | |||
server will grant layouts of a particular type (e.g., block/volume, | server will grant layouts of a particular type (e.g., block/volume, | |||
object, or file). The client selects an appropriate layout type that | object, or file). The client selects an appropriate layout type that | |||
the server supports and the client is prepared to use. The layout | the server supports and the client is prepared to use. The layout | |||
returned to the client might not exactly match the requested byte | returned to the client might not exactly match the requested byte- | |||
range as described in Section 18.43.3. As needed a client may make | range as described in Section 18.43.3. As needed a client may send | |||
multiple LAYOUTGET requests; these might result in multiple | multiple LAYOUTGET operations; these might result in multiple | |||
overlapping, non-conflicting layouts (see Section 12.2.8). | overlapping, non-conflicting layouts (see Section 12.2.8). | |||
In order to get a layout, the client must first have opened the file | In order to get a layout, the client must first have opened the file | |||
via the OPEN operation. When a client has no layout on a file, it | via the OPEN operation. When a client has no layout on a file, it | |||
MUST present a stateid as returned by OPEN, a delegation stateid, or | MUST present an open stateid, a delegation stateid, or a byte-range | |||
a byte-range lock stateid in the loga_stateid argument. A successful | lock stateid in the loga_stateid argument. A successful LAYOUTGET | |||
LAYOUTGET result includes a layout stateid. The first successful | result includes a layout stateid. The first successful LAYOUTGET | |||
LAYOUTGET processed by the server using a non-layout stateid as an | processed by the server using a non-layout stateid as an argument | |||
argument MUST have the "seqid" field of the layout stateid in the | MUST have the "seqid" field of the layout stateid in the response set | |||
response set to one. Thereafter, the client MUST use a layout | to one. Thereafter, the client MUST use a layout stateid (see | |||
stateid (see Section 12.5.3) on future invocations of LAYOUTGET on | Section 12.5.3) on future invocations of LAYOUTGET on the file, and | |||
the file, and the "seqid" MUST NOT be set to zero. Once the layout | the "seqid" MUST NOT be set to zero. Once the layout has been | |||
has been retrieved, it can be held across multiple OPEN and CLOSE | retrieved, it can be held across multiple OPEN and CLOSE sequences. | |||
sequences. Therefore, a client may hold a layout for a file that is | Therefore, a client may hold a layout for a file that is not | |||
not currently open by any user on the client. This allows for the | currently open by any user on the client. This allows for the | |||
caching of layouts beyond CLOSE. | caching of layouts beyond CLOSE. | |||
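The choice of stateid for the loga_stateid argument can be sketched as follows. The Stateid tuple and the function name are hypothetical; `non_layout_stateid` stands for whichever open, delegation, or byte-range lock stateid the client holds.

```python
from collections import namedtuple

# Minimal stand-in for stateid4: a sequence number plus an opaque "other".
Stateid = namedtuple("Stateid", ["seqid", "other"])


def stateid_for_layoutget(layout_stateid, non_layout_stateid):
    """Pick the stateid to send in loga_stateid (illustrative sketch).

    With no layout on the file, the client presents an open, delegation,
    or byte-range lock stateid. Once a layout stateid exists, it must be
    used, and its seqid must not be zero.
    """
    if layout_stateid is None:
        return non_layout_stateid
    if layout_stateid.seqid == 0:
        raise ValueError("layout stateid seqid MUST NOT be zero")
    return layout_stateid
```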
The storage protocol used by the client to access the data on the | The storage protocol used by the client to access the data on the | |||
storage device is determined by the layout's type. The client is | storage device is determined by the layout's type. The client is | |||
responsible for matching the layout type with an available method to | responsible for matching the layout type with an available method to | |||
interpret and use the layout. The method for this layout type | interpret and use the layout. The method for this layout type | |||
selection is outside the scope of the pNFS functionality. | selection is outside the scope of the pNFS functionality. | |||
Although the metadata server is in control of the layout for a file, | Although the metadata server is in control of the layout for a file, | |||
the pNFS client can provide hints to the server when a file is opened | the pNFS client can provide hints to the server when a file is opened | |||
or created about the preferred layout type and aggregation schemes. | or created about the preferred layout type and aggregation schemes. | |||
pNFS introduces a layout_hint (Section 5.12.4) attribute that the | pNFS introduces a layout_hint attribute (Section 5.12.4) that the | |||
client can set at file creation time to provide a hint to the server | client can set at file creation time to provide a hint to the server | |||
for new files. Setting this attribute separately, after the file has | for new files. Setting this attribute separately, after the file has | |||
been created, might make it difficult, or impossible, for the server | been created, might make it difficult, or impossible, for the server | |||
implementation to comply. | implementation to comply. | |||
Because the EXCLUSIVE4 createmode4 does not allow the setting of | Because the EXCLUSIVE4 createmode4 does not allow the setting of | |||
attributes at file creation time, NFSv4.1 introduces the EXCLUSIVE4_1 | attributes at file creation time, NFSv4.1 introduces the EXCLUSIVE4_1 | |||
createmode4, which does allow attributes to be set at file creation | createmode4, which does allow attributes to be set at file creation | |||
time. In addition, if the session is created with persistent reply | time. In addition, if the session is created with persistent reply | |||
caches, EXCLUSIVE4_1 is neither necessary nor allowed. Instead, | caches, EXCLUSIVE4_1 is neither necessary nor allowed. Instead, | |||
GUARDED4 both works better and is prescribed. Table 10 in | GUARDED4 both works better and is prescribed. Table 10 in | |||
Section 18.16.3, summarizes how a client is allowed to send an | Section 18.16.3 summarizes how a client is allowed to send an | |||
exclusive create. | exclusive create. | |||
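The rule stated in this paragraph can be reduced to a one-line decision. The sketch below covers only the two cases the paragraph names and is not a substitute for Table 10 in Section 18.16.3; the function name is hypothetical, and the constants are the createmode4 values from the NFSv4.1 XDR.

```python
# createmode4 values from the NFSv4.1 XDR definition.
UNCHECKED4 = 0
GUARDED4 = 1
EXCLUSIVE4 = 2
EXCLUSIVE4_1 = 3


def createmode_for_exclusive_create(session_is_persistent):
    """Choose a createmode4 that lets attributes be set at creation time.

    Illustrative only: with a persistent reply cache, GUARDED4 is
    prescribed; otherwise EXCLUSIVE4_1 allows attributes such as
    layout_hint to accompany the create.
    """
    return GUARDED4 if session_is_persistent else EXCLUSIVE4_1
```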
12.5.3. Layout Stateid | 12.5.3. Layout Stateid | |||
As with all other stateids, the layout stateid consists of a "seqid" | As with all other stateids, the layout stateid consists of a "seqid" | |||
and "other" field. Once a layout stateid is changed, the "other" | and "other" field. Once a layout stateid is changed, the "other" | |||
field will stay constant unless the stateid is revoked, or the client | field will stay constant unless the stateid is revoked or the client | |||
returns all layouts on the file and the server disposes of the | returns all layouts on the file and the server disposes of the | |||
stateid. The "seqid" field is initially set to one, and is never | stateid. The "seqid" field is initially set to one, and is never | |||
zero on any NFSv4.1 operation that uses layout stateids, whether it | zero on any NFSv4.1 operation that uses layout stateids, whether it | |||
is a fore channel or backchannel operation. After the layout stateid | is a fore channel or backchannel operation. After the layout stateid | |||
is established, the server increments by one the value of the "seqid" | is established, the server increments by one the value of the "seqid" | |||
in each subsequent LAYOUTGET and LAYOUTRETURN response, and in each | in each subsequent LAYOUTGET and LAYOUTRETURN response, and in each | |||
CB_LAYOUTRECALL request. | CB_LAYOUTRECALL request. | |||
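On the server side, the seqid rule amounts to a per-file, per-client counter that starts at one and advances with each layout operation. The class below is a hypothetical model of that counter; it does not handle 32-bit wraparound or disposal of the stateid.

```python
class ServerLayoutState:
    """Hypothetical server record for one client's layout stateid on a file."""

    def __init__(self, other):
        self.other = other  # constant while the stateid remains valid
        self.seqid = 0      # no layout stateid has been issued yet

    def next_stateid(self):
        """Advance the seqid and return the (seqid, other) pair.

        Intended to be called once per LAYOUTGET or LAYOUTRETURN response
        and once per CB_LAYOUTRECALL request.
        """
        self.seqid += 1
        return (self.seqid, self.other)
```

The first call returns a seqid of one, matching the requirement for the first successful LAYOUTGET.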
Given the design goal of pNFS to provide parallelism, the layout | Given the design goal of pNFS to provide parallelism, the layout | |||
stateid differs from other stateid types in that the client is | stateid differs from other stateid types in that the client is | |||
expected to send LAYOUTGET and LAYOUTRETURN operations in parallel. | expected to send LAYOUTGET and LAYOUTRETURN operations in parallel. | |||
The "seqid" value is used by the client to properly sort responses to | The "seqid" value is used by the client to properly sort responses to | |||
LAYOUTGET and LAYOUTRETURN. The "seqid" is also used to prevent race | LAYOUTGET and LAYOUTRETURN. The "seqid" is also used to prevent race | |||
conditions between LAYOUTGET and CB_LAYOUTRECALL. Given the | conditions between LAYOUTGET and CB_LAYOUTRECALL. Given that the | |||
processing rules differ between layout stateids and other stateid types, | processing rules differ between layout stateids and other stateid types, | |||
only the pNFS sections of this document should be considered to | only the pNFS sections of this document should be considered to | |||
determine proper layout stateid handling. | determine proper layout stateid handling. | |||
Once the client receives a layout stateid, it MUST use the correct | Once the client receives a layout stateid, it MUST use the correct | |||
"seqid" for subsequent LAYOUTGET or LAYOUTRETURN operations. The | "seqid" for subsequent LAYOUTGET or LAYOUTRETURN operations. The | |||
correct "seqid" is defined as the highest "seqid" value from | correct "seqid" is defined as the highest "seqid" value from | |||
responses of fully processed LAYOUTGET or LAYOUTRETURN operations or | responses of fully processed LAYOUTGET or LAYOUTRETURN operations or | |||
arguments of a fully processed CB_LAYOUTRECALL operation. Since the | arguments of a fully processed CB_LAYOUTRECALL operation. Since the | |||
server is incrementing the "seqid" value on each layout operation, | server is incrementing the "seqid" value on each layout operation, | |||
skipping to change at page 287, line 30 | skipping to change at page 287, line 35 | |||
seqid. For CB_LAYOUTRECALL arguments, the client MUST send a | seqid. For CB_LAYOUTRECALL arguments, the client MUST send a | |||
response to the recall before using the seqid. The fundamental | response to the recall before using the seqid. The fundamental | |||
requirement in client processing is that the "seqid" is used to | requirement in client processing is that the "seqid" is used to | |||
provide the order of processing. LAYOUTGET results may be processed | provide the order of processing. LAYOUTGET results may be processed | |||
in parallel. LAYOUTRETURN results may be processed in parallel. | in parallel. LAYOUTRETURN results may be processed in parallel. | |||
LAYOUTGET and LAYOUTRETURN responses may be processed in parallel as | LAYOUTGET and LAYOUTRETURN responses may be processed in parallel as | |||
long as the ranges do not overlap. CB_LAYOUTRECALL requests MUST be | long as the ranges do not overlap. CB_LAYOUTRECALL requests MUST be | |||
processed in "seqid" order at all times. | processed in "seqid" order at all times. | |||
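A client can track the "correct" seqid by remembering the highest value seen among fully processed operations. The class and method names below are hypothetical, and the sketch leaves out seqid wraparound and the ordering of CB_LAYOUTRECALL processing itself.

```python
class ClientLayoutSeqid:
    """Hypothetical tracker for the seqid to use in the next layout operation."""

    def __init__(self):
        self.highest = None  # no layout stateid received yet

    def fully_processed(self, seqid):
        """Record a seqid once its operation has been fully processed.

        For LAYOUTGET and LAYOUTRETURN, this means the response has been
        processed; for CB_LAYOUTRECALL, the reply to the recall has been
        sent.
        """
        if self.highest is None or seqid > self.highest:
            self.highest = seqid

    def current(self):
        """Return the seqid to send next, or None if there is none yet."""
        return self.highest
```

Responses that arrive out of order do not move the tracked value backwards: recording 3 and then 2 leaves the current seqid at 3.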
Once a client has no more layouts on a file, the layout stateid is no | Once a client has no more layouts on a file, the layout stateid is no | |||
longer valid, and MUST NOT be used. Any attempt to use such a layout | longer valid and MUST NOT be used. Any attempt to use such a layout | |||
stateid will result in NFS4ERR_BAD_STATEID. | stateid will result in NFS4ERR_BAD_STATEID. | |||
12.5.4. Committing a Layout | 12.5.4. Committing a Layout | |||
Allowing for varying storage protocols capabilities, the pNFS | Allowing for varying storage protocol capabilities, the pNFS protocol | |||
protocol does not require the metadata server and storage devices to | does not require the metadata server and storage devices to have a | |||
have a consistent view of file attributes and data location mappings. | consistent view of file attributes and data location mappings. Data | |||
Data location mapping refers to aspects such as which offsets store | location mapping refers to aspects such as which offsets store data | |||
data as opposed to storing holes (see Section 13.4.4 for a | as opposed to storing holes (see Section 13.4.4 for a discussion). | |||
discussion). Related issues arise for storage protocols where a | Related issues arise for storage protocols where a layout may hold | |||
layout may hold provisionally allocated blocks where the allocation | provisionally allocated blocks where the allocation of those blocks | |||
of those blocks does not survive a complete restart of both the | does not survive a complete restart of both the client and server. | |||
client and server. Because of this inconsistency, it is necessary to | Because of this inconsistency, it is necessary to resynchronize the | |||
re-synchronize the client with the metadata server and its storage | client with the metadata server and its storage devices and make any | |||
devices and make any potential changes available to other clients. | potential changes available to other clients. This is accomplished | |||
This is accomplished by use of the LAYOUTCOMMIT operation. | by use of the LAYOUTCOMMIT operation. | |||
The LAYOUTCOMMIT operation is responsible for committing a modified | The LAYOUTCOMMIT operation is responsible for committing a modified | |||
layout to the metadata server. The data should be written and | layout to the metadata server. The data should be written and | |||
committed to the appropriate storage devices before the LAYOUTCOMMIT | committed to the appropriate storage devices before the LAYOUTCOMMIT | |||
occurs. The scope of the LAYOUTCOMMIT operation depends on the | occurs. The scope of the LAYOUTCOMMIT operation depends on the | |||
storage protocol in use. It is important to note that the level of | storage protocol in use. It is important to note that the level of | |||
synchronization is from the point of view of the client which sent | synchronization is from the point of view of the client that sent the | |||
the LAYOUTCOMMIT. The updated state on the metadata server need only | LAYOUTCOMMIT. The updated state on the metadata server need only | |||
reflect the state as of the client's last operation previous to the | reflect the state as of the client's last operation previous to the | |||
LAYOUTCOMMIT. It is not REQUIRED to maintain a global view that | LAYOUTCOMMIT. The metadata server is not REQUIRED to maintain a | |||
accounts for other clients' I/O that may have occurred within the | global view that accounts for other clients' I/O that may have | |||
same time frame. | occurred within the same time frame. | |||
For block/volume-based layouts, LAYOUTCOMMIT may require updating the | For block/volume-based layouts, LAYOUTCOMMIT may require updating the | |||
block list that comprises the file and committing this layout to | block list that comprises the file and committing this layout to | |||
stable storage. For file-based layouts synchronization of attributes | stable storage. For file-based layouts, synchronization of | |||
between the metadata and storage devices (primarily the size | attributes between the metadata and storage devices, primarily the | |||
attribute) is required. | size attribute, is required. | |||
The control protocol is free to synchronize the attributes before it | The control protocol is free to synchronize the attributes before it | |||
receives a LAYOUTCOMMIT, however upon successful completion of a | receives a LAYOUTCOMMIT; however, upon successful completion of a | |||
LAYOUTCOMMIT, state that exists on the metadata server that describes | LAYOUTCOMMIT, state that exists on the metadata server that describes | |||
the file MUST be synchronized with the state existing on the storage | the file MUST be synchronized with the state that exists on the | |||
devices that comprise that file as of the time of the client's last | storage devices that comprise that file as of the client's last sent | |||
sent operation. Thus, a client that queries the size of a file | operation. Thus, a client that queries the size of a file between a | |||
between a WRITE to a storage device and the LAYOUTCOMMIT might | WRITE to a storage device and the LAYOUTCOMMIT might observe a size | |||
observe a size that does not reflect the actual data written. | that does not reflect the actual data written. | |||
The client MUST have a layout in order to send a LAYOUTCOMMIT | The client MUST have a layout in order to send a LAYOUTCOMMIT | |||
operation. | operation. | |||
12.5.4.1. LAYOUTCOMMIT and change/time_modify | 12.5.4.1. LAYOUTCOMMIT and change/time_modify | |||
The change and time_modify attributes may be updated by the server | The change and time_modify attributes may be updated by the server | |||
when the LAYOUTCOMMIT operation is processed. The reason for this is | when the LAYOUTCOMMIT operation is processed. The reason for this is | |||
that some layout types do not support the update of these attributes | that some layout types do not support the update of these attributes | |||
when the storage devices process I/O operations. If client has a | when the storage devices process I/O operations. If a client has a | |||
layout with the LAYOUTIOMODE4_RW iomode on the file, the client MAY | layout with the LAYOUTIOMODE4_RW iomode on the file, the client MAY | |||
provide a suggested value to the server for time_modify within the | provide a suggested value to the server for time_modify within the | |||
arguments to LAYOUTCOMMIT. Based on the layout type, the provided | arguments to LAYOUTCOMMIT. Based on the layout type, the provided | |||
value may or may not be used. The server should sanity check the | value may or may not be used. The server should sanity-check the | |||
client provided values before they are used. For example, the server | client-provided values before they are used. For example, the server | |||
should ensure that time does not flow backwards. The client always | should ensure that time does not flow backwards. The client always | |||
has the option to set time_modify through an explicit SETATTR | has the option to set time_modify through an explicit SETATTR | |||
operation. | operation. | |||
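The sanity check on a client-suggested time_modify can be sketched as follows. This is only an illustration, not part of either document version: the function name, the nanosecond integer representation, and the policy of clamping future timestamps to the server clock are all assumptions. The one rule taken from the text is that time must not flow backwards.

```python
def sanity_check_time_modify(current_ns: int, suggested_ns: int, server_now_ns: int) -> int:
    """Return the time_modify value a server might store at LAYOUTCOMMIT.

    current_ns    -- time_modify currently recorded for the file
    suggested_ns  -- value suggested by the client in LAYOUTCOMMIT
    server_now_ns -- the server's own clock (hypothetical input)
    """
    # Assumed policy: do not accept a timestamp later than the server clock.
    candidate = min(suggested_ns, server_now_ns)
    # Rule from the text: time must not flow backwards.
    return max(current_ns, candidate)
```

A server using a layout type that ignores the suggested value would simply skip this step.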
For some layout protocols, the storage device is able to notify the | For some layout protocols, the storage device is able to notify the | |||
metadata server of the occurrence of an I/O and as a result the | metadata server of the occurrence of an I/O; as a result, the change | |||
change and time_modify attributes may be updated at the metadata | and time_modify attributes may be updated at the metadata server. | |||
server. For a metadata server that is capable of monitoring updates | ||||
to the change and time_modify attributes, LAYOUTCOMMIT processing is | For a metadata server that is capable of monitoring updates to the | |||
not required to update the change attribute; in this case the | change and time_modify attributes, LAYOUTCOMMIT processing is not | |||
metadata server must ensure that no further update to the data has | required to update the change attribute. In this case, the metadata | |||
occurred since the last update of the attributes; file-based | server must ensure that no further update to the data has occurred | |||
protocols may have enough information to make this determination or | since the last update of the attributes; file-based protocols may | |||
may update the change attribute upon each file modification. This | have enough information to make this determination or may update the | |||
also applies for the time_modify attribute. If the server | change attribute upon each file modification. This also applies for | |||
implementation is able to determine that the file has not been | the time_modify attribute. If the server implementation is able to | |||
modified since the last time_modify update, the server need not | determine that the file has not been modified since the last | |||
update time_modify at LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the | time_modify update, the server need not update time_modify at | |||
updated attributes should be visible if that file was modified since | LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated attributes | |||
the latest previous LAYOUTCOMMIT or LAYOUTGET. | should be visible if that file was modified since the latest previous | |||
LAYOUTCOMMIT or LAYOUTGET. | ||||
12.5.4.2. LAYOUTCOMMIT and size | 12.5.4.2. LAYOUTCOMMIT and size | |||
The size of a file may be updated when the LAYOUTCOMMIT operation is | The size of a file may be updated when the LAYOUTCOMMIT operation is | |||
used by the client. One of the fields in the argument to | used by the client. One of the fields in the argument to | |||
LAYOUTCOMMIT is loca_last_write_offset; this field indicates the | LAYOUTCOMMIT is loca_last_write_offset; this field indicates the | |||
highest byte offset written but not yet committed with the | highest byte offset written but not yet committed with the | |||
LAYOUTCOMMIT operation. The data type of loca_last_write_offset is | LAYOUTCOMMIT operation. The data type of loca_last_write_offset is | |||
newoffset4 and is switched on a boolean value, no_newoffset, that | newoffset4 and is switched on a boolean value, no_newoffset, that | |||
indicates if a previous write occurred or not. If no_newoffset is | indicates if a previous write occurred or not. If no_newoffset is | |||
FALSE, an offset is not given. If the client has a layout with | FALSE, an offset is not given. If the client has a layout with | |||
LAYOUTIOMODE4_RW iomode on the file, with an lo_offset and lo_length | LAYOUTIOMODE4_RW iomode on the file, with a byte-range (denoted by | |||
that overlaps loca_last_write_offset, then the client MAY set | the values of lo_offset and lo_length) that overlaps | |||
no_newoffset to TRUE and provide an offset that will update the file | loca_last_write_offset, then the client MAY set no_newoffset to TRUE | |||
size. Keep in mind that offset is not the same as length, though | and provide an offset that will update the file size. Keep in mind | |||
they are related. For example, a loca_last_write_offset value of | that offset is not the same as length, though they are related. For | |||
zero means that one byte was written at offset zero, and so the | example, a loca_last_write_offset value of zero means that one byte | |||
length of the file is at least one byte. | was written at offset zero, and so the length of the file is at least | |||
one byte. | ||||
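The relationship between offset and length can be illustrated from the client side. In this minimal sketch, the NewOffset4 class is a hypothetical Python stand-in for the newoffset4 union, the no_offset field name is an assumption, and the list of (offset, count) write records is an assumed bookkeeping structure that neither document version defines.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class NewOffset4:
    """Hypothetical stand-in for the newoffset4 union."""
    no_newoffset: bool               # TRUE means an offset is given
    no_offset: Optional[int] = None  # highest byte offset written (assumed field name)


def last_write_offset(writes: Iterable[Tuple[int, int]]) -> NewOffset4:
    """Compute loca_last_write_offset from (offset, count) write records."""
    highest = None
    for offset, count in writes:
        if count <= 0:
            continue  # a zero-length write touches no byte
        last_byte = offset + count - 1  # offset of the final byte written
        if highest is None or last_byte > highest:
            highest = last_byte
    if highest is None:
        return NewOffset4(False)  # no write occurred, so no offset is given
    return NewOffset4(True, highest)
```

A single one-byte write at offset zero yields an offset of zero, which implies a file length of at least one byte.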
The metadata server may do one of the following: | The metadata server may do one of the following: | |||
1. Update the file's size using the last write offset provided by | 1. Update the file's size using the last write offset provided by | |||
the client as either the true file size or as a hint of the file | the client as either the true file size or as a hint of the file | |||
size. If the metadata server has a method available, any new | size. If the metadata server has a method available, any new | |||
value for file size should be sanity checked. For example, the | value for file size should be sanity-checked. For example, the | |||
file must not be truncated if the client presents a last write | file must not be truncated if the client presents a last write | |||
offset less than the file's current size. | offset less than the file's current size. | |||
2. Ignore the client provided last write offset; the metadata server | 2. Ignore the client-provided last write offset; the metadata server | |||
must have sufficient knowledge from other sources to determine | must have sufficient knowledge from other sources to determine | |||
the file's size. For example, the metadata server queries the | the file's size. For example, the metadata server queries the | |||
storage devices with the control protocol. | storage devices with the control protocol. | |||
The method chosen to update the file's size will depend on the | The method chosen to update the file's size will depend on the | |||
storage device's and/or the control protocol's capabilities. For | storage device's and/or the control protocol's capabilities. For | |||
example, if the storage devices are block devices with no knowledge | example, if the storage devices are block devices with no knowledge | |||
of file size, the metadata server must rely on the client to set the | of file size, the metadata server must rely on the client to set the | |||
last write offset appropriately. | last write offset appropriately. | |||
The results of LAYOUTCOMMIT contain a new size value in the form of a | The results of LAYOUTCOMMIT contain a new size value in the form of a | |||
newsize4 union data type. If the file's size is set as a result of | newsize4 union data type. If the file's size is set as a result of | |||
LAYOUTCOMMIT, the metadata server must reply with the new size; | LAYOUTCOMMIT, the metadata server must reply with the new size; | |||
otherwise the new size is not provided. If the file size is updated, | otherwise, the new size is not provided. If the file size is | |||
the metadata server SHOULD update the storage devices such that the | updated, the metadata server SHOULD update the storage devices such | |||
new file size is reflected when LAYOUTCOMMIT processing is complete. | that the new file size is reflected when LAYOUTCOMMIT processing is | |||
For example, the client should be able to READ up to the new file | complete. For example, the client should be able to read up to the | |||
size. | new file size. | |||
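One way a metadata server might apply the first option above, treating the last write offset as a hint and never truncating, is sketched here. The function name and the use of None to model the "no new size" arm of newsize4 are assumptions made for illustration.

```python
from typing import Optional


def commit_size(current_size: int, last_write_offset: Optional[int]) -> Optional[int]:
    """Return the new file size if LAYOUTCOMMIT sets it, else None.

    last_write_offset is None when the client gave no offset.
    """
    if last_write_offset is None:
        return None
    implied_size = last_write_offset + 1  # offset N means at least N + 1 bytes
    if implied_size <= current_size:
        return None  # sanity check: the file must not be truncated
    return implied_size
```

A None result corresponds to a reply in which the new size is not provided.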
The client can extend the length of a file or truncate a file by | The client can extend the length of a file or truncate a file by | |||
sending a SETATTR operation to the metadata server with the size | sending a SETATTR operation to the metadata server with the size | |||
attribute specified. If the size specified is larger than the | attribute specified. If the size specified is larger than the | |||
current size of the file, the file is "zero extended", i.e., zeroes | current size of the file, the file is "zero extended", i.e., zeros | |||
are implicitly added between the file's previous EOF and the new EOF. | are implicitly added between the file's previous EOF and the new EOF. | |||
(In many implementations the zero extended region of the file | (In many implementations, the zero-extended byte-range of the file | |||
consists of unallocated holes in the file.) When the client writes | consists of unallocated holes in the file.) When the client writes | |||
past EOF via WRITE, the SETATTR operation does not need to be used. | past EOF via WRITE, the SETATTR operation does not need to be used. | |||
12.5.4.3. LAYOUTCOMMIT and layoutupdate | 12.5.4.3. LAYOUTCOMMIT and layoutupdate | |||
The LAYOUTCOMMIT argument contains a loca_layoutupdate field | The LAYOUTCOMMIT argument contains a loca_layoutupdate field | |||
(Section 18.42.1) of data type layoutupdate4 (Section 3.3.18). This | (Section 18.42.1) of data type layoutupdate4 (Section 3.3.18). This | |||
argument is a layout type-specific structure. The structure can be | argument is a layout-type-specific structure. The structure can be | |||
used to pass arbitrary layout type-specific information from the | used to pass arbitrary layout-type-specific information from the | |||
client to the metadata server at LAYOUTCOMMIT time. For example, if | client to the metadata server at LAYOUTCOMMIT time. For example, if | |||
using a block/volume layout, the client can indicate to the metadata | using a block/volume layout, the client can indicate to the metadata | |||
server which reserved or allocated blocks the client used or did not | server which reserved or allocated blocks the client used or did not | |||
use. The content of loca_layoutupdate (field lou_body) need not be | use. The content of loca_layoutupdate (field lou_body) need not be | |||
the same layout type-specific content returned by LAYOUTGET | the same layout-type-specific content returned by LAYOUTGET | |||
(Section 18.43.2) in the loc_body field of the lo_content field, of | (Section 18.43.2) in the loc_body field of the lo_content field of | |||
the logr_layout field. The content of loca_layoutupdate is defined | the logr_layout field. The content of loca_layoutupdate is defined | |||
by the layout type specification and is opaque to LAYOUTCOMMIT. | by the layout type specification and is opaque to LAYOUTCOMMIT. | |||
12.5.5. Recalling a Layout | 12.5.5. Recalling a Layout | |||
Since a layout protects a client's access to a file via a direct | Since a layout protects a client's access to a file via a direct | |||
client-storage-device path, a layout need only be recalled when it is | client-storage-device path, a layout need only be recalled when it is | |||
semantically unable to serve this function. Typically, this occurs | semantically unable to serve this function. Typically, this occurs | |||
when the layout no longer encapsulates the true location of the file | when the layout no longer encapsulates the true location of the file | |||
over the byte range it represents. Any operation or action, such as | over the byte-range it represents. Any operation or action, such as | |||
server driven restriping or load balancing, that changes the layout | server-driven restriping or load balancing, that changes the layout | |||
will result in a recall of the layout. A layout is recalled by the | will result in a recall of the layout. A layout is recalled by the | |||
CB_LAYOUTRECALL callback operation (see Section 20.3) and returned | CB_LAYOUTRECALL callback operation (see Section 20.3) and returned | |||
with LAYOUTRETURN Section 18.44. The CB_LAYOUTRECALL operation may | with LAYOUTRETURN (see Section 18.44). The CB_LAYOUTRECALL operation | |||
recall a layout identified by a byte range, all the layouts | may recall a layout identified by a byte-range, all layouts | |||
associated with a file system (FSID), or all layouts associated with | associated with a file system ID (FSID), or all layouts associated | |||
a client ID. Section 12.5.5.2 discusses sequencing issues | with a client ID. Section 12.5.5.2 discusses sequencing issues | |||
surrounding the getting, returning, and recalling of layouts. | surrounding the getting, returning, and recalling of layouts. | |||
An iomode is also specified when recalling a layout. Generally, the | An iomode is also specified when recalling a layout. Generally, the | |||
iomode in the recall request must match the layout being returned; | iomode in the recall request must match the layout being returned; | |||
for example, a recall with an iomode of LAYOUTIOMODE4_RW should cause | for example, a recall with an iomode of LAYOUTIOMODE4_RW should cause | |||
the client to only return LAYOUTIOMODE4_RW layouts and not | the client to only return LAYOUTIOMODE4_RW layouts and not | |||
LAYOUTIOMODE4_READ layouts. However, a special LAYOUTIOMODE4_ANY | LAYOUTIOMODE4_READ layouts. However, a special LAYOUTIOMODE4_ANY | |||
enumeration is defined to enable recalling a layout of any iomode; in | enumeration is defined to enable recalling a layout of any iomode; in | |||
other words, the client must return both read-only and read/write | other words, the client must return both LAYOUTIOMODE4_READ and | |||
layouts. | LAYOUTIOMODE4_RW layouts. | |||
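The iomode matching rule for recalls can be written as a small predicate. This is a sketch only: the IoMode enum is a hypothetical Python mirror of the LAYOUTIOMODE4_* constants, and its numeric values are assumed rather than taken from the text above.

```python
from enum import Enum


class IoMode(Enum):
    READ = 1  # stands for LAYOUTIOMODE4_READ (value assumed)
    RW = 2    # stands for LAYOUTIOMODE4_RW (value assumed)
    ANY = 3   # stands for LAYOUTIOMODE4_ANY (value assumed)


def recall_applies(recall_iomode: IoMode, held_iomode: IoMode) -> bool:
    """True if a layout held with held_iomode must be returned for this recall."""
    return recall_iomode is IoMode.ANY or recall_iomode is held_iomode
```

For example, a recall with IoMode.RW leaves a layout held with IoMode.READ in place, while a recall with IoMode.ANY covers both.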
A REMOVE operation SHOULD cause the metadata server to recall the | A REMOVE operation SHOULD cause the metadata server to recall the | |||
layout to prevent the client from accessing a non-existent file and | layout to prevent the client from accessing a non-existent file and | |||
to reclaim state stored on the client. Since a REMOVE may be delayed | to reclaim state stored on the client. Since a REMOVE may be delayed | |||
until the last close of the file has occurred, the recall may also be | until the last close of the file has occurred, the recall may also be | |||
delayed until this time. After the last reference on the file has | delayed until this time. After the last reference on the file has | |||
been released and the file has been removed, the client should no | been released and the file has been removed, the client should no | |||
longer be able to perform I/O using the layout. In the case of a | longer be able to perform I/O using the layout. In the case of a | |||
files based layout, the data server SHOULD return NFS4ERR_STALE in | file-based layout, the data server SHOULD return NFS4ERR_STALE in | |||
response to any operation on the removed file. | response to any operation on the removed file. | |||
Once a layout has been returned, the client MUST NOT send I/Os to the | Once a layout has been returned, the client MUST NOT send I/Os to the | |||
storage devices for the file, byte range, and iomode represented by | storage devices for the file, byte-range, and iomode represented by | |||
the returned layout. If a client does send an I/O to a storage | the returned layout. If a client does send an I/O to a storage | |||
device for which it does not hold a layout, the storage device SHOULD | device for which it does not hold a layout, the storage device SHOULD | |||
reject the I/O. | reject the I/O. | |||
Although pNFS does not alter the file data caching capabilities of | Although pNFS does not alter the file data caching capabilities of | |||
clients, or their semantics, it recognizes that some clients may | clients, or their semantics, it recognizes that some clients may | |||
perform more aggressive write-behind caching to optimize the benefits | perform more aggressive write-behind caching to optimize the benefits | |||
provided by pNFS. However, write-behind caching may negatively | provided by pNFS. However, write-behind caching may negatively | |||
affect the latency in returning a layout in response to a | affect the latency in returning a layout in response to a | |||
CB_LAYOUTRECALL; this is similar to file delegations and the impact | CB_LAYOUTRECALL; this is similar to file delegations and the impact | |||
that file data caching has on DELEGRETURN. Client implementations | that file data caching has on DELEGRETURN. Client implementations | |||
SHOULD limit the amount of unwritten data they have outstanding at | SHOULD limit the amount of unwritten data they have outstanding at | |||
any one time in order to prevent excessively long responses to | any one time in order to prevent excessively long responses to | |||
CB_LAYOUTRECALL. Once a layout is recalled, a server MUST wait one | CB_LAYOUTRECALL. Once a layout is recalled, a server MUST wait one | |||
lease period before taking further action. As soon as a lease period | lease period before taking further action. As soon as a lease period | |||
has past, the server may choose to fence the client's access to the | has passed, the server may choose to fence the client's access to the | |||
storage devices if the server perceives the client has taken too long | storage devices if the server perceives the client has taken too long | |||
to return a layout. However, just as in the case of data delegation | to return a layout. However, just as in the case of data delegation | |||
and DELEGRETURN, the server may choose to wait given that the client | and DELEGRETURN, the server may choose to wait, given that the client | |||
is showing forward progress on its way to returning the layout. This | is showing forward progress on its way to returning the layout. This | |||
forward progress can take the form of successful interaction with the | forward progress can take the form of successful interaction with the | |||
storage devices or sub-portions of the layout being returned by the | storage devices or of sub-portions of the layout being returned by | |||
client. The server can also limit exposure to these problems by | the client. The server can also limit exposure to these problems by | |||
limiting the byte ranges initially provided in the layouts and thus | limiting the byte-ranges initially provided in the layouts and thus | |||
the amount of outstanding modified data. | the amount of outstanding modified data. | |||
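The timing rule for fencing can be sketched as a server-side decision function. The mandatory wait of one lease period comes from the text. The specific forward-progress test, which asks whether any progress was seen within the last lease period, is an assumed policy, and all names are hypothetical.

```python
from typing import Optional


def may_fence(now: float, recall_sent_at: float, lease_period: float,
              last_progress_at: Optional[float] = None) -> bool:
    """Decide whether the server may fence a client that has not returned a layout."""
    if now - recall_sent_at < lease_period:
        return False  # the server MUST wait one lease period after the recall
    if last_progress_at is not None and now - last_progress_at < lease_period:
        return False  # assumed policy: keep waiting while the client makes progress
    return True
```

Progress here could be a partial LAYOUTRETURN or successful I/O observed at the storage devices, as the paragraph above describes.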
12.5.5.1. Layout Recall Callback Robustness | 12.5.5.1. Layout Recall Callback Robustness | |||
It has been assumed thus far that pNFS client state for a file | It has been assumed thus far that pNFS client state (layout ranges | |||
exactly matches the pNFS server state for that file and client | and iomode) for a file exactly matches that of the pNFS server for | |||
regarding layout ranges and iomode. This assumption leads to the | that file. This assumption leads to the implication that any | |||
implication that any callback results in a LAYOUTRETURN or set of | callback results in a LAYOUTRETURN or set of LAYOUTRETURNs that | |||
LAYOUTRETURNs that exactly match the range in the callback, since | exactly match the range in the callback, since both client and server | |||
both client and server agree about the state being maintained. | agree about the state being maintained. However, it can be useful if | |||
However, it can be useful if this assumption does not always hold. | this assumption does not always hold. For example: | |||
For example: | ||||
o If conflicts that require callbacks are very rare, and a server | o If conflicts that require callbacks are very rare, and a server | |||
can use a multi-file callback to recover per-client resources | can use a multi-file callback to recover per-client resources | |||
(e.g., via a FSID recall, or a multi-file recall within a single | (e.g., via an FSID recall or a multi-file recall within a single | |||
compound), the result may be significantly less client-server pNFS | CB_COMPOUND), the result may be significantly less client-server | |||
traffic. | pNFS traffic. | |||
o It may be useful for servers to maintain information about what | o It may be useful for servers to maintain information about what | |||
ranges are held by a client on a coarse-grained basis, leading to | ranges are held by a client on a coarse-grained basis, leading to | |||
the server's layout ranges being beyond those actually held by the | the server's layout ranges being beyond those actually held by the | |||
client. In the extreme, a server could manage conflicts on a per- | client. In the extreme, a server could manage conflicts on a per- | |||
file basis, only sending whole-file callbacks even though clients | file basis, only sending whole-file callbacks even though clients | |||
may request and be granted sub-file ranges. | may request and be granted sub-file ranges. | |||
o It may be useful for clients to "forget" details about what | o It may be useful for clients to "forget" details about what | |||
layouts and ranges the client actually has, leading to the | layouts and ranges the client actually has, leading to the | |||
server's layout ranges being beyond those what the client "thinks" | server's layout ranges being beyond those that the client "thinks" | |||
it has. As long as the client does not assume it has layouts that | it has. As long as the client does not assume it has layouts that | |||
are beyond what the server has granted, this is a safe practice. | are beyond what the server has granted, this is a safe practice. | |||
When a client forgets what ranges and layouts it has, and it | When a client forgets what ranges and layouts it has, and it | |||
receives a CB_LAYOUTRECALL operation, the client MUST follow up | receives a CB_LAYOUTRECALL operation, the client MUST follow up | |||
with a LAYOUTRETURN for what the server recalled, or alternatively | with a LAYOUTRETURN for what the server recalled, or alternatively | |||
return the NFS4ERR_NOMATCHING_LAYOUT error if it has no layout to | return the NFS4ERR_NOMATCHING_LAYOUT error if it has no layout to | |||
return in the recalled range. | return in the recalled range. | |||
o In order to avoid errors, it is vital that a client not assign | o In order to avoid errors, it is vital that a client not assign | |||
itself layout permissions beyond what the server has granted and | itself layout permissions beyond what the server has granted, and | |||
that the server not forget layout permissions that have been | that the server not forget layout permissions that have been | |||
granted. On the other hand, if a server believes that a client | granted. On the other hand, if a server believes that a client | |||
holds a layout that the client does not know about, it is useful | holds a layout that the client does not know about, it is useful | |||
for the client to cleanly indicate completion of the requested | for the client to cleanly indicate completion of the requested | |||
recall either by sending a LAYOUTRETURN operation for the entire | recall either by sending a LAYOUTRETURN operation for the entire | |||
requested range or by returning an NFS4ERR_NOMATCHING_LAYOUT error | requested range or by returning an NFS4ERR_NOMATCHING_LAYOUT error | |||
to the CB_LAYOUTRECALL. | to the CB_LAYOUTRECALL. | |||
Thus, in light of the above, it is useful for a server to be able to | Thus, in light of the above, it is useful for a server to be able to | |||
send callbacks for layout ranges it has not granted to a client, and | send callbacks for layout ranges it has not granted to a client, and | |||
for a client to return ranges it does not hold. A pNFS client MUST | for a client to return ranges it does not hold. A pNFS client MUST | |||
always return layouts that comprise the full range specified by the | always return layouts that comprise the full range specified by the | |||
recall. Note, the full recalled layout range need not be returned as | recall. Note, the full recalled layout range need not be returned as | |||
part of a single operation, but may be returned in portions. This | part of a single operation, but may be returned in portions. This | |||
allows the client to stage the flushing of dirty data, layout | allows the client to stage the flushing of dirty data and commits and | |||
commits, and returns. Also, it indicates to the metadata server that | returns of layouts. Also, it indicates to the metadata server that | |||
the client is making progress. | the client is making progress. | |||
When a layout is returned, the client MUST NOT have any outstanding | When a layout is returned, the client MUST NOT have any outstanding | |||
I/O requests to the storage devices involved in the layout. | I/O requests to the storage devices involved in the layout. | |||
Rephrasing, the client MUST NOT return the layout while it has | Rephrasing, the client MUST NOT return the layout while it has | |||
outstanding I/O requests to the storage device. | outstanding I/O requests to the storage device. | |||
Even with this requirement for the client, it is possible that I/O | Even with this requirement for the client, it is possible that I/O | |||
requests may be presented to a storage device no longer allowed to | requests may be presented to a storage device no longer allowed to | |||
perform them. Since the server has no strict control as to when the | perform them. Since the server has no strict control as to when the | |||
client will return the layout, the server may later decide to | client will return the layout, the server may later decide to | |||
unilaterally revoke the client's access to the storage devices as | unilaterally revoke the client's access to the storage devices as | |||
provided by the layout. In choosing to revoke access, the server | provided by the layout. In choosing to revoke access, the server | |||
must deal with the possibility of lingering I/O request; those | must deal with the possibility of lingering I/O requests, i.e., I/O | |||
outstanding I/O requests are still in flight to storage devices | requests that are still in flight to storage devices identified by | |||
identified by the revoked layout. All layout type specifications | the revoked layout. All layout type specifications MUST define | |||
MUST define whether unilateral layout revocation by the metadata | whether unilateral layout revocation by the metadata server is | |||
server is supported; if it is, the specification must also describe | supported; if it is, the specification must also describe how | |||
how lingering writes are processed. For example, storage devices | lingering writes are processed. For example, storage devices | |||
identified by the revoked layout could be fenced off from the client | identified by the revoked layout could be fenced off from the client | |||
that held the layout. | that held the layout. | |||
In order to ensure client/server convergence with regard to layout | In order to ensure client/server convergence with regard to layout | |||
state, the final LAYOUTRETURN operation in a sequence of LAYOUTRETURN | state, the final LAYOUTRETURN operation in a sequence of LAYOUTRETURN | |||
operations for a particular recall, MUST specify the entire range | operations for a particular recall MUST specify the entire range | |||
being recalled, echoing the recalled layout type, iomode, recall/ | being recalled, echoing the recalled layout type, iomode, recall/ | |||
return type (FILE, FSID, or ALL), and byte range; even if layouts | return type (FILE, FSID, or ALL), and byte-range, even if layouts | |||
pertaining to partial ranges were previously returned. In addition, | pertaining to partial ranges were previously returned. In addition, | |||
if the client holds no layouts that overlaps the range being | if the client holds no layouts that overlap the range being recalled, | |||
recalled, the client should return the NFS4ERR_NOMATCHING_LAYOUT | the client should return the NFS4ERR_NOMATCHING_LAYOUT error code to | |||
error code to CB_LAYOUTRECALL. This allows the server to update its | CB_LAYOUTRECALL. This allows the server to update its view of the | |||
view of the client's layout state. | client's layout state. | |||
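The staged-return rule can be sketched from the client side. This illustration assumes byte-ranges are plain (offset, length) pairs with finite lengths and ignores iomode and layout type; the function name and the string status values are hypothetical.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (offset, length)


def plan_recall_response(recall: Range, held: List[Range]) -> Tuple[str, List[Range]]:
    """Plan the LAYOUTRETURN sequence for one CB_LAYOUTRECALL.

    Returns a status for the callback and the ranges to return, in order.
    """
    r_off, r_len = recall
    r_end = r_off + r_len
    partial = []
    for off, length in held:
        lo = max(off, r_off)
        hi = min(off + length, r_end)
        if lo < hi:  # this held layout overlaps the recalled range
            partial.append((lo, hi - lo))
    if not partial:
        # Nothing held in the recalled range: report it to the callback instead.
        return ("NFS4ERR_NOMATCHING_LAYOUT", [])
    # The final LAYOUTRETURN echoes the entire recalled range.
    return ("NFS4_OK", partial + [recall])
```

The partial returns let the client stage its flushing of dirty data, and the final entry lets the server converge on the client's layout state.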
12.5.5.2. Sequencing of Layout Operations | 12.5.5.2. Sequencing of Layout Operations | |||
As with other stateful operations, pNFS requires the correct | As with other stateful operations, pNFS requires the correct | |||
sequencing of layout operations. pNFS uses the "seqid" in the layout | sequencing of layout operations. pNFS uses the "seqid" in the layout | |||
stateid to provide the correct sequencing between regular operations | stateid to provide the correct sequencing between regular operations | |||
and callbacks. It is the server's responsibility to avoid | and callbacks. It is the server's responsibility to avoid | |||
inconsistencies regarding the layouts provided and the client's | inconsistencies regarding the layouts provided and the client's | |||
responsibility to properly serialize its layout requests and layout | responsibility to properly serialize its layout requests and layout | |||
returns. | returns. | |||
12.5.5.2.1. Layout Recall and Return Sequencing | 12.5.5.2.1. Layout Recall and Return Sequencing | |||
One critical issue with regard to layout operations sequencing | One critical issue with regard to layout operations sequencing | |||
concerns callbacks. The protocol must defend against races between | concerns callbacks. The protocol must defend against races between | |||
the reply to a LAYOUTGET or LAYOUTRETURN operation and a subsequent | the reply to a LAYOUTGET or LAYOUTRETURN operation and a subsequent | |||
CB_LAYOUTRECALL. A client MUST NOT process a CB_LAYOUTRECALL that | CB_LAYOUTRECALL. A client MUST NOT process a CB_LAYOUTRECALL that | |||
implies one or more outstanding LAYOUTGET or LAYOUTRETURN operations | implies one or more outstanding LAYOUTGET or LAYOUTRETURN operations | |||
to which the client has not yet received a reply. The client detects | to which the client has not yet received a reply. The client detects | |||
such a CB_LAYOUTRECALL by examining the "seqid" field of the recall's | such a CB_LAYOUTRECALL by examining the "seqid" field of the recall's | |||
layout stateid. If the "seqid" is not one higher than what the | layout stateid. If the "seqid" is not exactly one higher than what | |||
client currently has recorded, and the client has at least one | the client currently has recorded, and the client has at least one | |||
LAYOUTGET and/or LAYOUTRETURN operation outstanding, the client knows | LAYOUTGET and/or LAYOUTRETURN operation outstanding, the client knows | |||
the server sent the CB_LAYOUTRECALL after sending a response to an | the server sent the CB_LAYOUTRECALL after sending a response to an | |||
outstanding LAYOUTGET or LAYOUTRETURN. The client MUST wait before | outstanding LAYOUTGET or LAYOUTRETURN. The client MUST wait before | |||
processing such a CB_LAYOUTRECALL until it processes all replies for | processing such a CB_LAYOUTRECALL until it processes all replies for | |||
outstanding LAYOUTGET and LAYOUTRETURN operations for the | outstanding LAYOUTGET and LAYOUTRETURN operations for the | |||
corresponding file with seqid less than the seqid given by | corresponding file with seqid less than the seqid given by | |||
CB_LAYOUTRECALL (lor_stateid, see Section 20.3.) | CB_LAYOUTRECALL (lor_stateid; see Section 20.3.) | |||
In addition to the seqid-based mechanism, Section 2.10.6.3 describes | In addition to the seqid-based mechanism, Section 2.10.6.3 describes | |||
the sessions mechanism for allowing the client to detect callback | the sessions mechanism for allowing the client to detect callback | |||
race conditions and delay processing such a CB_LAYOUTRECALL. The | race conditions and delay processing such a CB_LAYOUTRECALL. The | |||
server MAY reference conflicting operations in the CB_SEQUENCE that | server MAY reference conflicting operations in the CB_SEQUENCE that | |||
precedes the CB_LAYOUTRECALL. Because the server has already sent | precedes the CB_LAYOUTRECALL. Because the server has already sent | |||
replies for these operations before sending the callback, the replies | replies for these operations before sending the callback, the replies | |||
may race with the CB_LAYOUTRECALL. The client MUST wait for all the | may race with the CB_LAYOUTRECALL. The client MUST wait for all the | |||
referenced calls to complete and update its view of the layout state | referenced calls to complete and update its view of the layout state | |||
before processing the CB_LAYOUTRECALL. | before processing the CB_LAYOUTRECALL. | |||
skipping to change at page 294, line 49 | skipping to change at page 294, line 51 | |||
which they were created. However, through the use of the "seqid" | which they were created. However, through the use of the "seqid" | |||
field in the layout stateid, the client can determine the order in | field in the layout stateid, the client can determine the order in | |||
which parallel outstanding operations were processed by the server. | which parallel outstanding operations were processed by the server. | |||
Thus, when a layout retrieved by an outstanding LAYOUTGET operation | Thus, when a layout retrieved by an outstanding LAYOUTGET operation | |||
intersects with a layout returned by an outstanding LAYOUTRETURN on | intersects with a layout returned by an outstanding LAYOUTRETURN on | |||
the same file, the order in which the two conflicting operations are | the same file, the order in which the two conflicting operations are | |||
processed determines the final state of the overlapping layout. The | processed determines the final state of the overlapping layout. The | |||
order is determined by the "seqid" returned in each operation: the | order is determined by the "seqid" returned in each operation: the | |||
operation with the higher seqid was executed later. | operation with the higher seqid was executed later. | |||
It is permissible for the client to send in parallel multiple | It is permissible for the client to send multiple parallel LAYOUTGET | |||
LAYOUTGET operations for the same file or multiple LAYOUTRETURN | operations for the same file or multiple parallel LAYOUTRETURN | |||
operations for the same file, and a mix of both. | operations for the same file or a mix of both. | |||
It is permissible for the client to use the current stateid (see | It is permissible for the client to use the current stateid (see | |||
Section 16.2.3.1.2) for LAYOUTGET operations for example when | Section 16.2.3.1.2) for LAYOUTGET operations, for example, when | |||
compounding LAYOUTGETs or compounding OPEN and LAYOUTGETs. It is | compounding LAYOUTGETs or compounding OPEN and LAYOUTGETs. It is | |||
also permissible to use the current stateid when compounding | also permissible to use the current stateid when compounding | |||
LAYOUTRETURNs. | LAYOUTRETURNs. | |||
It is permissible for the client to use the current stateid when | It is permissible for the client to use the current stateid when | |||
combining LAYOUTRETURN and LAYOUTGET operations for the same file in | combining LAYOUTRETURN and LAYOUTGET operations for the same file in | |||
the same COMPOUND request since the server MUST process these in | the same COMPOUND request since the server MUST process these in | |||
order. However, if a client does send such COMPOUND requests, it | order. However, if a client does send such COMPOUND requests, it | |||
MUST NOT have more than one outstanding for the same file at the same | MUST NOT have more than one outstanding for the same file at the same | |||
time and MUST NOT have other LAYOUTGET or LAYOUTRETURN operations | time, and it MUST NOT have other LAYOUTGET or LAYOUTRETURN operations | |||
outstanding at the same time for that same file. | outstanding at the same time for that same file. | |||
12.5.5.2.1.2. Client Considerations | 12.5.5.2.1.2. Client Considerations | |||
Consider a pNFS client that has sent a LAYOUTGET and before it | Consider a pNFS client that has sent a LAYOUTGET, and before it | |||
receives the reply to LAYOUTGET, it receives a CB_LAYOUTRECALL for | receives the reply to LAYOUTGET, it receives a CB_LAYOUTRECALL for | |||
the same file with an overlapping range. There are two | the same file with an overlapping range. There are two | |||
possibilities, which the client can distinguish via the layout | possibilities, which the client can distinguish via the layout | |||
stateid in the recall. | stateid in the recall. | |||
1. The server processed the LAYOUTGET before sending the recall, so | 1. The server processed the LAYOUTGET before sending the recall, so | |||
the LAYOUTGET must be waited for because it may be carrying | the LAYOUTGET must be waited for because it may be carrying | |||
layout information that will need to be returned to deal with the | layout information that will need to be returned to deal with the | |||
CB_LAYOUTRECALL. | CB_LAYOUTRECALL. | |||
2. The server sent the callback before receiving the LAYOUTGET. The | 2. The server sent the callback before receiving the LAYOUTGET. The | |||
server will not respond to the LAYOUTGET until the | server will not respond to the LAYOUTGET until the | |||
CB_LAYOUTRECALL is processed. | CB_LAYOUTRECALL is processed. | |||
If these possibilities cannot be distinguished, a deadlock could | If these possibilities cannot be distinguished, a deadlock could | |||
result, as the client must wait for the LAYOUTGET response before | result, as the client must wait for the LAYOUTGET response before | |||
processing the recall in the first case, but that response will not | processing the recall in the first case, but that response will not | |||
arrive until after the recall is processed in the second case. Note | arrive until after the recall is processed in the second case. Note | |||
that in the first case, the "seqid" in the layout stateid of the | that in the first case, the "seqid" in the layout stateid of the | |||
recall is two greater than what the client has recorded and in the | recall is two greater than what the client has recorded; in the | |||
second case, the "seqid" is one greater than what the client has | second case, the "seqid" is one greater than what the client has | |||
recorded. This allows the client to disambiguate between the two | recorded. This allows the client to disambiguate between the two | |||
cases. The client thus knows precisely which possibility applies. | cases. The client thus knows precisely which possibility applies. | |||
In case 1 the client knows it needs to wait for the LAYOUTGET | In case 1, the client knows it needs to wait for the LAYOUTGET | |||
response before processing the recall (or the client can return | response before processing the recall (or the client can return | |||
NFS4ERR_DELAY). | NFS4ERR_DELAY). | |||
In case 2 the client will not wait for the LAYOUTGET response before | In case 2, the client will not wait for the LAYOUTGET response before | |||
processing the recall, because waiting would cause deadlock. | processing the recall because waiting would cause deadlock. | |||
Therefore, the action at the client will only require waiting in the | Therefore, the action at the client will only require waiting in the | |||
case that the client has not yet seen the server's earlier responses | case that the client has not yet seen the server's earlier responses | |||
to the LAYOUTGET operation(s). | to the LAYOUTGET operation(s). | |||
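The two cases above can be told apart with a small check on the recall's seqid. The following is a minimal sketch, not normative text: the function name and parameters are hypothetical, and it assumes the client tracks the last seqid it recorded and a count of outstanding LAYOUTGET/LAYOUTRETURN operations for the file.

```python
SEQID_MAX = 0xFFFFFFFF  # largest seqid; zero is never valid for layout stateids


def must_delay_recall(recorded_seqid, recall_seqid, outstanding_ops):
    """Return True if the client has to wait (or reply NFS4ERR_DELAY).

    recorded_seqid  -- seqid the client currently has recorded (hypothetical name)
    recall_seqid    -- seqid carried in the CB_LAYOUTRECALL layout stateid
    outstanding_ops -- number of unanswered LAYOUTGET/LAYOUTRETURN operations
    """
    if outstanding_ops == 0:
        # Nothing in flight, so no reply can race with the recall.
        return False
    # "Exactly one higher" than the recorded value, skipping zero on wraparound.
    expected = 1 if recorded_seqid == SEQID_MAX else recorded_seqid + 1
    # Case 2: recall_seqid == expected, process the recall now (waiting deadlocks).
    # Case 1: any other value means the server already replied to an
    # outstanding operation, so that reply must be processed first.
    return recall_seqid != expected
```

For example, with a recorded seqid of 5 and one LAYOUTGET outstanding, a recall carrying seqid 7 must be delayed, while a recall carrying seqid 6 is processed immediately.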
The recall process can be considered completed when the final | The recall process can be considered completed when the final | |||
LAYOUTRETURN operation for the recalled range is completed. The | LAYOUTRETURN operation for the recalled range is completed. The | |||
LAYOUTRETURN uses the layout stateid (with seqid) specified in | LAYOUTRETURN uses the layout stateid (with seqid) specified in | |||
CB_LAYOUTRECALL. If the client uses multiple LAYOUTRETURNs in | CB_LAYOUTRECALL. If the client uses multiple LAYOUTRETURNs in | |||
processing the recall, the first LAYOUTRETURN will use the layout | processing the recall, the first LAYOUTRETURN will use the layout | |||
stateid as specified in CB_LAYOUTRECALL. Subsequent LAYOUTRETURNs | stateid as specified in CB_LAYOUTRECALL. Subsequent LAYOUTRETURNs | |||
skipping to change at page 296, line 38 | skipping to change at page 296, line 42 | |||
2. The client sent the LAYOUTGET after processing the | 2. The client sent the LAYOUTGET after processing the | |||
CB_LAYOUTRECALL, but the LAYOUTGET arrived before the | CB_LAYOUTRECALL, but the LAYOUTGET arrived before the | |||
LAYOUTRETURN and the response to CB_LAYOUTRECALL that completed | LAYOUTRETURN and the response to CB_LAYOUTRECALL that completed | |||
that processing. The "seqid" in the layout stateid of LAYOUTGET | that processing. The "seqid" in the layout stateid of LAYOUTGET | |||
is equal to or greater than that of the "seqid" in | is equal to or greater than that of the "seqid" in | |||
CB_LAYOUTRECALL. The server has not received a response to the | CB_LAYOUTRECALL. The server has not received a response to the | |||
CB_LAYOUTRECALL, so it returns NFS4ERR_RECALLCONFLICT. | CB_LAYOUTRECALL, so it returns NFS4ERR_RECALLCONFLICT. | |||
3. The client sent the LAYOUTGET after processing the | 3. The client sent the LAYOUTGET after processing the | |||
CB_LAYOUTRECALL, the server received the CB_LAYOUTRECALL | CB_LAYOUTRECALL; the server received the CB_LAYOUTRECALL | |||
response, but the LAYOUTGET arrived before the LAYOUTRETURN that | response, but the LAYOUTGET arrived before the LAYOUTRETURN that | |||
completed that processing. The "seqid" in the layout stateid of | completed that processing. The "seqid" in the layout stateid of | |||
LAYOUTGET is equal to that of the "seqid" in CB_LAYOUTRECALL. | LAYOUTGET is equal to that of the "seqid" in CB_LAYOUTRECALL. | |||
The server has received a response to the CB_LAYOUTRECALL, so it | The server has received a response to the CB_LAYOUTRECALL, so it | |||
returns NFS4ERR_RETURNCONFLICT. | returns NFS4ERR_RETURNCONFLICT. | |||
12.5.5.2.1.4. Wraparound and Validation of Seqid | 12.5.5.2.1.4. Wraparound and Validation of Seqid | |||
The rules for layout stateid processing differ from other stateids in | The rules for layout stateid processing differ from other stateids in | |||
the protocol because the "seqid" value cannot be zero and the | the protocol because the "seqid" value cannot be zero and the | |||
stateid's "seqid" value changes in a CB_LAYOUTRECALL operation. The | stateid's "seqid" value changes in a CB_LAYOUTRECALL operation. The | |||
non-zero requirement combined with the inherent parallelism of layout | non-zero requirement combined with the inherent parallelism of layout | |||
operations means that a set of LAYOUTGET and LAYOUTRETURN operations | operations means that a set of LAYOUTGET and LAYOUTRETURN operations | |||
may contain the same value for "seqid". The server uses a slightly | may contain the same value for "seqid". The server uses a slightly | |||
modified version of the modulo arithmetic as described in | modified version of the modulo arithmetic as described in | |||
Section 2.10.6.1 when incrementing the layout stateid's "seqid". The | Section 2.10.6.1 when incrementing the layout stateid's "seqid". The | |||
modification to that modulo arithmetic description is to not use | difference is that zero is not a valid value for "seqid"; when the | |||
zero.  The modulo arithmetic is also used for the comparisons of | value of a "seqid" is 0xFFFFFFFF, the next valid value will be | |||
"seqid" values in the processing of CB_LAYOUTRECALL events as | 0x00000001. The modulo arithmetic is also used for the comparisons | |||
of "seqid" values in the processing of CB_LAYOUTRECALL events as | ||||
described above in Section 12.5.5.2.1.3. | described above in Section 12.5.5.2.1.3. | |||
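As an illustration of the modified modulo arithmetic, the sketch below shows an increment that skips zero and one possible way to measure how far apart two seqids are. The helper names are hypothetical, and the distance function is an assumed reading of "modulo arithmetic without zero": it treats the 2^32 - 1 valid values as a ring.

```python
SEQID_MAX = 0xFFFFFFFF      # largest valid seqid
SEQID_RING = 0xFFFFFFFF     # count of valid values: 1 .. 0xFFFFFFFF


def next_layout_seqid(seqid):
    """Increment a layout stateid seqid; 0xFFFFFFFF wraps to 1, never to 0."""
    return 1 if seqid == SEQID_MAX else seqid + 1


def seqid_distance(older, newer):
    """Number of increments needed to get from 'older' to 'newer'.

    Assumption: both inputs are valid (non-zero) seqids.
    """
    return (newer - older) % SEQID_RING
```

Under this reading, the distance from 0xFFFFFFFE to 2 is three increments (0xFFFFFFFF, 1, 2), and the largest possible distance is 2^32 - 2.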
Just as the server validates the "seqid" in the event of | Just as the server validates the "seqid" in the event of | |||
CB_LAYOUTRECALL usage, as described in Section 12.5.5.2.1.3, the | CB_LAYOUTRECALL usage, as described in Section 12.5.5.2.1.3, the | |||
server also validates the "seqid" value to ensure that it is within | server also validates the "seqid" value to ensure that it is within | |||
an appropriate range. This range represents the degree of | an appropriate range. This range represents the degree of | |||
parallelism the server supports for layout stateids. If the client | parallelism the server supports for layout stateids. If the client | |||
is sending multiple layout operations to the server in parallel, by | is sending multiple layout operations to the server in parallel, by | |||
definition, the "seqid" value in the supplied stateid will not be the | definition, the "seqid" value in the supplied stateid will not be the | |||
current "seqid" as held by the server. The range of parallelism | current "seqid" as held by the server. The range of parallelism | |||
spans from the highest or current "seqid" to a "seqid" value in the | spans from the highest or current "seqid" to a "seqid" value in the | |||
past. To assist in the discussion, the server's current "seqid" | past. To assist in the discussion, the server's current "seqid" | |||
value for a layout stateid is defined as: SERVER_CURRENT_SEQID. The | value for a layout stateid is defined as SERVER_CURRENT_SEQID. The | |||
lowest "seqid" value that is acceptable to the server is represented | lowest "seqid" value that is acceptable to the server is represented | |||
by PAST_SEQID. And the value for the range of valid "seqid"s or | by PAST_SEQID. And the value for the range of valid "seqid"s or | |||
range of parallelism is VALID_SEQID_RANGE. Therefore, the following | range of parallelism is VALID_SEQID_RANGE. Therefore, the following | |||
holds: VALID_SEQID_RANGE = SERVER_CURRENT_SEQID - PAST_SEQID. In the | holds: VALID_SEQID_RANGE = SERVER_CURRENT_SEQID - PAST_SEQID. In the | |||
following, all arithmetic is the modulo arithmetic as described | following, all arithmetic is the modulo arithmetic as described | |||
above. | above. | |||
The server MUST support a minimum VALID_SEQID_RANGE. The minimum is | The server MUST support a minimum VALID_SEQID_RANGE. The minimum is | |||
defined as: VALID_SEQID_RANGE = summation of 1..N of | defined as: VALID_SEQID_RANGE = summation of 1..N of | |||
(ca_maxoperations(i) - 1) where N is the number of session fore | (ca_maxoperations(i) - 1), where N is the number of session fore | |||
channels and ca_maxoperations(i) is the value of the ca_maxoperations | channels and ca_maxoperations(i) is the value of the ca_maxoperations | |||
returned from CREATE_SESSION of the i'th session. The reason for | returned from CREATE_SESSION of the i'th session. The reason for "- | |||
minus 1 is to allow for the required SEQUENCE operation. The server | 1" is to allow for the required SEQUENCE operation. The server MAY | |||
MAY support a VALID_SEQID_RANGE value larger than the minimum. The | support a VALID_SEQID_RANGE value larger than the minimum. The | |||
maximum VALID_SEQID_RANGE is (2 ^ 32 - 2) (accounts for 0 not being a | maximum VALID_SEQID_RANGE is (2 ^ 32 - 2) (accounts for zero not | |||
valid "seqid" value). | being a valid "seqid" value). | |||
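The minimum defined above is a plain sum over the client's sessions. A minimal sketch, assuming the server keeps the negotiated ca_maxoperations value of each session's fore channel in a list (the function and parameter names are hypothetical):

```python
def min_valid_seqid_range(ca_maxoperations_per_session):
    """Sum of (ca_maxoperations(i) - 1) over all session fore channels.

    One operation per COMPOUND is subtracted because SEQUENCE is required
    and therefore cannot be a layout operation.
    """
    return sum(maxops - 1 for maxops in ca_maxoperations_per_session)
```

For two sessions that negotiated ca_maxoperations of 8 and 16, the minimum VALID_SEQID_RANGE is 7 + 15 = 22.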
If the server finds the "seqid" is zero, the NFS4ERR_BAD_STATEID | If the server finds the "seqid" is zero, the NFS4ERR_BAD_STATEID | |||
error is returned to the client. The server further validates the | error is returned to the client. The server further validates the | |||
"seqid" to ensure it is within the range of parallelism, | "seqid" to ensure it is within the range of parallelism, | |||
VALID_SEQID_RANGE. If the "seqid" value is outside of that range, | VALID_SEQID_RANGE. If the "seqid" value is outside of that range, | |||
the error NFS4ERR_OLD_STATEID is returned to the client. Upon | the error NFS4ERR_OLD_STATEID is returned to the client. Upon | |||
receipt of NFS4ERR_OLD_STATEID, the client updates the stateid in the | receipt of NFS4ERR_OLD_STATEID, the client updates the stateid in the | |||
layout request based on processing of other layout requests and re- | layout request based on processing of other layout requests and re- | |||
sends the operation to the server. | sends the operation to the server. | |||
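The validation steps in the preceding paragraph might be sketched as follows. This is an illustrative reading, not the protocol's definition: the function name is hypothetical, error codes are represented as strings, and the "age" of a seqid is computed on a ring of the 2^32 - 1 non-zero values, which is an assumption about how the modified modulo arithmetic is applied.

```python
SEQID_RING = 0xFFFFFFFF  # count of valid seqid values (zero excluded)


def validate_layout_seqid(supplied, server_current, valid_range):
    """Return the status a server would give for a supplied layout seqid.

    supplied       -- "seqid" from the client's layout stateid
    server_current -- SERVER_CURRENT_SEQID
    valid_range    -- VALID_SEQID_RANGE supported by the server
    """
    if supplied == 0:
        # Zero is never a valid layout stateid seqid.
        return "NFS4ERR_BAD_STATEID"
    # How many increments the supplied seqid lags behind the current one.
    age = (server_current - supplied) % SEQID_RING
    if age > valid_range:
        # Older than PAST_SEQID (or otherwise outside the accepted window).
        return "NFS4ERR_OLD_STATEID"
    return "NFS4_OK"
```

With SERVER_CURRENT_SEQID = 10 and VALID_SEQID_RANGE = 4, seqids 6 through 10 are accepted and seqid 5 yields NFS4ERR_OLD_STATEID.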
12.5.5.2.1.5. Bulk Recall and Return | 12.5.5.2.1.5. Bulk Recall and Return | |||
pNFS supports recalling and returning all layouts that are for files | pNFS supports recalling and returning all layouts that are for files | |||
belonging to a particular fsid (LAYOUTRECALL4_FSID, | belonging to a particular fsid (LAYOUTRECALL4_FSID, | |||
LAYOUTRETURN4_FSID) or client ID (LAYOUTRECALL4_ALL, | LAYOUTRETURN4_FSID) or client ID (LAYOUTRECALL4_ALL, | |||
LAYOUTRETURN4_ALL). There are no "bulk" stateids, so detection of | LAYOUTRETURN4_ALL). There are no "bulk" stateids, so detection of | |||
races via the seqid is not possible. The server MUST NOT initiate | races via the seqid is not possible. The server MUST NOT initiate | |||
bulk recall while another recall is in progress, or the corresponding | bulk recall while another recall is in progress, or the corresponding | |||
LAYOUTRETURN is in progress or pending. In the event the server | LAYOUTRETURN is in progress or pending. In the event the server | |||
sends a bulk recall while the client has pending or in progress | sends a bulk recall while the client has a pending or in-progress | |||
LAYOUTRETURN, CB_LAYOUTRECALL, or LAYOUTGET, the client returns | LAYOUTRETURN, CB_LAYOUTRECALL, or LAYOUTGET, the client returns | |||
NFS4ERR_DELAY. In the event the client sends a LAYOUTGET or | NFS4ERR_DELAY. In the event the client sends a LAYOUTGET or | |||
LAYOUTRETURN while a bulk recall is in progress, the server returns | LAYOUTRETURN while a bulk recall is in progress, the server returns | |||
NFS4ERR_RECALLCONFLICT. If the client sends a LAYOUTGET or | NFS4ERR_RECALLCONFLICT. If the client sends a LAYOUTGET or | |||
LAYOUTRETURN after the server receives NFS4ERR_DELAY from a bulk | LAYOUTRETURN after the server receives NFS4ERR_DELAY from a bulk | |||
recall, then to ensure forward progress, the server MAY return | recall, then to ensure forward progress, the server MAY return | |||
NFS4ERR_RECALLCONFLICT. | NFS4ERR_RECALLCONFLICT. | |||
Once a CB_LAYOUTRECALL of LAYOUTRECALL4_ALL is sent, the server MUST | Once a CB_LAYOUTRECALL of LAYOUTRECALL4_ALL is sent, the server MUST | |||
NOT allow the client to use any layout stateid except for | NOT allow the client to use any layout stateid except for | |||
skipping to change at page 298, line 39 | skipping to change at page 298, line 43 | |||
client MUST NOT use the layout stateids again. It MUST use LAYOUTGET | client MUST NOT use the layout stateids again. It MUST use LAYOUTGET | |||
to obtain new layout stateids. | to obtain new layout stateids. | |||
Once a CB_LAYOUTRECALL of LAYOUTRECALL4_FSID is sent, the server MUST | Once a CB_LAYOUTRECALL of LAYOUTRECALL4_FSID is sent, the server MUST | |||
NOT allow the client to use any layout stateid that refers to a file | NOT allow the client to use any layout stateid that refers to a file | |||
with the specified fsid except for LAYOUTCOMMIT operations. Once the | with the specified fsid except for LAYOUTCOMMIT operations. Once the | |||
client receives a CB_LAYOUTRECALL of LAYOUTRECALL4_FSID, it MUST NOT | client receives a CB_LAYOUTRECALL of LAYOUTRECALL4_FSID, it MUST NOT | |||
use any layout stateid that refers to a file with the specified fsid | use any layout stateid that refers to a file with the specified fsid | |||
except for LAYOUTCOMMIT operations. Once a LAYOUTRETURN of | except for LAYOUTCOMMIT operations. Once a LAYOUTRETURN of | |||
LAYOUTRETURN4_FSID is sent, all layout stateids granted to the | LAYOUTRETURN4_FSID is sent, all layout stateids granted to the | |||
referenced fsid are freed. The client MUST NOT use the layout | referenced fsid are freed. The client MUST NOT use those freed | |||
stateids for files with the referenced fsid again. It MUST use | layout stateids for files with the referenced fsid again. | |||
LAYOUTGET to obtain new layout stateids files with the referenced | Subsequently, for any file with the referenced fsid, to use a layout, | |||
fsid. | the client MUST first send a LAYOUTGET operation in order to obtain a | |||
new layout stateid for that file. | ||||
If the server has sent a bulk CB_LAYOUTRECALL, and receives a | If the server has sent a bulk CB_LAYOUTRECALL and receives a | |||
LAYOUTGET, or a LAYOUTRETURN with a stateid, the server MUST return | LAYOUTGET, or a LAYOUTRETURN with a stateid, the server MUST return | |||
NFS4ERR_RECALLCONFLICT. If the server has sent a bulk | NFS4ERR_RECALLCONFLICT. If the server has sent a bulk | |||
CB_LAYOUTRECALL, and receives a LAYOUTRETURN with an lr_returntype | CB_LAYOUTRECALL and receives a LAYOUTRETURN with an lr_returntype | |||
that is not equal to the lor_recalltype of the CB_LAYOUTRECALL, the | that is not equal to the lor_recalltype of the CB_LAYOUTRECALL, the | |||
server MUST return NFS4ERR_RECALLCONFLICT. | server MUST return NFS4ERR_RECALLCONFLICT. | |||
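The two server-side rules in the preceding paragraph can be summarized in a short sketch. It assumes a bulk CB_LAYOUTRECALL is already outstanding; the function name, the string encoding of operations and types, and the use of None for "not rejected by these rules" are all hypothetical.

```python
RECALLCONFLICT = "NFS4ERR_RECALLCONFLICT"


def bulk_recall_conflict(op, has_stateid=False, returntype=None, recalltype=None):
    """Decide whether a request arriving during a bulk recall is rejected.

    op          -- "LAYOUTGET" or "LAYOUTRETURN"
    has_stateid -- True if a LAYOUTRETURN carries a layout stateid
    returntype  -- lr_returntype of the LAYOUTRETURN ("FILE", "FSID", "ALL")
    recalltype  -- lor_recalltype of the outstanding bulk CB_LAYOUTRECALL
    """
    if op == "LAYOUTGET":
        return RECALLCONFLICT
    if op == "LAYOUTRETURN":
        if has_stateid:
            return RECALLCONFLICT
        if returntype != recalltype:
            return RECALLCONFLICT
    # Not covered by the two rules; normal processing continues.
    return None
```

A LAYOUTRETURN of type FSID sent in answer to a LAYOUTRECALL4_FSID recall passes both checks, while a LAYOUTRETURN of type ALL against the same recall does not.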
12.5.6. Revoking Layouts | 12.5.6. Revoking Layouts | |||
Parallel NFS permits servers to revoke layouts from clients that fail | Parallel NFS permits servers to revoke layouts from clients that fail | |||
to response to recalls and/or fail to renew their lease in time. | to respond to recalls and/or fail to renew their lease in time. | |||
Whether the server revokes the layout or not depends on the layout | Depending on the layout type, the server might revoke the layout and | |||
type, and what actions are taken with respect to the client's I/O to | might take certain actions with respect to the client's I/O to data | |||
data servers is also layout type specific. | servers. | |||
12.5.7. Metadata Server Write Propagation | 12.5.7. Metadata Server Write Propagation | |||
Asynchronous writes written through the metadata server may be | Asynchronous writes written through the metadata server may be | |||
propagated lazily to the storage devices. For data written | propagated lazily to the storage devices. For data written | |||
asynchronously through the metadata server, a client performing a | asynchronously through the metadata server, a client performing a | |||
read at the appropriate storage device is not guaranteed to see the | read at the appropriate storage device is not guaranteed to see the | |||
newly written data until a COMMIT occurs at the metadata server. | newly written data until a COMMIT occurs at the metadata server. | |||
While the write is pending, reads to the storage device may give out | While the write is pending, reads to the storage device may give out | |||
either the old data, the new data, or a mixture of new and old. Upon | either the old data, the new data, or a mixture of new and old. Upon | |||
skipping to change at page 300, line 7 | skipping to change at page 300, line 11 | |||
what the fs_layout_type attribute said, the server does not support | what the fs_layout_type attribute said, the server does not support | |||
pNFS, and the client will not be able to use pNFS to that server; in | pNFS, and the client will not be able to use pNFS to that server; in | |||
this case, the server MUST return NFS4ERR_NOTSUPP in response to any | this case, the server MUST return NFS4ERR_NOTSUPP in response to any | |||
pNFS operation. | pNFS operation. | |||
The client then creates a session, requesting a persistent session, | The client then creates a session, requesting a persistent session, | |||
so that exclusive creates can be done with a single round trip via the | so that exclusive creates can be done with a single round trip via the | |||
createmode4 of GUARDED4. If the session ends up not being | createmode4 of GUARDED4. If the session ends up not being | |||
persistent, the client will use EXCLUSIVE4_1 for exclusive creates. | persistent, the client will use EXCLUSIVE4_1 for exclusive creates. | |||
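The create-mode choice described above reduces to a single test on the session. A minimal sketch, assuming the client records whether the server granted a persistent session (the function and parameter names are hypothetical):

```python
def exclusive_createmode(session_is_persistent):
    """Pick the createmode4 a client uses for an exclusive create.

    A persistent session lets GUARDED4 provide exclusive-create semantics
    in one round trip; otherwise the client falls back to EXCLUSIVE4_1.
    """
    return "GUARDED4" if session_is_persistent else "EXCLUSIVE4_1"
```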
If a file is to be created on a pNFS enabled file system, the client | If a file is to be created on a pNFS-enabled file system, the client | |||
uses the OPEN operation. With the normal set of attributes that may | uses the OPEN operation. With the normal set of attributes that may | |||
be provided upon OPEN used for creation, there is an OPTIONAL | be provided upon OPEN used for creation, there is an OPTIONAL | |||
layout_hint attribute. The client's use of layout_hint allows the | layout_hint attribute. The client's use of layout_hint allows the | |||
client to express its preference for a layout type and its associated | client to express its preference for a layout type and its associated | |||
layout details. The use of a createmode4 of UNCHECKED4, GUARDED4, or | layout details. The use of a createmode4 of UNCHECKED4, GUARDED4, or | |||
EXCLUSIVE4_1 will allow the client to provide the layout_hint | EXCLUSIVE4_1 will allow the client to provide the layout_hint | |||
attribute at create time. The client MUST NOT use EXCLUSIVE4 (see | attribute at create time. The client MUST NOT use EXCLUSIVE4 (see | |||
Table 10). The client is RECOMMENDED to combine a GETATTR operation | Table 10). The client is RECOMMENDED to combine a GETATTR operation | |||
after the OPEN within the same COMPOUND. The GETATTR may then | after the OPEN within the same COMPOUND. The GETATTR may then | |||
retrieve the layout_type attribute for the newly created file. The | retrieve the layout_type attribute for the newly created file. The | |||
skipping to change at page 300, line 39 | skipping to change at page 300, line 43 | |||
Assuming the client supports the layout type returned by GETATTR and | Assuming the client supports the layout type returned by GETATTR and | |||
it chooses to use pNFS for data access, it then sends LAYOUTGET using | it chooses to use pNFS for data access, it then sends LAYOUTGET using | |||
the filehandle and stateid returned by OPEN, specifying the range it | the filehandle and stateid returned by OPEN, specifying the range it | |||
wants to do I/O on. The response is a layout, which may be a subset | wants to do I/O on. The response is a layout, which may be a subset | |||
of the range for which the client asked. It also includes device IDs | of the range for which the client asked. It also includes device IDs | |||
and a description of how data is organized (or in the case of | and a description of how data is organized (or in the case of | |||
writing, how data is to be organized) across the devices. The device | writing, how data is to be organized) across the devices. The device | |||
IDs and data description are encoded in a format that is specific to | IDs and data description are encoded in a format that is specific to | |||
the layout type but that the client is expected to understand. | the layout type but that the client is expected to understand. | |||
When the client wants to send an I/O, it determines which device ID | When the client wants to send an I/O, it determines to which device | |||
it needs to send the I/O command to by examining the data description | ID it needs to send the I/O command by examining the data description | |||
in the layout. It then sends a GETDEVICEINFO to find the device | in the layout. It then sends a GETDEVICEINFO to find the device | |||
address(es) of the device ID. The client then sends the I/O request | address(es) of the device ID. The client then sends the I/O request | |||
to one of the device ID's device addresses, using the storage protocol | to one of the device ID's device addresses, using the storage protocol | |||
defined for the layout type. Note that if a client has multiple I/Os | defined for the layout type. Note that if a client has multiple I/Os | |||
to send, these I/O requests may be done in parallel. | to send, these I/O requests may be done in parallel. | |||
If the I/O was a WRITE, then at some point the client may want to use | If the I/O was a WRITE, then at some point the client may want to use | |||
LAYOUTCOMMIT to commit the modification time and the new size of the | LAYOUTCOMMIT to commit the modification time and the new size of the | |||
file (if it believes it extended the file size) to the metadata | file (if it believes it extended the file size) to the metadata | |||
server and the modified data to the file system. | server and the modified data to the file system. | |||
skipping to change at page 301, line 17 | skipping to change at page 301, line 20 | |||
Recovery is complicated by the distributed nature of the pNFS | Recovery is complicated by the distributed nature of the pNFS | |||
protocol. In general, crash recovery for layouts is similar to crash | protocol. In general, crash recovery for layouts is similar to crash | |||
recovery for delegations in the base NFSv4.1 protocol. However, the | recovery for delegations in the base NFSv4.1 protocol. However, the | |||
client's ability to perform I/O without contacting the metadata | client's ability to perform I/O without contacting the metadata | |||
server introduces subtleties that must be handled correctly if the | server introduces subtleties that must be handled correctly if the | |||
possibility of file system corruption is to be avoided. | possibility of file system corruption is to be avoided. | |||
12.7.1. Recovery from Client Restart | 12.7.1. Recovery from Client Restart | |||
Client recovery for layouts is similar to client recovery for other | Client recovery for layouts is similar to client recovery for other | |||
lock and delegation state. When an pNFS client restarts, it will | lock and delegation state. When a pNFS client restarts, it will lose | |||
lose all information about the layouts that it previously owned. | all information about the layouts that it previously owned. There | |||
There are two methods by which the server can reclaim these resources | are two methods by which the server can reclaim these resources and | |||
and allow otherwise conflicting layouts to be provided to other | allow otherwise conflicting layouts to be provided to other clients. | |||
clients. | ||||
The first is through the expiry of the client's lease. If the client | The first is through the expiry of the client's lease. If the client | |||
recovery time is longer than the lease period, the client's lease | recovery time is longer than the lease period, the client's lease | |||
will expire and the server will know that state may be released. For | will expire and the server will know that state may be released. For | |||
layouts the server may release the state immediately upon lease | layouts, the server may release the state immediately upon lease | |||
expiry or it may allow the layout to persist awaiting possible lease | expiry or it may allow the layout to persist, awaiting possible lease | |||
revival, as long as no other layout conflicts. | revival, as long as no other layout conflicts. | |||
The second is through the client restarting in less time than it | The second is through the client restarting in less time than it | |||
takes for the lease period to expire. In such a case, the client | takes for the lease period to expire. In such a case, the client | |||
will contact the server through the standard EXCHANGE_ID protocol. | will contact the server through the standard EXCHANGE_ID protocol. | |||
The server will find that the client's co_ownerid matches the | The server will find that the client's co_ownerid matches the | |||
co_ownerid of the previous client invocation, but that the verifier | co_ownerid of the previous client invocation, but that the verifier | |||
is different. The server uses this as a signal to release all layout | is different. The server uses this as a signal to release all layout | |||
state associated with the client's previous invocation. In this | state associated with the client's previous invocation. In this | |||
scenario, the data written by the client but not covered by a | scenario, the data written by the client but not covered by a | |||
skipping to change at page 302, line 5 | skipping to change at page 302, line 8 | |||
If a client believes its lease has expired, it MUST NOT send I/O to | If a client believes its lease has expired, it MUST NOT send I/O to | |||
the storage device until it has validated its lease. The client can | the storage device until it has validated its lease. The client can | |||
send a SEQUENCE operation to the metadata server. If the SEQUENCE | send a SEQUENCE operation to the metadata server. If the SEQUENCE | |||
operation is successful, but sr_status_flags has | operation is successful, but sr_status_flags has | |||
SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED, | SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED, | |||
SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED, or | SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED, or | |||
SEQ4_STATUS_ADMIN_STATE_REVOKED set, the client MUST NOT use | SEQ4_STATUS_ADMIN_STATE_REVOKED set, the client MUST NOT use | |||
currently held layouts. The client has two choices to recover from | currently held layouts. The client has two choices to recover from | |||
the lease expiration. First, for all modified but uncommitted data, | the lease expiration. First, for all modified but uncommitted data, | |||
write it to the metadata server using the FILE_SYNC4 flag for the | the client writes it to the metadata server using the FILE_SYNC4 flag | |||
WRITEs or WRITE and COMMIT. Second, the client reestablishes a | for the WRITEs, or WRITE and COMMIT. Second, the client re- | |||
client ID and session with the server and obtain new layouts and | establishes a client ID and session with the server and obtains new | |||
device ID to device address mappings for the modified data ranges and | layouts and device-ID-to-device-address mappings for the modified | |||
then write the data to the storage devices with the newly obtained | data ranges and then writes the data to the storage devices with the | |||
layouts. | newly obtained layouts. | |||
If sr_status_flags from the metadata server has | If sr_status_flags from the metadata server has | |||
SEQ4_STATUS_RESTART_RECLAIM_NEEDED set (or SEQUENCE returns | SEQ4_STATUS_RESTART_RECLAIM_NEEDED set (or SEQUENCE returns | |||
NFS4ERR_BAD_SESSION and CREATE_SESSION returns | NFS4ERR_BAD_SESSION and CREATE_SESSION returns | |||
NFS4ERR_STALE_CLIENTID) then the metadata server has restarted, and | NFS4ERR_STALE_CLIENTID), then the metadata server has restarted, and | |||
the client SHOULD recover using the methods described in | the client SHOULD recover using the methods described in | |||
Section 12.7.4. | Section 12.7.4. | |||
If sr_status_flags from the metadata server has | If sr_status_flags from the metadata server has | |||
SEQ4_STATUS_LEASE_MOVED set, then the client recovers by following | SEQ4_STATUS_LEASE_MOVED set, then the client recovers by following | |||
the procedure described in Section 11.7.7.1. After that, the client | the procedure described in Section 11.7.7.1. After that, the client | |||
may get an indication that the layout state was not moved with the | may get an indication that the layout state was not moved with the | |||
file system. The client recovers as in the other applicable | file system. The client recovers as in the other applicable | |||
situations discussed in Paragraph 1 or Paragraph 2 of this section. | situations discussed in the first two paragraphs of this section. | |||
If sr_status_flags reports no loss of state, then the lease for the | If sr_status_flags reports no loss of state, then the lease for the | |||
layouts the client has are valid and renewed, and the client can once | layouts that the client has are valid and renewed, and the client can | |||
again send I/O requests to the storage devices. | once again send I/O requests to the storage devices. | |||
While clients SHOULD NOT send I/Os to storage devices that may extend | While clients SHOULD NOT send I/Os to storage devices that may extend | |||
past the lease expiration time period, this is not always possible; | past the lease expiration time period, this is not always possible, | |||
for example, an extended network partition that starts after the I/O | for example, an extended network partition that starts after the I/O | |||
is sent and does not heal until the I/O request is received by the | is sent and does not heal until the I/O request is received by the | |||
storage device. Thus the metadata server and/or storage devices are | storage device. Thus, the metadata server and/or storage devices are | |||
responsible for protecting themselves from I/Os that are sent before | responsible for protecting themselves from I/Os that are both sent | |||
the lease expires, but arrive after the lease expires. See | before the lease expires and arrive after the lease expires. See | |||
Section 12.7.3. | Section 12.7.3. | |||
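
The lease-validation rules above can be sketched as a small client-side
decision function. This is an illustrative sketch only, not part of
either compared document. The flag values are the SEQ4_STATUS_*
constants from the NFSv4.1 protocol definition; the function name, the
returned action strings, and the order in which the flags are tested
when several are set are assumptions made for this example.

```python
# SEQ4_STATUS_* bits relevant to layout recovery (NFSv4.1 values).
SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED = 0x00000008
SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED = 0x00000010
SEQ4_STATUS_ADMIN_STATE_REVOKED = 0x00000020
SEQ4_STATUS_LEASE_MOVED = 0x00000080
SEQ4_STATUS_RESTART_RECLAIM_NEEDED = 0x00000100

REVOKED_MASK = (SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED
                | SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED
                | SEQ4_STATUS_ADMIN_STATE_REVOKED)


def layout_action(sr_status_flags):
    """Map sr_status_flags from a successful SEQUENCE to a client action.

    The test order below is an assumption of this sketch; the text does
    not rank the flags against each other.
    """
    if sr_status_flags & SEQ4_STATUS_RESTART_RECLAIM_NEEDED:
        # Metadata server restarted: recover per Section 12.7.4.
        return "recover_from_server_restart"
    if sr_status_flags & SEQ4_STATUS_LEASE_MOVED:
        # Follow the lease-moved procedure (Section 11.7.7.1) first.
        return "recover_lease_moved"
    if sr_status_flags & REVOKED_MASK:
        # Currently held layouts MUST NOT be used.
        return "stop_using_layouts"
    # No loss of state: layouts are valid and the lease is renewed.
    return "resume_io"
```

A client that believes its lease has expired would call such a function
on the SEQUENCE reply before sending any further I/O to a storage
device.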
12.7.3. Dealing with Loss of Layout State on the Metadata Server | 12.7.3. Dealing with Loss of Layout State on the Metadata Server | |||
This is a description of the case where all of the following are | This is a description of the case where all of the following are | |||
true: | true: | |||
o the metadata server has not restarted | o the metadata server has not restarted | |||
o a pNFS client's layouts have been discarded (usually because the | o a pNFS client's layouts have been discarded (usually because the | |||
client's lease expired) and are invalid | client's lease expired) and are invalid | |||
o an I/O from the pNFS client arrives at the storage device | o an I/O from the pNFS client arrives at the storage device | |||
The metadata server and its storage devices MUST solve this by | The metadata server and its storage devices MUST solve this by | |||
fencing the client. In other words, prevent the execution of I/O | fencing the client. In other words, they MUST solve this by | |||
operations from the client to the storage devices after layout state | preventing the execution of I/O operations from the client to the | |||
loss. The details of how fencing is done are specific to the layout | storage devices after layout state loss. The details of how fencing | |||
type. The solution for NFSv4.1 file-based layouts is described in | is done are specific to the layout type. The solution for NFSv4.1 | |||
(Section 13.11), and for other layout types in their respective | file-based layouts is described in (Section 13.11), and solutions for | |||
external specification documents. | other layout types are in their respective external specification | |||
documents. | ||||
12.7.4. Recovery from Metadata Server Restart | 12.7.4. Recovery from Metadata Server Restart | |||
The pNFS client will discover that the metadata server has restarted | The pNFS client will discover that the metadata server has restarted | |||
via the methods described in Section 8.4.2 and discussed in a pNFS- | via the methods described in Section 8.4.2 and discussed in a pNFS- | |||
specific context in Paragraph 2, of Section 12.7.2. The client MUST | specific context in Paragraph 2, of Section 12.7.2. The client MUST | |||
stop using layouts and delete the device ID to device address | stop using layouts and delete the device ID to device address | |||
mappings it previously received from the metadata server. Having | mappings it previously received from the metadata server. Having | |||
done that, if the client wrote data to the storage device without | done that, if the client wrote data to the storage device without | |||
committing the layouts via LAYOUTCOMMIT, then the client has | committing the layouts via LAYOUTCOMMIT, then the client has | |||
additional work to do in order to have the client, metadata server | additional work to do in order to have the client, metadata server, | |||
and storage device(s) all synchronized on the state of the data. | and storage device(s) all synchronized on the state of the data. | |||
o If the client has data still modified and unwritten in the | o If the client has data still modified and unwritten in the | |||
client's memory, the client has only two choices. | client's memory, the client has only two choices. | |||
1. The client can obtain a layout via LAYOUTGET after the | 1. The client can obtain a layout via LAYOUTGET after the | |||
server's grace period and write the data to the storage | server's grace period and write the data to the storage | |||
devices. | devices. | |||
2. The client can write that data through the metadata server | 2. The client can WRITE that data through the metadata server | |||
using the WRITE (Section 18.32) operation, and then obtain | using the WRITE (Section 18.32) operation, and then obtain | |||
layouts as desired. | layouts as desired. | |||
o Even if the client synchronously wrote data to the storage device, | o If the client asynchronously wrote data to the storage device, but | |||
if it still has a copy of the data in its memory, then it has | still has a copy of the data in its memory, then it has available | |||
available to it the recovery options listed above in the previous | to it the recovery options listed above in the previous bullet | |||
bullet point. If the metadata server is also in its grace period, | point. If the metadata server is also in its grace period, the | |||
the client has available to it the options below in the next | client has available to it the options below in the next bullet | |||
bullet item. | point. | |||
o The client does not have a copy of the data in its memory and the | o The client does not have a copy of the data in its memory and the | |||
metadata server is still in its grace period. The client cannot | metadata server is still in its grace period. The client cannot | |||
use LAYOUTGET (within or outside the grace period) to reclaim a | use LAYOUTGET (within or outside the grace period) to reclaim a | |||
layout because the contents of the response from LAYOUTGET may not | layout because the contents of the response from LAYOUTGET may not | |||
match what it had previously. The range might be different or it | match what it had previously. The range might be different or the | |||
might get the same range but the content of the layout might be | client might get the same range but the content of the layout | |||
different. Even if the content of the layout appears to be the | might be different. Even if the content of the layout appears to | |||
same, the device IDs may map to different device addresses, and | be the same, the device IDs may map to different device addresses, | |||
even if the device addresses are the same, the device addresses | and even if the device addresses are the same, the device | |||
could have been assigned to a different storage device. The | addresses could have been assigned to a different storage device. | |||
option of retrieving the data from the storage device and writing | The option of retrieving the data from the storage device and | |||
it to the metadata server per the recovery scenario described | writing it to the metadata server per the recovery scenario | |||
above is not available because, again, the mappings of range to | described above is not available because, again, the mappings of | |||
device ID, device ID to device address, device address to physical | range to device ID, device ID to device address, and device | |||
device are stale and new mappings via new LAYOUTGET do not solve | address to physical device are stale, and new mappings via new | |||
the problem. | LAYOUTGET do not solve the problem. | |||
The only recovery option for this scenario is to send a | The only recovery option for this scenario is to send a | |||
LAYOUTCOMMIT in reclaim mode, which the metadata server will | LAYOUTCOMMIT in reclaim mode, which the metadata server will | |||
accept as long as it is in its grace period. The use of | accept as long as it is in its grace period. The use of | |||
LAYOUTCOMMIT in reclaim mode informs the metadata server that the | LAYOUTCOMMIT in reclaim mode informs the metadata server that the | |||
layout has changed. It is critical the metadata server receive | layout has changed. It is critical that the metadata server | |||
this information before its grace period ends, and thus before it | receive this information before its grace period ends, and thus | |||
starts allowing updates to the file system. | before it starts allowing updates to the file system. | |||
To send LAYOUTCOMMIT in reclaim mode, the client sets the | To send LAYOUTCOMMIT in reclaim mode, the client sets the | |||
loca_reclaim field of the operation's arguments (Section 18.42.1) | loca_reclaim field of the operation's arguments (Section 18.42.1) | |||
to TRUE. During the metadata server's recovery grace period (and | to TRUE. During the metadata server's recovery grace period (and | |||
only during the recovery grace period) the metadata server is | only during the recovery grace period) the metadata server is | |||
prepared to accept LAYOUTCOMMIT requests with the loca_reclaim | prepared to accept LAYOUTCOMMIT requests with the loca_reclaim | |||
field set to TRUE. | field set to TRUE. | |||
When loca_reclaim is TRUE, the client is attempting to commit | When loca_reclaim is TRUE, the client is attempting to commit | |||
changes to the layout that occurred prior to the restart of the | changes to the layout that occurred prior to the restart of the | |||
metadata server. The metadata server applies some consistency | metadata server. The metadata server applies some consistency | |||
checks on the loca_layoutupdate field of the arguments to | checks on the loca_layoutupdate field of the arguments to | |||
determine whether the client can commit the data written to the | determine whether the client can commit the data written to the | |||
storage device to the file system. The loca_layoutupdate field is | storage device to the file system. The loca_layoutupdate field is | |||
of data type layoutupdate4, and contains layout type-specific | of data type layoutupdate4 and contains layout-type-specific | |||
content (in the lou_body field of loca_layoutupdate). The layout | content (in the lou_body field of loca_layoutupdate). The layout- | |||
type-specific information that loca_layoutupdate might have is | type-specific information that loca_layoutupdate might have is | |||
discussed in Section 12.5.4.3. If the metadata server's | discussed in Section 12.5.4.3. If the metadata server's | |||
consistency checks on loca_layoutupdate succeed, then the metadata | consistency checks on loca_layoutupdate succeed, then the metadata | |||
server MUST commit the data (as described by the loca_offset, | server MUST commit the data (as described by the loca_offset, | |||
loca_length, and loca_layoutupdate fields of the arguments) that | loca_length, and loca_layoutupdate fields of the arguments) that | |||
was written to storage device. If the metadata server's | was written to the storage device. If the metadata server's | |||
consistency checks on loca_layoutupdate fail, the metadata server | consistency checks on loca_layoutupdate fail, the metadata server | |||
rejects the LAYOUTCOMMIT operation, and makes no changes to the | rejects the LAYOUTCOMMIT operation and makes no changes to the | |||
file system. However, any time LAYOUTCOMMIT with loca_reclaim | file system. However, any time LAYOUTCOMMIT with loca_reclaim | |||
TRUE fails, the pNFS client has lost all the data in the range | TRUE fails, the pNFS client has lost all the data in the range | |||
defined by <loca_offset, loca_length>. A client can defend | defined by <loca_offset, loca_length>. A client can defend | |||
against this risk by caching all data, whether written | against this risk by caching all data, whether written | |||
synchronously or asynchronously in its memory and not release the | synchronously or asynchronously in its memory, and by not | |||
cached data until a successful LAYOUTCOMMIT. This condition does | releasing the cached data until a successful LAYOUTCOMMIT. This | |||
not hold true for all layout types; for example, files-based | condition does not hold true for all layout types; for example, | |||
storage devices need not suffer from this limitation. | file-based storage devices need not suffer from this limitation. | |||
o The client does not have a copy of the data in its memory and the | o The client does not have a copy of the data in its memory and the | |||
metadata server is no longer in its grace period; i.e. the | metadata server is no longer in its grace period; i.e., the | |||
metadata server returns NFS4ERR_NO_GRACE. As with the scenario in | metadata server returns NFS4ERR_NO_GRACE. As with the scenario in | |||
the above bullet item, the failure of LAYOUTCOMMIT means the data | the above bullet point, the failure of LAYOUTCOMMIT means the data | |||
in the range <loca_offset, loca_length> is lost. The defense against | in the range <loca_offset, loca_length> is lost. The defense against | |||
the risk is the same; cache all written data on the client until a | the risk is the same -- cache all written data on the client until | |||
successful LAYOUTCOMMIT. | a successful LAYOUTCOMMIT. | |||
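
The four bullet points above amount to a decision table over what the
client still holds and whether the metadata server is in its grace
period. The following sketch restates that table; it is illustrative
only. The function name, parameter names, and option strings are
hypothetical, and the sketch assumes that a reclaim LAYOUTCOMMIT is
only meaningful for data that actually reached a storage device.

```python
from typing import List


def recovery_options(cached_in_memory: bool,
                     written_to_storage: bool,
                     server_in_grace: bool) -> List[str]:
    """List the recovery options for uncommitted data after a
    metadata server restart. An empty list means the data is lost."""
    options = []
    if cached_in_memory:
        # The client can resend the data itself, by either path.
        options.append("LAYOUTGET after grace period, then write to storage devices")
        options.append("WRITE through metadata server, then obtain layouts as desired")
    if written_to_storage and server_in_grace:
        # Accepted only while the metadata server is in its grace period.
        options.append("LAYOUTCOMMIT with loca_reclaim set to TRUE")
    return options
```

For example, a client with no cached copy and a server that has left
its grace period gets an empty list, which matches the final bullet
point: the reclaim attempt fails with NFS4ERR_NO_GRACE and the data in
the range is lost.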
12.7.5. Operations During Metadata Server Grace Period | 12.7.5. Operations during Metadata Server Grace Period | |||
Some of the recovery scenarios thus far noted that some operations, | Some of the recovery scenarios thus far noted that some operations | |||
namely WRITE and LAYOUTGET might be permitted during the metadata | (namely, WRITE and LAYOUTGET) might be permitted during the metadata | |||
server's grace period. The metadata server may allow these | server's grace period. The metadata server may allow these | |||
operations during its grace period. For LAYOUTGET, the metadata | operations during its grace period. For LAYOUTGET, the metadata | |||
server must reliably determine that servicing such a request will not | server must reliably determine that servicing such a request will not | |||
conflict with an impending LAYOUTCOMMIT reclaim request. For WRITE, | conflict with an impending LAYOUTCOMMIT reclaim request. For WRITE, | |||
it must reliably determine that it will not conflict with an | the metadata server must reliably determine that servicing the | |||
impending OPEN; or a LOCK where the file has mandatory file locking | request will not conflict with an impending OPEN or with a LOCK where | |||
enabled. | the file has mandatory byte-range locking enabled. | |||
As mentioned previously, some operations, namely WRITE and LAYOUTGET | As mentioned previously, for expediency, the metadata server might | |||
may be rejected during the metadata server's grace period, because to | reject some operations (namely, WRITE and LAYOUTGET) during its grace | |||
provide simple, valid handling during the grace period, the easiest | period, because the simplest correct approach is to reject all non- | |||
method is to simply reject all non-reclaim pNFS requests and WRITE | reclaim pNFS requests and WRITE operations by returning the | |||
operations by returning the NFS4ERR_GRACE error. However, depending | NFS4ERR_GRACE error. However, depending on the storage protocol | |||
on the storage protocol (which is specific to the layout type) and | (which is specific to the layout type) and metadata server | |||
metadata server implementation, the metadata server may be able to | implementation, the metadata server may be able to determine that a | |||
determine that a particular request is safe. For example, a metadata | particular request is safe. For example, a metadata server may save | |||
server may save provisional allocation mappings for each file to | provisional allocation mappings for each file to stable storage, as | |||
stable storage, as well as information about potentially conflicting | well as information about potentially conflicting OPEN share modes | |||
OPEN share modes and mandatory byte-range locks that might have been | and mandatory byte-range locks that might have been in effect at the | |||
in effect at the time of restart, and use this information during the | time of restart, and the metadata server may use this information | |||
recovery grace period to determine that a WRITE request is safe. | during the recovery grace period to determine that a WRITE request is | |||
safe. | ||||
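
The grace-period policy described above can be sketched as a gate that
a metadata server applies before normal processing. This is a minimal
sketch under stated assumptions: the error values are the NFSv4.1
constants, while the function name, its parameters, and the
proven_safe input (standing in for whatever layout-type-specific
analysis the server performs) are hypothetical. It models only the
grace-period decision, not any other validity checks.

```python
NFS4_OK = 0
NFS4ERR_GRACE = 10013
NFS4ERR_NO_GRACE = 10033


def grace_period_status(op, is_reclaim, in_grace, proven_safe=False):
    """Return the status the grace-period gate alone would produce.

    op          -- operation name, e.g. "WRITE" or "LAYOUTGET"
    is_reclaim  -- True for a reclaim request (e.g. loca_reclaim TRUE)
    in_grace    -- True while the metadata server is in its grace period
    proven_safe -- True if the server has reliably determined that the
                   request cannot conflict with an impending reclaim
    """
    if not in_grace:
        # Reclaims are only accepted during the grace period.
        return NFS4ERR_NO_GRACE if is_reclaim else NFS4_OK
    if is_reclaim:
        return NFS4_OK
    if op in ("WRITE", "LAYOUTGET") and proven_safe:
        # Optional: the server may allow requests it can prove safe.
        return NFS4_OK
    # Simplest correct approach: reject all other requests.
    return NFS4ERR_GRACE
```

A server that does not keep the provisional state needed to prove
safety simply never passes proven_safe=True and so rejects every
non-reclaim WRITE and LAYOUTGET during grace.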
12.7.6. Storage Device Recovery | 12.7.6. Storage Device Recovery | |||
Recovery from storage device restart is mostly dependent upon the | Recovery from storage device restart is mostly dependent upon the | |||
layout type in use. However, there are a few general techniques a | layout type in use. However, there are a few general techniques a | |||
client can use if it discovers a storage device has crashed while | client can use if it discovers a storage device has crashed while | |||
holding modified, uncommitted data that was asynchronously written. | holding modified, uncommitted data that was asynchronously written. | |||
First and foremost, it is important to realize that the client is the | First and foremost, it is important to realize that the client is the | |||
only one which has the information necessary to recover non-committed | only one that has the information necessary to recover non-committed | |||
data; since, it holds the modified data and probably nothing else | data since it holds the modified data and probably nothing else does. | |||
does. Second, the best solution is for the client to err on the side | Second, the best solution is for the client to err on the side of | |||
of caution and attempt to re-write the modified data through another | caution and attempt to rewrite the modified data through another | |||
path. | path. | |||
The client SHOULD immediately write the data to the metadata server, | The client SHOULD immediately WRITE the data to the metadata server, | |||
with the stable field in the WRITE4args set to FILE_SYNC4. Once it | with the stable field in the WRITE4args set to FILE_SYNC4. Once it | |||
does this, there is no need to wait for the original storage device. | does this, there is no need to wait for the original storage device. | |||
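
As an illustration of the rewrite-through-another-path advice, the
sketch below turns a client's cache of dirty byte ranges into WRITE
arguments aimed at the metadata server. It is a sketch only: the
dictionary keys follow the offset, stable, and data fields of
WRITE4args, FILE_SYNC4 has its stable_how4 value of 2, and the function
name and the shape of the dirty-range cache are assumptions made for
this example (the stateid field is omitted).

```python
FILE_SYNC4 = 2  # stable_how4 value requesting data and metadata on stable storage


def rewrite_through_metadata_server(dirty_ranges):
    """Build one WRITE argument set per cached dirty range.

    dirty_ranges maps a file offset to the bytes the client had written
    asynchronously to the failed storage device and not yet committed.
    """
    return [
        {"offset": offset, "stable": FILE_SYNC4, "data": data}
        for offset, data in sorted(dirty_ranges.items())
    ]
```

Because every WRITE carries FILE_SYNC4, a successful reply means the
data is stable on the metadata server path, so the client does not need
to wait for the original storage device to return.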
12.8. Metadata and Storage Device Roles | 12.8. Metadata and Storage Device Roles | |||
If the same physical hardware is used to implement both a metadata | If the same physical hardware is used to implement both a metadata | |||
server and storage device, then the same hardware entity is to be | server and storage device, then the same hardware entity is to be | |||
understood to be implementing two distinct roles and it is important | understood to be implementing two distinct roles and it is important | |||
that it be clearly understood on behalf of which role the hardware is | that it be clearly understood on behalf of which role the hardware is | |||
executing at any given time. | executing at any given time. | |||
Two sub-cases can be distinguished. | Two sub-cases can be distinguished. | |||
1. The storage device uses NFSv4.1 as the storage protocol, i.e. | 1. The storage device uses NFSv4.1 as the storage protocol, i.e., | |||
same physical hardware is used to implement both a metadata and | the same physical hardware is used to implement both a metadata | |||
data server. See Section 13.1 for a description how multiple | and data server. See Section 13.1 for a description of how | |||
roles are handled. | multiple roles are handled. | |||
2. The storage device does not use NFSv4.1 as the storage protocol, | 2. The storage device does not use NFSv4.1 as the storage protocol, | |||
and the same physical hardware is used to implement both a | and the same physical hardware is used to implement both a | |||
metadata and storage device. Whether distinct network addresses | metadata and storage device. Whether distinct network addresses | |||
are used to access metadata server and storage device is | are used to access the metadata server and storage device is | |||
immaterial, because, it is always clear to the pNFS client and | immaterial. This is because it is always clear to the pNFS | |||
server, from upper layer protocol being used (NFSv4.1 or non- | client and server, from the upper-layer protocol being used | |||
NFSv4.1) what role the request to the common server network | (NFSv4.1 or non-NFSv4.1), to which role the request to the common | |||
address is directed to. | server network address is directed. | |||
12.9. Security Considerations for pNFS | 12.9. Security Considerations for pNFS | |||
pNFS separates file system metadata and data and provides access to | pNFS separates file system metadata and data and provides access to | |||
both. There are pNFS-specific operations (listed in Section 12.3) | both. There are pNFS-specific operations (listed in Section 12.3) | |||
that provide access to the metadata; all existing NFSv4.1 | that provide access to the metadata; all existing NFSv4.1 | |||
conventional (non-pNFS) security mechanisms and features apply to | conventional (non-pNFS) security mechanisms and features apply to | |||
accessing the metadata. The combination of components in a pNFS | accessing the metadata. The combination of components in a pNFS | |||
system (see Figure 1) is required to preserve the security properties | system (see Figure 1) is required to preserve the security properties | |||
of NFSv4.1 with respect to an entity accessing storage device from a | of NFSv4.1 with respect to an entity that is accessing a storage | |||
client, including security countermeasures to defend against threats | device from a client, including security countermeasures to defend | |||
that NFSv4.1 provides defenses for in environments where these | against threats for which NFSv4.1 provides defenses in environments | |||
threats are considered significant. | where these threats are considered significant. | |||
In some cases, the security countermeasures for connections to | In some cases, the security countermeasures for connections to | |||
storage devices may take the form of physical isolation or a | storage devices may take the form of physical isolation or a | |||
recommendation not to use pNFS in an environment. For example, it | recommendation to avoid the use of pNFS in an environment. For | |||
may be impractical to provide confidentiality protection for some | example, it may be impractical to provide confidentiality protection | |||
storage protocols to protect against eavesdropping; in environments | for some storage protocols to protect against eavesdropping. In | |||
where eavesdropping on such protocols is of sufficient concern to | environments where eavesdropping on such protocols is of sufficient | |||
require countermeasures, physical isolation of the communication | concern to require countermeasures, physical isolation of the | |||
channel (e.g., via direct connection from client(s) to storage | communication channel (e.g., via direct connection from client(s) to | |||
device(s)) and/or a decision to forgo use of pNFS (e.g., and fall | storage device(s)) and/or a decision to forgo use of pNFS (e.g., and | |||
back to conventional NFSv4.1) may be appropriate courses of action. | fall back to conventional NFSv4.1) may be appropriate courses of | |||
action. | ||||
Where communication with storage devices is subject to the same | Where communication with storage devices is subject to the same | |||
threats as client to metadata server communication, the protocols | threats as client-to-metadata server communication, the protocols | |||
used for that communication need to provide security mechanisms as | used for that communication need to provide security mechanisms as | |||
strong as or no weaker than those available via RPCSEC_GSS for | strong as or no weaker than those available via RPCSEC_GSS for | |||
NFSv4.1. Except for the storage protocol used for the | NFSv4.1. Except for the storage protocol used for the | |||
LAYOUT4_NFSV4_1_FILES layout (see Section 13), i.e. except for | LAYOUT4_NFSV4_1_FILES layout (see Section 13), i.e., except for | |||
NFSv4.1, it is beyond the scope of this document to specify the | NFSv4.1, it is beyond the scope of this document to specify the | |||
security mechanisms for storage access protocols. | security mechanisms for storage access protocols. | |||
pNFS implementations MUST NOT remove NFSv4.1's access controls. The | pNFS implementations MUST NOT remove NFSv4.1's access controls. The | |||
combination of clients, storage devices, and the metadata server are | combination of clients, storage devices, and the metadata server are | |||
responsible for ensuring that all client to storage device file data | responsible for ensuring that all client-to-storage-device file data | |||
access respects NFSv4.1's ACLs and file open modes. This entails | access respects NFSv4.1's ACLs and file open modes. This entails | |||
performing both of these checks on every access in the client, the | performing both of these checks on every access in the client, the | |||
storage device, or both (as applicable; when the storage device is an | storage device, or both (as applicable; when the storage device is an | |||
NFSv4.1 server, the storage device is ultimately responsible for | NFSv4.1 server, the storage device is ultimately responsible for | |||
controlling access as described in Section 13.9.2). If a pNFS | controlling access as described in Section 13.9.2). If a pNFS | |||
configuration performs these checks only in the client, the risk of a | configuration performs these checks only in the client, the risk of a | |||
misbehaving client obtaining unauthorized access is an important | misbehaving client obtaining unauthorized access is an important | |||
consideration in determining when it is appropriate to use such a | consideration in determining when it is appropriate to use such a | |||
pNFS configuration. Such layout types SHOULD NOT be used when | pNFS configuration. Such layout types SHOULD NOT be used when | |||
client-only access checks do not provide sufficient assurance that | client-only access checks do not provide sufficient assurance that | |||
NFSv4.1 access control is being applied correctly. (This is not a | NFSv4.1 access control is being applied correctly. (This is not a | |||
problem for the file layout type described in Section 13 because the | problem for the file layout type described in Section 13 because the | |||
storage access protocol for LAYOUT4_NFSV4_1_FILES is NFSv4.1, and | storage access protocol for LAYOUT4_NFSV4_1_FILES is NFSv4.1, and | |||
thus the security model for storage device access via | thus the security model for storage device access via | |||
LAYOUT4_NFSv4_1_FILES is the sames as that of the metadata server.) | LAYOUT4_NFSv4_1_FILES is the same as that of the metadata server.) | |||
For handling of access control specific to a layout, the reader | For handling of access control specific to a layout, the reader | |||
should examine the layout specification, such as the NFSv4.1/ | should examine the layout specification, such as the NFSv4.1/ | |||
files-based layout (Section 13) of this document, the blocks layout | file-based layout (Section 13) of this document, the blocks layout | |||
[41], and objects layout [40]. | [41], and objects layout [40]. | |||
13. NFSv4.1 as a Storage Protocol in pNFS: the File Layout Type | 13. NFSv4.1 as a Storage Protocol in pNFS: the File Layout Type | |||
This section describes the semantics and format of NFSv4.1 file-based | This section describes the semantics and format of NFSv4.1 file-based | |||
layouts for pNFS. NFSv4.1 file-based layouts uses the | layouts for pNFS. NFSv4.1 file-based layouts use the | |||
LAYOUT4_NFSV4_1_FILES layout type. The LAYOUT4_NFSV4_1_FILES type | LAYOUT4_NFSV4_1_FILES layout type. The LAYOUT4_NFSV4_1_FILES type | |||
defines striping data across multiple NFSv4.1 data servers. | defines striping data across multiple NFSv4.1 data servers. | |||
13.1. Client ID and Session Considerations | 13.1. Client ID and Session Considerations | |||
Sessions are a REQUIRED feature of NFSv4.1, and this extends to both | Sessions are a REQUIRED feature of NFSv4.1, and this extends to both | |||
the metadata server and file-based (NFSv4.1-based) data servers. | the metadata server and file-based (NFSv4.1-based) data servers. | |||
The role a server plays in pNFS is determined by the result it | The role a server plays in pNFS is determined by the result it | |||
returns from EXCHANGE_ID. The roles are: | returns from EXCHANGE_ID. The roles are: | |||
o metadata server (EXCHGID4_FLAG_USE_PNFS_MDS is set in the result | o Metadata server (EXCHGID4_FLAG_USE_PNFS_MDS is set in the result | |||
eir_flags), | eir_flags). | |||
o data server (EXCHGID4_FLAG_USE_PNFS_DS) | o Data server (EXCHGID4_FLAG_USE_PNFS_DS). | |||
o non-metadata server (EXCHGID4_FLAG_USE_NON_PNFS). This is an | o Non-metadata server (EXCHGID4_FLAG_USE_NON_PNFS). This is an | |||
NFSv4.1 server that does not support operations (e.g. LAYOUTGET) | NFSv4.1 server that does not support operations (e.g., LAYOUTGET) | |||
or attributes that pertain to pNFS. | or attributes that pertain to pNFS. | |||
The client MAY request zero or more of EXCHGID4_FLAG_USE_NON_PNFS, | The client MAY request zero or more of EXCHGID4_FLAG_USE_NON_PNFS, | |||
EXCHGID4_FLAG_USE_PNFS_DS, or EXCHGID4_FLAG_USE_PNFS_MDS, even though | EXCHGID4_FLAG_USE_PNFS_DS, or EXCHGID4_FLAG_USE_PNFS_MDS, even though | |||
some combinations (e.g. EXCHGID4_FLAG_USE_NON_PNFS | | some combinations (e.g., EXCHGID4_FLAG_USE_NON_PNFS | | |||
EXCHGID4_FLAG_USE_PNFS_MDS) are contradictory. The server however | EXCHGID4_FLAG_USE_PNFS_MDS) are contradictory. However, the server | |||
MUST only return the following acceptable combinations: | MUST only return the following acceptable combinations: | |||
+--------------------------------------------------------+ | +--------------------------------------------------------+ | |||
| Acceptable Results from EXCHANGE_ID | | | Acceptable Results from EXCHANGE_ID | | |||
+--------------------------------------------------------+ | +--------------------------------------------------------+ | |||
| EXCHGID4_FLAG_USE_PNFS_MDS | | | EXCHGID4_FLAG_USE_PNFS_MDS | | |||
| EXCHGID4_FLAG_USE_PNFS_MDS | EXCHGID4_FLAG_USE_PNFS_DS | | | EXCHGID4_FLAG_USE_PNFS_MDS | EXCHGID4_FLAG_USE_PNFS_DS | | |||
| EXCHGID4_FLAG_USE_PNFS_DS | | | EXCHGID4_FLAG_USE_PNFS_DS | | |||
| EXCHGID4_FLAG_USE_NON_PNFS | | | EXCHGID4_FLAG_USE_NON_PNFS | | |||
| EXCHGID4_FLAG_USE_PNFS_DS | EXCHGID4_FLAG_USE_NON_PNFS | | | EXCHGID4_FLAG_USE_PNFS_DS | EXCHGID4_FLAG_USE_NON_PNFS | | |||
+--------------------------------------------------------+ | +--------------------------------------------------------+ | |||
As the above table implies, a server can have one or two roles. A | As the above table implies, a server can have one or two roles. A | |||
server can be both a metadata server and a data server or it can be | server can be both a metadata server and a data server, or it can be | |||
both a data server and non-metadata server. In addition to returning | both a data server and non-metadata server. In addition to returning | |||
two roles in EXCHANGE_ID's results, and thus serving both roles via a | two roles in the EXCHANGE_ID's results, and thus serving both roles | |||
common client ID, a server can serve two roles by returning a unique | via a common client ID, a server can serve two roles by returning a | |||
client ID and server owner for each role in each of two EXCHANGE_ID | unique client ID and server owner for each role in each of two | |||
results, with each result indicating each role. | EXCHANGE_ID results, with each result indicating each role. | |||
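As an illustration of the table above, the following minimal sketch checks whether the role bits in an EXCHANGE_ID result form one of the five acceptable combinations. The numeric flag values and the helper names (PNFS_ROLE_MASK, is_acceptable_result, roles) are assumptions made for this example; the values are recalled from the EXCHANGE_ID definition elsewhere in the specification and should be verified against it before use.

```python
# Assumed flag values (not stated in this section; verify against the
# EXCHANGE_ID definition before relying on them).
EXCHGID4_FLAG_USE_NON_PNFS = 0x00010000
EXCHGID4_FLAG_USE_PNFS_MDS = 0x00020000
EXCHGID4_FLAG_USE_PNFS_DS = 0x00040000

# Hypothetical helper: mask covering only the three role bits.
PNFS_ROLE_MASK = (EXCHGID4_FLAG_USE_NON_PNFS
                  | EXCHGID4_FLAG_USE_PNFS_MDS
                  | EXCHGID4_FLAG_USE_PNFS_DS)

# The five combinations listed in "Acceptable Results from EXCHANGE_ID".
ACCEPTABLE_RESULTS = frozenset({
    EXCHGID4_FLAG_USE_PNFS_MDS,
    EXCHGID4_FLAG_USE_PNFS_MDS | EXCHGID4_FLAG_USE_PNFS_DS,
    EXCHGID4_FLAG_USE_PNFS_DS,
    EXCHGID4_FLAG_USE_NON_PNFS,
    EXCHGID4_FLAG_USE_PNFS_DS | EXCHGID4_FLAG_USE_NON_PNFS,
})


def is_acceptable_result(eir_flags):
    """Return True if the role bits of eir_flags match a row of the table."""
    return (eir_flags & PNFS_ROLE_MASK) in ACCEPTABLE_RESULTS


def roles(eir_flags):
    """List the roles indicated by eir_flags, in a fixed order."""
    names = []
    if eir_flags & EXCHGID4_FLAG_USE_PNFS_MDS:
        names.append("metadata server")
    if eir_flags & EXCHGID4_FLAG_USE_PNFS_DS:
        names.append("data server")
    if eir_flags & EXCHGID4_FLAG_USE_NON_PNFS:
        names.append("non-metadata server")
    return names
```

A client could apply such a check to eir_flags before deciding whether to send layout operations or pNFS data operations to the server. Non-role bits in eir_flags are ignored by the mask.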
In the case of a server with concurrent pNFS roles that are served by | In the case of a server with concurrent pNFS roles that are served by | |||
a common client ID, if the EXCHANGE_ID request from the client has | a common client ID, if the EXCHANGE_ID request from the client has | |||
zero or a combination of the bits set in eia_flags, the server result | zero or a combination of the bits set in eia_flags, the server result | |||
should set bits which represent the higher of the acceptable | should set bits that represent the higher of the acceptable | |||
combination of the server roles, with a preference to match the roles | combination of the server roles, with a preference to match the roles | |||
requested by the client. Thus if a client request has | requested by the client. Thus, if a client request has | |||
(EXCHGID4_FLAG_USE_NON_PNFS | EXCHGID4_FLAG_USE_PNFS_MDS | | (EXCHGID4_FLAG_USE_NON_PNFS | EXCHGID4_FLAG_USE_PNFS_MDS | | |||
EXCHGID4_FLAG_USE_PNFS_DS) flags set, and the server is both a | EXCHGID4_FLAG_USE_PNFS_DS) flags set, and the server is both a | |||
metadata server and a data server, serving both the roles by a common | metadata server and a data server, serving both the roles by a common | |||
client ID, the server SHOULD return with (EXCHGID4_FLAG_USE_PNFS_MDS | client ID, the server SHOULD return with (EXCHGID4_FLAG_USE_PNFS_MDS | |||
| EXCHGID4_FLAG_USE_PNFS_DS) set. | | EXCHGID4_FLAG_USE_PNFS_DS) set. | |||
In the case of a server that has multiple concurrent pNFS roles, each | In the case of a server that has multiple concurrent pNFS roles, each | |||
role served by a unique client ID, if the client specifies zero or a | role served by a unique client ID, if the client specifies zero or a | |||
combination of roles in the request, the server results SHOULD return | combination of roles in the request, the server results SHOULD return | |||
only one of the roles from the combination specified by the client | only one of the roles from the combination specified by the client | |||
skipping to change at page 309, line 20 | skipping to change at page 309, line 28 | |||
data server, it needs a client ID on that data server. If it does | data server, it needs a client ID on that data server. If it does | |||
not yet have a client ID from the server that had the | not yet have a client ID from the server that had the | |||
EXCHGID4_FLAG_USE_PNFS_DS flag set in the EXCHANGE_ID results, then | EXCHGID4_FLAG_USE_PNFS_DS flag set in the EXCHANGE_ID results, then | |||
the client needs to send an EXCHANGE_ID to the data server, using the | the client needs to send an EXCHANGE_ID to the data server, using the | |||
same co_ownerid as it sent to the metadata server, with the | same co_ownerid as it sent to the metadata server, with the | |||
EXCHGID4_FLAG_USE_PNFS_DS flag set in the arguments. If the server's | EXCHGID4_FLAG_USE_PNFS_DS flag set in the arguments. If the server's | |||
EXCHANGE_ID results have EXCHGID4_FLAG_USE_PNFS_DS set, then the | EXCHANGE_ID results have EXCHGID4_FLAG_USE_PNFS_DS set, then the | |||
client may use the client ID to create sessions that will exchange | client may use the client ID to create sessions that will exchange | |||
pNFS data operations. The client ID returned by the data server has | pNFS data operations. The client ID returned by the data server has | |||
no relationship with the client ID returned by a metadata server | no relationship with the client ID returned by a metadata server | |||
unless the client IDs are equal and the server owners and server | unless the client IDs are equal, and the server owners and server | |||
scopes of the data server and metadata server are equal. | scopes of the data server and metadata server are equal. | |||
In NFSv4.1, the session ID in the SEQUENCE operation implies the | In NFSv4.1, the session ID in the SEQUENCE operation implies the | |||
client ID, which in turn might be used by the server to map the | client ID, which in turn might be used by the server to map the | |||
stateid to the right client/server pair. However, when a data server | stateid to the right client/server pair. However, when a data server | |||
is presented with a READ or WRITE operation with a stateid, because | is presented with a READ or WRITE operation with a stateid, because | |||
the stateid is associated with client ID on a metadata server, and | the stateid is associated with a client ID on a metadata server, and | |||
because the session ID in the preceding SEQUENCE operation is tied to | because the session ID in the preceding SEQUENCE operation is tied to | |||
the client ID of the data server, the data server has no obvious way | the client ID of the data server, the data server has no obvious way | |||
to determine the metadata server from the COMPOUND procedure, and | to determine the metadata server from the COMPOUND procedure, and | |||
thus has no way to validate the stateid. One RECOMMENDED approach is | thus has no way to validate the stateid. One RECOMMENDED approach is | |||
for pNFS servers to encode metadata server routing and/or identity | for pNFS servers to encode metadata server routing and/or identity | |||
information in the data server filehandles as returned in the layout. | information in the data server filehandles as returned in the layout. | |||
If metadata server routing and/or identity information is encoded in | If metadata server routing and/or identity information is encoded in | |||
data server filehandles, when the metadata server identity or | data server filehandles, when the metadata server identity or | |||
location changes, the data server filehandles it gave out will become | location changes, the data server filehandles it gave out will become | |||
invalid (stale), and so the metadata server MUST first recall the | invalid (stale), and so the metadata server MUST first recall the | |||
layouts. Invalidating a data server filehandle does not render the | layouts. Invalidating a data server filehandle does not render the | |||
NFS client's data cache invalid. The client's cache should map a | NFS client's data cache invalid. The client's cache should map a | |||
data server filehandle to a metadata server filehandle, and a | data server filehandle to a metadata server filehandle, and a | |||
metadata server filehandle to cached data. | metadata server filehandle to cached data. | |||
If a server is both a metadata server and a data server, the server | If a server is both a metadata server and a data server, the server | |||
might need to distinguish operations on files that are directed to | might need to distinguish operations on files that are directed to | |||
the metadata server from those that are directed to the data server. | the metadata server from those that are directed to the data server. | |||
It is RECOMMENDED that the values of the filehandles returned by the | It is RECOMMENDED that the values of the filehandles returned by the | |||
LAYOUTGET operation to be different than the value of the filehandle | LAYOUTGET operation be different than the value of the filehandle | |||
returned by the OPEN of the same file. | returned by the OPEN of the same file. | |||
Another scenario is for the metadata server and the storage device to | Another scenario is for the metadata server and the storage device to | |||
be distinct from one client's point of view, and the roles reversed | be distinct from one client's point of view, and the roles reversed | |||
from another client's point of view. For example, in the cluster | from another client's point of view. For example, in the cluster | |||
file system model, a metadata server to one client might be a data | file system model, a metadata server to one client might be a data | |||
server to another client. If NFSv4.1 is being used as the storage | server to another client. If NFSv4.1 is being used as the storage | |||
protocol, then pNFS servers need to encode the values of filehandles | protocol, then pNFS servers need to encode the values of filehandles | |||
according to their specific roles. | according to their specific roles. | |||
13.1.1. Sessions Considerations for Data Servers | 13.1.1. Sessions Considerations for Data Servers | |||
Section 2.10.11.2 states that a client has to keep its lease renewed | Section 2.10.11.2 states that a client has to keep its lease renewed | |||
in order to prevent a session from being deleted by the server. If | in order to prevent a session from being deleted by the server. If | |||
the reply to EXCHANGE_ID has just the EXCHGID4_FLAG_USE_PNFS_DS role | the reply to EXCHANGE_ID has just the EXCHGID4_FLAG_USE_PNFS_DS role | |||
set, then as noted in Section 13.6 the client will not be able to | set, then (as noted in Section 13.6) the client will not be able to | |||
determine the data server's lease_time attribute, because GETATTR | determine the data server's lease_time attribute because GETATTR will | |||
will not be permitted. Instead, the rule is that any time a client | not be permitted. Instead, the rule is that any time a client | |||
receives a layout referring it to a data server that returns just the | receives a layout referring it to a data server that returns just the | |||
EXCHGID4_FLAG_USE_PNFS_DS role, the client MAY assume that the | EXCHGID4_FLAG_USE_PNFS_DS role, the client MAY assume that the | |||
lease_time attribute from the metadata server that returned the | lease_time attribute from the metadata server that returned the | |||
layout applies to the data server. Thus the data server MUST be | layout applies to the data server. Thus, the data server MUST be | |||
aware of the values of all lease_time attributes of all metadata | aware of the values of all lease_time attributes of all metadata | |||
servers it is providing I/O for, and MUST use the maximum of all such | servers for which it is providing I/O, and it MUST use the maximum of | |||
lease_time values as the lease interval for all client IDs and | all such lease_time values as the lease interval for all client IDs | |||
sessions established on it. | and sessions established on it. | |||
For example, if one metadata server has a lease_time attribute of 20 | For example, if one metadata server has a lease_time attribute of 20 | |||
seconds, and a second metadata server has a lease_time attribute of | seconds, and a second metadata server has a lease_time attribute of | |||
10 seconds, then if both servers return layouts that refer to an | 10 seconds, then if both servers return layouts that refer to an | |||
EXCHGID4_FLAG_USE_PNFS_DS-only data server, the data server MUST | EXCHGID4_FLAG_USE_PNFS_DS-only data server, the data server MUST | |||
renew a client's lease if the interval between two SEQUENCE | renew a client's lease if the interval between two SEQUENCE | |||
operations on different COMPOUND requests is less than 20 seconds. | operations on different COMPOUND requests is less than 20 seconds. | |||
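The lease rule in the example above can be sketched as follows. The function names are hypothetical, and the sketch assumes the data server already knows the lease_time value (in seconds) of every metadata server for which it provides I/O.

```python
def data_server_lease_interval(mds_lease_times):
    """Lease interval a DS-only server uses: the maximum lease_time
    among all metadata servers it is providing I/O for."""
    if not mds_lease_times:
        raise ValueError("at least one metadata server lease_time is needed")
    return max(mds_lease_times)


def must_renew_lease(seconds_between_sequences, mds_lease_times):
    """True if two SEQUENCE operations on different COMPOUND requests
    arrived close enough together that the lease has to be renewed."""
    return seconds_between_sequences < data_server_lease_interval(mds_lease_times)
```

With lease_time values of 20 and 10 seconds, the interval used is 20 seconds, so a client that sends SEQUENCE operations 15 seconds apart keeps its lease on the data server even though 15 seconds exceeds the shorter lease_time.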
13.2. File Layout Definitions | 13.2. File Layout Definitions | |||
The following definitions apply to the LAYOUT4_NFSV4_1_FILES layout | The following definitions apply to the LAYOUT4_NFSV4_1_FILES layout | |||
type, and may be applicable to other layout types. | type and may be applicable to other layout types. | |||
Unit. A unit is a fixed size quantity of data written to a data | Unit. A unit is a fixed-size quantity of data written to a data | |||
server. | server. | |||
Pattern. A pattern is a method of distributing one or more equal | Pattern. A pattern is a method of distributing one or more equal | |||
sized units across a set of data servers. A pattern is iterated | sized units across a set of data servers. A pattern is iterated | |||
one or more times. | one or more times. | |||
Stripe. An stripe is a set of data distributed across a set of data | Stripe. A stripe is a set of data distributed across a set of data | |||
servers in a pattern before that pattern repeats. | servers in a pattern before that pattern repeats. | |||
Stripe Count. A stripe count is the number of units in a pattern. | Stripe Count. A stripe count is the number of units in a pattern. | |||
Stripe Width. A stripe width is the size of stripe in bytes. The | Stripe Width. A stripe width is the size of a stripe in bytes. The | |||
stripe width = the stripe count * the size of the stripe unit. | stripe width = the stripe count * the size of the stripe unit. | |||
Hereafter, this document will refer to a unit that is written in a | Hereafter, this document will refer to a unit that is written in a | |||
pattern as a "stripe unit". | pattern as a "stripe unit". | |||
A pattern may have more stripe units than data servers. If so, some | A pattern may have more stripe units than data servers. If so, some | |||
data servers will have more than one stripe unit per stripe. A data | data servers will have more than one stripe unit per stripe. A data | |||
server that has multiple stripe units per stripe MAY store each unit | server that has multiple stripe units per stripe MAY store each unit | |||
in a different data file (and depending on the implementation, will | in a different data file (and depending on the implementation, will | |||
possibly assign a unique data filehandle to each data file). | possibly assign a unique data filehandle to each data file). | |||
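The definitions above can be made concrete with a small sketch. The pattern is modeled here as a sequence of data server labels, one per stripe unit; that representation and the function names are assumptions made for illustration and are not part of the protocol.

```python
from collections import Counter


def stripe_width(stripe_count, stripe_unit_size):
    """Stripe width in bytes = stripe count * size of the stripe unit."""
    return stripe_count * stripe_unit_size


def units_per_data_server(pattern):
    """Count how many stripe units of one stripe land on each data server.

    `pattern` holds one data server label per stripe unit, so its length
    is the stripe count.
    """
    return Counter(pattern)


# Hypothetical pattern: four stripe units spread over three data servers,
# so one server ("ds0") holds two stripe units per stripe.
example_pattern = ["ds0", "ds1", "ds0", "ds2"]
```

For this pattern and a 64 KiB stripe unit, the stripe count is 4 and the stripe width is 262144 bytes; "ds0" holds two stripe units per stripe, which it MAY store in separate data files.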
skipping to change at page 312, line 27 | skipping to change at page 312, line 27 | |||
struct nfsv4_1_file_layouthint4 { | struct nfsv4_1_file_layouthint4 { | |||
uint32_t nflh_care; | uint32_t nflh_care; | |||
nfl_util4 nflh_util; | nfl_util4 nflh_util; | |||
count4 nflh_stripe_count; | count4 nflh_stripe_count; | |||
}; | }; | |||
The generic layout hint structure is described in Section 3.3.19. | The generic layout hint structure is described in Section 3.3.19. | |||
The client uses the layout hint in the layout_hint (Section 5.12.4) | The client uses the layout hint in the layout_hint (Section 5.12.4) | |||
attribute to indicate the preferred type of layout to be used for a | attribute to indicate the preferred type of layout to be used for a | |||
newly created file. The LAYOUT4_NFSV4_1_FILES layout type-specific | newly created file. The LAYOUT4_NFSV4_1_FILES layout-type-specific | |||
content for the layout hint is composed of three fields. The first | content for the layout hint is composed of three fields. The first | |||
field, nflh_care, is a set of flags indicating which values of the | field, nflh_care, is a set of flags indicating which values of the | |||
hint the client cares about. If the NFLH4_CARE_DENSE flag is set, | hint the client cares about. If the NFLH4_CARE_DENSE flag is set, | |||
then the client indicates in the second field, nflh_util, a | then the client indicates in the second field, nflh_util, a | |||
preference for how the data file is packed (Section 13.4.4), which is | preference for how the data file is packed (Section 13.4.4), which is | |||
controlled by the value of nflh_util & NFL4_UFLG_DENSE. If the | controlled by the value of the expression nflh_util & NFL4_UFLG_DENSE | |||
("&" represents the bitwise AND operator). If the | ||||
NFLH4_CARE_COMMIT_THRU_MDS flag is set, then the client indicates a | NFLH4_CARE_COMMIT_THRU_MDS flag is set, then the client indicates a | |||
preference for whether the client should send COMMIT operations to | preference for whether the client should send COMMIT operations to | |||
the metadata server or data server (Section 13.7), which is | the metadata server or data server (Section 13.7), which is | |||
controlled by the value of nflh_util & NFL4_UFLG_COMMIT_THRU_MDS. If | controlled by the value of nflh_util & NFL4_UFLG_COMMIT_THRU_MDS. If | |||
the NFLH4_CARE_STRIPE_UNIT_SIZE flag is set, the client indicates its | the NFLH4_CARE_STRIPE_UNIT_SIZE flag is set, the client indicates its | |||
preferred stripe unit size, which is indicated in nflh_util & | preferred stripe unit size, which is indicated in nflh_util & | |||
NFL4_UFLG_STRIPE_UNIT_SIZE_MASK (thus the stripe unit size MUST be a | NFL4_UFLG_STRIPE_UNIT_SIZE_MASK (thus, the stripe unit size MUST be a | |||
multiple of 64 bytes). The minimum stripe unit size is 64 bytes. If | multiple of 64 bytes). The minimum stripe unit size is 64 bytes. If | |||
the NFLH4_CARE_STRIPE_COUNT flag is set, the client indicates in the | the NFLH4_CARE_STRIPE_COUNT flag is set, the client indicates in the | |||
third field, nflh_stripe_count, the stripe count. The stripe count | third field, nflh_stripe_count, the stripe count. The stripe count | |||
multiplied by the stripe unit size is the stripe width. | multiplied by the stripe unit size is the stripe width. | |||
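The interpretation of the three hint fields described above can be sketched as follows. The numeric values of the NFL4_UFLG_* and NFLH4_CARE_* constants are assumptions recalled from the constant definitions that precede the nfsv4_1_file_layouthint4 structure (not shown in this excerpt) and should be checked against them; decode_layout_hint and the dictionary keys are hypothetical names.

```python
# Assumed constant values (verify against the specification's definitions).
NFL4_UFLG_DENSE = 0x00000001
NFL4_UFLG_COMMIT_THRU_MDS = 0x00000002
NFL4_UFLG_STRIPE_UNIT_SIZE_MASK = 0xFFFFFFC0

NFLH4_CARE_DENSE = NFL4_UFLG_DENSE
NFLH4_CARE_COMMIT_THRU_MDS = NFL4_UFLG_COMMIT_THRU_MDS
NFLH4_CARE_STRIPE_UNIT_SIZE = 0x00000040
NFLH4_CARE_STRIPE_COUNT = 0x00000080


def decode_layout_hint(nflh_care, nflh_util, nflh_stripe_count):
    """Return only the preferences the client says it cares about."""
    prefs = {}
    if nflh_care & NFLH4_CARE_DENSE:
        prefs["dense"] = bool(nflh_util & NFL4_UFLG_DENSE)
    if nflh_care & NFLH4_CARE_COMMIT_THRU_MDS:
        prefs["commit_thru_mds"] = bool(nflh_util & NFL4_UFLG_COMMIT_THRU_MDS)
    if nflh_care & NFLH4_CARE_STRIPE_UNIT_SIZE:
        # Masking off the low six bits is why the stripe unit size
        # is always a multiple of 64 bytes.
        prefs["stripe_unit_size"] = nflh_util & NFL4_UFLG_STRIPE_UNIT_SIZE_MASK
    if nflh_care & NFLH4_CARE_STRIPE_COUNT:
        prefs["stripe_count"] = nflh_stripe_count
    if "stripe_unit_size" in prefs and "stripe_count" in prefs:
        prefs["stripe_width"] = prefs["stripe_count"] * prefs["stripe_unit_size"]
    return prefs
```

Fields whose care flag is clear are left out of the result, reflecting that the client expressed no preference for them.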
When LAYOUTGET returns a LAYOUT4_NFSV4_1_FILES layout (indicated in | When LAYOUTGET returns a LAYOUT4_NFSV4_1_FILES layout (indicated in | |||
the loc_type field of the lo_content field), the loc_body field of | the loc_type field of the lo_content field), the loc_body field of | |||
the lo_content field contains a value of data type | the lo_content field contains a value of data type | |||
nfsv4_1_file_layout4. Among other content, nfsv4_1_file_layout4 has | nfsv4_1_file_layout4. Among other content, nfsv4_1_file_layout4 has | |||
a storage device ID (field nfl_deviceid) of data type deviceid4. The | a storage device ID (field nfl_deviceid) of data type deviceid4. The | |||
skipping to change at page 313, line 25 | skipping to change at page 313, line 26 | |||
struct nfsv4_1_file_layout_ds_addr4 { | struct nfsv4_1_file_layout_ds_addr4 { | |||
uint32_t nflda_stripe_indices<>; | uint32_t nflda_stripe_indices<>; | |||
multipath_list4 nflda_multipath_ds_list<>; | multipath_list4 nflda_multipath_ds_list<>; | |||
}; | }; | |||
The nfsv4_1_file_layout_ds_addr4 data type represents the device | The nfsv4_1_file_layout_ds_addr4 data type represents the device | |||
address. It is composed of two fields: | address. It is composed of two fields: | |||
1. nflda_multipath_ds_list: An array of lists of data servers, where | 1. nflda_multipath_ds_list: An array of lists of data servers, where | |||
each list can be one or more elements, and each element | each list can be one or more elements, and each element | |||
represents a (see Section 13.5) data server address which may | represents a data server address that may serve equally as the | |||
serve equally as the target of IO operations. The length of this | target of I/O operations (see Section 13.5). The length of this | |||
array might be different than the stripe count. | array might be different than the stripe count. | |||
2. nflda_stripe_indices: An array of indices used to index into | 2. nflda_stripe_indices: An array of indices used to index into | |||
nflda_multipath_ds_list. The value of each element of | nflda_multipath_ds_list. The value of each element of | |||
nflda_stripe_indices MUST be less than the number of elements in | nflda_stripe_indices MUST be less than the number of elements in | |||
nflda_multipath_ds_list. Each element of nflda_multipath_ds_list | nflda_multipath_ds_list. Each element of nflda_multipath_ds_list | |||
SHOULD be referred to by one or more elements of | SHOULD be referred to by one or more elements of | |||
nflda_stripe_indices. The number of elements in | nflda_stripe_indices. The number of elements in | |||
nflda_stripe_indices is always equal to the stripe count. | nflda_stripe_indices is always equal to the stripe count. | |||
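The constraints on the two fields of nfsv4_1_file_layout_ds_addr4 can be sketched as a simple consistency check. Python lists stand in for the XDR arrays, the addresses are made-up placeholders, and the function names are hypothetical.

```python
def check_ds_addr(nflda_stripe_indices, nflda_multipath_ds_list):
    """Validate a device address.

    Raises ValueError when a MUST-level constraint is violated and returns
    the positions in nflda_multipath_ds_list that no stripe index refers to
    (a SHOULD-level concern, so it is reported rather than rejected).
    """
    n = len(nflda_multipath_ds_list)
    for ds_list in nflda_multipath_ds_list:
        if len(ds_list) < 1:
            raise ValueError("each multipath list needs one or more elements")
    for index in nflda_stripe_indices:
        if not 0 <= index < n:
            raise ValueError("stripe index %d is out of range" % index)
    return sorted(set(range(n)) - set(nflda_stripe_indices))


def servers_for_pattern_position(position, nflda_stripe_indices,
                                 nflda_multipath_ds_list):
    """Equivalent data server addresses for one position in the pattern."""
    return nflda_multipath_ds_list[nflda_stripe_indices[position]]


# Placeholder addresses: three multipath lists, stripe count of 3.
example_ds_list = [["10.0.0.1"], ["10.0.0.2", "10.0.0.3"], ["10.0.0.4"]]
example_indices = [0, 1, 0]
```

In this example, the stripe count (3) happens to equal the number of multipath lists, but the third list is never referenced, which the check reports without treating it as an error.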
skipping to change at page 314, line 7 | skipping to change at page 314, line 7 | |||
struct nfsv4_1_file_layout4 { | struct nfsv4_1_file_layout4 { | |||
deviceid4 nfl_deviceid; | deviceid4 nfl_deviceid; | |||
nfl_util4 nfl_util; | nfl_util4 nfl_util; | |||
uint32_t nfl_first_stripe_index; | uint32_t nfl_first_stripe_index; | |||
offset4 nfl_pattern_offset; | offset4 nfl_pattern_offset; | |||
nfs_fh4 nfl_fh_list<>; | nfs_fh4 nfl_fh_list<>; | |||
}; | }; | |||
The nfsv4_1_file_layout4 data type represents the layout. It is | The nfsv4_1_file_layout4 data type represents the layout. It is | |||
composed of the following fields: | composed of the following fields: | |||
1. nfl_deviceid: The device ID which maps to a value of type | 1. nfl_deviceid: The device ID that maps to a value of type | |||
nfsv4_1_file_layout_ds_addr4. | nfsv4_1_file_layout_ds_addr4. | |||
2. nfl_util: Like the nflh_util field of data type | 2. nfl_util: Like the nflh_util field of data type | |||
nfsv4_1_file_layouthint4, a compact representation of how the | nfsv4_1_file_layouthint4, a compact representation of how the | |||
data on a file on each data server is packed, whether the client | data on a file on each data server is packed, whether the client | |||
should send COMMIT operations to the metadata server or data | should send COMMIT operations to the metadata server or data | |||
server, and the stripe unit size. If a server returns two or | server, and the stripe unit size. If a server returns two or | |||
more overlapping layouts, each stripe unit size in each | more overlapping layouts, each stripe unit size in each | |||
overlapping layout MUST be the same. | overlapping layout MUST be the same. | |||
3. nfl_first_stripe_index: The index into the first element of the | 3. nfl_first_stripe_index: The index into the first element of the | |||
nflda_stripe_indices array to use. | nflda_stripe_indices array to use. | |||
4. nfl_pattern_offset: This field is the logical offset into the | 4. nfl_pattern_offset: This field is the logical offset into the | |||
file where the striping pattern starts. It is required for | file where the striping pattern starts. It is required for | |||
converting the client's logical I/O offset (e.g. the current | converting the client's logical I/O offset (e.g., the current | |||
offset in a POSIX file descriptor before the read() or write() | offset in a POSIX file descriptor before the read() or write() | |||
system call is sent) into the stripe unit number (see | system call is sent) into the stripe unit number (see | |||
Section 13.4.1). | Section 13.4.1). | |||
If dense packing is used, then nfl_pattern_offset is also needed | If dense packing is used, then nfl_pattern_offset is also needed | |||
to convert the client's logical I/O offset to an offset on the | to convert the client's logical I/O offset to an offset on the | |||
file on the data server corresponding to the stripe unit number | file on the data server corresponding to the stripe unit number | |||
(see Section 13.4.4). | (see Section 13.4.4). | |||
Note that nfl_pattern_offset is not always the same as lo_offset. | Note that nfl_pattern_offset is not always the same as lo_offset. | |||
For example, via the LAYOUTGET operation, a client might request | For example, via the LAYOUTGET operation, a client might request | |||
a layout starting at offset 1000 of a file that has its striping | a layout starting at offset 1000 of a file that has its striping | |||
pattern start at offset 0. | pattern start at offset zero. | |||
5. nfl_fh_list: An array of data server filehandles for each list of | 5. nfl_fh_list: An array of data server filehandles for each list of | |||
data servers in each element of the nflda_multipath_ds_list | data servers in each element of the nflda_multipath_ds_list | |||
array. The number of elements in nfl_fh_list depends on whether | array. The number of elements in nfl_fh_list depends on whether | |||
sparse or dense packing is being used. | sparse or dense packing is being used. | |||
* If sparse packing is being used, the number of elements in | * If sparse packing is being used, the number of elements in | |||
nfl_fh_list MUST be one of three values: | nfl_fh_list MUST be one of three values: | |||
+ Zero. This means that filehandles used for each data | + Zero. This means that filehandles used for each data | |||
skipping to change at page 315, line 15 | skipping to change at page 315, line 15 | |||
+ One. This means that every data server uses the same | + One. This means that every data server uses the same | |||
filehandle: what is specified in nfl_fh_list[0]. | filehandle: what is specified in nfl_fh_list[0]. | |||
+ The same number of elements in nflda_multipath_ds_list. | + The same number of elements in nflda_multipath_ds_list. | |||
Thus, in this case, when sending an I/O operation to any | Thus, in this case, when sending an I/O operation to any | |||
data server in nflda_multipath_ds_list[X], the filehandle | data server in nflda_multipath_ds_list[X], the filehandle | |||
in nfl_fh_list[X] MUST be used. | in nfl_fh_list[X] MUST be used. | |||
See the discussion on sparse packing in Section 13.4.4. | See the discussion on sparse packing in Section 13.4.4. | |||
* If dense packing is being used, number of elements in | * If dense packing is being used, the number of elements in | |||
nfl_fh_list MUST be the same as the number of elements in | nfl_fh_list MUST be the same as the number of elements in | |||
nflda_stripe_indices. Thus when sending an I/O operation to | nflda_stripe_indices. Thus, when sending an I/O operation to | |||
any data server in | any data server in | |||
nflda_multipath_ds_list[nflda_stripe_indices[Y]], the | nflda_multipath_ds_list[nflda_stripe_indices[Y]], the | |||
filehandle in nfl_fh_list[Y] MUST be used. In addition, any | filehandle in nfl_fh_list[Y] MUST be used. In addition, any | |||
time there exists i, and j, (i != j) such that the | time there exists i and j, (i != j), such that the | |||
intersection of | intersection of | |||
nflda_multipath_ds_list[nflda_stripe_indices[i]] and | nflda_multipath_ds_list[nflda_stripe_indices[i]] and | |||
nflda_multipath_ds_list[nflda_stripe_indices[j]] is not empty, | nflda_multipath_ds_list[nflda_stripe_indices[j]] is not empty, | |||
then nfl_fh_list[i] MUST NOT equal nfl_fh_list[j]. In other | then nfl_fh_list[i] MUST NOT equal nfl_fh_list[j]. In other | |||
words, when dense packing is being used, if a data server | words, when dense packing is being used, if a data server | |||
appears in two or more units of a striping pattern, each | appears in two or more units of a striping pattern, each | |||
reference to the data server MUST use a different filehandle. | reference to the data server MUST use a different filehandle. | |||
Indeed, if there are multiple striping patterns, as indicated | Indeed, if there are multiple striping patterns, as indicated | |||
by the presence of multiple objects of data type layout4 | by the presence of multiple objects of data type layout4 | |||
skipping to change at page 316, line 8 | skipping to change at page 316, line 8 | |||
To find the stripe unit number that corresponds to the client's | To find the stripe unit number that corresponds to the client's | |||
logical file offset, the pattern offset will also be used. The i'th | logical file offset, the pattern offset will also be used. The i'th | |||
stripe unit (SUi) is: | stripe unit (SUi) is: | |||
relative_offset = file_offset - nfl_pattern_offset; | relative_offset = file_offset - nfl_pattern_offset; | |||
SUi = floor(relative_offset / stripe_unit_size); | SUi = floor(relative_offset / stripe_unit_size); | |||
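The calculation above can be sketched in Python as follows. This is only an illustration of the formula; the function name stripe_unit_number and the sample values (4096-byte stripe units, a pattern offset of 0) are assumptions made for the example and do not come from the protocol.

```python
def stripe_unit_number(file_offset, nfl_pattern_offset, stripe_unit_size):
    """Return the stripe unit number (SUi) for a logical file offset."""
    relative_offset = file_offset - nfl_pattern_offset
    # Integer floor division plays the role of floor() in the formula.
    return relative_offset // stripe_unit_size


# Assumed sample values: 4096-byte stripe units, pattern starting at offset 0.
su = stripe_unit_number(file_offset=10000,
                        nfl_pattern_offset=0,
                        stripe_unit_size=4096)
```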
13.4.2. Interpreting the File Layout Using Sparse Packing | 13.4.2. Interpreting the File Layout Using Sparse Packing | |||
When sparse packing is used, the algorithm for determining the | When sparse packing is used, the algorithm for determining the | |||
filehandle and set of data server network addresses to write stripe | filehandle and set of data-server network addresses to write stripe | |||
unit i (SUi) to is: | unit i (SUi) to is: | |||
stripe_count = number of elements in nflda_stripe_indices; | stripe_count = number of elements in nflda_stripe_indices; | |||
j = (SUi + nfl_first_stripe_index) % stripe_count; | j = (SUi + nfl_first_stripe_index) % stripe_count; | |||
idx = nflda_stripe_indices[j]; | idx = nflda_stripe_indices[j]; | |||
fh_count = number of elements in nfl_fh_list; | fh_count = number of elements in nfl_fh_list; | |||
ds_count = number of elements in nflda_multipath_ds_list; | ds_count = number of elements in nflda_multipath_ds_list; | |||
skipping to change at page 316, line 50 | skipping to change at page 316, line 50 | |||
The client would then select a data server from address_list, and | The client would then select a data server from address_list, and | |||
send a READ or WRITE operation using the filehandle specified in fh. | send a READ or WRITE operation using the filehandle specified in fh. | |||
Consider the following example: | Consider the following example: | |||
Suppose we have a device address consisting of seven data servers, | Suppose we have a device address consisting of seven data servers, | |||
arranged in three equivalence (Section 13.5) classes: | arranged in three equivalence (Section 13.5) classes: | |||
{ A, B, C, D }, { E }, { F, G } | { A, B, C, D }, { E }, { F, G } | |||
Where A through G are network addresses. | where A through G are network addresses. | |||
Then | Then | |||
nflda_multipath_ds_list<> = { A, B, C, D }, { E }, { F, G } | nflda_multipath_ds_list<> = { A, B, C, D }, { E }, { F, G } | |||
i.e. | i.e., | |||
nflda_multipath_ds_list[0] = { A, B, C, D } | nflda_multipath_ds_list[0] = { A, B, C, D } | |||
nflda_multipath_ds_list[1] = { E } | nflda_multipath_ds_list[1] = { E } | |||
nflda_multipath_ds_list[2] = { F, G } | nflda_multipath_ds_list[2] = { F, G } | |||
Suppose the striping index array is: | Suppose the striping index array is: | |||
nflda_stripe_indices<> = { 2, 0, 1, 0 } | nflda_stripe_indices<> = { 2, 0, 1, 0 } | |||
Now suppose the client gets a layout which has a device ID that maps | Now suppose the client gets a layout that has a device ID that maps | |||
to the above device address. The initial index, | to the above device address. The initial index contains | |||
nfl_first_stripe_index = 2, | nfl_first_stripe_index = 2, | |||
and | and the filehandle list is | |||
nfl_fh_list = { 0x36, 0x87, 0x67 }. | nfl_fh_list = { 0x36, 0x87, 0x67 }. | |||
If the client wants to write to SU0, the set of valid { network | If the client wants to write to SU0, the set of valid { network | |||
address, filehandle } combinations for SUi are determined by: | address, filehandle } combinations for SUi are determined by: | |||
nfl_first_stripe_index = 2 | nfl_first_stripe_index = 2 | |||
So | So | |||
skipping to change at page 317, line 51 | skipping to change at page 317, line 51 | |||
= 1 | = 1 | |||
So | So | |||
nflda_multipath_ds_list[1] = { E } | nflda_multipath_ds_list[1] = { E } | |||
and | and | |||
nfl_fh_list[1] = { 0x87 } | nfl_fh_list[1] = { 0x87 } | |||
The client can thus write SU0 to { 0x87, { E }, }. | The client can thus write SU0 to { 0x87, { E } }. | |||
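The sparse packing selection worked through above can be sketched in Python. This is a hedged illustration rather than a reference implementation: the function name sparse_target is hypothetical, data server addresses are modeled as strings, and the sketch handles only the one-element and per-element forms of nfl_fh_list (the zero-element case is deliberately left out).

```python
def sparse_target(su, first_stripe_index, stripe_indices, fh_list, ds_list):
    """Return (filehandle, address_list) for stripe unit `su`, sparse packing."""
    j = (su + first_stripe_index) % len(stripe_indices)
    idx = stripe_indices[j]
    if len(fh_list) == 1:
        fh = fh_list[0]      # every data server uses the same filehandle
    elif len(fh_list) == len(ds_list):
        fh = fh_list[idx]    # one filehandle per nflda_multipath_ds_list element
    else:
        # The zero-element case and invalid lengths are outside this sketch.
        raise ValueError("nfl_fh_list length not handled by this sketch")
    return fh, ds_list[idx]


# Values taken from the example above.
ds_list = [["A", "B", "C", "D"], ["E"], ["F", "G"]]
stripe_indices = [2, 0, 1, 0]
fh_list = [0x36, 0x87, 0x67]

for su in range(6):
    fh, addrs = sparse_target(su, 2, stripe_indices, fh_list, ds_list)
    print(su, hex(fh), ",".join(addrs))
```

Under sparse packing, the filehandle is indexed by idx, the same index that selects the data server list.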
The destinations of the first thirteen storage units are: | The destinations of the first 13 storage units are: | |||
+-----+------------+--------------+ | +-----+------------+--------------+ | |||
| SUi | filehandle | data servers | | | SUi | filehandle | data servers | | |||
+-----+------------+--------------+ | +-----+------------+--------------+ | |||
| 0 | 87 | E | | | 0 | 87 | E | | |||
| 1 | 36 | A,B,C,D | | | 1 | 36 | A,B,C,D | | |||
| 2 | 67 | F,G | | | 2 | 67 | F,G | | |||
| 3 | 36 | A,B,C,D | | | 3 | 36 | A,B,C,D | | |||
| 4 | 87 | E | | | 4 | 87 | E | | |||
| 5 | 36 | A,B,C,D | | | 5 | 36 | A,B,C,D | | |||
skipping to change at page 319, line 15 | skipping to change at page 319, line 15 | |||
send a READ or WRITE operation using the filehandle specified in fh. | send a READ or WRITE operation using the filehandle specified in fh. | |||
Consider the following example (which is the same as the sparse | Consider the following example (which is the same as the sparse | |||
packing example, except for the filehandle list): | packing example, except for the filehandle list): | |||
Suppose we have a device address consisting of seven data servers, | Suppose we have a device address consisting of seven data servers, | |||
arranged in three equivalence (Section 13.5) classes: | arranged in three equivalence (Section 13.5) classes: | |||
{ A, B, C, D }, { E }, { F, G } | { A, B, C, D }, { E }, { F, G } | |||
Where A through G are network addresses. | where A through G are network addresses. | |||
Then | Then | |||
nflda_multipath_ds_list<> = { A, B, C, D }, { E }, { F, G } | nflda_multipath_ds_list<> = { A, B, C, D }, { E }, { F, G } | |||
i.e. | i.e., | |||
nflda_multipath_ds_list[0] = { A, B, C, D } | nflda_multipath_ds_list[0] = { A, B, C, D } | |||
nflda_multipath_ds_list[1] = { E } | nflda_multipath_ds_list[1] = { E } | |||
nflda_multipath_ds_list[2] = { F, G } | nflda_multipath_ds_list[2] = { F, G } | |||
Suppose the striping index array is: | Suppose the striping index array is: | |||
nflda_stripe_indices<> = { 2, 0, 1, 0 } | nflda_stripe_indices<> = { 2, 0, 1, 0 } | |||
Now suppose the client gets a layout which has a device ID that maps | Now suppose the client gets a layout that has a device ID that maps | |||
to the above device address. The initial index, | to the above device address. The initial index contains | |||
nfl_first_stripe_index = 2, | nfl_first_stripe_index = 2, | |||
and | and | |||
nfl_fh_list = { 0x67, 0x37, 0x87, 0x36 }. | nfl_fh_list = { 0x67, 0x37, 0x87, 0x36 }. | |||
The interesting examples for dense packing are SU1 and SU3, because | The interesting examples for dense packing are SU1 and SU3 because | |||
each stripe unit refers to the same data server list, yet MUST use a | each stripe unit refers to the same data server list, yet each stripe | |||
different filehandle. If the client wants to write to SU1, the set | unit MUST use a different filehandle. If the client wants to write | |||
of valid { network address, filehandle } combinations for SUi are | to SU1, the set of valid { network address, filehandle } combinations | |||
determined by: | for SUi are determined by: | |||
nfl_first_stripe_index = 2 | nfl_first_stripe_index = 2 | |||
So | So | |||
j = (1 + 2) % 4 = 3 | j = (1 + 2) % 4 = 3 | |||
idx = nflda_stripe_indices[j] | idx = nflda_stripe_indices[j] | |||
= nflda_stripe_indices[3] | = nflda_stripe_indices[3] | |||
= 0 | = 0 | |||
So | So | |||
nflda_multipath_ds_list[0] = { A, B, C, D } | nflda_multipath_ds_list[0] = { A, B, C, D } | |||
and | and | |||
nfl_fh_list[3] = { 0x36 } | nfl_fh_list[3] = { 0x36 } | |||
The client can thus write SU1 to { 0x36, { A, B, C, D }, }. | The client can thus write SU1 to { 0x36, { A, B, C, D } }. | |||
For SU3, j = (3 + 2) % 4 = 1, and nflda_stripe_indices[1] = 0. Then | For SU3, j = (3 + 2) % 4 = 1, and nflda_stripe_indices[1] = 0. Then | |||
nflda_multipath_ds_list[0] = { A, B, C, D }, and nfl_fh_list[1] = | nflda_multipath_ds_list[0] = { A, B, C, D }, and nfl_fh_list[1] = | |||
0x37. The client can thus write SU3 to { 0x37, { A, B, C, D } }. | 0x37. The client can thus write SU3 to { 0x37, { A, B, C, D } }. | |||
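The dense packing selection for SU1 and SU3 can be sketched in Python. This is an illustration under stated assumptions: the function names dense_target and dense_fh_list_is_valid are hypothetical, and addresses are modeled as strings. The second function checks the rule that stripe positions whose data server lists intersect must not share a filehandle.

```python
def dense_target(su, first_stripe_index, stripe_indices, fh_list, ds_list):
    """Return (filehandle, address_list) for stripe unit `su`, dense packing."""
    if len(fh_list) != len(stripe_indices):
        raise ValueError("dense packing needs one filehandle per stripe index")
    j = (su + first_stripe_index) % len(stripe_indices)
    idx = stripe_indices[j]
    # Dense packing indexes the filehandle by j, not by idx.
    return fh_list[j], ds_list[idx]


def dense_fh_list_is_valid(stripe_indices, fh_list, ds_list):
    """Check that positions sharing a data server use distinct filehandles."""
    n = len(stripe_indices)
    for i in range(n):
        for k in range(i + 1, n):
            shared = set(ds_list[stripe_indices[i]]) & set(ds_list[stripe_indices[k]])
            if shared and fh_list[i] == fh_list[k]:
                return False
    return True


# Values taken from the example above.
ds_list = [["A", "B", "C", "D"], ["E"], ["F", "G"]]
stripe_indices = [2, 0, 1, 0]
fh_list = [0x67, 0x37, 0x87, 0x36]

su1 = dense_target(1, 2, stripe_indices, fh_list, ds_list)
su3 = dense_target(3, 2, stripe_indices, fh_list, ds_list)
```

SU1 and SU3 resolve to the same data server list but to different filehandles, which is exactly what the validity check enforces.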
The destinations of the first thirteen storage units are: | The destinations of the first 13 storage units are: | |||
+-----+------------+--------------+ | +-----+------------+--------------+ | |||
| SUi | filehandle | data servers | | | SUi | filehandle | data servers | | |||
+-----+------------+--------------+ | +-----+------------+--------------+ | |||
| 0 | 87 | E | | | 0 | 87 | E | | |||
| 1 | 36 | A,B,C,D | | | 1 | 36 | A,B,C,D | | |||
| 2 | 67 | F,G | | | 2 | 67 | F,G | | |||
| 3 | 37 | A,B,C,D | | | 3 | 37 | A,B,C,D | | |||
| 4 | 87 | E | | | 4 | 87 | E | | |||
| 5 | 36 | A,B,C,D | | | 5 | 36 | A,B,C,D | | |||
skipping to change at page 321, line 5 | skipping to change at page 321, line 5 | |||
| 12 | 87 | E | | | 12 | 87 | E | | |||
+-----+------------+--------------+ | +-----+------------+--------------+ | |||
13.4.4. Sparse and Dense Stripe Unit Packing | 13.4.4. Sparse and Dense Stripe Unit Packing | |||
The flag NFL4_UFLG_DENSE of the nfl_util4 data type (field nflh_util | The flag NFL4_UFLG_DENSE of the nfl_util4 data type (field nflh_util | |||
of the data type nfsv4_1_file_layouthint4 and field nfl_util of data | of the data type nfsv4_1_file_layouthint4 and field nfl_util of data | |||
type nfsv4_1_file_layout_ds_addr4) specifies how the data is packed | type nfsv4_1_file_layout_ds_addr4) specifies how the data is packed | |||
within the data file on a data server. It allows for two different | within the data file on a data server. It allows for two different | |||
data packings: sparse and dense. The packing type determines the | data packings: sparse and dense. The packing type determines the | |||
calculation that will be made to map the client visible file offset | calculation that will be made to map the client-visible file offset | |||
to the offset within the data file located on the data server. | to the offset within the data file located on the data server. | |||
If nfl_util & NFL4_UFLG_DENSE is zero, this means that sparse packing | If nfl_util & NFL4_UFLG_DENSE is zero, this means that sparse packing | |||
is being used. Hence the logical offsets of the file as viewed by a | is being used. Hence, the logical offsets of the file as viewed by a | |||
client sending READs and WRITEs directly to the metadata server are | client sending READs and WRITEs directly to the metadata server are | |||
the same offsets each data server uses when storing a stripe unit. | the same offsets each data server uses when storing a stripe unit. | |||
The effect then, for striping patterns consisting of at least two | The effect then, for striping patterns consisting of at least two | |||
stripe units, is for each data server file to be sparse or holey. So | stripe units, is for each data server file to be sparse or "holey". | |||
for example, suppose there is a pattern with three stripe units, the | So for example, suppose there is a pattern with three stripe units, | |||
stripe unit size is a 4096 bytes, and there are three data servers in | the stripe unit size is 4096 bytes, and there are three data servers | |||
the pattern, then the file in data server 1 will have stripe units 0, | in the pattern. Then, the file in data server 1 will have stripe | |||
3, 6, 9, ... filled, data server 2's file will have stripe units 1, | units 0, 3, 6, 9, ... filled; data server 2's file will have stripe | |||
4, 7, 10, ... filled, and data server 3's file will have stripe units | units 1, 4, 7, 10, ... filled; and data server 3's file will have | |||
2, 5, 8, 11, ... filled. The unfilled stripe units of each file will | stripe units 2, 5, 8, 11, ... filled. The unfilled stripe units of | |||
be holes, hence the files in each data server are sparse. | each file will be holes; hence, the files in each data server are | |||
sparse. | ||||
If sparse packing is being used and a client attempts I/O to one of | If sparse packing is being used and a client attempts I/O to one of | |||
the holes, then an error MUST be returned by the data server. Using | the holes, then an error MUST be returned by the data server. Using | |||
the above example, if data server 3 received a READ or WRITE request | the above example, if data server 3 received a READ or WRITE | |||
for block 4, the data server would return NFS4ERR_PNFS_IO_HOLE. Thus | operation for block 4, the data server would return | |||
data servers need to understand the striping pattern in order to | NFS4ERR_PNFS_IO_HOLE. Thus, data servers need to understand the | |||
support sparse packing. | striping pattern in order to support sparse packing. | |||
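The hole check in the three-server example can be sketched in Python. Several things here are assumptions made for illustration: the function name sparse_io_check is hypothetical, data servers 1 through 3 are mapped to zero-based indices 0 through 2, stripe units are assigned round-robin starting at data server 1, "block 4" is read as stripe unit 4, and the error is represented by its symbolic name only.

```python
NFS4ERR_PNFS_IO_HOLE = "NFS4ERR_PNFS_IO_HOLE"  # symbolic name only


def sparse_io_check(server_index, stripe_unit, server_count=3):
    """Return "OK" if this server stores the stripe unit, else the hole error."""
    owner = stripe_unit % server_count  # round-robin assignment (assumed)
    return "OK" if owner == server_index else NFS4ERR_PNFS_IO_HOLE


# Data server 3 (index 2) receives I/O for stripe unit 4, which belongs
# to data server 2 (index 1) in the example pattern.
result = sparse_io_check(server_index=2, stripe_unit=4)
```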
If nfl_util & NFL4_UFLG_DENSE is one, this means that dense packing | If nfl_util & NFL4_UFLG_DENSE is one, this means that dense packing | |||
is being used and the data server files have no holes. Dense packing | is being used, and the data server files have no holes. Dense | |||
might be selected because the data server does not (efficiently) | packing might be selected because the data server does not | |||
support holey files, or because the data server cannot recognize | (efficiently) support holey files or because the data server cannot | |||
read-ahead unless there are no holes. If dense packing is indicated | recognize read-ahead unless there are no holes. If dense packing is | |||
in the layout, the data files will be packed. Using the example | indicated in the layout, the data files will be packed. Using the | |||
striping pattern and stripe unit size that was used for the sparse | same striping pattern and stripe unit size that were used for the | |||
packing example, the corresponding dense packing would have all | sparse packing example, the corresponding dense packing example would | |||
stripe units of all data files filled. Logical stripe units 0, 3, 6, | have all stripe units of all data files filled as follows: | |||
... of the file would live on stripe units 0, 1, 2, ... of the file | ||||
of data server 1, logical stripe units 1, 4, 7, ... of the file would | o Logical stripe units 0, 3, 6, ... of the file would live on stripe | |||
live on stripe units 0, 1, 2, ... of the file of data server 2, and | units 0, 1, 2, ... of the file of data server 1. | |||
logical stripe units 2, 5, 8, ... of the file would live on stripe | ||||
o Logical stripe units 1, 4, 7, ... of the file would live on stripe | ||||
units 0, 1, 2, ... of the file of data server 2. | ||||
o Logical stripe units 2, 5, 8, ... of the file would live on stripe | ||||
units 0, 1, 2, ... of the file of data server 3. | units 0, 1, 2, ... of the file of data server 3. | |||
Because dense packing does not leave holes on the data servers, the | Because dense packing does not leave holes on the data servers, the | |||
pNFS client is allowed to write to any offset of any data file of any | pNFS client is allowed to write to any offset of any data file of any | |||
data server in the stripe. Thus the data servers need not know the | data server in the stripe. Thus, the data servers need not know the | |||
file's striping pattern. | file's striping pattern. | |||
The calculation to determine the byte offset within the data file for | The calculation to determine the byte offset within the data file for | |||
dense data server layouts is: | dense data server layouts is: | |||
stripe_width = stripe_unit_size * N; | stripe_width = stripe_unit_size * N; | |||
where N = number of elements in nflda_stripe_indices. | where N = number of elements in nflda_stripe_indices. | |||
relative_offset = file_offset - nfl_pattern_offset; | relative_offset = file_offset - nfl_pattern_offset; | |||
data_file_offset = floor(relative_offset / stripe_width) | data_file_offset = floor(relative_offset / stripe_width) | |||
* stripe_unit_size | * stripe_unit_size | |||
+ relative_offset % stripe_unit_size | + relative_offset % stripe_unit_size | |||
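The dense offset calculation can be sketched in Python. The function name dense_data_file_offset and the sample values (three stripe units of 4096 bytes, a pattern offset of 0) are assumptions for illustration; the arithmetic follows the formula above.

```python
def dense_data_file_offset(file_offset, nfl_pattern_offset, stripe_unit_size, n):
    """Map a logical file offset to the offset within a dense data file.

    n is the number of elements in nflda_stripe_indices.
    """
    stripe_width = stripe_unit_size * n
    relative_offset = file_offset - nfl_pattern_offset
    return ((relative_offset // stripe_width) * stripe_unit_size
            + relative_offset % stripe_unit_size)


# Logical stripe unit 3 starts at byte 3 * 4096 and lands at the start of
# stripe unit 1 of its data file.
start_of_su3 = dense_data_file_offset(3 * 4096, 0, 4096, 3)

# Byte 100 of logical stripe unit 7 lands 100 bytes into stripe unit 2.
inside_su7 = dense_data_file_offset(7 * 4096 + 100, 0, 4096, 3)
```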
If dense packing is being used, and a data server appears more than | If dense packing is being used, and a data server appears more than | |||
once in a striping pattern, then to distinguish one stripe unit from | once in a striping pattern, then to distinguish one stripe unit from | |||
another, the data server MUST use a different filehandle. Let's | another, the data server MUST use a different filehandle. Let's | |||
suppose there are two data servers. Logical stripe units 0, 3, 6 are | suppose there are two data servers. Logical stripe units 0, 3, 6 are | |||
served by data server 1, logical stripe units 1, 4, 7 are served by | served by data server 1; logical stripe units 1, 4, 7 are served by | |||
data server 2, and logical stripe units 2, 5, 8 are also served by | data server 2; and logical stripe units 2, 5, 8 are also served by | |||
data server 2. Unless data server 2 has two filehandles (each | data server 2. Unless data server 2 has two filehandles (each | |||
referring to a different data file), then, for example, a write to | referring to a different data file), then, for example, a write to | |||
logical stripe unit 1 overwrites the write to logical stripe unit 2, | logical stripe unit 1 overwrites the write to logical stripe unit 2 | |||
because both logical stripe units are located in the same stripe unit | because both logical stripe units are located in the same stripe unit | |||
(0) of data server 2. | (0) of data server 2. | |||
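The collision described above can be checked with a short Python sketch. The 4096-byte stripe unit size, the pattern offset of 0, and the helper name dense_data_file_offset are assumptions for illustration; the pattern has three stripe units, with the second and third both served by data server 2.

```python
def dense_data_file_offset(file_offset, nfl_pattern_offset, stripe_unit_size, n):
    """Dense packing offset within a data file (n = stripe index count)."""
    stripe_width = stripe_unit_size * n
    relative_offset = file_offset - nfl_pattern_offset
    return ((relative_offset // stripe_width) * stripe_unit_size
            + relative_offset % stripe_unit_size)


SIZE = 4096  # assumed stripe unit size
N = 3        # three stripe units in the pattern

# Logical stripe units 1 and 2 are both served by data server 2.
offset_su1 = dense_data_file_offset(1 * SIZE, 0, SIZE, N)
offset_su2 = dense_data_file_offset(2 * SIZE, 0, SIZE, N)

# Both map to the same data file offset, so with a single filehandle on
# data server 2 the two writes would target the same bytes.
collides = offset_su1 == offset_su2
```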
13.5. Data Server Multipathing | 13.5. Data Server Multipathing | |||
The NFSv4.1 file layout supports multipathing to multiple data server | The NFSv4.1 file layout supports multipathing to multiple data server | |||
addresses. Data server-level multipathing is used for bandwidth | addresses. Data-server-level multipathing is used for bandwidth | |||
scaling via trunking (Section 2.10.5) and for higher availability of | scaling via trunking (Section 2.10.5) and for higher availability of | |||
use in the case of a data server failure. Multipathing allows the | use in the case of a data-server failure. Multipathing allows the | |||
client to switch to another data server address which may that of | client to switch to another data server address which may be that of | |||
another data server that is exporting the same data stripe unit, | another data server that is exporting the same data stripe unit, | |||
without having to contact the metadata server for a new layout. | without having to contact the metadata server for a new layout. | |||
To support data server multipathing, each element of the | To support data server multipathing, each element of the | |||
nflda_multipath_ds_list contains an array of one or more data server | nflda_multipath_ds_list contains an array of one or more data server | |||
network addresses. This array (data type multipath_list4) represents | network addresses. This array (data type multipath_list4) represents | |||
a list of data servers (each identified by a network address), with | a list of data servers (each identified by a network address), with | |||
it being possible that some data servers will appear in the list | the possibility that some data servers will appear in the list | |||
multiple times. | multiple times. | |||
The client is free to use any of the network addresses as a | The client is free to use any of the network addresses as a | |||
destination to send data server requests. If some network addresses | destination to send data server requests. If some network addresses | |||
are less optimal paths to the data than others, then the MDS SHOULD | are less optimal paths to the data than others, then the MDS SHOULD | |||
NOT include those network addresses in an element of | NOT include those network addresses in an element of | |||
nflda_multipath_ds_list. If less optimal network addresses exist to | nflda_multipath_ds_list. If less optimal network addresses exist to | |||
provide fail over, the RECOMMENDED method to offer the addresses is | provide failover, the RECOMMENDED method to offer the addresses is to | |||
to provide them in a replacement device ID to device address mapping, | provide them in a replacement device-ID-to-device-address mapping, or | |||
or a replacement device ID. When a client finds that no data server | a replacement device ID. When a client finds that no data server in | |||
in an element of nflda_multipath_ds_list responds, it SHOULD send a | an element of nflda_multipath_ds_list responds, it SHOULD send a | |||
GETDEVICEINFO to attempt to replace the existing device ID to device | GETDEVICEINFO to attempt to replace the existing device-ID-to-device- | |||
address mappings. If the MDS detects that all data servers | address mappings. If the MDS detects that all data servers | |||
represented by an element of nflda_multipath_ds_list are unavailable, | represented by an element of nflda_multipath_ds_list are unavailable, | |||
the MDS SHOULD send a CB_NOTIFY_DEVICEID (if the client has indicated | the MDS SHOULD send a CB_NOTIFY_DEVICEID (if the client has indicated | |||
it wants device ID notifications for changed device IDs) to change | it wants device ID notifications for changed device IDs) to change | |||
the device ID to device address mappings to the available data | the device-ID-to-device-address mappings to the available data | |||
servers. If the device ID itself will be replaced, the MDS SHOULD | servers. If the device ID itself will be replaced, the MDS SHOULD | |||
recall all layouts with the device ID, and thus force the client to | recall all layouts with the device ID, and thus force the client to | |||
get new layouts and device ID mappings via LAYOUTGET and | get new layouts and device ID mappings via LAYOUTGET and | |||
GETDEVICEINFO. | GETDEVICEINFO. | |||
Generally if two network addresses appear in an element of | Generally, if two network addresses appear in an element of | |||
nflda_multipath_ds_list they will designate the same data server and | nflda_multipath_ds_list, they will designate the same data server, | |||
the two data server addresses will support the implementation client | and the two data server addresses will support the implementation of | |||
ID or session trunking (the latter is RECOMMENDED) as defined in | client ID or session trunking (the latter is RECOMMENDED) as defined | |||
Section 2.10.5, and the two data server addresses will share the same | in Section 2.10.5. The two data server addresses will share the same | |||
server owner, or major ID of the server owner. It is not always | server owner or major ID of the server owner. It is not always | |||
necessary for the two data server addresses to designate the same | necessary for the two data server addresses to designate the same | |||
server with trunking being used. For example the data could be read- | server with trunking being used. For example, the data could be | |||
only, and the data consist of exact replicas. | read-only, and the data consist of exact replicas. | |||
13.6. Operations Sent to NFSv4.1 Data Servers | 13.6. Operations Sent to NFSv4.1 Data Servers | |||
Clients accessing data on an NFSv4.1 data server MUST send only the | Clients accessing data on an NFSv4.1 data server MUST send only the | |||
NULL procedure and COMPOUND procedures whose operations are taken | NULL procedure and COMPOUND procedures whose operations are taken | |||
only from two restricted subsets of the operations defined as valid | only from two restricted subsets of the operations defined as valid | |||
NFSv4.1 operations. Clients MUST use the filehandle specified by the | NFSv4.1 operations. Clients MUST use the filehandle specified by the | |||
layout when accessing data on NFSv4.1 data servers. | layout when accessing data on NFSv4.1 data servers. | |||
The first of these operation subsets consist of management | The first of these operation subsets consists of management | |||
operations. This subset consists of the BACKCHANNEL_CTL, | operations. This subset consists of the BACKCHANNEL_CTL, | |||
BIND_CONN_TO_SESSION, CREATE_SESSION, DESTROY_CLIENTID, | BIND_CONN_TO_SESSION, CREATE_SESSION, DESTROY_CLIENTID, | |||
DESTROY_SESSION, EXCHANGE_ID, SECINFO_NO_NAME, SET_SSV, and SEQUENCE | DESTROY_SESSION, EXCHANGE_ID, SECINFO_NO_NAME, SET_SSV, and SEQUENCE | |||
operations. The client may use these operations in order to set up | operations. The client may use these operations in order to set up | |||
and maintain the appropriate client IDs, sessions, and security | and maintain the appropriate client IDs, sessions, and security | |||
contexts involved in communication with the data server. Henceforth | contexts involved in communication with the data server. Henceforth, | |||
these will be referred to as data-server housekeeping operations. | these will be referred to as data-server housekeeping operations. | |||
The second subset consists of COMMIT, READ, WRITE, and PUTFH, These | The second subset consists of COMMIT, READ, WRITE, and PUTFH. These | |||
operations MUST be used with a current filehandle specified by the | operations MUST be used with a current filehandle specified by the | |||
layout. In the case of PUTFH, the new current filehandle MUST be one | layout. In the case of PUTFH, the new current filehandle MUST be one | |||
taken from the layout. Henceforth, these will be referred to as | taken from the layout. Henceforth, these will be referred to as | |||
data-server I/O operations. As described in Section 12.5.1, a client | data-server I/O operations. As described in Section 12.5.1, a client | |||
MUST NOT send an I/O to a data server for which it does not hold a | MUST NOT send an I/O to a data server for which it does not hold a | |||
valid layout; the data server MUST reject such an I/O. | valid layout; the data server MUST reject such an I/O. | |||
Unless the server has a concurrent non-data-server personality, i.e. | Unless the server has a concurrent non-data-server personality -- | |||
EXCHANGE_ID results returned (EXCHGID4_FLAG_USE_PNFS_DS | | i.e., EXCHANGE_ID results returned (EXCHGID4_FLAG_USE_PNFS_DS | | |||
EXCHGID4_FLAG_USE_PNFS_MDS) or (EXCHGID4_FLAG_USE_PNFS_DS | | EXCHGID4_FLAG_USE_PNFS_MDS) or (EXCHGID4_FLAG_USE_PNFS_DS | | |||
EXCHGID4_FLAG_USE_NON_PNFS), see Section 13.1, any attempted use of | EXCHGID4_FLAG_USE_NON_PNFS) see Section 13.1 -- any attempted use of | |||
operations against a data server other than those specified in the | operations against a data server other than those specified in the | |||
two subsets above MUST return NFS4ERR_NOTSUPP to the client. | two subsets above MUST return NFS4ERR_NOTSUPP to the client. | |||
When the server has concurrent data server and non-data-server | When the server has concurrent data-server and non-data-server | |||
personalities, each COMPOUND sent by the client MUST be constructed | personalities, each COMPOUND sent by the client MUST be constructed | |||
so that it is appropriate to one of the two personalities, and MUST | so that it is appropriate to one of the two personalities, and it | |||
NOT contain operations directed to a mix of those personalities. The | MUST NOT contain operations directed to a mix of those personalities. | |||
server MUST enforce this. To understand the constraints, operations | The server MUST enforce this. To understand the constraints, | |||
within a COMPOUND are divided into the following three classes: | operations within a COMPOUND are divided into the following three | |||
classes: | ||||
1. An operation which is ambiguous regarding its personality | 1. An operation that is ambiguous regarding its personality | |||
assignment. These include all of the data-server housekeeping | assignment. This includes all of the data-server housekeeping | |||
operations. Additionally, if the server has assigned filehandles | operations. Additionally, if the server has assigned filehandles | |||
so that the ones defined by the layout are the same as those used | so that the ones defined by the layout are the same as those used | |||
by the metadata server, all operations using such filehandles are | by the metadata server, all operations using such filehandles are | |||
within this class, with the following exception. The exception | within this class, with the following exception. The exception | |||
is that if the operation uses a stateid that is incompatible with | is that if the operation uses a stateid that is incompatible with | |||
a data-server personality (e.g. a special stateid or the stateid | a data-server personality (e.g., a special stateid or the stateid | |||
has a non-zero seqid field, see Section 13.9.1); if so, the | has a non-zero "seqid" field, see Section 13.9.1), the operation | |||
operation is in class 3, as described below. A COMPOUND | is in class 3, as described below. A COMPOUND containing | |||
containing multiple class 1 operations (and operations of no | multiple class 1 operations (and operations of no other class) | |||
other class) MAY be sent to a server with multiple concurrent | MAY be sent to a server with multiple concurrent data server and | |||
data server and non-data-server personalities. | non-data-server personalities. | |||
2. An operation which is unambiguously referable to the data server | 2. An operation that is unambiguously referable to the data-server | |||
personality. These are data-server I/O operations where the | personality. This includes data-server I/O operations where the | |||
filehandle is one that can only be validly directed to the data- | filehandle is one that can only be validly directed to the data- | |||
server personality. | server personality. | |||
3. An operation which is unambiguously referable to the non-data- | 3. An operation that is unambiguously referable to the non-data- | |||
server personality. These include all COMPOUND operations that | server personality. This includes all COMPOUND operations that | |||
are neither data-server housekeeping nor data-server I/O | are neither data-server housekeeping nor data-server I/O | |||
operations plus data-server I/O operations where the current fh | operations, plus data-server I/O operations where the current fh | |||
(or the one to be made the current fh in the case of PUTFH) is | (or the one to be made the current fh in the case of PUTFH) is | |||
one that is only valid on the metadata server or where a stateid | only valid on the metadata server or where a stateid is used that | |||
is used that is incompatible with the data server, i.e. is a | is incompatible with the data server, i.e., is a special stateid | |||
special stateid or has a non-zero seqid value. | or has a non-zero seqid value. | |||
When a COMPOUND first executes an operation from class 3 above, it | When a COMPOUND first executes an operation from class 3 above, it | |||
acts as a normal COMPOUND on any other server and the data server | acts as a normal COMPOUND on any other server, and the data-server | |||
personality ceases to be relevant. There are no special restrictions | personality ceases to be relevant. There are no special restrictions | |||
on the operations in the COMPOUND to limit them to those for a data | on the operations in the COMPOUND to limit them to those for a data | |||
server. When a PUTFH is done, filehandles derived from the layout | server. When a PUTFH is done, filehandles derived from the layout | |||
are not valid. If their format is not normally acceptable, then | are not valid. If their format is not normally acceptable, then | |||
NFS4ERR_BADHANDLE MUST result. Similarly, current filehandles for | NFS4ERR_BADHANDLE MUST result. Similarly, current filehandles for | |||
other operations do not accept filehandles derived from layouts and | other operations do not accept filehandles derived from layouts and | |||
are not normally usable on the metadata server. Using these will | are not normally usable on the metadata server. Using these will | |||
result in NFS4ERR_STALE. | result in NFS4ERR_STALE. | |||
When a COMPOUND first executes an operation from class 2, which would | When a COMPOUND first executes an operation from class 2, which would | |||
be PUTFH where the filehandle is one from a layout, the COMPOUND | be PUTFH where the filehandle is one from a layout, the COMPOUND | |||
henceforth is interpreted with respect to the data server | henceforth is interpreted with respect to the data-server | |||
personality. Operations outside the two classes discussed above MUST | personality. Operations outside the two classes discussed above MUST | |||
result in NFS4ERR_NOTSUPP. Filehandles are validated using the rules | result in NFS4ERR_NOTSUPP. Filehandles are validated using the rules | |||
of the data server, resulting in NFS4ERR_BADHANDLE and/or | of the data server, resulting in NFS4ERR_BADHANDLE and/or | |||
NFS4ERR_STALE even when they would not normally do so when addressed | NFS4ERR_STALE even when they would not normally do so when addressed | |||
to the non-data-server personality. Stateids must obey the rules of | to the non-data-server personality. Stateids must obey the rules of | |||
the data server in that any use of special stateids or stateids with | the data server in that any use of special stateids or stateids with | |||
non-zero seqid values must result in NFS4ERR_BAD_STATEID. | non-zero seqid values must result in NFS4ERR_BAD_STATEID. | |||
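The stateid rule for the data-server personality can be sketched as a small check. This is an illustration only, not text from either document version: the Stateid type, the is_special() helper (which treats an "other" field of all zeros or all ones as special), and the symbolic error string are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Symbolic error name only; the numeric code is deliberately omitted.
NFS4ERR_BAD_STATEID = "NFS4ERR_BAD_STATEID"


@dataclass(frozen=True)
class Stateid:
    """Hypothetical in-memory form of a stateid4: seqid plus 12 opaque bytes."""
    seqid: int
    other: bytes


def is_special(sid: Stateid) -> bool:
    # Assumption: special stateids are recognized by an "other" field that is
    # all zeros or all ones.
    return sid.other in (b"\x00" * 12, b"\xff" * 12)


def data_server_stateid_error(sid: Stateid) -> Optional[str]:
    """Return the error a data-server personality would give, or None if OK."""
    if is_special(sid) or sid.seqid != 0:
        return NFS4ERR_BAD_STATEID
    return None
```

A stateid with a zero seqid and a non-special "other" field passes the check; anything else is rejected with NFS4ERR_BAD_STATEID.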
Until the server first executes an operation from class 2 or class 3, | Until the server first executes an operation from class 2 or class 3, | |||
the client MUST NOT depend on the operation being executed by either | the client MUST NOT depend on the operation being executed by either | |||
skipping to change at page 325, line 36 | skipping to change at page 325, line 42 | |||
RECOMMENDED that where the same server can have both personalities, | RECOMMENDED that where the same server can have both personalities, | |||
the server assign separate unique filehandles to both personalities. | the server assign separate unique filehandles to both personalities. | |||
This makes it unambiguous for which server a given request is | This makes it unambiguous for which server a given request is | |||
intended. | intended. | |||
GETATTR and SETATTR MUST be directed to the metadata server. In the | GETATTR and SETATTR MUST be directed to the metadata server. In the | |||
case of a SETATTR of the size attribute, the control protocol is | case of a SETATTR of the size attribute, the control protocol is | |||
responsible for propagating size updates/truncations to the data | responsible for propagating size updates/truncations to the data | |||
servers. In the case of extending WRITEs to the data servers, the | servers. In the case of extending WRITEs to the data servers, the | |||
new size must be visible on the metadata server once a LAYOUTCOMMIT | new size must be visible on the metadata server once a LAYOUTCOMMIT | |||
has completed (see Section 12.5.4.2). Section 13.10, describes the | has completed (see Section 12.5.4.2). Section 13.10 describes the | |||
mechanism by which the client is to handle data server files that do | mechanism by which the client is to handle data-server files that do | |||
not reflect the metadata server's size. | not reflect the metadata server's size. | |||
13.7. COMMIT Through Metadata Server | 13.7. COMMIT through Metadata Server | |||
The file layout provides two alternate means of providing for the | The file layout provides two alternate means of providing for the | |||
commit of data written through data servers. The flag | commit of data written through data servers. The flag | |||
NFL4_UFLG_COMMIT_THRU_MDS in the field nfl_util of the file layout | NFL4_UFLG_COMMIT_THRU_MDS in the field nfl_util of the file layout | |||
(data type nfsv4_1_file_layout4) is an indication from the metadata | (data type nfsv4_1_file_layout4) is an indication from the metadata | |||
server to the client of the REQUIRED way of performing COMMIT, either | server to the client of the REQUIRED way of performing COMMIT, either | |||
by sending the COMMIT to the data server or the metadata server. | by sending the COMMIT to the data server or the metadata server. | |||
These two methods of dealing with the issue correspond to broad | These two methods of dealing with the issue correspond to broad | |||
styles of implementation for a pNFS server supporting the files | styles of implementation for a pNFS server supporting the file layout | |||
layout type. | type. | |||
o When the flag is FALSE, COMMIT operations MUST be sent to the | o When the flag is FALSE, COMMIT operations MUST be sent to the | |||
data server to which the corresponding WRITE operations were sent. | data server to which the corresponding WRITE operations were sent. | |||
This approach is most useful when striping of files is implemented | This approach is sometimes useful when file striping is | |||
as part of pNFS server, with the individual data servers each | implemented within the pNFS server (instead of the file system), | |||
implementing their own file systems. | with the individual data servers each implementing their own file | |||
systems. | ||||
o When the flag is TRUE, COMMIT operations MUST be sent to the | o When the flag is TRUE, COMMIT operations MUST be sent to the | |||
metadata server, rather than to the individual data servers. This | metadata server, rather than to the individual data servers. This | |||
approach is most useful when the pNFS server is implemented on top | approach is sometimes useful when file striping is implemented | |||
of a clustered file system. In such an implementation, sending | within the clustered file system that is the backend to the pNFS | |||
COMMIT's to multiple data servers may result in repeated writes of | server. In such an implementation, each COMMIT to each data | |||
metadata blocks as each individual COMMIT is executed, to the | server might result in repeated writes of metadata blocks to the | |||
detriment of write performance. Sending a single COMMIT to the | detriment of write performance. Sending a single COMMIT to the | |||
metadata server can provide more efficiency when there exists a | metadata server can be more efficient when there exists a | |||
clustered file system capable of implementing such a co-ordinated | clustered file system capable of implementing such a coordinated | |||
COMMIT. | COMMIT. | |||
If nfl_util & NFL4_UFLG_COMMIT_THRU_MDS is TRUE, then in order to | If nfl_util & NFL4_UFLG_COMMIT_THRU_MDS is TRUE, then in order to | |||
maintain the current NFSv4.1 commit and recovery model, the data | maintain the current NFSv4.1 commit and recovery model, the data | |||
servers MUST return a common writeverf verifier in all WRITE | servers MUST return a common writeverf verifier in all WRITE | |||
responses for a given file layout, and the metadata server's | responses for a given file layout, and the metadata server's | |||
COMMIT implementation must return the same writeverf. The value | COMMIT implementation must return the same writeverf. The value | |||
of the writeverf verifier MUST be changed at the metadata server | of the writeverf verifier MUST be changed at the metadata server | |||
or any data server that is referenced in the layout, whenever | or any data server that is referenced in the layout, whenever | |||
there is a server event that can possibly lead to loss of | there is a server event that can possibly lead to loss of | |||
uncommitted data. The scope of the verifier can be for a file or | uncommitted data. The scope of the verifier can be for a file or | |||
for the entire pNFS server. It might be more difficult for the | for the entire pNFS server. It might be more difficult for the | |||
server to maintain the verifier at the file level but the benefit | server to maintain the verifier at the file level, but the benefit | |||
is that only events that impact a given file will require recovery | is that only events that impact a given file will require recovery | |||
action. | action. | |||
Note that if the layout specified dense packing, then the offset used | Note that if the layout specified dense packing, then the offset used | |||
in a COMMIT to the MDS may differ from the offset used in a COMMIT | in a COMMIT to the MDS may differ from the offset used in a COMMIT | |||
to the data server. | to the data server. | |||
The single COMMIT to the metadata server will return a verifier and | The single COMMIT to the metadata server will return a verifier, and | |||
the client should compare it to all the verifiers from the WRITEs and | the client should compare it to all the verifiers from the WRITEs and | |||
fail the COMMIT if there is any mismatched verifiers. If COMMIT to | fail the COMMIT if there are any mismatched verifiers. If COMMIT to | |||
the metadata server fails, the client should re-send WRITEs for all | the metadata server fails, the client should re-send WRITEs for all | |||
the modified data in the file. The client should treat modified data | the modified data in the file. The client should treat modified data | |||
with a mismatched verifier as a WRITE failure and try to recover by | with a mismatched verifier as a WRITE failure and try to recover by | |||
resending the WRITEs to the original data server or using another | resending the WRITEs to the original data server or using another | |||
path to that data if the layout has not been recalled. Another | path to that data if the layout has not been recalled. | |||
option the client has is getting a new layout or just rewrite the | Alternatively, the client can obtain a new layout or it could rewrite | |||
data through the metadata server. If nfl_util & | the data directly to the metadata server. If nfl_util & | |||
NFL4_UFLG_COMMIT_THRU_MDS is FALSE, sending a COMMIT to the metadata | NFL4_UFLG_COMMIT_THRU_MDS is FALSE, sending a COMMIT to the metadata | |||
server might have no effect. If nfl_util & NFL4_UFLG_COMMIT_THRU_MDS | server might have no effect. If nfl_util & NFL4_UFLG_COMMIT_THRU_MDS | |||
is FALSE, a COMMIT sent to the metadata server should be used only to | is FALSE, a COMMIT sent to the metadata server should be used only to | |||
commit data that was written to the metadata server. See | commit data that was written to the metadata server. See | |||
Section 12.7.6 for recovery options. | Section 12.7.6 for recovery options. | |||
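The client-side handling described above can be sketched as follows. This is a minimal illustration under stated assumptions: the flag value 0x00000002 is assumed here rather than quoted from this section, and commit_target(), ranges_to_resend(), and the dictionary of per-range WRITE verifiers are hypothetical names invented for the example.

```python
# Assumed flag value; check the file layout XDR before relying on it.
NFL4_UFLG_COMMIT_THRU_MDS = 0x00000002


def commit_target(nfl_util: int) -> str:
    """Pick where COMMIT goes, based on the flag bit in nfl_util."""
    if nfl_util & NFL4_UFLG_COMMIT_THRU_MDS:
        return "mds"
    return "data_server"


def ranges_to_resend(commit_verf: bytes, write_verfs: dict) -> list:
    """Compare the COMMIT verifier with the verifier from every WRITE reply.

    write_verfs maps (offset, length) -> writeverf returned by that WRITE.
    Any range whose verifier differs is treated as a failed WRITE and must
    be written again.
    """
    return sorted(rng for rng, verf in write_verfs.items()
                  if verf != commit_verf)
```

An empty result means every WRITE verifier matched the COMMIT verifier, so the COMMIT can be treated as successful; a non-empty result lists the byte ranges whose data must be resent.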
13.8. The Layout Iomode | 13.8. The Layout Iomode | |||
The layout iomode need not be used by the metadata server when | The layout iomode need not be used by the metadata server when | |||
servicing NFSv4.1 file-based layouts, although in some circumstances | servicing NFSv4.1 file-based layouts, although in some circumstances | |||
skipping to change at page 327, line 28 | skipping to change at page 327, line 35 | |||
client holds a valid layout and return an error if the client does | client holds a valid layout and return an error if the client does | |||
not. | not. | |||
13.9. Metadata and Data Server State Coordination | 13.9. Metadata and Data Server State Coordination | |||
13.9.1. Global Stateid Requirements | 13.9.1. Global Stateid Requirements | |||
When the client sends I/O to a data server, the stateid used MUST NOT | When the client sends I/O to a data server, the stateid used MUST NOT | |||
be a layout stateid as returned by LAYOUTGET or sent by | be a layout stateid as returned by LAYOUTGET or sent by | |||
CB_LAYOUTRECALL. Permitted stateids are based on one of the | CB_LAYOUTRECALL. Permitted stateids are based on one of the | |||
following: an open stateid (the stateid field of data type OPEN4resok | following: an OPEN stateid (the stateid field of data type OPEN4resok | |||
as returned by OPEN), a delegation stateid (the stateid field of data | as returned by OPEN), a delegation stateid (the stateid field of data | |||
types open_read_delegation4 and open_write_delegation4 as returned by | types open_read_delegation4 and open_write_delegation4 as returned by | |||
OPEN or WANT_DELEGATION, or as sent by CB_PUSH_DELEG), or a stateid | OPEN or WANT_DELEGATION, or as sent by CB_PUSH_DELEG), or a stateid | |||
returned by the LOCK or LOCKU operations. The stateid sent to the | returned by the LOCK or LOCKU operations. The stateid sent to the | |||
data server MUST be sent with the seqid set to zero, indicating the | data server MUST be sent with the seqid set to zero, indicating the | |||
most current version of that stateid, rather than indicating a | most current version of that stateid, rather than indicating a | |||
specific non-zero seqid value. In no case is the use of special | specific non-zero seqid value. In no case is the use of special | |||
stateid values allowed. | stateid values allowed. | |||
The stateid used for I/O MUST have the same effect and be subject to | The stateid used for I/O MUST have the same effect and be subject to | |||
the same validation on a data server as it would if the I/O was being | the same validation on a data server as it would if the I/O was being | |||
performed on the metadata server itself in the absence of pNFS. This | performed on the metadata server itself in the absence of pNFS. This | |||
has the implication that stateids are globally valid on both the | has the implication that stateids are globally valid on both the | |||
metadata and data servers. This requires the metadata server to | metadata and data servers. This requires the metadata server to | |||
propagate changes in lock and open state to the data servers, so that | propagate changes in LOCK and OPEN state to the data servers, so that | |||
the data servers can validate I/O accesses. This is discussed | the data servers can validate I/O accesses. This is discussed | |||
further in Section 13.9.2. Depending on when stateids are | further in Section 13.9.2. Depending on when stateids are | |||
propagated, the existence of a valid stateid on the data server may | propagated, the existence of a valid stateid on the data server may | |||
act as proof of a valid layout. | act as proof of a valid layout. | |||
Clients performing I/O operations need to select an appropriate | Clients performing I/O operations need to select an appropriate | |||
stateid based on the locks (including opens and delegations) held by | stateid based on the locks (including opens and delegations) held by | |||
the client and the various types of state-owners sending the I/O | the client and the various types of state-owners sending the I/O | |||
requests. The rules for doing so when referencing data servers are | requests. The rules for doing so when referencing data servers are | |||
somewhat different from those discussed in Section 8.2.5 which apply | somewhat different from those discussed in Section 8.2.5, which apply | |||
when accessing metadata servers. | when accessing metadata servers. | |||
The following rules, applied in order of decreasing priority, govern | The following rules, applied in order of decreasing priority, govern | |||
the selection of the appropriate stateid: | the selection of the appropriate stateid: | |||
o If the client holds a delegation for the file in question, the | o If the client holds a delegation for the file in question, the | |||
delegation stateid should be used. | delegation stateid should be used. | |||
o Otherwise, there must be an open stateid for the current open- | o Otherwise, there must be an OPEN stateid for the current open- | |||
owner, and that open stateid for the open file in question is | owner, and that OPEN stateid for the open file in question is | |||
used, unless mandatory locking, prevents that. See below. | used, unless mandatory locking prevents that. See below. | |||
o If the data server had previously responded with NFS4ERR_LOCKED to | o If the data server had previously responded with NFS4ERR_LOCKED to | |||
use of the open stateid, then the client should use the lock | use of the OPEN stateid, then the client should use the byte-range | |||
stateid whenever one exists for that open file with the current | lock stateid whenever one exists for that open file with the | |||
lock-owner. | current lock-owner. | |||
o Special stateids should never be used and if used the data server | o Special stateids should never be used. If they are used, the data | |||
MUST reject the I/O with an NFS4ERR_BAD_STATEID error. | server MUST reject the I/O with an NFS4ERR_BAD_STATEID error. | |||
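The selection rules above, together with the requirement that the seqid be sent as zero, can be sketched as a single function. The function name, its parameters, and the (seqid, other) tuple representation are assumptions made for illustration; saw_locked stands for "the data server previously answered NFS4ERR_LOCKED to the open stateid".

```python
def select_io_stateid(delegation=None, open_stateid=None,
                      lock_stateid=None, saw_locked=False):
    """Choose the stateid for data-server I/O; stateids are (seqid, other)."""
    if delegation is not None:
        # Highest priority: a delegation held for the file.
        chosen = delegation
    elif open_stateid is None:
        # Without a delegation there must be an open stateid.
        raise ValueError("no open stateid for the current open-owner")
    elif saw_locked and lock_stateid is not None:
        # Mandatory locking rejected the open stateid earlier.
        chosen = lock_stateid
    else:
        chosen = open_stateid
    _seqid, other = chosen
    # The seqid is always sent as zero to a data server.
    return (0, other)
```

Special stateids never enter this function as candidates, which matches the last rule: they are simply not used for data-server I/O.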
13.9.2. Data Server State Propagation | 13.9.2. Data Server State Propagation | |||
Since the metadata server, which handles lock and open-mode state | Since the metadata server, which handles byte-range lock and open- | |||
changes, as well as ACLs, might not be co-located with the data | mode state changes as well as ACLs, might not be co-located with the | |||
servers where I/O access are validated, the server implementation | data servers where I/O accesses are validated, the server | |||
MUST take care of propagating changes of this state to the data | implementation MUST take care of propagating changes of this state to | |||
servers. Once the propagation to the data servers is complete, the | the data servers. Once the propagation to the data servers is | |||
full effect of those changes MUST be in effect at the data servers. | complete, the full effect of those changes MUST be in effect at the | |||
However, some state changes need not be propagated immediately, | data servers. However, some state changes need not be propagated | |||
although all changes SHOULD be propagated promptly. These state | immediately, although all changes SHOULD be propagated promptly. | |||
propagations have an impact on the design of the control protocol, | These state propagations have an impact on the design of the control | |||
even though the control protocol is outside of the scope of this | protocol, even though the control protocol is outside of the scope of | |||
specification. Immediate propagation refers to the synchronous | this specification. Immediate propagation refers to the synchronous | |||
propagation of state from the metadata server to the data server(s); | propagation of state from the metadata server to the data server(s); | |||
the propagation must be complete before returning to the client. | the propagation must be complete before returning to the client. | |||
13.9.2.1. Lock State Propagation | 13.9.2.1. Lock State Propagation | |||
If the pNFS server supports mandatory locking, any mandatory locks on | If the pNFS server supports mandatory byte-range locking, any | |||
a file MUST be made effective at the data servers before the request | mandatory byte-range locks on a file MUST be made effective at the | |||
that establishes them returns to the caller. The effect MUST be the | data servers before the request that establishes them returns to the | |||
same as if the mandatory lock state were synchronously propagated to | caller. The effect MUST be the same as if the mandatory byte-range | |||
the data servers, even though the details of the control protocol may | lock state were synchronously propagated to the data servers, even | |||
avoid actual transfer of the state under certain circumstances. | though the details of the control protocol may avoid actual transfer | |||
of the state under certain circumstances. | ||||
On the other hand, since advisory lock state is not used for checking | On the other hand, since advisory byte-range lock state is not used | |||
I/O accesses at the data servers, there is no semantic reason for | for checking I/O accesses at the data servers, there is no semantic | |||
propagating advisory lock state to the data servers. Since updates | reason for propagating advisory byte-range lock state to the data | |||
to advisory locks neither confer nor remove privileges, these changes | servers. Since updates to advisory locks neither confer nor remove | |||
need not be propagated immediately, and may not need to be propagated | privileges, these changes need not be propagated immediately, and may | |||
promptly. The updates to advisory locks need only be propagated when | not need to be propagated promptly. The updates to advisory locks | |||
the data server needs to resolve a question about a stateid. In | need only be propagated when the data server needs to resolve a | |||
fact, if byte-range locking is not mandatory (i.e., is advisory) the | question about a stateid. In fact, if byte-range locking is not | |||
clients are advised not to use the lock-based stateids for I/O at | mandatory (i.e., is advisory) the clients are advised to avoid using | |||
all. The stateids returned by open are sufficient and eliminate | the byte-range lock-based stateids for I/O. The stateids returned by | |||
overhead for this kind of state propagation. | OPEN are sufficient and eliminate overhead for this kind of state | |||
propagation. | ||||
If a client gets back an NFS4ERR_LOCKED error from a data server, | If a client gets back an NFS4ERR_LOCKED error from a data server, | |||
this is an indication that mandatory byte-range locking is in force. | this is an indication that mandatory byte-range locking is in force. | |||
The client recovers from this by getting a byte-range lock that | The client recovers from this by getting a byte-range lock that | |||
covers the affected range and re-sending the I/O with the stateid of | covers the affected range and re-sending the I/O with the stateid of | |||
the byte-range lock. | the byte-range lock. | |||
13.9.2.2. Open and Deny Mode Validation | 13.9.2.2. Open and Deny Mode Validation | |||
Open and deny mode validation MUST be performed against the open and | Open and deny mode validation MUST be performed against the open and | |||
deny mode(s) held by the data servers. When access is reduced or a | deny mode(s) held by the data servers. When access is reduced or a | |||
deny mode made more restrictive (because of CLOSE or DOWNGRADE) the | deny mode made more restrictive (because of CLOSE or OPEN_DOWNGRADE), | |||
data server MUST prevent any I/Os that would be denied if performed | the data server MUST prevent any I/Os that would be denied if | |||
on the metadata server. When access is expanded, the data server | performed on the metadata server. When access is expanded, the data | |||
MUST make sure that no requests are subsequently rejected because of | server MUST make sure that no requests are subsequently rejected | |||
open or deny issues that no longer apply, given the previous | because of open or deny issues that no longer apply, given the | |||
relaxation. | previous relaxation. | |||
13.9.2.3. File Attributes | 13.9.2.3. File Attributes | |||
Since the SETATTR operation has the ability to modify state that is | Since the SETATTR operation has the ability to modify state that is | |||
visible on both the metadata and data servers (e.g., the size), care | visible on both the metadata and data servers (e.g., the size), care | |||
must be taken to ensure that the resultant state across the set of | must be taken to ensure that the resultant state across the set of | |||
data servers is consistent; especially when truncating or growing the | data servers is consistent, especially when truncating or growing the | |||
file. | file. | |||
As described earlier, the LAYOUTCOMMIT operation is used to ensure | As described earlier, the LAYOUTCOMMIT operation is used to ensure | |||
that the metadata is synchronized with changes made to the data | that the metadata is synchronized with changes made to the data | |||
servers. For the NFSv4.1-based data storage protocol, it is | servers. For the NFSv4.1-based data storage protocol, it is | |||
necessary to re-synchronize state such as the size attribute, and the | necessary to re-synchronize state such as the size attribute, and the | |||
setting of mtime/change/atime. See Section 12.5.4 for a full | setting of mtime/change/atime. See Section 12.5.4 for a full | |||
description of the semantics regarding LAYOUTCOMMIT and attribute | description of the semantics regarding LAYOUTCOMMIT and attribute | |||
synchronization. It should be noted, that by using an NFSv4.1-based | synchronization. It should be noted that by using an NFSv4.1-based | |||
layout type, it is possible to synchronize this state before | layout type, it is possible to synchronize this state before | |||
LAYOUTCOMMIT occurs. For example, the control protocol can be used | LAYOUTCOMMIT occurs. For example, the control protocol can be used | |||
to query the attributes present on the data servers. | to query the attributes present on the data servers. | |||
Any changes to file attributes that control authorization or access | Any changes to file attributes that control authorization or access | |||
as reflected by ACCESS calls or READs and WRITEs on the metadata | as reflected by ACCESS calls or READs and WRITEs on the metadata | |||
server MUST be propagated to the data servers for enforcement on | server MUST be propagated to the data servers for enforcement on | |||
READ and WRITE I/O calls. If the changes made on the metadata server | READ and WRITE I/O calls. If the changes made on the metadata server | |||
result in more restrictive access permissions for any user, those | result in more restrictive access permissions for any user, those | |||
changes MUST be propagated to the data servers synchronously. | changes MUST be propagated to the data servers synchronously. | |||
The OPEN operation (Section 18.16.4) does not impose any requirement | The OPEN operation (Section 18.16.4) does not impose any requirement | |||
that I/O operations on an open file have the same credentials as the | that I/O operations on an open file have the same credentials as the | |||
OPEN itself (unless EXCHGID4_FLAG_BIND_PRINC_STATEID is set when | OPEN itself (unless EXCHGID4_FLAG_BIND_PRINC_STATEID is set when | |||
EXCHANGE_ID creates the client ID) and so requires the server's READ | EXCHANGE_ID creates the client ID), and so it requires the server's | |||
and WRITE operations to perform appropriate access checking. Changes | READ and WRITE operations to perform appropriate access checking. | |||
to ACLs also require new access checking by READ and WRITE on the | Changes to ACLs also require new access checking by READ and WRITE on | |||
server. The propagation of access right changes due to changes in | the server. The propagation of access-right changes due to changes | |||
ACLs may be asynchronous only if the server implementation is able to | in ACLs may be asynchronous only if the server implementation is able | |||
determine that the updated ACL is not more restrictive for any user | to determine that the updated ACL is not more restrictive for any | |||
specified in the old ACL. Due to the relative infrequency of ACL | user specified in the old ACL. Due to the relative infrequency of | |||
updates, it is suggested that all changes be propagated | ACL updates, it is suggested that all changes be propagated | |||
synchronously. | synchronously. | |||
13.10. Data Server Component File Size | 13.10. Data Server Component File Size | |||
A potential problem exists when a component data file on a particular | A potential problem exists when a component data file on a particular | |||
data server is grown past EOF; the problem exists for both dense and | data server has grown past EOF; the problem exists for both dense and | |||
sparse layouts. Imagine the following scenario: a client creates a | sparse layouts. Imagine the following scenario: a client creates a | |||
new file (size == 0) and writes to byte 131072; the client then seeks | new file (size == 0) and writes to byte 131072; the client then seeks | |||
to the beginning of the file and reads byte 100. The client should | to the beginning of the file and reads byte 100. The client should | |||
receive 0s back as a result of the READ. However, if the READ falls | receive zeroes back as a result of the READ. However, if the | |||
on a data server other than the one that received client's original | striping pattern directs the client to send the READ to a data server | |||
WRITE, the data server servicing the READ may still believe that the | other than the one that received the client's original WRITE, the | |||
file's size is at 0 and return no data with the EOF flag set. The | data server servicing the READ may believe that the file's size is | |||
data server can only return 0s if it knows that the file's size has | still 0 bytes. In that event, the data server's READ response will | |||
been extended. This would require the immediate propagation of the | contain zero bytes and an indication of EOF. The data server can | |||
file's size to all data servers, which is potentially very costly. | only return zeroes if it knows that the file's size has been | |||
extended. This would require the immediate propagation of the file's | ||||
size to all data servers, which is potentially very costly. | ||||
Therefore, the client that has initiated the extension of the file's | Therefore, the client that has initiated the extension of the file's | |||
size MUST be prepared to deal with these EOF conditions; the EOF'ed | size MUST be prepared to deal with these EOF conditions. When the | |||
or short READs will be treated as a hole in the file and the NFS | offset in the arguments to READ is less than the client's view of the | |||
client will substitute 0s for the data when the offset is less than | file size, if the READ response indicates EOF and/or contains fewer | |||
the client's view of the file size. | bytes than requested, the client will interpret such a response as a | |||
hole in the file, and the NFS client will substitute zeroes for the | ||||
data. | ||||
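The hole-filling behavior can be sketched as follows. This is an illustration under stated assumptions, not client code from either document version: fill_hole() and its parameters are hypothetical, and client_file_size is the client's own view of the file size.

```python
def fill_hole(offset: int, count: int, data: bytes,
              client_file_size: int) -> bytes:
    """Pad a short or EOF'ed READ reply with zeroes.

    Bytes that lie below the client's view of the file size but were not
    returned by the data server are treated as a hole. The EOF flag is not
    needed here: a reply that is short of the expected length is padded
    whether or not EOF was indicated.
    """
    # Number of requested bytes that lie inside the client's view of the file.
    expected = max(0, min(count, client_file_size - offset))
    if len(data) < expected:
        data = data + b"\x00" * (expected - len(data))
    return data
```

In the scenario above, the client believes the file is 131073 bytes long after writing byte 131072. A READ of byte 100 that comes back empty with EOF set yields fill_hole(100, 1, b"", 131073), which is a single zero byte.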
The NFSv4.1 protocol only provides close to open file data cache | The NFSv4.1 protocol only provides close-to-open file data cache | |||
semantics; meaning that when the file is closed all modified data is | semantics; meaning that when the file is closed, all modified data is | |||
written to the server. When a subsequent OPEN of the file is done, | written to the server. When a subsequent OPEN of the file is done, | |||
the change attribute is inspected for a difference from a cached | the change attribute is inspected for a difference from a cached | |||
value for the change attribute. For the case above, this means that | value for the change attribute. For the case above, this means that | |||
a LAYOUTCOMMIT will be done at close (along with the data WRITEs) and | a LAYOUTCOMMIT will be done at close (along with the data WRITEs) and | |||
will update the file's size and change attribute. Access from | will update the file's size and change attribute. Access from | |||
another client after that point will result in the appropriate size | another client after that point will result in the appropriate size | |||
being returned. | being returned. | |||
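As an illustration of close-to-open semantics, the sketch below revalidates cached data only at OPEN by comparing change attributes. The class CachedFile and the fetch callback are hypothetical names introduced for this example; a real client would obtain the change attribute via GETATTR as part of its OPEN processing.

```python
class CachedFile:
    """Client-side data cache entry validated at OPEN time (hypothetical)."""

    def __init__(self):
        self.change = None  # change attribute seen when the data was cached
        self.data = None    # cached file data

    def on_open(self, server_change, fetch):
        # Revalidate only at OPEN: a differing change attribute means the
        # file was modified (and closed) elsewhere since it was cached.
        if self.data is None or self.change != server_change:
            self.data = fetch()
            self.change = server_change
        return self.data
```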
13.11. Layout Revocation and Fencing | 13.11. Layout Revocation and Fencing | |||
As described in Section 12.7, the layout type-specific storage | As described in Section 12.7, the layout-type-specific storage | |||
protocol is responsible for handling the effects of I/Os started | protocol is responsible for handling the effects of I/Os that started | |||
before lease expiration, extending through lease expiration. The | before lease expiration and extend through lease expiration. The | |||
LAYOUT4_NFSV4_1_FILES layout type can prevent all I/Os to data | LAYOUT4_NFSV4_1_FILES layout type can prevent all I/Os to data | |||
servers from being executed after lease expiration, without relying | servers from being executed after lease expiration (this prevention | |||
on a precise client lease timer and without requiring data servers to | is called "fencing"), without relying on a precise client lease timer | |||
maintain lease timers. However, while LAYOUT4_NFSV4_1_FILES pNFS | and without requiring data servers to maintain lease timers. The | |||
server is free to deny the client all access to the data servers, | LAYOUT4_NFSV4_1_FILES pNFS server has the flexibility to revoke | |||
because it supports revocation of layouts, it is also free to perform | individual layouts, and thus fence I/O on a per-file basis. | |||
a denial on a per file basis only when revoking a layout. | ||||
In addition to lease expiration, the reasons a layout can be revoked | In addition to lease expiration, the reasons a layout can be revoked | |||
include: client fails to respond to a CB_LAYOUTRECALL, the metadata | include: client fails to respond to a CB_LAYOUTRECALL, the metadata | |||
server restarts, or administrative intervention. Regardless of the | server restarts, or administrative intervention. Regardless of the | |||
reason, once a client's layout has been revoked, the pNFS server MUST | reason, once a client's layout has been revoked, the pNFS server MUST | |||
prevent the client from sending I/O for the affected file from and to | prevent the client from sending I/O for the affected file from and to | |||
all data servers, in other words, it MUST fence the client from the | all data servers; in other words, it MUST fence the client from the | |||
affected file on the data servers. | affected file on the data servers. | |||
Fencing works as follows. As described in Section 13.1, in COMPOUND | Fencing works as follows. As described in Section 13.1, in COMPOUND | |||
procedure requests to the data server, the data filehandle provided | procedure requests to the data server, the data filehandle provided | |||
by the PUTFH operation and the stateid in the READ or WRITE operation | by the PUTFH operation and the stateid in the READ or WRITE operation | |||
are used to validate that the client has a valid layout for the I/O | are used to ensure that the client has a valid layout for the I/O | |||
being performed, if it does not, the I/O is rejected with | being performed; if it does not, the I/O is rejected with | |||
NFS4ERR_PNFS_NO_LAYOUT. The server can simply check the stateid, and | NFS4ERR_PNFS_NO_LAYOUT. The server can simply check the stateid and, | |||
additionally, make the data filehandle stale if the layout specified | additionally, make the data filehandle stale if the layout specified | |||
a data filehandle that is different from the metadata server's | a data filehandle that is different from the metadata server's | |||
filehandle for the file (see the nfl_fh_list description in | filehandle for the file (see the nfl_fh_list description in | |||
Section 13.3). | Section 13.3). | |||
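The per-I/O check that implements fencing might look roughly like the sketch below. It is a simplified model under stated assumptions: the DataServer class, its valid_layouts set, and the fence method are hypothetical, and how the data server learns about revocations is left to the control protocol, which this document does not specify.

```python
from dataclasses import dataclass, field

NFS4_OK = "NFS4_OK"
NFS4ERR_PNFS_NO_LAYOUT = "NFS4ERR_PNFS_NO_LAYOUT"


@dataclass
class DataServer:
    # (data filehandle, stateid) pairs currently backed by a valid layout.
    # Assumption: the metadata server keeps this set current through some
    # control protocol.
    valid_layouts: set = field(default_factory=set)

    def check_io(self, filehandle: bytes, stateid: bytes) -> str:
        """Validate a READ or WRITE using the PUTFH filehandle and stateid."""
        if (filehandle, stateid) in self.valid_layouts:
            return NFS4_OK
        return NFS4ERR_PNFS_NO_LAYOUT

    def fence(self, filehandle: bytes, stateid: bytes) -> None:
        """Revoke one layout; later I/O that presents it is rejected."""
        self.valid_layouts.discard((filehandle, stateid))
```

Because the check is made on every I/O, neither the client nor the data server needs a precise lease timer for the fence to take effect.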
Before the metadata server takes any action to invalidate layout | Before the metadata server takes any action to revoke layout state | |||
state given out by a previous instance, it must make sure that all | given out by a previous instance, it must make sure that all layout | |||
layout state from that previous instance is invalidated at the data | state from that previous instance is invalidated at the data | |||
servers. This means that a metadata server may not restripe a file | servers. This has the following implications. | |||
until it has contacted all of the data servers to invalidate the | ||||
layouts from the previous instance nor may it give out mandatory | o The metadata server must not restripe a file until it has | |||
locks that conflict with layouts from the previous instance without | contacted all of the data servers to invalidate the layouts from | |||
either doing a specific invalidation (as it would have to do anyway) | the previous instance. | |||
or doing a global data server invalidation. | ||||
o The metadata server must not give out mandatory locks that | ||||
conflict with layouts from the previous instance without either | ||||
doing a specific layout invalidation (as it would have to do | ||||
anyway) or doing a global data server invalidation. | ||||
13.12. Security Considerations for the File Layout Type | 13.12. Security Considerations for the File Layout Type | |||
The NFSv4.1 file layout type MUST adhere to the security | The NFSv4.1 file layout type MUST adhere to the security | |||
considerations outlined in Section 12.9. NFSv4.1 data servers MUST | considerations outlined in Section 12.9. NFSv4.1 data servers MUST | |||
make all of the required access checks on each READ or WRITE I/O as | make all of the required access checks on each READ or WRITE I/O as | |||
determined by the NFSv4.1 protocol. If the metadata server would | determined by the NFSv4.1 protocol. If the metadata server would | |||
deny READ or WRITE operation on a given file due its ACL, mode | deny a READ or WRITE operation on a file due to its ACL, mode | |||
attribute, open mode, open deny mode, mandatory lock state, or any | attribute, open access mode, open deny mode, mandatory byte-range | |||
other attributes and state, the data server MUST also deny the READ | lock state, or any other attributes and state, the data server MUST | |||
or WRITE operation. This impacts the control protocol and the | also deny the READ or WRITE operation. This impacts the control | |||
propagation of state from the metadata server to the data servers; | protocol and the propagation of state from the metadata server to the | |||
see Section 13.9.2 for more details. | data servers; see Section 13.9.2 for more details. | |||
The methods for authentication, integrity, and privacy for file | The methods for authentication, integrity, and privacy for data | |||
layout-based data servers are the same as those used by metadata | servers based on the LAYOUT4_NFSV4_1_FILES layout type are the same | |||
servers. Metadata and data servers use ONC RPC security flavors to | as those used by metadata servers. Metadata and data servers use ONC | |||
authenticate, and SECINFO and SECINFO_NO_NAME to negotiate the | RPC security flavors to authenticate, and SECINFO and SECINFO_NO_NAME | |||
security mechanism and services to be used. Thus when using the | to negotiate the security mechanism and services to be used. Thus, | |||
LAYOUT4_NFSV4_1_FILES layout type, the impact on the RPC-based | when using the LAYOUT4_NFSV4_1_FILES layout type, the impact on the | |||
security model due to pNFS (as alluded to in Section 1.6.1 and | RPC-based security model due to pNFS (as alluded to in Sections 1.7.1 | |||
Section 1.6.2.2) is zero. | and 1.7.2.2) is zero. | |||
For a given file object, a metadata server MAY require different | For a given file object, a metadata server MAY require different | |||
security parameters (secinfo4 value) than the data server. For a | security parameters (secinfo4 value) than the data server. For a | |||
given file object with multiple data servers, the secinfo4 value | given file object with multiple data servers, the secinfo4 value | |||
SHOULD be the same across all data servers. If the secinfo4 values | SHOULD be the same across all data servers. If the secinfo4 values | |||
across a metadata server and its data servers differ for a specific | across a metadata server and its data servers differ for a specific | |||
file, the mapping of the principal to the server's internal user | file, the mapping of the principal to the server's internal user | |||
identifier MUST be the same in order for the access control checks | identifier MUST be the same in order for the access-control checks | |||
based on ACL, mode, open and deny mode, and mandatory locking to be | based on ACL, mode, open and deny mode, and mandatory locking to be | |||
consistent across the pNFS server. | consistent across the pNFS server. | |||
If an NFSv4.1 implementation supports pNFS and supports NFSv4.1 file | If an NFSv4.1 implementation supports pNFS and supports NFSv4.1 file | |||
layouts, then the implementation MUST support the SECINFO_NO_NAME | layouts, then the implementation MUST support the SECINFO_NO_NAME | |||
operation, on both the metadata and data servers. | operation on both the metadata and data servers. | |||
14. Internationalization | 14. Internationalization | |||
The primary issue in which NFSv4.1 needs to deal with | The primary issue in which NFSv4.1 needs to deal with | |||
internationalization, or I18N, is with respect to file names and | internationalization, or I18N, is with respect to file names and | |||
other strings as used within the protocol. The choice of string | other strings as used within the protocol. The choice of string | |||
representation must allow reasonable name/string access to clients | representation must allow reasonable name/string access to clients | |||
which use various languages. The UTF-8 encoding of the UCS as | that use various languages. The UTF-8 encoding of the UCS (Universal | |||
defined by ISO10646 [21] allows for this type of access and follows | Multiple-Octet Coded Character Set) as defined by ISO10646 [21] | |||
the policy described in "IETF Policy on Character Sets and | allows for this type of access and follows the policy described in | |||
Languages", RFC2277 [22]. | "IETF Policy on Character Sets and Languages", RFC 2277 [22]. | |||
RFC3454 [19], otherwise known as "stringprep", documents a framework | RFC3454 [19], otherwise known as "stringprep", documents a framework | |||
for using Unicode/UTF-8 in networking protocols, so as "to increase | for using Unicode/UTF-8 in networking protocols so as "to increase | |||
the likelihood that string input and string comparison work in ways | the likelihood that string input and string comparison work in ways | |||
that make sense for typical users throughout the world." A protocol | that make sense for typical users throughout the world". A protocol | |||
must define a profile of stringprep "in order to fully specify the | must define a profile of stringprep "in order to fully specify the | |||
processing options." The remainder of this Internationalization | processing options". The remainder of this section defines the | |||
section defines the NFSv4.1 stringprep profiles. Much of terminology | NFSv4.1 stringprep profiles. Much of the terminology used for the | |||
used for the remainder of this section comes from stringprep. | remainder of this section comes from stringprep. | |||
There are three UTF-8 string types defined for NFSv4.1: utf8str_cs, | There are three UTF-8 string types defined for NFSv4.1: utf8str_cs, | |||
utf8str_cis, and utf8str_mixed. Separate profiles are defined for | utf8str_cis, and utf8str_mixed. Separate profiles are defined for | |||
each. Each profile defines the following, as required by stringprep: | each. Each profile defines the following, as required by stringprep: | |||
o The intended applicability of the profile | o The intended applicability of the profile. | |||
o The character repertoire that is the input and output to | o The character repertoire that is the input and output to | |||
stringprep (which is Unicode 3.2 for referenced version of | stringprep (which is Unicode 3.2 for the referenced version of | |||
stringprep). However, NFSv4.1 implementations are not limited to | stringprep). However, NFSv4.1 implementations are not limited to | |||
3.2. | 3.2. | |||
o The mapping tables from stringprep used (as described in section 3 | o The mapping tables from stringprep used (as described in Section 3 | |||
of stringprep) | of stringprep). | |||
o Any additional mapping tables specific to the profile | o Any additional mapping tables specific to the profile. | |||
o The Unicode normalization used, if any (as described in section 4 | o The Unicode normalization used, if any (as described in Section 4 | |||
of stringprep) | of stringprep). | |||
o The tables from stringprep listing of characters that are | o The tables from the stringprep listing of characters that are | |||
prohibited as output (as described in section 5 of stringprep) | prohibited as output (as described in Section 5 of stringprep). | |||
o The bidirectional string testing used, if any (as described in | o The bidirectional string testing used, if any (as described in | |||
section 6 of stringprep) | Section 6 of stringprep). | |||
o Any additional characters that are prohibited as output specific | o Any additional characters that are prohibited as output specific | |||
to the profile | to the profile. | |||
Stringprep discusses Unicode characters, whereas NFSv4.1 renders | Stringprep discusses Unicode characters, whereas NFSv4.1 renders | |||
UTF-8 characters. Since there is a one-to-one mapping from UTF-8 to | UTF-8 characters. Since there is a one-to-one mapping from UTF-8 to | |||
Unicode, when the remainder of this document refers to Unicode, the | Unicode, when the remainder of this document refers to Unicode, the | |||
reader should assume UTF-8. | reader should assume UTF-8. | |||
Much of the text for the profiles comes from RFC3491 [23]. | Much of the text for the profiles comes from RFC3491 [23]. | |||
14.1. Stringprep profile for the utf8str_cs type | 14.1. Stringprep Profile for the utf8str_cs Type | |||
Every use of the utf8str_cs type definition in the NFSv4 protocol | Every use of the utf8str_cs type definition in the NFSv4 protocol | |||
specification follows the profile named nfs4_cs_prep. | specification follows the profile named nfs4_cs_prep. | |||
14.1.1. Intended applicability of the nfs4_cs_prep profile | 14.1.1. Intended Applicability of the nfs4_cs_prep Profile | |||
The utf8str_cs type is a case sensitive string of UTF-8 characters. | The utf8str_cs type is a case-sensitive string of UTF-8 characters. | |||
Its primary use in NFSv4.1 is for naming components and pathnames. | Its primary use in NFSv4.1 is for naming components and pathnames. | |||
Components and pathnames are stored on the server's file system. Two | Components and pathnames are stored on the server's file system. Two | |||
valid distinct UTF-8 strings might be the same after processing via | valid distinct UTF-8 strings might be the same after processing via | |||
the utf8str_cs profile. If the strings are two names inside a | the utf8str_cs profile. If the strings are two names inside a | |||
directory, the NFSv4.1 server will need to either: | directory, the NFSv4.1 server will need to either: | |||
o disallow the creation of a second name if its post processed form | o disallow the creation of a second name if its post-processed form | |||
collides with that of an existing name, or | collides with that of an existing name, or | |||
o allow the creation of the second name, but arrange so that after | o allow the creation of the second name, but arrange so that after | |||
post processing, the second name is different than the post | post-processing, the second name is different than the post- | |||
processed form of the first name. | processed form of the first name. | |||
14.1.2. Character repertoire of nfs4_cs_prep | 14.1.2. Character Repertoire of nfs4_cs_prep | |||
The nfs4_cs_prep profile uses Unicode 3.2, as defined in stringprep's | The nfs4_cs_prep profile uses Unicode 3.2, as defined in stringprep's | |||
Appendix A.1. However, NFSv4.1 implementations are not limited to | Appendix A.1. However, NFSv4.1 implementations are not limited to | |||
3.2. | 3.2. | |||
14.1.3. Mapping used by nfs4_cs_prep | 14.1.3. Mapping Used by nfs4_cs_prep | |||
The nfs4_cs_prep profile specifies mapping using the following tables | The nfs4_cs_prep profile specifies mapping using the following tables | |||
from stringprep: | from stringprep: | |||
Table B.1 | Table B.1 | |||
Table B.2 is normally not part of the nfs4_cs_prep profile as it is | Table B.2 is normally not part of the nfs4_cs_prep profile as it is | |||
primarily for dealing with case-insensitive comparisons. However, if | primarily for dealing with case-insensitive comparisons. However, if | |||
the NFSv4.1 file server supports the case_insensitive file system | the NFSv4.1 file server supports the case_insensitive file system | |||
attribute, and if case_insensitive is TRUE, the NFSv4.1 server MUST | attribute, and if case_insensitive is TRUE, the NFSv4.1 server MUST | |||
use Table B.2 (in addition to Table B.1) when processing utf8str_cs | use Table B.2 (in addition to Table B.1) when processing utf8str_cs | |||
strings, and the NFSv4.1 client MUST assume Table B.2 (in addition to | strings, and the NFSv4.1 client MUST assume Table B.2 (in addition to | |||
Table B.1) are being used. | Table B.1) is being used. | |||
If the case_preserving attribute is present and set to FALSE, then | If the case_preserving attribute is present and set to FALSE, then | |||
the NFSv4.1 server MUST use table B.2 to map case when processing | the NFSv4.1 server MUST use Table B.2 to map case when processing | |||
utf8str_cs strings. Whether the server maps from lower to upper case | utf8str_cs strings. Whether the server maps from lower to upper case | |||
or the upper to lower case is an implementation dependency. | or from upper to lower case is an implementation dependency. | |||
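The mapping step of nfs4_cs_prep can be sketched with the stringprep module from Python's standard library, which exposes the RFC 3454 tables. The function name nfs4_cs_map and the use_table_b2 parameter are hypothetical; the parameter stands for "case_insensitive is TRUE or case_preserving is FALSE". Note that Table B.2 folds toward lower case, which is only one of the two directions the text permits for the case_preserving case.

```python
import stringprep


def nfs4_cs_map(name: str, use_table_b2: bool = False) -> str:
    """Apply the nfs4_cs_prep mapping tables to a component name (sketch)."""
    out = []
    for ch in name:
        if stringprep.in_table_b1(ch):
            # Table B.1: characters commonly mapped to nothing.
            continue
        if use_table_b2:
            # Table B.2: case folding for use with NFKC.
            out.append(stringprep.map_table_b2(ch))
        else:
            out.append(ch)
    return "".join(out)
```

With this sketch, "fi\u00adle" (containing a soft hyphen) maps to "file", so a server would need to treat the two names as colliding, as described in Section 14.1.1.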
14.1.4. Normalization used by nfs4_cs_prep | 14.1.4. Normalization Used by nfs4_cs_prep | |||
The nfs4_cs_prep profile does not specify a normalization form. A | The nfs4_cs_prep profile does not specify a normalization form. A | |||
later revision of this specification may specify a particular | later revision of this specification may specify a particular | |||
normalization form. Therefore, the server and client can expect that | normalization form. Therefore, the server and client can expect that | |||
they may receive unnormalized characters within protocol requests and | they may receive unnormalized characters within protocol requests and | |||
responses. If the operating environment requires normalization, then | responses. If the operating environment requires normalization, then | |||
the implementation must normalize utf8str_cs strings within the | the implementation must normalize utf8str_cs strings within the | |||
protocol before presenting the information to an application (at the | protocol before presenting the information to an application (at the | |||
client) or local file system (at the server). | client) or local file system (at the server). | |||
14.1.5. Prohibited output for nfs4_cs_prep | 14.1.5. Prohibited Output for nfs4_cs_prep | |||
The nfs4_cs_prep profile RECOMMENDS prohibiting the use of the | The nfs4_cs_prep profile RECOMMENDS prohibiting the use of the | |||
following tables from stringprep: | following tables from stringprep: | |||
Table C.5 | Table C.5 | |||
Table C.6 | Table C.6 | |||
14.1.6. Bidirectional output for nfs4_cs_prep | 14.1.6. Bidirectional Output for nfs4_cs_prep | |||
The nfs4_cs_prep profile does not specify any checking of | The nfs4_cs_prep profile does not specify any checking of | |||
bidirectional strings. | bidirectional strings. | |||
14.2. Stringprep profile for the utf8str_cis type | 14.2. Stringprep Profile for the utf8str_cis Type | |||
Every use of the utf8str_cis type definition in the NFSv4.1 protocol | Every use of the utf8str_cis type definition in the NFSv4.1 protocol | |||
specification follows the profile named nfs4_cis_prep. | specification follows the profile named nfs4_cis_prep. | |||
14.2.1. Intended applicability of the nfs4_cis_prep profile | 14.2.1. Intended Applicability of the nfs4_cis_prep Profile | |||
The utf8str_cis type is a case insensitive string of UTF-8 | The utf8str_cis type is a case-insensitive string of UTF-8 | |||
characters. Its primary use in NFSv4.1 is for naming NFS servers. | characters. Its primary use in NFSv4.1 is for naming NFS servers. | |||
14.2.2. Character repertoire of nfs4_cis_prep | 14.2.2. Character Repertoire of nfs4_cis_prep | |||
The nfs4_cis_prep profile uses Unicode 3.2, as defined in | The nfs4_cis_prep profile uses Unicode 3.2, as defined in | |||
stringprep's Appendix A.1. However, NFSv4.1 implementations are not | stringprep's Appendix A.1. However, NFSv4.1 implementations are not | |||
limited to 3.2. | limited to 3.2. | |||
14.2.3. Mapping used by nfs4_cis_prep | 14.2.3. Mapping Used by nfs4_cis_prep | |||
The nfs4_cis_prep profile specifies mapping using the following | The nfs4_cis_prep profile specifies mapping using the following | |||
tables from stringprep: | tables from stringprep: | |||
Table B.1 | Table B.1 | |||
Table B.2 | Table B.2 | |||
14.2.4. Normalization used by nfs4_cis_prep | 14.2.4. Normalization Used by nfs4_cis_prep | |||
The nfs4_cis_prep profile specifies using Unicode normalization form | The nfs4_cis_prep profile specifies using Unicode normalization form | |||
KC, as described in stringprep. | KC, as described in stringprep. | |||
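A rough sketch of the mapping and normalization steps of nfs4_cis_prep follows, again using Python's standard-library stringprep tables and the Unicode 3.2 database (unicodedata.ucd_3_2_0). The function name is hypothetical, and the sketch deliberately omits the prohibited-output and bidirectional checks.

```python
import stringprep
import unicodedata


def nfs4_cis_map_normalize(name: str) -> str:
    """Map with Tables B.1 and B.2, then normalize to form KC (sketch)."""
    mapped = "".join(
        stringprep.map_table_b2(ch)        # Table B.2: case folding
        for ch in name
        if not stringprep.in_table_b1(ch)  # Table B.1: mapped to nothing
    )
    # Normalization form KC, using the Unicode 3.2 data that the
    # referenced version of stringprep is defined against.
    return unicodedata.ucd_3_2_0.normalize("NFKC", mapped)
```

Two server names that differ only in case, such as "NFS.Example.COM" and "nfs.example.com", produce the same output and therefore compare equal.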
14.2.5. Prohibited output for nfs4_cis_prep | 14.2.5. Prohibited Output for nfs4_cis_prep | |||
The nfs4_cis_prep profile specifies prohibiting using the following | The nfs4_cis_prep profile specifies prohibiting using the following | |||
tables from stringprep: | tables from stringprep: | |||
Table C.1.2 | Table C.1.2 | |||
Table C.2.2 | Table C.2.2 | |||
Table C.3 | Table C.3 | |||
skipping to change at page 336, line 23 | skipping to change at page 336, line 43 | |||
Table C.5 | Table C.5 | |||
Table C.6 | Table C.6 | |||
Table C.7 | Table C.7 | |||
Table C.8 | Table C.8 | |||
Table C.9 | Table C.9 | |||
14.2.6. Bidirectional output for nfs4_cis_prep | 14.2.6. Bidirectional Output for nfs4_cis_prep | |||
The nfs4_cis_prep profile specifies checking bidirectional strings as | The nfs4_cis_prep profile specifies checking bidirectional strings as | |||
described in stringprep's section 6. | described in stringprep's Section 6. | |||
14.3. Stringprep profile for the utf8str_mixed type | 14.3. Stringprep Profile for the utf8str_mixed Type | |||
Every use of the utf8str_mixed type definition in the NFSv4.1 | Every use of the utf8str_mixed type definition in the NFSv4.1 | |||
protocol specification follows the profile named nfs4_mixed_prep. | protocol specification follows the profile named nfs4_mixed_prep. | |||
14.3.1. Intended applicability of the nfs4_mixed_prep profile | 14.3.1. Intended Applicability of the nfs4_mixed_prep Profile | |||
The utf8str_mixed type is a string of UTF-8 characters, with a prefix | The utf8str_mixed type is a string of UTF-8 characters, with a prefix | |||
that is case sensitive, a separator equal to '@', and a suffix that | that is case sensitive, a separator equal to '@', and a suffix that | |||
is fully qualified domain name. Its primary use in NFSv4.1 is for | is a fully qualified domain name. Its primary use in NFSv4.1 is for | |||
naming principals identified in an Access Control Entry. | naming principals identified in an Access Control Entry. | |||
14.3.2. Character repertoire of nfs4_mixed_prep | 14.3.2. Character Repertoire of nfs4_mixed_prep | |||
The nfs4_mixed_prep profile uses Unicode 3.2, as defined in | The nfs4_mixed_prep profile uses Unicode 3.2, as defined in | |||
stringprep's Appendix A.1. However, NFSv4.1 implementations are not | stringprep's Appendix A.1. However, NFSv4.1 implementations are not | |||
limited to 3.2. | limited to 3.2. | |||
14.3.3. Mapping used by nfs4_mixed_prep | 14.3.3. Mapping Used by nfs4_mixed_prep | |||
For the prefix and the separator of a utf8str_mixed string, the | For the prefix and the separator of a utf8str_mixed string, the | |||
nfs4_mixed_prep profile specifies mapping using the following table | nfs4_mixed_prep profile specifies mapping using the following table | |||
from stringprep: | from stringprep: | |||
Table B.1 | Table B.1 | |||
For the suffix of a utf8str_mixed string, the nfs4_mixed_prep profile | For the suffix of a utf8str_mixed string, the nfs4_mixed_prep profile | |||
specifies mapping using the following tables from stringprep: | specifies mapping using the following tables from stringprep: | |||
Table B.1 | Table B.1 | |||
Table B.2 | Table B.2 | |||
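The split treatment of prefix and suffix can be illustrated as below. The function name nfs4_mixed_map is hypothetical, and the choice of the last '@' as the separator is an assumption made for this sketch; the text above does not say which '@' to use if more than one is present.

```python
import stringprep


def _drop_b1(s: str) -> str:
    # Table B.1: remove characters that are commonly mapped to nothing.
    return "".join(ch for ch in s if not stringprep.in_table_b1(ch))


def nfs4_mixed_map(principal: str) -> str:
    """Apply the nfs4_mixed_prep mapping tables to 'prefix@domain' (sketch)."""
    # Assumption: the last '@' separates the prefix from the domain suffix.
    prefix, sep, suffix = principal.rpartition("@")
    if not sep:
        raise ValueError("expected a string of the form prefix@domain")
    # Prefix and separator: Table B.1 only, so case is preserved.
    prefix = _drop_b1(prefix)
    # Suffix: Table B.1 followed by Table B.2 (case folding).
    suffix = "".join(stringprep.map_table_b2(ch) for ch in _drop_b1(suffix))
    return prefix + sep + suffix
```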
14.3.4. Normalization used by nfs4_mixed_prep | 14.3.4. Normalization Used by nfs4_mixed_prep | |||
The nfs4_mixed_prep profile specifies using Unicode normalization | The nfs4_mixed_prep profile specifies using Unicode normalization | |||
form KC, as described in stringprep. | form KC, as described in stringprep. | |||
14.3.5. Prohibited output for nfs4_mixed_prep | 14.3.5. Prohibited Output for nfs4_mixed_prep | |||
The nfs4_mixed_prep profile specifies prohibiting using the following | The nfs4_mixed_prep profile specifies prohibiting using the following | |||
tables from stringprep: | tables from stringprep: | |||
Table C.1.2 | Table C.1.2 | |||
Table C.2.2 | Table C.2.2 | |||
Table C.3 | Table C.3 | |||
Table C.4 | Table C.4 | |||
Table C.5 | Table C.5 | |||
Table C.6 | Table C.6 | |||
Table C.7 | Table C.7 | |||
Table C.8 | Table C.8 | |||
Table C.9 | Table C.9 | |||
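A prohibited-output check over the tables listed above could be sketched as follows, using the table predicates from Python's standard-library stringprep module. The helper name first_prohibited is hypothetical, and the check is meant to run on a string that has already been mapped and normalized.

```python
import stringprep

# Predicates for the tables listed above:
# C.1.2, C.2.2, and C.3 through C.9.
_PROHIBITED_TABLES = (
    stringprep.in_table_c12,
    stringprep.in_table_c22,
    stringprep.in_table_c3,
    stringprep.in_table_c4,
    stringprep.in_table_c5,
    stringprep.in_table_c6,
    stringprep.in_table_c7,
    stringprep.in_table_c8,
    stringprep.in_table_c9,
)


def first_prohibited(s: str):
    """Return the first prohibited character in s, or None (sketch)."""
    for ch in s:
        if any(in_table(ch) for in_table in _PROHIBITED_TABLES):
            return ch
    return None
```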
14.3.6. Bidirectional output for nfs4_mixed_prep | 14.3.6. Bidirectional Output for nfs4_mixed_prep | |||
The nfs4_mixed_prep profile specifies checking bidirectional strings | The nfs4_mixed_prep profile specifies checking bidirectional strings | |||
as described in stringprep's section 6. | as described in stringprep's Section 6. | |||
14.4. UTF-8 Capabilities | 14.4. UTF-8 Capabilities | |||
const FSCHARSET_CAP4_CONTAINS_NON_UTF8 = 0x1; | const FSCHARSET_CAP4_CONTAINS_NON_UTF8 = 0x1; | |||
const FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 = 0x2; | const FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 = 0x2; | |||
typedef uint32_t fs_charset_cap4; | typedef uint32_t fs_charset_cap4; | |||
Because some operating environments and file systems do not enforce | Because some operating environments and file systems do not enforce | |||
character set encodings, NFSv4.1 supports the fs_charset_cap | character set encodings, NFSv4.1 supports the fs_charset_cap | |||
attribute (Section 5.8.2.11) that indicates to the client a file | attribute (Section 5.8.2.11) that indicates to the client a file | |||
system's UTF-8 capabilities. The attribute is an integer containing | system's UTF-8 capabilities. The attribute is an integer containing | |||
a pair of flags. The first flag is FSCHARSET_CAP4_CONTAINS_NON_UTF8, | a pair of flags. The first flag is FSCHARSET_CAP4_CONTAINS_NON_UTF8, | |||
which, if set to one tells the client the file system contains non- | which, if set to one, tells the client that the file system contains | |||
UTF-8 characters, and the server will not convert non-UTF-8 characters | non-UTF-8 characters, and the server will not convert non-UTF-8 | |||
to UTF-8 if the client reads a symlink or directory, nor will | characters to UTF-8 if the client reads a symlink or directory, | |||
operations with component names or pathnames in the arguments convert | neither will operations with component names or pathnames in the | |||
the strings to UTF-8. The second flag is | arguments convert the strings to UTF-8. The second flag is | |||
FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 which if set to one, indicates that | FSCHARSET_CAP4_ALLOWS_ONLY_UTF8, which, if set to one, indicates that | |||
the server will accept (and generate) only UTF-8 characters on the | the server will accept (and generate) only UTF-8 characters on the | |||
file system. If FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 is set to one, | file system. If FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 is set to one, | |||
FSCHARSET_CAP4_CONTAINS_NON_UTF8 MUST be set to zero. | FSCHARSET_CAP4_CONTAINS_NON_UTF8 MUST be set to zero. | |||
FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 SHOULD always be set to one. | FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 SHOULD always be set to one. | |||
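A client might decode the fs_charset_cap attribute along the following lines. The two constants are taken from the definitions above; the function name and the returned dictionary keys are hypothetical, and raising an exception for the forbidden combination is just one possible client policy.

```python
FSCHARSET_CAP4_CONTAINS_NON_UTF8 = 0x1
FSCHARSET_CAP4_ALLOWS_ONLY_UTF8 = 0x2


def decode_fs_charset_cap(cap: int) -> dict:
    """Split an fs_charset_cap4 value into its two flags (sketch)."""
    contains_non_utf8 = bool(cap & FSCHARSET_CAP4_CONTAINS_NON_UTF8)
    allows_only_utf8 = bool(cap & FSCHARSET_CAP4_ALLOWS_ONLY_UTF8)
    if contains_non_utf8 and allows_only_utf8:
        # If ALLOWS_ONLY_UTF8 is one, CONTAINS_NON_UTF8 MUST be zero.
        raise ValueError("server reported contradictory fs_charset_cap flags")
    return {
        "contains_non_utf8": contains_non_utf8,
        "allows_only_utf8": allows_only_utf8,
    }
```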
14.5. UTF-8 Related Errors | 14.5. UTF-8 Related Errors | |||
Where the client sends an invalid UTF-8 string, the server should | Where the client sends an invalid UTF-8 string, the server should | |||
return NFS4ERR_INVAL (see Table 5). This includes cases in which | return NFS4ERR_INVAL (see Table 5). This includes cases in which | |||
inappropriate prefixes are detected and where the count includes | inappropriate prefixes are detected and where the count includes | |||
trailing bytes that do not constitute a full UCS character. | trailing bytes that do not constitute a full UCS character. | |||
Where the client supplied string is valid UTF-8 but contains | Where the client-supplied string is valid UTF-8 but contains | |||
characters that are not supported by the server as a value for that | characters that are not supported by the server as a value for that | |||
string (e.g. names containing characters outside of Unicode plane 0 | string (e.g., names containing characters outside of Unicode plane 0 | |||
on filesystems that fail to support such characters despite their | on filesystems that fail to support such characters despite their | |||
presence in the Unicode standard), the server should return | presence in the Unicode standard), the server should return | |||
NFS4ERR_BADCHAR. | NFS4ERR_BADCHAR. | |||
Where a UTF-8 string is used as a file name, and the file system, | Where a UTF-8 string is used as a file name, and the file system | |||
while supporting all of the characters within the name, does not | (while supporting all of the characters within the name) does not | |||
allow that particular name to be used, the server should return the | allow that particular name to be used, the server should return the | |||
error NFS4ERR_BADNAME (Table 5). This includes situations in which | error NFS4ERR_BADNAME (Table 5). This includes situations in which | |||
the server file system imposes a normalization constraint on name | the server file system imposes a normalization constraint on name | |||
strings, but will also include such situations as file system | strings, but will also include such situations as file system | |||
prohibitions of "." and ".." as file names for certain operations, | prohibitions of "." and ".." as file names for certain operations, | |||
and other such constraints. | and other such constraints. | |||
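The three error cases above can be summarized in a small server-side sketch. The function classify_name and its supports_non_bmp parameter are hypothetical; the plane-0 test and the rejection of "." and ".." are simply the examples given in the text, and a real server would apply whatever constraints its file system actually imposes.

```python
def classify_name(raw: bytes, supports_non_bmp: bool = True) -> str:
    """Pick the error a server might return for a component name (sketch)."""
    try:
        # Strict decoding rejects bad prefixes and truncated sequences.
        name = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "NFS4ERR_INVAL"
    if not supports_non_bmp and any(ord(ch) > 0xFFFF for ch in name):
        # Valid UTF-8, but outside Unicode plane 0 on a file system
        # that cannot store such characters.
        return "NFS4ERR_BADCHAR"
    if name in (".", ".."):
        # All characters are supported, but this particular name is not
        # allowed (assumed here to apply to the operation at hand).
        return "NFS4ERR_BADNAME"
    return "NFS4_OK"
```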
15. Error Values | 15. Error Values | |||
NFS error numbers are assigned to failed operations within a Compound | NFS error numbers are assigned to failed operations within a Compound | |||
skipping to change at page 341, line 35 | skipping to change at page 342, line 6 | |||
This section deals with errors that are applicable to a broad set of | This section deals with errors that are applicable to a broad set of | |||
different purposes. | different purposes. | |||
15.1.1.1. NFS4ERR_BADXDR (Error Code 10036) | 15.1.1.1. NFS4ERR_BADXDR (Error Code 10036) | |||
The arguments for this operation do not match those specified in the | The arguments for this operation do not match those specified in the | |||
XDR definition. This includes situations in which the request ends | XDR definition. This includes situations in which the request ends | |||
before all the arguments have been seen. Note that this error | before all the arguments have been seen. Note that this error | |||
applies when fixed enumerations (these include booleans) have a value | applies when fixed enumerations (these include booleans) have a value | |||
within the input stream which is not valid for the enum. A replier | within the input stream that is not valid for the enum. A replier | |||
may pre-parse all operations for a Compound procedure before doing | may pre-parse all operations for a Compound procedure before doing | |||
any operation execution and return RPC-level XDR errors in that case. | any operation execution and return RPC-level XDR errors in that case. | |||
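As a rough illustration of the two NFS4ERR_BADXDR conditions described above (a request that ends before all arguments are seen, and a fixed enumeration carrying an undefined value), the following Python sketch decodes a single XDR boolean. The names decode_xdr_bool and NfsError are hypothetical; the sketch assumes only the standard XDR encoding of an enumeration as a 4-byte big-endian signed integer.

```python
import struct

NFS4ERR_BADXDR = 10036  # value given in Section 15.1.1.1


class NfsError(Exception):
    """Hypothetical exception carrying an NFSv4.1 status code."""

    def __init__(self, status):
        super().__init__(status)
        self.status = status


def decode_xdr_bool(buf, offset=0):
    """Decode one XDR boolean and return (value, next_offset)."""
    # The request ended before the whole argument was seen.
    if len(buf) - offset < 4:
        raise NfsError(NFS4ERR_BADXDR)
    (raw,) = struct.unpack_from(">i", buf, offset)
    # A boolean is a fixed enumeration: only 0 and 1 are defined.
    if raw not in (0, 1):
        raise NfsError(NFS4ERR_BADXDR)
    return bool(raw), offset + 4
```

A replier that pre-parses the whole Compound might run checks of this kind before executing any operation, which is the case in which the text allows an RPC-level XDR error instead.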
15.1.1.2. NFS4ERR_BAD_COOKIE (Error Code 10003) | 15.1.1.2. NFS4ERR_BAD_COOKIE (Error Code 10003) | |||
Used for operations that provide a set of information indexed by some | Used for operations that provide a set of information indexed by some | |||
quantity provided by the client or cookie sent by the server for an | quantity provided by the client or cookie sent by the server for an | |||
earlier invocation. Where the value cannot be used for its intended | earlier invocation. Where the value cannot be used for its intended | |||
purpose, this error results. | purpose, this error results. | |||
15.1.1.3. NFS4ERR_DELAY (Error Code 10008) | 15.1.1.3. NFS4ERR_DELAY (Error Code 10008) | |||
For any of a number of reasons, the replier could not process this | For any of a number of reasons, the replier could not process this | |||
operation in what was deemed a reasonable time. The client should | operation in what was deemed a reasonable time. The client should | |||
wait and then try the request with a new slot and sequence value. | wait and then try the request with a new slot and sequence value. | |||
Some example of situations that might lead to this situation: | Some examples of scenarios that might lead to this situation: | |||
o A server that supports hierarchical storage receives a request to | o A server that supports hierarchical storage receives a request to | |||
process a file that had been migrated. | process a file that had been migrated. | |||
o An operation requires a delegation recall to proceed and waiting | o An operation requires a delegation recall to proceed, and waiting | |||
for this delegation recall makes processing this request in a | for this delegation recall makes processing this request in a | |||
timely fashion impossible. | timely fashion impossible. | |||
In such cases, the error NFS4ERR_DELAY allows these preparatory | In such cases, the error NFS4ERR_DELAY allows these preparatory | |||
operations to proceed without holding up client resources such as a | operations to proceed without holding up client resources such as a | |||
session slot. After delaying for a period of time, the client can then | session slot. After delaying for a period of time, the client can then | |||
re-send the operation in question (but not with the same slot ID and | re-send the operation in question (but not with the same slot ID and | |||
sequence ID; one or both MUST be different on the re-send). | sequence ID; one or both MUST be different on the re-send). | |||
Note that without the ability to return NFS4ERR_DELAY and the | Note that without the ability to return NFS4ERR_DELAY and the | |||
client's willingness to re-send when receiving it, deadlock might | client's willingness to re-send when receiving it, deadlock might | |||
well result. E.g., if a recall is done, and if the delegation return | result. For example, if a recall is done, and if the delegation | |||
or operations preparatory to delegation return are held up by other | return or operations preparatory to delegation return are held up by | |||
operations that need the delegation to be returned, session slots | other operations that need the delegation to be returned, session | |||
might not be available. The result could be deadlock. | slots might not be available. The result could be deadlock. | |||
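The re-send rule for NFS4ERR_DELAY can be sketched as a small client-side loop. This is a minimal sketch, not a prescribed algorithm: the function send_with_delay_retry, the send callback, and the backoff parameters are all hypothetical. It assumes the error was returned by an operation that follows Sequence, so the slot has advanced and the re-send keeps the slot ID but uses the next sequence ID.

```python
import time

NFS_OK = 0
NFS4ERR_DELAY = 10008  # value given in Section 15.1.1.3


def send_with_delay_retry(send, slot_id, sequence_id,
                          max_tries=5, pause=0.1, sleep=time.sleep):
    """Re-send a request while the replier answers NFS4ERR_DELAY.

    `send(slot_id, sequence_id)` is a hypothetical callback that
    transmits the Compound and returns its status code.
    Returns (final_status, last_or_next_sequence_id).
    """
    for attempt in range(max_tries):
        status = send(slot_id, sequence_id)
        if status != NFS4ERR_DELAY:
            return status, sequence_id
        # Wait before trying again; exponential backoff is an assumption.
        sleep(pause * (2 ** attempt))
        # Never reuse the same slot ID and sequence ID pair.
        sequence_id += 1
    return NFS4ERR_DELAY, sequence_id
```

Because the slot is released between attempts, other requests (such as a delegation return) can use it in the meantime, which is how the deadlock described above is avoided.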
15.1.1.4. NFS4ERR_INVAL (Error Code 22) | 15.1.1.4. NFS4ERR_INVAL (Error Code 22) | |||
The arguments for this operation are not valid for some reason, even | The arguments for this operation are not valid for some reason, even | |||
though they do match those specified in the XDR definition for the | though they do match those specified in the XDR definition for the | |||
request. | request. | |||
15.1.1.5. NFS4ERR_NOTSUPP (Error Code 10004) | 15.1.1.5. NFS4ERR_NOTSUPP (Error Code 10004) | |||
Operation not supported, either because the operation is an OPTIONAL | Operation not supported, either because the operation is an OPTIONAL | |||
one and is not supported by this server or because the operation MUST | one and is not supported by this server or because the operation MUST | |||
NOT be implemented in the current minor version. | NOT be implemented in the current minor version. | |||
15.1.1.6. NFS4ERR_SERVERFAULT (Error Code 10006) | 15.1.1.6. NFS4ERR_SERVERFAULT (Error Code 10006) | |||
An error occurred on the server which does not map to any of the | An error occurred on the server that does not map to any of the | |||
specific legal NFSv4.1 protocol error values. The client should | specific legal NFSv4.1 protocol error values. The client should | |||
translate this into an appropriate error. UNIX clients may choose to | translate this into an appropriate error. UNIX clients may choose to | |||
translate this to EIO. | translate this to EIO. | |||
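A client-side translation of this kind might look like the following Python sketch. The table is deliberately partial and the function name nfs_status_to_errno is hypothetical; falling back to EIO for any status not listed is an assumption of this sketch, not a requirement of the protocol.

```python
import errno

# Partial, illustrative mapping from NFSv4.1 status codes to local
# errno values. Codes are those given in Section 15.1.
_STATUS_TO_ERRNO = {
    1: errno.EPERM,       # NFS4ERR_PERM
    2: errno.ENOENT,      # NFS4ERR_NOENT
    5: errno.EIO,         # NFS4ERR_IO
    13: errno.EACCES,     # NFS4ERR_ACCESS
    17: errno.EEXIST,     # NFS4ERR_EXIST
    28: errno.ENOSPC,     # NFS4ERR_NOSPC
    30: errno.EROFS,      # NFS4ERR_ROFS
    10006: errno.EIO,     # NFS4ERR_SERVERFAULT
}


def nfs_status_to_errno(status):
    """Return a local errno for an NFSv4.1 status code.

    Unlisted codes fall back to EIO (an assumption of this sketch).
    """
    return _STATUS_TO_ERRNO.get(status, errno.EIO)
```

Mapping by name rather than by numeric value matters because local errno numbering differs between platforms, even where the NFS code was originally modeled on a UNIX value.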
15.1.1.7. NFS4ERR_TOOSMALL (Error Code 10005) | 15.1.1.7. NFS4ERR_TOOSMALL (Error Code 10005) | |||
Used where an operation returns a variable amount of data, with a | Used where an operation returns a variable amount of data, with a | |||
limit specified by the client. Where the data returned cannot be fit | limit specified by the client. Where the data returned cannot be fit | |||
within the limit specified by the client, this error results. | within the limit specified by the client, this error results. | |||
skipping to change at page 343, line 27 | skipping to change at page 343, line 46 | |||
15.1.2.1. NFS4ERR_BADHANDLE (Error Code 10001) | 15.1.2.1. NFS4ERR_BADHANDLE (Error Code 10001) | |||
Illegal NFS filehandle for the current server. The current file | Illegal NFS filehandle for the current server. The current file | |||
handle failed internal consistency checks. Once accepted as valid | handle failed internal consistency checks. Once accepted as valid | |||
(by PUTFH), no subsequent status change can cause the filehandle to | (by PUTFH), no subsequent status change can cause the filehandle to | |||
generate this error. | generate this error. | |||
15.1.2.2. NFS4ERR_FHEXPIRED (Error Code 10014) | 15.1.2.2. NFS4ERR_FHEXPIRED (Error Code 10014) | |||
A current or saved filehandle which is an argument to the current | A current or saved filehandle that is an argument to the current | |||
operation is volatile and has expired at the server. | operation is volatile and has expired at the server. | |||
15.1.2.3. NFS4ERR_ISDIR (Error Code 21) | 15.1.2.3. NFS4ERR_ISDIR (Error Code 21) | |||
The current or saved filehandle designates a directory when the | The current or saved filehandle designates a directory when the | |||
current operation does not allow a directory to be accepted as the | current operation does not allow a directory to be accepted as the | |||
target of this operation. | target of this operation. | |||
15.1.2.4. NFS4ERR_MOVED (Error Code 10019) | 15.1.2.4. NFS4ERR_MOVED (Error Code 10019) | |||
The file system which contains the current filehandle object is not | The file system that contains the current filehandle object is not | |||
present at the server. It may have been relocated, migrated to | present at the server. It may have been relocated or migrated to | |||
another server or may have never been present. The client may obtain | another server, or it may have never been present. The client may | |||
the new file system location by obtaining the "fs_locations" or | obtain the new file system location by obtaining the "fs_locations" | |||
"fs_locations_info" attribute for the current filehandle. For | or "fs_locations_info" attribute for the current filehandle. For | |||
further discussion, refer to Section 11.2 | further discussion, refer to Section 11.2. | |||
15.1.2.5. NFS4ERR_NOFILEHANDLE (Error Code 10020) | 15.1.2.5. NFS4ERR_NOFILEHANDLE (Error Code 10020) | |||
The logical current or saved filehandle value is required by the | The logical current or saved filehandle value is required by the | |||
current operation and is not set. This may be a result of a | current operation and is not set. This may be a result of a | |||
malformed COMPOUND operation (i.e. no PUTFH or PUTROOTFH before an | malformed COMPOUND operation (i.e., no PUTFH or PUTROOTFH before an | |||
operation that requires the current filehandle be set). | operation that requires the current filehandle be set). | |||
15.1.2.6. NFS4ERR_NOTDIR (Error Code 20) | 15.1.2.6. NFS4ERR_NOTDIR (Error Code 20) | |||
The current (or saved) filehandle designates an object which is not a | The current (or saved) filehandle designates an object that is not a | |||
directory for an operation in which a directory is required. | directory for an operation in which a directory is required. | |||
15.1.2.7. NFS4ERR_STALE (Error Code 70) | 15.1.2.7. NFS4ERR_STALE (Error Code 70) | |||
The current or saved filehandle value designating an argument to the | The current or saved filehandle value designating an argument to the | |||
current operation is invalid The file referred to by that filehandle | current operation is invalid. The file referred to by that | |||
no longer exists or access to it has been revoked. | filehandle no longer exists or access to it has been revoked. | |||
15.1.2.8. NFS4ERR_SYMLINK (Error Code 10029) | 15.1.2.8. NFS4ERR_SYMLINK (Error Code 10029) | |||
The current filehandle designates a symbolic link when the current | The current filehandle designates a symbolic link when the current | |||
operation does not allow a symbolic link as the target. | operation does not allow a symbolic link as the target. | |||
15.1.2.9. NFS4ERR_WRONG_TYPE (Error Code 10083) | 15.1.2.9. NFS4ERR_WRONG_TYPE (Error Code 10083) | |||
The current (or saved) filehandle designates an object which is of an | The current (or saved) filehandle designates an object that is of an | |||
invalid type for the current operation and there is no more specific | invalid type for the current operation, and there is no more specific | |||
error (such as NFS4ERR_ISDIR or NFS4ERR_SYMLINK) that applies. Note | error (such as NFS4ERR_ISDIR or NFS4ERR_SYMLINK) that applies. Note | |||
that in NFSv4.0, such situations generally resulted in the less | that in NFSv4.0, such situations generally resulted in the less- | |||
specific error NFS4ERR_INVAL. | specific error NFS4ERR_INVAL. | |||
15.1.3. Compound Structure Errors | 15.1.3. Compound Structure Errors | |||
This section deals with errors that relate to overall structure of a | This section deals with errors that relate to the overall structure | |||
Compound request (by which we mean to include both COMPOUND and | of a Compound request (by which we mean to include both COMPOUND and | |||
CB_COMPOUND), rather than to particular operations. | CB_COMPOUND), rather than to particular operations. | |||
There are a number of basic constraints on the operations that may | There are a number of basic constraints on the operations that may | |||
appear in a Compound request. Sessions adds to these basic | appear in a Compound request. Sessions add to these basic | |||
constraints by requiring a Sequence operation (either SEQUENCE or | constraints by requiring a Sequence operation (either SEQUENCE or | |||
CB_SEQUENCE) at the start of the Compound. | CB_SEQUENCE) at the start of the Compound. | |||
15.1.3.1. NFS_OK (Error Code 0) | 15.1.3.1. NFS_OK (Error Code 0) | |||
Indicates the operation completed successfully, in that all of the | Indicates the operation completed successfully, in that all of the | |||
constituent operations completed without error. | constituent operations completed without error. | |||
15.1.3.2. NFS4ERR_MINOR_VERS_MISMATCH (Error Code 10021) | 15.1.3.2. NFS4ERR_MINOR_VERS_MISMATCH (Error Code 10021) | |||
skipping to change at page 345, line 19 | skipping to change at page 345, line 36 | |||
Compound does not start with a Sequence operation. This error | Compound does not start with a Sequence operation. This error | |||
results when that constraint is not met. | results when that constraint is not met. | |||
15.1.3.4. NFS4ERR_OP_ILLEGAL (Error Code 10044) | 15.1.3.4. NFS4ERR_OP_ILLEGAL (Error Code 10044) | |||
The operation code is not a valid one for the current Compound | The operation code is not a valid one for the current Compound | |||
procedure. The opcode in the result stream matched with this error | procedure. The opcode in the result stream matched with this error | |||
is the ILLEGAL value, although the value that appears in the request | is the ILLEGAL value, although the value that appears in the request | |||
stream may be different. Where an illegal value appears and the | stream may be different. Where an illegal value appears and the | |||
replier pre-parses all operations for a Compound procedure before | replier pre-parses all operations for a Compound procedure before | |||
doing any operation execution, an RPC-level XDR error may be returned | doing any operation execution, an RPC-level XDR error may be | |||
in this case. | returned. | |||
15.1.3.5. NFS4ERR_OP_NOT_IN_SESSION (Error Code 10071) | 15.1.3.5. NFS4ERR_OP_NOT_IN_SESSION (Error Code 10071) | |||
Most forward operations and all callback operations are only valid | Most forward operations and all callback operations are only valid | |||
within the context of a session, so that the Compound request in | within the context of a session, so that the Compound request in | |||
question MUST begin with a Sequence operation. If an attempt is made | question MUST begin with a Sequence operation. If an attempt is made | |||
to execute these operations outside the context of a session, this | to execute these operations outside the context of a session, this | |||
error results. | error results. | |||
15.1.3.6. NFS4ERR_REP_TOO_BIG (Error Code 10066) | 15.1.3.6. NFS4ERR_REP_TOO_BIG (Error Code 10066) | |||
skipping to change at page 345, line 48 | skipping to change at page 346, line 18 | |||
size for replies cached in the reply cache when the Sequence for the | size for replies cached in the reply cache when the Sequence for the | |||
current request specifies that this request is to be cached. | current request specifies that this request is to be cached. | |||
15.1.3.8. NFS4ERR_REQ_TOO_BIG (Error Code 10065) | 15.1.3.8. NFS4ERR_REQ_TOO_BIG (Error Code 10065) | |||
The Compound request exceeds the channel's negotiated maximum size | The Compound request exceeds the channel's negotiated maximum size | |||
for requests. | for requests. | |||
15.1.3.9. NFS4ERR_RETRY_UNCACHED_REP (Error Code 10068) | 15.1.3.9. NFS4ERR_RETRY_UNCACHED_REP (Error Code 10068) | |||
The requester has attempted a retry of a Compound which it previously | The requester has attempted a retry of a Compound that it previously | |||
requested not be placed in the reply cache. | requested not be placed in the reply cache. | |||
15.1.3.10. NFS4ERR_SEQUENCE_POS (Error Code 10064) | 15.1.3.10. NFS4ERR_SEQUENCE_POS (Error Code 10064) | |||
A Sequence operation appeared in a position other than the first | A Sequence operation appeared in a position other than the first | |||
operation of a Compound request. | operation of a Compound request. | |||
15.1.3.11. NFS4ERR_TOO_MANY_OPS (Error Code 10070) | 15.1.3.11. NFS4ERR_TOO_MANY_OPS (Error Code 10070) | |||
The Compound request has too many operations, exceeding the count | The Compound request has too many operations, exceeding the count | |||
negotiated when the session was created. | negotiated when the session was created. | |||
15.1.3.12. NFS4ERR_UNSAFE_COMPOUND (Error Code 10069) | 15.1.3.12. NFS4ERR_UNSAFE_COMPOUND (Error Code 10069) | |||
The client has sent a COMPOUND request with an unsafe mix of | The client has sent a COMPOUND request with an unsafe mix of | |||
operations, specifically with a non-idempotent operation changing the | operations -- specifically, with a non-idempotent operation that | |||
current filehandle which is not followed by a GETFH. | changes the current filehandle and that is not followed by a GETFH. | |||
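Several of the structural constraints in this section can be expressed as a pre-execution check on the list of operations. The Python sketch below is illustrative only: the function check_compound, the representation of operations as name strings, and the sessionless_ops parameter (the set of operations a server accepts outside a session) are hypothetical, and the order in which a real replier applies these checks is not specified here.

```python
NFS_OK = 0
NFS4ERR_SEQUENCE_POS = 10064
NFS4ERR_TOO_MANY_OPS = 10070
NFS4ERR_OP_NOT_IN_SESSION = 10071

SEQUENCE_OPS = frozenset({"SEQUENCE", "CB_SEQUENCE"})


def check_compound(ops, max_ops, sessionless_ops=frozenset()):
    """Check the overall structure of a Compound request.

    `ops` is a list of operation names; `max_ops` is the operation
    count negotiated for the session.
    """
    # More operations than were negotiated at session creation.
    if len(ops) > max_ops:
        return NFS4ERR_TOO_MANY_OPS
    # A Sequence operation anywhere but the first position.
    if any(op in SEQUENCE_OPS for op in ops[1:]):
        return NFS4ERR_SEQUENCE_POS
    # A leading Sequence operation places the request in a session.
    if ops and ops[0] in SEQUENCE_OPS:
        return NFS_OK
    # Otherwise every operation must be valid outside a session.
    if any(op not in sessionless_ops for op in ops):
        return NFS4ERR_OP_NOT_IN_SESSION
    return NFS_OK
```

For example, check_compound(["PUTFH", "GETATTR"], 8) yields NFS4ERR_OP_NOT_IN_SESSION, while the same list preceded by "SEQUENCE" passes.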
15.1.4. File System Errors | 15.1.4. File System Errors | |||
These errors describe situations which occurred in the underlying | These errors describe situations that occurred in the underlying file | |||
file system implementation rather than in the protocol or any NFSv4.x | system implementation rather than in the protocol or any NFSv4.x | |||
feature. | feature. | |||
15.1.4.1. NFS4ERR_BADTYPE (Error Code 10007) | 15.1.4.1. NFS4ERR_BADTYPE (Error Code 10007) | |||
An attempt was made to create an object with an inappropriate type | An attempt was made to create an object with an inappropriate type | |||
specified to CREATE. This may be because the type is undefined, | specified to CREATE. This may be because the type is undefined, | |||
because it is a type not supported by the server, or because it is a | because the type is not supported by the server, or because the type | |||
type for which create is not intended such as a regular file or named | is not intended to be created by CREATE (such as a regular file or | |||
attribute, for which OPEN is used to do the file creation. | named attribute, for which OPEN is used to do the file creation). | |||
15.1.4.2. NFS4ERR_DQUOT (Error Code 69) | 15.1.4.2. NFS4ERR_DQUOT (Error Code 69) | |||
Resource (quota) hard limit exceeded. The user's resource limit on | Resource (quota) hard limit exceeded. The user's resource limit on | |||
the server has been exceeded. | the server has been exceeded. | |||
15.1.4.3. NFS4ERR_EXIST (Error Code 17) | 15.1.4.3. NFS4ERR_EXIST (Error Code 17) | |||
A file of the specified target name (when creating, renaming or | A file of the specified target name (when creating, renaming, or | |||
linking) already exists. | linking) already exists. | |||
15.1.4.4. NFS4ERR_FBIG (Error Code 27) | 15.1.4.4. NFS4ERR_FBIG (Error Code 27) | |||
File too large. The operation would have caused a file to grow | The file is too large. The operation would have caused the file to | |||
beyond the server's limit. | grow beyond the server's limit. | |||
15.1.4.5. NFS4ERR_FILE_OPEN (Error Code 10046) | 15.1.4.5. NFS4ERR_FILE_OPEN (Error Code 10046) | |||
The operation is not allowed because a file involved in the operation | The operation is not allowed because a file involved in the operation | |||
is currently open. Servers may, but are not required to disallow | is currently open. Servers may, but are not required to, disallow | |||
linking-to, removing, or renaming open files. | linking-to, removing, or renaming open files. | |||
15.1.4.6. NFS4ERR_IO (Error Code 5) | 15.1.4.6. NFS4ERR_IO (Error Code 5) | |||
Indicates that an I/O error occurred for which the file system was | Indicates that an I/O error occurred for which the file system was | |||
unable to provide recovery. | unable to provide recovery. | |||
15.1.4.7. NFS4ERR_MLINK (Error Code 31) | 15.1.4.7. NFS4ERR_MLINK (Error Code 31) | |||
The request would have caused the server's limit for the number of | The request would have caused the server's limit for the number of | |||
hard links a file may have to be exceeded. | hard links a file may have to be exceeded. | |||
15.1.4.8. NFS4ERR_NOENT (Error Code 2) | 15.1.4.8. NFS4ERR_NOENT (Error Code 2) | |||
Indicates no such file or directory. The file or directory name | Indicates no such file or directory. The file or directory name | |||
specified does not exist. | specified does not exist. | |||
15.1.4.9. NFS4ERR_NOSPC (Error Code 28) | 15.1.4.9. NFS4ERR_NOSPC (Error Code 28) | |||
Indicates no space left on device. The operation would have caused | Indicates there is no space left on the device. The operation would | |||
the server's file system to exceed its limit. | have caused the server's file system to exceed its limit. | |||
15.1.4.10. NFS4ERR_NOTEMPTY (Error Code 66) | 15.1.4.10. NFS4ERR_NOTEMPTY (Error Code 66) | |||
An attempt was made to remove a directory that was not empty. | An attempt was made to remove a directory that was not empty. | |||
15.1.4.11. NFS4ERR_ROFS (Error Code 30) | 15.1.4.11. NFS4ERR_ROFS (Error Code 30) | |||
Indicates a read-only file system. A modifying operation was | Indicates a read-only file system. A modifying operation was | |||
attempted on a read-only file system. | attempted on a read-only file system. | |||
15.1.4.12. NFS4ERR_XDEV (Error Code 18) | 15.1.4.12. NFS4ERR_XDEV (Error Code 18) | |||
Indicates an attempt to do an operation, such as linking, that | Indicates an attempt to do an operation, such as linking, that | |||
inappropriately crosses a boundary. This may be due to such | inappropriately crosses a boundary. This may be due to such | |||
boundaries as: | boundaries as: | |||
o That between file systems (where the fsids are different). | o that between file systems (where the fsids are different). | |||
o That between different named attribute directories or between a | o that between different named attribute directories or between a | |||
named attribute directory and an ordinary directory. | named attribute directory and an ordinary directory. | |||
o That between regions of a file system that the file system | o that between byte-ranges of a file system that the file system | |||
implementation treats as separate (for example for space | implementation treats as separate (for example, for space | |||
accounting purposes), and where cross-connection between the | accounting purposes), and where cross-connection between the byte- | |||
regions are not allowed. | ranges are not allowed. | |||
15.1.5. State Management Errors | 15.1.5. State Management Errors | |||
These errors indicate problems with the stateid (or one of the | These errors indicate problems with the stateid (or one of the | |||
stateids) passed to a given operation. This includes situations in | stateids) passed to a given operation. This includes situations in | |||
which the stateid is invalid as well as situations in which the | which the stateid is invalid as well as situations in which the | |||
stateid is valid but designates revoked locking state. Depending on | stateid is valid but designates locking state that has been revoked. | |||
the operation, the stateid when valid may designate opens, byte-range | Depending on the operation, the stateid when valid may designate | |||
locks, file or directory delegations, layouts, or device maps. | opens, byte-range locks, file or directory delegations, layouts, or | |||
device maps. | ||||
15.1.5.1. NFS4ERR_ADMIN_REVOKED (Error Code 10047) | 15.1.5.1. NFS4ERR_ADMIN_REVOKED (Error Code 10047) | |||
A stateid designates locking state of any type that has been revoked | A stateid designates locking state of any type that has been revoked | |||
due to administrative interaction, possibly while the lease is valid. | due to administrative interaction, possibly while the lease is valid. | |||
15.1.5.2. NFS4ERR_BAD_STATEID (Error Code 10025) | 15.1.5.2. NFS4ERR_BAD_STATEID (Error Code 10025) | |||
A stateid does not properly designate any valid state. See | A stateid does not properly designate any valid state. See Sections | |||
Section 8.2.4 and Section 8.2.3 for a discussion of how stateids are | 8.2.4 and 8.2.3 for a discussion of how stateids are validated. | |||
validated. | ||||
15.1.5.3. NFS4ERR_DELEG_REVOKED (Error Code 10087) | 15.1.5.3. NFS4ERR_DELEG_REVOKED (Error Code 10087) | |||
A stateid designates recallable locking state of any type (delegation | A stateid designates recallable locking state of any type (delegation | |||
or layout) that has been revoked due to the failure of the client to | or layout) that has been revoked due to the failure of the client to | |||
return the lock when it was recalled. | return the lock when it was recalled. | |||
15.1.5.4. NFS4ERR_EXPIRED (Error Code 10011) | 15.1.5.4. NFS4ERR_EXPIRED (Error Code 10011) | |||
A stateid designates locking state of any type that has been revoked | A stateid designates locking state of any type that has been revoked | |||
skipping to change at page 349, line 10 | skipping to change at page 349, line 26 | |||
15.1.6. Security Errors | 15.1.6. Security Errors | |||
These are the various permission-related errors in NFSv4.1. | These are the various permission-related errors in NFSv4.1. | |||
15.1.6.1. NFS4ERR_ACCESS (Error Code 13) | 15.1.6.1. NFS4ERR_ACCESS (Error Code 13) | |||
Indicates permission denied. The caller does not have the correct | Indicates permission denied. The caller does not have the correct | |||
permission to perform the requested operation. Contrast this with | permission to perform the requested operation. Contrast this with | |||
NFS4ERR_PERM (Section 15.1.6.2), which restricts itself to owner or | NFS4ERR_PERM (Section 15.1.6.2), which restricts itself to owner or | |||
privileged user permission failures, and NFS4ERR_WRONG_CRED | privileged-user permission failures, and NFS4ERR_WRONG_CRED | |||
(Section 15.1.6.4) which deals with appropriate permission to delete | (Section 15.1.6.4), which deals with appropriate permission to delete | |||
or modify transient objects, based on the credentials of the user | or modify transient objects based on the credentials of the user that | |||
that created them. | created them. | |||
15.1.6.2. NFS4ERR_PERM (Error Code 1) | 15.1.6.2. NFS4ERR_PERM (Error Code 1) | |||
Indicates requester is not the owner. The operation was not allowed | Indicates requester is not the owner. The operation was not allowed | |||
because the caller is neither a privileged user (root) nor the owner | because the caller is neither a privileged user (root) nor the owner | |||
of the target of the operation. | of the target of the operation. | |||
15.1.6.3. NFS4ERR_WRONGSEC (Error Code 10016) | 15.1.6.3. NFS4ERR_WRONGSEC (Error Code 10016) | |||
Indicates that the security mechanism being used by the client for | Indicates that the security mechanism being used by the client for | |||
the operation does not match the server's security policy. The | the operation does not match the server's security policy. The | |||
client should change the security mechanism being used and re-send | client should change the security mechanism being used and re-send | |||
the operation (but not with the same slot ID and sequence ID; one or | the operation (but not with the same slot ID and sequence ID; one or | |||
both MUST be different on the re-send). SECINFO and SECINFO_NO_NAME | both MUST be different on the re-send). SECINFO and SECINFO_NO_NAME | |||
can be used to determine the appropriate mechanism. | can be used to determine the appropriate mechanism. | |||
15.1.6.4. NFS4ERR_WRONG_CRED (Error Code 10082) | 15.1.6.4. NFS4ERR_WRONG_CRED (Error Code 10082) | |||
An operation manipulating state was attempted by a principal that was | An operation that manipulates state was attempted by a principal that | |||
not allowed to modify that piece of state. | was not allowed to modify that piece of state. | |||
15.1.7. Name Errors | 15.1.7. Name Errors | |||
Names in NFSv4 are UTF-8 strings. When the strings are not valid | Names in NFSv4 are UTF-8 strings. When the strings are not valid | |||
UTF-8 or are of length zero, the error NFS4ERR_INVAL results. | UTF-8 or are of length zero, the error NFS4ERR_INVAL results. | |||
Besides this, there are a number of other errors to indicate specific | Besides this, there are a number of other errors to indicate specific | |||
problems with names. | problems with names. | |||
15.1.7.1. NFS4ERR_BADCHAR (Error Code 10040) | 15.1.7.1. NFS4ERR_BADCHAR (Error Code 10040) | |||
A UTF-8 string contains a character which is not supported by the | A UTF-8 string contains a character that is not supported by the | |||
server in the context in which it is being used. | server in the context in which it is being used. | |||
15.1.7.2. NFS4ERR_BADNAME (Error Code 10041) | 15.1.7.2. NFS4ERR_BADNAME (Error Code 10041) | |||
A name string in a request consisted of valid UTF-8 characters | A name string in a request consisted of valid UTF-8 characters | |||
supported by the server but the name is not supported by the server | supported by the server, but the name is not supported by the server | |||
as a valid name for current operation. An example might be creating | as a valid name for the current operation. An example might be | |||
a file or directory named ".." on a server whose file system uses | creating a file or directory named ".." on a server whose file system | |||
that name for links to parent directories. | uses that name for links to parent directories. | |||
15.1.7.3. NFS4ERR_NAMETOOLONG (Error Code 63) | 15.1.7.3. NFS4ERR_NAMETOOLONG (Error Code 63) | |||
Returned when the filename in an operation exceeds the server's | Returned when the filename in an operation exceeds the server's | |||
implementation limit. | implementation limit. | |||
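The distinctions among the name errors can be summarized in a short validation routine. The following Python sketch is an illustration under stated assumptions: the function check_name is hypothetical, the 255-byte limit and the restriction to Unicode plane 0 are example server policies rather than protocol requirements, and the order of the checks is a choice made for this sketch.

```python
NFS_OK = 0
NFS4ERR_INVAL = 22
NFS4ERR_NAMETOOLONG = 63
NFS4ERR_BADCHAR = 10040
NFS4ERR_BADNAME = 10041


def check_name(raw, max_bytes=255, plane0_only=True):
    """Classify a client-supplied file name given as bytes."""
    # Zero-length names and invalid UTF-8 both yield NFS4ERR_INVAL.
    if len(raw) == 0:
        return NFS4ERR_INVAL
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return NFS4ERR_INVAL
    # Name exceeds the (assumed) implementation limit.
    if len(raw) > max_bytes:
        return NFS4ERR_NAMETOOLONG
    # Valid UTF-8, but a character this server does not support.
    if plane0_only and any(ord(ch) > 0xFFFF for ch in text):
        return NFS4ERR_BADCHAR
    # All characters supported, but the name itself is not allowed.
    if text in (".", ".."):
        return NFS4ERR_BADNAME
    return NFS_OK
```

A server that imposes a normalization constraint could add a further NFS4ERR_BADNAME check at the last step; that case is omitted here.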
15.1.8. Locking Errors | 15.1.8. Locking Errors | |||
This section deal with errors related to locking, both as to share | This section deals with errors related to locking, both as to share | |||
reservations and byte-range locking. It does not deal with errors | reservations and byte-range locking. It does not deal with errors | |||
specific to the process of reclaiming locks. Those are dealt with in | specific to the process of reclaiming locks. Those are dealt with in | |||
the next section. | Section 15.1.9. | |||
15.1.8.1. NFS4ERR_BAD_RANGE (Error Code 10042) | 15.1.8.1. NFS4ERR_BAD_RANGE (Error Code 10042) | |||
The range for a LOCK, LOCKT, or LOCKU operation is not appropriate to | The byte-range of a LOCK, LOCKT, or LOCKU operation is not allowed by | |||
the allowable range of offsets for the server. E.g., this error | the server. For example, this error results when a server that only | |||
results when a server which only supports 32-bit ranges receives a | supports 32-bit ranges receives a range that cannot be handled by | |||
range that cannot be handled by that server. (See Section 18.10.3). | that server. (See Section 18.10.3.) | |||
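For a server limited to 32-bit offsets, the range check might be sketched as follows. The function check_lock_range_32bit is hypothetical. Treating a length of all ones as "lock to end of file" and accepting it is an assumption taken from the description of LOCK rather than from this section, and zero-length ranges are assumed to be rejected elsewhere.

```python
NFS_OK = 0
NFS4ERR_BAD_RANGE = 10042

NFS4_UINT32_MAX = 0xFFFFFFFF
NFS4_UINT64_MAX = 0xFFFFFFFFFFFFFFFF


def check_lock_range_32bit(offset, length):
    """Check a byte-range on a server that supports 32-bit offsets only."""
    # A length of all ones is taken to mean "through end of file".
    if length == NFS4_UINT64_MAX:
        return NFS_OK
    # Offset of the last byte covered by the range.
    last_byte = offset + length - 1
    if last_byte > NFS4_UINT32_MAX:
        return NFS4ERR_BAD_RANGE
    return NFS_OK
```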
15.1.8.2. NFS4ERR_DEADLOCK (Error Code 10045) | 15.1.8.2. NFS4ERR_DEADLOCK (Error Code 10045) | |||
The server has been able to determine a file locking deadlock | The server has been able to determine a byte-range locking deadlock | |||
condition for a blocking lock request. | condition for a READW_LT or WRITEW_LT LOCK operation. | |||
15.1.8.3. NFS4ERR_DENIED (Error Code 10010) | 15.1.8.3. NFS4ERR_DENIED (Error Code 10010) | |||
An attempt to lock a file is denied. Since this may be a temporary | An attempt to lock a file is denied. Since this may be a temporary | |||
condition, the client is encouraged to re-send the lock request (but | condition, the client is encouraged to re-send the lock request (but | |||
not with the same slot ID and sequence ID; one or both MUST be | not with the same slot ID and sequence ID; one or both MUST be | |||
different on the re-send) until the lock is accepted. See | different on the re-send) until the lock is accepted. See | |||
Section 9.6 for a discussion of the re-send. | Section 9.6 for a discussion of the re-send. | |||
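The re-send rule for NFS4ERR_DENIED can be illustrated with a minimal client-side sketch. The send_lock callable is a hypothetical transport hook, not part of the protocol or of any real library; it is assumed to return an nfsstat4 value. The sketch keeps the same slot and advances the sequence ID on each attempt, so no (slot ID, sequence ID) pair is reused. A real client would also wait between attempts, which is omitted here.

```python
NFS4_OK = 0
NFS4ERR_DENIED = 10010


def lock_with_resend(send_lock, slot_id, first_seq_id, max_attempts=5):
    """Re-send a LOCK request until it is no longer denied.

    send_lock(slot_id, seq_id) is a hypothetical hook that sends one
    request and returns its status. Returns the final status and the
    list of (slot ID, sequence ID) pairs that were used.
    """
    used = []
    status = NFS4ERR_DENIED
    for attempt in range(max_attempts):
        # Same slot, next sequence ID: the pair differs on every re-send.
        seq_id = first_seq_id + attempt
        used.append((slot_id, seq_id))
        status = send_lock(slot_id, seq_id)
        if status != NFS4ERR_DENIED:
            break
    return status, used
```

Using a different slot for the re-send would satisfy the rule equally well; advancing the sequence ID on one slot is simply the shorter sketch.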
15.1.8.4. NFS4ERR_LOCKED (Error Code 10012) | 15.1.8.4. NFS4ERR_LOCKED (Error Code 10012) | |||
A read or write operation was attempted on a file where there was a | A READ or WRITE operation was attempted on a file where there was a | |||
conflict between the I/O and an existing lock: | conflict between the I/O and an existing lock: | |||
o There is a share reservation inconsistent with the I/O being done. | o There is a share reservation inconsistent with the I/O being done. | |||
o The range to be read or written intersects an existing mandatory | o The range to be read or written intersects an existing mandatory | |||
byte range lock. | byte-range lock. | |||
15.1.8.5. NFS4ERR_LOCKS_HELD (Error Code 10037) | 15.1.8.5. NFS4ERR_LOCKS_HELD (Error Code 10037) | |||
An operation was prevented by the unexpected presence of locks. | An operation was prevented by the unexpected presence of locks. | |||
15.1.8.6. NFS4ERR_LOCK_NOTSUPP (Error Code 10043) | 15.1.8.6. NFS4ERR_LOCK_NOTSUPP (Error Code 10043) | |||
A locking request was attempted which would require the upgrade or | A LOCK operation was attempted that would require the upgrade or | |||
downgrade of a lock range already held by the owner when the server | downgrade of a byte-range lock range already held by the owner, and | |||
does not support atomic upgrade or downgrade of locks. | the server does not support atomic upgrade or downgrade of locks. | |||
15.1.8.7. NFS4ERR_LOCK_RANGE (Error Code 10028) | 15.1.8.7. NFS4ERR_LOCK_RANGE (Error Code 10028) | |||
A lock request is operating on a range that overlaps in part a | A LOCK operation is operating on a range that overlaps in part a | |||
currently held lock for the current lock-owner and does not precisely | currently held byte-range lock for the current lock-owner and does | |||
match a single such lock where the server does not support this type | not precisely match a single such byte-range lock where the server | |||
of request, and thus does not implement POSIX locking semantics [24]. | does not support this type of request, and thus does not implement | |||
See Section 18.10.4, Section 18.11.4, and Section 18.12.4 for a | POSIX locking semantics [24]. See Sections 18.10.4, 18.11.4, and | |||
discussion of how this applies to LOCK, LOCKT, and LOCKU | 18.12.4 for a discussion of how this applies to LOCK, LOCKT, and | |||
respectively. | LOCKU respectively. | |||
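The condition described for NFS4ERR_LOCK_RANGE can be sketched as a range comparison on a server that does not implement POSIX range semantics. This is an illustration only: the function names are hypothetical, ranges are modeled as (offset, length) tuples with a finite, non-zero length, and the list holds only locks held by the requesting lock-owner.

```python
NFS4_OK = 0
NFS4ERR_LOCK_RANGE = 10028


def ranges_overlap(a, b):
    """Return True if two (offset, length) byte ranges share a byte."""
    a_off, a_len = a
    b_off, b_len = b
    return a_off < b_off + b_len and b_off < a_off + a_len


def check_lock_range(requested, held_by_owner):
    """Status a non-POSIX server might return for a requested range.

    An exact match with a single held lock is acceptable, as is a
    range that touches no held lock. A partial overlap is rejected.
    """
    if requested in held_by_owner:
        return NFS4_OK
    if any(ranges_overlap(requested, held) for held in held_by_owner):
        return NFS4ERR_LOCK_RANGE
    return NFS4_OK
```

A server that does implement POSIX locking semantics would instead split or merge the held ranges and never return this error.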
15.1.8.8. NFS4ERR_OPENMODE (Error Code 10038) | 15.1.8.8. NFS4ERR_OPENMODE (Error Code 10038) | |||
The client attempted a READ, WRITE, LOCK or other operation not | The client attempted a READ, WRITE, LOCK, or other operation not | |||
sanctioned by the stateid passed (e.g. writing to a file opened only | sanctioned by the stateid passed (e.g., writing to a file opened for | |||
for read). | read-only access). | |||
15.1.8.9. NFS4ERR_SHARE_DENIED (Error Code 10015) | 15.1.8.9. NFS4ERR_SHARE_DENIED (Error Code 10015) | |||
An attempt to OPEN a file with a share reservation has failed because | An attempt to OPEN a file with a share reservation has failed because | |||
of a share conflict. | of a share conflict. | |||
15.1.9. Reclaim Errors | 15.1.9. Reclaim Errors | |||
These errors relate to the process of reclaiming locks after a server | These errors relate to the process of reclaiming locks after a server | |||
restart. | restart. | |||
15.1.9.1. NFS4ERR_COMPLETE_ALREADY (Error Code 10054) | 15.1.9.1. NFS4ERR_COMPLETE_ALREADY (Error Code 10054) | |||
The client previously sent a successful RECLAIM_COMPLETE operation. | The client previously sent a successful RECLAIM_COMPLETE operation. | |||
An additional RECLAIM_COMPLETE operation is not necessary and results | An additional RECLAIM_COMPLETE operation is not necessary and results | |||
in this error. | in this error. | |||
15.1.9.2. NFS4ERR_GRACE (Error Code 10013) | 15.1.9.2. NFS4ERR_GRACE (Error Code 10013) | |||
The server is in its recovery or grace period which should at least | The server was in its recovery or grace period. The locking request | |||
match the lease period of the server. A locking request other than a | was not a reclaim request and so could not be granted during that | |||
reclaim could not be granted during that period. | period. | |||
15.1.9.3. NFS4ERR_NO_GRACE (Error Code 10033) | 15.1.9.3. NFS4ERR_NO_GRACE (Error Code 10033) | |||
A reclaim of client state was attempted in circumstances in which the | A reclaim of client state was attempted in circumstances in which the | |||
server cannot guarantee that conflicting state has not been provided | server cannot guarantee that conflicting state has not been provided | |||
to another client. This can occur because the reclaim has been done | to another client. This can occur because the reclaim has been done | |||
outside of the grace period of the server, after the client has done | outside of the grace period of the server, after the client has done | |||
a RECLAIM_COMPLETE operation, or because previous operations have | a RECLAIM_COMPLETE operation, or because previous operations have | |||
created a situation in which the server is not able to determine that | created a situation in which the server is not able to determine that | |||
a reclaim-interfering edge condition does not exist. | a reclaim-interfering edge condition does not exist. | |||
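The relationship between NFS4ERR_GRACE and NFS4ERR_NO_GRACE can be summarized in a small decision sketch. The function and its boolean parameters are hypothetical names chosen for illustration. NFS4_OK here means only that neither grace-related error applies, not that the lock is granted.

```python
NFS4_OK = 0
NFS4ERR_GRACE = 10013
NFS4ERR_NO_GRACE = 10033


def grace_check(is_reclaim, in_grace_period,
                reclaim_complete_done=False, reclaim_unsafe=False):
    """Pick the grace-related status for a locking request.

    reclaim_unsafe stands for the case in which earlier operations
    left the server unable to rule out a reclaim-interfering edge
    condition (an assumption of this sketch, modeled as a flag).
    """
    if is_reclaim:
        if not in_grace_period or reclaim_complete_done or reclaim_unsafe:
            return NFS4ERR_NO_GRACE
        return NFS4_OK
    if in_grace_period:
        # Non-reclaim locking requests cannot be granted during grace.
        return NFS4ERR_GRACE
    return NFS4_OK
```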
skipping to change at page 352, line 45 | skipping to change at page 353, line 13 | |||
lock with which this client conflicted. See also Section 15.1.9.4 | lock with which this client conflicted. See also Section 15.1.9.4 | |||
for the related error, NFS4ERR_RECLAIM_BAD. | for the related error, NFS4ERR_RECLAIM_BAD. | |||
15.1.10. pNFS Errors | 15.1.10. pNFS Errors | |||
This section deals with pNFS-related errors including those that are | This section deals with pNFS-related errors including those that are | |||
associated with using NFSv4.1 to communicate with a data server. | associated with using NFSv4.1 to communicate with a data server. | |||
15.1.10.1. NFS4ERR_BADIOMODE (Error Code 10049) | 15.1.10.1. NFS4ERR_BADIOMODE (Error Code 10049) | |||
An invalid or inappropriate layout iomode was specified. | An invalid or inappropriate layout iomode was specified. As an | |||
example of an inappropriate layout iomode, suppose a client's LAYOUTGET | ||||
operation specified an iomode of LAYOUTIOMODE4_RW, and the server is | ||||
neither able nor willing to let the client send write requests to | ||||
data servers; the server can reply with NFS4ERR_BADIOMODE. The | ||||
client would then send another LAYOUTGET with an iomode of | ||||
LAYOUTIOMODE4_READ. | ||||
15.1.10.2. NFS4ERR_BADLAYOUT (Error Code 10050) | 15.1.10.2. NFS4ERR_BADLAYOUT (Error Code 10050) | |||
The layout specified is invalid in some way. For LAYOUTCOMMIT, this | The layout specified is invalid in some way. For LAYOUTCOMMIT, this | |||
indicates that the specified layout is not held by the client or is | indicates that the specified layout is not held by the client or is | |||
not of mode LAYOUTIOMODE4_RW. For LAYOUTGET, it indicates that a | not of mode LAYOUTIOMODE4_RW. For LAYOUTGET, it indicates that a | |||
layout matching the client's specification as to minimum length | layout matching the client's specification as to minimum length | |||
cannot be granted. | cannot be granted. | |||
15.1.10.3. NFS4ERR_LAYOUTTRYLATER (Error Code 10058) | 15.1.10.3. NFS4ERR_LAYOUTTRYLATER (Error Code 10058) | |||
skipping to change at page 353, line 43 | skipping to change at page 354, line 20 | |||
not allow a WRITE. | not allow a WRITE. | |||
15.1.10.8. NFS4ERR_RETURNCONFLICT (Error Code 10086) | 15.1.10.8. NFS4ERR_RETURNCONFLICT (Error Code 10086) | |||
A layout is unavailable due to an attempt to perform the LAYOUTGET | A layout is unavailable due to an attempt to perform the LAYOUTGET | |||
before a pending LAYOUTRETURN on the file has been received. See | before a pending LAYOUTRETURN on the file has been received. See | |||
Section 12.5.5.2.1.3. | Section 12.5.5.2.1.3. | |||
15.1.10.9. NFS4ERR_UNKNOWN_LAYOUTTYPE (Error Code 10062) | 15.1.10.9. NFS4ERR_UNKNOWN_LAYOUTTYPE (Error Code 10062) | |||
The client has specified a layout type which is not supported by the | The client has specified a layout type that is not supported by the | |||
server. | server. | |||
15.1.11. Session Use Errors | 15.1.11. Session Use Errors | |||
This section deals with errors encountered in using sessions, that | This section deals with errors encountered when using sessions, that | |||
is, in sending requests over sessions using Sequence (i.e. either | is, errors encountered when a request uses a Sequence (i.e., either | |||
SEQUENCE or CB_SEQUENCE) operations. | SEQUENCE or CB_SEQUENCE) operation. | |||
15.1.11.1. NFS4ERR_BADSESSION (Error Code 10052) | 15.1.11.1. NFS4ERR_BADSESSION (Error Code 10052) | |||
The specified session ID is unknown to the server to which the | The specified session ID is unknown to the server to which the | |||
operation is addressed. | operation is addressed. | |||
15.1.11.2. NFS4ERR_BADSLOT (Error Code 10053) | 15.1.11.2. NFS4ERR_BADSLOT (Error Code 10053) | |||
The requester sent a Sequence operation that attempted to use a slot | The requester sent a Sequence operation that attempted to use a slot | |||
the replier does not have in its slot table. It is possible the slot | the replier does not have in its slot table. It is possible the slot | |||
skipping to change at page 354, line 30 | skipping to change at page 355, line 7 | |||
15.1.11.4. NFS4ERR_CB_PATH_DOWN (Error Code 10048) | 15.1.11.4. NFS4ERR_CB_PATH_DOWN (Error Code 10048) | |||
There is a problem contacting the client via the callback path. The | There is a problem contacting the client via the callback path. The | |||
function of this error has been mostly superseded by the use of | function of this error has been mostly superseded by the use of | |||
status flags in the reply to the SEQUENCE operation (see | status flags in the reply to the SEQUENCE operation (see | |||
Section 18.46). | Section 18.46). | |||
15.1.11.5. NFS4ERR_DEADSESSION (Error Code 10078) | 15.1.11.5. NFS4ERR_DEADSESSION (Error Code 10078) | |||
The specified session is a persistent session which is dead and does | The specified session is a persistent session that is dead and does | |||
not accept new requests or perform new operations on existing | not accept new requests or perform new operations on existing | |||
requests (in the case in which a request was partially executed | requests (in the case in which a request was partially executed | |||
before server restart). | before server restart). | |||
15.1.11.6. NFS4ERR_CONN_NOT_BOUND_TO_SESSION (Error Code 10055) | 15.1.11.6. NFS4ERR_CONN_NOT_BOUND_TO_SESSION (Error Code 10055) | |||
A Sequence operation was sent on a connection that has not been | A Sequence operation was sent on a connection that has not been | |||
associated with the specified session, where the client specified | associated with the specified session, where the client specified | |||
that connection association was to be enforced with SP4_MACH_CRED or | that connection association was to be enforced with SP4_MACH_CRED or | |||
SP4_SSV state protection. | SP4_SSV state protection. | |||
skipping to change at page 355, line 21 | skipping to change at page 355, line 46 | |||
An attempt was made to destroy a session when the session cannot be | An attempt was made to destroy a session when the session cannot be | |||
destroyed because the server has callback requests outstanding. | destroyed because the server has callback requests outstanding. | |||
15.1.12.2. NFS4ERR_BAD_SESSION_DIGEST (Error Code 10051) | 15.1.12.2. NFS4ERR_BAD_SESSION_DIGEST (Error Code 10051) | |||
The digest used in a SET_SSV request is not valid. | The digest used in a SET_SSV request is not valid. | |||
15.1.13. Client Management Errors | 15.1.13. Client Management Errors | |||
This sections deals with errors associated with requests used to | This section deals with errors associated with requests used to | |||
create and manage client IDs. | create and manage client IDs. | |||
15.1.13.1. NFS4ERR_CLIENTID_BUSY (Error Code 10074) | 15.1.13.1. NFS4ERR_CLIENTID_BUSY (Error Code 10074) | |||
The DESTROY_CLIENTID operation has found there are sessions and/or | The DESTROY_CLIENTID operation has found there are sessions and/or | |||
unexpired state associated with the client ID to be destroyed. | unexpired state associated with the client ID to be destroyed. | |||
15.1.13.2. NFS4ERR_CLID_INUSE (Error Code 10017) | 15.1.13.2. NFS4ERR_CLID_INUSE (Error Code 10017) | |||
While processing an EXCHANGE_ID operation, the server was presented | While processing an EXCHANGE_ID operation, the server was presented | |||
with a co_ownerid field matches an existing client with valid leased | with a co_ownerid field that matches an existing client with valid | |||
state but the principal sending the EXCHANGE_ID operation is | leased state, but the principal sending the EXCHANGE_ID operation | |||
different than that establishing the existing client. This indicates | differs from the principal that established the existing client. | |||
a (most likely due to chance) collision between clients. The client | This indicates a collision (most likely due to chance) between | |||
should recover by changing the co_ownerid and re-sending EXCHANGE_ID | clients. The client should recover by changing the co_ownerid and | |||
(but not with the same slot ID and sequence ID; one or both MUST be | re-sending EXCHANGE_ID (but not with the same slot ID and sequence | |||
different on the re-send). | ID; one or both MUST be different on the re-send). | |||
15.1.13.3. NFS4ERR_ENCR_ALG_UNSUPP (Error Code 10079) | 15.1.13.3. NFS4ERR_ENCR_ALG_UNSUPP (Error Code 10079) | |||
An EXCHANGE_ID was sent which specified state protection via SSV, and | An EXCHANGE_ID was sent that specified state protection via SSV, and | |||
where the set of encryption algorithms presented by the client did | where the set of encryption algorithms presented by the client did | |||
not include any supported by the server. | not include any supported by the server. | |||
15.1.13.4. NFS4ERR_HASH_ALG_UNSUPP (Error Code 10072) | 15.1.13.4. NFS4ERR_HASH_ALG_UNSUPP (Error Code 10072) | |||
An EXCHANGE_ID was sent which specified state protection via SSV, and | An EXCHANGE_ID was sent that specified state protection via SSV, and | |||
where the set of hashing algorithms presented by the client did not | where the set of hashing algorithms presented by the client did not | |||
include any supported by the server. | include any supported by the server. | |||
15.1.13.5. NFS4ERR_STALE_CLIENTID (Error Code 10022) | 15.1.13.5. NFS4ERR_STALE_CLIENTID (Error Code 10022) | |||
A client ID not recognized by the server was passed to an operation. | A client ID not recognized by the server was passed to an operation. | |||
Note that unlike the case of NFSv4.0, client IDs are not passed | Note that unlike the case of NFSv4.0, client IDs are not passed | |||
explicitly to the server in ordinary locking operations and cannot | explicitly to the server in ordinary locking operations and cannot | |||
result in this error. Instead, when there is a server restart, it is | result in this error. Instead, when there is a server restart, it is | |||
first manifested through an error on the associated session and the | first manifested through an error on the associated session, and the | |||
staleness of the client ID is detected when trying to associate a | staleness of the client ID is detected when trying to associate a | |||
client ID with a new session. | client ID with a new session. | |||
15.1.14. Delegation Errors | 15.1.14. Delegation Errors | |||
This section deals with errors associated with requesting and | This section deals with errors associated with requesting and | |||
returning delegations. | returning delegations. | |||
15.1.14.1. NFS4ERR_DELEG_ALREADY_WANTED (Error Code 10056) | 15.1.14.1. NFS4ERR_DELEG_ALREADY_WANTED (Error Code 10056) | |||
The client has requested a delegation when it had already registered | The client has requested a delegation when it had already registered | |||
that it wants that same delegation. | that it wants that same delegation. | |||
15.1.14.2. NFS4ERR_DIRDELEG_UNAVAIL (Error Code 10084) | 15.1.14.2. NFS4ERR_DIRDELEG_UNAVAIL (Error Code 10084) | |||
This error is returned when the server is unable or unwilling to | This error is returned when the server is unable or unwilling to | |||
provide a requested directory delegation. | provide a requested directory delegation. | |||
15.1.14.3. NFS4ERR_RECALLCONFLICT (Error Code 10061) | 15.1.14.3. NFS4ERR_RECALLCONFLICT (Error Code 10061) | |||
A recallable object (i.e. a layout or delegation) is unavailable due | A recallable object (i.e., a layout or delegation) is unavailable due | |||
to a conflicting recall operation for that object that is currently | to a conflicting recall operation that is currently in progress for | |||
in progress. | that object. | |||
15.1.14.4. NFS4ERR_REJECT_DELEG (Error Code 10085) | 15.1.14.4. NFS4ERR_REJECT_DELEG (Error Code 10085) | |||
The callback operation invoked to deal with a new delegation has | The callback operation invoked to deal with a new delegation has | |||
rejected it. | rejected it. | |||
15.1.15. Attribute Handling Errors | 15.1.15. Attribute Handling Errors | |||
This section deals with errors specific to attribute handling within | This section deals with errors specific to attribute handling within | |||
NFSv4. | NFSv4. | |||
15.1.15.1. NFS4ERR_ATTRNOTSUPP (Error Code 10032) | 15.1.15.1. NFS4ERR_ATTRNOTSUPP (Error Code 10032) | |||
An attribute specified is not supported by the server. This error | An attribute specified is not supported by the server. This error | |||
MUST NOT be returned by the GETATTR operation. | MUST NOT be returned by the GETATTR operation. | |||
15.1.15.2. NFS4ERR_BADOWNER (Error Code 10039) | 15.1.15.2. NFS4ERR_BADOWNER (Error Code 10039) | |||
Returned when an owner or owner_group attribute value or the who | This error is returned when an owner or owner_group attribute value | |||
field of an ace within an ACL attribute value cannot be translated to | or the who field of an ACE within an ACL attribute value cannot be | |||
a local representation. | translated to a local representation. | |||
15.1.15.3. NFS4ERR_NOT_SAME (Error Code 10027) | 15.1.15.3. NFS4ERR_NOT_SAME (Error Code 10027) | |||
This error is returned by the VERIFY operation to signify that the | This error is returned by the VERIFY operation to signify that the | |||
attributes compared were not the same as those provided in the | attributes compared were not the same as those provided in the | |||
client's request. | client's request. | |||
15.1.15.4. NFS4ERR_SAME (Error Code 10009) | 15.1.15.4. NFS4ERR_SAME (Error Code 10009) | |||
This error is returned by the NVERIFY operation to signify that the | This error is returned by the NVERIFY operation to signify that the | |||
skipping to change at page 357, line 34 | skipping to change at page 358, line 8 | |||
These errors MUST NOT be generated by any NFSv4.1 operation. This | These errors MUST NOT be generated by any NFSv4.1 operation. This | |||
can be for a number of reasons. | can be for a number of reasons. | |||
o The function provided by the error has been superseded by one of | o The function provided by the error has been superseded by one of | |||
the status bits returned by the SEQUENCE operation. | the status bits returned by the SEQUENCE operation. | |||
o The new session structure and associated change in locking have | o The new session structure and associated change in locking have | |||
made the error unnecessary. | made the error unnecessary. | |||
o There has been a restructuring of some errors for NFSv4.1 which | o There has been a restructuring of some errors for NFSv4.1 that | |||
resulted in the elimination of certain of the errors. | resulted in the elimination of certain errors. | |||
15.1.16.1. NFS4ERR_BAD_SEQID (Error Code 10026) | 15.1.16.1. NFS4ERR_BAD_SEQID (Error Code 10026) | |||
The sequence number (seqid) in a locking request is neither the next | The sequence number (seqid) in a locking request is neither the next | |||
expected number nor the last number processed. These seqids are | expected number nor the last number processed. These seqids are | |||
ignored in NFSv4.1. | ignored in NFSv4.1. | |||
15.1.16.2. NFS4ERR_LEASE_MOVED (Error Code 10031) | 15.1.16.2. NFS4ERR_LEASE_MOVED (Error Code 10031) | |||
A lease being renewed is associated with a file system that has been | A lease being renewed is associated with a file system that has been | |||
migrated to a new server. The error has been superseded by the | migrated to a new server. The error has been superseded by the | |||
SEQ4_STATUS_LEASE_MOVED status bit (see Section 18.46). | SEQ4_STATUS_LEASE_MOVED status bit (see Section 18.46). | |||
15.1.16.3. NFS4ERR_NXIO (Error Code 5) | 15.1.16.3. NFS4ERR_NXIO (Error Code 5) | |||
I/O error. No such device or address. This error is for errors | I/O error. No such device or address. This error is for errors | |||
involving block and character device access, but NFSv4.1 is not a | involving block and character device access, but because NFSv4.1 is | |||
device access protocol. | not a device-access protocol, this error is not applicable. | |||
15.1.16.4. NFS4ERR_RESTOREFH (Error Code 10030) | 15.1.16.4. NFS4ERR_RESTOREFH (Error Code 10030) | |||
The RESTOREFH operation does not have a saved filehandle (identified | The RESTOREFH operation does not have a saved filehandle (identified | |||
by SAVEFH) to operate upon. In NFSv4.1, this error has been | by SAVEFH) to operate upon. In NFSv4.1, this error has been | |||
superseded by NFS4ERR_NOFILEHANDLE. | superseded by NFS4ERR_NOFILEHANDLE. | |||
15.1.16.5. NFS4ERR_STALE_STATEID (Error Code 10023) | 15.1.16.5. NFS4ERR_STALE_STATEID (Error Code 10023) | |||
A stateid generated by an earlier server instance was used. This | A stateid generated by an earlier server instance was used. This | |||
error is moot in NFSv4.1 because all operations that take a stateid | error is moot in NFSv4.1 because all operations that take a stateid | |||
MUST be preceded by the SEQUENCE operation, and the earlier server | MUST be preceded by the SEQUENCE operation, and the earlier server | |||
instance is detected by the session infrastructure that supports | instance is detected by the session infrastructure that supports | |||
SEQUENCE. | SEQUENCE. | |||
15.2. Operations and their valid errors | 15.2. Operations and Their Valid Errors | |||
This section contains a table which gives the valid error returns for | This section contains a table that gives the valid error returns for | |||
each protocol operation. The error code NFS4_OK (indicating no | each protocol operation. The error code NFS4_OK (indicating no | |||
error) is not listed but should be understood to be returnable by all | error) is not listed but should be understood to be returnable by all | |||
operations with two important exceptions: | operations with two important exceptions: | |||
o The operations which MUST NOT be implemented: OPEN_CONFIRM, | o The operations that MUST NOT be implemented: OPEN_CONFIRM, | |||
RELEASE_LOCKOWNER, RENEW, SETCLIENTID, and SETCLIENTID_CONFIRM. | RELEASE_LOCKOWNER, RENEW, SETCLIENTID, and SETCLIENTID_CONFIRM. | |||
o The invalid operation: ILLEGAL. | o The invalid operation: ILLEGAL. | |||
Valid error returns for each protocol operation | Valid Error Returns for Each Protocol Operation | |||
+----------------------+--------------------------------------------+ | +----------------------+--------------------------------------------+ | |||
| Operation | Errors | | | Operation | Errors | | |||
+----------------------+--------------------------------------------+ | +----------------------+--------------------------------------------+ | |||
| ACCESS | NFS4ERR_ACCESS, NFS4ERR_BADXDR, | | | ACCESS | NFS4ERR_ACCESS, NFS4ERR_BADXDR, | | |||
| | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | | | | NFS4ERR_DEADSESSION, NFS4ERR_DELAY, | | |||
| | NFS4ERR_FHEXPIRED, NFS4ERR_INVAL, | | | | NFS4ERR_FHEXPIRED, NFS4ERR_INVAL, | | |||
| | NFS4ERR_IO, NFS4ERR_MOVED, | | | | NFS4ERR_IO, NFS4ERR_MOVED, | | |||
| | NFS4ERR_NOFILEHANDLE, | | | | NFS4ERR_NOFILEHANDLE, | | |||
| | NFS4ERR_OP_NOT_IN_SESSION, | | | | NFS4ERR_OP_NOT_IN_SESSION, | | |||
skipping to change at page 375, line 28 | skipping to change at page 375, line 45 | |||
| | NFS4ERR_REP_TOO_BIG_TO_CACHE, | | | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | | |||
| | NFS4ERR_REQ_TOO_BIG, | | | | NFS4ERR_REQ_TOO_BIG, | | |||
| | NFS4ERR_RETRY_UNCACHED_REP, NFS4ERR_ROFS, | | | | NFS4ERR_RETRY_UNCACHED_REP, NFS4ERR_ROFS, | | |||
| | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | | | | NFS4ERR_SERVERFAULT, NFS4ERR_STALE, | | |||
| | NFS4ERR_SYMLINK, NFS4ERR_TOO_MANY_OPS, | | | | NFS4ERR_SYMLINK, NFS4ERR_TOO_MANY_OPS, | | |||
| | NFS4ERR_WRONG_TYPE | | | | NFS4ERR_WRONG_TYPE | | |||
+----------------------+--------------------------------------------+ | +----------------------+--------------------------------------------+ | |||
Table 6 | Table 6 | |||
15.3. Callback operations and their valid errors | 15.3. Callback Operations and Their Valid Errors | |||
This section contains a table which gives the valid error returns for | This section contains a table that gives the valid error returns for | |||
each callback operation. The error code NFS4_OK (indicating no | each callback operation. The error code NFS4_OK (indicating no | |||
error) is not listed but should be understood to be returnable by all | error) is not listed but should be understood to be returnable by all | |||
callback operations with the exception of CB_ILLEGAL. | callback operations with the exception of CB_ILLEGAL. | |||
Valid error returns for each protocol callback operation | Valid Error Returns for Each Protocol Callback Operation | |||
+-------------------------+-----------------------------------------+ | +-------------------------+-----------------------------------------+ | |||
| Callback Operation | Errors | | | Callback Operation | Errors | | |||
+-------------------------+-----------------------------------------+ | +-------------------------+-----------------------------------------+ | |||
| CB_GETATTR | NFS4ERR_BADHANDLE, NFS4ERR_BADXDR, | | | CB_GETATTR | NFS4ERR_BADHANDLE, NFS4ERR_BADXDR, | | |||
| | NFS4ERR_DELAY, NFS4ERR_INVAL, | | | | NFS4ERR_DELAY, NFS4ERR_INVAL, | | |||
| | NFS4ERR_OP_NOT_IN_SESSION, | | | | NFS4ERR_OP_NOT_IN_SESSION, | | |||
| | NFS4ERR_REP_TOO_BIG, | | | | NFS4ERR_REP_TOO_BIG, | | |||
| | NFS4ERR_REP_TOO_BIG_TO_CACHE, | | | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | | |||
| | NFS4ERR_REQ_TOO_BIG, | | | | NFS4ERR_REQ_TOO_BIG, | | |||
skipping to change at page 378, line 36 | skipping to change at page 378, line 36 | |||
| | NFS4ERR_REP_TOO_BIG, | | | | NFS4ERR_REP_TOO_BIG, | | |||
| | NFS4ERR_REP_TOO_BIG_TO_CACHE, | | | | NFS4ERR_REP_TOO_BIG_TO_CACHE, | | |||
| | NFS4ERR_REQ_TOO_BIG, | | | | NFS4ERR_REQ_TOO_BIG, | | |||
| | NFS4ERR_RETRY_UNCACHED_REP, | | | | NFS4ERR_RETRY_UNCACHED_REP, | | |||
| | NFS4ERR_SERVERFAULT, | | | | NFS4ERR_SERVERFAULT, | | |||
| | NFS4ERR_TOO_MANY_OPS | | | | NFS4ERR_TOO_MANY_OPS | | |||
+-------------------------+-----------------------------------------+ | +-------------------------+-----------------------------------------+ | |||
Table 7 | Table 7 | |||
15.4. Errors and the operations that use them | 15.4. Errors and the Operations That Use Them | |||
+-----------------------------------+-------------------------------+ | +-----------------------------------+-------------------------------+ | |||
| Error | Operations | | | Error | Operations | | |||
+-----------------------------------+-------------------------------+ | +-----------------------------------+-------------------------------+ | |||
| NFS4ERR_ACCESS | ACCESS, COMMIT, CREATE, | | | NFS4ERR_ACCESS | ACCESS, COMMIT, CREATE, | | |||
| | GETATTR, GET_DIR_DELEGATION, | | | | GETATTR, GET_DIR_DELEGATION, | | |||
| | LAYOUTCOMMIT, LAYOUTGET, | | | | LAYOUTCOMMIT, LAYOUTGET, | | |||
| | LINK, LOCK, LOCKT, LOCKU, | | | | LINK, LOCK, LOCKT, LOCKU, | | |||
| | LOOKUP, LOOKUPP, NVERIFY, | | | | LOOKUP, LOOKUPP, NVERIFY, | | |||
| | OPEN, OPENATTR, READ, | | | | OPEN, OPENATTR, READ, | | |||
skipping to change at page 394, line 44 | skipping to change at page 394, line 44 | |||
void; | void; | |||
16.1.2. RESULTS | 16.1.2. RESULTS | |||
void; | void; | |||
16.1.3. DESCRIPTION | 16.1.3. DESCRIPTION | |||
This is the standard NULL procedure with the standard void argument | This is the standard NULL procedure with the standard void argument | |||
and void response. This procedure has no functionality associated | and void response. This procedure has no functionality associated | |||
with it. Because of this it is sometimes used to measure the | with it. Because of this, it is sometimes used to measure the | |||
overhead of processing a service request. Therefore, the server | overhead of processing a service request. Therefore, the server | |||
SHOULD ensure that no unnecessary work is done in servicing this | SHOULD ensure that no unnecessary work is done in servicing this | |||
procedure. | procedure. | |||
16.1.4. ERRORS | 16.1.4. ERRORS | |||
None. | None. | |||
16.2. Procedure 1: COMPOUND - Compound Operations | 16.2. Procedure 1: COMPOUND - Compound Operations | |||
skipping to change at page 401, line 24 | skipping to change at page 401, line 24 | |||
}; | }; | |||
struct COMPOUND4res { | struct COMPOUND4res { | |||
nfsstat4 status; | nfsstat4 status; | |||
utf8str_cs tag; | utf8str_cs tag; | |||
nfs_resop4 resarray<>; | nfs_resop4 resarray<>; | |||
}; | }; | |||
16.2.3. DESCRIPTION | 16.2.3. DESCRIPTION | |||
The COMPOUND procedure is used to combine one or more of the NFS | The COMPOUND procedure is used to combine one or more NFSv4 | |||
operations into a single RPC request. The NFS RPC program has two | operations into a single RPC request. The server interprets each of | |||
main procedures: NULL and COMPOUND. All other operations use the | the operations in turn. If an operation is executed by the server | |||
COMPOUND procedure as a wrapper. | and the status of that operation is NFS4_OK, then the next operation | |||
in the COMPOUND procedure is executed. The server continues this | ||||
The COMPOUND procedure is used to combine individual operations into | process until there are no more operations to be executed or until | |||
a single RPC request. The server interprets each of the operations | one of the operations has a status value other than NFS4_OK. | |||
in turn. If an operation is executed by the server and the status of | ||||
that operation is NFS4_OK, then the next operation in the COMPOUND | ||||
procedure is executed. The server continues this process until there | ||||
are no more operations to be executed or one of the operations has a | ||||
status value other than NFS4_OK. | ||||
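The evaluation order described above can be sketched as a simple loop. This is an illustration under stated assumptions, not server code: each operation is modeled as a hypothetical zero-argument callable returning a (status, result) pair, and the sketch assumes that the failing operation's result is recorded and that its status becomes the overall status.

```python
NFS4_OK = 0


def run_compound(operations):
    """Execute operations in order, stopping at the first non-OK status.

    Returns the overall status and the list of (status, result) pairs
    for the operations that were actually executed.
    """
    status = NFS4_OK
    resarray = []
    for op in operations:
        status, result = op()
        resarray.append((status, result))
        if status != NFS4_OK:
            # Operations after the failing one are never executed.
            break
    return status, resarray
```

With three operations of which the second fails, the returned array holds two entries and the third operation is not invoked.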
In the processing of the COMPOUND procedure, the server may find that | In the processing of the COMPOUND procedure, the server may find that | |||
it does not have the available resources to execute any or all of the | it does not have the available resources to execute any or all of the | |||
operations within the COMPOUND sequence. See Section 2.10.6.4 for a | operations within the COMPOUND sequence. See Section 2.10.6.4 for a | |||
more detailed discussion. | more detailed discussion. | |||
The server will generally choose between two methods of decoding the | The server will generally choose between two methods of decoding the | |||
client's request. The first would be the traditional one pass XDR | client's request. The first would be the traditional one-pass XDR | |||
decode. If there is an XDR decoding error in this case, the RPC XDR | decode. If there is an XDR decoding error in this case, the RPC XDR | |||
decode error would be returned. The second method would be to make | decode error would be returned. The second method would be to make | |||
an initial pass to decode the basic COMPOUND request and then to XDR | an initial pass to decode the basic COMPOUND request and then to XDR | |||
decode the individual operations; the most interesting is the decode | decode the individual operations; the most interesting is the decode | |||
of attributes. In this case, the server may encounter an XDR decode | of attributes. In this case, the server may encounter an XDR decode | |||
error during the second pass. In this case, the server would return | error during the second pass. If it does, the server would return | |||
the error NFS4ERR_BADXDR to signify the decode error. | the error NFS4ERR_BADXDR to signify the decode error. | |||
The COMPOUND arguments contain a "minorversion" field. For NFSv4.1, | The COMPOUND arguments contain a "minorversion" field. For NFSv4.1, | |||
the value for this field is 1. If the server receives a COMPOUND | the value for this field is 1. If the server receives a COMPOUND | |||
procedure with a minorversion field value that it does not support, | procedure with a minorversion field value that it does not support, | |||
the server MUST return an error of NFS4ERR_MINOR_VERS_MISMATCH and a | the server MUST return an error of NFS4ERR_MINOR_VERS_MISMATCH and a | |||
zero length resultdata array. | zero-length resultdata array. | |||
Contained within the COMPOUND results is a "status" field. If the | Contained within the COMPOUND results is a "status" field. If the | |||
results array length is non-zero, this status must be equivalent to | results array length is non-zero, this status must be equivalent to | |||
the status of the last operation that was executed within the | the status of the last operation that was executed within the | |||
COMPOUND procedure. Therefore, if an operation incurred an error | COMPOUND procedure. Therefore, if an operation incurred an error | |||
then the "status" value will be the same error value as is being | then the "status" value will be the same error value as is being | |||
returned for the operation that failed. | returned for the operation that failed. | |||
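
The evaluation rules described above (minor version check, in-order execution that stops at the first status other than NFS4_OK, and a COMPOUND status equal to that of the last operation executed) can be sketched roughly as follows. This is a minimal illustration, not the protocol's reference implementation: the function name process_compound, the (name, handler) representation of operations, and the use of strings for status codes are all assumptions made for the example.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_MINOR_VERS_MISMATCH = "NFS4ERR_MINOR_VERS_MISMATCH"


def process_compound(minorversion, tag, ops, supported_minor_versions=(1,)):
    """Hypothetical server-side COMPOUND loop.

    ops is a list of (name, handler) pairs; each handler takes no
    arguments and returns a status string (an assumption of this sketch).
    """
    # An unsupported minor version yields a zero-length result array.
    if minorversion not in supported_minor_versions:
        return {"status": NFS4ERR_MINOR_VERS_MISMATCH, "tag": tag, "resarray": []}

    status = NFS4_OK
    resarray = []
    for name, handler in ops:
        status = handler()
        resarray.append((name, status))
        # Stop at the first operation whose status is not NFS4_OK.
        if status != NFS4_OK:
            break

    # The overall status is that of the last operation executed;
    # the tag is echoed back as received.
    return {"status": status, "tag": tag, "resarray": resarray}


example_ops = [
    ("PUTFH", lambda: NFS4_OK),
    ("LOOKUP", lambda: "NFS4ERR_NOENT"),
    ("GETATTR", lambda: NFS4_OK),  # never reached
]
example_result = process_compound(1, "demo", example_ops)
```

In this sketch GETATTR is never evaluated, because LOOKUP returned a status other than NFS4_OK, so the result array holds two entries and the COMPOUND status matches the LOOKUP status.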
Note that operations, 0 (zero) and 1 (one) are not defined for the | Note that operations zero and one are not defined for the COMPOUND | |||
COMPOUND procedure. Operation 2 is not defined and is reserved for | procedure. Operation 2 is not defined and is reserved for future | |||
future definition and use with minor versioning. If the server | definition and use with minor versioning. If the server receives an | |||
receives a operation array that contains operation 2 and the | operation array that contains operation 2 and the minorversion field | |||
minorversion field has a value of 0 (zero), an error of | has a value of zero, an error of NFS4ERR_OP_ILLEGAL, as described in | |||
NFS4ERR_OP_ILLEGAL, as described in the next paragraph, is returned | the next paragraph, is returned to the client. If an operation array | |||
to the client. If an operation array contains an operation 2 and the | contains an operation 2 and the minorversion field is non-zero and | |||
minorversion field is non-zero and the server does not support the | the server does not support the minor version, the server returns an | |||
minor version, the server returns an error of | error of NFS4ERR_MINOR_VERS_MISMATCH. Therefore, the | |||
NFS4ERR_MINOR_VERS_MISMATCH. Therefore, the | ||||
NFS4ERR_MINOR_VERS_MISMATCH error takes precedence over all other | NFS4ERR_MINOR_VERS_MISMATCH error takes precedence over all other | |||
errors. | errors. | |||
It is possible that the server receives a request that contains an | It is possible that the server receives a request that contains an | |||
operation that is less than the first legal operation (OP_ACCESS) or | operation that is less than the first legal operation (OP_ACCESS) or | |||
greater than the last legal operation (OP_RELEASE_LOCKOWNER). In | greater than the last legal operation (OP_RELEASE_LOCKOWNER). In | |||
this case, the server's response will encode the opcode OP_ILLEGAL | this case, the server's response will encode the opcode OP_ILLEGAL | |||
rather than the illegal opcode of the request. The status field in | rather than the illegal opcode of the request. The status field in | |||
the ILLEGAL return results will set to NFS4ERR_OP_ILLEGAL. The | the ILLEGAL return results will be set to NFS4ERR_OP_ILLEGAL. The | |||
COMPOUND procedure's return results will also be NFS4ERR_OP_ILLEGAL. | COMPOUND procedure's return results will also be NFS4ERR_OP_ILLEGAL. | |||
The definition of the "tag" in the request is left to the | The definition of the "tag" in the request is left to the | |||
implementor. It may be used to summarize the content of the compound | implementor. It may be used to summarize the content of the Compound | |||
request for the benefit of packet sniffers and engineers debugging | request for the benefit of packet-sniffers and engineers debugging | |||
implementations. However, the value of "tag" in the response SHOULD | implementations. However, the value of "tag" in the response SHOULD | |||
be the same value as provided in the request. This applies to the | be the same value as provided in the request. This applies to the | |||
tag field of the CB_COMPOUND procedure as well. | tag field of the CB_COMPOUND procedure as well. | |||
16.2.3.1. Current Filehandle and Stateid | 16.2.3.1. Current Filehandle and Stateid | |||
The COMPOUND procedure offers a simple environment for the execution | The COMPOUND procedure offers a simple environment for the execution | |||
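of the operations specified by the client. The environment holds four | of the operations specified by the client. The environment holds four | |||
values: the first two (the current and saved filehandles) relate to | values: the first two (the current and saved filehandles) relate to | |||
the filehandle while the second two (the current and saved stateids) | the filehandle while the second two (the current and saved stateids) | |||
relate to the current stateid. | relate to the current stateid. | |||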
16.2.3.1.1. Current Filehandle | 16.2.3.1.1. Current Filehandle | |||
The current and saved filehandle are used throughout the protocol. | The current and saved filehandles are used throughout the protocol. | |||
Most operations implicitly use the current filehandle as a argument | Most operations implicitly use the current filehandle as an argument, | |||
and many set the current filehandle as part of the results. The | and many set the current filehandle as part of the results. The | |||
combination of client specified sequences of operations and current | combination of client-specified sequences of operations and current | |||
and saved filehandle arguments and results allows for greater | and saved filehandle arguments and results allows for greater | |||
protocol flexibility. The best or easiest example of current | protocol flexibility. The best or easiest example of current | |||
filehandle usage is a sequence like the following: | filehandle usage is a sequence like the following: | |||
PUTFH fh1 {fh1} | PUTFH fh1 {fh1} | |||
LOOKUP "compA" {fh2} | LOOKUP "compA" {fh2} | |||
GETATTR {fh2} | GETATTR {fh2} | |||
LOOKUP "compB" {fh3} | LOOKUP "compB" {fh3} | |||
GETATTR {fh3} | GETATTR {fh3} | |||
LOOKUP "compC" {fh4} | LOOKUP "compC" {fh4} | |||
skipping to change at page 403, line 41 | skipping to change at page 403, line 32 | |||
The PUTROOTFH (Section 18.21) and PUTPUBFH (Section 18.20) operations | The PUTROOTFH (Section 18.21) and PUTPUBFH (Section 18.20) operations | |||
also set the current filehandle. The above example would replace | also set the current filehandle. The above example would replace | |||
"PUTFH fh1" with PUTROOTFH or PUTPUBFH with no filehandle argument in | "PUTFH fh1" with PUTROOTFH or PUTPUBFH with no filehandle argument in | |||
order to achieve the same effect (on the assumption that "compA" is | order to achieve the same effect (on the assumption that "compA" is | |||
directly below the root of the namespace). | directly below the root of the namespace). | |||
Along with the current filehandle, there is a saved filehandle. | Along with the current filehandle, there is a saved filehandle. | |||
While the current filehandle is set as the result of operations like | While the current filehandle is set as the result of operations like | |||
LOOKUP, the saved filehandle must be set directly with the use of the | LOOKUP, the saved filehandle must be set directly with the use of the | |||
SAVEFH operation. The SAVEFH operations copies the current | SAVEFH operation. The SAVEFH operation copies the current filehandle | |||
filehandle value to the saved value. The saved filehandle value is | value to the saved value. The saved filehandle value is used in | |||
used in combination with the current filehandle value for the LINK | combination with the current filehandle value for the LINK and RENAME | |||
and RENAME operations. The RESTOREFH operation will copy the saved | operations. The RESTOREFH operation will copy the saved filehandle | |||
filehandle value to the current filehandle value; as a result, the | value to the current filehandle value; as a result, the saved | |||
saved filehandle value may be used as a sort of "scratch" area for the | filehandle value may be used as a sort of "scratch" area for the | |||
client's series of operations. | client's series of operations. | |||
16.2.3.1.2. Current Stateid | 16.2.3.1.2. Current Stateid | |||
With NFSv4.1, additions of a current stateid and a saved stateid have | With NFSv4.1, additions of a current stateid and a saved stateid have | |||
been made to the COMPOUND processing environment; this allows for the | been made to the COMPOUND processing environment; this allows for the | |||
passing of stateids between operations. There are no changes to the | passing of stateids between operations. There are no changes to the | |||
syntax of the protocol, only changes to the semantics of a few | syntax of the protocol, only changes to the semantics of a few | |||
operations. | operations. | |||
A "current stateid" is the stateid that is associated with the | A "current stateid" is the stateid that is associated with the | |||
current filehandle. The current stateid may only be changed by an | current filehandle. The current stateid may only be changed by an | |||
operation that modifies the current filehandle or returns a stateid. | operation that modifies the current filehandle or returns a stateid. | |||
If an operation returns a stateid it MUST set the current stateid to | ||||
If an operation returns a stateid, it MUST set the current stateid to | ||||
the returned value. If an operation sets the current filehandle but | the returned value. If an operation sets the current filehandle but | |||
does not return a stateid, the current stateid MUST be set to the | does not return a stateid, the current stateid MUST be set to the | |||
all-zeros special stateid, i.e. (seqid, other) = (0, 0). If an | all-zeros special stateid, i.e., (seqid, other) = (0, 0). If an | |||
operation uses a stateid as an argument but does not return a | operation uses a stateid as an argument but does not return a | |||
stateid, the current stateid MUST NOT be changed. E.g., PUTFH, | stateid, the current stateid MUST NOT be changed. For example, | |||
PUTROOTFH, and PUTPUBFH will change the current server state from | PUTFH, PUTROOTFH, and PUTPUBFH will change the current server state | |||
{ocfh, (osid)} to {cfh, (0, 0)} while LOCK will change the current | from {ocfh, (osid)} to {cfh, (0, 0)}, while LOCK will change the | |||
state from {cfh, (osid)} to {cfh, (nsid)}. Operations like LOOKUP | current state from {cfh, (osid)} to {cfh, (nsid)}. Operations like | |||
that transform a current filehandle and component name into a new | LOOKUP that transform a current filehandle and component name into a | |||
current filehandle will also change the current stateid to {0, 0}. | new current filehandle will also change the current state to {0, 0}. | |||
The SAVEFH and RESTOREFH operations will save and restore both the | The SAVEFH and RESTOREFH operations will save and restore both the | |||
current filehandle and the current stateid as a set. | current filehandle and the current stateid as a set. | |||
The following example is the common case of a simple READ operation | The following example is the common case of a simple READ operation | |||
with a supplied stateid showing that the PUTFH initializes the | with a normal stateid showing that the PUTFH initializes the current | |||
current stateid to (0, 0). The subsequent READ with stateid (sid1) | stateid to (0, 0). The subsequent READ with stateid (sid1) leaves | |||
leaves the current stateid unchanged, but does evaluate the | the current stateid unchanged. | |||
operation. | ||||
PUTFH fh1 - -> {fh1, (0, 0)} | PUTFH fh1 - -> {fh1, (0, 0)} | |||
READ (sid1), 0, 1024 {fh1, (0, 0)} -> {fh1, (0, 0)} | READ (sid1), 0, 1024 {fh1, (0, 0)} -> {fh1, (0, 0)} | |||
Figure 3 | Figure 3 | |||
This next example performs an OPEN with the root filehandle and as a | This next example performs an OPEN with the root filehandle and, as a | |||
result generates stateid (sid1). The next operation specifies the | result, generates stateid (sid1). The next operation specifies the | |||
READ with the argument stateid set such that (seqid, other) are equal | READ with the argument stateid set such that (seqid, other) are equal | |||
to (1, 0), but the current stateid set by the previous operation is | to (1, 0), but the current stateid set by the previous operation is | |||
actually used when the operation is evaluated. This allows correct | actually used when the operation is evaluated. This allows correct | |||
interaction with any existing, potentially conflicting, locks. | interaction with any existing, potentially conflicting, locks. | |||
PUTROOTFH - -> {fh1, (0, 0)} | PUTROOTFH - -> {fh1, (0, 0)} | |||
OPEN "compA" {fh1, (0, 0)} -> {fh2, (sid1)} | OPEN "compA" {fh1, (0, 0)} -> {fh2, (sid1)} | |||
READ (1, 0), 0, 1024 {fh2, (sid1)} -> {fh2, (sid1)} | READ (1, 0), 0, 1024 {fh2, (sid1)} -> {fh2, (sid1)} | |||
CLOSE (1, 0) {fh2, (sid1)} -> {fh2, (sid2)} | CLOSE (1, 0) {fh2, (sid1)} -> {fh2, (sid2)} | |||
Figure 4 | Figure 4 | |||
This next example is similar to the second in how it passes the | This next example is similar to the second in how it passes the | |||
stateid sid2 generated by the LOCK operation to the next READ | stateid sid2 generated by the LOCK operation to the next READ | |||
operation. This allows the client to explicitly surround a single | operation. This allows the client to explicitly surround a single | |||
I/O operation with a lock and its appropriate stateid to guarantee | I/O operation with a lock and its appropriate stateid to guarantee | |||
correctness with other client locks. The example also shows how | correctness with other client locks. The example also shows how | |||
SAVEFH and RESTOREFH can save and later re-use a filehandle and | SAVEFH and RESTOREFH can save and later reuse a filehandle and | |||
stateid, passing them as the current filehandle and stateid to a READ | stateid, passing them as the current filehandle and stateid to a READ | |||
operation. | operation. | |||
PUTFH fh1 - -> {fh1, (0, 0)} | PUTFH fh1 - -> {fh1, (0, 0)} | |||
LOCK 0, 1024, (sid1) {fh1, (sid1)} -> {fh1, (sid2)} | LOCK 0, 1024, (sid1) {fh1, (sid1)} -> {fh1, (sid2)} | |||
READ (1, 0), 0, 1024 {fh1, (sid2)} -> {fh1, (sid2)} | READ (1, 0), 0, 1024 {fh1, (sid2)} -> {fh1, (sid2)} | |||
LOCKU 0, 1024, (1, 0) {fh1, (sid2)} -> {fh1, (sid3)} | LOCKU 0, 1024, (1, 0) {fh1, (sid2)} -> {fh1, (sid3)} | |||
SAVEFH {fh1, (sid3)} -> {fh1, (sid3)} | SAVEFH {fh1, (sid3)} -> {fh1, (sid3)} | |||
PUTFH fh2 {fh1, (sid3)} -> {fh2, (0, 0)} | PUTFH fh2 {fh1, (sid3)} -> {fh2, (0, 0)} | |||
WRITE (1, 0), 0, 1024 {fh2, (0, 0)} -> {fh2, (0, 0)} | WRITE (1, 0), 0, 1024 {fh2, (0, 0)} -> {fh2, (0, 0)} | |||
RESTOREFH {fh2, (0, 0)} -> {fh1, (sid3)} | RESTOREFH {fh2, (0, 0)} -> {fh1, (sid3)} | |||
READ (1, 0), 1024, 1024 {fh1, (sid3)} -> {fh1, (sid3)} | READ (1, 0), 1024, 1024 {fh1, (sid3)} -> {fh1, (sid3)} | |||
Figure 5 | Figure 5 | |||
The final example shows a disallowed use of the current stateid. The | The final example shows a disallowed use of the current stateid. The | |||
client is attempting to implicitly pass anonymous special stateid, | client is attempting to implicitly pass an anonymous special stateid, | |||
(0,0) to the READ operation. The server MUST return | (0,0), to the READ operation. The server MUST return | |||
NFS4ERR_BAD_STATEID in the reply to the READ operation. | NFS4ERR_BAD_STATEID in the reply to the READ operation. | |||
PUTFH fh1 - -> {fh1, (0, 0)} | PUTFH fh1 - -> {fh1, (0, 0)} | |||
READ (1, 0), 0, 1024 {fh1, (0, 0)} -> NFS4ERR_BAD_STATEID | READ (1, 0), 0, 1024 {fh1, (0, 0)} -> NFS4ERR_BAD_STATEID | |||
Figure 6 | Figure 6 | |||
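
The current-stateid rules illustrated in Figures 3 through 6 can be modeled with a small state holder. The sketch below is illustrative only: the class CompoundEnv, its method names, and the use of strings such as "sid2" for ordinary stateids are assumptions of the example. It models three behaviors from the text: setting the filehandle resets the current stateid to (0, 0), an operation that returns a stateid replaces the current stateid, and an argument stateid of (1, 0) is resolved to the current stateid.

```python
ANONYMOUS = (0, 0)    # all-zeros special stateid
USE_CURRENT = (1, 0)  # argument value meaning "use the current stateid"


class CompoundEnv:
    """Hypothetical model of the COMPOUND filehandle/stateid environment."""

    def __init__(self):
        self.cfh = None
        self.csid = ANONYMOUS
        self.saved = None

    def set_filehandle(self, fh):
        # PUTFH, PUTROOTFH, PUTPUBFH, LOOKUP: new filehandle, stateid reset.
        self.cfh = fh
        self.csid = ANONYMOUS

    def op_returning_stateid(self, new_sid):
        # OPEN, LOCK, LOCKU, ...: the returned stateid becomes current.
        self.csid = new_sid

    def resolve(self, arg_sid):
        # Stateid actually evaluated by an operation such as READ.
        # The current stateid is left unchanged in every case.
        if arg_sid != USE_CURRENT:
            return arg_sid
        if self.csid == ANONYMOUS:
            return "NFS4ERR_BAD_STATEID"
        return self.csid

    def savefh(self):
        # SAVEFH saves filehandle and stateid as a set.
        self.saved = (self.cfh, self.csid)

    def restorefh(self):
        # RESTOREFH restores both values together.
        self.cfh, self.csid = self.saved


env = CompoundEnv()
env.set_filehandle("fh1")                # PUTFH fh1
env.op_returning_stateid("sid2")         # LOCK returns sid2
read_sid = env.resolve(USE_CURRENT)      # READ (1, 0) evaluates sid2
env.op_returning_stateid("sid3")         # LOCKU returns sid3
env.savefh()                             # SAVEFH
env.set_filehandle("fh2")                # PUTFH fh2 resets the stateid
env.restorefh()                          # RESTOREFH brings back fh1, sid3
```

The walk-through follows the LOCK, READ, LOCKU, SAVEFH, PUTFH, and RESTOREFH steps of Figure 5. Calling resolve(USE_CURRENT) immediately after set_filehandle corresponds to the disallowed case of Figure 6.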
16.2.4. ERRORS | 16.2.4. ERRORS | |||
COMPOUND will of course return every error that each operation on the | COMPOUND will of course return every error that each operation on the | |||
fore channel can return (see Table 6). However if COMPOUND returns | fore channel can return (see Table 6). However, if COMPOUND returns | |||
zero operations, obviously the error returned by COMPOUND has nothing | zero operations, obviously the error returned by COMPOUND has nothing | |||
to do with an error returned by an operation. The list of errors | to do with an error returned by an operation. The list of errors | |||
COMPOUND will return if it processes zero operations includes: | COMPOUND will return if it processes zero operations includes: | |||
COMPOUND error returns | COMPOUND Error Returns | |||
+------------------------------+------------------------------------+ | +------------------------------+------------------------------------+ | |||
| Error | Notes | | | Error | Notes | | |||
+------------------------------+------------------------------------+ | +------------------------------+------------------------------------+ | |||
| NFS4ERR_BADCHAR | The tag argument has a character | | | NFS4ERR_BADCHAR | The tag argument has a character | | |||
| | the replier does not support. | | | | the replier does not support. | | |||
| NFS4ERR_BADXDR | | | | NFS4ERR_BADXDR | | | |||
| NFS4ERR_DELAY | | | | NFS4ERR_DELAY | | | |||
| NFS4ERR_INVAL | The tag argument is not in UTF-8 | | | NFS4ERR_INVAL | The tag argument is not in UTF-8 | | |||
| | encoding. | | | | encoding. | | |||
skipping to change at page 406, line 29 | skipping to change at page 406, line 29 | |||
| NFS4ERR_REP_TOO_BIG | | | | NFS4ERR_REP_TOO_BIG | | | |||
| NFS4ERR_REP_TOO_BIG_TO_CACHE | | | | NFS4ERR_REP_TOO_BIG_TO_CACHE | | | |||
| NFS4ERR_REQ_TOO_BIG | | | | NFS4ERR_REQ_TOO_BIG | | | |||
+------------------------------+------------------------------------+ | +------------------------------+------------------------------------+ | |||
Table 9 | Table 9 | |||
17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL | 17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL | |||
The following tables summarize the operations of the NFSv4.1 protocol | The following tables summarize the operations of the NFSv4.1 protocol | |||
and the corresponding designation of REQUIRED, RECOMMENDED, OPTIONAL | and the corresponding designation of REQUIRED, RECOMMENDED, and | |||
to implement or MUST NOT implement. The designation of MUST NOT | OPTIONAL to implement or MUST NOT implement. The designation of MUST | |||
implement is reserved for those operations that were defined in | NOT implement is reserved for those operations that were defined in | |||
NFSv4.0 and MUST NOT be implemented in NFSv4.1. | NFSv4.0 and MUST NOT be implemented in NFSv4.1. | |||
For the most part, the REQUIRED, RECOMMENDED, or OPTIONAL designation | For the most part, the REQUIRED, RECOMMENDED, or OPTIONAL designation | |||
for operations sent by the client is for the server implementation. | for operations sent by the client is for the server implementation. | |||
The client is generally required to implement the operations needed | The client is generally required to implement the operations needed | |||
for the operating environment for which it serves. For example, a | for the operating environment for which it serves. For example, a | |||
read-only NFSv4.1 client would have no need to implement the WRITE | read-only NFSv4.1 client would have no need to implement the WRITE | |||
operation and is not required to do so. | operation and is not required to do so. | |||
The REQUIRED or OPTIONAL designation for callback operations sent by | The REQUIRED or OPTIONAL designation for callback operations sent by | |||
skipping to change at page 407, line 19 | skipping to change at page 407, line 18 | |||
REQ REQUIRED to implement | REQ REQUIRED to implement | |||
REC RECOMMENDED to implement | REC RECOMMENDED to implement | |||
OPT OPTIONAL to implement | OPT OPTIONAL to implement | |||
MNI MUST NOT implement | MNI MUST NOT implement | |||
For the NFSv4.1 features that are OPTIONAL, the operations that | For the NFSv4.1 features that are OPTIONAL, the operations that | |||
support those features are OPTIONAL and the server would return | support those features are OPTIONAL, and the server would return | |||
NFS4ERR_NOTSUPP in response to the client's use of those operations. | NFS4ERR_NOTSUPP in response to the client's use of those operations. | |||
If an OPTIONAL feature is supported, it is possible that a set of | If an OPTIONAL feature is supported, it is possible that a set of | |||
operations related to the feature become REQUIRED to implement. The | operations related to the feature become REQUIRED to implement. The | |||
third column of the table designates the feature(s) and if the | third column of the table designates the feature(s) and if the | |||
operation is REQUIRED or OPTIONAL in the presence of support for the | operation is REQUIRED or OPTIONAL in the presence of support for the | |||
feature. | feature. | |||
The OPTIONAL features identified and their abbreviations are as | The OPTIONAL features identified and their abbreviations are as | |||
follows: | follows: | |||
skipping to change at page 408, line 44 | skipping to change at page 408, line 44 | |||
| READDIR | REQ | | Section 18.23 | | | READDIR | REQ | | Section 18.23 | | |||
| READLINK | OPT | | Section 18.24 | | | READLINK | OPT | | Section 18.24 | | |||
| RECLAIM_COMPLETE | REQ | | Section 18.51 | | | RECLAIM_COMPLETE | REQ | | Section 18.51 | | |||
| RELEASE_LOCKOWNER | MNI | | N/A | | | RELEASE_LOCKOWNER | MNI | | N/A | | |||
| REMOVE | REQ | | Section 18.25 | | | REMOVE | REQ | | Section 18.25 | | |||
| RENAME | REQ | | Section 18.26 | | | RENAME | REQ | | Section 18.26 | | |||
| RENEW | MNI | | N/A | | | RENEW | MNI | | N/A | | |||
| RESTOREFH | REQ | | Section 18.27 | | | RESTOREFH | REQ | | Section 18.27 | | |||
| SAVEFH | REQ | | Section 18.28 | | | SAVEFH | REQ | | Section 18.28 | | |||
| SECINFO | REQ | | Section 18.29 | | | SECINFO | REQ | | Section 18.29 | | |||
| SECINFO_NO_NAME | REC | pNFS files | Section 18.45, | | | SECINFO_NO_NAME | REC | pNFS file | Section 18.45, | | |||
| | | layout (REQ) | Section 13.12 | | | | | layout (REQ) | Section 13.12 | | |||
| SEQUENCE | REQ | | Section 18.46 | | | SEQUENCE | REQ | | Section 18.46 | | |||
| SETATTR | REQ | | Section 18.30 | | | SETATTR | REQ | | Section 18.30 | | |||
| SETCLIENTID | MNI | | N/A | | | SETCLIENTID | MNI | | N/A | | |||
| SETCLIENTID_CONFIRM | MNI | | N/A | | | SETCLIENTID_CONFIRM | MNI | | N/A | | |||
| SET_SSV | REQ | | Section 18.47 | | | SET_SSV | REQ | | Section 18.47 | | |||
| TEST_STATEID | REQ | | Section 18.48 | | | TEST_STATEID | REQ | | Section 18.48 | | |||
| VERIFY | REQ | | Section 18.31 | | | VERIFY | REQ | | Section 18.31 | | |||
| WANT_DELEGATION | OPT | FDELG (OPT) | Section 18.49 | | | WANT_DELEGATION | OPT | FDELG (OPT) | Section 18.49 | | |||
| WRITE | REQ | | Section 18.32 | | | WRITE | REQ | | Section 18.32 | | |||
+----------------------+------------+--------------+----------------+ | +----------------------+------------+--------------+----------------+ | |||
Callback Operations: | ||||
Callback Operations | Callback Operations | |||
+-------------------------+-----------+-------------+---------------+ | +-------------------------+-----------+-------------+---------------+ | |||
| Operation | REQ, REC, | Feature | Definition | | | Operation | REQ, REC, | Feature | Definition | | |||
| | OPT, or | (REQ, REC, | | | | | OPT, or | (REQ, REC, | | | |||
| | MNI | or OPT) | | | | | MNI | or OPT) | | | |||
+-------------------------+-----------+-------------+---------------+ | +-------------------------+-----------+-------------+---------------+ | |||
| CB_GETATTR | OPT | FDELG (REQ) | Section 20.1 | | | CB_GETATTR | OPT | FDELG (REQ) | Section 20.1 | | |||
| CB_LAYOUTRECALL | OPT | pNFS (REQ) | Section 20.3 | | | CB_LAYOUTRECALL | OPT | pNFS (REQ) | Section 20.3 | | |||
| CB_NOTIFY | OPT | DDELG (REQ) | Section 20.4 | | | CB_NOTIFY | OPT | DDELG (REQ) | Section 20.4 | | |||
skipping to change at page 411, line 12 | skipping to change at page 411, line 12 | |||
example, if the client sends an ACCESS operation with just the | example, if the client sends an ACCESS operation with just the | |||
ACCESS4_READ value set and the server supports this value, the server | ACCESS4_READ value set and the server supports this value, the server | |||
MUST NOT set more than ACCESS4_READ in the supported field even if it | MUST NOT set more than ACCESS4_READ in the supported field even if it | |||
could have reliably checked other values. | could have reliably checked other values. | |||
The reply's access field MUST NOT contain more values than the | The reply's access field MUST NOT contain more values than the | |||
supported field. | supported field. | |||
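
The two constraints on the ACCESS reply (the supported field is limited to the bits the client requested, and the access field is limited to the bits in supported) amount to simple bit masking. The following is a minimal sketch, assuming a hypothetical helper access_reply whose inputs are the requested mask, the set of bits the server can reliably check, and the set of bits the server would grant. The constant values follow the protocol's XDR definitions for the ACCESS operation.

```python
ACCESS4_READ = 0x00000001
ACCESS4_LOOKUP = 0x00000002
ACCESS4_MODIFY = 0x00000004
ACCESS4_EXTEND = 0x00000008
ACCESS4_DELETE = 0x00000010
ACCESS4_EXECUTE = 0x00000020


def access_reply(requested, server_can_check, granted):
    """Hypothetical helper that builds the (supported, access) reply pair."""
    # supported never contains a bit the client did not ask about.
    supported = requested & server_can_check
    # access never contains a bit that is absent from supported.
    access = granted & supported
    return supported, access


all_bits = (ACCESS4_READ | ACCESS4_LOOKUP | ACCESS4_MODIFY |
            ACCESS4_EXTEND | ACCESS4_DELETE | ACCESS4_EXECUTE)

# Client asks only about READ; the server could check and grant everything.
supported, access = access_reply(ACCESS4_READ, all_bits, all_bits)
```

Even though the server in this example could have checked every bit, both returned masks contain only ACCESS4_READ.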
The results of this operation are necessarily advisory in nature. A | The results of this operation are necessarily advisory in nature. A | |||
return status of NFS4_OK and the appropriate bit set in the bit mask | return status of NFS4_OK and the appropriate bit set in the bit mask | |||
does not imply that such access will be allowed to the file system | do not imply that such access will be allowed to the file system | |||
object in the future. This is because access rights can be revoked | object in the future. This is because access rights can be revoked | |||
by the server at any time. | by the server at any time. | |||
The following access permissions may be requested: | The following access permissions may be requested: | |||
ACCESS4_READ Read data from file or read a directory. | ACCESS4_READ Read data from file or read a directory. | |||
ACCESS4_LOOKUP Look up a name in a directory (no meaning for non- | ACCESS4_LOOKUP Look up a name in a directory (no meaning for non- | |||
directory objects). | directory objects). | |||
skipping to change at page 411, line 38 | skipping to change at page 411, line 38 | |||
ACCESS4_DELETE Delete an existing directory entry. | ACCESS4_DELETE Delete an existing directory entry. | |||
ACCESS4_EXECUTE Execute a regular file (no meaning for a directory). | ACCESS4_EXECUTE Execute a regular file (no meaning for a directory). | |||
On success, the current filehandle retains its value. | On success, the current filehandle retains its value. | |||
ACCESS4_EXECUTE is a challenging semantic to implement because NFS | ACCESS4_EXECUTE is a challenging semantic to implement because NFS | |||
provides remote file access, not remote execution. This leads to the | provides remote file access, not remote execution. This leads to the | |||
following: | following: | |||
o Whether a regular file is executable or not ought to be the | o Whether or not a regular file is executable ought to be the | |||
responsibility of the NFS client and not the server. And yet the | responsibility of the NFS client and not the server. And yet the | |||
ACCESS operation is specified to seemingly require a server to own | ACCESS operation is specified to seemingly require a server to own | |||
that responsibility. | that responsibility. | |||
o When a client executes a regular file, it has to read the file | o When a client executes a regular file, it has to read the file | |||
from the server. Strictly speaking, the server should not allow | from the server. Strictly speaking, the server should not allow | |||
the client to read a file being executed unless the user has read | the client to read a file being executed unless the user has read | |||
permissions on the file. Requiring users and administers to set | permissions on the file. Requiring explicit read permissions on | |||
read permissions on executable files in order to access them over | executable files in order to access them over NFS is not going to | |||
NFS is not going to be acceptable to some people. Historically, | be acceptable to some users and storage administrators. | |||
NFS servers have allowed a user to READ a file if the user has | Historically, NFS servers have allowed a user to READ a file if | |||
execute access to the file. | the user has execute access to the file. | |||
As a practical example, the UNIX specification [52] states that an | As a practical example, the UNIX specification [52] states that an | |||
implementation claiming conformance to UNIX may indicate in the | implementation claiming conformance to UNIX may indicate in the | |||
access() programming interface's result that a privileged user has | access() programming interface's result that a privileged user has | |||
execute rights, even if no execute permission bits are set on the | execute rights, even if no execute permission bits are set on the | |||
regular file's attributes. It is possible to claim conformance to | regular file's attributes. It is possible to claim conformance to | |||
the UNIX specification and instead not indicate execute rights in | the UNIX specification and instead not indicate execute rights in | |||
that situation, which is true for some operating environments. | that situation, which is true for some operating environments. | |||
Suppose the operating environments of the client and server are | Suppose the operating environments of the client and server are | |||
implementing the access() semantics for privileged users differently, | implementing the access() semantics for privileged users differently, | |||
skipping to change at page 412, line 36 | skipping to change at page 412, line 36 | |||
the user is privileged, and no execute permission bits are set on | the user is privileged, and no execute permission bits are set on | |||
the regular file's attribute, and the server's access() interface | the regular file's attribute, and the server's access() interface | |||
does return X_OK in that situation. Then: | does return X_OK in that situation. Then: | |||
* The client will be able to execute files stored on the NFS | * The client will be able to execute files stored on the NFS | |||
server that could be executed if stored on a non-NFS file | server that could be executed if stored on a non-NFS file | |||
system, unless the client's execution subsystem also checks for | system, unless the client's execution subsystem also checks for | |||
execute permission bits. | execute permission bits. | |||
* Even if the execution subsystem is checking for execute | * Even if the execution subsystem is checking for execute | |||
permission bits, there are more potential issues. E.g. suppose | permission bits, there are more potential issues. For example, | |||
the client is invoking access() to build a "path search table" | suppose the client is invoking access() to build a "path search | |||
of all executable files in the user's "search path", where the | table" of all executable files in the user's "search path", | |||
path is a list of directories each containing executable files. | where the path is a list of directories each containing | |||
Suppose there are two files each in separate directories of the | executable files. Suppose there are two files each in separate | |||
search path, such that files have the same component name. In | directories of the search path, such that files have the same | |||
the first directory the file has no execute permission bits | component name. In the first directory the file has no execute | |||
set, and in the second directory the file has execute bits set. | permission bits set, and in the second directory the file has | |||
The path search table will indicate that the first directory | execute bits set. The path search table will indicate that the | |||
has the executable file, but the execute subsystem will fail to | first directory has the executable file, but the execute | |||
execute it. The command shell might fail to try the second | subsystem will fail to execute it. The command shell might | |||
file in the second directory. And even if it did, this is a | fail to try the second file in the second directory. And even | |||
potential performance issue. Clearly the desired outcome for | if it did, this is a potential performance issue. Clearly, the | |||
the client is for the path search table to not contain the | desired outcome for the client is for the path search table to | |||
first file. | not contain the first file. | |||
To deal the problems described above, the smart client, stupid server | To deal with the problems described above, the "smart client, stupid | |||
principle is used. The client owns overall responsibility for | server" principle is used. The client owns overall responsibility | |||
determining execute access and relies on the server to parse the | for determining execute access and relies on the server to parse the | |||
execution permissions within the file's mode, acl, and dacl | execution permissions within the file's mode, acl, and dacl | |||
attributes. The rules for the client and server follow: | attributes. The rules for the client and server follow: | |||
o If the client is sending ACCESS in order to determine if the user | o If the client is sending ACCESS in order to determine if the user | |||
can read the file, the client SHOULD set ACCESS4_READ in the | can read the file, the client SHOULD set ACCESS4_READ in the | |||
request's access field. | request's access field. | |||
o If the client's operating environment only grants execution to the | o If the client's operating environment only grants execution to the | |||
user if the user has execute access according to the execute | user if the user has execute access according to the execute | |||
permissions in the mode, acl, and dacl attributes, then if the | permissions in the mode, acl, and dacl attributes, then if the | |||
skipping to change at page 413, line 33 | skipping to change at page 413, line 33 | |||
the client wants to determine execute access, it SHOULD send an | the client wants to determine execute access, it SHOULD send an | |||
ACCESS request with both the ACCESS4_EXECUTE and ACCESS4_READ bits | ACCESS request with both the ACCESS4_EXECUTE and ACCESS4_READ bits | |||
set in the request's access field. This way, if any read or | set in the request's access field. This way, if any read or | |||
execute permission grants the user read or execute access (or if | execute permission grants the user read or execute access (or if | |||
the server interprets the user as privileged), as indicated by the | the server interprets the user as privileged), as indicated by the | |||
presence of ACCESS4_EXECUTE and/or ACCESS4_READ in the reply's | presence of ACCESS4_EXECUTE and/or ACCESS4_READ in the reply's | |||
access field, the client will be able to grant the user execute | access field, the client will be able to grant the user execute | |||
access to the file. | access to the file. | |||
o If the server supports execute permission bits, or some other | o If the server supports execute permission bits, or some other | |||
method for denoting executability (e.g. the suffix of the name of | method for denoting executability (e.g., the suffix of the name of | |||
the file might indicate execute), it MUST check only execute | the file might indicate execute), it MUST check only execute | |||
permissions, not read permissions, when determining whether the | permissions, not read permissions, when determining whether or not | |||
reply will have ACCESS4_EXECUTE set in the access field or not. | the reply will have ACCESS4_EXECUTE set in the access field. The | |||
The server MUST NOT also examine read permission bits when | server MUST NOT also examine read permission bits when determining | |||
determining whether the reply will have ACCESS4_EXECUTE set in the | whether or not the reply will have ACCESS4_EXECUTE set in the | |||
access field or not. Even if the server's operating environment | access field. Even if the server's operating environment would | |||
would grant execute access to the user (e.g., the user is | grant execute access to the user (e.g., the user is privileged), | |||
privileged), the server MUST NOT reply with ACCESS4_EXECUTE set in | the server MUST NOT reply with ACCESS4_EXECUTE set in the reply's | |||
the reply's access field, unless there is at least one execute | access field unless there is at least one execute permission bit | |||
permission bit set in the mode, acl, or dacl attributes. In the | set in the mode, acl, or dacl attributes. In the case of acl and | |||
case of acl and dacl, the "one execute permission bit" MUST be an | dacl, the "one execute permission bit" MUST be an ACE4_EXECUTE bit | |||
ACE4_EXECUTE bit set in an ALLOW ACE. | set in an ALLOW ACE. | |||
o If the server does not support execute permission bits or some | o If the server does not support execute permission bits or some | |||
other method for denoting executability, it MUST NOT set | other method for denoting executability, it MUST NOT set | |||
ACCESS4_EXECUTE in the reply's supported and access fields. If | ACCESS4_EXECUTE in the reply's supported and access fields. If | |||
the client set ACCESS4_EXECUTE in the ACCESS request's access | the client set ACCESS4_EXECUTE in the ACCESS request's access | |||
field, and ACCESS4_EXECUTE is not set in the reply's supported | field, and ACCESS4_EXECUTE is not set in the reply's supported | |||
field, then the client will have to send an ACCESS request with | field, then the client will have to send an ACCESS request with | |||
the ACCESS4_READ bit set in the request's access field. | the ACCESS4_READ bit set in the request's access field. | |||
o If the server supports read permission bits, it MUST only check | o If the server supports read permission bits, it MUST only check | |||
for read permissions in the mode, acl, and dacl attributes when it | for read permissions in the mode, acl, and dacl attributes when it | |||
receives an ACCESS request with ACCESS4_READ set the access field. | receives an ACCESS request with ACCESS4_READ set in the access | |||
The server MUST NOT also examine execute permission bits when | field. The server MUST NOT also examine execute permission bits | |||
determining whether the reply will have ACCESS4_READ set in the | when determining whether the reply will have ACCESS4_READ set in | |||
access field or not. | the access field or not. | |||
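The server-side rule for ACCESS4_EXECUTE can be sketched as follows. This is a minimal illustration, not a server implementation: the helper names and the representation of an ACL as a list of (ace_type, access_mask) tuples are assumptions made for the example, and the check shown is only the "at least one execute permission bit" gate, not a full per-user permission evaluation.

```python
# Bit values as defined by the NFSv4.1 protocol.
ACCESS4_READ = 0x00000001
ACCESS4_EXECUTE = 0x00000020
MODE4_XUSR, MODE4_XGRP, MODE4_XOTH = 0x040, 0x008, 0x001
ACE4_ACCESS_ALLOWED_ACE_TYPE = 0x00000000
ACE4_EXECUTE = 0x00000020


def any_execute_bit(mode, aces):
    """Return True if mode, or an ALLOW ACE, carries an execute bit.

    `aces` is a hypothetical list of (ace_type, access_mask) tuples
    standing in for the acl and dacl attributes.
    """
    if mode & (MODE4_XUSR | MODE4_XGRP | MODE4_XOTH):
        return True
    return any(
        ace_type == ACE4_ACCESS_ALLOWED_ACE_TYPE and (mask & ACE4_EXECUTE)
        for ace_type, mask in aces
    )


def filter_execute(env_granted, mode, aces):
    """Clear ACCESS4_EXECUTE from what the operating environment would
    grant (for example, to a privileged user) when no execute
    permission bit is set anywhere on the file."""
    if not any_execute_bit(mode, aces):
        return env_granted & ~ACCESS4_EXECUTE
    return env_granted
```

Read permission bits play no part in the execute decision here, which mirrors the requirement that the two checks stay independent.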
Note that if the ACCESS reply has ACCESS4_READ or ACCESS4_EXECUTE set, | Note that if the ACCESS reply has ACCESS4_READ or ACCESS4_EXECUTE set, | |||
then the user also has permissions to OPEN (Section 18.16) or READ | then the user also has permissions to OPEN (Section 18.16) or READ | |||
(Section 18.22) the file. I.e., if client sends an ACCESS request | (Section 18.22) the file. In other words, if the client sends an | |||
with the ACCESS4_READ and ACCESS4_EXECUTE set in the access field (or | ACCESS request with the ACCESS4_READ and ACCESS4_EXECUTE set in the | |||
two separate requests, one with ACCESS4_READ set, and the other with | access field (or two separate requests, one with ACCESS4_READ set and | |||
ACCESS4_EXECUTE set), and the reply has just ACCESS4_EXECUTE set in | the other with ACCESS4_EXECUTE set), and the reply has just | |||
the access field (or just one reply has ACCESS4_EXECUTE set), then | ACCESS4_EXECUTE set in the access field (or just one reply has | |||
the user has authorization to OPEN or READ the file. | ACCESS4_EXECUTE set), then the user has authorization to OPEN or READ | |||
the file. | ||||
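On the client side, the rules above might be applied as in the sketch below. The function names are hypothetical. The example assumes a client that probes with both bits set and falls back to an ACCESS4_READ request when the server does not report ACCESS4_EXECUTE in the reply's supported field.

```python
ACCESS4_READ = 0x00000001
ACCESS4_EXECUTE = 0x00000020


def execute_probe_mask():
    """Access mask for a client probing execute access with both bits."""
    return ACCESS4_EXECUTE | ACCESS4_READ


def grants_open_or_read(reply_access):
    """Either bit in the reply's access field authorizes OPEN or READ,
    so a client using the two-bit probe can grant execute access."""
    return bool(reply_access & (ACCESS4_READ | ACCESS4_EXECUTE))


def fallback_request(requested, reply_supported):
    """Return the follow-up access mask to send, or None if no
    follow-up is needed. A follow-up is needed when ACCESS4_EXECUTE
    was requested but is absent from the reply's supported field."""
    if (requested & ACCESS4_EXECUTE) and not (reply_supported & ACCESS4_EXECUTE):
        return ACCESS4_READ
    return None
```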
18.1.4. IMPLEMENTATION | 18.1.4. IMPLEMENTATION | |||
In general, it is not sufficient for the client to attempt to deduce | In general, it is not sufficient for the client to attempt to deduce | |||
access permissions by inspecting the uid, gid, and mode fields in the | access permissions by inspecting the uid, gid, and mode fields in the | |||
file attributes or by attempting to interpret the contents of the ACL | file attributes or by attempting to interpret the contents of the ACL | |||
attribute. This is because the server may perform uid or gid mapping | attribute. This is because the server may perform uid or gid mapping | |||
or enforce additional access control restrictions. It is also | or enforce additional access-control restrictions. It is also | |||
possible that the server may not be in the same ID space as the | possible that the server may not be in the same ID space as the | |||
client. In these cases (and perhaps others), the client cannot | client. In these cases (and perhaps others), the client cannot | |||
reliably perform an access check with only current file attributes. | reliably perform an access check with only current file attributes. | |||
In the NFSv2 protocol, the only reliable way to determine whether an | In the NFSv2 protocol, the only reliable way to determine whether an | |||
operation was allowed was to try it and see if it succeeded or | operation was allowed was to try it and see if it succeeded or | |||
failed. Using the ACCESS operation in the NFSv4.1 protocol, the | failed. Using the ACCESS operation in the NFSv4.1 protocol, the | |||
client can ask the server to indicate whether or not one or more | client can ask the server to indicate whether or not one or more | |||
classes of operations are permitted. The ACCESS operation is | classes of operations are permitted. The ACCESS operation is | |||
provided to allow clients to check before doing a series of | provided to allow clients to check before doing a series of | |||
operations which will result in an access failure. The OPEN | operations that will result in an access failure. The OPEN operation | |||
operation provides a point where the server can verify access to the | provides a point where the server can verify access to the file | |||
file object and method to return that information to the client. The | object and a method to return that information to the client. The | |||
ACCESS operation is still useful for directory operations or for use | ACCESS operation is still useful for directory operations or for use | |||
in the case the UNIX interface access() is used on the client. | in the case that the UNIX interface access() is used on the client. | |||
The information returned by the server in response to an ACCESS call | The information returned by the server in response to an ACCESS call | |||
is not permanent. It was correct at the exact time that the server | is not permanent. It was correct at the exact time that the server | |||
performed the checks, but not necessarily afterwards. The server can | performed the checks, but not necessarily afterwards. The server can | |||
revoke access permission at any time. | revoke access permission at any time. | |||
The client should use the effective credentials of the user to build | The client should use the effective credentials of the user to build | |||
the authentication information in the ACCESS request used to | the authentication information in the ACCESS request used to | |||
determine access rights. It is the effective user and group | determine access rights. It is the effective user and group | |||
credentials that are used in subsequent read and write operations. | credentials that are used in subsequent READ and WRITE operations. | |||
Many implementations do not directly support the ACCESS4_DELETE | Many implementations do not directly support the ACCESS4_DELETE | |||
permission. Operating systems like UNIX will ignore the | permission. Operating systems like UNIX will ignore the | |||
ACCESS4_DELETE bit if set on an access request on a non-directory | ACCESS4_DELETE bit if set on an access request on a non-directory | |||
object. In these systems, delete permission on a file is determined | object. In these systems, delete permission on a file is determined | |||
by the access permissions on the directory in which the file resides, | by the access permissions on the directory in which the file resides, | |||
instead of being determined by the permissions of the file itself. | instead of being determined by the permissions of the file itself. | |||
Therefore, the mask returned enumerating which access rights can be | Therefore, the mask returned enumerating which access rights can be | |||
determined will have the ACCESS4_DELETE value set to 0. This | determined will have the ACCESS4_DELETE value set to 0. This | |||
indicates to the client that the server was unable to check that | indicates to the client that the server was unable to check that | |||
skipping to change at page 415, line 44 | skipping to change at page 415, line 46 | |||
stateid4 open_stateid; | stateid4 open_stateid; | |||
default: | default: | |||
void; | void; | |||
}; | }; | |||
18.2.3. DESCRIPTION | 18.2.3. DESCRIPTION | |||
The CLOSE operation releases share reservations for the regular or | The CLOSE operation releases share reservations for the regular or | |||
named attribute file as specified by the current filehandle. The | named attribute file as specified by the current filehandle. The | |||
share reservations and other state information released at the server | share reservations and other state information released at the server | |||
as a result of this CLOSE is only that associated with the supplied | as a result of this CLOSE are only those associated with the supplied | |||
stateid. State associated with other OPENs is not affected. | stateid. State associated with other OPENs is not affected. | |||
If byte-range locks are held, the client SHOULD release all locks | If byte-range locks are held, the client SHOULD release all locks | |||
before sending a CLOSE. The server MAY free all outstanding locks on | before sending a CLOSE. The server MAY free all outstanding locks on | |||
CLOSE but some servers may not support the CLOSE of a file that still | CLOSE, but some servers may not support the CLOSE of a file that | |||
has byte-range locks held. The server MUST return failure if any | still has byte-range locks held. The server MUST return failure if | |||
locks would exist after the CLOSE. | any locks would exist after the CLOSE. | |||
The argument seqid MAY have any value and the server MUST ignore | The argument seqid MAY have any value, and the server MUST ignore | |||
seqid. | seqid. | |||
On success, the current filehandle retains its value. | On success, the current filehandle retains its value. | |||
The server MAY require that the principal, security flavor, and | The server MAY require that the combination of principal, security | |||
applicable, the GSS mechanism, combination that sent the OPEN request | flavor, and, if applicable, GSS mechanism that sent the OPEN request | |||
also be the one to CLOSE the file. This might not be possible if | also be the one to CLOSE the file. This might not be possible if | |||
credentials for the principal are no longer available. The server | credentials for the principal are no longer available. The server | |||
MAY allow the machine credential or SSV credential (see | MAY allow the machine credential or SSV credential (see | |||
Section 18.35) to send CLOSE. | Section 18.35) to send CLOSE. | |||
18.2.4. IMPLEMENTATION | 18.2.4. IMPLEMENTATION | |||
Even though CLOSE returns a stateid, this stateid is not useful to | Even though CLOSE returns a stateid, this stateid is not useful to | |||
the client and should be treated as deprecated. CLOSE "shuts down" | the client and should be treated as deprecated. CLOSE "shuts down" | |||
the state associated with all OPENs for the file by a single open- | the state associated with all OPENs for the file by a single open- | |||
owner. As noted above, CLOSE will either release all file locking | owner. As noted above, CLOSE will either release all file-locking | |||
state or return an error. Therefore, the stateid returned by CLOSE | state or return an error. Therefore, the stateid returned by CLOSE | |||
is not useful for operations that follow. To help find any uses of | is not useful for operations that follow. To help find any uses of | |||
this stateid by clients, the server SHOULD return the invalid special | this stateid by clients, the server SHOULD return the invalid special | |||
stated (the "other" value is zero and the "seqid" field is | stateid (the "other" value is zero and the "seqid" field is | |||
NFS4_UINT32_MAX, see Section 8.2.3). | NFS4_UINT32_MAX, see Section 8.2.3). | |||
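A client that wants to detect the invalid special stateid returned by CLOSE could use a check like the one sketched here. The function name is hypothetical, and the stateid is assumed to have been decoded into its "seqid" (a 32-bit unsigned integer) and "other" (a 12-byte opaque) fields.

```python
NFS4_UINT32_MAX = 0xFFFFFFFF
NFS4_OTHER_SIZE = 12  # length in bytes of the stateid4 "other" field


def is_invalid_special_stateid(seqid, other):
    """True when "other" is all zeros and "seqid" is NFS4_UINT32_MAX."""
    return seqid == NFS4_UINT32_MAX and other == bytes(NFS4_OTHER_SIZE)
```

A client might log or assert on this condition wherever a stateid is about to be reused, which helps find accidental use of the value returned by CLOSE.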
A CLOSE operation may make delegations grantable where they were not | A CLOSE operation may make delegations grantable where they were not | |||
previously. Servers may choose to respond immediately if there are | previously. Servers may choose to respond immediately if there are | |||
pending delegation want requests or may respond to the situation at a | pending delegation want requests or may respond to the situation at a | |||
later time. | later time. | |||
18.3. Operation 5: COMMIT - Commit Cached Data | 18.3. Operation 5: COMMIT - Commit Cached Data | |||
18.3.1. ARGUMENTS | 18.3.1. ARGUMENTS | |||
skipping to change at page 417, line 22 | skipping to change at page 417, line 22 | |||
case NFS4_OK: | case NFS4_OK: | |||
COMMIT4resok resok4; | COMMIT4resok resok4; | |||
default: | default: | |||
void; | void; | |||
}; | }; | |||
18.3.3. DESCRIPTION | 18.3.3. DESCRIPTION | |||
The COMMIT operation forces or flushes uncommitted, modified data to | The COMMIT operation forces or flushes uncommitted, modified data to | |||
stable storage for the file specified by the current filehandle. The | stable storage for the file specified by the current filehandle. The | |||
flushed data is that which was previously written with a WRITE | flushed data is that which was previously written with one or more | |||
operation which had the stable field set to UNSTABLE4. | WRITE operations that had the "committed" field of their results | |||
set to UNSTABLE4. | ||||
The offset specifies the position within the file where the flush is | The offset specifies the position within the file where the flush is | |||
to begin. An offset value of 0 (zero) means to flush data starting | to begin. An offset value of zero means to flush data starting at | |||
at the beginning of the file. The count specifies the number of | the beginning of the file. The count specifies the number of bytes | |||
bytes of data to flush. If count is 0 (zero), a flush from offset to | of data to flush. If the count is zero, a flush from the offset to | |||
the end of the file is done. | the end of the file is done. | |||
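The offset and count semantics can be illustrated with a small helper. The function name and the use of None to mean "through the end of the file" are conventions chosen for this sketch only.

```python
def commit_range(offset, count):
    """Translate COMMIT arguments into a half-open byte range.

    Returns (start, end), where end is None when the flush runs to
    the end of the file (that is, when count is zero).
    """
    if count == 0:
        return (offset, None)
    return (offset, offset + count)
```

With these conventions, `commit_range(0, 0)` describes a full-file flush and `commit_range(4096, 8192)` describes bytes 4096 up to, but not including, 12288.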
The server returns a write verifier upon successful completion of the | The server returns a write verifier upon successful completion of the | |||
COMMIT. The write verifier is used by the client to determine if the | COMMIT. The write verifier is used by the client to determine if the | |||
server has restarted between the initial WRITE(s) and the COMMIT. | server has restarted between the initial WRITE operations and the | |||
The client does this by comparing the write verifier returned from | COMMIT. The client does this by comparing the write verifier | |||
the initial writes and the verifier returned by the COMMIT operation. | returned from the initial WRITE operations and the verifier returned | |||
The server must vary the value of the write verifier at each server | by the COMMIT operation. The server must vary the value of the write | |||
event or instantiation that may lead to a loss of uncommitted data. | verifier at each server event or instantiation that may lead to a | |||
Most commonly this occurs when the server is restarted; however, | loss of uncommitted data. Most commonly this occurs when the server | |||
other events at the server may result in uncommitted data loss as | is restarted; however, other events at the server may result in | |||
well. | uncommitted data loss as well. | |||
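The verifier comparison described above reduces to an equality test on two opaque values. A minimal sketch, with a hypothetical function name and the verifiers assumed to be available as 8-byte strings:

```python
NFS4_VERIFIER_SIZE = 8  # length in bytes of a write verifier


def uncommitted_data_may_be_lost(write_verifier, commit_verifier):
    """Compare the verifier returned by the initial WRITE operations
    with the one returned by COMMIT. A mismatch means the server may
    have lost uncommitted data (for example, because it restarted)."""
    if len(write_verifier) != NFS4_VERIFIER_SIZE or \
            len(commit_verifier) != NFS4_VERIFIER_SIZE:
        raise ValueError("write verifiers are 8 bytes long")
    return write_verifier != commit_verifier
```

The client attaches no meaning to the verifier's contents. Only equality with the previously returned value matters.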
On success, the current filehandle retains its value. | On success, the current filehandle retains its value. | |||
18.3.4. IMPLEMENTATION | 18.3.4. IMPLEMENTATION | |||
The COMMIT operation is similar in operation and semantics to the | The COMMIT operation is similar in operation and semantics to the | |||
POSIX fsync() [25] system interface that synchronizes a file's state | POSIX fsync() [25] system interface that synchronizes a file's state | |||
with the disk (file data and metadata are flushed to disk or stable | with the disk (file data and metadata are flushed to disk or stable | |||
storage). COMMIT performs the same operation for a client, flushing | storage). COMMIT performs the same operation for a client, flushing | |||
any unsynchronized data and metadata on the server to the server's | any unsynchronized data and metadata on the server to the server's | |||
disk or stable storage for the specified file. Like fsync(2), it may | disk or stable storage for the specified file. Like fsync(), it may | |||
be that there is some modified data or no modified data to | be that there is some modified data or no modified data to | |||
synchronize. The data may have been synchronized by the server's | synchronize. The data may have been synchronized by the server's | |||
normal periodic buffer synchronization activity. COMMIT should | normal periodic buffer synchronization activity. COMMIT should | |||
return NFS4_OK, unless there has been an unexpected error. | return NFS4_OK, unless there has been an unexpected error. | |||
COMMIT differs from fsync(2) in that it is possible for the client to | COMMIT differs from fsync() in that it is possible for the client to | |||
flush a range of the file (most likely triggered by a buffer- | flush a range of the file (most likely triggered by a buffer- | |||
reclamation scheme on the client before file has been completely | reclamation scheme on the client before the file has been completely | |||
written). | written). | |||
The server implementation of COMMIT is reasonably simple. If the | The server implementation of COMMIT is reasonably simple. If the | |||
server receives a full file COMMIT request, that is starting at | server receives a full file COMMIT request, that is, starting at | |||
offset 0 and count 0, it should do the equivalent of fsync()'ing the | offset zero and count zero, it should do the equivalent of applying | |||
file. Otherwise, it should arrange to have the modified data in the | fsync() to the entire file. Otherwise, it should arrange to have the | |||
range specified by offset and count flushed to stable storage. | modified data in the range specified by offset and count | |||
In both cases, any metadata associated with the file must be flushed | flushed to stable storage. In both cases, any metadata associated | |||
to stable storage before returning. It is not an error for there to | with the file must be flushed to stable storage before returning. It | |||
be nothing to flush on the server. This means that the data and | is not an error for there to be nothing to flush on the server. This | |||
metadata that needed to be flushed have already been flushed or lost | means that the data and metadata that needed to be flushed have | |||
during the last server failure. | already been flushed or lost during the last server failure. | |||
The client implementation of COMMIT is a little more complex. There | The client implementation of COMMIT is a little more complex. There | |||
are two reasons for wanting to commit a client buffer to stable | are two reasons for wanting to commit a client buffer to stable | |||
storage. The first is that the client wants to reuse a buffer. In | storage. The first is that the client wants to reuse a buffer. In | |||
this case, the offset and count of the buffer are sent to the server | this case, the offset and count of the buffer are sent to the server | |||
in the COMMIT request. The server then flushes any modified data | in the COMMIT request. The server then flushes any modified data | |||
based on the offset and count, and flushes any modified metadata | based on the offset and count, and flushes any modified metadata | |||
associated with the file. It then returns the status of the flush | associated with the file. It then returns the status of the flush | |||
and the write verifier. The other reason for the client to generate | and the write verifier. The second reason for the client to generate | |||
a COMMIT is for a full file flush, such as may be done at close. In | a COMMIT is for a full file flush, such as may be done at close. In | |||
this case, the client would gather all of the buffers for this file | this case, the client would gather all of the buffers for this file | |||
that contain uncommitted data, do the COMMIT operation with an offset | that contain uncommitted data, do the COMMIT operation with an offset | |||
of 0 and count of 0, and then free all of those buffers. Any other | of zero and count of zero, and then free all of those buffers. Any | |||
dirty buffers would be sent to the server in the normal fashion. | other dirty buffers would be sent to the server in the normal | |||
fashion. | ||||
After a buffer is written by the client with the stable parameter set | After a buffer is written (via the WRITE operation) by the client | |||
to UNSTABLE4, the buffer must be considered as modified by the client | with the "committed" field in the result of WRITE set to UNSTABLE4, | |||
until the buffer has either been flushed via a COMMIT operation or | the buffer must be considered as modified by the client until the | |||
written via a WRITE operation with stable parameter set to FILE_SYNC4 | buffer has either been flushed via a COMMIT operation or written via | |||
or DATA_SYNC4. This is done to prevent the buffer from being freed | a WRITE operation with the "committed" field in the result set to | |||
and reused before the data can be flushed to stable storage on the | FILE_SYNC4 or DATA_SYNC4. This is done to prevent the buffer from | |||
server. | being freed and reused before the data can be flushed to stable | |||
storage on the server. | ||||
When a response is returned from either a WRITE or a COMMIT operation | When a response is returned from either a WRITE or a COMMIT operation | |||
and it contains a write verifier that is different than previously | and it contains a write verifier that differs from that previously | |||
returned by the server, the client will need to retransmit all of the | returned by the server, the client will need to retransmit all of the | |||
buffers containing uncommitted data to the server. How this is to be | buffers containing uncommitted data to the server. How this is to be | |||
done is up to the implementor. If there is only one buffer of | done is up to the implementor. If there is only one buffer of | |||
interest, then it should sent in a WRITE request with the FILE_SYNC4 | interest, then it should be sent in a WRITE request with the | |||
stable parameter. If there is more than one buffer, it might be | FILE_SYNC4 stable parameter. If there is more than one buffer, it | |||
worthwhile retransmitting all of the buffers in WRITE requests with | might be worthwhile retransmitting all of the buffers in WRITE | |||
the stable parameter set to UNSTABLE4 and then retransmitting the | operations with the stable parameter set to UNSTABLE4 and then | |||
COMMIT operation to flush all of the data on the server to stable | retransmitting the COMMIT operation to flush all of the data on the | |||
storage. However, if the server repeatedly returns from COMMIT a | server to stable storage. However, if the server repeatedly returns | |||
verifier that differs from that returned by WRITE, the only way to | from COMMIT a verifier that differs from that returned by WRITE, the | |||
ensure progress is to retransmit all of the buffers with WRITE | only way to ensure progress is to retransmit all of the buffers with | |||
requests with the FILE_SYNC4 stable parameter. | WRITE requests with the FILE_SYNC4 stable parameter. | |||
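The client-side bookkeeping described in this section might look like the sketch below. The class and method names are hypothetical. The sketch tracks buffers by file offset, assumes that every COMMIT is a full-file COMMIT, and leaves the actual retransmission to the caller.

```python
# stable_how4 values as defined by the protocol.
UNSTABLE4, DATA_SYNC4, FILE_SYNC4 = 0, 1, 2


class UncommittedBuffers:
    """Hold buffers written UNSTABLE4 until a COMMIT confirms them."""

    def __init__(self):
        self.verifier = None  # last write verifier seen from the server
        self.pending = {}     # offset -> data not yet on stable storage

    def _verifier_changed(self, verifier):
        changed = self.verifier is not None and verifier != self.verifier
        self.verifier = verifier
        return changed

    def on_write_reply(self, offset, data, committed, verifier):
        """Record a WRITE result; return buffers needing retransmission."""
        stale = dict(self.pending) if self._verifier_changed(verifier) else {}
        if committed == UNSTABLE4:
            self.pending[offset] = data     # must not be freed yet
        else:
            self.pending.pop(offset, None)  # DATA_SYNC4 or FILE_SYNC4
        return stale

    def on_commit_reply(self, verifier):
        """Record a COMMIT result; return buffers needing retransmission."""
        if self._verifier_changed(verifier):
            return dict(self.pending)       # keep holding them
        self.pending.clear()                # safe to free and reuse
        return {}
```

If a caller finds that COMMIT keeps returning a verifier that differs from the one returned by WRITE, it can retransmit the returned buffers with FILE_SYNC4 instead of UNSTABLE4, which removes them from the pending set without a further COMMIT.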
The above description applies to page-cache-based systems as well as | The above description applies to page-cache-based systems as well as | |||
buffer-cache-based systems. In those systems, the virtual memory | buffer-cache-based systems. In the former systems, the virtual | |||
system will need to be modified instead of the buffer cache. | memory system will need to be modified instead of the buffer cache. | |||
18.4. Operation 6: CREATE - Create a Non-Regular File Object | 18.4. Operation 6: CREATE - Create a Non-Regular File Object | |||
18.4.1. ARGUMENTS | 18.4.1. ARGUMENTS | |||
union createtype4 switch (nfs_ftype4 type) { | union createtype4 switch (nfs_ftype4 type) { | |||
case NF4LNK: | case NF4LNK: | |||
linktext4 linkdata; | linktext4 linkdata; | |||
case NF4BLK: | case NF4BLK: | |||
case NF4CHR: | case NF4CHR: | |||
skipping to change at page 420, line 47 | skipping to change at page 420, line 47 | |||
If an object of the same name already exists in the directory, the | If an object of the same name already exists in the directory, the | |||
server will return the error NFS4ERR_EXIST. | server will return the error NFS4ERR_EXIST. | |||
For the directory where the new file object was created, the server | For the directory where the new file object was created, the server | |||
returns change_info4 information in cinfo. With the atomic field of | returns change_info4 information in cinfo. With the atomic field of | |||
the change_info4 data type, the server will indicate if the before | the change_info4 data type, the server will indicate if the before | |||
and after change attributes were obtained atomically with respect to | and after change attributes were obtained atomically with respect to | |||
the file object creation. | the file object creation. | |||
If the objname has a length of 0 (zero), or if objname does not obey | If the objname has a length of zero, or if objname does not obey the | |||
the UTF-8 definition, the error NFS4ERR_INVAL will be returned. | UTF-8 definition, the error NFS4ERR_INVAL will be returned. | |||
The current filehandle is replaced by that of the new object. | The current filehandle is replaced by that of the new object. | |||
The createattrs specifies the initial set of attributes for the | The createattrs specifies the initial set of attributes for the | |||
object. The set of attributes may include any writable attribute | object. The set of attributes may include any writable attribute | |||
valid for the object type. When the operation is successful, the | valid for the object type. When the operation is successful, the | |||
server will return to the client an attribute mask signifying which | server will return to the client an attribute mask signifying which | |||
attributes were successfully set for the object. | attributes were successfully set for the object. | |||
If createattrs includes neither the owner attribute nor an ACL with | If createattrs includes neither the owner attribute nor an ACL with | |||
an ACE for the owner, and if the server's file system both supports | an ACE for the owner, and if the server's file system both supports | |||
and requires an owner attribute (or an owner ACE) then the server | and requires an owner attribute (or an owner ACE), then the server | |||
MUST derive the owner (or the owner ACE). This would typically be | MUST derive the owner (or the owner ACE). This would typically be | |||
from the principal indicated in the RPC credentials of the call, but | from the principal indicated in the RPC credentials of the call, but | |||
the server's operating environment or file system semantics may | the server's operating environment or file system semantics may | |||
dictate other methods of derivation. Similarly, if createattrs | dictate other methods of derivation. Similarly, if createattrs | |||
includes neither the group attribute nor a group ACE, and if the | includes neither the group attribute nor a group ACE, and if the | |||
server's file system both supports and requires the notion of a group | server's file system both supports and requires the notion of a group | |||
attribute (or group ACE), the server MUST derive the group attribute | attribute (or group ACE), the server MUST derive the group attribute | |||
(or the corresponding owner ACE) for the file. This could be from | (or the corresponding owner ACE) for the file. This could be from | |||
the RPC call's credentials, such as the group principal if the | the RPC call's credentials, such as the group principal if the | |||
credentials include it (such as with AUTH_SYS), from the group | credentials include it (such as with AUTH_SYS), from the group | |||
identifier associated with the principal in the credentials (e.g., | identifier associated with the principal in the credentials (e.g., | |||
POSIX systems have a user database [26] that has a group identifier | POSIX systems have a user database [26] that has a group identifier | |||
for every user identifier), inherited from directory the object is | for every user identifier), inherited from the directory in which the | |||
created in, or whatever else the server's operating environment or | object is created, or whatever else the server's operating | |||
file system semantics dictate. This applies to the OPEN operation | environment or file system semantics dictate. This applies to the | |||
too. | OPEN operation too. | |||
Conversely, it is possible the client will specify in createattrs an | Conversely, it is possible that the client will specify in | |||
owner attribute, group attribute, or ACL that the principal indicated | createattrs an owner attribute, group attribute, or ACL that the | |||
in the RPC call's credentials does not have permissions to create files | principal indicated in the RPC call's credentials does not have | |||
for. The error to be returned in this instance is NFS4ERR_PERM. | permissions to create files for. The error to be returned in this | |||
This applies to the OPEN operation too. | instance is NFS4ERR_PERM. This applies to the OPEN operation too. | |||
If the current filehandle designates a directory for which another | If the current filehandle designates a directory for which another | |||
client holds a directory delegation, then, unless the delegation is | client holds a directory delegation, then, unless the delegation is | |||
such that the situation can be resolved by sending a notification, | such that the situation can be resolved by sending a notification, | |||
the delegation MUST be recalled, and the CREATE operation MUST NOT | the delegation MUST be recalled, and the CREATE operation MUST NOT | |||
proceed until the delegation is returned or revoked. Except where | proceed until the delegation is returned or revoked. Except where | |||
this happens very quickly, one or more NFS4ERR_DELAY errors will be | this happens very quickly, one or more NFS4ERR_DELAY errors will be | |||
returned to requests made while the delegation remains outstanding. | returned to requests made while the delegation remains outstanding. | |||
When the current filehandle designates a directory for which one or | When the current filehandle designates a directory for which one or | |||
skipping to change at page 422, line 27 | skipping to change at page 422, line 27 | |||
}; | }; | |||
18.5.2. RESULTS | 18.5.2. RESULTS | |||
struct DELEGPURGE4res { | struct DELEGPURGE4res { | |||
nfsstat4 status; | nfsstat4 status; | |||
}; | }; | |||
18.5.3. DESCRIPTION | 18.5.3. DESCRIPTION | |||
Purges all of the delegations awaiting recovery for a given client. | This operation purges all of the delegations awaiting recovery for a | |||
This is useful for clients which do not commit delegation information | given client. This is useful for clients that do not commit | |||
to stable storage to indicate that conflicting requests need not be | delegation information to stable storage to indicate that conflicting | |||
delayed by the server awaiting recovery of delegation information. | requests need not be delayed by the server awaiting recovery of | |||
delegation information. | ||||
The client is NOT specified by the clientid field of the request. | The client is NOT specified by the clientid field of the request. | |||
The client SHOULD set the client field to zero and the server MUST | The client SHOULD set the client field to zero, and the server MUST | |||
ignore the clientid field. Instead the server MUST derive the client | ignore the clientid field. Instead, the server MUST derive the | |||
ID from the value of the session ID in the arguments of the SEQUENCE | client ID from the value of the session ID in the arguments of the | |||
operation that precedes DELEGPURGE in the COMPOUND request. | SEQUENCE operation that precedes DELEGPURGE in the COMPOUND request. | |||
This operation should be used by clients that record delegation | The DELEGPURGE operation should be used by clients that record | |||
information on stable storage on the client. In this case, | delegation information on stable storage on the client. In this | |||
DELEGPURGE should be sent immediately after doing delegation recovery | case, after the client recovers all delegations it knows of, it | |||
on all delegations known to the client. Doing so will notify the | should immediately send a DELEGPURGE operation. Doing so will notify | |||
server that no additional delegations for the client will be | the server that no additional delegations for the client will be | |||
recovered, allowing it to free resources and avoid delaying other | recovered, allowing it to free resources and avoid delaying other | |||
clients that make requests that conflict with the unrecovered | clients that make requests that conflict with the unrecovered | |||
delegations. The set of delegations known to the server and the | delegations. The set of delegations known to the server and the | |||
client may be different. The reason for this is that a client may | client might be different. The reason for this is that after sending | |||
fail after making a request which resulted in delegation but before | a request that resulted in a delegation, the client might experience | |||
it received the results and committed them to the client's stable | a failure before it both received the delegation and committed the | |||
storage. | delegation to the client's stable storage. | |||
The server MAY support DELEGPURGE, but if it does not, it MUST NOT | The server MAY support DELEGPURGE, but if it does not, it MUST NOT | |||
support CLAIM_DELEGATE_PREV. | support CLAIM_DELEGATE_PREV and MUST NOT support CLAIM_DELEG_PREV_FH. | |||
18.6. Operation 8: DELEGRETURN - Return Delegation | 18.6. Operation 8: DELEGRETURN - Return Delegation | |||
18.6.1. ARGUMENTS | 18.6.1. ARGUMENTS | |||
struct DELEGRETURN4args { | struct DELEGRETURN4args { | |||
/* CURRENT_FH: delegated object */ | /* CURRENT_FH: delegated object */ | |||
stateid4 deleg_stateid; | stateid4 deleg_stateid; | |||
}; | }; | |||
18.6.2. RESULTS | 18.6.2. RESULTS | |||
struct DELEGRETURN4res { | struct DELEGRETURN4res { | |||
nfsstat4 status; | nfsstat4 status; | |||
}; | }; | |||
18.6.3. DESCRIPTION | 18.6.3. DESCRIPTION | |||
Returns the delegation represented by the current filehandle and | The DELEGRETURN operation returns the delegation represented by the | |||
stateid. | current filehandle and stateid. | |||
Delegations may be returned when recalled or voluntarily (i.e. before | Delegations may be returned voluntarily (i.e., before the server has | |||
the server has recalled them). In either case the client must | recalled them) or when recalled. In either case, the client must | |||
properly propagate state changed under the context of the delegation | properly propagate state changed under the context of the delegation | |||
to the server before returning the delegation. | to the server before returning the delegation. | |||
The server MAY require that the principal, security flavor, and if | The server MAY require that the principal, security flavor, and if | |||
applicable, the GSS mechanism, combination that acquired the | applicable, the GSS mechanism, combination that acquired the | |||
delegation also be the one to send DELEGRETURN on the file. This | delegation also be the one to send DELEGRETURN on the file. This | |||
might not be possible if credentials for the principal are no longer | might not be possible if credentials for the principal are no longer | |||
available. The server MAY allow the machine credential or SSV | available. The server MAY allow the machine credential or SSV | |||
credential (see Section 18.35) to send DELEGRETURN. | credential (see Section 18.35) to send DELEGRETURN. | |||
skipping to change at page 424, line 24 | skipping to change at page 424, line 24 | |||
default: | default: | |||
void; | void; | |||
}; | }; | |||
18.7.3. DESCRIPTION | 18.7.3. DESCRIPTION | |||
The GETATTR operation will obtain attributes for the file system | The GETATTR operation will obtain attributes for the file system | |||
object specified by the current filehandle. The client sets a bit in | object specified by the current filehandle. The client sets a bit in | |||
the bitmap argument for each attribute value that it would like the | the bitmap argument for each attribute value that it would like the | |||
server to return. The server returns an attribute bitmap that | server to return. The server returns an attribute bitmap that | |||
indicates the attribute values which it was able to return, which | indicates the attribute values that it was able to return, which will | |||
will include all attributes requested by the client which are | include all attributes requested by the client that are attributes | |||
attributes supported by the server for the target file system. This | supported by the server for the target file system. This bitmap is | |||
bitmap is followed by the attribute values ordered lowest attribute | followed by the attribute values ordered lowest attribute number | |||
number first. | first. | |||
The server MUST return a value for each attribute that the client | The server MUST return a value for each attribute that the client | |||
requests if the attribute is supported by the server for the target | requests if the attribute is supported by the server for the target | |||
file system. If the server does not support a particular attribute | file system. If the server does not support a particular attribute | |||
on the target file system then it MUST NOT return the attribute value | on the target file system, then it MUST NOT return the attribute | |||
and MUST NOT set the attribute bit in the result bitmap. The server | value and MUST NOT set the attribute bit in the result bitmap. The | |||
MUST return an error if it supports an attribute on the target but | server MUST return an error if it supports an attribute on the target | |||
cannot obtain its value. In that case, no attribute values will be | but cannot obtain its value. In that case, no attribute values will | |||
returned. | be returned. | |||
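As a rough illustration of the bitmap rules above, the following Python sketch computes a GETATTR reply from the requested and supported attribute sets. The function name build_getattr_reply, the fetch callback, the use of None to signal an unobtainable value, and the generic "ERROR" status are all assumptions made for this example; the text above does not name the error code to return.

```python
def build_getattr_reply(requested, supported, fetch):
    """Return (status, result_bitmap, values) for a GETATTR request.

    requested, supported: iterables of attribute numbers.
    fetch: hypothetical callback returning an attribute's value, or None
           when the server cannot obtain it.
    """
    # Only attributes that are both requested and supported may appear
    # in the result bitmap; unsupported ones are silently left out.
    granted = sorted(set(requested) & set(supported))
    values = []
    for attr in granted:
        value = fetch(attr)
        if value is None:
            # A supported attribute could not be read: fail the whole
            # operation and return no attribute values at all.
            return ("ERROR", [], [])
        values.append(value)
    # Values follow the bitmap, lowest attribute number first.
    return ("NFS4_OK", granted, values)
```

The result bitmap is modeled here as a sorted list of attribute numbers rather than as packed 32-bit words, which keeps the ordering rule visible.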
File systems which are absent should be treated as having support for | File systems that are absent should be treated as having support for | |||
a very small set of attributes as described in Section 11.3.1, even | a very small set of attributes as described in Section 11.3.1, even | |||
if previously, when the file system was present, more attributes were | if previously, when the file system was present, more attributes were | |||
supported. | supported. | |||
All servers MUST support the REQUIRED attributes as specified in | All servers MUST support the REQUIRED attributes as specified in | |||
Section 5.6, for all file systems, with the exception of absent file | Section 5.6, for all file systems, with the exception of absent file | |||
systems. | systems. | |||
On success, the current filehandle retains its value. | On success, the current filehandle retains its value. | |||
18.7.4. IMPLEMENTATION | 18.7.4. IMPLEMENTATION | |||
Suppose there is a write delegation held by another client for file | Suppose there is an OPEN_DELEGATE_WRITE delegation held by another | |||
in question and size and/or change are among the set of attributes | client for the file in question and size and/or change are among the | |||
being interrogated. The server has two choices. First, the server | set of attributes being interrogated. The server has two choices. | |||
can obtain the actual current value of these attributes from the | First, the server can obtain the actual current value of these | |||
client holding the delegation by using the CB_GETATTR callback. | attributes from the client holding the delegation by using the | |||
Second, the server, particularly when the delegated client is | CB_GETATTR callback. Second, the server, particularly when the | |||
unresponsive, can recall the delegation in question. The GETATTR | delegated client is unresponsive, can recall the delegation in | |||
MUST NOT proceed until one of the following occurs: | question. The GETATTR MUST NOT proceed until one of the following | |||
occurs: | ||||
o The requested attribute values are returned in the response to | o The requested attribute values are returned in the response to | |||
CB_GETATTR. | CB_GETATTR. | |||
o The write delegation is returned. | o The OPEN_DELEGATE_WRITE delegation is returned. | |||
o The write delegation is revoked. | o The OPEN_DELEGATE_WRITE delegation is revoked. | |||
Unless one of the above happens very quickly, one or more | Unless one of the above happens very quickly, one or more | |||
NFS4ERR_DELAY errors will be returned if while a delegation is | NFS4ERR_DELAY errors will be returned while a delegation is | |||
outstanding. | outstanding. | |||
18.8. Operation 10: GETFH - Get Current Filehandle | 18.8. Operation 10: GETFH - Get Current Filehandle | |||
18.8.1. ARGUMENTS | 18.8.1. ARGUMENTS | |||
/* CURRENT_FH: */ | /* CURRENT_FH: */ | |||
void; | void; | |||
18.8.2. RESULTS | 18.8.2. RESULTS | |||
skipping to change at page 426, line 7 | skipping to change at page 426, line 13 | |||
}; | }; | |||
18.8.3. DESCRIPTION | 18.8.3. DESCRIPTION | |||
This operation returns the current filehandle value. | This operation returns the current filehandle value. | |||
On success, the current filehandle retains its value. | On success, the current filehandle retains its value. | |||
As described in Section 2.10.6.4, GETFH is REQUIRED or RECOMMENDED to | As described in Section 2.10.6.4, GETFH is REQUIRED or RECOMMENDED to | |||
immediately follow certain operations, and servers are free to reject | immediately follow certain operations, and servers are free to reject | |||
such operations the client fails to insert GETFH in the request as | such operations if the client fails to insert GETFH in the request as | |||
REQUIRED or RECOMMENDED. Section 18.16.4.1 provides additional | REQUIRED or RECOMMENDED. Section 18.16.4.1 provides additional | |||
justification for why GETFH MUST follow OPEN. | justification for why GETFH MUST follow OPEN. | |||
18.8.4. IMPLEMENTATION | 18.8.4. IMPLEMENTATION | |||
Operations that change the current filehandle like LOOKUP or CREATE | Operations that change the current filehandle like LOOKUP or CREATE | |||
do not automatically return the new filehandle as a result. For | do not automatically return the new filehandle as a result. For | |||
instance, if a client needs to look up a directory entry and obtain | instance, if a client needs to look up a directory entry and obtain | |||
its filehandle then the following request is needed. | its filehandle, then the following request is needed. | |||
PUTFH (directory filehandle) | PUTFH (directory filehandle) | |||
LOOKUP (entry name) | LOOKUP (entry name) | |||
GETFH | GETFH | |||
18.9. Operation 11: LINK - Create Link to a File | 18.9. Operation 11: LINK - Create Link to a File | |||
18.9.1. ARGUMENTS | 18.9.1. ARGUMENTS | |||
skipping to change at page 427, line 20 | skipping to change at page 427, line 33 | |||
file and the target directory must reside within the same file system | file and the target directory must reside within the same file system | |||
on the server. On success, the current filehandle will continue to | on the server. On success, the current filehandle will continue to | |||
be the target directory. If an object exists in the target directory | be the target directory. If an object exists in the target directory | |||
with the same name as newname, the server must return NFS4ERR_EXIST. | with the same name as newname, the server must return NFS4ERR_EXIST. | |||
For the target directory, the server returns change_info4 information | For the target directory, the server returns change_info4 information | |||
in cinfo. With the atomic field of the change_info4 data type, the | in cinfo. With the atomic field of the change_info4 data type, the | |||
server will indicate if the before and after change attributes were | server will indicate if the before and after change attributes were | |||
obtained atomically with respect to the link creation. | obtained atomically with respect to the link creation. | |||
If the newname has a length of 0 (zero), or if newname does not obey | If the newname has a length of zero, or if newname does not obey the | |||
the UTF-8 definition, the error NFS4ERR_INVAL will be returned. | UTF-8 definition, the error NFS4ERR_INVAL will be returned. | |||
18.9.4. IMPLEMENTATION | 18.9.4. IMPLEMENTATION | |||
The server MAY impose restrictions on the LINK operation such that | The server MAY impose restrictions on the LINK operation such that | |||
LINK may not be done when the file is open or when that open is done | LINK may not be done when the file is open or when that open is done | |||
by particular protocols, or with particular options or access modes. | by particular protocols, or with particular options or access modes. | |||
When LINK is rejected because of such restrictions, the error | When LINK is rejected because of such restrictions, the error | |||
NFS4ERR_FILE_OPEN is returned. | NFS4ERR_FILE_OPEN is returned. | |||
If a server does implement such restrictions and those restrictions | If a server does implement such restrictions and those restrictions | |||
include cases of NFSv4 opens preventing successful execution of a | include cases of NFSv4 opens preventing successful execution of a | |||
link, the server needs to recall any delegations which could hide the | link, the server needs to recall any delegations that could hide the | |||
existence of opens relevant to that decision. The reason is that | existence of opens relevant to that decision. The reason is that | |||
when a client holds a delegation, the server might not have an | when a client holds a delegation, the server might not have an | |||
accurate account of the opens for that client, since the client may | accurate account of the opens for that client, since the client may | |||
execute OPENs and CLOSEs locally. The LINK operation must be delayed | execute OPENs and CLOSEs locally. The LINK operation must be delayed | |||
only until a definitive result can be obtained. E.g., suppose there | only until a definitive result can be obtained. For example, suppose | |||
are multiple delegations and one of them establishes an open whose | there are multiple delegations and one of them establishes an open | |||
presence would prevent the link. Given the server's semantics, | whose presence would prevent the link. Given the server's semantics, | |||
NFS4ERR_FILE_OPEN may be returned to the caller as soon as that | NFS4ERR_FILE_OPEN may be returned to the caller as soon as that | |||
delegation is returned without waiting for other delegations to be | delegation is returned without waiting for other delegations to be | |||
returned. Similarly, if such opens are not associated with | returned. Similarly, if such opens are not associated with | |||
delegations, NFS4ERR_FILE_OPEN can be returned immediately with no | delegations, NFS4ERR_FILE_OPEN can be returned immediately with no | |||
delegation recall being done. | delegation recall being done. | |||
If the current filehandle designates a directory for which another | If the current filehandle designates a directory for which another | |||
client holds a directory delegation, then, unless the delegation is | client holds a directory delegation, then, unless the delegation is | |||
such that the situation can be resolved by sending a notification, | such that the situation can be resolved by sending a notification, | |||
the delegation MUST be recalled, and the operation cannot be | the delegation MUST be recalled, and the operation cannot be | |||
performed successfully. until the delegation is returned or revoked. | performed successfully until the delegation is returned or revoked. | |||
Except where this happens very quickly, one or more NFS4ERR_DELAY | Except where this happens very quickly, one or more NFS4ERR_DELAY | |||
errors will be returned to requests made while the delegation remains | errors will be returned to requests made while the delegation remains | |||
outstanding. | outstanding. | |||
When the current filehandle designates a directory for which one or | When the current filehandle designates a directory for which one or | |||
more directory delegations exist, then, when those delegations | more directory delegations exist, then, when those delegations | |||
request such notifications, instead of a recall, NOTIFY4_ADD_ENTRY | request such notifications, instead of a recall, NOTIFY4_ADD_ENTRY | |||
will be generated as a result of the LINK operation. | will be generated as a result of the LINK operation. | |||
If the current file system supports the numlinks attribute, and other | If the current file system supports the numlinks attribute, and other | |||
skipping to change at page 428, line 28 | skipping to change at page 428, line 42 | |||
Changes to any property of the "hard" linked files are reflected in | Changes to any property of the "hard" linked files are reflected in | |||
all of the linked files. When a link is made to a file, the | all of the linked files. When a link is made to a file, the | |||
attributes for the file should have a value for numlinks that is one | attributes for the file should have a value for numlinks that is one | |||
greater than the value before the LINK operation. | greater than the value before the LINK operation. | |||
The statement "file and the target directory must reside within the | The statement "file and the target directory must reside within the | |||
same file system on the server" means that the fsid fields in the | same file system on the server" means that the fsid fields in the | |||
attributes for the objects are the same. If they reside on different | attributes for the objects are the same. If they reside on different | |||
file systems, the error NFS4ERR_XDEV is returned. This error may be | file systems, the error NFS4ERR_XDEV is returned. This error may be | |||
returned by some server when there is an internal partitioning of a | returned by some servers when there is an internal partitioning of a | |||
file system which the LINK operation would violate. | file system that the LINK operation would violate. | |||
On some servers, "." and ".." are illegal values for newname and the | On some servers, "." and ".." are illegal values for newname and the | |||
error NFS4ERR_BADNAME will be returned if they are specified. | error NFS4ERR_BADNAME will be returned if they are specified. | |||
When the current filehandle designates a named attribute directory | When the current filehandle designates a named attribute directory | |||
and the object to be linked (the saved filehandle) is not a named | and the object to be linked (the saved filehandle) is not a named | |||
attribute for the same object, the error NFS4ERR_XDEV MUST be | attribute for the same object, the error NFS4ERR_XDEV MUST be | |||
returned. When the saved filehandle designates a named attribute and | returned. When the saved filehandle designates a named attribute and | |||
the current filehandle is not the appropriate named attribute | the current filehandle is not the appropriate named attribute | |||
directory, the error NFS4ERR_XDEV MUST also be returned. | directory, the error NFS4ERR_XDEV MUST also be returned. | |||
skipping to change at page 430, line 29 | skipping to change at page 431, line 29 | |||
case NFS4_OK: | case NFS4_OK: | |||
LOCK4resok resok4; | LOCK4resok resok4; | |||
case NFS4ERR_DENIED: | case NFS4ERR_DENIED: | |||
LOCK4denied denied; | LOCK4denied denied; | |||
default: | default: | |||
void; | void; | |||
}; | }; | |||
18.10.3. DESCRIPTION | 18.10.3. DESCRIPTION | |||
The LOCK operation requests a byte-range lock for the byte range | The LOCK operation requests a byte-range lock for the byte-range | |||
specified by the offset and length parameters, and lock type | specified by the offset and length parameters, and lock type | |||
specified in the locktype parameter. If this is a reclaim request, | specified in the locktype parameter. If this is a reclaim request, | |||
the reclaim parameter will be TRUE. | the reclaim parameter will be TRUE. | |||
Bytes in a file may be locked even if those bytes are not currently | Bytes in a file may be locked even if those bytes are not currently | |||
allocated to the file. To lock the file from a specific offset | allocated to the file. To lock the file from a specific offset | |||
through the end-of-file (no matter how long the file actually is) use | through the end-of-file (no matter how long the file actually is) use | |||
a length field equal to NFS4_UINT64_MAX. The server MUST return | a length field equal to NFS4_UINT64_MAX. The server MUST return | |||
NFS4ERR_INVAL under the following combinations of length and offset: | NFS4ERR_INVAL under the following combinations of length and offset: | |||
o Length is equal to zero. | o Length is equal to zero. | |||
o Length is not equal to NFS4_UINT64_MAX, and the sum of length and | o Length is not equal to NFS4_UINT64_MAX, and the sum of length and | |||
offset exceeds NFS4_UINT64_MAX. | offset exceeds NFS4_UINT64_MAX. | |||
32-bit servers are servers that support locking for byte offsets that | 32-bit servers are servers that support locking for byte offsets that | |||
fit within 32 bits (i.e. less than or equal to NFS4_UINT32_MAX). If | fit within 32 bits (i.e., less than or equal to NFS4_UINT32_MAX). If | |||
the client specifies a range that overlaps one or more bytes beyond | the client specifies a range that overlaps one or more bytes beyond | |||
offset NFS4_UINT32_MAX, but does not end at offset NFS4_UINT64_MAX, | offset NFS4_UINT32_MAX but does not end at offset NFS4_UINT64_MAX, | |||
then such a 32-bit server MUST return the error NFS4ERR_BAD_RANGE. | then such a 32-bit server MUST return the error NFS4ERR_BAD_RANGE. | |||
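The offset and length checks described above can be sketched in Python as follows. This is only an illustration: the function name check_lock_range and the server_is_32bit flag are hypothetical, status codes are given as symbolic strings, and the sketch assumes that a range "ends at offset NFS4_UINT64_MAX" exactly when length equals NFS4_UINT64_MAX (the lock-to-end-of-file case).

```python
NFS4_UINT64_MAX = 0xFFFFFFFFFFFFFFFF
NFS4_UINT32_MAX = 0xFFFFFFFF


def check_lock_range(offset, length, server_is_32bit=False):
    """Validate a LOCK byte range and return a symbolic status string."""
    to_eof = (length == NFS4_UINT64_MAX)

    # A zero-length range is always invalid.
    if length == 0:
        return "NFS4ERR_INVAL"

    # Unless the lock runs to end-of-file, offset + length must not
    # exceed NFS4_UINT64_MAX.
    if not to_eof and offset + length > NFS4_UINT64_MAX:
        return "NFS4ERR_INVAL"

    # Assumed reading of the 32-bit rule: a range whose last byte lies
    # beyond NFS4_UINT32_MAX is rejected unless it runs to end-of-file.
    if server_is_32bit and not to_eof:
        last_byte = offset + length - 1
        if last_byte > NFS4_UINT32_MAX:
            return "NFS4ERR_BAD_RANGE"

    return "NFS4_OK"
```

Python integers do not overflow, so the sum can be compared directly. An implementation using fixed-width 64-bit arithmetic would instead need a test such as length > NFS4_UINT64_MAX - offset.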
If the server returns NFS4ERR_DENIED, owner, offset, and length of a | If the server returns NFS4ERR_DENIED, the owner, offset, and length | |||
conflicting lock are returned. | of a conflicting lock are returned. | |||
The locker argument specifies the lock-owner that is associated with | The locker argument specifies the lock-owner that is associated with | |||
the LOCK request. The locker4 structure is a switched union that | the LOCK operation. The locker4 structure is a switched union that | |||
indicates whether the client has already created byte-range locking | indicates whether the client has already created byte-range locking | |||
state associated with the current open file and lock-owner. In the | state associated with the current open file and lock-owner. In the | |||
case in which it has, the argument is just a stateid representing the | case in which it has, the argument is just a stateid representing the | |||
set of locks associated with that open file and lock-owner, together | set of locks associated with that open file and lock-owner, together | |||
with a lock_seqid value which MAY be any value and MUST be ignored by | with a lock_seqid value that MAY be any value and MUST be ignored by | |||
the server. In the case where no byte-range locking state has been | the server. In the case where no byte-range locking state has been | |||
established, or the client does not have the stateid available, the | established, or the client does not have the stateid available, the | |||
argument contains the stateid of the open file with which this lock | argument contains the stateid of the open file with which this lock | |||
is to be associated, together with the lock-owner with which the lock | is to be associated, together with the lock-owner with which the lock | |||
is to be associated. The open_to_lock_owner case covers the very | is to be associated. The open_to_lock_owner case covers the very | |||
first lock done by a lock-owner for a given open file and offers a | first lock done by a lock-owner for a given open file and offers a | |||
method to use the established state of the open_stateid to transition | method to use the established state of the open_stateid to transition | |||
to the use of a lock stateid. | to the use of a lock stateid. | |||
The following fields of the locker parameter MAY be set to any value | The following fields of the locker parameter MAY be set to any value | |||
skipping to change at page 431, line 39 | skipping to change at page 432, line 39 | |||
COMPOUND request. | COMPOUND request. | |||
o The open_seqid and lock_seqid fields of the open_owner field | o The open_seqid and lock_seqid fields of the open_owner field | |||
(locker.open_owner.open_seqid and locker.open_owner.lock_seqid). | (locker.open_owner.open_seqid and locker.open_owner.lock_seqid). | |||
o The lock_seqid field of the lock_owner field | o The lock_seqid field of the lock_owner field | |||
(locker.lock_owner.lock_seqid). | (locker.lock_owner.lock_seqid). | |||
Note that the client ID appearing in a LOCK4denied structure is the | Note that the client ID appearing in a LOCK4denied structure is the | |||
actual client associated with the conflicting lock, whether this is | actual client associated with the conflicting lock, whether this is | |||
the client ID associated with the current session, or a different | the client ID associated with the current session or a different one. | |||
one. Thus if the server returns NFS4ERR_DENIED, it MUST set the | Thus, if the server returns NFS4ERR_DENIED, it MUST set the clientid | |||
clientid field of the owner field of the denied field. | field of the owner field of the denied field. | |||
If the current filehandle is not an ordinary file, an error will be | If the current filehandle is not an ordinary file, an error will be | |||
returned to the client. In the case that the current filehandle | returned to the client. In the case that the current filehandle | |||
represents an object of type NF4DIR, NFS4ERR_ISDIR is returned. if | represents an object of type NF4DIR, NFS4ERR_ISDIR is returned. If | |||
the current filehandle designates a symbolic link, NFS4ERR_SYMLINK is | the current filehandle designates a symbolic link, NFS4ERR_SYMLINK is | |||
returned. In all other cases, NFS4ERR_WRONG_TYPE is returned. | returned. In all other cases, NFS4ERR_WRONG_TYPE is returned. | |||
On success, the current filehandle retains its value. | On success, the current filehandle retains its value. | |||
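The file-type checks described above for LOCK (and repeated for LOCKT and LOCKU) can be sketched as a small mapping. This is an illustrative sketch only: the function name and the string-based interface are hypothetical, and the type names are taken from the protocol's nfs_ftype4 enumeration.

```python
from typing import Optional


def lock_filetype_error(ftype: str) -> Optional[str]:
    """Return the error for a non-regular current filehandle, or None.

    ftype is an nfs_ftype4 name such as "NF4REG" or "NF4DIR"; passing it
    as a string is an assumption made for this sketch.
    """
    if ftype == "NF4REG":
        return None                   # ordinary file: no type error
    if ftype == "NF4DIR":
        return "NFS4ERR_ISDIR"        # directory
    if ftype == "NF4LNK":
        return "NFS4ERR_SYMLINK"      # symbolic link
    return "NFS4ERR_WRONG_TYPE"       # all other cases
```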
18.10.4. IMPLEMENTATION | 18.10.4. IMPLEMENTATION | |||
If the server is unable to determine the exact offset and length of | If the server is unable to determine the exact offset and length of | |||
the conflicting lock, the same offset and length that were provided | the conflicting byte-range lock, the same offset and length that were | |||
in the arguments should be returned in the denied results | provided in the arguments should be returned in the denied results. | |||
LOCK operations are subject to permission checks and to checks | LOCK operations are subject to permission checks and to checks | |||
against the access type of the associated file. However, the | against the access type of the associated file. However, the | |||
specific right and modes required for various type of locks, reflect | specific right and modes required for various types of locks reflect | |||
the semantics of the server-exported file system, and are not | the semantics of the server-exported file system, and are not | |||
specified by the protocol. For example, Windows 2000 allows a write | specified by the protocol. For example, Windows 2000 allows a write | |||
lock of a file open for READ, while a POSIX-compliant system does | lock of a file open for read access, while a POSIX-compliant system | |||
not. | does not. | |||
When the client makes a lock request that corresponds to a range that | When the client sends a LOCK operation that corresponds to a range | |||
the lock-owner has locked already (with the same or different lock | that the lock-owner has locked already (with the same or different | |||
type), or to a sub-region of such a range, or to a region which | lock type), or to a sub-range of such a range, or to a byte-range | |||
includes multiple locks already granted to that lock-owner, in whole | that includes multiple locks already granted to that lock-owner, in | |||
or in part, and the server does not support such locking operations | whole or in part, and the server does not support such locking | |||
(i.e. does not support POSIX locking semantics), the server will | operations (i.e., does not support POSIX locking semantics), the | |||
return the error NFS4ERR_LOCK_RANGE. In that case, the client may | server will return the error NFS4ERR_LOCK_RANGE. In that case, the | |||
return an error, or it may emulate the required operations, using | client may return an error, or it may emulate the required | |||
only LOCK for ranges that do not include any bytes already locked by | operations, using only LOCK for ranges that do not include any bytes | |||
that lock-owner and LOCKU of locks held by that lock-owner | already locked by that lock-owner and LOCKU of locks held by that | |||
(specifying an exactly-matching range and type). Similarly, when the | lock-owner (specifying an exactly matching range and type). | |||
client makes a lock request that amounts to upgrading (changing from | Similarly, when the client sends a LOCK operation that amounts to | |||
a read lock to a write lock) or downgrading (changing from write lock | upgrading (changing from a READ_LT lock to a WRITE_LT lock) or | |||
to a read lock) an existing byte-range lock, and the server does not | downgrading (changing from WRITE_LT lock to a READ_LT lock) an | |||
support such a lock, the server will return NFS4ERR_LOCK_NOTSUPP. | existing byte-range lock, and the server does not support such a | |||
Such operations may not perfectly reflect the required semantics in | lock, the server will return NFS4ERR_LOCK_NOTSUPP. Such operations | |||
the face of conflicting lock requests from other clients. | may not perfectly reflect the required semantics in the face of | |||
conflicting LOCK operations from other clients. | ||||
When a client holds a write delegation, the client holding that | When a client holds an OPEN_DELEGATE_WRITE delegation, the client | |||
delegation is assured that there are no opens by other clients. | holding that delegation is assured that there are no opens by other | |||
Thus, there can be no conflicting LOCK requests from such clients. | clients. Thus, there can be no conflicting LOCK operations from such | |||
Therefore, the client may be handling locking requests locally, | clients. Therefore, the client may be handling locking requests | |||
without doing LOCK operations on the server. If it does that, it | locally, without doing LOCK operations on the server. If it does | |||
must be prepared to update the lock status on the server, by doing | that, it must be prepared to update the lock status on the server, by | |||
appropriate LOCK and LOCKU requests before returning the delegation. | sending appropriate LOCK and LOCKU operations before returning the | |||
delegation. | ||||
When one or more clients hold read delegations, any LOCK request | When one or more clients hold OPEN_DELEGATE_READ delegations, any | |||
where the server is implementing mandatory locking semantics, MUST | LOCK operation where the server is implementing mandatory locking | |||
result in the recall of all such delegations. The LOCK request may | semantics MUST result in the recall of all such delegations. The | |||
not be granted until all such delegations are return or revoked. | LOCK operation may not be granted until all such delegations are | |||
Except where this happens very quickly, one or more NFS4ERR_DELAY | returned or revoked. Except where this happens very quickly, one or | |||
errors will be returned to requests made while the delegation remains | more NFS4ERR_DELAY errors will be returned to requests made while the | |||
outstanding. | delegation remains outstanding. | |||
18.11. Operation 13: LOCKT - Test For Lock | 18.11. Operation 13: LOCKT - Test for Lock | |||
18.11.1. ARGUMENTS | 18.11.1. ARGUMENTS | |||
struct LOCKT4args { | struct LOCKT4args { | |||
/* CURRENT_FH: file */ | /* CURRENT_FH: file */ | |||
nfs_lock_type4 locktype; | nfs_lock_type4 locktype; | |||
offset4 offset; | offset4 offset; | |||
length4 length; | length4 length; | |||
lock_owner4 owner; | lock_owner4 owner; | |||
}; | }; | |||
skipping to change at page 433, line 33 | skipping to change at page 434, line 34 | |||
void; | void; | |||
default: | default: | |||
void; | void; | |||
}; | }; | |||
18.11.3. DESCRIPTION | 18.11.3. DESCRIPTION | |||
The LOCKT operation tests the lock as specified in the arguments. If | The LOCKT operation tests the lock as specified in the arguments. If | |||
a conflicting lock exists, the owner, offset, length, and type of the | a conflicting lock exists, the owner, offset, length, and type of the | |||
conflicting lock are returned. The owner field in the results | conflicting lock are returned. The owner field in the results | |||
includes the client ID of the owner of conflicting lock, whether this | includes the client ID of the owner of the conflicting lock, whether | |||
is the client ID associated with the current session or a different | this is the client ID associated with the current session or a | |||
client ID. If no lock is held, nothing other than NFS4_OK is | different client ID. If no lock is held, nothing other than NFS4_OK | |||
returned. Lock types READ_LT and READW_LT are processed in the same | is returned. Lock types READ_LT and READW_LT are processed in the | |||
way in that a conflicting lock test is done without regard to | same way in that a conflicting lock test is done without regard to | |||
blocking or non-blocking. The same is true for WRITE_LT and | blocking or non-blocking. The same is true for WRITE_LT and | |||
WRITEW_LT. | WRITEW_LT. | |||
The ranges are specified as for LOCK. The NFS4ERR_INVAL and | The ranges are specified as for LOCK. The NFS4ERR_INVAL and | |||
NFS4ERR_BAD_RANGE errors are returned under the same circumstances as | NFS4ERR_BAD_RANGE errors are returned under the same circumstances as | |||
for LOCK. | for LOCK. | |||
The clientid field of the owner MAY be set to any value by the client | The clientid field of the owner MAY be set to any value by the client | |||
and MUST be ignored by the server. The reason the server MUST ignore | and MUST be ignored by the server. The reason the server MUST ignore | |||
the clientid field is that the server MUST derive the client ID from | the clientid field is that the server MUST derive the client ID from | |||
the session ID from the SEQUENCE operation of the COMPOUND request. | the session ID from the SEQUENCE operation of the COMPOUND request. | |||
If the current filehandle is not an ordinary file, an error will be | If the current filehandle is not an ordinary file, an error will be | |||
returned to the client. In the case that the current filehandle | returned to the client. In the case that the current filehandle | |||
represents an object of type NF4DIR, NFS4ERR_ISDIR is returned. if | represents an object of type NF4DIR, NFS4ERR_ISDIR is returned. If | |||
the current filehandle designates a symbolic link, NFS4ERR_SYMLINK is | the current filehandle designates a symbolic link, NFS4ERR_SYMLINK is | |||
returned. In all other cases, NFS4ERR_WRONG_TYPE is returned. | returned. In all other cases, NFS4ERR_WRONG_TYPE is returned. | |||
On success, the current filehandle retains its value. | On success, the current filehandle retains its value. | |||
18.11.4. IMPLEMENTATION | 18.11.4. IMPLEMENTATION | |||
If the server is unable to determine the exact offset and length of | If the server is unable to determine the exact offset and length of | |||
the conflicting lock, the same offset and length that were provided | the conflicting lock, the same offset and length that were provided | |||
in the arguments should be returned in the denied results. | in the arguments should be returned in the denied results. | |||
LOCKT uses a lock_owner4 rather than a stateid4, as is used in LOCK to | LOCKT uses a lock_owner4 rather than a stateid4, as is used in LOCK to | |||
identify the owner. This is because the client does not have to open | identify the owner. This is because the client does not have to open | |||
the file to test for the existence of a lock, so a stateid might not | the file to test for the existence of a lock, so a stateid might not | |||
be available. | be available. | |||
As noted in Section 18.10.4, some servers may return | As noted in Section 18.10.4, some servers may return | |||
NFS4ERR_LOCK_RANGE to certain (otherwise non-conflicting) lock | NFS4ERR_LOCK_RANGE to certain (otherwise non-conflicting) LOCK | |||
requests that overlap ranges already granted to the current lock- | operations that overlap ranges already granted to the current lock- | |||
owner. | owner. | |||
The LOCKT operation's test for conflicting locks SHOULD exclude locks | The LOCKT operation's test for conflicting locks SHOULD exclude locks | |||
for the current lock-owner, and thus should return NFS4_OK in such | for the current lock-owner, and thus should return NFS4_OK in such | |||
cases. Note that this means that a server might return NFS4_OK to a | cases. Note that this means that a server might return NFS4_OK to a | |||
LOCKT request even though a LOCK request for the same range and lock- | LOCKT request even though a LOCK operation for the same range and | |||
owner would fail with NFS4ERR_LOCK_RANGE. | lock-owner would fail with NFS4ERR_LOCK_RANGE. | |||
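The conflict test described above can be sketched as follows, assuming the server keeps a simple list of held byte-range locks. The HeldLock record, the lockt function, and the opaque owner encoding are hypothetical names introduced for illustration. The sketch also assumes that a length of all ones means "to end of file", as for LOCK ranges.

```python
from dataclasses import dataclass
from typing import List, Optional

ALL_ONES = 0xFFFFFFFFFFFFFFFF  # length value meaning "to end of file"


@dataclass(frozen=True)
class HeldLock:
    owner: bytes   # opaque lock-owner identity (hypothetical encoding)
    write: bool    # True for WRITE_LT/WRITEW_LT, False for READ_LT/READW_LT
    offset: int
    length: int


def _end(offset: int, length: int) -> int:
    """Exclusive end of a byte range."""
    return ALL_ONES + 1 if length == ALL_ONES else offset + length


def lockt(held: List[HeldLock], owner: bytes, write: bool,
          offset: int, length: int) -> Optional[HeldLock]:
    """Return a conflicting lock, or None (the NFS4_OK case)."""
    for lk in held:
        if lk.owner == owner:
            continue  # locks of the current lock-owner are excluded
        if not (write or lk.write):
            continue  # two read locks never conflict
        if offset < _end(lk.offset, lk.length) and lk.offset < _end(offset, length):
            return lk
    return None
```

Because locks of the requesting lock-owner are skipped, this test can report no conflict for a range where a later LOCK would still fail with NFS4ERR_LOCK_RANGE on a server without POSIX semantics.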
When a client holds a write delegation, it may choose (see | When a client holds an OPEN_DELEGATE_WRITE delegation, it may choose | |||
Section 18.10.4) to handle LOCK requests locally. In such a case, | (see Section 18.10.4) to handle LOCK requests locally. In such a | |||
LOCKT requests will similarly be handled locally. | case, LOCKT requests will similarly be handled locally. | |||
18.12. Operation 14: LOCKU - Unlock File | 18.12. Operation 14: LOCKU - Unlock File | |||
18.12.1. ARGUMENTS | 18.12.1. ARGUMENTS | |||
struct LOCKU4args { | struct LOCKU4args { | |||
/* CURRENT_FH: file */ | /* CURRENT_FH: file */ | |||
nfs_lock_type4 locktype; | nfs_lock_type4 locktype; | |||
seqid4 seqid; | seqid4 seqid; | |||
stateid4 lock_stateid; | stateid4 lock_stateid; | |||
skipping to change at page 435, line 30 | skipping to change at page 436, line 30 | |||
has no effect on the success or failure of the LOCKU operation. | has no effect on the success or failure of the LOCKU operation. | |||
The ranges are specified as for LOCK. The NFS4ERR_INVAL and | The ranges are specified as for LOCK. The NFS4ERR_INVAL and | |||
NFS4ERR_BAD_RANGE errors are returned under the same circumstances as | NFS4ERR_BAD_RANGE errors are returned under the same circumstances as | |||
for LOCK. | for LOCK. | |||
The seqid parameter MAY be any value and the server MUST ignore it. | The seqid parameter MAY be any value and the server MUST ignore it. | |||
If the current filehandle is not an ordinary file, an error will be | If the current filehandle is not an ordinary file, an error will be | |||
returned to the client. In the case that the current filehandle | returned to the client. In the case that the current filehandle | |||
represents an object of type NF4DIR, NFS4ERR_ISDIR is returned. if | represents an object of type NF4DIR, NFS4ERR_ISDIR is returned. If | |||
the current filehandle designates a symbolic link, NFS4ERR_SYMLINK is | the current filehandle designates a symbolic link, NFS4ERR_SYMLINK is | |||
returned. In all other cases, NFS4ERR_WRONG_TYPE is returned. | returned. In all other cases, NFS4ERR_WRONG_TYPE is returned. | |||
On success, the current filehandle retains its value. | On success, the current filehandle retains its value. | |||
The server MAY require that the principal, security flavor, and | The server MAY require that the principal, security flavor, and if | |||
applicable, the GSS mechanism, combination that sent a LOCK request | applicable, the GSS mechanism, combination that sent a LOCK operation | |||
also be the one to send LOCKU on the file. This might not be | also be the one to send LOCKU on the file. This might not be | |||
possible if credentials for the principal are no longer available. | possible if credentials for the principal are no longer available. | |||
The server MAY allow the machine credential or SSV credential (see | The server MAY allow the machine credential or SSV credential (see | |||
Section 18.35) to send LOCKU. | Section 18.35) to send LOCKU. | |||
18.12.4. IMPLEMENTATION | 18.12.4. IMPLEMENTATION | |||
If the area to be unlocked does not correspond exactly to a lock | If the area to be unlocked does not correspond exactly to a lock | |||
actually held by the lock-owner the server may return the error | actually held by the lock-owner, the server may return the error | |||
NFS4ERR_LOCK_RANGE. This includes the case in which the area is not | NFS4ERR_LOCK_RANGE. This includes the case in which the area is not | |||
locked, where the area is a sub-range of the area locked, where it | locked, where the area is a sub-range of the area locked, where it | |||
overlaps the area locked without matching exactly or the area | overlaps the area locked without matching exactly, or the area | |||
specified includes multiple locks held by the lock-owner. In all of | specified includes multiple locks held by the lock-owner. In all of | |||
these cases, allowed by POSIX locking [24] semantics, a client | these cases, allowed by POSIX locking [24] semantics, a client | |||
receiving this error, should if it desires support for such | receiving this error should, if it desires support for such | |||
operations, simulate the operation using LOCKU on ranges | operations, simulate the operation using LOCKU on ranges | |||
corresponding to locks it actually holds, possibly followed by LOCK | corresponding to locks it actually holds, possibly followed by LOCK | |||
requests for the sub-ranges not being unlocked. | operations for the sub-ranges not being unlocked. | |||
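One way a client might simulate unlocking a sub-range of a single held lock is sketched below: unlock the exactly matching held range, then re-lock the parts that should remain. The function name and the tuple representation of operations are hypothetical, and the sketch assumes finite lengths and exactly one held lock covering the requested range.

```python
from typing import List, Tuple

Op = Tuple[str, int, int]  # (operation name, offset, length)


def emulate_unlock(held_offset: int, held_length: int,
                   un_offset: int, un_length: int) -> List[Op]:
    """List the operations that unlock a sub-range of one held lock."""
    held_end = held_offset + held_length
    un_end = un_offset + un_length
    if un_offset < held_offset or un_end > held_end:
        raise ValueError("range to unlock is not inside the held lock")
    ops: List[Op] = [("LOCKU", held_offset, held_length)]  # exact match
    if un_offset > held_offset:
        # Re-lock the part before the unlocked sub-range.
        ops.append(("LOCK", held_offset, un_offset - held_offset))
    if un_end < held_end:
        # Re-lock the part after the unlocked sub-range.
        ops.append(("LOCK", un_end, held_end - un_end))
    return ops
```

For a held lock on bytes 0-99 and a request to unlock bytes 40-59, the sketch yields LOCKU(0, 100), LOCK(0, 40), and LOCK(60, 40). Another client could acquire a conflicting lock between the LOCKU and the following LOCK operations, so the emulation is not atomic.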
When a client holds a write delegation, it may choose (See | When a client holds an OPEN_DELEGATE_WRITE delegation, it may choose | |||
Section 18.10.4) to handle LOCK requests locally. In such a case, | (see Section 18.10.4) to handle LOCK requests locally. In such a | |||
LOCKU requests will similarly be handled locally. | case, LOCKU operations will similarly be handled locally. | |||
18.13. Operation 15: LOOKUP - Lookup Filename | 18.13. Operation 15: LOOKUP - Lookup Filename | |||
18.13.1. ARGUMENTS | 18.13.1. ARGUMENTS | |||
struct LOOKUP4args { | struct LOOKUP4args { | |||
/* CURRENT_FH: directory */ | /* CURRENT_FH: directory */ | |||
component4 objname; | component4 objname; | |||
}; | }; | |||
18.13.2. RESULTS | 18.13.2. RESULTS | |||
struct LOOKUP4res { | struct LOOKUP4res { | |||
/* New CURRENT_FH: object */ | /* New CURRENT_FH: object */ | |||
nfsstat4 status; | nfsstat4 status; | |||
}; | }; | |||
18.13.3. DESCRIPTION | 18.13.3. DESCRIPTION | |||
This operation LOOKUPs or finds a file system object using the | The LOOKUP operation looks up or finds a file system object using the | |||
directory specified by the current filehandle. LOOKUP evaluates the | directory specified by the current filehandle. LOOKUP evaluates the | |||
component and if the object exists the current filehandle is replaced | component and if the object exists, the current filehandle is | |||
with the component's filehandle. | replaced with the component's filehandle. | |||
If the component cannot be evaluated either because it does not exist | If the component cannot be evaluated either because it does not exist | |||
or because the client does not have permission to evaluate the | or because the client does not have permission to evaluate the | |||
component, then an error will be returned and the current filehandle | component, then an error will be returned and the current filehandle | |||
will be unchanged. | will be unchanged. | |||
If the component is a zero length string or if any component does not | If the component is a zero-length string or if any component does not | |||
obey the UTF-8 definition, the error NFS4ERR_INVAL will be returned. | obey the UTF-8 definition, the error NFS4ERR_INVAL will be returned. | |||
18.13.4. IMPLEMENTATION | 18.13.4. IMPLEMENTATION | |||
If the client wants to achieve the effect of a multi-component | If the client wants to achieve the effect of a multi-component look | |||
lookup, it may construct a COMPOUND request such as (and obtain each | up, it may construct a COMPOUND request such as (and obtain each | |||
filehandle): | filehandle): | |||
PUTFH (directory filehandle) | PUTFH (directory filehandle) | |||
LOOKUP "pub" | LOOKUP "pub" | |||
GETFH | GETFH | |||
LOOKUP "foo" | LOOKUP "foo" | |||
GETFH | GETFH | |||
LOOKUP "bar" | LOOKUP "bar" | |||
GETFH | GETFH | |||
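The COMPOUND above can be generated from a client-side path, as in the following sketch. The function name is hypothetical, and splitting on "/" is an assumption about the client's path syntax, not something the protocol defines. Operations are rendered as strings purely for illustration.

```python
from typing import List


def build_lookup_compound(path: str) -> List[str]:
    """Build the operation list for a multi-component lookup."""
    ops = ["PUTFH (directory filehandle)"]
    for component in path.split("/"):
        if not component:
            # A zero-length component would draw NFS4ERR_INVAL from the
            # server, so reject it before sending anything.
            raise ValueError("zero-length component in path")
        ops.append(f'LOOKUP "{component}"')
        ops.append("GETFH")  # obtain the filehandle at each step
    return ops
```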
Unlike NFSv3, NFSv4.1 allows LOOKUP requests to cross mountpoints on | Unlike NFSv3, NFSv4.1 allows LOOKUP requests to cross mountpoints on | |||
the server. The client can detect a mountpoint crossing by comparing | the server. The client can detect a mountpoint crossing by comparing | |||
the fsid attribute of the directory with the fsid attribute of the | the fsid attribute of the directory with the fsid attribute of the | |||
directory looked up. If the fsids are different then the new | directory looked up. If the fsids are different, then the new | |||
directory is a server mountpoint. UNIX clients that detect a | directory is a server mountpoint. UNIX clients that detect a | |||
mountpoint crossing will need to mount the server's file system. | mountpoint crossing will need to mount the server's file system. | |||
This needs to be done to maintain the file object identity checking | This needs to be done to maintain the file object identity checking | |||
mechanisms common to UNIX clients. | mechanisms common to UNIX clients. | |||
Servers that limit NFS access to "shares" or "exported" file systems | Servers that limit NFS access to "shared" or "exported" file systems | |||
should provide a pseudo file system into which the exported file | should provide a pseudo file system into which the exported file | |||
systems can be integrated, so that clients can browse the server's | systems can be integrated, so that clients can browse the server's | |||
name space. The client's view of a pseudo file system will be limited | name space. The client's view of a pseudo file system will be limited | |||
to paths that lead to exported file systems. | to paths that lead to exported file systems. | |||
Note: previous versions of the protocol assigned special semantics to | Note: previous versions of the protocol assigned special semantics to | |||
the names "." and "..". NFSv4.1 assigns no special semantics to | the names "." and "..". NFSv4.1 assigns no special semantics to | |||
these names. The LOOKUPP operator must be used to look up a parent | these names. The LOOKUPP operator must be used to look up a parent | |||
directory. | directory. | |||
Note that this operation does not follow symbolic links. The client | Note that this operation does not follow symbolic links. The client | |||
is responsible for all parsing of filenames including filenames that | is responsible for all parsing of filenames including filenames that | |||
are modified by symbolic links encountered during the lookup process. | are modified by symbolic links encountered during the look up | |||
process. | ||||
If the current filehandle supplied is not a directory but a symbolic | If the current filehandle supplied is not a directory but a symbolic | |||
link, the error NFS4ERR_SYMLINK is returned. For all | link, the error NFS4ERR_SYMLINK is returned. For all | |||
other non-directory file types, the error NFS4ERR_NOTDIR is returned. | other non-directory file types, the error NFS4ERR_NOTDIR is returned. | |||
18.14. Operation 16: LOOKUPP - Lookup Parent Directory | 18.14. Operation 16: LOOKUPP - Lookup Parent Directory | |||
18.14.1. ARGUMENTS | 18.14.1. ARGUMENTS | |||
/* CURRENT_FH: object */ | /* CURRENT_FH: object */ | |||
skipping to change at page 438, line 17 | skipping to change at page 439, line 17 | |||
struct LOOKUPP4res { | struct LOOKUPP4res { | |||
/* new CURRENT_FH: parent directory */ | /* new CURRENT_FH: parent directory */ | |||
nfsstat4 status; | nfsstat4 status; | |||
}; | }; | |||
18.14.3. DESCRIPTION | 18.14.3. DESCRIPTION | |||
The current filehandle is assumed to refer to a regular directory or | The current filehandle is assumed to refer to a regular directory or | |||
a named attribute directory. LOOKUPP assigns the filehandle for its | a named attribute directory. LOOKUPP assigns the filehandle for its | |||
parent directory to be the current filehandle. If there is no parent | parent directory to be the current filehandle. If there is no parent | |||
directory an NFS4ERR_NOENT error must be returned. Therefore, | directory, an NFS4ERR_NOENT error must be returned. Therefore, | |||
NFS4ERR_NOENT will be returned by the server when the current | NFS4ERR_NOENT will be returned by the server when the current | |||
filehandle is at the root or top of the server's file tree. | filehandle is at the root or top of the server's file tree. | |||
As is the case with LOOKUP, LOOKUPP will also cross mountpoints. | As is the case with LOOKUP, LOOKUPP will also cross mountpoints. | |||
If the current filehandle is not a directory or named attribute | If the current filehandle is not a directory or named attribute | |||
directory, the error NFS4ERR_NOTDIR is returned. | directory, the error NFS4ERR_NOTDIR is returned. | |||
If the requester's security flavor does not match that configured for | If the requester's security flavor does not match that configured for | |||
the parent directory, then the server SHOULD return NFS4ERR_WRONGSEC | the parent directory, then the server SHOULD return NFS4ERR_WRONGSEC | |||
(a future minor revision of NFSv4 may upgrade this to MUST) in the | (a future minor revision of NFSv4 may upgrade this to MUST) in the | |||
LOOKUPP response. However, if the server does so, it MUST support | LOOKUPP response. However, if the server does so, it MUST support | |||
the SECINFO_NO_NAME operation (Section 18.45), so that the client can | the SECINFO_NO_NAME operation (Section 18.45), so that the client can | |||
gracefully determine the correct security flavor. | gracefully determine the correct security flavor. | |||
If the current filehandle is a named attribute directory that is | If the current filehandle is a named attribute directory that is | |||
associated with a file system object via OPENATTR (i.e. not a sub- | associated with a file system object via OPENATTR (i.e., not a sub- | |||
directory of a named attribute directory) LOOKUPP SHOULD return the | directory of a named attribute directory), LOOKUPP SHOULD return the | |||
filehandle of the associated file system object. | filehandle of the associated file system object. | |||
18.14.4. IMPLEMENTATION | 18.14.4. IMPLEMENTATION | |||
An issue to note is upward navigation from named attribute | An issue to note is upward navigation from named attribute | |||
directories. The named attribute directories are essentially | directories. The named attribute directories are essentially | |||
detached from the namespace and this property should be safely | detached from the namespace, and this property should be safely | |||
represented in the client operating environment. LOOKUPP on a named | represented in the client operating environment. LOOKUPP on a named | |||
attribute directory may return the filehandle of the associated file | attribute directory may return the filehandle of the associated file, | |||
and conveying this to applications might be unsafe as many | and conveying this to applications might be unsafe as many | |||
applications expect the parent of an object to always be a directory. | applications expect the parent of an object to always be a directory. | |||
Therefore the client may want to hide the parent of named attribute | Therefore, the client may want to hide the parent of named attribute | |||
directories (represented as ".." in UNIX) or represent the named | directories (represented as ".." in UNIX) or represent the named | |||
attribute directory as its own parent (as typically done for the file | attribute directory as its own parent (as is typically done for the | |||
system root directory in UNIX). | file system root directory in UNIX). | |||
18.15. Operation 17: NVERIFY - Verify Difference in Attributes | 18.15. Operation 17: NVERIFY - Verify Difference in Attributes | |||
18.15.1. ARGUMENTS | 18.15.1. ARGUMENTS | |||
struct NVERIFY4args { | struct NVERIFY4args { | |||
/* CURRENT_FH: object */ | /* CURRENT_FH: object */ | |||
fattr4 obj_attributes; | fattr4 obj_attributes; | |||
}; | }; | |||
18.15.2. RESULTS | 18.15.2. RESULTS | |||
struct NVERIFY4res { | struct NVERIFY4res { | |||
nfsstat4 status; | nfsstat4 status; | |||
}; | }; | |||
18.15.3. DESCRIPTION | 18.15.3. DESCRIPTION | |||
This operation is used to prefix a sequence of operations to be | This operation is used to prefix a sequence of operations to be | |||
performed if one or more attributes have changed on some file system | performed if one or more attributes have changed on some file system | |||
object. If all the attributes match then the error NFS4ERR_SAME MUST | object. If all the attributes match, then the error NFS4ERR_SAME | |||
be returned. | MUST be returned. | |||
On success, the current filehandle retains its value. | On success, the current filehandle retains its value. | |||
18.15.4. IMPLEMENTATION | 18.15.4. IMPLEMENTATION | |||
This operation is useful as a cache validation operator. If the | This operation is useful as a cache validation operator. If the | |||
object to which the attributes belong has changed then the following | object to which the attributes belong has changed, then the following | |||
operations may obtain new data associated with that object. For | operations may obtain new data associated with that object, for | |||
instance, to check if a file has been changed and obtain new data if | instance, to check if a file has been changed and obtain new data if | |||
it has: | it has: | |||
SEQUENCE | SEQUENCE | |||
PUTFH fh | PUTFH fh | |||
NVERIFY attrbits attrs | NVERIFY attrbits attrs | |||
READ 0 32767 | READ 0 32767 | |||
Contrast this with NFSv3, which would first send a GETATTR in one | Contrast this with NFSv3, which would first send a GETATTR in one | |||
request/reply round trip, and then if attributes indicated that the | request/reply round trip, and then if attributes indicated that the | |||
client's cache was stale, then send a READ in another request/reply | client's cache was stale, then send a READ in another request/reply | |||
round trip. | round trip. | |||
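The cache-validation behavior described above can be modeled with a minimal sketch. It assumes attributes are held in plain dictionaries and that every attribute the client names is supported by the server; the function names are hypothetical. The "change" attribute is used as the example in the usage note.

```python
from typing import Dict


def nverify_status(server_attrs: Dict[str, int],
                   cached_attrs: Dict[str, int]) -> str:
    """Status NVERIFY would produce for the attributes the client sent."""
    changed = any(server_attrs[name] != value
                  for name, value in cached_attrs.items())
    # If every attribute matches, NFS4ERR_SAME ends the COMPOUND and the
    # following READ is never executed.
    return "NFS4_OK" if changed else "NFS4ERR_SAME"


def read_is_executed(server_attrs: Dict[str, int],
                     cached_attrs: Dict[str, int]) -> bool:
    """True if the READ after NVERIFY would run in the same COMPOUND."""
    return nverify_status(server_attrs, cached_attrs) == "NFS4_OK"
```

With a cached change value equal to the server's, the READ is skipped and the client keeps its cache. With a different value, the new data comes back in the same round trip.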
In the case that a RECOMMENDED attribute is specified in the NVERIFY | In the case that a RECOMMENDED attribute is specified in the NVERIFY | |||
operation and the server does not support that attribute for the file | operation and the server does not support that attribute for the file | |||
system object, the error NFS4ERR_ATTRNOTSUPP is returned to the | system object, the error NFS4ERR_ATTRNOTSUPP is returned to the | |||
client. | client. | |||
When the attribute rdattr_error or any set-only attribute (e.g. | When the attribute rdattr_error or any set-only attribute (e.g., | |||
time_modify_set) is specified, the error NFS4ERR_INVAL is returned to | time_modify_set) is specified, the error NFS4ERR_INVAL is returned to | |||
the client. | the client. | |||
18.16. Operation 18: OPEN - Open a Regular File | 18.16. Operation 18: OPEN - Open a Regular File | |||
18.16.1. ARGUMENTS | 18.16.1. ARGUMENTS | |||
/* | /* | |||
* Various definitions for OPEN | * Various definitions for OPEN | |||
*/ | */ | |||
skipping to change at page 444, line 23 | skipping to change at page 445, line 23 | |||
* a delegation granted by the server. | * a delegation granted by the server. | |||
* File is identified by filehandle. | * File is identified by filehandle. | |||
*/ | */ | |||
case CLAIM_DELEG_CUR_FH: /* new to v4.1 */ | case CLAIM_DELEG_CUR_FH: /* new to v4.1 */ | |||
/* CURRENT_FH: file being opened */ | /* CURRENT_FH: file being opened */ | |||
stateid4 oc_delegate_stateid; | stateid4 oc_delegate_stateid; | |||
}; | }; | |||
/* | /* | |||
* OPEN: Open a file, potentially receiving an open delegation | * OPEN: Open a file, potentially receiving an OPEN delegation | |||
*/ | */ | |||
struct OPEN4args { | struct OPEN4args { | |||
seqid4 seqid; | seqid4 seqid; | |||
uint32_t share_access; | uint32_t share_access; | |||
uint32_t share_deny; | uint32_t share_deny; | |||
open_owner4 owner; | open_owner4 owner; | |||
openflag4 openhow; | openflag4 openhow; | |||
open_claim4 claim; | open_claim4 claim; | |||
}; | }; | |||
skipping to change at page 446, line 46 | skipping to change at page 447, line 46 | |||
OPEN4resok resok4; | OPEN4resok resok4; | |||
default: | default: | |||
void; | void; | |||
}; | }; | |||
18.16.3. DESCRIPTION | 18.16.3. DESCRIPTION | |||
The OPEN operation opens a regular file in a directory with the | The OPEN operation opens a regular file in a directory with the | |||
provided name or filehandle. OPEN can also create a file if a name | provided name or filehandle. OPEN can also create a file if a name | |||
is provided, and the client specifies it wants to create a file. | is provided, and the client specifies it wants to create a file. | |||
Specification whether a file is be created or not, and the method of | Specification of whether or not a file is to be created, and the | |||
creation is via the openhow parameter. The openhow parameter | method of creation is via the openhow parameter. The openhow | |||
consists of a switched union (data type openflag4), which switches | parameter consists of a switched union (data type openflag4), which | |||
on the value of opentype (OPEN4_NOCREATE or OPEN4_CREATE). If | switches on the value of opentype (OPEN4_NOCREATE or OPEN4_CREATE). | |||
OPEN4_CREATE is specified, this leads to another switched union (data | If OPEN4_CREATE is specified, this leads to another switched union | |||
type createhow4) that supports four cases of creation methods: | (data type createhow4) that supports four cases of creation methods: | |||
UNCHECKED4, GUARDED4, EXCLUSIVE4, or EXCLUSIVE4_1. If opentype is | UNCHECKED4, GUARDED4, EXCLUSIVE4, or EXCLUSIVE4_1. If opentype is | |||
OPEN4_CREATE, then the claim field of the claim field (sic) MUST be | OPEN4_CREATE, then the claim field of the claim field MUST be one of | |||
one of CLAIM_NULL, CLAIM_DELEGATE_CUR, or CLAIM_DELEGATE_PREV, | CLAIM_NULL, CLAIM_DELEGATE_CUR, or CLAIM_DELEGATE_PREV, because these | |||
because these claim methods include a component of a file name. | claim methods include a component of a file name. | |||
Upon success (which might entail creation of a new file), the current | Upon success (which might entail creation of a new file), the current | |||
filehandle is replaced by that of the created or existing object. | filehandle is replaced by that of the created or existing object. | |||
If the current filehandle is a named attribute directory, OPEN will | If the current filehandle is a named attribute directory, OPEN will | |||
then create or open a named attribute file. Note that exclusive | then create or open a named attribute file. Note that exclusive | |||
create of a named attribute is not supported. If the createmode is | create of a named attribute is not supported. If the createmode is | |||
EXCLUSIVE4 or EXCLUSIVE4_1 and the current filehandle is a named | EXCLUSIVE4 or EXCLUSIVE4_1 and the current filehandle is a named | |||
attribute directory, the server will return NFS4ERR_INVAL. | attribute directory, the server will return NFS4ERR_INVAL. | |||
skipping to change at page 448, line 16 | skipping to change at page 449, line 16 | |||
client MUST send a SETATTR to set attributes to a known state. | client MUST send a SETATTR to set attributes to a known state. | |||
In NFSv4.1, EXCLUSIVE4 has been deprecated in favor of EXCLUSIVE4_1. | In NFSv4.1, EXCLUSIVE4 has been deprecated in favor of EXCLUSIVE4_1. | |||
Unlike EXCLUSIVE4, attributes may be provided in the EXCLUSIVE4_1 | Unlike EXCLUSIVE4, attributes may be provided in the EXCLUSIVE4_1 | |||
case, but because the server may use attributes of the target object | case, but because the server may use attributes of the target object | |||
to store the verifier, the set of allowable attributes may be fewer | to store the verifier, the set of allowable attributes may be fewer | |||
than the set of attributes SETATTR allows. The allowable attributes | than the set of attributes SETATTR allows. The allowable attributes | |||
for EXCLUSIVE4_1 are indicated in the suppattr_exclcreat | for EXCLUSIVE4_1 are indicated in the suppattr_exclcreat | |||
(Section 5.8.1.14) attribute. If the client attempts to set in | (Section 5.8.1.14) attribute. If the client attempts to set in | |||
cva_attrs an attribute that is not in suppattr_exclcreat, the server | cva_attrs an attribute that is not in suppattr_exclcreat, the server | |||
MUST return NFS4ERR_INVAL. The response field, attrset indicates | MUST return NFS4ERR_INVAL. The response field, attrset, indicates | |||
both which attributes the server set from cva_attrs, and which | both which attributes the server set from cva_attrs and which | |||
attributes the server used to store the verifier. As described in | attributes the server used to store the verifier. As described in | |||
Section 18.16.4, the client can compare cva_attrs.attrmask with | Section 18.16.4, the client can compare cva_attrs.attrmask with | |||
attrset to determine which attributes were used to store the | attrset to determine which attributes were used to store the | |||
verifier. | verifier. | |||
With the addition of persistent sessions and pNFS, under some | With the addition of persistent sessions and pNFS, under some | |||
conditions EXCLUSIVE4 MUST NOT be used by the client or supported by | conditions EXCLUSIVE4 MUST NOT be used by the client or supported by | |||
the server. The following table summarizes the appropriate and | the server. The following table summarizes the appropriate and | |||
mandated exclusive create methods for implementations of NFSv4.1: | mandated exclusive create methods for implementations of NFSv4.1: | |||
skipping to change at page 448, line 47 | skipping to change at page 449, line 47 | |||
| | | EXCLUSIVE4 | EXCLUSIVE4 (SHOULD | | | | | EXCLUSIVE4 | EXCLUSIVE4 (SHOULD | | |||
| | | | NOT) | | | | | | NOT) | | |||
| no | yes | EXCLUSIVE4_1 | EXCLUSIVE4_1 | | | no | yes | EXCLUSIVE4_1 | EXCLUSIVE4_1 | | |||
| yes | no | GUARDED4 | GUARDED4 | | | yes | no | GUARDED4 | GUARDED4 | | |||
| yes | yes | GUARDED4 | GUARDED4 | | | yes | yes | GUARDED4 | GUARDED4 | | |||
+----------------+-----------+---------------+----------------------+ | +----------------+-----------+---------------+----------------------+ | |||
Table 10 | Table 10 | |||
If CREATE_SESSION4_FLAG_PERSIST is set in the results of | If CREATE_SESSION4_FLAG_PERSIST is set in the results of | |||
CREATE_SESSION the reply cache is persistent (see Section 18.36). If | CREATE_SESSION, the reply cache is persistent (see Section 18.36). | |||
the EXCHGID4_FLAG_USE_PNFS_MDS flag is set in the results from | If the EXCHGID4_FLAG_USE_PNFS_MDS flag is set in the results from | |||
EXCHANGE_ID, the server is a pNFS server (see Section 18.35). If the | EXCHANGE_ID, the server is a pNFS server (see Section 18.35). If the | |||
client attempts to use EXCLUSIVE4 on a persistent session, or a | client attempts to use EXCLUSIVE4 on a persistent session, or a | |||
session derived from a EXCHGID4_FLAG_USE_PNFS_MDS client ID, the | session derived from an EXCHGID4_FLAG_USE_PNFS_MDS client ID, the | |||
server MUST return NFS4ERR_INVAL. | server MUST return NFS4ERR_INVAL. | |||
With persistent sessions, exclusive create semantics are fully | With persistent sessions, exclusive create semantics are fully | |||
achievable via GUARDED4, and so EXCLUSIVE4 or EXCLUSIVE4_1 MUST NOT | achievable via GUARDED4, and so EXCLUSIVE4 or EXCLUSIVE4_1 MUST NOT | |||
be used. When pNFS is being used, the layout_hint attribute might | be used. When pNFS is being used, the layout_hint attribute might | |||
not be supported after the file is created. Only the EXCLUSIVE4_1 | not be supported after the file is created. Only the EXCLUSIVE4_1 | |||
and GUARDED4 methods of exclusive file creation allow the atomic | and GUARDED4 methods of exclusive file creation allow the atomic | |||
setting of attributes. | setting of attributes. | |||
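The rules in Table 10 and the two paragraphs that follow it can be sketched in C. This is a minimal illustration, not a definitive implementation: the helper names createmode_permitted and pick_exclusive_createmode are hypothetical, and the numeric values of createmode4 are assumed to follow the protocol's XDR, which is not shown in this section.

```c
#include <stdbool.h>

/* createmode4 values; the numbering is an assumption about the XDR. */
enum createmode4 {
    UNCHECKED4   = 0,
    GUARDED4     = 1,
    EXCLUSIVE4   = 2,
    EXCLUSIVE4_1 = 3
};

/*
 * Hypothetical helper: may a client use `mode` on this session?
 *
 * persistent_reply_cache: CREATE_SESSION4_FLAG_PERSIST was set in the
 *                         CREATE_SESSION results.
 * pnfs_mds:               EXCHGID4_FLAG_USE_PNFS_MDS was set in the
 *                         EXCHANGE_ID results.
 */
bool createmode_permitted(enum createmode4 mode,
                          bool persistent_reply_cache,
                          bool pnfs_mds)
{
    switch (mode) {
    case EXCLUSIVE4:
        /* Rejected with NFS4ERR_INVAL on a persistent session or a
         * session derived from a pNFS MDS client ID. */
        return !persistent_reply_cache && !pnfs_mds;
    case EXCLUSIVE4_1:
        /* With a persistent session, GUARDED4 is used instead. */
        return !persistent_reply_cache;
    default:
        /* UNCHECKED4 and GUARDED4 are not restricted by Table 10. */
        return true;
    }
}

/* Hypothetical helper: the method a client would choose for an
 * exclusive create, preferring EXCLUSIVE4_1 over the deprecated
 * EXCLUSIVE4 when the reply cache is not persistent. */
enum createmode4 pick_exclusive_createmode(bool persistent_reply_cache)
{
    return persistent_reply_cache ? GUARDED4 : EXCLUSIVE4_1;
}
```

The sketch never selects EXCLUSIVE4, since it is deprecated and is a SHOULD NOT for the client even where the server is required to support it.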
For the target directory, the server returns change_info4 information | For the target directory, the server returns change_info4 information | |||
in cinfo. With the atomic field of the change_info4 data type, the | in cinfo. With the atomic field of the change_info4 data type, the | |||
server will indicate if the before and after change attributes were | server will indicate if the before and after change attributes were | |||
obtained atomically with respect to the link creation. | obtained atomically with respect to the link creation. | |||
The OPEN operation provides for Windows share reservation capability | The OPEN operation provides for Windows share reservation capability | |||
with the use of the share_access and share_deny fields of the OPEN | with the use of the share_access and share_deny fields of the OPEN | |||
arguments. The client specifies at OPEN the required share_access | arguments. The client specifies at OPEN the required share_access | |||
and share_deny modes. For clients that do not directly support | and share_deny modes. For clients that do not directly support | |||
SHAREs (i.e. UNIX), the expected deny value is DENY_NONE. In the | SHAREs (i.e., UNIX), the expected deny value is | |||
case that there is a existing SHARE reservation that conflicts with | OPEN4_SHARE_DENY_NONE. In the case that there is an existing SHARE | |||
the OPEN request, the server returns the error NFS4ERR_SHARE_DENIED. | reservation that conflicts with the OPEN request, the server returns | |||
For additional discussion of SHARE semantics see Section 9.7. | the error NFS4ERR_SHARE_DENIED. For additional discussion of SHARE | |||
semantics, see Section 9.7. | ||||
For each OPEN, the client provides a value for the owner field of the | For each OPEN, the client provides a value for the owner field of the | |||
OPEN argument. The owner field is of data type open_owner4, and | OPEN argument. The owner field is of data type open_owner4, and | |||
contains a field called clientid and a field called owner. The | contains a field called clientid and a field called owner. The | |||
client can set the clientid field to any value and the server MUST | client can set the clientid field to any value and the server MUST | |||
ignore it. Instead the server MUST derive the client ID from the | ignore it. Instead, the server MUST derive the client ID from the | |||
session ID of the SEQUENCE operation of the COMPOUND request. | session ID of the SEQUENCE operation of the COMPOUND request. | |||
The seqid field of the request is not used in NFSv4.1, but it MAY be | The "seqid" field of the request is not used in NFSv4.1, but it MAY | |||
any value and the server MUST ignore it. | be any value and the server MUST ignore it. | |||
In the case that the client is recovering state from a server | In the case that the client is recovering state from a server | |||
failure, the claim field of the OPEN argument is used to signify that | failure, the claim field of the OPEN argument is used to signify that | |||
the request is meant to reclaim state previously held. | the request is meant to reclaim state previously held. | |||
The "claim" field of the OPEN argument is used to specify the file to | The "claim" field of the OPEN argument is used to specify the file to | |||
be opened and the state information which the client claims to | be opened and the state information that the client claims to | |||
possess. There are seven claim types as follows: | possess. There are seven claim types as follows: | |||
+----------------------+--------------------------------------------+ | +----------------------+--------------------------------------------+ | |||
| claim type | description | | | claim type | description | | |||
+----------------------+--------------------------------------------+ | +----------------------+--------------------------------------------+ | |||
| CLAIM_NULL, CLAIM_FH | For the client, this is a new OPEN request | | | CLAIM_NULL, CLAIM_FH | For the client, this is a new OPEN request | | |||
| | and there is no previous state associate | | | | and there is no previous state associated | | |||
| | with the file for the client. With | | | | with the file for the client. With | | |||
| | CLAIM_NULL the file is identified by the | | | | CLAIM_NULL, the file is identified by the | | |||
| | current filehandle and the specified | | | | current filehandle and the specified | | |||
| | component name. With CLAIM_FH (new to | | | | component name. With CLAIM_FH (new to | | |||
| | NFSv4.1) the file is identified by just | | | | NFSv4.1), the file is identified by just | | |||
| | the current filehandle. | | | | the current filehandle. | | |||
| CLAIM_PREVIOUS | The client is claiming basic OPEN state | | | CLAIM_PREVIOUS | The client is claiming basic OPEN state | | |||
| | for a file that was held previous to a | | | | for a file that was held previous to a | | |||
| | server restart. Generally used when a | | | | server restart. Generally used when a | | |||
| | server is returning persistent | | | | server is returning persistent | | |||
| | filehandles; the client may not have the | | | | filehandles; the client may not have the | | |||
| | file name to reclaim the OPEN. | | | | file name to reclaim the OPEN. | | |||
| CLAIM_DELEGATE_CUR, | The client is claiming a delegation for | | | CLAIM_DELEGATE_CUR, | The client is claiming a delegation for | | |||
| CLAIM_DELEG_CUR_FH | OPEN as granted by the server. Generally | | | CLAIM_DELEG_CUR_FH | OPEN as granted by the server. Generally, | | |||
| | this is done as part of recalling a | | | | this is done as part of recalling a | | |||
| | delegation. With CLAIM_DELEGATE_CUR, the | | | | delegation. With CLAIM_DELEGATE_CUR, the | | |||
| | file is identified by the current | | | | file is identified by the current | | |||
| | filehandle and the specified component | | | | filehandle and the specified component | | |||
| | name. With CLAIM_DELEG_CUR_FH (new to | | | | name. With CLAIM_DELEG_CUR_FH (new to | | |||
| | NFSv4.1), the file is identified by just | | | | NFSv4.1), the file is identified by just | | |||
| | the current filehandle. | | | | the current filehandle. | | |||
| CLAIM_DELEGATE_PREV, | The client is claiming a delegation | | | CLAIM_DELEGATE_PREV, | The client is claiming a delegation | | |||
| CLAIM_DELEG_PREV_FH | granted to a previous client instance; | | | CLAIM_DELEG_PREV_FH | granted to a previous client instance; | | |||
| | used after the client restarts. The server | | | | used after the client restarts. The server | | |||
| | MAY support CLAIM_DELEGATE_PREV or | | | | MAY support CLAIM_DELEGATE_PREV and/or | | |||
| | CLAIM_DELEG_PREV_FH (new to NFSv4.1). If | | | | CLAIM_DELEG_PREV_FH (new to NFSv4.1). If | | |||
| | it does support either open type, | | | | it does support either claim type, | | |||
| | CREATE_SESSION MUST NOT remove the | | | | CREATE_SESSION MUST NOT remove the | | |||
| | client's delegation state, and the server | | | | client's delegation state, and the server | | |||
| | MUST support the DELEGPURGE operation. | | | | MUST support the DELEGPURGE operation. | | |||
+----------------------+--------------------------------------------+ | +----------------------+--------------------------------------------+ | |||
For OPEN requests that reach the server during the grace period, the | For OPEN requests that reach the server during the grace period, the | |||
server returns an error of NFS4ERR_GRACE. The following claim types | server returns an error of NFS4ERR_GRACE. The following claim types | |||
are exceptions: | are exceptions: | |||
o OPEN requests specifying the claim type CLAIM_PREVIOUS are devoted | o OPEN requests specifying the claim type CLAIM_PREVIOUS are devoted | |||
to reclaiming opens after a server restart and are typically only | to reclaiming opens after a server restart and are typically only | |||
valid during the grace period. | valid during the grace period. | |||
o OPEN requests specifying the claim types CLAIM_DELEGATE_CUR and | o OPEN requests specifying the claim types CLAIM_DELEGATE_CUR and | |||
CLAIM_DELEG_CUR_FH are valid both during and after the grace | CLAIM_DELEG_CUR_FH are valid both during and after the grace | |||
period. Since the granting of the delegation that they are | period. Since the granting of the delegation that they are | |||
subordinate to assures that there is no conflict with locks to be | subordinate to assures that there is no conflict with locks to be | |||
reclaimed by other clients, the server need not return | reclaimed by other clients, the server need not return | |||
NFS4ERR_GRACE when these are received during the grace period. | NFS4ERR_GRACE when these are received during the grace period. | |||
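The grace-period rule and its exceptions can be sketched as a server-side check. This is an illustration under stated assumptions: the function name open_blocked_by_grace is hypothetical, the numeric values of open_claim_type4 are assumed to follow the protocol's XDR, and the sketch takes the permissive option for CLAIM_DELEGATE_CUR and CLAIM_DELEG_CUR_FH, for which the text says only that the server need not return NFS4ERR_GRACE.

```c
#include <stdbool.h>

/* Claim types; the numbering is an assumption about the XDR. */
enum open_claim_type4 {
    CLAIM_NULL          = 0,
    CLAIM_PREVIOUS      = 1,
    CLAIM_DELEGATE_CUR  = 2,
    CLAIM_DELEGATE_PREV = 3,
    CLAIM_FH            = 4,
    CLAIM_DELEG_CUR_FH  = 5,
    CLAIM_DELEG_PREV_FH = 6
};

/*
 * Hypothetical helper: returns true when the server would answer an
 * OPEN with NFS4ERR_GRACE, given the claim type and whether the
 * server is currently in its grace period.
 */
bool open_blocked_by_grace(enum open_claim_type4 claim, bool in_grace)
{
    if (!in_grace)
        return false;

    switch (claim) {
    case CLAIM_PREVIOUS:      /* reclaim of opens after server restart */
    case CLAIM_DELEGATE_CUR:  /* covered by an existing delegation */
    case CLAIM_DELEG_CUR_FH:  /* same, identified by filehandle only */
        return false;
    default:
        return true;
    }
}
```

The sketch models only the NFS4ERR_GRACE decision; it does not cover how a CLAIM_PREVIOUS request is handled once the grace period has ended.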
For any OPEN request, the server may return an open delegation, which | For any OPEN request, the server may return an OPEN delegation, which | |||
allows further opens and closes to be handled locally on the client | allows further opens and closes to be handled locally on the client | |||
as described in Section 10.4. Note that delegation is up to the | as described in Section 10.4. Note that delegation is up to the | |||
server to decide. The client should never assume that delegation | server to decide. The client should never assume that delegation | |||
will or will not be granted in a particular instance. It should | will or will not be granted in a particular instance. It should | |||
always be prepared for either case. A partial exception is the | always be prepared for either case. A partial exception is the | |||
reclaim (CLAIM_PREVIOUS) case, in which a delegation type is claimed. | reclaim (CLAIM_PREVIOUS) case, in which a delegation type is claimed. | |||
In this case, delegation will always be granted, although the server | In this case, delegation will always be granted, although the server | |||
may specify an immediate recall in the delegation structure. | may specify an immediate recall in the delegation structure. | |||
The rflags returned by a successful OPEN allow the server to return | The rflags returned by a successful OPEN allow the server to return | |||
information governing how the open file is to be handled. | information governing how the open file is to be handled. | |||
o OPEN4_RESULT_CONFIRM is deprecated and MUST NOT be returned by an | o OPEN4_RESULT_CONFIRM is deprecated and MUST NOT be returned by an | |||
NFSv4.1 server. | NFSv4.1 server. | |||
o OPEN4_RESULT_LOCKTYPE_POSIX indicates the server's file locking | o OPEN4_RESULT_LOCKTYPE_POSIX indicates that the server's byte-range | |||
behavior supports the complete set of POSIX locking techniques | locking behavior supports the complete set of POSIX locking | |||
[24]. From this the client can choose to manage file locking | techniques [24]. From this, the client can choose to manage byte- | |||
state in a way to handle a mis-match of file locking management. | range locking state in a way to handle a mismatch of byte-range | |||
locking management. | ||||
o OPEN4_RESULT_PRESERVE_UNLINKED indicates the server will preserve | o OPEN4_RESULT_PRESERVE_UNLINKED indicates that the server will | |||
the open file if the client (or any other client) removes the file | preserve the open file if the client (or any other client) removes | |||
as long as it is open. Furthermore, the server promises to | the file as long as it is open. Furthermore, the server promises | |||
preserve the file through the grace period after server restart, | to preserve the file through the grace period after server | |||
thereby giving the client the opportunity to reclaim its open. | restart, thereby giving the client the opportunity to reclaim its | |||
open. | ||||
o OPEN4_RESULT_MAY_NOTIFY_LOCK indicates that the server may attempt | o OPEN4_RESULT_MAY_NOTIFY_LOCK indicates that the server may attempt | |||
CB_NOTIFY_LOCK callbacks for locks on this file. This flag is a | CB_NOTIFY_LOCK callbacks for locks on this file. This flag is a | |||
hint only, and may be safely ignored by the client. | hint only, and may be safely ignored by the client. | |||
If the component is of zero length, NFS4ERR_INVAL will be returned. | If the component is of zero length, NFS4ERR_INVAL will be returned. | |||
The component is also subject to the normal UTF-8, character support, | The component is also subject to the normal UTF-8, character support, | |||
and name checks. See Section 14.5 for further discussion. | and name checks. See Section 14.5 for further discussion. | |||
When an OPEN is done and the specified open-owner already has the | When an OPEN is done and the specified open-owner already has the | |||
skipping to change at page 452, line 17 | skipping to change at page 453, line 19 | |||
read-only mode and the OPEN request has specified ACCESS_WRITE or | read-only mode and the OPEN request has specified ACCESS_WRITE or | |||
ACCESS_BOTH, the server will return NFS4ERR_ROFS to indicate a read- | ACCESS_BOTH, the server will return NFS4ERR_ROFS to indicate a read- | |||
only file system. | only file system. | |||
As with the CREATE operation, the server MUST derive the owner, owner | As with the CREATE operation, the server MUST derive the owner, owner | |||
ACE, group, or group ACE if any of the four attributes are required | ACE, group, or group ACE if any of the four attributes are required | |||
and supported by the server's file system. For an OPEN with the | and supported by the server's file system. For an OPEN with the | |||
EXCLUSIVE4 createmode, the server has no choice, since such OPEN | EXCLUSIVE4 createmode, the server has no choice, since such OPEN | |||
calls do not include the createattrs field. Conversely, if | calls do not include the createattrs field. Conversely, if | |||
createattrs (UNCHECKED4 or GUARDED4) or cva_attrs (EXCLUSIVE4_1) is | createattrs (UNCHECKED4 or GUARDED4) or cva_attrs (EXCLUSIVE4_1) is | |||
specified, and includes an owner, or owner_group, or ACE that the | specified, and includes an owner, owner_group, or ACE that the | |||
principal in the RPC call's credentials does not have authorization | principal in the RPC call's credentials does not have authorization | |||
to create files for, then the server may return NFS4ERR_PERM. | to create files for, then the server may return NFS4ERR_PERM. | |||
In the case of an OPEN which specifies a size of zero (e.g. | In the case of an OPEN that specifies a size of zero (e.g., | |||
truncation) and the file has named attributes, the named attributes | truncation) and the file has named attributes, the named attributes | |||
are left as is and are not removed. | are left as is and are not removed. | |||
NFSv4.1 gives more precise control to clients over acquisition of | NFSv4.1 gives more precise control to clients over acquisition of | |||
delegations via the following new flags for the share_access field of | delegations via the following new flags for the share_access field of | |||
OPEN4args: | OPEN4args: | |||
OPEN4_SHARE_ACCESS_WANT_READ_DELEG | OPEN4_SHARE_ACCESS_WANT_READ_DELEG | |||
OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG | OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG | |||
skipping to change at page 452, line 47 | skipping to change at page 454, line 4 | |||
OPEN4_SHARE_ACCESS_WANT_CANCEL | OPEN4_SHARE_ACCESS_WANT_CANCEL | |||
OPEN4_SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL | OPEN4_SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL | |||
OPEN4_SHARE_ACCESS_WANT_PUSH_DELEG_WHEN_UNCONTENDED | OPEN4_SHARE_ACCESS_WANT_PUSH_DELEG_WHEN_UNCONTENDED | |||
If (share_access & OPEN4_SHARE_ACCESS_WANT_DELEG_MASK) is not zero, | If (share_access & OPEN4_SHARE_ACCESS_WANT_DELEG_MASK) is not zero, | |||
then the client will have specified one and only one of: | then the client will have specified one and only one of: | |||
OPEN4_SHARE_ACCESS_WANT_READ_DELEG | OPEN4_SHARE_ACCESS_WANT_READ_DELEG | |||
OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG | OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG | |||
OPEN4_SHARE_ACCESS_WANT_ANY_DELEG | OPEN4_SHARE_ACCESS_WANT_ANY_DELEG | |||
OPEN4_SHARE_ACCESS_WANT_NO_DELEG | OPEN4_SHARE_ACCESS_WANT_NO_DELEG | |||
OPEN4_SHARE_ACCESS_WANT_CANCEL | OPEN4_SHARE_ACCESS_WANT_CANCEL | |||
Otherwise the client is indicating no desire for a delegation and the | Otherwise, the client is neither indicating a desire nor a non-desire | |||
server MAY or MAY not return a delegation in the OPEN response. | for a delegation, and the server MAY or MAY not return a delegation | |||
in the OPEN response. | ||||
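The "one and only one" rule for the want flags can be sketched as follows. The numeric constants are assumptions: they are believed to match the protocol's XDR, which this section does not show, and should be checked against it before use. The function names want_field_is_valid and client_expressed_want are hypothetical. Under the assumed encoding, the want values occupy a single numeric subfield of share_access, so the rule reduces to comparing the masked value against the known values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; verify against the protocol's XDR. */
#define OPEN4_SHARE_ACCESS_WANT_DELEG_MASK   0xFF00u
#define OPEN4_SHARE_ACCESS_WANT_READ_DELEG   0x0100u
#define OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG  0x0200u
#define OPEN4_SHARE_ACCESS_WANT_ANY_DELEG    0x0300u
#define OPEN4_SHARE_ACCESS_WANT_NO_DELEG     0x0400u
#define OPEN4_SHARE_ACCESS_WANT_CANCEL       0x0500u

/* Hypothetical helper: true when the client stated a preference,
 * that is, when the masked want subfield is not zero. */
bool client_expressed_want(uint32_t share_access)
{
    return (share_access & OPEN4_SHARE_ACCESS_WANT_DELEG_MASK) != 0;
}

/* Hypothetical helper: true when the want subfield is either zero
 * (no preference stated) or exactly one of the five defined values. */
bool want_field_is_valid(uint32_t share_access)
{
    switch (share_access & OPEN4_SHARE_ACCESS_WANT_DELEG_MASK) {
    case 0:
    case OPEN4_SHARE_ACCESS_WANT_READ_DELEG:
    case OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG:
    case OPEN4_SHARE_ACCESS_WANT_ANY_DELEG:
    case OPEN4_SHARE_ACCESS_WANT_NO_DELEG:
    case OPEN4_SHARE_ACCESS_WANT_CANCEL:
        return true;
    default:
        return false;
    }
}
```

When client_expressed_want returns false, the server is free to grant or withhold a delegation as it sees fit.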
If the server supports the new _WANT_ flags and the client sends one | If the server supports the new _WANT_ flags and the client sends one | |||
or more of the new flags, then in the event the server does not | or more of the new flags, then in the event the server does not | |||
return a delegation, it MUST return a delegation type of | return a delegation, it MUST return a delegation type of | |||
OPEN_DELEGATE_NONE_EXT. The field od_whynone in the reply indicates | OPEN_DELEGATE_NONE_EXT. The field ond_why in the reply indicates why | |||
why no delegation was returned and will be one of: | no delegation was returned and will be one of: | |||
WND4_NOT_WANTED The client specified | WND4_NOT_WANTED The client specified | |||
OPEN4_SHARE_ACCESS_WANT_NO_DELEG. | OPEN4_SHARE_ACCESS_WANT_NO_DELEG. | |||
WND4_CONTENTION There is a conflicting delegation or open on the | WND4_CONTENTION There is a conflicting delegation or open on the | |||
file. | file. | |||
WND4_RESOURCE Resource limitations prevent the server from granting | WND4_RESOURCE Resource limitations prevent the server from granting | |||
a delegation. | a delegation. | |||
WND4_NOT_SUPP_FTYPE The server does not support delegations on this | WND4_NOT_SUPP_FTYPE The server does not support delegations on this | |||
file type. | file type. | |||
WND4_WRITE_DELEG_NOT_SUPP_FTYPE The server does not support write | WND4_WRITE_DELEG_NOT_SUPP_FTYPE The server does not support | |||
delegations on this file type. | OPEN_DELEGATE_WRITE delegations on this file type. | |||
WND4_NOT_SUPP_UPGRADE The server does not support atomic upgrade of | WND4_NOT_SUPP_UPGRADE The server does not support atomic upgrade of | |||
a read delegation to a write delegation. | an OPEN_DELEGATE_READ delegation to an OPEN_DELEGATE_WRITE | |||
delegation. | ||||
WND4_NOT_SUPP_DOWNGRADE The server does not support atomic downgrade | WND4_NOT_SUPP_DOWNGRADE The server does not support atomic downgrade | |||
of a write delegation to a read delegation. | of an OPEN_DELEGATE_WRITE delegation to an OPEN_DELEGATE_READ | |||
delegation. | ||||
WND4_CANCELLED The client specified OPEN4_SHARE_ACCESS_WANT_CANCEL | WND4_CANCELED The client specified OPEN4_SHARE_ACCESS_WANT_CANCEL | |||
and now any "want" for this file object is cancelled. | and now any "want" for this file object is cancelled. | |||
WND4_IS_DIR The specified file object is a directory, and the | WND4_IS_DIR The specified file object is a directory, and the | |||
operation is OPEN or WANT_DELEGATION which do not support | operation is OPEN or WANT_DELEGATION, which do not support | |||
delegations on directories. | delegations on directories. | |||
OPEN4_SHARE_ACCESS_WANT_READ_DELEG, | OPEN4_SHARE_ACCESS_WANT_READ_DELEG, | |||
OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG, or | OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG, or | |||
OPEN4_SHARE_ACCESS_WANT_ANY_DELEG mean, respectively, the client wants | OPEN4_SHARE_ACCESS_WANT_ANY_DELEG mean, respectively, the client wants | |||
a read, write, or any delegation regardless of which of | an OPEN_DELEGATE_READ, OPEN_DELEGATE_WRITE, or any delegation | |||
OPEN4_SHARE_ACCESS_READ, OPEN4_SHARE_ACCESS_WRITE, or | regardless of which of OPEN4_SHARE_ACCESS_READ, | |||
OPEN4_SHARE_ACCESS_BOTH is set. If the client has a read delegation | OPEN4_SHARE_ACCESS_WRITE, or OPEN4_SHARE_ACCESS_BOTH is set. If the | |||
on a file, and requests a write delegation, then the client is | client has an OPEN_DELEGATE_READ delegation on a file and requests an | |||
requesting atomic upgrade of its read delegation to a write | OPEN_DELEGATE_WRITE delegation, then the client is requesting atomic | |||
delegation. If the client has a write delegation on a file, and | upgrade of its OPEN_DELEGATE_READ delegation to an | |||
requests a read delegation, then the client is requesting atomic | OPEN_DELEGATE_WRITE delegation. If the client has an | |||
downgrade to a read delegation. A server MAY support atomic upgrade | OPEN_DELEGATE_WRITE delegation on a file and requests an | |||
or downgrade. If it does, then the returned delegation_type of | OPEN_DELEGATE_READ delegation, then the client is requesting atomic | |||
OPEN_DELEGATE_READ or OPEN_DELEGATE_WRITE that is different than the | downgrade to an OPEN_DELEGATE_READ delegation. A server MAY support | |||
delegation type the client currently has, indicates successful | atomic upgrade or downgrade. If it does, then the returned | |||
upgrade or downgrade. If it does not support atomic delegation | delegation_type of OPEN_DELEGATE_READ or OPEN_DELEGATE_WRITE that is | |||
upgrade or downgrade, then od_whynone will be WND4_NOT_SUPP_UPGRADE | different from the delegation type the client currently has, | |||
or WND4_NOT_SUPP_DOWNGRADE. | indicates successful upgrade or downgrade. If the server does not | |||
support atomic delegation upgrade or downgrade, then ond_why will be | ||||
set to WND4_NOT_SUPP_UPGRADE or WND4_NOT_SUPP_DOWNGRADE. | ||||
OPEN4_SHARE_ACCESS_WANT_NO_DELEG means the client wants no | OPEN4_SHARE_ACCESS_WANT_NO_DELEG means that the client wants no | |||
delegation. | delegation. | |||
OPEN4_SHARE_ACCESS_WANT_CANCEL means the client wants no delegation | OPEN4_SHARE_ACCESS_WANT_CANCEL means that the client wants no | |||
and wants to cancel any previously registered "want" for a | delegation and wants to cancel any previously registered "want" for a | |||
delegation. | delegation. | |||
The client may set one or both of | The client may set one or both of | |||
OPEN4_SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL and | OPEN4_SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL and | |||
OPEN4_SHARE_ACCESS_WANT_PUSH_DELEG_WHEN_UNCONTENDED. However, they | OPEN4_SHARE_ACCESS_WANT_PUSH_DELEG_WHEN_UNCONTENDED. However, they | |||
will have no effect unless one of the following are set: | will have no effect unless one of the following is set: | |||
o OPEN4_SHARE_ACCESS_WANT_READ_DELEG | o OPEN4_SHARE_ACCESS_WANT_READ_DELEG | |||
o OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG | o OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG | |||
o OPEN4_SHARE_ACCESS_WANT_ANY_DELEG | o OPEN4_SHARE_ACCESS_WANT_ANY_DELEG | |||
If the client specifies | If the client specifies | |||
OPEN4_SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL, then it wishes | OPEN4_SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL, then it wishes | |||
to register a "want" for a delegation, in the event the OPEN results | to register a "want" for a delegation, in the event the OPEN results | |||
skipping to change at page 455, line 32 | skipping to change at page 456, line 39 | |||
In the absence of a persistent session, the client invokes exclusive | In the absence of a persistent session, the client invokes exclusive | |||
create by setting the how parameter to EXCLUSIVE4 or EXCLUSIVE4_1. | create by setting the how parameter to EXCLUSIVE4 or EXCLUSIVE4_1. | |||
In these cases, the client provides a verifier that can reasonably be | In these cases, the client provides a verifier that can reasonably be | |||
expected to be unique. A combination of a client identifier, perhaps | expected to be unique. A combination of a client identifier, perhaps | |||
the client network address, and a unique number generated by the | the client network address, and a unique number generated by the | |||
client, perhaps the RPC transaction identifier, may be appropriate. | client, perhaps the RPC transaction identifier, may be appropriate. | |||
If the object does not exist, the server creates the object and | If the object does not exist, the server creates the object and | |||
stores the verifier in stable storage. For file systems that do not | stores the verifier in stable storage. For file systems that do not | |||
provide a mechanism for the storage of arbitrary file attributes, the | provide a mechanism for the storage of arbitrary file attributes, the | |||
server may use one or more elements of the object metadata to store | server may use one or more elements of the object's metadata to store | |||
the verifier. The verifier MUST be stored in stable storage to | the verifier. The verifier MUST be stored in stable storage to | |||
prevent erroneous failure on retransmission of the request. It is | prevent erroneous failure on retransmission of the request. It is | |||
assumed that an exclusive create is being performed because exclusive | assumed that an exclusive create is being performed because exclusive | |||
semantics are critical to the application. Because of the expected | semantics are critical to the application. Because of the expected | |||
usage, exclusive CREATE does not rely solely on the server's reply | usage, exclusive CREATE does not rely solely on the server's reply | |||
cache for storage of the verifier. A nonpersistent reply cache does | cache for storage of the verifier. A nonpersistent reply cache does | |||
not survive a crash and the session and reply cache may be deleted | not survive a crash and the session and reply cache may be deleted | |||
after a network partition that exceeds the lease time, thus opening | after a network partition that exceeds the lease time, thus opening | |||
failure windows. | failure windows. | |||
An NFSv4.1 server SHOULD NOT store the verifier in any of the file's | An NFSv4.1 server SHOULD NOT store the verifier in any of the file's | |||
RECOMMENDED or REQUIRED attributes. If it does, the server SHOULD | RECOMMENDED or REQUIRED attributes. If it does, the server SHOULD | |||
use time_modify_set or time_access_set to store the verifier. The | use time_modify_set or time_access_set to store the verifier. The | |||
server SHOULD NOT store the verifier in the following attributes: acl | server SHOULD NOT store the verifier in the following attributes: | |||
(it is desirable for access control to be established at creation), | ||||
dacl (ditto), mode (ditto), owner (ditto), owner_group (ditto), | acl (it is desirable for access control to be established at | |||
retentevt_set (it may be desired to establish retention at creation) | creation), | |||
retention_hold (ditto), retention_set (ditto), sacl (it is desirable | ||||
for auditing control to be established at creation), size (on some | dacl (ditto), | |||
servers, size may have a limited range of values), mode_set_masked | ||||
(as with mode), and time_creation (a meaningful file creation should | mode (ditto), | |||
be set when the file is created). Another alternative for the server | ||||
is to use a named attribute to store the verifier. | owner (ditto), | |||
owner_group (ditto), | ||||
retentevt_set (it may be desired to establish retention at | ||||
creation) | ||||
retention_hold (ditto), | ||||
retention_set (ditto), | ||||
sacl (it is desirable for auditing control to be established at | ||||
creation), | ||||
size (on some servers, size may have a limited range of values), | ||||
mode_set_masked (as with mode), | ||||
and | ||||
time_creation (a meaningful file creation should be set when the | ||||
file is created). | ||||
Another alternative for the server is to use a named attribute to | ||||
store the verifier. | ||||
Because the EXCLUSIVE4 create method does not specify initial | Because the EXCLUSIVE4 create method does not specify initial | |||
attributes, when processing an EXCLUSIVE4 create, the server | attributes when processing an EXCLUSIVE4 create, the server | |||
o SHOULD set the owner of the file to that corresponding to the | o SHOULD set the owner of the file to that corresponding to the | |||
credential of request's RPC header. | credential of request's RPC header. | |||
o SHOULD NOT leave the file's access control to anyone but the owner | o SHOULD NOT leave the file's access control to anyone but the owner | |||
of the file. | of the file. | |||
If the server cannot support exclusive create semantics, possibly | If the server cannot support exclusive create semantics, possibly | |||
because of the requirement to commit the verifier to stable storage, | because of the requirement to commit the verifier to stable storage, | |||
it should fail the OPEN request with the error, NFS4ERR_NOTSUPP. | it should fail the OPEN request with the error NFS4ERR_NOTSUPP. | |||
During an exclusive CREATE request, if the object already exists, the | During an exclusive CREATE request, if the object already exists, the | |||
server reconstructs the object's verifier and compares it with the | server reconstructs the object's verifier and compares it with the | |||
verifier in the request. If they match, the server treats the | verifier in the request. If they match, the server treats the | |||
request as a success. The request is presumed to be a duplicate of | request as a success. The request is presumed to be a duplicate of | |||
an earlier, successful request for which the reply was lost and that | an earlier, successful request for which the reply was lost and that | |||
the server duplicate request cache mechanism did not detect. If the | the server duplicate request cache mechanism did not detect. If the | |||
verifiers do not match, the request is rejected with the status, | verifiers do not match, the request is rejected with the status | |||
NFS4ERR_EXIST. | NFS4ERR_EXIST. | |||
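The decision described above for an exclusive create can be sketched in C as follows. This is a minimal illustration rather than a server implementation: the enum, the function name exclusive_create_decide, and the idea that the caller has already reconstructed the stored verifier are assumptions made for the example. Only the 8-byte verifier size and the three outcomes come from the protocol text.

```c
#include <stdbool.h>
#include <string.h>

#define NFS4_VERIFIER_SIZE 8   /* size of verifier4 in the protocol */

/* Hypothetical outcome codes used only by this sketch. */
enum excl_result {
    EXCL_CREATE_NEW,        /* object absent: create it and store the verifier */
    EXCL_TREAT_AS_SUCCESS,  /* verifiers match: presumed retransmission */
    EXCL_REJECT_EXIST       /* verifiers differ: reply NFS4ERR_EXIST */
};

/*
 * Decide how to handle an exclusive create request.
 * "stored" is the verifier reconstructed from the existing object
 * (ignored when the object does not exist); "requested" is the
 * verifier carried in the request.
 */
static enum excl_result
exclusive_create_decide(bool object_exists,
                        const unsigned char stored[NFS4_VERIFIER_SIZE],
                        const unsigned char requested[NFS4_VERIFIER_SIZE])
{
    if (!object_exists)
        return EXCL_CREATE_NEW;
    if (memcmp(stored, requested, NFS4_VERIFIER_SIZE) == 0)
        return EXCL_TREAT_AS_SUCCESS;
    return EXCL_REJECT_EXIST;
}
```

The match case is what makes a retransmitted request safe when the reply cache did not catch the duplicate.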
After the client has performed a successful exclusive create, the | After the client has performed a successful exclusive create, the | |||
attrset response indicates which attributes were used to store the | attrset response indicates which attributes were used to store the | |||
verifier. If EXCLUSIVE4 was used, the attributes set in attrset were | verifier. If EXCLUSIVE4 was used, the attributes set in attrset were | |||
used for the verifier. If EXCLUSIVE4_1 was used, the client | used for the verifier. If EXCLUSIVE4_1 was used, the client | |||
determines the attributes used for the verifier by comparing attrset | determines the attributes used for the verifier by comparing attrset | |||
with cva_attrs.attrmask; any bits set in the former but not the | with cva_attrs.attrmask; any bits set in the former but not the | |||
latter identify the attributes used store the verifier. The client | latter identify the attributes used to store the verifier. The | |||
MUST immediately send a SETATTR to set attributes used to store the | client MUST immediately send a SETATTR to set attributes used to | |||
verifier. Until it does so, the attributes used to store the | store the verifier. Until it does so, the attributes used to store | |||
verifier cannot be relied upon. The subsequent SETATTR MUST NOT | the verifier cannot be relied upon. The subsequent SETATTR MUST NOT | |||
occur in the same COMPOUND request as the OPEN. | occur in the same COMPOUND request as the OPEN. | |||
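For the EXCLUSIVE4_1 case, the comparison of attrset with cva_attrs.attrmask amounts to a per-word "set in the former but not the latter" operation on the two bitmaps. The sketch below assumes each bitmap4 is available as an array of 32-bit words plus a length; the function name verifier_attrs and its calling convention are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Compute the attributes the server used to store the verifier:
 * bits set in attrset (from the OPEN reply) but not in attrmask
 * (the cva_attrs.attrmask the client sent).
 * "out" must have room for attrset_len words.
 * Words missing from a shorter attrmask are treated as zero.
 */
static void
verifier_attrs(const uint32_t *attrset, size_t attrset_len,
               const uint32_t *attrmask, size_t attrmask_len,
               uint32_t *out)
{
    for (size_t i = 0; i < attrset_len; i++) {
        uint32_t requested = (i < attrmask_len) ? attrmask[i] : 0;
        out[i] = attrset[i] & ~requested;
    }
}
```

A client could then build the follow-up SETATTR from the bits left in out, sent in a later COMPOUND than the OPEN.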
Unless a persistent session is used, use of the GUARDED4 attribute | Unless a persistent session is used, use of the GUARDED4 attribute | |||
does not provide exactly-once semantics. In particular, if a reply | does not provide exactly once semantics. In particular, if a reply | |||
is lost and the server does not detect the retransmission of the | is lost and the server does not detect the retransmission of the | |||
request, the operation can fail with NFS4ERR_EXIST, even though the | request, the operation can fail with NFS4ERR_EXIST, even though the | |||
create was performed successfully. The client would use this | create was performed successfully. The client would use this | |||
behavior in the case that the application has not requested an | behavior in the case that the application has not requested an | |||
exclusive create but has asked to have the file truncated when the | exclusive create but has asked to have the file truncated when the | |||
file is opened. In the case of the client timing out and | file is opened. In the case of the client timing out and | |||
retransmitting the create request, the client can use GUARDED4 to | retransmitting the create request, the client can use GUARDED4 to | |||
prevent against a sequence like: create, write, create | prevent against a sequence like create, write, create (retransmitted) | |||
(retransmitted) from occurring. | from occurring. | |||
For SHARE reservations, the client MUST specify a value for | For SHARE reservations, the value of the expression (share_access & | |||
share_access that is one of READ, WRITE, or BOTH. For share_deny, | ~OPEN4_SHARE_ACCESS_WANT_DELEG_MASK) MUST be one of | |||
the client MUST specify one of NONE, READ, WRITE, or BOTH. If the | OPEN4_SHARE_ACCESS_READ, OPEN4_SHARE_ACCESS_WRITE, or | |||
client fails to do this, the server MUST return NFS4ERR_INVAL. | OPEN4_SHARE_ACCESS_BOTH. If not, the server MUST return | |||
NFS4ERR_INVAL. The value of share_deny MUST be one of | ||||
OPEN4_SHARE_DENY_NONE, OPEN4_SHARE_DENY_READ, OPEN4_SHARE_DENY_WRITE, | ||||
or OPEN4_SHARE_DENY_BOTH. If not, the server MUST return | ||||
NFS4ERR_INVAL. | ||||
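The validity rule for SHARE reservations in the RFC wording can be sketched as a small C predicate. The constant values are those given in the protocol's XDR description; the helper name share_args_valid is hypothetical, and a server would map a false result to NFS4ERR_INVAL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as given in the NFSv4.1 XDR description. */
#define OPEN4_SHARE_ACCESS_READ             0x00000001u
#define OPEN4_SHARE_ACCESS_WRITE            0x00000002u
#define OPEN4_SHARE_ACCESS_BOTH             0x00000003u
#define OPEN4_SHARE_DENY_NONE               0x00000000u
#define OPEN4_SHARE_DENY_READ               0x00000001u
#define OPEN4_SHARE_DENY_WRITE              0x00000002u
#define OPEN4_SHARE_DENY_BOTH               0x00000003u
#define OPEN4_SHARE_ACCESS_WANT_DELEG_MASK  0xFF00u
#define OPEN4_SHARE_ACCESS_WANT_READ_DELEG  0x0100u

/*
 * Return true when share_access (after masking out the delegation
 * "want" bits) and share_deny each hold one of the permitted values.
 */
static bool
share_args_valid(uint32_t share_access, uint32_t share_deny)
{
    uint32_t access = share_access & ~OPEN4_SHARE_ACCESS_WANT_DELEG_MASK;

    bool access_ok = access == OPEN4_SHARE_ACCESS_READ ||
                     access == OPEN4_SHARE_ACCESS_WRITE ||
                     access == OPEN4_SHARE_ACCESS_BOTH;
    bool deny_ok = share_deny == OPEN4_SHARE_DENY_NONE ||
                   share_deny == OPEN4_SHARE_DENY_READ ||
                   share_deny == OPEN4_SHARE_DENY_WRITE ||
                   share_deny == OPEN4_SHARE_DENY_BOTH;

    return access_ok && deny_ok;
}
```

Masking first is what lets a client combine an access mode with a delegation want, for example OPEN4_SHARE_ACCESS_READ | OPEN4_SHARE_ACCESS_WANT_READ_DELEG, without tripping the check.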
Based on the share_access value (READ, WRITE, or BOTH) the client | Based on the share_access value (OPEN4_SHARE_ACCESS_READ, | |||
OPEN4_SHARE_ACCESS_WRITE, or OPEN4_SHARE_ACCESS_BOTH), the client | ||||
should check that the requester has the proper access rights to | should check that the requester has the proper access rights to | |||
perform the specified operation. This would generally be the results | perform the specified operation. This would generally be the results | |||
of applying the ACL access rules to the file for the current | of applying the ACL access rules to the file for the current | |||
requester. However, just as with the ACCESS operation, the client | requester. However, just as with the ACCESS operation, the client | |||
should not attempt to second-guess the server's decisions, as access | should not attempt to second-guess the server's decisions, as access | |||
rights may change and may be subject to server administrative | rights may change and may be subject to server administrative | |||
controls outside the ACL framework. If the requester is not | controls outside the ACL framework. If the requester's READ or WRITE | |||
authorized to READ or WRITE (depending on the share_access value), | operation is not authorized (depending on the share_access value), | |||
the server MUST return NFS4ERR_ACCESS. | the server MUST return NFS4ERR_ACCESS. | |||
Note that if the client ID was not created with | Note that if the client ID was not created with the | |||
EXCHGID4_FLAG_BIND_PRINC_STATEID set in the reply to EXCHANGE_ID, | EXCHGID4_FLAG_BIND_PRINC_STATEID capability set in the reply to | |||
then the server MUST NOT impose any requirement that READs and WRITEs | EXCHANGE_ID, then the server MUST NOT impose any requirement that | |||
sent for an open file have the same credentials as the OPEN itself, | READs and WRITEs sent for an open file have the same credentials as | |||
and the server is REQUIRED to perform access checking on the READs | the OPEN itself, and the server is REQUIRED to perform access | |||
and WRITEs themselves. Otherwise, if the reply to EXCHANGE_ID did | checking on the READs and WRITEs themselves. Otherwise, if the reply | |||
have EXCHGID4_FLAG_BIND_PRINC_STATEID set, then with one exception, | to EXCHANGE_ID did have EXCHGID4_FLAG_BIND_PRINC_STATEID set, then | |||
the credentials used in the OPEN request MUST match those used in the | with one exception, the credentials used in the OPEN request MUST | |||
READs and WRITEs, and the stateids in the READs and WRITEs MUST | match those used in the READs and WRITEs, and the stateids in the | |||
match, or be derived from the stateid from the reply to OPEN. The | READs and WRITEs MUST match, or be derived from the stateid from the | |||
exception is if SP4_SSV or SP4_MACH_CRED state protection is used, | reply to OPEN. The exception is if SP4_SSV or SP4_MACH_CRED state | |||
and the spo_must_allow result of EXCHANGE_ID includes the READ and/or | protection is used, and the spo_must_allow result of EXCHANGE_ID | |||
WRITE operations. In that case, the machine or SSV credential will | includes the READ and/or WRITE operations. In that case, the machine | |||
be allowed to send READ and/or WRITE. See Section 18.35. | or SSV credential will be allowed to send READ and/or WRITE. See | |||
Section 18.35. | ||||
If the component provided to OPEN is a symbolic link, the error | If the component provided to OPEN is a symbolic link, the error | |||
NFS4ERR_SYMLINK will be returned to the client, while if it is a | NFS4ERR_SYMLINK will be returned to the client, while if it is a | |||
directory the error NFS4ERR_ISDIR. If the component is neither of | directory the error NFS4ERR_ISDIR will be returned. If the component | |||
those but not an ordinary file, the error NFS4ERR_WRONG_TYPE is | is neither of those but not an ordinary file, the error | |||
returned. If the current filehandle is not a directory, the error | NFS4ERR_WRONG_TYPE is returned. If the current filehandle is not a | |||
NFS4ERR_NOTDIR will be returned. | directory, the error NFS4ERR_NOTDIR will be returned. | |||
The use of the OPEN4_RESULT_PRESERVE_UNLINKED result flag allows a | The use of the OPEN4_RESULT_PRESERVE_UNLINKED result flag allows a | |||
client avoid the common implementation practice of renaming an open | client to avoid the common implementation practice of renaming an | |||
file to ".nfs<unique value>" after it removes the file. After the | open file to ".nfs<unique value>" after it removes the file. After | |||
server returns OPEN4_RESULT_PRESERVE_UNLINKED, if a client sends a | the server returns OPEN4_RESULT_PRESERVE_UNLINKED, if a client sends | |||
REMOVE operation that would reduce the file's link count to zero, the | a REMOVE operation that would reduce the file's link count to zero, | |||
server SHOULD report a value of zero for the numlinks attribute on | the server SHOULD report a value of zero for the numlinks attribute | |||
the file. | on the file. | |||
If another client has a delegation of the file being opened that | If another client has a delegation of the file being opened that | |||
conflicts with open being done (sometimes depending of the | conflicts with open being done (sometimes depending on the | |||
share_access or share_deny value specified), the delegation(s) MUST | share_access or share_deny value specified), the delegation(s) MUST | |||
be recalled, and the operation cannot proceed until each such | be recalled, and the operation cannot proceed until each such | |||
delegation is returned or revoked. Except where this happens very | delegation is returned or revoked. Except where this happens very | |||
quickly, one or more NFS4ERR_DELAY errors will be returned to | quickly, one or more NFS4ERR_DELAY errors will be returned to | |||
requests made while delegation remains outstanding. In the case of a | requests made while delegation remains outstanding. In the case of | |||
write delegation, any open by a different client will conflict, while | an OPEN_DELEGATE_WRITE delegation, any open by a different client | |||
for a read delegation only opens with one of the following | will conflict, while for an OPEN_DELEGATE_READ delegation, only opens | |||
characteristics will be considered conflicting: | with one of the following characteristics will be considered | |||
conflicting: | ||||
o The value of share_access includes the bit | o The value of share_access includes the bit | |||
OPEN4_SHARE_ACCESS_WRITE. | OPEN4_SHARE_ACCESS_WRITE. | |||
o The value of share_deny specifies READ or BOTH. | o The value of share_deny specifies OPEN4_SHARE_DENY_READ or | |||
OPEN4_SHARE_DENY_BOTH. | ||||
o OPEN4_CREATE is specified together with UNCHECKED4, the size | o OPEN4_CREATE is specified together with UNCHECKED4, the size | |||
attribute is specified as zero (for truncation) and an existing | attribute is specified as zero (for truncation), and an existing | |||
file is truncated. | file is truncated. | |||
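The three conditions under which an open by another client conflicts with an OPEN_DELEGATE_READ delegation can be expressed as a single predicate. This is a sketch under assumptions: the function name is hypothetical, and the caller is assumed to have already worked out whether the request is an OPEN4_CREATE with UNCHECKED4 and a zero size that truncates an existing file. The constant values follow the protocol's XDR description.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as given in the NFSv4.1 XDR description. */
#define OPEN4_SHARE_ACCESS_READ   0x00000001u
#define OPEN4_SHARE_ACCESS_WRITE  0x00000002u
#define OPEN4_SHARE_DENY_NONE     0x00000000u
#define OPEN4_SHARE_DENY_READ     0x00000001u
#define OPEN4_SHARE_DENY_WRITE    0x00000002u
#define OPEN4_SHARE_DENY_BOTH     0x00000003u

/*
 * Return true when an open from a different client conflicts with an
 * outstanding OPEN_DELEGATE_READ delegation on the same file.
 * "truncates_existing" is supplied by the caller (see lead-in).
 */
static bool
conflicts_with_read_deleg(uint32_t share_access, uint32_t share_deny,
                          bool truncates_existing)
{
    if (share_access & OPEN4_SHARE_ACCESS_WRITE)
        return true;                         /* write access requested */
    if (share_deny == OPEN4_SHARE_DENY_READ ||
        share_deny == OPEN4_SHARE_DENY_BOTH)
        return true;                         /* readers would be denied */
    return truncates_existing;               /* create that truncates */
}
```

For an OPEN_DELEGATE_WRITE delegation no such test is needed, since any open by a different client conflicts.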
If OPEN4_CREATE is specified and the file does not exist and the | If OPEN4_CREATE is specified and the file does not exist and the | |||
current filehandle designates a directory for which another client | current filehandle designates a directory for which another client | |||
holds a directory delegation, then, unless the delegation is such | holds a directory delegation, then, unless the delegation is such | |||
that the situation can be resolved by sending a notification, the | that the situation can be resolved by sending a notification, the | |||
delegation MUST be recalled, and the operation cannot proceed until | delegation MUST be recalled, and the operation cannot proceed until | |||
the delegation is returned or revoked. Except where this happens | the delegation is returned or revoked. Except where this happens | |||
very quickly, one or more NFS4ERR_DELAY errors will be returned to | very quickly, one or more NFS4ERR_DELAY errors will be returned to | |||
requests made while delegation remains outstanding. | requests made while delegation remains outstanding. | |||
If OPEN4_CREATE is specified and the file does not exist and the | If OPEN4_CREATE is specified and the file does not exist and the | |||
current filehandle designates a directory for which one or more | current filehandle designates a directory for which one or more | |||
directory delegations exist, then, when those delegations request | directory delegations exist, then, when those delegations request | |||
such notifications, NOTIFY4_ADD_ENTRY will be generated as a result | such notifications, NOTIFY4_ADD_ENTRY will be generated as a result | |||
of this operation. | of this operation. | |||
18.16.4.1. WARNING TO CLIENT IMPLEMENTORS | 18.16.4.1. Warning to Client Implementors | |||
OPEN resembles LOOKUP in that it generates a filehandle for the | OPEN resembles LOOKUP in that it generates a filehandle for the | |||
client to use. Unlike LOOKUP though, OPEN creates server state on | client to use. Unlike LOOKUP though, OPEN creates server state on | |||
the filehandle. In normal circumstances, the client can only release | the filehandle. In normal circumstances, the client can only release | |||
this state with a CLOSE operation. CLOSE uses the current filehandle | this state with a CLOSE operation. CLOSE uses the current filehandle | |||
to determine which file to close. Therefore the client MUST follow | to determine which file to close. Therefore, the client MUST follow | |||
every OPEN operation with a GETFH operation in the same COMPOUND | every OPEN operation with a GETFH operation in the same COMPOUND | |||
procedure. This will supply the client with the filehandle such that | procedure. This will supply the client with the filehandle such that | |||
CLOSE can be used appropriately. | CLOSE can be used appropriately. | |||
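A client-side sanity check for this rule might look like the sketch below. It assumes a COMPOUND is represented as a plain array of operation numbers, and it applies a stricter reading than the text requires, insisting that GETFH come immediately after each OPEN. The function name is hypothetical; the operation numbers are those assigned by the protocol.

```c
#include <stdbool.h>
#include <stddef.h>

/* Operation numbers as assigned by the protocol. */
enum {
    OP_GETFH    = 10,
    OP_OPEN     = 18,
    OP_PUTFH    = 22,
    OP_SEQUENCE = 53
};

/*
 * Return true when every OPEN in the operation list is immediately
 * followed by a GETFH in the same COMPOUND (a stricter check than
 * the rule itself, chosen here for simplicity).
 */
static bool
every_open_followed_by_getfh(const int *ops, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ops[i] == OP_OPEN &&
            (i + 1 >= n || ops[i + 1] != OP_GETFH))
            return false;
    }
    return true;
}
```

A typical sequence that passes the check is SEQUENCE, PUTFH, OPEN, GETFH.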
Simply waiting for the lease on the file to expire is insufficient | Simply waiting for the lease on the file to expire is insufficient | |||
because the server may maintain the state indefinitely as long as | because the server may maintain the state indefinitely as long as | |||
another client does not attempt to make a conflicting access to the | another client does not attempt to make a conflicting access to the | |||
same file. | same file. | |||
See also Section 2.10.6.4. | See also Section 2.10.6.4. | |||
skipping to change at page 461, line 8 | skipping to change at page 462, line 47 | |||
18.18.3. DESCRIPTION | 18.18.3. DESCRIPTION | |||
This operation is used to adjust the access and deny states for a | This operation is used to adjust the access and deny states for a | |||
given open. This is necessary when a given open-owner opens the same | given open. This is necessary when a given open-owner opens the same | |||
file multiple times with different access and deny values. In this | file multiple times with different access and deny values. In this | |||
situation, a close of one of the opens may change the appropriate | situation, a close of one of the opens may change the appropriate | |||
share_access and share_deny flags to remove bits associated with | share_access and share_deny flags to remove bits associated with | |||
opens no longer in effect. | opens no longer in effect. | |||
Valid values for the share_access field are: OPEN4_SHARE_ACCESS_READ, | Valid values for the expression (share_access & | |||
~OPEN4_SHARE_ACCESS_WANT_DELEG_MASK) are OPEN4_SHARE_ACCESS_READ, | ||||
OPEN4_SHARE_ACCESS_WRITE, or OPEN4_SHARE_ACCESS_BOTH. If the client | OPEN4_SHARE_ACCESS_WRITE, or OPEN4_SHARE_ACCESS_BOTH. If the client | |||
specifies other values, the server MUST reply with NFS4ERR_INVAL. | specifies other values, the server MUST reply with NFS4ERR_INVAL. | |||
Valid values for the share_deny field are: OPEN4_SHARE_DENY_NONE, | Valid values for the share_deny field are OPEN4_SHARE_DENY_NONE, | |||
OPEN4_SHARE_DENY_READ, OPEN4_SHARE_DENY_WRITE, or | OPEN4_SHARE_DENY_READ, OPEN4_SHARE_DENY_WRITE, or | |||
OPEN4_SHARE_DENY_BOTH. If the client specifies other values, the | OPEN4_SHARE_DENY_BOTH. If the client specifies other values, the | |||
server MUST reply with NFS4ERR_INVAL. | server MUST reply with NFS4ERR_INVAL. | |||
After checking for valid values of share_access and share_deny, the | After checking for valid values of share_access and share_deny, the | |||
server replaces the current access and deny modes on the file with | server replaces the current access and deny modes on the file with | |||
share_access and share_deny subject to the following constraints: | share_access and share_deny subject to the following constraints: | |||
o The bits in share_access SHOULD equal the union of the | o The bits in share_access SHOULD equal the union of the | |||
share_access bits (not including OPEN4_SHARE_WANT_* bits) | share_access bits (not including OPEN4_SHARE_WANT_* bits) | |||
skipping to change at page 461, line 44 | skipping to change at page 463, line 38 | |||
OPEN_DOWNGRADE request to be denied because of conflicting share | OPEN_DOWNGRADE request to be denied because of conflicting share | |||
reservations. | reservations. | |||
The seqid argument is not used in NFSv4.1, MAY be any value, and MUST | The seqid argument is not used in NFSv4.1, MAY be any value, and MUST | |||
be ignored by the server. | be ignored by the server. | |||
On success, the current filehandle retains its value. | On success, the current filehandle retains its value. | |||
18.18.4. IMPLEMENTATION | 18.18.4. IMPLEMENTATION | |||
An OPEN_DOWNGRADE operation may make read delegations grantable where | An OPEN_DOWNGRADE operation may make OPEN_DELEGATE_READ delegations | |||
they were not previously. Servers may choose to respond immediately | grantable where they were not previously. Servers may choose to | |||
if there are pending delegation want requests or may respond to the | respond immediately if there are pending delegation want requests or | |||
situation at a later time. | may respond to the situation at a later time. | |||
18.19. Operation 22: PUTFH - Set Current Filehandle | 18.19. Operation 22: PUTFH - Set Current Filehandle | |||
18.19.1. ARGUMENTS | 18.19.1. ARGUMENTS | |||
struct PUTFH4args { | struct PUTFH4args { | |||
nfs_fh4 object; | nfs_fh4 object; | |||
}; | }; | |||
18.19.2. RESULTS | 18.19.2. RESULTS | |||
skipping to change at page 462, line 25 | skipping to change at page 464, line 17 | |||
struct PUTFH4res { | struct PUTFH4res { | |||
/* | /* | |||
* If status is NFS4_OK, | * If status is NFS4_OK, | |||
* new CURRENT_FH: argument to PUTFH | * new CURRENT_FH: argument to PUTFH | |||
*/ | */ | |||
nfsstat4 status; | nfsstat4 status; | |||
}; | }; | |||
18.19.3. DESCRIPTION | 18.19.3. DESCRIPTION | |||
Replaces the current filehandle with the filehandle provided as an | This operation replaces the current filehandle with the filehandle | |||
argument. Clears the current stateid. | provided as an argument. It clears the current stateid. | |||
If the security mechanism used by the requester does not meet the | If the security mechanism used by the requester does not meet the | |||
requirements of the filehandle provided to this operation, the server | requirements of the filehandle provided to this operation, the server | |||
MUST return NFS4ERR_WRONGSEC. | MUST return NFS4ERR_WRONGSEC. | |||
See Section 16.2.3.1.1 for more details on the current filehandle. | See Section 16.2.3.1.1 for more details on the current filehandle. | |||
See Section 16.2.3.1.2 for more details on the current stateid. | See Section 16.2.3.1.2 for more details on the current stateid. | |||
18.19.4. IMPLEMENTATION | 18.19.4. IMPLEMENTATION | |||
Commonly used as the second operator (after SEQUENCE) in a COMPOUND | This operation is used in an NFS request to set the context for file | |||
request to set the context for following operations. | accessing operations that follow in the same COMPOUND request. | |||
18.20. Operation 23: PUTPUBFH - Set Public Filehandle | 18.20. Operation 23: PUTPUBFH - Set Public Filehandle | |||
18.20.1. ARGUMENT | 18.20.1. ARGUMENT | |||
void; | void; | |||
18.20.2. RESULT | 18.20.2. RESULT | |||
struct PUTPUBFH4res { | struct PUTPUBFH4res { | |||
/* | /* | |||
* If status is NFS4_OK, | * If status is NFS4_OK, | |||
* new CURRENT_FH: public fh | * new CURRENT_FH: public fh | |||
*/ | */ | |||
nfsstat4 status; | nfsstat4 status; | |||
}; | }; | |||
18.20.3. DESCRIPTION | 18.20.3. DESCRIPTION | |||
Replaces the current filehandle with the filehandle that represents | This operation replaces the current filehandle with the filehandle | |||
the public filehandle of the server's name space. This filehandle | that represents the public filehandle of the server's namespace. | |||
may be different from the "root" filehandle which may be associated | This filehandle may be different from the "root" filehandle that may | |||
with some other directory on the server. | be associated with some other directory on the server. | |||
PUTPUBFH also clears the current stateid. | PUTPUBFH also clears the current stateid. | |||
The public filehandle represents the concepts embodied in RFC2054 | The public filehandle represents the concepts embodied in RFC2054 | |||
[42], RFC2055 [43], and RFC2224 [53]. The intent for NFSv4.1 is that | [42], RFC 2055 [43], and RFC 2224 [53]. The intent for NFSv4.1 is | |||
the public filehandle (represented by the PUTPUBFH operation) be used | that the public filehandle (represented by the PUTPUBFH operation) be | |||
as a method of providing WebNFS server compatibility with NFSv3. | used as a method of providing WebNFS server compatibility with NFSv3. | |||
The public filehandle and the root filehandle (represented by the | The public filehandle and the root filehandle (represented by the | |||
PUTROOTFH operation) SHOULD be equivalent. If the public and root | PUTROOTFH operation) SHOULD be equivalent. If the public and root | |||
filehandles are not equivalent, then the directory corresponding to | filehandles are not equivalent, then the directory corresponding to | |||
the public filehandle MUST be a descendant of the directory | the public filehandle MUST be a descendant of the directory | |||
corresponding to the root filehandle. | corresponding to the root filehandle. | |||
See Section 16.2.3.1.1 for more details on the current filehandle. | See Section 16.2.3.1.1 for more details on the current filehandle. | |||
See Section 16.2.3.1.2 for more details on the current stateid. | See Section 16.2.3.1.2 for more details on the current stateid. | |||
18.20.4. IMPLEMENTATION | 18.20.4. IMPLEMENTATION | |||
Used as the second operator (after SEQUENCE) in an NFS request to set | This operation is used in an NFS request to set the context for file | |||
the context for file accessing operations that follow in the same | accessing operations that follow in the same COMPOUND request. | |||
COMPOUND request. | ||||
With the NFSv3 public filehandle, the client is able to specify | With the NFSv3 public filehandle, the client is able to specify | |||
whether the path name provided in the LOOKUP should be evaluated as | whether the path name provided in the LOOKUP should be evaluated as | |||
either an absolute path relative to the server's root or relative to | either an absolute path relative to the server's root or relative to | |||
the public filehandle. RFC2224 [53] contains further discussion of | the public filehandle. RFC2224 [53] contains further discussion of | |||
the functionality. With NFSv4.1, that type of specification is not | the functionality. With NFSv4.1, that type of specification is not | |||
directly available in the LOOKUP operation. The reason for this is | directly available in the LOOKUP operation. The reason for this is | |||
because the component separators needed to specify absolute vs. | because the component separators needed to specify absolute vs. | |||
relative are not allowed in NFSv4. Therefore, the client is | relative are not allowed in NFSv4. Therefore, the client is | |||
responsible for constructing its request such that the use of either | responsible for constructing its request such that the use of either | |||
PUTROOTFH or PUTPUBFH are used to signify absolute or relative | PUTROOTFH or PUTPUBFH signifies absolute or relative evaluation of an | |||
evaluation of an NFS URL respectively. | NFS URL, respectively. | |||
Note that there are warnings mentioned in RFC2224 [53] with respect | Note that there are warnings mentioned in RFC2224 [53] with respect | |||
to the use of absolute evaluation and the restrictions the server may | to the use of absolute evaluation and the restrictions the server may | |||
place on that evaluation with respect to how much of its namespace | place on that evaluation with respect to how much of its namespace | |||
has been made available. These same warnings apply to NFSv4.1. It | has been made available. These same warnings apply to NFSv4.1. It | |||
is likely, therefore that because of server implementation details, | is likely, therefore, that because of server implementation details, | |||
an NFSv3 absolute public filehandle lookup may behave differently | an NFSv3 absolute public filehandle lookup may behave differently | |||
than an NFSv4.1 absolute resolution. | than an NFSv4.1 absolute resolution. | |||
There is a form of security negotiation as described in RFC2755 [54] | There is a form of security negotiation as described in RFC2755 [54] | |||
that uses the public filehandle and an overloading of the pathname. | that uses the public filehandle and an overloading of the pathname. | |||
This method is not available with NFSv4.1 as filehandles are not | This method is not available with NFSv4.1 as filehandles are not | |||
overloaded with special meaning and therefore do not provide the same | overloaded with special meaning and therefore do not provide the same | |||
framework as NFSv3. Clients should therefore use the security | framework as NFSv3. Clients should therefore use the security | |||
negotiation mechanisms described in Section 2.6. | negotiation mechanisms described in Section 2.6. | |||
skipping to change at page 464, line 43 | skipping to change at page 466, line 30 | |||
struct PUTROOTFH4res { | struct PUTROOTFH4res { | |||
/* | /* | |||
* If status is NFS4_OK, | * If status is NFS4_OK, | |||
* new CURRENT_FH: root fh | * new CURRENT_FH: root fh | |||
*/ | */ | |||
nfsstat4 status; | nfsstat4 status; | |||
}; | }; | |||
18.21.3. DESCRIPTION | 18.21.3. DESCRIPTION | |||
Replaces the current filehandle with the filehandle that represents | This operation replaces the current filehandle with the filehandle | |||
the root of the server's name space. From this filehandle a LOOKUP | that represents the root of the server's namespace. From this | |||
operation can locate any other filehandle on the server. This | filehandle, a LOOKUP operation can locate any other filehandle on the | |||
filehandle may be different from the "public" filehandle which may be | server. This filehandle may be different from the "public" | |||
associated with some other directory on the server. | filehandle that may be associated with some other directory on the | |||
server. | ||||
PUTROOTFH also clears the current stateid. | PUTROOTFH also clears the current stateid. | |||
See Section 16.2.3.1.1 for more details on the current filehandle. | See Section 16.2.3.1.1 for more details on the current filehandle. | |||
See Section 16.2.3.1.2 for more details on the current stateid. | See Section 16.2.3.1.2 for more details on the current stateid. | |||
18.21.4. IMPLEMENTATION | 18.21.4. IMPLEMENTATION | |||
Commonly used as the second operator (after SEQUENCE) in an NFS | This operation is used in an NFS request to set the context for file | |||
request to set the context for file accessing operations that follow | accessing operations that follow in the same COMPOUND request. | |||
in the same COMPOUND request. | ||||
18.22. Operation 25: READ - Read from File | 18.22. Operation 25: READ - Read from File | |||
18.22.1. ARGUMENTS | 18.22.1. ARGUMENTS | |||
struct READ4args { | struct READ4args { | |||
/* CURRENT_FH: file */ | /* CURRENT_FH: file */ | |||
stateid4 stateid; | stateid4 stateid; | |||
offset4 offset; | offset4 offset; | |||
count4 count; | count4 count; | |||
skipping to change at page 465, line 44 | skipping to change at page 467, line 34 | |||
default: | default: | |||
void; | void; | |||
}; | }; | |||
18.22.3. DESCRIPTION | 18.22.3. DESCRIPTION | |||
The READ operation reads data from the regular file identified by the | The READ operation reads data from the regular file identified by the | |||
current filehandle. | current filehandle. | |||
The client provides an offset of where the READ is to start and a | The client provides an offset of where the READ is to start and a | |||
count of how many bytes are to be read. An offset of 0 (zero) means | count of how many bytes are to be read. An offset of zero means to | |||
to read data starting at the beginning of the file. If offset is | read data starting at the beginning of the file. If offset is | |||
greater than or equal to the size of the file, the status, NFS4_OK, | greater than or equal to the size of the file, the status NFS4_OK is | |||
is returned with a data length set to 0 (zero) and eof is set to | returned with a data length set to zero and eof is set to TRUE. The | |||
TRUE. The READ is subject to access permissions checking. | READ is subject to access permissions checking. | |||
If the client specifies a count value of 0 (zero), the READ succeeds | If the client specifies a count value of zero, the READ succeeds and | |||
and returns 0 (zero) bytes of data again subject to access | returns zero bytes of data again subject to access permissions | |||
permissions checking. The server may choose to return fewer bytes | checking. The server may choose to return fewer bytes than specified | |||
than specified by the client. The client needs to check for this | by the client. The client needs to check for this condition and | |||
condition and handle the condition appropriately. | handle the condition appropriately. | |||
Except when special stateids are used, the stateid value for a READ | Except when special stateids are used, the stateid value for a READ | |||
request represents a value returned from a previous byte-range lock | request represents a value returned from a previous byte-range lock | |||
or share reservation request or the stateid associated with a | or share reservation request or the stateid associated with a | |||
delegation. The stateid identifies the associated owners if any and | delegation. The stateid identifies the associated owners if any and | |||
is used by the server to verify that the associated locks are still | is used by the server to verify that the associated locks are still | |||
valid (e.g. have not been revoked). | valid (e.g., have not been revoked). | |||
If the read ended at the end-of-file (formally, in a correctly formed | If the read ended at the end-of-file (formally, in a correctly formed | |||
READ request, if offset + count is equal to the size of the file), or | READ operation, if offset + count is equal to the size of the file), | |||
the read request extends beyond the size of the file (if offset + | or the READ operation extends beyond the size of the file (if offset | |||
count is greater than the size of the file), eof is returned as TRUE; | + count is greater than the size of the file), eof is returned as | |||
otherwise it is FALSE. A successful READ of an empty file will | TRUE; otherwise, it is FALSE. A successful READ of an empty file | |||
always return eof as TRUE. | will always return eof as TRUE. | |||
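The eof rule above can be sketched as a small function. This is an illustrative sketch only: the function name read_result and its parameters are hypothetical, and it assumes the server returns every byte available in the requested range (a real server may return fewer).

```python
from typing import Tuple


def read_result(file_size: int, offset: int, count: int) -> Tuple[int, bool]:
    """Return (bytes_returned, eof) for a READ that is served in full.

    Hypothetical helper; assumes the server does not shorten the read.
    """
    # An offset at or past the end of the file yields no data and eof TRUE.
    if offset >= file_size:
        return 0, True
    # Never return bytes beyond the end of the file.
    length = min(count, file_size - offset)
    # eof is TRUE when the request ends at or extends past the file size.
    eof = offset + count >= file_size
    return length, eof
```

For example, a READ of 50 bytes at offset 90 of a 100-byte file yields 10 bytes with eof TRUE, and any READ of an empty file yields zero bytes with eof TRUE.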
If the current filehandle is not an ordinary file, an error will be | If the current filehandle is not an ordinary file, an error will be | |||
returned to the client. In the case that the current filehandle | returned to the client. In the case that the current filehandle | |||
represents an object of type NF4DIR, NFS4ERR_ISDIR is returned. if | represents an object of type NF4DIR, NFS4ERR_ISDIR is returned. If | |||
the current filehandle designates a symbolic link, NFS4ERR_SYMLINK is | the current filehandle designates a symbolic link, NFS4ERR_SYMLINK is | |||
returned. In all other cases, NFS4ERR_WRONG_TYPE is returned. | returned. In all other cases, NFS4ERR_WRONG_TYPE is returned. | |||
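The file type checks in the preceding paragraph might be expressed as follows. This is a sketch, not server code: the function name is hypothetical, and file types and errors are represented as plain strings for illustration.

```python
from typing import Optional


def read_type_error(file_type: str) -> Optional[str]:
    """Return the error name READ would fail with, or None for a regular file.

    Hypothetical helper; types and errors are modeled as strings.
    """
    if file_type == "NF4REG":
        return None  # regular file: the type check passes
    if file_type == "NF4DIR":
        return "NFS4ERR_ISDIR"
    if file_type == "NF4LNK":
        return "NFS4ERR_SYMLINK"
    # Every other object type is rejected with the generic error.
    return "NFS4ERR_WRONG_TYPE"
```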
For a READ with a stateid value of all bits 0, the server MAY allow | For a READ with a stateid value of all bits equal to zero, the server | |||
the READ to be serviced subject to mandatory file locks or the | MAY allow the READ to be serviced subject to mandatory byte-range | |||
current share deny modes for the file. For a READ with a stateid | locks or the current share deny modes for the file. For a READ with | |||
value of all bits 1, the server MAY allow READ operations to bypass | a stateid value of all bits equal to one, the server MAY allow READ | |||
locking checks at the server. | operations to bypass locking checks at the server. | |||
On success, the current filehandle retains its value. | On success, the current filehandle retains its value. | |||
18.22.4. IMPLEMENTATION | 18.22.4. IMPLEMENTATION | |||
It is possible for the server to return fewer than count bytes of | If the server returns a "short read" (i.e., fewer data than requested | |||
data. If the server returns less than the count requested and eof is | and eof is set to FALSE), the client should send another READ to get | |||
set to FALSE, the client should send another READ to get the | the remaining data. A server may return less data than requested | |||
remaining data. A server may return less data than requested under | under several circumstances. The file may have been truncated by | |||
several circumstances. The file may have been truncated by another | another client or perhaps on the server itself, changing the file | |||
client or perhaps on the server itself, changing the file size from | size from what the requesting client believes to be the case. This | |||
what the requesting client believes to be the case. This would | would reduce the actual amount of data available to the client. It | |||
reduce the actual amount of data available to the client. It is | is possible that the server reduces the transfer size and so returns a | |||
possible that the server may back off the transfer size and reduce | short read result. Server resource exhaustion may also result in a | |||
the read request return. Server resource exhaustion may also occur | short read. | |||
necessitating a smaller read return. | ||||
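A client-side loop that handles short reads could look like the sketch below. The callable send_read is a hypothetical stand-in for issuing one READ operation and returning (data, eof); it is not part of any real NFS client library. Stopping when a reply carries no data and eof is FALSE is an assumption made here to avoid looping forever, not a rule stated in the text.

```python
from typing import Callable, List, Tuple

# Hypothetical transport hook: (offset, count) -> (data, eof).
ReadFn = Callable[[int, int], Tuple[bytes, bool]]


def read_fully(send_read: ReadFn, offset: int, count: int) -> bytes:
    """Re-issue READ until count bytes arrive or the server reports eof."""
    chunks: List[bytes] = []
    remaining = count
    while remaining > 0:
        data, eof = send_read(offset, remaining)
        chunks.append(data)
        offset += len(data)
        remaining -= len(data)
        if eof:
            break  # nothing more to read from the file
        if not data:
            break  # assumption: give up if the server makes no progress
    return b"".join(chunks)
```

The caller still has to compare the length of the result with count, since the file may have been truncated between requests.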
If mandatory file locking is in effect for the file, and if the | If mandatory byte-range locking is in effect for the file, and if the | |||
region corresponding to the data to be read from file is write locked | byte-range corresponding to the data to be read from the file is | |||
by an owner not associated the stateid, the server will return the | WRITE_LT locked by an owner not associated with the stateid, the | |||
NFS4ERR_LOCKED error. The client should try to get the appropriate | server will return the NFS4ERR_LOCKED error. The client should try | |||
read byte-range lock via the LOCK operation before re-attempting the | to get the appropriate READ_LT via the LOCK operation before re- | |||
READ. When the READ completes, the client should release the byte- | attempting the READ. When the READ completes, the client should | |||
range lock via LOCKU. | release the byte-range lock via LOCKU. | |||
If another client has a write delegation for the file being read, the | If another client has an OPEN_DELEGATE_WRITE delegation for the file | |||
delegation must be recalled, and the operation cannot proceed until | being read, the delegation must be recalled, and the operation cannot | |||
that delegation is returned or revoked. Except where this happens | proceed until that delegation is returned or revoked. Except where | |||
very quickly, one or more NFS4ERR_DELAY errors will be returned to | this happens very quickly, one or more NFS4ERR_DELAY errors will be | |||
requests made while the delegation remains outstanding. Normally, | returned to requests made while the delegation remains outstanding. | |||
delegations will not be recalled as a result of a READ operation | Normally, delegations will not be recalled as a result of a READ | |||
since the recall will occur as a result of an earlier OPEN. However, | operation since the recall will occur as a result of an earlier OPEN. | |||
since it is possible for a READ to be done with a special stateid, | However, since it is possible for a READ to be done with a special | |||
the server needs to check for this case even though the client should | stateid, the server needs to check for this case even though the | |||
have done an OPEN previously. | client should have done an OPEN previously. | |||
18.23. Operation 26: READDIR - Read Directory | 18.23. Operation 26: READDIR - Read Directory | |||
18.23.1. ARGUMENTS | 18.23.1. ARGUMENTS | |||
struct READDIR4args { | struct READDIR4args { | |||
/* CURRENT_FH: directory */ | /* CURRENT_FH: directory */ | |||
nfs_cookie4 cookie; | nfs_cookie4 cookie; | |||
verifier4 cookieverf; | verifier4 cookieverf; | |||
count4 dircount; | count4 dircount; | |||
skipping to change at page 468, line 34 | skipping to change at page 470, line 8 | |||
union READDIR4res switch (nfsstat4 status) { | union READDIR4res switch (nfsstat4 status) { | |||
case NFS4_OK: | case NFS4_OK: | |||
READDIR4resok resok4; | READDIR4resok resok4; | |||
default: | default: | |||
void; | void; | |||
}; | }; | |||
18.23.3. DESCRIPTION | 18.23.3. DESCRIPTION | |||
The READDIR operation retrieves a variable number of entries from a | The READDIR operation retrieves a variable number of entries from a | |||
file system directory and returns client requested attributes for | file system directory and returns client-requested attributes for | |||
each entry along with information to allow the client to request | each entry along with information to allow the client to request | |||
additional directory entries in a subsequent READDIR. | additional directory entries in a subsequent READDIR. | |||
The arguments contain a cookie value that represents where the | The arguments contain a cookie value that represents where the | |||
READDIR should start within the directory. A value of 0 (zero) for | READDIR should start within the directory. A value of zero for the | |||
the cookie is used to start reading at the beginning of the | cookie is used to start reading at the beginning of the directory. | |||
directory. For subsequent READDIR requests, the client specifies a | For subsequent READDIR requests, the client specifies a cookie value | |||
cookie value that is provided by the server on a previous READDIR | that is provided by the server on a previous READDIR request. | |||
request. | ||||
The request's cookieverf field should be set to 0 (zero) when the | The request's cookieverf field should be set to 0 (zero) when the | |||
request's cookie field is 0 (zero) (first directory read). On | request's cookie field is zero (first read of the directory). On | |||
subsequent requests, the cookieverf field must match the cookieverf | subsequent requests, the cookieverf field must match the cookieverf | |||
returned by the READDIR in which the cookie was acquired. If the | returned by the READDIR in which the cookie was acquired. If the | |||
server determines that the cookieverf is no longer valid for the | server determines that the cookieverf is no longer valid for the | |||
directory, the error NFS4ERR_NOT_SAME must be returned. | directory, the error NFS4ERR_NOT_SAME must be returned. | |||
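The cookie and cookieverf handshake described above can be sketched as a client loop. Everything named here is hypothetical: send_readdir stands in for one READDIR operation, and the reply is modeled as a dictionary with "cookieverf", "entries", and "eof" keys rather than the XDR result type. The sketch assumes an 8-byte verifier and does not handle NFS4ERR_NOT_SAME.

```python
from typing import Callable, Dict, List

# Hypothetical transport hook: (cookie, cookieverf) -> reply dictionary.
ReaddirFn = Callable[[int, bytes], Dict]


def list_directory(send_readdir: ReaddirFn) -> List[str]:
    """Collect all entry names by following cookies across READDIR replies."""
    cookie = 0               # zero starts at the beginning of the directory
    cookieverf = bytes(8)    # zero verifier accompanies a zero cookie
    names: List[str] = []
    while True:
        reply = send_readdir(cookie, cookieverf)
        # Later requests must echo the verifier returned with the cookie.
        cookieverf = reply["cookieverf"]
        if not reply["entries"] and not reply["eof"]:
            raise RuntimeError("READDIR made no progress")
        for entry in reply["entries"]:
            names.append(entry["name"])
            cookie = entry["cookie"]  # resume after the last entry seen
        if reply["eof"]:
            return names
```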
The dircount field of the request is a hint of the maximum number of | The dircount field of the request is a hint of the maximum number of | |||
bytes of directory information that should be returned. This value | bytes of directory information that should be returned. This value | |||
represents the total length of the names of the directory entries and | represents the total length of the names of the directory entries and | |||
the cookie value for these entries. This length represents the XDR | the cookie value for these entries. This length represents the XDR | |||
encoding of the data (names and cookies) and not the length in the | encoding of the data (names and cookies) and not the length in the | |||
skipping to change at page 469, line 50 | skipping to change at page 471, line 20 | |||
the entire READDIR operation, the server can instead return the | the entire READDIR operation, the server can instead return the | |||
attribute rdattr_error (Section 5.8.1.12). With this, the server is | attribute rdattr_error (Section 5.8.1.12). With this, the server is | |||
able to communicate the failure to the client and not fail the entire | able to communicate the failure to the client and not fail the entire | |||
operation in the instance of what might be a transient failure. | operation in the instance of what might be a transient failure. | |||
Obviously, the client must request the fattr4_rdattr_error attribute | Obviously, the client must request the fattr4_rdattr_error attribute | |||
for this method to work properly. If the client does not request the | for this method to work properly. If the client does not request the | |||
attribute, the server has no choice but to return failure for the | attribute, the server has no choice but to return failure for the | |||
entire READDIR operation. | entire READDIR operation. | |||
For some file system environments, the directory entries "." and ".." | For some file system environments, the directory entries "." and ".." | |||
have special meaning and in other environments, they do not. If the | have special meaning, and in other environments, they do not. If the | |||
server supports these special entries within a directory, they SHOULD | server supports these special entries within a directory, they SHOULD | |||
NOT be returned to the client as part of the READDIR response. To | NOT be returned to the client as part of the READDIR response. To | |||
enable some client environments, the cookie values of 0, 1, and 2 are | enable some client environments, the cookie values of zero, 1, and 2 | |||
to be considered reserved. Note that the UNIX client will use these | are to be considered reserved. Note that the UNIX client will use | |||
values when combining the server's response and local representations | these values when combining the server's response and local | |||
to enable a fully formed UNIX directory presentation to the | representations to enable a fully formed UNIX directory presentation | |||
application. | to the application. | |||
For READDIR arguments, cookie values of 1 and 2 SHOULD NOT be used | For READDIR arguments, cookie values of one and two SHOULD NOT be | |||
and for READDIR results cookie values of 0, 1, and 2 SHOULD NOT be | used, and for READDIR results, cookie values of zero, one, and two | |||
returned. | SHOULD NOT be returned. | |||
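The reserved cookie values can be captured in two small checks. The function names are hypothetical, and since the text uses SHOULD NOT rather than MUST NOT, a caller might treat a failed check as a warning rather than an error.

```python
# Cookie values that SHOULD NOT appear in READDIR arguments.
RESERVED_IN_ARGS = frozenset({1, 2})
# Cookie values that SHOULD NOT appear in READDIR results.
RESERVED_IN_RESULTS = frozenset({0, 1, 2})


def cookie_ok_in_args(cookie: int) -> bool:
    """True if a client may send this cookie (zero is allowed: start of directory)."""
    return cookie not in RESERVED_IN_ARGS


def cookie_ok_in_results(cookie: int) -> bool:
    """True if a server may return this cookie for a directory entry."""
    return cookie not in RESERVED_IN_RESULTS
```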
On success, the current filehandle retains its value. | On success, the current filehandle retains its value. | |||
18.23.4. IMPLEMENTATION | 18.23.4. IMPLEMENTATION | |||
The server's file system directory representations can differ | The server's file system directory representations can differ | |||
greatly. A client's programming interfaces may also be bound to the | greatly. A client's programming interfaces may also be bound to the | |||
local operating environment in a way that does not translate well | local operating environment in a way that does not translate well | |||
into the NFS protocol. Therefore the dircount and | into the NFS protocol. Therefore, the dircount and | |||
maxcount fields are provided to enable the client to provide hints to | maxcount fields are provided to enable the client to provide hints to | |||
the server. If the client is aggressive about attribute collection | the server. If the client is aggressive about attribute collection | |||
during a READDIR, the server has an idea of how to limit the encoded | during a READDIR, the server has an idea of how to limit the encoded | |||
response. | response. | |||
If dircount is zero, the server bounds the reply's size based on | If dircount is zero, the server bounds the reply's size based on the | |||
request's maxcount field. | request's maxcount field. | |||
The cookieverf may be used by the server to help manage cookie values | The cookieverf may be used by the server to help manage cookie values | |||
that may become stale. It should be a rare occurrence that a server | that may become stale. It should be a rare occurrence that a server | |||
is unable to continue properly reading a directory with the provided | is unable to continue properly reading a directory with the provided | |||
cookie/cookieverf pair. The server SHOULD make every effort to avoid | cookie/cookieverf pair. The server SHOULD make every effort to avoid | |||
this condition since the application at the client might be unable to | this condition since the application at the client might be unable to | |||
properly handle this type of failure. | properly handle this type of failure. | |||
The use of the cookieverf will also protect the client from using | The use of the cookieverf will also protect the client from using | |||
skipping to change at page 472, line 41 | skipping to change at page 474, line 11 | |||
If the entry in the directory was the last reference to the | If the entry in the directory was the last reference to the | |||
corresponding file system object, the object may be destroyed. The | corresponding file system object, the object may be destroyed. The | |||
directory may be either of type NF4DIR or NF4ATTRDIR. | directory may be either of type NF4DIR or NF4ATTRDIR. | |||
For the directory where the filename was removed, the server returns | For the directory where the filename was removed, the server returns | |||
change_info4 information in cinfo. With the atomic field of the | change_info4 information in cinfo. With the atomic field of the | |||
change_info4 data type, the server will indicate if the before and | change_info4 data type, the server will indicate if the before and | |||
after change attributes were obtained atomically with respect to the | after change attributes were obtained atomically with respect to the | |||
removal. | removal. | |||
If the target has a length of 0 (zero), or if target does not obey | If the target has a length of zero, or if the target does not obey | |||
the UTF-8 definition (and the server is enforcing UTF-8 encoding, see | the UTF-8 definition (and the server is enforcing UTF-8 encoding; see | |||
Section 14.4), the error NFS4ERR_INVAL will be returned. | Section 14.4), the error NFS4ERR_INVAL will be returned. | |||
On success, the current filehandle retains its value. | On success, the current filehandle retains its value. | |||
18.25.4. IMPLEMENTATION | 18.25.4. IMPLEMENTATION | |||
NFSv3 required a different operator RMDIR for directory removal and | NFSv3 required a different operator RMDIR for directory removal and | |||
REMOVE for non-directory removal. This allowed clients to skip | REMOVE for non-directory removal. This allowed clients to skip | |||
checking the file type when being passed a non-directory delete | checking the file type when being passed a non-directory delete | |||
system call (e.g. unlink() [27] in POSIX) to remove a directory, as | system call (e.g., unlink() [27] in POSIX) to remove a directory, as | |||
well as the converse (e.g. a rmdir() on a non-directory) because they | well as the converse (e.g., a rmdir() on a non-directory) because | |||
knew the server would check the file type. NFSv4.1 REMOVE can be | they knew the server would check the file type. NFSv4.1 REMOVE can | |||
used to delete any directory entry independent of its file type. The | be used to delete any directory entry independent of its file type. | |||
implementor of an NFSv4.1 client's entry points from the unlink() and | The implementor of an NFSv4.1 client's entry points from the unlink() | |||
rmdir() system calls should first check the file type against the | and rmdir() system calls should first check the file type against the | |||
types the system call is allowed to remove before sending a REMOVE | types the system call is allowed to remove before sending a REMOVE | |||
operation. Alternatively, the implementor can produce a COMPOUND | operation. Alternatively, the implementor can produce a COMPOUND | |||
call that includes a LOOKUP/VERIFY sequence of operations to verify | call that includes a LOOKUP/VERIFY sequence of operations to verify | |||
the file type before a REMOVE operation in the same COMPOUND call. | the file type before a REMOVE operation in the same COMPOUND call. | |||
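The client-side type check recommended above might be sketched as follows. The function name and the string encoding of system calls and file types are hypothetical. The sketch assumes unlink() may remove anything except a directory and rmdir() may remove only a directory; the exact policy depends on the client's operating environment.

```python
def may_send_remove(syscall: str, file_type: str) -> bool:
    """Decide whether a client entry point should send REMOVE.

    Hypothetical helper: syscall is "unlink" or "rmdir"; file_type is an
    NFSv4 type name such as "NF4REG" or "NF4DIR".
    """
    is_dir = file_type == "NF4DIR"
    if syscall == "rmdir":
        return is_dir        # rmdir() on a non-directory is refused locally
    if syscall == "unlink":
        return not is_dir    # unlink() on a directory is refused locally
    raise ValueError("unknown system call: " + syscall)
```

Because the file type can change between the check and the REMOVE, the LOOKUP/VERIFY sequence in a single COMPOUND is the more robust of the two approaches.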
The concept of last reference is server specific. However, if the | The concept of last reference is server specific. However, if the | |||
numlinks field in the previous attributes of the object had the value | numlinks field in the previous attributes of the object had the value | |||
1, the client should not rely on referring to the object via a | 1, the client should not rely on referring to the object via a | |||
filehandle. Likewise, the client should not rely on the resources | filehandle. Likewise, the client should not rely on the resources | |||
(disk space, directory entry, and so on) formerly associated with the | (disk space, directory entry, and so on) formerly associated with the | |||
skipping to change at page 474, line 5 | skipping to change at page 475, line 20 | |||
o The server MUST NOT delete the directory entry if the reply from | o The server MUST NOT delete the directory entry if the reply from | |||
OPEN had the flag OPEN4_RESULT_PRESERVE_UNLINKED set. | OPEN had the flag OPEN4_RESULT_PRESERVE_UNLINKED set. | |||
The server MAY implement its own restrictions on removal of a file | The server MAY implement its own restrictions on removal of a file | |||
while it is open. The server might disallow such a REMOVE (or a | while it is open. The server might disallow such a REMOVE (or a | |||
removal that occurs as part of RENAME). The conditions that | removal that occurs as part of RENAME). The conditions that | |||
influence the restrictions on removal of a file while it is still | influence the restrictions on removal of a file while it is still | |||
open include: | open include: | |||
o Whether certain access protocols (i.e. not just NFS) are holding | o Whether certain access protocols (i.e., not just NFS) are holding | |||
the file open. | the file open. | |||
o Whether particular options, access modes, or policies on the | o Whether particular options, access modes, or policies on the | |||
server are enabled. | server are enabled. | |||
In all cases in which a decision is made to not allow the file's | If a file has an outstanding OPEN and this prevents the removal of | |||
directory entry be removed because of an open, the error | the file's directory entry, the error NFS4ERR_FILE_OPEN is returned. | |||
NFS4ERR_FILE_OPEN is returned. | ||||
Where the determination above cannot be made definitively because | Where the determination above cannot be made definitively because | |||
delegations are being held, they MUST be recalled to allow processing | delegations are being held, they MUST be recalled to allow processing | |||
of the REMOVE to continue. When a delegation is held, the server's | of the REMOVE to continue. When a delegation is held, the server has | |||
knowledge of the status of opens for that client is not to be relied | no reliable knowledge of the status of OPENs for that client, so | |||
on, so that unless there are files opened with the particular deny | unless there are files opened with the particular deny modes by | |||
modes by clients without delegations, the determination cannot be | clients without delegations, the determination cannot be made until | |||
made until delegations are recalled, and the operation cannot proceed | delegations are recalled, and the operation cannot proceed until each | |||
until each sufficient delegations have been returned or revoked to | sufficient delegation has been returned or revoked to allow the | |||
allow the server to make a correct determination. | server to make a correct determination. | |||
In all cases in which delegations are recalled, the server is likely | In all cases in which delegations are recalled, the server is likely | |||
to return one or more NFS4ERR_DELAY errors while delegations remain | to return one or more NFS4ERR_DELAY errors while delegations remain | |||
outstanding. | outstanding. | |||
If the current filehandle designates a directory for which another | If the current filehandle designates a directory for which another | |||
client holds a directory delegation, then, unless the situation can | client holds a directory delegation, then, unless the situation can | |||
be resolved by sending a notification, the directory delegation MUST | be resolved by sending a notification, the directory delegation MUST | |||
be recalled, and the operation MUST NOT proceed until the delegation | be recalled, and the operation MUST NOT proceed until the delegation | |||
is returned or revoked. Except where this happens very quickly, one | is returned or revoked. Except where this happens very quickly, one | |||
skipping to change at page 475, line 38 | skipping to change at page 476, line 49 | |||
18.26.3. DESCRIPTION | 18.26.3. DESCRIPTION | |||
The RENAME operation renames the object identified by oldname in the | The RENAME operation renames the object identified by oldname in the | |||
source directory corresponding to the saved filehandle, as set by the | source directory corresponding to the saved filehandle, as set by the | |||
SAVEFH operation, to newname in the target directory corresponding to | SAVEFH operation, to newname in the target directory corresponding to | |||
the current filehandle. The operation is required to be atomic to | the current filehandle. The operation is required to be atomic to | |||
the client. Source and target directories MUST reside on the same | the client. Source and target directories MUST reside on the same | |||
file system on the server. On success, the current filehandle will | file system on the server. On success, the current filehandle will | |||
continue to be the target directory. | continue to be the target directory. | |||
If the target directory already contains an entry with the name, | If the target directory already contains an entry with the name | |||
newname, the source object MUST be compatible with the target: either | newname, the source object MUST be compatible with the target: either | |||
both are non-directories or both are directories and the target MUST | both are non-directories or both are directories and the target MUST | |||
be empty. If compatible, the existing target is removed before the | be empty. If compatible, the existing target is removed before the | |||
rename occurs or preferably as part of the rename and atomic with it. | rename occurs or, preferably, the target is removed atomically as | |||
See Section 18.25.4 for client and server actions whenever a target | part of the rename. See Section 18.25.4 for client and server | |||
is removed. Note however that when the removal is performed | actions whenever a target is removed. Note however that when the | |||
atomically with the rename, certain parts of the removal described | removal is performed atomically with the rename, certain parts of the | |||
there are integrated with the rename. For example, notification of | removal described there are integrated with the rename. For example, | |||
the removal will not be via a NOTIFY4_REMOVE_ENTRY but will be | notification of the removal will not be via a NOTIFY4_REMOVE_ENTRY | |||
indicated as part of the NOTIFY4_ADD_ENTRY or NOTIFY4_RENAME_ENTRY | but will be indicated as part of the NOTIFY4_ADD_ENTRY or | |||
generated by the rename. | NOTIFY4_RENAME_ENTRY generated by the rename. | |||
If the source object and the target are not compatible or if the | If the source object and the target are not compatible or if the | |||
target is a directory but not empty, the server will return the | target is a directory but not empty, the server will return the error | |||
error, NFS4ERR_EXIST. | NFS4ERR_EXIST. | |||
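The compatibility rule for an existing target can be sketched as a decision function. The name and the boolean parameters are hypothetical, status codes are modeled as strings, and the sketch covers only the compatibility check, not the other errors RENAME can return.

```python
def rename_target_status(source_is_dir: bool, target_exists: bool,
                         target_is_dir: bool, target_is_empty: bool) -> str:
    """Return the status implied by the source/target compatibility rule."""
    if not target_exists:
        return "NFS4_OK"            # nothing to replace
    if source_is_dir != target_is_dir:
        return "NFS4ERR_EXIST"      # directory versus non-directory mismatch
    if target_is_dir and not target_is_empty:
        return "NFS4ERR_EXIST"      # a target directory must be empty
    return "NFS4_OK"                # compatible: the target is replaced
```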
If oldname and newname both refer to the same file (e.g. they might | If oldname and newname both refer to the same file (e.g., they might | |||
be hard links of each other), then unless the file is open (see | be hard links of each other), then unless the file is open (see | |||
Section 18.26.4), RENAME MUST perform no action and return NFS4_OK. | Section 18.26.4), RENAME MUST perform no action and return NFS4_OK. | |||
For both directories involved in the RENAME, the server returns | For both directories involved in the RENAME, the server returns | |||
change_info4 information. With the atomic field of the change_info4 | change_info4 information. With the atomic field of the change_info4 | |||
data type, the server will indicate if the before and after change | data type, the server will indicate if the before and after change | |||
attributes were obtained atomically with respect to the rename. | attributes were obtained atomically with respect to the rename. | |||
If oldname refers to a named attribute and the saved and current | If oldname refers to a named attribute and the saved and current | |||
filehandles refer to different file system objects, the server will | filehandles refer to different file system objects, the server will | |||
return NFS4ERR_XDEV just as if the saved and current filehandles | return NFS4ERR_XDEV just as if the saved and current filehandles | |||
represented directories on different file systems. | represented directories on different file systems. | |||
If oldname or newname have a length of 0 (zero), or if oldname or | If oldname or newname has a length of zero, or if oldname or newname | |||
newname do not obey the UTF-8 definition, the error NFS4ERR_INVAL | does not obey the UTF-8 definition, the error NFS4ERR_INVAL will be | |||
will be returned. | returned. | |||
18.26.4. IMPLEMENTATION | 18.26.4. IMPLEMENTATION | |||
The server MAY impose restrictions on the RENAME operation such that | The server MAY impose restrictions on the RENAME operation such that | |||
RENAME may not be done when the file being renamed is open or when | RENAME may not be done when the file being renamed is open or when | |||
that open is done by particular protocols, or with particular options | that open is done by particular protocols, or with particular options | |||
or access modes. Similar restrictions may be applied when a file | or access modes. Similar restrictions may be applied when a file | |||
exists with the target name and is open. When RENAME is rejected | exists with the target name and is open. When RENAME is rejected | |||
because of such restrictions, the error NFS4ERR_FILE_OPEN is | because of such restrictions, the error NFS4ERR_FILE_OPEN is | |||
returned. | returned. | |||
When oldname and newname refer to the same file and that file is open | When oldname and newname refer to the same file and that file is open | |||
in a fashion such that RENAME would normally be rejected with | in a fashion such that RENAME would normally be rejected with | |||
NFS4ERR_FILE_OPEN if oldname and newname were different files, then | NFS4ERR_FILE_OPEN if oldname and newname were different files, then | |||
RENAME SHOULD be rejected with NFS4ERR_FILE_OPEN. | RENAME SHOULD be rejected with NFS4ERR_FILE_OPEN. | |||
If a server does implement such restrictions and those restrictions | If a server does implement such restrictions and those restrictions | |||
include cases of NFSv4 opens preventing successful execution of a | include cases of NFSv4 opens preventing successful execution of a | |||
rename, the server needs to recall any delegations which could hide | rename, the server needs to recall any delegations that could hide | |||
the existence of opens relevant to that decision. This is because | the existence of opens relevant to that decision. This is because | |||
when a client holds a delegation, the server might not have an | when a client holds a delegation, the server might not have an | |||
accurate account of the opens for that client, since the client may | accurate account of the opens for that client, since the client may | |||
execute OPENs and CLOSEs locally. The RENAME operation need only be | execute OPENs and CLOSEs locally. The RENAME operation need only be | |||
delayed until a definitive result can be obtained. For example, if | delayed until a definitive result can be obtained. For example, if | |||
there are multiple delegations and one of them establishes an open | there are multiple delegations and one of them establishes an open | |||
whose presence would prevent the rename, given the server's | whose presence would prevent the rename, given the server's | |||
semantics, NFS4ERR_FILE_OPEN may be returned to the caller as soon as | semantics, NFS4ERR_FILE_OPEN may be returned to the caller as soon as | |||
that delegation is returned without waiting for other delegations to | that delegation is returned without waiting for other delegations to | |||
be returned. Similarly, if such opens are not associated with | be returned. Similarly, if such opens are not associated with | |||
delegations, NFS4ERR_FILE_OPEN can be returned immediately with no | delegations, NFS4ERR_FILE_OPEN can be returned immediately with no | |||
delegation recall being done. | delegation recall being done. | |||
If the current filehandle or the saved filehandle designate a | If the current filehandle or the saved filehandle designates a | |||
directory for which another client holds a directory delegation, | directory for which another client holds a directory delegation, | |||
then, unless the situation can be resolved by sending a notification, | then, unless the situation can be resolved by sending a notification, | |||
the delegation MUST be recalled, and the operation cannot proceed | the delegation MUST be recalled, and the operation cannot proceed | |||
until the delegation is returned or revoked. Except where this | until the delegation is returned or revoked. Except where this | |||
happens very quickly, one or more NFS4ERR_DELAY errors will be | happens very quickly, one or more NFS4ERR_DELAY errors will be | |||
returned to requests made while the delegation remains outstanding. | returned to requests made while the delegation remains outstanding. | |||
When the current and saved filehandles are the same and they | When the current and saved filehandles are the same and they | |||
designate a directory for which one or more directory delegations | designate a directory for which one or more directory delegations | |||
exist, then, when those delegations request such notifications, a | exist, then, when those delegations request such notifications, a | |||
notification of type NOTIFY4_RENAME_ENTRY will be generated as a | notification of type NOTIFY4_RENAME_ENTRY will be generated as a | |||
result of this operation. When oldname and newname refer to the same | result of this operation. When oldname and newname refer to the same | |||
file, no notification is generated (because as Section 18.26.3 | file, no notification is generated (because, as Section 18.26.3 | |||
states, the server MUST take no action). When a file is removed | states, the server MUST take no action). When a file is removed | |||
because it has the same name as the target, if that removal is done | because it has the same name as the target, if that removal is done | |||
atomically with the rename, a NOTIFY4_REMOVE_ENTRY notification will | atomically with the rename, a NOTIFY4_REMOVE_ENTRY notification will | |||
not be generated. Instead, the deletion of the file will be reported | not be generated. Instead, the deletion of the file will be reported | |||
as part of the NOTIFY4_RENAME_ENTRY notification. | as part of the NOTIFY4_RENAME_ENTRY notification. | |||
When the current and saved filehandles are not the same: | When the current and saved filehandles are not the same: | |||
o If the current filehandle designates a directory for which one or | o If the current filehandle designates a directory for which one or | |||
more directory delegations exist, then, when those delegations | more directory delegations exist, then, when those delegations | |||
skipping to change at page 478, line 4 | skipping to change at page 479, line 17 | |||
request such notifications, NOTIFY4_REMOVE_ENTRY will be generated | request such notifications, NOTIFY4_REMOVE_ENTRY will be generated | |||
as a result of this operation. | as a result of this operation. | |||
If the object being renamed has file delegations held by clients | If the object being renamed has file delegations held by clients | |||
other than the one doing the RENAME, the delegations MUST be | other than the one doing the RENAME, the delegations MUST be | |||
recalled, and the operation cannot proceed until each such delegation | recalled, and the operation cannot proceed until each such delegation | |||
is returned or revoked. Note that in the case of multiply linked | is returned or revoked. Note that in the case of multiply linked | |||
files, the delegation recall requirement applies even if the | files, the delegation recall requirement applies even if the | |||
delegation was obtained through a different name than the one being | delegation was obtained through a different name than the one being | |||
renamed. In all cases in which delegations are recalled, the server | renamed. In all cases in which delegations are recalled, the server | |||
is likely to return one or more NFS4ERR_DELAY error while the | is likely to return one or more NFS4ERR_DELAY errors while the | |||
delegation(s) remains outstanding, although it may, if the returns | delegation(s) remains outstanding, although it might not do that if | |||
happen quickly, not do that. | the delegations are returned quickly. | |||
The RENAME operation must be atomic to the client. The statement | The RENAME operation must be atomic to the client. The statement | |||
"source and target directories MUST reside on the same file system on | "source and target directories MUST reside on the same file system on | |||
the server" means that the fsid fields in the attributes for the | the server" means that the fsid fields in the attributes for the | |||
directories are the same. If they reside on different file systems, | directories are the same. If they reside on different file systems, | |||
the error, NFS4ERR_XDEV, is returned. | the error NFS4ERR_XDEV is returned. | |||
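The same-file-system rule above can be sketched as a comparison of the two directories' fsid attributes. This is only an illustration: the Fsid type with major and minor fields and the helper name rename_fsid_error are assumptions made for the example, not text from this section.

```python
from typing import NamedTuple, Optional


class Fsid(NamedTuple):
    """Illustrative stand-in for an fsid attribute (field names assumed)."""
    major: int
    minor: int


def rename_fsid_error(source_dir: Fsid, target_dir: Fsid) -> Optional[str]:
    """Return "NFS4ERR_XDEV" when the directories are on different file systems.

    Hypothetical helper: returns None when the fsid fields match, meaning
    this particular check does not reject the RENAME.
    """
    if source_dir != target_dir:
        return "NFS4ERR_XDEV"
    return None
```

A server would run a check of this kind before attempting the rename, so that a cross-file-system request fails without any partial change.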
Based on the value of the fh_expire_type attribute for the object, | Based on the value of the fh_expire_type attribute for the object, | |||
the filehandle may or may not expire on a RENAME. However, server | the filehandle may or may not expire on a RENAME. However, server | |||
implementors are strongly encouraged to attempt to keep filehandles | implementors are strongly encouraged to attempt to keep filehandles | |||
from expiring in this fashion. | from expiring in this fashion. | |||
On some servers, the file names "." and ".." are illegal as either | On some servers, the file names "." and ".." are illegal as either | |||
oldname or newname, and will result in the error NFS4ERR_BADNAME. In | oldname or newname, and will result in the error NFS4ERR_BADNAME. In | |||
addition, on many servers the case of oldname or newname being an | addition, on many servers the case of oldname or newname being an | |||
alias for the source directory will be checked for. Such servers | alias for the source directory will be checked for. Such servers | |||
skipping to change at page 478, line 47 | skipping to change at page 480, line 17 | |||
struct RESTOREFH4res { | struct RESTOREFH4res { | |||
/* | /* | |||
* If status is NFS4_OK, | * If status is NFS4_OK, | |||
* new CURRENT_FH: value of saved fh | * new CURRENT_FH: value of saved fh | |||
*/ | */ | |||
nfsstat4 status; | nfsstat4 status; | |||
}; | }; | |||
18.27.3. DESCRIPTION | 18.27.3. DESCRIPTION | |||
Set the current filehandle and stateid to the values in the saved | The RESTOREFH operation sets the current filehandle and stateid to | |||
filehandle and stateid. If there is no saved filehandle then the | the values in the saved filehandle and stateid. If there is no saved | |||
server will return the error NFS4ERR_NOFILEHANDLE. | filehandle, then the server will return the error | |||
NFS4ERR_NOFILEHANDLE. | ||||
See Section 16.2.3.1.1 for more details on the current filehandle. | See Section 16.2.3.1.1 for more details on the current filehandle. | |||
See Section 16.2.3.1.2 for more details on the current stateid. | See Section 16.2.3.1.2 for more details on the current stateid. | |||
18.27.4. IMPLEMENTATION | 18.27.4. IMPLEMENTATION | |||
Operations like OPEN and LOOKUP use the current filehandle to | Operations like OPEN and LOOKUP use the current filehandle to | |||
represent a directory and replace it with a new filehandle. Assuming | represent a directory and replace it with a new filehandle. Assuming | |||
the previous filehandle was saved with a SAVEFH operator, the | that the previous filehandle was saved with a SAVEFH operator, the | |||
previous filehandle can be restored as the current filehandle. This | previous filehandle can be restored as the current filehandle. This | |||
is commonly used to obtain post-operation attributes for the | is commonly used to obtain post-operation attributes for the | |||
directory, e.g. | directory, e.g., | |||
PUTFH (directory filehandle) | PUTFH (directory filehandle) | |||
SAVEFH | SAVEFH | |||
GETATTR attrbits (pre-op dir attrs) | GETATTR attrbits (pre-op dir attrs) | |||
CREATE optbits "foo" attrs | CREATE optbits "foo" attrs | |||
GETATTR attrbits (file attributes) | GETATTR attrbits (file attributes) | |||
RESTOREFH | RESTOREFH | |||
GETATTR attrbits (post-op dir attrs) | GETATTR attrbits (post-op dir attrs) | |||
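The interaction of SAVEFH and RESTOREFH in the sequence above can be modeled with a small state object. This is a minimal sketch, not server code: the class CompoundState and the helper set_current (standing in for any operation that replaces the current filehandle, such as PUTFH or CREATE) are hypothetical, and the NFS4ERR_NOFILEHANDLE result for SAVEFH without a current filehandle is an assumption not stated in the text shown here.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_NOFILEHANDLE = "NFS4ERR_NOFILEHANDLE"


class CompoundState:
    """Hypothetical model of the current and saved (filehandle, stateid) pairs."""

    def __init__(self):
        self.current = None  # (filehandle, stateid) or None
        self.saved = None    # (filehandle, stateid) or None

    def set_current(self, filehandle, stateid=None):
        # Stand-in for operations that replace the current filehandle.
        self.current = (filehandle, stateid)
        return NFS4_OK

    def savefh(self):
        # Assumption: saving with no current filehandle is an error.
        if self.current is None:
            return NFS4ERR_NOFILEHANDLE
        # Any previously saved pair is overwritten and no longer accessible.
        self.saved = self.current
        return NFS4_OK

    def restorefh(self):
        # With no saved filehandle, the server returns NFS4ERR_NOFILEHANDLE.
        if self.saved is None:
            return NFS4ERR_NOFILEHANDLE
        self.current = self.saved
        return NFS4_OK
```

In terms of the example sequence, set_current("dir") followed by savefh() corresponds to PUTFH and SAVEFH; after CREATE replaces the current filehandle, restorefh() makes the directory current again so that the final GETATTR reports post-operation directory attributes.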
18.28. Operation 32: SAVEFH - Save Current Filehandle | 18.28. Operation 32: SAVEFH - Save Current Filehandle | |||
skipping to change at page 479, line 45 | skipping to change at page 481, line 17 | |||
struct SAVEFH4res { | struct SAVEFH4res { | |||
/* | /* | |||
* If status is NFS4_OK, | * If status is NFS4_OK, | |||
* new SAVED_FH: value of current fh | * new SAVED_FH: value of current fh | |||
*/ | */ | |||
nfsstat4 status; | nfsstat4 status; | |||
}; | }; | |||
18.28.3. DESCRIPTION | 18.28.3. DESCRIPTION | |||
Save the current filehandle and stateid. If a previous filehandle | The SAVEFH operation saves the current filehandle and stateid. If a | |||
was saved then it is no longer accessible. The saved filehandle can | previous filehandle was saved, then it is no longer accessible. The | |||
be restored as the current filehandle with the RESTOREFH operator. | saved filehandle can be restored as the current filehandle with the | |||
RESTOREFH operator. | ||||
On success, the current filehandle retains its value. | On success, the current filehandle retains its value. | |||
See Section 16.2.3.1.1 for more details on the current filehandle. | See Section 16.2.3.1.1 for more details on the current filehandle. | |||
See Section 16.2.3.1.2 for more details on the current stateid. | See Section 16.2.3.1.2 for more details on the current stateid. | |||
18.28.4. IMPLEMENTATION | 18.28.4. IMPLEMENTATION | |||
18.29. Operation 33: SECINFO - Obtain Available Security | 18.29. Operation 33: SECINFO - Obtain Available Security | |||
skipping to change at page 481, line 46 | skipping to change at page 482, line 46 | |||
default: | default: | |||
void; | void; | |||
}; | }; | |||
18.29.3. DESCRIPTION | 18.29.3. DESCRIPTION | |||
The SECINFO operation is used by the client to obtain a list of valid | The SECINFO operation is used by the client to obtain a list of valid | |||
RPC authentication flavors for a specific directory filehandle, file | RPC authentication flavors for a specific directory filehandle, file | |||
name pair. SECINFO should apply the same access methodology used for | name pair. SECINFO should apply the same access methodology used for | |||
LOOKUP when evaluating the name. Therefore, if the requester does | LOOKUP when evaluating the name. Therefore, if the requester does | |||
not have the appropriate access to LOOKUP the name then SECINFO MUST | not have the appropriate access to LOOKUP the name, then SECINFO MUST | |||
behave the same way and return NFS4ERR_ACCESS. | behave the same way and return NFS4ERR_ACCESS. | |||
The result will contain an array which represents the security | The result will contain an array that represents the security | |||
mechanisms available, with an order corresponding to the server's | mechanisms available, with an order corresponding to the server's | |||
preferences, the most preferred being first in the array. The client | preferences, the most preferred being first in the array. The client | |||
is free to pick whatever security mechanism it both desires and | is free to pick whatever security mechanism it both desires and | |||
supports, or to pick in the server's preference order the first one | supports, or to pick in the server's preference order the first one | |||
it supports. The array entries are represented by the secinfo4 | it supports. The array entries are represented by the secinfo4 | |||
structure. The field 'flavor' will contain a value of AUTH_NONE, | structure. The field 'flavor' will contain a value of AUTH_NONE, | |||
AUTH_SYS (as defined in RFC1831 [3]), or RPCSEC_GSS (as defined in | AUTH_SYS (as defined in RFC 5531 [3]), or RPCSEC_GSS (as defined in | |||
RFC2203 [4]). The field flavor can also be any other security flavor | RFC 2203 [4]). The field flavor can also be any other security | |||
registered with IANA. | flavor registered with IANA. | |||
For the flavors AUTH_NONE and AUTH_SYS, no additional security | For the flavors AUTH_NONE and AUTH_SYS, no additional security | |||
information is returned. The same is true of many (if not most) | information is returned. The same is true of many (if not most) | |||
other security flavors, including AUTH_DH. For a return value of | other security flavors, including AUTH_DH. For a return value of | |||
RPCSEC_GSS, a security triple is returned that contains the mechanism | RPCSEC_GSS, a security triple is returned that contains the mechanism | |||
object identifier (OID, as defined in RFC2743 [7]), the quality of | object identifier (OID, as defined in RFC2743 [7]), the quality of | |||
protection (as defined in RFC2743 [7]) and the service type (as | protection (as defined in RFC 2743 [7]), and the service type (as | |||
defined in RFC2203 [4]). It is possible for SECINFO to return | defined in RFC2203 [4]). It is possible for SECINFO to return | |||
multiple entries with flavor equal to RPCSEC_GSS with different | multiple entries with flavor equal to RPCSEC_GSS with different | |||
security triple values. | security triple values. | |||
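One of the two selection strategies described above, picking the first entry in the server's preference order that the client supports, might look like the following sketch. The SecEntry type, its field names, and the function pick_first_supported are illustrative assumptions; they are not the secinfo4 XDR definition.

```python
from typing import NamedTuple, Optional, Sequence, Set


class SecEntry(NamedTuple):
    """Illustrative SECINFO result entry (not the secinfo4 XDR layout).

    For AUTH_NONE and AUTH_SYS only the flavor is meaningful; for RPCSEC_GSS
    the remaining fields carry the security triple.
    """
    flavor: str
    oid: Optional[str] = None      # mechanism OID (RPCSEC_GSS only)
    qop: Optional[int] = None      # quality of protection (RPCSEC_GSS only)
    service: Optional[str] = None  # service type (RPCSEC_GSS only)


def pick_first_supported(server_entries: Sequence[SecEntry],
                         client_supported: Set[SecEntry]) -> Optional[SecEntry]:
    """Return the first server-preferred entry the client supports, if any."""
    for entry in server_entries:  # server order: most preferred first
        if entry in client_supported:
            return entry
    return None
```

Because several RPCSEC_GSS entries with different triples may appear, the sketch compares whole entries rather than the flavor alone. A client that prefers its own ordering would instead iterate over its own list and test membership in the server's array.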
On success, the current filehandle is consumed (see | On success, the current filehandle is consumed (see | |||
Section 2.6.3.1.1.8), and if the next operation after SECINFO tries | Section 2.6.3.1.1.8), and if the next operation after SECINFO tries | |||
to use the current filehandle, that operation will fail with the | to use the current filehandle, that operation will fail with the | |||
status NFS4ERR_NOFILEHANDLE. | status NFS4ERR_NOFILEHANDLE. | |||
If the name has a length of 0 (zero), or if name does not obey the | If the name has a length of zero, or if the name does not obey the | |||
UTF-8 definition (assuming UTF-8 capabilities are enabled, see | UTF-8 definition (assuming UTF-8 capabilities are enabled; see | |||
Section 14.4), the error NFS4ERR_INVAL will be returned. | Section 14.4), the error NFS4ERR_INVAL will be returned. | |||
See Section 2.6 for additional information on the use of SECINFO. | See Section 2.6 for additional information on the use of SECINFO. | |||
18.29.4. IMPLEMENTATION | 18.29.4. IMPLEMENTATION | |||
The SECINFO operation is expected to be used by the NFS client when | The SECINFO operation is expected to be used by the NFS client when | |||
the error value of NFS4ERR_WRONGSEC is returned from another NFS | the error value of NFS4ERR_WRONGSEC is returned from another NFS | |||
operation. This signifies to the client that the server's security | operation. This signifies to the client that the server's security | |||
policy is different from what the client is currently using. At this | policy is different from what the client is currently using. At this | |||
point, the client is expected to obtain a list of possible security | point, the client is expected to obtain a list of possible security | |||
flavors and choose what best suits its policies. | flavors and choose what best suits its policies. | |||
As mentioned, the server's security policies will determine when a | As mentioned, the server's security policies will determine when a | |||
client request receives NFS4ERR_WRONGSEC. See Table 8 for a list | client request receives NFS4ERR_WRONGSEC. See Table 8 for a list of | |||
operations which can return NFS4ERR_WRONGSEC. In addition, when | operations that can return NFS4ERR_WRONGSEC. In addition, when | |||
READDIR returns attributes, the rdattr_error (Section 5.8.1.12) can | READDIR returns attributes, the rdattr_error (Section 5.8.1.12) can | |||
contain NFS4ERR_WRONGSEC. Note that CREATE and REMOVE MUST NOT | contain NFS4ERR_WRONGSEC. Note that CREATE and REMOVE MUST NOT | |||
return NFS4ERR_WRONGSEC. The rationale for CREATE is that unless the | return NFS4ERR_WRONGSEC. The rationale for CREATE is that unless the | |||
target name exists it cannot have a separate security policy from the | target name exists, it cannot have a separate security policy from | |||
parent directory, and the security policy of the parent was checked | the parent directory, and the security policy of the parent was | |||
when its filehandle was injected into the COMPOUND request's | checked when its filehandle was injected into the COMPOUND request's | |||
operations stream (for similar reasons, an OPEN operation that | operations stream (for similar reasons, an OPEN operation that | |||
creates the target MUST NOT return NFS4ERR_WRONGSEC). If the target | creates the target MUST NOT return NFS4ERR_WRONGSEC). If the target | |||
name exists, while it might have a separate security policy, that is | name exists, while it might have a separate security policy, that is | |||
irrelevant because CREATE MUST return NFS4ERR_EXIST. The rationale | irrelevant because CREATE MUST return NFS4ERR_EXIST. The rationale | |||
for REMOVE is that while that target might have separate security | for REMOVE is that while that target might have a separate security | |||
policy, the target is going to be removed, and so the security policy | policy, the target is going to be removed, and so the security policy | |||
of the parent trumps that of the object being removed. RENAME and | of the parent trumps that of the object being removed. RENAME and | |||
LINK MAY return NFS4ERR_WRONGSEC, but the NFS4ERR_WRONGSEC error | LINK MAY return NFS4ERR_WRONGSEC, but the NFS4ERR_WRONGSEC error | |||
applies only to the saved filehandle (see Section 2.6.3.1.2). Any | applies only to the saved filehandle (see Section 2.6.3.1.2). Any | |||
NFS4ERR_WRONGSEC error on the current filehandle used by LINK and | NFS4ERR_WRONGSEC error on the current filehandle used by LINK and | |||
RENAME MUST be returned by the PUTFH, PUTPUBFH, PUTROOTFH, or | RENAME MUST be returned by the PUTFH, PUTPUBFH, PUTROOTFH, or | |||
RESTOREFH operation that injected the current filehandle. | RESTOREFH operation that injected the current filehandle. | |||
With the exception of LINK and RENAME, the set of operations that can | With the exception of LINK and RENAME, the set of operations that can | |||
return NFS4ERR_WRONGSEC represent the point at which the client can | return NFS4ERR_WRONGSEC represents the point at which the client can | |||
inject a filehandle into the "current filehandle" at the server. The | inject a filehandle into the "current filehandle" at the server. The | |||
filehandle is either provided by the client (PUTFH, PUTPUBFH, | filehandle is either provided by the client (PUTFH, PUTPUBFH, | |||
PUTROOTFH), generated as a result of a name to filehandle translation | PUTROOTFH), generated as a result of a name-to-filehandle translation | |||
(LOOKUP and OPEN), or generated from the saved filehandle via | (LOOKUP and OPEN), or generated from the saved filehandle via | |||
RESTOREFH. As Section 2.6.3.1.1.1 states, a put filehandle operation | RESTOREFH. As Section 2.6.3.1.1.1 states, a put filehandle operation | |||
followed by SAVEFH MUST NOT return NFS4ERR_WRONGSEC. Thus the | followed by SAVEFH MUST NOT return NFS4ERR_WRONGSEC. Thus, the | |||
RESTOREFH operation, under certain conditions (see Section 2.6.3.1.1) | RESTOREFH operation, under certain conditions (see | |||
is permitted to return NFS4ERR_WRONGSEC so that security policies can | Section 2.6.3.1.1), is permitted to return NFS4ERR_WRONGSEC so that | |||
be honored. | security policies can be honored. | |||
The READDIR operation will not directly return the NFS4ERR_WRONGSEC | The READDIR operation will not directly return the NFS4ERR_WRONGSEC | |||
error. However, if the READDIR request included a request for | error. However, if the READDIR request included a request for | |||
attributes, it is possible that the READDIR request's security triple | attributes, it is possible that the READDIR request's security triple | |||
did not match that of a directory entry. If this is the case and the | did not match that of a directory entry. If this is the case and the | |||
client has requested the rdattr_error attribute, the server will | client has requested the rdattr_error attribute, the server will | |||
return the NFS4ERR_WRONGSEC error in rdattr_error for the entry. | return the NFS4ERR_WRONGSEC error in rdattr_error for the entry. | |||
To resolve an error return of NFS4ERR_WRONGSEC, the client does the | To resolve an error return of NFS4ERR_WRONGSEC, the client does the | |||
following: | following: | |||
skipping to change at page 484, line 9 | skipping to change at page 485, line 9 | |||
o For PUTFH, PUTROOTFH, PUTPUBFH, RESTOREFH, LINK, and RENAME, the | o For PUTFH, PUTROOTFH, PUTPUBFH, RESTOREFH, LINK, and RENAME, the | |||
client will use SECINFO_NO_NAME { style = | client will use SECINFO_NO_NAME { style = | |||
SECINFO_STYLE4_CURRENT_FH }. The client will prefix the | SECINFO_STYLE4_CURRENT_FH }. The client will prefix the | |||
SECINFO_NO_NAME operation with the appropriate PUTFH, PUTPUBFH, or | SECINFO_NO_NAME operation with the appropriate PUTFH, PUTPUBFH, or | |||
PUTROOTFH operation that provides the filehandle originally | PUTROOTFH operation that provides the filehandle originally | |||
provided by the PUTFH, PUTPUBFH, PUTROOTFH, or RESTOREFH | provided by the PUTFH, PUTPUBFH, PUTROOTFH, or RESTOREFH | |||
operation. | operation. | |||
NOTE: In NFSv4.0, the client was required to use SECINFO, and had | NOTE: In NFSv4.0, the client was required to use SECINFO, and had | |||
to reconstruct the parent of the original filehandle, and the | to reconstruct the parent of the original filehandle and the | |||
component name of the original filehandle. The introduction in | component name of the original filehandle. The introduction in | |||
NFSv4.1 of SECINFO_NO_NAME obviates the need for reconstruction. | NFSv4.1 of SECINFO_NO_NAME obviates the need for reconstruction. | |||
o For LOOKUPP, the client will use SECINFO_NO_NAME { style = | o For LOOKUPP, the client will use SECINFO_NO_NAME { style = | |||
SECINFO_STYLE4_PARENT } and provide the filehandle which equals | SECINFO_STYLE4_PARENT } and provide the filehandle that equals the | |||
the filehandle originally provided to LOOKUPP. | filehandle originally provided to LOOKUPP. | |||
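The two cases visible above can be summarized as a lookup from the operation that returned NFS4ERR_WRONGSEC to the query the client sends next. This is a sketch covering only those two cases; the function name secinfo_query_for and the tuple return shape are assumptions made for illustration, and operations not listed here are deliberately rejected rather than guessed.

```python
# Operations for which the client queries the security policy of the
# filehandle itself, per the first item above.
CURRENT_FH_OPS = frozenset(
    {"PUTFH", "PUTROOTFH", "PUTPUBFH", "RESTOREFH", "LINK", "RENAME"}
)


def secinfo_query_for(failed_op: str):
    """Map a failed operation to (operation, style) for the follow-up query.

    Hypothetical helper covering only the cases shown in this excerpt.
    """
    if failed_op in CURRENT_FH_OPS:
        return ("SECINFO_NO_NAME", "SECINFO_STYLE4_CURRENT_FH")
    if failed_op == "LOOKUPP":
        return ("SECINFO_NO_NAME", "SECINFO_STYLE4_PARENT")
    raise ValueError("no rule in this sketch for " + failed_op)
```

In both cases the client first re-establishes the relevant filehandle with a PUTFH, PUTPUBFH, or PUTROOTFH operation, then issues the query returned by the helper.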
See Section 21 for a discussion on the recommendations for the | See Section 21 for a discussion on the recommendations for the | |||
security flavor used by SECINFO and SECINFO_NO_NAME. | security flavor used by SECINFO and SECINFO_NO_NAME. | |||
18.30. Operation 34: SETATTR - Set Attributes | 18.30. Operation 34: SETATTR - Set Attributes | |||
18.30.1. ARGUMENTS | 18.30.1. ARGUMENTS | |||
struct SETATTR4args { | struct SETATTR4args { | |||
/* CURRENT_FH: target object */ | /* CURRENT_FH: target object */ | |||
skipping to change at page 484, line 43 | skipping to change at page 485, line 43 | |||
nfsstat4 status; | nfsstat4 status; | |||
bitmap4 attrsset; | bitmap4 attrsset; | |||
}; | }; | |||
18.30.3. DESCRIPTION | 18.30.3. DESCRIPTION | |||
The SETATTR operation changes one or more of the attributes of a file | The SETATTR operation changes one or more of the attributes of a file | |||
system object. The new attributes are specified with a bitmap and | system object. The new attributes are specified with a bitmap and | |||
the attributes that follow the bitmap in bit order. | the attributes that follow the bitmap in bit order. | |||
The stateid argument for SETATTR is used to provide file locking | The stateid argument for SETATTR is used to provide byte-range | |||
context that is necessary for SETATTR requests that set the size | locking context that is necessary for SETATTR requests that set the | |||
attribute. Since setting the size attribute modifies the file's | size attribute. Since setting the size attribute modifies the file's | |||
data, it has the same locking requirements as a corresponding WRITE. | data, it has the same locking requirements as a corresponding WRITE. | |||
Any SETATTR that sets the size attribute is incompatible with a share | Any SETATTR that sets the size attribute is incompatible with a share | |||
reservation that specifies DENY_WRITE. The area between the old end- | reservation that specifies OPEN4_SHARE_DENY_WRITE. The area between | |||
of-file and the new end-of-file is considered to be modified just as | the old end-of-file and the new end-of-file is considered to be | |||
would have been the case had the area in question been specified as | modified just as would have been the case had the area in question | |||
the target of WRITE, for the purpose of checking conflicts with byte- | been specified as the target of WRITE, for the purpose of checking | |||
range locks, for those cases in which a server is implementing | conflicts with byte-range locks, for those cases in which a server is | |||
mandatory byte-range locking behavior. A valid stateid SHOULD always | implementing mandatory byte-range locking behavior. A valid stateid | |||
be specified. When the file size attribute is not set, the special | SHOULD always be specified. When the file size attribute is not set, | |||
stateid consisting of all bits zero MAY be passed. | the special stateid consisting of all bits equal to zero MAY be | |||
passed. | ||||
On either success or failure of the operation, the server will return | On either success or failure of the operation, the server will return | |||
the attrsset bitmask to represent what (if any) attributes were | the attrsset bitmask to represent what (if any) attributes were | |||
successfully set. The attrsset in the response is a subset of the | successfully set. The attrsset in the response is a subset of the | |||
attrmask field of the obj_attributes field in the argument. | attrmask field of the obj_attributes field in the argument. | |||
On success, the current filehandle retains its value. | On success, the current filehandle retains its value. | |||
18.30.4. IMPLEMENTATION | 18.30.4. IMPLEMENTATION | |||
If the request specifies the owner attribute to be set, the server | If the request specifies the owner attribute to be set, the server | |||
SHOULD allow the operation to succeed if the current owner of the | SHOULD allow the operation to succeed if the current owner of the | |||
object matches the value specified in the request. Some servers may | object matches the value specified in the request. Some servers may | |||
be implemented in such a way as to prohibit the setting of the owner | be implemented in such a way as to prohibit the setting of the owner | |||
attribute unless the requester has privilege to do so. If the server | attribute unless the requester has privilege to do so. If the server | |||
is lenient in this one case of matching owner values, the client | is lenient in this one case of matching owner values, the client | |||
implementation may be simplified in cases of creation of an object | implementation may be simplified in cases of creation of an object | |||
(e.g. an exclusive create via OPEN) followed by a SETATTR. | (e.g., an exclusive create via OPEN) followed by a SETATTR. | |||
The file size attribute is used to request changes to the size of a | The file size attribute is used to request changes to the size of a | |||
file. A value of zero causes the file to be truncated, a value less | file. A value of zero causes the file to be truncated, a value less | |||
than the current size of the file causes data from the new size to the | than the current size of the file causes data from the new size to the | |||
end of the file to be discarded, and a size greater than the current | end of the file to be discarded, and a size greater than the current | |||
size of the file causes logically zeroed data bytes to be added to | size of the file causes logically zeroed data bytes to be added to | |||
the end of the file. Servers are free to implement this using | the end of the file. Servers are free to implement this using | |||
unallocated bytes (holes) or allocated data bytes set to zero. | unallocated bytes (holes) or allocated data bytes set to zero. | |||
Clients should not make any assumptions regarding a server's | Clients should not make any assumptions regarding a server's | |||
implementation of this feature, beyond that the bytes in affected | implementation of this feature, beyond that the bytes in the affected | |||
region returned by READ will be zeroed. Servers MUST support | byte-range returned by READ will be zeroed. Servers MUST support | |||
extending the file size via SETATTR. | extending the file size via SETATTR. | |||
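The effect of setting the size attribute on what READ returns can be illustrated on an in-memory byte string. This is a sketch of the observable semantics only; the function apply_size is hypothetical, and it says nothing about whether a server stores the extension as holes or as allocated zero bytes.

```python
def apply_size(data: bytes, new_size: int) -> bytes:
    """Return the file contents a READ would observe after setting size.

    Hypothetical helper: shrinking discards bytes from new_size to the old
    end of file; growing appends logically zeroed bytes.
    """
    if new_size < 0:
        raise ValueError("size must be non-negative")
    if new_size <= len(data):
        return data[:new_size]  # truncate (new_size == 0 empties the file)
    return data + bytes(new_size - len(data))  # extend with zero bytes
```

For example, apply_size(b"abc", 5) yields the original three bytes followed by two zero bytes, which is what a client may rely on regardless of the server's allocation strategy.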
SETATTR is not guaranteed to be atomic. A failed SETATTR may | SETATTR is not guaranteed to be atomic. A failed SETATTR may | |||
partially change a file's attributes, hence the reason why the reply | partially change a file's attributes, hence the reason why the reply | |||
always includes the status and the list of attributes that were set. | always includes the status and the list of attributes that were set. | |||
If the object whose attributes are being changed has a file | If the object whose attributes are being changed has a file | |||
delegation which is held by a client other than the one doing the | delegation that is held by a client other than the one doing the | |||
SETATTR, the delegation(s) must be recalled, and the operation cannot | SETATTR, the delegation(s) must be recalled, and the operation cannot | |||
proceed to actually change an attribute until each such delegation is | proceed to actually change an attribute until each such delegation is | |||
returned or revoked. In all cases in which delegations are recalled, | returned or revoked. In all cases in which delegations are recalled, | |||
the server is likely to return one or more NFS4ERR_DELAY error while | the server is likely to return one or more NFS4ERR_DELAY errors while | |||
the delegation(s) remains outstanding, although it may, if the | the delegation(s) remains outstanding, although it might not do that | |||
returns happen quickly, not do that. | if the delegations are returned quickly. | |||
If the object whose attributes are being set is a directory and | If the object whose attributes are being set is a directory and | |||
another client holds a directory delegation for that directory, then | another client holds a directory delegation for that directory, then | |||
if enabled, asynchronous notifications will be generated when the set | if enabled, asynchronous notifications will be generated when the set | |||
of attributes changed has a non-null intersection with the set of | of attributes changed has a non-null intersection with the set of | |||
attributes for which notification is requested. Notifications of | attributes for which notification is requested. Notifications of | |||
type NOTIFY4_CHANGE_DIR_ATTRS will be sent to the appropriate | type NOTIFY4_CHANGE_DIR_ATTRS will be sent to the appropriate | |||
client(s), but the SETATTR is not delayed by waiting for these | client(s), but the SETATTR is not delayed by waiting for these | |||
notifications to be sent. | notifications to be sent. | |||
If the object whose attributes are being set is a member of directory | If the object whose attributes are being set is a member of the | |||
for which another client holds a directory delegation, then | directory for which another client holds a directory delegation, then | |||
asynchronous notifications will be generated when the set of | asynchronous notifications will be generated when the set of | |||
attributes changed has a non-null intersection with the set of | attributes changed has a non-null intersection with the set of | |||
attributes for which notification is requested. Notifications of | attributes for which notification is requested. Notifications of | |||
type NOTIFY4_CHANGE_CHILD_ATTRS will be sent to the appropriate | type NOTIFY4_CHANGE_CHILD_ATTRS will be sent to the appropriate | |||
clients, but the SETATTR is not delayed by waiting for these | clients, but the SETATTR is not delayed by waiting for these | |||
notifications to be sent. | notifications to be sent. | |||
Changing the size of a file with SETATTR indirectly changes the | Changing the size of a file with SETATTR indirectly changes the | |||
time_modify and change attributes. A client must account for this as | time_modify and change attributes. A client must account for this as | |||
size changes can result in data deletion. | size changes can result in data deletion. | |||
skipping to change at page 487, line 9 | skipping to change at page 488, line 9 | |||
guard condition and the setting of the attributes have the potential | guard condition and the setting of the attributes have the potential | |||
to compromise this function, as would the corresponding delay in the | to compromise this function, as would the corresponding delay in the | |||
NFSv4 emulation. Therefore, NFSv4.1 servers SHOULD take care to | NFSv4 emulation. Therefore, NFSv4.1 servers SHOULD take care to | |||
avoid such delays, to the degree possible, when executing such a | avoid such delays, to the degree possible, when executing such a | |||
request. | request. | |||
If the server does not support an attribute as requested by the | If the server does not support an attribute as requested by the | |||
client, the server SHOULD return NFS4ERR_ATTRNOTSUPP. | client, the server SHOULD return NFS4ERR_ATTRNOTSUPP. | |||
A mask of the attributes actually set is returned by SETATTR in all | A mask of the attributes actually set is returned by SETATTR in all | |||
cases. That mask MUST NOT include attributes bits not requested to | cases. That mask MUST NOT include attribute bits not requested to be | |||
be set by the client. If the attribute masks in the request and | set by the client. If the attribute masks in the request and reply | |||
reply are equal, the status field in the reply MUST be NFS4_OK. | are equal, the status field in the reply MUST be NFS4_OK. | |||
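The two mask rules above lend themselves to a simple consistency check on a SETATTR reply. The following is a minimal sketch, assuming attribute masks are modeled as Python sets of attribute names and the status as a string; the function name check_setattr_reply is hypothetical.

```python
def check_setattr_reply(requested: set, attrsset: set, status: str) -> list:
    """Return a list of rule violations found in a SETATTR reply.

    requested: attributes the client asked to set (assumed model).
    attrsset:  attributes the reply reports as actually set.
    """
    problems = []
    extra = attrsset - requested
    if extra:
        # The reply mask MUST NOT include bits the client did not request.
        problems.append("unrequested attributes in reply: %s" % sorted(extra))
    if attrsset == requested and status != "NFS4_OK":
        # Equal request and reply masks imply a status of NFS4_OK.
        problems.append("masks are equal but status is %s" % status)
    return problems
```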
18.31. Operation 37: VERIFY - Verify Same Attributes | 18.31. Operation 37: VERIFY - Verify Same Attributes | |||
18.31.1. ARGUMENTS | 18.31.1. ARGUMENTS | |||
struct VERIFY4args { | struct VERIFY4args { | |||
/* CURRENT_FH: object */ | /* CURRENT_FH: object */ | |||
fattr4 obj_attributes; | fattr4 obj_attributes; | |||
}; | }; | |||
18.31.2. RESULTS | 18.31.2. RESULTS | |||
struct VERIFY4res { | struct VERIFY4res { | |||
nfsstat4 status; | nfsstat4 status; | |||
}; | }; | |||
18.31.3. DESCRIPTION | 18.31.3. DESCRIPTION | |||
The VERIFY operation is used to verify that attributes have the value | The VERIFY operation is used to verify that attributes have the value | |||
assumed by the client before proceeding with following operations in | assumed by the client before proceeding with the following operations | |||
the COMPOUND request. If any of the attributes do not match then the | in the COMPOUND request. If any of the attributes do not match, then | |||
error NFS4ERR_NOT_SAME must be returned. The current filehandle | the error NFS4ERR_NOT_SAME must be returned. The current filehandle | |||
retains its value after successful completion of the operation. | retains its value after successful completion of the operation. | |||
18.31.4. IMPLEMENTATION | 18.31.4. IMPLEMENTATION | |||
One possible use of the VERIFY operation is the following series of | One possible use of the VERIFY operation is the following series of | |||
operations. With this the client is attempting to verify that the | operations. With this, the client is attempting to verify that the | |||
file being removed will match what the client expects to be removed. | file being removed will match what the client expects to be removed. | |||
This series can help prevent the unintended deletion of a file. | This series can help prevent the unintended deletion of a file. | |||
PUTFH (directory filehandle) | PUTFH (directory filehandle) | |||
LOOKUP (file name) | LOOKUP (file name) | |||
VERIFY (filehandle == fh) | VERIFY (filehandle == fh) | |||
PUTFH (directory filehandle) | PUTFH (directory filehandle) | |||
REMOVE (file name) | REMOVE (file name) | |||
This series does not prevent a second client from removing and | This series does not prevent a second client from removing and | |||
creating a new file in the middle of this sequence but it does help | creating a new file in the middle of this sequence, but it does help | |||
avoid the unintended result. | avoid the unintended result. | |||
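The guarded removal can be sketched with a toy model in which a directory is a Python dict mapping names to filehandles. The function run_guarded_remove and the dict model are assumptions made for illustration; the sketch relies on COMPOUND processing stopping at the first operation that fails.

```python
def run_guarded_remove(directory: dict, name: str, expected_fh: str) -> list:
    """Simulate PUTFH, LOOKUP, VERIFY, PUTFH, REMOVE on a toy directory.

    Returns the (operation, status) pairs that were evaluated.
    """
    results = [("PUTFH", "NFS4_OK")]
    fh = directory.get(name)
    if fh is None:
        results.append(("LOOKUP", "NFS4ERR_NOENT"))
        return results  # COMPOUND stops at the first failure
    results.append(("LOOKUP", "NFS4_OK"))
    if fh != expected_fh:
        # The object is not the one the client expected to remove.
        results.append(("VERIFY", "NFS4ERR_NOT_SAME"))
        return results
    results.append(("VERIFY", "NFS4_OK"))
    results.append(("PUTFH", "NFS4_OK"))
    del directory[name]
    results.append(("REMOVE", "NFS4_OK"))
    return results
```

When the name has been rebound to a different object, the sequence ends at VERIFY and the REMOVE is never evaluated.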
In the case that a RECOMMENDED attribute is specified in the VERIFY | In the case that a RECOMMENDED attribute is specified in the VERIFY | |||
operation and the server does not support that attribute for the file | operation and the server does not support that attribute for the file | |||
system object, the error NFS4ERR_ATTRNOTSUPP is returned to the | system object, the error NFS4ERR_ATTRNOTSUPP is returned to the | |||
client. | client. | |||
When the attribute rdattr_error or any set-only attribute (e.g. | When the attribute rdattr_error or any set-only attribute (e.g., | |||
time_modify_set) is specified, the error NFS4ERR_INVAL is returned to | time_modify_set) is specified, the error NFS4ERR_INVAL is returned to | |||
the client. | the client. | |||
18.32. Operation 38: WRITE - Write to File | 18.32. Operation 38: WRITE - Write to File | |||
18.32.1. ARGUMENTS | 18.32.1. ARGUMENTS | |||
enum stable_how4 { | enum stable_how4 { | |||
UNSTABLE4 = 0, | UNSTABLE4 = 0, | |||
DATA_SYNC4 = 1, | DATA_SYNC4 = 1, | |||
skipping to change at page 489, line 10 | skipping to change at page 490, line 10 | |||
WRITE4resok resok4; | WRITE4resok resok4; | |||
default: | default: | |||
void; | void; | |||
}; | }; | |||
18.32.3. DESCRIPTION | 18.32.3. DESCRIPTION | |||
The WRITE operation is used to write data to a regular file. The | The WRITE operation is used to write data to a regular file. The | |||
target file is specified by the current filehandle. The offset | target file is specified by the current filehandle. The offset | |||
specifies the offset where the data should be written. An offset of | specifies the offset where the data should be written. An offset of | |||
0 (zero) specifies that the write should start at the beginning of | zero specifies that the write should start at the beginning of the | |||
the file. The count, as encoded as part of the opaque data | file. The count, as encoded as part of the opaque data parameter, | |||
parameter, represents the number of bytes of data that are to be | represents the number of bytes of data that are to be written. If | |||
written. If the count is 0 (zero), the WRITE will succeed and return | the count is zero, the WRITE will succeed and return a count of zero | |||
a count of 0 (zero) subject to permissions checking. The server MAY | subject to permissions checking. The server MAY write fewer bytes | |||
write fewer bytes than requested by the client. | than requested by the client. | |||
The client specifies with the stable parameter the method of how the | The client specifies with the stable parameter the method of how the | |||
data is to be processed by the server. If stable is FILE_SYNC4, the | data is to be processed by the server. If stable is FILE_SYNC4, the | |||
server MUST commit the data written plus all file system metadata to | server MUST commit the data written plus all file system metadata to | |||
stable storage before returning results. This corresponds to the | stable storage before returning results. This corresponds to the | |||
NFSv2 protocol semantics. Any other behavior constitutes a protocol | NFSv2 protocol semantics. Any other behavior constitutes a protocol | |||
violation. If stable is DATA_SYNC4, then the server MUST commit all | violation. If stable is DATA_SYNC4, then the server MUST commit all | |||
of the data to stable storage and enough of the metadata to retrieve | of the data to stable storage and enough of the metadata to retrieve | |||
the data before returning. The server implementor is free to | the data before returning. The server implementor is free to | |||
implement DATA_SYNC4 in the same fashion as FILE_SYNC4, but with a | implement DATA_SYNC4 in the same fashion as FILE_SYNC4, but with a | |||
skipping to change at page 489, line 40 | skipping to change at page 490, line 40 | |||
will subsequently be committed to stable storage. The only | will subsequently be committed to stable storage. The only | |||
guarantees made by the server are that it will not destroy any data | guarantees made by the server are that it will not destroy any data | |||
without changing the value of writeverf and that it will not commit | without changing the value of writeverf and that it will not commit | |||
the data and metadata at a level less than that requested by the | the data and metadata at a level less than that requested by the | |||
client. | client. | |||
Except when special stateids are used, the stateid value for a WRITE | Except when special stateids are used, the stateid value for a WRITE | |||
request represents a value returned from a previous byte-range LOCK | request represents a value returned from a previous byte-range LOCK | |||
or OPEN request or the stateid associated with a delegation. The | or OPEN request or the stateid associated with a delegation. The | |||
stateid identifies the associated owners if any and is used by the | stateid identifies the associated owners if any and is used by the | |||
server to verify that the associated locks are still valid (e.g. have | server to verify that the associated locks are still valid (e.g., | |||
not been revoked). | have not been revoked). | |||
Upon successful completion, the following results are returned. The | Upon successful completion, the following results are returned. The | |||
count result is the number of bytes of data written to the file. The | count result is the number of bytes of data written to the file. The | |||
server may write fewer bytes than requested. If so, the actual | server may write fewer bytes than requested. If so, the actual | |||
number of bytes written starting at location, offset, is returned. | number of bytes written starting at location, offset, is returned. | |||
The server also returns an indication of the level of commitment of | The server also returns an indication of the level of commitment of | |||
the data and metadata via committed. Per Table 11, | the data and metadata via committed. Per Table 11, | |||
o The server MAY commit the data at a stronger level than requested. | o The server MAY commit the data at a stronger level than requested. | |||
skipping to change at page 490, line 23 | skipping to change at page 491, line 23 | |||
+------------+-----------------------------------+ | +------------+-----------------------------------+ | |||
| UNSTABLE4 | FILE_SYNC4, DATA_SYNC4, UNSTABLE4 | | | UNSTABLE4 | FILE_SYNC4, DATA_SYNC4, UNSTABLE4 | | |||
| DATA_SYNC4 | FILE_SYNC4, DATA_SYNC4 | | | DATA_SYNC4 | FILE_SYNC4, DATA_SYNC4 | | |||
| FILE_SYNC4 | FILE_SYNC4 | | | FILE_SYNC4 | FILE_SYNC4 | | |||
+------------+-----------------------------------+ | +------------+-----------------------------------+ | |||
Table 11 | Table 11 | |||
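Table 11 amounts to an ordering on the stable_how4 values: the committed level in the reply may be equal to or stronger than the requested level, never weaker. A minimal sketch of that check follows, using the numeric enum values as the ordering; the function name committed_is_allowed is hypothetical.

```python
# stable_how4 values, ordered from weakest to strongest commitment.
LEVEL = {"UNSTABLE4": 0, "DATA_SYNC4": 1, "FILE_SYNC4": 2}


def committed_is_allowed(stable: str, committed: str) -> bool:
    """True if the committed value in a reply is legal for the request."""
    return LEVEL[committed] >= LEVEL[stable]
```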
The final portion of the result is the field writeverf. This field | The final portion of the result is the field writeverf. This field | |||
is the write verifier and is a cookie that the client can use to | is the write verifier and is a cookie that the client can use to | |||
determine whether a server has changed instance state (e.g. server | determine whether a server has changed instance state (e.g., server | |||
restart) between a call to WRITE and a subsequent call to either | restart) between a call to WRITE and a subsequent call to either | |||
WRITE or COMMIT. This cookie MUST be unchanged during a single | WRITE or COMMIT. This cookie MUST be unchanged during a single | |||
instance of the NFSv4.1 server and MUST be unique between instances | instance of the NFSv4.1 server and MUST be unique between instances | |||
of the NFSv4.1 server. If the cookie changes, then the client MUST | of the NFSv4.1 server. If the cookie changes, then the client MUST | |||
assume that any data written with an UNSTABLE4 value for committed | assume that any data written with an UNSTABLE4 value for committed | |||
and an old writeverf in the reply has been lost and will need to be | and an old writeverf in the reply has been lost and will need to be | |||
recovered. | recovered. | |||
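One way a client might apply the write verifier rule is to remember each WRITE that came back as UNSTABLE4 together with its writeverf, and compare against the verifier seen on a later COMMIT. The sketch below is an assumption-laden illustration: the class UncommittedWrites is hypothetical, and it treats every COMMIT as covering the whole file.

```python
class UncommittedWrites:
    """Track WRITEs whose reply reported committed == UNSTABLE4."""

    def __init__(self):
        self.pending = []  # list of (offset, data, writeverf)

    def record_write(self, offset, data, committed, writeverf):
        # Only UNSTABLE4 data can be lost when the verifier changes.
        if committed == "UNSTABLE4":
            self.pending.append((offset, data, writeverf))

    def on_commit(self, writeverf):
        """Return the (offset, data) pairs that must be written again.

        Entries whose verifier matches are now stable; entries with an
        old verifier are assumed lost. Either way the list is cleared,
        and the caller records the resent WRITEs afresh.
        """
        lost = [(off, data) for off, data, verf in self.pending
                if verf != writeverf]
        self.pending = []
        return lost
```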
If a client writes data to the server with the stable argument set to | If a client writes data to the server with the stable argument set to | |||
UNSTABLE4 and the reply yields a committed response of DATA_SYNC4 or | UNSTABLE4 and the reply yields a committed response of DATA_SYNC4 or | |||
UNSTABLE4, the client will follow up some time in the future with a | UNSTABLE4, the client will follow up some time in the future with a | |||
COMMIT operation to synchronize outstanding asynchronous data and | COMMIT operation to synchronize outstanding asynchronous data and | |||
metadata with the server's stable storage, barring client error. It | metadata with the server's stable storage, barring client error. It | |||
is possible that, due to client crash or other error, a subsequent | is possible that, due to client crash or other error, a subsequent | |||
COMMIT will not be received by the server. | COMMIT will not be received by the server. | |||
For a WRITE with a stateid value of all bits 0, the server MAY allow | For a WRITE with a stateid value of all bits equal to zero, the | |||
the WRITE to be serviced subject to mandatory file locks or the | server MAY allow the WRITE to be serviced subject to mandatory byte- | |||
current share deny modes for the file. For a WRITE with a stateid | range locks or the current share deny modes for the file. For a | |||
value of all bits 1, the server MUST NOT allow the WRITE operation to | WRITE with a stateid value of all bits equal to 1, the server MUST | |||
bypass locking checks at the server and otherwise is treated as if a | NOT allow the WRITE operation to bypass locking checks at the server | |||
stateid of all bits 0 were used. | and otherwise is treated as if a stateid of all bits equal to zero | |||
were used. | ||||
On success, the current filehandle retains its value. | On success, the current filehandle retains its value. | |||
18.32.4. IMPLEMENTATION | 18.32.4. IMPLEMENTATION | |||
It is possible for the server to write fewer bytes of data than | It is possible for the server to write fewer bytes of data than | |||
requested by the client. In this case, the server SHOULD NOT return | requested by the client. In this case, the server SHOULD NOT return | |||
an error unless no data was written at all. If the server writes | an error unless no data was written at all. If the server writes | |||
less than the number of bytes specified, the client will need to send | less than the number of bytes specified, the client will need to send | |||
another WRITE to write the remaining data. | another WRITE to write the remaining data. | |||
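The handling of short writes can be sketched as a client loop that keeps sending WRITE for the remaining data. The callable send_write is a hypothetical stand-in for issuing one WRITE and returning the count from the reply; it is not part of any real client API.

```python
def write_all(send_write, offset: int, data: bytes) -> int:
    """Write all of data starting at offset, tolerating short writes.

    send_write(offset, data) is assumed to return the number of bytes
    the server reported as written for that single WRITE.
    """
    written = 0
    while written < len(data):
        count = send_write(offset + written, data[written:])
        if count <= 0:
            # Guard against looping forever if no progress is made.
            raise IOError("server wrote no data")
        written += count
    return written
```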
It is assumed that the act of writing data to a file will cause the | It is assumed that the act of writing data to a file will cause the | |||
time_modify and change attributes of the file to be updated. | time_modify and change attributes of the file to be updated. | |||
However, these attributes SHOULD NOT be changed unless the contents | However, these attributes SHOULD NOT be changed unless the contents | |||
of the file are changed. Thus, a WRITE request with count set to 0 | of the file are changed. Thus, a WRITE request with count set to | |||
SHOULD NOT cause the time_modify and change attributes of the file | zero SHOULD NOT cause the time_modify and change attributes of the | |||
to be updated. | file to be updated. | |||
Stable storage is persistent storage that survives: | Stable storage is persistent storage that survives: | |||
1. Repeated power failures. | 1. Repeated power failures. | |||
2. Hardware failures (of any board, power supply, etc.). | 2. Hardware failures (of any board, power supply, etc.). | |||
3. Repeated software crashes and restarts. | 3. Repeated software crashes and restarts. | |||
This definition does not address failure of the stable storage module | This definition does not address failure of the stable storage module | |||
itself. | itself. | |||
The verifier is defined to allow a client to detect different | The verifier is defined to allow a client to detect different | |||
instances of an NFSv4.1 protocol server over which cached, | instances of an NFSv4.1 protocol server over which cached, | |||
uncommitted data may be lost. In the most likely case, the verifier | uncommitted data may be lost. In the most likely case, the verifier | |||
allows the client to detect server restarts. This information is | allows the client to detect server restarts. This information is | |||
required so that the client can safely determine whether the server | required so that the client can safely determine whether the server | |||
could have lost cached data. If the server fails unexpectedly and | could have lost cached data. If the server fails unexpectedly and | |||
the client has uncommitted data from previous WRITE requests (done | the client has uncommitted data from previous WRITE requests (done | |||
with the stable argument set to UNSTABLE4 and in which the result | with the stable argument set to UNSTABLE4 and in which the result | |||
committed was returned as UNSTABLE4 as well) the server might not | committed was returned as UNSTABLE4 as well), the server might not | |||
have flushed cached data to stable storage. The burden of recovery | have flushed cached data to stable storage. The burden of recovery | |||
is on the client and the client will need to retransmit the data to | is on the client, and the client will need to retransmit the data to | |||
the server. | the server. | |||
A suggested verifier would be to use the time that the server was | A suggested verifier would be to use the time that the server was | |||
last started (if restarting the server results in lost buffers). | last started (if restarting the server results in lost buffers). | |||
The reply's committed field allows the client to do more effective | The reply's committed field allows the client to do more effective | |||
caching. If the server is committing all WRITE requests to stable | caching. If the server is committing all WRITE requests to stable | |||
storage, then it SHOULD return with committed set to FILE_SYNC4, | storage, then it SHOULD return with committed set to FILE_SYNC4, | |||
regardless of the value of the stable field in the arguments. A | regardless of the value of the stable field in the arguments. A | |||
server that uses an NVRAM accelerator may choose to implement this | server that uses an NVRAM accelerator may choose to implement this | |||
skipping to change at page 492, line 12 | skipping to change at page 493, line 16 | |||
Some implementations may return NFS4ERR_NOSPC instead of | Some implementations may return NFS4ERR_NOSPC instead of | |||
NFS4ERR_DQUOT when a user's quota is exceeded. | NFS4ERR_DQUOT when a user's quota is exceeded. | |||
In the case that the current filehandle is of type NF4DIR, the server | In the case that the current filehandle is of type NF4DIR, the server | |||
will return NFS4ERR_ISDIR. If the current file is a symbolic link, | will return NFS4ERR_ISDIR. If the current file is a symbolic link, | |||
the error NFS4ERR_SYMLINK will be returned. Otherwise, if the | the error NFS4ERR_SYMLINK will be returned. Otherwise, if the | |||
current filehandle does not designate an ordinary file, the server | current filehandle does not designate an ordinary file, the server | |||
will return NFS4ERR_WRONG_TYPE. | will return NFS4ERR_WRONG_TYPE. | |||
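The type checks above form a short decision chain. A minimal sketch follows, assuming the object type is given as the name of its nfs_ftype4 value; the function name write_type_error is hypothetical.

```python
def write_type_error(ftype: str):
    """Return the error WRITE yields for an object type, or None if OK."""
    if ftype == "NF4REG":
        return None  # ordinary file: no type error
    if ftype == "NF4DIR":
        return "NFS4ERR_ISDIR"
    if ftype == "NF4LNK":
        return "NFS4ERR_SYMLINK"
    # Any other non-regular object.
    return "NFS4ERR_WRONG_TYPE"
```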
If mandatory file locking is in effect for the file, and the | If mandatory byte-range locking is in effect for the file, and the | |||
corresponding byte-range of the data to be written to the file is | corresponding byte-range of the data to be written to the file is | |||
read or write locked by an owner that is not associated with the | READ_LT or WRITE_LT locked by an owner that is not associated with | |||
stateid, the server MUST return NFS4ERR_LOCKED. If so, the client | the stateid, the server MUST return NFS4ERR_LOCKED. If so, the | |||
MUST check if the owner corresponding to the stateid used with the | client MUST check if the owner corresponding to the stateid used with | |||
WRITE operation has a conflicting read lock that overlaps with the | the WRITE operation has a conflicting READ_LT lock that overlaps with | |||
region that was to be written. If the stateid's owner has no | the byte-range that was to be written. If the stateid's owner has no | |||
conflicting read lock, then the client SHOULD try to get the | conflicting READ_LT lock, then the client SHOULD try to get the | |||
appropriate write byte-range lock via the LOCK operation before re- | appropriate write byte-range lock via the LOCK operation before re- | |||
attempting the WRITE. When the WRITE completes, the client SHOULD | attempting the WRITE. When the WRITE completes, the client SHOULD | |||
release the byte-range lock via LOCKU. | release the byte-range lock via LOCKU. | |||
If the stateid's owner had a conflicting read lock, then the client | If the stateid's owner had a conflicting READ_LT lock, then the | |||
has no choice but to return an error to the application that | client has no choice but to return an error to the application that | |||
attempted the WRITE. The reason is that since the stateid's owner | attempted the WRITE. The reason is that since the stateid's owner | |||
had a read lock, the server either attempted to temporarily | had a READ_LT lock, either the server attempted to temporarily | |||
effectively upgrade this read lock to a write lock, or the server has | effectively upgrade this READ_LT lock to a WRITE_LT lock or the | |||
no upgrade capability. If the server attempted to upgrade the read | server has no upgrade capability. If the server attempted to upgrade | |||
lock and failed, it is pointless for the client to re-attempt the | the READ_LT lock and failed, it is pointless for the client to re- | |||
upgrade via the LOCK operation, because there might be another client | attempt the upgrade via the LOCK operation, because there might be | |||
also trying to upgrade. If two clients are blocked trying upgrade | another client also trying to upgrade. If two clients are blocked | |||
the same lock, the clients deadlock. If the server has no upgrade | trying to upgrade the same lock, the clients deadlock. If the server | |||
capability, then it is pointless to try a LOCK operation to upgrade. | has no upgrade capability, then it is pointless to try a LOCK | |||
operation to upgrade. | ||||
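The client's choice after receiving NFS4ERR_LOCKED can be summarized as a two-way decision. The sketch below returns the sequence of steps as labels; the function name and the label strings are hypothetical and only mirror the reasoning in the text.

```python
def steps_after_locked(owner_has_conflicting_read_lock: bool) -> list:
    """Return the client's next steps after WRITE fails with NFS4ERR_LOCKED."""
    if owner_has_conflicting_read_lock:
        # Retrying via LOCK is pointless: either the server already
        # failed to upgrade the lock or it has no upgrade capability.
        return ["RETURN_ERROR_TO_APPLICATION"]
    # Otherwise take a write byte-range lock, retry, then release it.
    return ["LOCK", "WRITE", "LOCKU"]
```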
If one or more other clients have delegations for the file being | If one or more other clients have delegations for the file being | |||
written, those delegations MUST be recalled, and the operation cannot | written, those delegations MUST be recalled, and the operation cannot | |||
proceed until those delegations are returned or revoked. Except | proceed until those delegations are returned or revoked. Except | |||
where this happens very quickly, one or more NFS4ERR_DELAY errors | where this happens very quickly, one or more NFS4ERR_DELAY errors | |||
will be returned to requests made while the delegation remains | will be returned to requests made while the delegation remains | |||
outstanding. Normally, delegations will not be recalled as a result | outstanding. Normally, delegations will not be recalled as a result | |||
of a WRITE operation since the recall will occur as a result of an | of a WRITE operation since the recall will occur as a result of an | |||
earlier OPEN. However, since it is possible for a WRITE to be done | earlier OPEN. However, since it is possible for a WRITE to be done | |||
with a special stateid, the server needs to check for this case even | with a special stateid, the server needs to check for this case even | |||
skipping to change at page 496, line 7 | skipping to change at page 497, line 7 | |||
If, when the client ID was created, the client opted for SP4_NONE | If, when the client ID was created, the client opted for SP4_NONE | |||
state protection, the client is not required to use | state protection, the client is not required to use | |||
BIND_CONN_TO_SESSION to associate the connection with the session, | BIND_CONN_TO_SESSION to associate the connection with the session, | |||
unless the client wishes to associate the connection with the | unless the client wishes to associate the connection with the | |||
backchannel. When SP4_NONE protection is used, simply sending a | backchannel. When SP4_NONE protection is used, simply sending a | |||
COMPOUND request with a SEQUENCE operation is sufficient to associate | COMPOUND request with a SEQUENCE operation is sufficient to associate | |||
the connection with the session specified in SEQUENCE. | the connection with the session specified in SEQUENCE. | |||
The field bctsa_dir indicates whether the client wants to associate | The field bctsa_dir indicates whether the client wants to associate | |||
the connection with the fore channel or the backchannel or both | the connection with the fore channel or the backchannel or both | |||
channels. The value CDFC4_FORE_OR_BOTH indicates the client wants to | channels. The value CDFC4_FORE_OR_BOTH indicates that the client | |||
associate the connection with both the fore channel and backchannel, | wants to associate the connection with both the fore channel and | |||
but will accept the connection being associated to just the fore | backchannel, but will accept the connection being associated to just | |||
channel. The value CDFC4_BACK_OR_BOTH indicates the client wants to | the fore channel. The value CDFC4_BACK_OR_BOTH indicates that the | |||
associate with both the fore and backchannel, but will accept the | client wants to associate with both the fore channel and backchannel, | |||
connection being associated with just the backchannel. The server | but will accept the connection being associated with just the | |||
replies in bctsr_dir which channel(s) the connection is associated | backchannel. The server replies in bctsr_dir which channel(s) the | |||
with. If the client specified CDFC4_FORE, the server MUST return | connection is associated with. If the client specified CDFC4_FORE, | |||
CDFS4_FORE. If the client specified CDFC4_BACK, the server MUST | the server MUST return CDFS4_FORE. If the client specified | |||
return CDFS4_BACK. If the client specified CDFC4_FORE_OR_BOTH, the | CDFC4_BACK, the server MUST return CDFS4_BACK. If the client | |||
server MUST return CDFS4_FORE or CDFS4_BOTH. If the client specified | specified CDFC4_FORE_OR_BOTH, the server MUST return CDFS4_FORE or | |||
CDFC4_BACK_OR_BOTH, the server MUST return CDFS4_BACK or CDFS4_BOTH. | CDFS4_BOTH. If the client specified CDFC4_BACK_OR_BOTH, the server | |||
MUST return CDFS4_BACK or CDFS4_BOTH. | ||||
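The relationship between bctsa_dir and the permitted bctsr_dir values can be written as a lookup table. This is a sketch for checking a reply, with the enum values modeled as strings; the names ALLOWED_REPLY and reply_is_valid are hypothetical.

```python
# For each client request value, the set of values the server MUST
# choose from when replying in bctsr_dir.
ALLOWED_REPLY = {
    "CDFC4_FORE": {"CDFS4_FORE"},
    "CDFC4_BACK": {"CDFS4_BACK"},
    "CDFC4_FORE_OR_BOTH": {"CDFS4_FORE", "CDFS4_BOTH"},
    "CDFC4_BACK_OR_BOTH": {"CDFS4_BACK", "CDFS4_BOTH"},
}


def reply_is_valid(bctsa_dir: str, bctsr_dir: str) -> bool:
    """True if the server's bctsr_dir is permitted for the request."""
    return bctsr_dir in ALLOWED_REPLY[bctsa_dir]
```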
See the CREATE_SESSION operation (Section 18.36), and the description | See the CREATE_SESSION operation (Section 18.36), and the description | |||
of the argument csa_use_conn_in_rdma_mode to understand | of the argument csa_use_conn_in_rdma_mode to understand | |||
bctsa_use_conn_in_rdma_mode, and the description of | bctsa_use_conn_in_rdma_mode, and the description of | |||
csr_use_conn_in_rdma_mode to understand bctsr_use_conn_in_rdma_mode. | csr_use_conn_in_rdma_mode to understand bctsr_use_conn_in_rdma_mode. | |||
Invoking BIND_CONN_TO_SESSION on a connection already associated with | Invoking BIND_CONN_TO_SESSION on a connection already associated with | |||
the specified session has no effect, and the server MUST respond with | the specified session has no effect, and the server MUST respond with | |||
NFS4_OK, unless the client is demanding changes to the set of | NFS4_OK, unless the client is demanding changes to the set of | |||
channels the connection is associated with. If so, the server MUST | channels the connection is associated with. If so, the server MUST | |||
skipping to change at page 497, line 7 | skipping to change at page 498, line 8 | |||
client, per the sessions model, MUST retry the SET_SSV. But it needs | client, per the sessions model, MUST retry the SET_SSV. But it needs | |||
a new connection to do so, and MUST associate that connection with | a new connection to do so, and MUST associate that connection with | |||
the session via a BIND_CONN_TO_SESSION authenticated with the SSV GSS | the session via a BIND_CONN_TO_SESSION authenticated with the SSV GSS | |||
mechanism. The problem is that the RPCSEC_GSS message integrity | mechanism. The problem is that the RPCSEC_GSS message integrity | |||
codes use a subkey derived from the SSV as the key and the SSV may | codes use a subkey derived from the SSV as the key and the SSV may | |||
have changed. While there are multiple recovery strategies, a | have changed. While there are multiple recovery strategies, a | |||
single, general strategy is described here. | single, general strategy is described here. | |||
o The client reconnects. | o The client reconnects. | |||
o The client assumes the SET_SSV was executed, and so sends | o The client assumes that the SET_SSV was executed, and so sends | |||
BIND_CONN_TO_SESSION with the subkey (derived from the new SSV, | BIND_CONN_TO_SESSION with the subkey (derived from the new SSV, | |||
i.e., what SET_SSV would have set the SSV to) used as the key for | i.e., what SET_SSV would have set the SSV to) used as the key for | |||
the RPCSEC_GSS credential message integrity codes. | the RPCSEC_GSS credential message integrity codes. | |||
o If the request succeeds, this means the original attempted SET_SSV | o If the request succeeds, this means that the original attempted | |||
did execute successfully. The client re-sends the original | SET_SSV did execute successfully. The client re-sends the | |||
SET_SSV, which the server will reply to via the reply cache. | original SET_SSV, which the server will reply to via the reply | |||
cache. | ||||
o If the server returns an RPC authentication error, this means the | o If the server returns an RPC authentication error, this means that | |||
server's current SSV was not changed, (and the SET_SSV was likely | the server's current SSV was not changed (and the SET_SSV was | |||
not executed). The client then tries BIND_CONN_TO_SESSION with | likely not executed). The client then tries BIND_CONN_TO_SESSION | |||
the subkey derived from the old SSV as the key for the RPCSEC_GSS | with the subkey derived from the old SSV as the key for the | |||
message integrity codes. | RPCSEC_GSS message integrity codes. | |||
o The attempted BIND_CONN_TO_SESSION with the old SSV should | o The attempted BIND_CONN_TO_SESSION with the old SSV should | |||
succeed. If so the client re-sends the original SET_SSV. If the | succeed. If so, the client re-sends the original SET_SSV. If the | |||
original SET_SSV was not executed, then the server executes it. | original SET_SSV was not executed, then the server executes it. | |||
If the original SET_SSV was executed, but failed, the server will | If the original SET_SSV was executed but failed, the server will | |||
return the SET_SSV from the reply cache. | return the SET_SSV from the reply cache. | |||
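
The recovery strategy in the bullets above can be sketched as follows. This is an illustration only and is not part of either compared document: FakeServer, AuthError, and recover_after_lost_set_ssv are hypothetical names, and the SSV itself stands in for the subkey derived from it.

```python
class AuthError(Exception):
    """Stands in for an RPC authentication error (hypothetical name)."""


class FakeServer:
    """Toy server: tracks only its current SSV and a SET_SSV reply cache."""

    def __init__(self, old_ssv, new_ssv, executed):
        # 'executed' says whether SET_SSV ran before the connection was lost.
        self.ssv = new_ssv if executed else old_ssv
        self.cached_reply = "NFS4_OK" if executed else None

    def bind_conn_to_session(self, key):
        # Simplification: the key is compared directly to the SSV.
        if key != self.ssv:
            raise AuthError("integrity check failed")
        return "NFS4_OK"

    def set_ssv(self, new_ssv):
        if self.cached_reply is None:
            # Not executed before: execute it now.
            self.ssv = new_ssv
            self.cached_reply = "NFS4_OK"
        # Otherwise the reply comes from the reply cache.
        return self.cached_reply


def recover_after_lost_set_ssv(server, old_ssv, new_ssv):
    """Return the steps taken, in the order given by the bullets."""
    steps = ["reconnect"]
    try:
        # Assume SET_SSV was executed: bind using the new SSV.
        server.bind_conn_to_session(key=new_ssv)
        steps.append("bind:new")
    except AuthError:
        # The server's SSV was not changed: bind using the old SSV.
        server.bind_conn_to_session(key=old_ssv)
        steps.append("bind:old")
    # In both cases the original SET_SSV is re-sent.
    server.set_ssv(new_ssv)
    steps.append("resend SET_SSV")
    return steps
```

In both outcomes the client ends by re-sending the original SET_SSV, so the toy server's SSV is the new value afterwards.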
18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID | 18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID | |||
Exchange long hand client and server identifiers (owners), and create | The EXCHANGE_ID exchanges long-hand client and server identifiers | |||
a client ID | (owners), and creates a client ID. | |||
18.35.1. ARGUMENT | 18.35.1. ARGUMENT | |||
const EXCHGID4_FLAG_SUPP_MOVED_REFER = 0x00000001; | const EXCHGID4_FLAG_SUPP_MOVED_REFER = 0x00000001; | |||
const EXCHGID4_FLAG_SUPP_MOVED_MIGR = 0x00000002; | const EXCHGID4_FLAG_SUPP_MOVED_MIGR = 0x00000002; | |||
const EXCHGID4_FLAG_BIND_PRINC_STATEID = 0x00000100; | const EXCHGID4_FLAG_BIND_PRINC_STATEID = 0x00000100; | |||
const EXCHGID4_FLAG_USE_NON_PNFS = 0x00010000; | const EXCHGID4_FLAG_USE_NON_PNFS = 0x00010000; | |||
const EXCHGID4_FLAG_USE_PNFS_MDS = 0x00020000; | const EXCHGID4_FLAG_USE_PNFS_MDS = 0x00020000; | |||
const EXCHGID4_FLAG_USE_PNFS_DS = 0x00040000; | const EXCHGID4_FLAG_USE_PNFS_DS = 0x00040000; | |||
skipping to change at page 500, line 12 | skipping to change at page 501, line 12 | |||
along with the returned eir_sequenceid, as arguments to | along with the returned eir_sequenceid, as arguments to | |||
CREATE_SESSION. If the flag EXCHGID4_FLAG_CONFIRMED_R is set in the | CREATE_SESSION. If the flag EXCHGID4_FLAG_CONFIRMED_R is set in the | |||
result, eir_flags, then eir_sequenceid MUST be ignored, as it has no | result, eir_flags, then eir_sequenceid MUST be ignored, as it has no | |||
relevancy. | relevancy. | |||
EXCHANGE_ID MAY be sent in a COMPOUND procedure that starts with | EXCHANGE_ID MAY be sent in a COMPOUND procedure that starts with | |||
SEQUENCE. However, when a client communicates with a server for the | SEQUENCE. However, when a client communicates with a server for the | |||
first time, it will not have a session, so using SEQUENCE will not be | first time, it will not have a session, so using SEQUENCE will not be | |||
possible. If EXCHANGE_ID is sent without a preceding SEQUENCE, then | possible. If EXCHANGE_ID is sent without a preceding SEQUENCE, then | |||
it MUST be the only operation in the COMPOUND procedure's request. | it MUST be the only operation in the COMPOUND procedure's request. | |||
If is not, the server MUST return NFS4ERR_NOT_ONLY_OP. | If it is not, the server MUST return NFS4ERR_NOT_ONLY_OP. | |||
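
The placement rule in the paragraph above can be expressed as a small check. This is a sketch under stated assumptions: check_exchange_id_placement is a hypothetical helper, operations are represented by their names, and only the rule described here is modeled (other COMPOUND ordering rules are ignored).

```python
def check_exchange_id_placement(ops):
    """ops: operation names of one COMPOUND request, in order.

    Returns the status this rule alone would produce.
    """
    if "EXCHANGE_ID" not in ops:
        return "NFS4_OK"  # The rule does not apply.
    if ops[0] == "SEQUENCE":
        return "NFS4_OK"  # EXCHANGE_ID MAY follow a leading SEQUENCE.
    if len(ops) == 1:
        return "NFS4_OK"  # No SEQUENCE, and EXCHANGE_ID is the only op.
    return "NFS4ERR_NOT_ONLY_OP"
```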
The eia_clientowner field is composed of a co_verifier field and a | The eia_clientowner field is composed of a co_verifier field and a | |||
co_ownerid string. As noted in Section 2.4, the co_ownerid describes | co_ownerid string. As noted in Section 2.4, the co_ownerid describes | |||
the client, and the co_verifier is the incarnation of the client. An | the client, and the co_verifier is the incarnation of the client. An | |||
EXCHANGE_ID sent with a new incarnation of the client will lead to | EXCHANGE_ID sent with a new incarnation of the client will lead to | |||
the server removing lock state of the old incarnation. Whereas an | the server removing lock state of the old incarnation. Whereas an | |||
EXCHANGE_ID sent with the current incarnation and co_ownerid will | EXCHANGE_ID sent with the current incarnation and co_ownerid will | |||
result in an error or an update of the client ID's properties, | result in an error or an update of the client ID's properties, | |||
depending on the arguments to EXCHANGE_ID. | depending on the arguments to EXCHANGE_ID. | |||
skipping to change at page 500, line 35 | skipping to change at page 501, line 35 | |||
In addition to the client ID and sequence ID, the server returns a | In addition to the client ID and sequence ID, the server returns a | |||
server owner (eir_server_owner) and server scope (eir_server_scope). | server owner (eir_server_owner) and server scope (eir_server_scope). | |||
The former field is used for network trunking as described in | The former field is used for network trunking as described in | |||
Section 2.10.5. The latter field is used to allow clients to | Section 2.10.5. The latter field is used to allow clients to | |||
determine when client IDs sent by one server may be recognized by | determine when client IDs sent by one server may be recognized by | |||
another in the event of file system migration (see Section 11.7.7). | another in the event of file system migration (see Section 11.7.7). | |||
The client ID returned by EXCHANGE_ID is only unique relative to the | The client ID returned by EXCHANGE_ID is only unique relative to the | |||
combination of eir_server_owner.so_major_id and eir_server_scope. | combination of eir_server_owner.so_major_id and eir_server_scope. | |||
Thus if two servers return the same client ID, the onus is on the | Thus, if two servers return the same client ID, the onus is on the | |||
client to distinguish the client IDs on the basis of | client to distinguish the client IDs on the basis of | |||
eir_server_owner.so_major_id and eir_server_scope. In the event two | eir_server_owner.so_major_id and eir_server_scope. In the event two | |||
different server's claim matching server_owner.so_major_id and | different servers claim matching server_owner.so_major_id and | |||
eir_server_scope, the client can use the verification techniques | eir_server_scope, the client can use the verification techniques | |||
discussed in Section 2.10.5 to determine if the servers are distinct. | discussed in Section 2.10.5 to determine if the servers are distinct. | |||
If they are distinct, then the client will need to note the | If they are distinct, then the client will need to note the | |||
destination network addresses of the connections used with each | destination network addresses of the connections used with each | |||
server, and use the network address as the final discriminator. | server, and use the network address as the final discriminator. | |||
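
One way a client might index the client IDs it holds, following the paragraph above, is sketched here. The names ClientIdKey and make_key are hypothetical, and the servers_distinct argument stands in for the outcome of the verification techniques of Section 2.10.5, which are not modeled.

```python
from typing import NamedTuple, Optional


class ClientIdKey(NamedTuple):
    so_major_id: bytes
    server_scope: bytes
    clientid: int
    # Final discriminator, used only when two distinct servers
    # claim the same so_major_id and server scope.
    server_addr: Optional[str] = None


def make_key(so_major_id, server_scope, clientid,
             server_addr=None, servers_distinct=False):
    """Build a lookup key for a client ID returned by EXCHANGE_ID."""
    addr = server_addr if servers_distinct else None
    return ClientIdKey(so_major_id, server_scope, clientid, addr)
```

With such a key, equal client ID values returned by servers with different major IDs or scopes never collide, and the network address is consulted only in the last-resort case.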
The server, as defined by the unique identity expressed in the | The server, as defined by the unique identity expressed in the | |||
so_major_id of the server owner and the server scope, needs to track | so_major_id of the server owner and the server scope, needs to track | |||
several properties of each client ID it hands out. The properties | several properties of each client ID it hands out. The properties | |||
apply to the client ID and all sessions associated with the client | apply to the client ID and all sessions associated with the client | |||
skipping to change at page 501, line 31 | skipping to change at page 502, line 31 | |||
on confirmed client IDs though the server MAY refuse to change | on confirmed client IDs though the server MAY refuse to change | |||
them. | them. | |||
o The state protection method used, one of SP4_NONE, SP4_MACH_CRED, | o The state protection method used, one of SP4_NONE, SP4_MACH_CRED, | |||
or SP4_SSV, as set by the spa_how field of the arguments to | or SP4_SSV, as set by the spa_how field of the arguments to | |||
EXCHANGE_ID. Once the client ID is confirmed, this property | EXCHANGE_ID. Once the client ID is confirmed, this property | |||
cannot be updated by subsequent EXCHANGE_ID requests. | cannot be updated by subsequent EXCHANGE_ID requests. | |||
o For SP4_MACH_CRED or SP4_SSV state protection: | o For SP4_MACH_CRED or SP4_SSV state protection: | |||
* The list of operations that MUST use the specified state | * The list of operations (spo_must_enforce) that MUST use the | |||
protection: spo_must_enforce, which come from the results of | specified state protection. This list comes from the results | |||
EXCHANGE_ID. | of EXCHANGE_ID. | |||
* The list of operations that MAY use the specified state | * The list of operations (spo_must_allow) that MAY use the | |||
protection: spo_must_allow, which come from the results of | specified state protection. This list comes from the results | |||
EXCHANGE_ID. | of EXCHANGE_ID. | |||
Once the client ID is confirmed, these properties cannot be | Once the client ID is confirmed, these properties cannot be | |||
updated by subsequent EXCHANGE_ID requests. | updated by subsequent EXCHANGE_ID requests. | |||
o For SP4_SSV protection: | o For SP4_SSV protection: | |||
* The OID of the hash algorithm. This property is represented by | * The OID of the hash algorithm. This property is represented by | |||
one of the algorithms in the ssp_hash_algs field of the | one of the algorithms in the ssp_hash_algs field of the | |||
EXCHANGE_ID arguments. Once the client ID is confirmed, this | EXCHANGE_ID arguments. Once the client ID is confirmed, this | |||
property cannot be updated by subsequent EXCHANGE_ID requests. | property cannot be updated by subsequent EXCHANGE_ID requests. | |||
skipping to change at page 502, line 41 | skipping to change at page 503, line 41 | |||
result of the above two invariants. | result of the above two invariants. | |||
+ key length SHOULD be >= hash length / 2. This is because | + key length SHOULD be >= hash length / 2. This is because | |||
the subkey derivation is via an HMAC and it is recommended | the subkey derivation is via an HMAC and it is recommended | |||
that if the HMAC has to be truncated, it should not be | that if the HMAC has to be truncated, it should not be | |||
truncated to less than half the hash length (see Section 4 | truncated to less than half the hash length (see Section 4 | |||
of RFC2104 [11]). | of RFC2104 [11]). | |||
* Number of concurrent versions of the SSV the client and server | * Number of concurrent versions of the SSV the client and server | |||
will support (Section 2.10.9). This property is represented by | will support (Section 2.10.9). This property is represented by | |||
spi_window, in the EXCHANGE_ID results. The property may be | spi_window in the EXCHANGE_ID results. The property may be | |||
updated by subsequent EXCHANGE_ID requests. | updated by subsequent EXCHANGE_ID requests. | |||
o The client's implementation ID as represented by the | o The client's implementation ID as represented by the | |||
eia_client_impl_id field of the arguments. The property may be | eia_client_impl_id field of the arguments. The property may be | |||
updated by subsequent EXCHANGE_ID requests. | updated by subsequent EXCHANGE_ID requests. | |||
o The server's implementation ID as represented by the | o The server's implementation ID as represented by the | |||
eir_server_impl_id field of the reply. The property may be | eir_server_impl_id field of the reply. The property may be | |||
updated by replies to subsequent EXCHANGE_ID requests. | updated by replies to subsequent EXCHANGE_ID requests. | |||
skipping to change at page 503, line 23 | skipping to change at page 504, line 23 | |||
The EXCHGID4_FLAG_UPD_CONFIRMED_REC_A bit can only be set in | The EXCHGID4_FLAG_UPD_CONFIRMED_REC_A bit can only be set in | |||
eia_flags; it is always off in eir_flags. The | eia_flags; it is always off in eir_flags. The | |||
EXCHGID4_FLAG_CONFIRMED_R bit can only be set in eir_flags; it is | EXCHGID4_FLAG_CONFIRMED_R bit can only be set in eir_flags; it is | |||
always off in eia_flags. If the server recognizes the co_ownerid and | always off in eia_flags. If the server recognizes the co_ownerid and | |||
co_verifier as mapping to a confirmed client ID, it sets | co_verifier as mapping to a confirmed client ID, it sets | |||
EXCHGID4_FLAG_CONFIRMED_R in eir_flags. The | EXCHGID4_FLAG_CONFIRMED_R in eir_flags. The | |||
EXCHGID4_FLAG_CONFIRMED_R flag allows a client to tell if the client | EXCHGID4_FLAG_CONFIRMED_R flag allows a client to tell if the client | |||
ID it is trying to create already exists and is confirmed. | ID it is trying to create already exists and is confirmed. | |||
If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set in eia_flags, this means | If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set in eia_flags, this means | |||
the client is attempting to update properties of an existing | that the client is attempting to update properties of an existing | |||
confirmed client ID (if the client wants to update properties of an | confirmed client ID (if the client wants to update properties of an | |||
unconfirmed client ID, it MUST NOT set | unconfirmed client ID, it MUST NOT set | |||
EXCHGID4_FLAG_UPD_CONFIRMED_REC_A). If so, it is RECOMMENDED the | EXCHGID4_FLAG_UPD_CONFIRMED_REC_A). If so, it is RECOMMENDED that | |||
client send the update EXCHANGE_ID operation in the same COMPOUND as | the client send the update EXCHANGE_ID operation in the same COMPOUND | |||
a SEQUENCE so that the EXCHANGE_ID is executed exactly once. Whether | as a SEQUENCE so that the EXCHANGE_ID is executed exactly once. | |||
the client can update the properties of client ID depends on the | Whether the client can update the properties of client ID depends on | |||
state protection it selected when the client ID was created, and the | the state protection it selected when the client ID was created, and | |||
principal and security flavor it uses when sending the EXCHANGE_ID | the principal and security flavor it uses when sending the | |||
request. The situations described in Sub-Paragraph 6, Sub- | EXCHANGE_ID request. The situations described in items 6, 7, 8, or 9 | |||
Paragraph 7, Sub-Paragraph 8, or Sub-Paragraph 9, of Paragraph 6 in | of the second numbered list of Section 18.35.4 will apply. Note that | |||
Section 18.35.4 will apply. Note that if the operation succeeds and | if the operation succeeds and returns a client ID that is already | |||
returns a client ID that is already confirmed, the server MUST set | confirmed, the server MUST set the EXCHGID4_FLAG_CONFIRMED_R bit in | |||
the EXCHGID4_FLAG_CONFIRMED_R bit in eir_flags. | eir_flags. | |||
If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set in eia_flags, this | If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set in eia_flags, this | |||
means the client is trying to establish a new client ID; it is | means that the client is trying to establish a new client ID; it is | |||
attempting to trunk data communication to the server | attempting to trunk data communication to the server | |||
(Section 2.10.5); or it is attempting to update properties of an | (Section 2.10.5); or it is attempting to update properties of an | |||
unconfirmed client ID. The situations described in Sub-Paragraph 1, | unconfirmed client ID. The situations described in items 1, 2, 3, 4, | |||
Sub-Paragraph 2, Sub-Paragraph 3, Sub-Paragraph 4, or Sub-Paragraph 5 | or 5 of the second numbered list of Section 18.35.4 will apply. Note | |||
of Paragraph 6 in Section 18.35.4 will apply. Note that if the | that if the operation succeeds and returns a client ID that was | |||
operation succeeds and returns a client ID that was previously | previously confirmed, the server MUST set the | |||
confirmed, the server MUST set the EXCHGID4_FLAG_CONFIRMED_R bit in | EXCHGID4_FLAG_CONFIRMED_R bit in eir_flags. | |||
eir_flags. | ||||
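
The flag handling described in the preceding paragraphs can be sketched as follows. The two constant values are not shown in this excerpt; they are assumed here to be the ones given in the protocol's XDR definition. The helper names applicable_items and reply_flags are hypothetical, and the use of ValueError is a placeholder because the excerpt does not state which error the server returns.

```python
# Assumed values (from the XDR definition, not visible in this excerpt).
EXCHGID4_FLAG_UPD_CONFIRMED_REC_A = 0x40000000
EXCHGID4_FLAG_CONFIRMED_R = 0x80000000


def applicable_items(eia_flags):
    """Which items of the second numbered list of Section 18.35.4 apply."""
    if eia_flags & EXCHGID4_FLAG_CONFIRMED_R:
        # CONFIRMED_R can only be set in eir_flags.
        raise ValueError("EXCHGID4_FLAG_CONFIRMED_R set in eia_flags")
    if eia_flags & EXCHGID4_FLAG_UPD_CONFIRMED_REC_A:
        return [6, 7, 8, 9]  # Update of a confirmed client ID.
    return [1, 2, 3, 4, 5]   # New client ID, trunking, or unconfirmed update.


def reply_flags(eir_flags, client_id_confirmed):
    """Apply the two direction rules to the flags of a successful reply."""
    # UPD_CONFIRMED_REC_A is always off in eir_flags.
    eir_flags &= ~EXCHGID4_FLAG_UPD_CONFIRMED_REC_A
    if client_id_confirmed:
        eir_flags |= EXCHGID4_FLAG_CONFIRMED_R
    return eir_flags
```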
When the EXCHGID4_FLAG_SUPP_MOVED_REFER flag bit is set, the client | When the EXCHGID4_FLAG_SUPP_MOVED_REFER flag bit is set, the client | |||
indicates that it is capable of dealing with an NFS4ERR_MOVED error | indicates that it is capable of dealing with an NFS4ERR_MOVED error | |||
as part of a referral sequence. When this bit is not set, it is | as part of a referral sequence. When this bit is not set, it is | |||
still legal for the server to perform a referral sequence. However, | still legal for the server to perform a referral sequence. However, | |||
a server may use the fact that the client is incapable of correctly | a server may use the fact that the client is incapable of correctly | |||
responding to a referral, by avoiding it for that particular client. | responding to a referral, by avoiding it for that particular client. | |||
It may, for instance, act as a proxy for that particular file system, | It may, for instance, act as a proxy for that particular file system, | |||
at some cost in performance, although it is not obligated to do so. | at some cost in performance, although it is not obligated to do so. | |||
If the server will potentially perform a referral, it MUST set | If the server will potentially perform a referral, it MUST set | |||
skipping to change at page 504, line 25 | skipping to change at page 505, line 24 | |||
when this in fact happens. However, a server may use the fact that | when this in fact happens. However, a server may use the fact that | |||
the client is incapable of correctly responding to a migration in its | the client is incapable of correctly responding to a migration in its | |||
scheduling of file systems to migrate so as to avoid migration of | scheduling of file systems to migrate so as to avoid migration of | |||
file systems being actively used. It may also hide actual migrations | file systems being actively used. It may also hide actual migrations | |||
from clients unable to deal with them by acting as a proxy for a | from clients unable to deal with them by acting as a proxy for a | |||
migrated file system for particular clients, at some cost in | migrated file system for particular clients, at some cost in | |||
performance, although it is not obligated to do so. If the server | performance, although it is not obligated to do so. If the server | |||
will potentially perform a migration, it MUST set | will potentially perform a migration, it MUST set | |||
EXCHGID4_FLAG_SUPP_MOVED_MIGR in eir_flags. | EXCHGID4_FLAG_SUPP_MOVED_MIGR in eir_flags. | |||
When EXCHGID4_FLAG_BIND_PRINC_STATEID is set, the client indicates it | When EXCHGID4_FLAG_BIND_PRINC_STATEID is set, the client indicates | |||
wants the server to bind the stateid to the principal. This means | that it wants the server to bind the stateid to the principal. This | |||
that when a principal creates a stateid, it has to be the one to use | means that when a principal creates a stateid, it has to be the one | |||
the stateid. If the server will perform binding it will return | to use the stateid. If the server will perform binding, it will | |||
EXCHGID4_FLAG_BIND_PRINC_STATEID. The server MAY return | return EXCHGID4_FLAG_BIND_PRINC_STATEID. The server MAY return | |||
EXCHGID4_FLAG_BIND_PRINC_STATEID even if the client does not request | EXCHGID4_FLAG_BIND_PRINC_STATEID even if the client does not request | |||
it. If an update to the client ID changes the value of | it. If an update to the client ID changes the value of | |||
EXCHGID4_FLAG_BIND_PRINC_STATEID's client ID property, the effect | EXCHGID4_FLAG_BIND_PRINC_STATEID's client ID property, the effect | |||
applies only to new stateids. Existing stateids (and all stateids | applies only to new stateids. Existing stateids (and all stateids | |||
with the same "other" field) that were created with stateid to | with the same "other" field) that were created with stateid to | |||
principal binding in force will continue to have binding in force. | principal binding in force will continue to have binding in force. | |||
Existing stateids (and all stateids with same "other" field) that | Existing stateids (and all stateids with the same "other" field) that | |||
were created with stateid to principal not in force will continue to | were created with stateid to principal not in force will continue to | |||
have binding not in force. | have binding not in force. | |||
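
The rule that a change to the binding property affects only new stateids can be illustrated with a toy model. ClientRecord and its methods are hypothetical names; a stateid is identified here only by its "other" field, and principals are plain strings.

```python
class ClientRecord:
    """Toy per-client-ID record kept by a server (illustrative only)."""

    def __init__(self, bind_princ_stateid):
        self.bind_princ_stateid = bind_princ_stateid
        # "other" field -> (creating principal, binding in force at creation)
        self.stateids = {}

    def create_stateid(self, other, principal):
        # The binding decision is captured at creation time.
        self.stateids[other] = (principal, self.bind_princ_stateid)

    def may_use(self, other, principal):
        creator, bound = self.stateids[other]
        # A bound stateid is usable only by the principal that created it.
        return (not bound) or principal == creator
```

Changing bind_princ_stateid on the record later leaves the entries already stored in stateids untouched, which mirrors the behavior described above.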
The EXCHGID4_FLAG_USE_NON_PNFS, EXCHGID4_FLAG_USE_PNFS_MDS, and | The EXCHGID4_FLAG_USE_NON_PNFS, EXCHGID4_FLAG_USE_PNFS_MDS, and | |||
EXCHGID4_FLAG_USE_PNFS_DS bits are described in Section 13.1 and | EXCHGID4_FLAG_USE_PNFS_DS bits are described in Section 13.1 and | |||
convey roles the client ID is to be used for in a pNFS environment. | convey roles the client ID is to be used for in a pNFS environment. | |||
The server MUST set one of the acceptable combinations of these bits | The server MUST set one of the acceptable combinations of these bits | |||
(roles) in eir_flags, as specified in Section 13.1. Note that the | (roles) in eir_flags, as specified in Section 13.1. Note that the | |||
same client owner/server owner pair can have multiple roles. | same client owner/server owner pair can have multiple roles. | |||
Multiple roles can be associated with the same client ID or with | Multiple roles can be associated with the same client ID or with | |||
different client IDs. Thus, if a client sends EXCHANGE_ID from the | different client IDs. Thus, if a client sends EXCHANGE_ID from the | |||
same client owner to the same server owner multiple times, but | same client owner to the same server owner multiple times, but | |||
specifies different pNFS roles each time, the server might return | specifies different pNFS roles each time, the server might return | |||
different client IDs. Given that different pNFS roles might have | different client IDs. Given that different pNFS roles might have | |||
different client IDs, the client may ask for different properties for | different client IDs, the client may ask for different properties for | |||
each role/client ID. | each role/client ID. | |||
The spa_how field of the eia_state_protect field specifies how the | The spa_how field of the eia_state_protect field specifies how the | |||
client wants to protect its client, locking and session state from | client wants to protect its client, locking, and session states from | |||
unauthorized changes (Section 2.10.8.3): | unauthorized changes (Section 2.10.8.3): | |||
o SP4_NONE. The client does not request the NFSv4.1 server to | o SP4_NONE. The client does not request the NFSv4.1 server to | |||
enforce state protection. The NFSv4.1 server MUST NOT enforce | enforce state protection. The NFSv4.1 server MUST NOT enforce | |||
state protection for the returned client ID. | state protection for the returned client ID. | |||
o SP4_MACH_CRED. If spa_how is SP4_MACH_CRED, then the client MUST | o SP4_MACH_CRED. If spa_how is SP4_MACH_CRED, then the client MUST | |||
send the EXCHANGE_ID request with RPCSEC_GSS as the security | send the EXCHANGE_ID request with RPCSEC_GSS as the security | |||
flavor, and with a service of RPC_GSS_SVC_INTEGRITY or | flavor, and with a service of RPC_GSS_SVC_INTEGRITY or | |||
RPC_GSS_SVC_PRIVACY. If SP4_MACH_CRED is specified, then the | RPC_GSS_SVC_PRIVACY. If SP4_MACH_CRED is specified, then the | |||
client wants to use an RPCSEC_GSS-based machine credential to | client wants to use an RPCSEC_GSS-based machine credential to | |||
protect its state. The server MUST note the principal the | protect its state. The server MUST note the principal the | |||
EXCHANGE_ID operation was sent with, and the GSS mechanism used. | EXCHANGE_ID operation was sent with, and the GSS mechanism used. | |||
These notes collectively comprise the machine credential. | These notes collectively comprise the machine credential. | |||
After the client ID is confirmed, as long as the lease associated | After the client ID is confirmed, as long as the lease associated | |||
with the client ID is unexpired, a subsequent EXCHANGE_ID | with the client ID is unexpired, a subsequent EXCHANGE_ID | |||
operation that uses the same eia_clientowner.co_owner as the first | operation that uses the same eia_clientowner.co_owner as the first | |||
EXCHANGE_ID, MUST also use the same machine credential as the | EXCHANGE_ID MUST also use the same machine credential as the first | |||
first EXCHANGE_ID. The server returns the same client ID for the | EXCHANGE_ID. The server returns the same client ID for the | |||
subsequent EXCHANGE_ID as that returned from the first | subsequent EXCHANGE_ID as that returned from the first | |||
EXCHANGE_ID. | EXCHANGE_ID. | |||
o SP4_SSV. If spa_how is SP4_SSV, then the client MUST send the | o SP4_SSV. If spa_how is SP4_SSV, then the client MUST send the | |||
EXCHANGE_ID request with RPCSEC_GSS as the security flavor, and | EXCHANGE_ID request with RPCSEC_GSS as the security flavor, and | |||
with a service of RPC_GSS_SVC_INTEGRITY or RPC_GSS_SVC_PRIVACY. | with a service of RPC_GSS_SVC_INTEGRITY or RPC_GSS_SVC_PRIVACY. | |||
If SP4_SSV is specified, then the client wants to use the SSV to | If SP4_SSV is specified, then the client wants to use the SSV to | |||
protect its state. The server records the credential used in the | protect its state. The server records the credential used in the | |||
request as the machine credential (as defined above) for the | request as the machine credential (as defined above) for the | |||
eia_clientowner.co_owner. The CREATE_SESSION operation that | eia_clientowner.co_owner. The CREATE_SESSION operation that | |||
confirms the client ID MUST use the same machine credential. | confirms the client ID MUST use the same machine credential. | |||
When a client specifies SP4_MACH_CRED or SP4_SSV, it also provides | When a client specifies SP4_MACH_CRED or SP4_SSV, it also provides | |||
two lists of operations (each expressed as a bit map). The first | two lists of operations (each expressed as a bitmap). The first list | |||
list is spo_must_enforce and consists of those operations the client | is spo_must_enforce and consists of those operations the client MUST | |||
MUST send (subject to the server confirming the list of operations in | send (subject to the server confirming the list of operations in the | |||
the result of EXCHANGE_ID) with the machine credential (if | result of EXCHANGE_ID) with the machine credential (if SP4_MACH_CRED | |||
SP4_MACH_CRED protection is specified) or the SSV-based credential | protection is specified) or the SSV-based credential (if SP4_SSV | |||
(if SP4_SSV protection is used). The client MUST send the operations | protection is used). The client MUST send the operations with | |||
with RPCSEC_GSS credentials that specify the RPC_GSS_SVC_INTEGRITY or | RPCSEC_GSS credentials that specify the RPC_GSS_SVC_INTEGRITY or | |||
RPC_GSS_SVC_PRIVACY security service. Typically the first list of | RPC_GSS_SVC_PRIVACY security service. Typically, the first list of | |||
operations includes EXCHANGE_ID, CREATE_SESSION, DELEGPURGE, | operations includes EXCHANGE_ID, CREATE_SESSION, DELEGPURGE, | |||
DESTROY_SESSION, BIND_CONN_TO_SESSION, and DESTROY_CLIENTID. The | DESTROY_SESSION, BIND_CONN_TO_SESSION, and DESTROY_CLIENTID. The | |||
client SHOULD NOT specify in this list any operations that require a | client SHOULD NOT specify in this list any operations that require a | |||
filehandle because the server's access policies MAY conflict with the | filehandle because the server's access policies MAY conflict with the | |||
client's choice, and thus the client would then be unable to access a | client's choice, and thus the client would then be unable to access a | |||
subset of the server's namespace. | subset of the server's namespace. | |||
Note that if SP4_SSV protection is specified, and the client | Note that if SP4_SSV protection is specified, and the client | |||
indicates that CREATE_SESSION must be protected with SP4_SSV, because | indicates that CREATE_SESSION must be protected with SP4_SSV, because | |||
the SSV cannot exist without a confirmed client ID, the first | the SSV cannot exist without a confirmed client ID, the first | |||
CREATE_SESSION MUST instead be sent using the machine credential, and | CREATE_SESSION MUST instead be sent using the machine credential, and | |||
the server MUST accept the machine credential. | the server MUST accept the machine credential. | |||
There is a corresponding result, also called spo_must_enforce, of the | There is a corresponding result, also called spo_must_enforce, of the | |||
operations the server will require SP4_MACH_CRED or SP4_SSV | operations for which the server will require SP4_MACH_CRED or SP4_SSV | |||
protection for. Normally the server's result equals the client's | protection. Normally, the server's result equals the client's | |||
argument, but the result MAY be different. If the client requests | argument, but the result MAY be different. If the client requests | |||
one or more operations in the set { EXCHANGE_ID, CREATE_SESSION, | one or more operations in the set { EXCHANGE_ID, CREATE_SESSION, | |||
DELEGPURGE, DESTROY_SESSION, BIND_CONN_TO_SESSION, DESTROY_CLIENTID | DELEGPURGE, DESTROY_SESSION, BIND_CONN_TO_SESSION, DESTROY_CLIENTID | |||
}, then the result spo_must_enforce MUST include the operations the | }, then the result spo_must_enforce MUST include the operations the | |||
client requested from that set. | client requested from that set. | |||
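
The constraint on the server's spo_must_enforce result can be sketched with sets of operation names in place of the on-the-wire bitmaps. The function name server_must_enforce and the server_policy parameter are hypothetical; the six operations are those listed in the paragraph above.

```python
MANDATORY_IF_REQUESTED = frozenset({
    "EXCHANGE_ID", "CREATE_SESSION", "DELEGPURGE",
    "DESTROY_SESSION", "BIND_CONN_TO_SESSION", "DESTROY_CLIENTID",
})


def server_must_enforce(requested, server_policy):
    """Compute a result list that satisfies the rule above.

    requested:     operations in the client's spo_must_enforce argument
    server_policy: operations the server chooses to enforce on its own
    """
    # Requested operations from the mandatory set MUST appear in the
    # result; everything else is left to the server's policy.
    must_keep = frozenset(requested) & MANDATORY_IF_REQUESTED
    return frozenset(server_policy) | must_keep
```

For example, a client that requests EXCHANGE_ID and OPEN is guaranteed to see EXCHANGE_ID in the result, while OPEN appears only if the server's own policy includes it.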
If spo_must_enforce in the results has BIND_CONN_TO_SESSION set, then | If spo_must_enforce in the results has BIND_CONN_TO_SESSION set, then | |||
connection binding enforcement is enabled, and the client MUST use | connection binding enforcement is enabled, and the client MUST use | |||
the machine (if SP4_MACH_CRED protection is used) or SSV (if SP4_SSV | the machine (if SP4_MACH_CRED protection is used) or SSV (if SP4_SSV | |||
protection is used) credential on calls to BIND_CONN_TO_SESSION. | protection is used) credential on calls to BIND_CONN_TO_SESSION. | |||
The second list is spo_must_allow and consists of those operations | The second list is spo_must_allow and consists of those operations | |||
the client wants to have the option of sending with the machine | the client wants to have the option of sending with the machine | |||
credential or the SSV-based credential, even if the object the | credential or the SSV-based credential, even if the object the | |||
operations are performed on is not owned by the machine or SSV | operations are performed on is not owned by the machine or SSV | |||
credential. | credential. | |||
The corresponding result, also called spo_must_allow, consists of the | The corresponding result, also called spo_must_allow, consists of the | |||
operations the server will allow the client to use SP4_SSV or | operations the server will allow the client to use SP4_SSV or | |||
SP4_MACH_CRED credentials with. Normally the server's result equals | SP4_MACH_CRED credentials with. Normally, the server's result equals | |||
the client's argument, but the result MAY be different. | the client's argument, but the result MAY be different. | |||
The purpose of spo_must_allow is to allow clients to solve the | The purpose of spo_must_allow is to allow clients to solve the | |||
following conundrum. Suppose the client ID is confirmed with | following conundrum. Suppose the client ID is confirmed with | |||
EXCHGID4_FLAG_BIND_PRINC_STATEID, and it calls OPEN with the | EXCHGID4_FLAG_BIND_PRINC_STATEID, and it calls OPEN with the | |||
RPCSEC_GSS credentials of a normal user. Now suppose the user's | RPCSEC_GSS credentials of a normal user. Now suppose the user's | |||
credentials expire, and cannot be renewed (e.g. a Kerberos ticket | credentials expire, and cannot be renewed (e.g., a Kerberos ticket | |||
granting ticket expires, and the user has logged off and will not be | granting ticket expires, and the user has logged off and will not be | |||
acquiring a new ticket granting ticket). The client will be unable | acquiring a new ticket granting ticket). The client will be unable | |||
to send CLOSE without the user's credentials, which is to say the | to send CLOSE without the user's credentials, which is to say the | |||
client has to either leave the state on the server, or it has to re- | client has to either leave the state on the server or re-send | |||
send EXCHANGE_ID with a new verifier to clear all state. That is, | EXCHANGE_ID with a new verifier to clear all state, that is, unless | |||
unless the client includes CLOSE on the list of operations in | the client includes CLOSE on the list of operations in spo_must_allow | |||
spo_must_allow and the server agrees. | and the server agrees. | |||
The SP4_SSV protection parameters also have: | The SP4_SSV protection parameters also have: | |||
ssp_hash_algs: | ssp_hash_algs: | |||
This is the set of algorithms the client supports for the purpose | This is the set of algorithms the client supports for the purpose | |||
of computing the digests needed for the internal SSV GSS mechanism | of computing the digests needed for the internal SSV GSS mechanism | |||
and for the SET_SSV operation. Each algorithm is specified as an | and for the SET_SSV operation. Each algorithm is specified as an | |||
object identifier (OID). The REQUIRED algorithms for a server are | object identifier (OID). The REQUIRED algorithms for a server are | |||
id-sha1, id-sha224, id-sha256, id-sha384, and id-sha512 [28]. The | id-sha1, id-sha224, id-sha256, id-sha384, and id-sha512 [28]. The | |||
skipping to change at page 507, line 41 | skipping to change at page 508, line 39 | |||
is empty, the server MUST return NFS4ERR_INVAL. Note that due to | is empty, the server MUST return NFS4ERR_INVAL. Note that due to | |||
previously stated requirements and recommendations on the | previously stated requirements and recommendations on the | |||
relationships between key length and hash length, some | relationships between key length and hash length, some | |||
combinations of RECOMMENDED and REQUIRED encryption algorithm and | combinations of RECOMMENDED and REQUIRED encryption algorithm and | |||
hash algorithm either SHOULD NOT or MUST NOT be used. Table 12 | hash algorithm either SHOULD NOT or MUST NOT be used. Table 12 | |||
summarizes the illegal and discouraged combinations. | summarizes the illegal and discouraged combinations. | |||
ssp_window: | ssp_window: | |||
This is the number of SSV versions the client wants the server to | This is the number of SSV versions the client wants the server to | |||
maintain (i.e. each successful call to SET_SSV produces a new | maintain (i.e., each successful call to SET_SSV produces a new | |||
version of the SSV). If ssp_window is zero, the server MUST | version of the SSV). If ssp_window is zero, the server MUST | |||
return NFS4ERR_INVAL. The server responds with spi_window, which | return NFS4ERR_INVAL. The server responds with spi_window, which | |||
MUST NOT exceed ssp_window, and MUST be at least one (1). Any | MUST NOT exceed ssp_window, and MUST be at least one. Any | |||
requests on the backchannel or fore channel that are using a | requests on the backchannel or fore channel that are using a | |||
version of the SSV that is outside the window will fail with an | version of the SSV that is outside the window will fail with an | |||
ONC RPC authentication error, and the requester will have to retry | ONC RPC authentication error, and the requester will have to retry | |||
them with the same slot ID and sequence ID. | them with the same slot ID and sequence ID. | |||
ssp_num_gss_handles: | ssp_num_gss_handles: | |||
This is the number of RPCSEC_GSS handles the server should create | This is the number of RPCSEC_GSS handles the server should create | |||
that are based on the GSS SSV mechanism (Section 2.10.9). It is | that are based on the GSS SSV mechanism (Section 2.10.9). It is | |||
not the total number of RPCSEC_GSS handles for the client ID. | not the total number of RPCSEC_GSS handles for the client ID. | |||
skipping to change at page 508, line 25 | skipping to change at page 509, line 23 | |||
ID is confirmed, which could be immediately if EXCHANGE_ID returns | ID is confirmed, which could be immediately if EXCHANGE_ID returns | |||
EXCHGID4_FLAG_CONFIRMED_R, or upon successful confirmation from | EXCHGID4_FLAG_CONFIRMED_R, or upon successful confirmation from | |||
CREATE_SESSION. | CREATE_SESSION. | |||
While a client ID can span all the connections that are connected | While a client ID can span all the connections that are connected | |||
to a server sharing the same eir_server_owner.so_major_id, the | to a server sharing the same eir_server_owner.so_major_id, the | |||
RPCSEC_GSS handles returned in spi_handles can only be used on | RPCSEC_GSS handles returned in spi_handles can only be used on | |||
connections connected to a server that returns the same | connections connected to a server that returns the same | |||
eir_server_owner.so_major_id and eir_server_owner.so_minor_id on | eir_server_owner.so_major_id and eir_server_owner.so_minor_id on | |||
each connection. It is permissible for the client to set | each connection. It is permissible for the client to set | |||
ssp_num_gss_handles to zero (0); the client can create more | ssp_num_gss_handles to zero; the client can create more handles | |||
handles with another EXCHANGE_ID call. | with another EXCHANGE_ID call. | |||
Because each SSV RPCSEC_GSS handle shares a common SSV GSS | Because each SSV RPCSEC_GSS handle shares a common SSV GSS | |||
context, there are security considerations specific to this | context, there are security considerations specific to this | |||
situation discussed in Section 2.10.10. | situation discussed in Section 2.10.10. | |||
The seq_window (see Section 5.2.3.1 of RFC2203 [4]) of each | The seq_window (see Section 5.2.3.1 of RFC2203 [4]) of each | |||
RPCSEC_GSS handle in spi_handles MUST be the same as the seq_window | RPCSEC_GSS handle in spi_handles MUST be the same as the seq_window | |||
of the RPCSEC_GSS handle used for the credential of the RPC | of the RPCSEC_GSS handle used for the credential of the RPC | |||
request that the EXCHANGE_ID request was sent with. | request that the EXCHANGE_ID request was sent with. | |||
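The ssp_window negotiation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the exception class Nfs4ErrInval, the function negotiate_ssv_window, and the server-side limit server_max_window are hypothetical names, not part of the protocol.

```python
class Nfs4ErrInval(Exception):
    """Stands in for an NFS4ERR_INVAL reply (hypothetical helper)."""


def negotiate_ssv_window(ssp_window, server_max_window):
    """Return spi_window for a client-requested ssp_window.

    server_max_window is an assumed local limit on how many SSV
    versions the server is willing to keep.
    """
    if ssp_window == 0:
        # A zero window is rejected outright.
        raise Nfs4ErrInval("ssp_window must be non-zero")
    # spi_window MUST NOT exceed ssp_window and MUST be at least one.
    return max(1, min(ssp_window, server_max_window))
```

Because ssp_window is at least one once the zero check has passed, clamping the result to a minimum of one never pushes it above ssp_window.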
skipping to change at page 508, line 49 | skipping to change at page 509, line 47 | |||
| Algorithm | with | with | | | Algorithm | with | with | | |||
+-------------------+----------------------+------------------------+ | +-------------------+----------------------+------------------------+ | |||
| id-aes128-CBC | | id-sha384, id-sha512 | | | id-aes128-CBC | | id-sha384, id-sha512 | | |||
| id-aes192-CBC | id-sha1 | id-sha512 | | | id-aes192-CBC | id-sha1 | id-sha512 | | |||
| id-aes256-CBC | id-sha1, id-sha224 | | | | id-aes256-CBC | id-sha1, id-sha224 | | | |||
+-------------------+----------------------+------------------------+ | +-------------------+----------------------+------------------------+ | |||
Table 12 | Table 12 | |||
The arguments include an array of up to one element in length called | The arguments include an array of up to one element in length called | |||
eia_client_impl_id. If eia_client_impl_id is present it contains the | eia_client_impl_id. If eia_client_impl_id is present, it contains | |||
information identifying the implementation of the client. Similarly, | the information identifying the implementation of the client. | |||
the results include an array of up to one element in length called | Similarly, the results include an array of up to one element in | |||
eir_server_impl_id that identifies the implementation of the server. | length called eir_server_impl_id that identifies the implementation | |||
of the server. Servers MUST accept a zero-length eia_client_impl_id | ||||
Servers MUST accept a zero length eia_client_impl_id array, and | array, and clients MUST accept a zero-length eir_server_impl_id | |||
clients MUST accept a zero length eir_server_impl_id array. | array. | |||
An example use for implementation identifiers would be diagnostic | An example use for implementation identifiers would be diagnostic | |||
software that extract this information in an attempt to identify | software that extracts this information in an attempt to identify | |||
interoperability problems, performance workload behaviors or general | interoperability problems, performance workload behaviors, or general | |||
usage statistics. Since the intent of having access to this | usage statistics. Since the intent of having access to this | |||
information is for planning or general diagnosis only, the client and | information is for planning or general diagnosis only, the client and | |||
server MUST NOT interpret this implementation identity information in | server MUST NOT interpret this implementation identity information in | |||
a way that affects interoperational behavior of the implementation. | a way that affects interoperational behavior of the implementation. | |||
The reason is that if clients and servers did such a thing, they | The reason is that if clients and servers did such a thing, they | |||
might use fewer capabilities of the protocol than the peer can | might use fewer capabilities of the protocol than the peer can | |||
support, or the client and server might refuse to interoperate. | support, or the client and server might refuse to interoperate. | |||
Because it is possible some implementations will violate the protocol | Because it is possible that some implementations will violate the | |||
specification and interpret the identity information, implementations | protocol specification and interpret the identity information, | |||
MUST allow the users of the NFSv4 client and server to set the | implementations MUST allow the users of the NFSv4 client and server | |||
contents of the sent nfs_impl_id structure to any value. | to set the contents of the sent nfs_impl_id structure to any value. | |||
18.35.4. IMPLEMENTATION | 18.35.4. IMPLEMENTATION | |||
A server's client record is a 5-tuple: | A server's client record is a 5-tuple: | |||
1. co_ownerid | 1. co_ownerid | |||
The client identifier string, from the eia_clientowner | The client identifier string, from the eia_clientowner | |||
structure of the EXCHANGE_ID4args structure | structure of the EXCHANGE_ID4args structure. | |||
2. co_verifier: | 2. co_verifier: | |||
A client-specific value used to indicate incarnations (where a | A client-specific value used to indicate incarnations (where a | |||
client restart represents a new incarnation), from the | client restart represents a new incarnation), from the | |||
eia_clientowner structure of the EXCHANGE_ID4args structure | eia_clientowner structure of the EXCHANGE_ID4args structure. | |||
3. principal: | 3. principal: | |||
The principal that was defined in the RPC header's credential | The principal that was defined in the RPC header's credential | |||
and/or verifier at the time the client record was established. | and/or verifier at the time the client record was established. | |||
4. client ID: | 4. client ID: | |||
The shorthand client identifier, generated by the server and | The shorthand client identifier, generated by the server and | |||
returned via the eir_clientid field in the EXCHANGE_ID4resok | returned via the eir_clientid field in the EXCHANGE_ID4resok | |||
structure | structure. | |||
5. confirmed: | 5. confirmed: | |||
A private field on the server indicating whether or not a | A private field on the server indicating whether or not a | |||
client record has been confirmed. A client record is | client record has been confirmed. A client record is | |||
confirmed if there has been a successful CREATE_SESSION | confirmed if there has been a successful CREATE_SESSION | |||
operation to confirm it. Otherwise it is unconfirmed. An | operation to confirm it. Otherwise, it is unconfirmed. An | |||
unconfirmed record is established by a EXCHANGE_ID call. Any | unconfirmed record is established by an EXCHANGE_ID call. Any | |||
unconfirmed record that is not confirmed within a lease period | unconfirmed record that is not confirmed within a lease period | |||
SHOULD be removed. | SHOULD be removed. | |||
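The 5-tuple above might be represented as follows. This is a sketch only: the field types, the created_at timestamp, and the helper should_remove are assumptions introduced for illustration, and a real server would keep considerably more state per client.

```python
from dataclasses import dataclass


@dataclass
class ClientRecord:
    co_ownerid: bytes        # client identifier string from eia_clientowner
    co_verifier: bytes       # changes with each client incarnation
    principal: str           # principal from the RPC credential/verifier
    client_id: int           # shorthand ID returned in eir_clientid
    confirmed: bool = False  # set once CREATE_SESSION confirms the record
    created_at: float = 0.0  # hypothetical timestamp, not part of the 5-tuple


def should_remove(record, now, lease_period):
    """True if an unconfirmed record has outlived one lease period.

    Removal is a SHOULD in the text above, so a caller may choose to
    defer it; confirmed records are never selected here.
    """
    return (not record.confirmed) and (now - record.created_at) > lease_period
```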
The following identifiers represent special values for the fields in | The following identifiers represent special values for the fields in | |||
the records. | the records. | |||
ownerid_arg: | ownerid_arg: | |||
The value of the eia_clientowner.co_ownerid subfield of the | The value of the eia_clientowner.co_ownerid subfield of the | |||
EXCHANGE_ID4args structure of the current request. | EXCHANGE_ID4args structure of the current request. | |||
skipping to change at page 511, line 17 | skipping to change at page 512, line 17 | |||
The client ID has been confirmed. | The client ID has been confirmed. | |||
unconfirmed: | unconfirmed: | |||
The client ID has not been confirmed. | The client ID has not been confirmed. | |||
Since EXCHANGE_ID is a non-idempotent operation, we must consider the | Since EXCHANGE_ID is a non-idempotent operation, we must consider the | |||
possibility that retries occur as a result of a client restart, | possibility that retries occur as a result of a client restart, | |||
network partition, malfunctioning router, etc. Retries are | network partition, malfunctioning router, etc. Retries are | |||
identified by the value of the eia_clientowner field of | identified by the value of the eia_clientowner field of | |||
EXCHANGE_ID4args and the method for dealing with them is outlined in | EXCHANGE_ID4args, and the method for dealing with them is outlined in | |||
the scenarios below. | the scenarios below. | |||
The scenarios are described in terms of the client record(s) a server | The scenarios are described in terms of the client record(s) a server | |||
has for a given co_ownerid. Note if the client ID was created | has for a given co_ownerid. Note that if the client ID was created | |||
specifying SP4_SSV state protection and EXCHANGE_ID as one of the | specifying SP4_SSV state protection and EXCHANGE_ID as one of the | |||
operations in spo_must_allow, then server MUST authorize EXCHANGE_IDs | operations in spo_must_allow, then the server MUST authorize | |||
with the SSV principal in addition to the principal that created the | EXCHANGE_IDs with the SSV principal in addition to the principal that | |||
client ID. | created the client ID. | |||
1. New Owner ID | 1. New Owner ID | |||
If the server has no client records with | If the server has no client records with | |||
eia_clientowner.co_ownerid matching ownerid_arg, and | eia_clientowner.co_ownerid matching ownerid_arg, and | |||
EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set in the | EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set in the | |||
EXCHANGE_ID, then a new shorthand client ID (let us call it | EXCHANGE_ID, then a new shorthand client ID (let us call it | |||
clientid_ret) is generated, and the following unconfirmed | clientid_ret) is generated, and the following unconfirmed | |||
record is added to the server's state. | record is added to the server's state. | |||
skipping to change at page 512, line 39 | skipping to change at page 513, line 39 | |||
a new shorthand client ID is generated, and the following | a new shorthand client ID is generated, and the following | |||
unconfirmed record is added to the server's state. | unconfirmed record is added to the server's state. | |||
{ ownerid_arg, verifier_arg, principal_arg, clientid_ret, | { ownerid_arg, verifier_arg, principal_arg, clientid_ret, | |||
unconfirmed } | unconfirmed } | |||
Subsequently, the server returns clientid_ret. | Subsequently, the server returns clientid_ret. | |||
If old_clientid_ret has an unexpired lease with state, then no | If old_clientid_ret has an unexpired lease with state, then no | |||
state of old_clientid_ret is changed or deleted. The server | state of old_clientid_ret is changed or deleted. The server | |||
returns NFS4ERR_CLID_INUSE to indicate the client should retry | returns NFS4ERR_CLID_INUSE to indicate that the client should | |||
with a different value for the eia_clientowner.co_ownerid | retry with a different value for the | |||
subfield of EXCHANGE_ID4args. The client record is not | eia_clientowner.co_ownerid subfield of EXCHANGE_ID4args. The | |||
changed. | client record is not changed. | |||
4. Replacement of Unconfirmed Record | 4. Replacement of Unconfirmed Record | |||
If the EXCHGID4_FLAG_UPD_CONFIRMED_REC_A flag is not set, and | If the EXCHGID4_FLAG_UPD_CONFIRMED_REC_A flag is not set, and | |||
the server has the following unconfirmed record then the | the server has the following unconfirmed record, then the | |||
client is attempting EXCHANGE_ID again on an unconfirmed | client is attempting EXCHANGE_ID again on an unconfirmed | |||
client ID, perhaps due to a retry, or perhaps due to a client | client ID, perhaps due to a retry, a client restart before | |||
restart before client ID confirmation (i.e. before | client ID confirmation (i.e., before CREATE_SESSION was | |||
CREATE_SESSION was called), or some other reason. | called), or some other reason. | |||
{ ownerid_arg, *, *, old_clientid_ret, unconfirmed } | { ownerid_arg, *, *, old_clientid_ret, unconfirmed } | |||
It is possible the properties of old_clientid_ret are | It is possible that the properties of old_clientid_ret are | |||
different than those specified in the current EXCHANGE_ID. | different than those specified in the current EXCHANGE_ID. | |||
Whether the properties are being updated or not, to eliminate | Whether or not the properties are being updated, to eliminate | |||
ambiguity, the server deletes the unconfirmed record, | ambiguity, the server deletes the unconfirmed record, | |||
generates a new client ID (clientid_ret) and establishes the | generates a new client ID (clientid_ret), and establishes the | |||
following unconfirmed record: | following unconfirmed record: | |||
{ ownerid_arg, verifier_arg, principal_arg, clientid_ret, | { ownerid_arg, verifier_arg, principal_arg, clientid_ret, | |||
unconfirmed } | unconfirmed } | |||
5. Client Restart | 5. Client Restart | |||
If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set, and if the | If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is not set, and if the | |||
server has the following confirmed client record, then this | server has the following confirmed client record, then this | |||
request is likely from a previously confirmed client which has | request is likely from a previously confirmed client that has | |||
restarted. | restarted. | |||
{ ownerid_arg, old_verifier_arg, principal_arg, | { ownerid_arg, old_verifier_arg, principal_arg, | |||
old_clientid_ret, confirmed } | old_clientid_ret, confirmed } | |||
Since the previous incarnation of the same client will no | Since the previous incarnation of the same client will no | |||
longer be making requests, once the new client ID is confirmed | longer be making requests, once the new client ID is confirmed | |||
by CREATE_SESSION, lock and share reservations should be | by CREATE_SESSION, byte-range locks and share reservations | |||
released immediately rather than forcing the new incarnation | should be released immediately rather than forcing the new | |||
to wait for the lease time on the previous incarnation to | incarnation to wait for the lease time on the previous | |||
expire. Furthermore, session state should be removed since if | incarnation to expire. Furthermore, session state should be | |||
the client had maintained that information across restart, | removed since if the client had maintained that information | |||
this request would not have been sent. If the server does not | across restart, this request would not have been sent. If the | |||
support the CLAIM_DELEGATE_PREV claim type, associated | server supports neither the CLAIM_DELEGATE_PREV nor | |||
delegations should be purged as well; otherwise, delegations | CLAIM_DELEG_PREV_FH claim types, associated delegations should | |||
are retained and recovery proceeds according to | be purged as well; otherwise, delegations are retained and | |||
Section 10.2.1. | recovery proceeds according to Section 10.2.1. | |||
After processing, clientid_ret is returned to the client and | After processing, clientid_ret is returned to the client and | |||
this client record is added: | this client record is added: | |||
{ ownerid_arg, verifier_arg, principal_arg, clientid_ret, | { ownerid_arg, verifier_arg, principal_arg, clientid_ret, | |||
unconfirmed } | unconfirmed } | |||
The previously described confirmed record continues to exist, | The previously described confirmed record continues to exist, | |||
and thus the same ownerid_arg exists in both a confirmed and | and thus the same ownerid_arg exists in both a confirmed and | |||
unconfirmed state at the same time. The number of states can | unconfirmed state at the same time. The number of states can | |||
skipping to change at page 514, line 45 | skipping to change at page 515, line 45 | |||
If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server | If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server | |||
has no confirmed record corresponding to ownerid_arg, then the | has no confirmed record corresponding to ownerid_arg, then the | |||
server returns NFS4ERR_NOENT and leaves any unconfirmed record | server returns NFS4ERR_NOENT and leaves any unconfirmed record | |||
intact. | intact. | |||
8. Update but Wrong Verifier | 8. Update but Wrong Verifier | |||
If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server | If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server | |||
has the following confirmed record, then this request is an | has the following confirmed record, then this request is an | |||
illegal attempt at an update, perhaps because of a retry from | illegal attempt at an update, perhaps because of a retry from | |||
an previous client incarnation. | a previous client incarnation. | |||
{ ownerid_arg, old_verifier_arg, *, clientid_ret, confirmed } | { ownerid_arg, old_verifier_arg, *, clientid_ret, confirmed } | |||
The server returns NFS4ERR_NOT_SAME and leaves the client | The server returns NFS4ERR_NOT_SAME and leaves the client | |||
record intact. | record intact. | |||
9. Update but Wrong Principal | 9. Update but Wrong Principal | |||
If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server | If EXCHGID4_FLAG_UPD_CONFIRMED_REC_A is set, and the server | |||
has the following confirmed record, then this request is an | has the following confirmed record, then this request is an | |||
skipping to change at page 516, line 36 | skipping to change at page 517, line 36 | |||
This operation is used by the client to create new session objects on | This operation is used by the client to create new session objects on | |||
the server. | the server. | |||
CREATE_SESSION can be sent with or without a preceding SEQUENCE | CREATE_SESSION can be sent with or without a preceding SEQUENCE | |||
operation in the same COMPOUND procedure. If CREATE_SESSION is sent | operation in the same COMPOUND procedure. If CREATE_SESSION is sent | |||
with a preceding SEQUENCE operation, any session created by | with a preceding SEQUENCE operation, any session created by | |||
CREATE_SESSION has no direct relation to the session specified in the | CREATE_SESSION has no direct relation to the session specified in the | |||
SEQUENCE operation, although the two sessions might be associated | SEQUENCE operation, although the two sessions might be associated | |||
with the same client ID. If CREATE_SESSION is sent without a | with the same client ID. If CREATE_SESSION is sent without a | |||
preceding SEQUENCE, then it MUST be the only operation in the | preceding SEQUENCE, then it MUST be the only operation in the | |||
COMPOUND procedure's request. If is not, the server MUST return | COMPOUND procedure's request. If it is not, the server MUST return | |||
NFS4ERR_NOT_ONLY_OP. | NFS4ERR_NOT_ONLY_OP. | |||
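The placement rule above can be sketched as a check on the list of operations in a COMPOUND request. Modeling operations as strings and the function name check_create_session_placement are assumptions made for illustration; the sketch treats "preceding SEQUENCE" as SEQUENCE being the first operation of the request.

```python
def check_create_session_placement(ops):
    """Return the status a server would give for CREATE_SESSION placement.

    ops -- operation names in the order they appear in the COMPOUND request.
    """
    if "CREATE_SESSION" not in ops:
        return "NFS4_OK"  # rule does not apply
    if ops[0] == "SEQUENCE":
        return "NFS4_OK"  # preceded by SEQUENCE: other operations allowed
    # Without a preceding SEQUENCE, CREATE_SESSION has to stand alone.
    return "NFS4_OK" if len(ops) == 1 else "NFS4ERR_NOT_ONLY_OP"
```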
In addition to creating a session, CREATE_SESSION has the following | In addition to creating a session, CREATE_SESSION has the following | |||
effects: | effects: | |||
o The first session created with a new client ID serves to confirm | o The first session created with a new client ID serves to confirm | |||
the creation of that client's state on the server. The server | the creation of that client's state on the server. The server | |||
returns the parameter values for the new session. | returns the parameter values for the new session. | |||
o The connection CREATE_SESSION is sent over is associated with the | o The connection that CREATE_SESSION is sent over is associated with | |||
session's fore channel. | the session's fore channel. | |||
The arguments and results of CREATE_SESSION are described as follows: | The arguments and results of CREATE_SESSION are described as follows: | |||
csa_clientid: | csa_clientid: | |||
This is the client ID the new session will be associated with. | This is the client ID with which the new session will be | |||
The corresponding result is csr_sessionid, the session ID of the | associated. The corresponding result is csr_sessionid, the | |||
new session. | session ID of the new session. | |||
csa_sequence: | csa_sequence: | |||
Each client ID serializes CREATE_SESSION via a per client ID | Each client ID serializes CREATE_SESSION via a per-client ID | |||
sequence number (see Section 18.36.4). The corresponding result | sequence number (see Section 18.36.4). The corresponding result | |||
is csr_sequence, which MUST be equal to csa_sequence. | is csr_sequence, which MUST be equal to csa_sequence. | |||
In the next three arguments, the client offers a value that is to be | In the next three arguments, the client offers a value that is to be | |||
a property of the session. Except where otherwise stated, it is | a property of the session. Except where stated otherwise, it is | |||
RECOMMENDED that the server accept the value, and if it is not | RECOMMENDED that the server accept the value. If it is not | |||
acceptable, the server MAY use a different value. Regardless, the | acceptable, the server MAY use a different value. Regardless, the | |||
value the server returns (which will be either what the client | server MUST return the value the session will use (which will be | |||
offered, or what the server is insisting on) will be the value the | either what the client offered, or what the server is insisting on) | |||
session uses. | to the client. | |||
csa_flags: | csa_flags: | |||
The csa_flags field contains a list of the following flag bits: | The csa_flags field contains a list of the following flag bits: | |||
CREATE_SESSION4_FLAG_PERSIST: | CREATE_SESSION4_FLAG_PERSIST: | |||
If CREATE_SESSION4_FLAG_PERSIST is set, the client wants the | If CREATE_SESSION4_FLAG_PERSIST is set, the client wants the | |||
server to provide a persistent reply cache. For sessions in | server to provide a persistent reply cache. For sessions in | |||
which only idempotent operations will be used (e.g. a read-only | which only idempotent operations will be used (e.g., a read- | |||
session), clients SHOULD NOT set CREATE_SESSION4_FLAG_PERSIST. | only session), clients SHOULD NOT set | |||
If the server does not or cannot provide a persistent reply | CREATE_SESSION4_FLAG_PERSIST. If the server does not or cannot | |||
cache, the server MUST NOT set CREATE_SESSION4_FLAG_PERSIST in | provide a persistent reply cache, the server MUST NOT set | |||
the field csr_flags. | CREATE_SESSION4_FLAG_PERSIST in the field csr_flags. | |||
If the server is a pNFS metadata server, for reasons described | If the server is a pNFS metadata server, for reasons described | |||
in Section 12.5.2 it SHOULD support | in Section 12.5.2 it SHOULD support | |||
CREATE_SESSION4_FLAG_PERSIST if it supports the layout_hint | CREATE_SESSION4_FLAG_PERSIST if it supports the layout_hint | |||
(Section 5.12.4) attribute. | (Section 5.12.4) attribute. | |||
CREATE_SESSION4_FLAG_CONN_BACK_CHAN: | CREATE_SESSION4_FLAG_CONN_BACK_CHAN: | |||
If CREATE_SESSION4_FLAG_CONN_BACK_CHAN is set in csa_flags, the | If CREATE_SESSION4_FLAG_CONN_BACK_CHAN is set in csa_flags, the | |||
client is requesting that the server use the connection | client is requesting that the connection over which the | |||
CREATE_SESSION is called over for the backchannel as well as | CREATE_SESSION operation arrived be associated with the | |||
the fore channel. The server sets | session's backchannel in addition to its fore channel. If the | |||
CREATE_SESSION4_FLAG_CONN_BACK_CHAN in the result field | server agrees, it sets CREATE_SESSION4_FLAG_CONN_BACK_CHAN in | |||
csr_flags if it agrees. If CREATE_SESSION4_FLAG_CONN_BACK_CHAN | the result field csr_flags. If | |||
is not set in csa_flags, then | CREATE_SESSION4_FLAG_CONN_BACK_CHAN is not set in csa_flags, | |||
CREATE_SESSION4_FLAG_CONN_BACK_CHAN MUST NOT be set in | then CREATE_SESSION4_FLAG_CONN_BACK_CHAN MUST NOT be set in | |||
csr_flags. | csr_flags. | |||
CREATE_SESSION4_FLAG_CONN_RDMA: | CREATE_SESSION4_FLAG_CONN_RDMA: | |||
If CREATE_SESSION4_FLAG_CONN_RDMA is set in csa_flags, and if | If CREATE_SESSION4_FLAG_CONN_RDMA is set in csa_flags, and if | |||
the connection CREATE_SESSION is called over is currently in | the connection over which the CREATE_SESSION operation arrived | |||
non-RDMA mode, but has the capability to operate in RDMA mode, | is currently in non-RDMA mode but has the capability to operate | |||
then client is requesting the server agree to "step up" to RDMA | in RDMA mode, then the client is requesting that the server | |||
mode on the connection. The server sets | "step up" to RDMA mode on the connection. If the server | |||
CREATE_SESSION4_FLAG_CONN_RDMA in the result field csr_flags if | agrees, it sets CREATE_SESSION4_FLAG_CONN_RDMA in the result | |||
it agrees. If CREATE_SESSION4_FLAG_CONN_RDMA is not set in | field csr_flags. If CREATE_SESSION4_FLAG_CONN_RDMA is not set | |||
csa_flags, then CREATE_SESSION4_FLAG_CONN_RDMA MUST NOT be set | in csa_flags, then CREATE_SESSION4_FLAG_CONN_RDMA MUST NOT be | |||
in csr_flags. Note that once the server agrees to step up, it | set in csr_flags. Note that once the server agrees to step up, | |||
and the client MUST exchange all future traffic on the | it and the client MUST exchange all future traffic on the | |||
connection with RPC RDMA framing and not Record Marking ([8]). | connection with RPC RDMA framing and not Record Marking ([8]). | |||
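The flag rules above amount to a bitwise intersection: the server may echo a flag in csr_flags only if the client set it in csa_flags. The following is a minimal Python sketch of that rule, not a definitive implementation. The numeric flag values are assumed from the protocol's XDR description (they do not appear in this excerpt), and the names `negotiate_flags`, `reply_is_valid`, and `server_accepts` are hypothetical.

```python
# Flag values assumed from the NFSv4.1 XDR description; not shown in this excerpt.
CREATE_SESSION4_FLAG_PERSIST = 0x00000001
CREATE_SESSION4_FLAG_CONN_BACK_CHAN = 0x00000002
CREATE_SESSION4_FLAG_CONN_RDMA = 0x00000004


def negotiate_flags(csa_flags: int, server_accepts: int) -> int:
    """Return csr_flags: the flags the client requested that the server agrees to.

    server_accepts is a hypothetical bitmask describing local server policy.
    """
    return csa_flags & server_accepts


def reply_is_valid(csa_flags: int, csr_flags: int) -> bool:
    """A flag MUST NOT be set in csr_flags unless it was set in csa_flags."""
    return (csr_flags & ~csa_flags) == 0
```

A client could apply `reply_is_valid` to a received reply to detect a server that sets a flag the client never requested.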
csa_fore_chan_attrs, csa_back_chan_attrs: | csa_fore_chan_attrs, csa_back_chan_attrs: | |||
The csa_fore_chan_attrs and csa_back_chan_attrs fields apply to | The csa_fore_chan_attrs and csa_back_chan_attrs fields apply to | |||
attributes of the fore channel (which conveys requests originating | attributes of the fore channel (which conveys requests originating | |||
from the client to the server), and the backchannel (the channel | from the client to the server), and the backchannel (the channel | |||
that conveys callback requests originating from the server to the | that conveys callback requests originating from the server to the | |||
client), respectively. The results are in corresponding | client), respectively. The results are in corresponding | |||
structures called csr_fore_chan_attrs and csr_back_chan_attrs. | structures called csr_fore_chan_attrs and csr_back_chan_attrs. | |||
skipping to change at page 519, line 30 | skipping to change at page 520, line 30 | |||
reply. After the session is created, if a requester sends a | reply. After the session is created, if a requester sends a | |||
request for which the size of the reply would exceed this | request for which the size of the reply would exceed this | |||
value, the replier will return NFS4ERR_REP_TOO_BIG, per the | value, the replier will return NFS4ERR_REP_TOO_BIG, per the | |||
description in Section 2.10.6.4. | description in Section 2.10.6.4. | |||
ca_maxresponsesize_cached: | ca_maxresponsesize_cached: | |||
Like ca_maxresponsesize, but the maximum size of a reply that | Like ca_maxresponsesize, but the maximum size of a reply that | |||
will be stored in the reply cache (Section 2.10.6.1). For each | will be stored in the reply cache (Section 2.10.6.1). For each | |||
channel, the server MAY decrease this value, but MUST NOT | channel, the server MAY decrease this value, but MUST NOT | |||
increase it. If the reply to CREATE_SESSION has the value | increase it. If, in the reply to CREATE_SESSION, the value of | |||
ca_maxresponsesize_cached less than the value | ca_maxresponsesize_cached of a channel is less than the value | |||
ca_maxresponsesize, then this is an indication to the requester | of ca_maxresponsesize of the same channel, then this is an | |||
on the channel that it needs to be selective about which | indication to the requester that it needs to be selective about | |||
replies it directs the replier to cache; for example large | which replies it directs the replier to cache; for example, | |||
replies from nonidempotent operations (e.g. COMPOUND requests | large replies from nonidempotent operations (e.g., COMPOUND | |||
with a READ operation), should not be cached. The requester | requests with a READ operation) should not be cached. The | |||
decides which replies to cache via an argument to the SEQUENCE | requester decides which replies to cache via an argument to the | |||
(the sa_cachethis field, see Section 18.46) or CB_SEQUENCE (the | SEQUENCE (the sa_cachethis field, see Section 18.46) or | |||
csa_cachethis field, see Section 20.9) operations. After the | CB_SEQUENCE (the csa_cachethis field, see Section 20.9) | |||
session is created, if a requester sends a request for which | operations. After the session is created, if a requester sends | |||
the size of the reply would exceed this value, the replier will | a request for which the size of the reply would exceed | |||
return NFS4ERR_REP_TOO_BIG_TO_CACHE, per the description in | ca_maxresponsesize_cached, the replier will return | |||
NFS4ERR_REP_TOO_BIG_TO_CACHE, per the description in | ||||
Section 2.10.6.4. | Section 2.10.6.4. | |||
ca_maxoperations: | ca_maxoperations: | |||
The maximum number of operations the replier will accept in a | The maximum number of operations the replier will accept in a | |||
COMPOUND or CB_COMPOUND. For the backchannel, the server MUST | COMPOUND or CB_COMPOUND. For the backchannel, the server MUST | |||
NOT change the value the client offers. For the fore channel, | NOT change the value the client offers. For the fore channel, | |||
the server MAY change the requested value. After the session | the server MAY change the requested value. After the session | |||
is created, if a requester sends a COMPOUND or CB_COMPOUND with | is created, if a requester sends a COMPOUND or CB_COMPOUND with | |||
more operations than ca_maxoperations, the replier MUST return | more operations than ca_maxoperations, the replier MUST return | |||
NFS4ERR_TOO_MANY_OPS. | NFS4ERR_TOO_MANY_OPS. | |||
ca_maxrequests: | ca_maxrequests: | |||
The maximum number of concurrent COMPOUND or CB_COMPOUND | The maximum number of concurrent COMPOUND or CB_COMPOUND | |||
requests the requester will send on the session. Subsequent | requests the requester will send on the session. Subsequent | |||
requests will each be assigned a slot identifier by the | requests will each be assigned a slot identifier by the | |||
requester within the range 0 to ca_maxrequests - 1 inclusive. | requester within the range zero to ca_maxrequests - 1 | |||
For the backchannel, the server MUST NOT change the value the | inclusive. For the backchannel, the server MUST NOT change the | |||
client offers. For the fore channel, the server MAY change the | value the client offers. For the fore channel, the server MAY | |||
requested value. | change the requested value. | |||
ca_rdma_ird: | ca_rdma_ird: | |||
This array has a maximum of one element. If this array has one | This array has a maximum of one element. If this array has one | |||
element, then the element contains the inbound RDMA read queue | element, then the element contains the inbound RDMA read queue | |||
depth (IRD). For each channel, the server MAY decrease this | depth (IRD). For each channel, the server MAY decrease this | |||
value, but MUST NOT increase it. | value, but MUST NOT increase it. | |||
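The per-field negotiation rules above can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the `ChannelAttrs` class, the `negotiate` function, and the `server_limits` parameter are hypothetical, only three of the channel attributes are modeled, and clamping fore channel values with `min` is one possible policy (the text only says the server MAY change them).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ChannelAttrs:
    # Hypothetical subset of the channel attributes discussed above.
    ca_maxresponsesize_cached: int
    ca_maxoperations: int
    ca_maxrequests: int


def negotiate(offered: ChannelAttrs, server_limits: ChannelAttrs,
              backchannel: bool) -> ChannelAttrs:
    """Return the attributes the server would place in its reply."""
    # For each channel the server MAY decrease this value, never increase it.
    cached = min(offered.ca_maxresponsesize_cached,
                 server_limits.ca_maxresponsesize_cached)
    if backchannel:
        # Backchannel: ca_maxoperations and ca_maxrequests MUST NOT change.
        return replace(offered, ca_maxresponsesize_cached=cached)
    # Fore channel: the server MAY change both; this sketch clamps them.
    return ChannelAttrs(
        ca_maxresponsesize_cached=cached,
        ca_maxoperations=min(offered.ca_maxoperations,
                             server_limits.ca_maxoperations),
        ca_maxrequests=min(offered.ca_maxrequests,
                           server_limits.ca_maxrequests),
    )


def slot_id_in_range(slot_id: int, attrs: ChannelAttrs) -> bool:
    """Slot identifiers run from zero to ca_maxrequests - 1 inclusive."""
    return 0 <= slot_id < attrs.ca_maxrequests
```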
csa_cb_program | csa_cb_program | |||
skipping to change at page 521, line 6 | skipping to change at page 522, line 7 | |||
specified, then the server is allowed to use the RPCSEC_GSS | specified, then the server is allowed to use the RPCSEC_GSS | |||
context specified in cbsp_gss_parms as the RPCSEC_GSS context in | context specified in cbsp_gss_parms as the RPCSEC_GSS context in | |||
the credential of the RPC header of callbacks to the client. | the credential of the RPC header of callbacks to the client. | |||
There is no corresponding result. | There is no corresponding result. | |||
The RPCSEC_GSS context for the backchannel is specified via a pair | The RPCSEC_GSS context for the backchannel is specified via a pair | |||
of values of data type gsshandle4_t. The data type gsshandle4_t | of values of data type gsshandle4_t. The data type gsshandle4_t | |||
represents an RPCSEC_GSS handle, and is precisely the same as the | represents an RPCSEC_GSS handle, and is precisely the same as the | |||
data type of the "handle" field of the rpc_gss_init_res data type | data type of the "handle" field of the rpc_gss_init_res data type | |||
defined in Section 5.2.3.1, "Context Creation Response - | defined in Section 5.2.3.1, "Context Creation Response - | |||
Successful Acceptance" of [4]. | Successful Acceptance", of [4]. | |||
The first RPCSEC_GSS handle, gcbp_handle_from_server, is the fore | The first RPCSEC_GSS handle, gcbp_handle_from_server, is the fore | |||
handle the server returned to the client (either in the handle | handle the server returned to the client (either in the handle | |||
field of data type rpc_gss_init_res or one of the elements of the | field of data type rpc_gss_init_res or as one of the elements of | |||
spi_handles field returned in the reply to EXCHANGE_ID) when the | the spi_handles field returned in the reply to EXCHANGE_ID) when | |||
RPCSEC_GSS context was created on the server. The second handle, | the RPCSEC_GSS context was created on the server. The second | |||
gcbp_handle_from_client, is the back handle the client will map | handle, gcbp_handle_from_client, is the back handle to which the | |||
the RPCSEC_GSS context to. The server can immediately use the | client will map the RPCSEC_GSS context. The server can | |||
value of gcbp_handle_from_client in the RPCSEC_GSS credential in | immediately use the value of gcbp_handle_from_client in the | |||
callback RPCs. I.e., the value in gcbp_handle_from_client can be | RPCSEC_GSS credential in callback RPCs. That is, the value in | |||
used as the value of the field "handle" in data type | gcbp_handle_from_client can be used as the value of the field | |||
rpc_gss_cred_t (see Section 5, "Elements of the RPCSEC_GSS | "handle" in data type rpc_gss_cred_t (see Section 5, "Elements of | |||
Security Protocol" of [4]) in callback RPCs. The server MUST use | the RPCSEC_GSS Security Protocol", of [4]) in callback RPCs. The | |||
the RPCSEC_GSS security service specified in gcbp_service, i.e. it | server MUST use the RPCSEC_GSS security service specified in | |||
MUST set the "service" field of the rpc_gss_cred_t data type in | gcbp_service, i.e., it MUST set the "service" field of the | |||
RPCSEC_GSS credential to the value of gcbp_service (see Section | rpc_gss_cred_t data type in RPCSEC_GSS credential to the value of | |||
5.3.1, "RPC Request Header", of [4]). | gcbp_service (see Section 5.3.1, "RPC Request Header", of [4]). | |||
If the RPCSEC_GSS handle identified by gcbp_handle_from_server | If the RPCSEC_GSS handle identified by gcbp_handle_from_server | |||
does not exist on the server, the server will return | does not exist on the server, the server will return | |||
NFS4ERR_NOENT. | NFS4ERR_NOENT. | |||
Within each element of csa_sec_parms, the fore and back RPCSEC_GSS | Within each element of csa_sec_parms, the fore and back RPCSEC_GSS | |||
contexts MUST share the same GSS context and MUST have the same | contexts MUST share the same GSS context and MUST have the same | |||
seq_window (see Section 5.2.3.1 of RFC2203 [4]). The fore and | seq_window (see Section 5.2.3.1 of RFC2203 [4]). The fore and | |||
back RPCSEC_GSS context state are independent of each other as far | back RPCSEC_GSS context state are independent of each other as far | |||
as the RPCSEC_GSS sequence number (see the seq_num field in the | as the RPCSEC_GSS sequence number (see the seq_num field in the | |||
rpc_gss_cred_t data type of Section 5 and of Section 5.3.1, "RPC | rpc_gss_cred_t data type of Sections 5 and 5.3.1 of [4]). | |||
Request Header", of RFC2203). | ||||
If an RPCSEC_GSS handle is using the SSV context (see | If an RPCSEC_GSS handle is using the SSV context (see | |||
Section 2.10.9), then because each SSV RPCSEC_GSS handle shares a | Section 2.10.9), then because each SSV RPCSEC_GSS handle shares a | |||
common SSV GSS context, there are security considerations specific | common SSV GSS context, there are security considerations specific | |||
to this situation discussed in Section 2.10.10. | to this situation discussed in Section 2.10.10. | |||
Once the session is created, the first SEQUENCE or CB_SEQUENCE | Once the session is created, the first SEQUENCE or CB_SEQUENCE | |||
received on a slot MUST have a sequence ID equal to 1; if not the | received on a slot MUST have a sequence ID equal to 1; if not, the | |||
server MUST return NFS4ERR_SEQ_MISORDERED. | replier MUST return NFS4ERR_SEQ_MISORDERED. | |||
18.36.4. IMPLEMENTATION | 18.36.4. IMPLEMENTATION | |||
To describe a possible implementation, the same notation for client | To describe a possible implementation, the same notation for client | |||
records introduced in the description of EXCHANGE_ID is used with the | records introduced in the description of EXCHANGE_ID is used with the | |||
following addition: | following addition: | |||
clientid_arg: The value of the csa_clientid field of the | clientid_arg: The value of the csa_clientid field of the | |||
CREATE_SESSION4args structure of the current request. | CREATE_SESSION4args structure of the current request. | |||
Since CREATE_SESSION is a non-idempotent operation, we need to | Since CREATE_SESSION is a non-idempotent operation, we need to | |||
consider the possibility that retries may occur as a result of a | consider the possibility that retries may occur as a result of a | |||
client restart, network partition, malfunctioning router, etc. For | client restart, network partition, malfunctioning router, etc. For | |||
each client ID created by EXCHANGE_ID, the server maintains a | each client ID created by EXCHANGE_ID, the server maintains a | |||
separate reply cache (called the CREATE_SESSION reply cache) similar | separate reply cache (called the CREATE_SESSION reply cache) similar | |||
to the session reply cache used for SEQUENCE operations, with two | to the session reply cache used for SEQUENCE operations, with two | |||
distinctions. | distinctions. | |||
o First this is a reply cache just for detecting and processing | o First, this is a reply cache just for detecting and processing | |||
CREATE_SESSION requests for a given client ID. | CREATE_SESSION requests for a given client ID. | |||
o Second, the size of the client ID reply cache is one slot (and | o Second, the size of the client ID reply cache is one slot (and | |||
as a result, the CREATE_SESSION request does not carry a slot | as a result, the CREATE_SESSION request does not carry a slot | |||
number). This means that at most one CREATE_SESSION request for a | number). This means that at most one CREATE_SESSION request for a | |||
given client ID can be outstanding. | given client ID can be outstanding. | |||
As previously stated, CREATE_SESSION can be sent with or without a | As previously stated, CREATE_SESSION can be sent with or without a | |||
preceding SEQUENCE operation. Even if SEQUENCE precedes | preceding SEQUENCE operation. Even if a SEQUENCE precedes | |||
CREATE_SESSION, the server MUST maintain the CREATE_SESSION reply | CREATE_SESSION, the server MUST maintain the CREATE_SESSION reply | |||
cache, which is separate from the reply cache for the session | cache, which is separate from the reply cache for the session | |||
associated with SEQUENCE. If CREATE_SESSION was originally sent by | associated with a SEQUENCE. If CREATE_SESSION was originally sent by | |||
itself, the client MAY send a retry of the CREATE_SESSION operation | itself, the client MAY send a retry of the CREATE_SESSION operation | |||
within a COMPOUND preceded by SEQUENCE. If CREATE_SESSION was | within a COMPOUND preceded by a SEQUENCE. If CREATE_SESSION was | |||
originally sent in a COMPOUND that started with SEQUENCE, then the | originally sent in a COMPOUND that started with a SEQUENCE, then the | |||
client SHOULD send a retry in a COMPOUND that starts with SEQUENCE | client SHOULD send a retry in a COMPOUND that starts with a SEQUENCE | |||
that has the same session ID as the SEQUENCE of the original request. | that has the same session ID as the SEQUENCE of the original request. | |||
However, the client MAY send a retry in a COMPOUND that either has no | However, the client MAY send a retry in a COMPOUND that either has no | |||
preceding SEQUENCE, or has a preceding SEQUENCE that refers to a | preceding SEQUENCE, or has a preceding SEQUENCE that refers to a | |||
different session than the original CREATE_SESSION. This might be | different session than the original CREATE_SESSION. This might be | |||
necessary if the client sends a CREATE_SESSION in a COMPOUND preceded | necessary if the client sends a CREATE_SESSION in a COMPOUND preceded | |||
by a SEQUENCE with session ID X, and session X no longer exists. | by a SEQUENCE with session ID X, and session X no longer exists. | |||
Regardless, any retry of CREATE_SESSION, with or without a preceding | Regardless, any retry of CREATE_SESSION, with or without a preceding | |||
SEQUENCE, MUST use the same value of csa_sequence as the original. | SEQUENCE, MUST use the same value of csa_sequence as the original. | |||
When a client sends a successful EXCHANGE_ID and it is returned an | After the client received a reply to an EXCHANGE_ID operation that | |||
unconfirmed client ID, the client is also returned eir_sequenceid, | contains a new, unconfirmed client ID, the server expects the client | |||
and the client is expected to set the value of csa_sequenceid in the | to follow with a CREATE_SESSION operation to confirm the client ID. | |||
client ID-confirming-CREATE_SESSION it sends with that client ID to | The server expects the value of csa_sequenceid in the arguments to that | |||
the value of eir_sequenceid. When EXCHANGE_ID returns a new, | CREATE_SESSION to be equal to the value of the field eir_sequenceid | |||
unconfirmed client ID, the server initializes the client ID slot to | that was returned in results of the EXCHANGE_ID that returned the | |||
be equal to eir_sequenceid - 1 (accounting for underflow), and | unconfirmed client ID. Before the server replies to that EXCHANGE_ID | |||
records a contrived CREATE_SESSION result with a "cached" result of | operation, it initializes the client ID slot to be equal to | |||
NFS4ERR_SEQ_MISORDERED. With the slot thus initialized, the | eir_sequenceid - 1 (accounting for underflow), and records a | |||
processing of the CREATE_SESSION operation is divided into four | contrived CREATE_SESSION result with a "cached" result of | |||
NFS4ERR_SEQ_MISORDERED. With the client ID slot thus initialized, | ||||
the processing of the CREATE_SESSION operation is divided into four | ||||
phases: | phases: | |||
1. Client record lookup. The server looks up the client ID in its | 1. Client record lookup. The server looks up the client ID in its | |||
client record table. If the server contains no records with | client record table. If the server contains no records with | |||
client ID equal to clientid_arg, then most likely the client's | client ID equal to clientid_arg, then most likely the client's | |||
state has been purged during a period of inactivity, possibly due | state has been purged during a period of inactivity, possibly due | |||
to a loss of connectivity. NFS4ERR_STALE_CLIENTID is returned, | to a loss of connectivity. NFS4ERR_STALE_CLIENTID is returned, | |||
and no changes are made to any client records on the server. | and no changes are made to any client records on the server. | |||
Otherwise, the server goes to phase 2. | Otherwise, the server goes to phase 2. | |||
skipping to change at page 523, line 29 | skipping to change at page 524, line 31 | |||
NFS4ERR_SEQ_MISORDERED, and does not change the slot. If | NFS4ERR_SEQ_MISORDERED, and does not change the slot. If | |||
csa_sequenceid is equal to the slot's sequence ID + 1 (accounting | csa_sequenceid is equal to the slot's sequence ID + 1 (accounting | |||
for wraparound), then the slot's sequence ID is set to | for wraparound), then the slot's sequence ID is set to | |||
csa_sequenceid, and the CREATE_SESSION processing goes to the | csa_sequenceid, and the CREATE_SESSION processing goes to the | |||
next phase. A subsequent new CREATE_SESSION call over the same | next phase. A subsequent new CREATE_SESSION call over the same | |||
client ID MUST use a csa_sequenceid that is one greater than the | client ID MUST use a csa_sequenceid that is one greater than the | |||
sequence ID in the slot. | sequence ID in the slot. | |||
3. Client ID confirmation. If this would be the first session for | 3. Client ID confirmation. If this would be the first session for | |||
the client ID, the CREATE_SESSION operation serves to confirm the | the client ID, the CREATE_SESSION operation serves to confirm the | |||
client ID. Otherwise the client ID confirmation phase is skipped | client ID. Otherwise, the client ID confirmation phase is | |||
and only the session creation phase occurs. Any case in which | skipped and only the session creation phase occurs. Any case in | |||
there is more than one record with identical values for client ID | which there is more than one record with identical values for | |||
represents a server implementation error. Operation in the | client ID represents a server implementation error. Operation in | |||
potential valid cases is summarized as follows. | the potential valid cases is summarized as follows. | |||
* Successful Confirmation | * Successful Confirmation | |||
If the server has the following unconfirmed record, then | If the server has the following unconfirmed record, then | |||
this is the expected confirmation of an unconfirmed record. | this is the expected confirmation of an unconfirmed record. | |||
{ ownerid, verifier, principal_arg, clientid_arg, | { ownerid, verifier, principal_arg, clientid_arg, | |||
unconfirmed } | unconfirmed } | |||
As noted in Section 18.35.4, the server might also have the | As noted in Section 18.35.4, the server might also have the | |||
skipping to change at page 524, line 19 | skipping to change at page 525, line 24 | |||
* Unsuccessful Confirmation | * Unsuccessful Confirmation | |||
If the server has the following record, then the client has | If the server has the following record, then the client has | |||
changed principals after the previous EXCHANGE_ID request, | changed principals after the previous EXCHANGE_ID request, | |||
or there has been a chance collision between shorthand | or there has been a chance collision between shorthand | |||
client identifiers. | client identifiers. | |||
{ *, *, old_principal_arg, clientid_arg, * } | { *, *, old_principal_arg, clientid_arg, * } | |||
Neither of these cases are permissible. Processing stops | Neither of these cases is permissible. Processing stops | |||
and NFS4ERR_CLID_INUSE is returned to the client. No | and NFS4ERR_CLID_INUSE is returned to the client. No | |||
changes are made to any client records on the server. | changes are made to any client records on the server. | |||
4. Session creation. The server confirmed the client ID, either in | 4. Session creation. The server confirmed the client ID, either in | |||
this CREATE_SESSION operation, or a previous CREATE_SESSION | this CREATE_SESSION operation, or a previous CREATE_SESSION | |||
operation. The server examines the remaining fields of the | operation. The server examines the remaining fields of the | |||
arguments. | arguments. | |||
The server creates the session by recording the parameter values | The server creates the session by recording the parameter values | |||
used (including whether the CREATE_SESSION4_FLAG_PERSIST flag is | used (including whether the CREATE_SESSION4_FLAG_PERSIST flag is | |||
set and has been accepted by the server) and allocating space for | set and has been accepted by the server) and allocating space for | |||
the session reply cache (if there is not enough space, the server | the session reply cache (if there is not enough space, the server | |||
returns NFS4ERR_NOSPC). For each slot in the reply cache, the | returns NFS4ERR_NOSPC). For each slot in the reply cache, the | |||
server sets the sequence ID to zero (0), and records an entry | server sets the sequence ID to zero, and records an entry | |||
containing a COMPOUND reply with zero operations and the error | containing a COMPOUND reply with zero operations and the error | |||
NFS4ERR_SEQ_MISORDERED. This way, if the first SEQUENCE request | NFS4ERR_SEQ_MISORDERED. This way, if the first SEQUENCE request | |||
sent has a sequence ID equal to zero, the server can simply | sent has a sequence ID equal to zero, the server can simply | |||
return what is in the reply cache: NFS4ERR_SEQ_MISORDERED. The | return what is in the reply cache: NFS4ERR_SEQ_MISORDERED. The | |||
client initializes its reply cache for receiving callbacks in the | client initializes its reply cache for receiving callbacks in the | |||
same way, and similarly, the first CB_SEQUENCE operation on a | same way, and similarly, the first CB_SEQUENCE operation on a | |||
slot after session creation MUST have a sequence ID of one. | slot after session creation MUST have a sequence ID of one. | |||
If the session state is created successfully, the server | If the session state is created successfully, the server | |||
associates the session with the client ID provided by the client. | associates the session with the client ID provided by the client. | |||
When a request that had CREATE_SESSION4_FLAG_CONN_RDMA set needs | When a request that had CREATE_SESSION4_FLAG_CONN_RDMA set needs | |||
to be retried, the retry MUST be done on a new connection that is | to be retried, the retry MUST be done on a new connection that is | |||
in non-RDMA mode. If properties of the new connection are | in non-RDMA mode. If properties of the new connection are | |||
different enough that the arguments to CREATE_SESSION need to | different enough that the arguments to CREATE_SESSION need to | |||
change, then a non-retry MUST be sent. The server will | change, then a non-retry MUST be sent. The server will | |||
eventually dispose of any session that was created on the | eventually dispose of any session that was created on the | |||
original connection. | original connection. | |||
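The handling of the one-slot CREATE_SESSION reply cache described in the phases above can be sketched in Python. This is a sketch under stated assumptions, not a definitive implementation: the class `ClientIdSlot` and the `create_session` callback are hypothetical, sequence IDs are assumed to be 32-bit unsigned values, status codes are represented as strings, and the replay branch is inferred from the contrived cached result because the text describing it falls in a part of the diff that is skipped here.

```python
UINT32_MOD = 2 ** 32  # assumption: sequence IDs are 32-bit unsigned


class ClientIdSlot:
    """One-slot CREATE_SESSION reply cache kept per client ID."""

    def __init__(self, eir_sequenceid: int):
        # Initialized before the server replies to EXCHANGE_ID:
        # slot is eir_sequenceid - 1 (accounting for underflow), with a
        # contrived cached result of NFS4ERR_SEQ_MISORDERED.
        self.seqid = (eir_sequenceid - 1) % UINT32_MOD
        self.cached = "NFS4ERR_SEQ_MISORDERED"

    def process(self, csa_sequenceid: int, create_session) -> str:
        if csa_sequenceid == self.seqid:
            # Retry of the request already in the slot: return the cached reply.
            return self.cached
        if csa_sequenceid == (self.seqid + 1) % UINT32_MOD:
            # New request: advance the slot, run the later phases, cache the result.
            self.seqid = csa_sequenceid
            self.cached = create_session()
            return self.cached
        # Any other value: misordered, and the slot is left unchanged.
        return "NFS4ERR_SEQ_MISORDERED"
```

Because the slot starts at eir_sequenceid - 1 with a cached NFS4ERR_SEQ_MISORDERED, a client that wrongly sends eir_sequenceid - 1 is answered from the cache like any other replay, so no special case is needed.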
On the backchannel, the client and server might wish to have many | On the backchannel, the client and server might wish to have many | |||
slots, in some cases perhaps more than the fore channel, in to deal | slots, in some cases perhaps more than the fore channel, in order to | |||
with the situations where the network link has high latency and is | deal with the situations where the network link has high latency and | |||
the primary bottleneck for response to recalls. If so, and if the | is the primary bottleneck for response to recalls. If so, and if the | |||
client provides too few slots to the backchannel, the server might | client provides too few slots to the backchannel, the server might | |||
limit the number of recallable objects it gives to the client. | limit the number of recallable objects it gives to the client. | |||
Implementing RPCSEC_GSS callback support requires the client and | Implementing RPCSEC_GSS callback support requires changes to both the | |||
server change their RPCSEC_GSS implementations. One possible set of | client and server implementations of RPCSEC_GSS. One possible set of | |||
changes includes: | changes includes: | |||
o Adding a data structure that wraps the GSS-API context with a | o Adding a data structure that wraps the GSS-API context with a | |||
reference count. | reference count. | |||
o New functions to increment and decrement the reference count. If | o New functions to increment and decrement the reference count. If | |||
the reference count is decremented to zero, the wrapper data | the reference count is decremented to zero, the wrapper data | |||
structure and the GSS-API context it refers to would be freed. | structure and the GSS-API context it refers to would be freed. | |||
o Change RPCSEC_GSS to create the wrapper data structure upon | o Change RPCSEC_GSS to create the wrapper data structure upon | |||
skipping to change at page 526, line 21 | skipping to change at page 527, line 21 | |||
18.37.3. DESCRIPTION | 18.37.3. DESCRIPTION | |||
The DESTROY_SESSION operation closes the session and discards the | The DESTROY_SESSION operation closes the session and discards the | |||
session's reply cache, if any. Any remaining connections associated | session's reply cache, if any. Any remaining connections associated | |||
with the session are immediately disassociated. If the connection | with the session are immediately disassociated. If the connection | |||
has no remaining associated sessions, the connection MAY be closed by | has no remaining associated sessions, the connection MAY be closed by | |||
the server. Locks, delegations, layouts, wants, and the lease, which | the server. Locks, delegations, layouts, wants, and the lease, which | |||
are all tied to the client ID, are not affected by DESTROY_SESSION. | are all tied to the client ID, are not affected by DESTROY_SESSION. | |||
DESTROY_SESSION MUST be invoked on a connection that is associated | DESTROY_SESSION MUST be invoked on a connection that is associated | |||
with the session being destroyed. In addition if SP4_MACH_CRED state | with the session being destroyed. In addition, if SP4_MACH_CRED | |||
protection was specified when the client ID was created, the | state protection was specified when the client ID was created, the | |||
RPCSEC_GSS principal that created the session MUST be the one that | RPCSEC_GSS principal that created the session MUST be the one that | |||
destroys the session, using RPCSEC_GSS privacy or integrity. If | destroys the session, using RPCSEC_GSS privacy or integrity. If | |||
SP4_SSV state protection was specified when the client ID was | SP4_SSV state protection was specified when the client ID was | |||
created, RPCSEC_GSS using the SSV mechanism (Section 2.10.9) MUST be | created, RPCSEC_GSS using the SSV mechanism (Section 2.10.9) MUST be | |||
used, with integrity or privacy. | used, with integrity or privacy. | |||
If the COMPOUND request starts with SEQUENCE, and if the sessionids | If the COMPOUND request starts with SEQUENCE, and if the sessionids | |||
specified in SEQUENCE and DESTROY_SESSION are the same, then | specified in SEQUENCE and DESTROY_SESSION are the same, then | |||
o DESTROY_SESSION MUST be the final operation in the COMPOUND | o DESTROY_SESSION MUST be the final operation in the COMPOUND | |||
request. | request. | |||
o It is advisable to not place DESTROY_SESSION in a COMPOUND request | o It is advisable to avoid placing DESTROY_SESSION in a COMPOUND | |||
with other state-modifying operations, because the DESTROY_SESSION | request with other state-modifying operations, because the | |||
will destroy the reply cache. | DESTROY_SESSION will destroy the reply cache. | |||
o Because the session and its reply cache are destroyed, a client | o Because the session and its reply cache are destroyed, a client | |||
that retries the request may receive an error in reply to the | that retries the request may receive an error in reply to the | |||
retry, even though the original request was successful. | retry, even though the original request was successful. | |||
If the COMPOUND request starts with SEQUENCE, and if the sessionids | If the COMPOUND request starts with SEQUENCE, and if the sessionids | |||
specified in SEQUENCE and DESTROY_SESSION are the different, then | specified in SEQUENCE and DESTROY_SESSION are different, then | |||
DESTROY_SESSION can appear in any position of the COMPOUND request | DESTROY_SESSION can appear in any position of the COMPOUND request | |||
(except for the first position). The two sessionids can belong to | (except for the first position). The two sessionids can belong to | |||
different client IDs. | different client IDs. | |||
If the COMPOUND request does not start with SEQUENCE, and if | If the COMPOUND request does not start with SEQUENCE, and if | |||
DESTROY_SESSION is not the sole operation, then the server MUST return | DESTROY_SESSION is not the sole operation, then the server MUST return | |||
NFS4ERR_NOT_ONLY_OP. | NFS4ERR_NOT_ONLY_OP. | |||
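The placement rules for DESTROY_SESSION within a COMPOUND can be summarized in a short Python sketch. The function name, the representation of a COMPOUND as a list of (operation name, session ID) pairs, and the marker string "MUST_BE_FINAL" are all hypothetical; in particular, "MUST_BE_FINAL" is not a protocol error code, since this excerpt does not say which error a server returns when the final-operation rule is violated.

```python
def check_destroy_session(ops, index):
    """Check where DESTROY_SESSION sits in a COMPOUND.

    ops   -- list of (name, sessionid) pairs; sessionid is None for
             operations that carry no session ID
    index -- position of the DESTROY_SESSION operation in ops
    """
    name, target = ops[index]
    if name != "DESTROY_SESSION":
        raise ValueError("index does not point at DESTROY_SESSION")

    first_name, first_session = ops[0]
    if first_name != "SEQUENCE":
        # Without a leading SEQUENCE, DESTROY_SESSION must be the sole operation.
        return "OK" if len(ops) == 1 else "NFS4ERR_NOT_ONLY_OP"

    if first_session == target and index != len(ops) - 1:
        # Same session as SEQUENCE: DESTROY_SESSION MUST be the final operation.
        return "MUST_BE_FINAL"  # hypothetical marker, not a protocol error

    # Different session: any position other than the first is acceptable.
    return "OK"
```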
If there is a backchannel on the session and the server has | If there is a backchannel on the session and the server has | |||
outstanding CB_COMPOUND operations for the session which have not | outstanding CB_COMPOUND operations for the session which have not | |||
been replied to, then the server MAY refuse to destroy the session | been replied to, then the server MAY refuse to destroy the session | |||
and return an error. If so, then in the event the backchannel is | and return an error. If so, then in the event the backchannel is | |||
down, the server SHOULD return NFS4ERR_CB_PATH_DOWN to inform the | down, the server SHOULD return NFS4ERR_CB_PATH_DOWN to inform the | |||
client that the backchannel needs to repaired before the server will | client that the backchannel needs to be repaired before the server | |||
allow the session to be destroyed. Otherwise, the error | will allow the session to be destroyed. Otherwise, the error | |||
NFS4ERR_BACK_CHAN_BUSY SHOULD be returned to indicate that there are | NFS4ERR_BACK_CHAN_BUSY SHOULD be returned to indicate that there are | |||
CB_COMPOUNDs that need to be replied to. The client SHOULD reply to | CB_COMPOUNDs that need to be replied to. The client SHOULD reply to | |||
all outstanding CB_COMPOUNDs before re-sending DESTROY_SESSION. | all outstanding CB_COMPOUNDs before re-sending DESTROY_SESSION. | |||
18.38. Operation 45: FREE_STATEID - Free Stateid with No Locks | 18.38. Operation 45: FREE_STATEID - Free Stateid with No Locks | |||
18.38.1. ARGUMENT | 18.38.1. ARGUMENT | |||
struct FREE_STATEID4args { | struct FREE_STATEID4args { | |||
stateid4 fsa_stateid; | stateid4 fsa_stateid; | |||
}; | }; | |||
18.38.2. RESULT | 18.38.2. RESULT | |||
struct FREE_STATEID4res { | struct FREE_STATEID4res { | |||
nfsstat4 fsr_status; | nfsstat4 fsr_status; | |||
}; | }; | |||
18.38.3. DESCRIPTION | 18.38.3. DESCRIPTION | |||
The FREE_STATEID operation is used to free a stateid which no longer | The FREE_STATEID operation is used to free a stateid that no longer | |||
has any associated locks (including opens, byte-range locks, | has any associated locks (including opens, byte-range locks, | |||
delegations, layouts). This may be because of client unlock | delegations, and layouts). This may be because of client LOCKU | |||
operations or because of server revocation. If there are valid locks | operations or because of server revocation. If there are valid locks | |||
(of any kind) associated with the stateid in question, the error | (of any kind) associated with the stateid in question, the error | |||
NFS4ERR_LOCKS_HELD will be returned, and the associated stateid will | NFS4ERR_LOCKS_HELD will be returned, and the associated stateid will | |||
not be freed. | not be freed. | |||
When a stateid is freed which had been associated with revoked locks, | When a stateid is freed that had been associated with revoked locks, | |||
the client, by doing the FREE_STATEID acknowledges the loss of those | by sending the FREE_STATEID operation, the client acknowledges the | |||
locks. This allows the server, once all such revoked state is | loss of those locks. This allows the server, once all such revoked | |||
acknowledged, to allow that client again to reclaim locks, without | state is acknowledged, to allow that client again to reclaim locks, | |||
encountering the edge conditions discussed in Section 8.4.2. | without encountering the edge conditions discussed in Section 8.4.2. | |||
Once a successful FREE_STATEID is done for a given stateid, any | Once a successful FREE_STATEID is done for a given stateid, any | |||
subsequent use of that stateid will result in an NFS4ERR_BAD_STATEID | subsequent use of that stateid will result in an NFS4ERR_BAD_STATEID | |||
error. | error. | |||
18.39. Operation 46: GET_DIR_DELEGATION - Get a directory delegation | 18.39. Operation 46: GET_DIR_DELEGATION - Get a Directory Delegation | |||
18.39.1. ARGUMENT | 18.39.1. ARGUMENT | |||
typedef nfstime4 attr_notice4; | typedef nfstime4 attr_notice4; | |||
struct GET_DIR_DELEGATION4args { | struct GET_DIR_DELEGATION4args { | |||
/* CURRENT_FH: delegated directory */ | /* CURRENT_FH: delegated directory */ | |||
bool gdda_signal_deleg_avail; | bool gdda_signal_deleg_avail; | |||
bitmap4 gdda_notification_types; | bitmap4 gdda_notification_types; | |||
attr_notice4 gdda_child_attr_delay; | attr_notice4 gdda_child_attr_delay; | |||
skipping to change at page 529, line 44 | skipping to change at page 530, line 44 | |||
default: | default: | |||
void; | void; | |||
}; | }; | |||
18.39.3. DESCRIPTION | 18.39.3. DESCRIPTION | |||
The GET_DIR_DELEGATION operation is used by a client to request a | The GET_DIR_DELEGATION operation is used by a client to request a | |||
directory delegation. The directory is represented by the current | directory delegation. The directory is represented by the current | |||
filehandle. The client also specifies whether it wants the server to | filehandle. The client also specifies whether it wants the server to | |||
notify it when the directory changes in certain ways by setting one | notify it when the directory changes in certain ways by setting one | |||
or more bits in a bitmap. The server may choose not to grant the | or more bits in a bitmap. The server may refuse to grant the | |||
delegation. In that case the server will return | delegation. In that case, the server will return | |||
NFS4ERR_DIRDELEG_UNAVAIL. If the server decides to hand out the | NFS4ERR_DIRDELEG_UNAVAIL. If the server decides to hand out the | |||
delegation, it will return a cookie verifier for that directory. If | delegation, it will return a cookie verifier for that directory. If | |||
the cookie verifier changes when the client is holding the | the cookie verifier changes when the client is holding the | |||
delegation, the delegation will be recalled unless the client has | delegation, the delegation will be recalled unless the client has | |||
asked for notification for this event. | asked for notification for this event. | |||
The server will also return a directory delegation stateid, | The server will also return a directory delegation stateid, | |||
gddr_stateid, as a result of the GET_DIR_DELEGATION operation. This | gddr_stateid, as a result of the GET_DIR_DELEGATION operation. This | |||
stateid will appear in callback messages related to the delegation, | stateid will appear in callback messages related to the delegation, | |||
such as notifications and delegation recalls. The client will use | such as notifications and delegation recalls. The client will use | |||
skipping to change at page 530, line 28 | skipping to change at page 531, line 28 | |||
client did not request. | client did not request. | |||
The GET_DIR_DELEGATION operation can be used for both normal and | The GET_DIR_DELEGATION operation can be used for both normal and | |||
named attribute directories. | named attribute directories. | |||
If the client sets gdda_signal_deleg_avail to TRUE, then it is | If the client sets gdda_signal_deleg_avail to TRUE, then it is | |||
registering with the server a "want" for a directory delegation. If | registering with the server a "want" for a directory delegation. If | |||
the delegation is not available, and the server supports and will | the delegation is not available, and the server supports and will | |||
honor the "want", the results will have | honor the "want", the results will have | |||
gddrnf_will_signal_deleg_avail set to TRUE and no error will be | gddrnf_will_signal_deleg_avail set to TRUE and no error will be | |||
indicated on return. If so the client should expect a future | indicated on return. If so, the client should expect a future | |||
CB_RECALLABLE_OBJ_AVAIL operation to indicate that a directory | CB_RECALLABLE_OBJ_AVAIL operation to indicate that a directory | |||
delegation is available. If the server does not wish to honor the | delegation is available. If the server does not wish to honor the | |||
"want" or is not able to do so, it returns the error | "want" or is not able to do so, it returns the error | |||
NFS4ERR_DIRDELEG_UNAVAIL. If the delegation is immediately | NFS4ERR_DIRDELEG_UNAVAIL. If the delegation is immediately | |||
available, the server SHOULD return it with the response to the | available, the server SHOULD return it with the response to the | |||
operation, rather than via a callback. | operation, rather than via a callback. | |||
When a client makes a request for a directory delegation while it | When a client makes a request for a directory delegation while it | |||
already holds a directory delegation for that directory (including | already holds a directory delegation for that directory (including | |||
the case where it has been recalled but not yet returned by the | the case where it has been recalled but not yet returned by the | |||
skipping to change at page 531, line 19 | skipping to change at page 532, line 19 | |||
or more bits in gdda_notification_types. The client can ask for | or more bits in gdda_notification_types. The client can ask for | |||
notifications on addition of entries to a directory (by setting the | notifications on addition of entries to a directory (by setting the | |||
NOTIFY4_ADD_ENTRY in gdda_notification_types), notifications on entry | NOTIFY4_ADD_ENTRY in gdda_notification_types), notifications on entry | |||
removal (NOTIFY4_REMOVE_ENTRY), renames (NOTIFY4_RENAME_ENTRY), | removal (NOTIFY4_REMOVE_ENTRY), renames (NOTIFY4_RENAME_ENTRY), | |||
directory attribute changes (NOTIFY4_CHANGE_DIR_ATTRIBUTES), and | directory attribute changes (NOTIFY4_CHANGE_DIR_ATTRIBUTES), and | |||
cookie verifier changes (NOTIFY4_CHANGE_COOKIE_VERIFIER) by setting | cookie verifier changes (NOTIFY4_CHANGE_COOKIE_VERIFIER) by setting | |||
one or more corresponding bits in the gdda_notification_types field. | one or more corresponding bits in the gdda_notification_types field. | |||
The client can also ask for notifications of changes to attributes of | The client can also ask for notifications of changes to attributes of | |||
directory entries (NOTIFY4_CHANGE_CHILD_ATTRIBUTES) in order to keep | directory entries (NOTIFY4_CHANGE_CHILD_ATTRIBUTES) in order to keep | |||
its attribute cache up to date. However any changes made to child | its attribute cache up to date. However, any changes made to child | |||
attributes do not cause the delegation to be recalled. If a client | attributes do not cause the delegation to be recalled. If a client | |||
is interested in directory entry caching, or negative name caching, | is interested in directory entry caching or negative name caching, it | |||
it can set the gdda_notification_types appropriately to its | can set the gdda_notification_types appropriately to its particular | |||
particular need and the server will notify it of all changes that | need and the server will notify it of all changes that would | |||
would otherwise invalidate its name cache. The kind of notification | otherwise invalidate its name cache. The kind of notification a | |||
a client asks for may depend on the directory size, its rate of | client asks for may depend on the directory size, its rate of change, | |||
change and the applications being used to access that directory. The | and the applications being used to access that directory. The | |||
enumeration of the conditions under which a client might ask for a | enumeration of the conditions under which a client might ask for a | |||
notification is out of the scope of this specification. | notification is out of the scope of this specification. | |||
For attribute notifications, the client will set bits in the | For attribute notifications, the client will set bits in the | |||
gdda_dir_attributes bitmap to indicate which attributes it wants to | gdda_dir_attributes bitmap to indicate which attributes it wants to | |||
be notified of. If the server does not support notifications for | be notified of. If the server does not support notifications for | |||
changes to a certain attribute, it SHOULD NOT set that attribute in | changes to a certain attribute, it SHOULD NOT set that attribute in | |||
the supported attribute bitmap specified in the reply | the supported attribute bitmap specified in the reply | |||
(gddr_dir_attributes). The client will also set in the | (gddr_dir_attributes). The client will also set in the | |||
gdda_child_attributes bitmap the attributes of directory entries it | gdda_child_attributes bitmap the attributes of directory entries it | |||
skipping to change at page 531, line 48 | skipping to change at page 532, line 48 | |||
gddr_child_attributes which attributes of directory entries it will | gddr_child_attributes which attributes of directory entries it will | |||
notify the client of. | notify the client of. | |||
The client will also let the server know if it wants to get the | The client will also let the server know if it wants to get the | |||
notification as soon as the attribute change occurs or after a | notification as soon as the attribute change occurs or after a | |||
certain delay by setting a delay factor; gdda_child_attr_delay is for | certain delay by setting a delay factor; gdda_child_attr_delay is for | |||
attribute changes to directory entries and gdda_dir_attr_delay is for | attribute changes to directory entries and gdda_dir_attr_delay is for | |||
attribute changes to the directory. If this delay factor is set to | attribute changes to the directory. If this delay factor is set to | |||
zero, that indicates to the server that the client wants to be | zero, that indicates to the server that the client wants to be | |||
notified of any attribute changes as soon as they occur. If the | notified of any attribute changes as soon as they occur. If the | |||
delay factor is set to N seconds, the server will make a best effort | delay factor is set to N seconds, the server will make a best-effort | |||
guarantee that attribute updates are synchronized within N seconds. | guarantee that attribute updates are synchronized within N seconds. | |||
If the client asks for a delay factor that the server does not | If the client asks for a delay factor that the server does not | |||
support or that may cause significant resource consumption on the | support or that may cause significant resource consumption on the | |||
server by causing the server to send a lot of notifications, the | server by causing the server to send a lot of notifications, the | |||
server should not commit to sending out notifications for attributes | server should not commit to sending out notifications for attributes | |||
and therefore must not set the appropriate bit in the | and therefore must not set the appropriate bit in the | |||
gddr_child_attributes and gddr_dir_attributes bitmaps in the | gddr_child_attributes and gddr_dir_attributes bitmaps in the | |||
response. | response. | |||
The client MUST use a security tuple (Section 2.6.1) that the | The client MUST use a security tuple (Section 2.6.1) that the | |||
skipping to change at page 533, line 7 | skipping to change at page 534, line 7 | |||
case NFS4_OK: | case NFS4_OK: | |||
GETDEVICEINFO4resok gdir_resok4; | GETDEVICEINFO4resok gdir_resok4; | |||
case NFS4ERR_TOOSMALL: | case NFS4ERR_TOOSMALL: | |||
count4 gdir_mincount; | count4 gdir_mincount; | |||
default: | default: | |||
void; | void; | |||
}; | }; | |||
18.40.3. DESCRIPTION | 18.40.3. DESCRIPTION | |||
Returns pNFS storage device address information for the specified | The GETDEVICEINFO operation returns pNFS storage device address | |||
device ID. The client identifies the device information to be | information for the specified device ID. The client identifies the | |||
returned by providing the gdia_device_id and gdia_layout_type that | device information to be returned by providing the gdia_device_id and | |||
uniquely identify the device. The client provides gdia_maxcount to | gdia_layout_type that uniquely identify the device. The client | |||
limit the number of bytes for the result. This maximum size | provides gdia_maxcount to limit the number of bytes for the result. | |||
represents all of the data being returned within the | This maximum size represents all of the data being returned within | |||
GETDEVICEINFO4resok structure and includes the XDR overhead. The | the GETDEVICEINFO4resok structure and includes the XDR overhead. The | |||
server may return less data. If the server is unable to return any | server may return less data. If the server is unable to return any | |||
information within the gdia_maxcount limit, the error | information within the gdia_maxcount limit, the error | |||
NFS4ERR_TOOSMALL will be returned. However, if gdia_maxcount is | NFS4ERR_TOOSMALL will be returned. However, if gdia_maxcount is | |||
zero, NFS4ERR_TOOSMALL MUST NOT be returned. | zero, NFS4ERR_TOOSMALL MUST NOT be returned. | |||
The da_layout_type field of the gdir_device_addr returned by the | The da_layout_type field of the gdir_device_addr returned by the | |||
server MUST be equal to the gdia_layout_type specified by the client. | server MUST be equal to the gdia_layout_type specified by the client. | |||
If it is not equal, the client SHOULD ignore the response as invalid | If it is not equal, the client SHOULD ignore the response as invalid | |||
and behave as if the server returned an error, even if the client | and behave as if the server returned an error, even if the client | |||
does have support for the layout type returned. | does have support for the layout type returned. | |||
The client also provides a notification bitmap, gdia_notify_types for | The client also provides a notification bitmap, gdia_notify_types, | |||
the device ID mapping notification for which it is interested in | for the device ID mapping notification for which it is interested in | |||
receiving; the server must support device ID notifications for the | receiving; the server must support device ID notifications for the | |||
notification request to have effect. The notification mask is | notification request to have effect. The notification mask is | |||
composed in the same manner as the bitmap for file attributes | composed in the same manner as the bitmap for file attributes | |||
(Section 3.3.7). The numbers of bit positions are listed in the | (Section 3.3.7). The numbers of bit positions are listed in the | |||
notify_device_type4 enumeration type (Section 20.12). Only two | notify_device_type4 enumeration type (Section 20.12). Only two | |||
enumerated values of notify_device_type4 currently apply to | enumerated values of notify_device_type4 currently apply to | |||
GETDEVICEINFO: NOTIFY_DEVICEID4_CHANGE and NOTIFY_DEVICEID4_DELETE | GETDEVICEINFO: NOTIFY_DEVICEID4_CHANGE and NOTIFY_DEVICEID4_DELETE | |||
(see Section 20.12). | (see Section 20.12). | |||
The notification bitmap applies only to the specified device ID. If | The notification bitmap applies only to the specified device ID. If | |||
skipping to change at page 534, line 21 | skipping to change at page 535, line 21 | |||
Aside from updating or turning off notifications, another use case | Aside from updating or turning off notifications, another use case | |||
for gdia_maxcount being set to zero is to validate a device ID. | for gdia_maxcount being set to zero is to validate a device ID. | |||
The client SHOULD request a notification for changes or deletion of a | The client SHOULD request a notification for changes or deletion of a | |||
device ID to device address mapping so that the server can allow the | device ID to device address mapping so that the server can allow the | |||
client to gracefully use a new mapping, without having pending I/O fail | client to gracefully use a new mapping, without having pending I/O fail | |||
abruptly, or force layouts using the device ID to be recalled or | abruptly, or force layouts using the device ID to be recalled or | |||
revoked. | revoked. | |||
It is possible that GETDEVICEINFO (and GETDEVICELIST) will race with | It is possible that GETDEVICEINFO (and GETDEVICELIST) will race with | |||
CB_NOTIFY_DEVICEID, i.e. CB_NOTIFY_DEVICEID arrives before the | CB_NOTIFY_DEVICEID, i.e., CB_NOTIFY_DEVICEID arrives before the | |||
client gets and processes the response to GETDEVICEINFO or | client gets and processes the response to GETDEVICEINFO or | |||
GETDEVICELIST. The analysis of the race leverages the fact that the | GETDEVICELIST. The analysis of the race leverages the fact that the | |||
server MUST NOT delete a device ID that is referred to by a layout | server MUST NOT delete a device ID that is referred to by a layout | |||
the client has. | the client has. | |||
o CB_NOTIFY_DEVICEID deletes a device ID. If the client believes it | o CB_NOTIFY_DEVICEID deletes a device ID. If the client believes it | |||
has layouts that refer to the device ID, then it is possible the | has layouts that refer to the device ID, then it is possible that | |||
layouts have been revoked. The client should send a TEST_STATEID | layouts referring to the deleted device ID have been revoked. The | |||
request using the stateid for each layout that might have been | client should send a TEST_STATEID request using the stateid for | |||
revoked. If TEST_STATEID indicates any layouts have been revoked, | each layout that might have been revoked. If TEST_STATEID | |||
the client must recover from layout revocation as described in | indicates that any layouts have been revoked, the client must | |||
Section 12.5.6. If TEST_STATEID indicates at least one layout has | recover from layout revocation as described in Section 12.5.6. If | |||
not been revoked, the client should send a GETDEVICEINFO on the | TEST_STATEID indicates that at least one layout has not been | |||
device ID to verify that the device ID has been deleted. If | revoked, the client should send a GETDEVICEINFO operation on the | |||
GETDEVICEINFO indicates the device ID does not exist, the client | supposedly deleted device ID to verify that the device ID has been | |||
then assumes the server is faulty, and recovers by sending an | deleted. | |||
EXCHANGE_ID operation. If the client does not have layouts that | ||||
refer to the device ID, no harm is done. The client should mark | ||||
the device ID as deleted, and when the GETDEVICEINFO or | ||||
GETDEVICELIST results are finally received for the device ID, | ||||
delete the device ID from client's cache. | ||||
o CB_NOTIFY_DEVICEID indicates a device ID's device addressing | If GETDEVICEINFO indicates that the device ID does not exist, then | |||
the client assumes the server is faulty and recovers by sending an | ||||
EXCHANGE_ID operation. If GETDEVICEINFO indicates that the device | ||||
ID does exist, then while the server is faulty for sending an | ||||
erroneous device ID deletion notification, the degree to which it | ||||
is faulty does not require the client to create a new client ID. | ||||
If the client does not have layouts that refer to the device ID, | ||||
no harm is done. The client should mark the device ID as deleted, | ||||
and when GETDEVICEINFO or GETDEVICELIST results are received that | ||||
indicate that the device ID has been in fact deleted, the device | ||||
ID should be removed from the client's cache. | ||||
o CB_NOTIFY_DEVICEID indicates that a device ID's device addressing | ||||
mappings have changed. The client should assume that the results | mappings have changed. The client should assume that the results | |||
from the in progress GETDEVICEINFO will be stale for the device ID | from the in-progress GETDEVICEINFO will be stale for the device ID | |||
once received, and so it should send another GETDEVICEINFO on the | once received, and so it should send another GETDEVICEINFO on the | |||
device ID. | device ID. | |||
18.41. Operation 48: GETDEVICELIST - Get All Device Mappings for a File | 18.41. Operation 48: GETDEVICELIST - Get All Device Mappings for a File | |||
System | System | |||
18.41.1. ARGUMENT | 18.41.1. ARGUMENT | |||
struct GETDEVICELIST4args { | struct GETDEVICELIST4args { | |||
/* CURRENT_FH: object belonging to the file system */ | /* CURRENT_FH: object belonging to the file system */ | |||
skipping to change at page 535, line 40 | skipping to change at page 536, line 46 | |||
union GETDEVICELIST4res switch (nfsstat4 gdlr_status) { | union GETDEVICELIST4res switch (nfsstat4 gdlr_status) { | |||
case NFS4_OK: | case NFS4_OK: | |||
GETDEVICELIST4resok gdlr_resok4; | GETDEVICELIST4resok gdlr_resok4; | |||
default: | default: | |||
void; | void; | |||
}; | }; | |||
18.41.3. DESCRIPTION | 18.41.3. DESCRIPTION | |||
This operation is used by the client to enumerate all of the device | This operation is used by the client to enumerate all of the device | |||
IDs a server's file system uses. | IDs that a server's file system uses. | |||
The client provides a current filehandle of a file object that | The client provides a current filehandle of a file object that | |||
belongs to the file system (i.e. all file objects sharing the same | belongs to the file system (i.e., all file objects sharing the same | |||
fsid as that of the current filehandle), and the layout type in | fsid as that of the current filehandle) and the layout type in | |||
gdla_layout_type. Since this operation might require multiple calls | gdla_layout_type. Since this operation might require multiple calls | |||
to enumerate all the device IDs (and is thus similar to the READDIR | to enumerate all the device IDs (and is thus similar to the READDIR | |||
(Section 18.23) operation), the client also provides gdla_cookie and | (Section 18.23) operation), the client also provides gdla_cookie and | |||
gdla_cookieverf to specify the current cursor position in the list. | gdla_cookieverf to specify the current cursor position in the list. | |||
When the client wants to read from the beginning of the file system's | When the client wants to read from the beginning of the file system's | |||
device mappings, it sets gdla_cookie to zero. The field | device mappings, it sets gdla_cookie to zero. The field | |||
gdla_cookieverf MUST be ignored by the server when gdla_cookie is | gdla_cookieverf MUST be ignored by the server when gdla_cookie is | |||
zero. The client provides gdla_maxdevices to limit the number of | zero. The client provides gdla_maxdevices to limit the number of | |||
device IDs in the result. If gdla_maxdevices is zero, the server | device IDs in the result. If gdla_maxdevices is zero, the server | |||
MUST return NFS4ERR_INVAL. The server MAY return fewer device IDs. | MUST return NFS4ERR_INVAL. The server MAY return fewer device IDs. | |||
The successful response to the operation will contain the cookie, | The successful response to the operation will contain the cookie, | |||
gdlr_cookie, and cookie verifier, gdlr_cookieverf, to be used on the | gdlr_cookie, and the cookie verifier, gdlr_cookieverf, to be used on | |||
subsequent GETDEVICELIST. A gdlr_eof value of TRUE signifies that | the subsequent GETDEVICELIST. A gdlr_eof value of TRUE signifies | |||
there are no remaining entries in the server's device list. Each | that there are no remaining entries in the server's device list. | |||
element of gdlr_deviceid_list contains a device ID. | Each element of gdlr_deviceid_list contains a device ID. | |||
18.41.4. IMPLEMENTATION | 18.41.4. IMPLEMENTATION | |||
An example of the use of this operation is for pNFS clients and | An example of the use of this operation is for pNFS clients and | |||
servers that use LAYOUT4_BLOCK_VOLUME layouts. In these environments | servers that use LAYOUT4_BLOCK_VOLUME layouts. In these environments | |||
it may be helpful for a client to determine device accessibility upon | it may be helpful for a client to determine device accessibility upon | |||
first file system access. | first file system access. | |||
18.42. Operation 49: LAYOUTCOMMIT - Commit Writes Made Using a Layout | 18.42. Operation 49: LAYOUTCOMMIT - Commit Writes Made Using a Layout | |||
skipping to change at page 537, line 27 | skipping to change at page 539, line 7 | |||
union LAYOUTCOMMIT4res switch (nfsstat4 locr_status) { | union LAYOUTCOMMIT4res switch (nfsstat4 locr_status) { | |||
case NFS4_OK: | case NFS4_OK: | |||
LAYOUTCOMMIT4resok locr_resok4; | LAYOUTCOMMIT4resok locr_resok4; | |||
default: | default: | |||
void; | void; | |||
}; | }; | |||
18.42.3. DESCRIPTION | 18.42.3. DESCRIPTION | |||
Commits changes in the layout represented by the current filehandle, | The LAYOUTCOMMIT operation commits changes in the layout represented | |||
client ID (derived from the session ID in the preceding SEQUENCE | by the current filehandle, client ID (derived from the session ID in | |||
operation), byte range, and stateid. Since layouts are sub- | the preceding SEQUENCE operation), byte-range, and stateid. Since | |||
dividable, a smaller portion of a layout, retrieved via LAYOUTGET, | layouts are sub-dividable, a smaller portion of a layout, retrieved | |||
can be committed. The region being committed is specified through | via LAYOUTGET, can be committed. The byte-range being committed is | |||
the byte range (loca_offset and loca_length). This region MUST | specified through the byte-range (loca_offset and loca_length). This | |||
overlap with one or more existing layouts previously granted via | byte-range MUST overlap with one or more existing layouts previously | |||
LAYOUTGET (Section 18.43), each with an iomode of LAYOUTIOMODE4_RW. | granted via LAYOUTGET (Section 18.43), each with an iomode of | |||
In the case where the iomode of any held layout segment is not | LAYOUTIOMODE4_RW. In the case where the iomode of any held layout | |||
LAYOUTIOMODE4_RW, the server should return the error | segment is not LAYOUTIOMODE4_RW, the server should return the error | |||
NFS4ERR_BAD_IOMODE. For the case where the client does not hold | NFS4ERR_BAD_IOMODE. For the case where the client does not hold | |||
matching layout segment(s) for the defined region, the server should | matching layout segment(s) for the defined byte-range, the server | |||
return the error NFS4ERR_BAD_LAYOUT. | should return the error NFS4ERR_BAD_LAYOUT. | |||
The LAYOUTCOMMIT operation indicates that the client has completed | The LAYOUTCOMMIT operation indicates that the client has completed | |||
writes using a layout obtained by a previous LAYOUTGET. The client | writes using a layout obtained by a previous LAYOUTGET. The client | |||
may have only written a subset of the data range it previously | may have only written a subset of the data range it previously | |||
requested. LAYOUTCOMMIT allows it to commit or discard provisionally | requested. LAYOUTCOMMIT allows it to commit or discard provisionally | |||
allocated space and to update the server with a new end of file. The | allocated space and to update the server with a new end-of-file. The | |||
layout referenced by LAYOUTCOMMIT is still valid after the operation | layout referenced by LAYOUTCOMMIT is still valid after the operation | |||
completes and can continue to be referenced by the client ID, | completes and can continue to be referenced by the client ID, | |||
filehandle, byte range, layout type, and stateid. | filehandle, byte-range, layout type, and stateid. | |||
If the loca_reclaim field is set to TRUE, this indicates that the | If the loca_reclaim field is set to TRUE, this indicates that the | |||
client is attempting to commit changes to a layout after the restart | client is attempting to commit changes to a layout after the restart | |||
of the metadata server during the metadata server's recovery grace | of the metadata server during the metadata server's recovery grace | |||
period (see Section 12.7.4). This type of request may be necessary | period (see Section 12.7.4). This type of request may be necessary | |||
when the client has uncommitted writes to provisionally allocated | when the client has uncommitted writes to provisionally allocated | |||
regions of a file which were sent to the storage devices before the | byte-ranges of a file that were sent to the storage devices before | |||
restart of the metadata server. In this case the layout provided by | the restart of the metadata server. In this case, the layout | |||
the client MUST be a subset of a writable layout that the client held | provided by the client MUST be a subset of a writable layout that the | |||
immediately before the restart of the metadata server. The value of | client held immediately before the restart of the metadata server. | |||
the field loca_stateid MUST be a value the metadata server returned | The value of the field loca_stateid MUST be a value that the metadata | |||
before it restarted. The metadata server is free to accept or reject | server returned before it restarted. The metadata server is free to | |||
this request based on its own internal metadata consistency checks. | accept or reject this request based on its own internal metadata | |||
If the metadata server finds that the layout provided by the client | consistency checks. If the metadata server finds that the layout | |||
does not pass its consistency checks, it MUST reject the request with | provided by the client does not pass its consistency checks, it MUST | |||
the status NFS4ERR_RECLAIM_BAD. The successful completion of the | reject the request with the status NFS4ERR_RECLAIM_BAD. The | |||
LAYOUTCOMMIT request with loca_reclaim set to TRUE does NOT provide | successful completion of the LAYOUTCOMMIT request with loca_reclaim | |||
the client with a layout for the file. It simply commits the changes | set to TRUE does NOT provide the client with a layout for the file. | |||
to the layout specified in the loca_layoutupdate field. To obtain a | It simply commits the changes to the layout specified in the | |||
layout for the file the client must send a LAYOUTGET request to the | loca_layoutupdate field. To obtain a layout for the file, the client | |||
server after the server's grace period has expired. If the metadata | must send a LAYOUTGET request to the server after the server's grace | |||
server receives a LAYOUTCOMMIT request with loca_reclaim set to TRUE | period has expired. If the metadata server receives a LAYOUTCOMMIT | |||
when the metadata server is not in its recovery grace period, it MUST | request with loca_reclaim set to TRUE when the metadata server is not | |||
reject the request with the status NFS4ERR_NO_GRACE. | in its recovery grace period, it MUST reject the request with the | |||
status NFS4ERR_NO_GRACE. | ||||
Setting the loca_reclaim field to TRUE is required if and only if the | Setting the loca_reclaim field to TRUE is required if and only if the | |||
committed layout was acquired before the metadata server restart. If | committed layout was acquired before the metadata server restart. If | |||
the client is committing a layout that was acquired during the | the client is committing a layout that was acquired during the | |||
metadata server's grace period, it MUST set the "reclaim" field to | metadata server's grace period, it MUST set the "reclaim" field to | |||
FALSE. | FALSE. | |||
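
The reclaim rules above can be read as a small server-side decision procedure. The Python sketch below is illustrative only: the function name, its boolean parameters, and the order in which the two checks are made are assumptions rather than part of the protocol. Only the status names come from the text.

```python
def layoutcommit_reclaim_status(loca_reclaim, in_grace_period,
                                passes_consistency_checks):
    """Return the status a metadata server might give a LAYOUTCOMMIT.

    All three parameters are hypothetical booleans standing in for state
    that a real metadata server would derive internally.
    """
    if not loca_reclaim:
        # Not a reclaim: the reclaim rules do not apply. Other checks
        # on the request are not modeled in this sketch.
        return "NFS4_OK"
    if not in_grace_period:
        # A reclaim outside the recovery grace period MUST be rejected.
        return "NFS4ERR_NO_GRACE"
    if not passes_consistency_checks:
        # The server's own metadata consistency checks failed.
        return "NFS4ERR_RECLAIM_BAD"
    # Success commits the changes but does not grant a layout; the
    # client still needs a LAYOUTGET after the grace period.
    return "NFS4_OK"
```
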
The loca_stateid is a layout stateid value as returned by previously | The loca_stateid is a layout stateid value as returned by previously | |||
successful layout operations (see Section 12.5.3). | successful layout operations (see Section 12.5.3). | |||
skipping to change at page 538, line 50 | skipping to change at page 540, line 30 | |||
described by loca_offset and loca_length. The metadata server may | described by loca_offset and loca_length. The metadata server may | |||
use this information to determine whether the file's size needs to be | use this information to determine whether the file's size needs to be | |||
updated. If the metadata server updates the file's size as the | updated. If the metadata server updates the file's size as the | |||
result of the LAYOUTCOMMIT operation, it must return the new size | result of the LAYOUTCOMMIT operation, it must return the new size | |||
(locr_newsize.ns_size) as part of the results. | (locr_newsize.ns_size) as part of the results. | |||
The loca_time_modify field allows the client to suggest a | The loca_time_modify field allows the client to suggest a | |||
modification time it would like the metadata server to set. The | modification time it would like the metadata server to set. The | |||
metadata server may use the suggestion or it may use the time of the | metadata server may use the suggestion or it may use the time of the | |||
LAYOUTCOMMIT operation to set the modification time. If the metadata | LAYOUTCOMMIT operation to set the modification time. If the metadata | |||
server uses the client provided modification time, it should ensure | server uses the client-provided modification time, it should ensure | |||
time does not flow backwards. If the client wants to force the | that time does not flow backwards. If the client wants to force the | |||
metadata server to set an exact time, the client should use a SETATTR | metadata server to set an exact time, the client should use a SETATTR | |||
operation in a COMPOUND right after LAYOUTCOMMIT. See Section 12.5.4 | operation in a COMPOUND right after LAYOUTCOMMIT. See Section 12.5.4 | |||
for more details. If the client desires the resultant modification | for more details. If the client desires the resultant modification | |||
time it should construct the COMPOUND so that a GETATTR follows the | time, it should construct the COMPOUND so that a GETATTR follows the | |||
LAYOUTCOMMIT. | LAYOUTCOMMIT. | |||
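
One possible way for a metadata server to honor a suggested modification time without letting time flow backwards is sketched below in Python. The function name, the (seconds, nseconds) tuple representation, and the clamping policy are assumptions made for illustration; the text leaves the choice of policy to the server.

```python
def choose_modify_time(current_mtime, commit_time, suggested_mtime=None):
    """Pick the modification time to record after a LAYOUTCOMMIT.

    Times are hypothetical (seconds, nseconds) tuples, which compare
    in chronological order.
    """
    # Use the client's suggestion when one was supplied; otherwise use
    # the time of the LAYOUTCOMMIT operation itself.
    candidate = suggested_mtime if suggested_mtime is not None else commit_time
    # Never record a time earlier than the one already stored.
    return max(current_mtime, candidate)
```

A client that needs an exact time would still follow LAYOUTCOMMIT with SETATTR, as the text describes; this sketch only covers the server's handling of the hint.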
The loca_layoutupdate argument to LAYOUTCOMMIT provides a mechanism | The loca_layoutupdate argument to LAYOUTCOMMIT provides a mechanism | |||
for a client to provide layout specific updates to the metadata | for a client to provide layout-specific updates to the metadata | |||
server. For example, the layout update can describe what regions of | server. For example, the layout update can describe what byte-ranges | |||
the original layout have been used and what regions can be | of the original layout have been used and what byte-ranges can be | |||
deallocated. There is no NFSv4.1 file layout-specific layoutupdate4 | deallocated. There is no NFSv4.1 file layout-specific layoutupdate4 | |||
structure. | structure. | |||
The layout information is more verbose for block devices than for | The layout information is more verbose for block devices than for | |||
objects and files because the latter two hide the details of block | objects and files because the latter two hide the details of block | |||
allocation behind their storage protocols. At the minimum, the | allocation behind their storage protocols. At the minimum, the | |||
client needs to communicate changes to the end of file location back | client needs to communicate changes to the end-of-file location back | |||
to the server, and, if desired, its view of the file's modification | to the server, and, if desired, its view of the file's modification | |||
time. For block/volume layouts, it needs to specify precisely which | time. For block/volume layouts, it needs to specify precisely which | |||
blocks have been used. | blocks have been used. | |||
If the layout identified in the arguments does not exist, the error | If the layout identified in the arguments does not exist, the error | |||
NFS4ERR_BADLAYOUT is returned. The layout being committed may also | NFS4ERR_BADLAYOUT is returned. The layout being committed may also | |||
be rejected if it does not correspond to an existing layout with an | be rejected if it does not correspond to an existing layout with an | |||
iomode of LAYOUTIOMODE4_RW. | iomode of LAYOUTIOMODE4_RW. | |||
On success, the current filehandle retains its value and the current | On success, the current filehandle retains its value and the current | |||
skipping to change at page 539, line 42 | skipping to change at page 541, line 22 | |||
18.42.4. IMPLEMENTATION | 18.42.4. IMPLEMENTATION | |||
The client MAY also use LAYOUTCOMMIT with the loca_reclaim field set | The client MAY also use LAYOUTCOMMIT with the loca_reclaim field set | |||
to TRUE to convey hints about modified file attributes or to report | to TRUE to convey hints about modified file attributes or to report | |||
layout-type specific information such as I/O errors for object-based | layout-type specific information such as I/O errors for object-based | |||
storage layouts, as is done during normal operation. Doing so | storage layouts, as is done during normal operation. Doing so | |||
may help the metadata server to recover files more efficiently after | may help the metadata server to recover files more efficiently after | |||
restart. For example, some file system implementations may require | restart. For example, some file system implementations may require | |||
expansive recovery of file system objects if the metadata server does | expansive recovery of file system objects if the metadata server does | |||
not get a positive indication from all clients holding a write layout | not get a positive indication from all clients holding a | |||
that they have successfully completed all their writes. Sending a | LAYOUTIOMODE4_RW layout that they have successfully completed all | |||
LAYOUTCOMMIT (if required) and then following with LAYOUTRETURN can | their writes. Sending a LAYOUTCOMMIT (if required) and then | |||
provide such an indication and allow for graceful and efficient | following with LAYOUTRETURN can provide such an indication and allow | |||
recovery. | for graceful and efficient recovery. | |||
If loca_reclaim is TRUE, the metadata server is free to either | If loca_reclaim is TRUE, the metadata server is free to either | |||
examine or ignore the value in the field loca_stateid. The metadata | examine or ignore the value in the field loca_stateid. The metadata | |||
server implementation might or might not encode in its layout stateid | server implementation might or might not encode in its layout stateid | |||
information that allows the metadata server to perform a consistency | information that allows the metadata server to perform a consistency | |||
check on the LAYOUTCOMMIT request. | check on the LAYOUTCOMMIT request. | |||
18.43. Operation 50: LAYOUTGET - Get Layout Information | 18.43. Operation 50: LAYOUTGET - Get Layout Information | |||
18.43.1. ARGUMENT | 18.43.1. ARGUMENT | |||
skipping to change at page 540, line 41 | skipping to change at page 542, line 24 | |||
case NFS4_OK: | case NFS4_OK: | |||
LAYOUTGET4resok logr_resok4; | LAYOUTGET4resok logr_resok4; | |||
case NFS4ERR_LAYOUTTRYLATER: | case NFS4ERR_LAYOUTTRYLATER: | |||
bool logr_will_signal_layout_avail; | bool logr_will_signal_layout_avail; | |||
default: | default: | |||
void; | void; | |||
}; | }; | |||
18.43.3. DESCRIPTION | 18.43.3. DESCRIPTION | |||
Requests a layout from the metadata server for reading or writing the | The LAYOUTGET operation requests a layout from the metadata server | |||
file given by the filehandle at the byte range specified by offset | for reading or writing the file given by the filehandle at the byte- | |||
and length. Layouts are identified by the client ID (derived from | range specified by offset and length. Layouts are identified by the | |||
the session ID in the preceding SEQUENCE operation), current | client ID (derived from the session ID in the preceding SEQUENCE | |||
filehandle, layout type (loga_layout_type), and the layout stateid | operation), current filehandle, layout type (loga_layout_type), and | |||
(loga_stateid). The use of the loga_iomode field depends upon the | the layout stateid (loga_stateid). The use of the loga_iomode field | |||
layout type, but should reflect the client's data access intent. | depends upon the layout type, but should reflect the client's data | |||
access intent. | ||||
If the metadata server is in a grace period, and does not persist | If the metadata server is in a grace period, and does not persist | |||
layouts and device ID to device address mappings, then it MUST return | layouts and device ID to device address mappings, then it MUST return | |||
NFS4ERR_GRACE (see Section 8.4.2.1). | NFS4ERR_GRACE (see Section 8.4.2.1). | |||
The LAYOUTGET operation returns layout information for the specified | The LAYOUTGET operation returns layout information for the specified | |||
byte range: a layout. The client actually specifies two ranges, both | byte-range: a layout. The client actually specifies two ranges, both | |||
starting at the offset in the loga_offset field. The first range is | starting at the offset in the loga_offset field. The first range is | |||
between loga_offset and loga_offset + loga_length - 1 inclusive. | between loga_offset and loga_offset + loga_length - 1 inclusive. | |||
This range indicates the desired range the client wants the layout to | This range indicates the desired range the client wants the layout to | |||
cover. The second range is between loga_offset and loga_offset + | cover. The second range is between loga_offset and loga_offset + | |||
loga_minlength - 1 inclusive. This range indicates the required | loga_minlength - 1 inclusive. This range indicates the required | |||
range the client needs the layout to cover. Thus, loga_minlength | range the client needs the layout to cover. Thus, loga_minlength | |||
MUST be less than or equal to loga_length. | MUST be less than or equal to loga_length. | |||
When a length field is set to NFS4_UINT64_MAX, this indicates a | When a length field is set to NFS4_UINT64_MAX, this indicates a | |||
desire (when loga_length is NFS4_UINT64_MAX) or requirement (when | desire (when loga_length is NFS4_UINT64_MAX) or requirement (when | |||
loga_minlength is NFS4_UINT64_MAX) to get a layout from loga_offset | loga_minlength is NFS4_UINT64_MAX) to get a layout from loga_offset | |||
through the end-of-file, regardless of the file's length. | through the end-of-file, regardless of the file's length. | |||
The following rules govern the relationships among, and the minima of | The following rules govern the relationships among, and the minima | |||
loga_length, loga_minlength, and loga_offset. | of, loga_length, loga_minlength, and loga_offset. | |||
o If loga_length is less than loga_minlength, the metadata server | o If loga_length is less than loga_minlength, the metadata server | |||
MUST return NFS4ERR_INVAL. | MUST return NFS4ERR_INVAL. | |||
o If loga_minlength is zero, this is an indication to the metadata | o If loga_minlength is zero, this is an indication to the metadata | |||
server that the client desires any layout at offset loga_offset or | server that the client desires any layout at offset loga_offset or | |||
less that the metadata server has "readily available". Readily is | less that the metadata server has "readily available". Readily is | |||
subjective, and depends on the layout type and the pNFS server | subjective, and depends on the layout type and the pNFS server | |||
implementation. For example, some metadata servers might have to | implementation. For example, some metadata servers might have to | |||
pre-allocate stable storage when they receive a request for a | pre-allocate stable storage when they receive a request for a | |||
range of a file that goes beyond the file's current length. If | range of a file that goes beyond the file's current length. If | |||
loga_minlength is zero and loga_length is greater than zero, this | loga_minlength is zero and loga_length is greater than zero, this | |||
tells the metadata server what range of the layout the client | tells the metadata server what range of the layout the client | |||
would prefer to have. If loga_length and loga_minlength are both | would prefer to have. If loga_length and loga_minlength are both | |||
zero, then the client is indicating it desires a layout of any | zero, then the client is indicating that it desires a layout of | |||
length with the ending offset of the range no less than specified | any length with the ending offset of the range no less than the | |||
loga_offset, and the starting offset at or below loga_offset. If | value specified by loga_offset, and the starting offset at or below | |||
the metadata server does not have a layout that is readily | loga_offset. If the metadata server does not have a layout that | |||
available, then it MUST return NFS4ERR_LAYOUTTRYLATER. | is readily available, then it MUST return NFS4ERR_LAYOUTTRYLATER. | |||
o If the sum of loga_offset and loga_minlength exceeds | o If the sum of loga_offset and loga_minlength exceeds | |||
NFS4_UINT64_MAX, and loga_minlength is not NFS4_UINT64_MAX, the | NFS4_UINT64_MAX, and loga_minlength is not NFS4_UINT64_MAX, the | |||
error NFS4ERR_INVAL MUST result. | error NFS4ERR_INVAL MUST result. | |||
o If the sum of loga_offset and loga_length exceeds NFS4_UINT64_MAX, | o If the sum of loga_offset and loga_length exceeds NFS4_UINT64_MAX, | |||
and loga_length is not NFS4_UINT64_MAX, the error NFS4ERR_INVAL | and loga_length is not NFS4_UINT64_MAX, the error NFS4ERR_INVAL | |||
MUST result. | MUST result. | |||
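
The argument checks in the list above can be sketched in Python as follows. The function name and the string return values are assumptions for illustration. Python integers do not overflow, so the sums can be compared directly; an implementation using fixed-width 64-bit arithmetic would need a different overflow test.

```python
NFS4_UINT64_MAX = 0xFFFFFFFFFFFFFFFF

def check_layoutget_ranges(loga_offset, loga_length, loga_minlength):
    """Apply the NFS4ERR_INVAL rules to the LAYOUTGET range arguments."""
    # The required range must not be longer than the desired range.
    if loga_length < loga_minlength:
        return "NFS4ERR_INVAL"
    # A length of NFS4_UINT64_MAX means "through end-of-file" and is
    # exempt from the overflow rule; any other length must keep
    # offset + length within the 64-bit range.
    for length in (loga_minlength, loga_length):
        if length != NFS4_UINT64_MAX and loga_offset + length > NFS4_UINT64_MAX:
            return "NFS4ERR_INVAL"
    return "NFS4_OK"
```

For example, a request with loga_offset of zero, loga_minlength of zero, and loga_length of NFS4_UINT64_MAX passes these checks.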
After the metadata server has performed the above checks on | After the metadata server has performed the above checks on | |||
skipping to change at page 542, line 48 | skipping to change at page 544, line 46 | |||
| | | _RW | <= a_off | | | | | | _RW | <= a_off | | | |||
+-----------+-----------+----------+----------+---------------------+ | +-----------+-----------+----------+----------+---------------------+ | |||
Table 13 | Table 13 | |||
If loga_minlength is not zero and the metadata server cannot return a | If loga_minlength is not zero and the metadata server cannot return a | |||
layout according to the rules in Table 13, then the metadata server | layout according to the rules in Table 13, then the metadata server | |||
MUST return the error NFS4ERR_BADLAYOUT. If loga_minlength is zero | MUST return the error NFS4ERR_BADLAYOUT. If loga_minlength is zero | |||
and the metadata server cannot or will not return a layout according | and the metadata server cannot or will not return a layout according | |||
to the rules in Table 13, then the metadata server MUST return the | to the rules in Table 13, then the metadata server MUST return the | |||
error NFS4ERR_LAYOUTTRYLATER. Assuming loga_length is greater than | error NFS4ERR_LAYOUTTRYLATER. Assuming that loga_length is greater | |||
loga_minlength or equal to zero, the metadata server SHOULD return a | than loga_minlength or equal to zero, the metadata server SHOULD | |||
layout according to the rules in Table 14. | return a layout according to the rules in Table 14. | |||
Desired layouts based on loga_length. The rules of Table 13 MUST be | Desired layouts based on loga_length. The rules of Table 13 MUST be | |||
applied first. Note: u64m = NFS4_UINT64_MAX; a_off = loga_offset; | applied first. Note: u64m = NFS4_UINT64_MAX; a_off = loga_offset; | |||
a_len = loga_length. | a_len = loga_length. | |||
+------------+------------+-----------+-----------+-----------------+ | +------------+------------+-----------+-----------+-----------------+ | |||
| Layout | Layout | Layout | Layout | Layout length | | | Layout | Layout | Layout | Layout | Layout length | | |||
| iomode of | a_len of | iomode of | offset of | of reply | | | iomode of | a_len of | iomode of | offset of | of reply | | |||
| request | request | reply | reply | | | | request | request | reply | reply | | | |||
+------------+------------+-----------+-----------+-----------------+ | +------------+------------+-----------+-----------+-----------------+ | |||
skipping to change at page 544, line 29 | skipping to change at page 546, line 29 | |||
iomode | iomode | |||
The value of the returned layout iomode listed in Table 13 and | The value of the returned layout iomode listed in Table 13 and | |||
Table 14 is equal to the value of the lo_iomode field in each | Table 14 is equal to the value of the lo_iomode field in each | |||
element of logr_layout. As shown in Table 13 and Table 14, the | element of logr_layout. As shown in Table 13 and Table 14, the | |||
metadata server MAY return a layout with an lo_iomode different | metadata server MAY return a layout with an lo_iomode different | |||
from the requested iomode (field loga_iomode of the request). If | from the requested iomode (field loga_iomode of the request). If | |||
it does so, it MUST ensure that the lo_iomode is more permissive | it does so, it MUST ensure that the lo_iomode is more permissive | |||
than the loga_iomode requested. For example, this behavior allows | than the loga_iomode requested. For example, this behavior allows | |||
an implementation to upgrade read-only requests to read/write | an implementation to upgrade LAYOUTIOMODE4_READ requests to | |||
requests at its discretion, within the limits of the layout type | LAYOUTIOMODE4_RW requests at its discretion, within the limits of | |||
specific protocol. A lo_iomode of either LAYOUTIOMODE4_READ or | the layout type specific protocol. A lo_iomode of either | |||
LAYOUTIOMODE4_RW MUST be returned. | LAYOUTIOMODE4_READ or LAYOUTIOMODE4_RW MUST be returned. | |||
offset | offset | |||
The value of the returned layout offset listed in Table 13 and | The value of the returned layout offset listed in Table 13 and | |||
Table 14 is always equal to the lo_offset field of the first | Table 14 is always equal to the lo_offset field of the first | |||
element of logr_layout. | element of logr_layout. | |||
length | length | |||
When setting the value of the returned layout length, the | When setting the value of the returned layout length, the | |||
situation is complicated by the possibility that the special | situation is complicated by the possibility that the special | |||
layout length value NFS4_UINT64_MAX is involved. For a | layout length value NFS4_UINT64_MAX is involved. For a | |||
logr_layout array of N elements, the lo_length field in the first | logr_layout array of N elements, the lo_length field in the first | |||
N-1 elements MUST NOT be NFS4_UINT64_MAX. The lo_length field of | N-1 elements MUST NOT be NFS4_UINT64_MAX. The lo_length field of | |||
the last element of logr_layout can be NFS4_UINT64_MAX under some | the last element of logr_layout can be NFS4_UINT64_MAX under some | |||
conditions as described in the following list. | conditions as described in the following list. | |||
* If an applicable rule of Table 13 states the metadata server | * If an applicable rule of Table 13 states that the metadata | |||
MUST return a layout of length NFS4_UINT64_MAX, then lo_length | server MUST return a layout of length NFS4_UINT64_MAX, then the | |||
field of the last element of logr_layout MUST be | lo_length field of the last element of logr_layout MUST be | |||
NFS4_UINT64_MAX. | NFS4_UINT64_MAX. | |||
* If an applicable rule of Table 13 states the metadata server | * If an applicable rule of Table 13 states that the metadata | |||
MUST NOT return a layout of length NFS4_UINT64_MAX, then | server MUST NOT return a layout of length NFS4_UINT64_MAX, then | |||
lo_length field of the last element of logr_layout MUST NOT be | the lo_length field of the last element of logr_layout MUST NOT | |||
NFS4_UINT64_MAX. | be NFS4_UINT64_MAX. | |||
* If an applicable rule of Table 14 states the metadata server | * If an applicable rule of Table 14 states that the metadata | |||
SHOULD return a layout of length NFS4_UINT64_MAX, then | server SHOULD return a layout of length NFS4_UINT64_MAX, then | |||
lo_length field of the last element of logr_layout SHOULD be | the lo_length field of the last element of logr_layout SHOULD | |||
NFS4_UINT64_MAX. | be NFS4_UINT64_MAX. | |||
* When the value of the returned layout length of Table 13 and | * When the value of the returned layout length of Table 13 and | |||
Table 14 is not NFS4_UINT64_MAX, then the returned layout | Table 14 is not NFS4_UINT64_MAX, then the returned layout | |||
length is equal to the sum of the lo_length fields of each | length is equal to the sum of the lo_length fields of each | |||
element of logr_layout. | element of logr_layout. | |||
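
The iomode and length definitions above can be sketched in Python as two small helpers. The function names are hypothetical, the numeric iomode values are assumed to match the layoutiomode4 definition in the protocol's XDR, and treating LAYOUTIOMODE4_RW as the only mode more permissive than LAYOUTIOMODE4_READ is an interpretation of the text rather than a quotation of it.

```python
NFS4_UINT64_MAX = 0xFFFFFFFFFFFFFFFF
LAYOUTIOMODE4_READ = 1  # assumed XDR value
LAYOUTIOMODE4_RW = 2    # assumed XDR value

def reply_iomode_ok(loga_iomode, lo_iomode):
    """Check a returned lo_iomode against the requested loga_iomode."""
    # Only _READ or _RW may be returned.
    if lo_iomode not in (LAYOUTIOMODE4_READ, LAYOUTIOMODE4_RW):
        return False
    # Either the same mode, or an upgrade from _READ to _RW.
    return lo_iomode == loga_iomode or (
        loga_iomode == LAYOUTIOMODE4_READ and lo_iomode == LAYOUTIOMODE4_RW)

def returned_layout_length(lo_lengths):
    """Compute the returned layout length from the lo_length fields.

    lo_lengths is a hypothetical list holding the lo_length of each
    element of logr_layout, in order.
    """
    if not lo_lengths:
        raise ValueError("logr_layout has no elements")
    # Only the last element may carry the special value.
    if any(length == NFS4_UINT64_MAX for length in lo_lengths[:-1]):
        raise ValueError("NFS4_UINT64_MAX is allowed only in the last element")
    if lo_lengths[-1] == NFS4_UINT64_MAX:
        return NFS4_UINT64_MAX
    # Otherwise the layout length is the sum of the lo_length fields.
    return sum(lo_lengths)
```
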
The logr_return_on_close result field is a directive to return the | The logr_return_on_close result field is a directive to return the | |||
layout before closing the file. When the metadata server sets this | layout before closing the file. When the metadata server sets this | |||
return value to TRUE, it MUST be prepared to recall the layout in the | return value to TRUE, it MUST be prepared to recall the layout in the | |||
case the client fails to return the layout before close. For the | case in which the client fails to return the layout before close. | |||
metadata server that knows a layout must be returned before a close | For the metadata server that knows a layout must be returned before a | |||
of the file, this return value can be used to communicate the desired | close of the file, this return value can be used to communicate the | |||
behavior to the client and thus remove one extra step from the | desired behavior to the client and thus remove one extra step from | |||
client's and metadata server's interaction. | the client's and metadata server's interaction. | |||
The logr_stateid stateid is returned to the client for use in | The logr_stateid stateid is returned to the client for use in | |||
subsequent layout related operations. See Section 8.2, | subsequent layout related operations. See Sections 8.2, 12.5.3, and | |||
Section 12.5.3, and Section 12.5.5.2 for a further discussion and | 12.5.5.2 for a further discussion and requirements. | |||
requirements. | ||||
The format of the returned layout (lo_content) is specific to the | The format of the returned layout (lo_content) is specific to the | |||
layout type. The value of the layout type (lo_content.loc_type) for | layout type. The value of the layout type (lo_content.loc_type) for | |||
each of the elements of the array of layouts returned by the metadata | each of the elements of the array of layouts returned by the metadata | |||
server (logr_layout) MUST be equal to the loga_layout_type specified | server (logr_layout) MUST be equal to the loga_layout_type specified | |||
by the client. If it is not equal, the client SHOULD ignore the | by the client. If it is not equal, the client SHOULD ignore the | |||
response as invalid and behave as if the metadata server returned an | response as invalid and behave as if the metadata server returned an | |||
error, even if the client does have support for the layout type | error, even if the client does have support for the layout type | |||
returned. | returned. | |||
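
A client-side check for the layout type rule above might look like the following Python sketch. The function name and the dictionary representation of a layout element (with a "loc_type" key standing in for lo_content.loc_type) are assumptions for illustration.

```python
def layouts_match_requested_type(loga_layout_type, logr_layout):
    """Return True only if every returned layout has the requested type.

    logr_layout is a hypothetical list of dicts, one per returned
    layout element.
    """
    return all(lo["loc_type"] == loga_layout_type for lo in logr_layout)
```

When this returns False, the client would treat the reply as if the metadata server had returned an error, even if it supports the type that was actually returned.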
If layouts are not supported for the requested file or its containing | If neither the requested file nor its containing file system support | |||
file system the metadata server MUST return | layouts, the metadata server MUST return NFS4ERR_LAYOUTUNAVAILABLE. | |||
NFS4ERR_LAYOUTUNAVAILABLE. If the layout type is not supported, the | If the layout type is not supported, the metadata server MUST return | |||
metadata server MUST return NFS4ERR_UNKNOWN_LAYOUTTYPE. If layouts | NFS4ERR_UNKNOWN_LAYOUTTYPE. If layouts are supported but no layout | |||
are supported but no layout matches the client provided layout | matches the client provided layout identification, the metadata | |||
identification, the metadata server MUST return NFS4ERR_BADLAYOUT. | server MUST return NFS4ERR_BADLAYOUT. If an invalid loga_iomode is | |||
If an invalid loga_iomode is specified, or a loga_iomode of | specified, or a loga_iomode of LAYOUTIOMODE4_ANY is specified, the | |||
LAYOUTIOMODE4_ANY is specified, the metadata server MUST return | metadata server MUST return NFS4ERR_BADIOMODE. | |||
NFS4ERR_BADIOMODE. | ||||
If the layout for the file is unavailable due to transient | If the layout for the file is unavailable due to transient | |||
conditions, e.g. file sharing prohibits layouts, the metadata server | conditions, e.g., file sharing prohibits layouts, the metadata server | |||
MUST return NFS4ERR_LAYOUTTRYLATER. | MUST return NFS4ERR_LAYOUTTRYLATER. | |||
If the layout request is rejected due to an overlapping layout | If the layout request is rejected due to an overlapping layout | |||
recall, the metadata server MUST return NFS4ERR_RECALLCONFLICT. See | recall, the metadata server MUST return NFS4ERR_RECALLCONFLICT. See | |||
Section 12.5.5.2 for details. | Section 12.5.5.2 for details. | |||
If the layout conflicts with a mandatory byte range lock held on the | If the layout conflicts with a mandatory byte-range lock held on the | |||
file, and if the storage devices have no method of enforcing | file, and if the storage devices have no method of enforcing | |||
mandatory locks, other than through the restriction of layouts, the | mandatory locks, other than through the restriction of layouts, the | |||
metadata server SHOULD return NFS4ERR_LOCKED. | metadata server SHOULD return NFS4ERR_LOCKED. | |||
If the client sets loga_signal_layout_avail to TRUE, then it is | If the client sets loga_signal_layout_avail to TRUE, then it is | |||
registering with the metadata server a "want" for a layout in the event the | registering with the metadata server a "want" for a layout in the event the | |||
layout cannot be obtained due to resource exhaustion. If the | layout cannot be obtained due to resource exhaustion. If the | |||
metadata server supports and will honor the "want", the results will | metadata server supports and will honor the "want", the results will | |||
have logr_will_signal_layout_avail set to TRUE. If so the client | have logr_will_signal_layout_avail set to TRUE. If so, the client | |||
should expect a CB_RECALLABLE_OBJ_AVAIL operation to indicate that a | should expect a CB_RECALLABLE_OBJ_AVAIL operation to indicate that a | |||
layout is available. | layout is available. | |||
On success, the current filehandle retains its value and the current | On success, the current filehandle retains its value and the current | |||
stateid is updated to match the value as returned in the results. | stateid is updated to match the value as returned in the results. | |||
18.43.4. IMPLEMENTATION | 18.43.4. IMPLEMENTATION | |||
Typically, LAYOUTGET will be called as part of a COMPOUND request | Typically, LAYOUTGET will be called as part of a COMPOUND request | |||
after an OPEN operation and results in the client having location | after an OPEN operation and results in the client having location | |||
information for the file; this requires that loga_stateid be set to | information for the file. This requires that loga_stateid be set to | |||
the special stateid that tells the metadata server to use the current | the special stateid that tells the metadata server to use the current | |||
stateid, which is set by OPEN (see Section 16.2.3.1.2) . A client | stateid, which is set by OPEN (see Section 16.2.3.1.2). A client may | |||
may also hold a layout across multiple OPENs. The client specifies a | also hold a layout across multiple OPENs. The client specifies a | |||
layout type that limits what kind of layout the metadata server will | layout type that limits what kind of layout the metadata server will | |||
return. This prevents metadata servers from granting layouts that | return. This prevents metadata servers from granting layouts that | |||
are unusable by the client. | are unusable by the client. | |||
As indicated by Table 13 and Table 14 the specification of LAYOUTGET | As indicated by Table 13 and Table 14, the specification of LAYOUTGET | |||
allows a pNFS client and server considerable flexibility. A pNFS | allows a pNFS client and server considerable flexibility. A pNFS | |||
client can take several strategies for sending LAYOUTGET. Some | client can take several strategies for sending LAYOUTGET. Some | |||
examples are as follows. | examples are as follows. | |||
o If LAYOUTGET is preceded by OPEN in the same COMPOUND request, and | o If LAYOUTGET is preceded by OPEN in the same COMPOUND request and | |||
the OPEN requests read access, the client might opt to request a | the OPEN requests OPEN4_SHARE_ACCESS_READ access, the client might | |||
_READ layout with loga_offset set to zero, loga_minlength set to | opt to request a _READ layout with loga_offset set to zero, | |||
zero, and loga_length set to NFS4_UINT64_MAX. If the file has | loga_minlength set to zero, and loga_length set to | |||
space allocated to it, that space is striped over one or more | NFS4_UINT64_MAX. If the file has space allocated to it, that | |||
storage devices, and there is either no conflicting layout, or the | space is striped over one or more storage devices, and there is | |||
concept of a conflicting layout does not apply to the pNFS | either no conflicting layout or the concept of a conflicting | |||
server's layout type or implementation, then the metadata server | layout does not apply to the pNFS server's layout type or | |||
might return a layout with a starting offset of zero, and a length | implementation, then the metadata server might return a layout | |||
equal to the length of the file, if not NFS4_UINT64_MAX. If the | with a starting offset of zero, and a length equal to the length | |||
length of the file is not a multiple of the pNFS server's stripe | of the file, if not NFS4_UINT64_MAX. If the length of the file is | |||
width (see Section 13.2 for a formal definition), the metadata | not a multiple of the pNFS server's stripe width (see Section 13.2 | |||
server might round the returned layout's length up. | for a formal definition), the metadata server might round up the | |||
returned layout's length. | ||||
o If LAYOUTGET is preceded by OPEN in the same COMPOUND request, and | o If LAYOUTGET is preceded by OPEN in the same COMPOUND request, and | |||
the OPEN does not truncate the file, and requests write access, | the OPEN requests OPEN4_SHARE_ACCESS_WRITE access and does not | |||
the client might opt to request a _RW layout with loga_offset set | truncate the file, the client might opt to request a _RW layout | |||
to zero, loga_minlength set to zero, and loga_length set to the | with loga_offset set to zero, loga_minlength set to zero, and | |||
file's current length (if known), or NFS4_UINT64_MAX. As with the | loga_length set to the file's current length (if known), or | |||
previous case, under some conditions the metadata server might | NFS4_UINT64_MAX. As with the previous case, under some conditions | |||
return a layout that covers the entire length of the file or | the metadata server might return a layout that covers the entire | |||
beyond. | length of the file or beyond. | |||
o As above, but the OPEN truncates the file. In this case, client | o This strategy is as above, but the OPEN truncates the file. In | |||
might anticipate it will be writing to the file from offset zero, | this case, the client might anticipate it will be writing to the | |||
and so loga_offset and loga_minlength are set to zero, and | file from offset zero, and so loga_offset and loga_minlength are | |||
loga_length is set to the value of threshold4_write_iosize. The | set to zero, and loga_length is set to the value of | |||
metadata server might return a layout from offset zero with a | threshold4_write_iosize. The metadata server might return a | |||
length at least as long as threshold4_write_iosize. | layout from offset zero with a length at least as long as | |||
threshold4_write_iosize. | ||||
o A process on the client invokes a request to read from offset | o A process on the client invokes a request to read from offset | |||
10000 for length 50000. The client is using buffered I/O, and has | 10000 for length 50000. The client is using buffered I/O, and has | |||
buffer sizes of 4096 bytes. The client intends to map the request | buffer sizes of 4096 bytes. The client intends to map the request | |||
of the process into a series of READ requests starting at offset | of the process into a series of READ requests starting at offset | |||
8192. The end offset needs to be higher than 10000 + 50000 = | 8192. The end offset needs to be higher than 10000 + 50000 = | |||
60000, and the next offset that is a multiple of 4096 is 61440. | 60000, and the next offset that is a multiple of 4096 is 61440. | |||
The difference between 61440 and that starting offset of the | The difference between 61440 and that starting offset of the | |||
layout is 53248 (which is the product of 4096 and 13). The value | layout is 53248 (which is the product of 4096 and 13). The value | |||
of threshold4_read_iosize is less than 53248, so the client sends | of threshold4_read_iosize is less than 53248, so the client sends | |||
a LAYOUTGET request with loga_offset set to 8192, loga_minlength | a LAYOUTGET request with loga_offset set to 8192, loga_minlength | |||
set to 53248, and loga_length set to the file's length (if known) | set to 53248, and loga_length set to the file's length (if known) | |||
minus 8192 or NFS4_UINT64_MAX (if the file's length is not known). | minus 8192 or NFS4_UINT64_MAX (if the file's length is not known). | |||
Since this LAYOUTGET request exceeds the metadata server's | Since this LAYOUTGET request exceeds the metadata server's | |||
threshold, it grants the layout, possibly with an initial offset | threshold, it grants the layout, possibly with an initial offset | |||
of 0, with an end offset of at least 8192 + 53248 - 1 = 61439, but | of zero, with an end offset of at least 8192 + 53248 - 1 = 61439, | |||
preferably a layout with an offset aligned on the stripe width and | but preferably a layout with an offset aligned on the stripe width | |||
a length that is a multiple of the stripe width. | and a length that is a multiple of the stripe width. | |||
o As above, but the client is not using buffered I/O, and instead | o This strategy is as above, but the client is not using buffered | |||
all internal I/O requests are sent directly to the server. The | I/O, and instead all internal I/O requests are sent directly to | |||
LAYOUTGET request has loga_offset equal to 10000, and | the server. The LAYOUTGET request has loga_offset equal to 10000 | |||
loga_minlength set to 50000. The value of loga_length is set to | and loga_minlength set to 50000. The value of loga_length is set | |||
the length of the file. The metadata server is free to return a | to the length of the file. The metadata server is free to return | |||
layout that fully overlaps the requested range, with a starting | a layout that fully overlaps the requested range, with a starting | |||
offset and length aligned on the stripe width. | offset and length aligned on the stripe width. | |||
o Again a process on the client invokes a request to read from | o Again, a process on the client invokes a request to read from | |||
offset 10000 for length 50000, and buffered I/O is in use. The | offset 10000 for length 50000 (i.e., a range with a starting offset | |||
client is expecting that the server might not be able to return | of 10000 and an ending offset of 59999), and buffered I/O is in | |||
the layout for the full I/O range, with loga_offset set to 8192 | use. The client is expecting that the server might not be able to | |||
and loga_minlength set to 53248. The client intends to map the | return the layout for the full I/O range. The client intends to | |||
request of the process into a series of READ requests starting at | map the request of the process into a series of thirteen READ | |||
offset 8192, each with length 4096, with a total length of 53248 | requests starting at offset 8192, each with length 4096, with a | |||
(which equals 13 * 4096). Because the value of | total length of 53248 (which equals 13 * 4096), which fully | |||
threshold4_read_iosize is equal to 4096, it is practical and | contains the range that the client's process wants to read. Because | |||
reasonable for the client to use several LAYOUTGETs to complete | the value of threshold4_read_iosize is equal to 4096, it is | |||
the series of READs. The client sends a LAYOUTGET request with | practical and reasonable for the client to use several LAYOUTGET | |||
loga_offset set to 8192, loga_minlength set to 4096, and | operations to complete the series of READs. The client sends a | |||
loga_length set to 53248 or higher. The server will grant a | LAYOUTGET request with loga_offset set to 8192, loga_minlength set | |||
layout possibly with an initial offset of 0, with an end offset of | to 4096, and loga_length set to 53248 or higher. The server will | |||
at least 8192 + 4096 - 1 = 12287, but preferably a layout with an | grant a layout possibly with an initial offset of zero, with an | |||
offset aligned on the stripe width and a length that is a multiple | end offset of at least 8192 + 4096 - 1 = 12287, but preferably a | |||
of the stripe width. This will allow the client to make forward | layout with an offset aligned on the stripe width and a length | |||
progress, possibly having to send more LAYOUTGET operations for | that is a multiple of the stripe width. This will allow the | |||
the remainder of the range. | client to make forward progress, possibly sending more LAYOUTGET | |||
operations for the remainder of the range. | ||||
o An NFS client detects a sequential read pattern, and so sends a | o An NFS client detects a sequential read pattern, and so sends a | |||
LAYOUTGET operation that goes well beyond any current or pending | LAYOUTGET operation that goes well beyond any current or pending | |||
read requests to the server. The server might likewise detect | read requests to the server. The server might likewise detect | |||
this pattern, and grant the LAYOUTGET request. The client | this pattern, and grant the LAYOUTGET request. Once the client | |||
continues to send LAYOUTGET requests once it has read from an | reads from an offset of the file that represents 50% of the way | |||
offset of the file that represents 50% of the way through the | through the range of the last layout it received, in order to | |||
range of the last layout it received. | avoid stalling I/O that would wait for a layout, the client sends | |||
more operations from an offset of the file that represents 50% of | ||||
the way through the last layout it received. The client continues | ||||
to request layouts with byte-ranges that are well in advance of | ||||
the byte-ranges of recent and/or read requests of processes | ||||
running on the client. | ||||
o As above but the client fails to detect the pattern, but the | o This strategy is as above, but the client fails to detect the | |||
server does. The next time the metadata server gets a LAYOUTGET, | pattern, but the server does. The next time the metadata server | |||
it returns a layout with a length that is well beyond | gets a LAYOUTGET, it returns a layout with a length that is well | |||
loga_minlength. | beyond loga_minlength. | |||
o A client is using buffered I/O, and has a long queue of write | o A client is using buffered I/O, and has a long queue of write- | |||
behinds to process and also detects a sequential write pattern. | behinds to process and also detects a sequential write pattern. | |||
It sends a LAYOUTGET operation for a layout that spans the range | It sends a LAYOUTGET for a layout that spans the range of the | |||
of the queued write behinds and well beyond, including ranges | queued write-behinds and well beyond, including ranges beyond the | |||
beyond the file's current length. The client continues to send | file's current length. The client continues to send LAYOUTGET | |||
LAYOUTGET operations once the write behind queue reaches 50% of | operations once the write-behind queue reaches 50% of the maximum | |||
the maximum queue length. | queue length. | |||
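
The buffered-I/O strategy above reduces to aligning the requested range to the client's buffer size. The following is a minimal sketch of that arithmetic only, not a client implementation. The function name buffered_layoutget_args and its parameters are hypothetical; clamping loga_length so it is never smaller than loga_minlength is an assumption of this sketch.

```python
NFS4_UINT64_MAX = 0xFFFFFFFFFFFFFFFF


def buffered_layoutget_args(req_offset, req_length, buf_size, file_length=None):
    """Return (loga_offset, loga_minlength, loga_length) for a buffered read.

    Hypothetical helper: the names and signature are illustrative only.
    """
    # Round the start of the request down to a buffer boundary.
    start = (req_offset // buf_size) * buf_size
    # Round the (exclusive) end of the request up to a buffer boundary.
    end = -(-(req_offset + req_length) // buf_size) * buf_size
    minlength = end - start
    if file_length is None:
        # File length unknown: ask for everything from the start offset on.
        length = NFS4_UINT64_MAX
    else:
        # Assumption of this sketch: never ask for less than loga_minlength.
        length = max(file_length - start, minlength)
    return start, minlength, length
```

For the read from offset 10000 for length 50000 with 4096-byte buffers, this yields loga_offset 8192 and loga_minlength 53248, matching the values in the example above.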
Once the client has obtained a layout referring to a particular | Once the client has obtained a layout referring to a particular | |||
device ID, the metadata server MUST NOT delete the device ID until | device ID, the metadata server MUST NOT delete the device ID until | |||
the layout is returned or revoked. | the layout is returned or revoked. | |||
CB_NOTIFY_DEVICEID can race with LAYOUTGET. One race scenario is | CB_NOTIFY_DEVICEID can race with LAYOUTGET. One race scenario is | |||
that LAYOUTGET returns a device ID the client does not have device | that LAYOUTGET returns a device ID for which the client does not have | |||
address mappings for, and the metadata server sends a | device address mappings, and the metadata server sends a | |||
CB_NOTIFY_DEVICEID to add the device ID to the client's awareness and | CB_NOTIFY_DEVICEID to add the device ID to the client's awareness and | |||
meanwhile the client sends GETDEVICEINFO on the device ID. This | meanwhile the client sends GETDEVICEINFO on the device ID. This | |||
scenario is discussed in Section 18.40.4. Another scenario is that | scenario is discussed in Section 18.40.4. Another scenario is that | |||
the CB_NOTIFY_DEVICEID is processed by the client before it processes | the CB_NOTIFY_DEVICEID is processed by the client before it processes | |||
the results from LAYOUTGET. The client will send a GETDEVICEINFO on | the results from LAYOUTGET. The client will send a GETDEVICEINFO on | |||
the device ID. If the results from GETDEVICEINFO are received before | the device ID. If the results from GETDEVICEINFO are received before | |||
the client gets results from LAYOUTGET, then there is no longer a | the client gets results from LAYOUTGET, then there is no longer a | |||
race. If the results from LAYOUTGET are received before the results | race. If the results from LAYOUTGET are received before the results | |||
from GETDEVICEINFO, the client can either wait for results of | from GETDEVICEINFO, the client can either wait for results of | |||
GETDEVICEINFO, or send another one to get possibly more up to date | GETDEVICEINFO or send another one to get possibly more up-to-date | |||
device address mappings for the device ID. | device address mappings for the device ID. | |||
18.44. Operation 51: LAYOUTRETURN - Release Layout Information | 18.44. Operation 51: LAYOUTRETURN - Release Layout Information | |||
18.44.1. ARGUMENT | 18.44.1. ARGUMENT | |||
/* Constants used for LAYOUTRETURN and CB_LAYOUTRECALL */ | /* Constants used for LAYOUTRETURN and CB_LAYOUTRECALL */ | |||
const LAYOUT4_RET_REC_FILE = 1; | const LAYOUT4_RET_REC_FILE = 1; | |||
const LAYOUT4_RET_REC_FSID = 2; | const LAYOUT4_RET_REC_FSID = 2; | |||
const LAYOUT4_RET_REC_ALL = 3; | const LAYOUT4_RET_REC_ALL = 3; | |||
skipping to change at page 550, line 36 | skipping to change at page 553, line 29 | |||
}; | }; | |||
18.44.3. DESCRIPTION | 18.44.3. DESCRIPTION | |||
This operation returns from the client to the server one or more | This operation returns from the client to the server one or more | |||
layouts represented by the client ID (derived from the session ID in | layouts represented by the client ID (derived from the session ID in | |||
the preceding SEQUENCE operation), lora_layout_type, and lora_iomode. | the preceding SEQUENCE operation), lora_layout_type, and lora_iomode. | |||
When lr_returntype is LAYOUTRETURN4_FILE, the returned layout is | When lr_returntype is LAYOUTRETURN4_FILE, the returned layout is | |||
further identified by the current filehandle, lrf_offset, lrf_length, | further identified by the current filehandle, lrf_offset, lrf_length, | |||
and lrf_stateid. If the lrf_length field is NFS4_UINT64_MAX, all | and lrf_stateid. If the lrf_length field is NFS4_UINT64_MAX, all | |||
bytes of the layout, starting at lrf_offset are returned. When | bytes of the layout, starting at lrf_offset, are returned. When | |||
lr_returntype is LAYOUTRETURN4_FSID, the current filehandle is used | lr_returntype is LAYOUTRETURN4_FSID, the current filehandle is used | |||
to identify the file system and all layouts matching the client ID, | to identify the file system and all layouts matching the client ID, | |||
the fsid of the file system, lora_layout_type, and lora_iomode are | the fsid of the file system, lora_layout_type, and lora_iomode are | |||
returned. When lr_returntype is LAYOUTRETURN4_ALL, all layouts | returned. When lr_returntype is LAYOUTRETURN4_ALL, all layouts | |||
matching the client ID, lora_layout_type, and lora_iomode are | matching the client ID, lora_layout_type, and lora_iomode are | |||
returned and the current filehandle is not used. After this call, | returned and the current filehandle is not used. After this call, | |||
the client MUST NOT use the returned layout(s) and the associated | the client MUST NOT use the returned layout(s) and the associated | |||
storage protocol to access the file data. | storage protocol to access the file data. | |||
If the set of layouts designated in the case of LAYOUTRETURN4_FSID or | If the set of layouts designated in the case of LAYOUTRETURN4_FSID or | |||
LAYOUTRETURN4_ALL is empty, then no error results. In the case of | LAYOUTRETURN4_ALL is empty, then no error results. In the case of | |||
LAYOUTRETURN4_FILE, the byte range specified is returned even if it | LAYOUTRETURN4_FILE, the byte-range specified is returned even if it | |||
is a subdivision of a layout previously obtained with LAYOUTGET, a | is a subdivision of a layout previously obtained with LAYOUTGET, a | |||
combination of multiple layouts previously obtained with LAYOUTGET, | combination of multiple layouts previously obtained with LAYOUTGET, | |||
or a combination including some layouts previously obtained with | or a combination including some layouts previously obtained with | |||
LAYOUTGET, and one or more subdivisions of such layouts. When the | LAYOUTGET, and one or more subdivisions of such layouts. When the | |||
byte range does not designate any bytes for which a layout is held | byte-range does not designate any bytes for which a layout is held | |||
for the specified file, client ID, layout type and mode, no error | for the specified file, client ID, layout type and mode, no error | |||
results. See Section 12.5.5.2.1.5 for considerations with "bulk" | results. See Section 12.5.5.2.1.5 for considerations with "bulk" | |||
return of layouts. | return of layouts. | |||
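
The byte-range semantics of LAYOUTRETURN4_FILE described above can be illustrated with a small range-subtraction sketch. It assumes, purely for illustration, that the layouts held for one file, client ID, layout type, and iomode are tracked as a list of finite (offset, length) segments. The function return_range is hypothetical and is not part of the protocol.

```python
NFS4_UINT64_MAX = 0xFFFFFFFFFFFFFFFF


def return_range(held, lrf_offset, lrf_length):
    """Subtract the returned byte-range from the held layout segments.

    held is a list of (offset, length) tuples with finite lengths
    (an assumption of this sketch). Returns the segments still held.
    """
    # A length of NFS4_UINT64_MAX means "all bytes starting at lrf_offset".
    ret_end = None if lrf_length == NFS4_UINT64_MAX else lrf_offset + lrf_length
    remaining = []
    for off, length in held:
        seg_end = off + length
        # Keep the part of the segment that lies before the returned range.
        if off < lrf_offset:
            remaining.append((off, min(seg_end, lrf_offset) - off))
        # Keep the part of the segment that lies after the returned range.
        if ret_end is not None and seg_end > ret_end:
            new_off = max(off, ret_end)
            remaining.append((new_off, seg_end - new_off))
    return remaining
```

A return that spans parts of two segments trims both, and a return that overlaps nothing leaves the list unchanged, which mirrors the "no error results" behavior described above.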
The layout being returned may be a subset or superset of a layout | The layout being returned may be a subset or superset of a layout | |||
specified by CB_LAYOUTRECALL. However, if it is a subset, the recall | specified by CB_LAYOUTRECALL. However, if it is a subset, the recall | |||
is not complete until the full recalled scope has been returned. | is not complete until the full recalled scope has been returned. | |||
Recalled scope refers to the byte range in the case of | Recalled scope refers to the byte-range in the case of | |||
LAYOUTRETURN4_FILE, use of LAYOUTRETURN4_FSID, or the use of | LAYOUTRETURN4_FILE, the use of LAYOUTRETURN4_FSID, or the use of | |||
LAYOUTRETURN4_ALL. There must be a LAYOUTRETURN with a matching | LAYOUTRETURN4_ALL. There must be a LAYOUTRETURN with a matching | |||
scope to complete the return even if all current layout ranges have | scope to complete the return even if all current layout ranges have | |||
been previously individually returned. | been previously individually returned. | |||
For all lr_returntype values, an iomode of LAYOUTIOMODE4_ANY | For all lr_returntype values, an iomode of LAYOUTIOMODE4_ANY | |||
specifies that all layouts that match the other arguments to | specifies that all layouts that match the other arguments to | |||
LAYOUTRETURN (i.e., client ID, lora_layout_type, and one of current | LAYOUTRETURN (i.e., client ID, lora_layout_type, and one of current | |||
filehandle and range; fsid derived from current filehandle; or | filehandle and range; fsid derived from current filehandle; or | |||
LAYOUTRETURN4_ALL) are being returned. | LAYOUTRETURN4_ALL) are being returned. | |||
In the case that lr_returntype is LAYOUTRETURN4_FILE, the lrf_stateid | In the case that lr_returntype is LAYOUTRETURN4_FILE, the lrf_stateid | |||
provided by the client is a layout stateid as returned from previous | provided by the client is a layout stateid as returned from previous | |||
layout operations. Note that the "seqid" field of lrf_stateid MUST | layout operations. Note that the "seqid" field of lrf_stateid MUST | |||
NOT be zero. See Section 8.2, Section 12.5.3, and Section 12.5.5.2 | NOT be zero. See Sections 8.2, 12.5.3, and 12.5.5.2 for a further | |||
for a further discussion and requirements. | discussion and requirements. | |||
Return of a layout or all layouts does not invalidate the mapping of | Return of a layout or all layouts does not invalidate the mapping of | |||
storage device ID to storage device address which remains in effect | storage device ID to a storage device address. The mapping remains | |||
until specifically changed or deleted via device ID notification | in effect until specifically changed or deleted via device ID | |||
callbacks. | notification callbacks. Of course if there are no remaining layouts | |||
that refer to a previously used device ID, the server is free to | ||||
delete a device ID without a notification callback, which will be the | ||||
case when notifications are not in effect. | ||||
If the lora_reclaim field is set to TRUE, the client is attempting to | If the lora_reclaim field is set to TRUE, the client is attempting to | |||
return a layout that was acquired before the restart of the metadata | return a layout that was acquired before the restart of the metadata | |||
server during the metadata server's grace period. When returning | server during the metadata server's grace period. When returning | |||
layouts that were acquired during the metadata server's grace period, | layouts that were acquired during the metadata server's grace period, | |||
the client MUST set the lora_reclaim field to FALSE. The | the client MUST set the lora_reclaim field to FALSE. The | |||
lora_reclaim field MUST be set to FALSE also when lr_returntype is | lora_reclaim field MUST be set to FALSE also when lr_returntype is | |||
LAYOUTRETURN4_FSID or LAYOUTRETURN4_ALL. See LAYOUTCOMMIT | LAYOUTRETURN4_FSID or LAYOUTRETURN4_ALL. See LAYOUTCOMMIT | |||
(Section 18.42) for more details. | (Section 18.42) for more details. | |||
Layouts may be returned when recalled or voluntarily (i.e., before | Layouts may be returned when recalled or voluntarily (i.e., before | |||
the server has recalled them). In either case the client must | the server has recalled them). In either case, the client must | |||
properly propagate state changed under the context of the layout to | properly propagate state changed under the context of the layout to | |||
the storage device(s) or to the metadata server before returning the | the storage device(s) or to the metadata server before returning the | |||
layout. | layout. | |||
If the client returns the layout in response to a CB_LAYOUTRECALL | If the client returns the layout in response to a CB_LAYOUTRECALL | |||
where the lor_recalltype field of the clora_recall field was | where the lor_recalltype field of the clora_recall field was | |||
LAYOUTRECALL4_FILE, the client should use the lor_stateid value from | LAYOUTRECALL4_FILE, the client should use the lor_stateid value from | |||
CB_LAYOUTRECALL as the value for lrf_stateid. Otherwise, it should | CB_LAYOUTRECALL as the value for lrf_stateid. Otherwise, it should | |||
use logr_stateid (from a previous LAYOUTGET result) or lorr_stateid | use logr_stateid (from a previous LAYOUTGET result) or lorr_stateid | |||
(from a previous LAYOUTRETURN result). This is done to indicate the | (from a previous LAYOUTRETURN result). This is done to indicate the | |||
point in time (in terms of layout stateid transitions) when the | point in time (in terms of layout stateid transitions) when the | |||
recall was sent. The client uses the precise lora_recallstateid | recall was sent. The client uses the precise lora_recallstateid | |||
value and MUST NOT set the stateid's seqid to zero; otherwise | value and MUST NOT set the stateid's seqid to zero; otherwise, | |||
NFS4ERR_BAD_STATEID MUST be returned. NFS4ERR_OLD_STATEID can be | NFS4ERR_BAD_STATEID MUST be returned. NFS4ERR_OLD_STATEID can be | |||
returned if the client is using an old seqid, and the server knows | returned if the client is using an old seqid, and the server knows | |||
the client should not be using the old seqid. E.g. the client uses | the client should not be using the old seqid. For example, the | |||
the seqid on slot 1 of the session, received the response with the | client uses the seqid on slot 1 of the session, receives the response | |||
new seqid, and uses the slot to send another request with the old | with the new seqid, and uses the slot to send another request with | |||
seqid. | the old seqid. | |||
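
The choice of lrf_stateid described above can be sketched from the client's side as follows. The Stateid tuple and the function choose_lrf_stateid are hypothetical names used for illustration. Raising a local exception for a zero seqid is an assumption of this sketch; on the wire, the server would return NFS4ERR_BAD_STATEID.

```python
from collections import namedtuple

# Illustrative stand-in for stateid4: a seqid plus the opaque "other" field.
Stateid = namedtuple("Stateid", ["seqid", "other"])


def choose_lrf_stateid(recall_file_stateid, latest_layout_stateid):
    """Pick the stateid to place in lrf_stateid.

    recall_file_stateid: lor_stateid from a CB_LAYOUTRECALL whose
        lor_recalltype was LAYOUTRECALL4_FILE, or None when the return
        is not a response to such a recall.
    latest_layout_stateid: most recent logr_stateid or lorr_stateid.
    """
    if recall_file_stateid is not None:
        chosen = recall_file_stateid
    else:
        chosen = latest_layout_stateid
    # The seqid MUST NOT be zero; catch that locally before sending.
    if chosen.seqid == 0:
        raise ValueError("seqid of lrf_stateid must not be zero")
    return chosen
```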
If a client fails to return a layout in a timely manner, then the | If a client fails to return a layout in a timely manner, then the | |||
metadata server SHOULD use its control protocol with the storage | metadata server SHOULD use its control protocol with the storage | |||
devices to fence the client from accessing the data referenced by the | devices to fence the client from accessing the data referenced by the | |||
layout. See Section 12.5.5 for more details. | layout. See Section 12.5.5 for more details. | |||
If the LAYOUTRETURN request sets the lora_reclaim field to TRUE after | If the LAYOUTRETURN request sets the lora_reclaim field to TRUE after | |||
the metadata server's grace period, NFS4ERR_NO_GRACE is returned. | the metadata server's grace period, NFS4ERR_NO_GRACE is returned. | |||
If the LAYOUTRETURN request sets the lora_reclaim field to TRUE and | If the LAYOUTRETURN request sets the lora_reclaim field to TRUE and | |||
skipping to change at page 552, line 42 | skipping to change at page 555, line 35 | |||
NFS4ERR_INVAL is returned. | NFS4ERR_INVAL is returned. | |||
If the client sets the lr_returntype field to LAYOUTRETURN4_FILE, | If the client sets the lr_returntype field to LAYOUTRETURN4_FILE, | |||
then the lrs_stateid field will represent the layout stateid as | then the lrs_stateid field will represent the layout stateid as | |||
updated for this operation's processing; the current stateid will | updated for this operation's processing; the current stateid will | |||
also be updated to match the returned value. If the last byte of any | also be updated to match the returned value. If the last byte of any | |||
layout for the current file, client ID, and layout type is being | layout for the current file, client ID, and layout type is being | |||
returned and there are no remaining pending CB_LAYOUTRECALL | returned and there are no remaining pending CB_LAYOUTRECALL | |||
operations for which a LAYOUTRETURN operation must be done, | operations for which a LAYOUTRETURN operation must be done, | |||
lrs_present MUST be FALSE, and no stateid will be returned. In | lrs_present MUST be FALSE, and no stateid will be returned. In | |||
addition, the COMPOUND request's current stateid will be set to all- | addition, the COMPOUND request's current stateid will be set to the | |||
zeroes special stateid (see Section 16.2.3.1.2). The server MUST | all-zeroes special stateid (see Section 16.2.3.1.2). The server MUST | |||
reject with NFS4ERR_BAD_STATEID any further use of the current | reject with NFS4ERR_BAD_STATEID any further use of the current | |||
stateid in that COMPOUND until the current stateid is re-established | stateid in that COMPOUND until the current stateid is re-established | |||
by a later stateid-returning operation. | by a later stateid-returning operation. | |||
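
The rule for lrs_present above can be expressed as a small server-side decision. This is a minimal sketch under stated assumptions: the function layoutreturn_file_result, its parameters, and the dictionary it returns are hypothetical, and the all-zeroes special stateid is modeled here as a seqid of zero with twelve zero bytes in the "other" field.

```python
# Assumed model of the all-zeroes special stateid: (seqid, other).
ALL_ZEROES_STATEID = (0, bytes(12))


def layoutreturn_file_result(bytes_still_held, pending_recalls, updated_stateid):
    """Decide the LAYOUTRETURN4_FILE result fields.

    bytes_still_held: bytes of layout the client still holds for this
        file, client ID, and layout type after the return is processed.
    pending_recalls: CB_LAYOUTRECALL operations still awaiting a
        LAYOUTRETURN.
    """
    if bytes_still_held == 0 and pending_recalls == 0:
        # Last byte returned and nothing pending: no stateid is returned,
        # and the current stateid becomes the all-zeroes special stateid.
        return {
            "lrs_present": False,
            "lrs_stateid": None,
            "current_stateid": ALL_ZEROES_STATEID,
        }
    # Otherwise the updated layout stateid is returned and also becomes
    # the current stateid.
    return {
        "lrs_present": True,
        "lrs_stateid": updated_stateid,
        "current_stateid": updated_stateid,
    }
```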
On success, the current filehandle retains its value. | On success, the current filehandle retains its value. | |||
If the EXCHGID4_FLAG_BIND_PRINC_STATEID capability is set on the | If the EXCHGID4_FLAG_BIND_PRINC_STATEID capability is set on the | |||
client ID (see Section 18.35), the server will require that the | client ID (see Section 18.35), the server will require that the | |||
principal, security flavor, and if applicable, the GSS mechanism, | principal, security flavor, and if applicable, the GSS mechanism, | |||
combination that acquired the layout also be the one to send | combination that acquired the layout also be the one to send | |||
skipping to change at page 553, line 17 | skipping to change at page 556, line 10 | |||
principal are no longer available. The server will allow the machine | principal are no longer available. The server will allow the machine | |||
credential or SSV credential (see Section 18.35) to send LAYOUTRETURN | credential or SSV credential (see Section 18.35) to send LAYOUTRETURN | |||
if LAYOUTRETURN's operation code was set in the spo_must_allow result | if LAYOUTRETURN's operation code was set in the spo_must_allow result | |||
of EXCHANGE_ID. | of EXCHANGE_ID. | |||
18.44.4. IMPLEMENTATION | 18.44.4. IMPLEMENTATION | |||
The final LAYOUTRETURN operation in response to a CB_LAYOUTRECALL | The final LAYOUTRETURN operation in response to a CB_LAYOUTRECALL | |||
callback MUST be serialized with any outstanding, intersecting | callback MUST be serialized with any outstanding, intersecting | |||
LAYOUTRETURN operations. Note that it is possible that while a | LAYOUTRETURN operations. Note that it is possible that while a | |||
client is returning the layout for some recalled range the server may | client is returning the layout for some recalled range, the server | |||
recall a superset of that range (e.g. LAYOUTRECALL4_ALL); the final | may recall a superset of that range (e.g., LAYOUTRECALL4_ALL); the | |||
return operation for the latter must block until the former layout | final return operation for the latter must block until the former | |||
recall is done. | layout recall is done. | |||
Returning all layouts in a file system using LAYOUTRETURN4_FSID is | Returning all layouts in a file system using LAYOUTRETURN4_FSID is | |||
typically done in response to a CB_LAYOUTRECALL for that file system | typically done in response to a CB_LAYOUTRECALL for that file system | |||
as the final return operation. Similarly, LAYOUTRETURN4_ALL is used | as the final return operation. Similarly, LAYOUTRETURN4_ALL is used | |||
in response to a recall callback for all layouts. It is possible | in response to a recall callback for all layouts. It is possible | |||
that the client already returned some outstanding layouts via | that the client already returned some outstanding layouts via | |||
individual LAYOUTRETURN calls and the call for LAYOUTRETURN4_FSID or | individual LAYOUTRETURN calls and the call for LAYOUTRETURN4_FSID or | |||
LAYOUTRETURN4_ALL marks the end of the LAYOUTRETURN sequence. See | LAYOUTRETURN4_ALL marks the end of the LAYOUTRETURN sequence. See | |||
Section 12.5.5.1 for more details. | Section 12.5.5.1 for more details. | |||
skipping to change at page 554, line 21 | skipping to change at page 557, line 14 | |||
There are two styles of SECINFO_NO_NAME, as determined by the value | There are two styles of SECINFO_NO_NAME, as determined by the value | |||
of the secinfo_style4 enumeration. If SECINFO_STYLE4_CURRENT_FH is | of the secinfo_style4 enumeration. If SECINFO_STYLE4_CURRENT_FH is | |||
passed, then SECINFO_NO_NAME is querying for the required security | passed, then SECINFO_NO_NAME is querying for the required security | |||
for the current filehandle. If SECINFO_STYLE4_PARENT is passed, then | for the current filehandle. If SECINFO_STYLE4_PARENT is passed, then | |||
SECINFO_NO_NAME is querying for the required security of the current | SECINFO_NO_NAME is querying for the required security of the current | |||
filehandle's parent. If the style selected is SECINFO_STYLE4_PARENT, | filehandle's parent. If the style selected is SECINFO_STYLE4_PARENT, | |||
then SECINFO should apply the same access methodology used for | then SECINFO should apply the same access methodology used for | |||
LOOKUPP when evaluating the traversal to the parent directory. | LOOKUPP when evaluating the traversal to the parent directory. | |||
Therefore, if the requester does not have the appropriate access to | Therefore, if the requester does not have the appropriate access to | |||
LOOKUPP the parent then SECINFO_NO_NAME must behave the same way and | LOOKUPP the parent, then SECINFO_NO_NAME must behave the same way and | |||
return NFS4ERR_ACCESS. | return NFS4ERR_ACCESS. | |||
If PUTFH, PUTPUBFH, PUTROOTFH, or RESTOREFH return NFS4ERR_WRONGSEC, | If PUTFH, PUTPUBFH, PUTROOTFH, or RESTOREFH returns NFS4ERR_WRONGSEC, | |||
then the client resolves the situation by sending a COMPOUND request | then the client resolves the situation by sending a COMPOUND request | |||
that consists of PUTFH, PUTPUBFH, or PUTROOTFH immediately followed | that consists of PUTFH, PUTPUBFH, or PUTROOTFH immediately followed | |||
by SECINFO_NO_NAME, style SECINFO_STYLE4_CURRENT_FH. See Section 2.6 | by SECINFO_NO_NAME, style SECINFO_STYLE4_CURRENT_FH. See Section 2.6 | |||
for instructions on dealing with NFS4ERR_WRONGSEC error returns from | for instructions on dealing with NFS4ERR_WRONGSEC error returns from | |||
PUTFH, PUTROOTFH, PUTPUBFH, or RESTOREFH. | PUTFH, PUTROOTFH, PUTPUBFH, or RESTOREFH. | |||
If SECINFO_STYLE4_PARENT is specified and there is no parent | If SECINFO_STYLE4_PARENT is specified and there is no parent | |||
directory, SECINFO_NO_NAME MUST return NFS4ERR_NOENT. | directory, SECINFO_NO_NAME MUST return NFS4ERR_NOENT. | |||
On success, the current filehandle is consumed (see | On success, the current filehandle is consumed (see | |||
skipping to change at page 556, line 10 | skipping to change at page 558, line 42 | |||
SEQUENCE4resok sr_resok4; | SEQUENCE4resok sr_resok4; | |||
default: | default: | |||
void; | void; | |||
}; | }; | |||
18.46.3. DESCRIPTION | 18.46.3. DESCRIPTION | |||
The SEQUENCE operation is used by the server to implement session | The SEQUENCE operation is used by the server to implement session | |||
request control and the reply cache semantics. | request control and the reply cache semantics. | |||
This operation MUST appear as the first operation of any COMPOUND in | SEQUENCE MUST appear as the first operation of any COMPOUND in which | |||
which it appears. The error NFS4ERR_SEQUENCE_POS will be returned | it appears. The error NFS4ERR_SEQUENCE_POS will be returned when it | |||
when it is found in any position in a COMPOUND beyond the first. | is found in any position in a COMPOUND beyond the first. Operations | |||
Operations other than SEQUENCE, BIND_CONN_TO_SESSION, EXCHANGE_ID, | other than SEQUENCE, BIND_CONN_TO_SESSION, EXCHANGE_ID, | |||
CREATE_SESSION, and DESTROY_SESSION, MUST NOT appear as the first | CREATE_SESSION, and DESTROY_SESSION, MUST NOT appear as the first | |||
operation in a COMPOUND. Such operations MUST yield the error | operation in a COMPOUND. Such operations MUST yield the error | |||
NFS4ERR_OP_NOT_IN_SESSION if they do appear at the start of a | NFS4ERR_OP_NOT_IN_SESSION if they do appear at the start of a | |||
COMPOUND. | COMPOUND. | |||
If SEQUENCE is received on a connection not associated with the | If SEQUENCE is received on a connection not associated with the | |||
session via CREATE_SESSION or BIND_CONN_TO_SESSION, and connection | session via CREATE_SESSION or BIND_CONN_TO_SESSION, and connection | |||
association enforcement is enabled (see Section 18.35), then the | association enforcement is enabled (see Section 18.35), then the | |||
server returns NFS4ERR_CONN_NOT_BOUND_TO_SESSION. | server returns NFS4ERR_CONN_NOT_BOUND_TO_SESSION. | |||
The sa_sessionid argument identifies the session this request applies | The sa_sessionid argument identifies the session to which this | |||
to. The sr_sessionid result MUST equal sa_sessionid. | request applies. The sr_sessionid result MUST equal sa_sessionid. | |||
The sa_slotid argument is the index in the reply cache for the | The sa_slotid argument is the index in the reply cache for the | |||
request. The sa_sequenceid field is the sequence number of the | request. The sa_sequenceid field is the sequence number of the | |||
request for the reply cache entry (slot). The sr_slotid result MUST | request for the reply cache entry (slot). The sr_slotid result MUST | |||
equal sa_slotid. The sr_sequenceid result MUST equal sa_sequenceid. | equal sa_slotid. The sr_sequenceid result MUST equal sa_sequenceid. | |||
The sa_highest_slotid argument is the highest slot ID the client has | The sa_highest_slotid argument is the highest slot ID for which the | |||
a request outstanding for; it could be equal to sa_slotid. The | client has a request outstanding; it could be equal to sa_slotid. | |||
server returns two "highest_slotid" values: sr_highest_slotid, and | The server returns two "highest_slotid" values: sr_highest_slotid and | |||
sr_target_highest_slotid. The former is the highest slot ID the | sr_target_highest_slotid. The former is the highest slot ID the | |||
server will accept in a future SEQUENCE operation, and SHOULD NOT be | server will accept in a future SEQUENCE operation, and SHOULD NOT be | |||
less than the value of sa_highest_slotid. (but see Section 2.10.6.1 | less than the value of sa_highest_slotid (but see Section 2.10.6.1 | |||
for an exception). The latter is the highest slot ID the server | for an exception). The latter is the highest slot ID the server | |||
would prefer the client use on a future SEQUENCE operation. | would prefer the client use on a future SEQUENCE operation. | |||
If sa_cachethis is TRUE, then the client is requesting that the | If sa_cachethis is TRUE, then the client is requesting that the | |||
server cache the entire reply in the server's reply cache; therefore | server cache the entire reply in the server's reply cache; therefore, | |||
the server MUST cache the reply (see Section 2.10.6.1.3). The server | the server MUST cache the reply (see Section 2.10.6.1.3). The server | |||
MAY cache the reply if sa_cachethis is FALSE. If the server does not | MAY cache the reply if sa_cachethis is FALSE. If the server does not | |||
cache the entire reply, it MUST still record that it executed the | cache the entire reply, it MUST still record that it executed the | |||
request at the specified slot and sequence ID. | request at the specified slot and sequence ID. | |||
The response to the SEQUENCE operation contains a word of status | The response to the SEQUENCE operation contains a word of status | |||
flags (sr_status_flags) that can provide to the client information | flags (sr_status_flags) that can provide to the client information | |||
related to the status of the client's lock state and communications | related to the status of the client's lock state and communications | |||
paths. Note that any status bits relating to lock state MAY be reset | paths. Note that any status bits relating to lock state MAY be reset | |||
when lock state is lost due to a server restart (even if the session | when lock state is lost due to a server restart (even if the session | |||
skipping to change at page 557, line 23 | skipping to change at page 560, line 10 | |||
client ID until at least one backchannel is available on any | client ID until at least one backchannel is available on any | |||
session associated with the client ID. If the client fails to re- | session associated with the client ID. If the client fails to re- | |||
establish a backchannel for the client ID, it is subject to having | establish a backchannel for the client ID, it is subject to having | |||
recallable state revoked. | recallable state revoked. | |||
SEQ4_STATUS_CB_PATH_DOWN_SESSION | SEQ4_STATUS_CB_PATH_DOWN_SESSION | |||
When set, indicates that the session has no operational | When set, indicates that the session has no operational | |||
backchannel. There are two reasons why | backchannel. There are two reasons why | |||
SEQ4_STATUS_CB_PATH_DOWN_SESSION may be set and not | SEQ4_STATUS_CB_PATH_DOWN_SESSION may be set and not | |||
SEQ4_STATUS_CB_PATH_DOWN. First is that a callback operation that | SEQ4_STATUS_CB_PATH_DOWN. First is that a callback operation that | |||
applies specifically to the session (e.g. CB_RECALL_SLOT, see | applies specifically to the session (e.g., CB_RECALL_SLOT, see | |||
Section 20.8) needs to be sent. Second is that the server did | Section 20.8) needs to be sent. Second is that the server did | |||
send a callback operation, but the connection was lost before the | send a callback operation, but the connection was lost before the | |||
reply. The server cannot be sure whether the client received the | reply. The server cannot be sure whether or not the client | |||
callback operation or not, and so, per rules on request retry, the | received the callback operation, and so, per rules on request | |||
server MUST retry the callback operation over the same session. | retry, the server MUST retry the callback operation over the same | |||
The SEQ4_STATUS_CB_PATH_DOWN_SESSION bit is the indication to the | session. The SEQ4_STATUS_CB_PATH_DOWN_SESSION bit is the | |||
client that it needs to associate a connection to the session's | indication to the client that it needs to associate a connection | |||
backchannel. This bit remains set on all SEQUENCE responses on | to the session's backchannel. This bit remains set on all | |||
the session until a backchannel on the session the path is | SEQUENCE responses of the session until a connection is associated | |||
available. If the client fails to re-establish a backchannel for | with the session's backchannel. If the client fails to re-establish | |||
the session, it is subject to having recallable state revoked. | a backchannel for the session, it is subject to having | |||
recallable state revoked. | ||||
SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING | SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING | |||
When set, indicates that all GSS contexts or RPCSEC_GSS handles | When set, indicates that all GSS contexts or RPCSEC_GSS handles | |||
assigned to the session's backchannel will expire within a period | assigned to the session's backchannel will expire within a period | |||
equal to the lease time. This bit remains set on all SEQUENCE | equal to the lease time. This bit remains set on all SEQUENCE | |||
replies until at least one of the following is true: | replies until at least one of the following is true: | |||
* All SSV RPCSEC_GSS handles on the session's backchannel have | * All SSV RPCSEC_GSS handles on the session's backchannel have | |||
been destroyed and all non-SSV GSS contexts have expired. | been destroyed and all non-SSV GSS contexts have expired. | |||
skipping to change at page 558, line 24 | skipping to change at page 561, line 16 | |||
When set, indicates that the lease has expired and as a result the | When set, indicates that the lease has expired and as a result the | |||
server released all of the client's locking state. This status | server released all of the client's locking state. This status | |||
bit remains set on all SEQUENCE replies until the loss of all such | bit remains set on all SEQUENCE replies until the loss of all such | |||
locks has been acknowledged by use of FREE_STATEID (see | locks has been acknowledged by use of FREE_STATEID (see | |||
Section 18.38), or by establishing a new client instance by | Section 18.38), or by establishing a new client instance by | |||
destroying all sessions (via DESTROY_SESSION), the client ID (via | destroying all sessions (via DESTROY_SESSION), the client ID (via | |||
DESTROY_CLIENTID), and then invoking EXCHANGE_ID and | DESTROY_CLIENTID), and then invoking EXCHANGE_ID and | |||
CREATE_SESSION to establish a new client ID. | CREATE_SESSION to establish a new client ID. | |||
SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED | SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED | |||
When set indicates that some subset of the client's locks have | When set, indicates that some subset of the client's locks have | |||
been revoked due to expiration of the lease period followed by | been revoked due to expiration of the lease period followed by | |||
another client's conflicting lock request. This status bit | another client's conflicting LOCK operation. This status bit | |||
remains set on all SEQUENCE replies until the loss of all such | remains set on all SEQUENCE replies until the loss of all such | |||
locks has been acknowledged by use of FREE_STATEID. | locks has been acknowledged by use of FREE_STATEID. | |||
SEQ4_STATUS_ADMIN_STATE_REVOKED | SEQ4_STATUS_ADMIN_STATE_REVOKED | |||
When set indicates that one or more locks have been revoked | When set, indicates that one or more locks have been revoked | |||
without expiration of the lease period, due to administrative | without expiration of the lease period, due to administrative | |||
action. This status bit remains set on all SEQUENCE replies until | action. This status bit remains set on all SEQUENCE replies until | |||
the loss of all such locks has been acknowledged by use of | the loss of all such locks has been acknowledged by use of | |||
FREE_STATEID. | FREE_STATEID. | |||
SEQ4_STATUS_RECALLABLE_STATE_REVOKED | SEQ4_STATUS_RECALLABLE_STATE_REVOKED | |||
When set indicates that one or more recallable objects have been | When set, indicates that one or more recallable objects have been | |||
revoked without expiration of the lease period, due to the | revoked without expiration of the lease period, due to the | |||
client's failure to return them when recalled which may be a | client's failure to return them when recalled, which may be a | |||
consequence of there being no working backchannel and the client | consequence of there being no working backchannel and the client | |||
failing to reestablish a backchannel per the | failing to re-establish a backchannel per the | |||
SEQ4_STATUS_CB_PATH_DOWN, SEQ4_STATUS_CB_PATH_DOWN_SESSION, or | SEQ4_STATUS_CB_PATH_DOWN, SEQ4_STATUS_CB_PATH_DOWN_SESSION, or | |||
SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED status flags. This status bit | SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED status flags. This status bit | |||
remains set on all SEQUENCE replies until the loss of all such | remains set on all SEQUENCE replies until the loss of all such | |||
locks has been acknowledged by use of FREE_STATEID. | locks has been acknowledged by use of FREE_STATEID. | |||
SEQ4_STATUS_LEASE_MOVED | SEQ4_STATUS_LEASE_MOVED | |||
When set indicates that responsibility for lease renewal has been | When set, indicates that responsibility for lease renewal has been | |||
transferred to one or more new servers. This condition will | transferred to one or more new servers. This condition will | |||
continue until the client receives an NFS4ERR_MOVED error and the | continue until the client receives an NFS4ERR_MOVED error and the | |||
server receives the subsequent GETATTR for the fs_locations or | server receives the subsequent GETATTR for the fs_locations or | |||
fs_locations_info attribute for an access to each file system for | fs_locations_info attribute for an access to each file system for | |||
which a lease has been moved to a new server. See | which a lease has been moved to a new server. See | |||
Section 11.7.7.1. | Section 11.7.7.1. | |||
SEQ4_STATUS_RESTART_RECLAIM_NEEDED | SEQ4_STATUS_RESTART_RECLAIM_NEEDED | |||
When set indicates that due to server restart the client must | When set, indicates that due to server restart, the client must | |||
reclaim locking state. Until the client sends a global | reclaim locking state. Until the client sends a global | |||
RECLAIM_COMPLETE (Section 18.51), every SEQUENCE operation will | RECLAIM_COMPLETE (Section 18.51), every SEQUENCE operation will | |||
return SEQ4_STATUS_RESTART_RECLAIM_NEEDED. | return SEQ4_STATUS_RESTART_RECLAIM_NEEDED. | |||
SEQ4_STATUS_BACKCHANNEL_FAULT | SEQ4_STATUS_BACKCHANNEL_FAULT | |||
The server has encountered an unrecoverable fault with the | The server has encountered an unrecoverable fault with the | |||
backchannel (e.g. it has lost track of the sequence ID for a slot | backchannel (e.g., it has lost track of the sequence ID for a slot | |||
in the backchannel). The client MUST stop sending more requests | in the backchannel). The client MUST stop sending more requests | |||
on the session's fore channel, wait for all outstanding requests | on the session's fore channel, wait for all outstanding requests | |||
to complete on the fore and back channel, and then destroy the | to complete on the fore and back channel, and then destroy the | |||
session. | session. | |||
SEQ4_STATUS_DEVID_CHANGED | SEQ4_STATUS_DEVID_CHANGED | |||
The client is using device ID notifications and the server has | The client is using device ID notifications and the server has | |||
changed a device ID mapping held by the client. This flag will | changed a device ID mapping held by the client. This flag will | |||
stay present until the client has obtained the new mapping with | stay present until the client has obtained the new mapping with | |||
GETDEVICEINFO. | GETDEVICEINFO. | |||
skipping to change at page 559, line 45 | skipping to change at page 562, line 36 | |||
The value of the sa_sequenceid argument relative to the cached | The value of the sa_sequenceid argument relative to the cached | |||
sequence ID on the slot falls into one of three cases. | sequence ID on the slot falls into one of three cases. | |||
o If the difference between sa_sequenceid and the server's cached | o If the difference between sa_sequenceid and the server's cached | |||
sequence ID at the slot ID is two (2) or more, or if sa_sequenceid | sequence ID at the slot ID is two (2) or more, or if sa_sequenceid | |||
is less than the cached sequence ID (accounting for wraparound of | is less than the cached sequence ID (accounting for wraparound of | |||
the unsigned sequence ID value), then the server MUST return | the unsigned sequence ID value), then the server MUST return | |||
NFS4ERR_SEQ_MISORDERED. | NFS4ERR_SEQ_MISORDERED. | |||
o If sa_sequenceid and the cached sequence ID are the same, this is | o If sa_sequenceid and the cached sequence ID are the same, this is | |||
a retry, and the server replies with the COMPOUND reply that is | a retry, and the server replies with what is recorded in the reply | |||
stored the reply cache. The lease is possibly renewed as | cache. The lease is possibly renewed as described below. | |||
described below. | ||||
o If sa_sequenceid is one greater (accounting for wraparound) than | o If sa_sequenceid is one greater (accounting for wraparound) than | |||
the cached sequence ID, then this is a new request, and the slot's | the cached sequence ID, then this is a new request, and the slot's | |||
sequence ID is incremented. The operations subsequent to | sequence ID is incremented. The operations subsequent to | |||
SEQUENCE, if any, are processed. If there are no other | SEQUENCE, if any, are processed. If there are no other | |||
operations, the only other effects are to cache the SEQUENCE reply | operations, the only other effects are to cache the SEQUENCE reply | |||
in the slot, maintain the session's activity, and possibly renew | in the slot, maintain the session's activity, and possibly renew | |||
the lease. | the lease. | |||
If the client reuses a slot ID and sequence ID for a completely | If the client reuses a slot ID and sequence ID for a completely | |||
different request, the server MAY treat the request as if it is retry | different request, the server MAY treat the request as if it is a | |||
of what it has already executed. The server MAY however detect the | retry of what it has already executed. The server MAY however detect | |||
client's illegal reuse and return NFS4ERR_SEQ_FALSE_RETRY. | the client's illegal reuse and return NFS4ERR_SEQ_FALSE_RETRY. | |||
If SEQUENCE returns an error, then the state of the slot (sequence | If SEQUENCE returns an error, then the state of the slot (sequence | |||
ID, cached reply) MUST NOT change, and the associated lease MUST NOT | ID, cached reply) MUST NOT change, and the associated lease MUST NOT | |||
be renewed. | be renewed. | |||
If SEQUENCE returns NFS4_OK, then the associated lease MUST be | If SEQUENCE returns NFS4_OK, then the associated lease MUST be | |||
renewed (see Section 8.3), except if | renewed (see Section 8.3), except if | |||
SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED is returned in sr_status_flags. | SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED is returned in sr_status_flags. | |||
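The three cases for sa_sequenceid described above can be reduced to a single unsigned subtraction. The following Python sketch is illustrative only. The function name classify_sequenceid and the returned labels are hypothetical, and the sketch assumes that the sequence ID is a 32-bit unsigned value, so that wraparound is handled by taking the difference modulo 2^32.

```python
SEQ_MODULUS = 1 << 32  # assumption: sequence IDs are 32-bit unsigned values


def classify_sequenceid(cached_seqid: int, sa_sequenceid: int) -> str:
    """Classify a SEQUENCE request against the slot's cached sequence ID."""
    # Unsigned difference; a wrap from 0xFFFFFFFF to 0 yields a delta of 1.
    delta = (sa_sequenceid - cached_seqid) % SEQ_MODULUS
    if delta == 0:
        return "retry"  # the server replies from the reply cache
    if delta == 1:
        return "new"    # the slot's sequence ID is incremented
    # Two or more ahead of the cached value, or behind it.
    return "NFS4ERR_SEQ_MISORDERED"
```

Under this reading, a sequence ID that is behind the cached value shows up as a very large unsigned difference, so both halves of the NFS4ERR_SEQ_MISORDERED case fall into the final branch.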
18.46.4. IMPLEMENTATION | 18.46.4. IMPLEMENTATION | |||
The server MUST maintain a mapping of session ID to client ID in | The server MUST maintain a mapping of session ID to client ID in | |||
order to validate any operations that follow SEQUENCE that take a | order to validate any operations that follow SEQUENCE that take a | |||
stateid as an argument and/or result. | stateid as an argument and/or result. | |||
If the client establishes a persistent session, then a SEQUENCE done | If the client establishes a persistent session, then a SEQUENCE | |||
after a server restart may encounter requests performed and recorded | received after a server restart might encounter requests performed | |||
in a persistent reply cache before the server restart. In this case, | and recorded in a persistent reply cache before the server restart. | |||
SEQUENCE will be processed successfully, while requests which were | In this case, SEQUENCE will be processed successfully, while requests | |||
not processed previously are rejected with NFS4ERR_DEADSESSION. | that were not previously performed and recorded are rejected with | |||
NFS4ERR_DEADSESSION. | ||||
Depending on which of the operations within the COMPOUND were | Depending on which of the operations within the COMPOUND were | |||
successfully performed before the server restart, these operations | successfully performed before the server restart, these operations | |||
will also have replies sent from the server reply cache. Note that | will also have replies sent from the server reply cache. Note that | |||
when these operations establish locking state it is locking state | when these operations establish locking state, it is locking state | |||
that applies to the previous server instance and to the previous | that applies to the previous server instance and to the previous | |||
client ID, even though the server restart, which logically happened | client ID, even though the server restart, which logically happened | |||
after these operations, eliminated that state. In the case of a | after these operations, eliminated that state. In the case of a | |||
partially executed COMPOUND, processing may reach an operation not | partially executed COMPOUND, processing may reach an operation not | |||
processed during the earlier server instance, making this operation a | processed during the earlier server instance, making this operation a | |||
new one and not performable on the existing session. In this case, | new one and not performable on the existing session. In this case, | |||
NFS4ERR_DEADSESSION will be returned from that operation. | NFS4ERR_DEADSESSION will be returned from that operation. | |||
18.47. Operation 54: SET_SSV - Update SSV for a Client ID | 18.47. Operation 54: SET_SSV - Update SSV for a Client ID | |||
skipping to change at page 561, line 36 | skipping to change at page 564, line 25 | |||
union SET_SSV4res switch (nfsstat4 ssr_status) { | union SET_SSV4res switch (nfsstat4 ssr_status) { | |||
case NFS4_OK: | case NFS4_OK: | |||
SET_SSV4resok ssr_resok4; | SET_SSV4resok ssr_resok4; | |||
default: | default: | |||
void; | void; | |||
}; | }; | |||
18.47.3. DESCRIPTION | 18.47.3. DESCRIPTION | |||
This operation is used to update the SSV for a client ID. Before | This operation is used to update the SSV for a client ID. Before | |||
SET_SSV is called the first time on a client ID, the SSV is zero (0). | SET_SSV is called the first time on a client ID, the SSV is zero. | |||
The SSV is the key used for the SSV GSS mechanism (Section 2.10.9). | The SSV is the key used for the SSV GSS mechanism (Section 2.10.9). | |||
SET_SSV MUST be preceded by a SEQUENCE operation in the same | SET_SSV MUST be preceded by a SEQUENCE operation in the same | |||
COMPOUND. It MUST NOT be used if the client did not opt for SP4_SSV | COMPOUND. It MUST NOT be used if the client did not opt for SP4_SSV | |||
state protection when the client ID was created (see Section 18.35); | state protection when the client ID was created (see Section 18.35); | |||
the server returns NFS4ERR_INVAL in that case. | the server returns NFS4ERR_INVAL in that case. | |||
The field ssa_digest is computed as the output of the HMAC RFC2104 | The field ssa_digest is computed as the output of the HMAC (RFC 2104 | |||
[11] using the subkey derived from the SSV4_SUBKEY_MIC_I2T and | [11]) using the subkey derived from the SSV4_SUBKEY_MIC_I2T and | |||
current SSV as the key (See Section 2.10.9 for a description of | current SSV as the key (see Section 2.10.9 for a description of | |||
subkeys), and an XDR encoded value of data type ssa_digest_input4. | subkeys), and an XDR encoded value of data type ssa_digest_input4. | |||
The field sdi_seqargs is equal to the arguments of the SEQUENCE | The field sdi_seqargs is equal to the arguments of the SEQUENCE | |||
operation for the COMPOUND procedure that SET_SSV is within. | operation for the COMPOUND procedure that SET_SSV is within. | |||
The argument ssa_ssv is XORed with the current SSV to produce the new | The argument ssa_ssv is XORed with the current SSV to produce the new | |||
SSV. The argument ssa_ssv SHOULD be generated randomly. | SSV. The argument ssa_ssv SHOULD be generated randomly. | |||
In the response, ssr_digest is the output of the HMAC using the | In the response, ssr_digest is the output of the HMAC using the | |||
subkey derived from SSV4_SUBKEY_MIC_T2I and new SSV as the key, and | subkey derived from SSV4_SUBKEY_MIC_T2I and new SSV as the key, and | |||
an XDR encoded value of data type ssr_digest_input4. The field | an XDR encoded value of data type ssr_digest_input4. The field | |||
sdi_seqres is equal to the results of the SEQUENCE operation for the | sdi_seqres is equal to the results of the SEQUENCE operation for the | |||
COMPOUND procedure that SET_SSV is within. | COMPOUND procedure that SET_SSV is within. | |||
As noted in Section 18.35, the client and server can maintain | As noted in Section 18.35, the client and server can maintain | |||
multiple concurrent versions of the SSV. The client and server each | multiple concurrent versions of the SSV. The client and server each | |||
MUST maintain an internal SSV version number, which is set to one (1) | MUST maintain an internal SSV version number, which is set to one the | |||
the first time SET_SSV executes on the server and the client receives | first time SET_SSV executes on the server and the client receives the | |||
the first SET_SSV reply. Each subsequent SET_SSV increases the | first SET_SSV reply. Each subsequent SET_SSV increases the internal | |||
internal SSV version number by one (1). The value of this version | SSV version number by one. The value of this version number | |||
number corresponds to the smpt_ssv_seq, smt_ssv_seq, sspt_ssv_seq, | corresponds to the smpt_ssv_seq, smt_ssv_seq, sspt_ssv_seq, and | |||
and ssct_ssv_seq fields of the SSV GSS mechanism tokens (see | ssct_ssv_seq fields of the SSV GSS mechanism tokens (see | |||
Section 2.10.9). | Section 2.10.9). | |||
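The SSV update and version-numbering rules above can be sketched as follows. This is a minimal illustration, not an implementation of SET_SSV: the names SsvState and apply_set_ssv are hypothetical, the digest checks (ssa_digest and ssr_digest) are omitted because they depend on the subkey derivation of Section 2.10.9, and the sketch assumes that ssa_ssv has the same length as the current SSV.

```python
from dataclasses import dataclass


@dataclass
class SsvState:
    ssv: bytes        # current SSV; all zero bytes before the first SET_SSV
    version: int = 0  # sketch convention: 0 means no SET_SSV has executed yet


def apply_set_ssv(state: SsvState, ssa_ssv: bytes) -> SsvState:
    """Return the state after one SET_SSV: XOR in ssa_ssv, bump the version."""
    # Assumption: ssa_ssv and the current SSV have equal length.
    if len(ssa_ssv) != len(state.ssv):
        raise ValueError("ssa_ssv and the current SSV must have the same length")
    new_ssv = bytes(a ^ b for a, b in zip(state.ssv, ssa_ssv))
    # The first SET_SSV yields version 1; each later one adds 1.
    return SsvState(ssv=new_ssv, version=state.version + 1)
```

Because the initial SSV is zero, the first SET_SSV leaves the SSV equal to ssa_ssv, which is one reason the text asks that ssa_ssv be generated randomly.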
18.47.4. IMPLEMENTATION | 18.47.4. IMPLEMENTATION | |||
When the server receives ssa_digest, it MUST verify the digest by | When the server receives ssa_digest, it MUST verify the digest by | |||
computing the digest the same way the client did and comparing it | computing the digest the same way the client did and comparing it | |||
with ssa_digest. If the server gets a different result, this is an | with ssa_digest. If the server gets a different result, this is an | |||
error, NFS4ERR_BAD_SESSION_DIGEST. This error might be the result of | error, NFS4ERR_BAD_SESSION_DIGEST. This error might be the result of | |||
another SET_SSV from the same client ID changing the SSV. If so, the | another SET_SSV from the same client ID changing the SSV. If so, the | |||
client recovers by sending a SET_SSV operation again with a | client recovers by sending a SET_SSV operation again with a | |||
skipping to change at page 562, line 50 | skipping to change at page 565, line 37 | |||
ssa_ssv, nor equal to a previous or current SSV (including an ssa_ssv | ssa_ssv, nor equal to a previous or current SSV (including an ssa_ssv | |||
equal to zero since the SSV is initialized to zero when the client ID | equal to zero since the SSV is initialized to zero when the client ID | |||
is created). | is created). | |||
Clients SHOULD send SET_SSV with RPCSEC_GSS privacy. Servers MUST | Clients SHOULD send SET_SSV with RPCSEC_GSS privacy. Servers MUST | |||
support RPCSEC_GSS with privacy for any COMPOUND that has { SEQUENCE, | support RPCSEC_GSS with privacy for any COMPOUND that has { SEQUENCE, | |||
SET_SSV }. | SET_SSV }. | |||
A client SHOULD NOT send SET_SSV with the SSV GSS mechanism's | A client SHOULD NOT send SET_SSV with the SSV GSS mechanism's | |||
credential because the purpose of SET_SSV is to seed the SSV from | credential because the purpose of SET_SSV is to seed the SSV from | |||
non-SSV credentials. Instead SET_SSV SHOULD be sent with the | non-SSV credentials. Instead, SET_SSV SHOULD be sent with the | |||
credential of a user that is accessing the client ID for the first | credential of a user that is accessing the client ID for the first | |||
time (Section 2.10.8.3). However if the client does send SET_SSV | time (Section 2.10.8.3). However, if the client does send SET_SSV | |||
with SSV credentials, the digest protecting the arguments uses the | with SSV credentials, the digest protecting the arguments uses the | |||
value of the SSV before ssa_ssv is XORed in, and the digest | value of the SSV before ssa_ssv is XORed in, and the digest | |||
protecting the results uses the value of the SSV after the ssa_ssv is | protecting the results uses the value of the SSV after the ssa_ssv is | |||
XORed in. | XORed in. | |||
18.48. Operation 55: TEST_STATEID - Test Stateids for Validity | 18.48. Operation 55: TEST_STATEID - Test Stateids for Validity | |||
18.48.1. ARGUMENT | 18.48.1. ARGUMENT | |||
struct TEST_STATEID4args { | struct TEST_STATEID4args { | |||
skipping to change at page 563, line 33 | skipping to change at page 566, line 21 | |||
union TEST_STATEID4res switch (nfsstat4 tsr_status) { | union TEST_STATEID4res switch (nfsstat4 tsr_status) { | |||
case NFS4_OK: | case NFS4_OK: | |||
TEST_STATEID4resok tsr_resok4; | TEST_STATEID4resok tsr_resok4; | |||
default: | default: | |||
void; | void; | |||
}; | }; | |||
18.48.3. DESCRIPTION | 18.48.3. DESCRIPTION | |||
The TEST_STATEID operation is used to check the validity of a set of | The TEST_STATEID operation is used to check the validity of a set of | |||
stateids. It can be used at any time but the client should | stateids. It can be used at any time, but the client should | |||
definitely use it when it receives an indication that one or more of | definitely use it when it receives an indication that one or more of | |||
its stateids have been invalidated due to lock revocation. This | its stateids have been invalidated due to lock revocation. This | |||
occurs when the SEQUENCE operation returns with one of the following | occurs when the SEQUENCE operation returns with one of the following | |||
sr_status_flags set: | sr_status_flags set: | |||
o SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED | o SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED | |||
o SEQ4_STATUS_ADMIN_STATE_REVOKED | o SEQ4_STATUS_ADMIN_STATE_REVOKED | |||
o SEQ4_STATUS_RECALLABLE_STATE_REVOKED | o SEQ4_STATUS_RECALLABLE_STATE_REVOKED | |||
The client can use TEST_STATEID one or more times to test the | The client can use TEST_STATEID one or more times to test the | |||
validity of its stateids. Each use of TEST_STATEID allows a large | validity of its stateids. Each use of TEST_STATEID allows a large | |||
set of such stateids to be tested and allows problems with earlier | set of such stateids to be tested and avoids problems with earlier | |||
stateids not to interfere with checking of subsequent ones as would | stateids in a COMPOUND request from interfering with the checking of | |||
happen if individual stateids are tested by operation in a COMPOUND. | subsequent stateids, as would happen if individual stateids were | |||
tested by a series of corresponding operations in a COMPOUND | ||||
request. | ||||
For each stateid, the server returns the status code that would be | For each stateid, the server returns the status code that would be | |||
returned if that stateid were to be used in normal operation. | returned if that stateid were to be used in normal operation. | |||
Returning such a status indication is not an error and does not cause | Returning such a status indication is not an error and does not cause | |||
compound processing to terminate. Checks for the validity of the | COMPOUND processing to terminate. Checks for the validity of the | |||
stateid proceed as they would for normal operations with a number of | stateid proceed as they would for normal operations with a number of | |||
exceptions: | exceptions: | |||
o There is no check for the type of stateid object, as would be the | o There is no check for the type of stateid object, as would be the | |||
case for normal use of a stateid. | case for normal use of a stateid. | |||
o There is no reference to the current filehandle. | o There is no reference to the current filehandle. | |||
o Special stateids are always considered invalid (they result in the | o Special stateids are always considered invalid (they result in the | |||
error code NFS4ERR_BAD_STATEID). | error code NFS4ERR_BAD_STATEID). | |||
All stateids are interpreted as being associated with the client for | All stateids are interpreted as being associated with the client for | |||
the current session. Any possible association with a previous | the current session. Any possible association with a previous | |||
instance of the client (as stale stateids) is not considered. | instance of the client (as stale stateids) is not considered. | |||
The errors which are validly returned within the status_code array | The valid status values in the returned status_code array are | |||
are: NFS4_OK, NFS4ERR_BAD_STATEID, NFS4ERR_OLD_STATEID, | NFS4_OK, NFS4ERR_BAD_STATEID, NFS4ERR_OLD_STATEID, | |||
NFS4ERR_EXPIRED, NFS4ERR_ADMIN_REVOKED, and NFS4ERR_DELEG_REVOKED. | NFS4ERR_EXPIRED, NFS4ERR_ADMIN_REVOKED, and NFS4ERR_DELEG_REVOKED. | |||
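The per-stateid results described above can be sketched in Python as follows. This is only an illustration of how a client might sort the returned codes: the function name, the use of strings for status codes, and the assumption that the i-th status code corresponds to the i-th tested stateid are hypothetical and not taken from the specification text. Success is modeled as NFS4_OK, the value used in the TEST_STATEID4res union.

```python
# Hypothetical client-side helper; status codes are modeled as plain strings.
VALID_STATUS = {
    "NFS4_OK",
    "NFS4ERR_BAD_STATEID",
    "NFS4ERR_OLD_STATEID",
    "NFS4ERR_EXPIRED",
    "NFS4ERR_ADMIN_REVOKED",
    "NFS4ERR_DELEG_REVOKED",
}


def partition_stateids(stateids, status_codes):
    """Split tested stateids into still-valid ones and invalid ones.

    Assumes status_codes[i] is the result for stateids[i].
    A non-OK code is a per-stateid result, not a failure of the
    operation itself, so every entry is examined.
    """
    if len(stateids) != len(status_codes):
        raise ValueError("one status code is expected per tested stateid")
    valid = []
    invalid = {}
    for sid, code in zip(stateids, status_codes):
        if code not in VALID_STATUS:
            raise ValueError(f"unexpected status code: {code}")
        if code == "NFS4_OK":
            valid.append(sid)
        else:
            invalid[sid] = code
    return valid, invalid
```

A client that has seen one of the SEQ4_STATUS_*_REVOKED flags could pass its known stateids through such a helper and then deal with only the entries reported as invalid.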
18.48.4. IMPLEMENTATION | 18.48.4. IMPLEMENTATION | |||
See Section 8.2.2 and Section 8.2.4 for a discussion of stateid | See Sections 8.2.2 and 8.2.4 for a discussion of stateid structure, | |||
structure, lifetime, and validation. | lifetime, and validation. | |||
18.49. Operation 56: WANT_DELEGATION - Request Delegation | 18.49. Operation 56: WANT_DELEGATION - Request Delegation | |||
18.49.1. ARGUMENT | 18.49.1. ARGUMENT | |||
union deleg_claim4 switch (open_claim_type4 dc_claim) { | union deleg_claim4 switch (open_claim_type4 dc_claim) { | |||
/* | /* | |||
* No special rights to object. Ordinary delegation | * No special rights to object. Ordinary delegation | |||
* request of the specified object. Object identified | * request of the specified object. Object identified | |||
* by filehandle. | * by filehandle. | |||
skipping to change at page 566, line 18 | skipping to change at page 569, line 18 | |||
for a specific condition, and where multiple conditions apply, the | for a specific condition, and where multiple conditions apply, the | |||
server MAY return any of the mandated error codes. | server MAY return any of the mandated error codes. | |||
This operation allows a client to: | This operation allows a client to: | |||
o Get a delegation on all types of files except directories. | o Get a delegation on all types of files except directories. | |||
o Register a "want" for a delegation for the specified file object, | o Register a "want" for a delegation for the specified file object, | |||
and be notified via a callback when the delegation is available. | and be notified via a callback when the delegation is available. | |||
The server MAY support notifications of availability via | The server MAY support notifications of availability via | |||
callbacks. If the server does not support registration of wants | callbacks. If the server does not support registration of wants, | |||
it MUST NOT return an error to indicate that, and instead MUST | it MUST NOT return an error to indicate that, and instead MUST | |||
return with ond_why set to WND4_CONTENTION or WND4_RESOURCE and | return with ond_why set to WND4_CONTENTION or WND4_RESOURCE and | |||
ond_server_will_push_deleg or ond_server_will_signal_avail set to | ond_server_will_push_deleg or ond_server_will_signal_avail set to | |||
FALSE. When the server indicates that it will notify the client | FALSE. When the server indicates that it will notify the client | |||
by means of a callback, it will either provide the delegation | by means of a callback, it will either provide the delegation | |||
using a CB_PUSH_DELEG operation, or cancel its promise by sending | using a CB_PUSH_DELEG operation or cancel its promise by sending a | |||
a CB_WANTS_CANCELLED operation. | CB_WANTS_CANCELLED operation. | |||
o Cancel a want for a delegation. | o Cancel a want for a delegation. | |||
The client SHOULD NOT set OPEN4_SHARE_ACCESS_READ and SHOULD NOT set | The client SHOULD NOT set OPEN4_SHARE_ACCESS_READ and SHOULD NOT set | |||
OPEN4_SHARE_ACCESS_WRITE in wda_want. If it does, the server MUST | OPEN4_SHARE_ACCESS_WRITE in wda_want. If it does, the server MUST | |||
ignore them. | ignore them. | |||
The meanings of the following flags in wda_want are the same as they | The meanings of the following flags in wda_want are the same as they | |||
are in OPEN: | are in OPEN, except as noted below. | |||
o OPEN4_SHARE_ACCESS_WANT_READ_DELEG | o OPEN4_SHARE_ACCESS_WANT_READ_DELEG | |||
o OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG | o OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG | |||
o OPEN4_SHARE_ACCESS_WANT_ANY_DELEG | o OPEN4_SHARE_ACCESS_WANT_ANY_DELEG | |||
o OPEN4_SHARE_ACCESS_WANT_NO_DELEG | o OPEN4_SHARE_ACCESS_WANT_NO_DELEG. Unlike the OPEN operation, this | |||
flag SHOULD NOT be set by the client in the arguments to | ||||
WANT_DELEGATION, and MUST be ignored by the server. | ||||
o OPEN4_SHARE_ACCESS_WANT_CANCEL | o OPEN4_SHARE_ACCESS_WANT_CANCEL | |||
o OPEN4_SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL | o OPEN4_SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL | |||
o OPEN4_SHARE_ACCESS_WANT_PUSH_DELEG_WHEN_UNCONTENDED | o OPEN4_SHARE_ACCESS_WANT_PUSH_DELEG_WHEN_UNCONTENDED | |||
The handling of the above flags in WANT_DELEGATION is the same as in | The handling of the above flags in WANT_DELEGATION is the same as in | |||
OPEN. Information about the delegation and/or the promises the | OPEN. Information about the delegation and/or the promises the | |||
server is making regarding future callbacks are the same as those | server is making regarding future callbacks are the same as those | |||
described in the open_delegation4 structure. | described in the open_delegation4 structure. | |||
The successful results of WANT_DELEG are of type open_delegation4 | The successful results of WANT_DELEGATION are of data type | |||
which is the same type as the "delegation" field in the results of | open_delegation4, which is the same data type as the "delegation" | |||
the OPEN operation (see Section 18.16.3). The server constructs | field in the results of the OPEN operation (see Section 18.16.3). | |||
wdr_resok4 the same way it constructs OPEN's "delegation" with one | The server constructs wdr_resok4 the same way it constructs OPEN's | |||
difference: WANT_DELEGATION MUST NOT return a delegation type of | "delegation" with one difference: WANT_DELEGATION MUST NOT return a | |||
OPEN_DELEGATE_NONE. | delegation type of OPEN_DELEGATE_NONE. | |||
If (wda_want & OPEN4_SHARE_ACCESS_WANT_DELEG_MASK) is zero then the | If ((wda_want & OPEN4_SHARE_ACCESS_WANT_DELEG_MASK) & | |||
client is indicating no desire for a delegation and the server MUST | ~OPEN4_SHARE_ACCESS_WANT_NO_DELEG) is zero, then the client is | |||
return NFS4ERR_INVAL. | indicating no explicit desire or non-desire for a delegation and the | |||
server MUST return NFS4ERR_INVAL. | ||||
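The right-hand (RFC) form of this test can be illustrated with a short Python sketch. The numeric flag values below are written from memory of the protocol's XDR definition and should be treated as assumptions to be checked against that definition; the function name is hypothetical.

```python
# Flag values are assumptions recalled from the XDR definition; verify before use.
OPEN4_SHARE_ACCESS_WANT_DELEG_MASK = 0xFF00
OPEN4_SHARE_ACCESS_WANT_READ_DELEG = 0x0100
OPEN4_SHARE_ACCESS_WANT_NO_DELEG = 0x0400
OPEN4_SHARE_ACCESS_WANT_CANCEL = 0x0500


def want_is_invalid(wda_want):
    """Return True when the RFC-side test would yield NFS4ERR_INVAL.

    Implements ((wda_want & MASK) & ~WANT_NO_DELEG) == 0.
    """
    want_bits = wda_want & OPEN4_SHARE_ACCESS_WANT_DELEG_MASK
    return (want_bits & ~OPEN4_SHARE_ACCESS_WANT_NO_DELEG) == 0
```

Under these assumed values, a wda_want that carries only OPEN4_SHARE_ACCESS_WANT_NO_DELEG is rejected in the same way as one with no want bits at all, while OPEN4_SHARE_ACCESS_WANT_CANCEL still passes the test.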
The client uses the OPEN4_SHARE_ACCESS_WANT_NO_DELEG flag in the | The client uses the OPEN4_SHARE_ACCESS_WANT_CANCEL flag in the | |||
WANT_DELEGATION operation to cancel a previously requested want for a | WANT_DELEGATION operation to cancel a previously requested want for a | |||
delegation. Note that if the server is in the process of sending the | delegation. Note that if the server is in the process of sending the | |||
delegation (via CB_PUSH_DELEG) at the time the client sends a | delegation (via CB_PUSH_DELEG) at the time the client sends a | |||
cancellation of the want, the delegation might still be pushed to the | cancellation of the want, the delegation might still be pushed to the | |||
client. | client. | |||
If WANT_DELEGATION fails to return a delegation, and the server | If WANT_DELEGATION fails to return a delegation, and the server | |||
returns NFS4_OK, the server MUST set the delegation type to | returns NFS4_OK, the server MUST set the delegation type to | |||
OPEN4_DELEGATE_NONE_EXT, and set od_whynone, as described in | OPEN4_DELEGATE_NONE_EXT, and set od_whynone, as described in | |||
Section 18.16. Write delegations are not available for file types | Section 18.16. Write delegations are not available for file types | |||
that are not writable. This includes file objects of types: NF4BLK, | that are not writable. This includes file objects of types NF4BLK, | |||
NF4CHR, NF4LNK, NF4SOCK, and NF4FIFO. If the client requests | NF4CHR, NF4LNK, NF4SOCK, and NF4FIFO. If the client requests | |||
OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG without | OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG without | |||
OPEN4_SHARE_ACCESS_WANT_READ_DELEG on an object with one of the | OPEN4_SHARE_ACCESS_WANT_READ_DELEG on an object with one of the | |||
aforementioned file types, the server must set | aforementioned file types, the server must set | |||
WND4_WRITE_DELEG_NOT_SUPP_FTYPE. | wdr_resok4.od_whynone.ond_why to WND4_WRITE_DELEG_NOT_SUPP_FTYPE. | |||
18.49.4. IMPLEMENTATION | 18.49.4. IMPLEMENTATION | |||
A request for a conflicting delegation is not normally intended to | A request for a conflicting delegation is not normally intended to | |||
trigger the recall of the existing delegation. Servers may choose to | trigger the recall of the existing delegation. Servers may choose to | |||
treat some clients as having higher priority such that their wants | treat some clients as having higher priority such that their wants | |||
will trigger recall of an existing delegation, although that is | will trigger recall of an existing delegation, although that is | |||
expected to be an unusual situation. | expected to be an unusual situation. | |||
Servers will generally recall delegations assigned by WANT_DELEGATION | Servers will generally recall delegations assigned by WANT_DELEGATION | |||
skipping to change at page 568, line 31 | skipping to change at page 571, line 32 | |||
The DESTROY_CLIENTID operation destroys the client ID. If there are | The DESTROY_CLIENTID operation destroys the client ID. If there are | |||
sessions (both idle and non-idle), opens, locks, delegations, | sessions (both idle and non-idle), opens, locks, delegations, | |||
layouts, and/or wants (Section 18.49) associated with the unexpired | layouts, and/or wants (Section 18.49) associated with the unexpired | |||
lease of the client ID, the server MUST return NFS4ERR_CLIENTID_BUSY. | lease of the client ID, the server MUST return NFS4ERR_CLIENTID_BUSY. | |||
DESTROY_CLIENTID MAY be preceded with a SEQUENCE operation as long as | DESTROY_CLIENTID MAY be preceded with a SEQUENCE operation as long as | |||
the client ID derived from the session ID of SEQUENCE is not the same | the client ID derived from the session ID of SEQUENCE is not the same | |||
as the client ID to be destroyed. If the client IDs are the same, | as the client ID to be destroyed. If the client IDs are the same, | |||
then the server MUST return NFS4ERR_CLIENTID_BUSY. | then the server MUST return NFS4ERR_CLIENTID_BUSY. | |||
If DESTROY_CLIENTID is not prefixed by SEQUENCE, it MUST be the only | If DESTROY_CLIENTID is not prefixed by SEQUENCE, it MUST be the only | |||
operation in the COMPOUND request (otherwise the server MUST return | operation in the COMPOUND request (otherwise, the server MUST return | |||
NFS4ERR_NOT_ONLY_OP). If the operation is sent without a SEQUENCE | NFS4ERR_NOT_ONLY_OP). If the operation is sent without a SEQUENCE | |||
preceding it, a client that retransmits the request may receive an | preceding it, a client that retransmits the request may receive an | |||
error in response, because the original request might have been | error in response, because the original request might have been | |||
successfully executed. | successfully executed. | |||
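The checks described above might be combined on the server roughly as in the following Python sketch. The function name, its parameters, and the order in which the checks are applied are assumptions made for illustration; the text does not prescribe an ordering.

```python
def destroy_clientid_status(op_names, target_client_id,
                            session_client_id, has_associated_state):
    """Model the DESTROY_CLIENTID checks (hypothetical helper).

    op_names: operation names in the COMPOUND request, in order.
    session_client_id: client ID derived from SEQUENCE's session ID,
        or None when the request has no leading SEQUENCE.
    has_associated_state: True if sessions, opens, locks, delegations,
        layouts, or wants remain under the unexpired lease.
    """
    if op_names and op_names[0] == "SEQUENCE":
        # SEQUENCE must belong to a different client ID than the target.
        if session_client_id == target_client_id:
            return "NFS4ERR_CLIENTID_BUSY"
    elif op_names != ["DESTROY_CLIENTID"]:
        # Without SEQUENCE, DESTROY_CLIENTID must be the only operation.
        return "NFS4ERR_NOT_ONLY_OP"
    if has_associated_state:
        return "NFS4ERR_CLIENTID_BUSY"
    return "NFS4_OK"
```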
18.50.4. IMPLEMENTATION | 18.50.4. IMPLEMENTATION | |||
DESTROY_CLIENTID allows a server to immediately reclaim the resources | DESTROY_CLIENTID allows a server to immediately reclaim the resources | |||
consumed by an unused client ID, and also to forget that it ever | consumed by an unused client ID, and also to forget that it ever | |||
generated the client ID. By forgetting it ever generated the client | generated the client ID. By forgetting that it ever generated the | |||
ID the server can safely reuse the client ID on a future EXCHANGE_ID | client ID, the server can safely reuse the client ID on a future | |||
operation. | EXCHANGE_ID operation. | |||
18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims Finished | 18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims Finished | |||
18.51.1. ARGUMENT | 18.51.1. ARGUMENT | |||
struct RECLAIM_COMPLETE4args { | struct RECLAIM_COMPLETE4args { | |||
/* | /* | |||
* If rca_one_fs TRUE, | * If rca_one_fs TRUE, | |||
* | * | |||
* CURRENT_FH: object in | * CURRENT_FH: object in | |||
skipping to change at page 569, line 46 | skipping to change at page 572, line 46 | |||
o When rca_one_fs is TRUE, a file system-specific RECLAIM_COMPLETE | o When rca_one_fs is TRUE, a file system-specific RECLAIM_COMPLETE | |||
is being done. This indicates that recovery of locks for a single | is being done. This indicates that recovery of locks for a single | |||
fs (the one designated by the current filehandle) due to a file | fs (the one designated by the current filehandle) due to a file | |||
system transition has been completed. Presence of a current | system transition has been completed. Presence of a current | |||
filehandle is only required when rca_one_fs is set to TRUE. | filehandle is only required when rca_one_fs is set to TRUE. | |||
Once a RECLAIM_COMPLETE is done, there can be no further reclaim | Once a RECLAIM_COMPLETE is done, there can be no further reclaim | |||
operations for locks whose scope is defined as having completed | operations for locks whose scope is defined as having completed | |||
recovery. Once the client sends RECLAIM_COMPLETE, the server will | recovery. Once the client sends RECLAIM_COMPLETE, the server will | |||
not allow the client to do subsequent reclaims of locking state for | not allow the client to do subsequent reclaims of locking state for | |||
that scope and if these are attempted, will return NFS4ERR_NO_GRACE. | that scope and, if these are attempted, will return NFS4ERR_NO_GRACE. | |||
Whenever a client establishes a new client ID and before it does the | Whenever a client establishes a new client ID and before it does the | |||
first non-reclaim operation that obtains a lock, it MUST send a | first non-reclaim operation that obtains a lock, it MUST send a | |||
RECLAIM_COMPLETE with rca_one_fs set to FALSE, even if there are no | RECLAIM_COMPLETE with rca_one_fs set to FALSE, even if there are no | |||
locks to reclaim. If non-reclaim locking operations are done before | locks to reclaim. If non-reclaim locking operations are done before | |||
the RECLAIM_COMPLETE, an NFS4ERR_GRACE error will be returned. | the RECLAIM_COMPLETE, an NFS4ERR_GRACE error will be returned. | |||
Similarly, when the client accesses a file system on a new server, | Similarly, when the client accesses a file system on a new server, | |||
before it sends the first non-reclaim operation that obtains a lock | before it sends the first non-reclaim operation that obtains a lock | |||
on this new server, it MUST send a RECLAIM_COMPLETE with rca_one_fs | on this new server, it MUST send a RECLAIM_COMPLETE with rca_one_fs | |||
skipping to change at page 570, line 38 | skipping to change at page 573, line 38 | |||
NFS4ERR_GRACE errors until the server is ready to terminate its grace | NFS4ERR_GRACE errors until the server is ready to terminate its grace | |||
period. | period. | |||
18.51.4. IMPLEMENTATION | 18.51.4. IMPLEMENTATION | |||
Servers will typically use the information as to when reclaim | Servers will typically use the information as to when reclaim | |||
activity is complete to reduce the length of the grace period. When | activity is complete to reduce the length of the grace period. When | |||
the server maintains in persistent storage a list of clients that | the server maintains in persistent storage a list of clients that | |||
might have had locks, it is in a position to use the fact that all | might have had locks, it is in a position to use the fact that all | |||
such clients have done a RECLAIM_COMPLETE to terminate the grace | such clients have done a RECLAIM_COMPLETE to terminate the grace | |||
period and begin normal operations (i.e. grant requests for new | period and begin normal operations (i.e., grant requests for new | |||
locks) sooner than it might otherwise. | locks) sooner than it might otherwise. | |||
Latency can be minimized by doing a RECLAIM_COMPLETE as part of the | Latency can be minimized by doing a RECLAIM_COMPLETE as part of the | |||
COMPOUND request in which the last lock-reclaiming operation is done. | COMPOUND request in which the last lock-reclaiming operation is done. | |||
When there are no reclaims to be done, RECLAIM_COMPLETE should be | When there are no reclaims to be done, RECLAIM_COMPLETE should be | |||
done immediately in order to allow the grace period to end as soon as | done immediately in order to allow the grace period to end as soon as | |||
possible. | possible. | |||
RECLAIM_COMPLETE should only be done once for each server instance, | RECLAIM_COMPLETE should only be done once for each server instance or | |||
or occasion of the transition of a file system. If it is done a | occasion of the transition of a file system. If it is done a second | |||
second time, the error NFS4ERR_COMPLETE_ALREADY will result. Note | time, the error NFS4ERR_COMPLETE_ALREADY will result. Note that | |||
that because of the session feature's retry protection, retries of | because of the session feature's retry protection, retries of | |||
COMPOUND requests containing a RECLAIM_COMPLETE operation will not | COMPOUND requests containing a RECLAIM_COMPLETE operation will not | |||
result in this error. | result in this error. | |||
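The error behavior around RECLAIM_COMPLETE can be summarized with a small Python model. It is a deliberately simplified sketch: the class and method names are hypothetical, one instance stands for a single client and a single reclaim scope, and neither the server's own grace period nor the session replay cache is modeled.

```python
class ReclaimTracker:
    """Per-client, per-scope reclaim bookkeeping (hypothetical model)."""

    def __init__(self):
        self.complete = False  # True once RECLAIM_COMPLETE was accepted

    def reclaim_complete(self):
        # A second RECLAIM_COMPLETE for the same scope is an error.
        if self.complete:
            return "NFS4ERR_COMPLETE_ALREADY"
        self.complete = True
        return "NFS4_OK"

    def lock_request(self, is_reclaim):
        # Reclaims are refused once the scope has completed recovery.
        if is_reclaim and self.complete:
            return "NFS4ERR_NO_GRACE"
        # Non-reclaim locking is refused until RECLAIM_COMPLETE is sent.
        if not is_reclaim and not self.complete:
            return "NFS4ERR_GRACE"
        return "NFS4_OK"
```

In a real server, a non-reclaim request made after RECLAIM_COMPLETE may still see NFS4ERR_GRACE until the server ends its grace period; that interaction is outside this sketch.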
When a RECLAIM_COMPLETE is done, the client effectively acknowledges | When a RECLAIM_COMPLETE is sent, the client effectively acknowledges | |||
any locks not yet reclaimed as lost. This allows the server to re- | any locks not yet reclaimed as lost. This allows the server to re- | |||
enable this client to subsequently recover locks. The server might | enable the client to recover locks if the occurrence of edge | |||
have disabled the client's ability to recover locks in order to | conditions, as described in Section 8.4.3, had caused the server to | |||
prevent the occurrence of the edge conditions described in | disable the client from recovering locks. | |||
Section 8.4.3. | ||||
18.52. Operation 10044: ILLEGAL - Illegal operation | 18.52. Operation 10044: ILLEGAL - Illegal Operation | |||
18.52.1. ARGUMENTS | 18.52.1. ARGUMENTS | |||
void; | void; | |||
18.52.2. RESULTS | 18.52.2. RESULTS | |||
struct ILLEGAL4res { | struct ILLEGAL4res { | |||
nfsstat4 status; | nfsstat4 status; | |||
}; | }; | |||
skipping to change at page 575, line 23 | skipping to change at page 578, line 23 | |||
callback procedures into a single RPC request. The main callback RPC | callback procedures into a single RPC request. The main callback RPC | |||
program has two main procedures: CB_NULL and CB_COMPOUND. All other | program has two main procedures: CB_NULL and CB_COMPOUND. All other | |||
operations use the CB_COMPOUND procedure as a wrapper. | operations use the CB_COMPOUND procedure as a wrapper. | |||
During the processing of the CB_COMPOUND procedure, the client may | During the processing of the CB_COMPOUND procedure, the client may | |||
find that it does not have the available resources to execute any or | find that it does not have the available resources to execute any or | |||
all of the operations within the CB_COMPOUND sequence. Refer to | all of the operations within the CB_COMPOUND sequence. Refer to | |||
Section 2.10.6.4 for details. | Section 2.10.6.4 for details. | |||
The minorversion field of the arguments MUST be the same as the | The minorversion field of the arguments MUST be the same as the | |||
minorversion of the COMPOUND procedure used to created the client ID | minorversion of the COMPOUND procedure used to create the client ID | |||
and session. For NFSv4.1, minorversion MUST be set to 1. | and session. For NFSv4.1, minorversion MUST be set to 1. | |||
Contained within the CB_COMPOUND results is a 'status' field. This | Contained within the CB_COMPOUND results is a "status" field. This | |||
status MUST be equal to the status of the last operation that was | status MUST be equal to the status of the last operation that was | |||
executed within the CB_COMPOUND procedure. Therefore, if an | executed within the CB_COMPOUND procedure. Therefore, if an | |||
operation incurred an error then the 'status' value will be the same | operation incurred an error, then the "status" value will be the same | |||
error value as is being returned for the operation that failed. | error value as is being returned for the operation that failed. | |||
The "tag" field is handled the same way as that of COMPOUND procedure | The "tag" field is handled the same way as that of the COMPOUND | |||
(see Section 16.2.3). | procedure (see Section 16.2.3). | |||
Illegal operation codes are handled in the same way as they are | Illegal operation codes are handled in the same way as they are | |||
handled for the COMPOUND procedure. | handled for the COMPOUND procedure. | |||
19.2.4. IMPLEMENTATION | 19.2.4. IMPLEMENTATION | |||
The CB_COMPOUND procedure is used to combine individual operations | The CB_COMPOUND procedure is used to combine individual operations | |||
into a single RPC request. The client interprets each of the | into a single RPC request. The client interprets each of the | |||
operations in turn. If an operation is executed by the client and | operations in turn. If an operation is executed by the client and | |||
the status of that operation is NFS4_OK, then the next operation in | the status of that operation is NFS4_OK, then the next operation in | |||
the CB_COMPOUND procedure is executed. The client continues this | the CB_COMPOUND procedure is executed. The client continues this | |||
process until there are no more operations to be executed or one of | process until there are no more operations to be executed or one of | |||
the operations has a status value other than NFS4_OK. | the operations has a status value other than NFS4_OK. | |||
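The evaluation loop described above can be sketched in Python. The function name and the representation of operations as zero-argument callables returning a status string are assumptions made for illustration, and the errors reported when zero operations are processed are not modeled.

```python
def run_cb_compound(operations):
    """Execute callback operations in order, stopping at the first failure.

    operations: list of zero-argument callables, each returning a
        status string such as "NFS4_OK" (hypothetical representation).
    Returns (overall_status, per_operation_results); the overall status
    equals the status of the last operation that was executed.
    """
    results = []
    status = "NFS4_OK"
    for op in operations:
        status = op()
        results.append(status)
        if status != "NFS4_OK":
            break  # later operations are not executed
    return status, results
```

For example, with three operations whose second one fails, only two results are produced and the overall status is the failing operation's status.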
19.2.5. ERRORS | 19.2.5. ERRORS | |||
CB_COMPOUND will of course return every error that each operation on | CB_COMPOUND will of course return every error that each operation on | |||
the backchannel can return (see Table 7). However if CB_COMPOUND | the backchannel can return (see Table 7). However, if CB_COMPOUND | |||
returns zero operations, obviously the error returned by COMPOUND has | returns zero operations, obviously the error returned by COMPOUND has | |||
nothing to do with an error returned by an operation. The list of | nothing to do with an error returned by an operation. The list of | |||
errors CB_COMPOUND will return if it processes zero operations | errors CB_COMPOUND will return if it processes zero operations | |||
include: | includes: | |||
CB_COMPOUND error returns | CB_COMPOUND error returns | |||
+------------------------------+------------------------------------+ | +------------------------------+------------------------------------+ | |||
| Error | Notes | | | Error | Notes | | |||
+------------------------------+------------------------------------+ | +------------------------------+------------------------------------+ | |||
| NFS4ERR_BADCHAR | The tag argument has a character | | | NFS4ERR_BADCHAR | The tag argument has a character | | |||
| | the replier does not support. | | | | the replier does not support. | | |||
| NFS4ERR_BADXDR | | | | NFS4ERR_BADXDR | | | |||
| NFS4ERR_DELAY | | | | NFS4ERR_DELAY | | | |||
skipping to change at page 577, line 21 | skipping to change at page 580, line 21 | |||
union CB_GETATTR4res switch (nfsstat4 status) { | union CB_GETATTR4res switch (nfsstat4 status) { | |||
case NFS4_OK: | case NFS4_OK: | |||
CB_GETATTR4resok resok4; | CB_GETATTR4resok resok4; | |||
default: | default: | |||
void; | void; | |||
}; | }; | |||
20.1.3. DESCRIPTION | 20.1.3. DESCRIPTION | |||
The CB_GETATTR operation is used by the server to obtain the current | The CB_GETATTR operation is used by the server to obtain the current | |||
modified state of a file that has been write delegated. The | modified state of a file that has been OPEN_DELEGATE_WRITE delegated. | |||
attributes size and change are the only ones guaranteed to be | The size and change attributes are the only ones guaranteed to be | |||
serviced by the client. See Section 10.4.3 for a full description of | serviced by the client. See Section 10.4.3 for a full description of | |||
how the client and server are to interact with the use of CB_GETATTR. | how the client and server are to interact with the use of CB_GETATTR. | |||
If the filehandle specified is not one for which the client holds a | If the filehandle specified is not one for which the client holds an | |||
write delegation, an NFS4ERR_BADHANDLE error is returned. | OPEN_DELEGATE_WRITE delegation, an NFS4ERR_BADHANDLE error is | |||
returned. | ||||
20.1.4. IMPLEMENTATION | 20.1.4. IMPLEMENTATION | |||
The client returns attrmask bits and the associated attribute values | The client returns attrmask bits and the associated attribute values | |||
only for the change attribute and attributes that it may change | only for the change attribute and attributes that it may change | |||
(time_modify and size). | (time_modify and size). | |||
20.2. Operation 4: CB_RECALL - Recall a Delegation | 20.2. Operation 4: CB_RECALL - Recall a Delegation | |||
20.2.1. ARGUMENT | 20.2.1. ARGUMENT | |||
skipping to change at page 578, line 10 | skipping to change at page 581, line 16 | |||
struct CB_RECALL4res { | struct CB_RECALL4res { | |||
nfsstat4 status; | nfsstat4 status; | |||
}; | }; | |||
20.2.3. DESCRIPTION | 20.2.3. DESCRIPTION | |||
The CB_RECALL operation is used to begin the process of recalling a | The CB_RECALL operation is used to begin the process of recalling a | |||
delegation and returning it to the server. | delegation and returning it to the server. | |||
The truncate flag is used to optimize recall for a file object which | The truncate flag is used to optimize recall for a file object that | |||
is a regular file and is about to be truncated to zero. When it is | is a regular file and is about to be truncated to zero. When it is | |||
TRUE, the client is freed of the obligation to propagate modified | TRUE, the client is freed of the obligation to propagate modified | |||
data for the file to the server, since this data is irrelevant. | data for the file to the server, since this data is irrelevant. | |||
If the handle specified is not one for which the client holds a | If the handle specified is not one for which the client holds a | |||
delegation, an NFS4ERR_BADHANDLE error is returned. | delegation, an NFS4ERR_BADHANDLE error is returned. | |||
If the stateid specified is not one corresponding to an open | If the stateid specified is not one corresponding to an OPEN | |||
delegation for the file specified by the filehandle, an | delegation for the file specified by the filehandle, an | |||
NFS4ERR_BAD_STATEID is returned. | NFS4ERR_BAD_STATEID is returned. | |||
20.2.4. IMPLEMENTATION | 20.2.4. IMPLEMENTATION | |||
The client SHOULD reply to the callback immediately. Replying does | The client SHOULD reply to the callback immediately. Replying does | |||
not complete the recall except when the value of the reply's status | not complete the recall except when the value of the reply's status | |||
field is neither NFS4ERR_DELAY nor NFS4_OK. The recall is not | field is neither NFS4ERR_DELAY nor NFS4_OK. The recall is not | |||
complete until the delegation is returned using a DELEGRETURN | complete until the delegation is returned using a DELEGRETURN | |||
operation. | operation. | |||
skipping to change at page 580, line 4 | skipping to change at page 583, line 4 | |||
struct CB_LAYOUTRECALL4res { | struct CB_LAYOUTRECALL4res { | |||
nfsstat4 clorr_status; | nfsstat4 clorr_status; | |||
}; | }; | |||
20.3.3. DESCRIPTION | 20.3.3. DESCRIPTION | |||
The CB_LAYOUTRECALL operation is used by the server to recall layouts | The CB_LAYOUTRECALL operation is used by the server to recall layouts | |||
from the client; as a result, the client will begin the process of | from the client; as a result, the client will begin the process of | |||
returning layouts via LAYOUTRETURN. The CB_LAYOUTRECALL operation | returning layouts via LAYOUTRETURN. The CB_LAYOUTRECALL operation | |||
specifies one of three forms of recall processing with the value of | specifies one of three forms of recall processing with the value of | |||
layoutrecall_type4. The recall is either for a specific layout (by | layoutrecall_type4. The recall is for one of the following: a | |||
file), for an entire file system (FSID), or for all file systems | specific layout of a specific file (LAYOUTRECALL4_FILE), an entire | |||
(ALL). | file system ID (LAYOUTRECALL4_FSID), or all file systems | |||
(LAYOUTRECALL4_ALL). | ||||
The behavior of the operation varies based on the value of the | The behavior of the operation varies based on the value of the | |||
layoutrecall_type4. The value and behaviors are: | layoutrecall_type4. The value and behaviors are: | |||
LAYOUTRECALL4_FILE | LAYOUTRECALL4_FILE | |||
For a layout to match the recall request, the values of the | For a layout to match the recall request, the values of the | |||
following fields must match those of the layout: clora_type, | following fields must match those of the layout: clora_type, | |||
clora_iomode, lor_fh, and the byte range specified by lor_offset | clora_iomode, lor_fh, and the byte-range specified by lor_offset | |||
and lor_length. The clora_iomode field may have a special value | and lor_length. The clora_iomode field may have a special value | |||
of LAYOUTIOMODE4_ANY. The special value LAYOUTIOMODE4_ANY will | of LAYOUTIOMODE4_ANY. The special value LAYOUTIOMODE4_ANY will | |||
match any iomode originally returned in a layout; therefore it | match any iomode originally returned in a layout; therefore, it | |||
acts as a wild card. The other special value used is for | acts as a wild card. The other special value used is for | |||
lor_length. If lor_length has a value of NFS4_UINT64_MAX, the | lor_length. If lor_length has a value of NFS4_UINT64_MAX, the | |||
lor_length field means the maximum possible file size. If a | lor_length field means the maximum possible file size. If a | |||
matching layout is found, it MUST be returned using the | matching layout is found, it MUST be returned using the | |||
LAYOUTRETURN operation (see Section 18.44). An example of the | LAYOUTRETURN operation (see Section 18.44). An example of the | |||
field's special value use is if clora_iomode is LAYOUTIOMODE4_ANY, | field's special value use is if clora_iomode is LAYOUTIOMODE4_ANY, | |||
lor_offset is zero, and lor_length is NFS4_UINT64_MAX, then the | lor_offset is zero, and lor_length is NFS4_UINT64_MAX, then the | |||
entire layout is to be returned. | entire layout is to be returned. | |||
The NFS4ERR_NOMATCHING_LAYOUT error is only returned when the | The NFS4ERR_NOMATCHING_LAYOUT error is only returned when the | |||
skipping to change at page 581, line 10 | skipping to change at page 584, line 11 | |||
mappings. | mappings. | |||
In processing the layout recall request, the client also varies its | In processing the layout recall request, the client also varies its | |||
behavior based on the value of the clora_changed field. This field | behavior based on the value of the clora_changed field. This field | |||
is used by the server to provide additional context for the reason | is used by the server to provide additional context for the reason | |||
why the layout is being recalled. A FALSE value for clora_changed | why the layout is being recalled. A FALSE value for clora_changed | |||
indicates that no change in the layout is expected and the client may | indicates that no change in the layout is expected and the client may | |||
write modified data to the storage devices involved; this must be | write modified data to the storage devices involved; this must be | |||
done prior to returning the layout via LAYOUTRETURN. A TRUE value | done prior to returning the layout via LAYOUTRETURN. A TRUE value | |||
for clora_changed indicates that the server is changing the layout. | for clora_changed indicates that the server is changing the layout. | |||
Examples of layout changes and reasons for a TRUE indication are: the | Examples of layout changes and reasons for a TRUE indication are the | |||
metadata server is restriping the file or a permanent error has | following: the metadata server is restriping the file or a permanent | |||
occurred on a storage device and the metadata server would like to | error has occurred on a storage device and the metadata server would | |||
provide a new layout for the file. Therefore, a clora_changed value | like to provide a new layout for the file. Therefore, a | |||
of TRUE indicates some level of change for the layout and the client | clora_changed value of TRUE indicates some level of change for the | |||
SHOULD NOT write and commit modified data to the storage devices. In | layout and the client SHOULD NOT write and commit modified data to | |||
this case, the client writes and commits data through the metadata | the storage devices. In this case, the client writes and commits | |||
server. | data through the metadata server. | |||
See Section 12.5.3 for a description of how the lor_stateid field in | See Section 12.5.3 for a description of how the lor_stateid field in | |||
the arguments is to be constructed. Note that the "seqid" field of | the arguments is to be constructed. Note that the "seqid" field of | |||
lor_stateid MUST NOT be zero. See Section 8.2, Section 12.5.3, and | lor_stateid MUST NOT be zero. See Sections 8.2, 12.5.3, and 12.5.5.2 | |||
Section 12.5.5.2 for a further discussion and requirements. | for a further discussion and requirements. | |||
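
The LAYOUTRECALL4_FILE matching rules described above can be sketched as follows. This is a minimal illustration, not the specified algorithm: the HeldLayout record, the string stand-ins for protocol constants, and the helper names are hypothetical, and treating "the byte-range must match" as "the byte-ranges overlap" is an assumption made for the sketch.

```python
from dataclasses import dataclass

NFS4_UINT64_MAX = 0xFFFFFFFFFFFFFFFF
LAYOUTIOMODE4_ANY = "ANY"  # stand-in string for the protocol constant


@dataclass(frozen=True)
class HeldLayout:
    """Hypothetical client-side record of a layout segment."""
    layout_type: str
    iomode: str
    fh: bytes
    offset: int
    length: int


def range_end(offset, length):
    """Exclusive end of a byte-range; NFS4_UINT64_MAX means 'to maximum file size'."""
    if length == NFS4_UINT64_MAX:
        return NFS4_UINT64_MAX
    return offset + length


def matches_file_recall(layout, clora_type, clora_iomode,
                        lor_fh, lor_offset, lor_length):
    """Return True if a held layout is covered by a LAYOUTRECALL4_FILE recall."""
    if layout.layout_type != clora_type or layout.fh != lor_fh:
        return False
    # LAYOUTIOMODE4_ANY acts as a wild card for the iomode.
    if clora_iomode != LAYOUTIOMODE4_ANY and layout.iomode != clora_iomode:
        return False
    # Assumption: byte-ranges "match" when they overlap.
    return (layout.offset < range_end(lor_offset, lor_length)
            and lor_offset < range_end(layout.offset, layout.length))
```

With clora_iomode set to LAYOUTIOMODE4_ANY, lor_offset zero, and lor_length NFS4_UINT64_MAX, every layout held for the file handle and layout type matches, which corresponds to the "entire layout" case in the text.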
20.3.4. IMPLEMENTATION | 20.3.4. IMPLEMENTATION | |||
The client's processing for CB_LAYOUTRECALL is similar to CB_RECALL | The client's processing for CB_LAYOUTRECALL is similar to CB_RECALL | |||
(recall of file delegations) in that the client responds to the | (recall of file delegations) in that the client responds to the | |||
request before actually returning layouts via the LAYOUTRETURN | request before actually returning layouts via the LAYOUTRETURN | |||
operation. While the client responds to the CB_LAYOUTRECALL | operation. While the client responds to the CB_LAYOUTRECALL | |||
immediately, the operation is not considered complete (i.e. | immediately, the operation is not considered complete (i.e., | |||
considered pending) until all affected layouts are returned to the | considered pending) until all affected layouts are returned to the | |||
server via the LAYOUTRETURN operation. | server via the LAYOUTRETURN operation. | |||
Before returning the layout to the server via LAYOUTRETURN, the | Before returning the layout to the server via LAYOUTRETURN, the | |||
client should wait for the response from in-process or in-flight | client should wait for the response from in-process or in-flight | |||
READ, WRITE, or COMMIT operations that use the recalled layout. | READ, WRITE, or COMMIT operations that use the recalled layout. | |||
If the client is holding modified data which is affected by a | If the client is holding modified data that is affected by a recalled | |||
recalled layout, the client has various options for writing the data | layout, the client has various options for writing the data to the | |||
to the server. As always, the client may write the data through the | server. As always, the client may write the data through the | |||
metadata server. In fact, the client may not have a choice other | metadata server. In fact, the client may not have a choice other | |||
than writing to the metadata server when the clora_changed argument | than writing to the metadata server when the clora_changed argument | |||
is TRUE and a new layout is unavailable from the server. However, | is TRUE and a new layout is unavailable from the server. However, | |||
the client may be able to write the modified data to the storage | the client may be able to write the modified data to the storage | |||
device if the clora_changed argument is FALSE; this needs to be done | device if the clora_changed argument is FALSE; this needs to be done | |||
before returning the layout via LAYOUTRETURN. If the client were to | before returning the layout via LAYOUTRETURN. If the client were to | |||
obtain a new layout covering the modified data's range, then writing | obtain a new layout covering the modified data's byte-range, then | |||
to the storage devices is an available alternative. Note that before | writing to the storage devices is an available alternative. Note | |||
obtaining a new layout, the client must first return the original | that before obtaining a new layout, the client must first return the | |||
layout. | original layout. | |||
In the case of modified data being written while the layout is held, | In the case of modified data being written while the layout is held, | |||
the client must use LAYOUTCOMMIT operations at the appropriate time; | the client must use LAYOUTCOMMIT operations at the appropriate time; | |||
as required, LAYOUTCOMMIT must be done before the LAYOUTRETURN. If a | as required, LAYOUTCOMMIT must be done before the LAYOUTRETURN. If a | |||
large amount of modified data is outstanding, the client may send | large amount of modified data is outstanding, the client may send | |||
LAYOUTRETURNs for portions of the recalled layout; this allows the | LAYOUTRETURNs for portions of the recalled layout; this allows the | |||
server to monitor the client's progress and adherence to the original | server to monitor the client's progress and adherence to the original | |||
recall request. However, the last LAYOUTRETURN in a sequence of | recall request. However, the last LAYOUTRETURN in a sequence of | |||
returns, MUST specify the full range being recalled (see | returns MUST specify the full range being recalled (see | |||
Section 12.5.5.1 for details). | Section 12.5.5.1 for details). | |||
If a server needs to delete a device ID, and there are layouts | If a server needs to delete a device ID and there are layouts | |||
referring to the device ID, CB_LAYOUTRECALL MUST be invoked to cause | referring to the device ID, CB_LAYOUTRECALL MUST be invoked to cause | |||
the client to return all layouts referring to device ID before the | the client to return all layouts referring to the device ID before | |||
server can delete the device ID. If the client does not return the | the server can delete the device ID. If the client does not return | |||
affected layouts, the server MAY revoke the layouts. | the affected layouts, the server MAY revoke the layouts. | |||
20.4. Operation 6: CB_NOTIFY - Notify Client of Directory Changes | 20.4. Operation 6: CB_NOTIFY - Notify Client of Directory Changes | |||
20.4.1. ARGUMENT | 20.4.1. ARGUMENT | |||
/* | /* | |||
* Directory notification types. | * Directory notification types. | |||
*/ | */ | |||
enum notify_type4 { | enum notify_type4 { | |||
NOTIFY4_CHANGE_CHILD_ATTRS = 0, | NOTIFY4_CHANGE_CHILD_ATTRS = 0, | |||
skipping to change at page 584, line 27 | skipping to change at page 587, line 27 | |||
over the backchannel. The notification is sent once the original | over the backchannel. The notification is sent once the original | |||
request has been processed on the server. The server will send an | request has been processed on the server. The server will send an | |||
array of notifications for changes that might have occurred in the | array of notifications for changes that might have occurred in the | |||
directory. The notifications are sent as a list of pairs of bitmaps | directory. The notifications are sent as a list of pairs of bitmaps | |||
and values. See Section 3.3.7 for a description of how NFSv4.1 | and values. See Section 3.3.7 for a description of how NFSv4.1 | |||
bitmaps work. | bitmaps work. | |||
If the server has more notifications than can fit in the CB_COMPOUND | If the server has more notifications than can fit in the CB_COMPOUND | |||
request, it SHOULD send a sequence of serial CB_COMPOUND requests so | request, it SHOULD send a sequence of serial CB_COMPOUND requests so | |||
that the client's view of the directory does not become confused. | that the client's view of the directory does not become confused. | |||
E.g. If the server indicates a file named "foo" is added, and that | For example, if the server indicates that a file named "foo" is added | |||
the file "foo" is removed, the order in which the client receives | and that the file "foo" is removed, the order in which the client | |||
these notifications needs to be the same as the order in which | receives these notifications needs to be the same as the order in | |||
corresponding operations occurred on the server. | which the corresponding operations occurred on the server. | |||
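
The ordering requirement above can be illustrated with a small sketch of a client applying notifications to a cached set of names. The function name and the (type, name) tuple representation are hypothetical; real notifications carry cookies and attributes as well.

```python
def apply_notifications(cached_names, notifications):
    """Apply (notify_type, name) pairs, in the order given, to a set of names."""
    names = set(cached_names)
    for notify_type, name in notifications:
        if notify_type == "NOTIFY4_ADD_ENTRY":
            names.add(name)
        elif notify_type == "NOTIFY4_REMOVE_ENTRY":
            names.discard(name)
    return names


# Server order: "foo" is added, then removed.
in_order = [("NOTIFY4_ADD_ENTRY", "foo"), ("NOTIFY4_REMOVE_ENTRY", "foo")]
# The same notifications delivered in the opposite order.
reordered = list(reversed(in_order))
```

Applied in server order, the cache ends without "foo"; applied in the opposite order, the cache wrongly retains "foo". This is why a server that splits notifications across several CB_COMPOUND requests sends them serially.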
If the client holding the delegation makes any changes in the | If the client holding the delegation makes any changes in the | |||
directory that cause files or sub directories to be added or removed, | directory that cause files or sub-directories to be added or removed, | |||
the server will notify that client of the resulting change(s). If | the server will notify that client of the resulting change(s). If | |||
the client holding the delegation is making attribute or cookie | the client holding the delegation is making attribute or cookie | |||
verifier changes only, the server does not need to send notifications | verifier changes only, the server does not need to send notifications | |||
to that client. The server will send the following information for | to that client. The server will send the following information for | |||
each operation: | each operation: | |||
NOTIFY4_ADD_ENTRY | NOTIFY4_ADD_ENTRY | |||
The server will send information about the new directory entry | The server will send information about the new directory entry | |||
being created along with the cookie for that entry. The entry | being created along with the cookie for that entry. The entry | |||
information (data type notify_add4) includes the component name of | information (data type notify_add4) includes the component name of | |||
the entry and attributes. The server will send this type of entry | the entry and attributes. The server will send this type of entry | |||
when a file is actually being created, when an entry is being | when a file is actually being created, when an entry is being | |||
added to a directory as a result of a rename across directories | added to a directory as a result of a rename across directories | |||
(see below), and when a hard link is being created to an existing | (see below), and when a hard link is being created to an existing | |||
file. If this entry is added to the end of the directory, the | file. If this entry is added to the end of the directory, the | |||
server will set the nad_last_entry flag to TRUE. If the file is | server will set the nad_last_entry flag to TRUE. If the file is | |||
added such that there is at least one entry before it, the server | added such that there is at least one entry before it, the server | |||
will also return the previous entry information (nad_prev_entry, a | will also return the previous entry information (nad_prev_entry, a | |||
variable length array of up to one element. If the array is of | variable-length array of up to one element. If the array is of | |||
zero length, there is no previous entry), along with its cookie. | zero length, there is no previous entry), along with its cookie. | |||
This is to help clients find the right location in their file name | This is to help clients find the right location in their file name | |||
caches and directory caches where this entry should be cached. If | caches and directory caches where this entry should be cached. If | |||
the new entry's cookie is available, it will be in the | the new entry's cookie is available, it will be in the | |||
nad_new_entry_cookie (another variable length array of up to one | nad_new_entry_cookie (another variable-length array of up to one | |||
element) field. If the addition of the entry causes another entry | element) field. If the addition of the entry causes another entry | |||
to be deleted (which can only happen in the rename case) | to be deleted (which can only happen in the rename case) | |||
atomically with the addition, then information on this entry is | atomically with the addition, then information on this entry is | |||
reported in nad_old_entry. | reported in nad_old_entry. | |||
NOTIFY4_REMOVE_ENTRY | NOTIFY4_REMOVE_ENTRY | |||
The server will send information about the directory entry being | The server will send information about the directory entry being | |||
deleted. The server will also send the cookie value for the | deleted. The server will also send the cookie value for the | |||
deleted entry so that clients can get to the cached information | deleted entry so that clients can get to the cached information | |||
for this entry. | for this entry. | |||
NOTIFY4_RENAME_ENTRY | NOTIFY4_RENAME_ENTRY | |||
The server will send information about both the old entry and the | The server will send information about both the old entry and the | |||
new entry. This includes name and attributes for each entry. In | new entry. This includes the name and attributes for each entry. | |||
addition, if the rename causes the deletion of an entry (i.e. the | In addition, if the rename causes the deletion of an entry (i.e., | |||
case of a file renamed over) then this is reported in | the case of a file renamed over), then this is reported in | |||
nrn_new_entry.nad_old_entry. This notification is only sent | nrn_new_entry.nad_old_entry. This notification is only sent | |||
if both entries are in the same directory. If the rename is | if both entries are in the same directory. If the rename is | |||
across directories, the server will send a remove notification to | across directories, the server will send a remove notification to | |||
one directory and an add notification to the other directory, | one directory and an add notification to the other directory, | |||
assuming both have a directory delegation. | assuming both have a directory delegation. | |||
NOTIFY4_CHANGE_CHILD_ATTRS/NOTIFY4_CHANGE_DIR_ATTRS | NOTIFY4_CHANGE_CHILD_ATTRS/NOTIFY4_CHANGE_DIR_ATTRS | |||
The client will use the attribute mask to inform the server of | The client will use the attribute mask to inform the server of | |||
attributes for which it wants to receive notifications. This | attributes for which it wants to receive notifications. This | |||
change notification can be requested for both changes to the | change notification can be requested for changes to the attributes | |||
attributes of the directory as well as changes to any file's | of the directory as well as changes to any file's attributes in | |||
attributes in the directory by using two separate attribute masks. | the directory by using two separate attribute masks. The client | |||
The client cannot ask for change attribute notification for a | cannot ask for change attribute notification for a specific file. | |||
specific file. One attribute mask covers all the files in the | One attribute mask covers all the files in the directory. Upon | |||
directory. Upon any attribute change, the server will send back | any attribute change, the server will send back the values of | |||
the values of changed attributes. Notifications might not make | changed attributes. Notifications might not make sense for some | |||
sense for some file system wide attributes and it is up to the | file system-wide attributes, and it is up to the server to decide | |||
server to decide which subset it wants to support. The client can | which subset it wants to support. The client can negotiate the | |||
negotiate the frequency of attribute notifications by letting the | frequency of attribute notifications by letting the server know | |||
server know how often it wants to be notified of an attribute | how often it wants to be notified of an attribute change. The | |||
change. The server will return supported notification frequencies | server will return supported notification frequencies or an | |||
or an indication that no notification is permitted for directory | indication that no notification is permitted for directory or | |||
or child attributes by setting the dir_notif_delay and | child attributes by setting the dir_notif_delay and | |||
dir_entry_notif_delay attributes respectively. | dir_entry_notif_delay attributes, respectively. | |||
NOTIFY4_CHANGE_COOKIE_VERIFIER | NOTIFY4_CHANGE_COOKIE_VERIFIER | |||
If the cookie verifier changes while a client is holding a | If the cookie verifier changes while a client is holding a | |||
delegation, the server will notify the client so that it can | delegation, the server will notify the client so that it can | |||
invalidate its cookies and re-send a READDIR to get the new set of | invalidate its cookies and re-send a READDIR to get the new set of | |||
cookies. | cookies. | |||
20.5. Operation 7: CB_PUSH_DELEG - Offer Previously Requested | 20.5. Operation 7: CB_PUSH_DELEG - Offer Previously Requested | |||
Delegation to Client | Delegation to Client | |||
skipping to change at page 586, line 30 | skipping to change at page 589, line 30 | |||
}; | }; | |||
20.5.2. RESULT | 20.5.2. RESULT | |||
struct CB_PUSH_DELEG4res { | struct CB_PUSH_DELEG4res { | |||
nfsstat4 cpdr_status; | nfsstat4 cpdr_status; | |||
}; | }; | |||
20.5.3. DESCRIPTION | 20.5.3. DESCRIPTION | |||
CB_PUSH_DELEG is used by the server to both signal to the client that | CB_PUSH_DELEG is used by the server both to signal to the client that | |||
the delegation it wants (previously indicated via a want established | the delegation it wants (previously indicated via a want established | |||
from an OPEN or WANT_DELEGATION operation) is available and to | from an OPEN or WANT_DELEGATION operation) is available and to | |||
simultaneously offer the delegation to the client. The client has | simultaneously offer the delegation to the client. The client has | |||
the choice of accepting the delegation by returning NFS4_OK to the | the choice of accepting the delegation by returning NFS4_OK to the | |||
server, delaying the decision to accept the offered delegation by | server, delaying the decision to accept the offered delegation by | |||
returning NFS4ERR_DELAY or permanently rejecting the offer of the | returning NFS4ERR_DELAY, or permanently rejecting the offer of the | |||
delegation by returning NFS4ERR_REJECT_DELEG. When a delegation is | delegation by returning NFS4ERR_REJECT_DELEG. When a delegation is | |||
rejected in this fashion, the want previously established is | rejected in this fashion, the want previously established is | |||
permanently deleted and the delegation is subject to acquisition by | permanently deleted and the delegation is subject to acquisition by | |||
another client. | another client. | |||
20.5.4. IMPLEMENTATION | 20.5.4. IMPLEMENTATION | |||
If the client does return NFS4ERR_DELAY and there is a conflicting | If the client does return NFS4ERR_DELAY and there is a conflicting | |||
delegation request, the server MAY process it at the expense of the | delegation request, the server MAY process it at the expense of the | |||
client that returned NFS4ERR_DELAY. The client's want will not be | client that returned NFS4ERR_DELAY. The client's want will not be | |||
cancelled, but MAY processed behind other delegation requests or | cancelled, but MAY be processed behind other delegation requests or | |||
registered wants. | registered wants. | |||
When a client returns a status other than NFS4_OK, NFS4ERR_DELAY, or | When a client returns a status other than NFS4_OK, NFS4ERR_DELAY, or | |||
NFS4ERR_REJECT_DELEG, the want remains pending, although servers may | NFS4ERR_REJECT_DELEG, the want remains pending, although servers may | |||
decide to cancel the want by sending a CB_WANTS_CANCELLED. | decide to cancel the want by sending a CB_WANTS_CANCELLED. | |||
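
The effect of each client reply to CB_PUSH_DELEG on the previously established want can be summarized in a short sketch. The function name and the returned state labels are hypothetical and used only for illustration; status codes are written as strings rather than their numeric values.

```python
def want_state_after_push_reply(status):
    """Map the client's CB_PUSH_DELEG reply status to the fate of its want."""
    if status == "NFS4_OK":
        # The client accepted the offered delegation.
        return "satisfied"
    if status == "NFS4ERR_REJECT_DELEG":
        # Permanent rejection: the want is deleted and the delegation
        # may be acquired by another client.
        return "deleted"
    if status == "NFS4ERR_DELAY":
        # The want is not cancelled, but the server MAY process it
        # behind other delegation requests or registered wants.
        return "pending_may_be_deferred"
    # Any other status: the want remains pending, although the server
    # may later cancel it with CB_WANTS_CANCELLED.
    return "pending"
```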
20.6. Operation 8: CB_RECALL_ANY - Keep Any N Recallable Objects | 20.6. Operation 8: CB_RECALL_ANY - Keep Any N Recallable Objects | |||
20.6.1. ARGUMENT | 20.6.1. ARGUMENT | |||
skipping to change at page 587, line 36 | skipping to change at page 590, line 36 | |||
20.6.2. RESULT | 20.6.2. RESULT | |||
struct CB_RECALL_ANY4res { | struct CB_RECALL_ANY4res { | |||
nfsstat4 crar_status; | nfsstat4 crar_status; | |||
}; | }; | |||
20.6.3. DESCRIPTION | 20.6.3. DESCRIPTION | |||
The server may decide that it cannot hold all of the state for | The server may decide that it cannot hold all of the state for | |||
recallable objects, such as delegations and layouts, without running | recallable objects, such as delegations and layouts, without running | |||
out of resources. In such a case, it is free to recall individual | out of resources. In such a case, while not optimal, the server is | |||
objects to reduce the load but this would be far from optimal. | free to recall individual objects to reduce the load. | |||
Because the general purpose of such recallable objects as delegations | Because the general purpose of such recallable objects as delegations | |||
is to eliminate client interaction with the server, the server cannot | is to eliminate client interaction with the server, the server cannot | |||
interpret lack of recent use as indicating that the object is no | interpret lack of recent use as indicating that the object is no | |||
longer useful. The absence of visible use may be the result of a | longer useful. The absence of visible use is consistent with a | |||
large number of potential operations eliminated. In the case of | delegation keeping potential operations from being sent to the | |||
layouts, the layout will be used explicitly but the metadata server | server. In the case of layouts, while it is true that the usefulness | |||
does not have direct knowledge of such use. | of a layout is indicated by the use of the layout when storage | |||
devices receive I/O requests, because there is no mandate that a | ||||
storage device indicate to the metadata server any past or present | ||||
use of a layout, the metadata server is not likely to know which | ||||
layouts are good candidates to recall in response to low resources. | ||||
In order to implement an effective reclaim scheme for such objects, | In order to implement an effective reclaim scheme for such objects, | |||
the server's knowledge of available resources must be used to | the server's knowledge of available resources must be used to | |||
determine when objects must be recalled with the clients selecting | determine when objects must be recalled with the clients selecting | |||
the actual objects to be returned. | the actual objects to be returned. | |||
Server implementations may differ in their resource allocation | Server implementations may differ in their resource allocation | |||
requirements. For example, one server may share resources among all | requirements. For example, one server may share resources among all | |||
classes of recallable objects whereas another may use separate | classes of recallable objects, whereas another may use separate | |||
resource pools for layouts and for delegations, or further separate | resource pools for layouts and for delegations, or further separate | |||
resources by types of delegations. | resources by types of delegations. | |||
When a given resource pool is over-utilized, the server can send a | When a given resource pool is over-utilized, the server can send a | |||
CB_RECALL_ANY to clients holding recallable objects of the types | CB_RECALL_ANY to clients holding recallable objects of the types | |||
involved, allowing it to keep a certain number of such objects and | involved, allowing it to keep a certain number of such objects and | |||
return any excess. A mask specifies which types of objects are to be | return any excess. A mask specifies which types of objects are to be | |||
limited. The client chooses, based on its own knowledge of current | limited. The client chooses, based on its own knowledge of current | |||
usefulness, which of the objects in that class should be returned. | usefulness, which of the objects in that class should be returned. | |||
A number of bits are defined. For some of these, ranges are defined | A number of bits are defined. For some of these, ranges are defined | |||
and it is up to the definition of the storage protocol to specify how | and it is up to the definition of the storage protocol to specify how | |||
these are to be used. There are ranges reserved for object-based | these are to be used. There are ranges reserved for object-based | |||
storage protocols and for other experimental storage protocols. An | storage protocols and for other experimental storage protocols. An | |||
RFC defining such a storage protocol needs to specify how particular | RFC defining such a storage protocol needs to specify how particular | |||
bits within its range are to be used. For example, it may specify a | bits within its range are to be used. For example, it may specify a | |||
mapping between attributes of the layout (read vs. write, size of | mapping between attributes of the layout (read vs. write, size of | |||
area) and the bit to be used or it may define a field in the layout | area) and the bit to be used, or it may define a field in the layout | |||
where the associated bit position is made available by the server to | where the associated bit position is made available by the server to | |||
the client. | the client. | |||
RCA4_TYPE_MASK_RDATA_DLG | RCA4_TYPE_MASK_RDATA_DLG | |||
The client is to return read delegations on non-directory file | The client is to return OPEN_DELEGATE_READ delegations on non- | |||
objects. | directory file objects. | |||
RCA4_TYPE_MASK_WDATA_DLG | RCA4_TYPE_MASK_WDATA_DLG | |||
The client is to return write delegations on regular file objects. | The client is to return OPEN_DELEGATE_WRITE delegations on regular | |||
file objects. | ||||
RCA4_TYPE_MASK_DIR_DLG | RCA4_TYPE_MASK_DIR_DLG | |||
The client is to return directory delegations. | The client is to return directory delegations. | |||
RCA4_TYPE_MASK_FILE_LAYOUT | RCA4_TYPE_MASK_FILE_LAYOUT | |||
The client is to return layouts of type LAYOUT4_NFSV4_1_FILES. | The client is to return layouts of type LAYOUT4_NFSV4_1_FILES. | |||
RCA4_TYPE_MASK_BLK_LAYOUT | RCA4_TYPE_MASK_BLK_LAYOUT | |||
See [41] for a description. | See [41] for a description. | |||
RCA4_TYPE_MASK_OBJ_LAYOUT_MIN to RCA4_TYPE_MASK_OBJ_LAYOUT_MAX | RCA4_TYPE_MASK_OBJ_LAYOUT_MIN to RCA4_TYPE_MASK_OBJ_LAYOUT_MAX | |||
See [40] for a description. | See [40] for a description. | |||
RCA4_TYPE_MASK_OTHER_LAYOUT_MIN to RCA4_TYPE_MASK_OTHER_LAYOUT_MAX | RCA4_TYPE_MASK_OTHER_LAYOUT_MIN to RCA4_TYPE_MASK_OTHER_LAYOUT_MAX | |||
This range is reserved for telling the client to recall layouts of | This range is reserved for telling the client to recall layouts of | |||
experimental or site specific layout types (see Section 3.3.13). | experimental or site-specific layout types (see Section 3.3.13). | |||
When a bit is set in the type mask that corresponds to an undefined | When a bit is set in the type mask that corresponds to an undefined | |||
type of recallable object, NFS4ERR_INVAL MUST be returned. When a | type of recallable object, NFS4ERR_INVAL MUST be returned. When a | |||
bit is set that corresponds to a defined type of object, but the | bit is set that corresponds to a defined type of object but the | |||
client does not support an object of the type, NFS4ERR_INVAL MUST NOT | client does not support an object of the type, NFS4ERR_INVAL MUST NOT | |||
be returned. Future minor versions of NFSv4 may expand the set of | be returned. Future minor versions of NFSv4 may expand the set of | |||
valid type mask bits. | valid type mask bits. | |||
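
A client-side check of the type mask along the lines of the preceding paragraph might look like the sketch below. The bit positions are assumed to be those given in the ARGUMENT definition for this operation, which is not reproduced in this excerpt, so they should be verified against it; the function name and return shape are hypothetical.

```python
# Assumed bit positions (verify against the ARGUMENT definition):
# 0 RDATA_DLG, 1 WDATA_DLG, 2 DIR_DLG, 3 FILE_LAYOUT, 4 BLK_LAYOUT,
# 8-9 OBJ_LAYOUT_MIN..MAX, 12-15 OTHER_LAYOUT_MIN..MAX.
DEFINED_BITS = frozenset([0, 1, 2, 3, 4, 8, 9, 12, 13, 14, 15])


def check_type_mask(type_mask, supported_bits):
    """Return (status, bits_to_act_on) for a CB_RECALL_ANY type mask."""
    set_bits = {b for b in range(type_mask.bit_length())
                if (type_mask >> b) & 1}
    if set_bits - DEFINED_BITS:
        # A bit for an undefined type of recallable object is set.
        return "NFS4ERR_INVAL", set()
    # Defined but unsupported bits are not an error; they are ignored.
    return "NFS4_OK", set_bits & set(supported_bits)
```

In this sketch a client that supports only read delegations (bit 0) and receives a mask that also names file layouts (bit 3) returns NFS4_OK and acts only on bit 0.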
CB_RECALL_ANY specifies a count of objects that the client may keep | CB_RECALL_ANY specifies a count of objects that the client may keep | |||
as opposed to a count that the client must return. This is to avoid | as opposed to a count that the client must return. This is to avoid | |||
potential race between a CB_RECALL_ANY that had a count of objects to | a potential race between a CB_RECALL_ANY that had a count of objects | |||
free with a set of client-originated operations to return layouts or | to free with a set of client-originated operations to return layouts | |||
delegations. As a result of the race, the client and server would | or delegations. As a result of the race, the client and server would | |||
have differing ideas as to how many objects to return. Hence the | have differing ideas as to how many objects to return. Hence, the | |||
client could mistakenly free too many. | client could mistakenly free too many. | |||
If resource demands prompt it, the server may send another | If resource demands prompt it, the server may send another | |||
CB_RECALL_ANY with a lower count, even it has not yet received an | CB_RECALL_ANY with a lower count, even if it has not yet received an | |||
acknowledgement from the client for a previous CB_RECALL_ANY with the | acknowledgment from the client for a previous CB_RECALL_ANY with the | |||
same type mask. Although the possibility exists that these will be | same type mask. Although the possibility exists that these will be | |||
received by the client in a order different from the order in which | received by the client in an order different from the order in which | |||
they were sent, any such permutation of the callback stream is | they were sent, any such permutation of the callback stream is | |||
harmless. It is the job of the client to bring down the size of the | harmless. It is the job of the client to bring down the size of the | |||
recallable object set in line with each CB_RECALL_ANY received and | recallable object set in line with each CB_RECALL_ANY received, and | |||
until that obligation is met it cannot be cancelled or modified by | until that obligation is met, it cannot be cancelled or modified by | |||
any subsequent CB_RECALL_ANY for the same type mask. Thus if the | any subsequent CB_RECALL_ANY for the same type mask. Thus, if the | |||
server sends two CB_RECALL_ANY's, the effect will be the same as if | server sends two CB_RECALL_ANYs, the effect will be the same as if | |||
the lower count was sent, whatever the order of recall receipt. Note | the lower count was sent, whatever the order of recall receipt. Note | |||
that this means that a server may not cancel the effect of a | that this means that a server may not cancel the effect of a | |||
CB_RECALL_ANY by sending another recall with a higher count. When a | CB_RECALL_ANY by sending another recall with a higher count. When a | |||
CB_RECALL_ANY is received and the count is already within the limit | CB_RECALL_ANY is received and the count is already within the limit | |||
set or is above a limit that the client is working to get down to, | set or is above a limit that the client is working to get down to, | |||
that callback has no effect. | that callback has no effect. | |||
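
A minimal Python sketch of the client-side bookkeeping implied by the paragraph above, showing why the order in which two recalls arrive does not matter. The class and method names are hypothetical, and the sketch tracks one pending limit per type mask as a simplifying assumption.

```python
class RecallTargets:
    """Tracks, per type mask, the lowest keep-count the client still owes."""

    def __init__(self):
        self._limit = {}  # type mask -> pending objects-to-keep limit

    def on_recall_any(self, type_mask, objects_to_keep, currently_held):
        """Return the limit now in force, or None if the callback has no effect."""
        pending = self._limit.get(type_mask)
        if pending is None:
            if currently_held <= objects_to_keep:
                return None            # already within the limit
            self._limit[type_mask] = objects_to_keep
            return objects_to_keep
        if objects_to_keep >= pending:
            return pending             # cannot cancel or raise a pending limit
        self._limit[type_mask] = objects_to_keep
        return objects_to_keep

    def on_objects_returned(self, type_mask, currently_held):
        """Clear the obligation once the client is down to the pending limit."""
        pending = self._limit.get(type_mask)
        if pending is not None and currently_held <= pending:
            del self._limit[type_mask]
```

Feeding the counts 10 and 5 in either order leaves the client working toward 5, which matches the statement that the effect is the same as if only the lower count had been sent.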
Servers are generally free not to give out recallable objects when | Servers are generally free to deny recallable objects when | |||
insufficient resources are available. Note that the effect of such a | insufficient resources are available. Note that the effect of such a | |||
policy is implicitly to give precedence to existing objects relative | policy is implicitly to give precedence to existing objects relative | |||
to requested ones, with the result that resources might not be | to requested ones, with the result that resources might not be | |||
optimally used. To prevent this, servers are well advised to make | optimally used. To prevent this, servers are well advised to make | |||
the point at which they start sending CB_RECALL_ANY callbacks | the point at which they start sending CB_RECALL_ANY callbacks | |||
somewhat below that at which they cease to give out new delegations | somewhat below that at which they cease to give out new delegations | |||
and layouts. This allows the client to purge its less-used objects | and layouts. This allows the client to purge its less-used objects | |||
whenever appropriate and so continue to have its subsequent requests | whenever appropriate and so continue to have its subsequent requests | |||
given new resources freed up by object returns. | given new resources freed up by object returns. | |||
20.6.4. IMPLEMENTATION | 20.6.4. IMPLEMENTATION | |||
The client can choose to return any type of object specified by the | The client can choose to return any type of object specified by the | |||
mask. If a server wishes to limit use of objects of a specific type, | mask. If a server wishes to limit the use of objects of a specific | |||
it should only specify that type in the mask sent. The client may | type, it should only specify that type in the mask it sends. Should | |||
not return requested objects and it is up to the server to handle | the client fail to return requested objects, it is up to the server | |||
this situation, typically by doing specific recalls to properly limit | to handle this situation, typically by sending specific recalls | |||
resource usage. The server should give the client enough time to | (i.e., sending CB_RECALL operations) to properly limit resource | |||
return objects before proceeding to specific recalls. This time | usage. The server should give the client enough time to return | |||
should not be less than the lease period. | objects before proceeding to specific recalls. This time should not | |||
be less than the lease period. | ||||
20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal Resources for | 20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal Resources for | |||
Recallable Objects | Recallable Objects | |||
20.7.1. ARGUMENT | 20.7.1. ARGUMENT | |||
typedef CB_RECALL_ANY4args CB_RECALLABLE_OBJ_AVAIL4args; | typedef CB_RECALL_ANY4args CB_RECALLABLE_OBJ_AVAIL4args; | |||
20.7.2. RESULT | 20.7.2. RESULT | |||
skipping to change at page 591, line 11 | skipping to change at page 594, line 16 | |||
can have runs the risk of having objects recalled. | can have runs the risk of having objects recalled. | |||
The server is not obligated to reserve the difference between the | The server is not obligated to reserve the difference between the | |||
number of the objects the client currently has and the value of | number of the objects the client currently has and the value of | |||
craa_objects_to_keep, nor does delaying the reply to | craa_objects_to_keep, nor does delaying the reply to | |||
CB_RECALLABLE_OBJ_AVAIL prevent the server from using the resources | CB_RECALLABLE_OBJ_AVAIL prevent the server from using the resources | |||
of the recallable objects for another purpose. Indeed, if a client | of the recallable objects for another purpose. Indeed, if a client | |||
responds slowly to CB_RECALLABLE_OBJ_AVAIL, the server might | responds slowly to CB_RECALLABLE_OBJ_AVAIL, the server might | |||
interpret the client as having reduced capability to manage | interpret the client as having reduced capability to manage | |||
recallable objects, and so cancel or reduce any reservation it is | recallable objects, and so cancel or reduce any reservation it is | |||
maintaining on behalf of the client. Thus if the client desires to | maintaining on behalf of the client. Thus, if the client desires to | |||
acquire more recallable objects, it needs to reply quickly to | acquire more recallable objects, it needs to reply quickly to | |||
CB_RECALLABLE_OBJ_AVAIL, and then send the appropriate operations to | CB_RECALLABLE_OBJ_AVAIL, and then send the appropriate operations to | |||
acquire recallable objects. | acquire recallable objects. | |||
20.8. Operation 10: CB_RECALL_SLOT - Change Flow Control Limits | 20.8. Operation 10: CB_RECALL_SLOT - Change Flow Control Limits | |||
20.8.1. ARGUMENT | 20.8.1. ARGUMENT | |||
struct CB_RECALL_SLOT4args { | struct CB_RECALL_SLOT4args { | |||
slotid4 rsa_target_highest_slotid; | slotid4 rsa_target_highest_slotid; | |||
skipping to change at page 591, line 33 | skipping to change at page 594, line 38 | |||
20.8.2. RESULT | 20.8.2. RESULT | |||
struct CB_RECALL_SLOT4res { | struct CB_RECALL_SLOT4res { | |||
nfsstat4 rsr_status; | nfsstat4 rsr_status; | |||
}; | }; | |||
20.8.3. DESCRIPTION | 20.8.3. DESCRIPTION | |||
The CB_RECALL_SLOT operation requests the client to return session | The CB_RECALL_SLOT operation requests the client to return session | |||
slots, and if applicable, transport credits (e.g. RDMA credits for | slots, and if applicable, transport credits (e.g., RDMA credits for | |||
connections associated with the operations channel) of the session's | connections associated with the operations channel) of the session's | |||
fore channel. CB_RECALL_SLOT specifies rsa_target_highest_slotid, | fore channel. CB_RECALL_SLOT specifies rsa_target_highest_slotid, | |||
the value of the target highest slot ID the server wants for the | the value of the target highest slot ID the server wants for the | |||
session. The client MUST then progress toward reducing the session's | session. The client MUST then progress toward reducing the session's | |||
highest slot ID to the target value. | highest slot ID to the target value. | |||
If the session has only non-RDMA connections associated with its | If the session has only non-RDMA connections associated with its | |||
operations channel, then the client need only wait for all | operations channel, then the client need only wait for all | |||
outstanding requests with a slot ID > rsa_target_highest_slotid to | outstanding requests with a slot ID > rsa_target_highest_slotid to | |||
complete, then send a single COMPOUND consisting of a single SEQUENCE | complete, then send a single COMPOUND consisting of a single SEQUENCE | |||
operation, with the sa_highest_slotid field set to | operation, with the sa_highest_slotid field set to | |||
rsa_target_highest_slotid. If there are RDMA-based connections | rsa_target_highest_slotid. If there are RDMA-based connections | |||
associated with the operations channel, then the client also needs to send | associated with the operations channel, then the client also needs to send | |||
enough zero-length RDMA Sends to take the total RDMA credit count to | enough zero-length "RDMA Send" messages to take the total RDMA credit | |||
rsa_target_highest_slotid + 1 or below. | count to rsa_target_highest_slotid + 1 or below. | |||
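
The client procedure described above can be outlined in Python. This is a sketch under stated assumptions: the function name, the dictionary keys, and the rdma flag are illustrative, and the code only computes what to wait for and what to announce. It does not model how RDMA credits are actually given back on the transport.

```python
def plan_slot_recall(outstanding_slot_ids, rsa_target_highest_slotid, rdma=False):
    """Work out the client's steps after receiving CB_RECALL_SLOT."""
    # Requests on slots above the target must complete before the client replies
    # with the reduced highest slot ID.
    wait_for = sorted(
        s for s in outstanding_slot_ids if s > rsa_target_highest_slotid
    )
    return {
        "wait_for_slots": wait_for,
        # Value for the highest-slot field of the lone SEQUENCE in the COMPOUND.
        "sequence_highest_slot": rsa_target_highest_slotid,
        # With RDMA, the total credit count must end at or below target + 1.
        "rdma_credit_ceiling": rsa_target_highest_slotid + 1 if rdma else None,
    }
```

For example, with requests outstanding on slots 0, 3, 7, and 9 and a target of 4, the client waits for slots 7 and 9 to complete before sending the SEQUENCE.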
20.8.4. IMPLEMENTATION | 20.8.4. IMPLEMENTATION | |||
If the client fails to reduce the highest slot it has on the fore channel | If the client fails to reduce the highest slot it has on the fore channel | |||
to what the server requests, the server can force the issue by | to what the server requests, the server can force the issue by | |||
asserting flow control on the receive side of all connections bound | asserting flow control on the receive side of all connections bound | |||
to the fore channel, and then finish servicing all outstanding | to the fore channel, and then finish servicing all outstanding | |||
requests that are in slots greater than rsa_target_highest_slotid. | requests that are in slots greater than rsa_target_highest_slotid. | |||
Once that is done, the server can then open the flow control, and any | Once that is done, the server can then open the flow control, and any | |||
time the client sends a new request on a slot greater than | time the client sends a new request on a slot greater than | |||
skipping to change at page 593, line 29 | skipping to change at page 596, line 29 | |||
void; | void; | |||
}; | }; | |||
20.9.3. DESCRIPTION | 20.9.3. DESCRIPTION | |||
The CB_SEQUENCE operation is used to manage operational accounting | The CB_SEQUENCE operation is used to manage operational accounting | |||
for the backchannel of the session on which a request is sent. The | for the backchannel of the session on which a request is sent. The | |||
contents include the session ID to which this request belongs, the | contents include the session ID to which this request belongs, the | |||
slot ID and sequence ID used by the server to implement session | slot ID and sequence ID used by the server to implement session | |||
request control and exactly once semantics, and exchanged slot ID | request control and exactly once semantics, and exchanged slot ID | |||
maxima which are used to adjust the size of the reply cache. This | maxima that are used to adjust the size of the reply cache. In each | |||
operation will appear once as the first operation in each CB_COMPOUND | CB_COMPOUND request, CB_SEQUENCE MUST appear once and MUST be the | |||
request or a protocol error MUST result. See Section 18.46.3 for a | first operation. The error NFS4ERR_SEQUENCE_POS MUST be returned | |||
description of how slots are processed. | when CB_SEQUENCE is found in any position in a CB_COMPOUND beyond the | |||
first. If any other operation is in the first position of | ||||
CB_COMPOUND, NFS4ERR_OP_NOT_IN_SESSION MUST be returned. | ||||
See Section 18.46.3 for a description of how slots are processed. | ||||
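
The positional rules in the revised text can be expressed as a short Python check. This is a sketch: operations are represented by their names as strings, the function name is hypothetical, and the treatment of an empty request is an assumption because the text above does not cover it.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_SEQUENCE_POS = "NFS4ERR_SEQUENCE_POS"
NFS4ERR_OP_NOT_IN_SESSION = "NFS4ERR_OP_NOT_IN_SESSION"


def check_cb_compound(op_names):
    """Validate where CB_SEQUENCE sits in a CB_COMPOUND's operation list."""
    if not op_names:
        return NFS4_OK  # assumption: empty requests are outside this rule
    if op_names[0] != "CB_SEQUENCE":
        # Some other operation occupies the first position.
        return NFS4ERR_OP_NOT_IN_SESSION
    if "CB_SEQUENCE" in op_names[1:]:
        # CB_SEQUENCE appears again beyond the first position.
        return NFS4ERR_SEQUENCE_POS
    return NFS4_OK
```

The first-position check runs first because operations are evaluated in order, so a request that begins with another operation fails before any later CB_SEQUENCE is reached.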
If csa_cachethis is TRUE, then the server is requesting that the | If csa_cachethis is TRUE, then the server is requesting that the | |||
client cache the reply in the callback reply cache. The client MUST | client cache the reply in the callback reply cache. The client MUST | |||
cache the reply (see Section 2.10.6.1.3). | cache the reply (see Section 2.10.6.1.3). | |||
The csa_referring_call_lists array is the list of COMPOUND requests, | The csa_referring_call_lists array is the list of COMPOUND requests, | |||
identified by session ID, slot ID and sequence ID. These are | identified by session ID, slot ID, and sequence ID. These are | |||
requests that the client previously sent to the server. These | requests that the client previously sent to the server. These | |||
previous requests created state that some operation(s) in the same | previous requests created state that some operation(s) in the same | |||
CB_COMPOUND as the csa_referring_call_lists are identifying. A | CB_COMPOUND as the csa_referring_call_lists are identifying. A | |||
session ID is included because leased state is tied to a client ID, | session ID is included because leased state is tied to a client ID, | |||
and a client ID can have multiple sessions. See Section 2.10.6.3. | and a client ID can have multiple sessions. See Section 2.10.6.3. | |||
The value of the csa_sequenceid argument relative to the cached | The value of the csa_sequenceid argument relative to the cached | |||
sequence ID on the slot falls into one of three cases. | sequence ID on the slot falls into one of three cases. | |||
o If the difference between csa_sequenceid and the client's cached | o If the difference between csa_sequenceid and the client's cached | |||
skipping to change at page 594, line 21 | skipping to change at page 597, line 25 | |||
o If csa_sequenceid is one greater (accounting for wraparound) than | o If csa_sequenceid is one greater (accounting for wraparound) than | |||
the cached sequence ID, then this is a new request, and the slot's | the cached sequence ID, then this is a new request, and the slot's | |||
sequence ID is incremented. The operations subsequent to | sequence ID is incremented. The operations subsequent to | |||
CB_SEQUENCE, if any, are processed. If there are no other | CB_SEQUENCE, if any, are processed. If there are no other | |||
operations, the only other effects are to cache the CB_SEQUENCE | operations, the only other effects are to cache the CB_SEQUENCE | |||
reply in the slot, maintain the session's activity, and when the | reply in the slot, maintain the session's activity, and when the | |||
server receives the CB_SEQUENCE reply, renew the lease of state | server receives the CB_SEQUENCE reply, renew the lease of state | |||
related to the client ID. | related to the client ID. | |||
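
The three-way comparison can be sketched in Python. Only the "one greater" case is fully visible in the text shown here; the sketch assumes that an equal value is a retry and that every other value is misordered, and it assumes a 32-bit sequence ID for the wraparound arithmetic. The function name and the returned labels are illustrative.

```python
SEQ_MOD = 2 ** 32  # assumption: sequence IDs are unsigned 32-bit values


def classify_cb_sequence(csa_sequenceid, cached_sequenceid):
    """Classify a CB_SEQUENCE against the slot's cached sequence ID."""
    delta = (csa_sequenceid - cached_sequenceid) % SEQ_MOD
    if delta == 0:
        return "retry"        # assumed case: replay the cached reply
    if delta == 1:
        return "new"          # new request: increment the slot's sequence ID
    return "misordered"       # assumed case: anything else is out of order
```

Taking the difference modulo 2^32 handles wraparound directly: a cached value of 4294967295 followed by 0 is classified as a new request.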
If the server reuses a slot ID and sequence ID for a completely | If the server reuses a slot ID and sequence ID for a completely | |||
different request, the client MAY treat the request as if it is retry | different request, the client MAY treat the request as if it is a | |||
of what it has already executed. The client MAY however detect the | retry of what it has already executed. The client MAY however detect | |||
server's illegal reuse and return NFS4ERR_SEQ_FALSE_RETRY. | the server's illegal reuse and return NFS4ERR_SEQ_FALSE_RETRY. | |||
If CB_SEQUENCE returns an error, then the state of the slot (sequence | If CB_SEQUENCE returns an error, then the state of the slot (sequence | |||
ID, cached reply) MUST NOT change. See Section 2.10.6.1.3 for the | ID, cached reply) MUST NOT change. See Section 2.10.6.1.3 for the | |||
conditions when the error NFS4ERR_RETRY_UNCACHED_REP might be | conditions when the error NFS4ERR_RETRY_UNCACHED_REP might be | |||
returned. | returned. | |||
The client returns two "highest_slotid" values: csr_highest_slotid, | The client returns two "highest_slotid" values: csr_highest_slotid | |||
and csr_target_highest_slotid. The former is the highest slot ID the | and csr_target_highest_slotid. The former is the highest slot ID the | |||
client will accept in a future CB_SEQUENCE operation, and SHOULD NOT | client will accept in a future CB_SEQUENCE operation, and SHOULD NOT | |||
be less than the value of csa_highest_slotid (but see | be less than the value of csa_highest_slotid (but see | |||
Section 2.10.6.1 for an exception). The latter is the highest slot | Section 2.10.6.1 for an exception). The latter is the highest slot | |||
ID the client would prefer the server use on a future CB_SEQUENCE | ID the client would prefer the server use on a future CB_SEQUENCE | |||
operation. | operation. | |||
20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending Delegation | 20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending Delegation | |||
Wants | Wants | |||
skipping to change at page 595, line 14 | skipping to change at page 598, line 14 | |||
20.10.2. RESULT | 20.10.2. RESULT | |||
struct CB_WANTS_CANCELLED4res { | struct CB_WANTS_CANCELLED4res { | |||
nfsstat4 cwcr_status; | nfsstat4 cwcr_status; | |||
}; | }; | |||
20.10.3. DESCRIPTION | 20.10.3. DESCRIPTION | |||
The CB_WANTS_CANCELLED operation is used to notify the client that | The CB_WANTS_CANCELLED operation is used to notify the client that | |||
the some or all wants it registered for recallable delegations and | some or all of the wants it registered for recallable delegations and | |||
layouts have been cancelled. | layouts have been cancelled. | |||
If cwca_contended_wants_cancelled is TRUE, this indicates the server | If cwca_contended_wants_cancelled is TRUE, this indicates that the | |||
will not be pushing to the client any delegations that become | server will not be pushing to the client any delegations that become | |||
available after contention passes. | available after contention passes. | |||
If cwca_resourced_wants_cancelled is TRUE, this indicates the server | If cwca_resourced_wants_cancelled is TRUE, this indicates that the | |||
will not notify the client when there are resources on the server to | server will not notify the client when there are resources on the | |||
grant delegations or layouts. | server to grant delegations or layouts. | |||
After receiving a CB_WANTS_CANCELLED operation, the client is free to | After receiving a CB_WANTS_CANCELLED operation, the client is free to | |||
attempt to acquire the delegations or layouts it was waiting for, and | attempt to acquire the delegations or layouts it was waiting for, and | |||
possibly re-register wants. | possibly re-register wants. | |||
20.10.4. IMPLEMENTATION | 20.10.4. IMPLEMENTATION | |||
If a client has an OPEN, WANT_DELEGATION, or GET_DIR_DELEGATION | If a client has an OPEN, WANT_DELEGATION, or GET_DIR_DELEGATION | |||
request outstanding when a CB_WANTS_CANCELLED is sent, the server | request outstanding when a CB_WANTS_CANCELLED is sent, the server | |||
may need to make clear to the client whether a promise to signal | may need to make clear to the client whether a promise to signal | |||
skipping to change at page 596, line 15 | skipping to change at page 599, line 15 | |||
20.11.2. RESULT | 20.11.2. RESULT | |||
struct CB_NOTIFY_LOCK4res { | struct CB_NOTIFY_LOCK4res { | |||
nfsstat4 cnlr_status; | nfsstat4 cnlr_status; | |||
}; | }; | |||
20.11.3. DESCRIPTION | 20.11.3. DESCRIPTION | |||
The server can use this operation to indicate that a byte-range lock | The server can use this operation to indicate that a byte-range lock | |||
for the given file and lock-owner, previously requested by the client | for the given file and lock-owner, previously requested by the client | |||
via an unsuccessful LOCK request, might be available. | via an unsuccessful LOCK operation, might be available. | |||
This callback is meant to be used by servers to help reduce the | This callback is meant to be used by servers to help reduce the | |||
latency of blocking locks in the case where they recognize that a | latency of blocking locks in the case where they recognize that a | |||
client which has been polling for a blocking lock may now be able to | client that has been polling for a blocking byte-range lock may now | |||
acquire the lock. If the server supports this callback for a given | be able to acquire the lock. If the server supports this callback | |||
file, it MUST set the OPEN4_RESULT_MAY_NOTIFY_LOCK flag when | for a given file, it MUST set the OPEN4_RESULT_MAY_NOTIFY_LOCK flag | |||
responding to successful opens for that file. This does not commit | when responding to successful opens for that file. This does not | |||
the server to the use of CB_NOTIFY_LOCK, but the client may use this | commit the server to the use of CB_NOTIFY_LOCK, but the client may | |||
as a hint to decide how frequently to poll for locks derived from | use this as a hint to decide how frequently to poll for locks derived | |||
that open. | from that open. | |||
If an OPEN operation results in an upgrade, in which the stateid | If an OPEN operation results in an upgrade, in which the stateid | |||
returned has an "other" value matching that of a stateid already | returned has an "other" value matching that of a stateid already | |||
allocated, with a new "seqid" indicating a change in the lock being | allocated, with a new "seqid" indicating a change in the lock being | |||
represented, then the value of the OPEN4_RESULT_MAY_NOTIFY_LOCK flag | represented, then the value of the OPEN4_RESULT_MAY_NOTIFY_LOCK flag | |||
when responding to that new OPEN controls handling from that point | when responding to that new OPEN controls handling from that point | |||
going forward. When parallel OPENs are done on the same file and | going forward. When parallel OPENs are done on the same file and | |||
open-owner, the ordering of the "seqid" field of the returned stateid | open-owner, the ordering of the "seqid" fields of the returned | |||
(subject to wraparound) are to be used to select the controlling | stateids (subject to wraparound) are to be used to select the | |||
value of the OPEN4_RESULT_MAY_NOTIFY_LOCK flag. | controlling value of the OPEN4_RESULT_MAY_NOTIFY_LOCK flag. | |||
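
One way a client might pick the controlling flag value among parallel OPEN replies is sketched below in Python. The numeric flag value, the 32-bit half-range comparison used for wraparound, and the function names are assumptions made for illustration and are not taken from the text above.

```python
OPEN4_RESULT_MAY_NOTIFY_LOCK = 0x20  # assumed bit value, for illustration only
SEQ_MOD = 2 ** 32


def seqid_newer(a, b):
    """True if seqid a is later than seqid b, allowing for 32-bit wraparound."""
    return a != b and (a - b) % SEQ_MOD < SEQ_MOD // 2


def controlling_may_notify(open_results):
    """open_results: (stateid seqid, result flags) pairs for one file and open-owner."""
    latest = None
    for seqid, rflags in open_results:
        if latest is None or seqid_newer(seqid, latest[0]):
            latest = (seqid, rflags)
    if latest is None:
        raise ValueError("no OPEN results supplied")
    # The reply carrying the latest seqid decides the flag from here on.
    return bool(latest[1] & OPEN4_RESULT_MAY_NOTIFY_LOCK)
```

The result depends only on the seqid ordering, not on the order in which the replies reach the client.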
20.11.4. IMPLEMENTATION | 20.11.4. IMPLEMENTATION | |||
The server MUST NOT grant the lock to the client unless and until it | The server MUST NOT grant the byte-range lock to the client unless | |||
receives an actual LOCK request from the client. Similarly, the | and until it receives a LOCK operation from the client. Similarly, | |||
client receiving this callback cannot assume that it now has the | the client receiving this callback cannot assume that it now has the | |||
lock, or that a subsequent LOCK request for the lock will be | lock or that a subsequent LOCK operation for the lock will be | |||
successful. | successful. | |||
The server is not required to implement this callback, and even if it | The server is not required to implement this callback, and even if it | |||
does, it is not required to use it in any particular case. Therefore | does, it is not required to use it in any particular case. | |||
the client must still rely on polling for blocking locks, as | Therefore, the client must still rely on polling for blocking locks, | |||
described in Section 9.6. | as described in Section 9.6. | |||
Similarly, the client is not required to implement this callback, and | Similarly, the client is not required to implement this callback, and | |||
even if it does, it is still free to ignore it. Therefore the server MUST | even if it does, it is still free to ignore it. Therefore, the server MUST | |||
NOT assume that the client will act based on the callback. | NOT assume that the client will act based on the callback. | |||
20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify Client of Device ID | 20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify Client of Device ID | |||
Changes | Changes | |||
20.12.1. ARGUMENT | 20.12.1. ARGUMENT | |||
/* | /* | |||
* Device notification types. | * Device notification types. | |||
*/ | */ | |||
skipping to change at page 598, line 21 | skipping to change at page 601, line 21 | |||
after being deleted (Section 12.2.10). | after being deleted (Section 12.2.10). | |||
All device ID notifications contain a device ID and a layout type. | All device ID notifications contain a device ID and a layout type. | |||
The layout type is necessary because two different layout types can | The layout type is necessary because two different layout types can | |||
share the same device ID, and the common device ID can have | share the same device ID, and the common device ID can have | |||
completely different mappings for each layout type. | completely different mappings for each layout type. | |||
The server will send the following notifications: | The server will send the following notifications: | |||
NOTIFY_DEVICEID4_CHANGE | NOTIFY_DEVICEID4_CHANGE | |||
A previously provided device ID to device address mapping has | A previously provided device-ID-to-device-address mapping has | |||
changed and the client uses GETDEVICEINFO to obtain the updated | changed and the client uses GETDEVICEINFO to obtain the updated | |||
mapping. The notification is encoded in a value of data type | mapping. The notification is encoded in a value of data type | |||
notify_deviceid_change4. This data type also contains a boolean | notify_deviceid_change4. This data type also contains a boolean | |||
field, ndc_immediate, which if TRUE indicates that the change will | field, ndc_immediate, which if TRUE indicates that the change will | |||
be enforced immediately, and so the client might not be able to | be enforced immediately, and so the client might not be able to | |||
complete any pending I/O to the device ID. If ndc_immediate is | complete any pending I/O to the device ID. If ndc_immediate is | |||
FALSE, then for an indefinite time, the client can complete | FALSE, then for an indefinite time, the client can complete | |||
pending I/O. After pending I/O is complete, the client SHOULD get | pending I/O. After pending I/O is complete, the client SHOULD get | |||
the new device ID to device address mappings before sending new | the new device-ID-to-device-address mappings before sending new | |||
I/O requests to the device ID. | I/O requests to the storage devices addressed by the device ID. | |||
NOTIFY_DEVICEID4_DELETE | NOTIFY_DEVICEID4_DELETE | |||
Deletes a device ID from the mappings. This notification MUST NOT | Deletes a device ID from the mappings. This notification MUST NOT | |||
be sent if the client has a layout that refers to the device ID. | be sent if the client has a layout that refers to the device ID. | |||
In other words if the server is sending a delete device ID | In other words, if the server is sending a delete device ID | |||
notification, one of the following is true for layouts associated | notification, one of the following is true for layouts associated | |||
with the layout type: | with the layout type: | |||
* The client never had a layout referring to that device ID. | * The client never had a layout referring to that device ID. | |||
* The client has returned all layouts referring to that device | * The client has returned all layouts referring to that device | |||
ID. | ID. | |||
* The server has revoked all layouts referring to that device ID. | * The server has revoked all layouts referring to that device ID. | |||
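
A client-side cache that reacts to the two notifications might look like the following Python sketch. The class, its method names, and the returned strings are hypothetical. The one point taken directly from the text is that entries are keyed by layout type together with device ID, because two layout types can share a device ID with different mappings.

```python
class DeviceCache:
    """Device-ID-to-device-address mappings, keyed by (layout type, device ID)."""

    def __init__(self):
        self.addr = {}      # (layout_type, device_id) -> device address
        self.stale = set()  # keys whose mapping must be fetched again

    def store(self, layout_type, device_id, address):
        """Record a mapping, e.g. one obtained via GETDEVICEINFO."""
        key = (layout_type, device_id)
        self.addr[key] = address
        self.stale.discard(key)

    def on_change(self, layout_type, device_id, ndc_immediate):
        """Handle a change notification; report what may happen to pending I/O."""
        self.stale.add((layout_type, device_id))
        return "pending_io_may_fail" if ndc_immediate else "pending_io_may_complete"

    def on_delete(self, layout_type, device_id):
        """Handle a delete notification by forgetting the mapping."""
        key = (layout_type, device_id)
        self.addr.pop(key, None)
        self.stale.discard(key)

    def needs_getdeviceinfo(self, layout_type, device_id):
        """True if new I/O should be preceded by a fresh GETDEVICEINFO."""
        key = (layout_type, device_id)
        return key in self.stale or key not in self.addr
```

A change notification for one layout type leaves the mapping cached under the same device ID for another layout type untouched.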
skipping to change at page 599, line 31 | skipping to change at page 602, line 31 | |||
This operation is a placeholder for encoding a result to handle the | This operation is a placeholder for encoding a result to handle the | |||
case of the server sending an operation code within CB_COMPOUND that | case of the server sending an operation code within CB_COMPOUND that | |||
is not defined in the NFSv4.1 specification. See Section 19.2.3 for | is not defined in the NFSv4.1 specification. See Section 19.2.3 for | |||
more details. | more details. | |||
The status field of CB_ILLEGAL4res MUST be set to NFS4ERR_OP_ILLEGAL. | The status field of CB_ILLEGAL4res MUST be set to NFS4ERR_OP_ILLEGAL. | |||
20.13.4. IMPLEMENTATION | 20.13.4. IMPLEMENTATION | |||
A server will probably not send an operation with code OP_CB_ILLEGAL | A server will probably not send an operation with code OP_CB_ILLEGAL, | |||
but if it does, the response will be CB_ILLEGAL4res just as it would | but if it does, the response will be CB_ILLEGAL4res just as it would | |||
be with any other invalid operation code. Note that if the client | be with any other invalid operation code. Note that if the client | |||
gets an illegal operation code that is not OP_ILLEGAL, and if the | gets an illegal operation code that is not OP_ILLEGAL, and if the | |||
client checks for legal operation codes during the XDR decode phase, | client checks for legal operation codes during the XDR decode phase, | |||
then an instance of data type CB_ILLEGAL4res will not be returned. | then an instance of data type CB_ILLEGAL4res will not be returned. | |||
21. Security Considerations | 21. Security Considerations | |||
Historically the authentication of model of NFS had the entire | Historically, the authentication model of NFS was based on the entire | |||
machine being the NFS client, and the NFS server trusting the NFS | machine being the NFS client, with the NFS server trusting the NFS | |||
client to authenticate the end-user. The NFS server in turn shared | client to authenticate the end-user. The NFS server in turn shared | |||
its files only to specific clients, as identified by the client's | its files only to specific clients, as identified by the client's | |||
source network address. Given this model, the AUTH_SYS RPC security | source network address. Given this model, the AUTH_SYS RPC security | |||
flavor simply identified the end-user using the client to the NFS | flavor simply identified the end-user using the client to the NFS | |||
server. When processing NFS responses, the client ensured that the | server. When processing NFS responses, the client ensured that the | |||
responses came from the same network address and port number that the | responses came from the same network address and port number to which | |||
request was sent to. While such a model is easy to implement and | the request was sent. While such a model is easy to implement and | |||
simple to deploy and use, it is unsafe. Thus, NFSv4.1 | simple to deploy and use, it is unsafe. Thus, NFSv4.1 | |||
implementations are REQUIRED to support a security model that uses | implementations are REQUIRED to support a security model that uses | |||
end to end authentication, where an end-user on a client mutually | end-to-end authentication, where an end-user on a client mutually | |||
authenticates (via cryptographic schemes that do not expose passwords | authenticates (via cryptographic schemes that do not expose passwords | |||
or keys in the clear on the network) to a principal on an NFS server. | or keys in the clear on the network) to a principal on an NFS server. | |||
Consideration is also be given to the integrity and privacy of NFS | Consideration is also given to the integrity and privacy of NFS | |||
requests and responses. The issues of end to end mutual | requests and responses. The issues of end-to-end mutual | |||
authentication, integrity, and privacy are discussed | authentication, integrity, and privacy are discussed in | |||
Section 2.2.1.1.1. There are specific considerations when using | Section 2.2.1.1.1. There are specific considerations when using | |||
Kerberos V5 as described in Section 2.2.1.1.1.2.1.1. | Kerberos V5 as described in Section 2.2.1.1.1.2.1.1. | |||
Note that being REQUIRED to implement does not mean REQUIRED to use; | Note that being REQUIRED to implement does not mean REQUIRED to use; | |||
AUTH_SYS can be used by NFSv4.1 clients and servers. However, | AUTH_SYS can be used by NFSv4.1 clients and servers. However, | |||
AUTH_SYS is merely an OPTIONAL security flavor in NFSv4.1, and so | AUTH_SYS is merely an OPTIONAL security flavor in NFSv4.1, and so | |||
interoperability via AUTH_SYS is not assured. | interoperability via AUTH_SYS is not assured. | |||
For reasons of reduced administration overhead, better performance | For reasons of reduced administration overhead, better performance, | |||
and/or reduction of CPU utilization, users of NFSv4.1 implementations | and/or reduction of CPU utilization, users of NFSv4.1 implementations | |||
may opt to not use security mechanisms that enable integrity | might decline to use security mechanisms that enable integrity | |||
protection on each remote procedure call and response. The use of | protection on each remote procedure call and response. The use of | |||
mechanisms without integrity leaves the user vulnerable to an | mechanisms without integrity leaves the user vulnerable to a man-in- | |||
attacker in the middle of the NFS client and server that modifies the | the-middle of the NFS client and server that modifies the RPC request | |||
RPC request and/or the response. While implementations are free to | and/or the response. While implementations are free to provide the | |||
provide the option to use weaker security mechanisms, there are three | option to use weaker security mechanisms, there are three operations | |||
operations in particular that warrant the implementation overriding | in particular that warrant the implementation overriding user | |||
user choices. | choices. | |||
o The first two such operations are SECINFO and SECINFO_NO_NAME. It | o The first two such operations are SECINFO and SECINFO_NO_NAME. It | |||
is RECOMMENDED that the client send both operations such that they | is RECOMMENDED that the client send both operations such that they | |||
are protected with a security flavor that has integrity | are protected with a security flavor that has integrity | |||
protection, such as RPCSEC_GSS with either the | protection, such as RPCSEC_GSS with either the | |||
rpc_gss_svc_integrity or rpc_gss_svc_privacy service. Without | rpc_gss_svc_integrity or rpc_gss_svc_privacy service. Without | |||
integrity protection encapsulating SECINFO and SECINFO_NO_NAME and | integrity protection encapsulating SECINFO and SECINFO_NO_NAME and | |||
their results, an attacker in the middle could modify results such | their results, a man-in-the-middle could modify results such that | |||
that the client might select a weaker algorithm in the set allowed | the client might select a weaker algorithm in the set allowed by | |||
by server, making the client and/or server vulnerable to further | the server, making the client and/or server vulnerable to further | |||
attacks. | attacks. | |||
o The third operation that SHOULD use integrity protection is any | o The third operation that SHOULD use integrity protection is any | |||
GETATTR for the fs_locations and fs_locations_info attributes, in | GETATTR for the fs_locations and fs_locations_info attributes, in | |||
order to mitigate the severity of a man in the middle attack. The | order to mitigate the severity of a man-in-the-middle attack. The | |||
attack has two steps. First the attacker modifies the unprotected | attack has two steps. First the attacker modifies the unprotected | |||
results of some operation to return NFS4ERR_MOVED. Second, when | results of some operation to return NFS4ERR_MOVED. Second, when | |||
the client follows up with a GETATTR for the fs_locations or | the client follows up with a GETATTR for the fs_locations or | |||
fs_locations_info attributes, the attacker modifies the results to | fs_locations_info attributes, the attacker modifies the results to | |||
cause the client migrate its traffic to a server controlled by the | cause the client to migrate its traffic to a server controlled by | |||
attacker. With integrity protection, this attack is mitigated. | the attacker. With integrity protection, this attack is | |||
mitigated. | ||||
Relative to previous NFS versions, NFSv4.1 has additional security | Relative to previous NFS versions, NFSv4.1 has additional security | |||
considerations for pNFS (see Section 12.9 and Section 13.12), locking | considerations for pNFS (see Sections 12.9 and 13.12), locking and | |||
and session state (see Section 2.10.8.3), and state recovery during | session state (see Section 2.10.8.3), and state recovery during grace | |||
grace period (see Section 8.4.2.1.1). With respect to locking and | period (see Section 8.4.2.1.1). With respect to locking and session | |||
session state, if SP4_SSV state protection is being used, | state, if SP4_SSV state protection is being used, Section 2.10.10 has | |||
Section 2.10.10 has specific security considerations for the NFSv4.1 | specific security considerations for the NFSv4.1 client and server. | |||
client and server. | ||||
22. IANA Considerations | 22. IANA Considerations | |||
This section uses terms that are defined in [55]. | This section uses terms that are defined in [55]. | |||
22.1. Named Attribute Definitions | 22.1. Named Attribute Definitions | |||
IANA will create a registry called the "NFSv4 Named Attribute | IANA created a registry called the "NFSv4 Named Attribute Definitions | |||
Definitions Registry". | Registry". | |||
The NFSv4.1 protocol supports the association of a file with zero or | The NFSv4.1 protocol supports the association of a file with zero or | |||
more named attributes. The name space identifiers for these | more named attributes. The name space identifiers for these | |||
attributes are defined as string names. The protocol does not define | attributes are defined as string names. The protocol does not define | |||
the specific assignment of the name space for these file attributes. | the specific assignment of the name space for these file attributes. | |||
An IANA registry will promote interoperability where common interests | The IANA registry promotes interoperability where common interests | |||
exist. While application developers are allowed to define and use | exist. While application developers are allowed to define and use | |||
attributes as needed, they are encouraged to register the attributes | attributes as needed, they are encouraged to register the attributes | |||
with IANA. | with IANA. | |||
Such registered named attributes are presumed to apply to all minor | Such registered named attributes are presumed to apply to all minor | |||
versions of NFSv4, including those defined subsequently to the | versions of NFSv4, including those defined subsequently to the | |||
registration. Where the named attribute is intended to be limited | registration. If the named attribute is intended to be limited to | |||
with regard to the minor versions for which they are not be used, the | specific minor versions, this will be clearly stated in the | |||
assignment in registry will clearly state the applicable limits. | registry's assignment. | |||
All assignments to the registry are made on a First Come First Served | All assignments to the registry are made on a First Come First Served | |||
basis, per section 4.1 of [55]. The policy for each assignment is | basis, per Section 4.1 of [55]. The policy for each assignment is | |||
Specification Required, per section 4.1 of [55]. | Specification Required, per Section 4.1 of [55]. | |||
Under the NFSv4.1 specification, the name of a named attribute can in | Under the NFSv4.1 specification, the name of a named attribute can in | |||
theory be up to 2^32 - 1 bytes in length, but in practice NFSv4.1 | theory be up to 2^32 - 1 bytes in length, but in practice NFSv4.1 | |||
clients and servers will be unable to a handle string that long. | clients and servers will be unable to handle a string that long. | |||
IANA should reject any assignment request with a named attribute that | IANA should reject any assignment request with a named attribute that | |||
exceeds 128 UTF-8 characters. To give IESG the flexibility to set up | exceeds 128 UTF-8 characters. To give the IESG the flexibility to | |||
bases of assignment of Experimental Use and Standards Action, the | set up bases of assignment of Experimental Use and Standards Action, | |||
prefixes of "EXPE" and "STDS" are Reserved. The zero length named | the prefixes of "EXPE" and "STDS" are Reserved. The named attribute | |||
attribute name is Reserved. | with a zero-length name is Reserved. | |||
The prefix "PRIV" is allocated for Private Use. A site that wants to | The prefix "PRIV" is designated for Private Use. A site that wants to | |||
make use of unregistered named attributes without risk of conflicting | make use of unregistered named attributes without risk of conflicting | |||
with an assignment in IANA's registry should use the prefix "PRIV" in | with an assignment in IANA's registry should use the prefix "PRIV" in | |||
all of its named attributes. | all of its named attributes. | |||
Because some NFSv4.1 clients and servers have case insensitive | Because some NFSv4.1 clients and servers have case-insensitive | |||
semantics, the fifteen additional lower case and mixed case | semantics, the fifteen additional lower case and mixed case | |||
permutations of each of "EXPE", "PRIV", and "STDS", are Reserved | permutations of each of "EXPE", "PRIV", and "STDS" are Reserved | |||
(e.g. "expe", "expE", "exPe", etc. are Reserved). Similarly, IANA | (e.g., "expe", "expE", "exPe", etc. are Reserved). Similarly, IANA | |||
must not allow two assignments that would conflict if both named | must not allow two assignments that would conflict if both named | |||
attributes were converted to a common case. | attributes were converted to a common case. | |||
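The naming rules in Section 22.1 can be illustrated with a short Python sketch. It is not part of either document version; the helper names (case_permutations, classify_name, conflicts) and the category labels are hypothetical, and the order in which the checks are applied is an assumption. Length is counted in characters, which for the US-ASCII names the registry expects equals the byte count.

```python
from itertools import product

RESERVED_PREFIXES = ("EXPE", "STDS")  # Reserved prefixes per Section 22.1
PRIVATE_PREFIX = "PRIV"               # Private Use prefix
MAX_NAME_CHARS = 128                  # longer names should be rejected


def case_permutations(prefix):
    """Return every upper/lower case spelling of prefix, itself included."""
    choices = [(ch.lower(), ch.upper()) for ch in prefix]
    return {"".join(combo) for combo in product(*choices)}


def classify_name(name):
    """Classify a proposed named attribute name (illustrative labels)."""
    if name == "":
        return "reserved"             # the zero-length name is Reserved
    if len(name) > MAX_NAME_CHARS:
        return "reject"               # exceeds 128 characters
    head = name[:4]
    if head == PRIVATE_PREFIX:
        return "private-use"
    if head in RESERVED_PREFIXES:
        return "reserved"
    if head.upper() in RESERVED_PREFIXES + (PRIVATE_PREFIX,):
        return "reserved"             # lower and mixed case permutations
    return "assignable"


def conflicts(name, registered):
    """True if name equals a registered name once both are case-folded."""
    return name.casefold() in {entry.casefold() for entry in registered}
```

A four-letter prefix has 2^4 = 16 case spellings, so removing the all-uppercase form leaves the fifteen additional permutations the text mentions.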
The registry of named attributes is a list of assignments, each | The registry of named attributes is a list of assignments, each | |||
containing three fields for each assignment. | containing three fields for each assignment. | |||
1. A US-ASCII string name that is the actual name of the attribute. | 1. A US-ASCII string name that is the actual name of the attribute. | |||
This name must be unique. This string name can be 1 to 128 UTF-8 | This name must be unique. This string name can be 1 to 128 UTF-8 | |||
characters long. | characters long. | |||
skipping to change at page 602, line 32 | skipping to change at page 605, line 30 | |||
3. The point of contact of the registrant. The point of contact can | 3. The point of contact of the registrant. The point of contact can | |||
consume up to 256 bytes (or more if IANA permits). | consume up to 256 bytes (or more if IANA permits). | |||
22.1.1. Initial Registry | 22.1.1. Initial Registry | |||
There is no initial registry. | There is no initial registry. | |||
22.1.2. Updating Registrations | 22.1.2. Updating Registrations | |||
The registrant is always permitted to update the point of contact | The registrant is always permitted to update the point of contact | |||
field. To make any other change will require Expert Review or IESG | field. Any other change will require Expert Review or IESG Approval. | |||
Approval. | ||||
22.2. Device ID Notifications | 22.2. Device ID Notifications | |||
IANA will create a registry called the "NFSv4.1 Device ID | IANA created a registry called the "NFSv4 Device ID Notifications | |||
Notifications Registry". | Registry". | |||
The potential exists for new notification types to be added to the | The potential exists for new notification types to be added to the | |||
CB_NOTIFY_DEVICEID operation Section 20.12. This can be done via | CB_NOTIFY_DEVICEID operation (see Section 20.12). This can be done | |||
changes to the operations that register notifications, or by adding | via changes to the operations that register notifications, or by | |||
new operations to NFSv4. This requires a new minor version of NFSv4, | adding new operations to NFSv4. This requires a new minor version of | |||
and requires a standards track document from IETF. Another way to | NFSv4, and requires a Standards Track document from the IETF. | |||
add a notification is to specify a new layout type (see | Another way to add a notification is to specify a new layout type | |||
Section 22.4). | (see Section 22.4). | |||
Hence all assignments to the registry are made on a Standards Action | Hence, all assignments to the registry are made on a Standards Action | |||
basis per section 4.1 of [55], with Expert Review required. | basis per Section 4.1 of [55], with Expert Review required. | |||
The registry is a list of assignments, each containing five fields | The registry is a list of assignments, each containing five fields | |||
per assignment. | per assignment. | |||
1. The name of the notification type. This name must have the | 1. The name of the notification type. This name must have the | |||
prefix: "NOTIFY_DEVICEID4_". This name must be unique. | prefix "NOTIFY_DEVICEID4_". This name must be unique. | |||
2. The value of the notification. IANA will assign this number, and | 2. The value of the notification. IANA will assign this number, and | |||
the request from the registrant will use TBD1 instead of an | the request from the registrant will use TBD1 instead of an | |||
actual value. IANA MUST use a whole number which can be no | actual value. IANA MUST use a whole number that can be no higher | |||
higher than 2^32-1, and should be the next available value. The | than 2^32-1, and should be the next available value. The value | |||
value assigned must be unique. A Designated Expert must be used | assigned must be unique. A Designated Expert must be used to | |||
to ensure that when the name of the notification type and its | ensure that when the name of the notification type and its value | |||
value are added to the NFSv4.1 notify_deviceid_type4 enumerated | are added to the NFSv4.1 notify_deviceid_type4 enumerated data | |||
data type in the NFSv4.1 XDR description ([13]), the result | type in the NFSv4.1 XDR description ([13]), the result continues | |||
continues to be a valid XDR description. | to be a valid XDR description. | |||
3. The Standards Track RFC(s) that describe the notification. If | 3. The Standards Track RFC(s) that describe the notification. If | |||
the RFC(s) have not yet been published, the registrant will use | the RFC(s) have not yet been published, the registrant will use | |||
RFCTBD2, RFCTBD3, etc. instead of an actual RFC number. | RFCTBD2, RFCTBD3, etc. instead of an actual RFC number. | |||
4. How the RFC introduces the notification. This is indicated by a | 4. How the RFC introduces the notification. This is indicated by a | |||
single US-ASCII value. If the value is N, it means a minor | single US-ASCII value. If the value is N, it means a minor | |||
revision to the NFSv4 protocol. If the value is L, it means a | revision to the NFSv4 protocol. If the value is L, it means a | |||
new pNFS layout type. Other values can be used with IESG | new pNFS layout type. Other values can be used with IESG | |||
Approval. | Approval. | |||
5. The minor versions of NFSv4 that are allowed to the use the | 5. The minor versions of NFSv4 that are allowed to use the | |||
notification. While these are numeric values, IANA will not | notification. While these are numeric values, IANA will not | |||
allocate and assign them; the author of the relevant RFCs with | allocate and assign them; the author of the relevant RFCs with | |||
IESG Approval assigns these numbers. Each time there is new | IESG Approval assigns these numbers. Each time there is a new | |||
minor version of NFSv4 approved, a Designated Expert should | minor version of NFSv4 approved, a Designated Expert should | |||
review the registry to make recommended updates as needed. | review the registry to make recommended updates as needed. | |||
22.2.1. Initial Registry | 22.2.1. Initial Registry | |||
The initial registry is in Table 16. Note that next available value | The initial registry is in Table 16. Note that the next available | |||
is zero. | value is zero. | |||
+-------------------------+-------+----------+-----+----------------+ | +-------------------------+-------+---------+-----+----------------+ | |||
| Notification Name | Value | RFC | How | Minor Versions | | | Notification Name | Value | RFC | How | Minor Versions | | |||
+-------------------------+-------+----------+-----+----------------+ | +-------------------------+-------+---------+-----+----------------+ | |||
| NOTIFY_DEVICEID4_CHANGE | 1 | RFCTBD10 | N | 1 | | | NOTIFY_DEVICEID4_CHANGE | 1 | RFC5661 | N | 1 | | |||
| NOTIFY_DEVICEID4_DELETE | 2 | RFCTBD10 | N | 1 | | | NOTIFY_DEVICEID4_DELETE | 2 | RFC5661 | N | 1 | | |||
+-------------------------+-------+----------+-----+----------------+ | +-------------------------+-------+---------+-----+----------------+ | |||
Table 16: Initial Device ID Notification Assignments | Table 16: Initial Device ID Notification Assignments | |||
22.2.2. Updating Registrations | 22.2.2. Updating Registrations | |||
The update of a registration will require IESG Approval on the advice | The update of a registration will require IESG Approval on the advice | |||
of a Designated Expert. | of a Designated Expert. | |||
22.3. Object Recall Types | 22.3. Object Recall Types | |||
IANA will create a registry called the "NFSv4.1 Recallable Object | IANA created a registry called the "NFSv4 Recallable Object Types | |||
Types Registry". | Registry". | |||
The potential exists for new object types to be added to the | The potential exists for new object types to be added to the | |||
CB_RECALL_ANY operation (see Section 20.6). This can be done via | CB_RECALL_ANY operation (see Section 20.6). This can be done via | |||
changes to the operations that add recallable types, or by adding new | changes to the operations that add recallable types, or by adding new | |||
operations to NFSv4. This requires a new minor version of NFSv4, and | operations to NFSv4. This requires a new minor version of NFSv4, and | |||
requires a standards track document from IETF. Another way to add a | requires a Standards Track document from IETF. Another way to add a | |||
new recallable object is to specify a new layout type (see | new recallable object is to specify a new layout type (see | |||
Section 22.4). | Section 22.4). | |||
All assignments to the registry are made on a Standards Action basis | All assignments to the registry are made on a Standards Action basis | |||
per section 4.1 of [55], with Expert Review required. | per Section 4.1 of [55], with Expert Review required. | |||
Recallable object types are 32 bit unsigned numbers. There are no | Recallable object types are 32-bit unsigned numbers. There are no | |||
Reserved values. Values in the range 12 through 15, inclusive, are | Reserved values. Values in the range 12 through 15, inclusive, are | |||
for Private Use. | designated for Private Use. | |||
The registry is a list of assignments, each containing five fields | The registry is a list of assignments, each containing five fields | |||
per assignment. | per assignment. | |||
1. The name of the recallable object type. This name must have the | 1. The name of the recallable object type. This name must have the | |||
prefix: "RCA4_TYPE_MASK_". The name must be unique. | prefix "RCA4_TYPE_MASK_". The name must be unique. | |||
2. The value of the recallable object type. IANA will assign this | 2. The value of the recallable object type. IANA will assign this | |||
number, and the request from the registrant will use TBD1 instead | number, and the request from the registrant will use TBD1 instead | |||
of an actual value. IANA MUST use a whole number which can be no | of an actual value. IANA MUST use a whole number that can be no | |||
higher than 2^32-1, and should be the next available value. The | higher than 2^32-1, and should be the next available value. The | |||
value must be unique. A Designated Expert must be used to ensure | value must be unique. A Designated Expert must be used to ensure | |||
that when the name of the recallable type and its value are added | that when the name of the recallable type and its value are added | |||
to the NFSv4 XDR description [13], the result continues to be a | to the NFSv4 XDR description [13], the result continues to be a | |||
valid XDR description. | valid XDR description. | |||
3. The Standards Track RFC(s) that describe the recallable object | 3. The Standards Track RFC(s) that describe the recallable object | |||
type. If the RFC(s) have not yet been published, the registrant | type. If the RFC(s) have not yet been published, the registrant | |||
will use RFCTBD2, RFCTBD3, etc. instead of an actual RFC number. | will use RFCTBD2, RFCTBD3, etc. instead of an actual RFC number. | |||
4. How the RFC introduces the recallable object type. This is | 4. How the RFC introduces the recallable object type. This is | |||
indicated by a single US-ASCII value. If the value is N, it | indicated by a single US-ASCII value. If the value is N, it | |||
means a minor revision to the NFSv4 protocol. If the value is L, | means a minor revision to the NFSv4 protocol. If the value is L, | |||
it means a new pNFS layout type. Other values can be used with | it means a new pNFS layout type. Other values can be used with | |||
IESG Approval. | IESG Approval. | |||
5. The minor versions of NFSv4 that are allowed to the use the | 5. The minor versions of NFSv4 that are allowed to use the | |||
recallable object type. While these are numeric values, IANA | recallable object type. While these are numeric values, IANA | |||
will not allocate and assign them; the author of the relevant | will not allocate and assign them; the author of the relevant | |||
RFCs with IESG Approval assigns these numbers. Each time there | RFCs with IESG Approval assigns these numbers. Each time there | |||
is new minor version of NFSv4 approved, a Designated Expert | is a new minor version of NFSv4 approved, a Designated Expert | |||
should review the registry to make recommended updates as needed. | should review the registry to make recommended updates as needed. | |||
22.3.1. Initial Registry | 22.3.1. Initial Registry | |||
The initial registry is in Table 17. Note that next available value | The initial registry is in Table 17. Note that the next available | |||
is five. | value is five. | |||
+-------------------------------+-------+----------+-----+----------+ | +-------------------------------+-------+--------+-----+------------+ | |||
| Recallable Object Type Name | Value | RFC | How | Minor | | | Recallable Object Type Name | Value | RFC | How | Minor | | |||
| | | | | Versions | | | | | | | Versions | | |||
+-------------------------------+-------+----------+-----+----------+ | +-------------------------------+-------+--------+-----+------------+ | |||
| RCA4_TYPE_MASK_RDATA_DLG | 0 | RFCTBD10 | N | 1 | | | RCA4_TYPE_MASK_RDATA_DLG | 0 | RFC | N | 1 | | |||
| RCA4_TYPE_MASK_WDATA_DLG | 1 | RFCTBD10 | N | 1 | | | | | 5661 | | | | |||
| RCA4_TYPE_MASK_DIR_DLG | 2 | RFCTBD10 | N | 1 | | | RCA4_TYPE_MASK_WDATA_DLG | 1 | RFC | N | 1 | | |||
| RCA4_TYPE_MASK_FILE_LAYOUT | 3 | RFCTBD10 | N | 1 | | | | | 5661 | | | | |||
| RCA4_TYPE_MASK_BLK_LAYOUT | 4 | RFCTBD20 | L | 1 | | | RCA4_TYPE_MASK_DIR_DLG | 2 | RFC | N | 1 | | |||
| RCA4_TYPE_MASK_OBJ_LAYOUT_MIN | 8 | RFCTBD30 | L | 1 | | | | | 5661 | | | | |||
| RCA4_TYPE_MASK_OBJ_LAYOUT_MAX | 9 | RFCTBD30 | L | 1 | | | RCA4_TYPE_MASK_FILE_LAYOUT | 3 | RFC | N | 1 | | |||
| Private Use | 12-15 | RFCTBD10 | L | 1 | | | | | 5661 | | | | |||
+-------------------------------+-------+----------+-----+----------+ | | RCA4_TYPE_MASK_BLK_LAYOUT | 4 | RFC | L | 1 | | |||
| | | 5661 | | | | ||||
| RCA4_TYPE_MASK_OBJ_LAYOUT_MIN | 8 | RFC | L | 1 | | ||||
| | | 5661 | | | | ||||
| RCA4_TYPE_MASK_OBJ_LAYOUT_MAX | 9 | RFC | L | 1 | | ||||
| | | 5661 | | | | ||||
+-------------------------------+-------+--------+-----+------------+ | ||||
Table 17: Initial Recallable Object Type Assignments | Table 17: Initial Recallable Object Type Assignments | |||
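The "next available value" notes attached to Table 16 and Table 17 can be checked with a small Python sketch. It assumes that "next available" means the smallest whole number that is neither assigned nor set aside for Private Use; that reading, and the helper name next_available, are assumptions made for illustration and are not stated in either document version.

```python
# Initial assignments copied from Table 16 and Table 17.
DEVICEID_NOTIFICATIONS = {
    "NOTIFY_DEVICEID4_CHANGE": 1,
    "NOTIFY_DEVICEID4_DELETE": 2,
}
RECALLABLE_OBJECT_TYPES = {
    "RCA4_TYPE_MASK_RDATA_DLG": 0,
    "RCA4_TYPE_MASK_WDATA_DLG": 1,
    "RCA4_TYPE_MASK_DIR_DLG": 2,
    "RCA4_TYPE_MASK_FILE_LAYOUT": 3,
    "RCA4_TYPE_MASK_BLK_LAYOUT": 4,
    "RCA4_TYPE_MASK_OBJ_LAYOUT_MIN": 8,
    "RCA4_TYPE_MASK_OBJ_LAYOUT_MAX": 9,
}
RECALLABLE_PRIVATE_USE = range(12, 16)  # 12 through 15, inclusive
MAX_VALUE = 2**32 - 1                   # upper bound on assigned values


def next_available(assigned, excluded=()):
    """Return the smallest value that is not assigned and not excluded."""
    taken = set(assigned.values())
    value = 0
    while value in taken or value in excluded:
        value += 1
    if value > MAX_VALUE:
        raise ValueError("registry is exhausted")
    return value
```

Under that assumption, next_available(DEVICEID_NOTIFICATIONS) gives 0 and next_available(RECALLABLE_OBJECT_TYPES, RECALLABLE_PRIVATE_USE) gives 5, matching the notes in Sections 22.2.1 and 22.3.1.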
22.3.2. Updating Registrations | 22.3.2. Updating Registrations | |||
The update of a registration will require IESG Approval on the advice | The update of a registration will require IESG Approval on the advice | |||
of a Designated Expert. | of a Designated Expert. | |||
22.4. Layout Types | 22.4. Layout Types | |||
IANA will create a registry called the "pNFS Layout Types Registry". | IANA created a registry called the "pNFS Layout Types Registry". | |||
All assignments to the registry are made on a Standards Action basis, | All assignments to the registry are made on a Standards Action basis, | |||
with Expert Review required. | with Expert Review required. | |||
Layout types are 32 bit numbers. The value zero is Reserved. Values | Layout types are 32-bit numbers. The value zero is Reserved. Values | |||
in the range 0x80000000 to 0xFFFFFFFF inclusive are for Private Use. | in the range 0x80000000 to 0xFFFFFFFF inclusive are designated for | |||
IANA will assign numbers from the range 0x00000001 to 0x7FFFFFFF | Private Use. IANA will assign numbers from the range 0x00000001 to | |||
inclusive. | 0x7FFFFFFF inclusive. | |||
The registry is a list of assignments, each containing five fields. | The registry is a list of assignments, each containing five fields. | |||
1. The name of the layout type. This name must have the prefix: | 1. The name of the layout type. This name must have the prefix | |||
"LAYOUT4_". The name must be unique. | "LAYOUT4_". The name must be unique. | |||
2. The value of the layout type. IANA will assign this number, and | 2. The value of the layout type. IANA will assign this number, and | |||
the request from the registrant will use TBD1 instead of an | the request from the registrant will use TBD1 instead of an | |||
actual value. The value assigned must be unique. A Designated | actual value. The value assigned must be unique. A Designated | |||
Expert must be used to ensure that when the name of the layout | Expert must be used to ensure that when the name of the layout | |||
type and its value are added to the NFSv4.1 layouttype4 | type and its value are added to the NFSv4.1 layouttype4 | |||
enumerated data type in the NFSv4.1 XDR description ([13]), the | enumerated data type in the NFSv4.1 XDR description ([13]), the | |||
result continues to be a valid XDR description. | result continues to be a valid XDR description. | |||
skipping to change at page 606, line 28 | skipping to change at page 609, line 35 | |||
RFCTBD2, RFCTBD3, etc. instead of an actual RFC number. | RFCTBD2, RFCTBD3, etc. instead of an actual RFC number. | |||
Collectively, the RFC(s) must adhere to the guidelines listed in | Collectively, the RFC(s) must adhere to the guidelines listed in | |||
Section 22.4.3. | Section 22.4.3. | |||
4. How the RFC introduces the layout type. This is indicated by a | 4. How the RFC introduces the layout type. This is indicated by a | |||
single US-ASCII value. If the value is N, it means a minor | single US-ASCII value. If the value is N, it means a minor | |||
revision to the NFSv4 protocol. If the value is L, it means a | revision to the NFSv4 protocol. If the value is L, it means a | |||
new pNFS layout type. Other values can be used with IESG | new pNFS layout type. Other values can be used with IESG | |||
Approval. | Approval. | |||
5. The minor versions of NFSv4 that are allowed to the use the | 5. The minor versions of NFSv4 that are allowed to use the | |||
notification. While these are numeric values, IANA will not | notification. While these are numeric values, IANA will not | |||
allocate and assign them; the author of the relevant RFCs with | allocate and assign them; the author of the relevant RFCs with | |||
IESG Approval assigns these numbers. Each time there is new | IESG Approval assigns these numbers. Each time there is a new | |||
minor version of NFSv4 approved, a Designated Expert should | minor version of NFSv4 approved, a Designated Expert should | |||
review the registry to make recommended updates as needed. | review the registry to make recommended updates as needed. | |||
22.4.1. Initial Registry | 22.4.1. Initial Registry | |||
The initial registry is in Table 18. | The initial registry is in Table 18. | |||
+-----------------------+-------+----------+-----+----------------+ | +-----------------------+-------+----------+-----+----------------+ | |||
| Layout Type Name | Value | RFC | How | Minor Versions | | | Layout Type Name | Value | RFC | How | Minor Versions | | |||
+-----------------------+-------+----------+-----+----------------+ | +-----------------------+-------+----------+-----+----------------+ | |||
| LAYOUT4_NFSV4_1_FILES | 0x1 | RFCTBD10 | N | 1 | | | LAYOUT4_NFSV4_1_FILES | 0x1 | RFC 5661 | N | 1 | | |||
| LAYOUT4_OSD2_OBJECTS | 0x2 | RFCTBD30 | L | 1 | | | LAYOUT4_OSD2_OBJECTS | 0x2 | RFC 5664 | L | 1 | | |||
| LAYOUT4_BLOCK_VOLUME | 0x3 | RFCTBD20 | L | 1 | | | LAYOUT4_BLOCK_VOLUME | 0x3 | RFC 5663 | L | 1 | | |||
+-----------------------+-------+----------+-----+----------------+ | +-----------------------+-------+----------+-----+----------------+ | |||
Table 18: Initial Layout Type Assignments | Table 18: Initial Layout Type Assignments | |||
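The layout type value ranges from Section 22.4 and the assignments in Table 18 can be expressed as a minimal Python sketch. The function names and the returned labels are hypothetical and chosen only for illustration; the numeric ranges and the "LAYOUT4_" prefix rule come from the text above.

```python
# Initial assignments copied from Table 18.
LAYOUT_TYPES = {
    "LAYOUT4_NFSV4_1_FILES": 0x1,
    "LAYOUT4_OSD2_OBJECTS": 0x2,
    "LAYOUT4_BLOCK_VOLUME": 0x3,
}


def classify_layout_type(value):
    """Map a 32-bit layout type value to the range it falls in."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("layout types are 32-bit numbers")
    if value == 0:
        return "reserved"
    if value >= 0x80000000:
        return "private-use"      # 0x80000000 through 0xFFFFFFFF
    return "iana-assigned"        # 0x00000001 through 0x7FFFFFFF


def valid_layout_name(name):
    """Check the required prefix; uniqueness is checked separately."""
    prefix = "LAYOUT4_"
    return name.startswith(prefix) and len(name) > len(prefix)
```

All three initial entries fall in the range that IANA assigns from, and a site-local experiment would pick a value at or above 0x80000000.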
22.4.2. Updating Registrations | 22.4.2. Updating Registrations | |||
The update of a registration will require IESG Approval on the advice | The update of a registration will require IESG Approval on the advice | |||
of a Designated Expert. | of a Designated Expert. | |||
22.4.3. Guidelines for Writing Layout Type Specifications | 22.4.3. Guidelines for Writing Layout Type Specifications | |||
skipping to change at page 607, line 46 | skipping to change at page 611, line 8 | |||
* At a minimum, describe the methods of recovery from: | * At a minimum, describe the methods of recovery from: | |||
1. Failure and restart for client, server, storage device. | 1. Failure and restart for client, server, storage device. | |||
2. Lease expiration from perspective of the active client, | 2. Lease expiration from perspective of the active client, | |||
server, storage device. | server, storage device. | |||
3. Loss of layout state resulting in fencing of client access | 3. Loss of layout state resulting in fencing of client access | |||
to storage devices (for an example, see Section 12.7.3). | to storage devices (for an example, see Section 12.7.3). | |||
* Include an IANA considerations section, will in turn include: | * Include an IANA considerations section, which will in turn | |||
include: | ||||
+ A request to IANA for a new layout type per Section 22.4. | + A request to IANA for a new layout type per Section 22.4. | |||
+ A list of requests to IANA for any new recallable object | + A list of requests to IANA for any new recallable object | |||
types for CB_RECALL_ANY; each entry is to presented in the | types for CB_RECALL_ANY; each entry is to be presented in | |||
form described in Section 22.3. | the form described in Section 22.3. | |||
+ A list of requests to IANA for any new notification values | + A list of requests to IANA for any new notification values | |||
for CB_NOTIFY_DEVICEID; each entry is to presented in the | for CB_NOTIFY_DEVICEID; each entry is to be presented in | |||
form described in Section 22.2. | the form described in Section 22.2. | |||
* Include a security considerations section. This section MUST | * Include a security considerations section. This section MUST | |||
explain how the NFSv4.1 authentication, authorization, and | explain how the NFSv4.1 authentication, authorization, and | |||
access control models are preserved. I.e. if a metadata | access-control models are preserved. That is, if a metadata | |||
server would restrict a READ or WRITE operation, how would | server would restrict a READ or WRITE operation, how would | |||
pNFS via the layout similarly restrict a corresponding input | pNFS via the layout similarly restrict a corresponding input | |||
or output operation? | or output operation? | |||
3. The author documents the new layout specification as an Internet | 3. The author documents the new layout specification as an Internet- | |||
Draft. | Draft. | |||
4. The author submits the Internet Draft for review through the IETF | 4. The author submits the Internet-Draft for review through the IETF | |||
standards process as defined in "Internet Official Protocol | standards process as defined in "The Internet Standards Process-- | |||
Standards" (STD 1). The new layout specification will be | Revision 3" (BCP 9). The new layout specification will be | |||
submitted for eventual publication as a standards track RFC. | submitted for eventual publication as a Standards Track RFC. | |||
5. The layout specification progresses through the IETF standards | 5. The layout specification progresses through the IETF standards | |||
process; the new option will be reviewed by the NFSv4 Working | process. | |||
Group (if that group still exists), or as an Internet Draft not | ||||
submitted by an IETF working group. | ||||
22.5. Path Variable Definitions | 22.5. Path Variable Definitions | |||
This section deals with the IANA considerations associated with the | This section deals with the IANA considerations associated with the | |||
variable substitution feature for location names as described in | variable substitution feature for location names as described in | |||
Section 11.10.3. As described there, variables subject to | Section 11.10.3. As described there, variables subject to | |||
substitution consist of a domain name and a specific name within that | substitution consist of a domain name and a specific name within that | |||
domain, with two separated by a colon. There are two sets of IANA | domain, with the two separated by a colon. There are two sets of | |||
considerations here: | IANA considerations here: | |||
1. The list of variable names. | 1. The list of variable names. | |||
2. For each variable name, the list of possible values. | 2. For each variable name, the list of possible values. | |||
Thus, there will be one registry for the list of variable names, and | Thus, there will be one registry for the list of variable names, and | |||
possibly one registry for listing the values of each variable name. | possibly one registry for listing the values of each variable name. | |||
22.5.1. Path Variables Registry | 22.5.1. Path Variables Registry | |||
IANA will create a registry called the "NFSv4 Path Variables | IANA created a registry called the "NFSv4 Path Variables Registry". | |||
Registry". | ||||
22.5.1.1. Path Variable Values | 22.5.1.1. Path Variable Values | |||
Variable names are of the form "${", followed by a domain name, | Variable names are of the form "${", followed by a domain name, | |||
followed by a colon (":"), followed by a domain-specific portion of | followed by a colon (":"), followed by a domain-specific portion of | |||
the variable name, followed by "}". When the domain name is | the variable name, followed by "}". When the domain name is | |||
"ietf.org" all variable names must be registered with IANA on a | "ietf.org", all variable names must be registered with IANA on a | |||
Standards Action basis, with Expert Review required. Path variables | Standards Action basis, with Expert Review required. Path variables | |||
with registered domain names neither part of nor equal to ietf.org | with registered domain names neither part of nor equal to ietf.org | |||
are assigned on a Hierarchical Allocation basis (delegating to the | are assigned on a Hierarchical Allocation basis (delegating to the | |||
domain owner) and thus of no concern to IANA, unless the domain owner | domain owner) and thus of no concern to IANA, unless the domain owner | |||
chooses to register a variable name from his domain. If the domain | chooses to register a variable name from his domain. If the domain | |||
owner chooses to do so, IANA will do so on a First Come First Served | owner chooses to do so, IANA will do so on a First Come First Served | |||
basis. To accommodate registrants who do not have their own domain, | basis. To accommodate registrants who do not have their own domain, | |||
IANA will accept requests to register variables with the prefix | IANA will accept requests to register variables with the prefix | |||
"${FCFS.ietf.org:" on a First Come First Served basis. Assignments | "${FCFS.ietf.org:" on a First Come First Served basis. Assignments | |||
on a First Come First Served basis do not require Expert Review, unless the | on a First Come First Served basis do not require Expert Review, unless the | |||
skipping to change at page 609, line 35 | skipping to change at page 612, line 40 | |||
1. The name of the variable. The name of this variable must start | 1. The name of the variable. The name of this variable must start | |||
with a "${" followed by a registered domain name, followed by | with a "${" followed by a registered domain name, followed by | |||
":", or it must start with "${FCFS.ietf.org". The name must be | ":", or it must start with "${FCFS.ietf.org". The name must be | |||
no more than 64 UTF-8 characters long. The name must be unique. | no more than 64 UTF-8 characters long. The name must be unique. | |||
2. For assignments made on Standards Action basis, the Standards | 2. For assignments made on Standards Action basis, the Standards | |||
Track RFC(s) that describe the variable. If the RFC(s) have not | Track RFC(s) that describe the variable. If the RFC(s) have not | |||
yet been published, the registrant will use RFCTBD1, RFCTBD2, | yet been published, the registrant will use RFCTBD1, RFCTBD2, | |||
etc. instead of an actual RFC number. Note that the RFCs do not | etc. instead of an actual RFC number. Note that the RFCs do not | |||
have to be a part of a NFS minor version. For assignments made | have to be a part of an NFS minor version. For assignments made | |||
on a First Come First Served basis, an explanation (consuming no | on a First Come First Served basis, an explanation (consuming no | |||
more than 1024 bytes, or more if IANA permits) of the purpose of | more than 1024 bytes, or more if IANA permits) of the purpose of | |||
the variable. A reference to the explanation can be substituted. | the variable. A reference to the explanation can be substituted. | |||
3. The point of contact, including an email address. The point of | 3. The point of contact, including an email address. The point of | |||
contact can consume up to 256 bytes (or more if IANA permits). | contact can consume up to 256 bytes (or more if IANA permits). | |||
For assignments made on a Standards Action basis, the point of | For assignments made on a Standards Action basis, the point of | |||
contact is always IESG. | contact is always IESG. | |||
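As an informal illustration of the naming rules above, the following Python sketch parses a variable name of the form "${domain:name}" and reports which assignment policy the text associates with its domain. It is a sketch under stated assumptions, not part of either document: the function names, the regular expression, the case-insensitive domain comparison, and counting "UTF-8 characters" as Unicode code points are all assumptions made for the example.

```python
import re

# Hypothetical helper names; nothing here is defined by the specification.
MAX_NAME_CHARS = 64  # "no more than 64 UTF-8 characters" (assumed: code points)

# "${" + domain + ":" + domain-specific portion + "}"
_NAME_RE = re.compile(r"\$\{([^:{}]+):([^{}]+)\}")


def parse_path_variable(name):
    """Return (domain, specific) for a '${domain:specific}' name.

    Raises ValueError if the name is too long or not of that form.
    """
    if len(name) > MAX_NAME_CHARS:
        raise ValueError("variable name exceeds 64 characters")
    match = _NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError("variable name is not of the form ${domain:name}")
    return match.group(1), match.group(2)


def assignment_policy(name):
    """Map a variable name to the assignment policy described in the text."""
    domain, _ = parse_path_variable(name)
    domain = domain.lower()  # assumption: domains compare case-insensitively
    if domain == "ietf.org":
        return "Standards Action"
    if domain == "fcfs.ietf.org":
        return "First Come First Served"
    if domain.endswith(".ietf.org"):
        # Other subdomains of ietf.org are not covered by the text shown here.
        return "Unspecified"
    return "Hierarchical Allocation"
```

For example, assignment_policy("${ietf.org:CPU_ARCH}") yields "Standards Action", while a name under a registrant's own domain yields "Hierarchical Allocation".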
22.5.1.1.1. Initial Registry | 22.5.1.1.1. Initial Registry | |||
The initial registry is in Table 19. | The initial registry is in Table 19. | |||
+------------------------+----------+------------------+ | +------------------------+----------+------------------+ | |||
| Variable Name | RFC | Point of Contact | | | Variable Name | RFC | Point of Contact | | |||
+------------------------+----------+------------------+ | +------------------------+----------+------------------+ | |||
| ${ietf.org:CPU_ARCH} | RFCTBD10 | IESG | | | ${ietf.org:CPU_ARCH} | RFC 5661 | IESG | | |||
| ${ietf.org:OS_TYPE} | RFCTBD10 | IESG | | | ${ietf.org:OS_TYPE} | RFC 5661 | IESG | | |||
| ${ietf.org:OS_VERSION} | RFCTBD10 | IESG | | | ${ietf.org:OS_VERSION} | RFC 5661 | IESG | | |||
+------------------------+----------+------------------+ | +------------------------+----------+------------------+ | |||
Table 19: Initial List of Path Variables | Table 19: Initial List of Path Variables | |||
IANA will need to create registries for the values of the variable | IANA has created registries for the values of the variable names | |||
names ${ietf.org:CPU_ARCH} and ${ietf.org:OS_TYPE}. See | ${ietf.org:CPU_ARCH} and ${ietf.org:OS_TYPE}. See Sections 22.5.2 | |||
Section 22.5.2 and Section 22.5.3. | and 22.5.3. | |||
For the values of the variable ${ietf.org:OS_VERSION}, no registry is | For the values of the variable ${ietf.org:OS_VERSION}, no registry is | |||
needed as the specifics of the values of the variable will vary with | needed as the specifics of the values of the variable will vary with | |||
the value of ${ietf.org:OS_TYPE}. Thus values for ${ietf.org: | the value of ${ietf.org:OS_TYPE}. Thus, values for ${ietf.org: | |||
OS_VERSION} are on a Hierarchical Allocation basis and are of no | OS_VERSION} are on a Hierarchical Allocation basis and are of no | |||
concern to IANA. | concern to IANA. | |||
22.5.1.1.2. Updating Registrations | 22.5.1.1.2. Updating Registrations | |||
The update of an assignment made on a Standards Action basis will | The update of an assignment made on a Standards Action basis will | |||
require IESG Approval on the advice of a Designated Expert. | require IESG Approval on the advice of a Designated Expert. | |||
The registrant can always updated the point of contact of an | The registrant can always update the point of contact of an | |||
assignment made on a First Come First Served basis. Any other update | assignment made on a First Come First Served basis. Any other update | |||
will require Expert Review. | will require Expert Review. | |||
22.5.2. Values for the ${ietf.org:CPU_ARCH} Variable | 22.5.2. Values for the ${ietf.org:CPU_ARCH} Variable | |||
IANA will create a registry called the "NFSv4 ${ietf.org:CPU_ARCH} | IANA created a registry called the "NFSv4 ${ietf.org:CPU_ARCH} Value | |||
Value Registry". | Registry". | |||
Assignments to the registry are made on a First Come First Served | Assignments to the registry are made on a First Come First Served | |||
basis. The zero length value of ${ietf.org:CPU_ARCH} is Reserved. | basis. The zero-length value of ${ietf.org:CPU_ARCH} is Reserved. | |||
Values with a prefix of "PRIV" are Reserved for Private Use. | Values with a prefix of "PRIV" are designated for Private Use. | |||
The registry is a list of assignments, each containing three fields. | The registry is a list of assignments, each containing three fields. | |||
1. A value of the ${ietf.org:CPU_ARCH} variable. The value must be | 1. A value of the ${ietf.org:CPU_ARCH} variable. The value must be | |||
1 to 32 UTF-8 characters long. The value must be unique. | 1 to 32 UTF-8 characters long. The value must be unique. | |||
2. An explanation (consuming no more than 1024 bytes, or more if | 2. An explanation (consuming no more than 1024 bytes, or more if | |||
IANA permits) of what CPU architecture the value denotes. A | IANA permits) of what CPU architecture the value denotes. A | |||
reference to the explanation can be substituted. | reference to the explanation can be substituted. | |||
3. The point of contact, including an email address. The point of | 3. The point of contact, including an email address. The point of | |||
contact can consume up to 256 bytes (or more if IANA permits). | contact can consume up to 256 bytes (or more if IANA permits). | |||
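The value rules above (and the identical rules stated for ${ietf.org:OS_TYPE} in Section 22.5.3) can be sketched as a small check. This is a hedged illustration only: the function names and return strings are hypothetical, "x86_64" is an example string and not a registered value, and treating "UTF-8 characters" as Unicode code points is an assumption. The byte limits are the default sizes named in the text, which IANA may choose to exceed.

```python
# Hypothetical names; the limits come from the registry field descriptions.
MAX_VALUE_CHARS = 32               # value must be 1 to 32 characters long
DEFAULT_EXPLANATION_BYTES = 1024   # "or more if IANA permits"
DEFAULT_CONTACT_BYTES = 256        # "or more if IANA permits"


def classify_value(value):
    """Classify a candidate CPU_ARCH or OS_TYPE value per the stated rules."""
    if value == "":
        return "Reserved"          # the zero-length value is Reserved
    if len(value) > MAX_VALUE_CHARS:
        raise ValueError("value exceeds 32 characters")
    if value.startswith("PRIV"):
        return "Private Use"       # "PRIV" prefix is set aside for Private Use
    return "Registrable"           # eligible for a registry assignment


def fits_default_limits(explanation, contact):
    """Check the explanation and contact fields against the default byte sizes.

    Sizes are measured in UTF-8 encoded bytes, not characters.
    """
    return (
        len(explanation.encode("utf-8")) <= DEFAULT_EXPLANATION_BYTES
        and len(contact.encode("utf-8")) <= DEFAULT_CONTACT_BYTES
    )
```

Note the distinction between the two limits: the value length is counted in characters, while the explanation and point-of-contact limits are counted in bytes, so non-ASCII text reaches the byte limit sooner.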
22.5.2.1. Initial Registry | 22.5.2.1. Initial Registry | |||
There is no initial registry. | There is no initial registry. | |||
22.5.2.2. Updating Registrations | 22.5.2.2. Updating Registrations | |||
The registrant is free to update the assignment, i.e. change the | The registrant is free to update the assignment, i.e., change the | |||
explanation and/or point of contact fields. | explanation and/or point-of-contact fields. | |||
22.5.3. Values for the ${ietf.org:OS_TYPE} Variable | 22.5.3. Values for the ${ietf.org:OS_TYPE} Variable | |||
IANA will create a registry called the "NFSv4 ${ietf.org:OS_TYPE} | IANA created a registry called the "NFSv4 ${ietf.org:OS_TYPE} Value | |||
Value Registry". | Registry". | |||
Assignments to the registry are made on a First Come First Served | Assignments to the registry are made on a First Come First Served | |||
basis. The zero length value of ${ietf.org:OS_TYPE} is Reserved. | basis. The zero-length value of ${ietf.org:OS_TYPE} is Reserved. | |||
Values with a prefix of "PRIV" are Reserved for Private Use. | Values with a prefix of "PRIV" are designated for Private Use. | |||
The registry is a list of assignments, each containing three fields. | The registry is a list of assignments, each containing three fields. | |||
1. A value of the ${ietf.org:OS_TYPE} variable. The value must be 1 | 1. A value of the ${ietf.org:OS_TYPE} variable. The value must be 1 | |||
to 32 UTF-8 characters long. The value must be unique. | to 32 UTF-8 characters long. The value must be unique. | |||
2. An explanation (consuming no more than 1024 bytes, or more if | 2. An explanation (consuming no more than 1024 bytes, or more if | |||
IANA permits) of what operating system the value denotes. A | IANA permits) of what operating system the value denotes. A | |||
reference to the explanation can be substituted. | reference to the explanation can be substituted. | |||
3. The point of contact, including an email address. The point of | 3. The point of contact, including an email address. The point of | |||
contact can consume up to 256 bytes (or more if IANA permits). | contact can consume up to 256 bytes (or more if IANA permits). | |||
22.5.3.1. Initial Registry | 22.5.3.1. Initial Registry | |||
There is no initial registry. | There is no initial registry. | |||
22.5.3.2. Updating Registrations | 22.5.3.2. Updating Registrations | |||
The registrant is free to update the assignment, i.e. change the | The registrant is free to update the assignment, i.e., change the | |||
explanation and/or point of contact fields. | explanation and/or point of contact fields. | |||
23. References | 23. References | |||
23.1. Normative References | 23.1. Normative References | |||
[1] Bradner, S., "Key words for use in RFCs to Indicate Requirement | [1] Bradner, S., "Key words for use in RFCs to Indicate Requirement | |||
Levels", RFC 2119, March 1997. | Levels", BCP 14, RFC 2119, March 1997. | |||
[2] Eisler, M., "XDR: External Data Representation Standard", | [2] Eisler, M., Ed., "XDR: External Data Representation Standard", | |||
STD 67, RFC 4506, May 2006. | STD 67, RFC 4506, May 2006. | |||
[3] Srinivasan, R., "RPC: Remote Procedure Call Protocol | [3] Thurlow, R., "RPC: Remote Procedure Call Protocol Specification | |||
Specification Version 2", RFC 1831, August 1995. | Version 2", RFC 5531, May 2009. | |||
[4] Eisler, M., Chiu, A., and L. Ling, "RPCSEC_GSS Protocol | [4] Eisler, M., Chiu, A., and L. Ling, "RPCSEC_GSS Protocol | |||
Specification", RFC 2203, September 1997. | Specification", RFC 2203, September 1997. | |||
[5] Zhu, L., Jaganathan, K., and S. Hartman, "The Kerberos Version | [5] Zhu, L., Jaganathan, K., and S. Hartman, "The Kerberos Version | |||
5 Generic Security Service Application Program Interface (GSS- | 5 Generic Security Service Application Program Interface (GSS- | |||
API) Mechanism Version 2", RFC 4121, July 2005. | API) Mechanism Version 2", RFC 4121, July 2005. | |||
[6] The Open Group, "Section 3.191 of Chapter 3 of Base Definitions | [6] The Open Group, "Section 3.191 of Chapter 3 of Base Definitions | |||
of The Open Group Base Specifications Issue 6 IEEE Std 1003.1, | of The Open Group Base Specifications Issue 6 IEEE Std 1003.1, | |||
2004 Edition, HTML Version (www.opengroup.org), ISBN | 2004 Edition, HTML Version (www.opengroup.org), ISBN | |||
1931624232", 2004. | 1931624232", 2004. | |||
[7] Linn, J., "Generic Security Service Application Program | [7] Linn, J., "Generic Security Service Application Program | |||
Interface Version 2, Update 1", RFC 2743, January 2000. | Interface Version 2, Update 1", RFC 2743, January 2000. | |||
[8] Talpey, T. and B. Callaghan, "Remote Direct Memory Access | [8] Talpey, T. and B. Callaghan, "Remote Direct Memory Access | |||
Transport for Remote Procedure Call", | Transport for Remote Procedure Call", RFC 5666, October 2009. | |||
draft-ietf-nfsv4-rpcrdma-09 (work in progress), December 2008. | ||||
[9] Talpey, T., Callaghan, B., and I. Property, "NFS Direct Data | [9] Talpey, T. and B. Callaghan, "Network File System (NFS) Direct | |||
Placement", draft-ietf-nfsv4-nfsdirect-08 (work in progress), | Data Placement", RFC 5667, October 2009. | |||
April 2008. | ||||
[10] Recio, P., Metzler, B., Culley, P., Hilland, J., and D. Garcia, | [10] Recio, R., Metzler, B., Culley, P., Hilland, J., and D. Garcia, | |||
"A Remote Direct Memory Access Protocol Specification", | "A Remote Direct Memory Access Protocol Specification", | |||
RFC 5040, October 2007. | RFC 5040, October 2007. | |||
[11] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-Hashing | [11] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-Hashing | |||
for Message Authentication", RFC 2104, February 1997. | for Message Authentication", RFC 2104, February 1997. | |||
[12] Eisler, M., "RPCSEC_GSS Version 2", RFC 5403, February 2009. | [12] Eisler, M., "RPCSEC_GSS Version 2", RFC 5403, February 2009. | |||
[13] Shepler, S., Eisler, M., and D. Noveck, "NFSv4 Minor Version 1 | [13] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed., "Network | |||
XDR Description", draft-ietf-nfsv4-minorversion1-dot-x-12 (work | File System (NFS) Version 4 Minor Version 1 External Data | |||
in progress), Dec 2008. | Representation Standard (XDR) Description", RFC 5662, | |||
October 2009. | ||||
[14] The Open Group, "Section 3.372 of Chapter 3 of Base Definitions | [14] The Open Group, "Section 3.372 of Chapter 3 of Base Definitions | |||
of The Open Group Base Specifications Issue 6 IEEE Std 1003.1, | of The Open Group Base Specifications Issue 6 IEEE Std 1003.1, | |||
2004 Edition, HTML Version (www.opengroup.org), ISBN | 2004 Edition, HTML Version (www.opengroup.org), ISBN | |||
1931624232", 2004. | 1931624232", 2004. | |||
[15] Eisler, M., "IANA Considerations for RPC Net Identifiers and | [15] Eisler, M., "IANA Considerations for Remote Procedure Call | |||
Universal Address Formats", draft-ietf-nfsv4-rpc-netid-04 (work | (RPC) Network Identifiers and Universal Address Formats", | |||
in progress), December 2008. | RFC 5665, October 2009. | |||
[16] The Open Group, "Section 'read()' of System Interfaces of The | [16] The Open Group, "Section 'read()' of System Interfaces of The | |||
Open Group Base Specifications Issue 6 IEEE Std 1003.1, 2004 | Open Group Base Specifications Issue 6 IEEE Std 1003.1, 2004 | |||
Edition, HTML Version (www.opengroup.org), ISBN 1931624232", | Edition, HTML Version (www.opengroup.org), ISBN 1931624232", | |||
2004. | 2004. | |||
[17] The Open Group, "Section 'readdir()' of System Interfaces of | [17] The Open Group, "Section 'readdir()' of System Interfaces of | |||
The Open Group Base Specifications Issue 6 IEEE Std 1003.1, | The Open Group Base Specifications Issue 6 IEEE Std 1003.1, | |||
2004 Edition, HTML Version (www.opengroup.org), ISBN | 2004 Edition, HTML Version (www.opengroup.org), ISBN | |||
1931624232", 2004. | 1931624232", 2004. | |||
skipping to change at page 614, line 49 | skipping to change at page 617, line 47 | |||
[32] Eisler, M., "LIPKEY - A Low Infrastructure Public Key Mechanism | [32] Eisler, M., "LIPKEY - A Low Infrastructure Public Key Mechanism | |||
Using SPKM", RFC 2847, June 2000. | Using SPKM", RFC 2847, June 2000. | |||
[33] Eisler, M., "NFS Version 2 and Version 3 Security Issues and | [33] Eisler, M., "NFS Version 2 and Version 3 Security Issues and | |||
the NFS Protocol's Use of RPCSEC_GSS and Kerberos V5", | the NFS Protocol's Use of RPCSEC_GSS and Kerberos V5", | |||
RFC 2623, June 1999. | RFC 2623, June 1999. | |||
[34] Juszczak, C., "Improving the Performance and Correctness of an | [34] Juszczak, C., "Improving the Performance and Correctness of an | |||
NFS Server", USENIX Conference Proceedings, June 1990. | NFS Server", USENIX Conference Proceedings, June 1990. | |||
[35] Reynolds, J., "Assigned Numbers: RFC 1700 is Replaced by an On- | [35] Reynolds, J., Ed., "Assigned Numbers: RFC 1700 is Replaced by | |||
line Database", RFC 3232, January 2002. | an On-line Database", RFC 3232, January 2002. | |||
[36] Srinivasan, R., "Binding Protocols for ONC RPC Version 2", | [36] Srinivasan, R., "Binding Protocols for ONC RPC Version 2", | |||
RFC 1833, August 1995. | RFC 1833, August 1995. | |||
[37] Werme, R., "RPC XID Issues", USENIX Conference Proceedings, | [37] Werme, R., "RPC XID Issues", USENIX Conference Proceedings, | |||
February 1996. | February 1996. | |||
[38] Nowicki, B., "NFS: Network File System Protocol specification", | [38] Nowicki, B., "NFS: Network File System Protocol specification", | |||
RFC 1094, March 1989. | RFC 1094, March 1989. | |||
[39] Bhide, A., Elnozahy, E., and S. Morgan, "A Highly Available | [39] Bhide, A., Elnozahy, E., and S. Morgan, "A Highly Available | |||
Network Server", USENIX Conference Proceedings, January 1991. | Network Server", USENIX Conference Proceedings, January 1991. | |||
[40] Halevy, B., Welch, B., and J. Zelenka, "Object-based pNFS | [40] Halevy, B., Welch, B., and J. Zelenka, "Object-Based Parallel | |||
Operations", draft-ietf-nfsv4-pnfs-obj-11 (work in progress), | NFS (pNFS) Operations", RFC 5664, October 2009. | |||
December 2008. | ||||
[41] Black, D., Fridella, S., and J. Glasgow, "pNFS Block/Volume | [41] Black, D., Glasgow, J., and S. Fridella, "Parallel NFS (pNFS) | |||
Layout", draft-ietf-nfsv4-pnfs-block-11 (work in progress), | Block/Volume Layout", RFC 5663, October 2009. | |||
December 2008. | ||||
[42] Callaghan, B., "WebNFS Client Specification", RFC 2054, | [42] Callaghan, B., "WebNFS Client Specification", RFC 2054, | |||
October 1996. | October 1996. | |||
[43] Callaghan, B., "WebNFS Server Specification", RFC 2055, | [43] Callaghan, B., "WebNFS Server Specification", RFC 2055, | |||
October 1996. | October 1996. | |||
[44] IESG, "IESG Processing of RFC Errata for the IETF Stream", | [44] IESG, "IESG Processing of RFC Errata for the IETF Stream", | |||
July 2008. | July 2008. | |||
skipping to change at page 616, line 22 | skipping to change at page 619, line 18 | |||
[53] Callaghan, B., "NFS URL Scheme", RFC 2224, October 1997. | [53] Callaghan, B., "NFS URL Scheme", RFC 2224, October 1997. | |||
[54] Chiu, A., Eisler, M., and B. Callaghan, "Security Negotiation | [54] Chiu, A., Eisler, M., and B. Callaghan, "Security Negotiation | |||
for WebNFS", RFC 2755, January 2000. | for WebNFS", RFC 2755, January 2000. | |||
[55] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA | [55] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA | |||
Considerations Section in RFCs", BCP 26, RFC 5226, May 2008. | Considerations Section in RFCs", BCP 26, RFC 5226, May 2008. | |||
Appendix A. Acknowledgments | Appendix A. Acknowledgments | |||
The initial drafts for the SECINFO extensions were edited by Mike | The initial text for the SECINFO extensions was edited by Mike | |||
Eisler with contributions from Peng Dai, Sergey Klyushin, and Carl | Eisler with contributions from Peng Dai, Sergey Klyushin, and Carl | |||
Burnett. | Burnett. | |||
The initial drafts for the SESSIONS extensions were edited by Tom | The initial text for the SESSIONS extensions was edited by Tom | |||
Talpey, Spencer Shepler, and Jon Bauman with contributions from Charles | Talpey, Spencer Shepler, and Jon Bauman with contributions from Charles | |||
Antonelli, Brent Callaghan, Mike Eisler, John Howard, Chet Juszczak, | Antonelli, Brent Callaghan, Mike Eisler, John Howard, Chet Juszczak, | |||
Trond Myklebust, Dave Noveck, John Scott, Mike Stolarchuk and Mark | Trond Myklebust, Dave Noveck, John Scott, Mike Stolarchuk, and Mark | |||
Wittle. | Wittle. | |||
Initial drafts relating to multi-server namespace features, including | Initial text relating to multi-server namespace features, including | |||
the concept of referrals, were contributed by Dave Noveck, Carl | the concept of referrals, was contributed by Dave Noveck, Carl | |||
Burnett, and Charles Fan with contributions from Ted Anderson, Neil | Burnett, and Charles Fan with contributions from Ted Anderson, Neil | |||
Brown, and Jon Haswell. | Brown, and Jon Haswell. | |||
The initial drafts for the Directory Delegations support were | The initial text for the Directory Delegations support was | |||
contributed by Saadia Khan with input from Dave Noveck, Mike Eisler, | contributed by Saadia Khan with input from Dave Noveck, Mike Eisler, | |||
Carl Burnett, Ted Anderson and Tom Talpey. | Carl Burnett, Ted Anderson, and Tom Talpey. | |||
The initial drafts for the ACL explanations were contributed by Sam | The initial text for the ACL explanations was contributed by Sam | |||
Falkner and Lisa Week. | Falkner and Lisa Week. | |||
The pNFS work was inspired by the NASD and OSD work done by Garth | The pNFS work was inspired by the NASD and OSD work done by Garth | |||
Gibson. Gary Grider has also been a champion of high-performance | Gibson. Gary Grider has also been a champion of high-performance | |||
parallel I/O. Garth Gibson and Peter Corbett started the pNFS effort | parallel I/O. Garth Gibson and Peter Corbett started the pNFS effort | |||
with a problem statement document for IETF that formed the basis for | with a problem statement document for the IETF that formed the basis | |||
the pNFS work in NFSv4.1. | for the pNFS work in NFSv4.1. | |||
The initial drafts for the parallel NFS support were edited by Brent | The initial text for the parallel NFS support was edited by Brent | |||
Welch and Garth Goodson. Additional authors for those documents were | Welch and Garth Goodson. Additional authors for those documents were | |||
Benny Halevy, David Black, and Andy Adamson. Additional input came | Benny Halevy, David Black, and Andy Adamson. Additional input came | |||
from the informal group which contributed to the construction of the | from the informal group that contributed to the construction of the | |||
initial pNFS drafts; specific acknowledgement goes to Gary Grider, | initial pNFS drafts; specific acknowledgment goes to Gary Grider, | |||
Peter Corbett, Dave Noveck, Peter Honeyman, and Stephen Fridella. | Peter Corbett, Dave Noveck, Peter Honeyman, and Stephen Fridella. | |||
Fredric Isaman found several errors in draft versions of the ONC RPC | Fredric Isaman found several errors in draft versions of the ONC RPC | |||
XDR description of the NFSv4.1 protocol. | XDR description of the NFSv4.1 protocol. | |||
Audrey Van Belleghem provided, in numerous ways, essential co- | Audrey Van Belleghem provided, in numerous ways, essential co- | |||
ordination and management of the process of editing the specification | ordination and management of the process of editing the specification | |||
drafts. | documents. | |||
Richard Jernigan gave feedback on the file layout's striping pattern | Richard Jernigan gave feedback on the file layout's striping pattern | |||
design. | design. | |||
Several formal inspection teams were formed to review various areas | Several formal inspection teams were formed to review various areas | |||
of the protocol. All the inspections found significant errors and | of the protocol. All the inspections found significant errors and | |||
room for improvement. NFSv4.1's inspection teams were: | room for improvement. NFSv4.1's inspection teams were: | |||
o ACLs, with the following inspectors: Sam Falkner, Bruce Fields, | o ACLs, with the following inspectors: Sam Falkner, Bruce Fields, | |||
Rahul Iyer, Saadia Khan, Dave Noveck, Lisa Week, Mario Wurzl, and | Rahul Iyer, Saadia Khan, Dave Noveck, Lisa Week, Mario Wurzl, and | |||
skipping to change at page 617, line 38 | skipping to change at page 620, line 34 | |||
Doeppner, Robert Gordon, Benny Halevy, Fredric Isaman, Rick | Doeppner, Robert Gordon, Benny Halevy, Fredric Isaman, Rick | |||
Macklem, Trond Myklebust, Dave Noveck, Karen Rochford, John Scott, | Macklem, Trond Myklebust, Dave Noveck, Karen Rochford, John Scott, | |||
and Peter Shah. | and Peter Shah. | |||
o Initial pNFS inspection, with the following inspectors: Andy | o Initial pNFS inspection, with the following inspectors: Andy | |||
Adamson, David Black, Mike Eisler, Marc Eshel, Sam Falkner, Garth | Adamson, David Black, Mike Eisler, Marc Eshel, Sam Falkner, Garth | |||
Goodson, Benny Halevy, Rahul Iyer, Trond Myklebust, Spencer | Goodson, Benny Halevy, Rahul Iyer, Trond Myklebust, Spencer | |||
Shepler, and Lisa Week. | Shepler, and Lisa Week. | |||
o Global namespace, with the following inspectors: Mike Eisler, Dan | o Global namespace, with the following inspectors: Mike Eisler, Dan | |||
Ellard, Craig Everhart, Fred Isaman, Trond Myklebust, Dave Noveck, | Ellard, Craig Everhart, Fredric Isaman, Trond Myklebust, Dave | |||
Theresa Raj, Spencer Shepler, Renu Tewari, and Robert Thurlow. | Noveck, Theresa Raj, Spencer Shepler, Renu Tewari, and Robert | |||
Thurlow. | ||||
o NFSv4.1 file layout type, with the following inspectors: Andy | o NFSv4.1 file layout type, with the following inspectors: Andy | |||
Adamson, Marc Eshel, Sam Falkner, Garth Goodson, Rahul Iyer, Trond | Adamson, Marc Eshel, Sam Falkner, Garth Goodson, Rahul Iyer, Trond | |||
Myklebust, and Lisa Week. | Myklebust, and Lisa Week. | |||
o NFSv4.1 locking and directory delegations, with the following | o NFSv4.1 locking and directory delegations, with the following | |||
inspectors: Mike Eisler, Pranoop Erasani, Robert Gordon, Saadia | inspectors: Mike Eisler, Pranoop Erasani, Robert Gordon, Saadia | |||
Khan, Eric Kustarz, Dave Noveck, Spencer Shepler, and Amy Weaver. | Khan, Eric Kustarz, Dave Noveck, Spencer Shepler, and Amy Weaver. | |||
o EXCHANGE_ID and DESTROY_CLIENTID, with the following inspectors: | o EXCHANGE_ID and DESTROY_CLIENTID, with the following inspectors: | |||
Mike Eisler, Pranoop Erasani, Robert Gordon, Benny Halevy, Fred | Mike Eisler, Pranoop Erasani, Robert Gordon, Benny Halevy, Fredric | |||
Isaman, Saadia Khan, Ricardo Labiaga, Rick Macklem, Trond | Isaman, Saadia Khan, Ricardo Labiaga, Rick Macklem, Trond | |||
Myklebust, Spencer Shepler, and Brent Welch. | Myklebust, Spencer Shepler, and Brent Welch. | |||
o Final pNFS inspection, with the following inspectors: Andy | o Final pNFS inspection, with the following inspectors: Andy | |||
Adamson, Mike Eisler, Marc Eshel, Sam Falkner, Jason Glasgow, | Adamson, Mike Eisler, Marc Eshel, Sam Falkner, Jason Glasgow, | |||
Garth Goodson, Robert Gordon, Benny Halevy, Dean Hildebrand, Rahul | Garth Goodson, Robert Gordon, Benny Halevy, Dean Hildebrand, Rahul | |||
Iyer, Suchit Kaura, Trond Myklebust, Anatoly Pinchuk, Spencer | Iyer, Suchit Kaura, Trond Myklebust, Anatoly Pinchuk, Spencer | |||
Shepler, Renu Tewari, Lisa Week, and Brent Welch. | Shepler, Renu Tewari, Lisa Week, and Brent Welch. | |||
A review team worked together to generate the tables of assignments | A review team worked together to generate the tables of assignments | |||
of error sets to operations and make sure that each such assignment | of error sets to operations and make sure that each such assignment | |||
had two or more people validating it. Participating in the process | had two or more people validating it. Participating in the process | |||
were: Andy Adamson, Mike Eisler, Sam Falkner, Garth Goodson, Robert | were Andy Adamson, Mike Eisler, Sam Falkner, Garth Goodson, Robert | |||
Gordon, Trond Myklebust, Dave Noveck, Spencer Shepler, Tom Talpey, | Gordon, Trond Myklebust, Dave Noveck, Spencer Shepler, Tom Talpey, | |||
Amy Weaver, and Lisa Week. | Amy Weaver, and Lisa Week. | |||
Jari Arkko, David Black, Scott Bradner, Lisa Dusseault, Lars Eggert, | Jari Arkko, David Black, Scott Bradner, Lisa Dusseault, Lars Eggert, | |||
Chris Newman, and Tim Polk provided valuable review and guidance. | Chris Newman, and Tim Polk provided valuable review and guidance. | |||
Olga Kornievskaia found several errors in the SSV specification. | Olga Kornievskaia found several errors in the SSV specification. | |||
Ricardo Labiaga found several places where the use of RPCSEC_GSS was | Ricardo Labiaga found several places where the use of RPCSEC_GSS was | |||
underspecified. | underspecified. | |||
Those who provided miscellaneous comments include: Andy Adamson, | Those who provided miscellaneous comments include: Andy Adamson, | |||
Sunil Bhargo, Alex Burlyga, Pranoop Erasani, Bruce Fields, Vadim | Sunil Bhargo, Alex Burlyga, Pranoop Erasani, Bruce Fields, Vadim | |||
Finkelstein, Jason Goldschmidt, Vijay K. Gurbani, Sergey Klyushin, | Finkelstein, Jason Goldschmidt, Vijay K. Gurbani, Sergey Klyushin, | |||
Ricardo Labiaga, James Lentini, Anshul Madan, Daniel Muntz, Daniel | Ricardo Labiaga, James Lentini, Anshul Madan, Daniel Muntz, Daniel | |||
Picken, Archana Ramani, Jim Rees, Mahesh Siddheshwar, Tom Talpey, and | Picken, Archana Ramani, Jim Rees, Mahesh Siddheshwar, Tom Talpey, and | |||
Peter Varga. | Peter Varga. | |||
Appendix B. RFC Editor Notes | ||||
[RFC Editor: please remove this section prior to publishing this | ||||
document as an RFC] | ||||
[RFC Editor: prior to publishing this document as an RFC, please | ||||
replace all occurrences of RFCTBD10 with RFCxxxx where xxxx is the | ||||
RFC number of this document] | ||||
[RFC Editor: prior to publishing this document as an RFC, please | ||||
replace all occurrences of RFCTBD20 with RFCyyyy where yyyy is the | ||||
RFC number of the document referenced in [41]] | ||||
[RFC Editor: prior to publishing this document as an RFC, please | ||||
replace all occurrences of RFCTBD30 with RFCzzzz where zzzz is the | ||||
RFC number of the document referenced in [40]] | ||||
[RFC Editor: prior to publishing this document as an RFC, please | ||||
ensure all section references to [15], including the reference from | ||||
Section 3.3.9 are accurate if document referenced by [15] has been | ||||
finalized for RFC publication. If not finalized for publication, | ||||
please remove section number references to [15].] | ||||
Authors' Addresses | Authors' Addresses | |||
Spencer Shepler | Spencer Shepler (editor) | |||
Storspeed, Inc. | Storspeed, Inc. | |||
7808 Moonflower Drive | 7808 Moonflower Drive | |||
Austin, TX 78750 | Austin, TX 78750 | |||
USA | USA | |||
Phone: +1-512-402-5811 ext 8530 | Phone: +1-512-402-5811 ext 8530 | |||
Email: shepler@storspeed.com | EMail: shepler@storspeed.com | |||
Mike Eisler | Mike Eisler (editor) | |||
NetApp | NetApp | |||
5765 Chase Point Circle | 5765 Chase Point Circle | |||
Colorado Springs, CO 80919 | Colorado Springs, CO 80919 | |||
USA | USA | |||
Phone: +1-719-599-9026 | Phone: +1-719-599-9026 | |||
Email: mike@eisler.com | EMail: mike@eisler.com | |||
URI: http://www.eisler.com | URI: http://www.eisler.com | |||
David Noveck | David Noveck (editor) | |||
NetApp | NetApp | |||
1601 Trapelo Road, Suite 16 | 1601 Trapelo Road, Suite 16 | |||
Waltham, MA 02451 | Waltham, MA 02451 | |||
USA | USA | |||
Phone: +1-781-768-5347 | Phone: +1-781-768-5347 | |||
Email: dnoveck@netapp.com | EMail: dnoveck@netapp.com | |||
End of changes. 2232 change blocks. | ||||
5540 lines changed or deleted | 5662 lines changed or added | |||