No self-respecting Fibre Channel blog series would be complete without a detailed discussion of zoning and its best practices. That is precisely the topic of this final blog in the series: we will define what zoning is, explain why it is needed, and cover some potential strategies and best practices.
What is Zoning?
Fibre Channel Zoning is a means of segregating a Fibre Channel switch in order to provide security and minimise interference between devices. In practice, the concept is no more difficult than the simple Venn diagrams that you probably drew in Primary/Elementary school:
If two objects are in the same zone then they are able to communicate with one another; if they are not, they cannot. This forms a basic level of security, managed at the Fibre Channel switch, that restricts each device to communicating only with the devices it is allowed to reach. As we have seen in the previous blogs, another level of security is LUN Masking, where the Volume/LUN is mapped only to the Initiator Hosts that are allowed to access it, managed at the storage array. The two methods are independent, and both are highly recommended in all Fibre Channel storage deployments. A fabric can function with little or no zoning implemented; however, as the SAN grows over time and devices are added, contention and interaction between fabric elements can cause issues (devices failing, intermittent problems and, in certain cases, devices being inappropriately accessed). Zoning provides an effective method to maintain data integrity and heighten data security.
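As a minimal sketch of the rule above, the Python below models zones as sets of member names and checks whether two devices share at least one zone. The zone and device names are hypothetical, and a real switch applies this rule inside the fabric rather than in host code.

```python
from typing import Dict, Set

# Hypothetical zone table: zone name -> set of member names (illustrative only).
ZONES: Dict[str, Set[str]] = {
    "zone_a": {"host1", "array1"},
    "zone_b": {"host2", "array1"},
}


def can_communicate(a: str, b: str, zones: Dict[str, Set[str]]) -> bool:
    """Return True if a and b are both members of at least one common zone."""
    return any(a in members and b in members for members in zones.values())


# host1 and array1 share zone_a, so they can talk to each other.
host1_to_array = can_communicate("host1", "array1", ZONES)
# host1 and host2 overlap only through array1, so they cannot see each other.
host1_to_host2 = can_communicate("host1", "host2", ZONES)
```

Note that the two zones overlap on array1, just like overlapping circles in a Venn diagram, without giving the two hosts access to each other.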
Why is Zoning Important?
As mentioned above, there are a number of reasons why zoning is implemented: Management, Security and Segregation are all primary reasons why zones should always be implemented. For large Enterprise SANs, however, there are also a number of technical reasons why the zoning policy deserves careful consideration:
Registered State Change Notification (RSCN)
All devices within a SAN need to be informed of changes to the environment (i.e. when nodes log in to and out of the fabric). Such changes are referred to as topology changes, and each host requires notification of the nature of the change, as it may mean that a new device has become available or that an existing device is no longer present. Topology changes are communicated to each device by a Registered State Change Notification (RSCN). Each time an event occurs on the SAN, an RSCN is issued and each device must pause processing, receive the RSCN, reply and then continue with its processing. In busy fabrics, or fabrics with misbehaving nodes, processing the RSCNs can introduce unwanted overhead. In some severe cases, this can cause a device to report a 'busy' status in response to a request or to the RSCN itself, creating problems which then need to be managed or corrected and potentially causing a cascading effect as nodes log in to and out of the fabric in an attempt to recover.
In fabrics where no zoning is employed, any topology change causes RSCNs to be processed by every device within the fabric. In a zoned fabric, only devices that are in the same zone as the changed device receive the RSCN, limiting the impact of RSCNs and the exposure to intermittent failures and issues.
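The difference in RSCN scope can be illustrated with a small Python model. This is a deliberately simplified sketch: the device and zone names are made up, and it assumes that an empty zone table means "everyone is notified", which is how the text above describes an unzoned fabric.

```python
from typing import Dict, Iterable, Set


def rscn_recipients(changed: str,
                    devices: Iterable[str],
                    zones: Dict[str, Set[str]]) -> Set[str]:
    """Return the devices notified when `changed` logs in or out.

    Simplified model: with no zones defined, every other device is notified;
    otherwise only devices sharing a zone with `changed` are notified.
    """
    if not zones:
        return set(devices) - {changed}
    peers: Set[str] = set()
    for members in zones.values():
        if changed in members:
            peers |= members
    return peers - {changed}


# Hypothetical fabric with three hosts and one array.
devices = ["host1", "host2", "host3", "array1"]
zones = {"z_host1": {"host1", "array1"}, "z_host2": {"host2", "array1"}}

unzoned = rscn_recipients("host1", devices, {})   # every other device
zoned = rscn_recipients("host1", devices, zones)  # only array1
```

In this toy fabric a change to host1 interrupts three devices when unzoned but only one when zoned, and the gap widens as the fabric grows.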
During device discovery, a host queries the fabric to find out which targets are available and which should be probed for devices. In an unzoned fabric this leads to longer discovery times, as each target needs to be queried, resulting in increased SAN traffic (each host polls every available target, regardless of whether the target has any devices associated with that host). Zoning limits this impact, as only targets in the same zone as the initiator are probed, reducing SAN traffic and device discovery time at boot.
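A rough way to see the discovery saving is to count initiator-to-target probes. The sketch below assumes one probe per initiator/target pair and uses hypothetical names; real discovery involves more steps, so treat the numbers as relative rather than absolute.

```python
from typing import Dict, Set


def probes_unzoned(initiators: Set[str], targets: Set[str]) -> int:
    """Unzoned: every initiator probes every target."""
    return len(initiators) * len(targets)


def probes_zoned(zones: Dict[str, Set[str]],
                 initiators: Set[str],
                 targets: Set[str]) -> int:
    """Zoned: an initiator probes only targets that share a zone with it."""
    pairs = set()
    for members in zones.values():
        for ini in members & initiators:
            for tgt in members & targets:
                pairs.add((ini, tgt))  # a set avoids double counting overlaps
    return len(pairs)


initiators = {"host1", "host2", "host3"}
targets = {"tgt1", "tgt2", "tgt3", "tgt4"}
zones = {
    "z_host1": {"host1", "tgt1", "tgt2"},
    "z_host2": {"host2", "tgt1", "tgt2"},
    "z_host3": {"host3", "tgt3", "tgt4"},
}

without_zoning = probes_unzoned(initiators, targets)    # 3 * 4 = 12
with_zoning = probes_zoned(zones, initiators, targets)  # 2 + 2 + 2 = 6
```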
Zoning provides a logical method of dividing and restricting access between a number of nodes. By introducing zoning, security is increased, as nodes can only communicate with the defined set of nodes that are in the same zone.
Various Zoning Strategies
When considering appropriate zoning strategies, a number of elements should be taken into account: port versus World Wide Name zoning, hard versus soft zoning, zone granularity and effective management of the zones. Each of these considerations is explored further below:
Zones are easily defined. They are simply a collection of World Wide Names (WWNs) or port addresses that are grouped together to allow or deny access between the grouped devices. A collection of zones is called a zoneset. Each zoneset can contain many zones, which can be kept separate or overlap with one another.
The granularity at which zones are applied depends on what you are trying to achieve; common strategies involve zoning by Application, Operating System or even Business Unit. Another common approach (especially in disk fabrics) is to have a separate zone for each initiator with its set of targets. There is obviously a trade-off between the benefits that zoning brings and the effort of managing the zones.
It is recommended that the SAN administrator create a zone for each server initiator. In addition to the initiator, the set of targets which that host accesses should be grouped into the same zone (i.e. one zone for each server HBA, containing all of the disk devices it will communicate with); this is often referred to as Single Initiator Zoning. This is recommended practice for Nimble devices (and for the majority of storage vendors). Below is an example of Single Initiator Zones across two fabrics:
Smaller, more granular zones are more difficult to manage; however, they limit the exposure to RSCNs and reduce overall device discovery. Methods for effectively managing large zone configurations are discussed later.
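Because Single Initiator Zones follow a fixed pattern, they lend themselves to being generated rather than typed by hand. The sketch below is one possible way to do that in Python; the `z_` prefix, the member names and the one-HBA-port-per-fabric layout are all assumptions made for illustration, not a Nimble or switch-vendor convention.

```python
from typing import Dict, List


def single_initiator_zones(access: Dict[str, List[str]],
                           prefix: str = "z_") -> Dict[str, List[str]]:
    """Build one zone per initiator: the initiator first, then its targets."""
    zones: Dict[str, List[str]] = {}
    for initiator, targets in sorted(access.items()):
        zones[f"{prefix}{initiator}"] = [initiator] + sorted(targets)
    return zones


# Hypothetical layout: the host has one HBA port cabled to each fabric, and
# each fabric gets its own independent set of zones.
fabric_a = single_initiator_zones({"node3_hba0": ["array_tgt2", "array_tgt1"]})
fabric_b = single_initiator_zones({"node3_hba1": ["array_tgt4", "array_tgt3"]})
```

Generating zones this way keeps the number of zones equal to the number of initiator ports, which makes the management trade-off described above easier to live with.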
Port v WWPN Zoning
There are two types of zoning: WWPN zoning and Port zoning.
WWPN zoning uses the name server database located in the Fibre Channel switch. The name server database stores port numbers and World Wide Port Names (WWPNs), which are used to identify devices during the zoning and login process. When a zone change is made, the devices in the database receive a Registered State Change Notification (RSCN). Each device must correctly act on the RSCN to update the affected communication paths. Any device that does not correctly act on the RSCN, yet continues to transfer data to a specific device after a zoning change, will be blocked from communicating with its targeted device. As its name suggests, WWPN zoning consists simply of creating the zone from the WWPNs of all the devices that communicate with one another.
Below is an example of a WWPN Zone:
In this example, the Emulex WWPN is one of my Node3's HBA ports, and it is grouped with each of the four Nimble Storage target WWPNs in the WWPNZone_Node3 zone. This is an example of a Single Initiator Zone.
Note: In many environments there are typically dual redundant fabrics, so a second set of switches with independent zone configs will need to be configured.
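Since a WWPN zone is only as good as the WWPNs typed into it, it can help to validate them before they reach the switch. The sketch below relies on the usual notation of a WWPN as eight colon-separated hexadecimal bytes; the WWPN values and the helper names are placeholders invented for this example, not real Emulex or Nimble addresses.

```python
import re
from typing import Iterable, List

# A WWPN is a 64-bit identifier, conventionally written as eight
# colon-separated hexadecimal bytes.
WWPN_PATTERN = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){7}$")


def normalise_wwpn(raw: str) -> str:
    """Lower-case and validate a WWPN, raising ValueError if malformed."""
    value = raw.strip().lower()
    if not WWPN_PATTERN.match(value):
        raise ValueError(f"not a valid WWPN: {raw!r}")
    return value


def build_wwpn_zone(initiator: str, targets: Iterable[str]) -> List[str]:
    """Return the zone members: the initiator, then each distinct target."""
    members = [normalise_wwpn(initiator)]
    for target in targets:
        wwpn = normalise_wwpn(target)
        if wwpn not in members:
            members.append(wwpn)
    return members


# Placeholder WWPNs, made up for illustration.
zone_members = build_wwpn_zone(
    "10:00:00:00:00:00:00:01",
    ["50:00:00:00:00:00:00:0A", "50:00:00:00:00:00:00:0b"],
)
```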
Port zoning requires each device to pass through the switch’s route table so that the switch can regulate the data transfers. For example, if two ports are not authorized to communicate with each other, the route table for those ports is disabled, and the communication between those ports is blocked. Port zoning does not require the WWPN to be specified, only the physical switch ports to which a host and its related devices are connected.
Below is an example of a Port Zone:
In the example above, there are no WWPNs defined at all; only the physical switch ports that are allowed to communicate with one another are listed.
There are pros and cons to each method. As WWPN zoning is defined via the WWPN, each time an HBA is changed the zoning configuration needs to be updated to reflect the change. In addition, redeploying HBAs can sometimes allow unauthorised access to devices unless the zones are kept up to date. Port zoning requires that servers and devices are physically restricted to the ports to which they are allocated. Port zoning also assumes that the datacentre is secure, as access to the switch port would give a host access to the associated devices. Many large environments implement a port zoning strategy, because with WWPN zoning each HBA or device failure, and the associated change to the fabric, adds management overhead and exposure. In addition, Port binding promotes a stable and consistent approach to allocating switch ports to associated devices on the SAN. Clearly, there is a balance between SAN flexibility and the management overhead that flexibility brings. Neither method is incorrect, and both can be intermixed if required.
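The trade-off can be made concrete with a toy model of the two membership types. Everything here is an assumption for illustration: the WWPNs are placeholders, and a port is represented as a hypothetical (switch domain ID, port index) pair.

```python
from typing import Set, Tuple

Port = Tuple[int, int]  # assumed form: (switch domain ID, port index)

# The same host/array pairing expressed both ways (placeholder values).
wwpn_zone: Set[str] = {"10:00:00:00:00:00:00:01", "50:00:00:00:00:00:00:0a"}
port_zone: Set[Port] = {(1, 4), (1, 12)}


def allowed_by_wwpn_zone(wwpn: str) -> bool:
    """WWPN zoning follows the adapter, wherever it is plugged in."""
    return wwpn in wwpn_zone


def allowed_by_port_zone(port: Port) -> bool:
    """Port zoning follows the switch port, whatever is plugged into it."""
    return port in port_zone


# Scenario 1: the HBA is replaced (new WWPN) but stays on switch port (1, 4).
replaced_hba_wwpn_ok = allowed_by_wwpn_zone("10:00:00:00:00:00:00:02")  # False
replaced_hba_port_ok = allowed_by_port_zone((1, 4))                      # True

# Scenario 2: the original HBA is re-cabled to unzoned switch port (1, 5).
moved_cable_wwpn_ok = allowed_by_wwpn_zone("10:00:00:00:00:00:00:01")   # True
moved_cable_port_ok = allowed_by_port_zone((1, 5))                       # False
```

Scenario 1 is the HBA replacement case that forces a WWPN zone update, while scenario 2 shows why port zoning depends on disciplined cabling and a physically secure datacentre.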
With Nimble there are a couple of considerations to think about when it comes to zoning:
What happens when a controller fails?
This has no real impact. Should a controller (or a target adaptor) fail, as soon as it is replaced Nimble OS will assign the same WWPN/WWNN that was previously assigned. There is no impact on the zoning configuration, regardless of which type of zoning is used.
Planning Installation?
Until the Nimble array is set up, you will not know what the Nimble's target WWPNs are. Therefore it may be prudent to use Port Zoning, so that any zone changes can be made independently of the specific WWPNs.
Note: this is something that is likely to change in future releases of Nimble OS, where we expect the user to be able to set a specific WWPN for the Port Zones. This will be handy if zone changes are made in advance of the implementation of a new array.
Hard v Soft Zoning
Hard and soft zoning are often confused with port zoning (hard) and WWPN zoning (soft). They are in fact completely separate discussions. As mentioned above, Port/WWPN zoning defines what is referenced when zones are created and enforced. Hard and soft zoning describe how communication between ports within the switch is limited and implemented.
Soft zoning was the first method of zoning implemented by switch manufacturers. In basic terms, it works on the notion that “if the initiator can’t see or know about the target, then how can it communicate with it?” It is completely analogous to having an ex-directory phone number: if the number isn’t listed then I can’t call it, as I don’t know the number. This approach is flawed, as hosts can still access devices by sending commands directly to an address they were never told about; this could happen by mistake or through hacking. Nothing physically stops the host from accessing the device once it knows the address. To continue the telephone analogy, once I know the ex-directory number nothing stops me from dialling it and having a conversation.
Hard zoning is a function that was later built into switch hardware to address the soft zoning security issue. Hard zoning physically blocks access to a zone from any device outside the zone. In the telephone analogy, it is the same as call barring. Hard zoning is often confused with port zoning, but they are fundamentally different concepts. Hard zoning is now enforced by default on the majority of switches.
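The distinction between the two enforcement styles can be sketched as a small Python model. This is a conceptual illustration only: the class, its method names and the device names are hypothetical, and real switches enforce hard zoning in hardware rather than in software like this.

```python
from typing import Dict, Set


class ZonedSwitch:
    """Toy switch that filters the directory and, optionally, the traffic."""

    def __init__(self, zones: Dict[str, Set[str]], hard: bool) -> None:
        self.zones = zones
        self.hard = hard

    def _share_zone(self, a: str, b: str) -> bool:
        return any(a in m and b in m for m in self.zones.values())

    def name_server_query(self, initiator: str) -> Set[str]:
        """Both modes list only zone peers (the 'ex-directory' part)."""
        visible: Set[str] = set()
        for members in self.zones.values():
            if initiator in members:
                visible |= members
        return visible - {initiator}

    def forward_frame(self, src: str, dst: str) -> bool:
        """Hard zoning checks every frame; soft zoning forwards regardless."""
        if self.hard:
            return self._share_zone(src, dst)
        return True


zones = {"z_host1": {"host1", "array1"}, "z_host2": {"host2", "array2"}}
soft = ZonedSwitch(zones, hard=False)
hard = ZonedSwitch(zones, hard=True)

# host1 is never told about array2, but may learn its address another way.
listed = "array2" in soft.name_server_query("host1")  # False in both modes
soft_leak = soft.forward_frame("host1", "array2")      # True: nothing stops it
hard_block = hard.forward_frame("host1", "array2")     # False: call barred
```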
Best Practices for Zone Management
The following section describes methods that can assist the storage administrator with zone management; a sound zoning strategy includes each of the following best practices:
Aliases give zone management an effective way of associating WWPNs with a human-readable name. Using aliases not only cuts down on administration but also reduces the likelihood of errors being introduced by mistyping. Aliases also allow groups of devices to be associated under one name. The use of aliases can also promote a naming standard, which can be defined and adhered to, making administration more effective. Compare the example above for WWPN zoning with the example using aliases below:
The alias uses a far more descriptive term and if a standard is kept it will assist with day-to-day management.
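Conceptually, an alias is just a lookup from a readable name to one or more WWPNs, which the zone then references by name. The following Python sketch shows that expansion; the alias names and WWPNs are placeholders chosen for illustration and do not come from any real configuration.

```python
from typing import Dict, List

# Hypothetical alias table: readable name -> one or more WWPNs.
aliases: Dict[str, List[str]] = {
    "node3_hba0": ["10:00:00:00:00:00:00:01"],
    "nimble_targets": [
        "50:00:00:00:00:00:00:0a",
        "50:00:00:00:00:00:00:0b",
    ],
}


def expand_zone(alias_names: List[str],
                table: Dict[str, List[str]]) -> List[str]:
    """Resolve a zone defined by alias names into its underlying WWPNs."""
    members: List[str] = []
    for name in alias_names:
        if name not in table:
            # A mistyped alias fails loudly instead of zoning the wrong device.
            raise KeyError(f"unknown alias: {name}")
        members.extend(table[name])
    return members


zone_wwpns = expand_zone(["node3_hba0", "nimble_targets"], aliases)
```

A mistyped WWPN is still a syntactically plausible WWPN, whereas a mistyped alias simply does not exist, which is one reason aliases reduce errors.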
A standard naming convention eases administration and provides clarity when performing management tasks on a SAN. A consistent naming convention should be defined for each of the zone elements.
Many customers implement specific times or days when zones can be updated. That is really a decision for each organisation and its change management procedures. One recommendation is to give yourself a little 'air gap'. Often administrators will change the two fabrics at the same time, but the redundancy is there for a reason. Change the first fabric, wait a little while, perform some checks and, if all is good, then change the second fabric. A little air gap gives you time to spot an error before making the same mistake on your second, redundant fabric. It's a simple tip, but it is surprising how few use it!
Most switches provide the ability to store several zone configurations at the same time (only one is ever in effect at any one time). A naming scheme for configurations that includes a timestamp therefore allows a quick and easy rollback to the last known good configuration should any errors be introduced.
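One possible timestamp scheme is sketched below in Python. The `prod_cfg` prefix and the `YYYYMMDD_HHMMSS` layout are assumptions, not a switch requirement; the layout is chosen so that sorting the names alphabetically also sorts them by time, provided all names share the same prefix.

```python
from datetime import datetime
from typing import List, Optional


def config_name(prefix: str, when: datetime) -> str:
    """Build a name such as prod_cfg_20240301_093000."""
    return f"{prefix}_{when:%Y%m%d_%H%M%S}"


def last_known_good(names: List[str], active: str) -> Optional[str]:
    """Return the newest stored configuration older than the active one.

    Assumes every name shares the same prefix, so string order is time order.
    """
    older = sorted(name for name in names if name < active)
    return older[-1] if older else None


stored = [
    config_name("prod_cfg", datetime(2024, 3, 1, 9, 30, 0)),
    config_name("prod_cfg", datetime(2024, 3, 8, 14, 0, 0)),
    config_name("prod_cfg", datetime(2024, 3, 15, 10, 15, 0)),
]
active = stored[-1]
rollback_to = last_known_good(stored, active)  # prod_cfg_20240308_140000
```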
Adherence to standards
Always define zones, members and aliases according to the documented standards. This will greatly improve the readability and management of zone configurations.
Configurations can be uploaded to and downloaded from FTP servers. Although zone information is propagated to each switch on the fabric, it is still good practice to download the configuration to a host after each change.
That's it! Hopefully, if you have been following the blog, you will now feel well versed in the Nimble Fibre Channel implementation. Of course, if you have any further questions, hints, tips or tricks, then we'd really like you to comment or post a new discussion on Nimble:Connect!
We will post some additional hints and tips over the coming weeks!