NimbleOS: Supporting Multitenancy

Blog Post created by Michael Root on Oct 2, 2017

HPE Nimble Storage has a long history of partnering with service provider organizations and meeting their unique requirements. One of the main requirements of the service provider community is the ability to offer a secure multitenant environment to their customers. Multitenancy is a capability offered by many vendors; we will show what we have done and, more importantly, how our approach is simpler. While we'll focus on the service provider segment, the features and capabilities discussed can be used by any organization that requires a multitenant environment.

 

First, some background:

 

Service providers are not all the same. We recognize that IT administrators may have the same needs as a service provider even though all of their customers are internal to the company they work for. Some service providers offer "white glove" IT services, managing server and storage assets and possibly the applications running on those servers. Others may simply serve up storage as a service (like HPE Cloud Volumes), sometimes only for backup or DR as a service. In all these cases, the service provider manages the IT resources, and the customer, or tenant, of the service provider doesn't need or want direct control of them. The service provider leverages its expertise to manage the IT resources efficiently, and the tenant can focus on the business value it derives from using them.

 

These different service providers have similar requirements from their tenants. They all want to make efficient use of hardware and share resources amongst their tenants. Some customers of the service provider may pay for dedicated resources, but the more common case is that the service provider offers a lower-cost option that shares resources amongst several tenants. The tenants will want isolation of their data: the load and IO patterns of one tenant should not negatively impact other tenants sharing the resources. This isolation should include protection from bursty applications that take too many IOPS or consume too much capacity. Isolation also extends to the management layer, where management operations such as snapshots for one tenant should not affect other tenants. Tenants also expect security for their data: tenants should not be able to access each other's data, and some tenants may require that the security extend to encrypting the data at rest. The service provider will want to automate as much of the management of the IT infrastructure as possible to make responding to tenants quicker and easier. Common automation tasks include provisioning new or additional capacity in the storage infrastructure, monitoring the health of the system and of individual tenants, and reporting on capacity usage for billing and other tenant reports.

HPE Nimble Storage already makes efficient use of hardware with an easy-to-use interface. NimbleOS was built from the ground up to support multiple applications and workloads running on the same hardware, making it a strong and efficient platform for hosting separate tenants that may run different workloads. No special RAID configuration or performance settings are needed for each workload, so the service provider can provision storage for any tenant without needing to know the type of application the tenant will put on top of the storage. Put these core NimbleOS features together with a measured six nines (99.9999%) of availability and Triple+ Parity RAID data protection and you can see why NimbleOS is a great platform to build a service provider business around.

 

In addition to these core features, we have dedicated engineering effort to making life easier for service providers. These efforts include always-on and configurable QoS, configurable limits and reporting on tenant capacity usage, and security for connections as well as data at rest. All of this is backed by a REST API that service providers can use to automate their processes, from tenant provisioning to tenant monitoring to showback reporting. Always-on QoS ensures that tenants' workloads are isolated from each other: without any manual configuration, a heavy read workload will be balanced fairly with a large write workload, and both of those will be balanced fairly with a mixed read/write workload.

 

Isolation

The heart of the multitenancy story of Nimble is built around folders. Where groups and pools consume all of the physical resources, a folder provides a way to isolate volumes and to group related volumes for reporting. With folders, a limit can be placed on how much storage, IOPS, and throughput the volumes in the folder can consume. The folder can be given a name and description to identify the tenant, and any volume for that tenant can then be created inside the folder.

 

With the folder created for a tenant, attributes can be set on the folder to ensure a smooth-running service. In addition to the always-on QoS mentioned above, IOPS and throughput limits can be set on the folder to cap the aggregate IOPS and/or throughput of all the volumes in the folder (video). Similar limits can be placed on volumes inside the folder if additional isolation is needed within the tenant; the two levels work in parallel, with the folder limits restricting the aggregate workload. By setting limits on the folder, tenants in other folders on the array are protected from one tenant running away with all the resources of the system. The service provider can also set a usage limit on the folder, though this limit is not currently enforced except to prevent new volume creation once the usage of the volumes in the folder hits the limit. The service provider could sell x capacity with y IOPS or z MB/s and set those limits on the folder; NimbleOS would then isolate the tenants for the service provider by keeping each folder under the specified limits. In addition to the isolation provided by the array, the service provider can monitor these folders through the REST API to see which folders are approaching or hitting their limits, as sketched in the example below.
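To make this concrete, here is a minimal sketch of tenant provisioning, assuming Python with the requests library, a hypothetical management address and credentials, and token authentication against the REST API discussed below. limit_size_bytes is the attribute named in this post; limit_iops and limit_mbps are assumed attribute names, so verify them against the REST API reference for your NimbleOS release.

```python
import requests

ARRAY = "https://array.example.com:5392"  # hypothetical management address

# Authenticate: POST /v1/tokens returns a session token that later calls
# pass in the X-Auth-Token header.
auth = requests.post(
    f"{ARRAY}/v1/tokens",
    json={"data": {"username": "admin", "password": "secret"}},
    verify=False,  # lab convenience; verify the array certificate in production
)
headers = {"X-Auth-Token": auth.json()["data"]["session_token"]}

# Create a folder for the tenant with capacity, IOPS, and throughput limits.
# limit_iops and limit_mbps are assumed attribute names; on a multi-pool
# group a pool_id may also be required.
folder = {
    "data": {
        "name": "tenant-acme",  # hypothetical tenant name
        "description": "ACME Corp / billing ref 12345",
        "limit_size_bytes": 10 * 2**40,  # 10 TiB purchased capacity
        "limit_iops": 5000,              # aggregate IOPS cap for the folder
        "limit_mbps": 200,               # aggregate throughput cap
    }
}
resp = requests.post(f"{ARRAY}/v1/folders", json=folder,
                     headers=headers, verify=False)
print("created folder id:", resp.json()["data"]["id"])
```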

 

Reporting and Automation

For the service provider, the NimbleOS REST API enables the automation of all the tasks that can be done through the CLI or GUI. The REST API can also be used to collect usage data for reports and showback for individual tenants. The REST API was first introduced in NimbleOS 2.3 with limited functionality; improvements were made through the 4.0 release to give comprehensive coverage of NimbleOS objects suitable for service providers. The most important of these object sets for service providers is the folder object set.

 

Showback reporting is something that every service provider does to monitor, bill, and show the tenant how much storage the tenant consumed. With NimbleOS, the folder can be used to isolate tenants and make this reporting easier. The folder object reports back the total uncompressed usage for both volumes and their related snapshots. If you are a service provider that wants to charge for used blocks, one way to use NimbleOS would be to set limit_size_bytes to the total capacity purchased by the customer. The name of the folder could correspond to the tenant, and the description of the folder could be used to cross-reference the folder with other systems (e.g., billing or support tools that the service provider has). Then, to report back on or monitor the tenants, one call to the folders object set returns every tenant with its purchased capacity and used capacity.

The total used capacity is the sum of uncompressed_vol_usage_bytes and uncompressed_snap_usage_bytes. Other folder attributes that may be of interest to the service provider are the compression_ratio, broken out for both volumes and snapshots; num_snaps and num_snapcolls, for monitoring changes in the tenant's snapshot and retention policy; and the volume_list. An alternative way of billing may be based on provisioned space instead of used space. To get the provisioned space, the volumes for the tenant can be retrieved from the volumes endpoint with a query parameter that selects only the volumes in a specific folder. By filtering on folder_name and requesting only the desired fields, the response contains only the volumes in the folder, with just the name and size (in mebibytes and bytes) of each volume. The total provisioned size is the sum of these sizes, as in the sketch below.
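A sketch of that reporting loop, under the same assumptions as the provisioning sketch above (Python requests, hypothetical address and credentials, and a /detail variant of the folders endpoint that returns the full attribute set):

```python
import requests

ARRAY = "https://array.example.com:5392"  # hypothetical management address

auth = requests.post(f"{ARRAY}/v1/tokens",
                     json={"data": {"username": "admin", "password": "secret"}},
                     verify=False)
headers = {"X-Auth-Token": auth.json()["data"]["session_token"]}

# One call to the folders object set: each folder is a tenant, with its
# purchased capacity (limit_size_bytes) and uncompressed usage counters.
folders = requests.get(f"{ARRAY}/v1/folders/detail",
                       headers=headers, verify=False)
for f in folders.json()["data"]:
    used = f["uncompressed_vol_usage_bytes"] + f["uncompressed_snap_usage_bytes"]
    print(f"{f['name']}: purchased={f.get('limit_size_bytes')} used={used}")

# Alternative: bill on provisioned space. Ask the volumes endpoint for just
# the volumes in one folder, and just the fields needed.
vols = requests.get(f"{ARRAY}/v1/volumes",
                    params={"folder_name": "tenant-acme", "fields": "name,size"},
                    headers=headers, verify=False)
provisioned_mib = sum(v["size"] for v in vols.json()["data"])  # size is in MiB
print(f"tenant-acme provisioned: {provisioned_mib} MiB")
```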

 

For detailed charting and planning, both the array GUI and InfoSight provide charts that show the historical trend for used space. In the array GUI, the Monitor => Capacity page has the list of folders and can be used to track historical usage. This page also shows the current usage by application.

The capacity report in InfoSight provides similar information, with more historical data and a per-volume breakdown of usage and growth.

In the array GUI, the Monitor => Performance page has a list of folders and can be used to track historical performance; it also shows any limits set for the folder. This provides a good view into each tenant's consumption of resources and shows how close, or how often, they come to hitting their purchased/assigned limits.

Security

Another requirement of service providers is to provide security for their tenants' data. The service provider should ensure that the volumes for a tenant are isolated from other tenants. The service provider should also ensure that the tenant's data does not leave the service provider's data center and is not accessible after the tenant asks for the data to be removed.

 

For the first case, isolating the tenants, the service provider should use CHAP authentication for iSCSI connections. For Fibre Channel (FC) arrays, network isolation can be done at the switch layer. For both iSCSI and FC connections, it is best practice to assign Access Control Lists (ACLs) to the volume to limit the hosts that can see the volumes. For iSCSI connections, the service provider can take the additional step of setting a CHAP user on the ACL. By setting up CHAP for the tenants, the service provider further restricts the volumes to the hosts that know the CHAP secret; if an inadvertent configuration change exposes a volume to the wrong host, that host would not have the CHAP secret and would not be able to access the volume data. A sketch of wiring this up through the REST API follows.
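This rough sketch assumes chap_users and access_control_records object sets and uses placeholder ids; treat the attribute names as assumptions to check against your release's REST reference.

```python
import requests

ARRAY = "https://array.example.com:5392"  # hypothetical management address

auth = requests.post(f"{ARRAY}/v1/tokens",
                     json={"data": {"username": "admin", "password": "secret"}},
                     verify=False)
headers = {"X-Auth-Token": auth.json()["data"]["session_token"]}

# Create a CHAP account for the tenant.
chap = requests.post(f"{ARRAY}/v1/chap_users",
                     json={"data": {"name": "acme-chap",
                                    "password": "0123456789abcdef"}},
                     headers=headers, verify=False)
chap_id = chap.json()["data"]["id"]

# Attach an access control record that ties the volume to the tenant's
# initiator group and requires the CHAP user. The ids are placeholders.
acl = {"data": {"vol_id": "<tenant-volume-id>",
                "initiator_group_id": "<tenant-initiator-group-id>",
                "chap_user_id": chap_id}}
requests.post(f"{ARRAY}/v1/access_control_records",
              json=acl, headers=headers, verify=False)
```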

 

For the second case, the service provider can encrypt the data at rest with software-based encryption that encrypts each volume with a unique set of keys. The encryption uses the AES-256-XTS cipher designed for storage and leverages advanced CPU instructions, so the performance impact of SmartSecure is small. The encryption is done at the volume level to give the service provider fine-grained shredding of volumes. While keys may be shared by volumes that are clones of each other, when the last volume using a key is deleted, the key is no longer used by the system. The service provider can guarantee that a replaced drive will not leak tenant data, because the data on the drive is encrypted. In addition, the service provider can guarantee that the system will no longer decrypt or serve up data from a deleted volume. SmartSecure must be enabled with a group-wide master key before encryption keys will be generated and data encrypted; the service provider can set a global policy at the group level that ensures every volume is created with SmartSecure enabled.
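As a rough sketch, enabling the group-wide master key might look like the following, assuming a master_key object set with a passphrase attribute; both are assumptions to confirm in the REST API reference for your release.

```python
import requests

ARRAY = "https://array.example.com:5392"  # hypothetical management address

auth = requests.post(f"{ARRAY}/v1/tokens",
                     json={"data": {"username": "admin", "password": "secret"}},
                     verify=False)
headers = {"X-Auth-Token": auth.json()["data"]["session_token"]}

# Create the group-wide master key; volume encryption keys are only
# generated once this exists. Object-set and attribute names are
# assumptions to verify against the REST reference.
requests.post(f"{ARRAY}/v1/master_key",
              json={"data": {"name": "default",
                             "passphrase": "correct-horse-battery-staple"}},
              headers=headers, verify=False)
```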

 

Final Thoughts

The simplicity and efficiency of NimbleOS that general storage users love make NimbleOS a great platform to build a service provider business around. We have added features that address the unique needs of service providers. Features like isolation, automation, and security save the service provider money and make the service more valuable to the tenants it serves.

 

More details about these features can be found in the multitenancy white paper and the use case paper that gives an example of how service providers can utilize these features.