CRW - I am not a customer, and I encourage the customer community to add their findings, but I wanted to add my two cents. The short answer is that performance with scale-out is close to linear: you should get roughly the aggregate performance of the 2, 3 or 4 member arrays (depending on the configuration), minus a small scaling overhead. You probably will not see exactly double/triple/quadruple performance, but you will get close. You will also see better numbers if you combine like models.
I hope this helps and I hope to see customer responses.
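To put rough numbers on the "aggregate minus a small overhead" idea above, here is a minimal back-of-the-envelope sketch. The function name and the 5% overhead figure are purely illustrative assumptions, not vendor numbers; measure your own group to get a real value.

```python
def estimate_group_iops(member_iops, overhead=0.05):
    """Estimate scale-out group IOPS as the sum of member IOPS
    minus a fractional scaling overhead.

    `overhead` is a hypothetical placeholder (5% here), not a
    published figure. A single array has no scale-out overhead.
    """
    if not member_iops:
        raise ValueError("need at least one member array")
    if not 0 <= overhead < 1:
        raise ValueError("overhead must be in [0, 1)")
    total = sum(member_iops)
    if len(member_iops) == 1:
        return float(total)
    return total * (1 - overhead)


if __name__ == "__main__":
    # Two like-model arrays, each assumed to deliver 100K IOPS alone.
    print(estimate_group_iops([100_000, 100_000]))
```

With these assumed inputs the estimate lands a little under double the single-array figure, which is the "close, but not quite 2x" behaviour described above.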
We have two CS460s. I started by implementing the first one for POC testing, then added the second. As long as you have NCM (Nimble Connection Manager) installed on all hosts that communicate with volumes striped across the array group members, you will most definitely see almost a doubling of performance. The additional SSDs and controllers to absorb the writes make a very noticeable difference. I have attached some screenshots from testing with different 4KB R/W random/sequential scenarios, and I have been able to reach 250K IOPS sustained under certain loads. With larger block sizes I have seen almost 8 GB/sec in throughput.
VMware vSphere cluster with 4 hosts, 2x 10Gb Intel X540 SFP+ for iSCSI each
2x Brocade VDX6740 in VCS mode and 80Gb channel-group bridge, all SFP+
2x Nimble CS460 arrays (3TB HDD, 600GB SSDs)
3x Nimble ES1-H65 Shelves (3TB HDD, 600GB SSDs)
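As a quick sanity check on the figures above against this hardware list, the arithmetic can be sketched as follows. It assumes 4KB means 4096 bytes, decimal units for Gb and GB, and that all 8 host iSCSI ports can be driven at once; protocol overhead is ignored, so treat the link figure as an upper bound only. The function names are made up for illustration.

```python
def iops_to_bytes_per_sec(iops, block_bytes):
    """Throughput implied by an IOPS figure at a fixed block size."""
    return iops * block_bytes


def host_link_ceiling_bytes_per_sec(hosts, ports_per_host, gbit_per_port):
    """Raw aggregate host-side line rate in bytes/sec (no protocol overhead)."""
    return hosts * ports_per_host * gbit_per_port * 1e9 / 8


if __name__ == "__main__":
    small_block = iops_to_bytes_per_sec(250_000, 4096)
    ceiling = host_link_ceiling_bytes_per_sec(hosts=4, ports_per_host=2, gbit_per_port=10)
    print(f"250K IOPS at 4KB: {small_block / 1e9:.3f} GB/s")
    print(f"Host iSCSI line rate: {ceiling / 1e9:.1f} GB/s")
    print(f"8 GB/s as a share of line rate: {8e9 / ceiling:.0%}")
```

Under those assumptions, 250K IOPS at 4KB works out to about 1 GB/s, and the 4 hosts with 2x 10Gb each give 80 Gb/s (10 GB/s) of raw host bandwidth, so the near-8 GB/sec large-block figure sits close to what the iSCSI links could carry at all.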