Refer a Friend and Earn Rewards
Do you know a friend or colleague who would be interested in attending VMworld 2013? As a full-conference registrant, you can earn valuable rewards for referrals.
Getting started is easy – if you have your VMworld 2013 full-conference registration confirmation email, use the embedded link at the bottom of that email to launch the Refer-a-Friend tool and follow the steps provided.
If you don’t have access to your confirmation email, you can still access the Refer-a-Friend tool from the “My Account” tab inside the VMworld registration platform. Log in to your account, navigate to the “My Account” tab and use the Refer-a-Friend tool located there.
Once your referral registers and has a settled payment status, you’ll qualify for one of these rewards:
- At least 1 paid referral: One $25 VMware Store gift card
- 5-9 paid referrals: One Kindle Fire or iPod Touch (your choice)
- 10+ paid referrals: One Apple iPad, plus entry into a drawing for a chance to win a free VMworld 2014 Full-Conference pass
Refer-a-Friend now – All entries must be received by October 4, 2013.
To qualify for a reward, your registration must be fully paid. If you cancel your paid registration, you will not qualify for a reward. The offer is not valid for VMworld 2013 group-discount registrations or for government or VMware employees, and it is not retroactive.
Read the full Terms and Conditions.
So a link to the video below was waiting in my inbox this morning, with the following lyrics and a note that this was Part 1.
Welcome to the ViB:
It's time for Change....
Are you ready to get your Vblock.....it's 30 days baby and it's on - it's VCE (x2)
Have you heard about the C.I. craze?
Transformation that will amaze
OPEX savings that's second to none
CIOs we can show you how it's done
Infrastructure that's rolled out in days
As a product that simplifies the ways
IT delivers to your business
Private Cloud? We can get you to this!
Ahhhhh VBlock - it's VCE- VBlock
All that pressure got you down
Trouble tickets spinning u round and round
Late night calls when patches don't go right
Finger pointing with no end in sight
With VCE it's a single number to call
A single product that's preinstalled,
Pre-integrated, pre-tested and what's more
A single matrix to upgrade it all
Ahhhhh VBlock- it's VCE- VBlock
Now feel the emergence of convergence
Now shake your body right down to your datacenter
All that pressure that made you cry
Is now replaced with an ROI
That can't be beat with an "always on" design
Technology that makes your business thrive
Accelerated & standardized
Consolidated & optimised
New Applications rolled out on time
A De-risked DC that's virtualised
Ahhhhh VBlock - it's VCE- VBlock
Shake it for me, shake it for me a reference architecture never did it 4 me
Gotta plug in 2 your core of your network baby & power it up now someone help me
Now shake your body right down to your datacenter
Register now and save €400 off onsite pricing.
Learn how to Defy Convention by extending the benefits of virtualization to all data center services and exceed your business expectations.
Join us in Barcelona on October 15-17 at VMworld 2013 and gain the tools you need to transform conventional remedies into seamless, agile solutions that dramatically simplify your operations and provide unmatched business advantages.
Register now and benefit from:
- In-depth training and hands-on experience – VMworld offers more than 200 technical and content-rich sessions and labs covering the latest virtualization innovations in the data center for storage, networking, security, management, workforce mobility and hybrid cloud services.
- Product research and analysis – Review the latest competitive solutions from more than 150 sponsors and exhibitors side-by-side in the Solutions Exchange.
- Networking with industry experts – Compare notes with other IT professionals while making contacts you can leverage for advice and best practices for months to come.
Together, we can evolve from the ordinary and leave the pitfalls of legacy computing behind. It’s time to Defy Convention.
This is VMworld 2013.
Amongst these interesting insights I was also sent a number of photos of cars with rather interesting number plates - see below:
|ViB - vehicle 1|
|ViB - vehicle 2|
|ViB - vehicle 3|
While none of this was really making any sense, this morning I received a link to a new 40-second clip (see below) from the ViB, where I was assured things would become clearer.
If anything, what does seem clearer is that the ViB is most likely an acronym for "vArchitect in Black". Apart from that, the 40-second teaser trailer seems to throw up more questions than answers!
- Are the ViB a new specialist team within VCE?
- Is this teaser trailer a precursor to a new documentary or feature film from VCE?
- Is this just a marketing stunt or hoax?
- Is this the precursor to another new product launch from VCE?
- Does this in fact have anything to do with VCE?
- Why are CIOs looking to VCE's Vblock, and more specifically the ViB, as a solution to their problems?
I'll leave you to decide by watching the clip yourself below and of course update you if and when I receive further information and clarification.
VMworld is turning 10 and we invite you to the celebration!
Register today and save $500 off the on-site price.
Celebrating 10 years as the premier conference for IT professionals, VMworld has grown from 1,600 attendees in 2004 to more than 20,000 expected in 2013. Since our first VMworld, we’ve seen three Mars landings, submarines sent to the depths of the Mariana Trench, and Facebook grow from 650 users to 680 million.
Through the last 10 years VMware has extended the benefits of our market-leading technology across the entire data center—from compute, storage, networking, security and availability services to the mobile workspace. Year over year, VMworld remains the place to learn about VMware solutions for your business and network with those at the forefront of the virtualization revolution. Register today for your full-access pass to:
- General Sessions – Hear from VMware’s leaders on our future roadmap
- Breakout Sessions – Choose from 350+ technical, content-rich sessions delivered by speakers who understand how to leverage virtualization to make your business more agile, responsive and profitable
- Hands-on Labs – Experience firsthand the latest VMware solutions with experts across VMware’s portfolio
Visit vmworld.com for the latest news and updates on VMworld 2013.
If you were to ask EMC or VMware whom they consider their major threat and competition, you’d be forgiven for thinking it was NetApp, HP or offerings such as Hyper-V. With many now terming this the third era of corporate computing, mainframe and client/server being the first two, the current cloud era has undoubtedly been spearheaded by the likes of Google, Amazon and Facebook. It is here that EMC and VMware face their biggest challenge of remaining relevant and cutting edge in a market that demands automation, simplicity and speed of deployment. Despite major “Big Data” and “Cloud” marketing campaigns that have seen airports littered with posters and adverts, as well as numerous acquisitions that have extended already huge product portfolios, both EMC and VMware have struggled to release themselves from the shackles of being deemed just a Storage and a Hypervisor company. So in light of this it’s no surprise to see both companies spin off a new and independent venture that will address this very challenge, namely the Pivotal Initiative.
With a promise of $400 million in investments and a 69/31 percent ownership split between EMC and VMware respectively, the Pivotal Initiative will be headed by none other than VMware’s ex-CEO Paul Maritz. At the time, his stepping down from that position raised a few eyebrows and questions as to whether he was being demoted, prepped for early retirement or simply pushed aside to make way for VMware’s current CEO, Pat Gelsinger. In hindsight one could easily see this as a move that Maritz himself may have initiated, from his own recognition that VMware as a company was failing to transition into, let alone be recognized as, a PaaS organisation.
Maritz, like most in the industry, would have recognised that with ever increasing data sets and ever increasing scale, the need for automation and for rapid application development and deployment is quickly growing beyond the capabilities of the traditional, manually managed infrastructures that EMC and VMware have previously offered. Moreover, both VMware and EMC know it’s all about applications, and specifically big data applications. For VMware and EMC to succeed in having the de facto platform of the IT industry, it’s key that they win the war to host these new and integral applications. To address this, EMC and VMware went about acquiring just about every relevant start-up or product that could possibly address this challenge, from GemStone and Greenplum to SpringSource. Despite this huge purchasing spree, and despite VMware’s push to develop vFabric and create the PaaS initiative Cloud Foundry, both EMC and VMware have struggled to gain market recognition as true Cloud and PaaS players.
One of the key aspects challenging EMC and VMware’s recognition as a Cloud and PaaS offering has ironically been the very thing they pursued to solve it: the incredible rate of acquisitions and the consequent growth of their product portfolios. Sales and presales teams that had for years been accustomed to successfully pitching and selling storage arrays and hypervisor licenses were now being asked to understand new and alien concepts such as Big Data analytics, PaaS, application development and SaaS, and to address a customer base they were not accustomed to. By having Maritz head up a brand new and independent company that can take the appropriate products from those portfolios, the opportunity now exists to establish new and focused sales, technical and post-sales teams that understand applications and big data, and that have the right level of existing relationships within their potential client base.
So what is the Pivotal Initiative actually bringing to the table in terms of new products? Well, not much actually. What it does bring is much-needed cohesion between what has until now been a multitude of disparate acquisitions and products that have failed to gain the market share their technical and business benefits certainly deserve.
Firstly there’s the platform, which will be based on EMC’s Greenplum appliance integrated with Pivotal HD, the data querying system that works with Hadoop. The Greenplum appliance is based on the open-source PostgreSQL, a full ANSI-standard relational database system, and its performance benchmarks with Hadoop’s parallel system are already impressive. With the soon-to-be-released Pivotal HD product from the Pivotal Labs group, the aim is to run even more queries against even larger data sets.
From a VMware perspective, there’s the inclusion of GemFire to serve as the caching layer, with its capability of quickly ingesting events via its in-memory data management system. Then there’s Cetas, which provides rapid analytics atop the Hadoop platform and is designed for the elasticity of virtual resources, with specific focus not only on vSphere but also on Amazon Web Services. Most interesting is the addition of the Cloud Foundry PaaS, which was initially built to run on VMware’s proprietary system. This time it comes with the promise of being an abstraction layer with application automation across clouds, enabling Pivotal to be hosted on the likes of Amazon Web Services' EC2. Couple this with SpringSource’s Java application development framework, which enables integration with legacy data sources and applications, and Pivotal Labs’ facility for rapid coding, and the objective becomes a focused aim at the jugular of online and enterprise analytics.
The Pivotal Initiative will aim to deliver to the market a data analysis platform capable of capturing large volumes of data, quickly addressing and querying it, and then producing near real-time answers that can be stored in a large scale-out storage system. It would be naïve to think this is an initiative aimed just at existing VMware customers. This is an attempt not only to enter but also to become relevant in the software-led infrastructure arena that competes with the likes of Amazon.
In essence the Pivotal Initiative is a brave yet necessary move from both EMC and VMware to embrace the challenge of change as the legacy of traditional infrastructure faces the daunting prospect of new software paradigms. Whether the Pivotal Initiative can be successful and achieve its projected $1bn rate within five years depends on a number of factors. One thing is certain: the first challenge to remaining relevant in the IT industry is to acknowledge and adapt to change. The masters behind the Pivotal Initiative have already achieved that.
When you think Cloud, whether Private or Public, one of the key advantages that comes to mind is speed of deployment. All businesses crave the ability to simply go to a service portal, define their infrastructure requirements and immediately have a platform ready for their new application. Coupled with that, you instantly have service level agreements that generally centre on uptime and availability. So, for example, instead of being a law firm that spends most of its budget on an in-house IT department and datacenter, the Cloud provides an undeniable opportunity for businesses to procure infrastructure as a service and consequently focus on delivering their key applications. But while the industry’s understanding of Cloud Computing and its benefits has matured, so too has the understanding that maybe what’s currently being offered still isn’t good enough for mission-critical applications. The reality is that there is still a need for a more focused and refined understanding of what the service level agreements should be, and ultimately a more concerted approach towards the applications. So while terms such as speed, agility and flexibility remain synonymous with Cloud Computing, its success and maturity ultimately depend upon a new focal point, namely velocity.
Velocity differs from speed in that it is a measure not just of how fast an object travels but also of the direction in which it moves. For example, in a Public Cloud, whether that be Amazon, Azure or Google, no one can dispute the speed. With only a few clicks you have a ready-made server that can immediately be used for testing and development purposes. But while it may be quick to deploy, how optimised is it for your particular environment, business or application requirements? With only generic forms to work from, specific customization to a particular workload or business requirement fails to be achieved, as optimization is sacrificed for the sake of speed. Service levels based on uptime and availability are not an adequate measure or guarantee of the successful deployment of an application. It would be considered ludicrous, for example, to purchase a laptop from a provider that merely guarantees it will remain powered on even though it performs atrociously.
In the Private Cloud or traditional IT example, while the speed of deployment is not as quick as that of a public cloud, there are other scenarios where speed is being witnessed yet failing to produce the results required for a maturing Cloud market. Multiple infrastructure silos can constantly be seen hurrying around, busily firefighting and maintaining the “keeping the lights on” culture, all at rapid speed. Yet while the focus should be on the applications that need to be delivered, the quagmire of the underlying infrastructure persistently takes precedence, with IT admins constantly dealing with interoperability issues, firmware upgrades, patches and the multiple management panes of numerous components. Moreover, service offerings such as Gold, Silver, Bronze or Platinum are more often than not centered on infrastructure metrics such as number of vCPUs, storage RAID type, memory and so on, instead of application response times that are predictable and scalable to the end user's stipulated demands.
For Cloud to embrace the concept of velocity, the consequence would be a focused and rigorous approach aimed solely at the successful deployment of applications that in turn enable the business to quickly generate revenue. Attaining that quick and focused approach would require a mentality of velocity to be adopted comprehensively by each silo of the infrastructure team, while concurrently working in cohesion with the application team to deliver value to the business. This approach would also entail a focused methodology for application optimization and consequently a service level that measures and targets success based on application performance as opposed to just uptime and availability.
While some Cloud and service providers may claim that they already work in unison with a focus on applications, behind the scenes it is rarely the case, as they too are caught in the challenge of traditional build-it-yourself IT. Indeed, it’s well known that some Cloud hosting providers are duping their end users with pseudo service portals that provide only the impression of an automated procedure for deploying their infrastructure. Much closer to the truth are service portals that merely populate a PDF of the requirements, which is then printed out and sent to an offshore admin who in turn provisions the VM as quickly as possible. Additionally, it’s more than likely that your Private Cloud or service provider has a multi-tenant infrastructure with mixed workloads that sits behind the scenes as logical pools ready to be carved up for your future requirements. While this works for the majority of workloads and SMB applications, businesses looking to place more critical and demanding applications into their Private Cloud to attain the benefits of chargeback and so on need an assurance of application response time that is almost impossible to guarantee on a mixed-workload infrastructure. As the Cloud market matures, along with its expectations regarding application delivery and performance, such procedures and practices will only be suitable for certain markets and workloads.
So for velocity to take precedence within the Private Cloud, Cloud or even Infrastructure as a Service model, and to fill this Cloud maturity void, infrastructure needs to be delivered with applications as its focal point. That consequently means a pre-integrated, pre-validated, pre-installed and application-certified appliance that is standardized as a product and optimised to meet scalable demands and performance requirements. This is why the industry will soon start to see the emergence of specialized systems specifically designed and built from inception for performance optimization of specific application workloads. By having applications pre-installed, certified and configured, with both the application and infrastructure vendors working in cohesion, the ability for Private Cloud or service providers to predict, meet and propose application-performance-based service levels becomes a lot more feasible. Additionally, such an approach would also be ideal for end users who just need a critical application rolled out immediately in house with minimum fuss and risk.
While a number of such appliances or specialized systems may emerge in the market for applications such as SAP HANA or Cisco Unified Communications, the key is to ensure that they’re standardized as well as optimised. This entails a converged infrastructure that rolls out as a single product and consequently has a single upgrade matrix for all of its component patches and firmware upgrades, which in turn also correspond with the application. Additionally it encompasses a single support model that includes not only the infrastructure but also the application. This in turn not only eliminates vendor finger-pointing and prolonged troubleshooting but also acts as an assurance that responsibility for the application’s performance is paramount regardless of the potential cause of the problem.
The demand for key applications to be monitored, optimised and rolled out with speed and velocity will be faced not only by service providers and Private Cloud deployments but also by internal IT departments struggling with their day-to-day firefighting exercises. To ensure success, IT admins will need a new breed of infrastructure, or specialized systems, that enables them to focus on delivering, optimizing and managing the application without needing to worry about the infrastructure that supports it. This is where the new Vblock specialized systems offered by VCE come into play. Unlike other companies with huge portfolios of products, VCE has a single focal point, namely Vblocks. By adopting the same approach of velocity that was instilled in the production of standardized Vblock models, end users can now reap the same rewards with new specialized systems that are application specific. Herein lies the key to Cloud maturity and ultimately the successful deployment of mission-critical applications.
This is a discussion about our storage decisions over the past two years and how we found a robust solution that fits our needs without a huge investment.
As a Law Firm, we know that our systems need to be up and performing optimally 24/7, 365 days a year. Time is money, and in the economy we are in now every little bit counts. With my past experience in the storage arena, I knew that every storage purchase has its caveats. I have seen and heard the horror stories where a large storage purchase is made and then a year later you run out of space. I didn’t want this to happen to us.
When I started at the Firm, I was tasked with making sense of the current leased NetApp FAS2050 storage. I found it easy to manage; however, it was a lease, and we paid for any additional space we consumed beyond our initial lease terms. With continued growth we easily consumed more space than our lease provided, costing us additional money and crippling performance. After crunching the numbers, I made the recommendation that it would be more cost effective to buy a SAN to use for our VMware, SQL, and Exchange environments. We had our ideal NetApp solution in mind, but due to cost we reluctantly decided to purchase a less ideal solution. Within 6 months the less ideal solution was showing its flaws: we were running out of space and performance was suffering. I anticipated we would run into issues, but not as quickly as we did.
With performance issues lingering, we needed to find a new solution and didn’t want to end up in the same place in another 6 months. One thing I did know was that the solution couldn’t break the bank. It also needed to support a 6-host VMware server environment along with Exchange 2010, multiple SQL instances, and some very performance-intensive file storage. We also wanted something that would support a Virtual Desktop environment in the future. I discovered dozens of solutions at VMworld, which gave me a starting point. In addition, we had an application upgrade budgeted that included money for storage. I understood how much space and what kind of IOPS were needed for a sustainable solution, but finding a vendor that had the best of both worlds was difficult. The three vendors below seemed like good options to explore, as they were all capable systems with some sort of hybrid solid-state ability as well.
NetApp
Since we were already heavily invested in NetApp, I looked to them first. I knew our current systems were struggling, and staying with them would mean another large investment. After weeks of discussions they proposed a solution that provided 25,000-50,000 IOPS with roughly 50TB of storage. The cost was significantly above our budget, and the system would have been a full rack of disk shelves. They also couldn’t provide us with a guaranteed IOPS number.
HP 3Par
We liked 3Par, and it would have been our second choice, but it was still a disk-based storage system. It would have required lots of spinning disks, and the rack space was similar to NetApp’s. Performance was better, but they didn’t have 10Gb connectivity until their next release. Licensing was going to be expensive as well. Their IOPS count was around 50,000, and their cost was similar to NetApp’s.
Hitachi
One of our application vendors recommended Hitachi. I knew from experience that they had a good product and that it performed. They were probably the only disk-based storage vendor that really understood our requirements and could provide real performance numbers. But again it was a rack of disks, and the price was significantly over budget, at roughly twice that of NetApp and 3Par.
After seeing what the top 3 disk-based storage vendors had to offer, it was clear to me that we needed to move in a different direction. I saw quite a few hybrid systems at VMworld and decided to look at what they had to offer. With the advent of solid state storage and FusionIO, hybrid vendors have come up with very viable solutions. The only challenge at this point is cost; most solid state solutions become quite expensive due to the quantity of disks required. I made calls to the major players in the hybrid arena and ended up looking at the following vendors. The top 3 I researched are listed below; other vendors I ruled out due to capacity limitations or doubts about whether they would still be around in a few years.
Violin Memory
We were introduced to Violin at a local HP event and were impressed by their solution. Being relatively new to the market, they didn't have some of the features we needed, and the cost wasn't acceptable. They did, however, offer the type of performance we were looking for. At this point they are really application-specific storage rather than an enterprise storage solution.
Pure Storage
I found Pure Storage at VMworld and was impressed by their design and performance. They also had the basic features we wanted, which included snapshots and replication. Since they use solid state drives, it was easy to see that their solution was going to be expensive, and once I saw the price there was no way we could fit it into our budget. It should definitely become less expensive as solid state drives come down in price.
NexGen Storage
I also saw NexGen Storage at VMworld. After a couple of days of VMworld I was burned out on sales pitches, so I didn't get a chance to see their system in action. After making cold calls to the remaining storage vendors, I got a call back from NexGen. After a short discussion with Sales, I confirmed that NexGen was new to the market; however, the people behind it are not. They also gave me ballpark pricing, which was well within our budget. We ended up setting up a meeting so we could understand their solution in detail.
The meeting with NexGen went extremely well. They explained how their system uses FusionIO combined with inexpensive hard drives to achieve the highest IOPS/GB per dollar we could find. Their software also complements the system with the ability to provide storage QoS (Quality of Service), which is very similar to storage tiers in the 3Par and Hitachi world; the main difference is that you do not need tiers of different types of storage. They did lack the other features we were looking for, like snapshots and replication, but assured us they were coming. We were also concerned about their viability and sustainability in the market. They offered to bring our CIO and me out to their headquarters in Louisville, Colorado to meet with their CEO John Spiers and CTO Kelly Long, the original founders of LeftHand Networks. This also gave us a chance to see their solution in action and meet everyone on the NexGen team. We were thoroughly impressed and had quite a bit of thinking to do on the flight home. After a few weeks of deep discussion we made the decision to go with NexGen Storage.
So you are probably asking: why NexGen, and what did it cost?
Our main reason for choosing NexGen was their performance guarantee along with the cost. We ended up getting a solution that provides the lowest $/IOPS and $/GB on the market to date. Another huge benefit is the 3U of rack space per 33TB of usable storage.
We did need to move our VMware backup and replication from the NetApp Virtual Storage Console to Veeam at an additional cost; however, it provides a much better backup and replication solution. We also no longer needed VMware SRM, and instead of only 25% of our VMs being replicated for instant recovery, all of our VMs are protected.
The best thing about NexGen is the ease of installation and use. It literally took less than an hour to install and configure before I was migrating VMs.
Performance so far has been remarkable. The first gain I've noticed is with the guest OS: the OS now acts like it is on a physical machine instead of being sluggish and jumpy. One of our biggest issues has been one of our document management applications, which requires a document repository containing millions of files within a complex folder structure. My first test was to see how fast Windows could scan the directory and provide the directory properties, that is, the size of the directory along with the file count. I started both scans at about the same time. Within 5 minutes the NexGen side was done scanning 600GB and over 5,000,000 files, while the NetApp scan was only at 10%.
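For anyone wanting to repeat a similar test, here is a rough Python sketch of the measurement (the path is a placeholder); it walks the tree counting files and summing sizes, roughly what the Windows folder-properties scan does:

    # Walk a directory tree, counting files and summing their sizes, and time the scan.
    import os, time

    def scan(path):
        files, total_bytes = 0, 0
        start = time.time()
        for root, _, names in os.walk(path):
            for name in names:
                try:
                    total_bytes += os.path.getsize(os.path.join(root, name))
                    files += 1
                except OSError:
                    pass  # skip files that vanish or error mid-scan
        return files, total_bytes, time.time() - start

    print(scan(r"D:\DocRepository"))  # placeholder path to the document repository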
At this point I have only tested the basics. I have noticed that the Read/Write latency is much lower on everything I move. Once I get everything migrated I will provide an update with my findings.
NexGen is definitely worth taking a look at if you have a tight budget but require high performance.
What is a disaster?
A disaster is any event that halts business activity on a large scale. A natural or man-made disaster can happen at any time without warning. Besides the obvious potential harm to human life, buildings, and infrastructure, an organization’s IT resources are also at risk. Businesses that lose their data are at high risk: many never reopen after the loss, and many of those that do reopen still go out of business within a short time. The exact percentages reported vary depending on the source, but approximately 90 percent of businesses that have a disaster involving their datacenter go out of business within one year.
A disaster that affects IT resources leads to business downtime. While you attempt to recover data and validate systems for use, the business is typically shut down. Services such as Web access and email communication could be affected for days or even weeks. Access to personnel records, inventory, order fulfillment, finance, and other crucial business data could be unavailable even longer. Some of this crucial data may never be recovered, and in such situations the business shuts down.
Most IT infrastructures have redundant application servers for fault tolerance on key applications such as email or databases. However, most of them are located in the same datacenter as the primary servers. Their placement in the same datacenter makes it easy to manage fault tolerance but causes high risk. If the datacenter is damaged or destroyed, both the primary servers and the redundant servers can be lost.
The third category is non-disaster events, which include service disruptions and system failures. Whereas catastrophes and disasters impact the entire datacenter, these smaller-scale “non-disasters” are usually caused by the failure of a specific system, whether hardware or software, or by some kind of infrastructure outage. The failure of a major storage array, or a communications fiber being accidentally cut, is a good example of this kind of event.
Some of these disruptions are planned, like maintenance. For example, you may know in advance that the installation of a new fiber-optic communications line will temporarily disrupt your existing network. Other outages are unplanned, such as “disasters in miniature”, when a single application, system, or resource fails.
It is possible to use the disaster recovery plan (DRP), or some parts of the plan, to help mitigate the effects of an event like this. But this will be possible only if you build this flexibility into the plan ahead of time.
Disaster recovery planning carries a number of challenges.
The first challenge is the objective of minimizing downtime. Providing a rapid recovery is difficult because many of the recovery processes in traditional recovery plans are complex, manual processes. These are processes for things like validating the configuration of recovery site hardware, installing software, and so on. Multiple steps are required to overcome any hardware differences between the primary site and the recovery site, which typically contains the redundant servers.
Hardware differences at the recovery site might force you to modify or reinstall your operating systems. Applications may also need to be modified. These processes are documented in thick documents at the recovery site known as runbooks. A runbook is a detailed procedure for how to recreate a system. Runbooks are normally maintained manually. They are difficult to create, often incomplete, and they are difficult to keep up-to-date. As a result, during recovery, a lot of time is spent validating the runbook procedures and modifying procedures that are found to be incomplete or out-of-date. All of this takes time and slows down the recovery process. A recovery that needs to happen in hours might actually take days or even weeks.
The second challenge is the problem of reducing risk. You never want your DRP to fail. To have a workable DRP, you must first test it, and testing a DRP requires even more hardware and infrastructure. You must also constantly update the DRP. Usually, the only thing that is updated in the DRP is data; things like the infrastructure and its components are not updated. IT staff trying to follow the DRP suddenly realize that it has not been updated for a while. But if the original data and hardware have been destroyed or rendered inaccessible, you do not have the luxury of being able to go and get another copy or take another look at the configuration.
The third challenge is to reduce the cost of disaster recovery. Controlling the cost of DRPs is made challenging by several factors. The first factor is providing the fastest and simplest recovery. To do that, you must duplicate the production datacenter at the recovery site, particularly in the x86, or Intel and AMD, space. This is almost impossible because hardware at the primary site is constantly being added, updated, or modified. The second factor is eliminating recovery failures that occur due to hardware dependencies. Doing this doubles the cost of your datacenter: if you want a rapid recovery, you need to leave most of the hardware at the recovery site idle almost all of the time, because repurposing traditional hardware is very time-consuming. The third factor is the DRP’s dependency on multiple third-party products, which also drives up the cost of disaster recovery planning.
First released in 2008, SRM is an award-winning disaster recovery management product.
SRM leverages the inherent disaster recovery capabilities of the vSphere platform and array-based replication.
Designed as a workflow tool, SRM simplifies and automates the key elements of disaster recovery: that is, setting up disaster recovery plans, testing those plans, and executing failover and failback when a datacenter disaster occurs.
Once VMware vSphere is deployed on the protected and recovery sites, and replication is established between the two sites, you use SRM to create disaster recovery plans that designate failover instructions.
In the event of a disaster, administrators are notified and must decide whether to initiate a failover. If they initiate a failover, SRM implements the disaster recovery plan following four basic steps:
• First, on the Protected Site, SRM shuts down the virtual machines starting with the virtual machines designated as the lowest priority. Failover does not require connectivity to the protected site, so if SRM cannot connect to the site, it simply notifies the administrator that it cannot power down the virtual machines and proceeds to the next step.
• Next, at the recovery site, SRM prepares the replicated storage for failover.
• Then SRM suspends any virtual machines running on the recovery site designated as non-critical to provide more resources for the virtual machines to be powered on at the recovery site.
• Finally, SRM starts virtual machines at the recovery site starting with all VMs in Priority Group 1, and after they have completed their boot process proceeds through Priority Groups 2 through 5 in sequence.
SRM offers several key features to make disaster recovery rapid, reliable, manageable, and affordable.
It provides a central place to create, test, update, and execute recovery plans for the different parts of the virtual environment. SRM works hand-in-hand with VMware vCenter Server for a unified management view of the virtual infrastructure and the recovery plans for it.
SRM also provides automation of key aspects of disaster recovery. It helps to specify the recovery process in advance of a disaster event. It then automates execution of tests of that recovery process to ensure that the recovery plan is complete and reliable. In the event of an actual site failure, SRM automates the recovery process, eliminating many of the manual processes and associated errors that lead to slow recovery or failures.
SRM simplifies setup and integration of several aspects of disaster recovery. It makes it simple to specify how to divide up and use resources at the recovery site.
SRM also ensures that all the key information about the environment is replicated to the recovery site.
It provides simple integration with leading storage replication technologies from leading providers.
And it ensures that important information about the virtual infrastructure (for example vCenter Server management information) is sent to the recovery site and kept up-to-date.
Site Recovery Manager Building Plan: Understanding how SRM integrates with array-based replication components is key to a successful deployment.
The smallest possible unit of storage for replication is a storage volume, referred to as either a “LUN” on SAN arrays or a “volume” on NFS arrays. It is never possible to fail over the contents of part of a storage volume without failing over the entire volume. That means you must group virtual machines on storage volumes accordingly.
VMware formats storage volumes with VMFS to store virtual machines. These VMFS-formatted volumes are referred to as “datastores.” Datastores commonly contain only a single storage volume, but they do have the ability to span storage volumes.
The smallest group of datastores (and therefore storage volumes) that can have its contents failed over with SRM is referred to as a “datastore group.” These groupings are calculated for you so you do not have to worry about figuring them out. In this example of a SAN array, LUN1 is formatted with VMFS A and has three virtual machines. It has no dependencies on anything and is in its own datastore group.
Two factors determine what causes storage volumes and datastores to be grouped together and not distinctly managed:
First, a datastore spanning multiple storage volumes causes those volumes to be grouped together in the datastore group. Failing over part of a datastore is not possible. In our example, LUNs 2 and 3 have VMFS B spanned across them so they are in the same datastore. Since all six virtual machines on that datastore sit on only VMFS B and touch no others, VMFS B (and therefore LUNs 2 and 3) is alone in the second datastore group.
Second, a virtual machine can have multiple virtual disks, and those virtual disks may reside in different datastores. In that case, those datastores are forced together into a datastore group so that you do not try to fail over only part of a virtual machine. In our example, LUN4 is formatted with VMFS C and LUN5 is formatted with VMFS D. These LUNs are grouped in a third datastore group because the virtual machine has a virtual disk in each VMFS datastore.
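To make these grouping rules concrete, here is a minimal illustrative sketch in Python (not SRM code; the names are made up) that merges datastores into groups whenever a virtual machine has virtual disks on more than one of them:

    # Illustrative only: compute "datastore groups" from the rules above.
    # `datastores` maps each datastore to the storage volumes (LUNs) it spans;
    # `vms` maps each virtual machine to the datastores holding its virtual disks.
    def datastore_groups(datastores, vms):
        parent = {ds: ds for ds in datastores}

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path compression
                x = parent[x]
            return x

        # A VM with disks on several datastores ties those datastores together.
        # (A spanned datastore already carries all of its LUNs with it.)
        for disks in vms.values():
            for other in disks[1:]:
                parent[find(other)] = find(disks[0])

        groups = {}
        for ds in datastores:
            groups.setdefault(find(ds), []).append(ds)
        return list(groups.values())

    # The example from the text produces three groups:
    datastores = {"VMFS A": ["LUN1"], "VMFS B": ["LUN2", "LUN3"],
                  "VMFS C": ["LUN4"], "VMFS D": ["LUN5"]}
    vms = {"vm-with-two-disks": ["VMFS C", "VMFS D"]}
    print(datastore_groups(datastores, vms))
    # [['VMFS A'], ['VMFS B'], ['VMFS C', 'VMFS D']]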
Within SRM, a collection of all virtual machines stored in a datastore group is called a “protection group.” When configuring the protected site, SRM administrators create protection groups with a one-to-one mapping to datastore groups. Protection groups are simply the group of virtual machines that reside on a single datastore group. This is the actual unit of virtual machine protection and recovery. In this example, Protection Groups 1, 2, and 3 are created corresponding to Datastore Groups 1, 2, and 3.
Once you have created protection groups, you can create “recovery plans” containing one or more protection groups. A recovery plan is simply a list of virtual machines from the protection groups, a startup order for those virtual machines, and any custom steps added before or after virtual machine startup. This is the “virtual run book” that is executed during disaster recovery tests and actual disaster recovery failovers.
At the recovery site, for all of the virtual machines in our example, we created two recovery plans.
Recovery Plan 1 contains all three protection groups and therefore all 10 of its virtual machines. This recovery plan would be used if the entire site were lost.
Recovery plan 2 includes only Protection Group 1 and its three virtual machines. This is for some partial failure – perhaps corresponding to a server rack, an array, or a business unit. It would be run to recover that particular set of systems.
About SRM Protection Groups: A protection group is a set of virtual machines that will be moved together during tests and failovers. Protection groups have a one-to-one relationship with datastore groups.
Virtual machines in a protection group inherit inventory mappings and replication properties from the protection group. Data movement to the recovery site is delegated to the replication providers specified when you create the protection group.
Once the protection group is created, SRM updates the inventory at the recovery site by registering placeholder virtual machines for the virtual machines included in the protection group.
After creating protection groups, you have the option to configure individual virtual machines to override the defaults inherited from inventory mappings and the settings on their protection group.
The wizard offers several pages of settings.
The Folder, Resource Pool, and Networks pages allow you to override the defaults set during inventory mapping configuration.
The Storage page provides a way to define protection for portions of the virtual machine that are not replicated. For example, you might want to provide the location of an ISO file to mount at recovery time instead of the physical device. Or if the virtual machine has a non-replicated swap disk, you could provide the location of a pre-created copy of the swap disk at the recovery site.
The VM Files page allows you to change the location of the temporary inventory files.
The Customization Specification page allows you to assign an IP address for use at the recovery site. We will talk more about that in just a minute.
The Recovery Priority page provides a way to designate the startup priority at the recovery site. The default for all virtual machines is Normal; however, you can change the priority to Low, High, or Don’t Power On.
SRM initiates startup of virtual machines set at Normal and Low in parallel. That is, if the recovery plan includes virtual machines on multiple hosts, SRM starts up two virtual machines per ESX host at the same time. SRM uses a different method to start up High priority virtual machines: it starts them one after another, regardless of how many ESX hosts are available.
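As a rough illustration of that ordering (this is not SRM code; the VM and host names are invented), the behaviour described above could be sketched in Python as follows:

    # High-priority VMs power on strictly one after another; Normal/Low VMs
    # power on in parallel, at most two at a time on any one ESX host.
    import threading
    from concurrent.futures import ThreadPoolExecutor

    def power_on(vm):
        print(f"powering on {vm['name']} on {vm['host']}")

    def start_recovered_vms(vms):
        high = [v for v in vms if v["priority"] == "High"]
        rest = [v for v in vms if v["priority"] in ("Normal", "Low")]

        for vm in high:                          # sequential, regardless of host count
            power_on(vm)

        per_host = {v["host"]: threading.Semaphore(2) for v in rest}

        def limited_power_on(vm):
            with per_host[vm["host"]]:           # at most two concurrent per host
                power_on(vm)

        with ThreadPoolExecutor(max_workers=8) as pool:
            list(pool.map(limited_power_on, rest))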
The Pre-Power On and Post-Power On pages let you attach a message or command to the virtual machine. Then when SRM is testing or recovering the virtual machine, SRM either shows the message or executes the command before or after the virtual machine powers on.
When configuring individual virtual machines, keep in mind that protected virtual machines need unique IP addresses for the assigned network at the recovery site. If recovered virtual machines use DHCP to obtain their IP addresses, you will likely not have to do any additional configuration. However, in cases where the recovered virtual machines have static IP addresses and the two sites have disparate networks not joined by a stretched VLAN, you will have to manually change the IP addresses for use at the recovery site.
One way to customize network settings for a virtual machine is to use the Customization Specification Manager included in vCenter Server. To do so, connect the vSphere Client to the vCenter Server system at the recovery site and create a customization specification for each virtual machine that requires a custom network configuration. Only the Network Interface Settings page is applicable to SRM, so SRM ignores entries on other pages of the wizard. After you create and save the customization specifications for each virtual machine requiring a custom network configuration, you can apply them to the virtual machines when you create protection groups or recovery groups.
Another option is to use the IP property customization tool, dr-ip-customizer.exe. This tool is installed during installation in the bin subdirectory of the SRM installation directory. The IP customizer tool allows you to specify network settings for any or all of the virtual machines in a recovery plan by editing a comma-separated-value file that the tool generates. Initially, this file includes a single row for each placeholder virtual machine in the plan. You can edit the file to add a row for each network adapter in each placeholder virtual machine and then customize the network settings for each adapter. When you are finished, you use the edited file as input to a command that creates customization specifications for the placeholder virtual machines. For detailed procedures on how to use this tool, see the Site Recovery Manager Administrator Guide.
As we mentioned earlier, you use the Pre-Power On and Post-Power On pages to add messages and commands to the recovery plan.
Message steps are information in text format. When a recovery plan is running and encounters a message step, the recovery pauses until you acknowledge the text of the message. The plan then continues to run.
Command steps are live scripts that you can insert in the recovery plan. When a recovery plan is running and encounters a command step, the script runs. Command line scripts can call executables in both .exe and .com file formats.
If you add a command to run a batch file or DOS command, provide the script path as an argument to the shell command. For example, if you are running a .bat file, the command would look like this:
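(The cmd.exe path is standard; the script location below is just a placeholder for wherever you keep your scripts.)

    c:\windows\system32\cmd.exe /c c:\scripts\recovery-step.bat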
Scripts are executed under the Local Security authority of the SRM Server system. You can store scripts wherever you like. However, we recommend storing them on a local SRM disk rather than a remote network share.
Creating Protection Group Demo: https://vmware.adobeconnect.com/_a58402297/p16955904/
Site Recovery Manager Deployment: The original virtual machines to protect reside at the Protected Site, and the protected virtual machines are replicated to the Recovery Site. For the sake of simplicity, throughout this training we will continue to refer to the two sites involved in a deployment as the protected and recovery sites. However, in reality, two sites can be used to protect one another’s resources, so a given site can be primary for one group of virtual machines and secondary for another group. Furthermore, you can set up SRM to protect multiple sites with a single recovery site. We talk more about shared recovery in Module 2 of this course.
Protected virtual machines, hosted on ESX systems, are stored in a SAN or NFS array. The array includes block replication software, which replicates the virtual machine files on the array at the recovery site. It is important to remember that the storage subsystem manages and executes replication. Replication is not performed inside the virtual machines, or by the VMkernel or service console. SRM supports discrete, asynchronous, and synchronous replication.
Each site includes a vCenter Server system so that the sites can function independently. That is, if one site fails, the other site must have its own vCenter Server system to start the failover process and manage the ESX hosts. Each vSphere Client/vCenter Server system pair manages the disaster recovery tasks relevant to its own site.
At the protected site, you configure SRM and perform activities related to grouping virtual machines into units of failover, called protection groups.
At the recovery site, you create and manage disaster recovery plans.
VMware vCenter Site Recovery Manager Server is a server process with its own database. Both SRM Server and its database are separate from vCenter Server and its database. You can run the server processes for SRM and vCenter Server on the same or different servers. Likewise, the databases for vCenter Server and SRM can reside on the same or different database servers. The SRM service can also run in a virtual machine.
SRM includes a plug-in to the vSphere Client. This plug-in allows you to perform all administration tasks using the same interface.
SRM interfaces with storage array replication software via storage replication adapters, or SRAs. These programs, written by the storage vendors, reside on the SRM Server system and, once installed, are invisible for the duration of their use. An SRM deployment requires an SRA for each array in the SRM setup.
SRAs translate generic commands generated by SRM for tasks such as discovering arrays in the datacenter and determining which volumes are replicated by the arrays. SRAs also assist in initiating recovery plan tests and failovers.
SRAs are written by the array vendors to ensure tight integration with SRM. SRAs allow SRM to support many different arrays without hard coding specific array knowledge into the SRM binary. As a result, SRAs can be released separately from the rest of the SRM product and downloaded from vmware.com. SRAs are fully developed, tested, and supported by the storage vendors, which ensures the best reliability and support.
SRM Setup Protection: The key to a successful SRM deployment is proper planning and preparation. Before installing SRM, there are five preparation tasks to perform. They are:
Identify which virtual machines to protect,
Prepare datastore groups,
Verify SRM prerequisites
Prepare the VMware vCenter Server inventory at the recovery site
And validate DNS lookups are working for the servers at the protected and recovery sites.
Let’s take a closer look at each of these tasks.
For any SRM deployment, the first preparation task is to determine which virtual machines to protect.
Datacenters include both local services and protected services.
Local services are infrastructure type services such as print, virus management, and security camera services. Local services are generally bound to the datacenter, so they should not be included in recovery plans.
Protected services are application-type services that need to be available to the organization at the time of a test or disaster. Include virtual machines hosting protected services in the recovery plan.
Once you have identified which virtual machines to protect, you need to ensure that storage replication is correctly configured for those protected virtual machines. SRM supports discrete, asynchronous, and synchronous replication.
You can use existing or new datastores. In either case, work with the storage team to ensure that replication between the protected and recovery sites is set up for all datastores that will host protected virtual machines. Setup and configuration of replication differs from array vendor to array vendor, so if you are unsure of how to complete the necessary replication setup and configuration, consult the array vendor.
If possible, store only protected virtual machines on the datastores that are being replicated from the protected site to the recovery site.
And consider using Storage VMotion to move the protected virtual machines onto the SRM datastore groups. That way you can relocate the protected virtual machines without any interruption to services.
The SRM Compatibility Matrixes document lists the most current SRM prerequisites in detail, but we will briefly review the requirements here so that you know what to expect.
As we just mentioned, SRM requires array-based replication configured between the protected site and the recovery site.
Additionally, at each site, SRM requires:
vCenter Server 4.1.
And the servers hosting virtual machines must be running VMware ESX 3.0.3, ESX or ESXi 3.5 or later.
SRM also requires:
A network configuration that allows TCP connectivity between the systems hosting the vCenter Server and VMware vCenter Site Recovery Manager Server,
An Oracle, SQL Server, or DB2 database that uses ODBC connectivity,
And an SRM license key.
Before attempting to install SRM, be sure to review the Compatibility Matrixes document for the most current requirements.
SRM provides a simple way to map inventory objects from the resources at the primary site to the resources available at the recovery site. For example, you may require virtual machines in a “Division 1” folder at the protected site to be placed in a “Division 1” folder at the recovery site. Inventory objects include virtual machine folders, network connections, and compute resources.
When mapping compute resources, rather than mapping host to host or virtual machine to server, you map the relevant resource pools at the primary site to the available resource pools at the recovery site. Resource pools eliminate the need to map specific servers. As long as the resource pool has sufficient resources, VMware DRS moves virtual machines to the host in the pool that has the necessary computing capacity.
SRM supports environments with multiple versions of ESX, so long as you map clusters that include hosts running the same version of ESX. For example, if a cluster on the protected site includes hosts running ESX 3.5, you must map that cluster to a cluster on the recovery site that also includes hosts running ESX 3.5. If the cluster on the protected site includes hosts running ESX 4.0, you cannot map it to a cluster that contains hosts running ESX 3.5.
SRM automates the process of mapping these inventory objects. However, the resources that you map to on the recovery site must be in place before you can use the wizard. Therefore, it is important to plan how you want to map inventory objects for the protected virtual machines. Then ensure that all resources are available at the recovery site.
Finally, we highly recommend validating that DNS lookups in the protected and recovery site return the correct results.
Validate DNS on both vCenter Server systems, both SRM Server systems, and all ESX systems hosting protected virtual machines.
For each system, validate both forward and reverse lookups.
vCenter Server and SRM Server rely heavily on DNS, so ensuring that all systems return the correct results will save time in the long run.
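For example, a quick check from each server might look like the following, run once with a name and once with an address to confirm forward and reverse resolution (the hostname and IP shown here are placeholders):

    nslookup srm-recovery.example.com
    nslookup 192.0.2.25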
When the preparation tasks are complete, you are ready to install SRM. Installation involves four basic steps.
First, create an SRM database. The SRM database at each site holds information about virtual machine configurations, protection groups, and recovery plans. The SRM installation wizard does not create a database, so you must create one before you install SRM. Configuration requirements, such as user privileges, vary by type of database. Be sure to refer to the SRM Administration Guide to learn the specifics for the database you are using.
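As a minimal sketch, assuming a SQL Server back end (the database name, login, and password are placeholders, and the exact privileges required vary by SRM and database version, so follow the SRM Administration Guide):

    CREATE DATABASE srm_site1;
    GO
    CREATE LOGIN srm_user WITH PASSWORD = 'ChangeMe!2013';
    GO
    USE srm_site1;
    GO
    CREATE USER srm_user FOR LOGIN srm_user;
    EXEC sp_addrolemember 'db_owner', 'srm_user';
    GO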
Next, run the SRM installation wizard to install SRM Server. As part of the installation, you connect to the SRM database that you created in the first step.
Then, use the vSphere Client to connect to the vCenter Server system and install and enable the SRM plug-in.
Finally, install the appropriate storage replication adapters. You must install SRAs on the same physical server as the SRM service. Download and install SRAs for each array you have in the SRM setup. Typically, SRA installation does not require much, if any, configuration. Detailed storage vendor-specific instructions are included in a README file. After installation, restart the SRM service to ensure that it can find the new files.
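On the Windows host running SRM, one way to restart the service is from a command prompt. The service name shown here, vmware-dr, is the name SRM has typically registered, but confirm it in the Services console for your version before relying on it.
net stop vmware-dr
net start vmware-dr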
After you have performed these four steps at one site, repeat them to install SRM at the other site.
SRM Setup Demo: https://vmware.adobeconnect.com/_a58402297/p64335443/
SRM installation sets up a close correlation between the SRM instance, its associated vCenter Server system, and the SRM database. Altering authentication information to the SRM database or the vCenter Server system will immediately prevent SRM from functioning correctly. For example, if you change the administrator password for the vCenter Server system, SRM will no longer be able to communicate with vCenter Server.
Whenever authentication information changes after you have installed SRM, use the installation repair wizard to update SRM with the new credentials.
You can use the installation repair wizard to change:
The username and password of the vCenter Server administrator.
The certification method for authentication.
And the SRM database information.
One important thing to remember about using the repair feature is that it overwrites the existing installation of SRM. Therefore, if you have applied patches to SRM and you use the repair wizard to change authentication information, you have to reapply the patches.
After you install SRM at both sites, you are ready to begin the SRM setup tasks at the protected site.
The Summary tab of the Site Recovery view lists the setup tasks to perform. Let’s take a closer look at each of these tasks.
Here are the four tasks to set up protection.
The first task, Connection, pairs the recovery site to the protected site so that the two SRM Server systems can talk to each other. SRM Server systems never communicate directly with each other. Instead, all communication is proxied through the vCenter Server systems to reduce the number of openings necessary in firewalls between sites.
Connection is achieved through three basic steps:
• First the SRM Server system at the protected site establishes communication with the vCenter Server system at the recovery site and validates its certificate.
• Then the SRM Server system at the protected site goes through the vCenter Server system at the recovery site to validate the certificate of the SRM Server at the recovery site.
• Finally, to establish reciprocity, the SRM Server system at the recovery site validates the certificate of the vCenter Server system at the protected site so that it can then validate the certificate of the SRM Server system at the protected site.
The next task is to configure the SRM array managers. This task connects the SRM Server system to the storage arrays that include the datastore groups set up for replication. You configure one array manager for each type of SAN or NFS array at both the protected site and the recovery site.
Configuring inventory mappings involves associating the inventory objects at the protected site to ones at the recovery site. Before you can perform this configuration task, the inventory objects must already be set up at the recovery site as we discussed in the Site Preparation section of this module.
The mappings that you set during this task become the default settings for the replicated virtual machines. You can customize any virtual machine to override these default mappings. Setting inventory mappings is optional. You can create protection groups without these default settings; however, you will have to configure each protected virtual machine individually.
Let’s take a closer look at these three setup tasks before we talk about creating protection groups.
SRM Configuration Demo: https://vmware.adobeconnect.com/_a58402297/p72486258/
Of the many CIOs I have had the pleasure of working for or speaking with, one concern that constantly resonates is job longevity. With the average CIO tenure at only 4-5 years, and trends suggesting it is likely to shorten, it's no surprise that the role of a CIO requires instant success in minimal time and typically with minimal budget. Nearly every CEO's mandate to a CIO is for IT to be better, faster and cheaper.
With this challenge the three steps to success for any CIO are plain and obvious. They are:
1) Eliminate risk
2) Improve Cycle Times
3) Reduce Cost
While these three steps may incorporate subsidiary aspects, such as demonstrating how IT best serves the business, building the business's confidence in technology and making IT more effective, they all eventually fall under one of the three steps mentioned above.
Step 1: Eliminate Risk
First, by eliminating risk from your IT environment you immediately address the business concerns of:
- The revenue impact of downtime
- The revenue impact of performance slowdowns
- The impact to the business’ brand value
Step 2: Improve Cycle Times
With the common business perception that legacy IT is too slow to deliver, improved cycle times are imperative. This requires a solution that can accelerate the following, and of course do so risk free:
- Virtualisation and consolidation
- Refresh projects
- New Application and Service Roll outs
- Private Cloud initiatives
Step 3: Reduce Cost
The last and most obvious step also presents the biggest challenge, especially as the last thing a new CIO can customarily do is ask for a large investment to implement their new IT strategy. The business will quickly recognize a CIO's success if they can prove that during their tenure they reduced CapEx and OpEx as well as total cost of ownership.
It is at this point imperative to remember that a CIO should not be concerned with buying technology from different silos and vendors, but instead with acquiring solutions that solve business problems. Long gone are the days when it was acceptable for a CIO to proudly boast about the magnitude of their data centres and the large accumulation of technology intended to ensure everything was fully redundant. Instead the key drivers are simplification, standardization and consolidation. This is where the concept of VCE's Vblock is key to a CIO's success.
Infrastructure more often than not doesn't carry the same sassiness or prominence to the business as a key application such as SAP, but infrastructure is in essence the heart and soul of a business. If the server or storage goes down, the application won't work, which ultimately means you cannot ship and sell your product. This is why the three steps to CIO success are linked to a successful infrastructure.
How to Eliminate Risk:
An integrated stack should entail a robust disaster recovery and business continuity solution that can not only be tested and proven but also implemented and run with minimum complication.
This should also incorporate the de-risking of application migrations from physical to virtual platforms and more specifically key applications that the business depends on.
Moreover this means a de-risked maintenance and operational procedure for the IT environment that is pretested, prevalidated and predictable and consequently eliminates any unplanned downtime.
In the past eliminating risk in this way has resulted in countless testing and validation procedures where every minute spent testing is a minute spent not growing the business. A true converged infrastructure can immediately resolve this.
How to improve Cycle Time:
Delivering a predefined, pre-integrated stack, in essence a plug-and-play data centre that is delivered and built fit for purpose in typically only 30 days, can quickly achieve this by cutting typical infrastructure delivery times by three months. Having proven infrastructure in minimal time allows application owners to roll out new services in a fraction of the time, and consequently at a fraction of the cost.
How to Reduce Cost:
The key to this is to link any proposed investment to a tangible ROI that spans at least three years. Most vendors have made the mistake of basing ROI on virtualizing an entirely physical infrastructure; this rarely works, as most organisations have already virtualised to some extent. Instead, an incremental value needs to be formulated that is linked to the virtualisation of key business-critical applications.
Additionally, with an integrated solution across the stack there is no need to manage multiple components of an infrastructure, and consequently multiple failure points that preoccupy multiple silos. This requires changing the mindset of technology as a break/fix, reactive organisation where heroes are rewarded for extinguishing fires. Instead, a proactive and preventive methodology with an "always on" culture can be adopted.
By streamlining the workforce to do more with less, in coordination with the application teams, OpEx savings can quickly be achieved by redeploying money from back-end infrastructure to front-office, revenue-enhancing business value and productivity.
To conclude, technology's role is to enable the business. Succeeding in the three aforementioned steps enables a CIO to quickly enable the business... and it may also allow them to stay in the job that little bit longer.
- With Vblock you instantly get the infrastructure, but to take it further up the stack we are demonstrating how we integrate with the M&O solutions. We'll be demonstrating how Cisco Intelligent Automation for Cloud, VMware vCloud Director, VMware vCenter Orchestrator and VMware vCenter Operations can integrate with the Vblock to provide end users that Public Cloud experience.
I founded the company together with other people who, one by one, fell by the wayside, and I have realised that very often today people are simply looking for a salary, a paycheck, without truly challenging themselves.
We then ask ourselves why other countries in Europe and overseas keep growing while Italy has been standing still for some time.
The fault is ours: for a long time I have not seen in the eyes of colleagues and collaborators the fire of learning, the thirst for knowledge, the curiosity for what is new.
Every day, every evening, every moment is a good one to grow as a person.
I am certain there are still many people who truly challenge themselves.
I hope to meet some of them at VMworld 2012.
Love it or hate it, ITIL and Change Management will always be an integral part of any IT set up, with regulations such as Basel II, FISMA, SOX (Sarbanes-Oxley) and HIPAA constantly breathing down the necks and consciences of organization leaders. Having once had a "purple badge" wearing ITIL guru for a manager, it always fascinated me how he'd advocate the framework as the solution to all our IT problems. While he'd harp on about defining repeatable and verifiable IT processes, it always ended up being theoretical as opposed to practical, often emphasized by his own IT competency: "Err, Archie, how do I save this Word document and what on Earth is that SAN thing you keep going on about?"
That distinction between theory and practice was never more apparent than in the almost pointless CAB (Change Advisory Board) meetings that took place on a weekly basis. While the Change Management processes themselves were painfully bureaucratic and often a diversion from doing actual operational work, the CAB meetings were a surreal experience. With barely anyone in attendance, the CAB would ask for a justification for each change and respond with "approved" or "rejected", when it was clear they had little or no idea of the technical explanation or implication that had been given to them.
Then there was the Security/Risk Compliance chap who'd lock himself in his room glued to his Tripwire dashboard, carefully spying on any unapproved changes. Such was his fascination with Tripwire that he too barely attended the CAB meetings, indirectly underlining both his lack of trust in the Change Management system and its lack of relevance. So imagine his amazement when I introduced him to a new product we had implemented for our WINTEL environment called VMware and its feature VMotion. The fact that I had been seamlessly migrating VMs across physical servers without raising a change, and without him being able to pick it up on Tripwire, sent him into a perplexed frenzy. Somewhat amused by his constant head shaking, I decided to disclose that I had also been seamlessly migrating LUNs across different RAID groups with HDS' Cruise Control to get more spindles working, upon which, like Batman, he rushed back to his cave to check whether "Big Brother" Tripwire had picked it up. Was I really supposed to raise a change for every VMotion or LUN migration?
Several years later, after moving from being a customer to a technical consultant, my impression of the effectiveness of the CAB failed to improve. Midweek and late in the day in the customer's data center with their SAN architect, I pointed out that they had cabled up the wrong ports in their SAN switches and that this would require a change to be raised. "No need for that," replied the SAN architect, "I'm one of the CAB members." Then, to my shock and in true Del Boy fashion, he duly proceeded to pull out and swap the FC cables to his production hosts with a big grin on his face. Several minutes later his phone rang, to which he replied, "It's okay, I've resolved it. There was a power failure on some servers." Then with a cheeky grin, a swing of the head and a wink of an eye, he turned to me and said, "There you go sorted, lubbly jubbly!"
While my initial skepticism of ITIL's practicality was centered on my personal experiences, it was only reinforced by the number of long white-bearded external auditors who would supposedly check whether proper controls existed within the many firefighting and cowboy organizational procedures I witnessed. Like a classroom of kids hearing the teacher coming up the corridor and scurrying to their desks to present a fabricated impression of discipline and order, I never ceased to be astounded by the last-minute changes and running around of our compliance folk to ensure we successfully passed our audits. Despite having more daily Priority 1s than the canteen was serving decent hot meals, we still inexplicably passed every audit with flying colours, which in turn emboldened the rogue "under the radar" operational practices that served to keep the lights on.
So with such a tarnished experience of ITIL, it was with great curiosity and interest that led me to look closer at the movement and initiative of ITPI’s Visible Ops. While still mapping its ideas to ITIL terminology, the onus of Visible Ops is on increasing service levels, decreasing costs and increasing security and auditability. In simplest terms, Visible Ops is a fast track / jumpstart exercise to an efficient operating model that replicates the researched processes of high-performing organizations in just four steps.
To summarise, the first of these four steps is termed Phase 1, or "Stabilize the Patient". With the understanding that almost 80% of outages are self-inflicted, any change outside of scheduled maintenance windows is quickly frozen. It then becomes mandatory for problem managers to have any change-related information at hand so that when that 80% of "unplanned work" arises, a full understanding of the root cause is quickly established. This phase starts with the systems and business processes responsible for the greatest amount of firefighting, with the aim that once they are resolved they free up work cycles to initiate a more secure and measured route for change.
Phase 2, which is termed “Catch & Release” and “Find Fragile Artifacts”, is related to the infrastructure itself with the understanding that it cannot be repeatedly replicated. With an emphasis on gaining an accurate inventory of assets, configurations and services, the objective is to identify the “artifacts” with the lowest change success rates, highest MTTR and highest business downtime costs. By capturing all these assets, what they’re running, the services that depend upon them and those responsible for them, an organization ends up in a far more secure position prior to a Priority 1 firefighting session.
Phase 3, or "Establish Repeatable Build Library", is focused on implementing an effective release management process. Using the previous phases as a stepping stone, this phase documents repeatable builds of the most critical assets and services, making them more cost-effective to rebuild than to repair. In a process that leads to an efficient mass-production of standardized builds, senior IT operations staff can transform from a reactive to a proactive release management delivery model. This is achieved by operating early in the IT operations lifecycle, consistently working on software and integration releases prior to their deployment into production environments. At the same time, a reduction in unique production configurations is pushed for, increasing configuration lifespans prior to their replacement or change, which in turn improves manageability and reduces complexity. Eventually the output of these repeatable builds is a set of "golden" images that have been tried, tested, planned and approved prior to production. When new applications, patches and upgrades are released for integration, these golden builds or images merely need updating.
The fourth and last phase, entitled “Enable Continuous Improvement” is pretty self explanatory in that it deals with building a closed loop between the release, control and resolution processes. By completing the previous three phases, metrics for the three key process areas (release, controls and resolution) are focused on, specifically those that can facilitate quick decision making and provide accurate indicators of the work and its success in relation to the operational process. Drawing on ITIL‘s resolution process metrics of Mean Time Before Failure (MTBF) and Mean Time to Repair (MTTR), this phase looks at Release by measuring how efficiently and effectively infrastructure is provisioned. Controls are measured by how effectively the change decisions that are made keep production infrastructure available, predictable and secure, while Resolution is quantified by how effectively issues are identified and resolved.
So while these four concise and particular phases look great on paper, what really differentiates them from being just another theoretical process that fails to be delivered comprehensively in practical reality? If the manner in which IT is procured, designed, configured, validated and implemented remains the same, there is little if any chance for Visible Ops to succeed much further than the Purple Badge lovers of ITIL did. But what if the approach to IT, and more specifically its infrastructure, were to change from the traditional buy-your-own, bolt-it-together-and-pray-that-it-works method to a more sustainable and predictable model? What if the approach to infrastructure was a greenfield approach or a seamless migration to a pretested, pre-validated, pre-integrated, prebuilt and preconfigured product, i.e. a true Converged Infrastructure? What impact could that have on the success of Visible Ops and the aforementioned four phases?
If we look at phase 1 and “stabilizing the patient” this can be immediately achieved with a Vblock where an organisation no longer has to spend time investigating and worrying about the risk and impact of change. By having a standardized product based approach as opposed to a bunch of components bundled together, thousands of hours of QA testing and analysis work can be performed by VCE for each new patch, firmware upgrade or update on a like for like product that is owned by the customer. With this acting as the premise of a semi-annual release certification matrix that updates all of the components of the Converged Infrastructure as a comprehensive whole, risks typically associated with the change process are eliminated. Furthermore as changes are dictated by this pre-tested and pre-validated process and need to adhere to this release certification matrix to remain within support, it helps eradicate any rogue based changes as well as inform problem managers comprehensively of the necessary changes and updates. Ultimately phase 1’s objective of stabilization is immediately achieved via the risk mitigation that comes with implementing a pre-engineered, pre-defined and pre-tested upgrade path.
The challenge of phase 2, which in essence equates to an eventual full inventory of the infrastructure, is a painful process at the best of times especially as new kit from various vendors is constantly being purchased and bolted on to existing kit. Moving to a Vblock simplifies this challenge as it’s a single product and hence a single SKU at procurement. Akin to purchasing an Apple Macbook that is made up of many components e.g. a hard drive, processor, CD-ROM etc., the Converged Infrastructure’s components are formulated as a whole to provide the customer a product. The parts of the product and all of their details are known to the manufacturer i.e. VCE and can easily be transferred as a single bill of materials to the customer with serial numbers etc. thus ensuring an up to date and accurate inventory and consequently simplified asset management process. When patches, upgrades and additions of new parts and components are required they are automatically added to the inventory list of the single product, thus ensuring up to date asset management.
The Release Management requirement of Phase 3 offers a challenge that is not only embroiled with risk but also takes up a significant amount of staff and management time cycles to ensure that technology and infrastructure remain up to date. This entails the rigmarole of downloading, testing and resolving interoperability issues of component patches and releases and relies heavily on the information sharing of silos as well as the success of regression tests. The unique approach of a Vblock meets this challenge immediately by making pre-tested, validated software and firmware upgrades available for the end user enabling them to locate releases that are applicable for their Converged Infrastructure system. With regards to the rebuild as opposed to repair approach stipulated in phase 3, because a Vblock can be deployed and up and running in only 30 days, the ability to have a like for like standardized infrastructure for new and upcoming projects is a far easier process compared to the usual build it yourself infrastructure model. On a more granular level, by having a management and orchestration stack with a self service portal, golden image VMs can be immediately deployed with a billing and chargeback model as well as integration with a CMDB. The result is a quick and successful attainment of phase 3 of the Visible Ops model via a unified release and configuration management methodology that is highly predictable and enhances availability by reducing interoperability issues.
Measuring the success of metrics such as MTTR and MTBF as detailed in Phase 4 is ultimately linked to the success of the monitoring and support model that’s in place for your infrastructure. With a product based approach to infrastructure the support model will also be better equipped to ensure continuous improvement. Having an escalation response process that is based on a product, regardless if resolving a problem requires consultation with multiple experts or component teams, ultimately means a seamless and single point of contact for all issues. This end-to-end accountability for an infrastructure’s support, maintenance and warranty makes the tracking of issue resolution and availability a much simpler model to measure and monitor. Furthermore with open APIs that enable integration with comprehensive monitoring and management software platforms, the Converged Infrastructure can be monitored for utilization, performance and capacity management as well as potential issues that can be flagged proactively to support.
As IT operational efficiency becomes more of an imperative for businesses across the globe, the theoretical practices that have failed to deliver are either being assessed, questioned or, in some cases, continued with. What is often overlooked is that one of the key and inherent problems is the traditional approach to building and managing IT infrastructure. Even a radical and well-researched approach and framework such as Visible Ops will eventually suffer, and at worst fail, if the IT infrastructure that the framework is based on was built by the same mode of thinking that created the problems. Fundamentally, whether the Visible Ops model is a serious consideration for your environment or not, by adopting the framework with a Vblock, the ability to stabilize, standardize and optimise your IT infrastructure and its delivery of services to the business becomes a lot more practical and consequently a lot less theoretical.
VMworld 2012 is ready to hit Barcelona, October 9-11. And with this conference, comes a huge opportunity to participate in VMworld social media -- before, during and after the conference. Be sure you are connected to as many social media channels and opportunities as possible to get the most out of the VMworld experience.
If you are planning to participate in social media at VMworld 2012 in Barcelona, whether attending the conference or following online, please consider these resources:
View or download the VMworld 2012 (Barcelona) Social Media & Community Guide
Here is a summary of social media pages/resources on VMworld.com:
VMworld Discussion Forum (Note: use this resource to post conference Q&A)
VMworld Social Media
VMworld on Twitter
VMworld Blogger Coverage
VMworld News Coverage
VMworld TV on YouTube
VMworld Community Tech Talks
Be sure to register today, book your transportation & hotel and prepare for VMworld 2012 in a premier location -- Barcelona! Do you need to convince your manager to let you attend? Try the VMworld Europe ROI Letter to help your boss understand the importance of sending you.
Once registered and booked, be sure to plan and build your schedule using schedule builder and content catalog. And share your questions and comments with other attendees using VMworld discussion threads or social media channels.
For a full video recap of VMworld 2012 San Francisco, read the VMworld 2012 Video Roundup
VMworld 2012 was packed with great video content. As many of you are writing blogs and doing summary reports, I'd like to highlight some of this content available now. Please spend time on VMworld TV, where we have most of this content hosted for your one-stop destination.
Day 1 (Monday): Paul Maritz and Pat Gelsinger shared how VMware is helping customers and partners thrive in the Cloud era. Steve Herrod discussed and demonstrated technology at the heart of the software defined datacenter.
Day 2 (Tuesday): Steve Herrod and VMware partners took the stage to demonstrate state of the art technology that is transforming IT and enabling the mobile workforce.
CEO Roundtable: Panel of executives discussing the future of technology at work.
VMware NOW and the new VMware vCloud Suite
Visit VMware NOW, our new year-round site for VMware product announcements, currently featuring the new VMware vCloud Suite announced Monday, Aug 27th at VMworld. Watch dozens of overview videos highlighting vCloud Suite, including new features of vSphere 5.1 and vCloud Director 5.1.
Watch the up-close series of videos from the VMworld TV crew as they shared conference highlights, interviews and demos from Moscone Center in San Francisco. If you didn't get to attend the conference, these videos offer a bit of the VMworld experience -- or if you did attend, these videos should help summarize the epic week we all shared.
Watch the in-depth series of VMworld interviews presented by theCUBE (via SiliconAngle and Wikibon). These interviews include key executives, thought-leaders, technologists, community experts and industry panels.
Whether unable to attend VMworld, or unable to spend time in the Hang Space during the conference, the Tech Talks were one of the best offerings to get up close and in-depth with technical experts in the community. Watch this series of quick sessions that were recorded on LiveStream.
Top 10 "Most Popular" Sessions
We've posted the top 10 sessions (most popular based on schedule builder demand) for viewing on vmworld.com.
VMworld Community and Partner Videos
We have collected and aggregated over a hundred videos from attendees, community members and partners during VMworld so far. Take some time to browse through these videos that share the experiences, interviews, testimonials and demos from the conference.
There is much more content to discover on VMworld TV, including archives from past conferences. So be sure to spend some time on our YouTube channel. And get set for more videos coming soon from VMworld 2012 Barcelona!
Many performance issues can be a result of customer perception as opposed to real performance problems. What a customer may consider as being “slow” or “performing poorly” may actually be the maximum they can achieve on their system. The main task at hand is identifying if a performance issue actually exists. To do this it is vital to thoroughly understand the performance issue, isolate the issue, then resolve it if applicable.
Understanding the Performance Issue:
This is the most important part of approaching a performance SR: the more you understand and know about the issue, the easier it will be to resolve. Doing this will involve asking lots of questions. Performance issues are almost always poorly described, so a common troubleshooting trap is arriving at incorrect conclusions. Here are sample questions that could be asked if a customer opens an issue such as "My VM performs very poorly":
- How are you measuring performance on your system?
- Is this issue happening on one VM or multiple VMs?
- Has the performance ever been acceptable? If so, when did the issue start occurring?
- How do you reproduce the performance issue and what are you comparing the performance to?
- Do you have any benchmark figures that you can provide when the performance is poor against when the performance is good?
- Is the performance issue fully reproducible or is it intermittent?
- If intermittent, do you notice any pattern in what happens before or during the issue?
- Do you know if the issue may be related to CPU, Memory, Network or Disk?
If there is anything you are unsure of in the customer's reply, ask for further clarification to ensure you fully understand the issue (imagine that you are trying to reproduce it in-house and consider what information you would need). Otherwise it will take much longer to troubleshoot and resolve the issue.
While most issues can be progressed much further with a WebEx session, with performance SRs it is not always possible to reproduce the issue at that point in time. In those cases the best approach is to ask for "vm-support -s" logs.
The "vm-support -s" collection should be started and finished during the test so as not to skew the sample of snapshots collected. You can specify -d and -i if you wish to change the default length of time the script runs and the interval at which the samples are collected.
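For example, to collect performance snapshots for ten minutes at 30-second intervals while the customer reproduces the problem, a command along these lines can be run on the ESX console (the durations are purely illustrative):
vm-support -s -d 600 -i 30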
Isolating The Performance Issue:
Performance can be very sensitive, so the best approach is to isolate and eliminate any obvious and general issues, e.g. unsupported hardware, incorrect configurations, or VMkernel error messages at the time of the issue. Having done that, one then needs to investigate in which of the four main resource areas the bottleneck resides: CPU, Memory, Network or Disk. There may be a problem in more than one area, so it is important to isolate and thus eliminate each resource.
The Four Main Resources:
99.9999999999% of customers assume that the more vCPUs you give to a VM, the better it will perform. This is incorrect. Allocating more vCPUs than necessary can lead to time lost during CPU scheduling on physical CPU cores. For example, a VM with 4 vCPUs needs to be scheduled on 4 physical CPU core slots that are available at the same time, while a VM with only 1 vCPU needs to be scheduled on just 1 core. This does not mean you cannot use 4 vCPUs; some applications are very CPU intensive and need multiple CPUs, but most applications work very well with 1 or 2 vCPUs. We recommend starting with 1 vCPU, monitoring CPU usage and then increasing the CPU count if necessary. For example, if a VM's application is using 70-90% of CPU at peak times, that application may work better in an SMP ( multiple CPU ) configuration. It also depends on whether the application is single- or multi-threaded. The best approach is to increase to 2 vCPUs and test whether the load is balanced over both.
Note: When increasing CPU count from 1 to 2 in a VM it is important that the HAL is correct in a Windows VM. i.e. If the VM is using 1 CPU the HAL should be uni-processor, if it is using 2+ CPUs then it should be multi-processor.
Watch out for the following:
- Check the stepping of the CPUs to see if they match: ( cat /proc/vmware/cpuinfo )
- Check for any interrupt sharing with USB devices. See KB 1290.
- Check if any CPU reservations or limits have been set: ( less <path to .vmx file> )
- In esxtop ( default screen after the command is run ) high USED values for a VM indicate the VM is running out of CPU resources ( the number of vCPUs may need to be increased, or the process causing high CPU needs to be identified )
- In esxtop, high RDY values ( above 5% ) indicate the VM is ready to run CPU instructions but cannot schedule a slot on the physical cores. It may be necessary to reduce the number of vCPUs for the VM or, if that is not applicable, check whether the ESX Server is overloaded by looking at the PCPU(%) value for each core ( a batch-mode capture, sketched after this list, can help record these values over time ).
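For intermittent problems, esxtop can also be run in batch mode to record these counters for later review. The sampling interval ( -d, in seconds ) and iteration count ( -n ) below are only an example; adjust them to cover the window in which the problem occurs.
esxtop -b -d 5 -n 120 > /tmp/esxtop-cpu.csv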
Useful KBs related to CPU Performance:
- Virtual machine CPU usage spikes and remains abnormally high after VMotion in a VMware DRS enabled cluster ( 1003638 )
- ESX Server Reports Increased CPU Utilization for Idle Microsoft Windows Server 2003 SP1 SMP Virtual Machines ( 1730 )
- Abnormal CPU spike with Anti-virus installed in virtual machine ( 1002262 )
- High CPU in Windows 2003 VM After P2V ( 1007172 )
- ESX server running high on CPU load. performance issues ( 1005329 )
- Virtual machine does not power on and there is high CPU reservation ( 1001637 )
- CPU Utilization Peaks After Installing Dell OpenManage ( 1004508 )
- Incorrect CPU mask causes high CPU utilization (poor performance) inside of the VM (Citrix server) ( 1005752 )
- Dell CPU overload ( 1007514 )
- IRQ Sharing Might Impact Performance ( 1290 )
Just like CPU, one should assign only the necessary amount of memory to a VM. Overallocating memory is not advised and can cause performance issues. The best way to check memory usage for a VM is to open Task Manager within the guest OS and look at the PF Usage graph on the Performance tab. To see what the VMkernel observes, run esxtop, type m to navigate to the memory screen, and look at the TCHD field to view the current active memory within the guest. Please note that the VMkernel can only see active memory within the VM and not the total amount of memory consumed by the guest, so looking at Task Manager is the best way to estimate total memory usage.
If the VMkernel observes that a VM is running low on memory, it will use the memory ballooning technique to borrow memory from an idling VM and lend it to the VM that needs it. This has only a very small impact on performance. It is therefore vital to ensure that the VMware Tools are installed and running on the VM so that the memory balloon driver ( memctl ) is active.
To check if a VM is ballooning you can do the following within esxtop:
Type f for Field and then select I for Memory Ballooning Statistics ( MCTL ).
MCTL: This indicates if the Balloon Driver within the VM is installed.
MCTLSZ: Displays the amount of memory being ballooned.
MCTLTGT: Displays the amount of memory the ESX Server would like to reclaim by way of ballooning.
MCTLMAX: The max memory that can be reclaimed via ballooning.
To check if a VM is swapping memory ( this can lead to very poor performance ) you can do the following within esxtop:
Type f for Field and then J for SWAP statistics.
SWCUR: Shows the current swap usage.
SWTGT: What the ESX Server expects the swap usage to be.
SWR/s: Rate at which memory is being swapped from disk.
SWW/s: Rate at which memory is being swapped to disk.
NOTE: If a VM is swapping memory, it is running low on memory resources, so ensure enough memory is assigned and that a reservation below the assigned memory has not been set.
Watch out for the following:
- Check if any memory reservations or limits have been set:
( less <path to .vmx file> )
- If the ESX Server is using NUMA nodes, check that all nodes are evenly balanced ( compare the MachineMem values below ); otherwise performance issues can occur.
# cat /proc/vmware/NUMA/hardware
System type : Unspecific System
# NUMA Nodes : 2
Node ID MachineMem ManagedMem CPUs
0 00 2047 MB 1736 MB 0 1
1 01 2048 MB 2021 MB 2 3
- If the Service Console is swapping memory ( you can see this from top output, or with the example after this list ), it may be because third-party agents are installed and running on the Service Console ( esxupdate -l query ). VMware advises increasing the memory for the Service Console to 800MB ( the maximum ) in such a situation for best COS performance.
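Since the Service Console is a standard Linux environment, its memory and swap usage can be checked with the usual tools; for example, the following reports total, used and swapped memory in megabytes:
free -m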
Useful KBs related to Memory Performance:
- Balloon driver not releasing memory, causing virtual machine performance issues and guest garbage collection (1003470 )
- Virtual machine boots very slowly when Memory Limit is less than Physical Memory assigned ( 1002843 )
- Investigating operating system memory usage ( 1004014 )
- ESX Server Memory Management on Systems with AMD Opteron Processors ( 1570 )
- ESX Server 3.5 may see physical memory above 64GB as memory in use ( 1003551 )
- Understanding Memory Active and Memory Usage indicators in ESX 3 ( 1002604 )
- ESX Virtual Machine Won't Start (Insufficient Memory) ( 1330 )
- Red Hat 4 U4 64-Bit Guests Kernel Panic, Out of Memory ( 1001093 )
Network performance issues are often a result of contention caused by high load and the number of users/worlds, and can also be limited by the throughput of the specific physical card.
Run esxtop and type n to navigate to the Network screen. On the screen you will see references to VMs, vSwitches, physical network cards and the Service Console interface. To see the current throughput of a network card, combine the values of MbTX/s and MbRX/s for a specific vmnic. You can do the same for a VM to calculate its current throughput.
Also watch out for dropped packets via the DRPTX and DRPRX fields. Dropped packets can be a result of too high a load, or else can be down to a physical issue with the card, cable or port. All of these factors should be investigated if values are seen in these fields.
How to Calculate Maximum Network Throughput:
Run netserver.exe on one system and netclient.exe on the other system to arrive at a throughput value between the two systems.
By default, when you run Netserver, it uses port 12865. With the -p switch, you can specify a different port for the command to use. Once executed, Netserver will continue to run until a client connects to it.
Once you have the netserver.exe file on one system and the netclient.exe file on the second system, perform these steps to test throughput:
1. On the first system, open a command prompt and run netserver.exe (you must run the command from the folder where it resides).
2. From the second system, run netclient.exe -H <hostname or IP of the first system> from the command prompt.
Wait a few seconds and the throughput information will be displayed on the system that ran Netclient. You will also see the netserver command execution terminate automatically on the first system.
Here is a sample of the command and resultant output from running Netclient:
G:\Netperf>netclient -H fs1.mcpmag.com
TCP STREAM TEST to fs1.mcpmag.com
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
8192 64512 64512 10.00 97.30
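For context, the Throughput column is reported in units of 10^6 bits per second, so the 97.30 in this sample works out to roughly 97.30 / 8 ≈ 12 megabytes per second, which is about the practical limit of a 100 Mb/s link. Compare the measured figure against the expected capability of the NICs and switch ports involved before concluding that throughput is abnormally low.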
Watch out for the following:
- Do the vmnics report speed and duplex settings that match the expectation of the hardware? ( esxcfg-nics -l ). Hardware connectivity issues might cause a vmnic to autonegotiate a lower speed or half duplex mode.
- Ensure the VMware Tools are installed and running on a VM so that one can make use of the vmxnet and enhanced vmxnet drivers where possible.
- To reduce network contention, it is always best practice to separate Service Console networking from virtual machine networking via a dedicated vSwitch for the COS ( the listing example after this list shows how to confirm the current layout ).
- Load balance across vmnics where possible, ensuring the appropriate load balancing algorithm is being used, e.g. use IP Hash for load balancing when using EtherChannel on your physical network; in most other cases use Route Based on Originating Port ID.
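To review the current vSwitch, port group and uplink layout referenced in the checks above, the configuration can be listed from the ESX console:
esxcfg-vswitch -l
esxcfg-nics -l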
Useful KBs related to Network Performance:
Network performance issues ( 1004087 )
Configuring the speed and duplex of an ESX Server host network adapter ( 1004089 )
Low Network Throughput in Windows Guest when Running UDP Application ( 5298153 )
Choosing a Network Adapter for Your Virtual Machine ( 1001805 )
Ethernet 1000 MB Auto-Negotiation ( 1006763 )
Configuring promiscuous mode on a virtual switch or portgroup ( 1004099 )
NIC teaming in ESX Server ( 1004088 )
Storage performance issues can often be the result of a misconfiguration, or may simply reflect the maximum performance of the physical storage. Using esxtop, one can view disk adapter statistics by pressing d.
By typing v we can view the disk statistics for each VM.
To see statistics for each disk device type u.
One can also expand a device to see all running VMs by typing e and the full device name e.g. vmhba0:1:0
ACTV: The number of commands that are currently active. A constant number of commands here is a healthy sign and indicates continuing disk activities.
QUED: This shows the number of queued commands that the host will process after ACTV commands have finished. A constant number here indicates a heavily loaded system.
%USD: This shows the percentage of the queue depth used by active commands. High values here indicate that commands are likely being queued, and you may need to adjust the queue depths of the system's HBAs.
LOAD: This counter provides an estimate of the use of a single HBA. It represents the ratio of the number of commands that are active or queued to the total number of commands that can be active or queued at one time. A LOAD value of 1.0 means that both the active buffer and queue are full. At this point, the server begins failing to execute commands.
MBREAD/s and MBWRTN/s: These represent megabytes read per second and megabytes write per second. Combine both values to see the current throughput of a device or a VM.
Latency statistics for the disk can be selected by typing f for Field and j for Latency Stats ( ms ). Look at DAVG/cmd for possible disk issues causing latency. Large values here ( above roughly 30-50 ms ) indicate an issue. Storage vendors provide latency figures for their hardware that you can check against the values reported here.
Two more important fields to look out for are ABRTS/s and RESETS/s.
These can be selected by typing f for Field and k for Error Stats.
Values here are a sign that the storage subsystem is unable to handle requests as the guest OSs expect. One may need to upgrade hardware, redesign storage, or reconfigure and relocate VMs.
How to Calculate Maximum Storage Throughput and Max I/O for a VM:
NOTE: Values below may change depending on what you are testing.
Please download the free tool Iometer and install it on one of the virtual machines, where we will run the test.
We need to establish the baseline speed for max throughput and max i/o with your configuration.
To do this we must run the following two iometer tests.
To Test Disk Performance:
· Double-click on Iometer.exe. The Iometer main window appears, and a Dynamo workload generator is automatically launched on the local computer.
· Click on a manager (the name of the local computer) in the Topology panel on the left side of the Iometer window. The manager’s available disk drives appear in the Disk Targets tab. Blue icons represent physical drives; they are only shown if they have no partitions on them. Yellow icons represent logical (mounted) drives; they are only shown if they are writable. A yellow icon with a red slash through it means that the drive needs to be prepared before the test starts; see the Disk Targets Tab - Reference section for more information on preparation.
· In the Disk Targets tab, select a disk or disks to use in the test (use Shift-click and Control-click to select multiple disks). The selected disks will be automatically distributed among the manager’s workers (threads).
· Switch to the Access Specifications tab. Double-click on “Default” in the Global Access Specifications list (the one with the globe icon). The Edit Access Specification dialog appears.
· The Edit Access Specification dialog shows you how the disk will be accessed.
The default is 2-kilobyte random I/Os with a mix of 67% reads and 33% writes, which represents a typical database workload. You need to set these parameters according to the test type, maximum throughput or maximum I/O rate, as described in the two tests below.
Press OK to close the dialog when you are through.
For maximum throughput (megabytes per second), change the Transfer Request Size to 4MB, the Percent Read/Write Distribution to 100% Read, and the Percent Random/Sequential Distribution to 100% Sequential. Set the number of outstanding I/Os per target to 16.
Run the max throughput test for 6 minutes.
While the max throughput test is running on the virtual setup, run the following script on the ESX console as user root.
vm-support -n -s -d 300 -i5
For the maximum I/O rate (I/O operations per second), change the Transfer Request Size to 512 bytes, the Percent Read/Write Distribution to 100% Read, and the Percent Random/Sequential Distribution to 100% Sequential.
· Switch to the Results Display tab. Set the Update Frequency to 10 seconds.
· Press the Start Tests button (green flag). A standard Save File dialog appears.
Select a file to store the test results (default results.csv).
· After 10 seconds the first test results appear in the Results Display tab, and they are updated every 10 seconds after that. Press the button to the left of each bar chart for a menu of the different results you can display. You can also drag a worker or manager from the Topology panel to a bar chart to see the results of just that worker or manager.
· Press the Stop Test button (stop sign). The test stops and the final results are saved in the results.csv file. This is a comma-separated text file that can be viewed in any text editor or imported into a spreadsheet.
Run the max I/O rate test for 5 minutes.
While the max throughput test is running on the virtual setup, run the following script on the ESX console as user root.
vm-support -n -s -d 300 -i5
Watch out for the following:
- Ensure battery-backed write cache is enabled on the storage array; it results in greatly improved disk performance.
- When possible, use the LSI Logic SCSI controller, as it has a greater queue depth.
- Increase the number of outstanding disk requests for a VM by adjusting the corresponding advanced setting ( a hedged example follows this list ).
- Increase the queue depths for the HBAs if possible.
- Ensure that all firmware is up to date for any storage cards/devices.
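As a sketch of the outstanding-requests adjustment mentioned above: on ESX 3.x/4.x hosts the advanced setting is generally Disk.SchedNumReqOutstanding, which can be read and set from the console. The value 64 below is purely illustrative and should normally match the queue depth configured on the HBA.
esxcfg-advcfg -g /Disk/SchedNumReqOutstanding
esxcfg-advcfg -s 64 /Disk/SchedNumReqOutstanding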
Useful KBs related to Disk Performance:
- Poor disk write performance to VMFS filesystem ( 1001793 )
- Disk I/O problems and poor performance with LeftHand NSM160 iSCSI storage array ( 1001060 )
- Limited Disk Throughput from Windows with BusLogic Adapter Causes Performance Problems ( 1890 )
- Write-cache disabled on storage array causing performance issues or failures ( 1002282 )
- Testing virtual machine storage I/O performance ( 1006821 )
That's a lot of writing.
Thanks to everyone who contributed and prompted me to write this blog.
Last week, Ombud sent two lucky representatives - myself and Traci Johnson - to VMworld, an annual conference for VMware users, for the first time. This post chronicles our days through pictures and highlights some humorous tweets from other attendees' shared experiences.
Monday: VMworld 2012 from our Office
What better way to kick off a virtualization conference than watching the keynote presentation live from the luxury of our conference room 1,000 miles away? Our trip to VMworld began before we left the office. Thanks to VMworld’s live stream, we tuned into Monday’s keynote presentation from Boulder, CO.
Kevin McLaughlin, Senior Editor of CRN, summed up the keynote in less than 141 characters.
Please keep in mind, however, our plane didn’t land at SFO until Tuesday morning.
Tuesday: VMworld 2012 from Moscone
We arrived at VMworld – now able to see people in 3D – just in time for lunch on Tuesday. We enjoyed eating outside in the San Francisco sunshine after a morning of travel. Theron Conrey, Senior Solutions Engineer at Nexenta Systems shared a picture of the seating situation – a rustic break from high-tech virtualization and cloud discussions.
We wasted no time hitting the expo floor, first talking to the lovely Kerri and Emily at Red Hat's booth. A red hat from Red Hat – I’m a fan of clever swag. The cheesiness factor is an added bonus.
Wednesday: Ombud live at VMworld Day 2
We’ve recently partnered with Virtualization Review (that magazine in your new VMworld backpack!). We had the opportunity to walk the floor with a few of their great sales guys and meet the exhibitors.
One of my favorite personal takeaways from chatting with exhibitors: I’m not the only one who has “street Chucks” and “dress Chucks.” That’s right. It’s officially a thing. Just ask Bluelock; one of their crafty tech guys keeps his looking like new with whitewall tire cleaner – brilliant!
Amitabh, Senior Product Manager at VMware, pointed out something that we noticed too… suits?
#vmworld is great, huge and very slick but has lost the touch of woodstock with suits replacing geeks amongst the crowd— Amitabh Chakrabarty (@amitabhc) August 31, 2012
On the upside, we also talked with more startups than expected. Can’t wait to see where the upcoming year takes us in virtualization.
Amazing to see how many small storage startups there were at #VMworld. Who will shine and who will fail?— Steven Bryen (@steven_bryen) September 4, 2012
Lessons from Sessions
Maybe these weren’t from sessions. They’re probably not lessons, per se, either… maybe more like “VMworld Words of vWisdom.”
“Storage is like plumbing” - No, network is like plumbing, storage is like a septic tank.#vmworld— Dave Lawrence (@thevmguy) August 28, 2012
BYOD = SYOM (Spend Your Own Money) #VMworld— Scott Lowe (@scott_lowe) August 28, 2012
It’s true. You can, however, sprinkle fairy dust on Bon Jovi to turn him into the lead singer of a cover band.
Wednesday Night: VMworld party
See what happens when you cut vRam, you get a Bon Jovi cover band :) #VMworld— Dennis Smith (@DennisMSmith) August 30, 2012
It’s easier to forgive his Bob-Seger-cover encore when we remember violinist Lorenza Ponce’s phenomenal solo performance. Seriously, can we get another virtual round of applause for her? She rocks! Thus ends our first tour at VMworld and so begins the year of the “software-defined-datacenter” buzzword.
Overheard @ #VMworld:“What's your new company do?” “Can’t tell you yet, but it will be software-defined something.” :)itknowledgeexchange.techtarget.com/storage-soup/v…— Mike Harding (@mhardi01) August 31, 2012
Additional Recap Resources:
Here we are on the last day of VMworld 2012 in beautiful San Francisco, and we've rounded out the conference coverage with the wrap-ups from the last two days of the conference.
From the VMworld Blog ~> The Power of Partners at VMworld 2012
Conference recaps of some of our Diamond and Platinum sponsors: Dell, EMC, Symantec, VCE
From the Virtual Reality Blog ~> VMworld 2012: Real Customers, Real Momentum
Reflection on VMware as its own cloud customer and success story
Credit to our own Manoj Jayadevan, for great photos he took of Jon Bon Jovi and the Kings of Suburbia who headlined at the VMworld Party:
This morning, as folks stumbled in from a night of partying, we sat down for our final general session. This time it actually had little to nothing to do with VMware and cloud ... it was a series of three really fascinating and entertaining speakers. I highly recommend catching this one on replay. You won't be bored and it's pretty eye opening.
I passed my VCP5-DT exam this afternoon too! Nice accomplishment for myself and Kelser. Fitting, since back home we are having the AMD/HP VDI event - we are offering free mini-assessments for VDI!
Hopefully people enjoyed this blog. This will be the final post, as I'm flying back Friday. I hope everyone has a safe and fun Labor Day weekend too!
Create Your Own Personal Blog
To create a personal blog on VMworld.com, sign into your account, click on "Manage Account" in the top right corner of any page, click on the "Blog Posts" tab and then click on "Create a Personal Blog" or "Write a Blog Post" from within your account profile.
Note: All blogs will be monitored and reviewed for content. Any blogs not related to virtualization or considered to be spam or offensive will be removed.
Looking for a blog?
Can't find a specific blog? Try using the Blog page to browse and search blogs.
Popular Blog Posts
- Hello VMWorld
- The SANMAN
- Double-take Software's Blog
- A free virtual SAN for VMware ESX?
- Matt Kozloski's Blog (Kelser Corp)
- Blogworld of Amitabh
- Le blogue de Marc
- Virtually Uphill Blog
- Skytap Cloud Blog
- CommVault's VMworld 2011 Blog