- Start off testing a single server, not the farm/cluster. Originate the load from the same network switch to remove as much network latency as possible. I prefer Silk Performer for this.
- Test the application pages first. No assets, no images, no CSS, no JavaScript includes. Typically, page generation by the application server becomes the bottleneck long before the web server and static content do. When testing this way, hits/sec should equal pages/sec.
- Initially make each test transaction a new user. This will aggressively show you what impact your session persistence has on server memory growth.
- Test site content and order processing pages individually at first, then combine the tests. Order processing pages will always be more processing-intensive than site content, which is most often cached by the app server, the CDN, or the client. Testing individually and then combined will show you how they affect each other's page performance and will define how you optimize going forward.
- Stress test by eliminating simulated "Think Time". In early testing you need to focus on levels of concurrency, not total users on the site. Concurrency translates directly to maximum pages/sec, transactions/sec, and data throughput.
- Under load/stress, pay close attention to third-party resources (such as the database). Understand what is responsible for increasing page generation response times. Allowed concurrency can be tuned on the app server to manage memory growth, CPU utilization, and third-party resource saturation. Request queuing is not a bad thing: use it to throttle request threading rather than letting unlimited requests eventually crush the server.
- Endurance test. Test with high concurrency/load for 8 to 12 hours. Small memory leaks not previously seen will be found here. This is your stability certification.
- Re-test with all page assets loading. Hits/sec will no longer equal pages/sec; note the changes to CPU levels and page response times with all assets loading. Use these results to make client and CDN caching decisions.
- Increase load and re-test with all servers in the farm/cluster. Pay special attention to what happens to the performance of third-party resources (such as the database) now that all web servers are under heavy load.
- Test in the wild. Get outside the firewall/DMZ and test with Gomez Reality Load. Gomez Reality Load can send massive amounts of real load from the internet to your site, allowing you to fully test all components and hosts in your site and report performance on each host and asset. Your site pages and assets might run as expected under load, but will your third-party hosts? CDN providers, advertisers, page analytics plug-ins? At this point, take the opportunity to test your fail-over scenarios. Don't take anyone's word for it. Fail-over strategies are like data backups: if you have not actually tested them in the wild, they don't work.
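
To make the arithmetic behind a few of these tips concrete (think time versus concurrency, and hits/sec versus pages/sec), here is a rough Python sketch. It assumes an idealized closed test loop in which each virtual user requests a page, waits for the response, "thinks", and repeats, so Little's Law applies. The function names and all of the numbers are hypothetical and purely illustrative; a real server's response time will climb as it saturates, so treat the results as upper bounds rather than predictions.

```python
def pages_per_sec(virtual_users, response_time_s, think_time_s=0.0):
    """Approximate page throughput for a closed test loop (Little's Law).

    Assumption: every virtual user repeats request -> wait -> think,
    and response time stays constant under load (it will not in practice).
    """
    cycle_s = response_time_s + think_time_s
    if cycle_s <= 0:
        raise ValueError("response time plus think time must be positive")
    return virtual_users / cycle_s


def hits_per_sec(page_rate, assets_per_page=0):
    """Each page view costs one hit for the page plus one hit per asset.

    With no assets loaded, hits/sec equals pages/sec.
    """
    return page_rate * (1 + assets_per_page)


if __name__ == "__main__":
    users = 100          # hypothetical virtual users
    response_s = 0.5     # hypothetical page generation time

    with_think = pages_per_sec(users, response_s, think_time_s=4.5)
    no_think = pages_per_sec(users, response_s)

    print(f"pages/sec with think time:    {with_think:.1f}")
    print(f"pages/sec without think time: {no_think:.1f}")
    print(f"hits/sec, pages only:         {hits_per_sec(no_think):.1f}")
    print(f"hits/sec, 24 assets per page: {hits_per_sec(no_think, 24):.1f}")
```

With these made-up figures, the same 100 virtual users generate ten times the page rate once think time is removed, which is why stress tests without think time expose concurrency limits so quickly.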
This is a very compressed top 10 with a lot of missing detail. If you would like to know more, feel free to contact me.
Matt, I just wanted to say you have some very good information here. Congratulations on that. This is the path we all continue to tread in our goal of helping clients.