Top A/B Testing Mistakes: How to Set Up, Launch, and Evaluate Results

Brief Summary

A/B testing helps you check a hypothesis about which page version will perform better. To make the results useful for the business, define your hypothesis in advance, calculate the sample size, show both versions at the same time, don’t change the experiment conditions while the test is running, and track not only clicks but also final conversions.

Who should read this article:

website and online store owners who want to increase conversion rates;
SEO specialists, webmasters, and marketers who run A/B tests for pages, forms, CTAs, and offers;
product teams that need to evaluate experiment results without metric overload or false conclusions.

A/B testing is a way to compare an updated version of a website page with the original and assess how much more effective the new version is. The test is run on an audience from a single traffic channel. One group of users sees the control page, while another sees the test version. After the test, the results are compared to see whether the hypothesis was confirmed: whether the change on the page produced the desired effect.

Even a single mistake in setting up or running the experiment can invalidate a test. The conclusions will be wrong, and the real effect of the new button, design, or CTA will remain unknown.

Mistakes at the Preparation Stage for A/B Testing

Running a sequential page test

Some webmasters set up pages to be shown one after another: they run the main page for X amount of time, stop it, launch the new version for the same period, and then measure the difference. This is a mistake.

If something happens while one version is live, it affects only that version. For example, one page may receive a spike in new traffic, and the pages will then show different outcomes for reasons that have nothing to do with the pages themselves.

For a clean A/B test, it’s important to split traffic from the same channel between two versions and show both pages at the same time, so external factors don’t affect the result.

Showing pages at different times

Some tools allow you to test different times of day or different days of the week to see how traffic performs during different periods. This is useful if you want to know when your website gets the most visitors. But it can be harmful when you show different pages to two audience groups.

For example, traffic to a business blog usually drops on weekends. If you run a test with the control page from Monday to Wednesday for a month, and the updated page from Friday to Sunday, the second page will receive less traffic and show different results.

In a test, you compare pages that differ only in the updated element. Everything else should be the same.

Running tests during seasonal events or major external changes

Avoid running tests during major search engine updates, large-scale advertising algorithm updates, sales, holidays, or major news spikes. These events change the composition and behavior of the audience, so the A/B test results may reflect external circumstances rather than the effect of the page itself.

The exception is if you specifically want to test changes in audience behavior during that exact period.

Not checking whether everything works

This is the simplest mistake to avoid, yet tests are often launched with broken buttons, outdated links, and messy layouts.

Go through this checklist:

you successfully went through the full path from entering the website to completing the conversion;
pages load quickly;
the design looks as it should, and the layout and fonts haven’t shifted;
all buttons work;
the page opens correctly on different devices and in different browsers;
you have set up conversion tracking;
you have error reports configured in case something breaks;
you repeated these checks in a browser or on a device with no cached copy of the page, because cached content sometimes doesn’t match how the page actually looks.

All of this should be checked before you launch the test and start driving traffic. The updated version of the page should be checked in exactly the same way before launch.

Launching the test on a closed or wrong URL

A simple but common mistake is launching an experiment on the “test site” where the webmaster was making changes.

Check which pages you are using. Out of habit, a webmaster may open a closed page with their own access permissions, check it, and launch the test. But the audience won’t be able to open it.

Running a test without a hypothesis

Some website owners launch a campaign and simply watch what changes, without thinking about the hypothesis being tested. If the new page shows any conversion at all, they consider the test version successful.

You can’t improve a page without analyzing its current results first. The updated version may reduce conversion rates, but the webmaster won’t know this because they aren’t tracking the starting results.

It’s important to formulate a hypothesis about where the problem is, what causes it, and how it can be solved. You’ll get more potential customers, conversions, or sales if you understand exactly which element you want to improve.

Focusing on surface-level metrics

Not every increase in a metric indicates that the updated page is effective. Avoid metrics that are not connected to measurable outcomes and do not lead to them.

For example, an increase in Facebook page shares or Telegram forwards does not mean an increase in sales. You may spend resources on a version that generated more reactions but did not bring in customers. First, check whether the metric is tied to money: leads, payments, repeat purchases, or another target action.

Be careful with “vanity metrics” — likes, followers, views, shares. If they do not affect conversions, you may be working with the wrong audience or forgetting to sell to them.

Paying attention only to quantitative data

Quantitative testing data is not the only thing that matters. For example, a test may show that X people did not click a button, but you can only guess why:

Is the button hard to notice? Is it placed too low?
Is it unclear why someone should click?
Does the offer not match what the user wants?
Does the button look unclickable?
Does the button not work at all?

Quantitative data cannot always explain the reasons behind such results. Testers should find out from the audience what they need, what motivates them to take action on the site, and what holds them back or turns them away. This information will be useful for developing new ideas, hypotheses, and tests.

Focusing on small details

Start with small but high-impact tasks that can bring significant results.

A webmaster may already be testing the fifth iteration of a page with a new button design, while there are more important pages that lead to conversion. First, set priorities:

Will this page directly affect sales?
Are there other pages on the path to conversion that are seriously underperforming?

It is great if you increase conversion on a sales page by 1%, but it is better to increase conversion by 20% on a page users study before making a purchase. This may be more important, especially if that is the page where you lose most of your audience.

Testing several changes at the same time

There are radical tests where a webmaster changes many elements at once or completely redesigns the entire page. This may work, but you will not know exactly which page change made the difference.

Most often, only one thing is changed during a test, for example:

headline;
image;
content layout;
prices;
discount presentation;
pricing plan design;
CTA buttons, and more.

Testing on traffic that does not match the goal

Ideally, a webmaster should be sure that both pages are being tested on an audience from the same segment. Usually, tests are run on new visitors to see how they react when they land on the site for the first time. Sometimes, it may be necessary to run a test on returning visitors, email subscribers, or even paid traffic.

You should test only one segment at a time to get an accurate understanding of how that group interacts with the page. When setting up the test, choose the audience you want to work with and exclude everyone else.

Not excluding returning visitors from the test

If a visitor sees a website page, closes it, comes back, and sees a different version, they will react differently than if they had landed on the same version both times. The change may confuse them or raise concerns about the site’s security, or they may behave differently because they already know where to click from their first visit.

The results will become less objective because of these additional interactions. Use a tool that shows the user a random version of the page but keeps showing that same version on repeat visits until the test is over.
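
One way to keep a visitor on the same version is to derive the variant from a stable identifier instead of picking it at random on every visit. The sketch below is a minimal illustration, not the method of any particular testing tool: the function name assign_variant, the experiment label, and the availability of a stable visitor ID (for example, from a first-party cookie or an account) are all assumptions.

```python
import hashlib


def assign_variant(visitor_id: str, experiment: str,
                   variants=("control", "test")) -> str:
    """Map a visitor to a variant deterministically (hypothetical helper).

    The same visitor_id and experiment name always produce the same variant,
    so repeat visits show the same page for as long as the ID is stable.
    """
    key = f"{experiment}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Turn the hash into a bucket index; the split is close to even on large samples.
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]


if __name__ == "__main__":
    first_visit = assign_variant("visitor-42", "cta-test")
    repeat_visit = assign_variant("visitor-42", "cta-test")
    print(first_visit, repeat_visit)  # the two values are always identical
```

Because the result depends only on the experiment name and the visitor ID, no per-visitor state has to be stored. Including the experiment name in the hash keeps assignments in one test from lining up with assignments in another.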

FAQ for this section

Where should you start with A/B testing?

Start with a hypothesis, one primary metric, sample size calculation, and a check that analytics is correctly tracking conversions.

Can you test several changes at once?

For a regular A/B test, it’s better to change one significant element. If there are many variations, you’ll need a multivariate test and more traffic.

What mistakes most often ruin A/B test preparation?

Launching variants one after another, using different traffic segments, running a test without a hypothesis, and testing on the wrong audience.

Mistakes During an A/B Test

Running the test for too short a time

When testing, you need to take three factors into account:

statistical significance;
sales cycle;
sample size.

Many website owners end tests as soon as they see that one page is clearly performing better than another. Over a short period of time, one page’s lead in conversions may be random.

Sales and traffic fluctuate with the day of the week and the time of the month. If the test happens to cover a day when many companies pay salaries, you may see a spike in sales that has nothing to do with the page.

It’s better to aim for a test duration of two to four weeks. During this time, you can collect enough traffic for the results to be accurate. Decide in advance what sample size you need, and don’t stop the test until you reach it.
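
The required sample size can be estimated before launch. The sketch below uses the standard normal-approximation formula for comparing two conversion rates. The function name, the 5% significance level, and the 80% power are assumptions chosen for illustration; your testing tool may use a different method and give a somewhat different number.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline_rate: float, expected_rate: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed in EACH variant (two-sided test)."""
    if baseline_rate == expected_rate:
        raise ValueError("the two rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = (baseline_rate * (1 - baseline_rate)
                + expected_rate * (1 - expected_rate))
    effect = expected_rate - baseline_rate
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


if __name__ == "__main__":
    # Hypothetical example: baseline conversion 10%, hoping to reach 11%.
    print(sample_size_per_variant(0.10, 0.11))
```

For a baseline of 10% and an expected 11%, this formula asks for a number on the order of 15,000 visitors per variant. The smaller the difference you want to detect, the more traffic you need, which is why small sites often cannot reliably test minor changes.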

Running the test for too long

Dragging out a test can also be harmful. If a test runs for more than a month, some users’ cookies are likely to expire or be cleared during that time. When these users return to the site, they are counted as new users, may be assigned a different version, and distort the sample data. Browsers and mobile platforms are also placing stronger restrictions on tracking, so a long test is more easily polluted by repeat visits that look like new ones.

Peeking at the experiment while it’s running

Some testers peek at the test while it’s still in progress. This creates a strong temptation to fix something, make adjustments, or stop as soon as one version pulls ahead. Ideally, you shouldn’t evaluate the results until the test has reached the sample size you planned in advance: a significance check repeated on partial data is much more likely to flag a random difference.

On the other hand, nobody wants to find out a month after launch that something went wrong on the very first day or that something on the page broke. To avoid this, check 24 hours after launch whether everything is working, whether visits are coming in, and whether conversions are being tracked.

All decisions should be made after the test is over. The only change you can make during the test is to fix something that has broken.
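
The risk of acting on interim results can be illustrated with a simulated A/A test, where both groups see the same page and any "significant" difference is a false alarm by construction. The sketch below is a toy model under stated assumptions: the 10% conversion rate, the number of looks, and the pooled two-proportion z-test are illustrative choices, not a description of any specific tool.

```python
import random
from math import sqrt
from statistics import NormalDist


def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # no variation at all, so nothing to detect
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_aa_tests(runs=300, looks=10, users_per_look=200,
                      rate=0.10, alpha=0.05, seed=1):
    """Return (share of runs flagged at ANY look, share flagged at the end)."""
    rng = random.Random(seed)
    flagged_at_any_look = 0
    flagged_at_end = 0
    for run in range(runs):
        conv_a = conv_b = visitors = 0
        p_values = []
        for look in range(looks):
            # Both groups convert at the same true rate: there is no real effect.
            conv_a += sum(rng.random() < rate for _ in range(users_per_look))
            conv_b += sum(rng.random() < rate for _ in range(users_per_look))
            visitors += users_per_look
            p_values.append(p_value(conv_a, visitors, conv_b, visitors))
        flagged_at_any_look += any(p < alpha for p in p_values)
        flagged_at_end += p_values[-1] < alpha
    return flagged_at_any_look / runs, flagged_at_end / runs
```

Calling simulate_aa_tests() returns two shares. Because the final look is also one of the looks, the share of runs flagged at any look can never be lower than the share flagged at the end alone, and with repeated checks it is typically well above the nominal 5%. That gap is the price of stopping at the first significant result.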

Not stopping the test once the results are clear

There have been cases where webmasters simply forgot to stop a test. It kept running, showing the weaker page to 50% of the audience and giving only 50% of traffic to the clear winner.

FAQ for this section

How long should an A/B test run?

Usually for at least one full weekly cycle and until the calculated sample size is reached. Often this takes two to four weeks, but the exact duration depends on traffic and conversion rate.
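
As a rough sketch of how traffic translates into duration, the helper below divides the required sample by daily traffic and rounds up to whole weeks so that every weekday is covered equally. The function name and the example figures are hypothetical, and the calculation assumes traffic is steady and split evenly between variants.

```python
from math import ceil


def estimate_duration_days(sample_per_variant: int, daily_visitors: int,
                           variants: int = 2) -> int:
    """Days needed to reach the sample size, rounded up to full weeks."""
    if daily_visitors <= 0:
        raise ValueError("daily_visitors must be positive")
    days = ceil(sample_per_variant * variants / daily_visitors)
    return ceil(days / 7) * 7


if __name__ == "__main__":
    # Hypothetical: 15,000 visitors per variant, 2,000 eligible visitors a day.
    print(estimate_duration_days(15000, 2000))  # 15 days of traffic -> 21 days
```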

Can you stop a test early if one variant is leading?

No, not unless the sample size calculated in advance has been reached and the predefined stopping conditions have been met. An early lead often turns out to be random.

What should you check after launch?

After the first 24 hours, check traffic, events, conversions, loading speed, and page errors — but don’t change the hypothesis during the experiment.

Mistakes After Completing an A/B Test

Ignoring changes in decision-making time

One more thing to consider when testing: new elements can affect how long it takes a user to make a purchase decision.

Example: a company’s potential customers usually have a 30-day sales cycle, or even a longer one. A webmaster tests a new call to action that shortens the decision-making time, for example by creating scarcity or offering bonuses for an immediate purchase. In this case, the new CTA can distort the results: the control page may produce just as many conversions, but because of its longer sales cycle, some of those purchases fall outside the testing period and are not counted.

Review your analytics during and after the test to make sure you’re not missing anything.
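
One possible way to reduce this distortion is to give every visitor the same fixed window to convert, counted from their first visit, and to keep collecting purchases until that window has closed for the last visitor. This is a suggested approach rather than something the test setup above requires; the function name and the 30-day default are assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional


def converted_within_window(first_seen: datetime,
                            purchased_at: Optional[datetime],
                            window_days: int = 30) -> bool:
    """True if the purchase happened within window_days of the first visit."""
    if purchased_at is None:  # the visitor never purchased
        return False
    elapsed = purchased_at - first_seen
    return timedelta(0) <= elapsed <= timedelta(days=window_days)


if __name__ == "__main__":
    first_visit = datetime(2024, 3, 1)
    print(converted_within_window(first_visit, datetime(2024, 3, 20)))  # inside
    print(converted_within_window(first_visit, datetime(2024, 4, 15)))  # outside
```

With a per-visitor window, a version that merely speeds up the purchase is compared with the control on equal terms, because slow buyers in both groups get the same amount of time to convert.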

Abandoning the hypothesis without testing other versions of it

If a hypothesis failed during a test, it may mean that its implementation was unsuccessful. The hypothesis itself may still be correct.

Try new CTAs, a different design, layout, images, or copy. You have an idea, and you can find the best form for it.

Not looking at the results by segment

A new page version may show low conversions on desktop but deliver a 40% increase on mobile. You can only find this out by segmenting the results. Break the data down by device type, traffic channel, and other segments that matter to you.
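
A segment breakdown can be as simple as grouping visits by segment and variant before computing conversion rates. The sketch below assumes each visit is available as a (segment, variant, converted) record; that data layout and the function name are hypothetical, and a real analytics export will look different.

```python
from collections import defaultdict


def conversion_by_segment(visits):
    """Return {(segment, variant): conversion rate} for an iterable of visits.

    Each visit is a (segment, variant, converted) tuple, for example
    ("mobile", "test", True).
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [visitors, conversions]
    for segment, variant, converted in visits:
        stats = totals[(segment, variant)]
        stats[0] += 1
        stats[1] += int(converted)
    return {key: conversions / visitors
            for key, (visitors, conversions) in totals.items()}


if __name__ == "__main__":
    sample = [
        ("mobile", "control", False), ("mobile", "control", False),
        ("mobile", "test", True), ("mobile", "test", False),
        ("desktop", "control", True), ("desktop", "test", False),
    ]
    for key, rate in sorted(conversion_by_segment(sample).items()):
        print(key, f"{rate:.0%}")
```

Keep in mind that each segment contains only part of the traffic, so per-segment rates are noisier than the overall result. A striking difference in a small segment is a reason to run a follow-up test on that segment, not a conclusion in itself.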

Not scaling successful solutions to other pages

Changes that performed well in a test may also work on other pages. Found a winning sales page version? Try using it as a landing page for ads. Found a lead magnet style that works great? Test it across the entire site.

But don’t make large-scale changes without testing. What worked in one area may lose in others, so everything is worth checking.

Getting stuck on one page

The page you’re testing may reach its “local maximum.” This is a situation where it hits a plateau, and the webmaster can no longer improve its performance. It’s not worth fighting endlessly for tiny improvements on one page — you can move on to other pages involved in the conversion funnel.

Increasing conversion from 10% to 11% on a sales page may be less important than increasing it from 2% to 5% on the page that sends traffic to it. It may even turn out that growth on that page helps the stuck page by bringing it more potential customers.
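
The comparison above can be checked with simple funnel arithmetic. The sketch assumes a two-step funnel in which the sales page only receives visitors passed on by the previous page, and a hypothetical 10,000 visitors entering the funnel; the visitor count and the function name are illustrative, not data from a real project.

```python
def funnel_sales(visitors: int, upstream_rate: float,
                 sales_page_rate: float) -> float:
    """Expected sales for a two-step funnel: upstream page, then sales page."""
    return visitors * upstream_rate * sales_page_rate


if __name__ == "__main__":
    visitors = 10_000  # hypothetical traffic entering the funnel
    baseline = funnel_sales(visitors, 0.02, 0.10)
    better_sales_page = funnel_sales(visitors, 0.02, 0.11)
    better_upstream = funnel_sales(visitors, 0.05, 0.10)
    print(round(baseline), round(better_sales_page), round(better_upstream))
```

Under these assumptions, lifting the sales page from 10% to 11% moves the total from 20 to 22 sales, while lifting the upstream page from 2% to 5% moves it from 20 to 50.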

If you can’t strengthen the page any further, find the next most important page and work on it.

Not tracking other important results

The company’s ultimate goal is sales. Before choosing the test winner, you need to compare different metrics. For example, a new call to action on the test page may get fewer clicks. But the clicks it does get may lead to more sales from motivated users.

Not documenting the tests

Creating an internal database of tests can help prevent repeated mistakes. You’ll be able to learn from previous tests and won’t risk launching a test that has already been run. The database should include information about the page, the hypothesis, successful and unsuccessful solutions, the amount of growth, and other metrics.
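
A test log does not need special software to start with. The sketch below shows one possible record structure built from the fields listed above; the class name, field names, and the duplicate check are hypothetical, and a spreadsheet with the same columns would serve the same purpose.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ExperimentRecord:
    page: str
    hypothesis: str
    primary_metric: str
    outcome: str             # e.g. "won", "lost", "inconclusive"
    relative_change: float   # change in the primary metric; 0.12 means +12%
    notes: str = ""


def already_tested(log, page: str, hypothesis: str) -> bool:
    """True if this hypothesis has already been tested on this page."""
    return any(r.page == page and r.hypothesis == hypothesis for r in log)


def to_json(log) -> str:
    """Serialize the log so it can be stored in a file or shared."""
    return json.dumps([asdict(r) for r in log], indent=2)


if __name__ == "__main__":
    log = [ExperimentRecord("/pricing", "Shorter form increases sign-ups",
                            "sign-up rate", "won", 0.12)]
    print(already_tested(log, "/pricing", "Shorter form increases sign-ups"))
    print(to_json(log))
```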

FAQ for this section

How can you tell if an A/B test was successful?

The winning test variant should improve the pre-selected target metric and not negatively affect important secondary indicators: revenue, bounce rate, speed, and lead quality.
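
One common way to judge whether an improvement in the target metric is more than noise is a confidence interval for the difference in conversion rates. The sketch below uses the normal approximation, which assumes reasonably large samples in both groups; the function name and the example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def difference_confidence_interval(conv_a: int, n_a: int,
                                   conv_b: int, n_b: int,
                                   confidence: float = 0.95):
    """Interval for (rate of B - rate of A), in absolute percentage points / 100."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    se = sqrt(rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = rate_b - rate_a
    return diff - z * se, diff + z * se


if __name__ == "__main__":
    # Hypothetical counts: control 1,000 of 10,000; test 1,300 of 10,000.
    low, high = difference_confidence_interval(1000, 10000, 1300, 10000)
    print(f"{low:.4f} .. {high:.4f}")
```

If the whole interval lies above zero, the data support an improvement at the chosen confidence level. If it includes zero, the test has not shown a reliable difference, whichever variant happens to be ahead.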

Should you check the results by segment after the test?

Yes, because one variant may lose on average but win among mobile users, new traffic, or a specific channel.

What should you do with an unsuccessful hypothesis?

Don’t dismiss the idea right away. Check whether the issue was in the execution: the copy, design, offer, block placement, or technical errors.

Conclusion

Running A/B tests requires control before launch, during the experiment, and after it ends. Before the test, define the hypothesis, segment, target metric, duration, and tool. During the test, keep the conditions unchanged and fix only what breaks. After the test, check statistical significance, traffic quality, segments, page speed, and final business metrics.
