Selenium WebDriver Tutorial – Learn from Scratch
By Sudheera Adusupalli, Co-Founder, Varnik Technologies
Every batch of students that walks into Varnik Technologies asks me the same question within the first week: “Ma’am, should I be learning Selenium or Playwright?”
My answer is always the same. Learn Selenium WebDriver first. Not because Playwright is bad. Not because Selenium is perfect. Because Selenium will teach you how browser automation actually works at a protocol level, and that knowledge makes you dangerous with any tool you pick up later.
This tutorial is the exact sequence we follow at Varnik before a student writes a single test framework class. No fluff, no filler. Just what works.
What Is Selenium WebDriver? (The Answer That Actually Sticks)
Selenium WebDriver is an open-source automation framework that lets your code drive a web browser the same way a human user would. It clicks buttons, fills forms, reads page content, and navigates between URLs through a standardized programming interface called the W3C WebDriver protocol.
Here is the part most tutorials skip: Selenium WebDriver is not a testing tool by itself. It is a browser control API. You bring the test logic; WebDriver brings the browser control.
The Selenium suite has three components worth knowing:
| Component | What It Does | Who Uses It |
| --- | --- | --- |
| Selenium WebDriver | Controls browsers via code | SDETs, automation engineers |
| Selenium IDE | Record and replay tests in a browser extension | Beginners, exploratory testers |
| Selenium Grid | Runs tests in parallel across multiple machines | Teams scaling test execution |
WebDriver is the component that matters for your career. The other two are supporting tools.
How Selenium WebDriver Architecture Works (Step by Step)
Understanding this architecture is what separates candidates who get hired from candidates who memorize commands without knowing why they work.
The 6-Step Communication Flow:
- Your test script (written in Java, Python, C#, or Ruby) calls a WebDriver method
- The Selenium client library serializes that call into a standardized W3C WebDriver protocol request
- That request is sent via HTTP to the browser driver (ChromeDriver, GeckoDriver, EdgeDriver)
- The browser driver translates the request into browser-native instructions
- The browser executes the action (clicking, typing, navigating)
- The browser driver sends an HTTP response back, and the client library hands the result to your test script
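The middle of that flow, where a method call becomes an HTTP request, is easier to remember once you have seen the requests themselves. The sketch below builds (but does not send) the requests for four common calls. The HTTP methods and paths follow the W3C WebDriver specification; the class names WireRequest and WireSketch and the session ID abc123 are made up for illustration, and the JSON is assembled by hand without escaping, which is only acceptable for simple values like these.

```java
import java.util.List;

// Hypothetical value object: one HTTP request from the client library to the browser driver.
final class WireRequest {
    final String method;
    final String path;
    final String jsonBody;

    WireRequest(String method, String path, String jsonBody) {
        this.method = method;
        this.path = path;
        this.jsonBody = jsonBody;
    }

    @Override
    public String toString() {
        return method + " " + path + (jsonBody.isEmpty() ? "" : " " + jsonBody);
    }
}

final class WireSketch {
    // driver.get(url) maps to the "Navigate To" command
    static WireRequest navigateTo(String sessionId, String url) {
        return new WireRequest("POST", "/session/" + sessionId + "/url",
                "{\"url\": \"" + url + "\"}");
    }

    // driver.getTitle() maps to "Get Title"
    static WireRequest getTitle(String sessionId) {
        return new WireRequest("GET", "/session/" + sessionId + "/title", "");
    }

    // driver.findElement(By.cssSelector(...)) maps to "Find Element"
    static WireRequest findByCss(String sessionId, String selector) {
        return new WireRequest("POST", "/session/" + sessionId + "/element",
                "{\"using\": \"css selector\", \"value\": \"" + selector + "\"}");
    }

    // driver.quit() maps to "Delete Session"
    static WireRequest deleteSession(String sessionId) {
        return new WireRequest("DELETE", "/session/" + sessionId, "");
    }

    public static void main(String[] args) {
        String session = "abc123"; // made-up ID; a real one comes back from POST /session
        List<WireRequest> requests = List.of(
                navigateTo(session, "https://varniktech.com"),
                getTitle(session),
                findByCss(session, "#email"),
                deleteSession(session));
        for (WireRequest request : requests) {
            System.out.println(request);
        }
    }
}
```

In a real run the client first sends POST /session to create the session and reads the session ID from the response. Every later command carries that ID in its path, which is why a script cannot talk to a browser after the session has been deleted.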
Why Selenium 4 Changed Everything:
Before Selenium 4, this communication used the JSON Wire Protocol. Selenium 4 replaced it entirely with the W3C WebDriver standard.
This means browsers and WebDriver now speak the same language directly. Fewer translation errors, more stable tests, and better cross-browser consistency. Selenium 4 also introduced Selenium Manager (bundled from version 4.6 onward), which downloads the matching browser driver automatically. No more manually matching ChromeDriver versions to Chrome browser versions. That alone saves every new student a good 45 minutes of frustration.
What ChromeDriver Actually Does:
ChromeDriver is a separate executable that sits between your test script and the Chrome browser. It receives W3C protocol commands from Selenium, converts them into Chrome DevTools instructions, and reports results back. Each browser has its own driver: GeckoDriver for Firefox, msedgedriver for Edge.
Setting Up Selenium WebDriver: Java and Python
Java Setup (With Maven)
Add this dependency to your pom.xml:
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>4.21.0</version>
</dependency>
Your first working script:
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class FirstTest {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();      // Starts Chrome and opens a session
        driver.get("https://varniktech.com");       // Load the page
        System.out.println(driver.getTitle());      // Print the page title
        driver.quit();                              // End the session and close the browser
    }
}
Selenium Manager (Selenium 4.6 and later) handles the ChromeDriver download automatically. You no longer need to call System.setProperty("webdriver.chrome.driver", "path").
Python Setup
pip install selenium
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("https://varniktech.com")
print(driver.title)
driver.quit()
One thing I always tell students: call driver.quit(), not driver.close(). The quit() method closes all browser windows and ends the WebDriver session. The close() method only closes the current window, so the session and the driver process can stay alive. Running 50 tests without quit() leaves orphaned browser and driver processes behind, and that is how you exhaust memory and crash your Jenkins pipeline.
Locating Web Elements: The Skill That Actually Gets You Hired
Locator strategy is where most beginners underperform in interviews and on the job. Knowing that XPath exists is not enough. Writing resilient locators is a separate skill.
The Eight Locator Strategies
| Strategy | Example | Reliability |
| --- | --- | --- |
| By.id | By.id("email") | Best |
| By.name | By.name("username") | Good |
| By.className | By.className("btn-primary") | Medium |
| By.tagName | By.tagName("input") | Low |
| By.linkText | By.linkText("Sign In") | Good |
| By.partialLinkText | By.partialLinkText("Sign") | Good |
| By.cssSelector | By.cssSelector("input#email") | Best |
| By.xpath | By.xpath("//input[@id='email']") | Good if written well |
XPath Is a Liability If You Write It Wrong
I say this to every batch. Stop writing absolute XPaths like /html/body/div[3]/form/input[1]. That breaks the moment a developer adds a <div> anywhere above it. Every time.
Write relative XPath using attributes:
//input[@placeholder='Enter email']
//button[contains(text(),'Login')]
//input[starts-with(@id,'user')]
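You can watch an absolute path break without opening a browser. The sketch below is an illustration under assumptions: it runs the JDK's built-in XPath 1.0 engine against two tiny hand-written pages (hypothetical markup, written as well-formed XML) rather than against a live DOM, and the class name XPathFragilityDemo is made up. The second page differs only by a banner div added above the form.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathFragilityDemo {
    // Hypothetical login form shared by both versions of the page
    static final String FORM =
            "<form><input id=\"user-email\" placeholder=\"Enter email\"/>"
            + "<button>Login now</button></form>";

    static final String BEFORE =
            "<html><body><div>" + FORM + "</div></body></html>";

    // Same page after a developer adds a banner <div> above the form
    static final String AFTER =
            "<html><body><div>Sale banner</div><div>" + FORM + "</div></body></html>";

    static final String ABSOLUTE = "/html/body/div[1]/form/input[1]";
    static final String RELATIVE = "//input[@placeholder='Enter email']";

    // Returns how many nodes the XPath expression matches in the given markup.
    static int countMatches(String markup, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + xpath, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("absolute, before: " + countMatches(BEFORE, ABSOLUTE));
        System.out.println("absolute, after:  " + countMatches(AFTER, ABSOLUTE));
        System.out.println("relative, before: " + countMatches(BEFORE, RELATIVE));
        System.out.println("relative, after:  " + countMatches(AFTER, RELATIVE));
        System.out.println("contains():       "
                + countMatches(AFTER, "//button[contains(text(),'Login')]"));
        System.out.println("starts-with():    "
                + countMatches(AFTER, "//input[starts-with(@id,'user')]"));
    }
}
```

The absolute path matches one node before the change and none after it, because div[1] is now the banner. The attribute-based expressions keep matching exactly one node in both versions.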
Better still: work with your frontend developers to add data-testid attributes to important elements. An element like <button data-testid="submit-btn"> gives you a locator that survives React re-renders and Angular route changes.
driver.findElement(By.cssSelector("[data-testid='submit-btn']")).click();
This is what I call defensive locating. Your tests survive UI updates because you agreed on stable attributes upfront. That conversation between QA and dev is worth more than any locator syntax.
Selenium 4 Relative Locators
Selenium 4 introduced relative locators for situations where you need to locate elements by their spatial relationship on the page:
import org.openqa.selenium.support.locators.RelativeLocator;

WebElement emailField = driver.findElement(By.id("email"));
WebElement passwordField = driver.findElement(
        RelativeLocator.with(By.tagName("input")).below(emailField)
);
Useful in specific cases. Not a replacement for stable attribute-based locators.
Core WebDriver Commands You Need Daily
Browser Commands:
driver.get("https://varniktech.com"); // Load URL
driver.getTitle();                    // Get page title
driver.getCurrentUrl();               // Get current URL
driver.close();                       // Close current window
driver.quit();                        // End entire session
WebElement Commands:
WebElement nameField = driver.findElement(By.id("name")); // hypothetical text input
nameField.clear();              // Clear input field
nameField.sendKeys("Sudheera"); // Type text
WebElement btn = driver.findElement(By.id("submit"));
btn.isDisplayed();              // Check visibility
btn.getText();                  // Read visible text
btn.click();                    // Click element
Navigation Commands:
driver.navigate().to("https://varniktech.com/about-us/");
driver.navigate().back();
driver.navigate().forward();
driver.navigate().refresh();
Handling Dropdowns:
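import org.openqa.selenium.support.ui.Select;

// Select works only on native <select> elements; use whichever strategy fits
Select dropdown = new Select(driver.findElement(By.id("city")));
dropdown.selectByVisibleText("Hyderabad"); // By the text the user sees
dropdown.selectByValue("hyd");             // By the option's value attribute
dropdown.selectByIndex(2);                 // By position (index 0 is the first option)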
Waits in Selenium WebDriver: Where Most Flaky Tests Are Born
I have reviewed hundreds of student automation scripts. The single most common cause of test failure is timing. An element is not yet visible, the script tries to click it, and the test crashes. The student thinks the locator is wrong. Usually, the locator is fine. The wait is missing.
The Three Wait Types
| Wait Type | Scope | How It Works | When to Use |
| --- | --- | --- | --- |
| Implicit Wait | Entire WebDriver session | Polls DOM for set duration globally | Legacy setups only |
| Explicit Wait | Single element or condition | Waits until condition is true or timeout | Default choice |
| Fluent Wait | Single element or condition | Custom polling interval with exception handling | Dynamic, unpredictable elements |
Explicit Wait Example (Use This as Your Default):
import org.openqa.selenium.support.ui.WebDriverWait;
import org.openqa.selenium.support.ui.ExpectedConditions;
import java.time.Duration;

WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
WebElement element = wait.until(
        ExpectedConditions.visibilityOfElementLocated(By.id("submit"))
);
element.click();
A Word on Implicit Wait: Never mix implicit and explicit waits in the same test session. The interactions between them are unpredictable and cause tests to wait far longer than intended. Pick one approach and stick with it.
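A little arithmetic shows why the mix hurts. The sketch below is a simplified model, not Selenium's actual implementation: it assumes the element never appears, that every lookup inside the explicit wait blocks for the full implicit timeout, and that the explicit deadline is only checked between lookups. The class and method names are hypothetical.

```java
final class MixedWaitModel {
    /**
     * Simulated time (in ms) before an explicit wait gives up on an element
     * that never appears, when each lookup also blocks for the implicit wait.
     */
    static long elapsedMillis(long implicitMs, long explicitMs, long pollMs) {
        if (pollMs <= 0) {
            throw new IllegalArgumentException("pollMs must be positive");
        }
        long now = 0;
        while (true) {
            now += implicitMs;          // the lookup blocks, then fails
            if (now >= explicitMs) {    // deadline is checked only after a lookup returns
                return now;
            }
            now += pollMs;              // sleep until the next poll
        }
    }

    public static void main(String[] args) {
        // Explicit wait of 15 s polling every 500 ms, no implicit wait
        System.out.println(elapsedMillis(0, 15_000, 500));
        // Same explicit wait, plus a 10 s implicit wait on the session
        System.out.println(elapsedMillis(10_000, 15_000, 500));
    }
}
```

In this model the explicit wait alone gives up at 15,000 ms, while adding a 10-second implicit wait pushes the failure to 20,500 ms: the second lookup starts before the deadline and then blocks well past it. Real numbers vary by driver, but the shape of the problem is the same.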
Fluent Wait for Dynamic Pages:
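import org.openqa.selenium.NoSuchElementException; // Selenium's class, not java.util's
import org.openqa.selenium.support.ui.FluentWait;
import org.openqa.selenium.support.ui.Wait;

Wait<WebDriver> fluentWait = new FluentWait<>(driver)
        .withTimeout(Duration.ofSeconds(15))      // Give up after 15 seconds
        .pollingEvery(Duration.ofMillis(500))     // Re-check twice per second
        .ignoring(NoSuchElementException.class);  // Keep polling while the element is absent

WebElement element = fluentWait.until(
        d -> d.findElement(By.id("dynamic-content"))
);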
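It helps to see that there is no magic inside a fluent wait. The following is a minimal sketch of the same idea (poll, swallow one exception type, stop at a deadline) written with the JDK only. It is not Selenium's source code: pollUntil is a hypothetical helper, and java.util.NoSuchElementException stands in for Selenium's exception of the same name so the example runs without a browser.

```java
import java.time.Duration;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class PollingWaitSketch {
    // Re-evaluates the condition until it yields a non-null, non-false value or time runs out.
    static <T> T pollUntil(Supplier<T> condition, Duration timeout, Duration interval,
                           Class<? extends RuntimeException> ignored) {
        long deadline = System.nanoTime() + timeout.toNanos();
        RuntimeException lastIgnored = null;
        while (true) {
            try {
                T value = condition.get();
                if (value != null && !Boolean.FALSE.equals(value)) {
                    return value;
                }
            } catch (RuntimeException e) {
                if (!ignored.isInstance(e)) {
                    throw e; // anything not on the ignore list fails immediately
                }
                lastIgnored = e;
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException("Timed out after " + timeout, lastIgnored);
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", ie);
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulated lookup: the "element" is missing on the first two polls
        String found = pollUntil(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new NoSuchElementException("not rendered yet");
            }
            return "dynamic-content";
        }, Duration.ofSeconds(2), Duration.ofMillis(50), NoSuchElementException.class);
        System.out.println(found + " found after " + calls.get() + " polls");
    }
}
```

The polling interval is the trade-off to notice: a shorter interval reacts faster but sends more lookups to the browser driver, and each lookup is an HTTP round trip.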
The Future of Waits: WebDriver BiDi
I want to be direct about something: traditional polling waits are becoming the legacy approach. The WebDriver BiDi protocol (bidirectional), which Selenium 5 is being built around, allows scripts to listen for actual browser events instead of constantly polling the DOM. Think of it as subscribing to “this element appeared” rather than asking “is this element there yet?” every 500ms.
This is still emerging in 2026, but understanding the direction matters. At Varnik, we now teach polling waits as the operational standard and BiDi as the architectural future.
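The difference between asking and subscribing is easier to feel in code. What follows is a purely conceptual sketch: FakeBrowser, onElementAdded, and whenAdded are hypothetical names invented for this illustration, and none of it is the Selenium BiDi API. It is single-threaded on purpose, so it ignores thread safety.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Hypothetical stand-in for a browser that pushes events to listeners.
final class FakeBrowser {
    private final List<Consumer<String>> listeners = new ArrayList<>();

    void onElementAdded(Consumer<String> listener) {
        listeners.add(listener);
    }

    // Simulates the page rendering a new element and notifying every listener.
    void addElement(String id) {
        for (Consumer<String> listener : listeners) {
            listener.accept(id);
        }
    }
}

final class EventVersusPolling {
    // Subscribe once; the future completes when the matching event arrives.
    static CompletableFuture<String> whenAdded(FakeBrowser browser, String id) {
        CompletableFuture<String> appeared = new CompletableFuture<>();
        browser.onElementAdded(added -> {
            if (added.equals(id)) {
                appeared.complete(added);
            }
        });
        return appeared;
    }

    public static void main(String[] args) {
        FakeBrowser browser = new FakeBrowser();
        CompletableFuture<String> appeared = whenAdded(browser, "dynamic-content");

        browser.addElement("spinner");          // unrelated event, future stays pending
        System.out.println("done yet? " + appeared.isDone());

        browser.addElement("dynamic-content");  // the event we subscribed to
        System.out.println("appeared: " + appeared.join());
    }
}
```

Notice that the script never asks the browser anything after subscribing. With polling, the same wait would cost one request every interval until the element showed up.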
Advanced Techniques: Alerts, iFrames, and JavaScript Execution
Handling Alerts
Alert alert = driver.switchTo().alert();
alert.getText();          // Read alert message
alert.sendKeys("Varnik"); // Type into a prompt dialog
alert.accept();           // Click OK
// alert.dismiss();       // Or click Cancel instead; the alert is gone after either call
Switching to iFrames
driver.switchTo().frame("frameName");
// Interact with elements inside the iframe
driver.switchTo().defaultContent(); // Return to main page
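JavascriptExecutor
When Selenium methods do not work on a stubborn element, JavaScript execution is your fallback. Use it sparingly: a JavaScript click bypasses the visibility and interactability checks a real user click would face, so it can hide genuine UI bugs.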
JavascriptExecutor js = (JavascriptExecutor) driver;
js.executeScript("arguments[0].click();", element);
js.executeScript("window.scrollTo(0, document.body.scrollHeight)");
Taking Screenshots
// FileUtils comes from Apache Commons IO (org.apache.commons.io.FileUtils), not from Selenium
File screenshot = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
FileUtils.copyFile(screenshot, new File("screenshots/test-failure.png"));
This belongs in every test’s failure handling. When a test fails at 2am in your CI pipeline, that screenshot is all you have to understand what happened.
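One practical wrinkle: a fixed file name means each failure overwrites the last one. Here is a small sketch of one way to handle that, using only the JDK's java.nio instead of Commons IO. The helper class ScreenshotFiles and its naming scheme are assumptions for illustration, not a Selenium or TestNG feature.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class ScreenshotFiles {
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss");

    // Builds a per-test, per-run file name such as validLoginTest-20260115-020000.png
    static Path targetFor(Path dir, String testName, LocalDateTime when) {
        String safeName = testName.replaceAll("[^A-Za-z0-9_-]", "_");
        return dir.resolve(safeName + "-" + when.format(STAMP) + ".png");
    }

    // Copies the temporary screenshot file into the target directory.
    static Path save(Path source, Path dir, String testName, LocalDateTime when) {
        try {
            Files.createDirectories(dir);
            return Files.copy(source, targetFor(dir, testName, when),
                    StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the file returned by getScreenshotAs(OutputType.FILE)
        Path fakeScreenshot = Files.createTempFile("shot", ".png");
        Path dir = Files.createTempDirectory("screenshots");
        Path saved = save(fakeScreenshot, dir, "validLoginTest", LocalDateTime.now());
        System.out.println("saved to " + saved);
    }
}
```

In a real test the source path would come from ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE).toPath(), and the directory would usually be something your CI job archives, for example Paths.get("screenshots").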
Building a Selenium Framework: Page Object Model
Raw scripts break. They break when the UI changes, when test data shifts, when someone adds a <div>. A framework prevents that.
The Page Object Model (POM) separates your test logic from your UI interaction code. Each web page gets its own Java class.
LoginPage.java:
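import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.FindBy;
import org.openqa.selenium.support.PageFactory;

public class LoginPage {
    private final WebDriver driver;

    @FindBy(id = "email")
    private WebElement emailField;

    @FindBy(id = "password")
    private WebElement passwordField;

    @FindBy(id = "submit")
    private WebElement loginButton;

    public LoginPage(WebDriver driver) {
        this.driver = driver;
        PageFactory.initElements(driver, this); // Wires up the @FindBy fields
    }

    public void login(String email, String password) {
        emailField.sendKeys(email);
        passwordField.sendKeys(password);
        loginButton.click();
    }
}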
LoginTest.java:
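import java.time.Duration;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;
import org.testng.Assert;
import org.testng.annotations.AfterMethod;
import org.testng.annotations.BeforeMethod;
import org.testng.annotations.Test;

public class LoginTest {
    private WebDriver driver;

    @BeforeMethod
    public void setup() {
        driver = new ChromeDriver();
        driver.get("https://varniktech.com/login");
    }

    @Test
    public void validLoginTest() {
        LoginPage loginPage = new LoginPage(driver);
        loginPage.login("student@varniktech.com", "password123");
        // Wait for the post-login page instead of asserting immediately
        new WebDriverWait(driver, Duration.ofSeconds(10))
                .until(ExpectedConditions.titleContains("Dashboard"));
        Assert.assertTrue(driver.getTitle().contains("Dashboard"));
    }

    @AfterMethod
    public void teardown() {
        driver.quit();
    }
}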
When the login button ID changes from submit to login-btn, you update one line in LoginPage.java. You do not touch any test file. That is the value of POM.
In our Selenium Training in Hyderabad, we build a full POM framework with TestNG integration across six weeks. Students leave with a GitHub repository they can actually show in interviews.
Selenium vs Playwright vs Cypress: My Honest Take
I am going to say what most training institutes avoid saying.
Playwright is faster than Selenium. Benchmarks that compare the two often report it running tests 40% or more quicker, though the gap depends on the suite and the setup, and it has better built-in auto-waiting. If you are starting a greenfield project with a JavaScript or TypeScript team, Playwright is a strong choice.
But enterprise India runs on Java, Python, and C#. It runs on TestNG, Maven, and Jenkins pipelines that have been tuned over years. Selenium is the polyglot, battle-tested standard for those environments. A company with 3,000 existing Selenium tests is not rewriting them in Playwright next quarter.
Here is my framework for deciding:
Use Playwright when: Your team is JavaScript-native, the project is new, you need fast parallel execution, and you have no legacy test codebase to maintain.
Use Selenium when: Your team works in Java, Python, or C#; you have existing automation infrastructure; you need cross-browser coverage including legacy browser support; or your organization has standard Selenium tooling in its CI/CD stack.
Learn Selenium first regardless of which path your career takes. The WebDriver protocol concepts, locator strategies, wait handling, and framework design patterns transfer directly to Playwright. Students who learn Selenium deeply pick up Playwright in two weeks. The reverse is less true.
Common Selenium Errors and How to Fix Them
StaleElementReferenceException: The element existed when you found it but the DOM updated before you interacted with it. Fix: re-find the element inside a retry loop, or use fluent wait with StaleElementReferenceException.class in the ignore list.
NoSuchElementException: Your locator is wrong, or the element has not loaded yet. Fix: verify the locator in browser DevTools first. Then add an explicit wait before the findElement call.
ElementNotInteractableException: The element is in the DOM but not visible or not clickable. Fix: scroll to the element first using JavascriptExecutor, then interact. If that does not help, check whether the element sits behind a modal or is covered by another element, and wait for that overlay to disappear before interacting.
WebDriverException (session not created): Usually a ChromeDriver version mismatch. With Selenium 4 and Selenium Manager, this should resolve automatically. If not, confirm your selenium-java version and let Selenium Manager handle the driver download.
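The retry-loop fix for StaleElementReferenceException deserves a concrete shape, because students often retry the action but forget to re-find the element. Below is a minimal sketch under stated assumptions: StaleRetry and withFreshElement are hypothetical names, and StaleElementStandIn replaces Selenium's StaleElementReferenceException so the example runs without a browser. In a real framework you would catch Selenium's exception and pass a lambda that calls driver.findElement.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

// Stand-in for org.openqa.selenium.StaleElementReferenceException (assumption for this sketch).
final class StaleElementStandIn extends RuntimeException {
    StaleElementStandIn(String message) {
        super(message);
    }
}

final class StaleRetry {
    // Looks the element up again on every attempt, then applies the action to the fresh reference.
    static <E, R> R withFreshElement(Supplier<E> finder, Function<E, R> action, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        StaleElementStandIn last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.apply(finder.get());
            } catch (StaleElementStandIn e) {
                last = e; // the DOM changed under us; loop and find the element again
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        AtomicInteger lookups = new AtomicInteger();
        // Simulated page: the first reference goes stale, the second one works
        String result = StaleRetry.<String, String>withFreshElement(
                () -> "element#" + lookups.incrementAndGet(),
                element -> {
                    if (element.equals("element#1")) {
                        throw new StaleElementStandIn("reference is stale");
                    }
                    return element + " clicked";
                },
                3);
        System.out.println(result + " after " + lookups.get() + " lookups");
    }
}
```

Keep maxAttempts small. If an element is still stale after two or three fresh lookups, the page is usually re-rendering in a loop, and a wait on a stable condition is the better fix.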
Is Selenium WebDriver Worth Learning in 2026?
Yes. With conditions.
NASSCOM's India IT market reports point to continued demand for automation engineers across QA and SDET roles, and Selenium remains the most commonly listed automation skill in Indian job postings. Companies in Hyderabad, Bengaluru, Pune, and Chennai still run Selenium at scale.
What is changing: the senior-level requirement is shifting. Junior roles want basic Selenium scripting. Mid and senior SDET roles now want Selenium plus one modern tool (Playwright or Cypress) plus CI/CD integration plus some understanding of AI-assisted test generation and self-healing locators.
At Varnik, we teach Selenium WebDriver as the foundation. We build the POM framework, the TestNG integration, the Selenium Grid setup. Then we introduce Playwright so students understand what the next evolution looks like. The goal is engineers who can walk into any automation environment and contribute on day one.
If you are evaluating Selenium Training at Varnik Technologies, book a free demo session. Sit in on one live class. See whether the teaching style and curriculum depth match what you need before you commit.
FAQs - Selenium WebDriver Tutorial
1. What is Selenium WebDriver in simple terms?
Selenium WebDriver is an open-source library that lets code control a web browser programmatically. It clicks buttons, fills forms, reads page content, and navigates URLs through the W3C WebDriver protocol. Think of it as a remote control for browsers, callable from Java, Python, C#, or Ruby.
2. Which programming language is best for Selenium WebDriver?
Java is the most widely used language for Selenium in enterprise environments, particularly in India. Python is gaining adoption for its shorter syntax and data science integration. For beginners entering the job market, Java with Selenium remains the most in-demand skill combination across Indian IT companies.
3. What is the difference between Selenium 3 and Selenium 4?
Selenium 4 replaced the JSON Wire Protocol with the W3C WebDriver standard, making browser communication more direct and stable. It also introduced Selenium Manager for automatic driver management, relative locators, enhanced Selenium Grid with Docker support, and Chrome DevTools Protocol integration for network interception and monitoring.
4. What is the difference between implicit wait, explicit wait, and fluent wait?
Implicit wait sets a global timeout for all element searches across the entire session. Explicit wait pauses execution for a specific element until a condition is met. Fluent wait adds custom polling intervals and exception handling on top of explicit wait. For most projects, explicit wait is the right default choice.
5. Should I learn Selenium or Playwright in 2026?
Learn Selenium first, then Playwright. Selenium teaches you WebDriver protocol fundamentals and framework design that transfer to every tool. Playwright is faster and better for JavaScript-native teams. Enterprises with existing Java or Python codebases, including Indian IT services companies, continue to run Selenium at scale.
6. What is the Page Object Model in Selenium?
Page Object Model is a framework design pattern where each web page gets its own class containing that page’s locators and actions. Tests call these classes instead of writing locator logic directly. This separation means UI changes require updates in one place only, making the entire test suite more maintainable and less brittle.
7. How do I handle dynamic elements in Selenium WebDriver?
Use explicit wait with ExpectedConditions to wait for elements to appear. Write relative XPath with contains() or starts-with() instead of absolute paths. Request that your development team add data-testid attributes to important interactive elements. In my experience, these three practices cover the large majority of dynamic element challenges in real projects.
8. What is Selenium Grid and when should I use it?
Selenium Grid distributes test execution across multiple machines and browsers simultaneously. Use it when your test suite takes too long to run sequentially, when you need cross-browser coverage across Chrome, Firefox, and Edge, or when your CI/CD pipeline requires parallel execution to meet release deadlines. Selenium 4 Grid supports Docker natively.
9. What are the most common Selenium WebDriver interview questions?
Interviewers focus on architecture (explain the WebDriver communication flow), locator strategies (when to use XPath vs CSS selector), wait types (difference between implicit and explicit), the Page Object Model, TestNG integration, and Selenium Grid setup. Expect at least one live coding question where you write a test script from scratch.
10. Is Selenium WebDriver free to use?
Yes. Selenium WebDriver is open-source and free under the Apache License 2.0. The browser drivers (ChromeDriver, GeckoDriver) are also free. Costs arise when using cloud testing platforms for cross-browser execution at scale, but the core Selenium framework itself has no licensing cost.

