StringTokenizer in Java

Introduction

String tokenization is a common task when working with strings in Java: it splits a string into smaller parts, called tokens, at specified delimiter characters. One tool for this is the StringTokenizer class. This guide covers its constructors and methods, works through examples, and compares it with alternatives such as split().

Overview

The StringTokenizer class in Java provides a simple way to tokenize strings. It is part of the java.util package and offers several constructors and methods to facilitate the tokenization process.

Constructors of the StringTokenizer Class

  1. StringTokenizer(String str): This constructor creates a StringTokenizer object with the specified string str as the input. The default delimiter set consists of the whitespace characters space, tab, newline, carriage return, and form feed.

  2. StringTokenizer(String str, String delim): This constructor creates a StringTokenizer object with the specified string str as the input and uses the characters in delim to split the string into tokens. Each character in delim is treated as a delimiter.

  3. StringTokenizer(String str, String delim, boolean returnDelims): This constructor allows you to specify both the input string str and the delimiter set delim, as well as a boolean value returnDelims. If returnDelims is set to true, each delimiter character is also returned as a token of length one; if it is false, delimiters are skipped.
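As a rough illustration of how the three constructors differ, the sketch below tokenizes a few hard-coded strings. The class name ConstructorDemo and the helper collect() are made up for this example and are not part of the StringTokenizer API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class ConstructorDemo {
    // Hypothetical helper: drains a tokenizer into a list so results are easy to compare.
    static List<String> collect(StringTokenizer tokenizer) {
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // 1. Default delimiters: space, tab, newline, carriage return, form feed.
        System.out.println(collect(new StringTokenizer("red green\tblue")));
        // 2. Custom delimiter set: here only the comma.
        System.out.println(collect(new StringTokenizer("red,green,blue", ",")));
        // 3. returnDelims = true: each delimiter character comes back as its own token.
        System.out.println(collect(new StringTokenizer("red,green,blue", ",", true)));
    }
}
```

The first two calls each yield the three tokens red, green, and blue. The third yields five tokens, because the two commas are returned as tokens as well.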

Methods of the StringTokenizer Class

  • int countTokens(): This method returns the number of tokens remaining in the StringTokenizer object.

  • boolean hasMoreTokens(): This method returns true if there are more tokens to be extracted from the StringTokenizer object; otherwise, it returns false.

  • boolean hasMoreElements(): This method returns the same value as hasMoreTokens(). It exists so that StringTokenizer can implement the Enumeration interface.

  • String nextToken(): This method returns the next token from the StringTokenizer object.

  • String nextToken(String delim): This method first replaces the tokenizer's current delimiter set with the characters in delim and then returns the next token. The new delimiter set remains in effect for subsequent calls.

  • Object nextElement(): This method is similar to nextToken(), but it returns the next token as an Object instead of a String.

StringTokenizer in Java with an Example

Let's explore an example that demonstrates the basic usage of the StringTokenizer in Java. Suppose we have a string representing a list of names separated by commas: "John,Emily,Michael,Sophia".

import java.util.StringTokenizer;

public class TokenExample {
    public static void main(String[] args) {
        String names = "John,Emily,Michael,Sophia";
        StringTokenizer tokenizer = new StringTokenizer(names, ",");
        while (tokenizer.hasMoreTokens()) {
            String name = tokenizer.nextToken();
            System.out.println(name);
        }
    }
}

Output:

John
Emily
Michael
Sophia

In this example, we created a StringTokenizer object named tokenizer using the string names and the delimiter ",". We then used a while loop to iterate through the tokens and printed each name.

Example of nextToken(String delim) Method of the StringTokenizer Class

The nextToken(String delim) method lets you specify a new set of delimiter characters when extracting the next token. The new set replaces the tokenizer's current delimiters and stays in effect for subsequent calls. Consider a string whose items are separated by hyphens ("-"), read through a StringTokenizer that was created with the default whitespace delimiters.

import java.util.StringTokenizer;

public class TokenExample {
    public static void main(String[] args) {
        String data = "Apple-Orange-Banana-Grape";
        // Created with the default whitespace delimiters.
        StringTokenizer tokenizer = new StringTokenizer(data);
        while (tokenizer.hasMoreTokens()) {
            // Switch to the hyphen as the delimiter when extracting each token.
            String fruit = tokenizer.nextToken("-");
            System.out.println(fruit);
        }
    }
}

Output:

Apple
Orange
Banana
Grape

In this example, we used the nextToken(String delim) method to tokenize the string data using the hyphen ("-") as the delimiter.
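One detail of nextToken(String delim) is easy to miss: the delimiter set passed in stays active for later calls, and characters from the old set become ordinary token text. The sketch below illustrates this with a made-up input string; the class name DelimiterSwitchDemo and the method tokens() are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class DelimiterSwitchDemo {
    // Hypothetical helper: reads three tokens while switching delimiters once.
    static List<String> tokens() {
        StringTokenizer tokenizer = new StringTokenizer("alpha beta,gamma delta");
        List<String> result = new ArrayList<>();
        // Default whitespace delimiters: stops at the first space.
        result.add(tokenizer.nextToken());
        // Switch to the comma: the space is no longer a delimiter,
        // so it becomes part of this token.
        result.add(tokenizer.nextToken(","));
        // No argument, but the comma is still the active delimiter,
        // so the remaining text comes back as one token.
        result.add(tokenizer.nextToken());
        return result;
    }

    public static void main(String[] args) {
        for (String token : tokens()) {
            // Brackets make leading spaces visible.
            System.out.println("[" + token + "]");
        }
    }
}
```

The three tokens are "alpha", " beta" (with a leading space), and "gamma delta". If the old delimiter should still be skipped, include it in the new set, for example nextToken(" ,").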

Example of hasMoreTokens() Method of the StringTokenizer Class

The hasMoreTokens() method checks whether any tokens remain to be extracted from the string. It returns true if at least one more token is available and false otherwise, which makes it the usual loop condition for guarding calls to nextToken().

import java.util.StringTokenizer;

public class TokenExample {
    public static void main(String[] args) {
        String sentence = "Java is a powerful programming language.";
        StringTokenizer tokenizer = new StringTokenizer(sentence);
        while (tokenizer.hasMoreTokens()) {
            String word = tokenizer.nextToken();
            System.out.println(word);
        }
    }
}

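Output:

Java
is
a
powerful
programming
language.

In this example, we omitted the delimiter in the StringTokenizer constructor, so it uses the default whitespace delimiters. We used the hasMoreTokens() method as the loop condition and printed each word in the sentence. Note that the period stays attached to the last word because it is not a delimiter.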

Example of hasMoreElements() and nextElement() Methods of the StringTokenizer Class

The hasMoreElements() and nextElement() methods behave like hasMoreTokens() and nextToken(), respectively. They exist because StringTokenizer implements the Enumeration<Object> interface, so nextElement() returns each token typed as Object rather than String.

import java.util.StringTokenizer;

public class TokenExample {
    public static void main(String[] args) {
        String data = "123;456;789";
        StringTokenizer tokenizer = new StringTokenizer(data, ";");
        while (tokenizer.hasMoreElements()) {
            Object token = tokenizer.nextElement();
            System.out.println(token);
        }
    }
}

Output:

123
456
789

In this example, we used the hasMoreElements() and nextElement() methods to tokenize the string data using the semicolon (";") as the delimiter.
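Because StringTokenizer is an Enumeration, it can be handed to any API that accepts one. As a minimal sketch, the code below passes a tokenizer to Collections.list() to collect all tokens in one call; the class name EnumerationDemo and the method toList() are made up for this example.

```java
import java.util.Collections;
import java.util.List;
import java.util.StringTokenizer;

public class EnumerationDemo {
    // StringTokenizer implements Enumeration<Object>, so Collections.list accepts it.
    static List<Object> toList(String input, String delimiters) {
        return Collections.list(new StringTokenizer(input, delimiters));
    }

    public static void main(String[] args) {
        List<Object> tokens = toList("123;456;789", ";");
        for (Object token : tokens) {
            // Each element is a String at runtime, so a cast recovers the typed value.
            String text = (String) token;
            System.out.println(text + " has length " + text.length());
        }
    }
}
```

The elements are typed as Object, which is why a cast is needed before calling String methods on them.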

Example of countTokens() Method of the StringTokenizer Class

The countTokens() method returns the number of tokens remaining in the StringTokenizer object, that is, how many times nextToken() can still be called before it throws an exception. Calling it does not advance the tokenizer. Here's an example demonstrating its usage:

import java.util.StringTokenizer;

public class TokenExample {
    public static void main(String[] args) {
        String sentence = "This is a sample sentence.";
        StringTokenizer tokenizer = new StringTokenizer(sentence);
        int tokenCount = tokenizer.countTokens();
        System.out.println("Number of tokens: " + tokenCount);
    }
}

Output:

Number of tokens: 5

In this example, we called countTokens() before extracting any tokens, so it reports the total number of tokens in the sentence.
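Since countTokens() reports the tokens that remain rather than a fixed total, its result shrinks as tokens are consumed. The following sketch shows this; the class name CountTokensDemo and the helper remainingAfter() are hypothetical.

```java
import java.util.StringTokenizer;

public class CountTokensDemo {
    // Hypothetical helper: consumes the given number of tokens,
    // then reports how many are still left.
    static int remainingAfter(String text, int consumed) {
        StringTokenizer tokenizer = new StringTokenizer(text);
        for (int i = 0; i < consumed; i++) {
            tokenizer.nextToken();
        }
        return tokenizer.countTokens();
    }

    public static void main(String[] args) {
        String sentence = "This is a sample sentence.";
        System.out.println(remainingAfter(sentence, 0)); // 5
        System.out.println(remainingAfter(sentence, 2)); // 3
        System.out.println(remainingAfter(sentence, 5)); // 0
    }
}
```

If you need the total, call countTokens() once before the first nextToken() and store the result.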

StringTokenizer in a Real Use Case

Let's consider a real-world scenario where we have a CSV (comma-separated values) file and need to extract the values from each line.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.StringTokenizer;

public class CSVReader {
    public static void main(String[] args) {
        String csvFile = "data.csv";
        String line;
        try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
            while ((line = br.readLine()) != null) {
                StringTokenizer tokenizer = new StringTokenizer(line, ",");
                while (tokenizer.hasMoreTokens()) {
                    String value = tokenizer.nextToken();
                    System.out.println(value);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

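In this example, we read a CSV file named "data.csv" line by line using a BufferedReader. For each line, we create a StringTokenizer object with a comma (",") as the delimiter. We then iterate through the tokens and print each value. Keep in mind that StringTokenizer skips empty fields, so a line such as a,,c yields two tokens rather than three, and that it does not understand quoted fields.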
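How empty fields are treated matters for CSV data, where the position of a value usually carries meaning. The sketch below compares the number of fields reported by StringTokenizer and by split() for a line with an empty middle column. The sample line and the names EmptyFieldDemo, tokenizerFieldCount(), and splitFieldCount() are invented for illustration.

```java
import java.util.StringTokenizer;

public class EmptyFieldDemo {
    // Counts fields the way the CSV loop above would see them.
    static int tokenizerFieldCount(String line) {
        return new StringTokenizer(line, ",").countTokens();
    }

    // Counts fields with split(); the limit -1 keeps trailing empty strings too.
    static int splitFieldCount(String line) {
        return line.split(",", -1).length;
    }

    public static void main(String[] args) {
        String line = "Alice,,London"; // the middle column is empty
        System.out.println("StringTokenizer fields: " + tokenizerFieldCount(line)); // 2
        System.out.println("split() fields: " + splitFieldCount(line));             // 3
    }
}
```

With StringTokenizer the value London silently moves from the third column to the second, which is worth checking for before relying on token positions.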

StringTokenizer vs split() in Java

The StringTokenizer class and the split() method are used for splitting strings in Java, but they have some differences in functionality and usage. Let's see the comparison.

1. Functionality:

StringTokenizer: It is a legacy class that allows you to split a string into tokens based on a delimiter. It provides methods like nextToken() and hasMoreTokens() to iterate through the tokens.

split(): It is a method available in the String class that splits a string into an array of substrings based on a regular expression (regex) delimiter. It returns an array of strings containing the substrings.

2. Delimiters:

StringTokenizer: It accepts a delimiter string and treats each character in it as a separate delimiter, so the string can hold a single delimiter character or a set of them. It cannot match a multi-character sequence as one delimiter.
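To make the "set of characters" behavior concrete, here is a small sketch with invented inputs. The class name DelimiterSetDemo and the helper tokenize() are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class DelimiterSetDemo {
    // Hypothetical helper: tokenizes the input and returns the tokens as a list.
    static List<String> tokenize(String input, String delimiters) {
        StringTokenizer tokenizer = new StringTokenizer(input, delimiters);
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Comma, space, and semicolon each act as a delimiter.
        System.out.println(tokenize("one, two;three", ", ;")); // [one, two, three]
        // "ab" means 'a' OR 'b', not the two-character sequence "ab".
        System.out.println(tokenize("1ab2a3b4", "ab"));        // [1, 2, 3, 4]
    }
}
```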


split(): It accepts a regex pattern as the delimiter. You can use complex regex patterns to specify the delimiter, including multiple characters or character classes.
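As a brief sketch of what a regex delimiter allows, the code below splits on a comma with optional surrounding whitespace and on the two-character sequence "->". The inputs and the names SplitRegexDemo, splitOnComma(), and splitOnArrow() are made up for this example.

```java
import java.util.Arrays;

public class SplitRegexDemo {
    // A comma plus any whitespace around it counts as one delimiter.
    static String[] splitOnComma(String input) {
        return input.split("\\s*,\\s*");
    }

    // A multi-character delimiter: the sequence "->" as a whole.
    static String[] splitOnArrow(String input) {
        return input.split("->");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnComma("red , green,blue"))); // [red, green, blue]
        System.out.println(Arrays.toString(splitOnArrow("a->b->c")));          // [a, b, c]
    }
}
```

Because the argument is a regular expression, characters such as "." or "|" need escaping (or Pattern.quote()) when they are meant literally.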

3. Return Value:

StringTokenizer: It does not return an array directly. Instead, you need to iterate through the tokens using methods like nextToken().

split(): It directly returns an array of strings containing the substrings.

4. Iteration:

StringTokenizer: It provides methods like nextToken() and hasMoreTokens() to iterate through the tokens manually.

split(): Since it returns an array, you can use a loop or other array processing techniques to iterate through the substrings.

5. Flexibility:

StringTokenizer: It allows you to change the delimiter dynamically by providing it as an argument to nextToken() method.

split(): The delimiter in split() is specified only once during the split operation and cannot be changed dynamically.

6. Performance:

StringTokenizer: It is generally considered faster for simple delimiter-based splitting operations.

split(): It is more flexible and powerful due to regex pattern matching, but it may be slower than StringTokenizer for simple cases.

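In summary, StringTokenizer can be adequate for simple delimiter-based splitting, and it is often reported to be faster in such cases. It is, however, a legacy class: its own API documentation discourages its use in new code and recommends split() or the java.util.regex package instead. If you require more flexibility, regex pattern matching, or direct access to the resulting substrings as an array, then split() is the more suitable option.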

StringTokenizer in Java 8

In Java 8, the StringTokenizer class is still available and can be used for splitting strings into tokens based on a delimiter. However, the platform offers alternatives that are often more flexible: String.split() and the java.util.regex classes, both available since Java 1.4, and the Stream API, which was added in Java 8. Let's explore these options:

1. split() method: The split() method is available in the String class and allows you to split a string into an array of substrings based on a regex delimiter. Here's an example:

String sentence = "Java 8 introduced new features.";
String[] words = sentence.split(" ");
for (String word : words) {
    System.out.println(word);
}

Output:

Java
8
introduced
new
features.

2. Stream API: The Stream API in Java 8 provides powerful features for processing elements functionally and declaratively. You can turn the tokens of a string into a stream, for example by wrapping the array returned by split() with Arrays.stream() (Pattern.splitAsStream(), also added in Java 8, is another option), and then perform various operations on the stream. Here's an example:

import java.util.Arrays;
import java.util.stream.Stream;

public class StreamExample {
    public static void main(String[] args) {
        String sentence = "Java 8 introduced new features.";
        Stream<String> wordStream = Arrays.stream(sentence.split(" "));
        wordStream.forEach(System.out::println);
    }
}

Output:

Java
8
introduced
new
features.
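For completeness, here is a minimal sketch of the Pattern.splitAsStream() route, which produces the stream directly without building an intermediate array. The class name SplitAsStreamDemo and the method upperCaseWords() are invented for this example.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SplitAsStreamDemo {
    // Compiled once and reused; splits on runs of whitespace.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Hypothetical helper: splits the sentence and upper-cases each token.
    static List<String> upperCaseWords(String sentence) {
        return WHITESPACE.splitAsStream(sentence)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(upperCaseWords("Java 8 introduced new features."));
        // [JAVA, 8, INTRODUCED, NEW, FEATURES.]
    }
}
```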

3. Pattern and Matcher classes: The Pattern and Matcher classes in java.util.regex (available since Java 1.4, so not new in Java 8) provide more control and flexibility for string pattern matching and tokenization. You can define a regex pattern using the Pattern class, create a Matcher object, and use methods like find() and group() to iterate through the tokens. Here's an example:

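import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExample {
    public static void main(String[] args) {
        String sentence = "Java 8 introduced new features.";
        // \w+ matches runs of word characters, so the trailing period is not part of any token.
        Pattern pattern = Pattern.compile("\\w+");
        Matcher matcher = pattern.matcher(sentence);
        while (matcher.find()) {
            String word = matcher.group();
            System.out.println(word);
        }
    }
}

Output:

Java
8
introduced
new
features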

Conclusion

This guide explored the StringTokenizer class in Java and its use in string tokenization. We discussed its constructors, such as StringTokenizer(String str) and StringTokenizer(String str, String delim), and its methods, including nextToken(), hasMoreTokens(), and countTokens(). We also worked through examples of its practical application and compared it with split() and the regex classes. With this knowledge, you can tokenize strings in Java and choose the approach that suits your needs.

FAQs

1. How does StringTokenizer in Java handle multiple delimiters?

Answer: StringTokenizer treats each character in the delimiter string as a separate delimiter. For example, if the delimiter string is "-+", it will consider both hyphen and plus characters as delimiters and split the string accordingly.

2. How can I change the delimiter dynamically in StringTokenizer?

Answer: Pass the new delimiter set to nextToken(String delim). The tokenizer uses it for that call and keeps it as the current delimiter set for later calls. Alternatively, you can create a new StringTokenizer object with a different delimiter for the remaining text.

3. How does StringTokenizer handle empty tokens?

Answer: StringTokenizer never produces empty tokens. Consecutive delimiters are treated as a single separator, and leading or trailing delimiters are skipped. Setting the third constructor parameter (returnDelims) to true does not change this, but it makes each delimiter character come back as a token of its own, so your code can detect two delimiters in a row and treat the gap as an empty field.
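One possible way to recover empty fields is sketched below: the tokenizer is created with returnDelims set to true, and the code records an empty string whenever two delimiters arrive back to back. This is an illustrative workaround that assumes a single-character delimiter; the class name EmptyTokenDemo and the method fields() are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class EmptyTokenDemo {
    // Assumption: delimiter is exactly one character long.
    static List<String> fields(String line, String delimiter) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line, delimiter, true);
        // Treat the start of the line as if a delimiter had just been seen.
        boolean lastWasDelimiter = true;
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            if (token.equals(delimiter)) {
                if (lastWasDelimiter) {
                    result.add(""); // two delimiters in a row: empty field
                }
                lastWasDelimiter = true;
            } else {
                result.add(token);
                lastWasDelimiter = false;
            }
        }
        if (lastWasDelimiter) {
            result.add(""); // trailing delimiter (or empty line): final empty field
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fields("a,,c", ",").size()); // 3
        System.out.println(fields("a,b,", ",").size()); // 3
    }
}
```

In practice, line.split(",", -1) gives the same fields with less code, so this sketch is mainly useful for understanding what returnDelims does.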

4. How can I handle quoted values or special characters within tokens using StringTokenizer?

Answer: StringTokenizer does not have built-in support for handling quoted values or special characters within tokens. If you have such requirements, you may need to implement custom logic to handle these cases. One approach is to first tokenize the string using a simple delimiter and then manually process the tokens to handle quoted values or escape sequences.
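The following is a minimal sketch of such custom logic, assuming comma-separated fields that may be wrapped in double quotes. It treats both the comma and the quote character as delimiters, asks the tokenizer to return them, and tracks whether the current position is inside quotes. It does not handle escaped quotes inside a quoted field, and the names QuotedFieldDemo and splitQuoted() are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class QuotedFieldDemo {
    // Splits on commas, except for commas that appear between double quotes.
    static List<String> splitQuoted(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        // returnDelims = true, so commas and quotes come back as tokens.
        StringTokenizer tokenizer = new StringTokenizer(line, ",\"", true);
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            if (token.equals("\"")) {
                inQuotes = !inQuotes;           // entering or leaving a quoted section
            } else if (token.equals(",") && !inQuotes) {
                fields.add(current.toString()); // field boundary
                current.setLength(0);
            } else {
                current.append(token);          // plain text, or a comma inside quotes
            }
        }
        fields.add(current.toString());         // last field
        return fields;
    }

    public static void main(String[] args) {
        for (String field : splitQuoted("Alice,\"New York, NY\",30")) {
            System.out.println(field);
        }
    }
}
```

For the sample line this prints Alice, then New York, NY, then 30. For real CSV files with escaped quotes or embedded line breaks, a dedicated CSV parsing library is usually the safer choice.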
