In C, tokens are the smallest meaningful elements used to create a program. They include keywords, identifiers, constants, string literals, operators, punctuation marks, and special symbols. When a C program is compiled, it is broken down into these tokens, enabling the compiler to analyse and understand the program's structure.
Tokenization is a crucial step of the compilation process, as it allows the compiler to generate executable code from the provided C program by organising and categorising its individual elements.
Tokens are fundamental building blocks used in the C language to construct programs. In C, a token is defined as the smallest individual element that holds significance to the compiler's functioning.
For example, the keyword "if" introduces a conditional statement that executes a block of code only when a given condition is true.
if (x > 0) {
Certain rules are commonly used to recognise identifiers -
1. The first character of an identifier must be either a letter or an underscore. It cannot be a digit.
2. Identifiers in C are case-sensitive, so letters with lowercase and uppercase are considered distinct.
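3. Identifiers may be long, but only a limited number of leading characters are guaranteed to be significant. C89 guarantees 31 significant characters for internal identifiers; C99 raises this to 63 for internal identifiers and 31 for external ones. Anything beyond these minimums is implementation-defined.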
4. Commas and blank spaces are not allowed within an identifier.
5. Using C keywords as identifiers is not permissible since they have reserved meanings for specific purposes in the language.
Types of Constants | Examples |
Integer constant | 20, 41, 94, etc. |
Octal constant | 011, 033, 077, etc. |
Floating-point constant | 13.9, 25.7, 87.4, etc. |
Character constant | 'p', 'q', 'r', etc. |
String constant | "c++", ".net", "java", etc. |
Hexadecimal constant | 0x5A, 0x1A, 0x8F, etc. |
In C, strings are represented as arrays of characters, terminated by a null character '\0'. The null character denotes the end of the string. String literals are always enclosed within double quotes (" ").
When describing a string in C, you can use different syntaxes. For example:
1. Using character array initialization:
char string[10] = {'s', 'c', 'a', 'l', 'e', 'r', '\0'};
Here, string[10] indicates that 10 bytes of memory space are allocated to hold the string value. Each string character is explicitly specified within single quotes, and the null character '\0' marks the end of the string.
2. Using string literal initialization:
char string[10] = "scaler";
The string is directly initialized with the literal "scaler" in this case. The compiler automatically appends the null character '\0' at the end of the string. Again, string[10] indicates that 10 bytes of memory space are allocated.
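3. Letting the compiler determine the array size:
char string[] = "scaler";
Here, the string is declared without specifying the size. This is not dynamic memory allocation: the compiler works out the array size at compile time from the initializer, so the array occupies exactly 7 bytes (six characters plus the null character '\0', which is included automatically). Dynamic allocation, by contrast, requests memory at run time with functions such as malloc.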
There are three types of operators -
In the context of token classification in programming languages like C, tokens can be categorised into primary and secondary tokens. Here's an elaboration on each:
These are the fundamental elements of a programming language. They are directly recognised by the lexer or tokenizer, the component responsible for breaking down the source code into tokens. Primary tokens include
Secondary tokens are derived from primary tokens during the tokenization process. They are created by combining or modifying primary tokens to represent additional syntactic elements in a program. Secondary tokens include
As long as these rules are followed, any name can be chosen for an identifier; however, it is important to ensure that the chosen name is valid and makes sense.
Some examples of identifiers include -
These examples demonstrate valid identifiers that follow the rules mentioned earlier. They consist of a combination of letters (both uppercase and lowercase), digits, and underscores. The first character is either a letter or an underscore, and they do not conflict with reserved keywords. Identifiers are essential for naming variables, functions, and other elements in a C program, providing meaningful names to represent data and logic.
In the C programming language, an expression is a combination of operands, operators, and function calls that are evaluated to produce a value. It represents a computation or a calculation that yields a result. Expressions can involve variables, constants, arithmetic operations, logical operations, function calls, etc.
An expression can be as simple as a single constant or variable, or as complex as a combination of several operators and operands. Expressions can also be used as parts of larger expressions or as function arguments.
Examples of Expressions:
int result = 2 + 3 * 4;
In this example, the expression 2 + 3 * 4 is an arithmetic expression that performs addition and multiplication. The result of this expression is stored in the variable ‘result’.
int x = 5, y = 7;
int isGreater = x > y;
Here, the expression x > y is a relational expression that compares the values of x and y. The result of this expression is either true (1) or false (0), depending on whether ‘x’ is greater than ‘y’. The result is stored in the variable ‘isGreater’.
Lexical analysis, also known as scanning, is the initial phase of the compiler where the source code is divided into individual tokens or lexemes. It analyses the characters of the source code to form these tokens, which are meaningful units such as keywords, identifiers, constants, operators, and punctuation marks.
Check out this C code example to better understand the tokenizing process -
#include <stdio.h>
During lexical analysis, the source code is divided into tokens:
Syntax analysis, also known as parsing, is the second phase of the compiler. It checks whether the sequence of tokens formed during lexical analysis follows the syntax rules defined by the programming language. It builds a parse tree or syntax tree that represents the hierarchical structure of the program based on the language's grammar rules.
Example:
Continuing from the previous example, during syntax analysis, the compiler verifies if the tokens and their arrangement follow the syntax rules of the C language. It checks for the
If the syntax analysis is successful, the program is considered syntactically correct. Otherwise, syntax errors are reported, indicating that the program violates the language's grammar rules.
1. Which of the following is not a valid C Token?
A. Identifier
B. Whitespace
C. Punctuation
D. Keyword
Answer: B. Whitespace
2. Which of these is not a valid identifier?
A. myVariable
B. 123cdd
C. _grade
D. variable_start
Answer: B. 123cdd
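3. Find the number of Tokens in the following C statement.
printf("Hello, %s!", Bill);
A. 6
B. 7
C. 9
D. 11
Answer: B. 7. The tokens are printf, the opening parenthesis, the string literal "Hello, %s!", the comma, Bill, the closing parenthesis, and the semicolon.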
Tokens in C are the smallest elements that make up a program. Understanding and using tokens correctly is essential for writing error-free C programs, because tokens are what the compiler actually processes and analyses. Knowledge of tokens helps programmers express logic, perform computations, manipulate data, and write efficient software.
1. What are the six types of Tokens in C?
The six types of Tokens in C programming include Keywords, Identifiers, Operators, Constants, Strings and Special Characters.
2. What is the role of operators in C programming?
In C programming, operators manipulate values and help control the flow of a program. Arithmetic operators perform calculations, relational operators compare values, and logical operators combine conditions.
3. Can an identifier start with a numerical digit in C?
No. In C, an identifier must start with either an underscore or a letter. A name that starts with a digit is not a valid identifier under the rules of the language.
PAVAN VADAPALLI